How to use Python to crawl a mini-game website and download your favorite games (with source code)

Introduction:

Python is a simple, easy-to-learn, and powerful programming language. It needs little configuration: once you know the basic syntax and the core library functions, you can write your own programs by calling the many existing packages, automate batch tasks, and work or study more efficiently. A Python crawler can collect data from web pages in bulk.

Python environment configuration

1. Code editor: PyCharm Community

2. Code interpreter: Python 3.7.6

3. Create a project in PyCharm and configure the Python environment

4. Two ways to install the packages
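
Once the environment is set up, it can help to confirm that the packages the crawler imports are actually visible to the interpreter. The following is a minimal sketch, not part of the original article; the helper name `missing_packages` is hypothetical. It only assumes that the full code imports `requests` and `bs4` (the module provided by the `beautifulsoup4` package).

```python
import importlib.util


def missing_packages(module_names):
    """Return the top-level module names that cannot be found in this environment."""
    return [name for name in module_names
            if importlib.util.find_spec(name) is None]


if __name__ == '__main__':
    # The crawler in this article imports `requests` and `bs4`.
    missing = missing_packages(['requests', 'bs4'])
    if missing:
        print('Missing modules:', ', '.join(missing))
    else:
        print('All required modules are available')
```

If a module is reported as missing, install it into the interpreter that the PyCharm project is configured to use.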

4399 mini-games crawler in practice

1. Basic steps of a crawler

  • Download web pages using requests
  • Use BeautifulSoup to parse the content downloaded by requests into a DOM (Document Object Model)
  • Get the required data from the DOM
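
The parse-and-extract steps can be sketched on a small inline page so that the example runs without network access. The HTML, the `#list` selector, and the function name `extract_games` are all assumptions made for illustration; in a real crawl the HTML would come from `requests.get(url).content`, and the selector would have to match the actual page.

```python
from bs4 import BeautifulSoup

# Hypothetical HTML standing in for a downloaded page.
SAMPLE_HTML = """
<div id="list">
  <ul>
    <li><a href="/flash/1.htm"><b>Game One</b></a></li>
    <li><a href="/flash/2.htm"><b>Game Two</b></a></li>
  </ul>
</div>
"""


def extract_games(html):
    """Parse HTML into a DOM and return (name, href) pairs, one per list item."""
    dom = BeautifulSoup(html, 'html.parser')
    games = []
    for li in dom.select('#list > ul > li'):
        link = li.select_one('a')
        games.append((link.get_text(strip=True), link['href']))
    return games


if __name__ == '__main__':
    for name, href in extract_games(SAMPLE_HTML):
        print(name, href)
```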

2. Running 4399 mini-games locally

  • Games that can be downloaded and run locally: games with the .swf extension
  • The absolute address can be obtained from the src attribute of the <embed> tag on the game's main page
Example of an absolute game address: http://sxiao.4399.com/4399swf/upload/ftp29/liuxinyu/20190731/7/main.swf
  • The relative address can be obtained from the game information page: inside a <script> tag, press Ctrl+F and search for the keyword _strGamePath
Example of a relative game address: /upload_swf/ftp29/liuxinyu/20190731/7/main.swf
  • Required software: iQIYI Universal Player (since renamed Universal Unicast), PC version (GeePlayer.exe)
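
Finding the relative address by hand with Ctrl+F can be automated with a regular expression. The sketch below uses the same pattern and base URL as the full code in this article; the sample script fragment and the function name `find_swf_url` are hypothetical, and real pages may format the variable differently.

```python
import re

# Base URL used by the article's full code to build the download address.
GAME_BASE_URL = 'http://sxiao.4399.com/4399swf'

# The lookbehind anchors the match right after `_strGamePath="`;
# the lazy `.+?` stops at the first ".swf" that follows.
SWF_PATH_PATTERN = re.compile(r'(?<=_strGamePath=").+?\.swf')

# Hypothetical fragment of a <script> block from a game information page.
SAMPLE_SCRIPT = 'var _strGamePath="/upload_swf/ftp29/liuxinyu/20190731/7/main.swf";'


def find_swf_url(page_text):
    """Return the absolute .swf URL, or None if the page has no .swf game path."""
    match = SWF_PATH_PATTERN.search(page_text)
    if match is None:
        return None
    return GAME_BASE_URL + match.group(0)


if __name__ == '__main__':
    print(find_swf_url(SAMPLE_SCRIPT))
```

A page whose game path does not end in .swf yields `None`, which is how the crawler can tell that the game cannot be downloaded for local play.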

3. 4399 mini-games crawler: implementation approach

  • Crawl the 4399 mini-game page (http://www.4399.com/flash/gamehw.htm) and collect every game link by parsing the DOM
  • Iterate over the game links. For each one, start a thread that downloads the linked page and checks whether the game can be downloaded locally; if it can, build the download address and start a game download thread
  • Game download thread: download the .swf file from the download address and save it locally
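
The check-then-download flow can be sketched independently of the network by passing the two steps in as functions. Note that this sketch swaps in `concurrent.futures.ThreadPoolExecutor`, which caps the number of concurrent threads, instead of starting one `threading.Thread` per game as the article's code does. The names `process_games`, `check`, and `download` are hypothetical.

```python
from concurrent.futures import ThreadPoolExecutor


def process_games(games, check, download, max_workers=8):
    """Run check(game) concurrently and download each game that has a URL.

    check(game) returns a download URL or None; download(game, url) saves it.
    Returns the names of the downloaded games, in input order.
    """
    def handle(game):
        url = check(game)
        if url is None:
            return None
        download(game, url)
        return game['gameName']

    with ThreadPoolExecutor(max_workers=max_workers) as pool:
        # pool.map yields results in the same order as the input list.
        return [name for name in pool.map(handle, games) if name is not None]


if __name__ == '__main__':
    # Stand-in functions so the sketch runs offline.
    def fake_check(game):
        return None if game['gameName'] == 'b' else 'url-for-' + game['gameName']

    def fake_download(game, url):
        print('would download', game['gameName'], 'from', url)

    print(process_games([{'gameName': n} for n in 'abc'], fake_check, fake_download))
```

Bounding the pool size is a design choice for politeness toward the server; the one-thread-per-game approach is simpler but opens as many connections as there are games.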

Full code

import os
import re
import threading

from bs4 import BeautifulSoup as bs
import requests

# Site root, prepended to the relative links found on the list page
indexUrl = 'http://www.4399.com'
# Prefix for the relative .swf path found on each game information page
gameBaseUrl = 'http://sxiao.4399.com/4399swf'


def getAllGameUrl():
    """
    Get all game names and links to the game information pages
    :return: list of dicts with the keys 'gameName' and 'gameInfoUrl'
    """
    gameUrlList = []
    response = requests.get('http://www.4399.com/flash/gamehw.htm')
    dom = bs(response.content, 'html.parser')
    gameLiList = dom.select('#skinbody > div:nth-child(6) > ul > li')
    for i in gameLiList:
        # Get the name of the game
        gameName = i.select_one('a > b').get_text()
        # Get a link to the game information page,
        # e.g. 'http://www.4399.com/flash/212103.htm'
        gameInfoUrl = indexUrl + i.select_one('a')['href']
        gameUrlList.append({'gameName': gameName, 'gameInfoUrl': gameInfoUrl})
    return gameUrlList


def downloadIfAvailable(game):
    """
    Check whether a game can be downloaded locally and, if so, start the download
    :param game: dict with the keys 'gameName' and 'gameInfoUrl'
    :return:
    """
    response = requests.get(game['gameInfoUrl'])
    plainText = response.text
    # Capture the relative .swf path that follows _strGamePath="
    relativeUrlList = re.findall(r'(?<=_strGamePath=").+?\.swf', plainText)
    if len(relativeUrlList) != 0:
        gameUrl = gameBaseUrl + relativeUrlList[0]
        game['gameUrl'] = gameUrl
        threading.Thread(target=downloadAGame, args=(game,)).start()


def downloadAGame(game):
    """
    Download the game from its download link and save it as a .swf file
    :param game: dict with the keys 'gameName' and 'gameUrl'
    :return:
    """
    downloadPath = 'games/'
    # exist_ok avoids an error when several threads create the folder at once
    os.makedirs(downloadPath, exist_ok=True)
    content = requests.get(game['gameUrl']).content
    with open(downloadPath + game['gameName'] + '.swf', 'wb') as file:
        file.write(content)
    print(game['gameName'] + ' download complete')


if __name__ == '__main__':
    gameUrlList = getAllGameUrl()
    for i in gameUrlList:
        threading.Thread(target=downloadIfAvailable, args=(i,)).start()
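
The code above uses the game name directly as the file name, which may fail if a name contains characters that the file system does not allow. One possible precaution, not part of the original code, is to clean the name first. The helper `safe_file_name` below is hypothetical, and the character set it replaces is an assumption based on the characters Windows rejects in file names.

```python
import os
import re


def safe_file_name(name, replacement='_'):
    """Replace characters that are not allowed in Windows file names."""
    cleaned = re.sub(r'[\\/:*?"<>|]', replacement, name).strip()
    # Fall back to a placeholder if nothing usable is left.
    return cleaned or 'unnamed'


if __name__ == '__main__':
    # Build the target path the way downloadAGame could.
    print(os.path.join('games', safe_file_name('Tank 1990?') + '.swf'))
```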

Posted on Mon, 11 May 2020 02:49:10 -0700 by Tea_J