Crawling real-time novel coronavirus data with Python and drawing an epidemic map for each province

Idea:
After stumbling across web crawlers, I wanted to try crawling novel coronavirus data, but I am a programming beginner ("xiaobai"), so I could only look for code online to learn from. After reading the code by the blogger Hakuna_Matata_001, I retyped it from scratch, added a few ideas of my own, and wrote comments based on my own understanding (almost every line has a comment; I am, after all, very much a beginner).
All I wanted was an epidemic map, so on top of the original blogger's work I added the detailed data for each city.
Data source and reference:
Reference blog: Hakuna_Matata_001
Data source: Tencent real-time epidemic information
Lookup tables: country names in Chinese and English; detailed names of cities in China
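
The code in this post calls .json() on the response and then json.loads on its 'data' field, which suggests that the payload arrives as a JSON string wrapped inside a JSON object. Here is a minimal sketch of that double decoding. The sample payload is made up for illustration (the "ret" field and all values are assumptions, not real data), and decode_payload is a hypothetical helper name.

```python
import json


def decode_payload(raw_text):
    """Decode a response whose 'data' field is itself a JSON-encoded string."""
    outer = json.loads(raw_text)      # first pass: the outer envelope
    return json.loads(outer["data"])  # second pass: the embedded JSON string


# Made-up sample that mimics the assumed shape; the numbers are not real.
inner = {"lastUpdateTime": "2020-02-11 10:00:00", "chinaTotal": {"confirm": 1}}
sample = json.dumps({"ret": 0, "data": json.dumps(inner)})

decoded = decode_payload(sample)
print(decoded["lastUpdateTime"])
print(decoded["chinaTotal"]["confirm"])
```

Working from a saved sample like this makes it possible to try out the later processing steps without hitting the network each time.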

Difference:
Because most of the code is copied, there is no step-by-step explanation. There are two main differences:
(1) The original blogger's code had no automatic update, so I added one using time and datetime.
(2) The original blogger's code had no data for the individual cities in a province, so I added city-level data following the same pattern. However, each base map is the map of a single province; the city data is not merged into one map of the whole of China.
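
As a rough illustration of difference (1), the sketch below separates the "repeat until a deadline" logic from the work being repeated. The names run_until, fake_now, and fake_sleep are hypothetical and not part of the original code; the fake clock is only there so the demonstration finishes immediately instead of waiting 24 hours.

```python
import time
from datetime import datetime, timedelta


def run_until(deadline, interval_seconds, task, now=datetime.now, sleep=time.sleep):
    """Call task() repeatedly until the deadline has passed; return the number of runs."""
    runs = 0
    while now() < deadline:
        task()
        runs += 1
        sleep(interval_seconds)
    return runs


# Demonstration with a fake clock so that nothing actually waits.
clock = {"t": datetime(2020, 3, 1, 0, 0, 0)}


def fake_now():
    return clock["t"]


def fake_sleep(seconds):
    clock["t"] += timedelta(seconds=seconds)


calls = []
count = run_until(datetime(2020, 3, 2, 23, 59, 59), 86400,
                  lambda: calls.append(fake_now()),
                  now=fake_now, sleep=fake_sleep)
print(count)  # 2: one run on March 1 and one on March 2
```

In real use, the defaults (datetime.now and time.sleep) apply, and task would be a function that fetches the data and renders the maps.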

The code and comments are as follows:

import requests
import pandas as pd
import json
from pyecharts.charts import *  # Import all chart types. In newer pyecharts versions, charts must be imported from pyecharts.charts
from pyecharts import options as opts
from pyecharts.globals import ThemeType   # Import the themes for pyecharts
import time
from datetime import datetime



pd.set_option('display.max_columns', None)    # Do not limit the number of columns displayed; show all columns
#pd.set_option('display.max_rows', None)      # Show all rows
#pd.set_option('expand_frame_repr', False)    # Do not wrap wide tables across several lines

# Data crawling. The data is returned in JSON format
def catch_cityinfo():
    url = 'https://view.inews.qq.com/g2/getOnsInfo?name=disease_h5'
    response = requests.get(url).json()['data']    # Parse the response as JSON; the 'data' field holds the payload we need, still as a JSON string
    data = json.loads(response)
    return data

while True:  # Loop so that the data is refreshed periodically
    if datetime.now() < datetime(2020, 3, 2, 23, 59, 59):   # Deadline for running the program: March 2, 2020, 23:59:59
        data = catch_cityinfo()


        # data = catch_cityinfo()
        # Print the keys of data to make it easier to select and retrieve fields
        print('data.keys:\n', data.keys())  # \n inserts a line break

        # Data extraction: pick out fields according to the contents of data.keys()
        # Print each extracted field so it can be inspected before further processing
        lastUpdateTime = data['lastUpdateTime']  # Time of the last update of the totals; can be used in chart titles
        print('Last updated:\n', lastUpdateTime)
        chinaTotal = data['chinaTotal']  # Nationwide totals of confirmed cases, suspected cases, etc.; print it to see the exact fields (used for charts)
        print('chinaTotal:\n', chinaTotal)
        chinaAdd = data['chinaAdd']  # Nationwide increase compared with yesterday (used for charts)
        print('chinaAdd:\n', chinaAdd)
        chinaDayList = data['chinaDayList']  # Nationwide data for each date; can be drawn as a line chart to observe the trend
        print('Domestic daily data chinaDayList:\n', chinaDayList)
        chinaDayAddList = data['chinaDayAddList']  # Nationwide daily new cases for each date
        print('Domestic daily increase data chinaDayAddList:\n', chinaDayAddList)
        areaTree = data['areaTree']  # Totals for each country, each province in China, and each city; used for the maps
        print('Regional classification areaTree:\n', areaTree)  # Print to inspect the regional data; the epidemic maps mainly use this

        # Domestic data processing
        # areaTree is a list of dictionaries, and element 0 is China. That entry is itself a dictionary, so take out its "children"
        china_data = areaTree[0]['children']  # China's data; "children" holds the data of each province
        print('china_data:\n', china_data)
        china_list = []  # Create an empty list first; it is filled in the loop below

        for a in range(len(china_data)):  # len gives the number of provinces; a runs over the indices 0 to len(china_data) - 1
            province = china_data[a]['name']  # Name of the province at index a
            province_data = china_data[a]['children']  # "children" of a province holds the data of each of its cities
            for b in range(len(province_data)):  # Same pattern as a
                city = province_data[b]['name']  # Name of each city
                city_today = province_data[b]['today']  # Today's new figures for each city (a dictionary)
                # print('city_today:',city_today)    # Inspect the structure of today to make extraction easier
                city_total = province_data[b]['total']  # Cumulative figures for each city (a dictionary)
                # print('city_total:',city_total)    # Inspect the structure of total to make extraction easier

                china_dict = {}

                china_dict['province'] = province
                china_dict['city'] = city
                china_dict['total'] = city_total
                china_dict['today'] = city_today

                # Append each small dictionary to china_list; the final data has the form [{...}, {...}, ...]
                china_list.append(china_dict)

        china_data = pd.DataFrame(china_list)  # Convert the list of dictionaries into a DataFrame
        print('china_data after preliminary processing:\n', china_data.head())


        # Define processing functions
        # Each element of the total and today columns is already a dictionary,
        # so it can be indexed directly; eval(str(x)) is unnecessary and unsafe on fetched data
        def confirm(x):  # confirm is the number of confirmed cases
            return x['confirm']


        def suspect(x):  # suspect is the number of suspected cases
            return x['suspect']


        def dead(x):  # dead is the number of deaths
            return x['dead']


        def heal(x):  # heal is the number of people cured
            return x['heal']


        # map applies the given function to every element of a column and returns a new Series; the original data is not changed
        # china_data['total'] selects the 'total' column, whose elements are the input to map()
        china_data['confirm'] = china_data['total'].map(confirm)
        china_data['suspect'] = china_data['total'].map(suspect)
        china_data['dead'] = china_data['total'].map(dead)
        china_data['heal'] = china_data['total'].map(heal)
        china_data['addconfirm'] = china_data['today'].map(confirm)
        china_data['addsuspect'] = china_data['today'].map(suspect)
        china_data['adddead'] = china_data['today'].map(dead)
        china_data['addheal'] = china_data['today'].map(heal)

        # The table now holds province, city, total, today, confirm, etc. The values inside total and today have already been extracted into confirm, suspect, and the other columns
        # So the following statement drops the total and today columns from china_data
        # Both single and double quotation marks can delimit strings in Python; combining the two makes it easier to write strings that contain quotes
        china_data = china_data[["city", "province", "confirm", "suspect",
                                 "dead", "heal", "addconfirm", "addsuspect", "adddead", "addheal"]]

        # Try commenting out the previous statement to see the difference
        # print(china_data)
        print("china_data after complete processing:\n", china_data.head())  # Shown as a table, the structure of the data is easier to see

        # International data processing
        global_data = pd.DataFrame(areaTree)  # Inspect the structure of areaTree to see which fields to extract
        global_data['confirm'] = global_data['total'].map(confirm)
        global_data['suspect'] = global_data['total'].map(suspect)
        global_data['dead'] = global_data['total'].map(dead)
        global_data['heal'] = global_data['total'].map(heal)
        global_data['addconfirm'] = global_data['today'].map(confirm)
        global_data['addsuspect'] = global_data['today'].map(suspect)
        global_data['adddead'] = global_data['today'].map(dead)
        global_data['addheal'] = global_data['today'].map(heal)

        # A Chinese-English lookup table of country names is needed
        # The world map in pyecharts uses English country names, so the English name is needed to match the data to the map regions
        global_name = pd.read_excel("National Chinese English comparison.xlsx")
        # Use pd.merge to join the tables. left_on and right_on are the columns to join on in the two tables
        # how is the join type. inner keeps only the rows whose key appears in both tables; with outer, rows found in only one table are kept as well and their missing fields become NaN
        global_data = pd.merge(global_data, global_name, left_on="name", right_on="Chinese", how="inner")
        global_data = global_data[["name", "English", "confirm", "suspect", "dead", "heal",
                                   "addconfirm", "addsuspect", "adddead", "addheal"]]
        print("global_data:\n", global_data.head())



        # Data visualization

        # Drawing steps in pyecharts: first choose the base chart, such as Pie, Bar, or Map, and set the theme and size of the canvas
        # Second, add data with the add function. Its arguments include the series name, the corresponding data, the position of the chart on the canvas, etc.
        # When the base chart is a Map, add also needs the map type, i.e. maptype = 'world' or 'china' or 'Hubei'
        # Use set_global_opts to set the chart title, and visualmap_opts to adjust the colors and the legend ranges
        # Use set_series_opts to set the series style, such as whether labels are shown
        # Save as an html file with render

 
        # Global map of confirmed cases
        world_map = Map(init_opts=opts.InitOpts(theme=ThemeType.WESTEROS))
        world_map.add("", [list(z) for z in zip(list(global_data["English"]), list(global_data["confirm"]))],
                      maptype='world',
                      is_map_symbol_show=True)  # is_map_symbol_show=True draws a marker dot on each country
        world_map.set_global_opts(title_opts=opts.TitleOpts(title="nCoV Global epidemic map"),
                                  visualmap_opts=opts.VisualMapOpts(is_piecewise=True,
                                                                    pieces=[
                                                                        {"min": 101, "label": '>100'},
                                                                        {"min": 10, "max": 100, "label": '10-100'},
                                                                        {"min": 0, "max": 9, "label": '0-9'}
                                                                    ]))
        world_map.set_series_opts(label_opts=opts.LabelOpts(is_show=False))  # If True, each country's name is displayed on the map
        world_map.render('world_map.html')

        # Epidemic map of provinces in China
        # reset_index turns the group keys back into an ordinary column and restores the default 0, 1, 2, ... row index
        area_data = china_data.groupby("province")["confirm"].sum().reset_index()
        area_data.columns = ["province", "confirm"]  # Set the column names of area_data (a DataFrame)

        area_map = Map(init_opts=opts.InitOpts(theme=ThemeType.WESTEROS))
        area_map.add("", [list(z) for z in zip(list(area_data["province"]), list(area_data["confirm"]))],
                     maptype="china",
                     is_map_symbol_show=False)
        area_map.set_global_opts(title_opts=opts.TitleOpts(title="nCoV Epidemic map of China"),
                                 visualmap_opts=opts.VisualMapOpts(is_piecewise=True, pieces=
                                 [
                                     {"min": 1001, "label": '>1000', "color": "#893448"},  # No max is given, so this range has no upper bound
                                     {"min": 500, "max": 1000, "label": '500-1000', "color": "#ff585e"},
                                     {"min": 101, "max": 499, "label": '101-499', "color": "#fb8146"},
                                     {"min": 10, "max": 100, "label": '10-100', "color": "#ffb248"},
                                     {"min": 0, "max": 9, "label": '0-9', "color": "#fff2d1"}
                                 ]))
        area_map.set_series_opts(label_opts=opts.LabelOpts(is_show=False))  # If True, the name of each province is displayed on the map
        area_map.render("area_map.html")

        # Urban epidemic map
        chinaCity_data = china_data[["city", "province", "confirm"]]
        # print('chinaCity_data:\n',chinaCity_data)

        city_name = pd.read_excel('Chinese cities.xlsx')
        # print('city_name:\n',city_name)

        # For a DataFrame, this gives the number of rows (https://blog.csdn.net/u012189747/article/details/78203364)
        row_number = area_data.iloc[:, 0].size
        # print(area_data.iloc[:,0].size)

        # Note that area_data is a DataFrame, so values are located differently than in a plain list
        # In area_data['province'].loc[na] below, ['province'] selects the column and .loc[na] selects the row (https://blog.csdn.net/angaixing0071/article/details/101700345)
        # When writing this, first try a small range such as for na in range(0, 1) to verify the code, then change it to for na in range(0, row_number):
        for na in range(0, row_number):
            provinceName = area_data['province'].loc[na]
            # print('provinceName:\n',provinceName)

            city_data = china_data.loc[china_data['province'] == provinceName].reset_index(drop=True).reset_index()
            # print('city_data:\n',city_data)  #After printing, observe index
            cityName = city_name.loc[city_name['province'] == provinceName].reset_index(drop=True).reset_index()
            # print('cityName:\n',cityName)  #After printing, observe the index to facilitate the merging
            cityData = pd.merge(city_data, cityName, left_on='index', right_on='index', how='inner')
            # print('cityData:\n', cityData)  #Print it out, which is good for the next data selection
            cityData = cityData[["province_x", "city", "city_name", "confirm"]]
            # print('cityData:\n',cityData)   #After printing, observe whether the data is aligned

            city_map = Map(init_opts=opts.InitOpts(theme=ThemeType.WESTEROS))
            city_map.add("", [list(z) for z in zip(list(cityData['city_name']), list(cityData['confirm']))],
                         maptype=provinceName,
                         is_map_symbol_show=False)
            city_map.set_global_opts(title_opts=opts.TitleOpts(title=provinceName + "nCoV Epidemic map"),
                                     visualmap_opts=opts.VisualMapOpts(is_piecewise=True, pieces=
                                     [
                                         {"min": 5001, "label": '>5000'},  # No max is given, so this range has no upper bound
                                         {"min": 1501, "max": 5000, "label": '1500-5000'},
                                         {"min": 501, "max": 1500, "label": '500-1500'},
                                         {"min": 351, "max": 500, "label": '350-500'},
                                         {"min": 201, "max": 350, "label": '200-350'},
                                         {"min": 101, "max": 200, "label": '100-200'},
                                         {"min": 51, "max": 100, "label": '50-100'},
                                         {"min": 0, "max": 50, "label": '0-50'}
                                     ]))
            city_map.set_series_opts(label_opts=opts.LabelOpts(is_show=False))
            city_map.render('{}.html'.format(provinceName))

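            page = Page()
            page.add(city_map)  # Note: this Page is recreated for every province and never rendered, so it does not affect the output files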

        time.sleep(86400)  # In seconds; 86400 seconds is 24 hours, so the data is refreshed once a day
    else:
        print('End time, stop fetching')
        break
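
The nested loops above walk the province -> city structure and build one row per city. A compact sketch of the same idea, reading the counts directly from the nested dictionaries, is shown below. The sample data is made up, flatten_cities is a hypothetical helper name, and defaulting missing fields to 0 with .get is an assumption of this sketch rather than something the original code does.

```python
def flatten_cities(provinces):
    """Turn a nested province -> city structure into one flat row per city."""
    rows = []
    for province in provinces:
        for city in province.get("children", []):
            total = city.get("total", {})
            today = city.get("today", {})
            rows.append({
                "province": province["name"],
                "city": city["name"],
                "confirm": total.get("confirm", 0),   # assumed default of 0
                "dead": total.get("dead", 0),
                "heal": total.get("heal", 0),
                "addconfirm": today.get("confirm", 0),
            })
    return rows


# Made-up sample in the same nested shape; the names and numbers are not real.
sample = [
    {"name": "ProvinceA", "children": [
        {"name": "City1", "total": {"confirm": 5, "dead": 0, "heal": 1},
         "today": {"confirm": 2}},
        {"name": "City2", "total": {"confirm": 3, "dead": 1, "heal": 0},
         "today": {"confirm": 0}},
    ]},
    {"name": "ProvinceB", "children": []},
]

rows = flatten_cities(sample)

# Sum confirmed cases per province, like the groupby step used for the China map.
totals = {}
for row in rows:
    totals[row["province"]] = totals.get(row["province"], 0) + row["confirm"]

print(rows[0])
print(totals)
```

Passing rows to pd.DataFrame would give a table with the same kind of columns as china_data, ready for the groupby and map steps.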


Result diagram




Tags: JSON Programming Python

Posted on Tue, 11 Feb 2020 23:33:25 -0800 by vanessa123