Using Plotly, Python's powerful visualization tool, to dynamically show the trend of the global epidemic

Data sources

The epidemic data comes from the open-source project AKShare.

Preparation

Operating environment:

  1. Windows 10 system
  2. Anaconda(Python 3.7)
  3. Jupyter Notebook

Python libraries used this time: akshare, pandas, plotly

Data import

import akshare as ak
import pandas as pd
import plotly
from plotly.offline import iplot, init_notebook_mode
import plotly.express as px
from datetime import datetime

init_notebook_mode()

init_notebook_mode() lets us use Plotly offline without registering an account, although offline mode is not as full-featured as online mode. Plotly has two modes:
Two modes of Plotly

Offline mode: there is no limit on the number of charts, and all of them are stored locally.
Online mode: up to 25 charts can be uploaded, then edited and viewed online through a browser. For sharing, a chart can be public, private, or secret.

# Get data from akshare
# df_all_history = ak.epidemic_history()

# Get data from csv file
df_all_history = pd.read_csv('epidemic_all_20200307.csv', index_col=0)


df_all_history

Because fetching data from this project is sometimes unstable and the connection may fail, previously saved data is used here.
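
One way to combine the two options is to try the live request first and fall back to the saved file when it fails. The sketch below is only an illustration: `load_history` and its parameters are hypothetical names, the fetch function is passed in as an argument, and the demo uses a stand-in function that always fails together with a tiny made-up CSV.

```python
import os
import tempfile

import pandas as pd


def load_history(fetch, cache_path):
    """Try fetch(); on any failure fall back to the CSV at cache_path.

    `load_history` and its parameters are hypothetical names for this sketch.
    """
    try:
        df = fetch()
        # Refresh the local copy so a later failure has recent data to use.
        df.to_csv(cache_path)
        return df
    except Exception as exc:
        print(f"Live fetch failed ({exc!r}); reading {cache_path}")
        return pd.read_csv(cache_path, index_col=0)


def failing_fetch():
    # Stand-in for a network call that fails.
    raise ConnectionError("simulated connection failure")


# Demo with a tiny made-up cache file.
demo_dir = tempfile.mkdtemp()
demo_cache = os.path.join(demo_dir, "epidemic_demo.csv")
pd.DataFrame({"country": ["A", "B"], "confirmed": [1, 2]}).to_csv(demo_cache)

df_demo = load_history(failing_fetch, demo_cache)
print(df_demo)
```

With the names used in this article, the call would presumably look like `load_history(ak.epidemic_history, 'epidemic_all_20200307.csv')`.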

Extract data

The data obtained above needs some format adjustments. For the date, we keep two columns: one in datetime format (['date']) and one in string format (['dates']). Both are kept because later steps use each of these formats.

df_all = df_all_history

# Save date in string format as a column
df_all['dates'] = df_all_history['date']

# Convert date in string format to date format
df_all['date'] = pd.to_datetime(df_all['date'])
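
A small made-up table shows what each column is good for: the datetime column supports date comparisons and range selection, while the string column stays a plain label that can be shown as-is. The rows below are invented for illustration only.

```python
import pandas as pd

# Made-up rows, for illustration only.
df_demo = pd.DataFrame({
    "date": ["2020-02-09", "2020-02-10", "2020-02-11"],
    "country": ["A", "A", "A"],
    "confirmed": [1, 3, 7],
})

# Keep the original strings, then convert the working column to datetime.
df_demo["dates"] = df_demo["date"]
df_demo["date"] = pd.to_datetime(df_demo["date"])

# The datetime column supports date comparisons ...
recent = df_demo[df_demo["date"] >= pd.Timestamp("2020-02-10")]

# ... while the string column stays a plain label.
print(df_demo.dtypes)
print(recent["dates"].tolist())
```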

Obtaining overseas epidemic data

The data above is global. Excluding the rows that belong to China gives the overseas data.

# Overseas data, by country
# (fillna returns a new frame, which avoids modifying a filtered slice in place)
df_oversea = df_all.query("country != 'China'").fillna(value="")

df_oversea
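
Note that filling every gap with an empty string typically turns a numeric column that contains gaps into a text (object) column. As an alternative, not the approach used in this article, the sketch below fills numeric gaps with 0 and text gaps with "" so that the column types are preserved. The rows and the `province` column are made up for illustration.

```python
import pandas as pd

# Made-up rows, for illustration only.
df_demo = pd.DataFrame({
    "country": ["China", "A", "B"],
    "province": ["Hubei", None, None],
    "confirmed": [100.0, 5.0, float("nan")],
})

# Same filter as in the article; .copy() gives an independent frame.
df_out = df_demo.query("country != 'China'").copy()

# Fill numeric gaps with 0 and text gaps with "" so dtypes are preserved.
num_cols = df_out.select_dtypes(include="number").columns
txt_cols = df_out.columns.difference(num_cols)
df_out[num_cols] = df_out[num_cols].fillna(0)
df_out[txt_cols] = df_out[txt_cols].fillna("")

print(df_out)
print(df_out.dtypes)
```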

Data visualization

First, use Plotly Express to see the overall trend of the epidemic in different countries.

fig_oversea = px.line(df_oversea, x='dates', y='confirmed',
                      line_group='country',
                      color='country',
                      color_discrete_sequence=px.colors.qualitative.D3,
                      hover_name='country',
                     )

fig_oversea.show()

The effect is as follows



As the figure above shows, the development of the epidemic in most countries becomes evident from February 10 onward. We therefore focus on the period after that date.

# Keep only the data from February 10, 2020 onward
# (sort the index first so that slicing by date is well defined)
df_oversea_recent = df_oversea.set_index('date').sort_index()
df_oversea_recent = df_oversea_recent.loc['2020-02-10':]
df_oversea_recent

Since the records of some countries do not start on February 10, 2020, the data needs to be supplemented. We can manually create a new Excel table and fill in 0 for the missing dates.
The main addition here is Iran's data: the outbreak in Iran is developing so quickly that it must be included in the analysis. Data for other countries can be supplemented later if needed.
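
To see which countries would need supplementing, one could look at each country's first recorded date. The following is a minimal sketch on made-up rows; the country names "A" and "B" are placeholders, not values from the real data.

```python
import pandas as pd

# Made-up rows, for illustration only.
df_demo = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-02-10", "2020-02-11", "2020-02-19", "2020-02-20"]
    ),
    "country": ["A", "A", "B", "B"],
    "confirmed": [1, 2, 2, 5],
})

start = pd.Timestamp("2020-02-10")

# Earliest recorded date per country.
first_seen = df_demo.groupby("country")["date"].min()

# Countries whose records begin after the chosen start date.
late_starters = first_seen[first_seen > start]
print(late_starters)
```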

# Some countries' data does not start on February 10, 2020, so the missing dates are filled with a value of 0
# The supplementary data was prepared in an Excel table and is read here

df_oversea_buchong = pd.read_excel('epidemic_buchong.xlsx')
df_oversea_buchong['dates'] = df_oversea_buchong['date'].apply(lambda x: x.strftime('%Y-%m-%d'))
df_oversea_buchong.set_index('date', inplace=True)
df_oversea_buchong.fillna(value="", inplace=True)
print(df_oversea_buchong.info())
df_oversea_buchong
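
As an alternative to a hand-built Excel table, the zero rows could also be generated in pandas by reindexing onto every (date, country) combination. This is a sketch under assumptions, not the method used in this article: it assumes at most one row per country per date, and the rows below are made up.

```python
import pandas as pd

# Made-up rows, for illustration only.
df_demo = pd.DataFrame({
    "date": pd.to_datetime(
        ["2020-02-10", "2020-02-11", "2020-02-12", "2020-02-12"]
    ),
    "country": ["A", "A", "A", "B"],
    "confirmed": [1, 2, 4, 3],
    "dead": [0, 0, 1, 0],
})

# Every (date, country) pair that should exist in the animation.
all_dates = pd.date_range(df_demo["date"].min(), df_demo["date"].max(), freq="D")
full_index = pd.MultiIndex.from_product(
    [all_dates, df_demo["country"].unique()], names=["date", "country"]
)

# Reindex onto the full grid; pairs that were missing get 0.
df_filled = (
    df_demo.set_index(["date", "country"])
    .reindex(full_index, fill_value=0)
    .reset_index()
)

# Rebuild the string date column used for the animation frames.
df_filled["dates"] = df_filled["date"].dt.strftime("%Y-%m-%d")
print(df_filled)
```

Filling with 0 only makes sense for dates before a country's first record. Because the counts are cumulative, a gap in the middle of a series would be better carried forward from the previous day.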

With the supplementary data ready, we can combine the two datasets and analyze them together.

# Merge in the supplementary data
# (pd.concat replaces DataFrame.append, which was removed in pandas 2.0)
df_oversea_recent_new = pd.concat([df_oversea_recent, df_oversea_buchong])
df_oversea_recent_new.sort_index(inplace=True)
df_oversea_recent_new

With the merged data in hand, we first use a bubble chart to visualize the changes. Here we use the scatter chart from Plotly Express.

fig_oversea_recent = px.scatter(df_oversea_recent_new, x='dead', y='confirmed',
                                size='confirmed', text='country', color='country',
                                color_discrete_sequence=px.colors.qualitative.Light24,
                                animation_frame='dates',animation_group='country',
                                hover_name='country',
                                range_x=[-10,260],
                                range_y=[0,8000],
                                size_max=50,
                                template='plotly_white',
                               )

fig_oversea_recent.show()

The effect is as follows


Source code and data file

Download source code Download data

Tags: Python Excel Windows Anaconda

Posted on Tue, 10 Mar 2020 02:05:43 -0700 by pvtpyro