The epidemic data comes from the open-source project AKShare.
- Windows 10 system
- Anaconda (Python 3.7)
- Jupyter Notebook
Python libraries used this time: akshare, pandas, plotly
    import akshare as ak
    import pandas as pd
    import plotly
    from plotly.offline import iplot, init_notebook_mode
    import plotly.express as px
    from datetime import datetime

    init_notebook_mode()
init_notebook_mode() sets Plotly up for offline use inside the notebook, so no account registration is needed, although offline mode does not offer all the features of online mode. The two modes are described below.
Two modes of Plotly
Offline mode: there is no limit on the number of charts, and all of them are stored locally.
Online mode: up to 25 charts can be uploaded, then edited and viewed online through a browser. Three sharing settings are available: public, private and secret.
    # Get data from akshare
    # df_all_history = ak.epidemic_history()
    # Get data from a CSV file
    df_all_history = pd.read_csv('epidemic_all_20200307.csv', index_col=0)
    df_all_history
Because the data source used by this project is sometimes unstable and the connection may fail, a previously saved copy of the data is used here.
The data obtained above needs some format adjustments. For dates, we keep two columns: one in datetime format (['date']) and one in string format (['dates']). Both are needed later: the datetime column is used for slicing by date range, and the string column is used for the chart axis and the animation frames.
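The choice between a live fetch and a saved file can also be automated. The following is a minimal sketch, not the author's code: `load_history` is a hypothetical helper that calls a fetch function first and reads the saved CSV only if that call raises. The fetch function is passed in as a parameter, so no particular akshare call is assumed.

```python
import pandas as pd


def load_history(fetch, csv_path):
    """Try a live fetch first; fall back to a saved CSV if it fails.

    `fetch` is any zero-argument callable that returns a DataFrame
    (hypothetical; pass whichever akshare function you use).
    """
    try:
        return fetch()
    except Exception as exc:  # e.g. a connection failure
        print(f"Live fetch failed ({exc!r}); reading {csv_path} instead")
        return pd.read_csv(csv_path, index_col=0)
```

Catching the broad `Exception` keeps the sketch short; in practice it may be better to catch only the network errors you expect.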
    df_all = df_all_history
    # Save date in string format as a column
    df_all['dates'] = df_all_history['date']
    # Convert date in string format to date format
    df_all['date'] = pd.to_datetime(df_all['date'])
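To make the difference between the two date columns concrete, here is a small sketch on made-up toy data (the values are illustrative, not taken from the real table). It shows that the parsed column supports date comparisons while the string column stays available as plain labels.

```python
import pandas as pd

# Toy data standing in for the real table (values are made up).
df = pd.DataFrame({"date": ["2020-03-01", "2020-03-02"], "confirmed": [5, 9]})

df["dates"] = df["date"]                 # keep the original strings
df["date"] = pd.to_datetime(df["date"])  # parse into datetime64 values

# The datetime column supports date comparisons and range filtering...
recent = df[df["date"] >= pd.Timestamp("2020-03-02")]

# ...while the string column gives clean labels, e.g. for animation frames.
labels = df["dates"].tolist()

print(df.dtypes)
print(recent)
```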
The data above is global. Excluding the rows for China leaves the overseas data.
    # Overseas, by country
    # (fillna is chained rather than applied in place, which avoids
    # pandas' SettingWithCopyWarning on the filtered frame)
    df_oversea = df_all.query("country != 'China'").fillna(value="")
    df_oversea
First, use Plotly Express to look at the overall trend of the epidemic in different countries.
    fig_oversea = px.line(
        df_oversea,
        x='dates',
        y='confirmed',
        line_group='country',
        color='country',
        color_discrete_sequence=px.colors.qualitative.D3,
        hover_name='country',
    )
    fig_oversea.show()
The result is as follows:
As the figure shows, the trend of the epidemic in most countries becomes clearly visible from February 10 onward. Therefore, we will focus on the period after that date.
    # Keep only the data from February 10, 2020 onward
    # (sort the index first so the date slice is well defined)
    df_oversea_recent = df_oversea.set_index('date').sort_index()
    df_oversea_recent = df_oversea_recent.loc['2020-02-10':]
    df_oversea_recent
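As an illustration of how slicing by date works once the datetime column is the index, here is a short sketch on made-up toy data (the rows are illustrative only). The index is sorted before slicing, since label slices on an unsorted DatetimeIndex can fail in recent pandas versions.

```python
import pandas as pd

# Toy data, deliberately out of date order (values are made up).
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-02-12", "2020-02-08", "2020-02-10"]),
        "country": ["Italy", "Japan", "Japan"],
        "confirmed": [3, 25, 26],
    }
)

# Sort the DatetimeIndex first so label-based slicing is well defined.
by_date = df.set_index("date").sort_index()

# A label slice includes its start point; with no end given it runs
# to the last row.
recent = by_date.loc["2020-02-10":]
print(recent)
```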
Since the records for some countries do not start on February 10, 2020, the missing dates need to be filled in. We can manually create a new Excel table and enter 0 as the value for each supplementary date.
The main supplement here is Iran's data: the outbreak in Iran is growing so quickly that it must be included in the analysis. Other countries can be supplemented in the same way later if needed.
    # Some countries have no data going back to February 10, 2020,
    # so those dates are supplemented with a value of 0
    # The supplementary data is filled in an Excel table and read here
    df_oversea_buchong = pd.read_excel('epidemic_buchong.xlsx')
    df_oversea_buchong['dates'] = df_oversea_buchong['date'].apply(lambda x: x.strftime('%Y-%m-%d'))
    df_oversea_buchong.set_index('date', inplace=True)
    df_oversea_buchong.fillna(value="", inplace=True)
    # info() prints its summary itself and returns None, so no print() is needed
    df_oversea_buchong.info()
    df_oversea_buchong
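The zero-valued rows could also be generated in code instead of being typed into a spreadsheet. The sketch below is one possible approach on made-up toy data, not the author's method: `zero_rows` is a hypothetical helper, and it assumes the table has only the columns `date`, `country`, `confirmed`, `dead` and `dates`. The real table may have more columns that would need the same treatment.

```python
import pandas as pd

# Toy data: this country's records start later than the chosen start date.
df = pd.DataFrame(
    {
        "date": pd.to_datetime(["2020-02-19", "2020-02-20"]),
        "country": ["Iran", "Iran"],
        "confirmed": [2, 5],
        "dead": [2, 2],
    }
)


def zero_rows(df, country, start):
    """Build zero-valued rows for `country`, from `start` up to the day
    before its first recorded date (hypothetical helper)."""
    first = df.loc[df["country"] == country, "date"].min()
    days = pd.date_range(start, first - pd.Timedelta(days=1), freq="D")
    return pd.DataFrame(
        {
            "date": days,
            "country": country,
            "confirmed": 0,
            "dead": 0,
            "dates": days.strftime("%Y-%m-%d"),
        }
    )


supplement = zero_rows(df, "Iran", "2020-02-10")
print(supplement)
```

Generating the rows this way keeps the supplement reproducible when the start date changes, at the cost of assuming that every missing day really should be 0.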
Once the supplementary data is ready, we can combine the two parts and analyze them together.
    # Merge in the supplementary data
    # (pd.concat is used because DataFrame.append was removed in pandas 2.0)
    df_oversea_recent_new = pd.concat([df_oversea_recent, df_oversea_buchong])
    df_oversea_recent_new.sort_index(inplace=True)
    df_oversea_recent_new
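After merging, it may be useful to check which countries still lack rows for some dates, since only Iran was supplemented by hand. The following is a minimal sketch on made-up toy data. It assumes the `dates` and `country` columns used in this article and simply compares how many distinct dates each country covers against the total.

```python
import pandas as pd

# Toy data: one country covers both dates, the other only one.
df = pd.DataFrame(
    {
        "dates": ["2020-02-10", "2020-02-11", "2020-02-11"],
        "country": ["Japan", "Japan", "Iran"],
        "confirmed": [26, 26, 0],
    }
)

# Count how many distinct dates each country covers.
coverage = df.groupby("country")["dates"].nunique()
expected = df["dates"].nunique()

# Countries with fewer dates than the full range may still need supplementing.
incomplete = coverage[coverage < expected].index.tolist()
print(incomplete)
```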
With the merged data in hand, we first use a bubble chart to visualize how the numbers change over time. The scatter chart of Plotly Express is used here.
    fig_oversea_recent = px.scatter(
        df_oversea_recent_new,
        x='dead',
        y='confirmed',
        size='confirmed',
        text='country',
        color='country',
        color_discrete_sequence=px.colors.qualitative.Light24,
        animation_frame='dates',
        animation_group='country',
        hover_name='country',
        range_x=[-10, 260],
        range_y=[0, 8000],
        size_max=50,
        template='plotly_white',
    )
    fig_oversea_recent.show()
The result is as follows: