Four Faster and Simpler Ways to Visualize Data in Python

Have you used all of these visualization methods: heat maps, 2D density plots, spider plots, and tree diagrams?

Heat map

A heat map is a matrix representation of data in which the value of each matrix element is shown as a color.

Different colors represent different values. The row and column indices of the matrix link the two items or features being compared. Heat maps are well suited to showing relationships between multiple feature variables, because the color tells you the magnitude of the matrix element at each position directly.

By looking at the other cells in the heat map, you can also see how each relationship compares with the others in the dataset. Color is so intuitive that it gives us a very simple way to interpret data.

Now let's look at the implementation code.

Compared with "matplotlib", "seaborn" is suited to more advanced graphics, typically ones that need more components such as multiple colors, shapes, or variables.

Here "matplotlib" displays the figure, "NumPy" generates the data, and "pandas" handles it. The plot itself is a single seaborn function call.

import seaborn as sns
import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Create a random dataset
data = pd.DataFrame(np.random.random((10,6)), columns=["Iron Man","Captain America","Black Widow","Thor","Hulk", "Hawkeye"])
print(data)
# Plot the heatmap
heatmap_plot = sns.heatmap(data, center=0, cmap='gist_ncar')
plt.show()
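
The point about relating feature variables can be made concrete with a correlation matrix. The following is a minimal sketch rather than part of the original example: the column names (height, weight, age, noise), the synthetic data, and the output file name are all assumptions made for illustration, and the figure is saved to disk instead of being shown in a window.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display); remove for interactive use
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd
import seaborn as sns

# Synthetic data (assumption): weight depends on height, the other columns are independent
rng = np.random.default_rng(0)
height = rng.normal(170, 10, 200)
features = pd.DataFrame({
    "height": height,
    "weight": 0.9 * height + rng.normal(0, 5, 200),
    "age": rng.normal(40, 12, 200),
    "noise": rng.normal(0, 1, 200),
})

# Pairwise Pearson correlations: rows and columns are both the feature names
corr = features.corr()

# A diverging colormap centered on 0 separates positive from negative correlation
ax = sns.heatmap(corr, vmin=-1, vmax=1, center=0, cmap="coolwarm", annot=True)
ax.set_title("Feature correlations")
plt.savefig("correlation_heatmap.png")  # hypothetical file name
```

In the resulting figure the height/weight cell should stand out strongly, while cells involving the independent columns stay close to the neutral color.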

2D density plot

A 2D density plot is a natural extension of the one-dimensional density plot. Its advantage over the 1D version is that it shows the joint probability distribution of two variables.

For example, in the 2D density plot below, the color bar on the right maps each color to a probability density. The data is most likely to occur (that is, the points are most concentrated) around size=0.5, speed=1.4.

As you can see, a 2D density plot is very useful for quickly finding where the data is most concentrated across two variables, rather than across just one as in a 1D density plot.

When two variables both matter a great deal to the output and you want to know how they jointly shape its distribution, a 2D density plot is an effective way to look at the data.

Once again, the code is very convenient to write with "seaborn"! This time we create a skewed distribution to make the visualization more interesting. Most of the optional parameters can be adjusted to make the plot look clearer.

import seaborn as sns
import matplotlib.pyplot as plt
from scipy.stats import skewnorm
# Create the data
speed = skewnorm.rvs(4, size=50)
size = skewnorm.rvs(4, size=50)
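# Create and show the 2D density plot
# (recent seaborn versions require x=/y= keywords; fill and bw_method replace the older shade and bw)
ax = sns.kdeplot(x=speed, y=size, cmap="Reds", fill=False, bw_method=.15, cbar=True)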
ax.set(xlabel='speed', ylabel='size')
plt.show()
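
To read the densest region off numerically instead of from the color bar, one option is to evaluate a kernel density estimate on a grid. This is only a sketch under assumptions: it uses scipy.stats.gaussian_kde directly rather than seaborn, fixed random seeds, and an arbitrarily chosen 100 x 100 grid. The peak it reports depends on the random sample and need not match any particular figure.

```python
import numpy as np
from scipy.stats import gaussian_kde, skewnorm

# Same kind of skewed sample as above, with fixed seeds for repeatability
speed = skewnorm.rvs(4, size=50, random_state=0)
size = skewnorm.rvs(4, size=50, random_state=1)

# gaussian_kde expects an array of shape (n_dimensions, n_points)
kde = gaussian_kde(np.vstack([speed, size]))

# Evaluate the estimated density on a regular grid spanning the data
xs = np.linspace(speed.min(), speed.max(), 100)
ys = np.linspace(size.min(), size.max(), 100)
grid_x, grid_y = np.meshgrid(xs, ys)
density = kde(np.vstack([grid_x.ravel(), grid_y.ravel()])).reshape(grid_x.shape)

# Grid cell with the highest estimated density
row, col = np.unravel_index(np.argmax(density), density.shape)
print(f"densest near speed={xs[col]:.2f}, size={ys[row]:.2f}")
```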

Spider plot

A spider plot is one of the best ways to show one-to-many relationships. In other words, you can plot and view the values of several variables that all relate to a single item or category.

In a spider plot, the prominence of one variable relative to another is easy to see, because both the covered area and the distance from the center grow in that variable's direction. To see how several categories of objects described by these variables differ, you can draw them side by side.

In the chart below, we can easily compare the attributes of the Avengers and see where each one is strong! (Note that the data was set arbitrarily.)

Here we can use "matplotlib" directly instead of "seaborn" to create the plot. Each attribute needs to be placed at an equal angular spacing around the circle.

We label each angle, then plot each value as a point whose distance from the center depends on its magnitude. Finally, to make the display clearer, we fill the area enclosed by the lines connecting the attribute points with a translucent color.

import pandas as pd
import numpy as np
import matplotlib.pyplot as plt
# Get the data
df=pd.read_csv("avengers_data.csv")
print(df)
"""
   #             Name  Attack  Defense  Speed  Range  Health
0  1         Iron Man      83       80     75     70      70
1  2  Captain America      60       62     63     80      80
2  3             Thor      80       82     83    100     100
3  3             Hulk      80      100     67     44      92
4  4      Black Widow      52       43     60     50      65
5  5          Hawkeye      58       64     58     80      65
"""
# Get the data for Iron Man
labels=np.array(["Attack","Defense","Speed","Range","Health"])
stats=df.loc[0,labels].values
# Make some calculations for the plot
angles=np.linspace(0, 2*np.pi, len(labels), endpoint=False)
stats=np.concatenate((stats,[stats[0]]))
angles=np.concatenate((angles,[angles[0]]))
# Plot stuff
fig = plt.figure()
ax = fig.add_subplot(111, polar=True)
ax.plot(angles, stats, 'o-', linewidth=2)
ax.fill(angles, stats, alpha=0.25)
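# Label only the five attribute angles (the last angle repeats the first to close the shape)
ax.set_thetagrids(angles[:-1] * 180/np.pi, labels)
# Pass the name as a string, not a one-element list
ax.set_title(df.loc[0,"Name"])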
ax.grid(True)
plt.show()
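
Drawing several characters side by side, as suggested earlier, could look roughly like the sketch below. It is not the original author's code: the choice of two heroes, the shared radial limit of 100, the figure size, and the output file name are assumptions, and the values are typed in from the table printed above instead of being read from the CSV file.

```python
import matplotlib
matplotlib.use("Agg")  # headless backend (assumption: no display); remove for interactive use
import matplotlib.pyplot as plt
import numpy as np

labels = ["Attack", "Defense", "Speed", "Range", "Health"]
# Values copied from the table printed above
heroes = {
    "Iron Man": [83, 80, 75, 70, 70],
    "Hulk": [80, 100, 67, 44, 92],
}

# One angle per attribute, then repeat the first angle to close the polygon
angles = np.linspace(0, 2 * np.pi, len(labels), endpoint=False)
closed_angles = np.concatenate((angles, [angles[0]]))

fig, axes = plt.subplots(1, len(heroes), subplot_kw={"polar": True}, figsize=(10, 5))
for ax, (name, stats) in zip(axes, heroes.items()):
    closed_stats = stats + [stats[0]]  # repeat the first value as well
    ax.plot(closed_angles, closed_stats, "o-", linewidth=2)
    ax.fill(closed_angles, closed_stats, alpha=0.25)
    ax.set_thetagrids(np.degrees(angles), labels)
    ax.set_ylim(0, 100)  # shared radial scale so the shapes are comparable
    ax.set_title(name)
fig.savefig("avengers_side_by_side.png")  # hypothetical file name
```

Keeping the same radial limit on every subplot matters here: without it, each axis would rescale to its own data and the filled areas could not be compared directly.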

Dendrogram

We have been using tree diagrams since primary school! Trees are natural and intuitive, which makes them easy to interpret.

Directly connected nodes are closely related, while nodes separated by many connections are less similar. For the visualization below, I drew a tree diagram of a small part of the Pokemon dataset from Kaggle, based on each Pokemon's stats (health, attack, defense, special attack, special defense, speed).

As a result, the Pokemon whose stats match most closely are linked most closely.

For example, at the top of the figure, Arbok and Fearow are directly connected. If we look at the data, Arbok has a total score of 438 and Fearow has 442. The two are very close!

But Raticate has a total score of 413, which is quite different from Arbok and Fearow, so it is separated from them in the tree!

As we move up the tree, the Pokemon are grouped by broader and broader similarity. The Pokemon in the green group are more similar to each other than to any Pokemon in the red group, even where two green Pokemon are not directly connected.

For the tree diagram, we actually use "SciPy" to do the drawing.

After reading in the dataset, we drop the string columns. This is only to keep the visualization simple and quick to set up; in practice, converting those strings into categorical variables would give better results and comparisons.
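
One common way to convert a string column into numeric categorical variables is one-hot encoding. The sketch below is an assumption about how that step might look, not something the original code does: the four rows, their names (A to D), and their values are made up for illustration and are not taken from Pokemon.csv.

```python
import pandas as pd
from scipy.cluster import hierarchy

# Toy rows (made up for illustration, not taken from Pokemon.csv)
toy = pd.DataFrame(
    {
        "Attack": [50, 45, 65, 50],
        "Defense": [40, 65, 60, 50],
        "Type 1": ["Fire", "Water", "Fire", "Grass"],
    },
    index=["A", "B", "C", "D"],
)

# One-hot encode the string column: one 0/1 column per distinct type
encoded = pd.get_dummies(toy, columns=["Type 1"]).astype(float)
print(encoded.columns.tolist())

# The encoded frame is fully numeric, so it can be passed to linkage
Z = hierarchy.linkage(encoded.to_numpy(), "ward")
```

Note that the 0/1 columns are much smaller in scale than the stats, so without some rescaling they would have little influence on the distances.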

We also set the index of the data frame to the name column so that each node can be labeled by it. Finally, computing the linkage and drawing the tree diagram each take a single line of code in SciPy.

import pandas as pd
from matplotlib import pyplot as plt
from scipy.cluster import hierarchy
import numpy as np
# Read in the dataset
# Drop any fields that are strings
# Only get the first 40 because this dataset is big
df = pd.read_csv('Pokemon.csv')
df = df.set_index('Name')
df.index.name = None  # clear the index label ("del df.index.name" fails in recent pandas)
df = df.drop(["Type 1", "Type 2", "Legendary"], axis=1)
df = df.head(n=40)
# Calculate the distance between each sample
Z = hierarchy.linkage(df, 'ward')
# Orient the tree with the root on the left and label the leaves by name
hierarchy.dendrogram(Z, orientation="left", labels=df.index.tolist())
plt.show()
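
The colored groups in a dendrogram come from cutting the tree at some height. A related idea, sketched below under assumptions, is to ask SciPy for flat cluster labels with hierarchy.fcluster. The data here is a synthetic stand-in (two well-separated blobs of five points each), and the choice of two clusters is arbitrary; it does not reproduce the exact coloring rule that dendrogram uses.

```python
import numpy as np
from scipy.cluster import hierarchy

# Synthetic stand-in for the stats table: two well-separated blobs of 5 points each
rng = np.random.default_rng(0)
low = rng.normal(loc=0.0, scale=1.0, size=(5, 3))
high = rng.normal(loc=100.0, scale=1.0, size=(5, 3))
samples = np.vstack([low, high])

# Same Ward linkage as in the dendrogram example
Z = hierarchy.linkage(samples, "ward")

# Cut the tree so that at most two flat clusters remain
groups = hierarchy.fcluster(Z, t=2, criterion="maxclust")
print(groups)
```

Each entry of groups is the cluster label of the corresponding row, so the labels can be attached back to a data frame as an extra column.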

Posted on Mon, 13 Apr 2020 01:58:42 -0700 by Quinton1337