Pandas basics: combining DataFrames with merge

Using weather data as an example, first prepare the following two DataFrames:

import pandas as pd

df1 = pd.DataFrame({
    'city': ['newyork', 'chicago', 'orlando'],
    'temperature': [21, 24, 32],
})

df2 = pd.DataFrame({
    'city': ['newyork', 'chicago', 'orlando'],
    'humidity': [89, 79, 80],
})

df = pd.merge(df1, df2, on='city')

Output:

      city  temperature  humidity
0  newyork           21        89
1  chicago           24        79
2  orlando           32        80

The example above merges the two DataFrames on the 'city' column, but both contain exactly the same cities, so every row finds a match. Let's adjust the data so that the cities only partly overlap:

df1 = pd.DataFrame({
    'city': ['newyork', 'chicago', 'orlando', 'baltimore'],
    'temperature': [21, 24, 32, 29],
})

df2 = pd.DataFrame({
    'city': ['newyork', 'chicago', 'san francisco'],
    'humidity': [89, 79, 80],
})

df = pd.merge(df1, df2, on='city')

Output:

      city  temperature  humidity
0  newyork           21        89
1  chicago           24        79

From the output we can see that merge takes the intersection by default (an inner join): only the cities present in both DataFrames, newyork and chicago, are kept.
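To check the intersection behaviour, here is a minimal sketch using the same data. The variable names default_merge, inner_merge and common are only for illustration:

```python
import pandas as pd

df1 = pd.DataFrame({
    'city': ['newyork', 'chicago', 'orlando', 'baltimore'],
    'temperature': [21, 24, 32, 29],
})
df2 = pd.DataFrame({
    'city': ['newyork', 'chicago', 'san francisco'],
    'humidity': [89, 79, 80],
})

# how='inner' is the default, so both calls should give the same result
default_merge = pd.merge(df1, df2, on='city')
inner_merge = pd.merge(df1, df2, on='city', how='inner')
print(default_merge.equals(inner_merge))

# The merged cities are exactly the ones present in both inputs
common = set(df1['city']) & set(df2['city'])
print(set(default_merge['city']) == common)
```

Both print statements should show True.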

As you might guess, other ways of combining the rows can be selected with the how parameter.
Union (outer join):

df = pd.merge(df1, df2, on='city', how='outer')

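Output: one row for every city that appears in either DataFrame (five in total). Where one side has no matching city, its column is filled with NaN.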

Left join (keep every row of df1):

df = pd.merge(df1, df2, on='city', how='left')

Output:

        city  temperature  humidity
0    newyork           21      89.0
1    chicago           24      79.0
2    orlando           32       NaN
3  baltimore           29       NaN

Right join (keep every row of df2):

df = pd.merge(df1, df2, on='city', how='right')

Output:

            city  temperature  humidity
0        newyork         21.0        89
1        chicago         24.0        79
2  san francisco          NaN        80
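To compare the four join types side by side, here is a small sketch that collects the cities each one keeps. The dictionary cities_by_how is just an illustrative helper, and the city lists are sorted so the comparison does not depend on row order:

```python
import pandas as pd

df1 = pd.DataFrame({
    'city': ['newyork', 'chicago', 'orlando', 'baltimore'],
    'temperature': [21, 24, 32, 29],
})
df2 = pd.DataFrame({
    'city': ['newyork', 'chicago', 'san francisco'],
    'humidity': [89, 79, 80],
})

# Collect the cities that survive each kind of join
cities_by_how = {}
for how in ['inner', 'outer', 'left', 'right']:
    merged = pd.merge(df1, df2, on='city', how=how)
    cities_by_how[how] = sorted(merged['city'])
    print(how, cities_by_how[how])
```

The inner join keeps 2 cities, the outer join 5, the left join the 4 cities of df1, and the right join the 3 cities of df2.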


In addition, when merging we sometimes want to know which side each row came from. The indicator parameter adds a _merge column with that information:

df = pd.merge(df1, df2, on='city', how='outer', indicator=True)

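Output: the same five rows as the outer merge above, plus a _merge column whose value is left_only, right_only, or both.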
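One way to use the _merge column is to filter on it. The sketch below splits the merged rows by origin; the names left_only, right_only and both are only illustrative variables:

```python
import pandas as pd

df1 = pd.DataFrame({
    'city': ['newyork', 'chicago', 'orlando', 'baltimore'],
    'temperature': [21, 24, 32, 29],
})
df2 = pd.DataFrame({
    'city': ['newyork', 'chicago', 'san francisco'],
    'humidity': [89, 79, 80],
})

df = pd.merge(df1, df2, on='city', how='outer', indicator=True)

# Split the rows by where they came from
left_only = df[df['_merge'] == 'left_only']
right_only = df[df['_merge'] == 'right_only']
both = df[df['_merge'] == 'both']

print(sorted(left_only['city']))   # cities found only in df1
print(sorted(right_only['city']))  # cities found only in df2
print(sorted(both['city']))        # cities found in both
```

This is handy for spotting rows that failed to match, for example because of a typo in a city name.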

In the examples above, the two DataFrames share no column names apart from the key, so the merge is straightforward. What happens when both have columns with the same name? See the following example:

df1 = pd.DataFrame({
    'city': ['newyork', 'chicago', 'orlando', 'baltimore'],
    'temperature': [21, 24, 32, 29],
    'humidity': [89, 79, 80, 69],
})

df2 = pd.DataFrame({
    'city': ['newyork', 'chicago', 'san francisco'],
    'temperature': [30, 32, 28],
    'humidity': [80, 60, 70],
})

df = pd.merge(df1, df2, on='city')

Output:

      city  temperature_x  humidity_x  temperature_y  humidity_y
0  newyork             21          89             30          80
1  chicago             24          79             32          60

The columns with the same name automatically get the suffixes '_x' (left) and '_y' (right) to tell them apart. To make the result easier to read, we can also choose our own suffixes:

df3 = pd.merge(df1, df2, on='city', suffixes=['_left', '_right'])

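Output:

      city  temperature_left  humidity_left  temperature_right  humidity_right
0  newyork                21             89                 30              80
1  chicago                24             79                 32              60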
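With clear suffixes, the two versions of a column are easy to work with afterwards. As a minimal sketch, the following compares the two temperature readings per city. The column temperature_diff and the variable merged_columns are made up for this illustration:

```python
import pandas as pd

df1 = pd.DataFrame({
    'city': ['newyork', 'chicago', 'orlando', 'baltimore'],
    'temperature': [21, 24, 32, 29],
    'humidity': [89, 79, 80, 69],
})
df2 = pd.DataFrame({
    'city': ['newyork', 'chicago', 'san francisco'],
    'temperature': [30, 32, 28],
    'humidity': [80, 60, 70],
})

merged = pd.merge(df1, df2, on='city', suffixes=('_left', '_right'))

# Only the overlapping non-key columns get a suffix; 'city' stays as it is
merged_columns = list(merged.columns)
print(merged_columns)

# Difference between the right and left temperature for each matched city
merged['temperature_diff'] = merged['temperature_right'] - merged['temperature_left']
print(merged[['city', 'temperature_diff']])
```

The suffixes are passed as a tuple here; the list form used above works the same way.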

Well, that's all about merge, enjoy~~~

Tags: Python

Posted on Sun, 01 Dec 2019 04:46:39 -0800 by V_dirt_God