
# Python Data Analysis Primer (11): Merging Data

2021-04-18 18:39:09 · python


# Merging Data (pd.merge)

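• Joins the rows of different DataFrames on one or more keys

• Works like a join in a relational database

• `pd.merge(left, right, how='inner', on=None, left_on=None, right_on=None)`
  left: the DataFrame on the left side of the merge
  right: the DataFrame on the right side of the merge
  how: the join type; the default is 'inner', and the other options are 'outer', 'left', and 'right'
  on: the column name(s) to join on; they must exist in both DataFrames. If `on` is not given, the intersection of the column names in left and right is used as the join key
  left_on: the column(s) of the left DataFrame to use as the join key
  right_on: the column(s) of the right DataFrame to use as the join key

• Inner join (inner): joins on the intersection of the keys, that is, the keys present in both tables

• Full outer join (outer): joins on the union of the keys from both tables

• Left join (left): keeps every key of the left table

• Right join (right): keeps every key of the right table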
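The four join types listed above can be compared side by side. The following is a minimal sketch on two hypothetical toy tables (the frames and the `keys_by_how` name are made up for illustration): K0 and K1 exist on both sides, K2 only on the left, K3 only on the right.

```python
import pandas as pd

# Hypothetical toy tables: K0 and K1 are shared, K2 is left-only, K3 is right-only.
left = pd.DataFrame({'key': ['K0', 'K1', 'K2'], 'A': ['A0', 'A1', 'A2']})
right = pd.DataFrame({'key': ['K0', 'K1', 'K3'], 'C': ['C0', 'C1', 'C3']})

# For each join type, collect the keys that survive the merge.
keys_by_how = {
    how: sorted(pd.merge(left, right, how=how, on='key')['key'])
    for how in ('inner', 'outer', 'left', 'right')
}

for how, keys in keys_by_how.items():
    print(how, keys)
```

Only the inner join drops both K2 and K3; the outer join keeps all four keys and fills the missing side with NaN.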

Example code:

```python
import pandas as pd
import numpy as np

left = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key': ['K0', 'K1', 'K2', 'K3'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

pd.merge(left, right, on='key')  # use 'key' as the join key
```

```
  key   A   B   C   D
0  K0  A0  B0  C0  D0
1  K1  A1  B1  C1  D1
2  K2  A2  B2  C2  D2
3  K3  A3  B3  C3  D3
```
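The example above relies on both tables sharing the column name `key`. When the key columns are named differently, `left_on` and `right_on` name them separately. A small sketch, assuming two hypothetical tables (`employees`, `salaries`, and their column names are invented for illustration):

```python
import pandas as pd

# Hypothetical tables whose key columns have different names.
employees = pd.DataFrame({'emp_id': [1, 2, 3], 'name': ['Ann', 'Bob', 'Cid']})
salaries = pd.DataFrame({'staff_id': [1, 2, 4], 'salary': [100, 200, 400]})

# Inner join (the default): only ids present in both tables survive.
merged = pd.merge(employees, salaries, left_on='emp_id', right_on='staff_id')

# Both key columns are kept in the result; drop the redundant one.
merged = merged.drop(columns='staff_id')
print(merged)
```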

```python
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})

right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

pd.merge(left, right, on=['key1', 'key2'])  # merge on several keys at once
```

```
  key1 key2   A   B   C   D
0   K0   K0  A0  B0  C0  D0
1   K1   K0  A2  B2  C1  D1
2   K1   K0  A2  B2  C2  D2
```
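The multi-key result has three rows because the key pair (K1, K0) occurs once in `left` and twice in `right`, so it matches twice. The sketch below checks this by counting key occurrences on each side; the helper names (`left_counts`, `right_counts`, `rows_per_key`) are illustrative only.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
keys = ['key1', 'key2']

# How often does each key pair occur on each side?
left_counts = left.groupby(keys).size()
right_counts = right.groupby(keys).size()

# A key pair present on both sides contributes left_count * right_count rows;
# pairs missing on either side become NaN in the product and are dropped.
rows_per_key = (left_counts * right_counts).dropna().astype(int)
print(rows_per_key)

merged = pd.merge(left, right, on=keys)
print(len(merged))
```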

```python
# Left join

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

pd.merge(left, right, how='left', on=['key1', 'key2'])
# Output:
#   key1 key2   A   B    C    D
# 0   K0   K0  A0  B0   C0   D0
# 1   K0   K1  A1  B1  NaN  NaN
# 2   K1   K0  A2  B2   C1   D1
# 3   K1   K0  A2  B2   C2   D2
# 4   K2   K1  A3  B3  NaN  NaN
```

```python
# Right join

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
pd.merge(left, right, how='right', on=['key1', 'key2'])
# Output:
#   key1 key2    A    B   C   D
# 0   K0   K0   A0   B0  C0  D0
# 1   K1   K0   A2   B2  C1  D1
# 2   K1   K0   A2   B2  C2  D2
# 3   K2   K0  NaN  NaN  C3  D3
```
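Left and right joins mirror each other: a right join of `left` and `right` should hold the same rows as a left join with the arguments swapped, differing only in column order. The following sketch checks that on the same data; the names `right_join` and `swapped` are illustrative.

```python
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
keys = ['key1', 'key2']

right_join = pd.merge(left, right, how='right', on=keys)

# Swap the arguments and use a left join, then restore the column order.
swapped = pd.merge(right, left, how='left', on=keys)
swapped = swapped[right_join.columns]

# Raises an AssertionError if the two results differ.
pd.testing.assert_frame_equal(right_join, swapped)
print(right_join)
```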

`how` specifies the join type.

### Outer join (outer): the result keys are the union

```python
left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})
pd.merge(left, right, how='outer', on=['key1', 'key2'])
```

```
  key1 key2    A    B    C    D
0   K0   K0   A0   B0   C0   D0
1   K0   K1   A1   B1  NaN  NaN
2   K1   K0   A2   B2   C1   D1
3   K1   K0   A2   B2   C2   D2
4   K2   K1   A3   B3  NaN  NaN
5   K2   K0  NaN  NaN   C3   D3
```
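In an outer join it is often useful to know which side each row came from. `pd.merge` accepts `indicator=True`, which adds a `_merge` column with the values `left_only`, `right_only`, or `both`. A minimal sketch on the same data (the `origin_counts` name is illustrative):

```python
import pandas as pd

left = pd.DataFrame({'key1': ['K0', 'K0', 'K1', 'K2'],
                     'key2': ['K0', 'K1', 'K0', 'K1'],
                     'A': ['A0', 'A1', 'A2', 'A3'],
                     'B': ['B0', 'B1', 'B2', 'B3']})
right = pd.DataFrame({'key1': ['K0', 'K1', 'K1', 'K2'],
                      'key2': ['K0', 'K0', 'K0', 'K0'],
                      'C': ['C0', 'C1', 'C2', 'C3'],
                      'D': ['D0', 'D1', 'D2', 'D3']})

# indicator=True appends a '_merge' column describing each row's origin.
merged = pd.merge(left, right, how='outer', on=['key1', 'key2'], indicator=True)
print(merged[['key1', 'key2', '_merge']])

# Count rows per origin; cast to str so the result is a plain dict.
origin_counts = merged['_merge'].astype(str).value_counts().to_dict()
print(origin_counts)
```

Rows marked `left_only` or `right_only` are exactly the ones whose other side is filled with NaN.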

### Handling duplicate column names

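```python
# Handling duplicate column names: both frames have a 'data' column,
# so suffixes are appended to tell them apart.
# The data values are random, so the output differs between runs.
df_obj1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                        'data': np.random.randint(0, 10, 7)})
df_obj2 = pd.DataFrame({'key': ['a', 'b', 'd'],
                        'data': np.random.randint(0, 10, 3)})

print(pd.merge(df_obj1, df_obj2, on='key', suffixes=('_left', '_right')))
```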

```
   data_left key  data_right
0          9   b           1
1          5   b           1
2          1   b           1
3          2   a           8
4          2   a           8
5          5   a           8
```
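If `suffixes` is omitted, pandas falls back to the defaults `('_x', '_y')`. The sketch below contrasts the default with custom suffixes; it uses smaller hypothetical frames and a fixed random seed so that the run is repeatable.

```python
import numpy as np
import pandas as pd

rng = np.random.default_rng(0)  # fixed seed, chosen arbitrarily for repeatability

# Smaller illustrative frames; both carry a non-key column named 'data'.
df_obj1 = pd.DataFrame({'key': ['b', 'a', 'c'], 'data': rng.integers(0, 10, 3)})
df_obj2 = pd.DataFrame({'key': ['a', 'b', 'd'], 'data': rng.integers(0, 10, 3)})

# Default suffixes: the clashing columns become data_x and data_y.
default = pd.merge(df_obj1, df_obj2, on='key')
print(default.columns.tolist())

# Custom suffixes make the origin of each column explicit.
custom = pd.merge(df_obj1, df_obj2, on='key', suffixes=('_left', '_right'))
print(custom.columns.tolist())
```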

### Joining on the index

```python
# Joining on the index: match the 'key' column of df_obj1
# against the row index of df_obj2.
df_obj1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                        'data1': np.random.randint(0, 10, 7)})
df_obj2 = pd.DataFrame({'data2': np.random.randint(0, 10, 3)}, index=['a', 'b', 'd'])

print(pd.merge(df_obj1, df_obj2, left_on='key', right_index=True))
```

```
   data1 key  data2
0      3   b      6
1      4   b      6
6      8   b      6
2      6   a      0
4      3   a      0
5      0   a      0
```
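`DataFrame.join` offers a shorthand for the same column-to-index match. The following sketch uses fixed illustrative values instead of random ones and checks that both spellings give the same result. Note that `join` defaults to a left join, so `how='inner'` is passed to match the default of `pd.merge`.

```python
import pandas as pd

# Fixed illustrative values instead of random ones.
df_obj1 = pd.DataFrame({'key': ['b', 'b', 'a', 'c', 'a', 'a', 'b'],
                        'data1': range(7)})
df_obj2 = pd.DataFrame({'data2': [10, 20, 30]}, index=['a', 'b', 'd'])

# Match the 'key' column on the left against the index on the right.
via_merge = pd.merge(df_obj1, df_obj2, left_on='key', right_index=True)

# Same operation spelled with DataFrame.join.
via_join = df_obj1.join(df_obj2, on='key', how='inner')

print(via_merge.equals(via_join))
```

Rows with key 'c' (no match in the index) and the index label 'd' (no match in the column) are absent from both results.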

# Concatenating Data (pd.concat)

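### 1. Concatenation in NumPy

NumPy joins arrays with `np.concatenate`. The default `axis=0` stacks the arrays row-wise; `axis=1` places them side by side.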

```python
import numpy as np
import pandas as pd

arr1 = np.random.randint(0, 10, (3, 4))
arr2 = np.random.randint(0, 10, (3, 4))

print(arr1)
print(arr2)

print(np.concatenate([arr1, arr2]))          # axis=0 by default: stack rows
print(np.concatenate([arr1, arr2], axis=1))  # axis=1: join side by side
```

```
# print(arr1)
[[3 3 0 8]
 [2 0 3 1]
 [4 8 8 2]]

# print(arr2)
[[6 8 7 3]
 [1 6 8 7]
 [1 4 7 1]]

# print(np.concatenate([arr1, arr2]))
[[3 3 0 8]
 [2 0 3 1]
 [4 8 8 2]
 [6 8 7 3]
 [1 6 8 7]
 [1 4 7 1]]

# print(np.concatenate([arr1, arr2], axis=1))
[[3 3 0 8 6 8 7 3]
 [2 0 3 1 1 6 8 7]
 [4 8 8 2 1 4 7 1]]
```
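The effect of `axis` is easiest to see from the result shapes. The sketch below uses deterministic arrays rather than random ones and also shows that `np.vstack` and `np.hstack` give the same results for 2-D inputs.

```python
import numpy as np

# Deterministic 3x4 arrays so the shapes are easy to follow.
arr1 = np.arange(12).reshape(3, 4)
arr2 = np.arange(12, 24).reshape(3, 4)

rows = np.concatenate([arr1, arr2], axis=0)  # rows are appended
cols = np.concatenate([arr1, arr2], axis=1)  # columns are appended

print(rows.shape)
print(cols.shape)

# For 2-D arrays, vstack and hstack are shorthands for the two calls above.
same_rows = np.array_equal(rows, np.vstack([arr1, arr2]))
same_cols = np.array_equal(cols, np.hstack([arr1, arr2]))
print(same_rows, same_cols)
```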

### 2. pd.concat

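• Mind the axis: the default is axis=0
• `join` sets how the other axis is combined; the default is outer
• When concatenating Series, check whether the row index contains duplicates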
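```python
df1 = pd.DataFrame(np.arange(6).reshape(3, 2), index=list('abc'), columns=['one', 'two'])
df2 = pd.DataFrame(np.arange(4).reshape(2, 2) + 5, index=list('ac'), columns=['three', 'four'])

# Default: outer join, axis=0 (rows are stacked, columns are the union)
pd.concat([df1, df2])
#    four  one  three  two
# a   NaN  0.0    NaN  1.0
# b   NaN  2.0    NaN  3.0
# c   NaN  4.0    NaN  5.0
# a   6.0  NaN    5.0  NaN
# c   8.0  NaN    7.0  NaN
# (Older pandas sorted the columns alphabetically as shown;
#  recent versions keep the order one, two, three, four.)

# Concatenate along axis=1 (the columns)
pd.concat([df1, df2], axis='columns')
#    one  two  three  four
# a    0    1    5.0   6.0
# b    2    3    NaN   NaN
# c    4    5    7.0   8.0

# The join type can also be set to inner
pd.concat([df1, df2], axis=1, join='inner')
#    one  two  three  four
# a    0    1      5     6
# c    4    5      7     8
```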
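The note about duplicate row indexes when concatenating Series can be made concrete. The following sketch uses two hypothetical Series that share the label 'b' and shows three ways to deal with the clash, all through parameters of `pd.concat`: `ignore_index`, `keys`, and `verify_integrity`.

```python
import pandas as pd

# Hypothetical Series sharing the label 'b'.
s1 = pd.Series([1, 2], index=['a', 'b'])
s2 = pd.Series([3, 4], index=['b', 'c'])

# Plain concatenation keeps both 'b' labels, so the index is no longer unique.
stacked = pd.concat([s1, s2])
print(stacked.index.is_unique)

# Option 1: discard the old labels and renumber from 0.
renumbered = pd.concat([s1, s2], ignore_index=True)

# Option 2: add an outer index level naming the source of each value.
labelled = pd.concat([s1, s2], keys=['s1', 's2'])

# Option 3: ask pandas to refuse duplicates outright.
try:
    pd.concat([s1, s2], verify_integrity=True)
    raised = False
except ValueError:
    raised = True
print(raised)
```

Which option fits depends on whether the original labels carry meaning: `ignore_index` throws them away, while `keys` preserves them under a new level.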
