
4 - Common pandas Operations

article 2025/2/28 9:46:24

Preface

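I. Modifying a DataFrame's index and columns

1. Getting the index

import numpy as np
import pandas as pd

df2 = pd.DataFrame(np.arange(9).reshape(3,3),index=['sh','cs','bj'],columns=['a','b','c'])
df2.index  # returns the row labels

2. Modifying the index

df2.index = ['shanghai','changsha','beijing']

df2.columns = ['A','B','C'] # labels are assigned by position, so the order of the list matters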
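As a small sketch of the direct assignment above (the frame is rebuilt here under the name `df` so the block runs on its own), the new labels replace the old ones position by position, and pandas rejects a list whose length does not match the number of rows:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(9).reshape(3, 3),
                  index=['sh', 'cs', 'bj'], columns=['a', 'b', 'c'])

# Labels are replaced by position: 'sh' -> 'shanghai', 'cs' -> 'changsha', ...
df.index = ['shanghai', 'changsha', 'beijing']
df.columns = ['A', 'B', 'C']

# A list of the wrong length raises ValueError instead of being truncated.
try:
    df.index = ['shanghai', 'changsha']
    length_error = None
except ValueError as exc:
    length_error = exc
```

Because the mapping is purely positional, a reordered list relabels the rows without moving any data, which is why the order needs care.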

3. Batch renaming with a function

def test_map(x):
    return x+'_ABC'

# rename() applies the function to every label
df2.rename(index=test_map,columns=test_map,inplace=True) # apply test_map to both index and columns; inplace=True modifies df2 directly
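A minimal sketch of function-based renaming follows. The frame `df` and the helper `add_suffix` are illustrative names, not from the original. Any callable that maps one label to a new label works, including built-ins such as `str.upper`:

```python
import numpy as np
import pandas as pd

df = pd.DataFrame(np.arange(4).reshape(2, 2),
                  index=['sh', 'bj'], columns=['a', 'b'])

def add_suffix(label):
    # Called once per label; must return the new label.
    return label + '_ABC'

# Without inplace=True, rename() returns a new DataFrame and leaves df as it was.
renamed = df.rename(index=str.upper, columns=add_suffix)
```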

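4. Renaming individual labels with a dictionary

df2.rename(index={'shanghai_ABC':'shanghai'},columns={'C_ABC':'C'}) # keys must match the existing labels exactly; returns a new DataFrame unless inplace=True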
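One pitfall with dictionary-based renaming is that a key which matches no existing label is silently ignored. The sketch below uses a small stand-in frame (`df` is illustrative) to show this, along with the `errors='raise'` option that turns the silent miss into a `KeyError`:

```python
import pandas as pd

df = pd.DataFrame([[1, 2], [3, 4]],
                  index=['shanghai_ABC', 'beijing_ABC'],
                  columns=['A_ABC', 'B_ABC'])

# Only the labels named in the dictionaries change.
renamed = df.rename(index={'shanghai_ABC': 'shanghai'}, columns={'A_ABC': 'A'})

# 'c_ABC' is not a column of df, so nothing is renamed and no error is raised.
unchanged = df.rename(columns={'c_ABC': 'C'})

# errors='raise' reports the missing label instead.
try:
    df.rename(columns={'c_ABC': 'C'}, errors='raise')
    missing_key_error = None
except KeyError as exc:
    missing_key_error = exc
```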

II. Setting the index with set_index

df3.set_index('销售日期',inplace=True) # use the '销售日期' (sales date) column as the row index
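The data behind `df3` is not shown at this point, so the sketch below assumes a hypothetical `sales` table with `sale_date` and `amount` columns. It illustrates what `set_index` does: the chosen column becomes the row labels, which allows label-based lookups, and `reset_index` reverses the step:

```python
import pandas as pd

# Hypothetical data; the column names are assumptions for illustration.
sales = pd.DataFrame({
    'sale_date': ['2023-09-01', '2023-09-02', '2023-09-03'],
    'amount': [120, 95, 143],
})

# Without inplace=True a new DataFrame is returned and sales is untouched.
indexed = sales.set_index('sale_date')

# Rows can now be looked up by date label.
amount_on_day_two = indexed.loc['2023-09-02', 'amount']

# reset_index() moves the index back into an ordinary column.
restored = indexed.reset_index()
```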

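III. Combining tables

1. Using concat

Concatenation

pd.concat([df4,new_row],ignore_index=True) # stack rows and renumber the index from 0

pd.concat([df4,new_row],axis=1) # place the tables side by side, aligned on the index

Example code: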

import pandas as pd

data = {
    'Date': ['2023-09-01', '2023-09-02', '2023-09-03'],
    'Steps': [8000, 9000, 7500]  # step count
}
df4 = pd.DataFrame(data)



new_data = {'Date':'2023-09-04','Steps':8000}
new_row = pd.DataFrame([new_data])

pd.concat([df4,new_row],ignore_index=True)
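To make the effect of `ignore_index` and `axis` concrete, the sketch below reuses the step-count data and compares the three variants. The result names (`stacked`, `renumbered`, `side_by_side`) are illustrative:

```python
import pandas as pd

df4 = pd.DataFrame({
    'Date': ['2023-09-01', '2023-09-02', '2023-09-03'],
    'Steps': [8000, 9000, 7500],
})
new_row = pd.DataFrame([{'Date': '2023-09-04', 'Steps': 8000}])

# Default (axis=0): rows are stacked and the original labels are kept,
# so the index label 0 appears twice.
stacked = pd.concat([df4, new_row])

# ignore_index=True discards the old labels and renumbers from 0.
renumbered = pd.concat([df4, new_row], ignore_index=True)

# axis=1: columns are placed side by side, aligned on the index.
# new_row only has index 0, so its columns are NaN in rows 1 and 2.
side_by_side = pd.concat([df4, new_row], axis=1)
```

For appending a row, the `ignore_index=True` form is usually the one wanted, since duplicate index labels make later `.loc` lookups ambiguous.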

2. Using merge

pd.merge(user_df,buy_df,on='CustomerID',how='inner')  
how='inner' keeps only the CustomerID values found in both tables (the intersection); it is also the default.

Example code:

import pandas as pd
user_data = {
    'CustomerID': [1, 2, 3, 4, 5],
    'Name': ['Rose', 'Bob', 'Jack', 'David', 'Lucy'],
    'Email': ['rose@163.com', 'bob@163.com', 'jack@163.com', 'david@163.com', 'lucy@163.com']
}
user_df = pd.DataFrame(user_data)


buy_data = {
    'CustomerID': [1, 2, 1, 3, 4, 3, 6],
    'OrderID': [101, 102, 103, 104, 105, 106, 107],
    'Product': ['A', 'B', 'C', 'D', 'E', 'A', 'B'],
    'Quantity': [2, 1, 3, 2, 4, 1, 2]
}
buy_df = pd.DataFrame(buy_data)


pd.merge(user_df,buy_df,on='CustomerID',how='inner')  # how='inner': intersection of keys (the default)
pd.merge(user_df,buy_df,on='CustomerID',how='outer')  # how='outer': union of keys

pd.merge(user_df,buy_df,on='CustomerID',how='left')   # how='left': left join, keeps every row of user_df

pd.merge(user_df,buy_df,on='CustomerID',how='right')  # how='right': right join, keeps every row of buy_df
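One way to see how the four `how` modes differ is to count the rows each one produces. The sketch below uses trimmed copies of the two tables (only the columns needed for the comparison) and also uses the `indicator=True` option of `pd.merge`, which adds a `_merge` column recording where each row came from:

```python
import pandas as pd

user_df = pd.DataFrame({
    'CustomerID': [1, 2, 3, 4, 5],
    'Name': ['Rose', 'Bob', 'Jack', 'David', 'Lucy'],
})
buy_df = pd.DataFrame({
    'CustomerID': [1, 2, 1, 3, 4, 3, 6],
    'OrderID': [101, 102, 103, 104, 105, 106, 107],
})

# Customer 5 has no orders and order 107 belongs to an unknown customer 6,
# so each join mode keeps a different set of rows.
row_counts = {
    how: len(pd.merge(user_df, buy_df, on='CustomerID', how=how))
    for how in ('inner', 'outer', 'left', 'right')
}

# indicator=True labels each row as 'both', 'left_only' or 'right_only'.
outer = pd.merge(user_df, buy_df, on='CustomerID', how='outer', indicator=True)
origin_counts = outer['_merge'].astype(str).value_counts().to_dict()
```

Here `row_counts` comes out as 6 rows for inner, 8 for outer, and 7 each for left and right, because customers 1 and 3 each match two orders.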

3. Using join

Merging on the index
students_df.join(scores_df,on='StudentID',how='inner')  # how defaults to 'left'
Example code:

students_data = {
    'StudentID': [1, 2, 3, 4, 5],
    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],
    'Age': [18, 19, 18, 20, 19]
}
students_df = pd.DataFrame(students_data)
students_df.set_index('StudentID',inplace=True)

scores_data = {
    'StudentID': [1, 2, 3, 4, 5],
    'Math': [90, 85, 92, 78, 88],
    'Science': [88, 87, 91, 79, 90]
}
scores_df = pd.DataFrame(scores_data)
scores_df.set_index('StudentID',inplace=True)

students_df.join(scores_df,on='StudentID',how='inner')  # how defaults to 'left'; both tables are indexed by StudentID
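When both tables already use the key as their index, `join` can be called without `on` and it matches index to index. The sketch below uses reduced, hypothetical versions of the two tables in which one student has no score, to show how the default `how='left'` differs from `how='inner'`:

```python
import pandas as pd

# Reduced illustrative data: student 2 has no entry in scores_df.
students_df = pd.DataFrame(
    {'Name': ['Alice', 'Bob', 'Charlie']},
    index=pd.Index([1, 2, 3], name='StudentID'),
)
scores_df = pd.DataFrame(
    {'Math': [90, 92]},
    index=pd.Index([1, 3], name='StudentID'),
)

# Default how='left': every student is kept; the missing score becomes NaN.
left_joined = students_df.join(scores_df)

# how='inner': only students present in both tables remain.
inner_joined = students_df.join(scores_df, how='inner')
```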

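IV. Querying data

1. df3.info() # summarizes each column's dtype and non-null count

2. df3.query('A>2') filters rows; the condition is written inside the quotes

df3.query('(A>2) and (B<40)')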


data = {
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50]
}
df3 = pd.DataFrame(data)


df3[df3['A']>2]


df3.query('A>2')  # query: the condition must be passed as a quoted string; selects the same rows as the boolean mask above
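As a short sketch of how `query` relates to boolean indexing, the block below runs the same combined condition both ways. It also shows the `@` prefix, which lets the query string refer to a Python variable (`threshold` is an illustrative name):

```python
import pandas as pd

df3 = pd.DataFrame({
    'A': [1, 2, 3, 4, 5],
    'B': [10, 20, 30, 40, 50],
})

# Boolean indexing: each condition needs parentheses and & instead of 'and'.
by_mask = df3[(df3['A'] > 2) & (df3['B'] < 40)]

# query(): the same filter written as a string.
by_query = df3.query('(A > 2) and (B < 40)')

# '@name' inside the string refers to a local Python variable.
threshold = 2
by_variable = df3.query('A > @threshold')
```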

3. isin

data = {
    'City': ['长沙', '北京', '上海', '成都', '云南'],
    'Population (millions)': [84, 39, 27, 23, 15]
}
df4 = pd.DataFrame(data)
df4[df4['City'].isin(['长沙','成都'])]  # isin() takes a list and tests whether each City value is in it; the resulting boolean mask selects the matching rows
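The sketch below separates the two steps of the `isin` filter: building the boolean mask and then indexing with it. It also shows the `~` operator, which inverts the mask to select the rows that are not in the list (the names `mask`, `selected`, and `others` are illustrative):

```python
import pandas as pd

df4 = pd.DataFrame({
    'City': ['长沙', '北京', '上海', '成都', '云南'],
    'Population (millions)': [84, 39, 27, 23, 15],
})

# isin() returns one True/False per row.
mask = df4['City'].isin(['长沙', '成都'])

# Indexing with the mask keeps the rows where it is True.
selected = df4[mask]

# ~ inverts the mask, giving the "not in" filter.
others = df4[~mask]
```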

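V. Unpacking

1. Expanding into multiple columns

df5[['a','b','c']] = df5['B'].apply(pd.Series) # assign the expanded lists to new columns a, b, c

df5['B'].apply(pd.Series) # each list element becomes its own column; shorter lists are padded with NaN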

data = {
    'A': [1, 2, 3],
    'B': [['x', 'y'], ['p', 'q', 'r'], ['m']]
}
df5 = pd.DataFrame(data)


df5[['a','b','c']] = df5['B'].apply(pd.Series)
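`apply(pd.Series)` builds one Series per row, which can be slow on large tables. As an alternative sketch (not the method used above), the lists can be handed to the `DataFrame` constructor in one go, and `explode` covers the case where one row per list element is wanted instead of one column per element. The names `expanded`, `wide`, and `long_form` are illustrative:

```python
import pandas as pd

df5 = pd.DataFrame({
    'A': [1, 2, 3],
    'B': [['x', 'y'], ['p', 'q', 'r'], ['m']],
})

# Build the new columns directly from the lists; shorter lists are
# padded with missing values. Reusing df5.index keeps the rows aligned.
expanded = pd.DataFrame(df5['B'].tolist(), index=df5.index,
                        columns=['a', 'b', 'c'])
wide = df5.join(expanded)

# explode() instead repeats each row once per list element.
long_form = df5.explode('B')
```

The wide form suits lists with a fixed, meaningful position per element, while the long form is easier to group and aggregate when list lengths vary.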