2022-09-09
Data Cleaning: Group Objects and the apply Function
Group Objects and the apply Function
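The apply function works on group objects and can also be applied directly to DataFrame data.
Groupby.apply(func)
When calling apply, pay attention to the difference between axis=0 and axis=1.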
np.sum with axis=0 computes the sum of each column.
np.sum with axis=1 computes the sum of each row.
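The axis rule above can be checked on a tiny frame. This is a minimal sketch; the `toy` DataFrame and its column names are made up for illustration and are not part of online_order.csv.

```python
import numpy as np
import pandas as pd

# Hypothetical toy data, only to make the axis direction visible
toy = pd.DataFrame({"a": [1, 2, 3], "b": [10, 20, 30]})

# axis=0: the function receives each column, so we get one total per column
col_totals = toy.apply(np.sum, axis=0)

# axis=1: the function receives each row, so we get one total per row
row_totals = toy.apply(np.sum, axis=1)

print(col_totals.tolist())  # [6, 60]
print(row_totals.tolist())  # [11, 22, 33]
```

The result of axis=0 is indexed by column name, while the result of axis=1 keeps the original row index.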
import pandas as pd
import numpy as np
import os
os.getcwd()
'D:\\Jupyter\\notebook\\Python数据清洗实战\\数据清洗之数据统计'
os.chdir('D:\\Jupyter\\notebook\\Python数据清洗实战\\数据')
df = pd.read_csv('online_order.csv', encoding='gbk', dtype={'customer':str, 'order':str})
df.head(5)
| customer | order | total_items | discount% | weekday | hour | Food% | Fresh% | Drinks% | Home% | Beauty% | Health% | Baby% | Pets% |
0 | 0 | 0 | 45 | 23.03 | 4 | 13 | 9.46 | 87.06 | 3.48 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 |
1 | 0 | 1 | 38 | 1.22 | 5 | 13 | 15.87 | 75.80 | 6.22 | 2.12 | 0.00 | 0.00 | 0.0 | 0.0 |
2 | 0 | 2 | 51 | 18.08 | 4 | 13 | 16.88 | 56.75 | 3.37 | 16.48 | 6.53 | 0.00 | 0.0 | 0.0 |
3 | 1 | 3 | 57 | 16.51 | 1 | 12 | 28.81 | 35.99 | 11.78 | 4.62 | 2.87 | 15.92 | 0.0 | 0.0 |
4 | 1 | 4 | 53 | 18.31 | 2 | 11 | 24.13 | 60.38 | 7.78 | 7.72 | 0.00 | 0.00 | 0.0 | 0.0 |
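The read_csv call above forces customer and order to str. The source does not say why; one plausible reason is that identifier columns should not be treated as numbers. The sketch below illustrates the effect of the dtype argument on an in-memory CSV. The sample text, including the leading zeros, is invented for illustration and does not come from online_order.csv.

```python
import io
import pandas as pd

# Hypothetical CSV content; the real file is not reproduced here
csv_text = "customer,order,total_items\n007,0012,45\n008,0013,38\n"

# Default inference turns the ID columns into integers
as_default = pd.read_csv(io.StringIO(csv_text))

# Passing dtype keeps the ID columns as text exactly as written
as_text = pd.read_csv(io.StringIO(csv_text), dtype={"customer": str, "order": str})

print(as_default["customer"].tolist())  # [7, 8]
print(as_text["customer"].tolist())     # ['007', '008']
```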
grouped = df.groupby('weekday')
# apply accepts only a single statistic function
# agg accepts several at once
# grouped.apply([np.mean, np.sum])
grouped.apply(np.mean)[['total_items', 'discount%', 'weekday']]
| total_items | discount% | weekday |
weekday | | | |
1 | 30.662177 | 8.580705 | 1.0 |
2 | 31.868612 | 8.638014 | 2.0 |
3 | 31.869796 | 7.794507 | 3.0 |
4 | 32.251899 | 8.068155 | 4.0 |
5 | 31.406619 | 9.159031 | 5.0 |
6 | 32.154814 | 8.414258 | 6.0 |
7 | 32.373837 | 8.710171 | 7.0 |
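The comment before the call notes that apply takes one statistic while agg can take several. The following sketch shows both patterns on a small, made-up frame (the values are hypothetical; only the column names mirror the dataset). It selects the numeric columns explicitly before aggregating, since depending on the pandas version, passing np.mean to apply on a frame that still contains string columns may warn or raise.

```python
import pandas as pd

# Hypothetical toy data with the same column names as the article's dataset
toy = pd.DataFrame({
    "weekday": [1, 1, 2, 2],
    "total_items": [10, 20, 30, 50],
    "discount%": [5.0, 15.0, 0.0, 10.0],
})

grouped = toy.groupby("weekday")

# One statistic per group, restricted to the numeric columns of interest
means = grouped[["total_items", "discount%"]].mean()

# agg accepts a list of statistics and returns one column per statistic
stats = grouped["total_items"].agg(["mean", "sum"])

print(means)
print(stats)
```

Here the grouping key becomes the index of the result rather than a regular column, which differs slightly from the output shown above.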
df.columns
Index(['customer', 'order', 'total_items', 'discount%', 'weekday', 'hour', 'Food%', 'Fresh%', 'Drinks%', 'Home%', 'Beauty%', 'Health%', 'Baby%', 'Pets%'], dtype='object')
var = ['Food%', 'Fresh%', 'Drinks%', 'Home%', 'Beauty%', 'Health%', 'Baby%', 'Pets%']
df[var].head(5)
| Food% | Fresh% | Drinks% | Home% | Beauty% | Health% | Baby% | Pets% |
0 | 9.46 | 87.06 | 3.48 | 0.00 | 0.00 | 0.00 | 0.0 | 0.0 |
1 | 15.87 | 75.80 | 6.22 | 2.12 | 0.00 | 0.00 | 0.0 | 0.0 |
2 | 16.88 | 56.75 | 3.37 | 16.48 | 6.53 | 0.00 | 0.0 | 0.0 |
3 | 28.81 | 35.99 | 11.78 | 4.62 | 2.87 | 15.92 | 0.0 | 0.0 |
4 | 24.13 | 60.38 | 7.78 | 7.72 | 0.00 | 0.00 | 0.0 | 0.0 |
# Sum of each variable (column)
df[var].apply(np.sum, axis=0)
Food%      706812.19
Fresh%     606818.38
Drinks%    700477.06
Home%      406187.25
Beauty%    176788.48
Health%     33988.76
Baby%      332884.34
Pets%       31292.61
dtype: float64
# Sum of each row
df[var].apply(np.sum, axis=1).head(5)
0    100.00
1    100.01
2    100.01
3     99.99
4    100.01
dtype: float64
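# Food% - Fresh% (label-based access; positional x[0] on a labeled Series is deprecated in recent pandas)
df[var].apply(lambda x: x['Food%'] - x['Fresh%'], axis=1).head(5)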
0   -77.60
1   -59.93
2   -39.87
3    -7.18
4   -36.25
dtype: float64
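The row-wise difference above can also be written as plain column arithmetic, which avoids calling a Python function once per row. This is a sketch under the assumption that only the two columns are needed; the `toy` frame reuses the first two rows of the Food% and Fresh% values shown earlier.

```python
import pandas as pd

# First two rows of Food% and Fresh% from the sample output above
toy = pd.DataFrame({"Food%": [9.46, 15.87], "Fresh%": [87.06, 75.80]})

# Row-wise apply: the lambda runs once for every row
by_apply = toy.apply(lambda row: row["Food%"] - row["Fresh%"], axis=1)

# Vectorized alternative: subtract whole columns in one operation
vectorized = toy["Food%"] - toy["Fresh%"]

print(by_apply.round(2).tolist())
print(vectorized.round(2).tolist())
```

Both forms give the same values; apply with axis=1 remains useful when the per-row logic cannot be expressed as column operations.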