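Study notes on "Python Data Analysis: Making the Most of the pandas Library" (《Python数据分析:活用pandas库》), Day 1: Pandas DataFrame basics
Python Data Analysis: Making the Most of the pandas Library
Python is powerful and easy to use, which makes it a strong tool for data processing and data analysis, and its many libraries make it stronger still. pandas is one of the most popular of these open-source libraries: it helps ensure data accuracy, visualize data, and work efficiently with large datasets. With it, Python can quickly automate and carry out almost any data analysis task.
The book explains the fundamentals and common usage of pandas in detail. Through simple examples it shows how to use pandas to solve complex real-world problems, and how libraries such as matplotlib, seaborn, statsmodels, and sklearn support data analysis in Python. It covers data processing, data visualization, and data modeling, and also gives a brief tour of the Python data analysis ecosystem.
The book is an introduction to data analysis in Python. Every concept is explained through a simple example, so it is easy to follow and to try out. Topics include: Python and pandas basics; loading and inspecting datasets; the pandas DataFrame and Series objects; plotting for exploratory data analysis with matplotlib, seaborn, and the plotting methods built into pandas; joining and merging datasets; handling missing data; cleaning data; converting data types; working with strings; applying functions; group operations; fitting and evaluating models; and regularization and clustering techniques.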
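Chapter 1: Pandas DataFrame Basics
##1.1 Introduction
pandas is a third-party library used mainly for data processing and data analysis. Besides data processing it also covers statistical analysis and related computations, and internally it builds on NumPy.
The main pandas data structures are Series (one-dimensional) and DataFrame (two-dimensional). The three-dimensional Panel that older tutorials mention was deprecated and then removed in pandas 0.25, so it is not available in current versions.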
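To make the two structures concrete, here is a minimal sketch. The variable names `life_exp` and `small_df` are my own, and the three rows are copied by hand from the first rows of the gapminder data shown later in these notes.

```python
import pandas as pd

# A Series is a one-dimensional labeled array.
life_exp = pd.Series([28.801, 30.332, 31.997], name="lifeExp")

# A DataFrame is a two-dimensional table; each of its columns is a Series.
small_df = pd.DataFrame({
    "country": ["Afghanistan", "Afghanistan", "Afghanistan"],
    "year": [1952, 1957, 1962],
    "lifeExp": [28.801, 30.332, 31.997],
})

print(type(life_exp).__name__)             # Series
print(type(small_df).__name__)             # DataFrame
print(type(small_df["lifeExp"]).__name__)  # Series
print(small_df.shape)                      # (3, 3)
```

Pulling one column out of a DataFrame gives back a Series, which is why the two types keep appearing together in the sections below.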
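##1.2 Loading the dataset
Download the dataset from the publisher's site: https://www.ituring.com.cn/book/2557 , then unzip it. The Python source code for these notes lives in the notebooks folder, and this section works with gapminder.tsv in the data folder.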
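import pandas as pd
# Forward slashes work on every OS and avoid accidental backslash escapes such as "\d" inside the string.
df=pd.read_csv("../data/gapminder.tsv",sep="\t")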
print(df.shape)
(1704, 6)
print(df.dtypes)
country object
continent object
year int64
lifeExp float64
pop int64
gdpPercap float64
dtype: object
print(df.head(6))
country continent year lifeExp pop gdpPercap
0 Afghanistan Asia 1952 28.801 8425333 779.445314
1 Afghanistan Asia 1957 30.332 9240934 820.853030
2 Afghanistan Asia 1962 31.997 10267083 853.100710
3 Afghanistan Asia 1967 34.020 11537966 836.197138
4 Afghanistan Asia 1972 36.088 13079460 739.981106
5 Afghanistan Asia 1977 38.438 14880372 786.113360
print(df.tail(6))
country continent year lifeExp pop gdpPercap
1698 Zimbabwe Africa 1982 60.363 7636524 788.855041
1699 Zimbabwe Africa 1987 62.351 9216418 706.157306
1700 Zimbabwe Africa 1992 60.377 10704340 693.420786
1701 Zimbabwe Africa 1997 46.809 11404948 792.449960
1702 Zimbabwe Africa 2002 39.989 11926563 672.038623
1703 Zimbabwe Africa 2007 43.487 12311143 469.709298
print(df.sample(6))
country continent year lifeExp pop gdpPercap
545 Gabon Africa 1977 52.790 706367 21745.573280
1341 Serbia Europe 1997 72.232 10336594 7914.320304
1494 Syria Asia 1982 64.590 9410494 3761.837715
1389 Slovenia Europe 1997 75.130 2011612 17161.107350
274 Chad Africa 2002 50.525 8835739 1156.181860
462 Egypt Africa 1982 56.006 45681811 3503.729636
df.info() # info() prints the summary itself and returns None, so wrapping it in print() only adds a trailing "None"
<class 'pandas.core.frame.DataFrame'>
RangeIndex: 1704 entries, 0 to 1703
Data columns (total 6 columns):
# Column Non-Null Count Dtype
--- ------ -------------- -----
0 country 1704 non-null object
1 continent 1704 non-null object
2 year 1704 non-null int64
3 lifeExp 1704 non-null float64
4 pop 1704 non-null int64
5 gdpPercap 1704 non-null float64
dtypes: float64(2), int64(2), object(2)
memory usage: 66.6+ KB
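If the downloaded file is not at hand, the loading step can still be tried out on an in-memory string. The sketch below is only an illustration: `tsv_text` and `mini` are names I made up, and the three rows are typed in from the output above rather than read from gapminder.tsv.

```python
import io
import pandas as pd

# Three hand-typed rows in the same tab-separated layout as gapminder.tsv.
tsv_text = (
    "country\tcontinent\tyear\tlifeExp\tpop\tgdpPercap\n"
    "Afghanistan\tAsia\t1952\t28.801\t8425333\t779.445314\n"
    "Afghanistan\tAsia\t1957\t30.332\t9240934\t820.853030\n"
    "Zimbabwe\tAfrica\t2007\t43.487\t12311143\t469.709298\n"
)

# read_csv accepts a file-like object; sep="\t" tells it the columns are tab-separated.
mini = pd.read_csv(io.StringIO(tsv_text), sep="\t")

print(mini.shape)             # (number of rows, number of columns)
print(mini.columns.tolist())  # column names taken from the header line
print(mini.dtypes)            # a type is inferred for each column
print(mini.head(2))           # first two rows
```

Without `sep="\t"`, read_csv would look for commas and the header would not be split into six columns.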
##1.3 Viewing columns, rows, and cells
###1.3.1 Getting a subset of columns: by name, position, or range
print(df["country"].head())
0 Afghanistan
1 Afghanistan
2 Afghanistan
3 Afghanistan
4 Afghanistan
Name: country, dtype: object
print(df["country"])
0 Afghanistan
1 Afghanistan
2 Afghanistan
3 Afghanistan
4 Afghanistan
...
1699 Zimbabwe
1700 Zimbabwe
1701 Zimbabwe
1702 Zimbabwe
1703 Zimbabwe
Name: country, Length: 1704, dtype: object
print(df[["country","continent","year"]])
country continent year
0 Afghanistan Asia 1952
1 Afghanistan Asia 1957
2 Afghanistan Asia 1962
3 Afghanistan Asia 1967
4 Afghanistan Asia 1972
... ... ... ...
1699 Zimbabwe Africa 1987
1700 Zimbabwe Africa 1992
1701 Zimbabwe Africa 1997
1702 Zimbabwe Africa 2002
1703 Zimbabwe Africa 2007
[1704 rows x 3 columns]
print(df.loc[:,["country","continent","year"]])
country continent year
0 Afghanistan Asia 1952
1 Afghanistan Asia 1957
2 Afghanistan Asia 1962
3 Afghanistan Asia 1967
4 Afghanistan Asia 1972
... ... ... ...
1699 Zimbabwe Africa 1987
1700 Zimbabwe Africa 1992
1701 Zimbabwe Africa 1997
1702 Zimbabwe Africa 2002
1703 Zimbabwe Africa 2007
[1704 rows x 3 columns]
print(df.iloc[:,[0,1,-4]])
country continent year
0 Afghanistan Asia 1952
1 Afghanistan Asia 1957
2 Afghanistan Asia 1962
3 Afghanistan Asia 1967
4 Afghanistan Asia 1972
... ... ... ...
1699 Zimbabwe Africa 1987
1700 Zimbabwe Africa 1992
1701 Zimbabwe Africa 1997
1702 Zimbabwe Africa 2002
1703 Zimbabwe Africa 2007
[1704 rows x 3 columns]
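The different ways of picking columns can be compared side by side on a tiny table. This is a sketch with names of my own (`toy`, `one_col`, and so on); the two rows are copied from the `df.sample(6)` output earlier.

```python
import pandas as pd

toy = pd.DataFrame({
    "country": ["Gabon", "Chad"],
    "continent": ["Africa", "Africa"],
    "year": [1977, 2002],
    "lifeExp": [52.790, 50.525],
    "pop": [706367, 8835739],
    "gdpPercap": [21745.573280, 1156.181860],
})

one_col = toy["country"]                  # a single name returns a Series
by_name = toy[["country", "year"]]        # a list of names returns a DataFrame
by_loc = toy.loc[:, ["country", "year"]]  # the same selection written with loc
by_pos = toy.iloc[:, [0, -4]]             # -4 counts from the right, which is "year" here
by_range = toy.iloc[:, 0:3]               # positions 0, 1, 2 (the stop is excluded)

print(type(one_col).__name__, type(by_name).__name__)
print(by_pos.columns.tolist())
print(by_range.columns.tolist())
```

Negative positions depend on the number of columns, so `-4` means "year" only because this table has six columns.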
###1.3.2 Getting a subset of rows: by row label with loc, by row position with iloc
print(df.loc[0]) # first row
country Afghanistan
continent Asia
year 1952
lifeExp 28.801
pop 8425333
gdpPercap 779.445314
Name: 0, dtype: object
print(df.loc[99]) # 100th row
country Bangladesh
continent Asia
year 1967
lifeExp 43.453
pop 62821884
gdpPercap 721.186086
Name: 99, dtype: object
print(df.iloc[1703]) # last row
country Zimbabwe
continent Africa
year 2007
lifeExp 43.487
pop 12311143
gdpPercap 469.709298
Name: 1703, dtype: object
print(df.iloc[-1]) # last row
country Zimbabwe
continent Africa
year 2007
lifeExp 43.487
pop 12311143
gdpPercap 469.709298
Name: 1703, dtype: object
print(df.loc[-1]) # meant to be the last row, but raises KeyError: loc looks up labels, and no row is labeled -1
---------------------------------------------------------------------------
ValueError Traceback (most recent call last)
f:\zk\py\jupyter\lib\site-packages\pandas\core\indexes\range.py in get_loc(self, key, method, tolerance)
350 try:
--> 351 return self._range.index(new_key)
352 except ValueError as err:
ValueError: -1 is not in range
The above exception was the direct cause of the following exception:
KeyError Traceback (most recent call last)
<ipython-input-37-1c8cb0fb85f1> in <module>
----> 1 print(df.loc[-1]) # last row
f:\zk\py\jupyter\lib\site-packages\pandas\core\indexing.py in __getitem__(self, key)
893
894 maybe_callable = com.apply_if_callable(key, self.obj)
--> 895 return self._getitem_axis(maybe_callable, axis=axis)
896
897 def _is_scalar_access(self, key: Tuple):
f:\zk\py\jupyter\lib\site-packages\pandas\core\indexing.py in _getitem_axis(self, key, axis)
1122 # fall thru to straight lookup
1123 self._validate_key(key, axis)
-> 1124 return self._get_label(key, axis=axis)
1125
1126 def _get_slice_axis(self, slice_obj: slice, axis: int):
f:\zk\py\jupyter\lib\site-packages\pandas\core\indexing.py in _get_label(self, label, axis)
1071 def _get_label(self, label, axis: int):
1072 # GH#5667 this will fail if the label is not present in the axis.
-> 1073 return self.obj.xs(label, axis=axis)
1074
1075 def _handle_lowerdim_multi_index_axis0(self, tup: Tuple):
f:\zk\py\jupyter\lib\site-packages\pandas\core\generic.py in xs(self, key, axis, level, drop_level)
3737 raise TypeError(f"Expected label or tuple of labels, got {key}") from e
3738 else:
-> 3739 loc = index.get_loc(key)
3740
3741 if isinstance(loc, np.ndarray):
f:\zk\py\jupyter\lib\site-packages\pandas\core\indexes\range.py in get_loc(self, key, method, tolerance)
351 return self._range.index(new_key)
352 except ValueError as err:
--> 353 raise KeyError(key) from err
354 raise KeyError(key)
355 return super().get_loc(key, method=method, tolerance=tolerance)
KeyError: -1
print(df.tail(1),"\n")
print(df.iloc[-1])
country continent year lifeExp pop gdpPercap
1703 Zimbabwe Africa 2007 43.487 12311143 469.709298
country Zimbabwe
continent Africa
year 2007
lifeExp 43.487
pop 12311143
gdpPercap 469.709298
Name: 1703, dtype: object
print(type(df.tail(1)),type(df.iloc[-1]))
<class 'pandas.core.frame.DataFrame'> <class 'pandas.core.series.Series'>
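The difference between labels and positions is easier to see when the two do not coincide. The sketch below uses a made-up index (10, 20, 30) for exactly that reason; `toy` is a name of my own and the country names are taken from the gapminder output.

```python
import pandas as pd

toy = pd.DataFrame(
    {"country": ["Algeria", "Angola", "Benin"], "year": [1952, 1952, 1952]},
    index=[10, 20, 30],  # labels deliberately different from positions 0, 1, 2
)

print(toy.loc[10])   # label 10 is the first row
print(toy.iloc[0])   # position 0 is that same row
print(toy.iloc[-1])  # negative positions count from the end

try:
    toy.loc[-1]      # loc searches the labels, and -1 is not one of them
except KeyError as err:
    print("KeyError:", err)

# A single row comes back as a Series; tail(1) and a list of positions keep a DataFrame.
print(type(toy.iloc[-1]).__name__, type(toy.tail(1)).__name__)
print(type(toy.iloc[[-1]]).__name__)
```

With the default RangeIndex in `df`, labels and positions happen to be the same numbers, which is why `loc[0]` and `iloc[0]` agree there while `loc[-1]` still fails.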
###1.3.3 Combining rows and columns: df.loc[[rows],[columns]] and df.iloc[[rows],[columns]]; the index lists can be replaced with slices of the form a:b:c
print(df.loc[:,["year","pop"]],df.iloc[:,[2,4]])
year pop
0 1952 8425333
1 1957 9240934
2 1962 10267083
3 1967 11537966
4 1972 13079460
... ... ...
1699 1987 9216418
1700 1992 10704340
1701 1997 11404948
1702 2002 11926563
1703 2007 12311143
[1704 rows x 2 columns] year pop
0 1952 8425333
1 1957 9240934
2 1962 10267083
3 1967 11537966
4 1972 13079460
... ... ...
1699 1987 9216418
1700 1992 10704340
1701 1997 11404948
1702 2002 11926563
1703 2007 12311143
[1704 rows x 2 columns]
print(df.loc[:,["lifeExp","pop"]],"\n",df.iloc[:,list(range(3,5))])
lifeExp pop
0 28.801 8425333
1 30.332 9240934
2 31.997 10267083
3 34.020 11537966
4 36.088 13079460
... ... ...
1699 62.351 9216418
1700 60.377 10704340
1701 46.809 11404948
1702 39.989 11926563
1703 43.487 12311143
[1704 rows x 2 columns]
lifeExp pop
0 28.801 8425333
1 30.332 9240934
2 31.997 10267083
3 34.020 11537966
4 36.088 13079460
... ... ...
1699 62.351 9216418
1700 60.377 10704340
1701 46.809 11404948
1702 39.989 11926563
1703 43.487 12311143
[1704 rows x 2 columns]
print(df.iloc[1:10:2,:2])
country continent
1 Afghanistan Asia
3 Afghanistan Asia
5 Afghanistan Asia
7 Afghanistan Asia
9 Afghanistan Asia
print(df.iloc[1:10:2,::])
country continent year lifeExp pop gdpPercap
1 Afghanistan Asia 1957 30.332 9240934 820.853030
3 Afghanistan Asia 1967 34.020 11537966 836.197138
5 Afghanistan Asia 1977 38.438 14880372 786.113360
7 Afghanistan Asia 1987 40.822 13867957 852.395945
9 Afghanistan Asia 1997 41.763 22227415 635.341351
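One detail the slices above do not show is that iloc and loc treat the end of a slice differently. A small sketch, assuming a default integer index; `toy`, `by_position`, and `by_label` are names of my own, and the values are the first six Afghanistan rows from `df.head(6)`.

```python
import pandas as pd

toy = pd.DataFrame({
    "year": [1952, 1957, 1962, 1967, 1972, 1977],
    "lifeExp": [28.801, 30.332, 31.997, 34.020, 36.088, 38.438],
    "pop": [8425333, 9240934, 10267083, 11537966, 13079460, 14880372],
})

# iloc slices by position and excludes the stop, like ordinary Python slicing.
by_position = toy.iloc[1:5:2, :2]

# loc slices by label and includes both ends, for rows and for columns.
by_label = toy.loc[1:5:2, "year":"pop"]

print(by_position)  # rows 1 and 3, columns year and lifeExp
print(by_label)     # rows 1, 3 and 5, columns year through pop
```

The same `1:5:2` therefore picks two rows with iloc and three with loc.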
##1.4 Grouping and aggregation
print(df.groupby("year"))
print(df.groupby("year")["lifeExp"])
<pandas.core.groupby.generic.DataFrameGroupBy object at 0x0DB33F28>
<pandas.core.groupby.generic.SeriesGroupBy object at 0x0DB33D30>
for a,b in df.groupby("year"):
    print(a,"\n",b)
1952
country continent year lifeExp pop gdpPercap
0 Afghanistan Asia 1952 28.801 8425333 779.445314
12 Albania Europe 1952 55.230 1282697 1601.056136
24 Algeria Africa 1952 43.077 9279525 2449.008185
36 Angola Africa 1952 30.015 4232095 3520.610273
48 Argentina Americas 1952 62.485 17876956 5911.315053
... ... ... ... ... ... ...
1644 Vietnam Asia 1952 40.412 26246839 605.066492
1656 West Bank and Gaza Asia 1952 43.160 1030585 1515.592329
1668 Yemen, Rep. Asia 1952 32.548 4963829 781.717576
1680 Zambia Africa 1952 42.038 2672000 1147.388831
1692 Zimbabwe Africa 1952 48.451 3080907 406.884115
[142 rows x 6 columns]
1957
country continent year lifeExp pop gdpPercap
1 Afghanistan Asia 1957 30.332 9240934 820.853030
13 Albania Europe 1957 59.280 1476505 1942.284244
25 Algeria Africa 1957 45.685 10270856 3013.976023
37 Angola Africa 1957 31.999 4561361 3827.940465
49 Argentina Americas 1957 64.399 19610538 6856.856212
... ... ... ... ... ... ...
1645 Vietnam Asia 1957 42.887 28998543 676.285448
1657 West Bank and Gaza Asia 1957 45.671 1070439 1827.067742
1669 Yemen, Rep. Asia 1957 33.970 5498090 804.830455
1681 Zambia Africa 1957 44.077 3016000 1311.956766
1693 Zimbabwe Africa 1957 50.469 3646340 518.764268
[142 rows x 6 columns]
1962
country continent year lifeExp pop gdpPercap
2 Afghanistan Asia 1962 31.997 10267083 853.100710
14 Albania Europe 1962 64.820 1728137 2312.888958
26 Algeria Africa 1962 48.303 11000948 2550.816880
38 Angola Africa 1962 34.000 4826015 4269.276742
50 Argentina Americas 1962 65.142 21283783 7133.166023
... ... ... ... ... ... ...
1646 Vietnam Asia 1962 45.363 33796140 772.049160
1658 West Bank and Gaza Asia 1962 48.127 1133134 2198.956312
1670 Yemen, Rep. Asia 1962 35.180 6120081 825.623201
1682 Zambia Africa 1962 46.023 3421000 1452.725766
1694 Zimbabwe Africa 1962 52.358 4277736 527.272182
[142 rows x 6 columns]
1967
country continent year lifeExp pop gdpPercap
3 Afghanistan Asia 1967 34.020 11537966 836.197138
15 Albania Europe 1967 66.220 1984060 2760.196931
27 Algeria Africa 1967 51.407 12760499 3246.991771
39 Angola Africa 1967 35.985 5247469 5522.776375
51 Argentina Americas 1967 65.634 22934225 8052.953021
... ... ... ... ... ... ...
1647 Vietnam Asia 1967 47.838 39463910 637.123289
1659 West Bank and Gaza Asia 1967 51.631 1142636 2649.715007
1671 Yemen, Rep. Asia 1967 36.984 6740785 862.442146
1683 Zambia Africa 1967 47.768 3900000 1777.077318
1695 Zimbabwe Africa 1967 53.995 4995432 569.795071
[142 rows x 6 columns]
1972
country continent year lifeExp pop gdpPercap
4 Afghanistan Asia 1972 36.088 13079460 739.981106
16 Albania Europe 1972 67.690 2263554 3313.422188
28 Algeria Africa 1972 54.518 14760787 4182.663766
40 Angola Africa 1972 37.928 5894858 5473.288005
52 Argentina Americas 1972 67.065 24779799 9443.038526
... ... ... ... ... ... ...
1648 Vietnam Asia 1972 50.254 44655014 699.501644
1660 West Bank and Gaza Asia 1972 56.532 1089572 3133.409277
1672 Yemen, Rep. Asia 1972 39.848 7407075 1265.047031
1684 Zambia Africa 1972 50.107 4506497 1773.498265
1696 Zimbabwe Africa 1972 55.635 5861135 799.362176
[142 rows x 6 columns]
1977
country continent year lifeExp pop gdpPercap
5 Afghanistan Asia 1977 38.438 14880372 786.113360
17 Albania Europe 1977 68.930 2509048 3533.003910
29 Algeria Africa 1977 58.014 17152804 4910.416756
41 Angola Africa 1977 39.483 6162675 3008.647355
53 Argentina Americas 1977 68.481 26983828 10079.026740
... ... ... ... ... ... ...
1649 Vietnam Asia 1977 55.764 50533506 713.537120
1661 West Bank and Gaza Asia 1977 60.765 1261091 3682.831494
1673 Yemen, Rep. Asia 1977 44.175 8403990 1829.765177
1685 Zambia Africa 1977 51.386 5216550 1588.688299
1697 Zimbabwe Africa 1977 57.674 6642107 685.587682
[142 rows x 6 columns]
1982
country continent year lifeExp pop gdpPercap
6 Afghanistan Asia 1982 39.854 12881816 978.011439
18 Albania Europe 1982 70.420 2780097 3630.880722
30 Algeria Africa 1982 61.368 20033753 5745.160213
42 Angola Africa 1982 39.942 7016384 2756.953672
54 Argentina Americas 1982 69.942 29341374 8997.897412
... ... ... ... ... ... ...
1650 Vietnam Asia 1982 58.816 56142181 707.235786
1662 West Bank and Gaza Asia 1982 64.406 1425876 4336.032082
1674 Yemen, Rep. Asia 1982 49.113 9657618 1977.557010
1686 Zambia Africa 1982 51.821 6100407 1408.678565
1698 Zimbabwe Africa 1982 60.363 7636524 788.855041
[142 rows x 6 columns]
1987
country continent year lifeExp pop gdpPercap
7 Afghanistan Asia 1987 40.822 13867957 852.395945
19 Albania Europe 1987 72.000 3075321 3738.932735
31 Algeria Africa 1987 65.799 23254956 5681.358539
43 Angola Africa 1987 39.906 7874230 2430.208311
55 Argentina Americas 1987 70.774 31620918 9139.671389
... ... ... ... ... ... ...
1651 Vietnam Asia 1987 62.820 62826491 820.799445
1663 West Bank and Gaza Asia 1987 67.046 1691210 5107.197384
1675 Yemen, Rep. Asia 1987 52.922 11219340 1971.741538
1687 Zambia Africa 1987 50.821 7272406 1213.315116
1699 Zimbabwe Africa 1987 62.351 9216418 706.157306
[142 rows x 6 columns]
1992
country continent year lifeExp pop gdpPercap
8 Afghanistan Asia 1992 41.674 16317921 649.341395
20 Albania Europe 1992 71.581 3326498 2497.437901
32 Algeria Africa 1992 67.744 26298373 5023.216647
44 Angola Africa 1992 40.647 8735988 2627.845685
56 Argentina Americas 1992 71.868 33958947 9308.418710
... ... ... ... ... ... ...
1652 Vietnam Asia 1992 67.662 69940728 989.023149
1664 West Bank and Gaza Asia 1992 69.718 2104779 6017.654756
1676 Yemen, Rep. Asia 1992 55.599 13367997 1879.496673
1688 Zambia Africa 1992 46.100 8381163 1210.884633
1700 Zimbabwe Africa 1992 60.377 10704340 693.420786
[142 rows x 6 columns]
1997
country continent year lifeExp pop gdpPercap
9 Afghanistan Asia 1997 41.763 22227415 635.341351
21 Albania Europe 1997 72.950 3428038 3193.054604
33 Algeria Africa 1997 69.152 29072015 4797.295051
45 Angola Africa 1997 40.963 9875024 2277.140884
57 Argentina Americas 1997 73.275 36203463 10967.281950
... ... ... ... ... ... ...
1653 Vietnam Asia 1997 70.672 76048996 1385.896769
1665 West Bank and Gaza Asia 1997 71.096 2826046 7110.667619
1677 Yemen, Rep. Asia 1997 58.020 15826497 2117.484526
1689 Zambia Africa 1997 40.238 9417789 1071.353818
1701 Zimbabwe Africa 1997 46.809 11404948 792.449960
[142 rows x 6 columns]
2002
country continent year lifeExp pop gdpPercap
10 Afghanistan Asia 2002 42.129 25268405 726.734055
22 Albania Europe 2002 75.651 3508512 4604.211737
34 Algeria Africa 2002 70.994 31287142 5288.040382
46 Angola Africa 2002 41.003 10866106 2773.287312
58 Argentina Americas 2002 74.340 38331121 8797.640716
... ... ... ... ... ... ...
1654 Vietnam Asia 2002 73.017 80908147 1764.456677
1666 West Bank and Gaza Asia 2002 72.370 3389578 4515.487575
1678 Yemen, Rep. Asia 2002 60.308 18701257 2234.820827
1690 Zambia Africa 2002 39.193 10595811 1071.613938
1702 Zimbabwe Africa 2002 39.989 11926563 672.038623
[142 rows x 6 columns]
2007
country continent year lifeExp pop gdpPercap
11 Afghanistan Asia 2007 43.828 31889923 974.580338
23 Albania Europe 2007 76.423 3600523 5937.029526
35 Algeria Africa 2007 72.301 33333216 6223.367465
47 Angola Africa 2007 42.731 12420476 4797.231267
59 Argentina Americas 2007 75.320 40301927 12779.379640
... ... ... ... ... ... ...
1655 Vietnam Asia 2007 74.249 85262356 2441.576404
1667 West Bank and Gaza Asia 2007 73.422 4018332 3025.349798
1679 Yemen, Rep. Asia 2007 62.698 22211743 2280.769906
1691 Zambia Africa 2007 42.384 11746035 1271.211593
1703 Zimbabwe Africa 2007 43.487 12311143 469.709298
[142 rows x 6 columns]
print(df.groupby("year")["lifeExp"].mean())
year
1952 49.057620
1957 51.507401
1962 53.609249
1967 55.678290
1972 57.647386
1977 59.570157
1982 61.533197
1987 63.212613
1992 64.160338
1997 65.014676
2002 65.694923
2007 67.007423
Name: lifeExp, dtype: float64
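What `groupby("year")["lifeExp"].mean()` does can be spelled out by hand as split, apply, combine. The sketch below is only meant to show that equivalence; `toy`, `manual`, and `grouped` are names of my own, and the four rows are copied from the per-year output above.

```python
import pandas as pd

toy = pd.DataFrame({
    "year": [1952, 1952, 1957, 1957],
    "country": ["Afghanistan", "Albania", "Afghanistan", "Albania"],
    "lifeExp": [28.801, 55.230, 30.332, 59.280],
})

# Split the rows by year, apply mean() to each piece, and collect the results.
manual = {}
for year, piece in toy.groupby("year"):
    manual[year] = piece["lifeExp"].mean()

# The one-liner does the same three steps and returns a Series indexed by year.
grouped = toy.groupby("year")["lifeExp"].mean()

print(manual)
print(grouped)
```

The loop is the same iteration used earlier with `for a,b in df.groupby("year")`, just with the aggregation added.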
print(df.groupby(["year","continent"])[["lifeExp","gdpPercap"]].mean()) # select several columns with a list; the single-bracket form is deprecated
lifeExp gdpPercap
year continent
1952 Africa 39.135500 1252.572466
Americas 53.279840 4079.062552
Asia 46.314394 5195.484004
Europe 64.408500 5661.057435
Oceania 69.255000 10298.085650
1957 Africa 41.266346 1385.236062
Americas 55.960280 4616.043733
Asia 49.318544 5787.732940
Europe 66.703067 6963.012816
Oceania 70.295000 11598.522455
1962 Africa 43.319442 1598.078825
Americas 58.398760 4901.541870
Asia 51.563223 5729.369625
Europe 68.539233 8365.486814
Oceania 71.085000 12696.452430
1967 Africa 45.334538 2050.363801
Americas 60.410920 5668.253496
Asia 54.663640 5971.173374
Europe 69.737600 10143.823757
Oceania 71.310000 14495.021790
1972 Africa 47.450942 2339.615674
Americas 62.394920 6491.334139
Asia 57.319269 8187.468699
Europe 70.775033 12479.575246
Oceania 71.910000 16417.333380
1977 Africa 49.580423 2585.938508
Americas 64.391560 7352.007126
Asia 59.610556 7791.314020
Europe 71.937767 14283.979110
Oceania 72.855000 17283.957605
1982 Africa 51.592865 2481.592960
Americas 66.228840 7506.737088
Asia 62.617939 7434.135157
Europe 72.806400 15617.896551
Oceania 74.290000 18554.709840
1987 Africa 53.344788 2282.668991
Americas 68.090720 7793.400261
Asia 64.851182 7608.226508
Europe 73.642167 17214.310727
Oceania 75.320000 20448.040160
1992 Africa 53.629577 2281.810333
Americas 69.568360 8044.934406
Asia 66.537212 8639.690248
Europe 74.440100 17061.568084
Oceania 76.945000 20894.045885
1997 Africa 53.598269 2378.759555
Americas 71.150480 8889.300863
Asia 68.020515 9834.093295
Europe 75.505167 19076.781802
Oceania 78.190000 24024.175170
2002 Africa 53.325231 2599.385159
Americas 72.422040 9287.677107
Asia 69.233879 10174.090397
Europe 76.700600 21711.732422
Oceania 79.740000 26938.778040
2007 Africa 54.806038 3089.032605
Americas 73.608120 11003.031625
Asia 70.728485 12473.026870
Europe 77.648600 25054.481636
Oceania 80.719500 29810.188275
Note: selecting the two columns with single brackets, ["lifeExp","gdpPercap"], makes pandas emit "FutureWarning: Indexing with multiple keys (implicitly converted to a tuple of keys) will be deprecated, use a list instead." Passing a list of column names, [["lifeExp","gdpPercap"]], avoids it.
print(df.groupby(["year","continent"])[["lifeExp","gdpPercap"]].mean().reset_index()) # reset_index() turns the group keys back into ordinary columns
year continent lifeExp gdpPercap
0 1952 Africa 39.135500 1252.572466
1 1952 Americas 53.279840 4079.062552
2 1952 Asia 46.314394 5195.484004
3 1952 Europe 64.408500 5661.057435
4 1952 Oceania 69.255000 10298.085650
5 1957 Africa 41.266346 1385.236062
6 1957 Americas 55.960280 4616.043733
7 1957 Asia 49.318544 5787.732940
8 1957 Europe 66.703067 6963.012816
9 1957 Oceania 70.295000 11598.522455
10 1962 Africa 43.319442 1598.078825
11 1962 Americas 58.398760 4901.541870
12 1962 Asia 51.563223 5729.369625
13 1962 Europe 68.539233 8365.486814
14 1962 Oceania 71.085000 12696.452430
15 1967 Africa 45.334538 2050.363801
16 1967 Americas 60.410920 5668.253496
17 1967 Asia 54.663640 5971.173374
18 1967 Europe 69.737600 10143.823757
19 1967 Oceania 71.310000 14495.021790
20 1972 Africa 47.450942 2339.615674
21 1972 Americas 62.394920 6491.334139
22 1972 Asia 57.319269 8187.468699
23 1972 Europe 70.775033 12479.575246
24 1972 Oceania 71.910000 16417.333380
25 1977 Africa 49.580423 2585.938508
26 1977 Americas 64.391560 7352.007126
27 1977 Asia 59.610556 7791.314020
28 1977 Europe 71.937767 14283.979110
29 1977 Oceania 72.855000 17283.957605
30 1982 Africa 51.592865 2481.592960
31 1982 Americas 66.228840 7506.737088
32 1982 Asia 62.617939 7434.135157
33 1982 Europe 72.806400 15617.896551
34 1982 Oceania 74.290000 18554.709840
35 1987 Africa 53.344788 2282.668991
36 1987 Americas 68.090720 7793.400261
37 1987 Asia 64.851182 7608.226508
38 1987 Europe 73.642167 17214.310727
39 1987 Oceania 75.320000 20448.040160
40 1992 Africa 53.629577 2281.810333
41 1992 Americas 69.568360 8044.934406
42 1992 Asia 66.537212 8639.690248
43 1992 Europe 74.440100 17061.568084
44 1992 Oceania 76.945000 20894.045885
45 1997 Africa 53.598269 2378.759555
46 1997 Americas 71.150480 8889.300863
47 1997 Asia 68.020515 9834.093295
48 1997 Europe 75.505167 19076.781802
49 1997 Oceania 78.190000 24024.175170
50 2002 Africa 53.325231 2599.385159
51 2002 Americas 72.422040 9287.677107
52 2002 Asia 69.233879 10174.090397
53 2002 Europe 76.700600 21711.732422
54 2002 Oceania 79.740000 26938.778040
55 2007 Africa 54.806038 3089.032605
56 2007 Americas 73.608120 11003.031625
57 2007 Asia 70.728485 12473.026870
58 2007 Europe 77.648600 25054.481636
59 2007 Oceania 80.719500 29810.188275
(The same FutureWarning appears for this call when the columns are selected with single brackets.)
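The effect of grouping by two keys, and of `reset_index()`, is easier to follow on a handful of rows. In this sketch the numbers are made-up round values chosen so the means are easy to check by eye; they are not gapminder figures, and `toy`, `means`, and `flat` are names of my own.

```python
import pandas as pd

# Made-up illustrative values, not taken from the gapminder data.
toy = pd.DataFrame({
    "year": [1952, 1952, 1952, 2007, 2007, 2007],
    "continent": ["Asia", "Europe", "Asia", "Asia", "Europe", "Asia"],
    "lifeExp": [40.0, 60.0, 50.0, 70.0, 80.0, 72.0],
    "gdpPercap": [800.0, 5000.0, 1200.0, 3000.0, 30000.0, 5000.0],
})

# The inner list selects two columns from the grouped object.
means = toy.groupby(["year", "continent"])[["lifeExp", "gdpPercap"]].mean()

# The result is indexed by (year, continent); reset_index flattens it.
flat = means.reset_index()

print(means.index.names)      # the two group keys form a two-level index
print(flat.columns.tolist())  # after reset_index they are ordinary columns
print(flat)
```

The flat form is usually the more convenient one for further filtering or for saving to a file.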
print(df.groupby(["continent"])["country"].nunique())
continent
Africa 52
Americas 25
Asia 33
Europe 30
Oceania 2
Name: country, dtype: int64
print(df.groupby(["continent"])["country"].value_counts())
continent country
Africa Algeria 12
Angola 12
Benin 12
Botswana 12
Burkina Faso 12
..
Europe Switzerland 12
Turkey 12
United Kingdom 12
Oceania Australia 12
New Zealand 12
Name: country, Length: 142, dtype: int64
print(df.groupby(["continent"])["country"].unique())
continent
Africa [Algeria, Angola, Benin, Botswana, Burkina Fas...
Americas [Argentina, Bolivia, Brazil, Canada, Chile, Co...
Asia [Afghanistan, Bahrain, Bangladesh, Cambodia, C...
Europe [Albania, Austria, Belgium, Bosnia and Herzego...
Oceania [Australia, New Zealand]
Name: country, dtype: object
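The three calls above answer slightly different questions, which a tiny example may make clearer. This is a sketch with a hand-built table; `toy` and `by_continent` are names of my own, and the repeated rows stand in for the twelve yearly rows each country has in the real data.

```python
import pandas as pd

toy = pd.DataFrame({
    "continent": ["Oceania", "Oceania", "Oceania", "Africa", "Africa"],
    "country": ["Australia", "Australia", "New Zealand", "Chad", "Chad"],
})
by_continent = toy.groupby("continent")["country"]

print(by_continent.nunique())       # how many distinct countries per continent
print(by_continent.value_counts())  # how many rows per (continent, country) pair
print(by_continent.unique())        # the distinct country names themselves
```

In the gapminder output every `value_counts()` entry is 12 because each country has one row per survey year.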
import matplotlib
print(df.groupby("year")["lifeExp"].mean().plot())
---------------------------------------------------------------------------
ModuleNotFoundError Traceback (most recent call last)
<ipython-input-112-3be0cbb50fbe> in <module>
----> 1 import matplotlib
2 print(df.groupby("year")["lifeExp"].mean().plot())
ModuleNotFoundError: No module named 'matplotlib'
The ModuleNotFoundError only means that matplotlib is not installed in this environment. Once it is installed (for example with pip install matplotlib), the same call draws a line chart of mean life expectancy per year.