
A Deep Dive into Data in Pandas: A Guide to Using Series and DataFrame

article 2024/10/25 14:19:10


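This article introduces the two core data structures of the Pandas data analysis library: Series and DataFrame. A Series is a one-dimensional structure, similar to a Python list or dictionary, while a DataFrame is a two-dimensional, table-like structure with row and column labels, which makes data manipulation more intuitive and flexible. You will learn how to create a Series and a DataFrame, how to set custom indexes, how to convert data from other formats (such as JSON and NumPy arrays) into Pandas objects, and how to convert Pandas objects back into NumPy arrays. A complete code example is included to help you get started with data processing in Pandas quickly.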

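Table of Contents

- I. Introduction
- II. Importing the Libraries
- III. The Series (Data Sequence)
  - 1 Creating a Series
  - 2 Custom Index
  - 3 Converting a Dictionary to a Series
  - 4 Converting a NumPy Array to a Series
  - 5 Converting a Series to a List and a NumPy Array
- IV. The DataFrame (Data Table)
  - 1 Converting a 2D Array to a DataFrame
  - 2 Custom Column Names
  - 3 Series to DataFrame
  - 4 Combining Two Series into a DataFrame
  - 5 Custom (Non-Default) Indexes
  - 6 Getting the Index Values and Column Names
  - 7 Converting JSON Data to a DataFrame
- V. Converting a DataFrame to a NumPy Array
- VI. Summary
- VII. Complete Code Example
- VIII. Source Code Repository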

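I. Introduction

In Pandas, a Series is a one-dimensional data structure, similar to a list or a dictionary, while a DataFrame is a two-dimensional, table-like data structure made up of multiple rows and columns.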

II. Importing the Libraries

Before using Pandas, import the required libraries:

import pandas as pd
import numpy as np

III. The Series (Data Sequence)

A Series is the Pandas object for storing one-dimensional data. It can be created from a list, a dictionary, or a NumPy array.

1 Creating a Series

l = [11, 22, 33]
s = pd.Series(l)
print("List:", l)
print("Series:", s)
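When no index is passed, pandas assigns a default integer index (a RangeIndex starting at 0) and infers the dtype from the data. The following is a minimal sketch, assuming a recent pandas release, that inspects both and reads a value by position:

```python
import pandas as pd

# Build a Series from a plain list; no index is supplied
s = pd.Series([11, 22, 33])

# The default index is a RangeIndex: 0, 1, 2
print(type(s.index).__name__)  # RangeIndex
print(list(s.index))           # [0, 1, 2]

# pandas infers an integer dtype from the Python ints
print(s.dtype)                 # int64

# .iloc selects by integer position
print(s.iloc[0])               # 11
```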

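2 Custom Index

You can assign a custom index to a Series so that its values can be accessed by meaningful labels instead of integer positions.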

s = pd.Series(l, index=["a", "b", "c"])
print(s)
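Once a Series has custom labels, the same value can be reached either by label or by position. A short sketch contrasting `.loc` (label-based) with `.iloc` (position-based):

```python
import pandas as pd

s = pd.Series([11, 22, 33], index=["a", "b", "c"])

# Label-based lookup uses the custom index
print(s.loc["b"])               # 22

# Position-based lookup ignores the labels
print(s.iloc[1])                # 22

# Label slices with .loc include both endpoints
print(s.loc["a":"b"].tolist())  # [11, 22]
```

Note the slicing difference: a `.loc` slice includes its end label, whereas an `.iloc` slice, like a Python list slice, excludes its end position.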

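3 Converting a Dictionary to a Series

When a Series is created from a dictionary, the dictionary's keys become the index and its values become the data.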

s = pd.Series({"a": 11, "b": 22, "c": 33})
print(s)
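If an `index` is passed together with a dictionary, pandas uses it to select and order the keys; a label that is not a key gets a missing value (NaN), which in turn makes the integer data float. The labels below are example data:

```python
import pandas as pd

data = {"a": 11, "b": 22, "c": 33}

# index= picks keys "c" and "a" in that order; "d" is not a key, so it becomes NaN
s = pd.Series(data, index=["c", "a", "d"])
print(s)
```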

4 Converting a NumPy Array to a Series

A NumPy array can be converted to a Series, optionally with a custom index.

s = pd.Series(np.random.rand(3), index=["a", "b", "c"])
print(s)
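Because `np.random.rand` produces different numbers on every run, a seeded generator can make the example reproducible. This sketch assumes NumPy 1.17 or later (where `np.random.default_rng` exists); the seed value 0 is arbitrary. It also shows that the index length must match the array length:

```python
import numpy as np
import pandas as pd

# A seeded Generator returns the same "random" values on every run
rng = np.random.default_rng(seed=0)
s = pd.Series(rng.random(3), index=["a", "b", "c"])
print(s)

# Three values but only two labels: pandas raises ValueError
try:
    pd.Series(rng.random(3), index=["a", "b"])
    mismatch_raised = False
except ValueError:
    mismatch_raised = True
print(mismatch_raised)  # True
```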

5 Converting a Series to a List and a NumPy Array

A Series can be converted to a Python list or a NumPy array:

print("Array:", s.to_numpy())
print("List:", s.tolist())
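The two conversions return different kinds of objects: `to_numpy()` gives an ndarray that keeps a NumPy dtype, while `tolist()` gives a list of native Python scalars. In both cases the index labels are discarded. A small check:

```python
import numpy as np
import pandas as pd

s = pd.Series([11, 22, 33], index=["a", "b", "c"])

arr = s.to_numpy()  # NumPy ndarray; the labels "a", "b", "c" are dropped
lst = s.tolist()    # Python list of built-in Python scalars

print(type(arr).__name__, arr.dtype)  # ndarray int64
print(type(lst[0]).__name__)          # int
```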

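IV. The DataFrame (Data Table)

A DataFrame is the two-dimensional data structure in Pandas. It can be created from lists, dictionaries, Series, and other sources.

1 Converting a 2D Array to a DataFrame

A two-dimensional array (here, a nested list) can be converted to a DataFrame: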

df = pd.DataFrame([
    [1, 2],
    [3, 4]
])
print(df)
# Select a single value: row label 0, column label 1
print(df.at[0, 1])
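Strictly speaking, `.at` looks up values by row and column label; with the default integer index the labels happen to equal the positions. With explicit labels the difference shows, and `.iat` is the position-based counterpart. The labels `x`, `y`, `p`, `q` are example names:

```python
import numpy as np
import pandas as pd

# Build from a NumPy 2D array and give rows and columns explicit labels
df = pd.DataFrame(np.array([[1, 2], [3, 4]]), index=["x", "y"], columns=["p", "q"])

# .at: one value by row label and column label
print(df.at["x", "q"])  # 2

# .iat: one value by integer position, ignoring labels
print(df.iat[0, 1])     # 2
```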

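2 Custom Column Names

You can create a DataFrame from a dictionary; each key becomes a column name and each value becomes that column's data: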

df = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]})
print(df)
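Column names can also be attached to row-oriented data through the `columns` parameter. This sketch builds the same table both ways and checks that the results are identical:

```python
import pandas as pd

# Row-oriented data with explicitly named columns
df_rows = pd.DataFrame([[1, 2], [3, 4]], columns=["col1", "col2"])

# Column-oriented data from a dict, as in the text above
df_dict = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]})

print(df_rows.equals(df_dict))  # True
```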

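3 Series to DataFrame

A DataFrame is a collection of columns, and each column is a Series: selecting a single column from a DataFrame returns a Series object:

print(df["col1"], "\n")
print("Type after extraction:", type(df["col1"]))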
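The conversion also works in the other direction. A minimal sketch: selecting with a list of column names keeps the result a DataFrame, and `Series.to_frame()` wraps a Series as a one-column DataFrame whose column takes the Series name:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]})

# Single brackets -> Series; double brackets (a list of names) -> DataFrame
col_series = df["col1"]
col_frame = df[["col1"]]

# Series -> one-column DataFrame; the Series name becomes the column name
s = pd.Series([1, 3], name="col1")
frame_from_series = s.to_frame()

print(type(col_series).__name__)        # Series
print(type(col_frame).__name__)         # DataFrame
print(list(frame_from_series.columns))  # ['col1']
```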

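4 Combining Two Series into a DataFrame

Two Series can be combined into one DataFrame by passing them as the values of a dictionary; the keys become the column names: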

df = pd.DataFrame({"col1": pd.Series([1, 3]), "col2": pd.Series([2, 4])})
print(df)
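When the Series carry their own indexes, pandas aligns rows by label rather than by position. The sketch below uses `pd.concat(..., axis=1)`, a different function from the dict constructor above that also aligns on the index; the labels are example data chosen so that the indexes only partly overlap:

```python
import pandas as pd

# Two named Series whose indexes only partly overlap
s1 = pd.Series([1, 3], index=["a", "b"], name="col1")
s2 = pd.Series([2, 4], index=["b", "c"], name="col2")

# axis=1 puts each Series in its own column; labels missing from one Series become NaN
df = pd.concat([s1, s2], axis=1)
print(df)
```

Each Series' `name` becomes its column name, and row `b`, the only shared label, is the only row with both values filled in.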

5 Custom (Non-Default) Indexes

Both a Series and a DataFrame can be given a custom index:

s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
df = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]}, index=["a", "b"])
print(s, "\n")
print(df)
# Print the index and the column names
print(df.index, df.columns)
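An index does not have to contain strings or integers. As one illustration of a more specialized index (an example chosen here, not one from the original text), a DatetimeIndex built with `pd.date_range` lets you look up values by timestamp:

```python
import pandas as pd

# An index of three consecutive days instead of string labels
dates = pd.date_range("2024-01-01", periods=3, freq="D")
s = pd.Series([1.0, 2.0, 3.0], index=dates)

print(type(s.index).__name__)             # DatetimeIndex
print(s.loc[pd.Timestamp("2024-01-02")])  # 2.0
```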

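6 Getting the Index Values and Column Names

The index and column names of a DataFrame can be retrieved as follows:

print("Index:", df.index, "\n")
print("Columns:", df.columns)
print("Index type:", type(df.index), "\n")
print("Columns type:", type(df.columns))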
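Both `df.index` and `df.columns` are `Index` objects, which can be converted to plain lists when needed. Column labels can be changed with `DataFrame.rename`, which returns a new DataFrame by default. The new names `first` and `second` are examples:

```python
import pandas as pd

df = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]}, index=["a", "b"])

# Index objects convert to plain Python lists
print(df.index.tolist())    # ['a', 'b']
print(df.columns.tolist())  # ['col1', 'col2']

# rename returns a relabeled copy; df itself keeps its original columns
renamed = df.rename(columns={"col1": "first", "col2": "second"})
print(renamed.columns.tolist())  # ['first', 'second']
```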

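7 Converting JSON Data to a DataFrame

JSON-style data, here a list of dictionaries in which each dictionary is one row, can be converted to a DataFrame; the keys become the column names: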

my_json_data = [
    {"age": 12, "height": 111},
    {"age": 13, "height": 123}
]
df = pd.DataFrame(my_json_data, index=["jack", "rose"])
print(df)
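The `my_json_data` above is already a Python list. If the data arrives as JSON text (for example from a file or an HTTP response), the standard-library `json.loads` can parse it first; the JSON string below is example data mirroring the list above:

```python
import json

import pandas as pd

# JSON text, e.g. as read from a file (example data)
raw = '[{"age": 12, "height": 111}, {"age": 13, "height": 123}]'

# json.loads turns the text into a list of dicts, which DataFrame accepts directly
records = json.loads(raw)
df = pd.DataFrame(records, index=["jack", "rose"])
print(df)
```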

V. Converting a DataFrame to a NumPy Array

A DataFrame can easily be converted to a NumPy array; the row and column labels are dropped:

df = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]}, index=["a", "b"])
print(df.to_numpy())
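A NumPy array has a single dtype, so `to_numpy()` must find one that fits every column. A short sketch of how mixed columns are handled, and how to request a dtype explicitly:

```python
import pandas as pd

# Integer + float columns: the common dtype is float64
mixed_num = pd.DataFrame({"a": [1, 2], "b": [0.5, 1.5]})
print(mixed_num.to_numpy().dtype)  # float64

# A string column forces a generic object array
mixed_obj = pd.DataFrame({"a": [1, 2], "name": ["x", "y"]})
print(mixed_obj.to_numpy().dtype)  # object

# The target dtype can also be chosen explicitly
print(mixed_num.to_numpy(dtype="float32").dtype)  # float32
```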

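VI. Summary

Pandas provides two flexible and powerful data structures, Series and DataFrame, which make storing, processing, and analyzing data simple and convenient. A Series handles one-dimensional data, while a DataFrame is the tool of choice for two-dimensional, tabular data. After working through the examples above, you should have a solid grasp of the basic data operations in Pandas.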

VII. Complete Code Example

import pandas as pd
import numpy as np


def print_hi(name):
    print(f'Hi, {name}')
    # Outline: Series (create, convert to NumPy); DataFrame (create, convert to NumPy)

    # Series: a one-dimensional data sequence
    l = [11, 22, 33]
    s = pd.Series(l)
    print("list:", l)
    print("series:", s)
    # Custom index
    s = pd.Series(l, index=["a", "b", "c"])
    print(s)
    # Dictionary to Series
    s = pd.Series({"a": 11, "b": 22, "c": 33})
    print(s)
    # NumPy array to Series
    s = pd.Series(np.random.rand(3), index=["a", "b", "c"])
    print(s)
    # Series to NumPy array and list
    print("array:", s.to_numpy())
    print("list:", s.tolist())

    # DataFrame: a two-dimensional data table
    # Turn a 2D array into a Pandas DataFrame
    df = pd.DataFrame([
        [1, 2],
        [3, 4]
    ])
    print(df)
    # Select one value: row label 0, column label 1
    # (with the default index, labels equal positions: 0 on the first axis, 1 on the second)
    print(df.at[0, 1])
    # Build from a dict: the keys become column names
    df = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]})
    print(df)
    # Taking one column out of a DataFrame gives a Series
    print(df["col1"], "\n")
    print("Type after extraction:", type(df["col1"]))
    # Combine two Series into a DataFrame
    df = pd.DataFrame({"col1": pd.Series([1, 3]), "col2": pd.Series([2, 4])})
    print(df)
    # Custom (non-default) indexes
    s = pd.Series([1.0, 2.0, 3.0], index=["a", "b", "c"])
    df = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]}, index=["a", "b"])
    print(s, "\n")
    print(df)
    # Index values and column names
    print("Index:", df.index, "\n")
    print("Columns:", df.columns)
    print("Index type:", type(df.index), "\n")
    print("Columns type:", type(df.columns))
    # JSON-style records to DataFrame
    my_json_data = [
        {"age": 12, "height": 111},
        {"age": 13, "height": 123}
    ]
    print(pd.DataFrame(my_json_data, index=["jack", "rose"]))
    # DataFrame to NumPy array
    df = pd.DataFrame({"col1": [1, 3], "col2": [2, 4]}, index=["a", "b"])
    print(df.to_numpy())
    # In Pandas, a Series holds one-dimensional data and a DataFrame holds two-dimensional data


if __name__ == '__main__':
    print_hi('What is data in Pandas')

Copy and paste this code over the contents of your main.py and run it. The output is shown below (the three random values will differ on each run):

Hi, What is data in Pandas
list: [11, 22, 33]
series: 0    11
1    22
2    33
dtype: int64
a    11
b    22
c    33
dtype: int64
a    11
b    22
c    33
dtype: int64
a    0.178483
b    0.084620
c    0.767404
dtype: float64
array: [0.17848271 0.08462048 0.76740383]
list: [0.17848271435988583, 0.08462047778035053, 0.7674038307556463]
   0  1
0  1  2
1  3  4
2
   col1  col2
0     1     2
1     3     4
0    1
1    3
Name: col1, dtype: int64 

Type after extraction: <class 'pandas.core.series.Series'>
   col1  col2
0     1     2
1     3     4
a    1.0
b    2.0
c    3.0
dtype: float64 

   col1  col2
a     1     2
b     3     4
Index: Index(['a', 'b'], dtype='object') 

Columns: Index(['col1', 'col2'], dtype='object')
Index type: <class 'pandas.core.indexes.base.Index'> 

Columns type: <class 'pandas.core.indexes.base.Index'>
      age  height
jack   12     111
rose   13     123
[[1 2]
 [3 4]]