当前位置：首页 > article >正文

《Python深度学习》第二讲：深度学习的数学基础

article 2025/3/17 18:22:09

本讲来聊聊深度学习的数学基础。

深度学习听起来很厉害，其实它背后是一些很有趣的数学原理。本讲会用简单的方式解释这些原理，还会用一些具体的例子来帮助你理解。

2.1 初识神经网络

先从一个简单的任务开始：识别手写数字。

想象一下，你有一堆手写数字的图片，你想让计算机识别出这些数字。这听起来是不是有点像魔法？其实，这就是深度学习能做的事情。我们用一个简单的神经网络来解决这个问题。

我们用的是MNIST数据集，这是一个很经典的数据集，包含60,000张训练图片和10,000张测试图片，每张图片都是一个28×28的灰度图像。我们的目标是训练一个神经网络，让它能够识别这些数字。

from keras.datasets import mnist
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

我们来看看这些数据。

train_images是一个形状为(60000, 28, 28)的3D张量，train_labels是一个长度为60000的数组，包含0到9的数字。test_images和test_labels也是类似的，但只有10,000张图片。

print(train_images.shape)  # 输出：(60000, 28, 28)
print(train_labels.shape)  # 输出：(60000,)

接下来，我们构建一个简单的神经网络。

这个网络包含两个Dense层，第一层有512个神经元，第二层有10个神经元，对应10个数字类别。我们使用ReLU激活函数和softmax激活函数，让网络能够输出每个数字的概率。

from keras import models
from keras import layers

network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

然后，我们需要编译网络。

我们选择rmsprop作为优化器，categorical_crossentropy作为损失函数，accuracy作为评估指标。

network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

在训练之前，我们需要对数据进行预处理。

我们需要将图片数据从整数转换为浮点数，并且将它们的形状从(28, 28)变为(28 * 28)。我们还需要将标签数据进行分类编码。

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

最后，我们开始训练网络。

我们用fit方法来训练网络，每次迭代都会计算损失值和准确率。我们训练5个epoch，每次处理128张图片。

network.fit(train_images, train_labels, epochs=5, batch_size=128)

训练完成后，我们在测试集上评估模型的性能。

test_loss, test_acc = network.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)

结果怎么样呢？

你会发现，这个简单的神经网络在测试集上的准确率可以达到98%左右。这说明深度学习真的很强大，它能够自动从数据中学习规律，然后在新数据上进行准确的预测。

2.2 神经网络的数据表示

好，下面我们来聊聊神经网络的数据表示。

在深度学习中，所有的数据都是以张量的形式存储的。张量就像是一个多维数组，它可以存储不同维度的数据。比如，标量是一个0维张量，它就是一个单独的数字；向量是一个1维张量，它是一组数字；矩阵是一个2维张量，它是一个数字的网格。在深度学习中，我们经常用到高维张量，比如一个4维张量可以用来表示一批图像数据。

我们来看一个具体的例子。

在MNIST数据集中，每张图片是一个28×28的灰度图像，它是一个2维张量。当我们把所有图片放在一起时，它就变成了一个3维张量。比如，train_images是一个形状为(60000, 28, 28)的3D张量，表示有60,000张图片，每张图片是一个28×28的矩阵。

print(train_images.shape)  # 输出：(60000, 28, 28)

张量的属性也很重要。

每个张量都有几个重要的属性：它的轴数（维度数）、形状（每个维度的大小）和数据类型（比如float32或uint8）。这些属性决定了张量的结构和存储方式。

print(train_images.ndim)  # 输出：3
print(train_images.shape)  # 输出：(60000, 28, 28)
print(train_images.dtype)  # 输出：uint8

在训练神经网络之前，我们需要对数据进行预处理。

我们需要将图片数据从整数转换为浮点数，并且将它们的形状从(28, 28)变为(28 * 28)。我们还需要将标签数据进行分类编码。

train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

为什么要这样做呢？

因为神经网络需要输入的数据是浮点数，并且数据的范围最好在0到1之间。这样可以让网络的学习过程更加稳定。分类编码则是为了让标签数据更适合神经网络的输出层。

2.3 神经网络的“齿轮”：张量运算

张量运算是神经网络的核心，它决定了数据如何在神经网络中流动和变换。我们来看几个具体的张量运算。

逐元素运算

逐元素运算是最简单的张量运算之一，它会对张量中的每个元素单独进行操作。比如，ReLU激活函数会将所有负值置为0，这可以帮助网络处理非线性问题。

import numpy as np

def naive_relu(x):
    assert len(x.shape) == 2
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] = max(x[i, j], 0)
    return x

广播机制

广播是一种非常巧妙的机制，它可以让不同形状的张量进行运算。比如，你可以将一个2D张量和一个1D张量相加，广播会自动将1D张量扩展到2D张量的形状，然后进行逐元素运算。

def naive_add_matrix_and_vector(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 1
    assert x.shape[1] == y.shape[0]
    x = x.copy()
    for i in range(x.shape[0]):
        for j in range(x.shape[1]):
            x[i, j] += y[j]
    return x

张量点积

点积是一种非常重要的张量运算，它会将两个张量的元素合并在一起。比如，矩阵乘法就是一种点积运算，它在神经网络中被广泛用于计算权重和输入的组合。

def naive_matrix_dot(x, y):
    assert len(x.shape) == 2
    assert len(y.shape) == 2
    assert x.shape[1] == y.shape[0]
    z = np.zeros((x.shape[0], y.shape[1]))
    for i in range(x.shape[0]):
        for j in range(y.shape[1]):
            row_x = x[i, :]
            column_y = y[:, j]
            z[i, j] = np.dot(row_x, column_y)
    return z

张量变形与转置

张量变形可以改变张量的形状，但不会改变它的数据。比如，你可以将一个3D张量变形为一个2D张量，以便进行矩阵运算。转置则是将矩阵的行和列互换，这在某些情况下非常有用。

x = np.array([[5, 78, 2, 34, 0],
              [6, 79, 3, 35, 1],
              [7, 80, 4, 36, 2]])
x = x.reshape((2, 3, 5))  # 变形为一个3D张量
x = np.transpose(x)  # 转置

2.4 神经网络的“引擎”：基于梯度的优化

想象你有一个函数，你想找到它的最小值。导数就像是一个指南针，它告诉你函数在某个点的变化方向。梯度则是导数在多维空间中的推广，它告诉你函数在每个方向上的变化速度。

梯度下降算法就像是一个登山者，它会沿着梯度的反方向一步一步地寻找函数的最小值。

每次迭代，它都会根据梯度的大小调整自己的位置，直到找到最小值。

def naive_gradient_descent(x, gradient, learning_rate):
    x -= learning_rate * gradient
    return x

随机梯度下降（SGD）是一种更高效的梯度下降算法，它每次只用一小部分数据来计算梯度。

这样可以大大减少计算量，同时让训练过程更快。

def naive_sgd(x, gradient, learning_rate):
    x -= learning_rate * gradient
    return x

链式求导与反向传播算法是深度学习的核心。

反向传播算法通过链式求导来计算每个参数的梯度。想象你有一个复杂的机器，你想调整它的每个部件来达到最好的效果。反向传播就像是一个工程师，它从输出端开始，逐层计算每个部件的调整方向，然后将这些调整方向传播回输入端。

def naive_backpropagation(x, y, learning_rate):
    # 这里是一个简化的反向传播过程
    gradient = np.dot(x.T, y)
    x = naive_gradient_descent(x, gradient, learning_rate)
    return x

2.5 回顾第一个例子

好，下面我们来回顾一下第一个例子。

我们用Keras框架实现了一个简单的神经网络，让它能够识别手写数字。我们加载了MNIST数据集，对数据进行了预处理，构建了一个包含两个Dense层的神经网络，编译了网络，训练了网络，并在测试集上评估了模型的性能。

我们来看看这个过程的具体代码。

from keras.datasets import mnist
from keras import models
from keras import layers

# 加载数据
(train_images, train_labels), (test_images, test_labels) = mnist.load_data()

# 数据预处理
train_images = train_images.reshape((60000, 28 * 28))
train_images = train_images.astype('float32') / 255

test_images = test_images.reshape((10000, 28 * 28))
test_images = test_images.astype('float32') / 255

from keras.utils import to_categorical

train_labels = to_categorical(train_labels)
test_labels = to_categorical(test_labels)

# 构建网络
network = models.Sequential()
network.add(layers.Dense(512, activation='relu', input_shape=(28 * 28,)))
network.add(layers.Dense(10, activation='softmax'))

# 编译网络
network.compile(optimizer='rmsprop',
                loss='categorical_crossentropy',
                metrics=['accuracy'])

# 训练网络
network.fit(train_images, train_labels, epochs=5, batch_size=128)

# 评估模型
test_loss, test_acc = network.evaluate(test_images, test_labels)
print('Test accuracy:', test_acc)

结果怎么样呢？