Chapter 6: Introducing Optimization
Note: these notes are based on *Neural Networks from Scratch in Python* and the author's YouTube series.
I have already written several introductory posts on neural networks, so I won't repeat that material here; the links are below.
1. A Glimpse into Neural Networks
2. Introduction to Neural Networks (Part 1)
3. Introduction to Neural Networks (Part 2)
The previous chapters covered:
1. Coding Our First Neurons
2. Adding Hidden Layers
3. Activation Functions
4. Calculating Network Error with Loss
Now that the network is built, we can pass data through it and compute a loss; the next step is to figure out how to adjust the weights and biases to decrease that loss.
Finding an intelligent way to adjust the weights and biases of the neurons' inputs so as to minimize loss is the central difficulty of training a neural network.
As a first attempt, let's update the network's weights and biases with random adjustments and keep the best combination found so far.
import numpy as np
import nnfs
from nnfs.datasets import spiral_data
nnfs.init()
# Dense layer connecting two layers of neurons
class Layer_Dense:
    # Initialize weights and biases
    def __init__(self, n_inputs, n_neurons):
        # weights(2,3): index 00 is the weight between the first input neuron and the first hidden neuron
        #        0      1      2
        # 0   0.013  0.016  0.011
        # 1   0.006  0.006  0.046
        self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        # biases(1,3): the biases of the 3 hidden neurons
        #      0    1    2
        # 0  0.0  0.0  0.0
        self.biases = np.zeros((1, n_neurons))

    # Forward pass between the two layers: inputs*weights + biases
    def forward(self, inputs):
        # inputs(300,2) dot weights(2,3) + biases(1,3): multiplying the whole batch of 300 samples
        # by the weight matrix performs the forward pass for the entire batch at once
        # outputs(300,3): one value per neuron for each of the 300 samples
        self.outputs = np.dot(inputs, self.weights) + self.biases
# Hidden-layer activation function: ReLU
class Activation_ReLU:
    def forward(self, inputs):
        self.outputs = np.maximum(0, inputs)

# Output-layer activation function: Softmax
class Activation_Softmax:
    def forward(self, inputs):
        # Exponentiate; subtracting each row's maximum prevents the exponentials from overflowing
        exp_vals = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize by the per-row sum of the exponentials to get probabilities
        probabilities = exp_vals / np.sum(exp_vals, axis=1, keepdims=True)
        self.outputs = probabilities
# Compute the average loss over the batch
class Loss:
    def calculate(self, y_pred, y):
        # Per-sample cross-entropy losses
        sample_losses = self.forward(y_pred, y)
        # Mean loss over the batch
        average_loss = np.mean(sample_losses)
        return average_loss

# Categorical cross-entropy
class cross_entropy(Loss):
    def forward(self, y_pred, y_true):
        # Clip predictions to [1e-7, 1 - 1e-7] (0.0000001 to 0.9999999) to avoid taking log(0)
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
        # Pick out the predicted confidence for the true class of each sample,
        # handling both formats of y_true
        samples = len(y_pred)
        if len(y_true.shape) == 1:
            # y_true is a 1-D array of class indices:
            # range(samples) generates the row indices 0..299, y_true selects the column,
            # giving a 1-D array correct_confidences with each sample's confidence for its true class
            correct_confidences = y_pred_clipped[range(samples), y_true]
        elif len(y_true.shape) == 2:
            # y_true is one-hot encoded: mask out everything except the true class
            correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)
        # Cross-entropy = negative log-likelihood of each sample
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods
# Dataset
# X (300,2); y (300,) -- these labels are the len(y_true.shape) == 1 case
'''
X holds the (x, y) coordinates of each point
X      x value    y value
0      0.00299,   0.00964
1      0.01288,   0.01556
y holds the class index of each point
y      class index
0      0
1      0
2      1
'''
X, y = spiral_data(samples=100, classes=3)  # X: data, y: ground-truth labels
# Layer from the input layer to the hidden layer
dense1 = Layer_Dense(2, 3)
# Hidden-layer activation
relu = Activation_ReLU()
# Layer from the hidden layer to the output layer
dense2 = Layer_Dense(3, 3)
# Output-layer activation
softmax = Activation_Softmax()
# Instantiate the cross-entropy loss
loss_function = cross_entropy()
# Initialize the lowest loss seen so far
lowest_loss = 9999
# Copies of the weights and biases that produced the lowest loss, updated as we search
best_dense1_weights = dense1.weights.copy()
best_dense1_biases = dense1.biases.copy()
best_dense2_weights = dense2.weights.copy()
best_dense2_biases = dense2.biases.copy()
for iteration in range(9999):
    # Randomly nudge the weights and biases
    dense1.weights += 0.05 * np.random.randn(2, 3)
    dense1.biases += 0.05 * np.random.randn(1, 3)
    dense2.weights += 0.05 * np.random.randn(3, 3)
    dense2.biases += 0.05 * np.random.randn(1, 3)
    # Forward pass of the data through the hidden layer
    dense1.forward(X)
    relu.forward(dense1.outputs)
    # Feed the activated hidden-layer values into the output layer
    dense2.forward(relu.outputs)
    softmax.forward(dense2.outputs)
    # Compute the mean loss
    loss = loss_function.calculate(softmax.outputs, y)
    # Compute the accuracy
    prediction = np.argmax(softmax.outputs, axis=1)
    accuracy = np.mean(prediction == y)
    if loss < lowest_loss:
        # Print iteration number, loss, and accuracy
        print('New set of weights found, iteration:', iteration, 'loss:', loss, 'accuracy:', accuracy)
        # Save the weights and biases that produced this new lowest loss
        best_dense1_weights = dense1.weights.copy()
        best_dense1_biases = dense1.biases.copy()
        best_dense2_weights = dense2.weights.copy()
        best_dense2_biases = dense2.biases.copy()
        # Update the lowest loss
        lowest_loss = loss
    # If the current loss is higher than the lowest loss,
    # revert the network to the best weights and biases found so far
    else:
        dense1.weights = best_dense1_weights.copy()
        dense1.biases = best_dense1_biases.copy()
        dense2.weights = best_dense2_weights.copy()
        dense2.biases = best_dense2_biases.copy()
============
New set of weights found, iteration: 0 loss: 1.1008677 accuracy: 0.3333333333333333
New set of weights found, iteration: 1 loss: 1.0994315 accuracy: 0.3333333333333333
New set of weights found, iteration: 2 loss: 1.0991217 accuracy: 0.3333333333333333
New set of weights found, iteration: 3 loss: 1.0986339 accuracy: 0.3333333333333333
New set of weights found, iteration: 4 loss: 1.0986199 accuracy: 0.3333333333333333
New set of weights found, iteration: 5 loss: 1.0984716 accuracy: 0.36333333333333334
New set of weights found, iteration: 18 loss: 1.0983391 accuracy: 0.3333333333333333
New set of weights found, iteration: 27 loss: 1.0982698 accuracy: 0.3333333333333333
New set of weights found, iteration: 31 loss: 1.0982264 accuracy: 0.37333333333333335
New set of weights found, iteration: 35 loss: 1.0979562 accuracy: 0.3333333333333333
New set of weights found, iteration: 36 loss: 1.0977433 accuracy: 0.3433333333333333
New set of weights found, iteration: 37 loss: 1.0976934 accuracy: 0.3333333333333333
New set of weights found, iteration: 44 loss: 1.097596 accuracy: 0.3466666666666667
New set of weights found, iteration: 50 loss: 1.0973785 accuracy: 0.36333333333333334
New set of weights found, iteration: 51 loss: 1.0959908 accuracy: 0.3566666666666667
New set of weights found, iteration: 60 loss: 1.0959282 accuracy: 0.35333333333333333
New set of weights found, iteration: 65 loss: 1.0954362 accuracy: 0.38333333333333336
New set of weights found, iteration: 67 loss: 1.093989 accuracy: 0.4166666666666667
New set of weights found, iteration: 71 loss: 1.0926254 accuracy: 0.37666666666666665
New set of weights found, iteration: 79 loss: 1.0921575 accuracy: 0.35333333333333333
New set of weights found, iteration: 94 loss: 1.0918257 accuracy: 0.4166666666666667
New set of weights found, iteration: 101 loss: 1.0914664 accuracy: 0.38666666666666666
New set of weights found, iteration: 102 loss: 1.0909607 accuracy: 0.38333333333333336
New set of weights found, iteration: 103 loss: 1.0906307 accuracy: 0.35333333333333333
New set of weights found, iteration: 106 loss: 1.089146 accuracy: 0.4166666666666667
New set of weights found, iteration: 113 loss: 1.0891142 accuracy: 0.37666666666666665
New set of weights found, iteration: 115 loss: 1.088237 accuracy: 0.36333333333333334
New set of weights found, iteration: 120 loss: 1.0880405 accuracy: 0.39
New set of weights found, iteration: 129 loss: 1.0874124 accuracy: 0.42333333333333334
New set of weights found, iteration: 140 loss: 1.087239 accuracy: 0.4266666666666667
New set of weights found, iteration: 157 loss: 1.0870038 accuracy: 0.42
New set of weights found, iteration: 163 loss: 1.0870035 accuracy: 0.38666666666666666
New set of weights found, iteration: 172 loss: 1.0862479 accuracy: 0.4266666666666667
New set of weights found, iteration: 175 loss: 1.0861241 accuracy: 0.41
New set of weights found, iteration: 179 loss: 1.0860893 accuracy: 0.3466666666666667
New set of weights found, iteration: 186 loss: 1.0853186 accuracy: 0.37666666666666665
New set of weights found, iteration: 190 loss: 1.0852814 accuracy: 0.42
New set of weights found, iteration: 191 loss: 1.0846506 accuracy: 0.42
New set of weights found, iteration: 203 loss: 1.0842136 accuracy: 0.42333333333333334
New set of weights found, iteration: 204 loss: 1.084184 accuracy: 0.3566666666666667
New set of weights found, iteration: 214 loss: 1.0837997 accuracy: 0.37666666666666665
New set of weights found, iteration: 218 loss: 1.0836842 accuracy: 0.4166666666666667
New set of weights found, iteration: 235 loss: 1.0836092 accuracy: 0.43333333333333335
New set of weights found, iteration: 238 loss: 1.0832268 accuracy: 0.38666666666666666
New set of weights found, iteration: 241 loss: 1.0831857 accuracy: 0.4033333333333333
New set of weights found, iteration: 246 loss: 1.0826017 accuracy: 0.38333333333333336
New set of weights found, iteration: 250 loss: 1.0825759 accuracy: 0.4033333333333333
New set of weights found, iteration: 254 loss: 1.0817988 accuracy: 0.38
New set of weights found, iteration: 282 loss: 1.0817244 accuracy: 0.38
New set of weights found, iteration: 286 loss: 1.0810702 accuracy: 0.41
New set of weights found, iteration: 288 loss: 1.0806731 accuracy: 0.37333333333333335
New set of weights found, iteration: 314 loss: 1.0806231 accuracy: 0.4066666666666667
New set of weights found, iteration: 340 loss: 1.080356 accuracy: 0.4
New set of weights found, iteration: 578 loss: 1.080259 accuracy: 0.4033333333333333
New set of weights found, iteration: 630 loss: 1.0802449 accuracy: 0.4166666666666667
New set of weights found, iteration: 877 loss: 1.0801865 accuracy: 0.4166666666666667
New set of weights found, iteration: 901 loss: 1.0801494 accuracy: 0.43
New set of weights found, iteration: 935 loss: 1.0800657 accuracy: 0.41333333333333333
New set of weights found, iteration: 978 loss: 1.0799247 accuracy: 0.42
New set of weights found, iteration: 1049 loss: 1.0798801 accuracy: 0.3933333333333333
New set of weights found, iteration: 1092 loss: 1.0797858 accuracy: 0.38666666666666666
New set of weights found, iteration: 1103 loss: 1.0795524 accuracy: 0.4033333333333333
New set of weights found, iteration: 1159 loss: 1.0795078 accuracy: 0.39666666666666667
New set of weights found, iteration: 1434 loss: 1.079379 accuracy: 0.41
New set of weights found, iteration: 1944 loss: 1.0793691 accuracy: 0.42
New set of weights found, iteration: 1967 loss: 1.0792985 accuracy: 0.4066666666666667
New set of weights found, iteration: 3281 loss: 1.0792687 accuracy: 0.42
New set of weights found, iteration: 4016 loss: 1.0792656 accuracy: 0.39666666666666667
New set of weights found, iteration: 4309 loss: 1.0792212 accuracy: 0.4033333333333333
New set of weights found, iteration: 5157 loss: 1.0791875 accuracy: 0.3933333333333333
New set of weights found, iteration: 5415 loss: 1.0790575 accuracy: 0.39
From this output we can see that although the loss decreases, the accuracy fluctuates up and down. Clearly, updating weights and biases by random perturbation is not a viable strategy; we need a different method for updating them.
Chapter 7: Derivatives
Randomly changing the weights and biases and searching for the best set has little effect, mainly because the number of possible combinations is infinite; hunting for the optimal combination this way is like looking for a needle in a haystack and is far too inefficient.
Each weight and bias affects the loss to a different degree, and that effect depends on the parameters themselves as well as on the current input sample (the input to the first layer):
The inputs are multiplied by the weights, the biases are added, and an activation function applies a nonlinear mapping to the result; the output of one layer becomes the input of the next, all the way to the output layer, whose output is compared with the ground truth to produce the loss. This means that both the parameters (the weights and bias of every neuron) and the input samples influence the loss, which is why we compute a separate loss value for each input sample. The effect of a weight or bias on the loss is not necessarily linear. To know how to adjust the weights and biases, we first need to understand how each parameter affects the loss.
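To make this concrete, here is a minimal sketch (my own, not from the book) that can be run after the Chapter 6 code above. It nudges a single weight in dense1 and measures how the average loss responds; full_forward_loss is a small helper defined just for this check.

# Minimal sketch: perturb a single weight and observe the change in loss.
# Reuses dense1, relu, dense2, softmax, loss_function, X and y defined above.
def full_forward_loss():
    dense1.forward(X)
    relu.forward(dense1.outputs)
    dense2.forward(relu.outputs)
    softmax.forward(dense2.outputs)
    return loss_function.calculate(softmax.outputs, y)

base_loss = full_forward_loss()
dense1.weights[0, 0] += 0.01  # nudge one weight
new_loss = full_forward_loss()
dense1.weights[0, 0] -= 0.01  # restore it
print('change in loss from nudging weights[0, 0]:', new_loss - base_loss)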
The Impact of a Parameter on the Output
For a linear function, how do we describe the effect of the input x on the output y? With the slope:
$$\frac{Change\ in\ y}{Change\ in\ x}=\frac{\Delta y}{\Delta x}$$
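For instance, with a made-up linear function f(x) = 2x + 1 (my own example, not from the book), any two distinct points give the same ratio, so the slope fully describes the effect of x on y:

def linear_f(x):
    return 2 * x + 1  # example linear function; its slope is 2 everywhere

# For a linear function, any pair of distinct points yields the same slope
for x1, x2 in [(0, 1), (1, 3), (-2, 5)]:
    slope = (linear_f(x2) - linear_f(x1)) / (x2 - x1)
    print(f'slope between x={x1} and x={x2}:', slope)  # prints 2.0 each time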
For a nonlinear function, how do we describe the effect of the input x on the output y? The slope of a nonlinear function is not constant: a slope computed from the differences in x and y between two points only describes the behaviour between those two points. That is fine for a linear function, whose slope never changes, but a nonlinear function behaves differently at every point.
The solution is to bring the two points infinitely close together, so the slope describes an infinitesimally small interval, which we can treat as a single point.
def f(x):
    return 2 * x ** 2

# In principle, an infinitesimally small delta would approximate the exact derivative;
# in practice, delta must remain numerically stable. It cannot be too small, because it
# could be rounded to 0 given the limits of Python's floating-point precision (and, as we
# know, dividing by 0 is "illegal"). Our solution is therefore a compromise between
# approximating the derivative and staying numerically stable, which introduces
# this small but visible error.
delta = 0.0001
x1 = 1
x2 = x1 + delta
y1 = f(x1)
y2 = f(x2)
slope = (y2 - y1) / (x2 - x1)
print(slope)
=========
# The derivative of f(x)=2x^2 is 4x, so its value at x=1 is 4.
# The slope of the "secant" line approximating the tangent comes out as:
4.0001999999987845  # very close to the true value
The "derivative" tells us how strongly the function value y is affected when x changes; in other words, the derivative $f'(x)=dy/dx$ quantifies that influence.
In code, we compute the derivative by numerical differentiation: approximating the slope of the tangent line using two points that are extremely close together.
import numpy as np
import matplotlib.pyplot as plt

def f(x):
    return 2 * x ** 2

x = np.arange(0, 5, 0.0001)  # array from 0 to 5 with step 0.0001
y = f(x)
plt.plot(x, y)  # plot the function 2x^2

# Plot the (approximate) tangent line
delta = 0.0001
x1 = 2
x2 = x1 + delta
y1 = f(x1)
y2 = f(x2)
print((x1, y1), (x2, y2))
# Tangent line y = mx + b, slope m
numerical_derivative = (y2 - y1) / (x2 - x1)
# Tangent line y = mx + b, so b = y - mx
b = y2 - numerical_derivative * x2  # intercept of the tangent line
def tangent_line(i):
    return numerical_derivative * i + b  # value of the tangent line at i

to_plot = [x1 - 0.9, x1, x1 + 0.9]  # x coordinates of three points on the tangent line
plt.plot(to_plot, [tangent_line(i) for i in to_plot])  # plot connects the three points into a straight line
print('Approximate derivative for f(x)', f'where x = {x1} is {numerical_derivative}')  # approximate derivative at x1
plt.show()
========
(2, 8) (2.0001, 8.000800020000002)
Approximate derivative for f(x) where x = 2 is 8.000199999998785
Chapter 8: Gradients, Partial Derivatives, and the Chain Rule
As mentioned in Chapter 1, a neural network is essentially a deeply nested composite function, in which the output of one layer is the input of the next and the final output is compared with the ground truth to give the loss. To know how each weight and bias affects the loss, we take partial derivatives (the derivatives of a multivariate function) through this composite function layer by layer using the chain rule, working backward from the outermost function (the loss at the output layer) toward the innermost one (the input layer).
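As a toy numerical check of the chain rule (my own example, not from the book): take y = f(g(x)) with g(x) = 3x and f(u) = u², so the chain rule gives dy/dx = f'(g(x)) · g'(x) = 2·(3x)·3 = 18x.

# Toy chain-rule check: y = f(g(x)) with g(x) = 3x and f(u) = u^2
def g(x):
    return 3 * x

def f_outer(u):
    return u ** 2

x1 = 2.0
delta = 0.0001
# Numerical derivative of the composition at x1
numerical = (f_outer(g(x1 + delta)) - f_outer(g(x1))) / delta
# Chain rule: dy/dx = f'(g(x)) * g'(x) = 2*g(x) * 3 = 18*x
analytical = 18 * x1
print(numerical, analytical)  # approximately 36.0009 vs 36.0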
A partial derivative measures how much a single input affects the function's output. Partial derivatives are computed the same way as the derivatives explained in the previous chapter; we simply repeat the process for each independent input.
Partial Derivatives of Multivariate Functions
A partial derivative is a single equation, while the full derivative of a multivariate function consists of a set of such equations called the gradient. In other words, the gradient is a vector, with one entry per input, containing the partial derivative with respect to each input.
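For example (a small standalone sketch with a made-up function f(x, y) = 2x + 3y², not from the book), each partial derivative is estimated numerically while holding the other input fixed, and the gradient is just the vector of those partials:

import numpy as np

def f(x, y):
    return 2 * x + 3 * y ** 2  # example function of two inputs

x, y = 1.0, 2.0
delta = 0.0001
# Partial derivative with respect to x (y held fixed); analytically 2
df_dx = (f(x + delta, y) - f(x, y)) / delta
# Partial derivative with respect to y (x held fixed); analytically 6y = 12
df_dy = (f(x, y + delta) - f(x, y)) / delta
gradient = np.array([df_dx, df_dy])
print(gradient)  # approximately [ 2. 12.]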
The Chain Rule
In the following chapters, we will combine the gradient of the multivariate loss function with the chain rule to run gradient descent, driving the loss down and finding the weights and biases that minimize it.