Andrew Ng Deep Learning: Building a Logistic Regression Classifier to Recognize Cats

article 2025/2/28 19:56:38

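This article is based on Assignment 2 of L1W2 in Andrew Ng's Deep Learning course and is for personal study only.
The theory is covered in the post "Andrew Ng Deep Learning: Basics of Neural Network Programming", which already explains the form of the functions used here, so this article does not repeat it.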

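Table of contents

Data download
- Required packages
- Walkthrough of the lr_utils file
Data preprocessing
- Loading the data
- Preprocessing the data
Building the model
- Choosing the function
- Computation graph
- Forward propagation
  - Parameter initialization
  - Implementing the sigmoid function
  - Implementing the cost function
- Backpropagation
  - Derivatives of the parameters
  - Gradient descent
- Prediction
- Wrapping it all together
Usage
- Training and testing
- Plotting the cost curve
- Prediction
Source code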

Data download

The data comes from the Baidu Netdisk link given in https://blog.csdn.net/u013733326/article/details/79639509.

Walkthrough of the lr_utils file

import numpy as np
import h5py

# Dimensions: 209 images, 64*64 pixels, 3 channel matrices
# print(train_set_x_orig.shape)
# # Number of training examples
# print(train_set_x_orig.shape[0])
# # Number of test examples
# print(test_set_x_orig.shape[0])
# # Image size
# print(train_set_x_orig.shape[1])


def load_dataset():
    # Location of the training set (an h5 file); a raw string keeps the backslashes literal
    train_dataset = h5py.File(r'E:\Git\python\25_py_01\0120\012002_wuenda\datasets\train_catvnoncat.h5', "r")
    # Training image data
    # 209 images of size 64*64, each with r, g, b matrices
    train_set_x_orig = np.array(train_dataset["train_set_x"][:])  # your train set features
    # Check the dimensions
    # print(train_set_x_orig.shape)

    # Training labels: 0 means not a cat, 1 means cat
    # This is a 1-D array of shape (209,), not a row vector
    train_set_y_orig = np.array(train_dataset["train_set_y"][:])  # your train set labels
    # print(train_set_y_orig.shape)

    # Location of the test set
    test_dataset = h5py.File(r'E:\Git\python\25_py_01\0120\012002_wuenda\datasets\test_catvnoncat.h5', "r")
    # Test image data
    test_set_x_orig = np.array(test_dataset["test_set_x"][:])  # your test set features
    # Test labels: 0 means not a cat, 1 means cat
    test_set_y_orig = np.array(test_dataset["test_set_y"][:])  # your test set labels

    # Two strings stored as bytes: [b'non-cat', b'cat']
    classes = np.array(test_dataset["list_classes"][:])  # the list of classes

    # The labels are 1-D, so reshape them into row vectors of shape (1, 209) and (1, 50)
    train_set_y_orig = train_set_y_orig.reshape((1, train_set_y_orig.shape[0]))
    # print(train_set_y_orig.shape)
    test_set_y_orig = test_set_y_orig.reshape((1, test_set_y_orig.shape[0]))
    # Return everything
    return train_set_x_orig, train_set_y_orig, test_set_x_orig, test_set_y_orig, classes

train_set_x_orig: the training images, 209 images of 64x64 pixels.
train_set_y_orig: the labels of the training images; 0 means not a cat, 1 means cat.
test_set_x_orig: the test images, 50 images of 64x64 pixels.
test_set_y_orig: the labels of the test images; 0 means not a cat, 1 means cat.
classes: two strings stored as bytes: [b'non-cat' b'cat'].

The following is implemented in l1w2_4.py.

Data preprocessing

Loading the data

# Load the data
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

Preprocessing the data

First, check the dimensions of the data.

# Using the training set as an example
print(train_set_x_orig.shape)
print(train_set_y.shape)

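Stepping through in a debugger shows the actual RGB values of each image.
In the course, the R, G and B values of one image are converted into a single-column feature vector. With 209 images there are 209 columns, so the conversion is done here.

# ---- Preprocessing ----
# A trick for flattening an array X of shape (a, b, c, d) into X_flatten of shape (b*c*d, a)
# Each image's RGB values become 1 column; there are 209 images in total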
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T
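
To see why the reshape is followed by a transpose, here is a minimal sketch on a tiny made-up array. The array `images` is a hypothetical stand-in for train_set_x_orig (2 images of 2x2 pixels instead of 209 images of 64x64), not part of the assignment data.

```python
import numpy as np

# Hypothetical stand-in for train_set_x_orig: 2 "images", 2x2 pixels, 3 channels.
images = np.arange(2 * 2 * 2 * 3).reshape(2, 2, 2, 3)

# Flatten each image into one row, then transpose so each image becomes one column.
flat = images.reshape(images.shape[0], -1).T
print(flat.shape)  # (12, 2)

# Column i holds exactly the values of image i.
assert np.array_equal(flat[:, 0], images[0].ravel())
assert np.array_equal(flat[:, 1], images[1].ravel())

# Reshaping directly to (features, m) has the right shape but mixes the images together.
wrong = images.reshape(-1, images.shape[0])
assert wrong.shape == flat.shape
assert not np.array_equal(wrong[:, 0], images[0].ravel())
```

The last check is the point of the trick: a direct reshape to (b*c*d, a) produces the expected shape but silently interleaves pixels from different images.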

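Check the dimensions and step through in a debugger to confirm the result matches expectations.
Each of the 209 images does have its R, G and B values concatenated into one column, forming a feature vector.

To speed up training, the values are normalized so that the RGB values lie in $[0, 1]$. Since the maximum RGB value is 255, dividing by 255 is enough.

# ---- Normalize to speed up training ----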
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.

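Building the model

Choosing the function

Since this exercise only needs to output a label of 0 or 1, it is a binary classification problem. The training labels train_set_y_orig already mark each image as cat or non-cat. The course uses the linear function $z=\boldsymbol{w}^T\boldsymbol{x}+b$ followed by a sigmoid, so that $\hat{y}=\sigma(z)$. The linear part is simple: it takes a weighted sum of the input features and adds a bias, so it is easy to see how the model computes its output from the inputs, which is a real advantage when first building and understanding a model.

The following functions are implemented in model.py.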

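Computation graph

In the expression, $w$ is the weight vector and $b$ is a scalar bias, $x$ is the input feature vector, and $z$ is the linear output. The sigmoid function maps $z$ to a predicted probability in $(0, 1)$, and the loss then measures the gap between that prediction and the true label. This completes one forward pass.

From this analysis, the forward pass needs three pieces: parameter initialization, the sigmoid function, and the cost function.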
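
As the figure of the computation graph is not available here, the forward pass can be written out as equations. This is a sketch consistent with the code in model.py, where $m$ is the number of examples, the superscript $(i)$ indexes an example, and $a^{(i)}$ denotes the prediction $\hat{y}^{(i)}$; the notation $\mathcal{L}$ and $J$ is assumed here rather than taken from this article.

$$z^{(i)} = \boldsymbol{w}^T\boldsymbol{x}^{(i)} + b, \qquad a^{(i)} = \sigma(z^{(i)}) = \frac{1}{1 + e^{-z^{(i)}}}$$

$$\mathcal{L}(a^{(i)}, y^{(i)}) = -\left[\, y^{(i)} \log a^{(i)} + (1 - y^{(i)}) \log\left(1 - a^{(i)}\right) \right]$$

$$J(\boldsymbol{w}, b) = \frac{1}{m} \sum_{i=1}^{m} \mathcal{L}(a^{(i)}, y^{(i)})$$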

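Forward propagation

Parameter initialization

# Initialize the parameters w and b
def initialize_with_zeros(dim):
    # w is a column vector; dim is 64*64*3 here
    w = np.zeros((dim, 1))
    b = 0

    # Check that w has the expected shape
    # Check that b is a float or an int
    assert (w.shape == (dim, 1))
    assert (isinstance(b, float) or isinstance(b, int))
    return w, b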

Implementing the sigmoid function

# Logistic (sigmoid) function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))
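
One practical caveat: for very negative z, np.exp(-z) overflows and NumPy emits a runtime warning, even though the result still rounds to 0. The sketch below shows one common way to avoid that; `stable_sigmoid` is a hypothetical helper name and is not part of the assignment code.

```python
import numpy as np


def sigmoid(z):
    """Plain logistic function, as used in this article."""
    return 1 / (1 + np.exp(-z))


def stable_sigmoid(z):
    """Hypothetical variant that never exponentiates a positive number."""
    z = np.asarray(z, dtype=float)
    e = np.exp(-np.abs(z))  # always in (0, 1], so no overflow
    # z >= 0: 1 / (1 + exp(-z));  z < 0: exp(z) / (1 + exp(z))
    return np.where(z >= 0, 1 / (1 + e), e / (1 + e))


z = np.array([-1000.0, -2.0, 0.0, 2.0, 1000.0])
a = stable_sigmoid(z)
print(a)

# On moderate inputs both versions agree.
assert np.allclose(stable_sigmoid(z[1:4]), sigmoid(z[1:4]))
```

For the normalized pixel data and zero initialization used in this article the plain version is normally fine; the variant only matters when |z| becomes very large.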

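Implementing the cost function

# Compute the cost function
# w, b: parameters
# X: input data
# Y: true labels
def propagate(w, b, X, Y):
    # Number of examples
    m = X.shape[1]

    # -------- Forward propagation --------
    # Probability that each example is a cat
    A = sigmoid(np.dot(w.T, X) + b)
    # Cost function
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    # (the backward pass and the return value appear in the full model.py listing)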
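
A quick sanity check on the cost: with w and b initialized to zero, every prediction is 0.5, so the cost equals ln 2 (about 0.693) regardless of the labels. The sketch below verifies this on made-up data; the shapes and random values are assumptions chosen only for illustration.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


rng = np.random.default_rng(0)
X = rng.random((5, 8))                # hypothetical data: 5 features, 8 examples
Y = rng.integers(0, 2, size=(1, 8))   # hypothetical 0/1 labels
w, b = np.zeros((5, 1)), 0.0          # zero initialization, as in initialize_with_zeros

m = X.shape[1]
A = sigmoid(np.dot(w.T, X) + b)       # every entry is 0.5
cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

print(cost, np.log(2))
assert abs(cost - np.log(2)) < 1e-12
```

This means the first cost printed during training should be close to 0.693; a very different value usually points to a bug in preprocessing or in the cost formula.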

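After the first forward pass, the parameters must be updated to obtain a $w$ and $b$ that fit the data, which is what backpropagation is for. As in the theory post, the derivatives are computed from the computation graph, and gradient descent uses them to update $w$ and $b$ before the next cost evaluation. This is repeated many times until the cost has dropped to an acceptable level, at which point training is complete.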


Backpropagation
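
The gradients used in model.py are $dw = \frac{1}{m} X (A - Y)^T$ and $db = \frac{1}{m} \sum_i (a^{(i)} - y^{(i)})$. As a rough check that these match the cost function, the sketch below compares them with central finite differences on small random data. The helper names `cost_fn` and `gradients` and the random data are assumptions made for this illustration.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def cost_fn(w, b, X, Y):
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    return -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))


def gradients(w, b, X, Y):
    # Same formulas as the backward pass in model.py
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)
    return dw, db


rng = np.random.default_rng(1)
X = rng.normal(size=(3, 6))           # hypothetical data: 3 features, 6 examples
Y = rng.integers(0, 2, size=(1, 6))
w = rng.normal(size=(3, 1))
b = 0.3

dw, db = gradients(w, b, X, Y)

# Central finite differences, one parameter at a time
eps = 1e-6
num_dw = np.zeros_like(w)
for j in range(w.shape[0]):
    step = np.zeros_like(w)
    step[j, 0] = eps
    num_dw[j, 0] = (cost_fn(w + step, b, X, Y) - cost_fn(w - step, b, X, Y)) / (2 * eps)
num_db = (cost_fn(w, b + eps, X, Y) - cost_fn(w, b - eps, X, Y)) / (2 * eps)

print(np.max(np.abs(dw - num_dw)), abs(db - num_db))
```

Both printed differences should be tiny, which indicates that the analytic gradients agree with the cost function.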

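Gradient descent

# Update the parameters with gradient descent
# w, b: parameters
# num_iterations: number of iterations
# learning_rate: learning rate
# print_cost: whether to print the cost
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    # Recorded cost values, used for plotting
    costs = []

    # Iterate
    for i in range(num_iterations):
        # Get dw, db and cost
        grads, cost = propagate(w, b, X, Y)

        dw = grads['dw']
        db = grads['db']

        # Update the parameters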
        w = w - learning_rate * dw
        b = b - learning_rate * db

        if i % 100 == 0:
            costs.append(cost)
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs
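
To watch the update rule at work without the cat dataset, the sketch below runs the same loop on a tiny one-feature problem. The toy data, the learning rate of 0.1 and the 200 iterations are assumptions chosen for illustration, and `propagate` is re-declared here in simplified form so the example is self-contained.

```python
import numpy as np


def sigmoid(z):
    return 1 / (1 + np.exp(-z))


def propagate(w, b, X, Y):
    # Simplified version of propagate(): returns (dw, db, cost)
    m = X.shape[1]
    A = sigmoid(np.dot(w.T, X) + b)
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)
    return dw, db, cost


# Hypothetical toy data: one feature, label is 1 when the feature is positive.
X = np.array([[-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]])
Y = np.array([[0, 0, 0, 1, 1, 1]])

w, b = np.zeros((1, 1)), 0.0
learning_rate = 0.1
costs = []

for i in range(200):
    dw, db, cost = propagate(w, b, X, Y)
    w = w - learning_rate * dw
    b = b - learning_rate * db
    costs.append(float(cost))

print(costs[0], costs[-1])

# After training, thresholding the probabilities at 0.5 recovers the labels.
pred = (sigmoid(np.dot(w.T, X) + b) > 0.5).astype(int)
assert np.array_equal(pred, Y)
```

On this toy problem the cost falls at every step because the learning rate is small relative to the curvature of the cost; with a much larger learning rate the cost can oscillate instead.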

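Prediction

After the iterations finish, we have the trained $w$ and $b$, and predictions are made with $\hat{y}=\sigma(\boldsymbol{w}^T\boldsymbol{x}+b)$: an output above 0.5 is classified as a cat.

def predict(w, b, X):
    # Number of examples in the input
    m = X.shape[1]
    # Predicted labels
    Y_prediction = np.zeros((1, m))
    # Reshape w into a column vector matching the number of features
    w = w.reshape(X.shape[0], 1)
    # Compute the probabilities, which lie in the interval [0, 1]
    A = sigmoid(np.dot(w.T, X) + b)

    # Classify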
    for i in range(A.shape[1]):
        if A[0, i] <= 0.5:
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1

    assert (Y_prediction.shape == (1, m))
    return Y_prediction
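
The classification loop can also be written as a single vectorized comparison. The sketch below checks that both forms agree on a few made-up probabilities, including the boundary value 0.5; the array `A` here is hypothetical.

```python
import numpy as np

A = np.array([[0.1, 0.5, 0.50001, 0.9]])  # hypothetical predicted probabilities

# Loop version, mirroring predict()
loop_pred = np.zeros((1, A.shape[1]))
for i in range(A.shape[1]):
    loop_pred[0, i] = 0 if A[0, i] <= 0.5 else 1

# Vectorized version: True/False converted to 1.0/0.0
vec_pred = (A > 0.5).astype(float)

print(vec_pred)  # [[0. 0. 1. 1.]]
assert np.array_equal(loop_pred, vec_pred)
```

Note that a probability of exactly 0.5 is classified as 0 in both versions, matching the `<= 0.5` test in predict().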

Wrapping it all together

# Model
# num_iterations: number of iterations
# learning_rate: learning rate
# print_cost: print the cost every 100 steps
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    # Initialize the parameters
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    w = parameters['w']
    b = parameters['b']

    # Predict
    Y_prediction_train = predict(w, b, X_train)
    Y_prediction_test = predict(w, b, X_test)

    # Accuracy: 100 minus the mean absolute difference between predictions and labels, in percent
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    # Package the results
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}

    return d
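
The accuracy expression in model() works because predictions and labels are both 0 or 1, so the absolute difference is 1 exactly where they disagree. The sketch below compares it with a direct count of matches on made-up values; `Y_true` and `Y_pred` are hypothetical.

```python
import numpy as np

Y_true = np.array([[1, 0, 1, 1, 0]])            # hypothetical labels
Y_pred = np.array([[1.0, 0.0, 0.0, 1.0, 1.0]])  # hypothetical predictions

# Form used in model(): 100 minus the error rate in percent
acc_article = 100 - np.mean(np.abs(Y_pred - Y_true)) * 100

# Equivalent form: fraction of matching entries, in percent
acc_direct = np.mean(Y_pred == Y_true) * 100

print(acc_article, acc_direct)
assert np.isclose(acc_article, acc_direct)
```

Three of the five predictions match here, so both expressions give 60 percent up to floating-point rounding.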

The following is implemented in l1w2_4.py.

Usage

Training and testing

# ---- Use the model ----
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=2000, learning_rate=0.005, print_cost=True)

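The training accuracy reaches 99% while the test accuracy reaches 70%. The gap between the two suggests the model overfits the training set. Still, given the small dataset and the fact that logistic regression is a linear classifier, this is actually not bad for such a simple model.

Plotting the cost curve

# Plot the cost curve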
costs = np.squeeze(d['costs'])
plt.plot(costs)
plt.ylabel('cost')
plt.xlabel('iterations (per hundreds)')
plt.title("Learning rate =" + str(d["learning_rate"]))
plt.show()

The cost decreases steadily as training proceeds.

Prediction

Find a jpg image of your own and convert it to 64*64 to predict whether it is a cat.

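# Load the image; a raw string keeps the backslashes literal
fname = r'E:\Git\python\25_py_01\0120\012002_wuenda\L1W2\cat\image\24poJOgl7m_small.jpg'
# Preprocess: resize to 64*64, flatten to one column, and scale to [0, 1] like the training data
image = np.array(plt.imread(fname))
my_image = np.array(Image.fromarray(image).resize((64, 64))).reshape((1, 64 * 64 * 3)).T / 255.

# Predict
my_predicted_image = predict(d["w"], d["b"], my_image)
plt.imshow(image)

# Print the prediction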
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[
    int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.")

Source code

l1w2_4.py

import numpy as np
import matplotlib.pyplot as plt
from PIL import Image
from lr_utils import load_dataset
from model import model
from model import predict

# Load the data
train_set_x_orig, train_set_y, test_set_x_orig, test_set_y, classes = load_dataset()

# ---- Preprocessing ----
# A trick for flattening an array X of shape (a, b, c, d) into X_flatten of shape (b*c*d, a)
# Each image's RGB values become 1 column; there are 209 images in total
train_set_x_flatten = train_set_x_orig.reshape(train_set_x_orig.shape[0], -1).T
test_set_x_flatten = test_set_x_orig.reshape(test_set_x_orig.shape[0], -1).T

# ---- Normalize to speed up training ----
train_set_x = train_set_x_flatten / 255.
test_set_x = test_set_x_flatten / 255.

print(train_set_x.shape[0])

# ---- Use the model ----
d = model(train_set_x, train_set_y, test_set_x, test_set_y, num_iterations=2000, learning_rate=0.005, print_cost=True)

# # Plot the cost curve
# costs = np.squeeze(d['costs'])
# plt.plot(costs)
# plt.ylabel('cost')
# plt.xlabel('iterations (per hundreds)')
# plt.title("Learning rate =" + str(d["learning_rate"]))
# plt.show()

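# Load the image; a raw string keeps the backslashes literal
fname = r'E:\Git\python\25_py_01\0120\012002_wuenda\L1W2\cat\image\24poJOgl7m_small.jpg'
# Preprocess: resize to 64*64, flatten to one column, and scale to [0, 1] like the training data
image = np.array(plt.imread(fname))
my_image = np.array(Image.fromarray(image).resize((64, 64))).reshape((1, 64 * 64 * 3)).T / 255.

# Predict
my_predicted_image = predict(d["w"], d["b"], my_image)
plt.imshow(image)

# Print the prediction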
print("y = " + str(np.squeeze(my_predicted_image)) + ", your algorithm predicts a \"" + classes[
    int(np.squeeze(my_predicted_image)),].decode("utf-8") + "\" picture.")

model.py

import numpy as np


# ---- Helper functions ----

# Logistic (sigmoid) function
def sigmoid(z):
    return 1 / (1 + np.exp(-z))


# Initialize the parameters w and b
def initialize_with_zeros(dim):
    # w is a column vector; dim is 64*64*3 here
    w = np.zeros((dim, 1))
    b = 0

    # Check that w has the expected shape
    # Check that b is a float or an int
    assert (w.shape == (dim, 1))
    assert (isinstance(b, float) or isinstance(b, int))
    return w, b


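# Compute the cost function and its gradients
# w, b: parameters
# X: input data
# Y: true labels
def propagate(w, b, X, Y):
    # Number of examples
    m = X.shape[1]

    # -------- Forward propagation --------
    # Probability that each example is a cat
    A = sigmoid(np.dot(w.T, X) + b)
    # Cost function
    cost = -1 / m * np.sum(Y * np.log(A) + (1 - Y) * np.log(1 - A))

    # -------- Backpropagation --------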
    dw = 1 / m * np.dot(X, (A - Y).T)
    db = 1 / m * np.sum(A - Y)

    assert (dw.shape == w.shape)
    assert (db.dtype == float)
    # Remove unnecessary dimensions
    cost = np.squeeze(cost)
    assert (cost.shape == ())

    grads = {"dw": dw,
             "db": db}
    return grads, cost


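# Update the parameters with gradient descent
# w, b: parameters
# num_iterations: number of iterations
# learning_rate: learning rate
# print_cost: whether to print the cost
def optimize(w, b, X, Y, num_iterations, learning_rate, print_cost=False):
    # Recorded cost values, used for plotting
    costs = []

    # Iterate
    for i in range(num_iterations):
        grads, cost = propagate(w, b, X, Y)

        dw = grads['dw']
        db = grads['db']

        # Update the parameters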
        w = w - learning_rate * dw
        b = b - learning_rate * db

        if i % 100 == 0:
            costs.append(cost)
        if print_cost and i % 100 == 0:
            print("Cost after iteration %i: %f" % (i, cost))

    params = {"w": w,
              "b": b}

    grads = {"dw": dw,
             "db": db}

    return params, grads, costs


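# Predict
def predict(w, b, X):
    # Number of examples in the input
    m = X.shape[1]
    # Predicted labels
    Y_prediction = np.zeros((1, m))
    # Reshape w into a column vector matching the number of features
    w = w.reshape(X.shape[0], 1)
    # Compute the probabilities, which lie in the interval [0, 1]
    A = sigmoid(np.dot(w.T, X) + b)

    # Classify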
    for i in range(A.shape[1]):
        if A[0, i] <= 0.5:
            Y_prediction[0, i] = 0
        else:
            Y_prediction[0, i] = 1

    assert (Y_prediction.shape == (1, m))
    return Y_prediction


# Model
# num_iterations: number of iterations
# learning_rate: learning rate
# print_cost: print the cost every 100 steps
def model(X_train, Y_train, X_test, Y_test, num_iterations=2000, learning_rate=0.5, print_cost=False):
    # Initialize the parameters
    w, b = initialize_with_zeros(X_train.shape[0])

    # Gradient descent
    parameters, grads, costs = optimize(w, b, X_train, Y_train, num_iterations, learning_rate, print_cost)

    w = parameters['w']
    b = parameters['b']

    # Predict
    Y_prediction_train = predict(w, b, X_train)
    Y_prediction_test = predict(w, b, X_test)

    # Accuracy: 100 minus the mean absolute difference between predictions and labels, in percent
    print("train accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_train - Y_train)) * 100))
    print("test accuracy: {} %".format(100 - np.mean(np.abs(Y_prediction_test - Y_test)) * 100))

    # Package the results
    d = {"costs": costs,
         "Y_prediction_test": Y_prediction_test,
         "Y_prediction_train": Y_prediction_train,
         "w": w,
         "b": b,
         "learning_rate": learning_rate,
         "num_iterations": num_iterations}

    return d