Creating a Neural Network from Scratch in Python (20): Model Evaluation
Model Evaluation
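Introduction
In Chapter 11, "Testing or Out-of-Sample Data," we discussed the difference between validation data and testing data. With the model as it stands, we validate during training, but we have no good way to run a test on testing data or to make predictions. To start, we will add a new evaluate method to the Model class: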
# Evaluates the model using passed in dataset
def evaluate(self, X_val, y_val, *, batch_size=None):
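This method takes in samples (X_val), target outputs (y_val), and an optional batch size. First, we calculate the number of steps from the length of the data and the batch size argument. This is the same calculation used in the train method: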
    # Default value if batch size is not being set
    validation_steps = 1
    # Calculate number of steps
    if batch_size is not None:
        validation_steps = len(X_val) // batch_size
        # Dividing rounds down. If there are some remaining
        # data, but not a full batch, this won't include it
        # Add `1` to include this not full batch
        if validation_steps * batch_size < len(X_val):
            validation_steps += 1
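As a quick sanity check of the step calculation above, the sketch below wraps the same logic in a hypothetical helper, count_steps (a name introduced here for illustration, not part of the Model class), and compares it with ceiling division. The sample counts are arbitrary, and a positive batch size is assumed.

```python
import math

# Hypothetical helper for illustration; mirrors the step calculation above
def count_steps(n_samples, batch_size=None):
    # Default: a single step over the full dataset
    steps = 1
    if batch_size is not None:
        # Floor division counts only the full batches
        steps = n_samples // batch_size
        # Add one more step if a partial batch remains
        if steps * batch_size < n_samples:
            steps += 1
    return steps

# 10000 samples in batches of 128: 78 full batches plus one partial batch
print(count_steps(10000, 128))  # 79
# No batch size: one step
print(count_steps(10000))       # 1
# Exact multiple: no extra step is added
print(count_steps(256, 128))    # 2
```

For a positive batch size, this gives the same result as math.ceil(n_samples / batch_size); the integer version simply avoids floating-point division.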
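Next, we move a chunk of code out of the Model class's train method. We place that code, together with the parts that calculate the number of steps and reset the accumulated loss and accuracy, in the evaluate method, making it: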
# Evaluates the model using passed in dataset
def evaluate(self, X_val, y_val, *, batch_size=None):
    # Default value if batch size is not being set
    validation_steps = 1
    # Calculate number of steps
    if batch_size is not None:
        validation_steps = len(X_val) // batch_size
        # Dividing rounds down. If there are some remaining
        # data, but not a full batch, this won't include it
        # Add `1` to include this not full batch
        if validation_steps * batch_size < len(X_val):
            validation_steps += 1
    # Reset accumulated values in loss
    # and accuracy objects
    self.loss.new_pass()
    self.accuracy.new_pass()
    # Iterate over steps
    for step in range(validation_steps):
        # If batch size is not set -
        # evaluate using one step and full dataset
        if batch_size is None:
            batch_X = X_val
            batch_y = y_val
        # Otherwise slice a batch
        else:
            batch_X = X_val[step*batch_size:(step+1)*batch_size]
            batch_y = y_val[step*batch_size:(step+1)*batch_size]
        # Perform the forward pass
        output = self.forward(batch_X, training=False)
        # Calculate the loss
        self.loss.calculate(output, batch_y)
        # Get predictions and calculate an accuracy
        predictions = self.output_layer_activation.predictions(output)
        self.accuracy.calculate(predictions, batch_y)
    # Get validation loss and accuracy
    validation_loss = self.loss.calculate_accumulated()
    validation_accuracy = self.accuracy.calculate_accumulated()
    # Print a summary
    print(f'validation, ' +
          f'acc: {validation_accuracy:.3f}, ' +
          f'loss: {validation_loss:.3f}')
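The sketch below is not the Model code. It uses made-up per-sample correctness flags to illustrate why it helps to accumulate a running sum and a sample count across batches, which is what new_pass and calculate_accumulated are assumed to support here, rather than averaging the per-batch means. The two approaches disagree whenever the last batch is smaller than the others.

```python
# Toy per-sample correctness flags (1 = correct prediction); made-up data
correct = [1, 1, 1, 1, 0, 0, 1, 1, 1, 0]
batch_size = 4

accumulated_sum = 0
accumulated_count = 0
batch_means = []

# Walk over the data in batches; the last batch has only 2 samples
for start in range(0, len(correct), batch_size):
    batch = correct[start:start + batch_size]
    accumulated_sum += sum(batch)
    accumulated_count += len(batch)
    batch_means.append(sum(batch) / len(batch))

# Sample-weighted accuracy over the whole pass
weighted = accumulated_sum / accumulated_count
# Unweighted mean of batch accuracies over-weights the small last batch
naive = sum(batch_means) / len(batch_means)

print(f'weighted: {weighted:.3f}, naive: {naive:.3f}')
```

Here the weighted value is 7 correct out of 10 samples, while the naive mean of the three batch accuracies (1.0, 0.5, 0.5) comes out lower because the two-sample batch counts as much as a four-sample one.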
Now, where that block of code used to be in the Model class's train method, we can call the new evaluate method:
# Model class
class Model:
    ...
    # def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):
    def train(self, X, y, *, epochs=1, batch_size=None, print_every=1, validation_data=None):
        ...
        ...
        # If there is the validation data
        if validation_data is not None:
            # Evaluate the model:
            self.evaluate(*validation_data, batch_size=batch_size)
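If the *validation_data part looks confusing, the asterisk (a "starred expression") unpacks the validation_data sequence, whether a tuple or a list, into individual positional arguments. Here is a simple example of how it works: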
a = (1, 2)

def test(n1, n2):
    print(n1, n2)

test(*a)
>>>
1 2
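To connect this with the call inside train, the sketch below uses a hypothetical stand-in function that has the same signature shape as evaluate (two positional parameters, then a keyword-only batch_size). The function body and the data are made up for illustration; only the unpacking behavior is the point.

```python
# Hypothetical stand-in for Model.evaluate; it only reports what it received
def evaluate(X_val, y_val, *, batch_size=None):
    return len(X_val), len(y_val), batch_size

# A (samples, targets) pair, like the validation_data passed to train
validation_data = ([0.1, 0.2, 0.3], [0, 1, 0])

# Starred unpacking and explicit indexing are equivalent calls
unpacked = evaluate(*validation_data, batch_size=2)
indexed = evaluate(validation_data[0], validation_data[1], batch_size=2)
print(unpacked)             # (3, 3, 2)
print(unpacked == indexed)  # True

# The bare * in the signature makes batch_size keyword-only,
# so passing it positionally raises a TypeError
try:
    evaluate(*validation_data, 2)
except TypeError:
    print('batch_size must be passed by keyword')
```

Because the unpacked tuple supplies exactly the two positional arguments, the order inside validation_data matters: samples first, targets second.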
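Now that we have this separate evaluate method, we can evaluate the model whenever we like, either during training or on demand, simply by passing in validation or testing data. First, we create and train a model as usual: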
# Create dataset
X, y, X_test, y_test = create_data_mnist('fashion_mnist_images')
# Shuffle the training dataset
keys = np.array(range(X.shape[0]))
np.random.shuffle(keys)
X = X[keys]
y = y[keys]
# Scale and reshape samples
X = (X.reshape(X.shape[0], -1).astype(np.float32) - 127.5) / 127.5
X_test = (X_test.reshape(X_test.shape[0], -1).astype(np.float32) - 127.5) / 127.5
# Instantiate the model
model = Model()
# Add layers
model.add(Layer_Dense(X.shape[1], 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 128))
model.add(Activation_ReLU())
model.add(Layer_Dense(128, 10))
model.add(Activation_Softmax())
# Set loss, optimizer and accuracy objects
model.set(
    loss=Loss_CategoricalCrossentropy(),
    optimizer=Optimizer_Adam(decay=1e-3),
    accuracy=Accuracy_Categorical()
)
# Finalize the model
model.finalize()
# Train the model
model.train(X, y, validation_data=(X_test, y_test), epochs=10, batch_size=128, print_every=100)
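Next, we can add code to run an evaluation. Right now, we have no dedicated testing data apart from the data we used for validation, but we can use that data to try out the method: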
model.evaluate(X_test, y_test)
Running this, we get:
>>>
...
epoch: 10
step: 0, acc: 0.906, loss: 0.198 (data_loss: 0.198, reg_loss: 0.000), lr: 0.0001915341888527102
step: 100, acc: 0.930, loss: 0.193 (data_loss: 0.193, reg_loss: 0.000), lr: 0.00018793459875963167
step: 200, acc: 0.922, loss: 0.175 (data_loss: 0.175, reg_loss: 0.000), lr: 0.00018446781036709093
step: 300, acc: 0.922, loss: 0.245 (data_loss: 0.245, reg_loss: 0.000), lr: 0.00018112660749864155
step: 400, acc: 0.898, loss: 0.303 (data_loss: 0.303, reg_loss: 0.000), lr: 0.00017790428749332856
step: 468, acc: 0.938, loss: 0.144 (data_loss: 0.144, reg_loss: 0.000), lr: 0.00017577781683951485
training, acc: 0.915, loss: 0.237 (data_loss: 0.237, reg_loss: 0.000), lr: 0.00017577781683951485
validation, acc: 0.881, loss: 0.334
validation, acc: 0.881, loss: 0.334
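The validation accuracy and loss appear twice at the end with identical values because we validated during training and then immediately evaluated on the same data. Typically, you train a model, tweak its hyperparameters, train again, and so on, using the training and validation data passed to the train method. Then, once you have found the best-performing model and hyperparameters, you evaluate that model on the testing data and, from then on, use it to make predictions in production.
Next, we can also run an evaluation on the training data: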
model.evaluate(X, y)
Running this prints:
>>>
validation, acc: 0.915, loss: 0.231
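Here, "validation" only means that we evaluated the model; the evaluation itself was run on the training data. We can compare it with the result of the training we just performed on the same data: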
training, acc: 0.915, loss: 0.237 (data_loss: 0.237, reg_loss: 0.000), lr: 0.00017577781683951485
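You may notice that, despite using the same dataset, the reported values are not identical: the loss is 0.237 in the training summary and 0.231 in the evaluation, while the accuracy happens to match to three decimals. The difference arises because the training summary shows accuracy and loss accumulated over the epoch, while the model was still learning. Those averages therefore differ from an evaluation on the training data performed after training has finished. Running an evaluation on the training data at the end of the training process returns the final accuracy and loss.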
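The effect described above can be reproduced with a toy one-weight model, unrelated to the Model class and using made-up data and an arbitrary learning rate. During the epoch, each sample's loss is measured before the weight update it triggers, so the accumulated average includes losses from earlier, worse versions of the model.

```python
# Toy data following y = 2x; a one-weight model y_hat = w * x
xs = [1.0, 2.0, 3.0, 4.0]
ys = [2.0, 4.0, 6.0, 8.0]
w = 0.0
lr = 0.02

accumulated = 0.0
for x, y in zip(xs, ys):
    error = w * x - y
    # Loss is recorded with the weight as it was before this update
    accumulated += error ** 2
    # Gradient step on the squared error changes the model
    w -= lr * 2 * error * x

# Average accumulated while the model was still learning
epoch_loss = accumulated / len(xs)
# Evaluation on the same data with the final weight
final_loss = sum((w * x - y) ** 2 for x, y in zip(xs, ys)) / len(xs)

print(f'accumulated during epoch: {epoch_loss:.3f}')
print(f'evaluated after epoch:    {final_loss:.3f}')
```

In this sketch the loss evaluated after the epoch is lower than the loss accumulated during it, for the same reason the evaluation loss above is slightly lower than the training summary.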
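In the next chapter, we will add the ability to save and load models; we will also build a way to retrieve and set a model's parameters.
Supplementary material for this chapter (code, further resources, and errata): https://nnfs.io/ch20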