Building a Neural Network from Scratch in Python (Part 18): The Model Object

Posted by 忿忿的泥巴坨 on 2025-1-2 19:10:56

Introduction

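We have built a model that performs the forward pass, the backward pass, and auxiliary tasks such as measuring accuracy. Getting there took a fair amount of code, along with modifications inside some fairly large blocks. At this point it starts to make more sense to turn the model itself into an object, especially since we will want to save and load that object for future prediction tasks. An object also lets us cut down on some repeated lines of code, which makes the current code base easier to work with and makes new models easier to build. To carry out the conversion, we will use the model we worked on most recently, the regression model trained on sine data: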
from nnfs.datasets import sine_data

X, y = sine_data()
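With the data in place, the first step in writing the model class is to add the layers we want. We can start the class like this:
# Model class
class Model:

    def __init__(self):
        # Create a list of network objects
        self.layers = []

    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)
This lets us add layers through the model object's add method, which alone makes the code much more readable. Let's add some layers: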
# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())
We can now also inspect the model:
print(model.layers)
>>>
[<__main__.Layer_Dense object at 0x000001D1EB2A2900>,
<__main__.Activation_ReLU object at 0x000001D1EB2A2180>,
<__main__.Layer_Dense object at 0x000001D1EB2A3F20>,
<__main__.Activation_ReLU object at 0x000001D1EB2B9220>,
<__main__.Layer_Dense object at 0x000001D1EB2BB800>,
<__main__.Activation_Linear object at 0x000001D1EB2BBA40>]
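Besides adding layers, we also want to set the loss function and the optimizer for the model. For that we will create a method named set:
    # Set loss and optimizer
    def set(self, *, loss, optimizer):
        self.loss = loss
        self.optimizer = optimizer
The asterisk (*) in the parameter list marks every parameter after it (here loss and optimizer) as keyword-only. Because these parameters have no default values, they are required keyword arguments: the caller must pass them by name, which makes the calling code easier to read.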
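To make the keyword-only behavior concrete, here is a minimal sketch. The Demo class and the string values are hypothetical stand-ins used only for illustration; they are not part of the model code.

```python
# Hypothetical stand-in class, used only to demonstrate keyword-only parameters
class Demo:
    def set(self, *, loss, optimizer):
        self.loss = loss
        self.optimizer = optimizer


demo = Demo()

# Passing both arguments by name works
demo.set(loss="mse", optimizer="adam")

# Passing them positionally raises TypeError
try:
    demo.set("mse", "adam")
    positional_ok = True
except TypeError:
    positional_ok = False

# Leaving one out also raises TypeError, because there are no defaults
try:
    demo.set(loss="mse")
    missing_ok = True
except TypeError:
    missing_ok = False

print(positional_ok, missing_ok)
```

Both failure cases surface at call time, so a mistake in the calling code shows up immediately rather than as a silently swapped loss and optimizer.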
Now we can add a call to this method on our newly created model object and pass in the loss and optimizer objects:
# Create dataset
X, y = sine_data()

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())
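# Set loss and optimizer objects
model.set(
    loss=Loss_MeanSquaredError(),
    optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),
)
With the layers, loss function, and optimizer set, the next step is training, so we will add a train method. For now it is only a placeholder that we will fill in shortly:
    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):
        # Main training loop
        for epoch in range(1, epochs+1):
            # Temporary
            pass
We can then add a call to the train method to the model definition. We pass the training data, the number of epochs (10000, the value we have been using so far), and how often to print a training summary. We do not need or want a printout at every step, so we make this configurable:
# Create dataset
X, y = sine_data()

# Instantiate the model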
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())
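# Set loss and optimizer objects
model.set(
    loss=Loss_MeanSquaredError(),
    optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),
)

# Train the model
model.train(X, y, epochs=10000, print_every=100)
To train, we need to perform a forward pass. Doing this inside an object is slightly more involved, because we have to do it in a loop over the layers, and each layer needs to know the previous layer's output so the data is passed along correctly. One problem with looking up the previous layer is that the first layer has no "previous layer": the first layer we define is the first hidden layer. One option is therefore to create an "input layer". It is considered a layer of the neural network, but it has no weights or biases associated with it. The input layer only holds the training data, and we use it solely as the "previous layer" of the first layer while iterating over the layers in a loop. We will create a new class for it, named Layer_Input, and call it in much the same way as the Layer_Dense class:
# Input "layer"
class Layer_Input:

    # Forward pass
    def forward(self, inputs):
        self.output = inputs
The forward method sets the training samples as self.output, an attribute shared with the other layers. There is no need to implement a backward method here, because we will never use it. Creating this class may look redundant right now, but it should become clear shortly how we will use it. Next, we set references to the previous and the next layer on every layer of the model. We will do this in a new method of the Model class named finalize: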
    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
This code creates an input layer and sets the next and prev references on every layer in the model object's self.layers list. We created the Layer_Input class so that the prev attribute of the first hidden layer can be set inside the loop, since we will call all layers in a uniform way. For the last layer, the next object is the loss function we have already set.
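The linking step can be tried in isolation. The sketch below is a condensed variant of the same idea, not the method from the text: the Stub class and the link function are hypothetical names introduced for this example, and the objects carry only a name so the resulting chain can be printed.

```python
# Hypothetical placeholder class; real layers would also define forward/backward
class Stub:
    def __init__(self, name):
        self.name = name


def link(layers, input_layer, loss):
    """Set prev/next on every object in layers, mirroring the finalize logic."""
    count = len(layers)
    for i in range(count):
        # The first layer points back at the input layer
        layers[i].prev = input_layer if i == 0 else layers[i - 1]
        # The last layer points forward at the loss object
        layers[i].next = loss if i == count - 1 else layers[i + 1]


input_layer, loss = Stub("input"), Stub("loss")
layers = [Stub("dense1"), Stub("relu1"), Stub("dense2")]
link(layers, input_layer, loss)

# (previous, current, next) for every layer in the chain
chain = [(layer.prev.name, layer.name, layer.next.name) for layer in layers]
print(chain)
```

Written with two conditional expressions, the sketch also copes with a list that holds a single layer, where that layer is both first and last.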
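With the layer information the model object needs for a forward pass now in place, let's add a forward method. We will use this method both during training and later when only making predictions (also called model inference). The code below continues the Model class: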
# Forward pass
class Model:
    ...
    # Performs forward pass
    def forward(self, X):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X)
        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output)
        # "layer" is now the last object from the list,
        # return its output
        return layer.output
Here we pass in the input data X and simply run it through the input_layer of the Model object, which creates an output attribute on that object. From there we iterate over the layers in self.layers, which start with the first hidden layer. For each layer, we perform the forward pass on the output of the previous layer, layer.prev.output. For the first hidden layer, layer.prev is self.input_layer. Calling a layer's forward method creates that layer's output attribute, which is then passed as the input to the forward call of the next layer. Once we have gone through all the layers, we return the output of the last one.
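The chaining through layer.prev.output can be checked with a toy example. This is only a sketch under simplifying assumptions: InputLayer, Scale, and AddOne are hypothetical classes without weights, chosen so the result is easy to verify by hand.

```python
import numpy as np


# Hypothetical toy layers, assumed only for this illustration
class InputLayer:
    def forward(self, inputs):
        self.output = inputs


class Scale:
    def __init__(self, factor):
        self.factor = factor

    def forward(self, inputs):
        self.output = inputs * self.factor


class AddOne:
    def forward(self, inputs):
        self.output = inputs + 1


input_layer = InputLayer()
layers = [Scale(2.0), AddOne(), Scale(10.0)]

# Link every layer to the object that precedes it
prev = input_layer
for layer in layers:
    layer.prev = prev
    prev = layer


def forward(X):
    # The input layer only stores X as its output
    input_layer.forward(X)
    # Each layer consumes the output of the object before it
    for layer in layers:
        layer.forward(layer.prev.output)
    # "layer" is the last object in the list
    return layer.output


result = forward(np.array([[1.0], [2.0]]))
print(result)
```

For the inputs 1 and 2 the chain computes (x * 2 + 1) * 10, so the returned array holds 30 and 50.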
That is one forward pass. Now let's add a call to this forward method to the train method of the Model class:
# Forward pass
class Model:
    ...
    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X)
            # Temporary (sys.exit needs "import sys" at the top of the file)
            print(output)
            sys.exit()
The full Model class so far:
# Model class
class Model:

    def __init__(self):
        # Create a list of network objects
        self.layers = []

    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)

    # Set loss and optimizer
    def set(self, *, loss, optimizer):
        self.loss = loss
        self.optimizer = optimizer

    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X)
            # Temporary
            print(output)
            sys.exit()

    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss

    # Performs forward pass
    def forward(self, X):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X)
        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output)
        # "layer" is now the last object from the list,
        # return its output
        return layer.output
Finally, we can add the finalize call to the main code (remember that, among other things, this method lets the model's layers know their previous and next layers).
# Create dataset
X, y = sine_data()

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())
# Set loss and optimizer objects
model.set(
    loss=Loss_MeanSquaredError(),
    optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),
)

# Finalize the model
model.finalize()

# Train the model
model.train(X, y, epochs=10000, print_every=100)
>>>
[[ 0.00000000e+00]
[-1.13209149e-08]
[-2.26418297e-08]
...
[-1.12869511e-05]
[-1.12982725e-05]
[-1.13095930e-05]]
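At this point the Model class covers the model's forward pass. We still need to calculate loss and accuracy and to perform backpropagation. Before that, we need to know which layers are "trainable", meaning they have weights and biases that we can adjust. To find out, we check whether a layer has a weights or biases attribute. We can do this with the following code:
            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])
Here i is the index of a layer in the list of layers. We will put this code into the finalize method. This is the method's full code so far: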
    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Initialize a list containing trainable layers:
        self.trainable_layers = []
        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]

            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])
Next, we will modify the common Loss class so that it contains the following:
# Common loss class
class Loss:
    ...
    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # Return the data and regularization losses
        return data_loss, self.regularization_loss()

    # Set/remember trainable layers
    def remember_trainable_layers(self, trainable_layers):
        self.trainable_layers = trainable_layers
The remember_trainable_layers method of the common Loss class "tells" the loss object which layers of the Model object are trainable. The calculate method has been modified so that a single call also returns the value of self.regularization_loss(). The regularization_loss method currently requires a layer object, but with the self.trainable_layers attribute set by remember_trainable_layers, we can now iterate over all trainable layers and compute the regularization loss for the whole model rather than for one layer at a time:
# Common loss class
class Loss:
    ...
    # Regularization loss calculation
    def regularization_loss(self):
        # 0 by default
        regularization_loss = 0
        # Calculate regularization loss
        # iterate all trainable layers
        for layer in self.trainable_layers:
            # L1 regularization - weights
            # calculate only when factor greater than 0
            if layer.weight_regularizer_l1 > 0:
                regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))
            # L2 regularization - weights
            if layer.weight_regularizer_l2 > 0:
                regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
            # L1 regularization - biases
            # calculate only when factor greater than 0
            if layer.bias_regularizer_l1 > 0:
                regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))
            # L2 regularization - biases
            if layer.bias_regularizer_l2 > 0:
                regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)
        return regularization_loss
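As a sanity check of the summation over trainable layers, here is a small standalone sketch with numbers that can be verified by hand. TinyLayer and the free function below are hypothetical; they only carry the attributes the penalty reads and are not the classes from the text.

```python
import numpy as np


# Hypothetical minimal layer holding only what the penalty needs
class TinyLayer:
    def __init__(self, weights, biases, l1_w=0.0, l2_w=0.0, l1_b=0.0, l2_b=0.0):
        self.weights = weights
        self.biases = biases
        self.weight_regularizer_l1 = l1_w
        self.weight_regularizer_l2 = l2_w
        self.bias_regularizer_l1 = l1_b
        self.bias_regularizer_l2 = l2_b


def regularization_loss(trainable_layers):
    """Sum the L1/L2 penalties of every layer in the list."""
    total = 0.0
    for layer in trainable_layers:
        if layer.weight_regularizer_l1 > 0:
            total += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))
        if layer.weight_regularizer_l2 > 0:
            total += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
        if layer.bias_regularizer_l1 > 0:
            total += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))
        if layer.bias_regularizer_l2 > 0:
            total += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)
    return total


# Layer A: L2 on weights only -> 0.1 * (1 + 4) = 0.5
layer_a = TinyLayer(np.array([[1.0, -2.0]]), np.array([[0.5]]), l2_w=0.1)
# Layer B: L1 on weights and biases -> 0.01 * 3 + 0.01 * 1 = 0.04
layer_b = TinyLayer(np.array([[3.0]]), np.array([[-1.0]]), l1_w=0.01, l1_b=0.01)

total = regularization_loss([layer_a, layer_b])
print(total)
```

The total is the sum of the per-layer penalties, which is what a single call over the whole list of trainable layers is meant to produce.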
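To calculate accuracy we need predictions. So far, the code for predictions has depended on the type of model. For a softmax classifier we use np.argmax(), whereas for regression the predictions are simply the output values, because the output layer uses a linear activation. Ideally, we would have a predictions method that picks the right way of predicting for our model. To get there, we add a predictions method to each activation function class:
# Softmax activation
class Activation_Softmax:
    ...
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return np.argmax(outputs, axis=1)

# Sigmoid activation
class Activation_Sigmoid:
    ...
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return (outputs > 0.5) * 1

# Linear activation
class Activation_Linear:
    ...
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs
All the calculations inside the predictions methods are the same as those performed for the corresponding models in previous chapters. Although we do not plan to use ReLU as the output layer's activation function, we include it here for completeness:
# ReLU activation
class Activation_ReLU:
    ...
    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs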
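To see what each of these rules returns, the short sketch below applies the same three expressions to small hand-made output arrays. The arrays are made-up values for illustration, not outputs of a trained model.

```python
import numpy as np

# Softmax-style output: one row of class confidences per sample
softmax_out = np.array([[0.7, 0.2, 0.1],
                        [0.1, 0.3, 0.6]])
softmax_pred = np.argmax(softmax_out, axis=1)

# Sigmoid-style output: one probability per sample, thresholded at 0.5
sigmoid_out = np.array([[0.9], [0.5], [0.2]])
sigmoid_pred = (sigmoid_out > 0.5) * 1

# Linear output: the prediction is the output itself
linear_out = np.array([[0.25], [-1.5]])
linear_pred = linear_out

print(softmax_pred, sigmoid_pred.ravel(), linear_pred.ravel())
```

Note that the sigmoid rule uses a strict comparison, so an output of exactly 0.5 is mapped to class 0.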
We still need to set a reference to the last layer's activation function in the Model object. Later we can call its predictions method, which calculates and returns predictions from the outputs. We set this reference in the finalize method of the Model class.
# Model class
class Model:
    ...
    # Finalize the model
    def finalize(self):
        ...
            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]
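Just as predictions differ between models, accuracy has to be calculated in different ways too. We will implement this much like the specific loss class objects: we create specific accuracy classes and their objects and associate them with the model.
First, we write a common Accuracy class that for now contains a single method, calculate, which returns an accuracy computed from comparison results. The code already calls a self.compare method that does not exist yet; we will create it in the classes that inherit from Accuracy. For now it is enough to know that this method returns a list of True and False values indicating whether each prediction matches the ground truth. We then take the mean of these values (True counts as 1, False as 0) and return it as the accuracy. The code:
# Common accuracy class
class Accuracy:

    # Calculates an accuracy
    # given predictions and ground truth values
    def calculate(self, predictions, y):
        # Get comparison results
        comparisons = self.compare(predictions, y)
        # Calculate an accuracy
        accuracy = np.mean(comparisons)
        # Return accuracy
        return accuracy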
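Next, we can build on this common Accuracy class by inheriting from it and adding functionality for specific types of models. In general, each of these classes contains two methods: init (not to be confused with Python's __init__ method), used for initialization from inside the model object, and compare, which performs the comparison.
For regression models, the init method calculates the precision used for accuracy (the same calculation we previously wrote for the regression model and ran before the training loop). The compare method contains the comparison code we actually used in the training loop, based on self.precision. Note that init does not recalculate the precision unless this is forced by setting the reinit parameter to True. This design allows several use cases, including setting self.precision independently, calling init whenever needed (for example, from outside while creating the model), and even calling init multiple times (which will come in handy later):
# Accuracy calculation for regression model
class Accuracy_Regression(Accuracy):

    def __init__(self):
        # Create precision property
        self.precision = None

    # Calculates precision value
    # based on passed in ground truth
    def init(self, y, reinit=False):
        if self.precision is None or reinit:
            self.precision = np.std(y) / 250

    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        return np.absolute(predictions - y) < self.precision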
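The interplay of init, reinit, and compare is easier to follow on concrete numbers. The sketch below repeats the two classes so that it runs on its own; the target and prediction values are made up for illustration.

```python
import numpy as np


class Accuracy:
    def calculate(self, predictions, y):
        comparisons = self.compare(predictions, y)
        return np.mean(comparisons)


class Accuracy_Regression(Accuracy):
    def __init__(self):
        self.precision = None

    def init(self, y, reinit=False):
        # Only (re)compute when unset or explicitly requested
        if self.precision is None or reinit:
            self.precision = np.std(y) / 250

    def compare(self, predictions, y):
        return np.absolute(predictions - y) < self.precision


# Made-up targets; std is about 1.118, so precision is about 0.0045
y = np.array([[0.0], [1.0], [2.0], [3.0]])
predictions = y + np.array([[0.001], [0.01], [0.0], [-0.002]])

acc = Accuracy_Regression()
acc.init(y)
first_precision = acc.precision
accuracy = acc.calculate(predictions, y)

# A second init call leaves the precision untouched ...
acc.init(y * 10)
unchanged = acc.precision == first_precision

# ... unless reinit=True forces a recalculation
acc.init(y * 10, reinit=True)
recalculated = acc.precision != first_precision

print(accuracy, unchanged, recalculated)
```

Three of the four predictions fall within the precision of their targets, so the accuracy here is 0.75.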
We can then set the accuracy object in the set method of the Model class, the same way we currently set the loss function and the optimizer.
# Model class
class Model:
    ...
    # Set loss, optimizer and accuracy
    def set(self, *, loss, optimizer, accuracy):
        self.loss = loss
        self.optimizer = optimizer
        self.accuracy = accuracy
Then, right after the forward pass code, we can add the loss and accuracy calculations to the model. Note that we also initialize the accuracy object at the start of the train method with self.accuracy.init(y), which can be called multiple times, as mentioned earlier. For regression accuracy, this calculates the precision once, during the first call. Here is the train method with loss and accuracy calculation in place:
# Model class
class Model:
    ...
    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):
        # Initialize accuracy object
        self.accuracy.init(y)
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X)
            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y)
            loss = data_loss + regularization_loss
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)
Finally, in the finalize method we call the previously created remember_trainable_layers method on the Loss class object and pass in the trainable layers (self.loss.remember_trainable_layers(self.trainable_layers)). Here is the complete model class code so far:
# Model class
class Model:

    def __init__(self):
        # Create a list of network objects
        self.layers = []

    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)

    # Set loss, optimizer and accuracy
    def set(self, *, loss, optimizer, accuracy):
        self.loss = loss
        self.optimizer = optimizer
        self.accuracy = accuracy

    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Initialize a list containing trainable layers:
        self.trainable_layers = []
        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]
            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])
        # Update loss object with trainable layers
        self.loss.remember_trainable_layers(self.trainable_layers)

    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):
        # Initialize accuracy object
        self.accuracy.init(y)
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X)
            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y)
            loss = data_loss + regularization_loss
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)

    # Performs forward pass
    def forward(self, X):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X)
        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output)
        # "layer" is now the last object from the list,
        # return its output
        return layer.output
The full code of the Loss class:
# Common loss class
class Loss:

    # Regularization loss calculation
    def regularization_loss(self):
        # 0 by default
        regularization_loss = 0
        # Calculate regularization loss
        # iterate all trainable layers
        for layer in self.trainable_layers:
            # L1 regularization - weights
            # calculate only when factor greater than 0
            if layer.weight_regularizer_l1 > 0:
                regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))
            # L2 regularization - weights
            if layer.weight_regularizer_l2 > 0:
                regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
            # L1 regularization - biases
            # calculate only when factor greater than 0
            if layer.bias_regularizer_l1 > 0:
                regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))
            # L2 regularization - biases
            if layer.bias_regularizer_l2 > 0:
                regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)
        return regularization_loss

    # Set/remember trainable layers
    def remember_trainable_layers(self, trainable_layers):
        self.trainable_layers = trainable_layers

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # Return the data and regularization losses
        return data_loss, self.regularization_loss()
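Now that we have a complete forward pass and calculate loss and accuracy, we can move on to backpropagation. The backward method of the Model class is structurally similar to forward, only in reverse order and with different parameters. As in the backward pass of our earlier training code, we first call the loss object's backward method to create its dinputs attribute. Then we go through all layers in reverse order, calling each layer's backward method with the dinputs attribute of the next layer (next in the normal order) as the parameter, which effectively propagates back the gradient returned by that next layer. Remember that we have already set the loss object as the next object of the last (output) layer.
# Model class
class Model:
    ...
    # Performs backward pass
    def backward(self, output, y):
        # First call backward method on the loss
        # this will set dinputs property that the last
        # layer will try to access shortly
        self.loss.backward(output, y)
        # Call backward method going through all the objects
        # in reversed order passing dinputs as a parameter
        for layer in reversed(self.layers):
            layer.backward(layer.next.dinputs)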
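The reverse traversal through layer.next.dinputs can be verified on a tiny chain where the gradient is easy to compute by hand. This is a sketch under simplifying assumptions: Scale and HalfSquaredError are hypothetical classes made up for this example, with the loss defined as 0.5 * sum((output - y)^2) so that its gradient is simply output - y.

```python
import numpy as np


# Hypothetical layer: output = inputs * factor, so d(output)/d(inputs) = factor
class Scale:
    def __init__(self, factor):
        self.factor = factor

    def forward(self, inputs):
        self.output = inputs * self.factor

    def backward(self, dvalues):
        self.dinputs = dvalues * self.factor


# Hypothetical loss: 0.5 * sum((output - y)^2), gradient = output - y
class HalfSquaredError:
    def backward(self, output, y):
        self.dinputs = output - y


layers = [Scale(2.0), Scale(3.0)]
loss = HalfSquaredError()

# Link "next" references; the last layer points at the loss
layers[0].next = layers[1]
layers[1].next = loss

# Forward pass: x -> 2x -> 6x
x = np.array([[1.0]])
layers[0].forward(x)
layers[1].forward(layers[0].output)
output = layers[1].output

# Backward pass, mirroring the method in the text
y = np.array([[4.0]])
loss.backward(output, y)
for layer in reversed(layers):
    layer.backward(layer.next.dinputs)

print(layers[0].dinputs)
```

With x = 1 and y = 4 the loss is 0.5 * (6x - 4)^2, whose derivative with respect to x is (6x - 4) * 6 = 12, which matches the dinputs of the first layer.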
Next, we call this backward method at the end of the train method:
            # Perform backward pass
            self.backward(output, y)
After the backward pass, the last operation is optimization. Previously, we called the optimizer object's update_params method once for every trainable layer. Now we make this code more general by looping over the list of trainable layers and calling update_params() inside the loop:
            # Optimize (update parameters)
            self.optimizer.pre_update_params()
            for layer in self.trainable_layers:
                self.optimizer.update_params(layer)
            self.optimizer.post_update_params()
Then we can print useful information; this is where the last parameter of the train method comes in:
            # Print a summary
            if not epoch % print_every:
                print(f'epoch: {epoch}, ' +
                      f'acc: {accuracy:.3f}, ' +
                      f'loss: {loss:.3f} (' +
                      f'data_loss: {data_loss:.3f}, ' +
                      f'reg_loss: {regularization_loss:.3f}), ' +
                      f'lr: {self.optimizer.current_learning_rate}')
# Model class
class Model:
    ...
    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):
        # Initialize accuracy object
        self.accuracy.init(y)
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X)
            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y)
            loss = data_loss + regularization_loss
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)
            # Perform backward pass
            self.backward(output, y)
            # Optimize (update parameters)
            self.optimizer.pre_update_params()
            for layer in self.trainable_layers:
                self.optimizer.update_params(layer)
            self.optimizer.post_update_params()
            # Print a summary
            if not epoch % print_every:
                print(f'epoch: {epoch}, ' +
                      f'acc: {accuracy:.3f}, ' +
                      f'loss: {loss:.3f} (' +
                      f'data_loss: {data_loss:.3f}, ' +
                      f'reg_loss: {regularization_loss:.3f}), ' +
                      f'lr: {self.optimizer.current_learning_rate}')
Now we can pass the accuracy class object to the model and test its performance:
>>>
epoch: 100, acc: 0.006, loss: 0.085 (data_loss: 0.085, reg_loss: 0.000), lr: 0.004549590536851684
epoch: 200, acc: 0.032, loss: 0.035 (data_loss: 0.035, reg_loss: 0.000), lr: 0.004170141784820684
...
epoch: 9900, acc: 0.934, loss: 0.000 (data_loss: 0.000, reg_loss: 0.000), lr: 0.00045875768419121016
epoch: 10000, acc: 0.970, loss: 0.000 (data_loss: 0.000, reg_loss: 0.000), lr: 0.00045458678061641964
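Our new model performs well, and with the Model class we can now create new models more easily. We need to keep modifying these classes to support entirely new models. For example, we have not handled binary logistic regression yet. For that, we need to add two things. First, we need to calculate categorical accuracy: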
# Accuracy calculation for classification model
class Accuracy_Categorical(Accuracy):

    def __init__(self, *, binary=False):
        # Binary mode?
        self.binary = binary

    # No initialization is needed
    def init(self, y):
        pass

    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        if not self.binary and len(y.shape) == 2:
            y = np.argmax(y, axis=1)
        return predictions == y
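To try the comparison logic on concrete arrays, here is a minimal standalone sketch. It is not the class itself: it uses a free function and assumes a keyword-only flag named binary (an assumption made for this illustration) that skips the one-hot conversion when the labels have shape (N, 1).

```python
import numpy as np


# Sketch of a categorical comparison; the `binary` flag is an assumed name
def compare(predictions, y, *, binary=False):
    # One-hot labels are converted to sparse labels, unless in binary mode
    if not binary and len(y.shape) == 2:
        y = np.argmax(y, axis=1)
    return predictions == y


# Multi-class case: sparse and one-hot labels give the same accuracy
preds = np.array([0, 2, 1])
sparse = np.array([0, 2, 2])
one_hot = np.eye(3)[sparse]
acc_sparse = np.mean(compare(preds, sparse))
acc_one_hot = np.mean(compare(preds, one_hot))

# Binary case: labels and predictions both have shape (N, 1)
bin_preds = np.array([[1], [0], [1], [1]])
bin_y = np.array([[1], [0], [0], [1]])
acc_binary = np.mean(compare(bin_preds, bin_y, binary=True))

# Without the flag, the (N, 1) labels are treated as one-hot and collapse
# to zeros of shape (N,), so the comparison broadcasts to (N, N)
wrong_shape = compare(bin_preds, bin_y).shape

print(acc_sparse, acc_one_hot, acc_binary, wrong_shape)
```

The last line of the example shows why a switch is useful: for labels reshaped to (N, 1), as in a binary logistic regression setup, running the one-hot conversion would produce a comparison of the wrong shape rather than a per-sample result.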
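This is the same accuracy calculation as for classification, just wrapped into a class and with an added switch parameter. When the class is used with a binary cross-entropy model, the switch disables the conversion of one-hot encoded labels to sparse labels, because that model always requires the ground-truth values to be a 2D array and they are not one-hot encoded. Note that no initialization is performed here, but the init method has to exist because it is called from the train method of the Model class. The second thing we need to add is the ability to validate the model on validation data. Validation only requires a forward pass and a loss calculation (data loss only). We will modify the calculate method of the Loss class so that it can also calculate the validation loss: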
# Common loss class
class Loss:
    ...
    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y, *, include_regularization=False):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # If just data loss - return it
        if not include_regularization:
            return data_loss
        # Return the data and regularization losses
        return data_loss, self.regularization_loss()
We added a new parameter and a condition that return only the data loss, since the regularization loss is not used in this case. To run it, we pass predictions and targets the same way as with the training data. By default the regularization loss is not returned, which means we need to update the call to this method in train so that the regularization loss is included during training:
            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y, include_regularization=True)
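The two return shapes of calculate are worth seeing side by side, because the caller has to unpack them differently. The sketch below uses a hypothetical ToyLoss with a mean absolute error and a fixed placeholder penalty; both are assumptions made only so the example runs on its own.

```python
import numpy as np


# Hypothetical toy loss: mean absolute error per sample, fixed fake penalty
class ToyLoss:
    def forward(self, output, y):
        return np.mean(np.abs(output - y), axis=-1)

    def regularization_loss(self):
        # Placeholder value standing in for the real penalty
        return 0.25

    def calculate(self, output, y, *, include_regularization=False):
        data_loss = np.mean(self.forward(output, y))
        # Validation path: a single number
        if not include_regularization:
            return data_loss
        # Training path: a (data_loss, regularization_loss) tuple
        return data_loss, self.regularization_loss()


loss_fn = ToyLoss()
output = np.array([[1.0], [3.0]])
y = np.array([[0.0], [1.0]])

# Validation-style call returns only the data loss
val_loss = loss_fn.calculate(output, y)

# Training-style call returns both parts, which can be unpacked
data_loss, reg_loss = loss_fn.calculate(output, y, include_regularization=True)

print(val_loss, data_loss, reg_loss)
```

Because the default returns a single value, a training call that forgets include_regularization=True fails when it tries to unpack the result, which makes the mistake easy to spot.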
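We can then add the validation code to the train method of the Model class. We added a validation_data parameter to the function, which takes a tuple containing the validation data (samples and targets); an if statement checks whether validation data is present; if so, the code performs a forward pass over that data, calculates loss and accuracy the same way as during training, and prints the results: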
# Model class
class Model:
    ...
    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):
        ...
        # If there is the validation data
        if validation_data is not None:
            # For better readability
            X_val, y_val = validation_data
            # Perform the forward pass
            output = self.forward(X_val)
            # Calculate the loss
            loss = self.loss.calculate(output, y_val)
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y_val)
            # Print a summary
            print(f'validation, ' +
                  f'acc: {accuracy:.3f}, ' +
                  f'loss: {loss:.3f}')
Now we can create the test data and test the binary logistic regression model with the following code:
# Create train and test dataset
X, y = spiral_data(samples=100, classes=2)
X_test, y_test = spiral_data(samples=100, classes=2)

# Reshape labels to be a list of lists
# Inner list contains one output (either 0 or 1)
# per each output neuron, 1 in this case
y = y.reshape(-1, 1)
y_test = y_test.reshape(-1, 1)

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(2, 64, weight_regularizer_l2=5e-4, bias_regularizer_l2=5e-4))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Sigmoid())

# Set loss, optimizer and accuracy objects
model.set(
    loss=Loss_BinaryCrossentropy(),
    optimizer=Optimizer_Adam(decay=5e-7),
    accuracy=Accuracy_Categorical()
)

# Finalize the model
model.finalize()

# Train the model
model.train(X, y, validation_data=(X_test, y_test), epochs=10000, print_every=100)
>>>
epoch: 100, acc: 0.625, loss: 0.675 (data_loss: 0.674, reg_loss: 0.001), lr: 0.0009999505024501287
epoch: 200, acc: 0.630, loss: 0.669 (data_loss: 0.668, reg_loss: 0.001), lr: 0.0009999005098992651
...
epoch: 9900, acc: 0.905, loss: 0.312 (data_loss: 0.276, reg_loss: 0.037), lr: 0.0009950748768967994
epoch: 10000, acc: 0.905, loss: 0.312 (data_loss: 0.275, reg_loss: 0.036), lr: 0.0009950253706593885
validation, acc: 0.775, loss: 0.423
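Now that we have streamlined the forward and backward pass code, including validation, it is a good time to bring Dropout back. Recall that Dropout is a method that regularizes the model and improves its ability to generalize by disabling, or filtering out, some neurons. If the model uses Dropout, we need to make sure Dropout is not applied during validation and inference (prediction). In our earlier code we achieved this by not calling the Dropout layer's forward method during validation. Here we have a common method that performs the forward pass for both training and validation, so we need a different way to turn Dropout off: we tell the layers whether we are training and let them "decide" whether to include the calculation. The first thing we do is add a boolean training parameter to the forward method of every layer and activation function class, since we have to call them in a uniform way:
    # Forward pass
    def forward(self, inputs, training):
When we are not in training mode, the Layer_Dropout class can set the output directly to the input and return from the method without changing the values:
        # If not in the training mode - return values
        if not training:
            self.output = inputs.copy()
            return
When we are training, dropout takes part: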
# Dropout
class Layer_Dropout:
    ...
    # Forward pass
    def forward(self, inputs, training):
        # Save input values
        self.inputs = inputs
        # If not in the training mode - return values
        if not training:
            self.output = inputs.copy()
            return
        # Generate and save scaled mask
        self.binary_mask = np.random.binomial(1, self.rate, size=inputs.shape) / self.rate
        # Apply mask to output values
        self.output = inputs * self.binary_mask
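The effect of the training flag can be checked with a standalone sketch. DropoutSketch is a hypothetical class written for this example; it assumes the constructor receives the fraction of units to drop and stores the keep probability in self.rate, which is what the mask scaling in the forward method relies on.

```python
import numpy as np


class DropoutSketch:
    def __init__(self, rate):
        # Assumption: `rate` is the fraction dropped; store the keep probability
        self.rate = 1 - rate

    def forward(self, inputs, training):
        self.inputs = inputs
        # Inference/validation: pass the values through unchanged
        if not training:
            self.output = inputs.copy()
            return
        # Training: zero out some values and scale the rest by 1 / keep probability
        self.binary_mask = np.random.binomial(1, self.rate, size=inputs.shape) / self.rate
        self.output = inputs * self.binary_mask


np.random.seed(0)
layer = DropoutSketch(0.5)
x = np.ones((4, 5))

layer.forward(x, training=False)
inference_out = layer.output

layer.forward(x, training=True)
training_out = layer.output

print(inference_out)
print(training_out)
```

With a drop rate of 0.5 and inputs of 1, every value in the training output is either 0 (dropped) or 2 (kept and scaled), while the inference output is an unchanged copy of the input.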
Next, we modify the forward method of the Model class: we add the training parameter and pass its value on in the calls to the layers' forward methods:
# Model class
class Model:
    ...
    # Performs forward pass
    def forward(self, X, training):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X, training)
        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output, training)
        # "layer" is now the last object from the list,
        # return its output
        return layer.output
We also need to update the train method of the Model class, because the training parameter has to be set to True in the call to forward:
            # Perform the forward pass
            output = self.forward(X, training=True)
And to False during validation:
            # Perform the forward pass
            output = self.forward(X_val, training=False)
# Model class
class Model:
    ...
    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):
        # Initialize accuracy object
        self.accuracy.init(y)
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X, training=True)
            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y, include_regularization=True)
            loss = data_loss + regularization_loss
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)
            # Perform backward pass
            self.backward(output, y)
            # Optimize (update parameters)
            self.optimizer.pre_update_params()
            for layer in self.trainable_layers:
                self.optimizer.update_params(layer)
            self.optimizer.post_update_params()
            # Print a summary
            if not epoch % print_every:
                print(f'epoch: {epoch}, ' +
                      f'acc: {accuracy:.3f}, ' +
                      f'loss: {loss:.3f} (' +
                      f'data_loss: {data_loss:.3f}, ' +
                      f'reg_loss: {regularization_loss:.3f}), ' +
                      f'lr: {self.optimizer.current_learning_rate}')
        # If there is the validation data
        if validation_data is not None:
            # For better readability
            X_val, y_val = validation_data
            # Perform the forward pass
            output = self.forward(X_val, training=False)
            # Calculate the loss
            loss = self.loss.calculate(output, y_val)
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y_val)
            # Print a summary
            print(f'validation, ' +
                  f'acc: {accuracy:.3f}, ' +
                  f'loss: {loss:.3f}')
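Finally, we need to handle the combined Softmax activation and CrossEntropy loss class in the Model class. The challenge is that we used to define the forward and backward pass by hand for each model separately. Now, however, we have loops in both directions of the calculation, a uniform way of computing outputs and gradients, and other improvements. We cannot simply remove the Softmax activation and the Categorical Cross-Entropy loss and replace them with a single object combining both. With the code as it stands this would not work, because we handle the output activation function and the loss function in specific ways.
Since the combined object only optimizes the backward pass, we decided to leave the forward pass unchanged, still using the separate Softmax activation and Categorical Cross-Entropy loss objects, and to handle only the backward pass.
First, we need to determine automatically whether the current model is a classifier and whether it uses the Softmax activation together with the Categorical Cross-Entropy loss. We can do this by checking the class of the last layer object (which is an activation function object) and the class of the loss function object. We add this check at the end of the finalize method: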
        # If output activation is Softmax and
        # loss function is Categorical Cross-Entropy
        # create an object of combined activation
        # and loss function containing
        # faster gradient calculation
        if isinstance(self.layers[-1], Activation_Softmax) and \
           isinstance(self.loss, Loss_CategoricalCrossentropy):
            # Create an object of combined activation
            # and loss functions
            self.softmax_classifier_output = \
                Activation_Softmax_Loss_CategoricalCrossentropy()
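For this check we use Python's isinstance function, which returns True if the given object is an instance of the specified class. If both checks return True, we set a new attribute that holds an object of the Activation_Softmax_Loss_CategoricalCrossentropy class.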
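The detection logic can be exercised without the real classes. In the sketch below, the four classes are empty placeholders that only borrow the names used in the text, and uses_combined_backward is a hypothetical helper name introduced for this example.

```python
# Hypothetical empty placeholders named after the classes in the text
class Activation_Softmax:
    pass


class Activation_Linear:
    pass


class Loss_CategoricalCrossentropy:
    pass


class Loss_MeanSquaredError:
    pass


def uses_combined_backward(last_layer, loss):
    """True only when both the output activation and the loss match."""
    return isinstance(last_layer, Activation_Softmax) and \
        isinstance(loss, Loss_CategoricalCrossentropy)


classifier = uses_combined_backward(Activation_Softmax(), Loss_CategoricalCrossentropy())
regressor = uses_combined_backward(Activation_Linear(), Loss_MeanSquaredError())
mixed = uses_combined_backward(Activation_Softmax(), Loss_MeanSquaredError())

print(classifier, regressor, mixed)
```

Only the softmax plus categorical cross-entropy pairing qualifies; a softmax output paired with a different loss falls back to the regular backward pass.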
我们还必要在Model类的构造函数中，将此属性初始化为None值：
   # Softmax classifier's output object
   self.softmax_classifier_output = None
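The last step is to check during the backward pass whether this object has been set and, if so, to use it. To do that, we modify the current backward pass code slightly to handle this case separately.
First, we call the backward method of the combined object. Then, since we will not call the backward method of the activation function object (the last object in the layer list), we set that object's dinputs attribute to the gradient computed in the combined activation/loss object. Finally, we iterate over all layers except the last one in reverse order and perform their backward passes: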
   # If softmax classifier
   if self.softmax_classifier_output is not None:
         # First call backward method
         # on the combined activation/loss
         # this will set dinputs property
         self.softmax_classifier_output.backward(output, y)
         # Since we'll not call backward method of the last layer
         # which is Softmax activation
         # as we used combined activation/loss
         # object, let's set dinputs in this object
         self.layers[-1].dinputs = self.softmax_classifier_output.dinputs
         # Call backward method going through
         # all the objects but last
         # in reversed order passing dinputs as a parameter
         for layer in reversed(self.layers[:-1]):
               layer.backward(layer.next.dinputs)
         return
The full Model class code up to this point:
# Model class
class Model:

    def __init__(self):
        # Create a list of network objects
        self.layers = []
        # Softmax classifier's output object
        self.softmax_classifier_output = None

    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)

    # Set loss, optimizer and accuracy
    def set(self, *, loss, optimizer, accuracy):
        self.loss = loss
        self.optimizer = optimizer
        self.accuracy = accuracy

    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Initialize a list containing trainable layers:
        self.trainable_layers = []
        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]
            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])
        # Update loss object with trainable layers
        self.loss.remember_trainable_layers(self.trainable_layers)
        # If output activation is Softmax and
        # loss function is Categorical Cross-Entropy
        # create an object of combined activation
        # and loss function containing
        # faster gradient calculation
        if isinstance(self.layers[-1], Activation_Softmax) and \
                isinstance(self.loss, Loss_CategoricalCrossentropy):
            # Create an object of combined activation
            # and loss functions
            self.softmax_classifier_output = \
                Activation_Softmax_Loss_CategoricalCrossentropy()

    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):
        # Initialize accuracy object
        self.accuracy.init(y)
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X, training=True)
            # Calculate loss
            data_loss, regularization_loss = \
                self.loss.calculate(output, y, include_regularization=True)
            loss = data_loss + regularization_loss
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)
            # Perform backward pass
            self.backward(output, y)
            # Optimize (update parameters)
            self.optimizer.pre_update_params()
            for layer in self.trainable_layers:
                self.optimizer.update_params(layer)
            self.optimizer.post_update_params()
            # Print a summary
            if not epoch % print_every:
                print(f'epoch: {epoch}, ' +
                      f'acc: {accuracy:.3f}, ' +
                      f'loss: {loss:.3f} (' +
                      f'data_loss: {data_loss:.3f}, ' +
                      f'reg_loss: {regularization_loss:.3f}), ' +
                      f'lr: {self.optimizer.current_learning_rate}')
        # If there is the validation data
        if validation_data is not None:
            # For better readability
            X_val, y_val = validation_data
            # Perform the forward pass
            output = self.forward(X_val, training=False)
            # Calculate the loss
            loss = self.loss.calculate(output, y_val)
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y_val)
            # Print a summary
            print(f'validation, ' +
                  f'acc: {accuracy:.3f}, ' +
                  f'loss: {loss:.3f}')

    # Performs forward pass
    def forward(self, X, training):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X, training)
        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output, training)
        # "layer" is now the last object from the list,
        # return its output
        return layer.output

    # Performs backward pass
    def backward(self, output, y):
        # If softmax classifier
        if self.softmax_classifier_output is not None:
            # First call backward method
            # on the combined activation/loss
            # this will set dinputs property
            self.softmax_classifier_output.backward(output, y)
            # Since we'll not call backward method of the last layer
            # which is Softmax activation
            # as we used combined activation/loss
            # object, let's set dinputs in this object
            self.layers[-1].dinputs = self.softmax_classifier_output.dinputs
            # Call backward method going through
            # all the objects but last
            # in reversed order passing dinputs as a parameter
            for layer in reversed(self.layers[:-1]):
                layer.backward(layer.next.dinputs)
            return
        # First call backward method on the loss
        # this will set dinputs property that the last
        # layer will try to access shortly
        self.loss.backward(output, y)
        # Call backward method going through all the objects
        # in reversed order passing dinputs as a parameter
        for layer in reversed(self.layers):
            layer.backward(layer.next.dinputs)

In addition, we no longer need the initializer and the forward method of the Activation_Softmax_Loss_CategoricalCrossentropy class, so we can remove them and keep only the backward method:
# Softmax classifier - combined Softmax activation
# and cross-entropy loss for faster backward step
class Activation_Softmax_Loss_CategoricalCrossentropy():
    ...
    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # If labels are one-hot encoded,
        # turn them into discrete values
        if len(y_true.shape) == 2:
            y_true = np.argmax(y_true, axis=1)
        # Copy so we can safely modify
        self.dinputs = dvalues.copy()
        # Calculate gradient
        self.dinputs[range(samples), y_true] -= 1
        # Normalize gradient
        self.dinputs = self.dinputs / samples
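To see why this shortcut can replace the separate backward passes, the sketch below compares the two routes numerically on a tiny random batch. It is a standalone check under stated assumptions rather than part of the chapter's code: the logits, labels and seed are made up for illustration, and the variable names (`separate`, `combined`) are not from the original.

```python
import numpy as np

np.random.seed(0)
samples, classes = 5, 3
logits = np.random.randn(samples, classes)   # made-up inputs
y_true = np.array([0, 2, 1, 1, 0])           # made-up sparse labels

# Softmax forward pass
exp_values = np.exp(logits - logits.max(axis=1, keepdims=True))
output = exp_values / exp_values.sum(axis=1, keepdims=True)

# Route 1: cross-entropy backward, then Softmax backward via the Jacobian
one_hot = np.eye(classes)[y_true]
dloss = -one_hot / output / samples
separate = np.empty_like(output)
for i, (single_output, single_dvalues) in enumerate(zip(output, dloss)):
    single_output = single_output.reshape(-1, 1)
    jacobian = np.diagflat(single_output) - single_output @ single_output.T
    separate[i] = jacobian @ single_dvalues

# Route 2: combined shortcut (predicted - one-hot target) / samples
combined = output.copy()
combined[range(samples), y_true] -= 1
combined = combined / samples

# Largest absolute difference between the two gradients
max_difference = np.abs(separate - combined).max()
```

The two gradients agree up to floating-point rounding, while the combined route avoids building one Jacobian matrix per sample.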
Now we can test the updated Model object, this time with Dropout:
# Create dataset
X, y = spiral_data(samples=1000, classes=3)
X_test, y_test = spiral_data(samples=100, classes=3)
# Instantiate the model
model = Model()
# Add layers
model.add(Layer_Dense(2, 512, weight_regularizer_l2=5e-4, bias_regularizer_l2=5e-4))
model.add(Activation_ReLU())
model.add(Layer_Dropout(0.1))
model.add(Layer_Dense(512, 3))
model.add(Activation_Softmax())
# Set loss, optimizer and accuracy objects
model.set(
loss=Loss_CategoricalCrossentropy(),
optimizer=Optimizer_Adam(learning_rate=0.05, decay=5e-5),
accuracy=Accuracy_Categorical()
)
# Finalize the model
model.finalize()
# Train the model
model.train(X, y, validation_data=(X_test, y_test), epochs=10000, print_every=100)
>>>
epoch: 100, acc: 0.716, loss: 0.726 (data_loss: 0.666, reg_loss: 0.060), lr: 0.04975371909050202
epoch: 200, acc: 0.787, loss: 0.615 (data_loss: 0.538, reg_loss: 0.077), lr: 0.049507401356502806
...
epoch: 9900, acc: 0.861, loss: 0.436 (data_loss: 0.389, reg_loss: 0.046), lr: 0.0334459346466437
epoch: 10000, acc: 0.880, loss: 0.394 (data_loss: 0.347, reg_loss: 0.047), lr: 0.03333444448148271
validation, acc: 0.867, loss: 0.379
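The lr values in this output follow from the decay formula in the optimizer's pre_update_params method. As a rough check, the sketch below reproduces two of them; `decayed_learning_rate` is a hypothetical helper written for this illustration, and it assumes that the optimizer's iteration counter equals epoch - 1 when an epoch's summary is printed, since the counter is incremented only after each update.

```python
def decayed_learning_rate(initial_rate, decay, iterations):
    # Same formula as pre_update_params in the chapter's optimizers
    return initial_rate * (1. / (1. + decay * iterations))


# learning_rate=0.05 and decay=5e-5 are the settings used in the test above
lr_epoch_100 = decayed_learning_rate(0.05, 5e-5, 99)
lr_epoch_10000 = decayed_learning_rate(0.05, 5e-5, 9999)

print(lr_epoch_100)
print(lr_epoch_10000)
```

With these settings the learning rate falls from 0.05 to roughly two thirds of that value over 10,000 epochs, which matches the last lr printed in the training log.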
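It looks like everything works as expected. With this Model class in place, we can define new models without rewriting large amounts of code. Rewriting code is not only tedious, it also makes small, hard-to-spot mistakes more likely.

The full code up to this point: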

import numpy as np
import nnfs
from nnfs.datasets import sine_data, spiral_data
import sys

nnfs.init()


# Dense layer
class Layer_Dense:

    # Layer initialization
    def __init__(self, n_inputs, n_neurons,
                 weight_regularizer_l1=0, weight_regularizer_l2=0,
                 bias_regularizer_l1=0, bias_regularizer_l2=0):
        # Initialize weights and biases
        # self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)
        self.weights = 0.1 * np.random.randn(n_inputs, n_neurons)
        self.biases = np.zeros((1, n_neurons))
        # Set regularization strength
        self.weight_regularizer_l1 = weight_regularizer_l1
        self.weight_regularizer_l2 = weight_regularizer_l2
        self.bias_regularizer_l1 = bias_regularizer_l1
        self.bias_regularizer_l2 = bias_regularizer_l2

    # Forward pass
    def forward(self, inputs, training):
        # Remember input values
        self.inputs = inputs
        # Calculate output values from inputs, weights and biases
        self.output = np.dot(inputs, self.weights) + self.biases

    # Backward pass
    def backward(self, dvalues):
        # Gradients on parameters
        self.dweights = np.dot(self.inputs.T, dvalues)
        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)
        # Gradients on regularization
        # L1 on weights
        if self.weight_regularizer_l1 > 0:
            dL1 = np.ones_like(self.weights)
            dL1[self.weights < 0] = -1
            self.dweights += self.weight_regularizer_l1 * dL1
        # L2 on weights
        if self.weight_regularizer_l2 > 0:
            self.dweights += 2 * self.weight_regularizer_l2 * self.weights
        # L1 on biases
        if self.bias_regularizer_l1 > 0:
            dL1 = np.ones_like(self.biases)
            dL1[self.biases < 0] = -1
            self.dbiases += self.bias_regularizer_l1 * dL1
        # L2 on biases
        if self.bias_regularizer_l2 > 0:
            self.dbiases += 2 * self.bias_regularizer_l2 * self.biases
        # Gradient on values
        self.dinputs = np.dot(dvalues, self.weights.T)


# Dropout
class Layer_Dropout:

    # Init
    def __init__(self, rate):
        # Store rate, we invert it as for example for dropout
        # of 0.1 we need success rate of 0.9
        self.rate = 1 - rate

    # Forward pass
    def forward(self, inputs, training):
        # Save input values
        self.inputs = inputs
        # If not in the training mode - return values
        if not training:
            self.output = inputs.copy()
            return
        # Generate and save scaled mask
        self.binary_mask = np.random.binomial(1, self.rate, size=inputs.shape) / self.rate
        # Apply mask to output values
        self.output = inputs * self.binary_mask

    # Backward pass
    def backward(self, dvalues):
        # Gradient on values
        self.dinputs = dvalues * self.binary_mask


# Input "layer"
class Layer_Input:

    # Forward pass
    def forward(self, inputs, training):
        self.output = inputs


# ReLU activation
class Activation_ReLU:

    # Forward pass
    def forward(self, inputs, training):
        # Remember input values
        self.inputs = inputs
        # Calculate output values from inputs
        self.output = np.maximum(0, inputs)

    # Backward pass
    def backward(self, dvalues):
        # Since we need to modify original variable,
        # let's make a copy of values first
        self.dinputs = dvalues.copy()
        # Zero gradient where input values were negative
        self.dinputs[self.inputs <= 0] = 0

    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs


# Softmax activation
class Activation_Softmax:

    # Forward pass
    def forward(self, inputs, training):
        # Remember input values
        self.inputs = inputs
        # Get unnormalized probabilities
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities

    # Backward pass
    def backward(self, dvalues):
        # Create uninitialized array
        self.dinputs = np.empty_like(dvalues)
        # Enumerate outputs and gradients
        for index, (single_output, single_dvalues) in enumerate(zip(self.output, dvalues)):
            # Flatten output array
            single_output = single_output.reshape(-1, 1)
            # Calculate Jacobian matrix of the output
            jacobian_matrix = np.diagflat(single_output) - np.dot(single_output, single_output.T)
            # Calculate sample-wise gradient
            # and add it to the array of sample gradients
            self.dinputs[index] = np.dot(jacobian_matrix, single_dvalues)

    # Calculate predictions for outputs
    def predictions(self, outputs):
        return np.argmax(outputs, axis=1)


# Sigmoid activation
class Activation_Sigmoid:

    # Forward pass
    def forward(self, inputs, training):
        # Save input and calculate/save output
        # of the sigmoid function
        self.inputs = inputs
        self.output = 1 / (1 + np.exp(-inputs))

    # Backward pass
    def backward(self, dvalues):
        # Derivative - calculates from output of the sigmoid function
        self.dinputs = dvalues * (1 - self.output) * self.output

    # Calculate predictions for outputs
    def predictions(self, outputs):
        return (outputs > 0.5) * 1


# Linear activation
class Activation_Linear:

    # Forward pass
    def forward(self, inputs, training):
        # Just remember values
        self.inputs = inputs
        self.output = inputs

    # Backward pass
    def backward(self, dvalues):
        # derivative is 1, 1 * dvalues = dvalues - the chain rule
        self.dinputs = dvalues.copy()

    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs


# SGD optimizer
class Optimizer_SGD:

    # Initialize optimizer - set settings,
    # learning rate of 1. is default for this optimizer
    def __init__(self, learning_rate=1., decay=0., momentum=0.):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.momentum = momentum

    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):
        # If we use momentum
        if self.momentum:
            # If layer does not contain momentum arrays, create them
            # filled with zeros
            if not hasattr(layer, 'weight_momentums'):
                layer.weight_momentums = np.zeros_like(layer.weights)
                # If there is no momentum array for weights
                # The array doesn't exist for biases yet either.
                layer.bias_momentums = np.zeros_like(layer.biases)
            # Build weight updates with momentum - take previous
            # updates multiplied by retain factor and update with
            # current gradients
            weight_updates = self.momentum * layer.weight_momentums - self.current_learning_rate * layer.dweights
            layer.weight_momentums = weight_updates
            # Build bias updates
            bias_updates = self.momentum * layer.bias_momentums - self.current_learning_rate * layer.dbiases
            layer.bias_momentums = bias_updates
        # Vanilla SGD updates (as before momentum update)
        else:
            weight_updates = -self.current_learning_rate * layer.dweights
            bias_updates = -self.current_learning_rate * layer.dbiases
        # Update weights and biases using either
        # vanilla or momentum updates
        layer.weights += weight_updates
        layer.biases += bias_updates

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1


# Adagrad optimizer
class Optimizer_Adagrad:

    # Initialize optimizer - set settings
    def __init__(self, learning_rate=1., decay=0., epsilon=1e-7):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.epsilon = epsilon

    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):
        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_cache = np.zeros_like(layer.biases)
        # Update cache with squared current gradients
        layer.weight_cache += layer.dweights**2
        layer.bias_cache += layer.dbiases**2
        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * layer.dweights / (np.sqrt(layer.weight_cache) + self.epsilon)
        layer.biases += -self.current_learning_rate * layer.dbiases / (np.sqrt(layer.bias_cache) + self.epsilon)

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1


# RMSprop optimizer
class Optimizer_RMSprop:

    # Initialize optimizer - set settings
    def __init__(self, learning_rate=0.001, decay=0., epsilon=1e-7, rho=0.9):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.epsilon = epsilon
        self.rho = rho

    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):
        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_cache = np.zeros_like(layer.biases)
        # Update cache with squared current gradients
        layer.weight_cache = self.rho * layer.weight_cache + (1 - self.rho) * layer.dweights**2
        layer.bias_cache = self.rho * layer.bias_cache + (1 - self.rho) * layer.dbiases**2
        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * layer.dweights / (np.sqrt(layer.weight_cache) + self.epsilon)
        layer.biases += -self.current_learning_rate * layer.dbiases / (np.sqrt(layer.bias_cache) + self.epsilon)

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1


# Adam optimizer
class Optimizer_Adam:

    # Initialize optimizer - set settings
    def __init__(self, learning_rate=0.001, decay=0., epsilon=1e-7, beta_1=0.9, beta_2=0.999):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.epsilon = epsilon
        self.beta_1 = beta_1
        self.beta_2 = beta_2

    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):
        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_momentums = np.zeros_like(layer.weights)
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_momentums = np.zeros_like(layer.biases)
            layer.bias_cache = np.zeros_like(layer.biases)
        # Update momentum with current gradients
        layer.weight_momentums = self.beta_1 * layer.weight_momentums + (1 - self.beta_1) * layer.dweights
        layer.bias_momentums = self.beta_1 * layer.bias_momentums + (1 - self.beta_1) * layer.dbiases
        # Get corrected momentum
        # self.iteration is 0 at first pass
        # and we need to start with 1 here
        weight_momentums_corrected = layer.weight_momentums / (1 - self.beta_1 ** (self.iterations + 1))
        bias_momentums_corrected = layer.bias_momentums / (1 - self.beta_1 ** (self.iterations + 1))
        # Update cache with squared current gradients
        layer.weight_cache = self.beta_2 * layer.weight_cache + (1 - self.beta_2) * layer.dweights**2
        layer.bias_cache = self.beta_2 * layer.bias_cache + (1 - self.beta_2) * layer.dbiases**2
        # Get corrected cache
        weight_cache_corrected = layer.weight_cache / (1 - self.beta_2 ** (self.iterations + 1))
        bias_cache_corrected = layer.bias_cache / (1 - self.beta_2 ** (self.iterations + 1))
        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * weight_momentums_corrected / (np.sqrt(weight_cache_corrected) + self.epsilon)
        layer.biases += -self.current_learning_rate * bias_momentums_corrected / (np.sqrt(bias_cache_corrected) + self.epsilon)

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1


# Common loss class
class Loss:

    # Regularization loss calculation
    def regularization_loss(self):
        # 0 by default
        regularization_loss = 0
        # Calculate regularization loss
        # iterate all trainable layers
        for layer in self.trainable_layers:
            # L1 regularization - weights
            # calculate only when factor greater than 0
            if layer.weight_regularizer_l1 > 0:
                regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))
            # L2 regularization - weights
            if layer.weight_regularizer_l2 > 0:
                regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
            # L1 regularization - biases
            # calculate only when factor greater than 0
            if layer.bias_regularizer_l1 > 0:
                regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))
            # L2 regularization - biases
            if layer.bias_regularizer_l2 > 0:
                regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)
        return regularization_loss

    # Set/remember trainable layers
    def remember_trainable_layers(self, trainable_layers):
        self.trainable_layers = trainable_layers

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y, *, include_regularization=False):
        # Calculate sample losses
        sample_losses = self.forward(output, y)
        # Calculate mean loss
        data_loss = np.mean(sample_losses)
        # If just data loss - return it
        if not include_regularization:
            return data_loss
        # Return the data and regularization losses
        return data_loss, self.regularization_loss()


# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):
        # Number of samples in a batch
        samples = len(y_pred)
        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
        # Probabilities for target values -
        # only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[
                range(samples),
                y_true
            ]
        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)
        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods

    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # Number of labels in every sample
        # We'll use the first sample to count them
        labels = len(dvalues[0])
        # If labels are sparse, turn them into one-hot vector
        if len(y_true.shape) == 1:
            y_true = np.eye(labels)[y_true]
        # Calculate gradient
        self.dinputs = -y_true / dvalues
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Softmax classifier - combined Softmax activation
# and cross-entropy loss for faster backward step
class Activation_Softmax_Loss_CategoricalCrossentropy():

    # # Creates activation and loss function objects
    # def __init__(self):
    #     self.activation = Activation_Softmax()
    #     self.loss = Loss_CategoricalCrossentropy()

    # # Forward pass
    # def forward(self, inputs, y_true):
    #     # Output layer's activation function
    #     self.activation.forward(inputs)
    #     # Set the output
    #     self.output = self.activation.output
    #     # Calculate and return loss value
    #     return self.loss.calculate(self.output, y_true)

    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # If labels are one-hot encoded,
        # turn them into discrete values
        if len(y_true.shape) == 2:
            y_true = np.argmax(y_true, axis=1)
        # Copy so we can safely modify
        self.dinputs = dvalues.copy()
        # Calculate gradient
        self.dinputs[range(samples), y_true] -= 1
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Binary cross-entropy loss
class Loss_BinaryCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):
        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)
        # Calculate sample-wise loss
        sample_losses = -(y_true * np.log(y_pred_clipped) + (1 - y_true) * np.log(1 - y_pred_clipped))
        sample_losses = np.mean(sample_losses, axis=-1)
        # Return losses
        return sample_losses

    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])
        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        clipped_dvalues = np.clip(dvalues, 1e-7, 1 - 1e-7)
        # Calculate gradient
        self.dinputs = -(y_true / clipped_dvalues - (1 - y_true) / (1 - clipped_dvalues)) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Mean Squared Error loss
class Loss_MeanSquaredError(Loss):  # L2 loss

    # Forward pass
    def forward(self, y_pred, y_true):
        # Calculate loss
        sample_losses = np.mean((y_true - y_pred)**2, axis=-1)
        # Return losses
        return sample_losses

    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])
        # Gradient on values
        self.dinputs = -2 * (y_true - dvalues) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Mean Absolute Error loss
class Loss_MeanAbsoluteError(Loss):  # L1 loss

    def forward(self, y_pred, y_true):
        # Calculate loss
        sample_losses = np.mean(np.abs(y_true - y_pred), axis=-1)
        # Return losses
        return sample_losses

    # Backward pass
    def backward(self, dvalues, y_true):
        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])
        # Calculate gradient
        self.dinputs = np.sign(y_true - dvalues) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Common accuracy class
class Accuracy:

    # Calculates an accuracy
    # given predictions and ground truth values
    def calculate(self, predictions, y):
        # Get comparison results
        comparisons = self.compare(predictions, y)
        # Calculate an accuracy
        accuracy = np.mean(comparisons)
        # Return accuracy
        return accuracy


# Accuracy calculation for classification model
class Accuracy_Categorical(Accuracy):

    # No initialization is needed
    def init(self, y):
        pass

    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        if len(y.shape) == 2:
            y = np.argmax(y, axis=1)
        return predictions == y


# Accuracy calculation for regression model
class Accuracy_Regression(Accuracy):

    def __init__(self):
        # Create precision property
        self.precision = None

    # Calculates precision value
    # based on passed in ground truth
    def init(self, y, reinit=False):
        if self.precision is None or reinit:
            self.precision = np.std(y) / 250

    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        return np.absolute(predictions - y) < self.precision


# Model class
class Model:

    def __init__(self):
        # Create a list of network objects
        self.layers = []
        # Softmax classifier's output object
        self.softmax_classifier_output = None

    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)

    # Set loss, optimizer and accuracy
    def set(self, *, loss, optimizer, accuracy):
        self.loss = loss
        self.optimizer = optimizer
        self.accuracy = accuracy

    # Finalize the model
    def finalize(self):
        # Create and set the input layer
        self.input_layer = Layer_Input()
        # Count all the objects
        layer_count = len(self.layers)
        # Initialize a list containing trainable layers:
        self.trainable_layers = []
        # Iterate the objects
        for i in range(layer_count):
            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]
            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]
            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]
            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])
        # Update loss object with trainable layers
        self.loss.remember_trainable_layers(self.trainable_layers)
        # If output activation is Softmax and
        # loss function is Categorical Cross-Entropy
        # create an object of combined activation
        # and loss function containing
        # faster gradient calculation
        if isinstance(self.layers[-1], Activation_Softmax) and \
                isinstance(self.loss, Loss_CategoricalCrossentropy):
            # Create an object of combined activation
            # and loss functions
            self.softmax_classifier_output = \
                Activation_Softmax_Loss_CategoricalCrossentropy()

    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):
        # Initialize accuracy object
        self.accuracy.init(y)
        # Main training loop
        for epoch in range(1, epochs+1):
            # Perform the forward pass
            output = self.forward(X, training=True)
            # Calculate loss
            data_loss, regularization_loss = \
                self.loss.calculate(output, y, include_regularization=True)
            loss = data_loss + regularization_loss
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)
            # Perform backward pass
            self.backward(output, y)
            # Optimize (update parameters)
            self.optimizer.pre_update_params()
            for layer in self.trainable_layers:
                self.optimizer.update_params(layer)
            self.optimizer.post_update_params()
            # Print a summary
            if not epoch % print_every:
                print(f'epoch: {epoch}, ' +
                      f'acc: {accuracy:.3f}, ' +
                      f'loss: {loss:.3f} (' +
                      f'data_loss: {data_loss:.3f}, ' +
                      f'reg_loss: {regularization_loss:.3f}), ' +
                      f'lr: {self.optimizer.current_learning_rate}')
        # If there is the validation data
        if validation_data is not None:
            # For better readability
            X_val, y_val = validation_data
            # Perform the forward pass
            output = self.forward(X_val, training=False)
            # Calculate the loss
            loss = self.loss.calculate(output, y_val)
            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y_val)
            # Print a summary
            print(f'validation, ' +
                  f'acc: {accuracy:.3f}, ' +
                  f'loss: {loss:.3f}')

    # Performs forward pass
    def forward(self, X, training):
        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X, training)
        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output, training)
        # "layer" is now the last object from the list,
        # return its output
        return layer.output

    # Performs backward pass
    def backward(self, output, y):
        # If softmax classifier
        if self.softmax_classifier_output is not None:
            # First call backward method
            # on the combined activation/loss
            # this will set dinputs property
            self.softmax_classifier_output.backward(output, y)
            # Since we'll not call backward method of the last layer
            # which is Softmax activation
            # as we used combined activation/loss
            # object, let's set dinputs in this object
            self.layers[-1].dinputs = self.softmax_classifier_output.dinputs
            # Call backward method going through
            # all the objects but last
            # in reversed order passing dinputs as a parameter
            for layer in reversed(self.layers[:-1]):
                layer.backward(layer.next.dinputs)
            return
        # First call backward method on the loss
        # this will set dinputs property that the last
        # layer will try to access shortly
        self.loss.backward(output, y)
        # Call backward method going through all the objects
        # in reversed order passing dinputs as a parameter
        for layer in reversed(self.layers):
            layer.backward(layer.next.dinputs)


# # Create dataset
# X, y = sine_data()
# # Instantiate the model
# model = Model()
# # Add layers
# model.add(Layer_Dense(1, 64))
# model.add(Activation_ReLU())
# model.add(Layer_Dense(64, 64))
# model.add(Activation_ReLU())
# model.add(Layer_Dense(64, 1))
# model.add(Activation_Linear())
# # Set loss and optimizer 
objects# model.set(# loss=Loss_MeanSquaredError(),# optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),# accuracy=Accuracy_Regression()# )# # Finalize the model# model.finalize()# model.train(X, y, epochs=10000, print_every=100)########################################################################################## # Create train and test dataset# X, y = spiral_data(samples=100, classes=2)# X_test, y_test = spiral_data(samples=100, classes=2)# # Reshape labels to be a list of lists# # Inner list contains one output (either 0 or 1)# # per each output neuron, 1 in this case# y = y.reshape(-1, 1)# y_test = y_test.reshape(-1, 1)# # Instantiate the model# model = Model()# # Add layers# model.add(Layer_Dense(2, 64, weight_regularizer_l2=5e-4, bias_regularizer_l2=5e-4))# model.add(Activation_ReLU())# model.add(Layer_Dense(64, 1))# model.add(Activation_Sigmoid())# # Set loss, optimizer and accuracy objects# model.set(# loss=Loss_BinaryCrossentropy(),# optimizer=Optimizer_Adam(decay=5e-7),# accuracy=Accuracy_Categorical()# )# # Finalize the model# model.finalize()# # Train the model# model.train(X, y, validation_data=(X_test, y_test), epochs=10000, print_every=100)########################################################################################## Create dataset
X, y = spiral_data(samples=1000, classes=3)
X_test, y_test = spiral_data(samples=100, classes=3)
# Instantiate the model
model = Model()
# Add layers
model.add(Layer_Dense(2, 512, weight_regularizer_l2=5e-4, bias_regularizer_l2=5e-4))
model.add(Activation_ReLU())
model.add(Layer_Dropout(0.1))
model.add(Layer_Dense(512, 3))
model.add(Activation_Softmax())
# Set loss, optimizer and accuracy objects
model.set(
    loss=Loss_CategoricalCrossentropy(),
    optimizer=Optimizer_Adam(learning_rate=0.05, decay=5e-5),
    accuracy=Accuracy_Categorical()
)
# Finalize the model
model.finalize()
# Train the model
model.train(X, y, validation_data=(X_test, y_test), epochs=10000, print_every=100)
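
The forward method of the model relies on every layer holding a reference to the object before it (its prev attribute), so that each layer can read the previous output. The following is only a minimal sketch of that chaining idea, under stated assumptions: Input, Scale, and TinyModel are hypothetical stand-in classes invented for this illustration, not the Layer_Input, Layer_Dense, or Model classes from this chapter, and the toy finalize here only sets prev links.

```python
# Hypothetical stand-in classes, used only to illustrate prev-based chaining.

class Input:
    # Stores the input as its output so the first layer can read prev.output
    def forward(self, inputs, training):
        self.output = inputs


class Scale:
    # Toy "layer" that multiplies its input by a constant factor
    def __init__(self, factor):
        self.factor = factor

    def forward(self, inputs, training):
        self.output = inputs * self.factor


class TinyModel:
    def __init__(self):
        self.layers = []
        self.input_layer = Input()

    def add(self, layer):
        self.layers.append(layer)

    def finalize(self):
        # Link each layer to the object that precedes it in the chain
        previous = self.input_layer
        for layer in self.layers:
            layer.prev = previous
            previous = layer

    def forward(self, X, training):
        # The input layer exposes X as its output
        self.input_layer.forward(X, training)
        # Each layer consumes the output of the object before it
        for layer in self.layers:
            layer.forward(layer.prev.output, training)
        # "layer" is now the last object in the list
        return layer.output


model = TinyModel()
model.add(Scale(2))
model.add(Scale(5))
model.finalize()

result = model.forward(3, training=False)
print(result)  # prints 30, i.e. 3 * 2 * 5
```

Because each object only needs the output attribute of its predecessor, the loop never has to special-case the first layer; the input object plays that role.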

Chapter code, further resources, and errata for this chapter: https://nnfs.io/ch18

