Neural Networks from Scratch in Python (18): The Model Object

Introduction

We have built a model that can perform a forward pass, a backward pass, and ancillary tasks such as measuring accuracy. We did this by writing a fair amount of code and making changes across some rather large blocks of it. At this point, it makes more sense to turn the model itself into an object, especially since we will want to save and load this object to use for future prediction tasks. We can also use this object to cut down on some of the more common lines of code, making it easier to work with the current code base and to build new models. To carry out this conversion to a model object, we'll use the model we worked on most recently: the regression model using sine data:
  1. from nnfs.datasets import sine_data
  2. X, y = sine_data()
With the data in hand, our first step for the Model class is to add the layers we want. Thus, we can begin our Model class like this:
  1. # Model class
  2. class Model:
  3.     def __init__(self):
  4.         # Create a list of network objects
  5.         self.layers = []
  6.         
  7.     # Add objects to the model
  8.     def add(self, layer):
  9.         self.layers.append(layer)
This lets us add layers using the model object's add method. That alone greatly improves readability. Let's add some layers:
  1. # Instantiate the model
  2. model = Model()
  3. # Add layers
  4. model.add(Layer_Dense(1, 64))
  5. model.add(Activation_ReLU())
  6. model.add(Layer_Dense(64, 64))
  7. model.add(Activation_ReLU())
  8. model.add(Layer_Dense(64, 1))
  9. model.add(Activation_Linear())
We can now also query this model:
  1. print(model.layers)
  1. >>>
  2. [<__main__.Layer_Dense object at 0x000001D1EB2A2900>,
  3. <__main__.Activation_ReLU object at 0x000001D1EB2A2180>,
  4. <__main__.Layer_Dense object at 0x000001D1EB2A3F20>,
  5. <__main__.Activation_ReLU object at 0x000001D1EB2B9220>,
  6. <__main__.Layer_Dense object at 0x000001D1EB2BB800>,
  7. <__main__.Activation_Linear object at 0x000001D1EB2BBA40>]
Besides adding layers, we also want to set a loss function and an optimizer for the model. To do that, we'll create a method called set:
  1. # Set loss and optimizer
  2. def set(self, *, loss, optimizer):
  3.     self.loss = loss
  4.     self.optimizer = optimizer
The asterisk (*) in the parameter definition makes the parameters that follow it (loss and optimizer in this case) keyword-only arguments. Since they have no default values, they are required keyword arguments, meaning they must be passed by name along with their values, which makes the code more readable.
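As a quick standalone illustration of keyword-only parameters (the set_config function here is a throwaway example, not part of the model code):

# Minimal sketch of keyword-only parameters
def set_config(*, loss, optimizer):
    return loss, optimizer

set_config(loss='mse', optimizer='adam')   # OK - both passed by name
# set_config('mse', 'adam')                # TypeError: takes 0 positional arguments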
Now we can add a call to this method on our newly created model object, passing in the loss and optimizer objects:
# Create dataset
X, y = sine_data()

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())

# Set loss and optimizer objects
model.set(
    loss=Loss_MeanSquaredError(),
    optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),
    )
With the model's layers, loss function, and optimizer set, the next step is training, so we'll add a train method. For now, it is just a placeholder that we'll fill in shortly:
  1. # Train the model
  2. def train(self, X, y, *, epochs=1, print_every=1):
  3.     # Main training loop
  4.     for epoch in range(1, epochs+1):
  5.         # Temporary
  6.         pass
Then we can add a call to the train method to the model definition. We'll pass in the training data, the number of epochs (10000, which is what we've been using), and how often to print a training summary. We neither need nor want to print it every step, so we'll make that configurable:
# Create dataset
X, y = sine_data()

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())

# Set loss and optimizer objects
model.set(
    loss=Loss_MeanSquaredError(),
    optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),
    )

model.train(X, y, epochs=10000, print_every=100)
To train, we need to perform a forward pass. Performing the forward pass in the object is slightly more complicated, because we need to do it in a loop over the layers and we need to know the previous layer's output to pass the data along correctly. One problem with querying the previous layer is that the first layer has no "previous" layer; the first layer we define is the first hidden layer. One option, then, is to create an "input layer." It is considered a layer in the neural network, but it has no weights or biases associated with it. The input layer only contains the training data, and we use it only as the "previous" layer of the first layer when iterating over the layers in a loop. We'll create a new class, which we'll call in the same way as the Layer_Dense class, and name it Layer_Input:
  1. # Input "layer"
  2. class Layer_Input:
  3.     # Forward pass
  4.     def forward(self, inputs):
  5.         self.output = inputs
The forward method sets the training samples as self.output. This attribute is common with the other layers. There's no need to implement a backward method here, because we will never use it. It may seem a bit redundant to create this class right now, but hopefully you'll soon see how we'll use it. Next, we want to set previous- and next-layer attributes for each of the model's layers. We'll create a method called finalize in the Model class:
  1.         # Finalize the model
  2.         def finalize(self):
  3.             # Create and set the input layer
  4.             self.input_layer = Layer_Input()
  5.             # Count all the objects
  6.             layer_count = len(self.layers)
  7.             # Iterate the objects
  8.             for i in range(layer_count):
  9.                 # If it's the first layer,
  10.                 # the previous layer object is the input layer
  11.                 if i == 0:
  12.                     self.layers[i].prev = self.input_layer
  13.                     self.layers[i].next = self.layers[i+1]
  14.                 # All layers except for the first and the last
  15.                 elif i < layer_count - 1:
  16.                     self.layers[i].prev = self.layers[i-1]
  17.                     self.layers[i].next = self.layers[i+1]
  18.                 # The last layer - the next object is the loss
  19.                 else:
  20.                     self.layers[i].prev = self.layers[i-1]
  21.                     self.layers[i].next = self.loss
This code creates an input layer and sets next and prev references for each layer in the model object's self.layers list. We created the Layer_Input class so that we have something to set as the prev attribute of the first hidden layer in the loop, since we're going to call all of the layers in a uniform way. For the last layer, its next layer will be the loss function that we've already created.
Now that the model object has the layer information it needs, let's add a forward method to perform the forward pass. We'll use this forward method both when training and later when performing prediction only, which is also called model inference. Here is the code to continue adding to the Model class:
  1. # Forward pass
  2. class Model:
  3.         ...
  4.     # Performs forward pass
  5.     def forward(self, X):
  6.         # Call forward method on the input layer
  7.         # this will set the output property that
  8.         # the first layer in "prev" object is expecting
  9.         self.input_layer.forward(X)
  10.         # Call forward method of every object in a chain
  11.         # Pass output of the previous object as a parameter
  12.         for layer in self.layers:
  13.             layer.forward(layer.prev.output)
  14.         # "layer" is now the last object from the list,
  15.         # return its output
  16.         return layer.output
In this case, we pass in the input data X and simply run it through the input_layer in the Model object, which creates an output attribute on that object. From there, we iterate over the layers in self.layers, starting with the first hidden layer. For each layer, we perform a forward pass on the previous layer's output data, layer.prev.output. For the first hidden layer, layer.prev is self.input_layer. Calling each layer's forward method creates that layer's output attribute, which is then passed as the input to the next layer's forward call. Once we've iterated over all the layers, we return the last layer's output.
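To make the prev/output chaining concrete, here is a minimal, self-contained sketch; the Inputs, Double, and AddOne classes are made up for illustration and only mimic the convention of forward() storing self.output:

import numpy as np

# Stand-in layers that store their result in self.output
class Inputs:
    def forward(self, inputs):
        self.output = inputs

class Double:
    def forward(self, inputs):
        self.output = inputs * 2

class AddOne:
    def forward(self, inputs):
        self.output = inputs + 1

input_layer = Inputs()
layers = [Double(), AddOne()]

# Wire up prev references the way Model.finalize() does
layers[0].prev = input_layer
layers[1].prev = layers[0]

# Forward pass exactly like Model.forward()
X = np.array([[1.0], [2.0]])
input_layer.forward(X)
for layer in layers:
    layer.forward(layer.prev.output)

print(layer.output)   # [[3.] [5.]] - (X * 2) + 1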
That's one full forward pass. Now let's add this forward-pass method call to the train method of the Model class:
  1. # Forward pass
  2. class Model:
  3.         ...
  4.     # Train the model
  5.     def train(self, X, y, *, epochs=1, print_every=1):
  6.         # Main training loop
  7.         for epoch in range(1, epochs+1):
  8.             # Perform the forward pass
  9.             output = self.forward(X)
  10.             # Temporary
  11.             print(output)
  12.             sys.exit()
The full Model class so far:
# Model class
class Model:

    def __init__(self):
        # Create a list of network objects
        self.layers = []

    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)

    # Set loss and optimizer
    def set(self, *, loss, optimizer):
        self.loss = loss
        self.optimizer = optimizer

    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):

        # Main training loop
        for epoch in range(1, epochs+1):

            # Perform the forward pass
            output = self.forward(X)

            # Temporary
            print(output)
            sys.exit()

    # Finalize the model
    def finalize(self):

        # Create and set the input layer
        self.input_layer = Layer_Input()

        # Count all the objects
        layer_count = len(self.layers)

        # Iterate the objects
        for i in range(layer_count):

            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]

            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]

            # The last layer - the next object is the loss
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss

    # Performs forward pass
    def forward(self, X):

        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X)

        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output)

        # "layer" is now the last object from the list,
        # return its output
        return layer.output
Finally, we can add the finalize method call to our main code (remember that, among other things, this method lets the model's layers know their previous and next layers):
# Create dataset
X, y = sine_data()

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(1, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 64))
model.add(Activation_ReLU())
model.add(Layer_Dense(64, 1))
model.add(Activation_Linear())

# Set loss and optimizer objects
model.set(
    loss=Loss_MeanSquaredError(),
    optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),
    )

# Finalize the model
model.finalize()

model.train(X, y, epochs=10000, print_every=100)
  1. >>>
  2. [[ 0.00000000e+00]
  3. [-1.13209149e-08]
  4. [-2.26418297e-08]
  5. ...
  6. [-1.12869511e-05]
  7. [-1.12982725e-05]
  8. [-1.13095930e-05]]
At this point, we've covered the model's forward pass in the Model class. We still need to calculate loss and accuracy, and perform backpropagation. Before doing that, we need to know which layers are "trainable," meaning layers with weights and biases that we can tweak. To do this, we check whether a layer has a weights or biases attribute. We can check for that with the following code:
            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])
Here, i is the index of a layer in the list of layers. We'll add this code to the finalize method. Here is the full code of that method so far:
  1.     # Finalize the model
  2.     def finalize(self):
  3.         # Create and set the input layer
  4.         self.input_layer = Layer_Input()
  5.         # Count all the objects
  6.         layer_count = len(self.layers)
  7.         # Initialize a list containing trainable layers:
  8.         self.trainable_layers = []
  9.         # Iterate the objects
  10.         for i in range(layer_count):
  11.             # If it's the first layer,
  12.             # the previous layer object is the input layer
  13.             if i == 0:
  14.                 self.layers[i].prev = self.input_layer
  15.                 self.layers[i].next = self.layers[i+1]
  16.             # All layers except for the first and the last
  17.             elif i < layer_count - 1:
  18.                 self.layers[i].prev = self.layers[i-1]
  19.                 self.layers[i].next = self.layers[i+1]
  20.             # The last layer - the next object is the loss
  21.             # Also let's save aside the reference to the last object
  22.             # whose output is the model's output
  23.             else:
  24.                 self.layers[i].prev = self.layers[i-1]
  25.                 self.layers[i].next = self.loss
  26.                 self.output_layer_activation = self.layers[i]
  27.             
  28.             # If layer contains an attribute called "weights",
  29.             # it's a trainable layer -
  30.             # add it to the list of trainable layers
  31.             # We don't need to check for biases -
  32.             # checking for weights is enough
  33.             if hasattr(self.layers[i], 'weights'):
  34.                 self.trainable_layers.append(self.layers[i])
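Before moving on, here is a quick standalone check of how hasattr() separates trainable layers from the rest; the ToyDense and ToyReLU classes are throwaway stand-ins, not the real layer classes:

import numpy as np

class ToyDense:
    def __init__(self):
        self.weights = np.zeros((2, 3))
        self.biases = np.zeros((1, 3))

class ToyReLU:
    pass

print(hasattr(ToyDense(), 'weights'))  # True  - would be collected as trainable
print(hasattr(ToyReLU(), 'weights'))   # False - activation layers are skipped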
Next, we'll modify the common Loss class to include the following:
  1. # Common loss class
  2. class Loss:
  3.         ...        
  4.     # Calculates the data and regularization losses
  5.     # given model output and ground truth values
  6.     def calculate(self, output, y):
  7.         # Calculate sample losses
  8.         sample_losses = self.forward(output, y)
  9.         # Calculate mean loss
  10.         data_loss = np.mean(sample_losses)
  11.         # Return the data and regularization losses
  12.         return data_loss, self.regularization_loss()   
  13.         
  14.     # Set/remember trainable layers
  15.     def remember_trainable_layers(self, trainable_layers):
  16.         self.trainable_layers = trainable_layers
The remember_trainable_layers method in the common Loss class "informs" the loss object which layers in the Model object are trainable. The calculate method has also been modified to return the value of self.regularization_loss() during a single call. The regularization_loss method currently requires a layer object, but with the self.trainable_layers property set in remember_trainable_layers, we can now iterate over all trainable layers to calculate the regularization loss for the entire model, rather than one layer at a time:
  1. # Common loss class
  2. class Loss:
  3.         ...
  4.         # Regularization loss calculation
  5.     def regularization_loss(self):        
  6.         # 0 by default
  7.         regularization_loss = 0
  8.         # Calculate regularization loss
  9.         # iterate all trainable layers
  10.         for layer in self.trainable_layers:
  11.             # L1 regularization - weights
  12.             # calculate only when factor greater than 0
  13.             if layer.weight_regularizer_l1 > 0:
  14.                 regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))
  15.             # L2 regularization - weights
  16.             if layer.weight_regularizer_l2 > 0:
  17.                 regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
  18.             # L1 regularization - biases
  19.             # calculate only when factor greater than 0
  20.             if layer.bias_regularizer_l1 > 0:
  21.                 regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))
  22.             # L2 regularization - biases
  23.             if layer.bias_regularizer_l2 > 0:
  24.                 regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)
  25.         return regularization_loss
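For reference, each pass through the loop above reduces to a few sums; here is a minimal sketch for a single hypothetical layer with only the L2 factors set (the values are made up):

import numpy as np

class FakeLayer:
    weight_regularizer_l1 = 0
    weight_regularizer_l2 = 5e-4
    bias_regularizer_l1 = 0
    bias_regularizer_l2 = 5e-4
    weights = np.array([[0.5, -0.5], [1.0, 0.0]])
    biases = np.array([[0.1, -0.2]])

layer = FakeLayer()
reg_loss = 0
reg_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
reg_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)
print(reg_loss)  # 5e-4 * (1.5 + 0.05) = 0.000775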
To calculate accuracy, we need predictions. Right now, the prediction code differs depending on the type of model. For a softmax classifier, for example, we use np.argmax(), but for regression, the prediction is the output itself, since the output layer uses a linear activation function. Ideally, we'd have a prediction method that chooses the appropriate way of making predictions for our model. To do this, we'll add a predictions method to each activation function class:
  1. # Softmax activation
  2. class Activation_Softmax:
  3.         ...            
  4.     # Calculate predictions for outputs
  5.     def predictions(self, outputs):
  6.         return np.argmax(outputs, axis=1)
  1. # Sigmoid activation
  2. class Activation_Sigmoid:
  3.         ...
  4.     # Calculate predictions for outputs
  5.     def predictions(self, outputs):
  6.         return (outputs > 0.5) * 1
  1. # Linear activation
  2. class Activation_Linear:
  3.     ...
  4.     # Calculate predictions for outputs
  5.     def predictions(self, outputs):
  6.         return outputs
All of the calculations performed inside the predictions functions are the same as those performed for the appropriate models in previous chapters. Although we have no plans to use the ReLU activation function as an output layer's activation function, we'll include it here for completeness:
  1. # ReLU activation
  2. class Activation_ReLU:  
  3.         ...
  4.     # Calculate predictions for outputs
  5.     def predictions(self, outputs):
  6.         return outputs
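A quick check of what each predictions() method returns, using arbitrarily chosen output values:

import numpy as np

softmax_outputs = np.array([[0.7, 0.2, 0.1],
                            [0.1, 0.5, 0.4]])
print(np.argmax(softmax_outputs, axis=1))   # [0 1] - class indices

sigmoid_outputs = np.array([[0.3], [0.9]])
print((sigmoid_outputs > 0.5) * 1)          # [[0] [1]] - binary decisions

linear_outputs = np.array([[0.12], [0.57]])
print(linear_outputs)                       # returned unchanged for regression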
We still need to set a reference to the final layer's activation function in the Model object. We can later call its predictions method, which will calculate and return the predictions from the outputs. We'll set this reference in the Model class's finalize method:
  1. # Model class
  2. class Model:
  3.         ...
  4.         # Finalize the model
  5.     def finalize(self):
  6.             ...
  7.                         # The last layer - the next object is the loss
  8.             # Also let's save aside the reference to the last object
  9.             # whose output is the model's output
  10.             else:
  11.                 self.layers[i].prev = self.layers[i-1]
  12.                 self.layers[i].next = self.loss
  13.                 self.output_layer_activation = self.layers[i]
Just like the different prediction methods, we also need to calculate accuracy in different ways. We'll implement this similarly to the specific loss class objects: by creating specific accuracy classes and their objects and associating them with the model.
First, we'll write a common Accuracy class that, for now, contains just one method, calculate, which returns the accuracy computed from comparison results. We've already added a call to self.compare in the code, but that method doesn't exist yet; we'll create it in classes that inherit from Accuracy. For now, it's enough to know that it will return a list of True and False values indicating whether a prediction matches the ground-truth value. We then calculate the mean of those values (True is treated as 1 and False as 0) and return it as the accuracy. The code:
  1. # Common accuracy class
  2. class Accuracy:
  3.     # Calculates an accuracy
  4.     # given predictions and ground truth values
  5.     def calculate(self, predictions, y):
  6.         # Get comparison results
  7.         comparisons = self.compare(predictions, y)
  8.         # Calculate an accuracy
  9.         accuracy = np.mean(comparisons)
  10.         # Return accuracy
  11.         return accuracy
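The accuracy value simply falls out of np.mean() treating True as 1 and False as 0, which is easy to verify in isolation:

import numpy as np

comparisons = np.array([True, False, True, True])
print(np.mean(comparisons))  # 0.75 - three of four predictions matched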
Next, we can work off this common Accuracy class by inheriting from it and building functionality specific to each type of model. In general, each of these classes will contain two methods: init (not to be confused with Python's __init__ class method) for initialization from within the model object, and compare for performing the comparison calculations.
For the regression model, the init method will calculate the precision value for accuracy (the same thing we previously wrote for the regression model and ran just before the training loop). The compare method will contain the actual comparison code that we implemented in the training loop, using self.precision. Note that initialization will not recalculate the precision unless forced to by setting the reinit parameter to True. This design allows for multiple use cases, including setting self.precision independently, calling init whenever needed (for example, from outside the model during its creation), and even calling init multiple times (which will shortly prove useful):
  1. # Accuracy calculation for regression model
  2. class Accuracy_Regression(Accuracy):
  3.     def __init__(self):
  4.         # Create precision property
  5.         self.precision = None
  6.     # Calculates precision value
  7.     # based on passed in ground truth
  8.     def init(self, y, reinit=False):
  9.         if self.precision is None or reinit:
  10.             self.precision = np.std(y) / 250
  11.     # Compares predictions to the ground truth values
  12.     def compare(self, predictions, y):
  13.         return np.absolute(predictions - y) < self.precision
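Assuming the Accuracy and Accuracy_Regression classes above, a minimal standalone use might look like this (the targets and "predictions" are dummy values chosen for illustration):

import numpy as np

accuracy_fn = Accuracy_Regression()

# Dummy ground truth and predictions that are off by a small, constant error
y = np.linspace(-1, 1, 1000).reshape(-1, 1)
predictions = y + 0.001

accuracy_fn.init(y)                           # precision = np.std(y) / 250
print(accuracy_fn.precision)                  # ~0.0023
print(accuracy_fn.calculate(predictions, y))  # 1.0 - every prediction is within the band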
Then we can set the accuracy object in the Model class's set method, in the same way we currently set the loss function and optimizer:
  1. # Model class
  2. class Model:
  3.         ...
  4.         # Set loss, optimizer and accuracy
  5.         def set(self, *, loss, optimizer, accuracy):
  6.                 self.loss = loss
  7.                 self.optimizer = optimizer
  8.                 self.accuracy = accuracy
Then, following the forward-pass code, we can add the loss and accuracy calculations to the model. Note that we also initialize the accuracy with self.accuracy.init(y) at the beginning of the train method; as mentioned, it can be called multiple times. In the case of the regression accuracy, it will perform the precision calculation a single time, during the first call. Here is the train method with the loss and accuracy calculations implemented:
  1. # Model class
  2. class Model:
  3.         ...
  4.     # Train the model
  5.     def train(self, X, y, *, epochs=1, print_every=1):
  6.         # Initialize accuracy object
  7.         self.accuracy.init(y)
  8.         # Main training loop
  9.         for epoch in range(1, epochs+1):
  10.             # Perform the forward pass
  11.             output = self.forward(X)
  12.             # Calculate loss
  13.             data_loss, regularization_loss = self.loss.calculate(output, y)
  14.             loss = data_loss + regularization_loss
  15.             # Get predictions and calculate an accuracy
  16.             predictions = self.output_layer_activation.predictions(output)
  17.             accuracy = self.accuracy.calculate(predictions, y)
Finally, in the finalize method, we'll call the remember_trainable_layers method we created earlier on the Loss class's object, passing in the trainable layers (self.loss.remember_trainable_layers(self.trainable_layers)). Here is the full Model class so far:
# Model class
class Model:

    def __init__(self):
        # Create a list of network objects
        self.layers = []

    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)

    # Set loss, optimizer and accuracy
    def set(self, *, loss, optimizer, accuracy):
        self.loss = loss
        self.optimizer = optimizer
        self.accuracy = accuracy

    # Finalize the model
    def finalize(self):

        # Create and set the input layer
        self.input_layer = Layer_Input()

        # Count all the objects
        layer_count = len(self.layers)

        # Initialize a list containing trainable layers:
        self.trainable_layers = []

        # Iterate the objects
        for i in range(layer_count):

            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]

            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]

            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]

            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])

        # Update loss object with trainable layers
        self.loss.remember_trainable_layers(self.trainable_layers)

    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):

        # Initialize accuracy object
        self.accuracy.init(y)

        # Main training loop
        for epoch in range(1, epochs+1):

            # Perform the forward pass
            output = self.forward(X)

            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y)
            loss = data_loss + regularization_loss

            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)

    # Performs forward pass
    def forward(self, X):

        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X)

        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output)

        # "layer" is now the last object from the list,
        # return its output
        return layer.output
The full code of the Loss class:
  1. # Common loss class
  2. class Loss:
  3.     # Regularization loss calculation
  4.     def regularization_loss(self):        
  5.         # 0 by default
  6.         regularization_loss = 0
  7.         # Calculate regularization loss
  8.         # iterate all trainable layers
  9.         for layer in self.trainable_layers:
  10.             # L1 regularization - weights
  11.             # calculate only when factor greater than 0
  12.             if layer.weight_regularizer_l1 > 0:
  13.                 regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))
  14.             # L2 regularization - weights
  15.             if layer.weight_regularizer_l2 > 0:
  16.                 regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)
  17.             # L1 regularization - biases
  18.             # calculate only when factor greater than 0
  19.             if layer.bias_regularizer_l1 > 0:
  20.                 regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))
  21.             # L2 regularization - biases
  22.             if layer.bias_regularizer_l2 > 0:
  23.                 regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)
  24.         return regularization_loss
  25.     # Set/remember trainable layers
  26.     def remember_trainable_layers(self, trainable_layers):
  27.        self.trainable_layers = trainable_layers
  28.     # Calculates the data and regularization losses
  29.     # given model output and ground truth values
  30.     def calculate(self, output, y):
  31.         # Calculate sample losses
  32.         sample_losses = self.forward(output, y)
  33.         # Calculate mean loss
  34.         data_loss = np.mean(sample_losses)
  35.         # Return the data and regularization losses
  36.         return data_loss, self.regularization_loss()
Now that we have a full forward pass implemented and have calculated loss and accuracy, we can begin the backward pass. The backward method in the Model class is structurally similar to the forward method, just in reverse order and with different parameters. Following the backward pass from our previous training approach, we need to call the backward method of the loss object to create the dinputs attribute. Then we iterate over all the layers in reverse order, calling their backward methods with the dinputs attribute of the next layer (the next layer in normal order) as the parameter, effectively backpropagating the gradient returned by that next layer. Remember that we set the loss object as the next layer of the last (output) layer.
  1. # Model class
  2. class Model:
  3.         ...
  4.     # Performs backward pass
  5.     def backward(self, output, y):
  6.         # First call backward method on the loss
  7.         # this will set dinputs property that the last
  8.         # layer will try to access shortly
  9.         self.loss.backward(output, y)
  10.         # Call backward method going through all the objects
  11.         # in reversed order passing dinputs as a parameter
  12.         for layer in reversed(self.layers):
  13.             layer.backward(layer.next.dinputs)
Next, we'll call this backward method toward the end of the train method:
  1.                         # Perform backward pass
  2.                         self.backward(output, y)
After the backward pass, the last action is optimization. Previously, we called the optimizer object's update_params method multiple times, once for each trainable layer. Now we need to make this code universal by iterating over the list of trainable layers and calling update_params() in a loop:
  1.                         # Optimize (update parameters)
  2.             self.optimizer.pre_update_params()
  3.             for layer in self.trainable_layers:
  4.                 self.optimizer.update_params(layer)
  5.             self.optimizer.post_update_params()
Then we can output useful information; this is where the last parameter of the train method comes in handy:
  1.                         # Print a summary
  2.             if not epoch % print_every:
  3.                 print(f'epoch: {epoch}, ' +
  4.                       f'acc: {accuracy:.3f}, ' +
  5.                       f'loss: {loss:.3f} (' +
  6.                       f'data_loss: {data_loss:.3f}, ' +
  7.                       f'reg_loss: {regularization_loss:.3f}), ' +
  8.                       f'lr: {self.optimizer.current_learning_rate}')
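The `not epoch % print_every` idiom evaluates to True only when epoch is a multiple of print_every; a quick standalone check:

print_every = 100
for epoch in (1, 99, 100, 250, 300):
    if not epoch % print_every:     # True only when epoch % print_every == 0
        print(f'would print summary at epoch {epoch}')
# would print summary at epoch 100
# would print summary at epoch 300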
# Model class
class Model:
    ...

    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1):

        # Initialize accuracy object
        self.accuracy.init(y)

        # Main training loop
        for epoch in range(1, epochs+1):

            # Perform the forward pass
            output = self.forward(X)

            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y)
            loss = data_loss + regularization_loss

            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)

            # Perform backward pass
            self.backward(output, y)

            # Optimize (update parameters)
            self.optimizer.pre_update_params()
            for layer in self.trainable_layers:
                self.optimizer.update_params(layer)
            self.optimizer.post_update_params()

            # Print a summary
            if not epoch % print_every:
                print(f'epoch: {epoch}, ' +
                      f'acc: {accuracy:.3f}, ' +
                      f'loss: {loss:.3f} (' +
                      f'data_loss: {data_loss:.3f}, ' +
                      f'reg_loss: {regularization_loss:.3f}), ' +
                      f'lr: {self.optimizer.current_learning_rate}')
Now we can pass in the accuracy class object and test the model's performance:
  1. >>>
  2. epoch: 100, acc: 0.006, loss: 0.085 (data_loss: 0.085, reg_loss: 0.000), lr: 0.004549590536851684
  3. epoch: 200, acc: 0.032, loss: 0.035 (data_loss: 0.035, reg_loss: 0.000), lr: 0.004170141784820684
  4. ...
  5. epoch: 9900, acc: 0.934, loss: 0.000 (data_loss: 0.000, reg_loss: 0.000), lr: 0.00045875768419121016
  6. epoch: 10000, acc: 0.970, loss: 0.000 (data_loss: 0.000, reg_loss: 0.000), lr: 0.00045458678061641964
Our new model performs well, and we're now able to create new models more easily with the Model class. We'll need to keep modifying these classes to support brand-new models; for example, we haven't yet handled binary logistic regression. For that, we need to add two things. First, we need to calculate categorical accuracy:
# Accuracy calculation for classification model
class Accuracy_Categorical(Accuracy):

    def __init__(self, *, binary=False):
        # Binary mode?
        self.binary = binary

    # No initialization is needed
    def init(self, y):
        pass

    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        if not self.binary and len(y.shape) == 2:
            y = np.argmax(y, axis=1)
        return predictions == y
This is the same accuracy calculation for classification as before, just wrapped in a class, with an added toggle parameter. When this class is used with a binary cross-entropy model, that toggle disables the conversion of one-hot encoded labels to sparse labels, since that model always requires the ground truth to be a 2D array and the values are not one-hot encoded. Note that no initialization is performed here, but the method needs to exist since it will be called from the Model class's train method. The next thing we need to add is the ability to validate the model using validation data. Validation requires only a forward pass and a loss calculation (the data loss only). We'll modify the calculate method of the Loss class so that it can also calculate the validation loss:
  1. # Common loss class
  2. class Loss:
  3.         ...
  4.     # Calculates the data and regularization losses
  5.     # given model output and ground truth values
  6.     def calculate(self, output, y, *, include_regularization=False):
  7.         # Calculate sample losses
  8.         sample_losses = self.forward(output, y)
  9.         # Calculate mean loss
  10.         data_loss = np.mean(sample_losses)
  11.         # If just data loss - return it
  12.         if not include_regularization:
  13.             return data_loss
  14.         # Return the data and regularization losses
  15.         return data_loss, self.regularization_loss()
We added a new parameter and a condition to return just the data loss, since the regularization loss is not used in this case. To run it, we'll pass in the predictions and targets the same way as with the training data. By default, the regularization loss is not returned, which means we need to update the call to this method in the train method so that it includes the regularization loss during training:
  1.                         # Calculate loss
  2.             data_loss, regularization_loss = self.loss.calculate(output, y, include_regularization=True)
Then we can add the validation code to the train method of the Model class. We add a validation_data parameter to the function, which takes a tuple of validation data (samples and targets); an if statement checking whether validation data is present; and, if it is, code that performs a forward pass on that data, calculates loss and accuracy in the same way as during training, and prints the results:
  1. # Model class
  2. class Model:
  3.         ...
  4.         # Train the model
  5.     def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):
  6.                 ...
  7.         # If there is the validation data
  8.         if validation_data is not None:
  9.             # For better readability
  10.             X_val, y_val = validation_data
  11.             # Perform the forward pass
  12.             output = self.forward(X_val)
  13.             # Calculate the loss
  14.             loss = self.loss.calculate(output, y_val)
  15.             # Get predictions and calculate an accuracy
  16.             predictions = self.output_layer_activation.predictions(output)
  17.             accuracy = self.accuracy.calculate(predictions, y_val)
  18.             # Print a summary
  19.             print(f'validation, ' +
  20.                   f'acc: {accuracy:.3f}, ' +
  21.                   f'loss: {loss:.3f}')
Now we can create test data and try out the binary logistic regression model with the following code:
  1. # Create train and test dataset
  2. X, y = spiral_data(samples=100, classes=2)
  3. X_test, y_test = spiral_data(samples=100, classes=2)
  4. # Reshape labels to be a list of lists
  5. # Inner list contains one output (either 0 or 1)
  6. # per each output neuron, 1 in this case
  7. y = y.reshape(-1, 1)
  8. y_test = y_test.reshape(-1, 1)
  9. # Instantiate the model
  10. model = Model()
  11. # Add layers
  12. model.add(Layer_Dense(2, 64, weight_regularizer_l2=5e-4, bias_regularizer_l2=5e-4))
  13. model.add(Activation_ReLU())
  14. model.add(Layer_Dense(64, 1))
  15. model.add(Activation_Sigmoid())
  16. # Set loss, optimizer and accuracy objects
  17. model.set(
  18.     loss=Loss_BinaryCrossentropy(),
  19.     optimizer=Optimizer_Adam(decay=5e-7),
  20.     accuracy=Accuracy_Categorical(binary=True)
  21.     )
  22. # Finalize the model
  23. model.finalize()
  24. # Train the model
  25. model.train(X, y, validation_data=(X_test, y_test), epochs=10000, print_every=100)
  1. >>>
  2. epoch: 100, acc: 0.625, loss: 0.675 (data_loss: 0.674, reg_loss: 0.001), lr: 0.0009999505024501287
  3. epoch: 200, acc: 0.630, loss: 0.669 (data_loss: 0.668, reg_loss: 0.001), lr: 0.0009999005098992651
  4. ...
  5. epoch: 9900, acc: 0.905, loss: 0.312 (data_loss: 0.276, reg_loss: 0.037), lr: 0.0009950748768967994
  6. epoch: 10000, acc: 0.905, loss: 0.312 (data_loss: 0.275, reg_loss: 0.036), lr: 0.0009950253706593885
  7. validation, acc: 0.775, loss: 0.423
Now that we've streamlined the forward- and backward-pass code, including validation, this is a good time to reintroduce dropout. Recall that dropout is a method of regularizing and improving a model's generalization by disabling, or filtering out, certain neurons. If dropout is used in our model, we need to make sure it is not used when performing validation or inference (prediction). In our previous code, this was achieved by simply not calling dropout's forward method during validation. Here, however, we have a single common method that performs the forward pass for both training and validation, so we need a different way of turning dropout off: informing the layers whether we are training and letting them "decide" whether to include the calculation. The first thing we'll do is add a boolean training parameter to the forward methods of all the layer and activation function classes, since we need to be able to call them all in a uniform way:
  1.         # Forward pass
  2.         def forward(self, inputs, training):
When we're not in training mode, we can set the output directly to the inputs in the Layer_Dropout class and return from the method without changing the outputs:
  1.                 # If not in the training mode - return values
  2.                 if not training:
  3.                         self.output = inputs.copy()
  4.                         return
When we are training, we let dropout take part:
  1. # Dropout
  2. class Layer_Dropout:        
  3.         ...
  4.     # Forward pass
  5.     def forward(self, inputs, training):
  6.         # Save input values
  7.         self.inputs = inputs
  8.         # If not in the training mode - return values
  9.         if not training:
  10.             self.output = inputs.copy()
  11.             return
  12.         # Generate and save scaled mask
  13.         self.binary_mask = np.random.binomial(1, self.rate, size=inputs.shape) / self.rate
  14.         # Apply mask to output values
  15.         self.output = inputs * self.binary_mask
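Assuming the Layer_Dropout class from the full listing at the end of this chapter (its __init__ stores self.rate = 1 - rate), the effect of the training flag can be checked in isolation:

import numpy as np

dropout = Layer_Dropout(0.1)          # drop roughly 10% of neurons during training
X = np.ones((1, 5))

dropout.forward(X, training=False)
print(dropout.output)                 # [[1. 1. 1. 1. 1.]] - untouched at inference

dropout.forward(X, training=True)
print(dropout.output)                 # e.g. [[1.11 1.11 0. 1.11 1.11]] - scaled mask applied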
Next, we modify the Model class's forward method to add the training parameter and to pass its value along when calling the forward method of each layer:
  1. # Model class
  2. class Model:
  3.         ...
  4.     # Performs forward pass
  5.     def forward(self, X, training):
  6.         # Call forward method on the input layer
  7.         # this will set the output property that
  8.         # the first layer in "prev" object is expecting
  9.         self.input_layer.forward(X, training)
  10.         # Call forward method of every object in a chain
  11.         # Pass output of the previous object as a parameter
  12.         for layer in self.layers:
  13.             layer.forward(layer.prev.output, training)
  14.         # "layer" is now the last object from the list,
  15.         # return its output
  16.         return layer.output
We also need to update the train method in the Model class, since the training parameter needs to be set to True when calling the forward method there:
  1.                         # Perform the forward pass
  2.                         output = self.forward(X, training=True)
And set it to False during validation:
  1.                         # Perform the forward pass
  2.                         output = self.forward(X_val, training=False)
  1. # Model class
  2. class Model:
  3.         ...
  4.     # Train the model
  5.     def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):
  6.         # Initialize accuracy object
  7.         self.accuracy.init(y)
  8.         # Main training loop
  9.         for epoch in range(1, epochs+1):
  10.             # Perform the forward pass
  11.             output = self.forward(X, training=True)
  12.             # Calculate loss
  13.             data_loss, regularization_loss = self.loss.calculate(output, y, include_regularization=True)
  14.             loss = data_loss + regularization_loss
  15.             # Get predictions and calculate an accuracy
  16.             predictions = self.output_layer_activation.predictions(output)
  17.             accuracy = self.accuracy.calculate(predictions, y)
  18.             # Perform backward pass
  19.             self.backward(output, y)
  20.             # Optimize (update parameters)
  21.             self.optimizer.pre_update_params()
  22.             for layer in self.trainable_layers:
  23.                 self.optimizer.update_params(layer)
  24.             self.optimizer.post_update_params()
  25.             # Print a summary
  26.             if not epoch % print_every:
  27.                 print(f'epoch: {epoch}, ' +
  28.                       f'acc: {accuracy:.3f}, ' +
  29.                       f'loss: {loss:.3f} (' +
  30.                       f'data_loss: {data_loss:.3f}, ' +
  31.                       f'reg_loss: {regularization_loss:.3f}), ' +
  32.                       f'lr: {self.optimizer.current_learning_rate}')
  33.         # If there is the validation data
  34.         if validation_data is not None:
  35.             # For better readability
  36.             X_val, y_val = validation_data
  37.             # Perform the forward pass
  38.             output = self.forward(X_val, training=False)
  39.             # Calculate the loss
  40.             loss = self.loss.calculate(output, y_val)
  41.             # Get predictions and calculate an accuracy
  42.             predictions = self.output_layer_activation.predictions(output)
  43.             accuracy = self.accuracy.calculate(predictions, y_val)
  44.             # Print a summary
  45.             print(f'validation, ' +
  46.                   f'acc: {accuracy:.3f}, ' +
  47.                   f'loss: {loss:.3f}')
Finally, we need to handle, within the Model class, the class that combines the Softmax activation and the Cross-Entropy loss. The challenge here is that, previously, we hand-defined the forward and backward passes separately for each model. Now, we have loops in both directions of the calculation, a uniform way of computing outputs and gradients, and other improvements. We can't simply remove the Softmax activation and Categorical Cross-Entropy loss and replace them with one object combining the two; that won't work with the current code, because we handle the output activation function and the loss function in a specific way.
Since the combined object only speeds up the backward pass, we've decided to leave the forward pass unchanged, keeping the separate Softmax activation and Categorical Cross-Entropy loss objects, and to handle only the backward pass.
First, we need to automatically determine whether the current model is a classifier and whether it uses the Softmax activation and Categorical Cross-Entropy loss. This can be done by checking the class of the last layer object (which is an activation function object) and the class of the loss function object. We'll add this check at the end of the finalize method:
  1.         # If output activation is Softmax and
  2.         # loss function is Categorical Cross-Entropy
  3.         # create an object of combined activation
  4.         # and loss function containing
  5.         # faster gradient calculation
  6.         if isinstance(self.layers[-1], Activation_Softmax) and isinstance(self.loss, Loss_CategoricalCrossentropy):
  7.             # Create an object of combined activation
  8.             # and loss functions
  9.             self.softmax_classifier_output = Activation_Softmax_Loss_CategoricalCrossentropy()
To perform this check, we use Python's isinstance function, which returns True if the given object is an instance of the specified class. If both checks return True, we set a new attribute containing an object of the Activation_Softmax_Loss_CategoricalCrossentropy class.
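Assuming the layer, activation, and loss classes defined earlier in this chapter, the check behaves like this:

layers = [Layer_Dense(2, 3), Activation_Softmax()]
loss_fn = Loss_CategoricalCrossentropy()

print(isinstance(layers[-1], Activation_Softmax))          # True
print(isinstance(loss_fn, Loss_CategoricalCrossentropy))   # True
print(isinstance(layers[-1], Activation_Linear))           # False
# Both relevant checks True -> the combined softmax/cross-entropy object is created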
We also need to initialize this attribute with a value of None in the Model class's constructor:
  1.         # Softmax classifier's output object
  2.         self.softmax_classifier_output = None
The last step is to check, during the backward pass, whether this object is set and, if so, use it. To do that, we need to slightly modify the current backward-pass code to handle this case separately.
First, we call the backward method of the combined object; then, since we won't be calling the backward method of the activation function object (the last object in the list of layers), we need to set that object's dinputs attribute with the gradient calculated within the activation/loss object. Finally, we can iterate over all the layers except the last one and perform their backward passes:
  1.         # If softmax classifier
  2.         if self.softmax_classifier_output is not None:
  3.             # First call backward method
  4.             # on the combined activation/loss
  5.             # this will set dinputs property
  6.             self.softmax_classifier_output.backward(output, y)
  7.             # Since we'll not call backward method of the last layer
  8.             # which is Softmax activation
  9.             # as we used combined activation/loss
  10.             # object, let's set dinputs in this object
  11.             self.layers[-1].dinputs = self.softmax_classifier_output.dinputs
  12.             # Call backward method going through
  13.             # all the objects but last
  14.             # in reversed order passing dinputs as a parameter
  15.             for layer in reversed(self.layers[:-1]):
  16.                 layer.backward(layer.next.dinputs)
  17.             return
The complete Model class code so far:
# Model class
class Model:

    def __init__(self):
        # Create a list of network objects
        self.layers = []
        # Softmax classifier's output object
        self.softmax_classifier_output = None

    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)

    # Set loss, optimizer and accuracy
    def set(self, *, loss, optimizer, accuracy):
        self.loss = loss
        self.optimizer = optimizer
        self.accuracy = accuracy

    # Finalize the model
    def finalize(self):

        # Create and set the input layer
        self.input_layer = Layer_Input()

        # Count all the objects
        layer_count = len(self.layers)

        # Initialize a list containing trainable layers:
        self.trainable_layers = []

        # Iterate the objects
        for i in range(layer_count):

            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]

            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]

            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]

            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])

        # Update loss object with trainable layers
        self.loss.remember_trainable_layers(self.trainable_layers)

        # If output activation is Softmax and
        # loss function is Categorical Cross-Entropy
        # create an object of combined activation
        # and loss function containing
        # faster gradient calculation
        if isinstance(self.layers[-1], Activation_Softmax) and isinstance(self.loss, Loss_CategoricalCrossentropy):
            # Create an object of combined activation
            # and loss functions
            self.softmax_classifier_output = Activation_Softmax_Loss_CategoricalCrossentropy()

    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):

        # Initialize accuracy object
        self.accuracy.init(y)

        # Main training loop
        for epoch in range(1, epochs+1):

            # Perform the forward pass
            output = self.forward(X, training=True)

            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y, include_regularization=True)
            loss = data_loss + regularization_loss

            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)

            # Perform backward pass
            self.backward(output, y)

            # Optimize (update parameters)
            self.optimizer.pre_update_params()
            for layer in self.trainable_layers:
                self.optimizer.update_params(layer)
            self.optimizer.post_update_params()

            # Print a summary
            if not epoch % print_every:
                print(f'epoch: {epoch}, ' +
                      f'acc: {accuracy:.3f}, ' +
                      f'loss: {loss:.3f} (' +
                      f'data_loss: {data_loss:.3f}, ' +
                      f'reg_loss: {regularization_loss:.3f}), ' +
                      f'lr: {self.optimizer.current_learning_rate}')

        # If there is the validation data
        if validation_data is not None:

            # For better readability
            X_val, y_val = validation_data

            # Perform the forward pass
            output = self.forward(X_val, training=False)

            # Calculate the loss
            loss = self.loss.calculate(output, y_val)

            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y_val)

            # Print a summary
            print(f'validation, ' +
                  f'acc: {accuracy:.3f}, ' +
                  f'loss: {loss:.3f}')

    # Performs forward pass
    def forward(self, X, training):

        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X, training)

        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output, training)

        # "layer" is now the last object from the list,
        # return its output
        return layer.output

    # Performs backward pass
    def backward(self, output, y):

        # If softmax classifier
        if self.softmax_classifier_output is not None:
            # First call backward method
            # on the combined activation/loss
            # this will set dinputs property
            self.softmax_classifier_output.backward(output, y)

            # Since we'll not call backward method of the last layer
            # which is Softmax activation
            # as we used combined activation/loss
            # object, let's set dinputs in this object
            self.layers[-1].dinputs = self.softmax_classifier_output.dinputs

            # Call backward method going through
            # all the objects but last
            # in reversed order passing dinputs as a parameter
            for layer in reversed(self.layers[:-1]):
                layer.backward(layer.next.dinputs)

            return

        # First call backward method on the loss
        # this will set dinputs property that the last
        # layer will try to access shortly
        self.loss.backward(output, y)

        # Call backward method going through all the objects
        # in reversed order passing dinputs as a parameter
        for layer in reversed(self.layers):
            layer.backward(layer.next.dinputs)
In addition, we will no longer need the initializer or the forward method of the Activation_Softmax_Loss_CategoricalCrossentropy class, so we can remove them and keep only the backward method:
  1. # Softmax classifier - combined Softmax activation
  2. # and cross-entropy loss for faster backward step
  3. class Activation_Softmax_Loss_CategoricalCrossentropy():  
  4.     ...
  5.     # Backward pass
  6.     def backward(self, dvalues, y_true):
  7.         # Number of samples
  8.         samples = len(dvalues)     
  9.         # Copy so we can safely modify
  10.         self.dinputs = dvalues.copy()
  11.         # Calculate gradient
  12.         self.dinputs[range(samples), y_true] -= 1
  13.         # Normalize gradient
  14.         self.dinputs = self.dinputs / samples
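As a standalone sanity check (not part of the original text), the combined backward step works out to (softmax output - one-hot target) / samples, which can be verified numerically for a tiny batch of made-up values:

import numpy as np

# Softmax outputs for 2 samples, 3 classes, plus sparse targets
dvalues = np.array([[0.7, 0.2, 0.1],
                    [0.1, 0.5, 0.4]])
y_true = np.array([0, 2])
samples = len(dvalues)

dinputs = dvalues.copy()
dinputs[range(samples), y_true] -= 1   # subtract 1 at the true-class positions
dinputs = dinputs / samples            # normalize by the number of samples
print(dinputs)
# [[-0.15  0.1   0.05]
#  [ 0.05  0.25 -0.3 ]]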
Now we can test the updated Model object using dropout:
  1. # Create dataset
  2. X, y = spiral_data(samples=1000, classes=3)
  3. X_test, y_test = spiral_data(samples=100, classes=3)
  4. # Instantiate the model
  5. model = Model()
  6. # Add layers
  7. model.add(Layer_Dense(2, 512, weight_regularizer_l2=5e-4, bias_regularizer_l2=5e-4))
  8. model.add(Activation_ReLU())
  9. model.add(Layer_Dropout(0.1))
  10. model.add(Layer_Dense(512, 3))
  11. model.add(Activation_Softmax())
  12. # Set loss, optimizer and accuracy objects
  13. model.set(
  14.     loss=Loss_CategoricalCrossentropy(),
  15.     optimizer=Optimizer_Adam(learning_rate=0.05, decay=5e-5),
  16.     accuracy=Accuracy_Categorical()
  17.     )
  18. # Finalize the model
  19. model.finalize()
  20. # Train the model
  21. model.train(X, y, validation_data=(X_test, y_test), epochs=10000, print_every=100)
  1. >>>
  2. epoch: 100, acc: 0.716, loss: 0.726 (data_loss: 0.666, reg_loss: 0.060), lr: 0.04975371909050202
  3. epoch: 200, acc: 0.787, loss: 0.615 (data_loss: 0.538, reg_loss: 0.077), lr: 0.049507401356502806
  4. ...
  5. epoch: 9900, acc: 0.861, loss: 0.436 (data_loss: 0.389, reg_loss: 0.046), lr: 0.0334459346466437
  6. epoch: 10000, acc: 0.880, loss: 0.394 (data_loss: 0.347, reg_loss: 0.047), lr: 0.03333444448148271
  7. validation, acc: 0.867, loss: 0.379
Everything appears to be working as intended. With this Model class, we can now define new models without repeatedly writing large amounts of code. Rewriting code is not only tedious, it also makes it easier to introduce small, hard-to-notice errors.

The full code up to this point:

  1. import numpy as npimport nnfsfrom nnfs.datasets import sine_data, spiral_dataimport sysnnfs.init()# Dense layerclass Layer_Dense:    # Layer initialization    def __init__(self, n_inputs, n_neurons,                 weight_regularizer_l1=0, weight_regularizer_l2=0,                 bias_regularizer_l1=0, bias_regularizer_l2=0):        # Initialize weights and biases        # self.weights = 0.01 * np.random.randn(n_inputs, n_neurons)        self.weights = 0.1 * np.random.randn(n_inputs, n_neurons)        self.biases = np.zeros((1, n_neurons))        # Set regularization strength        self.weight_regularizer_l1 = weight_regularizer_l1        self.weight_regularizer_l2 = weight_regularizer_l2        self.bias_regularizer_l1 = bias_regularizer_l1        self.bias_regularizer_l2 = bias_regularizer_l2        # Forward pass    def forward(self, inputs, training):        # Remember input values        self.inputs = inputs        # Calculate output values from inputs, weights and biases        self.output = np.dot(inputs, self.weights) + self.biases            # Backward pass    def backward(self, dvalues):        # Gradients on parameters        self.dweights = np.dot(self.inputs.T, dvalues)        self.dbiases = np.sum(dvalues, axis=0, keepdims=True)        # Gradients on regularization        # L1 on weights        if self.weight_regularizer_l1 > 0:            dL1 = np.ones_like(self.weights)            dL1[self.weights < 0] = -1            self.dweights += self.weight_regularizer_l1 * dL1        # L2 on weights        if self.weight_regularizer_l2 > 0:            self.dweights += 2 * self.weight_regularizer_l2 * self.weights        # L1 on biases        if self.bias_regularizer_l1 > 0:            dL1 = np.ones_like(self.biases)            dL1[self.biases < 0] = -1            self.dbiases += self.bias_regularizer_l1 * dL1        # L2 on biases        if self.bias_regularizer_l2 > 0:            self.dbiases += 2 * self.bias_regularizer_l2 * self.biases        # Gradient on values        self.dinputs = np.dot(dvalues, self.weights.T)                # Dropoutclass Layer_Dropout:            # Init    def __init__(self, rate):        # Store rate, we invert it as for example for dropout        # of 0.1 we need success rate of 0.9        self.rate = 1 - rate            # Forward pass    def forward(self, inputs, training):        # Save input values        self.inputs = inputs        # If not in the training mode - return values        if not training:            self.output = inputs.copy()            return        # Generate and save scaled mask        self.binary_mask = np.random.binomial(1, self.rate, size=inputs.shape) / self.rate        # Apply mask to output values        self.output = inputs * self.binary_mask            # Backward pass    def backward(self, dvalues):        # Gradient on values        self.dinputs = dvalues * self.binary_mask        # Input "layer"class Layer_Input:    # Forward pass    def forward(self, inputs, training):        self.output = inputs        # ReLU activationclass Activation_ReLU:      # Forward pass    def forward(self, inputs, training):        # Remember input values        self.inputs = inputs        # Calculate output values from inputs        self.output = np.maximum(0, inputs)            # Backward pass    def backward(self, dvalues):        # Since we need to modify original variable,        # let's make a copy of values first        self.dinputs = dvalues.copy()        # Zero gradient where input values were negative        
        self.dinputs[self.inputs <= 0] = 0

    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs


# Softmax activation
class Activation_Softmax:

    # Forward pass
    def forward(self, inputs, training):
        # Remember input values
        self.inputs = inputs
        # Get unnormalized probabilities
        exp_values = np.exp(inputs - np.max(inputs, axis=1, keepdims=True))
        # Normalize them for each sample
        probabilities = exp_values / np.sum(exp_values, axis=1, keepdims=True)
        self.output = probabilities

    # Backward pass
    def backward(self, dvalues):
        # Create uninitialized array
        self.dinputs = np.empty_like(dvalues)
        # Enumerate outputs and gradients
        for index, (single_output, single_dvalues) in enumerate(zip(self.output, dvalues)):
            # Flatten output array
            single_output = single_output.reshape(-1, 1)
            # Calculate Jacobian matrix of the output
            jacobian_matrix = np.diagflat(single_output) - np.dot(single_output, single_output.T)
            # Calculate sample-wise gradient
            # and add it to the array of sample gradients
            self.dinputs[index] = np.dot(jacobian_matrix, single_dvalues)

    # Calculate predictions for outputs
    def predictions(self, outputs):
        return np.argmax(outputs, axis=1)


# Sigmoid activation
class Activation_Sigmoid:

    # Forward pass
    def forward(self, inputs, training):
        # Save input and calculate/save output
        # of the sigmoid function
        self.inputs = inputs
        self.output = 1 / (1 + np.exp(-inputs))

    # Backward pass
    def backward(self, dvalues):
        # Derivative - calculated from output of the sigmoid function
        self.dinputs = dvalues * (1 - self.output) * self.output

    # Calculate predictions for outputs
    def predictions(self, outputs):
        return (outputs > 0.5) * 1


# Linear activation
class Activation_Linear:

    # Forward pass
    def forward(self, inputs, training):
        # Just remember values
        self.inputs = inputs
        self.output = inputs

    # Backward pass
    def backward(self, dvalues):
        # Derivative is 1, 1 * dvalues = dvalues - the chain rule
        self.dinputs = dvalues.copy()

    # Calculate predictions for outputs
    def predictions(self, outputs):
        return outputs


# SGD optimizer
class Optimizer_SGD:

    # Initialize optimizer - set settings,
    # learning rate of 1. is default for this optimizer
    def __init__(self, learning_rate=1., decay=0., momentum=0.):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.momentum = momentum

    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):

        # If we use momentum
        if self.momentum:
            # If layer does not contain momentum arrays, create them
            # filled with zeros
            if not hasattr(layer, 'weight_momentums'):
                layer.weight_momentums = np.zeros_like(layer.weights)
                # If there is no momentum array for weights,
                # the array doesn't exist for biases yet either
                layer.bias_momentums = np.zeros_like(layer.biases)

            # Build weight updates with momentum - take previous
            # updates multiplied by retain factor and update with
            # current gradients
            weight_updates = self.momentum * layer.weight_momentums - self.current_learning_rate * layer.dweights
            layer.weight_momentums = weight_updates

            # Build bias updates
            bias_updates = self.momentum * layer.bias_momentums - self.current_learning_rate * layer.dbiases
            layer.bias_momentums = bias_updates

        # Vanilla SGD updates (as before momentum update)
        else:
            weight_updates = -self.current_learning_rate * layer.dweights
            bias_updates = -self.current_learning_rate * layer.dbiases

        # Update weights and biases using either
        # vanilla or momentum updates
        layer.weights += weight_updates
        layer.biases += bias_updates

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1


# Adagrad optimizer
class Optimizer_Adagrad:

    # Initialize optimizer - set settings
    def __init__(self, learning_rate=1., decay=0., epsilon=1e-7):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.epsilon = epsilon

    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):

        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_cache = np.zeros_like(layer.biases)

        # Update cache with squared current gradients
        layer.weight_cache += layer.dweights**2
        layer.bias_cache += layer.dbiases**2

        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * layer.dweights / (np.sqrt(layer.weight_cache) + self.epsilon)
        layer.biases += -self.current_learning_rate * layer.dbiases / (np.sqrt(layer.bias_cache) + self.epsilon)

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1


# RMSprop optimizer
class Optimizer_RMSprop:

    # Initialize optimizer - set settings
    def __init__(self, learning_rate=0.001, decay=0., epsilon=1e-7, rho=0.9):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.epsilon = epsilon
        self.rho = rho

    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):

        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_cache = np.zeros_like(layer.biases)

        # Update cache with squared current gradients
        layer.weight_cache = self.rho * layer.weight_cache + (1 - self.rho) * layer.dweights**2
        layer.bias_cache = self.rho * layer.bias_cache + (1 - self.rho) * layer.dbiases**2

        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * layer.dweights / (np.sqrt(layer.weight_cache) + self.epsilon)
        layer.biases += -self.current_learning_rate * layer.dbiases / (np.sqrt(layer.bias_cache) + self.epsilon)

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1


# Adam optimizer
class Optimizer_Adam:

    # Initialize optimizer - set settings
    def __init__(self, learning_rate=0.001, decay=0., epsilon=1e-7, beta_1=0.9, beta_2=0.999):
        self.learning_rate = learning_rate
        self.current_learning_rate = learning_rate
        self.decay = decay
        self.iterations = 0
        self.epsilon = epsilon
        self.beta_1 = beta_1
        self.beta_2 = beta_2

    # Call once before any parameter updates
    def pre_update_params(self):
        if self.decay:
            self.current_learning_rate = self.learning_rate * (1. / (1. + self.decay * self.iterations))

    # Update parameters
    def update_params(self, layer):

        # If layer does not contain cache arrays,
        # create them filled with zeros
        if not hasattr(layer, 'weight_cache'):
            layer.weight_momentums = np.zeros_like(layer.weights)
            layer.weight_cache = np.zeros_like(layer.weights)
            layer.bias_momentums = np.zeros_like(layer.biases)
            layer.bias_cache = np.zeros_like(layer.biases)

        # Update momentum with current gradients
        layer.weight_momentums = self.beta_1 * layer.weight_momentums + (1 - self.beta_1) * layer.dweights
        layer.bias_momentums = self.beta_1 * layer.bias_momentums + (1 - self.beta_1) * layer.dbiases

        # Get corrected momentum
        # self.iterations is 0 at first pass
        # and we need to start with 1 here
        weight_momentums_corrected = layer.weight_momentums / (1 - self.beta_1 ** (self.iterations + 1))
        bias_momentums_corrected = layer.bias_momentums / (1 - self.beta_1 ** (self.iterations + 1))

        # Update cache with squared current gradients
        layer.weight_cache = self.beta_2 * layer.weight_cache + (1 - self.beta_2) * layer.dweights**2
        layer.bias_cache = self.beta_2 * layer.bias_cache + (1 - self.beta_2) * layer.dbiases**2

        # Get corrected cache
        weight_cache_corrected = layer.weight_cache / (1 - self.beta_2 ** (self.iterations + 1))
        bias_cache_corrected = layer.bias_cache / (1 - self.beta_2 ** (self.iterations + 1))

        # Vanilla SGD parameter update + normalization
        # with square rooted cache
        layer.weights += -self.current_learning_rate * weight_momentums_corrected / (np.sqrt(weight_cache_corrected) + self.epsilon)
        layer.biases += -self.current_learning_rate * bias_momentums_corrected / (np.sqrt(bias_cache_corrected) + self.epsilon)

    # Call once after any parameter updates
    def post_update_params(self):
        self.iterations += 1


# Common loss class
class Loss:

    # Regularization loss calculation
    def regularization_loss(self):

        # 0 by default
        regularization_loss = 0

        # Calculate regularization loss
        # iterate all trainable layers
        for layer in self.trainable_layers:

            # L1 regularization - weights
            # calculate only when factor greater than 0
            if layer.weight_regularizer_l1 > 0:
                regularization_loss += layer.weight_regularizer_l1 * np.sum(np.abs(layer.weights))

            # L2 regularization - weights
            if layer.weight_regularizer_l2 > 0:
                regularization_loss += layer.weight_regularizer_l2 * np.sum(layer.weights * layer.weights)

            # L1 regularization - biases
            # calculate only when factor greater than 0
            if layer.bias_regularizer_l1 > 0:
                regularization_loss += layer.bias_regularizer_l1 * np.sum(np.abs(layer.biases))

            # L2 regularization - biases
            if layer.bias_regularizer_l2 > 0:
                regularization_loss += layer.bias_regularizer_l2 * np.sum(layer.biases * layer.biases)

        return regularization_loss

    # Set/remember trainable layers
    def remember_trainable_layers(self, trainable_layers):
        self.trainable_layers = trainable_layers

    # Calculates the data and regularization losses
    # given model output and ground truth values
    def calculate(self, output, y, *, include_regularization=False):

        # Calculate sample losses
        sample_losses = self.forward(output, y)

        # Calculate mean loss
        data_loss = np.mean(sample_losses)

        # If just data loss - return it
        if not include_regularization:
            return data_loss

        # Return the data and regularization losses
        return data_loss, self.regularization_loss()


# Cross-entropy loss
class Loss_CategoricalCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):

        # Number of samples in a batch
        samples = len(y_pred)

        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # Probabilities for target values -
        # only if categorical labels
        if len(y_true.shape) == 1:
            correct_confidences = y_pred_clipped[
                range(samples),
                y_true
            ]

        # Mask values - only for one-hot encoded labels
        elif len(y_true.shape) == 2:
            correct_confidences = np.sum(y_pred_clipped * y_true, axis=1)

        # Losses
        negative_log_likelihoods = -np.log(correct_confidences)
        return negative_log_likelihoods

    # Backward pass
    def backward(self, dvalues, y_true):

        # Number of samples
        samples = len(dvalues)
        # Number of labels in every sample
        # We'll use the first sample to count them
        labels = len(dvalues[0])

        # If labels are sparse, turn them into one-hot vector
        if len(y_true.shape) == 1:
            y_true = np.eye(labels)[y_true]

        # Calculate gradient
        self.dinputs = -y_true / dvalues
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Softmax classifier - combined Softmax activation
# and cross-entropy loss for faster backward step
class Activation_Softmax_Loss_CategoricalCrossentropy():

    # # Creates activation and loss function objects
    # def __init__(self):
    #     self.activation = Activation_Softmax()
    #     self.loss = Loss_CategoricalCrossentropy()

    # # Forward pass
    # def forward(self, inputs, y_true):
    #     # Output layer's activation function
    #     self.activation.forward(inputs)
    #     # Set the output
    #     self.output = self.activation.output
    #     # Calculate and return loss value
    #     return self.loss.calculate(self.output, y_true)

    # Backward pass
    def backward(self, dvalues, y_true):

        # Number of samples
        samples = len(dvalues)

        # If labels are one-hot encoded,
        # turn them into discrete values
        if len(y_true.shape) == 2:
            y_true = np.argmax(y_true, axis=1)

        # Copy so we can safely modify
        self.dinputs = dvalues.copy()
        # Calculate gradient
        self.dinputs[range(samples), y_true] -= 1
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Binary cross-entropy loss
class Loss_BinaryCrossentropy(Loss):

    # Forward pass
    def forward(self, y_pred, y_true):

        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        y_pred_clipped = np.clip(y_pred, 1e-7, 1 - 1e-7)

        # Calculate sample-wise loss
        sample_losses = -(y_true * np.log(y_pred_clipped) + (1 - y_true) * np.log(1 - y_pred_clipped))
        sample_losses = np.mean(sample_losses, axis=-1)

        # Return losses
        return sample_losses

    # Backward pass
    def backward(self, dvalues, y_true):

        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])

        # Clip data to prevent division by 0
        # Clip both sides to not drag mean towards any value
        clipped_dvalues = np.clip(dvalues, 1e-7, 1 - 1e-7)

        # Calculate gradient
        self.dinputs = -(y_true / clipped_dvalues - (1 - y_true) / (1 - clipped_dvalues)) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Mean Squared Error loss
class Loss_MeanSquaredError(Loss):  # L2 loss

    # Forward pass
    def forward(self, y_pred, y_true):

        # Calculate loss
        sample_losses = np.mean((y_true - y_pred)**2, axis=-1)

        # Return losses
        return sample_losses

    # Backward pass
    def backward(self, dvalues, y_true):

        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])

        # Gradient on values
        self.dinputs = -2 * (y_true - dvalues) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Mean Absolute Error loss
class Loss_MeanAbsoluteError(Loss):  # L1 loss

    # Forward pass
    def forward(self, y_pred, y_true):

        # Calculate loss
        sample_losses = np.mean(np.abs(y_true - y_pred), axis=-1)

        # Return losses
        return sample_losses

    # Backward pass
    def backward(self, dvalues, y_true):

        # Number of samples
        samples = len(dvalues)
        # Number of outputs in every sample
        # We'll use the first sample to count them
        outputs = len(dvalues[0])

        # Calculate gradient
        self.dinputs = np.sign(y_true - dvalues) / outputs
        # Normalize gradient
        self.dinputs = self.dinputs / samples


# Common accuracy class
class Accuracy:

    # Calculates an accuracy
    # given predictions and ground truth values
    def calculate(self, predictions, y):

        # Get comparison results
        comparisons = self.compare(predictions, y)

        # Calculate an accuracy
        accuracy = np.mean(comparisons)

        # Return accuracy
        return accuracy


# Accuracy calculation for classification model
class Accuracy_Categorical(Accuracy):

    # No initialization is needed
    def init(self, y):
        pass

    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        if len(y.shape) == 2:
            y = np.argmax(y, axis=1)
        return predictions == y


# Accuracy calculation for regression model
class Accuracy_Regression(Accuracy):

    def __init__(self):
        # Create precision property
        self.precision = None

    # Calculates precision value
    # based on passed-in ground truth
    def init(self, y, reinit=False):
        if self.precision is None or reinit:
            self.precision = np.std(y) / 250

    # Compares predictions to the ground truth values
    def compare(self, predictions, y):
        return np.absolute(predictions - y) < self.precision

# Model class
class Model:

    def __init__(self):
        # Create a list of network objects
        self.layers = []
        # Softmax classifier's output object
        self.softmax_classifier_output = None

    # Add objects to the model
    def add(self, layer):
        self.layers.append(layer)

    # Set loss, optimizer and accuracy
    def set(self, *, loss, optimizer, accuracy):
        self.loss = loss
        self.optimizer = optimizer
        self.accuracy = accuracy

    # Finalize the model
    def finalize(self):

        # Create and set the input layer
        self.input_layer = Layer_Input()

        # Count all the objects
        layer_count = len(self.layers)

        # Initialize a list containing trainable layers:
        self.trainable_layers = []

        # Iterate the objects
        for i in range(layer_count):

            # If it's the first layer,
            # the previous layer object is the input layer
            if i == 0:
                self.layers[i].prev = self.input_layer
                self.layers[i].next = self.layers[i+1]

            # All layers except for the first and the last
            elif i < layer_count - 1:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.layers[i+1]

            # The last layer - the next object is the loss
            # Also let's save aside the reference to the last object
            # whose output is the model's output
            else:
                self.layers[i].prev = self.layers[i-1]
                self.layers[i].next = self.loss
                self.output_layer_activation = self.layers[i]

            # If layer contains an attribute called "weights",
            # it's a trainable layer -
            # add it to the list of trainable layers
            # We don't need to check for biases -
            # checking for weights is enough
            if hasattr(self.layers[i], 'weights'):
                self.trainable_layers.append(self.layers[i])

        # Update loss object with trainable layers
        self.loss.remember_trainable_layers(self.trainable_layers)

        # If output activation is Softmax and
        # loss function is Categorical Cross-Entropy
        # create an object of combined activation
        # and loss function containing
        # faster gradient calculation
        if isinstance(self.layers[-1], Activation_Softmax) and isinstance(self.loss, Loss_CategoricalCrossentropy):
            # Create an object of combined activation
            # and loss functions
            self.softmax_classifier_output = Activation_Softmax_Loss_CategoricalCrossentropy()

    # Train the model
    def train(self, X, y, *, epochs=1, print_every=1, validation_data=None):

        # Initialize accuracy object
        self.accuracy.init(y)

        # Main training loop
        for epoch in range(1, epochs+1):

            # Perform the forward pass
            output = self.forward(X, training=True)

            # Calculate loss
            data_loss, regularization_loss = self.loss.calculate(output, y, include_regularization=True)
            loss = data_loss + regularization_loss

            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y)

            # Perform backward pass
            self.backward(output, y)

            # Optimize (update parameters)
            self.optimizer.pre_update_params()
            for layer in self.trainable_layers:
                self.optimizer.update_params(layer)
            self.optimizer.post_update_params()

            # Print a summary
            if not epoch % print_every:
                print(f'epoch: {epoch}, ' +
                      f'acc: {accuracy:.3f}, ' +
                      f'loss: {loss:.3f} (' +
                      f'data_loss: {data_loss:.3f}, ' +
                      f'reg_loss: {regularization_loss:.3f}), ' +
                      f'lr: {self.optimizer.current_learning_rate}')

        # If there is the validation data
        if validation_data is not None:

            # For better readability
            X_val, y_val = validation_data

            # Perform the forward pass
            output = self.forward(X_val, training=False)

            # Calculate the loss
            loss = self.loss.calculate(output, y_val)

            # Get predictions and calculate an accuracy
            predictions = self.output_layer_activation.predictions(output)
            accuracy = self.accuracy.calculate(predictions, y_val)

            # Print a summary
            print(f'validation, ' +
                  f'acc: {accuracy:.3f}, ' +
                  f'loss: {loss:.3f}')

    # Performs forward pass
    def forward(self, X, training):

        # Call forward method on the input layer
        # this will set the output property that
        # the first layer in "prev" object is expecting
        self.input_layer.forward(X, training)

        # Call forward method of every object in a chain
        # Pass output of the previous object as a parameter
        for layer in self.layers:
            layer.forward(layer.prev.output, training)

        # "layer" is now the last object from the list,
        # return its output
        return layer.output

    # Performs backward pass
    def backward(self, output, y):

        # If softmax classifier
        if self.softmax_classifier_output is not None:
            # First call backward method
            # on the combined activation/loss
            # this will set dinputs property
            self.softmax_classifier_output.backward(output, y)

            # Since we'll not call backward method of the last layer
            # which is Softmax activation
            # as we used combined activation/loss
            # object, let's set dinputs in this object
            self.layers[-1].dinputs = self.softmax_classifier_output.dinputs

            # Call backward method going through
            # all the objects but last
            # in reversed order passing dinputs as a parameter
            for layer in reversed(self.layers[:-1]):
                layer.backward(layer.next.dinputs)

            return

        # First call backward method on the loss
        # this will set dinputs property that the last
        # layer will try to access shortly
        self.loss.backward(output, y)

        # Call backward method going through all the objects
        # in reversed order passing dinputs as a parameter
        for layer in reversed(self.layers):
            layer.backward(layer.next.dinputs)


# # Create dataset
# X, y = sine_data()

# # Instantiate the model
# model = Model()

# # Add layers
# model.add(Layer_Dense(1, 64))
# model.add(Activation_ReLU())
# model.add(Layer_Dense(64, 64))
# model.add(Activation_ReLU())
# model.add(Layer_Dense(64, 1))
# model.add(Activation_Linear())

# # Set loss, optimizer and accuracy objects
# model.set(
#     loss=Loss_MeanSquaredError(),
#     optimizer=Optimizer_Adam(learning_rate=0.005, decay=1e-3),
#     accuracy=Accuracy_Regression()
#     )

# # Finalize the model
# model.finalize()

# model.train(X, y, epochs=10000, print_every=100)

##########################################################################################

# # Create train and test dataset
# X, y = spiral_data(samples=100, classes=2)
# X_test, y_test = spiral_data(samples=100, classes=2)

# # Reshape labels to be a list of lists
# # Inner list contains one output (either 0 or 1)
# # per each output neuron, 1 in this case
# y = y.reshape(-1, 1)
# y_test = y_test.reshape(-1, 1)

# # Instantiate the model
# model = Model()

# # Add layers
# model.add(Layer_Dense(2, 64, weight_regularizer_l2=5e-4, bias_regularizer_l2=5e-4))
# model.add(Activation_ReLU())
# model.add(Layer_Dense(64, 1))
# model.add(Activation_Sigmoid())

# # Set loss, optimizer and accuracy objects
# model.set(
#     loss=Loss_BinaryCrossentropy(),
#     optimizer=Optimizer_Adam(decay=5e-7),
#     accuracy=Accuracy_Categorical()
#     )

# # Finalize the model
# model.finalize()

# # Train the model
# model.train(X, y, validation_data=(X_test, y_test), epochs=10000, print_every=100)

##########################################################################################

# Create dataset
X, y = spiral_data(samples=1000, classes=3)
X_test, y_test = spiral_data(samples=100, classes=3)

# Instantiate the model
model = Model()

# Add layers
model.add(Layer_Dense(2, 512, weight_regularizer_l2=5e-4, bias_regularizer_l2=5e-4))
model.add(Activation_ReLU())
model.add(Layer_Dropout(0.1))
model.add(Layer_Dense(512, 3))
model.add(Activation_Softmax())

# Set loss, optimizer and accuracy objects
model.set(
    loss=Loss_CategoricalCrossentropy(),
    optimizer=Optimizer_Adam(learning_rate=0.05, decay=5e-5),
    accuracy=Accuracy_Categorical()
    )

# Finalize the model
model.finalize()

# Train the model
model.train(X, y, validation_data=(X_test, y_test), epochs=10000, print_every=100)
复制代码
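As a small aside that is not part of the chapter's listing: once train has finished, the same Model object can be reused for a quick sanity check by calling its forward method directly and converting the output with the output layer's predictions method. A minimal sketch, assuming the classification script above has just been run (so model, X_test and y_test already exist):
  1. # Run a forward pass on the test data without the training flag
  2. output = model.forward(X_test, training=False)
  3. # Convert softmax confidences into class indices
  4. predictions = model.output_layer_activation.predictions(output)
  5. # Compare a few predictions with the ground truth labels
  6. print(predictions[:5], y_test[:5])
复制代码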


Chapter code, additional resources, and errata for this chapter: https://nnfs.io/ch18
