Another Numpy Neural Net: Layers

Deep Learning is built on layers: you feed numbers in on one side and the crank turns forward from layer to layer until it spits numbers out the other side. Then the crank turns backwards, and the loss at the output end pushes a derivative back through to the input end. This modularity is both necessary (you have to introduce a non-linearity periodically, or all your operations collapse into a single linear transformation) and probably one reason the field moves so fast. So let's start with the classic fully connected layer; we'll loop back and do a couple more layers once we have the rest of the machinery in place.
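If the need for the non-linearity isn't obvious, a couple of lines of numpy make it concrete: stacking two purely linear layers is exactly the same as one linear layer whose weights are the product of the two weight matrices (just a toy sketch, nothing from the library yet):

import numpy as np

rng = np.random.default_rng(0)
x = rng.standard_normal(4)          # a toy input vector
W1 = rng.standard_normal((4, 3))    # "layer 1" weights
W2 = rng.standard_normal((3, 2))    # "layer 2" weights

# two linear layers stacked...
stacked = (x @ W1) @ W2
# ...are exactly one linear layer with the combined weight matrix W1 @ W2
collapsed = x @ (W1 @ W2)

print(np.allclose(stacked, collapsed))  # True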

A Base Class

But first, a Base Class:

import logging

import numpy as np

from DifferentiableFunction import DifferentiableFunction
from typing import Dict
from abc import ABC, abstractmethod

logger = logging.getLogger(__name__)

class Layer(ABC):
    def __init__(self):
        self.name = None
        self.type = 'Layer'
        self.weights = None
        self.biases = None
        self.last_input = None
        self.last_z = None
        self.weights_initialized = False

    @abstractmethod
    def initialize_weights(self):
        pass

    @abstractmethod
    def forward(self, input_data: np.ndarray) -> np.ndarray:
        if not self.weights_initialized:
            self.initialize_weights()
            self.weights_initialized = True

    @abstractmethod
    def backward(self, output_gradient: np.ndarray) -> Dict[str,np.ndarray]:
        pass

    @abstractmethod
    def to_dict(self) -> Dict:
        pass

    @classmethod
    @abstractmethod
    def from_dict(cls, data: Dict) -> 'Layer':
        pass

The Layer class defines what we would like every layer to have and, since layers are generally used by other objects, the methods those objects need the layer to provide. All layers get certain attributes set in the __init__ method, so we can call it from subclasses and not wonder later whether we forgot something. The point of the @abstractmethod decorator (together with inheriting from ABC) is that if a subclass doesn't implement the method underneath it, Python refuses to instantiate that subclass and throws a TypeError. There are quite a few of them, because if a layer is missing any of these, it won't work! It needs a forward pass, a backward pass, a way to initialize weights, and to/from_dict methods. The @classmethod decorator means the method can be called on the class rather than an instance (Layer.from_dict(kwargs) instead of Layer(kwargs).from_dict(kwargs)), which makes a lot of sense for 'load'-type functions, where it seems silly to build some kind of dummy instance just so you can instantiate the real one.
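To make the abstract-method piece concrete, here's a hypothetical subclass that forgets backward; Python refuses to even instantiate it:

class BrokenLayer(Layer):
    def initialize_weights(self): pass
    def forward(self, input_data): pass
    def to_dict(self): return {}
    @classmethod
    def from_dict(cls, data): return cls()
    # no backward() implemented

BrokenLayer()
# TypeError: can't instantiate abstract class BrokenLayer because 'backward' is still abstract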

The last two ‘dict’ methods are really so that we can specify these models as a yaml or json file instead of inline. This can be very useful when you want to separate the code to run/train your model from the actual model architecture. You don’t want to be pushing code to the repo just to try out a different number of hidden nodes.
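One aside before the subclass: the DifferentiableFunction import at the top comes from an earlier post in this series. All the layer actually needs is an object with a function method and a derivative method, so if you're jumping in here, a minimal ReLU in that shape (which I'll assume lives in DifferentiableFunction.py) would look something like this; the real class from the earlier post may differ in detail:

import numpy as np

class ReLU:
    """Minimal stand-in for a DifferentiableFunction: max(0, z) and its derivative."""
    def function(self, z: np.ndarray) -> np.ndarray:
        return np.maximum(0, z)

    def derivative(self, z: np.ndarray) -> np.ndarray:
        # 1 where z > 0, 0 elsewhere
        return (z > 0).astype(z.dtype)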

Subclass

So now we’re ready to code up our Dense layer:

class DenseLayer(Layer):
    """
    A fully connected neural network layer.
    
    Parameters:
        input_size (int): The number of input features.
        output_size (int): The number of output features (neurons).
        activation_function (DifferentiableFunction): The activation function to apply.
        name (str, optional): Name of the layer. Defaults to None.
    """

    def __init__(self, 
                 input_size:int,
                 output_size: int, 
                 activation_function: DifferentiableFunction,
                 name: str = None):
        super().__init__()
        self.name = name
        self.type = 'Dense'
        self.input_size = input_size
        self.output_size = output_size
        self.activation_function = activation_function

    def initialize_weights(self):
        """
        Initialize weights with He initialization (random normals scaled by sqrt(2 / input_size), a sensible default for ReLU-style activations).
        """
        self.weights = np.random.randn(self.input_size, self.output_size) * (np.sqrt(2./self.input_size))
        self.biases = np.zeros(self.output_size)
        logger.info(f"Weights and biases initialized for layer {self.name}")

    def forward(self, input_data: np.ndarray) -> np.ndarray:
        """
        Performs the forward pass through the layer, supporting batched input.
        Parameters:
            input_data (np.ndarray): Input data to the layer. Shape: (batch_size, input_size)

        Returns:
            np.ndarray: Output after applying weights, biases, and activation function. Shape: (batch_size, output_size)
        """
        super().forward(input_data)
        self.last_input = input_data
        self.last_z = np.dot(input_data, self.weights) + self.biases  # (batch_size, output_size)
        logger.debug(f"Forward pass in layer {self.name}: input shape {input_data.shape}, z shape {self.last_z.shape}")
        return self.activation_function.function(self.last_z)

    def backward(self, output_gradient: np.ndarray) -> Dict[str,np.ndarray]:
        """
        Performs the backward pass through the layer, updating weights and biases.
        Parameters:
            output_gradient (np.ndarray): Gradient of the loss with respect to the layer's output. Shape: (batch_size, output_size)
            learning_rate (float): Learning rate for weight updates.
        Returns:
            np.ndarray: Gradient of the loss with respect to the layer's input. Shape: (batch_size, input_size)
            """
        activation_gradient = self.activation_function.derivative(self.last_z)  # (batch_size, output_size)
        delta = output_gradient * activation_gradient  # (batch_size, output_size)

        weight_gradient = np.dot(self.last_input.T, delta) / self.last_input.shape[0]  # (input_size, output_size)
        bias_gradient = np.sum(delta, axis=0) / self.last_input.shape[0] # (output_size,)

        input_gradient = np.dot(delta, self.weights.T)  # (batch_size, input_size)
        logger.debug(f"Backward pass in layer {self.name}: output_gradient shape {output_gradient.shape}, input_gradient shape {input_gradient.shape}") 
        grad_dict = {'inputs': input_gradient,
                     'weights': weight_gradient,
                     'biases': bias_gradient}
        
        # Updating the weights and biases is handled by the optimizer in Model.py.
        return grad_dict

    def to_dict(self) -> Dict:
        return {
            'name': self.name,
            'type': self.type,
            'input_size': self.input_size,
            'output_size': self.output_size,
            'activation_function': self.activation_function.__class__.__name__,
        }
    
    @classmethod
    def from_dict(cls, data: Dict) -> 'DenseLayer':
        activation_function = getattr(__import__('DifferentiableFunction'), data['activation_function'])()
        layer = cls(input_size=data['input_size'],
                    output_size=data['output_size'],
                    activation_function=activation_function,
                    name=data['name'])
        return layer

The call to super().__init__() initializes all the default attributes we'll need for the forward/backward passes. We update self.type to the new layer type and implement all the abstract methods.

forward simply does a matrix multiplication of the inputs with the weights and adds the biases, then applies the activation function. The first time forward is called, it initializes the weights by calling the base class's forward method.

backward calculates the relevant gradients and packages them up into a dictionary to be used by the optimizer. Note that we don't actually update the weights during the forward or backward pass, partly to decrease the number of objects the layer needs to depend on.
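Here's a quick smoke test of the shapes moving through a forward and a backward pass (using the stand-in ReLU sketch from earlier as the activation; any object with function and derivative methods would do, and the sizes are arbitrary):

import numpy as np

layer = DenseLayer(input_size=4, output_size=3,
                   activation_function=ReLU(), name='demo')

x = np.random.randn(8, 4)        # a batch of 8 examples with 4 features each
out = layer.forward(x)           # shape (8, 3); weights get He-initialized on this first call

# stand in for the gradient of the loss with respect to this layer's output
upstream = np.ones_like(out)     # shape (8, 3)
grads = layer.backward(upstream)

print(grads['inputs'].shape)     # (8, 4)  -> handed to the previous layer
print(grads['weights'].shape)    # (4, 3)  -> handed to the optimizer
print(grads['biases'].shape)     # (3,)    -> handed to the optimizer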

Note that I also threw some logging in here. Putting logging inside your classes is usually doing your future self a real solid for when some indeterminate layer is ruining your life come training time. It's also way better than a print statement because you don't have to remember to erase it (I feel like a n00b every time a colleague points out a print I missed in a PR review). To put logging in, just throw this on top of any file:

import logging

logger = logging.getLogger(__name__)
logger.info("This is a log message at loglevel INFO")
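The getLogger(__name__) call gives each module its own named logger, but whether the messages actually show up is decided by whatever script you run. A one-time basicConfig in your entry point (a hypothetical train.py here, not the layer file) turns the output on:

import logging

# Configure handlers and the loglevel once, in the script you actually execute;
# every logger created with logging.getLogger(__name__) in the library modules
# will then emit through this configuration.
logging.basicConfig(
    level=logging.DEBUG,  # drop to logging.INFO to silence the per-pass shape messages
    format="%(asctime)s %(name)s %(levelname)s: %(message)s",
)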

Finally, the to/from_dict methods make it possible to save and load a layer from a dictionary-like format such as json or yaml.
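For example, a round trip through JSON (again leaning on a ReLU class living in DifferentiableFunction.py, so that from_dict can look it up by name) might look like:

import json

layer = DenseLayer(input_size=784, output_size=128,
                   activation_function=ReLU(), name='hidden_1')

# write the architecture (not the learned weights) out to disk...
with open('hidden_1.json', 'w') as f:
    json.dump(layer.to_dict(), f, indent=2)

# ...and rebuild the layer later without touching the training code
with open('hidden_1.json') as f:
    rebuilt = DenseLayer.from_dict(json.load(f))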

Next up we’ll look at Optimizers and then we’ll be ready to put it all together into a model.