
Forward Propagation and Backpropagation

As promised in the previous blog, we will dive deep into the mathematics behind forward propagation and backpropagation.

Forward Propagation

As the name suggests, the input data is fed into the model in the forward direction. Each hidden layer in the network receives the data from the preceding layer, processes it with its weights and its respective activation function, and then passes the result on to the successive layer.

Two steps of processing take place in each neuron of a hidden layer.

1. Preactivation: the weighted sum of the inputs (plus a bias term) is computed for the neuron.
2. Activation: the calculated weighted sum is then passed through an activation function, which adds non-linearity to the network. Based on the result of the activation function on the weighted sum, the neuron decides whether to pass this information on to the successive neurons or not (a small sketch of both steps follows below).
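
As a quick illustration, here is a minimal NumPy sketch of these two steps for a single neuron; the input values, weights and bias below are made-up numbers, not anything from the post:

import numpy as np

def sigmoid(z):
    # Sigmoid activation: squashes any real number into (0, 1)
    return 1.0 / (1.0 + np.exp(-z))

# Made-up inputs, weights and bias for a single neuron
x = np.array([0.5, -1.2, 3.0])   # inputs coming into the neuron
w = np.array([0.1, 0.4, -0.2])   # one weight per input
b = 0.05                         # bias term

z = np.dot(w, x) + b   # 1. Preactivation: weighted sum of inputs plus bias
a = sigmoid(z)         # 2. Activation: non-linearity applied to the weighted sum
print(z, a)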

Coming to the mathematical approach, for the i-th training example in a 2-layer network:

For the first hidden layer:
Z[1] = W[1]X + b[1]

This is the pre-activation of the first hidden layer for the i-th training example, where W[1] is the weight matrix, X is the input and b[1] is the bias.

Now applying the activation function to the weighted sum for that layer:
A[1] = g[1](Z[1])
Now for the second layer (the output layer), we follow the same logic:
Z[2] = W[2]A[1] + b[2]
A[2] = g[2](Z[2])

Since this is the final layer of the network and we are performing binary classification, we need the prediction to lie between 0 and 1, so g[2] is the sigmoid activation function. This gives us:
ŷ = A[2]

And that's it! Here we used a 2-layer neural network to help us deconstruct the math involved in forward propagation.
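
To tie the equations together, here is a minimal NumPy sketch of this 2-layer forward pass. The layer sizes, the random initialisation and the choice of tanh for the hidden layer are assumptions made purely for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)

# Assumed sizes: 3 input features, 4 hidden units, 1 output unit
n_x, n_h, n_y = 3, 4, 1

# Small random weights and zero biases (an assumed initialisation)
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))

x = rng.standard_normal((n_x, 1))   # one training example as a column vector

# First hidden layer: Z[1] = W[1]X + b[1], A[1] = g[1](Z[1])
Z1 = W1 @ x + b1
A1 = np.tanh(Z1)                    # g[1] assumed to be tanh here

# Output layer: Z[2] = W[2]A[1] + b[2], A[2] = sigmoid(Z[2])
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)                    # y_hat: a probability between 0 and 1

print("prediction:", A2.ravel())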

Backpropagation

The most complex part of deep learning for me was backpropagation. Forward propagation made sense because all it involved was a bunch of matrix multiplications, additions of bias terms, and some activation functions to bring in non-linearity. But how do we compare the prediction ŷ(i) with the actual label y(i)? We need a way to measure that error and use it to find the best values for our weights W and biases b.

In backpropagation, our main objective is to minimize the cost function by adjusting the network's weights and biases. First, we calculate the gradients of the loss function L with respect to the parameters we want to change in the neural network. The loss function (binary cross-entropy) is defined as:
L(ŷ,y) = -y log(ŷ) - (1-y) log(1-ŷ)
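
For reference, this loss is only a couple of lines of code; in the sketch below, clipping ŷ away from 0 and 1 is an extra safeguard I have added to avoid log(0), not something from the derivation:

import numpy as np

def binary_cross_entropy(y_hat, y, eps=1e-12):
    # L(y_hat, y) = -y*log(y_hat) - (1-y)*log(1-y_hat)
    y_hat = np.clip(y_hat, eps, 1 - eps)   # keep y_hat strictly inside (0, 1)
    return -y * np.log(y_hat) - (1 - y) * np.log(1 - y_hat)

print(binary_cross_entropy(0.9, 1))   # small loss: confident and correct
print(binary_cross_entropy(0.9, 0))   # large loss: confident and wrong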

As we have seen in forward propagation, ŷ = A[2], hence we first start by calculating the derivative of L with respect to a.

d(L(a,y))/da = -(y/a) + (1-y)/(1-a)
where a symbolizes A[2]

Now we will do the same for the derivative of a with respect to z.
Here,
a = g(z)
where g is the sigmoid function, so
a = 1/(1 + e^(-z))

Now finding the derivative with respect to z,
da/dz = e^(-z)/(1 + e^(-z))^2

So substituting a = 1/(1 + e^(-z)), we get:
da/dz = a(1-a)
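
A quick way to convince yourself of this identity is a finite-difference check; the test point z = 0.7 below is an arbitrary choice:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

z = 0.7                     # arbitrary test point
a = sigmoid(z)
h = 1e-6                    # small step for the numerical derivative

analytic = a * (1 - a)                                   # da/dz = a(1 - a)
numeric = (sigmoid(z + h) - sigmoid(z - h)) / (2 * h)    # central difference
print(analytic, numeric)    # the two values should agree to several decimals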

Now, we need to calculate dz, which by the chain rule is:
dL/dz = (dL/da)*(da/dz)
      = [-(y/a) + (1-y)/(1-a)] * a(1-a)
      = a - y

Hence, from the above equation, we get (writing dz as shorthand for dL/dz):
dz = a - y
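
The same kind of numerical check works for the whole chain; the values of z and y below are arbitrary:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def loss_from_z(z, y):
    # Express the loss directly as a function of z, with a = sigmoid(z)
    a = sigmoid(z)
    return -y * np.log(a) - (1 - y) * np.log(1 - a)

z, y = 0.3, 1.0             # arbitrary pre-activation and label
a = sigmoid(z)
h = 1e-6

analytic = a - y                                                     # dL/dz = a - y
numeric = (loss_from_z(z + h, y) - loss_from_z(z - h, y)) / (2 * h)  # central difference
print(analytic, numeric)    # the two values should agree closely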

Now finally, we calculate dW[1], db[1], dW[2] and db[2]. With some further matrix calculus (using dz[1] = W[2]T dz[2] * g[1]'(z[1]) for the hidden layer), we find that:

dW[2] = dz[2] a[1]T
db[2] = dz[2]
dW[1] = dz[1] xT
db[1] = dz[1]
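
Put into code for a single training example, the backward pass might look like the sketch below. It is self-contained, so it repeats a small forward pass first; the layer sizes, the random values and the choice of tanh for the hidden layer (so that g[1]'(z) = 1 - a^2) are all assumptions for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_x, n_h, n_y = 3, 4, 1                         # assumed layer sizes

# Assumed parameters and one training example
W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))
x = rng.standard_normal((n_x, 1))
y = np.array([[1.0]])

# Forward pass (g[1] assumed to be tanh)
Z1 = W1 @ x + b1
A1 = np.tanh(Z1)
Z2 = W2 @ A1 + b2
A2 = sigmoid(Z2)

# Backward pass for this one example
dZ2 = A2 - y                          # dz[2] = a[2] - y
dW2 = dZ2 @ A1.T                      # dW[2] = dz[2] a[1]^T
db2 = dZ2                             # db[2] = dz[2]
dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)    # dz[1] = W[2]^T dz[2] * g[1]'(z[1])
dW1 = dZ1 @ x.T                       # dW[1] = dz[1] x^T
db1 = dZ1                             # db[1] = dz[1]

print(dW1.shape, db1.shape, dW2.shape, db2.shape)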

Now we will update the weights and biases:

W = W - αdW
b = b - αdb
where α is the learning rate.
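
In code, the update itself is a one-liner per parameter; the toy parameter values, gradients and learning rate below are arbitrary:

import numpy as np

def gradient_descent_step(params, grads, alpha):
    # One update: each parameter moves against its gradient by alpha * gradient
    return {name: params[name] - alpha * grads[name] for name in params}

# Toy parameters and gradients, purely for illustration
params = {"W": np.array([[0.5, -0.3]]), "b": np.array([[0.1]])}
grads  = {"W": np.array([[0.2,  0.1]]), "b": np.array([[-0.05]])}

params = gradient_descent_step(params, grads, alpha=0.01)   # alpha is the learning rate
print(params)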

The updated values of W and b will hence result in a better prediction on the next forward pass.

Keep in mind that this entire process is for a single training example only. In practice, we run gradient descent over 'm' training examples, averaging the gradients across the batch.
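
For completeness, here is a rough sketch of the vectorised version over a batch of m examples, where each column of X is one example and the gradients carry a 1/m factor from averaging the cost; the sizes, data, learning rate and number of iterations are all arbitrary choices for illustration:

import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
n_x, n_h, n_y, m = 3, 4, 1, 5          # assumed layer sizes and m = 5 examples
alpha = 0.1                             # learning rate (arbitrary)

W1 = rng.standard_normal((n_h, n_x)) * 0.01
b1 = np.zeros((n_h, 1))
W2 = rng.standard_normal((n_y, n_h)) * 0.01
b2 = np.zeros((n_y, 1))

X = rng.standard_normal((n_x, m))       # each column is one training example
Y = rng.integers(0, 2, size=(n_y, m)).astype(float)

for step in range(100):
    # Forward pass over all m examples at once
    Z1 = W1 @ X + b1
    A1 = np.tanh(Z1)
    Z2 = W2 @ A1 + b2
    A2 = sigmoid(Z2)

    # Backward pass: each gradient is averaged over the batch (the 1/m factor)
    dZ2 = A2 - Y
    dW2 = (dZ2 @ A1.T) / m
    db2 = np.sum(dZ2, axis=1, keepdims=True) / m
    dZ1 = (W2.T @ dZ2) * (1 - A1 ** 2)
    dW1 = (dZ1 @ X.T) / m
    db1 = np.sum(dZ1, axis=1, keepdims=True) / m

    # Gradient descent update
    W1 -= alpha * dW1
    b1 -= alpha * db1
    W2 -= alpha * dW2
    b2 -= alpha * db2

A2 = np.clip(A2, 1e-12, 1 - 1e-12)
cost = -np.mean(Y * np.log(A2) + (1 - Y) * np.log(1 - A2))
print("cost after training:", cost)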

Hence, the goal of backpropagation is straightforward: adjust each weight in the network in proportion to how much it contributes to the overall error. This property makes backpropagation the core algorithm for training deep learning models to make better predictions.
