Forward & Backward Propagation

Kinder Chen
2 min read · Oct 8, 2021

Neural Networks have two major processes: Forward Propagation and Back Propagation. During Forward Propagation, we start at the input layer and feed our data in, propagating it through the network until we reach the output layer and generate a prediction. Back Propagation is essentially the opposite: we start at the end of the network, with the prediction produced by Forward Propagation, and work backwards, allowing the model to learn by comparing that prediction with the actual label for the training data.
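
To make the forward pass concrete, here is a minimal NumPy sketch. The layer sizes (3 inputs, 4 hidden units, 1 output), the sigmoid activation, and the random weights are all assumptions chosen just for illustration, not part of any particular library's API:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

# Hypothetical tiny network: 3 inputs -> 4 hidden units -> 1 output
rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)   # input -> hidden weights
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)   # hidden -> output weights

def forward(X):
    # Propagate the data from the input layer through to the output layer
    hidden = sigmoid(X @ W1 + b1)           # first (and only) hidden layer
    prediction = sigmoid(hidden @ W2 + b2)  # output layer: the prediction
    return prediction

X = rng.normal(size=(5, 3))   # a mini-batch of 5 samples
print(forward(X).shape)       # (5, 1): one prediction per sample
```

Each layer is just a matrix multiplication plus a bias, followed by an activation; the output of one layer becomes the input of the next until a prediction comes out the other end.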

Backpropagation Algorithm

Back Propagation handles one mini-batch at a time, and it goes through the full training set multiple times. Each pass is called an epoch. Each mini-batch is passed to the network’s input layer, which sends it to the first hidden layer. The algorithm then computes the output of all the neurons in this layer. The result is passed on to the next layer, its output is computed and passed to the next layer, and so on until we get the output of the last layer, the output layer. This is the forward pass, and all intermediate results are preserved since they are needed for the backward pass.

Next, the algorithm measures the network’s output error, i.e., it uses a loss function that compares the desired output and the actual output of the network and returns some measure of the error. Then it computes how much each output connection contributed to the error. This is done analytically by applying the chain rule, which makes this step fast and precise.

The algorithm then measures how much of these error contributions came from each connection in the layer below, again using the chain rule, working backward until it reaches the input layer. This reverse pass efficiently measures the error gradient across all the connection weights in the network by propagating the error gradient backward through the network. Finally, the algorithm performs a Gradient Descent step to tweak all the connection weights in the network, using the error gradients it just computed.
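
Here is a minimal NumPy sketch of one full backpropagation step on a mini-batch, reusing the same hypothetical 3–4–1 network as above. The sigmoid activations, the mean squared error loss, and the learning rate are assumptions for illustration only; the point is the sequence: forward pass with saved intermediates, error measurement, chain rule backward through the layers, then a Gradient Descent update:

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

rng = np.random.default_rng(0)
W1, b1 = rng.normal(size=(3, 4)), np.zeros(4)
W2, b2 = rng.normal(size=(4, 1)), np.zeros(1)
lr = 0.1  # learning rate for the Gradient Descent step

X = rng.normal(size=(8, 3))           # one mini-batch of 8 samples
y = rng.integers(0, 2, size=(8, 1))   # their labels

# Forward pass: keep the intermediate results for the backward pass
z1 = X @ W1 + b1
a1 = sigmoid(z1)
z2 = a1 @ W2 + b2
y_hat = sigmoid(z2)

# Measure the output error (mean squared error here)
loss = np.mean((y_hat - y) ** 2)

# Backward pass: apply the chain rule layer by layer, output -> input
d_yhat = 2 * (y_hat - y) / len(X)     # dLoss / dy_hat
d_z2 = d_yhat * y_hat * (1 - y_hat)   # back through the output sigmoid
d_W2 = a1.T @ d_z2                    # gradient for the output-layer weights
d_b2 = d_z2.sum(axis=0)
d_a1 = d_z2 @ W2.T                    # error contribution of the hidden layer
d_z1 = d_a1 * a1 * (1 - a1)           # back through the hidden sigmoid
d_W1 = X.T @ d_z1                     # gradient for the hidden-layer weights
d_b1 = d_z1.sum(axis=0)

# Gradient Descent step: tweak every weight using its error gradient
W2 -= lr * d_W2; b2 -= lr * d_b2
W1 -= lr * d_W1; b1 -= lr * d_b1
```

Running this repeatedly over many mini-batches (and many epochs) is, in essence, what training a neural network amounts to.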
