I continue with my article series on how to program a training algorithm for a multi-layer perceptron [MLP]. In the course of my last articles
A simple program for an ANN to cover the Mnist dataset – V – coding the loss function
A simple program for an ANN to cover the Mnist dataset – IV – the concept of a cost or loss function
A simple program for an ANN to cover the Mnist dataset – III – forward propagation
A simple program for an ANN to cover the Mnist dataset – II
A simple program for an ANN to cover the Mnist dataset – I
we have already created code for the "Feed Forward Propagation" algorithm [FFPA] and two different cost functions - "Log Loss" and "MSE". In both cases we took care of a vectorized handling of multiple data records in mini-batches of training data.
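As a short reminder of what such vectorized cost functions can look like, here is a minimal NumPy sketch. It is not the code from the earlier articles: the function names, the layout (one column per data record of the mini-batch), the clipping constant `eps` and the factor 0.5 in the MSE are assumptions made for illustration only.

```python
import numpy as np

def log_loss(a_out, y, eps=1e-12):
    """Average "Log Loss" over a mini-batch.

    a_out, y: arrays of shape (n_output_nodes, n_records).
    eps is an assumed safeguard that keeps log() away from 0.
    """
    a = np.clip(a_out, eps, 1.0 - eps)
    n_records = y.shape[1]
    return -np.sum(y * np.log(a) + (1.0 - y) * np.log(1.0 - a)) / n_records

def mse(a_out, y):
    """Average squared error over a mini-batch (with an assumed factor 0.5)."""
    n_records = y.shape[1]
    return 0.5 * np.sum((a_out - y) ** 2) / n_records
```

Both functions sum over all output nodes and all records in a single array operation, so no explicit loop over the records of the mini-batch is required.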
Before we turn to the coding of the so-called "error back-propagation" [EBP], I found it useful to clarify the math behind this method for ANN/MLP-training. Understanding the basic principles of the gradient descent method for the optimization of MLP-weights is easy. But comprehending
- why and how the gradient descent method leads to the back propagation of error terms
- and how we cover multiple training data records at the same time
is not - at least not in my opinion. So, I have discussed the required analysis and the resulting algorithmic steps in detail in a PDF which you will find attached to this article. As an example I used a four-layer MLP, for which I derived the partial derivatives of the "Log Loss" cost function with respect to the weights of the hidden layers in detail. Afterwards, I generalized the formalism. I hope the contents of the PDF will help beginners in the field of ML to understand what kind of matrix operations gradient descent leads to.
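To give a rough impression of the kind of matrix operations meant here, the following NumPy sketch computes the gradient of the "Log Loss" cost for all weight matrices of a small MLP on a whole mini-batch at once. It is only an illustration under several assumptions that are not taken from the PDF or the article series: sigmoid activation in every layer, no bias neurons, one column per data record, and hypothetical function names `forward` and `gradients`.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def forward(weights, x):
    """Forward propagation; returns the activations of all layers.

    weights: list of matrices, weights[l] maps layer l to layer l+1.
    x: mini-batch of shape (n_input_nodes, n_records).
    """
    activations = [x]
    for w in weights:
        activations.append(sigmoid(w @ activations[-1]))
    return activations

def gradients(weights, x, y):
    """Partial derivatives of the averaged Log Loss for all weight matrices."""
    n_records = x.shape[1]
    acts = forward(weights, x)
    # Output layer: for sigmoid plus Log Loss the error term reduces to (a - y).
    delta = acts[-1] - y
    grads = [None] * len(weights)
    for l in range(len(weights) - 1, -1, -1):
        # The matrix product sums the contributions of all records in the batch.
        grads[l] = delta @ acts[l].T / n_records
        if l > 0:
            # Propagate the error terms back through weights[l] and
            # multiply element-wise with the derivative of the sigmoid.
            delta = (weights[l].T @ delta) * acts[l] * (1.0 - acts[l])
    return grads
```

A single gradient descent step with an assumed learning rate `eta` would then be `weights = [w - eta * g for w, g in zip(weights, gradients(weights, x, y))]`. For a four-layer MLP the list `weights` contains three matrices.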
In the next article we shall code the surprisingly compact algorithm for EBP. In the meantime I wish all readers a Merry Christmas ...
Addendum 01.01.2020: Corrected a missing "-" for the cost function in the above PDF.