The Normal Equation in Linear Regression
A linear model makes a prediction by simply computing a weighted sum of the input features, plus a constant called the bias or intercept term.
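To make this concrete, here is a minimal NumPy sketch of such a prediction; the weights, bias, and feature values are made-up numbers for illustration:

```python
import numpy as np

# Hypothetical weights, bias, and feature values, chosen for illustration.
theta = np.array([4.0, 3.0, -2.0])   # one weight per feature
bias = 0.5                           # intercept term
x = np.array([1.0, 2.0, 0.3])        # feature values of one instance

# Prediction: the bias plus the weighted sum of the features.
y_pred = bias + theta @ x
print(y_pred)  # 0.5 + 4.0*1.0 + 3.0*2.0 - 2.0*0.3 = 9.9
```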
To train a Linear Regression model, we need to find the value of θ that minimizes the mean squared error (MSE); minimizing the MSE yields the same parameters as minimizing the RMSE, and it is simpler to work with. The MSE cost function for a Linear Regression model is

MSE(X, θ) = (1/m) Σ (θ⊺x⁽ⁱ⁾ − y⁽ⁱ⁾)²

where the sum runs over the m training instances, x⁽ⁱ⁾ is the feature vector of the i-th instance (with x₀ = 1), and y⁽ⁱ⁾ is its target value.
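As a sketch, the MSE of a candidate θ can be computed directly with NumPy; the synthetic data and the candidate parameter vector below are assumptions chosen for illustration:

```python
import numpy as np

# Synthetic data (an assumption for illustration): y = 4 + 3x plus noise.
rng = np.random.default_rng(42)
m = 100                                       # number of training instances
X = 2 * rng.random((m, 1))                    # a single feature
y = 4 + 3 * X[:, 0] + rng.standard_normal(m)  # noisy targets

X_b = np.c_[np.ones(m), X]     # prepend x0 = 1 so the bias is theta_0
theta = np.array([4.0, 3.0])   # a candidate parameter vector

# MSE(X, theta) = (1/m) * sum of squared prediction errors
mse = np.mean((X_b @ theta - y) ** 2)
print(mse)
```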
The value of θ that minimizes the cost function has a closed-form solution, given by the Normal Equation:

θ̂ = (X⊺X)⁻¹ X⊺y

where θ̂ is the value of θ that minimizes the cost function, X is the training set matrix (including the bias column of 1s), and y is the vector of target values.
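Here is a minimal NumPy sketch of the Normal Equation on synthetic data; the data-generating line y = 4 + 3x plus Gaussian noise is an assumption chosen so the recovered parameters are easy to check:

```python
import numpy as np

# Synthetic data (an assumption): y = 4 + 3x plus Gaussian noise.
rng = np.random.default_rng(0)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.standard_normal(m)

X_b = np.c_[np.ones(m), X]  # prepend the bias column x0 = 1

# Normal Equation: theta_hat = (X^T X)^-1 X^T y
theta_best = np.linalg.inv(X_b.T @ X_b) @ X_b.T @ y
print(theta_best)  # close to [4, 3], up to the noise
```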
The solution can also be written as

θ̂ = X⁺y

where X⁺ is the pseudoinverse of X (specifically, the Moore-Penrose inverse). The pseudoinverse itself is computed using a standard matrix factorization technique called Singular Value Decomposition (SVD), which decomposes the training set matrix X into the product of three matrices, U Σ V⊺. The pseudoinverse is then computed as

X⁺ = V Σ⁺ U⊺

where Σ⁺ is obtained from Σ by setting to zero all values smaller than a tiny threshold, replacing the remaining nonzero values with their inverses, and transposing the result.
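The following sketch computes θ̂ both with NumPy's np.linalg.pinv and by assembling the pseudoinverse from the SVD by hand; the synthetic data is the same kind as above, and the 1e-10 threshold for treating a singular value as zero is an illustrative choice:

```python
import numpy as np

# Same kind of synthetic data as above (an assumption for illustration).
rng = np.random.default_rng(0)
m = 100
X = 2 * rng.random((m, 1))
y = 4 + 3 * X[:, 0] + rng.standard_normal(m)
X_b = np.c_[np.ones(m), X]

# Pseudoinverse route: theta_hat = X^+ y
theta_pinv = np.linalg.pinv(X_b) @ y

# The same result assembled from the SVD by hand: X = U Sigma V^T.
U, s, Vt = np.linalg.svd(X_b, full_matrices=False)
s_inv = np.zeros_like(s)
s_inv[s > 1e-10] = 1 / s[s > 1e-10]  # invert only the nonzero singular values
theta_svd = Vt.T @ np.diag(s_inv) @ U.T @ y

print(np.allclose(theta_pinv, theta_svd))  # True
```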
This approach is more efficient than computing the Normal Equation, and it handles edge cases nicely: the Normal Equation may not work if the matrix X⊺X is not invertible (i.e., singular), for example when some features are redundant, whereas the pseudoinverse is always defined.
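To illustrate that edge case, the sketch below duplicates a feature column (an illustrative assumption) so that X⊺X is singular; np.linalg.inv then typically raises a LinAlgError, while np.linalg.pinv still returns a solution:

```python
import numpy as np

# Duplicating a feature column makes X^T X singular.
rng = np.random.default_rng(0)
m = 100
x1 = rng.random(m)
X_b = np.c_[np.ones(m), x1, x1]  # the third column repeats the second
y = 4 + 3 * x1 + rng.standard_normal(m)

try:
    np.linalg.inv(X_b.T @ X_b)   # the Normal Equation route breaks down here
except np.linalg.LinAlgError as err:
    print("inversion failed:", err)

theta = np.linalg.pinv(X_b) @ y  # the pseudoinverse is still well defined
print(theta)
```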
The Normal Equation computes the inverse of X⊺X, an (n + 1) × (n + 1) matrix, where n is the number of features. The computational complexity of inverting such a matrix is typically about O(n^2.4) to O(n^3), depending on the implementation, while the SVD approach is about O(n^2). Both approaches get very slow when the number of features grows large. On the positive side, both are linear with regard to the number of training instances, so they handle large training sets efficiently, provided the data fits in memory. Once the model is trained, predictions are very fast: their complexity is linear with regard to both the number of instances you want to make predictions on and the number of features.