Performance Measure

Kinder Chen
2 min read · Sep 8, 2021

A typical performance measure for regression problems is the Root Mean Square Error (RMSE). It gives an idea of how much error the system typically makes in its predictions, with a higher weight for large errors.

RMSE(X, h) = √( (1/m) Σᵢ₌₁…m ( h(x(i)) − y(i) )² )

Here m is the number of instances in the dataset you are measuring the RMSE on. x(i) is a vector of all the feature values (excluding the label) of the ith instance, and y(i) is its label. X is a matrix containing all the feature values (excluding labels) of all instances in the dataset; there is one row per instance, and the ith row is the transpose of x(i), noted (x(i))⊺. h is your system’s prediction function, also called a hypothesis: when the system is given an instance’s feature vector x(i), it outputs a predicted value ŷ(i) = h(x(i)) for that instance. RMSE(X, h) is then the cost function measured on the set of examples using hypothesis h.
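The definition above can be sketched in a few lines of NumPy. The toy data and the linear hypothesis `h` below are illustrative placeholders, not part of the original discussion:

```python
import numpy as np

def rmse(X, y, h):
    """RMSE(X, h): root of the mean squared prediction error over m instances."""
    m = len(X)                    # number of instances
    y_pred = h(X)                 # ŷ(i) = h(x(i)) for each row of X
    return np.sqrt(np.sum((y_pred - y) ** 2) / m)

# Toy dataset: 3 instances, 2 features each, with labels y
X = np.array([[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]])
y = np.array([3.0, 7.0, 11.0])

# A hypothetical prediction function: sum of the two features
h = lambda X: X @ np.array([1.0, 1.0])

print(rmse(X, y, h))  # → 0.0, since h predicts every label exactly
```

Any model exposing a prediction function of this shape can be plugged in as `h`.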

When the data contains many outliers, you may consider using the mean absolute error (MAE) instead:

MAE(X, h) = (1/m) Σᵢ₌₁…m | h(x(i)) − y(i) |
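A minimal sketch of the MAE on the error between predictions and targets (the arrays here are made-up illustration data):

```python
import numpy as np

def mae(y_true, y_pred):
    """MAE: mean of the absolute prediction errors."""
    return np.mean(np.abs(y_pred - y_true))

y_true = np.array([2.0, 4.0, 6.0])
y_pred = np.array([3.0, 4.0, 100.0])  # one prediction is a large outlier error

print(mae(y_true, y_pred))  # → (1 + 0 + 94) / 3 ≈ 31.67
```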

Both the RMSE and the MAE are ways to measure the distance between two vectors: the vector of predictions and the vector of target values. Computing the RMSE corresponds to the Euclidean norm (the ℓ2 norm). Computing the MAE corresponds to the Manhattan norm (the ℓ1 norm), so called because it measures the distance between two points in a city where you can only travel along orthogonal city blocks. The higher the norm index, the more the measure focuses on large values and neglects small ones, which is why the RMSE is more sensitive to outliers than the MAE. But when outliers are exponentially rare, as in a bell-shaped distribution, the RMSE performs very well and is generally preferred.
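The correspondence to vector norms can be checked directly with `np.linalg.norm`: the RMSE is the ℓ2 norm of the error vector divided by √m, and the MAE is the ℓ1 norm divided by m. The error vector below is made up to include one outlier:

```python
import numpy as np

errors = np.array([1.0, -2.0, 0.5, 30.0])  # prediction errors; one outlier
m = len(errors)

rmse = np.linalg.norm(errors, ord=2) / np.sqrt(m)  # Euclidean (l2) norm
mae = np.linalg.norm(errors, ord=1) / m            # Manhattan (l1) norm

# The single outlier dominates the RMSE far more than the MAE:
print(rmse)  # ≈ 15.04
print(mae)   # = 8.375
```

Dropping the outlier brings the two measures much closer together, which is the sensitivity the paragraph above describes.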
