AIC and BIC

Kinder Chen
2 min read · Oct 6, 2021


This blog introduces two measures, AIC (Akaike information criterion) and BIC (Bayesian information criterion), which give a comprehensive measure of model performance while taking the number of model parameters into account. We can try to find the model that minimizes a theoretical information criterion, i.e., the AIC or the BIC, which are defined as

AIC = 2p − 2 log(L)

BIC = log(m) · p − 2 log(L)

where m is the number of instances, p is the number of parameters learned by the model, and L is the maximized value of the model's likelihood function.
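As a minimal sketch of the two formulas (the helper names aic and bic here are purely illustrative), both criteria can be computed directly from the maximized log-likelihood:

```python
import numpy as np

def aic(log_likelihood, p):
    """AIC = 2p - 2 log(L); lower is better."""
    return 2 * p - 2 * log_likelihood

def bic(log_likelihood, p, m):
    """BIC = log(m) * p - 2 log(L); lower is better."""
    return np.log(m) * p - 2 * log_likelihood
```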

The AIC is generally used to compare candidate models. For every model fitted by Maximum Likelihood Estimation, the log-likelihood is computed as part of the fitting process, so the AIC is very easy to calculate. The AIC acts as a penalized log-likelihood criterion, balancing goodness of fit (a high log-likelihood) against complexity (complex models are penalized more heavily than simpler ones). The AIC is unbounded and can take any real value, but the bottom line is that when comparing models, the one with the lowest AIC should be selected.

The BIC emerged as a Bayesian response to the AIC, but it can be used for exactly the same purposes. The idea is to select the candidate model with the highest probability given the data. This idea can be formalized inside a Bayesian framework, involving prior probabilities on the candidate models along with prior densities on all parameters in the models. In both cases, the lower the AIC/BIC, the better the model is performing.
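As an illustration of this workflow, a common pattern is to fit several candidate models and keep the one with the lowest criterion. The sketch below (the toy dataset and hyperparameters are assumptions for demonstration) uses scikit-learn's GaussianMixture, whose aic() and bic() methods compute both criteria, to pick the number of mixture components:

```python
import numpy as np
from sklearn.datasets import make_blobs
from sklearn.mixture import GaussianMixture

# Toy dataset with 3 underlying clusters
X, _ = make_blobs(n_samples=500, centers=3, random_state=42)

# Fit one candidate model per number of components
candidates = list(range(1, 7))
models = [GaussianMixture(n_components=k, random_state=42).fit(X)
          for k in candidates]

# Score each candidate; lower AIC/BIC is better
bics = [model.bic(X) for model in models]
aics = [model.aic(X) for model in models]

best_k = candidates[int(np.argmin(bics))]
print("BIC per k:", dict(zip(candidates, np.round(bics, 1))))
print("Selected number of components:", best_k)
```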

Both the BIC and the AIC penalize models that have more parameters to learn and reward models that fit the data well. They often end up selecting the same model. When they differ, the model selected by the BIC tends to be simpler, with fewer parameters, than the one selected by the AIC, but it tends not to fit the data quite as well. This difference is especially pronounced on larger datasets, because the BIC's penalty grows with the number of instances while the AIC's does not.
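A quick sketch of why the gap widens with dataset size: each extra parameter costs a flat 2 under the AIC but log(m) under the BIC, so the BIC's penalty dominates once m exceeds e² ≈ 7.4 instances and keeps growing from there.

```python
import numpy as np

# Penalty per extra parameter: AIC charges a constant 2,
# while BIC charges log(m), which grows with dataset size.
for m in (100, 1_000, 100_000):
    print(f"m={m:>7}: AIC penalty per parameter = 2, "
          f"BIC penalty per parameter = {np.log(m):.2f}")
```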
