# Gaussian Mixture Model

A Gaussian mixture model (GMM) is a probabilistic model that assumes that the instances were generated from a mixture of several Gaussian distributions whose parameters are unknown. All the instances generated from a single Gaussian distribution form a cluster that typically looks like an ellipsoid. Each cluster can have a different ellipsoidal shape, size, density, and orientation.

GMM relies on the Expectation-Maximization (EM) algorithm. EM initializes the cluster parameters randomly, then it repeats two steps until convergence, first is expectation step, assigning instances to clusters and then maximization step, updating the clusters. In the context of clustering, we can think of EM as a generalization of K-Means that not only finds the cluster centers, but also their size, shape, and orientation, as well as their relative weights. EM uses soft cluster assignments, not hard assignments. For each instance, during the expectation step, the algorithm estimates the probability that it belongs to each cluster based on the current cluster parameters. Then, during the maximization step, each cluster is updated using all the instances in the dataset, with each instance weighted by the estimated probability that it belongs to that cluster. These probabilities are called the responsibilities of the clusters for the instances. During the maximization step, each cluster’s update will mostly be impacted by the instances it is most responsible for. EM can end up converging to poor solutions, so it needs to be run several times, keeping only the best solution.

A GMM is a generative model, meaning we can sample new instances from it and they are ordered by cluster index. It is also possible to estimate the density of the model at any given location. For each instance it is given, GMM can estimate the log of the probability density function (PDF) at that location. The computational complexity of training a GMM depends on the number of instances *m*, the number of dimensions *n*, the number of clusters *k*, and the constraints on the covariance matrices.