Mini-batch Gradient Descent

Kinder Chen
Sep 9, 2021


Mini-batch gradient descent computes the gradients on small random sets of instances called mini-batches. It seeks a balance between the robustness of stochastic gradient descent and the efficiency of batch gradient descent, and it is the most common implementation of gradient descent used in deep learning.
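
As a minimal sketch of the idea, the NumPy code below runs mini-batch gradient descent on a simple linear-regression problem with a mean-squared-error loss; the function name, learning rate, batch size, and synthetic data are illustrative choices, not part of the original post.

```python
import numpy as np

def mini_batch_gd(X, y, batch_size=32, learning_rate=0.01, n_epochs=50, seed=42):
    """Mini-batch gradient descent for linear regression with an MSE loss."""
    rng = np.random.default_rng(seed)
    m, n = X.shape
    Xb = np.c_[np.ones((m, 1)), X]        # add a bias column
    theta = rng.standard_normal(n + 1)    # random initialization

    for epoch in range(n_epochs):
        indices = rng.permutation(m)      # shuffle once per epoch
        for start in range(0, m, batch_size):
            batch = indices[start:start + batch_size]
            X_batch, y_batch = Xb[batch], y[batch]
            # MSE gradient over the mini-batch, as one matrix operation
            error = X_batch @ theta - y_batch
            gradients = 2 / len(batch) * X_batch.T @ error
            theta -= learning_rate * gradients
    return theta

# Usage on synthetic data: y = 4 + 3x + noise
rng = np.random.default_rng(0)
X = 2 * rng.random((1000, 1))
y = 4 + 3 * X[:, 0] + rng.normal(scale=0.5, size=1000)
print(mini_batch_gd(X, y))  # should approach [4, 3]
```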

Mini-batch sizes, or batch sizes, are often tuned to the computational architecture on which the implementation is executed. The main advantage of mini-batch GD over stochastic GD is that we can get a performance boost from hardware optimization of matrix operations, especially when using GPUs.
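
To illustrate why the matrix formulation matters, the sketch below computes the same mini-batch gradient two ways: once with a Python loop over individual instances and once as a single vectorized matrix operation. The shapes and the timing harness are assumptions for demonstration; even on a CPU the gap is visible, and on a GPU it is typically much larger.

```python
import time
import numpy as np

rng = np.random.default_rng(0)
X_batch = rng.random((256, 100))   # one mini-batch: 256 instances, 100 features
y_batch = rng.random(256)
theta = rng.standard_normal(100)

# Per-instance loop: accumulate the MSE gradient one example at a time
start = time.perf_counter()
grad_loop = np.zeros_like(theta)
for x_i, y_i in zip(X_batch, y_batch):
    grad_loop += 2 * (x_i @ theta - y_i) * x_i
grad_loop /= len(X_batch)
loop_time = time.perf_counter() - start

# Same gradient as a single matrix operation over the whole mini-batch
start = time.perf_counter()
grad_vec = 2 / len(X_batch) * X_batch.T @ (X_batch @ theta - y_batch)
vec_time = time.perf_counter() - start

print(np.allclose(grad_loop, grad_vec))            # identical result
print(f"loop: {loop_time:.6f}s  vectorized: {vec_time:.6f}s")
```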

The algorithm’s progress in parameter space is less erratic than with stochastic GD, especially with fairly large mini-batches. As a result, mini-batch GD ends up walking around a bit closer to the global minimum, but it may be harder for it to escape from local minima. On the other hand, mini-batch GD requires an additional hyperparameter, the mini-batch size, to be configured for the learning algorithm, and, like batch gradient descent, error information must be accumulated across the training examples within each mini-batch.
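
In practice the batch size usually surfaces as an explicit argument of a framework’s training API. The sketch below, assuming a TensorFlow/Keras setup with a toy model and random data, shows it passed to `fit` alongside the other hyperparameters.

```python
import numpy as np
import tensorflow as tf

rng = np.random.default_rng(0)
X = rng.random((1000, 20)).astype("float32")
y = rng.random((1000, 1)).astype("float32")

# Toy regression model; architecture and sizes are illustrative only
model = tf.keras.Sequential([
    tf.keras.layers.Dense(16, activation="relu"),
    tf.keras.layers.Dense(1),
])
model.compile(optimizer=tf.keras.optimizers.SGD(learning_rate=0.01), loss="mse")

# batch_size controls how many instances each gradient step averages over
model.fit(X, y, epochs=5, batch_size=32, verbose=0)
```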
