Batch and Online Learning
One criterion used to classify Machine Learning systems is whether or not the system can learn incrementally from a stream of incoming data. This criterion separates batch learning from online learning.
Batch Learning
In batch learning, the system is incapable of learning incrementally: it must be trained using all the available data. This will generally take a lot of time and computing resources, so it is typically done offline. First the system is trained, and then it is launched into production and runs without learning anymore; it just applies what it has learned. This is called offline learning.
If you want a batch learning system to know about new data, you need to train a new version of the system from scratch on the full dataset, then stop the old system and replace it with the new one.
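As a rough illustration of this retrain-and-replace cycle, the sketch below fits a one-feature linear model by ordinary least squares. The function name train_from_scratch, the variable production_model, and the toy data are all assumptions made for the example, not part of any particular library.

```python
def train_from_scratch(xs, ys):
    """Fit y = slope * x + intercept by ordinary least squares on the full dataset."""
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

# Original training set, and the model currently serving predictions.
xs, ys = [0.0, 1.0, 2.0, 3.0], [1.0, 3.0, 5.0, 7.0]
production_model = train_from_scratch(xs, ys)

# New data arrives. A batch learner cannot update in place, so it is
# retrained on old + new data, and the new model replaces the old one.
new_xs, new_ys = [4.0, 5.0], [9.0, 11.0]
candidate = train_from_scratch(xs + new_xs, ys + new_ys)
production_model = candidate
```

Note that the cost of each retraining grows with the size of the full dataset, not with the amount of new data.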
Online Learning
In online learning, you train the system incrementally by feeding it data instances sequentially, either individually or in small groups called mini-batches. Each learning step is fast and cheap, so the system can learn about new data on the fly.
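A minimal sketch of incremental training, assuming a one-feature linear model updated by mini-batch gradient descent. The class OnlineLinearRegressor, its partial_fit method, and the synthetic stream (y = 2x + 1) are hypothetical names and data chosen for illustration.

```python
import random

class OnlineLinearRegressor:
    """Hypothetical one-feature linear model trained one mini-batch at a time."""

    def __init__(self, learning_rate=0.1):
        self.w = 0.0
        self.b = 0.0
        self.learning_rate = learning_rate

    def predict(self, x):
        return self.w * x + self.b

    def partial_fit(self, batch):
        """Run a single gradient step on a mini-batch of (x, y) pairs."""
        n = len(batch)
        grad_w = sum((self.predict(x) - y) * x for x, y in batch) / n
        grad_b = sum((self.predict(x) - y) for x, y in batch) / n
        self.w -= self.learning_rate * grad_w
        self.b -= self.learning_rate * grad_b

def stream_of_batches(n_batches, batch_size, rng):
    """Simulated data stream: each mini-batch is drawn from y = 2x + 1."""
    for _ in range(n_batches):
        xs = [rng.uniform(-1.0, 1.0) for _ in range(batch_size)]
        yield [(x, 2.0 * x + 1.0) for x in xs]

rng = random.Random(0)
model = OnlineLinearRegressor()
for batch in stream_of_batches(2000, 8, rng):
    model.partial_fit(batch)  # cheap update; the batch can be discarded afterwards
```

Each call touches only eight instances, so the cost per step stays constant no matter how much data has already been seen.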
Online learning is great for systems that receive data as a continuous flow and need to adapt to change rapidly or autonomously. It is also a good option if you have limited computing resources. Online learning algorithms can also be used to train systems on huge datasets that cannot fit in one machine’s main memory, which is called out-of-core learning. The algorithm loads part of the data, runs a training step on that data, and repeats the process until it has run on all of the data.
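The load-a-part, train, repeat loop of out-of-core learning might look like the sketch below. It assumes the data sits in a CSV file of x,y rows. A small temporary file stands in for a dataset too large for memory, and read_chunks and training_step are hypothetical helpers written for this example.

```python
import csv
import itertools
import os
import tempfile

def read_chunks(path, chunk_size):
    """Yield lists of (x, y) floats, holding at most chunk_size rows in memory."""
    with open(path, newline="") as f:
        reader = csv.reader(f)
        while True:
            chunk = [(float(x), float(y))
                     for x, y in itertools.islice(reader, chunk_size)]
            if not chunk:
                return
            yield chunk

def training_step(w, b, chunk, learning_rate=0.5):
    """One gradient step of a linear model y = w * x + b on a single chunk."""
    n = len(chunk)
    grad_w = sum((w * x + b - y) * x for x, y in chunk) / n
    grad_b = sum((w * x + b - y) for x, y in chunk) / n
    return w - learning_rate * grad_w, b - learning_rate * grad_b

# Stand-in for a huge dataset: 10,000 rows following y = 3x - 2.
with tempfile.NamedTemporaryFile("w", suffix=".csv", newline="", delete=False) as f:
    writer = csv.writer(f)
    for i in range(10_000):
        x = ((i * 37) % 100) / 50.0 - 1.0
        writer.writerow([x, 3.0 * x - 2.0])
    path = f.name

# Single pass over the file: load a part, run a training step, repeat.
w, b = 0.0, 0.0
for chunk in read_chunks(path, chunk_size=100):
    w, b = training_step(w, b, chunk)
os.remove(path)
```

Only one chunk of 100 rows is in memory at any time. The learning rate of 0.5 is tuned to this toy data and would likely need adjusting for real data.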
One important parameter of online learning systems is how fast they should adapt to changing data: this is called the learning rate. If you set a high learning rate, then your system will rapidly adapt to new data, but it will also tend to quickly forget the old data. Conversely, if you set a low learning rate, the system will have more inertia; that is, it will learn more slowly, but it will also be less sensitive to noise in the new data or to sequences of nonrepresentative data points (outliers).
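To make the trade-off concrete, here is a small sketch that uses an exponential moving average as a stand-in for an online learner: each new value pulls the estimate toward it by a fraction equal to the learning rate. The function track and the toy stream are assumptions for illustration only.

```python
def track(stream, learning_rate, estimate=0.0):
    """Online estimate of the stream's level, updated one value at a time."""
    history = []
    for value in stream:
        # Move a fraction of the way from the current estimate to the new value.
        estimate += learning_rate * (value - estimate)
        history.append(estimate)
    return history

# 20 readings at 0, a lasting shift to 10 (5 readings), then one outlier of 100.
stream = [0.0] * 20 + [10.0] * 5 + [100.0]

fast = track(stream, learning_rate=0.5)
slow = track(stream, learning_rate=0.05)

# Distance from the new level after five readings at 10 (index 24).
fast_gap, slow_gap = abs(fast[24] - 10.0), abs(slow[24] - 10.0)

# How far the single outlier (index 25) drags each estimate.
fast_jump, slow_jump = fast[25] - fast[24], slow[25] - slow[24]
```

With the high learning rate, the estimate is already close to the new level after five readings, but the single outlier drags it much further. With the low rate, the estimate barely reacts to either.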
A big challenge with online learning is that if bad data is fed to the system, the system’s performance will gradually decline. If it’s a live system, your clients will notice. To reduce this risk, you need to monitor your system closely and promptly switch learning off if you detect a drop in performance. You may also want to monitor the input data and react to abnormal data.
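One possible way to automate the "switch learning off" safeguard is sketched below. The class GuardedLearner, its window size, and its error threshold are hypothetical choices for this example. A real system would need thresholds tuned to its own metrics, and probably an alert to a human operator.

```python
from collections import deque

class GuardedLearner:
    """Hypothetical wrapper that stops learning when recent error spikes."""

    def __init__(self, initial_estimate, learning_rate=0.2,
                 window=5, max_mean_error=5.0):
        self.estimate = initial_estimate
        self.learning_rate = learning_rate
        self.recent_errors = deque(maxlen=window)
        self.max_mean_error = max_mean_error
        self.learning_enabled = True

    def observe(self, value):
        """Score the current estimate on a new value, then learn if still allowed."""
        error = abs(value - self.estimate)
        self.recent_errors.append(error)
        window_full = len(self.recent_errors) == self.recent_errors.maxlen
        mean_error = sum(self.recent_errors) / len(self.recent_errors)
        if window_full and mean_error > self.max_mean_error:
            # Performance dropped: freeze the model instead of learning bad data.
            self.learning_enabled = False
        if self.learning_enabled:
            self.estimate += self.learning_rate * (value - self.estimate)
        return error

learner = GuardedLearner(initial_estimate=10.0)
good_data = [9.0, 11.0] * 10   # normal readings around 10
bad_data = [500.0] * 3         # e.g. a malfunctioning sensor
for value in good_data + bad_data:
    learner.observe(value)
```

Because the error is checked before the update, the first bad reading trips the guard and the estimate stays near 10 rather than drifting toward 500. The system keeps serving predictions from the frozen model until learning is re-enabled.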