Feature Scaling in Machine Learning
Feature scaling is one of the most important transformations you need to apply to your data. With few exceptions, Machine Learning algorithms don’t perform well when the input numerical attributes have very different scales. On the other hand, scaling the target values is generally not required.
There are two common ways to get all attributes onto the same scale: min-max scaling and standardization. With min-max scaling (also called normalization), values are shifted and rescaled so that they end up ranging from 0 to 1: you subtract the minimum value and divide by the difference between the maximum and the minimum. Most implementations also let you choose a different target range if you don’t want 0 to 1. Standardization first subtracts the mean value (so standardized values have a zero mean) and then divides by the standard deviation (so the resulting distribution has unit variance). Unlike min-max scaling, standardization does not bound values to a specific range, which may be a problem for some algorithms. However, standardization is much less affected by outliers: a single erroneously large value would make min-max scaling crush all the other values into a tiny portion of the range, whereas standardization would not be much affected.
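As a minimal sketch of both techniques, here is how they look with scikit-learn’s MinMaxScaler and StandardScaler; the toy attribute values (including the deliberately large outlier of 100) are made up purely for illustration.

```python
import numpy as np
from sklearn.preprocessing import MinMaxScaler, StandardScaler

# Toy numerical attribute with one large outlier (values are illustrative only).
X = np.array([[1.0], [2.0], [3.0], [4.0], [100.0]])

# Min-max scaling: (x - min) / (max - min), mapped to the default 0-1 range.
min_max_scaler = MinMaxScaler(feature_range=(0, 1))
X_min_max = min_max_scaler.fit_transform(X)

# Standardization: (x - mean) / std -> zero mean, unit variance, no fixed bounds.
std_scaler = StandardScaler()
X_std = std_scaler.fit_transform(X)

print(X_min_max.ravel())  # the outlier crushes the other values toward 0
print(X_std.ravel())      # values are centered but not bounded to a fixed range
```

Printing both results makes the outlier effect visible: the min-max output squeezes the first four values into a narrow band near 0, while the standardized output keeps them more spread out.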
As with all transformations, it is important to fit the scalers to the training data only, not to the full dataset (including the test set). Only then can you use them to transform the training set, the test set, and new data.
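A short sketch of that workflow, again using scikit-learn and synthetic data for illustration: the scaler is fitted on the training set only, and the learned parameters are then reused to transform both the training and the test set.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

rng = np.random.default_rng(42)
X_train = rng.normal(loc=50, scale=10, size=(100, 1))  # hypothetical training data
X_test = rng.normal(loc=50, scale=10, size=(20, 1))    # hypothetical test data

scaler = StandardScaler()
scaler.fit(X_train)                        # learn mean and std from the training set only
X_train_scaled = scaler.transform(X_train)
X_test_scaled = scaler.transform(X_test)   # reuse the same parameters for test/new data
```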