API in Scikit-Learn
In Scikit-Learn, all objects share a consistent API. This post introduces the most common interfaces.
Estimators
Any object that can estimate some parameters based on a dataset is called an estimator. The estimation itself is performed by the fit() method, which takes a dataset as a parameter (or two for supervised learning algorithms, where the second dataset contains the labels). Any other parameter needed to guide the estimation process is considered a hyperparameter, and it must be set as an instance variable, generally via a constructor parameter.
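As a minimal sketch of the estimator interface, the example below uses SimpleImputer on a made-up toy array (the data and variable names are illustrative assumptions, not from the original post). The strategy hyperparameter is passed to the constructor, and fit() learns the per-column medians.

```python
import numpy as np
from sklearn.impute import SimpleImputer

# Toy dataset (illustrative only) with one missing value in the second column.
X = np.array([[1.0, 2.0],
              [3.0, np.nan],
              [5.0, 6.0]])

# The hyperparameter `strategy` is set through the constructor.
imputer = SimpleImputer(strategy="median")

# fit() estimates the per-column medians and returns the estimator itself,
# which is what allows calls to be chained.
fitted = imputer.fit(X)

# The learned medians are stored on the estimator (missing values are ignored).
medians = imputer.statistics_
```

Because fit() returns the estimator itself, constructing and fitting can be written on a single line if preferred.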
Transformers
Estimators that can also transform a dataset are called transformers. The transformation is performed by the transform() method, which takes the dataset to transform as a parameter and returns the transformed dataset. This transformation generally relies on the learned parameters. All transformers also have a convenience method called fit_transform() that is equivalent to calling fit() and then transform(), but sometimes fit_transform() is optimized and runs much faster.
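The transformer interface can be sketched with StandardScaler, again on made-up data (the arrays and variable names are assumptions for illustration). The example shows that fit() followed by transform() gives the same result as fit_transform(), and that a fitted transformer can be reused on new data.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Toy data (illustrative only): one feature, three training instances.
X_train = np.array([[1.0], [2.0], [3.0]])
X_new = np.array([[4.0]])

# Two-step version: learn the mean and scale, then apply them.
scaler = StandardScaler()
scaler.fit(X_train)
X_train_scaled = scaler.transform(X_train)

# One-step convenience version on a fresh scaler; same result.
X_train_scaled_2 = StandardScaler().fit_transform(X_train)

# The fitted scaler reuses the parameters learned from X_train on new data.
X_new_scaled = scaler.transform(X_new)
```

Note that the new instance is scaled with the mean and standard deviation learned from the training set, not with statistics computed from the new data.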
Predictors
Estimators that are capable of making predictions given a dataset are called predictors. A predictor has a predict() method that takes a dataset of new instances and returns a dataset of corresponding predictions. It also has a score() method that measures the quality of the predictions, given a test set (and the corresponding labels, in the case of supervised learning algorithms).
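A small sketch of the predictor interface, using LinearRegression on synthetic data that follows y = 2x + 1 exactly (the data is an assumption chosen so the result is easy to check by hand). For regressors, score() returns the R² coefficient of determination; for classifiers, it returns the mean accuracy.

```python
import numpy as np
from sklearn.linear_model import LinearRegression

# Synthetic training data (illustrative only) following y = 2x + 1.
X_train = np.array([[0.0], [1.0], [2.0], [3.0]])
y_train = np.array([1.0, 3.0, 5.0, 7.0])

# Supervised learning: fit() takes both the instances and the labels.
model = LinearRegression()
model.fit(X_train, y_train)

# predict() takes new instances and returns one prediction per instance.
X_new = np.array([[4.0], [5.0]])
y_pred = model.predict(X_new)

# score() measures prediction quality on a labeled test set (R² here).
X_test = np.array([[6.0], [7.0]])
y_test = np.array([13.0, 15.0])
r2 = model.score(X_test, y_test)
```

Since the toy data is perfectly linear, the predictions land on the line and the score is essentially 1.0; real datasets will of course give lower scores.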
Inspection
All the estimator’s hyperparameters are accessible directly via public instance variables (for example, imputer.strategy for a SimpleImputer), and all the estimator’s learned parameters are accessible via public instance variables with an underscore suffix (for example, imputer.statistics_).
A few other design choices keep the API simple. Datasets are represented as NumPy arrays or SciPy sparse matrices, instead of homemade classes. Hyperparameters are just regular Python strings or numbers. Existing building blocks are reused as much as possible: for example, a Pipeline can be built from a sequence of transformers followed by a final estimator. Scikit-Learn provides reasonable default values for most parameters, making it easy to quickly create a baseline working system.
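The points above can be illustrated with a short sketch, assuming the same kind of synthetic y = 2x + 1 data (the data and the pipeline step names "scaler" and "regressor" are illustrative choices). It reads a hyperparameter and learned parameters directly from the estimator, relies on default values, and composes two building blocks with Pipeline.

```python
import numpy as np
from sklearn.linear_model import LinearRegression
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler

# Synthetic data (illustrative only) following y = 2x + 1, as plain NumPy arrays.
X = np.array([[0.0], [1.0], [2.0], [3.0]])
y = np.array([1.0, 3.0, 5.0, 7.0])

# Defaults only: no constructor arguments are needed for a baseline model.
model = LinearRegression()

# Hyperparameters are public attributes; get_params() lists them all as a dict.
default_fit_intercept = model.fit_intercept
params = model.get_params()

# Learned parameters appear after fit(), with an underscore suffix.
model.fit(X, y)
coef = model.coef_
intercept = model.intercept_

# Composition: a transformer followed by a final estimator, used as one object.
pipeline = Pipeline([
    ("scaler", StandardScaler()),
    ("regressor", LinearRegression()),
])
pipeline.fit(X, y)
pipeline_pred = pipeline.predict(np.array([[4.0]]))
```

The pipeline exposes the same fit() and predict() methods as its final estimator, so it can be dropped in wherever a single predictor is expected.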