Gradients often get smaller and smaller as the algorithm progresses down to the lower layers. As a result, the Gradient Descent update leaves the lower layers’ connection weights virtually unchanged, and training never converges to a good solution. This is called the vanishing gradients problem. Conversely, the gradients can grow…
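The shrinking effect can be illustrated with a toy calculation: the sigmoid derivative is at most 0.25, so every sigmoid layer the gradient passes through scales it down by at least a factor of 4. A minimal sketch in plain Python (the depth of 10 layers is an arbitrary choice for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sigmoid_deriv(z):
    s = sigmoid(z)
    return s * (1.0 - s)

# Backpropagating through a stack of sigmoid layers multiplies the
# gradient by sigma'(z) <= 0.25 at every layer, so it shrinks fast.
grad = 1.0
for layer in range(10):
    grad *= sigmoid_deriv(0.0)  # 0.25, the derivative's maximum value
    print(f"after layer {layer + 1}: grad = {grad:.2e}")
```

Even in this best case (every unit at its maximum slope), the gradient reaching the lowest layer is below 10⁻⁶.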

# Loss Function in Deep Learning

In the context of an optimization algorithm, the function used to evaluate a candidate solution (i.e. a set of weights) is referred to as the objective function. Typically, with neural networks, we seek to minimize the error. As such, the objective function is often referred to as a cost function…
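For example, the mean squared error is a common cost function for regression. A minimal sketch in plain Python (the sample values are made up for illustration):

```python
def mse_loss(y_true, y_pred):
    """Mean squared error: the average of the squared residuals."""
    assert len(y_true) == len(y_pred)
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

# Lower loss means the candidate weights fit the data better.
print(mse_loss([3.0, -0.5, 2.0], [2.5, 0.0, 2.0]))  # → 0.1666...
```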

# Hidden Layer Activation Functions

This blog introduces the three most commonly used activation functions in hidden layers: the Rectified Linear Activation (ReLU), the Logistic (Sigmoid), and the Hyperbolic Tangent (Tanh).

## Sigmoid

The sigmoid activation function, σ(z) = 1/(1+exp(−z)), is also called the logistic function. The logistic function has a well-defined nonzero derivative everywhere, allowing Gradient Descent to make some progress…
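A minimal sketch of the function and its derivative in plain Python (the probe points are arbitrary):

```python
import math

def sigmoid(z):
    """Logistic function: maps any real z into (0, 1)."""
    return 1.0 / (1.0 + math.exp(-z))

# The derivative sigma'(z) = sigma(z) * (1 - sigma(z)) is nonzero
# everywhere, but it saturates (approaches 0) for large |z|.
for z in (-5.0, 0.0, 5.0):
    s = sigmoid(z)
    print(f"sigma({z:+.1f}) = {s:.4f}, sigma'({z:+.1f}) = {s * (1 - s):.4f}")
```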

# Activation Functions in Neural Networks

An activation function in a neural network defines how the weighted sum of the input is transformed into an output from a node or nodes in a layer of the network. It is sometimes called a transfer function or a squashing function.

Activation functions allow Deep Learning models to capture nonlinearity. If…
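The point about nonlinearity can be made concrete: without a nonlinear activation, stacking linear layers collapses into a single linear map. A toy sketch in plain Python with made-up 1-D weights:

```python
def relu(z):
    """Rectified Linear Unit: passes positives, zeroes out negatives."""
    return max(0.0, z)

# Two 1-D "layers" with weights w1, w2 and no biases, for illustration.
w1, w2 = 2.0, 3.0

def linear_net(x):
    return w2 * (w1 * x)      # collapses to one layer: (w2 * w1) * x

def relu_net(x):
    return w2 * relu(w1 * x)  # the nonlinearity breaks the collapse

print(linear_net(1.0), linear_net(-1.0))  # 6.0 -6.0  (a single line through 0)
print(relu_net(1.0), relu_net(-1.0))      # 6.0 0.0   (a kink at x = 0)
```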

# Forward & Backward Propagation

Neural Networks have two major processes: Forward Propagation and Back Propagation. During Forward Propagation, we start at the input layer and feed our data in, propagating it through the network until we’ve reached the output layer and generated a prediction. Back Propagation is essentially the opposite of Forward Propagation. In…
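A toy sketch of both passes for a single sigmoid neuron with a squared-error loss (plain Python; the weight, input, and target values are made up):

```python
import math

w, x, y_true = 0.5, 1.5, 1.0  # made-up weight, input, and target

# Forward propagation: input -> weighted sum -> activation -> loss.
z = w * x
y_pred = 1.0 / (1.0 + math.exp(-z))  # sigmoid activation
loss = (y_pred - y_true) ** 2

# Back propagation: apply the chain rule from the loss back to w.
dloss_dy = 2.0 * (y_pred - y_true)
dy_dz = y_pred * (1.0 - y_pred)      # sigmoid derivative
dz_dw = x
dloss_dw = dloss_dy * dy_dz * dz_dw

# A Gradient Descent step then nudges w against the gradient.
w -= 0.1 * dloss_dw
```

Real networks repeat this neuron-level chain rule across every layer, which is exactly why backpropagation runs in the opposite direction to the forward pass.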

# Gaussian Mixture Model

A Gaussian mixture model (GMM) is a probabilistic model that assumes that the instances were generated from a mixture of several Gaussian distributions whose parameters are unknown. All the instances generated from a single Gaussian distribution form a cluster that typically looks like an ellipsoid. …
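As a sketch, the density of a one-dimensional mixture is just a weighted sum of Gaussian densities (plain Python; the weights, means, and standard deviations are invented for illustration):

```python
import math

def gaussian_pdf(x, mu, sigma):
    """Density of a 1-D Gaussian with mean mu and std dev sigma."""
    return math.exp(-0.5 * ((x - mu) / sigma) ** 2) / (sigma * math.sqrt(2 * math.pi))

def gmm_pdf(x, weights, mus, sigmas):
    """Mixture density: the weights must sum to 1."""
    return sum(w * gaussian_pdf(x, m, s)
               for w, m, s in zip(weights, mus, sigmas))

# Two components: one cluster centered at 0, a second centered at 5.
weights, mus, sigmas = [0.4, 0.6], [0.0, 5.0], [1.0, 1.5]
print(gmm_pdf(0.0, weights, mus, sigmas))  # density peaks near component means
```

Fitting the unknown weights, means, and covariances from data is what algorithms such as Expectation–Maximization do; this sketch only evaluates a mixture whose parameters are given.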

# DBSCAN

The DBSCAN algorithm defines clusters as continuous regions of high density. For each instance, the algorithm counts how many instances are located within a small distance ε from it. This region is called the instance’s ε-neighborhood. If an instance has at least min_samples instances in its ε-neighborhood (including itself), then it…

## Kinder Chen
