Stochastic Gradient Descent (Wikipedia) is a gradient-based optimization algorithm used to learn network parameters during the training phase. ^{1} The gradients are typically calculated using the backpropagation algorithm. In practice, the minibatch version of SGD is used, where each parameter update is computed from a small batch of examples rather than a single example, which improves computational efficiency. Many extensions to vanilla SGD exist, including Momentum, Adagrad, RMSProp, Adadelta, and Adam.
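The update rule above can be illustrated with a minimal NumPy sketch of minibatch SGD on a toy least-squares problem. The function name, hyperparameters, and data here are illustrative assumptions, not taken from the glossary; in deep learning the gradient would come from backpropagation rather than a closed form.

```python
import numpy as np

def sgd_minibatch(X, y, lr=0.01, batch_size=32, epochs=100, seed=0):
    """Illustrative minibatch SGD for least-squares linear regression.

    Each update uses the gradient of the mean squared error computed on
    a random batch of examples, not the full dataset or a single example.
    """
    rng = np.random.default_rng(seed)
    n, d = X.shape
    w = np.zeros(d)
    for _ in range(epochs):
        # Shuffle once per epoch, then walk through the data in batches
        idx = rng.permutation(n)
        for start in range(0, n, batch_size):
            batch = idx[start:start + batch_size]
            Xb, yb = X[batch], y[batch]
            # Gradient of the mean squared error on this batch only
            grad = 2.0 / len(batch) * Xb.T @ (Xb @ w - yb)
            w -= lr * grad
    return w
```

Extensions such as Momentum or Adam modify only the last line, replacing the raw gradient step with an update that also tracks running statistics of past gradients.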

**Sources**

(1) “Deep Learning Glossary.” *WildML*, 8 Sept. 2017, www.wildml.com/deep-learning-glossary/