The softmax function is typically used to convert a vector of raw scores into class probabilities at the output layer of a neural network used for classification.^{1} It normalizes the scores by exponentiating each one and dividing by a normalization constant (the sum of all the exponentiated scores). When the number of classes is large, such as the vocabulary in Machine Translation, this normalization constant is expensive to compute. Various alternatives exist to make the computation more efficient, including Hierarchical Softmax and sampling-based losses such as NCE.
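As a sketch of the computation described above, here is a minimal softmax in NumPy. The max-subtraction step is a standard trick for numerical stability (it does not change the result, since softmax is invariant to adding a constant to every score); the example scores are illustrative only.

```python
import numpy as np

def softmax(scores):
    # Subtract the max score for numerical stability before exponentiating.
    shifted = scores - np.max(scores)
    exps = np.exp(shifted)
    # The denominator here is the normalization constant that becomes
    # expensive when the number of classes (e.g. vocabulary size) is large.
    return exps / np.sum(exps)

scores = np.array([2.0, 1.0, 0.1])  # raw scores for three classes
probs = softmax(scores)
print(probs)  # non-negative values that sum to 1
```

The cost of the normalization sum grows linearly with the number of classes, which is exactly what Hierarchical Softmax and sampling-based losses avoid.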

**Sources**

(1) “Deep Learning Glossary.” *WildML*, 8 Sept. 2017, www.wildml.com/deep-learning-glossary/