In many practical problems, the most effective classification techniques are based on deep learning. In this approach, once the neural network generates values corresponding to different classes, these values are transformed into probabilities by using the softmax formula. Researchers tried other transformation, but they did not work as well as softmax. A natural question is: why is softmax so effective? In this paper, we provide a possible explanation for this effectiveness: namely, we prove that softmax is the only consistent approach to probability-based classification. In precise terms, it is the only approach for which two reasonable probability-based ideas -- Least Squares and Bayesian statistics -- always lead to the exact same classification.