There is a reasonably accurate empirical formula that predicts, for two words i and j, the number X_ij of times the word i appears in the vicinity of the word j. The parameters of this formula are determined by the weighted least squares method. Empirically, the predictions are most accurate when the weights are proportional to a power of X_ij. In this paper, we provide a theoretical explanation for this empirical fact.
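To make the weighting concrete, here is a minimal sketch of a weighted least squares objective whose weights grow as a power of X_ij. The formula itself is not given here, so the sketch assumes, as in the GloVe word-embedding model, that it predicts log X_ij, and it takes those predictions as an input dictionary. The names `power_weight` and `weighted_lsq_loss` are hypothetical. The defaults alpha = 0.75 and x_max = 100 are GloVe's choices, and so is capping the weight at 1 for X_ij >= x_max; neither comes from this report.

```python
import math

def power_weight(x, alpha=0.75, x_max=100.0):
    """Weight proportional to x**alpha (normalized by x_max).

    The cap at 1 for x >= x_max follows GloVe and is an assumption here.
    Zero or negative counts get weight 0.
    """
    if x <= 0:
        return 0.0
    return min((x / x_max) ** alpha, 1.0)

def weighted_lsq_loss(counts, predicted_logs, alpha=0.75, x_max=100.0):
    """Sum over word pairs of w(X_ij) * (predicted log X_ij - log X_ij)^2.

    counts: dict mapping (i, j) -> observed co-occurrence count X_ij
    predicted_logs: dict mapping (i, j) -> the formula's prediction of log X_ij
    (the prediction model itself is a placeholder assumption).
    """
    total = 0.0
    for pair, x in counts.items():
        if x <= 0:
            continue  # log 0 is undefined; unobserved pairs contribute nothing
        residual = predicted_logs[pair] - math.log(x)
        total += power_weight(x, alpha, x_max) * residual ** 2
    return total

# Usage: a perfect prediction for one pair and an error of 1 (in log scale)
# for a frequent pair, whose weight is capped at 1.
counts = {("ice", "cold"): 100, ("ice", "steam"): 2}
predicted = {("ice", "cold"): math.log(100) + 1.0, ("ice", "steam"): math.log(2)}
print(weighted_lsq_loss(counts, predicted))
```

Because rare pairs receive small weights, their noisy counts influence the fitted parameters less than those of frequent pairs.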
Technical Report: UTEP-CS-22-110