Publication Date

7-1-2024

Comments

Technical Report: UTEP-CS-24-36

Abstract

In many practical situations, for each of two classifications, we know the probabilities that a randomly selected object belongs to different categories. For example, we know what proportion of people are below 20 years old, what proportion are between 20 and 30, etc., and we also know what proportion of people earn less than 10K, between 10K and 20K, etc. In such situations, we are often interested in the proportion of people who are classified by the two classifications into two given categories. For example, we are interested in the proportion of people whose age is between 20 and 30 and whose income is between 10K and 20K. If we do not have detailed records of all the objects, we select a small sample and count how many objects from this sample belong to each pair of categories. The resulting proportions are a good first-approximation estimate for the desired proportion. However, for a random sample, the proportions of each category are, in general, somewhat different from the proportions in the overall population. Thus, the first-approximation estimates need to be adjusted so that they fit the overall-population values. The problem of finding proper adjustments is known as the proportional fitting problem. There exist many efficient iterative algorithms for solving this problem, but it is still desirable to find classes of cases for which even faster algorithms are possible. In this paper, we show that for the case when one of the classifications has only two categories, the proportional fitting problem can be reduced to solving a polynomial equation of order equal to the number n of categories of the second classification. So, for n = 2, 3, 4, explicit formulas for solving quadratic, cubic, and quartic equations lead to explicit solutions of the proportional fitting problem. For n > 4, fast algorithms for solving polynomial equations lead to fast algorithms for solving the proportional fitting problem.
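The reduction described in the abstract can be illustrated with a short sketch. It is not taken from the report. It assumes the standard form of the fitted table, p_ij = a_i * b_j * q_ij, where q_ij are the sample proportions, and it assumes that all sample entries are positive and that the row and column targets have the same total. Under these assumptions, eliminating b_j leaves a single equation in the ratio t = a_1 / a_2, and clearing denominators gives a polynomial of degree n. The function name `fit_two_row_table` and the example numbers are hypothetical.

```python
import numpy as np


def fit_two_row_table(q, row_targets, col_targets):
    """Fit a 2 x n table to given marginals via one polynomial equation.

    Assumption (not from the report): the fitted table has the form
    p_ij = a_i * b_j * q_ij with all q_ij > 0, and the targets are consistent.
    """
    q = np.asarray(q, dtype=float)
    c = np.asarray(col_targets, dtype=float)
    r1 = float(row_targets[0])
    n = q.shape[1]

    # Linear factors (q_1j * t + q_2j), one per column.
    factors = [np.poly1d([q[0, j], q[1, j]]) for j in range(n)]

    # Row-1 condition: sum_j c_j * q_1j * t / (q_1j * t + q_2j) = r1.
    # Multiplying by the product of all factors gives a degree-n polynomial.
    full = np.poly1d([1.0])
    for f in factors:
        full = full * f
    poly = -r1 * full
    for j in range(n):
        term = np.poly1d([c[j] * q[0, j], 0.0])  # c_j * q_1j * t
        for k in range(n):
            if k != j:
                term = term * factors[k]
        poly = poly + term

    # The left-hand side is increasing in t > 0, so one positive root exists.
    roots = poly.roots
    positive = [z.real for z in roots
                if abs(z.imag) <= 1e-9 * max(1.0, abs(z)) and z.real > 0]
    if len(positive) != 1:
        raise ValueError("expected exactly one positive real root")
    t = positive[0]

    # Recover the fitted table; each column sums to c_j by construction.
    denom = t * q[0] + q[1]
    return np.vstack([c * t * q[0] / denom, c * q[1] / denom])


# Hypothetical example: 2 x 3 sample proportions and population marginals.
sample = [[0.10, 0.20, 0.10],
          [0.20, 0.10, 0.30]]
fitted = fit_two_row_table(sample, [0.45, 0.55], [0.30, 0.30, 0.40])
print(fitted)
print(fitted.sum(axis=1), fitted.sum(axis=0))
```

The sketch uses a general numerical root finder for every n. For n = 2, 3, 4 the same polynomial could instead be solved with the closed-form quadratic, cubic, or quartic formulas mentioned in the abstract.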
