Publication Date

9-1-2023

Comments

Technical Report: UTEP-CS-23-53

Abstract

Often, we need to estimate the value of a quantity y that is difficult to estimate directly (e.g., tomorrow's temperature) based on the known values of several quantities x1, ..., xn. In many practical situations, we know that the dependence of y on xi can be accurately described by a linear function. To find this dependence, we need to estimate the coefficients of this linear function from the cases in which we know both y and xi; this is known as linear regression. In the ideal situation, when we know all the inputs xi in each case, the computationally efficient and well-justified least squares method solves this problem. However, in practice, some of the inputs are often missing. There are heuristic methods for dealing with such missing values, but different methods lead to different results. This is the main problem with which we deal in this paper. To solve it, we propose a new well-justified method that eliminates this undesirable non-uniqueness. An auxiliary computational problem emerges if, after we get a linear dependence of y on xi, we learn the values of an additional variable xn+1. In this case, in principle, we can simply re-apply the least squares method "from scratch", but this idea, while feasible, is still somewhat time-consuming, so it is desirable to have a faster algorithm that utilizes the previous regression result. Such an algorithm is also provided in this paper.
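To make the setting concrete, the following is a minimal sketch of the ideal case (no missing inputs) and of the auxiliary problem of adding a variable xn+1. It is not the algorithm from the report, which the abstract does not spell out. The function names fit_least_squares and add_variable are hypothetical, and the update step uses the textbook residual-on-residual identity (Frisch-Waugh-Lovell) as a stand-in for illustration.

```python
import numpy as np


def fit_least_squares(X, y):
    """Fit y ~ a_1*x_1 + ... + a_n*x_n + a_0 by ordinary least squares.

    X has shape (cases, n). Returns the coefficients ordered as
    (a_1, ..., a_n, a_0).
    """
    A = np.column_stack([X, np.ones(len(y))])  # append intercept column
    coef, *_ = np.linalg.lstsq(A, y, rcond=None)
    return coef


def add_variable(X, y, coef, x_new):
    """Update a previous fit after learning a new input column x_new.

    Assumption: this is the standard Frisch-Waugh-Lovell identity, used
    here only as an illustration; it is NOT taken from the report.
    Returns coefficients ordered as (a_1, ..., a_n, a_{n+1}, a_0).
    """
    A = np.column_stack([X, np.ones(len(y))])
    # Regress the new variable on the old inputs.
    g, *_ = np.linalg.lstsq(A, x_new, rcond=None)
    r_x = x_new - A @ g   # part of x_new not explained by old inputs
    r_y = y - A @ coef    # residual of the previous regression
    b = (r_x @ r_y) / (r_x @ r_x)  # coefficient of the new variable
    old = coef - b * g    # corrected coefficients of the old inputs
    return np.append(old[:-1], [b, old[-1]])
```

The updated coefficients should coincide, up to rounding, with those obtained by re-applying fit_least_squares to the enlarged input matrix. In this sketch the update still solves one least squares problem, so any real speed-up would depend on reusing a stored factorization of the old inputs, which is omitted here.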
