Date of Award
2025-08-01
Degree Name
Doctor of Philosophy
Department
Mathematical Sciences
Advisor(s)
Amy Wagler
Second Advisor
Abhijit Mandal
Abstract
The Iterative Proportional Fitting (IPF) algorithm is widely used in contingency table estimation, survey weighting, and synthetic population generation because of its simplicity and its strong theoretical foundation for matching observed marginal distributions. In high-dimensional settings, however, IPF faces substantial computational and memory demands, as well as statistical instability caused by sparse contingency tables. Moreover, although IPF excels at matching known marginal distributions, it cannot produce realistic out-of-sample data points, which limits its usefulness in modern population synthesis tasks that require both scalability and realism. To address these limitations, we first propose a blockwise IPF framework, in which the feature space is partitioned into smaller groups of correlated variables and IPF is applied independently within each group. This design substantially improves computational efficiency while maintaining alignment with the marginal distributions and preserving inter-variable dependencies. Second, we develop a hybrid framework that integrates IPF-derived weights into machine learning-based generative models. We explore two strategies: (1) pre-sampling, in which the training data are reweighted using IPF weights to match the marginal targets, and (2) weighted learning, in which these weights are incorporated directly into the model's training objective. Although the framework is model-agnostic, we use Bayesian networks as a case study. Extensive simulation studies and real-world synthetic population generation experiments demonstrate that the proposed blockwise IPF framework scales efficiently to high-dimensional settings, maintaining statistical accuracy while substantially reducing computation time. These experiments further show that the hybrid strategy produces synthetic data with greater sample diversity and closer alignment with the marginal distributions.
Finally, we introduce early-stage work on a neural network-based approach for estimating the joint distribution of a contingency table given expected marginals. Preliminary results suggest that this new paradigm holds significant promise for addressing several fundamental limitations of IPF.
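The blockwise IPF and pre-sampling ideas summarized in the abstract can be illustrated with a minimal Python sketch. It is not the dissertation's implementation: the function names (`ipf`, `seed_table`, `blockwise_ipf_weights`), the integer coding of categorical variables, the restriction to one-dimensional target marginals, and the example partition of variables into blocks are all hypothetical choices made for illustration. The abstract does not say how the per-block results are combined, so this sketch stops at one record-weight vector per block.

```python
import numpy as np


def ipf(seed, marginals, tol=1e-8, max_iter=1000):
    """Classic IPF: rescale `seed` until every axis sum matches its 1-D target marginal.

    Assumes all target marginals share the same grand total.
    """
    table = np.asarray(seed, dtype=float).copy()
    targets = [np.asarray(t, dtype=float) for t in marginals]

    def axis_sum(t, axis):
        # Sum over every axis except `axis` (axis=() leaves a 1-D table unchanged).
        return t.sum(axis=tuple(a for a in range(t.ndim) if a != axis))

    for _ in range(max_iter):
        for axis, target in enumerate(targets):
            current = axis_sum(table, axis)
            # Empty slices (common in sparse tables) stay at zero instead of dividing by zero.
            factor = np.divide(target, current, out=np.zeros_like(current), where=current > 0)
            shape = [1] * table.ndim
            shape[axis] = -1
            table *= factor.reshape(shape)
        max_dev = max(np.abs(axis_sum(table, a) - t).max() for a, t in enumerate(targets))
        if max_dev < tol:
            break
    return table


def seed_table(data, cols, levels):
    """Count records in the contingency table spanned by the variables in `cols`."""
    table = np.zeros(tuple(levels[c] for c in cols))
    np.add.at(table, tuple(data[:, c] for c in cols), 1.0)
    return table


def blockwise_ipf_weights(data, blocks, targets):
    """Run IPF independently on each block of variables; return one weight vector per block.

    data:    (n_records, n_vars) array of integer-coded categories
    blocks:  list of column-index lists, e.g. [[0, 1], [2]] (hypothetical partition)
    targets: one 1-D target marginal per variable
    """
    levels = [len(t) for t in targets]
    weights = []
    for cols in blocks:
        seed = seed_table(data, cols, levels)
        fitted = ipf(seed, [targets[c] for c in cols])
        cell = tuple(data[:, c] for c in cols)
        # Each record gets fitted/seed for its cell, so weights within a cell sum to the fitted count.
        weights.append(fitted[cell] / seed[cell])
    return weights


if __name__ == "__main__":
    data = np.array([[0, 0, 1], [0, 1, 0], [1, 0, 1], [1, 1, 0], [1, 1, 1]])
    targets = [np.array([3.0, 2.0]), np.array([2.0, 3.0]), np.array([2.5, 2.5])]
    w_block01, w_block2 = blockwise_ipf_weights(data, [[0, 1], [2]], targets)
    # Pre-sampling (strategy 1): resample records in proportion to their IPF weights.
    rng = np.random.default_rng(0)
    idx = rng.choice(len(data), size=1000, replace=True, p=w_block01 / w_block01.sum())
    print(np.bincount(data[idx, 0], minlength=2) / 1000)
```

The demo should print category shares for the first variable that are roughly in the 3:2 ratio of its target marginal. Fitting each block separately keeps every seed table small (here a 2 x 2 table and a 2-cell table instead of one 2 x 2 x 2 table), which is the source of the scalability gain the abstract describes. Under the second strategy, weighted learning, the same weight vector would instead be passed to the model's fitting routine (for example, as per-record counts when estimating a Bayesian network's conditional probability tables) rather than used for resampling.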
Language
en
Provenance
Received from ProQuest
Copyright Date
2025-08
File Size
226 p.
File Format
application/pdf
Rights Holder
William Ofosu Agyapong
Recommended Citation
Agyapong, William Ofosu, "Rethinking Iterative Proportional Fitting: Scalable And Hybrid Approaches To Joint Distribution Fitting" (2025). Open Access Theses & Dissertations. 4322.
https://scholarworks.utep.edu/open_etd/4322