Date of Award

2025-08-01

Degree Name

Doctor of Philosophy

Department

Mathematical Sciences

Advisor(s)

Abhijit Mandal

Second Advisor

Amy Wagler

Abstract

Social network analysis (SNA) research is often rife with data collection pitfalls that frequently lead to incomplete or missing data. With the growing use of SNA-based research, researchers must address two challenges in these settings: handling missing data and generating synthetic data. Missing data arise from longitudinal non-response or from non-response to sensitive or difficult-to-answer questions. Synthetic data generation in SNA settings addresses the lack of representation often present in large-scale SNA studies. This dissertation investigates synthetic data generation methods to address these challenges and develops a novel algorithm that leverages information from multi-modal data, e.g., databases combining graph (network) data with participant-level survey data. The synthetic data generation methods, all well suited to SNA settings, include latent variable models, stochastic modeling, and large language models (LLMs). The proposed algorithm is assessed using a variety of synthetic data generation approaches to determine the quality and diversity of the synthetic data. This assessment employs a rigorous set of metrics tailored to multi-modal SNA data. The results demonstrate that the LLM and stochastic modeling approaches outperformed the two latent feature models examined. This outcome may stem from the variable mapping used in the latent feature models.

Language

en

Provenance

Received from ProQuest

File Size

117 p.

File Format

application/pdf

Rights Holder

Hortencia Josefina Hernandez
