Date of Award

2025-12-01

Degree Name

Master of Science

Department

Systems Engineering

Advisor(s)

Sergio Luna

Abstract

Large Language Models (LLMs) are increasingly viewed as viable tools for Systems Engineering, yet empirical data on their effectiveness in generating engineering-quality requirements remains limited. This thesis compares the performance of GPT-5 and Gemini 3 Pro in generating system functional requirements for the Miner Guardian Drone, a prototype UAV designed for minefield remediation. Both models were given identical prompts and tasked with producing 15 functional requirements. Two independent raters evaluated the outputs using a nine-dimensional quality rubric aligned with IEEE 29148 and INCOSE guidelines. The difference between the models was analyzed using descriptive statistics and a Single Factor Analysis of Variance (ANOVA). The results indicate that Gemini 3 Pro achieved a higher mean quality score and lower score variability than GPT-5. While GPT-5 produced high-quality outputs, it exhibited a wider score range and a higher frequency of lower-quality outliers. However, the ANOVA results suggest that the performance difference was not statistically significant at the 0.05 significance level (95% confidence): although Gemini 3 Pro showed better descriptive metrics, the null hypothesis of equal mean quality cannot be rejected with the current sample size. This thesis offers an evidence-based, methodological comparison of state-of-the-art LLMs for requirements engineering and identifies future research opportunities, including domain-tuned LLMs and automated traceability support.
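
As an illustration of the single-factor ANOVA named in the abstract, the following is a minimal sketch of how the F statistic for two groups of quality scores could be computed. It is not the thesis's own analysis code: the function name `one_way_anova` and the score lists are hypothetical placeholders, and the values shown are not data from the study.

```python
from statistics import mean


def one_way_anova(*groups):
    """Return (F, df_between, df_within) for a single-factor ANOVA.

    Each argument is a sequence of scores for one group (here, one model).
    """
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total

    # Variation of the group means around the grand mean.
    ss_between = sum(len(g) * (mean(g) - grand_mean) ** 2 for g in groups)
    # Variation of individual scores around their own group mean.
    ss_within = sum((x - mean(g)) ** 2 for g in groups for x in g)

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Placeholder scores only (assumption): NOT the ratings reported in the thesis.
model_a_scores = [4.0, 4.5, 3.5, 5.0]
model_b_scores = [4.5, 4.0, 5.0, 4.5]

f_stat, df_b, df_w = one_way_anova(model_a_scores, model_b_scores)
print(f"F({df_b}, {df_w}) = {f_stat:.3f}")
```

The difference would be judged significant at the 0.05 level only if F exceeds the critical value of the F distribution for the given degrees of freedom. With 15 requirements per model, as described in the abstract, the degrees of freedom would be 1 and 28, assuming one score per requirement.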

Language

en

Provenance

Received from ProQuest

File Size

58 p.

File Format

application/pdf

Rights Holder

Guadalupe Nevarez
