Date of Award
2023-12-01
Degree Name
Master of Science
Department
Electrical and Computer Engineering
Advisor(s)
Rodrigo A. Romero
Abstract
Ionizing radiation remains an obstacle to bringing graphics processing units (GPUs) to space. Because radiation-hardened GPU chips are not currently technically feasible, emphasis has been placed on adapting commercial off-the-shelf (COTS) GPUs to the space domain. At present, GPU error detection methods rely on redundant computation. This thesis explores the use of hardware performance counters, special registers that monitor internal GPU hardware events, for lightweight, symptom-based error detection. Hardware performance counters are successfully used to detect anomalous single-event upsets in the L0 instruction cache, the load/store unit, the arithmetic and logic unit, the fused multiply-add pipeline, and the address divergence unit of a GPU. These upsets are detected with both supervised and unsupervised shallow machine learning models. The results indicate that counter-based detection is a viable alternative to redundancy-based computational methods for detecting and handling single-event upsets in a subset of the components of a GPU architecture.
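To make the idea of symptom-based detection concrete, here is a minimal sketch of an unsupervised detector over performance-counter readings. It is not the thesis's method: it assumes a simple per-counter z-score threshold as a stand-in for the shallow models the thesis evaluates. The counter names (`inst_executed`, `l0_icache_miss`, `lsu_requests`, `fma_ops`), the threshold of 4.0, and the helper functions are hypothetical. The detector learns what fault-free ("golden") runs look like and flags a run whose counters deviate strongly from that baseline.

```python
from statistics import mean, stdev

# Hypothetical counter names; this record does not list the counters the thesis actually uses.
COUNTERS = ["inst_executed", "l0_icache_miss", "lsu_requests", "fma_ops"]


def fit_baseline(golden_runs):
    """Learn per-counter mean and sample standard deviation from fault-free runs.

    golden_runs: list of dicts mapping counter name -> count for one kernel run
    (at least two runs are required by statistics.stdev).
    """
    baseline = {}
    for name in COUNTERS:
        values = [run[name] for run in golden_runs]
        sd = stdev(values)
        # Guard against a constant counter, which would otherwise divide by zero.
        baseline[name] = (mean(values), sd if sd > 0 else 1.0)
    return baseline


def anomaly_score(run, baseline):
    """Return the largest absolute z-score across all counters for one run."""
    return max(abs(run[name] - mu) / sd for name, (mu, sd) in baseline.items())


def is_upset(run, baseline, threshold=4.0):
    """Flag a run as a suspected single-event upset if any counter deviates strongly.

    The threshold of 4.0 is an illustrative assumption, not a tuned value.
    """
    return anomaly_score(run, baseline) > threshold


# Usage with made-up counter readings.
golden = [
    {"inst_executed": 1000, "l0_icache_miss": 10, "lsu_requests": 200, "fma_ops": 500},
    {"inst_executed": 1002, "l0_icache_miss": 11, "lsu_requests": 201, "fma_ops": 501},
    {"inst_executed": 998, "l0_icache_miss": 9, "lsu_requests": 199, "fma_ops": 499},
]
baseline = fit_baseline(golden)

# A run whose L0 instruction-cache misses spike, as a fault in that cache might cause.
faulty = {"inst_executed": 1000, "l0_icache_miss": 40, "lsu_requests": 200, "fma_ops": 500}
print(is_upset(faulty, baseline))  # True
```

The appeal of this style of detector is that it only reads counters the hardware already maintains, so it avoids the cost of re-executing the computation. A supervised variant would instead train a classifier on counter vectors labeled with known injected faults.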
Language
en
Provenance
Received from ProQuest
Copyright Date
2023-12
File Size
84 p.
File Format
application/pdf
Rights Holder
Antonio E Teijeiro
Recommended Citation
Teijeiro, Antonio E., "Towards a Spaceworthy COTS Graphics Processing Unit: Hardware Performance Counter Based Symptomatic Fault Detection" (2023). Open Access Theses & Dissertations. 4055.
https://scholarworks.utep.edu/open_etd/4055