Date of Award

2023-12-01

Degree Name

Master of Science

Department

Electrical and Computer Engineering

Advisor(s)

Rodrigo A. Romero

Abstract

Ionizing radiation remains an obstacle to bringing graphics processing units (GPU) to space. Since radiation-hardened GPU chips are technically infeasible at the moment, an emphasis has been placed on the adaptation of commercial-off-the-shelf (COTS) GPUs to the space domain. At present, GPU error detection methods require redundant computation. This thesis work explores the utilization of hardware performance counters, special registers useful for monitoring internal GPU hardware events, for symptom-based, lightweight error detection. Hardware performance counters are successfully utilized for the detection of anomalous single event upsets in the L0 instruction cache, the load store unit, the arithmetic and logic unit, the fused multiply add pipeline, and the address divergence unit of a GPU. These upsets are detected using both supervised and unsupervised shallow machine learning models. Results indicate a viable alternative to redundancy-based computational methods for detection and handling of single-event upsets in a subset of components of a GPU architecture.

Language

en

Provenance

Recieved from ProQuest

File Size

84 p.

File Format

application/pdf

Rights Holder

Antonio E Teijeiro

Share

COinS