Date of Award


Degree Name

Doctor of Philosophy


Computer Science


Rodrigo A. Romero


Heterogeneous architectures can improve the performance of applications with computationally intensive operations. Even when these architectures may reduce the execution time of applications, there are opportunities for additional performance improvement as the memory hierarchies of the central processor cores and the coprocessor cores are separate. Applications running on heterogeneous architectures where graphics processing units (GPUs) execute throughput-intense, data-parallel operations may run in a single address space provided by unified virtual addressing or expand the upper bounds of scalability and high performance computing by explicitly partitioning and transferring data across orthogonal host and device address spaces. For explicit handling, applications must allocate space in the GPU global memory, copy input data, invoke kernels, and copy results to the CPU memory. By overlapping inter-memory data transfers and GPU computation steps, applications may further reduce execution time. This research presents a software architecture with a runtime pipeline for GPU input/output scheduling that acts as a bidirectional interface between the GPU computing application and the physical device. The main aim of this system is to reduce the impact of the processor-memory performance gap by exploiting device I/O and computation overlap. Evaluation using application benchmarks shows processing improvements with speedups up to 2.37x with respect to baseline, non-streamed GPU execution. In addition, the presented input/output scheduling system is a high-level, systems abstraction that removes application software complexity while exploiting the input/output and processing concurrency capabilities of the underlying GPU.




Received from ProQuest

File Size

130 pages

File Format


Rights Holder

Julio Cesar Olaya