Date of Award
Doctor of Philosophy
Patricia J. Teller
Simultaneous multithreading (SMT) allows multiple hardware threads to execute concurrently on a processor core, potentially increasing the utilization and throughput of the core by a factor equal to the degree of multithreading. However, such performance gains may not be achieved because of contention for resources shared by the threads. Hardware thread priorities can be used to control the ratio of decode cycles allocated to the hardware threads of a processor core and, therefore, the degree of resource contention among the threads. The IBM POWER5, which has two hardware threads associated with each of its two cores, supports hardware thread priorities. The best priority settings, or best priority pair, i.e., the priority settings that provide the best throughput for a given co-schedule of application threads, depend on the characteristics of the application threads comprising the co-schedule.
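To make the link between a priority pair and the decode-cycle ratio concrete, the following is a minimal sketch. It assumes the rule usually given in IBM's POWER5 documentation, which this abstract does not state: for two threads with priorities X and Y, R = 2^(|X - Y| + 1), and the lower-priority thread receives 1 of every R decode cycles. The function name `decode_share` is hypothetical, and the sketch covers only priorities 2 through 6; the special low-power and single-thread priority settings are not modeled.

```python
def decode_share(prio_a: int, prio_b: int) -> tuple[float, float]:
    """Return the fraction of decode cycles given to thread A and thread B.

    Assumption (not from the abstract): R = 2 ** (|prio_a - prio_b| + 1),
    and the lower-priority thread gets 1 of every R decode cycles.
    """
    if not (2 <= prio_a <= 6 and 2 <= prio_b <= 6):
        raise ValueError("this sketch only models priorities 2 through 6")

    if prio_a == prio_b:
        # Equal priorities: R = 2, so the threads alternate decode cycles.
        return (0.5, 0.5)

    r = 2 ** (abs(prio_a - prio_b) + 1)
    low = 1 / r        # share of the lower-priority thread
    high = 1 - low     # remaining cycles go to the higher-priority thread
    return (high, low) if prio_a > prio_b else (low, high)


if __name__ == "__main__":
    for pair in [(4, 4), (5, 4), (6, 4), (2, 6)]:
        print(pair, decode_share(*pair))
```

Under this assumed rule, a priority difference of two, e.g., the pair (6, 4), gives the favored thread seven of every eight decode cycles, which shows how quickly a small priority difference shifts shared-resource pressure toward one thread.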
In this dissertation, we first demonstrate that the judicious setting of hardware thread priorities can be used to improve SMT processor core throughput. Then, we present a methodology for predicting the best priority pair for a given co-schedule of two application threads, each of which maps to a code segment that is characterized throughout by one shareable resource signature, which describes the code segment's utilization of shareable core resources when executed in single-threaded mode. Given a co-schedule of two such application threads, i.e., thread1 and thread2, thread1's shareable resource signature provides insights about the availability of core resources, which will be shared by the threads in SMT mode, for the use of thread2. The methodology and an implementation of the methodology for IBM's POWER5 processor make significant contributions toward application characterization that are useful in terms of determining hardware thread priorities that improve SMT processor core throughput and that may be useful in terms of phase detection, multi-core scheduling, and power management. The major contributions of this dissertation are:
1. the realization that hardware thread priorities can be used to enhance SMT processor throughput,
2. the notion of a shareable resource signature,
3. the best priority pair prediction methodology, and
4. demonstration of the utility of the methodology, e.g.,
For the 21 application thread co-schedules studied, the implementation for the IBM POWER5 processor achieved throughput that is between 0.59% and 16.442% better than default for nine of the 21 co-schedules and equal to default for 11 of the 21.
For application threads with 10% or higher floating-point unit utilization, the predicted best priority pair yields throughput that is between 3.56% and 16.49% higher than that of the default priorities.
17 out of 10,000 shareable resource signatures are sufficient to represent 95.6% of the execution time of 20 SPEC CPU2006 and 3 NAS NPB3.2 serial benchmarks (3 data inputs), and 10 PETSc KSP solvers (12 data inputs). The PETSc KSP solvers had signatures that were independent of input data, while only one of the three NAS NPB benchmarks (bt-mz) had a signature that was independent of the input data.
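The coverage claim above can be read as a cumulative-share calculation: rank signatures by the fraction of execution time they account for and count how many are needed to reach a target. The sketch below illustrates that reading only. The abstract does not say how the 17 signatures were selected, so the descending-order selection, the function name `signatures_needed`, and the sample fractions are all assumptions for illustration, not data from the dissertation.

```python
from typing import Iterable, Tuple


def signatures_needed(time_fractions: Iterable[float], target: float) -> Tuple[int, float]:
    """Return (count, covered): the smallest number of signatures whose
    combined share of execution time reaches `target`, and that share.

    Signatures are taken in decreasing order of time share, which gives
    the minimum count for any target (assumed selection rule).
    """
    covered = 0.0
    ranked = sorted(time_fractions, reverse=True)
    for count, share in enumerate(ranked, start=1):
        covered += share
        if covered >= target:
            return count, covered
    raise ValueError("target coverage exceeds the total of all signatures")


if __name__ == "__main__":
    # Hypothetical per-signature shares of total execution time.
    shares = [0.125, 0.5, 0.25, 0.125]
    print(signatures_needed(shares, 0.75))
```

With the hypothetical shares shown, two signatures cover 75% of execution time; the same calculation applied to measured shares would show how few signatures dominate a workload.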
Received from ProQuest
Mitesh Ramesh Meswani
Meswani, Mitesh Ramesh, "Improving Throughput of Simultaneous Multithreaded (SMT) Processors using Shareable Resource Signatures and Hardware Thread Priorities" (2009). Open Access Theses & Dissertations. 2730.