The Intel Xeon Phi accelerator is currently used in several large-scale computer clusters and supercomputers to improve the execution-time performance of computation-intensive applications. While comprehensively profiling the execution-time behavior of the applications in the Rodinia Benchmark suite on the Intel Xeon Phi, we observed large variations in execution time across repeated runs of the same application. In this report we present the average execution time over multiple runs of each application. We also describe the steps we took in an attempt to identify and reduce this variation.
For example, we performed a brief study of one of these applications, a matrix-multiply kernel. Improving the kernel's vectorization reduced the variation from an average of 25% to an average of 10%. However, we did not identify the root cause of the remaining variation. The other applications exhibit similar levels of variation. We therefore hypothesize that the variation is caused either by the hardware or by performance issues related to how OpenMP is used.
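The measurement described above can be sketched in Python as repeated timing of a kernel followed by a variation statistic. This is a minimal illustration, not the procedure used in the study. The report does not define its variation metric, so the sketch assumes the spread between the slowest and fastest run expressed as a percentage of the mean run time. The helper names `time_runs` and `relative_variation` are hypothetical.

```python
import statistics
import time
from typing import Callable, List


def time_runs(kernel: Callable[[], object], runs: int = 10) -> List[float]:
    """Run `kernel` `runs` times and return each run's wall-clock time in seconds."""
    times = []
    for _ in range(runs):
        start = time.perf_counter()
        kernel()
        times.append(time.perf_counter() - start)
    return times


def relative_variation(times: List[float]) -> float:
    """Return (max - min) of the run times as a percentage of their mean.

    ASSUMPTION: this is one common way to quantify run-to-run variation;
    the report does not state which metric it used.
    """
    if len(times) < 2:
        raise ValueError("need at least two runs to measure variation")
    mean = statistics.mean(times)
    return 100.0 * (max(times) - min(times)) / mean
```

For example, three hypothetical run times of 9 s, 10 s, and 11 s have a mean of 10 s and a spread of 2 s, giving a variation of 20%. Any callable, such as a wrapper around a matrix-multiply kernel, can be passed to `time_runs`. The resulting times are then fed to `relative_variation`.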