Goal Directed Performance Tuning For Scientific Applications

Goal-directed Performance Tuning for Scientific Applications

Abstract: "Performance tuning, as carried out by compiler designers and application programmers to close the performance gap between the achievable peak and delivered performance, becomes increasingly important and challenging as the microprocessor speeds and system sizes increase. However, although performance tuning on scientific codes usually deals with relatively small program regions, it is not generally known how to establish a reasonable performance objective and how to efficiently achieve this objective. We suggest a goal-directed approach and develop such an approach for each of three major system performance components: central processor unit (CPU) computation, memory accessing, and communication.

For the CPU, we suggest using a machine-application performance model that characterizes workloads on four key function units (memory, floating-point, issue, and a virtual 'dependence unit') to produce an upper bound performance objective, and derive a mechanism to approach this objective. A case study shows an average 1.79x speedup achieved by using this approach for the Livermore Fortran Kernels 1-12 running on the IBM RS/6000.

For memory, as compulsory and capacity misses are relatively easy to characterize, we derive a method for building application-specific cache behavior models that report the number of misses for all three types of conflict misses: self, cross, and ping-pong. The method uses averaging concepts to determine the expected number of cache misses instead of attempting to count them exactly in each instance, which provides a more rapid, yet realistic assessment of expected cache behavior. For each type of conflict miss, we propose a reduction method that uses one or a combination of three techniques based on modifying or exploiting data layout: array padding, initial address adjustment, and access resequencing. A case study using a blocked matrix multiply program as an example shows that the model is within 11% of the simulation results, and that each type of conflict miss can be effectively reduced or completely eliminated.

For communication in shared memory parallel systems, we derive an array grouping mechanism and related loop transformations to reduce communication caused by the problematic case of nonconsecutive references to shared arrays and prove several theorems that determine when and where to apply this technique. The experimental results show a 15% reduction in communication, a 40% reduction in data subcache misses, and an 18% reduction in maximum user time for a finite element application on a 56 processor KSR1 parallel computer."
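
As a rough illustration of why array padding can reduce self-conflict misses, the sketch below counts how many distinct cache sets a strided walk through one array touches in a direct-mapped cache. It is not the cache behavior model from the abstract: the function name `distinct_sets`, the cache geometry (512 sets of 64-byte lines), and the array dimensions are all assumptions chosen for illustration.

```python
def distinct_sets(leading_dim, n_accesses, elem_bytes=8, line_bytes=64, n_sets=512):
    """Return the set of cache-set indices touched by a strided walk.

    Models a hypothetical direct-mapped cache (n_sets lines of line_bytes
    each) and an array whose consecutive accesses are leading_dim elements
    apart, as when walking along the slow dimension of a 2-D array.
    """
    touched = set()
    for j in range(n_accesses):
        addr = j * leading_dim * elem_bytes           # byte offset of the j-th access
        touched.add((addr // line_bytes) % n_sets)    # cache set that address maps to
    return touched


# Assumed example: 64 accesses, 8-byte elements.
unpadded = distinct_sets(leading_dim=1024, n_accesses=64)  # power-of-two stride
padded = distinct_sets(leading_dim=1032, n_accesses=64)    # 8 elements of padding

print(len(unpadded), len(padded))
```

With the power-of-two leading dimension, the 64 accesses fall into only 4 cache sets, so the array keeps evicting its own lines. Padding the leading dimension by 8 elements spreads the same accesses over 64 distinct sets.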
Performance Evaluation and Benchmarking with Realistic Applications

Performance evaluation and benchmarking are of concern to all computer-related disciplines. A benchmark is a standard program or set of programs that can be run on different computers to give an accurate measure of their performance. This book covers a variety of aspects of computer performance evaluation, with a focus on Standard Performance Evaluation Corporation (SPEC) benchmarks. SPEC is a nonprofit organization whose members represent industry, academia, and other organizations. The book discusses rationales for creating and updating benchmarks, the use of benchmarks in academic research, benchmarking methodologies, the relation of SPEC benchmarks to other benchmarking activities, shortcomings of current benchmarks, and the need for further benchmarking efforts.

Contributors: Brian Armstrong, Frederica Darema, Edward S. Davidson, Sylvia Dieckmann, Jozo J. Dujmovic, Rudolf Eigenmann, J. Kelly Flanagan, Greg Gaertner, Jonathan Geisler, John Gustafson, Urs Hölzle, Shih-Hao Hung, Kathryn S. McKinley, Reinhard Riedl, Faisal Saied, Frank Sorenson, Mark Straka, Valerie Taylor, Olivier Temam, Rajat Todi, Reinhold Weicker
Performance-oriented Application Development for Distributed Architectures

Annotation: This publication is devoted to programming models, languages, and tools for performance-oriented program development in commercial and scientific environments. The papers are based on presentations given at the PADDA 2001 workshop, whose goal was to identify common interests and techniques for performance-oriented program development across these two environments. Distributed architectures currently dominate the field of highly parallel computing, and those based on Internet and mobile computing technologies are also important target architectures in commercial computing. The papers come from both areas: scientific computing and commercial computing.