Fault Tolerance And Efficiency In Massively Parallel Algorithms

Foundations of Dependable Computing

Author: Gary M. Koob
Language: en
Publisher: Springer Science & Business Media
Release Date: 2007-11-23
Foundations of Dependable Computing: Paradigms for Dependable Applications presents a variety of specific approaches to achieving dependability at the application level. Driven by the higher-level fault models of Models and Frameworks for Dependable Systems, and built on the lower-level abstractions implemented in a third companion book subtitled System Implementation, these approaches demonstrate how dependability may be tuned to the requirements of an application, the fault environment, and the characteristics of the target platform. Three classes of paradigms are considered: protocol-based paradigms for distributed applications, algorithm-based paradigms for parallel applications, and approaches to exploiting application semantics in embedded real-time control systems. The companion volume subtitled Models and Frameworks for Dependable Systems presents two comprehensive frameworks for reasoning about system dependability, thereby establishing a context for understanding the roles played by the specific approaches presented in this book's two companion volumes. It then explores the range of models and analysis methods necessary to design, validate, and analyze dependable systems. Another companion book (published by Kluwer), subtitled System Implementation, explores the system infrastructure needed to support the various paradigms of Paradigms for Dependable Applications. Approaches to implementing support mechanisms and to incorporating appropriate additional levels of fault detection and fault tolerance at the processor, network, and operating-system levels are presented. A primary concern at these levels is balancing cost and performance against coverage and overall dependability. As these chapters demonstrate, low-overhead, practical solutions are attainable and not necessarily incompatible with performance considerations.
The section on innovative compiler support, in particular, demonstrates how the benefits of application specificity may be obtained while reducing hardware cost and run-time overhead.
Fault-Tolerant Parallel Computation

Author: Paris Christos Kanellakis
Language: en
Publisher: Springer Science & Business Media
Release Date: 2013-03-09
Fault-Tolerant Parallel Computation presents recent advances in algorithmic ways of introducing fault-tolerance in multiprocessors under the constraint of preserving efficiency. The difficulty in combining fault-tolerance and efficiency is that the two rely on conflicting means: fault-tolerance is achieved by introducing redundancy, while efficiency is achieved by removing it. This monograph demonstrates that, in certain models of parallel computation, efficiency and fault-tolerance can be combined. It shows how to develop efficient algorithms without concern for fault-tolerance and then execute them correctly and efficiently on parallel machines whose processors are subject to arbitrary dynamic fail-stop errors. The efficient algorithmic approaches to multiprocessor fault-tolerance presented here contribute toward bridging the gap between abstract models of parallel computation and realizable parallel architectures. Fault-Tolerant Parallel Computation presents the state of the art in algorithmic approaches to fault-tolerance in efficient parallel algorithms, synthesizing work presented at recent symposia and published in refereed journals by the authors and other leading researchers. It is the first text to take the reader on a grand tour of this new field, summarizing major results and identifying hard open problems. The monograph will interest academic and industrial researchers and graduate students working on fault-tolerance, algorithms, and parallel computation, and it may also serve as a text for a graduate course on parallel algorithmic techniques and fault-tolerance.
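The conflict between redundancy and efficiency can be made concrete with a small simulation. This is a minimal sketch under assumptions that are not taken from the monograph: a synchronous machine, N unit tasks (each writes one array cell), and fail-stop processors that crash permanently at a step given by a hypothetical `fail_at` schedule. The function `simulate` is a hypothetical helper for illustration. Work is counted as available processor steps, one unit per live processor per step.

```python
from typing import Dict, Tuple


def simulate(n: int, p: int, fail_at: Dict[int, int], replicate: bool) -> Tuple[int, int]:
    """Hypothetical helper: return (cells_written, work) for one synchronous run.

    fail_at[pid] = s means processor pid crashes before step s (fail-stop, no restart).
    work counts available processor steps: one unit per live processor per step.
    """
    x = [0] * n
    rounds = n if replicate else -(-n // p)  # static split needs ceil(n / p) steps
    work = 0
    for step in range(rounds):
        for pid in range(p):
            if fail_at.get(pid, rounds) <= step:
                continue  # crashed processors take no further steps
            work += 1
            if replicate:
                x[step] = 1  # every processor writes every cell: tolerant but costly
            else:
                lo, hi = pid * n // p, (pid + 1) * n // p
                if lo + step < hi:
                    x[lo + step] = 1  # each cell has one owner: cheap but fragile
    return sum(x), work


print(simulate(16, 4, {}, replicate=True))       # (16, 64)
print(simulate(16, 4, {}, replicate=False))      # (16, 16)
print(simulate(16, 4, {2: 1}, replicate=True))   # (16, 49)
print(simulate(16, 4, {2: 1}, replicate=False))  # (13, 13)
```

Full replication survives the crash of processor 2 but pays roughly P times the optimal work, while the static split is optimal without faults yet loses the three cells owned by the crashed processor. Robust algorithms aim for correctness like the first strategy with work close to the second.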
Fault-Tolerance and Efficiency in Massively Parallel Algorithms

We present an overview of massively parallel deterministic algorithms that combine high fault-tolerance and efficiency. This desirable combination (called robustness here) is nontrivial, since increasing efficiency implies removing redundancy, whereas increasing fault-tolerance requires adding redundancy to computations. We study a spectrum of algorithmic models for which significant robustness is achievable, ranging from synchronous computation with static faults to asynchronous computation with dynamic faults. In addition to fail-stop processor models, we examine and deal with arbitrarily initialized memory and restricted memory access concurrency. We survey deterministic upper bounds for the basic Write-All primitive (using P processors to write 1 into every cell of an N-element array despite processor faults) and lower bounds on its efficiency, and we identify some of the key open questions. We also generalize the robust computation of functions to relations; this new approach can model approximate computations. We show how to compute approximate Write-All optimally. Finally, we synthesize the state of the art in a complexity classification that extends the traditional classification of efficient parallel algorithms to account for fault-tolerance.
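The Write-All primitive (P processors must write 1 into every cell of an N-element array even though some of them crash) can be illustrated with a progress tree, the kind of binary-tree structure used in this line of work (for example, Kanellakis and Shvartsman's Algorithm W) to steer surviving processors toward unwritten cells. The Python below is a simplified, centralized simulation under stated assumptions: synchronous steps, fail-stop crashes given by a hypothetical `fail_at` schedule, N a power of two, and a "home cell" tie-breaking rule chosen for illustration. `ProgressTree` and `write_all` are hypothetical names. This is not Algorithm W itself, which also relies on additional tree-based phases to count live processors and balance them across subtrees.

```python
from typing import Dict, List, Optional, Tuple


class ProgressTree:
    """Hypothetical binary progress tree over n leaves (n a power of two), heap-indexed.

    Node 1 is the root; leaves are nodes n .. 2n-1.
    done[v] becomes True once every cell below node v has been written.
    """

    def __init__(self, n: int) -> None:
        if n < 1 or n & (n - 1):
            raise ValueError("n must be a power of two")
        self.n = n
        self.levels = n.bit_length() - 1  # number of edges from root to a leaf
        self.done = [False] * (2 * n)

    def complete(self) -> bool:
        return self.done[1]

    def mark_leaf(self, i: int) -> None:
        """Record that cell i is written and propagate completion toward the root."""
        v = self.n + i
        self.done[v] = True
        v //= 2
        while v >= 1 and self.done[2 * v] and self.done[2 * v + 1]:
            self.done[v] = True
            v //= 2

    def find_leaf(self, home: int) -> Optional[int]:
        """Walk from the root to an unwritten cell, or return None if none is left.

        If both children are unfinished, follow the bits of the processor's
        home cell (most significant first) so processors spread out.
        """
        if self.done[1]:
            return None
        v = 1
        for depth in range(self.levels):
            left, right = 2 * v, 2 * v + 1
            if self.done[left]:
                v = right
            elif self.done[right]:
                v = left
            else:
                bit = (home >> (self.levels - 1 - depth)) & 1
                v = right if bit else left
        return v - self.n


def write_all(n: int, p: int, fail_at: Dict[int, int]) -> Tuple[bool, List[int], int, int]:
    """Simulate synchronous Write-All with p fail-stop processors (hypothetical helper).

    fail_at[pid] = s means processor pid crashes before step s and never restarts.
    Returns (success, x, steps, work); work counts available processor steps.
    """
    tree = ProgressTree(n)
    x = [0] * n
    steps = work = 0
    while not tree.complete():
        live = [pid for pid in range(p) if fail_at.get(pid, float("inf")) > steps]
        if not live:
            return False, x, steps, work  # every processor has crashed
        # All live processors read the tree first, then all write: one synchronous step.
        targets = [tree.find_leaf(pid * n // p) for pid in live]
        for i in targets:
            x[i] = 1  # concurrent writes of the same value to one cell are harmless
            tree.mark_leaf(i)
        work += len(live)
        steps += 1
    return True, x, steps, work


print(write_all(16, 4, {})[2:])                  # (4, 16)
print(write_all(16, 4, {1: 1, 2: 1, 3: 1})[2:])  # (13, 16)
```

With 4 processors, 16 cells, and no faults, the run takes 4 steps and 16 units of work. If three processors crash before step 1, the survivor finishes alone in 13 steps, and because the tree only ever leads to unwritten cells, no work is wasted on rewrites.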