Built In Fault Tolerant Computing Paradigm For Resilient Large Scale Chip Design

Download Built In Fault Tolerant Computing Paradigm For Resilient Large Scale Chip Design PDF/ePub or read online books in Mobi eBooks. Click Download or Read Online button to get Built In Fault Tolerant Computing Paradigm For Resilient Large Scale Chip Design book now. This website allows unlimited access to, at the time of writing, more than 1.5 million titles, including hundreds of thousands of titles in various foreign languages.
Built-in Fault-Tolerant Computing Paradigm for Resilient Large-Scale Chip Design

With the end of Dennard scaling and Moore’s law, IC chips, especially large-scale ones, now face more reliability challenges, and reliability has become one of the mainstay merits of VLSI designs. In this context, this book presents a built-in on-chip fault-tolerant computing paradigm that seeks to combine fault detection, fault diagnosis, and error recovery in large-scale VLSI design in a unified manner so as to minimize resource overhead and performance penalties. Following this computing paradigm, we propose a holistic solution based on three key components: self-test, self-diagnosis and self-repair, or “3S” for short. We then explore the use of 3S for general IC designs, general-purpose processors, network-on-chip (NoC) and deep learning accelerators, and present prototypes to demonstrate how 3S responds to in-field silicon degradation and recovery under various runtime faults caused by aging, process variations, or radical particles. Moreover, we demonstrate that 3S not only offers a powerful backbone for various on-chip fault-tolerant designs and implementations, but also has farther-reaching implications such as maintaining graceful performance degradation, mitigating the impact of verification blind spots, and improving chip yield. This book is the outcome of extensive fault-tolerant computing research pursued at the State Key Lab of Processors, Institute of Computing Technology, Chinese Academy of Sciences over the past decade. The proposed built-in on-chip fault-tolerant computing paradigm has been verified in a broad range of scenarios, from small processors in satellite computers to large processors in HPCs. Hopefully, it will provide an alternative yet effective solution to the growing reliability challenges for large-scale VLSI designs.
Dependable Multicore Architectures at Nanoscale

This book provides comprehensive coverage of the dependability challenges in today's advanced computing systems. It is an in-depth discussion of all the technological and design-level techniques that may be used to overcome these issues and analyzes various dependability-assessment methods. The impact of individual application scenarios on the definition of challenges and solutions is considered so that the designer can clearly assess the problems and adjust the solution based on the specifications in question. The book is composed of three sections, beginning with an introduction to current dependability challenges arising in complex computing systems implemented with nanoscale technologies, and of the effect of the application scenario. The second section details all the fault-tolerance techniques that are applicable in the manufacture of reliable advanced computing devices. Different levels, from technology-level fault avoidance to the use of error correcting codes and system-level checkpointing are introduced and explained as applicable to the different application scenario requirements. Finally the third section proposes a roadmap of future trends in and perspectives on the dependability and manufacturability of advanced computing systems from the special point of view of industrial stakeholders. Dependable Multicore Architectures at Nanoscale showcases the original ideas and concepts introduced into the field of nanoscale manufacturing and systems reliability over nearly four years of work within COST Action IC1103 MEDIAN, a think-tank with participants from 27 countries. Academic researchers and graduate students working in multi-core computer systems and their manufacture will find this book of interest as will industrial design and manufacturing engineers working in VLSI companies.
Encyclopedia of Parallel Computing

Author: David Padua
language: en
Publisher: Springer Science & Business Media
Release Date: 2014-07-08
Containing over 300 entries in an A-Z format, the Encyclopedia of Parallel Computing provides easy, intuitive access to relevant information for professionals and researchers seeking access to any aspect within the broad field of parallel computing. Topics for this comprehensive reference were selected, written, and peer-reviewed by an international pool of distinguished researchers in the field. The Encyclopedia is broad in scope, covering machine organization, programming languages, algorithms, and applications. Within each area, concepts, designs, and specific implementations are presented. The highly-structured essays in this work comprise synonyms, a definition and discussion of the topic, bibliographies, and links to related literature. Extensive cross-references to other entries within the Encyclopedia support efficient, user-friendly searchers for immediate access to useful information. Key concepts presented in the Encyclopedia of Parallel Computing include; laws and metrics; specific numerical and non-numerical algorithms; asynchronous algorithms; libraries of subroutines; benchmark suites; applications; sequential consistency and cache coherency; machine classes such as clusters, shared-memory multiprocessors, special-purpose machines and dataflow machines; specific machines such as Cray supercomputers, IBM’s cell processor and Intel’s multicore machines; race detection and auto parallelization; parallel programming languages, synchronization primitives, collective operations, message passing libraries, checkpointing, and operating systems. Topics covered: Speedup, Efficiency, Isoefficiency, Redundancy, Amdahls law, Computer Architecture Concepts, Parallel Machine Designs, Benmarks, Parallel Programming concepts & design, Algorithms, Parallel applications. This authoritative reference will be published in two formats: print and online. The online edition features hyperlinks to cross-references and to additional significant research. Related Subjects: supercomputing, high-performance computing, distributed computing