Fault-Tolerance Techniques for High-Performance Computing (Record no. 58344)

000 -LEADER
fixed length control field 03801nam a22005055i 4500
001 - CONTROL NUMBER
control field 978-3-319-20943-2
005 - DATE AND TIME OF LATEST TRANSACTION
control field 20200421112542.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field 150701s2015 gw | s |||| 0|eng d
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
ISBN 9783319209432
-- 978-3-319-20943-2
082 04 - CLASSIFICATION NUMBER
Call Number 004.24
245 10 - TITLE STATEMENT
Title Fault-Tolerance Techniques for High-Performance Computing
300 ## - PHYSICAL DESCRIPTION
Number of Pages IX, 320 p. 113 illus.
490 1# - SERIES STATEMENT
Series statement Computer Communications and Networks,
505 0# - FORMATTED CONTENTS NOTE
Remark 2 Part I: General Overview -- Fault-Tolerance Techniques for High-Performance Computing -- Part II: Technical Contributions -- Errors and Faults -- Fault-Tolerant MPI -- Using Replication for Resilience on Exascale Systems -- Energy-Aware Checkpointing Strategies.
520 ## - SUMMARY, ETC.
Summary, etc This timely text/reference presents a comprehensive overview of fault tolerance techniques for high-performance computing (HPC). The text opens with a detailed introduction to the concepts of checkpoint protocols and scheduling algorithms, prediction, replication, silent error detection and correction, together with some application-specific techniques such as algorithm-based fault tolerance. Emphasis is placed on analytical performance models. This is then followed by a review of general-purpose techniques, including several checkpoint and rollback recovery protocols. Relevant execution scenarios are also evaluated and compared through quantitative models. Topics and features: includes self-contained contributions from an international selection of preeminent experts; provides a survey of resilience methods and performance models; examines the various sources for errors and faults in large-scale systems, detailing their characteristics, with a focus on modeling, detection and prediction; reviews the spectrum of techniques that can be applied to design a fault-tolerant message passing interface; investigates different approaches to replication, comparing these to the traditional checkpoint-recovery approach; discusses the challenge of energy consumption of fault-tolerance methods in extreme-scale systems, proposing a methodology to estimate such energy consumption. This authoritative volume is essential reading for all researchers and graduate students involved in high-performance computing. Dr. Thomas Herault is a Research Scientist in the Innovative Computing Laboratory (ICL) at the University of Tennessee, Knoxville, TN, USA. Dr. Yves Robert is a Professor in the Laboratory of Parallel Computing at the Ecole Normale Supérieure de Lyon, France, and a Visiting Research Scholar in the ICL.
650 #0 - SUBJECT ADDED ENTRY--SUBJECT 1
General subdivision Reusability.
700 1# - AUTHOR 2
Author 2 Herault, Thomas.
700 1# - AUTHOR 2
Author 2 Robert, Yves.
856 40 - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier http://dx.doi.org/10.1007/978-3-319-20943-2
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Koha item type eBooks
264 #1 -
-- Cham :
-- Springer International Publishing :
-- Imprint: Springer,
-- 2015.
336 ## -
-- text
-- txt
-- rdacontent
337 ## -
-- computer
-- c
-- rdamedia
338 ## -
-- online resource
-- cr
-- rdacarrier
347 ## -
-- text file
-- PDF
-- rda
650 #0 - SUBJECT ADDED ENTRY--SUBJECT 1
-- Computer science.
650 #0 - SUBJECT ADDED ENTRY--SUBJECT 1
-- Computer software
650 #0 - SUBJECT ADDED ENTRY--SUBJECT 1
-- Computer system failures.
650 #0 - SUBJECT ADDED ENTRY--SUBJECT 1
-- Numerical analysis.
650 14 - SUBJECT ADDED ENTRY--SUBJECT 1
-- Computer Science.
650 24 - SUBJECT ADDED ENTRY--SUBJECT 1
-- System Performance and Evaluation.
650 24 - SUBJECT ADDED ENTRY--SUBJECT 1
-- Performance and Reliability.
650 24 - SUBJECT ADDED ENTRY--SUBJECT 1
-- Numeric Computing.
830 #0 - SERIES ADDED ENTRY--UNIFORM TITLE
-- 1617-7975
912 ## -
-- ZDB-2-SCS
