000 04024nam a22005415i 4500
001 978-3-031-01737-7
003 DE-He213
005 20240730163655.0
007 cr nn 008mamaa
008 220601s2012 sz | s |||| 0|eng d
020 _a9783031017377
_9978-3-031-01737-7
024 7 _a10.1007/978-3-031-01737-7
_2doi
050 4 _aTK7867-7867.5
072 7 _aTJFC
_2bicssc
072 7 _aTEC008010
_2bisacsh
072 7 _aTJFC
_2thema
082 0 4 _a621.3815
_223
100 1 _aKim, Hyesoon.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_979873
245 1 0 _aPerformance Analysis and Tuning for General Purpose Graphics Processing Units (GPGPU)
_h[electronic resource] /
_cby Hyesoon Kim, Richard Vuduc, Sara Baghsorkhi, Jee Choi, Wen-mei W. Hwu.
250 _a1st ed. 2012.
264 1 _aCham :
_bSpringer International Publishing :
_bImprint: Springer,
_c2012.
300 _aXII, 88 p.
_bonline resource.
336 _atext
_btxt
_2rdacontent
337 _acomputer
_bc
_2rdamedia
338 _aonline resource
_bcr
_2rdacarrier
347 _atext file
_bPDF
_2rda
490 1 _aSynthesis Lectures on Computer Architecture,
_x1935-3243
505 0 _aGPU Design, Programming, and Trends -- Performance Principles -- From Principles to Practice: Analysis and Tuning -- Using Detailed Performance Analysis to Guide Optimization.
520 _aGeneral-purpose graphics processing units (GPGPU) have emerged as an important class of shared-memory parallel processing architectures, with widespread deployment in every computer class from high-end supercomputers to embedded mobile platforms. Relative to today's more traditional multicore systems, GPGPUs have distinctly higher degrees of hardware multithreading (hundreds of hardware thread contexts vs. tens), a return to wide vector units (several tens vs. 1-10), memory architectures that deliver higher peak memory bandwidth (hundreds of gigabytes per second vs. tens), and smaller caches/scratchpad memories (less than 1 megabyte vs. 1-10 megabytes). In this book, we provide a high-level overview of current GPGPU architectures and programming models. We review the principles used in earlier shared-memory parallel platforms, focusing on recent results in both the theory and practice of parallel algorithms, and suggest a connection to GPGPU platforms. We aim to give architects hints for understanding the algorithmic aspects of GPGPU. We also provide detailed performance analysis and guide optimizations, from high-level algorithms down to low-level, instruction-level optimizations. As a case study, we use the fast multipole method (FMM) for n-body particle simulations. We also briefly survey the state of the art in GPU performance analysis tools and techniques. Table of Contents: GPU Design, Programming, and Trends / Performance Principles / From Principles to Practice: Analysis and Tuning / Using Detailed Performance Analysis to Guide Optimization.
650 0 _aElectronic circuits.
_919581
650 0 _aMicroprocessors.
_979874
650 0 _aComputer architecture.
_93513
650 1 4 _aElectronic Circuits and Systems.
_979875
650 2 4 _aProcessor Architectures.
_979876
700 1 _aVuduc, Richard.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_979877
700 1 _aBaghsorkhi, Sara.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_979878
700 1 _aChoi, Jee.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_979879
700 1 _aHwu, Wen-mei W.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_979880
710 2 _aSpringerLink (Online service)
_979881
773 0 _tSpringer Nature eBook
776 0 8 _iPrinted edition:
_z9783031006098
776 0 8 _iPrinted edition:
_z9783031028656
830 0 _aSynthesis Lectures on Computer Architecture,
_x1935-3243
_979882
856 4 0 _uhttps://doi.org/10.1007/978-3-031-01737-7
912 _aZDB-2-SXSC
942 _cEBK
999 _c84863
_d84863