000 04145nam a22004575i 4500
001 978-3-319-21903-5
003 DE-He213
005 20200421112557.0
007 cr nn 008mamaa
008 160203s2016 sz | s |||| 0|eng d
020 _a9783319219035
_9978-3-319-21903-5
024 7 _a10.1007/978-3-319-21903-5
_2doi
050 4 _aQA76.6-76.66
072 7 _aUM
_2bicssc
072 7 _aCOM051000
_2bisacsh
082 0 4 _a005.11
_223
100 1 _aNielsen, Frank.
_eauthor.
245 1 0 _aIntroduction to HPC with MPI for Data Science
_h[electronic resource] /
_cby Frank Nielsen.
250 _a1st ed. 2016.
264 1 _aCham :
_bSpringer International Publishing :
_bImprint: Springer,
_c2016.
300 _aXXXIII, 282 p. 101 illus. in color.
_bonline resource.
336 _atext
_btxt
_2rdacontent
337 _acomputer
_bc
_2rdamedia
338 _aonline resource
_bcr
_2rdacarrier
347 _atext file
_bPDF
_2rda
490 1 _aUndergraduate Topics in Computer Science,
_x1863-7310
505 0 _aPreface -- Part 1: High Performance Computing (HPC) with the Message Passing Interface (MPI) -- A Glance at High Performance Computing (HPC) -- Introduction to MPI: The Message Passing Interface -- Topology of Interconnection Networks -- Parallel Sorting -- Parallel Linear Algebra -- The MapReduce Paradigm -- Part 2: High Performance Computing for Data Science -- Partition-based Clustering with k-means -- Hierarchical Clustering -- Supervised Learning: Practice and Theory of Classification with the k-NN Rule -- Fast Approximate Optimization in High Dimensions with Core-sets and Fast Dimension Reduction -- Parallel Algorithms for Graphs -- Appendix A: Written Exam -- Appendix B: SLURM: A Resource Manager and Job Scheduler on Clusters of Machines -- Appendix C: List of Figures -- Appendix D: List of Tables -- Appendix E: Index.
520 _aThis gentle introduction to High Performance Computing (HPC) for Data Science using the Message Passing Interface (MPI) standard has been designed as a first course for undergraduates on parallel programming on distributed memory models, and requires only basic programming notions. The book is divided into two parts: the first covers high performance computing using C++ with the Message Passing Interface (MPI) standard, and the second covers high-performance data analytics on computer clusters. In the first part, the fundamental notions of blocking versus non-blocking point-to-point communications, global communications (such as broadcast and scatter), and collaborative computations (reduce), together with the Amdahl and Gustafson speed-up laws, are described before parallel sorting and parallel linear algebra on computer clusters are addressed. The common ring, torus and hypercube topologies of clusters are then explained, and global communication procedures on these topologies are studied. This first part closes with the MapReduce (MR) model of computation, which is well suited to processing big data using the MPI framework. In the second part, the book focuses on high-performance data analytics. Flat and hierarchical clustering algorithms are introduced for data exploration, along with how to program these algorithms on computer clusters, followed by machine learning classification and an introduction to graph analytics. This part closes with a concise introduction to data core-sets, which make big data problems amenable to tiny data problems. Exercises are included at the end of each chapter so that students can practice the concepts learned, and a final section contains an overall exam that allows them to evaluate how well they have assimilated the material covered in the book.
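For reference, the two speed-up laws named in the summary can be stated compactly, with the notation assumed here (not quoted from the book): alpha_seq for the fraction of inherently sequential work and P for the number of processors:

\[
S_{\mathrm{Amdahl}}(P) = \frac{1}{\alpha_{\mathrm{seq}} + \frac{1-\alpha_{\mathrm{seq}}}{P}} \le \frac{1}{\alpha_{\mathrm{seq}}},
\qquad
S_{\mathrm{Gustafson}}(P) = \alpha_{\mathrm{seq}} + (1-\alpha_{\mathrm{seq}})\,P .
\]

And a minimal C++ sketch of the blocking versus non-blocking point-to-point communications and the reduce collective mentioned in the summary, using only standard MPI calls (MPI_Send, MPI_Irecv, MPI_Wait, MPI_Reduce); this is an illustrative example, not an excerpt from the book:

// Illustrative MPI sketch (not from the book).
// Build and run, assuming an MPI toolchain: mpicxx demo.cpp -o demo && mpirun -np 2 ./demo
#include <mpi.h>
#include <cstdio>

int main(int argc, char** argv) {
    MPI_Init(&argc, &argv);
    int rank = 0, size = 1;
    MPI_Comm_rank(MPI_COMM_WORLD, &rank);
    MPI_Comm_size(MPI_COMM_WORLD, &size);

    int payload = rank;
    if (rank == 0 && size > 1) {
        // Blocking send: returns once the send buffer is safe to reuse.
        MPI_Send(&payload, 1, MPI_INT, 1, 0, MPI_COMM_WORLD);
    } else if (rank == 1) {
        int received = -1;
        MPI_Request req;
        // Non-blocking receive: post it, optionally overlap computation, then wait.
        MPI_Irecv(&received, 1, MPI_INT, 0, 0, MPI_COMM_WORLD, &req);
        MPI_Wait(&req, MPI_STATUS_IGNORE);
        std::printf("rank 1 received %d\n", received);
    }

    // Collaborative computation (reduce): sum every rank's payload onto rank 0.
    int sum = 0;
    MPI_Reduce(&payload, &sum, 1, MPI_INT, MPI_SUM, 0, MPI_COMM_WORLD);
    if (rank == 0) std::printf("sum of ranks = %d\n", sum);

    MPI_Finalize();
    return 0;
}

The design point the sketch illustrates: MPI_Send blocks only until its buffer can be reused, while MPI_Irecv returns immediately and defers completion to MPI_Wait, which is what allows communication to overlap with computation.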
650 0 _aComputer science.
650 0 _aComputer programming.
650 1 4 _aComputer Science.
650 2 4 _aProgramming Techniques.
710 2 _aSpringerLink (Online service)
773 0 _tSpringer eBooks
776 0 8 _iPrinted edition:
_z9783319219028
830 0 _aUndergraduate Topics in Computer Science,
_x1863-7310
856 4 0 _uhttp://dx.doi.org/10.1007/978-3-319-21903-5
912 _aZDB-2-SCS
942 _cEBK
999 _c59215
_d59215