000 03940nam a22005055i 4500
001 978-3-031-01897-8
003 DE-He213
005 20240730163744.0
007 cr nn 008mamaa
008 220601s2013 sz | s |||| 0|eng d
020 _a9783031018978
_9978-3-031-01897-8
024 7 _a10.1007/978-3-031-01897-8
_2doi
050 4 _aTK5105.5-5105.9
072 7 _aUKN
_2bicssc
072 7 _aCOM043000
_2bisacsh
072 7 _aUKN
_2thema
082 0 4 _a004.6
_223
100 1 _aGanti, Venkatesh.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_980398
245 1 0 _aData Cleaning
_h[electronic resource] /
_cby Venkatesh Ganti, Anish Das Sarma.
250 _a1st ed. 2013.
264 1 _aCham :
_bSpringer International Publishing :
_bImprint: Springer,
_c2013.
300 _aXV, 69 p.
_bonline resource.
336 _atext
_btxt
_2rdacontent
337 _acomputer
_bc
_2rdamedia
338 _aonline resource
_bcr
_2rdacarrier
347 _atext file
_bPDF
_2rda
490 1 _aSynthesis Lectures on Data Management,
_x2153-5426
505 0 _aPreface -- Acknowledgments -- Introduction -- Technological Approaches -- Similarity Functions -- Operator: Similarity Join -- Operator: Clustering -- Operator: Parsing -- Task: Record Matching -- Task: Deduplication -- Data Cleaning Scripts -- Conclusion -- Bibliography -- Authors' Biographies.
520 _aData warehouses consolidate various activities of a business and often form the backbone for generating reports that support important business decisions. Errors in data tend to creep in for a variety of reasons. Some of these reasons include errors during input data collection and errors while merging data collected independently across different databases. These errors in data warehouses often result in erroneous upstream reports, and could impact business decisions negatively. Therefore, one of the critical challenges while maintaining large data warehouses is that of ensuring the quality of data in the data warehouse remains high. The process of maintaining high data quality is commonly referred to as data cleaning. In this book, we first discuss the goals of data cleaning. Often, the goals of data cleaning are not well defined and could mean different solutions in different scenarios. Toward clarifying these goals, we abstract out a common set of data cleaning tasks that often need to be addressed. This abstraction allows us to develop solutions for these common data cleaning tasks. We then discuss a few popular approaches for developing such solutions. In particular, we focus on an operator-centric approach for developing a data cleaning platform. The operator-centric approach involves the development of customizable operators that could be used as building blocks for developing common solutions. This is similar to the approach of relational algebra for query processing. The basic set of operators can be put together to build complex queries. Finally, we discuss the development of custom scripts which leverage the basic data cleaning operators along with relational operators to implement effective solutions for data cleaning tasks.
650 0 _aComputer networks.
_931572
650 0 _aData structures (Computer science).
_98188
650 0 _aInformation theory.
_914256
650 1 4 _aComputer Communication Networks.
_980399
650 2 4 _aData Structures and Information Theory.
_931923
700 1 _aSarma, Anish Das.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_980400
710 2 _aSpringerLink (Online service)
_980401
773 0 _tSpringer Nature eBook
776 0 8 _iPrinted edition:
_z9783031007699
776 0 8 _iPrinted edition:
_z9783031030253
830 0 _aSynthesis Lectures on Data Management,
_x2153-5426
_980402
856 4 0 _uhttps://doi.org/10.1007/978-3-031-01897-8
912 _aZDB-2-SXSC
942 _cEBK
999 _c84953
_d84953