MARC View

000			03940nam a22005055i 4500
001			978-3-031-01897-8
003			DE-He213
005			20240730163744.0
007			cr nn 008mamaa
008			220601s2013 sz | s |||| 0|eng d
020			_a9783031018978 _9978-3-031-01897-8
024	7		_a10.1007/978-3-031-01897-8 _2doi
050		4	_aTK5105.5-5105.9
072		7	_aUKN _2bicssc
072		7	_aCOM043000 _2bisacsh
072		7	_aUKN _2thema
082	0	4	_a004.6 _223
100	1		_aGanti, Venkatesh. _eauthor. _4aut _4http://id.loc.gov/vocabulary/relators/aut _980398
245	1	0	_aData Cleaning _h[electronic resource] / _cby Venkatesh Ganti, Anish Das Sarma.
250			_a1st ed. 2013.
264		1	_aCham : _bSpringer International Publishing : _bImprint: Springer, _c2013.
300			_aXV, 69 p. _bonline resource.
336			_atext _btxt _2rdacontent
337			_acomputer _bc _2rdamedia
338			_aonline resource _bcr _2rdacarrier
347			_atext file _bPDF _2rda
490	1		_aSynthesis Lectures on Data Management, _x2153-5426
505	0		_aPreface -- Acknowledgments -- Introduction -- Technological Approaches -- Similarity Functions -- Operator: Similarity Join -- Operator: Clustering -- Operator: Parsing -- Task: Record Matching -- Task: Deduplication -- Data Cleaning Scripts -- Conclusion -- Bibliography -- Authors' Biographies.
520			_aData warehouses consolidate various activities of a business and often form the backbone for generating reports that support important business decisions. Errors in data tend to creep in for a variety of reasons. Some of these reasons include errors during input data collection and errors while merging data collected independently across different databases. These errors in data warehouses often result in erroneous downstream reports, and could impact business decisions negatively. Therefore, one of the critical challenges while maintaining large data warehouses is that of ensuring that the quality of data in the data warehouse remains high. The process of maintaining high data quality is commonly referred to as data cleaning. In this book, we first discuss the goals of data cleaning. Often, the goals of data cleaning are not well defined and could mean different solutions in different scenarios. Toward clarifying these goals, we abstract out a common set of data cleaning tasks that often need to be addressed. This abstraction allows us to develop solutions for these common data cleaning tasks. We then discuss a few popular approaches for developing such solutions. In particular, we focus on an operator-centric approach for developing a data cleaning platform. The operator-centric approach involves the development of customizable operators that could be used as building blocks for developing common solutions. This is similar to the approach of relational algebra for query processing, where a basic set of operators can be put together to build complex queries. Finally, we discuss the development of custom scripts that leverage the basic data cleaning operators along with relational operators to implement effective solutions for data cleaning tasks.
650		0	_aComputer networks. _931572
650		0	_aData structures (Computer science). _98188
650		0	_aInformation theory. _914256
650	1	4	_aComputer Communication Networks. _980399
650	2	4	_aData Structures and Information Theory. _931923
700	1		_aSarma, Anish Das. _eauthor. _4aut _4http://id.loc.gov/vocabulary/relators/aut _980400
710	2		_aSpringerLink (Online service) _980401
773	0		_tSpringer Nature eBook
776	0	8	_iPrinted edition: _z9783031007699
776	0	8	_iPrinted edition: _z9783031030253
830		0	_aSynthesis Lectures on Data Management, _x2153-5426 _980402
856	4	0	_uhttps://doi.org/10.1007/978-3-031-01897-8
912			_aZDB-2-SXSC
942			_cEBK
999			_c84953 _d84953