000 | 03940nam a22005055i 4500 | ||
---|---|---|---|
001 | 978-3-031-01897-8 | ||
003 | DE-He213 | ||
005 | 20240730163744.0 | ||
007 | cr nn 008mamaa | ||
008 | 220601s2013 sz | s |||| 0|eng d | ||
020 | _a9783031018978 _9978-3-031-01897-8 | ||
024 | 7 | _a10.1007/978-3-031-01897-8 _2doi | |
050 | 4 | _aTK5105.5-5105.9 | |
072 | 7 | _aUKN _2bicssc | |
072 | 7 | _aCOM043000 _2bisacsh | |
072 | 7 | _aUKN _2thema | |
082 | 0 | 4 | _a004.6 _223 |
100 | 1 | _aGanti, Venkatesh. _eauthor. _4aut _4http://id.loc.gov/vocabulary/relators/aut _980398 | |
245 | 1 | 0 | _aData Cleaning _h[electronic resource] / _cby Venkatesh Ganti, Anish Das Sarma. |
250 | _a1st ed. 2013. | ||
264 | 1 | _aCham : _bSpringer International Publishing : _bImprint: Springer, _c2013. | |
300 | _aXV, 69 p. _bonline resource. | ||
336 | _atext _btxt _2rdacontent | ||
337 | _acomputer _bc _2rdamedia | ||
338 | _aonline resource _bcr _2rdacarrier | ||
347 | _atext file _bPDF _2rda | ||
490 | 1 | _aSynthesis Lectures on Data Management, _x2153-5426 | |
505 | 0 | _aPreface -- Acknowledgments -- Introduction -- Technological Approaches -- Similarity Functions -- Operator: Similarity Join -- Operator: Clustering -- Operator: Parsing -- Task: Record Matching -- Task: Deduplication -- Data Cleaning Scripts -- Conclusion -- Bibliography -- Authors' Biographies. | |
520 | _aData warehouses consolidate various activities of a business and often form the backbone for generating reports that support important business decisions. Errors in data tend to creep in for a variety of reasons. Some of these reasons include errors during input data collection and errors while merging data collected independently across different databases. These errors in data warehouses often result in erroneous upstream reports, and could impact business decisions negatively. Therefore, one of the critical challenges while maintaining large data warehouses is that of ensuring the quality of data in the data warehouse remains high. The process of maintaining high data quality is commonly referred to as data cleaning. In this book, we first discuss the goals of data cleaning. Often, the goals of data cleaning are not well defined and could mean different solutions in different scenarios. Toward clarifying these goals, we abstract out a common set of data cleaning tasks that often need to be addressed. This abstraction allows us to develop solutions for these common data cleaning tasks. We then discuss a few popular approaches for developing such solutions. In particular, we focus on an operator-centric approach for developing a data cleaning platform. The operator-centric approach involves the development of customizable operators that could be used as building blocks for developing common solutions. This is similar to the approach of relational algebra for query processing. The basic set of operators can be put together to build complex queries. Finally, we discuss the development of custom scripts which leverage the basic data cleaning operators along with relational operators to implement effective solutions for data cleaning tasks. | ||
650 | 0 | _aComputer networks. _931572 | |
650 | 0 | _aData structures (Computer science). _98188 | |
650 | 0 | _aInformation theory. _914256 | |
650 | 1 | 4 | _aComputer Communication Networks. _980399 |
650 | 2 | 4 | _aData Structures and Information Theory. _931923 |
700 | 1 | _aSarma, Anish Das. _eauthor. _4aut _4http://id.loc.gov/vocabulary/relators/aut _980400 | |
710 | 2 | _aSpringerLink (Online service) _980401 | |
773 | 0 | _tSpringer Nature eBook | |
776 | 0 | 8 | _iPrinted edition: _z9783031007699 |
776 | 0 | 8 | _iPrinted edition: _z9783031030253 |
830 | 0 | _aSynthesis Lectures on Data Management, _x2153-5426 _980402 | |
856 | 4 | 0 | _uhttps://doi.org/10.1007/978-3-031-01897-8 |
912 | _aZDB-2-SXSC | ||
942 | _cEBK | ||
999 | _c84953 _d84953 |