Data Cleaning (Record no. 84953)

000 -LEADER
fixed length control field	03940nam a22005055i 4500
001 - CONTROL NUMBER
control field	978-3-031-01897-8
005 - DATE AND TIME OF LATEST TRANSACTION
control field	20240730163744.0
008 - FIXED-LENGTH DATA ELEMENTS--GENERAL INFORMATION
fixed length control field	220601s2013 sz | s |||| 0|eng d
020 ## - INTERNATIONAL STANDARD BOOK NUMBER
ISBN	9783031018978
--	978-3-031-01897-8
082 04 - CLASSIFICATION NUMBER
Call Number	004.6
100 1# - AUTHOR NAME
Author	Ganti, Venkatesh.
245 10 - TITLE STATEMENT
Title	Data Cleaning
250 ## - EDITION STATEMENT
Edition statement	1st ed. 2013.
300 ## - PHYSICAL DESCRIPTION
Number of Pages	XV, 69 p.
490 1# - SERIES STATEMENT
Series statement	Synthesis Lectures on Data Management,
505 0# - FORMATTED CONTENTS NOTE
Remark 2	Preface -- Acknowledgments -- Introduction -- Technological Approaches -- Similarity Functions -- Operator: Similarity Join -- Operator: Clustering -- Operator: Parsing -- Task: Record Matching -- Task: Deduplication -- Data Cleaning Scripts -- Conclusion -- Bibliography -- Authors' Biographies.
520 ## - SUMMARY, ETC.
Summary, etc	Data warehouses consolidate various activities of a business and often form the backbone for generating reports that support important business decisions. Errors in data tend to creep in for a variety of reasons. Some of these reasons include errors during input data collection and errors while merging data collected independently across different databases. These errors in data warehouses often result in erroneous upstream reports, and could impact business decisions negatively. Therefore, one of the critical challenges while maintaining large data warehouses is that of ensuring the quality of data in the data warehouse remains high. The process of maintaining high data quality is commonly referred to as data cleaning. In this book, we first discuss the goals of data cleaning. Often, the goals of data cleaning are not well defined and could mean different solutions in different scenarios. Toward clarifying these goals, we abstract out a common set of data cleaning tasks that often need to be addressed. This abstraction allows us to develop solutions for these common data cleaning tasks. We then discuss a few popular approaches for developing such solutions. In particular, we focus on an operator-centric approach for developing a data cleaning platform. The operator-centric approach involves the development of customizable operators that could be used as building blocks for developing common solutions. This is similar to the approach of relational algebra for query processing. The basic set of operators can be put together to build complex queries. Finally, we discuss the development of custom scripts which leverage the basic data cleaning operators along with relational operators to implement effective solutions for data cleaning tasks.
700 1# - AUTHOR 2
Author 2	Sarma, Anish Das.
856 40 - ELECTRONIC LOCATION AND ACCESS
Uniform Resource Identifier	https://doi.org/10.1007/978-3-031-01897-8
942 ## - ADDED ENTRY ELEMENTS (KOHA)
Koha item type	eBooks
264 #1 -
--	Cham :
--	Springer International Publishing :
--	Imprint: Springer,
--	2013.
336 ## -
--	text
--	txt
--	rdacontent
337 ## -
--	computer
--	c
--	rdamedia
338 ## -
--	online resource
--	cr
--	rdacarrier
347 ## -
--	text file
--	PDF
--	rda
650 #0 - SUBJECT ADDED ENTRY--SUBJECT 1
--	Computer networks.
650 #0 - SUBJECT ADDED ENTRY--SUBJECT 1
--	Data structures (Computer science).
650 #0 - SUBJECT ADDED ENTRY--SUBJECT 1
--	Information theory.
650 14 - SUBJECT ADDED ENTRY--SUBJECT 1
--	Computer Communication Networks.
650 24 - SUBJECT ADDED ENTRY--SUBJECT 1
--	Data Structures and Information Theory.
830 #0 - SERIES ADDED ENTRY--UNIFORM TITLE
--	2153-5426
912 ## -
--	ZDB-2-SXSC
