000 04478nam a22005415i 4500
001 978-3-031-01878-7
003 DE-He213
005 20240730165153.0
007 cr nn 008mamaa
008 220601s2021 sz | s |||| 0|eng d
020 _a9783031018787
_9978-3-031-01878-7
024 7 _a10.1007/978-3-031-01878-7
_2doi
050 4 _aTK5105.5-5105.9
072 7 _aUKN
_2bicssc
072 7 _aCOM043000
_2bisacsh
072 7 _aUKN
_2thema
082 0 4 _a004.6
_223
100 1 _aPapadakis, George.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_987720
245 1 4 _aThe Four Generations of Entity Resolution
_h[electronic resource] /
_cby George Papadakis, Ekaterini Ioannou, Emanouil Thanos, Themis Palpanas.
250 _a1st ed. 2021.
264 1 _aCham :
_bSpringer International Publishing :
_bImprint: Springer,
_c2021.
300 _aXVII, 152 p.
_bonline resource.
336 _atext
_btxt
_2rdacontent
337 _acomputer
_bc
_2rdamedia
338 _aonline resource
_bcr
_2rdacarrier
347 _atext file
_bPDF
_2rda
490 1 _aSynthesis Lectures on Data Management,
_x2153-5426
505 0 _aPreface -- Acknowledgments -- Entity Resolution: Past, Present, and Yet-to-Come -- Preliminaries -- Generation 1: Addressing Veracity -- Generation 2: Also Addressing Volume -- Generation 3: Also Addressing Variety -- Generation 4: Also Addressing Velocity -- Leveraging External Knowledge -- Resources for Entity Resolution -- Possible Directions for Future Work -- Bibliography -- Authors' Biographies.
520 _aEntity Resolution (ER) lies at the core of data integration and cleaning and, thus, a bulk of the research examines ways for improving its effectiveness and time efficiency. The initial ER methods primarily target Veracity in the context of structured (relational) data that are described by a schema of well-known quality and meaning. To achieve high effectiveness, they leverage schema, expert, and/or external knowledge. Part of these methods are extended to address Volume, processing large datasets through multi-core or massive parallelization approaches, such as the MapReduce paradigm. However, these early schema-based approaches are inapplicable to Web Data, which abound in voluminous, noisy, semi-structured, and highly heterogeneous information. To address the additional challenge of Variety, recent works on ER adopt a novel, loosely schema-aware functionality that emphasizes scalability and robustness to noise. Another line of present research focuses on the additional challenge of Velocity, aiming to process data collections of a continuously increasing volume. The latest works, though, take advantage of the significant breakthroughs in Deep Learning and Crowdsourcing, incorporating external knowledge to enhance the existing works to a significant extent. This synthesis lecture organizes ER methods into four generations based on the challenges posed by these four Vs. For each generation, we outline the corresponding ER workflow, discuss the state-of-the-art methods per workflow step, and present current research directions. The discussion of these methods takes into account a historical perspective, explaining the evolution of the methods over time along with their similarities and differences. The lecture also discusses the available ER tools and benchmark datasets that allow expert as well as novice users to make use of the available solutions.
650 0 _aComputer networks.
_931572
650 0 _aData structures (Computer science).
_98188
650 0 _aInformation theory.
_914256
650 1 4 _aComputer Communication Networks.
_987723
650 2 4 _aData Structures and Information Theory.
_931923
700 1 _aIoannou, Ekaterini.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_987724
700 1 _aThanos, Emanouil.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_987725
700 1 _aPalpanas, Themis.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_987728
710 2 _aSpringerLink (Online service)
_987729
773 0 _tSpringer Nature eBook
776 0 8 _iPrinted edition:
_z9783031001055
776 0 8 _iPrinted edition:
_z9783031007507
776 0 8 _iPrinted edition:
_z9783031030062
830 0 _aSynthesis Lectures on Data Management,
_x2153-5426
_987730
856 4 0 _uhttps://doi.org/10.1007/978-3-031-01878-7
912 _aZDB-2-SXSC
942 _cEBK
999 _c86140
_d86140