000 04388nam a22005535i 4500
001 978-3-031-03763-4
003 DE-He213
005 20240730164017.0
007 cr nn 008mamaa
008 220601s2022 sz | s |||| 0|eng d
020 _a9783031037634
_9978-3-031-03763-4
024 7 _a10.1007/978-3-031-03763-4
_2doi
050 4 _aQ334-342
050 4 _aTA347.A78
072 7 _aUYQ
_2bicssc
072 7 _aCOM004000
_2bisacsh
072 7 _aUYQ
_2thema
082 0 4 _a006.3
_223
100 1 _aPaun, Silviu.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_981638
245 1 0 _aStatistical Methods for Annotation Analysis
_h[electronic resource] /
_cby Silviu Paun, Ron Artstein, Massimo Poesio.
250 _a1st ed. 2022.
264 1 _aCham :
_bSpringer International Publishing :
_bImprint: Springer,
_c2022.
300 _aXIX, 197 p.
_bonline resource.
336 _atext
_btxt
_2rdacontent
337 _acomputer
_bc
_2rdamedia
338 _aonline resource
_bcr
_2rdacarrier
347 _atext file
_bPDF
_2rda
490 1 _aSynthesis Lectures on Human Language Technologies,
_x1947-4059
505 0 _aPreface -- Acknowledgements -- Introduction -- Coefficients of Agreement -- Using Agreement Measures for CL Annotation Tasks -- Probabilistic Models of Agreement -- Probabilistic Models of Annotation -- Learning from Multi-Annotated Corpora -- Bibliography -- Authors' Biographies.
520 _aLabelling data is one of the most fundamental activities in science, and has underpinned practice, particularly in medicine, for decades, as well as research in corpus linguistics since at least the development of the Brown corpus. With the shift towards Machine Learning in Artificial Intelligence (AI), the creation of datasets to be used for training and evaluating AI systems, also known in AI as corpora, has become a central activity in the field as well. Early AI datasets were created on an ad-hoc basis to tackle specific problems. As larger and more reusable datasets were created, requiring greater investment, the need for a more systematic approach to dataset creation arose to ensure increased quality. A range of statistical methods were adopted, often but not exclusively from the medical sciences, to ensure that the labels used were not subjective, or to choose among different labels provided by the coders. A wide variety of such methods is now in regular use. This book is meant to provide a survey of the most widely used among these statistical methods supporting annotation practice. As far as the authors know, this is the first book attempting to cover the two families of methods in wider use. The first family of methods is concerned with the development of labelling schemes and, in particular, ensuring that such schemes are such that sufficient agreement can be observed among the coders. The second family includes methods developed to analyze the output of coders once the scheme has been agreed upon, particularly although not exclusively to identify the most likely label for an item among those provided by the coders. The focus of this book is primarily on Natural Language Processing, the area of AI devoted to the development of models of language interpretation and production, but many if not most of the methods discussed here are also applicable to other areas of AI, or indeed, to other areas of Data Science.
650 0 _aArtificial intelligence.
_93407
650 0 _aNatural language processing (Computer science).
_94741
650 0 _aComputational linguistics.
_96146
650 1 4 _aArtificial Intelligence.
_93407
650 2 4 _aNatural Language Processing (NLP).
_931587
650 2 4 _aComputational Linguistics.
_96146
700 1 _aArtstein, Ron.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_981639
700 1 _aPoesio, Massimo.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_981640
710 2 _aSpringerLink (Online service)
_981641
773 0 _tSpringer Nature eBook
776 0 8 _iPrinted edition:
_z9783031037733
776 0 8 _iPrinted edition:
_z9783031037535
776 0 8 _iPrinted edition:
_z9783031037832
830 0 _aSynthesis Lectures on Human Language Technologies,
_x1947-4059
_981642
856 4 0 _uhttps://doi.org/10.1007/978-3-031-03763-4
912 _aZDB-2-SXSC
942 _cEBK
999 _c85215
_d85215