000 04116nam a22005295i 4500
001 978-3-031-01814-5
003 DE-He213
005 20240730163723.0
007 cr nn 008mamaa
008 220601s2016 sz | s |||| 0|eng d
020 _a9783031018145
_9978-3-031-01814-5
024 7 _a10.1007/978-3-031-01814-5
_2doi
050 4 _aTA1501-1820
050 4 _aTA1634
072 7 _aUYT
_2bicssc
072 7 _aCOM016000
_2bisacsh
072 7 _aUYT
_2thema
082 0 4 _a006
_223
100 1 _aBarnard, Kobus.
_eauthor.
_4aut
_4http://id.loc.gov/vocabulary/relators/aut
_980174
245 1 0 _aComputational Methods for Integrating Vision and Language
_h[electronic resource] /
_cby Kobus Barnard.
250 _a1st ed. 2016.
264 1 _aCham :
_bSpringer International Publishing :
_bImprint: Springer,
_c2016.
300 _aXVI, 211 p.
_bonline resource.
336 _atext
_btxt
_2rdacontent
337 _acomputer
_bc
_2rdamedia
338 _aonline resource
_bcr
_2rdacarrier
347 _atext file
_bPDF
_2rda
490 1 _aSynthesis Lectures on Computer Vision,
_x2153-1064
505 0 _aAcknowledgments -- Figure Credits -- Introduction -- The Semantics of Images and Associated Text -- Sources of Data for Linking Visual and Linguistic Information -- Extracting and Representing Visual Information -- Text and Speech Processing -- Modeling Images and Keywords -- Beyond Simple Nouns -- Sequential Structure -- Bibliography -- Author's Biography.
520 _aModeling data from visual and linguistic modalities together creates opportunities for better understanding of both, and supports many useful applications. Examples of dual visual-linguistic data include images with keywords, video with narrative, and figures in documents. We consider two key task-driven themes: translating from one modality to another (e.g., inferring annotations for images) and understanding the data using all modalities, where one modality can help disambiguate information in another. The multiple modalities can either be essentially semantically redundant (e.g., keywords provided by a person looking at the image) or largely complementary (e.g., metadata such as the camera used). Redundancy and complementarity are two endpoints of a scale, and we observe that good performance on translation requires some redundancy, and that joint inference is most useful where some information is complementary. Computational methods discussed are broadly organized into ones for simple keywords, ones going beyond keywords toward natural language, and ones considering sequential aspects of natural language. Methods for keywords are further organized based on localization of semantics, going from words about the scene taken as a whole, to words that apply to specific parts of the scene, to relationships between parts. Methods going beyond keywords are organized by the linguistic roles that are learned, exploited, or generated. These include proper nouns, adjectives, spatial and comparative prepositions, and verbs. More recent developments in dealing with sequential structure include automated captioning of scenes and video, alignment of video and text, and automated answering of questions about scenes depicted in images.
650 0 _aImage processing
_xDigital techniques.
_94145
650 0 _aComputer vision.
_980175
650 0 _aPattern recognition systems.
_93953
650 1 4 _aComputer Imaging, Vision, Pattern Recognition and Graphics.
_931569
650 2 4 _aComputer Vision.
_980176
650 2 4 _aAutomated Pattern Recognition.
_931568
710 2 _aSpringerLink (Online service)
_980178
773 0 _tSpringer Nature eBook
776 0 8 _iPrinted edition:
_z9783031006869
776 0 8 _iPrinted edition:
_z9783031029424
830 0 _aSynthesis Lectures on Computer Vision,
_x2153-1064
_980179
856 4 0 _uhttps://doi.org/10.1007/978-3-031-01814-5
912 _aZDB-2-SXSC
942 _cEBK
999 _c84912
_d84912