Manasse, Mark.

On The Efficient Determination of Most Near Neighbors [electronic resource] / by Mark Manasse. - 1st ed. 2012. - IV, 88 p. online resource. - Synthesis Lectures on Information Concepts, Retrieval, and Services, 1947-9468.

Introduction -- Comparing Web Pages for Similarity: An Overview -- A Personal History of Web Search -- Uniform Sampling after Alta Vista -- Why Weight (and How)? -- A Few Applications.

The time-worn aphorism "close only counts in horseshoes and hand-grenades" is clearly inadequate. Close also counts in golf, shuffleboard, archery, darts, curling, and other games of accuracy in which hitting the precise center of the target isn't to be expected every time, or in which we can expect to be driven from the target by skilled opponents. This lecture is not devoted to sports discussions, but to efficient algorithms for determining pairs of closely related web pages -- and a few other situations in which we have found that inexact matching is good enough; where proximity suffices. We will not, however, attempt to be comprehensive in the investigation of probabilistic algorithms, approximation algorithms, or even techniques for organizing the discovery of nearest neighbors. We are more concerned with finding nearby neighbors; if they are not particularly close by, we are not particularly interested. In thinking of when approximation is sufficient, remember the oft-told joke about two campers sitting around after dinner. They hear noises coming towards them. One of them reaches for a pair of running shoes, and starts to don them. The second then notes that even with running shoes, they cannot hope to outrun a bear, to which the first notes that most likely the bear will be satiated after catching the slower of them. We seek problems in which we don't need to be faster than the bear, just faster than the others fleeing the bear.

9783031022814

10.1007/978-3-031-02281-4 doi


Computer networks.
Computer Communication Networks.

TK5105.5-5105.9

004.6