Selects documents that contain the word you specify plus words that are similar to the query term. The TYPO/N operator performs approximate pattern matching to identify similar words. This makes it well suited to environments where documents have been scanned using optical character recognition (OCR), because OCR errors often produce misspelled words.
The optional N variable in the operator name expresses the maximum number of errors allowed between the query term and a matched term, a value called the error distance. If N is not specified, an error distance of 2 is used.
The error distance between two words is based on counting errors, where an error is a single-character insertion, deletion, or transposition. Here, a transposition means replacing one character with another (a substitution), not swapping two adjacent characters. For example, in each of these pairs the second word matches the first within an error distance of 1:
mouse, house (m → h)
agreed, greed (a is deleted)
cat, coat (o is inserted)
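The error distance described above closely resembles the standard Levenshtein edit distance. The following Python sketch assumes that model (an insertion, a deletion, or a single-character replacement each costs 1); the product's actual implementation may differ in detail, and the function name `error_distance` is illustrative only.

```python
def error_distance(a: str, b: str) -> int:
    """Levenshtein distance (assumed model): the minimum number of
    single-character insertions, deletions, and replacements
    needed to turn a into b."""
    # prev[j] holds the distance between the previous prefix of a and b[:j]
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, start=1):
        curr = [i]  # turning a[:i] into "" takes i deletions
        for j, cb in enumerate(b, start=1):
            cost = 0 if ca == cb else 1
            curr.append(min(
                prev[j] + 1,         # delete ca
                curr[j - 1] + 1,     # insert cb
                prev[j - 1] + cost,  # replace ca with cb (free if equal)
            ))
        prev = curr
    return prev[-1]


# Each pair from the examples above has a distance of 1.
for first, second in [("mouse", "house"), ("agreed", "greed"), ("cat", "coat")]:
    print(first, second, error_distance(first, second))
```

Under this model, `error_distance("sweeping", "swimming")` is 3 and `error_distance("swept", "kept")` is 2, consistent with the query examples that follow.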
For the query below, documents containing the words "sweeping" or "swimming" will match, because "swimming" differs from "sweeping" by 3 transpositions (e → i, e → m, p → m).
<TYPO/3> sweeping
Both of the queries below return the same results, because an unspecified N defaults to 2. Documents containing the words "swept" or "kept" will match, since "kept" differs from "swept" by 1 deletion (s) and 1 transposition (w → k).
<TYPO/2> swept
<TYPO> swept
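The default for N can be illustrated with a small parser. This is a hypothetical helper (`parse_typo` is not part of any product API) that assumes the query consists of only the operator followed by a single term; it shows why `<TYPO/2> swept` and `<TYPO> swept` are equivalent.

```python
import re

# Hypothetical grammar: "<TYPO>" or "<TYPO/N>" followed by one term.
_TYPO_QUERY = re.compile(r"<TYPO(?:/(\d+))?>\s*(\S+)")
DEFAULT_ERROR_DISTANCE = 2  # used when N is omitted


def parse_typo(query: str) -> tuple[int, str]:
    """Return (error_distance, term) for a simple TYPO/N query."""
    match = _TYPO_QUERY.fullmatch(query.strip())
    if match is None:
        raise ValueError(f"not a simple TYPO query: {query!r}")
    n = int(match.group(1)) if match.group(1) else DEFAULT_ERROR_DISTANCE
    return n, match.group(2)


print(parse_typo("<TYPO/2> swept"))  # (2, 'swept')
print(parse_typo("<TYPO> swept"))    # (2, 'swept')
```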
The TYPO/N operator must scan the collection's word list to find candidate matching words. This makes it impractical for large collections (more than 100,000 documents, unless a current spanning word list is available) or for performance-sensitive environments. Performance can be improved by generating a spanning word list for the collections to be searched.
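To see why scanning the word list is costly, consider this sketch of a linear scan. It assumes the Levenshtein-style error model and uses a plain Python list as a stand-in for the collection's word list; `within_distance` and `scan_word_list` are hypothetical names. Every word in the list needs its own distance check, so the work grows with the size of the word list; the length filter and early exit below only reduce the cost per word.

```python
def within_distance(query: str, word: str, n: int) -> bool:
    """True if word is within n errors of query (assumed Levenshtein model)."""
    # Lengths differing by more than n need more than n insertions/deletions.
    if abs(len(query) - len(word)) > n:
        return False
    prev = list(range(len(word) + 1))
    for i, cq in enumerate(query, start=1):
        curr = [i]
        for j, cw in enumerate(word, start=1):
            curr.append(min(prev[j] + 1, curr[j - 1] + 1,
                            prev[j - 1] + (cq != cw)))
        # The smallest value in a row never decreases in later rows,
        # so once it exceeds n the final distance must too.
        if min(curr) > n:
            return False
        prev = curr
    return prev[-1] <= n


def scan_word_list(query: str, word_list: list[str], n: int = 2) -> list[str]:
    """Linear scan: one bounded distance check per word in the list."""
    return [w for w in word_list if within_distance(query, w, n)]


words = ["swept", "kept", "sweep", "apple", "swimming"]
print(scan_word_list("swept", words))  # ['swept', 'kept', 'sweep']
```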
Note these limitations: a query term specified with TYPO/N can have a maximum length of 32 characters, and TYPO/N is not supported with multi-byte character sets.
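A client application could reject unsupported terms before submitting a query. This hypothetical check (`check_typo_term` is an illustrative name) enforces the 32-character limit and, as a rough stand-in for the multi-byte restriction, rejects terms that cannot be encoded in the single-byte Latin-1 character set. The documented restriction concerns the character set in use, not individual terms, so treat the second test only as an approximation.

```python
# Limit taken from the documentation above.
MAX_TYPO_TERM_LENGTH = 32


def check_typo_term(term: str) -> None:
    """Raise ValueError if term violates the documented TYPO/N limits.

    Hypothetical client-side check. The multi-byte test is an assumed
    approximation: it accepts only characters encodable in Latin-1.
    """
    if len(term) > MAX_TYPO_TERM_LENGTH:
        raise ValueError(
            f"TYPO/N terms are limited to {MAX_TYPO_TERM_LENGTH} characters"
        )
    try:
        term.encode("latin-1")
    except UnicodeEncodeError:
        raise ValueError("TYPO/N does not support multi-byte characters") from None


check_typo_term("sweeping")  # passes silently
```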