TYPO/N

Selects documents that contain the word you specify plus words that are similar to the query term. The TYPO/N operator performs approximate pattern matching to identify similar words. This makes it ideal for use in an environment where documents have been scanned using optical character recognition (OCR).

The optional N variable in the operator name expresses the maximum number of errors between the query term and a matched term, a value called the error distance. If N is not specified, an error distance of 2 is used.

The error distance between two words is the number of errors needed to turn one word into the other, where an error is a character insertion, deletion, or transposition (here, one character replaced by another, as in m → h). For example, in each of these pairs, the second word matches the first within an error distance of 1:

mouse, house (m → h)
agreed, greed (a is deleted)
cat, coat (o is inserted)
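
The error distance described above can be sketched in Python. This is only an illustration, not the operator's actual implementation, which is not documented here. It assumes the distance is the standard edit distance in which each insertion, deletion, or single-character replacement counts as one error; the function name error_distance is hypothetical.

```python
def error_distance(query: str, candidate: str) -> int:
    """Count the single-character errors needed to turn query into candidate.

    Assumption: an error is an insertion, a deletion, or the replacement of
    one character by another (what the text above calls a transposition).
    """
    # previous[j] = distance between the query prefix processed so far
    # and the first j characters of candidate.
    previous = list(range(len(candidate) + 1))
    for i, q_char in enumerate(query, start=1):
        current = [i]
        for j, c_char in enumerate(candidate, start=1):
            current.append(min(
                previous[j] + 1,                       # delete q_char
                current[j - 1] + 1,                    # insert c_char
                previous[j - 1] + (q_char != c_char),  # replace, or keep if equal
            ))
        previous = current
    return previous[-1]


# The three word pairs listed above each come out at a distance of 1.
for first, second in [("mouse", "house"), ("agreed", "greed"), ("cat", "coat")]:
    print(first, second, error_distance(first, second))
```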

For the query below, documents with the words "sweeping" and "swimming" will match, since "swimming" differs from "sweeping" by 3 transpositions (e → i, e → m, p → m).

<TYPO/3> sweeping

Both of the queries below return the same results, because the default error distance is 2. Documents containing the words "swept" and "kept" will match, since "kept" differs from "swept" by 1 transposition (w → k) and 1 deletion (s).

<TYPO/2> swept
<TYPO> swept

The TYPO/N operator must scan the collection's word list to find candidate matching words. This makes it impractical in performance-sensitive environments and in large collections (more than 100,000 documents), unless a current spanning word list is available. Performance can be improved by generating a spanning word list for the collections to be searched.
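
To illustrate why this scan is costly, the sketch below compares a query term against every entry of a word list. It is a rough model under stated assumptions, not the product's algorithm: the names within_distance and expand_typo, the sample word list, and the use of standard edit distance are all hypothetical. The two early exits show the kind of pruning a scan can apply, but every word is still visited once.

```python
def within_distance(query: str, word: str, limit: int) -> bool:
    """Return True if word is within `limit` single-character errors of query."""
    # A length gap larger than the limit cannot be bridged, so skip early.
    if abs(len(query) - len(word)) > limit:
        return False
    previous = list(range(len(word) + 1))
    for i, q_char in enumerate(query, start=1):
        current = [i]
        for j, w_char in enumerate(word, start=1):
            current.append(min(
                previous[j] + 1,                       # delete q_char
                current[j - 1] + 1,                    # insert w_char
                previous[j - 1] + (q_char != w_char),  # replace, or keep if equal
            ))
        # The smallest value in a row never shrinks in later rows,
        # so stop once every entry already exceeds the limit.
        if min(current) > limit:
            return False
        previous = current
    return previous[-1] <= limit


def expand_typo(query: str, word_list: list[str], limit: int = 2) -> list[str]:
    """Scan every word in word_list and keep those within the limit."""
    return [word for word in word_list if within_distance(query, word, limit)]


# Hypothetical word list standing in for a collection's vocabulary.
words = ["swept", "kept", "sweeping", "swimming", "house"]
print(expand_typo("swept", words))        # uses the assumed default limit of 2
print(expand_typo("sweeping", words, 3))
```

The cost grows with the number of distinct words in the list, which is consistent with the guidance above about large collections.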

Please note these limitations: A query term specified with TYPO/N can have a maximum length of 32 characters. Also, TYPO/N is not supported with multi-byte character sets.