Content Search

In the editorial system, the Search Engine Server indexes the versions of CMS files. On the live system, it indexes the exported web documents. In both systems, the most important file fields are indexed as well.

The search results returned by the Search Engine Server are composed of data records. The server returns one record for each document that matches the search query. Each record consists of a set of document fields whose contents have been set during indexing (see Content Indexing). The document fields can be configured as required (see Configuring Collections).

In addition to important version and file fields, the rank of a matching document is available as a document field (score) in the standard configuration. The rank specifies how relevant the document is to the search query. It is given as a number between 0 and 100.

For both the editorial and the live system, the following table lists the document fields returned by the server for each document that matches a search query:

Document Field Editorial System Live System
collection
score
docId (from version/file ID)
lastChanged
objId (from file ID)
title
visiblePath

The Autonomy search module assigns the contents of the lastChanged and title zones to the document fields with the respective names. The version and file fields listed below are indexed as zones:

Indexed File Fields
Editorial System Live System
name
objClass
objType
suppressExport
visiblePath
workFlowName
File Permissions

Indexed Version Fields
Editorial System Live System
blobLength
exportBlob (exported object, not for images)
contentType
lastChanged
mimeType
state
title
validFrom
validUntil
custom fields (excluding signature and linklist fields)

The visiblePath zone and field are empty in the editorial system. On the live system, they contain the path to the document.

For custom version fields of the multi-selection (multienum) type, each field value is indexed as a zone with the name of the version field. If such a field has several values, for each value a zone with the same name is indexed.

The same applies to file permissions: Each user group with a certain permission is indexed as a zone with the name of that file permission. An exception to this is the live server read permission (permissionLiveServerRead). The corresponding zone contains the names of all groups that have been given this permission. For documents that are not subject to access restrictions, the zone noPermissionLiveServerRead is indexed with the content free.
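The zone rules for multienum fields and file permissions can be sketched as follows. This is only an illustration of the mapping described above, not the indexer's actual code. The function names, the field name topics, the group names, and the use of a space to join the group names in the permissionLiveServerRead zone are assumptions made for this example.

```python
def zones_for_multienum(field_name, values):
    """Return one (zone name, content) pair per selected value."""
    return [(field_name, value) for value in values]


def zones_for_permissions(permissions):
    """Map {permission name: [group, ...]} to (zone name, content) pairs."""
    zones = []
    for name, groups in permissions.items():
        if name == "permissionLiveServerRead":
            continue  # the exception, handled below
        # One zone per group, each named after the permission.
        zones.extend((name, group) for group in groups)

    live_groups = permissions.get("permissionLiveServerRead", [])
    if live_groups:
        # A single zone holding all group names (separator is an assumption).
        zones.append(("permissionLiveServerRead", " ".join(live_groups)))
    else:
        # No access restriction on the live server.
        zones.append(("noPermissionLiveServerRead", "free"))
    return zones


print(zones_for_multienum("topics", ["politics", "economy"]))
print(zones_for_permissions({"permissionWrite": ["editors", "admins"]}))
```

Each pair stands for one indexed zone, so a field with two selected values yields two zones with the same name.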

In search requests, you can execute a targeted search for documents containing the search term in one or more zones using the operator IN.

Multiple Parsers

The Infopark Search Cartridge supports search queries in several formats. Each format is handled by its own parser. A parser analyzes input (in this case, search queries) and converts it into a general internal format so that the actions corresponding to the input can be performed.

Search queries can be made either as free text, in explicit syntax, or in simple syntax. The default configuration uses the parser for queries in simple syntax.

The free text parser can be used for making search queries in written language, i.e. without using operators (e.g. "peace negotiations in the Middle East"). The Infopark Search Cartridge internally converts free text queries into search queries by removing unimportant words like articles, conjunctions, or prepositions (so-called stop words) and by taking into account the specifics of natural language such as noun phrases and word order. (See also the information about the operator FREETEXT.)
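The stop word removal mentioned above can be illustrated with a very small sketch. It is not the algorithm used by the Search Cartridge: the stop word list and the function name are assumptions, and noun phrases and word order are not considered at all.

```python
# A tiny, hypothetical stop word list (articles, conjunctions, prepositions).
STOP_WORDS = {"a", "an", "the", "and", "or", "in", "of", "on", "to"}


def strip_stop_words(query):
    """Return the words of a free text query without stop words."""
    return [word for word in query.split() if word.lower() not in STOP_WORDS]


print(strip_stop_words("peace negotiations in the Middle East"))
# prints ['peace', 'negotiations', 'Middle', 'East']
```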

In contrast to this, for queries in explicit or simple syntax, the search engine takes into account the operators with which search terms may be combined. For further information regarding the simple and explicit syntax as well as operators, please refer to the sections The Syntax of the Search Queries and Operators and Modifiers.

Pre-Processing and Post-Processing

The Search Engine Server allows each search request it receives from a client (including the Content Manager or the Template Engine) to be processed by a pre-processor before the request is passed to the search module. Such a pre-processor can, for example, add terms or operations to search queries or remove disallowed search terms from them. Because the search request the pre-processor receives is the XML document originally sent to the Search Engine Server, the pre-processor must be able to process XML documents.
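A pre-processor that removes disallowed search terms might look roughly like the following sketch. The XML layout used here (a query element containing the query text) is a simplified assumption and not the request format of the Search Engine Server. The function name and the list of disallowed terms are hypothetical as well.

```python
import xml.etree.ElementTree as ET

# Hypothetical list of terms that must not reach the search module.
DISALLOWED = {"confidential", "internal"}


def preprocess(request_xml):
    """Remove disallowed terms from every <query> element (assumed layout)."""
    root = ET.fromstring(request_xml)
    for query in root.iter("query"):
        words = (query.text or "").split()
        query.text = " ".join(w for w in words if w.lower() not in DISALLOWED)
    # Return the modified request as an XML string again.
    return ET.tostring(root, encoding="unicode")


print(preprocess("<request><query>budget confidential report</query></request>"))
```

The pre-processor receives XML and returns XML, so the request can be passed on to the search module unchanged in structure.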

The post-processing of search results works analogously to the pre-processing. Once the Search Engine Server has passed a search request (pre-processed, if applicable) to the search module, the module returns a search result. This result can be processed by a post-processor in order to, for example, extend or shorten the list of the documents found, or to attach to each hit an additional document field whose value has been calculated by the post-processor.
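The two post-processing tasks named above, shortening the hit list and attaching a calculated field, can be sketched as follows. For brevity, the records are modeled as Python dictionaries, which is an assumption; the function name, the threshold, and the added field scoreBand are hypothetical. Only the score field with its range from 0 to 100 is taken from the standard configuration.

```python
def postprocess(records, min_score=50):
    """Drop weak hits and attach a calculated field to the remaining ones."""
    kept = []
    for record in records:
        score = float(record.get("score", 0))
        if score < min_score:
            continue  # shorten the list of found documents
        hit = dict(record)  # leave the original record untouched
        # Additional document field calculated by the post-processor.
        hit["scoreBand"] = "high" if score >= 80 else "medium"
        kept.append(hit)
    return kept


hits = [
    {"docId": "1", "score": "92", "title": "Peace talks"},
    {"docId": "2", "score": "30", "title": "Archive"},
    {"docId": "3", "score": "65", "title": "Summit"},
]
print(postprocess(hits))
```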

Character Sets

The Search Cartridge uses the character encoding UTF-8. In order to be able to return search results (i.e. primarily the contents of document fields) encoded in UTF-8, the indexed documents must use this encoding as well. This is ensured by the Content Manager and the Template Engine, respectively.
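If documents from other sources are to be checked before indexing, a test for valid UTF-8 can be sketched as shown below. This is a generic check and not part of the Search Cartridge; the function name is an assumption.

```python
def is_utf8(data: bytes) -> bool:
    """Return True if the byte string is valid UTF-8."""
    try:
        data.decode("utf-8")
        return True
    except UnicodeDecodeError:
        return False


print(is_utf8("Müller".encode("utf-8")))    # prints True
print(is_utf8("Müller".encode("latin-1")))  # prints False
```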