indexing

This configuration element, which can be found in the instance-specific config/indexing.xml file, determines the details of content indexing by the Content Management Server and the Template Engine.

  • advancedSearch: Configures the indexing when the advanced search is used in the Content Management Server. The element has the same subentries as incrementalExport.

  • contentPreprocessors: Defines preprocessors that are called before versions are indexed. If no preprocessors are to be called, <contentPreprocessors /> must be specified. Example of an internal and an external preprocessor definition:

    <contentPreprocessors type="list">
      <preprocessor>
        <processor type="internal"/>
        <mimeTypes type="list">
          <mimeType>application/vnd.ms-excel</mimeType>
          <mimeType>application/vnd.ms-powerpoint</mimeType>
          <mimeType>application/msword</mimeType>
        </mimeTypes>
      </preprocessor>
      <preprocessor>
        <processor type="external">bin/tclsh</processor>
        <processorArguments type="list">
          <argument>pdfToTextWrapper.tcl</argument>
        </processorArguments>
        <mimeTypes type="list">
          <mimeType>application/pdf</mimeType>
        </mimeTypes>
      </preprocessor>
      <preprocessor>
        <!-- Another preprocessor for other MIME types -->
      </preprocessor>
    </contentPreprocessors>
    

    Each preprocessor is responsible for at least one MIME type. As with all lists, the contentPreprocessors element has an obligatory attribute, type="list". It consists of preprocessor subelements, each of which defines one preprocessor and has the following subelements:

    • mimeTypes defines the MIME types of the versions to be processed by this preprocessor.

      Attributes: type with the value list (obligatory).

      Content: For each MIME type a mimeType element, whose value is the respective MIME type (for example text/html).

    • processor defines the preprocessor for versions with one of the specified MIME types.

      Attributes: type with one of the following values: internal, external, ignore, ignoreBlob. Default: external.

      Content, if type has the value internal: The blob is filtered by the Verity filter application before it is indexed.

      Content, if type has the value ignore: The version is not indexed; the content of the element is ignored.

      Content, if type has the value ignoreBlob: empty. All fields except the main content are indexed. The main content is not converted (normally, all field values are converted to plain text before a version is indexed).

      Content, if type has the value external: The data to be indexed is passed to the specified program. Further arguments can be passed to it by means of the processorArguments element. For details on the external preprocessor facility, please refer to the Search Server documentation.

    • processorArguments is optional. This element defines the arguments to be passed to the program defined as processor.

      Attributes: type with the value list (obligatory).

      Content: Each command-line argument is specified as the content of an argument subelement.

      Note: Up to version 6.7.0, the command-line arguments must be provided directly as the value of the processorArguments element (e.g. <processorArguments>pdfToTextWrapper.tcl</processorArguments>).
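
    The ignore and ignoreBlob types are not covered by the example above. The following is a minimal sketch of how they might be configured; the MIME types are chosen for illustration only and are not prescribed by the product:

    <contentPreprocessors type="list">
      <preprocessor>
        <!-- ignoreBlob: the main content is skipped, all other fields are indexed -->
        <processor type="ignoreBlob"/>
        <mimeTypes type="list">
          <mimeType>image/jpeg</mimeType>
          <mimeType>image/png</mimeType>
        </mimeTypes>
      </preprocessor>
      <preprocessor>
        <!-- ignore: versions with this MIME type are not indexed at all -->
        <processor type="ignore"/>
        <mimeTypes type="list">
          <mimeType>application/zip</mimeType>
        </mimeTypes>
      </preprocessor>
    </contentPreprocessors>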

  • incrementalExport: Configures the indexing for the incremental export. The element has the following subentries:

    • isActive: Switches indexing on (true) or off (false).

    • collectionSelection: Defines rules that determine the collection to be used for indexing a document. Example:

      <collectionSelection>
        <select collection="cm-contents">
          <isEqual name="state" value="edited"/>
        </select>
        <select collection="cm-contents">
          <isEqual name="state" value="released"/>
        </select>
      </collectionSelection>
      

      In each select element, the collection attribute names the collection into which a document is indexed if all of the rules contained in the element apply. The rules contained in a select element are AND-related. An OR relation can be formed by using more than one select element with the same collection name, as in the example above. If the collection attribute is omitted, the document is not indexed if the rules apply.

      The select elements are processed one by one, in the order in which they are specified. The first set of rules that applies determines the collection into which the document is indexed. Each rule is represented by one element and can be reversed by adding the tag attribute negate="true". The following rules exist:

      • isEqual: This rule applies if the value of the file or version field specified by means of the name tag attribute exactly corresponds to the string value. Example:
        <isEqual name="mimeType" value="application/x-shockwave-flash" />

      • isTrue: This rule applies if the file or version field specified by means of the name tag attribute has the value true, yes, or 1.

      • isFalse: This rule applies if the file or version field specified by means of the name tag attribute has the value false, no, or 0.

      • hasPrefix: This rule applies if the value of the file or version field specified by means of the name tag attribute begins with the string value. Example:
        <hasPrefix name="mimeType" value="application/" />

      • hasSuffix: This rule applies if the value of the file or version field specified by means of the name tag attribute ends with the string value. Example:
        <hasSuffix name="mimeType" value="/zip" />

      • matches: This rule applies if the value of the file or version field specified by means of the name tag attribute contains a string that matches the regular expression specified as value. Example:
        <matches name="collspec" value=".*live.*" />

  • staticExport: Configures the indexing for the static export by the Content Management Server. The element has the same subentries as incrementalExport.

  • vseLocale: Determines the locale (language-specific settings) the Verity Search Cartridge is to use. uni, germanx, and englishx are available by default (additional locales can be acquired). uni is a universal locale that uses the UTF-8 character encoding. With uni, however, no language-specific search query functions such as stemming or typographical tolerance can be used. The value specified is applied to all collections. If this value is changed, all collections need to be created again.
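
Taken together, the subelements described above might be arranged as in the following sketch. It is an outline under assumptions, not a copy of a shipped file: the notation of isActive and vseLocale values as element content is assumed, the values themselves are illustrative, and any enclosing root element of indexing.xml is left out.

    <indexing>
      <contentPreprocessors type="list">
        <!-- preprocessor definitions as described above -->
      </contentPreprocessors>
      <incrementalExport>
        <isActive>true</isActive>
        <collectionSelection>
          <select collection="cm-contents">
            <isEqual name="state" value="released"/>
          </select>
        </collectionSelection>
      </incrementalExport>
      <staticExport>
        <isActive>false</isActive>
        <!-- same subentries as incrementalExport -->
      </staticExport>
      <advancedSearch>
        <isActive>false</isActive>
        <!-- same subentries as incrementalExport -->
      </advancedSearch>
      <!-- changing the locale requires all collections to be created again -->
      <vseLocale>englishx</vseLocale>
    </indexing>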