Setting MIME types



You can use the -mimeinclude, -indmimeinclude, -mimeexclude, and -indmimeexclude MIME type criteria options to include or exclude MIME types.

Syntax restrictions

The following restrictions apply when you specify MIME type criteria:

Using the wildcard character (*)

The asterisk (*) wildcard character does not operate as a regular expression in the value of a MIME type criterion. Instead, you can use it only to replace an entire MIME type or an entire MIME subtype.

For example, the following value is a valid substitute for text/html:

text/*

The following value is NOT a valid substitute for text/html:

text/h*
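The whole-part wildcard rule can be illustrated with a minimal Python sketch. This is not Verity Spider's implementation; the function name mime_pattern_matches is hypothetical, and the sketch assumes that a partial wildcard such as text/h* simply fails to match rather than producing an error.

```python
def mime_pattern_matches(pattern, mime_type):
    """Return True if mime_type satisfies pattern under the whole-part wildcard rule."""
    pattern_parts = pattern.lower().split("/")
    type_parts = mime_type.lower().split("/")
    if len(pattern_parts) != 2 or len(type_parts) != 2:
        return False  # both values must look like type/subtype
    for wanted, actual in zip(pattern_parts, type_parts):
        # "*" stands for the entire part; any other text must match exactly.
        if wanted != "*" and wanted != actual:
            return False
    return True


print(mime_pattern_matches("text/*", "text/html"))   # True
print(mime_pattern_matches("text/h*", "text/html"))  # False
```

Because the asterisk is compared only as a complete type or subtype, h* is treated as literal text and never matches html.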

Multiple parameter values

If you use quotation marks while specifying a series of parameter values for a single instance of a MIME type criterion, enclose each separate parameter value in single-quotation marks. For example:

-mimeinclude 'text/plain' 'application/*'

If you instead enclose the entire sequence of parameter values in one pair of quotation marks, as follows:

-mimeinclude 'text/plain application/*'

Verity Spider considers the entire expression a single value.

You can also use multiple instances of the MIME type criteria, each with a single parameter value, where quotation marks are necessary only if you use the wildcard character (*). For example:

-mimeinclude text/plain 
-mimeinclude 'application/*'
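The effect of quoting each value separately, compared with quoting the whole sequence, can be seen by tokenizing both command fragments. The sketch below uses Python's shlex module, which models POSIX-style shell quoting. That is an assumption: this section does not state whether the shell or Verity Spider itself performs the splitting.

```python
import shlex

# Each value in its own pair of single-quotation marks: two separate values.
separate = shlex.split("-mimeinclude 'text/plain' 'application/*'")

# One pair of quotation marks around the whole sequence: a single value.
combined = shlex.split("-mimeinclude 'text/plain application/*'")

print(separate)  # ['-mimeinclude', 'text/plain', 'application/*']
print(combined)  # ['-mimeinclude', 'text/plain application/*']
```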

MIME types and web crawling

When you index a website, Verity Spider evaluates your MIME type criteria against the "Content-Type" HTTP headers sent by the web server hosting that website. That web server passes along MIME type information based on its own internal tables.

If files are skipped because of their MIME types, make sure that the web server you are indexing has the necessary MIME type information. For information about specifying MIME types, see the documentation for your web server.

You can examine the indexing job’s log files for indications that files are being skipped because of their MIME types. For example, a typical ASCII file that you might want indexed is a log file (filename.log). Unless the web server is configured to serve files with the .LOG extension as ASCII text (MIME type text/plain), the indexing job log file shows that .LOG files are skipped because of MIME type, even if you use the following:

-mimeinclude 'text/*'
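To check what a web server actually reports for a file, you can inspect its Content-Type header yourself. The following Python sketch is one way to do that with the standard library. The function names are hypothetical, and the sketch assumes that the server answers HEAD requests.

```python
from urllib.request import Request, urlopen


def mime_from_content_type(header_value):
    """Return the bare MIME type from a Content-Type header value."""
    # Drop parameters such as "; charset=UTF-8" and normalize the case.
    return header_value.split(";", 1)[0].strip().lower()


def served_mime_type(url):
    """Send a HEAD request and return the MIME type that the server reports."""
    request = Request(url, method="HEAD")
    with urlopen(request) as response:
        return mime_from_content_type(response.headers.get("Content-Type", ""))


print(mime_from_content_type("text/plain; charset=UTF-8"))  # text/plain
```

Calling served_mime_type with the URL of a .LOG file on the server you are indexing shows whether the server reports text/plain or some other type, such as application/octet-stream, that a 'text/*' criterion does not cover.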

MIME types and file system indexing

When you index a file system, Verity Spider reads filenames and evaluates your MIME type criteria against an internal, compiled list of known MIME types and associated filename extensions. You cannot edit this list. However, you can use the -mimemap option to create a custom MIME type mapping.

If files are skipped because of their MIME types, check whether Verity Spider recognizes that particular MIME type. For more information, see the table, Known MIME types for file system indexing.

You can examine the indexing job’s log files for indications that files are being skipped because of their MIME types. For example, a typical ASCII file that you might want indexed is a log file (filename.log). Because Verity Spider does not recognize files with the .LOG extension as ASCII text (MIME type text/plain), the indexing job log file shows that .LOG files are skipped because of MIME type, even if you use the following:

-mimeinclude 'text/*'
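The extension lookup described above can be approximated with a small Python sketch. The mapping is copied from the table, Known MIME types for file system indexing. The dictionary, the function name lookup_mime_type, and the case-insensitive comparison are assumptions made for illustration, not a description of Verity Spider's internals.

```python
import os

# Extension-to-MIME-type pairs taken from the known MIME types table.
KNOWN_MIME_TYPES = {
    "htm": "text/html", "html": "text/html",
    "txt": "text/plain", "text": "text/plain",
    "pl": "text/plain", "eml": "text/plain",
    "c": "text/plain", "h": "text/plain",
    "cpp": "text/plain", "cxx": "text/plain",
    "pdf": "application/pdf",
    "doc": "application/msword",
    "xls": "application/vnd.ms-excel",
    "ppt": "application/vnd.ms-powerpoint",
    "wpd": "application/wordperfect5.1",
    "rtf": "application/rtf",
    "mif": "application/vnd.mif",
    "aw": "application/applixware",
    "zip": "application/zip",
    "mbx": "text/x-mbox",
}


def lookup_mime_type(filename):
    """Return the MIME type for a filename, or None if the extension is unknown."""
    # Assumption: extensions are compared without regard to case.
    extension = os.path.splitext(filename)[1].lstrip(".").lower()
    return KNOWN_MIME_TYPES.get(extension)


print(lookup_mime_type("index.html"))    # text/html
print(lookup_mime_type("filename.log"))  # None
```

Because log has no entry in the list, the lookup yields no MIME type, so a 'text/*' criterion has nothing to match against.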

Indexing unknown MIME types

Whenever you find MIME types being dropped, or you plan to index files whose extensions are not known to Verity Spider by default, use the -mimemap option to point to a file that contains your own custom mappings for filename extensions and MIME types.

You can also use the wildcard expression '*/*' for your MIME type criteria; for example:

-mimeinclude '*/*'

On either platform, enclose values that include wildcard characters in single-quotation marks.

Also use inclusion and exclusion criteria to finely control what is indexed, as follows:

  1. If your list of file types to index is rather long, use exclusion criteria (-exclude, -indexclude, -mimeexclude, or -indmimeexclude) to exclude extensions you know you do not want to index; for example:

    -exclude '*.exe' '*.com'
  2. If the list of file types you want to index is relatively small, use inclusion criteria (-include, -indinclude, -mimeinclude, or -indmimeinclude) to specify them; for example:

    -include '*.txt' '*.1st' '*.log'
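The difference between the two approaches can be sketched in Python. The example uses the standard fnmatch module as a stand-in for filename pattern matching. How Verity Spider evaluates -include and -exclude values internally is not described here, and the function name select_files and the sample filenames are hypothetical.

```python
from fnmatch import fnmatch


def select_files(filenames, include=None, exclude=None):
    """Keep names that match an include pattern (if given) and no exclude pattern."""
    selected = []
    for name in filenames:
        lowered = name.lower()
        if include and not any(fnmatch(lowered, p.lower()) for p in include):
            continue  # inclusion criteria given, but this name matches none
        if exclude and any(fnmatch(lowered, p.lower()) for p in exclude):
            continue  # this name matches an exclusion criterion
        selected.append(name)
    return selected


files = ["readme.txt", "setup.exe", "notes.1st", "server.log", "report.pdf"]

# Long list of wanted types: exclude the few that are unwanted.
print(select_files(files, exclude=["*.exe", "*.com"]))
# ['readme.txt', 'notes.1st', 'server.log', 'report.pdf']

# Short list of wanted types: include only those.
print(select_files(files, include=["*.txt", "*.1st", "*.log"]))
# ['readme.txt', 'notes.1st', 'server.log']
```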

Known MIME types for file system indexing

The following table lists the MIME types that Verity Spider recognizes when indexing file systems:

Format               MIME type                       Extension

HTML                 text/html                       htm, html
ASCII                text/plain                      txt, text, pl, eml
ASCII, source files  text/plain                      c, h, cpp, cxx
PDF                  application/pdf                 pdf
MS Word              application/msword              doc
MS Excel             application/vnd.ms-excel        xls
MS PowerPoint        application/vnd.ms-powerpoint   ppt
WordPerfect 5.1      application/wordperfect5.1      wpd
RTF                  application/rtf                 rtf
FrameMaker MIF       application/vnd.mif             mif
Applixware           application/applixware          aw
Zip files            application/zip                 zip
Eudora mail          text/x-mbox                     mbx