ColdFusion 9.0 Resources
Networking options

The Verity Spider networking options are listed here.

-agentname

Syntax
-agentname string

Specifies the value for the agent name field that is part of the HTTP request. Because web servers can be configured to return different versions of the same page depending on the requesting agent, you can use the -agentname option to impersonate a browser client. Use double-quotation marks if the name contains a space. Use the -cmdfile option if the agent name you want to use contains forbidden characters, such as slashes or backslashes.

-connections

Syntax
-connections num_connections

Specifies the maximum number of simultaneous socket connections to make to websites for indexing. Each connection implies a separate thread. The default value is 6.

Note: The Verity Spider dynamic flow control makes the most use of all available connections when indexing websites. If you are indexing multiple sites, you might want to increase this number. Increasing the number of connections does not always help, because of such dependencies as your network connection and the capabilities of the remote hosts.

-header

Syntax
-header string

Specifies an HTTP header to add to the request; for example:

-header "Referer: http://www.verity.com/"

Verity Spider sends some predefined headers, such as Accept and User-Agent, by default. Special headers are sometimes necessary to correctly index a site. For example, earlier versions of Verity Spider did not support the Host header, which was needed for virtual host indexing. Also, a Proxy-Authorization header was required to pass a user name and password to a proxy server. In the current version of Verity Spider, the Host header is supported by default, and the -proxyauth option is available for proxy server authentication. Therefore, the -header option is maintained only for backward compatibility and possible future enhancements.

Note: Misuse of this option causes Verity Spider to fail. If this happens, rerun the indexing task with modified -header values.

-noflowctrl

Type
Web crawling only

Disables round-robin indexing of websites with network flow control. By default, Verity Spider uses round-robin indexing of websites to avoid overwhelming a web server and to improve indexing performance. Verity Spider connects to each web server in a round-robin manner, using up to the number of connections set by the -connections option. This means that one URL is fetched from each web server, in turn.

Note: Using the -noflowctrl option can result in a significant drop in performance.

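The round-robin order described for the default flow control can be illustrated with a minimal Python sketch. This is not Verity Spider code: the function name round_robin_order and the example.* hosts are hypothetical, and the sketch ignores the concurrency that the -connections option provides. It only shows what "one URL from each web server, in turn" means for a list of pending URLs.

```python
from collections import deque
from urllib.parse import urlsplit


def round_robin_order(urls):
    """Return URLs reordered so that one URL is taken from each host in turn."""
    # Group URLs by host, preserving first-seen host order and per-host URL order.
    queues = {}
    for url in urls:
        queues.setdefault(urlsplit(url).netloc, deque()).append(url)

    ordered = []
    hosts = deque(queues)
    while hosts:
        host = hosts.popleft()
        ordered.append(queues[host].popleft())
        if queues[host]:
            hosts.append(host)  # host still has pending URLs; revisit it later
    return ordered


pending = [
    "http://a.example/1",
    "http://a.example/2",
    "http://a.example/3",
    "http://b.example/1",
    "http://c.example/1",
    "http://c.example/2",
]
for url in round_robin_order(pending):
    print(url)
```

In this sketch no host receives two consecutive requests while other hosts still have pending URLs, which is the property that keeps a single web server from being overwhelmed.
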
-noproxy

Syntax
-noproxy name_1 [name_n] ...

Used with the -proxy option, the -noproxy option specifies that Verity Spider directly access the hosts whose names match those specified. By default, when you specify the -proxy option, Verity Spider first tries to access every host with the proxy information. To improve performance, use the -noproxy option for the hosts you know can be accessed without a proxy host.

For the name variable, you can use the asterisk (*) wildcard for text strings; for example:

'*.verity.com'

You cannot use the question mark (?) wildcard, and specifying the -regexp option does not enable regular expressions for these names.

On Windows, include double-quotation marks around the argument to protect the asterisk special character (*). On UNIX, use single-quotation marks. Quotation marks are required only when you run the indexing job from a command line. They are not necessary within a command file (the -cmdfile option).

Note: You must have valid Verity Spider licensing capability to use this option.

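The asterisk-only matching rule for -noproxy names can be sketched in Python as follows. This is an illustration, not the matching code that Verity Spider uses: the function names matches_noproxy and needs_proxy are hypothetical, and details such as case sensitivity are assumptions that the description above does not settle.

```python
import re


def matches_noproxy(pattern, host):
    """Return True if host matches pattern, where only '*' is a wildcard."""
    # Escape each literal piece so that '?' and '.' are matched literally,
    # then join the pieces with '.*' to stand in for each asterisk.
    regex = ".*".join(re.escape(part) for part in pattern.split("*"))
    return re.fullmatch(regex, host) is not None


def needs_proxy(host, noproxy_patterns):
    """Return True if host matches none of the -noproxy patterns."""
    return not any(matches_noproxy(p, host) for p in noproxy_patterns)


patterns = ["*.verity.com"]
for host in ("www.verity.com", "www.example.org"):
    route = "via proxy" if needs_proxy(host, patterns) else "direct"
    print(host, route)
```

Because the question mark is escaped rather than translated, a pattern such as 'w?w.verity.com' matches only a host name that literally contains a question mark.
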
-proxy

Syntax
-proxy proxyhost:port

Specifies the host and port of the proxy server.

Note: You must have valid Verity Spider licensing capability to use this option.

See also -proxyauth for proxy servers that require authentication, and -noproxy for hosts that you know are accessible without having to go through a proxy server.

-proxyauth

Syntax
-proxyauth login:password

Specifies login information for proxy server connections that require authorization to get outside the firewall. Use this option with the -proxy option.

Note: You must have valid Verity Spider licensing capability to use this option. Information Server V3.7 does not support retrieving documents for viewing through secure proxy servers. Do not use the -proxyauth option for indexing documents that are viewed through Information Server V3.7.

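The way the three proxy options combine can be sketched as a small Python helper that assembles the argument list. The helper name build_proxy_args, the host proxy.example.com, the port, and the credentials are all hypothetical; only the option names and their host:port and login:password formats come from the descriptions above.

```python
def build_proxy_args(proxy_host, proxy_port, login=None, password=None, noproxy=()):
    """Assemble -proxy, -proxyauth, and -noproxy arguments as a list of strings."""
    args = ["-proxy", f"{proxy_host}:{proxy_port}"]
    if login is not None and password is not None:
        # -proxyauth is used together with -proxy, in login:password form.
        args += ["-proxyauth", f"{login}:{password}"]
    if noproxy:
        # -noproxy takes one or more host name patterns.
        args += ["-noproxy", *noproxy]
    return args


print(build_proxy_args("proxy.example.com", 8080, "spider", "secret", ["*.verity.com"]))
```

The sketch always emits -proxy first, which reflects that -proxyauth and -noproxy are documented as companions to -proxy rather than as standalone options.
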
-timeout

Syntax
-timeout num_seconds

Specifies the time period, in seconds, that Verity Spider waits before timing out on a network connection and on accessing data. The data access time-out is automatically twice the value that you specify for the network connection time-out. The default value for the network connection time-out is 30 seconds, and therefore the default value for the data access time-out is 60 seconds.
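
The relationship between the two time-outs can be worked through with a short Python sketch. The function name spider_timeouts is hypothetical, and the rejection of non-positive values is an assumption made for the sketch, not documented behavior.

```python
DEFAULT_CONNECT_TIMEOUT = 30  # seconds; the documented default for -timeout


def spider_timeouts(num_seconds=DEFAULT_CONNECT_TIMEOUT):
    """Return (network_connection_timeout, data_access_timeout) in seconds."""
    if num_seconds <= 0:
        # Assumption for this sketch: a time-out must be a positive number.
        raise ValueError("num_seconds must be positive")
    # The data access time-out is twice the network connection time-out.
    return num_seconds, 2 * num_seconds


print(spider_timeouts())
print(spider_timeouts(45))
```

For example, specifying -timeout 45 gives a 45-second network connection time-out and a 90-second data access time-out.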