INTERPRETATION OF THE RESULTS

http-analyze creates a summary of the information found in the server's logfile. The analyzer counts the requests, saves the unique URLs, sitenames, browser types and referrer-URLs and creates a comprehensive statistics report. The following terms are used in this report:

Hits green
A hit is any response from the web server to a request sent by a browser, such as text (HTML) files, images, applets, audio/movie clips and even error messages. For example, if a page containing two inline images is requested, the server would generate three hits: one hit for the HTML page itself and two hits for the images. If an invalid URL is requested, the server would respond with a Code 404 (Not Found) status, which is also a response accounted for as a hit.
Files blue
If the server sends back a file for a request, this is accounted for as a Code 200 (OK) response. Such a response is classified as a file sent. Again, file here means any kind of file, no matter whether it contains text (HTML documents) or binary data (images, applets, movies, etc.). Note that if you configure the web server to log only accesses to HTML files, but not to images or any other binary data, the number of files directly corresponds to the number of documents served.
Cached yellow
A cached file is a Code 304 (Not Modified) response. The server generates this response if a document hasn't changed since the last time it was transferred to the site requesting it. If the browser has access to a local copy of a document requested by a user (either through its local disk cache or through a caching server), it sends out a conditional request, which asks for the document to be sent only if it has changed since it was last requested. If the document hasn't changed since then, the server sends back a Code 304 response to inform the browser that it can use its local copy.
 
While this caching mechanism can significantly reduce network traffic, it makes the statistics report inaccurate regarding the number of times a file is requested, for two reasons: First, the browser can be configured to send conditional requests every time, once per session, or never when a cached file is requested. Second, online services, ISPs, companies and many other organizations use so-called caching servers or proxies, which themselves fulfill requests if the file is found in their cache. Since proxies can serve hundreds to thousands of users, requests from certain sites could be caused by thousands of users requesting a cached file or by just one person whose browser is configured not to cache anything at all. The ratio between files sent and cached files therefore reflects the efficiency of caching mechanisms, but only for those requests which were handled by your web server.
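The relation between Hits, Files and Cached can be illustrated with a minimal Python sketch. This is not http-analyze's actual code: the function names classify and summarize are hypothetical, and the input is assumed to be a list of status codes already extracted from the logfile.

```python
from collections import Counter

def classify(status):
    """Return the report counters that one response adds to (hypothetical helper)."""
    counters = ["hits"]             # every response counts as a hit
    if status == 200:
        counters.append("files")    # Code 200 (OK): a file was sent
    elif status == 304:
        counters.append("cached")   # Code 304 (Not Modified): local copy reused
    return counters

def summarize(statuses):
    """Add up the counters for a sequence of status codes."""
    totals = Counter()
    for status in statuses:
        totals.update(classify(status))
    return totals

# One HTML page with two inline images (three Code 200 responses),
# one revalidated file (Code 304) and one invalid URL (Code 404).
totals = summarize([200, 200, 200, 304, 404])
print(totals["hits"], totals["files"], totals["cached"])
```

In this example the Code 404 response adds to the hits only, so the hit count is larger than the sum of files and cached files.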
Pageviews magenta
The pageview mechanism can be used to separate requests for text or HTML files from all other types of requests. If a filename pattern has been defined, http-analyze classifies all URLs matching this pattern as pageviews (text files), which allows you to estimate the number of "real" text documents transmitted by your web server. Filename patterns may be defined using the option -G or the PageView directive in the configuration file. The suffix .html is already pre-defined.
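As a rough illustration of the idea, the following Python sketch treats a pageview pattern as a plain filename suffix. This is an assumption: the pattern syntax actually accepted by -G and the PageView directive is not described here, and the names PAGEVIEW_SUFFIXES and is_pageview are hypothetical.

```python
# ".html" is the pre-defined pattern; further suffixes would be user-defined.
PAGEVIEW_SUFFIXES = (".html",)

def is_pageview(url, suffixes=PAGEVIEW_SUFFIXES):
    """Return True if the URL path ends with one of the pageview suffixes."""
    path = url.split("?", 1)[0]          # ignore a query string, if any
    return path.endswith(tuple(suffixes))

urls = ["/index.html", "/logo.gif", "/docs/readme.txt", "/search.html?q=log"]
pageviews = [u for u in urls if is_pageview(u)]
print(pageviews)
```

Passing a longer tuple, for example (".html", ".txt"), would classify the text file as a pageview as well.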
KBytes transferred orange
This is the amount of data sent during the whole summary period as reported by the server. Note that some servers record the size of a document instead of the actual number of bytes transferred. In most cases the two are the same, but if a user interrupts the transmission by pressing the browser's stop button before the page has been received completely, some servers (for example all Netscape web servers) log the size of the file instead of the amount of data actually transmitted.
KBytes requested
This is the amount of data requested during the whole summary period. http-analyze computes this number by summing up the values of KBytes transferred and KBytes saved by cache (see below).
KBytes saved by cache
The amount of data saved by various caching mechanisms. This value is computed by multiplying the number of cached file (Code 304) responses by the size of the corresponding file. Because http-analyze can determine the size of a file only if the file has been transmitted at least once in the same summary period, the values for KBytes saved by cache and KBytes requested are just approximations of the real values.
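The computation of KBytes saved by cache and KBytes requested can be sketched in Python as follows. This is an illustration under stated assumptions, not the analyzer's implementation: the function name cache_savings and the (url, status, bytes) tuple layout are hypothetical, and values are kept in bytes for simplicity.

```python
def cache_savings(entries):
    """entries: (url, status, bytes_sent) tuples of one summary period.

    Returns (transferred, saved_by_cache, requested) in bytes.
    """
    entries = list(entries)
    size = {}          # file size as seen in a Code 200 response
    transferred = 0
    for url, status, nbytes in entries:
        transferred += nbytes
        if status == 200:
            size[url] = nbytes
    # Each Code 304 response saves one transfer of the file, but only if
    # the size is known from a transfer in the same period.
    saved = sum(size.get(url, 0) for url, status, _ in entries if status == 304)
    return transferred, saved, transferred + saved

log = [
    ("/a.html", 200, 4096),
    ("/a.html", 304, 0),
    ("/a.html", 304, 0),
    ("/b.gif", 304, 0),   # never transferred in this period: size unknown
]
transferred, saved, requested = cache_savings(log)
print(transferred, saved, requested)
```

The last entry shows why the values are approximations: /b.gif was answered from a cache, but since its size is unknown it contributes nothing to the amount saved.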
Unique URLs
The total number of unique URLs is the count of all different URLs (files) on your web server which have been requested at least once in the corresponding summary period.
Referrer URLs
If a user follows a link to your web site and his/her browser sends the URL of the page containing the link to the server, this URL is logged as the referrer URL (the location referring to your document). Note that the browser does not necessarily send a referrer URL and even if it does, a proxy server may alter or delete it before forwarding the request to a web server. Such requests appear under Unknown in the referrer URL list.
Self-referrer URLs
As soon as the browser detects any inline objects (images, applets, etc.) in a page just loaded, it sends out separate requests for those objects. If the objects reside on the same server as the page referring to them, the corresponding referrer URLs contain the URL of the page on your server. Such referrer URLs are called self-referrer URLs. If configured correctly, http-analyze separates all self-referrer URLs from the rest of the referrer URLs in the report. This allows you to separate accesses which were caused by inline objects in a text page from the remaining (external) accesses.
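The distinction between self-referrer, external and unknown referrers can be illustrated with a short Python sketch. It assumes that the server's own hostnames are known and that a missing referrer is logged as "-"; the name referrer_kind and the example hostnames are hypothetical. Treating a malformed referrer as unknown is also an assumption of this sketch.

```python
from urllib.parse import urlsplit

def referrer_kind(referrer, own_hosts):
    """Classify a referrer URL as 'self', 'external' or 'unknown'."""
    if not referrer or referrer == "-":
        return "unknown"                  # browser or proxy sent no referrer
    host = urlsplit(referrer).hostname    # None if the URL has no host part
    if host is None:
        return "unknown"
    return "self" if host in own_hosts else "external"

own = {"www.example.com"}
kinds = [
    referrer_kind("http://www.example.com/index.html", own),
    referrer_kind("http://search.example.org/?q=logfile", own),
    referrer_kind("-", own),
]
print(kinds)
```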
Unique sites
This is the number of all different hostnames or IP addresses found in the logfile. Each different hostname is counted only once per period, so this number shows how many systems sent requests to your server.
Sessions red
Similar to unique sites, this is the number of different hostnames or IP addresses accessing the server during a certain time-window, which defaults to one day for backward compatibility. Accesses from a known hostname outside this time-window get accounted for as a new session. You can increase or decrease the time-window for sessions using the option -u or the Session directive in the configuration file. For example, if you set the time-window to 2 hours, all accesses from the same host in less than 2 hours are accounted for as the same session, while any access more than 2 hours apart from the first one is accounted for as a new session.
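The time-window rule can be sketched in Python as follows. This is only an illustration of the rule as described above: the name count_sessions is hypothetical, the accesses are assumed to be sorted by time, and counting an access exactly one window after the first one as a new session is an assumption.

```python
from datetime import datetime, timedelta

def count_sessions(accesses, window=timedelta(days=1)):
    """accesses: (host, datetime) pairs sorted by time."""
    session_start = {}   # host -> time of the first access of its current session
    sessions = 0
    for host, ts in accesses:
        start = session_start.get(host)
        if start is None or ts - start >= window:
            session_start[host] = ts   # open a new session for this host
            sessions += 1
    return sessions

day = datetime(2024, 1, 15)
accesses = [
    ("a.example.com", day.replace(hour=9, minute=0)),
    ("b.example.com", day.replace(hour=9, minute=15)),
    ("a.example.com", day.replace(hour=10, minute=30)),  # 1.5 hours after first
    ("a.example.com", day.replace(hour=11, minute=30)),  # 2.5 hours after first
]
two_hour_sessions = count_sessions(accesses, timedelta(hours=2))
one_day_sessions = count_sessions(accesses)
unique_sites = len({host for host, _ in accesses})
print(two_hour_sessions, one_day_sessions, unique_sites)
```

With a 2-hour window the last access opens a new session, while with the default window of one day the number of sessions equals the number of unique sites in this example.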
Request Method
The browser uses a certain method to request a document from a web server. For example, documents, images, applets, etc. are usually requested using the GET method. Other frequently used methods are the HEAD method, which requests information about a document (such as its size) without having the server send its actual content, and the POST method, a special way to transfer user input from forms to CGI scripts.
 
Although all logfile entries with a valid request method are accounted for as hits, only URLs requested using either the GET or the POST method are processed further. The remaining hits are summarized under Request Methods other than GET/POST.
Response Codes
In reply to a request from a browser, the server sends back a status code such as a Code 200 (OK) or Code 404 (Not Found) response. As with the request methods, the analyzer accounts for any valid response code as a hit, but it only processes those URLs which caused a Code 200 (OK), Code 304 (Not Modified), or Code 404 (Not Found) response from the server. All other responses are summarized in the monthly summary page under Other Response Codes. See the current HTTP specification at http://www.w3.org/ for information about all valid response codes and their meaning. http-analyze recognizes HTTP/1.1 responses according to RFC 2068.
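The filtering by request method and response code described in the last two entries can be sketched in Python as follows. The function name bucket and the returned labels are hypothetical, and checking the method before the response code is an assumption of this sketch.

```python
PROCESSED_METHODS = {"GET", "POST"}
PROCESSED_CODES = {200, 304, 404}

def bucket(method, status):
    """Decide where a hit ends up in the report (illustrative labels)."""
    if method not in PROCESSED_METHODS:
        return "other method"           # Request Methods other than GET/POST
    if status not in PROCESSED_CODES:
        return "other response code"    # Other Response Codes
    return "processed"                  # URL is processed further

requests = [("GET", 200), ("HEAD", 200), ("GET", 302), ("POST", 404)]
buckets = [bucket(method, status) for method, status in requests]
print(buckets)
```

All four requests in this example count as hits; only the first and the last one are processed further.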
Unresolved
A system identifies itself to a web server using an IP number. Depending on the configuration, the web server might perform a DNS lookup to resolve the IP number into a hostname. If no hostname has been assigned to this IP number, only the IP number is logged. Such requests are accounted for under Unresolved in the country list of the statistics report. Since some systems intentionally have no hostname, a percentage of up to 35% for unresolved IP numbers is absolutely normal. If the country list shows 100% unresolved IP numbers, either enable the DNS lookup in your web server or have a DNS resolver utility preprocess the logfile before feeding the data into http-analyze. For our Commercial Service Licensees, we offer a fast DNS resolver utility with negative caching and a history mechanism. Visit our support site for more information.
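To check a logfile for unresolved entries before running the analyzer, a small Python sketch like the following could be used. It assumes the host fields have already been extracted from the logfile; the names is_unresolved and unresolved_share are hypothetical and not part of http-analyze.

```python
import ipaddress

def is_unresolved(host):
    """Return True if the logged host is a bare IP number, not a hostname."""
    try:
        ipaddress.ip_address(host)
    except ValueError:
        return False      # not a valid IP number, so it is a hostname
    return True

def unresolved_share(hosts):
    """Percentage of log entries whose host field is an unresolved IP number."""
    hosts = list(hosts)
    if not hosts:
        return 0.0
    return 100.0 * sum(is_unresolved(h) for h in hosts) / len(hosts)

hosts = ["192.0.2.10", "www.example.com", "proxy.example.net", "198.51.100.7"]
share = unresolved_share(hosts)
print(share)
```

A share close to 100% would suggest that the web server does not perform DNS lookups at all.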

What the report does NOT show ...

Due to the nature of the HTTP protocol used for communication between the browser and the server and due to the type of information available in the server's logfile, the analyzer cannot:

  • identify a person as a visitor of your server,
  • count the number of visitors of your server,
  • find out the email address of a visitor,
  • track the path a visitor takes through your site,
  • measure the time a visitor sees a page of your server,
  • determine the last page someone saw before leaving your site,
  • inform you about the sudden death of the visitor while looking at your homepage,
  • nor show any other information not recorded in the server's logfile.

Even if you classify certain URLs as pageviews or use a specific time-window to count sessions, this in no way tells you anything about the number of real visitors of your server.

However, if you use an appropriate server structure with files grouped by their content or if you use the HideURL directive to group unstructured files together, the statistics report does show you at least a trend or a tendency. Following the numbers for some time, you soon get a feeling for which documents are most interesting to the visitors of your site.