http-analyze creates a summary of the information found in the server's
logfile. The analyzer counts the requests, saves the unique URLs, sitenames,
browser types and referrer-URLs and creates a comprehensive statistics report.
The following terms are used in this report:
- Hits
- A hit is any response from the web server on behalf of a request sent
by a browser, such as text (HTML) files, images, applets, audio/movie clips
and even error messages. For example, if a page is requested which contains
two inline images, the server would generate three hits: one hit for the
HTML page itself and two hits for the images. If an invalid URL is requested,
the server responds with Code 404 (Not Found), which is also
accounted for as a hit.
- Files
- If the server sends back a file in response to a request, this is logged as a
Code 200 (OK) response. Such a response is classified as a file sent.
Again, file here means any kind of file, whether it contains
text (HTML documents) or binary data (images, applets, movies, etc.).
Note that if you configure the web server to log only accesses to
HTML files, and not images or other binary data, the number of files
directly corresponds to the number of documents served.
- Cached
- A cached file is a Code 304 (Not Modified) response.
This response is generated by the server if a document hasn't changed
since the last time it was transferred to the site requesting it.
If the browser has access to a local copy of a document requested by a user
(either through its local disk cache or through a caching server), it sends
out a conditional request, which asks for the document to be sent only
if it has changed since it was last requested. If the document
hasn't changed since then, the server sends back a Code 304
response to inform the browser that it can use its local copy.
-
- While this caching mechanism can significantly reduce network traffic,
it makes the statistics report inaccurate regarding the number of times
a file is requested, for two reasons: First, the
browser can be configured to send conditional requests every time,
once per session, or never when a cached file is requested.
Second, online services, ISPs, companies and many other organizations
use so-called caching servers or proxies, which fulfill requests themselves
if the file is found in their cache. Since proxies can serve hundreds to
thousands of users, requests from certain sites could be caused by
thousands of users requesting a cached file or by just one person
whose browser is configured not to cache anything at all.
The ratio between files sent and cached files therefore
reflects the efficiency of caching mechanisms, but only for those
requests which were handled by your web server.
- Pageviews
- The pageview mechanism can be used to separate requests for text
or HTML files from all other types of requests.
If a filename pattern has been defined, http-analyze classifies all
URLs matching this pattern as pageviews (text files), which allows you to
estimate the number of "real" text documents transmitted by
your web server. Filename patterns may be defined using the option
-G or the PageView directive
in the configuration file. The suffix .html is already predefined.
- KBytes transferred
- This is the amount of data sent during the whole summary period as
reported by the server. Note that some servers record the size of a
document instead of the actual number of bytes transferred. In
most cases the two are the same, but if a user interrupts the transmission by
pressing the browser's stop button before the page has been received
completely, some servers (for example all Netscape web servers) log
the size of the file instead of the amount of data actually transmitted.
- KBytes requested
- This is the amount of data requested during the whole summary period.
http-analyze computes this number by summing up the values of
KBytes transferred and KBytes saved by cache (see below).
- KBytes saved by cache
- The amount of data saved by various caching mechanisms.
This value is computed by multiplying the number of cached-file
(Code 304) responses by the size of the corresponding file.
Because http-analyze can determine the size of a file only
if the file has been transmitted at least once in the same summary
period, the values for KBytes saved by cache and KBytes
requested are just approximations of the real values.
- Unique URLs
- The total number of unique URLs is the count of all different URLs
(files) on your web server that have been requested at least once in the
corresponding summary period.
- Referrer URLs
- If a user follows a link to your web site and his/her browser sends
the URL of the page containing the link to the server, this URL is logged
as the referrer URL (the location referring to your document).
Note that the browser does not necessarily send a referrer URL and even
if it does, a proxy server may alter or delete it before forwarding the
request to a web server. Such requests appear under Unknown in
the referrer URL list.
- Self-referrer URLs
- As soon as the browser detects any inline objects (images, applets, etc.)
in a page just loaded, it sends out separate requests for those objects.
If the objects reside on the same server as the page referring to them,
the corresponding referrer URLs contain the URL of the page on your server.
Such referrer URLs are called self-referrer URLs.
If configured correctly, http-analyze separates all self-referrer URLs
from the rest of the referrer URLs in the report.
This makes it possible to separate accesses that were actually caused by inline
objects in a text page from the remaining (external) accesses.
- Unique sites
- This is the number of all different hostnames or IP addresses found in
the logfile. Each different hostname is counted only once per period, so
this number shows how many systems sent requests to your server.
- Sessions
- Similar to unique sites, this is the number of different hostnames
or IP addresses accessing the server during a certain time-window,
which defaults to one day for backward compatibility. Accesses from a
known hostname outside this time-window get accounted for as a new
session. You can increase or decrease the time-window for sessions
using the option -u or the Session
directive in the configuration file. For example, if you set the time-window
to 2 hours, all accesses from the same host within 2 hours of the first one are
accounted for as the same session, while any access more than 2 hours
after the first one is accounted for as a new session.
- Request Method
- The browser uses a certain method to request a document from a web server.
For example, documents, images, applets, etc. are usually requested using the
GET method. Other frequently used methods are the HEAD method, which
requests information about a document such as its size without having the
server send its actual content, and the POST method, a special way
to transfer user input from forms to CGI scripts.
-
- Although all logfile entries with a valid request method are accounted
for as hits, only URLs requested using either the GET or the
POST method are processed further. The remaining hits are summarized
under Request Methods other than GET/POST.
- Response Codes
- In reply to a request from a browser, the server sends back a status
code such as a Code 200 (OK) or Code 404 (Not Found) response.
As with the request methods, the analyzer counts any valid response
code as a hit, but it only processes those URLs that caused a Code 200
(OK), Code 304 (Not Modified), or Code 404 (Not Found)
response from the server. All other responses are summarized in the monthly
summary page under Other Response Codes. See the current HTTP
specification at http://www.w3.org/ for information about all valid
response codes and their meaning.
http-analyze recognizes HTTP/1.1 responses according to RFC 2068.
- Unresolved
- A system identifies itself to a web server by its IP address.
Depending on the configuration, the web server might perform a DNS lookup
to resolve the IP address into a hostname. If no hostname has been assigned
to this IP address, only the IP address is logged.
Such requests are accounted for under Unresolved in the country list
of the statistics report. Since some systems intentionally have no hostname,
a percentage of up to 35% for unresolved IP addresses is absolutely normal.
If the country list shows 100% unresolved IP addresses, either
enable the DNS lookup in your web server or have a DNS resolver utility
preprocess the logfile before feeding the data into http-analyze.
For our Commercial Service Licensees, we offer a fast DNS resolver utility
with negative caching and a history mechanism.
Visit our support site for more information.
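
The counting rules from the definitions above (hits, files, cached files, unique sites and the three KBytes figures) can be illustrated with a minimal Python sketch. It is not the code of http-analyze. The record layout, the names `SAMPLE_RECORDS` and `summarize`, and the choice to add up the bytes of every response for KBytes transferred are assumptions made for this example only.

```python
from collections import Counter

# Hypothetical, already-parsed log records: (host, url, status, bytes_sent).
# A real logfile would have to be parsed first; these values are made up.
SAMPLE_RECORDS = [
    ("a.example.com", "/index.html", 200, 4096),
    ("a.example.com", "/logo.gif", 200, 2048),
    ("b.example.com", "/index.html", 304, 0),
    ("b.example.com", "/logo.gif", 304, 0),
    ("b.example.com", "/news.html", 304, 0),   # never sent in this period
    ("c.example.com", "/missing.html", 404, 512),
    ("192.0.2.7", "/index.html", 200, 4096),
]


def summarize(records):
    """Tally the basic report figures for one summary period."""
    hits = files = cached = 0
    bytes_transferred = 0
    sites = set()
    size_of = {}               # URL -> size seen in a Code 200 response
    cached_per_url = Counter() # URL -> number of Code 304 responses

    for host, url, status, nbytes in records:
        hits += 1                      # every response counts as a hit
        bytes_transferred += nbytes    # assumption: all responses add up
        sites.add(host)
        if status == 200:
            files += 1
            size_of[url] = nbytes
        elif status == 304:
            cached += 1
            cached_per_url[url] += 1

    # A Code 304 response only contributes to the savings if the file size
    # is known, i.e. the file was sent at least once in the same period.
    bytes_saved = sum(count * size_of.get(url, 0)
                      for url, count in cached_per_url.items())

    return {
        "hits": hits,
        "files": files,
        "cached": cached,
        "unique_sites": len(sites),
        "kbytes_transferred": bytes_transferred / 1024,
        "kbytes_saved_by_cache": bytes_saved / 1024,
        "kbytes_requested": (bytes_transferred + bytes_saved) / 1024,
    }


summary = summarize(SAMPLE_RECORDS)
```

In the sample data, the Code 304 response for /news.html adds nothing to KBytes saved by cache because that file was never sent with a Code 200 response in the same period. This is why the saved and requested values are only approximations.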
What the report does NOT show ...
Due to the nature of the HTTP
protocol used for communication between the browser and the server and
due to the type of information available in the server's logfile, the
analyzer can not:
Even if you classify certain URLs as pageviews or use a specific
time-window to count sessions, this in no way tells you anything
about the number of real visitors of your server.
However, if you use an appropriate server structure with files grouped by
their content or if you use the HideURL
directive to group unstructured files together, the statistics report
does show you at least a trend or a tendency. If you follow the numbers for
some time, you will soon get a feeling for which documents are most interesting
to the visitors of your site.
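
To see why the session count depends on the chosen time-window rather than on the number of real visitors, consider the following Python sketch of the session rule described under Sessions. It is an illustration only, not the implementation used by http-analyze. The names `ACCESSES` and `count_sessions`, the timestamps in seconds, and the treatment of an access exactly one window after the first one as part of the same session are assumptions.

```python
# Hypothetical accesses: (host, time of access in seconds since some start).
ACCESSES = [
    ("a.example.com", 0),
    ("a.example.com", 3600),       # 1 hour after the first access
    ("a.example.com", 10800),      # 3 hours after the first access
    ("proxy.example.net", 0),      # a proxy may stand for many users
    ("proxy.example.net", 600),
]


def count_sessions(accesses, window_seconds):
    """Count sessions: a host starts a new session when an access lies
    more than window_seconds after the first access of its current one."""
    session_start = {}   # host -> time of first access in current session
    sessions = 0
    for host, when in sorted(accesses, key=lambda access: access[1]):
        start = session_start.get(host)
        if start is None or when - start > window_seconds:
            sessions += 1
            session_start[host] = when
    return sessions


sessions_half_hour = count_sessions(ACCESSES, 30 * 60)
sessions_two_hours = count_sessions(ACCESSES, 2 * 60 * 60)
sessions_one_day = count_sessions(ACCESSES, 24 * 60 * 60)
```

The same five accesses yield a different number of sessions for each window, and the proxy host is counted as a single session no matter how many people are behind it.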