Standard Apache Access Logging

Using Apache's basic logging features, you can keep track of who visits your Web sites by logging accesses to the servers hosting them. You can log every aspect of the browser requests and server responses, including the IP address of the client, user, and resource accessed. You need to take three steps to create a request log: decide what to log (the log format), decide where to log it (a file or an external program), and decide whether a given request should be logged at all (conditional logging).
The next few sections will take a closer look at these steps.

Deciding What to Log

As well as logging nearly every aspect associated with the request, you can define how your log entries appear by creating a log format. A log format is a string that contains text mixed with log formatting directives. Log formatting directives start with a % and are followed by a directive name or identifier, usually a letter indicating the piece of information to be logged. When Apache logs a request, it scans the string and substitutes the value for each directive. For example, if the log format is

    This is the client address %a

the log entry is something like

    This is the client address 10.0.0.2

That is, the logging directive %a is replaced by the IP address of the client making the request. Table 25.1 provides a comprehensive list of all formatting directives.

The Common Log Format (CLF) is a standard log format. Most Web sites can log requests using this format, and the format is understood by many log processing and reporting tools. Its format is the following:

    "%h %l %u %t \"%r\" %>s %b"

That is, it includes the hostname or IP address of the client, the remote user via identd, the remote user via HTTP authentication, the time the request was received, the text of the request, the status code, and the size in bytes of the content served.

By the Way
You can read the Common Log Format documentation of the original W3C server at http://www.w3.org/Daemon/User/Config/Logging.html.

The following is a sample CLF entry:

    10.0.0.1 - - [26/Aug/2004:11:27:56 -0800] "GET / HTTP/1.1" 200 1456

You are now ready to learn how to define log formats using the LogFormat directive. This directive takes two arguments: The first argument is a logging string, and the second is a nickname that will be associated with that logging string.
For example, the following directive from the default Apache configuration file defines the CLF and assigns it the nickname common:

    LogFormat "%h %l %u %t \"%r\" %>s %b" common

You can also use the LogFormat directive with only one argument, either a log format string or a nickname. This sets the default logging format used by the TransferLog directive, explained in "Logging Accesses to Files" later in this chapter.

The HostNameLookups Directive

When a client makes a request, Apache knows only the IP address of the client. Apache must perform what is called a reverse DNS lookup to find out the hostname associated with the IP address. This operation can be time-consuming and can introduce a noticeable lag in the request processing. The HostNameLookups directive allows you to control whether to perform the reverse DNS lookup.

The HostNameLookups directive can take one of the following arguments: on, off, or double. The default is off. The double argument means that Apache will find out the hostname from the IP address and then look up that hostname to confirm that it maps back to the same IP address. This extra check matters if you rely on hostnames for security decisions, as described in http://httpd.apache.org/docs-2.0/dns-caveats.html. If you are using hostnames as part of your Allow and Deny rules, a double DNS lookup is performed regardless of the HostNameLookups setting.

If HostNameLookups is enabled (on or double), Apache will log the hostname. Additionally, the result will be passed to CGI scripts via the environment variable REMOTE_HOST. The lookups do cause extra load on your server, which you should weigh when deciding whether to turn HostNameLookups on or off. If you choose to keep HostNameLookups off, which is recommended for medium-to-high traffic sites, Apache will log only the IP address. There are plenty of tools to resolve the IP addresses in the logs later. Refer to the "Managing Apache Logs" section later in this chapter.
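To illustrate resolving addresses after the fact, here is a minimal Python sketch, similar in spirit to the logresolve utility that ships with Apache. It assumes that every log line begins with the client IP address followed by a space, as in the CLF. The helper names reverse_lookup and resolve_log_lines are invented for this example and are not part of Apache.

```python
import socket
import sys
from typing import Callable, Dict, Iterable, Iterator


def reverse_lookup(ip: str) -> str:
    """Return the hostname for ip, or ip itself if the lookup fails."""
    try:
        return socket.gethostbyaddr(ip)[0]
    except OSError:  # socket.herror and socket.gaierror are OSError subclasses
        return ip


def resolve_log_lines(
    lines: Iterable[str],
    lookup: Callable[[str], str] = reverse_lookup,
) -> Iterator[str]:
    """Replace the leading IP address of each line with its hostname.

    Results are cached so that each distinct address is looked up once.
    """
    cache: Dict[str, str] = {}
    for line in lines:
        # Assumption: the first space-separated field is the client address.
        ip, sep, rest = line.partition(" ")
        if ip not in cache:
            cache[ip] = lookup(ip)
        yield cache[ip] + sep + rest


if __name__ == "__main__":
    # Read an access log on standard input, write the resolved log to standard output.
    for resolved in resolve_log_lines(sys.stdin):
        sys.stdout.write(resolved)
```

Because the lookups happen offline, possibly on another machine, slow DNS responses delay only the report and not the requests themselves.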
The IdentityCheck Directive

At the beginning of the chapter, we explained how to log the remote username via the identd protocol using the %l log formatting directive. The IdentityCheck directive takes a value of on or off to enable or disable checking for that value and making it available for inclusion in the logs. Because the information is not reliable and takes a long time to check, it is switched off by default and should probably never be enabled. We mentioned %l only because it is part of the CLF. For more information on the identd protocol, see RFC 1413 at http://www.rfc-editor.org/rfc/rfc1413.txt.

Status Code

You can specify whether to log specific elements in a log entry. At the beginning of the chapter, you learned that log directives start with a %, followed by a directive identifier. In between, you can insert a list of status codes, separated by commas. If the request status is one of the listed codes, the parameter will be logged; otherwise, a - will be logged.

For example, the following directive identifier logs the browser name and version for malformed requests (status code 400) and for requests with methods not implemented (status code 501). This information can be useful for tracking which clients are causing problems.

    %400,501{User-agent}i

You can precede the status code list with an ! to reverse the test, so that the parameter is logged only when the request status is not one of the listed codes:

    %!400,501{User-agent}i

Logging Accesses to Files

Logging to files is the default way of logging requests in Apache. You can define the name of the file using the TransferLog and CustomLog directives.

The TransferLog directive takes a file argument and uses the latest log format defined by a LogFormat directive with a single argument (the nickname or the format string). If no log format is present, it defaults to the CLF.
The following example shows how to use the LogFormat and TransferLog directives to define a log format that is based on the CLF but that also includes the browser name:

    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{User-agent}i\""
    TransferLog logs/access_log

The CustomLog directive enables you to specify the logging format explicitly. It takes at least two arguments: a destination file and a logging format. The logging format can be specified as a nickname or as a logging string directly. For example, the directives

    LogFormat "%h %l %u %t \"%r\" %>s %b \"%{User-agent}i\"" myformat
    CustomLog logs/access_log myformat

and

    CustomLog logs/access_log "%h %l %u %t \"%r\" %>s %b \"%{User-agent}i\""

are equivalent.

Logging Environment Variables with CustomLog

The CustomLog directive accepts an environment variable as a third argument, written as env=variable. If the environment variable is present, the entry will be logged; otherwise, it will not. If the environment variable is negated by prefixing an ! to it, the entry will be logged only if the variable is not present.

The following example shows how to keep requests for images in GIF and JPEG format out of your logs:

    SetEnvIf Request_URI "(\.gif|\.jpg)$" image
    CustomLog logs/access_log common env=!image

By the Way
The regular expressions used for pattern matching in this and other areas of the httpd.conf file follow the same format as regular expressions in PHP and other programming languages.

Logging Accesses to a Program

Both the TransferLog and CustomLog directives can accept an executable program, prefixed by a pipe sign |, as an argument. Apache will write the log entries to the standard input of this program. The program will, in turn, process the input by logging the entries to a database, transmitting them to another system, and so on.

If the program dies for some reason, the server makes sure that it is restarted. If the server stops, the program is stopped as well.
The rotatelogs utility, bundled with Apache and explained later in this chapter, is an example of a logging program.

As a general rule, unless you have a specific requirement for using a particular program, it is easier and more reliable to log to a file on disk and do the processing, merging, analysis of logs, and so on, at a later time, possibly on a different machine.

By the Way
Make sure that the program you use for logging requests is secure because it runs as the user Apache was started with. On Unix, this usually means root because the external program will be started before the server changes its user ID to the value of the User directive, typically nobody or www.
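A piped logging program of the kind described in this section can be sketched in a few lines of Python. The sketch below is only an illustration, not a program that ships with Apache: it assumes that entries arrive in the CLF, and the names CLF_PATTERN, parse_clf, and main, as well as the idea of keeping only entries with a status of 400 or higher, are choices made for this example.

```python
import re
import sys
from typing import Dict, Optional

# One CLF entry: host, identd user, authenticated user, [time], "request", status, size.
CLF_PATTERN = re.compile(
    r'(?P<host>\S+) (?P<ident>\S+) (?P<user>\S+) '
    r'\[(?P<time>[^\]]+)\] '
    r'"(?P<request>(?:[^"\\]|\\.)*)" '
    r'(?P<status>\d{3}) (?P<size>\d+|-)'
)


def parse_clf(line: str) -> Optional[Dict[str, str]]:
    """Return the fields of a CLF entry as a dict, or None if the line does not match."""
    match = CLF_PATTERN.match(line)
    return match.groupdict() if match else None


def main(error_log_path: str) -> None:
    """Copy entries with a status of 400 or higher from standard input to a file."""
    with open(error_log_path, "a", encoding="utf-8") as out:
        for line in sys.stdin:  # Apache writes one entry per line to our stdin
            entry = parse_clf(line)
            if entry is not None and int(entry["status"]) >= 400:
                out.write(line)
                out.flush()  # keep the file current while the server runs


if __name__ == "__main__":
    main(sys.argv[1])
```

Assuming the script is saved as an executable file at the hypothetical path /usr/local/bin/error_filter.py, a directive such as CustomLog "|/usr/local/bin/error_filter.py /var/log/errors_only.log" common would feed it the access log. The loop ends when Apache closes the pipe, which matches the behavior described above when the server stops.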