Kyle Rankin

Chopping Logs

Why wait for your awstats job to finish when you need custom log results now? Check out a quick-and-dirty Perl one-liner that creates speedy tallies from log files and is easy to tweak to suit your particular statistics needs.

If you are a sysadmin, logs can be both a bane and a boon to your existence. On a bad day, a misbehaving program could dump gigabytes of errors into its log file, fill up the disk and light up your pager like a Christmas tree. On a good day, logs show you every clue you need to track down any of a hundred strange system problems. If you manage any Web servers, logs also provide valuable statistics. How many visitors did you get to your main index page today? What spider is hammering your site right now?

Many excellent log-analysis tools exist. Some provide really nifty real-time visualizations of Web traffic, and others run every night and generate manager-friendly reports for you to browse. All of these programs are great, and I suggest you use them, but sometimes you need specific statistics and you need them now. For these on-the-fly statistics, I've developed a common template for a shell one-liner that chops through logs like Paul Bunyan.

What I've found is that although the specific information I need might change a little, the algorithm remains mostly the same. For any log file, each line contains some bit of unique information I need. I run through the log file, identify that information and keep a running tally that increments each time I see the particular pattern. Finally, I output each pattern along with its final tally and sort based on the tally.

There are many ways you can do this type of log parsing. Old-school command-line junkies might prefer a nice sed and awk approach. The whipper-snappers out there might pick a nicely formatted Python script. There's nothing at all wrong with those approaches, but I suppose I fall into the middle-child scripting category—I prefer Perl for this kind of text hacking. Maybe it's the power of Perl regular expressions, or maybe it's how easy it is to use Perl hashes, or maybe it's just what I'm most comfortable with, but I just seem to be able to hack out this kind of script much faster in Perl.

Before I give a sample script though, here's a more specific algorithm. The script parses through each line of input and uses a regular expression to match a particular column or other pattern of data on the line. It then uses that pattern as a key in a hash table and increments the value of that key. When it's done accepting input, the script iterates through each key in the hash and outputs the tally for that key and the key itself.
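If it helps to see that algorithm before it gets squeezed onto one line, here is a rough multi-line sketch of the same template. The IP-address pattern is just a placeholder; swap in whatever regular expression matches the information you care about:

#!/usr/bin/perl
# Generic tally template: match a piece of each line, count how often
# each match appears, then print each match with its running tally.
my %tally;

while (<>) {
    # Placeholder pattern: capture an IP address at the start of the line.
    # Replace this with whatever column or pattern you want to count.
    if ( m|^(\d+\.\d+\.\d+\.\d+)| ) {
        $tally{$1}++;
    }
}

foreach my $key ( keys %tally ) {
    print "$tally{$key}\t$key\n";
}

Save it as something like tally.pl, make it executable, and feed it a log file the same way the one-liner below does.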

For the test case, I use a general-purpose problem you can try yourself, as long as you have an Apache Web server. I want to find out how many unique IP addresses visited one of my sites on November 1, 2008, and the top ten IPs in terms of hits.

Here's a sample entry from the log (the IP has been changed to protect the innocent):

123.123.12.34 - - [01/Nov/2008:19:34:02 -0700] "GET /talks/pxe/ui/default/iepngfix.htc HTTP/1.1" 404 308 "-" "Mozilla/4.0 (compatible; MSIE 7.0; Windows NT 6.0; SLCC1; .NET CLR 2.0.50727; Media Center PC 5.0; .NET CLR 3.0.04506; InfoPath.2)"

And, here's the one-liner that can parse the file and provide sorted output:

perl -e 'while(<>){ if( m|^(\d+\.\d+\.\d+\.\d+).*?01/Nov/2008| ){ $v{$1}++; } } foreach( keys %v ){ print "$v{$_}\t$_\n"; }' /var/log/apache/access.log | sort -n
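Because sort -n puts the busiest IPs at the end of the list, one way to answer the two questions in the test case is to tack a single extra command onto that pipeline. In the lines below, the quoted '...' simply stands for the same Perl code shown above:

perl -e '...' /var/log/apache/access.log | wc -l

counts the number of unique IPs that hit the site that day, and

perl -e '...' /var/log/apache/access.log | sort -n | tail -n 10

shows the ten busiest IPs by hit count.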

When you run this command, you should see output something like the following, only with more lines and IPs that aren't fake:
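For illustration only (these tallies and addresses are invented), the output takes this shape, with the hit count in the first column, the IP address in the second and the least-busy IPs first:

1	123.123.12.34
2	123.123.12.40
5	123.123.12.52
13	123.123.12.57
28	123.123.12.60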
