DeG00gleIISer


Introduction

Google is one of the success stories of the Internet in the last year. While so many Internet businesses have fallen by the wayside, Google continues to gain share in the Internet search market. As an example, of the top 20 sites referring Internet traffic to brettb.com during the first week of April 2002, eight were owned or operated by Google.

DeG00gleIISer is a small utility for determining what visitors who arrive at your website from Google were searching for. It extracts this information from the logfiles of Internet Information Server 4.0 (or Internet Information Services on Windows 2000), provided they are in the W3C Extended Log Format*. Note that the logfiles must record the Referer, i.e. the HTTP_REFERER server variable, which appears in W3C Extended logs as the cs(Referer) field.

*Other log formats may work.
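
To make the idea concrete, here is a minimal sketch of how a search phrase might be pulled out of a W3C Extended log line. It is written in Python purely for illustration and is not the DeG00gleIISer Perl source. The function names, the sample log lines and the rough "google." hostname check are all assumptions; the only details taken from the log format itself are the "#Fields:" directive, the space-separated fields and the cs(Referer) and cs-uri-stem field names.

```python
from urllib.parse import urlsplit, parse_qs


def parse_fields_directive(line):
    """Return the field names from a '#Fields:' directive, or None."""
    if line.startswith("#Fields:"):
        return line[len("#Fields:"):].split()
    return None


def google_query(referer):
    """Return the search phrase from a Google referrer URL, or None.

    Assumption: a referrer counts as Google if its hostname contains
    'google.' and the phrase is carried in the 'q' query parameter.
    """
    parts = urlsplit(referer)
    host = parts.hostname or ""
    if "google." not in host:
        return None
    terms = parse_qs(parts.query).get("q")
    return terms[0] if terms else None


def extract(log_lines):
    """Yield (page, search phrase) pairs for Google referrals."""
    fields = []
    for line in log_lines:
        line = line.rstrip("\n")
        names = parse_fields_directive(line)
        if names is not None:
            fields = names          # field layout can change mid-file
            continue
        if not line or line.startswith("#"):
            continue                # skip blank lines and other directives
        record = dict(zip(fields, line.split(" ")))
        phrase = google_query(record.get("cs(Referer)", "-"))
        if phrase:
            yield record.get("cs-uri-stem", "-"), phrase


# Hypothetical sample lines, not taken from a real logfile.
sample = [
    "#Fields: date time cs-uri-stem cs(Referer)",
    "2002-04-01 10:15:00 /index.htm http://www.google.com/search?q=iis+log+analysis&hl=en",
    "2002-04-01 10:16:00 /index.htm -",
]
results = list(extract(sample))
```

Reading the field order from the "#Fields:" directive, rather than assuming a fixed column position, means the sketch copes with servers that log a different set of fields, as long as the Referer is among them.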

Sample Report

A sample report was prepared from the IIS logfiles for brettb.com from November 2000 to April 2002, and may be viewed here.

Each report contains three HTML pages. The "Page Summary" report shows the number of Google referrals to each page. The "Page Detail" report shows the keyword searches on Google that resulted in visits to specific pages. Finally, the "Keyword Summary" report shows the words that users were searching for on Google when they visited your site.
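
The three reports correspond to three different tallies over the same list of referrals. The following Python sketch shows one way such tallies could be built. It is an illustration only: the function name and sample data are hypothetical, and splitting phrases on whitespace to count individual words is an assumption about how the "Keyword Summary" is produced.

```python
from collections import Counter, defaultdict


def summarise(referrals):
    """Build the three tallies from (page, search phrase) pairs."""
    page_summary = Counter()             # referrals per page
    page_detail = defaultdict(Counter)   # search phrases per page
    keyword_summary = Counter()          # individual words across all searches
    for page, phrase in referrals:
        page_summary[page] += 1
        page_detail[page][phrase] += 1
        keyword_summary.update(phrase.split())
    return page_summary, page_detail, keyword_summary


# Hypothetical referrals, for illustration only.
referrals = [
    ("/a.htm", "perl iis"),
    ("/a.htm", "perl logs"),
    ("/b.htm", "iis"),
]
page_summary, page_detail, keyword_summary = summarise(referrals)
```

With this sample data, "/a.htm" has two referrals, and the words "perl" and "iis" each appear twice in the keyword tally.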

Using the DeG00gleIISer Script

The DeG00gleIISer is available as a download. Although the script should be compatible with any version of Perl 5, it has been specifically developed using ActiveState's ActivePerl for Windows. ActivePerl is a free download from the ActiveState website.

The DeG00gleIISer.pl Perl script is run from the command line, using the following command line options:

Command line options:

--i
    Description: Input folder.
    Example: --i="C:\WINNT\system32\LogFiles\W3SVC1"

--o
    Description: Output folder (optional; by default the reports are placed in the current working folder).
    Example: --o="D:\reports\mywebsite"

--v
    Description: Verbose mode, which outputs script status to the console (optional; default is on).
    Example: --v=1 or --v=0

As an example, the script could be called from the command line using:

DeG00gleIISer.pl --i="C:\WINNT\system32\LogFiles\W3SVC1" --o="D:\reports\mywebsite" --v=0

Perl scripts can also be run on Windows machines from .bat files, which can in turn be scheduled using the AT command or the Windows Task Scheduler.

Further Configuration Options

As well as the command line options, there are a number of options that can only be changed by editing the DeG00gleIISer Perl script itself. To alter them, change the value of the relevant variables in the section of the script labelled "The following variables can be changed as required". The variables are:

$minimumkeywordlevel
    Description: The minimum number of searches for a keyword before it appears in the "Keyword Summary" report. For sites generating a lot of traffic, it is recommended that this variable and $minimumpagelevel are increased from their default values, as the reports can otherwise become very large.
    Example or default value: 5

$minimumpagelevel
    Description: The minimum number of referrals for a page before it appears in the "Page Summary" report.
    Example or default value: 5

$ignorecase
    Description: Set to 1 if the case of search terms should be ignored, or 0 if case should be taken into account.
    Example or default value: 1

$pagesummaryreport_filename
    Description: Filename to be used for the "Page Summary" report.
    Example or default value: PageSummary.html

$pagedetailreport_filename
    Description: Filename to be used for the "Page Detail" report.
    Example or default value: PageDetail.html

$keywordsummaryreport_filename
    Description: Filename to be used for the "Keyword Summary" report.
    Example or default value: KeywordSummary.html

$musthave
    Description: A list of file extensions that the IIS log files must have.
    Example or default value: .log\$|.txt\$

$weburlextension
    Description: A list of file extensions; only URLs with one of these extensions are considered when log files are scanned for Google referrals.
    Example or default value: .asp\$|.aspx\$|.htm\$|.html\$|.pdf\$

$mustnothave
    Description: A list of strings that a URL must not contain when log files are scanned for Google referrals.
    Example or default value: _ScriptLibrary|_vti|postinfo
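
The pattern-style options and the minimum levels can be pictured as a set of filters. The sketch below uses Python for illustration and is not the script's own code. It assumes that the "\$" in the default values is Perl string escaping for an end-of-string anchor, that each list is a regular expression with alternatives separated by "|", and that a count equal to the minimum level is still reported. The constant and function names are hypothetical, and the dots are escaped here so that they match only a literal ".".

```python
import re

# Hypothetical Python equivalents of the documented default values.
MUST_HAVE = re.compile(r"\.log$|\.txt$")
WEB_URL_EXTENSION = re.compile(r"\.asp$|\.aspx$|\.htm$|\.html$|\.pdf$")
MUST_NOT_HAVE = re.compile(r"_ScriptLibrary|_vti|postinfo")
MINIMUM_PAGE_LEVEL = 5


def is_log_file(filename):
    """True if the filename ends with one of the accepted log extensions."""
    return MUST_HAVE.search(filename) is not None


def is_countable_url(url):
    """True if the URL has an accepted extension and no excluded string."""
    return (WEB_URL_EXTENSION.search(url) is not None
            and MUST_NOT_HAVE.search(url) is None)


def apply_minimum(counts, minimum):
    """Keep only entries whose count reaches the minimum (assumed inclusive)."""
    return {key: n for key, n in counts.items() if n >= minimum}


# Hypothetical counts, for illustration only.
reported = apply_minimum({"/a.htm": 7, "/b.htm": 2}, MINIMUM_PAGE_LEVEL)
```

Under these assumptions a URL such as "/_vti_bin/shtml.htm" is excluded even though it ends in ".htm", because the exclusion list takes precedence over the extension list.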

Source Code and Documentation

The source code for the DeG00gleIISer script is available using the link below:

DeG00gleIISer_1.0.zip (5.30 KB ZIP file).

PerlDoc-generated technical documentation.

Legal Issues

This software has been released as 'freeware' on the condition that it is not modified in any way. The software may not be redistributed without permission from the author.

Google is a trademark of Google Inc.

Limited technical support for this application is available by contacting the author. No responsibility will be accepted for any consequence arising from the installation of this program.

Further Reading

ActivePerl is a free download from the ActiveState website.
ASP Documentation Tool. This creates technical documentation for ASP pages.

There are plenty of online resources for learning Perl, with http://www.perl.com and http://www.perl.org being two of the best starting points.
