Using the HTTP protocol with PerlScript and ASP

One topic often discussed by ASP programmers is how to access content from other servers using protocols such as HTTP. Such techniques have many uses, such as checking that a user filling in a web form enters a valid URL, or pulling stock quotes from one site and publishing them on another.

There are several approaches to obtaining content from other servers, and in particular using the HTTP protocol to programmatically access one web page from within another. ASP developers using VBScript or JScript might like to take a look at this article, which describes using an ActiveX object to achieve this. Alternatively the AspHTTP™ component from ServerObjects Inc. is popular with developers.

An alternative approach is to use the PerlScript ActiveX scripting engine. This allows developers to write ASP documents in Perl, rather than the traditional VBScript or JScript. Like VBScript and JScript, Perl is an interpreted language, and is relatively easy to learn. It has long been the language of choice for many web developers, and given the long association of Perl with the Internet, it is unsurprising to find that it offers excellent support for the development of Internet applications. Perl is also a good choice when writing a script to extract and parse content from other servers, thanks to its superior text handling capabilities.

Using PerlScript

If you want to write an ASP document in PerlScript, add the following as the first line of your document:

<%@ LANGUAGE="PerlScript" %>

All the code added to this page between the <% %> tags will then be interpreted as PerlScript instead of the server’s default scripting language (which is usually VBScript).

Although you can, in theory, mix VBScript, JScript and PerlScript within the same document, this will lead to decreased server performance when compared to using a single scripting engine. More importantly, you run the risk of your ASP document outputting content from the various scripting engines in a different order to that which you might have intended. 

One further warning is that letting your web pages take input from other web pages carries security risks. You should, therefore, use this sample code with care, or perhaps restrict its use to an Intranet environment rather than a publicly accessible Internet site. Don’t forget as well that extracting content from third party web services could bring you into legal difficulties unless you have explicit permission to do so!

Anyway, onto the code samples. The first is a function called CheckURL that will determine whether a specified URL exists. The script uses the libwww-perl library (LWP), a collection of modules that can be used to programmatically access the web.

<%
sub CheckURL {
    # Subroutine to check that a URL exists
    # Use the first argument of the function as the URL to check
    my $url_to_check = $_[0];

    # Use the libwww Perl library
    use LWP::UserAgent;

    # Create a new instance of a libwww UserAgent in order to send HTTP requests
    my $ua = LWP::UserAgent->new;

    # Set the User-Agent HTTP header for the request
    $ua->agent("Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)");

    # Set a timeout for the HTTP request (in seconds)
    $ua->timeout(3);

    # Set a maximum size for the response content (in bytes)
    $ua->max_size(8192);

    # Initialise the HTTP request
    my $request = HTTP::Request->new(GET => $url_to_check);

    # Use the Accept header to ask the server for HTML
    $request->header('Accept' => 'text/html');

    # Send the HTTP request
    my $result = $ua->request($request);

    # Check the outcome of the HTTP request
    my $url_status;
    if ($result->is_success) {
        $url_status = "$url_to_check was detected";
    } else {
        $url_status = "$url_to_check was not detected";
    }

    # Return a string with the status of the request
    return $url_status;
}
%>

This function can then be called using the following PerlScript (changing the required URL as appropriate):

<%
$Response->Write(CheckURL("http://www.brettb.com/"));
%>
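
Given the earlier warning about taking input from other pages, it may be worth checking a URL before it is ever passed to CheckURL. The following is only a minimal sketch of that idea, using a regular expression and no extra modules. The function name IsAllowedURL and the host list (including intranet.example.com) are hypothetical and not part of the original script; adjust them to your own environment.

```perl
use strict;
use warnings;

# Hypothetical allowlist of hosts this page may contact (an assumption)
my %ALLOWED_HOSTS = map { $_ => 1 } qw(www.brettb.com intranet.example.com);

# Return 1 if $url is an http or https URL on an allowed host, otherwise 0
sub IsAllowedURL {
    my ($url) = @_;
    return 0 unless defined $url;

    # Capture the host name. URLs of the form user@host are rejected
    # because '@' is not permitted in the host part of the pattern.
    my ($host) = $url =~ m{\A https?:// ([A-Za-z0-9.-]+) (?: :\d+ )? (?: [/?\#] | \z )}xi
        or return 0;

    return $ALLOWED_HOSTS{ lc $host } ? 1 : 0;
}

# In an ASP page the check could guard the call, for example:
#   $Response->Write(CheckURL($url)) if IsAllowedURL($url);
```

An allowlist is stricter than trying to block known bad addresses, which suits the Intranet style of use suggested above.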

Extending the script

PerlScript offers a wealth of ways for extending the basic script shown above. For example, using the following as the last line of the CheckURL function will cause the script to return the actual HTML from the HTTP request:

return $result->content;

This is useful if you want to parse the HTML in order to extract portions of it.
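
As a small illustration of that kind of parsing, the sketch below pulls the page title out of a string of HTML, such as the value returned by $result->content. The function name ExtractTitle is hypothetical, and a regular expression like this is only adequate for simple cases; a proper HTML parser module is the safer choice for anything more involved.

```perl
use strict;
use warnings;

# Return the text of the first <title> element in $html,
# or undef if no title can be found
sub ExtractTitle {
    my ($html) = @_;
    return unless defined $html;

    # /i ignores the case of the tags, /s lets "." match line breaks
    my ($title) = $html =~ m{<title\b[^>]*>(.*?)</title\s*>}is
        or return;

    # Collapse runs of whitespace, then trim both ends
    $title =~ s/\s+/ /g;
    $title =~ s/\A\s+|\s+\z//g;

    return $title;
}
```

Bear in mind that CheckURL limits the response to 8192 bytes, so only content near the top of the document is certain to be present.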

Alternatively, if you are interested in the precise error message returned from a server, then the following code will be useful:

return $result->error_as_HTML;

If a URL is not found, then the function will return the following:

An Error Occurred
404 Object Not Found

Writing a link extractor

The following code demonstrates how PerlScript can be used to extract all of the hyperlinks from a document requested using HTTP. There are two functions: ExtractLinks and LinkCollector. ExtractLinks is the main function. LinkCollector is called from ExtractLinks, and is used to gather the requested document’s hyperlinks into a list. The two functions are shown below:

<%
sub ExtractLinks {
    # Subroutine to extract the links from a document
    # Use the first argument of the function as the URL to extract links from
    my $url_to_check = $_[0];

    # Use the libwww Perl library
    use LWP::UserAgent;

    # Use the link extracting HTML parser
    use HTML::LinkExtor;

    # The URL module is used here to expand URLs by including their base reference
    use URI::URL;

    # Empty the list that will contain details of the links within the document
    # (a package variable, so that LinkCollector can add to it)
    @LinksList = ();

    # Create a new instance of a libwww UserAgent in order to send HTTP requests
    my $ua = LWP::UserAgent->new;

    # Set the User-Agent HTTP header for the request
    $ua->agent("Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)");

    # Set a timeout for the HTTP request (in seconds)
    $ua->timeout(3);

    # Set a maximum size for the response content (in bytes)
    $ua->max_size(8192);

    # Create an instance of the link extracting HTML parser
    my $parser = HTML::LinkExtor->new(\&LinkCollector);

    # Send the HTTP request, passing each chunk of the response to the parser
    my $result = $ua->request(HTTP::Request->new(GET => $url_to_check),
        sub { $parser->parse($_[0]) });

    # Expand URLs to include the base reference
    my $base = $result->base;
    @LinksList = map { url($_, $base)->abs } @LinksList;

    # Check the outcome of the HTTP request
    # If successful, then return a list of links in the requested document
    # otherwise, return an error message
    if ($result->is_success) {
        # Build the output afresh on each call
        my $links_html = "";
        for (@LinksList) {
            $links_html .= "$_<br>";
        }
        return $links_html;
    } else {
        return "$url_to_check was not detected";
    }
}

# A short subroutine to collect the links into a list
sub LinkCollector {
    my ($tag, %attr) = @_;
    push(@LinksList, values %attr);
}
%>

The ExtractLinks subroutine can then be called using something like:

<%
$Response->Write(ExtractLinks("http://www.brettb.com/"));
%>
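
HTML::LinkExtor calls its callback with a lower-case tag name followed by that tag's link attributes, so LinkCollector gathers image sources and stylesheet references as well as ordinary hyperlinks. If only the links from anchor tags are wanted, the callback can filter on the tag name. The sketch below shows one possible variant; the names AnchorCollector and @AnchorLinks are hypothetical, and the calls at the end simulate what the parser would pass, so that the example runs without fetching a page.

```perl
use strict;
use warnings;

# List that receives the href of each anchor tag
my @AnchorLinks;

# Callback taking the same arguments that HTML::LinkExtor passes:
# a lower-case tag name, then the tag's link attributes
sub AnchorCollector {
    my ($tag, %attr) = @_;

    # Ignore everything except <a href="...">
    return unless $tag eq 'a' && defined $attr{href};

    push @AnchorLinks, $attr{href};
    return;
}

# Simulated callback invocations, standing in for a real parse
AnchorCollector('img',  src  => 'logo.gif');
AnchorCollector('a',    href => 'articles/index.htm');
AnchorCollector('link', href => 'style.css');
AnchorCollector('a',    href => 'http://www.perl.org/');

print join("\n", @AnchorLinks), "\n";
```

In ExtractLinks, a reference to such a callback would be passed to HTML::LinkExtor->new in place of \&LinkCollector.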

Further reading

If you want to install ActivePerl on your web server, then download it (free of charge) from the ActiveState website. The installation routine creates an extensive library of documentation, including reference guides to the Perl modules and functions described in this article.

There are plenty of online resources for learning Perl, with http://www.perl.com and http://www.perl.org being two of the best starting points.

All content is © 1995 - 2008 Brett Burridge