Using the HTTP protocol with PerlScript and ASP
One topic often discussed by ASP programmers is how to access content from other
servers using protocols such as HTTP. Such techniques have many uses, for example
checking that a URL entered into a web form is valid, or pulling stock
quotes from one site and publishing them on another.
There are several approaches to obtaining content from other servers, and in particular
using the HTTP protocol to programmatically access one web page from within another. ASP
developers using VBScript or JScript might like to take a look at this article, which
describes using an ActiveX object to achieve this. Alternatively, the AspHTTP
component from ServerObjects Inc. is popular with developers.
An alternative approach is to use the PerlScript ActiveX scripting engine. This allows
developers to write ASP documents in Perl, rather than the traditional VBScript or
JScript. Like VBScript and JScript, Perl is an interpreted language, and is relatively
easy to learn. It has long been the language of choice for many web developers, and due to
the long association of Perl with the Internet, it is also unsurprising to find that it
offers excellent support for the development of Internet applications. Perl is also a good
choice when writing a script to extract and parse content from other servers, thanks to
its strong text-handling capabilities.
Using PerlScript
If you want to write an ASP document in PerlScript, then you may want to add the
following as the first line of your document:
<%@ LANGUAGE="PerlScript" %>
All the code added to this page between the <% %> tags will then be interpreted
as PerlScript instead of the server's default scripting language (which is usually
VBScript).
Although you can, in theory, mix VBScript, JScript and PerlScript within the same
document, this will lead to decreased server performance when compared to using a single
scripting engine. More importantly, you run the risk of your ASP document outputting
content from the various scripting engines in a different order to that which you might
have intended.
One further warning: letting your web pages take input from other web pages carries
security risks, because the content you fetch is outside your control. You should, therefore, use this
sample code with care, or perhaps restrict its use to an Intranet environment rather than
a publicly accessible Internet site. Don't forget as well that extracting content
from third party web services could bring you into legal difficulties unless you have
explicit permission to do so!
Anyway, onto the code samples. The first is a function called CheckURL that will
determine whether a specified URL exists. The script uses the libwww Perl library, a
collection of modules that can be used to programmatically access the web.
<%
sub CheckURL {
    # Subroutine to check that a URL exists
    # Use the first argument of the function as the URL to check
    my $url_to_check = $_[0];

    # Use the libwww Perl library
    use LWP::UserAgent;

    # Create a new instance of a libwww UserAgent in order to send HTTP requests
    my $ua = LWP::UserAgent->new;

    # Set the User-Agent HTTP header for the request
    $ua->agent("Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)");

    # Set a timeout for the HTTP request (in seconds)
    $ua->timeout(3);

    # Set a maximum size for the response content (in bytes)
    $ua->max_size(8192);

    # Initialise the HTTP request
    my $request = HTTP::Request->new(GET => $url_to_check);

    # Tell the server that HTML is the preferred response type
    $request->header('Accept' => 'text/html');

    # Send the HTTP request
    my $result = $ua->request($request);

    # Check the outcome of the HTTP request
    my $url_status;
    if ($result->is_success) {
        $url_status = "$url_to_check was detected";
    } else {
        $url_status = "$url_to_check was not detected";
    }

    # Return a string with the status of the request
    return $url_status;
}
%>
This function can then be called using the following PerlScript (changing the required
URL as appropriate):
<%
$Response->Write(CheckURL("http://www.brettb.com/"));
%>
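Because the URL handed to CheckURL may well come from a web form, it can be worth rejecting anything that is not a plain http or https address before any request is sent. The following is only a minimal sketch in plain Perl, shown outside the <% %> tags so it can be tried from the command line. The name IsFetchableURL and the exact rules it applies are assumptions made for illustration; they are not part of the original script or of libwww.

```perl
use strict;
use warnings;

# Hypothetical helper: returns 1 if the string looks like an absolute
# http:// or https:// URL with a host name, and 0 otherwise.
sub IsFetchableURL {
    my ($url) = @_;
    return 0 unless defined $url;

    # Reject whitespace and control characters anywhere in the string
    return 0 if $url =~ /[\s\x00-\x1f]/;

    # Require an http or https scheme, a host made of letters, digits,
    # dots and hyphens, an optional port, then a path, query, fragment or end
    return $url =~ m{^https?://[A-Za-z0-9.-]+(?::\d+)?(?:[/?#]|\z)}i ? 1 : 0;
}

print IsFetchableURL("http://www.brettb.com/") ? "accepted\n" : "rejected\n";
print IsFetchableURL("file:///c:/boot.ini")    ? "accepted\n" : "rejected\n";
```

A page could call this check first and only pass the value on to CheckURL when it returns 1. A check of this kind filters out obviously unsuitable input but is not a complete defence.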
Extending the script
PerlScript offers a wealth of ways for extending the basic script shown above. For
example, using the following as the last line of the CheckURL function will cause the
script to return the actual HTML from the HTTP request:
return $result->content;
This is useful if you want to parse the HTML in order to extract portions of it.
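As one illustration of extracting a portion of the returned HTML, the sketch below pulls out the page title with a regular expression. It is written as plain Perl working on a literal string so that it runs without a network connection; in practice the string would be the value of $result->content. The name ExtractTitle and the sample markup are assumptions for illustration, and a regular expression is only a rough tool for HTML, so a parser module is likely to be more robust for anything beyond simple cases.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the text of the first <title> element in
# an HTML string, or undef if no title is found.
sub ExtractTitle {
    my ($html) = @_;
    return unless defined $html;

    # /i ignores the case of the tag, /s lets "." match newlines in the title
    if ($html =~ m{<title[^>]*>(.*?)</title>}is) {
        my $title = $1;
        $title =~ s/\s+/ /g;     # collapse runs of whitespace into one space
        $title =~ s/^ | $//g;    # trim a leading and a trailing space
        return $title;
    }
    return;
}

my $sample = "<html><head><TITLE>\n  Example   page\n</TITLE></head><body></body></html>";
print ExtractTitle($sample), "\n";    # prints "Example page"
```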
Alternatively, if you are interested in the precise error message returned from a
server, then the following code will be useful:
return $result->error_as_HTML;
If a URL is not found, then the function will return the following:
An Error Occurred
404 Object Not Found
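When a page needs to branch on the outcome rather than display the server's HTML error message, the response object also exposes the numeric status code and the status line as plain text. The sketch below builds an HTTP::Response by hand so that it runs without a network connection; inside CheckURL the object would be the one returned by $ua->request. DescribeResponse is a hypothetical name used only for this example.

```perl
use strict;
use warnings;
use HTTP::Response;    # installed alongside libwww

# Hypothetical helper: summarise a response in one line of plain text
sub DescribeResponse {
    my ($response) = @_;
    return "OK (" . $response->code . ")" if $response->is_success;
    return "Failed: " . $response->status_line;
}

# Stand-in for the object that $ua->request() would return
my $not_found = HTTP::Response->new(404, "Object Not Found");
print DescribeResponse($not_found), "\n";    # prints "Failed: 404 Object Not Found"
```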
Writing a link extractor
The following code demonstrates how PerlScript can be used to extract all of the
hyperlinks from a document requested using HTTP. There are two functions: ExtractLinks and
LinkCollector. ExtractLinks is the main function. LinkCollector is called from
ExtractLinks, and is used to gather the requested document's hyperlinks into a list.
The two functions are shown below:
<%
sub ExtractLinks {
    # Subroutine to extract the hyperlinks from a document
    # Use the first argument of the function as the URL to extract links from
    my $url_to_check = $_[0];

    # Use the libwww Perl library
    use LWP::UserAgent;

    # Use the link extracting HTML parser
    use HTML::LinkExtor;

    # The URL module is used here to expand URLs by including their base reference
    use URI::URL;

    # Empty the list that will contain details of the links within the document
    # (a package variable, so that LinkCollector can add to it)
    @LinksList = ();

    # Create a new instance of a libwww UserAgent in order to send HTTP requests
    my $ua = LWP::UserAgent->new;

    # Set the User-Agent HTTP header for the request
    $ua->agent("Mozilla/4.0 (compatible; MSIE 4.0; Windows NT)");

    # Set a timeout for the HTTP request (in seconds)
    $ua->timeout(3);

    # Set a maximum size for the response content (in bytes)
    $ua->max_size(8192);

    # Create an instance of the link extracting HTML parser
    my $parser = HTML::LinkExtor->new(\&LinkCollector);

    # Send the HTTP request, passing each chunk of the response to the parser
    my $result = $ua->request(HTTP::Request->new(GET => $url_to_check),
                              sub { $parser->parse($_[0]) });

    # Expand URLs to include the base reference
    my $base = $result->base;
    @LinksList = map { url($_, $base)->abs } @LinksList;

    # Check the outcome of the HTTP request
    # If successful, then return a list of links in the requested document
    # otherwise, return an error message
    if ($result->is_success) {
        return join('', map { "$_<br>" } @LinksList);
    } else {
        return "$url_to_check was not detected";
    }
}

# A short subroutine to collect the links into a list
sub LinkCollector {
    my ($tag, %attr) = @_;
    push(@LinksList, values %attr);
}
%>
The ExtractLinks subroutine can then be called using something like:
<%
$Response->Write(ExtractLinks("http://www.brettb.com/"));
%>
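LinkCollector gathers every link attribute that the parser reports, which can include image sources and similar references as well as hyperlinks. If only the targets of <a href> tags are wanted, the callback can test the tag name before storing anything. The sketch below parses a literal HTML string so that it runs without a network connection; the sample markup, the base URL argument and the name AnchorLinks are assumptions for illustration.

```perl
use strict;
use warnings;
use HTML::LinkExtor;

# Hypothetical helper: return the absolute href of every <a> tag in $html
sub AnchorLinks {
    my ($html, $base) = @_;
    my @links;

    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;
            # Keep hyperlinks only; skip img, link, script and so on
            push @links, "$attr{href}" if $tag eq 'a' && defined $attr{href};
        },
        $base,    # when a base is given, relative URLs are made absolute
    );

    $parser->parse($html);
    $parser->eof;
    return @links;
}

my $html = '<a href="/about.html">About</a><img src="logo.gif">'
         . '<a href="http://www.perl.org/">Perl</a>';
print "$_\n" for AnchorLinks($html, "http://www.brettb.com/");
```

Keeping the list in a lexical variable inside the subroutine, rather than in a shared package variable, also means that repeated calls cannot see each other's results.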
Further reading
If you want to install ActivePerl on your web server, then download it (free of charge)
from the ActiveState website. The installation
routine creates an extensive library of documentation, including reference guides to the
Perl modules and functions described in this article.
There are plenty of online resources for learning Perl, with http://www.perl.com
and http://www.perl.org being two of the best
starting points.