remote site

 



pbuks
Deleted

Jan 3, 2001, 3:28 PM

Post #1 of 5 (290 views)
remote site

Hello,

How can I read the HTML source of a remote site (for example, http://www.myweb.com/index.html) and filter some words from it? Alternatively, how can I save that file in my cgi-bin and process it from there?

Thx



BigRich
Novice

Jan 3, 2001, 8:13 PM

Post #2 of 5 (284 views)
Re: remote site

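#!/usr/bin/perl -w

use strict;
use LWP::Simple;

# get() returns the page body, or undef if the request fails.
my $content = get("http://www.myweb.com/index.html");
die "Could not fetch the page\n" unless defined $content;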


All of the HTML from http://www.myweb.com/index.html is now stored in the $content variable. If the request fails, get() returns undef instead.

Check out the docs for LWP if you plan on retrieving content from the web.
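The filtering step from the original question can be sketched roughly as follows. This is only an illustration: filter_words is a hypothetical helper, the sample HTML string stands in for whatever get() returned, and masking with asterisks is one possible meaning of "filter". Because it works on the raw HTML text, it would also change matching words inside tags and attributes.

```perl
#!/usr/bin/perl -w
use strict;

# Hypothetical helper: mask each listed word in $content with asterisks.
sub filter_words {
    my ($content, @words) = @_;
    for my $word (@words) {
        # \Q...\E escapes regex metacharacters in the word;
        # \b restricts the match to whole words; /i ignores case.
        $content =~ s/\b\Q$word\E\b/'*' x length($word)/gie;
    }
    return $content;
}

# Stand-in for the content fetched from the remote site (an assumption).
my $html = "<p>Some sample text with a secret word.</p>";
print filter_words($html, "secret"), "\n";
```

To remove the words entirely instead of masking them, the replacement side of the substitution could simply be left empty.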

Good luck,

BigRich





pbuks
Deleted

Jan 4, 2001, 12:47 AM

Post #3 of 5 (279 views)
Re: remote site

Thx for your help. I will let you know if it works.




pbuks
Deleted

Jan 4, 2001, 7:28 AM

Post #4 of 5 (276 views)
Re: remote site

I tried these lines of code:

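#!e:\perl\bin\perl -w

use strict;
use LWP::Simple;

# Send the CGI header before any other output.
print "Content-type: text/html\n\n";

my $content = get("http://games.skynet.be/page.html?channel=arena&pagelang=nl&subject=scores&sid=20");

# get() returns undef on failure, so guard the print.
print defined $content ? $content : "Could not fetch the page.";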

When I run this script, it displays the following link instead of the page:
http://games.skynet.be/index2.html

THX



BigRich
Novice

Jan 4, 2001, 9:48 PM

Post #5 of 5 (268 views)
Re: remote site

It didn't work because your original post asked how to retrieve the content of a simple HTML page, when in fact the site you are trying to access is a multi-language site that is built with frames and uses cookies.

You need the proper cookie to access the page you are trying to reach. Without it, you get sent to index2.html, where you receive a cookie based on the menu selection you make. It doesn't matter whether the request comes from a browser or a CGI script; you still need the proper cookie.
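One way to see what the server actually sends back is to inspect the response instead of only printing the body. The sketch below assumes a reasonably recent libwww-perl; describe_response is a hypothetical helper, and the URL is taken from the command line. Note that it only lists the Set-Cookie headers of the final response, not of any intermediate redirects.

```perl
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;

# Hypothetical helper: summarize the status, the final URL, and any cookies
# offered by the final response.
sub describe_response {
    my ($res) = @_;
    my @lines = ("Status: " . $res->status_line);
    # After redirects, the response's request object holds the last URL requested.
    push @lines, "Final URL: " . $res->request->uri if $res->request;
    # In list context, header() returns every value of a repeated header.
    push @lines, "Set-Cookie: $_" for $res->header('Set-Cookie');
    return join("\n", @lines) . "\n";
}

# Only touch the network when a URL is given on the command line.
if (@ARGV) {
    my $ua = LWP::UserAgent->new;
    print describe_response($ua->get($ARGV[0]));
}
```

If the final URL differs from the one requested, or a Set-Cookie line shows up, that supports the cookie explanation above.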

You may be able to construct a user agent with LWP::UserAgent that accepts cookies, but more than likely you'll need a bot/spider to get at the information you want.
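A cookie-accepting user agent might look something like the sketch below. It is untested against the site in question and rests on assumptions: build_agent and fetch_with_cookies are hypothetical helpers, and it assumes that requesting some entry URL first is enough for the site to hand out its cookie. Since the cookie there depends on a menu selection, the entry URL may have to be the link that the selection leads to.

```perl
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use HTTP::Cookies;

# Hypothetical helper: build a user agent that remembers cookies between requests.
sub build_agent {
    my $ua = LWP::UserAgent->new;
    # An in-memory cookie jar; cookies are lost when the script exits.
    $ua->cookie_jar(HTTP::Cookies->new);
    return $ua;
}

# Hypothetical helper: request the entry page first so the site can set its
# cookie, then fetch the target page with the same agent.
sub fetch_with_cookies {
    my ($ua, $entry_url, $target_url) = @_;
    $ua->get($entry_url);
    my $res = $ua->get($target_url);
    return $res->is_success ? $res->decoded_content : undef;
}

# Usage: script.pl ENTRY_URL TARGET_URL (no request is made without both).
if (@ARGV == 2) {
    my $content = fetch_with_cookies(build_agent(), @ARGV);
    print defined $content ? $content : "Request failed\n";
}
```

Passing file and autosave options to HTTP::Cookies->new would keep the cookies on disk between runs, which may be useful for a CGI script that is started fresh on every request.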

You need to study the docs that came with Perl. The ones to concentrate on are the HTTP modules (HTTP::Cookies, HTTP::Headers, HTTP::Request, HTTP::Response) and the LWP documentation (LWP::UserAgent, LWP::RobotUA, lwpcook, etc.).

You'll also want to do a search for "perl bots" for sites and info pertaining to bots and spiders.

You could also re-post in the "Advanced" section of this forum, where someone with more experience with bots/spiders may be able to help. Be sure to give the URL that you are actually trying to access, not a simplified example as you did here, and explain that the site is built with frames and uses cookies.

Good luck,

BigRich



 
 

