loop through websites

 



thePanda41
Novice

Jul 30, 2007, 6:39 AM

Post #1 of 15 (1522 views)
loop through websites

I am attempting to run through a website which contains a bunch of links. Is it possible to run a loop that will open each link individually, run my regular expressions to capture certain data, then go on to the next links and do the same thing?


KevinR
Veteran


Jul 30, 2007, 9:13 AM

Post #2 of 15 (1519 views)
Re: [thePanda41] loop through websites

Sounds possible.
-------------------------------------------------


adaykin
Novice

Jul 30, 2007, 12:15 PM

Post #3 of 15 (1517 views)
Re: [thePanda41] loop through websites

Check out the Robot::UA class. I can't remember whether it is a default module, but you will most likely need to get familiar with it and with the LWP classes.
------------------------------------------------------------

New Horizon Designs <-- My site, just updated the GUI to a PHP Nuke interface


KevinR
Veteran


Jul 30, 2007, 4:12 PM

Post #4 of 15 (1515 views)
Re: [adaykin] loop through websites

Robot::UA? Maybe you are thinking of LWP::UserAgent?
-------------------------------------------------


adaykin
Novice

Jul 30, 2007, 6:25 PM

Post #5 of 15 (1513 views)
Re: [KevinR] loop through websites

Sorry, I meant to say LWP::RobotUA. It's a default module installed with Perl; just go to your command prompt and type "perldoc LWP::RobotUA", which should give you a start. I'm using it now to traverse a few sites.
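A minimal sketch of what using LWP::RobotUA might look like, assuming the module is installed. The agent name `example-crawler/0.1`, the contact address `you@example.com`, and the helper `make_robot` are placeholders for illustration, not anything from this thread. URLs are taken from the command line, so nothing is fetched unless you pass some.

```perl
use strict;
use warnings;
use LWP::RobotUA;

# Build a user agent that honours robots.txt and pauses between requests.
# Both arguments are placeholders: use your own crawler name and address.
sub make_robot {
    my ($agent, $from) = @_;
    my $ua = LWP::RobotUA->new(agent => $agent, from => $from);
    $ua->delay(10 / 60);    # delay is given in minutes: 10 seconds per host
    return $ua;
}

my $ua = make_robot('example-crawler/0.1', 'you@example.com');

# Fetch every URL named on the command line and report its status.
for my $url (@ARGV) {
    my $response = $ua->get($url);
    print "$url: ", $response->status_line, "\n";
}
```

The default delay is one minute between requests to the same host, which is polite but slow for testing, hence the shorter value here.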
------------------------------------------------------------



KevinR
Veteran


Jul 30, 2007, 8:16 PM

Post #6 of 15 (1512 views)
Re: [adaykin] loop through websites

No LWP modules are core modules, but they might be included with some distributions of Perl. You can see a list of the core (5.8) modules starting with "L" here:

http://perldoc.perl.org/index-modules-L.html
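You can also check this from Perl itself. The sketch below assumes Module::CoreList is available (it ships with Perl from 5.8.9 onward and is on CPAN for older versions); `core_since` is a helper name made up for this example.

```perl
use strict;
use warnings;
use Module::CoreList;

# Return the first Perl release that bundled a module,
# or undef if the module has never been part of the core.
sub core_since {
    my ($module) = @_;
    return Module::CoreList->first_release($module);
}

for my $module (qw(LWP::UserAgent LWP::RobotUA List::Util)) {
    my $release = core_since($module);
    printf "%-16s %s\n", $module,
        defined $release ? "core since $release" : "not a core module";
}
```

A module can still be present on a given machine without being core, which is the case for LWP on many distributions.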
-------------------------------------------------


adaykin
Novice

Jul 31, 2007, 6:42 AM

Post #7 of 15 (1509 views)
Re: [KevinR] loop through websites

Well, if he gets Perl from ActiveState, LWP comes installed by default. Even on Linux machines that have Perl already installed, every one I have touched came with the LWP modules already installed.

I would recommend ActiveState if Perl isn't already installed on your machine. They offer Perl in binary form with an installer that is easy to run.
------------------------------------------------------------



hydpm
User

Jul 31, 2007, 7:26 AM

Post #8 of 15 (1504 views)
Re: [thePanda41] loop through websites

You can use LinkChecker.
I think it will serve your purpose.


hydpm
User

Jul 31, 2007, 7:28 AM

Post #9 of 15 (1503 views)
Re: [thePanda41] loop through websites

use WE_Frontend::LinkChecker;
my $lc = WE_Frontend::LinkChecker->new(
    -url      => "http://www/",
    -restrict => [..],
);
my $errors = $lc->check_html;
print $errors;


thePanda41
Novice

Jul 31, 2007, 8:26 AM

Post #10 of 15 (1498 views)
Re: [thePanda41] loop through websites

Thanks, guys. I'll test some of that out and see where I get.


KevinR
Veteran


Jul 31, 2007, 9:54 AM

Post #11 of 15 (1496 views)
Re: [wingsof5r] loop through websites


In Reply To
You can use LinkChecker.
I think it will serve your purpose.


It might work for all I know; I have never heard of that module before. The module's own description is:

WE_Frontend::LinkChecker - check a site for broken links
-------------------------------------------------


KevinR
Veteran


Jul 31, 2007, 9:59 AM

Post #12 of 15 (1493 views)
Re: [adaykin] loop through websites


In Reply To
Well, if he gets Perl from ActiveState, LWP comes installed by default. Even on Linux machines that have Perl already installed, every one I have touched came with the LWP modules already installed.

I would recommend ActiveState if Perl isn't already installed on your machine. They offer Perl in binary form with an installer that is easy to run.


Sorry, mate, I hope it did not seem as though I was trying to nit-pick your suggestion. The distinction between a "default" and a "core" module was my only concern.

Kevin
-------------------------------------------------


KevinR
Veteran


Jul 31, 2007, 10:04 AM

Post #13 of 15 (1492 views)
Re: [thePanda41] loop through websites


In Reply To
Thanks, guys. I'll test some of that out and see where I get.


If nothing else works, you can always get the page with LWP or LWP::Simple, use HTML::TokeParser to pull the links out of the HTML, and then loop through them.

http://search.cpan.org/~gaas/HTML-Parser-3.56/lib/HTML/TokeParser.pm

There is an example in the TokeParser documentation for getting the links from an HTML document.
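A rough sketch of that approach, assuming LWP::UserAgent, HTML::TokeParser, and URI are installed. The helper names `extract_links` and `fetch` are made up for this example, and the `<title>` pattern is only a stand-in for whatever regular expressions you actually want to run on each page.

```perl
use strict;
use warnings;
use LWP::UserAgent;
use HTML::TokeParser;
use URI;

# Collect the absolute URL of every <a href="..."> in an HTML string.
sub extract_links {
    my ($html, $base) = @_;
    my $parser = HTML::TokeParser->new(\$html) or die "cannot parse HTML";
    my @links;
    while (my $tag = $parser->get_tag('a')) {
        my $href = $tag->[1]{href};    # $tag->[1] is the attribute hash
        next unless defined $href;     # skip anchors without an href
        push @links, URI->new_abs($href, $base)->as_string;
    }
    return @links;
}

# Fetch a page and return its decoded body, or undef on failure.
sub fetch {
    my ($ua, $url) = @_;
    my $response = $ua->get($url);
    return $response->is_success ? $response->decoded_content : undef;
}

# The crawl runs only when a start URL is given on the command line.
if (my $start = shift @ARGV) {
    my $ua = LWP::UserAgent->new(timeout => 10);
    my $index = fetch($ua, $start);
    die "could not fetch $start\n" unless defined $index;

    for my $link (extract_links($index, $start)) {
        my $page = fetch($ua, $link);
        next unless defined $page;
        # Placeholder pattern: swap in your own regular expression here.
        if ($page =~ m{<title>\s*(.*?)\s*</title>}is) {
            print "$link: $1\n";
        }
    }
}
```

Relative links are resolved against the start URL with URI->new_abs, since many pages link to paths like /page.html rather than full addresses.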
-------------------------------------------------


hydpm
User

Aug 1, 2007, 7:09 AM

Post #14 of 15 (1485 views)
Re: [thePanda41] loop through websites

I have used the LinkChecker tool in one of my projects.
The linkchecker-4.0-1.i386.rpm package can be downloaded from
http://linkchecker.sourceforge.net/

Hope this helps.

You can write a Perl script that invokes this utility.

I have used something like the following:

---------------------
echo "#Links to validate the integration for apache"
echo "#############################################"
linkchecker -r 0 http://rhel4-in-qa1.spikesource.in/
linkchecker -r 0 http://rhel4-in-qa1.spikesource.in/withauth -u guest -p guest
linkchecker -r 0 https://rhel4-in-qa1.spikesource.in/ -u guest -p guest
linkchecker -r 0 https://rhel4-in-qa1.spikesource.in/withauth -u guest -p guest
--------------------------------------

It is just part of the code.
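A minimal sketch of a Perl wrapper around the commands above, assuming linkchecker is on your PATH and accepts the same -r, -u, and -p flags used in this post. The helper name `linkchecker_command` is made up for this example; URLs to check are read from the command line.

```perl
use strict;
use warnings;

# Build the argument list for one linkchecker run.
# The flags are copied from the shell commands in this post.
sub linkchecker_command {
    my ($url, $user, $password) = @_;
    my @cmd = ('linkchecker', '-r', '0', $url);
    push @cmd, '-u', $user, '-p', $password if defined $user;
    return @cmd;
}

# Check every URL named on the command line, without credentials.
for my $url (@ARGV) {
    my @cmd = linkchecker_command($url);
    # The list form of system() bypasses the shell, so URLs need no quoting.
    system(@cmd) == 0
        or warn "linkchecker failed for $url (exit status " . ($? >> 8) . ")\n";
}
```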


adaykin
Novice

Aug 2, 2007, 7:12 AM

Post #15 of 15 (1474 views)
Re: [KevinR] loop through websites


In Reply To

In Reply To
Well, if he gets Perl from ActiveState, LWP comes installed by default. Even on Linux machines that have Perl already installed, every one I have touched came with the LWP modules already installed.

I would recommend ActiveState if Perl isn't already installed on your machine. They offer Perl in binary form with an installer that is easy to run.


Sorry, mate, I hope it did not seem as though I was trying to nit-pick your suggestion. The distinction between a "default" and a "core" module was my only concern.

Kevin



No problem, man. I see where you were coming from.
------------------------------------------------------------


 
 

