
Download a list of files from web... faster?

 



smilebey
Novice

Oct 9, 2013, 4:11 AM

Post #1 of 9 (917 views)
Download a list of files from web... faster?

Hello Perl Community,

I am currently trying to improve my code for downloading text files from a website. This is the code:

Code
foreach $line (@file) {
    # Each line has the following structure: did, filename, blank
    # (blank included because it will capture the newline)
    ($did, $get_file, $blank) = split(",", $line);
    $get_file = "http://www.arandomwebsite.com/Archives/" . $get_file;

    if ($get_file =~ /([0-9-]+)\.txt/) {
        $filename = $write_dir . "/" . $did . ".txt";
        open OUT, ">", $filename or die $!;
        print "file $did \n";

        # note: $ua->get() builds its own request, so this object is not actually used
        my $request = HTTP::Request->new(GET => $get_file);
        my $response = $ua->get($get_file);
        $p = $response->content;
        if ($p) {
            print OUT $p;
            close OUT;
        } else {
            # error logging
            print LOG "error in $filename - $did \n";
        }
    }
}

My question is: is there any way to speed up the downloading process? Could the loop, the general structure, or even the download call itself be improved?

I appreciate any comments and help. Thanks in advance.

smilebey


(This post was edited by smilebey on Oct 9, 2013, 4:12 AM)


2teez
Novice

Oct 9, 2013, 9:00 AM

Post #2 of 9 (905 views)
Re: [smilebey] Download a list of files from web... faster? [In reply to]

Hi,

I would suggest you profile your script with a good Perl source profiler like https://metacpan.org/module/Devel::NYTProf; then you can weed out or rewrite the lines that are slowing your script down.
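If you have not used it before, a typical run looks roughly like this (your script name, download.pl here, is just a placeholder):


Code
# run the script under the profiler; this writes an nytprof.out file
perl -d:NYTProf download.pl

# turn the raw profile into an HTML report (created in ./nytprof/)
nytprofhtml


The report shows per-line and per-subroutine timings, so you can see at a glance whether the time is going into the HTTP requests, the file writes, or the Perl code itself.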

Need I say, the portion of your script that you posted is not complete enough to say much more.

Hope this helps


smilebey
Novice

Oct 9, 2013, 9:16 AM

Post #3 of 9 (901 views)
Re: [2teez] Download a list of files from web... faster? [In reply to]

Thanks for your reply. I will take a look at your link. I've never worked with a source profiler. I'll try it.

Of course, the portion of my code wouldn't work on its own, but I didn't want to post the complete source code, since it wouldn't give you any additional important insight. The code above works perfectly and is correct.

I am just interested in whether any of you think there is another way to, for instance, construct the loop, or to use something other than HTTP::Request, etc.


(This post was edited by smilebey on Oct 9, 2013, 9:17 AM)


2teez
Novice

Oct 9, 2013, 9:54 AM

Post #4 of 9 (896 views)
Re: [smilebey] Download a list of files from web... faster? [In reply to]

Hi again,

Observations:
The following observations might not be correct, since the whole script is not shown.
1. How is your dataset formed, and how large is it?
If your dataset is a CSV file, why not use a CSV module like Text::CSV_XS? Of course, if it is a simple CSV, the split you used is still OK.

If your dataset is large, why first put all of it into an array variable and then loop over that array with a foreach loop, when you can simply go over each line, doing the same thing, with a while loop?

2. As much as possible, I try to follow DRY, i.e. Don't Repeat Yourself, by using subroutines and finding a way to bring together code that does the same thing. E.g. why write the "open" function several times when I can easily put it in a subroutine and combine it with the rest of the code?

3. Since the regex is matched on every iteration, why not use "qr/STRING/" so that your regex is compiled once but used as many times as you want (see the sketch below)? Please check and read http://perldoc.perl.org/perlop.html#Regexp-Quote-Like-Operators

There are a couple of other points I could mention, but those might just be personal preference.
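
To illustrate points 1 to 3, here is a rough sketch of how they might fit together. The list file name, the output directory and the log file name are assumptions on my part, and I have kept the plain split since your CSV looks simple:


Code
use strict;
use warnings;
use LWP::UserAgent;

my $ua        = LWP::UserAgent->new;
my $write_dir = 'downloads';                     # assumed output directory
my $pattern   = qr/([0-9-]+)\.txt/;              # compiled once, reused on every iteration

open my $list, '<', 'filelist.csv' or die $!;    # assumed name of your list file
open my $log,  '>', 'errors.log'   or die $!;    # assumed name of your error log

while (my $line = <$list>) {                     # read one line at a time
    chomp $line;
    my ($did, $get_file) = split /,/, $line;
    $get_file = "http://www.arandomwebsite.com/Archives/$get_file";
    save_file($did, $get_file) if $get_file =~ $pattern;
}

# one subroutine that opens, downloads and writes, instead of repeating that inline
sub save_file {
    my ($did, $url) = @_;
    my $response = $ua->get($url);
    if ($response->is_success) {
        open my $out, '>', "$write_dir/$did.txt" or die $!;
        print {$out} $response->content;
        close $out;
    }
    else {
        print {$log} "error in $write_dir/$did.txt - $did\n";
    }
}


The while loop reads one line at a time instead of holding the whole list in memory, and save_file is the single place that opens, downloads and writes a file.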

Hope this helps.


Laurent_R
Veteran / Moderator

Oct 9, 2013, 10:55 AM

Post #5 of 9 (891 views)
Re: [smilebey] Download a list of files from web... faster? [In reply to]

You should run the profiler if you really need better performance, but frankly I do not think that the execution time of your code is anything significant compared to the download time and possibly the I/O (printing to files).

The general idea is that trying to improve the performance of code that accounts for only 1 or 2% of your running time is just useless.


smilebey
Novice

Oct 9, 2013, 11:22 AM

Post #6 of 9 (889 views)
Re: [Laurent_R] Download a list of files from web... faster? [In reply to]

Thank you both for the answers. I will implement the suggestions 2teez mentioned. But I think you are right, Laurent_R: I would say the downloading and printing parts are the crucial pieces in my case. But since you didn't suggest any alternative, I guess there is nothing better than HTTP::Request and then printing to files, right?


Zhris
Enthusiast

Oct 9, 2013, 12:24 PM

Post #7 of 9 (879 views)
Re: [smilebey] Download a list of files from web... faster? [In reply to]

Hi,

A common way to download a page straight to a file is to use LWP::UserAgent's mirror method (http://search.cpan.org/~gaas/libwww-perl-6.05/lib/LWP/UserAgent.pm).


Quote
$ua->mirror( $url, $filename )

This method will get the document identified by $url and store it in a file called $filename. If the file already exists, then the request will contain an "If-Modified-Since" header matching the modification time of the file. If the document on the server has not changed since this time, then nothing happens. If the document has been updated, it will be downloaded again. The modification time of the file will be forced to match that of the server.

The return value is the response object.



Code
#!/usr/bin/perl 
use strict;
use warnings;
use LWP::UserAgent;

my $ua = LWP::UserAgent->new;
my $res = $ua->mirror('http://example.com', 'file.txt');
die $res->status_line unless $res->is_success;
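

Applied to your loop, the request-and-print section could then shrink to something like this (a sketch only; $did, $get_file, $write_dir and the LOG handle are taken from your original snippet):


Code
# inside your existing loop, once $did and $get_file have been built
my $filename = "$write_dir/$did.txt";
my $response = $ua->mirror($get_file, $filename);   # downloads straight into $filename
# a 304 (Not Modified) response just means the local copy is already up to date
unless ($response->is_success or $response->code == 304) {
    print LOG "error in $filename - $did \n";
}


The main win is on re-runs: files that have not changed on the server are skipped instead of being downloaded again.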


Chris


(This post was edited by Zhris on Oct 9, 2013, 12:50 PM)


smilebey
Novice

Oct 9, 2013, 1:53 PM

Post #8 of 9 (866 views)
Re: [Zhris] Download a list of files from web... faster? [In reply to]

Hello Chris,

Thanks for your suggestion. I implemented it and tested the performance of your code. Interestingly, there was no significant improvement at all. Thanks again anyway.


FishMonger
Veteran / Moderator

Oct 9, 2013, 9:03 PM

Post #9 of 9 (860 views)
Re: [smilebey] Download a list of files from web... faster? [In reply to]

There are a few inefficiencies in your code, but I doubt that they are enough to cause much of a slowdown in the downloads. If you need to improve the speed of the download process, you'll need to look at your internet connection and the server that you're downloading from.

You've already been advised to profile your script to see where it's spending its time. If you need to focus on the download process, then extract that portion out of the main script and put it into a new script to be profiled and optimized.
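
For example, a stripped-down test script, short of a full profiler run, could simply time the download and the file write separately with Time::HiRes. A rough sketch, with placeholder URLs:


Code
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use Time::HiRes qw(time);

# a handful of representative files from the real list (placeholder names here)
my @urls = map { "http://www.arandomwebsite.com/Archives/$_" }
           qw(sample-1.txt sample-2.txt sample-3.txt);

my $ua = LWP::UserAgent->new;
for my $url (@urls) {
    my $t0       = time;
    my $response = $ua->get($url);
    my $t1       = time;

    open my $out, '>', 'test_download.txt' or die $!;
    print {$out} $response->content;
    close $out;
    my $t2 = time;

    printf "%s: download %.3fs, write %.3fs\n", $url, $t1 - $t0, $t2 - $t1;
}


If the download column dwarfs the write column, the bottleneck is the network or the remote server, and no restructuring of the Perl code will change that much.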

 
 

