Out of memory

 



wrkrbeee
Novice

Jan 16, 2015, 10:20 AM

Post #1 of 17 (4426 views)
Out of memory

Hi everyone. Like others who have posted here, I am getting an "out of memory" error while downloading a 450 MB text file. I have attached the code for your convenience. It appears that the code reads the entire file into memory rather than one line at a time. FYI, I inherited the code, and what I know about Perl fits easily in a dollhouse thimble. I am grateful for any help or insight. Thank you!
Attachments: out of memory.txt (1.73 KB)


FishMonger
Veteran / Moderator

Jan 16, 2015, 11:11 AM

Post #2 of 17 (4418 views)
Re: [wrkrbeee] Out of memory

Why haven't you followed the advice and suggestions we gave you yesterday in your perlmonks thread?

http://www.perlmonks.org/?node_id=1113334


wrkrbeee
Novice

Jan 16, 2015, 11:18 AM

Post #3 of 17 (4415 views)
Re: [FishMonger] Out of memory


I apologize for my ignorance. I incorporated the advice that was within reach of my Perl skills (which are infinitesimal, to say the least). I am happy to withdraw the question if that serves the forum best.



FishMonger
Veteran / Moderator

Jan 16, 2015, 11:21 AM

Post #4 of 17 (4414 views)
Re: [wrkrbeee] Out of memory

Part of your problem, which I hinted at in the perlmonks thread, is with these 2 lines.

Code
my $response = $ua->get($get_file);
$p = $response->content;


$response already holds the contents of the file (plus additional data), but you then copy that content into $p, thereby doubling the amount of memory your program uses to store that data.
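One way to avoid holding the body in memory at all is to have LWP::UserAgent write it to disk as it arrives. The following is a minimal sketch, not the original poster's script: the sub name fetch_to_file and the command-line usage are assumptions made for illustration. It relies on the ':content_file' option of get().

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;

# Fetch $url and write the body straight to $file.
# Returns the HTTP::Response object so the caller can inspect the status.
# (fetch_to_file is a hypothetical helper name, not part of LWP.)
sub fetch_to_file {
    my ($url, $file) = @_;
    my $ua = LWP::UserAgent->new;

    # ':content_file' makes LWP write each chunk to disk as it arrives,
    # so the body is never accumulated inside the response object.
    return $ua->get($url, ':content_file' => $file);
}

# Usage: perl fetch_to_file.pl URL OUTPUT_FILE
if (@ARGV == 2) {
    my ($url, $file) = @ARGV;
    my $response = fetch_to_file($url, $file);
    if ($response->is_success) {
        print "saved '$url' to '$file'\n";
    }
    else {
        warn "GET '$url' failed: ", $response->status_line, "\n";
    }
}
```

With this approach there is nothing to copy into a second variable, so the size of the download no longer determines the memory footprint.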


wrkrbeee
Novice

Jan 16, 2015, 11:33 AM

Post #5 of 17 (4411 views)
Re: [FishMonger] Out of memory

Thank you, sorry to bother the forum with this.


wrkrbeee
Novice

Jan 17, 2015, 6:30 AM

Post #6 of 17 (4397 views)
Re: [FishMonger] Out of memory


Hi FishMonger, I corrected the duplicate storage issue you mentioned, but I still receive the out of memory error. You also mentioned using LWP::Simple with its getstore function. I tried to incorporate both, but I get this message: Can't locate object method "new" via package "LWP::Simple" at line 20 (code attached below, if that is easier to work with). If you prefer, I can pursue this issue elsewhere. Thank you.




Code
 
#!/usr/bin/perl
use LWP::UserAgent;
use LWP::Simple;
use HTTP::Request;
sub get_http
{
    my $url = shift;
    my $request = HTTP::Request->new(GET => $url);
    my $response = $ua->request($request);
    if (!$response->is_success)
    {
        print STDERR "GET '%s' failed: %s\n",
            $url, $response->status_line;
        return undef;
    }
    return $response->content;
}
# user agent object for handling HTTP requests
#my $ua = LWP::UserAgent->new;
my $ua = LWP::Simple->new;

# if you only want a portion of the filing, un-comment the next line
#$ua->max_size(50000); # 50k byte limit

######################### write dir , use "\\" and not "\", for example: "C:\\temp"
#$write_dir = "E:\\Research\\SEC filings 10K and 10Q\\Data\\2000";
$write_dir = "G:\\Research\\SEC filings 10K and 10Q\\Data\\Filing Docs\\2014";
######################### write dir

######################### filename with urls (put in same directory as script)
open dlthis, "test.txt" or die $!;
######################### filename with urls (put in same directory as script)

######################### log
open LOG , ">download_log.txt" or die $!;
######################### log

#my @file = <dlthis>;
while (my $line = <dlthis>) {


    #foreach $line (@file) {
    #CIK, filename, blank is not used (included because it will capture the newline)
    ($CIK, $get_file, $blank) = split (",", $line);
    $get_file = "http://www.sec.gov/Archives/" . $get_file;
    $_ = $get_file;

    if ( /([0-9|-]+).txt/ ) {
        $filename = $write_dir . "/" . $CIK . ".txt";
        open OUT, ">$filename" or die $!;
        print "file $CIK, $get_file\n";

        my $request = HTTP::Request->new(GET => $get_file);
        my $response = $ua->getstore($get_file,$filename);
        #$p = $response->content;
        if ($response->content) {

            print OUT $response->content;

            close OUT;
        } else {
            #error logging
            print LOG "error in $filename - $CIK \n" ;
        }
    }
}

close LOG;



FishMonger
Veteran / Moderator

Jan 17, 2015, 7:18 AM

Post #7 of 17 (4395 views)
Re: [wrkrbeee] Out of memory

It's VERY IMPORTANT to use the strict and warnings pragmas in EVERY perl script you write. They will point out lots of problems in your code.

The strict pragma will require you to declare your vars, which is done with the my keyword.
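As a small illustration of what strict catches, the sketch below compiles a snippet containing a misspelled variable name. The variable names and the use of a string eval (so that the error can be printed rather than stopping the script) are choices made for this example only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Declared with "my", so strict accepts it.
my $write_dir = 'downloads';
print "write dir: $write_dir\n";

# The snippet below misspells $file_name as $filename. It is compiled with
# a string eval so that the compile-time error can be captured and printed
# instead of stopping this script.
my $snippet = q{
    use strict;
    my $file_name = 'a.txt';
    $filename = 'b.txt';    # typo: never declared
    1;
};
my $compiled = eval $snippet;
my $error    = $@;

print "strict rejected the snippet: $error" unless $compiled;
```

Without strict, the misspelled name would silently create a new global variable and the typo would go unnoticed.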

It's better style to put all subroutine definitions at the end of the script instead of the beginning. A single sub like this isn't too bad, but as you add more or longer subs, the user would need to unnecessarily wade through all of that code before they get to the main body of the script.

LWP::Simple is a functional module, not an OO module, so trying to call the new() method will fail and give you that error message.
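To make the difference concrete, here is a minimal sketch of the functional style: getstore() is called directly, with no object and no new(). The wrapper name download and the command-line usage are assumptions made for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;    # exports getstore(), is_success(), is_error(), ...

# Download $url into $file. getstore() is a plain function, so there is
# no object to construct; it returns the HTTP status code.
# (download is a hypothetical wrapper name, not part of LWP::Simple.)
sub download {
    my ($url, $file) = @_;
    my $rc = getstore($url, $file);
    return $rc;
}

# Usage: perl download.pl URL OUTPUT_FILE
if (@ARGV == 2) {
    my $rc     = download(@ARGV);
    my $status = is_success($rc) ? 'ok' : 'failed';
    print "$status ($rc)\n";
}
```

By contrast, LWP::UserAgent is the object-oriented interface, which is why LWP::UserAgent->new works and LWP::Simple->new does not.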

When opening a filehandle you should use (1) a lexical var for the filehandle, (2) the var name should describe what it holds, (3) the 3 arg form of open, and (4) the die statement should include the filename.

e.g.,

Code
open my $url_fh, '<', 'test.txt' or die "failed to open 'test.txt' <$!>";


I'll work up a rewrite of your script in a little while.


wrkrbeee
Novice

Jan 17, 2015, 7:47 AM

Post #8 of 17 (4392 views)
Re: [FishMonger] Out of memory

 

You are going above and beyond the call of duty by rewriting the script. I am truly grateful for your time and patience, and please know that I am trying to help myself instead of relying on other people.



FishMonger
Veteran / Moderator

Jan 17, 2015, 7:49 AM

Post #9 of 17 (4391 views)
Re: [wrkrbeee] Out of memory

This is untested so it may need to be tweaked a little.


Code
#!/usr/bin/perl

use strict;
use warnings;
use LWP::Simple;

my $write_dir = 'G:/Research/SEC filings 10K and 10Q/Data/Filing Docs/2014';
my $url_file = 'test.txt';
my $log_file = 'download_log.txt';
my $base_url = 'http://www.sec.gov/Archives';

open my $url_fh, '<', $url_file or die "failed to open '$url_file' <$!>";
open my $log_fh, '>', $log_file or die "failed to open '$log_file' <$!>";

while (my $line = <$url_fh>) {
    chomp $line;
    my ($CIK, $get_file) = split /,/, $line;
    next unless $get_file =~ /[0-9|-]+\.txt$/;

    print "file $CIK, $base_url/$get_file\n";

    my $filename = "$write_dir/$CIK.txt";
    my $rc = getstore("$base_url/$get_file", $filename);

    if (is_error($rc)) {
        print $log_fh "error in $filename - $rc\n";
    }
}

close $url_fh;
close $log_fh;
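The line-parsing step of the script above can be tried on its own, without any network access. The sketch below is an assumption-laden variant, not the posted code: parse_line is a hypothetical helper, the sample lines are made up, and the character class is written as [0-9-] because a | inside square brackets matches a literal pipe rather than acting as alternation.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Split one "CIK,path" line and keep it only if the path ends in a name
# made of digits and hyphens followed by ".txt".
# Returns ($cik, $path) on a match and an empty list otherwise.
sub parse_line {
    my ($line) = @_;
    chomp $line;
    my ($cik, $path) = split /,/, $line;
    return unless defined $path && $path =~ /[0-9-]+\.txt$/;
    return ($cik, $path);
}

# Hypothetical sample input, for illustration only.
my @sample_lines = (
    "1234,edgar/data/1234/0000000000-14-000001.txt\n",
    "bad line\n",
);

for my $line (@sample_lines) {
    my ($cik, $path) = parse_line($line);
    if (defined $cik) {
        print "CIK $cik -> $path\n";
    }
    else {
        print "skipped: $line";
    }
}
```

Checking that the second field is defined before matching also avoids an "uninitialized value" warning on lines that contain no comma.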



wrkrbeee
Novice

Jan 17, 2015, 8:14 AM

Post #10 of 17 (4389 views)
Re: [FishMonger] Out of memory


I am very grateful, and I am working to avoid bothering you again. Thank you so much!



wrkrbeee
Novice

Jan 19, 2015, 10:04 AM

Post #11 of 17 (4375 views)
Re: [FishMonger] Out of memory


Hi FishMonger, your code worked beautifully. Do you have any recommendations for learning to work with Perl more independently (online course, classroom course, book, etc.)? My primary need for Perl is to download accounting/financial data (textual and tabular) from websites. I'm guessing that, as with other languages, there are essentials for any use, then more specific techniques for needs like mine. Any suggestions are greatly appreciated. Thank you for all you've done for me.



Laurent_R
Veteran / Moderator

Jan 20, 2015, 12:39 AM

Post #12 of 17 (4369 views)
Re: [wrkrbeee] Out of memory

Learning Perl (O'Reilly) is a very good starting point.


FishMonger
Veteran / Moderator

Jan 20, 2015, 6:45 AM

Post #13 of 17 (4361 views)
Re: [wrkrbeee] Out of memory

In addition to the Learning Perl book, which I also highly recommend, you should become very familiar with the perl documentation. That documentation is included with your perl installation and accessible via the perldoc tool/command, unless your installation is severely broken.

You can also read it online. http://perldoc.perl.org/


Tejas
User

Jan 20, 2015, 7:40 AM

Post #14 of 17 (4355 views)
Re: [FishMonger] Out of memory

Hi FishMonger

Does this code get the files located at this URL?

my $rc = getstore("$base_url/$get_file", $filename);

Thanks
Tejas


FishMonger
Veteran / Moderator

Jan 20, 2015, 7:44 AM

Post #15 of 17 (4352 views)
Re: [Tejas] Out of memory


Quote
Does this code get the files located at this URL?


The getstore() function retrieves/saves a single file, not multiple files.
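To fetch several files, getstore() has to be called once per URL, as the while loop in the script does. A minimal sketch of that pattern follows; fetch_all and its URL-to-filename arguments are hypothetical names chosen for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::Simple;

# Call getstore() once for each URL and return the URLs that failed.
# Takes a list of URL => local file name pairs.
# (fetch_all is a hypothetical helper name, not part of LWP::Simple.)
sub fetch_all {
    my (%target_for) = @_;
    my @failed;
    for my $url (sort keys %target_for) {
        my $rc = getstore($url, $target_for{$url});
        push @failed, $url if is_error($rc);
    }
    return @failed;
}

# Usage: perl fetch_all.pl URL1 FILE1 [URL2 FILE2 ...]
if (@ARGV && @ARGV % 2 == 0) {
    my @failed = fetch_all(@ARGV);
    print "failed: $_\n" for @failed;
}
```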


(This post was edited by FishMonger on Jan 20, 2015, 7:45 AM)


wrkrbeee
Novice

Jan 20, 2015, 9:24 AM

Post #16 of 17 (4347 views)
Re: [Laurent_R] Out of memory

Thank you for the reference to the book! That is helpful!


Laurent_R
Veteran / Moderator

Jan 20, 2015, 2:34 PM

Post #17 of 17 (4344 views)
Re: [wrkrbeee] Out of memory

You're welcome.

 
 

