Home: Perl Programming Help: Beginner:
Reading Huge .csv and searching it

 



Dhamma
New User

Jan 29, 2013, 2:56 AM

Post #1 of 10 (977 views)
Reading Huge .csv and searching it

Hi,

I work on a tool that reads giant .csv files of measurements, loops over them looking for interesting points, and writes those points to a new file.

I am getting "use of uninitialized value" warnings.


Code
while (<*.csv>) {

    open(FILE, $_) or die "die Datei wurde nicht gefunden\n";  # "the file was not found"
    @input = <FILE>;
    close FILE;
    chomp @input;

    foreach my $nr (0 .. $#input) {
        $input[$nr] =~ tr/,/./;
    }

    $i = 0;
    while ($i <= $#input) {
        @Woerter = split(/;/, $input[$i]);
        foreach my $nr (0 .. $#Woerter) {
            $tabelle[$i][$nr] = $Woerter[$nr];
        }
        $i++;
    }

    $p = 2;
    $count = 0;
    $ii = 1;
    $zeile = $#input;
    $zeile = 1900;
    while ($p < $zeile) {
        $a = $tabelle[$p][1];
        $b = $tabelle[$p-1][1];


The program goes on, but this is where the warning occurs. I do not go into the last 10 lines of the file, and there are no holes in my data.

Why does Perl think there are uninitialized values?
This becomes a problem later when I want to write the array with the matched lines to a file using $ref and join...

Thanks for any ideas!


BillKSmith
Veteran

Jan 29, 2013, 5:59 AM

Post #2 of 10 (973 views)
Re: [Dhamma] Reading Huge .csv and searching it

The statement

Code
$zeile = 1900;

would cause the problem later if the value of $#input is less than 1900. (You would be indexing beyond the array when you reference $tabelle[$p][1]. Perl extends the array and sets the new elements to the special value undef, which is what the warning reports as "uninitialized".)
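A minimal sketch of the effect (the data here is made up for illustration):

```perl
use strict;
use warnings;

my @tabelle = ([1, 2], [3, 4]);    # only rows 0 and 1 exist
my $val = $tabelle[5][1];          # row 5 is beyond the end of the array
print defined $val ? "defined\n" : "undef\n";   # prints "undef"

# using $val in a numeric comparison such as ($val != $other)
# is what triggers "Use of uninitialized value ... in numeric ne (!=)"
```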
Good Luck,
Bill


Laurent_R
Veteran / Moderator

Jan 29, 2013, 6:03 AM

Post #3 of 10 (972 views)
Re: [Dhamma] Reading Huge .csv and searching it

You have to tell us where you get this warning.

Just as an example of a possible cause:


Code
@Woerter = split(/;/,$input[$i]);


If your input line has no ";" and you try to read $Woerter[1], this value will be undefined (because the whole line will be in the first element of the array, i.e. $Woerter[0]).
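A small demonstration (the input string is made up):

```perl
use strict;
use warnings;

my $line = "just one field without any semicolon";
my @Woerter = split(/;/, $line);

print scalar(@Woerter), "\n";                      # prints "1"
print defined $Woerter[1] ? "def\n" : "undef\n";   # prints "undef"
```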

More generally, you are probably making some assumptions about the content of the file lines, and these assumptions might not always hold true.

The warning message usually tells you which line of code and which line of the input file is responsible for the uninitialized value warning. So, if you can't find it yourself, please give the full warning, with details.


Dhamma
New User

Jan 30, 2013, 12:44 AM

Post #4 of 10 (960 views)
Re: [Laurent_R] Reading Huge .csv and searching it

Hi,

My warning comes from the next line. It's just:

if ($a != $b)

I will use $#input later; the file I test the script with has well over 1900 lines.

Every line has several columns separated by ",".

The data exists! That's why I wonder about the warnings.



The full warning is:

"Use of uninitialized value $a in numeric ne (!=) at line 46.
Use of uninitialized value $b in numeric ne (!=) at line 46."

I get about 150 of them. There are no holes in the data file; I checked it. The warning with the input file line doesn't occur any more, no idea why.


FishMonger
Veteran / Moderator

Jan 30, 2013, 6:29 AM

Post #5 of 10 (956 views)
Re: [Dhamma] Reading Huge .csv and searching it

You should post the actual code that is generating the warning.

$a and $b are special global vars used by the sort function. It's best not to use them outside of that usage.
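To illustrate why they are special (values made up): strict does not even force you to declare $a and $b, so a stray one slips past checks that would catch any other undeclared variable.

```perl
use strict;
use warnings;

# $a and $b are exempt from strict's declaration requirement,
# because sort uses them as package globals:
$a = 5;          # compiles fine -- no "requires explicit package name" error
# $c = 5;        # this line, by contrast, would not compile under strict

my @sorted = sort { $a <=> $b } (3, 1, 2);   # sort sets $a and $b itself
print "@sorted\n";                            # prints "1 2 3"
```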

You should add some debug print statements that dump out the vars in question so that you can see what they actually contain.

Based on the code you've posted, it's clear that you're not using the strict pragma, and I'll assume that you're also not using the warnings pragma. Those two pragmas should be in EVERY Perl script you write. So, always begin your scripts like this:

Code
#!/usr/bin/perl 

use strict;
use warnings;


The strict pragma will require you to declare your vars, which in most cases is done by using the 'my' keyword, as you did in the foreach initializations.


Quote
There are no holes in the data file, I checked it.

That may be true, and if it is, it means that your parsing of that data is flawed. Since you haven't provided any example data for us to test, we can't be sure what part of your parsing is wrong.


7stud
Enthusiast

Jan 30, 2013, 5:23 PM

Post #6 of 10 (951 views)
Re: [Dhamma] Reading Huge .csv and searching it

1) Globbing returns a list. Try running this code:

Code
use strict; 
use warnings;
use 5.012;

my @arr = ('hello', 'world');

while (@arr) {
    say 'x';
}

What conclusions can you draw from the output? Next try outputting $_ instead of 'x'.

2) Don't use <> to do your globbing; use the glob() function instead. Using <> for globbing can introduce a sneaky error: hard-coding values in your code is bad, so you assign values to variables and then use the variables, but look what happens here:

Code
use strict; 
use warnings;
use 5.012;

my $pattern = '*.csv';

for my $fname (<$pattern>) {
    say $fname;
}

--output:--
readline() on unopened filehandle at 2.pl line 7.
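A sketch of the glob() version (same made-up pattern):

```perl
use strict;
use warnings;
use 5.012;

my $pattern = '*.csv';

# glob() takes an ordinary string, so a variable works as expected
for my $fname (glob $pattern) {
    say $fname;
}
```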


3) You say your csv files are huge, but you are doing this:

Code
@input = <FILE>;


That reads the whole file into memory at one time. Is there a reason you can't read line by line? Too slow?

Code
use strict; 
use warnings;
use 5.012;


my $fname = 'data.txt';

open my $INFILE, "<", $fname
    or die "Couldn't open $fname: $!";

while (my $line = <$INFILE>) {
    # process line
}



4) You should also be using the 3-arg form of open().

5) You should not use bareword filehandles e.g. FILE.

6) You should declare your variables with my().

7) You should always have these lines at the top of your code:

Code
 
use strict;
use warnings;
use 5.012; #depending on your perl version


8)

Code
foreach my $nr (0..$#input) {
    $input[$nr] =~ tr/,/./;
}


'for' can be used instead of 'foreach' anywhere in Perl, and it's shorter to type. And your loop is better written like this:

Code
my @lines = ('a,a', 'b,b');

for my $line (@lines) {
    $line =~ s/,/./g;
}

say for @lines;


--output:--
a.a
b.b

$line becomes an alias for each element of the array, so changing $line changes the array. When an experienced Perl programmer reads this loop control:

Code
 (0..$#Woerter)


it feels like getting stuck in the eye with a sharp stick. You will rarely use $#arr_name.



9) You have thousands of problems in the code you posted. You need to learn *modern* Perl, and it would behoove you to stop reading whatever tutorials you are reading now and buy a beginning Perl book that was published in the last 5 years.


(This post was edited by 7stud on Jan 30, 2013, 6:24 PM)


Dhamma
New User

Jan 30, 2013, 11:31 PM

Post #7 of 10 (931 views)
Re: [7stud] Reading Huge .csv and searching it

Thanks!

I did use warnings, but didn't use strict.

This is really helpful!



And I used a free "Learn Perl in 21 Days" tutorial... probably somewhat older.

Before I change the whole program:

Will Perl be able to read a week's worth of .csv measurements, one line per second, and generate a monthly report? Each file will cover a week, so about 600,000 lines. My boss wants me to use Java, but it is slow and I hate it.

If you say Perl can handle it, I will get a modern book before I try more!



Thanks guys (and gals)! You are amazing!


FishMonger
Veteran / Moderator

Jan 31, 2013, 6:32 AM

Post #8 of 10 (925 views)
Re: [Dhamma] Reading Huge .csv and searching it

Perl can easily and efficiently parse your csv file and generate a report, and it can do so faster than a Java program.

A 600,000-line csv file is not really that big, assuming each line is not absurdly long.
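As a rough sketch of such a line-by-line pass in plain core Perl (the in-memory sample below stands in for one of the weekly files; for quoted fields the CPAN module Text::CSV is more robust):

```perl
use strict;
use warnings;

# two made-up measurement lines: ';'-separated columns, decimal commas
my $data = "00:00:01;3,14\n00:00:02;2,71\n";
open my $fh, '<', \$data or die $!;   # in-memory handle; use a file name in real code

my $count = 0;
my @values;
while (my $line = <$fh>) {
    chomp $line;
    $line =~ tr/,/./;                 # decimal comma -> decimal point
    my @cols = split /;/, $line;
    push @values, $cols[1];           # collect the measurement column
    $count++;
}
close $fh;
print "read $count lines, last value $values[-1]\n";  # prints "read 2 lines, last value 2.71"
```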


7stud
Enthusiast

Jan 31, 2013, 8:55 AM

Post #9 of 10 (921 views)
Re: [Dhamma] Reading Huge .csv and searching it


Quote
Will Perl be able to read .csv files of a week with one line full of measurements per second and generate a monthly report?

One of Perl's strengths is reading text and matching it against regular expressions. When a programmer calls a file 'huge', they might mean something like 5 GB of data, or roughly 75 million lines. A "big" file might be 1 GB, or roughly 15 million lines. A 600,000-line file is not trivial, but it is certainly not 'huge'.

It's faster to read the whole file into memory and then process it. However, for files larger than your computer's memory, you need to read line by line. You might start by writing code that reads the file line by line; once your program works, benchmark it, and if you need more speed, look for ways to make the code faster.
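The core Benchmark module makes that comparison easy. A sketch (data and iteration count made up; timings will vary by machine) comparing slurping against line-by-line reading of an in-memory "file":

```perl
use strict;
use warnings;
use Benchmark qw(timethese cmpthese);   # core module

my $data = join '', map { "col1;col2;$_\n" } 1 .. 1000;

my $results = timethese(1_000, {
    slurp => sub {
        open my $fh, '<', \$data or die $!;
        my @lines = <$fh>;            # whole "file" at once
        close $fh;
    },
    by_line => sub {
        open my $fh, '<', \$data or die $!;
        1 while <$fh>;                # one line at a time
        close $fh;
    },
});
cmpthese($results);                   # prints a rate-comparison table
```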


(This post was edited by 7stud on Jan 31, 2013, 9:02 AM)


FishMonger
Veteran / Moderator

Jan 31, 2013, 9:45 AM

Post #10 of 10 (914 views)
Re: [7stud] Reading Huge .csv and searching it


Quote
'for' can be used instead of 'foreach' anywhere in Perl, and it's shorter to type. And your loop is better written like this:

Code
my @lines = ('a,a', 'b,b');

for my $line (@lines) {
    $line =~ s/,/./g;
}

say for @lines;



IMO, it would be better written as:

Code
for my $line (@lines) {
    $line =~ tr/,/./;
}

or more simply as:

Code
tr/,/./ for @lines;


Powered by Gossamer Forum v.1.2.0