Stuck with uninitialized values errors



stuckinarut
User

Sep 2, 2014, 1:18 PM



Stuck with uninitialized value errors that are evading me ;-(

I only need to extract an 'Island Name' from each record contained in a very large XML file (saved with a .txt file extension). I considered XML::Simple, but I should be able to do this minor single-element task with just a standalone script.

The <TAG></TAG> structure I'm trying to deal with is:

<island id="xxxxxxxx">Island Name To Extract Here</island>

The xxxxxx's are DIFFERENT numbers in each record.

My REGEXP *should* work for all front tags unless I've messed up:


Code
<island id=m/\"\d+\"/>


Here is the full script code:


Code
#!/usr/bin/perl 

use strict;
use warnings;

my $L_list;
my $L_count;
my $line;
my @F;
my $island;

$/ = '</island>';
open my $L_list, '<', 'listL.txt' or die "Cannot open listL.txt: $!";
while (<$L_list>) {
chomp;
chomp $line;
$line =~ s/\r//g; # removes windows CR characters
$line =~ s/\s+$//; # removes trailing white spaces

if ($line = /island/) {
@F = split '<island id=m/\"\d+\"/>', $_ ;
print $F[$#F];
print "\n";

$L_count ++;
}

}

print "TOTAL RECORDS: $L_count\n";


I appreciate any assistance.

Thanks.

-stuckinarut


FishMonger
Veteran / Moderator

Sep 2, 2014, 1:25 PM


Re: [stuckinarut] Stuck with uninitialized values errors

You never assigned a value to $line.

Start by changing:

Code
while (<$L_list>) {


To:

Code
while (my $line = <$L_list>) {
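A minimal sketch of why this change matters, assuming a small in-memory sample in place of listL.txt (the sample data and the names $data, $fh, and @seen are hypothetical): with the assignment in the loop condition, each record lands in $line, so later operations on $line see a defined value.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for listL.txt.
my $data = qq{<island id="12345678">Fiji</island>\n<other>skip me</other>\n};

# Open an in-memory filehandle so the sketch runs without a real file.
open my $fh, '<', \$data or die "Cannot open in-memory data: $!";

my @seen;
while (my $line = <$fh>) {    # each record is assigned to $line, not to $_
    chomp $line;              # $line is defined here, so no warning
    push @seen, $line;
}
close $fh;

print "$_\n" for @seen;
```

Note that once the loop reads into $line, $_ is no longer set by the read, so any later code that still relies on $_ needs to be switched to $line as well.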



stuckinarut
User

Sep 2, 2014, 1:31 PM


Re: [stuckinarut] Stuck with uninitialized values errors

Hmmm... I just discovered that the word 'Island' or 'Islands' sometimes appears in other parts of the file that I do NOT want, so I just changed this code segment to ONLY match <TAG> lines that start with <island ... or end with island>


Code
if ($line = m/\<island/ || m/island\>/) {


My apologies for not seeing this earlier ;-(

-stuckinarut


stuckinarut
User

Sep 2, 2014, 1:37 PM


Re: [FishMonger] Stuck with uninitialized values errors


In Reply To
You never assigned a value to $line.

Start by changing:

Code
while (<$L_list>) {


To:

Code
while (my $line = <$L_list>) {



Ohhhh - Thank You, FishMonger. Dunno how I missed that one.

Now the only error is an uninitialized value in the (revised) pattern match:


Code
	if ($line = m/\<island/ || m/island\>/) {


I already declared my $island; but it appears I'm not understanding what is needed.

-stuckinarut


FishMonger
Veteran / Moderator

Sep 2, 2014, 1:39 PM


Re: [stuckinarut] Stuck with uninitialized values errors


Code
if ($line =~ m/\<island/ || $line =~ m/island\>/) {



(This post was edited by FishMonger on Sep 2, 2014, 1:40 PM)


stuckinarut
User

Sep 2, 2014, 1:51 PM


Re: [FishMonger] Stuck with uninitialized values errors


In Reply To

Code
if ($line =~ m/\<island/ || $line =~ m/island\>/) {



I can see I didn't get enough sleep last night, as I should have caught this one. Thanks again, FishMonger.

But I am really baffled by the new error that appeared about an 'uninitialized value in split':

@F = split '<island id=m/\"\d+\"/>', $_ ;

Do I always need to initialize $_ as well ???

-stuckinarut


stuckinarut
User

Sep 2, 2014, 1:56 PM


Re: [stuckinarut] Stuck with uninitialized values errors

No, I can't do that: I get a "Can't use global $_" error ;-(

Darn.

-stuckinarut


FishMonger
Veteran / Moderator

Sep 2, 2014, 1:59 PM


Re: [stuckinarut] Stuck with uninitialized values errors

Without the =~ binding operator, a match binds to $_. Since your loop now reads each record into $line, $_ is never assigned, which is why operations on $_ warn about an uninitialized value.

If you need to bind the match to another var, you need to be explicit and specify that binding.

On another note, the first arg to the split function is a regex pattern, not a string that includes an embedded regex. The second arg is the var (string) you want to split.
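A short sketch of both points, assuming a single hypothetical record in $line (the sample value and the names $is_island, @fields, and $name are illustrative only): the match is bound explicitly with =~, and split receives a real regex as its first argument.

```perl
use strict;
use warnings;

# Hypothetical record; the id and name are made up for illustration.
my $line = '<island id="12345678">Easter Island</island>';

# Bound explicitly to $line with =~; a bare m/.../ would test $_ instead.
my $is_island = ( $line =~ m/<island/ || $line =~ m/island>/ ) ? 1 : 0;

# split takes a regex as its first argument and the string as its second.
my @fields = split /<island id="\d+">/, $line;

# The text after the opening tag is the last field; strip the closing tag.
( my $name = $fields[-1] ) =~ s{</island>\z}{};

print "$name\n";
```

Because the opening tag sits at the start of the string, split returns an empty first field followed by the remainder, which is why the sketch takes the last element.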


(This post was edited by FishMonger on Sep 2, 2014, 2:00 PM)


stuckinarut
User

Sep 2, 2014, 2:03 PM


Re: [FishMonger] Stuck with uninitialized values errors

Hmmm... I'm now thinking that what I should maybe do is just use a full <TAG>Data To Extract Here</TAG> REGEXP match and forget the 'split' ... does that make any sense?

Thanks.

-stuckinarut


FishMonger
Veteran / Moderator

Sep 2, 2014, 2:05 PM


Re: [stuckinarut] Stuck with uninitialized values errors

The problems you're having parsing the XML are exactly the reason why you shouldn't be doing this manually via regex.

USE an XML parser module!!


stuckinarut
User

Sep 2, 2014, 2:07 PM


Re: [stuckinarut] Stuck with uninitialized values errors


In Reply To
Hmmm... I'm now thinking that what I should maybe do is just use a full <TAG>Data To Extract Here</TAG> REGEXP match and forget the 'split' ... does that make any sense?

Thanks.

-stuckinarut


ESPECIALLY since I just realized some Island Names to be extracted can be two or three words.

Oh boy.

-stuckinarut


stuckinarut
User

Sep 2, 2014, 6:28 PM


Re: [stuckinarut] Stuck with uninitialized values errors

Got some desperately needed sleep and now have this working error-free, but only to print out the entire REGEXP-matched <TAG>Data</TAG> part of the applicable lines and a total record count.

I've tried using $1 and $_ to try and extract just the -> .* <- data part of the REGEXP match needed, but can't seem to pull things together to get either one to work.


Code
#!/usr/bin/perl 

use strict;
use warnings;

my $L_list;
my $L_count;
my $line;
my $id;

# $/ = '</island>';

open $L_list, '<', 'listL.txt' or die "Cannot open listL.txt: $!";
while (my $line = <$L_list>) {
chomp $line;
$line =~ s/\r//g; # removes windows CR characters
$line =~ s/\s+$//; # removes trailing white spaces

if ($line =~ m/\<island/ || $line =~ m/island\>/) {

$line =~ 'm/\<island id=/\"\d{8}\"\>.*\<\/island\>/';

print $line;
print "\n";

$L_count ++;
}

}

print "TOTAL RECORDS: $L_count\n";


-stuckinarut


stuckinarut
User

Sep 2, 2014, 6:46 PM


Re: [stuckinarut] Stuck with uninitialized values errors

I thought this would work, but still no joy ;-(


Code
	$line =~ 'm/\<island id=/\"\d{8}\"\>(.*)\<\/island\>/'; 
$line =~ $1;
print $line;
print "\n";


-stuckinarut


FishMonger
Veteran / Moderator

Sep 2, 2014, 6:52 PM


Re: [stuckinarut] Stuck with uninitialized values errors

Remove the single quotes.

Angle brackets are not special in a regex, and neither are double quotes, so there's no need to escape them.


Code
if ( $line =~ m/<island id="\d{8}">(.*)<\/island>/ ) { 
$line = $1;
}
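An equivalent sketch, assuming a hypothetical sample record (the names $captured and $name are illustrative): in list context a successful match returns its captures, so the name can be assigned in the same statement and only used when the match succeeded. Checking for success first is what avoids reading $1 when nothing matched.

```perl
use strict;
use warnings;

# Hypothetical record; multi-word names are captured in full by (.*).
my $line = '<island id="12345678">Isle of Man</island>';

# On success the capture list is non-empty, so the condition is true;
# on failure the list is empty and the block is skipped.
my $name;
if ( my ($captured) = $line =~ m/<island id="\d{8}">(.*)<\/island>/ ) {
    $name = $captured;
}

print defined $name ? "$name\n" : "no match\n";
```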


stuckinarut
User

Sep 2, 2014, 7:05 PM


Re: [FishMonger] Stuck with uninitialized values errors


In Reply To
Remove the single quotes.

Angle brackets are not special in a regex, and neither are double quotes, so there's no need to escape them.


Code
if ( $line =~ m/<island id="\d{8}">(.*)<\/island>/ ) { 
$line = $1;
}


OMG-OMG, FishMonger - that WORKS! I was trying other forms of assignment for the $1 and kept getting 'uninitialized' errors, but in trying to initialize then got errors saying I couldn't do it.

THANK YOU VERY, VERY MUCH FOR YOUR PATIENT HELP !!!

-stuckinarut


Laurent_R
Veteran / Moderator

Sep 2, 2014, 11:25 PM


Re: [stuckinarut] Stuck with uninitialized values errors

It still won't work the way you want if you can have two pairs of tags on the same line. If this can happen, you'll need a non-greedy quantifier ( "(.*?)" ) and some form of looping.
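One way this could look, assuming a hypothetical line that carries two records (the sample data and the name @names are made up for illustration): the /g modifier in a while loop resumes after each match, and (.*?) stops at the first closing tag instead of the last.

```perl
use strict;
use warnings;

# Hypothetical line with two <island> records on it.
my $line = '<island id="11111111">Maui</island>'
         . '<island id="22222222">Isle of Skye</island>';

my @names;

# With /g, each pass of the loop continues from where the last match ended.
# The non-greedy (.*?) captures only up to the nearest </island>.
while ( $line =~ m/<island id="\d+">(.*?)<\/island>/g ) {
    push @names, $1;
}

print join( ', ', @names ), "\n";
```

With a greedy (.*) the single match would run from the first opening tag to the last closing tag and swallow everything in between.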


stuckinarut
User

Sep 2, 2014, 11:44 PM


Re: [Laurent_R] Stuck with uninitialized values errors


In Reply To
It still won't work the way you want if you can have two pairs of tags on the same line. If this can happen, you'll need a non-greedy quantifier ( "(.*?)" ) and some form of looping.


Hi, Laurent:

Thanks for the heads-up. In later reviewing more of the output file lines, I discovered that there were some other <TAG> lines with...

<isComment>Some Text</isComment>

... embedded between the <island...> and </island> TAGS.

The main <TAGS> are only one per line, but now to try and figure out how to EXCLUDE the <isComment>blah</isComment> stuff.

Enjoy your train rides today :^)

-stuckinarut


Laurent_R
Veteran / Moderator

Sep 3, 2014, 9:45 AM


Re: [stuckinarut] Stuck with uninitialized values errors

Maybe a two-step process: first get the whole chunk between the <island> and </island> tags, and then remove from that chunk the part between the <comment> and </comment> tags (or whatever these tags are exactly).

But you are clearly arriving at the limit of what can sanely be done with regexes. An XML module (or possibly a simple parser) might be recommended, as pointed out already by FishMonger.
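A rough sketch of the two-step idea, assuming the embedded tag is <isComment> as quoted earlier in the thread and that it sits inside the <island> element on the same line (the sample record and its layout are assumptions, not taken from the real file):

```perl
use strict;
use warnings;

# Hypothetical record with an embedded comment element.
my $line = '<island id="12345678">Great Barrier Island '
         . '<isComment>Some Text</isComment></island>';

my $name;
if ( $line =~ m/<island id="\d+">(.*?)<\/island>/ ) {
    $name = $1;                                   # step 1: whole chunk
    $name =~ s/<isComment>.*?<\/isComment>//g;    # step 2: drop comments
    $name =~ s/^\s+|\s+$//g;                      # trim leftover whitespace
}

print defined $name ? "$name\n" : "no match\n";
```

This only holds while the tags stay on one line and do not nest; beyond that, a parser module is the more dependable route.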


stuckinarut
User

Sep 3, 2014, 10:13 AM


Re: [Laurent_R] Stuck with uninitialized values errors


In Reply To
Maybe a two-step process: first get the whole chunk between the <island> and </island> tags, and then remove from that chunk the part between the <comment> and </comment> tags (or whatever these tags are exactly).

But you are clearly arriving at the limit of what can sanely be done with regexes. An XML module (or possibly a simple parser) might be recommended, as pointed out already by FishMonger.


Thanks, Laurent. This is the approach I was thinking of, and another option I'm also chewing on.

Fortunately, for right now, thanks to FishMonger's help, I got enough of what was needed in the output for some decision making.

It also turns out that the XML source file has other stuff in it which may give an off-the-shelf XML parsing module some indigestion ;-(

Keep 'riding the rails' !!!

-stuckinarut