Stuck with uninitialized values errors

 



stuckinarut
Novice

Sep 2, 2014, 1:18 PM

Post #1 of 19 (4920 views)
Stuck with uninitialized values errors

I'm stuck with 'uninitialized value' errors that are evading me ;-(

I only need to extract an 'Island Name' from each record contained in a very large XML file (saved with a .txt file extension). I considered XML::Simple, but I should be able to do this minor single-element task with just a standalone script.

The <TAG></TAG> structure I'm trying to deal with is:

<island id="xxxxxxxx">Island Name To Extract Here</island>

The xxxxxx's are DIFFERENT numbers in each record.

My REGEXP *should* work for all front tags unless I've messed up:


Code
<island id=m/\"\d+\"/>


Here is the full script code:


Code
#!/usr/bin/perl 

use strict;
use warnings;

my $L_list;
my $L_count;
my $line;
my @F;
my $island;

$/ = '</island>';
open my $L_list, '<', 'listL.txt' or die "Cannot open listL.txt: $!";
while (<$L_list>) {
chomp;
chomp $line;
$line =~ s/\r//g; # removes windows CR characters
$line =~ s/\s+$//; # removes trailing white spaces

if ($line = /island/) {
@F = split '<island id=m/\"\d+\"/>', $_ ;
print $F[$#F];
print "\n";

$L_count ++;
}

}

print "TOTAL RECORDS: $L_count\n";


I appreciate any assistance.

Thanks.

-stuckinarut


FishMonger
Veteran / Moderator

Sep 2, 2014, 1:25 PM

Post #2 of 19 (4917 views)
Re: [stuckinarut] Stuck with uninitialized values errors

You never assigned a value to $line.

Start by changing:

Code
while (<$L_list>) {


To:

Code
while (my $line = <$L_list>) {
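To illustrate the change, here is a minimal sketch. It assumes an in-memory filehandle in place of listL.txt, and the sub name read_lines and the sample data are made up for illustration. Because $line is assigned in the while condition, the later chomp and substitutions operate on a defined value and no longer warn.

```perl
use strict;
use warnings;

# Hypothetical helper (not part of the original script): read every line
# from a filehandle into a list, cleaning up line endings on the way.
sub read_lines {
    my ($fh) = @_;
    my @lines;
    while (my $line = <$fh>) {   # $line is assigned on every iteration
        chomp $line;
        $line =~ s/\r//g;        # remove Windows CR characters
        $line =~ s/\s+$//;       # remove trailing whitespace
        push @lines, $line;
    }
    return @lines;
}

# Sample data standing in for listL.txt (an assumption for this sketch).
my $sample = "<island id=\"12345678\">Tahiti</island>\r\n<other>x</other>\n";
open my $fh, '<', \$sample or die "Cannot open in-memory file: $!";
my @lines = read_lines($fh);
close $fh;
print "$_\n" for @lines;
```

The sketch reads line by line and leaves $/ at its default. The original script sets $/ to '</island>', which changes what counts as a record.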



stuckinarut
Novice

Sep 2, 2014, 1:31 PM

Post #3 of 19 (4916 views)
Re: [stuckinarut] Stuck with uninitialized values errors

Hmmm... I just discovered that the word 'Island' or 'Islands' can sometimes appear in other parts of the file I do NOT want, so I just changed this code segment to ONLY get <TAG> lines that start with <island ... or end with island>


Code
if ($line = m/\<island/ || m/island\>/) {


My apologies for not seeing this earlier ;-(

-stuckinarut


stuckinarut
Novice

Sep 2, 2014, 1:37 PM

Post #4 of 19 (4915 views)
Re: [FishMonger] Stuck with uninitialized values errors


In Reply To
You never assigned a value to $line.

Start by changing:

Code
while (<$L_list>) {


To:

Code
while (my $line = <$L_list>) {



Ohhhh - Thank You, FishMonger. Dunno how I missed that one.

Now the only error is an uninitialized value in the (revised) pattern match:


Code
	if ($line = m/\<island/ || m/island\>/) {


I have already declared my $island; but it appears I'm not understanding what is needed.

-stuckinarut


FishMonger
Veteran / Moderator

Sep 2, 2014, 1:39 PM

Post #5 of 19 (4911 views)
Re: [stuckinarut] Stuck with uninitialized values errors


Code
if ($line =~ m/\<island/ || $line =~ m/island\>/) {
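For anyone following along, a small sketch of why the earlier condition warned. With a plain "=", the match runs against $_ and its true/false result is assigned to $line, whereas "=~" binds the match to $line itself. The sub name is_island_line and the sample strings are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: 1 if the line contains an opening or closing
# island tag, 0 otherwise.
sub is_island_line {
    my ($line) = @_;
    return ($line =~ m/<island/ || $line =~ m/island>/) ? 1 : 0;
}

# With "=" instead of "=~", the match is applied to $_ and the result
# overwrites the variable on the left-hand side.
{
    local $_ = 'no tags here';
    my $line = '<island id="12345678">Tahiti</island>';
    $line = m/<island/;    # tests $_, then replaces the old contents of $line
    print "after '=': line is ", ($line ? 'true' : 'false'), "\n";
}

print is_island_line('<island id="12345678">Tahiti</island>'), "\n";
print is_island_line('The Islands of the Pacific'), "\n";
```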



(This post was edited by FishMonger on Sep 2, 2014, 1:40 PM)


stuckinarut
Novice

Sep 2, 2014, 1:51 PM

Post #6 of 19 (4906 views)
Re: [FishMonger] Stuck with uninitialized values errors


In Reply To

Code
if ($line =~ m/\<island/ || $line =~ m/island\>/) {



I can see I didn't get enough sleep last night, as I should have caught this one. Thanks again, FishMonger.

But I am really baffled by the new error that appeared about an 'uninitialized value in split':

@F = split '<island id=m/\"\d+\"/>', $_ ;

Do I always need to initialize $_ as well ???

-stuckinarut


stuckinarut
Novice

Sep 2, 2014, 1:56 PM

Post #7 of 19 (4899 views)
Re: [stuckinarut] Stuck with uninitialized values errors

No, I can't do that; I get a "Can't use global $_" error ;-(

Darn.

-stuckinarut


FishMonger
Veteran / Moderator

Sep 2, 2014, 1:59 PM

Post #8 of 19 (4897 views)
Re: [stuckinarut] Stuck with uninitialized values errors

Without the =~ binding operator it will bind to $_.

If you need to bind it to another var, you need to be explicit and specify that binding.

On another note, the first arg to the split function is a regex pattern, not a string which includes an embedded regex. The second arg is the var (string) you want to split.
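As a rough sketch of that last point: the pattern goes between regex delimiters as the first argument, and the string to split is the second. The sub name text_after_open_tag and the sample record are made up for illustration, and the record is assumed to have had its closing tag removed already (as happens when $/ is set to '</island>' and the record is chomped).

```perl
use strict;
use warnings;

# Hypothetical helper: return the text that follows the opening
# <island id="..."> tag, using split with a real regex pattern.
sub text_after_open_tag {
    my ($record) = @_;
    my @fields = split /<island id="\d+">/, $record;
    return $fields[-1];    # last field is whatever came after the tag
}

# Sample record with the closing tag already stripped (an assumption).
my $record = '<island id="12345678">Bora Bora';
print text_after_open_tag($record), "\n";
```

If the pattern does not occur in the record, split returns the whole record as a single field, so a caller may still want to check for the tag first.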


(This post was edited by FishMonger on Sep 2, 2014, 2:00 PM)


stuckinarut
Novice

Sep 2, 2014, 2:03 PM

Post #9 of 19 (4894 views)
Re: [FishMonger] Stuck with uninitialized values errors

Hmmm... I'm now thinking that maybe I should just use a full <TAG>Data To Extract Here</TAG> REGEXP match and forget the 'split' ... does that make any sense?

Thanks.

-stuckinarut


FishMonger
Veteran / Moderator

Sep 2, 2014, 2:05 PM

Post #10 of 19 (4890 views)
Re: [stuckinarut] Stuck with uninitialized values errors

The problems you're having parsing the XML are exactly the reason why you shouldn't be doing this manually via regex.

USE an XML parser module!!


stuckinarut
Novice

Sep 2, 2014, 2:07 PM

Post #11 of 19 (4889 views)
Re: [stuckinarut] Stuck with uninitialized values errors


In Reply To
Hmmm... I'm now thinking that maybe I should just use a full <TAG>Data To Extract Here</TAG> REGEXP match and forget the 'split' ... does that make any sense?

Thanks.

-stuckinarut


ESPECIALLY since I just realized some Island Names to be extracted can be two or three words.

Oh boy.

-stuckinarut


stuckinarut
Novice

Sep 2, 2014, 6:28 PM

Post #12 of 19 (4877 views)
Re: [stuckinarut] Stuck with uninitialized values errors

Got some desperately needed sleep and now have this working error-free, but only to print out the entire REGEXP-matched <TAG>Data</TAG> part of the applicable lines and a total record count.

I've tried using $1 and $_ to try and extract just the -> .* <- data part of the REGEXP match needed, but can't seem to pull things together to get either one to work.


Code
#!/usr/bin/perl 

use strict;
use warnings;

my $L_list;
my $L_count;
my $line;
my $id;

# $/ = '</island>';

open $L_list, '<', 'listL.txt' or die "Cannot open listL.txt: $!";
while (my $line = <$L_list>) {
chomp $line;
$line =~ s/\r//g; # removes windows CR characters
$line =~ s/\s+$//; # removes trailing white spaces

if ($line =~ m/\<island/ || $line =~ m/island\>/) {

$line =~ 'm/\<island id=/\"\d{8}\"\>.*\<\/island\>/';

print $line;
print "\n";

$L_count ++;
}

}

print "TOTAL RECORDS: $L_count\n";


-stuckinarut


stuckinarut
Novice

Sep 2, 2014, 6:46 PM

Post #13 of 19 (4874 views)
Re: [stuckinarut] Stuck with uninitialized values errors

I thought this would work, but still no joy ;-(


Code
	$line =~ 'm/\<island id=/\"\d{8}\"\>(.*)\<\/island\>/'; 
$line =~ $1;
print $line;
print "\n";


-stuckinarut


FishMonger
Veteran / Moderator

Sep 2, 2014, 6:52 PM

Post #14 of 19 (4871 views)
Re: [stuckinarut] Stuck with uninitialized values errors

Remove the single quotes.

The angle brackets are not special in a regex, and neither are the double quotes, so there's no need to escape them.


Code
if ( $line =~ m/<island id="\d{8}">(.*)<\/island>/ ) { 
$line = $1;
}
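A sketch of the same capture wrapped in a sub, for illustration only. The sub name island_name and the sample lines are hypothetical. It uses m{...} delimiters so the slash in the closing tag needs no backslash, and it reads $1 only after the match has succeeded, which is what avoids the 'uninitialized' warnings.

```perl
use strict;
use warnings;

# Hypothetical helper: return the island name from one line, or undef if
# the line has no complete <island id="NNNNNNNN">...</island> pair.
sub island_name {
    my ($line) = @_;
    if ($line =~ m{<island id="\d{8}">(.*)</island>}) {
        return $1;    # $1 is only meaningful after a successful match
    }
    return;
}

# Sample lines made up for this sketch.
for my $line ('<island id="12345678">Easter Island</island>',
              '<name>Not an island</name>') {
    my $name = island_name($line);
    print defined $name ? "$name\n" : "(no match)\n";
}
```

Multi-word names come through unchanged, since (.*) captures everything between the two tags on the line.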


stuckinarut
Novice

Sep 2, 2014, 7:05 PM

Post #15 of 19 (4869 views)
Re: [FishMonger] Stuck with uninitialized values errors


In Reply To
Remove the single quotes.

The angle brackets are not special in a regex, and neither are the double quotes, so there's no need to escape them.


Code
if ( $line =~ m/<island id="\d{8}">(.*)<\/island>/ ) { 
$line = $1;
}


OMG-OMG, FishMonger - that WORKS! I was trying other forms of assignment for the $1 and kept getting 'uninitialized' errors, but in trying to initialize then got errors saying I couldn't do it.

THANK YOU VERY, VERY MUCH FOR YOUR PATIENT HELP !!!

-stuckinarut


Laurent_R
Veteran / Moderator

Sep 2, 2014, 11:25 PM

Post #16 of 19 (4853 views)
Re: [stuckinarut] Stuck with uninitialized values errors

It still won't work the way you want if you can have two pairs of tags on the same line. If this can happen, you'll need a non-greedy quantifier ( "(.*?)" ) and some form of looping.
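A minimal sketch of what that could look like, assuming two tag pairs may share a line. The sub name island_names and the sample line are made up, and the id pattern is loosened to \d+ for the example.

```perl
use strict;
use warnings;

# Hypothetical helper: return every island name found on one line.
sub island_names {
    my ($line) = @_;
    my @names;
    # With /g in a while loop, each pass resumes after the previous match.
    # The non-greedy (.*?) stops at the first </island>, not the last one.
    while ($line =~ m{<island id="\d+">(.*?)</island>}g) {
        push @names, $1;
    }
    return @names;
}

# Sample line with two tag pairs (an assumption for this sketch).
my $line = '<island id="11111111">Maui</island><island id="22222222">Isle of Skye</island>';
print "$_\n" for island_names($line);
```

With a greedy (.*) the single match would run from the first opening tag to the last closing tag, swallowing the markup in between.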


stuckinarut
Novice

Sep 2, 2014, 11:44 PM

Post #17 of 19 (4851 views)
Re: [Laurent_R] Stuck with uninitialized values errors


In Reply To
It still won't work the way you want if you can have two pairs of tags on the same line. If this can happen, you'll need a non-greedy quantifier ( "(.*?)" ) and some form of looping.


Hi, Laurent:

Thanks for the heads-up. In later reviewing more of the output file lines, I discovered that there were some other <TAG> lines with...

<isComment>Some Text</isComment>

... embedded between the <island...> and </island> tags.

The main <TAGS> are only one per line, but now to try and figure out how to EXCLUDE the <isComment>blah</isComment> stuff.

Enjoy your train rides today :^)

-stuckinarut


Laurent_R
Veteran / Moderator

Sep 3, 2014, 9:45 AM

Post #18 of 19 (4840 views)
Re: [stuckinarut] Stuck with uninitialized values errors

Maybe a two-step process: first get the whole chunk between the <island> and </island> tags, and then remove from that chunk the part between the <comment> and </comment> tags (or whatever these tags are exactly).

But you are clearly arriving at the limit of what can sanely be done with regexes. An XML module (or possibly a simple parser) might be recommended, as FishMonger already pointed out.
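The two-step idea could be sketched roughly as follows. This assumes the embedded element is named isComment, as in the sample from post #17, and that it sits inside the island tags on the same line. The sub name and the sample line are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: step 1 grabs the text between the island tags,
# step 2 deletes any embedded <isComment>...</isComment> element.
sub island_name_without_comment {
    my ($line) = @_;

    # Step 1: capture the whole chunk between the tags.
    my ($chunk) = $line =~ m{<island id="\d+">(.*?)</island>};
    return unless defined $chunk;

    # Step 2: drop the comment element (tag name is an assumption).
    $chunk =~ s{<isComment>.*?</isComment>}{}g;
    $chunk =~ s/^\s+|\s+$//g;    # trim whitespace left behind
    return $chunk;
}

# Sample line made up for this sketch.
my $line = '<island id="12345678">Isle of Man<isComment>Some Text</isComment></island>';
print island_name_without_comment($line), "\n";
```

This only handles the flat, one-element-per-line case. Nested or multi-line elements are where a proper parser becomes the saner choice.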


stuckinarut
Novice

Sep 3, 2014, 10:13 AM

Post #19 of 19 (4838 views)
Re: [Laurent_R] Stuck with uninitialized values errors


In Reply To
Maybe a two-step process: first get the whole chunk between the <island> and </island> tags, and then remove from that chunk the part between the <comment> and </comment> tags (or whatever these tags are exactly).

But you are clearly arriving at the limit of what can sanely be done with regexes. An XML module (or possibly a simple parser) might be recommended, as FishMonger already pointed out.


Thanks, Laurent. This is the approach I was thinking of, and a parser is another option I'm also chewing on.

Fortunately, for right now, thanks to FishMonger's help, I got enough of what was needed in the output for some decision making.

It also turns out that the XML source file has other stuff in it which may give an off-the-shelf XML parsing module some indigestion ;-(

Keep 'riding the rails' !!!

-stuckinarut

 
 

