Home: Perl Programming Help: Intermediate:
Cracking A Challenging Text Pattern

 



nanohurtz
Novice

Mar 6, 2002, 9:47 PM

Post #1 of 12 (1591 views)
Cracking A Challenging Text Pattern

I'm looking for a Perl code snippet that can parse a text file containing entries like this:

BATCH RAMS125A#REDMERGE
AND123AT
RRD456WT
WWS789WT
PLL665QQ
END
BATCH RSWS555A#BLUMERGE
POP555BT
III777CT
END
BATCH RCCS919A#WHTMERGE
UDE888QT
RXX818WT
END
BATCH RDDS919A#PRPMERGE
UDE888QT
RUU818WT
KJJ112WE
RTT900VC
END

into this...

RAMS125A | REDMERGE | AND123AT
RAMS125A | REDMERGE | RRD456WT
RAMS125A | REDMERGE | WWS789WT
RAMS125A | REDMERGE | PLL665QQ
RSWS555A | BLUMERGE | POP555BT
RSWS555A | BLUMERGE | III777CT
RCCS919A | WHTMERGE | UDE888QT
RCCS919A | WHTMERGE | RXX818WT
RDDS919A | PRPMERGE | UDE888QT
RDDS919A | PRPMERGE | RUU818WT
RDDS919A | PRPMERGE | KJJ112WE
RDDS919A | PRPMERGE | RTT900VC

I'm able to capture the first two fields with an if statement:

if (m/^BATCH (.*)#(.*)$/) {
$o_lvar=$1;
$o_rvar=$2;
print OUTFILE "$o_lvar\t$o_rvar\n";
}

but I can't get the inner loop right to capture the remaining lines, which repeat a varying number of times per batch. I think a filter like this is needed to create the table above, but I'm not sure: ([A-Z][A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]$/). Any help is greatly appreciated!
-NanoHurtz

"The danger from computers is not that they will eventually get as smart as men, but
we will meanwhile agree to meet them halfway."
-Bernard Avishai


mhx
Enthusiast / Moderator

Mar 6, 2002, 10:39 PM

Post #2 of 12 (1590 views)
Re: [nanohurtz] Cracking A Challenging Text Pattern

Here's what I would do:

[perl]
#!/usr/bin/perl -w
use strict;

my @data = map [ /(\w+)\#(\w+)\s+(.*)/s and ($1, $2, [split /\s+/, $3]) ],
do {local $/; <DATA>} =~ /BATCH\s+(.*?)\s+END/sg;

for my $rec ( @data ) {
print map { join( ' | ', @$rec[0,1], $_ ) . "\n" } @{$rec->[2]};
}
[/perl]

I'll skip the explanation since you posted to intermediate. If this seems like too much magic to you, just tell me and I'll explain what I'm doing here.
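
For readers who would like the one-liner above unrolled, here is a rough step-by-step sketch of the same idea. The helper name batch_lines and the sample string are made up for illustration and are not part of the posted code; the two regexes are taken from it (with the optional backslash before # dropped).

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): parse BATCH/END blocks
# from a string and return the formatted output lines.
sub batch_lines {
    my ($text) = @_;
    my @lines;

    # Each match captures everything between "BATCH" and the next "END".
    for my $block ( $text =~ /BATCH\s+(.*?)\s+END/sg ) {
        # Split "NAME#TAG" from the whitespace-separated entries after it.
        my ($name, $tag, $rest) = $block =~ /(\w+)#(\w+)\s+(.*)/s
            or next;    # skip blocks that do not have the expected shape
        push @lines, join(' | ', $name, $tag, $_) for split /\s+/, $rest;
    }
    return @lines;
}

# Made-up sample input in the format from the first post.
my $sample = <<'END_SAMPLE';
BATCH RSWS555A#BLUMERGE
POP555BT
III777CT
END
END_SAMPLE

print "$_\n" for batch_lines($sample);
```

The outer match plays the role of the slurp-and-match expression in the original, and the inner match plus split corresponds to the body of the map.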

-- mhx

At last with an effort he spoke, and wondered to hear his own words, as if some other will was using his small voice. "I will take the Ring," he said, "though I do not know the way."

-- Frodo



Paul
Enthusiast

Mar 7, 2002, 8:52 AM

Post #3 of 12 (1581 views)
Re: [mhx] Cracking A Challenging Text Pattern

>> \# <<

Hmm, why do people insist on escaping non-meta characters in regexes?


nanohurtz
Novice

Mar 7, 2002, 8:57 AM

Post #4 of 12 (1579 views)
Cracking A Challenging Text Pattern::Still Need Help

Trust me, I understand. There was one more thing I was missing, though: same pattern, but with a snag.

BATCH RAMS125A#REDMERGE
AND123AT
RRD456WT
FOLLOWS WWS789WT
FOLLOWS PLL665QQ
END
BATCH RSWS555A#BLUMERGE
POP555BT
III777CT
END
BATCH RCCS919A#WHTMERGE
EXCEPT UDE888QT
RXX818WT
END


It should still produce the same output, but without the "FOLLOWS", "EXCEPT", or anything else between "BATCH" and "END" that does not match the (char, char, char, num, num, num, char, char) format, like "WWS789WT". Hence the ([A-Z][A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z]$/) string mentioned earlier (I know, it's malformed).
-NanoHurtz



mhx
Enthusiast / Moderator

Mar 7, 2002, 9:10 AM

Post #5 of 12 (1576 views)
Re: [RedRum] Cracking A Challenging Text Pattern


In Reply To
Hmm, why do people insist on escaping non-meta characters in regexes?


Mmmm, I don't know. Perhaps because I was in the mood to type lots of backslashes. ;-)
Of course you're right, the backslash is optional, but it won't hurt. (Except our eyes, of course.)
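
A small sketch of the point about the optional backslash, using a header string in the format from the first post. One case worth knowing about, which is not raised in this thread: under the /x modifier an unescaped # starts a comment, so there the backslash is needed.

```perl
use strict;
use warnings;

my $header = 'RAMS125A#REDMERGE';

# Without /x, "#" is an ordinary character, so both patterns are equivalent.
my @plain   = $header =~ /(\w+)#(\w+)/;
my @escaped = $header =~ /(\w+)\#(\w+)/;

# With /x, an unescaped "#" would start a comment, so the escape is required.
my @extended = $header =~ / (\w+) \# (\w+) /x;

print join(' | ', @$_), "\n" for \@plain, \@escaped, \@extended;
```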

-- \m\h\x




mhx
Enthusiast / Moderator

Mar 7, 2002, 9:17 AM

Post #6 of 12 (1574 views)
Re: [nanohurtz] Cracking A Challenging Text Pattern::Still Need Help

Ok, then what about the following?

[perl]
#!/usr/bin/perl -w
use strict;

my @data = map [ /(\w+)#(\w+)\s+(.*)/s and ($1, $2,
[grep /^[A-Z]{3}\d{3}[A-Z]{2}$/, split /\s+/, $3])
], do {local $/; <DATA>} =~ /BATCH\s+(.*?)\s+END/sg;

for my $rec ( @data ) {
print map { join( ' | ', @$rec[0,1], $_ ) . "\n" } @{$rec->[2]};
}

__DATA__
BATCH RAMS125A#REDMERGE
AND123AT
RRD456WT
FOLLOWS WWS789WT
FOLLOWS PLL665QQ
END
BATCH RSWS555A#BLUMERGE
POP555BT
III777CT
END
BATCH RCCS919A#WHTMERGE
EXCEPT UDE888QT
RXX818WT
END
[/perl]

-- mhx




nanohurtz
Novice

Mar 7, 2002, 9:29 AM

Post #7 of 12 (1571 views)
Re: [mhx] Cracking A Challenging Text Pattern::Still Need Help

Excellent, thanks!! I didn't think of using a grep statement (duh, sometimes the classics don't hurt). I like the code, but what about actually reading in an external file and producing the delimited ("|" or .xls) output file like my original code had...

@ARGV == 2 or die "usage: $0 <input file> <output file>\n";

($filein, $fileout) = @ARGV;

open (INFILE, "< $filein") or die "What's this? $!\n";

open (OUTFILE, "> $fileout") or die "What the..? $!\n";

while (<INFILE>) {

chomp;

if (m/^BATCH (.*)#(.*)$/) {
$o_lvar=$1;
$o_rvar=$2;
print OUTFILE "$o_lvar\t$o_rvar\n";
}


}

exit 0;
-NanoHurtz



mhx
Enthusiast / Moderator

Mar 7, 2002, 4:51 PM

Post #8 of 12 (1566 views)
Re: [nanohurtz] Cracking A Challenging Text Pattern::Still Need Help


In Reply To
but what about actually reading in an external file and producing the delimited ("|" or .xls) output file like my original code had...


I didn't think that was a problem. Just open your input file and use its handle in place of the DATA handle, then open the output file and pass its handle to the print statement in the for loop.

-- mhx




nanohurtz
Novice

Mar 7, 2002, 9:36 PM

Post #9 of 12 (1560 views)
Re: [nanohurtz] Cracking A Challenging Text Pattern

Can someone please tell me what is wrong with this code?


#!/appl/perl5/bin/perl -w
#
use strict;
open(IN, "< file_in") or die "cannot open the input-file for reading";
open(OUT, "> file_out.xls") or die "Cannot open the output-file for writing";
my ($o_bvar, $o_lvar, $o_rvar);
foreach(<IN>) {
chomp($_);
if ($_=~/^BATCH (.*)#(.*)$/) {
$o_lvar=$1;
$o_rvar=$2;
} elsif (($_!~/^END/i) && $_) {
$_=~/([A-Z][A-Z][A-Z][0-9][0-9][0-9][A-Z][A-Z])$/i;
$o_bvar=$1;
print OUT "$o_lvar\t$o_rvar\t$o_bvar\t$_\n";
}
}
close(IN);
close(OUT);

I keep getting uninitialized-value warnings (at chunk ###) for my print OUT statement, although the variables have clearly been declared. The warning seems to occur whenever the $_ =~ /[A-Z].../ match does not succeed for a line in the file,


e.g.

BATCH RQDS333A#SPOCKERT
AT 0300 <--no_match
UNTIL 0400 <--no_match
RDS999DP <--match
TTR144RT <--match
FFR877RT <--match
AQQ877FR <--match
END
BATCH GHHR888A#BACONHED
AT 0300 <--no_match
UNTIL 0400 <--no_match
OOP433DP <--match
WWS777RT <--match
AT 1200 <--no_match
FFR877RT <--match
KKL616FR <--match
KKL616FR <--match
KKL616FR <--match
AT 0700 <--no_match
END


Should I be re-initializing my $o_bvar variable with every line read? And if so, how?

-Perl'd Out
-NanoHurtz



mhx
Enthusiast / Moderator

Mar 7, 2002, 11:33 PM

Post #10 of 12 (1557 views)
Re: [nanohurtz] Cracking A Challenging Text Pattern

The code I've posted is supposed to operate on the whole file at once. You won't get it to work reading one line after another. So, here's a complete working version that reads from one file and writes to another:

[perl]
#!/usr/bin/perl -w
use strict;

my $infile = 'file_in';
my $outfile = 'file_out.xls';

open IN, $infile or die "cannot open $infile: $!\n";
my @data = map [ /(\w+)#(\w+)\s+(.*)/s and ($1, $2,
[grep /^[A-Z]{3}\d{3}[A-Z]{2}$/, split /\s+/, $3])
], do {local $/; <IN>} =~ /BATCH\s+(.*?)\s+END/sg;
close IN;

open OUT, ">$outfile" or die "cannot open $outfile: $!\n";
for my $rec ( @data ) {
print OUT map { join( ' | ', @$rec[0,1], $_ ) . "\n" } @{$rec->[2]};
}
close OUT;
[/perl]
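
For comparison, the line-by-line loop from post #9 can probably be made to work as well. The sketch below assumes the warnings there come from reading $1 after a failed match; a failed match does not reset $1, so it should only be read once the match is known to have succeeded. The helper name batch_rows and the sample lines are illustrative, and the /i flag of the original is dropped on the assumption that entries are always upper case.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn the lines of a batch file
# into tab-separated rows, skipping lines that are not batch entries.
sub batch_rows {
    my @rows;
    my ($o_lvar, $o_rvar);

    for my $line (@_) {
        chomp(my $text = $line);
        if ( $text =~ /^BATCH (.*)#(.*)$/ ) {
            ($o_lvar, $o_rvar) = ($1, $2);
        }
        elsif ( defined $o_lvar
            and $text =~ /([A-Z]{3}[0-9]{3}[A-Z]{2})$/ ) {
            # $1 is only read here, after the match is known to have succeeded.
            push @rows, "$o_lvar\t$o_rvar\t$1";
        }
    }
    return @rows;
}

# Made-up sample lines in the format from post #9.
my @sample = (
    "BATCH RQDS333A#SPOCKERT\n",
    "AT 0300\n",
    "RDS999DP\n",
    "END\n",
);
print "$_\n" for batch_rows(@sample);
```

Lines such as "AT 0300" and "END" simply fall through both branches, so no separate test for END is needed. With a real file, the list passed to batch_rows would be the lines read from the input handle.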

-- mhx




nanohurtz
Novice

Mar 8, 2002, 9:33 AM

Post #11 of 12 (1551 views)
Re: [mhx] Cracking A Challenging Text Pattern

The force is definitely strong with you. Thanks for your help and, above all, your patience ;-)
-NanoHurtz



dave
Novice

Mar 10, 2002, 2:18 AM

Post #12 of 12 (1532 views)
Re: [nanohurtz] Cracking A Challenging Text Pattern::Still Need Help


Code
# I can't see why you would want to store data like that, but anyway...  

my $m;
map{ s/^((BATCH|END)\s)(.*)$/$m=$3;$m=~s!#! | !/oe || s/^((\w+)\s)?(.+)/$3/o && print "$m | $_" }<DATA>;

# The /o modifier asks Perl to compile a pattern just once. It only matters
# for patterns that interpolate variables, so it is optional here.


#-------------------------------------
__DATA__
BATCH RAMS125A#REDMERGE
AND123AT
FOLLOWS RRD456WT
FOLLOWS WWS789WT
PLL665QQ
END
BATCH RSWS555A#BLUMERGE
POP555BT
III777CT
END
BATCH RCCS919A#WHTMERGE
EXCEPT UDE888QT
RXX818WT
END
BATCH RDDS919A#PRPMERGE
EXCEPT UDE888QT
RUU818WT
FOLLOWS KJJ112WE
RTT900VC
END

dave

 
 

