regex grouping and parsing issue

 



alferic
Novice

Oct 16, 2013, 12:24 AM

Post #1 of 6 (716 views)
regex grouping and parsing issue

Hello Gurus,

I need your help with a parsing problem. I have a file whose contents look like this:

LOT ID : KC28H23AA NODE : TB30
DEVICE : DI3303X01XX-D5X-DUA USER ID:
FILENAME: I3303X WF# : 17

I used the code below to parse the values I wanted:
my $lotid = "";
my $node = "";
my $device = "";
my $user_id = "";
my $filename = "";
my $wafer_id = "";
open FH, $file or die "can't open $file\n";
while($line=<FH>)
{
chomp($line);
$line =~ s/\cM//g;
if ($line =~ /^LOT\s+ID\s+:(.*?)NODE\s+:(.*?)/i)
{
$lotid = uc($1);
$node = uc($2);
}
elsif ($line =~ /^DEVICE\s+:(.*?)USER\s+ID:(.*?)/i)
{
$device = uc($1);
$user_id = uc($2);
}
elsif ($line =~ /^FILENAME:(.*?)WF#\s+:\s+(\d+)/i)
{
$filename = uc($1);
$wafer_id = uc($2);
}
}
close(FH);

print "lotid : $lotid\n";
print "node : $node\n";
print "deviceid : $device\n";
print "userid : $user_id\n";
print "file : $filename\n";
print "waferid : $wafer_id\n";

The problem is that some values won't print. If you run the script on the data above, every parsed value appears except the "node" value.
Output:
lotid : KC28H23AA
node :
deviceid : DI3303X01XX-D5X-DUA
userid :
file : I3303X
waferid : 17

I ran a couple of tests to check how the script handles other input, and hit the issues below.
- When I put a value in the user id field of the data, the user id value does not appear.
- If I remove the value under "WF#", the "filename" value does not appear either.

Please help.


Zhris
Enthusiast

Oct 16, 2013, 7:16 AM

Post #2 of 6 (708 views)
Re: [alferic] regex grouping and parsing issue

Hi,

It's all to do with your regexps.


Quote
When i put a value on the user id section of the data, the user id value will not appear.


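(.*?) is a non-greedy match: it consumes as few characters as possible. When it is the last element of the pattern, nothing forces it to consume anything, so it matches the empty string. To make it capture everything up to the end of the line, follow it with the $ end-of-line anchor.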
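The difference is easy to see in isolation. A minimal sketch, assuming the sample line is single-spaced as it is displayed in post #1 (the variable names are only for illustration):

```perl
use strict;
use warnings;

# Sample line from post #1 (single spaces between fields are an assumption).
my $line = 'LOT ID : KC28H23AA NODE : TB30';

# Trailing (.*?) with nothing after it: zero characters already satisfy it.
my ($lot_a, $node_a) = $line =~ /^LOT\s+ID\s+:(.*?)NODE\s+:(.*?)/i;

# Same pattern anchored with $: the lazy group must now reach the end of the line.
my ($lot_b, $node_b) = $line =~ /^LOT\s+ID\s+:(.*?)NODE\s+:(.*?)$/i;

print "unanchored node: [$node_a]\n";    # prints: unanchored node: []
print "anchored node:   [$node_b]\n";    # prints: anchored node:   [ TB30]
```

Note the leading space in the anchored capture: the pattern does not consume the whitespace after the colon, so the value still needs trimming.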


Quote
If I remove the value under "WF#", the "filename" value will not appear also


If you remove the value, the regexp no longer matches, because \d+ requires at least one digit. Allow zero or more digits by replacing the + quantifier with *.

These adjustments should fix your immediate issues, but will not necessarily fix every input scenario you may have beyond those you have described...


Code
# In each pair, the first pattern is the original and the second is the adjusted one.
^LOT\s+ID\s+:(.*?)NODE\s+:(.*?)
^LOT\s+ID\s+:(.*?)NODE\s+:(.*?)$

^DEVICE\s+:(.*?)USER\s+ID:(.*?)
^DEVICE\s+:(.*?)USER\s+ID:(.*?)$

^FILENAME:(.*?)WF#\s+:\s+(\d+)
^FILENAME:(.*?)WF#\s+:\s+(\d*)
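Dropped into a loop, the adjusted patterns might be used roughly as below. This is only a sketch under a few assumptions: the input is single-spaced as displayed in post #1, `trim` is a hypothetical helper that is not part of the original script, and `JSMITH` is a made-up user id. There is also one deviation from the patterns above, flagged in the comments: `\s*` rather than `\s+` after the WF# colon, so the line still matches when no whitespace follows the colon.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): strip leading and trailing whitespace.
sub trim {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    return $s;
}

# Sample lines from post #1. JSMITH is a made-up user id and the WF# value is
# removed, so both reported cases are exercised.
my @lines = (
    'LOT ID : KC28H23AA NODE : TB30',
    'DEVICE : DI3303X01XX-D5X-DUA USER ID: JSMITH',
    'FILENAME: I3303X WF# :',
);

my @fields = qw(lotid node device user_id filename wafer_id);
my %rec;
@rec{@fields} = ('') x @fields;

for my $line (@lines) {
    if (my ($lot, $node) = $line =~ /^LOT\s+ID\s+:(.*?)NODE\s+:(.*?)$/i) {
        @rec{qw(lotid node)} = (uc(trim($lot)), uc(trim($node)));
    }
    elsif (my ($dev, $user) = $line =~ /^DEVICE\s+:(.*?)USER\s+ID:(.*?)$/i) {
        @rec{qw(device user_id)} = (uc(trim($dev)), uc(trim($user)));
    }
    # Deviation from the pattern above: \s* instead of \s+ after the last colon.
    elsif (my ($file, $wafer) = $line =~ /^FILENAME:(.*?)WF#\s+:\s*(\d*)/i) {
        @rec{qw(filename wafer_id)} = (uc(trim($file)), $wafer);
    }
}

print "$_ : $rec{$_}\n" for @fields;
```

Capturing into lexical variables in the `if` condition keeps the values safe from any later regex, such as the substitution inside `trim`.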



Chris


(This post was edited by Zhris on Oct 16, 2013, 7:22 AM)


alferic
Novice

Oct 16, 2013, 7:52 PM

Post #3 of 6 (679 views)
Re: [Zhris] regex grouping and parsing issue

Hi Chris,

Thank you for your input! I googled and read about Perl positioning, if that is what it's called :) I managed to get what I need using the regexes below:

$line =~ /^LOT ID.{3}(.+)NODE.{5}(.+)/i

$line =~ /^DEVICE.{3}(.+)USER ID.{2}(.+)/i

$line =~ /^FILENAME.{2}(.+)WF#.{6}(\d+)/i


2teez
Novice

Oct 16, 2013, 10:46 PM

Post #4 of 6 (676 views)
Re: [alferic] regex grouping and parsing issue

Hi alferic,

What happens if you have other strings to parse besides the ones you have shown so far? For one thing, you will have to add another branch to your if/else statement for every new regex case.

I would suggest using a dispatch table with one entry per regex rather than an if/else chain. Alternatively, look for a single regex that covers all the lines to parse, like so:


Code
use warnings; 
use strict;
use Data::Dumper;

my %info;
# \s* (not \s+) before the last capture, so a line that ends at "USER ID:" still matches
my $req = qr/^(.*?)(\s+)?:\s+(.+?)\s+(.+):\s*(\w*)/;

while(<DATA>){
chomp;
@info{$1,$4}=($3,$5) if /$req/;
}

print Dumper \%info;

__DATA__
LOT ID : KC28H23AA NODE : TB30
DEVICE : DI3303X01XX-D5X-DUA USER ID:
FILENAME: I3303X WF# : 17


Printing your desired output is then just a matter of simple logic.
See the output below for a demonstration:

Code
$VAR1 = { 
'NODE ' => 'TB30',
'FILENAME' => 'I3303X',
'USER ID' => '',
'LOT ID' => 'KC28H23AA',
'WF# ' => '17',
'DEVICE' => 'DI3303X01XX-D5X-DUA'
};
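For comparison, the dispatch-table alternative mentioned above might look something like the sketch below. The table layout, the key names (borrowed from the variables in post #1) and the patterns are assumptions made for illustration, not code from this thread. The patterns absorb the whitespace around each value, so no separate trimming step is needed.

```perl
use strict;
use warnings;

my %info;

# Dispatch table: each entry pairs a line pattern with the code that stores
# its captures. Supporting a new line type means adding one entry here.
my @dispatch = (
    [ qr/^LOT\s+ID\s*:\s*(.*?)\s*NODE\s*:\s*(.*?)\s*$/i,
      sub { @info{qw(lotid node)} = @_ } ],
    [ qr/^DEVICE\s*:\s*(.*?)\s*USER\s+ID\s*:\s*(.*?)\s*$/i,
      sub { @info{qw(device user_id)} = @_ } ],
    [ qr/^FILENAME\s*:\s*(.*?)\s*WF#\s*:\s*(\d*)\s*$/i,
      sub { @info{qw(filename wafer_id)} = @_ } ],
);

# Sample lines from post #1 (single-spaced, as displayed there).
my @lines = (
    'LOT ID : KC28H23AA NODE : TB30',
    'DEVICE : DI3303X01XX-D5X-DUA USER ID:',
    'FILENAME: I3303X WF# : 17',
);

for my $line (@lines) {
    $line =~ s/\cM//g;    # drop carriage returns, as in the original script
    for my $entry (@dispatch) {
        my ($re, $handler) = @$entry;
        if (my @captures = $line =~ $re) {
            $handler->(map { uc $_ } @captures);
            last;         # first matching entry wins
        }
    }
}

printf "%-8s : %s\n", $_, $info{$_}
    for qw(lotid node device user_id filename wafer_id);
```

Compared with the single common regex, this keeps one explicit pattern per line type, which is more typing but easier to adjust when a single line format changes.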



Zhris
Enthusiast

Oct 17, 2013, 8:11 AM

Post #5 of 6 (664 views)
Re: [alferic] regex grouping and parsing issue

Hi alferic,

Good to hear you have solved your issue.

The new regexps you have produced look a lot tidier, but they do not work with your original sample of input data.
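A quick check shows the mismatch, assuming the sample is single-spaced as displayed in post #1. The original file may well be column-aligned with wider gaps, in which case the fixed `.{n}` widths could line up. The `@results` array exists only to make the outcome easy to inspect.

```perl
use strict;
use warnings;

# Sample lines as displayed in post #1 (single-spaced: an assumption).
my @lines = (
    'LOT ID : KC28H23AA NODE : TB30',
    'DEVICE : DI3303X01XX-D5X-DUA USER ID:',
    'FILENAME: I3303X WF# : 17',
);

# The fixed-width patterns from post #3, unchanged.
my @patterns = (
    qr/^LOT ID.{3}(.+)NODE.{5}(.+)/i,
    qr/^DEVICE.{3}(.+)USER ID.{2}(.+)/i,
    qr/^FILENAME.{2}(.+)WF#.{6}(\d+)/i,
);

my @results;
for my $i (0 .. $#lines) {
    if (my @captures = $lines[$i] =~ $patterns[$i]) {
        push @results, [@captures];
        printf "matched:  [%s] [%s]\n", @captures;
    }
    else {
        push @results, undef;
        print "no match: $lines[$i]\n";
    }
}
```

On this input only the first line matches, and its second capture is `30` rather than `TB30`, because `.{5}` swallows the first two characters of the value. The other two lines fail because fewer characters follow the label than the fixed widths demand.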

2teez provides a practical approach that is ideal for the type of input you are processing. Consider using it if you continue to have issues down the line.

Chris


(This post was edited by Zhris on Oct 17, 2013, 8:12 AM)


alferic
Novice

Oct 19, 2013, 6:25 AM

Post #6 of 6 (637 views)
Re: [Zhris] regex grouping and parsing issue

Thanks Chris and 2teez. I will consider 2teez's approach if I run into issues with mine.

 
 

