Parsing data from a file with split & regex

 



bill1234
Novice

Jan 23, 2013, 11:49 AM

Post #1 of 9 (1245 views)

So, I have a pretty basic understanding of how to parse data from something like a comma-delimited file, but I can't figure out how to do the following:

I have a file with a large amount of data, all of it in one long text string. For example:
employee name="Steve" employee phone="5551234123" employee address="1234 street"

I want a script that will parse all of this into a comma- or pipe-separated file, maybe even with a header, like:
Name|Number|Address
Steve|5551234123|1234 street

It also needs to be restrictive: it can't just include all data between quotes. I would need it to print specifically what is inside a named field such as employee name="".


BillKSmith
Veteran

Jan 23, 2013, 3:09 PM

Post #2 of 9 (1240 views)
Re: [bill1234] Parsing data from a file with split & regex

It is very hard to get a parser right without more info about the syntax. This does your example:

Code
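use strict;
use warnings;
use Readonly;    # CPAN module, not part of core Perl

Readonly::Scalar my $SEPARATOR => q(|);

# Read one line of input from the __DATA__ section below.
my $Infile = <DATA>;

# Collect every label="value" pair on the line.
my @fields = $Infile =~ /([\w ]+="[\w ]+")/g;

my %output_hash;
foreach my $field (@fields) {
    # Split each pair into its label and its quoted value.
    my ($label, $value) = $field =~ /([\w ]+)="([\w ]+)"/;
    $output_hash{$label} = $value;
}

# keys and values return the entries in the same (arbitrary) order.
print join( $SEPARATOR, keys %output_hash ),   "\n";
print join( $SEPARATOR, values %output_hash ), "\n";
__DATA__
employee name="Steve" employee phone="5551234123" employee address="1234 street"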

Good Luck,
Bill


7stud
Enthusiast

Jan 23, 2013, 4:14 PM

Post #3 of 9 (1239 views)
Re: [bill1234] Parsing data from a file with split & regex


In Reply To
I have a file with large amounts of data, and it will be listed in one large text string for example:
employee name="Steve" employee phone="5551234123" employee address="1234 street"

The first thing you should do is wring the neck of the person who chose that format.


Code
 
use strict;
use warnings;
use 5.012;

my $text = 'employee name="Steve" employee phone="5551234123" employee address="1234 street"';

while ($text =~ /
    \s*             #0 or more spaces
    (               #start of $1 (whole field)
        (           #start of $2 (field name)
            [^=]+   #not an equals sign, 1 or more times
        )           #end of $2
        =           #an equals sign
        "           #a double quote
        (           #start of $3 (field value)
            [^"]+   #not a double quote, 1 or more times
        )           #end of $3
        "           #a double quote
    )               #end of $1
    /gxms           #global matching flag plus standard xms
) {

    my ($whole_field, $field_name, $field_value) = ($1, $2, $3);
    say $whole_field;
    say $field_name;
    say $field_value;
    say '*' x 20;
}

--output:--
employee name="Steve"
employee name
Steve
********************
employee phone="5551234123"
employee phone
5551234123
********************
employee address="1234 street"
employee address
1234 street
********************



(This post was edited by 7stud on Jan 23, 2013, 9:35 PM)


rovf
Veteran

Jan 24, 2013, 1:15 AM

Post #4 of 9 (1212 views)
Re: [bill1234] Parsing data from a file with split & regex

(1) What characters can go inside the quoted strings? For instance, can the data between the quotes contain double quotes? Can it contain any of the following characters:

$ @ % { } %

(2) Are the names of the fields relevant? In your example output, you use only the part within the quotes and ignore everything else.


bill1234
Novice

Jan 28, 2013, 8:48 AM

Post #5 of 9 (1196 views)
Re: [BillKSmith] Parsing data from a file with split & regex


This worked great. I'm running into problems, though, reading the data from a file. Using the following, I was able to get it to print to a file with <DATA> as the input, but when I tried to read the input from a file it ends up printing no data.

use strict;
use warnings;
use Readonly;
use autodie;
Readonly::Scalar my $SEPARATOR => q(|);

my $filename2 = 'c:\test\test.txt';
open(my $fh1, '<', $filename2);
my $Infile=$filename2;
my @fields = $Infile =~ /([\w ]+="[\w ]+")/g;
my %output_hash;

my $filename = 'C:\test\test2.txt';
open my $fh, '>>', $filename or die "Cannot open '$filename' for reading: $!";
foreach my $field (@fields) {
my ($label, $value) = $field =~ /([\w ]+)="([\w ]+)"/;
$output_hash{$label} = $value;
}
print $fh join( $SEPARATOR, keys %output_hash), "\n";
print $fh join( $SEPARATOR, values %output_hash), "\n";
#__DATA__
#name="Steve" phone="5551234123" address="1234 street"


Chris Charley
User

Jan 28, 2013, 1:06 PM

Post #6 of 9 (1192 views)
Re: [bill1234] Parsing data from a file with split & regex

This is my attempt at printing the pipe-separated values, one output row per input line.

Code
#!/usr/bin/perl 
use strict;
use warnings;

my $file = "o33.txt";
open FH, "<", $file or die "Unable to open '$file'. $!";

while (<FH>) {
print join("|", /="([^"]+)/g), "\n";
}

close FH or die "Unable to close '$file'. $!";

__END__
*** o33.txt contents
employee name="Steve" employee phone="5551234123" employee address="1234 street"
employee name="Tom" employee phone="4441234123" employee address="1234 Ave"
employee name="Nick" employee phone="8881234123" employee address="PO Box 1881"

*** prints
Steve|5551234123|1234 street
Tom|4441234123|1234 Ave
Nick|8881234123|PO Box 1881



BillKSmith
Veteran

Jan 28, 2013, 7:33 PM

Post #7 of 9 (1183 views)
Re: [bill1234] Parsing data from a file with split & regex

I used the angle-bracket (readline) operator to read from Perl's special filehandle (DATA). You must replace my filehandle (DATA) with your lexical filehandle ($fh1). As written, your code matches the regex against the file name itself, which contains no fields, so no data is printed.

In your code, replace

Code
my $Infile=$filename2;


with

Code
my $Infile = <$fh1>;


I believe that the rest of your code is correct (assuming that your file 'test.txt' contains just one long line).
Good Luck,
Bill
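
For reference, here is a minimal sketch of that change in isolation. To stay self-contained it reads from an in-memory string rather than c:\test\test.txt (an assumption made for illustration), and first_line_fields is a hypothetical helper name, not something from the code above. It also trims the leading space that the [\w ]+ pattern picks up on the second and later labels.

```perl
use strict;
use warnings;

# Hypothetical helper: read ONE line from an already-open filehandle and
# return the label/value pairs found on it, in the order they appear.
sub first_line_fields {
    my ($fh) = @_;
    my $line = <$fh>;                 # scalar context: reads a single line
    return () unless defined $line;   # empty file: nothing to parse
    my @pairs;
    while ( $line =~ /([\w ]+)="([\w ]+)"/g ) {
        my ( $label, $value ) = ( $1, $2 );
        $label =~ s/^\s+//;           # drop the space left between fields
        push @pairs, [ $label, $value ];
    }
    return @pairs;
}

# Stand-in for the real file: an in-memory filehandle (assumption).
my $sample = qq{employee name="Steve" employee phone="5551234123" employee address="1234 street"\n};
open( my $fh1, '<', \$sample ) or die "Cannot open in-memory file: $!";

my @pairs = first_line_fields($fh1);
close $fh1;

print join( '|', map { $_->[0] } @pairs ), "\n";   # labels
print join( '|', map { $_->[1] } @pairs ), "\n";   # values
```

With a real file, the handle returned by open(my $fh1, '<', $filename2) can be passed to the helper in the same way. Collecting the pairs in an array instead of a hash keeps them in input order.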


bill1234
Novice

Jan 29, 2013, 8:39 AM

Post #8 of 9 (1173 views)
Re: [BillKSmith] Parsing data from a file with split & regex


That worked. Thanks!

What would I have to change to get it to read a multi-line file?


BillKSmith
Veteran

Jan 29, 2013, 9:52 AM

Post #9 of 9 (1169 views)
Re: [bill1234] Parsing data from a file with split & regex

You can read multiple lines by putting a while loop around the code between open and print. This will still output only two lines. The first is headers (in an unpredictable order). The second is the values (in the same order as the headers). If any header is repeated anywhere in the file, it will only be output once (with the value of the last occurrence).
Good Luck,
Bill
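
If one output row per input line is wanted instead, a variation on the loop described above might look like the sketch below. This is not the code from the earlier posts: it fixes the column order in an array (here called @COLUMNS, a name chosen for illustration) so the header is predictable, and line_to_row is a hypothetical helper. It assumes every value consists only of word characters and spaces, as in the example data, and it reads from an in-memory string so that it runs on its own.

```perl
use strict;
use warnings;
use 5.010;    # for the // (defined-or) operator

# Column order is fixed up front so the header is predictable.
# These labels are taken from the example data in this thread.
my @COLUMNS = ( 'employee name', 'employee phone', 'employee address' );

# Hypothetical helper: turn one input line into a separated row.
sub line_to_row {
    my ( $line, $separator ) = @_;
    my %record;
    while ( $line =~ /([\w ]+)="([\w ]+)"/g ) {
        my ( $label, $value ) = ( $1, $2 );
        $label =~ s/^\s+//;    # trim the space left between fields
        $record{$label} = $value;
    }
    # A field missing from the line becomes an empty string.
    return join $separator, map { $record{$_} // '' } @COLUMNS;
}

# In-memory stand-in for a multi-line input file (assumption).
my $sample = <<'END';
employee name="Steve" employee phone="5551234123" employee address="1234 street"
employee name="Tom" employee phone="4441234123" employee address="1234 Ave"
END

open( my $in, '<', \$sample ) or die "Cannot open in-memory file: $!";
print join( '|', @COLUMNS ), "\n";    # header, printed once
while ( my $line = <$in> ) {          # one iteration per input line
    print line_to_row( $line, '|' ), "\n";
}
close $in;
```

Because only the labels listed in @COLUMNS are printed, any other label="value" pairs on a line are ignored, which also covers the "restrictive" requirement from the first post.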
