Help with parsing text

 



andrew_b
stranger

Jul 8, 2006, 6:42 PM

Post #1 of 3 (4138 views)

 
I need to parse some text that consists essentially of key/value pairs. It is in the following format:

Key is:
Beginning of line, followed by known text, followed by a ':' character.

Value is:
Text. Could include multiple lines, blank lines, the ':' character, whatever.

A key/value pair ends when any other key is encountered, or at the end of the file.

The program will have a list of the keys. They are optional and the order may be unpredictable.

I'm having a hard time saying:
Match any key in this list, followed by text, up until the first instance of any other key in the list.
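
That sentence maps fairly directly onto a single pattern: an alternation of all the keys, a non-greedy match for the value, and a lookahead that stops at the next key or at the end of the input. A minimal sketch, assuming no key is a prefix of another key; the keys and text below are made up purely for illustration:

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical sample keys and text, for illustration only.
my @keys = ('Name:', 'Notes:', 'Status:');
my $text = "Notes: first line\n\nsee: below\nName: Andrew\nStatus:\nopen\n";

# One alternation of all keys; quotemeta keeps any regex
# metacharacters inside a key literal.
my $any_key = join '|', map { quotemeta } @keys;

my %value_for;
# ^ with /m anchors a key to the start of a line; /s lets .*? cross
# newlines; the lookahead stops at the next key or at end of input.
while ($text =~ /^($any_key)(.*?)(?=^(?:$any_key)|\z)/msg) {
    $value_for{$1} = $2;
}

for my $key (sort keys %value_for) {
    print "[$key] => [$value_for{$key}]\n";
}
```

Each key is located once and the text is scanned in a single pass, so there is no need to compare candidate remainders. If a key occurs more than once, the last occurrence wins in this sketch.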

So far the only reliable approach I thought of is:

Match a key, store the remaining text.
Run through the list of keys, looking for matches in the original remainder, storing these as well.
Select the shortest one.

Can anyone suggest a more efficient approach?

Thanks,
Andrew


KevinR
Veteran


Jul 8, 2006, 7:09 PM

Post #2 of 3 (4137 views)
Re: [andrew_b] Help with parsing text

Let's see some of the text.


andrew_b
stranger

Jul 9, 2006, 8:47 AM

Post #3 of 3 (4134 views)
Re: [KevinR] Help with parsing text

Consider the following:


Code
#!/usr/bin/perl 
use warnings;
use strict;

#the text could be from a file, etc. but:
#slurp everything after __DATA__; local leaves $/ unchanged elsewhere
my $text = do { local $/; <DATA> };

#The keys would be known, e.g.
my @keys = (
    'A key could be:',
    'A value could be:',
    'The order is:',
    'The keys are:',
);

#What I'd like to end up with is something like
#(double quotes, so that \n means a newline):
my %result = (
    'A key could be:'   => "Anything\n\n",
    'A value could be:' => "Any text.\n\nEven multiple lines.\n\nMight include a : or whatever\n\n",
    'The order is:'     => "\nunpredictable\n\n",
    'The keys are:'     => "optional\n",
);

#the code below works but seems inefficient:
my %mess;

#record the text following each key that is present;
#\Q...\E quotes the key, ^ with /m anchors it to the start of a line
foreach my $key (@keys) {
    if ($text =~ /^\Q$key\E/m) {
        my $remainder = substr $text, $+[0];    # text after the match
        $mess{$key} = { remainder => $remainder, value => $remainder };
    }
}

#cut each remainder at the nearest following key, if there is one;
#the value starts out as the whole remainder, so a key that does not
#match can no longer overwrite a shorter value found earlier
foreach my $entry (values %mess) {
    foreach my $key (@keys) {
        if ($entry->{remainder} =~ /^\Q$key\E/m
            and $-[0] < length($entry->{value})) {
            $entry->{value} = substr $entry->{remainder}, 0, $-[0];
        }
    }
}

while (my ($key, $val) = each %mess) {
    print $key, $val->{value};
}

#other ideas?

__DATA__
A key could be: Anything

A value could be: Any text.

Even multiple lines.

Might include a : or whatever

The order is:
unpredictable

The keys are: optional
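
As one possible answer to "other ideas?": split on the keys themselves. A rough sketch, assuming each key appears at most once and no key is a prefix of another; the names $any_key, $preamble and %value_for are mine, and the sample text is copied into a heredoc so the block stands alone:

```perl
#!/usr/bin/perl
use strict;
use warnings;

my @keys = (
    'A key could be:',
    'A value could be:',
    'The order is:',
    'The keys are:',
);

my $text = <<'END_TEXT';
A key could be: Anything

A value could be: Any text.

Even multiple lines.

Might include a : or whatever

The order is:
unpredictable

The keys are: optional
END_TEXT

my $any_key = join '|', map { quotemeta } @keys;

# Splitting on a capturing group keeps the separators (the keys) in the
# returned list: (text before first key, key, value, key, value, ...).
# The limit of -1 preserves a trailing empty value, so the list that
# feeds the hash always has an even number of elements.
my ($preamble, %value_for) = split /^($any_key)/m, $text, -1;

for my $key (@keys) {
    next unless exists $value_for{$key};    # keys are optional
    print "[$key] => [$value_for{$key}]\n";
}
```

The values keep whatever whitespace follows the key, so they may need trimming depending on what the result should look like.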


 
 

