
Zhris
Enthusiast
Oct 19, 2013, 9:59 AM
Post #4 of 5
(2076 views)
|
Re: [newbie01.perl] - Help parsing a long string with special character
[In reply to]
|
Can't Post
|
|
Hi, Here I mainly focus on the actual parsing. I'm not certain if there is a name for the format of data you are parsing, but I have namespaced it blah. There may even be a way to parse it using a module available on CPAN (I did briefly look at Text::Balanced). I have produced a package with three core functions: in - this takes a string of blah and returns a Perl ref. It won't exactly produce Bill's proposed structure i.e. ADDRESS keys remain, but I believe the structure produced is more appropriate. I have tested with complex, deeply nested blah and have not had any problems. out - this takes a Perl ref and returns a string of blah. I haven't done much testing with this, therefore there may be a scenario where poorly formed blah is produced or the two blahs are not identical in the process blah -> perl ref -> blah. Inevitably, its best to only call this on Perl refs that were originally generated from blah, unless you know how to formulate the appropriate structure. fetch - this takes a selector string and a Perl ref and returns a Perl ref. It is a very rough and ready function designed to provide a chained selector style interface to a Perl ref, it doesn't really belong with the blah functions. There are scenarios where this would break i.e. number only keys. This could eventually be reworked to allow you pass in the format you desire i.e. 'testp1.description.address_list_02.host'. This is a prototype and not production ready, please use with care. I will continue to work on this when I have time and will eventually post the final package. But you have something to work from...
#!/usr/bin/perl use strict; use warnings FATAL => qw/ all /; package blah; sub in { my ($str, $ref, $opt) = @_; $str ||= ''; $ref ||= { }; $opt->{open} ||= '('; $opt->{close} ||= ')'; $opt->{equal} ||= '='; my $has_child = 0; my $val = ''; while ($str =~ s/^(.)(.*)/$2/sg) { my $chr = $1; if ($chr eq $opt->{open}) { die 'poorly formed blah' unless $str =~ m/^\s*[A-Z0-9_]+\s*\Q$opt->{equal}\E/i; $has_child++; my $key; ($key, $str) = split /$opt->{equal}/, $str, 2; $key = trim($key); if (exists $ref->{$key}) { (ref $ref->{$key} eq 'ARRAY') ? (push @{$ref->{$key}}, undef) : ($ref->{$key} = [ $ref->{$key}, undef ]) ; ($str, $ref->{$key}->[-1]) = in($str, $ref->{$key}->[-1], $opt); } else { ($str, $ref->{$key}) = in($str, $ref->{$key}, $opt); } } elsif ($chr eq $opt->{close}) { die 'poorly formed blah' unless $str =~ m/^\s*(?:\Q$opt->{open}\E|\Q$opt->{close}\E|\s*$)/i; $ref = trim($val) unless $has_child; return ($str, $ref); } else { $val .= $chr; } } return $ref; } sub out { my ($ref, $opt, $prev_key) = @_; $ref ||= { }; $opt->{open} ||= '('; $opt->{close} ||= ')'; $opt->{equal} ||= '='; $opt->{pretty} ||= 0; # todo indented / linespaced blah. my $str = ''; if (ref $ref eq 'ARRAY') { foreach my $val (@$ref) { $str .= $opt->{open} . $prev_key . $opt->{equal} . out($val) . $opt->{close}; } } elsif (ref $ref eq 'HASH') { while (my ($key, $val) = each %$ref) { if (ref $val eq 'ARRAY') { $str .= out($val, $opt, $key); } else { $str .= $opt->{open} . $key . $opt->{equal} . out($val, $opt, $key) . $opt->{close}; } } } else { $str .= $ref; } return $str; } sub fetch { my ($str, $ref, $opt) = @_; return $ref unless defined $str; my $sel; ($sel, $str) = split /\./, $str, 2; if ($sel =~ m/^\d+$/) { return fetch($str, $ref->[$sel], $opt); } elsif ($sel =~ m/^[A-Z0-9_]+/i) { return fetch($str, $ref->{$sel}, $opt); } return; } sub trim { $_[0] =~ s/^\s+//; $_[0] =~ s/\s+$//; return $_[0]; } package main; use Data::Dumper qw/ Dumper /; my $blah = do { local $/ = undef; <DATA> }; my $ref; $ref = blah::in($blah); print Dumper $ref; $blah = blah::out($ref); print Dumper $blah; $ref = blah::in($blah); print Dumper $ref; $ref = blah::fetch('DESCRIPTION.ADDRESS_LIST.1.ADDRESS', $ref); print Dumper $ref; $blah = blah::out($ref); print Dumper $blah; $ref = blah::in($blah); print Dumper $ref; __DATA__ (DESCRIPTION = (LOAD_BALANCE=off) (FAILOVER=on) (CONNECT_TIMEOUT=5) (TRANSPORT_CONNECT_TIMEOUT=3) (RETRY_COUNT=3) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (Host = testp1prim.mnl.ph.com) (Port = 10666) ) ) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (Host = testp1stdby.mnl.ph.com) (Port = 10666) ) ) (CONNECT_DATA = (SERVICE_NAME=testp1_app.mnl.ph.com) ) ) output:
$VAR1 = { 'DESCRIPTION' => { 'LOAD_BALANCE' => 'off', 'RETRY_COUNT' => '3', 'TRANSPORT_CONNECT_TIMEOUT' => '3', 'CONNECT_DATA' => { 'SERVICE_NAME' => 'testp1_app.mnl.ph.com' }, 'ADDRESS_LIST' => [ { 'ADDRESS' => { 'PROTOCOL' => 'TCP', 'Port' => '10666', 'Host' => 'testp1prim.mnl.ph.com' } }, { 'ADDRESS' => { 'PROTOCOL' => 'TCP', 'Port' => '10666', 'Host' => 'testp1stdby.mnl.ph.com' } } ], 'FAILOVER' => 'on', 'CONNECT_TIMEOUT' => '5' } }; $VAR1 = '(DESCRIPTION=(LOAD_BALANCE=off)(RETRY_COUNT=3)(TRANSPORT_CONNECT_TIMEOUT=3)(CONNECT_DATA=(SERVICE_NAME=testp1_app.mnl.ph.com))(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(Port=10666)(Host=testp1prim.mnl.ph.com)))(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(Port=10666)(Host=testp1stdby.mnl.ph.com)))(FAILOVER=on)(CONNECT_TIMEOUT=5))'; $VAR1 = { 'DESCRIPTION' => { 'LOAD_BALANCE' => 'off', 'RETRY_COUNT' => '3', 'TRANSPORT_CONNECT_TIMEOUT' => '3', 'ADDRESS_LIST' => [ { 'ADDRESS' => { 'PROTOCOL' => 'TCP', 'Port' => '10666', 'Host' => 'testp1prim.mnl.ph.com' } }, { 'ADDRESS' => { 'PROTOCOL' => 'TCP', 'Port' => '10666', 'Host' => 'testp1stdby.mnl.ph.com' } } ], 'CONNECT_DATA' => { 'SERVICE_NAME' => 'testp1_app.mnl.ph.com' }, 'FAILOVER' => 'on', 'CONNECT_TIMEOUT' => '5' } }; $VAR1 = { 'PROTOCOL' => 'TCP', 'Port' => '10666', 'Host' => 'testp1stdby.mnl.ph.com' }; $VAR1 = '(PROTOCOL=TCP)(Port=10666)(Host=testp1stdby.mnl.ph.com)'; $VAR1 = { 'PROTOCOL' => 'TCP', 'Port' => '10666', 'Host' => 'testp1stdby.mnl.ph.com' }; I hope this helps you to achieve your goal. Chris
(This post was edited by Zhris on Oct 19, 2013, 11:23 AM)
|