- Help parsing a long string with special character

 



newbie01.perl
Novice

Oct 16, 2013, 8:01 PM

Post #1 of 5 (1172 views)
- Help parsing a long string with special character

Hi,

Before anything else, sorry for a long post.

I need some advice please on how to parse the following string output.

I currently have the Perl script below. I am just starting, still have a long way to go, and urgently need some guidance, especially on parsing. Hopefully I explain clearly what I want to do.

- Perl script -


Code
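$: cat tnsping.pl
#!/bin/perl
use strict;
use warnings;

# TNS alias to look up, e.g. testp1
my $TNS = $ARGV[0] or die "Usage: $0 <tns_alias>\n";

# List-form pipe open: the alias goes straight to tnsping, not through a shell.
open( my $prog, '-|', '/opt/oracle/9.2.0.7/bin/tnsping', $TNS )
    or die "Failed: $!\n";
while ( my $line = <$prog> ) {
    # Only the line holding the connect descriptor is of interest.
    print $line if $line =~ /Attempting to contact/;
}
close $prog;

exit 0;
$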


- Sample OUTPUT run -


Code
 
$: ./tnsping.pl testp1
Attempting to contact (DESCRIPTION =(LOAD_BALANCE=off)(FAILOVER=on)(CONNECT_TIMEOUT=5)(TRANSPORT_CONNECT_TIMEOUT=3)(RETRY_COUNT=3) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (Host = testp1prim.mnl.ph.com) (Port = 10666))) (ADDRESS_LIST = (ADDRESS = (PROTOCOL = TCP) (Host = testp1stdby.mnl.ph.com) (Port = 10666))) (CONNECT_DATA = (SERVICE_NAME=testp1_app.mnl.ph.com)))
$: ./tnsping.pl testd1
Attempting to contact (DESCRIPTION = (ADDRESS = (COMMUNITY = tcpip.world) (PROTOCOL = TCP) (Host = testd1.mnl.ph.com) (Port = 25666)) (CONNECT_DATA = (SID = testd1) (GLOBAL_NAME = testd1.mnl.ph.com)))
$: ./tnsping.pl livep1
Attempting to contact (DESCRIPTION = (ADDRESS_LIST = (LOAD_BALANCE=on)(FAILOVER=on) (ADDRESS = (PROTOCOL = TCP)(HOST=livep1-vip1.mnl.ph.com)(PORT = 1529)) (ADDRESS = (PROTOCOL = TCP)(HOST=livep1-vip2.mnl.ph.com)(PORT = 1529)) (ADDRESS = (PROTOCOL = TCP)(HOST=livep1-vip3.mnl.ph.com)(PORT = 1529)) (ADDRESS = (PROTOCOL = TCP)(HOST=livep1-vip4.mnl.ph.com)(PORT = 1529)) (ADDRESS = (PROTOCOL = TCP)(HOST=livep1-vip5.mnl.ph.com)(PORT = 1529)) (ADDRESS = (PROTOCOL = TCP)(HOST=livep1-vip6.mnl.ph.com)(PORT = 1529))) (CONNECT_DATA = (SERVER = DEDICATED) (SERVICE_NAME = livep1_app.mnl.ph.com)))



- Note that the output can sometimes have two ADDRESS_LIST sections, or one ADDRESS_LIST with multiple ADDRESS sections, or no ADDRESS_LIST at all but just an ADDRESS section.
- I want to be able to parse the output so that I can access the values like variables.

- The string that I am parsing is only one line. For clarity, the string that I want to parse, manually indented, looks as below:

For tnsping testp1


Code
(DESCRIPTION =
  (LOAD_BALANCE=off)
  (FAILOVER=on)
  (CONNECT_TIMEOUT=5)
  (TRANSPORT_CONNECT_TIMEOUT=3)
  (RETRY_COUNT=3)
  (ADDRESS_LIST =
    (ADDRESS =
      (PROTOCOL = TCP)
      (Host = testp1prim.mnl.ph.com)
      (Port = 10666)
    )
  )
  (ADDRESS_LIST =
    (ADDRESS =
      (PROTOCOL = TCP)
      (Host = testp1stdby.mnl.ph.com)
      (Port = 10666)
    )
  )
  (CONNECT_DATA =
    (SERVICE_NAME=testp1_app.mnl.ph.com)
  )
)


- I want to be able to access the whole string as testp1.description. Accessing testp1.description.load_balance should then give me the value OFF, testp1.description.failover should give me ON, and so on. I also need to be able to count the number of address_list sections, which can sometimes be more than four (4).
- I want to be able to access the values from the first address_list as testp1.description.address_list_01.protocol, testp1.description.address_list_01.host and so on, and those from the second address list via testp1.description.address_list_02.protocol, testp1.description.address_list_02.host and so on. I also want to be able to get a whole section as testp1.description.address_list_01 or testp1.description.address_list_02.
- I want to be able to access the service_name information as testp1.description.connect_data.service_name, or the full section as testp1.description.connect_data.

- This is the string in its simplest form.

For tnsping testd1


Code
(DESCRIPTION =
  (ADDRESS =
    (COMMUNITY = tcpip.world)
    (PROTOCOL = TCP)
    (Host = testd1.mnl.ph.com)
    (Port = 25666)
  )
  (CONNECT_DATA =
    (SID = testd1)
    (GLOBAL_NAME = testd1.mnl.ph.com)
  )
)


- I want to be able to access the whole string as testd1.description and to access the values from the address section as testd1.description.address_list.community, testd1.description.address_list.host etc. Sometimes the string will have a testd1.description.load_balance and sometimes it won't. The ADDRESS section may sometimes have more information than is shown above, i.e. it may contain FAILOVER information as well.

- The last variant of the string that I want to parse is as below:

For tnsping livep1


Code
(DESCRIPTION =
  ( ADDRESS_LIST =
    (LOAD_BALANCE=on) (FAILOVER=on)
    ( ADDRESS = (PROTOCOL = TCP) (HOST=livep1-vip1.mnl.ph.com) (PORT = 1529) )
    ( ADDRESS = (PROTOCOL = TCP) (HOST=livep1-vip2.mnl.ph.com) (PORT = 1529) )
    ( ADDRESS = (PROTOCOL = TCP) (HOST=livep1-vip3.mnl.ph.com) (PORT = 1529) )
    ( ADDRESS = (PROTOCOL = TCP) (HOST=livep1-vip4.mnl.ph.com) (PORT = 1529) )
    ( ADDRESS = (PROTOCOL = TCP) (HOST=livep1-vip5.mnl.ph.com) (PORT = 1529) )
    ( ADDRESS = (PROTOCOL = TCP) (HOST=livep1-vip6.mnl.ph.com) (PORT = 1529) )
  )
  ( CONNECT_DATA =
    (SERVER = DEDICATED) (SERVICE_NAME = livep1_app.mnl.ph.com)
  )
)


- This is somewhat similar to tnsping testd1, but this time I have multiple address lines. I want to be able to access the whole string as livep1.description, the values from the first address as livep1.description.address_list.address_01.host, livep1.description.address_list.address_01.port and so on, and the values from the second address as livep1.description.address_list.address_02.host, livep1.description.address_list.address_02.port and so on. I also want to be able to access the full address_list section as livep1.description.address_list, or the first address section as livep1.description.address_list.address_01. The ADDRESS section may sometimes have more information than is shown above, i.e. it may contain FAILOVER information specific to each ADDRESS instead of it being set once for the whole ADDRESS_LIST.

- I believe the first thing I have to do is to distinguish which format I need to parse, i.e. is it the one that does not use an ADDRESS_LIST, the one that has multiple ADDRESS_LIST sections, or the one that has a single ADDRESS_LIST that may or may not contain multiple ADDRESS sections. Then I need to write a different parsing routine for each format.
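One way to avoid a separate routine per format is to normalise after parsing instead. The sketch below is only an illustration under assumptions: it supposes the string has already been parsed into nested hashes in which a repeated key becomes an array, and it uses upper-case HOST and PORT keys for brevity (the real output mixes Host and HOST). The helper names as_list and addresses are made up for this example.

```perl
use strict;
use warnings;

# Return a node's entries as a list, whether it holds one item or an array of them.
sub as_list {
    my ($node) = @_;
    return () unless defined $node;
    return ref $node eq 'ARRAY' ? @$node : ($node);
}

# Collect every ADDRESS hash under a DESCRIPTION, whatever the layout.
sub addresses {
    my ($desc) = @_;
    my @found = as_list( $desc->{ADDRESS} );            # ADDRESS directly under DESCRIPTION
    for my $list ( as_list( $desc->{ADDRESS_LIST} ) ) { # one or several ADDRESS_LISTs
        push @found, as_list( $list->{ADDRESS} );       # one or several ADDRESSes each
    }
    return @found;
}

# Hand-written stand-ins for the three layouts (hypothetical, trimmed data).
my %testd1 = ( ADDRESS => { HOST => 'testd1.mnl.ph.com', PORT => 25666 } );
my %testp1 = ( ADDRESS_LIST => [
    { ADDRESS => { HOST => 'testp1prim.mnl.ph.com',  PORT => 10666 } },
    { ADDRESS => { HOST => 'testp1stdby.mnl.ph.com', PORT => 10666 } },
] );
my %livep1 = ( ADDRESS_LIST => { ADDRESS => [
    { HOST => 'livep1-vip1.mnl.ph.com', PORT => 1529 },
    { HOST => 'livep1-vip2.mnl.ph.com', PORT => 1529 },
] } );

for my $desc ( \%testd1, \%testp1, \%livep1 ) {
    my @addr = addresses($desc);
    print scalar(@addr), " address(es): ",
        join( ', ', map { "$_->{HOST}:$_->{PORT}" } @addr ), "\n";
}
```

With this approach the calling code never needs to know which of the three layouts it was given; it just loops over whatever addresses() returns.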

- I hope to get feedback/guidance on what I am trying to do. It would be good to get an example of parsing one of the string formats, and I will try to work out how to do the rest myself.

- Sorry again for a long post and thanks in advance.


BillKSmith
Veteran

Oct 17, 2013, 10:26 AM

Post #2 of 5 (1158 views)
Re: [newbie01.perl] - Help parsing a long string with special character

You have two problems. The first is the actual parsing. The second is storing/accessing the results of the parse. I am only going to address the second in this reply.

Your data structure design translates into Perl rather well, although the syntax is slightly different from yours. The following is a complete Perl program which you can run and/or modify. The data is your 'indented' data translated into a Perl data structure. A few items from this structure are accessed and printed. Read the documentation in perldoc perldsc for details of how it works.


Code
use strict;
use warnings;
my %test1 = (
    DESCRIPTION => {
        LOAD_BALANCE => 'off',
        FAILOVER => 'on',
        CONNECT_TIMEOUT => 5,
        TRANSPORT_CONNECT_TIMEOUT => 3,
        RETRY_COUNT => 3,
        ADDRESS_LIST => [
            {
                PROTOCOL => 'TCP',
                Host => 'testp1prim.mnl.ph.com',
                Port => 10666,
            },
            {
                PROTOCOL => 'TCP',
                Host => 'testp1stdby.mnl.ph.com',
                Port => 10666,
            },
        ],
        CONNECT_DATA => {
            SERVICE_NAME => 'testp1_app.mnl.ph.com',
        },
    },
);

# Copy of everything under DESCRIPTION
my %whole_string = %{$test1{DESCRIPTION}};

print $test1{DESCRIPTION}{LOAD_BALANCE}, "\n";
# Index 1 is the second address list
print $test1{DESCRIPTION}{ADDRESS_LIST}[1]{PROTOCOL}, "\n";


Note that the index for Perl arrays (address_list in this case) starts at zero, so the protocol for the second address is printed.
Good Luck,
Bill
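The count that the original post asks for falls out of this kind of structure directly. A minimal sketch, assuming the same layout as %test1 (trimmed here to the address lists only); the address_list_NN labels are just for display and are not part of the structure:

```perl
use strict;
use warnings;

# Trimmed copy of the structure: only the address lists are kept.
my %test1 = (
    DESCRIPTION => {
        ADDRESS_LIST => [
            { PROTOCOL => 'TCP', Host => 'testp1prim.mnl.ph.com',  Port => 10666 },
            { PROTOCOL => 'TCP', Host => 'testp1stdby.mnl.ph.com', Port => 10666 },
        ],
    },
);

my $lists = $test1{DESCRIPTION}{ADDRESS_LIST};    # reference to the array
my $count = scalar @$lists;                       # number of address lists
print "address lists: $count\n";

# Label the entries from 1 for display; the array itself is indexed from 0.
my @labels;
for my $i ( 0 .. $#$lists ) {
    push @labels, sprintf( 'address_list_%02d', $i + 1 );
    print "$labels[-1]: $lists->[$i]{Host}:$lists->[$i]{Port}\n";
}
```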


Zhris
Enthusiast

Oct 18, 2013, 3:35 PM

Post #3 of 5 (1131 views)
Post deleted by Zhris [In reply to]

 


Zhris
Enthusiast

Oct 19, 2013, 9:59 AM

Post #4 of 5 (1099 views)
Re: [newbie01.perl] - Help parsing a long string with special character

Hi,

Here I mainly focus on the actual parsing.

I'm not certain if there is a name for the format of data you are parsing, but I have namespaced it blah. There may even be a way to parse it using a module available on CPAN (I did briefly look at Text::Balanced).
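For anyone who wants to try the Text::Balanced route, here is a rough sketch of how its extract_bracketed function could split a descriptor into its top-level parenthesised groups. This is only an illustration and not the code from this post; top_level_groups is a made-up helper name, and the sample line is the testd1 output with the GLOBAL_NAME entry dropped for brevity.

```perl
use strict;
use warnings;
use Text::Balanced qw(extract_bracketed);

my $line = 'Attempting to contact (DESCRIPTION = (ADDRESS = (PROTOCOL = TCP) '
         . '(Host = testd1.mnl.ph.com) (Port = 25666)) (CONNECT_DATA = (SID = testd1)))';

# Drop everything before the first opening parenthesis.
( my $text = $line ) =~ s/^[^(]*//;

# Split a string into its top-level "(...)" groups.
sub top_level_groups {
    my ($str) = @_;
    my @groups;
    while (1) {
        # In list context: (extracted text, remainder, skipped prefix)
        my ( $group, $rest ) = extract_bracketed( $str, '()' );
        last unless defined $group && length $group;
        push @groups, $group;
        $str = $rest;
    }
    return @groups;
}

my ($descriptor) = top_level_groups($text);

# Strip "(DESCRIPTION =" and the final ")" to look one level down.
( my $inner = $descriptor ) =~ s/^\(\s*\w+\s*=\s*//;
$inner =~ s/\)\s*$//;

my @sections = top_level_groups($inner);
print "$_\n" for @sections;
```

Repeating the same strip-and-split step on each section walks down the nesting one level at a time.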

I have produced a package with three core functions:

in - this takes a string of blah and returns a Perl ref. It won't exactly produce Bill's proposed structure i.e. ADDRESS keys remain, but I believe the structure produced is more appropriate. I have tested with complex, deeply nested blah and have not had any problems.

out - this takes a Perl ref and returns a string of blah. I haven't done much testing with this, so there may be scenarios where poorly formed blah is produced, or where the two blahs are not identical after the round trip blah -> Perl ref -> blah. It's best to call this only on Perl refs that were originally generated from blah, unless you know how to build the appropriate structure yourself.

fetch - this takes a selector string and a Perl ref and returns a Perl ref. It is a very rough and ready function designed to provide a chained-selector style interface to a Perl ref; it doesn't really belong with the blah functions. There are scenarios where it would break, e.g. keys made up only of digits. It could eventually be reworked to allow you to pass in the format you desire, e.g. 'testp1.description.address_list_02.host'.

This is a prototype and not production ready, please use with care. I will continue to work on this when I have time and will eventually post the final package. But you have something to work from...


Code
#!/usr/bin/perl 
use strict;
use warnings FATAL => qw/ all /;

package blah;

sub in
{
    my ($str, $ref, $opt) = @_;

    $str ||= '';
    $ref ||= { };
    $opt->{open} ||= '(';
    $opt->{close} ||= ')';
    $opt->{equal} ||= '=';

    my $has_child = 0;
    my $val = '';

    # Consume the input one character at a time.
    while ($str =~ s/^(.)(.*)/$2/sg)
    {
        my $chr = $1;

        if ($chr eq $opt->{open})
        {
            die 'poorly formed blah' unless $str =~ m/^\s*[A-Z0-9_]+\s*\Q$opt->{equal}\E/i;

            $has_child++;

            my $key;
            # \Q...\E so that a custom separator is matched literally.
            ($key, $str) = split /\Q$opt->{equal}\E/, $str, 2;
            $key = trim($key);

            if (exists $ref->{$key})
            {
                # Repeated key: promote the value to an array and append a new slot.
                (ref $ref->{$key} eq 'ARRAY') ? (push @{$ref->{$key}}, undef) : ($ref->{$key} = [ $ref->{$key}, undef ]) ;
                ($str, $ref->{$key}->[-1]) = in($str, $ref->{$key}->[-1], $opt);
            }
            else
            {
                ($str, $ref->{$key}) = in($str, $ref->{$key}, $opt);
            }
        }
        elsif ($chr eq $opt->{close})
        {
            die 'poorly formed blah' unless $str =~ m/^\s*(?:\Q$opt->{open}\E|\Q$opt->{close}\E|\s*$)/i;

            # A section without children is a plain value.
            $ref = trim($val) unless $has_child;

            return ($str, $ref);
        }
        else
        {
            $val .= $chr;
        }
    }

    return $ref;
}

sub out
{
    my ($ref, $opt, $prev_key) = @_;

    $ref ||= { };
    $opt->{open} ||= '(';
    $opt->{close} ||= ')';
    $opt->{equal} ||= '=';
    $opt->{pretty} ||= 0; # todo indented / linespaced blah.

    my $str = '';

    if (ref $ref eq 'ARRAY')
    {
        # Repeated section: emit one (key=...) group per element.
        foreach my $val (@$ref)
        {
            # Pass $opt down so custom delimiters also apply to nested values.
            $str .= $opt->{open} . $prev_key . $opt->{equal} . out($val, $opt, $prev_key) . $opt->{close};
        }
    }
    elsif (ref $ref eq 'HASH')
    {
        while (my ($key, $val) = each %$ref)
        {
            if (ref $val eq 'ARRAY')
            {
                $str .= out($val, $opt, $key);
            }
            else
            {
                $str .= $opt->{open} . $key . $opt->{equal} . out($val, $opt, $key) . $opt->{close};
            }
        }
    }
    else
    {
        $str .= $ref;
    }

    return $str;
}

sub fetch
{
    my ($str, $ref, $opt) = @_;

    return $ref unless defined $str;

    my $sel;
    ($sel, $str) = split /\./, $str, 2;

    if ($sel =~ m/^\d+$/)
    {
        return fetch($str, $ref->[$sel], $opt);
    }
    elsif ($sel =~ m/^[A-Z0-9_]+/i)
    {
        return fetch($str, $ref->{$sel}, $opt);
    }

    return;
}

sub trim
{
    $_[0] =~ s/^\s+//;
    $_[0] =~ s/\s+$//;

    return $_[0];
}



package main;

use Data::Dumper qw/ Dumper /;

my $blah = do { local $/ = undef; <DATA> };
my $ref;

$ref = blah::in($blah);
print Dumper $ref;
$blah = blah::out($ref);
print Dumper $blah;
$ref = blah::in($blah);
print Dumper $ref;
$ref = blah::fetch('DESCRIPTION.ADDRESS_LIST.1.ADDRESS', $ref);
print Dumper $ref;
$blah = blah::out($ref);
print Dumper $blah;
$ref = blah::in($blah);
print Dumper $ref;



__DATA__
(DESCRIPTION =
(LOAD_BALANCE=off)
(FAILOVER=on)
(CONNECT_TIMEOUT=5)
(TRANSPORT_CONNECT_TIMEOUT=3)
(RETRY_COUNT=3)
(ADDRESS_LIST =
(ADDRESS =
(PROTOCOL = TCP)
(Host = testp1prim.mnl.ph.com)
(Port = 10666)
)
)
(ADDRESS_LIST =
(ADDRESS =
(PROTOCOL = TCP)
(Host = testp1stdby.mnl.ph.com)
(Port = 10666)
)
)
(CONNECT_DATA =
(SERVICE_NAME=testp1_app.mnl.ph.com)
)
)


output:

Code
$VAR1 = { 
'DESCRIPTION' => {
'LOAD_BALANCE' => 'off',
'RETRY_COUNT' => '3',
'TRANSPORT_CONNECT_TIMEOUT' => '3',
'CONNECT_DATA' => {
'SERVICE_NAME' => 'testp1_app.mnl.ph.com'
},
'ADDRESS_LIST' => [
{
'ADDRESS' => {
'PROTOCOL' => 'TCP',
'Port' => '10666',
'Host' => 'testp1prim.mnl.ph.com'
}
},
{
'ADDRESS' => {
'PROTOCOL' => 'TCP',
'Port' => '10666',
'Host' => 'testp1stdby.mnl.ph.com'
}
}
],
'FAILOVER' => 'on',
'CONNECT_TIMEOUT' => '5'
}
};
$VAR1 = '(DESCRIPTION=(LOAD_BALANCE=off)(RETRY_COUNT=3)(TRANSPORT_CONNECT_TIMEOUT=3)(CONNECT_DATA=(SERVICE_NAME=testp1_app.mnl.ph.com))(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(Port=10666)(Host=testp1prim.mnl.ph.com)))(ADDRESS_LIST=(ADDRESS=(PROTOCOL=TCP)(Port=10666)(Host=testp1stdby.mnl.ph.com)))(FAILOVER=on)(CONNECT_TIMEOUT=5))';
$VAR1 = {
'DESCRIPTION' => {
'LOAD_BALANCE' => 'off',
'RETRY_COUNT' => '3',
'TRANSPORT_CONNECT_TIMEOUT' => '3',
'ADDRESS_LIST' => [
{
'ADDRESS' => {
'PROTOCOL' => 'TCP',
'Port' => '10666',
'Host' => 'testp1prim.mnl.ph.com'
}
},
{
'ADDRESS' => {
'PROTOCOL' => 'TCP',
'Port' => '10666',
'Host' => 'testp1stdby.mnl.ph.com'
}
}
],
'CONNECT_DATA' => {
'SERVICE_NAME' => 'testp1_app.mnl.ph.com'
},
'FAILOVER' => 'on',
'CONNECT_TIMEOUT' => '5'
}
};
$VAR1 = {
'PROTOCOL' => 'TCP',
'Port' => '10666',
'Host' => 'testp1stdby.mnl.ph.com'
};
$VAR1 = '(PROTOCOL=TCP)(Port=10666)(Host=testp1stdby.mnl.ph.com)';
$VAR1 = {
'PROTOCOL' => 'TCP',
'Port' => '10666',
'Host' => 'testp1stdby.mnl.ph.com'
};


I hope this helps you to achieve your goal.

Chris
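As a rough idea of the rework mentioned for fetch, the sketch below accepts selectors in the style the original post asks for, such as 'description.address_list_02.address.host'. It is an illustration only, not part of the posted package: fetch_path is a made-up name, a trailing _NN is assumed to mean a 1-based position in a repeated section, and keys are matched case-insensitively because the tnsping output mixes Host and HOST. A real key that itself ends in an underscore and digits would be misread.

```perl
use strict;
use warnings;

# Walk a nested structure using a dotted, case-insensitive path.
sub fetch_path {
    my ( $ref, $path ) = @_;
    for my $part ( split /\./, $path ) {
        # Split "address_list_02" into a name and a 1-based position.
        my ( $name, $pos ) = $part =~ /^(.+)_(\d+)$/ ? ( $1, $2 ) : ( $part, undef );
        return unless ref $ref eq 'HASH';
        my ($key) = grep { lc $_ eq lc $name } keys %$ref;
        return unless defined $key;
        $ref = $ref->{$key};
        if ( defined $pos ) {
            # A section that occurs once behaves like a one-element list.
            $ref = [$ref] unless ref $ref eq 'ARRAY';
            return if $pos < 1 || $pos > @$ref;
            $ref = $ref->[ $pos - 1 ];
        }
    }
    return $ref;
}

# Same shape as the parsed output shown above, trimmed.
my %testp1 = (
    DESCRIPTION => {
        LOAD_BALANCE => 'off',
        ADDRESS_LIST => [
            { ADDRESS => { PROTOCOL => 'TCP', Host => 'testp1prim.mnl.ph.com',  Port => '10666' } },
            { ADDRESS => { PROTOCOL => 'TCP', Host => 'testp1stdby.mnl.ph.com', Port => '10666' } },
        ],
        CONNECT_DATA => { SERVICE_NAME => 'testp1_app.mnl.ph.com' },
    },
);

my $host    = fetch_path( \%testp1, 'description.address_list_02.address.host' );
my $service = fetch_path( \%testp1, 'description.connect_data.service_name' );
print "$host\n$service\n";
```

The leading alias (testp1, testd1, ...) is left out of the selector here; it could simply be the key of an outer hash holding one parsed structure per alias.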


(This post was edited by Zhris on Oct 19, 2013, 11:23 AM)


Zhris
Enthusiast

Oct 21, 2013, 12:23 PM

Post #5 of 5 (1051 views)
Re: [newbie01.perl] - Help parsing a long string with special character

Hi,

I have attached the latest / final version of my code to this post.

I wouldn't say it's production ready, but I have made numerous improvements and performed further testing to make sure it's more stable.

I also successfully experimented with converting the raw data to JSON first and then using the JSON module to parse it, so that's something you may wish to consider.
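The attachment is not reproduced here, so the following is not that conversion, just one possible way the idea could look. It rewrites each (KEY = ...) group as a JSON array ["KEY", ...], which keeps repeated sections such as ADDRESS_LIST in order, and decodes the result with the core JSON::PP module. The descriptor_to_json name is made up, and the sketch assumes that values contain no spaces, quotes, commas, brackets or parentheses.

```perl
use strict;
use warnings;
use JSON::PP qw(decode_json);

# Rewrite "(KEY = value)" groups as nested JSON arrays: ["KEY","value"].
sub descriptor_to_json {
    my ($text) = @_;
    $text =~ s/\s+//g;                          # assumes values contain no spaces
    $text =~ s/\(([^=()]+)=/["$1",/g;           # "(KEY="  becomes  ["KEY",
    $text =~ s/,([^()\[\]",]+)\)/,"$1")/g;      # quote the leaf values
    $text =~ tr/)/]/;                           # every ")" closes an array
    $text =~ s/\]\[/],[/g;                      # commas between sibling groups
    return $text;
}

my $descriptor = '(DESCRIPTION = (ADDRESS = (PROTOCOL = TCP) (Host = testd1.mnl.ph.com) '
               . '(Port = 25666)) (CONNECT_DATA = (SID = testd1)))';

my $json = descriptor_to_json($descriptor);
my $tree = decode_json($json);

print "$json\n";
print "top-level key: $tree->[0]\n";
print "first section: $tree->[1][0]\n";
```

Each array holds the key first, followed by either a single value or the child arrays, so a small recursive walk can turn it into whatever hash layout is preferred.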

If you do read this post, I hope my code helps you in completing your overall task.

Chris
Attachments: blah.pl (7.32 KB)

 
 

