Help Parsing a line please.

 



cuboidgraphix
User

Jan 14, 2009, 11:13 AM

Post #1 of 22
Help Parsing a line please.

Hi FishMonger,
I was wondering if you could help me parse this line. Well, actually I have hundreds of lines like this one, but they're all similar. I'm writing a script from what I've found browsing and reading on the web, since I'm learning how to parse lines, and so far I've got a little of it working. I was hoping you could help me, since the web doesn't offer much to go on.

# 90148 Nak Active "14 Jan 09 00:00:57" "O%:CBS1:BSM1:BSMPerfandUtil1" MINOR "CPU Usage exceeded the Threshold"


The variables I want out of it would be:

Code
$index = "90148"; 
$status = "Active";
$date = "2009-01-14";
$time = "00:00:57";
$source = "O%:CBS1:BSM1:BSMPerfandUtil1";
$severity = "MINOR";
$fault_name = "CPU Usage exceeded the Threshold";


Thanks for any help you see fit to offer.


KevinR
Veteran


Jan 14, 2009, 11:19 AM

Post #2 of 22
Re: [cuboidgraphix] Help Parsing a line please.

What have you tried so far?
-------------------------------------------------


cuboidgraphix
User

Jan 14, 2009, 12:03 PM

Post #3 of 22
Re: [KevinR] Help Parsing a line please.


In Reply To
What have you tried so far?



Well, I don't know if this is the best way to do it, but I think it's too long and I'm doing too much just to get these two fields.


Code
open (DATA, $ARGV[0])
    or die "Can't open '$ARGV[0]' $!";

while (my $line = <DATA>) {
    if ($line =~ /^#/) {
        $space = ' ';
        $quote = '"';

        # Index Variable

        $limit1 = index($line, $space);
        $offset = ++$limit1;

        $limit2 = index($line, $space, $offset);

        $count1 = $limit2 - $limit1;

        $index = substr($line, $limit1, $count1);

        # Status Variable

        $limit3 = ++$limit2;
        $limit4 = index($line, $space, $limit3);

        $count2 = $limit4 - $limit2;

        $status = substr($line, $limit3, $count2);
    }
}

close DATA;



Is there a shorter way of doing this?


KevinR
Veteran


Jan 14, 2009, 12:27 PM

Post #4 of 22
Re: [cuboidgraphix] Help Parsing a line please.

Post some sample lines of data inside the code tags to retain the formatting.

To me it looks like you could maybe use the split() function, but without a better understanding of the data and what the columns represent, it's hard to say.
-------------------------------------------------
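KevinR's split() hint works well for the leading, unquoted fields, which is all the index()/substr() code above is after. A minimal sketch, assuming the column order of the sample data: `split ' '` skips leading whitespace and splits on runs of whitespace, and a limit of 5 keeps everything after the Activity column in one piece. split() knows nothing about double quotes, so the quoted date/time, source and fault name still need something quote-aware.

```perl
use strict;
use warnings;

my $line = qq{# 90148 Nak Active "14 Jan 09 00:00:57" "O%:CBS1:BSM1:BSMPerfandUtil1" MINOR "CPU Usage exceeded the Threshold"\n};

# ' ' as the pattern is split's special whitespace mode: leading whitespace
# is ignored and any run of spaces/tabs separates fields.
# The limit of 5 stops splitting after the fourth field, so the quoted
# columns stay together, unsplit, in $rest.
my (undef, $index, $status, $activity, $rest) = split ' ', $line, 5;

print "Index: $index\nStatus: $status\nActivity: $activity\n";
```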


cuboidgraphix
User

Jan 14, 2009, 12:30 PM

Post #5 of 22
Re: [KevinR] Help Parsing a line please.


In Reply To
Post some sample lines of data inside the code tags to retain the formatting.

To me it looks like you could maybe use the split() function, but without a better understanding of the data and what the columns represent, it's hard to say.


Header:

Code
# Index    Status Activity Date/Time             Source    Severity FaultName


Data:

Code
# 90148 Nak    Active   "14 Jan 09 00:00:57" "O%:CBS1:BSM1:BSMPerfandUtil1" MINOR "CPU Usage exceeded the Threshold" 
# 90148 Nak Clear "14 Jan 09 00:07:59" "O%:CBS1:BSM1:BSMPerfandUtil1" MINOR "CPU Usage exceeded the Threshold"
# 90148 Ack Clear "14 Jan 09 00:07:59" "O%:CBS1:BSM1:BSMPerfandUtil1" MINOR "CPU Usage exceeded the Threshold"
# 1692545 Nak Active "14 Jan 09 01:48:32" "O%:CBS1:Cells1:MC800BTS1029:MCBTSSubsystem1:Root1:CEM2" MINOR "SWERR log threshold exceeded at CEM, please upload the log // [XCEM] SWERR Log"



(This post was edited by cuboidgraphix on Jan 14, 2009, 12:42 PM)


KevinR
Veteran


Jan 14, 2009, 2:07 PM

Post #6 of 22
Re: [cuboidgraphix] Help Parsing a line please.

It looks like you can use the core module Text::ParseWords, judging by the sample data. Here is an example of how to use it just to tokenize the lines:


Code
use Text::ParseWords;
while (<DATA>) {
    my @words = &quotewords('\s+', 0, $_);
    $i = 0;
    foreach (@words) {
        print "$i: <$_>\n";
        $i++;
    }
}

__DATA__
# 90148 Nak Active "14 Jan 09 00:00:57" "O%:CBS1:BSM1:BSMPerfandUtil1" MINOR "CPU Usage exceeded the Threshold"
# 90148 Nak Clear "14 Jan 09 00:07:59" "O%:CBS1:BSM1:BSMPerfandUtil1" MINOR "CPU Usage exceeded the Threshold"
# 90148 Ack Clear "14 Jan 09 00:07:59" "O%:CBS1:BSM1:BSMPerfandUtil1" MINOR "CPU Usage exceeded the Threshold"
# 1692545 Nak Active "14 Jan 09 01:48:32" "O%:CBS1:Cells1:MC800BTS1029:MCBTSSubsystem1:Root1:CEM2" MINOR "SWERR log threshold exceeded at CEM, please upload the log // [XCEM] SWERR Log"


Since your overall goal is unclear, I am not sure what else to suggest. Are you parsing the file line by line and doing something with the variables right away, or are you building a large data set and then doing something with the data?
-------------------------------------------------


(This post was edited by KevinR on Jan 14, 2009, 2:08 PM)


cuboidgraphix
User

Jan 14, 2009, 2:26 PM

Post #7 of 22
Re: [KevinR] Help Parsing a line please.

Sorry, I don't understand the terminology: 'tokenize'.

Well, the purpose of this parsing is to gather this info and insert it into a MySQL database.

Yes... I plan to parse line by line.

While parsing the line, I want to get the following variables out of the line.

Index
Status
Activity
(Date/Time ... I want to break it up into two variables. Like below)
Date
Time
Source
Faultname.

Maybe if I put in all my code you'll follow.


Code
#!/usr/bin/perl
# This file is the parser.pl
# This is a script that will parse data collected by collector.pl
# and insert the data into a MySQL database.

use strict;
use warnings;
use Mysql;
use POSIX qw/strftime/;

my $date = strftime("%y%m%d%H%M", localtime(time));
my $host = "localhost";
my $database = "DB";
my $tablename = "TN";
my $user = "USR";
my $pass = "PWD";

my $connect = Mysql->connect($host, $database, $user, $pass);
$connect->selectdb($database);

open (DATA, $ARGV[0])
    or die "Can't open '$ARGV[0]' $!";

while (my $line = <DATA>) {
    if ($line =~ /^#/) {
        my $space = ' ';
        my $quote = '"';

        # Index Variable
        my $limit1 = index($line, $space);
        my $offset = ++$limit1;
        my $limit2 = index($line, $space, $offset);
        my $count1 = $limit2 - $limit1;
        my $index_var = substr($line, $limit1, $count1);

        # Status Variable
        my $limit3 = ++$limit2;
        my $limit4 = index($line, $space, $limit3);
        my $count2 = $limit4 - $limit2;
        my $status_var = substr($line, $limit3, $count2);

        # Not parsed yet: declared (empty) so the script compiles under 'use strict'
        my ($activity_var, $date_var, $time_var, $source_var,
            $severity_var, $faultname_var) = ('') x 6;

        # Inserting Values into Database
        my $query = "INSERT INTO $tablename VALUES ('$index_var', '$status_var', '$activity_var',
            '$date_var', '$time_var', '$source_var',
            '$severity_var', '$faultname_var', FALSE)";

        my $execute = $connect->query($query);
    }
}

close DATA;



cuboidgraphix
User

Jan 14, 2009, 2:54 PM

Post #8 of 22
Re: [KevinR] Help Parsing a line please.


In Reply To
It looks like you can use the core module Text::ParseWords, judging by the sample data. Here is an example of how to use it just to tokenize the lines:



I'm really trying to follow your piece of code, but this part is all new to me. I guess there's a whole lot left for me to learn. :)


KevinR
Veteran


Jan 14, 2009, 3:39 PM

Post #9 of 22
Re: [cuboidgraphix] Help Parsing a line please.

The data you posted has some spaces at the very end of the lines, but I removed them to test the code I posted. If you run the code, you'll see it produces this output:

0: <#>
1: <90148>
2: <Nak>
3: <Active>
4: <14 Jan 09 00:00:57>
5: <O%:CBS1:BSM1:BSMPerfandUtil1>
6: <MINOR>
7: <CPU Usage exceeded the Threshold>
0: <#>
1: <90148>
2: <Nak>
3: <Clear>
4: <14 Jan 09 00:07:59>
5: <O%:CBS1:BSM1:BSMPerfandUtil1>
6: <MINOR>
7: <CPU Usage exceeded the Threshold>
0: <#>
1: <90148>
2: <Ack>
3: <Clear>
4: <14 Jan 09 00:07:59>
5: <O%:CBS1:BSM1:BSMPerfandUtil1>
6: <MINOR>
7: <CPU Usage exceeded the Threshold>
0: <#>
1: <1692545>
2: <Nak>
3: <Active>
4: <14 Jan 09 01:48:32>
5: <O%:CBS1:Cells1:MC800BTS1029:MCBTSSubsystem1:Root1:CEM2>
6: <MINOR>
7: <SWERR log threshold exceeded at CEM, please upload the log // [XCEM] SWERR Log>

The angle brackets <> are just there so you can see what each token is, which is especially helpful if a token is empty.

So look at the first line:

0: <#>
1: <90148>
2: <Nak>
3: <Active>
4: <14 Jan 09 00:00:57>
5: <O%:CBS1:BSM1:BSMPerfandUtil1>
6: <MINOR>
7: <CPU Usage exceeded the Threshold>

0 through 7 are the indices of the @words array, so $words[1] is 90148, which is your $index scalar. Now, you appear to want to split the date/time shown above into two separate variables and reformat the date. That can be handled by a separate function once you get the hang of using the module to parse the lines.
-------------------------------------------------
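The separate function KevinR mentions could look something like the sketch below. It is an illustration, not code from the thread: it assumes the "DD Mon YY HH:MM:SS" layout of the sample data, English month abbreviations, and that every two-digit year falls in 2000-2099. It uses a plain regex and a month table rather than a date-parsing module, and the name split_datetime is hypothetical.

```perl
use strict;
use warnings;

# Month abbreviations as they appear in the data, e.g. "14 Jan 09".
my %month_num = (
    Jan => 1, Feb => 2, Mar => 3, Apr => 4,  May => 5,  Jun => 6,
    Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Hypothetical helper: turn "14 Jan 09 00:00:57" into ("2009-01-14", "00:00:57").
# Returns an empty list if the string doesn't have the expected layout.
# Assumes two-digit years are 20xx.
sub split_datetime {
    my ($datetime) = @_;
    my ($day, $mon, $yy, $time) =
        $datetime =~ /^(\d{1,2}) ([A-Z][a-z]{2}) (\d{2}) (\d{2}:\d{2}:\d{2})$/
        or return;
    my $month = $month_num{$mon} or return;    # unknown abbreviation
    my $date  = sprintf '%04d-%02d-%02d', 2000 + $yy, $month, $day;
    return ($date, $time);
}

my ($date, $time) = split_datetime('14 Jan 09 00:00:57');
print "Date: $date\nTime: $time\n";
```

Inside the loop it would be called on the fifth token, e.g. `my ($date, $time) = split_datetime($words[4]);`.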


cuboidgraphix
User

Jan 14, 2009, 5:24 PM

Post #10 of 22
Re: [KevinR] Help Parsing a line please.

Brilliant, KevinR. Simply brilliant!!!
I follow now. Let me try and fit it into my script and see what I get.

Thanks


cuboidgraphix
User

Jan 14, 2009, 8:38 PM

Post #11 of 22
Re: [KevinR] Help Parsing a line please.

Hey KevinR,
I tried your script and modified it somewhat to fit into my script. It's working just like I wanted, including the date/time format. But I'd like you to look at it and let me know if I wrote it right and used good Perl practice, since I think it looks kind of confusing.

So let me know what you think please....


Code
#!/usr/bin/perl
# This file is the parser.pl
# This is a script that will parse data collected by collector.pl
# and insert the data into a MySQL database.

use strict;
use warnings;
use Mysql;
use Text::ParseWords;
use Date::Parse;

my $host = "localhost";
my $database = "DB";
my $tablename = "TN";
my $user = "USR";
my $pass = "PWD";

my $connect = Mysql->connect($host, $database, $user, $pass);
$connect->selectdb($database);

open(DATA, $ARGV[0])
    or die "Can't open '$ARGV[0]' $!";

while (<DATA>) {
    if ($_ =~ /^#/) {
        my @words = &quotewords('\s+', 0, $_);

        # Index Variable
        my $index = $words[1];
        print "Index:", $index, "\n";

        # Status Variable
        my $status = $words[2];
        print "Status:", $status, "\n";

        # Activity Variable
        my $activity = $words[3];
        print "Activity:", $activity, "\n";

        # Date/Time Variable
        my $datetime = $words[4];

        # Parsing Date/Time
        my ($ss,$mm,$hh,$day,$month,$year,$zone) = strptime($datetime);

        # Date Variable
        my $date = sprintf("%4d-%02d-%02d", $year+2000, $month+1, $day);
        print "Date:", $date, "\n";

        # Time Variable (%02d so that hour 0 prints as "00", not " 0")
        my $time = sprintf("%02d:%02d:%02d", $hh, $mm, $ss);
        print "Time:", $time, "\n";

        # Source Variable
        my $source = $words[5];
        print "Source:", $source, "\n";

        # Severity Variable
        my $severity = $words[6];
        print "Severity:", $severity, "\n";

        # Fault Name Variable
        my $faultname = $words[7];
        print "FaultName:", $faultname, "\n\n";

        # Inserting Values into Database
        my $query = "INSERT INTO $tablename VALUES(
            '$index', '$status', '$activity',
            '$date', '$time', '$source',
            '$severity', '$faultname', FALSE)";
        my $execute = $connect->query($query);
    }
}

close DATA;



KevinR
Veteran


Jan 14, 2009, 8:59 PM

Post #12 of 22
Re: [cuboidgraphix] Help Parsing a line please.

Your code is good. Could it be a little better? Maybe. But there are no major problems with it.
-------------------------------------------------


cuboidgraphix
User

Jan 14, 2009, 9:18 PM

Post #13 of 22
Re: [KevinR] Help Parsing a line please.


In Reply To
Your code is good. Could it be a little better? Maybe. But there are no major problems with it.


I'm always up for learning more and improving, so please let me know what could be done differently and/or improved.

Thanks.


KevinR
Veteran


Jan 14, 2009, 9:46 PM

Post #14 of 22
Re: [cuboidgraphix] Help Parsing a line please.

OK then. Use the proper quotes. Use single quotes for simple strings and reserve double quotes for strings that need interpolation of variables or escape sequences such as \n. For example, all of the strings below should be single-quoted:


Code
my $host = 'localhost';  
my $database = 'DB';
my $tablename = 'TN';
my $user = 'USR';
my $pass = 'PWD';


Using the correct quotes in Perl code is important to learn. It's also good to know when not to use them, for example:


Code
$var = "$str";


There is no need to ever quote a scalar to assign it to another scalar, yet we see people still doing that for some reason. Quotes are for creating new strings.
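A small illustration of the quoting rules above (the variable names are just for the demo). Besides being noise, wrapping a lone variable in double quotes forces it into a string, which matters as soon as the variable holds a reference:

```perl
use strict;
use warnings;

my $database = 'DB';

my $single = 'Database: $database\n';   # no interpolation: literal $database and \n
my $double = "Database: $database\n";   # $database and \n are interpolated

# Copying a scalar needs no quotes at all.
my $copy = $database;

# Quoting a reference turns it into a plain string like "ARRAY(0x...)",
# so it can no longer be dereferenced.
my $ref    = [1, 2, 3];
my $quoted = "$ref";
print ref($ref), ' vs ', (ref($quoted) || 'plain string'), "\n";
```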

In your code you do stuff like this:


Code
# Index Variable  
my $index = $words[1];
print "Index:", $index, "\n";


You already have a scalar $words[1] that has the value you need; there is no need to create a new variable, $index. I understand you want to give the scalar a meaningful name, and that is a good practice to get into. But in this case I would not create the new variable just to use it in the INSERT statement. I would just add a line of documentation to make it clear that $words[1] is the index value from the file, and so on for the other variables. But that is a judgment call, because like I said, there are no major problems with your code.
-------------------------------------------------
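A possible middle ground between using $words[1] and friends directly and creating one scalar per column is a hash slice: the names live in one list that mirrors the file header, and each value is reachable by name. This is an alternative sketch, not something suggested in the thread; the field names are taken from the header line and are otherwise arbitrary.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Field names taken from the header line of the data file.
my @fields = qw(index status activity datetime source severity faultname);

my $line = qq{# 90148 Nak Active "14 Jan 09 00:00:57" "O%:CBS1:BSM1:BSMPerfandUtil1" MINOR "CPU Usage exceeded the Threshold"};

my @words = quotewords('\s+', 0, $line);

# Hash slice: pair each field name with its token, skipping the leading '#'.
my %rec;
@rec{@fields} = @words[1 .. $#words];

print "$_: $rec{$_}\n" for @fields;
```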


KevinR
Veteran


Jan 14, 2009, 9:51 PM

Post #15 of 22
Re: [cuboidgraphix] Help Parsing a line please.

The Mysql module:

use Mysql;

is no longer recommended.


Quote
OBSOLETE SOFTWARE

As of Msql-Mysql-modules 1.19_10 M(y)sqlPerl is no longer a separate module. Instead it is emulated using the DBI drivers. You are strongly encouraged to implement new code with DBI directly. See "COMPATIBILITY NOTES" below.


http://search.cpan.org/~capttofu/DBD-mysql-3.0008/lib/Mysql.pm

There might be some suggestions regarding your database code, but I am very rusty with SQL, so I can't comment on it.
-------------------------------------------------


(This post was edited by KevinR on Jan 14, 2009, 9:53 PM)


FishMonger
Veteran / Moderator

Jan 15, 2009, 5:46 AM

Post #16 of 22
Re: [cuboidgraphix] Help Parsing a line please.

Kevin,
Good suggestion on using Text::ParseWords, I would not have thought of that, I was leaning towards using a regex.
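For comparison, the regex route might look roughly like this. It is a sketch, not FishMonger's code: it assumes every data line has exactly the layout of the samples (three bare words, a quoted date/time, a quoted source, a bare severity, a quoted fault name) and that quoted fields never contain a double quote. Text::ParseWords is more forgiving if the layout varies.

```perl
use strict;
use warnings;

my $line = qq{# 1692545 Nak Active "14 Jan 09 01:48:32" "O%:CBS1:Cells1:MC800BTS1029:MCBTSSubsystem1:Root1:CEM2" MINOR "SWERR log threshold exceeded at CEM, please upload the log // [XCEM] SWERR Log"};

# /x allows whitespace and comments in the pattern; the literal '#' must
# then be escaped. The date/time is captured in two parts.
my ($index, $status, $activity, $date, $time,
    $source, $severity, $faultname) =
    $line =~ m{
        ^ \# \s+
        (\d+) \s+ (\S+) \s+ (\S+) \s+        # index, status, activity
        "([^"]*?) \s+ (\d\d:\d\d:\d\d)" \s+  # quoted date and time
        "([^"]*)" \s+                        # source
        (\S+) \s+                            # severity
        "([^"]*)"                            # fault name
    }x
    or die "Unrecognised line: $line\n";

print "$index | $date | $time | $severity | $faultname\n";
```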


cuboidgraphix,
I have a few additional suggestions.

Add error handling on @ARGV which produces a usage statement if the required parameter is missing.

Don't use DATA as your filehandle. DATA is a special built-in filehandle that is used to inline your data like Kevin did in his post. Instead you should be using a lexical var for the filehandle.

Drop the & from the quotewords(...) subroutine call. It's not needed and has side effects which you should learn about.

Instead of using the @words array, assign your vars directly in that same call to quotewords(...).

Use the DBI module as already recommended by Kevin and do the prepare statement prior to the while loop. This will increase efficiency.
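To see the side effect of the `&`: calling a sub as `&name;` (ampersand, no parentheses) does not give it an empty argument list; it hands over the caller's current @_. With an explicit argument list, as in `&quotewords(...)`, the remaining difference is that `&` bypasses any prototype the sub declares. The sub names below are made up for the demonstration.

```perl
use strict;
use warnings;

# Demo sub: reports how many arguments it received.
sub show_args { return scalar @_ }

sub caller_sub {
    # '&name;' without parentheses silently reuses this sub's own @_ ...
    my $with_amp    = &show_args;
    # ... while a normal call passes only the arguments written.
    my $without_amp = show_args();
    return ($with_amp, $without_amp);
}

my ($with, $without) = caller_sub('a', 'b', 'c');
print "&show_args; saw $with args, show_args() saw $without\n";
```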

Here's an updated version of a portion of the script which addresses most of those recommendations and a couple that I haven't mentioned.

Code
my $file = $ARGV[0] || die "Usage: $0 <filename>\n";
open my $FH, '<', $file or die "Can't open '$file' $!";

my $sth = $dbh->prepare("INSERT INTO $tablename VALUES (?,?,?,?,?,?,?,?,FALSE)");
while (<$FH>) {

    next if $_ !~ /^#/;

    my (undef, $index, $status, $activity, $datetime, $source, $severity, $faultname) = quotewords('\s+', 0, $_);
    my ($time) = $datetime =~ /(\S+)$/;
    my (undef,undef,undef,$day,$month,$year) = strptime($datetime);
    my $date = sprintf("%4d-%02d-%02d",$year+2000,$month+1,$day);

    $sth->execute($index, $status, $activity, $date, $time, $source, $severity, $faultname);
}



(This post was edited by FishMonger on Jan 15, 2009, 5:55 AM)
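Putting the pieces together, here is a self-contained sketch of the per-line parsing that can be tried without a database. Assumptions: the data layout of the samples, two-digit years in 2000-2099, and a regex-plus-month-table date conversion in place of Date::Parse's strptime (a swap, not what the snippet above uses). The in-memory filehandle only stands in for the real file, and parse_record is a hypothetical name; in the real script its return values would go to $sth->execute.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

my %month_num = (
    Jan => 1, Feb => 2, Mar => 3, Apr => 4,  May => 5,  Jun => 6,
    Jul => 7, Aug => 8, Sep => 9, Oct => 10, Nov => 11, Dec => 12,
);

# Hypothetical helper: returns the eight INSERT values for a data line,
# or an empty list for anything else (header, blank or malformed lines).
sub parse_record {
    my ($line) = @_;
    return unless $line =~ /^#/;
    $line =~ s/\s+\z//;    # trailing spaces/newline would give an extra empty field

    my (undef, $index, $status, $activity, $datetime,
        $source, $severity, $faultname) = quotewords('\s+', 0, $line);
    return unless defined $faultname;    # too few fields, or unbalanced quotes

    # "14 Jan 09 00:00:57" -> date 2009-01-14 (assumes 20xx), time 00:00:57
    my ($day, $mon, $yy, $time) =
        $datetime =~ /^(\d{1,2}) ([A-Z][a-z]{2}) (\d{2}) (\d{2}:\d{2}:\d{2})$/
        or return;                        # e.g. the "Date/Time" header line
    my $month = $month_num{$mon} or return;
    my $date  = sprintf '%04d-%02d-%02d', 2000 + $yy, $month, $day;

    return ($index, $status, $activity, $date, $time, $source, $severity, $faultname);
}

my $sample = <<'END';
# Index    Status Activity Date/Time             Source    Severity FaultName
# 90148 Nak    Active   "14 Jan 09 00:00:57" "O%:CBS1:BSM1:BSMPerfandUtil1" MINOR "CPU Usage exceeded the Threshold"
# 90148 Ack Clear "14 Jan 09 00:07:59" "O%:CBS1:BSM1:BSMPerfandUtil1" MINOR "CPU Usage exceeded the Threshold"
END

# A lexical filehandle opened on a string stands in for the real data file.
open my $fh, '<', \$sample or die "Can't open in-memory data: $!";
my @rows;
while (my $line = <$fh>) {
    my @values = parse_record($line) or next;
    push @rows, \@values;    # real script: $sth->execute(@values);
}
close $fh;

print join(' | ', @$_), "\n" for @rows;
```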


cuboidgraphix
User

Jan 15, 2009, 6:22 AM

Post #17 of 22
Re: [FishMonger] Help Parsing a line please.

Wow!!! ... Thanks guys, I really appreciate you all passing your knowledge like this.. I feel so fortunate to have you both as my GO TO guys.


Thanks very much for the suggestions.. and I WILL take your advice and change my script around.

Thanks again.


cuboidgraphix
User

Jan 15, 2009, 7:27 AM

Post #18 of 22
Re: [FishMonger] Help Parsing a line please.

OK guys ... a lil help please?

The following is my updated code.


Code
#!/usr/bin/perl 
# This file is the parser.pl
# This is a script that will parse data collected by collector.pl
# and insert the data into a MySQL database.

use strict;
use warnings;
use DBI;
use DBD::mysql;
use Text::ParseWords;
use Date::Parse;

my $host = 'localhost';
my $database = 'DB';
my $tablename = 'TN';
my $user = 'USR';
my $pass = 'PWD';

# Data source name
my $dsn = "dbi:mysql:$database:$host:3306";

# Perl DBI Connect
my $dbh = DBI->connect($dsn, $user, $pass) or die "Can't connect: $DBI::errstr\n";

my $file = $ARGV[0] || die "Usage: $0 <filename>\n";
open my $FH, '<', $file or die "Can't open '$file' $!";

my $sth = $dbh->prepare("INSERT INTO $tablename VALUES (?,?,?,?,?,?,?,?,FALSE)");
while (<$FH>) {

next if $_ !~ /^#/;

my (undef, $index, $status, $activity, $datetime, $source, $severity, $faultname) = quotewords('\s+', 0, $_);
my ($time) = $datetime =~ /(\S+)$/;
my (undef,undef,undef,$day,$month,$year) = strptime($datetime);
my $date = sprintf("%4d-%02d-%02d",$year+2000,$month+1,$day);

$sth->execute($index, $status, $activity, $date, $time, $source, $severity, $faultname);
}
close $FH;


It works and the data gets inserted into the database. The problem is that I get these error messages:


Code
DBD::mysql::st execute failed: Duplicate entry '90149-Clear-2009-01-15-00:07:51' for key 1 at /home/bsm/parser.pl line 37, <$FH> line 6. 
DBD::mysql::st execute failed: Duplicate entry '1903042-Clear-2009-01-15-00:10:32' for key 1 at /home/bsm/parser.pl line 37, <$FH> line 7.
DBD::mysql::st execute failed: Duplicate entry '1753818-Clear-2009-01-15-01:50:32' for key 1 at /home/bsm/parser.pl line 37, <$FH> line 10.
DBD::mysql::st execute failed: Duplicate entry '1825718-Clear-2009-01-15-05:43:15' for key 1 at /home/bsm/parser.pl line 37, <$FH> line 16.
DBD::mysql::st execute failed: Duplicate entry '1825719-Clear-2009-01-15-05:43:31' for key 1 at /home/bsm/parser.pl line 37, <$FH> line 17.
DBD::mysql::st execute failed: Duplicate entry '1692552-Clear-2009-01-15-06:40:31' for key 1 at /home/bsm/parser.pl line 37, <$FH> line 20.
DBD::mysql::st execute failed: Duplicate entry '1852117-Clear-2009-01-15-07:44:02' for key 1 at /home/bsm/parser.pl line 37, <$FH> line 24.
DBD::mysql::st execute failed: Duplicate entry '1852118-Clear-2009-01-15-07:44:03' for key 1 at /home/bsm/parser.pl line 37, <$FH> line 25.
DBD::mysql::st execute failed: Duplicate entry '1852119-Clear-2009-01-15-07:44:03' for key 1 at /home/bsm/parser.pl line 37, <$FH> line 26.



Why is this? Am I missing something?

Thanks.


FishMonger
Veteran / Moderator

Jan 15, 2009, 7:53 AM

Post #19 of 22
Re: [cuboidgraphix] Help Parsing a line please.

I assume that the index, activity, date and time columns are set up as the primary key in your db; those are the values that show up in the "Duplicate entry" messages.

The error message is telling you that you already have an entry in the database that matches the primary key values that you're trying to insert.

Have you checked your file to see if there are any duplicate lines?

You could use replace instead of insert.
http://dev.mysql.com/doc/refman/5.0/en/replace.html


cuboidgraphix
User

Jan 15, 2009, 7:58 AM

Post #20 of 22
Re: [FishMonger] Help Parsing a line please.

Hi Fish,
That is absolutely right. I checked the file and I did find duplicate lines. My question is: if there are duplicate lines, could they eventually mess up the database when I try to insert them? I ask because I don't like to see error messages like those.


FishMonger
Veteran / Moderator

Jan 15, 2009, 8:29 AM

Post #21 of 22
Re: [cuboidgraphix] Help Parsing a line please.

Duplicate lines in your file won't affect the db, because the INSERT statement will simply fail and the original entry will remain intact.

You have a few options.
1) Leave it as is and ignore those error messages.
2) Suppress the messages (for example by turning off DBI's PrintError attribute), which is not a good idea.
3) Use REPLACE instead of INSERT. If a row with the same key already exists, REPLACE deletes it and inserts the new row; otherwise it does a plain insert.
4) Pre-parse your file and remove the duplicates.
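Option 4 could be sketched like this. One caveat, inferred from the sample data and the error messages rather than stated in the thread: lines such as `# 90148 Nak Clear ...` and `# 90148 Ack Clear ...` are not identical, yet they share the index, activity and date/time that appear in the "Duplicate entry" messages, so removing only identical lines may not be enough. The sketch therefore keys on those fields. Which of the colliding lines to keep (here, the first) is a judgment call, and unique_by_key is a hypothetical name.

```perl
use strict;
use warnings;
use Text::ParseWords qw(quotewords);

# Keep the first line for each (index, activity, date/time) combination,
# preserving the order of the surviving lines.
sub unique_by_key {
    my %seen;
    my @kept;
    for my $line (@_) {
        (my $clean = $line) =~ s/\s+\z//;
        my (undef, $index, undef, $activity, $datetime) = quotewords('\s+', 0, $clean);
        next unless defined $datetime;          # skip blank/unparseable lines
        next if $seen{"$index|$activity|$datetime"}++;
        push @kept, $line;
    }
    return @kept;
}

my @lines = (
    qq{# 90148 Nak Clear "14 Jan 09 00:07:59" "O%:CBS1:BSM1:BSMPerfandUtil1" MINOR "CPU Usage exceeded the Threshold"\n},
    qq{# 90148 Ack Clear "14 Jan 09 00:07:59" "O%:CBS1:BSM1:BSMPerfandUtil1" MINOR "CPU Usage exceeded the Threshold"\n},
    qq{# 90148 Nak Active "14 Jan 09 00:00:57" "O%:CBS1:BSM1:BSMPerfandUtil1" MINOR "CPU Usage exceeded the Threshold"\n},
);

my @unique = unique_by_key(@lines);
print scalar(@unique), " of ", scalar(@lines), " lines kept\n";
```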


cuboidgraphix
User

Jan 15, 2009, 8:43 AM

Post #22 of 22
Re: [FishMonger] Help Parsing a line please.

Cool.. Thanks for the info Fish.

I think I'll leave it as is, because there's more coming to this script. Next I need to query the database and send an SMS through Python. That's the reason for the last value, FALSE.

After the messages have been sent, those rows will be updated to TRUE. That's why I can't use REPLACE.

So thanks again guys... I appreciate all the help so far.

 
 

