Parsing text and obtaining value

 



stevieger
Novice

Dec 14, 2012, 8:15 AM

Post #1 of 13 (2123 views)
Parsing text and obtaining value

I have a large, disorderly text file. I want to locate a specific string in it, such as "Time:", and then extract the value that comes right after that string.

So for example if somewhere in a messy text file there is this "Time: 12:00:00" I want a variable equal to 12:00:00. I know it comes directly after the string "Time:". How can I do this?

I understand how to match and locate the string, but I'm unsure how to reference the data that comes after it and put it into a variable.

My code so far:

Code
use strict; 
use warnings;

open FILE, "text.txt" or die "error opening file: $!";

while (my $line = <FILE>) {
if($line=~/Time:(.*?) $/){
print "$1\n";
}
}

 
I know there is some error in this. I am a newbie and was just experimenting with regex pattern matching, trying to get it to work.

Any help is appreciated.


Laurent_R
Veteran / Moderator

Dec 14, 2012, 8:38 AM

Post #2 of 13 (2122 views)
Re: [stevieger] Parsing text and obtaining value

If your line contains something like "Time: 12:00:00", then you can do something like this:


Code
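my $captured_value; # declared separately: "my ... if ..." has undefined behaviour
$captured_value = $1 if /Time: (\d\d:\d\d:\d\d)/; # matches against $_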


Or, if the hour may have only one digit (such as 8:00:00), then change it to:


Code
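my $captured_value; # declared separately: "my ... if ..." has undefined behaviour
$captured_value = $1 if /Time: (\d?\d:\d\d:\d\d)/; # matches against $_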
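Put together with a read loop, the idea might look like the sketch below. It reads from an in-memory string instead of text.txt so that it runs on its own; the sample lines and the helper name find_times are made up for illustration, not taken from the real file.

```perl
use strict;
use warnings;

# Made-up sample content standing in for text.txt.
my $text = <<'END';
some unrelated header line
noise noise Time: 12:00:00 more noise
Time: 8:15:30
END

# Hypothetical helper: returns every time value found in the given text.
sub find_times {
    my ($content) = @_;
    my @times;

    # Opening a reference to a scalar gives an in-memory filehandle.
    open my $fh, '<', \$content or die "error opening in-memory file: $!";
    while (my $line = <$fh>) {
        # In list context a successful match returns its captures.
        if (my ($time) = $line =~ /Time: (\d?\d:\d\d:\d\d)/) {
            push @times, $time;
        }
    }
    close $fh;
    return @times;
}

print "$_\n" for find_times($text);
```

With a real file, the open line would take the file name instead of the scalar reference; the loop body stays the same.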



stevieger
Novice

Dec 14, 2012, 11:08 AM

Post #3 of 13 (2113 views)
Re: [Laurent_R] Parsing text and obtaining value

So I used the concept and am able to extract the data into its own variable, which is what I want. Now could someone explain how I would do this for multiple items within the text file?

For instance, after I capture the value for "Time:" I want to then capture the value for "Date:", which comes after "Time:" in the file.

It seems the loop quits after the first if statement, and even if I put in another one, it doesn't execute.


Code
use strict; 
use warnings;

my $captured_value;
my $captured_value2;

open FILE, "text.txt" or die "error opening file: $!";

while (my $line = <FILE>) {
if ($line =~/Time:(\d\d:\d\d:\d\d)/){
$captured_value = $1;
print $captured_value;
}
if($line =~/Date:(\d\d\d\d\d\d\d\d\d\d)/){
$captured_value2 = $2;
print $captured_value2;
}
}


My goal is to pull multiple values out of a text file report and assign each of them to a separate variable.

Thanks.


FishMonger
Veteran / Moderator

Dec 14, 2012, 12:19 PM

Post #4 of 13 (2110 views)
Re: [stevieger] Parsing text and obtaining value

Please post 4 or 5 lines from your text file and explain what portions of each line you need to extract.


stevieger
Novice

Dec 14, 2012, 12:35 PM

Post #5 of 13 (2108 views)
Re: [FishMonger] Parsing text and obtaining value

Below is a sample of my text file. This is how it looks in WordPad; it does not look as friendly in Notepad. I figured out how to get the date and time into variables, so I'm OK on that front. I need to capture the values from the Measured column, JUST those numbers. Any ideas on how to do that?


Code
Date:09/12/2012                                  Time:04:53 

+-------------------------------------------+--------------------+-----------+----------+
|                                           |       limits       |           |          |
|                                           +---------+----------+           |          |
|                                           | minimum | maximum  | In Limits | Measured |
+===========================================+=========+==========+===========+==========+
|             Percentage [%]                |    0.00 |     0.50 | Not ok    |     0.78 |
|             Average value                 |  880.00 | 10000.00 | OK        |   977.65 |



Laurent_R
Veteran / Moderator

Dec 14, 2012, 1:55 PM

Post #6 of 13 (2103 views)
Re: [stevieger] Parsing text and obtaining value

Hi,

Since the date and time are on the same line, you can capture both elements in one shot with something like this:


Code
my ($date, $time); # declared separately: "my ... if ..." has undefined behaviour
($date, $time) = ($1, $2) if m{Date: (\d?\d/\d\d/\d\d) .*Time: (\d?\d:\d\d:\d\d)};


Here, $1 captures the first expression in parentheses in the regex (i.e. the date), and $2 captures the second expression in parentheses. And $1 and $2 are then copied into the $date and $time variables.

I changed the regex delimiters to {} instead of //, because that lets you use / characters (for the date) inside the regex without having to escape each one.
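To make the numbering of $1 and $2 concrete, here is a small sketch. The sample line is made up to fit the format this regex expects, and the second match (pulling the seconds out of the time) is only there to illustrate that every successful match starts counting its groups at $1 again. So a separate regex with a single pair of parentheses fills $1, never $2.

```perl
use strict;
use warnings;

# Made-up line in the format the regex expects.
my $line = "Date: 9/12/12 some filler text Time: 4:53:07";

my ($date, $time);
if ($line =~ m{Date: (\d?\d/\d\d/\d\d) .*Time: (\d?\d:\d\d:\d\d)}) {
    ($date, $time) = ($1, $2);   # two groups in one regex: $1 and $2
}

# A separate regex with a single group fills $1 again, never $2.
my $seconds;
if ($time =~ /:(\d\d)$/) {
    $seconds = $1;
}

print "date=$date time=$time seconds=$seconds\n";
```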

This might still be a little bit cryptic for you, but I hope you start to get a sense of these constructs.


stevieger
Novice

Dec 14, 2012, 5:27 PM

Post #7 of 13 (2099 views)
Re: [Laurent_R] Parsing text and obtaining value

I get the constructs from reading online about how to use regex, so I understand the code.

In the text file from my last post there is a table with some numbers in it. What I'm trying to do is get the last number in each row of the table, the one in the Measured column. I'm having a hard time matching and getting that number with these techniques. Are there any other parsing techniques I could use to pull it out?


Laurent_R
Veteran / Moderator

Dec 15, 2012, 1:34 AM

Post #8 of 13 (2096 views)
Re: [stevieger] Parsing text and obtaining value

With data such as:


Code
|             Percentage [%]                |    0.00 |     0.50 | Not ok    |     0.78 |  
|             Average value                 |  880.00 | 10000.00 | OK        |   977.65 |


if you need to capture only the last value, you can do a number of things.

Assuming your line is in the $line variable, you could split the input on the '|' character and get the last element of the resulting array:


Code
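(my $row = $line) =~ s/\s*\|\s*$//; # copies the line, dropping the closing '|' and anything after it
my @fields = split /\|/, $row; # splits the row on '|'
my $value = pop @fields; # gets the last element of the resulting array
$value =~ s/^\s+|\s+$//g; # removes leading and trailing spaces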


You could also use a regular expression directly:


Code
my $value; # declared separately: "my ... if ..." has undefined behaviour
$value = $1 if $line =~ /([\d.]+)\s*\|\s*$/;


(The regex looks for a group of digits and dots, followed by optional spaces, followed by |, followed by optional spaces, the whole thing being at the end of the string.)
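If every Measured value has to end up in its own slot, one option is to collect them in a hash keyed by the row label. This is only a sketch under assumptions: the hash name %measured and the test for what counts as a data row are my own choices, and the border line is shortened for the example.

```perl
use strict;
use warnings;

# Rows from the sample table, plus a border and a header row to show skipping.
my @lines = (
    "+======+=====+\n",
    "| | minimum | maximum | In Limits | Measured |\n",
    "| Percentage [%] | 0.00 | 0.50 | Not ok | 0.78 |\n",
    "| Average value | 880.00 | 10000.00 | OK | 977.65 |\n",
);

my %measured;   # row label => value from the Measured column
for my $line (@lines) {
    next unless $line =~ /^\|/;            # skip borders and other text
    my @fields = split /\s*\|\s*/, $line;  # split on '|' and trim around it
    shift @fields;                         # drop the empty field before the first '|'
    my ($label, $value) = @fields[0, -1];  # first and last column
    # Keep only rows whose last column is a plain number (skips header rows).
    next unless defined $value && $value =~ /^\d+(?:\.\d+)?$/;
    $measured{$label} = $value;
}

print "$_ => $measured{$_}\n" for sort keys %measured;
```

A hash avoids having to invent one variable name per row, and new rows in the report are picked up without changing the code.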


BillKSmith
Veteran

Dec 15, 2012, 6:59 AM

Post #9 of 13 (2093 views)
Re: [stevieger] Parsing text and obtaining value

Are you sure that your file is a text file? If WordPad can format it nicely, it may be a Word file. If so, it would be better to use a module to parse it.

In any case, Notepad displays the raw content of the file. That raw content is what you need to look at when writing the regex.
Good Luck,
Bill
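If it helps, the raw content can also be inspected from Perl itself. The sketch below uses the core Data::Dumper module with its Useqq option, which prints strings with escapes so that tabs and carriage returns become visible. The sample content is made up (I do not know what the real file contains), and it is read from an in-memory string so the example runs on its own.

```perl
use strict;
use warnings;
use Data::Dumper;

# Show strings double-quoted with escapes, so tabs, carriage returns
# and other non-printing characters become visible.
$Data::Dumper::Useqq  = 1;
$Data::Dumper::Terse  = 1;   # print just the value, without "$VAR1 = "
$Data::Dumper::Indent = 0;   # keep each value on a single line

# Made-up sample standing in for the real file (note the tab and the CRLF).
my $content = "Date:09/12/2012\tTime:04:53 \r\n| OK | 977.65 |\r\n";

open my $fh, '<', \$content or die "error opening in-memory file: $!";
my @raw = map { Dumper($_) } <$fh>;   # one dumped string per input line
close $fh;

print "$_\n" for @raw;
```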


stevieger
Novice

Dec 15, 2012, 11:02 AM

Post #10 of 13 (2081 views)
Re: [BillKSmith] Parsing text and obtaining value

Yeah, WordPad displays it nicely, but when opened in Notepad it is much messier.


BillKSmith
Veteran

Dec 15, 2012, 11:20 AM

Post #11 of 13 (2078 views)
Re: [stevieger] Parsing text and obtaining value

Please attach the file. We must be certain of its format before we can help you.
Good Luck,
Bill


omega
Novice

Dec 15, 2012, 11:26 AM

Post #12 of 13 (2077 views)
Re: [stevieger] Parsing text and obtaining value

Please rewrite the code as follows:

Code
while ( my $line = <FILE> ) { 
if ($line =~ /Time:(.*) /) {
print "$1\n";
}
}


This uses a space at the end of the pattern, which after debugging "could" be replaced with '?'. I'm no expert; it just seems to be a logic issue of where your brackets are placed and what Perl is looking for.
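How the trailing space interacts with the quantifier can be checked with a small experiment. This is only a sketch: the sample line is made up, and the \S+ variant is an alternative I am adding for comparison, not something from the code above.

```perl
use strict;
use warnings;

# Made-up line with extra text after the time value.
my $line = "Time:04:53 and more text \n";

my ($greedy) = $line =~ /Time:(.*) /;     # .* gives back only as far as the LAST space
my ($lazy)   = $line =~ /Time:(.*?) /;    # .*? stops at the FIRST space
my ($token)  = $line =~ /Time:\s*(\S+)/;  # \S+ takes the run of non-space characters

print "greedy: '$greedy'\n";
print "lazy:   '$lazy'\n";
print "token:  '$token'\n";
```

On a line where the time is the last thing before a single trailing space, all three give the same result; they only differ when more text follows.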


Laurent_R
Veteran / Moderator

Dec 15, 2012, 2:41 PM

Post #13 of 13 (2068 views)
Re: [stevieger] Parsing text and obtaining value

Just another point from rereading the thread.

Your original message stated:


Quote
So for example if somewhere in a messy text file there is this "Time: 12:00:00" I want a variable equal to 12:00:00



But in the sample file you posted later, the time format was different: "12:00", not "12:00:00". Quite obviously, the regex I gave you would match "12:00:00", but would not match "12:00". So, in brief, you might have to adapt the regex I posted in accordance with the true format of your time stamps.
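As a rough sketch of such an adaptation, the version below makes the seconds optional, allows optional whitespace after the colons, and expects a four-digit year, which is what the posted sample line shows. I am assuming the rest of the file follows that one sample line; the variable name $re is just for illustration.

```perl
use strict;
use warnings;

# The Date/Time line from the sample file posted earlier in the thread.
my $line = "Date:09/12/2012                                  Time:04:53 \n";

# (?::\d\d)? makes the ":ss" part optional, so both 04:53 and 12:00:00 match.
my $re = qr{Date:\s*(\d?\d/\d\d/\d{4}).*Time:\s*(\d?\d:\d\d(?::\d\d)?)};

my ($date, $time);
if ($line =~ $re) {
    ($date, $time) = ($1, $2);
}

print "date=$date time=$time\n";
```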

 
 

