Parsing Text File

 



ozi
Novice

Dec 14, 2008, 1:41 PM

Post #1 of 18 (2011 views)
Parsing Text File

I have several text files, and I need to parse certain content out of each one.

Here is a partial example of one of the text files (I have starred out sensitive information):


Code
---  
- Name: Client Name
- Contact:
- - Addresses
- - ************
- - Phone_numbers
- - ************
- ************
- - Email_addresses
- - ************
- -
Note 3060148:
- Author: ************
- Written: ************
- About: ************
- Body: |-
<h1>Lead</h1>

************
- Note 2909448:
- Author: ************
- Written: ************
- About: ************
- Body: |-
<h1>Pre-Paid Hours</h1>

N/A
- Note 2909446:
- Author: ************
- Written: ************
- About: ************
- Body: |+
<h1>Domain Names</h1>

************
************
************

************
Admin: ************
Pass: ************

Whois INFO:

Administrative Contact :
************
************

Technical Contact :
************
************


Record expires on ************
- Note 2909443:
- Author: ************
- Written: ************
- About: ************
- Body: |-
<h1>Web Hosting</h1>
************

************

FTP Access
************
************
************
************

************
************
************
************


What I need is to grab the data for each Note that is after the Body tag. Each file is going to have a different amount of data as well. Oh, and I also need to have the client name within this new data set.

My end goal is to be able to have ... for example all of the Web Hosting information from each file within one delimited text file.
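A minimal sketch of what one line of that delimited file could look like, assuming one record per note made of client name, heading, and body joined by a pipe. The helper name `to_record` and the choice to flatten line breaks into a literal `\n` are illustrative assumptions, not something the post specifies.

```perl
use strict;
use warnings;

# Hypothetical helper: build one pipe-delimited line from a client name,
# a note heading and the note's body text.
sub to_record {
    my ($client, $heading, $body) = @_;
    my @fields = ($client, $heading, $body);
    for my $field (@fields) {
        $field =~ s/\r?\n/\\n/g;    # keep each record on a single line
        $field =~ s/\|/\\|/g;       # escape the delimiter inside the data
    }
    return join('|', @fields);
}

print to_record('Client Name', 'Web Hosting', "FTP Access\nhost\nuser"), "\n";
```

Escaping the delimiter matters here because passwords and free-form notes may themselves contain a pipe character.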

I'm not new to Perl, but I am inexperienced when it comes to parsing this type of text file.

Any help would be appreciated and I hope the above made sense.

Thank you in advance.


KevinR
Veteran


Dec 14, 2008, 10:12 PM

Post #2 of 18 (2002 views)
Re: [ozi] Parsing Text File

What have you tried so far?
-------------------------------------------------


ozi
Novice

Dec 14, 2008, 10:43 PM

Post #3 of 18 (2000 views)
Re: [KevinR] Parsing Text File

I have tried a couple of different things. I started out trying to do this with just one file:


Code
open (CLIENTS,"$ppath/contacts/client name.txt");

while( $line = <CLIENTS>) {

    $line =~ s/\s+$//;
    $line =~ s/-//g;
    $line =~ s/- //g;
    $line =~ s/[\n\r]//mg;
    $line =~ s/\+//g;
    $line =~ s/Body: //g;
    $line =~ s/Name: //g;
    $line =~ s/ Phone_numbers//;
    $line =~ s/ Email_addresses//;
    $line =~ s/\<h1>//g;
    $line =~ s/ <br>//g;
    $line =~ s/Addresses.*//g;
    $line =~ s/Author:.*//g;
    $line =~ s/Contact://g;
    $line =~ s/Note.*//g;
    $line =~ s/Written.*//g;
    $line =~ s/About.*//g;
    $line =~ s/^\s+/|/;
    $line =~ s/<br>//g;
    $line =~ s/\r//gs;


    @list = ( $line );
    %temp;
    @list = grep { ++$temp{$_} < 2 } @list;

    foreach $rec (@list){
        $rec =~ s/^\s+/|/;
        print "$rec";

        open (DATABASE6,">>contacts/client name.txt");
        print DATABASE6 "$rec";
        close (DATABASE6);
    }
}
close (CLIENTS);


But all this does is parse out some unwanted stuff and delimit the rest of the data.


So then I tried this ....


Code
opendir(INFILE, "$ppath/contacts/") || die ("Unable to open directory");
@files2 = grep !/^\./, readdir(INFILE);
closedir(INFILE);

foreach $textfile (sort @files2) {

    open (CLIENTS,"$ppath/contacts/$textfile");

    while( $line = <CLIENTS>) {

        $line =~ s/Body: //g;
        $line =~ s/-//g;
        $line =~ s/"//g;
        $line =~ s/\\n//g;
        $line =~ s/\\//g;
        $line =~ s/\s+$//;
        $line =~ s/^\s+//;

        if ($line =~ /<h1>.*/ ) {
            $line =~ s/\<h1>//g;

            @list = ($line);
            %temp;
            @list = grep { ++$temp{$_} < 2 } @list;

            open (DATABASE6,">>headers.txt");
            print DATABASE6 "$line";
            close (DATABASE6);

            foreach $message (@list) {
                print "$message<br>";
            }

        }
    }
    close (CLIENTS);
}


This gives me the headers that I need, with any duplicate header names grepped out, but I cannot figure out how to grab the data after each heading until the next Note appears.

I also forgot to mention in my original post that wherever there is a note... the about line also includes the Client Name which also needs to be parsed into the same extracted information.
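One way to collect everything after a heading until the next Note is to remember the current heading while reading line by line. A minimal sketch, assuming every note body starts with an `<h1>...</h1>` line and every note starts with a line containing `Note`, digits, and a colon, as in the sample above. The function name `collect_sections` is hypothetical, and the inline sample stands in for a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: return a hash of heading => array ref of body lines.
sub collect_sections {
    my (@lines) = @_;
    my (%section, $heading);
    for my $line (@lines) {
        $line =~ s/\s+$//;                      # drop trailing whitespace and newline
        if ($line =~ /Note \d+:/) {             # a new note begins: stop collecting
            undef $heading;
        }
        elsif ($line =~ m{<h1>(.*?)</h1>}) {    # heading line opens a new section
            $heading = $1;
            $section{$heading} ||= [];
        }
        elsif (defined $heading and length $line) {
            push @{ $section{$heading} }, $line;    # body line under the heading
        }
    }
    return %section;
}

my @sample = (
    "- Note 1:\n", "- Author: a\n", "- Body: |-\n", "<h1>Web Hosting</h1>\n",
    "FTP Access\n", "host\n",
    "- Note 2:\n", "- Body: |-\n", "<h1>Lead</h1>\n", "N/A\n",
);
my %section = collect_sections(@sample);
print "$_: @{ $section{$_} }\n" for sort keys %section;
```

Because the hash is keyed on the heading, each heading appears only once, which also takes care of duplicate header names.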


KevinR
Veteran


Dec 14, 2008, 11:03 PM

Post #4 of 18 (1999 views)
Re: [ozi] Parsing Text File

See if this helps you get started. It creates a hash of arrays that uses the Note XXXXXX values as the hash keys, so those values must be unique. I used Data::Dumper as a convenience to print the data.


Code
use strict;
use warnings;
use Data::Dumper;
my %data = ();
open (IN, 'yourfile') or die "$!";
LINE: while( my $line = <IN>) {
    if ($line =~ /\- (Note \d+):/) {
        my $note = $1;
        <IN>,<IN>,<IN>;
        while ($line = <IN>) {
            redo LINE if ($line =~ /\- Note \d+:/);
            push @{$data{$note}},$line;
        }
    }
}
print Dumper \%data;
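
To read such a hash of arrays back out without Data::Dumper, each value has to be dereferenced as an array. A small sketch with made-up data in the same shape (the note labels and lines are placeholders, not real output):

```perl
use strict;
use warnings;

# Same shape as %data above: note label => reference to an array of lines.
my %notes = (
    'Note 1' => [ "- Body: |-\n", "<h1>Lead</h1>\n", "N/A\n" ],
    'Note 2' => [ "- Body: |-\n", "<h1>Web Hosting</h1>\n", "FTP Access\n" ],
);

for my $note (sort keys %notes) {
    my @lines = @{ $notes{$note} };          # dereference the array
    printf "%s has %d lines\n", $note, scalar @lines;
    print "    $_" for @lines;               # lines still end in "\n"
}
```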

-------------------------------------------------


FishMonger
Veteran / Moderator

Dec 15, 2008, 9:24 AM

Post #5 of 18 (1987 views)
Re: [ozi] Parsing Text File

I'm not sure how much of that data you want to extract, but see if this is close.


Code
#!/usr/bin/perl

use strict;
use warnings;
use Data::Dumper;

my (%data, $client, $note);
my $datafile = 'filename';

open my $DATAFILE, '<', $datafile or die "can't open '$datafile' $!";
while( my $line = <$DATAFILE>) {

    if ( $line =~ /^\s+- Body: (.*\n)/ ) {
        $data{$client}{$note}{body} .= $1;
        while ( $line = <$DATAFILE> ) {
            if ( $line =~ /\s+Note (\d+):/ ) {
                $note = $1;
                last;
            }
            $data{$client}{$note}{body} .= $line;
        }
        next;
    }

    $client = $1 if $line =~ /^- Name: (.+)/;
    $note = $1 if $line =~ /\s+Note (\d+):/;

    if ( $note and $line =~ /^\s+- (\w+):\s*(.+)/ ) {
        $data{$client}{$note}{$1} = $2;
    }

}
print Dumper \%data;



(This post was edited by FishMonger on Dec 15, 2008, 9:26 AM)


ozi
Novice

Dec 15, 2008, 9:47 AM

Post #6 of 18 (1981 views)
Re: [FishMonger] Parsing Text File

OK, FishMonger's is closer. Kevin's wouldn't show all the notes within a given text file, probably because not every note has the - in front of it.

This script however will still show all the notes within a given file. I need to parse out each individual note and only show one note at a time. If I have to have a different script for each note that is fine too.

I have to have this script perform this function for 361 files and the notes for each file are not in the same order.
But for each file I need to parse out for example just the "Web Hosting" information or just the "Content Management System" etc.
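
A sketch of pulling out just one kind of note from a structure shaped like the one FishMonger's script builds (client, then note number, then a `body` field). The function name `bodies_for` and the sample data are hypothetical, and it assumes the heading appears in the body as `<h1>...</h1>`.

```perl
use strict;
use warnings;

# Hypothetical helper: return [client, body] pairs for notes whose <h1> heading matches.
sub bodies_for {
    my ($data, $wanted) = @_;
    my @found;
    for my $client (sort keys %$data) {
        for my $note (sort keys %{ $data->{$client} }) {
            my $body = $data->{$client}{$note}{body};
            next unless defined $body;
            next unless $body =~ m{<h1>\Q$wanted\E</h1>}i;   # \Q..\E quotes the heading text
            push @found, [ $client, $body ];
        }
    }
    return @found;
}

my %data = (
    'Client A' => {
        2909443 => { body => "|-\n<h1>Web Hosting</h1>\nFTP Access\n" },
        2909448 => { body => "|-\n<h1>Pre-Paid Hours</h1>\nN/A\n" },
    },
    'Client B' => {
        3000001 => { body => "|-\n<h1>Web Hosting</h1>\nhost b\n" },
    },
);

for my $hit (bodies_for(\%data, 'Web Hosting')) {
    my ($client, $body) = @$hit;
    print "== $client ==\n$body";
}
```

Since the lookup goes by heading rather than by position, the order of the notes inside each file does not matter.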

And thank you to both of you for helping in this.


KevinR
Veteran


Dec 15, 2008, 10:10 AM

Post #7 of 18 (1977 views)
Re: [ozi] Parsing Text File

I see I left out the client name but that is easily added. But like my post said....

"See if this helps you get started"

The code should be considered more educational than functional.

Regards,
Kevin
-------------------------------------------------


ozi
Novice

Dec 15, 2008, 11:18 AM

Post #8 of 18 (1975 views)
Re: [KevinR] Parsing Text File

I understand and I want to learn this too. Unfortunately I'm in a time crunch and wish I knew how to parse out the hash. But I guess I'll just go research.

Thanks anyways and Happy Holidays.


ozi
Novice

Dec 15, 2008, 1:07 PM

Post #9 of 18 (1969 views)
Re: [KevinR] Parsing Text File

OK, I have tried:


Code
my (@data, $sample, %sample); 
@data = ( \%data );
$sample{%data} = @data;
print $sample{%data};


This returns a value of 1
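
The 1 is most likely an element count: assigning an array to a single hash element puts the array in scalar context, which yields its length, and `@data` holds exactly one element (the reference to `%data`). A minimal illustration with placeholder data:

```perl
use strict;
use warnings;

my %data = ( 'Client Name' => { 2909443 => { body => "<h1>Web Hosting</h1>\n" } } );
my @data = ( \%data );          # a list holding ONE element: a reference to %data

my $count = @data;              # array in scalar context gives the element count
print "count: $count\n";        # count: 1

my ($first) = @data;            # list assignment copies the element itself
print "type: ", ref($first), "\n";   # type: HASH
```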



Then I tried

Code
my (@data, $sample, %sample); 
@data = ( \%data );

foreach my $content(@data){
    if ($content =~ /<h1>Content Management System.*/){
        print "$content\n";
    }
}


Which did not return anything.
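
Nothing matches here because `$content` is a reference to the whole hash, not text; used as a string it looks like `HASH(0x...)`. A sketch of matching against the stored text instead, assuming the nested client/note/body layout from FishMonger's script and placeholder data:

```perl
use strict;
use warnings;

my %data = (
    'Client Name' => {
        111 => { body => "<h1>Content Management System</h1>\nlogin details\n" },
        222 => { body => "<h1>Lead</h1>\nN/A\n" },
    },
);

my $ref = \%data;
print "as a string: $ref\n";    # something like HASH(0x55d0c8a3f2b0)

# Dereference down to the body text before applying the regex.
my @matches;
for my $notes (values %$ref) {              # per-client hash of notes
    for my $note (values %$notes) {         # per-note hash of fields
        push @matches, $note->{body}
            if $note->{body} =~ /<h1>Content Management System/;
    }
}
print @matches;
```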


Then ....

Code
my (@array, @temp); 

@array = keys(%data);
foreach my $content(@array){
    print "testing my $content";
}


Which printed out
testing my Client Name


ozi
Novice

Dec 16, 2008, 11:11 AM

Post #10 of 18 (1936 views)
Re: [KevinR] Parsing Text File

OK, I'm stumped. I just can't seem to figure this out, probably because I'm trying to hurry, as this needs to get done ASAP. Here is my latest attempt at parsing this information.


Code
  
use strict;
use warnings;
use Data::Dumper;
my %data = ();
open (IN, 'clientname.txt') or die "$!";
LINE: while( my $line = <IN>) {
    if ($line =~ /\- (Note \d+):/) {
        my $note = $1;
        <IN>,<IN>,<IN>;
        while ($line = <IN>) {
            redo LINE if ($line =~ /\- Note \d+:/);
            push @{$data{$note}},$line;
        }
    }
}
print Dumper \%data;

my (@data, $sample, %sample, $key);

@data = (Dumper \%data);
$sample{'key'} = \@data;

foreach $key (sort keys %data) {
    print "1. $key: <br>2. $data{$key}<br />";
}


This prints the client name for 1. and a hash reference for 2.

So then I tried this .....


Code
#snipped out code ...
my (@data, $sample, %sample, $key);

@data = (Dumper \%data);
$sample{'key'} = \@data;

foreach $key (\%data) {
    if ($key =~ /<h1>Content Management System.*/){

        print "1. $key: <br>2. $data{$key}<br />";
    }
}


Which returns nothing.

Please, please help. I don't even know if I'm getting close or if I'm way off base here.

Another question for you...and just a curious question right now. But if I were to ask you to do this and I pay you for it, how much would it cost and how long would it take?

Thanks.


KevinR
Veteran


Dec 16, 2008, 2:13 PM

Post #11 of 18 (1932 views)
Re: [ozi] Parsing Text File

I don't have the time to help until tonight (my time). So I'll check back later.
-------------------------------------------------


ozi
Novice

Dec 17, 2008, 10:16 AM

Post #12 of 18 (1920 views)
Re: [KevinR] Parsing Text File

Ok, I look forward to your reply.


KevinR
Veteran


Dec 17, 2008, 10:43 AM

Post #13 of 18 (1918 views)
Re: [ozi] Parsing Text File

Well, you have taken the question and the code FishMonger posted (not credited to him) to another forum. So you may as well stick with the other forum or check back here and maybe FishMonger will give it another try.
-------------------------------------------------


ozi
Novice

Dec 17, 2008, 10:56 AM

Post #14 of 18 (1917 views)
Re: [KevinR] Parsing Text File

I apologize for that, I will go back into that forum and credit him accordingly.

I'm desperate here and was hoping to get someone to help. This is the first time I've posted questions with respect to Perl and I guess I just got carried away and wasn't thinking clearly.

Is there any way I can talk you or FishMonger into helping me again?

Or point me in a direction to look?


KevinR
Veteran


Dec 17, 2008, 11:11 AM

Post #15 of 18 (1914 views)
Re: [ozi] Parsing Text File

It's quite hard to help with such a limited and censored data file example. I realize some of the data might be sensitive, but text file parsing is very dependent on the text, and when you can't see all the text it's really impossible to think of a strategy to parse the data into records. Writing code by guess and assumption is rarely useful, which explains why the code I previously posted did not work well.
-------------------------------------------------


(This post was edited by KevinR on Dec 17, 2008, 11:12 AM)


ozi
Novice

Dec 17, 2008, 11:20 AM

Post #16 of 18 (1908 views)
Re: [KevinR] Parsing Text File

I understand what you're saying. But the data contains information such as FTP usernames and passwords, registrar usernames and passwords, email names and passwords, etc. So you can see why I'm hesitant to give that information out.

Within each file this information is going to be different anyway.

What can I do to help in this process?

Thanks!!


KevinR
Veteran


Dec 17, 2008, 12:53 PM

Post #17 of 18 (1901 views)
Re: [ozi] Parsing Text File

See the other forum; I post there as KevinADC.
-------------------------------------------------


ozi
Novice

Dec 17, 2008, 5:55 PM

Post #18 of 18 (1894 views)
Re: [KevinR] Parsing Text File

Thank you Kevin

 
 

