modify data between two files

 



jayspry
Novice

Apr 30, 2002, 11:46 PM

Post #1 of 13 (2778 views)

I have a project that uses two files. Both are text. One has fields delimited by / and the other by :.

The first file would have a format of:

NAME/field2/field3/field4/

POINT/field2/field3/field4/field5/field6/field7/

NAME/field2/field3/field4/

POINT/field2/field3/field4/field5/field6/field7/

POINT/field2/field3/field4/field5/field6/field7/

POINT/field2/field3/field4/field5/field6/field7/

etc.



The second file has a format of:

UNIT:field2:field3:field4:field5:field6:field7

UNIT:field2:field3:field4:field5:field6:field7

UNIT:field2:field3:field4:field5:field6:field7

etc.





I'm very new to Perl, and a friend suggested that I use it instead of an Excel macro for this, since some of the locations where I'll be using the files will not have Excel available but will have Perl. I'd like to give it a try.

What I need to do is concatenate information from the second file into the lines in the first file that begin with NAME. For example, I need to check whether NAME in the first file matches a UNIT in the second file and, if it does, concatenate NAME[field3] with UNIT[field6]. I need to output a new file that contains the concatenated NAME lines and all of the POINT lines (nothing happens to them), in their original order. I'm not sure whether I need to read both files into arrays (I'm having a hard time understanding how arrays are set up), or read only the second file into an array and then read the first file one line at a time, changing the NAME lines as I go and writing out each line.

Any ideas to point me in the right direction would be greatly appreciated.



Thank you,



Jay Spry


rGeoffrey
User / Moderator

May 1, 2002, 10:07 AM

Post #2 of 13 (2774 views)
Re: [jayspry] modify data between two files

Since you are writing out file one with modifications, it makes sense to read file two first and hold on to its data, then read file one exactly once and print each line as you get it. This would not be a good plan if file two were so large that holding it in memory caused problems. Here is a script that does it...


Code
#!/usr/local/bin/perl

use strict;

my $unitfile = 'source.txt';
my $startfile = 'start.txt';
my $endfile = 'end.txt';

# Read file two into a hash of arrays keyed by the first field.
my %units;
open (SOURCE, $unitfile) or die "could not read from '$unitfile', $!";
while (my $line = <SOURCE>) {
    chomp $line;
    my @line = split (':', $line);
    $units{$line[0]} = \@line;
}
close SOURCE;

# Copy file one to the new file, changing only the lines whose first field
# is a key in %units.
open (IN, $startfile) or die "could not read from '$startfile', $!";
open (OUT, ">$endfile") or die "could not write to '$endfile', $!";
while (my $line = <IN>) {
    chomp $line;
    my @line = split ('/', $line);

    if (exists ($units{$line[0]})) {
        $line[2] .= $units{$line[0]}[5];
        print OUT join ('/', @line), "\n";
    } else {
        print OUT $line, "\n";
    }
}
close IN;
close OUT;


We start by reading the file you want to steal from. For each line we chomp the newline off the end and then split it into an array of the separate fields. Then we use the first value as a key in a hash of arrays so we can find the data again later. It is important to note that I declare @line inside the while loop so that each value in the hash refers to a different array. If you accidentally declare @line outside the loop, then every entry in the hash will hold a reference to the exact same array, and that array will hold the values from the last pass through the while loop.
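A minimal sketch of that pitfall, using two made-up rows in place of source.txt. The names @rows, @shared, %bad and %good are hypothetical and exist only for this illustration.

```perl
use strict;
use warnings;

my @rows = ('Flintstone:Fred', 'Jetson:George');

# Wrong: @shared is declared once, so every hash value refers to the same array.
my (%bad, @shared);
for my $row (@rows) {
    @shared = split /:/, $row;
    $bad{$shared[0]} = \@shared;
}

# Right: a fresh @fields on each pass gives every key its own array.
my %good;
for my $row (@rows) {
    my @fields = split /:/, $row;
    $good{$fields[0]} = \@fields;
}

print "bad:  $bad{Flintstone}[1]\n";    # George (overwritten by the last row)
print "good: $good{Flintstone}[1]\n";   # Fred
```

Both loops store a reference with the backslash operator; the only difference is where the array is declared.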

Then we read each line in the main file. For each line we again chomp off the newline and split it into pieces. Then we check to see if the first value exists as a key in our hash %units. If it does then we know we will have to do the merge, otherwise we just print the line and tack the newline back onto the end.

If we do need to mangle this line, we append to the third field on the line ($line[2]) the value from the sixth position ([5]) of the array we stored earlier in the hash. Then we join the parts back together for printing and remember to reattach the newline.

For those interested in perl golf, my code has huge bits just begging to be cut out, most notably the variable $line.
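As a rough idea of what dropping $line could look like (a compact rewrite, nowhere near real golf), here is a sketch that works on lists of lines instead of files. The helper name merge_lines is hypothetical, and the -1 limit on split is an addition that is not in the script above: it keeps the trailing empty field so the closing "/" survives the join.

```perl
use strict;
use warnings;

# Hypothetical helper: takes the lines of file two and file one as array
# references and returns the merged lines, using $_ instead of $line.
sub merge_lines {
    my ($unit_lines, $main_lines) = @_;

    # Key each ':'-delimited record by its first field.
    my %units = map { my @f = split /:/; ($f[0] => \@f) } @$unit_lines;

    return map {
        my @f = split m{/}, $_, -1;    # -1 keeps the trailing empty field
        $f[2] .= $units{$f[0]}[5] if exists $units{$f[0]};
        join '/', @f;
    } @$main_lines;
}

my @out = merge_lines(
    ['Flintstone:Fred:Wilma:Pebbles:Bedrock:Dino:Quary'],
    [
        'Flintstone/field2/field3/field4/',
        'POINT/field2/field3/field4/field5/field6/field7/',
    ],
);
print "$_\n" for @out;
```

Because split with a -1 limit followed by join rebuilds a line exactly, unmatched lines pass through unchanged without a separate else branch.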

Below are the data files I tested with...

where source.txt (file two) is


Code
Flintstone:Fred:Wilma:Pebbles:Bedrock:Dino:Quary 
Jetson:George:Jane:Judy:Future:Astro:Spacely Sprockets
Simpson:Homer:Marge:Bart:Springfield:Santa's Little Helper:Power Plant


and start.txt (file one) is


Code
Flintstone/field2/field3/field4/  
POINT/field2/field3/field4/field5/field6/field7/
Simpson/field2/field3/field4/
POINT/field2/field3/field4/field5/field6/field7/
POINT/field2/field3/field4/field5/field6/field7/
POINT/field2/field3/field4/field5/field6/field7/


we get an end.txt (new file) of


Code
Flintstone/field2/field3Dino/field4/  
POINT/field2/field3/field4/field5/field6/field7/
Simpson/field2/field3Santa's Little Helper/field4/
POINT/field2/field3/field4/field5/field6/field7/
POINT/field2/field3/field4/field5/field6/field7/
POINT/field2/field3/field4/field5/field6/field7/



jayspry
Novice

May 1, 2002, 2:33 PM

Post #3 of 13 (2772 views)
Re: [rGeoffrey] modify data between two files

Geoffrey,



Thank you very much for the reply. This will get me started very well. I know I did not think of any of the points you did. I think I can stumble through getting the other parts of the script done; it was this one item that was stumping me. Where would I find your Perl golf?



Thanks again,



Jay


jayspry
Novice

May 1, 2002, 2:39 PM

Post #4 of 13 (2770 views)
Re: [jayspry] modify data between two files

Geoffrey,



Forget my question about Perl golf. I looked at the rest of the site and found it.



Thanks again for the help.



Jay


jayspry
Novice

May 1, 2002, 7:13 PM

Post #5 of 13 (2768 views)
Re: [jayspry] modify data between two files

Geoffrey,



One last thing I forgot. Based on your script, how would I make changes to the POINT lines depending on whether the NAME line had been modified? In other words, there is data in source.txt that would need to be added (to, say, POINT[5]) only if the NAME line for that POINT record had been modified from source.txt. If there was no match for the NAME line in source.txt, then it and the following POINT lines would be output as they originally were. The key is that if the NAME line matches source.txt, modifications take place on both the NAME line and any following POINT lines (the same UNIT[7] value would be used for POINT[5] on every POINT line under that changed NAME line).



Using your output sample:

Flintstone/field2/field3Dino/field4

POINT/field2/field3/field4/change/field6/field7/

NAME/field2/field3/field4 - no match in the source.txt file

POINT/field2/field3/field4/field5/field6/field7/

I assume I need some way of writing out POINT lines with the modifications until a new NAME line is read.



Thanks,



Jay


Paul
Enthusiast

May 2, 2002, 4:43 AM

Post #6 of 13 (2762 views)
Re: [rGeoffrey] modify data between two files

Is this your URL?

http://www.justanotherperlhacker.org/

...if so, you've been hacked, unless it is supposed to be a play on words relating to "perl hacker"?


jayspry
Novice

May 2, 2002, 8:28 PM

Post #7 of 13 (2757 views)
Re: [RedRum] modify data between two files

Paul,



Never heard of it.



Jay Spry


jayspry
Novice

May 2, 2002, 8:43 PM

Post #8 of 13 (2755 views)
Re: [RedRum] modify data between two files

Paul,



Guess you were talking to Geoffrey! I've got to read the heading. I thought you were asking me if the URL was mine.



Jay Spry


jayspry
Novice

May 2, 2002, 9:30 PM

Post #9 of 13 (2754 views)
Re: [rGeoffrey] modify data between two files

Geoffrey,



I just had an idea. Could I set up a type of switch? The switch would be set to 'yes' if the NAME line is changed, and I would use that when reading the next POINT line to make the changes to it. If the NAME line is not changed, the switch would be set to 'no', and the next POINT line would be output without changes.



Jay Spry


rGeoffrey
User / Moderator

May 2, 2002, 10:32 PM

Post #10 of 13 (2752 views)
Re: [jayspry] modify data between two files

A flag would be a good idea. But rather than setting it to 'yes' and 'no', it would be better to set it to 1 or 0 because that makes the if statements easier. One possibility looks something like...


Code
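# Declare before the while loop: my ($flag, $unit);
if ($line[0] eq 'POINT') {
    if ($flag) {
        # $unit is the source.txt record saved by the matching NAME line;
        # $units{$line[0]} would look up 'POINT' here and find nothing.
        $line[2] .= $unit->[5];
        print OUT join ('/', @line), "\n";
    } else {
        print OUT $line, "\n";
    }
} else {
    if (exists ($units{$line[0]})) {
        $unit = $units{$line[0]};
        $line[2] .= $unit->[5];
        $flag = 1;
        print OUT join ('/', @line), "\n";
    } else {
        $flag = 0;
        print OUT $line, "\n";
    }
}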


I still don't know if it will work exactly as is though. Will the first value on the POINT lines be 'POINT'? If not, how do you plan on telling the two types of lines apart? One choice would be to switch the first line to


Code
if (scalar (@line) == 7) {
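For something that can be run without any files, here is a minimal self-contained sketch of the same idea on in-memory sample lines. It makes a few assumptions that are not from the thread: the variable $match stores the matched record itself and doubles as the flag (undef means "no match"), POINT field 5 is replaced with UNIT field 7 as one reading of post #5, and the "Rubble" line is invented to show the no-match case.

```perl
use strict;
use warnings;

# Sample data in the thread's formats; the Rubble line is made up.
my @source = ('Flintstone:Fred:Wilma:Pebbles:Bedrock:Dino:Quary');
my @start  = (
    'Flintstone/field2/field3/field4/',
    'POINT/field2/field3/field4/field5/field6/field7/',
    'Rubble/field2/field3/field4/',
    'POINT/field2/field3/field4/field5/field6/field7/',
);

my %units;
for my $row (@source) {
    my @fields = split /:/, $row;
    $units{$fields[0]} = \@fields;
}

my $match;    # record for the current NAME line, or undef if it had no match
my @out;
for my $line (@start) {
    my @line = split m{/}, $line, -1;    # -1 keeps the trailing empty field
    if ($line[0] eq 'POINT') {
        # Assumption: POINT field 5 takes the value of UNIT field 7.
        $line[4] = $match->[6] if $match;
    } else {
        $match = $units{$line[0]};       # undef here switches the "flag" off
        $line[2] .= $match->[5] if $match;
    }
    push @out, join '/', @line;
}
print "$_\n" for @out;
```

Keeping the record in $match means the POINT branch never has to look anything up by 'POINT', which is the key that does not exist in %units.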


On the other topic, JustAnotherPerlHacker.org is mine and it did have an unwanted visitor. Fortunately they only added one line to the top of the front page and I fixed the damage. Now I just have to figure out what went wrong. And someday I will get around to doing something interesting with the domain.


jayspry
Novice

May 3, 2002, 3:52 AM

Post #11 of 13 (2747 views)
Re: [rGeoffrey] modify data between two files

Geoffrey,



Yes the POINT line will begin with POINT and instead of a line starting with NAME, it will start with TRACK. So, you would have lines like

TRACK,field2,field3,field4,field5,field6,field7

POINT,field2,field3,field4,field5,field6,field7.......

POINT,field2,field3,field4,field5,field6,field7.......

.

.

.

.

TRACK,field2,field3,field4,field5,field6,field7

POINT,field2,field3,field4,field5,field6,field7.......

POINT,field2,field3,field4,field5,field6,field7.......

.

.

.

And so on: TRACK and then a series of POINTs, followed by another TRACK and its series of POINTs, etc. TRACK tells what the track is, and the POINT lines are the actual data for that TRACK. The other text file (a csv file) contains information that will modify the TRACK and POINT records if there is a match in the csv file. If there is no match, then the TRACK and the associated POINT records are simply output to the new file as they originally were.



Jay


jayspry
Novice

May 3, 2002, 3:57 AM

Post #12 of 13 (2746 views)
Re: [rGeoffrey] modify data between two files

I should have said that the match between the TRACK file and the csv file is based on the TRACK line (like how you set up the NAME example before, only the match is on fields 2 and 3 of the TRACK line). Just to make things a little clearer, I hope.



Thanks for all your help,



Jay
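One way to match on two fields at once is a single hash keyed on both values joined together. The sketch below is only an illustration under loud assumptions: the thread never says which csv columns correspond to TRACK fields 2 and 3, so it assumes csv columns 1 and 2, and the sample rows and the name %by_pair are invented.

```perl
use strict;
use warnings;

# Assumed layout (hypothetical): csv columns 1 and 2 hold the values that
# appear in TRACK fields 2 and 3.
my @csv   = ('A1,North,extra', 'B2,South,other');
my @track = ('TRACK,A1,North,f4', 'TRACK,A1,South,f4');

# One hash keyed on both columns, joined with a separator that should not
# occur in the data.
my %by_pair;
for my $row (@csv) {
    my @f = split /,/, $row, -1;
    $by_pair{ join "\0", @f[0, 1] } = \@f;
}

my @result;
for my $line (@track) {
    my @f   = split /,/, $line, -1;
    my $hit = $by_pair{ join "\0", @f[1, 2] };
    push @result, $hit ? "match: $hit->[2]" : 'no match';
}
print "$track[$_] => $result[$_]\n" for 0 .. $#track;
```

A combined key only matches when both fields come from the same csv row. Two separate hashes, one per field, would also accept a TRACK line whose field 2 matches one row and whose field 3 matches a different row, as the second sample line here would.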


jayspry
Novice

May 5, 2002, 8:37 PM

Post #13 of 13 (2732 views)
Re: [rGeoffrey] modify data between two files

Well, here is my attempt at the core problem. I have a problem with finding the 'source.txt' and 'start.txt' files. I have looked at DIR and other things, but when I run the script, Perl does not find those files. They are on a Windows 2000 machine in the directory c:/p/D, and the script is in c:/p. I'm not seeing the right way of setting the PATH or changing directories or something. I thought I could use

my $dir = 'c:/p/D';

but it did not work. Thanks for your help.



Jay Spry



#!/usr/bin/perl -w

use strict;

my $cvsfile = 'source.txt';
my $wamfile = 'start.txt';
my $newwamfile = 'wam.txt';

my %cvs;        # csv records keyed by their first field
my %cvs2;       # csv records keyed by their third field
my $flag = 0;   # 1 while the current TRACK matched the csv file
my $rec;        # csv record for the current TRACK

open (SOURCE, $cvsfile) or die "Could not read from '$cvsfile', $!";
while (my $line = <SOURCE>) {
    chomp $line;
    my @line = split (':', $line);
    $cvs{$line[0]} = \@line;
    $cvs2{$line[2]} = \@line;
}
close SOURCE;

open (IN, $wamfile) or die "Could not read from '$wamfile', $!";
open (OUT, ">$newwamfile") or die "Could not write to '$newwamfile', $!";
while (my $line = <IN>) {
    chomp $line;
    my @line = split ('/', $line);
    if ($line[0] eq 'TRACK') {
        # Assumption: TRACK fields 2 and 3 ($line[1], $line[2]) are the lookup
        # keys. $line[0] is always 'TRACK' here, so it cannot be the key.
        if (exists ($cvs{$line[1]}) and exists ($cvs2{$line[2]})) {
            $rec = $cvs{$line[1]};
            $line[1] = 'TRU';
            $line[2] = $rec->[2] . " " . $rec->[3];
            $line[4] = $rec->[5] if $rec->[5] ne "";
            $line[5] = $line[1] . $rec->[1];
            $line[6] = $rec->[2];
            $line[7] = $rec->[6];
            $flag = 1;
            print OUT join ('/', @line), "\n";
        } else {
            # No match: write the TRACK line unchanged and reset the flag.
            $flag = 0;
            print OUT $line, "\n";
        }
    } elsif ($line[0] eq 'POINT') {
        if ($flag == 1) {
            $line[18] = $rec->[7];
            $line[20] = $rec->[8];
            print OUT join ('/', @line), "\n";
        } else {
            print OUT $line, "\n";
        }
    }
}
close IN;
close OUT;
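On the file-location question in the post above: a relative name such as 'source.txt' passed to open is looked up in the current working directory of the running process, which is not necessarily where the script or the data lives. A variable such as $dir has no effect unless it is made part of each file name, or the script changes directory first. A minimal sketch, assuming the c:/p/D directory from the post and using File::Spec from the standard distribution:

```perl
use strict;
use warnings;
use File::Spec;

# Directory taken from the post; adjust to wherever the data files are.
my $dir = 'c:/p/D';

# Join the directory onto each file name so open() gets a full path.
my $cvsfile    = File::Spec->catfile($dir, 'source.txt');
my $wamfile    = File::Spec->catfile($dir, 'start.txt');
my $newwamfile = File::Spec->catfile($dir, 'wam.txt');

print "$_\n" for $cvsfile, $wamfile, $newwamfile;

# Alternative: keep the bare file names and change directory first.
# chdir($dir) or die "could not chdir to '$dir', $!";
```

Either approach should work; building full paths leaves the process's working directory alone, which matters if the script opens files elsewhere.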

 
 

