Multiple matching

 



fds
Novice

Dec 1, 2010, 9:29 AM

Post #1 of 14 (4054 views)
Multiple matching

Hi, how do I match multiple lines in a text file to records in a flatfile database?

This is what I have so far. If I set $mts to 2 it shows the first 2 lines (good).

When I read the data and compare the matching field (the 4th one) so I can print all the records with the same numbers, only one record gets printed.

Code
print "Content-type: text/html\n\n"; 
$mfl = "months.txt";
$mts = "2"; # could be many

=months.txt contents
200812
200809
200804
200708
200701
=cut

print "Show these $mts months worth<br>";

# get the months
open (MNF,"$mfl") || die("Cannot open $mfl");
@mfl = <MNF>;
close (MNF);
@lines;
for ($i = 0; $i < $mts; $i++) {$mon = <@mfl>; push @lines, $mon;}
foreach $mon (@lines) {print "$mon<br>\n";} # Good I got them

print "---------------------------<br>";

### grab records from main db
while ($rec = <DATA>) {
@fd=split /\|/, "$rec";
if ($fd[3] == $mon){
print "$rec<br>\n"; ## only shows one
}
}

__DATA__
1|The Weight|Aaa|200701|
2|Honky Tonk Women|200701|
3|Puking in the Seine|Ccc|200708|
4|Money For Nothing|200804|
5|The Pusher|Woo|200809|
6|Jailhouse Rock|Foo|200812|


Hope someone can help please.
----------------------------------------------
fdsaadsfdsaf


rovf
Veteran

Dec 2, 2010, 1:16 AM

Post #2 of 14 (4049 views)
Re: [Ted] Multiple matching


Quote
how do I match multiple lines in a text file to records in a flatfile database?


This is a bit unclear. Do you mean: You have a set of keys in a text file, and a set of records in a flatfile database, and for each key you want to retrieve the record which matches the key?


fds
Novice

Dec 2, 2010, 1:57 AM

Post #3 of 14 (4046 views)
Re: [rovf] Multiple matching

Yes.

The code only partially works as it only shows one key's records.

I really need to see it get the records for all the keys that are set.

I am assuming (because I just plain don't know) that this must entail some other kind of looping until all the keys' records have been obtained. Even that's a bit of a guess. I've never done this before.
----------------------------------------------
fdsaadsfdsaf


rovf
Veteran

Dec 2, 2010, 3:55 AM

Post #4 of 14 (4043 views)
Re: [Ted] Multiple matching

There are a couple of things I don't understand in your code:

(1) The

use strict; use warnings;

is missing. IMO it rarely makes sense to discuss code unless you have these enabled.

(2) You read first the file into an array, and then process the array row by row. Is there a reason why you don't process the file line by line in the first place?

(3) You have a statement which just evaluates a variable:

@lines;

What is this supposed to do?

(4) You use a funny construct,

$mon = <@mfl>

I was not aware that it is even legal to use an array variable inside <...>. May I ask you what this is doing?


fds
Novice

Dec 2, 2010, 4:38 AM

Post #5 of 14 (4041 views)
Re: [rovf] Multiple matching

Hi, rovf. thanks for coming back on this.


Quote
The "use strict; use warnings;" is missing. IMO it rarely makes sense to discuss code unless you have these enabled.

I don't usually use 'strict', but tried it. That totally blew out everything (errors all over the place), so I put "my" in front of a whole bunch of variables and arrays until the errors stopped, and now there are no results at all.



Quote
You read first the file into an array, and then process the array row by row. Is there a reason why you don't process the file line by line in the first place?


Not sure what you mean. Do you mean the months file? So that the first line read would trigger a pass through the database to get its results, and then the process would repeat for every month up to the number set ($mts)?



Quote
You have a statement which just evaluates a variable: @lines;

Turns out that is not needed as it is defined in the push.


Quote
You use a funny construct, $mon = <@mfl>

Don't know, but it works as it's referring to array mfl from when the file was opened. No errors came up.


I realize I'm not very good at a lot of this and have learned what I know in Perl by trial and error. If something works, I keep it and reuse it in scripts. If not, I dump it (which is why I don't normally use strict, as every time I do it trashes stuff). Me "getting there" is a slow process :-), so sorry if it's a bit confusing.
----------------------------------------------
fdsaadsfdsaf


FishMonger
Veteran / Moderator

Dec 2, 2010, 6:19 AM

Post #6 of 14 (4036 views)
Re: [Ted] Multiple matching

Your decision not to always use the strict and warnings pragmas is probably the main reason why you're not progressing as quickly as you should. Those pragmas help to point out lots of coding mistakes and will aid in learning how to write better quality code.

Don't put quotes around single vars.
See: 'perldoc -q quoting'

You should be using the 3 arg form of open and a lexical var for the filehandle instead of the bareword and the die statement should include the reason it failed.

You're using the wrong data structure for storing the "months". You should be using a hash instead of the array.

Code
my %month;  
open my $fh, '<', $mfl or die "Cannot open $mfl $!";
while ( my $month = <$fh> ) {
    chomp $month;
    $month{$month}++;
}
close $fh;


If you do plan on using the less efficient array approach, then this should be cleaned up.

Code
@lines;  
for ($i = 0; $i < $mts; $i++) {$mon = <@mfl>; push @lines, $mon;}


A cleaner and less verbose syntax is:

Code
my @lines = @mfl[0..$mts-1];


Your DATA section has an inconsistent number of fields. Some lines have 3 fields, others have 4 fields. Is that what you really want?


rovf
Veteran

Dec 2, 2010, 6:32 AM

Post #7 of 14 (4033 views)
Re: [Ted] Multiple matching


Quote
I don't usually use 'strict', but tried it. That totally blew out everything (errors all over the place), so I put "my" in front of a whole bunch of variables and arrays until the errors stopped, and now there are no results at all.


Much better. Though there are cases where it makes sense to relax strictness, these should be exceptions, and confined to a small block.

Maybe you can re-post the current version of your program, i.e. the version you made strict?


Quote
Not sure what you mean. Do you mean the months file.


Your program basically goes like this:

open (MNF,"$mfl") || die("Cannot open $mfl");
@mfl = <MNF>;
close (MNF);
.... # do something with @mfl

So you first put the whole file into an array (@mfl), and then you process the array. I was just curious why you do this, instead of processing the file line by line.


Quote
Turns out that is not needed as it is defined in the push.


This is *never* needed, as such a statement has no effect whatsoever. You just take the value of a variable, and then throw it away. It's a no-op.




Quote
Don't know, but it works as it's referring to array mfl from when the file was opened. No errors came up.


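I'm not sure what you mean by "it works" (what is it doing?), but at least I learned that this construct is not illegal, as I thought before. However, I tried it out: in my experiments, <@mfl> evaluates to the elements of @mfl when executed in list context, and to $mfl[0] the first time it is executed in scalar context. As far as I can tell, Perl treats angle brackets around anything that is not a filehandle or a simple scalar variable as a filename glob, so the statement is equivalent to

$mon = glob("@mfl")

which splits the interpolated array on whitespace and, in scalar context, hands out the next item each time the same statement is executed. So your loop does end up with the first $mts values, but only as a side effect of filename globbing, and it would misbehave as soon as a line contained a space or a wildcard character.

If you want @lines to contain the first $mts elements of @mfl, you could use array slices.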
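
For anyone who wants to check what `<@mfl>` does, here is a small experiment. It is a sketch based on my reading of the "I/O Operators" section of perlop, which says that angle brackets around anything other than a filehandle or a simple scalar variable are treated as a call to glob. The sample values are the month lines from the thread.

```perl
use strict;
use warnings;

my @mfl = ("200812\n", "200809\n", "200804\n");

# List context: glob splits the interpolated string on whitespace,
# so the newlines disappear and every word comes back as one item.
my @all = <@mfl>;
print scalar(@all), " items: @all\n";

# Scalar context inside a loop: the same glob is executed repeatedly
# and hands out one item per call.
my @picked;
for my $i (1 .. 2) {
    my $mon = <@mfl>;
    push @picked, $mon;
}
print "picked: @picked\n";
```

Whatever ends up in @lines, the later test `$fd[3] == $mon` in the original script compares each record against a single scalar, so it can only ever match one month. Matching several months needs a lookup against the whole list.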


fds
Novice

Dec 2, 2010, 11:07 AM

Post #8 of 14 (4014 views)
Re: [rovf] Multiple matching

Well, it looks like I've got it all wrong, so I'm starting from scratch again.

First I have a file months.txt

Code
200812 
200809
200804
200708
200701



Then I read in the file with strict enabled, as suggested.

Code
use strict; 
use warnings;
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
print "Content-type: text/html\n\n";
my $mfl = 'months.txt';
my $mts = '2';


open(MNF, "$mfl") || die("could not open $mfl !");
my @mfl = <MNF>;
close (MNF);
my @lines = @mfl[0..$mts-1];
foreach my $mon (@lines) {print "$mon<br>\n";}

This returns (in this case):
200812
200809

From these 2 representative months, I need to grab all the relevant matching records from this file: dbs.txt.
There was a mistake earlier and 1 field was missed out.

Code
1|The Weight|Aaa|200701|  
2|Honky Tonk Women|Bbb|200701|
3|Puking in the Seine|Ccc|200708|
4|Money For Nothing|Mee|200804|
5|The Pusher|Woo|200809|
6|Jailhouse Rock|Foo|200812|



I want to return:

Code
5|The Pusher|Woo|200809|  
6|Jailhouse Rock|Foo|200812|



Quote
So you first put the whole file into an array (@mfl), and then you process the array. I was just curious why you do this, instead of processing the file line by line.

Because I thought that was the way I had to do it: that if the 2 months are in an array, I could just reference the array. Is that the wrong way to do it? I've never tried to reference one file from another before, so I'm stabbing in the dark here a bit.

Fishmonger:

Quote
You're using the wrong data structure for storing the "months". You should be using a hash instead of the array.

Does a % process differently than an @, like hold it in memory longer or something? I know nothing about these and have never used them.

As to the quotes, when I left them out the script failed, so I put single quotes instead of double.
Using 'perldoc -q quoting' on my live server is forbidden as it needs shell access, and we are not allowed that unless we have a $1200 a month dedicated server.

The "my @lines" bit is neat and I would never have figured that out. Thanks.
----------------------------------------------
fdsaadsfdsaf


FishMonger
Veteran / Moderator

Dec 2, 2010, 12:20 PM

Post #9 of 14 (4008 views)
Re: [Ted] Multiple matching

In Perl, the % sigil is used for hashes, also known as associative arrays, which are made up of key/value pairs.

In this case, I'm suggesting that you use the months as the keys to the hash and use the hash as a lookup table.

Perl's documentation is online and is available at: http://perldoc.perl.org/


Code
#!/usr/bin/perl 

use strict;
use warnings;


my $mfl = "months.txt";
my $mts = 2;
my %months;

open my $fh, '<', $mfl or die "Cannot open $mfl $!";

for ( 1..$mts ) {
    chomp(my $month = <$fh>);
    $months{$month}++;
}
close $fh;

while ( my $rec = <DATA> ) {
    my $month = (split /\|/, $rec)[-2];
    print $rec if exists $months{$month};
}


__DATA__
1|The Weight|Aaa|200701|
2|Honky Tonk Women|200701|
3|Puking in the Seine|Ccc|200708|
4|Money For Nothing|200804|
5|The Pusher|Woo|200809|
6|Jailhouse Rock|Foo|200812|



rovf
Veteran

Dec 2, 2010, 12:41 PM

Post #10 of 14 (4005 views)
Re: [Ted] Multiple matching

Of course this still leaves us with the question, why you read the whole file into an array, when you need only the first $mts lines. Note that if you evaluate <MNF> in scalar context, instead of list context, you get the next line from the file, so you would push the required lines into @lines one after the next, until you have read $mts lines (instead of the whole file). Also note that in this case the lines read will contain the final \n, so you will have to chomp them after reading. See perldoc -f chomp.
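
The line-by-line approach might look like the sketch below. The helper name read_first_lines is invented for this example, and an in-memory filehandle stands in for months.txt so the snippet runs on its own. It stops early if the file has fewer lines than requested.

```perl
use strict;
use warnings;

# Hypothetical helper: read up to $n lines from an open filehandle,
# one line per read, without slurping the rest of the file.
sub read_first_lines {
    my ($fh, $n) = @_;
    my @lines;
    while (@lines < $n) {
        my $line = <$fh>;            # scalar context: next line only
        last unless defined $line;   # file is shorter than $n lines
        chomp $line;                 # drop the trailing newline
        push @lines, $line;
    }
    return @lines;
}

# An in-memory filehandle stands in for months.txt here.
my $months_txt = "200812\n200809\n200804\n200708\n200701\n";
open my $fh, '<', \$months_txt or die "Cannot open in-memory file: $!";
my @lines = read_first_lines($fh, 2);
close $fh;
print "$_\n" for @lines;
```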


Quote
Because I thought that was the way I had to do it. That if the 2 months are in an array I could just reference the array.


Yep! But you need only those two months, not all of the input data.

BTW, since you are doing a lookup by key, it might be more efficient (and easier to program) to use a hash instead of an array - but this is an implementation detail.

As for processing the data file, you simply read through it sequentially, extract the "month key", and see if it is in your array @lines (or hash, depending on how you want to do it). If it is, you print the record.
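
To make the two options concrete, here is a sketch that filters the same records both ways. The records are copied from dbs.txt without their trailing newlines, and the variable names %wanted, @by_array and @by_hash are made up for the illustration.

```perl
use strict;
use warnings;

my @lines  = qw(200812 200809);           # wanted months, already chomped
my %wanted = map { $_ => 1 } @lines;      # the same months as hash keys

my @records = (
    "4|Money For Nothing|Mee|200804|",
    "5|The Pusher|Woo|200809|",
    "6|Jailhouse Rock|Foo|200812|",
);

my (@by_array, @by_hash);
for my $rec (@records) {
    my $month = (split /\|/, $rec)[3];    # 4th field is the month key

    # Array version: scan the whole list of months for every record.
    push @by_array, $rec if grep { $_ eq $month } @lines;

    # Hash version: a single lookup per record.
    push @by_hash, $rec if exists $wanted{$month};
}
print "$_\n" for @by_hash;
```

Both versions select the same records. With only a handful of months the difference in speed is unlikely to matter, so the choice is mostly about which reads more clearly.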

Let me know if you have problems with one of these steps.


fds
Novice

Dec 2, 2010, 1:51 PM

Post #11 of 14 (4000 views)
Re: [rovf] Multiple matching

Hi Fishmonger,
I see what's happening. Oddly enough I was just looking up about the % hash and it mentioned a look up table (then I checked my email and here I am).

You're defining the array at "my $rec" and splitting up the record. Then you have the "$rec)[-2]" - why the -2?

I've read in both files like this:

Code
use strict;  
use warnings;
use CGI::Carp qw(warningsToBrowser fatalsToBrowser);
print "Content-type: text/html\n\n";

my $mfl = "months.txt";
my $dbf = "dbs.txt";
my $mts = 2;
my %months;

open my $fh, '<', $mfl or die "Cannot open $mfl $!";
for ( 1..$mts ) {
    chomp(my $month = <$fh>);
    $months{$month}++;
}
close $fh;

open my $df, '<', $dbf or die "Cannot open $dbf $!";
while ( my $rec = <$df> ) {
    # No chomp here: a bare "chomp;" would act on $_, not $rec, and the
    # trailing newline is what makes the month the second-to-last field.
    my $month = (split /\|/, $rec)[-2];
    print $rec if exists $months{$month};
    # if (exists $months{$month}) { print "$rec<br>\n"}
}
close $df;



Get this result - good:
5|The Pusher|Woo|200809|
6|Jailhouse Rock|Foo|200812|

A great big thanks.

Figured out the -2. It's the second field from the end of the record, yes?

------------------------------
rovf, I'll have to look at what you're saying tomorrow as I'm in the middle of a stinking cold and this poor artist's head is beginning to get done in :-)

Must say I'm learning a lot here guys and appreciate your efforts (and your patience with me).
----------------------------------------------
fdsaadsfdsaf

(This post was edited by Ted on Dec 2, 2010, 2:47 PM)


rovf
Veteran

Dec 2, 2010, 11:47 PM

Post #12 of 14 (3961 views)
Re: [Ted] Multiple matching


Quote
Figured out the -2. It's the second field from the end of the record, yes?


Right. Negative array indices count from the end.
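
One thing to watch with [-2] on these records: split drops trailing empty fields by default, so which index holds the month depends on whether the newline is still attached. The sketch below uses one record from dbs.txt to show the three cases. The variable names are only for illustration.

```perl
use strict;
use warnings;

my $raw = "5|The Pusher|Woo|200809|\n";   # a record as read from the file

# Unchomped: the newline survives as a last field, so [-2] is the month.
my @with_nl = split /\|/, $raw;

# Chomped: split drops the trailing empty field, so the month becomes [-1].
chomp(my $rec = $raw);
my @chomped = split /\|/, $rec;

# A limit of -1 keeps trailing empty fields, so [-2] is the month again.
my @kept = split /\|/, $rec, -1;

print "unchomped [-2]: $with_nl[-2]\n";
print "chomped   [-1]: $chomped[-1]\n";
print "limit -1  [-2]: $kept[-2]\n";
```

If every record has the month in the 4th field, indexing with [3] avoids the question entirely.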


fds
Novice

Dec 3, 2010, 5:19 AM

Post #13 of 14 (3949 views)
Re: [rovf] Multiple matching

Hi rovf,

Quote
Yep! But you need only those two months, not all of the input data.


If this is for the sake of speed, then given that each year would have 12 lines and it somehow, miraculously, continues for 30 years, then we are only talking about 2.81k. With increasing processor and hard disk speeds, that's probably okay to live with.

It's part of a forum script and we've only got 28 years before all 32-bit stuff blows out anyway, then this will all be redundant and I'll be well in my grave ;-)

I'll try to get to grips with what Fishmonger shared and see if my ST sort will work so I can get the latest to the top.

Your posting times look as though you're in Europe. I'm in an outer 'burb of London and it's finally stopped snowing after 3 days.
----------------------------------------------
fdsaadsfdsaf


fds
Novice

Dec 3, 2010, 11:00 AM

Post #14 of 14 (3937 views)
Re: [Ted] Multiple matching

Well, I've got this working with my sort and everything seems fine.

Thanks to both of you for all the input and help.
----------------------------------------------
fdsaadsfdsaf

 
 

