Need to capture all users in file1 not in file2

 



Sanctioner
New User

Apr 30, 2014, 7:50 AM

Post #1 of 4 (2620 views)

Hello,

I have a csv list of users with full info and a csv list of autologin-only users with just their acct and userid. I need to get a list of all non-autologin users.

here is a sample of the 2 files:

users.csv:

Code
"0001111111001NAT","123456DF","JOHN","test@test.gov","2014-03-12-09.59.11.908000" 
"0002222222001LA","654321AS","JANE","asdf@asdf.com","2014-03-12-10.00.10.020000"


users_autolog.csv:

Code
"0001111111001NAT","123456DF"


The expected output is:

Code
"0002222222001LA","654321AS","JANE","asdf@asdf.com","2014-03-12-10.00.10.020000"


My plan is to use a regex to capture the first 2 fields of users.csv for each row, and use what it matches as the pattern to exclude in users_autolog.csv, and then output it to a new file.

My code does not err and is as follows:

Code
#!/usr/bin/perl -w 
use strict;
use warnings;

open (SRCFILE, "</cygdrive/c/perl/users.csv") or die $!;
open (SRCFILEAL, "</cygdrive/c/perl/users_autolog.csv") or die $!;
open (OUTFILE, ">>/cygdrive/c/perl/user_non_autolog.csv") or die $!;

my $row_pattern;

foreach (<SRCFILE>) {
($row_pattern) = $_ =~ m/(^"\d{13}\w{2,3}","\w{1,15}")/g;
chomp;

seek SRCFILEAL, 0, 0;
if (<SRCFILEAL> !~ m/$row_pattern/g) {
my $outrow = $_ =~ s/\r//g;
print OUTFILE "$_\n";
}
}

close (SRCFILE) or die $!;
close (SRCFILEAL) or die $!;
close (OUTFILE) or die $!;


The actual output is:

Code
"0001111111001NAT","123456","JOHN","test@test.gov","2014-03-12-09.59.11.908000" 
"0002222222001LA","654321","JANE","asdf@asdf.com","2014-03-12-10.00.10.020000"

..because the if statement, though the pattern matches, somehow passes regardless on every iteration. I can't figure out why. I tried looping through the 2nd file line by line and received the same result.

Arrays are not viable because these files are millions of rows.

How do I achieve the expected output? Any help is appreciated. Thanks, -s


FishMonger
Veteran / Moderator

Apr 30, 2014, 9:11 AM

Post #2 of 4 (2615 views)
Re: [Sanctioner] Need to capture all users in file1 not in file2

A better approach would be to load the users_autolog.csv file into a hash and then loop over the other file and do a simple hash lookup and only print the records that don't have a key in the hash.
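As a rough illustration of the hash lookup described above, here is a minimal sketch using only core Perl. The helper names `row_key` and `print_non_autolog` are made up for this example, and it assumes that every row has at least two fields and that neither of the first two fields contains an embedded comma. It runs on in-memory filehandles holding the sample rows from the question; real code would open the actual files instead.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: build a composite key from the first two
# comma-separated fields. Assumes no embedded commas in those fields.
sub row_key {
    my ($line) = @_;
    $line =~ s/\s+\z//;    # drop newline, \r and trailing blanks
    my ($acct, $userid) = split /,/, $line;
    return join ',', $acct, $userid;
}

# Hypothetical helper: copy to $out_fh every row read from $users_fh
# whose key was not seen in $autolog_fh.
sub print_non_autolog {
    my ($users_fh, $autolog_fh, $out_fh) = @_;

    # Pass 1: remember every autologin key. Only these keys are held in memory.
    my %is_autolog;
    while (my $line = <$autolog_fh>) {
        next unless $line =~ /\S/;    # skip blank lines
        $is_autolog{ row_key($line) } = 1;
    }

    # Pass 2: stream the full user list one row at a time.
    while (my $line = <$users_fh>) {
        next unless $line =~ /\S/;
        next if $is_autolog{ row_key($line) };
        $line =~ s/\s+\z//;
        print {$out_fh} "$line\n";
    }
    return;
}

# Demo on in-memory filehandles with the sample data from the question.
my $users = <<'END';
"0001111111001NAT","123456DF","JOHN","test@test.gov","2014-03-12-09.59.11.908000"
"0002222222001LA","654321AS","JANE","asdf@asdf.com","2014-03-12-10.00.10.020000"
END
my $autolog = qq{"0001111111001NAT","123456DF"\n};

open my $users_fh,   '<', \$users   or die $!;
open my $autolog_fh, '<', \$autolog or die $!;
my $result = '';
open my $out_fh, '>', \$result or die $!;

print_non_autolog($users_fh, $autolog_fh, $out_fh);
close $out_fh or die $!;
print $result;    # prints only the JANE row
```

Each file is read exactly once, and the users file is never loaded into memory as a whole, which matters when the files run to millions of rows.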

Also, instead of using a regex to extract the fields, use the Text::CSV_XS module. It will handle splitting the fields and puts them into an array ref.
http://search.cpan.org/~hmbrand/Text-CSV_XS-1.07/CSV_XS.pm
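For reference, a small sketch of what reading a row with Text::CSV_XS might look like. It assumes the module is installed from CPAN, and it parses one sample row from an in-memory filehandle rather than a real file. The `binary` and `auto_diag` options are a common choice for this module, not something the thread prescribes.

```perl
use strict;
use warnings;
use Text::CSV_XS;

# auto_diag makes the parser report malformed input instead of failing silently.
my $csv = Text::CSV_XS->new({ binary => 1, auto_diag => 1 });

# One sample row from the question, read through an in-memory filehandle.
my $data = qq{"0001111111001NAT","123456DF","JOHN","test\@test.gov","2014-03-12-09.59.11.908000"\n};
open my $fh, '<', \$data or die $!;

my @keys;
while (my $row = $csv->getline($fh)) {    # $row is an array ref of fields
    my ($acct, $userid) = @{$row}[0, 1];
    push @keys, "$acct,$userid";          # quotes are already stripped by the parser
}
close $fh or die $!;

print "$_\n" for @keys;
```

Because the parser removes the surrounding quotes, keys built this way look like `0001111111001NAT,123456DF`, so both files need to be read through the module for the keys to compare equal.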


Laurent_R
Veteran / Moderator

Apr 30, 2014, 10:03 AM

Post #3 of 4 (2612 views)
Re: [Sanctioner] Need to capture all users in file1 not in file2

Your approach is extremely inefficient, because it is reading the entire autolog file for each line in the other file.

Bill gave the right solution: read the autolog file only once and store its identifiers in a hash, then read the other file and print each row whose identifier is not found in the hash.

As for isolating the identifier, you could use the module suggested by Bill, but if you can't for some reason, then you could also use the split function rather than a complicated regex:


Code
my $id = (split /,/, $_)[0];
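The one-liner above takes only the first field. If the acct alone is not unique and the userid has to be part of the identifier (an assumption based on the two-column autolog file in the question), a list slice can pull both fields at once. A small sketch, with `$line` standing in for `$_`:

```perl
use strict;
use warnings;

# Sample row from the question, with a Windows-style line ending.
my $line = qq{"0002222222001LA","654321AS","JANE","asdf\@asdf.com","2014-03-12-10.00.10.020000"\r\n};

$line =~ s/\r?\n\z//;    # strip the line ending, including any \r

# A limit of 3 stops split after the first two fields; the rest stays in one chunk.
my ($acct, $userid) = (split /,/, $line, 3)[0, 1];
my $key = "$acct,$userid";

print "$key\n";
```

Note that split leaves the double quotes in place, so the key here is `"0002222222001LA","654321AS"`. That is fine as long as the keys from both files are built the same way.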



Sanctioner
New User

May 1, 2014, 2:14 PM

Post #4 of 4 (2552 views)
Re: [Laurent_R] Need to capture all users in file1 not in file2

Thanks for the replies. I'm more a SQL guy than anything, so I appreciate the direction.

I wrote it as a hash and got what I needed, and it takes way less time to run.

thanks again,
-s

 
 

