Perl or simple Linux commands for joining information in two files?

 



S3
New User

Mar 24, 2013, 6:52 PM

Post #1 of 9 (795 views)

Fellow members,

So I need to address a problem and I was wondering if I should use Perl script or if I could use the Linux command line (with commands not related solely to Perl) since my Perl is poor.

I have two tab and single space-delimited lists saved as text files.

The first file, a.txt, has data in which one line, which I call an entry, is arranged as follows:


Zip code house/residential number unique name.

(Some entries are tab-delimited while others are delimited by a single space.)

The first column is zip code, second is residential, and third is a unique name.

There are over 30,000 entries in this file.

The second file, t.txt, has two columns; one line/entry is composed as:

Unique name statistical number.

It, too, is delimited by tabs in some entries and single spaces in others. There are fewer than 20,000 entries.

What I want to do is match unique names in t.txt to those in a.txt, so that my final outfile has:

[Unique name from t.txt] [statistical number] [zip code] [residential/house number].

The outfile must be tab-delimited. Order of information does not matter as long as information corresponds correctly to unique name in t.txt.

Since all entries are not delimited the same, I don’t know how to start feeding these different files into different hashes with the unique names as keys. (Should I use sed on the command line to substitute single spaces with tab spaces?) After reading each file into a separate hash, the code can be this?


open($FH,”<t.txt”) or die “File no open\n”;
while (<$FH>){
($tnamekey, $snumber) = split (/\+/,);
$thash{$tnamekey} = $snumber;
}

close($FH);

open($FH,”<a.txt”) or die “File no open\n”;
while (<SFH>){
??????????????????????????
$ahashkey{$anamekey} = ??????????

}

Close($FH);


$i = 0;

Foreach $tnamekey {
If ($tnamekey eq $anamekey) {
open($FH,”>>outfile”) or die “File no open\n”;
print “$tnamekey $snumber ?????????;
$i++;
}
}

Close($FH);


?????? = totally lost on what to write

I welcome any help in my lost state.

Thank you.

P.S. How do I use the code tag/function?


(This post was edited by S3 on Mar 24, 2013, 7:04 PM)


Kenosis
User

Mar 24, 2013, 7:33 PM

Post #2 of 9 (787 views)
Re: [S3] Perl or simple Linux commands for joining information in two files? [In reply to]

What's the nature of the files' entries? If they're all non-whitespace, then space and tab delimiters will be treated the same. If entries have spaces, but are delimited by tabs, that's OK, too. However, your specs are difficult for me to understand, especially without (redacted, if necessary) dataset samples.

A hash is a likely solution to match 'records' between the two files, but a clarification of the nature of your datasets is needed.


(This post was edited by Kenosis on Mar 24, 2013, 7:33 PM)


Laurent_R
Veteran / Moderator

Mar 25, 2013, 12:26 AM

Post #3 of 9 (778 views)
Re: [Kenosis] Perl or simple Linux commands for joining information in two files? [In reply to]

If you do something like this:


Code
my ($zip, $number, $whatever) = split /\s+/, $input_line;


the three fields will be populated correctly, whether the separator is one or several spaces or tabs. But provided, of course, that there is no space within the individual fields.

Otherwise, the solution is almost certainly to read your second file and store it in a hash (with the key being the field that is common with the first file), and then to read the first file line by line and complete the content of the lines with the help of the hash. Very classical, nothing complicated, no need to preprocess the files or whatever.
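
To make the splitting behaviour concrete, here is a minimal sketch. The helper name fields and the sample lines are made up for illustration; they are not taken from the real files.

```perl
use strict;
use warnings;

# Hypothetical helper: split one line into fields on runs of spaces or tabs.
sub fields {
    my ($input_line) = @_;
    return split /\s+/, $input_line;
}

# Made-up sample lines: the first is fully tab-delimited, the second has a
# single space between its first two fields.
for my $input_line ( "12345\t42\tsmith_j", "12345 42\tsmith_j" ) {
    # Both lines yield the same three fields.
    print join( '|', fields($input_line) ), "\n";
}
```

One caveat: if a line can start with whitespace, split /\s+/ returns an empty first field, whereas split ' ', $input_line skips leading whitespace.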


S3
New User

Mar 25, 2013, 9:36 AM

Post #4 of 9 (772 views)
Re: [Kenosis] Perl or simple Linux commands for joining information in two files? [In reply to]

Kenosis, I think I confused myself. I have removed all entries in which there was a blank for, say, name or zip code (or any one of the categories). What I mean is that for some entries the categories are delimited by a single space instead of a tab.

For example,

a[tab space]b[tab space]c
a b[tab space]c.

So all the information for each field is present; it's just that they (the categories) are delimited differently for some entries.


S3
New User

Mar 25, 2013, 9:41 AM

Post #5 of 9 (771 views)
Re: [Laurent_R] Perl or simple Linux commands for joining information in two files? [In reply to]

Laurent_R, so I don't have to put the first file, t.txt, in a hash?


Laurent_R
Veteran / Moderator

Mar 25, 2013, 11:25 AM

Post #6 of 9 (768 views)
Re: [S3] Perl or simple Linux commands for joining information in two files? [In reply to]

I am not sure I understood, but if you are saying that you have all the information necessary in file2.txt, then you obviously don't need to read file1.txt and store its content in a hash. If I understood correctly, all you need to do is some form of reformatting of file2.txt.

As for splitting on either space or tab, I already answered that previously.


Kenosis
User

Mar 25, 2013, 2:06 PM

Post #7 of 9 (761 views)
Re: [S3] Perl or simple Linux commands for joining information in two files? [In reply to]

Your dataset description was helpful.

If I understand you correctly, you want to match the files' unique name entries, and then combine some of their fields, printing them as a tab-delimited record.

If so, I'd read in t.txt first, and create a hash with each key as a unique name and the statistical number as its associated value. Then, iterate through a.txt, to match its unique name with a key, then format and print a line when found.

Given this, consider the following:


Code
use strict;
use warnings;

# Declare a hash
my %hash;

# Take the first file name (a.txt) off @ARGV
my $a_text = shift;

# Read t.txt line-by-line; $_ (the default scalar) contains each line
while (<>) {
    # Split the line on whitespace
    if ( my @name_num = split ) {
        $hash{ $name_num[0] } = $name_num[1];
    }
}

# Place "a.txt" back into @ARGV
push @ARGV, $a_text;

# Read a.txt line-by-line
while (<>) {
    if ( my @record = split ) {
        # If the files' unique names match, print a record.
        # exists() also keeps names whose statistical number is 0.
        print "$record[2]\t$hash{$record[2]}\t$record[0]\t$record[1]\n"
            if exists $hash{ $record[2] };
    }
}


Usage: perl script.pl a.txt t.txt >results.txt (These file names are placed into @ARGV)

The script first saves the first file's name (a.txt) for later, then reads through t.txt, populating a hash with unique name/statistical number pairs. It restores the first file's name to @ARGV and then reads through a.txt, splitting each line and printing a record if it finds that line's unique name in the hash built from t.txt.
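
If juggling @ARGV feels opaque, the same hash lookup can be sketched with explicit filehandles, so each file is named where it is opened. This is only one possible arrangement: the sub name join_files, the variable names, and the argument order (t.txt first) are illustrative assumptions, not part of the script above.

```perl
use strict;
use warnings;

# Hypothetical helper: joins name/number lines (t.txt style) with
# zip/house-number/name lines (a.txt style) on the unique name.
sub join_files {
    my ( $t_file, $a_file, $out_fh ) = @_;

    # Build the lookup table: unique name => statistical number.
    my %stat_for;
    open my $t_fh, '<', $t_file or die "Cannot open $t_file: $!";
    while ( my $line = <$t_fh> ) {
        my ( $name, $stat ) = split ' ', $line;
        $stat_for{$name} = $stat if defined $stat;
    }
    close $t_fh;

    # Stream the address file and print one tab-delimited row per match.
    open my $a_fh, '<', $a_file or die "Cannot open $a_file: $!";
    while ( my $line = <$a_fh> ) {
        my ( $zip, $house, $name ) = split ' ', $line;
        next unless defined $name && exists $stat_for{$name};
        print {$out_fh} join( "\t", $name, $stat_for{$name}, $zip, $house ), "\n";
    }
    close $a_fh;
    return;
}

# Run against real files only when exactly two file names are given.
join_files( @ARGV[ 0, 1 ], \*STDOUT ) if @ARGV == 2;
```

Assuming the sketch is saved as join.pl, it would be run as: perl join.pl t.txt a.txt >results.txt (note that t.txt comes first here, unlike the usage line above).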

Hope this helps!


(This post was edited by Kenosis on Mar 26, 2013, 9:28 AM)


S3
New User

Mar 25, 2013, 4:27 PM

Post #8 of 9 (751 views)
Re: [Kenosis] Perl or simple Linux commands for joining information in two files? [In reply to]

I'm really sorry, but do you mind explaining the program again line by line? I am also trying to improve my Perl writing. I know that there are two arguments, but I thought you have to specify which argument is which file. There are also some commands, such as shift, that I don't understand.

Thank you.


Kenosis
User

Mar 26, 2013, 9:30 AM

Post #9 of 9 (743 views)
Re: [S3] Perl or simple Linux commands for joining information in two files? [In reply to]

My apologies for not commenting the script. I've added those. Let me know if you have any other questions...
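
On the shift question specifically, here is a small sketch of what happens to the argument list. The sub reorder_args is hypothetical and uses a private array as a stand-in for @ARGV, so it can be run without any files.

```perl
use strict;
use warnings;

# Hypothetical demonstration: @args stands in for @ARGV.
sub reorder_args {
    my @args = @_;

    # shift removes the first element of the array and returns it.
    my $first = shift @args;

    # push appends a value to the end of the array.
    push @args, $first;

    return @args;
}

# With the command line "a.txt t.txt", the first name ends up last.
print join( ' ', reorder_args( 'a.txt', 't.txt' ) ), "\n";    # prints "t.txt a.txt"
```

In the main body of a script, shift with no argument works on @ARGV, and while (<>) reads the files still listed in @ARGV, removing each name as it opens the file. So nothing labels the arguments: their position on the command line decides which file is read first.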

 
 

