Combining words in one .txt file with words in another (Algorithms::Combinatorics)

 



TJC
Novice

Jan 8, 2014, 9:33 AM

Post #1 of 12 (1783 views)
Combining words in one .txt file with words in another (Algorithms::Combinatorics)

Good Evening,

I've written a basic Perl script that combines words (listed 1 per line in a .txt file) from a supplied file:


Code
#!/usr/bin/env perl 

use strict;
use warnings;

my $usage = "Usage: $0 <infile.txt>\n";
my $infile = shift or die $usage;
use File::Basename;
my $DIR = dirname($infile);
my $outfile = $DIR . "/Results.txt" or die $usage;

open (my $data, "<", $infile) or die "There was a problem opening: $!";
my @primers = <$data>;
close $data;
chomp @primers;

use Algorithm::Combinatorics qw(combinations);
my $strings = \@primers;
my $iter = combinations($strings, 2);
open(my $fh, '>', $outfile);
while (my $c = $iter->next) {
print $fh @$c, "\n";
}
print ("Finished. The results are located at $outfile\n\n");


However, I would also like to be able to read two .txt files and combine the words in one with the words in the other, without any self-combination within one list, i.e.:

File 1:
Apple
Banana
Grape

File 2:
Orange

Output:
AppleOrange
BananaOrange
GrapeOrange

How could I modify my above script to do this? Any help, tips or suggestions would be great.

Thanks!


BillKSmith
Veteran

Jan 8, 2014, 10:49 AM

Post #2 of 12 (1768 views)
Re: [TJC] Combining words in one .txt file with words in another (Algorithms::Combinatorics)

I do not know of a module, but pure Perl code is fairly simple. I am assuming that both files are small enough to read into memory and that you can add the error checking.


Code
use strict; 
use warnings;
use Slurp;
my $DIR = ($ARGV[0] or '.');
my @file_1 = slurp('File_1.txt');
chomp @file_1;
my @file_2 = slurp('File_2.txt');
open my $fh, '>', "$DIR/Results.txt";
foreach my $word1 (@file_1) {
foreach my $word2 (@file_2) {
print {$fh} $word1.$word2;
}
}
close $fh;

Good Luck,
Bill


TJC
Novice

Jan 8, 2014, 11:06 AM

Post #3 of 12 (1764 views)
Re: [BillKSmith] Combining words in one .txt file with words in another (Algorithms::Combinatorics)

Great! Thank you. I was not aware that it was as simple as that without a module.

The output that script gives me is:


Code
FOR1REV1 
FOR1REV2
FOR1REV3
FOR1REV4FOR2REV1
FOR2REV2
FOR2REV3
FOR2REV4FOR3REV1
FOR3REV2
FOR3REV3
FOR3REV4FOR4REV1
FOR4REV2
FOR4REV3
FOR4REV4


i.e. it fails to add a new-line when starting a new combination group.

Using:


Code
print {$fh} $word1.$word2, "\n";


does not solve the problem; instead it introduces blank lines between combinations and double blank lines between combination groups.

Any ideas on what I can do?


TJC
Novice

Jan 8, 2014, 11:18 AM

Post #4 of 12 (1755 views)
Re: [TJC] Combining words in one .txt file with words in another (Algorithms::Combinatorics)

Never mind! I cobbled together your script and my own, and that resolved the issues I was having:


Code
use strict;
use warnings;
use File::Basename;

my $usage = "Usage: $0 <infile1.txt> <infile2.txt>\n";
my $infile1 = shift or die $usage;
my $infile2 = shift or die $usage;
my $DIR = dirname($infile1);
my $outfile = $DIR . "/Results.txt";

# Read both word lists (one word per line) and strip the newlines.
open (my $data1, "<", $infile1) or die "There was a problem opening $infile1: $!";
my @FOR = <$data1>;
close $data1;
chomp @FOR;
open (my $data2, "<", $infile2) or die "There was a problem opening $infile2: $!";
my @REV = <$data2>;
close $data2;
chomp @REV;

# Write every FOR/REV combination on its own line.
open my $fh, '>', $outfile or die "There was a problem opening $outfile: $!";
foreach my $word1 (@FOR) {
    foreach my $word2 (@REV) {
        print {$fh} $word1.$word2, "\n";
    }
}
close $fh;
print "Finished. The results are located at $outfile\n\n";


Thanks again for the help!


Kenosis
User

Jan 8, 2014, 12:33 PM

Post #5 of 12 (1740 views)
Re: [TJC] Combining words in one .txt file with words in another (Algorithms::Combinatorics)

It looks like you have this issue well resolved. However, in case you're interested, Perl's glob can be used to produce combinations:

Code
use strict;  
use warnings;

print "$_\n" for map { s/-/ /g; $_ } glob "{Apple,Banana,Grape}-{Orange}";

Output:

Code
Apple Orange 
Banana Orange
Grape Orange

The dash is only used to separate the word sets; as you'll note, it's later replaced with a space. Given this, a script can be written to which you can send two or more word files to produce their combinations:

Code
use strict; 
use warnings;

@ARGV > 1 and my @files = splice @ARGV, 0 or die "Send more than one file, please.\n";
my @words;
local $" = ',';

for (@files) {
push @ARGV, $_;
push @words, "{@{[map {chomp; $_} <>]}}";
}

print "$_\n" for map { s/-/ /g; $_ } glob join '-', @words;

Usage: perl script.pl file1 file2 [..fileN] >outFile

The script first checks for at least two files, then those files are moved into @files. The local $" = ','; notation tells perl to place commas between array elements during interpolation--which occurs later. Each file is pushed onto @ARGV to let Perl handle reading it (<>).

The rather ugly (and unnecessary but sometimes fun to use) part is the baby-cart, where each file's newlines are removed to produce a list which Perl interpolates into comma-separated items. This list is enclosed by braces, so it can be fed to glob.

The only advantage to coding this--other than producing an immediate headache--is that it doesn't 'hard code' for the number of files you send the script, i.e., there's no need to write a certain number of loops depending upon the number of files.
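The same "any number of lists" property can also be had without glob. The following is a minimal sketch, assuming the word lists are already in memory as array references; the helper name cross_join is hypothetical and not part of any script in this thread. Because the words are concatenated directly, commas, spaces, or glob metacharacters in the input need no special handling.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): returns every concatenation that
# takes one word from each list, in the order the lists are given.
sub cross_join {
    my @lists   = @_;      # each argument is an array reference
    my @results = ('');    # start with a single empty prefix
    for my $list (@lists) {
        # Extend every prefix built so far with every word in this list.
        @results = map { my $prefix = $_; map { $prefix . $_ } @$list } @results;
    }
    return @results;
}

# Prints AppleOrange, BananaOrange and GrapeOrange, one per line.
print "$_\n" for cross_join([qw(Apple Banana Grape)], [qw(Orange)]);
```

Adding a third or fourth list only means passing another array reference; no extra loop has to be written.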

Hope this helps!


(This post was edited by Kenosis on Jan 8, 2014, 1:49 PM)


BillKSmith
Veteran

Jan 8, 2014, 7:56 PM

Post #6 of 12 (1719 views)
Re: [TJC] Combining words in one .txt file with words in another (Algorithms::Combinatorics)

Sorry about the problem. My code worked with my test files because they had a newline at the end of the last line. It never occurred to me that this is not required.
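
One way to make the reading step indifferent to whether the last line ends in a newline is to strip line endings from every input line and add the newline explicitly on output. A minimal sketch follows; the helper name read_words and the in-memory sample data are assumptions for illustration, not taken from the scripts above.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read one word per line from an
# already-open filehandle, strip line endings, and skip blank lines.
sub read_words {
    my ($fh) = @_;
    my @words;
    while (my $line = <$fh>) {
        $line =~ s/\r?\n\z//;    # tolerate Unix and Windows line endings
        next if $line eq '';     # ignore empty lines
        push @words, $line;
    }
    return @words;
}

# In-memory filehandle standing in for a file whose last line has no newline.
my $text = "REV1\nREV2\nREV3";
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @rev = read_words($fh);
close $fh;

# Every combination now ends in exactly one newline.
print "FOR1$_\n" for @rev;
```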
Good Luck,
Bill


TJC
Novice

Jan 9, 2014, 2:50 AM

Post #7 of 12 (1712 views)
Re: [Kenosis] Combining words in one .txt file with words in another (Algorithms::Combinatorics)


In Reply To
It looks like you have this issue well resolved. However, in case you're interested, Perl's glob can be used to produce combinations


Interesting! Thanks :) Always good to know alternative ways of doing things.


TJC
Novice

Jan 9, 2014, 3:01 AM

Post #8 of 12 (1711 views)
Re: [BillKSmith] Combining words in one .txt file with words in another (Algorithms::Combinatorics)

I'm curious to see if the script can be further modified to accommodate a title or label for each word, i.e. the input lists are in this format:

>Title1
Word1
>Title2
Word2

>Title3
Word3
>Title4
Word4

And the output would be
>Title1Title3
Word1Word3
>Title1Title4
Word1Word4

Whereby the titles are also merged and placed above each combination.

I sadly do not have much of an idea how I could go about this. If anybody could offer some help for this that would be great!


BillKSmith
Veteran

Jan 9, 2014, 6:00 AM

Post #9 of 12 (1689 views)
Re: [TJC] Combining words in one .txt file with words in another (Algorithms::Combinatorics)

Use the same algorithm as before, but work with Title/Word pairs rather than simple strings.
Untested:

Code
use strict; 
use warnings;

open my $IN1, '<', 'file1.txt' or die "$!";
my @data1;
while (!eof $IN1) {
push @data1, [<$IN1>, <$IN1>];
}
close $IN1;

open my $IN2, '<', 'file2.txt' or die "$!";
my @data2;
while (!eof $IN2) {
push @data2, [<$IN2>, <$IN2>];
}
close $IN2;

open my $OUT, '>', 'result.txt';
foreach my $pair1 (@data1) {
my ($title1, $word1) = @$pair1;
chomp $title1;
foreach my $pair2 (@data2) {
my ($title2, $word2) = @$pair2;
chomp $title2;
$title2 =~ s/^\>//;
print {$OUT} $title1, $title2, "\n", $word1, $word2;
}
}
close $OUT;

Update: corrected error in open statements.
Good Luck,
Bill

(This post was edited by BillKSmith on Jan 9, 2014, 6:28 AM)


TJC
Novice

Jan 9, 2014, 7:53 AM

Post #10 of 12 (1682 views)
Re: [BillKSmith] Combining words in one .txt file with words in another (Algorithms::Combinatorics)


In Reply To
Use the same algorithm as before, but work with Title/Word pairs rather than simple strings.


Thank you very much for the input. When running this script using two .txt files containing:


Code
>TITLEA 
AAAAA
>TITLEB
BBBBB


and

Code
>TITLEC 
CCCCC
>TITLED
DDDDD


The output is:


Code
>TITLEATITLEC 
AAAAA
CCCCC


I've added chomp for $word1 and $word2 to solve the issue of the combination being on separate lines. However I am not sure why only 1 combination is being shown.

Thanks for any help anybody can offer with this one!


BillKSmith
Veteran

Jan 9, 2014, 9:39 AM

Post #11 of 12 (1662 views)
Re: [TJC] Combining words in one .txt file with words in another (Algorithms::Combinatorics)

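The safest approach is to chomp all inputs and explicitly put all required newlines in the output. Not all pairs were processed because my input operators were in list context, where the first <$IN1> returns every remaining line of the file at once. They must be forced to scalar context so that each read returns a single line.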


Code
use strict;
use warnings;

# Read file1.txt as [title, word] pairs, two lines per record.
open my $IN1, '<', 'file1.txt' or die "$!";
my @data1;
while (!eof $IN1) {
    push @data1, [scalar <$IN1>, scalar <$IN1>];
    chomp @{$data1[-1]};
}
close $IN1;

# Read file2.txt the same way.
open my $IN2, '<', 'file2.txt' or die "$!";
my @data2;
while (!eof $IN2) {
    push @data2, [scalar <$IN2>, scalar <$IN2>];
    chomp @{$data2[-1]};
}
close $IN2;

# Print the merged title, then the merged word, for every pair of records.
open my $OUT, '>', 'result.txt' or die "$!";
foreach my $pair1 (@data1) {
    my ($title1, $word1) = @$pair1;
    foreach my $pair2 (@data2) {
        my ($title2, $word2) = @$pair2;
        $title2 =~ s/^>//;    # keep only one leading '>' in the merged title
        print {$OUT} $title1, $title2, "\n", $word1, $word2, "\n";
    }
}
close $OUT;


Output:

Code
>TITLEATITLEC
AAAAACCCCC
>TITLEATITLED
AAAAADDDDD
>TITLEBTITLEC
BBBBBCCCCC
>TITLEBTITLED
BBBBBDDDDD
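
The list-versus-scalar context difference can be seen in isolation with an in-memory filehandle. This is only a sketch: the helper name read_pairs and the sample text are assumptions for illustration. It also shows that assigning a read to a scalar variable is another way to get scalar context, without writing scalar explicitly.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): read title/word records from an
# open filehandle, two lines per record.
sub read_pairs {
    my ($fh) = @_;
    my @pairs;
    until (eof($fh)) {
        my $title = <$fh>;    # assignment to a scalar reads exactly one line
        my $word  = <$fh>;
        $word = '' unless defined $word;    # tolerate a title with no word line
        chomp($title, $word);
        push @pairs, [$title, $word];
    }
    return @pairs;
}

my $text = ">TITLEA\nAAAAA\n>TITLEB\nBBBBB\n";

# List context: the first <$fh> inside [...] returns every remaining line,
# so a single record swallows the whole file.
open my $fh, '<', \$text or die "Cannot open in-memory file: $!";
my $greedy = [<$fh>, <$fh>];
close $fh;
print scalar(@$greedy), " elements in the first record\n";    # prints 4

# Scalar context: each read returns one line, giving two records.
open $fh, '<', \$text or die "Cannot open in-memory file: $!";
my @pairs = read_pairs($fh);
close $fh;
print scalar(@pairs), " records\n";                           # prints 2
```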

Good Luck,
Bill


TJC
Novice

Jan 10, 2014, 5:11 AM

Post #12 of 12 (1608 views)
Re: [BillKSmith] Combining words in one .txt file with words in another (Algorithms::Combinatorics)

Great, thank you for all the help throughout. I've learnt quite a lot.

 
 

