help with getting a unique return

 



regex2012
User

Nov 10, 2015, 9:34 PM

Post #1 of 9 (2207 views)
help with getting a unique return

 I have the following:

Code
open (my $fh1, '<', '/tmp/list1') or die "Can't open /tmp/list1 for read: $!";
my @lines;
while (<$fh1>) {
    push (@lines, $_);
}
close $fh1;
print @lines;

open (my $fh2, '<', '/tmp/list2') or die "Can't open /tmp/list2 for read: $!";
open (my $fhz, '>', '/tmp/remvlist') or die "Could not open /tmp/remvlist: $!";
while (<$fh2>) {
    my @fields = split(',', $_);
    my @strings = ("$fields[1]\n");

    print "this is @strings";
    #print $fhz "this will be $fields[0]\n";

    # Every matching line of list1 prints again, so duplicates in list1 repeat the output.
    for my $term (@lines) {
        if (grep { $_ =~ $term } @strings) {
            print "$term found.\n";
            print "$fields[0]\n";
        }
    }
}

I can print field 0 of each list2 line, but the problem is that I get duplicated items too. Each time the grepped item is found I get more than one result, because list1 has duplicated items in it. Is there a way, without a module, to get only the non-duplicate items? It is effectively a for-each script, so I see why it is doing this, but I wonder if there is any way around it.


Laurent_R
Veteran / Moderator

Nov 11, 2015, 4:10 AM

Post #2 of 9 (2202 views)
Re: [regex2012] help with getting a unique return

It is certainly possible (and probably easy) to remove duplicates from your input or from your output. But please explain what a duplicated item is: for example, is it a full line coming several times in the input file?

What we really need here to better help you is a sample of both input files (with some identified duplicates), so that we can know how to handle them.

If the duplicated items are full lines from file one, then the simplest way to remove duplicates is to store the input lines in a hash rather than in an array (duplicate lines will be removed from the hash without you having to do anything more). But this really works only if you don't care about the order of the output. If you want to keep the input order, then a little extra step is necessary.
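To illustrate the difference, here is a minimal sketch of both approaches. The sample values are made up for the example, and the variable names (%unique, %seen, @in_order) are my own choices, not anything from the original script.

```perl
use strict;
use warnings;

# Hypothetical sample data; the real input would come from the file.
my @lines = qw(/vol/con_form /vol/con_error /vol/con_form /vol/ball_basket);

# Hash only: duplicates collapse, but the order of the keys is unspecified.
my %unique = map { $_ => 1 } @lines;
my @any_order = keys %unique;

# Extra step: keep only the first occurrence of each line, preserving input order.
my %seen;
my @in_order = grep { !$seen{$_}++ } @lines;

print "$_\n" for @in_order;
```

The grep keeps a line only when its counter in %seen was still zero, which is why the first occurrence wins.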

Please provide these extra details.


Chris Charley
User

Nov 11, 2015, 11:28 AM

Post #3 of 9 (2190 views)
Re: [regex2012] help with getting a unique return

Related to your previous post?

http://perlguru.com/gforum.cgi?post=82260;guest=9127415#82260


regex2012
User

Nov 13, 2015, 7:07 AM

Post #4 of 9 (2157 views)
Re: [Laurent_R] help with getting a unique return

I want the duplicates removed and the result written to a third list with no duplicates. I don't care what order the entries are in.

lista

/vol/con_form
/vol/con_error
/vol/con_form
/vol/con_error
/vol/con_error
/vol/ball_basket
/vol/ball_football
/vol/ball_basket
/vol/ball_football
/vol/ball_hockey
/vol/ball_hockey


listb

/vol/con_error
/vol/con_error
/vol/ball_basket
/vol/con_form


FishMonger
Veteran / Moderator

Nov 13, 2015, 7:16 AM

Post #5 of 9 (2155 views)
Re: [regex2012] help with getting a unique return

Use a hash.

Removing duplicates is one of Perl's FAQs. See: perldoc -q duplicate
http://perldoc.perl.org/perlfaq4.html#How-can-I-remove-duplicate-elements-from-a-list-or-array%3f
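As a rough sketch of what "use a hash" could look like for the two lists posted above: this assumes the wanted "third list" is every distinct entry from both lists, which is only my reading of the request. The lists are shortened and held in arrays here so the example runs on its own.

```perl
use strict;
use warnings;

# Sample values taken from the lista and listb posted above (shortened).
my @lista = qw(/vol/con_form /vol/con_error /vol/con_form /vol/ball_hockey /vol/ball_hockey);
my @listb = qw(/vol/con_error /vol/con_error /vol/ball_basket /vol/con_form);

# Assumption: the third list is every distinct entry from both lists.
my %unique;
$unique{$_} = 1 for (@lista, @listb);

# Sorting is optional; it only makes the output repeatable.
my @third = sort keys %unique;
print "$_\n" for @third;
```

In a real script the two arrays would be filled from the files and @third printed to the output filehandle instead of STDOUT.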


Laurent_R
Veteran / Moderator

Nov 13, 2015, 8:26 AM

Post #6 of 9 (2152 views)
Re: [regex2012] help with getting a unique return

A one-liner example of using a hash to remove duplicates:

Code
$ echo '/vol/con_form
> /vol/con_error
> /vol/con_form
> /vol/con_error
> /vol/con_error
> /vol/ball_basket
> /vol/ball_football
> /vol/ball_basket
> /vol/ball_football
> /vol/ball_hockey
> /vol/ball_hockey' | perl -ne '$hash{$_} = 1; END{print keys %hash};
> '
/vol/ball_football
/vol/ball_basket
/vol/ball_hockey
/vol/con_form
/vol/con_error
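For anyone unsure how the one-liner maps onto a script: the -n switch wraps the code in a while (<>) loop, and the END block runs after the last line has been read. A rough equivalent is sketched below, using an in-memory string as a stand-in for the piped input (my assumption, so that the sketch runs on its own).

```perl
use strict;
use warnings;

# In-memory stand-in for the piped input (assumption, to keep the sketch self-contained).
my $input = "/vol/con_form\n/vol/con_error\n/vol/con_form\n";
open my $in, '<', \$input or die "Can't open in-memory input: $!";

my %hash;
while (my $line = <$in>) {   # what -n does implicitly with <>
    $hash{$line} = 1;        # the line still carries its newline, as in the one-liner
}
close $in;

my @unique = keys %hash;     # what the END block prints
print @unique;
```

Reading from a real file only means replacing the in-memory open with an open on the file name.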



regex2012
User

Nov 13, 2015, 8:31 AM

Post #7 of 9 (2150 views)
Re: [Laurent_R] help with getting a unique return

I don't know how to use a one-line statement in a script.
Is there a way to do this?
In addition, that doesn't really remove all the duplicates. I get:
/vol/ball_basket
/vol/con_error
/vol/ball_hockey
/vol/ball_hockey
/vol/ball_football
/vol/con_form

when I run it - I see hockey in there twice.


(This post was edited by regex2012 on Nov 13, 2015, 8:36 AM)


FishMonger
Veteran / Moderator

Nov 13, 2015, 9:11 AM

Post #8 of 9 (2142 views)
Re: [regex2012] help with getting a unique return

You're seeing the effect of a copy/paste problem with this site. When you copy a group of lines, each line will include a trailing space. Since the second /vol/ball_hockey was not at the end of the line, it didn't include the space like the others. If you remove the trailing space from each line and chomp the line terminator, Laurent's example will work as expected.

Using an actual script would be a better example.

Code
#!/usr/bin/perl  

use warnings;
use strict;
use Data::Dumper;

my %hash = map { chomp; $_, 1 } <DATA>;
print Dumper \%hash;


__DATA__
/vol/con_form
/vol/con_error
/vol/con_form
/vol/con_error
/vol/con_error
/vol/ball_basket
/vol/ball_football
/vol/ball_basket
/vol/ball_football
/vol/ball_hockey
/vol/ball_hockey


[root@099-91-RKB-2 ~]# ./test.pl

Code
$VAR1 = { 
'/vol/ball_basket' => 1,
'/vol/ball_football' => 1,
'/vol/con_error' => 1,
'/vol/con_form' => 1,
'/vol/ball_hockey' => 1
};
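If the input may still carry the trailing spaces described above, a variant along these lines strips them before the line is used as a hash key. The sample lines are invented to mimic the copy/paste problem, and @raw and @unique are names I picked for the sketch.

```perl
use strict;
use warnings;

# Sample input with the trailing spaces that copy/paste adds (assumed for illustration).
my @raw = ("/vol/ball_hockey \n", "/vol/ball_hockey\n", "/vol/con_form \n");

my %hash;
for my $line (@raw) {
    $line =~ s/\s+\z//;   # removes the newline and any trailing spaces
    next if $line eq '';  # skip lines that were blank
    $hash{$line} = 1;
}

my @unique = sort keys %hash;
print "$_\n" for @unique;
```

Without the substitution, "/vol/ball_hockey " and "/vol/ball_hockey" are different keys, which is how the same path can show up twice.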



regex2012
User

Nov 17, 2015, 9:29 AM

Post #9 of 9 (2107 views)
Re: [regex2012] help with getting a unique return

I also found that this works for column split:

Code
use strict;
open (my $fh1, '<', '/tmp/list.txt') or die "Can't open /tmp/list.txt for read: $!";
my %seen = ();
while (<$fh1>) {
    chomp;
    my @columns = split ',';
    # Print the first column only the first time it is seen.
    print "$columns[0]\n" if ! $seen{$columns[0]}++;
}
close $fh1;


Thanks for all your replies!

 
 

