Program to Filter and remove duplicates


Feb 2, 2013, 8:26 AM

How do I write a Perl program to filter the lines and remove duplicates? Below is an example.

I want to keep only the cities and remove the duplicate city names.

Input file:

Bengaluru : City
Maharashtra : State
Tamil Nadu : State
Mumbai : City
Chennai : City
Karnataka : State
Bengaluru : City
Hyderabad : City
Bengaluru : City

The output file should contain only the city names, with the repeated city names removed from the list, like this:

Bengaluru : City
Mumbai : City
Chennai : City
Hyderabad : City

Feb 2, 2013, 11:05 AM

C:\>perldoc -q duplicate


Found in C:\Perl\lib\pods\perlfaq4.pod 
How can I remove duplicate elements from a list or array?
(contributed by brian d foy)

Use a hash. When you think the words "unique" or "duplicated", think
"hash keys".

If you don't care about the order of the elements, you could just create
the hash then extract the keys. It's not important how you create that
hash: just that you use "keys" to get the unique elements.

my %hash = map { $_, 1 } @array;
# or a hash slice: @hash{ @array } = ();
# or a foreach: $hash{$_} = 1 foreach ( @array );

my @unique = keys %hash;
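Applied to the city names from the question, the hash-slice variant might look like the sketch below. It assumes `@array` already holds the names extracted from the City lines. Because a hash does not preserve insertion order, the keys are sorted to get a stable result.

```perl
use strict;
use warnings;

# Assumption: city names already extracted from the "City" lines.
my @array = qw(Bengaluru Mumbai Chennai Bengaluru Hyderabad Bengaluru);

# Hash slice: every element becomes a key; duplicates collapse into one key.
my %hash;
@hash{@array} = ();

# keys() returns the names in no particular order, so sort them.
my @unique = sort keys %hash;

print "@unique\n";    # prints: Bengaluru Chennai Hyderabad Mumbai
```

If you need the original order (Bengaluru, Mumbai, Chennai, Hyderabad), use one of the order-preserving approaches instead.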

If you want to use a module, try the "uniq" function from
List::MoreUtils. In list context it returns the unique elements,
preserving their order in the list. In scalar context, it returns the
number of unique elements.

use List::MoreUtils qw(uniq);

my @unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 1,2,3,4,5,6,7
my $unique = uniq( 1, 2, 3, 4, 4, 5, 6, 5, 7 ); # 7
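List::MoreUtils is a CPAN module and may not be installed on your system. As an assumption worth checking against your Perl version, the core module List::Util has offered its own `uniq` since version 1.45 (bundled with Perl 5.26 and later), with the same order-preserving behaviour in list context and the element count in scalar context. A minimal sketch:

```perl
use strict;
use warnings;
# Assumes List::Util 1.45 or newer (bundled with Perl 5.26+); older versions lack uniq.
use List::Util 1.45 qw(uniq);

my @cities = qw(Bengaluru Mumbai Chennai Bengaluru Hyderabad Bengaluru);

my @unique = uniq @cities;    # list context: first occurrences, original order
my $count  = uniq @cities;    # scalar context: number of unique elements

print "@unique\n";    # prints: Bengaluru Mumbai Chennai Hyderabad
print "$count\n";     # prints: 4
```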

You can also go through each element and skip the ones you've seen
before. Use a hash to keep track. The first time the loop sees an
element, that element has no key in %seen. The "next" statement creates
the key and uses its old value, which is "undef" (false), so "next" does
not fire; the post-increment sets the value to 1 and the loop continues
to the "push". The next time the loop sees that same element, its key
exists in the hash *and* the value for that key is true (since it's not
0 or "undef"), so the "next" skips that iteration and the loop goes to
the next element.

my @unique = ();
my %seen = ();

foreach my $elem ( @array ) {
    next if $seen{ $elem }++;
    push @unique, $elem;
}

You can write this more briefly using a grep, which does the same thing.

my %seen = ();
my @unique = grep { ! $seen{ $_ }++ } @array;
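For the original question, a minimal sketch might read the input lines and add a filter to the grep above, so that a single pass keeps only the City lines and drops repeats. It assumes every line has the form `Name : Type`; the helper name `unique_city_lines` is hypothetical, and an in-memory filehandle stands in for the real input file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the "City" lines read from $fh, keeping only
# the first occurrence of each line and preserving the input order.
sub unique_city_lines {
    my ($fh) = @_;
    my @lines = <$fh>;
    s/\s+\z// for @lines;    # strip the newline (and any \r or trailing spaces)
    my %seen;
    # The regex filters on the type column; && short-circuits, so only
    # City lines are counted in %seen.
    return grep { /:\s*City\z/ && !$seen{$_}++ } @lines;
}

# Sample data from the question; in practice, open the real file instead.
my $input = <<'END';
Bengaluru : City
Maharashtra : State
Tamil Nadu : State
Mumbai : City
Chennai : City
Karnataka : State
Bengaluru : City
Hyderabad : City
Bengaluru : City
END

open my $in, '<', \$input or die "Cannot open input: $!";
my @cities = unique_city_lines($in);
close $in;

print "$_\n" for @cities;
```

To read a real file, replace the in-memory handle with something like `open my $in, '<', $path or die "Cannot open $path: $!";`, where `$path` is the name of your input file, and redirect the script's output to produce the output file.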

Now, is your instructor going to give me the A in your class?

Feb 2, 2013, 11:48 AM

Also, please do not cross-post the same question in several parts of this forum. In effect, you are copying and pasting the same question and expecting different people to spend their time answering it.

Here, Fishmonger took the time to answer you in this thread, and I took the time to answer another post you made. Please don't assume that people who are willing to help you have unlimited time. They don't.


Feb 2, 2013, 11:25 PM

I am new to Perl and did not know whether I should post this question in the Beginner or the Intermediate section of this forum. In future, I will not post the same question in several parts of the forum. Thanks!