Remove duplicate numbers within each <div> tag.

 



gprakash
Novice

Jul 9, 2015, 11:30 PM

Post #1 of 3 (3609 views)
Remove duplicate numbers within each <div> tag.

Hi,
I need to remove duplicate numbers within each <div> tag. The numbers are tagged within <a>number</a>.

Input data:

Code
<div class="ind2">delegation of, <a href="ch02.xhtml#d5e3071">23</a>, <a href="ch02.xhtml#d5e3189">22</a>, <a href="ch02.xhtml#d5e3342">22</a></div> 

<div class="ind1">Medicine, <a href="ch02.xhtml#d5e2772">26</a>, <a href="ch02.xhtml#d5e2806">26</a>, <a href="ch02.xhtml#d5e3112">22</a>, <a href="ch02.xhtml#d5e3209">26</a></div>



Expected output:

Code
<div class="ind2">delegation of, <a href="ch02.xhtml#d5e3071">23</a>, <a href="ch02.xhtml#d5e3189">22</a></div> 

<div class="ind1">Medicine, <a href="ch02.xhtml#d5e3112">22</a>, <a href="ch02.xhtml#d5e3209">26</a></div>


I tried the following, but it removes all of the repeated numbers. Please help me remove only the duplicates, so that one copy of each number remains.


Code
sub uniq {
    my %seen;
    grep $seen{$_}++, @_;
}
my (@hfind, @filtered, $indhf1, $indhf2);
while ($htmcont =~ m/<div class="ind([0-9]+)">(.*?)<\/div>/sgi) {
    my $indhf = $2;
    @hfind = ();
    while ($indhf =~ m/([^>]+)<a href="([^>]+)">(.*?)<\/a>/sgi) {
        $indhf1 = $1;
        $indhf2 = $3;
        push @hfind, $indhf2;
    }
    @filtered = uniq(@hfind);
    foreach my $duprm (@filtered) {
        $htmcont =~ s/<a href="([^>]+)">$duprm<\/a>//;
    }
}
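
A note on the uniq helper in the code above: without a negation, `grep $seen{$_}++` returns the second and later occurrences of each value, not the unique values. The substitution that follows also runs against the whole of $htmcont rather than the current <div>, and it leaves the separating comma behind. The short sketch below only illustrates the difference between the two grep forms; the name `repeats` is made up for the comparison and is not part of the original post.

```perl
use strict;
use warnings;

# Usual idiom: keep the first occurrence of each value.
sub uniq {
    my %seen;
    return grep { !$seen{$_}++ } @_;
}

# Hypothetical name: what the un-negated grep computes, namely
# every occurrence after the first.
sub repeats {
    my %seen;
    return grep { $seen{$_}++ } @_;
}

my @nums = (26, 26, 22, 26);
print join(',', uniq(@nums)),    "\n";   # prints 26,22
print join(',', repeats(@nums)), "\n";   # prints 26,26
```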



BillKSmith
Veteran

Jul 10, 2015, 5:37 AM

Post #2 of 3 (3594 views)
Re: [gprakash] Remove duplicate numbers within each <div> tag. [In reply to]

Let me restate your question. Within each <div> field, you wish to remove all but one of each set of <a> fields which contain the same number. It is unclear which one you wish to keep. In the first block of your example, you kept the first of the 22's. In the second block, you kept the last of the 26's. (Perhaps it does not matter.)

It is certainly possible to do this job with regular expressions. You seem to have a good start, but it is already becoming complicated. It would probably be easier to use a module to parse the markup, even if you count the time you spend finding and learning to use the module. A module would definitely be more reliable than any DIY system.
Good Luck,
Bill
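
For reference, here is a minimal sketch of the regular-expression route for input shaped exactly like the sample in the first post. It is not a general HTML parser, and it makes several assumptions: every link body is a bare number, links are separated by commas, and <div> elements are not nested. The sub names `dedupe_links` and `dedupe_index` are hypothetical. It keeps the first link for each number, which the sample output does not do consistently.

```perl
use strict;
use warnings;

# Hypothetical helper: within one <div> body, drop every link whose
# number has already appeared, together with the comma before it.
sub dedupe_links {
    my ($body) = @_;
    my %seen;    # fresh for each call, so each <div> is handled alone
    $body =~ s{(,\s*)?(<a\s[^>]*>\s*([0-9]+)\s*</a>)}
              { $seen{$3}++ ? '' : ($1 // '') . $2 }gise;
    return $body;
}

# Hypothetical helper: apply dedupe_links to the body of every
# <div class="indN"> in the document.
sub dedupe_index {
    my ($html) = @_;
    $html =~ s{(<div class="ind[0-9]+">)(.*?)(</div>)}
              {
                  my ($open, $body, $close) = ($1, $2, $3);
                  $open . dedupe_links($body) . $close;
              }gise;
    return $html;
}

my $htmcont = '<div class="ind2">delegation of, '
            . '<a href="ch02.xhtml#d5e3071">23</a>, '
            . '<a href="ch02.xhtml#d5e3189">22</a>, '
            . '<a href="ch02.xhtml#d5e3342">22</a></div>';
print dedupe_index($htmcont), "\n";
```

Copying the captures into lexical variables before calling the helper keeps the outer match results safe from the inner substitution. If the markup gets any more varied than this, a parsing module remains the more reliable choice.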


gprakash
Novice

Jul 10, 2015, 9:33 PM

Post #3 of 3 (3564 views)
Re: [BillKSmith] Remove duplicate numbers within each <div> tag. [In reply to]

Your understanding is correct. I need to remove all but one of the <a> fields that contain the same number. It does not matter whether the first or the last one is kept.

Thank you for the information. I will try using a module.

 
 

