Help!!! How to find duplicates?

 



stylerr
New User

Dec 4, 2009, 12:02 PM

Post #1 of 2 (2556 views)
Help!!! How to find duplicates?

Hello,

Consider this text:
my $text = '
$sub24835->($sub24839->($sub24828->($sub24840->("( a1"),$sub24841->(" a1 ) ")),$sub24830->($sub24853->("( a2 "),$sub24854->(" a2 )")),$sub24828->($sub24840->("( a1"),$sub24841->(" a1 ) ")),$sub24842->($sub24843->("0"),$sub24830->($sub24853->("( a2 "),$sub24854->(" a2 )")),$sub24828->($sub24840->("( a1"),$sub24841->(" a1 ) "))),$sub24830->($sub24853->("( a2 "),$sub24854->(" a2 )")),$sub24828->($sub24840->("( a1"),$sub24841->(" a1 ) ")),$sub24832->($sub24855->("1"),$sub24856->($sub24857->("| a3"),$sub24859->("a3 |")),$sub24858->("a3")),$sub24849->($sub24850->("1"),$sub24830->($sub24853->("( a2 "),$sub24854->(" a2 )")),$sub24832->($sub24855->("1"),$sub24856->($sub24857->("| a3"),$sub24859->("a3 |")),$sub24858->("a3")))))
';

I need to find all duplicated function calls,
something like this:
1. $sub24828->($sub24840->("( a1"),$sub24841->(" a1 ) "))

2. $sub24830->($sub24853->("( a2 "),$sub24854->(" a2 )"))

Both 1. and 2. occur more than once
in the text.

I tried this regex:
my @ttt = $text =~ /(\$sub\d+->\(.*\))?.*?\1/gs;

But it does not give the right result.
The main problem is that I want to extract THE WHOLE function call expression,
$sub111->(...), including BOTH the opening AND the matching closing parenthesis.
See the extracted examples above.

Thanks in advance.
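
To illustrate why the attempt above captures too much, here is a minimal sketch on a shortened, made-up input (the string in $sample is an assumption, not the text from the post). It compares the greedy \(.*\) with a recursive pattern that has to balance its parentheses. The recursive form needs Perl 5.10 or later, and it counts every parenthesis, including ones inside string literals such as "( a1", so it only works when those happen to balance as well.

```perl
use strict;
use warnings;
use 5.010;   # recursive patterns such as (?2) need Perl 5.10 or later

# Hypothetical shortened input, not the full text from the post.
my $sample = '$sub1->($sub2->("x")),$sub3->("y")';

# Greedy attempt: \(.*\) runs from the first "(" to the LAST ")" in the string.
my ($greedy) = $sample =~ /(\$sub\d+->\(.*\))/s;

# Recursive attempt: group 2 matches "(", then any mix of non-parenthesis
# characters and nested copies of itself via (?2), then the closing ")".
my ($balanced) = $sample =~ /(\$sub\d+->(\((?:[^()]++|(?2))*\)))/;

print "greedy:   $greedy\n";     # the whole sample string
print "balanced: $balanced\n";   # $sub1->($sub2->("x"))
```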


stylerr
New User

Dec 7, 2009, 8:30 AM

Post #2 of 2 (2512 views)
Re: [stylerr] Help!!! How to find duplicates?

Thanks everybody.

I found a solution based on the Regexp::Common CPAN module.

Here it is:

use strict;
use warnings;
use Regexp::Common qw /balanced/;

my $ttt = '$sub24835->($sub24839->(
$sub24828->($sub24840->("( a1"),$sub24841->(" a1 ) ")),
$sub24830->($sub24853->("( a2 "),$sub24854->(" a2 )")),$sub24828->($sub24840->("( a1"),$sub24841->(" a1 ) ")),$sub24842->($sub24843->("0"),$sub24830->($sub24853->("( a2 "),$sub24854->(" a2 )")),$sub24828->($sub24840->("( a1"),$sub24841->(" a1 ) "))),$sub24830->($sub24853->("( a2 "),$sub24854->(" a2 )")),$sub24828->($sub24840->("( a1"),$sub24841->(" a1 ) ")),$sub24832->($sub24855->("1"),$sub24856->($sub24857->("| a3"),$sub24859->("a3 |")),$sub24858->("a3")),$sub24849->($sub24850->("1"),$sub24830->($sub24853->("( a2 "),$sub24854->(" a2 )")),$sub24832->($sub24855->("1"),$sub24856->($sub24857->("| a3"),$sub24859->("a3 |")),$sub24858->("a3")))))';

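# Capture a call with balanced parentheses, then require the same text to appear again later.
my @ttt = $ttt =~ /(\$sub\d+->$RE{balanced}{-parens=>'()'}).*?\1/sg;

# Print how many items were captured, then the items one per line.
print scalar @ttt, "\n";

print join("\n", @ttt), "\n";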
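
One thing to keep in mind: with /g the matches cannot overlap, so a call nested inside an already matched span is not looked at again. A different approach, sketched below with core Perl only (5.10 or later), is to collect every call expression through a lookahead and count how often each one appears. The helper name count_calls and the short $sample string are assumptions made up for illustration, and the same caveat applies as with any parenthesis-counting pattern: parentheses inside quoted strings are counted too.

```perl
use strict;
use warnings;
use 5.010;   # recursive patterns such as (?2) need Perl 5.10 or later

# Hypothetical helper: returns a hash ref mapping each call expression
# to the number of times it appears, nested occurrences included.
sub count_calls {
    my ($text) = @_;
    my %count;
    # The lookahead (?=...) captures without consuming any text, so the next
    # match can start inside the previous one and nested calls are seen too.
    while ($text =~ /(?=(\$sub\d+->(\((?:[^()]++|(?2))*\))))/g) {
        $count{$1}++;
    }
    return \%count;
}

# Made-up sample input, much shorter than the text in the post.
my $sample = '$sub1->($sub2->("a"),$sub3->("b"),$sub2->("a"))';
my $count  = count_calls($sample);

# Report only the calls that appear more than once.
for my $call (sort grep { $count->{$_} > 1 } keys %$count) {
    print "$count->{$call} x $call\n";
}
```

For the sample string this reports only $sub2->("a"), which appears twice. The outer call and $sub3->("b") each appear once and are filtered out.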

 
 

