regex removing similar entries

 



pnajafi
New User

Jan 7, 2011, 10:04 AM

Post #1 of 4 (4787 views)

Hi There,

I have the following text (it is not a regex, but rather a file path followed by a space followed by more text):

/next/ whatevertext
/next whatevertext
/Next watevertext
/next/next wordswords
/next/ words

The result of the match should be as follows.

The first three lines should collapse into a single line:
/next whatevertext
(given that "whatevertext" is the same in all three lines). This is because "/next", "/Next" and "/next/" are all similar, and I want only the lowercase one with no slash at the end.

The last two lines should be selected as well, because "/next/next" and "/next/" are different.

My attempted regex Pattern is:

Code
/((.*)(\/)? (.*)){1}/i


So what the pattern says is: zero or more characters (.*), optionally ending with a slash (\/)?, then a space, then zero or more characters (.*).

As I understand it, ((.*)(\/)? (.*)){1} means that everything in the brackets happens at most once, and case insensitivity is set using the i flag. However, upon testing I am not getting the result I am looking for.

I think the problem lies in the fact that I am looking for something that happens at most once, but it happens 3 times, so I would probably get only the last 2 distinct lines. So how do I go about it?
Any help is appreciated.
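
A quick way to see what the attempted pattern does is to run it against each sample line in turn. The sketch below is only an illustration: the array name @lines and the small {1} check at the end are assumptions added here, not part of the original post. It suggests that the pattern matches every line on its own, and that {1} means "exactly once" inside a single match attempt, so the pattern has no way to compare one line with another.

```perl
use strict;
use warnings;

# Sample lines from the post above.
my @lines = (
    '/next/ whatevertext',
    '/next whatevertext',
    '/Next watevertext',
    '/next/next wordswords',
    '/next/ words',
);

# The attempted pattern, tested against one line at a time.
my $pattern = qr/((.*)(\/)? (.*)){1}/i;
my $matched = grep { $_ =~ $pattern } @lines;
print "$matched of ", scalar(@lines), " lines match\n";

# {1} means "exactly once", so it adds nothing to an unquantified group.
my @exactly_once = grep { /^(?:ab){1}$/ } ('', 'ab', 'abab');
print "exactly once: @exactly_once\n";
```

Because each line matches independently, removing similar lines needs some state kept between lines (for example a hash of keys already seen) rather than a quantifier.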


BillKSmith
Veteran

Jan 7, 2011, 3:12 PM

Post #2 of 4 (4779 views)
Re: [pnajafi] regex removing similar entries

I think this is what you want. Note the /xms modifiers.


Code
use strict;
use warnings;

# Sample input from the first post, one record per line.
my $INTEXT =
    "/next/ whatevertext\n"
  . "/next whatevertext\n"
  . "/Next watevertext\n"
  . "/next/next wordswords\n"
  . "/next/ words\n";

# The input has exactly five lines, so each ".+\n" must consume one whole line.
my ($line2, $line45) = $INTEXT =~ /\A .+\n   # skip first line
    (.+\n)                                  # capture second line
    .+\n                                    # skip third line
    (.+\n.+\n) \Z                           # capture last two lines
    /xms;
my $out_text = $line2 . $line45;
print $out_text;
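
For readers unfamiliar with the modifiers mentioned above, here is a minimal sketch of what /s, /m and /x each change. The sample string and variable names are made up for illustration and are not taken from the thread.

```perl
use strict;
use warnings;

my $text = "one\ntwo\n";

# Without /s, "." stops at a newline; with /s it can cross it.
my ($no_s)   = $text =~ /(.+)/;     # "one"
my ($with_s) = $text =~ /(.+)/s;    # "one\ntwo\n"

# Without /m, "^" matches only at the start of the string;
# with /m it also matches after each embedded newline.
my @no_m   = $text =~ /^(\w+)/g;    # ("one")
my @with_m = $text =~ /^(\w+)/mg;   # ("one", "two")

# /x ignores unescaped whitespace and allows "#" comments in the pattern.
my ($second) = $text =~ / \n (\w+)  # word after the first newline
                        /x;         # "two"

print "no_s=$no_s with_m=@with_m second=$second\n";
```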

Good Luck,
Bill


pnajafi
New User

Jan 8, 2011, 9:48 PM

Post #3 of 4 (4735 views)
Re: [BillKSmith] regex removing similar entries

Thanks, BillKSmith.

I have not tested this yet, but the code seems to be written rather specifically for this one case. I am reading a file that has lines such as:

\dir\dir1\ someurl
\dir2\dir3 someurl2
\dir2 someotherurl3
\dir2\ someotherurl3
\dir3\ someotherurl4
\Dir3 someotherurl4

and the result of the match should be:

.....
\dir\dir1\ someurl
\dir2\dir3 someurl2
\dir2 someotherurl3
\dir3 someotherurl4
.....


Your code seems to be limited to those few sample lines, whereas the file I am reading could have an arbitrary number of lines.


BillKSmith
Veteran

Jan 9, 2011, 2:07 PM

Post #4 of 4 (4703 views)
Re: [pnajafi] regex removing similar entries

It now seems that you want to delete lines whose final field has already been seen. (That assumes that the missing 'h' in line 3 of your first post was a typo).

Do not try to use a single regular expression. Use grep.


Code
use strict;
use warnings;

my $INTEXT =
    '\dir\dir1\ someurl'   . "\n"
  . '\dir2\dir3 someurl2'  . "\n"
  . '\dir2 someotherurl3'  . "\n"
  . '\dir2\ someotherurl3' . "\n"
  . '\dir3\ someotherurl4' . "\n"
  . '\Dir3 someotherurl4'  . "\n";

my @lines = split /\n/, $INTEXT;

# Keep only the first line for each distinct final field.
my %seen;
my @results = grep { !seen($_) } @lines;
print "$_\n" for @results;

# Returns true if the line's final field has been seen before.
# Fields are compared case-insensitively (via uc).
sub seen {
    my ($line) = @_;
    my ($field) = $line =~ /(\S+)$/;
    return $seen{ uc $field }++;
}
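
The code above keeps the first line of each group unchanged, so the last line it prints is '\dir3\ someotherurl4' rather than the '\dir3 someotherurl4' asked for. The following is only a sketch of one way to get the requested output, under assumptions read off the examples in this thread: two lines are "similar" when their paths are equal after lowercasing and dropping a trailing slash or backslash, and their remaining text is identical; a line with no similar partner is printed unchanged. The helper normalize and the names %group and @order are hypothetical.

```perl
use strict;
use warnings;

my @lines = (
    '\dir\dir1\ someurl',
    '\dir2\dir3 someurl2',
    '\dir2 someotherurl3',
    '\dir2\ someotherurl3',
    '\dir3\ someotherurl4',
    '\Dir3 someotherurl4',
);

# Hypothetical helper: split a line into path and text, then lowercase
# the path and drop one trailing slash or backslash.
sub normalize {
    my ($line) = @_;
    my ($path, $text) = $line =~ /^(\S+)\s+(.*)$/
        or return;
    $path = lc $path;
    $path =~ s{[\\/]$}{} if length $path > 1;
    return ($path, $text);
}

# Group the original lines by normalized key, remembering first-seen order.
my (%group, @order);
for my $line (@lines) {
    my ($path, $text) = normalize($line) or next;
    my $key = "$path $text";
    push @order, $key if !exists $group{$key};
    push @{ $group{$key} }, $line;
}

# A group with several members collapses to its normalized form;
# a line without a similar partner is kept as written.
my @results;
for my $key (@order) {
    my @members = @{ $group{$key} };
    push @results, scalar(@members) > 1 ? $key : $members[0];
}
print "$_\n" for @results;
```

To process a real file, the same loop could read lines from a filehandle instead of the @lines array; only the hash of groups has to stay in memory.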

Good Luck,
Bill

 
 

