Corrected - Please read again: Regexp to remove a matched pattern but...

 



Chupo_cro
Novice


Aug 25, 2012, 5:19 AM

Post #1 of 10 (7478 views)

Can someone please help me construct a regexp that removes a matched pattern from a string while leaving no more than one space in the resulting string at the boundaries of the removed substring? The rest of the string should stay untouched.

Note: The blue text is a condition I forgot to insert in the initial text.

Example:

Code
Input: 

one two some_pattern three four
one two some_pattern three four
one twosome_pattern three four
one two some_patternthree four
one two three four some_pattern
some_pattern one two three four
onetwosome_patternthreefour
one two some_pattern three four

Desired output:

one two three four
one two three four
one two three four
one two three four
one two three four
one two three four
onetwothreefour
one two three four


Just removing the

Code
some_pattern

from the

Code
one two some_pattern three four

would result in the string

Code
one two  three four

which has two consecutive spaces.
Chupo_cro

(This post was edited by Chupo_cro on Aug 25, 2012, 9:34 AM)


Laurent_R
Veteran / Moderator

Aug 25, 2012, 7:57 AM

Post #2 of 10 (7471 views)
Re: [Chupo_cro] Regexp to remove a matched pattern but...

Hi,

a suggestion:


Code
$line =~ s/some_pattern//g; # removing the pattern 
$line =~s/(\s)\s+/$1/g; # changing multi space into one space
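
A minimal sketch of what the two substitutions above do when run together. The wrapper name `remove_then_squeeze` and the second sample line (with two spaces between "one" and "two") are assumptions made for illustration, not part of the thread. The point to notice is that the second substitution collapses every whitespace run in the line, not only the one left behind by the removed pattern.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the two substitutions suggested above.
sub remove_then_squeeze {
    my ($line) = @_;
    $line =~ s/some_pattern//g;    # remove the pattern
    $line =~ s/(\s)\s+/$1/g;       # collapse each whitespace run to its first character
    return $line;
}

# Single spaces everywhere: only the gap left by the pattern is collapsed.
print remove_then_squeeze('one two some_pattern three four'), "\n";   # one two three four

# Two spaces between "one" and "two": that unrelated run is collapsed as well.
print remove_then_squeeze('one  two some_pattern three four'), "\n";  # one two three four
```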



Chupo_cro
Novice


Aug 25, 2012, 8:50 AM

Post #3 of 10 (7468 views)
Re: [Laurent_R] Regexp to remove a matched pattern but...


In Reply To
Hi,

a suggestion:


Code
$line =~ s/some_pattern//g; # removing the pattern 
$line =~s/(\s)\s+/$1/g; # changing multi space into one space


Yes, that could be a solution. However, I don't want to 'touch' the unmatched part of the string. I am sorry my explanation wasn't precise on that point. Here is an additional example to demonstrate the desired output:

Code
Input:  

one two some_pattern three four

Desired output:

one two three four


That is - all of the remaining (unmatched) substring should stay as it was.

Thank you for the reply and sorry for the inconvenience.
Chupo_cro


Laurent_R
Veteran / Moderator

Aug 25, 2012, 11:41 AM

Post #4 of 10 (7456 views)
Re: [Chupo_cro] Regexp to remove a matched pattern but...

Hi,

then you can just replace {one or more spaces, some_pattern, one or more spaces} by one space. Something like this:


Code
$line =~ s/\s+some_pattern\s+/ /g;



Laurent_R
Veteran / Moderator

Aug 25, 2012, 11:45 AM

Post #5 of 10 (7455 views)
Re: [Laurent_R] Regexp to remove a matched pattern but...

Well, looking again at your examples, I see that sometimes there is no space before or after some_pattern.

It should be rather like this:


Code
$line =~ s/\s*some_pattern\s*/ /g;
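
A small sketch for trying this substitution against a few of the inputs from the first post. The wrapper name `replace_with_space` is hypothetical, and the first sample assumes three spaces on each side of the pattern. Because the replacement is always a single space, the substitution also inserts a space where there was none, and leaves a space at the start of the line when the pattern comes first.

```perl
use strict;
use warnings;

# Hypothetical wrapper around the substitution above, for experimenting.
sub replace_with_space {
    my ($line) = @_;
    $line =~ s/\s*some_pattern\s*/ /g;
    return $line;
}

for my $input (
    'one two   some_pattern   three four',   # spaces on both sides
    'onetwosome_patternthreefour',           # no spaces at all
    'some_pattern one two three four',       # pattern at the start
) {
    printf "[%s] => [%s]\n", $input, replace_with_space($input);
}
# [one two   some_pattern   three four] => [one two three four]
# [onetwosome_patternthreefour] => [onetwo threefour]
# [some_pattern one two three four] => [ one two three four]
```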



Chupo_cro
Novice


Aug 26, 2012, 5:28 PM

Post #6 of 10 (7419 views)
Re: [Laurent_R] Regexp to remove a matched pattern but...


In Reply To
Well, looking again at your examples, I see that sometimes there is no space before or after some_pattern.

It should be rather like this:


Code
$line =~ s/\s*some_pattern\s*/ /g;


Could you please check whether this produces the desired output for the next-to-last example in my first post (the one just before the blue example)? That is:


Code
Input: 

onetwosome_patternthreefour

Output:

onetwothreefour


The regexp you wrote seems to handle every case except the one where there is no space either before or after the pattern.

In words: if there are no spaces before or after the pattern, then the resulting string should not contain a space at the place of the removed pattern either.

I am sorry that my explanation in words wasn't thorough enough to define the problem completely. The input/output examples illustrate better what I would like to achieve.

Thank you for your time!
Best Regards,
Chupo_cro


Laurent_R
Veteran / Moderator

Aug 26, 2012, 11:38 PM

Post #7 of 10 (7415 views)
Re: [Chupo_cro] Regexp to remove a matched pattern but...

Then you probably need several regexes to handle the different cases, something like this:

$line =~ s/\s+some_pattern[^\s]/ /g;
$line =~ s/[^\s]some_pattern\s+/ /g;
$line =~ s/\s+some_pattern\s+/ /g;
$line =~ s/some_pattern//g;
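
One thing to watch in the first two substitutions: `[^\s]` is part of the match, so the neighbouring non-space character is replaced along with the pattern. A hedged sketch of the difference, using lookahead and lookbehind assertions as one possible alternative (the variable names are made up for the example, and this is not the code from the thread):

```perl
use strict;
use warnings;

my $consumed = 'one two some_patternthree four';
$consumed =~ s/\s+some_pattern[^\s]/ /g;      # [^\s] matches the 't', so it is replaced too

my $kept = 'one two some_patternthree four';
$kept =~ s/\s+some_pattern(?=\S)/ /g;         # lookahead checks for a non-space without consuming it

my $before = 'one twosome_pattern three four';
$before =~ s/(?<=\S)some_pattern\s+/ /g;      # lookbehind does the same on the left side

print "$consumed\n";   # one two hree four
print "$kept\n";       # one two three four
print "$before\n";     # one two three four
```

The assertions `(?=\S)` and `(?<=\S)` only test the adjacent character, so the text around the pattern is left as it was.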


Chupo_cro
Novice


Aug 28, 2012, 2:56 AM

Post #8 of 10 (7405 views)
Re: [Laurent_R] Regexp to remove a matched pattern but...


In Reply To
Then you probably need several regexes to handle the different cases, something like this:

$line =~ s/\s+some_pattern[^\s]/ /g;
$line =~ s/[^\s]some_pattern\s+/ /g;
$line =~ s/\s+some_pattern\s+/ /g;
$line =~ s/some_pattern//g;

I haven't done the testing yet, but I am pretty sure it is going to handle all of the inputs (space(s)+not-space, not-space+space(s), space(s)+space(s), no spaces).

Thank you very much for your time! I appreciate your help,

Best Regards
Chupo_cro


Laurent_R
Veteran / Moderator

Aug 28, 2012, 4:44 AM

Post #9 of 10 (7396 views)
Re: [Chupo_cro] Regexp to remove a matched pattern but...

Yes, do the testing. I haven't tested it at all; this was just a quick answer off the top of my head.


BillKSmith
Veteran

Aug 28, 2012, 6:04 AM

Post #10 of 10 (7387 views)
Re: [Chupo_cro] Regexp to remove a matched pattern but...

I am sure that Laurent is on the right track. This will take more than one regex. I recommend that you implement the solution as a subroutine. For testing purposes, it may be worth the extra effort to package that subroutine as a module.

The subroutine should be tested with one of Perl's test modules; see http://search.cpan.org/~rgarcia/perl-5.10.0/lib/Test/Tutorial.pod. Your specification is already in exactly the form that you need.

EDITS: I have replaced the code in this post. The failures reported by the original were correct. I have improved the output of the test and replaced the regular expressions with a new set. The new subroutine now passes all but the blue test case. That test case appears to have too many spaces between 'one' and 'two', so I have taken the liberty of changing it in my test. The code now posted below passes all tests (including the extra one I describe below).


Code
use strict;
use warnings;
use Test::More qw( no_plan );

my %fixed_line = (
    'one two some_pattern three four' => 'one two three four',
    'one two some_pattern three four' => 'one two three four',
    'one twosome_pattern three four' => 'one two three four',
    'one two some_patternthree four' => 'one two three four',
    'one two three four some_pattern' => 'one two three four',
    "one two three four some_pattern\n" => "one two three four\n",
    'some_pattern one two three four' => 'one two three four',
    'onetwosome_patternthreefour' => 'onetwothreefour',
    'one two some_pattern three four' => 'one two three four',
);

foreach my $line (keys %fixed_line) {
    my $expected = $fixed_line{$line};
    my $computed = fix($line);
    is( $computed, $expected, "'$line' => '$computed'" );
}

sub fix {
    my ($line) = @_;
    $line =~ s/(?:^\s*some_pattern\s*)|(?:\s*some_pattern$)//;
    $line =~ s/(?<=\S)some_pattern(?=\S)//;
    $line =~ s/\s*some_pattern\s*/ /;
    return $line;
}


There are other issues which should be tested. Regexps treat tabs and newlines as whitespace. That may not be what you want (e.g. your fifth case would behave quite differently if there were a newline at the end).
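
A minimal sketch of the whitespace point, assuming an input where a tab precedes the pattern and a newline ends the line (this input and the variable names are invented for illustration). `\s` matches the tab, while a literal space in the regex does not; in both cases `$` matches just before the trailing newline, so the newline itself is kept.

```perl
use strict;
use warnings;

my $input = "one two\tsome_pattern\n";

# \s* also swallows the tab in front of the pattern.
(my $any_whitespace = $input) =~ s/\s*some_pattern$//;

# A literal space only matches spaces, so the tab survives.
(my $spaces_only = $input) =~ s/ *some_pattern$//;

print $any_whitespace eq "one two\n"   ? "tab removed\n" : "tab kept\n";   # tab removed
print $spaces_only    eq "one two\t\n" ? "tab kept\n"    : "tab removed\n"; # tab kept
```

Which behaviour is right depends on whether tabs count as "spaces" in the specification, so it is worth adding a test case for it either way.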

I suspect that you will continue to discover special cases for some time. This kind of test will assure you that proposed fixes do not break the old code.
Good Luck,
Bill

(This post was edited by BillKSmith on Aug 28, 2012, 9:25 AM)

 
 

