Find and replace

 



hwnd
User

Aug 24, 2013, 5:18 AM

Post #1 of 7 (1145 views)
Find and replace

I am having difficulty replacing only the parentheses that are outside the items enclosed in angle brackets (<...>); parentheses inside those items should be left untouched.

Example:


Code
 my $str = 'this (is) a test of <a te(st)ing> of a (type)';



I am looking to get output as expected:


Code
 this 'is' a test of <a te(st)ing> of a 'type'



I can match them using a regular expression, but when I use the regex in the s/// operator it goes crazy.


Code
 (?:(?:<[^>]*>)|([\(\)]))
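A note on why the substitution may misbehave. The replacement side is not shown in the post, so this sketch assumes it was a plain quote character. The alternation also matches whole <...> items, so s///g replaces those as well. One common workaround is to capture the bracketed item and put it back unchanged, replacing only when the parenthesis branch matched. The sub name quote_outer is hypothetical.

```perl
use strict;
use warnings;

my $str = 'this (is) a test of <a te(st)ing> of a (type)';

# Naive attempt (assumed replacement): the <...> branch matches too,
# so every bracketed item is replaced by a quote as well.
(my $naive = $str) =~ s/(?:(?:<[^>]*>)|([\(\)]))/'/g;
print "$naive\n";

# Capture the <...> item instead. If it matched, put it back unchanged;
# otherwise the match was a bare parenthesis and becomes a quote.
sub quote_outer {
    my ($text) = @_;
    $text =~ s/(<[^>]*>)|[()]/defined($1) ? $1 : "'"/ge;
    return $text;
}

print quote_outer($str), "\n";
```

This relies on the /e modifier, so it only helps where the replacement can be an expression (Perl itself, not a plain find-and-replace dialog).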



2teez
Novice

Aug 24, 2013, 8:04 AM

Post #2 of 7 (1138 views)
Re: [hwnd] Find and replace

Hi,
Using only the data you posted, the following could do it for you:

Code
$str=~s{\s+?\((.+?)\)(\s+)?}{ '$1' }g;


You can put

Code
use re 'debug';

at the top of your script to see what is going on.
Hope this helps.
*Update*
In case you want an explanation of a regex, you could look at this module: YAPE::Regex::Explain.
Below is how it explains the above regex:

Quote
this 'is' a test of <a te(st)ing> of a 'type'

The regular expression:

(?-imsx:\s+?\((.+?)\)(\s+)?)

matches as follows:

NODE                     EXPLANATION
----------------------------------------------------------------------
(?-imsx:                 group, but do not capture (case-sensitive)
                         (with ^ and $ matching normally) (with . not
                         matching \n) (matching whitespace and #
                         normally):
----------------------------------------------------------------------
  \s+?                     whitespace (\n, \r, \t, \f, and " ") (1 or
                           more times (matching the least amount
                           possible))
----------------------------------------------------------------------
  \(                       '('
----------------------------------------------------------------------
  (                        group and capture to \1:
----------------------------------------------------------------------
    .+?                      any character except \n (1 or more times
                             (matching the least amount possible))
----------------------------------------------------------------------
  )                        end of \1
----------------------------------------------------------------------
  \)                       ')'
----------------------------------------------------------------------
  (                        group and capture to \2 (optional
                           (matching the most amount possible)):
----------------------------------------------------------------------
    \s+                      whitespace (\n, \r, \t, \f, and " ") (1
                             or more times (matching the most amount
                             possible))
----------------------------------------------------------------------
  )?                       end of \2 (NOTE: because you are using a
                           quantifier on this capture, only the LAST
                           repetition of the captured pattern will be
                           stored in \2)
----------------------------------------------------------------------
)                        end of grouping
----------------------------------------------------------------------



(This post was edited by 2teez on Aug 24, 2013, 8:19 AM)


hwnd
User

Aug 24, 2013, 8:18 AM

Post #3 of 7 (1133 views)
Re: [2teez] Find and replace

Yes, that works, except that it also matches the parentheses inside the <> tags if I have a string like:


Code
this (is) a test of <te(st)ing > of a (type) of (that) 
with <a (other stuff)> as well (plus quotes) and stuff foo bar baz <img (djfdjfd)> of (hey)



Laurent_R
Veteran / Moderator

Aug 24, 2013, 9:35 AM

Post #4 of 7 (1116 views)
Re: [hwnd] Find and replace

Hmm, regexes might not be the right tool for this type of thing with nested symbols such as <> [] {} () "" (at least not pure regexes). You can still handle very simple cases with regexes, but it quickly becomes unmanageable. You'll need to use a real parser, or you can build yourself a simple finite state machine (automaton) that reads the input progressively and records the current state at each point.
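A minimal sketch of the state-machine idea, assuming the <...> items never nest and every '<' is eventually closed by a '>'. The sub name quote_outside_tags and the variable names are hypothetical. The only state recorded is whether the current character is inside an angle-bracketed item.

```perl
use strict;
use warnings;

# Hypothetical helper: walks the string one character at a time and
# tracks a single piece of state, whether we are inside <...>.
sub quote_outside_tags {
    my ($text) = @_;
    my $in_tag = 0;
    my $out    = '';
    for my $ch (split //, $text) {
        if    ($ch eq '<') { $in_tag = 1 }
        elsif ($ch eq '>') { $in_tag = 0 }
        # Outside a tag, parentheses become quotes; everything else is copied.
        $out .= (!$in_tag && $ch =~ /[()]/) ? "'" : $ch;
    }
    return $out;
}

print quote_outside_tags('this (is) a test of <a te(st)ing> of a (type)'), "\n";
```

Because the state is explicit, extending this to other delimiters would mean adding more states rather than making a regex more intricate.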


hwnd
User

Aug 24, 2013, 11:43 AM

Post #5 of 7 (1107 views)
Re: [Laurent_R] Find and replace

Yes, if I could use a parser, it would be easy. In my case, the editor I'm using to find and replace only allows regexes.


Laurent_R
Veteran / Moderator

Aug 24, 2013, 11:50 AM

Post #6 of 7 (1106 views)
Re: [hwnd] Find and replace

Sorry, I naively thought this was a question related to Perl.

;)


Zhris
Enthusiast

Aug 24, 2013, 9:32 PM

Post #7 of 7 (1093 views)
Re: [hwnd] Find and replace

Hi,

You could possibly perform this task in multiple steps (three or four separate regexps).

Of course, with real-world data you may bump into issues that aren't represented by your example data. This sketch does not cover them: it supports only one set of parentheses per < > wrapper, has no support for unmatched parentheses, etc.

Breakdown (Perl-ified):

Code
>>> Temporarily swap the parentheses inside triangle brackets for another bracket type, quote the remaining parentheses, then swap back <<<

1. Check that the substitute bracket type is not already in use inside triangle brackets. I have tested for square brackets.
m/<(.*?)[\[\]](.*?)>/

2. Replace parentheses wrapped in triangle brackets with square brackets.
s/(<(?:.*?))\((.*?)\)((?:.*?)>)/$1\[$2\]$3/g

3. Quotify the leftover parentheses.
s/\((.*?)\)/'$1'/g

4. Replace square brackets wrapped in triangle brackets with parentheses.
s/(<(?:.*?))\[(.*?)\]((?:.*?)>)/$1($2)$3/g


Code (pure Perl):

Code
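use strict;
use warnings;

my $str = "this (is) a test of <a te(st)ing> of a (type)";

# Abort if the placeholder characters [ ] already occur after a '<'.
die "not unique enough\n" if ( $str =~ m/<(.*?)[\[\]](.*?)>/ );

$str =~ s/(<(?:.*?))\((.*?)\)((?:.*?)>)/$1\[$2\]$3/g;   # hide ( ) inside < > as [ ]
$str =~ s/\((.*?)\)/'$1'/g;                             # quote the remaining ( )
$str =~ s/(<(?:.*?))\[(.*?)\]((?:.*?)>)/$1($2)$3/g;     # restore ( ) inside < >

print "$str\n";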
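The three-step approach above handles one parenthesised group per < > wrapper. A possible variation, assuming wrappers never nest and never contain a stray '<' or '>', swaps the characters inside each wrapper with tr///, which copes with any number of groups per wrapper. The sub name quote_outer_parens is hypothetical, and this is only a sketch.

```perl
use strict;
use warnings;

sub quote_outer_parens {
    my ($str) = @_;

    # Same kind of guard as above: square brackets inside <...> would be
    # turned into parentheses by the final step.
    die "not unique enough\n" if $str =~ /<[^<>]*[\[\]][^<>]*>/;

    # 1. Inside every <...> item, hide ( ) as [ ].
    $str =~ s{(<[^<>]*>)}{ (my $tag = $1) =~ tr/()/[]/; $tag }ge;
    # 2. Quote the parentheses that are left.
    $str =~ s/\((.*?)\)/'$1'/g;
    # 3. Inside every <...> item, turn [ ] back into ( ).
    $str =~ s{(<[^<>]*>)}{ (my $tag = $1) =~ tr/[]/()/; $tag }ge;

    return $str;
}

print quote_outer_parens('this (is) a <te(st)ing (x)> of (that)'), "\n";
```

Using [^<>]* instead of .*? keeps each match inside a single wrapper, so a wrapper without parentheses cannot pull in parentheses that follow it.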


My preference would be to do this in Perl ;). The example below is rough and probably inefficient, but it shows the comparative ease of this approach. It is also likely to cope better with real-world data:

Code
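use strict;
use warnings;

my $str = "this (is) a test of <a te(st)ing> of a (type)";
$str =~ s/(.)/quotify($1)/eg;   # pass every character through quotify()
print "$str\n";

sub quotify
{
    my $c = shift;
    # The flip-flop (..) turns true at '<' and stays true up to and including
    # the next '>', so parentheses inside < > are left alone.
    $c = "'" unless (($c eq '<' .. $c eq '>') || $c !~ /[()]/);
    return $c;
}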


Finally, it could be possible to use lookahead / lookbehind assertions to produce a single regexp (I need to brush up on these).
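One way the lookahead idea might look, as a sketch rather than a tested solution for real data: replace a parenthesis only when it is not followed by a '>' that arrives before any other '<' or '>'. This assumes every '<' has a matching '>' and that no stray '>' appears outside the bracketed items. Whether a given editor supports negative lookahead is also an assumption.

```perl
use strict;
use warnings;

my $str = 'this (is) a test of <a te(st)ing> of a (type)';

# Replace a parenthesis only if it is NOT followed by a '>' that
# arrives before any other '<' or '>', i.e. it is not inside <...>.
(my $quoted = $str) =~ s/[()](?![^<>]*>)/'/g;

print "$quoted\n";
```

Since the replacement is a plain quote character, the same find pattern could in principle be pasted into a regex-only find-and-replace dialog.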

Hope this helps.

Chris


(This post was edited by Zhris on Aug 24, 2013, 10:40 PM)

 
 

