Stuck on a substitution

 



tanker
Novice

Mar 28, 2016, 11:05 AM

Post #1 of 9 (2209 views)

Hi all, I'm a new Perl user doing some on-the-job training. I'm writing a script to edit a text file. It uses a series of substitutions to modify the data in an array variable, and I've run into a roadblock. The following works.

Text example: <pick;1-1;0;0>

Intention: Delete any <pick> code, i.e. match "<pick" followed by any number of any characters, up to and including ">".


Code
s/<pick(.*)>//g for @data01;
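
A minimal sketch of how this substitution behaves when one line holds more than one tag. The sample line with two <pick> tags is an assumption for illustration, not taken from the actual file. The greedy ".*" runs to the last ">" on the line, while a negated class "[^>]*" stops at the first one.

```perl
use strict;
use warnings;

# Hypothetical line with two <pick> tags; not from the poster's data.
my @data01 = ('<pick;1-1;0;0>keep me<pick;2-2;0;0>tail');

# Greedy version from the post: .* runs to the LAST ">" on the line.
my @greedy = @data01;
s/<pick(.*)>//g for @greedy;

# Negated class: stops at the first ">" after each "<pick".
my @bounded = @data01;
s/<pick[^>]*>//g for @bounded;

print "greedy:  $greedy[0]\n";    # prints "greedy:  tail"
print "bounded: $bounded[0]\n";   # prints "bounded: keep metail"
```

If every line carries at most one tag, both forms give the same result.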


So I tried something similar to match "<DOC>" and any character not a "{", up to and including next "{" and replace it with "{".

Text example: <DOC>50FWRMMED84526R CD-5112-88<pa>{I}

Between the "<DOC>" and the "{" could be anything. I'm trying to match the "{" to give it an ending "anchor". The "{" marks the beginning of the next piece of data. I've tried more combinations than I can remember but I think this was my initial trial.


Code
s/<DOC>(.*[^\{])\{/\{/g for @data01;


Can someone help me? What am I doing wrong? How would I write a match that would:
Match "<DOC>" and any character not a "{", up to and including the first occurrence of "{" and replace it with "{".

Second thought: I thought that matching any character not a "{" and then including a "{" might be throwing it off, so I tried the following with no success:


Code
s/<DOC>(.*)\{/\{/g for @data01;


Thanks in advance,
Stan


BillKSmith
Veteran

Mar 28, 2016, 12:17 PM

Post #2 of 9 (2204 views)
Re: [tanker] Stuck on a substitution

Neither "<" nor "{" are special characters in a regular expression. You can simply change "<" to "{" in your first example. (The capturing parentheses are not needed.)


Code
use strict; 
use warnings;
my $sample = '<DOC>50FWRMMED84526R CD-5112-88<pa>{I}';

$sample =~ s/<DOC>.*{/{/g;

print $sample, "\n";

Good Luck,
Bill


tanker
Novice

Mar 28, 2016, 12:47 PM

Post #3 of 9 (2202 views)
Re: [BillKSmith] Stuck on a substitution

No joy :(

I'm using O'Reilly's "Programming Perl". According to the book the "Dirty Dozen" that require escaping are:

\ | ( ) [ { ^ $ * + ? .

I tried your suggestion in addition to adding an escape for the "{"


Code
s/<DOC>.*\{/\{/g


Neither worked.

Will continue to read and will check back later. Hell bent to crack this nut :)

Thanks for the help.


FishMonger
Veteran / Moderator

Mar 28, 2016, 12:55 PM

Post #4 of 9 (2200 views)
Re: [tanker] Stuck on a substitution

Can you provide a larger sample of the file you need to process as an attachment?

The { brace does not need to be escaped when it's inside a character class. Also, .* is a greedy quantifier, so it matches as much as it can. You should probably use a non-greedy quantifier or a negated character class instead.


Code
$sample =~ s/<DOC>[^{]+//g;

The g modifier may not be needed, but I'd have to see a more realistic data sample to be sure.
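
A small sketch of when the g modifier matters, assuming a line that holds two <DOC> records. That input is hypothetical, since the real data has not been posted.

```perl
use strict;
use warnings;

# Hypothetical line holding two <DOC> records; not from the poster's file.
my $line = '<DOC>aaa<pa>{I}first<DOC>bbb{J}second';

# Without /g only the first <DOC> run is removed.
(my $once = $line) =~ s/<DOC>[^{]+//;

# With /g every <DOC> run on the line is removed.
(my $all = $line) =~ s/<DOC>[^{]+//g;

print "once: $once\n";   # prints "once: {I}first<DOC>bbb{J}second"
print "all:  $all\n";    # prints "all:  {I}first{J}second"
```

If each string contains a single <DOC>, the two calls behave the same.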


(This post was edited by FishMonger on Mar 28, 2016, 12:58 PM)


Laurent_R
Veteran / Moderator

Mar 28, 2016, 2:48 PM

Post #5 of 9 (2193 views)
Re: [tanker] Stuck on a substitution

Most of the time, you should avoid the ".*" combination, which is dangerous because it often matches more than you want. You should either use a non-greedy (or frugal) quantifier ".*?" or use a negated character class instead of the ".".

This use of a negated character class works for me:

Code
my $sample = '<DOC>50FWRMMED84526R CD-5112-88<pa>{I}'; 
$sample =~ s/<DOC>[^{]*//; # sample is now {I}


And this use of a frugal quantifier also:

Code
my $sample = '<DOC>50FWRMMED84526R CD-5112-88<pa>{I}'; 
$sample =~ s/<DOC>.*?{/{/; # sample is now {I}


If this does not work for you, then show more of your input.
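
One case where the two forms differ, sketched under the assumption that a newline can sit between "<DOC>" and the "{" (the thread's sample has none): "[^{]" matches a newline, whereas "." does not unless the s modifier is given.

```perl
use strict;
use warnings;

# Hypothetical value with a newline between <DOC> and the brace.
my $sample = "<DOC>50FWRMMED84526R\nCD-5112-88<pa>{I}";

(my $negated  = $sample) =~ s/<DOC>[^{]*//;     # [^{] also matches "\n"
(my $frugal   = $sample) =~ s/<DOC>.*?\{/{/;    # "." stops at "\n": no match
(my $frugal_s = $sample) =~ s/<DOC>.*?\{/{/s;   # /s lets "." match "\n"

print "negated:  $negated\n";                   # prints "negated:  {I}"
print "frugal unchanged\n" if $frugal eq $sample;
print "frugal/s: $frugal_s\n";                  # prints "frugal/s: {I}"
```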


BillKSmith
Veteran

Mar 28, 2016, 3:23 PM

Post #6 of 9 (2191 views)
Re: [tanker] Stuck on a substitution


Quote
I'm using O'Reilly's "Programming Perl". According to the book the "Dirty Dozen" that require escaping are:

\ | ( ) [ { ^ $ * + ? .


You are right! When my sample code "worked" (Did you run it as posted?), I did not bother to check the documentation. My "{" should be escaped. Note that Laurent got away with the same mistake in his final example.

Please post a complete example (similar to mine). Show us the exact output that you expect. All of the responses so far do solve the problem as we understand it.
Good Luck,
Bill
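
For reference, a minimal sketch of the escaped form on the sample string from this thread. Escaping the brace keeps it an unambiguous literal, which matters because newer Perl releases have become stricter about unescaped "{" in patterns. The $anchor variable is a hypothetical name used only to show \Q...\E quoting.

```perl
use strict;
use warnings;

my $sample = '<DOC>50FWRMMED84526R CD-5112-88<pa>{I}';

# Escaped brace: a literal "{" in the pattern.
(my $escaped = $sample) =~ s/<DOC>.*?\{/{/;

# Same idea with the anchor held in a variable (hypothetical name);
# \Q...\E quotes any regex metacharacters it contains.
my $anchor = '{';
(my $quoted = $sample) =~ s/<DOC>.*?\Q$anchor\E/$anchor/;

print "$escaped\n";   # prints "{I}"
print "$quoted\n";    # prints "{I}"
```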


Laurent_R
Veteran / Moderator

Mar 29, 2016, 7:37 AM

Post #7 of 9 (2185 views)
Re: [BillKSmith] Stuck on a substitution


In Reply To
Note that Laurent got away with the same mistake in his final example.


Yep, I knew that, but I made the mistake anyway in my quick test, and I did not look any further since it worked fine...


tanker
Novice

Mar 29, 2016, 8:00 AM

Post #8 of 9 (2183 views)
Re: [FishMonger and everyone else] Stuck on a substitution

FishMonger, your method worked. To be safe I escaped the "{". I also tried adding a non greedy addition but it didn't seem necessary. The whole thing was to find a <DOC> and delete everything between that and the first occurrence of a "{". That code also worked for the <DOC> that came at the end of the string.

Many sources I have read state that in Perl there is more than one way to do it. That can get confusing for us beginners. :)

Thanks all for the help,
Stan

Code I ended up using:

Code
$data02 =~ s/<DOC>[^\{]+//g;
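
A short sketch of why this form also handles a <DOC> at the end of the string. The record below is made up for illustration. Because the pattern does not require a "{" to follow, the trailing <DOC> run is deleted too, whereas a pattern that insists on a closing "{" leaves it in place.

```perl
use strict;
use warnings;

# Hypothetical record: one <DOC> followed by "{", one at the very end.
my $data02 = '<DOC>abc<pa>{I}data<DOC>trailing junk';

# Negated class: no "{" required, so the trailing run is removed as well.
(my $class_form = $data02) =~ s/<DOC>[^\{]+//g;

# Frugal form that requires a "{": the trailing run does not match.
(my $brace_form = $data02) =~ s/<DOC>.*?\{/{/g;

print "$class_form\n";   # prints "{I}data"
print "$brace_form\n";   # prints "{I}data<DOC>trailing junk"
```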



(This post was edited by tanker on Mar 29, 2016, 12:34 PM)


Laurent_R
Veteran / Moderator

Mar 29, 2016, 12:21 PM

Post #9 of 9 (2173 views)
Re: [tanker] Stuck on a substitution


In Reply To
To be safe I escaped the "{". I also tried adding a non greedy addition but it didn't seem necessary.


You don't need to escape a "{" within a character class.

And, in your specific case, you don't need a non-greedy quantifier if you're using a [^{] negated character class.

A non-greedy quantifier would make a difference with a string such as, for example: "abcd {I} efg {J}".

The greedy /.*\{/ regex would match "abcd {I} efg {", whereas the frugal /.*?\{/ would match only "abcd {". The difference is quite significant.
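
The difference can be checked with a quick sketch. The capture groups are added here only so the matched text can be printed. The third pattern shows that a negated class stops at the same place as the frugal quantifier.

```perl
use strict;
use warnings;

my $text = 'abcd {I} efg {J}';

my ($greedy)  = $text =~ /(.*\{)/;     # runs to the last "{"
my ($frugal)  = $text =~ /(.*?\{)/;    # stops at the first "{"
my ($negated) = $text =~ /([^{]*\{)/;  # cannot cross a "{" at all

print "greedy:  '$greedy'\n";    # prints "greedy:  'abcd {I} efg {'"
print "frugal:  '$frugal'\n";    # prints "frugal:  'abcd {'"
print "negated: '$negated'\n";   # prints "negated: 'abcd {'"
```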

 
 

