Perl - Replace is not working when it a new line

 



sunils2020
New User

Jun 16, 2014, 8:50 AM

Post #1 of 7 (7359 views)

Hi, I am a newbie. This is the requirement I have, and the command I tried:

perl -0777 -p -i -e '/(?s)^<style.*style>/<style\r\n tyle>/g' test

My input text:
---------------
<html
hi
hello
html>

Could you please help me with this and let me know why it is not working?


Laurent_R
Veteran / Moderator

Jun 16, 2014, 10:41 AM

Post #2 of 7 (7282 views)
Re: [sunils2020] Perl - Replace is not working when it a new line

Please explain what you have in your file and what you want to have instead.

There are a number of defects in what you have:
- Your regular expression is not doing anything;
- It probably does not even compile;
- Your regular expression is not looking even remotely like what you have in your text file.
- The ".*" part of your regex is very likely to be wrong: because it is greedy, it might match a much longer section of your input than what you want.
- I have no idea what this -0777 option of the command line is supposed to be.
- Your input text also looks pretty weird.
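
To illustrate the point about ".*" above, here is a minimal sketch comparing the greedy ".*" with the non-greedy ".*?". The sample text and the variable names are made up for the illustration, and the "s" modifier is only there so that "." can also match newlines:

```perl
use strict;
use warnings;

# Hypothetical sample: two blocks with a line in between that should survive.
my $text = "<style\nAAA\nstyle>\nkeep me\n<style\nBBB\nstyle>\n";

# Greedy: ".*" runs to the LAST "style>", swallowing "keep me" as well.
(my $greedy = $text) =~ s/<style.*style>/<style\nstyle>/s;

# Non-greedy: ".*?" stops at the FIRST "style>" after each "<style".
(my $lazy = $text) =~ s/<style.*?style>/<style\nstyle>/gs;

print "greedy:\n$greedy\n";
print "non-greedy:\n$lazy\n";
```

With the greedy version, everything between the first "<style" and the last "style>" is replaced in one go, which is rarely what is wanted when a file holds several blocks.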

If you want to do a substitution, the correct syntax is something like this:


Code
perl -pi.bak -e 's/good morning/good bye/g;' filename


I can't help you more because you did not really say what you have and what you want.


sunils2020
New User

Jun 16, 2014, 1:15 PM

Post #3 of 7 (7180 views)
Re: [Laurent_R] Perl - Replace is not working when it a new line

Thank you for your reply Laurent.

I have an HTML file. It contains many style tags like the ones below:

test.html
<style
---
xxxxxxx
---
style>
<style
----
---
-
-
-
-
xxxxx
style>


The requirement is to find each starting tag "<style" and ending tag "style>", remove all the content between them, and end up with the following:
<style
style>
<style
style>

Note: the search string spans multiple lines.


Laurent_R
Veteran / Moderator

Jun 16, 2014, 2:34 PM

Post #4 of 7 (7119 views)
Re: [sunils2020] Perl - Replace is not working when it a new line

But that's not what style tags look like in HTML. They don't end with a bare "style>"; the closing tag is "</style>".

Well, anyway, it is difficult to help you if you don't provide actual data, especially when it comes to regular expressions, where you really have to know very precisely what the data looks like.

My best advice is that you probably want to look at the "m" (multiline) and "s" (single-line) regex modifiers. A short example under the Perl debugger:

Code
DB<1> $string = "<style foo\nbar\nbaz>";

DB<2> p $string
<style foo
bar
baz>
DB<3> $string =~ s/<style[^>]+>/<style replacement>/mg;

DB<4> p $string
<style replacement>


Here, I have a string that spans three lines and looks like this:


Code
<style foo 
bar
baz>


The regex substitution:

Code
$string =~ s/<style[^>]+>/<style replacement>/mg;


says to replace anything in the string that starts with "<style", followed by as many non-">" characters as possible, followed by a ">", with "<style replacement>". The "m" modifier puts the regex in multiline mode, which only changes where "^" and "$" match; it is not strictly needed here, because the negated class "[^>]" already matches newlines. The "g" modifier says that the substitution should be applied to all occurrences of the search pattern.
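
As a small sketch of what the modifiers do and do not change, the following reuses the three-line string from the debugger session. The variable names are only for the illustration:

```perl
use strict;
use warnings;

my $string = "<style foo\nbar\nbaz>";

# A negated class such as [^>] matches newlines with no modifier at all.
(my $plain = $string) =~ s/<style[^>]+>/<style replacement>/g;

# "." does not match a newline unless the "s" modifier is given;
# the "m" modifier makes no difference to ".".
my $dot_none = $string =~ /<style.+>/  ? 1 : 0;
my $dot_m    = $string =~ /<style.+>/m ? 1 : 0;
my $dot_s    = $string =~ /<style.+>/s ? 1 : 0;

# "m" changes "^" (and "$"): "^" can then match after each embedded newline.
my @line_starts_m    = $string =~ /^(\w+)/mg;
my @line_starts_none = $string =~ /^(\w+)/g;

print "plain substitution: $plain\n";
print "dot matches across lines: none=$dot_none m=$dot_m s=$dot_s\n";
print "words at line starts with m: @line_starts_m\n";
print "words at line starts without m: ", scalar(@line_starts_none), "\n";
```

So for a pattern built on ".*" the "s" modifier is the one that matters, while "m" is only useful when the pattern is anchored with "^" or "$".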

I hope this makes sense to you. For this to work, you have to slurp your whole file into one scalar variable; it won't work if you process the file line by line. You can use the File::Slurp module, or localize the input record separator with the following syntax:


Code
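my $string;
{
    local $/;                  # undefine the input record separator inside this block only
    $string = <FILEHANDLE>;    # a single read now returns the whole file
}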


or possibly the more idiomatic:

Code
my $string = do { local $/; <FILEHANDLE> };
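
Putting the slurp idiom together with a substitution, a rough sketch could look like the following. It assumes the markers really are the literal strings "<style" and "style>" as in the sample posted above, and it writes that sample to a temporary file only so that the script is self-contained; the file handling and variable names are not from the original posts:

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Create a throwaway input file shaped like the sample data (assumption).
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "<style\n---\nxxxxxxx\n---\nstyle>\n<style\n----\nxxxxx\nstyle>\n";
close $tmp_fh or die "Cannot close $path: $!";

# Slurp the whole file into one scalar.
open my $fh, '<', $path or die "Cannot open $path: $!";
my $string = do { local $/; <$fh> };
close $fh;

# Non-greedy match with "s" so "." crosses newlines; "g" handles every block.
(my $cleaned = $string) =~ s/<style.*?style>/<style\nstyle>/gs;

print $cleaned;
```

A lexical filehandle ($fh) is used here instead of a bareword FILEHANDLE, but the slurp itself works the same way.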


Having said all that, using regexes for such a relatively simple task might be acceptable (or maybe not, it depends on your input data, which you haven't shown), but using regexes for parsing HTML is very strongly discouraged: for anything but very trivial cases, you should probably use one of the HTML modules available on the CPAN (sample search: https://www.google.fr/search?q=cpan+html&ie=utf-8&oe=utf-8&aq=t&rls=org.mozilla:fr:official&client=firefox-a&channel=sb&gfe_rd=cr&ei=xWGfU5nWGczI4QbnxoCoBw).


FishMonger
Veteran / Moderator

Jun 16, 2014, 2:45 PM

Post #5 of 7 (7110 views)
Re: [Laurent_R] Perl - Replace is not working when it a new line


Quote
- I have no idea what this -0777 option of the command line is supposed to be.


It slurps the whole file and is documented in perlrun.


Code
  Command Switches 
As with all standard commands, a single-character switch may be
clustered with the following switch, if any.

#!/usr/bin/perl -spi.orig # same as -s -p -i.orig

Switches include:

-0[*octal/hexadecimal*]
specifies the input record separator ($/) as an octal or
hexadecimal number. If there are no digits, the null character is
the separator. Other switches may precede or follow the digits. For
example, if you have a version of *find* which can print filenames
terminated by the null character, you can say this:

find . -name '*.orig' -print0 | perl -n0e unlink

The special value 00 will cause Perl to slurp files in paragraph
mode. Any value 0400 or above will cause Perl to slurp files whole,
but by convention the value 0777 is the one normally used for this
purpose.

You can also specify the separator character using hexadecimal
notation: -0x*HHH...*, where the "*H*" are valid hexadecimal
digits. Unlike the octal form, this one may be used to specify any
Unicode character, even those beyond 0xFF. So if you *really* want
a record separator of 0777, specify it as -0x1FF. (This means that
you cannot use the -x option with a directory name that consists of
hexadecimal digits, or else Perl will think you have specified a
hex number to -0.)
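
The -0 switch only sets $/, so its effect can be sketched from inside a script. In this illustration (the helper count_records and the sample text are hypothetical), the same file is read with $/ set to a newline (the default), to the empty string (paragraph mode, what -00 selects) and to undef (slurp mode, what -0777 selects):

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Throwaway file with two paragraphs separated by a blank line.
my ($tmp_fh, $path) = tempfile(UNLINK => 1);
print {$tmp_fh} "a\nb\n\nc\nd\n";
close $tmp_fh or die "Cannot close $path: $!";

# Hypothetical helper: count the records read with a given separator.
sub count_records {
    my ($file, $separator) = @_;
    local $/ = $separator;
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my @records = <$fh>;
    close $fh;
    return scalar @records;
}

my $by_line      = count_records($path, "\n");   # default behaviour
my $by_paragraph = count_records($path, "");     # like -00
my $whole_file   = count_records($path, undef);  # like -0777

print "lines: $by_line, paragraphs: $by_paragraph, slurped: $whole_file\n";
```

On the command line, this is why something along the lines of perl -0777 -pi.bak -e 's/<style.*?style>/<style\nstyle>/gs' test can see a match that spans several lines: with -0777, the -p loop runs once over the whole file instead of once per line. That command is only a sketch for the sample data shown earlier in the thread.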



FishMonger
Veteran / Moderator

Jun 16, 2014, 2:55 PM

Post #6 of 7 (7102 views)
Re: [sunils2020] Perl - Replace is not working when it a new line

What you're attempting to achieve is possible with a perl one-liner regex, but it's very fragile and should not be done that way. Instead, you should write a normal full script and use HTML::Parser or one of the other similar parsers to search for and replace the desired tags.

http://search.cpan.org/~gaas/HTML-Parser-3.71/Parser.pm
http://search.cpan.org/~tobyink/HTML-HTML5-Parser-0.301/lib/HTML/HTML5/Parser.pm
http://search.cpan.org/search?query=html+parser&mode=all
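
As a rough idea of what the parser-based approach might look like, here is a minimal sketch using HTML::Parser (a CPAN module, so it has to be installed). It assumes well-formed <style>...</style> elements rather than the "<style ... style>" markers from the sample, and the input string and variable names are made up for the illustration:

```perl
use strict;
use warnings;
use HTML::Parser;    # from CPAN, not part of core Perl

# Hypothetical input with a real <style> element.
my $html = "<html><style>\nbody { color: red }\n</style><p>hi</p></html>";

my $out      = '';
my $in_style = 0;

my $parser = HTML::Parser->new(
    api_version => 3,
    # Copy every start tag through, and note when a <style> element opens.
    start_h => [ sub {
        my ($tag, $text) = @_;
        $out .= $text;
        $in_style = 1 if $tag eq 'style';
    }, 'tagname, text' ],
    # Copy every end tag through, and note when the <style> element closes.
    end_h => [ sub {
        my ($tag, $text) = @_;
        $in_style = 0 if $tag eq 'style';
        $out .= $text;
    }, 'tagname, text' ],
    # Everything else (text, comments, ...) is kept unless inside <style>.
    default_h => [ sub { $out .= shift unless $in_style }, 'text' ],
);

$parser->parse($html);
$parser->eof;

print "$out\n";
```

The handlers copy everything through unchanged except the text found inside a style element, so the tags themselves are kept and only their content is dropped.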


Laurent_R
Veteran / Moderator

Jun 16, 2014, 4:02 PM

Post #7 of 7 (7058 views)
Re: [FishMonger] Perl - Replace is not working when it a new line

Thank you FishMonger, I did not know that about the -0... switch.


(This post was edited by Laurent_R on Jun 16, 2014, 4:03 PM)

 
 

