match html string <!-- this is a comment -->


Oct 25, 2013, 1:35 PM



I am trying to match just the text inside an HTML comment (<!-- this is a comment -->) in an HTML file and replace it with the same text in capitals. I found a way to match the whole pattern, including the comment delimiters, but that is not what I am after.

My regex matches everything, but I only want to match the text "this is a comment" and replace it with capitals. This is for a homework exercise, so any help or tips will be appreciated. Thanks!



Oct 25, 2013, 4:10 PM

Re: [augg05] match html string <!-- this is a comment -->

You really have three problems:

  • Read your entire file into a single string.

  • Write a regex that will match an HTML comment.

  • Write a substitute command that will convert all HTML comments to upper case.

The first is best done with a CPAN module such as Slurp. In your case it is probably sufficient to undefine the input record separator ($/; see perldoc perlvar).
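As a minimal sketch of the second option, assuming a plain text file that fits in memory, the separator can be undefined with local so the change is limited to one block. The helper name slurp_file is hypothetical and used only for illustration.

```perl
use strict;
use warnings;

# Hypothetical helper: return the whole file as one string.
sub slurp_file {
    my ($path) = @_;
    open my $fh, '<', $path or die "Cannot open $path: $!";
    local $/;             # undefine the input record separator in this scope only
    my $content = <$fh>;  # with $/ undefined, one read returns the entire file
    close $fh;
    return $content;
}
```

It would be called as my $html = slurp_file('page.html'); where the file name is a placeholder. Using local restores the previous value of $/ when the sub returns, so other reads in the program still work line by line.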

Your regex has to match all the special characters that delimit an HTML comment. This can be tricky because punctuation characters are often regex metacharacters. The easy solution is to use \Q...\E to escape all non-alphanumeric characters. Be careful not to escape any characters that you intend to have their usual regex meaning.
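One way this could look, as a sketch rather than the only possible pattern: the literal delimiters sit inside \Q...\E, while the capturing group stays outside so its metacharacters keep their meaning. The sample string and variable names are made up for illustration.

```perl
use strict;
use warnings;

# Made-up sample input with one comment that spans two lines.
my $html = "<p>text</p>\n<!-- this is a\n comment -->\n<p>more</p>\n";

# \Q...\E quotes the literal delimiters. The group (.*?) is left outside
# so that '.', '*' and '?' keep their regex meaning; the non-greedy '?'
# stops at the first closing delimiter. /s lets '.' match newlines.
my $comment_re = qr/\Q<!--\E(.*?)\Q-->\E/s;

# In list context with /g, the match returns every captured comment body.
my @bodies = $html =~ /$comment_re/g;

print "Found: [$_]\n" for @bodies;
```

Capturing only the inner text is what separates the comment body from its delimiters, which is the part the original question was stuck on.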

Your substitution command requires several modifiers (see the substitution operator in perldoc perlop):

  • /s so that . can match newlines inside the multi-line string (add /m only if the pattern uses ^ or $ to match at line boundaries)

  • /g because there can be more than one comment

  • /e to execute the uc function in the replacement field

  • /r (only if you do not wish to change the original string; requires Perl 5.14 or later)
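Putting the modifiers above together, a substitution might be sketched as follows. The three capture groups and the sample string are assumptions made for this example, not a prescribed solution.

```perl
use v5.14;      # the /r modifier needs Perl 5.14 or newer
use strict;
use warnings;

# Made-up sample input with two comments, one spanning two lines.
my $html = "<p>keep me</p>\n<!-- this is a comment -->\n<!-- second\ncomment -->\n";

# /g replaces every comment, /s lets '.' match newlines, and /e evaluates
# the replacement as Perl code so uc() runs on the captured body ($2).
# /r returns the modified copy and leaves $html itself unchanged.
# /m is left out here because the pattern has no ^ or $ anchors.
my $upper = $html =~ s/(\Q<!--\E)(.*?)(\Q-->\E)/$1 . uc($2) . $3/gser;

print $upper;
```

Dropping /r and the assignment changes $html in place instead. The delimiters are captured as $1 and $3 so they can be put back unchanged around the uppercased body.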

Good Luck,


Oct 25, 2013, 4:21 PM

Re: [BillKSmith] match html string <!-- this is a comment -->

Thanks, Bill, for your time and quick response. I will have some time to work on this over the weekend.