help with substitution of unclosed html tags

 



awebb
New User

Aug 23, 2007, 2:12 AM

Post #1 of 3 (2669 views)

Hello, I have been trying to convert an HTML list like the one below to an ordered series of lines, but I am having trouble finding the right regex to do the job.

I need to convert this:


Code
<ul> <li>first item <li>second item <li>third item <li>fourth item <li>fifth item </ul>


to this:

1. first item
2. second item
3. third item
4. fourth item
5. fifth item

I have been trying something like the regex below:


Code
my $count = 1; 
$count++ while ($html =~ s{<li.*>(.+)\s*(?= (<li.*>|</ul>))}{$count\. $1\n}ig);


but only the last item ever shows up in the output. I would be really grateful to anyone who could tell me exactly where I am screwing things up.


KevinR
Veteran


Aug 23, 2007, 12:08 PM

Post #2 of 3 (2666 views)
Re: [awebb] help with substitution of unclosed html tags

Most likely the problem is the greedy '.*' in <li.*>: it matches as much as it can, so the first <li.*> can run all the way to the '>' of the last <li> tag. Changing it to <li.*?> might help. Here is another possible way to do what I think you are trying to do:


Code
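my $html = '<ul> <li>first item <li>second item <li>third item <li>fourth item <li>fifth item </ul>';
my $count = 1;
# [^<]+ captures each item's text up to the next tag;
# the /e flag evaluates the replacement as Perl code, so $count++ runs per item
$html =~ s{(?:<li>)([^<]+)}{$count++ . ". $1\n"}eig;
# strip the enclosing <ul> and </ul> tags
$html =~ s{</?ul>}{}g;
print $html;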


It might need some further tweaking, for example to trim the leftover whitespace around each item.
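
For anyone following along, here is a small sketch of the greediness point above. It is only an illustration: the shortened sample string and the variable names ($sample, $greedy, $lazy) are made up for this example and are not from the original post.

```perl
use strict;
use warnings;

# Shortened sample list (assumed for illustration only)
my $sample = '<ul> <li>first item <li>second item <li>third item </ul>';

# Greedy: .* grabs as much as possible and only backs off as far as
# needed to find a '>', so this match runs to the end of '</ul>'.
my ($greedy) = $sample =~ m{(<li.*>)};

# Non-greedy: .*? stops at the first '>' it can reach.
my ($lazy) = $sample =~ m{(<li.*?>)};

print "greedy: $greedy\n";
print "lazy:   $lazy\n";
```

Note that in the original pattern the (.+) capture is greedy as well, so switching only <li.*> to <li.*?> may not be enough on its own; a negated class such as [^<]+ sidesteps both problems.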
-------------------------------------------------


awebb
New User

Aug 23, 2007, 2:20 PM

Post #3 of 3 (2663 views)
Re: [KevinR] help with substitution of unclosed html tags

Thanks for your help, Kevin, but I managed to fix the problem last night shortly after posting this topic. It turns out that the .* was greedy, like you said. In case anybody ever runs into the same kind of problem, I fixed it with the regex below.


Code
my $count = 1; 
$count++ while ($html =~ s{[ \t]*<li>([^<>]+)(?= <li>|</ul>)}{$count\. $1\n}i);


I have not tried your approach yet, but it may be more efficient, and it looks to me like it should work. In fact, I think I like the way you accomplished the task better than mine.
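
Putting the two approaches from this thread together, a minimal sketch might look like the following. The helper name list_to_numbered_lines is hypothetical, and the code assumes plain <li> tags with no attributes and no nested markup inside the items.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): turn unclosed <li> items
# into numbered lines, trimming whitespace around each item.
sub list_to_numbered_lines {
    my ($html) = @_;
    my $count = 0;
    my @lines;
    # [^<]+ stops at the next tag, so each item is captured on its own.
    while ($html =~ m{<li>\s*([^<]+)}ig) {
        my $item = $1;
        $item =~ s/\s+$//;           # drop trailing whitespace
        next unless length $item;    # skip items that were only whitespace
        push @lines, ++$count . ". $item";
    }
    return join("\n", @lines) . "\n";
}

my $html = '<ul> <li>first item <li>second item <li>third item <li>fourth item <li>fifth item </ul>';
my $out = list_to_numbered_lines($html);
print $out;
```

Collecting the items in a loop instead of substituting in place leaves the original string untouched and makes the whitespace cleanup a separate, obvious step. For anything beyond simple lists like this one, a real HTML parser is likely a safer choice than a regex.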


(This post was edited by awebb on Aug 23, 2007, 2:27 PM)

 
 

