Parsing Web Page - Regex


New User

Dec 6, 2013, 8:37 AM

Post #1 of 4 (1615 views)
Parsing Web Page - Regex


While parsing a simple web page, a regex that works for some strings is not working for others that seem to be exactly the same. Attached is the fully working program.
This is the example output where the issue occurs:
DATA NOT FOUND: Total Bedrooms:

The HTML that I am parsing looks like this:

 <div class="field-items"> 
<div class="field-item odd">
<div class="field-label-inline-first">
Total Bedrooms:&nbsp;</div>
5 </div>

Thanks in advance!

This is perl 5, version 16, subversion 3 (v5.16.3) built for MSWin32-x64-multi-thread
(with 1 registered patch, see perl -V for more detail)

Copyright 1987-2012, Larry Wall

Binary build 1603 [296746] provided by ActiveState
Built Mar 13 2013 13:31:10
Attachments: (5.88 KB)

Veteran / Moderator

Dec 6, 2013, 8:53 AM

Post #2 of 4 (1612 views)
Re: [edlong] Parsing Web Page - Regex

In almost all cases, using a regex to parse HTML is a mistake because it's very fragile.

You should be using one of the HTML parsers on CPAN, such as HTML::Parser.

If you scroll down to the "SEE ALSO" section at the bottom of its documentation, you'll find a list of related modules.


Dec 6, 2013, 9:07 AM

Post #3 of 4 (1607 views)
Re: [edlong] Parsing Web Page - Regex

Heed FishMonger's advice, and immerse yourself in the canonical "You can't parse [X]HTML with regex", because HTML can't be parsed by regex.

New User

Dec 6, 2013, 10:38 AM

Post #4 of 4 (1595 views)
Re: [Kenosis] Parsing Web Page - Regex

After reviewing the options, HTML::TokeParser did the trick for me. It's not as pretty as I'd like, but I think that is primarily due to how ugly the HTML is.

I'm still curious why the regex didn't work, though. I understand this isn't the best method for HTML parsing.
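The poster's final code isn't shown in the thread, so the following is only a rough sketch of how HTML::TokeParser could pull the value out of the sample markup from the first post. The helper name `field_value` is made up for illustration, and it assumes every label lives in a `<div class="field-label-inline-first">` whose value is the text right after the label's closing `</div>`.

```perl
use strict;
use warnings;
use HTML::TokeParser;

# Sample markup copied from the first post.
my $html = <<'HTML';
 <div class="field-items"> 
<div class="field-item odd">
<div class="field-label-inline-first">
Total Bedrooms:&nbsp;</div>
5 </div>
HTML

# Hypothetical helper: return the text that follows the labelled <div>,
# or undef if the label is not found.
sub field_value {
    my ($doc, $label) = @_;
    my $p = HTML::TokeParser->new(\$doc) or die "Cannot create parser";

    while (my $tag = $p->get_tag('div')) {
        my $class = $tag->[1]{class} // '';
        next unless $class eq 'field-label-inline-first';

        # Text of the label <div>; entities such as &nbsp; are decoded.
        my $name = $p->get_trimmed_text('/div');

        # Drop the trailing colon plus any whitespace or non-breaking space.
        $name =~ s/[\s\xA0]*:[\s\xA0]*\z//;
        next unless $name eq $label;

        # Skip the label's closing tag, then read the value up to the next </div>.
        $p->get_tag('/div');
        return $p->get_trimmed_text('/div');
    }
    return;
}

my $bedrooms = field_value($html, 'Total Bedrooms');
print defined $bedrooms ? "Total Bedrooms: $bedrooms\n" : "DATA NOT FOUND: Total Bedrooms:\n";
```

Because the parser works on tokens rather than on raw characters, line breaks and `&nbsp;` between the label and the value don't need special handling in a pattern.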
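The failing regex is only in the attachment and isn't quoted in the thread, so the real cause can't be confirmed here. One common cause with markup like the sample in the first post is that the label and its value are separated by `&nbsp;`, a closing tag and a line break, so a pattern that expects them side by side never matches. A minimal sketch of that failure mode, with both patterns invented purely for illustration:

```perl
use strict;
use warnings;

# Sample markup copied from the first post.
my $html = <<'HTML';
 <div class="field-items"> 
<div class="field-item odd">
<div class="field-label-inline-first">
Total Bedrooms:&nbsp;</div>
5 </div>
HTML

# Hypothetical strict pattern: expects the value directly after the closing tag,
# with no &nbsp; before the tag and no line break after it.
my $strict = qr{Total Bedrooms:</div>(\d+)};

# Hypothetical tolerant pattern: allows whitespace (including newlines)
# and &nbsp; around the closing tag.
my $tolerant = qr{Total Bedrooms:(?:\s|&nbsp;)*</div>\s*(\d+)};

# In list context a failed match returns an empty list, leaving the variable undef.
my ($strict_value)   = $html =~ $strict;
my ($tolerant_value) = $html =~ $tolerant;

print "strict:   ", (defined $strict_value   ? $strict_value   : "DATA NOT FOUND"), "\n";
print "tolerant: ", (defined $tolerant_value ? $tolerant_value : "DATA NOT FOUND"), "\n";
```

Even the tolerant pattern breaks as soon as the page adds an attribute, a comment or another nested tag, which is the fragility the earlier replies warn about.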

Thanks for the help!

(This post was edited by edlong on Dec 6, 2013, 12:45 PM)

