Parsing a local HTML file



Dec 17, 2000, 9:26 PM

Post #1 of 3 (235 views)
Parsing a local HTML file


I'm trying to parse an HTML document located on my server. I currently have comments at the beginning and end of the data I would like to retrieve, like this:

<!--startdata-->
Data to be retrieved is here
more here
and here
<!--enddata-->

What I'd like to do is get everything between <!--startdata--> and <!--enddata--> and print it out on another page. How would I go about this?

Enthusiast / Moderator

Dec 17, 2000, 10:38 PM

Post #2 of 3 (233 views)
Re: Parsing a local HTML file

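You could do this:

open(my $fh, "<", "file.html") || die "Can't open file.html: $!";
my $whole_file = "";
$whole_file .= $_ while <$fh>;    # read the whole file into one string
close($fh);
# the /s modifier lets . match newlines, so the match can span lines
if ($whole_file =~ m,<!--startdata-->(.+)<!--enddata-->,s) {
    my $content = $1;
    print $content;
}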
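
One thing to watch with the match above: a greedy (.+) runs to the last <!--enddata--> in the file, so if the markers could ever appear more than once, a non-greedy (.+?) stops at the first one instead. A minimal sketch of the difference, using a made-up string rather than a real file (the sub name extract_block and the sample data are only for illustration):

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between the first pair of markers,
# or undef if no complete pair is found.
sub extract_block {
    my ($html) = @_;
    # /s lets . match newlines; .+? stops at the first <!--enddata-->
    return $html =~ m{<!--startdata-->(.+?)<!--enddata-->}s ? $1 : undef;
}

# Made-up input with two marked blocks on one line.
my $sample = "a<!--startdata-->one<!--enddata-->b<!--startdata-->two<!--enddata-->c";

# The greedy form captures everything up to the LAST end marker.
my ($greedy) = $sample =~ m{<!--startdata-->(.+)<!--enddata-->}s;

print "greedy:     $greedy\n";
print "non-greedy: ", extract_block($sample), "\n";
```

With only one pair of markers in the file, both forms capture the same text.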


(This post was edited by sleuth on Dec 17, 2000, 9:41 PM)


Dec 18, 2000, 6:30 AM

Post #3 of 3 (227 views)
Re: Parsing a local HTML file

If you're just trying to get the content in between two comment tags, you can probably use this:

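my $save = 0;
my $string;

open FILE, $html or die "can't read $html: $!";
while (<FILE>) {
    $save = 0 if /^<!--enddata-->/;      # stop saving at the end marker
    $string .= $_ if $save;              # keep the lines between the markers
    $save = 1 if /^<!--startdata-->/;    # start saving after the start marker
    last if $string and not $save;       # done once the block has been read
}
close FILE;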

However, there is Perl shorthand for this type of thing, using the flip-flop operator, ..:

my $string;

open FILE, $html or die "can't read $html: $!";
while (<FILE>) {
    # each s/// deletes its marker line, so the markers are not saved
    if (s/^<!--startdata-->\n// .. s/^<!--enddata-->\n//) {
        $string .= $_;
    }
    else {
        last if $string;
    }
}
close FILE;
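
If you want to try the flip-flop loop without a file on disk, here is a rough self-contained sketch. It assumes Perl 5.8 or later (for opening a filehandle on a string), and the sub name between_markers and the sample text are made up for the example:

```perl
use strict;
use warnings;

# Sample input held in a string (made up for this example).
my $html = <<'END';
<html>
<!--startdata-->
Data to be retrieved is here
more here
<!--enddata-->
</html>
END

# Hypothetical helper: collect the lines between the two marker lines.
sub between_markers {
    my ($text) = @_;
    my $string = '';
    open my $fh, '<', \$text or die "can't open in-memory file: $!";
    while (<$fh>) {
        # True from the start-marker line through the end-marker line.
        # Each s/// empties its marker line, so the markers are not kept.
        if (s/^<!--startdata-->\n// .. s/^<!--enddata-->\n//) {
            $string .= $_;
        }
        else {
            last if length $string;   # past the block, stop reading
        }
    }
    close $fh;
    return $string;
}

print between_markers($html);
```

This assumes each marker sits alone on its own line, as in the loop above. If a marker shares a line with other text, the s/// patterns will not match and nothing is collected.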

The flip-flop operator becomes true when the left-hand expression is true, and stays true up to and including the evaluation where the right-hand expression becomes true; after that it is false again.

However, we can take advantage of the fact that Perl doesn't require us to use "\n" to end a "line". We can tell Perl that a record (the proper word to use, rather than line) ends in the string "<!--startdata-->\n". If we do that, we read one record in and then redefine a record to end in "<!--enddata-->\n". The next record we read is then exactly the text between the two markers, and that's all the information you need:

my $string; 
local $/ = "<!--startdata-->\n";
open FILE, $html or die "can't read $html: $!";
<FILE>; # this is before the part you want
$/ = "<!--enddata-->\n";
chomp($string = <FILE>);
close FILE;

Strictly speaking, chomp() does not remove a trailing "\n" character; it removes the value of $/ from the end of a string, and that variable defaults to "\n". Since I changed it to "<!--enddata-->\n", chomp() removes THAT from the end of the string.
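
To see that chomp() behavior in isolation, here is a small sketch. The helper name chomp_with and the sample string are made up for the example; the only real pieces are chomp() and $/:

```perl
use strict;
use warnings;

# Hypothetical helper: chomp a copy of $text with $/ temporarily set
# to $separator, and return the result.
sub chomp_with {
    my ($separator, $text) = @_;
    local $/ = $separator;   # restored automatically when the sub returns
    chomp $text;             # removes a trailing copy of $/ if present
    return $text;
}

my $line = "some text<!--enddata-->\n";

# Default separator: only the newline is removed.
print "[", chomp_with("\n", $line), "]\n";

# Custom separator: the whole marker plus newline is removed.
print "[", chomp_with("<!--enddata-->\n", $line), "]\n";
```

The first call prints [some text<!--enddata-->] and the second prints [some text]. Using local keeps the change to $/ from leaking into the rest of the program.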

I hope this has all been helpful and informative.

