Parsing a local HTML file

 



Hawk
stranger

Dec 17, 2000, 9:26 PM

Post #1 of 3 (165 views)
Parsing a local HTML file

Hello,

I'm trying to parse an HTML document located on my server. I currently have comments at the beginning and end of the data I would like to retrieve, like this:

<!--startdata-->
Data to be retrieved is here
more here
and here
<!--enddata-->

What I'd like to do is get everything between <!--startdata--> and <!--enddata--> and print it out on another page. How would I go about this?



sleuth
Enthusiast / Moderator

Dec 17, 2000, 10:38 PM

Post #2 of 3 (163 views)
Re: Parsing a local HTML file

You could do this:


Code
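open(my $fh, '<', 'file.html') or die "Can't open file.html: $!";
my $whole_file = '';
while (<$fh>) {
    $whole_file .= $_;
}
close($fh);

my $content = '';
# /s lets . match newlines; .*? stops at the first <!--enddata-->
if ($whole_file =~ m,<!--startdata-->(.*?)<!--enddata-->,s) {
    $content = $1;
}
print $content;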
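
One thing to watch with a slurp-and-match approach like this: a greedy quantifier runs to the LAST <!--enddata--> in the file, while a non-greedy one stops at the first. Here is a minimal sketch of the difference. The two-block $page string is a made-up sample, not the poster's real file:

```perl
use strict;
use warnings;

# Hypothetical page with two marked blocks (an assumption for illustration).
my $page = "<!--startdata-->first<!--enddata-->\n"
         . "<p>between</p>\n"
         . "<!--startdata-->second<!--enddata-->\n";

# Greedy: (.+) grabs as much as it can, up to the last end marker.
my ($greedy) = $page =~ m{<!--startdata-->(.+)<!--enddata-->}s;

# Non-greedy: (.+?) stops at the first end marker.
my ($lazy) = $page =~ m{<!--startdata-->(.+?)<!--enddata-->}s;

# With /g in list context, every marked block is returned.
my @all = $page =~ m{<!--startdata-->(.+?)<!--enddata-->}gs;

print "greedy: $greedy\n";
print "lazy: $lazy\n";
print "all: @all\n";
```

If the page only ever has one marked block, both forms capture the same text.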

Sleuth



(This post was edited by sleuth on Dec 17, 2000, 9:41 PM)


japhy
Enthusiast

Dec 18, 2000, 6:30 AM

Post #3 of 3 (157 views)
Re: Parsing a local HTML file

If you're just trying to get the content in between two comment tags, you can probably use this:


Code
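my $save = 0;   # true while we are between the two markers
my $string;

open my $fh, '<', $html or die "can't read $html: $!";
while (<$fh>) {
    $save = 0 if /^<!--enddata-->/;          # stop before the end marker
    $string .= $_ if $save;
    $save = 1 if /^<!--startdata-->/;        # start after the start marker
    last if defined $string and not $save;   # done once the block is captured
}
close $fh;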

However, there is Perl shorthand for this type of thing, using the flip-flop operator, ..:


Code
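my $string;

open my $fh, '<', $html or die "can't read $html: $!";
while (<$fh>) {
    # Each s/// strips its marker line, so only the data is appended
    if (s/^<!--startdata-->\n// .. s/^<!--enddata-->\n//) {
        $string .= $_;
    }
    else {
        last if defined $string;   # past the end marker: stop reading
    }
}
close $fh;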
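
To try the flip-flop without a file on disk, here is a small self-contained sketch. It is a variation on the loop above, not the same code: the sample page and the extract_between() helper are made up for illustration, it reads from an in-memory filehandle, and it skips the marker lines with next instead of stripping them with s///.

```perl
use strict;
use warnings;

# Hypothetical sample page, held in memory so the sketch runs without a file.
my $html = <<'END_HTML';
<html><body>
<!--startdata-->
Data to be retrieved is here
more here
<!--enddata-->
<p>footer</p>
</body></html>
END_HTML

# Hypothetical helper: return the lines strictly between the two marker lines.
sub extract_between {
    my ($text) = @_;
    open my $fh, '<', \$text or die "can't open in-memory file: $!";
    my $string = '';
    while (my $line = <$fh>) {
        # The range is true from the start-marker line through the end-marker line.
        if ($line =~ /^<!--startdata-->/ .. $line =~ /^<!--enddata-->/) {
            # Skip the marker lines themselves.
            next if $line =~ /^<!--(?:start|end)data-->/;
            $string .= $line;
        }
    }
    close $fh;
    return $string;
}

print extract_between($html);
```

Note that the flip-flop keeps its state between iterations, so input with a start marker but no end marker leaves the range "on" until the end of the input.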

The flip-flop operator becomes true when the left-hand expression is true, and stays true until the right-hand expression is true (that line included). However, we can also take advantage of the fact that Perl doesn't require "\n" to end a "line": we can tell Perl that a record (the proper word to use, rather than line) ends in the string "<!--startdata-->\n". If we do that, we read one record in and then redefine a record to end in "<!--enddata-->\n". The next record we read is then exactly the information you need:


Code
my $string;
{
    local $/ = "<!--startdata-->\n";   # a record now ends at the start marker
    open my $fh, '<', $html or die "can't read $html: $!";
    <$fh>;   # this is before the part you want
    $/ = "<!--enddata-->\n";           # the next record ends at the end marker
    chomp($string = <$fh>);
    close $fh;
}

chomp() does not specifically remove a trailing "\n" character; it removes the value of $/ from the end of a string. This variable defaults to "\n", but since I changed it to "<!--enddata-->\n", chomp() removes THAT from the end of the string.
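
The difference is easy to see on a single string. This is a minimal sketch; the $record value is a made-up sample standing in for what the second read would return:

```perl
use strict;
use warnings;

# Hypothetical sample record, ending in the custom separator.
my $record = "and here\n<!--enddata-->\n";

my $default = $record;
chomp($default);    # $/ is "\n" here, so only the final newline is removed

my $custom = $record;
{
    local $/ = "<!--enddata-->\n";   # restored automatically when the block ends
    chomp($custom);                  # removes the whole "<!--enddata-->\n" suffix
}

print "default: [$default]\n";
print "custom:  [$custom]\n";
```

With the default separator the marker stays on the end of the string. With the custom one, only the data (and its own newline) is left.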

I hope this has all been helpful and informative.

 
 

