Search criteria not matching in case of iterating through multiple files

 



sunils2020
New User

Jun 30, 2014, 7:00 AM

Post #1 of 6 (1776 views)

Hi,
I am searching for the <body> and </body> tags inside an HTML file and extracting the content between them. I have uploaded my sample HTML file. It is not a real HTML file; I just created an example with body tags.

Note:
I am searching all the files inside the directory.

Problem:
The search for <data> succeeds for the first file in the directory. When the next file is picked up, <data> is not found. I don't know what is wrong with the program below.

The files inside the directory are exactly the same but have different file names (created for test purposes).

HTML File content:
sunil
adasdas
<body>
asasumar
hello
hi
123
</body>
asdas
adas

Below is my program.

#!/usr/local/bin/perl
#Reading a html file
use strict;
use warnings;

my $dir = 'c:\sunil';
#open (OUTFILE, '>>c:\test\x.xml');
opendir(DIR, $dir) or die $!;
while (my $file = readdir(DIR))
{
# Use a regular expression to ignore files beginning with a period
next if ($file =~ m/^\./);
#print "$dir\\$file.xml\n";
$input_filename="$dir\\$file\n";
$output_filename = "$dir\\output.xml\n";

open (MYFILE,"$input_filename");
open OUTFILE, ">>$output_filename" or die $!;
print OUTFILE "Title: $input_filename\n";
while (<MYFILE>)
{
chomp;
print "$_";
if("$_" =~ "/<body>/")
{
# Moment <body tag is found, extract all the values between
$start_reading = "read";
next;
}
if ("$_\n" =~ /body>/)
{
last;
# body> tag reached. Stop reading the file and exit
}
if ( $start_reading eq "read" )
{
#print "$_\n";
# writing out to a file
print OUTFILE "$_\n";
}
}
close (MYFILE);
close (OUTFILE);
}
closedir(DIR);
Attachments: data - Copy (2) - Copy.html (71 B)
  data - Copy (2).html (71 B)


Laurent_R
Veteran / Moderator

Jun 30, 2014, 11:56 AM

Post #2 of 6 (1735 views)
Re: [sunils2020] Search criteria not matching in case of iterating through multiple files

Please use code tags to preserve the code formatting and make the code readable.

Just one quick comment:


Code
$input_filename="$dir\\$file\n"; 
$output_filename = "$dir\\output.xml\n";


Don't put "\n" in your file names; it is useless and counterproductive.

I'll wait for better-formatted code before commenting further.
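
As a minimal sketch of the file-name point above: the path can be built without any trailing newline, and the newline kept for the print statement only. The helper name build_path and the file name data.html are made up for illustration; File::Spec->catfile is a core module call that joins the parts with the separator of the current platform.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: join a directory and a file name into one path.
# No "\n" is appended, so the result can be handed straight to open().
sub build_path {
    my ($dir, $file) = @_;
    return File::Spec->catfile($dir, $file);
}

my $input_filename = build_path('c:\sunil', 'data.html');

# The newline belongs to the output line, not to the file name.
print "Title: $input_filename\n";
```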


BillKSmith
Veteran

Jun 30, 2014, 11:57 AM

Post #3 of 6 (1735 views)
Re: [sunils2020] Search criteria not matching in case of iterating through multiple files

The code you posted was never successful. In fact, it does not even compile. Let me guess. You added strict and warnings to your post because you knew we would suggest it. You do have to fix the errors that they reveal. If you declare three variables with the proper scope, you will not only fix the errors, you will fix your problem.
Good Luck,
Bill
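
One way the advice above could look in practice, as a sketch rather than the definitive fix: $input_filename and $start_reading are declared with my inside the readdir loop, and the output name is passed in. The sub name extract_bodies and its two parameters are assumptions made for this example. Two details differ from the posted program on purpose: the pattern is written as /<body>/ (the posted "/<body>/" is a quoted string, so the slashes become part of the pattern and it cannot match the tag), and the closing tag is matched as </body>.

```perl
use strict;
use warnings;
use File::Spec;

# Hypothetical helper: append the lines found between <body> and </body>
# in every plain file of $dir to $output_filename.
sub extract_bodies {
    my ($dir, $output_filename) = @_;

    opendir(my $dh, $dir) or die "Cannot open directory $dir: $!";
    open(my $out, '>>', $output_filename)
        or die "Cannot open $output_filename: $!";

    while (defined(my $file = readdir($dh))) {
        next if $file =~ m/^\./;              # skip ., .. and dot files
        my $input_filename = File::Spec->catfile($dir, $file);
        next unless -f $input_filename;       # plain files only

        my $start_reading = 0;                # a fresh flag for every file

        open(my $in, '<', $input_filename)
            or die "Cannot open $input_filename: $!";
        print {$out} "Title: $input_filename\n";

        while (my $line = <$in>) {
            chomp $line;
            if ($line =~ /<body>/) {          # opening tag: start copying
                $start_reading = 1;
                next;
            }
            last if $line =~ m{</body>};      # closing tag: done with file
            print {$out} "$line\n" if $start_reading;
        }
        close($in);
    }

    close($out) or die "Cannot close $output_filename: $!";
    closedir($dh);
    return;
}
```

In this sketch the output file is best kept outside the directory being scanned; otherwise it would be picked up as an input file on a later pass.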


Zhris
Enthusiast

Jun 30, 2014, 11:04 PM

Post #4 of 6 (1483 views)
Re: [sunils2020] Search criteria not matching in case of iterating through multiple files

You will also have to reset $start_reading before the last; statement when you reach the closing body tag. Otherwise, for every HTML file after the first, you'll also write out the lines that come before the opening body tag.

Rather than describing other improvements you should make to your code, I would suggest you use an HTML parser to read the body section instead. It will be more reliable and work for "any" format, i.e. you wouldn't have to assume that the opening and closing body tags exist on their own lines, or control when to write with a flag variable.
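
A small sketch of the reset described above, assuming the flag is one variable shared by all files. To keep it self-contained, each "file" is a list of lines in memory, and the sub name body_lines is invented for the example.

```perl
use strict;
use warnings;

# One flag shared by every call, as it would be if $start_reading
# were declared once outside the per-file loop.
my $start_reading = '';

# Hypothetical helper: return the lines between <body> and </body>.
sub body_lines {
    my (@lines) = @_;
    my @kept;
    for my $line (@lines) {
        if ($line =~ /<body>/) {
            $start_reading = 'read';
            next;
        }
        if ($line =~ m{</body>}) {
            $start_reading = '';   # reset, so the next file starts clean
            last;
        }
        push @kept, $line if $start_reading eq 'read';
    }
    return @kept;
}

my @file = ('sunil', 'adasdas', '<body>', 'hello', 'hi', '</body>', 'adas');
print join(',', body_lines(@file)), "\n";   # first file
print join(',', body_lines(@file)), "\n";   # second file, same content
```

Without the reset line, the second call would start with the flag still set to 'read' and would also keep the lines that precede <body>.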

Chris


(This post was edited by Zhris on Jun 30, 2014, 11:06 PM)


BillKSmith
Veteran

Jul 1, 2014, 3:46 AM

Post #5 of 6 (1372 views)
Re: [Zhris] Search criteria not matching in case of iterating through multiple files

If you declare $start_reading (as a lexical variable) at the top of the readdir loop, it will go out of scope at the bottom. Each file will have its own copy of this variable. A 'reset' is never needed. That was exactly my point: proper scoping would solve the original problem.
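
To illustrate the scoping argument above, here is a minimal sketch in which the flag is declared with my at the top of the per-file loop. In-memory line lists stand in for the files, and the names @files and collect_bodies are made up for the example.

```perl
use strict;
use warnings;

# Two stand-in "files", each a list of lines.
my @files = (
    [ 'sunil', '<body>', 'hello', '</body>', 'adas' ],
    [ 'sunil', '<body>', 'hi',    '</body>', 'adas' ],
);

# Hypothetical helper: collect the body lines of every file in turn.
sub collect_bodies {
    my (@file_list) = @_;
    my @kept;
    for my $lines (@file_list) {
        my $start_reading = '';            # new variable on each iteration
        for my $line (@$lines) {
            if ($line =~ /<body>/) {
                $start_reading = 'read';
                next;
            }
            last if $line =~ m{</body>};   # no reset needed here
            push @kept, $line if $start_reading eq 'read';
        }
    }                                      # $start_reading goes out of scope
    return @kept;
}

print join(',', collect_bodies(@files)), "\n";   # prints "hello,hi"
```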
Good Luck,
Bill


Zhris
Enthusiast

Jul 1, 2014, 4:05 AM

Post #6 of 6 (1364 views)
Re: [BillKSmith] Search criteria not matching in case of iterating through multiple files

Ah yes, I see. I hadn't understood what you originally implied. As you have figured, my answer assumed that $start_reading is not declared anywhere (and presumably that strict is not in use either).


(This post was edited by Zhris on Jul 1, 2014, 4:09 AM)

 
 

