Problem with extraction script (help?)

 



[ICNH]
New User

May 26, 2010, 11:59 AM

Post #1 of 3 (1351 views)
Problem with extraction script (help?)

Hey, guys. I have a simple program that takes hardcoded start/end delimiters and extracts the data between them. In theory it can hold as many delimiter pairs as needed for multiple searches; it simply loops through all the files one by one and extracts whatever is specified. All the text is contained in multiple .html or .php files that were scraped from the web. The program runs through each file individually and compiles the results into an .xls file with HTML table tags (which Excel handles nicely).

I tried this on two sets of data, and in both cases everything was extracted perfectly. However, on another set, instead of extracting just the company name as I want, it returns an entire table, and I can't understand why.

I've attached my code and a sample .php file so you can see what I'm working with. Can anyone help me understand where my error is? (Ignore the comments; the program is intended to be used by people other than me in the future.)

For clarification, my start delimiter is what it is because it's the only instance of that particular string in the file (and it immediately precedes the company name, 101Communications). The end delimiter is the character immediately after the company name, where in theory the program should stop looking. If you run it, you'll see the problem I get. But if I use certain other delimiters (say, to get the BODY BGCOLOR right at the beginning of the file), the program works fine.
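The delimiter-based extraction described above can be sketched roughly as follows. This is a minimal illustration, not the attached FINAL.pl: the helper name `extract_between` and the sample markup are hypothetical, and it assumes both delimiters are plain strings. It uses only the core `index` and `substr` functions, and it searches for the end delimiter starting just past the start delimiter.

```perl
use strict;
use warnings;

# Hypothetical helper (not taken from FINAL.pl): return the text between the
# first occurrence of $start and the first occurrence of $end that follows it.
sub extract_between {
    my ($text, $start, $end) = @_;

    my $from = index($text, $start);
    return if $from < 0;                # start delimiter not found
    $from += length $start;             # skip past the start delimiter itself

    # Look for the end delimiter only from $from onwards, so an earlier
    # occurrence of the same character cannot be picked up by mistake.
    my $to = index($text, $end, $from);
    return if $to < 0;                  # end delimiter not found

    return substr($text, $from, $to - $from);
}

# Made-up sample markup, loosely modelled on the description in the post.
my $html = '<table><tr><td class="name">101Communications</td><td>x</td></tr></table>';
my $name = extract_between($html, '<td class="name">', '<');
print "$name\n";    # prints 101Communications
```

Because the search for the end delimiter begins right after the start delimiter, the result stops at the first matching character rather than a later one.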

Suggestions? And thanks in advance to anyone who can offer any help.
Attachments: 101Communications.php.html (10.7 KB)
  FINAL.pl (2.47 KB)


FishMonger
Veteran / Moderator

May 26, 2010, 12:43 PM

Post #2 of 3 (1347 views)
Re: [[ICNH]] Problem with extraction script (help?)

Start by adding these 2 pragmas.

Code
use strict; 
use warnings;


Then scrap your current approach and use one of the HTML parsers on CPAN.

http://search.cpan.org/modlist/World_Wide_Web/HTML

This one may be a good choice.

HTML::TagParser
http://search.cpan.org/~kawasaki/HTML-TagParser-0.16/lib/HTML/TagParser.pm


[ICNH]
New User

May 26, 2010, 1:04 PM

Post #3 of 3 (1341 views)
Re: [FishMonger] Problem with extraction script (help?)

OK, I'll certainly check that out. In case I can't figure it out, is there a quick fix for the approach I currently have?
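The thread does not show FINAL.pl, so the actual cause cannot be confirmed here. One common reason a delimiter match returns far more than intended is a greedy `.*` in a regular expression, which runs to the last occurrence of the end delimiter instead of the first. The sketch below assumes that is the situation; the helper names and the sample string are hypothetical. It contrasts a greedy match with a non-greedy one and uses `\Q...\E` so that any regex metacharacters in the delimiters are treated literally.

```perl
use strict;
use warnings;

# Hypothetical helper: capture the SHORTEST text between two literal delimiters.
sub extract_lazy {
    my ($text, $start, $end) = @_;
    # \Q...\E quotes metacharacters in the delimiters; .*? is non-greedy;
    # the /s flag lets "." match newlines, since HTML often spans lines.
    return $text =~ /\Q$start\E(.*?)\Q$end\E/s ? $1 : undef;
}

# Same pattern with a greedy quantifier, shown only for comparison.
sub extract_greedy {
    my ($text, $start, $end) = @_;
    return $text =~ /\Q$start\E(.*)\Q$end\E/s ? $1 : undef;
}

# Made-up sample: a name followed by a table that reuses the "<" character.
my $html = '<b>101Communications</b><table><tr><td>other</td></tr></table>';

my $lazy   = extract_lazy($html, '<b>', '<');
my $greedy = extract_greedy($html, '<b>', '<');

print "lazy:   $lazy\n";      # stops at the first "<" after the name
print "greedy: $greedy\n";    # runs on to the last "<" in the string
```

Even if this fixes the immediate symptom, a regex or substring approach stays fragile against changes in the markup, which is why an HTML parser was recommended.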

 
 

