May 26, 2010, 11:59 AM
Post #1 of 3
Problem with extraction script (help?)
Hey, guys. I have a simple program that takes hardcoded start/end delimiters and extracts the data between them. The program could theoretically have as many delimiter pairs as needed for multiple searches; it simply loops through all the files one by one and extracts whatever is specified. All the text is contained in multiple .html or .php files that have been scraped from the web. The program runs through each file individually and compiles the results into an .xls file with HTML table tags (which Excel handles nicely).
I tried this on two sets of data, and in both cases everything was extracted perfectly. However, on moving on to another set, the program returns an entire table instead of just the company name I want, and I can't understand why.
I've attached my code and a sample .php file so you can see what I'm working with. Can anyone help me understand where my error is? (Ignore the comments; the program is intended to be used by people other than me in the future.)
For clarification, my start delimiter is what it is because it's the only instance of that particular string in the file, and it immediately precedes the company name (101Communications). The end delimiter is the character immediately after the company name, where the program should theoretically stop looking. If you run it, you'll see the problem I get. But if I use certain other delimiters (say, to get the BODY BGCOLOR right at the beginning of the file), the program works fine.
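To make the intended behavior concrete, here is a stripped-down sketch of the approach. It is only an illustration written in Python, not the attached program: the function names, the sample markup, and the delimiters are all made up, and the real files will look different.

```python
from html import escape


def extract_between(text, start, end):
    """Return the text between the first `start` and the next `end`, or None."""
    begin = text.find(start)
    if begin == -1:
        return None
    begin += len(start)
    stop = text.find(end, begin)  # search only after the start delimiter
    if stop == -1:
        return None
    return text[begin:stop]


def build_table(pages, searches):
    """Build an HTML table with one row per page and one cell per search."""
    rows = []
    for name, text in pages.items():
        cells = [escape(name)]
        for start, end in searches:
            value = extract_between(text, start, end)
            cells.append(escape(value if value is not None else ""))
        rows.append("<tr>" + "".join(f"<td>{c}</td>" for c in cells) + "</tr>")
    return "<table>\n" + "\n".join(rows) + "\n</table>"


# Hypothetical sample: this markup and these delimiters are invented for illustration.
pages = {
    "sample.php": '<body bgcolor="#FFFFFF"><table><tr>'
                  '<td class="co">101Communications</td><td>x</td>'
                  "</tr></table></body>"
}
searches = [('<td class="co">', "<")]  # (start delimiter, end delimiter)
table = build_table(pages, searches)
print(table)
```

In this sketch the end delimiter is searched for only after the start delimiter, and the first hit wins, so the match cannot run on to the end of the table. That is the behavior I expect from my program. Writing the table out to a file with an .xls extension is left out here.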
Suggestions? And thanks in advance to anyone who can offer any help.