Screen Scraping from HTML page - help




rtwolfe
New User

Feb 4, 2007, 9:15 PM

Post #1 of 5
Screen Scraping from HTML page - help

Thanks in advance. I've been hammering at this for a week with no luck. I've even used RegexBuddy and still can't crack the problem. I have a NASDAQ pre-market action page from which I want to pull the stock symbol, yesterday's close, today's open, the gap %, and the pre-market volume.

I've got a script that pulls the HTML down and saves it to a file.
I can parse out the stock symbol, but I'm stuck on how to get the close, open, gap, and volume data. I think my problem is with the second capture group (and, if I could figure that out, the 3rd, 4th, and 5th). Basically, the 2nd set of parentheses doesn't seem to save the match (yesterday's close) to $2, assuming the stock symbol is in $1.

Here's the nasdaq page - http://dynamic.nasdaq.com/dynamic/premarketma.stm

Here's my regex so far. I'm using single-line mode (the /s option) and non-greedy matching (.*?):
aafterhours&selected=([A-Z]{2,5})".*?\$([\d]{1,5}\.[\d]{2})<

If it helps, here's a snippet of the HTML showing one row of the data I'd like to get. According to RegexBuddy, my regex matches all the way through the $6.75 but doesn't save the 6.75 to $2. Is my problem something to do with strings versus numbers? I appreciate any and all help!


</tr>
<tr><td>
<script language=javascript>TickerWidget.constructWidget('SCUR','premarket',false)</script>
<!--a href="http://quotes.nasdaq.com/quote.dll?mode=frameset&page=afterhours&selected=SCUR">SCUR</a-->
</td>
<td>
<img src="http://content.nasdaq.com/logos/SCUR.GIF">
<BR>
<a target="_top" href="/asp/offsite_activity.asp?content=http://www.securecomputing.com">
Secure Computing Corporation</a>
</td>
<td align="right"><nobr><b>
$6.75</b></nobr></td>
<td align="right"><nobr><b>
$8.22</b></nobr></td>
<td align="right" class="Green"><nobr><b>21.78%</b></nobr>
</td>
<td align="right">564,434</td>
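Here is a minimal sketch of one way to pull all five fields per row. It assumes every stock on the page is laid out like the SCUR snippet above; the real page may differ, and the helper name parse_rows is hypothetical. It splits the page on <tr> first so a lazy .*? cannot wander into the next stock's row. It then applies one /x pattern with five capture groups, so the comments can sit beside each field.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: returns one hash per stock row found in $html.
# Assumes each row is laid out like the SCUR snippet (an assumption about
# the NASDAQ page, not something the page guarantees).
sub parse_rows {
    my ($html) = @_;
    my @rows;
    # Split on "<tr" so a lazy .*? can never run into the next stock's row.
    for my $row (split /<tr\b/i, $html) {
        next unless $row =~ m{
            selected=([A-Z]{1,5})"                     # symbol, from the commented-out quote link
            .*? \$ ([\d,]+\.\d{2}) \s* </b>            # yesterday's close
            .*? \$ ([\d,]+\.\d{2}) \s* </b>            # today's open
            .*? <b> ([-+]?[\d.]+) % </b>               # gap percentage
            .*? <td \s+ align="right"> ([\d,]+) </td>  # pre-market volume
        }xs;
        push @rows, { symbol => $1, close => $2, open => $3,
                      gap => $4, volume => $5 };
    }
    return @rows;
}

# The SCUR row from the post, as one string.
my $html = <<'HTML';
</tr>
<tr><td>
<script language=javascript>TickerWidget.constructWidget('SCUR','premarket',false)</script>
<!--a href="http://quotes.nasdaq.com/quote.dll?mode=frameset&page=afterhours&selected=SCUR">SCUR</a-->
</td>
<td>
<img src="http://content.nasdaq.com/logos/SCUR.GIF">
<BR>
<a target="_top" href="/asp/offsite_activity.asp?content=http://www.securecomputing.com">
Secure Computing Corporation</a>
</td>
<td align="right"><nobr><b>
$6.75</b></nobr></td>
<td align="right"><nobr><b>
$8.22</b></nobr></td>
<td align="right" class="Green"><nobr><b>21.78%</b></nobr>
</td>
<td align="right">564,434</td>
HTML

for my $r (parse_rows($html)) {
    # prints: SCUR close=6.75 open=8.22 gap=21.78% volume=564,434
    printf "%s close=%s open=%s gap=%s%% volume=%s\n",
        @{$r}{qw(symbol close open gap volume)};
}
```

The numbers in the snippet are consistent with gap = (open - close) / close: (8.22 - 6.75) / 6.75 x 100 is about 21.78. Because each row is matched separately, a row that is missing a field is skipped rather than borrowing values from its neighbour.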


Watts
New User

Feb 5, 2007, 2:52 PM

Post #2 of 5
Re: [rtwolfe] Screen Scraping from HTML page - help

Are you doing all the stocks listed on the page, or are you just interested in a particular one?

http://quotes.nasdaq.com/quote.dll?page=multi&mode=stock&symbol=intc

Taking Intel as an example, this page's source may be easier to scrape (one stock at a time), since the HTML looks like this:


Code
<td >Share Volume:</td> \
<td >63,234,844</td> \
<td >Previous Close:</td> \
<td >$&nbsp;21.23</td> \





Code
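# Example template (placeholders, not literal text):
#   m{\Q1st-delimiter\E(.*?)\Q2nd-delimiter\E(.*?)\Q3rd-delimiter\E}is
# The 1st delimiter locates a "spot" on the page; $2 then grabs whatever lies
# between the 2nd and 3rd delimiters. \Q ... \E makes each delimiter literal.
# Use m{...} instead of m/.../ because "</td>" contains a slash, and match the
# whole page with /s, since the label and the value sit on different lines.

my $Page = do { local $/; <STDIN> };    # slurp the saved HTML, e.g. perl yourscript.pl < saved_page.html

if ($Page =~ m{\QPrevious Close\E(.*?)\Qnbsp;\E(.*?)\Q</td>\E}is) {
    print "$2\n";
}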





This should give you "21.23" (provided you've escaped what needs to be escaped, etc.)
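Watts' template can also be wrapped in a small reusable sub so that several labelled fields come out of one slurped page. This is a sketch, not Watts' code. grab_between is a hypothetical name, and it assumes the single-stock quote page keeps the label/value layout shown in the Intel snippet.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper generalizing the template: find $anchor, then return
# the text between the next $open and the following $close (or undef).
# \Q...\E quotes the interpolated strings so < / $ & ; match literally.
sub grab_between {
    my ($text, $anchor, $open, $close) = @_;
    return $text =~ m{\Q$anchor\E.*?\Q$open\E(.*?)\Q$close\E}s ? $1 : undef;
}

# The Intel snippet quoted above, held as one string so /s can cross lines.
my $page = <<'HTML';
<td >Share Volume:</td> \
<td >63,234,844</td> \
<td >Previous Close:</td> \
<td >$&nbsp;21.23</td> \
HTML

# label => [anchor, opening delimiter, closing delimiter]
my %fields = (
    'Share Volume'   => [ 'Share Volume:',   '<td >',  '</td>' ],
    'Previous Close' => [ 'Previous Close:', '&nbsp;', '</td>' ],
);

for my $name (sort keys %fields) {
    my $value = grab_between($page, @{ $fields{$name} });
    printf "%s: %s\n", $name, defined $value ? $value : 'not found';
}
```

Adding another field is then just another entry in %fields. This keeps the delimiter approach but avoids repeating the whole regex for each value.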


Just a thought...


(This post was edited by Watts on Feb 5, 2007, 3:06 PM)


rtwolfe
New User

Feb 5, 2007, 7:43 PM

Post #3 of 5
Re: [rtwolfe] Screen Scraping from HTML page - help

Hi Watts:

To answer your question, I do want several stocks from the pre-market gapping page.

Specifically the ten stocks that are up in pre-market and the ten that are down.

Not interested in the 10 with most volume, just highest % change either up or down.
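Once each row is parsed, choosing the ten biggest gainers and the ten biggest losers by % change is a numeric sort on the gap field. This is a sketch under stated assumptions. top_movers is a hypothetical name, and it assumes losers' gaps are stored as negative numbers. The snippet only shows a gainer, flagged class="Green", so if the page shows losers unsigned you would need to negate them while parsing.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: given rows like { symbol => ..., gap => ... },
# return the $n biggest gainers and the $n biggest losers by gap %.
# Assumes losers carry a negative gap value; volume is ignored.
sub top_movers {
    my ($n, @rows) = @_;
    my @by_gap = sort { $b->{gap} <=> $a->{gap} } @rows;   # highest gap first
    my @up     = grep { $_->{gap} > 0 } @by_gap;            # already highest first
    my @down   = reverse grep { $_->{gap} < 0 } @by_gap;    # most negative first
    splice(@up,   $n) if @up   > $n;                        # keep at most $n of each
    splice(@down, $n) if @down > $n;
    return (\@up, \@down);
}

# SCUR comes from the snippet in the first post; the other symbols and
# numbers are made-up placeholders, only there to exercise the sort.
my @rows = (
    { symbol => 'SCUR', gap => 21.78 },
    { symbol => 'AAA',  gap => 5.5 },
    { symbol => 'BBB',  gap => -12.4 },
    { symbol => 'CCC',  gap => -3.1 },
    { symbol => 'DDD',  gap => 9.0 },
    { symbol => 'EEE',  gap => -0.5 },
);

my ($up, $down) = top_movers(2, @rows);    # use 10 for the real page
print "Up:   ", join(' ', map { "$_->{symbol} ($_->{gap}%)" } @$up),   "\n";
print "Down: ", join(' ', map { "$_->{symbol} ($_->{gap}%)" } @$down), "\n";
```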



So I still need to parse the page linked earlier.

I'd appreciate any suggestions related to the HTML on the NASDAQ page: http://dynamic.nasdaq.com/dynamic/premarketma.stm

I expect it is something really small that is keeping my draft regex from working, just I can't 'see' it.

Hopefully, with a fresh set of eyes, someone else will 'get it'. Thanks again.




KevinR
Veteran


Apr 2, 2007, 9:39 AM

Post #4 of 5
Re: [rtwolfe] Screen Scraping from HTML page - help

This question was answered and resolved on another forum at the time it was originally posted here.


reder
New User

Apr 16, 2007, 7:14 PM

Post #5 of 5
Re: [KevinR] Screen Scraping from HTML page - help

I worked on a project very much like this a few years ago. Modules such as HTML::TokeParser::Simple are well suited to the task. Avoid regexes for this; they tend to get very messy very quickly.

There is also an HTML Template module that was very useful for scraping stock data.
