
rtwolfe
New User
Feb 4, 2007, 9:15 PM
Post #1 of 5
(2505 views)
Screen Scraping from HTML page - help
Thanks in advance. I've been hammering at this for a week with no luck. I've even used RegexBuddy and still can't crack the problem.

I have a NASDAQ premarket action stocks page from which I want to pull out the stock symbol, yesterday's close, today's open, the gap % and the premarket volume. I've got a script that pulls the HTML down and puts it in a file. I can parse out the stock symbol, but I am stuck on how to get out the close, open, gap and volume data.

I think my problem is with the second grouping (and, if I could figure it out, the 3rd, 4th and 5th): the 2nd set of parentheses doesn't seem to recognize that I want Perl to save the match (yesterday's close) to $2 (assume the stock symbol would be in $1).

Here's the NASDAQ page: http://dynamic.nasdaq.com/dynamic/premarketma.stm

Here's my regex so far. I am using single-line mode (the /s option) and greedy matching:

aafterhours&selected=([A-Z]{2,5})".*?\$([\d]{1,5}\.[\d]{2})<

If this helps, here's a snippet of the text with an example of the data I'd like to get. Per RegexBuddy, my regex gets me all the way through the $6.75 but does not understand that I want it to save the 6.75 to $2. Is my problem with strings and numbers? Appreciate any and all help!

</tr>
<tr><td> <script language=javascript>TickerWidget.constructWidget('SCUR','premarket',false)</script>
<!--a href="http://quotes.nasdaq.com/quote.dll?mode=frameset&page=afterhours&selected=SCUR" >SCUR</a--> </td>
<td> <img src="http://content.nasdaq.com/logos/SCUR.GIF"> <BR>
<a target="_top" href="/asp/offsite_activity.asp?content=http://www.securecomputing.com"> Secure Computing Corporation</a> </td>
<td align="right"><nobr><b> $6.75</b></nobr></td>
<td align="right"><nobr><b> $8.22</b></nobr></td>
<td align="right" class="Green"><nobr><b>21.78%</b></nobr> </td>
<td align="right">564,434</td>
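For reference, here is a minimal sketch of what capturing all five fields from a row like the one above might look like. It assumes every row has the same cell order as the SCUR example (symbol link, close, open, gap %, volume) and runs against a copy of that snippet with the stray spaces removed rather than the live page. The names $row_re and @rows are made up for illustration, and the pattern is only one of several ways to write this, not a tested solution for the whole NASDAQ page.

```perl
use strict;
use warnings;

# Sample row copied from the snippet in the post (hypothetical stand-in
# for the HTML file the download script saves).
my $html = <<'HTML';
<tr><td> <script language=javascript>TickerWidget.constructWidget('SCUR','premarket',false)</script>
<!--a href="http://quotes.nasdaq.com/quote.dll?mode=frameset&page=afterhours&selected=SCUR" >SCUR</a--> </td>
<td> <img src="http://content.nasdaq.com/logos/SCUR.GIF"> <BR>
<a target="_top" href="/asp/offsite_activity.asp?content=http://www.securecomputing.com"> Secure Computing Corporation</a> </td>
<td align="right"><nobr><b> $6.75</b></nobr></td>
<td align="right"><nobr><b> $8.22</b></nobr></td>
<td align="right" class="Green"><nobr><b>21.78%</b></nobr> </td>
<td align="right">564,434</td>
HTML

# /x allows whitespace and comments inside the pattern; /s lets . match newlines.
my $row_re = qr{
    afterhours&selected=([A-Z]{2,5})"            # group 1: stock symbol
    .*? \$ \s* (\d+\.\d{2}) \s* <                # group 2: yesterday's close
    .*? \$ \s* (\d+\.\d{2}) \s* <                # group 3: today's open
    .*? > \s* ([-+]?\d+(?:\.\d+)?) %             # group 4: gap percentage
    .*? <td[^>]*> \s* ([\d,]+) \s* </td>         # group 5: premarket volume
}xs;

my @rows;
while ($html =~ /$row_re/g) {
    # Copy the captures right away; the next successful match overwrites them.
    my ($symbol, $close, $open, $gap, $volume) = ($1, $2, $3, $4, $5);
    $volume =~ tr/,//d;    # strip thousands separators from the volume
    push @rows, {
        symbol => $symbol,
        close  => $close,
        open   => $open,
        gap    => $gap,
        volume => $volume,
    };
}

printf "%-6s close=%s open=%s gap=%s%% volume=%s\n",
    @{$_}{qw(symbol close open gap volume)} for @rows;
```

One caveat with this approach: because .*? combined with /s can cross row boundaries, a row that is missing one of the cells would cause the match to borrow values from the following row. An HTML parsing module would be more robust than a single regex if the page layout varies.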