
JenniC
Jul 8, 2009, 11:49 AM
Re: [ochez] Google Scholar Scraper and Excel Parser
This is fairly simple using biterscripting. Let's say your search term is "Prof. XYZ". The Google URL will be "http://scholar.google.com/scholar?q=Prof.%20XYZ".

    # Script scholar.txt
    var str page
    cat "http://scholar.google.com/scholar?q=Prof.%20XYZ" > $page
    # Keep collecting and printing the string between "cited by " and "<".
    while ( { sen -c -r "^cited by &<^" $page } > 0 )
    do
        var str match
        stex -c -r "^cited by &<^" $page > $match
        # Remove "cited by " and "<" from $match.
        stex -c -r "^cited by ^]" $match > null
        stex -c -r "[^<^" $match > null
        # $match now has only the number following "cited by ". Print it.
        echo $match
    done

I tested this script, and it works. To try it, download biterscripting (http://www.biterscripting.com), save the script as C:\Scripts\scholar.txt, and then call it from biterscripting.

You can also call it from a Perl program, or you can translate the functionality to Perl. If you make the script better, please post it; I think a lot of people can benefit from an improved version. I use biterscripting to parse and scrape our own web pages.

Jenni
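Since translating the functionality to another language comes up in the post, here is a minimal Python sketch of the same idea. It is not Jenni's code and has not been run against live Google Scholar. It assumes the result page still contains text of the form "Cited by N" immediately followed by "<". The helper names `scholar_url`, `cited_by_counts` and `fetch` are hypothetical, and the `User-Agent` header is an assumption. Google may change its HTML or refuse automated requests at any time.

```python
import re
import urllib.parse
import urllib.request
from typing import List

# Base search URL taken from the post; the query term is appended percent-encoded.
SCHOLAR_URL = "http://scholar.google.com/scholar?q="

# Same idea as the script's "^cited by &<^": the text after "cited by " up to the
# next "<". re.IGNORECASE plays the role of biterscripting's -c flag.
CITED_BY = re.compile(r"cited by ([^<]*)<", re.IGNORECASE)


def scholar_url(term: str) -> str:
    """Build the search URL (hypothetical helper); a space becomes %20."""
    return SCHOLAR_URL + urllib.parse.quote(term)


def cited_by_counts(html: str) -> List[str]:
    """Return the text between each "cited by " and the following "<", in page order."""
    return CITED_BY.findall(html)


def fetch(url: str) -> str:
    """Download a page as text (hypothetical helper; the User-Agent value is an assumption)."""
    req = urllib.request.Request(url, headers={"User-Agent": "Mozilla/5.0"})
    with urllib.request.urlopen(req) as resp:
        return resp.read().decode("utf-8", errors="replace")
```

Usage: `for n in cited_by_counts(fetch(scholar_url("Prof. XYZ"))): print(n)`. Using `re.findall` collects every match in one pass, rather than counting matches and repeatedly extracting the first one from `$page` as the biterscripting loop appears to do; the numbers come out in the same page order.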
(This post was edited by JenniC on Jul 8, 2009, 11:50 AM)