



JenniC
Novice

Jul 10, 2009, 9:05 AM


Re: [ochez] Google Scholar Scraper and Excel Parser

ochez

Today is your lucky day. I was able to come up with just the right script for you.

If you have a list of search titles in an Excel file, let's change our scholar.txt script as follows.

1. We will take the search title as an argument instead of hard-coding it.

2. We will wrap the search title in double quotes so that Google Scholar searches for that exact title.

3. We will use the first (and only) "cited by" number.

Here is the resulting scholar.txt script.


Code
   
# Script scholar.txt
# Input argument - search title
var str title
var str page ; cat ("http://scholar.google.com/scholar?q="+"\""+$title+"\"") > $page

# Get string between "cited by " and "<".
var str match ; set $match = "0"
if ( { sen -c -r "^cited by &<^" $page } > 0 )
do
stex -c -r "^cited by &<^" $page > $match
stex -c -r "^cited by ^]" $match > null ; stex -c -r "[^<^" $match > null
done
endif

# $match now has the number. Print it.
# If no matches were found, $match is "0".
echo $match





Save this script as C:\Scripts\scholar.txt.
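For anyone who wants to check the lookup logic outside biterscripting, here is a rough Python sketch of the same two steps: build the quoted query URL, then pull out the first "cited by" number. It is not a translation of the script above. The function names are made up for illustration, the URL-encoding step is an addition that the script does not do, and the pattern assumes the page text contains "Cited by" followed by a number, which has not been checked against the live site.

```python
import re
from urllib.parse import quote_plus

# Same base URL as in scholar.txt.
SCHOLAR_URL = "http://scholar.google.com/scholar?q="


def build_query_url(title):
    """Wrap the title in double quotes (exact match) and URL-encode it.

    The encoding step is an assumption added here; scholar.txt
    concatenates the raw title.
    """
    return SCHOLAR_URL + quote_plus('"' + title + '"')


def first_cited_by(page):
    """Return the first 'cited by' number found in the page text.

    Returns the string "0" when no match is found, mirroring the
    default value of $match in scholar.txt.
    """
    m = re.search(r"cited by\s+(\d+)", page, flags=re.IGNORECASE)
    return m.group(1) if m else "0"
```

Fetching the page is left out on purpose, so the extraction can be tried on a saved copy of a results page first.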

Let's now write another script that reads entries from the Excel file, calls the scholar.txt script for each one, and writes the results back to the Excel file, one by one. We will assume the Excel file is tab-separated, has the search title in the first column, and is at C:\X.txt.




Code
   
# Script excel.txt
# Read excel file.
var str input, output ; cat "C:\X.txt" > $input

# Process entries one by one.
while ($input <> "")
do
var str entry ; lex "1" $input > $entry
var str count ; script scholar.txt title($entry) > $count
set $output = $output+$entry+"\t"+$count+"\n"
done

# The updated output is in $output. Write it back.
echo $output > "C:\X.txt"





Save this script as C:\Scripts\excel.txt. Start biterscripting and enter the following command.


Code
  script excel.txt



When the script has finished running, the results will be in X.txt. Open X.txt with Excel, or print it.
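The read, look up, write-back loop can also be sketched in Python, which may help when checking the output format by hand. This is only an illustration under assumptions: `add_counts` and its `lookup` parameter are hypothetical names, `lookup` stands for any callable that takes a title and returns the count as a string, and the sketch writes to a separate output path instead of overwriting the input.

```python
def add_counts(in_path, out_path, lookup):
    """Read a tab-separated file and write one 'title<TAB>count' line per entry.

    Only the first column of each input line is used as the title.
    Blank lines are skipped. Returns the number of entries written.
    """
    with open(in_path, encoding="utf-8") as f:
        titles = [line.split("\t")[0].strip() for line in f]
    titles = [t for t in titles if t]

    with open(out_path, "w", encoding="utf-8", newline="\n") as f:
        for title in titles:
            # lookup is a hypothetical callable: title -> count string
            f.write(f"{title}\t{lookup(title)}\n")
    return len(titles)
```

A call might look like `add_counts(r"C:\X.txt", r"C:\X_out.txt", my_lookup)`, where `my_lookup` is whatever function performs the actual search. Passing a stub such as `lambda t: "0"` makes it easy to confirm the file handling before any requests are sent.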

This time, I have not been able to test these scripts, so test them first.

If you improve upon these scripts, do post them back. There is value in them. There is always value in automating something that one would otherwise need to do manually - typing, clicking, reading, entering - item by item by item.

Jenni


(This post was edited by JenniC on Jul 10, 2009, 10:33 AM)



