
Need to partially automate user-driven scraping. Tight Spec, no arguments about completion.

 



evan1138
New User

Mar 27, 2013, 4:32 PM

Post #1 of 1 (9914 views)

2013-03-27

For PC.

I need some Perl coding to semi-automate extraction of US teachers' school (not home) email addresses. The Perl (or other?) program does not have to be clever, and since the spec is fairly complete there shouldn't be much back-and-forth about whether the job is completed.

I HAVE A CSV FILE WITH THESE FIELDS:

teacher first name
teacher last name
school name
school street address
school state
school zip
search status
email address

The last two fields start out blank and are filled with the following values:

(1) "search status": blank (nothing tried); "found"; "last pattern tried" (1..4); "abandoned"; "email obscured"; "timed out"; possibly others.

(2) "email address", the email address found, or blank if not (yet) found.

These statuses are click-driven: the operator clicks, and the program fills in the field.
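A minimal sketch of the table layout described above, written in Python since the post allows "Perl (or other?)". The column names, the exact status spellings, and the helper name load_records are assumptions for illustration, as is the guess that the CSV has no header row; the street address and zip in the sample row are placeholders.

```python
import csv
import io

# Assumed column names, in the order the post lists the fields.
FIELDS = [
    "first_name", "last_name", "school_name", "street_address",
    "state", "zip", "search_status", "email",
]

# Assumed spellings for the statuses named in the post; "" means nothing tried,
# and "1".."4" stand for "last pattern tried".
STATUSES = {"", "found", "1", "2", "3", "4", "abandoned", "email obscured", "timed out"}


def load_records(handle):
    """Read headerless CSV rows into dicts, padding any missing trailing fields."""
    records = []
    for row in csv.reader(handle):
        if not row:
            continue  # skip empty lines
        row = row + [""] * (len(FIELDS) - len(row))
        records.append(dict(zip(FIELDS, row)))
    return records


# Placeholder row: the last two fields start out blank.
sample = io.StringIO("Lisa,Simpson,Akron High School,1 Main St,OH,44301,,\n")
records = load_records(sample)
print(records[0]["school_name"], repr(records[0]["search_status"]))
```

Keeping each record as a plain dict makes it easy to display in the program window and to write back later with csv.writer.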

THE SCREEN HAS TWO WINDOWS

The first window is the program window, which I think is, for now, a static UI with a one-line message/prompt bar at the top. There are some buttons with lit/dark states, some radio buttons, and a few plain lit/dark readouts. At the bottom of the window are two message areas of one line each. No scrolling for now.

The second window is a browser showing Google search.

INITIALIZATION

During initialization, the user indicates the browser window to the Perl program in some way.

We load the table and scan to the first record that has a blank "search status". If we reach EOF instead, we flush and close the file and exit with a message in the

THE "ALGORITHM":

The current record is displayed in the program window.

The user clicks on a search method (radio buttons 1..4, but only 1 is available now) and the radio button lights up for clarity. (It goes dark whenever the current record changes.)

The program does a search on school name plus state (e.g., "Akron High School OH"), and from the search results the operator copies the school's URL and clicks a button in the program window. (Or, if possible, the user just highlights the URL and the program copies it, expanding the selection in both directions and then trimming off any trailing /'s, etc.)
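The first search step might look like the sketch below (Python, since the post allows a language other than Perl). The function names are hypothetical, and driving the user's already-open browser window is left out; the sketch only builds the query string and the matching Google search URL.

```python
from urllib.parse import urlencode


def school_query(school_name, state):
    """Build the first-pass query, e.g. 'Akron High School OH'."""
    return f"{school_name.strip()} {state.strip()}"


def google_search_url(query):
    """Encode a query into a Google search URL via the 'q' parameter."""
    return "https://www.google.com/search?" + urlencode({"q": query})


url = google_search_url(school_query("Akron High School", "OH"))
print(url)
# The standard library's webbrowser.open(url) would hand this to the default browser.
```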

The program pulls the school's URL from the clipboard (or from the program's own buffer if the screen scrape above is possible) and composes, e.g., the following Google search string:

"site:ahs.k12.oh.us Lisa Simpson" without the quotes, and does the search.

The user then browses around and, if the address is found, copies it to the clipboard (or again, maybe the program can screen-scrape and clean it) and clicks the "found" button; if the search fails for some reason, the user clicks a different "search status" button.
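The "clean" part could be as small as the following Python sketch. The pattern is deliberately simple and is not a full address parser, the function name is hypothetical, and the sample address is invented for illustration.

```python
import re

# Simple email-like pattern: local part, "@", then at least two dot-separated labels.
EMAIL_RE = re.compile(r"[A-Za-z0-9._%+-]+@[A-Za-z0-9-]+(?:\.[A-Za-z0-9-]+)+")


def clean_email(clipboard_text):
    """Return the first email-like token in the text, or '' if there is none."""
    match = EMAIL_RE.search(clipboard_text)
    return match.group(0) if match else ""


# Invented address with typical copy-and-paste debris around it.
print(clean_email("  mailto:LSimpson@ahs.k12.oh.us. "))
```

An empty result is a natural cue for the program to ask the operator to copy again or pick a different status.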

Whatever the click, it prompts the program to install the values for "search status" and "email address" into those fields, and then display the revised record.
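A sketch of that update step in Python, assuming each record is a dict with "search_status" and "email" keys (names chosen here, not given in the spec) and assuming that only a "found" click carries an address. The sample address is invented.

```python
def apply_result(record, status, clipboard_text=""):
    """Return a copy of the record with the clicked status and any email filled in."""
    updated = dict(record)
    updated["search_status"] = status
    # Assumption: only "found" stores an address; every other status leaves it blank.
    updated["email"] = clipboard_text.strip() if status == "found" else ""
    return updated


record = {"first_name": "Lisa", "last_name": "Simpson", "search_status": "", "email": ""}
print(apply_result(record, "found", " lsimpson@ahs.k12.oh.us "))
print(apply_result(record, "timed out"))
```

Returning a copy rather than editing in place means the revised record can be displayed for approval before it replaces the stored one.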

The operator approves and clicks "next record" which goes to the next record with blank "search status".
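The "next record" scan, which is the same scan used at initialization, could be sketched in Python as below. The function name, the dict keys, and the sample rows are assumptions for illustration.

```python
def next_blank(records, start=0):
    """Index of the first record at or after start with a blank status, or None at EOF."""
    for i in range(start, len(records)):
        if not records[i]["search_status"].strip():
            return i
    return None


# Made-up rows; only the status column matters for the scan.
records = [
    {"last_name": "Simpson", "search_status": "found"},
    {"last_name": "Van Houten", "search_status": ""},
    {"last_name": "Wiggum", "search_status": "timed out"},
]
first = next_blank(records)
print(first)
print(next_blank(records, first + 1))
```

A None result corresponds to the EOF case, where the program would save the file and exit with a message.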

USER HELPERS:

There will be some user recovery buttons, and the first to be implemented will be "redo record", typically because the operator saw something wrong, or had pushed a wrong "search status" button at the end of the search. So this is included in the first pass on the program.
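"Redo record" may need nothing more than clearing the two program-filled fields and staying on the same record. A Python sketch, with hypothetical names and an invented sample record:

```python
def redo_record(record):
    """Return a copy with status and email cleared so the record is searched again."""
    cleared = dict(record)
    cleared["search_status"] = ""
    cleared["email"] = ""
    return cleared


# Invented record where the operator clicked the wrong status button.
wrong = {"last_name": "Simpson", "search_status": "abandoned", "email": ""}
print(redo_record(wrong))
```

Because the status is blank again, the record is also picked up by any later scan for blank statuses if the operator moves on without finishing it.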

WHAT SAY YOU?:

OK, lots of description but not all that much action, I believe. Anybody want to talk about some coding?


(This post was edited by evan1138 on Mar 27, 2013, 5:19 PM)

 
 

