CGI/Perl Guide

How to tell if a file is reachable

 



vendetta1
New User

May 9, 2013, 6:14 PM

Post #1 of 5 (438 views)
How to tell if a file is reachable

I have a homework assignment that doesn't make any sense to me; the assignment text I was given is below. I have never done scripting with web pages before, so I'm not really sure how to get started. The part I could use some pointers on is Part 2: I don't know how to tell whether a "file" is reachable or not.
I have tried running "wget http://www.oracle.com/us/solutions/index.html" in my shell, and a bunch of information about the site comes up, but I'm not sure what to do with it.

Here is the writeup:


You have a web site containing static pages, such as www.oracle.com, and you wish to verify the site.

Part 1: Static Verifier
Write an application called static_verifier that takes two command-line arguments: a directory and a base URI (Uniform Resource Identifier), such as http://www.oracle.com/us/solutions/index.html. The application will scan all .html files in the directory and its subdirectories for <a> (anchor) and <img> (image) tags to find linked files.
For each link, determine whether it points to an internal (this site) or external resource. If it is internal, verify whether the file exists in your snapshot. Output should consist of the file name, the missing internal links, the valid internal links, and the external links. Indent each section. Within each section, list the names alphabetically so they will be diff-compatible with baseline data.
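The first step of Part 1, collecting every .html file in the directory tree, can be sketched with the core File::Find and File::Spec modules. This is a minimal sketch, not a reference solution: the sub name html_files_under is hypothetical, and returning paths relative to the directory argument is an assumption about how the report should name files.

```perl
use strict;
use warnings;
use File::Find;
use File::Spec;

# Hypothetical helper: return every .html file under $dir (recursively),
# as paths relative to $dir, sorted so repeated runs diff cleanly.
sub html_files_under {
    my ($dir) = @_;
    my @found;
    find(
        {
            no_chdir => 1,    # keep $_ as the full path instead of chdir-ing
            wanted   => sub {
                return unless -f $_ && /\.html\z/i;
                push @found, File::Spec->abs2rel($File::Find::name, $dir);
            },
        },
        $dir,
    );
    return sort @found;
}

# Usage: list the pages static_verifier would scan, e.g.
#   print "$_\n" for html_files_under($ARGV[0]);
```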

Sample Output (from modified index.html):

data/index-broken.html
Missing Internal Links
bad_dijkstra.zip
missing_single-dispatch.cc
oldsite/index.html
test_assignment-5.html
test_assignment-5.html
Valid Internal Links
Run_ass1
allocator_skel.cc
args.cc
assignment-1.html
assignment-2.html
assignment-3.html
lecture-01.html
lecture-02.html
lecture-03.html
wordcount-btree-skel.cc
wordcount-map.cc
External Links
http://catb.org/jargon/
http://cis.stvincent.edu/html/tutorials/swd/index.html
http://courses.washington.edu/css343/zander
http://en.wikibooks.org/wiki/C%2B%2B
http://en.wikibooks.org/wiki/More_C%2B%2B_Idioms
http://en.wikipedia.org/wiki/Bash_%28Unix_shell%29
http://en.wikipedia.org/wiki/Bourne_shell
http://en.wikipedia.org/wiki/C_shell
http://en.wikipedia.org/wiki/Exit_status
http://en.wikipedia.org/wiki/Man_page
http://en.wikipedia.org/wiki/Script_%28computing%29
http://www.cs.sunysb.edu/~algorith/video-lectures/
http://www.parashift.com/c++-faq-lite/index.html
http://www.uwb.edu/css
http://www.washington.edu/computing/unix/
http://yosefk.com/c++fqa/
https://catalyst.uw.edu/collectit/dropbox/morrisb9/25684
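The internal/external split and the existence check described in Part 1 might be sketched as two small core-Perl helpers. The sub names are hypothetical, and the rules are assumptions rather than anything the assignment specifies: a link with no URI scheme is internal, a link with a scheme is internal only if it starts with the base URI's scheme://host, and only relative paths are checked against the snapshot. Root-relative ("/...") links and same-site absolute URLs would also need a mapping from the base URI to the snapshot directory, which is not shown.

```perl
use strict;
use warnings;
use File::Basename qw(dirname);
use File::Spec;

# Hypothetical helper: 1 if $href leaves the site, 0 otherwise.
# Assumption: relative links are internal; a link with a scheme
# (http:, mailto:, ...) is internal only if it matches the base's scheme://host.
sub is_external {
    my ($href, $base) = @_;
    return 0 unless $href =~ m{^[a-z][a-z0-9+.-]*:}i;    # no scheme: relative
    my ($site) = $base =~ m{^([a-z][a-z0-9+.-]*://[^/?#]+)}i;
    return 1 unless defined $site;
    return $href =~ m{^\Q$site\E(?:[/?#]|\z)}i ? 0 : 1;
}

# Hypothetical helper: for a relative internal link, check that the target
# exists in the snapshot, resolving it against the linking page's directory.
sub internal_target_exists {
    my ($href, $page_path) = @_;
    (my $path = $href) =~ s/[?#].*\z//s;    # drop query string and fragment
    return 1 if $path eq '';                # "#top" refers to the page itself
    my $target = File::Spec->catfile(dirname($page_path), $path);
    return -e $target ? 1 : 0;
}
```

Perl's default string sort compares bytes, so it places Run_ass1 before allocator_skel.cc and http:// links before https:// links, matching the order in the sample output; pushing every link (without de-duplicating) also keeps repeated entries such as test_assignment-5.html.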

Part 2

Write a program called file_verifier that will take the same arguments as static_verifier.

Verify that each file in the subtree is reachable, directly or indirectly, from the homepage (index.html). Print the list of unreachable files in sorted order.
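Part 2 is a graph-reachability question: treat each file as a node and each internal link as an edge, then walk the graph from index.html; whatever the walk never visits is unreachable. A minimal breadth-first sketch, assuming a hash has already been built that maps each page's relative path to the normalized relative paths of the internal files it links to (building it, and resolving "../" segments, is not shown). The sub name unreachable_files and the file orphan.html are hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: breadth-first search from $start over a link graph.
# $links:     hashref, page path => arrayref of internal targets it links to
# $all_files: arrayref of every file in the subtree (e.g. from File::Find)
# Returns the files never reached from $start, sorted.
sub unreachable_files {
    my ($links, $all_files, $start) = @_;
    my %seen  = ($start => 1);
    my @queue = ($start);
    while (@queue) {
        my $page = shift @queue;
        # Non-HTML targets (images, .cc files) have no entry, so no edges.
        for my $target (@{ $links->{$page} || [] }) {
            next if $seen{$target}++;    # already queued or visited
            push @queue, $target;
        }
    }
    my @missing = grep { !$seen{$_} } @$all_files;
    return sort @missing;
}

# Usage with a made-up graph: nothing links to orphan.html, so it and the
# file that only it links to are unreachable.
my %links = (
    'index.html'      => [ 'lecture-01.html', 'args.cc' ],
    'lecture-01.html' => [ 'lecture-02.html' ],
    'orphan.html'     => [ 'wordcount-map.cc' ],
);
my @all = ( 'index.html', 'lecture-01.html', 'lecture-02.html',
            'args.cc', 'orphan.html', 'wordcount-map.cc' );
print "$_\n" for unreachable_files( \%links, \@all, 'index.html' );
# prints orphan.html, then wordcount-map.cc
```

The %seen hash is what makes cycles (index.html linking to a page that links back) harmless: each file is queued at most once.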



hwnd
User

May 9, 2013, 6:54 PM

Post #2 of 5 (430 views)
Re: [vendetta1] How to tell if a file is reachable

This is a cross-post from Stack Overflow, where I see your question has been closed.




vendetta1
New User

May 9, 2013, 6:59 PM

Post #3 of 5 (427 views)
Re: [hwnd] How to tell if a file is reachable

That was a vague question about Part 1; this one is about Part 2.


FishMonger
Veteran / Moderator

May 9, 2013, 7:12 PM

Post #4 of 5 (425 views)
Re: [vendetta1] How to tell if a file is reachable

This question is also cross-posted on DevShed.

We can't/won't give you the solution to a homework assignment, but I will give you a hint.

Look at HTML::LinkExtor
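As a rough illustration of the hint: HTML::LinkExtor (from the HTML-Parser distribution on CPAN, not core Perl) calls a callback with each tag name and that tag's link attributes. The sketch below keeps only the href of <a> tags and the src of <img> tags, which is what Part 1 asks for; the sub name extract_links and the sample markup are hypothetical. Passing a base URI as the second argument to new() makes the parser return absolute URI objects instead of the raw strings, and parse_file can replace parse to read a page straight from disk.

```perl
use strict;
use warnings;
use HTML::LinkExtor;    # ships with HTML-Parser (CPAN), not core Perl

# Hypothetical helper: return the href of every <a> and the src of every
# <img> in an HTML string, in document order.
sub extract_links {
    my ($html) = @_;
    my @found;
    my $parser = HTML::LinkExtor->new(
        sub {
            my ($tag, %attr) = @_;    # %attr holds only link attributes
            push @found, $attr{href} if $tag eq 'a'   && defined $attr{href};
            push @found, $attr{src}  if $tag eq 'img' && defined $attr{src};
        }
    );
    $parser->parse($html);
    $parser->eof;                     # flush any buffered input
    return @found;
}

my $page = '<p><a href="lecture-01.html">Lecture 1</a>'
         . '<img src="logo.png" alt=""><a href="http://catb.org/jargon/">Jargon</a>';
print "$_\n" for extract_links($page);
# prints lecture-01.html, logo.png and http://catb.org/jargon/, one per line
```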


vendetta1
New User

May 10, 2013, 12:05 PM

Post #5 of 5 (400 views)
Re: [FishMonger] How to tell if a file is reachable

LinkExtor! That's what I needed. Thank you.

 
 

