Oct 30, 2012, 9:34 PM
Post #1 of 11
Okay, so my company thinks that because I took a 7-week class on Perl (completed, with a decent grade), I'm a genius now (sarcasm). They've asked me to help with a project. We're moving servers and networks, and it's a crazy mess; we also had a huge layoff, so it's been nuts. My boss wants me to pull the URLs out of the source code of some of our intranet sites, so we can see how we might want to configure the new SharePoint and intranet sites. We have so much clutter in site locations that some files haven't been updated since around 2009, while people have been creating new ones elsewhere. So it's basically web crawling across our network.

What I thought about doing is using Perl to pull the URLs from a saved .txt or .html file and add them to an array (something I used to hate in Java, but find nicer in Perl). Everything will print out, and I can copy it into a spreadsheet or Word document and start my Visio workflow diagram. I'd ask the scripting guru at work, but he's out with a new kid. So now you see my dilemma. Want to help out? Thanks for reading.
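A minimal sketch of that plan, assuming the page sources have been saved as local files and that the links sit in href="..." or href='...' attributes. urls_in_file is a hypothetical helper name. The script collects every URL into one array, tagged with the file it came from, and prints one file,URL pair per line so the output can go straight into a spreadsheet.

```perl
use strict;
use warnings;

# Read one saved page source and return every href value found in it.
# Assumption: URLs appear as href="..." or href='...' (any letter case).
sub urls_in_file {
    my ($path) = @_;
    open(my $fh, '<', $path) or die "Cannot open $path: $!";
    my @found;
    while (my $line = <$fh>) {
        # /g in a while loop walks through every match on the line
        while ($line =~ /href\s*=\s*["']([^"']+)["']/gi) {
            push @found, $1;
        }
    }
    close($fh);
    return @found;
}

my @rows;    # each element is [source file, URL]
for my $file (@ARGV) {
    push @rows, [ $file, $_ ] for urls_in_file($file);
}

# Comma-separated rows for a spreadsheet. URLs that contain commas or
# quotes would need real CSV quoting, which this sketch skips.
print join(',', @$_), "\n" for @rows;
```

Usage might look like `perl pull_urls.pl page1.html page2.txt > urls.csv` (both the script name and the output file name are made up); with no file arguments it prints nothing.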
Obtaining URLs from source code
I'm an intern, so I can't VPN in to access the intranet sites, but it's a project my boss would like me to help with. So far I've been using Firefox's page-source view, or making one or two fake entries of my own to test my code.
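Without VPN access, one way to make those test entries is to keep a few hand-written lines of HTML in a string and open the string as if it were a file; Perl's open accepts a reference to a scalar for this. A minimal sketch, where the sample links are made up and the href pattern is an assumption about how the real pages mark their links:

```perl
use strict;
use warnings;

# Hand-made test page (fake links, for testing only).
my $fake_html = <<'HTML';
<html><body>
<a href="http://intranet.example/hr/index.html">HR</a>
<p>No link on this line</p>
<a href="/sites/finance/reports.aspx">Reports</a>
</body></html>
HTML

# Open the string as an in-memory file: same reading code as a real file.
open(my $dat, '<', \$fake_html) or die "Cannot open in-memory file: $!";
my @htmlarray = <$dat>;    # one element per line, exactly like a real file
close($dat);

my @urls;
foreach my $each_line (@htmlarray) {
    # Collect every href value on the line (assumed link format).
    push @urls, $1 while $each_line =~ /href\s*=\s*["']([^"']+)["']/gi;
}

print "$_\n" for @urls;
```

Once the extraction works on the fake page, swapping `\$fake_html` for a real file name is the only change needed to read a saved page source.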
open(DAT, '<', $htmlfile) or die "Dude, no file named $htmlfile! ($!)";
Right now this opens my fake test HTML file (if it's in the same directory) and lets me print from the array. My plan was to use a foreach loop to get each line of text into $each_line, and then a regular expression to check the information before printing it out, so I can copy it to Notepad, Word, etc.
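One way to do the "check it with a regular expression" step, sketched under assumptions: the hypothetical helper urls_on_line grabs every href= or src= attribute value on a line, then drops values that are not locations (mailto:, javascript:, and same-page #anchors). That skip list is a guess to adjust for the real sites.

```perl
use strict;
use warnings;

# Return the href/src values on one line that look like page or file locations.
sub urls_on_line {
    my ($line) = @_;
    my @keep;
    while ($line =~ /\b(?:href|src)\s*=\s*["']([^"']*)["']/gi) {
        my $url = $1;                                    # save before any other match
        next if $url eq '';                              # empty attribute
        next if $url =~ /^\s*(?:mailto|javascript):/i;   # not a location
        next if $url =~ /^#/;                            # link within the same page
        push @keep, $url;
    }
    return @keep;
}

# Made-up sample lines standing in for the array read from the file.
my @htmlarray = (
    qq{<a href="http://intranet.example/it/">IT</a> <a href="#top">Top</a>\n},
    qq{<img src="/images/logo.png"> <a href="mailto:help\@example.com">Help</a>\n},
);

foreach my $each_line (@htmlarray) {
    print "$_\n" for urls_on_line($each_line);
}
```

Keeping the check in its own sub means the foreach loop stays short, and the filter can be changed in one place once the real pages show what counts as noise.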
Below is where I get lost. I know the idea is sound and should be rather easy to accomplish. Thanks again for any assistance.
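my @htmlarray = <DAT>;    # add the file data to the array, one line per element
foreach my $each_line (@htmlarray) {
    # find each URL on this line (assumes it sits in an href="..." or href='...' attribute)
    while ($each_line =~ /href\s*=\s*["']([^"']+)["']/gi) {
        print "$1\n";    # print it for copying to a text document (probably 1,000-1,500 URLs, or a LOT more)
    }
}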
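With 1,000 to 1,500 URLs or more, many will be repeats, since pages tend to link to the same places. A minimal sketch, assuming the links sit in href attributes and that two URLs are the same only when their text matches exactly: the hypothetical helper count_urls counts each URL in a hash instead of printing every occurrence, then each unique URL is printed once with its count, tab-separated so it pastes into two spreadsheet columns.

```perl
use strict;
use warnings;

# Count how often each URL appears across all lines.
sub count_urls {
    my (@lines) = @_;
    my %seen;    # URL => number of times it was found
    foreach my $each_line (@lines) {
        while ($each_line =~ /href\s*=\s*["']([^"']+)["']/gi) {
            $seen{$1}++;
        }
    }
    return %seen;
}

# Made-up sample lines; in practice pass the array read from the saved file.
my %count = count_urls(
    qq{<a href="/sites/hr/">HR</a> <a href="/sites/it/">IT</a>\n},
    qq{<a href="/sites/hr/">HR again</a>\n},
);

# Tab-separated output: URL in one column, count in the next.
foreach my $url (sort keys %count) {
    print "$url\t$count{$url}\n";
}
printf "%d unique URLs\n", scalar keys %count;
```

With the sample lines it prints /sites/hr/ with a count of 2, /sites/it/ with a count of 1, and then "2 unique URLs". A high count also hints at which locations matter most when planning the new SharePoint layout.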
Perl Newbie - 7 months of Perl basics.