return text between <title>-tags from html

 



just.marc.smith
New User

Aug 23, 2013, 6:42 AM

Post #1 of 3 (773 views)

Dear users,

I'm learning Perl from the book Perl for Beginners by Geoffrey Sampson. I'm committed to using Perl in my job.
However, there is a task that needs to be done rather quickly.

I have only reached page 50 of the book, and while I feel that what I'm about to ask is probably a walk in the park for most of you, it's too complicated for me at the moment.

Here's what I'd need.

I have a text file containing hyperlinks, one per line.

http://www.example.com/1
http://www.example.com/2
...


The Perl script should open the first hyperlink, scan the HTML file for the <title> tags, and write the words between the title tags after that hyperlink in the text file, then do the same for each remaining hyperlink.

After the script is done, the text file should look like this:

http://www.example.com/1 webpage one
http://www.example.com/2 webpage two
...

Edit: I forgot to mention that every html-page has the same structure.

What would the code look like to do that?
Would the script read the entire html-file or just the <title>-tags?


I'm looking forward to the responses.

Kind regards,

Marc


(This post was edited by just.marc.smith on Aug 23, 2013, 6:55 AM)


BillKSmith
Veteran

Aug 23, 2013, 10:16 AM

Post #2 of 3 (764 views)
Re: [just.marc.smith] return text between <title>-tags from html

This is not a good choice for your first Perl project, but I understand that it is not your choice. Note: This is not the place to find someone else to do it for you. Do you have any other programming experience that we can build on?

You must learn to use a module. You can probably use LWP::Simple to download the HTML documents. You then have to parse them to get the required data. Because the scope of this project is limited, you may be able to do that with a regular expression. A better approach is to find and use a module from CPAN to parse the HTML.
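To make that outline a little more concrete, here is a rough sketch, not a finished solution. It assumes that LWP::Simple is installed, that the file holds one URL per line, and that a simple regular expression is good enough because every page has the same structure. The sub names (fetch_html, annotate_urls, annotate_file) are made up for illustration; only get comes from LWP::Simple.

```perl
use strict;
use warnings;

# Download a page and return its HTML, or undef if the download fails.
# LWP::Simple is loaded only when this sub is first called.
sub fetch_html {
    my ($url) = @_;
    require LWP::Simple;
    return LWP::Simple::get($url);
}

# Given a reference to a list of URLs and a fetcher (a code reference),
# return one "URL title" line per URL. A URL whose page cannot be
# fetched, or has no title, is returned unchanged.
sub annotate_urls {
    my ($urls, $fetch) = @_;
    my @lines;
    for my $url (@$urls) {
        my $html  = $fetch->($url);
        my $title = '';
        if (defined $html && $html =~ m{<title[^>]*>(.*?)</title>}is) {
            $title = $1;
            $title =~ s/\s+/ /g;          # collapse line breaks and runs of spaces
            $title =~ s/^\s+|\s+$//g;     # trim both ends
        }
        push @lines, $title eq '' ? $url : "$url $title";
    }
    return @lines;
}

# Read the URL file, look up each title, then rewrite the same file.
sub annotate_file {
    my ($path, $fetch) = @_;
    open my $in, '<', $path or die "Cannot read $path: $!";
    my @urls;
    while (my $line = <$in>) {
        my ($url) = split ' ', $line;     # first whitespace-separated field
        push @urls, $url if defined $url; # skips blank lines
    }
    close $in;

    my @lines = annotate_urls(\@urls, $fetch);

    open my $out, '>', $path or die "Cannot write $path: $!";
    print {$out} "$_\n" for @lines;
    close $out or die "Cannot close $path: $!";
    return scalar @lines;
}
```

With a file called links.txt (a hypothetical name), the whole job would be one call: annotate_file('links.txt', \&fetch_html); The fetcher is passed in as an argument so that the rest of the code can be tried on canned HTML before it touches the network. Note that the whole page is downloaded in each case; the script only looks at the title afterwards.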

If LWP::Simple is installed on your machine, you can read its documentation by running the following command on your command line:

Code
perldoc LWP::Simple


Your book almost certainly has a chapter on regular expressions. This is not an easy subject unless you are already familiar with them from somewhere else (e.g. Unix grep). Again, use Perl's perldoc tool to read the pages perlrequick, perlretut, and perlre.
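As a small taste of what those pages cover, here is one way a regular expression could pull out a title. Treat it as a sketch: extract_title is a made-up name, the sample HTML is invented, and a pattern like this only copes with simple, well-behaved pages such as yours.

```perl
use strict;
use warnings;

# Return the text between <title> and </title>, or undef if there is none.
sub extract_title {
    my ($html) = @_;
    return unless defined $html;

    # /i ignores case, so <TITLE> matches too.
    # /s lets "." match newlines, so a title split over lines still matches.
    # .*? is non-greedy: it stops at the first closing tag, not the last.
    if ($html =~ m{<title\b[^>]*>(.*?)</title\s*>}is) {
        my $title = $1;
        $title =~ s/\s+/ /g;          # collapse line breaks and runs of spaces
        $title =~ s/^\s+|\s+$//g;     # trim both ends
        return $title;
    }
    return;
}

# Invented sample page, used only to try the sub without a download.
my $html = <<'HTML';
<html>
<head>
<TITLE>
  webpage   one
</TITLE>
</head>
<body><p>Hello</p></body>
</html>
HTML

my $title = extract_title($html);
print defined $title ? "$title\n" : "no title found\n";   # prints "webpage one"
```

The three pieces worth looking up in perlretut are the /i and /s modifiers, the capturing parentheses that fill $1, and the difference between .* and .*?.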

Your book should also have a chapter on searching CPAN.
Once you have found a module you want to try, you may have to install it. That step depends on which operating system and which version of Perl you are using.

I am sure you will have more specific questions as you work. Ask all you need.
Good Luck,
Bill


Laurent_R
Veteran / Moderator

Aug 23, 2013, 10:27 AM

Post #3 of 3 (762 views)
Re: [just.marc.smith] return text between <title>-tags from html

If you've gone through only the first 50 pages of your beginners' book so far, then, yes, it is probably way too complicated for you at this point.

 
 

