
pbuks
Deleted
Jan 5, 2001, 3:14 AM
Post #1 of 6
From beginner To Advanced: Get a multilanguage sit
Hello, I want to thank BigRich for helping me so far. My question was how to get the HTML source from a remote site. Unfortunately I didn't mention that it was a multilanguage site (stupid me :))). I will post the last message BigRich sent me; maybe someone can help me download the site with a Perl/CGI script.

Here is the URL: http://games.skynet.be/page.html?channel=arena&pagelang=nl&subject=scores&sid=2&sort=fph&offset=50

This is what BigRich wrote:
__________________________________________
It didn't work because in your original post you asked how to retrieve the content from a simple HTML page, when in fact the site you are trying to access is a multi-language site that is done in frames and uses cookies. You have to have the proper cookie to access the page you are trying to access. If not, you get sent to index2.html, where you get a cookie based on the menu selection you choose. It doesn't matter if it's a browser or a CGI script, you still need the proper cookie.

You may be able to construct a user agent using LWP::UserAgent that can accept cookies, but more than likely you'll need a bot/spider to access the information you want to get at.

You need to study the docs that came with Perl. The docs you need to concentrate on are HTTP (Cookies, Headers, Request, Response) and LWP (UserAgent, RobotUA, lwpcook, etc). You'll also want to do a search for "perl bots" for sites and info pertaining to bots and spiders.

You could also re-post in the "Advanced" section of this forum, where someone with more experience with bots/spiders may be able to help. Be sure to give the URL that you are trying to access, not a simple example as you did here, and explain that the site is done in frames and uses cookies.

Good luck,
BigRich
__________________________________________
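
For anyone following along, here is a rough sketch of the cookie-accepting user agent that BigRich describes, using LWP::UserAgent together with HTTP::Cookies. It is only a starting point under a few assumptions: the helper names make_agent and fetch_with_cookie are made up for illustration, and it assumes that requesting an entry page is enough for the server to hand out the cookie. On this site the cookie apparently depends on the menu selection, so the entry URL may need to be the link behind the language menu rather than index2.html itself. Because the site uses frames, the page returned may also be a frameset whose frame sources have to be fetched separately.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use LWP::UserAgent;
use HTTP::Cookies;

# Hypothetical helper: build a user agent with an in-memory cookie jar,
# so cookies set by one response are sent back on later requests.
sub make_agent {
    my $ua = LWP::UserAgent->new(
        cookie_jar => HTTP::Cookies->new,
        timeout    => 30,
    );
    return $ua;
}

# Hypothetical helper: request the entry page first (to collect the cookie),
# then request the target page with the same agent.
sub fetch_with_cookie {
    my ($ua, $entry_url, $target_url) = @_;

    my $entry = $ua->get($entry_url);
    die "Entry page failed: ", $entry->status_line, "\n"
        unless $entry->is_success;

    my $page = $ua->get($target_url);
    die "Target page failed: ", $page->status_line, "\n"
        unless $page->is_success;

    # Raw response body, exactly as the server sent it.
    return $page->content;
}

# Only touch the network when both URLs are given on the command line.
if (@ARGV == 2) {
    my ($entry_url, $target_url) = @ARGV;
    print fetch_with_cookie(make_agent(), $entry_url, $target_url);
}
```

Saved under a name such as fetch.pl (hypothetical), it would be run with the entry URL first and the scores URL second. Put both URLs in quotes, since the `&` characters in the query string would otherwise be interpreted by the shell.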