
bedazzled
New User
Mar 29, 2011, 12:16 AM
WWW::Mechanize - Change URL to extract from in ICECAST
I got some help from someone to prepare this script. The script works on:

http://dir.xiph.org/by_format/MP3?search=MP3&page=0

but I need it to work on:

http://dir.xiph.org/by_genre/80s?search=80s&page=0

The Perl program:
    #!/usr/bin/perl -w
    use WWW::Mechanize;

    #$url = 'http://dir.xiph.org/by_format/MP3?search=MP3&page=0';
    $ice = 'http://dir.xiph.org';
    $url = $ARGV[0];

    $m = WWW::Mechanize->new();
    $m->get($url);
    $c = $m->content;

    @stations = $c =~ m{onclick=.javascript:pageTracker._trackPageview\(./stream/website.\)\;.>(.*?)</a></span>}gs
        or die "Can't get station name\n";
    @urls = $c =~ m{<span class=.name.><a href=.(.*?)\"}gs
        or die "Can't get urls\n";
    @folder = $c =~ m{<p>\[ <a href=.(.*?)\"}gs
        or die "Can't get streams\n";

    foreach $folder (@folder) {
        push(@streams, $ice . $folder);
    }

    @inline_tags = $c =~ m{<ul class=.inline-tags.>(.*?)</ul>}gs
        or die "Can't get Inline Tags\n";

    foreach (@inline_tags) {
        $_ =~ m{(.*?)<li><a href=.\/by_genre\/(.*?)\"}gs
            or die "Can't get Playlist Streams\n";
        push(@genre, $2);
    }

    $len = $#stations;

    foreach (@streams) {
        $m->get($_);
        $d = $m->content;
        push(@pl, $d);
    }

    foreach $num (0..$len) {
        print $stations[$num] . "," . $streams[$num] . "," . $urls[$num] . "," . $genre[$num] . "," . $pl[$num] . "\n\n\n";
    }

Usage:

    for i in `cat icecastpages_mp3.txt`; do ./icecast-1.02 "$i"; done >>csvimportfile

-----

1. On icecast-1.02.txt.txt, remove both txt extensions and make file executable.

2. Usage:

    for i in `cat icecastpages_mp3.txt`; do ./icecast-1.02 "$i"; done >>csvimportfile

   #will work fine and give output

3. However I need to run (but get error messages):

    for i in `cat icecastpages_genre.txt`; do ./icecast-1.02 "$i"; done >>csvimportfile

The difference in the text files is the url as stated at the beginning of this post. Can someone fix this to work on icecastpages_genre.txt? Thanks.
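The post does not say which error message appears, and every pattern in the script is followed by `or die`, so the script stops at the first pattern that finds nothing. A minimal diagnostic sketch, assuming the page content is already in a string, is to count the matches for each pattern and report them all at once. The patterns are copied from the script; the sub name `count_matches` and the `$sample` fragment are hypothetical and do not reproduce real dir.xiph.org markup.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Patterns copied from the script in this post, keyed by what they extract.
my %patterns = (
    stations    => qr{onclick=.javascript:pageTracker._trackPageview\(./stream/website.\)\;.>(.*?)</a></span>}s,
    urls        => qr{<span class=.name.><a href=.(.*?)\"}s,
    streams     => qr{<p>\[ <a href=.(.*?)\"}s,
    inline_tags => qr{<ul class=.inline-tags.>(.*?)</ul>}s,
);

# Hypothetical helper: count the matches of each pattern instead of
# dying on the first one that finds nothing.
sub count_matches {
    my ($content) = @_;
    my %count;
    for my $name (keys %patterns) {
        my $re   = $patterns{$name};
        my @hits = $content =~ /$re/g;   # list context: one item per match
        $count{$name} = scalar @hits;
    }
    return %count;
}

# Hypothetical fragment for illustration only, NOT real page markup.
# It deliberately has no <ul class="inline-tags"> element.
my $sample = <<'HTML';
<span class="name"><a href="http://example.com/" onclick="javascript:pageTracker._trackPageview('/stream/website');">Example FM</a></span>
<ul class="tags"><li><a href="/by_genre/80s">80s</a></li></ul>
<p>[ <a href="/listen/1/listen.m3u">M3U</a> ]</p>
HTML

my %count = count_matches($sample);
for my $name (sort keys %count) {
    printf "%-12s %d match(es)%s\n", $name, $count{$name},
        $count{$name} ? '' : '  <-- nothing matched';
}
```

To check a live page, the same sub could be given `$m->content` after `$m->get($url)` in place of `$sample`. Any pattern that reports zero matches on the by_genre page but not on the by_format page is the one that needs adjusting for that page's HTML.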