URL matching in text

 



yapp
User

Nov 20, 2001, 7:42 AM

Post #1 of 6
URL matching in text

How can I match a URL in a piece of text?

Context: I want to put codes around each URL, like the [url] code in this forum. This should be done automatically by a Perl script.

Yet Another Perl Programmer

_________________________________
Find out more about programming
http://www.cool-programming.f2s.com


Coderifous
journeyman

Nov 26, 2001, 11:28 AM

Post #2 of 6
Re: URL matching in text

Is this what you mean?


Code
 
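use strict;
use warnings;

#You can ignore this test data:
my @some_words = (
"this that and the other",
"that hib.jib grib grabby",
"the perlguru.com site is the best",
"other .com'rs arent as good",
"www.some_site.com",
"stupe dupe dee wupe"
);

foreach (@some_words) {
    # \S*\b (instead of .*\b) stops the match at the end of the URL, not the end of the line.
    # "etc" is a placeholder: list the other endings you actually want to accept.
    print "Found a URL: $_\n"
        if s{(\b\w{3}\.\w+\.(?:com|net|org|mil|etc)\S*\b)}{<a href="http://$1">$1</a>}i;
}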


Basically, this will match:
www.perlguru.com
www.hotmail.com/inbox
www.hotmail.com/stuffhere/andmore.html

But it won't match:
www.this <-- no second period followed by com|mil|etc.
super.duper.man.com <-- no www. (strictly, the regex only wants three word characters before the first period, so abc.example.com would still match)
www.what's the deal.com <-- has "non" word characters, i.e. the spaces and the apostrophe
ww.exuse.com <-- only two w's
www.notreally.not <-- .not isn't one of the endings (com|mil|net|etc.) listed in my regex

Note that I didn't include any filtering of any sort, so this is a quick and crude way to get what you want. Be aware that it is untested, and a user entering URL parameters full of unusual characters could produce strange results.

Hope it helps.

Jim
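To check the match / no-match lists above, here is a small harness sketch. It uses the pattern from the reply, with the trailing .*\b replaced by \S*\b (an assumption: .*\b is greedy and would run on to the last word of the line); the variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Pattern from the reply; "etc" stands in for whatever other endings you need.
my $url_re = qr/\b\w{3}\.\w+\.(?:com|net|org|mil|etc)\S*\b/i;

my @should_match = (
    'www.perlguru.com',
    'www.hotmail.com/inbox',
    'www.hotmail.com/stuffhere/andmore.html',
);
my @should_not_match = (
    'www.this',
    'super.duper.man.com',
    "www.what's the deal.com",
    'ww.exuse.com',
    'www.notreally.not',
);

# Print a verdict for every sample string
for my $s (@should_match, @should_not_match) {
    my $verdict = $s =~ $url_re ? 'match' : 'no match';
    print "$verdict: $s\n";
}
```

Because the regex only demands three word characters before the first period, a string such as abc.example.com also counts as a match; tighten \w{3} to www if that matters.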




yapp
User

Nov 27, 2001, 1:17 AM

Post #3 of 6
Re: URL matching in text

Thanks!!! ;)


In Reply To
Is this what you mean?

Indeed, I want to put codes around the URL: not an <a href> tag, but [url] codes.
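For [url] codes instead of an <a href> tag, only the replacement side of the earlier substitution has to change. A minimal sketch, assuming bare www.-style addresses like those in the reply above; wrap_www_urls is a hypothetical name, \S*\b is swapped in for .*\b, and "etc" is still a placeholder ending.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: wrap bare www.-style URLs in [url=...]...[/url]
# codes instead of an <a href> tag. "etc" is a placeholder ending.
sub wrap_www_urls {
    my ($text) = @_;
    # \[ in the replacement keeps Perl from reading $1[...] as an array element
    $text =~ s{(\b\w{3}\.\w+\.(?:com|net|org|mil|etc)\S*\b)}
              {[url=http://$1]$1\[/url]}gi;
    return $text;
}

print wrap_www_urls("the www.perlguru.com site is the best"), "\n";
```

The link target gets an http:// prefix so the forum does not treat www.perlguru.com as a relative path.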


gregarios
stranger

Feb 24, 2002, 10:36 AM

Post #4 of 6
Re: [Coderifous] URL matching in text

Then there's always this to think about:

http://66.111.74.130

Greg J Piper
[url=http://www.macpicks.com]MacPiCkS[/url]
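An IP-address URL like the one in the post above slips past a pattern built around www. and .com entirely. A minimal sketch of a separate pattern for http:// URLs with a dotted-quad IPv4 host, assuming each number should be limited to 0-255 (so 123.456.789.0 is rejected) and allowing an optional port and path; $octet and $ip_url are illustrative names.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One number of a dotted quad, limited to 0-255
my $octet  = qr/(?:25[0-5]|2[0-4]\d|1\d\d|[1-9]?\d)/;
# http:// or https://, four octets (not followed by another digit),
# then an optional :port and /path
my $ip_url = qr{\bhttps?://$octet(?:\.$octet){3}(?!\d)(?::\d+)?(?:/\S*)?};

for my $text ('Then there is always http://66.111.74.130 to think about',
              'but not http://123.456.789.0') {
    if ($text =~ /($ip_url)/) { print "IP URL: $1\n" }
    else                      { print "no IP URL in: $text\n" }
}
```

The (?!\d) after the last octet stops 66.111.74.1300 from matching as 66.111.74.130.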



Jasmine
Administrator

Feb 24, 2002, 10:53 AM

Post #5 of 6
Re: [yapp] URL matching in text

What about [url=http://search.cpan.org/search?dist=HTML-Parser]HTML::LinkExtor[/url]? From the [url=http://search.cpan.org/doc/GAAS/HTML-Parser-3.25/lib/HTML/LinkExtor.pm]docs[/url]:

[perl]
#!/usr/bin/perl -w
use strict;
use LWP::UserAgent;
use HTTP::Request;
use HTML::LinkExtor;
use URI::URL;

my $url = 'http://perlguru.com/gforum.cgi?post=13765'; # for instance
my $ua  = LWP::UserAgent->new;

# Set up a callback that collects link (<a ...>) URLs
my @links = ();
sub callback {
    my ($tag, %attr) = @_;        # tag name, then attribute => URL pairs
    return if $tag ne 'a';        # we only look closer at <a ...>
    push(@links, values %attr);
}

# Make the parser. Unfortunately, we don't know the base yet
# (it might be different from $url)
my $p = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
my $res = $ua->request(HTTP::Request->new(GET => $url),
                       sub { $p->parse($_[0]) });

# Expand all link URLs to absolute ones
my $base = $res->base;
@links = map { url($_, $base)->abs } @links;

# Print them out
print join("\n", @links), "\n";

[/perl]

Running the above on this page gave these results:

http://perlarchive.com/
http://perlarchive.com/guide/
http://perlguru.com/
http://tlc.perlarchive.com/
http://perlarchive.com/advertising.shtml
http://perlarchive.com/mailing_list.shtml
http://perlarchive.com/
http://perlarchive.com/
http://perlguru.com/gforum.cgi?guest=4383
http://perlguru.com/gforum.cgi?do=search;guest=4383
http://perlguru.com/gforum.cgi?do=whos_online;guest=4383
http://perlguru.com/gforum.cgi?do=login;guest=4383
http://perlguru.com/gforum.cgi?guest=4383
http://perlguru.com/gforum.cgi?guest=4383;category=4
http://perlguru.com/gforum.cgi?forum=13;guest=4383
http://www.gossamer-threads.com/scripts/gforum/
http://perlguru.com/gforum.cgi?do=post_view_printable;post=13765;guest=4383
http://perlguru.com/gforum.cgi?username=yapp;guest=4383
http://perlguru.com/gforum.cgi?url=http%3A%2F%2Fwww.cool-programming.f2s.com
http://perlguru.com/gforum.cgi?username=Coderifous;guest=4383
http://perlguru.com/gforum.cgi?post=13765#13765
http://perlguru.com/gforum.cgi?username=yapp;guest=4383
http://perlguru.com/gforum.cgi?post=13765#13826
http://perlguru.com/gforum.cgi?url=http%3A%2F%2Fwww.cool-programming.f2s.com
http://perlguru.com/gforum.cgi?username=gregarios;guest=4383
http://perlguru.com/gforum.cgi?post=13765#13826
http://perlguru.com/gforum.cgi?url=http%3A%2F%2Fwww.macpicks.com
http://perlguru.com/gforum.cgi?username=Jasmine;guest=4383
http://perlguru.com/gforum.cgi?post=13765#13765
http://perlguru.com/gforum.cgi?url=http%3A%2F%2Fsearch.cpan.org%2Fsearch%3Fdist%3DHTML-Parser
http://perlguru.com/gforum.cgi?url=http%3A%2F%2Fsearch.cpan.org%2Fdoc%2FGAAS%2FHTML-Parser-3.25%2Flib%2FHTML%2FLinkExtor.pm
http://perlguru.com/gforum.cgi?url=http%3A%2F%2F123.456.789.0
http://perlguru.com/gforum.cgi?url=invalid.whee
mailto:test@test.com
http://perlguru.com/gforum.cgi?do=post_editlog;post=14502;guest=4383
http://perlguru.com/gforum.cgi?do=search;guest=4383
http://www.gossamer-threads.com/
http://creativefundamentals.com/


(This post was edited by Jasmine on Feb 24, 2002, 10:57 AM)


yapp
User

Feb 26, 2002, 2:08 AM

Post #6 of 6
Re: [Jasmine] URL matching in text

Well, I might be misreading your post, but I don't need to know which URLs are in the text.

I have a piece of text with BBCode (like in this forum). My parser should automatically add [url] ... [/url] codes around URLs that are not surrounded by those codes yet.


I've already found something, which I adjusted for readability and usability in my specific situation. However, I hope there is a better (less complex) method.

[perl]
my $BeforeURL1 = '(\A|[^"=\[\]\w])';        # start of string OR a char that is not " = [ ] or a word char
my $BeforeURL2 = '(\A|[^"=\[\]/:.()+\w])';  # same, but also not / : . ( ) +
my $Proto1 = '\w+://';                      # http://, ftp://
my $Proto2 = 'www\.[^.]';                   # www. (but not www..)
my $Domain = '[\w\~;:$\-+!*?/=&@#%.,]+';    # word chars or one of: ~ ; : $ - + ! * ? / = & @ # % . ,
my $Rest   = '[\w\~;:$\-+!*?/=&@#%]{2,}';   # word chars or one of: ~ ; : $ - + ! * ? / = & @ # %
my $Sep    = '\.';                          # splits domain.rest
my $URL1   = "($Proto1$Domain$Sep$Rest)";   # http://www.url.com
my $URL2   = "($Proto2$Domain$Sep$Rest)";   # www.url.com

sub FormatURLCodes
{
    # Edits the caller's string in place: $_[0] is an alias to the argument
    my $ExitCode = '\[/.*?\]';   # any closing code: [/url], [/email], [/] ...
    my $EndCode  = '[/]';

    # Convert URLs
    $_[0] =~ s<\[url\](.+?)$ExitCode>                # [url]http://www.site.com[/...]
              <\[url=$1]$1$EndCode>ig;               # [url=http://www.site.com]http://www.site.com[/]

    $_[0] =~ s<\[email\](.+?)$ExitCode>              # [email]you@there.com[/...]
              <\[email=$1]$1$EndCode>ig;             # [email=you@there.com]you@there.com[/]

    $_[0] =~ s<\Q[url=www.>                          # [url=www.
              <[url=http://www.>ig;                  # [url=http://www.

    $_[0] =~ s<\Q[image=www.>                        # [image=www.
              <[image=http://www.>ig;                # [image=http://www.

    $_[0] =~ s<\[url=mailto:(.+?)\](.+?)$ExitCode>   # [url=mailto:you@there.com]you[/...]
              <[email=$1]$2$EndCode>sig;             # [email=you@there.com]you[/]

    # Convert URLs without XBBC codes to [url] codes
    $_[0] =~ s~$BeforeURL1\\*$URL1~$1\[url=$2\]$2$EndCode~isg;
    $_[0] =~ s~$BeforeURL2\\*$URL2~$1\[url=http://$2\]$2$EndCode~isg;
}
}
[/perl]

(The forum's [perl] tags failed completely after the url codes.)

Yet Another Perl Programmer

_________________________________
~~> [url=http://www.codingdomain.com]www.codingdomain.com[/url] <~~
More than 3500 X-Forum [url=http://www.codingdomain.com/cgi-perl/downloads/x-forum]Downloads![/url]
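A possibly less complex route to the requirement in the post above is a single substitution that matches either an existing [url] block (put back untouched) or a bare URL (wrapped), so no separate "not already tagged" prefix pattern is needed. This is a sketch under assumptions, not the code above: add_url_codes and wrap_url are hypothetical names, tags are assumed well-formed and not nested, only [url] (not [email] or [image]) is handled, and trailing punctuation such as a sentence-ending period is kept outside the tag.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# One pass: the first alternative matches an existing [url]...[/url] or
# [url=...]...[/url] block and returns it unchanged; the second matches a
# bare http://, https://, ftp:// or www. URL and wraps it.
sub add_url_codes {
    my ($text) = @_;
    $text =~ s{
          ( \[url(?:=[^\]]*)?\] .*? \[/url\] )             # group 1: already tagged
        | ( \b(?:(?:https?|ftp)://|www\.) [^\s\[\]<>"]+ )  # group 2: bare URL
    }{
        defined $1 ? $1 : wrap_url($2)
    }gisex;
    return $text;
}

# Wrap one bare URL in [url=...]...[/url]
sub wrap_url {
    my ($url) = @_;
    # Keep trailing punctuation (e.g. a final period) outside the tag
    my $trail = $url =~ s/([.,;:!?)]+)\z// ? $1 : '';
    # Bare www. addresses get an http:// prefix in the link target
    my $href = $url =~ m{^\w+://} ? $url : "http://$url";
    return "[url=$href]$url\[/url]$trail";
}

print add_url_codes("See www.perlguru.com. Already done: [url]http://www.codingdomain.com[/url]"), "\n";
```

Putting the "already tagged" alternative first is what keeps existing codes intact: at each position the regex engine tries it before the bare-URL branch, and /g resumes after whatever was matched.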

 
 

