URL matching in text



Nov 20, 2001, 7:42 AM

Post #1 of 6 (6943 views)
URL matching in text

How can I match a URL within a piece of text?

The point is that I want some codes put around the URL, like the http://.. code in this forum. This should be done automatically by a Perl script.

Yet an Other Perl Programmer



Nov 26, 2001, 11:28 AM

Post #2 of 6 (6934 views)
Re: URL matching in text

Is this what you mean?

# You can ignore this test data:
my @some_words = (
    "this that and the other",
    "that hib.jib grib grabby",
    "the site is the best",
    "other .com'rs arent as good",
    "stupe dupe dee wupe",
);

foreach (@some_words) {
    # Wrap anything shaped like xxx.name.tld in an anchor tag.
    # "etc" is a placeholder for whatever other TLDs you want to allow.
    print "Found a URL: $_\n"
        if s/(\b\w{3}\.\w+\.(?:com|net|org|mil|etc).*\b)/<a href="$1">$1<\/a>/i;
}

Basically, this will match a

But it won't match:
www.this <--no second period with com|mil|etc...
<--no www.
www.what's the
<--has non-word characters, i.e. spaces and the apostrophe.
<--no www.
<--.not isn't specified in my regex as the com|mil|net|etc... are

Now, I didn't include filtering of any sort, so this is a quick and crude way to get what you want. Beware, though, that this is untested, and a user entering a bunch of crazy characters in their URL parameters could produce some strange results.
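
To make that caveat concrete, here is a minimal sketch that runs the regex from this post next to a tighter variant. The sample address www.example.com and both sub names are assumptions made for illustration; they are not from the original post. The tighter pattern is only one possible fix: it replaces the greedy .*\b tail with an optional /path and drops the "etc" placeholder.

```perl
use strict;
use warnings;

# The regex from the post, with only the unneeded backslashes dropped.
# The greedy .*\b tail pulls the rest of the line into the link.
sub linkify_original {
    my ($text) = @_;
    $text =~ s/(\b\w{3}\.\w+\.(?:com|net|org|mil|etc).*\b)/<a href="$1">$1<\/a>/i;
    return $text;
}

# Hypothetical tighter variant: stop at the TLD, then allow an optional /path.
sub linkify_tight {
    my ($text) = @_;
    $text =~ s{\b(\w{3}\.\w+\.(?:com|net|org|mil)\b(?:/\S*)?)}{<a href="$1">$1</a>}gi;
    return $text;
}

my $line = "the site www.example.com is the best";
print linkify_original($line), "\n";   # link text runs to the end of the line
print linkify_tight($line), "\n";      # link text is just the host name
print linkify_tight("plain example.com has no www. prefix"), "\n";   # unchanged
```

Note that both versions still emit an href without a scheme (no http://), and the tight variant will keep trailing punctuation that directly follows a /path, so treat it as a starting point only.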

Hope it helps.



Nov 27, 2001, 1:17 AM

Post #3 of 6 (6929 views)
Re: URL matching in text

Thanks!!! ;)

In Reply To
Is this what you mean?

Indeed I want to put code codes around the URL. Not a <A href tag, but


Feb 24, 2002, 10:36 AM

Post #4 of 6 (6908 views)
Re: [Coderifous] URL matching in text

Then there's always this to think about:

Greg J Piper


Feb 24, 2002, 10:53 AM

Post #5 of 6 (6905 views)
Re: [yapp] URL matching in text

What about HTML::LinkExtor? From the docs:

#!/usr/bin/perl -w

use LWP::UserAgent;
use HTML::LinkExtor;
use URI::URL;

$url = ''; # for instance
$ua = LWP::UserAgent->new;

# Set up a callback that collects the links found in <a> tags
my @links = ();
sub callback {
    my($tag, %attr) = @_;
    return if $tag ne 'a'; # we only look closer at <a ...>
    push(@links, values %attr);
}

# Make the parser. Unfortunately, we don't know the base yet
# (it might be different from $url)
$p = HTML::LinkExtor->new(\&callback);

# Request document and parse it as it arrives
$res = $ua->request(HTTP::Request->new(GET => $url),
                    sub {$p->parse($_[0])});

# Expand all collected URLs to absolute ones
my $base = $res->base;
@links = map { $_ = url($_, $base)->abs; } @links;

# Print them out
print join("\n", @links), "\n";


Running the above on this page gave these results:;guest=4383;guest=4383;guest=4383;category=4;guest=4383;post=13765;guest=4383;guest=4383;guest=4383;guest=4383;guest=4383;guest=4383;post=14502;guest=4383;guest=4383

(This post was edited by Jasmine on Feb 24, 2002, 10:57 AM)


Feb 26, 2002, 2:08 AM

Post #6 of 6 (6885 views)
Re: [Jasmine] URL matching in text

Well, I might be misreading your post, but I don't need to know which URLs are in the text.

I have a piece of BBC code (like in this forum). My parser should automatically add [ url ] ... [ / url ] codes around URLs that are not surrounded by those codes yet.

I've already found something, which I adjusted for readability and usability in my specific situation. However, I hope there is a better (less complex) method.

my $BeforeURL1 = '([\n\b]|\A|[^"=\[\]\w])'; # newline or backspace (inside [...] \b is a backspace, not a word boundary) OR: start of string
my $BeforeURL2 = '([\n\b]|\A|[^"=\[\]/:.(://\w+)])'; # OR: any character not in the bracketed set
my $Proto1 = '\w+://'; # http://, ftp://
my $Proto2 = 'www\.[^.]'; # www. (but not www..)
my $Domain = '[\w\~;:$\-+!*?/=&@#%.,]+'; # One or more of: a-z A-Z 0-9 _ ~ ; : $ - + ! * ? / = & @ # % . ,
my $Rest = '[\w\~;:$\-+!*?/=&@#%]{2,}'; # Two or more of: a-z A-Z 0-9 _ ~ ; : $ - + ! * ? / = & @ # %
my $Sep = '\.'; # literal dot between $Domain and $Rest
my $URL1 = "($Proto1$Domain$Sep$Rest)"; # URL with an explicit protocol
my $URL2 = "($Proto2$Domain$Sep$Rest)"; # URL that starts with www.

sub FormatURLCodes {
    my $ExitCode = '\[/.*?\]';
    my $EndCode = '[/]';
    my $EndCodeT = '[/code]';

    # Convert URLs
    $_[0] =~ s<\[url\](.+?)$ExitCode> # [url][/...]
              <\[url=$1]$1$EndCode>ig; # [url=][/]

    $_[0] =~ s<\[email\](.+?)$ExitCode> # [email][/...]
              <\[email=$1]$1$EndCode>ig; # [][/]

    $_[0] =~ s<\Q[url=www.> # [url=www.
              <[url=http://www.>ig; # [url=http://www.

    $_[0] =~ s<\Q[image=www.> # [image=www.
              <[image=http://www.>ig; # [image=http://www.

    $_[0] =~ s<\[url=mailto:(.+?)\](.+?)$ExitCode> # []you[/...]
              <[email=$1]$2$EndCode>sig; # []you[/]

    # Convert URLs without XBBC codes to [url] codes
    $_[0] =~ s~$BeforeURL1\\*$URL1~$1\[url=$2\]$2$EndCode~isg;
    $_[0] =~ s~$BeforeURL2\\*$URL2~$1\[url=http://$2\]$2$EndCode~isg;
}

(the perl tags failed completely after the url codes)
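
One possibly less complex approach, sketched here under assumptions rather than offered as a tested drop-in: let a single substitution match either an already-coded span or a bare URL, put the coded spans back unchanged, and wrap only the bare URLs. The sub names add_url_codes and wrap_url, the [url=...]...[/url] output format, and the set of codes to skip (url, image, email) are illustrative choices and would need adjusting to the real XBBC syntax.

```perl
use strict;
use warnings;

# Hypothetical helper: wrap bare URLs in [url=...]...[/url] codes and leave
# anything already inside a url, image or email code untouched.
sub add_url_codes {
    my ($text) = @_;
    $text =~ s{
        ( \[(?:url|image|email)\b[^\]]*\] .*? \[/(?:url|image|email)\] )  # already coded: keep
      | \b ( (?:\w+://|www\.) [^\s\[\]<>"]+ )                             # bare URL
    }{
        my ($coded, $bare) = ($1, $2);   # copy before any other regex runs
        defined $coded ? $coded : wrap_url($bare);
    }gisex;
    return $text;
}

# Hypothetical helper: build the code for one bare URL.
sub wrap_url {
    my ($bare) = @_;
    # Keep sentence punctuation that directly follows the URL outside the code.
    my ($url, $tail) = $bare =~ /\A(.*?)([.,;:!?)]*)\z/s;
    # Addresses that start with www. get an http:// scheme in the target only.
    my $href = $url =~ /\Awww\./i ? "http://$url" : $url;
    return sprintf '[url=%s]%s[/url]%s', $href, $url, $tail;
}

print add_url_codes("Visit http://example.com/page. Thanks"), "\n";
print add_url_codes("see www.example.com now"), "\n";
print add_url_codes("[url=http://example.com]home[/url] stays as it is"), "\n";
```

Because the coded alternative is listed first and the regex engine takes the leftmost match, a URL inside an existing code is consumed as part of that code and never reaches the wrapping branch, so no look-behind set like $BeforeURL1 is needed.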

Yet Another Perl Programmer


