CGI/Perl Guide
Regular expression to extract domain name

 



prateekm21
New User

Apr 23, 2009, 8:54 AM

Post #1 of 7
Regular expression to extract domain name

Hello, I am new to Perl and I am having some issues. Can someone tell me a good way to extract the domain name from a URL?

Example 1:
http://www.google.com/test/india/clickhandler.js
I need: google.com

Example 2:
http://test.usa.com/o/tA$IXTq$2ALoYduAJQs$j.AJQRO-QA/nourl1
I need: usa.com

Thank you in advance.


(This post was edited by prateekm21 on Apr 23, 2009, 9:42 AM)


ichi
User

Apr 24, 2009, 5:36 AM

Post #2 of 7
Re: [prateekm21] Regular expression to extract domain name

Split the host name on dots "." and then take the 2nd and 3rd elements. Otherwise, use a module that parses URLs.
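A minimal sketch of the split-on-dot suggestion, applied to the host part of the URL rather than the whole string (splitting the full URL on "." would also cut through the path). It assumes the wanted domain is always the last two labels of the host, which holds for both example URLs but not for names such as www.example.co.uk. The sub name `last_two_labels` is made up for illustration.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Sketch of the "split on dot" idea, applied to the host name only.
# Assumption: the wanted domain is the last two labels of the host.
sub last_two_labels {
    my ($url) = @_;

    # "scheme://host/path" split on "/" gives ('scheme:', '', 'host', ...).
    my $host = (split m{/}, $url)[2];

    # www.google.com -> ('www', 'google', 'com')
    my @labels = split /\./, $host;

    # Take the last two labels rather than the 2nd and 3rd, so a host
    # without a "www." prefix (e.g. google.com) also works.
    return join '.', @labels[-2, -1];
}

for my $url ('http://www.google.com/test/india/clickhandler.js',
             'http://test.usa.com/o/tA$IXTq$2ALoYduAJQs$j.AJQRO-QA/nourl1') {
    my $domain = last_two_labels($url);
    print "$domain\n";
}
```

Run as-is, it should print google.com and then usa.com. Single quotes around the second URL keep its `$` characters literal.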


Shree
Novice


Apr 27, 2009, 2:23 AM

Post #3 of 7
Re: [ichi] Regular expression to extract domain name

_______________________________________________________________________________

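#!/usr/bin/perl
use strict;
use warnings;

my $url = "http://www.google.com/test/india/clickhandler.js";

# Split at "/": ('http:', '', 'www.google.com', 'test', 'india', 'clickhandler.js')
my @arr = split(/\//, $url);

# Remove a leading "www."; the dot is escaped so it only matches a literal ".".
$arr[2] =~ s/^www\.//;

print "$arr[2]\n";    # google.com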


_______________________________________________________________________________
Thanks
-Shree



(This post was edited by Shree on Apr 27, 2009, 2:26 AM)
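Since the thread title asks for a regular expression, here is a sketch that captures the host with a single match instead of split, then applies the www-stripping step with the dot escaped. An unescaped `.` matches any character, so `s/^www.//` would also strip "www1" from "www1.example.com". The sub names `host_of` and `strip_www` are hypothetical, and the pattern assumes an absolute URL with no `user@` part before the host. For the second example this still yields test.usa.com, not the usa.com the question asks for.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Capture the text between "scheme://" and the first "/", ":", "?" or "#".
# Assumption: an absolute URL with no "user@" part before the host.
sub host_of {
    my ($url) = @_;
    my ($host) = $url =~ m{^[A-Za-z][A-Za-z0-9+.-]*://([^/:?#]+)};
    return $host;    # undef when the URL does not have that shape
}

# The www clean-up step with the dot escaped: only a literal "www." is removed.
sub strip_www {
    my ($host) = @_;
    $host =~ s/^www\.//;
    return $host;
}

for my $url ('http://www.google.com/test/india/clickhandler.js',
             'http://test.usa.com/o/tA$IXTq$2ALoYduAJQs$j.AJQRO-QA/nourl1') {
    my $host = strip_www(host_of($url));
    print "$host\n";
}

# With an unescaped dot, "www1.example.com" loses its first four characters.
(my $sloppy = 'www1.example.com') =~ s/^www.//;
print "unescaped: $sloppy, escaped: ", strip_www('www1.example.com'), "\n";
```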


vikas.deep
User

May 2, 2009, 1:57 AM

Post #4 of 7
Re: [prateekm21] Regular expression to extract domain name

Hi Prateekm,
I hope the other replies to your query have helped. I was going through the responses and tried the one suggested by one of our friends. It works well for the first example in your query but fails for the second one; it won't even compile if you "use strict". I can think of only two options:
First, do not "use strict" during compilation.
Second, escape the $ symbols in the string $url.
Above all, are you absolutely sure you have come across a URL that contains $ symbols? The URL you mentioned took me to USA.com. On that site I tried a few links such as "hotel reservation", "car rental" and "airline reservation", but at least in these three cases I could not spot a $ in the URL.
You see, $ is a reserved symbol in Perl, and that is why $ is not accepted in the variable $url during compilation.
I may also add that if you are working on a large number of URLs, ".com" loses its significance. In that case a better approach is to combine the two suggestions made by our friends: first split at "/" and then split the host part into an array at the dot character. Now you can just discard the "com". In my example code I have escaped all the $ signs. There may be a better solution, since "TMTOWTDI" (There's More Than One Way To Do It).
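#!/usr/bin/perl
use strict;
use warnings;

# Every $ is escaped so the double-quoted string does not try to interpolate variables.
my $url = "http://test.usa.com/o/tA\$IXTq\$2ALoYduAJQs\$j.AJQRO-QA/nourl1";

# Split at "/": the host is the third field ($arr[2]).
my @arr = split(/\//, $url);

# Remove a leading "www." (dot escaped), then split the host at each dot.
$arr[2] =~ s/^www\.//;
my @arr2 = split(/\./, $arr[2]);
print "@arr2";                # test usa com

print "\n", $arr[2], "\n";    # test.usa.com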
-For all my suggestions: "I am sure someone else can do it in a better or more elegant manner!"

(This post was edited by vikas.deep on May 2, 2009, 2:03 AM)
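The compile failure described in this post can be reproduced with a string eval. Inside double quotes Perl tries to interpolate `$IXTq` and `$j` as variables, and under `use strict` those undeclared names are compile errors; escaping each `$` (or using single quotes) turns them back into literal characters. The helper `compiles_under_strict` is a hypothetical name used only for this check.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Returns 1 if the given Perl source compiles (and runs) under strict, else 0.
# A string eval returns undef and sets $@ when compilation fails.
sub compiles_under_strict {
    my ($code) = @_;
    local $@;
    # "no warnings" only silences interpolation warnings; strict still applies.
    my $ok = eval "no warnings; use strict; $code; 1";
    return $ok ? 1 : 0;
}

# q{} stores each snippet verbatim, so nothing is interpolated here.
my $unescaped = q{my $url = "http://test.usa.com/o/tA$IXTq$2ALoYduAJQs$j.AJQRO-QA/nourl1";};
my $escaped   = q{my $url = "http://test.usa.com/o/tA\$IXTq\$2ALoYduAJQs\$j.AJQRO-QA/nourl1";};

print 'unescaped, double quotes: ', (compiles_under_strict($unescaped) ? 'compiles' : 'fails'), "\n";
print 'escaped, double quotes:   ', (compiles_under_strict($escaped)   ? 'compiles' : 'fails'), "\n";
```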


FishMonger
Veteran / Moderator

May 2, 2009, 6:32 AM

Post #5 of 7
Re: [vikas.deep] Regular expression to extract domain name

vikas.deep,

Suggesting to not use strict is never a good/proper suggestion.

In your example script there is no need to escape the $ in order to compile under strictures. All that's needed is to use the proper quoting, which you did not do.

The general rule that I follow is to only use double quotes or the qq() operator when you need variable interpolation, otherwise just use single quotes or the q() operator.

Using split, as has already been suggested, may be the easiest method for the given example URLs, but it's not the best method. For a more robust solution, I'd recommend (as did ichi) using one of the modules written for this purpose, such as one of the URI modules. http://search.cpan.org/~gaas/URI-1.37/URI.pm
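A short sketch of the quoting rule in this post: single quotes and `q()` leave `$` alone, double quotes need each `$` escaped to build the same string, and `qq()` is kept for cases where interpolation is actually wanted. The variable names are illustrative only.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# No interpolation: single quotes and q() keep every "$" as a literal character.
my $single = 'http://test.usa.com/o/tA$IXTq$2ALoYduAJQs$j.AJQRO-QA/nourl1';
my $q_op   = q(http://test.usa.com/o/tA$IXTq$2ALoYduAJQs$j.AJQRO-QA/nourl1);

# Interpolating quotes need every "$" escaped to produce the same string.
my $double = "http://test.usa.com/o/tA\$IXTq\$2ALoYduAJQs\$j.AJQRO-QA/nourl1";

print 'identical: ', ($single eq $q_op && $q_op eq $double ? 'yes' : 'no'), "\n";

# Double quotes / qq() are for when you do want a variable's value inserted.
my $host = 'test.usa.com';
print qq(host is "$host"\n);
```

This should print `identical: yes` followed by `host is "test.usa.com"`.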


vikas.deep
User

May 2, 2009, 7:36 AM

Post #6 of 7
Re: [FishMonger] Regular expression to extract domain name

Yes, if one uses single quotes or q() instead of double quotes, there is no need to escape the dollar symbol under strictures. (I was wrong to say it does not compile because a dollar sign is reserved to signify a scalar in Perl.) I will be more careful in future. Even so, I have never seen a dollar sign in a URL. Maybe next time I will stare a bit harder at the address bar.
Split is the easiest way to do it; that is why I used split twice. Use it once more to throw out the ".com".
As far as the module is concerned, a "use so-and-so module" is definitely the simplest and most secure way to do things, even though I have not checked that module myself.
-For all my suggestions: "I am sure someone else can do it in a better or more elegant manner!"


FishMonger
Veteran / Moderator

May 2, 2009, 8:04 AM

Post #7 of 7
Re: [vikas.deep] Regular expression to extract domain name

This is the method I'd use, but I'd probably make the regex more "bullet proof".


Code
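#!/usr/bin/perl

use strict;
use warnings;
use URI::Split qw(uri_split);

# Single quotes, so the $ characters in the URL are not interpolated.
my $url = 'http://test.usa.com/o/tA$IXTq$2ALoYduAJQs$j.AJQRO-QA/nourl1';

# uri_split returns (scheme, authority, path, query, fragment); [1] is the authority.
my $domain = (uri_split($url))[1];

# Keep only the last two dot-separated labels: test.usa.com -> usa.com
$domain =~ s/^.*\.([^.]+\..+)$/$1/;

print "$domain\n";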
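One possible way to make the substitution more "bullet proof", sketched as a hypothetical helper that is not part of URI::Split. It assumes its input is the authority string that `(uri_split($url))[1]` returns, which can include a `user@` prefix and a `:port` suffix. It lowercases the value, strips those parts and a trailing root dot, and leaves IPv4 addresses untouched before keeping the last two labels, so it still relies on the same last-two-labels assumption as the one-line substitution.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper (not part of URI::Split): reduce a URL authority such as
# "user@WWW.Google.com:8080" to the last two labels of its host name.
sub domain_from_authority {
    my ($authority) = @_;
    return unless defined $authority;

    my $host = lc $authority;
    $host =~ s/^.*\@//;    # drop a "user:password@" prefix
    $host =~ s/:\d*$//;    # drop a ":port" suffix
    $host =~ s/\.$//;      # drop a trailing root dot, as in "google.com."

    # An IPv4 address has no domain to shorten, so return it unchanged.
    return $host if $host =~ /^\d{1,3}(?:\.\d{1,3}){3}$/;

    # Same idea as the substitution above: keep only "label.label" at the end.
    $host =~ s/^.*\.([^.]+\.[^.]+)$/$1/;
    return $host;
}

for my $authority ('www.google.com', 'test.usa.com',
                   'user@WWW.Google.com:8080', '192.168.0.1:80') {
    print "$authority -> ", domain_from_authority($authority), "\n";
}
```

The four sample authorities should map to google.com, usa.com, google.com and 192.168.0.1 respectively.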


 
 

