E-mail address extraction from text

 



johng
New User

Jun 20, 2002, 7:28 AM

Post #1 of 7 (5814 views)

I'm interested in creating a regular expression that will identify an e-mail address embedded in a free text field. So far, I have created the following test example for the regular expression:

my($text) = "ddddhh deb@hot.com ddfre";
if ($text =~m/\s\w+\@{1}\w+\.\w+\s/) {print "matched\n"};

I realize there may be multiple problems in this expression, but at least one of them is that when I isolate the regular expression to:

my($text) = "ddddhh deb@hot.com ddfre";
if ($text =~m/\@/) {print "matched\n"};

I cannot get the @ symbol recognized even though I have attempted to escape it with a \. Interestingly, I can escape and match other special characters, such as '.'.

I am using the ActiveState Perl download on a Windows NT machine. Any help would be greatly appreciated as I am very new to Perl. Thanks


fashimpaur
User

Jun 20, 2002, 8:23 AM

Post #2 of 7 (5807 views)
Re: [johng] E-mail address extraction from text

John,

The issue is that an @ symbol in a double-quoted string must be escaped.
Perl interpolates variables in double-quoted strings first, so @hot is
treated as an array (an empty one here) and the @ never reaches the regex.
To avoid this when no interpolation is needed, single-quote the string or
use the q operator.

This code worked:


Code
  

my $text = 'ddddhh deb@hot.com ddfre';
if ($text =~ /\@/g) {print "matched\n";};



Hope that helped,
Dennis

$a="c323745335d3221214b364d545".
"a362532582521254c3640504c3729".
"2f493759214b3635554c3040606a0",
print unpack"u*",pack "h*",$a,"\n\n";
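
To make the quoting point above concrete, here is a minimal sketch comparing the three ways of writing the string. The declaration of @hot is an assumption added only so the double-quoted version compiles under strict; it is not part of the original code.

```perl
use strict;
use warnings;

# Declared only so the double-quoted string below compiles under strict.
our @hot = ();

my $interpolated = "ddddhh deb@hot.com ddfre";    # @hot is interpolated (empty)
my $escaped      = "ddddhh deb\@hot.com ddfre";   # backslash keeps the literal @
my $single       = 'ddddhh deb@hot.com ddfre';    # single quotes never interpolate

for my $pair ( [ interpolated => $interpolated ],
               [ escaped      => $escaped ],
               [ single       => $single ] ) {
    my ( $name, $value ) = @$pair;
    my $result = $value =~ /\@/ ? 'matched' : 'no match';
    print "$name: '$value' => $result\n";
}
```

The first string ends up as 'ddddhh deb.com ddfre', which is why the original test never matched; the problem was in the string, not in the regex.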


uri
Thaumaturge

Jun 20, 2002, 7:58 PM

Post #3 of 7 (5802 views)
Re: [fashimpaur] E-mail address extraction from text

there is a module Email::Find that extracts email addresses from plain text. simple regexes just won't do except for simple email addresses. a full email validation regex is over 4k chars long (read mastering regular expressions).

i would like to see more mentions of CPAN and existing modules at perlguru. too often i see code that tries to duplicate already existing functionality in a CPAN module.
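
For the "simple email addresses" case only, extraction can be sketched with a global match in list context. The pattern and the sample text are assumptions chosen for illustration; the pattern is deliberately loose, is not a validator, and will miss or mangle many legal addresses, which is the argument for a module such as Email::Find.

```perl
use strict;
use warnings;

# Sample text (hypothetical) containing two simple addresses.
my $text = 'ddddhh deb@hot.com ddfre, also j.smith@mail.example.org.';

# Simple pattern, NOT RFC-compliant:
#   local part: word characters, dot, plus, hyphen
#   domain:     two or more dot-separated labels
# In list context, /g returns every captured match.
my @addresses = $text =~ /([\w.+-]+\@[\w-]+(?:\.[\w-]+)+)/g;

print "$_\n" for @addresses;
```

Because each domain label must contain at least one word character, the sentence-ending period after "org" is not swallowed into the second address.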


fashimpaur
User

Jun 21, 2002, 5:07 AM

Post #4 of 7 (5802 views)
Re: [uri] E-mail address extraction from text

Uri,

I agree that CPAN modules should be referred to. However, in my case, and
I am sure it is true for others, a module is not always available from
ActiveState for Win32. My background in compiling Perl modules is not strong,
and I do not have a C compiler on my Win32 machine, so I sometimes have
to write code that duplicates CPAN efforts already done.

Sure, some modules do not need compiling because they are written entirely
in Perl. These cases help avoid duplication, but sometimes at their own
cost. If you have to deal with an IT infrastructure team that does not adapt
well to change, getting even simple PMs added to your Perl lib can be
difficult. Then you spend time modifying the code, creating directories in
your own cgi-bin and testing, which can make using the standard modules
impractical.

Still, overall, you are right. They should be used where practical. Maybe there
should be some articles in The Learning Center to explain where to get a free
C compiler, and how to build a PM and install it. I do not have the expertise
to write them, but others do this all the time and could teach the rest of us.

Thanks for the tip on the module. I will check it out for my future use.
Dennis



Paul
Enthusiast

Jul 15, 2002, 5:20 PM

Post #5 of 7 (5775 views)
Re: [fashimpaur] E-mail address extraction from text

>>
if ($text =~ /\@/g) {print "matched\n";};
<<

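You don't need to escape a lone @ in a regex. It only needs escaping when it is followed by something that looks like an array name, as in /@hot/, which Perl would interpolate.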
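
A small sketch of when the backslash matters inside a pattern, assuming the same sample string as the earlier posts. The variable names are illustrative only.

```perl
use strict;
use warnings;

my $text = 'ddddhh deb@hot.com ddfre';

# A lone @, or an @ followed by a backslash escape, is not interpolated.
my $lone = $text =~ /@/ ? 1 : 0;
my ($address) = $text =~ /\s(\w+@\w+\.\w+)\s/;

# An @ directly followed by a name must be escaped; an unescaped /@hot\.com/
# would try to interpolate an array called @hot (a compile error under strict).
my $literal = $text =~ /\@hot\.com/ ? 1 : 0;

print "lone \@ matched: $lone\n";
print "address found: $address\n";
print "literal \@hot.com matched: $literal\n";
```

Escaping the @ everywhere is harmless, so /\@/ in the earlier reply is not wrong, just more cautious than it needs to be for that pattern.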


gregarios
stranger

Nov 14, 2002, 12:57 PM

Post #6 of 7 (5693 views)
Re: WEB address extraction from text

How about the same question, but with a period? How can I search for ".info" in a list of domains, let's say? Right now I keep matching "info" preceded by any character, whether I escape the period or not, it seems.

I'm using:

Code
if ($string=~/$text/i)

where the $text is input from a form and can contain periods.

Greg J Piper
MacPiCkS (http://www.macpicks.com)



(This post was edited by gregarios on Nov 14, 2002, 1:02 PM)


mhx
Enthusiast

Nov 14, 2002, 1:22 PM

Post #7 of 7 (5687 views)
Re: [gregarios] WEB address extraction from text


In Reply To

Code
if ($string=~/$text/i)

where the $text is input from a form and can contain periods.


To automatically escape all metacharacters (like the dot) in $text, use:


Code
if ($string=~/\Q$text/i)


See perldoc quotemeta (http://www.perldoc.com/perl5.8.0/pod/func/quotemeta.html) for details.

-- mhx

At last with an effort he spoke, and wondered to hear his own words, as if some other will was using his small voice. "I will take the Ring," he said, "though I do not know the way."

-- Frodo
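
The effect of \Q on the ".info" search can be sketched as follows. The domain list is made up for illustration, and $text simply stands in for the form input.

```perl
use strict;
use warnings;

my $text    = '.info';    # stands in for the value submitted by the form
my @domains = ( 'example.info', 'myinfo.com', 'perl.org' );    # hypothetical list

# Without \Q, the dot is a metacharacter and matches any character.
my @unquoted = grep { /$text/i } @domains;

# With \Q...\E, every metacharacter in $text is matched literally.
my @quoted = grep { /\Q$text\E/i } @domains;

print "without \\Q: @unquoted\n";    # example.info myinfo.com
print "with \\Q:    @quoted\n";      # example.info
```

The same escaping is available as a function, quotemeta($text), which is handy when the quoted pattern needs to be built once and reused.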


 
 

