extract some text from a webpage

Nov 17, 2011, 7:18 AM

hey all..

So, I have this regex: /(%)((?:[a-z][a-z0-9_]*))/
I'm not sure if it's correct, but what I'm trying to do is use it to get a piece of text from a webpage that begins with a % character and ends at a whitespace, so the input would look like: whatever %Get_This123 whatever

I tried grep, but it wouldn't work.


Nov 18, 2011, 4:25 AM

Re: [slekness] extract some text from a webpage

Whether or not the grep function is useful depends on how you want to process the result. A more common approach, however, is to loop through the file and apply your regex to each line.

Note that the operator that binds a pattern match to a string in Perl is =~ (see perldoc perlop).
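
A minimal sketch of that line-by-line approach, in case it helps. The helper name extract_tokens, the in-memory filehandle standing in for the saved page, and the exact pattern (with /i and a whitespace lookahead) are my assumptions, not something from the original post:

```perl
use strict;
use warnings;

# Hypothetical helper: collects the first %name token on each line read from $fh.
sub extract_tokens {
    my ($fh) = @_;
    my @found;
    while (my $line = <$fh>) {
        # =~ binds the regex to $line; /i allows upper- and lowercase letters.
        # (?=\s) requires whitespace after the name without consuming it.
        if ($line =~ /%([a-z][a-z0-9_]*)(?=\s)/i) {
            push @found, $1;
        }
    }
    return @found;
}

# Demo: an in-memory filehandle stands in for the downloaded page.
my $page = "whatever %Get_This123 whatever\nno token here\n";
open my $fh, '<', \$page or die "Cannot open in-memory file: $!";
my @tokens = extract_tokens($fh);
close $fh;

print "Found: $_\n" for @tokens;
```

For a real page you would open the saved file instead of the in-memory string. The if only picks up the first token on each line; a while loop with the /g modifier would pick up all of them.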

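BTW, your regexp doesn't enforce the condition "ends in a whitespace": it simply stops at the first character outside [a-z0-9_], so the matched word might just as well be followed by a dot, a comma, or an uppercase letter. Since there is no /i modifier, it also rejects uppercase letters in the name itself, which is why your example string ".... %Get_This123 ..." would not match (the G right after the % is not in [a-z]).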
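
To make both points concrete, here is a small sketch. The second test string and the "stricter" pattern (adding /i and a (?=\s) lookahead) are my own assumptions about what was intended:

```perl
use strict;
use warnings;

my $example = "whatever %Get_This123 whatever";

# Pattern from the question: 'G' is not in [a-z], so nothing matches.
my $orig_hit = $example =~ /(%)((?:[a-z][a-z0-9_]*))/ ? $2 : undef;

# With a lowercase name followed by a dot, the same pattern still matches,
# because nothing in it requires whitespace after the name.
my $dot_hit = "see %get_this. next" =~ /(%)((?:[a-z][a-z0-9_]*))/ ? $2 : undef;

# Assumed fix: /i for case-insensitivity, (?=\s) to demand trailing whitespace.
my $strict = qr/%([a-z][a-z0-9_]*)(?=\s)/i;
my $strict_hit     = $example =~ $strict ? $1 : undef;
my $strict_dot_hit = "see %get_this. next" =~ $strict ? $1 : undef;

# Print each result, or "(no match)" when the pattern failed.
printf "%-26s %s\n", $_->[0], $_->[1] // "(no match)"
    for [ "original, example:",   $orig_hit ],
        [ "original, dot case:",  $dot_hit ],
        [ "stricter, example:",   $strict_hit ],
        [ "stricter, dot case:",  $strict_dot_hit ];
```

The lookahead checks for the whitespace without making it part of the match, so the captured name stays clean.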

Nov 30, 2011, 12:37 AM

Re: [slekness] extract some text from a webpage

Try the following instead:

my $string = "whatever %Get_This123 whatever";
# Capture the run of non-whitespace characters between % and the next whitespace
if ($string =~ /%(\S+)\s/) {
    my $match = $1;
    print "Matched: $match\n";
}

Note that the above regexp is quite permissive: it captures whatever characters sit between the % and the next whitespace, not just letters, digits, and underscores.
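
If that is too loose for your page, here is a rough sketch comparing an "anything up to whitespace" capture with a stricter one. The sample text and both patterns are assumptions for illustration only:

```perl
use strict;
use warnings;

# Sample text is made up for illustration.
my $text = "tag %Get_This123 and %x-y, also %9lives end";

# Permissive: any run of non-whitespace after %, followed by whitespace.
my @loose = $text =~ /%(\S+)\s/g;

# Stricter: a letter, then letters, digits, or underscores,
# followed by whitespace or the end of the string.
my @strict = $text =~ /%([A-Za-z][A-Za-z0-9_]*)(?=\s|\z)/g;

print "loose:  @loose\n";
print "strict: @strict\n";
```

In list context with /g, the match returns every captured group, so @loose ends up with three entries (including "x-y," with its comma) while @strict keeps only Get_This123.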