CGI/Perl Guide forum: Perl Programming Help: Regular Expressions
matching phrases, something weird happens

 



lex
New User

Apr 16, 2010, 11:00 AM

Post #1 of 6
matching phrases, something weird happens

Hi, I'm trying to say: match at least the first x characters, then look for the end of a phrase and give me everything up to there. But on some text it does something strange:


Code
if ($text =~ m/(.{29,}?\w['")]*[!?:]+['")]*)\s/is) {


gives me:

En wéér tuint iedereen er in...

yet


Code
if ($text =~ m/(.{31,}?\w['")]*[!?:]+['")]*)\s/is) {


gives me:

En wéér tuint iedereen er in. President Obama kondigde gisteren aan dat er over een jaartje of dertig Amerikaanse astronauten naar Mars zullen gaan.

'En weer veilig terugkeren naar de aarde', voegde hij er voorzichtigheidshalve aan toe, met een niet mis te verstane knipoog naar de speech van John Kennedy waarmee in 1961 het Apollo-project van start ging.

Ruimtevaartliefhebbers blij, Marsfanaten enthousiast, tweede maanman Buzz Aldrin in zijn nopjes...


Does anybody see why?

Thanks!
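One way to see what .{29,}? does is to run the same pattern on a short ASCII string. This is a minimal sketch: the helper phrase_from and the sample text are hypothetical, and it assumes a terminator class that includes a period, [.!?:], because the output above stops at a full stop (the class as posted, [!?:], has no period). The lazy quantifier takes at least the minimum number of characters and then grows one character at a time until a word character, closing punctuation and whitespace follow. If the last word character of the first sentence lies inside that minimum, the match cannot end there and runs on to the next terminator, however far away it is.

```perl
use strict;
use warnings;

# Hypothetical helper wrapping the question's pattern so the minimum can vary.
# ASSUMPTION: the terminator class is [.!?:] (with a period).
sub phrase_from {
    my ($text, $min) = @_;
    # .{$min,}? is lazy with no upper bound: at least $min characters, then as
    # few more as needed until \w, punctuation and whitespace can follow.
    return $text =~ /(.{$min,}?\w['")]*[.!?:]+['")]*)\s/is ? $1 : undef;
}

my $text = "The first one. Then a second sentence. Third! ";

# "The first one." is 14 characters; its last word character ("e") is number 13.
for my $min (12, 13) {
    my $got = phrase_from($text, $min);
    printf "%d: %s\n", $min, $got // '(no match)';
}
```

With a minimum of 12 the match stops at "The first one."; with 13 the "e" of "one" is already used up by the minimum, so the match continues to the end of the second sentence. Raising the minimum by one character flips the result, much like 29 versus 31 in the question.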


roolic
User

Apr 16, 2010, 10:47 PM

Post #2 of 6
Re: [lex] matching phrases, something weird happens

Your regex is hard to predict because .{N,} does not set an upper limit on the length of the substring. It's better to use a negated character class [^...] to define where the string should be cut. The condition "the first 29 characters plus any following characters except spaces, commas, line endings, etc." then looks like this:

Code
if( $test =~ /(^.{29}[^\s\'\"\(\)\.\,\;\:\?\!\n\r]+)/){ ... }



(This post was edited by roolic on Apr 16, 2010, 10:48 PM)
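A detail worth checking in the negated-class approach is the quantifier after the class. The pattern above uses [^...]+, which requires at least one more non-delimiter character after the first 29; because the match is anchored with ^, there is no other starting position to try, so it fails outright when the next character is a space or punctuation. A minimal sketch with hypothetical helpers cut_plus and cut_star, using the same excluded set written without the redundant backslashes (\n and \r are already covered by \s):

```perl
use strict;
use warnings;

# Both helpers anchor at ^ and take exactly $n characters first.
sub cut_plus {    # [...]+ : at least ONE more non-delimiter character must follow
    my ($text, $n) = @_;
    return $text =~ /^(.{$n}[^\s'"().,;:?!]+)/ ? $1 : undef;
}

sub cut_star {    # [...]* : zero or more, so a delimiter right after $n is fine
    my ($text, $n) = @_;
    return $text =~ /^(.{$n}[^\s'"().,;:?!]*)/ ? $1 : undef;
}

my $text = "abcde fgh.";
for my $n (3, 5) {
    printf "n=%d  +: %s  *: %s\n", $n,
        cut_plus($text, $n) // '(no match)',
        cut_star($text, $n) // '(no match)';
}
```

For n=3 both return "abcde"; for n=5 the sixth character is a space, so the + version finds no match while the * version returns "abcde".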


lex
New User

Apr 18, 2010, 11:20 AM

Post #3 of 6
Re: [roolic] matching phrases, something weird happens

Thanks for your answer, but your code gives exactly the same result as what I already had: with 29 I get the first short phrase, and with 31 I get the four complete phrases copied in my first message...


(This post was edited by lex on Apr 18, 2010, 11:21 AM)


roolic
User

Apr 19, 2010, 1:03 AM

Post #4 of 6
Re: [lex] matching phrases, something weird happens

fixed a bit:

Code
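#!/usr/bin/perl
use strict;
use warnings;

# Sample text. Double quotes make \n\n real line breaks (inside single quotes
# they are a literal backslash followed by "n").
my $test = "En w��r tuint iedereen er in. President Obama kondigde gisteren aan dat er over een jaartje of dertig Amerikaanse astronauten naar Mars zullen gaan. \n\n'En weer veilig terugkeren naar de aarde', voegde hij er voorzichtigheidshalve aan toe, met een niet mis te verstane knipoog naar de speech van John Kennedy waarmee in 1961 het Apollo-project van start ging. \n\nRuimtevaartliefhebbers blij, Marsfanaten enthousiast, tweede maanman Buzz Aldrin in zijn nopjes...";

# ^ anchors the match at the start; .{29} takes exactly 29 characters (bytes here,
# since the string is not decoded); [^...]* then extends the cut to the end of the
# current word: zero or more characters that are not whitespace, quotes, brackets or punctuation.
if( $test =~ /^(.{29}[^\s\'\"\(\)\.\,\;\:\?\!\n\r]*)/ ){
    print "29: $1\n";
}
if( $test =~ /^(.{31}[^\s\'\"\(\)\.\,\;\:\?\!\n\r]*)/ ){
    print "31: $1\n";
}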


the output:

Code
29: En w��r tuint iedereen er
31: En w��r tuint iedereen er in



lex
New User

Apr 19, 2010, 1:41 AM

Post #5 of 6
Re: [roolic] matching phrases, something weird happens

So my guess is that, in this case, it probably has to do with the encoding of the original piece of text?

Because on my end I still get the results described above.

I'll see if I can find a solution in that direction...

Thanks for taking the time to help me!
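The encoding guess can be tested with a small experiment. In Perl, . in a regex matches one character of the string, and a string that was never decoded holds one character per byte, so every multi-byte UTF-8 character counts more than once. This is a minimal sketch under two assumptions: the text arrives as undecoded UTF-8, and the garbled word is "wéér" (each é is two bytes). It uses Encode::decode, and phrase_from is a hypothetical helper wrapping the question's pattern, again assuming a period in the terminator class.

```perl
use strict;
use warnings;
use Encode qw(decode);

# ASSUMPTION: undecoded UTF-8 input; "\xc3\xa9" is the two-byte encoding of é.
my $first_bytes = "En w\xc3\xa9\xc3\xa9r tuint iedereen er in.";
my $text_bytes  = "$first_bytes President Obama kondigde gisteren aan. ";

# decode() yields a character string in which each é is a single character.
my $first_chars = decode('UTF-8', $first_bytes);
my $text_chars  = decode('UTF-8', $text_bytes);

# The question's pattern. ASSUMPTION: the terminator class includes a period.
sub phrase_from {
    my ($text, $min) = @_;
    return $text =~ /(.{$min,}?\w['")]*[.!?:]+['")]*)\s/is ? $1 : undef;
}

printf "first sentence: %d bytes, %d characters\n",
    length($first_bytes), length($first_chars);

# Same minimum of 29: the byte string stops after the first sentence,
# the decoded string runs on into the second one.
for my $pair ([ bytes => $text_bytes ], [ characters => $text_chars ]) {
    my ($label, $s) = @$pair;
    printf "%s, min 29: %d units captured\n", $label, length(phrase_from($s, 29));
}
```

Under these assumptions the first sentence is 31 bytes but only 29 characters, so on the byte string a minimum of 29 still ends at "in." while 30 or 31 skip past it. That fits the behaviour described in the first post, but it is one possible explanation, not a diagnosis. Decoding the input first (with Encode::decode, or an :encoding(UTF-8) layer on the file handle) makes the counts refer to characters.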


lex
New User

Apr 19, 2010, 1:59 AM

Post #6 of 6
Re: [lex] matching phrases, something weird happens

Sorry, I was wrong. I hadn't noticed that you wrote you had 'fixed it a bit'; I thought you just wanted to show that your code worked on your end.

With the "little bit of fixing" it now works.

Thanks a lot!!


(This post was edited by lex on Apr 19, 2010, 2:10 AM)
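The original goal in this thread was a cut at the end of a phrase rather than at the end of a word. Roolic's negated-class idea carries over to that case: after exactly N characters, consume everything that is not a terminator, then the terminator run itself. A minimal sketch, assuming a phrase ends with ., !, ? or : and using a hypothetical helper first_phrase_after:

```perl
use strict;
use warnings;

# Hypothetical helper: the first $n characters, then everything up to and
# including the next run of phrase-ending punctuation.
# ASSUMPTION: phrases end with . ! ? or : (abbreviations like "Mr." are not handled).
sub first_phrase_after {
    my ($text, $n) = @_;
    # [^.!?:]* cannot step over a terminator, so the match ends at the first one
    # after the first $n characters; /s lets .{$n} cross line breaks as well.
    return $text =~ /^(.{$n}[^.!?:]*[.!?:]+)/s ? $1 : undef;
}

my $text = "The first one. Then a second sentence.\nThird one!";
for my $n (5, 14) {
    printf "%d: %s\n", $n, first_phrase_after($text, $n) // '(no match)';
}
```

Because the negated class, not a lazy quantifier, decides where the match ends, the result is always the first terminator after the first N characters. As with the patterns earlier in the thread, N counts bytes unless the text has been decoded.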

 
 

