White Spaces in the middle

 



hancocj
Novice

Nov 25, 2002, 12:44 PM

Post #1 of 9 (1260 views)
White Spaces in the middle

How can I remove white spaces from the middle of a string?

e.g.

^=white space

$foo = "Some^^^^^^^^^Text Here.";

Perl code.....

$bar = "Some Text Here.";


Paul
Enthusiast

Nov 25, 2002, 1:37 PM

Post #2 of 9 (1254 views)
Re: [hancocj] White Spaces in the middle


Code
$foo =~ s/^(\S+)\s+(\S+)/$1 $2/;



hancocj
Novice

Nov 25, 2002, 2:22 PM

Post #3 of 9 (1252 views)
Re: [RedRum] White Spaces in the middle

That didn't seem to work for me?


jryan
User

Nov 25, 2002, 2:35 PM

Post #4 of 9 (1250 views)
Re: [hancocj] White Spaces in the middle

This should do it:


Code
$string = "   Lookahead and Lookbehind assertions are great!   ";
$string =~
s/
(?<= \S ) # the run of whitespace follows a non-whitespace character
          # (positive lookbehind), so leading whitespace is left alone
\s+       # a run of whitespace
(?= \S )  # the run of whitespace precedes a non-whitespace character
          # (positive lookahead), so trailing whitespace is left alone
/ /gx;    # collapse each interior run to one space; x permits the comments
print "*$string*";



hancocj
Novice

Nov 25, 2002, 3:04 PM

Post #5 of 9 (1250 views)
Re: [jryan] White Spaces in the middle

It may not be whitespace after all. I am parsing an HTML document, and there are &nbsp; entities in the text; these seem to be the problem. Any ideas on how to strip them out while parsing? This is what I am currently doing:

my $p = HTML::Parser->new(
    api_version     => 3,
    handlers        => [
        text => [\&text, "dtext"],
    ],
    marked_sections => 1,
);

$p->ignore_elements("script", "style", "title");
$p->ignore_tags("img", "title", "a");
$p->parse($buffer);

sub text
{
    my ( $dtext ) = @_;
    $dtext =~ s/^\s+//;
    $dtext =~ s/\s+$//;
    $dtext =~ s/\s+/ /g;
    $DOC_TEXT = $DOC_TEXT . $dtext;
}

So how would I rip out &nbsp; chars?
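
One possible approach, sketched under assumptions: with the "dtext" argspec the handler receives decoded text, so an &nbsp; entity arrives as the no-break space character (code point 160, "\xA0"), which \s may not match in a byte string. The helper name normalize_text below is hypothetical and does not come from the thread; it maps both the decoded character and any undecoded entity to a plain space before the usual cleanup.

```perl
use strict;
use warnings;

# Hypothetical helper (not from the thread): clean up text handed to the
# HTML::Parser text handler.
sub normalize_text {
    my ($text) = @_;
    $text =~ s/&nbsp;/ /g;   # undecoded entity, in case "text" is used instead of "dtext"
    $text =~ tr/\xA0/ /;     # decoded no-break space (code point 160)
    $text =~ s/^\s+//;       # strip leading whitespace
    $text =~ s/\s+$//;       # strip trailing whitespace
    $text =~ s/\s+/ /g;      # collapse interior runs to a single space
    return $text;
}

print normalize_text("Some\xA0\xA0\xA0&nbsp; Text Here."), "\n";
```

Inside the handler from the post above, the three substitutions could then be replaced by a single call such as $dtext = normalize_text($dtext); whether that fits depends on how the rest of the script uses $DOC_TEXT.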


Paul
Enthusiast

Nov 25, 2002, 4:50 PM

Post #6 of 9 (1246 views)
Re: [hancocj] White Spaces in the middle

Why do you have 3 regexes stripping spaces when the final regex will do the same as all three?


mhx
Enthusiast / Moderator

Nov 26, 2002, 12:33 AM

Post #7 of 9 (1241 views)
Re: [RedRum] White Spaces in the middle


In Reply To
Why do you have 3 regexes stripping spaces when the final regex will do the same as all three?


Because the first two eliminate whitespace at the beginning and end, while the last one turns sequences of whitespace into a single space character. ;)
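
To make the difference concrete, here is a minimal sketch that records the string after each of the three substitutions and compares it with running only the last one. The helper names clean_steps and collapse_only and the sample string are assumptions made for illustration, not part of the original posts.

```perl
use strict;
use warnings;

# Hypothetical helper: returns the string after each of the three steps.
sub clean_steps {
    my ($s) = @_;
    my @steps;
    $s =~ s/^\s+//;  push @steps, $s;   # 1. strip leading whitespace
    $s =~ s/\s+$//;  push @steps, $s;   # 2. strip trailing whitespace
    $s =~ s/\s+/ /g; push @steps, $s;   # 3. collapse interior runs
    return @steps;
}

# Hypothetical helper: applies only the final substitution.
sub collapse_only {
    my ($s) = @_;
    $s =~ s/\s+/ /g;
    return $s;
}

my $sample = "  Some     Text Here.  ";
print "[$_]\n" for clean_steps($sample);
print "[", collapse_only($sample), "]\n";
```

With the last substitution alone, the leading and trailing runs are reduced to one space each rather than removed, which is why the first two substitutions are still needed.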

-- mhx

At last with an effort he spoke, and wondered to hear his own words, as if some other will was using his small voice. "I will take the Ring," he said, "though I do not know the way."

-- Frodo



Paul
Enthusiast

Nov 26, 2002, 2:45 AM

Post #8 of 9 (1237 views)
Re: [mhx] White Spaces in the middle

If he'd used a code tag, I would have spotted that. :P

Still, the first two could be made into one :)


(This post was edited by RedRum on Nov 26, 2002, 2:45 AM)


mhx
Enthusiast / Moderator

Nov 26, 2002, 4:33 AM

Post #9 of 9 (1233 views)
Re: [RedRum] White Spaces in the middle


In Reply To
Still, the first two could be made into one :)


I guess you mean like so:


Code
$s =~ s/^\s+|\s+$//g;


However, the performance of this can be lousy, as the following benchmark shows:


Code
Benchmark: timing 100000 iterations of mhx, one, two... 
mhx: 2 wallclock secs ( 1.95 usr + 0.00 sys = 1.95 CPU) @ 51282.05/s (n=100000)
one: 12 wallclock secs (11.77 usr + 0.01 sys = 11.78 CPU) @ 8488.96/s (n=100000)
two: 5 wallclock secs ( 5.08 usr + 0.01 sys = 5.09 CPU) @ 19646.37/s (n=100000)


Here, one is the solution that trims both ends with a single regex, two is the original solution with two separate trimming regexes, and mhx is my optimized solution.


Code
use Benchmark;

my $str = ' test test test test test test ';

timethese( 100000, {
    one => sub {
        my $s = $str;
        $s =~ s/^\s+|\s+$//g;
        $s =~ s/\s+/ /g;
    },
    two => sub {
        my $s = $str;
        $s =~ s/^\s+//;
        $s =~ s/\s+$//;
        $s =~ s/\s+/ /g;
    },
    mhx => sub {
        my $s = $str;
        $s =~ y/ //s;
        $s =~ s/^\s//;
        $s =~ s/\s$//;
    },
} );
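
Before comparing timings, it may be worth checking that the three variants return the same result. The following sketch wraps each variant in a function (the names trim_one, trim_two and trim_mhx and the test inputs are assumptions for illustration) and prints their output side by side. Because y/ //s squeezes only literal space characters, the mhx variant should be expected to differ once tabs or newlines are involved.

```perl
use strict;
use warnings;

# The three variants from the benchmark, wrapped so they return their result.
sub trim_one {
    my ($s) = @_;
    $s =~ s/^\s+|\s+$//g;
    $s =~ s/\s+/ /g;
    return $s;
}

sub trim_two {
    my ($s) = @_;
    $s =~ s/^\s+//;
    $s =~ s/\s+$//;
    $s =~ s/\s+/ /g;
    return $s;
}

sub trim_mhx {
    my ($s) = @_;
    $s =~ y/ //s;     # squeeze runs of spaces (spaces only, not tabs or newlines)
    $s =~ s/^\s//;    # drop the one remaining leading whitespace character
    $s =~ s/\s$//;    # drop the one remaining trailing whitespace character
    return $s;
}

# Assumed inputs: one with spaces only, one mixing spaces, a tab and a newline.
for my $input (' test  test   test ', "\ttest \t test\n") {
    printf "one=[%s] two=[%s] mhx=[%s]\n",
        trim_one($input), trim_two($input), trim_mhx($input);
}
```

On input that contains only spaces the three agree, so the speed-up applies when the data is known to hold no other whitespace.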


-- mhx



 
 

