how to extract characters between 2 pipes



mauto
User

Aug 1, 2002, 9:07 AM


how to extract characters between 2 pipes

I have the following record:

|abcdefg|hijk|abc|pqr|xyz

I want to extract the abcdefg part, ignoring the pipe characters on either side of it.


mhx
Enthusiast

Aug 1, 2002, 9:33 AM


Re: [mauto] how to extract characters between 2 pipes


Code
$str = '|abcdefg|hijk|abc|pqr|xyz'; 
($abc) = $str =~ /\|([^|]*)/;
print $abc;
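For readers new to the pattern, here is the same match spelled out with the /x modifier so that each piece can carry a comment. This is only a sketch: the sub name first_field is made up for the example, and it returns undef when the string contains no pipe at all.

```perl
use strict;
use warnings;

# Hypothetical helper: return the text between the first and second pipe.
sub first_field {
    my ($str) = @_;
    my ($field) = $str =~ /
        \|          # a literal pipe (escaped, since | means alternation)
        ( [^|]* )   # capture zero or more characters that are not a pipe
    /x;
    return $field;  # undef if $str contains no pipe
}

print first_field('|abcdefg|hijk|abc|pqr|xyz'), "\n";   # prints abcdefg
```

Because [^|]* stops at the next pipe by itself, the pattern does not need to mention the closing pipe.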


Hope this helps.

-- mhx

At last with an effort he spoke, and wondered to hear his own words, as if some other will was using his small voice. "I will take the Ring," he said, "though I do not know the way."

-- Frodo



mauto
User

Aug 1, 2002, 10:26 AM


Re: [mhx] how to extract characters between 2 pipes

Thanks.

How can I find out after how many pipe characters the pqr text appears?

|abcdefg|hijk|abc|pqr|xyz

i.e., in this example I should get the result 4.


mhx
Enthusiast

Aug 1, 2002, 12:38 PM


Re: [mauto] how to extract characters between 2 pipes

You could use the following:


Code
$str = '|abcdefg|hijk|abc|pqr|xyz'; 
$count = ($str =~ /(.*)pqr/)[0] =~ tr/|//;
print $count;


But that depends more or less upon what else you want to know about the string.

-- mhx




Paul
Enthusiast

Aug 1, 2002, 6:47 PM


Re: [mhx] how to extract characters between 2 pipes

>>
$str = '|abcdefg|hijk|abc|pqr|xyz';
($abc) = $str =~ /\|([^|]*)/;
print $abc;
<<

I wouldn't use a regex for that.


Code
my $str = '|abcdefg|hijk|abc|pqr|xyz';
my $abc = (split /\|/, $str, 3)[1];
print $abc;



mhx
Enthusiast

Aug 1, 2002, 9:51 PM


Re: [RedRum] how to extract characters between 2 pipes


In Reply To
I wouldn't use a regex for that.


Well, I would. :-P


In Reply To

Code
my $abc = (split /\|/, $str, 3)[1];



If $str gets longer, this solution will get slower (it's just about the same for the given example, at least on my machine). The regex solution does not depend on the length of $str and it's IMHO also easier to read. Optimizing the regex to


Code
($abc) = $str =~ /^\|([^|]*)/;


even makes it faster than the split solution.
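Timing claims like this are easy to check locally with the core Benchmark module. The sketch below is only one way to set it up: the padded test string, the iteration count and the labels are arbitrary choices, and the ranking may differ between Perl versions and machines.

```perl
use strict;
use warnings;
use Benchmark qw(cmpthese);

# A deliberately long record, so any dependence on string length shows up.
my $str = '|abcdefg|hijk|abc|pqr|xyz' . ('|filler' x 1000);

my %candidates = (
    regex          => sub { my ($abc) = $str =~ /\|([^|]*)/;    $abc },
    regex_anchored => sub { my ($abc) = $str =~ /^\|([^|]*)/;   $abc },
    split_limit_3  => sub { my $abc = (split /\|/, $str, 3)[1]; $abc },
);

# All three should agree before their speed is compared.
for my $name (sort keys %candidates) {
    printf "%-15s => %s\n", $name, $candidates{$name}->();
}

# Run each candidate 200_000 times and print a comparison table.
cmpthese(200_000, \%candidates);
```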

Besides, it's shorter. ;-)

-- mhx




mauto
User

Aug 2, 2002, 3:41 AM


Re: [RedRum] how to extract characters between 2 pipes


Code
$count = ($str =~ /(.*)pqr/)[0] =~ tr/|//;


Thanks.

Can you explain the above regex in words, as I am a little confused?


jryan
User

Aug 2, 2002, 3:33 PM


Re: [mauto] how to extract characters between 2 pipes


Code
 
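my $text = q(|abcdefg|hijk|abc|pqr|xyz);
# Count the pipes in front of the first occurrence of the given substring.
sub bars_before {
    my $pos = index($text, shift);   # index returns -1 when nothing is found
    return $pos < 0 ? undef : substr($text, 0, $pos) =~ tr/|//;
}

print bars_before('pqr');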



mhx
Enthusiast

Aug 5, 2002, 5:12 AM


Re: [mauto] how to extract characters between 2 pipes


In Reply To
Can you explain the above regex in words, as I am a little confused?


Sure. The match itself, /(.*)pqr/, captures the whole string before pqr:


Code
$count = ($str =~ /(.*)pqr/)[0] =~ tr/|//;


That alone is a very basic regex. The .* matches zero or more arbitrary characters. The parentheses around .* capture the string that .* matched.


Code
$count = ($str =~ /(.*)pqr/)[0] =~ tr/|//;


Putting parentheses around the match, ($str =~ /(.*)pqr/), makes the match operator work in list context, where it returns a list of all captured substrings. The subscript [0] simply selects the first (and only) element of that list, which is the string holding everything before pqr.


Code
$count = ($str =~ /(.*)pqr/)[0] =~ tr/|//;


Now, this string is used with the transliteration operator tr, which is normally used to replace certain characters. If no replacement characters are given, the operator simply counts the characters (in our case pipes) and returns the total count, which is then stored in $count.

You can have a look at perldoc perlop (http://www.perldoc.com/perl5.8.0/pod/perlop.html#Regexp-Quote-Like-Operators) for details.
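The one-liner can also be unpacked into separate statements, which may make the three steps easier to follow. This is a sketch of the same logic, not a replacement; the variable name $before is introduced here for illustration, and falling back to a count of 0 when nothing matches is an assumption of this example.

```perl
use strict;
use warnings;

my $str = '|abcdefg|hijk|abc|pqr|xyz';

# Steps 1 and 2: match in list context and keep the first (only) capture.
my ($before) = $str =~ /(.*)pqr/;

# Step 3: tr with an empty replacement list changes nothing and returns
# the number of characters it found, here the pipes in front of pqr.
# If the match failed, $before is undef and the count falls back to 0.
my $count = defined $before ? ($before =~ tr/|//) : 0;

print "before: $before\n";   # prints before: |abcdefg|hijk|abc|
print "count:  $count\n";    # prints count:  4
```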

Hope this helps.

-- mhx




mauto
User

Aug 6, 2002, 5:14 AM


Re: [mhx] how to extract characters between 2 pipes

Many thanks.

However, I have a problem with this. When the string is modified to:

$str = '|abcdefg|hijk|abc|pqr|xyz|abc_pqr|ghi';

The match now picks up the pqr inside abc_pqr, so the pipes before abc_pqr are counted as well, but I want to match only the standalone pqr.


Code
$str = '|abcdefg|hijk|abc|pqr|xyz|abc_pqr|ghi';  
$count = ($str =~ /(.*)pqr/)[0] =~ tr/|//;
print $count;


$count = 6

How can I modify the regex to ensure that only pqr is matched?


mhx
Enthusiast

Aug 6, 2002, 6:25 AM


Re: [mauto] how to extract characters between 2 pipes


Code
$str = '|abcdefg|hijk|abc|pqr|xyz|abc_pqr|ghi'; 
$count = ($str =~ /(.*?\|)pqr/)[0] =~ tr/|//;


The explicit \| ensures that pqr comes directly after a pipe symbol.
The ? ensures that the first occurrence of pqr is used: it modifies the * quantifier to match as few characters as possible instead of as many as possible, which is the default.
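To see the difference, the sketch below runs the greedy and the non-greedy pattern against the same record. The third pattern goes one step beyond the fix above: it adds a lookahead, (?=\||\z), so that a field such as pqrs would not count as pqr either. That extra check and the helper name pipes_before are assumptions made for this example.

```perl
use strict;
use warnings;

# Hypothetical helper: count the pipes in whatever $re captures in group 1.
sub pipes_before {
    my ($str, $re) = @_;
    my ($before) = $str =~ $re or return;   # nothing matched
    return $before =~ tr/|//;
}

my $str = '|abcdefg|hijk|abc|pqr|xyz|abc_pqr|ghi';

print pipes_before($str, qr/(.*)pqr/),              "\n";   # greedy: prints 6
print pipes_before($str, qr/(.*?\|)pqr/),           "\n";   # non-greedy: prints 4
print pipes_before($str, qr/(.*?\|)pqr(?=\||\z)/),  "\n";   # with lookahead: prints 4
```

The lookahead only checks that a pipe or the end of the string follows; it does not consume anything, so the captured prefix stays the same.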

Hope this helps.

-- mhx
