Extracting String that is contained in Double Quotes, Single Quotes and there can be Quotes within Quotes



jonjon
New User

Nov 12, 2014, 7:46 PM



["'](.*)["']

The regex above captures any number of characters that fall between a single or double quote.

Example 1: "jon" is extracted into a list as jon (double quotes)
Example 2: 'don' is extracted into a list as don (single quotes)

But what if I have something like the following?

If [Jon is Great] = "01/31/20XX" then "January"


With the regex above, the output is:

01/31/20XX" then "January

My desired output is:

01/31/20XX
January


Zhris
Enthusiast

Nov 13, 2014, 10:01 AM


Re: [jonjon] Extracting String that is contained in Double Quotes, Single Quotes and there can be Quotes within Quotes

Hi,

Regular expression quantifiers are greedy by default, i.e. they match as much as possible. You can make a quantifier non-greedy (lazy) by following it with a question mark, so that it matches as little as possible:


Code
["'](.*?)["']
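To see the difference side by side, here is a small self-contained script that runs the greedy and the non-greedy pattern against the line from the question. The variable names are only illustrative, and the character class is written as ["'] because a | inside [...] is a literal pipe, not alternation.

```perl
use strict;
use warnings;

my $line = q(If [Jon is Great] = "01/31/20XX" then "January");

# Greedy: .* runs as far as it can, then backtracks to the LAST quote.
my ($greedy) = $line =~ /["'](.*)["']/;

# Non-greedy: .*? stops at the FIRST quote after the opening one.
my ($lazy) = $line =~ /["'](.*?)["']/;

print "greedy:     $greedy\n";    # 01/31/20XX" then "January
print "non-greedy: $lazy\n";      # 01/31/20XX
```

Note that without /g this still returns only the first quoted item.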


Chris


(This post was edited by Zhris on Nov 13, 2014, 10:04 AM)


BillKSmith
Veteran

Nov 14, 2014, 1:50 PM


Re: [jonjon] Extracting String that is contained in Double Quotes, Single Quotes and there can be Quotes within Quotes

Chris correctly identified the problem with your example. However, matching quotes can get even more complicated. Consider the following example:

Code
use strict;
use warnings;
my $string = q(The cartoon ended with the caption "That's all folks".);
# The lazy .*? stops at the first quote of EITHER kind: the apostrophe in "That's".
my ($match) = $string =~ m/["'](.*?)["']/;
print "$match\n";


OUTPUT:

Code
That


What you want is "That's all folks".


The Regexp::Common module, specifically its delimited patterns (use Regexp::Common qw(delimited); then $RE{quoted}), is by far the best solution.
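A core-Perl alternative that handles this particular example (a swapped-in technique, not how Regexp::Common works internally) is to capture the opening quote and require the same character to close it, using the backreference \1. A minimal sketch; the variable names are illustrative, and the second test line uses mixed quotes on purpose:

```perl
use strict;
use warnings;

# Capture which quote opened the string, then require \1 (the same
# character) to close it, so an apostrophe inside "..." is ignored.
my $caption = q(The cartoon ended with the caption "That's all folks".);
my ($quote, $text) = $caption =~ /(["'])(.*?)\1/;
print "$text\n";    # That's all folks

# With /g in a while loop, every quoted piece is collected,
# whichever quote character was used for each one.
my $line = q(If [Jon is Great] = "01/31/20XX" then 'January');
my @found;
while ($line =~ /(["'])(.*?)\1/g) {
    push @found, $2;
}
print "$_\n" for @found;    # 01/31/20XX and January, one per line
```

This still goes wrong if an unquoted apostrophe (e.g. Jon's) appears before the first real quote, and it does not handle backslash-escaped quotes; those are the cases where a tested module earns its keep.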
Good Luck,
Bill


jonjon
New User

Nov 14, 2014, 2:43 PM


Re: [BillKSmith] Extracting String that is contained in Double Quotes, Single Quotes and there can be Quotes within Quotes

Thank you, Zhris and Bill. However, I want the regex to pull out the date and then also extract January. In both examples it only extracts the first quoted term, 01/31/20XX; I also need the regex to pull out January separately.

If [Jon is Great] = "01/31/20XX" then "January"


With my original regex, the output is:

01/31/20XX" then "January

My desired output is:

01/31/20XX
January


Laurent_R
Veteran / Moderator

Nov 15, 2014, 5:33 AM


Re: [jonjon] Extracting String that is contained in Double Quotes, Single Quotes and there can be Quotes within Quotes

To capture several items, apply the /g modifier in list context, as in this session under the Perl debugger:

Code
  DB<1> $_ = 'If [Jon is Great] = "01/31/20XX" then "January" ';

  DB<2> @matches = /"(.*?)"/g;

  DB<3> x \@matches;
0  ARRAY(0x600500ae8)
   0  '01/31/20XX'
   1  'January'
  DB<4>
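Outside the debugger, the same idea as a standalone script might look like this (a minimal sketch; $line and @matches are just illustrative names):

```perl
use strict;
use warnings;

my $line = q(If [Jon is Great] = "01/31/20XX" then "January");

# In list context, m//g returns the $1 capture of every successful match.
my @matches = $line =~ /"(.*?)"/g;

print "$_\n" for @matches;    # 01/31/20XX and January, one per line
```

This only looks for double quotes; to accept single quotes as well, the pattern needs a way to pair each opening quote with its matching close.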


Having said that, I agree with Bill that the Regexp::Common module (its delimited/quoted-string patterns) is most probably a better solution.