Using Regular Expression in Perl Programming

 



OrcasKing
New User

Dec 7, 2011, 11:30 AM

Post #1 of 3 (24980 views)
Using Regular Expression in Perl Programming

Dear All,

I am stuck on a regular expression in a Perl script.

Here are the details.

I have an XML file whose nodes look like this:

<Document ID="2">
<DocumentName>Vii</DocumentName>
<Description>XIIk1</Description>
<Component ID="1" source="ABC">10911</Component>
</Document>
<Document ID="3" format="pdf">
<DocumentName>XI-1</DocumentName>
<Description>Sample.PDF</Description>
<Component ID="1" source="ABC">VBI-1.PDF</Component>
</Document>

The Perl script matches <Document ID=”2”> but cannot find a match for <Document ID=”3” format=”pdf”> because of the newly introduced attribute “format=pdf”.

The Perl script was changed as follows to allow for this attribute:

===== PERL SCRIPT ===========================================
if ($a =~ /^(.*?)<Document\s+ID="(\d+)"(.*?)>/s) {
    $DocNumber = $2;            # value of the ID attribute
    $DocumentBlockFlag = 1;
    $DocCounter++;
    $DocFormat = $3;            # anything between ID="..." and the closing >
}
===================================================================

Please suggest what has gone wrong with the Perl script.



OrcasKing


BillKSmith
Veteran

Dec 7, 2011, 2:17 PM

Post #2 of 3 (24956 views)
Re: [OrcasKing] Using Regular Expression in Perl Programming

It is seldom a good idea to parse a markup language with a regular expression. Find and use a module from CPAN. I cannot offer any practical advice, but a quick search of CPAN suggests that you look into XML::Parser or XML::Node.
Good Luck,
Bill


rovf
Veteran

Feb 2, 2012, 1:25 AM

Post #3 of 3 (23266 views)
Re: [OrcasKing] Using Regular Expression in Perl Programming

Works for me:


Code
 $ perl -lwe 'qq(<Document ID="3" format="pdf">)=~ /^(.*?)<Document\s+ID="(\d+)"(.*?)>/s && print "found"' 

found


In the example you posted, you had Windows-style (curly) double quotes, so I had to replace them with normal ASCII double quotes. Maybe that's why it did not work in your case.
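To illustrate the quote issue, here is a minimal sketch that runs the pattern from the original post against the tag once with curly quotes and once with ASCII quotes. The helper name ascii_quotes is made up for this example, and it is only an assumption that the curly quotes are in the real XML file rather than just in the forum post.

```perl
use strict;
use warnings;

# The pattern from the original post, unchanged.
my $re = qr/^(.*?)<Document\s+ID="(\d+)"(.*?)>/s;

# Hypothetical helper (not from the original script): map the typographic
# double quotes U+201C and U+201D to plain ASCII double quotes.
sub ascii_quotes {
    my ($text) = @_;
    $text =~ tr/\x{201C}\x{201D}/""/;
    return $text;
}

# The tag as it appeared in the post, with U+201D where ASCII " should be.
my $curly = "<Document ID=\x{201D}3\x{201D} format=\x{201D}pdf\x{201D}>";
my $ascii = ascii_quotes($curly);

binmode STDOUT, ':encoding(UTF-8)';
for my $line ($curly, $ascii) {
    if ($line =~ $re) {
        print "match:    ID=$2, rest='$3'\n";
    }
    else {
        print "no match: $line\n";
    }
}
```

The curly-quoted line should report no match, and the converted line should report ID=3 with the format attribute captured in $3.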

Of course, as has already been pointed out, it would be best not to use a regexp to parse XML. There are several XML parsers available.
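As a rough sketch of the parser route, the following uses XML::Parser (one of the CPAN modules mentioned earlier in this thread, assumed to be installed) to pull the ID and format attributes out of each Document element. The <Documents> wrapper element and the @docs array are assumptions added for this example: the nodes in the original post have no common root, and a parser needs one.

```perl
use strict;
use warnings;
use XML::Parser;    # CPAN module, not part of core Perl

# The two nodes from the original post, wrapped in an assumed root element.
my $xml = <<'XML';
<Documents>
<Document ID="2">
<DocumentName>Vii</DocumentName>
<Description>XIIk1</Description>
<Component ID="1" source="ABC">10911</Component>
</Document>
<Document ID="3" format="pdf">
<DocumentName>XI-1</DocumentName>
<Description>Sample.PDF</Description>
<Component ID="1" source="ABC">VBI-1.PDF</Component>
</Document>
</Documents>
XML

my @docs;    # one hash per <Document>: its ID and its optional format

my $parser = XML::Parser->new(
    Handlers => {
        # Called for every opening tag with the element name and attributes.
        Start => sub {
            my ($expat, $element, %attr) = @_;
            return unless $element eq 'Document';
            push @docs, { id => $attr{ID}, format => $attr{format} };
        },
    },
);
$parser->parse($xml);

for my $doc (@docs) {
    printf "Document %s, format: %s\n", $doc->{id}, $doc->{format} // '(none)';
}
```

With this approach a new attribute such as format needs no change to the matching logic, because the parser hands all attributes to the handler as name/value pairs.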

 
 

