Parsing XML


New User

Oct 21, 2009, 10:01 AM

Post #1 of 3 (6120 views)
Parsing XML

I'm trying to parse XML nodes where some nodes have node data and some do not.

For example:


<Instance id="1" name="node_1" />

<Instance id="2" name="node_2">Node data</Instance>


I want to grab each "Instance" node within the "Instance_Group", but am having issues because of the different node types.

Does anyone have a regexp that can parse each XML node, regardless of whether there is node data?

I have tried a few different ideas.

- The code:

while( ($tag =~ m#<Instance[\s\S]+?/>#mi)||($tag =~ m#<Instance[\s\S]+?/Instance>#mi))

was my first try. However, if there are multiple mixed nodes, the code will grab all the way to the "/>" first, even though the first node may end in "</Instance>". That is to say, it may grab two or more nodes rather than just one.

- The code:

while($tag =~ m#<Instance.+?(?:/>||/Instance>)#mi)

This attempt will grab a node with no node data successfully (i.e., <Instance id="1" name="node_1" />), but will only grab the node information up to the node data, if it exists. For example, parsing

<Instance id="2" name="node_2">Node data</Instance>

will only grab

<Instance id="2" name="node_2"> with a remainder of Node data</Instance>

Note: I did try writing the "or" with a single pipe in the non-capturing group, i.e., (?:/>|/Instance>). However, this produced no matches.

I've tried a few other solutions as well with no luck. Any suggestions would be appreciated.

Veteran / Moderator

Oct 21, 2009, 10:38 AM

Post #2 of 3 (6117 views)
Re: [mrominski] Parsing XML

Don't use a regex to parse an XML file.

Use one of the XML parsers that are on CPAN, such as XML::Simple.
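
As a rough illustration of that suggestion, here is a minimal sketch using XML::Simple on the two nodes from the question. The <Instance_Group> wrapper, the variable names, and the output format are assumptions made for the example, and XML::Simple has to be installed from CPAN first. ForceArray keeps Instance as a list even if only one node is present, and KeyAttr => [] turns off the default folding on the "name" and "id" attributes.

```perl
use strict;
use warnings;
use XML::Simple;    # from CPAN; exports XMLin by default

# Sample document built from the nodes in the question.
# The Instance_Group wrapper is an assumption for this sketch.
my $xml = <<'XML';
<Instance_Group>
  <Instance id="1" name="node_1" />
  <Instance id="2" name="node_2">Node data</Instance>
</Instance_Group>
XML

my $group = XMLin(
    $xml,
    ForceArray => ['Instance'],   # always return Instance as an array ref
    KeyAttr    => [],             # do not fold the list into a hash keyed on name/id
);

for my $inst ( @{ $group->{Instance} } ) {
    # Text inside a node that also has attributes is stored under 'content';
    # a self-closing node simply has no such key.
    my $data = defined $inst->{content} ? $inst->{content} : '(no node data)';
    print "$inst->{id} $inst->{name}: $data\n";
}
```

Both node shapes come back as plain hash references, so the code that consumes them does not need to care whether the node was self-closing.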


Oct 22, 2009, 6:36 PM

Post #3 of 3 (6076 views)
Re: [mrominski] Parsing XML

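Although I don't know exactly what output you want, here's a way without too much regex, by toggling a flag:

my $f = 0;
while (<>) {
    if ( /<\/Instance_Group>/ ) { $f = 0; }   # leaving the group: stop printing
    if ( /<Instance_Group>/ )   { $f = 1; }   # entering the group: start printing
    if ( $f == 1 ) {
        print $_;    # do your stuff here.
    }
}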
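
If a regex is still wanted for small, well-formed input like the nodes in the original question, one pattern can cover both node shapes: match the attributes with [^>]*? and then accept either a self-closing "/>" or ">...</Instance>". The following is only a sketch under those assumptions. The sample data, the <Instance_Group> wrapper, and the variable names are made up for the example, and the pattern will break on attribute values containing ">", nested Instance nodes, comments, or CDATA, which is why a real parser is the safer choice. The /g modifier matters here, since without it the while loop would keep matching the first node forever.

```perl
use strict;
use warnings;

# Sample input modeled on the question; the wrapper element is assumed.
my $xml = <<'XML';
<Instance_Group>
<Instance id="1" name="node_1" />
<Instance id="2" name="node_2">Node data</Instance>
</Instance_Group>
XML

my $node_re = qr{
    <Instance\b              # tag name; \b stops it matching Instance_Group
    ([^>]*?)                 # capture 1: raw attribute text
    (?:
        /\s*>                # self-closing node, no node data
      |
        >(.*?)</Instance\s*> # capture 2: node data up to the closing tag
    )
}xsi;

my @nodes;
while ( $xml =~ /$node_re/g ) {
    my ( $attrs, $data ) = ( $1, $2 );   # copy before the next match resets them

    # Split the attribute text into name => value pairs (double-quoted values only).
    my %node = $attrs =~ /(\w+)\s*=\s*"([^"]*)"/g;
    $node{data} = $data;                 # undef for a self-closing node
    push @nodes, \%node;
}

for my $n (@nodes) {
    my $data = defined $n->{data} ? $n->{data} : '(no node data)';
    print "id=$n->{id} name=$n->{name} data=$data\n";
}
```

Because the attribute part cannot cross a ">", a node without data can no longer swallow the node that follows it, which was the problem with the first attempt in the question.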

