Extract Entire XML Payload Based On Child Node

 



the_r
New User

Feb 10, 2017, 8:45 AM

Post #1 of 4 (1376 views)
Extract Entire XML Payload Based On Child Node

Hello. I'm pretty new to Perl. I have a file that contains several different XML messages. What the messages have in common is that each one has a child node with the same name (EventInfo). The following is an example of the payloads:

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DeliveryTimeChanged CurrentStatus="OnHold" xmlns:ns2="http://com/post/orderupdatesasync/jaxbxml">
<EventInfo EventId="666313444" CreationDatetime="2017/02/09 07:59:17 369 GMT" RequestId="321150454">
<TopicCounts TopicName="DELIVERY.TIME.CHANGED" TopicCount="1"/>
</EventInfo>
<DeliveryChangeOperationType OperationTypeCode="DELAY" OperationSubtypeCode="HOLD" DeliveryChangeReason="Weather" DeliveryDate="20170210"/>
</DeliveryTimeChanged>

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DeliveryRouteChanged CurrentStatus="OnHold" xmlns:ns2="http://com/post/orderupdatesasync/jaxbxml">
<EventInfo EventId="666313445" CreationDatetime="2017/02/09 07:59:23 639 GMT" RequestId="321150454">
<TopicCounts TopicName="DELIVERY.ROUTE.CHANGED" TopicCount="1"/>
</EventInfo>
<DeliveryRouteType OperationTypeCode="AIR" OperationSubtypeCode="HOLD" DeliveryDate="20170210"/>
</DeliveryRouteChanged>

<?xml version="1.0" encoding="UTF-8" standalone="yes"?>
<DeliveryCanceled CurrentStatus="Canceled" xmlns:ns2="http://com/post/orderupdatesasync/jaxbxml">
<EventInfo EventId="666313446" CreationDatetime="2017/02/09 07:59:44 963 GMT" RequestId="421150444">
<TopicCounts TopicName="DELIVERY.STATUS.CANCELED" TopicCount="1"/>
</EventInfo>
<DeliveryStatusType DeliveryStatusCode="CX" OperationSubtypeCode="CANCELED" DeliveryDate="20170210"/>
</DeliveryCanceled>

What I would like is to pull every entire XML message whose EventInfo node has a certain RequestId attribute value (321150454), regardless of what the parent node is.

I have tried the following perl script:
perl -ne ' if(/EventInfo>/){$p=0} if(/RequestId="321150454"/) {print $ARGV; print " "; print; $p=1;next}print if$p' sample.xml

The output is only giving me the EventInfo node:
<EventInfo EventId="666313444" CreationDatetime="2017/02/09 07:59:17 369 GMT" RequestId="321150454">
<TopicCounts TopicName="DELIVERY.TIME.CHANGED" TopicCount="1"/>
sample.xml <EventInfo EventId="666313445" CreationDatetime="2017/02/09 07:59:23 639 GMT" RequestId="321150454">
<TopicCounts TopicName="DELIVERY.ROUTE.CHANGED" TopicCount="1"/>

How do I get the entire xml payload? Any help with this would be greatly appreciated. Thanks for your time.


Laurent_R
Veteran / Moderator

Feb 10, 2017, 10:16 AM

Post #2 of 4 (1375 views)
Re: [the_r] Extract Entire XML Payload Based On Child Node

I'm not sure exactly what you mean by the entire XML payload.

Don't try to parse XML yourself. Use an existing parser instead. There are two kinds of parsers for XML: DOM and SAX. DOM loads the entire file into memory, so SAX is probably the better choice if your files are huge and could exhaust memory.

You should use CPAN Perl modules implementing either DOM or SAX.

Possible solutions in Perl: XML::XPath, XML::DOM, XML::SAX, XML::LibXML (recommended), XML::Twig (recommended), and many others. Just search the CPAN (http://www.cpan.org/).

Note that XML::Simple is not recommended.
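As an illustration of the parser-based route, here is a minimal sketch using XML::LibXML, one of the modules recommended above. It rests on assumptions that are not confirmed in this thread: that every message in the file begins with its own <?xml ...?> declaration (as in the sample), so the text can be cut into separate documents before parsing, and that the RequestId value contains no double quotes. The helper names split_documents and messages_for_request are made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;
use XML::LibXML;

# Cut text holding several concatenated XML documents into one string per
# document. Assumption: each message starts with its own <?xml ...?>
# declaration, as in the sample file.
sub split_documents {
    my ($text) = @_;
    return grep { /\S/ } split /(?=<\?xml\b)/, $text;
}

# Return the serialized root element of every document whose root has an
# EventInfo child with the given RequestId, whatever the root is called.
sub messages_for_request {
    my ($text, $request_id) = @_;
    my @matches;
    for my $chunk (split_documents($text)) {
        # load_xml dies on malformed input, so trap that and move on.
        my $doc = eval { XML::LibXML->load_xml(string => $chunk) };
        unless ($doc) {
            warn "Skipping a chunk that is not well-formed: $@";
            next;
        }
        # /*/EventInfo means: any root element, then a direct EventInfo child.
        my ($hit) = $doc->findnodes(qq{/*/EventInfo[\@RequestId="$request_id"]});
        push @matches, $doc->documentElement->toString if $hit;
    }
    return @matches;
}

# Process every file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    print "$_\n\n" for messages_for_request($text, '321150454');
}
```

Saved under a name such as extract_by_request.pl (a hypothetical name), it would be run as: perl extract_by_request.pl sample.xml. The serializer may write the attributes of the root element in a different order than the input file, but the content is the same. Note that this slurps the whole file, so for very large inputs a streaming approach would be the better fit.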

Update: I had to post what I had written in a hurry, so it was incomplete. I have now added what I had no time to write earlier.


(This post was edited by Laurent_R on Feb 10, 2017, 12:57 PM)


the_r
New User

Feb 10, 2017, 10:35 AM

Post #3 of 4 (1374 views)
Re: [the_r] Extract Entire XML Payload Based On Child Node

Thanks for your reply. By entire XML payload I meant that I would like to return:
<DeliveryTimeChanged CurrentStatus="OnHold" xmlns:ns2="http://com/post/orderupdatesasync/jaxbxml">
<EventInfo EventId="666313444" CreationDatetime="2017/02/09 07:59:17 369 GMT" RequestId="321150454">
<TopicCounts TopicName="DELIVERY.TIME.CHANGED" TopicCount="1"/>
</EventInfo>
<DeliveryChangeOperationType OperationTypeCode="DELAY" OperationSubtypeCode="HOLD" DeliveryChangeReason="Weather" DeliveryDate="20170210"/>
</DeliveryTimeChanged>

instead of just:
<EventInfo EventId="666313444" CreationDatetime="2017/02/09 07:59:17 369 GMT" RequestId="321150454">
<TopicCounts TopicName="DELIVERY.TIME.CHANGED" TopicCount="1"/>

I would like to treat the parent node as a wildcard and always search on the EventInfo child node, but I would like the result to be the entire XML payload. I tried the following, but it does not behave correctly either.

perl -ne ' if(/(?:.*)<EventInfo/){$p=0} if(/RequestId="321150454"/) {print $ARGV; print " "; print; $p=1;next}print if$p' sample.xml
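A note on why both one-liners stop short: they work line by line and only start printing on the line that contains RequestId, by which point the start tag of the parent element has already been read and discarded. One way around that is to look at a whole message at a time rather than a line at a time. The following is only a sketch under assumptions taken from the sample: messages are separated by blank lines, and no attribute value inside the EventInfo start tag contains a ">" character. It is plain text matching, not XML parsing, so it will break on input that is laid out differently. The name blocks_for_request is made up for this example.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Return every blank-line-separated block of $text whose EventInfo start tag
# carries the given RequestId. This matches text; it does not parse XML.
sub blocks_for_request {
    my ($text, $request_id) = @_;
    my @matches;
    for my $block (split /\n\s*\n/, $text) {
        # \Q...\E keeps any regex metacharacters in the id literal.
        next unless $block =~ /<EventInfo\b[^>]*\bRequestId="\Q$request_id\E"/;
        $block =~ s/\A\s*<\?xml\b.*?\?>\s*//s;   # drop the XML declaration
        $block =~ s/\s+\z//;                     # drop trailing whitespace
        push @matches, $block;
    }
    return @matches;
}

# Process every file named on the command line.
for my $file (@ARGV) {
    open my $fh, '<', $file or die "Cannot open $file: $!";
    my $text = do { local $/; <$fh> };   # slurp the whole file
    close $fh;
    print "$_\n\n" for blocks_for_request($text, '321150454');
}
```

Because the whole block is kept or dropped as a unit, the parent element comes along no matter what it is called. If the real files are not blank-line separated, or may contain unusual markup, a real XML parser is the safer choice.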


Laurent_R
Veteran / Moderator

Feb 10, 2017, 1:01 PM

Post #4 of 4 (1366 views)
Re: [the_r] Extract Entire XML Payload Based On Child Node

Just to state it once more: don't try to parse XML yourself. You're very likely to get a mess. Use CPAN modules to do it for you.

Please note that my earlier post was incomplete when you first read it (I had to post it in a hurry or lose it). I have now completed it. You may want to re-read it.

 
 

