Regex for xml tags deletion

cognizant
Novice

May 7, 2016, 4:04 AM

Post #1 of 4
Regex for xml tags deletion

In Perl, can a regex be used to delete the tags nested inside another tag in an XML file?

XML file:

<sample>
<parents> john and les
<son> mike </son>
<daughter> liz </daughter>
</parents>
</sample>


Desired output:

<sample>
<parents> john and les
</parents>
</sample>


Laurent_R
Veteran / Moderator

May 8, 2016, 1:44 AM

Post #2 of 4
Re: [cognizant] Regex for xml tags deletion

The short answer is no. In general, you should not try to edit XML files (or HTML files) with regexes. You should use a parser, and there are many modules on CPAN that do this (see for example http://search.cpan.org/dist/XML-LibXML/LibXML.pod or http://search.cpan.org/~msergeant/XML-Parser-2.36/Parser.pm).
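For illustration, a minimal sketch of the parser route with XML::LibXML (the first module linked above; it is a CPAN module, not part of core Perl). The deletion rule is an assumption, since the post does not state one: here every element nested directly inside <parents> is removed, while the text of <parents> itself is kept.

```perl
use strict;
use warnings;
use XML::LibXML;

# Sample input from the first post
my $xml = <<'XML';
<sample>
<parents> john and les
<son> mike </son>
<daughter> liz </daughter>
</parents>
</sample>
XML

my $doc = XML::LibXML->load_xml(string => $xml);

# Assumed rule: detach every child *element* of <parents>.
# Text nodes such as " john and les" are not matched by '*' and stay in place.
$_->unbindNode for $doc->findnodes('//parents/*');

# Print the <sample> element (without the <?xml ...?> declaration)
print $doc->documentElement->toString, "\n";
```

The whitespace text nodes that surrounded <son> and <daughter> are left untouched, so the output keeps blank lines where those elements used to be.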

Now, of course, for very simple (and well formatted) cases such as your example, it might be possible to do it (though not recommended, as simple cases usually become more complicated with time). You gave an example, but did not specify the implicit rule you want to use: just discard lines with the <son> and <daughter> tags? Something else? I can't really give a solution if you don't state more clearly how (in English, not in Perl) you want to do it.
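If the rule really were the one guessed at above, discarding every line that holds a <son> or <daughter> tag, a plain line filter would reproduce the desired output. This sketch assumes each such element sits entirely on its own line, as in the sample; the `filter_lines` helper is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: keep only the lines that contain no <son> or <daughter> tag.
sub filter_lines {
    my @lines = @_;
    return grep { !/<(?:son|daughter)\b/ } @lines;
}

# Sample input from the first post, one element per array entry
my @input = (
    "<sample>\n",
    "<parents> john and les\n",
    "<son> mike </son>\n",
    "<daughter> liz </daughter>\n",
    "</parents>\n",
    "</sample>\n",
);

print filter_lines(@input);
```

As soon as a <son> element spans two lines, or shares a line with other content, this filter gives the wrong answer, which is exactly the fragility described above.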


cognizant
Novice

May 8, 2016, 8:24 PM

Post #3 of 4
Re: [Laurent_R] Regex for xml tags deletion

Thank you for the reply, Laurent.

I am working on an XML file. My objective is to delete a tag, together with all of its content, if that content contains the word "remove".

Even when other tags are nested within that content, can they be removed along with it using Perl?

For example, suppose I want to delete everything between the two <test> tags (including the tags themselves) because it contains the word "remove":

<test> avdwds343 asdasd remove
<tag1> sasdas asdsa 3432 </tag1> <tag2> asdad 2321sdaf adsfsd </tag2>
asdasdas asdsdas </test>



Test Input XML file:

<q>adds 2123 dsfd2343</q>
<w>adf232 asdsd
sdf324 </w>
<e>asd21323 adsf</e>
<test> avdwds343 asdasd remove
<tag1> sasdas asdsa 3432 </tag1> <tag2> asdad 2321sdaf adsfsd </tag2>
asdasdas asdsdas </test>
<r>dasfed3213 dsfds33 dasfdf 343
3423234 asdfsdaf324</r>
<t>asda21323 asdas</t>



Expected output XML file:

<q>adds 2123 dsfd2343</q>
<w>adf232 asdsd
sdf324 </w>
<e>asd21323 adsf</e>
<r>dasfed3213 dsfds33 dasfdf 343
3423234 asdfsdaf324</r>
<t>asda21323 asdas</t>
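Because the <test> element spans several lines, a line-by-line approach has to remember where it is. One alternative, still regex-based and just as fragile, reads the whole input into a single string and deletes any element whose content contains the word "remove". It assumes that tags carry no attributes and that an element is never nested inside another element of the same name; the `strip_elements` helper is hypothetical.

```perl
use strict;
use warnings;

# Hypothetical helper: delete every element whose content contains the
# whole word "remove", together with the line break after its end tag.
sub strip_elements {
    my ($text) = @_;
    $text =~ s{
        <(\w+)>            # opening tag, element name captured as group 1
        (?:(?!</\1>).)*?   # content that has not reached the end tag yet
        \bremove\b         # the trigger word
        .*?</\1>           # remaining content up to the matching end tag
        \n?                # the newline after the end tag, if any
    }{}gsx;
    return $text;
}

# Test input from the post above
my $input = <<'XML';
<q>adds 2123 dsfd2343</q>
<w>adf232 asdsd
sdf324 </w>
<e>asd21323 adsf</e>
<test> avdwds343 asdasd remove
<tag1> sasdas asdsa 3432 </tag1> <tag2> asdad 2321sdaf adsfsd </tag2>
asdasdas asdsdas </test>
<r>dasfed3213 dsfds33 dasfdf 343
3423234 asdfsdaf324</r>
<t>asda21323 asdas</t>
XML

print strip_elements($input);
```

The negative lookahead `(?!</\1>)` stops the search for "remove" at each element's own end tag, so an element like <q> is never stretched past `</q>` into a later element that happens to contain the word.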


Laurent_R
Veteran / Moderator

May 8, 2016, 11:28 PM

Post #4 of 4
Re: [cognizant] Regex for xml tags deletion

Hmm, I would really recommend a parser.

With regexes, you could perhaps try something like this (untested):


Code
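use strict;
use warnings;
open my $IN, '<', 'input.xml' or die "Cannot open input.xml: $!";  # hypothetical file name
while (my $line = <$IN>) {
    # Lines without the trigger word "remove" are copied through unchanged
    if ($line !~ /\bremove\b/) { print $line; next; }
    # Capture the element name opened at the start of the line, e.g. "test"
    my ($tag) = $line =~ /^\s*<(\w+)>/;
    # Drop just this line if it opens no element or also closes it
    next if !defined $tag or $line =~ m{</$tag>};
    # Otherwise discard lines up to and including the matching end tag
    while (my $discard = <$IN>) {
        last if $discard =~ m{</$tag>};
    }
}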


But this makes a number of assumptions about the input (formatting and other), for instance that the line containing "remove" begins with the opening tag and that the matching closing tag appears on a later line, so it will probably break as soon as something does not look exactly like your input. This type of program is really fragile and should be used only if you have complete control over the input (e.g. you write it yourself in another application).

For any other usage, use a parser.
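For comparison, a sketch of the parser approach for the "remove" rule with XML::LibXML (a CPAN module). The test input has several top-level elements, so it is not a well-formed XML document by itself; the sketch assumes it is wrapped in a hypothetical <root> element before parsing, and it uses an abbreviated version of the test input.

```perl
use strict;
use warnings;
use XML::LibXML;

# Abbreviated test input. It has several top-level elements, so it is
# wrapped in a hypothetical <root> element to make it well-formed.
my $fragment = <<'XML';
<q>adds 2123 dsfd2343</q>
<test> avdwds343 asdasd remove
<tag1> sasdas asdsa 3432 </tag1> <tag2> asdad 2321sdaf adsfsd </tag2>
asdasdas asdsdas </test>
<t>asda21323 asdas</t>
XML
my $doc = XML::LibXML->load_xml(string => "<root>$fragment</root>");

# textContent includes the text of nested elements such as <tag1>,
# so the whole-word test sees everything inside each top-level element.
for my $node ($doc->findnodes('/root/*')) {
    $node->unbindNode if $node->textContent =~ /\bremove\b/;
}

# Print the children of <root> without the wrapper element itself
print $_->toString for $doc->documentElement->childNodes;
```

Unlike the regex versions, this does not care how the elements are split across lines or how deeply tags are nested inside 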