delete tag in xml



perlmagix
Novice

Apr 30, 2016, 11:33 PM


Views: 4024
delete tag in xml

I am working with an input XML file containing the following data:

inputfile.xml

<data>
<line> sdfe abc adsfefsdf </line>
<line> abc sdffedcfsdf sdf </line>
<line> sdfe </line><line> abc </line>
<line> sd sfefsdf </line>
<line> sdfe abc adsfefsdf </line>
<line> fhgh kk jj hjsda </line>
<line> abc </line>
..
..
..
</data>


My intention is to produce the following output file:

outputfile.xml


<data>
<line> sdfe </line>
<line> sd sfefsdf </line>
<line> fhgh kk jj hjsda </line>
..
..
..
</data>

Desired result: remove every <line> element whose text contains "abc".


I tried the following sed command, with no success. Can this be done in Perl with a regular expression (regex)?



Code
`sed '\|<line>*abc*| ,\|</line>|d' inputfile.xml > outputfile.xml`



(This post was edited by perlmagix on May 1, 2016, 4:26 AM)


BillKSmith
Veteran

May 1, 2016, 7:17 AM


Views: 4010
Re: [perlmagix] delete tag in xml

The only reliable way to edit XML is to parse it with a module before attempting the edits. In special cases, such as your example, you can get away with editing the XML text directly. My "solution" will fail if any <line> element contains an embedded tag, or if an element is spread over more than one line of input. There are likely other special conditions that I have not thought of.


Code
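use strict;
use warnings;

# Read the sample data one line at a time.
while (my $record = <DATA>) {
    # Delete each <line>...</line> element whose text contains the whole word "abc".
    # [^<]* keeps the match inside one element (no embedded tags).
    $record =~ s/<line>[^<]*?\babc\b[^<]*?<\/line>//ig;
    # Skip lines that are now empty.
    print $record if $record =~ /\S/;
}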
__DATA__
<data>
<line> sdfe abc adsfefsdf </line>
<line> abc sdffedcfsdf sdf </line>
<line> sdfe </line><line> abc </line>
<line> sd sfefsdf </line>
<line> sdfe abc adsfefsdf </line>
<line> fhgh kk jj hjsda </line>
<line> abc </line>
..
..
..
</data>


Output:

Code
<data> 
<line> sdfe </line>
<line> sd sfefsdf </line>
<line> fhgh kk jj hjsda </line>
..
..
..
</data>

Good Luck,
Bill
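
One of the limitations mentioned above is an element spread over several input lines. A minimal sketch of one way around that, assuming the file is small enough to hold in memory as a single string: since [^<]* also matches newlines, the same substitution can be applied to the whole document at once. The sub name remove_lines_containing is hypothetical, and the embedded-tag limitation still applies.

```perl
use strict;
use warnings;

# Hypothetical helper: remove every <line> element whose text contains
# the whole word $word. It works on the entire document as one string,
# so an element may span several physical lines.
sub remove_lines_containing {
    my ($xml, $word) = @_;
    # [^<]* matches any character except '<', including newlines,
    # so no /s modifier is needed. \Q...\E treats $word as literal text.
    $xml =~ s/<line>[^<]*?\b\Q$word\E\b[^<]*?<\/line>//ig;
    # Drop lines left empty by the deletions.
    $xml =~ s/^[ \t]*\n//mg;
    return $xml;
}

# The first element is deliberately split across two physical lines.
my $xml = "<data>\n<line> sdfe\nabc </line>\n<line> sd sfefsdf </line>\n</data>\n";
print remove_lines_containing($xml, 'abc');
```

For a real file, the string could be obtained by reading the filehandle with $/ set to undef. This is still text editing, not XML parsing.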


perlmagix
Novice

May 1, 2016, 9:15 AM


Views: 4004
Re: [BillKSmith] delete tag in xml

Cheers Bill,

:)

Mike F



perlmagix
Novice

May 1, 2016, 3:17 PM


Views: 3998
Re: [BillKSmith] delete tag in xml

Also, for the given example, is it possible to do a similar operation for an array of values?

Array Sample:
abc
de
fghi
jkl


Laurent_R
Veteran / Moderator

May 1, 2016, 11:28 PM


Views: 3987
Re: [perlmagix] delete tag in xml

If you want to filter out any array element containing "abc", one possible way is this:

Code
my @array = qw / sSDQ sdd abcsg dlk ssq/; 
my @filtered = grep { not /abc/ } @array;

Now, @filtered should contain all the elements of @array except "abcsg".


perlmagix
Novice

May 1, 2016, 11:59 PM


Views: 3984
Re: [Laurent_R] delete tag in xml

Thank you Laurent,

I did not convey my question properly.

"inputfile.xml" is the input file.

These are the values to be checked for and removed from the input file:

@array = qw / abc de fghi jklm /;


"outputfile.xml" is the output file.

The output file should omit every tag that contains any element of the array.

My question is how to incorporate this array into the regex provided by Bill. :)


Mike F


BillKSmith
Veteran

May 2, 2016, 8:50 PM


Views: 3962
Re: [perlmagix] delete tag in xml

My previous warnings still apply.

Code
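use strict;
use warnings;

# Values to look for; any <line> element containing one of them is deleted.
my @array = qw(abc de fghi jklm sdfe kk);
# Build a single alternation (abc|de|...) and precompile it.
# /i goes here because the outer regex's flags do not apply inside a qr// object.
my $filter = join '|', @array;
$filter = qr/$filter/i;
while (my $record = <DATA>) {
    # \b on both sides means the value must match as a whole word.
    $record =~ s/<line>[^<]*?\b$filter\b[^<]*?<\/line>//ig;
    # Skip lines that are now empty.
    print $record if $record =~ /\S/;
}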
__DATA__
<data>
<line> sdfe abc adsfefsdf </line>
<line> abc sdffedcfsdf sdf </line>
<line> sdfe </line><line> abc </line>
<line> sd sfefsdf </line>
<line> sdfe abc adsfefsdf </line>
<line> fhgh kk jj hjsda </line>
<line> abc </line>
..
..
..
</data>


Note: I added two array elements to remove two more lines from your original example.
Good Luck,
Bill
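
If the array values might contain regex metacharacters (a dot or a plus sign, for example), joining them directly changes the meaning of the pattern. A hedged sketch, assuming the values are meant as literal text; build_filter is a hypothetical name, not part of the code above.

```perl
use strict;
use warnings;

# Hypothetical helper: build one compiled, case-insensitive alternation
# from a list of literal words.
sub build_filter {
    my @words = @_;
    # quotemeta escapes regex metacharacters so "a.c" matches only "a.c".
    my $alt = join '|', map { quotemeta } @words;
    return qr/(?:$alt)/i;
}

my $filter = build_filter(qw(abc de a.c));
for my $record ('<line> xx a.c </line>', '<line> xx abc </line>', '<line> xx aXc </line>') {
    # Work on a copy so the original record can be reported.
    (my $copy = $record) =~ s/<line>[^<]*?\b$filter\b[^<]*?<\/line>//g;
    print length($copy) ? "kept:    $record\n" : "removed: $record\n";
}
```

Without quotemeta, the value "a.c" would also remove the "aXc" record, because an unescaped dot matches any character.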


perlmagix
Novice

May 2, 2016, 9:07 PM


Views: 3960
Re: [BillKSmith] delete tag in xml

Simply amazing, Bill.

Can this be extended to the following data?

__DATA__
<data>
<line> sdfe abc adsfefsdf </line>
<line> abc sdffedcfsdf sdf </line>
<line> sdfe </line><line> abc </line>
<line> sd abcsfefsdf </line>
<line> sdfe abc adsfefsdf </line>
<line> fhgh kk dejj hjsda </line>
<line> abc </line>
..
..
..
</data>


Modified data details:

"abc" may be joined with other letters in a single word, for example:

abcsdfedf or zdfcfabc


"de" may be joined with other letters in a single word, for example:

degfegf or fgsfvjde or gergdesdcdf



:)


(This post was edited by perlmagix on May 2, 2016, 9:08 PM)


BillKSmith
Veteran

May 3, 2016, 5:18 AM


Views: 3947
Re: [perlmagix] delete tag in xml

You should be able to figure this out yourself.

Hint: read about regex assertions, such as the \b used in the pattern.
Good Luck,
Bill
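
The \b in the earlier pattern is a word-boundary assertion: it lets a value match only as a whole word. A minimal sketch, assuming the goal is to match the values anywhere inside a word, simply drops both \b assertions. The sub name remove_matching_lines is hypothetical, and the earlier warnings about editing XML with a regex still apply.

```perl
use strict;
use warnings;

# Hypothetical helper: delete each <line> element whose text contains any
# of @words as a substring. With no \b, "abc" also matches inside "abcsfefsdf".
sub remove_matching_lines {
    my ($record, @words) = @_;
    my $alt = join '|', map { quotemeta } @words;
    $record =~ s/<line>[^<]*?(?:$alt)[^<]*?<\/line>//ig;
    return $record;
}

my @array = qw(abc de fghi jklm);
for my $record (
    "<line> sd abcsfefsdf </line>\n",
    "<line> fhgh kk dejj hjsda </line>\n",
    "<line> sd sfefsdf </line>\n",
) {
    my $result = remove_matching_lines($record, @array);
    # Only the third record survives; the first contains "abc", the second "de".
    print $result if $result =~ /\S/;
}
```

Short values such as "de" match inside many ordinary words once the boundaries are gone, so this form removes more elements than the whole-word version.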


perlmagix
Novice

May 3, 2016, 9:34 AM


Views: 3939
Re: [BillKSmith] delete tag in xml

Sure thing,

Cheers :)