
Compare two xml files and insert the missing tags to second xml

 



gprakash
New User

Aug 25, 2014, 2:30 AM

Post #1 of 3 (1429 views)

I am trying to write a Perl script to compare two XML files. A single XML file in the template folder needs to be compared with many XML files in the input folder. I have to insert the tags that are missing from each input file by copying them from the template file, matching on the id attribute.

I have found the ids that are missing in the input XML files. Please help me copy the tags with the missing ids into the input XML files. I would like each tag to be inserted into the input XML at the same position it has in the template XML.

I tried the following:

Code
use strict; 
use warnings;
undef $/;
use Cwd;

my $cwd = getcwd();
my $tmpdir = "$cwd//Template";
my $indir = "$cwd//Input";
my $outdir = "$cwd//Output";

# Template XML Folder
opendir(TEMP, "$tmpdir")|| die "cannot open directory!";
my @tmpfiles = grep{/\.xml/} readdir(TEMP);
closedir(TEMP);

# Input XML Folder
opendir(IN, "$indir")|| die "cannot open directory!";
my @infiles = grep{/\.xml/} readdir(IN);
closedir(IN);

# Processing XML in template Folder
my($ttitle,@ttitle,$ttitle1,$ttitle2,$ttitle3,$ttitleg,@ttitle2,@ttitleg1);
open(TDOC, "$tmpdir//$tmpfiles[0]");
$ttitle = <TDOC>;
while ($ttitle =~m/word id="([^"]+)" eng="([^"]+)" word="([^"]+)"/gis){
$ttitle1 = $1;
$ttitle2 = $2;
$ttitle3 = $3;
push @ttitleg1, $ttitle1;
}
close(TDOC);


# Processing XML in Input Folder
my ($initem,$title,$title1,$title2,$title3,@title1,%title2,@title3,@wordlist,@outtt);
foreach $initem(@infiles){
@title1=();
@outtt=();
%title2=();
#open xml file
open(XDOC, "$indir//$initem");
chomp($title = <XDOC>);
close(XDOC);
while ($title =~m/\word id="([^"]+)" eng="([^"]+)" word="([^"]+)"/g){
$title1 = $1;
$title2 = $2;
$title3 = $3;
push @title1, $title1;
}


foreach (@outtt = grep!${{map{$_,1}@title1}}{$_},@ttitleg1){
#print "@outtt\n";
@wordlist = '<word id="' . "@outtt" . '" eng="' . "$ttitle2" . '" word="' . "$ttitle3" . '" />';
print @wordlist;

open(OUTO, ">$outdir/${initem}_updated.xml") || die "Cannot create XML!";
print OUTO "@wordlist";
close(OUTO);
}

}#foreach


Template xml:

<word id="0005" eng="Add to Wishlist" word="Auf den Wunschzettel" />
<word id="0006" eng="Address 1" word="Adresse 1" />
<word id="0007" eng="Address 2" word="Adresse 2" />
<word id="pdf" eng="PDF" word="PDF" />
<word id="epub" eng="epub" word="TePub" />
<word id="docx" eng="DOCX" word="DOCX" />

Input xml:

<word id="0005" eng="Add to Wishlist" word="Auf den Wunschzettel" />
<word id="0007" eng="Address 2" word="Adresse 2" />
<word id="pdf" eng="PDF" word="PDF" />
<word id="docx" eng="DOCX" word="DOCX" />

Expected output:

<word id="0005" eng="Add to Wishlist" word="Auf den Wunschzettel" />
<word id="0006" eng="Address 1" word="Adresse 1" />
<word id="0007" eng="Address 2" word="Adresse 2" />
<word id="pdf" eng="PDF" word="PDF" />
<word id="epub" eng="epub" word="TePub" />
<word id="docx" eng="DOCX" word="DOCX" />



Laurent_R
Veteran / Moderator

Aug 25, 2014, 9:58 AM

Post #2 of 3 (1405 views)

One question first: what's wrong with proper indentation?

Second, why do you have this:

Code
$ttitle2 = $2;  
$ttitle3 = $3;

in the loop reading the template, if you are not using these variables?

Next, if you can find the missing IDs, I don't really understand where you're having difficulties inserting them.

I think you might reconsider the choice of an array to store the template; you'd probably be better off using a hash (with the ID as the key and the full line as the value).
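To illustrate the hash idea, here is a minimal sketch. It assumes one `<word ... />` tag per line with `id` as the first attribute, as in the sample data; `index_by_id` and the sample arrays are hypothetical names, not part of the original script.

```perl
use strict;
use warnings;

# Hypothetical sample data standing in for the template and input files.
my @template_lines = (
    '<word id="0005" eng="Add to Wishlist" word="Auf den Wunschzettel" />',
    '<word id="0006" eng="Address 1" word="Adresse 1" />',
    '<word id="0007" eng="Address 2" word="Adresse 2" />',
);
my @input_lines = (
    '<word id="0005" eng="Add to Wishlist" word="Auf den Wunschzettel" />',
    '<word id="0007" eng="Address 2" word="Adresse 2" />',
);

# index_by_id (hypothetical helper): map each id to its full tag line.
sub index_by_id {
    my @lines = @_;
    my %by_id;
    for my $line (@lines) {
        $by_id{$1} = $line if $line =~ /<word\s+id="([^"]+)"/;
    }
    return %by_id;
}

my %template = index_by_id(@template_lines);
my %input    = index_by_id(@input_lines);

# IDs present in the template but absent from the input.
my @missing = sort grep { !exists $input{$_} } keys %template;

# The full line for each missing id is available straight from the hash.
print "$template{$_}\n" for @missing;
```

A hash does not remember insertion order, so if the position of each tag matters, the template lines (or their ids) still need to be kept in an array alongside it.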

Finally, using regexes to parse XML is usually frowned upon; using a specialized CPAN module is generally better, although in that very simple case, you will probably manage to do it the way you've chosen.


BillKSmith
Veteran

Aug 25, 2014, 1:35 PM

Post #3 of 3 (1380 views)

Your requirement to "insert the tag in input xml exact position as present in template xml" would not make sense unless all IDs in the input file are also in the template file and they are in the same order. (Your example does meet this description.)
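That precondition can be checked before merging. The following is only a sketch: `is_ordered_subset` is a hypothetical helper, and the id lists are copied from the example in the question.

```perl
use strict;
use warnings;

# is_ordered_subset (hypothetical helper): true when every id in @$input
# also occurs in @$template, in the same relative order.
sub is_ordered_subset {
    my ($template, $input) = @_;
    my $i = 0;    # index of the next input id still waiting for a match
    for my $id (@$template) {
        last if $i == @$input;              # every input id already matched
        $i++ if $id eq $input->[$i];
    }
    return $i == @$input;
}

my @template_ids = qw(0005 0006 0007 pdf epub docx);
my @input_ids    = qw(0005 0007 pdf docx);

print is_ordered_subset(\@template_ids, \@input_ids) ? "ok\n" : "mismatch\n";
```

If the check fails for a file, the input either contains an id the template lacks or lists its ids in a different order, and "same position as in the template" is no longer well defined for that file.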

Take advantage of this structure with the following pseudo code.



Code
foreach ID in the template:
    if ID matches the next ID from the input:
        output the tag from the input
        get the next tag from the input
    else:
        output the tag from the template
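
One possible Perl rendering of the pseudo code above, offered as a sketch rather than a finished solution. It assumes one `<word ... />` tag per line and that the input ids are an ordered subset of the template ids; `merge_tags` and the sample arrays are hypothetical names.

```perl
use strict;
use warnings;

# merge_tags (hypothetical helper): walk the template in order, taking the
# input's tag when its id is next in line and the template's tag otherwise.
sub merge_tags {
    my ($template, $input) = @_;
    my @pending = @$input;    # copy so the caller's array is untouched
    my @merged;
    for my $t_line (@$template) {
        # Skip template lines that do not contain a <word id="..."> tag.
        my ($t_id) = $t_line =~ /<word\s+id="([^"]+)"/ or next;
        my ($i_id) = @pending ? $pending[0] =~ /<word\s+id="([^"]+)"/ : ();
        if (defined $i_id && $i_id eq $t_id) {
            push @merged, shift @pending;    # output tag from input
        }
        else {
            push @merged, $t_line;           # output tag from template
        }
    }
    return @merged;
}

# Sample data copied from the question.
my @template_lines = (
    '<word id="0005" eng="Add to Wishlist" word="Auf den Wunschzettel" />',
    '<word id="0006" eng="Address 1" word="Adresse 1" />',
    '<word id="0007" eng="Address 2" word="Adresse 2" />',
    '<word id="pdf" eng="PDF" word="PDF" />',
    '<word id="epub" eng="epub" word="TePub" />',
    '<word id="docx" eng="DOCX" word="DOCX" />',
);
my @input_lines = (
    '<word id="0005" eng="Add to Wishlist" word="Auf den Wunschzettel" />',
    '<word id="0007" eng="Address 2" word="Adresse 2" />',
    '<word id="pdf" eng="PDF" word="PDF" />',
    '<word id="docx" eng="DOCX" word="DOCX" />',
);

my @result = merge_tags(\@template_lines, \@input_lines);
print "$_\n" for @result;
```

Input tags whose id never matches a template id are left in `@pending` and are not written out, so this only gives the expected result when the ordering assumption holds.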

Good Luck,
Bill

 
 

