XML File Parsing with Regex -- Need Help



vol7ron
New User

Aug 24, 2008, 3:43 PM



Example XML:

Code
<class> 
<student firstName="Joe" lastName="Thomas" age="32" />
<student firstName="Bob" lastName="Villas" age="92" />
<student firstName="Don" lastName="Gaters" age="13" />
</class>


I only want to pull out the attributes inside each tag, that is, firstName="", lastName="", and age="", whether or not there is text between the quotes.

Assume each line of the XML is stored in @body:

Code
foreach (@body) {
    do {
        s/^\s+//;    # remove leading spaces
        s/\s+$//;    # remove trailing spaces
        if (/^(.+?)\=\"(.*?)\"/) {
            print "${1}:${2} ";
        }
        $_   = $';
        $rem = $';

    } until ($rem eq "");
}


The above prints something like the following, which is almost perfect except for the leading <student on each line.

Code
<student firstName:Joe lastName:Thomas age:32 
<student firstName:Bob lastName:Villas age:92
<student firstName:Don lastName:Gaters age:13



vol7ron
New User

Aug 24, 2008, 4:00 PM


Re: [vol7ron] XML File Parsing with Regex -- Need Help


Code
if (/(\w+?)\=\"(.+?)\"/) {  
...
}


I changed the if statement to use \w (the word-character class) and removed the ^ (the start-of-string anchor).

I'm wondering if there is still a way to do this using the (.+?) sequence.
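
One possible way to keep (.+?), offered as a sketch rather than a definitive answer: require whitespace in front of the attribute name, so the lazy group can no longer start at "<student". The helper name attrs_lazy is hypothetical and not from the thread, and the sketch assumes attributes are separated by whitespace and values use double quotes.

```perl
use strict;
use warnings;

# Hypothetical helper: returns "name:value" strings for one line of XML.
sub attrs_lazy {
    my ($line) = @_;
    my @pairs;
    # \s+ pins each match to the whitespace before an attribute name;
    # (.*?) for the value also accepts an empty string such as age="".
    while ($line =~ /\s+(.+?)="(.*?)"/g) {
        push @pairs, "$1:$2";
    }
    return @pairs;
}

# prints "firstName:Joe lastName:Thomas age:32"
print join(' ', attrs_lazy('<student firstName="Joe" lastName="Thomas" age="32" />')), "\n";
```

The lazy (.+?) can still swallow more than a name on unusual input (for example a bare word before the attribute), so a narrower class such as \w+ or [^=\s]+ is probably the safer choice.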


KevinR
Veteran


Aug 24, 2008, 6:22 PM


Re: [vol7ron] XML File Parsing with Regex -- Need Help


Code
@body = <DATA>; 
foreach (@body) {
s/^\s+//; #remove leading spaces
s/\s+$//; #remove trailing spaces
print "$1:$2\n" while (/\s([^=]+)="([^"]+)"/g);
}
__DATA__
<class>
<student firstName="Joe" lastName="Thomas" age="32" />
<student firstName="Bob" lastName="Villas" age="92" />
<student firstName="Don" lastName="Gaters" age="13" />
</class>
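
A variation on the loop above, as a rough sketch: the original question asked for attributes "whether there is text in the quotes, or not", which needs * rather than + in the value class. The helper name parse_attrs and the choice to collect the pairs into a hash are assumptions for illustration, not part of the reply.

```perl
use strict;
use warnings;

# Hypothetical helper: parse one line into a hash of attribute => value.
sub parse_attrs {
    my ($line) = @_;
    my %attr;
    # [^=\s]+ is the attribute name; [^"]* lets an empty value match too.
    while ($line =~ /\s([^=\s]+)="([^"]*)"/g) {
        $attr{$1} = $2;
    }
    return \%attr;
}

my $student = parse_attrs('<student firstName="Joe" lastName="" age="32" />');
print "$_:$student->{$_}\n" for sort keys %$student;
```

Because the pattern requires whitespace before the name, the leading and trailing trims are not needed here, and lines such as <class> simply produce an empty hash.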
