clean up file

 



artperl
Novice

Apr 13, 2015, 5:28 PM

Post #1 of 5 (3177 views)
clean up file

Hi Gurus,

I have an HTML file with extra <tr></tr> pairs that I would like to remove to clean it up. The tricky part is that the <tr> and </tr> are on different lines.
Any suggestions on how I can easily take those out without removing the valid <tr> </tr> pairs that have data in between?
Thanks much!...


artperl
Novice

Apr 14, 2015, 1:30 AM

Post #2 of 5 (3173 views)
Re: [artperl] clean up file

Here is a snippet of the file content:
<tr>
<td width="160" height="21" bgcolor="#CCECFF"><b>Max. value</b></td>
<td width="590" height="21" bgcolor="#F8F8F8">35.3999 C</td>
</tr>
<tr>
</tr>
<tr>
</tr>
<tr>
<td width="160" height="21" bgcolor="#CCECFF"><b>Cpk</b></td>
<td width="590" height="21" bgcolor="#F8F8F8">n/a .</td>
</tr>
<tr>
</tr>
<tr>
</tr>

I would like to remove the <tr></tr> pairs that have nothing in between....


Zhris
Enthusiast

Apr 14, 2015, 2:54 AM

Post #3 of 5 (3170 views)
Re: [artperl] clean up file

Hi,

With a regular expression...

I didn't concern myself with putting newlines back in; you could do this with another regular expression, or perhaps adjust the original. If the trs have attributes, you'll have to modify it to account for these.


Code
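use strict;
use warnings;

# Slurp the whole input at once so the pattern can match across lines.
my $string = do { local $/ = undef; <DATA> };

# Remove each empty <tr></tr> pair along with the whitespace around it.
$string =~ s{\s*<tr>\s*</tr>\s*}{}g;

print $string;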

__DATA__
<tr>
<td width="160" height="21" bgcolor="#CCECFF"><b>Max. value</b></td>
<td width="590" height="21" bgcolor="#F8F8F8">35.3999 C</td>
</tr>
<tr>
</tr>
<tr>
</tr>
<tr>
<td width="160" height="21" bgcolor="#CCECFF"><b>Cpk</b></td>
<td width="590" height="21" bgcolor="#F8F8F8">n/a .</td>
</tr>
<tr>
</tr>
<tr>
</tr>
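
As a rough sketch of the two adjustments mentioned above (keeping the newlines and allowing attributes on the tr), something like the following might do. The sub name strip_empty_rows and the class="spacer" sample row are made up for illustration, and [^>]* assumes that no attribute value contains a literal >.

```perl
use strict;
use warnings;

# Hypothetical helper: remove <tr ...></tr> pairs that contain only
# whitespace, leaving the newlines around the remaining rows intact.
sub strip_empty_rows {
    my ($html) = @_;

    # <tr\b[^>]*>  opening tag, with or without attributes
    # \s*          whitespace (including newlines) between the tags
    # </tr\s*>     closing tag
    # \n?          at most one trailing newline, so other lines stay put
    $html =~ s{<tr\b[^>]*>\s*</tr\s*>\n?}{}gi;

    return $html;
}

# Made-up sample: two data rows and two empty rows, one with an attribute.
my $sample = <<'HTML';
<tr>
<td>Max. value</td>
</tr>
<tr>
</tr>
<tr class="spacer">
</tr>
<tr>
<td>Cpk</td>
</tr>
HTML

print strip_empty_rows($sample);
```

Consuming only the whitespace between the tags plus one trailing newline means the rows that stay keep their original line breaks.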


With HTML::TreeBuilder...

HTML::TreeBuilder will put the root tags back in, i.e. <html> -> <body> -> <table> etc., but presumably you have only provided a section of your HTML document and these are already in place. If not, you could just look for and print the elements you need, i.e. print map { $_->as_HTML( undef, "\t" ), "\n" } $tree->look_down( _tag => 'tr' );


Code
use strict;
use warnings;
use HTML::TreeBuilder;

# Slurp the whole input at once.
my $string = do { local $/ = undef; <DATA> };

my $tree = HTML::TreeBuilder->new_from_content( $string );

# Delete every <tr> element that has no content.
$_->delete for ( $tree->look_down( _tag => 'tr', sub { $_[0]->is_empty } ) );

print $tree->as_HTML( undef, "\t" );

__DATA__
<tr>
<td width="160" height="21" bgcolor="#CCECFF"><b>Max. value</b></td>
<td width="590" height="21" bgcolor="#F8F8F8">35.3999 C</td>
</tr>
<tr>
</tr>
<tr>
</tr>
<tr>
<td width="160" height="21" bgcolor="#CCECFF"><b>Cpk</b></td>
<td width="590" height="21" bgcolor="#F8F8F8">n/a .</td>
</tr>
<tr>
</tr>
<tr>
</tr>
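
If the root tags are not wanted, the look_down idea mentioned above could be fleshed out roughly like this. non_empty_rows is a hypothetical helper name and the sample fragment is made up; the empty hashref passed as the third argument to as_HTML asks for every end tag to be written out. It assumes HTML::TreeBuilder is installed from CPAN.

```perl
use strict;
use warnings;
use HTML::TreeBuilder;

# Hypothetical helper: return the non-empty <tr> elements as HTML strings,
# without the <html>/<body> wrappers that TreeBuilder adds.
sub non_empty_rows {
    my ($html) = @_;
    my $tree = HTML::TreeBuilder->new_from_content($html);

    # Delete every <tr> that has no child elements or text.
    $_->delete for $tree->look_down( _tag => 'tr', sub { $_[0]->is_empty } );

    # Serialize the rows that are left; {} means no end tag is optional.
    my @rows = map { $_->as_HTML( undef, "\t", {} ) }
        $tree->look_down( _tag => 'tr' );

    $tree->delete;    # release the tree once we are done with it
    return @rows;
}

# Made-up fragment: two data rows with one empty row between them.
my $fragment = "<table>\n<tr>\n<td>Max. value</td>\n</tr>\n"
             . "<tr>\n</tr>\n<tr>\n<td>Cpk</td>\n</tr>\n</table>\n";

print non_empty_rows($fragment);
```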


Chris


(This post was edited by Zhris on Apr 14, 2015, 3:26 AM)


artperl
Novice

Apr 14, 2015, 5:10 PM

Post #4 of 5 (3135 views)
Re: [Zhris] clean up file

You're amazing, Chris!... The HTML::TableExtract lines work perfectly to clean up my HTML files!... You're the man!... I will continue to play around and learn from the documentation... but I hope you don't mind if I bug you once in a while to ask for advice ;-)


Zhris
Enthusiast

Apr 15, 2015, 10:56 AM

Post #5 of 5 (3045 views)
Re: [artperl] clean up file

No problem. Note that I used HTML::TreeBuilder for this task, not HTML::TableExtract. They both subclass HTML::Element down the chain, which handles the core of this task, but the latter would fall over on empty table rows and be overkill, since all you wanted to do was remove empty elements, which coincidentally were table rows.

Chris


(This post was edited by Zhris on Apr 15, 2015, 11:01 AM)

 
 

