Find and Replace in html file using CSV

 



jphoang
New User

May 8, 2013, 11:05 AM

Post #1 of 7 (788 views)
Find and Replace in html file using CSV

I have a CSV file in the following format

oldval1,newval1

oldval2,newval2

etc.

I need to write a Perl script that finds every oldval in the HTML file and replaces it with the corresponding newval. Note: both the oldvals and the newvals contain special characters ([, /, :, etc.). Can anyone help?


Laurent_R
Veteran / Moderator

May 8, 2013, 11:51 AM

Post #2 of 7 (780 views)
Re: [jphoang] Find and Replace in html file using CSV

You basically can't do it without two nested loops.

The basic idea (if performance is not an issue, i.e. if the files are relatively small): read the CSV file and store the oldval/newval pairs in a hash. Then read the HTML file and apply the oldval->newval substitutions to every chunk of data that you read from it.
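The basic idea above could be sketched roughly as follows. This is only an illustration under assumptions: the helper name replace_all and the in-memory sample data are made up, and a real script would fill the hash from the CSV file and read chunks from the HTML file.

```perl
use strict;
use warnings;

# Hypothetical helper: apply every old => new pair to one chunk of text.
sub replace_all {
    my ($text, $map) = @_;
    # Inner loop over the pairs. Longer keys go first so that a short key
    # cannot clobber part of a longer one.
    for my $old (sort { length $b <=> length $a || $a cmp $b } keys %$map) {
        my $new = $map->{$old};
        # \Q...\E makes characters such as [ / : match literally.
        $text =~ s/\Q$old\E/$new/g;
    }
    return $text;
}

# Sample data standing in for the CSV and HTML files (assumption).
my %map  = ('[old:1]' => '[new/1]', 'a/b' => 'c:d');
my @html = ("<p>[old:1] and a/b</p>\n", "<p>nothing here</p>\n");

# Outer loop over the chunks (lines here).
print replace_all($_, \%map) for @html;
```

One caveat with this sequential form: if a newval happens to contain another oldval, a later pass of the inner loop will replace it again.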

There are a number of ways to do that (in some cases, one of the loops can be implicit). Because nested loops can be time-consuming on large data sets, could you give an idea of the size of each file (CSV and HTML)? Depending on that, either the algorithm proposed above will be fine, or another one (or a slightly modified one) might be better.

Also, the best way to determine how to split the HTML file into proper chunks will depend on the size of your data.


FishMonger
Veteran / Moderator

May 8, 2013, 12:12 PM

Post #3 of 7 (778 views)
Re: [Laurent_R] Find and Replace in html file using CSV

I don't think you need 2 nested loops.

I'd load the csv file into a hash and then use the keys (oldvals) in a substitution regex as you loop over the html file to replace the oldval with the newval.

Of course, that's assuming the regex doesn't need to span multiple lines.
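If a value did need to span multiple lines, one possible variation is to hold the whole HTML document in a single string and run one substitution over it. The sketch below is an assumption-laden illustration, not a tested solution: the helper name replace_in_document and the sample data are invented, and it presumes the file is small enough to fit in memory.

```perl
use strict;
use warnings;

# Hypothetical helper: replace every key of %$map in $html in a single pass.
sub replace_in_document {
    my ($html, $map) = @_;
    return $html unless %$map;    # an empty alternation would match everywhere
    # Escape each key and put longer keys first so they win over their prefixes.
    my $pattern = join '|',
        map { quotemeta $_ }
        sort { length $b <=> length $a } keys %$map;
    $html =~ s/($pattern)/$map->{$1}/g;
    return $html;
}

# Sample document held in one string (assumption). A real script might
# slurp the file with: my $html = do { local $/; <$fh> };
my $html = "<p>first\nsecond</p>\n<p>first</p>\n";
my %map  = ("first\nsecond" => 'merged', 'first' => '1st');

print replace_in_document($html, \%map);
```

Because the substitution makes a single pass, text that has already been replaced is not scanned again.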


jphoang
New User

May 8, 2013, 12:27 PM

Post #4 of 7 (775 views)
Re: [FishMonger] Find and Replace in html file using CSV

Could you write out the script to show me how it works?


FishMonger
Veteran / Moderator

May 8, 2013, 12:43 PM

Post #5 of 7 (768 views)
Re: [jphoang] Find and Replace in html file using CSV

This is untested and will probably need to be tweaked a little and additional error handling added.


Code
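use strict;
use warnings;

# Load the oldval => newval pairs from the CSV file.
open my $csv_fh, '<', 'file.csv' or die "failed to open file.csv: $!";
my %values;
while (my $row = <$csv_fh>) {
    chomp $row;
    next unless length $row;    # skip blank lines
    my ($old, $new) = split /,/, $row, 2;
    $values{$old} = $new;
}
close $csv_fh;

# Build one alternation. Each key is escaped on its own so that the | between
# keys stays a regex operator; longer keys come first so they win over prefixes.
my $pattern = join '|', map { quotemeta $_ } sort { length $b <=> length $a } keys %values;

open my $html_fh, '<', 'file.html' or die "failed to open file.html: $!";
while (my $line = <$html_fh>) {
    # \b is left out: it misbehaves when a value starts or ends with a
    # non-word character such as [ or /.
    $line =~ s/($pattern)/$values{$1}/g if length $pattern;
    print $line;
}
close $html_fh;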



Laurent_R
Veteran / Moderator

May 8, 2013, 3:44 PM

Post #6 of 7 (756 views)
Re: [FishMonger] Find and Replace in html file using CSV

Hi Fishmonger,

The solution you proposed is exactly the type of solution I was thinking about; I think the description in my post is clear enough about that.

As for the "nested loops" part, you are right, I guess that I expressed my views incorrectly.

To me, looking into a hash for every entry was implicitly akin to having a nested loop.

But, thinking more about it, it is true that a hash lookup is not really like looping over the data. So my comment was poorly worded.


Kenosis
User

May 12, 2013, 8:48 AM

Post #7 of 7 (734 views)
Re: [jphoang] Find and Replace in html file using CSV

What, exactly, needs to be replaced in your html file? A sample of your data might be helpful.

For example, let's say you want to replace "important" with "urgent". The script will need to correctly handle the following:

Code
<div class="important">important</div>

Which "important" needs to be replaced? If it's only the text that's marked up and not the attribute value (or vice versa), the script must correctly distinguish between the two.

Even though your specs say, "...all the oldval's in the html file...", the above issue should first be explicitly clarified to ensure you get the substitution results you want.
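To make the distinction concrete, here is a rough sketch of one way to replace only the marked-up text and leave attribute values alone. It rests on assumptions: the helper name replace_text_only is invented, and the tag pattern is a naive stand-in for a real HTML parser, so comments, scripts, or a ">" inside an attribute value would confuse it.

```perl
use strict;
use warnings;

# Hypothetical helper: substitute only in the text between tags, leaving
# tag names and attribute values untouched.
sub replace_text_only {
    my ($html, $map) = @_;
    return $html unless %$map;
    my $pattern = join '|',
        map { quotemeta $_ }
        sort { length $b <=> length $a } keys %$map;
    # The capturing group makes split return the tags as well as the text.
    my @parts = split /(<[^>]*>)/, $html;
    for my $part (@parts) {
        next if $part =~ /^</;    # a tag: keep it as is
        $part =~ s/($pattern)/$map->{$1}/g;
    }
    return join '', @parts;
}

print replace_text_only('<div class="important">important</div>',
                        { important => 'urgent' }), "\n";
```

With the sample above, the class attribute keeps "important" and only the text inside the div becomes "urgent". For real-world HTML, a proper parser module would be a safer choice than this split.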


(This post was edited by Kenosis on May 12, 2013, 9:00 AM)

 
 

