Delete part of file ?

 



patrik
stranger

Jun 3, 2001, 5:42 AM

Post #1 of 18 (3500 views)
Delete part of file ?

How do I delete part of a file?
I've got a file called somefile.html, and I want to delete
the things between 2 tags, like:

HTMLTAG1START
Some Lines of
text goes here
HTMLTAG1END

How do I delete the tags and the text ??

Thanx In advance.......

Excuse the bad english
Im from Sweden ;)


mhx
Enthusiast / Moderator

Jun 3, 2001, 8:31 AM

Post #2 of 18 (3493 views)
Re: Delete part of file ?

Hi Patrik,

use a regex:

Code
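my $file = 'somefile.html';

# Slurp the whole file: undefine the line separator, but only inside this block.
my $content;
{
    local $/;
    # Parentheses matter: "open FILE, $file or die" would never reach the die.
    open(my $in, '<', $file) or die "cannot open $file: $!\n";
    $content = <$in>;
    close $in;
}

# .*? is non-greedy, /s lets "." match newlines, /g removes every tag pair.
$content =~ s/HTMLTAG1START.*?HTMLTAG1END//gs;

open(my $out, '>', $file) or die "cannot open $file for writing: $!\n";
print $out $content;
close $out or die "cannot close $file: $!\n";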

That should do exactly what you want. Anyway, I haven't tested, so complain if it doesn't work.

-- Marcus



(This post was edited by mhx on Jun 3, 2001, 7:33 AM)


randor
User

Jun 3, 2001, 8:39 AM

Post #3 of 18 (3493 views)
Re: Delete part of file ?

patrik,

well, you could try something like this:

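use Fcntl qw(:flock); # provides LOCK_EX

open(FILE, "+<file.dat") or die "cannot open file.dat: $!";
flock(FILE, LOCK_EX) or die "cannot lock file.dat: $!";
my @story = <FILE>;
# s/// works on a scalar, not an array, so apply it to each line in turn
s/HTMLTAG1START.*HTMLTAG1END//ig for @story;
seek(FILE, 0, 0);
truncate(FILE, 0); # otherwise the tail of the old, longer content survives
print FILE @story;
close FILE;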

again, i did not test it, so if something is wrong with it let me know, but the general principle of it is to use the "s" operator to remove the tags and lines.


I Hope this helps...

perl programmers don't die.. they just start writing a new script.


randor
User

Jun 3, 2001, 8:40 AM

Post #4 of 18 (3493 views)
Re: Delete part of file ?

mhx, good idea:) it seems you got it in seconds prior to me though:) hehe



rGeoffrey
User / Moderator

Jun 3, 2001, 8:55 AM

Post #5 of 18 (3491 views)
Re: Delete part of file ?

Two other answers were posted while I was typing and testing mine, but this one works without any regex substitutions so I will post it now...


Code
#!/usr/local/bin/perl 

use strict;

&purge_file ('purge.txt', 'HTMLTAG1START', 'HTMLTAG1END');

sub purge_file
{
my ($filename, $starttag, $endtag) = @_;

open (FILE, $filename) or die "could not read from $filename, $!";

local $/ = $starttag;
my $string = <FILE>;
chomp $string;
$/ = $endtag;
<FILE>;
$/ = undef;
$string .= <FILE>;

close FILE;
open (FILE, ">$filename") or die "could not write to $filename, $!";
print FILE $string;
close FILE;
}

This will read in the whole file in three pieces and then write what is left to the same filename, so the original will be destroyed. If you have a really big file or need to keep the original, you might want to change it to print someplace else.

It works by playing with the "input record separator" $/ which is "\n" by default. $/ is used when reading from files and it is the thing that chomp will remove from the end of a string. Remember to always localize your changes to $/ with local and to do it inside a block so your changes will go away as quickly as possible or other places in the program may use your changes when you don't want them to.

First we set $/ to the starting tag and read everything up to and including the tag. Then we chomp the tag off the end because we don't need it. Then we set $/ to the ending tag and read all the stuff we don't want, including the tag. And finally we set $/ to undef so it can read the rest of the file in one piece and add that to $string. Now $string has everything we want and nothing we don't, so it is time to print it.

--
Sun Sep 9, 2001 - 1:46:40 GMT, a very special second in the epoch. How will you celebrate?


mhx
Enthusiast / Moderator

Jun 3, 2001, 9:40 AM

Post #6 of 18 (3487 views)
Re: Delete part of file ?

Hi randor,

your script is ok, but you seem to make some assumptions that may not be legal here:

1. You assume there's support for the flock() function on patrik's system. Try this when you have to use WinXX.
2. You assume that the tags start and end on the same line. That's not necessarily the case.
3. You assume there's only one tag pair (in each line). If there were more, your regex would kill everything between the first and the last tag pair because of the greedy *.
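
Points 2 and 3 are easy to see on a small in-memory string. What follows is only a sketch: the sample text and the variable names are made up for illustration, and the tags are the placeholder ones from patrik's question.

```perl
use strict;
use warnings;

# Two tag pairs; the first one spans two lines.
my $text = "keep1 HTMLTAG1START drop1\nmore HTMLTAG1END keep2 "
         . "HTMLTAG1START drop2 HTMLTAG1END keep3";

# Greedy .* with /s: runs from the first start tag to the LAST end tag,
# so keep2 is lost along with both tagged spans.
(my $greedy = $text) =~ s/HTMLTAG1START.*HTMLTAG1END//gs;

# Non-greedy .*? with /s: each pair is removed on its own.
(my $lazy = $text) =~ s/HTMLTAG1START.*?HTMLTAG1END//gs;

# Non-greedy without /s: "." stops at the newline, so the pair that
# spans two lines is never matched and stays in the text.
(my $no_s = $text) =~ s/HTMLTAG1START.*?HTMLTAG1END//g;

print "greedy: [$greedy]\n";
print "lazy:   [$lazy]\n";
print "no /s:  [$no_s]\n";
```

Only the non-greedy pattern with /s keeps keep1, keep2 and keep3 while dropping both tagged spans.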

I don't want to blame you for anything, I just want to stress some points that also made my scripts fail in my early perl days. ;) I guess your script would work in many cases, but not in all.

Hope everyone can draw some useful information from this.

-- Marcus



mhx
Enthusiast / Moderator

Jun 3, 2001, 9:48 AM

Post #7 of 18 (3486 views)
Re: Delete part of file ?

rGeoffrey,

WOW!! This is a really Cool solution. One of the best examples for TMTOWTDI.
I see only one problem: What if there's more than one occurrence of the tag pair in the file? (I know, patrik didn't ask for it explicitly...) Anyway, cool solution.
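
One way the $/ trick might be stretched to several tag pairs is to alternate the separator in a loop. This is only a sketch under some assumptions: `purge_all` is a hypothetical helper that is not part of rGeoffrey's script, and it works on a string through an in-memory filehandle so it can be tried without touching a real file.

```perl
use strict;
use warnings;

# Hypothetical helper: returns $text with every $starttag ... $endtag span removed.
sub purge_all {
    my ($text, $starttag, $endtag) = @_;

    # Open the string as a read-only in-memory file.
    open(my $fh, '<', \$text) or die "cannot open in-memory file: $!";

    local $/;    # our changes to $/ are undone when the sub returns
    my $kept = '';
    while (1) {
        $/ = $starttag;
        my $chunk = <$fh>;             # text up to and including the start tag
        last unless defined $chunk;    # nothing left to read
        chomp $chunk;                  # strips the start tag if one was found
        $kept .= $chunk;

        $/ = $endtag;
        my $skipped = <$fh>;           # the unwanted span, including the end tag
        last unless defined $skipped;  # input ended before another span
    }
    close $fh;
    return $kept;
}

print purge_all("a<!--s-->x<!--e-->b<!--s-->y<!--e-->c", '<!--s-->', '<!--e-->'), "\n";
```

Because $/ is a plain string and not a pattern, the tags need no escaping here, unlike in a regex.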

-- Marcus



randor
User

Jun 3, 2001, 11:20 AM

Post #8 of 18 (3482 views)
Re: Delete part of file ?

marcus,

thank you for pointing out my misses.. i am not at all above some constructive criticism..
it is true, i do sometimes make assumptions about things that i shouldn't, which limits my help.. i thank you for pointing this out to me and in the future i will certainly try to think of these things before answering posts,
I taught myself perl, so reading and writing to these posts is as much a learning tool to me as reading the script itself.

Randor



mhx
Enthusiast / Moderator

Jun 3, 2001, 11:35 AM

Post #9 of 18 (3481 views)
Re: Delete part of file ?

Hi Randor,


In Reply To
I taught myself perl, so reading and writing to these posts is as much a learning tool to me as reading the script itself.

So did I. It's the same for me. I didn't want to criticize, just to point out some things that cost me a lot of time when I was figuring out why my scripts failed in some cases, so others hopefully don't have to spend that time.

-- Marcus

P.S.: Where's the quote in your signature from? Is it your invention? I really like it. :)



randor
User

Jun 3, 2001, 2:20 PM

Post #10 of 18 (3475 views)
Re: Delete part of file ?

thank you,

yes that quote is something i thought up myself, im sure someone else has used it before.. but not to my knowledge:)




Mortimer
journeyman

Jun 4, 2001, 4:34 AM

Post #11 of 18 (3450 views)
Re: Delete part of file ?


Code
#!/usr/bin/perl -w 

use strict;
use CGI;
use CGI::Carp qw( fatalsToBrowser ); # fatalsToBrowser is exported by CGI::Carp, not CGI
my $q = CGI->new();
print $q->header;

my $hit_file = '/path/to/page.txt';
my $locked_flag = '/path/to/lock.txt';
my $flock_on = 'no';
my $waited_for = 0;
my $timeout = 5;
my $global = 1;
#---------------------

if( &choose_lock( $flock_on, $locked_flag, $waited_for, $timeout ) ){
undef $/;
my $content = <HFILE>;
if( $global == 1 ){
$content =~ s/<head>.*?<\/head>//gs;
}
else{
$content =~ s/<head>.*?<\/head>//s;
}
seek( HFILE, 0, 0 );
truncate( HFILE, 0 );
print HFILE $content;
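# Release the lock only if we actually got it; otherwise we would
# delete a lock file that belongs to another process.
if($flock_on eq 'no'){
close( HFILE );
unlink( $locked_flag );
}
else{
close(HFILE);
}
}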

###
sub choose_lock{
my($flock_on,$locked_flag,$waited_for,$timeout) = @_;
if( $flock_on eq 'no' ){
open( HFILE, "+<$hit_file" )or die( "Cannot open $hit_file: $!\n" );
&lock_f( $locked_flag, $waited_for, $timeout );
}
else{
open( HFILE, "+<$hit_file" )or die( "Cannot open $hit_file: $!\n" );
flock( HFILE, 2 );
}
}
###
sub lock_f{
my( $locked_flag, $waited_for, $timeout ) = @_;
unless( -e( $locked_flag ) ){
open( LFILE, ">$locked_flag")or die( "Cannot create $locked_flag: $!\n" );
close( LFILE );
}
else{
if( $waited_for > $timeout ){
print "Cannot open $hit_file";
return 0;
}
else{
$waited_for++;
sleep 1;
&lock_f( $locked_flag, $waited_for, $timeout );
}
}
}

I like randor's solution because I also think file locking is *very* important, and I feel better when I open and lock the file just once to do the lot.

I agree with mhx in that a regex (his regex) is most flexible (or can easily be made so) for this job.

So here's something that will (hopefully) assure us that only one process at a time can modify the file, whichever OS we're under. Of course it's advisory. To test it, just comment out the `unlink( $locked_flag );' line and run the program so that a lock.txt file is left behind.

The existence of this file prevents any further processes from opening the page.txt file. Any other process, apart from the current one, trying to open the file will hang around ( using sleep() ) and make $timeout attempts to open the file. Almost every process should of course be successful.

If it can't open the file, lock_f() will return false to choose_lock(), and the process will exit.

For matching flexibility, adjust $global to 0 for just the first match, or 1 for a global match (mhx's regex). I've played around on Win32, but if anyone comes up with any surprises for me, just let me know. If someone tests this on UNIX, let me know the results of setting $flock_on to something other than 'no'. The else block in choose_lock() should force the use of Perl's own flock(), which is of course the most desirable method.
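
For the flock() branch on its own, a stripped-down sketch could look like the following. It assumes a platform where flock() is implemented; `strip_between` is a hypothetical name, and the tags are passed in as literal strings rather than hard-coded.

```perl
use strict;
use warnings;
use Fcntl qw(:flock SEEK_SET);

# Hypothetical helper: remove every $start ... $end span from the file at
# $path, holding an exclusive advisory lock for the whole read-modify-write.
sub strip_between {
    my ($path, $start, $end) = @_;

    open(my $fh, '+<', $path) or die "Cannot open $path: $!\n";
    flock($fh, LOCK_EX)       or die "Cannot lock $path: $!\n";

    # Slurp the file; $/ is only undefined inside the do block.
    my $content = do { local $/; <$fh> };
    $content = '' unless defined $content;

    # \Q...\E quotes the tags so they are matched literally.
    $content =~ s/\Q$start\E.*?\Q$end\E//gs;

    seek($fh, 0, SEEK_SET) or die "Cannot rewind $path: $!\n";
    truncate($fh, 0)       or die "Cannot truncate $path: $!\n";
    print $fh $content;
    close $fh              or die "Cannot close $path: $!\n";  # also releases the lock
}
```

A call such as `strip_between('/path/to/page.txt', '<head>', '</head>')` would then do the same job as the global branch above. The file is opened and locked once, and the lock is held until close, which is the property this post argues for.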

Cheers,
Dave.
www.dmscripts.com
davemortimer@bigpond.com




(This post was edited by Mortimer on Jun 4, 2001, 3:52 AM)


patrik
stranger

Jun 4, 2001, 5:53 AM

Post #12 of 18 (3445 views)
Re: Delete part of file ?

Thanks for all the answers, I'm going to try all of them tonight.

Patrik



mhx
Enthusiast / Moderator

Jun 4, 2001, 6:19 AM

Post #13 of 18 (3445 views)
Re: Delete part of file ?

Hi Dave,

I agree with you that file locking is always a good idea for multiple processes having r/w access to files. :) I'm not really sure if patrik asked for it, but anyway, it's good. Since I was searching for the 'perfect' OS-independent file-locking solution just yesterday (and unfortunately didn't find anything on CPAN), I've had a closer look at your solution. I noticed it was the same as the one that first came to my mind (and which I have already implemented for locking files in a two-user cross-platform software development environment). But I personally had a problem with this solution in the context of CGI scripting, and I just want to ask whether you agree with me and what you think of my solution.
First, the problem I see with your (our) solution is a race condition between checking for the existence of the lock file and creating it. If two (or more) processes ever check for the file's existence at the same time (and all see that it doesn't exist), they all open the lock file afterwards and declare the file locked. (The system doesn't prevent multiple processes from opening the same file with write access. At least Win98 doesn't.) I know that this will almost never occur, but after all, it's possible.
I guess my solution is also far from perfect, but I'd like to hear your opinion on it. Before opening the file I wish to lock, I append '.lock' to its name using the rename function. If rename fails, either the file doesn't exist or it is already locked by another process. To unlock the file, I just rename it back. I have a loop similar to yours that retries the rename a few times. Bla, bla, here's the source:

Code
sub ReadVote 
{
my $vote = shift;
my($lock, $cnt) = ("$vote.lock", 0);

sleep(1) until rename( $vote, $lock ) || ++$cnt > 10;

open( VOTE, $lock ) or die "cannot open $lock for reading: $!\n";
my @votes = map { /^\s*(.*?)\s*\[(\d+)\]\s*$/ ?
{text=>$1, count=>$2} : () } <VOTE>;
close VOTE;

if( @_ ) {
$votes[$_]->{count}++ foreach @_;

open( VOTE, ">$lock" ) or die "cannot open $lock for writing: $!\n";
print VOTE "$_->{text} [$_->{count}]\n" foreach @votes;
close VOTE;
}

rename( $lock, $vote ) or die "cannot unlock $vote: $!\n";

return @votes;
}

It's a bit out of context here, but you should see the main thing.
The function takes a filename as first parameter and optionally some more parameters that specify how the file should be modified. If these optional parameters are missing, the file isn't modified. But that's all not so important.
I'd really like to hear your opinion about these two topics!

Thanks in advance,

-- Marcus



patrik
stranger

Jun 4, 2001, 1:39 PM

Post #14 of 18 (3437 views)
Re: Delete part of file ?

Tried rGeoffrey's code, it won't work :(
I use win98/NT4/win98se
Attachment = the perl code



patrik
stranger

Jun 4, 2001, 1:41 PM

Post #15 of 18 (3436 views)
Re: Delete part of file ?

And here's the html doc....



rGeoffrey
User / Moderator

Jun 5, 2001, 11:19 AM

Post #16 of 18 (3422 views)
Re: Delete part of file ?

The problem is in the use of single quotes while calling the function. You said...


Code
$start = '<!--id2start-->'; $end = '<!--id2end-->'; 

&purge_file ('test1.html', '$start', '$end');

But you should have said...


Code
&purge_file ('test1.html', $start, $end);

When you use single quotes you get exactly what you type so the function thought that the starting tag was '$start' not '<!--id2start-->'.
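
The difference is easy to check in isolation. A minimal sketch, using the tag from the code above; `$single` and `$double` are names made up for this illustration:

```perl
use strict;
use warnings;

my $start = '<!--id2start-->';

my $single = '$start';   # no interpolation: the six characters $, s, t, a, r, t
my $double = "$start";   # interpolation: a copy of the contents of $start

print "single quotes give: $single\n";
print "double quotes give: $double\n";
print "length of single:   ", length($single), "\n";
```

When a variable is passed on unchanged, as in the corrected call, no quotes are needed at all.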



Mortimer
journeyman

Jun 6, 2001, 6:51 AM

Post #17 of 18 (3409 views)
Re: Delete part of file ?


Code
#!/usr/bin/perl -w 

use strict;
use Fcntl;
use CGI;
use CGI::Carp qw( fatalsToBrowser ); # fatalsToBrowser is exported by CGI::Carp, not CGI
my $q = CGI->new();
print $q->header;

my $hit_file = '/path/to/page.txt';
my $locked_flag = '/path/to/lock.txt';
my $flock_on = 'no';
my $waited_for = 0;
my $timeout = 5;
my $global = 1;
#----------------------------------------------
if( &choose_lock( $flock_on, $locked_flag, $waited_for, $timeout ) ){
undef $/;
my $content = <HFILE>;
if( $global == 1 ){
$content =~ s/<head>.*?<\/head>//gs;
}
else{
$content =~ s/<head>.*?<\/head>//s;
}
seek( HFILE, 0, 0 );
truncate( HFILE, 0 );
print HFILE $content;
if($flock_on eq 'no'){
sleep 3;
unlink( $locked_flag );
print "Lock file deleted<br>";
close( HFILE );
}
else{
close(HFILE);
}
}
###
sub choose_lock{
my($flock_on,$locked_flag,$waited_for,$timeout) = @_;
if( $flock_on eq 'no' ){
open( HFILE, "+<$hit_file" )or die( "Cannot open $hit_file: $!\n" );
&lock_f( $locked_flag, $waited_for, $timeout );
}
else{
open( HFILE, "+<$hit_file" )or die( "Cannot open $hit_file: $!\n" );
flock( HFILE, 2 );
}
}
###
sub lock_f{
my( $locked_flag, $waited_for, $timeout ) = @_;
if( eval{ sysopen( LFILE, $locked_flag, O_RDWR | O_EXCL | O_CREAT )or die( $! ) } ){
print LFILE "First: $$";
print "Try $waited_for - lock file created<br>";
close( LFILE );
}
else{
if( $waited_for > $timeout ){
print "Cannot open $hit_file<br>";
return 0; # the caller skips the update when the lock could not be taken
}
else{
$waited_for++;
sleep 1;
&lock_f( $locked_flag, $waited_for, $timeout );
}
}
}

#----------------------------------------------
Hello Marcus. I suppose I should have started a new thread for this because you're right about it not being relevant enough to patrik's original question.

I suppose this issue had to come up. I don't think there is a way to make any alternative file lock 100% secure across all platforms.

But I can't accept that as a reason for not doing anything, or for not opening the file just once and doing the whole lot in as short a time as possible. At least our solutions go much further (depending on the number of processes started in a given time period) in securing data in files, but of course we should never abandon our backup systems!

With my solution, the problem isn't that more than one process can create *the* lock file, more that they alternate (irregularly) in creating their own lock files, and deleting the other's. They can't do anything at the same time.

I'm sure you've pictured scenarios such as where Process P1 comes along and sees there is no lock file (at the -e line). There's a pause while it gets ready to create a lock file. During this pause, before P1 can create its lock file, P2 sees the absence of a lock file and also gets ready to create one...etc...and eventually they both do. So we have two (or maybe more) processes running with the rights to change the data file. I don't want to mention P3, and the fact that all the processes may have different workloads. The best solution is to have an OS which supports flock() because apparently this can't happen.

A similar thing would happen at the rename( $vote, $lock ) in your script. If two processes come along at almost the same instant, they'll both be able to see that there is a $vote to rename, and so both will continue. I wouldn't use rename because (I've read) this function behaves too differently across platforms. However, I'm definitely not suggesting that creating a new lock file is a better option. To be honest, I don't know.

Well anyway, I did a lot of searching, and a little reading around the net, and after hours of misery and frustration, I decided to have a go with IPC::ShareLite, a module (available from CPAN) that deals with shared segment space for variables. Before I started I emailed the author Maurice Aubrey to ask if he thought it would be suitable. He said yes, but didn't advise it because the semaphore calls should be used directly. He said to look at sysopen() and the Fcntl module. So I did and here's my revised script using sysopen(). The -e condition is out, and the data file update by a competing process now relies on sysopen() failing atomically when it is passed O_EXCL when the lock file already exists. Does this mean that the cpu time originally available between the -e test and the open is no longer a problem? Dunno!
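
Boiled down, the sysopen() idea might look like the sketch below. `acquire_lock` and `release_lock` are hypothetical names, and the retry loop replaces the recursion in the script above. With O_CREAT and O_EXCL together, the existence check and the creation are a single request to the operating system, so on a local filesystem two processes should not both succeed. Whether that holds on every platform or on network filesystems is not something this sketch can promise.

```perl
use strict;
use warnings;
use Fcntl qw(O_WRONLY O_CREAT O_EXCL);

# Hypothetical helper: try to create $lockfile exclusively, retrying once
# per second up to $timeout times. Returns 1 on success, 0 on timeout.
sub acquire_lock {
    my ($lockfile, $timeout) = @_;
    for my $try (0 .. $timeout) {
        # Fails if the file already exists (or cannot be created at all).
        if (sysopen(my $fh, $lockfile, O_WRONLY | O_CREAT | O_EXCL)) {
            print $fh "$$\n";    # record our process id for debugging
            close $fh;
            return 1;
        }
        sleep 1 if $try < $timeout;
    }
    return 0;
}

# Hypothetical helper: remove the lock file; true if it was deleted.
sub release_lock {
    my ($lockfile) = @_;
    return unlink $lockfile;
}
```

A caller would wrap the update in `if (acquire_lock($locked_flag, $timeout)) { ...; release_lock($locked_flag); }`, so that a process which never got the lock never deletes someone else's lock file.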

Cheers,
Dave.
www.dmscripts.com
davemortimer@bigpond.com




mhx
Enthusiast / Moderator

Jun 6, 2001, 12:22 PM

Post #18 of 18 (3400 views)
Re: Delete part of file ?

Hi Dave,

you got me right, concerning the multiple-process thing. I just wasn't explaining exactly what I meant, but you got absolutely what was in my mind.
Concerning the rename, I was assuming that, since it's a system call, it would be safe to call it from different processes. But I don't know.
Anyway, thanks a lot for the investigation, for the sysopen() and for making me curious about the rename function. I think I'll have to figure out what it really does...

-- Marcus


 
 

