Compare unchanged files in two arrays

 



StarkRavingCalm
User

Jan 8, 2013, 10:45 AM

Post #1 of 47 (3942 views)
Compare unchanged files in two arrays Can't Post

Hello,
I am somewhat new to perl and need some assistance.
I am writing a script to:

1 - connect to an ftp server
2 - write results of an 'll' into an array (say...files1)
3 - go to sleep for 2 minutes
4 - write results of a second 'll' into a second array (say.. files2)

All works well, what I need is help comparing the two arrays.
I only want to download files that are unchanged after the 2 minute sleep period.

I just don't know how to express this in Perl:
'For each entry in files2 whose field 5 (file size) and field 9 (filename) match field 5 and field 9 of an entry in files1, write that entry into a new array (say, unchanged_files).'

Additional question...
How do I know that line 17 of files1 is the same file as line 17 of files2?
In other words, what if a file is uploaded in the meantime and throws off the whole list?



Thanks in advance!


FishMonger
Veteran / Moderator

Jan 8, 2013, 11:51 AM

Post #2 of 47 (3940 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post

Use hashes instead of arrays.

Comparing filesize is not the best way to determine if the file has been changed. You should compare the contents. The best approach to that would be to generate an md5 checksum and compare those.

So, your hash keys will be the filenames and the value of each is the md5 checksum/digest.

Digest::MD5
http://search.cpan.org/~gaas/Digest-MD5-2.52/MD5.pm

Another choice would be File::Checksum
http://search.cpan.org/~knorr/File-Checksum-0.01/Checksum.pm
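A minimal sketch of the filename => checksum hash described above, using Digest::MD5 on local files. The scratch directory and the sample file names are hypothetical and exist only so the example runs on its own; for files that live on a remote server you would first have to download them or have some server-side checksum facility, which is an assumption this sketch does not cover.

```perl
use strict;
use warnings;
use Digest::MD5;
use File::Basename qw(basename);
use File::Glob qw(bsd_glob);
use File::Temp qw(tempdir);

# Hypothetical sample data: two small files in a scratch directory.
my $dir = tempdir( CLEANUP => 1 );
for my $name (qw(file1 file2)) {
    open my $out, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
    print {$out} "contents of $name\n";
    close $out;
}

# Build filename => md5 hex digest.
my %checksum;
for my $path ( bsd_glob("$dir/*") ) {
    open my $fh, '<', $path or die "Cannot open $path: $!";
    binmode $fh;    # digest the raw bytes, so binary or encrypted files work too
    $checksum{ basename($path) } = Digest::MD5->new->addfile($fh)->hexdigest;
    close $fh;
}

print "$_ => $checksum{$_}\n" for sort keys %checksum;
```

Two such hashes built at different times can then be compared key by key: a differing digest for the same filename means the content changed in between.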


(This post was edited by FishMonger on Jan 8, 2013, 11:54 AM)


StarkRavingCalm
User

Jan 8, 2013, 11:58 AM

Post #3 of 47 (3935 views)
Re: [FishMonger] Compare unchanged files in two arrays [In reply to] Can't Post

Thanks for the reply.

The main goal is not specifically to get 'unchanged' files ala diff, but avoid the following:

Downloading files that are still being written to/uploaded
Deleting files that were not previously downloaded (e.g., mget *.txt, mdelete *.txt)

Additionally, the majority of these files will be encrypted so reading the contents is not an option.


FishMonger
Veteran / Moderator

Jan 8, 2013, 12:14 PM

Post #4 of 47 (3934 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post


Quote
The main goal is not specifically to get 'unchanged' files ala diff, but avoid the following:

Downloading files that are still being written to/uploaded
Deleting files that were not previously downloaded (e.g., mget *.txt, mdelete *.txt)


That's different from what your opening question said you were needing to accomplish. However, I'll stick with my original suggestion; using hashes (or HoH) would still be the better data structure. The filename would be the hash key and the sub key(s) would be the filesize and/or checksum. If the second hash has more keys, then you know that additional files were uploaded. If the filesize or checksum has changed, then you know that the file was still being written to during the prior scan.
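As a rough illustration of the hash-of-hashes idea, the sketch below classifies files from two scans as new, changed, or stable. The scan data, the file names, and the 'size' sub key are made up for the example; a checksum sub key could be compared the same way.

```perl
use strict;
use warnings;

# Hypothetical scan results: filename => { size => bytes }.
my %scan1 = (
    'a.txt' => { size => 100 },
    'b.txt' => { size => 250 },
);
my %scan2 = (
    'a.txt' => { size => 100 },
    'b.txt' => { size => 300 },    # grew between scans: still being written
    'c.txt' => { size => 50 },     # appeared after the first scan
);

my ( @new, @changed, @stable );
for my $name ( sort keys %scan2 ) {
    if ( !exists $scan1{$name} ) {
        push @new, $name;          # only in the second scan
    }
    elsif ( $scan1{$name}{size} != $scan2{$name}{size} ) {
        push @changed, $name;      # size differs between scans
    }
    else {
        push @stable, $name;       # same size in both scans
    }
}

print "new:     @new\n";
print "changed: @changed\n";
print "stable:  @stable\n";
```

Only the names in @stable would be candidates for download; the others can be picked up on a later pass.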


Quote
Additionally, the majority of these files will be encrypted so reading the contents is not an option.

That's where the md5 digest would be useful. It works on binary files as well as plain text files.


FishMonger
Veteran / Moderator

Jan 8, 2013, 12:51 PM

Post #5 of 47 (3925 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post

Since you haven't provided any code or details about how you've coded your script, it's difficult to provide concrete code changes for your script. However, I'll assume that you're using the Net::FTP module.

Here's a short (untested) code snippet that should point you in the right direction. This generates an md5 checksum by executing a site command which appends the info to a file, which is then downloaded and parsed to build the checksum hash.


Code
foreach my $file ( $ftp->ls ) {
    $ftp->site("md5sum $file >> checksum.txt");
}

$ftp->get('checksum.txt');

my %checksum;
open my $fh, '<', 'checksum.txt' or die "failed to open checksum.txt $!";
while ( <$fh> ) {
    chomp;
    my ($checksum, $filename) = split;
    $checksum{$filename} = $checksum;
}



StarkRavingCalm
User

Jan 9, 2013, 7:08 AM

Post #6 of 47 (3907 views)
Re: [FishMonger] Compare unchanged files in two arrays [In reply to] Can't Post

Thanks I will try that out.

I need to take it back a step...

I have the code writing the results of 'ls' into a hash (changed from array)
What I would like to do is make the filename the key and the filesize the value
I just can't find any way of doing so...


FishMonger
Veteran / Moderator

Jan 9, 2013, 7:13 AM

Post #7 of 47 (3906 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post

If you post your code, we will be able to make suggestions on what modifications it might need.


StarkRavingCalm
User

Jan 9, 2013, 7:43 AM

Post #8 of 47 (3904 views)
Re: [FishMonger] Compare unchanged files in two arrays [In reply to] Can't Post

relevant bits below:

use warnings;
use strict;
use Net::SSH;
use Net::SFTP;
use Data::Dumper;

my $sftp = Net::SFTP->new($host, %args)
or die "Connection failed!\n\n";

my %entries = map { $_->{longname} } $sftp->ls('/tmp');
foreach my $entry (%entries) {
print "$entry \n";
}


FishMonger
Veteran / Moderator

Jan 9, 2013, 9:09 AM

Post #9 of 47 (3895 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post


Code
my %file; 

foreach my $entry ( $sftp->ls('/tmp') ) {
    my $size = (split(' ', $entry->{longname}))[4];
    $file{$entry->{filename}} = $size;
}

print Dumper \%file;



FishMonger
Veteran / Moderator

Jan 9, 2013, 9:27 AM

Post #10 of 47 (3893 views)
Re: [FishMonger] Compare unchanged files in two arrays [In reply to] Can't Post

I should point out that there are several ways to get the file size.

Here's one of the other methods.


Code
my @file = $sftp->ls('/tmp'); 

for my $i (0..$#file) {
print $file[$i]{a}{size}, $/;
}



StarkRavingCalm
User

Jan 9, 2013, 10:34 AM

Post #11 of 47 (3888 views)
Re: [FishMonger] Compare unchanged files in two arrays [In reply to] Can't Post

Thanks, that works great!
Need to play with the formatting a little if possible but that's not important.

So...
If I wanted to compare the two hashes, what is the best way to accomplish this:
'where the values for a key are the same, write the results to a list or array'?
Since I only want the filenames (keys), is an array the best way to represent this in Perl?

(I know you recommended MD5, but I want to try out each option and use each one as a learning opportunity)


UPDATE:

I found this code to compare the two:
Just need to tie it together...


for ( keys %hash1 ) {
unless ( exists $hash2{$_} ) {
print "$_: not found in second hash\n";
next;
}

if ( $hash1{$_} eq $hash2{$_} ) {
print "$_: values are equal\n";
}
else {
print "$_: values are not equal\n";
}
}
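
One way to tie this together is to collect the matching keys into an array instead of printing them. The sketch below assumes %hash1 and %hash2 map filename => size from the two listings; the sample values are invented. Because the lookup is by filename rather than by line position, a file uploaded between the two listings cannot shift the comparison.

```perl
use strict;
use warnings;

# Hypothetical listings: filename => size, taken two minutes apart.
my %hash1 = ( file1 => 4, file2 => 7, file3 => 10 );
my %hash2 = ( file1 => 4, file2 => 9, file3 => 10, file4 => 13 );

# Keep names present in both listings whose size did not change.
# Sizes are numbers, so == is used here rather than eq.
my @unchanged = grep {
    exists $hash2{$_} && $hash1{$_} == $hash2{$_}
} sort keys %hash1;

print "$_\n" for @unchanged;
```

Here file2 is left out because its size changed, and file4 is left out because it was not in the first listing.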


(This post was edited by StarkRavingCalm on Jan 9, 2013, 11:42 AM)


StarkRavingCalm
User

Jan 16, 2013, 1:18 PM

Post #12 of 47 (3806 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post

Ok so I have it working in unison.
My issue right now is hidden files.
Is there a quick way to exclude them during the initial 'ls' or will it need to be separate logic?

my %file;
foreach my $entry ( $sftp->ls('/tmp') ) {
my $size = (split(' ', $entry->{longname}))[4];
$file{$entry->{filename}} = $size;
}

print Dumper \%file;


StarkRavingCalm
User

Jan 23, 2013, 6:48 AM

Post #13 of 47 (3771 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post

quick bump

Still haven't found a good method of ignoring hidden files.
Anyone have any ideas?

Should I do it in the original ls that creates the hash or do it in separate logic?


Laurent_R
Veteran / Moderator

Jan 23, 2013, 8:35 AM

Post #14 of 47 (3763 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post

What do you call hidden files exactly?

Presumably they should not appear when you issue the "ls" command, so why would you want to get rid of them? Or do they appear?

Are hidden files starting with a dot? If so, it should be quite simple to filter them out with a regex or something.


StarkRavingCalm
User

Jan 23, 2013, 8:42 AM

Post #15 of 47 (3759 views)
Re: [Laurent_R] Compare unchanged files in two arrays [In reply to] Can't Post

Correct. Anything starting with a '.'
Ultimately, I'd like to do it in the initial 'ls' string; I just can't find a method that works. (Don't let the 'ls' in the string fool you, it's actually doing an 'll'.)



Code
my %file;  

foreach my $entry ( $sftp->ls('/tmp') ) {
my $size = (split(' ', $entry->{longname}))[4];
$file{$entry->{filename}} = $size;
}

print Dumper \%file;



StarkRavingCalm
User

Jan 24, 2013, 1:00 PM

Post #16 of 47 (3742 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post

I would still like to find a good solution for not displaying hidden files but I have another requirement currently.

After I download the files, I would like to do some local validation.
Basically compare the hash created in previous steps against a hash created from a local 'ls -la'.
I have it working but the sort order is all messed up:
This is from the remote server:

Code
my %file;  

foreach my $entry ( $sftp->ls('/home/ftptest/inbound') )
{
my $size = (split(' ', $entry->{longname}))[4];
$file{$entry->{filename}} = $size;
}

print Dumper \%file;

RESULTS:
$VAR1 = {
'file2' => '7',
'file1' => '4',
'file3' => '10',
'file4' => '13',
'..' => '4096',
'.' => '4096'
};

Here is the ls -la from the local directory:

my %local_files;
%local_files=`ls -ltr /tmp/scripttest/inbound | awk {'print \$9,\$5'}`;
print Dumper \%local_files;

RESULTS:

$VAR1 = {
'file3 10
' => 'file2 7
',
'
' => 'file4 13
',
'file1 4
' => undef
};


I know it's in the split statement on the one from the remote server, I just can't seem to get the combination correct.
Any clues?


(This post was edited by StarkRavingCalm on Jan 29, 2013, 11:10 AM)


StarkRavingCalm
User

Jan 29, 2013, 11:10 AM

Post #17 of 47 (3716 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post

bump


FishMonger
Veteran / Moderator

Jan 29, 2013, 2:24 PM

Post #18 of 47 (3711 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post

First, in order to skip the hidden files, you simply need to add a "next" statement prior to the split.

Code
next if $entry->{filename} =~ /^\./;


This statement of yours is not doing what you think.

Code
%local_files=`ls -ltr /tmp/scripttest/inbound | awk {'print \$9,\$5'}`;


It's combining/concatenating the filename and size into a single string per line, and those strings are then passed back to be used in the hash assignment. The first concatenated string becomes a key and the second concatenated string becomes its value. That process repeats until there are no more strings being returned.
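
The pairing can be reproduced without running ls at all. In this sketch @lines is a hypothetical stand-in for what the backticks return in list context (one string per output line, newline included); the file names and sizes are sample values.

```perl
use strict;
use warnings;

# Hypothetical stand-in for the backtick output in list context.
my @lines = ( "file1 4\n", "file2 7\n", "file3 10\n", "file4 13\n" );

# Assigning the list straight to a hash pairs up whole lines:
# ( "file1 4\n" => "file2 7\n", "file3 10\n" => "file4 13\n" )
my %paired = @lines;

# Splitting each line on whitespace first yields name => size pairs.
my %by_name = map { split ' ', $_ } @lines;

print scalar( keys %paired ),  " keys when whole lines are paired\n";
print scalar( keys %by_name ), " keys when each line is split first\n";
```

With an odd number of lines, the last line would become a key with an undef value, which is the kind of entry visible in the Dumper output earlier in the thread.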

There are multiple ways to build the hash correctly. Here's one approach which uses the system ls command without needlessly using awk.


Code
my %local_files;
foreach (`ls -l`) {
    chomp;
    next if /^total\b/;    # skip the summary line that ls -l prints first
    my ($size, $file) = (split(' ', $_))[4,8];
    $local_files{$file} = $size;
}
print Dumper \%local_files;



StarkRavingCalm
User

Jan 29, 2013, 2:35 PM

Post #19 of 47 (3707 views)
Re: [FishMonger] Compare unchanged files in two arrays [In reply to] Can't Post

Ok, thanks, I will try the ignore hidden files statement.

I actually just got the local ls to work the way I wanted with this:

my %local_files = split ' ', `ls -ltr /tmp/scripttest/inbound | awk {'print\$9,\$5'}`;

hopefully between this and the hidden files, i will just need some minor tweaking

thanks again


FishMonger
Veteran / Moderator

Jan 29, 2013, 2:52 PM

Post #20 of 47 (3705 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post

Instead of running a piped command inside backticks, it would be more efficient to use built-in Perl functions, which would also have the benefit of being platform independent.


Code
foreach my $file (<*>) { $local_files{$file} = (stat($file))[7]; }



FishMonger
Veteran / Moderator

Jan 30, 2013, 7:37 AM

Post #21 of 47 (3695 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post

Quoted from PM

Quote
Thanks for your help on this.

So how would this all come together?
(Just the local part)
Where do I put in the directory I want?


Please post questions like that in the actual thread, not in a PM.


The directory would be put inside the angle brackets <>, which act as a glob (not the line-reading diamond operator) when they contain a pattern.

Code
foreach my $file (</tmp/scripttest/inbound/*>) { $local_files{$file} = (stat($file))[7]; }



FishMonger
Veteran / Moderator

Jan 30, 2013, 7:40 AM

Post #22 of 47 (3694 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post

You can also look at using the glob function instead of the angle-bracket form.

perldoc -f glob
http://perldoc.perl.org/functions/glob.html


StarkRavingCalm
User

Jan 30, 2013, 8:27 AM

Post #23 of 47 (3689 views)
Re: [FishMonger] Compare unchanged files in two arrays [In reply to] Can't Post

Thanks, this is working now with this statement:

foreach my $file (</tmp/scripttest/inbound/*>) { $local_files{$file} = (stat($file))[7]; }

but it is showing the full path:
'/tmp/scripttest/inbound/file3' => '10'

How can I have it show just the filename and size? I am comparing against the hashes I built from the remote server prior to download.


(This post was edited by StarkRavingCalm on Jan 30, 2013, 8:29 AM)


FishMonger
Veteran / Moderator

Jan 30, 2013, 8:50 AM

Post #24 of 47 (3683 views)
Re: [StarkRavingCalm] Compare unchanged files in two arrays [In reply to] Can't Post

We could use one of the other perl methods (such as opendir/readdir) which won't retain the path portion or we could use the basename function to extract the filename.


Code
use File::Basename;  # this would be placed with the other use statements 

foreach my $file (</tmp/scripttest/inbound/*>) { $local_files{basename($file)} = (stat($file))[7]; }
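
For comparison, here is a sketch of the opendir/readdir alternative mentioned above. readdir returns bare names, so no basename call is needed. The scratch directory and sample files are hypothetical stand-ins for /tmp/scripttest/inbound, created only so the example runs on its own.

```perl
use strict;
use warnings;
use File::Temp qw(tempdir);

# Hypothetical scratch directory standing in for the real inbound directory.
my $dir = tempdir( CLEANUP => 1 );
my %sample = ( file1 => 'abcd', '.hidden' => 'x' );
for my $name ( keys %sample ) {
    open my $out, '>', "$dir/$name" or die "Cannot write $dir/$name: $!";
    print {$out} $sample{$name};
    close $out;
}

my %local_files;
opendir my $dh, $dir or die "Cannot open $dir: $!";
while ( defined( my $name = readdir $dh ) ) {
    next if $name =~ /^\./;         # skips '.', '..' and dot files
    next unless -f "$dir/$name";    # plain files only
    $local_files{$name} = -s _;     # size in bytes, reusing the stat from -f
}
closedir $dh;

print "$_ => $local_files{$_}\n" for sort keys %local_files;
```

The same dot-file test covers the hidden-file question from earlier in the thread, so the local and remote hashes end up with comparable keys.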



StarkRavingCalm
User

Jan 30, 2013, 9:07 AM

Post #25 of 47 (3680 views)
Re: [FishMonger] Compare unchanged files in two arrays [In reply to] Can't Post

Boom!

That worked. Have to hunt down a loop in my script but we are ALMOST there.....
