Reading files from directory and subdirectory

 



eyebrowsbutt
Novice

Sep 11, 2012, 12:54 PM

Post #1 of 12 (1672 views)

Hello,
I've been trying to write code that removes the first line of a file and then calculates the percentage of the letters G and C, for files in many subdirectories of a single directory. I know the code that calculates those percentages works on single files, but I cannot get it to work on files in subdirectories. Here is the script that I have so far:


use warnings;
use strict;

list_recursively('all.fna');

exit;

###Opening directory
sub list_recursively {
my $directory = 'all.fna';
unless(opendir(DIRECTORY, $directory)) {
print "Cannot open directory $directory!\n";
exit;
}
my @files = grep (!/^\.\.?$/, readdir(DIRECTORY));

closedir(DIRECTORY);

foreach my $file (@files) {
if (-f "$directory/$file") {
GC ("$directory/$file");
print $directory/$file;
}elsif( -d "$directory/$file") {
list_recursively("$directory/$file");
}
}
######################Opening files
sub GC{
my ($proteinfilename) = @_;
chomp $proteinfilename;
open (PROTEINFILE, $proteinfilename);

my @protein = <PROTEINFILE>;

my $protein1 = shift @protein;
print @protein;
print "This is removed: $protein1\n";

my $protein = join ( '', @protein);
my $total=length($protein);

#Code for GC bases
my $GC = ($protein =~tr/GC//);
print "$GC\n";
#Calculate percent of GC bases
my $GCpercent = 100*($GC/$total);
print "$GCpercent%\n";

close PROTEINFILE;

exit;
}}
############################

When I run this program I get a bunch of scattered letters and numbers.

Thank you!


Zhris
Enthusiast

Sep 11, 2012, 1:17 PM

Post #2 of 12 (1667 views)
Re: [eyebrowsbutt] Reading files from directory and subdirectory

Hi,

I haven't tested your code but I immediately notice a couple of issues.

The first thing the subroutine list_recursively does is assign 'all.fna' to $directory, regardless of the argument passed in. You most likely meant to assign $directory the first argument:

Code
my $directory = 'all.fna';   # before
my $directory = shift;       # after


I'm pretty certain you don't mean to print the result of $directory divided by $file. Wrap the expression in double quotes so the values are interpolated into a string, and add \n for readability:

Code
print $directory/$file;       # before
print "$directory/$file\n";   # after


Finally, not sure if you meant to put an exit at the end of the GC subroutine. There are also 2 closing curly braces afterwards (I am unable to tell if they are both necessary due to the formatting of your code upon post).

That's a start!

Chris


(This post was edited by Zhris on Sep 11, 2012, 1:26 PM)


eyebrowsbutt
Novice

Sep 11, 2012, 1:28 PM

Post #3 of 12 (1660 views)
Re: [Zhris] Reading files from directory and subdirectory

Hey!
I see, but just to clarify: is $directory = shift; the same as $directory = @_;?

Thanks for your help!


Zhris
Enthusiast

Sep 11, 2012, 1:36 PM

Post #4 of 12 (1658 views)
Re: [eyebrowsbutt] Reading files from directory and subdirectory

Hey,


Code
my $directory = shift; 
#OR
my ($directory) = @_;
#OR
my $directory = $_[0];


Also, I just formatted your code by hand and can see that the two closing braces at the end are necessary. I couldn't tell that the GC subroutine was nested inside the list_recursively subroutine (it would probably be best to separate them). Please wrap any code you post here in code tags.
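
To spell out the difference asked about above: assigning @_ to a bare scalar is not the same as shift, because scalar context yields the number of arguments rather than the first one. A minimal sketch (the sub names are made up purely for illustration):

```perl
use strict;
use warnings;

# Hypothetical helpers, named only for this illustration.
sub first_arg_shift  { my $dir = shift;  return $dir; }  # removes and returns the first argument
sub first_arg_list   { my ($dir) = @_;   return $dir; }  # list assignment: first argument
sub first_arg_index  { my $dir = $_[0];  return $dir; }  # direct index: first argument
sub arg_count_scalar { my $dir = @_;     return $dir; }  # scalar context: NUMBER of arguments

print first_arg_shift('all.fna'),  "\n";   # all.fna
print first_arg_list('all.fna'),   "\n";   # all.fna
print first_arg_index('all.fna'),  "\n";   # all.fna
print arg_count_scalar('all.fna'), "\n";   # 1
```

The parentheses around ($directory) are what force list context, so they are not optional.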

Chris


(This post was edited by Zhris on Sep 11, 2012, 1:41 PM)


eyebrowsbutt
Novice

Sep 11, 2012, 1:50 PM

Post #5 of 12 (1652 views)
Re: [Zhris] Reading files from directory and subdirectory

Alright, I have made the changes (including separating the subroutines) but I am still getting bad results. When I run the code I get scattered letters and parts of some of the folder names. I have never managed to write code that accesses files in subdirectories, so I was wondering whether you can tell if this should open those files successfully. I've been trying to piece together code from different tutorials and forum answers.

Thanks!


Zhris
Enthusiast

Sep 11, 2012, 1:56 PM

Post #6 of 12 (1650 views)
Re: [eyebrowsbutt] Reading files from directory and subdirectory

Can you post your updated code, making sure you wrap it in code tags before posting (http://perlguru.com/gforum.cgi?do=markup_help;)?

Chris


eyebrowsbutt
Novice

Sep 11, 2012, 2:03 PM

Post #7 of 12 (1648 views)
Re: [Zhris] Reading files from directory and subdirectory

This is the new code:


Code
use warnings; 
use strict;

list_recursively('all.fna');

exit;

###Opening directory
sub list_recursively {
my ($directory) = @_;

unless(opendir(DIRECTORY, $directory)) {
print "Cannot open directory $directory!\n";
exit;
}
my @files = grep (!/^\.\.?$/, readdir(DIRECTORY));

closedir(DIRECTORY);

foreach my $file (@files) {
if (-f "$directory/$file") {
GC ("$directory/$file");
print "$directory/$file";
}elsif( -d "$directory/$file") {
list_recursively("$directory/$file");
}
}
exit;

}
############################ Opening files
sub GC{
my ($proteinfilename) = @_;
chomp $proteinfilename;
open (PROTEINFILE, $proteinfilename);

my @protein = <PROTEINFILE>;

my $protein1 = shift @protein;
print @protein;
print "This is removed: $protein1\n";

my $protein = join ( '', @protein);
my $total=length($protein);

#Code for GC bases
my $GC = ($protein =~tr/GC//);
print "$GC\n";
#Calculate percent of GC bases
my $GCpercent = 100*($GC/$total);
print "$GCpercent%\n";

close PROTEINFILE;

exit;
}



Zhris
Enthusiast

Sep 11, 2012, 2:12 PM

Post #8 of 12 (1644 views)
Re: [eyebrowsbutt] Reading files from directory and subdirectory

Hey,

Is 'all.fna' the correct starting directory? (Just checking, since it looks like it could be a filename.)

Also, the exit at the end of the list_recursively subroutine needs to be removed. exit ends the whole program, not just the current subroutine call, so the script stops the first time that line is reached, which is probably why you are only seeing "some of the folder names". Removing it may resolve your issues, although I haven't looked into the workings of the GC subroutine (which you mentioned was working fine).

Chris


(This post was edited by Zhris on Sep 11, 2012, 2:14 PM)


eyebrowsbutt
Novice

Sep 11, 2012, 2:23 PM

Post #9 of 12 (1637 views)
Re: [Zhris] Reading files from directory and subdirectory

Okay,
I just removed the "exit;" from the end of the subroutine. I tried changing the directory name to one that I knew was wrong and it gave me the error "Cannot open directory", but with 'all.fna' I did not get that error. Also, the folder 'all.fna' and the Perl program I am using are in the same folder.


Zhris
Enthusiast

Sep 11, 2012, 2:34 PM

Post #10 of 12 (1636 views)
Re: [eyebrowsbutt] Reading files from directory and subdirectory

Hey,

If you use a directory that doesn't exist, your code will exit with the "Cannot open directory" error. If all.fna is a directory that sits in the same directory as your Perl script, then you are fine. If you run with just the exit removed, is the output better than before?

I rewrote your recursive subroutine and simplified the GC subroutine so that it just prints the file path. It might be a good starting point, since it only prints each file it comes across while recursing. Printing too many unformatted strings can become confusing. If it works out, replace the GC subroutine with yours (don't forget to remove the final exit):


Code
#!/usr/bin/perl
use warnings;
use strict;

list_recursively('all.fna');
exit;

sub list_recursively
{
    my ($dir_path) = @_;

    opendir my $dh, $dir_path or die "cannot open dir $dir_path: $!";

    # skip . and .., and prefix each entry with its directory
    foreach my $path ( map { "$dir_path/$_" } grep { !/^\.\.?$/ } readdir $dh )
    {
        if (-f $path)
        {
            GC($path);
        }
        elsif (-d $path)
        {
            list_recursively($path);
        }
    }

    closedir $dh;

    return;
}

sub GC
{
    my ($file_path) = @_;

    print "$file_path\n";

    return;
}
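
For the step of swapping the real calculation back in, here is a minimal sketch of a GC subroutine that would fit the framework above. It rests on assumptions: each file has exactly one header line followed by sequence lines, only uppercase G and C are counted (as in the original tr/GC//), and the name gc_percent is made up. Unlike the original, it strips line endings before counting so they do not inflate the total, and it returns instead of calling exit.

```perl
use strict;
use warnings;
use File::Temp qw(tempfile);

# Hypothetical name. Returns the GC percentage of the sequence in $file_path,
# or nothing if the file cannot be opened or holds no sequence characters.
sub gc_percent {
    my ($file_path) = @_;

    open my $fh, '<', $file_path or do {
        warn "cannot open $file_path: $!\n";
        return;
    };

    my $header = <$fh>;              # read and discard the first line
    my ($gc, $total) = (0, 0);
    while (my $line = <$fh>) {
        $line =~ s/\s+//g;           # drop newlines and other whitespace
        $gc    += ($line =~ tr/GC//);  # tr with an empty replacement only counts
        $total += length $line;
    }
    close $fh;

    return unless $total;            # avoid dividing by zero on empty files
    return 100 * $gc / $total;
}

# Small self-check on a throwaway file: 4 of 8 bases are G or C.
my ($tmp_fh, $tmp_name) = tempfile(UNLINK => 1);
print {$tmp_fh} ">header line\nGGCC\nAATT\n";
close $tmp_fh;
printf "%.1f%%\n", gc_percent($tmp_name);   # 50.0%
```

Inside list_recursively, the call would then look something like my $pct = gc_percent($path); followed by a print if $pct is defined.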


Chris


(This post was edited by Zhris on Sep 11, 2012, 3:19 PM)


eyebrowsbutt
Novice

Sep 11, 2012, 3:40 PM

Post #11 of 12 (1617 views)
Re: [Zhris] Reading files from directory and subdirectory

Okay,
It seems that using your program and adding my GC (with some tweaks) worked. I really appreciate your help! Also, if there is any way I could get feedback on why my previous code did not work, that would be great, so that I really understand the Perl language.
Really appreciate the help! This is pretty sweet! :)


FishMonger
Veteran / Moderator

Sep 11, 2012, 4:51 PM

Post #12 of 12 (1614 views)
Re: [eyebrowsbutt] Reading files from directory and subdirectory

Why reinvent the wheel by writing your own recursive directory tree sub? The only valid reason I can see (in this case) is if this is a requirement of a Perl programming course.

Instead, you should be using either the File::Find or the File::Find::Rule module, both of which handle all of the messy recursion details.


Code
#!/usr/bin/perl

use warnings;
use strict;
use File::Find;

my $base_dir = 'all.fna';

find(\&GC, $base_dir);

sub GC {

    # process only plain (regular) files
    return if not -f;

    # find() has already chdir'd into the file's directory, so open the
    # bare name in $_; $File::Find::name is used only for reporting
    open my $protein_file, '<', $_ or do {
        warn "failed to open $File::Find::name <$!>\n";
        return;
    };

    # process file as needed

    close $protein_file;
}
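
One detail worth knowing about File::Find: by default it changes into each directory before calling your sub, so $_ is the bare file name while $File::Find::name is the path from the starting directory. If you would rather work with full paths throughout, the no_chdir option keeps the working directory fixed. A rough sketch, where the helper name files_under and the temporary directory tree are made up so the example can run anywhere:

```perl
use strict;
use warnings;
use File::Find;
use File::Path qw(make_path);
use File::Temp qw(tempdir);

# Hypothetical helper: return every plain file below $base_dir, sorted.
sub files_under {
    my ($base_dir) = @_;
    my @found;
    find(
        {
            # With no_chdir set, $_ holds the same path as $File::Find::name,
            # so file tests and open() work from the original directory.
            wanted   => sub { push @found, $File::Find::name if -f $_ },
            no_chdir => 1,
        },
        $base_dir,
    );
    return sort @found;
}

# Throwaway directory tree standing in for all.fna.
my $root = tempdir(CLEANUP => 1);
make_path("$root/sub1/sub2");
for my $path ("$root/a.fna", "$root/sub1/sub2/b.fna") {
    open my $fh, '>', $path or die "cannot create $path: $!";
    print {$fh} ">header\nGATTACA\n";
    close $fh;
}

# Prints the two files, including the one two levels down.
print "$_\n" for files_under($root);
```

Collecting the paths first and processing them afterwards also keeps the traversal separate from the GC calculation, which makes each part easier to test on its own.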


File::Find http://search.cpan.org/~rjbs/perl-5.16.1/lib/File/Find.pm

File::Find::Rule http://search.cpan.org/~rclamp/File-Find-Rule-0.33/lib/File/Find/Rule.pm
