CGI/Perl Guide | Learning Center | Forums | Advertise | Login
Site Search: in

  Main Index MAIN
INDEX
Search Posts SEARCH
POSTS
Who's Online WHO'S
ONLINE
Log in LOG
IN

Home: Perl Programming Help: Beginner:
Trying to create subroutines

 



Mr Keystrokes
Novice

May 24, 2017, 4:39 AM

Post #1 of 4 (1096 views)
Trying to create subroutines Can't Post

Hello there,

I have written a script which allows the user to pick out sequences from a file which are within a specific length range. I am now trying to apply subroutines to this script, but I am getting a bit confused with declaration and invocation as well as how to pass values from one subroutine into the invocation of another subroutine.

My initial script is as follows:


Code
 
open F, "human_hg19_circRNAs_putative_spliced_sequence.fa", or die $!;

my %seq = ();
my $id = '';

while (<F>){
chomp;
if ($_ =~ /^>(.+)/){
$id = $1;
}else{
$seq{$id} .= $_;
}
}



print "Enter Max sequence length: \n";
my $maxlength = <STDIN>;
chomp $maxlength;

print "Enter Min sequence length: \n";
my $minlength = <STDIN>;
chomp $minlength;

my @seqarray;

foreach $id (keys %seq){
if ((length$seq{$id} <= $maxlength) && (length$seq{$id} >= $minlength)){
push @seqarray, $id;
}
}

for $id (@seqarray){
if (-f $id){print $id, " already exists. It is about to be overwritten"};
open new_F, '>>', "SeqLength_$minlength-$maxlength", or die$!;
print new_F ($id."\n".$seq{$id}."\n");
close new_F;
}

close F;




Below is my amateur attempt at creating subroutines:



Code
##Invoking subroutines 

my ($this_id, %that_seq) = HashSequences();
SpecifySeqLengths();


## Open file. Hash sequences. Make the sequence IDs the keys to their
## respective (hashed) sequences.


sub HashSequences{

open F, "human_hg19_circRNAs_putative_spliced_sequence.fa", or die $!;

my $id = '';
my %seq = ();


while (<F>){
chomp;
if ($_ =~ /^>(.+)/){
$id = $1;
}else{
$seq{$id} .= $_;
}
}
close F;
return ($id, %seq);
}

## Request sequence length desired. Sieve sequences of given length into
## arrays. Create file containing desired sequences.

sub SpecifySeqLengths{

print "Enter Max sequence length: \n";
my $maxlength = <STDIN>;
chomp $maxlength;

print "Enter Min sequence length: \n";
my $minlength = <STDIN>;
chomp $minlength;

my @seqarray;

foreach $id (keys %seq){
if ((length$seq{$id} <= $maxlength) && (length$seq{$id} >= $minlength)){
push @seqarray, $id;
}
}

for $id (@seqarray){
if (-f $id){print $id, " already exists. It is about to be overwritten"};
open new_F, '>>', "SeqLength_$minlength-$maxlength", or die$!;
print new_F ($id."\n".$seq{$id}."\n");
close new_F;
}

}

I'd really appreciate an explanation conceptually of what I need to do.

Thank you.


BillKSmith
Veteran

May 24, 2017, 6:52 AM

Post #2 of 4 (1092 views)
Re: [Mr Keystrokes] Trying to create subroutines [In reply to] Can't Post

You should always put the following two pragmas at the top of your script.

Code
use strict; 
use warnings;


In this case, it will give you a very large number of errors regarding the declaration of two variables (%seq and $ID).

We can get rid of the %seq errors by changing %this_seq to %seq. (This may not be real solution, but at least it gives us a chance to work on the remaining errors.

All the remaining errors refer to a few lines. When you examine this area, you should discover that you did forget the 'my' on two 'foreach' loops.


Code
foreach my $id (keys %seq){  
...
}
for my $id (@seqarray){
...
}


Your code will now compile without errors, and may actually 'work'.

Next, notice that the variable $this_id is not used anywhere. Remove it. (You also must remove the $id from the return).

You probably are wondering how the data is passed. The subroutine HashSequences make its own copy of %seq. The return copies it back to the main program. The scope of %seq in the main program extends to the end of the file. The subroutine SpecifySeqLengths does not declare its own copy of %seq. The copy in main is still in scope, so the subroutine uses it.

This method of passing data is usually considered poor practice. It really is not to bad for only one variable in such a small program.

After you get this much working, I will suggest a number of improvements. Whether it works or not, please post a copy of your corrected code and a small sample of data so we can run it.
Good Luck,
Bill


Mr Keystrokes
Novice

May 25, 2017, 4:56 AM

Post #3 of 4 (1071 views)
Re: [BillKSmith] Trying to create subroutines [In reply to] Can't Post

Hello Bill,

Yes, the script is working now thanks to your suggestions.
I appreciate the explanation of how the data is passed from one sub to another.
Here is the code:



Code
#!/usr/bin/perl  
use strict;
use warnings;

##Invoking subroutines


my %seq = HashSequences();
SpecifySeqLengths();


## Open file. Hash sequences. Make the sequence IDs the keys to their
## respective (hashed) sequences.


sub HashSequences{

open F, "human_hg19_circRNAs_putative_spliced_sequence.fa", or die $!;

my $id = '';
my %seq = ();


while (<F>){
chomp;
if ($_ =~ /^>(.+)/){
$id = $1;
}else{
$seq{$id} .= $_;
}
}
close F;
return %seq;
}

## Request sequence length desired. Sieve sequences of given length into
## arrays. Create file containing desired sequences.

sub SpecifySeqLengths{

print "Enter Max sequence length: \n";
my $maxlength = <STDIN>;
chomp $maxlength;

print "Enter Min sequence length: \n";
my $minlength = <STDIN>;
chomp $minlength;

my @seqarray;

foreach my $id (keys %seq){
if ((length$seq{$id} <= $maxlength) && (length$seq{$id} >= $minlength)){
push @seqarray, $id;
}
}

for my $id (@seqarray){
if (-f $id){print $id, " already exists. It is about to be overwritten"};
open new_F, '>>', "SeqLength_$minlength-$maxlength", or die$!;
print new_F ($id."\n".$seq{$id}."\n");
close new_F;
}

}


A search for sequences containing 101 bases produced the following:

hsa_circ_0000148|chr1:165878068-165878169+|NM_012474|UCK2
GGCTTCGGTAACTGATGCCGGCAGCAGCGCCTCATGGAGTGGCAGGGGAGCTCTGTCTGTGGAGGGGTAGTAGTTTCCGCTTCCTGGAGGTTTGCAGTGAG
hsa_circ_0000995|chr2:44419687-44419788-|NM_001033556|PPM1B
GTGATAGATATGTTCATTAGCTTGATTCAATCATTTTACACTGCATACATATGTTATAACATCATTTTATACCTAATACATATGTAAGATTATAATGTCAA
hsa_circ_0001842|chr9:33053120-33053221-|NM_018225|SMU1
ATATTACCGTCAACAGTGTGATTCTACTTCCTAAAAACCCTGAGCACTTTGTGGTGTGCAACAGATCAAACACGGTGGTCATCATGAACATGCAGGGGCAG
hsa_circ_0000878|chr19:4589738-4589839-|None|None
GTGTCGGAGGTTTTGTAGTGGACGCTGAGCAGGCCCCCTGGTGGGGAGGGAAGGGTGTGATCTGAGCTCGACACCCTTGATGGCCTCTCTGCACACTGTCA





What were the other suggestions of improvement that I could make to my script.
I do plan on adding more functions that allow me to manipulate large files of fasta sequences.

Thanks very much sir.


BillKSmith
Veteran

May 25, 2017, 6:33 AM

Post #4 of 4 (1067 views)
Re: [Mr Keystrokes] Trying to create subroutines [In reply to] Can't Post

The subroutine HashSequences would be much simpler if you could read your data sequence at a time rather than line at a time. This is probably possible, but I cannot tell without seeing a sample of your input data. (That is one of the reasons I asked for it.)

Allowing the subroutine SpecifySeqLengths to use the copy of %seq in the main program rarely causes any problem, but it can make errors much harder to find. The best practice is to always pass a reference to a data structure such as an array or hash.

Code
my %seq; 
...
SpecifySeqLengths(\%seq);
...
sub SpecifySeqLenghts{
my %seq = %{$_[0]};
...
}


Everything else remains the same. Now, the subroutine has its own copy of the hash. It cannot make changes to the copy in the main program.


You are using a very old style of open. New software should always use the three argument form where the second argument specifies whether the file is for input or output and whether any special processing is required. The third argument is just the file name.

You should always use "lexical filehandles". If you need to copy them or pass them to subroutine, they are just like any other scalar.

Code
#open my $F, '<', "human_hg19_circRNAs_putative_spliced_sequence.fa" 
or die $!;
...
while (<$F>){
...
}
close $F;


You should always consider using a prompt module (my favorite is IO::Prompt::Hooked for user input. In this case, you may consider it overkill, but that should be a conscience decision, not just a habit.

UPDATE:
I forgot to mention that it is not necessary open and close the output file for every write.

Code
	open my $new_F, '>>', "SeqLength_$minlength-$maxlength", or die$!;  
for my $id (@seqarray){
if (-f $id){warn "$id already exists. It is about to be overwritten"};
print $new_F ($id."\n".$seq{$id}."\n");
}
close $new_F;

Your file test does not make any sense. I do not understand what you intend.

The error message should be sent to STDERR with warn.
Good Luck,
Bill

(This post was edited by BillKSmith on May 25, 2017, 7:26 AM)

 
 


Search for (options) Powered by Gossamer Forum v.1.2.0

Web Applications & Managed Hosting Powered by Gossamer Threads
Visit our Mailing List Archives