CGI/Perl Guide | Learning Center | Forums | Advertise | Login
Site Search: in

  Main Index MAIN
INDEX
Search Posts SEARCH
POSTS
Who's Online WHO'S
ONLINE
Log in LOG
IN

Home: Perl Programming Help: Beginner:
Script Advice: Is it doing what I think it is?

 



MB123
Novice

Nov 29, 2012, 12:34 PM

Post #1 of 16 (2601 views)
Script Advice: Is it doing what I think it is? Can't Post

Hi all,

I have these two scripts, which I have been using on an input file such as this:


Code
FT   SNP             467 
FT /note="refAllele: C SNPstrains: 4330_2#1_EMRSA15=T 4330_2#5_EMRSA15=T 4340_5#9_EMRSA15=T 4340_6#3_EMRSA15=T 4340_6#4_EMRSA15=T 4350_2#2_EMRSA15=T 4350_2#3_EMRSA15=T 4386_1#4_EMRSA15=T 4386_2#12_EMRSA15=T 4386_2#5_EMRSA15=T "
FT /colour=1
FT SNP 522
FT /note="refAllele: G SNPstrains: 4330_8#2_EMRSA15=A (synonymous) 4340_5#2_EMRSA15=A (synonymous) 4340_6#8_EMRSA15=A (synonymous) 6133_2#2_EMRSA15=A (synonymous) 6133_2#4_EMRSA15=A (synonymous) "
FT /colour=3
FT SNP 523
FT /note="refAllele: G SNPstrains: 4414_6#1_EMRSA15=A (non-synonymous) (AA Glu->Lys) 6133_2#2_EMRSA15=A (non-synonymous) (AA Glu->Lys) "
FT /colour=2
FT SNP 546
FT /note="refAllele: G SNPstrains: 4350_1#3_EMRSA15=A (synonymous) 4386_5#5_EMRSA15=A (synonymous) 6133_1#11_EMRSA15=A (synonymous) ST398_EMRSA15=A (synonymous) "
FT /colour=3

Here is the first script:

Code
use strict; 
use warnings;

open( my $fh, '<', 'EARSS-MGE.txt') or die "Error opening file - $!\n";
open OUT, ">", "output.txt" or die "could not open output.txt $! \n";

my $this_line = "";
my $do_next = 0;
my $data = <DATA>;

while(<$fh>) {
my $last_line = $this_line;
$this_line = $_;

chomp $data;
if ($this_line =~ /\Q$data/) {
print OUT $last_line unless $do_next;
print OUT $this_line;
$do_next = 1;
} else {
print OUT $this_line if $do_next;
$last_line = "";
$do_next = 0;
}
}
close ($fh);

__DATA__
4386_7#8_
4350_7#6_
4414_1#6_
6133_2#2_
4465_5#1_
4465_5#6_
6236_1#3_
4330_8#8_
4386_6#1_
4414_5#9_
4340_5#10_
4340_6#11_
4386_6#8_


I believe that this one matches any one of the items under <DATA> and extracts that line, as well as the two immediately above and below it.

I have this second code:


Code
use warnings;  
use strict;

my %cod;
$cod{1} = "Int";
$cod{2} = "non";
$cod{3} = "syn";
$cod{4} = "stop";

$SIG{'__WARN__'} = sub{die $_[0]};
my $file = "Type 9.txt";
open IN, "<", $file or die "could not open $file $! \n";
open OUT, ">", "output.txt" or die "could not open output.txt $! \n";
print OUT "Coordinate No of Strains AA Change\n";
my $data = <DATA>;
my ($SNP, $count, $change);
while(<IN>){
if (m/^FT\s+SNP\s+(\d+)/) {
$SNP = $1;
}
elsif (m/^FT\s+\/note="(.*)"/) {
my $line = $1;
$count = ($line =~ tr/$data/$data/);
$line =~ m/\((AA \w+->\w+)\)\s*$/;
$change = $1 || "";
}
elsif (m/^FT\s+\/colour=(\d+)/) {
print OUT "$SNP $count $change\n" if $cod{$1} eq "non";
}
}

__DATA__
4386_7#8_
4350_7#6_
4414_1#6_
6133_2#2_
4465_5#1_
4465_5#6_
6236_1#3_
4330_8#8_
4386_6#1_
4414_5#9_
4340_5#10_
4340_6#11_
4386_6#8_

Which I have been using on the output of the first code. This code looks for non-synonymous lines and prints the Coordinates (i.e. 523 from the sample input), the change in AA (i.e. AA Glu->Lys), and counts the number of times any match from <DATA> occurs.

My main question is whether or not the count part of the second code "$count = ($line =~ tr/$data/$data/);" is working as I think - I am having a hard time telling from the output as some counts are ~500 which is difficult to go through by eye and see if any items not in <DATA> are included or not.

Also, does the use of '#' comment out the subsequent numbers in the items under <DATA>?

Any help would be greatly appreciated.

Many thanks


Laurent_R
Veteran / Moderator

Nov 29, 2012, 2:59 PM

Post #2 of 16 (2597 views)
Re: [MB123] Script Advice: Is it doing what I think it is? [In reply to] Can't Post

Hi,

yes, "$count = ($line =~ tr/$data/$data/);" should count the number of times $data is appearing in $line (provided you don't have overlapping repetition of $data).

And '#' comments out whatever in on the right of it in your code, but not in the data you are reading, even if you are reading it from the __DATA__ section of your program.


MB123
Novice

Nov 29, 2012, 3:56 PM

Post #3 of 16 (2593 views)
Re: [Laurent_R] Script Advice: Is it doing what I think it is? [In reply to] Can't Post

Thank you for your reply.

Could you please elucidate what you mean by overlapping repetition of data? As in the same item of $data repeated in the same line?

When I used this script I had some counts around ~500, when in $data n=60. Would this just be due to repetition of the same $data code in one line in the input file?

Many thanks


Laurent_R
Veteran / Moderator

Nov 29, 2012, 11:19 PM

Post #4 of 16 (2583 views)
Re: [MB123] Script Advice: Is it doing what I think it is? [In reply to] Can't Post


In Reply To
Could you please elucidate what you mean by overlapping repetition of data? As in the same item of $data repeated in the same line?


Well if $data contains "foofoo" and your line "foofoofoofoofoo", if will probbly count 2 occurrences though you could consider it comes 4 times. This example is simplistic, but it can get a bit subtle sometimes.

On the other point, if you suspect you don't get the count right, please supply examples of your data, it is difficult to say something without seeing the actual case.


(This post was edited by Laurent_R on Nov 30, 2012, 12:13 AM)


MB123
Novice

Nov 30, 2012, 4:19 AM

Post #5 of 16 (2578 views)
Re: [Laurent_R] Script Advice: Is it doing what I think it is? [In reply to] Can't Post

I have attached an example bit of data, for which my second script returns:


Code
Coordinates       Count      Change 
14733 194 AA Ala->Thr


Using this $data:


Code
4386_7#8_ 
4350_7#6_
4414_1#6_
4465_5#1_
4465_5#6_
6236_1#3_
4330_8#8_
4386_6#1_
4414_5#9_
4340_5#10_
4340_6#11_
4386_6#8_
4430_2#1_
4430_2#11_
4386_6#12_
4414_2#9_
4330_8#6_
4386_2#9_
4386_5#2_
4386_8#8_
4395_3#7_
4414_1#10_
4415_2#10_
4415_3#10_
4430_1#12_
4465_5#3_
6133_3#11_
6133_3#9_
MSSA476_
MW2_
4430_1#3_
4350_2#5_
4386_2#3_
4414_6#7_
4415_2#7_
4386_8#10_
6133_3#10_
6236_1#2_
6236_1#9_
4350_7#4_
4414_5#2_
4430_2#4_
4414_1#8_
4340_5#8_
4414_3#2_
4414_6#4_
4415_2#5_
4430_2#12_
4340_5#6_
4350_2#1_
4350_2#9_
4414_1#12_
6133_2#2_
6133_2#4_
4330_8#2_
4340_5#2_
4350_2#12_
4350_2#8_
4465_5#5_
4340_6#8_


$data n=60, and non-synonymous changes in the example =382, but I don't think that any item in the $data should be represented more than once.

Many thanks
Attachments: Example.txt (9.45 KB)


MB123
Novice

Nov 30, 2012, 4:56 AM

Post #6 of 16 (2577 views)
Re: [MB123] Script Advice: Is it doing what I think it is? [In reply to] Can't Post

I just went through the example I posted by hand and only found 31 true counts made up of 31 different items in $data, that is each was only represented once in the count.


Chris Charley
User

Nov 30, 2012, 7:32 AM

Post #7 of 16 (2566 views)
Re: [MB123] Script Advice: Is it doing what I think it is? [In reply to] Can't Post


Quote
My main question is whether or not the count part of the second code "$count = ($line =~ tr/$data/$data/);" is working as I think


No, it isn't getting the count. If you read the docs on the transliteration operator, (it is down 1 or 2 pages from this link), you will find this remark:


Quote
Because the transliteration table is built at compile time, neither the SEARCHLIST nor the REPLACEMENTLIST are subjected to double quote interpolation. That means that if you want to use variables, you must use an eval():


I haven't had time to look over the program to see what you wish to achieve. $data is the first line of the __DATA__ file. Is that what you need in $data?

Update: Irregardless of whether the variable contains quotes, variable interpolation does not occur for transliteration, (for the reason stated above).


(This post was edited by Chris Charley on Nov 30, 2012, 12:13 PM)


Laurent_R
Veteran / Moderator

Nov 30, 2012, 10:59 AM

Post #8 of 16 (2556 views)
Re: [Chris Charley] Script Advice: Is it doing what I think it is? [In reply to] Can't Post

But it should work if you don't have single ou double quotes, I think, shouldn't it?

I have no time to verify now, but I think it works when you don't have quotes.


Laurent_R
Veteran / Moderator

Nov 30, 2012, 1:38 PM

Post #9 of 16 (2549 views)
Re: [Chris Charley] Script Advice: Is it doing what I think it is? [In reply to] Can't Post

I have tried it, it does not quite work with tr//.

However, using s/// seems to give the expected results:


Code
my $line ="foobarfoofoofoobarbarbazfoo"; 
my $data = "foob";
my $count = ($line =~ s/$data/$data/g);
print $count; # prints 2



MB123
Novice

Dec 1, 2012, 4:31 AM

Post #10 of 16 (2520 views)
Re: [Laurent_R] Script Advice: Is it doing what I think it is? [In reply to] Can't Post

Thank you both for your help.

So I should substitute for s///?

Also I need to read from every line in the __DATA__ file which I thought I was, am I not?

Many thanks.


Laurent_R
Veteran / Moderator

Dec 1, 2012, 5:31 AM

Post #11 of 16 (2519 views)
Re: [MB123] Script Advice: Is it doing what I think it is? [In reply to] Can't Post


In Reply To
Also I need to read from every line in the __DATA__ file which I thought I was, am I not?


Well, when you have:


Code
my $data = <DATA>;  
my ($SNP, $count, $change);
while(<IN>){ # ...


you are really reading just only one line of the __DATA__ section into the $data variable, and then you are reading the lines of the file having the <IN> filehandler. There does not seem to be anything in your original code to read more than the first line of the __DATA__ section.


Chris Charley
User

Dec 1, 2012, 9:03 AM

Post #12 of 16 (2508 views)
Re: [MB123] Script Advice: Is it doing what I think it is? [In reply to] Can't Post

You might try this code to get the count. This tests $line for a match for each item in __DATA__ and gets the count.


Code
 #!/usr/bin/perl  
use strict;
use warnings;
use 5.014;

my %cod;
$cod{1} = "Int";
$cod{2} = "non";
$cod{3} = "syn";
$cod{4} = "stop";

$SIG{'__WARN__'} = sub{die $_[0]};
my $file = "Type 9.txt";
open IN, "<", $file or die "could not open $file $! \n";
open OUT, ">", "output.txt" or die "could not open output.txt $! \n";
print OUT "Coordinate No of Strains AA Change\n";
chomp(my @data = <DATA>);
my ($SNP, $count, $change);
while(<IN>){
if (m/^FT\s+SNP\s+(\d+)/) {
$SNP = $1;
}
elsif (m/^FT\s+\/note="(.*)"/) {
my $line = $1;
$count = 0;

for my $data (@data) {
$count += () = $line =~ /$data/g;
}
$line =~ m/\((AA \w+->\w+)\)\s*$/;
$change = $1 || "";
}
elsif (m/^FT\s+\/colour=(\d+)/) {
print OUT "$SNP $count $change\n" if $cod{$1} eq "non";
}
}

__DATA__
4386_7#8_
4350_7#6_
4414_1#6_
6133_2#2_
4465_5#1_
4465_5#6_
6236_1#3_
4330_8#8_
4386_6#1_
4414_5#9_
4340_5#10_
4340_6#11_
4386_6#8_



(This post was edited by Chris Charley on Dec 2, 2012, 8:41 AM)


MB123
Novice

Dec 2, 2012, 7:40 AM

Post #13 of 16 (2472 views)
Re: [Chris Charley] Script Advice: Is it doing what I think it is? [In reply to] Can't Post

Thank you for your code.

I tried to run it but it returned an error:


Code
Can't declare array preference in "my" at Script.pl line 17, near "$data =" Global symbol "@data" requires explicit package name at Script.pl line 27. Execution aborted due to compilation errors.



Chris Charley
User

Dec 2, 2012, 8:43 AM

Post #14 of 16 (2465 views)
Re: [MB123] Script Advice: Is it doing what I think it is? [In reply to] Can't Post

Whoops!

chomp(my @$data = <DATA>);

That should be:

chomp(my @data = <DATA>);


MB123
Novice

Dec 2, 2012, 1:06 PM

Post #15 of 16 (2457 views)
Re: [Chris Charley] Script Advice: Is it doing what I think it is? [In reply to] Can't Post

Thank you for your help, that has worked perfectly.

Am I right in assuming that my first code is suffering from the same problem of only reading the first line from $data?

Also, is there a way that I can display my output with intergenic data (that is the first two lines of data preceding a line saying /colour=1) and synonymous data (two lines preceding /colour=3) in the same file as the non-synonymous (/colour =2) changes?

I have another script that can display the type of change but not the "AA change", but I am struggling to combine them.

I have been trying to modify this line to do this:

Code
elsif (m/^FT\s+\/colour=(\d+)/) {      
print OUT "$SNP $count $change\n" if $cod{$1} eq "non";


So far I can only get this output:


Code
Coordinate	      No of Strains	    AA Change 
467 0 refAllele: C SNPstrains: 4330_2#1_EMRSA15=T 4330_2#5_EMRSA15=T 4340_5#9_EMRSA15=T 4340_6#3_EMRSA15=T
522 3 refAllele: G SNPstrains: 4330_8#2_EMRSA15=A (synonymous) 4340_5#2_EMRSA15=A (synonymous) 4340_6#8_EMRSA15=A (synonymous)
523 1 AA Glu->Lys


Which I would ideally like to look like this:


Code
Coordinate     No of Strains      Change 
467 0 Int
522 3 syn
523 1 AA Glu->Lys


So I assume I am sort of on the right path?!

Many thanks


(This post was edited by MB123 on Dec 2, 2012, 1:07 PM)


Chris Charley
User

Dec 3, 2012, 11:35 AM

Post #16 of 16 (2411 views)
Re: [MB123] Script Advice: Is it doing what I think it is? [In reply to] Can't Post

I got this output from the t33.pl program using 51_line.txt for input.

Code
 
C:\Old_Data\perlp>perl t33.pl
26 0 Int
49 0 Int
102 0 Int
139 0 Int
177 0 Int
233 0 Int
259 0 Int
314 0 Int
319 0 Int
329 0 Int
375 0 Int
413 0 Int
414 0 Int
433 0 Int
442 0 Int
460 0 Int
703 0 AA Ala->Thr
14733 33 AA Ala->Thr

Attachments: t33.pl (1.42 KB)
  51_line.txt (10.9 KB)

 
 


Search for (options) Powered by Gossamer Forum v.1.2.0

Web Applications & Managed Hosting Powered by Gossamer Threads
Visit our Mailing List Archives