
MB123
Novice
Nov 29, 2012, 12:34 PM
Post #1 of 16
Script Advice: Is it doing what I think it is?
Hi all, I have these two scripts, which I have been using on an input file such as this:

FT SNP 467
FT /note="refAllele: C SNPstrains: 4330_2#1_EMRSA15=T 4330_2#5_EMRSA15=T 4340_5#9_EMRSA15=T 4340_6#3_EMRSA15=T 4340_6#4_EMRSA15=T 4350_2#2_EMRSA15=T 4350_2#3_EMRSA15=T 4386_1#4_EMRSA15=T 4386_2#12_EMRSA15=T 4386_2#5_EMRSA15=T "
FT /colour=1
FT SNP 522
FT /note="refAllele: G SNPstrains: 4330_8#2_EMRSA15=A (synonymous) 4340_5#2_EMRSA15=A (synonymous) 4340_6#8_EMRSA15=A (synonymous) 6133_2#2_EMRSA15=A (synonymous) 6133_2#4_EMRSA15=A (synonymous) "
FT /colour=3
FT SNP 523
FT /note="refAllele: G SNPstrains: 4414_6#1_EMRSA15=A (non-synonymous) (AA Glu->Lys) 6133_2#2_EMRSA15=A (non-synonymous) (AA Glu->Lys) "
FT /colour=2
FT SNP 546
FT /note="refAllele: G SNPstrains: 4350_1#3_EMRSA15=A (synonymous) 4386_5#5_EMRSA15=A (synonymous) 6133_1#11_EMRSA15=A (synonymous) ST398_EMRSA15=A (synonymous) "
FT /colour=3

Here is the first script:

use strict;
use warnings;

open( my $fh, '<', 'EARSS-MGE.txt') or die "Error opening file - $!\n";
open OUT, ">", "output.txt" or die "could not open output.txt $! \n";

my $this_line = "";
my $do_next = 0;
my $data = <DATA>;

while(<$fh>) {
    my $last_line = $this_line;
    $this_line = $_;
    chomp $data;
    if ($this_line =~ /\Q$data/) {
        print OUT $last_line unless $do_next;
        print OUT $this_line;
        $do_next = 1;
    } else {
        print OUT $this_line if $do_next;
        $last_line = "";
        $do_next = 0;
    }
}
close ($fh);

__DATA__
4386_7#8_
4350_7#6_
4414_1#6_
6133_2#2_
4465_5#1_
4465_5#6_
6236_1#3_
4330_8#8_
4386_6#1_
4414_5#9_
4340_5#10_
4340_6#11_
4386_6#8_

I believe that this one matches any one of the items under <DATA> and extracts that line, as well as the two immediately above and below it.

I have this second code:

use warnings;
use strict;

my %cod;
$cod{1} = "Int";
$cod{2} = "non";
$cod{3} = "syn";
$cod{4} = "stop";

$SIG{'__WARN__'} = sub{die $_[0]};

my $file = "Type 9.txt";
open IN, "<", $file or die "could not open $file $! \n";
open OUT, ">", "output.txt" or die "could not open output.txt $! \n";
print OUT "Coordinate No of Strains AA Change\n";

my $data = <DATA>;
my ($SNP, $count, $change);

while(<IN>){
    if (m/^FT\s+SNP\s+(\d+)/) {
        $SNP = $1;
    } elsif (m/^FT\s+\/note="(.*)"/) {
        my $line = $1;
        $count = ($line =~ tr/$data/$data/);
        $line =~ m/\((AA \w+->\w+)\)\s*$/;
        $change = $1 || "";
    } elsif (m/^FT\s+\/colour=(\d+)/) {
        print OUT "$SNP $count $change\n" if $cod{$1} eq "non";
    }
}

__DATA__
4386_7#8_
4350_7#6_
4414_1#6_
6133_2#2_
4465_5#1_
4465_5#6_
6236_1#3_
4330_8#8_
4386_6#1_
4414_5#9_
4340_5#10_
4340_6#11_
4386_6#8_

I have been using this on the output of the first script. It looks for non-synonymous lines and prints the coordinate (e.g. 523 from the sample input), the change in AA (e.g. AA Glu->Lys), and a count of how many times any item from <DATA> occurs.

My main question is whether the count part of the second script, "$count = ($line =~ tr/$data/$data/);", is working as I think. I am having a hard time telling from the output, as some counts are ~500, which makes it difficult to go through by eye and see whether any items not in <DATA> are included.

Also, does the use of '#' comment out the subsequent numbers in the items under <DATA>?

Any help would be greatly appreciated.

Many thanks
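
One way to check the count without reading the output by eye is a small stand-alone test. The sketch below is not part of either script above: the shortened $line and the two-item @ids list are made-up test data, and the regex alternation is only one possible way to count identifiers. It relies on two documented Perl behaviours: tr/// does not interpolate variables, so tr/$data/$data/ counts the literal characters '$', 'd', 'a' and 't'; and text after __DATA__ is plain data, so '#' is not a comment there. Note also that "my $data = <DATA>;" reads only the first line of the list, whereas "my @ids = <DATA>; chomp @ids;" would read all of it.

```perl
use strict;
use warnings;

# Hypothetical test data: a shortened /note string from the sample input
# and a two-item ID list (the real list would be read from __DATA__).
my $line = 'refAllele: G SNPstrains: 4414_6#1_EMRSA15=A (non-synonymous) '
         . '(AA Glu->Lys) 6133_2#2_EMRSA15=A (non-synonymous) (AA Glu->Lys) ';
my @ids = ('6133_2#2_', '4386_7#8_');

# tr/// never interpolates variables, so this counts the literal
# characters '$', 'd', 'a' and 't' in $line, not the identifier.
# $data is declared only to mirror the original script.
my $data = $ids[0];
my $tr_count = ($line =~ tr/$data/$data/);

# Build one alternation from the IDs; quotemeta escapes '#' and any
# other regex metacharacters so each ID is matched literally.
my $pattern  = join '|', map { quotemeta($_) } @ids;
my $id_count = () = $line =~ /$pattern/g;

print "tr count: $tr_count\n";
print "ID count: $id_count\n";
```

With this test data only 6133_2#2_ is present in $line, so the ID count is 1, while the tr count is the number of '$', 'd', 'a' and 't' characters in the string. This sketch assumes an ID never appears as the tail of a longer ID; if that can happen, the pattern would need an extra boundary check.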