
FishMonger
Veteran
/ Moderator
Nov 13, 2012, 7:10 PM
Post #22 of 31
Re: [MB123] File parsing problem, Use of uninitialised value error.
I've been following this thread but haven't contributed until now. So far, the helpers have indirectly stated that you first need to correct the issues with the input before you can parse the data correctly. While that might be the ideal approach, it is often not possible. Instead, I suggest adding error checking that accommodates the problems. Here's my test script. It has the minimal level of error checking that I think it needs; in a production-level script, the error checking and handling would be expanded. In this example the input data is inside the script, but I've also included commented-out lines that read the input data from an external file.
#!/usr/bin/perl

use strict;
use warnings;
use Carp;

my %cod = (
    1 => "Int",
    2 => "non",
    3 => "syn",
    4 => "stop",
);

my $input_file  = 'BSAC.txt';
my $output_file = 'output.txt';

#open my $input_fh, '<', $input_file or croak "could not open '$input_file' <$!>\n";
open my $output_fh, '>', $output_file or croak "could not open '$output_file' <$!>\n";

printf {$output_fh} ("%-12s %-15s %-10s\n", 'Coordinate', 'No of Strains', 'AA Change');
print {$output_fh} '-' x 38, "\n";

RECORD:
#while (my $line = <$input_fh>) {
while (my $line = <DATA>) {
    chomp $line;

    if ( $line =~ /^FT \s+ SNP \s+ (\d+)/x) {
        my $snp = $1;

        #my $note = <$input_fh>;
        my $note = <DATA>;
        if ( $note =~ /^FT \s+ \/note = "(.+)"/x ) {
            $note = $1;
        }
        else {
            carp qq(format error parsing "note" at or near line $. - skipping this record\n);
            next RECORD;
        }
        my $count = ($note =~ tr/=/=/) || 0;
        my ($change) = $note =~ /\(AA ([^)]+)\) \s+$/x ? $1 : '';

        #my $colour = <$input_fh>;
        my $colour = <DATA>;
        $colour or do {
            carp qq(format error parsing "colour" at or near line $. - skipping this record\n);
            next RECORD;
        };

        if ( $colour =~ /^FT \s+ \/colour = (\d+)/x ) {
            $colour = $1;
        }
        else {
            carp qq(format error parsing "colour" at or near line $. - skipping this record\n);
            next RECORD;
        }

        if ($cod{$colour} eq 'non') {
            printf {$output_fh} ("%-12s %-14d %-10s\n", $snp, $count, $change);
        }
    }
}

#close $input_fh;
close $output_fh;

__DATA__

FT SNP 27534
FT /note="refAllele: T SNPstrains: 7564_8#80=C (non-synonymous) (AA Leu->Ser) "
FT /colour=2
FT SNP 27682
FT /note="refAllele: T SNPstrains: 7414_8#37=C (synonymous) "
FT /colour=3
FT SNP 27710
FT /note="refAllele: G SNPstrains: 7083_1#32=T (non-synonymous) (AA Val->Phe) 7521_5#41=T (non-synonymous) (AA Val->Phe) "
FT /colour=2
FT SNP 27771
FT /note="refAllele: A SNPstrains: 7480_8#28=G (non-synonymous) (AA His->Arg) "
FT /colour=2
FT SNP 28047
FT /note="refAllele: A SNPstrains: 7480_7#86=T (non-synonymous) (AA Lys->Ile) "
FT /colour=2
FT SNP 28490
FT /note="refAllele: G SNPstrains: 7083_1#4=T (non-synonymous) (AA Gly->Cys) 7554_6#38=T (non-synonymous) (AA Gly->Cys) "

FT SNP 28492
FT /note="refAllele: C SNPstrains: 7414_7#66=A (synonymous) 7414_8#44=A (synonymous) 7521_6#54=A (synonymous) "
FT /colour=3
FT SNP 28548
FT /note="refAllele: C SNPstrains: 7414_8#65=T (non-synonymous) (AA Ser->Leu) "
FT /colour=2
FT SNP 28787
FT /note="refAllele: G SNPstrains: 7414_7#14=A (non-synonymous) (AA Asp->Asn) "
FT /colour=2
FT SNP 28840
FT /note="refAllele: C SNPstrains: 7414_8#51=T (synonymous) 7414_8#71=T (synonymous) "
FT /colour=3
FT SNP 28941
FT /note="refAllele: A SNPstrains: 7083_1#1=G (non-synonymous) (AA Gln->Arg) "
FT /colour=2
FT SNP 29080
FT /note="refAllele: A SNPstrains: 7414_7#49=G (synonymous) 7521_6#39=G (synonymous) 7564_8#91=G (synonymous) 7712_8#14=G (synonymous) "
FT /colour=3
FT SNP 29214
FT /note="refAllele: T SNPstrains: 7554_6#36=C (non-synonymous) (AA Val->Ala) "
FT /colour=2
FT SNP 29574
FT /note="refAllele: C SNPstrains: 7065_8#73=T (non-synonymous) (AA Pro->Leu) "
FT /colour=2
FT SNP 29610
FT /note="refAllele: C SNPstrains: 7480_8#12=T "
FT /colour=1
FT SNP 29658
FT /note="refAllele: T SNPstrains: 7564_8#79=A "

Based on the sample input data, this is the content of output.txt.
Coordinate   No of Strains   AA Change
--------------------------------------
27534        1              Leu->Ser
27710        2              Val->Phe
27771        1              His->Arg
28047        1              Lys->Ile
28548        1              Ser->Leu
28787        1              Asp->Asn
28941        1              Gln->Arg
29214        1              Val->Ala
29574        1              Pro->Leu

And here are the "error" messages sent to stderr, which I might direct to an "error" file for later review.
format error parsing "colour" at or near line 19 - skipping this record
 at D:\test\Perl-1.pl line 54, <DATA> line 19.
format error parsing "colour" at or near line 48 - skipping this record
 at D:\test\Perl-1.pl line 46, <DATA> line 48.
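
If you want those messages in a file instead of on the console, one possible approach is a __WARN__ handler. This is only a minimal sketch, not part of the test script above: the file name parse_errors.log is a made-up example, and the single carp call stands in for the messages the parser would emit.

```perl
#!/usr/bin/perl

use strict;
use warnings;
use Carp;

# Hypothetical log file name, chosen only for this example.
my $error_file = 'parse_errors.log';

open my $error_fh, '>', $error_file
    or croak "could not open '$error_file' <$!>\n";

# carp() reports through warn(), so a __WARN__ handler receives each
# message (and any other warning Perl issues) instead of the console.
local $SIG{__WARN__} = sub { print {$error_fh} @_ };

# Stand-in for a message raised while parsing a bad record.
carp qq(format error parsing "colour" at or near line 19 - skipping this record\n);

close $error_fh;
```

Because the handler catches every warning, anything raised by "use warnings" while the handler is active would land in the same file. Redirecting stderr when you run the script (for example, 2> parse_errors.log on the command line) would be another option that needs no code changes.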