
FishMonger
Veteran
/ Moderator
Jul 9, 2013, 10:05 AM
Post #2 of 13
Re: [cmccabe1] text search in file
Yes, use the split function with a list slice to extract those 5 fields. Then, if needed, call split again on each of those fields to separate the key=value pairs. Here's an example using your data.
#!/usr/bin/perl
use strict;
use warnings;
use Data::Dumper;

# Paragraph mode: each read returns one blank-line-separated record.
$/ = '';

while (my $line = <DATA>) {
    # List slice: keep only the fields at these positions.
    my @fields = (split(/;/, $line))[0,6,17,21,24];
    print Dumper \@fields;
}

__DATA__
RS=199476396;RSPOS=985955;dbSNPBuildID=136;SSR=0;SAO=1;VP=0x050260000a01000002110100;GENEINFO=AGRN:375790;WGT=1;VC=SNV;PM;S3D;NSM;REF;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000001.10:g.985955G>C;CLNSRC=OMIM Allelic Variant;CLNORIGIN=1;CLNSRCID=103320.0001;CLNSIG=5;CLNDSDB=GeneReviews:NCBI:OMIM:Orphanet;CLNDSDBID=NBK1168:C1850792:254300:590;CLNDBN=Myasthenia\x2c limb-girdle\x2c familial;CLNACC=RCV000019902.1

RS=207460006;RSPOS=1199489;dbSNPBuildID=136;SSR=0;SAO=0;VP=0x050060080001000002110100;GENEINFO=UBE2J2:118424;WGT=1;VC=SNV;PM;INT;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000001.10:g.1199489G>A;CLNSRC=.;CLNORIGIN=2;CLNSRCID=.;CLNSIG=1;CLNDSDB=.;CLNDSDBID=.;CLNDBN=.;CLNACC=.

RS=144003672;RSPOS=1245104;dbSNPBuildID=134;SSR=0;SAO=2;VP=0x050268020a01000002100120;GENEINFO=ACAP3:116983|PUSL1:126789;WGT=1;VC=SNV;PM;PMC;S3D;NSM;REF;R5;OTHERKG;LSD;CLNALLE=1;CLNHGVS=NC_000001.10:g.1245104C>A;CLNSRC=.;CLNORIGIN=2;CLNSRCID=.;CLNSIG=1;CLNDSDB=.;CLNDSDBID=.;CLNDBN=.;CLNACC=.

RS=145324009;RSPOS=1469331;dbSNPBuildID=134;SSR=0;SAO=2;VP=0x050268000a01000002100120;GENEINFO=ATAD3A:55210;WGT=1;VC=SNV;PM;PMC;S3D;NSM;REF;OTHERKG;LSD;CLNALLE=1;CLNHGVS=NC_000001.10:g.1469331G>A;CLNSRC=.;CLNORIGIN=2;CLNSRCID=.;CLNSIG=1;CLNDSDB=.;CLNDSDBID=.;CLNDBN=.;CLNACC=.

Outputs:
$VAR1 = [
          'RS=199476396',
          'GENEINFO=AGRN:375790',
          'CLNHGVS=NC_000001.10:g.985955G>C',
          'CLNSIG=5',
          'CLNDBN=Myasthenia\\x2c limb-girdle\\x2c familial'
        ];
$VAR1 = [
          'RS=207460006',
          'GENEINFO=UBE2J2:118424',
          'CLNORIGIN=2',
          'CLNDSDBID=.',
          undef
        ];
$VAR1 = [
          'RS=144003672',
          'GENEINFO=ACAP3:116983|PUSL1:126789',
          'CLNALLE=1',
          'CLNSRCID=.',
          'CLNDSDBID=.'
        ];
$VAR1 = [
          'RS=145324009',
          'GENEINFO=ATAD3A:55210',
          'CLNHGVS=NC_000001.10:g.1469331G>A',
          'CLNSIG=1',
          'CLNDBN=.'
        ];
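
Note what the output shows: the records don't all carry the same number of flag fields (PM, S3D, INT, and so on), so positions 0, 6, 17, 21, 24 land on different keys from one record to the next. A minimal sketch of the second step, splitting each field on '=' and looking values up by key instead of by position, might look like the following. The key list in @wanted and the helper name parse_record are assumptions made for illustration, so adjust them to the fields you actually need.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Keys to report. Assumption: these are the five wanted fields, taken
# from what the first record yields at indices 0, 6, 17, 21 and 24.
my @wanted = qw(RS GENEINFO CLNHGVS CLNSIG CLNDBN);

# Hypothetical helper: turn one semicolon-delimited record into a hash ref.
# Flag fields with no '=' (PM, S3D, ...) are stored with a value of 1.
sub parse_record {
    my ($record) = @_;
    $record =~ s/\s+\z//;    # drop trailing newline(s)
    my %info;
    for my $field (split /;/, $record) {
        # Limit of 2 keeps any '=' inside the value intact.
        my ($key, $value) = split /=/, $field, 2;
        $info{$key} = defined $value ? $value : 1;
    }
    return \%info;
}

# Second record from the sample data, where the positional slice came up short.
my $sample = 'RS=207460006;RSPOS=1199489;dbSNPBuildID=136;SSR=0;SAO=0;VP=0x050060080001000002110100;GENEINFO=UBE2J2:118424;WGT=1;VC=SNV;PM;INT;OTHERKG;LSD;OM;CLNALLE=1;CLNHGVS=NC_000001.10:g.1199489G>A;CLNSRC=.;CLNORIGIN=2;CLNSRCID=.;CLNSIG=1;CLNDSDB=.;CLNDSDBID=.;CLNDBN=.;CLNACC=.';

my $info = parse_record($sample);

# Print the wanted keys tab-separated; a missing key prints as an empty value.
print join("\t", map { "$_=" . ($info->{$_} // '') } @wanted), "\n";
```

For that record this prints the RS, GENEINFO, CLNHGVS, CLNSIG and CLNDBN pairs regardless of how many flag fields come before them. The `//` operator needs Perl 5.10 or later.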