
histrung
Novice
Jan 24, 2012, 6:48 PM
Post #10 of 13
Re: [jtra00] How to extract protein names from sequence file
Is this what you want? I used the input you showed in the first post.
```perl
#!/usr/bin/perl
use strict;
use warnings;

my $infile  = 'hdec1.csv';
my $outfile = 'listdec.txt';

# Fail loudly if either file can't be opened
open(my $in,  '<', $infile)  or die "Cannot open $infile: $!";
open(my $out, '>', $outfile) or die "Cannot open $outfile: $!";

# Read line by line and keep only the "rev_sp|ACCESSION|" part of each header
while (my $line = <$in>) {
    chomp $line;
    print {$out} "$1\n" if $line =~ />(rev_sp\|.*?\|)/;
}

close($in);
close($out) or die "Cannot close $outfile: $!";
```

Output:

```
$ cat listdec.txt
rev_sp|P31946|
rev_sp|P31946-2|
rev_sp|P62258|
rev_sp|Q04917|
rev_sp|P61981|
```

Or with just grep and sed:

```
$ grep '^>' hdec1.csv | sed -e 's/\(.\)\(.*|\)\(.*\)/\2/'
rev_sp|P31946|
rev_sp|P31946-2|
rev_sp|P62258|
rev_sp|Q04917|
rev_sp|P61981|
```
(This post was edited by histrung on Jan 24, 2012, 7:11 PM)