histrung
Novice

Jan 24, 2012, 6:48 PM


Re: [jtra00] How to extract protein names from sequence file

Is this what you want? I used the input you showed in the first post.

Code

#!/usr/bin/perl
use strict;
use warnings;

my $infile  = 'hdec1.csv';
my $outfile = 'listdec.txt';

# Use lexical filehandles and stop with a clear message if a file can't be opened
open(my $in, '<', $infile) or die "Cannot open $infile: $!";
my @outlines;
while (my $line = <$in>) {
    chomp($line);
    # Keep only the "rev_sp|ACCESSION|" part of each header line
    push(@outlines, $1) if $line =~ />(rev_sp\|.*?\|)/;
}
close($in);

my $protein = join("\n", @outlines);
open(my $out, '>', $outfile) or die "Cannot open $outfile: $!";
print $out $protein . "\n";
close($out);

Output
cat listdec.txt
rev_sp|P31946|
rev_sp|P31946-2|
rev_sp|P62258|
rev_sp|Q04917|
rev_sp|P61981|
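A variation on the same idea, as a minimal sketch: the matching moves into a small sub (`extract_ids` is a hypothetical name, not something from the original script) and the input files come from the command line instead of the hard-coded `hdec1.csv`. It assumes the header lines look like the ones in your first post, with the ID sitting between `rev_sp|` and the next `|`.

```perl
#!/usr/bin/perl
use strict;
use warnings;

# Hypothetical helper: return the "rev_sp|ACCESSION|" prefix of every
# line passed in; lines without a matching header are skipped.
sub extract_ids {
    my @ids;
    for my $line (@_) {
        push @ids, $1 if $line =~ />(rev_sp\|.*?\|)/;
    }
    return @ids;
}

# Only read files when names are given on the command line, e.g.
#   perl extract_ids.pl hdec1.csv > listdec.txt
if (@ARGV) {
    my @ids;
    while (my $line = <>) {    # <> reads each file named in @ARGV in turn
        push @ids, extract_ids($line);
    }
    print "$_\n" for @ids;     # one ID per line, like listdec.txt above
}
```

Writing to STDOUT and letting the shell redirect into `listdec.txt` keeps the script usable on any file without editing the file names inside it.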

Or, using just egrep and sed:
egrep ">" hdec1.csv | sed -e 's/\(.\)\(.*|\)\(.*\)/\2/'
rev_sp|P31946|
rev_sp|P31946-2|
rev_sp|P62258|
rev_sp|Q04917|
rev_sp|P61981|
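The two approaches agree on this input, but they are not quite the same pattern. The Perl regex uses a non-greedy `.*?`, so it stops at the first `|` after the accession, while the sed expression's `\(.*|\)` is greedy and runs to the last `|` on the line. A small sketch to show the difference, written in Perl with the sed substitution translated into a Perl `s///`; the header line is made up (with an extra `|` in the description) and is not from the original data:

```perl
use strict;
use warnings;

# Hypothetical header with an extra "|" in the description.
my $header = '>rev_sp|P31946|1433B_HUMAN 14-3-3 protein|extra field';

# Pattern from the Perl script: non-greedy .*? stops at the FIRST "|"
# after the accession.
my ($perl_id) = $header =~ />(rev_sp\|.*?\|)/;

# sed's s/\(.\)\(.*|\)\(.*\)/\2/ rewritten in Perl: group 1 is the first
# character (">"), group 2 is greedy and runs up to the LAST "|",
# group 3 is the rest of the line. Only group 2 is kept.
(my $sed_id = $header) =~ s/^(.)(.*\|)(.*)$/$2/;

print "perl: $perl_id\n";
print "sed:  $sed_id\n";
```

This prints `rev_sp|P31946|` for the Perl pattern and `rev_sp|P31946|1433B_HUMAN 14-3-3 protein|` for the sed one, so the sed version is only safe when the description part of the header contains no `|`.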



(This post was edited by histrung on Jan 24, 2012, 7:11 PM)


Edit Log:
Post edited by histrung (Novice) on Jan 24, 2012, 6:49 PM
Post edited by histrung (Novice) on Jan 24, 2012, 7:11 PM

