



jtra00
Novice

Jan 24, 2012, 9:34 AM


How to extract protein names from sequence file

How do I retrieve only the header (protein name) of each record in the protein sequence file below?
For example, for the first protein I would like to extract only rev_sp|P31946|
When I tried, I got everything, including the sequence.


File protein.txt:

>rev_sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3

NEGEGADGEDGQNESTWLTLNDRLLQMILTSDKYSEENLTDLEAIAEDFATKALSCAKEP

SNLIEYYFVSFNLALGLRIPHTPQMEKKSIEFAEQYAQQSNSVTTQKNDGSAVESLYRFY

DGKMKLYFVKSEPQTANPILYKDLLELVDNCIDQLEAEIKERYEKGMQQKKENRETKQEI

SSIVRWSSRRAGVVNKYAVSLLNREENSLEHGQETVAKMAAAMDDYREAQEALKAKQVLE

SKDMTM

>rev_sp|P31946-2|1433B_HUMAN Isoform Short of 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB

NEGEGADGEDGQNESTWLTLNDRLLQMILTSDKYSEENLTDLEAIAEDFATKALSCAKEP

SNLIEYYFVSFNLALGLRIPHTPQMEKKSIEFAEQYAQQSNSVTTQKNDGSAVESLYRFY

DGKMKLYFVKSEPQTANPILYKDLLELVDNCIDQLEAEIKERYEKGMQQKKENRETKQEI

SSIVRWSSRRAGVVNKYAVSLLNREENSLEHGQETVAKMAAAMDDYREAQEALKAKQVLE

SKDM

>rev_sp|P62258|1433E_HUMAN 14-3-3 protein epsilon OS=Homo sapiens GN=YWHAE PE=1 SV=1

QNEDEVDQLAEKNQEEGDGQMDSTWLTLNDRLLQMILTSDKYSEESLTDLEAIADDFAAK

ALRCARDPSNLIEYYFVSFNLALGLRIPHTPPLETMAIDSAAKYAVLSNEAAEKRDNGTA

FEALYRHYDGKMKYYFVKSEGTNAAPILHKDLVDLIDCCILKLETEVMQRYERIMKLKDE

GGKNEEKQEISSIIRWSARRAGIVNKYAVSLLNREEVTLEVDMGAVKKMSEVMEDYREAQ

EALKAQYVLDERDDM

>rev_sp|Q04917|1433F_HUMAN 14-3-3 protein eta OS=Homo sapiens GN=YWHAH PE=1 SV=4

NGEGAEEDQQDSTWLTLNDRLLQMILTSDKYSDENLTDLEAIADDFAQKALLCAQEPANQ

IEYYFVSFNLALGLRIPHTPQMQEKSIEFAEKYAAESAEVVSNKKEGSAVEALYRYYDGK

MKLYFVKSEYQFDNCNKILFKDLLSLVDNCVTELEKEIKERYAKVKELKKENGDAMTKQE

ISSIVRWSSRRAGVVNKYAVSLLNRDENSLPENLETVAKMASAMDDYREAQEALRARQLL

QERDGM

>rev_sp|P61981|1433G_HUMAN 14-3-3 protein gamma OS=Homo sapiens GN=YWHAG PE=1 SV=2

NNGEGGDDDQQDSTWLTLNDRLLQMILTSDKYSDENLTDLEAIADDFATKALHCAQEPAN

QIEYYFVSYNLALGLRIPHTPQMHEKSIEHAESYAKESSEVVTARKEGTAVEALYRYYDG

KMKLYFVKSEYQTESCNKILYNDLLSLVDQCVAELEKEIKERYARVMEIKKENGDASTKQ

EISSIVRWSSRRAGVVNKYAVSLLNREENSLPENLETVNKMAAAMDDYREAQEALRAKQV

LQERDVM


(This post was edited by jtra00 on Jan 24, 2012, 9:41 AM)
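
One possible approach, offered as a rough sketch rather than a definitive answer: match only the lines that start with ">" and capture everything up to and including the second "|". It is written in Python because the post shows no code of its own; the same pattern should carry over to a Perl match against each line. The names HEADER_RE, extract_ids and sample are made up for illustration, and the sample is a shortened copy of the first two records above.

```python
import re

# Anchored at the start of the line, so sequence lines never match.
# Captures the text after '>' up to and including the second '|'.
HEADER_RE = re.compile(r"^>([^|\s]+\|[^|\s]+\|)")


def extract_ids(lines):
    """Return the 'db|accession|' prefix of every header line."""
    ids = []
    for line in lines:
        match = HEADER_RE.match(line)
        if match:
            ids.append(match.group(1))
    return ids


# Shortened copy of the first two records from protein.txt (assumption:
# the real file has the same '>db|accession|name ...' header layout).
sample = """\
>rev_sp|P31946|1433B_HUMAN 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB PE=1 SV=3
NEGEGADGEDGQNESTWLTLNDRLLQMILTSDKYSEENLTDLEAIAEDFATKALSCAKEP

>rev_sp|P31946-2|1433B_HUMAN Isoform Short of 14-3-3 protein beta/alpha OS=Homo sapiens GN=YWHAB
SKDM
""".splitlines()

print(extract_ids(sample))  # ['rev_sp|P31946|', 'rev_sp|P31946-2|']
```

To run it on the real file, pass the open file handle instead of the sample, for example `with open("protein.txt") as fh: print(extract_ids(fh))`. Getting "everything plus sequence" usually means the pattern was not anchored to the ">" lines or was greedy across the whole line; the negated character classes `[^|\s]+` stop the capture at each "|".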

