


Thalakos
Novice

Apr 4, 2013, 2:06 PM


[SOLVED] Merges 2 text files under few conditions

Hi all,

I have two text files, file_a.txt and file_b.txt, that look like this:

file_a:

Code
has-mir-199a 
has-miR-222
has-miR-222
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7a
hsa-let-7b
hsa-let-7b
hsa-let-7b
hsa-let-7b
hsa-let-7c
hsa-let-7c
hsa-let-7c
hsa-let-7c
hsa-let-7c
hsa-let-7c
hsa-let-7d
hsa-let-7d
hsa-let-7d
hsa-let-7d
hsa-let-7e
hsa-let-7e
hsa-let-7e
hsa-let-7e
hsa-let-7e
hsa-let-7f
hsa-let-7f
hsa-let-7f
hsa-let-7f
hsa-let-7f
hsa-let-7f
hsa-let-7f
hsa-let-7f
hsa-let-7f
hsa-let-7f
hsa-let-7f
hsa-let-7f
hsa-let-7f
hsa-let-7f
hsa-let-7g
hsa-let-7g
....
line cut


file_b:

Code
hsa-let-7a	KRAS , HMGA2 , integrin beta(3) , caspase-3 , PRDM1/Blimp-1 , HMGA2 , IGF-II , HMGA2 , HMGA2 , RAS , BCL2 , RAS , MYC , CDC25A , CDK6 , NF2 , c-myc , RAS , RAS , NIRF 
hsa-let-7b Cdc34 , Dicer , KRAS , CCND1 , CDC25A , CDK6 , HMGA2
hsa-let-7c HMGA2 , HMGA2 , HMGA2 , BCL2 , RAS , CDC25A , CDK6 , RAS
hsa-let-7d KRAS , HMGA2 , BCL2 , RAS , CDC25A , CDK6
hsa-let-7d BDNF , D3R
hsa-let-7e HMGA2
hsa-let-7g KRAS , HMGA2 , Ras , HMGA2 , CDC25A , CDK6
hsa-miR-1 c-Met , calmodulin , Gata4 , Mef2a , BCL2 , Gata4 , calmodulin , Mef2a , C/EBPa , FoxP1 , HDAC4 , MET , HCN4 , FoxP1 , HDAC4 , MET , Cdk9 , fibronectin , RasGAP , Rheb , MEF-2 , nAChR , GAJ1 , KCNJ2 , HSP60 , HSP70 , Hand2 , Kir2.1
hsa-miR-100 Plk1
hsa-miR-101 EZH2 , EZH2 , Mcl-1 , FOS , EZH2 , FOS , ATXN1 , MYCN , Ezh2
hsa-miR-101b ATXN1 , STC1
hsa-miR-106a IL-10 , E2F1 , Mylip
hsa-miR-106b p21 , APP , Itch , E2F1 , E2F1 , PCAF
hsa-miR-107 PLAG1 , BACE1
hsa-miR-10b HOXD10 , PPAR-alpha
hsa-miR-1-2 Hand2 , Irx5 , Kcnd2
hsa-miR-122 Bcl-w , ADAM-10 , SRF , Igf1R
hsa-miR-122a CCNG1 , CCNG1 , AMPK
hsa-miR-124 BDNF , D3R , Sox9
hsa-miR-124a Rb , IkappaBzeta , CDK6 , CDK6 , CDK6 , CDK6
hsa-miR-199 ET-1
hsa-miR-199a IKK-beta
hsa-miR-199a* Smad1 , ERK2 , MET
hsa-miR-222 p27 , p27 , p27 , p57 , MMP1 , SOD2 , Bim , CDKN1B/p27/Kip1 , KIT , c-KIT , p27(Kip1) , p27(Kip1) , ERalpha , CDKN1C/p57 , CDKN1B/p27/Kip1 , c-KIT , KIT , CDKN1B/p27/Kip1 , CDKN1B/p27/Kip1 , KIT , p27


File A has a single column, in which the entries found in column 1 of file B appear multiple times.
I need a new file C in which every entry of file A, including each repeat, is paired with the corresponding column 2 value from file B.
It should look like this:

file_c.txt:

Code
has-mir-199a	IKK-beta 
has-miR-222 p27 , p27 , p27 , p57 , MMP1 , SOD2 , Bim , CDKN1B/p27/Kip1 , KIT , c-KIT , p27(Kip1) , p27(Kip1) , ERalpha , CDKN1C/p57 , CDKN1B/p27/Kip1 , c-KIT , KIT , CDKN1B/p27/Kip1 , CDKN1B/p27/Kip1 , KIT , p27
has-miR-222 p27 , p27 , p27 , p57 , MMP1 , SOD2 , Bim , CDKN1B/p27/Kip1 , KIT , c-KIT , p27(Kip1) , p27(Kip1) , ERalpha , CDKN1C/p57 , CDKN1B/p27/Kip1 , c-KIT , KIT , CDKN1B/p27/Kip1 , CDKN1B/p27/Kip1 , KIT , p27
hsa-let-7a KRAS , HMGA2 , integrin beta(3) , caspase-3 , PRDM1/Blimp-1 , HMGA2 , IGF-II , HMGA2 , HMGA2 , RAS , BCL2 , RAS , MYC , CDC25A , CDK6 , NF2 , c-myc , RAS , RAS , NIRF
hsa-let-7a KRAS , HMGA2 , integrin beta(3) , caspase-3 , PRDM1/Blimp-1 , HMGA2 , IGF-II , HMGA2 , HMGA2 , RAS , BCL2 , RAS , MYC , CDC25A , CDK6 , NF2 , c-myc , RAS , RAS , NIRF
hsa-let-7a KRAS , HMGA2 , integrin beta(3) , caspase-3 , PRDM1/Blimp-1 , HMGA2 , IGF-II , HMGA2 , HMGA2 , RAS , BCL2 , RAS , MYC , CDC25A , CDK6 , NF2 , c-myc , RAS , RAS , NIRF
hsa-let-7a KRAS , HMGA2 , integrin beta(3) , caspase-3 , PRDM1/Blimp-1 , HMGA2 , IGF-II , HMGA2 , HMGA2 , RAS , BCL2 , RAS , MYC , CDC25A , CDK6 , NF2 , c-myc , RAS , RAS , NIRF
hsa-let-7a KRAS , HMGA2 , integrin beta(3) , caspase-3 , PRDM1/Blimp-1 , HMGA2 , IGF-II , HMGA2 , HMGA2 , RAS , BCL2 , RAS , MYC , CDC25A , CDK6 , NF2 , c-myc , RAS , RAS , NIRF
hsa-let-7a KRAS , HMGA2 , integrin beta(3) , caspase-3 , PRDM1/Blimp-1 , HMGA2 , IGF-II , HMGA2 , HMGA2 , RAS , BCL2 , RAS , MYC , CDC25A , CDK6 , NF2 , c-myc , RAS , RAS , NIRF
hsa-let-7a KRAS , HMGA2 , integrin beta(3) , caspase-3 , PRDM1/Blimp-1 , HMGA2 , IGF-II , HMGA2 , HMGA2 , RAS , BCL2 , RAS , MYC , CDC25A , CDK6 , NF2 , c-myc , RAS , RAS , NIRF
hsa-let-7a KRAS , HMGA2 , integrin beta(3) , caspase-3 , PRDM1/Blimp-1 , HMGA2 , IGF-II , HMGA2 , HMGA2 , RAS , BCL2 , RAS , MYC , CDC25A , CDK6 , NF2 , c-myc , RAS , RAS , NIRF
hsa-let-7a KRAS , HMGA2 , integrin beta(3) , caspase-3 , PRDM1/Blimp-1 , HMGA2 , IGF-II , HMGA2 , HMGA2 , RAS , BCL2 , RAS , MYC , CDC25A , CDK6 , NF2 , c-myc , RAS , RAS , NIRF
hsa-let-7a KRAS , HMGA2 , integrin beta(3) , caspase-3 , PRDM1/Blimp-1 , HMGA2 , IGF-II , HMGA2 , HMGA2 , RAS , BCL2 , RAS , MYC , CDC25A , CDK6 , NF2 , c-myc , RAS , RAS , NIRF
hsa-let-7a KRAS , HMGA2 , integrin beta(3) , caspase-3 , PRDM1/Blimp-1 , HMGA2 , IGF-II , HMGA2 , HMGA2 , RAS , BCL2 , RAS , MYC , CDC25A , CDK6 , NF2 , c-myc , RAS , RAS , NIRF
hsa-let-7a KRAS , HMGA2 , integrin beta(3) , caspase-3 , PRDM1/Blimp-1 , HMGA2 , IGF-II , HMGA2 , HMGA2 , RAS , BCL2 , RAS , MYC , CDC25A , CDK6 , NF2 , c-myc , RAS , RAS , NIRF
hsa-let-7a KRAS , HMGA2 , integrin beta(3) , caspase-3 , PRDM1/Blimp-1 , HMGA2 , IGF-II , HMGA2 , HMGA2 , RAS , BCL2 , RAS , MYC , CDC25A , CDK6 , NF2 , c-myc , RAS , RAS , NIRF
.....
line cut

Do you have any idea how to do this automatically?

Thanks a lot in advance,
Giorgio
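
One possible approach, as a rough sketch only: read file B into a lookup table keyed on column 1, then walk file A line by line and append the matching column 2 text. The sketch is in Python rather than Perl (the same idea maps onto a Perl hash), and the function names `load_targets`, `merge` and `merge_files` are made up for illustration. It assumes that a key listed twice in file B (as hsa-let-7d is) should have its lists joined, and that a file A entry with no match is written out on its own. The lookup is exact and case-sensitive, so an entry spelled `has-mir-199a` would not match `hsa-miR-199a` unless the spelling is corrected first.

```python
def load_targets(lines):
    """Build a dict mapping each column-1 key of file B to its column-2 text."""
    targets = {}
    for line in lines:
        # Split on the first run of whitespace only: the separator is a tab on
        # some lines and a space on others, and target names contain spaces.
        parts = line.split(None, 1)
        if not parts:
            continue  # blank line
        key = parts[0]
        value = parts[1].strip() if len(parts) > 1 else ""
        if key in targets and value:
            # Assumption: a key that appears twice gets its lists joined.
            targets[key] = " , ".join(v for v in (targets[key], value) if v)
        elif key not in targets:
            targets[key] = value
    return targets


def merge(keys, targets):
    """Return one output row per non-blank line of file A, repeats included."""
    rows = []
    for line in keys:
        key = line.strip()
        if not key:
            continue
        # Exact lookup; assumption: unmatched entries are kept without targets.
        value = targets.get(key, "")
        rows.append(key + "\t" + value if value else key)
    return rows


def merge_files(path_a, path_b, path_c):
    """Read file A and file B, write the tab-separated result to file C."""
    with open(path_b) as fb:
        targets = load_targets(fb)
    with open(path_a) as fa:
        rows = merge(fa, targets)
    with open(path_c, "w") as fc:
        for row in rows:
            fc.write(row + "\n")
```

Calling `merge_files("file_a.txt", "file_b.txt", "file_c.txt")` would then write file C. Because file B is read once into a dictionary, each line of file A costs a single lookup, however many times an entry repeats.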


(This post was edited by Thalakos on Apr 5, 2013, 9:49 PM)

