Thalakos
Novice
Dec 11, 2012, 12:49 PM
Views: 4787
Merges 2 text files under few very difficult conditions [SOLVED]
Hi all,

I have these two files:

A)
Code:
K00001 32
K00001 177
K00001 189
K00001 212
K00001 232
K00001 233
K00001 234
K00001 346
K00002 182
K00002 189
K00002 273
K00003 146
K00003 193
K00003 240
K00004 176
K00005 273
K00006 192
K00007 51
K00007 184
K00008 51
K00009 51
........

B)
Code:
0 BR:ko01002 Metabolism Enzyme Families Peptidases
1 PATH:ko04142 Cellular Processes Transport and Catabolism Lysosome
2 PATH:ko04612 Organismal Systems Immune System Antigen processing and presentation
3 BR:ko03110 Genetic Information Processing Folding, Sorting and Degradation Chaperones and folding catalysts
4 PATH:ko04145 Cellular Processes Transport and Catabolism Phagosome
5 PATH:ko05152 Human Diseases Infectious Diseases Tuberculosis
6 PATH:ko05323 Human Diseases Immune System Diseases Rheumatoid arthritis
7 PATH:ko04141 Genetic Information Processing Folding, Sorting and Degradation Protein processing in endoplasmic reticulum
8 PATH:ko04210 Cellular Processes Cell Growth and Death Apoptosis
9 PATH:ko05010 Human Diseases Neurodegenerative Diseases Alzheimer's disease
10 None Unclassified Metabolism Amino acid metabolism
11 BR:ko03000 Genetic Information Processing Transcription Transcription factors
12 PATH:ko04111 Cellular Processes Cell Growth and Death Cell cycle - yeast
13 PATH:ko00600 Metabolism Lipid Metabolism Sphingolipid metabolism
14 PATH:ko04020 Environmental Information Processing Signal Transduction Calcium signaling pathway
15 PATH:ko04974 Organismal Systems Digestive System Protein digestion and absorption
16 BR:ko04030 Environmental Information Processing Signaling Molecules and Interaction G protein-coupled receptors
17 PATH:ko04080 Environmental Information Processing Signaling Molecules and Interaction Neuroactive ligand-receptor interaction
18 BR:ko01003 Metabolism Glycan Biosynthesis and Metabolism Glycosyltransferases
19 PATH:ko04620 Organismal Systems Immune System Toll-like receptor signaling pathway
20 PATH:ko05133 Human Diseases Infectious Diseases Pertussis
21 PATH:ko05164 Human Diseases Infectious Diseases Influenza A
22 PATH:ko05160 Human Diseases Infectious Diseases Hepatitis C
23 PATH:ko05142 Human Diseases Infectious Diseases Chagas disease (American trypanosomiasis)
24 BR:ko00535 Metabolism Glycan Biosynthesis and Metabolism Proteoglycans
25 BR:ko03009 Genetic Information Processing Translation Ribosome Biogenesis
26 BR:ko02000 Environmental Information Processing Membrane Transport Transporters
27 PATH:ko02010 Environmental Information Processing Membrane Transport ABC transporters
28 PATH:ko00591 Metabolism Lipid Metabolism Linoleic acid metabolism
29 BR:ko01004 Metabolism Lipid Metabolism Lipid biosynthesis proteins
30 PATH:ko00590 Metabolism Lipid Metabolism Arachidonic acid metabolism
31 PATH:ko00380 Metabolism Amino Acid Metabolism Tryptophan metabolism

The files are linked by the numbers: every K0000X in file A) is associated with a function in file B). The association is given by the number that appears both in the row of the K0000X and in the row of the function.

I need to have a new file like this:

Code:
K00001 BR:...the rest of the line
K00002 PATH:...the rest of the line
K00003 ....
K00004 ....
....

So all the K0000N IDs and their associated function (BR: or PATH:, depending on the related number).

What makes it difficult is that in file B) there are only 300 defined and unique lines (so 300 different functions), but in file A) there are thousands of those K0000N IDs, many of them in multiple entries. In the new file I need to generate, I must have only a single entry per ID, and if more than one number is associated with a K0000N ID, I should have something like this:

Code:
K00001 function (related to the 1st number) function (related to the 2nd number) function (related to the 3rd number) ...

Every K0000N should be on its own line with its associated function or functions (depending on whether it comes in a single entry or in multiple entries), separated by tabs.

I know it's very tricky, but I hope some of you experts here have an idea of how to do it. Thanks in advance for any input.

PS: I posted just the initial lines, but attached you can find the two whole files.
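For reference, the merge described above can be sketched as a lookup table plus a grouping pass. This is only a minimal sketch in Python, not the solution from the thread; the same idea maps onto a Perl hash of arrays. It assumes the fields in both files are separated by whitespace, that the number is the second field in file A) and the first field in file B), and that numbers missing from file B) are simply skipped. The function names and the sample pairs in file A) are made up for illustration.

```python
def load_functions(b_lines):
    """Map each leading number in file B to the rest of its line."""
    functions = {}
    for line in b_lines:
        parts = line.strip().split(None, 1)  # number, then everything else
        if len(parts) == 2 and parts[0].isdigit():
            functions[parts[0]] = parts[1]
    return functions


def group_numbers(a_lines):
    """Collect the numbers of each K ID, in first-seen order, without repeats."""
    groups = {}
    for line in a_lines:
        parts = line.split()
        if len(parts) != 2:
            continue  # assumption: blank or malformed lines are ignored
        ko_id, number = parts
        numbers = groups.setdefault(ko_id, [])
        if number not in numbers:
            numbers.append(number)
    return groups


def merge(a_lines, b_lines):
    """Return one tab-separated line per K ID: the ID, then its functions."""
    functions = load_functions(b_lines)
    merged = []
    for ko_id, numbers in group_numbers(a_lines).items():
        # assumption: a number with no entry in file B is silently dropped
        found = [functions[n] for n in numbers if n in functions]
        merged.append("\t".join([ko_id] + found))
    return merged


# Three lines taken from file B) in the post.
sample_b = [
    "0 BR:ko01002 Metabolism Enzyme Families Peptidases",
    "1 PATH:ko04142 Cellular Processes Transport and Catabolism Lysosome",
    "4 PATH:ko04145 Cellular Processes Transport and Catabolism Phagosome",
]
# Hypothetical ID/number pairs, invented for this demo (not from the attachment).
sample_a = ["K00001 0", "K00001 4", "K00002 1", "K00001 0"]

for row in merge(sample_a, sample_b):
    print(row)
```

For the real files, the two lists could be replaced by open file handles, since both helper functions only iterate over lines. Because file B) has only about 300 lines, keeping it entirely in a dictionary is cheap, and file A) is read just once.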
(This post was edited by Thalakos on Dec 12, 2012, 10:18 PM)
Attachments: file_A_&_file_B.zip (78.5 KB)
Edit Log:
Post edited by Thalakos (Novice) on Dec 11, 2012, 12:49 PM
Post edited by Thalakos (Novice) on Dec 12, 2012, 10:18 PM