CGI/Perl Guide



Thalakos
Novice

Dec 11, 2012, 12:49 PM


Merging 2 text files under a few very difficult conditions [SOLVED]

Hi all,

I have these two files:

A)

Code:

K00001 32
K00001 177
K00001 189
K00001 212
K00001 232
K00001 233
K00001 234
K00001 346
K00002 182
K00002 189
K00002 273
K00003 146
K00003 193
K00003 240
K00004 176
K00005 273
K00006 192
K00007 51
K00007 184
K00008 51
K00009 51
........



B)

Code:

0 BR:ko01002 Metabolism Enzyme Families Peptidases
1 PATH:ko04142 Cellular Processes Transport and Catabolism Lysosome
2 PATH:ko04612 Organismal Systems Immune System Antigen processing and presentation
3 BR:ko03110 Genetic Information Processing Folding, Sorting and Degradation Chaperones and folding catalysts
4 PATH:ko04145 Cellular Processes Transport and Catabolism Phagosome
5 PATH:ko05152 Human Diseases Infectious Diseases Tuberculosis
6 PATH:ko05323 Human Diseases Immune System Diseases Rheumatoid arthritis
7 PATH:ko04141 Genetic Information Processing Folding, Sorting and Degradation Protein processing in endoplasmic reticulum
8 PATH:ko04210 Cellular Processes Cell Growth and Death Apoptosis
9 PATH:ko05010 Human Diseases Neurodegenerative Diseases Alzheimer's disease
10 None Unclassified Metabolism Amino acid metabolism
11 BR:ko03000 Genetic Information Processing Transcription Transcription factors
12 PATH:ko04111 Cellular Processes Cell Growth and Death Cell cycle - yeast
13 PATH:ko00600 Metabolism Lipid Metabolism Sphingolipid metabolism
14 PATH:ko04020 Environmental Information Processing Signal Transduction Calcium signaling pathway
15 PATH:ko04974 Organismal Systems Digestive System Protein digestion and absorption
16 BR:ko04030 Environmental Information Processing Signaling Molecules and Interaction G protein-coupled receptors
17 PATH:ko04080 Environmental Information Processing Signaling Molecules and Interaction Neuroactive ligand-receptor interaction
18 BR:ko01003 Metabolism Glycan Biosynthesis and Metabolism Glycosyltransferases
19 PATH:ko04620 Organismal Systems Immune System Toll-like receptor signaling pathway
20 PATH:ko05133 Human Diseases Infectious Diseases Pertussis
21 PATH:ko05164 Human Diseases Infectious Diseases Influenza A
22 PATH:ko05160 Human Diseases Infectious Diseases Hepatitis C
23 PATH:ko05142 Human Diseases Infectious Diseases Chagas disease (American trypanosomiasis)
24 BR:ko00535 Metabolism Glycan Biosynthesis and Metabolism Proteoglycans
25 BR:ko03009 Genetic Information Processing Translation Ribosome Biogenesis
26 BR:ko02000 Environmental Information Processing Membrane Transport Transporters
27 PATH:ko02010 Environmental Information Processing Membrane Transport ABC transporters
28 PATH:ko00591 Metabolism Lipid Metabolism Linoleic acid metabolism
29 BR:ko01004 Metabolism Lipid Metabolism Lipid biosynthesis proteins
30 PATH:ko00590 Metabolism Lipid Metabolism Arachidonic acid metabolism
31 PATH:ko00380 Metabolism Amino Acid Metabolism Tryptophan metabolism



The files are linked by number: every K0000X entry in file A) is associated with a function in file B). The number in the second column of file A) matches the number at the start of a line in file B), and that line holds the function. I need to produce a new file like this:

Code:

K00001 BR:...the rest of the line
K00002 PATH:...the rest of the line
K00003 ....
K00004 ....
....
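The association described above amounts to a lookup table keyed by the row number in file B). A minimal Python sketch, assuming each line of file B) starts with its number followed by whitespace (space or tab) and then the function text; `load_functions` is a hypothetical helper name, not something from the thread:

```python
def load_functions(lines):
    """Map each row number in file B) to the rest of its line.

    Assumes every non-blank line looks like '<number> <function text>',
    with whitespace (space or tab) after the number.
    """
    functions = {}
    for line in lines:
        line = line.strip()
        if not line:
            continue  # skip blank lines
        number, text = line.split(None, 1)  # split only at the first whitespace run
        functions[int(number)] = text
    return functions


if __name__ == "__main__":
    sample_b = [
        "0 BR:ko01002 Metabolism Enzyme Families Peptidases",
        "1 PATH:ko04142 Cellular Processes Transport and Catabolism Lysosome",
    ]
    functions = load_functions(sample_b)
    print(functions[1])  # PATH:ko04142 Cellular Processes Transport and Catabolism Lysosome
```

Because iterating over an open file yields its lines, the same function can be called as `load_functions(open("file_B.txt"))`, where the file name is only a placeholder. Storing the keys as integers means "32" in file A) still matches even if file B) were to write it with leading zeros.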



The output should list every K0000N ID together with its associated function (BR: or PATH:, depending on the related number). What makes it difficult is that file B) contains only 300 defined, unique lines (so 300 different functions), while file A) contains thousands of lines, with each K0000N ID appearing in multiple entries. In the new file, each ID must appear only once, and if several numbers are associated with a K0000N ID, its line should look like this:
Code:

K00001 function (related to the 1st number) function (related to the 2nd number) function (related to the 3rd number) ...
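Collapsing the repeated entries of file A) into one line per ID is a grouping step: collect every number seen for an ID, in the order it appears. A minimal Python sketch, assuming each data line of file A) has exactly two whitespace-separated fields; `group_numbers` is a hypothetical helper name:

```python
def group_numbers(lines):
    """Collect every number listed for each ID in file A).

    Returns a dict mapping ID -> list of numbers, in file order.
    Python dicts keep insertion order (3.7+), so IDs come out in
    the order they are first seen.
    """
    groups = {}
    for line in lines:
        fields = line.split()
        if len(fields) != 2:
            continue  # skip blank or malformed lines such as '........'
        kid, number = fields
        groups.setdefault(kid, []).append(int(number))
    return groups


if __name__ == "__main__":
    sample_a = ["K00001 32", "K00001 177", "K00002 182"]
    print(group_numbers(sample_a))  # {'K00001': [32, 177], 'K00002': [182]}
```

Grouping through a dict does not require the entries for one ID to be contiguous in file A), so it still works if the file is not sorted by ID.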



Every K0000N ID should be on its own line, followed by its associated function or functions (one or several, depending on whether it has a single entry or multiple entries in file A), separated by tabs.
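Given two mappings (each ID to its list of numbers from file A), and each number to its function text from file B)), the requested output is one tab-joined line per ID. A minimal Python sketch; `write_merged` and the `"NA"` placeholder for numbers that have no row in file B) are hypothetical choices, since the post does not say what should happen in that case:

```python
import io


def write_merged(groups, functions, out, missing="NA"):
    """Write one tab-separated line per ID: the ID followed by the
    function text for each of its numbers, in file-A order.

    groups:    dict mapping ID -> list of row numbers (from file A)
    functions: dict mapping row number -> function text (from file B)
    missing:   placeholder written when a number has no row in file B
    """
    for kid, numbers in groups.items():
        row = [kid] + [functions.get(n, missing) for n in numbers]
        out.write("\t".join(row) + "\n")


if __name__ == "__main__":
    groups = {"K00001": [0, 1], "K00002": [1]}
    functions = {
        0: "BR:ko01002 Metabolism Enzyme Families Peptidases",
        1: "PATH:ko04142 Cellular Processes Transport and Catabolism Lysosome",
    }
    buf = io.StringIO()
    write_merged(groups, functions, buf)
    print(buf.getvalue(), end="")
```

Passing any writable text stream as `out` (for example a file opened with `open(path, "w")`) keeps the function independent of where the result goes. If the function text in file B) itself contains tab characters, the output columns become ambiguous, so it may be worth replacing internal tabs before joining.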

I know it's very tricky, but I hope some of the experts here have an idea of how to do it.

Thanks in advance for any input.

PS:
I posted only the first lines of each file, but the two complete files are attached.


(This post was edited by Thalakos on Dec 12, 2012, 10:18 PM)
Attachments: file_A_&_file_B.zip (78.5 KB)



