
pjshort42
Aug 1, 2012, 2:27 AM
Re: [Laurent_R] speeding up clustering program
Here is what I am trying to do. The first file is in a two-column format (nodeA, nodeB) with several thousand entries. Each row represents a network connection, i.e. an edge between node A and node B. I want to measure clustering by counting how many times certain arrangements occur. In this case I am looking for groups of 4 nodes that share 4 or more edges among them (evidence of clustering).

Before the first loop, the code takes all of the nodes from the connection list and removes repeats with a hash, so we end up with a list of every node in the network. The first loop then iterates through this list over every possible combination of 4 nodes. It should go:

first: 1,2,3,4
second: 1,2,3,5
third: 1,2,3,6
...

and so on until all combinations are covered. Each combination is then run through grep to count how many times one of the 4 nodes is connected to another of the 4. If that count is more than 4, we add 1 to our count and move on to the next set of four genes.

Let me know if that makes sense!

Edit: The code I posted counts >2, >3 and >5. Hopefully this doesn't confuse you; I was running it on a dummy file earlier to check that it worked correctly. What I am really looking for is >4 and >5, but those thresholds can be changed to whatever we need, depending on how many connections we are looking for.
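The procedure described above can be sketched roughly as follows. It removes duplicate nodes, enumerates every 4-node combination, counts the edges inside each one, and compares that count against a threshold. This is a minimal Python sketch, not the original Perl program. The function names `read_edges` and `count_dense_quads` and the parameter `min_edges` are hypothetical. It assumes whitespace-separated columns and undirected edges, so `A B` and `B A` count once. It also swaps the per-combination grep for a lookup in an in-memory set of edges.

```python
import itertools
from typing import FrozenSet, Iterable, Set

Edge = FrozenSet[str]


def read_edges(lines: Iterable[str]) -> Set[Edge]:
    """Parse 'nodeA nodeB' lines into a set of undirected edges.

    Assumption: columns are whitespace-separated; 'A B' and 'B A' are the
    same edge, duplicates are stored once, and self-loops are ignored.
    """
    edges: Set[Edge] = set()
    for line in lines:
        parts = line.split()
        if len(parts) < 2:
            continue  # skip blank or malformed rows
        a, b = parts[0], parts[1]
        if a != b:
            edges.add(frozenset((a, b)))
    return edges


def count_dense_quads(edges: Set[Edge], min_edges: int = 4) -> int:
    """Count 4-node combinations with at least `min_edges` edges among them."""
    # Unique node list: the role the hash plays in the original code.
    nodes = sorted({node for edge in edges for node in edge})
    count = 0
    # combinations() walks the sorted node list in the same pattern as
    # positions 1,2,3,4 then 1,2,3,5 then 1,2,3,6, and so on.
    for quad in itertools.combinations(nodes, 4):
        # Check each of the 6 node pairs with an average O(1) set lookup
        # instead of grepping the whole edge file.
        inside = sum(
            1 for pair in itertools.combinations(quad, 2)
            if frozenset(pair) in edges
        )
        if inside >= min_edges:
            count += 1
    return count


if __name__ == "__main__":
    sample = ["A B", "B C", "C D", "D A", "E A", "B A"]
    edges = read_edges(sample)
    print(len(edges))                   # 5: "B A" repeats "A B"
    print(count_dense_quads(edges, 4))  # 1: only {A, B, C, D}
    print(count_dense_quads(edges, 5))  # 0
```

With `min_edges=4` this counts groups with 4 or more internal edges. A strict ">4" test corresponds to `min_edges=5`. Full enumeration grows as C(n, 4). With a hypothetical 1,000 distinct nodes, `math.comb(1000, 4)` is 41,417,124,750 combinations, so the loop itself is likely a major cost, not just the grep.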
(This post was edited by pjshort42 on Aug 1, 2012, 2:56 AM)