Wednesday, February 09, 2005

Genome to proteome

This is a great article in Technology Review about an effort to take the human genome, the DNA sequence, and figure out a whole lot more about the "proteome," the set of all the proteins it makes (usually with reference to a specific cell type or stimulation condition).

UPDATE: I got over to GRID, the organization which is attempting this. Remember that the DNA sequence of the genome can, with first-pass analysis, be broken down into regions which look highly similar to known genes, and other regions which look properly configured to be a gene but remain hypothetical. The latter are sometimes referred to as "open reading frames," or ORFs. (Bloggers are not the only ones saddled with tin-ear jargon.) What GRID would like to do is use computer power alone, lots of it, to predict the three-dimensional structure of the proteins which would get made if the ORF really is a gene. It has been a strange empirical observation that structural data sometimes reveal similarities which are not apparent from sequence data alone.
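
For the curious, here is roughly what "properly configured to be a gene" means at its crudest: a stretch of DNA that starts with ATG and runs, three letters at a time, until it hits a stop codon. This is only a minimal sketch in Python, not anything GRID or a real gene finder uses; the function name `find_orfs` and the `min_codons` cutoff are my own inventions for illustration, and it looks at the forward strand only, ignoring introns and everything else that makes real genomes hard.

```python
# Toy ORF scan: forward strand only, standard stop codons, no introns.
# find_orfs and min_codons are hypothetical names, chosen for this sketch.
STOP_CODONS = {"TAA", "TAG", "TGA"}


def find_orfs(dna, min_codons=3):
    """Return (start, end, frame) for each ATG...stop stretch.

    start and end are 0-based slice indices into dna, so dna[start:end]
    is the ORF including its stop codon. min_codons counts the stop codon.
    """
    dna = dna.upper()
    orfs = []
    for frame in range(3):
        start = None  # index of the ATG we are currently reading from
        for i in range(frame, len(dna) - 2, 3):
            codon = dna[i:i + 3]
            if start is None and codon == "ATG":
                start = i
            elif start is not None and codon in STOP_CODONS:
                end = i + 3
                if (end - start) // 3 >= min_codons:
                    orfs.append((start, end, frame))
                start = None  # look for the next ATG in this frame
    return orfs


if __name__ == "__main__":
    seq = "CCATGAAATTTTAGGG"
    for start, end, frame in find_orfs(seq):
        print(frame, seq[start:end])
```

Anything this turns up is still just a hypothesis; whether the ORF is a real gene is exactly the open question.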

So most of the novelty involves getting the computer power together to achieve this leapfrog calculation. They're taking a page from the SETI search and asking people to donate their leftover desktop power to compute these structures. Based on their stats page, it looks like they have more than 1.2 million CPUs at work on any given day.
I think it's interesting to compare this way of doing things with the brute force approach of actually observing everything a microbe makes, at the RNA and protein level. I can't quite tell who's pulling more oxygen.

But I'd love to see those data...
