Monday 22 November 2010

Allegory of the genome

This is something I wrote a little while ago, after reading Craig Venter's interview in Der Spiegel entitled "We have learned nothing from the genome". It's my attempt to explain (in fewer than 1000 words) what the genome is, why it is important, and why it doesn’t provide all the answers.


Imagine you had a library of thousands of books all piled randomly in a number of rooms. You don't even know how many books you have. So, one day you decide to sit down and catalog all the books, and put them in, like, alphabetical order by author. It's a difficult task but eventually you manage. Now, once you have that catalog, you know exactly how many books you have and where every book is, and you can sort it by author, by subject, by year of publication, whatever. It's cool. This catalog is something like what we have now that we've decoded the genome.

However, the information in all the books in the library is not really contained in the catalog you made. If you want to get all the information in the library, you'd have to read every single book, and remember every line of it, which is a much more difficult task than just making the catalog. It's probably even impossible for a single person.

For example, if someone asks, "What does it say in the second half of page 52 of that book you bought on 12 March 1998?", it's very unlikely that you'll be able to recite the text. But the catalog does help you to find the book (if you've included information about when you bought each one), so you can open it at page 52 and read what it says. In a similar sense, having the genome has helped us to organise information so we can retrieve it faster. But the genome (the catalog) is like a list of what the whole library holds; it's no substitute for reading every book.

Also, imagine that your library doesn't just contain books, but also notebooks, or even loose pages. When you make the catalog, to speed things up, you decide that you're only going to look at the cover of each book, maybe the first few pages, so you get the title, the author, the year it was published, the publisher. For the notebooks, again, you just catalog whatever is written on the cover, not everything that's written inside. So, in the end your catalog tells you that you have, for example, 8 notebooks written by your grandmother during World War II. Again, the catalog doesn't tell you anything about the content, and since no one else has read these notebooks, if you want to know what's in them you'd have to read each and every page. As for the loose pages, sometimes you don't even know how to categorise them. The notebooks that no one has ever looked at and the loose pages are a bit like genes of unknown function, or like "junk DNA". Maybe they're very important, but you don't know unless you really read them in detail, and until you do you don't even know what they're important for... For example, your grandmother's notebooks could be her diary, or they could be full of cooking recipes (yummy!) or they could be an account of her activities as a spy for the Russians! Unless you open them and read them, you won't know, and the catalog (the genome) won't give you this level of detail. So, opening and reading every book and notebook is like studying a gene in the laboratory or the clinic: it takes years before you really know what it does, and the catalog (the genome) only gives you limited clues, but still very valuable ones.

Similarly, your library may include books written in ancient foreign languages that no one speaks anymore, or in a secret code (something like Morse code, but one for which no one has the key). Again, these books are a bit like "junk DNA", as they may contain useful information, but unless you figure out a way to read them, you don't know what this information is.

Finally, your catalog doesn't really help to answer meaningful/complicated questions, such as "What did the ancient Egyptians know about the sun?" or "What did Tolstoy think about women?" It helps you to start looking for these answers (find all the relevant books), but again, to get the answer you'd have to read all the relevant books, take notes, decide what's most important, and essentially write a summary report that answers the specific question. So, again, scientific research to answer the important questions (like, "which genes cause cancer?") is still necessary, and difficult, and it takes time, but the catalog (the genome) is a very important first step in answering these questions.

So, what I'm saying is, the genome (the catalog) is great. It's a big achievement, it speeds things up, and it means you have some hope of finding what you're looking for in the library, compared to when it was all a pile of books on the floor. But it's not the answer to everything, in and of itself.
