25 November 2008

Metagenome annotation using a distributed grid of undergraduate students

I really tried to come up with my own pithy title for this post. First I flirted with "Pascal's Wager: Undergrads can do big science", then I tried "Public participation in science: you're doin' it right" on for size, and I quite liked "Annotathon!" and "Please note: this metagenome has been annotated by undergrads", but in the end I decided I just couldn't beat the actual title of the paper, published in today's PLoS Biology, which describes something called the Annotathon, a clever bioinformatics teaching tool that doubles as a clever bioinformatics research tool.

Bioinformatics, the particular area of study/research in question, involves using computers to make sense of the mountains of biological data being ever more rapidly churned out by Sanger, pyro- and nanopore sequencing of the DNA of both single specimens (genomics) and multi-species samples (metagenomics).

The story began when researchers from Marseilles University, led by Pascal Hingamp, noticed that even as their lecture halls were heaving with undergraduates, so their data stockpiles were heaving with un-annotated DNA sequences extracted from mixed environmental samples. And that's when it happened--voila!--out of their piqued brains trundled the Annotathon!

The Annotathon involves training up undergrads to characterise DNA sequences and then setting them loose on a bunch of real stockpiled metagenomic sequences. The students have to use the internet to try and identify the organism the DNA comes from, for example, and what its biological function might be (if any).

Figure 3 from Hingamp et al., The Annotathon Sequence Cart. The five DNA fragments, assigned to a student, illustrate each possible annotation stage: ongoing initial “Annotations 1,” awaiting initial “Evaluation 1,” ongoing final “Annotations 2,” awaiting final “Evaluation 2,” and sequence annotations “Finished.”

In return for their much-needed help sorting out oodles of DNA data, the undergrads gain a practical knowledge of the work involved in doing bioinformatics and metagenomics, and, most importantly of all, they get to experience what it's like to do real research. That's the attraction of science after all, not the heavy tomes of factoids and boooring canned (and therefore inherently condescending) experiments, but rather the being at the edge of the envelope of human knowledge, and when you get some new data, however small it might be, for a little while you are the only person on Earth who knows what you know.

And it's not just me who thinks this. Last year in the American Society for Cell Biology's publication CBE-Life Sciences Education, Anne Jurkowski et al. wrote in "Metagenomics: A Call for Bringing a New Science into the Classroom (While It's Still New)":
"The pace of research and the development of new areas of focus in biology are increasing at breathtaking speed. Unfortunately, exciting new areas of science typically do not appear in science classrooms and textbooks until many years after their inception. This pattern leaves undergraduate, and especially high school, biology education lagging behind scientific advances. The result is that too many students are never afforded opportunities to learn about the cutting-edge discoveries that make biology so exciting to professional scientists.


"The birth of this exciting new field (described more fully below) provides the life sciences research and education communities with a powerful and rare opportunity. Metagenomics is so young, and the microbial world it seeks to characterize is so vast, that there is a real possibility that scientists, teachers, and students in many areas of science can work together to advance this field. By acting now to incorporate metagenomics into biology education and to utilize biology education to inform questions and future research paths for metagenomics, the life sciences community can begin to shift from the current situation, in which scientific advances take decades to reach the classroom, toward a system in which education and research are deliberately and strategically integrated with each other from the very beginning..."
Well, if that's not a strategy for re-invigorating science education, I don't know what is. And for any die-hard researchers out there who are still not convinced that undergrads should be allowed to contribute to research, consider this: the fact that this paper is in PLoS Biology shows that the students are producing high quality data; indeed their work ends up immortalised in the big public databases used daily by professional researchers (that'd be you).

Laboratory equipment wish list for the new Beagle:
  • DNA extraction robot
  • Nanopore sequencer
  • Annotathon
And with that, I think I might finally be triangulating towards a good title for this post ...nah.

Update 26th November: Many thanks to Dennis in comments who writes "I'm part of an undergraduate genomics project that has students involved in both finishing and annotation. It is run by Sally Elgin at Washington University which involves over 20 other colleges. We just published an article describing it in Science (Oct 31 issue). The conclusion, of course, is that it works." I would be delighted to write another blog post on that study, but sadly, despite working at a major scientific institution, I do not have online access to Science (ahem). So, perhaps Dennis (or someone else with access) would be so kind as to send me a pdf (karen at thebeagleproject dot com) of the Science article? Thanks, Ron!

Hingamp P, Brochier C, Talla E, Gautheret D, Thieffry D and Herrmann C. (2008). Metagenome annotation using a distributed grid of undergraduate students. PLoS Biol, 6(11): e296 DOI: 10.1371/journal.pbio.0060296

A. Jurkowski, A. H. Reid, J. B. Labov (2007). Metagenomics: A Call for Bringing a New Science into the Classroom (While It's Still New) Cell Biology Education, 6 (4), 260-265 DOI: 10.1187/cbe.07-09-0075


Jim Lemire said...

Very cool. Undergrads are capable of some excellent work if given the chance. Over the past several years we've been upping the undergrad research program at RWU with great success. It's great to hear about undergrads being recognized for their accomplishments.

mudphudder said...

I think this is a great idea too. It borrows from previous examples in other fields, such as amateur astronomers helping to map out and validate the large volumes of data obtained from large academic observatories or the Hubble telescope. Large computational projects have also been run in small parallel pieces by getting people to download programs that will run when their screen-saver comes on. As you pointed out, however, this idea has the benefit of involving students at the cutting edge of their field of study.

Richard Carter, FCD said...

A great idea. Very similar to Amazon's Mechanical Turk - but with better-qualified participants.

Dennis said...

I'm part of an undergraduate genomics project that has students involved in both finishing and annotation. It is run by Sally Elgin at Washington University which involves over 20 other colleges. We just published an article describing it in Science (Oct 31 issue). The conclusion, of course, is that it works.

J.Kamesh said...

Truly amazing work Pascal et al ...

Talking about annotation, here is something of interest to all: a free Web 2.0 based service that helps discover new scientific relations across abstracts. It provides manually curated and annotated sentences for the keywords of your choice. It's free, check it out: http://www.xtractor.in

dianna.rose83@gmail.com said...

Many institutions limit access to their online information. Making this information available will be an asset to all.