25C3 - 1.4.2.3
25th Chaos Communication Congress
Nothing to hide
Speakers | |
---|---|
Magnus Manske |
Schedule | |
---|---|
Day | Day 2 (2008-12-28) |
Room | Saal 2 |
Start time | 20:30 |
Duration | 01:00 |
Info | |
---|---|
ID | 3044 |
Event type | lecture |
Track | Science |
Language used for presentation | en |
All your base(s) are belong to us
Dawn of the high-throughput DNA sequencing era
New DNA genotyping and sequencing technologies have recently advanced the possibilities for both mass and individual genomics by several orders of magnitude. The personal genome on DVD, genetic analysis of entire populations, and government DNA databases are but a few of the results of this process. The field is still accelerating, and the related computational challenges are enormous.
In 2000, the completion of the human genome sequence was announced, an effort that took decades, cost millions, and involved hundreds of scientists around the world. Subsequent advances in DNA sequencing technologies have propelled the possibilities in the field to scales unthinkable a mere decade ago. The price of sequencing an entire human genome is quickly approaching $1,000, and the work can be done by a few individuals with a single machine in a few days. Even so, more powerful sequencing technologies are under development and could simplify the process even further within the coming years.
Genotyping is a technology to quickly and cheaply analyze a DNA sample for potential SNPs (single nucleotide polymorphisms, aka point mutations) on a single plate (chip). Today's DNA chips can check for one million SNPs in a cheap and automated fashion. This makes it possible to compare groups of thousands of people for specific markers. Applications for this technology range from finding resistance genes, through tracing evolutionary relationships, to separating an individual's DNA from a mixture of thousands of people.
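As a rough illustration of what comparing two groups for a single marker can look like, here is a minimal Python sketch. The genotypes are made up, the function names are hypothetical, and the plain Pearson chi-square statistic on allele counts is an assumed choice for illustration, not necessarily the method covered in the talk.

```python
def allele_counts(genotypes, allele):
    """Return (copies of `allele`, total alleles) for diploid genotypes such as 'AG'."""
    copies = sum(g.count(allele) for g in genotypes)
    return copies, 2 * len(genotypes)


def chi_square_2x2(a, b, c, d):
    """Pearson chi-square statistic for the table [[a, b], [c, d]], no continuity correction."""
    n = a + b + c + d
    denom = (a + b) * (c + d) * (a + c) * (b + d)
    if denom == 0:
        # A row or column is empty, so the table carries no information.
        return 0.0
    return n * (a * d - b * c) ** 2 / denom


def snp_association(cases, controls, allele):
    """Compare how often `allele` occurs in two groups at one SNP."""
    case_alt, case_total = allele_counts(cases, allele)
    ctrl_alt, ctrl_total = allele_counts(controls, allele)
    return chi_square_2x2(
        case_alt, case_total - case_alt,
        ctrl_alt, ctrl_total - ctrl_alt,
    )


# Made-up genotypes at one SNP for two tiny groups (illustration only).
cases = ["GG", "AG", "GG", "AG", "GG"]
controls = ["AA", "AG", "AA", "AA", "AG"]
stat = snp_association(cases, controls, "G")
print(f"chi-square statistic: {stat:.2f}")
```

A chip with one million SNPs would repeat a test like this once per marker, so any real analysis also has to correct for the very large number of comparisons.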
Both technologies require new approaches to computation and storage technology. Analysis is performed on massive computer clusters with thousands of CPUs. Data storage requirements are measured in petabytes, pushing hard disk storage to the limit.
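To give a feel for how storage reaches the petabyte range, the following back-of-the-envelope Python sketch estimates raw read storage. Every input is an assumption for illustration: a genome of roughly 3.2 billion bases, 30-fold coverage, two bytes per sequenced base (one for the base call, one for its quality score), and a hypothetical project of 10,000 genomes. None of these figures come from the talk.

```python
def raw_read_bytes(genome_bases, coverage, bytes_per_base, n_genomes):
    """Estimate uncompressed storage for sequencing reads, in bytes."""
    return genome_bases * coverage * bytes_per_base * n_genomes


# All values below are illustrative assumptions, not measured figures.
GENOME_BASES = 3_200_000_000   # approximate size of one human genome
COVERAGE = 30                  # assumed: each base is read about 30 times
BYTES_PER_BASE = 2             # assumed: base call plus quality score
N_GENOMES = 10_000             # hypothetical project size

total_bytes = raw_read_bytes(GENOME_BASES, COVERAGE, BYTES_PER_BASE, N_GENOMES)
total_petabytes = total_bytes / 10**15  # decimal petabytes

print(f"{total_petabytes:.2f} PB")
```

Under these assumptions a single genome needs about 192 GB of raw reads, and the full project lands just under 2 PB before any intermediate files or backups are counted.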
In my talk, I will describe how we got here, how we handle the technological challenges involved, and what the future might hold.