High-throughput life science instruments, especially DNA sequencers, are improving rapidly. A single DNA sequencer can now generate enough data to cover the human genome dozens of times over in about one week. Sequencing has become a ubiquitous tool in the study of biology, genetics, and disease. Because sequencing throughput is now outpacing gains in computer speed and storage capacity, the most pressing bottlenecks in biological research are increasingly computational: computing, storage, labor, and power.
The laboratory’s goal is to make high-throughput life science data as useful as possible to everyday life scientists. We pursue this goal by:
- Developing efficient methods and software tools that allow researchers to interact with datasets quickly and effectively. See: Bowtie, Bowtie 2, Lighter, Snaptron, Arioc, HISAT.
- Developing scalable tools that allow researchers to work with very large datasets, or large collections of datasets. See: Snaptron, Recount & recount2, Intropolis, Rail-RNA, Rail-dbGaP, Myrna, Crossbow, Boiler.
- Making output from our software as interpretable as possible, meaning that when a life scientist looks at a result, he or she should be able to understand what conclusion was reached, why it was reached, and what degree of confidence to have in it. See: Qtip.
The lab draws on approaches from computer science (algorithms, text indexing, and high-performance computing, especially cloud computing) and from statistics to create high-impact software tools that benefit the wide community of life scientists who rely on these data for their research.