High-throughput life science instruments, especially DNA sequencers, are improving very rapidly. A DNA sequencer is now capable of generating enough data to cover the human genome dozens of times over in approximately one week. Sequencing has become a ubiquitous tool in the study of biology, genetics and disease. Today, because sequencing throughput is outpacing computer speed and storage capacity, the most crucial biological research bottlenecks are increasingly computational: computing, storage, labor, and power.
The laboratory’s goal is to make high-throughput life science data as useful as possible to everyday life scientists. We pursue this goal by:
- Developing methods and software tools that are efficient, allowing researchers to interact with datasets quickly and effectively. See: Bowtie, Bowtie 2, Dashing, FORGe, Lighter, Samovar, Arioc, HISAT. See also our read alignment review.
- Developing scalable tools that allow researchers to work with very large datasets, or large collections of datasets. See: Snaptron, Recount & recount2, Ascot, Intropolis, Rail-RNA, Rail-dbGaP, Myrna, Crossbow, Boiler. See also our cloud computing review.
- Making output from our software as interpretable as possible, meaning that a life scientist looking at a result should be able to understand what conclusion was reached, why it was reached, and what degree of confidence to have in it. See: Qtip.
We also have a passion for teaching, both in the classroom and online, for example in our highly rated Algorithms for DNA Sequencing course on the Coursera platform. We freely distribute many kinds of teaching materials, including lecture videos, screencasts, lecture notes, and programming “notebooks.” These materials span subjects from programming in C/C++ to applied algorithms and data structures in computational biology. See the Teaching Materials page for links and details.