High-throughput life science instruments, especially DNA sequencers, are improving very rapidly. A DNA sequencer is now capable of generating enough data to cover the human genome dozens of times over in approximately one week. Sequencing has become a ubiquitous tool in the study of biology, genetics and disease. Today, because sequencing throughput is outpacing computer speed and storage capacity, the most crucial biological research bottlenecks are increasingly computational: computing, storage, labor, and power.

The laboratory’s goal is to make high-throughput life science data as useful as possible to everyday life scientists. We pursue this goal by:

  1. Developing methods and software tools that are efficient, allowing researchers to interact with datasets quickly and effectively. See: Bowtie, Bowtie 2, Kraken 2, Dashing, Vargas, Lighter, Samovar, Arioc, HISAT.  See also our read alignment review.
  2. Developing scalable tools that allow researchers to work with very large datasets, or large collections of datasets. See: Snaptron, Megadepth, Recount & recount2, Ascot, Intropolis, Rail-RNA, Rail-dbGaP, Myrna, Crossbow, Boiler.  See also our cloud computing review.
  3. Making output from our software as interpretable and free of bias as possible.  See: Qtip, FORGe, Reference Flow and the related LevioSAM tool.

We are passionate about teaching, both in the classroom and online, e.g., in our highly rated Algorithms for DNA Sequencing course on Coursera. We freely distribute various teaching materials, including lecture videos, screencasts, lecture notes, and programming notebooks. These span subjects from programming in C/C++ to applied algorithms and data structures in computational biology. See the Teaching Materials page for links and details.

The lab is located at the Johns Hopkins University Department of Computer Science.