

iVar is a computational package that contains functions broadly useful for viral amplicon-based sequencing. Additional tools for metagenomic sequencing are actively being incorporated into iVar. While each of these functions can be performed with existing tools, iVar brings together the functionality from multiple tools needed to call intrahost single-nucleotide variants (iSNVs) and consensus sequences from viral sequencing data across multiple replicates. We implemented the following functions in iVar: (1) trimming of primers and low-quality bases, (2) consensus calling, (3) variant calling of both iSNVs and insertions/deletions, and (4) identifying mismatches to primer sequences and excluding the corresponding reads from alignment files.
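To illustrate what consensus calling involves, here is a minimal sketch of majority-rule consensus from per-position base counts. It is not iVar's implementation: the function name `call_consensus`, its parameters, and the default thresholds are hypothetical, and positions where no single base reaches the frequency threshold are simply written as `N` (a simplification of how iVar itself handles mixed positions).

```python
from collections import Counter
from typing import Sequence


def call_consensus(
    pileup: Sequence[Counter],
    min_freq: float = 0.75,
    min_depth: int = 10,
    low_cov_char: str = "N",
) -> str:
    """Hypothetical majority-rule consensus caller.

    pileup[i] maps a base ('A', 'C', 'G', 'T') to its read count at position i.
    """
    consensus = []
    for counts in pileup:
        depth = sum(counts.values())
        if depth < min_depth:
            # Too little coverage to make a call at this position.
            consensus.append(low_cov_char)
            continue
        base, count = counts.most_common(1)[0]
        # Call the most frequent base only if it reaches the frequency
        # threshold; otherwise mark the position as ambiguous.
        consensus.append(base if count / depth >= min_freq else "N")
    return "".join(consensus)


pileup = [Counter(A=20), Counter(C=2, T=18), Counter(G=3)]
print(call_consensus(pileup))  # third position has depth 3 < 10, so it is 'N'
```

Raising `min_freq` makes the consensus more conservative at mixed positions, while `min_depth` guards against calling bases from sparse coverage.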

Freyja is a tool to estimate the relative abundance of SARS-CoV-2 lineages from sequencing data of mixed-lineage virus samples, such as wastewater. Freyja builds on iVar and consists of two main steps: (1) SNV frequency estimation and (2) depth-weighted demixing using constrained least absolute deviation regression. Additional post-processing methods are available for aggregating and visualizing the output.
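The demixing step can be illustrated with a deliberately small case. The sketch below assumes only two candidate lineages, each described by a mutation profile (the expected frequency of each mutation in that lineage), and finds the lineage 1 abundance `p` that minimizes the depth-weighted sum of absolute deviations between predicted and observed mutation frequencies, subject to the abundances being non-negative and summing to one. Freyja solves the general many-lineage problem with a convex optimizer; the two-lineage restriction, the use of raw depth as the weight, and the function name `demix_two_lineages` are assumptions made for this example.

```python
from typing import Sequence


def demix_two_lineages(
    profile_1: Sequence[float],
    profile_2: Sequence[float],
    observed: Sequence[float],
    depth: Sequence[float],
) -> float:
    """Return the abundance p of lineage 1 (lineage 2 gets 1 - p).

    Minimizes sum_i depth[i] * |p*profile_1[i] + (1-p)*profile_2[i] - observed[i]|
    over 0 <= p <= 1: a depth-weighted least absolute deviation fit.
    """

    def loss(p: float) -> float:
        return sum(
            w * abs(p * a1 + (1 - p) * a2 - b)
            for a1, a2, b, w in zip(profile_1, profile_2, observed, depth)
        )

    # The loss is convex and piecewise linear in p, so its minimum lies at an
    # endpoint of [0, 1] or where one site's residual crosses zero.
    candidates = [0.0, 1.0]
    for a1, a2, b in zip(profile_1, profile_2, observed):
        if a1 != a2:
            p = (b - a2) / (a1 - a2)
            if 0.0 <= p <= 1.0:
                candidates.append(p)
    return min(candidates, key=loss)
```

Because the objective uses absolute rather than squared deviations, a single low-depth site with an outlying frequency shifts the estimate far less than it would under least squares.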

Bjorn is a pipeline to count mutations from a given set of genomes in parallel. The pipeline is currently used to count substitutions and deletions in all SARS-CoV-2 genomes available on GISAID (over 4 million as of October 2021). The pipeline consists of the following steps: (1) download SARS-CoV-2 genomes via the GISAID API, (2) divide the sequences into chunks of 10,000 and run the downstream steps in parallel, (3) align these sequences using minimap2 (Li, 2018), (4) convert the alignment into a FASTA file using datafunk, (5) count substitutions and deletions from this alignment, (6) standardize and filter the metadata: country, division, location (using shapefiles from GADM), Pango lineage, date of collection, and date of submission, and (7) combine the results from all chunks and convert them to a JSONL object. The final JSONL output can be loaded into a database such as Elasticsearch.
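Steps (2) and (5) can be sketched as follows, assuming the sequences have already been aligned and projected onto reference coordinates (so every sequence has the reference's length and `-` marks a deleted base). Bjorn itself uses minimap2 and datafunk for that part. The function names, the per-position deletion keys, and the `C2T`-style labels are illustrative choices, not Bjorn's output format.

```python
from collections import Counter
from typing import Iterator, Sequence


def chunks(items: Sequence[str], size: int = 10_000) -> Iterator[Sequence[str]]:
    """Yield successive slices of at most `size` sequences."""
    for start in range(0, len(items), size):
        yield items[start:start + size]


def count_mutations(reference: str, aligned: Sequence[str]) -> Counter:
    """Count substitutions and per-position deletions relative to `reference`."""
    counts: Counter = Counter()
    for seq in aligned:
        for pos, (ref_base, base) in enumerate(zip(reference, seq), start=1):
            if base == "-":
                counts[f"del:{pos}"] += 1
            elif base in "ACGT" and base != ref_base:
                # Ambiguous calls such as 'N' are skipped.
                counts[f"{ref_base}{pos}{base}"] += 1
    return counts


def count_all(reference: str, sequences: Sequence[str], chunk_size: int = 10_000) -> Counter:
    """Merge per-chunk counts; each chunk is independent of the others."""
    total: Counter = Counter()
    # A pipeline would fan these chunks out across workers; here they are
    # simply processed one after another and merged.
    for chunk in chunks(sequences, chunk_size):
        total.update(count_mutations(reference, chunk))
    return total
```

Because per-chunk counts are merged by simple addition, the result does not depend on how the sequences are split, which is what makes the chunked, parallel design safe.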

An R package to access the API. The package includes functions that allow users to easily retrieve data from the API for downstream analysis and visualization. Users can retrieve data by specifying an administrative level (World Bank region, country, state/province, metropolitan area, county), location name(s), or by constructing a custom query with additional parameters. The package also allows users to directly plot metrics of interest for the specified locations. The API also includes the geometric features of each queried location, allowing users to quickly create maps to visualize epidemiological data.

BEAST is a cross-platform program for Bayesian analysis of molecular sequences using Markov chain Monte Carlo (MCMC). It is entirely oriented towards rooted, time-measured phylogenies inferred using strict or relaxed molecular clock models. It can be used as a method of reconstructing phylogenies but is also a framework for testing evolutionary hypotheses without conditioning on a single tree topology. BEAST uses MCMC to average over tree space, so that each tree is weighted in proportion to its posterior probability. We include a simple-to-use user-interface program for setting up standard analyses and a suite of programs for analysing the results.
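The idea that MCMC weights each tree in proportion to its posterior probability can be seen in a toy Metropolis-Hastings sampler. Here the "trees" are just three labels with hypothetical, unnormalized posterior weights; BEAST never enumerates trees like this, but instead proposes local changes to the tree and parameters and evaluates likelihood times prior. The function name and weights are assumptions for illustration.

```python
import random
from collections import Counter
from typing import Dict, Hashable


def metropolis_hastings(
    weights: Dict[Hashable, float], steps: int, seed: int = 0
) -> Dict[Hashable, float]:
    """Sample states in proportion to `weights` (needs at least two states).

    Returns the fraction of iterations the chain spent in each state.
    """
    rng = random.Random(seed)
    states = list(weights)
    current = states[0]
    visits: Counter = Counter()
    for _ in range(steps):
        # Symmetric proposal: pick a different state uniformly at random.
        proposal = rng.choice([s for s in states if s != current])
        # Accept with probability min(1, posterior ratio); only the ratio is
        # needed, so the weights never have to be normalized.
        if rng.random() < weights[proposal] / weights[current]:
            current = proposal
        visits[current] += 1
    return {s: visits[s] / steps for s in states}


freqs = metropolis_hastings({"tree_1": 1.0, "tree_2": 2.0, "tree_3": 7.0}, steps=200_000)
print(freqs)  # close to 0.1, 0.2 and 0.7
```

The time the chain spends in each state approximates that state's posterior probability, which is why summaries computed over MCMC samples average over tree space rather than conditioning on one topology.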

RBeast is an R package for working with files from BEAST and BEAST2. This package consists of four main components: (1) beautier, to create BEAST2 input (.xml) files; (2) beastier, to run BEAST2; (3) tracerer, to parse BEAST2 output files (.log, .trees, etc.); and (4) BEASTmasteR, for tip-dating analyses using fossils as dated terminal taxa.
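The kind of parsing that tracerer performs on trace logs can be sketched in Python. This is not tracerer's API. It assumes a tab-separated log in which lines starting with `#` are comments, the first remaining line is the header, and all other cells are numeric. The function name, the 10% burn-in default, and the column names in the usage example are hypothetical.

```python
from typing import Dict, List


def read_trace_log(text: str, burn_in: float = 0.1) -> Dict[str, List[float]]:
    """Parse a tab-separated MCMC trace log into columns, dropping burn-in."""
    lines = [ln for ln in text.splitlines() if ln.strip() and not ln.startswith("#")]
    header = lines[0].split("\t")
    rows = [[float(cell) for cell in ln.split("\t")] for ln in lines[1:]]
    # Discard the first fraction of samples, taken before the chain converged.
    keep = rows[int(len(rows) * burn_in):]
    return {name: [row[i] for row in keep] for i, name in enumerate(header)}


log = "# hypothetical log\nSample\tposterior\tclockRate\n" + "\n".join(
    f"{i * 1000}\t{-100.0 - i}\t0.001" for i in range(10)
)
trace = read_trace_log(log)
print(len(trace["posterior"]))  # 9 samples remain after 10% burn-in
```

Returning one list per column makes it straightforward to summarize or plot each parameter's posterior sample separately.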

BEAGLE is a high-performance library that can perform the core calculations at the heart of most Bayesian and Maximum Likelihood phylogenetics packages. It can make use of highly parallel processors such as the graphics processing units (GPUs) found in many PCs. The project involves an open API and fast implementations of a library for evaluating phylogenetic likelihoods (continuous-time Markov processes) of biomolecular sequence evolution. The aim is to provide high-performance evaluation ‘services’ to a wide range of phylogenetic software, both Bayesian samplers and Maximum Likelihood optimizers. This allows these packages to take advantage of optimized hardware such as GPUs.
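The core calculation that libraries like BEAGLE accelerate is the per-site phylogenetic likelihood, computed with Felsenstein's pruning algorithm from partial likelihood vectors at each node. The pure-Python sketch below evaluates one site on a small, hypothetical rooted tree `((A:0.1,B:0.2):0.05,C:0.3)` under the Jukes-Cantor (JC69) model. The tree, branch lengths, model choice, and function names are assumptions. BEAGLE evaluates the same kind of recursion over many sites and richer models in optimized, parallel code.

```python
import math

BASES = "ACGT"


def jc69_p(i: int, j: int, t: float) -> float:
    """JC69 probability of base i becoming base j along a branch of length t."""
    e = math.exp(-4.0 * t / 3.0)
    return 0.25 + 0.75 * e if i == j else 0.25 - 0.25 * e


def leaf_partials(base: str) -> list:
    """Partial likelihoods at a leaf: 1 for the observed base, 0 otherwise."""
    return [1.0 if b == base else 0.0 for b in BASES]


def combine(children: list) -> list:
    """Partials at a parent from (child_partials, branch_length) pairs."""
    out = []
    for i in range(4):
        value = 1.0
        for partials, t in children:
            # Sum over the child's possible states, then multiply across children.
            value *= sum(jc69_p(i, j, t) * partials[j] for j in range(4))
        out.append(value)
    return out


def site_likelihood(a: str, b: str, c: str,
                    t_a: float = 0.1, t_b: float = 0.2,
                    t_ab: float = 0.05, t_c: float = 0.3) -> float:
    """Likelihood of one alignment column (a, b, c) on ((A,B),C)."""
    ab = combine([(leaf_partials(a), t_a), (leaf_partials(b), t_b)])
    root = combine([(ab, t_ab), (leaf_partials(c), t_c)])
    # Weight root states by the JC69 equilibrium frequencies (all 1/4).
    return sum(0.25 * x for x in root)


print(site_likelihood("A", "A", "A"))
```

Each internal node's partials depend only on its children's partials, so the work for different sites and independent subtrees can be done in parallel, which is the structure that GPU implementations exploit.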

Tracer is a software package for visualising and analysing the MCMC trace files generated through Bayesian phylogenetic inference. Tracer provides kernel density estimation, multivariate visualisation, demographic trajectory reconstruction, conditional posterior distribution summaries, and more. Tracer can read output files from MrBayes, BEAST, BEAST2, RevBayes, Migrate, LAMARC and possibly other MCMC programs from other domains.
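Kernel density estimation turns a column of posterior samples into a smooth density curve. The sketch below is a plain Gaussian KDE with the normal-reference rule-of-thumb bandwidth `h = 1.06 * sd * n**(-1/5)`. Tracer's own bandwidth choice may differ; the function name and bandwidth rule are assumptions for illustration, and the input would typically be post-burn-in samples of one parameter.

```python
import math
from statistics import stdev
from typing import Callable, Sequence


def gaussian_kde(samples: Sequence[float]) -> Callable[[float], float]:
    """Return a Gaussian kernel density estimate built from `samples`."""
    n = len(samples)
    # Rule-of-thumb bandwidth; assumes at least two distinct samples.
    h = 1.06 * stdev(samples) * n ** (-0.2)
    norm = 1.0 / (n * h * math.sqrt(2.0 * math.pi))

    def density(x: float) -> float:
        # Average of Gaussian bumps of width h centred on each sample.
        return norm * sum(math.exp(-0.5 * ((x - s) / h) ** 2) for s in samples)

    return density


f = gaussian_kde([-1.0, -0.5, 0.5, 1.0])
print(f(0.0))
```

A larger bandwidth gives a smoother but more biased curve, and a smaller one follows individual samples more closely, which is why a data-driven default such as this rule of thumb is usually used.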

Clonify is a software package that is able to perform unseeded lineage assignment on very large sets of antibody sequences. Defining the dynamics and maturation processes of antibody clonal lineages is crucial to understanding the humoral response to infection and immunization. Comprehensive study of antibody lineages has been limited by the lack of an accurate clonal lineage assignment algorithm capable of operating on next-generation sequencing datasets.

AbStar is a software package that performs V(D)J assignment and primary annotation of antibody and TCR sequencing data. It scales from a single sequence to billions of sequences and conforms to AIRR data formatting standards.

AbUtils / ab[x] is a software package that provides programming primitives and utilities for interactive sequence processing, analysis, and visualization of BCR and TCR sequencing data. It conforms to AIRR data formatting standards.

systemsseRology is a collection of functions used to analyse multivariate systems serology data.


A data explorer that we developed to browse and analyze Ebola antibody data generated by Erica Saphire’s VIC consortium.

A network analysis that we developed to investigate interactions between data in the Ebola antibody dataset generated by Erica Saphire’s VIC consortium.