Yes! 🎉🎉🎉

(but many projects are still work-in-progress)

The Rust bioinformatics ecosystem contains general, easy-to-use crates like bio, along with a plethora of crates for specific tasks.

Ecosystem

Libraries

Here you can find all sorts of bioinformatics crates that were created by the Rust community:

  • bio (repo | docs | crates.io) - Implementations of many useful bioinformatics data structures and algorithms, including pattern matching, alignment, suffix arrays, BWT, FM-Index, and parsers for common file types.

  • coitrees (repo | docs | crates.io) - Cache-oblivious interval tree implementation for very fast overlap queries of a static set of integer intervals, with genomic intervals in mind.

  • debruijn (repo | docs | crates.io) - De Bruijn graph construction & path compression libraries.

  • fastq-rs (repo | docs | crates.io) - A fast parser for FASTQ.

  • htsget-rs (repo | docs | crates.io) - GA4GH’s htsget implementation.

  • needletail (repo | docs | crates.io) - Fast FASTX parsing and k-mer methods in Rust.

  • niffler (repo) - Simple and transparent support for compressed files.

  • noodles (repo | docs | crates.io) - Pure Rust bioinformatics I/O libraries.

  • rust-boomphf (repo | docs | crates.io) - Fast and scalable minimal perfect hashing for massive key sets.

  • rust-htslib (repo | docs | crates.io) - Provides HTSlib bindings and a high-level Rust API for reading and writing BAM files.

  • seq_io (repo | docs | crates.io) - FASTA and FASTQ parsing and writing in Rust.

  • triple_accel (repo | docs | crates.io) - Rust edit distance routines accelerated using SIMD. Supports fast Hamming, Levenshtein, restricted Damerau-Levenshtein, etc. distance calculations and string search.
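
To give a feel for the kind of routine these libraries provide, here is a minimal, standard-library-only sketch of two of the distance measures named in the triple_accel entry. It is a plain scalar illustration, not SIMD-accelerated and not the API of triple_accel or any other crate listed here; the function names `hamming` and `levenshtein` are chosen for this example only.

```rust
/// Hamming distance: number of mismatching positions between two
/// equal-length byte strings. Returns `None` when the lengths differ.
/// (Hypothetical helper for illustration, not a crate API.)
fn hamming(a: &[u8], b: &[u8]) -> Option<usize> {
    if a.len() != b.len() {
        return None;
    }
    Some(a.iter().zip(b).filter(|(x, y)| x != y).count())
}

/// Levenshtein distance via the classic two-row dynamic program.
/// (Hypothetical helper for illustration, not a crate API.)
fn levenshtein(a: &[u8], b: &[u8]) -> usize {
    // prev[j] is the distance between the already-processed prefix of `a`
    // and the first j bytes of `b`; it starts as the cost of j insertions.
    let mut prev: Vec<usize> = (0..=b.len()).collect();
    let mut curr = vec![0; b.len() + 1];
    for (i, &ca) in a.iter().enumerate() {
        // Turning i + 1 bytes of `a` into an empty string takes i + 1 deletions.
        curr[0] = i + 1;
        for (j, &cb) in b.iter().enumerate() {
            let substitution = prev[j] + usize::from(ca != cb);
            let deletion = prev[j + 1] + 1;
            let insertion = curr[j] + 1;
            curr[j + 1] = substitution.min(deletion).min(insertion);
        }
        // The row just filled becomes the previous row for the next byte of `a`.
        std::mem::swap(&mut prev, &mut curr);
    }
    prev[b.len()]
}
```

For example, `hamming(b"ACGT", b"ACGA")` returns `Some(1)` and `levenshtein(b"kitten", b"sitting")` returns `3`. For real workloads, the crates above are the better choice.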

Tools

Genome Assembly

  • rust-mdbg (repo) - Minimizer-space de Bruijn graphs (mdBG) implementation for whole-genome assembly.

  • Peregrine (repo) - A genome assembler designed for long reads with good enough accuracy.

(Meta) Genomic analysis

  • bamtofastq (repo) - Tool for converting 10x BAMs back into FASTQ files.

  • bcl2fastr (repo) - Faster bcl2fastq implementation.

  • block-aligner (repo) - SIMD-accelerated library for computing global and X-drop affine gap penalty sequence-to-sequence or sequence-to-profile alignments using an adaptive block-based algorithm.

  • CoverM (repo) - Read coverage calculator for metagenomics.

  • finch (repo) - A genomic MinHashing implementation.

  • fmlrc2 (repo) - A tool for correcting long, noisy reads using short reads.

  • galah (repo) - More scalable dereplication for metagenome assembled genomes.

  • hyperex (repo) - Hypervariable region primer-based extractor for 16S rRNA and other SSU/LSU sequences.

  • mfqe (repo) - FASTA/FASTQ extractor for multiple sets of read names.

  • noodles-squab (repo) - Noodles squab performs gene expression quantification by counting the number of aligned records that intersect a set of features. Output can be the raw counts or normalized counts in TPM (transcripts per million) or FPKM (fragments per kilobase per million mapped reads).

  • perbase (repo) - A highly parallelized utility for generating per-base level metrics.

  • rasusa (repo) - Randomly subsample sequencing reads to a specified coverage.

  • sabreur (repo) - Fast, reliable and handy demultiplexing tool for fastx files.

  • smafa (repo) - Biological sequence aligner for pre-aligned sequences.

  • sourmash (repo) - Quickly search, compare, and analyze genomic and metagenomic data sets.

  • yacrd (repo) - Yet Another Chimeric Read Detector for long reads.

Transcriptomics analysis

  • alevin-fry (repo) - An efficient and flexible tool for processing single-cell sequencing data, currently focused on single-cell transcriptomics and feature barcoding.

  • mudskipper (repo) - A tool for projecting genomic alignments to transcriptomic coordinates.

Variant calling

  • echtvar (repo) - Really, truly rapid variant annotation and filtering.

  • prosolo (repo) - A variant caller for multiple displacement amplified DNA sequencing data from diploid single cells.

  • varlociraptor (repo) - Varlociraptor implements a novel, unified fully uncertainty-aware approach to genomic variant calling in arbitrary scenarios.

  • vartrix (repo) - Single-Cell Genotyping Tool.

Miscellaneous

  • cute nucleotides (repo) - Cute tricks for SIMD vectorized binary encoding and decoding of nucleotides, in Rust.

Contributing

You can use the editor on GitHub to contribute to this page. Feel free to list new bioinformatics crates or CLI tools! Of course, you can also contribute to the ecosystem by writing a new Rust crate.

Credits

This is an up-to-date, maintained fork of the original are we bio yet site, which has become unresponsive.