Clone requests
A clone request should be submitted via the electronic form at the link below; this will automatically generate an email to ...
Data formats, Data management, Sequence data processing
A virtual FUSE filesystem for on-the-fly CRAM-to-BAM conversion.
Annotation
Acedb is a database system developed specifically for handling genome and bioinformatic data; it includes many powerful tools for the ...
Analysis
ADaM (Adaptive Daisy Model): an R package for discriminating between core fitness and context-specific fitness genes in large-scale gene essentiality ...
The Anopheles Gambiae 1000 Genome project is a global collaboration using whole genome deep sequencing to provide a high-resolution view of ...
FRont-End for Sequence COmparison - The aim is to develop a new visualisation tool that allows effective comparative genome sequence analysis.
Alien_hunter is an application for the prediction of putative Horizontal Gene Transfer (HGT) events with the implementation of Interpolated Variable Order ...
alleleIntegrator searches for allelic imbalances that represent cancer-defining somatic copy number changes to precisely identify single cancer cell transcriptomes.
AMELIA is a program that employs allele matching to analyse the effects of rare variants within a specific locus.
Analysis, Sequence data processing
Malaria amplicon-sequencing analysis
A software application that identifies antibiotic resistance genes by running local assemblies.
Accumulation of rare variants integrated and extended locus-specific test
Database software
ARNIE is an online database that integrates the extracellular protein interaction network generated in our lab using AVEXIS technology with spatiotemporal ...
Genome browser and annotation tool that allows visualisation of sequence features, next generation data and the results of analyses within the ...
Annotation, Visualisation
ACT is a Java application for displaying pairwise comparisons between two or more DNA sequences.
AutoCSA is a mutation detection program designed to detect small mutations (1-50 bases) in sequence traces.
Analysis, Data formats, Sequence data processing
This is a tool for comparing a BAM file to a CRAM file, after converting from one format to the ...
An interactive Java application for visualising read-alignment data stored in BAM files.
Tool
BASiCz - Blood Atlas of Single Cells in zebrafish
Data management
Client programs and API for use with iRODS (Integrated Rule-Oriented Data System).
Data management, Development
Desktop application
Analysis, Laboratory management
BD Sortware for the Influx stores all the index data inside the .fcs file; however, the common analysis tools (FlowJo, ...
Laboratory management
When management requested statistics on index sorting by our customers, we wrote this script. It will trawl through ...
This Python script applies predefined sort templates to the BD Influx workspace. It can also take designs from Excel and ...
BD Sortware for the Influx names all its files with non-descriptive names. When backing up and cleaning our PC we ...
Explore and visualise data from CRISPR base editing mutagenesis screens
A set of tools to analyse the output from TraDIS analyses.
Tools for early stage NGS alignment file processing including fast sorting and duplicate marking.
BioView was born out of the Standalone AutoCSA mutation detection project.
The Sanger Institute made available to researchers a number of BLAST indices. Most of these were for microbial pathogens but also ...
BlobToolKit is a software suite to aid researchers in identifying and isolating non-target data in draft and publicly available genome assemblies.
BOAT (Bayesian Overlap Analysis Tool) identifies variants that are associated with two traits and tests for enrichment (i.e. whether ...
Data formats
Common assembly format (CAF).
A fully-featured genome browser for cancer genetics
A combined functional annotation score of non-synonymous coding variants
CCRaVAT (Case-Control Rare Variant Analysis Tool) & QuTie (Quantitative Trait)
Analysis, Visualisation
A Hub for Preclinical Cancer Models - Annotation, Genomics & Functional Datasets
Comprehensive mapping of tissue cell architecture via integrated single cell and spatial transcriptomics (cell2location model)
Pipeline management, Sequence data processing, Visualisation
Chromoview is a tool for viewing clone tiling paths on chromosomes. It includes data on genomes maintained by the Genome Reference ...
A software toolkit to circularize genome assemblies.
Analysis, Sequence data processing, Statistical and population genetics
High-definition reconstruction of clonal composition
cnD is a program to detect copy number variants from short read sequence data.
COMET (Corrected Overlap and Marginal Enrichment Test) is a computationally efficient method for identifying SNP features (e.g. functional annotation) that ...
Data management, Development, Pipeline management, Sequence data processing
Cookie Monster is a tool for triaging the huge amounts of sequencing (and related) data by its metadata, from various ...
COSMIC, the "Catalogue Of Somatic Mutations In Cancer", is an expert-curated database encompassing the wide variety of somatic mutation mechanisms causing ...
Database software, Visualisation
COSMIC-3D provides interactive 3D visualisations of more than 8,000 human proteins displaying cancer mutations.
Analysis, Statistical and population genetics, Visualisation
This website shows how SARS-CoV-2 lineages have changed in frequency over time across England.
Laboratory management, Pipeline management
The Institute contributed a range of protocols on all aspects of large-scale, nation-wide virus sample and metadata collection, preparation, sequencing, analysis ...
CRAM is a more highly compressed alternative to the BAM and SAM DNA sequence alignment file formats.
A Java-based data mining tool for querying a sequence and annotation data source.
CRISPR GUARD is a method for reducing off-target editing by Cas9 and base editors.
CRISPR-Cas9 Resources. A clone request should be submitted via the electronic form at the link below; this will automatically generate an ...
An R package for unsupervised identification and correction of gene-independent cell responses to CRISPR-Cas9 targeting
Cross_genome is a package to build up genome assembly scaffolds using cross-species synteny.
Crumble - Lossy compression of DNA sequence quality values
Visualisation
Time course changes in gene expression in naive and memory CD4 T cells after activation
Distributed annotation service
DbCon provides a simple interface to DBCP, offers distributed pooling configuration and provides a clean layer of separation between Java code ...
DECIPHER is an interactive web-based database which incorporates a suite of tools designed to aid the interpretation of genomic variants. DECIPHER ...
DecoyPYrat - Fast Hybrid Decoy Sequence Database Creation for Proteomic Mass Spectrometry Analyses
Dindel is a program for calling small indels from short-read sequence data ('next generation sequence data'). It is currently designed to ...
D3E is a method for identifying differentially expressed genes from single-cell RNA-seq experiments. D3E compares the full distribution between two sample ...
An interactive Java application for generating circular and linear representations of genomes.
Analysis, Data management, Sequence data processing, Statistical and population genetics
The dNdScv tool uses dN/dS methods to quantify selection in cancer and somatic evolution
Development, Systems administration
docker-proxify creates a container environment in which outbound connections are transparently proxied through a proxy server.
Doublescan is a program for comparative ab initio prediction of protein coding genes in mouse and human DNA.
EMu is software for inferring the mutational signatures present in a number of cancer mutation sets.
The Ensembl project creates evidence-based annotation of genome sequences and integrates these data with other biological information. All of Ensembl' ...
Analysis, Database software, Statistical and population genetics, Visualisation
Mobile and Web application for free and easy data collection
Eponine is a probabilistic method for detecting transcription start sites (TSS) in mammalian genomic sequence, with good specificity and excellent positional ...
Resources available
Analysis, Sequence data processing, Visualisation
A graphical tool for visualising genotype intensity data in order to assess genotype calls as part of quality control procedures ...
Analysis, Gene finding, Ontology
The Exomiser is a Java program that finds potential disease-causing variants from whole-exome or whole-genome sequencing data.
A fast solution to cluster genetic sequences
To help with the operation of the flow cytometry facility we have written a range of Python scripts to automate ...
Gap5 is a DNA sequence assembly visualiser and editing tool.
Gene finding
GAZE is a tool for the integration of gene prediction signal and content sensor information into complete gene structures.
The GDSC database facilitates the identification of molecular features of cancers that predict response to anti-cancer drugs.
The aim of GENCODE is to annotate all evidence-based gene features in the human and mouse genomes at high accuracy. ...
A genome database containing the latest sequence data and annotation/curation for organisms sequenced by the Pathogen group.
Simple and fast gene-protein visualisation and editing.
Genevar is a platform of database and web services designed for data integration, analysis and visualization of SNP-gene associations in eQTL ( ...
Analysis, Data management, Database software
GoaT is a powerful data aggregator and portal to explore and report underlying data for the eukaryotic tree of life. It ...
Data management, Visualisation
The GenomeHubs project provides tools for visualising and interpreting genomic datasets
Analysis, Database software
Genome Assembly Evaluation Browser
This software converts GFF3 files from the annotation tool Prokka into a format that is suitable for submission to EMBL.
Genome-wide LInkage DisEquilibrium Repository and Search Engine
GRAFT (Genomic Rearrangement Assembly For Tumours) is an algorithm designed to time rearrangements.
Genealogies Unbiased By recomBinations In Nucleotide Sequences
Gubbins (Genealogies Unbiased By recomBinations In Nucleotide Sequences) can be used to identify recombination.
Genome Wide Annotation of VAriants - a functional annotation tool for non-coding sequence variation
Hail-based analysis pipelines for HG projects: pipelines for QC of genome-sequenced cohorts, and GWAS after QC. Designed to ...
Scans for likely coding regions using 6-mers but without deriving information from base composition.
Terraform and Ansible codebase to provision clusters (e.g. Hail/Spark) at Sanger. The framework can be used to provision ...
Data formats, Database software, Development, Ontology
hypr - Research and development of hypermedia information systems.
Image is a package of analysis algorithms for processing gel images from restriction digest fingerprinting experiments.
An interactive atlas of immune cell receptor interactions in the human body
Io_trace traces the system calls of a process and reports the amount of I/O performed to each file, socket or file ...
A clone request should be submitted via the electronic form at the link below; this will automatically generate an email ...
A de novo assembler designed specifically for read pairs sequenced at highly variable depth from RNA virus samples.
A program that analyses the effects of low frequency and rare variants on quantitative traits within a chromosomal region
Krocus is a software application that can predict MLST directly from uncorrected long reads.
A program for calculating FDR estimates with large datasets.
Pipeline management
A flexible front end to plate based pipelines in DNA Pipelines
Profile Hidden Markov Models (pHMMs) are a widely used tool for protein family research.
This is a result of the growing number of well characterised protein families in databases such as Pfam. By adding additional ...
LookSeq is a web-based application for alignment visualization, browsing and analysis of genome sequence data.
The Malaria Cell Atlas is an active project led by the Lawniczak lab to provide an interactive data resource of single ...
Margarita infers genealogies from population genotype data and uses these to map disease loci.
MascotPercolator is a software package that interfaces the proteomics spectral identification algorithm Mascot (Matrix Science) with Percolator, a well performing machine ...
The MEROPS database of proteolytic enzymes, their substrates and inhibitors provides a "one-stop shop" for researchers with an interest in proteolytic ...
METACARPA performs scalable meta-analysis between genetic association studies, both effect-size based and p-value based, while correcting for unknown sample overlap.
Metadata-check is a tool for verifying that the metadata in a BAM/CRAM file header is consistent with the metadata ...
Phylogenetics
Easily create interactive web visualisations using maps, phylogenetic trees and metadata
Sequence data processing
A Python pipeline for mitochondrial genome assembly from PacBio high fidelity reads, developed within the Darwin Tree of Life Project.
Data management, Laboratory management
Reagent creation and barcoding service
A software application for taking MLST databases from multiple locations and consolidating them in one place so that they can ...
Data management, Systems administration
mpistat is a tool for efficiently gathering file system statistics from distributed parallel file systems using a large number of ...
This web-based graphical tool allows for rapid generation of .fa files suitable for MPRA experiments.
A tool for the design of high-throughput massively parallel reporter assays (MPRAs)
The Multiple Motif Meta Analysis (M3A) method can be used to identify enriched and de-enriched DNA motifs from a collection ...
NestedMICA is a method for discovering over-represented short motifs in large sets of strings. Typical applications include finding candidate transcription factor ...
Analysis, Pipeline management
A bioinformatics analysis pipeline used for RNA sequencing data, written in the new Nextflow DSL2 language syntax, leveraging Nextflow modules.
NPG in Sequencing Informatics develops, maintains and runs tracking, analysis, QC and archival software to support the Illumina sequencing and ...
Analysis, Statistical and population genetics
Olorin is an interactive filtering tool for next generation sequencing data coming from the study of large complex disease pedigrees.
optiCall is a robust genotype-calling algorithm for calling rare, low-frequency and common variants from SNP microarray intensity data.
Optimist: Inference of positive selection
Otter is an interactive, graphical, genome annotation tool used by the Havana group to produce high-quality gene models. (Otterlace was renamed ...
Analysis, Annotation, Sequence data processing
Tools to automatically generate high-quality sequence by ordering contigs, closing gaps, correcting sequence errors and transferring annotation.
Obtains pairwise SNP distance matrices from multiple sequence alignments
A Bacterial Pangenome Analysis Pipeline
Analysis, Database software, Phylogenetics, Statistical and population genetics, Visualisation
Processing and Visualisation of Microbial Genome Sequences in Phylogenetic and Geographical Contexts
A parallel copy program for Lustre.
PEER is a collection of Bayesian approaches to infer hidden determinants and their effects from gene expression profiles using factor ...
The Pf3k project is a global collaboration using the latest sequencing technologies to provide a high-resolution view of the natural variation ...
The open access resource was established at the Wellcome Trust Sanger Institute in 1998. Its vision is to provide a ...
Database software, Gene finding, Ontology
PhenoDigm is an algorithm to prioritise disease gene candidates based on phenotype information. It incorporates the OWLSim mechanism to align ...
Phusion is a software package for assembling genome sequences from whole genome shotgun (WGS) reads.
Phusion2 is a pipeline for de novo genome assembly using NGS data. It is based upon a strategy called read ...
PhyloCanvas is an HTML5 phylogenetic tree viewer written entirely in browser-native JavaScript, using no external libraries, and uses the Canvas element. ...
PICNIC (Predicting Integral Copy Numbers In Cancer) is an algorithm designed to identify copy number segments and genotypes in cancer using ...
Pindel can detect breakpoints of large deletions, medium sized insertions, inversions, tandem duplications and other structural variants at single-base resolution ...
A k-mer-based approach for identifying plasmids.
Analysis, Annotation, Gene finding, Vector Resources
Plasmo GEM is a non-profit, open-access malaria research resource, providing tools for the manipulation of Plasmodium genomes, and using them ...
Annotation, Data formats, Visualisation
PoGo – Fast Mapping of Peptides to Genomic Coordinates for Proteogenomic Analyses
A tool for clustering genomes
A Python script to email the last user of the day when they are operating outside office hours using data ...
This very simple script takes the bookings or cancellations entered in a CSV file and books them ...
A script that queries the PPMS calendar system to produce a sorted HTML file of the day’s schedule.
The Parasite Genomics Group is working on the assembly, annotation and manual curation of several different reference genomes.
Analysis, Database software, Visualisation
Genetic screens to identify cancer dependencies and drug targets
Projector is a program for the comparative, homology based prediction of protein coding genes in mouse and human DNA.
ProServer is a very lightweight DAS server written in Perl.
Pseudogene inference from Loss of Constraint.
QUASR is a lightweight pipeline written to process and analyse next-generation sequencing (NGS) data from Illumina, 454, and Ion Torrent platforms. ...
QuickTree allows the reconstruction of phylogenies for very large protein families that would be infeasible using other popular methods.
A tool that evaluates the accuracy of a genome assembly using mapped paired end reads.
RetroSeq is for detecting non-reference TE insertions from Illumina paired-end whole-genome sequencing data.
The Rfam database is a collection of RNA families.
A software application for rapidly constructing pan genomes from large numbers of prokaryote samples.
A software application for fast, reference-free pseudo-phylogenomic trees from reads or contigs.
Analysis, Annotation, Data formats, Sequence data processing
SAMtools, BCFtools, and HTSlib are tools for manipulating sequence alignment (SAM, BAM, CRAM) and variant call (VCF and BCF) files.
A free genotype imputation and phasing service provided by the Wellcome Sanger Institute.
A tool for unsupervised projection of single cell RNA-seq data.
Simple Comparison of Outputs Program
Data formats, Sequence data processing
Scramble is a DNA sequencing file format conversion tool included as part of the Staden io_lib package.
Teaching material for the Hemberg group's course on computational analysis of single-cell RNA-seq data
Sequence enrichment analysis used in genome-wide association studies (GWAS)
A suite of tools for visualising sequence alignments
Web-based LIMS written in Ruby on Rails.
Data management, Sequence data processing
Serapis manages the ingestion of data into the iRODS-based human genetics data archive at Sanger.
A k-mer-based pipeline to identify the serotype from Illumina NGS reads for given references.
This code performs targeted archiving of the files arising from the analysis of Sanger sequencing projects.
SC3 is a method for unsupervised clustering of single-cell RNA-seq data. In addition to a graphical user-interface, SC3 provides additional ...
SMALT aligns DNA sequencing reads with a reference genome.
SMIS (Single Molecular Integrative Scaffolding): an assembly pipeline to improve scaffolds using Oxford Nanopore or PacBio long reads.
Sequence data processing, Visualisation
Browser-based quality control tool to expedite reviewing predicted variants in next generation sequencing files.
SNP-o-matic is a fast, stringent short-read mapping software.
Finds SNP sites from a multi-FASTA alignment file.
An R package for the estimation and removal of cell free mRNA contamination in droplet based single cell RNA-seq data.
A printing service for barcoded labels with a WYSIWYG interface
SSAHA2: Sequence Search and Alignment by Hashing Algorithm.
ssahaEST: Sequence Search and Alignment by Hashing Algorithm
ssahaSNP: Sequence Search and Alignment by Hashing Algorithm. ssahaSNP is a polymorphism detection tool. It detects homozygous SNPs and indels by ...
ssahaSNP is a tool for the detection of SNPs and short indels using the first generation capillary sequencing reads.
A LIMS system to support and guide users in the lab providing the Spatial Genomics services.
Data management, Pipeline management
Tabula is a tool for recording and analysing command line sessions.
Data formats, Development, Systems administration
Extract a tar archive, defusing any tarbombs.
Tarchecksum checks that all the files within a tar archive are identical to the files on disk in the directory ...
Data management, Pipeline management, Systems administration
teepot is a buffered version of the unix tee command.
Clone request submissions.
Tiffin is a database of predicted regulatory motifs and predicted functional sites ("motif instances") on genome sequences.
Generates and updates track hubs for use with the Ensembl Track Hub Registry.
Workflows and tools to investigate the genomic diversity of complex organisms.
TreeFam is a database composed of phylogenetic trees inferred from animal genomes. It provides orthology/paralogy predictions as well as the ...
Colocalisation plots between Treg QTLs and immune disease GWAS
Turbo SLoMo is a modified version of SLoMo, a PTM site localisation tool created at Birmingham University.
VEGA displays all of the manual annotation from the HAVANA team.
Data management, Database software
Browser-based Shiny frontend to view internal Lustre volume reports.
WTSI Genome Editing (WGE) is a website and database that provides tools for designing genome editing of human and mouse genomes ...
Analysis, Database software, Ontology, Visualisation
WormBase is the model organism database of C. elegans and other nematodes. The related resource, WormBase ParaSite, is a portal ...
The BD Influx has a very useful feature that automatically saves sort statistics for every run. The XDP does not ...
Development
Bearer token codec library.
ZMap is a genome browser written in C++ with the aim of providing fast access to high volume data.