Agile sequence analysis

Desktop programs for analyzing massively parallel sequence data

The Agile suite of programs allows the rapid analysis of clonal sequencing data, aligned to the human genome, with a view to identifying disease-causing sequence variants. It comprises several component programs, each with its own user guide and download page, as described below:

 


AgileVariantViewer identifying a PXDN variant that causes congenital cataract, corneal opacity, and glaucoma (citation).

 


FASTQ and FASTA file manipulation and read quality filtering

AgilePairedEndReadsCombiner

Combines overlapping paired-end reads into a single read and quality score string.
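
The overlap-merging idea can be illustrated with a minimal Python sketch. It is not AgilePairedEndReadsCombiner's actual algorithm. It assumes that read 2 must be reverse-complemented onto read 1's strand, that the sequenced fragment is at least as long as each read, and that Phred+33 quality characters can be compared directly; `merge_pair`, `min_overlap` and `max_mismatch_frac` are hypothetical names.

```python
from typing import Optional, Tuple

# Complement table for A/C/G/T/N; used to put read 2 on read 1's strand.
COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def reverse_complement(seq: str) -> str:
    return seq.translate(COMPLEMENT)[::-1]


def merge_pair(r1: str, q1: str, r2: str, q2: str,
               min_overlap: int = 10,
               max_mismatch_frac: float = 0.1) -> Optional[Tuple[str, str]]:
    """Merge two overlapping paired-end reads (hypothetical helper).

    Returns (sequence, quality) for the combined read, or None when no
    overlap of at least min_overlap bases passes the mismatch threshold.
    """
    r2rc, q2r = reverse_complement(r2), q2[::-1]
    # Try the longest overlap first: suffix of r1 against prefix of r2rc.
    for olen in range(min(len(r1), len(r2rc)), min_overlap - 1, -1):
        start = len(r1) - olen
        mismatches = sum(a != b for a, b in zip(r1[start:], r2rc[:olen]))
        if mismatches > max_mismatch_frac * olen:
            continue
        seq, qual = list(r1[:start]), list(q1[:start])
        for i in range(olen):
            # Inside the overlap keep whichever base has the higher quality.
            if q1[start + i] >= q2r[i]:
                seq.append(r1[start + i])
                qual.append(q1[start + i])
            else:
                seq.append(r2rc[i])
                qual.append(q2r[i])
        seq.extend(r2rc[olen:])
        qual.extend(q2r[olen:])
        return "".join(seq), "".join(qual)
    return None
```

For example, `merge_pair("AAACCCGGGTTT", "I" * 12, "GTACGTAAACCC", "5" * 12, min_overlap=5)` rebuilds the 18-base fragment `AAACCCGGGTTTACGTAC` from two 12-base reads that overlap by 6 bases.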

AgileQualityFilter

Filters out low-quality sequences and exports the remaining data in either FASTA or FASTQ format. If the reads are tagged at the 5′ end with sample identifiers, the program also sorts the reads by tag and removes the tags from the output data.
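
A minimal sketch of quality filtering followed by 5′-tag sorting is shown below. The mean-Phred threshold, exact-prefix tag matching and the discarding of untagged reads are assumptions for illustration, not AgileQualityFilter's documented criteria; all function names are hypothetical.

```python
from typing import Dict, Iterable, List, Tuple

Read = Tuple[str, str, str]  # (name, sequence, Phred+33 quality string)


def mean_quality(qual: str, offset: int = 33) -> float:
    """Mean Phred score of a quality string (Phred+33 encoding assumed)."""
    return sum(ord(c) - offset for c in qual) / len(qual)


def filter_and_demultiplex(reads: Iterable[Read], tags: Dict[str, str],
                           min_mean_q: float = 20.0) -> Dict[str, List[Read]]:
    """Drop low-quality reads, sort the rest by 5'-end tag and trim the tag.

    Reads with an empty quality string or no matching tag are discarded.
    """
    by_sample: Dict[str, List[Read]] = {sample: [] for sample in tags.values()}
    for name, seq, qual in reads:
        if not qual or mean_quality(qual) < min_mean_q:
            continue
        for tag, sample in tags.items():
            if seq.startswith(tag):
                n = len(tag)
                by_sample[sample].append((name, seq[n:], qual[n:]))
                break
    return by_sample


def to_fastq(read: Read) -> str:
    name, seq, qual = read
    return f"@{name}\n{seq}\n+\n{qual}\n"


def to_fasta(read: Read) -> str:
    name, seq, _ = read
    return f">{name}\n{seq}\n"
```

Writing either `to_fastq` or `to_fasta` output per sample gives the two export formats; FASTA simply drops the quality string.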

SAM file manipulation

AgileSamFileSorter

Sorts the aligned sequence reads in an unordered SAM file by chromosome number and position.
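
Sorting "by chromosome number" needs a natural chromosome order (chr2 before chr10), which plain string sorting does not give. The in-memory Python sketch below assumes UCSC-style `chr1`…`chr22`, `chrX`, `chrY`, `chrM` names; it is an illustration, not AgileSamFileSorter's implementation, and the helper names are hypothetical.

```python
import re
from typing import List, Tuple

# Rank for non-numeric human chromosomes; other contigs sort after these.
SPECIAL = {"X": 23, "Y": 24, "M": 25, "MT": 25}


def chrom_rank(rname: str) -> Tuple[int, int, str]:
    """Natural order: chr1..chr22, chrX, chrY, chrM, then others alphabetically."""
    name = re.sub(r"^chr", "", rname)
    if name.isdigit():
        return (0, int(name), "")
    if name in SPECIAL:
        return (0, SPECIAL[name], "")
    return (1, 0, name)


def sort_key(line: str) -> Tuple[Tuple[int, int, str], int]:
    fields = line.split("\t")
    # SAM column 3 is RNAME, column 4 is the 1-based leftmost POS.
    return chrom_rank(fields[2]), int(fields[3])


def sort_sam(lines: List[str]) -> List[str]:
    """Keep header lines (starting with '@') first and sort alignment lines."""
    header = [line for line in lines if line.startswith("@")]
    body = [line for line in lines if line and not line.startswith("@")]
    return header + sorted(body, key=sort_key)
```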

AgileSamFileMerger

Combines two or more ordered SAM files to create a single ordered file.
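
Because each input is already ordered, the files can be combined with a k-way merge rather than a full re-sort. This Python sketch uses the standard library's `heapq.merge`; it assumes header lines have been removed from the inputs and that the inputs were sorted with the same chromosome order as `position_key` (a hypothetical helper), so it is not AgileSamFileMerger's own code.

```python
import heapq
import re
from typing import Iterable, Iterator, Tuple

RANK = {"X": 23, "Y": 24, "M": 25, "MT": 25}


def position_key(line: str) -> Tuple[int, int]:
    """(chromosome rank, position) of a SAM alignment line.

    Assumes chr1..chr22/X/Y/M naming; unrecognised contigs share rank 99.
    """
    fields = line.split("\t")
    name = re.sub(r"^chr", "", fields[2])
    rank = int(name) if name.isdigit() else RANK.get(name, 99)
    return rank, int(fields[3])


def merge_sorted_sam(*bodies: Iterable[str]) -> Iterator[str]:
    """Lazily merge alignment lines from inputs that are each already sorted."""
    return heapq.merge(*bodies, key=position_key)
```

Since `heapq.merge` pulls one line at a time from each input, open file objects can be passed directly without loading whole files into memory.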

Autozygosity mapping with exome data

AgileGenotyper

Creates a pseudo-microarray SNP genotyping file from an ordered SAM file containing exon sequence data. The file will contain the genotype data at over 0.5 million SNP sites previously identified by the 1000 Genomes project. Such a file can then be used as a data source for a mapping program designed for analyzing Affymetrix microarray SNP data (see Genetic mapping).
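
One way to turn read data into microarray-style genotypes is to count reference and alternate bases at each known SNP site and assign AA, AB or BB from the alternate-allele fraction. The Python sketch below does this; the depth and fraction thresholds, the pileup input format and the function names are assumptions, not AgileGenotyper's documented parameters.

```python
from collections import Counter
from typing import Dict, Tuple

Site = Tuple[str, int]  # (chromosome, 1-based position)


def call_genotype(ref_count: int, alt_count: int, min_depth: int = 10,
                  het_low: float = 0.2, het_high: float = 0.8) -> str:
    """Affymetrix-style AA/AB/BB call from allele read counts (thresholds assumed)."""
    depth = ref_count + alt_count
    if depth == 0 or depth < min_depth:
        return "NoCall"
    alt_fraction = alt_count / depth
    if alt_fraction < het_low:
        return "AA"
    if alt_fraction > het_high:
        return "BB"
    return "AB"


def genotype_known_sites(pileups: Dict[Site, str],
                         known_snps: Dict[Site, Tuple[str, str]]) -> Dict[Site, str]:
    """pileups maps a site to the bases observed there; known_snps maps it to (ref, alt)."""
    calls = {}
    for site, (ref, alt) in known_snps.items():
        counts = Counter(pileups.get(site, ""))
        calls[site] = call_genotype(counts[ref], counts[alt])
    return calls
```

Known sites with no coverage come out as `NoCall`, which matters for exome data: most of the 1000 Genomes sites fall outside the captured exons.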

AgileMultiIdeogram

AgileMultiIdeogram displays autozygous regions from multiple individuals, identified in Affymetrix SNP microarray genotype data and/or exome variant data, against a circular ideogram of the human autosomal genome. See the AgileMultiIdeogram web page.

AgileROH

AgileROH ports the core functionality of AgileMultiIdeogram to C++ so that it can be run as a command-line application on both Windows and Linux. The example programs demonstrate how to use the source code either to identify the autozygous regions and export them to a text file, or to filter a VCF file to remove all variants outside the autozygous regions. See the AgileROH web page.
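
The two uses described above can be sketched in Python with a simple run-of-homozygosity rule: report runs of consecutive homozygous calls that are long enough, then keep only variants inside those runs. This is not AgileROH's algorithm or API; the thresholds and names are hypothetical, and a practical tool would usually tolerate the occasional heterozygous call caused by genotyping error, which this strict version does not.

```python
from typing import List, Tuple


def find_autozygous_regions(calls: List[Tuple[int, bool]], min_snps: int = 50,
                            min_length: int = 1_000_000) -> List[Tuple[int, int]]:
    """Return (start, end) of runs of consecutive homozygous calls.

    calls: (position, is_homozygous) for one chromosome, sorted by position.
    A run is reported if it spans at least min_snps calls and min_length bp.
    """
    regions: List[Tuple[int, int]] = []
    run: List[int] = []
    for pos, homozygous in calls + [(-1, False)]:  # sentinel closes the last run
        if homozygous:
            run.append(pos)
            continue
        if len(run) >= min_snps and run[-1] - run[0] >= min_length:
            regions.append((run[0], run[-1]))
        run = []
    return regions


def in_regions(pos: int, regions: List[Tuple[int, int]]) -> bool:
    """True if a variant position falls inside any autozygous region."""
    return any(start <= pos <= end for start, end in regions)
```

Filtering a VCF then amounts to keeping the data lines whose position satisfies `in_regions` for that chromosome's regions.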

AgileVariantMapper

Visualises sequence variant data from whole exome data, allowing the identification of autozygous regions in consanguineous individuals. The data can originate from files exported by AgileGenotyper, AgileAnnotator, AgileVariantViewer or a tab-delimited text file formatted as described in the user guide webpage.

AgileVCFMapper

AgileVCFMapper allows exome sequence variants in *.VCF files to be used to map disease loci in a similar manner to AutoSNPa and IBDFinder. Although it does not duplicate the functions of Phaser, Sample and DominantMapper, AgileVCFMapper can export the exome variant data as a coherent set of SNP genotype files that these programs can use.

Genome annotation file creation

AgileGAFCreator

AgileGAFCreator creates annotation files of the human genome for use by other programs in the Agile suite of next-generation sequencing programs.

Germline mutation detection

AgileAnnotator

Reads an ordered SAM file and identifies any sequence variant located in a protein-coding exon or within 50 bp of a splice site.
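
The location test described above (inside a protein-coding exon, or within 50 bp of a splice site) can be sketched as an interval check in Python. The sketch assumes one transcript's coding exons are given as 1-based inclusive intervals separated by introns, and treats only exon boundaries that face an intron as splice sites; the function name and return labels are hypothetical, not AgileAnnotator's output.

```python
from typing import List, Optional, Tuple


def classify_position(pos: int, exons: List[Tuple[int, int]],
                      window: int = 50) -> Optional[str]:
    """Classify a variant against one transcript's protein-coding exons.

    exons: (start, end) pairs, 1-based inclusive, on the variant's chromosome.
    Returns "coding exon", "splice region" (intronic and within `window` bp
    of an exon-intron boundary) or None.
    """
    exons = sorted(exons)
    if any(start <= pos <= end for start, end in exons):
        return "coding exon"
    # Only boundaries that face an intron count as splice sites.
    ends_before_intron = [end for _, end in exons[:-1]]
    starts_after_intron = [start for start, _ in exons[1:]]
    if any(0 < pos - b <= window for b in ends_before_intron):
        return "splice region"
    if any(0 < b - pos <= window for b in starts_after_intron):
        return "splice region"
    return None
```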

AgileKnownSNPFilter

Analyzes sequence variants exported by AgileAnnotator and identifies those previously found in the 1000 Genomes project.
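
At its core this is a set-membership test against the catalogue of known variants. The Python sketch below matches on chromosome, position and both alleles; matching on alleles rather than position alone is an assumption, as are the names, and loading the 1000 Genomes data into the `known` set is not shown.

```python
from typing import Iterable, List, Set, Tuple

Variant = Tuple[str, int, str, str]  # (chromosome, position, ref allele, alt allele)


def split_known(variants: Iterable[Variant],
                known: Set[Variant]) -> Tuple[List[Variant], List[Variant]]:
    """Partition variants into (novel, known) by exact site-and-allele match."""
    novel: List[Variant] = []
    seen: List[Variant] = []
    for v in variants:
        (seen if v in known else novel).append(v)
    return novel, seen
```

Using a hash set keeps each lookup constant-time on average, which matters when comparing tens of thousands of exome variants against millions of catalogued sites. Indels written differently in the two sources would not match without prior normalisation.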

AgileFileConverter

Reformats data in tab-delimited text files to a format that can be imported into AgileVariantViewer or AgileFileViewer. Since many commercial NGS service providers supply variant data in tab-delimited text files, AgileFileConverter allows such data to be analyzed by the Agile suite of programs.

AgileVariantViewer

Allows variants identified by AgileAnnotator and optionally filtered by AgileKnownSNPFilter to be interactively filtered by read depth and by allele read depth ratio. AgileVariantViewer can then export sequence variants for the whole genome, a single chromosome or a chromosomal region.
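
The depth and ratio filters, plus the whole-genome, single-chromosome or regional export, can be expressed as one Python function. Here the allele read depth ratio is assumed to mean variant-supporting reads divided by total reads at the site; the default thresholds and all names are hypothetical rather than AgileVariantViewer's settings.

```python
from typing import List, NamedTuple, Optional


class Call(NamedTuple):
    chrom: str
    pos: int
    depth: int      # total reads covering the site
    alt_depth: int  # reads supporting the variant allele


def filter_calls(calls: List[Call], min_depth: int = 10, min_ratio: float = 0.2,
                 chrom: Optional[str] = None, start: Optional[int] = None,
                 end: Optional[int] = None) -> List[Call]:
    """Keep calls passing read-depth and allele-ratio thresholds, optionally
    restricted to one chromosome or to a region within it."""
    kept = []
    for c in calls:
        if c.depth == 0 or c.depth < min_depth or c.alt_depth / c.depth < min_ratio:
            continue
        if chrom is not None and c.chrom != chrom:
            continue
        if (start is not None and c.pos < start) or (end is not None and c.pos > end):
            continue
        kept.append(c)
    return kept
```

Leaving `chrom`, `start` and `end` unset corresponds to a whole-genome export; setting only `chrom` gives a single chromosome.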

AgileGeneFilter

Filters sequence variants exported from AgileVariantViewer by identifying the genes in which they are located and then performing a text search of UniProt information on those genes.

AgileFileViewer

Reads a sequence variant file created by AgileAnnotator (or one filtered by either AgileKnownSNPFilter or AgileVariantViewer) and displays associated information for each variant.

AgileVariantSelector

Reads multiple VCF files or 'annotated VCF'-like files, identifies variants that occur at least n times in the data from affected individuals, and discounts any variant that appears in a normal individual's data set.
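
The selection rule can be sketched in Python as a count over affected individuals followed by exclusion of anything seen in an unaffected one. The sketch assumes one single-sample VCF per individual and keys variants on the first five VCF columns (ignoring the ID); genotype fields and multi-allelic splitting are not handled, and the names are hypothetical.

```python
from collections import Counter
from typing import Iterable, List, Set, Tuple

Key = Tuple[str, int, str, str]  # (CHROM, POS, REF, ALT)


def vcf_variants(lines: Iterable[str]) -> Set[Key]:
    """Variant keys from the data lines of one VCF (header lines skipped)."""
    keys = set()
    for line in lines:
        if line.startswith("#") or not line.strip():
            continue
        chrom, pos, _id, ref, alt = line.rstrip("\n").split("\t")[:5]
        keys.add((chrom, int(pos), ref, alt))
    return keys


def select_variants(affected: List[Set[Key]], unaffected: List[Set[Key]],
                    n: int) -> Set[Key]:
    """Variants seen in at least n affected individuals and in no unaffected one."""
    excluded = set().union(*unaffected)
    counts = Counter(v for individual in affected for v in individual)
    return {v for v, c in counts.items() if c >= n and v not in excluded}
```

Building one set per individual means a variant counts at most once per person, so `n` is the number of affected individuals carrying it.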

GeneTIER

GeneTIER replaces the knowledge-based inference traditionally used in candidate disease gene prioritization with experimental data from tissue-specific gene expression datasets. It is implemented as a hosted web application, and may be found here.

OVA

OVA (Ontology Variant Analysis Tool) is an online variant filtering and prioritisation application. It can filter VCF files on a wide array of criteria, and the remaining genes are prioritised by the similarity of their functional and phenotypic profiles to a user-supplied phenotype. It is implemented as a hosted web application, and may be found here.

Filtering the output from AlamutHT or Pindel

AgileExomeFilter

Enables rapid filtering, sorting and screening of single-base variants and small indels from an exome-sequencing experiment annotated by AlamutHT, so that possibly deleterious variants can be detected quickly.

AgilePindelFilter

Enables rapid filtering and screening of indel variants detected by Pindel in exome-sequencing data, so that possibly deleterious variants can be detected quickly.

Somatic mutation detection

AgileSMPoint

Identifies somatic sequence variants at specific genomic positions using unaligned next-generation sequence data.
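
Working from unaligned reads means the target position has to be found by sequence context rather than by alignment coordinates. One possible approach, sketched below in Python, locates a left and right flanking sequence around the target base on either strand and tallies the base between them. This flank-matching method and the function name are assumptions, not AgileSMPoint's documented method; flanks should be chosen to be unique, because only the first occurrence of the left flank in each read is examined.

```python
from collections import Counter
from typing import Iterable

COMPLEMENT = str.maketrans("ACGTN", "TGCAN")


def count_alleles(reads: Iterable[str], left_flank: str, right_flank: str) -> Counter:
    """Count the base seen between two flanking sequences in unaligned reads.

    Each read is searched on both strands and counted at most once.
    right_flank must be non-empty.
    """
    counts: Counter = Counter()
    for read in reads:
        for seq in (read, read.translate(COMPLEMENT)[::-1]):
            i = seq.find(left_flank)
            j = i + len(left_flank)  # index of the target base
            if i != -1 and seq[j + 1:j + 1 + len(right_flank)] == right_flank:
                counts[seq[j]] += 1
                break
    return counts
```

The somatic allele fraction at the position is then the count for the variant base divided by the total count; for example, reads giving `{"A": 2, "C": 1}` imply a C fraction of one third.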

AgileSMAll

Identifies somatic sequence variants at all positions within a PCR product using unaligned next-generation sequence data.