Advanced Exome Analysis using Galaxy

Aims

In this practical you will use several additional features not covered in the previous sessions. This will help you to:

Before starting

Using the Copy datasets function, copy the following datasets to a new history:

Check the coverage of your favourite genes

This check can be performed before running the experiment, as it only requires a BED file containing the regions covered by the exome capture kit. These files are freely available from the vendor sites.
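To make the idea concrete, here is a minimal Python sketch of the check, separate from the Galaxy tools used in this practical. It assumes the gene is given as a single interval (a real check would use the individual exons) and that the kit regions come from the first three columns of a BED file. The function names `parse_bed` and `covered_fraction` are hypothetical.

```python
from typing import List, Tuple

# (chrom, start, end) in BED convention: 0-based start, end not included.
Interval = Tuple[str, int, int]


def parse_bed(text: str) -> List[Interval]:
    """Parse the first three columns of BED-formatted text."""
    intervals = []
    for line in text.splitlines():
        line = line.strip()
        # Skip blank lines and the optional header lines a BED file may carry.
        if not line or line.startswith(("#", "track", "browser")):
            continue
        chrom, start, end = line.split()[:3]
        intervals.append((chrom, int(start), int(end)))
    return intervals


def covered_fraction(gene: Interval, targets: List[Interval]) -> float:
    """Fraction of the gene interval overlapped by at least one target."""
    chrom, g_start, g_end = gene
    # Keep only overlapping targets and clip them to the gene boundaries.
    clipped = sorted(
        (max(s, g_start), min(e, g_end))
        for c, s, e in targets
        if c == chrom and s < g_end and e > g_start
    )
    covered = 0
    cursor = g_start  # positions left of the cursor are already counted
    for s, e in clipped:
        s = max(s, cursor)
        if e > s:
            covered += e - s
            cursor = e
    return covered / (g_end - g_start)


# Made-up kit regions and gene coordinates, for illustration only.
kit = parse_bed("chr1\t100\t200\nchr1\t150\t300\nchr2\t0\t50\n")
print(covered_fraction(("chr1", 50, 450), kit))
```

Overlapping kit regions are counted once, so the result is the fraction of bases in the gene that fall inside at least one captured region.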

Question:

Quality control of aligned reads

Questions:

Quickly examine your VCF and BAM files

The next Galaxy release will include two direct links to quickly display VCF and BAM files at iobio.io, a platform for immediate visual feedback on complex genomic datasets.

In the meantime, you can display your data at iobio.io as follows:

Flag low quality variants

The aim of this step is to reduce the number of false-positive calls by identifying low-quality variants. The best solution is to apply GATK Variant Quality Score Recalibration. If it cannot be applied (e.g. for small sample sets), low-quality variants can be flagged using the following criteria, according to GATK Best Practices (also known as GATK hard filtering):

Filters for SNPs:

Filters for INDELs:

Note that SNPs and INDELs require different filters. Browse the Published workflows section and select GATK Hard Filters. Edit the workflow to inspect its different sections, then run it. In the output, variants whose FILTER column holds a value other than PASS are considered low-quality variants and are assigned a low priority.
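The flagging logic can be sketched in a few lines of Python. This is not the GATK implementation, only an illustration of applying separate rule sets to SNPs and INDELs and writing the outcome to FILTER. The thresholds below are illustrative assumptions in the style of commonly quoted GATK hard-filtering recommendations, and the filter names (`LowQD`, `HighFS`, `LowMQ`) are hypothetical; check the workflow for the values it really uses.

```python
import operator
from typing import Dict, List, Tuple

# (INFO key, comparison, threshold, filter name).
# ASSUMPTION: illustrative thresholds and made-up filter names.
Rule = Tuple[str, str, float, str]

SNP_RULES: List[Rule] = [
    ("QD", "<", 2.0, "LowQD"),
    ("FS", ">", 60.0, "HighFS"),
    ("MQ", "<", 40.0, "LowMQ"),
]
INDEL_RULES: List[Rule] = [
    ("QD", "<", 2.0, "LowQD"),
    ("FS", ">", 200.0, "HighFS"),
]

_OPS = {"<": operator.lt, ">": operator.gt}


def parse_info(info: str) -> Dict[str, str]:
    """Turn 'QD=1.5;FS=70.2;DB' into a dict; flag-type keys map to ''."""
    fields = {}
    for item in info.split(";"):
        key, _, value = item.partition("=")
        fields[key] = value
    return fields


def is_snp(ref: str, alt: str) -> bool:
    """Simplified test: REF and every ALT allele are one base long."""
    return len(ref) == 1 and all(len(a) == 1 for a in alt.split(","))


def flag_variant(ref: str, alt: str, info: str) -> str:
    """Return 'PASS' or a ';'-joined list of the filters that failed."""
    rules = SNP_RULES if is_snp(ref, alt) else INDEL_RULES
    fields = parse_info(info)
    failed = []
    for key, op, threshold, name in rules:
        value = fields.get(key)
        # A missing annotation is not evaluated, so it cannot fail a rule.
        if value and _OPS[op](float(value), threshold):
            failed.append(name)
    return ";".join(failed) if failed else "PASS"


print(flag_variant("A", "G", "QD=1.5;FS=70.2;MQ=55.0"))  # SNP rules apply
print(flag_variant("AT", "A", "QD=5.0;FS=70.2"))         # INDEL rules apply
```

The same FS value fails the SNP rule set but passes the INDEL one, which is why the two variant classes are filtered separately.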

Question:

Annotation with SnpEff

SnpEff Variant effect and annotation is a popular tool for annotating VCF files. It populates the INFO column of your file with the new annotations and adds a short description of them to the VCF header.
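To see what ends up in the INFO column, the following Python sketch pulls the annotations back out of a single record. It assumes the annotations are written to the `ANN` key, with entries separated by commas and subfields by `|` (older SnpEff versions use an `EFF` key instead). Only the first eleven subfields are named here, and the gene and transcript names in the example are invented.

```python
from typing import Dict, List

# Names of the leading ANN subfields, in order; later subfields are ignored.
ANN_FIELDS = [
    "Allele", "Annotation", "Annotation_Impact", "Gene_Name", "Gene_ID",
    "Feature_Type", "Feature_ID", "Transcript_BioType", "Rank",
    "HGVS.c", "HGVS.p",
]


def parse_ann(info: str) -> List[Dict[str, str]]:
    """Return one dict per annotation found in the ANN key of an INFO string."""
    for item in info.split(";"):
        if item.startswith("ANN="):
            entries = item[len("ANN="):].split(",")
            # zip stops at the shorter sequence, so unnamed subfields are dropped.
            return [dict(zip(ANN_FIELDS, e.split("|"))) for e in entries]
    return []  # record carries no ANN annotation


# Invented INFO value with two annotations for the same variant.
info = (
    "AC=1;ANN=G|missense_variant|MODERATE|GENE1|GENE1|transcript|TX1|"
    "protein_coding|2/5|c.10A>G|p.Lys4Glu|||||,"
    "G|upstream_gene_variant|MODIFIER|GENE2|GENE2|transcript|TX2|"
    "protein_coding||c.-100A>G||||||"
)
for ann in parse_ann(info):
    print(ann["Gene_Name"], ann["Annotation"], ann["Annotation_Impact"])
```

A variant can carry several annotations, typically one per affected transcript, so downstream filtering usually has to decide which entry to prioritise.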

Question:

Annotate with your internal resources

To annotate your VCF with information extracted from internal resources, e.g. allele frequency from a reference population, you can run GATK Variant annotator. Briefly, it takes a VCF as input and adds annotations extracted from the INFO column of one or more other VCF files.
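The principle can be illustrated with a short Python sketch. It is not how GATK implements the step; it assumes variants are matched by an exact comparison of CHROM, POS, REF and ALT, and it does not add the `##INFO` header line that a valid VCF would need. The function names and the output key `clinvar.CLNSIG` are hypothetical.

```python
from typing import Dict, Tuple

Key = Tuple[str, str, str, str]  # (CHROM, POS, REF, ALT)


def index_resource(vcf_text: str, info_key: str) -> Dict[Key, str]:
    """Map each variant of the resource VCF to the value of one INFO key."""
    index = {}
    for line in vcf_text.splitlines():
        if not line or line.startswith("#"):
            continue  # skip blank lines and header lines
        chrom, pos, _id, ref, alt, _qual, _filter, info = line.split("\t")[:8]
        for item in info.split(";"):
            key, _, value = item.partition("=")
            if key == info_key:
                index[(chrom, pos, ref, alt)] = value
    return index


def annotate(vcf_text: str, index: Dict[Key, str], new_key: str) -> str:
    """Append new_key=value to INFO for every record found in the index."""
    out = []
    for line in vcf_text.splitlines():
        if line and not line.startswith("#"):
            cols = line.split("\t")
            hit = index.get((cols[0], cols[1], cols[3], cols[4]))
            if hit is not None:
                entry = f"{new_key}={hit}"
                # '.' marks an empty INFO column and must be replaced.
                cols[7] = entry if cols[7] == "." else f"{cols[7]};{entry}"
            line = "\t".join(cols)
        out.append(line)
    return "\n".join(out)


# Invented records, for illustration only.
header = "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\n"
resource = header + "chr1\t100\trs1\tA\tG\t.\t.\tCLNSIG=Pathogenic\n"
sample = header + "chr1\t100\t.\tA\tG\t50\tPASS\tDP=10\n"
print(annotate(sample, index_resource(resource, "CLNSIG"), "clinvar.CLNSIG"))
```

Exact matching misses variants that are represented differently in the two files (for example multi-allelic records), which is one reason to rely on a dedicated tool for real data.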

Let’s assume you want to annotate which variants in your set are present in NCBI ClinVar, a database of variants of clinical relevance. To do that, execute GATK Variant annotator as follows:

If you want to export the final VCF to an Excel-compatible file, run the VCFtoTab-delimited tool.

Runs of Homozygosity

Identification of Runs of Homozygosity (RoH) is a strategy to limit the search for candidate genes to specific chromosomal regions in consanguineous families. You can identify RoH in your family with the following tools:

Bedtools

The bedtools utilities are a suite of tools for a wide range of operations on genomic data. Using bedtools you can, for example, intersect, merge and count genomic intervals from files in different formats such as BAM, BED and VCF.
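For readers who want to see what these three operations do, here is a plain Python sketch of them on small in-memory interval lists. It does not call bedtools and only approximates its default behaviour. In particular, it assumes intervals that touch end to start are merged, as well as overlapping ones. The function names are hypothetical.

```python
from typing import List, Tuple

# (chrom, start, end) in BED convention: 0-based start, end not included.
Interval = Tuple[str, int, int]


def merge(intervals: List[Interval]) -> List[Interval]:
    """Combine overlapping or touching intervals on the same chromosome."""
    merged: List[Interval] = []
    for chrom, start, end in sorted(intervals):
        if merged and merged[-1][0] == chrom and start <= merged[-1][2]:
            last = merged[-1]
            merged[-1] = (chrom, last[1], max(last[2], end))
        else:
            merged.append((chrom, start, end))
    return merged


def intersect(a: List[Interval], b: List[Interval]) -> List[Interval]:
    """Return the shared part of every overlapping pair from a and b."""
    out = []
    for chrom_a, s_a, e_a in a:
        for chrom_b, s_b, e_b in b:
            if chrom_a == chrom_b:
                s, e = max(s_a, s_b), min(e_a, e_b)
                if s < e:
                    out.append((chrom_a, s, e))
    return out


def count_overlaps(a: List[Interval], b: List[Interval]) -> List[int]:
    """For each interval in a, count the intervals in b that overlap it."""
    return [
        sum(1 for cb, sb, eb in b if cb == ca and sb < ea and eb > sa)
        for ca, sa, ea in a
    ]


# Made-up coordinates, for illustration only.
regions = [("chr1", 10, 20), ("chr1", 15, 30), ("chr1", 30, 40), ("chr2", 5, 8)]
print(merge(regions))
print(intersect(merge(regions), [("chr1", 35, 50)]))
print(count_overlaps(merge(regions), [("chr1", 35, 50), ("chr1", 12, 13)]))
```

The nested loops compare every pair of intervals, which is fine for a demonstration but far slower than bedtools on genome-scale files.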

Questions: