Collection of training resources for Galaxy courses - Community Site
In this practical you will use several additional features not covered in the previous sessions. This will help you to:
Using the Copy datasets function, copy the following datasets to a new history:
This check can be performed before running the experiment, as it only requires a BED file containing the regions covered by the Exome Capture Kit. These files are freely available from the vendors' websites.
Question:
father, mother or proband) for each input file. For more info see the QPLOT website.
Questions:
The next Galaxy release will include two direct links to quickly display VCF and BAM files at iobio.io, a platform for immediate visual feedback on complex genomic datasets.
In the meantime, you can display your data at iobio.io as follows:
The aim of this step is to reduce false positive calls by identifying low-quality variants. The best solution is to apply GATK Variant Quality Score Recalibration (VQSR). If it cannot be applied (e.g. for small sample sets), low-quality variants can be flagged using the following criteria, according to GATK Best Practices (also known as GATK hard-filtering):
Filters for SNPs:
Filters for INDELs:
Note that you need to apply different filters to SNPs and INDELs. Browse the Published workflows section and run GATK Hard Filters. Open the workflow in the editor to inspect its steps, then execute it. The output now includes variants whose value in the FILTER column is not PASS: these are considered low-quality variants and are assigned a low priority.
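The logic of hard-filtering can be sketched in a few lines of Python. This is only an illustration of the idea (separate rules for SNPs and INDELs, with failing variants flagged in the FILTER column), not the GATK implementation. The thresholds are the commonly published GATK hard-filtering recommendations and are an assumption here: the GATK Hard Filters workflow may use different values. The filter names and the example records are made up.

```python
import operator
from collections import Counter

# (annotation, comparison, threshold): a variant fails when the comparison is true.
# ASSUMPTION: commonly cited GATK hard-filter thresholds; check the workflow's own values.
SNP_RULES = [
    ("QD", operator.lt, 2.0),
    ("FS", operator.gt, 60.0),
    ("MQ", operator.lt, 40.0),
    ("MQRankSum", operator.lt, -12.5),
    ("ReadPosRankSum", operator.lt, -8.0),
]
INDEL_RULES = [
    ("QD", operator.lt, 2.0),
    ("FS", operator.gt, 200.0),
    ("ReadPosRankSum", operator.lt, -20.0),
]


def parse_info(info):
    """Turn 'QD=1.5;FS=3.2;DB' into {'QD': '1.5', 'FS': '3.2', 'DB': True}."""
    parsed = {}
    for item in info.split(";"):
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def is_snp(ref, alt):
    """A record is treated as a SNP when REF and every ALT allele are one base long."""
    return len(ref) == 1 and all(len(a) == 1 for a in alt.split(","))


def hard_filter(line):
    """Return the VCF record with FILTER set to PASS or to a (hypothetical) filter name."""
    fields = line.rstrip("\n").split("\t")
    ref, alt, info = fields[3], fields[4], parse_info(fields[7])
    if is_snp(ref, alt):
        rules, name = SNP_RULES, "snp_hard_filter"
    else:
        rules, name = INDEL_RULES, "indel_hard_filter"
    # Annotations missing from INFO are skipped rather than treated as failures.
    failed = any(
        key in info and compare(float(info[key]), threshold)
        for key, compare, threshold in rules
    )
    fields[6] = name if failed else "PASS"
    return "\t".join(fields)


def count_filter_values(lines):
    """Count the values found in the FILTER column (7th column) of VCF records."""
    return Counter(l.split("\t")[6] for l in lines if not l.startswith("#"))


# Made-up records: CHROM POS ID REF ALT QUAL FILTER INFO
records = [
    "chr8\t100\t.\tA\tG\t50\t.\tQD=12.3;FS=1.2;MQ=60.0",
    "chr8\t200\t.\tC\tT\t20\t.\tQD=1.1;FS=0.5;MQ=59.0",
    "chr8\t300\t.\tAT\tA\t40\t.\tQD=15.0;FS=250.0",
    "chr8\t400\t.\tG\tGA\t40\t.\tQD=15.0;FS=80.0",
]
filtered = [hard_filter(r) for r in records]
counts = count_filter_values(filtered)
print(counts)
```

The last record shows why SNPs and INDELs need separate rules: an FS of 80 would fail the SNP threshold used here but passes the INDEL one.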
Question:
count on column [value corresponding to the column FILTER], do not round results.
SnpEff Variant effect and annotation is a popular tool for the annotation of VCF files. It populates the INFO column of your file with the new annotations and adds a short description of them to the VCF header.
Question:
To annotate your VCF with information extracted from internal resources, e.g. allele frequency from a reference population, you can run GATK Variant Annotator. Briefly, it takes a VCF as input and adds annotations extracted from the INFO column of one or more other VCF files.
Let's assume you want to annotate which variants in your set are present in NCBI ClinVar, a database of variants of clinical relevance.
Run GATK Variant Annotator with the following parameters:
hg19_chr8.fa
Binding for reference-ordered resource data:
clinvar_YYYYMMDD_hg19.vcf
clinvar
Expressions: to annotate with the CLNSIG (Variant Clinical Significance, from 0 to 7) and CLNDBN (Variant disease name) parameters from ClinVar, enter the two following expressions:
clinvar.CLNSIG
clinvar.CLNDBN
NexteraRapidCaptureExpandedExome_Target.hg19.chr8.padding200.bed in Advanced GATK options -> Operate on genomic interval
If you want to export the final VCF to an Excel-compatible file, run the VCFtoTab-delimited tool.
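To make the ClinVar annotation step more concrete, here is a minimal Python sketch of what annotating with the expressions clinvar.CLNSIG and clinvar.CLNDBN amounts to: values are looked up in the INFO column of the resource VCF and appended to the INFO column of your own VCF under the binding name. It assumes records are matched on chromosome, position, REF and ALT, which is a simplification; GATK's own matching rules may differ. The records and the disease name are made up.

```python
def parse_info(info):
    """Turn 'CLNSIG=5;CLNDBN=x' into {'CLNSIG': '5', 'CLNDBN': 'x'}."""
    parsed = {}
    for item in info.split(";"):
        if not item or item == ".":
            continue
        key, sep, value = item.partition("=")
        parsed[key] = value if sep else True
    return parsed


def index_resource(vcf_lines):
    """Index a resource VCF by (CHROM, POS, REF, ALT) -> parsed INFO."""
    index = {}
    for line in vcf_lines:
        if line.startswith("#"):
            continue  # skip header lines
        f = line.rstrip("\n").split("\t")
        index[(f[0], f[1], f[3], f[4])] = parse_info(f[7])
    return index


def annotate(line, resource, binding, keys):
    """Append binding.KEY=value to INFO for each requested key found in the resource."""
    f = line.rstrip("\n").split("\t")
    hit = resource.get((f[0], f[1], f[3], f[4]))
    if hit:
        extra = [f"{binding}.{k}={hit[k]}" for k in keys if k in hit]
        if extra:
            existing = [] if f[7] == "." else [f[7]]
            f[7] = ";".join(existing + extra)
    return "\t".join(f)


# Made-up resource and sample records, for illustration only.
clinvar_lines = [
    "##fileformat=VCFv4.1",
    "chr8\t1000\trs0\tA\tG\t.\t.\tCLNSIG=5;CLNDBN=Example_disease",
]
sample = [
    "chr8\t1000\t.\tA\tG\t50\tPASS\tDP=30",
    "chr8\t2000\t.\tC\tT\t50\tPASS\tDP=12",
]

clinvar = index_resource(clinvar_lines)
annotated = [annotate(l, clinvar, "clinvar", ["CLNSIG", "CLNDBN"]) for l in sample]
for record in annotated:
    print(record)
```

Only the first record is found in the resource, so only its INFO column gains the two clinvar.* entries; the second record is returned unchanged.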
Identification of Runs of Homozygosity (RoH) is a strategy to limit the search for candidate genes to specific chromosomal regions in consanguineous families. You can identify RoH in your family with the following tools:
10, corresponding to 10 kb) for Filter by Runs of Homozygosity (ROH).
The software will return only the variants located in a RoH with length greater than this value. In the tabular output the last two columns contain the number of SNPs and the length of the RoH.
The bedtools utilities are a suite of tools for a wide range of operations on genomic data. Using bedtools you can, for example, intersect, merge and count genomic intervals from files in different formats such as BAM, BED and VCF.
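As an illustration of what intersecting genomic intervals means, the following Python sketch reports the overlapping portion of every pair of intervals from two sets, using BED-style coordinates (0-based start, end not included). It is a deliberately simple quadratic loop for clarity, not how bedtools is implemented, and the interval values are made up.

```python
def intersect(a_intervals, b_intervals):
    """Return the overlapping portion of each pair of intervals on the same chromosome.

    Intervals are (chrom, start, end) tuples in BED convention: 0-based, half-open.
    """
    hits = []
    for chrom_a, start_a, end_a in a_intervals:
        for chrom_b, start_b, end_b in b_intervals:
            if chrom_a != chrom_b:
                continue
            start, end = max(start_a, start_b), min(end_a, end_b)
            if start < end:  # a non-empty overlap
                hits.append((chrom_a, start, end))
    return hits


# Made-up example: two capture targets and one longer region.
targets = [("chr8", 100, 200), ("chr8", 500, 600)]
regions = [("chr8", 150, 550)]

overlaps = intersect(targets, regions)
covered = sum(end - start for _, start, end in overlaps)
print(overlaps)
print(covered)
```

Here the region overlaps the last 50 bases of the first target and the first 50 bases of the second, so 100 target bases are covered in total.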
Questions: