# NGS bioinformatics quality control and variant annotation for cancer diagnosis

### Detailed program and venue

Venue: Streamed on Zoom
Schedule: 09:15 to 17:15, with a lunch break and two coffee breaks.

PARTS I & II: NGS BIOINFORMATICS PIPELINE & QUALITY CONTROL (QC)

09:15-09:35 Introduction: tumor board context, with a lung cancer example illustrating the spectrum of detectable alterations. (Dr. Timothée Olivier)

09:35-10:05 Introduction to NGS technologies (Dr. Aitana Lebrand)

10:05-10:45 Overview of a bioinformatics analysis pipeline and basic QC (Dr. Aitana Lebrand)

10:45-11:00 Break

11:00-12:30 Hands-on on Galaxy: using Galaxy to filter variants based on quality filters and variant allele frequency. (Dr. Yann Christinat)

12:30-13:30 Break

PART III: ANNOTATION

13:30-14:45 Clinical significance: pathogenicity & actionability. Predict variant effect and impact using bioinformatics tools. Annotation using literature and other public databases. (Dr. Aitana Lebrand)

14:45-15:00 Break

15:00-17:00 Hands-on on variant annotation. (Dr. Yann Christinat)

17:00-17:15 Wrap-up of the course and take-home messages. (Dr. Yann Christinat & Dr. Aitana Lebrand)

### Hands-on (morning session)

To get used to the Galaxy platform, we will perform the first exercise together; participants will then be split into random groups of three and sent to breakout rooms. We will visit the rooms at random, but don’t hesitate to call for help.

The VCF and pipeline files are available on Moodle.

If Galaxy takes too long to process your pipelines, you can look at the solutions on Moodle (we will make them available upon request).

#### Exercise 1

In this exercise, we will use the Galaxy pipeline to annotate the VCF and extract certain fields into a readable text file (that can be imported into Excel).

2. Click on the “workflow” tab and click “import”
3. Download the file Galaxy-Workflow-VCF_annotation_pipeline.ga from Moodle and upload it into Galaxy via the “Archived Workflow File” file selector. Then click the “Import workflow” button.
4. Then run the pipeline with the VCF file HUG16-1.vcf and download the resulting text file (click the white arrow on blue background on the right of the workflow name to run the workflow)
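For readers who want to see what the "extract to a text table" step amounts to outside Galaxy, here is a minimal Python sketch. It is not the Galaxy workflow itself: the function names, the toy VCF content and the choice of the `DP` INFO key are assumptions made for illustration only.

```python
import csv
import io


def vcf_to_table(vcf_text, info_keys=("DP",)):
    """Return one row per variant: CHROM, POS, REF, ALT plus selected INFO keys."""
    rows = []
    for line in vcf_text.splitlines():
        # Skip empty lines and header lines (both "##" metadata and "#CHROM").
        if not line or line.startswith("#"):
            continue
        chrom, pos, _id, ref, alt, _qual, _filt, info = line.split("\t")[:8]
        # INFO is a ";"-separated list of KEY=VALUE pairs (or bare flags).
        info_map = {}
        for item in info.split(";"):
            key, _, value = item.partition("=")
            info_map[key] = value
        rows.append([chrom, pos, ref, alt] + [info_map.get(k, "") for k in info_keys])
    return rows


def rows_to_tsv(rows, header):
    """Serialize rows as tab-separated text that a spreadsheet can import."""
    buffer = io.StringIO()
    writer = csv.writer(buffer, delimiter="\t", lineterminator="\n")
    writer.writerow(header)
    writer.writerows(rows)
    return buffer.getvalue()


# Toy VCF content, invented for this example (not taken from HUG16-1.vcf).
EXAMPLE_VCF = "\n".join([
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO",
    "chr1\t100\t.\tA\tG\t50\tPASS\tDP=200;ANN=G|missense_variant",
    "chr2\t250\t.\tCT\tC\t60\tPASS\tDP=150",
])

table = vcf_to_table(EXAMPLE_VCF, info_keys=("DP",))
print(rows_to_tsv(table, ["CHROM", "POS", "REF", "ALT", "DP"]))
```

The resulting tab-separated text can be saved to a file and opened in Excel, much like the text file produced by the workflow.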

#### Exercise 2

Now that you are familiar with Galaxy, the sequencing laboratory has given you the NGS results for an old FFPE sample from a lung adenocarcinoma patient. You suspect (rightly) that there are FFPE-induced artefacts in your sample and would like to filter them out.

1. Copy the imported workflow and rename it (go to workflow, click on the workflow name and then on “copy” and “rename”)
2. Run the workflow with the VCF file HUG16-2.vcf and observe the result. Can you spot the mutations that are artefacts?
3. Decide on an allelic frequency threshold that could be used to add a filtering step to the pipeline. In practice, we would not report mutations below 5%, but in the case of FFPE artefacts we might increase the limit to 10% or 20%. Pick one (5%, 10% or 20%) and explain your choice.
4. Edit your new pipeline and add the tool “SnpSift Filter” (go to workflow, click on the workflow name and then on “edit”; add the tool after “Annotation with SnpEff”, before “Extract to text table”)
5. Run the new workflow on the same sample with a filtering parameter: in the field “Filter criteria”, enter “GEN[0].AF>0.01”, replacing 0.01 (1%) with the threshold of your choice.
6. Download the filtered VCF file and the text file and compare the results with or without filter to ensure that everything went fine.
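To make the logic of the “GEN[0].AF>…” criterion concrete, the following Python sketch applies the same idea by hand: read the `AF` value from the first sample column and keep only the variants above the threshold. It is an illustration rather than a replacement for SnpSift; it assumes that `AF` is present in the FORMAT column, and the toy records and function names are invented.

```python
def sample_allele_frequency(vcf_line, sample_index=0):
    """Read the AF value of one sample from a VCF data line."""
    fields = vcf_line.rstrip("\n").split("\t")
    format_keys = fields[8].split(":")                   # e.g. ["GT", "AF"]
    sample_values = fields[9 + sample_index].split(":")  # e.g. ["0/1", "0.32"]
    genotype = dict(zip(format_keys, sample_values))
    # If several ALT alleles are listed, only the first AF value is used here.
    return float(genotype["AF"].split(",")[0])


def filter_by_allele_frequency(vcf_lines, threshold):
    """Keep header lines and variants whose first-sample AF exceeds the threshold."""
    kept = []
    for line in vcf_lines:
        if line.startswith("#"):
            kept.append(line)
        elif sample_allele_frequency(line) > threshold:
            kept.append(line)
    return kept


# Toy records, invented for this example (not taken from HUG16-2.vcf).
TOY_VCF = [
    "##fileformat=VCFv4.2",
    "#CHROM\tPOS\tID\tREF\tALT\tQUAL\tFILTER\tINFO\tFORMAT\tSAMPLE",
    "chr1\t100\t.\tC\tT\t50\tPASS\t.\tGT:AF\t0/1:0.03",
    "chr2\t250\t.\tA\tG\t60\tPASS\t.\tGT:AF\t0/1:0.32",
]

filtered = filter_by_allele_frequency(TOY_VCF, threshold=0.05)
kept_positions = [line.split("\t")[1] for line in filtered if not line.startswith("#")]
print(kept_positions)
```

As in the filter expression, the comparison is strict, so a variant sitting exactly at the threshold is removed.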

#### Exercise 3 (optional)

When looking at the EGFR mutations in the first sample (HUG16-1.vcf), you notice that the two mutations are quite close (only 15 base pairs apart). Further inspection of the alignment with IGV (or another tool) reveals that these two mutations are always on the same reads. You decide that they should be treated as one and go on to modify the VCF manually.

Step 1. Identify the position, reference sequence and the altered sequence of the new mutation (indel) as it should be entered in the VCF.

    chr7      55'242'466     55'242'481
                  V              V
    Ref DNA    AAGGAATTAAGAGAAGCAACATCTCCG
    Mut DNA    AAGG----------AGCAA-----CCG

Step 2. Modify the VCF and run it through the first Galaxy pipeline. What do you observe?
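The general idea behind Step 1, collapsing two nearby deletions seen on the same reads into a single VCF record, can be sketched in Python. The function below is a hypothetical helper, not part of any tool used in the course: it takes a gapped reference/mutant alignment and trims the bases shared at both ends, keeping at least one base in each allele as the VCF format requires. The sequences and coordinates are a toy example, deliberately not the EGFR case, so the exercise is left for you to solve.

```python
def merged_vcf_alleles(start_pos, ref_aligned, mut_aligned):
    """Return (POS, REF, ALT) for a single record describing the whole change.

    start_pos is the genomic position of the first base of ref_aligned;
    "-" marks deleted bases in mut_aligned.
    """
    ref = ref_aligned.replace("-", "")
    alt = mut_aligned.replace("-", "")
    # Trim identical bases on the right, keeping at least one base per allele.
    while len(ref) > 1 and len(alt) > 1 and ref[-1] == alt[-1]:
        ref, alt = ref[:-1], alt[:-1]
    # Trim identical bases on the left, shifting POS accordingly.
    pos = start_pos
    while len(ref) > 1 and len(alt) > 1 and ref[0] == alt[0]:
        ref, alt = ref[1:], alt[1:]
        pos += 1
    return pos, ref, alt


# Toy alignment, invented for this example: two deletions on the same read.
TOY_REF = "ACGTTTGACCA"
TOY_MUT = "AC---TG--CA"

pos, ref, alt = merged_vcf_alleles(1000, TOY_REF, TOY_MUT)
print(pos, ref, alt)  # 1002 GTTTGAC TG

# Sanity check: applying the record to the reference gives the mutated sequence.
offset = pos - 1000
rebuilt = TOY_REF[:offset] + alt + TOY_REF[offset + len(ref):]
print(rebuilt == TOY_MUT.replace("-", ""))  # True
```

This sketch only trims shared flanking bases; it does not left-align the variant against the genome as dedicated normalization tools do.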

### Hands-on (afternoon session)

We will upload the VCF file into VEP together; participants will then be split into random groups of three and sent to breakout rooms. We will visit the rooms at random, but don’t hesitate to call for help.

#### Exercise 1

Annotate the variants in the HUG16-1.vcf file using the online tool VEP.

1. Go to http://grch37.ensembl.org/Tools/VEP and click "New job"
2. In the section “Additional configurations: Additional annotations”, tick the box “Identify canonical transcripts”
3. In the section “Additional configurations: Identifiers”, tick the box “HGVS”
4. In the section "Variants and frequency data: Frequency data for colocated variants", tick the gnomAD box
5. Upload the VCF and submit the job
6. On the result page, add the filter “Canonical is YES”

For each variant, decide whether it is a potential germline variant or not. For the somatic variants, infer the impact on the function of the protein.
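One possible way to start the germline question is to use the gnomAD population frequency requested in step 4: a variant that is common in the general population is more likely to be a germline polymorphism than a somatic event. The Python sketch below illustrates that single heuristic. The 1% cut-off, the `gnomad_af` key and the toy variants are assumptions made for this example, not course recommendations or actual VEP column names.

```python
def flag_potential_germline(variants, max_population_af=0.01):
    """Mark variants whose gnomAD frequency exceeds an (assumed) cut-off."""
    flagged = []
    for variant in variants:
        af = variant.get("gnomad_af")  # None when the variant is absent from gnomAD
        is_common = af is not None and af > max_population_af
        flagged.append({**variant, "potential_germline": is_common})
    return flagged


# Toy variants, invented for this example.
TOY_VARIANTS = [
    {"name": "variant_A", "gnomad_af": 0.12},
    {"name": "variant_B", "gnomad_af": None},
    {"name": "variant_C", "gnomad_af": 0.00001},
]

for result in flag_potential_germline(TOY_VARIANTS):
    print(result["name"], result["potential_germline"])
```

Absence from gnomAD does not prove that a variant is somatic, since rare germline variants exist, so a flag like this is only a starting point for the manual review asked for in the exercise.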

#### Exercise 2

Using diverse resources (COSMIC, PubMed, VEP, etc.), infer the impact of the merged EGFR mutation, i.e. when the two frameshift mutations are merged into one (p.Leu747_Ser752delinsGln), on the function of EGFR. Does that change anything?

#### Exercise 3

We will now be looking at the clinical actionability of some specific variants in our two cases of lung adenocarcinoma:

Case 1 (HUG16-1):

• The new EGFR mutation when the two variants are merged (p.Leu747_Ser752delinsGln, exon 19)
• The EGFR mutation with a lower allele frequency (p.Thr790Met)

Case 2 (HUG16-2):

• The PIK3CA mutation (p.His1047Arg)

For each of these mutations, determine whether it is actionable and in which tier it falls according to the guidelines. To that end, you can use the following resources: