
Training courses: School of Python for Genomics Massimiliano Orsini. Advanced Module: Applying Machine Learning to Genomics Data



Course Description

The rapid development of next-generation sequencing technologies has made genomic sequencing one of the primary sources of biological data. Despite the availability of numerous analysis tools, genomic research often requires specific skills to create custom scripts or software. To address this, the Società Italiana di Diagnostica di Laboratorio Veterinaria (SIDiLV) and the Italian Infrastructure of Bioinformatics (IIB/ELIXIR Italy), together with the Italian National Institute of Health (ISS) and Sapienza University of Rome, organised the “School of Python for Genomics Massimiliano Orsini”.

The School consists of 3 consecutive modules:

  1. Basic module: Introduction to Python programming (Nov 2022)
  2. Intermediate module: Analysing your genomic data with Python (Jan-Feb 2024)
  3. Advanced Module: Applying Machine Learning to Genomics Data (Apr 2026)

Machine Learning (ML) is increasingly central to scientific research and finds important applications in interpreting genomic characterization data of pathogens. The main objective of this third module is to use Python to apply various ML models to genomic data from a population of pathogenic Escherichia coli strains. Participants will be guided through the critical analysis of evaluation metrics to select the most appropriate algorithms for predictive problems related to strain circulation and clinical associations.

Important Dates

  • Deadline for applications: 8 April, 2026
  • Chosen participants will be notified by: 10 April, 2026
  • School dates: 20-24 April, 2026

Venue

National Institute of Health (ISS)
Viale Regina Elena, 299
00161 Rome, Italy

Fee

No registration fee is required for the course itself. However, selected participants will be required to pay the SIDiLV membership fee for the year 2026 (Full fee: 50 euros; Reduced fee for under 35s: 30 euros).
Participants are expected to cover their own travel and accommodation costs.

Instructors

  • Luca De Sabato, Dep. of Food Safety, Nutrition and Veterinary Public Health, National Institute of Health (SANV - ISS), Italy
  • Arnold Knijn, Dep. of Food Safety, Nutrition and Veterinary Public Health, National Institute of Health (SANV - ISS), Italy
  • Loredana Le Pera, Core Facilities Technical-Scientific Service, National Institute of Health (FAST - ISS), Italy
  • Valeria Michelacci, Dep. of Food Safety, Nutrition and Veterinary Public Health, National Institute of Health (SANV - ISS), Italy
  • Damiano Parrone, Sapienza University of Rome, Italy
  • Allegra Via, Sapienza University of Rome, Italy

Registration

Application Form

A maximum of 20 candidates will be selected. Priority will be given to those who attended the Intermediate Module in 2024. Selection will be based on the order of registration and the prerequisites declared in the application form. Notifications of acceptance will be sent by 10 April, 2026.

Prerequisites

This Advanced Module is a follow-up to the previous modules of the School. It assumes solid intermediate Python programming skills and experience with genomic data analysis. The course is highly practical and based on exercises with real datasets; participants must bring their own laptop.

Target audience

Research scientists, veterinarians, and health professionals at any stage of their career who are applying (or planning to apply) Machine Learning to genomic data.

Learning Outcomes

At the end of the course, participants will be able to:

  • Describe the fundamental concepts of ML and distinguish between the main learning paradigms
  • Illustrate the phases of the ML workflow, from data collection and pre-processing to model evaluation
  • Explore the structure of a dataset and visualize its characteristics using plotting techniques
  • Perform dataset pre-processing
  • Apply dimensionality reduction techniques (PCA) and clustering
  • Implement ML algorithms in Python using standard libraries
  • Evaluate model performance using metrics appropriate to the task
  • Select the most suitable algorithm based on the specific problem
  • Recognize and manage the main ML issues (overfitting, data leakage, class imbalance)
  • Critically interpret model results and contextualize them with respect to the biological problem of interest
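
As an illustration of why evaluation metrics must suit the task, the following minimal sketch compares plain accuracy with balanced accuracy on an imbalanced dataset. It is not taken from the course material: the labels "common" and "rare", the 90/10 split and both helper functions are hypothetical, and only the Python standard library is used.

```python
from collections import Counter


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def balanced_accuracy(y_true, y_pred):
    """Mean of the per-class recall, so every class counts equally."""
    totals = Counter(y_true)
    hits = Counter(t for t, p in zip(y_true, y_pred) if t == p)
    return sum(hits[c] / totals[c] for c in totals) / len(totals)


# Hypothetical imbalanced labels: 90 "common" samples and 10 "rare" ones.
y_true = ["common"] * 90 + ["rare"] * 10
# A useless model that always predicts the majority class.
y_pred = ["common"] * 100

print(accuracy(y_true, y_pred))           # 0.9
print(balanced_accuracy(y_true, y_pred))  # 0.5
```

The majority-class predictor looks good on accuracy while never identifying a single "rare" sample; balanced accuracy exposes this by averaging the recall of each class.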

Organisers

  • SIDiLV
  • ELIXIR Italy
  • Stefano Morabito - SIDiLV/ELIXIR Italy/National Institute of Health (ISS), Italy
  • Valeria Michelacci - SIDiLV/National Institute of Health (ISS), Italy
  • Loredana Le Pera - ELIXIR Italy (Training Platform)/National Institute of Health (ISS), Italy
  • Allegra Via - ELIXIR Italy (Training Platform)/Sapienza Univ. of Rome, Italy

Helpers

  • Andrea Cacioppo, National Institute of Health (ISS), Italy
  • Donald Baku, Institute of Molecular Biology and Pathology - CNR, Italy
  • Arnold Knijn, Dep. of Food Safety, Nutrition and Veterinary Public Health, National Institute of Health (SANV - ISS), Italy
  • Gianmarco Pascarella, ELIXIR Italy (Training Platform)/CNR, Italy

Contact

For any type of inquiry, please feel free to contact valeria.michelacci@iss.it

Programme


Time | Objective | Activity | Instructor

Day 1 - Monday 20/04/2026

14:00 | Welcome | - | -
14:00 - 14:45 | Participant Introductions | - | -
14:45 - 15:00 | Course Introduction - School History | Presentation | Massimiliano Orsini, Stefano Morabito
15:00 - 15:15 | SIDiLV Society and ELIXIR Research Infrastructure Presentation | Presentation | Valeria Michelacci, Loredana Le Pera
15:15 - 15:45 | Review of Previous Modules | Presentation | Luca De Sabato
15:45 - 16:15 | Break | - | -
16:15 - 16:40 | Setting | - | -
16:40 - 17:25 | Introduction to Machine Learning | Presentation | Allegra Via
17:25 - 17:30 | Wrap-up | - | -

Day 2 - Tuesday 21/04/2026

09:00 - 09:30 | Warm-up | - | -
09:30 - 09:45 | Previous Day Recap and Q&A | - | -
09:45 - 10:15 | Machine Learning Pipeline and Artificial Dataset Presentation | Presentation | Damiano Parrone
10:15 - 11:00 | Data Import and Preprocessing (Artificial Dataset) | Hands-on | Damiano Parrone
11:00 - 11:30 | Break | - | -
11:30 - 13:00 | Data Preprocessing and Feature Encoding | Hands-on | Damiano Parrone
13:00 - 14:00 | Lunch | - | -
14:00 - 15:30 | Model Generation (K-Nearest Neighbors) and Hyperparameter Tuning | Hands-on | Damiano Parrone
15:30 - 16:00 | Break | - | -
16:00 - 17:30 | Model Generation (Decision Trees) and Hyperparameter Tuning | Hands-on | Damiano Parrone

Day 3 - Wednesday 22/04/2026

09:00 - 09:30 | Previous Day Recap and Q&A | - | -
09:30 - 11:00 | Cross-Validation (Single Model) | Hands-on | Damiano Parrone
11:00 - 11:30 | Break | - | -
11:30 - 13:00 | Best Model Selection | Hands-on | Damiano Parrone
13:00 - 14:00 | Lunch | - | -
14:00 - 14:15 | Real Dataset Presentation | - | Luca De Sabato
14:15 - 15:30 | Data Import and Preprocessing (Real Dataset) | Hands-on | Luca De Sabato
15:30 - 16:00 | Break | - | -
16:00 - 17:30 | Data Preprocessing and Feature Encoding | Hands-on | Luca De Sabato

Day 4 - Thursday 23/04/2026

09:00 - 09:30 | Previous Day Recap and Q&A | - | -
09:30 - 11:00 | Model Generation (K-Nearest Neighbors) and Hyperparameter Tuning | Hands-on | Luca De Sabato
11:00 - 11:30 | Break | - | -
11:30 - 13:00 | Model Generation (Decision Trees) and Hyperparameter Tuning | Hands-on | Luca De Sabato
13:00 - 14:00 | Lunch | - | -
14:00 - 15:30 | Cross-Validation (Single Model) | Hands-on | Luca De Sabato
15:30 - 16:00 | Break | - | -
16:00 - 17:30 | Best Model Selection | Hands-on | Luca De Sabato

Day 5 - Friday 24/04/2026

09:00 - 09:30 | Previous Day Recap and Q&A | - | -
09:30 - 11:00 | Feature Importance Analysis | Hands-on | Damiano Parrone
11:00 - 11:30 | Break | - | -
11:30 - 13:00 | Model Packaging and Deployment | Hands-on | Luca De Sabato
13:00 - 14:00 | Lunch | - | -
14:00 - 15:30 | Dimensionality Reduction | Presentation | Damiano Parrone
15:30 - 16:00 | Final Discussion and Closing Remarks | - | Stefano Morabito
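
To give a flavour of the K-Nearest Neighbors, hyperparameter tuning and cross-validation sessions listed in the programme, here is a minimal sketch of choosing k by 5-fold cross-validation. It is an assumption-laden illustration, not the course code: the toy dataset, the function names and the candidate values of k are all hypothetical, and it uses only the Python standard library (3.8+) rather than whichever ML libraries the course adopts. The scaler is fitted on the training fold only, which is one simple way to avoid data leakage.

```python
import math
import random
from collections import Counter


def make_data(n_per_class=40, seed=0):
    """Hypothetical toy data: two Gaussian blobs whose features differ in scale."""
    rng = random.Random(seed)
    X, y = [], []
    for label, centre in ((0, 0.0), (1, 2.0)):
        for _ in range(n_per_class):
            X.append([rng.gauss(centre, 1.0), rng.gauss(centre * 100, 100.0)])
            y.append(label)
    return X, y


def fit_scaler(X):
    """Column means and standard deviations of the rows passed in."""
    cols = list(zip(*X))
    means = [sum(c) / len(c) for c in cols]
    stds = [math.sqrt(sum((v - m) ** 2 for v in c) / len(c)) or 1.0
            for c, m in zip(cols, means)]
    return means, stds


def scale(X, means, stds):
    """Standardise rows with previously fitted means and standard deviations."""
    return [[(v - m) / s for v, m, s in zip(row, means, stds)] for row in X]


def knn_predict(X_train, y_train, row, k):
    """Majority vote among the k nearest training points (Euclidean distance)."""
    nearest = sorted(zip(X_train, y_train),
                     key=lambda pair: math.dist(pair[0], row))[:k]
    return Counter(label for _, label in nearest).most_common(1)[0][0]


def cv_accuracy(X, y, k, n_folds=5, seed=0):
    """Mean accuracy of k-NN over n_folds shuffled folds."""
    idx = list(range(len(X)))
    random.Random(seed).shuffle(idx)
    folds = [idx[i::n_folds] for i in range(n_folds)]
    scores = []
    for fold in folds:
        held_out = set(fold)
        train = [i for i in idx if i not in held_out]
        # Fit the scaler on the training fold only, to avoid data leakage.
        means, stds = fit_scaler([X[i] for i in train])
        X_tr = scale([X[i] for i in train], means, stds)
        X_te = scale([X[i] for i in fold], means, stds)
        y_tr = [y[i] for i in train]
        correct = sum(knn_predict(X_tr, y_tr, row, k) == y[i]
                      for row, i in zip(X_te, fold))
        scores.append(correct / len(fold))
    return sum(scores) / len(scores)


X, y = make_data()
candidates = [1, 3, 5, 7, 9]  # hypothetical grid of k values
results = {k: cv_accuracy(X, y, k) for k in candidates}
best_k = max(results, key=results.get)
for k, score in results.items():
    print(f"k={k}: mean CV accuracy = {score:.3f}")
print("selected k:", best_k)
```

Every candidate k is scored on the same folds, so the comparison is like for like. In a real analysis the selected model would still need a final check on data that took no part in the tuning.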