School of Python for Genomics Massimiliano Orsini. Advanced Module: Applying Machine Learning to Genomics Data


Course Description

The rapid development of next-generation sequencing technologies has made genomic sequencing one of the primary sources of biological data. Despite the availability of numerous analysis tools, genomic research often requires specific skills to create custom scripts or software. To address this, the Società Italiana di Diagnostica di Laboratorio Veterinaria (SIDiLV) and the Italian Infrastructure of Bioinformatics (IIB/ELIXIR Italy), together with the Italian National Institute of Health (ISS) and Sapienza University of Rome, organised the “School of Python for Genomics Massimiliano Orsini”.

The School consists of three consecutive modules:

Basic module: Introduction to Python programming (Nov 2022)
Intermediate module: Analysing your genomic data with Python (Jan-Feb 2024)
Advanced Module: Applying Machine Learning to Genomics Data (Apr 2026)

Machine Learning (ML) is increasingly central to scientific research and finds important applications in interpreting genomic characterization data of pathogens. The main objective of this third module is to use Python to apply various ML models to genomic data from a population of pathogenic Escherichia coli strains. Participants will be guided through the critical analysis of evaluation metrics to select the most appropriate algorithms for predictive problems related to strain circulation and clinical associations.

Important Dates

Deadline for applications: 8 April, 2026
Selected participants will be notified by: 10 April, 2026
School dates: 20-24 April, 2026

Venue

National Institute of Health (ISS)
Viale Regina Elena, 299
00161 Rome, Italy

Fee

No registration fee is required for the course itself. However, selected participants will be required to pay the SIDiLV membership fee for the year 2026 (Full fee: 50 euros; Reduced fee for under 35s: 30 euros).
Participants are expected to cover their own travel and accommodation costs.

Instructors

Luca De Sabato, Dep. of Food Safety, Nutrition and Veterinary Public Health, National Institute of Health (SANV - ISS), Italy
Loredana Le Pera, Core Facilities Technical-Scientific Service, National Institute of Health (FAST - ISS), Italy
Valeria Michelacci, Dep. of Food Safety, Nutrition and Veterinary Public Health, National Institute of Health (SANV - ISS), Italy
Damiano Parrone, Sapienza University of Rome, Italy
Allegra Via, Sapienza University of Rome, Italy

Registration

Application Form

A maximum of 20 candidates will be selected. Priority will be given to those who attended the Intermediate Module in 2024. Selection will be based on the order of registration and the prerequisites declared in the application form. Notifications of acceptance will be sent by 10 April, 2026.

Prerequisites

This Advanced Module is a follow-up to the previous modules of the School. It assumes solid intermediate Python programming skills and experience with genomic data analysis. The course is highly practical and based on exercises with real datasets; participants must bring their own laptop.

Target audience

Research scientists, veterinarians, and health professionals at any stage of their career who are applying (or planning to apply) Machine Learning to genomic data.

Learning Outcomes

At the end of the course, participants will be able to:

Describe the fundamental concepts of ML and distinguish between the main learning paradigms
Illustrate the phases of the ML workflow, from data collection and pre-processing to model evaluation
Explore the structure of a dataset and visualize its characteristics using plotting techniques
Perform dataset pre-processing
Apply dimensionality reduction techniques (PCA) and clustering
Implement ML algorithms in Python using standard libraries
Evaluate model performance using metrics appropriate to the task
Select the most suitable algorithm based on the specific problem
Recognize and manage the main ML issues (overfitting, data leakage, class imbalance)
Critically interpret model results and contextualize them with respect to the biological problem of interest
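
As an illustration of the outcomes on evaluation metrics and class imbalance, the following minimal sketch compares two sets of predictions on an imbalanced binary problem. The labels ("mild", "severe"), the class counts and the predictions are all invented for this example and are not taken from the course dataset. Only the Python standard library is used so that each formula stays visible; in practice a library such as scikit-learn would normally compute these metrics.

```python
def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one positive class."""
    tp = fp = fn = tn = 0
    for truth, pred in zip(y_true, y_pred):
        if pred == positive and truth == positive:
            tp += 1
        elif pred == positive:
            fp += 1
        elif truth == positive:
            fn += 1
        else:
            tn += 1
    return tp, fp, fn, tn


def safe_div(numerator, denominator):
    """Return 0.0 instead of failing when a metric is undefined."""
    return numerator / denominator if denominator else 0.0


def binary_metrics(y_true, y_pred, positive):
    """Compute common binary classification metrics as a dictionary."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)          # sensitivity
    specificity = safe_div(tn, tn + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical, imbalanced ground truth: 5 "severe" and 95 "mild" strains.
y_true = ["severe"] * 5 + ["mild"] * 95

# Baseline that always predicts the majority class.
y_baseline = ["mild"] * 100

# Hypothetical model: finds 4 of the 5 severe strains, with 10 false alarms.
y_model = ["severe"] * 4 + ["mild"] + ["severe"] * 10 + ["mild"] * 85

baseline = binary_metrics(y_true, y_baseline, positive="severe")
model = binary_metrics(y_true, y_model, positive="severe")

for name, scores in (("baseline", baseline), ("model", model)):
    summary = ", ".join(f"{key}={value:.2f}" for key, value in scores.items())
    print(f"{name}: {summary}")
```

In this made-up case the majority-class baseline has the higher accuracy even though it never detects a severe strain, while recall, F1 and balanced accuracy all favour the second set of predictions. This is why the metric should be chosen to suit the task rather than defaulting to accuracy.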

Organisers

SIDiLV
ELIXIR Italy
Stefano Morabito - SIDiLV/ELIXIR Italy/National Institute of Health (ISS), Italy
Valeria Michelacci - SIDiLV/National Institute of Health (ISS), Italy
Loredana Le Pera - ELIXIR Italy (Training Platform)/National Institute of Health (ISS), Italy
Allegra Via - ELIXIR Italy (Training Platform)/Sapienza Univ. of Rome, Italy

Helpers

Andrea Cacioppo, National Institute of Health (ISS), Italy
Donald Baku, Institute of Molecular Biology and Pathology - CNR, Italy
Arnold Knijn, Dep. of Food Safety, Nutrition and Veterinary Public Health, National Institute of Health (SANV - ISS), Italy
Gianmarco Pascarella, ELIXIR Italy (Training Platform)/CNR, Italy

Contact

For any enquiries, please contact valeria.michelacci@iss.it

Programme

Time	Objective	Activity	Instructor
Day 1 - Monday 20/04/2026
14:00	Welcome
14:00 - 14:45	Participant Introductions
14:45 - 15:00	Course Introduction - School History	Presentation	Massimiliano Orsini, Stefano Morabito
15:00 - 15:15	SIDiLV Society and ELIXIR Research Infrastructure Presentation	Presentation	Valeria Michelacci, Loredana Le Pera
15:15 - 15:45	Review of Previous Modules	Presentation	Luca De Sabato
15:45 - 16:15	Break
16:15 - 16:40	Setting
16:40 - 17:25	Introduction to Machine Learning	Presentation	Allegra Via
17:25 - 17:30	Wrap-up
Day 2 - Tuesday 21/04/2026
09:00 - 09:30	Warm-up
09:30 - 09:45	Previous Day Recap and Q&A
09:45 - 10:15	Machine Learning Pipeline and Artificial Dataset Presentation	Presentation	Damiano Parrone
10:15 - 11:00	Data Import and Preprocessing (Artificial Dataset)	Hands-on	Damiano Parrone
11:00 - 11:30	Break
11:30 - 13:00	Data Preprocessing and Feature Encoding	Hands-on	Damiano Parrone
13:00 - 14:00	Lunch
14:00 - 15:30	Model Generation (K-Nearest Neighbors) and Hyperparameter Tuning	Hands-on	Damiano Parrone
15:30 - 16:00	Break
16:00 - 17:30	Model Generation (Decision Trees) and Hyperparameter Tuning	Hands-on	Damiano Parrone
Day 3 - Wednesday 22/04/2026
09:00 - 09:30	Previous Day Recap and Q&A
09:30 - 11:00	Cross-Validation (Single Model)	Hands-on	Damiano Parrone
11:00 - 11:30	Break
11:30 - 13:00	Best Model Selection	Hands-on	Damiano Parrone
13:00 - 14:00	Lunch
14:00 - 14:15	Real Dataset Presentation	-	Luca De Sabato
14:15 - 15:30	Data Import and Preprocessing (Real Dataset)	Hands-on	Luca De Sabato
15:30 - 16:00	Break
16:00 - 17:30	Data Preprocessing and Feature Encoding	Hands-on	Luca De Sabato
Day 4 - Thursday 23/04/2026
09:00 - 09:30	Previous Day Recap and Q&A
09:30 - 11:00	Model Generation (K-Nearest Neighbors) and Hyperparameter Tuning	Hands-on	Luca De Sabato
11:00 - 11:30	Break
11:30 - 13:00	Model Generation (Decision Trees) and Hyperparameter Tuning	Hands-on	Luca De Sabato
13:00 - 14:00	Lunch
14:00 - 15:30	Cross-Validation (Single Model)	Hands-on	Luca De Sabato
15:30 - 16:00	Break
16:00 - 17:30	Best Model Selection	Hands-on	Luca De Sabato
Day 5 - Friday 24/04/2026
09:00 - 09:30	Previous Day Recap and Q&A
09:30 - 11:00	Feature Importance Analysis	Hands-on	Damiano Parrone
11:00 - 11:30	Break
11:30 - 13:00	Model Packaging and Deployment	Hands-on	Luca De Sabato
13:00 - 14:00	Lunch
14:00 - 15:30	Dimensionality Reduction	Presentation	Damiano Parrone
15:30 - 16:00	Final Discussion and Closing Remarks	-	Stefano Morabito