Training courses: School of Python Advanced module
Course Description
The rapid development of next-generation sequencing technologies has made genomic sequencing one of the primary sources of biological data. Despite the availability of numerous analysis tools, genomic research often requires specific skills to create custom scripts or software. To address this, the Società Italiana di Diagnostica di Laboratorio Veterinaria (SIDiLV) and the Italian Infrastructure of Bioinformatics (IIB/ELIXIR Italy), together with the Italian National Institute of Health (ISS) and Sapienza University of Rome, organised the “School of Python for Genomics Massimiliano Orsini”.
The School consists of three consecutive modules:
- Basic module: Introduction to Python programming (Nov 2022)
- Intermediate module: Analysing your genomic data with Python (Jan-Feb 2024)
- Advanced module: Applying Machine Learning to Genomics Data (Apr 2026)
Machine Learning (ML) is increasingly central to scientific research and finds important applications in interpreting genomic characterization data of pathogens. The main objective of this third module is to use Python to apply various ML models to genomic data from a population of pathogenic Escherichia coli strains. Participants will be guided through the critical analysis of evaluation metrics to select the most appropriate algorithms for predictive problems related to strain circulation and clinical associations.
Important Dates
- Deadline for applications: 8 April, 2026
- Chosen participants will be notified by: 10 April, 2026
- School dates: 20-24 April, 2026
Venue
National Institute of Health (ISS)
Viale Regina Elena, 299
00161 Rome, Italy
Fee
No registration fee is required for the course itself. However, selected participants will be required to pay the SIDiLV membership fee for the year 2026 (Full fee: 50 euros; Reduced fee for under 35s: 30 euros).
Participants are expected to cover their own travel and accommodation costs.
Instructors
- Luca De Sabato, Dep. of Food Safety, Nutrition and Veterinary Public Health, National Institute of Health (SANV - ISS), Italy
- Arnold Knijn, Dep. of Food Safety, Nutrition and Veterinary Public Health, National Institute of Health (SANV - ISS), Italy
- Loredana Le Pera, Core Facilities Technical-Scientific Service, National Institute of Health (FAST - ISS), Italy
- Valeria Michelacci, Dep. of Food Safety, Nutrition and Veterinary Public Health, National Institute of Health (SANV - ISS), Italy
- Damiano Parrone, Sapienza University of Rome, Italy
- Allegra Via, Sapienza University of Rome, Italy
Registration
A maximum of 20 candidates will be selected. Priority will be given to those who attended the Intermediate Module in 2024. Selection will be based on the order of registration and the prerequisites declared in the application form.
Notifications of acceptance will be sent by 10 April, 2026.
Prerequisites
This Advanced Module is a follow-up to the previous modules of the School. It assumes solid intermediate Python programming skills and experience with genomic data analysis. The course is highly practical and based on exercises with real datasets; participants must bring their own laptop.
Target audience
Research scientists, veterinarians, and health professionals at any stage of their career who are applying (or planning to apply) Machine Learning to genomic data.
Learning Outcomes
At the end of the course, participants will be able to:
- Describe the fundamental concepts of ML and distinguish between the main learning paradigms
- Illustrate the phases of the ML workflow, from data collection and pre-processing to model evaluation
- Explore the structure of a dataset and visualize its characteristics using plotting techniques
- Perform dataset pre-processing
- Apply dimensionality reduction techniques (PCA) and clustering
- Implement ML algorithms in Python using standard libraries
- Evaluate model performance using metrics appropriate to the task
- Select the most suitable algorithm based on the specific problem
- Recognize and manage the main ML issues (overfitting, data leakage, class imbalance)
- Critically interpret model results and contextualize them with respect to the biological problem of interest
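As an illustration of why the outcomes above stress task-appropriate metrics and class imbalance, here is a minimal sketch in plain Python. It is not taken from the course material: the labels "clinical" and "non-clinical", the 90/10 class split and the two sets of predictions are hypothetical values chosen only to show how accuracy can mislead on an imbalanced dataset.

```python
from collections import Counter


def confusion_counts(y_true, y_pred, positive):
    """Count true/false positives and negatives for one class of interest."""
    pairs = Counter((t == positive, p == positive) for t, p in zip(y_true, y_pred))
    tp = pairs[(True, True)]    # positive, predicted positive
    fn = pairs[(True, False)]   # positive, predicted negative
    fp = pairs[(False, True)]   # negative, predicted positive
    tn = pairs[(False, False)]  # negative, predicted negative
    return tp, fp, fn, tn


def safe_div(num, den):
    """Return num / den, or 0.0 when the denominator is zero."""
    return num / den if den else 0.0


def evaluate(y_true, y_pred, positive):
    """Compute a few common binary classification metrics."""
    tp, fp, fn, tn = confusion_counts(y_true, y_pred, positive)
    precision = safe_div(tp, tp + fp)
    recall = safe_div(tp, tp + fn)
    specificity = safe_div(tn, tn + fp)
    return {
        "accuracy": safe_div(tp + tn, tp + fp + fn + tn),
        "precision": precision,
        "recall": recall,
        "f1": safe_div(2 * precision * recall, precision + recall),
        "balanced_accuracy": (recall + specificity) / 2,
    }


# Hypothetical, imbalanced labels: 90 "non-clinical" and 10 "clinical" strains.
y_true = ["non-clinical"] * 90 + ["clinical"] * 10

# Hypothetical naive model: always predicts the majority class.
y_naive = ["non-clinical"] * 100

# Hypothetical second model: 9 false alarms, but finds 8 of the 10 clinical strains.
y_model = ["non-clinical"] * 81 + ["clinical"] * 9 + ["clinical"] * 8 + ["non-clinical"] * 2

naive = evaluate(y_true, y_naive, positive="clinical")
model = evaluate(y_true, y_model, positive="clinical")

for name, scores in (("naive", naive), ("model", model)):
    print(name, {metric: round(value, 2) for metric, value in scores.items()})
```

In this toy case the naive model has the higher accuracy (0.90 against 0.89) although it never identifies a clinical strain, while recall and balanced accuracy expose the difference. In practice a library such as scikit-learn would normally be used for these metrics.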
Organisers
- SIDiLV
- ELIXIR Italy
- Stefano Morabito - SIDiLV/ELIXIR Italy/National Institute of Health (ISS), Italy
- Valeria Michelacci - SIDiLV/National Institute of Health (ISS), Italy
- Loredana Le Pera - ELIXIR Italy (Training Platform)/National Institute of Health (ISS), Italy
- Allegra Via - ELIXIR Italy (Training Platform)/Sapienza Univ. of Rome, Italy
Contact
For any enquiries, please contact valeria.michelacci@iss.it
Preliminary Programme
| Date | Time | Topic |
|---|---|---|
| 20 April | 14:00 - 17:00 | Introduction to Machine Learning and Dataset Exploration |
| 21 April | 9:00 - 13:00 and 14:00 - 17:00 | Data Pre-processing, Cleaning and Dimensionality Reduction Techniques |
| 22 April | 9:00 - 13:00 and 14:00 - 17:00 | Building a Machine Learning Pipeline |
| 23 April | 9:00 - 13:00 and 14:00 - 17:00 | Model Validation, Hyperparameter Tuning and Model Interpretation |
| 24 April | 9:00 - 13:00 and 14:00 - 16:00 | ML Results Interpretation and Discussion |