CS-E4880 - Machine Learning in Bioinformatics D, Lecture, 3.3.2023-2.6.2023
This course space end date is set to 02.06.2024
Topic outline
-
Anna Cichonska: Integration of machine learning with experimental approaches to rationally design a new generation of kinase drugs
Abstract
Most kinase drugs have several off-target interactions that contribute to either their therapeutic or their toxic effects in humans. For instance, sorafenib’s primary target is RAF kinase, and it was originally designed to treat RAF-driven cancers, such as melanoma. However, sorafenib’s off-target interactions with PDGFR and VEGFR kinases enabled its repurposing for other indications, including thyroid, renal and hepatocellular cancers, increasing its therapeutic index. On the other hand, off-target interactions can also lead to unexpected adverse reactions in patients, as seen with highly promiscuous kinase drugs such as sunitinib. Harmonic Discovery’s mission is to engineer a new generation of kinase drugs with precision multi-target or polypharmacological profiles. This can lead to advances in the treatment of diseases such as cancer, new therapies with reduced likelihood of resistance mechanisms, and safer drugs for patients. We accomplish this by integrating the latest advances in machine learning, in the fields of chemistry and biology, with modern experimental techniques, such as CRISPR screening. In this talk, I will present examples of using machine learning for kinase drug design.
Bio
Anna Cichońska holds a PhD in Information and Computer Science from Aalto University, and her research has been conducted in collaboration with the Institute for Molecular Medicine Finland FIMM. She has received two doctoral dissertation awards, including the award for the best doctoral thesis of 2018-2019 in Finland in the field of Bioinformatics. Anna’s work has been focused on developing machine learning methods and tools to facilitate biomedical research in the “multiple genes - multiple diseases - multiple drugs” paradigm. Anna was one of the main organizers of the IDG-DREAM Drug-Kinase Binding Prediction Challenge, an international crowd-sourced competition that evaluated the power of machine learning models for predicting yet unexplored drug-target activities.
After graduation, Anna continued her research as a Postdoc at FIMM, and worked as a Senior Data Scientist at Nightingale Health, a Finnish biotech company leveraging metabolomics for preventive and personalized medicine. At Nightingale, Anna led scientific collaborations with some of the largest biobanks in the world. Currently, Anna leads the development of a computational platform for next-generation therapeutics at Harmonic Discovery, a biotech startup based in the USA and Finland.
-
Heli Julkunen: Metabolic blood biomarker profiling for risk prediction of various chronic diseases – evidence from 275,000 individuals in the UK Biobank
Abstract
Early identification of individuals at risk of developing chronic diseases is essential for effective prevention strategies. With the potential to enhance the quality of life of individuals and reduce the burden on healthcare systems, there is a need for low-cost tests that can simultaneously profile the risk of multiple diseases.
Nightingale Health Plc has developed a high-throughput nuclear magnetic resonance (NMR) based metabolomics platform, which quantifies diverse blood biomarkers from multiple metabolic pathways, including lipoprotein measures, fatty acids and small molecules such as amino acids, ketones, and glycolysis metabolites. These NMR biomarker profiles have recently been measured in 275,000 individuals from UK Biobank. The initial results have revealed a prominent role of these metabolic biomarkers as risk markers across various health outcomes beyond cardiometabolic diseases.
In the talk, I will showcase the use of these biomarker profiles for predicting the onset of various common diseases, offering the potential to identify individuals with several-fold increased risk compared to the general population. I will also show how combining the metabolic risk profiles with polygenic risk scores can provide a more comprehensive view of an individual's disease risk by capturing both inherited and accumulated lifestyle risk. From a translational perspective, this may complement the identification of high-risk individuals across various diseases beyond standard risk factor assessment.
Bio
Heli Julkunen is a senior data scientist at Nightingale Health Plc, the Finnish innovator of an internationally recognized metabolomics platform for large cohorts, biobanks, and trials. She has led studies on biomarker discovery and risk prediction using the Nightingale biomarker data in UK Biobank. She is part of Nightingale’s science team, with a research focus on translating biomarker profiling from cohorts to preventative healthcare.
-
Abstract: Biomedical data poses multiple hard challenges that break conventional machine learning assumptions. In this talk, I will highlight the need to transcend our prevalent machine learning paradigm and methods to enable them to become the driving force of new scientific discoveries. I will present machine learning methods that can bridge the heterogeneity of individual biological datasets by transferring knowledge across datasets, with a unique ability to discover novel, previously uncharacterized phenomena. I will discuss the findings and impact these methods have for annotating comprehensive single-cell atlas datasets and for the discovery of novel cell types.
Bio: Maria Brbic (https://brbiclab.epfl.ch/) is an Assistant Professor of Computer Science and, by courtesy, of Life Sciences at the Swiss Federal Institute of Technology, Lausanne (EPFL). She develops new machine learning methods and applies her methods to advance biology and biomedicine. Her methods have been used by global cell atlas consortia efforts aiming to create reference maps of all cell types with the potential to transform biomedicine, including the Human BioMolecular Atlas Program (HuBMAP) and Fly Cell Atlas consortium. Prior to joining the EPFL faculty in 2022, Maria was a postdoctoral fellow at Stanford University, Department of Computer Science, and was a member of the Chan Zuckerberg Biohub at Stanford. Maria received her Ph.D. from the University of Zagreb in 2019 while also researching at Stanford University as a Fulbright Scholar and at the University of Tokyo. She was named a rising star in EECS by MIT in 2021.
-
Markus Heinonen: Generative Models for molecules
Abstract: Generative models have recently become a major branch of machine learning, spearheaded by the breakthroughs in generating natural language (ChatGPT) or images (Midjourney, etc.). Similarly, they can be used to generate novel drug-like molecules or protein structures. These models learn to represent a potentially informative latent encoding of an object, which can be used for downstream tasks such as classification. In this lecture I will cover the foundations of generative models and highlight how they can be used in molecular generation.
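To make the idea of "learning a distribution and sampling new objects from it" concrete, here is a minimal sketch of a toy generative model: a first-order Markov chain over the characters of SMILES strings. This is an illustrative stand-in, not a method from the lecture; it has no latent encoding, and the four training molecules, the sentinel tokens and the function names are all assumptions made for the example.

```python
import random
from collections import defaultdict

# Multi-character sentinels, chosen so they cannot collide with SMILES symbols.
START, END = "<s>", "</s>"

def fit_markov(smiles_list):
    """Count character-to-character transitions over the training strings."""
    counts = defaultdict(lambda: defaultdict(int))
    for s in smiles_list:
        prev = START
        for ch in list(s) + [END]:
            counts[prev][ch] += 1
            prev = ch
    return counts

def sample(counts, rng, max_len=40):
    """Walk the chain from START until END is drawn or max_len is reached."""
    out, prev = [], START
    while len(out) < max_len:
        nxt = counts[prev]
        chars = list(nxt)
        ch = rng.choices(chars, weights=[nxt[c] for c in chars])[0]
        if ch == END:
            break
        out.append(ch)
        prev = ch
    return "".join(out)

# Hypothetical toy training set: ethanol, acetic acid, benzene, isopropanol.
training = ["CCO", "CC(=O)O", "c1ccccc1", "CC(C)O"]
model = fit_markov(training)
rng = random.Random(0)  # fixed seed for reproducibility
samples = [sample(model, rng) for _ in range(5)]
```

Strings drawn this way only respect local character statistics, so they are not guaranteed to be valid SMILES (parentheses and ring-closure digits may be unbalanced). Handling such long-range constraints is one reason molecular generation relies on richer models.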
-
Elena Casiraghi: Patient Similarity Networks and their integration for diagnostic/prognostic biomarker discovery
Abstract
Methods for phenotype and outcome prediction are largely based on inductive supervised models that use selected biomarkers to make predictions, without explicitly considering the functional relationships between individuals. In this talk, we will first describe a network-based approach named Patient-Net (P-Net) in which biomolecular profiles of patients are modeled in a graph-structured space that represents gene expression relationships between patients. Then a kernel-based semi-supervised transductive algorithm is applied to the graph to explore its overall topology and to predict the phenotype/clinical outcome of patients. Importantly, P-Net also provides interpretable models that can be easily visualized to gain clues about the relationships between patients, and to formulate hypotheses about their stratification. Though simple and effective, P-Net has been designed to work only on unimodal data, which limits the amount of information that can be leveraged for prediction. Hence, the main focus of our research is the investigation of proper methods for multi-modal data integration. The second part of the talk will therefore be devoted to a brief description of the main data-integration strategies and approaches.
Bio
Elena Casiraghi is an Associate Professor at the Computer Science Department, Università degli Studi di Milano, as well as an Affiliate Researcher of the Environmental Genomics and Systems Biology Division of the Berkeley University.
Her research life started at the Information Technology Department in VTT (Valtion Teknillinen Tutkimuskeskus, Helsinki, Finland), where she worked on the development of virtual reality applications (projects in cooperation with Nokia). Her research work in the Department of Computer Science in Milan started in the field of medical image processing for developing diagnostic/prognostic predictive models. Besides medical imaging applications, she also focused on the development of novel machine learning algorithms for pattern recognition and manifold learning to be applied in the health informatics, bioinformatics, and computational biology fields. Among others, she recently focused on intrinsic dimensionality estimation and dimensionality reduction to develop novel theories and automatic algorithms dealing with high-dimensional datasets characterized by a small cardinality (Small Sample Size Problem), imputation algorithms to deal with data missingness, multi-modal data integration, graph analysis, and graph representation learning techniques. She has ongoing collaborations with the main hospitals in the Milan area, the Berkeley research laboratories, the Jackson Laboratories, Columbia University, and Virginia Tech University.
Main interests: BioInformatics, Computational and Systems Biology, Health Informatics, (Graph) Machine Learning and graph Representation Learning, High-dimensional data processing, Explainability/Interpretability
Email: elena (dot) casiraghi (at) unimi (dot) it
-
Speaker: Elina Francovic-Fontaine, Laval University, Canada
Time and place: 15.5.2023 at 14:15, seminar room T3 Aalto CS Building.
Abstract:
Machine learning needs to be made accessible to metabolomics scientists to help with tasks like biomarker discovery. This presentation will introduce the Metabolomics Dashboard for Interpretable Classification (MeDIC), which aims to facilitate access to machine learning for non-domain experts. Our work focuses on providing the scientific community with a sound and straightforward tool to ease the analysis of untargeted liquid chromatography-mass spectrometry metabolomic data. We propose the MeDIC to perform metabolomics data analyses relying on machine learning tools. It allows the extraction of relevant features from a pool of interpretable classifiers to perform biomarker discovery. Alongside the pure machine learning study, it provides an extensive result analysis pipeline, allowing non-domain experts to understand the inner mechanisms of the algorithms.
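As a rough illustration of extracting relevant features from a pool of interpretable classifiers, the sketch below fits one threshold rule (a decision stump) per feature and ranks features by how well their stump separates two classes. It is not MeDIC's implementation: the choice of stumps, the function names, the feature names and the tiny intensity table are all assumptions made for the example.

```python
def best_stump_accuracy(values, labels):
    """Best training accuracy of a single-threshold rule on one feature.

    Both directions are tried: 'value >= t predicts class 1' and its inverse.
    """
    n = len(labels)
    best = 0.0
    for t in sorted(set(values)):
        correct = sum((v >= t) == bool(y) for v, y in zip(values, labels))
        best = max(best, correct / n, (n - correct) / n)
    return best

def rank_features(X, y, names):
    """Score every feature with its own stump and sort from best to worst."""
    scores = {
        name: best_stump_accuracy([row[j] for row in X], y)
        for j, name in enumerate(names)
    }
    return sorted(scores.items(), key=lambda kv: kv[1], reverse=True)

# Hypothetical intensities for two features across four samples.
X = [[1.0, 5.0], [1.2, 3.0], [3.1, 4.8], [2.9, 3.2]]
y = [0, 0, 1, 1]  # hypothetical class labels (e.g. control vs. case)
ranking = rank_features(X, y, ["feat_a", "feat_b"])
# ranking == [("feat_a", 1.0), ("feat_b", 0.75)]
```

Each stump is readable as a single rule ("intensity above t suggests class 1"), which is what makes the resulting ranking easy to inspect. Scores here are training accuracies on toy data, so a real analysis would need held-out evaluation.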
Bio:
Elina is a Ph.D. student at Laval University in Canada. She completed her Bachelor's degree in bioinformatics. As an undergraduate student, she developed a strong interest in machine learning through her internships. For her Master's degree, she focused on the application of machine learning to metabolomics data and the creation of a pipeline with a web-based interface to produce and analyse machine learning experiments on mass spectrometry metabolomics data. For her Ph.D., she is furthering the development of the tool by developing multi-omics algorithms that focus on the integration of metabolomics with other omics.
-