Cornell University, Machine Learning in Medicine (MLIM) seminar - Uncertainty in deep generative models with applications to genomics and drug design

Abstract

In this talk I will discuss how combining uncertainty quantification and deep generative modeling helps address key questions in genomics and drug design. The first part will cover an approach we developed to predict the clinical significance of protein variants in a fully unsupervised manner, learning directly from the natural distribution of proteins in evolutionary data. Our model EVE (Evolutionary model of Variant Effect) not only outperforms computational approaches that rely on labelled data, but also performs on par with high-throughput assays, which are increasingly used as strong evidence for variant classification. We predict the pathogenicity of 11 million variants across 1,081 disease genes and assign high-confidence reclassifications to 72k Variants of Unknown Significance by combining uncertainty metrics with other sources of evidence.

The second part will focus on the task of optimizing a black-box objective function over high-dimensional structured spaces (e.g., maximizing the drug-likeness of molecules). Optimization in the latent space of deep generative models is a recent and promising approach to this task. However, existing methods in this area lack robustness, as they may explore regions of latent space for which no data was available during training. We propose a new approach that quantifies and leverages the epistemic uncertainty of the decoder to guide the optimization process, and show that it yields more effective optimization because it avoids cases in which the decoder generates unrealistic or invalid objects.

Date
Mar 19, 2021
Pascal Notin
Scientific Lead

Research in AI for Protein Design