RNAGym: Large-scale Benchmarks for RNA Fitness and Structure Prediction

Preprint. Large-scale benchmarks to assess models for RNA fitness & structure prediction.

Large-scale discovery, analysis, and design of protein energy landscapes

Preprint. A multiplexed experimental approach to analyze conformational fluctuations across thousands of protein domains, revealing hidden variations that affect protein cooperativity and function.

Predicting Promoter Variant Effects from Evolutionary Sequences

Preprint. A conditional autoregressive transformer model trained on 14.6 million mammalian promoter sequences that achieves state-of-the-art performance in predicting the effects of indels in human promoter regions.

TranceptEVE: Combining Family-specific and Family-agnostic Models of Protein Sequences for Improved Fitness Prediction

NeurIPS, LMRL, 2022. A hybrid family-specific and family-agnostic model that achieves state-of-the-art performance on protein fitness prediction and human variant annotation.

RITA: a Study on Scaling Up Generative Protein Sequence Models

ICML, WCB, 2022. The first paper investigating scaling laws in protein language modeling.

Viral Evolution and Antibody Escape Mutations using Deep Generative Models

We leverage deep generative models of evolutionary sequences to predict viral escape mutations.

Improving Compute Efficacy Frontiers with SliceOut

Preprint. A memory-efficient dropout-inspired scheme to train large neural networks faster with no loss in accuracy.

Principled Uncertainty Estimation for High Dimensional Data

We introduce an importance-sampling-based estimator of the epistemic uncertainty of deep learning models on high-dimensional discrete datasets.