Multi-Scale Representation Learning for Protein Fitness Prediction

NeurIPS, 2024. A multimodal framework that integrates protein sequence, structure, and surface topology features to achieve state-of-the-art fitness prediction.

ProteinGym: Large-Scale Benchmarks for Protein Fitness Prediction and Design

NeurIPS, 2023. Large-scale benchmarks to assess models for protein fitness prediction and design.

ProteinNPT: Improving Protein Property Prediction and Design with Non-Parametric Transformers

NeurIPS, 2023. A conditional semi-supervised pseudo-generative model for fitness prediction and design.

DiscoBAX: Discovery of Optimal Intervention Sets in Genomic Experiment Design

ICML, 2023. A sample-efficient method for discovering intervention sets that are diverse and that optimize the function of interest.

Tranception: Protein Fitness Prediction with Autoregressive Transformers and Inference-time Retrieval

ICML, 2022. A suite of autoregressive transformers with biological priors, augmented with inference-time retrieval, that achieves state-of-the-art performance on protein fitness prediction.

GeneDisco: A Benchmark for Experimental Design in Drug Discovery

ICLR, 2022. A benchmark suite for evaluating active learning algorithms for experimental design in drug discovery.

Improving Black-box Optimization in VAE Latent Space Using Decoder Uncertainty

NeurIPS, 2021. A framework that uses the epistemic uncertainty of the decoder of a VAE to guide the optimization of properties of high-dimensional structured objects (e.g., molecules) in latent space.