RNAGym: Large-scale Benchmarks for RNA Fitness and Structure Prediction

Rohit Arora, Murphy Angelo, Christian Andrew Choe, Courtney A. Shearer, Aaron W. Kollasch, Fiona Qu, Ruben Weitzman, Artem Gazizov, Sarah Gurev, Erik Xie, Debora Marks, Pascal Notin

June 2025

Abstract

Understanding RNA structure and predicting the functional consequences of mutations are fundamental challenges in computational biology with broad implications for therapeutic development and synthetic biology. Current evaluation of machine learning-based RNA models suffers from disparate experimental datasets and inconsistent performance assessments across different RNA families. To address these challenges, we introduce RNAGym, a large-scale benchmarking framework specifically designed for three core tasks: RNA fitness, secondary structure, and tertiary structure prediction. The framework integrates extensive datasets, including 70 standardized deep mutational scanning assays covering over a million mutations across diverse RNA types; 901k chemical-mapping reactivity profiles for secondary structure; and 215 diverse tertiary structures curated from the PDB. RNAGym is designed to facilitate a systematic comparison of RNA models, offering an essential resource to enhance the understanding and development of these models.

Type

Preprint

Benchmark; RNA Fitness; RNA Structure; Deep Mutational Scanning assay; PDB