Understanding RNA structure and predicting the functional consequences of mutations are fundamental challenges in computational biology, with broad implications for therapeutic development and synthetic biology. Current evaluation of machine learning-based RNA models suffers from disparate experimental datasets and inconsistent performance assessments across RNA families. To address these challenges, we introduce RNAGym, a large-scale benchmarking framework designed for three core tasks: RNA fitness, secondary structure, and tertiary structure prediction. The framework integrates 70 standardized deep mutational scanning assays covering over a million mutations across diverse RNA types, 901k chemical-mapping reactivity profiles for secondary structure, and 215 diverse tertiary structures curated from the PDB. RNAGym enables systematic comparison of RNA models across these tasks and provides a shared resource for understanding and developing them.