Rocklin Lab Releases Megascale Open Protein Stability Dataset to Advance Biomolecular AI | Recent News Summary

The Rocklin Lab at Northwestern University has released the MGnify Stability Dataset, a megascale open resource containing folding stability measurements for 1.8 million diverse protein domains. The measurements were generated using cDNA display proteolysis, and the dataset is aimed at advancing biomolecular AI and protein stability prediction models.

The dataset is supported by the OpenFold Consortium and includes both stable and unstable proteins from over 200,000 sequence families, providing crucial negative data often missing in public biological datasets.

The MGnify Stability Dataset is currently restricted to protein domains 60–80 amino acids in length, with experimental stabilities resolved up to approximately 5 kcal/mol. It is available to the research community to accelerate the development of improved machine learning models for protein design and analysis.
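
For readers who want to work within these limits, the following is a minimal sketch of how the stated length range and stability ceiling might be applied when preparing data for a model. The `Record` class, its field names, and the example values are hypothetical and are not taken from the dataset's actual schema; only the 60–80 residue range and the approximate 5 kcal/mol ceiling come from the summary above.

```python
from dataclasses import dataclass

# Limits quoted in the summary; the ceiling is approximate.
MIN_LEN, MAX_LEN = 60, 80   # domain length in amino acids
DG_CEILING = 5.0            # kcal/mol, upper limit of resolved stability


@dataclass
class Record:
    """Hypothetical record layout, not the dataset's real schema."""
    sequence: str
    dg_kcal_mol: float


def in_scope(sequence: str) -> bool:
    """True if the sequence length falls inside the covered range."""
    return MIN_LEN <= len(sequence) <= MAX_LEN


def is_saturated(dg_kcal_mol: float) -> bool:
    """True if the value sits at or above the resolvable ceiling."""
    return dg_kcal_mol >= DG_CEILING


# Made-up example values, for illustration only.
records = [
    Record("A" * 70, 2.3),
    Record("A" * 45, 1.0),   # too short to be in scope
    Record("A" * 80, 5.2),   # in scope, but at the ceiling
]

usable = [r for r in records if in_scope(r.sequence)]
saturated = [r for r in usable if is_saturated(r.dg_kcal_mol)]
```

Values at or above the ceiling are probably best read as lower bounds rather than exact stabilities, so a training pipeline might mask them or handle them separately.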
