Rocklin Lab Releases Megascale Open Protein Stability Dataset to Advance Biomolecular AI
The Rocklin Lab at Northwestern University has released the MGnify Stability Dataset, a megascale open resource containing folding stability measurements for 1.8 million diverse protein domains. The measurements were generated using cDNA display proteolysis, and the dataset is intended to advance biomolecular AI, in particular protein stability prediction models.
The dataset is supported by the OpenFold Consortium and includes both stable and unstable proteins from over 200,000 sequence families, providing crucial negative data often missing in public biological datasets.
The MGnify Stability Dataset is currently restricted to protein domains 60–80 amino acids in length, with experimental stabilities resolved up to approximately 5 kcal/mol. It is available to the research community to accelerate the development of improved machine learning models for protein design and analysis.
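The two stated limits (domain length of 60–80 amino acids and a resolvable range of roughly 5 kcal/mol) matter when deciding whether a sequence or measurement falls inside the dataset's coverage. The following is a minimal sketch of such a check, not the dataset's actual schema or tooling: the record layout, the field names `sequence` and `dG`, and the helper functions are hypothetical, and treating values at the ceiling as lower bounds is an assumption about how the upper limit should be interpreted.

```python
# Illustrative only: field names and records are hypothetical, not the
# dataset's real schema.
MIN_LEN, MAX_LEN = 60, 80   # domain length range stated in the summary
MAX_RESOLVED_DG = 5.0       # approximate upper limit in kcal/mol


def in_length_range(sequence: str) -> bool:
    """True when the sequence length is within the dataset's stated range."""
    return MIN_LEN <= len(sequence) <= MAX_LEN


def at_ceiling(dg_kcal_mol: float) -> bool:
    """True when a stability value sits at or above the resolvable limit.

    Assumption: such values are best read as lower bounds, not exact numbers.
    """
    return dg_kcal_mol >= MAX_RESOLVED_DG


# Made-up placeholder records for demonstration.
records = [
    {"sequence": "A" * 70, "dG": 2.3},
    {"sequence": "A" * 95, "dG": 1.1},  # too long for the stated range
    {"sequence": "A" * 64, "dG": 5.0},  # at the resolvable ceiling
]

covered = [r for r in records if in_length_range(r["sequence"])]
saturated = [r for r in covered if at_ceiling(r["dG"])]
```

A model trained on such data would likely see no examples outside the 60–80 residue window, so predictions for longer or shorter domains should be treated as extrapolation.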