Blog post published May 6, 2026

ProxyTm: Protein Thermostability at Scale


Introducing ProxyTm: a new platform for measuring protein thermostability quickly, cheaply, and at scale.

ProxyTm allows us to measure thermostability for a library of ~10,000 nanobody (VHH) variants in a single one-week experiment. This represents 100–1000x lower cost and higher throughput than conventional methods.

Story at a glance

Previously, we developed RamaX to enable screening large libraries in just 1–2 weeks. Now, we've extended this method to measure protein melting temperature (Tm), a crucial developability property.

  • Capability: In one week, we collected a dataset that is 20x the size of all public VHH Tm data.
  • Validation: ProxyTm measurements correlate strongly with gold-standard Tm values across 530 reference sequences (MAE = 3.3°C, Spearman rho = 0.90).
  • Prediction: Next, we added a simple Tm prediction head to an antibody language model and ran head-to-head finetuning experiments. Training on ProxyTm data resulted in significant performance gains compared to training on public datapoints only.
  • Open-source baselines: Open-source models in this space (NanoMelt, NBsTem, TemBERTure) fail to generalize to the larger and more diverse ProxyTm test set. Our ProxyTm-augmented models achieve 2–3x lower error on the same held-out data.
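
For readers less familiar with the two validation metrics quoted above, here is a minimal sketch of how MAE and Spearman rho can be computed for paired Tm values. The function names and the five Tm pairs are illustrative placeholders, not our analysis code or ProxyTm data.

```python
from statistics import mean


def mean_absolute_error(reference, measured):
    """Average absolute difference between paired values (same units as input)."""
    if len(reference) != len(measured) or not reference:
        raise ValueError("inputs must be non-empty and of equal length")
    return mean(abs(r - m) for r, m in zip(reference, measured))


def average_ranks(values):
    """1-based ranks; tied values share the average of the ranks they span."""
    order = sorted(range(len(values)), key=values.__getitem__)
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j to cover the run of values tied with position i.
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared
        i = j + 1
    return ranks


def pearson(x, y):
    """Pearson correlation coefficient of two equal-length sequences."""
    mx, my = mean(x), mean(y)
    cov = sum((a - mx) * (b - my) for a, b in zip(x, y))
    var_x = sum((a - mx) ** 2 for a in x)
    var_y = sum((b - my) ** 2 for b in y)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for constant input")
    return cov / (var_x * var_y) ** 0.5


def spearman_rho(x, y):
    """Spearman rho: Pearson correlation computed on the ranks."""
    if len(x) != len(y) or len(x) < 2:
        raise ValueError("inputs must have equal length of at least 2")
    return pearson(average_ranks(x), average_ranks(y))


# Placeholder Tm values in °C, for illustration only.
reference_tm = [55.0, 62.5, 70.0, 66.0, 48.5]
proxy_tm = [58.0, 60.0, 73.5, 69.0, 50.0]

print("MAE (°C):", mean_absolute_error(reference_tm, proxy_tm))
print("Spearman rho:", spearman_rho(reference_tm, proxy_tm))
```

MAE reports the typical error in degrees, while Spearman rho only asks whether variants are ranked in the same order, which is often what matters when choosing the most stable candidates from a library.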

New releases

We're excited to begin offering ProxyTm screening to our partners. This platform joins RamaX (binder discovery), RamaX-Opt (affinity maturation), and DiffuseSandbox (AI binder design).

Furthermore, we're releasing our model for predicting Tm from VHH sequence via API endpoint, for use by the community!

Why thermostability

Thermostability is a critical developability property: an antibody that unfolds too easily is harder to manufacture, formulate, and dose. That’s why Tm is one of the most important biophysical characteristics measured during lead optimization.

But conventional methods require individually purified proteins, limiting throughput to tens or hundreds of measurements per campaign. Across all datasets in the literature, only ~640 VHH sequences have Tm measurements.

The bigger picture

A preliminary look at scaling law behavior for this initial ProxyTm data suggests significant future gains in performance as we validate ProxyTm for larger libraries, up to 100K–1M+ Tm datapoints at a time.

In tandem, we're actively onboarding more protein formats and developability properties. Our goal is to build the single experimental assay that can measure a panel of properties — polyreactivity, expression, and more — at the library scale, in a matter of weeks. This will enable a new generation of multitask models that vastly accelerate protein engineering.

The lack of large, high-quality datasets fundamentally limits the performance of predictive models in this space, and we aim to raise that ceiling. For us, ProxyTm is a proof of concept that brings together many ideas at Diffuse.

Sound exciting? Get in touch to discuss how we can work together, or join our team!

— The Diffuse team

Contact Us

Email

info@diffuse.bio

Headquarters

San Carlos, CA

© 2025 Diffuse Bio