What we measure, how we measure it, and where the model is honest about its limits. Last updated 2026-05-10. Engine version v18.3.
Two distinct accuracy numbers describe a battery degradation model, and they are not interchangeable: calibrated-fit error (how closely a tuned model reproduces a curve it was fit to) and held-out forward-prediction error (how well it predicts cycles it never saw during calibration). We report both because the difference matters.
Calibrated-fit numbers appear throughout the published literature and in the marketing copy of most commercial degradation tools. Held-out forward prediction has been the standard since Severson et al. (Nature Energy, 2019) introduced it as the appropriate benchmark for cycle-life prediction.
Each preset is a hand-tuned PhysicsConfig that fits the published curve. Calibration uses Nelder–Mead in log space across 4–8 rate constants (SEI, Si cracking, plating, transport). Six of the eight presets are fully attributable peer-reviewed curves; two are synthesized references (one proxy fit, one literature composite) and are labeled accordingly in the table below.
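The calibration step can be illustrated with a small, self-contained sketch. It is not the engine's code: the two-parameter `toy_retention` model (a sqrt(n) SEI-like term plus a linear term) is a hypothetical stand-in for a PhysicsConfig, and the optimizer is a minimal textbook Nelder–Mead written out in plain Python. What it does show is the log-space idea: the optimizer searches over log(k), so every rate constant it proposes is positive.

```python
import math


def toy_retention(cycles, k_sei, k_lin):
    # Hypothetical stand-in for a PhysicsConfig: % capacity retention with a
    # sqrt(n) SEI-like fade term and a linear fade term.
    return [100.0 - k_sei * math.sqrt(n) - k_lin * n for n in cycles]


def nelder_mead(f, x0, step=0.5, xtol=1e-10, max_iter=5000):
    # Minimal Nelder-Mead: reflection, expansion, inside contraction, shrink.
    dim = len(x0)
    simplex = [list(x0)]
    for i in range(dim):
        vertex = list(x0)
        vertex[i] += step
        simplex.append(vertex)
    values = [f(v) for v in simplex]
    for _ in range(max_iter):
        # Order vertices from best (lowest objective) to worst.
        order = sorted(range(dim + 1), key=lambda i: values[i])
        simplex = [simplex[i] for i in order]
        values = [values[i] for i in order]
        best, worst = simplex[0], simplex[-1]
        # Stop once every vertex sits within xtol of the best one.
        if all(abs(a - b) < xtol for v in simplex[1:] for a, b in zip(v, best)):
            break
        centroid = [sum(v[j] for v in simplex[:-1]) / dim for j in range(dim)]
        reflected = [c + (c - w) for c, w in zip(centroid, worst)]
        f_r = f(reflected)
        if f_r < values[0]:
            expanded = [c + 2.0 * (c - w) for c, w in zip(centroid, worst)]
            f_e = f(expanded)
            simplex[-1], values[-1] = (expanded, f_e) if f_e < f_r else (reflected, f_r)
        elif f_r < values[-2]:
            simplex[-1], values[-1] = reflected, f_r
        else:
            contracted = [c + 0.5 * (w - c) for c, w in zip(centroid, worst)]
            f_c = f(contracted)
            if f_c < values[-1]:
                simplex[-1], values[-1] = contracted, f_c
            else:
                # Shrink every other vertex halfway toward the best vertex.
                simplex = [best] + [
                    [b + 0.5 * (x - b) for b, x in zip(best, v)] for v in simplex[1:]
                ]
                values = [values[0]] + [f(v) for v in simplex[1:]]
    i_best = min(range(dim + 1), key=lambda i: values[i])
    return simplex[i_best], values[i_best]


def calibrate_log_space(cycles, observed, log_k0):
    # Optimize over theta = log(k); exp(theta) keeps every rate constant > 0.
    def mse(theta):
        k_sei, k_lin = (math.exp(t) for t in theta)
        predicted = toy_retention(cycles, k_sei, k_lin)
        return sum((p - o) ** 2 for p, o in zip(predicted, observed)) / len(observed)

    theta, best_mse = nelder_mead(mse, log_k0)
    return [math.exp(t) for t in theta], math.sqrt(best_mse)


cycles = list(range(0, 1001, 50))
observed = toy_retention(cycles, k_sei=0.8, k_lin=0.01)  # synthetic, noise-free
(k_sei, k_lin), fit_rmse = calibrate_log_space(cycles, observed, [0.0, -3.0])
print(f"k_sei={k_sei:.4f}  k_lin={k_lin:.5f}  fit RMSE={fit_rmse:.2e} pp")
```

On noise-free synthetic data the fit should recover k_sei ≈ 0.8 and k_lin ≈ 0.01. In practice a library optimizer such as `scipy.optimize.minimize(..., method="Nelder-Mead")` would replace the hand-rolled loop; the log-space reparameterization stays the same.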
| Dataset | Provenance | Si % | C-rate | T (°C) | RMSE (pp) | R² | Grade |
|---|---|---|---|---|---|---|---|
| LG_M50T | SYNTHESIZED (Kirkaldy 2022 proxy) | 2.5 | 1.0 | 25 | 0.32 | 0.999 | A+ |
| NatComms_20Si_LPD | Nat. Commun. 2021, 12, 2811 | 20 | 0.5 | 25 | 0.13 | 1.000 | A+ |
| NatComms_20Si_HPD | Nat. Commun. 2021, 12, 2811 | 20 | 0.5 | 25 | 1.23 | 0.993 | A (knee-prone) |
| HPQ_GEN3_18650 | HPQ Silicon / Novacium GEN3, 2024 | 18 | 0.5 | 25 | 0.18 | 0.994 | A+ |
| SiGr_5pct_45C_1C | SYNTHESIZED (Dressler-style composite) | 5 | 1.0 | 45 | 0.25 | 0.999 | A+ |
| Kirk_2024_Moderate | Kirk et al., ACS Energy Lett. 2024 | 10 | 0.5 | 25 | 0.17 | 0.999 | A+ |
| Kirk_2024_FastCharge | Kirk et al., ACS Energy Lett. 2024 | 10 | 2.0 | 30 | 1.01 | 0.996 | A (knee-prone) |
| Dose_2023_1C | Dose et al., J. Power Sources 2023 | 8 (nano) | 1.0 | 25 | 0.23 | 0.999 | A+ |
Summary (6 attributable + 2 synthesized): 6 A+ (RMSE 0.13–0.32 pp) and 2 A on knee-prone cells (RMSE 1.01–1.23 pp). Mean RMSE across all 8: 0.44 pp. The two synthesized references (LG_M50T proxy, SiGr_5pct_45C_1C composite) reproduce literature-typical aging shapes for chemistries whose original cycler data was not publicly available at build time. They are useful for regression testing but should not be cited as independent validation against those specific cells. Re-sourcing real data for both is on the post-sprint backlog.
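The summary arithmetic can be checked directly from the table. A short sketch that recomputes the mean over all eight presets and, because the synthesized references should not count as independent validation, over the six attributable presets alone:

```python
from statistics import mean

# Calibrated-fit RMSE (pp) transcribed from the table above.
calibrated_rmse = {
    "LG_M50T": 0.32,
    "NatComms_20Si_LPD": 0.13,
    "NatComms_20Si_HPD": 1.23,
    "HPQ_GEN3_18650": 0.18,
    "SiGr_5pct_45C_1C": 0.25,
    "Kirk_2024_Moderate": 0.17,
    "Kirk_2024_FastCharge": 1.01,
    "Dose_2023_1C": 0.23,
}
SYNTHESIZED = {"LG_M50T", "SiGr_5pct_45C_1C"}

mean_all = mean(calibrated_rmse.values())
mean_attributable = mean(
    v for name, v in calibrated_rmse.items() if name not in SYNTHESIZED
)
print(f"all 8: {mean_all:.2f} pp; attributable 6: {mean_attributable:.2f} pp")
# prints: all 8: 0.44 pp; attributable 6: 0.49 pp
```

Excluding the two synthesized references nudges the mean up slightly, since both fit well (0.25 and 0.32 pp) while both knee-prone cells are attributable.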
Our grade scheme uses the following thresholds, defined in model/battery_model_v18.py line 1247:
Stricter conventions in some national lab reviews use A+ < 0.5 pp / A < 1.0 pp. By those thresholds, the six fits at 0.13–0.32 pp (NatComms_LPD 0.13, Kirk_Moderate 0.17, HPQ_GEN3 0.18, Dose_2023 0.23, SiGr_5pct 0.25, LG_M50T 0.32) would all remain A+, while the two knee-prone cells (Kirk_FastCharge 1.01, NatComms_HPD 1.23) would drop to B. All eight datasets remain within 1.5 pp of the published curves either way.
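Grade tallies are easy to get wrong by hand, so it helps to apply cutoffs mechanically to the table's RMSE column. A minimal sketch using the A+ < 0.5 pp / A < 1.0 pp pair quoted above; the `grade` helper is hypothetical, not the one in model/battery_model_v18.py, and lumping everything at or above the A cutoff into "B" is an assumption:

```python
from collections import Counter

# Calibrated-fit RMSE (pp) transcribed from the table above.
calibrated_rmse = {
    "LG_M50T": 0.32,
    "NatComms_20Si_LPD": 0.13,
    "NatComms_20Si_HPD": 1.23,
    "HPQ_GEN3_18650": 0.18,
    "SiGr_5pct_45C_1C": 0.25,
    "Kirk_2024_Moderate": 0.17,
    "Kirk_2024_FastCharge": 1.01,
    "Dose_2023_1C": 0.23,
}


def grade(rmse_pp, a_plus=0.5, a=1.0):
    # Hypothetical helper with strict "<" cutoffs. Anything at or above `a`
    # is labeled "B" here (assumption: no lower bound for B is given).
    if rmse_pp < a_plus:
        return "A+"
    if rmse_pp < a:
        return "A"
    return "B"


strict = Counter(grade(v) for v in calibrated_rmse.values())
print(dict(strict))  # {'A+': 6, 'B': 2}
```

Lowering only the A+ boundary, e.g. `grade(v, a_plus=0.2)`, splits the six sub-0.5 pp fits into 3 A+ and 3 A and leaves the knee-prone pair at B, which shows how sensitive the tally is to where the A+ line is drawn.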
Held-out protocol: calibrate the model using only data up to the 90% retention cutoff, then run forward prediction to the 70% retention cutoff (typical end-of-warranty checkpoint). Compute RMSE on the held-out portion only.
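A minimal sketch of this protocol on synthetic data. Everything here is hypothetical: the "true" curve is a made-up knee-prone cell, and the fitted model is a two-term sqrt(n) + linear fade law solved in closed form rather than the engine's calibrator. The point is the bookkeeping: the fit sees only points at or above 90% retention, and RMSE is scored separately on the stretch from 90% down to the 70% checkpoint.

```python
import math


def true_retention(n):
    # Hypothetical knee-prone cell: sqrt + linear fade, plus an accelerating
    # term that switches on after cycle 400 (absent from the fitted model).
    return 100.0 - 0.8 * math.sqrt(n) - 0.01 * n - 2e-5 * max(0, n - 400) ** 2


def split_held_out(curve, calib_cutoff=90.0, eval_cutoff=70.0):
    # Calibration: points before retention first falls below calib_cutoff.
    # Held-out: the points after that, down to the eval_cutoff checkpoint.
    calib, held = [], []
    for n, r in curve:
        if r < eval_cutoff:
            break
        if r >= calib_cutoff and not held:
            calib.append((n, r))
        else:
            held.append((n, r))
    return calib, held


def fit_sqrt_linear(points):
    # Least squares for fade = a*sqrt(n) + b*n via the 2x2 normal equations,
    # assuming 100% retention at cycle 0.
    s_ss = s_sn = s_nn = s_fs = s_fn = 0.0
    for n, r in points:
        s, fade = math.sqrt(n), 100.0 - r
        s_ss += s * s
        s_sn += s * n
        s_nn += n * n
        s_fs += fade * s
        s_fn += fade * n
    det = s_ss * s_nn - s_sn * s_sn
    return (s_fs * s_nn - s_fn * s_sn) / det, (s_fn * s_ss - s_fs * s_sn) / det


def rmse(points, a, b):
    errors = [(100.0 - a * math.sqrt(n) - b * n) - r for n, r in points]
    return math.sqrt(sum(e * e for e in errors) / len(errors))


curve = [(n, true_retention(n)) for n in range(0, 3001, 25)]
calib, held = split_held_out(curve)
a, b = fit_sqrt_linear(calib)
calib_rmse = rmse(calib, a, b)  # scored on the points the fit saw
held_rmse = rmse(held, a, b)    # scored on the held-out portion only
print(f"calibrated-fit RMSE {calib_rmse:.3f} pp, held-out RMSE {held_rmse:.2f} pp")
```

On this synthetic curve the calibrated-fit RMSE is essentially zero while the held-out RMSE comes out at roughly 0.6 pp, because the accelerating term only appears after the calibration window closes. That is the same pattern, in miniature, as the knee-prone row in the table below.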
| Cell class | Calibrated-fit RMSE | Held-out forward RMSE | Status |
|---|---|---|---|
| Linear-degradation cells (5 datasets) | 0.13–0.25 pp | ~1–3 pp (mean) | Production-ready |
| LG_M50T-class (low-Si EV) | 0.32 pp | 4.98 pp (Track 2 expanded fit) | Sprint target hit |
| Knee-prone cells (HPD-graphite, fast-charge) | 1.01–1.23 pp | ~12 pp (mean) | Active research direction |
The architecture sprint of May 4–10, 2026 narrowed the held-out gap on LG_M50T-class cells from 14.10 pp to 4.98 pp by expanding the fit-parameter set to engage v18.3's cathode loss-of-active-material (LAM), electrolyte-depletion, and R-overpotential coupling mechanisms (Track 2). Closing the remaining held-out gap on knee-prone cells is the focus of the N≥30 cross-cell validation panel beginning 2026-05-11.
Use the calibrated-fit numbers when evaluating whether the model can reproduce a curve you've already characterized: design-of-experiments sweeps, sensitivity analysis, and mechanism decomposition on cells you have full data for.
Use the held-out forward-prediction numbers when evaluating whether the model can predict cycle life from limited early-cycle data: warranty modeling, fleet replacement-rate forecasting, and BMS calibration from sparse field measurements. Linear-degradation cells are production-ready; knee-prone cells remain an active research direction.
The /api/calibrate response surfaces both numbers when you supply enough data to compute them. The dispatch_to field on the pre_screen endpoint flags cells that route to the identifiability-limited branch; those are the cells where the held-out gap matters.
Calibrated envelope: 2.5–20% silicon, 0.5C–2C charge, 25–45°C ambient. Outside this envelope, every predict response includes a validation_warnings array with field-specific messages and an out_of_validated_range boolean.
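Clients that want to flag out-of-envelope requests before calling the API can mirror the check locally. A minimal sketch, assuming inclusive bounds and hypothetical input keys (si_pct, charge_c_rate, ambient_c); only the two output field names come from the description above, and the server's actual warning messages will differ:

```python
# Calibrated envelope from the text. Bounds are treated as inclusive, an
# assumption consistent with the 2.5% Si, 0.5C/2C, and 45 °C presets.
ENVELOPE = {
    "si_pct": (2.5, 20.0),
    "charge_c_rate": (0.5, 2.0),
    "ambient_c": (25.0, 45.0),
}


def check_envelope(params):
    # Hypothetical client-side pre-check. Only validation_warnings and
    # out_of_validated_range come from the docs; input keys are made up.
    warnings = []
    for key, (low, high) in ENVELOPE.items():
        value = params[key]
        if not low <= value <= high:
            warnings.append(f"{key}={value} is outside the validated range {low}-{high}")
    return {"validation_warnings": warnings, "out_of_validated_range": bool(warnings)}


print(check_envelope({"si_pct": 30.0, "charge_c_rate": 1.0, "ambient_c": 25.0}))
```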
Two known model gaps in the customer-visible warning:
Every simulation run produces an AuditRecord with deterministic run_id (SHA-256 of config + protocol), config_hash, model_version, output_schema_version, timestamp, platform info, and invariant-check messages. Run the same config + protocol against the same model version at any point in the future and you get an identical retention curve.
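The description fixes the hash (SHA-256 over config + protocol) but not how the two objects are serialized first, and determinism depends on that step. A sketch assuming canonical JSON with sorted keys and fixed separators, so logically identical configs hash identically regardless of key order; the `run_id` helper and the config and protocol fields are hypothetical:

```python
import hashlib
import json


def canonical_json(obj):
    # Sorted keys + fixed separators so logically equal dicts serialize identically.
    return json.dumps(obj, sort_keys=True, separators=(",", ":"))


def run_id(config, protocol):
    # Hypothetical: SHA-256 over one canonical serialization of both objects.
    payload = canonical_json({"config": config, "protocol": protocol})
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


config = {"si_pct": 10, "k_sei": 0.8}  # hypothetical fields
protocol = {"charge_c_rate": 0.5, "ambient_c": 25, "cycles": 1000}

a = run_id(config, protocol)
b = run_id(dict(reversed(list(config.items()))), protocol)  # same content, other key order
print(a == b)  # True
```

A run_id computed this way does not change across model versions, so reproducing a curve also depends on the model_version that the AuditRecord stores alongside it.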
Version retention policy: model versions are pinnable via the ?model_version= query parameter and remain callable for a minimum of 24 months after a successor version ships. See /sla for full details.
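Pinning amounts to adding one query parameter. A sketch that builds a pinned request URL with the standard library; the host is a placeholder and the "v18.3" version-string format is an assumption:

```python
from urllib.parse import parse_qs, urlencode, urlsplit

BASE_URL = "https://api.example.com"  # hypothetical host; substitute your deployment


def pinned_url(path, model_version, **params):
    # Put model_version first, then any other query parameters.
    query = urlencode({"model_version": model_version, **params})
    return f"{BASE_URL}{path}?{query}"


url = pinned_url("/api/calibrate", "v18.3")
print(url)  # https://api.example.com/api/calibrate?model_version=v18.3
```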
We publish both calibrated-fit and held-out numbers because the difference matters and because we expect peer reviewers, national lab researchers, and warranty actuaries to compute their own. If you've reproduced these numbers and disagree, write us at jason@scaleprognostics.com with the data and we'll publish the disagreement here.