
Methodology & Validation

What we measure, how we measure it, and where the model is honest about its limits. Last updated 2026-05-10. Engine version v18.3.

Two metrics, not one

Two distinct accuracy numbers describe a battery degradation model and they are not interchangeable. We report both because the difference matters.

  • Calibrated-fit RMSE: the residual between the model and a published retention curve after the model has been fit to that curve. It measures whether the engine can reproduce the data it has been calibrated against.
  • Held-out forward-prediction RMSE: the residual between the model and the published curve when the model is calibrated on only the early portion of the data (e.g., cycles 0–500, covering 100% down to 90% retention) and then extrapolated forward to predict the rest. This is the metric that matters for cycle-life forecasting.
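Written out, the two definitions differ only in which points the model is calibrated on and which points the residual is summed over. The notation below is an assumption introduced for illustration: \mathcal{D} is the full set of published points (n_i, Q_i), \mathcal{E} is the early calibration portion, \mathcal{H} = \mathcal{D} \setminus \mathcal{E} is the held-out remainder, and \hat{Q}_{\mathcal{S}} is the model after calibration on the subset \mathcal{S}. Retention is in percent, so both RMSE values are in percentage points.

```latex
\mathrm{RMSE}_{\text{fit}}
  = \sqrt{\frac{1}{|\mathcal{D}|} \sum_{i \in \mathcal{D}}
      \left( \hat{Q}_{\mathcal{D}}(n_i) - Q_i \right)^2 },
\qquad
\mathrm{RMSE}_{\text{held-out}}
  = \sqrt{\frac{1}{|\mathcal{H}|} \sum_{i \in \mathcal{H}}
      \left( \hat{Q}_{\mathcal{E}}(n_i) - Q_i \right)^2 }
```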

Calibrated-fit numbers are reported throughout the published literature and in the marketing copy of most commercial degradation tools. Held-out forward prediction has been the standard since Severson et al. (Nature Energy, 2019) introduced it as the appropriate benchmark for cycle-life prediction. We publish both.

Calibrated-fit performance (8 datasets)

Each preset is a hand-tuned PhysicsConfig that fits the published curve. Calibration uses Nelder–Mead in log-space across 4–8 rate constants (SEI, Si-cracking, plating, transport). Six of eight are fully-attributable peer-reviewed curves; two are synthesized references (one proxy fit, one literature composite) and are labeled accordingly in the table below.

Dataset | Provenance | Si % | C-rate | T (°C) | RMSE (pp) | R² | Grade
LG_M50T | SYNTHESIZED (Kirkaldy 2022 proxy) | 2.5 | 1.0 | 25 | 0.32 | 0.999 | A+
NatComms_20Si_LPD | Nat. Commun. 2021, 12, 2811 | 20 | 0.5 | 25 | 0.13 | 1.000 | A+
NatComms_20Si_HPD | Nat. Commun. 2021, 12, 2811 | 20 | 0.5 | 25 | 1.23 | 0.993 | A (knee-prone)
HPQ_GEN3_18650 | HPQ Silicon / Novacium GEN3, 2024 | 18 | 0.5 | 25 | 0.18 | 0.994 | A+
SiGr_5pct_45C_1C | SYNTHESIZED (Dressler-style composite) | 5 | 1.0 | 45 | 0.25 | 0.999 | A+
Kirk_2024_Moderate | Kirk et al., ACS Energy Lett. 2024 | 10 | 0.5 | 25 | 0.17 | 0.999 | A+
Kirk_2024_FastCharge | Kirk et al., ACS Energy Lett. 2024 | 10 | 2.0 | 30 | 1.01 | 0.996 | A (knee-prone)
Dose_2023_1C | Dose et al., J. Power Sources 2023 | 8 (nano) | 1.0 | 25 | 0.23 | 0.999 | A+

Summary (6 attributable + 2 synthesized): 6 A+ (RMSE 0.13–0.32 pp), 2 A on knee-prone cells (RMSE 1.01–1.23 pp). Mean RMSE across all 8: 0.44 pp. The two synthesized references (LG_M50T proxy, SiGr_5pct_45C_1C composite) reproduce literature-typical aging shapes for chemistries where the original cycler data was not publicly available at the time of build — they are useful for regression testing but should not be cited as independent validation against those specific cells. Re-sourcing real data for both is on the post-sprint backlog.
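The summary arithmetic can be re-derived from the table. The sketch below is a minimal check in Python using only the RMSE values listed above; the names CALIBRATED_RMSE_PP, SYNTHESIZED, and mean_rmse are illustrative assumptions, not part of the engine.

```python
from statistics import mean

# RMSE values in percentage points, copied from the calibrated-fit table.
CALIBRATED_RMSE_PP = {
    "LG_M50T": 0.32,
    "NatComms_20Si_LPD": 0.13,
    "NatComms_20Si_HPD": 1.23,
    "HPQ_GEN3_18650": 0.18,
    "SiGr_5pct_45C_1C": 0.25,
    "Kirk_2024_Moderate": 0.17,
    "Kirk_2024_FastCharge": 1.01,
    "Dose_2023_1C": 0.23,
}

# The two datasets the text labels as synthesized references.
SYNTHESIZED = {"LG_M50T", "SiGr_5pct_45C_1C"}


def mean_rmse(rmse_by_dataset, exclude=frozenset()):
    """Mean RMSE (pp, rounded to 2 decimals) over datasets not in `exclude`."""
    kept = [v for name, v in rmse_by_dataset.items() if name not in exclude]
    return round(mean(kept), 2)


if __name__ == "__main__":
    print(mean_rmse(CALIBRATED_RMSE_PP))  # 0.44, matching the summary
    # Same statistic restricted to the six attributable datasets.
    print(mean_rmse(CALIBRATED_RMSE_PP, exclude=SYNTHESIZED))
```

Passing exclude=SYNTHESIZED restricts the mean to the attributable curves, which is the relevant view if the synthesized references are not being treated as independent validation.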

Threshold convention disclosure

Our grade scheme uses the following thresholds, defined in model/battery_model_v18.py line 1247:

  • A+: RMSE < 1.0 percentage points
  • A: 1.0 ≤ RMSE < 2.0 pp
  • B: 2.0 ≤ RMSE < 3.5 pp
  • C: 3.5 ≤ RMSE < 5.0 pp
  • D: RMSE ≥ 5.0 pp
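As a lookup, the scheme above is a list of exclusive upper bounds checked in order. This is a minimal sketch in Python; the names GRADE_BOUNDS and grade are assumptions for illustration and are not copied from model/battery_model_v18.py.

```python
from collections import Counter

# (exclusive upper bound in pp, grade), in ascending order, from the list above.
GRADE_BOUNDS = [(1.0, "A+"), (2.0, "A"), (3.5, "B"), (5.0, "C")]


def grade(rmse_pp: float) -> str:
    """Map an RMSE in percentage points to a letter grade."""
    if rmse_pp < 0:
        raise ValueError("RMSE cannot be negative")
    for upper, label in GRADE_BOUNDS:
        if rmse_pp < upper:
            return label
    return "D"  # RMSE >= 5.0 pp


if __name__ == "__main__":
    # RMSE column of the calibrated-fit table, in table order.
    table_rmse = [0.32, 0.13, 1.23, 0.18, 0.25, 0.17, 1.01, 0.23]
    print(Counter(grade(r) for r in table_rmse))  # six A+ and two A
```

Because each bound is exclusive, a value exactly on a boundary (for example 1.0 pp) takes the lower grade, which matches the "1.0 ≤ RMSE" wording of the list.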

Stricter conventions in some national lab reviews use A+ < 0.5 pp and A < 1.0 pp. By those thresholds, all six datasets currently graded A+ (NatComms_LPD 0.13, Kirk_Moderate 0.17, HPQ_GEN3 0.18, Dose_2023 0.23, SiGr_5pct 0.25, LG_M50T 0.32) stay below the 0.5 pp A+ cutoff, while the two knee-prone cells (Kirk_FastCharge 1.01, NatComms_HPD 1.23) fall just past the 1.0 pp A cutoff and would drop to B. All eight calibrated-fit RMSE values remain below 1.5 pp either way.

Held-out forward-prediction performance

Held-out protocol: calibrate the model using only data up to the 90% retention cutoff, then run forward prediction to the 70% retention cutoff (typical end-of-warranty checkpoint). Compute RMSE on the held-out portion only.
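The protocol can be sketched end to end on synthetic data. In the Python example below, a straight-line least-squares fit stands in for the engine (an assumption made purely so the example is self-contained; it is not the PhysicsConfig calibration), and the curves, function names, and knee shape are all hypothetical.

```python
import math


def split_at_retention(cycles, retention, calib_cutoff=90.0, end_cutoff=70.0):
    """Calibration points run down to calib_cutoff; held-out points are the
    later ones that are still at or above end_cutoff."""
    calib, held, crossed = [], [], False
    for n, q in zip(cycles, retention):
        if not crossed and q >= calib_cutoff:
            calib.append((n, q))
        else:
            crossed = True
            if q >= end_cutoff:
                held.append((n, q))
    return calib, held


def fit_line(points):
    """Ordinary least squares for q = intercept + slope * n (stand-in model)."""
    count = len(points)
    mx = sum(n for n, _ in points) / count
    my = sum(q for _, q in points) / count
    sxx = sum((n - mx) ** 2 for n, _ in points)
    sxy = sum((n - mx) * (q - my) for n, q in points)
    slope = sxy / sxx
    return my - slope * mx, slope


def rmse_pp(predict, points):
    """Root-mean-square residual in percentage points of retention."""
    return math.sqrt(sum((predict(n) - q) ** 2 for n, q in points) / len(points))


def evaluate(cycles, retention):
    """Return (calibrated-fit RMSE on early data, held-out forward RMSE)."""
    calib, held = split_at_retention(cycles, retention)
    intercept, slope = fit_line(calib)

    def predict(n):
        return intercept + slope * n

    return rmse_pp(predict, calib), rmse_pp(predict, held)


def synthetic_curve(knee_at=None):
    """Linear fade of 0.02 pp per cycle, with optional extra fade after a knee."""
    cycles = list(range(0, 1600, 100))
    retention = []
    for n in cycles:
        q = 100.0 - 0.02 * n
        if knee_at is not None and n > knee_at:
            q -= 1e-5 * (n - knee_at) ** 2
        retention.append(q)
    return cycles, retention


if __name__ == "__main__":
    print("linear cell:", evaluate(*synthetic_curve()))
    print("knee cell:  ", evaluate(*synthetic_curve(knee_at=800)))
```

On the synthetic linear curve both numbers are essentially zero. On the knee curve the early-data fit is still essentially exact while the held-out error is much larger, because nothing in the calibration window reveals the knee. That is the qualitative pattern the protocol is designed to expose.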

Cell class | Calibrated-fit RMSE | Held-out forward RMSE | Status
Linear-degradation cells (5 datasets) | 0.13–0.32 pp | ~1–3 pp (mean) | Production-ready
LG_M50T-class (low-Si EV) | 0.32 pp | 4.98 pp (Track 2 expanded fit) | Sprint target hit
Knee-prone cells (HPD-graphite, fast-charge) | 1.01–1.23 pp | ~12 pp (mean) | Active research direction

The architecture sprint of May 4–10, 2026 narrowed the held-out gap on LG_M50T-class cells from 14.10 pp to 4.98 pp by expanding the fit-parameter set to engage v18.3's cathode LAM, electrolyte depletion, and R-overpotential coupling mechanisms (Track 2). The remaining held-out gap on knee-prone cells is the focus of the N≥30 cross-cell validation panel beginning 2026-05-11.

What this means for the cycle-life forecasting use case

Use the calibrated-fit numbers when evaluating whether the model can reproduce a curve you've already characterized — design-of-experiments sweeps, sensitivity analysis, mechanism decomposition on cells you have full data for.

Use the held-out forward-prediction numbers when evaluating whether the model can predict cycle life from limited early-cycle data — warranty modeling, fleet replacement-rate forecasting, BMS calibration from sparse field measurements. Linear-degradation cells are production-ready; knee-prone cells are active research.

The /api/calibrate response surfaces both numbers when you supply enough data to compute them. The dispatch_to field on the pre_screen endpoint flags cells that route to the identifiability-limited branch — those are the ones where the held-out gap matters.

Out-of-validated-envelope behavior

Calibrated envelope: 2.5–20% silicon, 0.5C–2C charge, 25–45°C ambient. Outside this envelope, every predict response includes a validation_warnings array with field-specific messages and an out_of_validated_range boolean.
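A client can mirror the envelope check locally before calling the API. The sketch below is a minimal Python version under stated assumptions: the input field names (si_pct, c_rate, temp_c) and the message wording are hypothetical, and only the envelope bounds and the two output keys come from the description above.

```python
# Calibrated envelope bounds (inclusive), from the description above.
# The field names used as keys here are hypothetical.
ENVELOPE = {
    "si_pct": (2.5, 20.0),   # silicon content, %
    "c_rate": (0.5, 2.0),    # charge rate, C
    "temp_c": (25.0, 45.0),  # ambient temperature, degrees C
}


def check_envelope(si_pct: float, c_rate: float, temp_c: float) -> dict:
    """Return field-specific warnings plus an overall out-of-range flag."""
    values = {"si_pct": si_pct, "c_rate": c_rate, "temp_c": temp_c}
    warnings = []
    for field, (low, high) in ENVELOPE.items():
        value = values[field]
        if not low <= value <= high:
            warnings.append(
                f"{field}={value} is outside the calibrated range {low} to {high}"
            )
    return {
        "validation_warnings": warnings,
        "out_of_validated_range": bool(warnings),
    }


if __name__ == "__main__":
    print(check_envelope(si_pct=10, c_rate=1.0, temp_c=25))
    print(check_envelope(si_pct=10, c_rate=3.0, temp_c=-10))
```

The second call trips both the charge-rate and temperature bounds, so it yields two warnings and sets the flag.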

Two known model gaps sit behind the customer-visible warning:

  • Cold-charge plating (T < 0°C): the model's plating mechanism does not activate at sub-zero charging. Real silicon-graphite cells plate severely in this regime. The validation warning correctly flags T < 25°C as outside the calibrated envelope; the underlying model behavior underestimates degradation here. Do not use for cold-climate cycle-life forecasting until this is resolved.
  • Extreme fast charge (≥3C): mechanical fatigue and transport collapse activate but the magnitude is not validated against published >3C cycling data. Treat predictions as directional, not certified.

Reproducibility

Every simulation run produces an AuditRecord with a deterministic run_id (SHA-256 of config + protocol), config_hash, model_version, output_schema_version, timestamp, platform info, and invariant-check messages. Run the same config + protocol against the same model version at any point in the future and you get an identical retention curve.
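One way a deterministic run_id of this kind can be derived is to hash a canonical serialization of the inputs. The Python sketch below assumes canonical JSON (sorted keys, compact separators) as that serialization; the text above states only that the id is a SHA-256 of config + protocol, so the encoding, the function name, and the example field names are all assumptions.

```python
import hashlib
import json


def run_id(config: dict, protocol: dict) -> str:
    """SHA-256 hex digest of a canonical JSON encoding of config + protocol.

    Sorting keys and fixing separators makes the digest independent of
    dictionary insertion order and whitespace (an assumed canonical form).
    """
    payload = json.dumps(
        {"config": config, "protocol": protocol},
        sort_keys=True,
        separators=(",", ":"),
    )
    return hashlib.sha256(payload.encode("utf-8")).hexdigest()


if __name__ == "__main__":
    # Hypothetical inputs; the real config and protocol schemas are not shown here.
    a = run_id({"si_pct": 10, "temp_c": 25}, {"c_rate": 0.5, "cycles": 1000})
    b = run_id({"temp_c": 25, "si_pct": 10}, {"cycles": 1000, "c_rate": 0.5})
    print(a == b)  # True: key order does not change the id
```

Any change to a config or protocol value changes the digest, so an id that matches across two runs indicates the inputs were identical.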

Version retention policy: model versions are pinnable via the ?model_version= query parameter and remain callable for a minimum of 24 months after a successor version ships. See /sla for full details.

Open peer review welcome

We publish both calibrated-fit and held-out numbers because the difference matters and because we expect peer reviewers, national lab researchers, and warranty actuaries to compute their own. If you've reproduced these numbers and disagree, write us at jason@scaleprognostics.com with the data and we'll publish the disagreement here.