Testing
The integration is built test-first: pure control math gets exhaustive unit,
property, snapshot, and mutation tests; the HA-facing glue gets full
pytest-homeassistant-custom-component integration tests.
Layout
tests/unit/ mirrors the package (control/, control/mpc/, devices/,
sensing/) and holds every pure test, including the cross-module property,
regression, and snapshot suites. tests/ha/ holds the hass-fixture tests, with
the Home Assistant fixtures scoped to tests/ha/conftest.py. The root
tests/conftest.py carries only shared constants, and the engine-input
builders shared by the engine, property, and regression suites live in
tests/unit/control/builders.py.

The hass-side setup helpers (TRV-with-number registry setup, desired-state
and calibration-mode drivers) live in tests/ha/helpers.py, which is also the
single point of access to coordinator internals. Tests that need to
manipulate internal state (simulated clocks, injected learned models, store
round-trips) go through its intention-named accessors (runtime,
set_maintenance_clock, window_timers, rmot, forecast_cache,
expire_startup_grace, ...) and never touch coordinator._* directly. This is
lint-enforced: ruff's SLF001 (flake8-self) flags private access on other
objects everywhere, in source and tests alike, with helpers.py the single
exempted file. Production refactors may break that one helper section, but
never the test files themselves.

Mutation testing targets tests/unit/ alone ([tool.mutmut] tests_dir), since
those are the tests that kill control/ mutants.
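The accessor pattern can be sketched as follows. This is a minimal illustration, not the real tests/ha/helpers.py: the FakeCoordinator class, its private attribute names, and the accessor signatures are assumptions made for the example. Only the accessor names set_maintenance_clock and window_timers are taken from the list above.

```python
from datetime import datetime, timezone


class FakeCoordinator:
    """Stand-in for the real coordinator; attribute names are hypothetical."""

    def __init__(self) -> None:
        self._maintenance_clock = lambda: datetime.now(timezone.utc)
        self._window_timers: dict[str, datetime] = {}

    def maintenance_due(self, last_run: datetime, interval_s: float) -> bool:
        elapsed = (self._maintenance_clock() - last_run).total_seconds()
        return elapsed >= interval_s


# --- what would live in the one SLF001-exempt helper file ---

def set_maintenance_clock(coordinator: FakeCoordinator, now: datetime) -> None:
    """Freeze the coordinator's maintenance clock at `now`."""
    coordinator._maintenance_clock = lambda: now


def window_timers(coordinator: FakeCoordinator) -> dict[str, datetime]:
    """Expose the pending window timers for assertions."""
    return coordinator._window_timers


# --- what a test file does: named intentions only, no private access ---

def test_maintenance_becomes_due() -> None:
    coordinator = FakeCoordinator()
    last_run = datetime(2025, 1, 1, tzinfo=timezone.utc)

    set_maintenance_clock(coordinator, datetime(2025, 1, 1, 0, 30, tzinfo=timezone.utc))
    assert not coordinator.maintenance_due(last_run, interval_s=3600)

    set_maintenance_clock(coordinator, datetime(2025, 1, 1, 1, 0, tzinfo=timezone.utc))
    assert coordinator.maintenance_due(last_run, interval_s=3600)
    assert window_timers(coordinator) == {}
```

If the coordinator renames a private attribute, only the two accessor bodies change; the test function stays as written.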
Test plan
- Pure-function unit tests (no HA): comfort index & dew point against reference tables; hysteresis state machine across engage/release/edge sequences; sensor aggregation & fallbacks; MPC sysid/observer/optimizer on synthetic first-order thermal simulations (assert convergence, stability, no windup, bounds respected).
- Control-engine tests: arbitration priority (frost > window > outdoor gating > demand), heat/cool mutual-exclusion invariant, deadband early-out, the worked example from requirements (home avg > 25 OR living room > 25 engages AC; stays on until both ≤ target or room near heat band).
- Adapter tests: the ClimateAdapter issues the correct services for TRV valve %, TRV offset, and AC setpoint + mode over a generic HA climate entity, honouring AdapterCapabilities (capability gating, range/step clamping) with mocked services.
- Integration tests (hass fixture): config flow (device + sensor selection, overrides), entity creation snapshots, options/runtime entity changes re-trigger control, restart restores learned MPC + preset state.
- Resilience tests: a TRV or AC going unavailable excludes only that device while the home entity stays available and controls the rest; an offline sensor drops out of the home/area average; an area with an offline configured sensor falls back to the home average; one device raising an exception/timeout never aborts the cycle for the others; absent-device learned state is retained across the dropout.
- Regression fixtures: recorded sensor traces replayed through the control engine (control/engine.py) to catch behavioural drift.
A subagent-driven verification pass reviews coverage gaps before sign-off.
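As an illustration of the pure-function tier, here is a sketch of a hysteresis state machine driven through an engage/hold/release sequence. The Hysteresis class, its thresholds, and run_sequence are hypothetical names invented for this example; the integration's own hysteresis module may be shaped quite differently.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Hysteresis:
    """Two-threshold switch: engage above `engage_at`, release at or below `release_at`."""

    engage_at: float
    release_at: float

    def step(self, active: bool, value: float) -> bool:
        if not active and value > self.engage_at:
            return True
        if active and value <= self.release_at:
            return False
        return active  # inside the band: hold the previous state


def run_sequence(h: Hysteresis, values: list[float]) -> list[bool]:
    """Feed a trace through the state machine, starting released."""
    active = False
    states = []
    for value in values:
        active = h.step(active, value)
        states.append(active)
    return states


def test_engage_hold_release() -> None:
    h = Hysteresis(engage_at=25.0, release_at=24.0)
    # Edge cases: exactly at the engage threshold (no engage),
    # inside the band while active (hold), exactly at release (release).
    trace = [24.5, 25.0, 25.1, 24.5, 24.0, 24.5]
    assert run_sequence(h, trace) == [False, False, True, True, False, False]
```

Because the function takes plain floats and returns plain booleans, the test needs no Home Assistant fixture at all.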
Property tests (Hypothesis)
Hypothesis property tests check control invariants across generated inputs
(e.g. the heat/cool mutual exclusion, hysteresis monotonicity, bounds on
computed valves). CI runs with HYPOTHESIS_PROFILE=ci, a profile with no
deadline, so a heavily loaded runner cannot produce spurious deadline
failures; the profile is registered in tests/conftest.py.
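A minimal sketch of both pieces, assuming a hypothetical decide function that returns a (heat, cool) pair: the profile registration uses Hypothesis's settings.register_profile and settings.load_profile, and the property asserts the mutual-exclusion invariant. The decision logic and the temperature range are assumptions for the example, not the engine's real rules.

```python
import os

from hypothesis import given, settings, strategies as st

# Profile registration as it might appear in a conftest.py:
# the "ci" profile disables the per-example deadline.
settings.register_profile("ci", deadline=None)
settings.load_profile(os.environ.get("HYPOTHESIS_PROFILE", "default"))


def decide(temp_c: float, heat_below: float, cool_above: float) -> tuple[bool, bool]:
    """Hypothetical decision function returning (heat, cool)."""
    heat = temp_c < heat_below
    # Cooling is suppressed whenever heating is requested.
    cool = temp_c > cool_above and not heat
    return heat, cool


temps = st.floats(min_value=-30.0, max_value=50.0, allow_nan=False)


@given(temp_c=temps, heat_below=temps, cool_above=temps)
def test_heat_and_cool_never_both_on(temp_c, heat_below, cool_above):
    heat, cool = decide(temp_c, heat_below, cool_above)
    assert not (heat and cool)
```

The thresholds are generated independently, so the property also covers misconfigured inputs where heat_below exceeds cool_above.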
Snapshot tests (syrupy)
syrupy powers entity-state and diagnostics snapshot tests, plus dedicated
numeric regression snapshots (tests/unit/test_snapshot_regression.py,
tests/unit/snapshots/) pinning the exact comfort-curve and
adaptive-cooling-comfort outputs. Regenerate intentionally with
pytest --snapshot-update.
Coverage gate
pytest-cov targets ≥ 97% line + branch coverage on control/,
sensing/, devices/, gated in CI. Coverage is not forced in
pyproject.toml addopts — that would break single-file runs (fail_under)
and mutmut's per-mutant subset runs; CI passes --cov explicitly. Run it
locally with:
A codecov.yml splits coverage into components (control / devices / sensing /
shell) for visibility; the hard 97% gate stays in pytest. The few lines left
uncovered are deliberate: defensive guards unreachable through the public
surface (e.g. the target-mode branch of _calibration_writes, which its only
caller already filters out) and the UnsupportedStorageVersionError import
fallback, which only executes on HA < 2026.3 and is exercised by the floor
canary rather than the gated run (excluded via pragma: no cover).
Mutation testing (mutmut)
Line/branch coverage and Hypothesis invariants don't prove the math is
right — a flipped sign or mis-scaled dt in the thermal model, optimizer, or
comfort curves can still pass. mutmut mutates the pure control modules and
surfaces any mutant the suite fails to kill.
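To make the failure mode concrete, the sketch below hand-writes a mis-scaled-dt mutant of a hypothetical first-order room model and shows that a direction-only test passes on both versions while an exact-value test fails on the mutant. The model, its parameters, and the two checks are assumptions for illustration; mutmut generates its mutants automatically rather than by hand.

```python
def step_temperature(temp_c, outdoor_c, heat_w, dt_s, tau_s=3600.0, gain_c_per_w=0.01):
    """One explicit-Euler step of a first-order room model (hypothetical parameters)."""
    steady_state = outdoor_c + gain_c_per_w * heat_w
    return temp_c + (steady_state - temp_c) * dt_s / tau_s


def step_temperature_mutant(temp_c, outdoor_c, heat_w, dt_s, tau_s=3600.0, gain_c_per_w=0.01):
    """Same model with dt mis-scaled, as if seconds had been treated as minutes."""
    steady_state = outdoor_c + gain_c_per_w * heat_w
    return temp_c + (steady_state - temp_c) * (dt_s / 60.0) / tau_s


def weak_check(step) -> bool:
    """Executes every line, but only checks the direction of change."""
    return step(18.0, 5.0, 2000.0, 60.0) > 18.0


def exact_check(step) -> bool:
    """Pins the value: steady state is 25 C, so one 60 s step adds 7 * 60 / 3600."""
    expected = 18.0 + 7.0 * 60.0 / 3600.0
    return abs(step(18.0, 5.0, 2000.0, 60.0) - expected) < 1e-9


# The weak check cannot tell the two apart; the exact check kills the mutant.
assert weak_check(step_temperature) and weak_check(step_temperature_mutant)
assert exact_check(step_temperature) and not exact_check(step_temperature_mutant)
```

Both checks give the function full line coverage, which is why coverage alone cannot distinguish them.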
- Scope ([tool.mutmut] in pyproject.toml): paths_to_mutate is custom_components/climate_orchestrator/control/ only (the highest-value, subtlest code). tests_dir is tests/unit/ (the pure tests are what kill control/ mutants; skipping the hass-fixture suites makes mutation runs far faster). also_copy includes the whole custom_components/ package because mutmut copies only paths_to_mutate into its mutants/ work dir, but control/ imports the rest of the package.
- Running. Run in the HA dev env (Python 3.14):

  On macOS, run it inside a Linux container instead, because mutmut's runner uses a raw os.fork, which crashes on macOS ≥ 13.2 (the setproctitle after-fork problem, boxed/mutmut#446):
- Triage. For each surviving mutant, inspect it (uv run mutmut show <id>) and decide: a meaningful survivor (the mutation changes observable behaviour) gets a new test that kills it; an equivalent mutant (no observable behaviour change, e.g. a defensive clamp that current inputs never reach) is recorded with its rationale in the ledger below rather than papered over with a contrived test.
Survivor ledger
Note
Mutation testing covers the control/ package only (the pure control
and MPC math, where a flipped sign survives every other net). The
stateful modules extracted from the coordinator — windows.py,
persistence.py, adaptation.py, events.py, supervision.py — are
exercised by the integration suites but are outside mutmut's scope.
A clean run is 845 mutants, 25 surviving (~97% killed) — measured on
Python 3.14 with the locked HA stack; counts shift when control/ or the
dependency lock changes. Every survivor
below has been triaged (rounds #98, #115, #127) and accepted as a residual;
any survivor not on this list is a regression and must be either killed
with a test or triaged onto this list with a rationale. Inspect any entry
with uv run mutmut show <name> (prefix:
custom_components.climate_orchestrator.control.).
| Module / mutants | Count | Why they survive |
|---|---|---|
| mpc.controller: compute_valve_pct__mutmut_1 | 1 | Default-arg artifact: mutates the max_opening_pct=100.0 default to 101.0. Observably equivalent because of the final [0, 100] hardware clamp: any optimum above 1.0 clamps to 100 either way. |
| mpc.optimizer: _rollout_cost__mutmut_3, optimize_valve__mutmut_{1,2,8,11,31} | 6 | Default-arg artifacts (horizon, effort_weight, max_opening defaults) and bound-edge variants the bounded minimiser maps to the same clamped output. |
| slope: temperature_slope_per_min__mutmut_{10,22,23,26,29,30} | 6 | Equivalent variants of the least-squares slope accumulation: reordered/refactored sum terms that produce identical results for every reachable input. |
| comfort: effective_temperature__mutmut_{1,2} | 2 | Equivalent rearrangements of the blend expression. |
| adaptive_bias: update_bias_integral__mutmut_{1,2} | 2 | Default-arg artifacts on decay. |
| adaptive_comfort: cool_edge_shift__mutmut_{3,5} | 2 | Equivalent under the saturating-exponential contract (clamps already bound the output). |
| engine: decide__mutmut_{84,118} | 2 | Equivalent: alternate orderings of independent guard checks with identical decisions. |
| runtime_stats: runtime_fraction__mutmut_{28,40} | 2 | Equivalent boundary rewrites already pinned by the exact-value tests; the mutated forms compute the same fractions. |
| hysteresis: evaluate_demand__mutmut_1 | 1 | Default-arg artifact. |
| forecast: expand_forecast__mutmut_4 | 1 | Equivalent interpolation rewrite (same series for every step grid). |
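The first ledger row can be illustrated with a small sketch of why a default-argument mutant is equivalent under a final clamp. The function body below is a hypothetical stand-in that only mirrors the described structure (a max_opening_pct default of 100.0 followed by a [0, 100] hardware clamp); the real compute_valve_pct may differ.

```python
from functools import partial


def compute_valve_pct(optimum: float, max_opening_pct: float = 100.0) -> float:
    """Hypothetical stand-in: scale a [0, 1] optimum, cap it, then clamp to hardware range."""
    requested = min(optimum * 100.0, max_opening_pct)
    return max(0.0, min(100.0, requested))  # final [0, 100] hardware clamp


# The mutant changes only the default cap from 100.0 to 101.0.
mutant = partial(compute_valve_pct, max_opening_pct=101.0)

# Sweep optima from -0.5 to 2.0: below, inside, and above the valid range.
optima = [i / 100.0 for i in range(-50, 201)]
assert all(compute_valve_pct(x) == mutant(x) for x in optima)
```

No test that calls the function with the default can observe the difference, which is why the survivor is recorded rather than chased with a contrived test.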
Note
Mutant names are positional: editing a function renumbers its mutants,
so after touching a control/ module expect renamed survivors and
re-triage just those (as in round #127, where four renumbered survivors
turned out to be genuinely new and were killed).
Next: Tooling — the lint/type/CI stack that runs all of this.