Verification¶
sft-wick’s correctness rests on two complementary layers of evidence:
A deductive test suite (Phases 1–8, 275 tests, ~3.5 min full run) that pits each elementary transformation in the package against an independent reference — any failure points at a specific module.
Two inductive end-to-end demos (Gaussian and non-Gaussian driving) that compare the package’s full pipeline to direct Langevin simulation over a large parameter grid.
Both layers are necessary: the deductive suite proves the machinery is right per step; the demos confirm the output matches physics on non-trivial problems.
Overview — why deductive tests prove correctness¶
Most scientific software is validated inductively: run it on a case where you know the expected answer (comparison to simulation, a known limit, published numbers), see they agree. This is a necessary sanity check but not a proof — you have shown only that in the tested regime, the answer happens to be right. A bug could still hide in any untested regime.
Deductive verification takes a different stance. For each elementary transformation the code performs — expanding products, applying Wick’s theorem, collapsing component-index sums, integrating a diagram numerically — construct an independent reference and check they agree exactly (or within a precisely-defined tolerance set by the numerical method, not by convenience). A pass at this level does not just say “the final answer looks plausible”; it says every step is doing what it should, and any future regression in a specific step will surface as a specific failing test.
The suite has eight phases. Phases 1–4 cover the symbolic + numerical core; Phases 5–8 were added alongside the L1/L2 wrapper layer to cover spatial homogeneity, white-noise components, dynamic non-local couplings, time-dependent linear operators, and the end-to-end workflow and YAML-config layers.
Phase 1 — Symbolic expansion¶
File: ``tests/test_deductive_expansion.py``, ~80 tests, <1 s.
The operator-level Wick engine, simplification passes,
diagram-canonicalisation, and index-routing logic are compared
against a brute-force reference (tests/brute_wick.py) that has
zero imports from sft-wick and therefore constitutes an
independent re-implementation of the same rules.
For the simplest non-trivial case — order 2, scalar / two-component field, single cubic vertex — the brute reference enumerates all 105 candidate pairings and classifies each by vanishing reason:
ψ-ψ contractions (auto-zero) |
15 |
R(x,x) killed by Itô |
48 |
Causal R-loops (DFS cycle check) |
12 |
Surviving pairings |
30 |
Labelled spatial topologies |
12 |
Canonical Feynman diagrams (after |
6 |
vertex-relabeling) |
Every one of those integers has a dedicated test that cross-checks it against sft-wick’s output. Topics covered:
pairing count
(2n−1)!!for n=2..5 (T1)per-pairing vanishing rules (T2)
operator-level Wick sum vs brute propagator-product sum (T3)
spatial topology multiplicities (T4)
two independent compute_moment paths (collect_topology=False vs True) produce the same Feynman topology set (T5)
rational prefactor \((-1)^n / n! \times \mathrm{multinomial}\) (T6)
response-phase \((-i)^{n_R}\) (T7)
two canonical-form algorithms (nauty vs brute permutation search) induce the same equivalence relation (T8)
diagram-collection soundness — no false merges, no false splits (T9)
individual simplification passes (T10)
non-local vertex instantiation and leg-count vanishing (T11, T12)
FK coupling-sum collapse at N=2 and N=3 (T13, T20)
α=0 regression between Case A and Case B (T14)
multi-component coverage at N∈{1,2,3} with hand-derived closed forms for all six order-2 FF diagrams (T15–T19)
non-iso propagator consistency (T21)
apply_diagonalretainsKroneckerDelta(a, b)for external observable indices so cross-pair (a != b) order-0 evaluates to 0 underdiag_C=True(regression lock for a long-standing bug that demo1 / FF tests didn’t catch because they only exercise diagonal observable pairs).
Phase 2 — Propagator numerics¶
File: ``tests/test_deductive_numerics.py``, ~22 tests, ~50 s.
The closed-form \(C(t, t')\) formula and the package’s
PropagatorCache are verified against domain-split
scipy.dblquad references that reach 1e-12 relative precision.
closed-form C vs adaptive-quadrature (domain-split to kill the cusp at \(\tau_1=\tau_2\), N1)
Gauss-Legendre and trapezoidal quadrature convergence (N2)
stationarity limit \(C \to \lambda / [\gamma(\gamma+1/\sigma_t)]\) (N3)
dimensional scaling in \(\lambda\) and \(\gamma\) (N4)
α=0 regression for the two-kernel form (N5)
two-kernel \(\kappa^{(2)}_{\mathrm{eff}}\) shape at α>0 (N6)
spline-table (
precompute_C_table) interpolation error (N7)Itô flag end-to-end (N8)
off-diagonal \(\kappa^{(2)}\) → non-diagonal C propagator (N9)
Phase 3 — Full diagram evaluation¶
~3 tests, ~1 s.
For the double-tadpole diagram (where the time integral factorises
into two independent 1-D integrals, each evaluable to machine
precision via scipy.quad), QMC and scipy.nquad are compared
to the closed-form reference to 1e-6 relative (P1). The QMC path
is also compared to the moment closed-form \(N^2 \times B^2\)
within its self-reported 3σ error band (P2), and the Sobol
convergence rate is checked across a scan of n_samples (P3).
Phase 4 — Alternative-path consistency¶
~7 tests, ~17 s.
sft-wick exposes multiple paths for the same computation (a nauty- based symbolic engine, a vectorised QMC, a parallel batch). Each alternative is pitted against a tested baseline — two independent implementations producing the same answer is strong deductive evidence that both are correct.
Alternative |
Baseline |
|---|---|
|
|
parallel |
serial |
|
|
|
|
dispatcher auto-selection |
explicit method paths (C6) |
A benchmark-driven outcome of this phase: the default
integrate_moment(method='qmc') now auto-selects the vectorised
path whenever the cache supports batch C evaluation, yielding a
208× speedup over the previous scalar default with bit-identical
results.
Phase 5 — Spatial coordinates (homogeneity modes)¶
File: ``tests/test_deductive_numerics.py``, ``TestSpatialAwareCache`` (S1–S8 + S2b/S6b) and ``tests/test_d_dim_spatial.py`` (DD1, DD2).
The spatial caches add three homogeneity-dependent code paths —
translation (1-D or d-dim, reduces to r = ||x1 - x2||),
rotation (radial + Legendre, accepts d-dim unit vectors), and
general (full 2-D, scalar-x only on the full-grid path). Each
has its own precompute_C_table_* builder and its own spline
backend. Phase 5 verifies that:
translation and general give bit-identical answers on translation-invariant inputs (S3);
rotation reduces to closed-form on the direct-radial kernel (S6);
bubble-diagram cross-group C scaling \(\xi(r)/\xi(0) = e^{-r/\sigma_x}^{n_{\mathrm{cross}}}\) holds under both eager and lazy cache builds (S5, S7);
translation accepts d=3 vector positions and reduces them to the right scalar
r(S2b);rotation accepts d=3 unit-vector inputs (S6b);
general full-grid mode raises
NotImplementedErroron vector inputs and directs the user to lazy mode (S8);Expansion.evaluate(positions=...)end-to-end produces the same total for 1-D scalar and equivalent 3-D vector configurations (DD1), and is rotation-isotropic for translation-invariant noise (DD2).
Phase 6 — White-noise component¶
``TestWhiteNoiseComponent``, W1–W3.
GaussianNoise.sigma2 adds a δ(t-t’)·σ²(t) white-noise term to
κ². Phase 6 verifies linearity (W2 — smooth + white = sum of both
evaluated separately) and that the white-noise piece is correctly
absorbed into the lazy translation spline alongside the smooth
kernel (W3).
Phase 7 — Dynamic non-local coupling¶
``tests/test_workflow_config.py``, ``tests/test_dynamic_coupling.py``, and demo2 validation.
Non-local vertices with callable coupling tensors are routed
through DynamicCouplingPromise inside
integrate_moment_qmc_vectorized. Two contracts are supported:
fn(n_list, t_list) → tensor– per-sample call (default).fn(n_2d, t_2d) → (n_samples, ...)when the user opts in viaNonLocalVertex(coupling_vectorized=True)– one call per integrand, useful for heavy callables.
Locked invariants:
DC1 – prop-indexed dynamic coupling raises
NotImplementedError(the v1 limitation).WF6 – a constant callable κ^(3) routed through the dynamic path produces the same FK total as the same constant tensor on the static fast path (rtol 1e-10).
WF7 – per-sample and vectorised callables produce bit-identical totals (rtol 1e-10).
Validated against demo2’s FK channel to 0.48 % relative agreement against the hand-coded reference integrator.
Phase 8 — Time-dependent linear operator¶
``tests/test_diagonal_A_time_dependent.py``, T1–T4.
DiagonalA(gamma=callable) accepts a time-dependent γ(t) and
pre-computes \(\Gamma_a(t) = \int_0^t \gamma_a(\tau) d\tau\) on a
grid as a cubic spline. Phase 8 checks:
callable-constant γ matches the static list input (T1, T1b);
linear γ(t) = g₀ + g₁·t agrees with its closed-form Γ(t) (T2);
causality R(t₁, t₂)=0 for t₁<t₂ is preserved (T3);
end-to-end
Systemround-trip works with the time-dependent path (T4).
Phases WF + CF — Workflow and YAML round-trip¶
``tests/test_workflow.py`` (WF1–WF5) and ``tests/test_workflow_config.py`` (CF1–CF5).
WF tests check that sweep
reproduces validate_phase5.py reference values to 1e-5 relative
(WF4), that by_vertex_type() classifies demo2’s 6 FF + 2 FK
order-2 diagrams correctly (WF3), and that
totals has the expected
DataFrame schema (WF5).
CF tests verify that a YAML config → Python System
round-trip produces a structurally identical system (CF1, CF2), and
that the full YAML-driven run_workflow agrees with the equivalent
L1 Python pipeline to rtol=1e-10 (CF3 — a bit-identity equivalence
check). CF4 exercises --override, CF5 covers malformed-input
errors.
What the deductive suite buys you¶
275 passing deductive tests do not prove that every physical observable is computed correctly in every regime — that is an infinite claim no test suite can make. But they do prove:
Every known failure mode at the tested scale has a specific test catching it, so a future bug cannot silently pass through.
Any future regression in a specific transformation produces a specific failing test that points at the broken module.
The multiple alternative paths (fast, parallel, numerical) all return the same answer as the default, and the dispatcher picks the fastest compatible path automatically.
Symbolic and numerical layers are decoupled: Phase 1 does not depend on Phase 2 passing and vice versa, so a numerical bug cannot mask a symbolic bug or vice versa.
End-to-end validation — the two demos¶
For full inductive validation against direct simulation on
physically non-trivial problems, see the two demos in
examples/:
- examples/demo1 — Gaussian driving
Two-component Langevin system with quadratic self-interactions, Gaussian \(\eta\) with exponential space-time covariance, compared against a Heun-integrated Monte Carlo simulation with \(10^5\) realisations. Perturbative expansion to order 4 (0th, 2nd, 4th FF). Agreement within the MC error envelope across the full \((r, t)\) grid. Open:
examples/demo1/analysis_combined.ipynb.- examples/demo2 — Non-Gaussian driving, non-local MSR vertex
Same Langevin system with a quadratic deformation of the noise \(\tilde\eta = \eta + \alpha(\eta^2 - \lambda)\). The resulting \(\kappa^{(3)}\) introduces a non-local ψψψ interaction vertex, splitting the order-2 diagrams into FF (6) + FK (2) + KK (0, selection-rule forbidden). Demonstrates:
the non-local vertex infrastructure (
Vertexwithlocal=False)a coupling-mediated selection rule forcing FK to contribute only to the cross correlator \(\xi_{12}\)
the first-principle two-kernel \(\kappa^{(2)}_{\mathrm{eff}} = \lambda \kappa_k + 2\alpha^2\lambda^2 \kappa_k^2\) closing the diagonal residual at the expected 1:1 level.
Open:
examples/demo2/analysis.ipynb.
Running the tests¶
# Deductive core (fast, <10 s — Phase 1 only)
pytest tests/test_deductive_expansion.py -v
# Full numerical deductive suite (includes Phases 2–7, ~2 min)
pytest tests/test_deductive_numerics.py -v
# Workflow + config + time-dependent-A tests
pytest tests/test_workflow.py tests/test_workflow_config.py \
tests/test_diagonal_A_time_dependent.py -v
# Everything (275 tests, ~3.5 min on M-series)
pytest tests/ -v
# Demo notebooks (minutes)
cd examples/demo1 && python run_simulation.py --n_real 5000 && \
jupyter nbconvert --to notebook --execute analysis_combined.ipynb
cd examples/demo2 && python run_simulation.py --n_real 200000 --alpha 0.6 && \
jupyter nbconvert --to notebook --execute analysis.ipynb
For the detailed per-test matrix, tolerances, and design rationale,
see deductive_verification.md
at the project root — it is the authoritative reference that test
code links back to when explaining a specific check.