Validation & Testing
This page documents the validation framework and testing approach for the NHRA Game Theory toolkit.
Validation Philosophy
The model follows a multi-layer validation approach:
```mermaid
flowchart TB
    subgraph unit["Unit Level"]
        ut["Unit Tests"]
        pt["Property Tests"]
    end
    subgraph mech["Mechanism Level"]
        nash["Nash Equilibrium Checks"]
        payoff["Payoff Consistency"]
    end
    subgraph system["System Level"]
        sens["Sensitivity Analysis"]
        back["Backtesting"]
    end
    unit --> mech --> system
```
Unit Testing
Test Suite
Tests are located in tests/ and run with pytest:
```bash
# Run all tests
poetry run pytest

# Run with verbose output
poetry run pytest -v

# Run specific test file
poetry run pytest tests/test_games.py
```
Coverage Requirements
Test coverage is enforced at a minimum of 95%.
Property-Based Testing
We use Hypothesis for property-based testing:
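```python
import numpy as np
from hypothesis import given, strategies as st

# GameParams and definition_game come from the toolkit;
# their import path is not shown on this page.


@given(st.floats(min_value=0.5, max_value=2.0))
def test_pressure_bounds(pressure):
    """Payoffs should be finite for valid pressure range."""
    gp = GameParams(pressure=pressure, ...)  # remaining parameters elided
    game = definition_game(gp)
    assert np.isfinite(game.u_row).all()
```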
Mechanism Validation
Nash Equilibrium Verification
For each game, we verify:
- Existence: At least one equilibrium exists
- Best Response: Each player's equilibrium strategy is a best response to the other players' strategies
- Stability: Small perturbations don't break equilibrium
```python
def test_nash_existence():
    """Every game should have at least one Nash equilibrium."""
    gp = GameParams(pressure=1.0, efficiency_gap=0.3, ...)
    for game_fn in [definition_game, bargaining_game, cost_shifting_game]:
        game = game_fn(gp)
        equilibria = solve_all_equilibria(game)
        assert len(equilibria) >= 1
```
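The best-response and stability checks listed above can be illustrated with a small standalone sketch. It is not the toolkit's `solve_all_equilibria`; the helper names and the 2x2 payoff matrices are hypothetical, and it only looks at pure strategies (a finite game always has a mixed equilibrium, but may have no pure one). Payoffs are nested lists here rather than the array form used by `game.u_row`.

```python
from itertools import product


def is_pure_nash(u_row, u_col, i, j, tol=1e-9):
    """True if neither player gains by deviating unilaterally from (i, j)."""
    best_row = max(u_row[k][j] for k in range(len(u_row)))        # row player's best reply to column j
    best_col = max(u_col[i][k] for k in range(len(u_col[i])))     # column player's best reply to row i
    return u_row[i][j] >= best_row - tol and u_col[i][j] >= best_col - tol


def pure_nash_equilibria(u_row, u_col):
    """Enumerate all pure-strategy profiles that pass the best-response check."""
    cells = product(range(len(u_row)), range(len(u_row[0])))
    return [(i, j) for i, j in cells if is_pure_nash(u_row, u_col, i, j)]


def deviation_margin(u_row, u_col, i, j):
    """Smallest payoff loss either player suffers by deviating from (i, j).

    A positive margin means the equilibrium is strict, so it survives any
    payoff perturbation smaller than half the margin per entry.
    """
    row_alt = max(u_row[k][j] for k in range(len(u_row)) if k != i)
    col_alt = max(u_col[i][k] for k in range(len(u_col[i])) if k != j)
    return min(u_row[i][j] - row_alt, u_col[i][j] - col_alt)


# Hypothetical 2x2 coordination game (illustrative numbers only).
u_row = [[3.0, 0.0], [0.0, 2.0]]
u_col = [[3.0, 0.0], [0.0, 2.0]]

equilibria = pure_nash_equilibria(u_row, u_col)
margin = deviation_margin(u_row, u_col, 0, 0)
```

Here both coordinated outcomes pass the best-response check, and the margin gives one simple, assumed notion of how large a perturbation the equilibrium tolerates.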
Payoff Monotonicity
Key payoff relationships are verified:
| Condition | Expected Outcome |
|---|---|
| ↑ Pressure | ↑ Coordination incentive |
| ↑ Efficiency gap | ↑ Cost shifting temptation |
| ↑ Audit pressure | ↓ Upcoding incentive |
```python
def test_pressure_monotonicity():
    """Higher pressure should increase coordination payoffs."""
    gp_low = GameParams(pressure=0.8, ...)
    gp_high = GameParams(pressure=1.5, ...)
    game_low = discharge_coordination_game(gp_low)
    game_high = discharge_coordination_game(gp_high)
    # Coordination payoff should be higher under pressure
    assert game_high.u_row[0, 0] > game_low.u_row[0, 0]
```
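A two-point comparison can miss a dip between the sampled values. One possible extension, sketched below, checks the relationship across a grid of pressures. The helper `is_strictly_increasing` and the stand-in payoff function are hypothetical; in practice the payoff would come from a game built with `GameParams`. The grid spans 0.5 to 2.0, the range used in the property test earlier on this page.

```python
def is_strictly_increasing(fn, grid):
    """True if fn rises between every pair of consecutive grid points."""
    values = [fn(x) for x in grid]
    return all(later > earlier for earlier, later in zip(values, values[1:]))


def toy_coordination_payoff(pressure):
    """Hypothetical stand-in for a single payoff entry as a function of pressure."""
    return 2.0 * pressure - 0.5


# Seven evenly spaced pressures from 0.5 to 2.0.
pressure_grid = [0.5 + 0.25 * k for k in range(7)]

monotone = is_strictly_increasing(toy_coordination_payoff, pressure_grid)
```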
Sensitivity Analysis
Sobol Global Sensitivity
We use SALib for Sobol sensitivity analysis, which produces:
- First-order indices (S1): Direct effect of each parameter
- Total-order indices (ST): Total effect including interactions
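As a rough illustration of what the two indices measure, the sketch below estimates them with plain Monte Carlo for a toy additive model. It is neither SALib's implementation nor the toolkit's analysis script; the function `sobol_indices`, the toy model, and the sample size are assumptions made for the example. It uses the Saltelli (2010) first-order estimator and the Jansen total-order estimator, with inputs drawn independently and uniformly on [0, 1].

```python
import random


def sobol_indices(f, k, n, seed=0):
    """Estimate first-order (S1) and total-order (ST) indices for f with k inputs."""
    rng = random.Random(seed)
    A = [[rng.random() for _ in range(k)] for _ in range(n)]
    B = [[rng.random() for _ in range(k)] for _ in range(n)]
    y_a = [f(x) for x in A]
    y_b = [f(x) for x in B]

    # Output variance, estimated from both sample sets together.
    pooled = y_a + y_b
    mean = sum(pooled) / len(pooled)
    var = sum((y - mean) ** 2 for y in pooled) / (len(pooled) - 1)

    s1, st = [], []
    for i in range(k):
        # A with its i-th column replaced by the i-th column of B.
        y_ab = [f(a[:i] + [b[i]] + a[i + 1:]) for a, b in zip(A, B)]
        # Saltelli (2010) first-order estimator.
        s1.append(sum(yb * (yab - ya) for yb, yab, ya in zip(y_b, y_ab, y_a)) / n / var)
        # Jansen total-order estimator.
        st.append(sum((ya - yab) ** 2 for ya, yab in zip(y_a, y_ab)) / (2 * n) / var)
    return s1, st


def toy_model(x):
    """Hypothetical additive model: no interactions, so S1 and ST should agree."""
    return 2.0 * x[0] + x[1]


s1, st = sobol_indices(toy_model, k=2, n=20000)
```

For this toy model the analytic values are 0.8 for the first input and 0.2 for the second, for both S1 and ST, so the estimates should land near those numbers. A gap between ST and S1 for a parameter would indicate that it acts through interactions with other parameters.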
Key Findings
Typical sensitivity rankings (parameter importance for system pressure):
| Rank | Parameter | ST Index |
|---|---|---|
| 1 | cost_shifting_intensity | ~0.65 |
| 2 | efficiency_gap | ~0.20 |
| 3 | pressure (initial) | ~0.10 |
| 4 | Other parameters | <0.05 |
Recursive Backtesting
Approach
The recursive backtest validates model dynamics against historical patterns.
The backtest:
- Initialises from historical state (2017)
- Runs simulation forward
- Compares predicted vs actual trajectories
- Uses rolling window validation
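The rolling window step can be sketched as follows. This is a minimal illustration under stated assumptions, not the logic of `recursive_backtest.py`: the function name, the fixed window length, the one-period horizon, and the end year are all hypothetical. Only the 2017 start comes from this page.

```python
def rolling_window_splits(periods, window, horizon=1):
    """Yield (train, test) pairs using a fixed-length window that slides forward."""
    last_start = len(periods) - window - horizon
    for start in range(last_start + 1):
        train = periods[start:start + window]
        test = periods[start + window:start + window + horizon]
        yield train, test


# Illustrative yearly periods; the end year is an assumption.
years = list(range(2017, 2023))

splits = list(rolling_window_splits(years, window=3))
for train, test in splits:
    print(f"fit on {train[0]}-{train[-1]}, evaluate on {test[0]}")
```

Each split fits on one window and evaluates on the period that follows, so no forecast is scored against data it was fitted on.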
Metrics
| Metric | Target | Description |
|---|---|---|
| RMSE | <0.15 | Root mean squared error |
| MAPE | <20% | Mean absolute percentage error |
| Direction | >70% | Correct direction of change |
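The three metrics in the table can be computed as in the sketch below. The function names and the sample series are hypothetical, and the definition of directional accuracy (share of steps where the predicted and actual changes have the same sign) is an assumption about how "Direction" is measured. The thresholds in the final check are the targets from the table.

```python
import math


def rmse(actual, predicted):
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mape(actual, predicted):
    """Mean absolute percentage error, in percent. Assumes no actual value is zero."""
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)


def directional_accuracy(actual, predicted):
    """Share of period-to-period changes whose sign is predicted correctly."""
    def sign(x):
        return (x > 0) - (x < 0)

    actual_steps = [sign(b - a) for a, b in zip(actual, actual[1:])]
    predicted_steps = [sign(b - a) for a, b in zip(predicted, predicted[1:])]
    hits = sum(1 for a, p in zip(actual_steps, predicted_steps) if a == p)
    return hits / len(actual_steps)


# Made-up trajectories, for illustration only.
actual = [1.0, 1.1, 1.3, 1.2]
predicted = [1.0, 1.2, 1.25, 1.15]

meets_targets = (
    rmse(actual, predicted) < 0.15
    and mape(actual, predicted) < 20.0
    and directional_accuracy(actual, predicted) > 0.70
)
```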
CI/CD Validation
All validation runs automatically in GitHub Actions:
```yaml
# Extract from .github/workflows/ci.yml
- name: Tests
  run: poetry run pytest -q
- name: Input traceability
  run: poetry run python scripts/check_parameters_grounded.py
- name: Verify Pipeline
  run: poetry run snakemake --cores 1 run_baseline context_pack --forceall
```
Validation Gates
PRs must pass:
- All unit tests
- Type checking (mypy strict)
- Lint/format (ruff)
- Security scan (bandit)
- Parameter grounding check
- Pipeline verification
- Documentation build
Running Validation Locally
Full Validation Suite
```bash
# Run complete validation
poetry run snakemake --cores 4 all

# Or individually:
poetry run pytest                                            # Unit tests
poetry run mypy --strict src/nhra_gt                         # Type check
poetry run python scripts/run_sobol_analysis.py              # Sensitivity
poetry run python scripts/validation/recursive_backtest.py   # Backtest
```
Quick Smoke Test
See Also
- Development Guide — Tooling and CI/CD details
- Game Theory Models — Game specifications
- Design Documentation — Architecture overview