
HPC Contracts

This page defines what an HPC-ready mars-earth release is allowed to claim. It converts the archived HPC roadmap and packaging feasibility notes into testable contracts before implementation or external submission work starts. Parallel subagent execution guidance lives in HPC Parallel Execution Guide.

The current project is not yet an HPC runtime. The contracts below are gates: a release, registry recipe, or foundation packet must not claim an HPC tier until the corresponding contract is implemented, tested, and documented.

Contract Levels

| Level | Name | Claim allowed | Required evidence |
|-------|------|---------------|-------------------|
| H0 | HPC-packaging ready | Source-installable in HPC-style environments | Spack, EasyBuild, and conda-forge recipes or submission artifacts; clean Linux smoke tests; no accelerator claims |
| H1 | CPU throughput runtime | Faster, deterministic batch replay on shared-memory CPU systems | Rust CPU-parallel prediction/design-matrix paths, benchmark baselines, regression thresholds, thread controls, conformance parity |
| H2 | Stable runtime boundary | Host-language and packaging boundary is stable enough for HPC consumers | Narrow C ABI or equivalent FFI, Arrow-compatible batch interchange where practical, ABI/version tests, documented error and memory ownership |
| H3 | Accelerator-ready runtime | GPU/accelerator execution is available for replay workloads | Optional accelerator backend, CPU fallback parity, device capability detection, numerical tolerance contract, no mandatory accelerator dependency |
| H4 | Distributed execution | Multi-node or distributed replay is supported | Explicit partitioning semantics, deterministic aggregation, distributed smoke tests, failure-mode documentation |

Cross-Cutting Guarantees

Implementation Dependency Graph

The HPC tracks should be implemented in this order unless a later explicit design review changes the dependency graph:

  1. H0 packaging readiness can run in parallel with H1 benchmark baseline work, but upstream submissions must wait until package names, source URLs, checksums, and smoke tests are real rather than placeholders.
  2. H1 CPU throughput runtime is the first compute contract. It establishes benchmark baselines, serial/parallel parity, resource controls, and memory visibility that later contracts depend on.
  3. H2 stable runtime boundary depends on H1 semantics being stable enough to expose through an ABI or batch interchange layer.
  4. H3 accelerator-ready runtime depends on H1 benchmark data. A backend should not be chosen until the CPU kernel shape and packaging constraints are known.
  5. H4 distributed execution depends on H1 partitioning semantics. It should depend on H3 only when distributed accelerator claims are being made.
  6. HPSF and E4S packets require H0 plus credible H1/H2 evidence, or must be explicitly framed as pre-submission feedback for packaging readiness only.
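
The ordering above can be written down as a small dependency graph and checked mechanically. The following is a minimal sketch, not project code: the names `PREREQUISITES`, `implementation_order`, and `prerequisites_met` are hypothetical, and the conditional H4-on-H3 edge is modeled with a flag assumed here for illustration.

```python
from graphlib import TopologicalSorter

# Prerequisites per contract level, transcribed from the ordering above.
# H0 and H1 can proceed in parallel, so neither lists the other.
PREREQUISITES = {
    "H0": set(),
    "H1": set(),
    "H2": {"H1"},
    "H3": {"H1"},
    "H4": {"H1"},
}


def _graph(distributed_accelerator_claims):
    """Copy the base graph, adding H4 -> H3 only for distributed accelerator claims."""
    graph = {level: set(deps) for level, deps in PREREQUISITES.items()}
    if distributed_accelerator_claims:
        graph["H4"].add("H3")
    return graph


def implementation_order(distributed_accelerator_claims=False):
    """Return one valid implementation order for the contract levels."""
    return list(TopologicalSorter(_graph(distributed_accelerator_claims)).static_order())


def prerequisites_met(level, satisfied, distributed_accelerator_claims=False):
    """True when every prerequisite of `level` is in the `satisfied` set."""
    return _graph(distributed_accelerator_claims)[level] <= set(satisfied)
```

For example, `prerequisites_met("H4", {"H1"})` holds for CPU-only distributed work, while the same call with `distributed_accelerator_claims=True` does not until H3 is also satisfied.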

Parallel Work Ownership

When using parallel subagents, assign disjoint write scopes:

| Lane | Primary files | Must not edit |
|------|---------------|---------------|
| Contract governance | docs/hpc_contracts.md, claim-check docs/scripts | Runtime kernels and upstream recipe contents |
| H0 Spack | packaging/spack/**, Spack submission notes | EasyBuild, conda-forge, Rust kernels |
| H0 EasyBuild | packaging/easybuild/**, EasyBuild submission notes | Spack, conda-forge, Rust kernels |
| H0 conda-forge | packaging/conda-forge/**, recipe drafts | Spack, EasyBuild, Rust kernels |
| H1 CPU runtime | rust-runtime/src/**, Rust/Python benchmark and parity tests | H2 ABI files, H3 accelerator files, packaging recipes |
| H2 ABI/Arrow | FFI boundary files, ABI docs/tests, Arrow/batch adapters | H1 kernel internals unless required through an agreed interface |
| H3 accelerator | optional backend files and accelerator docs/tests | H1 serial semantics, H0 recipes, H4 distributed adapter |
| H4 distributed | distributed adapter files and cluster recipe docs/tests | H1 kernel internals, H3 backend internals |
| HPSF/E4S packets | community packet docs and evidence summaries | Runtime implementation and packaging recipe internals |

Cross-lane changes must be proposed in the owning track first. Workers should not revert or rewrite another lane’s files.
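
A write-scope rule like this can be checked before a worker's changes are accepted. The sketch below is illustrative only: `LANE_SCOPES`, `owning_lanes`, and `out_of_scope` are hypothetical names, and only the lanes whose primary files appear as explicit paths in the table are encoded. Lanes described in prose would need real patterns agreed by the owning track.

```python
from fnmatch import fnmatchcase

# Glob patterns copied from the ownership table. Lanes whose scope is given
# only in prose (for example "FFI boundary files") are deliberately omitted.
LANE_SCOPES = {
    "Contract governance": ["docs/hpc_contracts.md"],
    "H0 Spack": ["packaging/spack/**"],
    "H0 EasyBuild": ["packaging/easybuild/**"],
    "H0 conda-forge": ["packaging/conda-forge/**"],
    "H1 CPU runtime": ["rust-runtime/src/**"],
}


def owning_lanes(path):
    """Return the lanes whose patterns match a repository-relative path."""
    return [
        lane
        for lane, patterns in LANE_SCOPES.items()
        # fnmatchcase treats "*" as matching any characters, including "/".
        if any(fnmatchcase(path, pattern) for pattern in patterns)
    ]


def out_of_scope(lane, changed_paths):
    """Return the changed paths that the given lane does not own."""
    return [path for path in changed_paths if lane not in owning_lanes(path)]
```

A non-empty result from `out_of_scope` signals a cross-lane change that should be proposed in the owning track rather than committed directly.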

Claim-Check Gate

Before any HPC implementation or submission phase is marked complete, run a docs claim check that flags unsupported use of these terms unless the nearby text names the implemented contract level and limitation:

The claim check should start as a lightweight repository script and may later be promoted to CI.
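
One possible shape for such a lightweight check is sketched below. It is not the project's script: `find_unsupported_claims` and `LEVEL_PATTERN` are hypothetical names, the list of flagged terms is supplied by the caller, and the sketch only tests whether a contract level (H0 to H4) is named within a few lines of the term. Checking that the limitation is also stated would need additional rules.

```python
import re

# Matches a contract level label such as "H0" or "H3" as a whole word.
LEVEL_PATTERN = re.compile(r"\bH[0-4]\b")


def find_unsupported_claims(text, terms, window=1):
    """Return (line_number, term) pairs for terms used with no contract level nearby.

    "Nearby" means within `window` lines before or after the matching line.
    Matching is case-insensitive; line numbers are 1-based.
    """
    lines = text.splitlines()
    findings = []
    for index, line in enumerate(lines):
        lowered = line.lower()
        for term in terms:
            if term.lower() not in lowered:
                continue
            start = max(0, index - window)
            nearby = lines[start:index + window + 1]
            if not any(LEVEL_PATTERN.search(candidate) for candidate in nearby):
                findings.append((index + 1, term))
    return findings
```

A caller would pass the contents of each documentation file together with the agreed term list and fail the check when any findings are returned.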

H0: HPC-Packaging Ready

H0 means the project can be packaged by HPC-oriented package managers without

Required deliverables:

Non-goals:

H1: CPU Throughput Runtime

H1 means replay workloads can use shared-memory CPU parallelism while preserving single-thread deterministic behavior.
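
The parity requirement can be illustrated with a small Python sketch. It is not the Rust runtime: `predict_row` is a hypothetical stand-in kernel, and the thread pool only demonstrates the ordering argument. Because each row is evaluated with the same fixed operation order and chunks are reassembled in input order, the parallel result is expected to equal the serial result exactly.

```python
from concurrent.futures import ThreadPoolExecutor


def predict_row(row, coefficients):
    """Hypothetical stand-in kernel: a fixed-order dot product for one row."""
    total = 0.0
    for value, weight in zip(row, coefficients):
        total += value * weight
    return total


def predict_serial(rows, coefficients):
    """Single-thread reference path."""
    return [predict_row(row, coefficients) for row in rows]


def predict_parallel(rows, coefficients, threads=4, chunk_size=2):
    """Evaluate contiguous chunks on a thread pool and reassemble them in input order."""
    chunks = [rows[i:i + chunk_size] for i in range(0, len(rows), chunk_size)]
    with ThreadPoolExecutor(max_workers=threads) as pool:
        # Executor.map yields results in submission order, so the output
        # order does not depend on which chunk finishes first.
        parts = pool.map(lambda chunk: predict_serial(chunk, coefficients), chunks)
        return [value for part in parts for value in part]
```

A parity test would then assert that `predict_parallel(rows, coefficients)` equals `predict_serial(rows, coefficients)` for several thread counts and chunk sizes.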

Required deliverables:

Initial success thresholds:

Non-goals:

H2: Stable Runtime Boundary

H2 means HPC consumers have a stable, narrow runtime boundary that can be used without depending on Python internals.

Required deliverables:

Non-goals:

H3: Accelerator-Ready Runtime

H3 means replay workloads can run through an optional backend. The shared backend registry, CPU-fallback selection layer, and optional H3 array-module replay adapter are implemented. H3 CUDA, ROCm, Metal, TPU, FPGA, and ASIC factories remain optional module-backed adapters and do not add mandatory vendor dependencies.
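
The idea of an optional, module-backed backend with CPU fallback can be sketched as follows. This is an assumption-laden illustration rather than the project's registry: `register_backend`, `select_backend`, and the string returned by the CPU factory are hypothetical, and availability is approximated by checking whether a top-level module can be found without importing it.

```python
import importlib.util

# name -> (required top-level module or None, factory callable)
_BACKENDS = {}


def register_backend(name, required_module, factory):
    """Register a backend; registration never imports the vendor module."""
    _BACKENDS[name] = (required_module, factory)


def select_backend(preferred):
    """Return (name, backend), falling back to CPU if the preferred one is unavailable."""
    entry = _BACKENDS.get(preferred)
    if entry is not None:
        required_module, factory = entry
        # find_spec returns None when a top-level module is not installed.
        if required_module is None or importlib.util.find_spec(required_module) is not None:
            return preferred, factory()
    _, cpu_factory = _BACKENDS["cpu"]
    return "cpu", cpu_factory()


# The CPU backend has no optional dependency, so it is always selectable.
register_backend("cpu", None, lambda: "cpu-replay")
```

With this shape, requesting a backend whose module is missing returns the CPU path instead of raising, so no vendor dependency becomes mandatory.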

Required deliverables:

Non-goals:

H4: Distributed Execution

H4 means replay can be partitioned across workers or nodes with documented semantics. The current H4 implementation includes a CPU-cluster replay path and an explicit H4 command-backed multi-node adapter. The command adapter is unavailable unless a worker command such as python -m pymars.cluster_worker is configured, so imports and local predictions do not start cluster workers.
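
Documented partitioning semantics and deterministic aggregation could look like the following sketch. It is not the project's adapter: `partition` and `aggregate` are hypothetical names, and contiguous row slices are assumed as the partitioning rule purely for illustration.

```python
def partition(n_rows, n_workers):
    """Split range(n_rows) into contiguous, near-equal (start, stop) slices.

    The first `n_rows % n_workers` workers receive one extra row, so the
    layout depends only on the two arguments.
    """
    if n_workers <= 0:
        raise ValueError("n_workers must be positive")
    base, extra = divmod(n_rows, n_workers)
    bounds = []
    start = 0
    for worker in range(n_workers):
        stop = start + base + (1 if worker < extra else 0)
        bounds.append((start, stop))
        start = stop
    return bounds


def aggregate(partial_results):
    """Merge {(start, stop): values} into one list ordered by row index.

    Sorting by slice bounds makes the result independent of the order in
    which workers report back.
    """
    merged = []
    for (start, stop), values in sorted(partial_results.items()):
        if len(values) != stop - start:
            raise ValueError(f"slice {start}:{stop} returned {len(values)} values")
        merged.extend(values)
    return merged
```

Because both functions are pure, a distributed smoke test can compare the aggregated output against a single-process run row for row.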

Required deliverables:

Non-goals:

External Submission Contract

External submissions must map to the contract level they actually satisfy:

Claim Review Command

uv run python3 scripts/check_hpc_claims.py --strict
./scripts/check_hpc_claims.sh

Current State

As of 2026-05-11: