Profiling and Performance

This guide documents the Rust-core profiling workflow around the committed scalar CPU baseline. It is intentionally narrow: it describes the measured baseline, the JSON artifact contract, and the criteria for deciding when SIMD or accelerator work deserves a follow-on track.

Baseline scaffold

The current benchmark scaffold lives in bindings/rust/benches/:

scalar_cpu_baseline.rs runs the deterministic EVPI workload
scalar_cpu_baseline.json records the workload, expected value, and regression policy
README.md explains the local entrypoint and the current comparison rule

The baseline is intentionally scalar and deterministic:

workload: a fixed two-strategy EVPI matrix
expected result: 3.0
metric type: scalar_cpu
comparison rule: exact workload/value match
regression policy: ci-contract-only
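
The expected result can be reproduced by hand from the matrix in the committed artifact. The following is a minimal sketch, not the code in scalar_cpu_baseline.rs: the helper name evpi is hypothetical, and it assumes that each row of net_benefits is one sampled scenario and each column is one of the two strategies. Under that reading, EVPI is the mean of the per-scenario best net benefit minus the best per-strategy mean.

```rust
/// Expected value of perfect information for a two-strategy matrix.
/// Assumed layout: one row per sampled scenario, one column per strategy.
/// Returns NaN for an empty matrix.
fn evpi(net_benefits: &[[f64; 2]]) -> f64 {
    let n = net_benefits.len() as f64;

    // Mean of the best achievable net benefit in each scenario.
    let mean_of_max = net_benefits
        .iter()
        .map(|row| row[0].max(row[1]))
        .sum::<f64>()
        / n;

    // Best mean net benefit when one strategy must be chosen up front.
    let mean_0 = net_benefits.iter().map(|row| row[0]).sum::<f64>() / n;
    let mean_1 = net_benefits.iter().map(|row| row[1]).sum::<f64>() / n;
    let max_of_mean = mean_0.max(mean_1);

    mean_of_max - max_of_mean
}
```

For the matrix [[10.0, 1.0], [2.0, 8.0]], the mean of the row maxima is 9.0 and the best column mean is 6.0, so the sketch returns 3.0.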

Recommended local check:

cargo test --benches scalar_cpu_baseline -- --nocapture

Workflow

Use the baseline in three steps:

Run the scalar benchmark and confirm the EVPI result matches the committed artifact.
Record timing, memory, or throughput measurements in the same artifact family using the same workload identity.
Compare the new artifact against the committed baseline before promoting a new optimization claim.

The key rule is that the workload seed and EVPI value remain stable unless the track explicitly re-baselines them.
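
The comparison in the third step could look roughly like the sketch below. It is an illustration under stated assumptions: the type and function names (Workload, ScalarContract, committed_baseline, check_against_baseline) are hypothetical, the workload identity is taken to be the three fields of the workload block in the example artifact, and "exact" is read as plain equality with no tolerance.

```rust
/// Workload identity, mirroring the `workload` block of the artifact.
#[derive(Debug, Clone, PartialEq)]
struct Workload {
    seed: u64,
    repeats: u64,
    net_benefits: Vec<[f64; 2]>,
}

/// The part of an artifact that the committed contract pins down.
#[derive(Debug, Clone, PartialEq)]
struct ScalarContract {
    workload: Workload,
    evpi: f64,
}

/// Values copied from the example artifact in this guide.
fn committed_baseline() -> ScalarContract {
    ScalarContract {
        workload: Workload {
            seed: 42,
            repeats: 10000,
            net_benefits: vec![[10.0, 1.0], [2.0, 8.0]],
        },
        evpi: 3.0,
    }
}

/// Returns Ok(()) when the candidate matches the baseline exactly,
/// otherwise a message naming the first part that drifted.
fn check_against_baseline(
    baseline: &ScalarContract,
    candidate: &ScalarContract,
) -> Result<(), String> {
    if candidate.workload != baseline.workload {
        return Err(format!(
            "workload changed: expected {:?}, got {:?}",
            baseline.workload, candidate.workload
        ));
    }
    // "exact" comparison rule: no tolerance is applied.
    if candidate.evpi != baseline.evpi {
        return Err(format!(
            "evpi changed: expected {}, got {}",
            baseline.evpi, candidate.evpi
        ));
    }
    Ok(())
}
```

Checking the workload before the value means a changed seed or matrix is reported as a re-baselining question rather than as a numerical regression.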

Artifact format

The committed artifact is JSON and should stay small enough to read in full during code review or in CI logs.

Example:

{
  "benchmark_name": "scalar_cpu_baseline",
  "metric_type": "scalar_cpu",
  "workload": {
    "seed": 42,
    "repeats": 10000,
    "net_benefits": [[10.0, 1.0], [2.0, 8.0]]
  },
  "expected": {
    "evpi": 3.0,
    "comparison_rule": "exact",
    "regression_policy": "ci-contract-only"
  },
  "metadata": {
    "phase": "phase-1-scalar-cpu-baseline",
    "notes": [
      "Deterministic baseline for the Rust core performance track.",
      "Timing comparisons are deferred until a stable baseline artifact exists."
    ]
  }
}

How to read the outputs

The current artifact family is correctness-first. Timing is observed but not enforced: the committed contract checks only the scalar workload and the expected value.

metric_type identifies the measurement family.
expected.evpi is the correctness anchor.
expected.comparison_rule states how strict the comparison is.
expected.regression_policy says whether CI only records the artifact or enforces a threshold.
metadata.phase records which profiling phase produced the artifact.

When memory and throughput artifacts arrive, they should keep the same JSON family and add measured fields for the new metric rather than replacing the scalar contract. The scalar workload and expected EVPI remain the baseline reference unless the track explicitly re-baselines them.
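
One way to picture "add measured fields rather than replace the contract" is the sketch below. Everything in Measured (wall_time_ns, peak_memory_bytes, throughput_per_sec) and the helper with_timing are hypothetical names chosen for illustration; the guide does not define the future field names. The point is only that the expected block stays byte-for-byte the same while optional measurements sit beside it.

```rust
use std::time::Instant;

/// Scalar contract fields, unchanged from the baseline artifact.
#[derive(Debug, Clone, PartialEq)]
struct Expected {
    evpi: f64,
    comparison_rule: &'static str,
    regression_policy: &'static str,
}

/// Hypothetical measured fields; each stays `None` until that metric exists.
#[derive(Debug, Clone, Default, PartialEq)]
struct Measured {
    wall_time_ns: Option<u128>,
    peak_memory_bytes: Option<u64>,
    throughput_per_sec: Option<f64>,
}

#[derive(Debug, Clone, PartialEq)]
struct Artifact {
    metric_type: &'static str,
    expected: Expected,
    measured: Measured,
}

/// Runs `workload` `repeats` times and records wall time and throughput.
/// The scalar contract in `expected` is never modified.
/// Panics if `repeats` is 0 or the last result differs from `expected.evpi`.
fn with_timing<F: FnMut() -> f64>(
    mut artifact: Artifact,
    repeats: u32,
    mut workload: F,
) -> Artifact {
    let start = Instant::now();
    let mut last = f64::NAN;
    for _ in 0..repeats {
        // black_box keeps the optimizer from discarding the repeated call.
        last = std::hint::black_box(workload());
    }
    let elapsed = start.elapsed();

    // Correctness first: only record a measurement for the expected value.
    assert_eq!(last, artifact.expected.evpi, "workload result drifted");

    artifact.measured.wall_time_ns = Some(elapsed.as_nanos());
    let secs = elapsed.as_secs_f64();
    if secs > 0.0 {
        artifact.measured.throughput_per_sec = Some(repeats as f64 / secs);
    }
    artifact
}
```

Memory is left as None here because the standard library has no portable way to read peak memory; a real memory artifact would need a platform-specific or allocator-based measurement.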

Promotion criteria for SIMD or accelerators

SIMD, Rayon, and accelerator work should be promoted only when the scalar baseline and artifact layer already exist.

Open a follow-on track when all of the following are true:

the scalar baseline is stable and reproducible
the hot path shows a measurable gain from vectorization or parallelism
the proposed change preserves the same result semantics and tolerance policy
the memory/throughput artifacts show a repeatable improvement, not a one-off
the optimization can be described as an internal execution change rather than a new public contract
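
The "repeatable improvement, not a one-off" criterion can be made concrete in several ways. The sketch below shows one deliberately conservative reading and is not a policy stated by this guide: the function name is_repeatable_gain, the requirement of at least two runs per side, and the min_speedup threshold are all assumptions. It accepts a gain only if the slowest candidate run still beats the fastest baseline run by the requested factor.

```rust
/// Returns true when every candidate run beats every baseline run by at
/// least `min_speedup` (1.2 means the baseline takes 1.2 times as long).
/// Inputs are wall times in nanoseconds from repeated runs of one workload.
fn is_repeatable_gain(baseline_ns: &[u64], candidate_ns: &[u64], min_speedup: f64) -> bool {
    // A single run on either side cannot demonstrate repeatability.
    if baseline_ns.len() < 2 || candidate_ns.len() < 2 {
        return false;
    }

    // Both slices are non-empty here, so min/max cannot return None.
    let fastest_baseline = *baseline_ns.iter().min().unwrap() as f64;
    let slowest_candidate = *candidate_ns.iter().max().unwrap() as f64;

    // A zero reading is below timer resolution and proves nothing.
    if slowest_candidate == 0.0 {
        return false;
    }

    fastest_baseline / slowest_candidate >= min_speedup
}
```

Comparing worst case against best case is stricter than comparing medians; a track that prefers a statistical test could swap it in without changing the shape of the check.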

Practical order:

Scalar CPU baseline
Memory and throughput measurement
Rayon or equivalent multithreading feasibility
SIMD feasibility
GPU or other accelerator feasibility only if the earlier data justify it

If a candidate optimization needs a different workload, a different result shape, or a different correctness policy, it belongs in a follow-on track rather than in the baseline profiling contract.