This document records the first runnable baseline set for the H1 replay-path
benchmark requirement in docs/hpc_contracts.md.

- Runtime crate: `rust-runtime`
- Operations: `design_matrix` and `predict`
- Model fixture: `tests/fixtures/model_spec_v1.json`

Rust criterion baselines:

    cd rust-runtime
    cargo bench --bench runtime_bench -- --noplot

Python smoke wrapper (thread controls and memory deltas; requires local pymars runtime dependencies):

    python3 scripts/benchmark_runtime_threads.py \
      --mode predict \
      --rows 1024,8192 \
      --threads 1,4 \
      --repeats 3

Each workload was measured with thread_count=1 and thread_count=4. The medians
below are taken from the criterion output of the `cargo bench` command above.

| Workload | Operation | Threads=1 Median | Threads=4 Median | Delta |
|---|---|---|---|---|
| 64 rows | `design_matrix` | 6.4 µs | 81.4 µs | 11.3x slower |
| 1,024 rows | `design_matrix` | 93.3 µs | 413.7 µs | 4.4x slower |
| 8,192 rows | `design_matrix` | 712.0 µs | 1.90 ms | 2.7x slower |
| 64 rows | `predict` | 2.4 µs | 77.1 µs | 32.0x slower |
| 1,024 rows | `predict` | 26.7 µs | 133.3 µs | 5.0x slower |
| 8,192 rows | `predict` | 215.7 µs | 201.3 µs | 0.93x (7% faster) |
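
The Delta column can be reproduced from the two medians. The sketch below assumes that Delta is the ratio of the threads=4 median to the threads=1 median, labeled "slower" when the ratio is at least 1 and given as a percentage speedup otherwise. The helper name `format_delta` is hypothetical and is not part of the repository.

```python
def format_delta(t1_us: float, t4_us: float) -> str:
    """Format the threads=4 / threads=1 ratio in the style of the Delta column.

    Assumption: both medians are given in the same unit (microseconds here).
    """
    ratio = t4_us / t1_us
    if ratio >= 1.0:
        # threads=4 took longer than threads=1
        return f"{ratio:.1f}x slower"
    # threads=4 was faster; report the ratio and the relative time saved
    faster_pct = (1.0 - ratio) * 100.0
    return f"{ratio:.2f}x ({faster_pct:.0f}% faster)"


print(format_delta(93.3, 413.7))   # 1,024 rows, design_matrix -> 4.4x slower
print(format_delta(215.7, 201.3))  # 8,192 rows, predict -> 0.93x (7% faster)
```

Values in milliseconds, such as 1.90 ms, need converting to microseconds before they are passed in.
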

Interpretation:

- `design_matrix` at every size and `predict` at 64 and 1,024 rows are slower with the 4-thread override in this benchmark shape; the delta is dominated by thread-pool overhead.
- `predict` on 8,192 rows shows a small median improvement at threads=4 (201.3 µs vs. 215.7 µs, about 7% faster).
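
One way to check the overhead reading against the table is to look at the absolute time added by the 4-thread override rather than at the ratios. The sketch below hard-codes the medians from the table; the names `MEDIANS_US` and `added_time_us` are illustrative only and do not come from the repository.

```python
# Medians in microseconds, copied from the table as (threads=1, threads=4).
MEDIANS_US = {
    "design_matrix": {64: (6.4, 81.4), 1024: (93.3, 413.7), 8192: (712.0, 1900.0)},
    "predict": {64: (2.4, 77.1), 1024: (26.7, 133.3), 8192: (215.7, 201.3)},
}


def added_time_us(medians):
    """Return the threads=4 median minus the threads=1 median per workload."""
    return {
        op: {rows: round(t4 - t1, 1) for rows, (t1, t4) in by_rows.items()}
        for op, by_rows in medians.items()
    }


for op, by_rows in added_time_us(MEDIANS_US).items():
    for rows, extra in by_rows.items():
        print(f"{op:>13} {rows:>5} rows: {extra:+.1f} us")
```

At 64 rows both operations pay roughly 75 µs extra with four threads (75.0 µs for `design_matrix`, 74.7 µs for `predict`), which is consistent with a fixed per-call dispatch cost. The larger gaps for `design_matrix` at 1,024 and 8,192 rows suggest that some of the added cost also grows with row count.
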