mars

Parity Audit Recommendations

This memo turns the py-earth feature matrix and the mars / earth repo gap matrix into a Rust-first roadmap summary. It uses the bucket assignments from Parity Audit Gap Classification so the next work item can be assigned without reopening settled decisions.

Source Basis

Bucket Handoff

The classification note is the source of truth for what this memo promotes into the roadmap:

Executive Summary

The audit points to a small set of remaining parity-critical gaps that should drive the next Rust-first implementation work:

This recommendation list is scoped to the current repo gap matrix. The R earth matrix still contains additional parity-critical surfaces (plots, prediction intervals / variance models, GLM-style extensions, and update workflows) that remain upstream requirements. If the repo chooses to close them, they should be promoted into separate tracks.

At the same time, several differences are intentional and should stay outside the parity backlog:

The nice-to-have tie-handling row stays in the evidence queue until the upstream contract is pinned down more explicitly.

Parity-Critical Gaps To Close Next

| Area | Why it matters | Recommendation |
| --- | --- | --- |
| Example outputs and documented claims | The classification note marks canonical examples and claims as parity-critical, so they must remain fixture-backed and visible in the audit trail. | Lock down the canonical examples and documentation claims with fixtures before treating the public story as closed. |
| Serialization and pickle support | The repo gap matrix marks check_estimators_pickle as an expected failure, and the classification note keeps serialization in the parity-critical bucket. If fitted estimators cannot round-trip cleanly, the Rust-first core cannot yet be treated as a stable model artifact. | Prioritize a dedicated Rust-first state serialization track for Earth and its fitted model state. Add fixture-backed save/load coverage and keep the sklearn pickle check in the failure queue until the round-trip contract is proven. |
| Multioutput regression | The repo gap matrix still treats multioutput regression as an expected failure, and the classification note keeps it parity-critical. That is a material compatibility gap for sklearn users and should not be folded into a generic cleanup bucket. | Create a separate multioutput implementation track with explicit success criteria for fit, predict, and estimator checks. Do not blur this into the single-output path until the supported contract is clear. |
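
As an illustration of what "the round-trip contract is proven" could mean in a test, the sketch below pickles a fitted model and checks that the restored copy predicts identically. Everything in it is an assumption made for illustration: ToyEarth, its knots_/coefs_/intercept_ fields, the version key, and round_trips are hypothetical stand-ins, not the repo's Earth class or its actual serialized state.

```python
import pickle


class ToyEarth:
    """Hypothetical stand-in for a fitted estimator; not the repo's Earth API."""

    def __init__(self, knots=None, coefs=None, intercept=0.0):
        self.knots_ = list(knots or [])
        self.coefs_ = list(coefs or [])
        self.intercept_ = intercept

    def predict(self, xs):
        # Intercept plus a sum of hinge terms coef * max(0, x - knot).
        return [
            self.intercept_
            + sum(c * max(0.0, x - k) for k, c in zip(self.knots_, self.coefs_))
            for x in xs
        ]

    def __getstate__(self):
        # Explicit, versioned state (an assumed layout) so the persisted
        # contract is visible in review rather than implied by __dict__.
        return {
            "version": 1,
            "knots": self.knots_,
            "coefs": self.coefs_,
            "intercept": self.intercept_,
        }

    def __setstate__(self, state):
        if state.get("version") != 1:
            raise ValueError("unsupported state version")
        self.knots_ = list(state["knots"])
        self.coefs_ = list(state["coefs"])
        self.intercept_ = state["intercept"]


def round_trips(model, xs):
    """Return True if a pickled-and-restored model predicts identically on xs."""
    restored = pickle.loads(pickle.dumps(model))
    return restored.predict(xs) == model.predict(xs)
```

A fixture-backed version of this check would compare the restored predictions against stored expected values as well, so that a change in the state layout shows up as a test failure rather than a silent drift.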
  1. Close example outputs and documented claims first. This is parity-critical and keeps the public contract fixture-backed before deeper implementation work proceeds.
  2. Close estimator artifact persistence next. The classification note keeps serialization in the parity-critical bucket, and fitted-model round-trips are a prerequisite for a stable Rust-first core artifact boundary.
  3. Close multioutput after serialization. It is still parity-critical, but it is a separate behavioral contract from single-output fitting and should stay isolated.
  4. Keep tie handling in the evidence queue. It is only nice-to-have, so it should inform audit coverage without becoming the next implementation track.
  5. Leave warning behavior, formula/interface ergonomics, and packaging/versioning/release behavior documented as intentional boundaries.
  6. Treat the R earth plot / interval / GLM / update-workflow surface as a separate future track family rather than folding it into the current repo gap list.
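
The multioutput item in the ordering above asks for explicit success criteria. One minimal way to write such criteria down is a shape check on fit and predict, sketched here under stated assumptions: ColumnMeans is a hypothetical placeholder model and check_multioutput_contract is an illustrative helper, neither taken from the repo or from sklearn's estimator checks.

```python
class ColumnMeans:
    """Hypothetical multioutput stand-in: predicts each target column's training mean."""

    def fit(self, X, Y):
        n_rows = len(Y)
        n_outputs = len(Y[0])
        # One mean per target column.
        self.means_ = [sum(row[j] for row in Y) / n_rows for j in range(n_outputs)]
        return self

    def predict(self, X):
        # One row of predictions per input row, one value per target column.
        return [list(self.means_) for _ in X]


def check_multioutput_contract(estimator, X, Y):
    """Return a list of contract violations; an empty list means the checks pass."""
    problems = []
    preds = estimator.fit(X, Y).predict(X)
    if len(preds) != len(X):
        problems.append("row count mismatch")
    if any(len(row) != len(Y[0]) for row in preds):
        problems.append("output column count mismatch")
    return problems
```

Keeping the criteria in a helper that is separate from the single-output tests mirrors the recommendation to keep the multioutput contract isolated until it is clear what is supported.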

Intentional Boundaries Versus Future Tracks

These upstream-only or intentionally out-of-scope differences should stay explicit as boundaries, not parity defects:

If any of these boundaries are revisited later, they should be opened as separate implementation tracks with their own acceptance criteria.

Roadmap Implication

The parity audit should feed the Rust-first roadmap in this order: