
Reference data manifests

Reference data manifests are the year-scoped source record for pricing inputs. They keep the artifact trail explicit: what was discovered, what was extracted, what was validated, and what remains a documented gap.

See the current implementation track in the Conductor spec and the CI notes.

Manifests should move through a small, visible lifecycle instead of being treated as a passive data dump.

| Status | Meaning |
| --- | --- |
| source-discovered | The upstream artifact is known and identified, but the manifest has not been normalized yet. |
| drafted | A manifest stub exists with the year, schema version, and first-pass source inventory. |
| schema-checked | Required fields, types, and cross-references have been validated. |
| gap-recorded | The year depends on a missing upstream artifact, and the absence is encoded explicitly. |
| fixture-tested | The manifest loads against checked examples and the validator behavior is stable. |
| validated | The manifest is the accepted record for that pricing year. |
| deprecated | The manifest has been superseded by a later year or schema revision. |

Do not skip straight from source-discovered to validated. If a year is incomplete, record the gap and leave the manifest in an earlier status until the missing material is explained or replaced.
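
The lifecycle rule can be sketched as a small transition check. The status order comes from the status table; the rule that only gap-recorded may be stepped over is an assumption made for illustration, and `can_advance` is a hypothetical helper, not part of any existing validator.

```python
# Status order taken from the lifecycle table.
LIFECYCLE = [
    "source-discovered",
    "drafted",
    "schema-checked",
    "gap-recorded",
    "fixture-tested",
    "validated",
    "deprecated",
]

# Assumption: gap-recorded only applies to years with a missing artifact,
# so it is the one status a manifest may step over.
OPTIONAL = {"gap-recorded"}


def can_advance(current: str, target: str) -> bool:
    """Return True if moving from current to target skips no required status."""
    i, j = LIFECYCLE.index(current), LIFECYCLE.index(target)
    if j <= i:
        return False  # no backwards or no-op moves in this sketch
    skipped = LIFECYCLE[i + 1 : j]
    return all(status in OPTIONAL for status in skipped)
```

Under these assumptions, a jump from source-discovered to validated is rejected, while schema-checked to fixture-tested is allowed for a year that has no gaps to record.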

Every manifest should carry enough information to reconstruct both the source trail and the calculator support claim for that year.

| Field group | Required content | Notes |
| --- | --- | --- |
| Identity | Pricing year, manifest path, and schema version | The schema version must be pinned in the file so evolution is explicit. |
| Status | Validation status and status date or note | Use the lifecycle terms above, not informal labels. |
| Source artifacts | URL, local path, publication date, retrieval date, checksum, and provenance note | Each downloaded or extracted artifact needs a durable provenance trail. |
| Constants | NEP/NEC constants and other year-specific pricing constants | Treat constants as source-backed inputs, not free-form commentary. |
| Weights | Stream price weights and other pricing weights | Weight records should stay attached to the year that produced them. |
| Adjustments | Adjustment parameters and any year-specific overrides | Keep the parameter names stable where possible. |
| Coding sets | Coding-set versions and any transition notes | Record the exact versions that shaped the record. |
| Support link | Reference to the calculator support matrix | The manifest should be easy to line up with the public coverage story. |
| Gaps | Explicit gap records for anything missing upstream | Missing material must be encoded, never silently omitted. |
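
A completeness check over these field groups might look like the sketch below. The key names (`identity`, `source_artifacts`, and so on) are hypothetical stand-ins for whatever the real schema uses, and the stub values are illustrative only.

```python
# Hypothetical key names, one per field group in the table above.
REQUIRED_GROUPS = (
    "identity",
    "status",
    "source_artifacts",
    "constants",
    "weights",
    "adjustments",
    "coding_sets",
    "support_link",
    "gaps",
)
REQUIRED_IDENTITY = ("pricing_year", "manifest_path", "schema_version")


def missing_fields(manifest: dict) -> list[str]:
    """List the field groups and identity keys absent from a manifest."""
    missing = [group for group in REQUIRED_GROUPS if group not in manifest]
    identity = manifest.get("identity", {})
    missing += [f"identity.{key}" for key in REQUIRED_IDENTITY if key not in identity]
    return missing


# Illustrative first-pass stub: identity is partial and most groups are absent.
stub = {
    "identity": {"pricing_year": 2024, "schema_version": "1.0"},
    "status": "drafted",
    "source_artifacts": [],
}
```

In this sketch an empty `gaps` list still counts as present, so a year with nothing missing states that explicitly instead of omitting the group.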

Provenance is the difference between a useful manifest and a guess.

Keep these rules in mind:

| Provenance rule | Expectation |
| --- | --- |
| Traceability | Every source entry should point back to a public artifact, archive record, or reproducible extraction step. |
| Hashes | Downloaded files should carry a checksum so future runs can detect drift. |
| Retrieval dates | Record when the source was fetched, not just when it was published. |
| License or usage notes | Preserve any provenance or license caveat that affects redistribution or downstream use. |
| Transform notes | If an extraction, OCR, or normalization step happened, record it explicitly. |

Source provenance should let a reviewer answer three questions quickly:

  1. Where did this value come from?
  2. What did we do to it before storing it?
  3. How would we prove it still matches the source today?
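
The third question is the one a checksum answers. The following is a minimal sketch, assuming SHA-256 as the checksum algorithm (the source does not name one) and a hypothetical `SourceArtifact` record whose fields mirror the source-artifact entries described earlier.

```python
import hashlib
from dataclasses import dataclass


@dataclass(frozen=True)
class SourceArtifact:
    """Hypothetical record for one downloaded or extracted artifact."""
    url: str
    local_path: str
    publication_date: str
    retrieval_date: str
    sha256: str
    provenance_note: str


def sha256_of(data: bytes) -> str:
    """Hex digest of the artifact bytes (SHA-256 is an assumed choice)."""
    return hashlib.sha256(data).hexdigest()


def has_drifted(artifact: SourceArtifact, current_bytes: bytes) -> bool:
    """True if freshly fetched bytes no longer match the recorded checksum."""
    return sha256_of(current_bytes) != artifact.sha256
```

A reviewer would re-fetch the artifact, pass its bytes to `has_drifted`, and treat a `True` result as a prompt to investigate before trusting any value derived from it.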

Gap records are first-class manifest content. They explain why a year is not fully populated without pretending the missing artifact exists.

Use a gap record when:

  • an expected source file was never published
  • a recorded URL now resolves to a 404 or equivalent dead end
  • OCR or extraction failed and the result would be misleading
  • a year intentionally omits a source family that appears in adjacent years

Each gap record should include:

| Field | Purpose |
| --- | --- |
| Gap key | A stable identifier for the missing item |
| Expected artifact | What should have been present |
| Reason | Why the artifact is missing or unusable |
| Impact | What part of the manifest or calculator support story is affected |
| Replacement | Whether another source, fixture, or note stands in for it |

Gaps should remain visible in the manifest and in the source archive. They are part of the historical record, not an implementation failure to be hidden.
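
One way to keep gap records first-class is to give them their own type. The sketch below assumes a hypothetical `GapRecord` dataclass with one attribute per field listed above; the example values are illustrative and do not describe a real missing artifact.

```python
from dataclasses import dataclass
from typing import Optional


@dataclass(frozen=True)
class GapRecord:
    """Hypothetical shape for one gap entry in a manifest."""
    gap_key: str
    expected_artifact: str
    reason: str
    impact: str
    replacement: Optional[str] = None  # None: nothing stands in for the artifact

    def empty_fields(self) -> list[str]:
        """Names of required fields that were left blank."""
        required = {
            "gap_key": self.gap_key,
            "expected_artifact": self.expected_artifact,
            "reason": self.reason,
            "impact": self.impact,
        }
        return [name for name, value in required.items() if not value.strip()]


# Illustrative record for a source URL that now resolves to a 404.
example_gap = GapRecord(
    gap_key="weights-workbook-missing",
    expected_artifact="Price weights workbook",
    reason="Recorded URL now resolves to a 404",
    impact="Weights group cannot be populated from source",
)
```

Leaving `replacement` optional while requiring the other four fields reflects an assumption that a gap may legitimately have no stand-in, but should never lack an explanation.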

Schema evolution should be versioned and boring.

  • Add optional fields before making them required.
  • Bump the schema version when required fields, validation semantics, or the provenance shape change.
  • Keep older manifests readable during a transition window when possible.
  • Update the docs and CI notes in the same change when the schema contract changes.

The manifest schema should never drift silently across years. If a year needs a different shape, make that shape visible in the versioned schema field.
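
A loader can enforce the pinned version and the transition window with a check like the one below. The version numbers and the names `CURRENT_SCHEMA` and `READABLE_SCHEMAS` are hypothetical; the real schema may use a different versioning scheme.

```python
from typing import Optional

# Hypothetical versions: 2 is current, 1 is still readable during a transition.
CURRENT_SCHEMA = 2
READABLE_SCHEMAS = {1, 2}


def classify_schema_version(version: Optional[int]) -> str:
    """Return 'current' or 'legacy', or raise if the version is unusable."""
    if version is None:
        # The schema version must be pinned in the file, never inferred.
        raise ValueError("schema_version is missing from the manifest")
    if version == CURRENT_SCHEMA:
        return "current"
    if version in READABLE_SCHEMAS:
        return "legacy"
    raise ValueError(f"unsupported schema_version: {version!r}")
```

Failing loudly on a missing or unknown version is what keeps drift from going unnoticed: a manifest with a different shape has to declare it.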

Manifest completeness is not the same thing as calculator support.

Use the public calculator coverage matrix to state which families are actually implemented, which are gap-recorded, and which have validation caveats. The manifest should point back to that matrix so readers can move between source provenance and executable coverage without guessing.