Energy-Flow Cosmology (EFC) — Evaluation Ledger

Morten Magnusson · Symbiose Research, Sandnes, Norway · ORCID: 0009-0002-4860-5095 · April 2026 · CC-BY-4.0

Status: This ledger closes the global closure gap identified in the 2026-04-23 system audit: per-test outcomes are aggregated into a single quantitative verdict via frozen thresholds and a documented score function. Verdicts are computed, not paraphrased. Falsification status and model preference are reported as separate fields, because their conflation was an explicit audit finding. Scope today: thresholds are frozen, the aggregation formula is defined and the weight registry is seeded with defaults; live current_state computation starts once the Likelihood Ledger pipelines execute.


1. Purpose

The Validation Ledger records per-test results. The Likelihood Ledger records how those results were computed. This ledger computes the global verdict: how results combine, where the falsification line is, and what the current state of the framework is.

Without this layer, the system can produce a wall of test outcomes without ever answering: is EFC currently falsified, in tension, or acceptable — and how does it compare overall against ΛCDM?


2. Per-test verdict thresholds

Applied, before aggregation, to every test that has a chi2 or delta_chi2 entry in the Likelihood Ledger.

Verdict   Criterion               Interpretation
PASS      |Δχ²| < 1               Indistinguishable from reference at current precision
TENSION   1 ≤ |Δχ²| < 9 (~1–3σ)   Notable deviation; not decisive
FAIL      |Δχ²| ≥ 9 (>3σ)         Decisive deviation against expectation

Validation Ledger tests that carry the vocabulary value COLLAPSED map to FAIL; MARGINAL maps to TENSION when |Δχ²| is unavailable.
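A minimal Python sketch of the per-test mapping follows. The function name per_test_verdict and its arguments are hypothetical; it assumes Δχ² arrives as a plain float (or None when unavailable), and it reads the COLLAPSED rule as unconditional, which is one possible interpretation of the sentence above.

```python
from typing import Optional


def per_test_verdict(delta_chi2: Optional[float], vocab: Optional[str] = None) -> Optional[str]:
    """Map one test to PASS / TENSION / FAIL using the frozen thresholds."""
    # Assumption: COLLAPSED maps to FAIL regardless of any Δχ² value.
    if vocab == "COLLAPSED":
        return "FAIL"
    if delta_chi2 is None:
        # MARGINAL maps to TENSION only when |Δχ²| is unavailable;
        # otherwise there is nothing to classify (returned as None).
        return "TENSION" if vocab == "MARGINAL" else None
    d = abs(delta_chi2)
    if d < 1.0:      # |Δχ²| < 1
        return "PASS"
    if d < 9.0:      # 1 ≤ |Δχ²| < 9
        return "TENSION"
    return "FAIL"    # |Δχ²| ≥ 9
```

The boundaries are half-open exactly as in the table: a Δχ² of exactly 1 is TENSION and exactly 9 is FAIL.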


3. Global score function

Frequentist form (always computed):

EFC_score = Σᵢ  wᵢ · Δχ²ᵢ

The sum runs over inference-grade tests only, i.e. those with a Likelihood Ledger row in status executed or frozen. Weights wᵢ are declared in the weight registry (default 1.0).
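The frequentist sum can be sketched in Python as below. The row keys test_id, status and delta_chi2 are assumed names, not the Likelihood Ledger's actual schema, and the function name efc_score is hypothetical. It returns None when no inference-grade row exists, in line with "no pipeline, no score".

```python
from typing import Iterable, Mapping, Optional

INFERENCE_GRADE = {"executed", "frozen"}


def efc_score(rows: Iterable[Mapping], weights: Mapping[str, float]) -> Optional[float]:
    """EFC_score = sum_i w_i * delta_chi2_i over inference-grade rows."""
    total = 0.0
    used = 0
    for row in rows:
        # Skip rows whose pipeline has not executed or been frozen.
        if row.get("status") not in INFERENCE_GRADE:
            continue
        dchi2 = row.get("delta_chi2")
        if dchi2 is None:
            continue
        # Unlisted tests fall back to the default weight of 1.0.
        w = weights.get(row["test_id"], 1.0)
        total += w * dchi2
        used += 1
    return total if used else None
```

For example, rows with Δχ² = 2.0 (weight 0.5) and Δχ² = −0.5 (default weight) give 0.5 · 2.0 + 1.0 · (−0.5) = 0.5.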

Bayesian form (when evidence is available):

ln K_total = Σᵢ  ln Kᵢ      (log-Bayes factor, EFC vs reference, per test)

ln K_total is reported alongside the frequentist score whenever the Likelihood Ledger rows include evidence in their outputs.
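Summing per-test log-Bayes factors is the same as multiplying the Bayes factors, which is only valid when the tests' datasets are independent (the covariance question in §9 applies here as well). A Python sketch, assuming each row may carry a hypothetical ln_k field alongside its status:

```python
import math
from typing import Iterable, Mapping, Optional

INFERENCE_GRADE = {"executed", "frozen"}


def ln_k_total(rows: Iterable[Mapping]) -> Optional[float]:
    """Combine per-test log-Bayes factors (EFC vs reference) by summation."""
    # Only inference-grade rows that actually report evidence contribute.
    ln_ks = [
        row["ln_k"]
        for row in rows
        if row.get("status") in INFERENCE_GRADE and row.get("ln_k") is not None
    ]
    # math.fsum limits round-off; None signals that no evidence is available.
    return math.fsum(ln_ks) if ln_ks else None
```

K_total itself is math.exp of the result; keeping the aggregate in log space avoids overflow or underflow when many tests are combined.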


4. System-level falsification rule

EFC is falsified at the system level if either:

  1. Any KC-tagged kill-criterion test returns FAIL, or
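  2. The combined Bayes factor K_total of EFC against ΛCDM (equal to the posterior odds under even prior odds) falls below 1 : 100, i.e. ln K_total < −ln 100 ≈ −4.6, which is decisive evidence against EFC on Jeffreys' scale.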

A non-kill-criterion test returning FAIL produces TENSION at the system level — flagged for review but not falsifying. The kill-criterion registry is pulled from validation-ledger/data/grav.json and mirrored into evaluation-ledger/data/evaluation.json["thresholds"]["kill_criterion_ids"].
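The two-branch rule can be expressed as a small Python function. The names are hypothetical, the kill-criterion set would come from the registry mirrored into evaluation.json, and using a strict inequality at ln K_total = −ln 100 is an assumption. Per-test TENSION results are deliberately not escalated here, since the rule above only speaks to FAIL outcomes.

```python
import math
from typing import AbstractSet, Mapping, Optional

# Jeffreys' "decisive" boundary: K_total < 1/100, i.e. ln K_total < -ln(100) ≈ -4.61.
DECISIVE_LN_K = -math.log(100.0)


def falsification_status(
    verdicts: Mapping[str, str],
    kill_criterion_ids: AbstractSet[str],
    ln_k_total: Optional[float],
) -> str:
    """Apply the system-level falsification rule to per-test verdicts."""
    fails = {test_id for test_id, v in verdicts.items() if v == "FAIL"}
    # Branch 1: any kill-criterion test returning FAIL falsifies the system.
    if fails.intersection(kill_criterion_ids):
        return "falsified"
    # Branch 2: decisive combined evidence against EFC (only if evidence exists).
    if ln_k_total is not None and ln_k_total < DECISIVE_LN_K:
        return "falsified"
    # A FAIL on a non-kill-criterion test is flagged as system-level tension.
    if fails:
        return "tension"
    return "non-falsified"
```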


5. Falsification status ≠ model preference

The 2026-04-23 audit flagged an explicit gap: "non-rejectable" and "best model" were conflated. This ledger reports them as separate fields.

Field                  Values                                                    Source
falsification_status   non-falsified · tension · falsified                       Per rule in §4
model_preference       prefers_efc · neutral · prefers_lcdm · insufficient_data  Per ΔAIC / ln K from Model Comparison

Both are reported every release. "ΛCDM is currently preferred" and "EFC is currently not falsified" can both be true simultaneously — this ledger refuses to paper over that.
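One way to keep the two fields from collapsing into each other is to compute them in separate functions and store them side by side in a frozen record. In this Python sketch, model_preference looks only at ln K (positive favours EFC), and the cut-off PREFERENCE_LN_K = 1.0 is a hypothetical placeholder rather than a threshold this ledger defines; the Model Comparison criteria (ΔAIC or ln K) would supply the real values.

```python
from dataclasses import dataclass
from typing import Optional

# Hypothetical cut-off for illustration only; not a frozen ledger threshold.
PREFERENCE_LN_K = 1.0

FALSIFICATION_VALUES = {"non-falsified", "tension", "falsified"}
PREFERENCE_VALUES = {"prefers_efc", "neutral", "prefers_lcdm", "insufficient_data"}


def model_preference(ln_k: Optional[float]) -> str:
    """Derive model preference from ln K (EFC vs ΛCDM), independently of falsification."""
    if ln_k is None:
        return "insufficient_data"
    if ln_k >= PREFERENCE_LN_K:
        return "prefers_efc"
    if ln_k <= -PREFERENCE_LN_K:
        return "prefers_lcdm"
    return "neutral"


@dataclass(frozen=True)
class SystemVerdict:
    """Both fields are always present; neither is derived from the other."""
    falsification_status: str
    model_preference: str

    def __post_init__(self) -> None:
        # Reject values outside the vocabularies listed in the field table.
        if self.falsification_status not in FALSIFICATION_VALUES:
            raise ValueError(f"bad falsification_status: {self.falsification_status}")
        if self.model_preference not in PREFERENCE_VALUES:
            raise ValueError(f"bad model_preference: {self.model_preference}")
```

SystemVerdict("non-falsified", model_preference(-2.0)) records "EFC is not falsified" and "ΛCDM is preferred" in the same release without either overriding the other.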


6. Weight registry

Non-unit weights must carry a rationale. Default is weight = 1.0.

{
  "test_id": "s8_growth_fsigma8_2026",
  "weight": 1.0,
  "rationale": "Primary observable for μ < 1 prediction"
}

Down-weighting without a rationale string is rejected by the sync script. This blocks silent suppression of inconvenient tensions.
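The rationale check could look like the following Python sketch. validate_weights is a hypothetical stand-in for the sync script, whose actual code is not shown in this ledger; it enforces the rule that every non-unit weight carries a non-empty rationale and returns a plain test_id → weight map.

```python
from typing import Dict, Mapping, Sequence


def validate_weights(entries: Sequence[Mapping]) -> Dict[str, float]:
    """Reject non-unit weights without a rationale; return test_id -> weight."""
    weights: Dict[str, float] = {}
    for entry in entries:
        test_id = entry["test_id"]
        weight = float(entry.get("weight", 1.0))
        # Whitespace-only strings do not count as a rationale.
        rationale = (entry.get("rationale") or "").strip()
        if weight != 1.0 and not rationale:
            raise ValueError(f"{test_id}: non-unit weight {weight} requires a rationale")
        weights[test_id] = weight
    return weights
```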


7. Atlas linkage (read-only)

The Atlas defines the model space. This ledger reads from the Atlas via data/atlas-link.json in order to project per-test posteriors back onto Atlas viability regions for reporting.

Field             Value / meaning
atlas_ref         Pointer to the Atlas resource being consumed
consumed_fields   e.g. allowed_region, excluded_region
modifies_atlas    false (schema-enforced constant)

This ledger does not write to the Atlas. Downstream projection (posteriors → Atlas regions) is a separate step; Atlas remains the source of truth for its own content.
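A reader that honours the read-only contract might validate atlas-link.json before using it, as in this Python sketch. The function name and the strictness of the checks are assumptions; only the three field names come from the table above.

```python
import json
from types import MappingProxyType
from typing import Mapping


def load_atlas_link(text: str) -> Mapping:
    """Parse atlas-link.json and enforce the read-only linkage."""
    link = json.loads(text)
    # modifies_atlas must be present and exactly false (not merely falsy).
    if link.get("modifies_atlas") is not False:
        raise ValueError("atlas-link.json must pin modifies_atlas to false")
    for key in ("atlas_ref", "consumed_fields"):
        if key not in link:
            raise ValueError(f"atlas-link.json is missing {key}")
    # Top-level read-only view; nested lists remain mutable.
    return MappingProxyType(link)
```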


8. Reporting block (per release)

Every generation of the ledger publishes a current_state block:

{
  "falsification_status": "non-falsified | tension | falsified",
  "model_preference":     "prefers_efc | neutral | prefers_lcdm | insufficient_data",
  "efc_score":            <number or null>,
  "ln_k_total":           <number or null>,
  "tension_tests":        ["test_id", ...],
  "fail_tests":           ["test_id", ...]
}
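A concrete current_state block could be assembled from the per-test verdicts and the aggregate numbers, as in this Python sketch. The function names current_state and render are hypothetical. Python's None serializes to JSON null, which covers the <number or null> slots, and sorting the test lists keeps consecutive releases easy to diff.

```python
import json
from typing import Dict, Mapping, Optional


def current_state(
    verdicts: Mapping[str, str],
    falsification_status: str,
    model_preference: str,
    efc_score: Optional[float],
    ln_k_total: Optional[float],
) -> Dict:
    """Build the per-release reporting block in the documented key order."""
    return {
        "falsification_status": falsification_status,
        "model_preference": model_preference,
        "efc_score": efc_score,
        "ln_k_total": ln_k_total,
        "tension_tests": sorted(t for t, v in verdicts.items() if v == "TENSION"),
        "fail_tests": sorted(t for t, v in verdicts.items() if v == "FAIL"),
    }


def render(state: Mapping) -> str:
    """Serialize the block; ensure_ascii=False keeps Greek symbols readable."""
    return json.dumps(state, indent=2, ensure_ascii=False)
```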

9. Open questions

  1. Default weighting — uniform (wᵢ=1) or proportional to data volume? Current: uniform with opt-in rationale-backed overrides.
  2. Correlated tests — should the sum be covariance-aware when observables overlap? Planned: covariance-aware variant for related observables.
  3. Kill-criterion registry — hard-coded here or pulled from grav.json? Proposal: pulled from grav.json (single source of truth).

Bottom line

What this ledger fixes   Turns per-test results into a single quantitative system verdict with frozen thresholds; no more semantic verdicts.
Hard rule                Falsification status and model preference are reported separately, always. Conflation is blocked by schema.
Hard dependency          Likelihood Ledger rows in status executed or frozen. No pipeline, no score.
Atlas safety             Read-only linkage via atlas-link.json; modifies_atlas pinned to false in schema.
Status                   Thresholds and formulas frozen; weight registry seeded; current_state computation blocked until the first Likelihood Ledger pipeline executes.

© 2026 Energy-Flow Cosmology Initiative · Evaluation Ledger — frozen thresholds, documented aggregation, separated verdicts