Energy-Flow Cosmology (EFC) — Evaluation Ledger


Morten Magnusson · Symbiose Research, Sandnes, Norway · ORCID: 0009-0002-4860-5095 · June 2026 · CC-BY-4.0

RCMP framing. EFC is read by regime (L0–L3), not by the background: ΛCDM is the special-case limit of EFC in L0/L1, derived from the variational action (K(ρ)→∞ ⇒ μ,Σ,η→1), so recovering ΛCDM there is a designed feature, not a contest. EFC-distinctive physics lives in L2 (perturbation/growth) and L3 (galactic). falsification_status ≠ model_preference.

Status: This ledger closes the global-closure gap identified in the 2026-04-23 system audit: per-test outcomes are aggregated into a single quantitative verdict via frozen thresholds and a documented score function. Verdicts are computed, not paraphrased. Falsification status and model preference are reported as separate fields, because their conflation was an explicit audit finding. Scope today: thresholds are frozen, the aggregation formula is defined, and the weight registry is seeded with defaults; live current_state computation begins once the Likelihood Ledger pipelines execute.

1. Purpose

The Validation Ledger records per-test results. The Likelihood Ledger records how those results were computed. This ledger computes the global verdict: how results combine, where the falsification line is, and what the current state of the framework is.

Without this layer, the system can produce a wall of test outcomes without ever answering: is EFC currently falsified, in tension, or acceptable — and how does it compare overall against ΛCDM?

2. Per-test verdict thresholds

Applied to every test with a chi2 or delta_chi2 in the Likelihood Ledger before aggregation.

Verdict	Criterion	Interpretation
PASS	|Δχ²| < 1	Indistinguishable from reference at current precision
TENSION	1 ≤ |Δχ²| < 9 (~1–3σ)	Notable deviation; not decisive
FAIL	|Δχ²| ≥ 9 (>3σ)	Decisive deviation against expectation

Validation Ledger tests that carry the vocabulary value COLLAPSED map to FAIL; MARGINAL maps to TENSION when |Δχ²| is unavailable.
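
The threshold table and the vocabulary mapping above can be sketched as a small classifier. This is a minimal illustration, not the ledger's actual code: the function name `per_test_verdict` and its arguments are hypothetical, and it assumes that COLLAPSED always maps to FAIL while MARGINAL is used only as a fallback when no |Δχ²| is available.

```python
from typing import Optional

PASS_BELOW = 1.0  # |Δχ²| < 1  -> PASS
FAIL_FROM = 9.0   # |Δχ²| >= 9 -> FAIL


def per_test_verdict(delta_chi2: Optional[float] = None,
                     vocab: Optional[str] = None) -> Optional[str]:
    """Return PASS / TENSION / FAIL, or None if nothing can be decided.

    `vocab` is the Validation Ledger vocabulary value, if any (assumed input).
    """
    # Assumption: COLLAPSED overrides any numeric value and maps to FAIL.
    if vocab == "COLLAPSED":
        return "FAIL"
    if delta_chi2 is None:
        # MARGINAL is only a fallback when |Δχ²| is unavailable.
        return "TENSION" if vocab == "MARGINAL" else None
    magnitude = abs(delta_chi2)
    if magnitude < PASS_BELOW:
        return "PASS"
    if magnitude < FAIL_FROM:
        return "TENSION"
    return "FAIL"
```

The boundaries are half-open, so a test with |Δχ²| exactly 1 lands in TENSION and one with exactly 9 lands in FAIL, matching the criteria in the table.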

3. Global score function

Frequentist form (always computed):

EFC_score = Σᵢ  wᵢ · Δχ²ᵢ

Sum runs over inference-grade tests only — those with a Likelihood Ledger row in status executed or frozen. Weights wᵢ declared in the weight registry (default 1.0).

Bayesian form (when evidence is available):

ln K_total = Σᵢ  ln Kᵢ      (log-Bayes factor, EFC vs reference, per test)

Reported alongside the frequentist score whenever the Likelihood Ledger row includes evidence in its outputs.
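
A minimal sketch of both sums, under stated assumptions: the row fields `test_id`, `status`, `delta_chi2` and `ln_k` are hypothetical names for what a Likelihood Ledger row might expose, and a score is returned as None when no inference-grade row contributes to it.

```python
from typing import Iterable, Mapping, Optional, Tuple

# Only rows in these statuses count as inference-grade.
INFERENCE_GRADE = {"executed", "frozen"}


def global_scores(rows: Iterable[Mapping],
                  weights: Mapping[str, float]
                  ) -> Tuple[Optional[float], Optional[float]]:
    """Return (EFC_score, ln K_total) over inference-grade rows."""
    efc_score, n_chi2 = 0.0, 0
    ln_k_total, n_evidence = 0.0, 0
    for row in rows:
        if row.get("status") not in INFERENCE_GRADE:
            continue  # skipped: not executed or frozen
        delta_chi2 = row.get("delta_chi2")
        if delta_chi2 is not None:
            # Weight defaults to 1.0 when the registry has no entry.
            efc_score += weights.get(row["test_id"], 1.0) * delta_chi2
            n_chi2 += 1
        ln_k = row.get("ln_k")
        if ln_k is not None:
            ln_k_total += ln_k
            n_evidence += 1
    return (efc_score if n_chi2 else None,
            ln_k_total if n_evidence else None)


# Illustrative rows only; the values are made up for the example.
rows = [
    {"test_id": "a", "status": "executed", "delta_chi2": 2.0, "ln_k": -0.5},
    {"test_id": "b", "status": "frozen", "delta_chi2": -1.0},
    {"test_id": "c", "status": "planned", "delta_chi2": 50.0},
]
efc_score, ln_k_total = global_scores(rows, {"b": 0.5})
```

In this example test `c` is ignored because its status is not inference-grade, so the frequentist score is 2.0 + 0.5 · (−1.0) = 1.5 and ln K_total is −0.5.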

4. System-level falsification rule

EFC is falsified at the system level if either:

Any KC-tagged kill-criterion test returns FAIL, or
The global odds K_total of EFC against ΛCDM are worse than 1 : 100, i.e. ln K_total < −ln 100 ≈ −4.6 with K defined as in §3 (decisive against EFC on Jeffreys' scale).

A non-kill-criterion test returning FAIL produces TENSION at the system level — flagged for review but not falsifying. The kill-criterion registry is pulled from validation-ledger/data/grav.json and mirrored into evaluation-ledger/data/evaluation.json["thresholds"]["kill_criterion_ids"].
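
The rule in this section can be sketched as a single function. This is a hedged illustration: the function and argument names are hypothetical, and it assumes ln K_total is the EFC-versus-reference log-Bayes factor from §3, so that odds of 1 : 100 against EFC correspond to ln K_total < −ln 100. How per-test TENSION verdicts affect the system level is not stated in this section, so the sketch leaves them out.

```python
import math
from typing import Collection, Mapping, Optional

# 1 : 100 odds against EFC, expressed as a log-Bayes factor (about -4.605).
LN_K_DECISIVE_AGAINST = -math.log(100.0)


def falsification_status(verdicts: Mapping[str, str],
                         kill_criterion_ids: Collection[str],
                         ln_k_total: Optional[float] = None) -> str:
    """Map per-test verdicts and ln K_total to a system-level status."""
    failed = {tid for tid, v in verdicts.items() if v == "FAIL"}
    kc_failed = any(tid in kill_criterion_ids for tid in failed)
    bayes_decisive = (ln_k_total is not None
                      and ln_k_total < LN_K_DECISIVE_AGAINST)
    if kc_failed or bayes_decisive:
        return "falsified"
    if failed:
        # A FAIL on a non-kill-criterion test is flagged, not falsifying.
        return "tension"
    return "non-falsified"
```

Keeping the kill-criterion IDs as an argument, rather than a module constant, mirrors the text's choice to pull the registry from a data file instead of hard-coding it.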

5. Falsification status ≠ model preference

An explicit gap in the 2026-04-23 audit: "non-rejectable" and "best model" were conflated. This ledger reports both as separate fields.

Field	Values	Source
`falsification_status`	`non-falsified` · `tension` · `falsified`	Per rule in §4
`model_preference`	`prefers_efc` · `neutral` · `prefers_lcdm` · `insufficient_data`	Per ΔAIC / ln K from Model Comparison

Both are reported every release. "ΛCDM is currently preferred" and "EFC is currently not falsified" can both be true simultaneously — this ledger refuses to paper over that.

6. Weight registry

Non-unit weights must carry a rationale. Default is weight = 1.0.

{
  "test_id": "s8_growth_fsigma8_2026",
  "weight": 1.0,
  "rationale": "Primary observable for μ < 1 prediction"
}

Down-weighting without a rationale string is rejected by the sync script. This blocks silent suppression of inconvenient tensions.
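
A check of this kind might look like the sketch below. It is a hypothetical stand-in, not the actual sync script: the function name `validate_weight_entry` is invented for illustration, and it applies the stricter reading that any non-unit weight (up or down) needs a non-empty rationale.

```python
from typing import List, Mapping


def validate_weight_entry(entry: Mapping) -> List[str]:
    """Return a list of problems with one registry entry (empty if valid)."""
    errors: List[str] = []
    weight = entry.get("weight", 1.0)  # default weight is 1.0
    rationale = entry.get("rationale", "")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if isinstance(weight, bool) or not isinstance(weight, (int, float)):
        errors.append("weight must be a number")
    elif weight != 1.0:
        if not (isinstance(rationale, str) and rationale.strip()):
            errors.append("non-unit weight requires a rationale string")
    return errors
```

An entry such as `{"test_id": "x", "weight": 0.5}` would come back with one error, while the example entry above (weight 1.0 with a rationale) passes.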

7. Atlas linkage (read-only)

The Atlas defines the model space. This ledger reads from Atlas via data/atlas-link.json to project per-test posteriors back into Atlas viability regions for reporting.

Field	Value / meaning
`atlas_ref`	Pointer to the Atlas resource being consumed
`consumed_fields`	e.g. `allowed_region`, `excluded_region`
`modifies_atlas`	`false` (schema-enforced constant)

This ledger does not write to the Atlas. Downstream projection (posteriors → Atlas regions) is a separate step; Atlas remains the source of truth for its own content.

8. Reporting block (per release)

Every generation of the ledger publishes a current_state block:

{
  "falsification_status": "non-falsified | tension | falsified",
  "model_preference":     "prefers_efc | neutral | prefers_lcdm | insufficient_data",
  "efc_score":            <number or null>,
  "ln_k_total":           <number or null>,
  "tension_tests":        ["test_id", ...],
  "fail_tests":           ["test_id", ...]
}
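
As a rough sketch, the block could be assembled from per-test verdicts as follows. The function name and arguments are hypothetical; the field names and allowed values are taken from the template above. `model_preference` is passed in rather than derived, since the ΔAIC / ln K cut-offs are not specified in this ledger.

```python
import json
from typing import Mapping, Optional

FALSIFICATION_VALUES = {"non-falsified", "tension", "falsified"}
PREFERENCE_VALUES = {"prefers_efc", "neutral", "prefers_lcdm",
                     "insufficient_data"}


def build_current_state(verdicts: Mapping[str, str],
                        falsification_status: str,
                        model_preference: str = "insufficient_data",
                        efc_score: Optional[float] = None,
                        ln_k_total: Optional[float] = None) -> dict:
    """Assemble the per-release current_state block."""
    if falsification_status not in FALSIFICATION_VALUES:
        raise ValueError(f"bad falsification_status: {falsification_status}")
    if model_preference not in PREFERENCE_VALUES:
        raise ValueError(f"bad model_preference: {model_preference}")
    return {
        "falsification_status": falsification_status,
        "model_preference": model_preference,
        "efc_score": efc_score,    # None serializes to JSON null
        "ln_k_total": ln_k_total,  # None serializes to JSON null
        "tension_tests": sorted(t for t, v in verdicts.items()
                                if v == "TENSION"),
        "fail_tests": sorted(t for t, v in verdicts.items() if v == "FAIL"),
    }


state = build_current_state({"b": "TENSION", "a": "TENSION", "c": "PASS"},
                            "non-falsified")
print(json.dumps(state, indent=2))
```

Because the two status fields are validated independently, a release can report `non-falsified` together with `prefers_lcdm` without either value constraining the other.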

9. Open questions

Default weighting — uniform (wᵢ=1) or proportional to data volume? Current: uniform with opt-in rationale-backed overrides.
Correlated tests — should the sum be covariance-aware when observables overlap? Planned: covariance-aware variant for related observables.
Kill-criterion registry — hard-coded here or pulled from grav.json? Proposal: pulled from grav.json (single source of truth).

Bottom line

What this ledger fixes	Turns per-test results into a single quantitative system verdict with frozen thresholds; verdicts are computed, not paraphrased.
Hard rule	Falsification status and model preference are reported separately, always. Conflation is blocked by schema.
Hard dependency	Likelihood Ledger rows in status `executed` or `frozen`. No pipeline, no score.
Atlas safety	Read-only linkage via `atlas-link.json`; `modifies_atlas` pinned to `false` in schema.
Status	Thresholds and formulas frozen; weight registry seeded; `current_state` computation blocked until the first Likelihood Ledger pipeline executes.