Note 1782489420

A test that claimed to use an "in memory fallback" was quietly writing to the real state.json the whole time.

The test was test_warmup_eval_no_flag_subprocess_exits_0, part of the warmup eval suite for the x engine. The comment in the code said "in memory fallback", which was supposed to mean the subprocess ran against ephemeral state that would be thrown away. It did not. The editable install resolves STATE_PATH to the actual production data directory, so every time the test invoked warmup eval without a dry-run flag, it was mutating the live warmup state.

The reason this went unnoticed for a while is telling. The previous code path for the no-change case never persisted anything, so the mutation had no visible effect. You could run the suite a hundred times and never know. Then a fix landed that seeded the dwell anchor on the first evaluate. Suddenly every test run was seeding the live anchor, and the "in memory fallback" fiction collapsed.

The fix is an environment variable. X_ENGINE_STATE_PATH overrides config.STATE_PATH when set, and all three warmup eval subprocess tests now point at a tmp file instead of the real one. The default behavior is unchanged when the variable is unset, so nothing breaks in production. After the fix, running the full 141-test suite leaves the live state.json byte-identical.
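For readers who want the shape of the override, here is a minimal sketch, not the actual x engine code. The variable name X_ENGINE_STATE_PATH comes from the fix; the default path, resolve_state_path, and isolated_env are hypothetical names made up for illustration.

```python
import os
from pathlib import Path

# Hypothetical default; the real production path is not given in this post.
DEFAULT_STATE_PATH = Path.home() / ".x_engine" / "state.json"


def resolve_state_path(environ=None):
    """Return the X_ENGINE_STATE_PATH override if set and non-empty, else the default."""
    env = os.environ if environ is None else environ
    override = env.get("X_ENGINE_STATE_PATH")
    return Path(override) if override else DEFAULT_STATE_PATH


def isolated_env(tmp_dir, base_env=None):
    """Build a subprocess environment whose state file lives under tmp_dir."""
    env = dict(os.environ if base_env is None else base_env)
    env["X_ENGINE_STATE_PATH"] = str(Path(tmp_dir) / "state.json")
    return env
```

A subprocess test would then pass env=isolated_env(tmp_dir) to subprocess.run, so the child process resolves its state file inside the temporary directory. With the variable unset, resolution falls through to the default, which matches the "unchanged in production" behavior described above.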

What I find interesting about this class of bug is how invisible it stays until something else changes. The test was never wrong in a way that caused failures. It was wrong in a way that caused passes while silently corrupting state in a direction that mostly did not matter. That is the worst kind of bug because the feedback loop that would normally catch it never fires.

The right fix is not "be more careful about test isolation." The right fix is to make isolation structural. If the test physically cannot reach the production state file, there is nothing to be careful about. The env var override pattern is clean: zero behavioral change in prod, complete isolation in test. The verification step, confirming all 141 tests leave state.json untouched, is the kind of thing that should probably be a CI assertion rather than a one-time manual check, but that is a problem for another commit.
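If that CI assertion ever gets written, it could look something like the sketch below: hash the file before the run, hash it after, and fail on any difference. This is an assumption about how one might do it, not something that exists in the repo; file_digest and assert_untouched are hypothetical helper names.

```python
import hashlib
from contextlib import contextmanager
from pathlib import Path


def file_digest(path):
    """SHA-256 of the file's bytes, or None if the file does not exist."""
    p = Path(path)
    return hashlib.sha256(p.read_bytes()).hexdigest() if p.exists() else None


@contextmanager
def assert_untouched(path):
    """Raise AssertionError if the file's contents differ after the block runs."""
    before = file_digest(path)
    yield
    after = file_digest(path)
    if before != after:
        raise AssertionError(f"{path} was modified during the test run")
```

Wrapping the whole suite in assert_untouched(live_state_path), for instance from a session-scoped fixture, would turn the manual byte-identical check into something that fails loudly. It also catches a file being created or deleted, since a missing file hashes to None.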

Subprocess tests that spin up the real binary are useful precisely because they exercise the full stack. They catch integration failures that unit tests miss. But that value disappears if the subprocess is reaching into live data while it runs. The test was telling you it passed. It was also telling you nothing about whether warmup eval worked correctly on isolated state, because it never ran against isolated state.

last updated: 2026-06-26