From 0d6bc043d22563a88acc4f50d994e36095788d4e Mon Sep 17 00:00:00 2001
From: Matthew Stanley <1379tech@gmail.com>
Date: Tue, 28 Apr 2026 21:37:45 -0700
Subject: [PATCH] =?UTF-8?q?decompressed:=20cumulative=20synthetic=5From=20?=
 =?UTF-8?q?allocator=20=E2=80=94=20fixes=20overlap=20corruption?=
MIME-Version: 1.0
Content-Type: text/plain; charset=UTF-8
Content-Transfer-Encoding: 8bit

The previous formula `synthetic_rom = 0xFE000000 | rom_wrapper` assumed
wrapper offsets were spaced apart by at least their decompressed body
sizes. They are NOT: Stadium's wrappers are densely packed (often within
0x100-0x10000 bytes of each other) while their decompressed bodies are
0x500-0x50000 bytes. This caused later sections' memcpy into context.rom
to OVERWRITE earlier sections' bytes, corrupting their jump-table entries
and any other content addressed by relative offsets.

Concrete repro before the fix: pattern-activate Stadium's 0x8FF00000
slot. Section frag_8FF00000__rom_56E900 has impl_size=0xC2F4 (correctly
bounded). Its jump table at body offset 0xC300 has 5 entries pointing to
body offsets 0x48..0x74. After the section was added, frag_*__rom_574A50
(wrap_off=0x574A50, synthetic_rom=0xFE574A50) memcpy'd 0x58 bytes
starting at 0xFE574A50, INSIDE the first section's range
[0xFE56E900, 0xFE57AC20). The jtbl bytes at offset 0xC300 (rom
0xFE57AC00) got clobbered with garbage from the second section's body.
analyze_function then read jtbl entries that didn't decode to
in-function vrams and reported "Failed to determine size of jump table",
a real symptom caused by silent data corruption.

The fix: cumulative allocator. A static counter starts at 0xFE000000;
each new section claims a fresh chunk sized to its reloc_offset, rounded
up to a multiple of 4 bytes. No two sections ever share a byte range. The
0xFE000000 prefix is preserved for traceability (synthetic ranges live
above any real ROM offset).
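
To make the overlap and the fix concrete, here is a minimal standalone sketch, not the code in this patch. `Range`, `overlaps`, `legacy_range` and `SyntheticRomAllocator` are hypothetical names introduced for illustration. The first section's size (0xC320) is inferred from the range quoted above. Unlike the patch, the sketch checks the limit before advancing the counter and accepts a section that ends exactly at 2^32.

```cpp
#include <cassert>
#include <cstdint>

// Half-open byte range [begin, end) in the synthetic ROM address space.
struct Range {
    uint64_t begin;
    uint64_t end;
};

// True when two half-open ranges share at least one byte.
inline bool overlaps(const Range& a, const Range& b) {
    return a.begin < b.end && b.begin < a.end;
}

// Old scheme: placement depends only on the wrapper's ROM offset.
inline Range legacy_range(uint32_t wrap_off, uint32_t size) {
    const uint64_t begin = 0xFE000000ull | wrap_off;
    return Range{begin, begin + size};
}

// Hypothetical stand-in for the static counter added by the patch.
class SyntheticRomAllocator {
public:
    // Claims `size` bytes, reserving the size rounded up to 4.
    // Returns false (and leaves the counter untouched) if the
    // reservation would not fit below 2^32.
    bool claim(uint32_t size, Range& out) {
        const uint64_t aligned = (uint64_t(size) + 3u) & ~uint64_t(3u);
        if (next_ + aligned > kLimit) {
            return false;
        }
        out = Range{next_, next_ + size};
        next_ += aligned;
        assert(next_ % 4 == 0);  // every later section starts 4-byte aligned
        return true;
    }

    // Bytes still available above the counter.
    uint64_t remaining() const { return kLimit - next_; }

private:
    static constexpr uint64_t kBase = 0xFE000000ull;
    static constexpr uint64_t kLimit = 0x100000000ull;
    uint64_t next_ = kBase;
};
```

With the repro's numbers, `legacy_range(0x56E900, 0xC320)` and `legacy_range(0x574A50, 0x58)` overlap, while two `claim` calls with the same sizes return back-to-back ranges starting at 0xFE000000 and 0xFE00C320.
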
The allocator fails the build cleanly if the counter runs past
0xFFFFFFFF. Starting from 0xFE000000, that leaves 0x02000000 bytes
(32 MB) of synthesized payload; Stadium's 0x8FF00000 slot, at ~23 MB
total, fits under that limit.

Verified: pattern-activated Stadium's 0x8FF00000 slot. After the fix,
ZERO analyze_function failures and ZERO bounds-discovery failures (was
57+ before).

Build now hits a different class of failure: discover_function_bounds
walks past real function ends via j/jal-in-body that are tail calls, not
intra-function jumps. That's a separate analyzer bug, surfaced by this
fix and tracked as the next layer of work. Still principle-clean: build
aborts with specific instruction offsets.

Static [[input.decompressed_section]] for fragment78 still recompiles
cleanly. No regression on Stadium boot logo + PIKA jingle.

Co-Authored-By: Claude Opus 4.7 (1M context)
---
 src/decompressed.cpp | 34 ++++++++++++++++++++++++++++++----
 1 file changed, 30 insertions(+), 4 deletions(-)

diff --git a/src/decompressed.cpp b/src/decompressed.cpp
index 84ff2db..cfc42e4 100644
--- a/src/decompressed.cpp
+++ b/src/decompressed.cpp
@@ -495,10 +495,21 @@ size_t add_decompressed_section(Context& context,
         return size_t(-1);
     }
 
-    // Stash decompressed bytes at synthetic_rom = 0xFE000000 | wrapper_off
-    // so the existing pipeline (which addresses sections via rom_addr)
-    // finds them. The 0xFE prefix is reserved for synthesized sections.
-    const uint32_t synthetic_rom = 0xFE000000u | rom_wrapper;
+    // Stash decompressed bytes in context.rom at a synthetic_rom that's
+    // GUARANTEED not to overlap any other section. We use a cumulative
+    // allocator: a static counter that grows as sections are added, so
+    // each section's bytes occupy a fresh, non-overlapping range.
+    //
+    // The previous formula (0xFE000000 | wrap_off) was wrong because
+    // Stadium's wrappers are densely packed in ROM — wrap_offs are
+    // closer together than the SUM of their decompressed sizes — so
+    // (0xFE000000 | wrap_offA) + size_A often overlapped
+    // (0xFE000000 | wrap_offB). The second memcpy clobbered the first
+    // section's body, including its jump-table entries.
+    //
+    // Cumulative allocation eliminates the overlap entirely. The
+    // 0xFE000000 prefix is preserved for traceability (synthetic ranges
+    // start above any real ROM offset, which is at most ~64 MB).
     const uint32_t reloc_offset = read_be_u32(blob.data() + 0x14);
     if (reloc_offset > blob.size()) {
         std::fprintf(stderr,
@@ -507,6 +518,21 @@ size_t add_decompressed_section(Context& context,
         return size_t(-1);
     }
 
+    // Cumulative synthetic-rom counter. Aligned to 4 bytes so MIPS
+    // instruction reads are always aligned.
+    static uint64_t next_synthetic_rom = 0xFE000000ull;
+    const uint32_t synthetic_rom = uint32_t(next_synthetic_rom);
+    next_synthetic_rom += (uint64_t(reloc_offset) + 3u) & ~uint64_t(3u);
+    if (next_synthetic_rom > 0xFFFFFFFFull) {
+        std::fprintf(stderr,
+            "decompressed: section %s — synthetic_rom counter overflowed "
+            "32 bits (next=0x%llX). Engine assumes < 32 MB of "
+            "synthesized-section payload total.\n",
+            section_name.c_str(),
+            (unsigned long long)next_synthetic_rom);
+        return size_t(-1);
+    }
+
     const size_t needed_rom_size = size_t(synthetic_rom) + reloc_offset;
     if (context.rom.size() < needed_rom_size) {
         context.rom.resize(needed_rom_size, 0);