The previous formula `synthetic_rom = 0xFE000000 | rom_wrapper` assumed
wrapper offsets were spaced apart by at least their decompressed body
sizes. They are NOT — Stadium's wrappers are densely packed (often
within 0x100-0x10000 bytes of each other) while their decompressed
bodies are 0x500-0x50000 bytes. This caused later sections' memcpy
into context.rom to OVERWRITE earlier sections' bytes, corrupting
their jump-table entries and any other content addressed by
relative offsets.
Concrete repro before the fix: pattern-activate Stadium's 0x8FF00000
slot. Section frag_8FF00000__rom_56E900 has impl_size=0xC2F4 (correctly
bounded). Its jump table at body offset 0xC300 has 5 entries pointing
to body offsets 0x48..0x74. After the section was added, frag_*__rom_574A50
(wrap_off=0x574A50, synthetic_rom=0xFE574A50) memcpy'd 0x58 bytes
starting at 0xFE574A50 — INSIDE the first section's range
[0xFE56E900, 0xFE57AC20). The jtbl bytes at offset 0xC300 (rom 0xFE57AC00)
got clobbered with garbage from the second section's body. analyze_function
then read jtbl entries that didn't decode to in-function vrams and
reported "Failed to determine size of jump table" — a real symptom
caused by silent data corruption.
The fix: cumulative allocator. A static counter starts at 0xFE000000;
each new section claims a fresh, 4-byte-aligned chunk sized to its
reloc_offset. No two sections ever share a byte range. The 0xFE000000
prefix is preserved for traceability (synthetic ranges live above any
real ROM offset). Fails the build cleanly if the counter would pass
0x100000000 (32 MB of synthesized payload above 0xFE000000), which
Stadium's 0x8FF00000 slot at ~23 MB total fits within.
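
A minimal sketch of such a cumulative allocator, for illustration only. The type and member names (`SyntheticRomAllocator`, `claim`) are hypothetical and not taken from the engine source; the base and limit are the values stated above.

```cpp
#include <cassert>
#include <cstdint>
#include <optional>

// Hypothetical bump allocator for synthetic ROM ranges (names are assumptions).
struct SyntheticRomAllocator {
    static constexpr uint64_t kBase  = 0xFE000000ull;   // start of the synthetic range
    static constexpr uint64_t kLimit = 0x100000000ull;  // one past the last 32-bit address
    uint64_t next = kBase;

    // Claims `size` bytes, rounded up to a multiple of 4. Returns the start of
    // the claimed range, or std::nullopt if the range would cross kLimit; the
    // caller is expected to abort the build in that case.
    std::optional<uint32_t> claim(uint32_t size) {
        assert(next % 4 == 0);  // invariant: every claim starts 4-byte aligned
        const uint64_t aligned = (static_cast<uint64_t>(size) + 3u) & ~uint64_t{3};
        if (next + aligned > kLimit) {
            return std::nullopt;
        }
        const uint32_t start = static_cast<uint32_t>(next);
        next += aligned;
        return start;
    }
};
```

Because each claim starts where the previous one ended, two sections cannot overlap regardless of how closely their wrappers sit in the real ROM.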
Verified: pattern-activated Stadium's 0x8FF00000 slot. After the fix,
ZERO analyze_function failures and ZERO bounds-discovery failures
(was 57+ before). The build now hits a different class of failure:
discover_function_bounds walks past real function ends by following
j/jal instructions in the body that are tail calls, not intra-function
jumps. That's a separate analyzer bug, surfaced by this fix and tracked
as the next layer of work. Still principle-clean: the build aborts with
specific instruction offsets.
Static [[input.decompressed_section]] for fragment78 still
recompiles cleanly. No regression on Stadium boot logo + PIKA jingle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stadium's dynamic-asset slot at vram 0x8FF00000 contains a mix of
fragment shapes:
Code fragments: real MIPS function at +0x20 ending in jr $ra
(and possibly more functions). Stadium dispatches
the +0x00 J trampoline to invoke them.
Data fragments: pure data starting at +0x20 (tables of (tag,
pointer) records, animation curves, etc.). The
+0x00 J trampoline is a dormant placeholder that
Stadium NEVER actually calls. Stadium reads the
data directly via R_MIPS_32 pointers from elsewhere.
The previous code path attempted to recompile a function at +0x20
in EVERY synthesized section, which (a) was incorrect for data
fragments, and (b) reliably produced invalid C from data words
decoded as instructions.
Detection heuristic: scan the first 0x100 instructions of the body
for any jr $ra (encoded as 0x03E00008). If absent, the fragment is
data-only: register the section + R_MIPS_32 relocs but emit NO
FuncEntry rows. If Stadium ever does dispatch the +0x00 J for one
of these (which shouldn't happen), the runtime LOOKUP_FUNC reports
the miss loudly. That's the correct surface, NOT a stub.
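
The heuristic can be sketched roughly as follows. The function name `looks_data_only` and the choice to start scanning at +0x20 are assumptions made for illustration; only the 0x100-instruction window and the 0x03E00008 encoding come from the description above.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kJrRa = 0x03E00008u;  // MIPS encoding of `jr $ra`

// Reads one big-endian 32-bit word (N64 code is big-endian).
inline uint32_t read_be32(const std::vector<uint8_t>& bytes, size_t off) {
    return (uint32_t(bytes[off]) << 24) | (uint32_t(bytes[off + 1]) << 16) |
           (uint32_t(bytes[off + 2]) << 8) | uint32_t(bytes[off + 3]);
}

// Returns true when no `jr $ra` appears in the first `max_instrs` words
// starting at `start`. The start offset is an assumption of this sketch.
bool looks_data_only(const std::vector<uint8_t>& body,
                     size_t start = 0x20, size_t max_instrs = 0x100) {
    assert(start % 4 == 0);
    for (size_t i = 0; i < max_instrs; ++i) {
        const size_t off = start + i * 4;
        if (off + 4 > body.size()) break;  // body shorter than the window
        if (read_be32(body, off) == kJrRa) return false;
    }
    return true;
}
```

A data word that happens to equal 0x03E00008 would make a data fragment look like code, so this is a heuristic rather than a proof.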
Tested on Stadium's 0x8FF00000 slot via [[input.decompressed_section_pattern]]:
- 282 wrappers attempted
- 62 classified as data-only (registered without impl function)
- 220 attempted as code; first failure surfaces an analyze_function
jump-table sizing gap (separate issue, distinct from data-only
classification)
Static [[input.decompressed_section]] for fragment78 is unaffected
(still recompiles cleanly; boot logo + PIKA jingle still play).
The pattern stays inactive in Stadium's game.toml until the
analyze_function jtbl gap is addressed; build correctly refuses to
proceed if activated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a public N64Recomp::discover_function_bounds() in src/analysis.h
that performs a BFS-based control-flow walk of a function's body,
following:
- Conditional branches (target + fall-through)
- Unconditional j/jal targets when intra-body
- jr $ra returns (block ends after delay slot)
- jr-via-jump-table dispatches: the existing register-state
simulator from analyze_function detects the lui+addiu+addu+lw+jr
pattern and records the jtbl base; we then read entries out of
the body bytes and feed targets back into the BFS until
convergence.
Returns the function's byte size (max-reachable + 4 to cover the
delay slot of the last instruction). On failure, populates a specific
error message with the offending offset and reason — caller treats
this as a build error, NOT a graceful skip (per the project's
no-stubs principle).
Wires into decompressed.cpp's pattern path, replacing the prior
inline BFS that had a TODO for jump-table handling. The pattern
caller now propagates failures via `synthesize_decompressed_patterns`
returning false, which surfaces in main.cpp's exit_failure path.
Concrete behavior change: activating a pattern that includes a
fragment with computed jumps now produces a build error pointing at
the specific section name + offset + the analyzer's failure reason,
instead of silently producing a partial binary. Tested on Stadium's
0x8FF00000 slot — first failing wrapper is at ROM 0x8CC400 with an
indirect jr at offset 0x827C the simulator doesn't pattern-match.
The static [[input.decompressed_section]] path for fragment78 is
unaffected (still recompiles cleanly, no regression on boot logo +
PIKA jingle).
Future work surfaced by this change: the simulator's lui+addiu
+addu+lw+jr pattern doesn't cover every jump-table shape Stadium
uses. Each gap surfaces as a specific build-error offset; resolution
is to extend analyze_instruction to recognize the additional pattern
(or, when it's a true tail-call rather than a jtbl, distinguish
those at the jr site).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior "pattern-synthesized recompile failures are best-effort:
log + skip" path was a stub by another name — it produced binaries
where some fragment bodies silently didn't exist, and the failure
deferred to a runtime lookup-miss when Stadium tried to dispatch
into them. That violates the project's no-stubs principle.
Three changes here:
1. **Remove the soft-skip in main.cpp's recompile loops.** Recompile
failures revert to fatal `std::exit(EXIT_FAILURE)` regardless of
whether the section is pattern-synthesized. Build-time errors
surface; the user has to make a real choice about how to
resolve them.
2. **Replace the "scan to first jr ra" heuristic in decompressed.cpp
with a real BFS-based control-flow walker.** The walker:
- Starts at impl entry (+0x20).
- Follows conditional branches (target + fall-through).
- Follows j/jal targets when intra-function.
- Treats jr $ra as a return; ends the basic block.
- Returns max-reachable-offset + 4 as the function's true size.
For functions with computed jumps (jr <reg> not jr $ra — i.e.
jump-table dispatches), the walker reports a build-time error
with a specific offset and a list of options for the user
(declare via single-block form, or extend the walker to follow
jump-table targets). NOT a skip.
3. **Pattern-caller propagates synthesis failures as build aborts.**
`synthesize_decompressed_patterns` returns false when any section
fails to add, and main.cpp's exit_failure path runs.
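
A simplified sketch of the walker from item 2, under stated assumptions: it works on already-decoded 32-bit words, treats "intra-function" as "inside the body buffer", and covers only the common integer branches (coprocessor branches are omitted). `WalkResult` and `discover_bounds_sketch` are hypothetical names, not the engine's API.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <optional>
#include <queue>
#include <string>
#include <vector>

struct WalkResult {
    std::optional<size_t> size;  // bytes from entry to the end of the function
    std::string error;           // set when size is empty
};

// beq/bne/blez/bgtz, their "likely" forms, and the REGIMM bltz/bgez family.
static bool is_cond_branch(uint32_t w) {
    const uint32_t op = w >> 26;
    if ((op >= 4 && op <= 7) || (op >= 0x14 && op <= 0x17)) return true;
    if (op == 1) {
        const uint32_t rt = (w >> 16) & 0x1F;
        return rt <= 3 || (rt >= 0x10 && rt <= 0x13);
    }
    return false;
}

WalkResult discover_bounds_sketch(const std::vector<uint32_t>& words,
                                  uint32_t base_vram, size_t entry) {
    const size_t body_size = words.size() * 4;
    assert(entry % 4 == 0 && entry < body_size);
    auto fail = [](size_t off, const char* why) {
        return WalkResult{std::nullopt,
                          std::string(why) + " at offset " + std::to_string(off)};
    };
    std::vector<bool> visited(words.size(), false);
    std::queue<size_t> work;
    work.push(entry);
    size_t max_reached = entry;
    while (!work.empty()) {
        size_t off = work.front();
        work.pop();
        while (true) {
            if (off + 4 > body_size) return fail(off, "walked past end of body");
            if (visited[off / 4]) break;
            visited[off / 4] = true;
            if (off > max_reached) max_reached = off;
            const uint32_t w = words[off / 4];
            const uint32_t op = w >> 26;
            const bool is_jr = (op == 0) && ((w & 0x3F) == 0x08);
            if (is_jr || op == 2 || op == 3) {  // jr, j, jal
                if (off + 8 > body_size) return fail(off, "missing delay slot");
                if (is_jr && ((w >> 21) & 0x1F) != 31)
                    return fail(off, "computed jump (jr <reg>)");
                if (!is_jr) {  // follow j/jal targets that land inside the body
                    const uint32_t tgt = ((base_vram + uint32_t(off) + 4) & 0xF0000000u) |
                                         ((w & 0x03FFFFFFu) << 2);
                    if (tgt >= base_vram && tgt - base_vram < body_size)
                        work.push(tgt - base_vram);
                }
                if (op != 3) {  // jr and j end the block after the delay slot
                    if (off + 4 > max_reached) max_reached = off + 4;
                    break;
                }
            } else if (is_cond_branch(w)) {
                const int64_t imm = int64_t((w & 0xFFFFu) ^ 0x8000u) - 0x8000;
                const int64_t tgt = int64_t(off) + 4 + imm * 4;
                if (tgt < 0 || tgt >= int64_t(body_size))
                    return fail(off, "branch target outside body");
                work.push(size_t(tgt));  // fall-through continues below
            }
            off += 4;
        }
    }
    return WalkResult{max_reached + 4 - entry, ""};
}
```

The error string carries the offending offset so a caller can turn it into a build error rather than a skip.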
Net effect on Stadium today: the static [[input.decompressed_section]]
for fragment78 still recompiles cleanly (boot logo + PIKA jingle
unaffected). Activating the pattern would now fail loudly on the
first fragment with computed jumps, instead of silently shipping a
binary missing those bodies. That's the principle: build errors
surface, runtime stubs don't.
The "extend the walker to follow jump-table targets" work is
documented in the error message and is the next step if/when
pattern activation matters more than fragment78's single case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Section::content_hash, populates it on pattern-synthesized
sections with FNV-1a-64 of the first 0x100 bytes of the decompressed
body, and emits it into recomp_overlays.inl's SectionTableEntry. The
runtime side hashes the same window over the bytes Stadium loads at
fragment_ptr and looks up the matching section by hash.
Build-time and runtime use:
- SAME hash algorithm: FNV-1a-64
- SAME window: 0x100 bytes (95% uniqueness across Stadium's 282
distinct fragment bodies; falls back to first-candidate on the
residual ~5%)
- SAME byte source: pre-relocation decompressed bytes (link-time
form, before Stadium's R_MIPS_32 patches run)
Section table emit gains the .content_hash field; non-pattern sections
get hash=0, runtime-side condition `sec.content_hash != 0` filters
them out of the candidate set.
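
For reference, a sketch of FNV-1a-64 over a fixed window, using the standard 64-bit offset basis and prime. The helper name `content_hash_sketch` and the behavior for bodies shorter than the window (hash whatever is there) are assumptions, not taken from the engine source.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint64_t kFnvOffsetBasis = 0xcbf29ce484222325ull;
constexpr uint64_t kFnvPrime       = 0x100000001b3ull;

// Standard FNV-1a: XOR the byte in first, then multiply by the prime.
uint64_t fnv1a64(const uint8_t* data, size_t len) {
    uint64_t h = kFnvOffsetBasis;
    for (size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= kFnvPrime;  // wraps modulo 2^64 by design
    }
    return h;
}

// Hashes at most `window` leading bytes of the body (0x100 by default).
uint64_t content_hash_sketch(const std::vector<uint8_t>& body, size_t window = 0x100) {
    const size_t len = body.size() < window ? body.size() : window;
    return fnv1a64(body.data(), len);
}
```

Since 0 is used as the "no hash" marker for non-pattern sections, a real implementation might want to decide what happens if a body ever hashes to exactly 0; this sketch does not handle that case.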
Pairs with the runtime-side change in
lib/N64ModernRuntime/librecomp/src/overlays.cpp.
Activation in PokemonStadiumRecomp's game.toml is gated on a
follow-up: pattern-synthesized impl bodies currently get a basic
forward-CFG-walked size which produces invalid C for fragments with
internal jump tables (data interpreted as code). Future fix: emit
pattern-section impl bodies as runtime-dispatched stubs instead of
trying to statically recompile each body. Until then, fragment78
stays declared as a single static [[input.decompressed_section]];
the engine's pattern infrastructure is in place, ready to be flipped
on once the impl-body emit is reshaped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds [[input.decompressed_section_pattern]] for slots where many
fragments share a link vram (e.g. Stadium streams 279+ different
fragments through vram 0x8FF00000 across the game). Per-fragment
[[input.decompressed_section]] entries don't scale to that cardinality
and miss the runtime-swap dispatch problem entirely.
Engine pipeline:
1. Scan baserom.z64 for every Yay0 wrapper.
2. For each, decompress 0x40 bytes and check whether the prefix
matches the expected J <vram + 0x20> trampoline + FRAGMENT magic.
Wrappers in PERS-SZP form are detected by the -0x18 prefix.
3. For matches, fully decompress and FNV-1a-64 hash the body.
4. Deduplicate by content hash (Stadium has ~11 byte-identical
duplicates across its 279 wrappers).
5. Synthesize one Section per unique content. Section names
<base_name>__rom_<wrapper_offset>; functions become
func_<vram>__rom_<offset> via the existing collision-suffix
machinery (default for pattern-discovered sections, since
collisions are the EXPECTED case here).
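
Step 1 of the pipeline can be sketched as a plain magic scan. This assumes the usual 16-byte Yay0 header layout (magic, big-endian decompressed size, link-table offset, chunk offset) and scans every byte offset; `Yay0Candidate` and `scan_yay0_sketch` are hypothetical names, and the engine's actual scan may use a different stride or extra validation.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

struct Yay0Candidate {
    size_t rom_offset;           // where the "Yay0" magic starts
    uint32_t decompressed_size;  // big-endian u32 at +0x04 of the header
};

// Returns every offset in `rom` where a full Yay0 header could start.
std::vector<Yay0Candidate> scan_yay0_sketch(const std::vector<uint8_t>& rom) {
    std::vector<Yay0Candidate> found;
    for (size_t off = 0; off + 16 <= rom.size(); ++off) {
        if (std::memcmp(rom.data() + off, "Yay0", 4) != 0) continue;
        const uint8_t* p = rom.data() + off + 4;
        const uint32_t size = (uint32_t(p[0]) << 24) | (uint32_t(p[1]) << 16) |
                              (uint32_t(p[2]) << 8) | uint32_t(p[3]);
        found.push_back({off, size});
    }
    return found;
}
```

A magic match alone can be a false positive, which is why the pipeline above goes on to check the decompressed prefix before accepting a wrapper.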
Implementation function (the +0x20 entry) gets a basic forward CFG
walk to determine its size:
- Walk instructions tracking forward branch targets within the func.
- Stop at jr $ra IF no tracked forward branches still need to be
reached.
- Falls back to first-jr-ra heuristic if walk is inconclusive.
Pattern-synthesized recompile failures are non-fatal: pattern sections
have rom_addr in the synthetic 0xFE000000 range, and main.cpp's recompile
loop logs and skips them instead of calling std::exit. This lets the build
proceed even when our basic CFG walk misjudges a function with an unusual
shape (e.g. computed jumps through jump tables we don't analyze). Stadium's
Path-3 single-fragment case (fragment78 wrapper at ROM 0x9E93F0)
still recompiles cleanly; ~225 of 282 dynamic-slot fragments
recompile, ~57 fail and skip.
Validation on Stadium's 0x8FF00000 slot:
- 293 Yay0 wrappers found (293 vs 279 from prior validate script —
earlier scan undercounted due to a tight 1KB decode window).
- 282 sections after dedupe (11 collapsed as content-identical).
- Build proceeds to completion; no Stadium boot regression
(logo + PIKA jingle still render).
Outstanding for next session — runtime side:
- Modify register_runtime_fragment in librecomp/src/overlays.cpp
to read bytes at fragment_ptr (first 0x40 → fall back to full
body for the residual ~5%), hash, and look up the matching
section. Currently it picks by id alone, so for slot 0x8FF00000
only ONE of the 282 sections gets bound to func_map at any time
(the most-recently registered).
- Refactor cross-section R_MIPS_32 retargeting to use a vram
hashmap (currently O(N²) which gets expensive at 282 sections).
- fragment78's prior single-fragment block can stay; it
works alongside patterns and serves as the "I know exactly which
one I want" form.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds [[input.decompressed_section]] toml block + Yay0/PERS-SZP wrapper
decoders + an in-memory section synthesis pass. Required for games
like Pokemon Stadium where Stadium's CPU-side decompressor materializes
fragment bytes at runtime and the static recompiler can't see them in
the ELF/ROM-direct path.
User-facing config:
[[input.decompressed_section]]
name = "fragment78"
vram = 0x8FF00000
rom_wrapper = 0x9E93F0
wrapper_format = "pers_szp_yay0"
Pipeline:
1. compression/{yay0,pers_szp}.{h,cpp} decode the wrapper.
2. decompressed.cpp parses the FRAGMENT-format header (relocOffset,
sizeInRam) + Stadium-format reloc table, translates it to
N64Recomp::Reloc entries (R_MIPS_32/26/HI16/LO16) with paired
HI16/LO16 immediate computation, and synthesizes a Section
handed to the existing recompilation pipeline. Stores
decompressed bytes into context.rom at synthetic_rom =
0xFE000000 | rom_wrapper to keep them out of real-ROM addr space.
3. Two functions per fragment: the +0x00 entry trampoline (J + nop)
and the +0x20 implementation (runs to first jr ra in body).
4. After all decompressed sections are added, retargets each
R_MIPS_32 reloc to whichever existing section's vram range
contains its target address (cross-section pointer support).
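
The paired HI16/LO16 computation mentioned in step 2 follows the standard MIPS convention: the low half is sign-extended when added, so the high half is rounded up whenever the low half has its top bit set. A small sketch with hypothetical helper names:

```cpp
#include <cassert>
#include <cstdint>

// Rebuilds the 32-bit address from a lui immediate (HI16) and the immediate of
// the paired addiu/lw/sw (LO16), which the CPU sign-extends before adding.
uint32_t combine_hi_lo(uint16_t hi, uint16_t lo) {
    const int32_t lo_signed = int32_t(uint32_t(lo) ^ 0x8000u) - 0x8000;
    return (uint32_t(hi) << 16) + uint32_t(lo_signed);
}

// Splits an address into the immediates that reproduce it under that rule.
uint16_t hi16(uint32_t addr) { return uint16_t((addr + 0x8000u) >> 16); }
uint16_t lo16(uint32_t addr) { return uint16_t(addr & 0xFFFFu); }
```

For example, 0x8FF0C300 splits into hi 0x8FF1 and lo 0xC300, because 0xC300 is negative as a signed 16-bit value and the high half has to compensate.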
Adds [output] collision_policy:
"error" (default) — abort the build if two emitted symbols collide
on name; print both colliders + how to opt in.
"suffix" — auto-disambiguate by appending __rom_<rom_addr>
to colliding symbols. Suffix only appears where
collisions exist.
Validated end-to-end on Stadium's fragment78 (wrapper at ROM 0x9E93F0,
decomp_size=0x25340, 319 relocs). Recompiled func_8FF00020 dispatches
to runtime_addr+0x24DC0 correctly; Stadium boots past the prior
crash point, no regression on the N64 logo + PIKA jingle.
Future work: pattern form ([[input.decompressed_section_pattern]]) for
slots like vram 0x8FF00000 where Stadium streams 279 different
fragments at the same link addr. Validation script
(tools/_validate_dynfrag.py in the consumer repo) confirms 268 distinct
content-hashes, 23MB total payload — feasible as engine work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four engine fixes uncovered by Stadium fragment dispatch:
1. recompilation.cpp: SectionAbsolute guard in print_func_call_by_address.
Stadium's .fragmentN sections JAL into SHN_ABS symbols (e.g.
osGbpakReadWrite); resolve_jal indexed context.sections[] at 65534
and segfaulted on first dispatch. Skip reloc resolution when
reloc_section >= context.sections.size().
2. main.cpp (overlay table emit): filter unsupported MIPS reloc types
before indexing reloc_names[]. Stadium's .rel.fragmentN includes
R_MIPS_PC16 (type 10) which the recompiler doesn't model; the OOB
read embedded a NUL byte in the .type field and broke the C compile.
3. main.cpp: bounds-check inversion in the static-funcs scan
(read section_funcs[size] before checking i < size). Latent bug
exposed by .fragment1's larger CreateStatic surface.
4. recomp.h: forward-declare recomp_register_runtime_fragment so funcs
files can call it from inlined hook text generated by
[[patches.hook]] on Memmap_RelocateFragment.
(NOTE: original local commit de76241 also added a recomp_unhandled_*
forward-decl family; those declarations are dropped from this PR — they
violate the no-stubs principle and depend on a runtime API not yet in
upstream N64ModernRuntime.)
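
The ordering issue in fix 3 is the classic one where the index test has to come before the element read. A generic sketch, where `count_leading` and its predicate are illustrative stand-ins rather than the actual static-funcs scan in main.cpp:

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Counts how many leading entries of `section_funcs` satisfy `pred`.
template <typename Pred>
size_t count_leading(const std::vector<size_t>& section_funcs, Pred pred) {
    size_t i = 0;
    // Index check first: `&&` short-circuits, so section_funcs[i] is only
    // read when i is in range. Swapping the operands reads one element past
    // the end whenever every entry matches.
    while (i < section_funcs.size() && pred(section_funcs[i])) {
        ++i;
    }
    return i;
}
```

With the operands reversed the loop still appears to work on most inputs, which fits the description of a latent bug that only showed up with a larger section.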
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add function hooks to mod symbol format
* Add function sizes to section function tables
* Add support for function hooks in live generator
* Add an option to the context to force function lookup for all non-relocated function calls
* Include relocs in overlay data
* Include R_MIPS_26 relocs in symbol file dumping/parsing
* Add manual patch symbols (syms.ld) to the output overlay file and relocs
* Fix which relocs were being emitted for patch sections
* Fix sign extension issue with mfc1, add TODO for banker's rounding
This commit implements the "live recompiler", which is another backend for the recompiler that generates platform-specific assembly at runtime. This is still static recompilation as opposed to dynamic recompilation, as it still requires information about the binary to recompile and leverages the same static analysis that the C recompiler uses. However, similarly to dynamic recompilation it's aimed at recompiling binaries at runtime, mainly for modding purposes.
The live recompiler leverages a library called sljit to generate platform-specific code. This library provides an API that's implemented on several platforms, including the main targets of this component: x86_64 and ARM64.
Performance is expected to be slower than the C recompiler, but should still be plenty fast enough for running large amounts of recompiled code without an issue. Considering these ROMs can often be run through an interpreter and still hit their full speed, performance should not be a concern for running native code even if it's less optimal than the C recompiler's codegen.
As mentioned earlier, the main use of the live recompiler will be for loading mods in the N64Recomp runtime. This makes it so that modders don't need to ship platform-specific binaries for their mods, and allows fixing bugs with recompilation down the line without requiring modders to update their binaries.
This PR also includes a utility for testing the live recompiler. It accepts binaries in a custom format which contain the instructions, input data, and target data. Documentation for the test format as well as most of the tests that were used to validate the live recompiler can be found here. The few remaining tests were hacked together binaries that I put together very hastily, so they need to be cleaned up and will probably be uploaded at a later date. The only test in that suite that doesn't currently succeed is the div test, due to unknown behavior when the two operands aren't properly sign extended to 64 bits. This has no bearing on practical usage, since the inputs will always be sign extended as expected.
* implement nrm filename toml input
* change name of mod toml setting to 'mod_filename'
* add renaming and re mode
* fix --dump-context arg, fix entrypoint detection
* refactor re_mode to function_trace_mode
* adjust trace mode to use a general TRACE_ENTRY() macro
* fix some renaming and trace mode comments, revert no toml entrypoint code, add TODO to broken block
* fix arg2 check and usage string
* Terminate offline mod recompilation if any functions fail to recompile
* Fixed edge case with switch case jump table detection when lo16 immediate is exactly 0
* Prevent emitting duplicate reference symbol defines in offline mod recompilation
* Fix function calls and add missing runtime function pointers in offline mod recompiler
* Remove reference context from parse_mod_symbols argument
* Add support for special dependency names (self and base recomp), fix non-compliant offline mod recompiler output
* Fix export names not being set on functions when parsing mod syms, add missing returns to mod parsing
* Switch offline mod recompilation to use a base global event index instead of per-event global indices
* Add support for creating events in normal recompilation
* Output recomp API version in offline mod recompiler
* Removed dependency version from mod symbols (moved to manifest)
* Added mod manifest generation to mod tool
* Implement mod file creation in Windows
* Fixed some error prints not using stderr
* Implement mod file creation on posix systems
* De-hardcode symbol file path for offline mod recompiler
* Fix duplicate import symbols issue and prevent emitting unused imports