The recompiler walks instructions linearly and advances
reloc_index only forward (recompilation.cpp: while
section.relocs[reloc_index].address < vram). This requires the
section's relocs to be sorted by address — out-of-order entries
are skipped silently and emitted as literal immediates instead of
RELOC_HI16/RELOC_LO16/RELOC_R_MIPS_26 macros.
Stadium's raw fragment reloc table orders entries by HI16+LO16
pair adjacency, NOT by instruction address. For stadium_models
sub-fragment 4 (variant 384, hash 0x242995EF6B92F471), the raw
table holds:
[HI16 @ off 0x50][HI16 @ off 0x30][LO16 @ off 0x60]
Both HI16 entries pair correctly via raw-list adjacency
(target_section_offset is computed at parse time). But when
the recompiler walks instruction at PC 0x8FF00030, the
already-advanced reloc_index points at the entry for offset 0x50
(seen first in raw order) and the address comparison fails. The
HI16 at 0x30 is silently skipped and emitted as the literal
`S32(0x8FF1 << 16)`.
The matching LO16 at 0x60 IS encountered in linear order, so it
emits correctly as `RELOC_LO16(384, 0xB14C)`. The asymmetric pair
produces:
ctx->r3 = S32(0x8FF1 << 16); // HI literal
ctx->r2 = ADD32(ctx->r3,
(int16_t)RELOC_LO16(384, 0xB14C)); // LO reloc'd
For section 384 with runtime base 0x8027FAB0, RELOC_LO16(384,
0xB14C) sign-extends to -0x5404, yielding `0x8FF10000 + (-0x5404)
= 0x8FF0ABFC` instead of the intended `runtime_base + 0xB14C =
0x8028ABFC`. The result lands back in the pattern bucket and
process_geo_layout walks bogus geo data → cmd_byte=0xFF →
lookup-miss 0x00000E00 → crash.
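The arithmetic above can be reproduced with a small sketch. The helper names `split_hi_lo` and `combine` are hypothetical and are not the recompiler's macros; they only model a lui + addiu pair in which the low half is sign-extended.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper type: the two 16-bit halves of a lui + addiu pair.
struct HiLo {
    uint16_t hi;
    uint16_t lo;
};

// Split an address into %hi/%lo. addiu sign-extends %lo, so %hi is
// rounded up by one whenever bit 15 of the address is set.
inline HiLo split_hi_lo(uint32_t addr) {
    const uint16_t lo = static_cast<uint16_t>(addr & 0xFFFFu);
    const uint16_t hi = static_cast<uint16_t>((addr + 0x8000u) >> 16);
    return {hi, lo};
}

// What the emitted C evaluates: (hi << 16) + sign_extend(lo).
inline uint32_t combine(uint16_t hi, uint16_t lo) {
    const int32_t signed_lo = static_cast<int16_t>(lo);
    return (static_cast<uint32_t>(hi) << 16) + static_cast<uint32_t>(signed_lo);
}
```

With both halves relocated against runtime base 0x8027FAB0, the pair recombines to 0x8028ABFC. Substituting the stale link-time high half 0x8FF1 while keeping the relocated low half gives 0x8FF0ABFC, the bogus pointer described above.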
Fix: std::sort section_out.relocs by address ascending after
parsing. Pairing has already been computed via raw-list adjacency
(which is independent of address), so the sort only affects the
recompiler's lookup order. Verified: variant 384's recompiled C
now emits all three lui instructions as RELOC_HI16(384, ...),
attract demo no longer crashes on 0x8FF0ABFC.
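A minimal sketch of why the sort matters, assuming a simplified `Reloc` record and a forward-only cursor shaped like the loop quoted from recompilation.cpp. The struct, enum, and function names here are illustrative, not the engine's own.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical stand-ins for the recompiler's reloc record.
enum class RelocType { HI16, LO16 };
struct Reloc {
    uint32_t address;
    RelocType type;
};

// The fix: order relocs by instruction address after parsing.
inline void sort_relocs_by_address(std::vector<Reloc>& relocs) {
    std::sort(relocs.begin(), relocs.end(),
              [](const Reloc& a, const Reloc& b) { return a.address < b.address; });
}

// Forward-only walk: counts how many instructions find their reloc.
// The cursor never moves backwards, so out-of-order entries are missed.
inline size_t count_matched(const std::vector<Reloc>& relocs,
                            uint32_t start, uint32_t end) {
    size_t idx = 0;
    size_t matched = 0;
    for (uint32_t vram = start; vram < end; vram += 4) {
        while (idx < relocs.size() && relocs[idx].address < vram) {
            ++idx;
        }
        if (idx < relocs.size() && relocs[idx].address == vram) {
            ++matched;
        }
    }
    return matched;
}
```

Fed the raw order from this commit (0x50, 0x30, 0x60), the walk matches only two of the three entries; after sorting it matches all three.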
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Path 2 of the pattern-fragment dispatch architecture: each variant of
a [[input.decompressed_section_pattern]] now gets a unique link-time
ram_addr from a synthetic vram pool (0xC0000000+, KSEG2/KSEG3 — unused
by N64 software so it can't collide with engine-resident sections like
RSP at 0xA4000000+).
Why: when multiple variants share a single canonical link bucket
(e.g. all stadium_models pattern variants at 0x8FF00000), runtime
fragment-vaddr resolution via gFragments[id] is single-pointer and
ambiguous when more than one variant is host-resident at the same
time. Per-variant synthetic ram_addrs make each variant's RELOC_HI16
/ RELOC_LO16 emission produce a unique 0xCXXXXXXX literal at runtime,
giving variant-internal references unambiguous identity without
depending on caller PC, host stack walks, or data-context tracking.
Implementation:
- add_decompressed_section accepts an override_link_ram_addr param.
The bytes-encoded `vram` (= canonical link bucket) is passed to
parse_fragment_relocs and discover_function_bounds (so jump tables
resolve correctly against the body's encoded references), while
section.ram_addr is set to the override. The two roles of vram are
cleanly separated.
- New original_pattern_id field on Section. Populated for synthetic-
link variants with the original game-side fragment id derived from
the pattern's canonical bucket (e.g. 0xEF for stadium_models).
Lets the runtime candidate filter know which game id should
include this synthetic section as a candidate, eliminating cross-
pattern hash-collision misregistration.
- main.cpp emit: section_load_table now writes original_pattern_id
into the SectionTableEntry initializer.
- decompressed.cpp pattern loop: every unique variant now gets
synthetic ram_addr = 0xC0000000 + variant_idx * 0x100000 (1 MB
stride, ~286 KB largest observed variant). For Stadium's 219
unique variants the pool occupies 0xC0000000..0xCDB00000, well
within the runtime-side 512-bucket capacity.
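The pool formula can be checked with a few lines. The constant and function names are hypothetical; only the base address and the 1 MB stride come from this commit.

```cpp
#include <cassert>
#include <cstdint>

// Values taken from the commit text; the names are illustrative.
constexpr uint32_t kSyntheticPoolBase = 0xC0000000u;
constexpr uint32_t kSyntheticStride = 0x100000u;  // 1 MB per variant

// Link-time ram_addr assigned to the variant at `variant_idx`.
constexpr uint32_t synthetic_ram_addr(uint32_t variant_idx) {
    return kSyntheticPoolBase + variant_idx * kSyntheticStride;
}
```

With this stride, index 219 maps to 0xCDB00000, which is the upper bound quoted above.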
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
discover_function_bounds was extending max_reached to cover the jump
table's bytes themselves (jtbl_end - 4). This was wrong: jtbl entries
are DATA, not code. Including their bytes in the function's words
range made recompile_function try to interpret intervening data words
as MIPS instructions, which usually fails (encountering 'pref',
'INVALID', branches to outside the function, etc.).
The fix: stop extending max_reached for jtbl_end. The recompiler's
analyze_function reads jtbl entries directly from context.rom — they
don't need to be part of func.words. Function bounds = max-reached-
instruction + 4 (delay slot), nothing more.
Effect on Stadium's 0x8FF00000 pattern activation:
Before this fix: 1 recompile error (`func_8FF00020__rom_FE000020`
at instr 8243 — 'pref' instruction emitted at vram 0x8FF080B8,
followed by INVALID instructions). The function had been sized
large enough to contain unrelated data after the case-arms region.
After this fix: 0 errors. All 219 pattern-synthesized sections
recompile cleanly.
Stadium boot impact (pattern activated, 30-second run):
- 1048 audio hits (15x improvement; was ~70 with static fragment78).
- 31 fragments registered (3.4x improvement; was ~9).
- 17 different fragment78-family variants streamed through link
vram 0x8FF00000, each correctly content-hash-dispatched.
- Stadium reaches a NEW failure mode much later: `lookup miss
at 0x810001D0` — a different fragment slot (link bucket 0x10)
needs similar treatment. Tracked separately.
The pattern is now ACTIVE in Stadium's game.toml. game.toml updated
to document the active configuration.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
discover_function_bounds was treating intra-body j/jal targets as
intra-function jumps and adding them to the BFS worklist. For Stadium
fragments where the body contains MULTIPLE small functions glued by
tail calls, this caused one function's BFS to absorb its neighbors
(and eventually walk into data past the last function), producing
huge function bodies that the recompiler choked on.
The fix: don't follow j or jal targets at all.
J — almost always a tail call to a neighboring function in
Stadium fragments. Loops use conditional B* with negative
offsets (which the BFS still handles). Treating J as a hard
block terminator drops the function-spanning behavior. If a
genuinely intra-function J ever shows up (rare), the missing
target surfaces as an analyzer warning at that specific
offset, which the build flags loudly.
JAL — a call into another function, by definition. Following its
target absorbed the callee into the caller. Now we walk past
the JAL+delay slot (control returns after the call) but
leave the target alone — the callee gets discovered and
recompiled separately when it's reached as a JAL'd function
elsewhere, OR remains a runtime LOOKUP_FUNC dispatch.
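A simplified sketch of the walk described above, not the engine's discover_function_bounds. It assumes the body is already decoded into instruction words, recognizes only beq/bne/blez/bgtz and their "likely" variants as branches, and treats every jr as a block end (jump tables, REGIMM and COP1 branches are omitted). The function name is hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <deque>
#include <vector>

// Returns the function size in bytes, or 0 if the walk leaves `words`.
inline size_t discover_function_size(const std::vector<uint32_t>& words) {
    std::vector<bool> seen(words.size(), false);
    std::deque<size_t> worklist{0};
    size_t max_reached = 0;

    while (!worklist.empty()) {
        size_t i = worklist.front();
        worklist.pop_front();
        while (i < words.size() && !seen[i]) {
            seen[i] = true;
            max_reached = std::max(max_reached, i);
            const uint32_t w = words[i];
            const uint32_t op = w >> 26;
            const bool is_branch = (op >= 4 && op <= 7) || (op >= 20 && op <= 23);
            const bool is_jr = (op == 0) && ((w & 0x3Fu) == 0x08u);
            const bool is_j = (op == 2);

            // Plain instructions and JAL: keep walking linearly. The JAL
            // target is deliberately not queued.
            if (!is_branch && !is_jr && !is_j) {
                ++i;
                continue;
            }

            // Every control transfer here has a delay slot that executes.
            if (i + 1 >= words.size()) return 0;
            max_reached = std::max(max_reached, i + 1);

            if (!is_branch) break;  // J or JR: block ends, target not followed

            const int64_t target = static_cast<int64_t>(i) + 1 +
                                   static_cast<int16_t>(w & 0xFFFFu);
            if (target < 0 || target >= static_cast<int64_t>(words.size())) return 0;
            worklist.push_back(static_cast<size_t>(target));

            // beq rX, rX is an unconditional branch: no fall-through.
            const bool always = (op == 4) && (((w >> 21) & 31u) == ((w >> 16) & 31u));
            if (always) break;
            i += 2;  // resume after the delay slot
        }
        if (i >= words.size()) return 0;  // fell off the end of the body
    }
    return (max_reached + 1) * 4;
}
```

Because neither J nor JAL targets are queued, a neighboring function that sits right after the final jr $ra and delay slot stays outside the reported size.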
Effect on Stadium's 0x8FF00000 pattern activation:
Before: 57+ analyze_function failures (combination of overlap
corruption + function-spanning).
After overlap fix (28c4fdd): 0 analyze failures, 0 bounds-discovery
failures, but 1 recompile-time error at instr 8243 of the
first synthesized section — function still walking into
data via some non-J path.
After this change: the count holds at 1. The function-spanning
via J was already fixed by the overlap commit (which
surfaced the issue), so the failure count did not drop further.
The remaining 1 failure has a DIFFERENT root cause that's
NOT j/jal (likely a conditional branch with a wild target
or a jtbl entry pointing into data). Tracked as separate
investigation.
Static [[input.decompressed_section]] for fragment78 still
recompiles cleanly. No regression on Stadium boot.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The previous formula `synthetic_rom = 0xFE000000 | rom_wrapper` assumed
wrapper offsets were spaced apart by at least their decompressed body
sizes. They are NOT — Stadium's wrappers are densely packed (often
within 0x100-0x10000 bytes of each other) while their decompressed
bodies are 0x500-0x50000 bytes. This caused later sections' memcpy
into context.rom to OVERWRITE earlier sections' bytes, corrupting
their jump-table entries and any other content addressed by
relative offsets.
Concrete repro before the fix: pattern-activate Stadium's 0x8FF00000
slot. Section frag_8FF00000__rom_56E900 has impl_size=0xC2F4 (correctly
bounded). Its jump table at body offset 0xC300 has 5 entries pointing
to body offsets 0x48..0x74. After the section was added, frag_*__rom_574A50
(wrap_off=0x574A50, synthetic_rom=0xFE574A50) memcpy'd 0x58 bytes
starting at 0xFE574A50 — INSIDE the first section's range
[0xFE56E900, 0xFE57AC20). The jtbl bytes at offset 0xC300 (rom 0xFE57AC00)
got clobbered with garbage from the second section's body. analyze_function
then read jtbl entries that didn't decode to in-function vrams and
reported "Failed to determine size of jump table" — a real symptom
caused by silent data corruption.
The fix: cumulative allocator. A static counter starts at 0xFE000000;
each new section claims a fresh, 4-byte-aligned chunk equal to its
reloc_offset. No two sections ever share a byte range. The 0xFE000000
prefix is preserved for traceability (synthetic ranges live above any
real ROM offset). Fails the build cleanly if cumulative usage would pass
0x100000000 (32 MB of synthesized payload above 0xFE000000), which Stadium's
0x8FF00000 slot at ~23 MB total stays under.
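A minimal sketch of a cumulative allocator along these lines. The class and method names are hypothetical, and the real code uses a static counter rather than an object; the base address, 4-byte alignment, and overflow boundary are taken from the text above.

```cpp
#include <cassert>
#include <cstdint>

// Hands out non-overlapping, 4-byte-aligned ranges starting at 0xFE000000.
class SyntheticRomAllocator {
public:
    // On success writes the start of a fresh range to `out_start` and
    // returns true. Returns false when the range would pass 0x100000000.
    bool allocate(uint32_t size, uint32_t& out_start) {
        const uint64_t aligned = (static_cast<uint64_t>(size) + 3) & ~uint64_t{3};
        if (next_ + aligned > kLimit) {
            return false;  // the caller would turn this into a build error
        }
        out_start = static_cast<uint32_t>(next_);
        next_ += aligned;
        return true;
    }

private:
    static constexpr uint64_t kBase = 0xFE000000ull;
    static constexpr uint64_t kLimit = 0x100000000ull;
    uint64_t next_ = kBase;
};
```

Because each range starts where the previous one ended, a later section can no longer land inside an earlier section's bytes, whatever the spacing of the original wrapper offsets.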
Verified: pattern-activated Stadium's 0x8FF00000 slot. After the fix,
ZERO analyze_function failures and ZERO bounds-discovery failures
(was 57+ before). Build now hits a different class — discover_function_bounds
walks past real function ends via j/jal-in-body that are tail calls,
not intra-function jumps. That's a separate analyzer bug, surfaced by
this fix and tracked as the next layer of work. Still principle-clean:
build aborts with specific instruction offsets.
Static [[input.decompressed_section]] for fragment78 still
recompiles cleanly. No regression on Stadium boot logo + PIKA jingle.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Stadium's dynamic-asset slot at vram 0x8FF00000 contains a mix of
fragment shapes:
Code fragments: real MIPS function at +0x20 ending in jr $ra
(and possibly more functions). Stadium dispatches
the +0x00 J trampoline to invoke them.
Data fragments: pure data starting at +0x20 (tables of (tag,
pointer) records, animation curves, etc.). The
+0x00 J trampoline is a dormant placeholder that
Stadium NEVER actually calls. Stadium reads the
data directly via R_MIPS_32 pointers from elsewhere.
The previous code path attempted to recompile a function at +0x20
in EVERY synthesized section, which (a) was incorrect for data
fragments, and (b) reliably produced invalid C from data words
decoded as instructions.
Detection heuristic: scan the first 0x100 instructions of the body
for any jr $ra (encoded as 0x03E00008). If absent, the fragment is
data-only — register the section + R_MIPS_32 relocs but emit NO
FuncEntry rows. If Stadium ever does dispatch the +0x00 J for one
of these (which shouldn't happen), the runtime LOOKUP_FUNC reports
the miss loudly — that's the correct surface, NOT a stub.
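The heuristic amounts to a bounded linear scan. A minimal sketch, assuming `body_words` holds the already byte-swapped instruction words from the start of the scanned region; the function and constant names are hypothetical.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

constexpr uint32_t kJrRa = 0x03E00008u;   // encoding of jr $ra
constexpr size_t kScanWindow = 0x100;     // instructions, per the text above

// True when a jr $ra appears within the scan window, i.e. the fragment
// is treated as code. False means "data-only: emit no FuncEntry rows".
inline bool looks_like_code(const std::vector<uint32_t>& body_words) {
    const size_t n = std::min(body_words.size(), kScanWindow);
    const auto end = body_words.begin() + static_cast<std::ptrdiff_t>(n);
    return std::find(body_words.begin(), end, kJrRa) != end;
}
```

A data word that happens to equal 0x03E00008 inside the window would be misclassified as code, which is why this stays a heuristic.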
Tested on Stadium's 0x8FF00000 slot via [[input.decompressed_section_pattern]]:
- 282 wrappers attempted
- 62 classified as data-only (registered without impl function)
- 220 attempted as code; first failure surfaces an analyze_function
jump-table sizing gap (separate issue, distinct from data-only
classification)
Static [[input.decompressed_section]] for fragment78 is unaffected
(still recompiles cleanly; boot logo + PIKA jingle still play).
The pattern stays inactive in Stadium's game.toml until the
analyze_function jtbl gap is addressed; build correctly refuses to
proceed if activated.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds a public N64Recomp::discover_function_bounds() in src/analysis.h
that performs a BFS-based control-flow walk of a function's body,
following:
- Conditional branches (target + fall-through)
- Unconditional j/jal targets when intra-body
- jr $ra returns (block ends after delay slot)
- jr-via-jump-table dispatches: the existing register-state
simulator from analyze_function detects the lui+addiu+addu+lw+jr
pattern and records the jtbl base; we then read entries out of
the body bytes and feed targets back into the BFS until
convergence.
Returns the function's byte size (max-reachable + 4 to cover the
delay slot of the last instruction). On failure, populates a specific
error message with the offending offset and reason — caller treats
this as a build error, NOT a graceful skip (per the project's
no-stubs principle).
Wires into decompressed.cpp's pattern path, replacing the prior
inline BFS that had a TODO for jump-table handling. The pattern
caller now propagates failures via `synthesize_decompressed_patterns`
returning false, which surfaces in main.cpp's exit_failure path.
Concrete behavior change: activating a pattern that includes a
fragment with computed jumps now produces a build error pointing at
the specific section name + offset + the analyzer's failure reason,
instead of silently producing a partial binary. Tested on Stadium's
0x8FF00000 slot — first failing wrapper is at ROM 0x8CC400 with an
indirect jr at offset 0x827C the simulator doesn't pattern-match.
The static [[input.decompressed_section]] path for fragment78 is
unaffected (still recompiles cleanly, no regression on boot logo +
PIKA jingle).
Future work surfaced by this change: the simulator's lui+addiu
+addu+lw+jr pattern doesn't cover every jump-table shape Stadium
uses. Each gap surfaces as a specific build-error offset; resolution
is to extend analyze_instruction to recognize the additional pattern
(or, when it's a true tail-call rather than a jtbl, distinguish
those at the jr site).
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The prior "pattern-synthesized recompile failures are best-effort:
log + skip" path was a stub by another name — it produced binaries
where some fragment bodies silently didn't exist, and the failure
deferred to a runtime lookup-miss when Stadium tried to dispatch
into them. That violates the project's no-stubs principle.
Three changes here:
1. **Remove the soft-skip in main.cpp's recompile loops.** Recompile
failures revert to fatal `std::exit(EXIT_FAILURE)` regardless of
whether the section is pattern-synthesized. Build-time errors
surface; the user has to make a real choice about how to
resolve them.
2. **Replace the "scan to first jr ra" heuristic in decompressed.cpp
with a real BFS-based control-flow walker.** The walker:
- Starts at impl entry (+0x20).
- Follows conditional branches (target + fall-through).
- Follows j/jal targets when intra-function.
- Treats jr $ra as a return; ends the basic block.
- Returns max-reachable-offset + 4 as the function's true size.
For functions with computed jumps (jr <reg> not jr $ra — i.e.
jump-table dispatches), the walker reports a build-time error
with a specific offset and a list of options for the user
(declare via single-block form, or extend the walker to follow
jump-table targets). NOT a skip.
3. **Pattern-caller propagates synthesis failures as build aborts.**
`synthesize_decompressed_patterns` returns false when any section
fails to add, and main.cpp's exit_failure path runs.
Net effect on Stadium today: the static [[input.decompressed_section]]
for fragment78 still recompiles cleanly (boot logo + PIKA jingle
unaffected). Activating the pattern would now fail loudly on the
first fragment with computed jumps, instead of silently shipping a
binary missing those bodies. That's the principle: build errors
surface, runtime stubs don't.
The "extend the walker to follow jump-table targets" work is
documented in the error message and is the next step if/when
pattern activation matters more than fragment78's single case.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds Section::content_hash, populates it on pattern-synthesized
sections with FNV-1a-64 of the first 0x100 bytes of the decompressed
body, and emits it into recomp_overlays.inl's SectionTableEntry. The
runtime side hashes the same window over the bytes Stadium loads at
fragment_ptr and looks up the matching section by hash.
Build-time and runtime use:
- SAME hash algorithm: FNV-1a-64
- SAME window: 0x100 bytes (95% uniqueness across Stadium's 282
distinct fragment bodies; falls back to first-candidate on the
residual ~5%)
- SAME byte source: pre-relocation decompressed bytes (link-time
form, before Stadium's R_MIPS_32 patches run)
Section table emit gains the .content_hash field; non-pattern sections
get hash=0, runtime-side condition `sec.content_hash != 0` filters
them out of the candidate set.
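For reference, a sketch of the hash as described: standard FNV-1a-64 (offset basis 0xcbf29ce484222325, prime 0x100000001b3) over at most the first 0x100 bytes. The function names are hypothetical; both the build-time and runtime sides would need to agree on this exact routine and on the pre-relocation byte source.

```cpp
#include <algorithm>
#include <cassert>
#include <cstddef>
#include <cstdint>

constexpr size_t kHashWindow = 0x100;  // bytes, per the text above

// Standard FNV-1a, 64-bit variant.
inline uint64_t fnv1a64(const uint8_t* data, size_t len) {
    uint64_t h = 0xcbf29ce484222325ull;  // FNV offset basis
    for (size_t i = 0; i < len; ++i) {
        h ^= data[i];
        h *= 0x100000001b3ull;           // FNV prime
    }
    return h;
}

// Hash only the leading window of a decompressed body.
inline uint64_t content_hash(const uint8_t* body, size_t body_len) {
    return fnv1a64(body, std::min(body_len, kHashWindow));
}
```

Bodies that differ only past the window hash identically, which is the residual case the first-candidate fallback covers.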
Pairs with the runtime-side change in
lib/N64ModernRuntime/librecomp/src/overlays.cpp.
Activation in PokemonStadiumRecomp's game.toml is gated on a
follow-up: pattern-synthesized impl bodies currently get a basic
forward-CFG-walked size which produces invalid C for fragments with
internal jump tables (data interpreted as code). Future fix: emit
pattern-section impl bodies as runtime-dispatched stubs instead of
trying to statically recompile each body. Until then, fragment78
stays declared as a single static [[input.decompressed_section]];
the engine's pattern infrastructure is in place, ready to be flipped
on once the impl-body emit is reshaped.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds [[input.decompressed_section_pattern]] for slots where many
fragments share a link vram (e.g. Stadium streams 279+ different
fragments through vram 0x8FF00000 across the game). Per-fragment
[[input.decompressed_section]] entries don't scale to that cardinality
and miss the runtime-swap dispatch problem entirely.
Engine pipeline:
1. Scan baserom.z64 for every Yay0 wrapper.
2. For each, decompress 0x40 bytes and check whether the prefix
matches the expected J <vram + 0x20> trampoline + FRAGMENT magic.
Wrappers in PERS-SZP form are detected by the -0x18 prefix.
3. For matches, fully decompress and FNV-1a-64 hash the body.
4. Deduplicate by content hash (Stadium has ~11 byte-identical
duplicates across its 293 wrappers).
5. Synthesize one Section per unique content. Section names
<base_name>__rom_<wrapper_offset>; functions become
func_<vram>__rom_<offset> via the existing collision-suffix
machinery (default for pattern-discovered sections, since
collisions are the EXPECTED case here).
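Step 2's trampoline test can be sketched as follows. Only the J check is shown; the FRAGMENT magic comparison is left out because its offset is not stated here. The helper names are hypothetical, and the prefix is assumed to be big-endian as stored in the ROM.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// MIPS J encoding: opcode 2 in the top 6 bits, word index in the low 26.
constexpr uint32_t encode_j(uint32_t target_vram) {
    return (2u << 26) | ((target_vram >> 2) & 0x03FFFFFFu);
}

// Read a big-endian 32-bit word.
inline uint32_t read_be32(const uint8_t* p) {
    return (static_cast<uint32_t>(p[0]) << 24) | (static_cast<uint32_t>(p[1]) << 16) |
           (static_cast<uint32_t>(p[2]) << 8) | static_cast<uint32_t>(p[3]);
}

// True when the decompressed prefix begins with `j vram + 0x20`.
inline bool starts_with_trampoline(const uint8_t* prefix, size_t len, uint32_t vram) {
    return len >= 4 && read_be32(prefix) == encode_j(vram + 0x20u);
}
```

For vram 0x8FF00000 the expected first word is 0x0BFC0008.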
Implementation function (the +0x20 entry) gets a basic forward CFG
walk to determine its size:
- Walk instructions tracking forward branch targets within the func.
- Stop at jr $ra IF no tracked forward branches still need to be
reached.
- Falls back to first-jr-ra heuristic if walk is inconclusive.
Pattern-synthesized recompile failures are non-fatal: pattern sections
have rom_addr in synthetic 0xFE000000 range, and main.cpp's recompile
loop logs and skips them instead of calling std::exit. Lets the build proceed
even when our basic CFG walk misjudges a function with weird shape
(e.g. computed jumps through jump tables we don't analyze). Stadium's
Path-3 single-fragment case (fragment78 wrapper at ROM 0x9E93F0)
still recompiles cleanly; ~225 of 282 dynamic-slot fragments
recompile, ~57 fail and skip.
Validation on Stadium's 0x8FF00000 slot:
- 293 Yay0 wrappers found (293 vs 279 from prior validate script —
earlier scan undercounted due to a tight 1KB decode window).
- 282 sections after dedupe (11 collapsed as content-identical).
- Build proceeds to completion; no Stadium boot regression
(logo + PIKA jingle still render).
Outstanding for next session — runtime side:
- Modify register_runtime_fragment in librecomp/src/overlays.cpp
to read bytes at fragment_ptr (first 0x40 → fall back to full
body for the residual ~5%), hash, and look up the matching
section. Currently it picks by id alone, so for slot 0x8FF00000
only ONE of the 282 sections gets bound to func_map at any time
(the most-recently registered).
- Refactor cross-section R_MIPS_32 retargeting to use a vram
hashmap (currently O(N²) which gets expensive at 282 sections).
- fragment78's prior single-fragment block can stay; it
works alongside patterns and serves as the "I know exactly which
one I want" form.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Adds [[input.decompressed_section]] toml block + Yay0/PERS-SZP wrapper
decoders + an in-memory section synthesis pass. Required for games
like Pokemon Stadium where Stadium's CPU-side decompressor materializes
fragment bytes at runtime and the static recompiler can't see them in
the ELF/ROM-direct path.
User-facing config:
[[input.decompressed_section]]
name = "fragment78"
vram = 0x8FF00000
rom_wrapper = 0x9E93F0
wrapper_format = "pers_szp_yay0"
Pipeline:
1. compression/{yay0,pers_szp}.{h,cpp} decode the wrapper.
2. decompressed.cpp parses the FRAGMENT-format header (relocOffset,
sizeInRam) + Stadium-format reloc table, translates it to
N64Recomp::Reloc entries (R_MIPS_32/26/HI16/LO16) with paired
HI16/LO16 immediate computation, and synthesizes a Section
handed to the existing recompilation pipeline. Stores
decompressed bytes into context.rom at synthetic_rom =
0xFE000000 | rom_wrapper to keep them out of real-ROM addr space.
3. Two functions per fragment: the +0x00 entry trampoline (J + nop)
and the +0x20 implementation (runs to first jr ra in body).
4. After all decompressed sections are added, retargets each
R_MIPS_32 reloc to whichever existing section's vram range
contains its target address (cross-section pointer support).
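Step 4's retargeting reduces to a range lookup. A minimal linear-scan sketch, assuming each section is described by a vram base and a size; the struct and function names are hypothetical.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical view of a section's address range.
struct SectionRange {
    uint32_t vram;
    uint32_t size;
};

// Index of the section whose [vram, vram + size) contains `target`,
// or -1 when no section covers it.
inline int find_section_containing(const std::vector<SectionRange>& sections,
                                   uint32_t target) {
    for (size_t i = 0; i < sections.size(); ++i) {
        const SectionRange& s = sections[i];
        // Written as a subtraction so vram + size cannot overflow.
        if (target >= s.vram && target - s.vram < s.size) {
            return static_cast<int>(i);
        }
    }
    return -1;
}
```

The scan is linear per reloc, which is fine for a handful of sections but grows quickly once many sections are present.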
Adds [output] collision_policy:
"error" (default) — abort the build if two emitted symbols collide
on name; print both colliders + how to opt in.
"suffix" — auto-disambiguate by appending __rom_<rom_addr>
to colliding symbols. Suffix only appears where
collisions exist.
Validated end-to-end on Stadium's fragment78 (wrapper at ROM 0x9E93F0,
decomp_size=0x25340, 319 relocs). Recompiled func_8FF00020 dispatches
to runtime_addr+0x24DC0 correctly; Stadium boots past the prior
crash point, no regression on the N64 logo + PIKA jingle.
Future work: pattern form ([[input.decompressed_section_pattern]]) for
slots like vram 0x8FF00000 where Stadium streams 279 different
fragments at the same link addr. Validation script
(tools/_validate_dynfrag.py in the consumer repo) confirms 268 distinct
content-hashes, 23MB total payload — feasible as engine work.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Four engine fixes uncovered by Stadium fragment dispatch:
1. recompilation.cpp: SectionAbsolute guard in print_func_call_by_address.
Stadium's .fragmentN sections JAL into SHN_ABS symbols (e.g.
osGbpakReadWrite); resolve_jal indexed context.sections[] at 65534
and segfaulted on first dispatch. Skip reloc resolution when
reloc_section >= context.sections.size().
2. main.cpp (overlay table emit): filter unsupported MIPS reloc types
before indexing reloc_names[]. Stadium's .rel.fragmentN includes
R_MIPS_PC16 (type 10) which the recompiler doesn't model; the OOB
read embedded a NUL byte in the .type field and broke the C compile.
3. main.cpp: bounds-check inversion in the static-funcs scan
(read section_funcs[size] before checking i < size). Latent bug
exposed by .fragment1's larger CreateStatic surface.
4. recomp.h: forward-declare recomp_register_runtime_fragment so funcs
files can call it from inlined hook text generated by
[[patches.hook]] on Memmap_RelocateFragment.
(NOTE: original local commit de76241 also added a recomp_unhandled_*
forward-decl family; those declarations are dropped from this PR — they
violate the no-stubs principle and depend on a runtime API not yet in
upstream N64ModernRuntime.)
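Items 1 and 3 are both index-before-use guards. A sketch of the two shapes, with hypothetical names; these functions are illustrations and are not the code in recompilation.cpp or main.cpp.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Special section index quoted in item 1 (value from the text above).
constexpr uint16_t kSectionAbsolute = 65534;

// Item 1: only index the section table with a real section index.
inline bool is_real_section(uint16_t reloc_section, size_t section_count) {
    return reloc_section < section_count;
}

// Item 3: test the index BEFORE reading the element. With the operands
// swapped, v[i] would be read at i == v.size().
inline size_t first_index_at_or_after(const std::vector<uint32_t>& sorted_vrams,
                                      uint32_t vram) {
    size_t i = 0;
    while (i < sorted_vrams.size() && sorted_vrams[i] < vram) {
        ++i;
    }
    return i;
}
```

The second function relies on `&&` short-circuiting, so the element read only happens once the bounds test has passed.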
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
* Add function hooks to mod symbol format
* Add function sizes to section function tables
* Add support for function hooks in live generator
* Add an option to the context to force function lookup for all non-relocated function calls
* Include relocs in overlay data
* Include R_MIPS_26 relocs in symbol file dumping/parsing
* Add manual patch symbols (syms.ld) to the output overlay file and relocs
* Fix which relocs were being emitted for patch sections
* Fix sign extension issue with mfc1, add TODO for banker's rounding
This commit implements the "live recompiler", which is another backend for the recompiler that generates platform-specific assembly at runtime. This is still static recompilation as opposed to dynamic recompilation, as it still requires information about the binary to recompile and leverages the same static analysis that the C recompiler uses. However, similarly to dynamic recompilation it's aimed at recompiling binaries at runtime, mainly for modding purposes.
The live recompiler leverages a library called sljit to generate platform-specific code. This library provides an API that's implemented on several platforms, including the main targets of this component: x86_64 and ARM64.
Performance is expected to be slower than the C recompiler, but should still be plenty fast enough for running large amounts of recompiled code without an issue. Considering these ROMs can often be run through an interpreter and still hit their full speed, performance should not be a concern for running native code even if it's less optimal than the C recompiler's codegen.
As mentioned earlier, the main use of the live recompiler will be for loading mods in the N64Recomp runtime. This makes it so that modders don't need to ship platform-specific binaries for their mods, and allows fixing bugs with recompilation down the line without requiring modders to update their binaries.
This PR also includes a utility for testing the live recompiler. It accepts binaries in a custom format which contain the instructions, input data, and target data. Documentation for the test format as well as most of the tests that were used to validate the live recompiler can be found here. The few remaining tests were hacked together binaries that I put together very hastily, so they need to be cleaned up and will probably be uploaded at a later date. The only test in that suite that doesn't currently succeed is the div test, due to unknown behavior when the two operands aren't properly sign extended to 64 bits. This has no bearing on practical usage, since the inputs will always be sign extended as expected.