LZ10 decompression is built into the GBA's BIOS, so we don't need ZX0. It's also significantly faster
(618 usec instead of 2311 usec in my own benchmark, decompressing the same data).
And it seems that by switching, we saved 1 KB as well!
So replacing ZX0 seems like the right move.
The reason I didn't do this initially is that I misunderstood the documentation. I assumed LZ77UnCompWram could only decompress into EWRAM, not IWRAM.
But it turns out it can do both.
And using standardized tools is usually better than using a custom implementation.
The only downside right now is that we can no longer stream text tables through a buffer smaller than the entire decompressed size.
Anyway, things seem to work fine, so bye bye ZX0. It's been fun.
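For reference, here is a minimal host-side sketch of the LZ10 stream layout as I understand it: type byte 0x10, a 24-bit decompressed size, then flag bytes whose bits select either a literal byte or a back-reference with a 4-bit length and a 12-bit distance. lz10_decompress is a hypothetical helper for checking compressed tables on a PC. It is not the code in this commit, which just calls the BIOS routine.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

// Hypothetical host-side reference decoder for an LZ10 stream.
// Returns an empty vector if the input is malformed or truncated.
std::vector<uint8_t> lz10_decompress(const std::vector<uint8_t>& src)
{
    if (src.size() < 4 || src[0] != 0x10)
        return {};

    // Header: byte 0 is the type, bytes 1..3 hold the decompressed size (little endian).
    const std::size_t out_size = static_cast<std::size_t>(src[1])
                               | (static_cast<std::size_t>(src[2]) << 8)
                               | (static_cast<std::size_t>(src[3]) << 16);
    std::vector<uint8_t> out;
    out.reserve(out_size);

    std::size_t pos = 4;
    while (out.size() < out_size) {
        if (pos >= src.size())
            return {};
        const uint8_t flags = src[pos++];

        // Each flag byte describes up to 8 blocks, most significant bit first.
        for (int bit = 7; bit >= 0 && out.size() < out_size; --bit) {
            if ((flags >> bit) & 1) {
                // Back-reference: 4-bit (length - 3), 12-bit (distance - 1).
                if (pos + 1 >= src.size())
                    return {};
                const uint8_t b0 = src[pos++];
                const uint8_t b1 = src[pos++];
                const std::size_t length = static_cast<std::size_t>(b0 >> 4) + 3;
                const std::size_t distance = ((static_cast<std::size_t>(b0 & 0x0F) << 8) | b1) + 1;
                if (distance > out.size())
                    return {};
                for (std::size_t i = 0; i < length && out.size() < out_size; ++i) {
                    const uint8_t value = out[out.size() - distance];
                    out.push_back(value);
                }
            } else {
                // Literal byte, copied as-is.
                if (pos >= src.size())
                    return {};
                out.push_back(src[pos++]);
            }
        }
    }
    return out;
}
```

Since back-references can reach up to 4096 bytes back into the output, the whole output has to stay addressable while decoding, which is consistent with the streaming downside mentioned above.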
To protect the world from the soft float library...
To unite all arithmetic within our binary...
To denounce the evils of floating point precision...
To save more kilobytes - that's our vision....
(god this is cringe)
All floating point math has been eliminated, and replaced with
equivalent or near-equivalent fixed-point math.
sprite_data.cpp uses Q16, and get_rand_range uses a full Q32 to
ensure that exactly the same results are generated as before, at
the cost of some inline assembly to do a umull (__aeabi_lmul is a
little excessive when the lower 32 bits are discarded).
This eliminates all of the expensive double precision float library,
saving a few kilobytes.
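A rough sketch of the two fixed-point patterns, in portable C++ rather than inline assembly. The names q16, q16_from_int, q16_mul and scale_to_range are made up for illustration, and I'm assuming the range mapping keeps the upper 32 bits of a 32x32 multiply, which is what a umull provides. The real get_rand_range may differ.

```cpp
#include <cstdint>

// Q16 (16.16) fixed point: the low 16 bits hold the fraction.
using q16 = int32_t;

// Convert an integer to Q16 (multiplication avoids shifting a negative value).
constexpr q16 q16_from_int(int32_t v) { return v * 65536; }

// Multiply two Q16 values. The 64-bit intermediate holds the full product,
// which is then scaled back down by 16 bits.
constexpr q16 q16_mul(q16 a, q16 b)
{
    return static_cast<q16>((static_cast<int64_t>(a) * b) >> 16);
}

// Treat rnd as a Q32 fraction in [0, 1) and return the integer part of
// rnd * range, i.e. the upper 32 bits of the 64-bit product.
constexpr uint32_t scale_to_range(uint32_t rnd, uint32_t range)
{
    return static_cast<uint32_t>((static_cast<uint64_t>(rnd) * range) >> 32);
}
```

For example, scale_to_range(0x80000000u, 10) treats the input as 0.5 and yields 5, with no floating point involved.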
Additionally, the unnecessary parts of nanoprintf have been
disabled. There is no need for precision specifiers, long longs, or
floats.
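As a config sketch, disabling those three features through nanoprintf's feature macros would look roughly like this. Where the defines live (header, build flags) is an assumption on my part, and nanoprintf expects the remaining NANOPRINTF_USE_* flags to be defined too once any of them is set. Their values aren't stated here, so they are left out.

```cpp
// Must be visible before nanoprintf.h is included with NANOPRINTF_IMPLEMENTATION.
#define NANOPRINTF_USE_PRECISION_FORMAT_SPECIFIERS 0  // no "%.Ns" / "%.Nd"
#define NANOPRINTF_USE_LARGE_FORMAT_SPECIFIERS     0  // no oversized length modifiers such as "ll"
#define NANOPRINTF_USE_FLOAT_FORMAT_SPECIFIERS     0  // no "%f", so no soft-float pulled in
// The other NANOPRINTF_USE_* flags also need a value; not shown because
// this commit message does not say what they are set to.
```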
Add a binary table format and convert the text entries into this format in text_helper/main.py. It then gets compressed with zx0.
The new text_data_table and streamed_data_table classes exist to read the various entries from this binary table. streamed_data_table specifically
exists to decompress through a buffer that is smaller than the actual binary table. However, that buffer must
still be larger than ZX0_DEFAULT_WINDOW_SIZE (default: 2048 bytes), and it can only decompress in
chunks of (<decompression_buffer_size> - <ZX0_DEFAULT_WINDOW_SIZE>) bytes.
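To make the chunk arithmetic concrete, here is a small sketch. stream_chunk_size and chunks_to_reach are hypothetical helpers, not part of streamed_data_table, and the window size is just the default quoted above.

```cpp
#include <cstddef>

// Assumed value: the default window size quoted in the text above.
constexpr std::size_t ZX0_DEFAULT_WINDOW_SIZE = 2048;

// Bytes produced per decompression step.
// Returns 0 if the buffer is too small to stream at all.
constexpr std::size_t stream_chunk_size(std::size_t buffer_size)
{
    return buffer_size > ZX0_DEFAULT_WINDOW_SIZE
               ? buffer_size - ZX0_DEFAULT_WINDOW_SIZE
               : 0;
}

// Number of decompression steps needed before the byte at `offset`
// (counted from the start of the uncompressed table) is available.
// Returns 0 if streaming is not possible with this buffer size.
constexpr std::size_t chunks_to_reach(std::size_t offset, std::size_t buffer_size)
{
    return stream_chunk_size(buffer_size) == 0
               ? 0
               : offset / stream_chunk_size(buffer_size) + 1;
}
```

With a 3072 byte buffer, each step yields 1024 bytes, so an entry starting at offset 5000 only becomes readable on the fifth step.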
Try to keep the binary text tables sufficiently small though: ZX0 doesn't support random access, so everything before an entry has to be decompressed first,
which makes getting to the last entry significantly more expensive than reading the first one. And unless you use streamed_data_table,
decompressing a table also requires <uncompressed_size> bytes of stack space, and therefore of IWRAM.
I also had to rework script_array because it can no longer reference the strings directly. Instead we now reference the DIA_* "enum" values.
We also no longer store an array of script_obj instances, because as non-const global variables these were
ending up in IWRAM. Instead we now have const arrays of script_obj_params structs, which should end up in .rodata -> therefore EWRAM.
Right now, script_obj only supports the PTGB text table (originally the dialogue array). But if the need arises to support other tables as well,
I'd consider adding a separate enum to script_obj_params to indicate the specific table.
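A minimal sketch of the const-array idea follows. The field names and the DIA_EXAMPLE_* values are invented for illustration, and the real script_obj_params layout is not shown in this message. The point is only that plain, constant-initialized data needs no runtime construction and can be placed in .rodata.

```cpp
#include <cstdint>

// Hypothetical stand-ins for the generated DIA_* identifiers.
enum : uint8_t {
    DIA_EXAMPLE_GREETING,
    DIA_EXAMPLE_FAREWELL,
    DIA_EXAMPLE_COUNT
};

// Hypothetical layout: plain data only, no constructors or pointers to strings.
struct script_obj_params {
    uint8_t text_entry_index;  // DIA_* value, looked up in the text table at run time
    uint8_t next_index;        // index of the next script step
};

// const and constant-initialized, so the toolchain can keep it out of .data/.bss.
constexpr script_obj_params example_script_params[] = {
    { DIA_EXAMPLE_GREETING, 1 },
    { DIA_EXAMPLE_FAREWELL, 0 },
};
```

If other tables ever need support, the separate enum mentioned above would presumably become one more small field in this struct.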
The compilation process will also output .su files in the build folder from now on. These files indicate the stack frame size for every function in
every compilation unit, so be sure to check them from time to time. Note that each entry only shows the stack consumption of that specific function, not of the functions it calls.
So to get the worst-case stack consumption, you need to manually add up the frame sizes of all the functions along a given call chain.
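Adding up a call chain by hand gets tedious, so here is a small host-side sketch that does it. It assumes the usual GCC .su line layout (location and function name, then the size in bytes, then a qualifier, separated by tabs). parse_su_line and chain_stack_usage are hypothetical helpers, and the call chain itself still has to be supplied manually.

```cpp
#include <cassert>
#include <cstddef>
#include <string>
#include <vector>

struct su_entry {
    std::string function;    // "file:line:column:name" as written by the compiler
    std::size_t frame_size;  // stack frame size in bytes
};

// Parse one .su line: <location:function> TAB <bytes> TAB <qualifier>.
// Returns false for lines that don't match this layout.
bool parse_su_line(const std::string& line, su_entry& out)
{
    const std::size_t tab1 = line.find('\t');
    if (tab1 == std::string::npos)
        return false;
    const std::size_t tab2 = line.find('\t', tab1 + 1);
    const std::size_t count =
        (tab2 == std::string::npos) ? std::string::npos : tab2 - tab1 - 1;
    const std::string size_text = line.substr(tab1 + 1, count);
    if (size_text.empty() ||
        size_text.find_first_not_of("0123456789") != std::string::npos)
        return false;
    out.function = line.substr(0, tab1);
    out.frame_size = std::stoul(size_text);
    return true;
}

// Sum the frame sizes of every function named in `chain`.
// A name matches the first entry whose function field ends with it;
// names without a match contribute nothing.
std::size_t chain_stack_usage(const std::vector<std::string>& su_lines,
                              const std::vector<std::string>& chain)
{
    std::size_t total = 0;
    for (const std::string& name : chain) {
        for (const std::string& line : su_lines) {
            su_entry entry;
            if (!parse_su_line(line, entry))
                continue;
            const std::string& f = entry.function;
            if (f.size() >= name.size() &&
                f.compare(f.size() - name.size(), name.size(), name) == 0) {
                total += entry.frame_size;
                break;
            }
        }
    }
    return total;
}
```

Keep in mind that entries whose qualifier is "dynamic" do not have a fixed frame size, so a sum like this is only a lower bound for chains that contain them.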