Updated Decompiling a function (markdown)

AnonymousRandomPerson 2024-08-02 21:31:31 -04:00
parent 661a5254bf
commit d6fa29b45d

@ -273,14 +273,38 @@ If `extract_function` doesn't work for you, or if you want to manually move the
## Function decomp tips
The example function in this guide shows the overall process of decompiling a function, though it doesn't cover every situation you may encounter. While it is impractical to go over every possible assembly construct and its C equivalent, here are some assorted tips.
* When working in decomp.me, any calls to other functions can be represented as extern functions, even if they are already decompiled in the decomp project. When adding the function to the decomp, you can replace these extern declarations with `#include`s as needed, or leave the externs in place for functions that have not been decompiled yet.
* Keep in mind some of the less common C constructs like (static) inline functions, ternary expressions, gotos, and C library functions like `memcpy()`. All of these may produce different assembly compared to more basic C constructs.
* If there are multiple places in a function with the same C code, the compiler may merge them into a single block of assembly and use unconditional branches to connect the different places to this block. This is known as a **tail merge**.
* Sometimes you'll match everything in the function aside from which registers are used. For example, two variables are assigned to registers `r4` and `r5` respectively, but the target assembly assigns the first variable to `r5` and the second to `r4` instead. This is known in the decomp community as a **regswap** (register swap) or **regalloc** (register allocation) issue, and is one of the more frustrating issues to run into. There are a number of possible code changes to try to fix a regswap. Note that this list is not exhaustive.
  * Ensure that the register use is actually identical in functionality. It is easy to dismiss a difference as a regswap when it is actually a value being assigned incorrectly.
  * Reuse a local variable in multiple places, or split a local variable into multiple variables.
  * Move a local variable definition elsewhere in the function.
  * Add or collapse struct/array accesses with local variables.
  * Assign macros and enums to local variables.
  * Play with the structure of conditionals and loops.
  * Surround parts of the function with no-op `do { ... } while (0)` loops.
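
As a rough illustration of the regswap tricks above, here is a minimal sketch of one function written three functionally equivalent ways. The struct, field names, and function names are hypothetical and not taken from pmd-sky. Whether any of these variants changes register allocation depends on the compiler and its flags, so treat them as things to try rather than as known fixes.

```c
#include <stdint.h>

/* Hypothetical struct, used only for illustration. */
struct entity {
    int16_t x;
    int16_t y;
    uint8_t flags;
};

/* Baseline: struct fields are read directly inside the expression. */
int32_t sum_position_v1(const struct entity *e, int32_t scale)
{
    if (e->flags & 1) {
        return (e->x + e->y) * scale;
    }
    return 0;
}

/* Variant: struct accesses are pulled out into local variables
 * ("add or collapse struct/array accesses with local variables"). */
int32_t sum_position_v2(const struct entity *e, int32_t scale)
{
    int32_t x = e->x;
    int32_t y = e->y;

    if (e->flags & 1) {
        return (x + y) * scale;
    }
    return 0;
}

/* Variant: a single result variable, a local defined closer to its use,
 * and a no-op do { ... } while (0) wrapped around the body. */
int32_t sum_position_v3(const struct entity *e, int32_t scale)
{
    int32_t result = 0;

    do {
        if (e->flags & 1) {
            int32_t sum = e->x + e->y;
            result = sum * scale;
        }
    } while (0);

    return result;
}
```

All three versions return the same value for the same input, which is the property to preserve when experimenting: a rewrite is only a regswap fix candidate if it leaves the function's behavior unchanged.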
## Permuter
Another option for dealing with regswaps is the [decomp permuter](https://github.com/simonlindholm/decomp-permuter). This program will randomly change a function with some of the regswap tricks above to try matching the function.
For pmd-sky, there are several steps to set up the permuter.
1. Clone the permuter repo and follow the `README.md` to install the required dependencies. The permuter is written in Python, so you'll need to have Python installed too.
2. Create a directory in the repo to contain the function you want to permute.
3. Add a base C file (e.g., `base.c`) to the new directory with the context and function code that you have so far. Note that the permuter is a little finicky and doesn't recognize certain C constructs like comments and complex casts, so you may need to make some code changes later.
4. Add a target assembly file (e.g., `target.s`) with the function ASM, excluding `arm_func_start` and `arm_func_end`.
5. Add a `function.txt` file containing the name of the function you are decompiling.
6. Add a `compile.sh` Bash script with the following contents. This will tell the permuter how to run the compiler (mwccarm) on the permuted functions it generates.
```
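#!/usr/bin/env bash
# Called as: compile.sh <input .c> -o <output .o>, so $1 is the source file and $3 is the object file.
cd <decomp directory>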
wine ./tools/mwccarm/2.0/sp2p2/mwccarm.exe -O4,s -DPM_KEEP_ASSERTS -DSDK_ARM9 -DSDK_CODE_ARM -DSDK_FINALROM -enum int -lang c99 -Cpp_exceptions off -gccext,on -proc arm946e -msgstyle gcc -gccinc -interworking -inline on,noauto -char signed -W all -W pedantic -W noimpl_signedunsigned -W noimplicitconv -W nounusedarg -W nomissingreturn -W error -gccdep -MD -c -o $3 $1
```
7. Run the `compile.sh` script on the base C file with the arguments `<base C file> -o <base object file>`. Name the object file the same as the base C file, except with a `.o` extension (e.g., `base.o`). Since the script has a `cd` command in it (required by mwccarm), pass in the absolute paths of both the `.c` and `.o` files.
8. Assemble the target assembly file with the command `arm-none-eabi-as -mthumb -march=armv5te <target assembly file> -o <target object file>`. As with the compile command, name the target object file the same as the target assembly file, except with a `.o` extension (e.g., `target.o`).
9. Run the permuter with `permuter.py <directory> --stop-on-zero`, where `<directory>` is the directory you created in step 2.
The permuter will run until it finds a function permutation that matches the target assembly. It will also output any permutations that are a closer match to the target than your base C code. There is no guarantee that it will find a match, but it is worth a try if you are having trouble with a regswap or any other ASM difference that is functionally equivalent.
Note that the permuter only makes changes that are functionally equivalent to the base function. If your base function code has a bug that produces different behavior to the ASM you are matching, the permuter will not fix it.
## Asking for help
If you have trouble matching a function, you can ask for help on the pret Discord's #asm2c channel. Post the link to your decomp.me scratch on the channel, and other people can fork the scratch to experiment on their own. You'll see a notification on decomp.me if anyone successfully matches your function. You can also browse through previously matched functions for inspiration on tricks used by others to produce matching assembly. As you continue to decompile more, you can try helping others in the channel, which in turn will help you practice and gain exposure to the nuances of the decompilation process.