Detected by the previous commit. We had forgotten to make sure the
Vertex Loader's table of normal sizes is initilzied, resulting in
the wrong offsets/sizes being used for vertices with normals.
It wasn't working, I'm not really sure why.
Since #2997 we rely on video common to mark efb copies 'written'
during recording, and for old dffs we just ignore the bad texture
while playing back in the texture cache.
We actually discovered a bug while combining the two functions with
FifoRecordAnalzyer's vertex array loading code. If per-vertex
postion or texture matrices were enabled and vertex arrays in use
then the wrong data would be used to calculate the minimum/maxmium
indices, which would result in either too much or too little vertex
data being included in the dff.
So this commit also increments the dff version number, so we can
identify old broken dffs later.
This 'absolutely not' a read of uninitialized memory is '100% not'
the cause of our non-deterministic behavior on the Intel NUC.
If there was a such an error, it would show up on all FifoCi
backends equally. This is 'probably' unrelated to the fact that
the Intel NUC is the only fifoci runner not running under virtualization.
Texture updates have been moved into TextureCache, while
TMEM updates where moved into bpmem. Code for handling
efb2ram updates was added to TextureCache.
There was a bug for preloaded RGBA8 textures, it only copied
half the texture. The TODO was wrong too.
The PowerPC CPU has bits in MSR (DR and IR) which control whether
addresses are translated. We should respect these instead of mixing
physical addresses and translated addresses into the same address space.
This is mostly mass-renaming calls to memory accesses APIs from places
which expect address translation to use a different version from those
which do not expect address translation.
This does very little on its own, but it's the first step to a correct BAT
implementation.
This is required to make packing consistent between compilers: with u32, MSVC
would not allocate a bitfield that spans two u32s (it would leave a "hole").
This isn't technically the correct place to have the downcount variable, but it is similar to what PPSSPP does to gain a bit of extra speed on ARM.
We access this variable quite a bit, with each exit in a block it is subtracted from.
On ARM this required four instructions to load and store the value, while now it only requires two.
This gives an average of 1FPS gain to most games.
Examples:
Crazy Taxi: 54FPS -> 55FPS
Luigi's Mansion: 20FPS -> 21FPS
Wind Waker(Save Screen): 27FPS -> 28FPS
This seems to average a 6mhz to 16mhz CPU core emulation improvement in the few games I've tested.
- remove unused variables
- reduce the scope where it makes sense
- correct limits (did you know that strcat()'s last parameter does not
include the \0 that is always added?)
- set some free()'d pointers to NULL