This was added in #10394 for both the hardware and software backends to work around an issue with Mario Kart Wii, Fortune Street, and Baten Kaitos. However, it seems like the software renderer handles blending well enough that we don't need this (and in any case, it's easy to change blending in the software renderer).
Some experimentation with #11387 (not pushed) showed that the software renderer's logic would also produce correct results on the hardware backends with this hack removed, but would require fbfetch (currently); if a better solution is found the hack can also be removed from the hardware backends.
Otherwise, texelFetch() will use an out-of-bounds layer for game textures (that have 1 layer; EFB copies have 2 layers in stereoscopic 3D mode), which is undefined behavior (often resulting in a black image). The fast texture sampling path uses texture(), which always clamps (see https://www.khronos.org/opengl/wiki/Array_Texture#Access_in_shaders), so it was unaffected by this difference.
The former is deprecated and pretty much all modern drivers
support VK_EXT_debug_utils.
Android drivers don't support it. On those drivers,
we use the implementation provided by the validation layers.
Plus two miscellaneous debugger features that I found along the way when
reading Jit64's code for comparison: bJITNoBlockLinking and tracing.
Fixes https://bugs.dolphin-emu.org/issues/13127.
Small optimization. By not calling WriteExit, the block linking system
never finds out about the exit we're doing, saving us from having to
disable block linking.
We should expose Enable Controller Input and the turbo settings for
GBA just like we do for GameCube controllers and Wii Remotes.
I just forgot about it when implementing the GBA TAS input window.
Previously, if a user on Windows launched Dolphin from the command line
and specified a path to an M3U file and included backslashes in this path,
Dolphin would fail to resolve relative paths in the M3U file.
The calculation of each address in lmw/stmw currently has a dependency
on the calculation of the previous address. By removing this dependency,
the host CPU should be able to pipeline the loads/stores better. The cost
we pay for this is up to one extra register and one extra MOV instruction
per guest instruction, but often nothing.
Making EmitBackpatchRoutine support using any register as the address
register would let us get rid of the MOV, but I consider that to be too
big of a task to do in one go at the same time as this.
Now that we've flipped the C++20 switch, let's start making use of
the nice new <bit> header.
I'm planning on handling this move away from BitUtils.h incrementally
in a series of PRs. There may be a few functions remaining in
BitUtils.h by the end that C++20 doesn't have any equivalents for.
This reverts commit 351d095fff.
In hindsight, my attempted optimization messes with the return
predictor, unlike real tail calls. So I think it does more harm than
good.
The "vector shift by immediate" category encodes the shift amount for
right shifts as `size - amount`, whereas left shifts use `amount`.
We're not actually using SHRN/SHRN2 anywhere, which is why this has gone
undetected.
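For reference, here is a minimal sketch of the architectural encoding rule.
It models the 7-bit immh:immb field as I understand it from the AArch64
documentation, not the actual Arm64Emitter code; the function names are
hypothetical. `esize` is the element size in bits (for narrowing shifts
such as SHRN, the size of the destination elements).

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helpers modelling the immh:immb field of the AArch64
// "vector shift by immediate" encodings.

// Left shifts (e.g. SHL) store esize + amount; valid amounts are 0..esize-1.
constexpr uint32_t EncodeLeftShift(uint32_t esize, uint32_t amount)
{
  return esize + amount;
}

// Right shifts (e.g. SSHR, SHRN) store 2 * esize - amount, i.e. the amount is
// encoded as "esize - amount" on top of the esize marker bit. Valid amounts
// are 1..esize.
constexpr uint32_t EncodeRightShift(uint32_t esize, uint32_t amount)
{
  return 2 * esize - amount;
}
```

Mixing the two up produces a valid-looking encoding with the wrong shift
amount, which is easy to miss when nothing uses the instruction.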
Use: callstack(0x80000000).
!callstack(value) works as a 'does not contain'.
Add strings to expr.h conditionals.
Use quotations: callstack("anim") to check symbols/name.
For quite some time now, we've had a setting on x86-64 that makes Dolphin
handle NaNs in a more accurate but slower way. There's only one game that
cares about this, Dragon Ball: Revenge of King Piccolo, and what that game
cares about more specifically is that the default NaN (or "generated NaN"
as I believe it's called in PowerPC documentation) is the same as on
PowerPC. On ARM, the default NaN is the same as on PowerPC, so for the
longest time we didn't need to do anything special to get Dragon Ball:
Revenge of King Piccolo working. However, in 93e636a I changed how we
handle FMA instructions in a way that resulted in the sign of NaNs
becoming inverted for nmadd/nmsub instructions, breaking the game.
To fix this, let's implement the AccurateNaNs setting, like on x86-64.
This affected the memory and registers widgets (and possibly others). I'm pretty sure it regressed in 5f629abd8b.
The SetCodeVisible line is a new fix, but the equivalent already existed in the memory widget.
The call to analyzer.Analyze breaks when it attempts to read an instruction, as it eventually tries to read memory when Memory::m_pRAM is nullptr. Trying to read when execution is not paused in general seems like a bad idea (especially as analyzer.Analyze uses PowerPC::TryReadInstruction which can update icache - this is probably still a problem).
Operations that have two operands and can't generate a default NaN,
i.e. addition and subtraction, already have the desired NaN handling
on x86. We just need to make sure to not reverse the operands.
This fixes ps_sum0/ps_sum1 outputting NaNs in cases where they shouldn't.
(HandleNaNs assumes that a NaN in a ps0 input always results in a NaN in
the ps0 output, and correspondingly for ps1.)
1. In some cases, ps_merge01 can be implemented using one instruction.
2. When we need two instructions for ps_merge01, it's best to start with
a MOV to avoid false dependencies on the destination register.
3. ps_merge10 can be implemented using a single EXT instruction.
This regressed in 0a906f553f, I think (though I haven't confirmed it). Mario Tennis and Luigi's Mansion both use these for some reason (as far as I can tell, the data isn't actually used; it's just extra data included for no reason).
DataReader is generally jank - it has a start and end pointer, but the end pointer is generally not used, and all of the vertex loaders mostly bypassed it anyways.
Wrapper code (the vertex loader test, as well as Fifo.cpp and OpcodeDecoding.cpp) still uses it, as does the software vertex loader (which is not a subclass of VertexLoader). These can probably be eliminated later.
This new function is like MOVP2R, except it masks out the lower 12 bits,
returning them instead of writing them to the register. These lower
12 bits can then be used as an offset for LDR/STR. This lets us turn
ADRP+ADD+LDR sequences with a zero offset into ADRP+LDR sequences with
a non-zero offset, saving one instruction.
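The split itself is just a mask. A minimal sketch of the idea, with
hypothetical names (this is not the emitter function itself, and it doesn't
check that the offset is suitably aligned for the access size, which the
scaled LDR/STR immediate form requires):

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical illustration: the page part would be materialized in a
// register, while the low 12 bits are returned for use as a load/store offset.
struct PageAndOffset
{
  uint64_t page;    // address with the lower 12 bits cleared
  uint32_t offset;  // the lower 12 bits, 0..4095
};

constexpr PageAndOffset SplitAddress(uint64_t address)
{
  return {address & ~uint64_t{0xFFF}, static_cast<uint32_t>(address & 0xFFF)};
}
```

Adding `page` and `offset` back together always gives the original address,
so the load/store ends up accessing the same location as before.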
When emulated GBAs were added to Dolphin, it was possible to control them
using the GC TAS input window. (Z was mapped to Select.) Unaware of this,
I broke the functionality in b296248.
To make it possible to control emulated GBAs using TAS input again,
I'm adding a proper TAS input window for GBAs, with a real Select button
and no analog controls.
0e02ddcf52 removed separate logic for tiled versus non-tiled EFB peek caches, and as part of that made it so that color peeks updated the frame access mask even when a non-tiled cache is in use. However, the same change was not made for depth peeks. I'm not sure if this affected anything in practice.
`ImGui::GetIO` performs an assertion that a context exists, and if one doesn't then things will likely crash. Unfortunately this crash is hard to consistently reproduce.
I recently talked to a homebrew developer who was trying to add exception
handlers at link time but found out that Dolphin was overwriting their
exception handlers. I figure that's not the usual way to do exception
handlers, but... making us load the executable after setting up memory
rather than before is easy, and matches what we do when booting discs,
so I suppose there's no reason not to do it. It also matches the intent
of why Dolphin is writing default exception handlers – we're writing
them because some homebrew relies on exception handlers being left
around from whatever program was running before it (see 3dd777be70).
Let's take advantage of ARM64's input register shifting one last time,
shall we?
Before:
0x1280005b mov w27, #-0x3
0x1b1b7f18 mul w24, w24, w27
After:
0x4b180b18 sub w24, w24, w24, lsl #2
ARM64's flexible shifting of input registers also allows us to calculate
a negative power of two in one instruction; shift the input of a NEG
instruction.
Before:
0x128001f7 mov w23, #-0x10
0x1b1a7efa mul w26, w23, w26
0x93407f58 sxtw x24, w26
After:
0x4b1a13fa neg w26, w26, lsl #4
0x93407f58 sxtw x24, w26
If the destination register doesn't equal the input register, using it
to temporarily hold the immediate value is fair game as it'll be
overwritten with the result of the multiplication anyway. This can
slightly reduce register pressure.
Before:
0x52800659 mov w25, #0x32
0x1b197f5b mul w27, w26, w25
After:
0x5280065b mov w27, #0x32
0x1b1b7f5b mul w27, w26, w27
By taking advantage of ARM64's ability to shift an input register by any
amount, we can calculate multiplication by a number that is one more
than a power of two with a single instruction.
Before:
0x52800838 mov w24, #0x41
0x1b187f7b mul w27, w27, w24
After:
0x0b1b1b7b add w27, w27, w27, lsl #6
Turn multiplications by a power of two into bitshifts.
Before:
0x52800817 mov w23, #0x40
0x1b167ef6 mul w22, w23, w22
After:
0x531a66d6 lsl w22, w22, #6
Multiplication by one is also trivial. Depending on the registers
involved, either a single MOV or no instructions will be generated.
Before:
0x52800038 mov w24, #0x1
0x1b1a7f1b mul w27, w24, w26
After:
0x2a1a03fb mov w27, w26
Before:
0x52800039 mov w25, #0x1
0x1b1a7f3a mul w26, w25, w26
After:
Nothing!
Add a new function that will handle all the special cases regarding
multiplication. It does nothing for now, but will be expanded in
follow-up commits.
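The special cases in this series can be modelled roughly as follows. This is
a sketch under my own naming (MulStrategy, PlanMultiply and ApplyPlan are
hypothetical and not the JIT's function), working on plain 32-bit integers
instead of emitting code.

```cpp
#include <cassert>
#include <cstdint>

enum class MulStrategy
{
  Move,        // imm == 1:        mov rd, rn (or nothing if rd == rn)
  ShiftLeft,   // imm == 2^n:      lsl rd, rn, #n
  AddShifted,  // imm == 2^n + 1:  add rd, rn, rn, lsl #n
  NegShifted,  // imm == -(2^n):   neg rd, rn, lsl #n
  SubShifted,  // imm == 1 - 2^n:  sub rd, rn, rn, lsl #n
  GenericMul,  // anything else:   mov + mul
};

struct MulPlan
{
  MulStrategy strategy;
  int shift;
};

constexpr bool IsPow2(uint32_t v)
{
  return v != 0 && (v & (v - 1)) == 0;
}

constexpr int Log2(uint32_t v)
{
  int n = 0;
  while (v >>= 1)
    ++n;
  return n;
}

// Picks a strategy for multiplying a register by the immediate imm.
constexpr MulPlan PlanMultiply(uint32_t imm)
{
  if (imm == 1)
    return {MulStrategy::Move, 0};
  if (IsPow2(imm))
    return {MulStrategy::ShiftLeft, Log2(imm)};
  if (IsPow2(imm - 1))
    return {MulStrategy::AddShifted, Log2(imm - 1)};
  if (IsPow2(uint32_t{0} - imm))
    return {MulStrategy::NegShifted, Log2(uint32_t{0} - imm)};
  if (IsPow2(uint32_t{1} - imm))
    return {MulStrategy::SubShifted, Log2(uint32_t{1} - imm)};
  return {MulStrategy::GenericMul, 0};
}

// Evaluates the plan the way the generated instructions would (mod 2^32).
constexpr uint32_t ApplyPlan(MulPlan plan, uint32_t x, uint32_t imm)
{
  switch (plan.strategy)
  {
  case MulStrategy::Move:
    return x;
  case MulStrategy::ShiftLeft:
    return x << plan.shift;
  case MulStrategy::AddShifted:
    return x + (x << plan.shift);
  case MulStrategy::NegShifted:
    return uint32_t{0} - (x << plan.shift);
  case MulStrategy::SubShifted:
    return x - (x << plan.shift);
  case MulStrategy::GenericMul:
    break;
  }
  return x * imm;
}
```

For every immediate, ApplyPlan(PlanMultiply(imm), x, imm) should equal
x * imm in wrapping 32-bit arithmetic.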
We can merge an SXTW with the SUB, eliminating one instruction. In
addition, it is no longer necessary to allocate a temporary register,
reducing register pressure.
Before:
0x93407f59 sxtw x25, w26
0x93407ebb sxtw x27, w21
0xcb1b033b sub x27, x25, x27
After:
0x93407f5b sxtw x27, w26
0xcb35c37b sub x27, x27, w21, sxtw
ARM64 can perform various types of sign and zero extension on a
register value before using it. The Arm64Emitter already had support for
this, but it was kinda hidden away.
This commit exposes the functionality by making the ExtendSpecifier enum
available everywhere and adding a new ArithOption constructor.
[ VUID-VkDescriptorPoolCreateInfo-maxSets-00301 ] Object 0:
handle = 0x7f1b8d3cde70, type = VK_OBJECT_TYPE_DEVICE; |
MessageID = 0xa170e236 | vkCreateDescriptorPool():
pCreateInfo->maxSets is not greater than 0.
The Vulkan spec states: maxSets must be greater than 0
BindFramebuffer depends on the pipeline which might not be set yet.
That's why the framebuffer dirty flag exists in the first place.
I assume BindFramebuffer was called directly here, in order to handle
the texture state transitions necessary for DiscardResource.
The state is tracked anyway, so we can just issue those transitions there
too and defer binding the actual framebuffer.
Fixes an issue in Zelda Twilight Princess with EFB depth peeks.
Dolphin would bind a frame buffer which doesn't have an integer format
descriptor for the color target before binding the new pipeline.
So it would accidentally use the 0 descriptor.
Debug layer error:
D3D12 ERROR: ID3D12CommandList::OMSetRenderTargets:
Specified CPU descriptor handle ptr=0x0000000000000000 does not refer to
a location in a descriptor heap. pRenderTargetDescriptors[0] is the issue.
[ EXECUTION ERROR #646: INVALID_DESCRIPTOR_HANDLE]
Fixes the following warning in the D3D12 debug layer:
D3D12 WARNING: ID3D12Device::CreateCommittedResource:
Ignoring InitialState D3D12_RESOURCE_STATE_UNORDERED_ACCESS.
Buffers are effectively created in state D3D12_RESOURCE_STATE_COMMON.
[ STATE_CREATION WARNING #1328: CREATERESOURCE_STATE_IGNORED]
Fixes the following error in the D3D12 debug layer:
D3D12 ERROR: ID3D12DescriptorHeap::GetGPUDescriptorHandleForHeapStart:
GetGPUDescriptorHandleForHeapStart is invalid to call on a descriptor
heap that does not have DESCRIPTOR_HEAP_FLAG_SHADER_VISIBLE set.
If the heap is not supposed to be shader visible, then
GetCPUDescriptorHandleForHeapStart would be the appropriate method
to call. That call is valid both for shader visible and non shader
visible descriptor heaps.
[ STATE_GETTING ERROR #1315: DESCRIPTOR_HEAP_NOT_SHADER_VISIBLE]
When searching for a disc where the revision doesn't match any disc in
the datfile, the loop would never get to the part where serials_exist is
set to true, leading to a bogus error message.
Because of the previous commit, `regs_in_use` must not include `dest_reg`
when calling MMIOLoadToReg. There are also some other registers we can
skip including in regs_in_use just for efficiency's sake.
The `addr_reg_set = false` statements that I've added in this commit are
technically redundant – if `mmio_address` is non-zero then `addr_reg_set`
is already false – but it's just a coincidence that that's the case.
I originally added these in 2b1d1038a6, for both the TPipelineFunction and the size. The size was moved into the header in fdcd2b7d00 (making the size functions obsolete), but it seems that the functions themselves are no longer needed now.
I think I didn't use this approach before because it would have required ComponentFormatTable and ComponentCountRow to be templated, which would end up resulting in lines that were too long and thus wrapped in awkward places. (I *think* they didn't get inferred properly.) Now that we only need TPipelineFunction, the templating is not needed, and this ends up being a more readable version of the version with the wrapper functions.
The old calculation was stride * (max_index + 1), which fails if stride is less than the size of a component. For instance, if float XYZ positions are used and the stride was set to 4 (i.e. sizeof(float)) instead of 12 (i.e. 3 * sizeof(float)), it would be missing the last 8 bytes of the final element in the array. Or, if stride was set to 0, then no bytes would be recorded at all (though that's not a useful configuration so it's unlikely to actually exist).
I'm not aware of any games affected by this issue.
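To illustrate the difference, here is a small sketch. OldArraySize is the
formula quoted above; NeededArraySize is my assumption of what a corrected
calculation looks like (offset of the last element plus its full size), and
both names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>

// The old calculation: assumes each element fits inside one stride.
constexpr uint32_t OldArraySize(uint32_t stride, uint32_t max_index)
{
  return stride * (max_index + 1);
}

// Assumed corrected form: the last element starts at stride * max_index and
// occupies element_size bytes, regardless of how small the stride is.
constexpr uint32_t NeededArraySize(uint32_t stride, uint32_t max_index, uint32_t element_size)
{
  return stride * max_index + element_size;
}
```

With float XYZ positions (12 bytes), a stride of 4 and max_index 9, the old
formula gives 40 bytes while 48 are needed, which is the 8 missing bytes
mentioned above. With a stride of 12 the two agree.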
This should fix recording the wall in the staircase leading to the basement in Luigi's Mansion (though I haven't tested it, as I don't own a copy of Luigi's Mansion). This uses NormalIndex3, and the index for the normal vector (generally 0x02XX or 0x01XX) there is always lower than the tangent or binormal (generally 0x07XX). Other games seem to usually have a similar range of indices for the normal, tangent, and binormal, so this issue wouldn't affect them.
In most cases, games will use the same type for all vertex components (either Index8 or Index16 or Direct). However, RS2's deflection towers use Index16 for the texture coordinate and Index8 for everything else, meaning the texture coordinates were recorded incorrectly (the first byte was used, so only indices 0 and 1 were recorded instead of 0 through 0x0192). Worse still, some background elements in RS2 use direct positions but indexed normals or texture coordinates, and those would not be recorded at all.
This is a regression from b5fd35f951.
`count` is the number of stereo samples to write (where each stereo sample is two shorts), while `BUFFER_SIZE` is the size of the buffer in shorts. So `count` needs to be multiplied by `2`, not `BUFFER_SIZE`. Also, when this check failed, the previous code just clobbered whatever was past the end of the buffer after logging the warning, which corrupted `basename`, eventually resulting in Dolphin crashing.
This affected Datel's Wii-compatible Action Replay, which uses a block size of 2298, or 18384 stereo samples, which is 36768 shorts, which is bigger than the buffer size of 32768. (However, the previous commit means that only one block is transferred at a time, eliminating this issue; fixing the bounds check is just a general safety thing instead of an actual bugfix now.)
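A sketch of the two checks with the numbers from above. The exact shape of
the old condition is my reading of the description, and the function names
are hypothetical.

```cpp
#include <cassert>
#include <cstddef>

constexpr std::size_t BUFFER_SIZE = 32768;  // size of the buffer in shorts

// Assumed form of the old check: BUFFER_SIZE was doubled instead of count.
constexpr bool OldCheckPasses(std::size_t count)
{
  return count <= BUFFER_SIZE * 2;
}

// Corrected check: count stereo samples occupy count * 2 shorts.
constexpr bool FitsInBuffer(std::size_t count)
{
  return count * 2 <= BUFFER_SIZE;
}
```

For 18384 stereo samples the old check passes even though 36768 shorts don't
fit, while the corrected one rejects the write.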
The previous implementation of Force25BitPrecision was essentially a
translation of the x86-64 implementation. It worked, but we can make a
more efficient implementation by using an AArch64 instruction I don't
believe x86-64 has an equivalent of: URSHR. The latency is the same as
before, but the instruction count and register count are both reduced.
The new `dispatcher_no_timing_check` is the same as `dispatcher_no_check`
except it includes the "stepping check" in debug mode. This lets us avoid
the `m_enable_debugging ? dispatcher : dispatcher_no_check` dance.
Maybe "tail call" isn't quite the right term for what this code
is doing, since it's jumping to the dispatcher rather than
returning, but it's the same optimization as for a tail call.
fregsIn will include FD for double-precision instructions, since for
dependency tracking purposes the instruction does read the upper
half of FD. This is not what we want in HandleNaNs.
The consequence of this bug is that if an instruction was supposed to
output a NaN and FD happens to contain a NaN and FD happens to be the
same register as an unused register in the instruction encoding, the
NaN in FD could get used as the output instead of the correct NaN.
This isn't known to affect any games, which isn't especially surprising
considering that there's only one game that needs AccurateNaNs anyway.
Jumping to `dispatcher` requires first subtracting the downcount,
otherwise `dispatcher` may unpredictably jump to CoreTiming::Advance,
which could break determinism compatibility with JitArm64. We should
jump to `dispatcher_no_check` instead.
The breakpoint check in Jit.cpp makes it redundant.
Normally this redundant check doesn't cause any issues, but if you
create a breakpoint and enable logging without breaking, you get two
log messages if the breakpoint is at the beginning of a block. See
https://bugs.dolphin-emu.org/issues/13044.
This is also a tiny performance improvement for when debugging is
active, since we no longer check for breakpoints for blocks that never
had any breakpoints to begin with.
Nothing currently uses it. It could theoretically be replaced with fmt support, but I don't think the LOG_VULKAN_ERROR macro is that useful and it'd be better to replace it with regular logging instead.
base is an unsigned variable, so we can make things a little more
consistent by making the loop index unsigned so we aren't doing bit
arithmetic with signed types.
MemoryInterface already does this, so we can leave it alone.
No behavioral changes, just a consistency thing.
Rather than marking some parts of VertexLoaderManager dirty in some places and some in others, do it all in VideoState. Also, since CPState no longer contains pointers/non-CP data after d039b1bc0d, we can just use p.Do on it instead of manually saving each field.
Micro-optimization. Some CPUs can fuse CMP+B, TST+B, arith+CBZ, etc.
I also moved things around for CMP+CSET and TST+CSET - which I'm not sure
if any CPUs support - but it doesn't hurt anything, so I might as well.
Improves accuracy but isn't known to affect any games.
This turned out to be fairly convenient to implement; ORing with the
PPC default NaN will quieten SNaNs and do nothing to QNaNs.
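The trick works on the raw bit pattern. A minimal sketch operating on 64-bit
integers rather than on JIT registers (the helper names are hypothetical):

```cpp
#include <cassert>
#include <cstdint>

constexpr uint64_t PPC_DEFAULT_NAN = 0x7FF8000000000000ULL;
constexpr uint64_t EXPONENT_MASK = 0x7FF0000000000000ULL;
constexpr uint64_t MANTISSA_MASK = 0x000FFFFFFFFFFFFFULL;
constexpr uint64_t QUIET_BIT = 0x0008000000000000ULL;

constexpr bool IsNaN(uint64_t bits)
{
  return (bits & EXPONENT_MASK) == EXPONENT_MASK && (bits & MANTISSA_MASK) != 0;
}

constexpr bool IsSNaN(uint64_t bits)
{
  return IsNaN(bits) && (bits & QUIET_BIT) == 0;
}

// For a NaN input, the exponent bits are already all ones, so the OR only
// sets the quiet bit. The sign and the payload are left untouched.
constexpr uint64_t QuietenNaN(uint64_t nan_bits)
{
  return nan_bits | PPC_DEFAULT_NAN;
}
```

Note that this is only meaningful when the input is already known to be a
NaN; ORing a non-NaN value would turn it into one.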
This existed in the initial megacommit (though I don't know why) as IO_SIZE. It was used in Memmap's Init() to compute totalMemSize, but I don't know if it actually did anything then. That use was removed in 2d0f714546, but the constant persisted until cc858c63b8, when it became a static variable.
This was added in 385d8e2b15, but became somewhat redundant with Do in 4c7bbd96e4, and completely redundant now that std::is_trivially_copyable_v is well-supported.
This is the first step of getting rid of the controller indirection
on Android. (Needing a way for touch controls to provide input
to the emulator core is the reason why the controller indirection
exists to begin with as far as I understand it.)
This lets the TAS input code use a higher-level interface for
overriding inputs instead of having to fiddle with raw bits.
WiiTASInputWindow in particular was messy with how much
controller code it had to re-implement.
Fixes a Rogue Squadron II regression from 9d73583.
This set_dirty stuff is pretty tricky to reason about. I thought I
was clever when coming up with set_dirty, but maybe I was too clever
for my own good...
In case the register we're binding is the same as the immediate register,
we should fetch the immediate before calling BindToRegister. The way
the register cache currently works, calling GetImm after BindToRegister
actually does work, but it's better to not rely on it.
Avoid waiting for earlier submissions when we flush more often.
The vertex manager will flush more often if the game accesses the EFB
on the CPU, to give the GPU a head start.
Before, only the symbols box would update. However, if you edit the symbol of a function in the call stack (which seems like something that would happen reasonably often while debugging), the call stack would be out of date until it was updated by clicking on it. Callers and calls were more of an edge case; for them to be out of date, you would need to right-click on an instruction in a function other than the one containing the currently-selected instruction (though it would also affect recursive functions).
Tested on an official DOL-014 (251 blocks) memory card by executing the
0xf4 command on a card with content along its entire length and then
dumping the whole card: it reads as 0xff all the way through.
Therefore, the current implementation is already consistent with hardware.
This reverts commit fb265b610d.
The optimization in that commit is safe when the executor thread is
writing and the GUI thread is reading, but I had failed to take into
account that it's unsafe when the GUI thread is writing and the executor
thread is reading. (The native UpdateAdditionalMetadata function loops
through m_cached_files, which is unsafe if another thread is adding
elements to m_cached_files simultaneously.)
Losing out on this optimization isn't too bad, because
719930bb39 makes it very unlikely that
both threads will want the lock at the same time.
Texture dumping can already be done using VideoCommon's system (and in fact the same setting already enabled *both* of these). Dumping objects/TEV stages/texture fetches doesn't currently have an equivalent, but could be added to the FIFO player instead.
A (partial) port of #9481 to ARM64. This commit adds special cases for
immediate values equal to 0 or 0xFFFFFFFF, allowing for more efficient
or no code to be generated.
When a guest register is an immediate, it may be necessary to move this
value into a register. This is handled by gpr.R(), which lacks context
on how the register will be used. This leads to cases where the
immediate is written to a register, only for it to be overwritten. Take
for example this code generated by srwx:
0x5280031b mov w27, #0x18
0x53187edb lsr w27, w22, #24
gpr.BindToRegister() does have this context through the do_load
parameter, but didn't handle immediates. By adding this logic, we can
intelligently skip the write when do_load is false.
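As a toy model of the idea (this is not the real register cache; GuestReg and
ToyRegCache are hypothetical, and the host register is hardcoded), the
immediate only gets materialized when the caller says it will read the old
value:

```cpp
#include <cassert>
#include <cstdint>
#include <string>
#include <vector>

// Hypothetical, heavily simplified guest register state.
struct GuestReg
{
  bool is_imm = false;
  uint32_t imm = 0;
};

struct ToyRegCache
{
  std::vector<std::string> emitted;  // pseudo-assembly, for illustration only

  // do_load == true:  the caller will read the current value, so an immediate
  //                   has to be written to the host register first.
  // do_load == false: the caller will overwrite the register, so the write
  //                   would be dead and is skipped.
  void BindToRegister(GuestReg& reg, bool do_load)
  {
    if (reg.is_imm && do_load)
      emitted.push_back("mov w27, #" + std::to_string(reg.imm));
    reg.is_imm = false;
  }
};
```

In the srwx example above, the destination is bound with do_load set to
false, so only the LSR remains.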
Because of the previous commit, this is needed to stop DolphinQt from
forgetting that the user pressed ignore whenever any part of the config
is changed.
This commit also changes the behavior a bit on DolphinQt: "Ignore for
this session" now applies to the current emulation session instead of
the current Dolphin launch. This matches how it already worked on
Android, and is in my opinion better because it means the user won't
lose out on important panic alerts in a game because they played another
game first that had repeated panic alerts that they wanted to ignore.
For Android, this commit isn't necessary, but it makes the code cleaner.
Not doing so produces a warning in clang:
ISO C++20 considers use of overloaded operator '!=' (with operand types
'Metal::DepthStencilSelector' and 'Metal::DepthStencilSelector') to be
ambiguous despite there being a unique best viable function with
non-reversed arguments
The underlying reason for this warning is an incorrect method signature.
* 'hangle' was a typo
* Light colors include an alpha value, so they should be 8 characters, not 6
* The XF command format adds 1 to the count internally (so 0 is one word), but we need to subtract that back to produce a valid command
* XFMEM_POSTMATRICES was calculating the row by subtracting XFMEM_POSMATRICES (POS vs POST), resulting in incorrect row numbering
It stores both the konst selection value for alpha and color channels (for two tev stages per ksel), and half of a swap table row (there are 4 total swap tables, which can be used for swizzling the rasterized color and the texture color, and indices selecting which tables to use are stored per tev stage in the alpha combiner). Since these are indexed very differently, the old code was hard to follow.
The masking was incorrect. This affects the main menu of The Last Avatar, though that menu also relies on copy filter functionality that is not correctly handled in the software renderer so the difference is not obvious; that game shuffles textures across all indices for some reason, so this issue would presumably result in subtle flickering.
Per https://en.cppreference.com/w/cpp/preprocessor/replace#.23_and_.23.23_operators the `##` behavior is a nonstandard extension; this extension seems to be supported by all compilers we care about, but IntelliSense in Visual Studio doesn't correctly handle it, resulting in false errors in the IDE (but not when compiling).
Per https://en.cppreference.com/w/cpp/preprocessor/replace#Function-like_macros C++20 introduced a workaround, where `__VA_OPT__(, )` generates a comma if and only if `__VA_ARGS__` is non-empty.
This PR replaces all occurrences, with the exception of Externals, DSPSpy (which is not likely to be edited in MSVC and does not target C++20 currently), and JitArm64_Integer.cpp (which uses `Function(__VA_ARGS__)`, and thus does not ever need a comma).
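A minimal sketch of the pattern, using a hypothetical variadic function in
place of our logging macros (needs C++20, or a compiler that accepts
`__VA_OPT__` in earlier modes as an extension):

```cpp
#include <cassert>
#include <cstddef>

// Hypothetical stand-in for a printf-style function: it just reports how many
// arguments followed the format string.
template <typename... Args>
constexpr std::size_t CountExtraArgs(const char* /*format*/, Args&&...)
{
  return sizeof...(Args);
}

// __VA_OPT__(, ) expands to a comma only when __VA_ARGS__ is non-empty, so no
// trailing comma is left behind when only a format string is passed.
#define COUNT_EXTRA_ARGS(format, ...) CountExtraArgs(format __VA_OPT__(, ) __VA_ARGS__)

inline std::size_t DemoWithoutArgs()
{
  return COUNT_EXTRA_ARGS("no arguments");
}

inline std::size_t DemoWithArgs()
{
  return COUNT_EXTRA_ARGS("%d %d", 1, 2);
}
```

The same macro written as `format, __VA_ARGS__` would fail to compile in the
first case because of the dangling comma.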
Fixes https://bugs.dolphin-emu.org/issues/13017. With uCode switching, the existing instance of AXUCode is re-activated when GBAUCode is done, but if the state remains as WaitingForNextTask, it won't be able to do anything. Instead, it needs to be in WaitingForCmdListSize.
(When the AX uCode is resumed, startpc is set to 0x0030, at least for 0x07f88145; this is the same location as MAIL_RESUME jumps to, so DSP_RESUME should be sent when the resuming happens; that's already handled by AXUCode::Update.)
walking the zip prevents minizip from re-reading the same
data repeatedly from the actual backing filesystem.
also improves most usages of minizip to allow for >4GB
files, although we probably don't need it
dir_path is used by PanicAlertFormatT, which prior to PR 10209 used a
lambda. Before C++20, referring to structured bindings in lambda captures
was forbidden. The problem is now doubly fixed, so put the structured
binding back in.
Fixes the Dolphin bug mentioned in
https://github.com/dolphin-emu/hwtests/issues/45.
Because this doesn't fix any observed behavior in games (no, 1080°
Avalanche isn't affected), I haven't implemented this in the JITs,
so as to not cause unnecessary performance degradations.
This was causing a bug in the rounding of paired single multiplication
operands. If Force25BitPrecision was called for quad registers, the
element size of its ADD instruction would get treated as if it was 16
instead of the intended 64, which would cause the result of the
calculation to be incorrect if the carry had to pass a 16-bit boundary.
Fixes one of the two bugs reported in
https://bugs.dolphin-emu.org/issues/12998.
This command does not upload the MAIN buffers to CPU memory. This was
functionally fixed in f11a40f858 without
updating the comments and variable names.
Prior to 7854bd7109, this was used by the debugger for the OpenGL and D3D9 plugins to control logging (via PRIM_LOG and INFO_LOG/DEBUG_LOG in VideoCommon code; PRIM_LOG was changed in 77215fd27c), and also framedumping (removed in 64927a2f81 and 2d8515c0cf), shader dumping (removed in 2d8515c0cf and this commit), and texture dumping (removed in 54aeec7a8f). Apart from shader dumping, all of these features have modern alternatives, and shader source code can be seen in RenderDoc if "Enable API Validation Layers" is checked (which also enables source attachment), so there's no point in keeping this around.
Previously, we had WBFS and CISO which both returned an upper bound
of the size, and other formats which returned an accurate size. But
now we also have NFS, which returns a lower bound of the size. To
allow VolumeVerifier to make better informed decisions for NFS, let's
use an enum instead of a bool for the type of data size a blob has.
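Roughly, the idea looks like this. The enumerator names and the helper are
my own sketch of the concept and aren't meant to describe the actual
VolumeVerifier logic:

```cpp
#include <cassert>
#include <cstdint>

// Assumed names for the three kinds of size a blob can report.
enum class DataSizeType
{
  Accurate,    // the reported size is the real size
  LowerBound,  // the real size is at least the reported size (e.g. NFS)
  UpperBound,  // the real size is at most the reported size (e.g. WBFS, CISO)
};

// Hypothetical check: can we be sure the image is smaller than `required`?
constexpr bool IsDefinitelyTooSmall(DataSizeType type, uint64_t reported, uint64_t required)
{
  switch (type)
  {
  case DataSizeType::Accurate:
  case DataSizeType::UpperBound:
    // The real size can't exceed the reported size.
    return reported < required;
  case DataSizeType::LowerBound:
    // The real size may be larger than reported, so nothing can be concluded.
    return false;
  }
  return false;
}
```

A bool could only distinguish "accurate" from "not accurate", which would
lump the lower-bound and upper-bound cases together even though they call
for opposite conclusions.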
For a few years now, I've been thinking it would be nice to make Dolphin
support reading Wii games in the format they come in when you download
them from the Wii U eShop. The Wii U eShop has some good deals on Wii
games (Metroid Prime Trilogy especially is rather expensive if you try
to buy it physically!), and it's the only place right now where you can
buy Wii games digitally.
Of course, Nintendo being Nintendo, next year they're going to shut down
this only place where you can buy Wii games digitally. I kind of wish I
had implemented this feature earlier so that people would've had ample
time to buy the games they want, but... better late than never, right?
I used MIT-licensed code from the NOD library as a reference when
implementing this. None of the code has been directly copied, but
you may notice that the names of the struct members are very similar.
c1635245b8/lib/DiscIONFS.cpp
Needed for the next commit. NFS disc images are hashed but not encrypted.
While we're at it, also get rid of SupportsIntegrityCheck.
It does the same thing as old IsEncryptedAndHashed and new HasWiiHashes.
This normalization was added in 02ac5e95c8, and changed to use floats in 4bf031c064. The conversion to floats means that sometimes there is insufficient precision for the normalization process, which results in values of NaN or infinity. Performing the whole process with doubles prevents that, but games also sometimes set the values to NaN or infinity directly (possibly by accident, if the values are left uninitialized because the current configuration doesn't use them?).
The version of Mesa currently in use on FifoCI (20.3.5) has issues with NaN. Although this bug has been fixed (b3f3287eac in 21.2.0), FifoCI is stuck with the older version.
This change may or may not be incorrect, but it should result in the same behavior as already present in Dolphin, while working around the Mesa bug.
CARDUCode, GBAUCode, and INITUCode previously didn't have an implementation of it. In practice it's unlikely that this caused an issue, since these uCodes are only active for a few frames at most, but now that GBAUCode doesn't have global state, we can implement it there. I also implemented it for CARDUCode, although our CARDUCode implementation does not have all states handled yet - this is simply future-proofing so that when the card uCode is properly implemented, the save state version does not need to be bumped. INITUCode does not have any state to save, though.
The accuracy improvements are:
* The request mail must be 0xabba0000 exactly; both the low and high parts are checked
* The address is masked with 0x0fffffff
* Before, the global state meant that after the GBA uCode had been used once, it would accept 0xcdd1 commands immediately. Now, it only accepts them after execution has finished.
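The first two points boil down to the following checks. This is a sketch with
hypothetical helper names, not the GBAUCode code itself; only the two
constants come from the list above.

```cpp
#include <cassert>
#include <cstdint>

constexpr uint32_t REQUEST_MAIL = 0xabba0000;
constexpr uint32_t ADDRESS_MASK = 0x0fffffff;

// Both the high and the low half of the mail have to match.
constexpr bool IsRequestMail(uint32_t mail)
{
  return mail == REQUEST_MAIL;
}

// The address sent by the game is masked before it is used.
constexpr uint32_t MaskAddress(uint32_t address)
{
  return address & ADDRESS_MASK;
}
```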
* moves dolphin-specific settings out of Base.props
* creates exports.props for externals, allowing to easily import
individual Externals
* corrects some cruft that accumulated and probably contributed
to msbuild overbuilding
These lookup tables total 4 megabytes, and contain data that's entirely redundant to the actual cache state (as part of an optimization, though I'm not sure whether the optimization actually is useful). This change instead recomputes these lookup tables when loading the state (which involves filling the lookup table with a marker (0xff), and then setting the 128 * 8 valid entries (1 kilobyte)).
Before, we used a replace hook and didn't write anything there. Now, we write a BLR instruction to immediately return, and then use a start hook. This makes the behavior a bit clearer (though it shouldn't matter in practice).
All of our BBA options are technically built in, so it made the BBA
Built In option kind of confusing as to what it did. So rename it to
BBA HLE to make it more clear what it is doing and why it doesn't need a
TAP.
https://bugs.dolphin-emu.org/issues/12977 indicates that this happens on startup of Spider-Man 2, even in single-core. I don't have the game, so I can't directly determine why this is happening, but presumably real hardware does not hang in this case, so we can make it less obtrusive.
I'm not sure what the XMM0 check was supposed to be, but the 0xCC008000 one is for the fifo and is handled elsewhere now (look for `optimizeGatherPipe`).
Looks like a copy-paste gone wrong. The compute shaders for the other
formats use a group size of 8 * 8, whereas the CMPR compute shader
is supposed to use a flattened 64 * 1 as I understand it.
We currently have two different code paths for initializing controllers:
Either the frontend (DolphinQt) can do it, or if the frontend doesn't do
it, the core will do it automatically when booting. Having these two
paths has caused problems in the past due to only one frontend being
tested (see de7ef47548). I would like to get rid of the latter path to
avoid further problems like this.
The movie config layer is not active for recording, only playback. Thus, recording ends up stuck with default SYSCONF settings.
The fix is simply to add in the movie config layer when recording. The way it's done is a bit hacky, but seems to work.
This struct is the only one in BPMemory that uses u64 as its base. These fields are to allow viewing it as two u32s instead. It's not used by Dolphin right now, but it is used in the copy of BPMemory.h used by hwtests.
This also changes the behavior for the invalid gamma value, which was confirmed to behave the same as 2.2.
Note that currently, the gamma value is only used for XFB copies, even though hardware testing indicates it also works for EFB copies. This will be changed in a later commit.
The only remaining casts for these types that I know of are in TextureInfo (where format_name is set to the int version of the format, and since that affects filenames and probably would break resource packs, I'm not changing it) and in TextureDecoder_Common's TexDecoder_DrawOverlay, which will be handled separately.
Adds a pass to process driver deficiencies between UID caching and use, allowing a full view of the whole pipeline, since some bugs/workarounds involve interactions between blend modes and the pixel shader
Before, Free Look would accept background input by default, which means it was easy to accidentally move the camera while typing in another window. (This is because HotkeyScheduler::Run sets the input gate to `true` after it's copied the hotkey state, supposedly for other threads (though `SetInputGate` uses a `thread_local` variable so I'm not 100% sure that's correct) and for the GBA windows (which always accept unfocused input, presumably because they won't be focused normally).)
If a 64-bit register is passed to WriteConditionalExceptionExit,
the LDR instruction in it will read too much data. This seems
to be harmless right now, but causes problems in one of my PRs.
Heavily simplify logical immediate encoding.
This is based on the observation that if a valid repeating element
exists, it repeats through `value`. Thus it does not matter which
one you analyse. Thus we skip over the least significant element
if LSB = 1 by masking it out with `inverse_mask_from_trailing_ones`,
to avoid the degenerate case of a stretch of 1 bits going 'round
the end' of the word.
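The "repeats through `value`" observation can be sketched on its own as follows. This is not the encoder from this commit and the function name is hypothetical; it only shows that, once the two halves of an element are known to match, looking at the lowest element is enough.

```cpp
#include <cstdint>

// Returns the size in bits (2, 4, 8, 16, 32 or 64) of the smallest element
// that repeats through `value`.
constexpr unsigned SmallestRepeatingElementSize(std::uint64_t value)
{
  unsigned size = 64;
  while (size > 2)
  {
    const unsigned half = size / 2;
    const std::uint64_t mask = (std::uint64_t{1} << half) - 1;
    // `value` is already known to be a repetition of its lowest `size` bits,
    // so comparing the two halves of the lowest element is sufficient.
    if ((value & mask) != ((value >> half) & mask))
      break;
    size = half;
  }
  return size;
}
```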
This should reduce (but not completely eliminate) gradual audio desyncs in dumps. This also allows for accurate sample rates for the GameCube.
Completely eliminating gradual audio desyncs will require resampling to an integer sample rate, as nothing seems to support a non-integer sample rate.
These values were obtained by setting a breakpoint at a game's entry point, and then observing the register values with Dolphin's register widget.
There are other registers that aren't handled by this PR, including CR, XER, SRR0, SRR1, and "Int Mask" (as well as most of the GPRs). They could be added in a later PR if it turns out that their values matter, but probably most of them don't.
This fixes Datel titles booting with the IPL skipped (see https://bugs.dolphin-emu.org/issues/8223), though when booted this way they are currently missing textures. Due to somewhat janky code, Datel overwrites the syscall interrupt handler and then immediately triggers it (with the `sc` instruction) before they restore the correct one. This works on real hardware due to icache, and also works in Dolphin when the IPL runs due to icache, but prior to this change `HID0.ICE` defaulted to 0 so icache was not enabled when the IPL was skipped.
DSPHLE::Initialize sets the halt and init bits to true (i.e. m_dsp_control.Hex starts as 0x804), which is reasonable behavior (this is the state the DSP will be in when starting a game from the IPL, as after `__OSStopAudioSystem` the control register is 0x804).
However, CMailHandler::m_halted defaults to false, and we only call CMailHandler::SetHalted in DSPHLE::DSP_WriteControlRegister when m_dsp_control.DSPHalt changes, so since DSPHalt defaults to true, if the first thing that happens is writing true to DSPHalt, we won't properly halt the mail handler.
Now, we call CMailHandler::SetHalted on startup. This fixes Datel titles when the IPL is skipped with DSP HLE (though this configuration only works once https://bugs.dolphin-emu.org/issues/8223 is fixed).
This fixes booting Datel titles with DSPHLE (see https://bugs.dolphin-emu.org/issues/12943). Datel messed up their DSP initialization code, so it only works by receiving a mail later on, but if halting isn't implemented then it receives the mail too early and hangs.
It's cleared whenever the uCode changes, so there's no reason to clear it in a destructor or during initialization.
I've also renamed it to ClearPending.
The # option means that 0x is prepended already, so the old code resulted in 0x0xDEADBEEF instead of the intended 0xDEADBEEF. WriteMailboxLow was already correct.
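The doubled prefix is easy to reproduce. The sketch below uses printf's `#` flag as a stand-in, on the assumption that it behaves like the `#` option in the formatting code being fixed; the function names are hypothetical, and printf's `%#x` prints lowercase digits.

```cpp
#include <cstdint>
#include <cstdio>
#include <string>

// Old style: a literal "0x" plus the '#' flag, which adds its own "0x".
std::string FormatWithDoubledPrefix(std::uint32_t value)
{
  char buffer[32];
  std::snprintf(buffer, sizeof(buffer), "0x%#x", static_cast<unsigned>(value));
  return buffer;
}

// Fixed style: let the '#' flag supply the prefix.
std::string FormatWithSinglePrefix(std::uint32_t value)
{
  char buffer[32];
  std::snprintf(buffer, sizeof(buffer), "%#x", static_cast<unsigned>(value));
  return buffer;
}
```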
Before, both 1441 and 147f would disassemble as `lsr $acc0, #1`, when the second should be `lsr $acc0, #-1`, and both 14c1 and 14ff would be `asr $acc0, #1` when the second should be `asr $acc0, #-1`. I'm not entirely sure whether the minus signs actually make sense here, but this change is consistent with the assembler so that's an improvement at least.
devkitPro previously changed the formatting to not require negative signs for lsr and asr; this is probably something we should do in the future: 8a65c85c9b
This fixes the HermesText and HermesBinary tests (HermesText already wrote `lsr $ACC0, #-5`, so this is consistent with what it used before.)
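A minimal sketch of the decoding this implies, assuming the shift amount is the low 6 bits of the opcode interpreted as two's complement (the function name is hypothetical, and this is not the disassembler's actual code):

```cpp
#include <cstdint>

// Sign-extends the 6-bit shift immediate in the low bits of the opcode.
constexpr int DecodeShiftImmediate(std::uint16_t opcode)
{
  const int raw = opcode & 0x3f;
  // Bit 5 is the sign bit of the 6-bit field.
  return (raw & 0x20) ? raw - 0x40 : raw;
}
```

With this, 1441 and 14c1 decode to 1, while 147f and 14ff decode to -1, matching the opcodes listed above.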
For instance, ending with 0x009e (which you can do with CW 0x009e) indicates an LRI $ac0.m instruction, but there is no immediate value to load, so previously whatever garbage existed in memory after the end of the file was used.
The bounds-checking also previously assumed that IRAM or IROM was being used, both of which were exactly 0x1000 long.
Spirv-cross's MSL codegen makes the amazing choice of compiling calls to inout functions as `State temp = s; call_function(temp); s = temp`. Not all Metal backends handle this mess well. In particular, it causes register spills on Intel, losing about 5% in performance.
X30 is used in fewer situations than the comment was claiming.
(I think that when I wrote the comment I was counting the use of X30
as a temp variable in the slowmem code as clobbering X30, but that
happens after pushing X30, so it doesn't actually get clobbered.)
This is used when fastmem isn't available. Instead of always falling
back to the C++ code in MMU.cpp, the JIT translates addresses on its
own by looking them up in a table that Dolphin constructs. This is
slower than fastmem, but faster than the old non-fastmem code.
This is primarily useful for iOS, since that's the only major platform
nowadays where you can't reliably get fastmem. I think it would make
sense to merge this feature to master despite this, since there's
nothing actually iOS-specific about the feature. It would be of use
for me when I have to disable fastmem to stop Android Studio from
constantly breaking on segfaults, for instance.
Co-authored-by: OatmealDome <julian@oatmealdome.me>
This hack was added in 8f0cbefbe5, and the part of it in SI_DeviceGCAdapter is present on Android already, so I don't see any reason why this part doesn't apply to Android.
This is mostly a brainless merge, #ifdef-ing anything that doesn't match between the two while preserving common logic. I didn't rename any variables (although similar ones do exist), but I did change one log that was ERROR on android and NOTICE elsewhere to just always be NOTICE. Further merging will follow.
Instead, saturate in OpReadRegister, as all uses of OpReadRegisterAndSaturate called OpReadRegister for other registers (and there isn't anything that writes to $ac0.m or $ac1.m without saturation).
Loading configs while another thread is messing with stuff just doesn't feel like a good idea
Hopefully fixes Wiimote Scanning Thread crashes on startup
There were 3 bugs here:
- The input register for the full register wasn't actually being used; it was read into RCX but RCX wasn't used by Update_SR_Register16_OverS32 (except as a scratch register). The way the DSP LLE recompiler uses registers is in general confusing, so this commit changes a few uses to have a variable for the register being used, to make code a bit more readable. (Default parameter values were also removed so that they needed to be explicitly specified).
- Update_SR_Register16 was doing a 64-bit test, when it should have been doing a 16-bit test. For the most part this doesn't matter due to sign-extension, but it does come up with e.g. `ORI` or `ANDI`.
- Update_SR_Register16_OverS32 did the over s32 check, and then called Update_SR_Register16. Update_SR_Register16 masks $sr with ~SR_CMP_MASK, clearing the over s32 bit. Now the over s32 check is performed after calling Update_SR_Register16 (without masking a second time). No official uCode cares about the over s32 bit.
We don't have anything called $amD, though we do have $acsD. However, these instructions affect flags based on the whole accumulator, so it's better to just use $acD.
For more information, ApplyWriteBackLog, WriteToBackLog, and ZeroWriteBackLog were added in b787f5f8f7 and the explanatory comment was added in fd40513fed, although it did not mention the specific instructions that could trigger this edge case. The statements about which registers can be written by main opcodes and extension opcodes are based on my own checking of all instructions in the manual.
It's been unused since DolphinWX was removed in 44b22c90df. Prior to that, it was used in Source/Core/DolphinWX/NetPlay/NetWindow.cpp. But the new equivalent in Source/Core/DolphinQt/NetPlay/NetPlayDialog.cpp uses NetPlayClient::GetPlayers instead. Stringifying (or creating a table, as is done now) should be done by the UI in any case.
Among other things, this trims trailing newline characters. Before (on windows) the \r would corrupt the output and make them very hard to understand (as the error message would be drawn over the code line, but part of the code line would peek out from behind it).
Ninja puts way more effort into compiling targets in parallel, and
ignores dependencies until link time.
So we need to jump through hoops to force ninja to compile
pch.cpp before any targets which depend on the PCH.
I have no idea why cmake supports PUBLIC on target_sources,
but it does. It causes all targets that depend on this target
to try and include the files in their sources.
Except it doesn't take paths into account, so it breaks. Maybe
it would work if you used an absolute source? But I'm not sure
there is a sane usecase.
Page faults should only occur on architectures that support exception
handlers, so skip the test on other architectures to avoid spurious test
failures.
... and refresh the config before populating the backend info, as the config (specifically iAdapter) needs to be set to correctly populate the backend info.
Before, the list of valid antialiasing modes was always determined from the first adapter on the list on startup, regardless of the adapter the user selected.
This results in the list of available antialiasing modes being updated; before, it would only show the modes available for the adapter that was selected when the graphics window was opened (or the backend was last changed).
The list of available modes is updated by `GraphicsWindow::OnBackendChanged`'s call to `VideoBackendBase::PopulateBackendInfoFromUI`, and then `EnhancementsWidget::LoadSettings` updates the UI. Both of these are connected to the `GraphicsWindow::BackendChanged` signal.
On GameCube, a ramp bit has no effect if its corresponding channel is
inactive. On Wii however, enabling just the ramp implicitly also enables
the channel. AXSetVoiceMix() never does that, so this commit should have
no impact on games unless they fiddle with the mixer control value
directly.
This refactorization is done just to match the order that I made
WriteToHardware use in 543ed8a. For WriteToHardware, it's important that
things like MMIO and gather pipe are handled before we reach a special
piece of code that only should get triggered for writes that hit memory
directly, but for ReadFromHardware we don't have any code like that.
This fixes a problem where Dolphin could crash if a non-translated
read crossed the end of a physical memory region.
The same change was applied to WriteToHardware in ecbce0a.
This more accurately represents what's going on, and also ends at 0 instead of 1, making some indexing operations easier. This also changes it so that position_matrix_index_cache actually starts from index 0 instead of index 1.
(Specifically, the copy for VertexLoaderManager::position_cache. The position matrix index happens elsewhere, and the float path still has special logic to copy to scratch3.)
This increases accuracy, fixing the white rendering in Major Minor's Majestic March. However, the hardware backends can only have one viewport and scissor rectangle at a time, while sometimes multiple are needed to accurately emulate what is happening. If possible, this will need to be fixed later.
I think this is a relic of D3D9. D3D11 and D3D12 seem to work fine without it. Plus, ViewportCorrectionMatrix just didn't work correctly (at least with the viewports being generated by the new scissor code).
These aren't particularly useful, and make the code a bit more confusing. If for some reason someone wants to test what happens when these functions are disabled, it's easier to just edit the code that implements them. They aren't exposed in the UI, so one would need to restart Dolphin to do it anyways.
I am not confident there are no race conditions between s_write_mutex,
s_controller_write_payload_size, and s_controller_write_payload. But
this code should be safer than before.
s_controller_write_payload_size needs to remain an atomic because Read()
loads and stores without holding a mutex, Output() stores while holding
s_write_mutex, and ResetRumble() stores while holding s_read_mutex! I'm
pretty sure this code is wrong, specifically ResetRumble().
You can safely read or write non-atomic integers on multiple threads,
as long as every thread reading or writing it holds the same mutex
while doing so (here, s_mutex).
Removing the atomic accesses makes the code faster, but the actual
performance difference is probably negligible.
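The general rule can be sketched like this. It is a generic example with hypothetical names, not the GCAdapter code: a plain `int` is shared between threads, and every access holds the same mutex.

```cpp
#include <mutex>
#include <thread>
#include <vector>

struct GuardedCounter
{
  std::mutex mutex;
  int value = 0;  // Plain int: every read and write below holds `mutex`.
};

// Spawns `num_threads` workers that each increment the counter
// `increments_per_thread` times, then returns the final value.
int RunWorkers(int num_threads, int increments_per_thread)
{
  GuardedCounter counter;
  std::vector<std::thread> threads;
  for (int i = 0; i < num_threads; ++i)
  {
    threads.emplace_back([&counter, increments_per_thread] {
      for (int j = 0; j < increments_per_thread; ++j)
      {
        std::lock_guard<std::mutex> lock(counter.mutex);
        ++counter.value;
      }
    });
  }
  for (std::thread& thread : threads)
    thread.join();

  std::lock_guard<std::mutex> lock(counter.mutex);
  return counter.value;
}
```

If any one accessor took a different mutex (or none), the accesses would race and the variable would need to be atomic again.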
Add Diff button to CodeWidget
Add Code Diff Tool window for recording and differencing functions. Allows finding specific functions based on when they run.
This was requested by a forum user, and I thought why not.
It's a simple change to make since DiscIO already supports it,
and I assume command-line users know roughly what they're doing.
New dolphin-tool command: "header"
-b / --block_size
-c / --compression
-l / --compression_level
Informative RVZ/WIA header2 value "compression_level" is now an s32 instead of a u32, because negative compression is a thing.
Speaking of, it is now possible to use negative compression levels in dolphin-tool's convert command (not the GUI, though).
Turns out there's some Freeloader disc for the GC that triggers this
despite being a good dump. This warning is mostly intended to catch
Wii games that have been truncated at the 4.00 GiB or 4.38 GiB mark
anyway, and if someone does have a Datel dump that has been truncated,
they'll still get the "unusual size" warning.
If libusb fails to initialize, an assertion fails, but if that happens before the main window is created, then Dolphin just dies. Now, the panic alert is properly shown and the user can ignore it.
These messages hid other, more important, ones often. I have left AttemptMaxTimesWithExponentialDelay and GetSysDirectory/SetSysDirectory as info, since those are called infrequently and can be useful to the end-user.
This message would be logged, usually multiple times, for EVERY. SINGLE. PIXEL. That's pretty much useless and just makes the log unreadable. Plus, the current support (which acts as RGB8) is close enough that for end-user purposes, it's fine. I don't think the hardware backends support RGB565_Z16 and its antialiasing functionality correctly either, but they don't have similar logspam.
Previously we were using this workaround when using framebuffer fetch
to emulate dual source blending, but it seems like we also need to use
it when using framebuffer fetch to emulate logic ops, otherwise some
Adreno devices get a crash when compiling OpenGL ES ubershaders.
Using the workaround in specialized shaders doesn't seem to be
necessary, but I've made the same change there for consistency.
This gets us closer to fixing https://bugs.dolphin-emu.org/issues/12791
but doesn't actually fix it.
On devices which have hardware support for dual source blending
but not logic ops, this lets us skip performing the framebuffer
fetch in situations where the game isn't actually using logic ops.
Currently, the axes for the main and C sticks range from 0-255, with
128 being the mid-point; but this isn't symmetrical: the negative axis
has 128 values not including 0, while the positive axis has 127 values
not including 0.
Normalizing so that the range is 1-255 makes the positive and negative
axes symmetrical. The inability to yield 0 shouldn't be an issue as a
real GC controller cannot yield it anyway.
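One way to get a symmetrical 1-255 range is sketched below. The mapping from a floating-point position is an assumption for illustration and the function name is hypothetical; the commit itself may normalize differently.

```cpp
#include <algorithm>
#include <cmath>
#include <cstdint>

// Maps a stick position in [-1.0, 1.0] to 1..255 with 128 as the mid-point,
// giving 127 values on each side of the centre.
inline std::uint8_t MapStickAxis(double position)
{
  const double clamped = std::max(-1.0, std::min(1.0, position));
  return static_cast<std::uint8_t>(128 + std::lround(clamped * 127.0));
}
```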
I.e. flush pokes before running an EFB peek, if the cache tile isn't present. If the cache tile is present, then EFB pokes should have been written to the cache tile and thus don't need to be flushed.
This saves the GUI from having to manually call SDIO_EventNotify.
With that out of the way, we can let users change the
"Insert SD Card" setting on Android while a game is running.
Previously, when Pause at End of Movie was disabled, the game would continue running as it should, but the menu bar would think the game was paused, showing the play button instead of the pause button. To make things worse, clicking the play button would then restart the game, instead of pausing or doing nothing. F10 paused/unpaused as normal, though.
The old behavior was essentially to enable stepping/pause mode (via `CPU::Break()`) and then if Pause at End of Movie was disabled, to un-pause on the host thread (via `CPU::EnableStepping(false)`). For reasons which aren't entirely clear to me, the first one notified the menu bar (through the `Host::UpdateDisasmDialog` callback, not the `Settings::EmulationStateChanged` one), and the second did not. In any case, this approach does not particularly make sense; I don't see any reason to pause and unpause if Pause at End of Movie is disabled; instead, we should only pause when Pause at End of Movie is enabled.
This behavior was probably introduced in c1944f623b, though I haven't tested it.
directly_mapped_vars was added in #69 (4129b30494), but for some reason FIFO_BP_LO/HI were split out from it in #885 (65af90669b). As far as I can tell, this code (and the code that existed at the time) is identical, so there's no reason to have it handled separately.
In a code block where a guest register is accessed at least twice and the
last access is a write and the register is not discardable immediately
after the second-to-last instruction (perhaps there is an instruction
in between that can cause an exception), currently Dolphin's JITs will
flush the register after the second-to-last instruction.
It would be better if we replaced the flush after the second-to-last
instruction with a flush that only happens if the exception path is
taken. This change accomplishes that by marking guest registers as
"in use" not just when they are used as inputs but also when they are
used as outputs, preventing the loop in DoJit from flushing the
register until after the last access.
This makes codegen faster (by perhaps 10-20% in the case of Jit64,
I didn't measure too closely), which helps speed up NBA Live 2005
a little. But the game still has serious performance issues.
The DSP JIT only applies on x64, so if it doesn't work on esoteric compilers then that's not a problem. (And if it fails to compile, then it'll still produce an error on that platform, just no warnings on other platforms.)
The size variable started to be unused when I created std::array variants of ReadArray, but we should follow it in case any files have fewer registers stored than they should (otherwise the remaining registers would end up with garbage data from later in the fifolog). Though, there probably aren't many fifologs where this is relevant.
Large amounts of logging can have an impact on performance, so moving the ones that have been determined to not matter to the warn level gives a way to hide those messages without hiding actual errors (and also gives a fast visual way of distinguishing between ignored and non-ignored ones due to the different colors).
Fixes https://bugs.dolphin-emu.org/issues/12827.
A description of what was going wrong:
JitArm64::Init first calls CodeBlock::AllocCodeSpace, after which
CodeBlock and Arm64Emitter consider us to have 96 MiB of code space
available. JitArm64::Init then calls AddChildCodeSpace, which is
supposed to take 64 MiB of that space and give it to m_far_code.
CodeBlock's view of how much space there is gets updated from 96 MiB
to 32 MiB, but due to the missing call, Arm64Emitter keeps thinking
that it has 96 MiB of space available.
The last thing JitArm64::Init does is to call ResetFreeMemoryRanges.
This function asks Arm64Emitter how much code space is available and
stores a range of that size in m_free_ranges_near, meaning that
m_free_ranges_near ends up being backed by both nearcode and farcode!
This is a ticking time bomb; as soon as we grab memory from
m_free_ranges_near which is backed by farcode, we're in trouble.
The crash I ran into in my testing was caused by fastmem code being
allocated in farcode (our backpatch handler only handles accesses made
from nearcode), but you may as well get errors caused by code intended
for nearcode overwriting code intended for farcode or vice versa.
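The way the two views drift apart can be modelled with a small sketch. All types and names here are hypothetical stand-ins, not the real CodeBlock or Arm64Emitter classes; the `update_emitter` flag models whether the previously missing call is made.

```cpp
#include <cstddef>

constexpr std::size_t MiB = 1024 * 1024;

// Hypothetical model of two components that each track the available code space.
struct CodeSpaceModel
{
  std::size_t block_view = 0;    // What the code block thinks is available.
  std::size_t emitter_view = 0;  // What the emitter thinks is available.

  void Alloc(std::size_t size)
  {
    block_view = size;
    emitter_view = size;
  }

  // Hands `child_size` bytes to a child region (farcode in the description).
  void AddChild(std::size_t child_size, bool update_emitter)
  {
    block_view -= child_size;
    if (update_emitter)
      emitter_view = block_view;
  }

  // Models the free range for nearcode being sized from the emitter's view.
  std::size_t NearRangeSize() const { return emitter_view; }
};
```

With `update_emitter` set to false, allocating 96 MiB and giving 64 MiB to the child still yields a 96 MiB near range, so the near range overlaps the child's memory.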
So why did NBA Live 2005 crash when most games had no problems,
and why was the bug bisected to the commit that increased the size
of far code from 16 MiB to 64 MiB? Well, as long as we're only
using the first 32 MiB of the big 96 MiB range, everything works.
What happens with NBA Live 2005 (I have not investigated exactly
through what mechanism this happens) is that at some point the range
in m_free_ranges_near gets split into two ranges, one which is
backed by nearcode and one which is backed by farcode. Dolphin
prefers to select the biggest range available (we don't want to
pick a tiny 1 KiB range that may not be able to fit the whole block
we're about to emit, after all), and after increasing the size of
farcode to 64 MiB, farcode is bigger than nearcode.
It doesn't make sense for alpha to add the bias ONLY when dividing by 2, while color doesn't apply the bias for divide by 2 only; hardware testing indicates that alpha should have the bias.
This fixes the menus in Mario Kart Wii (https://bugs.dolphin-emu.org/issues/11909) but reintroduces the white rectangle in Fortune Street.
This reverts commit 5aaa5141ed (and several other matching changes elsewhere).
Turning off primitive restart increases performance a lot on
Adreno for some reason. We're talking numbers like 50%-100% faster
in situations which are bottlenecked by rendering.
* Disabled: disables the overlay pointer
* Follow: default behaviour, IR pointer follows touch position
* Drag: IR pointer moves relative to the initial touch event position
At least in GLSL, after calling EmitVertex() the value of all 'out' variables (including gl_Layer and ps) becomes undefined. On OpenGL it seems like they were unchanged, but on Vulkan they became 0, resulting in bad rendering.
Fixes https://bugs.dolphin-emu.org/issues/12001
Currently, disabling mGBA when building gets rid of the ability to
change the GBA saves directory in DolphinQt, but it doesn't actually
get rid of Dolphin loading and storing the setting and creating the
folder. If the setting is set to a path you don't want to use
(perhaps you are trying to turn Dolphin portable), this is annoying.
To avoid accidentally making mistakes like this in the future,
I'm gating the existence of the setting behind an ifdef.
DiscIO depends on some IOS functions and other functions, which are in Core and not Common. This results in link errors if using DiscIO on its own (which is why DolphinTool had a listed dependency on videocommon; videocommon has a dependency on core so adding that made things build).
Fixes a crash that could occur if the static constructor function for
the MainSettings.cpp TU happened to run before the variables in
Common/Version.cpp are initialised. (This is known as the static
initialisation order fiasco.)
By using wrapper functions, those variables are now guaranteed to be
constructed on first use.
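The construct-on-first-use pattern looks roughly like this. The names and the string value are hypothetical, not the actual contents of Common/Version.cpp.

```cpp
#include <string>

// Hypothetical accessor: the function-local static is initialised the first
// time the function is called, regardless of the order in which the static
// constructors of different TUs run.
const std::string& GetExampleVersionString()
{
  static const std::string s_version = "example-version";
  return s_version;
}

// A static initialiser (possibly in another TU) can now safely use the value.
static const std::string s_example_title = "Dolphin " + GetExampleVersionString();

const std::string& GetExampleTitle()
{
  return s_example_title;
}
```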
If the purpose of calling SetFullscreen using RunAsCPUThread is
to make sure that the GPU thread is paused, the fix in ef77872
is faulty when dual core is used and a panic alert comes from
the CPU thread. This change re-adds synchronization for that case.
The fix in ef77872 worked for panic alerts from
the CPU thread, but there were still problems with
panic alerts from the GPU thread in dual core mode.
This change attempts to fix those.
Using unsigned char* or signed char* results in a deprecation warning, which is treated as an error. It needs to be cast to regular char* for it to work.
At least in MSVC (which is not restricted from targeting C++20), these can be resolved to either std::format_to or fmt::format_to (though I'm not sure why the std one is available). We want the latter.
This format string is by definition dynamic and can't be checked at compile time. There are other similar strings in the log handler and in asserts, but they use vformat and thus don't need fmt::runtime. We might be able to do a similar thing where the untranslated string is compile-time checked, but FmtFormatT is used in so few places that I don't want to handle that in this PR.
This syntax is allowed by GLSL, but HLSL doesn't allow it. This meant that games using R8 comparisons in equal mode would produce shaders that failed to compile. Super Mario Galaxy's water levels were affected by this.
This matches BootManager.cpp's old behavior.
Note: The other settings of the "Controls" section (WiimoteSource
and WiimoteSourceBB) are still handled in BootManager.cpp.
The "DSP" game INI section name was supported by BootManager.cpp
before the section was ported to the new config system.
For backwards compatibility, we should keep supporting it.
Should fix https://bugs.dolphin-emu.org/issues/12792
HRWrap now allows HRESULT to be formatted, giving useful information beyond "it failed" or a hex code that isn't obvious to most users. This commit does not add any uses of it, though.
This option has always existed since it's used by FifoCI, but now it can be changed at runtime. Looping is something that should almost always be on, but it can be useful to turn it off when frame-dumping is enabled so that hundreds of copies of the same frame aren't created. Before, turning it off required restarting Dolphin.
This extends the timeout to 30 seconds, so users who have brief
connection issues won't be so swiftly disconnected, allowing the
NetPlay session to continue.
Now, enums are properly displayed, and BitFieldArray is also displayed nicely. Signed values also work correctly, and 1-bit fields are not treated as bools unless the bitfield is explicitly marked as a bool.
These tend to get requested from either pure GDB or Ghidra. They reduce the verbosity of the communications. The QSupported packet is also important to implement for future-proofing.
The stub was made with the assumption that the GDB architecture is rs6000:6000, but the closest is actually powerpc:750, which features many more SPRs that the Gekko supports, but it also has a slightly different ID. This commit now assumes the more proper powerpc:750.
We don't use sampler2DMS, but we do use sampler2DMSArray.
I can't reproduce it on my phone, but a user who was running GLES
on a Tegra X1 reported a shader compilation error related to this.
This mainly concerns building with Ninja, as that's what I tested it with.
Originally it would only prepend the first path with `/external:I`,
however all paths in the list have to be prepended with `/external:I`.
The MS documentation seems to support this, as it makes no mention of it
accepting a list.
This is probably the worst way to implement this, I am unfamiliar with
CMake.
It's always a good sign when the comments say "this will definitely
crash" and "I don't know if this is for a good reason".
Fixes https://bugs.dolphin-emu.org/issues/12762.
These GetPointer calls could cause crashes, in part because the
callers didn't do null checks and in part because GetPointer
inherently is unsafe to use for accesses larger than 1 byte.
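To illustrate the second point: a pointer that is valid for one byte says nothing about the bytes after it. A bounds-checked read avoids that, as in this sketch (the function name and the flat `std::vector` memory are assumptions; this is not Dolphin's memory API):

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <optional>
#include <vector>

// Reads a 32-bit value (in host byte order) only if all 4 bytes lie inside
// `memory`; otherwise returns std::nullopt instead of reading out of bounds.
std::optional<std::uint32_t> TryRead32(const std::vector<std::uint8_t>& memory,
                                       std::size_t offset)
{
  if (offset > memory.size() || memory.size() - offset < sizeof(std::uint32_t))
    return std::nullopt;

  std::uint32_t value;
  std::memcpy(&value, memory.data() + offset, sizeof(value));
  return value;
}
```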
The actual values don't matter since we overwrite all of the relevant fields, but other bits were not initialized (e.g. the top 12 bits of X10Y10), so the warning was semi-valid.
This piece of code is rather hard to understand, but my best guess
at what it's trying to do is that it tries to create opportunities
to skip writing CRs back to ppcState if we know that there are no
CR instructions (or branch instructions, etc) between an instruction
that writes to a CR register and the next blr. This is technically
inaccurate emulation, but as long as games don't do anything too
weird with their ABIs, I suppose it doesn't break anything.
So why do I want to get rid of it? Well, other than breaking some
hypothetical weird game, I imagine it could trip up people trying
to debug a game who are looking at the CR contents. And the code
is just plain confusing. (blr clearly doesn't write to CRs!)
Videocommon also depends on core, which resulted in linking errors (though I'm not sure why). Ideally, dolphintool wouldn't depend on videocommon... but some stuff in core does.
Previously, EFB copies would be in the middle of other objects, as objects were only split on primitive data. A distinct object for each EFB copy makes them easier to spot, but does also mean there are more objects that do nothing when disabled (as disabling an object only skips primitive data, and there is no primitive data for EFB copies).
This also adds the commands after the last primitive data but before the next frame as a unique object; this is mainly just the XFB copy. It's nice to have these visible, though disabling the object does nothing since only primitive data is disabled and there is no primitive data in this case.
It became irrelevant in 952dfcd610, when the define was removed; now, the code the comment is referring to is in JitRegister.cpp, and oprofile is controlled by cmake.
My change in 867cd99 caused farcode to fill up much more than before.
Most games still ran fine, but certain games would have multiple
cache clears per minute, on top of being overall slow.
Maybe there's something prettier we can do about this than just
allocating more RAM, but we have the resource budget for making
Dolphin use more RAM, so I consider increasing the size of the
cache to be a good solution at least for the time being.
At least for Prince of Persia: The Forgotten Sands, 48 MB isn't
enough to stop the cache clears, but 64 MB is. (I only played the
game for a few minutes when testing, though.)
To do this, I had to decouple framebuffer fetch from shader blending. We need to be able to access framebuffer fetch input when using shader logic ops.
If InputConfig::LoadConfig() was called once with a non empty/customized config,
then called again after manually deleting the config (dolphin calls LoadConfig() every time it opens the mapping widget),
the second load would fail to clear the values on any non first EmulatedController and would instead keep the
previous config values despite it being deleted (while it would instead correctly default the first EmulatedController).
This is not a big bug, but the code is better now.
Fixes bug: https://bugs.dolphin-emu.org/issues/12744
Before e1e3db13ba
the ControllerInterface m_devices_mutex was "wrongfully" locked for the whole Initialize() call, which included the first device population refresh,
this had the unwanted (accidental) consequence of often preventing the different pads (GC Pad, Wii Controllers, ...) input configs from loading
until that mutex was released (the input config defaults loading was blocked in EmulatedController::LoadDefaults()), which meant that the devices
population would often have the time to finish adding its first device, which would then be selected as default device (by design, the first device
added to the CI is the default default device, usually the "Keyboard and Mouse" device).
After the commit mentioned above removed the unnecessary m_devices_mutex calls, the default default device would fail to load (be found)
causing the default input mappings, which are specifically written for the default default device on every platform, to not be bound to any
physical device input, breaking input on new dolphin installations (until a user tried to customize the default device manually).
Default devices are now always added synchronously to avoid the problem, and they should continue to be in the future (I added comments and warnings to help with that).
This fixes the bad rendering on the first frame when using the software renderer: the software renderer's Z buffer started out at 0, but most games clear it to 0xffffff instead; this means that things don't render correctly except for in the regions where the screen was cleared by an EFB copy earlier in the frame.
The system menu does clear the RTC flags, but we currently aren't updating the cache file, and since we clear them the system menu doesn't know to update the cache either. This means that launching a game via the system menu, and then launching a game directly and exiting via HOME will result in the system menu using an outdated cache and displaying the old game. This causes it to fail to launch the game on the disc channel (since it doesn't match the cache), resulting in it resetting (though it will ignore the cache after resetting). Not clearing the cache avoids this issue.
PNG_FORMAT_RGB and PNG_COLOR_TYPE_RGB both evaluate to 2, but PNG_FORMAT_RGBA evaluates to 3 while PNG_COLOR_TYPE_RGBA evaluates to 6; the bit indicating a palette is 1 while the bit indicating alpha is 4.
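The mismatch is easier to see with the two sets of flag bits written side by side. This is only a sketch: the `k...` constants are hypothetical local stand-ins (not libpng identifiers) so that it compiles without libpng, and the numeric values are the ones quoted above.

```cpp
#include <cstdint>

// Local stand-ins for the libpng values quoted above (all names are hypothetical).
constexpr std::uint32_t kColorTypeMaskPalette = 1;  // color type: palette bit
constexpr std::uint32_t kColorTypeMaskColor = 2;    // color type: color bit
constexpr std::uint32_t kColorTypeMaskAlpha = 4;    // color type: alpha bit
constexpr std::uint32_t kFormatFlagAlpha = 1;       // format: alpha bit
constexpr std::uint32_t kFormatFlagColor = 2;       // format: color bit

constexpr std::uint32_t kColorTypeRGB = kColorTypeMaskColor;                         // 2
constexpr std::uint32_t kColorTypeRGBA = kColorTypeMaskColor | kColorTypeMaskAlpha;  // 6
constexpr std::uint32_t kFormatRGB = kFormatFlagColor;                               // 2
constexpr std::uint32_t kFormatRGBA = kFormatFlagColor | kFormatFlagAlpha;           // 3

// Returns true if a format value, misread as a color type, would look paletted.
constexpr bool LooksPalettedAsColorType(std::uint32_t format)
{
  return (format & kColorTypeMaskPalette) != 0;
}
```

Mixing the two up goes unnoticed for RGB (both are 2), but the RGBA format value 3 sets the palette bit when read as a color type.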
Fixes Bomberman Jetters in single core mode.
When single core mode pauses the CPU to execute the GPU
FIFO it greedily executes the whole thing. Before this commit,
Finish and Token interrupts would happen instantly, not even
taking into account how long the current FIFO window has
taken to execute. The interrupts would be effectively backdated
to the start of this execution window.
This commit does two things: It pipes the current FIFO window
execution time through to the interrupt scheduling and it enforces
a minimum delay of 500 cycles before an interrupt will be fired.
Being able to preserve the address register is useful for the
next commit, and W0 is the address register used for loads. Saving
the address register used for stores, W1, was already supported.
If a host register has been newly allocated for the destination
guest register, and the load triggers an exception, we must make
sure to not write the old value in the host register into ppcState.
This commit achieves this by not marking the register as dirty
until after the load is done.
This does the following things:
- Default to the runtime automatic number of threads for pre-compiling shaders
- Adds a distinct automatic thread count computation for pre-compilation (which has fewer other things going on
and should scale better beyond 4 cores)
- Removes the unused logical_core_count field from the CPU detection
- Changes the semantics of num_cores from maximum addressable number of cores to actually available CPU cores
(which is also how it was actually used)
- Updates the computation of the HTT flag now that AMD no longer lies about it for its Zen processors
- Background shader compilation is *not* enabled by default
Removed useless locks to DeviceContainer::m_devices_mutex, as they were all already protected by m_devices_population_mutex.
We have no interest in blocking other threads that were potentially reading devices at the same time so this seems fine.
This simplifies the code, and I've adjusted a few comments which mentioned possible deadlock that should now be totally gone.
The deadlock could have happened if a thread directly called EmulatedController::UpdateReferences() while another thread also reached EmulatedController::UpdateReferences() within a call to ControllerInterface::UpdateDevices(), as the mentioned function locked both the DeviceContainer::m_devices_mutex and s_get_state_mutex at the same time.
The deadlock was frequent on game emulation startup on Android, due to the UpdateReferences() call in InputConfig::LoadConfig() and the UI thread triggering calls to ControllerInterface::UpdateDevices().
It could also have happened on Desktop if a user pressed "Refresh Devices" manually in the UI while the input config was loading.
Also brought some UpdateReferences() comments and thread safety fixes from https://github.com/dolphin-emu/dolphin/pull/9489
This commit changes the default value of Fast Texture Sampling to true, and also moves the setting that controls it to the experimental section of the advanced tab. This is its own commit so that it can be easily reverted when we want to default to Manual Texture Sampling.
Co-authored-by: JosJuice <josjuice@gmail.com>
Specifically, when using Manual Texture Sampling, if textures sizes don't match the size the game specifies, things previously broke. That can happen with custom textures, and also with scaled EFB copies at non-native IRs. It breaks most obviously by not scaling the texture coordinates (so only part of the texture shows up), but the hardware wrapping functionality also assumes texture sizes are a power of 2 (or else it will behave weirdly in a way that matches how hardware behaves weirdly). The fix is to provide alternative texture wrapping logic when custom texture sizes are possible.
Note that both GLSL and HLSL provide a fwidth (fragment width) function defined as `fwidth(p) = abs(dFdx(p)) + abs(dFdy(p))`. However, it's easy enough to implement this ourselves (and it makes the code a bit more obvious).
The benefit to exposing this over the raw BP state is that adjustments Dolphin makes, such as LOD biases from arbitrary mipmap detection, will work properly.
These messages apply to the User directory regardless of
whether it's global or local, so we shouldn't specify "global".
Also changing "directory" to "folder", just for consistency
with "GC folder" in the same sentence.
We implement this by first rounding to nearest integer using the current
rounding mode, then converting this value from floating point to an integral
value.
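As a portable illustration of the two-step idea (not the code from this commit), the same thing can be written with `std::nearbyint`, which rounds according to the current floating-point rounding mode. `RoundThenConvert` is a hypothetical name, and the sketch assumes a finite input that fits in 32 bits.

```cpp
#include <cassert>
#include <cmath>
#include <cstdint>

// Step 1: round to an integral value using the current rounding mode.
// Step 2: convert the (now integral) value to an integer type, which is exact.
// Assumes the input is finite and in range; saturation and NaN handling are omitted.
inline std::int32_t RoundThenConvert(double value)
{
  const double rounded = std::nearbyint(value);
  assert(rounded >= -2147483648.0 && rounded <= 2147483647.0);
  return static_cast<std::int32_t>(rounded);
}
```

With the default round-to-nearest-even mode, `RoundThenConvert(2.5)` gives 2 and `RoundThenConvert(3.5)` gives 4.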
Prefer using eax to isolate the sign bit. This saves a byte when the
destination ends up as r8-15, because those require a REX prefix.
Before:
41 8B C5 mov eax,r13d
41 C1 ED 1F shr r13d,1Fh
44 03 E8 add r13d,eax
41 D1 FD sar r13d,1
After:
41 8B C5 mov eax,r13d
C1 E8 1F shr eax,1Fh
44 03 E8 add r13d,eax
41 D1 FD sar r13d,1
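For reference, the four instructions appear to implement a signed division by two that rounds toward zero. A rough C++ equivalent, with a hypothetical function name, would be:

```cpp
#include <cstdint>

// Signed division by two that rounds toward zero, mirroring the instructions above.
inline std::int32_t DivideByTwo(std::int32_t value)
{
  const std::uint32_t bits = static_cast<std::uint32_t>(value);
  const std::uint32_t sign = bits >> 31;     // shr: 1 for negative inputs, 0 otherwise
  const std::uint32_t biased = bits + sign;  // add: bias negative inputs by one
  // sar: arithmetic shift right by one. The conversion back to a signed value and
  // the right shift of a negative value behave this way on mainstream compilers
  // and are guaranteed to since C++20.
  return static_cast<std::int32_t>(biased) >> 1;
}
```

Only the `shr` touches the register that holds the isolated sign bit, which is why moving that one instruction to eax is enough to drop a REX prefix.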
This function was deprecated in ffmpeg in January[1], while its
replacement got introduced in 2015[2], so now might be the time to start
using it in Dolphin. :)
[1] f7db77bd87
[2] a9a6010637
This moves the only direct call to zlib’s crc32() into its own
translation unit, but that operation is cold enough that this won’t
matter in the slightest. crc32_z() would be more appropriate, but
Android has an older zlib version…
This was originally intended to fix https://bugs.dolphin-emu.org/issues/12717 but this ended up not being the issue (instead it seems like files just weren't recompiled when imgui was updated due to MSVC weirdness). Still, using brackets instead of quotes is preferable as this is a library include.
The reload stub is at a fixed address (0x80001800) so its hook flag
should be HookFlag::Fixed.
Otherwise the hook is installed by HLE::PatchFixedFunctions but
immediately removed by HLE::PatchFunctions (which is called by
HLE::Reload right after PatchFixedFunctions).
Should fix https://bugs.dolphin-emu.org/issues/12716
This also may eventually allow loading patches from sources other than the 1:1 expected file structure host file system, such as memory or an archive file.
While trying to work on adding audiodump support for CLI, I was alerted that it was important to first try moving the DSP configs to the new config before continuing, as that makes it substantially easier to write clean code to add such a feature.
This commit aims to allow for Dolphin to only rely on the new config for DSP-related settings.
Frame Advance Speed hotkeys were swapped. This likely occurred because speed and delay are inverses (i.e. a speed increase should DECREASE the delay and vice versa).
This replaces the MAX_LOGLEVEL define with a constexpr variable
in order to fix self-comparison warnings in the logging macros
when compiling with Clang. (Without this change, the log level check
in the logging macros is expanded into something like this:
`if (LINFO <= LINFO)`, which triggers a tautological compare warning.)
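A minimal sketch of the idea, using a hypothetical set of log levels and a hypothetical `IsLevelEnabled` helper standing in for the check inside the logging macros:

```cpp
// Hypothetical log levels; lower values are more important.
enum LogLevel
{
  LNOTICE = 1,
  LERROR = 2,
  LWARNING = 3,
  LINFO = 4,
  LDEBUG = 5,
};

// A typed constant rather than `#define MAX_LOGLEVEL LINFO`. Because this is a named
// variable, `LINFO <= MAX_LOGLEVEL` no longer expands to the literal `LINFO <= LINFO`.
constexpr LogLevel MAX_LOGLEVEL = LINFO;

// Stand-in for the check performed inside the logging macros.
constexpr bool IsLevelEnabled(LogLevel level)
{
  return level <= MAX_LOGLEVEL;
}
```

The compiler can still fold the comparison at compile time, so disabled log levels should cost nothing at runtime.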
GCC complains about float_emit being null when inlining
ByteswapAfterLoad into MMIOLoadToReg. ByteswapAfterLoad
does dereference float_emit, but only when passing FLAG_FLOAT,
which MMIOLoadToReg has an assert for and does not support.
Also cleaning up some unnecessarily specified namespaces while
I'm at it.
The compiler was loudly announcing each and every branch Tev was not checking in
a switch statement, but Tev has learned its lesson and will produce that
warning no more.
Reusing the slowmem handlers of existing blocks meshes badly
with reusing the empty space left when destroying blocks.
I don't think reusing slowmem handlers was much of a gain anyway.
This is done entirely through interpreter fallbacks. It would
probably be possible to implement this using host exception
handlers instead, but I think it would be a lot of complexity
for a rarely used feature, so let's not do it for now.
For performance reasons, there are two settings for this feature:
One setting which enables just what True Crime: New York City
needs and one setting which enables it all. The latter makes
almost all float instructions fall back to the interpreter.
Instead of having a single GUI checkbox for "Always Hide Mouse Cursor",
I have instead opted to use radio buttons so the user can swap between
different states of mouse visibility. "Movement" is the default
behavior, "Never" will hide the mouse cursor the entire time the game is
running, and "Always" will keep the mouse cursor always visible.
Previously the unhide of movement mouse_timer reset occurred within case MouseButtonPress.
Additionally, there was a redundant expression in the if statement for cursor locking.
Now works with games that deliberately avoid invalidating TMEM because
they know textures are too large to fit:
* Sonic Riders
* Metal Arms: Glitch in the System
* Godzilla: Destroy All Monsters Melee
* NHL Slapshot
* Tak and the Power of Juju
* Night at the Museum: Battle of the Smithsonian
* 428: Fūsa Sareta Shibuya de
There are two reasons for this.
1. Using Dolphin's logging system lets the user decide whether
the printout should go to the terminal, the GUI, or a file.
fmt::print always prints to stdout... unless you're on Android, in
which case it does nothing at all, because Android disables stdout.
2. The Windows version of Dolphin crashes when you use fmt::print.
Yes, really. The crash happens because a call to std::fwrite in
fmt::v7::detail::fwrite_fully returns that fewer characters were
written than requested, which fmt handles by throwing an exception.
(As always, Dolphin does not use exception handling.)
I'm not sure why std::fwrite is doing this, but since switching
away from using fmt::print is a good idea due to the previous point
anyway, I'd say it's best to just switch.
Previously, if you have "Hotkeys Require Window Focus" disabled, you could repeatedly use the "Open" hotkey, for example, to stack File Open windows over top of each other over and over.
This commit allows the hotkey manager to disable/enable on QFileDialog creation and destruction.
Currently the logic for addressing the individual TexUnits is splattered all
across Dolphin's codebase. This commit attempts to consolidate it all into a
single place and formalise it using our new TexUnitAddress struct.
MappingWindow is modal, yet the user can use hotkeys while the window is active. I believe hotkeys should not be recognized while this window is active.
This string is extremely likely to be mistranslated without the
proper context. Actually, it's probably impossible to translate
this string in a good way to some languages, but I'm not sure how
to solve that. Let's at least add an i18n comment for now.
PR #10066 added functionality to call std::abort when a panic alert occurs; however, that PR only implemented it for MsgAlert and not MsgAlertFmtImpl, meaning that the functionality was not used with PanicAlertFmt (only PanicAlert, which is not used frequently).
Yes, that's right! It's time to add even more NKit warnings,
because users still don't understand what NKit is or how it works!
More specifically, some users seem to be under the impression that
converting an NKit file to for instance RVZ using Dolphin's convert
feature will result in a normal RVZ file, when it in fact results in
an NKit RVZ file (since NKit is not a container format in the sense
that GCZ/WIA/RVZ/WBFS/CISO is, but rather a kind of trimmed ISO).
I can hardly blame users for not knowing this, because it's not
intuitive unless you know the technical details of how NKit works.
Previously, s_temp_input was being used for BOTH the savestate's and the movie's input printout in the panic alert.
This commit simply performs memcpy from the correct vector for the movInput printout.
Previously, the file dialog window was ambiguous between saving or loading a .dtm. This commit simply gives a bit more context to differentiate the two windows.
Previously, when playing back a movie, you could not see the total frame count of a movie, only the total number of input polls.
This change simply shows the total frame count on movie playback.
Note that this change also results in the framecount and framecount total ALWAYS being displayed if show_movie_window is true, regardless of whether or not m_ShowFrameCount is true. I believe this is fine, as TASers are much more likely to reference the framecount than the input poll count.
Previously, only the number of total input polls would be shown in the window title when playing back a movie. This simply adds the VI / frame count total as well, which is a much more relevant number to look at while TASing.
If this commit is not applied, then the previous commit will cause hotkeys to be saved if there is a syntax error when hitting "OK" and the user presses the X to close the window.
This commit only applies changes to hotkey config if no syntax error occurs.
Previously you could type whatever gibberish you wanted into the formula bar, press OK, and receive a modal syntax error window. Closing the syntax error window would cause the hotkey config window to close as well, and your gibberish would be applied to the hotkey assignment.
This commit requires that a syntax error does not occur before accept() is called.
Previously, using TAS Input to activate the digital L and R buttons would not show these inputs in the Input Display. This commit adds the digital L and R presses to the Input Display, and also displays just "L" or "R" if the analog is set to 255.
There are certain hotkeys that we absolutely want to be able to use
without being in-game. Presently, no hotkeys are recognized unless we
are in-game.
I've identified and moved the following hotkeys to be checked before the
HotkeyScheduler checks to see if the Core is running:
- Open
- Exit
- Start Recording
- Refresh Game List
Note that Play Recording should also be implemented here, however it
looks like there is no signal for a PlayRecording() function, so this
will have to be handled in a later PR once that signal is created and
implemented.
Now that we have enum helpers for inserting values into packets and have
migrated all other enumerations over, there's no need to keep this alias
around any longer.
Previously, it was not clear where the boundary of the StickWidget was when interacting outside of the circle. This aims to restore the gray square present in the Wx-era.
Over time OnData() has become a huge function-long case statement that
attempts to manage numerous packet-related behaviors, which makes it a
little difficult to reliably ensure certain handling doesn't interfere
with another case's. It's also mildly annoying to navigate due to its
size.
To make it a little easier to read and find the specific behavior, we
can break the relevant pieces of code out into their own functions.
When RenderDoc is attached, wglShareLists fails for some reason (see baldurk/renderdoc#2361). wglCreateContextAttribsARB has a parameter for the share context, so there's no reason to use a separate wglShareLists call.
Co-authored-by: baldurk <baldurk@baldurk.org>
The previous code from #7950 only clamped correctly when the EFB copy's
left and top coordinates were (0, 0).
Now we should handle all situations.
Spyro: A Hero's Tail is an example of a game that does an oversized
EFB copy with a non-zero origin.
If W0 is locked when fpr.RW is called, the indirectly called
ConvertSingleToDoubleLower may need to emit a push+pop, so it's
better for fresx/frsqrtex to call RW before locking W0 than after.
This way the address check will take up less icache (since it's
only emitted once for each routine rather than once for each
psq_st instruction), and we also get address checking for psq_l.
Matches Jit64's approach.
The disadvantage: In the slowmem case, the routines have to
push *every* caller-saved register onto the stack, even though
most callers probably don't need it. But as long as the slowmem
case isn't hit frequently, this is fine.
In the case of the JitAsm routines, we can't actually use
backpatching. Still, I would like to gather all the load and
store instructions in one place to make future changes easier.
This adjusts the NaN replacement logic introduced in #9928 to work around the HLSL compiler optimizing away calls to isnan, which caused that functionality to not work with ubershaders on D3D11 and D3D12 (it did work with specialized shaders, despite a warning being logged for both; that warning is also now gone). Note that the `D3DCOMPILE_IEEE_STRICTNESS` flag did not solve this issue, despite the warning suggesting that it might.
Suggested by @kayru and @jamiehayes.
This is a proper fix for the issue that 3071a1d was a workaround for.
It wasn't some kind of bug in the register cache that had laid dormant,
it was a simple mistake made in b24b79e.
Fixes a regression from ecf86bb.
The GPR allocation_order is initialized with only 28 elements,
so the 29th element ends up getting zero initialized.
Very sneaky bug...
Previously in Read_U64 and Write_U64 the value that was read or written
would be truncated to a 32-bit value before being passed off to the
memcheck handler, which can result in incorrect values being logged out.
Lets us simplify SDRUpdated() a little bit.
This also fixes the layout of UReg_SDR1. Turns out this struct has been
incorrect (from a little-endian perspective) the entire time and went
unnoticed, since the union was never used.
These are trivial to resolve.
Converting the structure member into a u32 results in no increase in
structure size, as it's making use of the three extra padding bytes in
the structure.
On a real Wii, these constants are normally written by the system menu
(maybe even as part of common SDK code?)
However, they're cleared by IOS whenever a PPC title is launched.
IOS memsets 0x0-0x3fff and then manually writes some constants
in low MEM1. PR #4723 added most of the writes in the 0x31xx region
but left out the four writes to the legacy constant region.
Previously Dolphin didn't actually clear 0-0x3fff so those constants
would stick around after a system menu execution.
011f7789e0 exposed those missing writes.
Prompted by https://dolphin.ci/#/builders/24/builds/985
A 1-character typo in a recent PR caused FifoCI builds to break
horribly and spew millions of panic alerts until buildbot crashed.
This PR adds a new config option -- defaulting to off -- that allows
Dolphin to abort early on when a panic alert occurs instead of
continuing forever.
Optimize division by a constant into multiplication. This method is
also used by GCC and LLVM.
We also add optimized paths for divisors 0, 1, and -1, because they
don't work using this method. They don't occur very often, but are
necessary for correctness.
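To illustrate the general technique, here is a sketch of truncating signed 32-bit division done with a multiply and shifts. It follows the Granlund and Montgomery formulation; the commit may well use a different (equivalent) variant, and every name below is hypothetical. Divisors 0, 1 and -1 are rejected up front in this sketch, in line with the special paths mentioned above.

```cpp
#include <cassert>
#include <cstdint>

struct SignedMagic
{
  std::int32_t multiplier;  // m - 2^32, always negative for |d| >= 2
  int shift;                // post-shift applied after the high multiply
  bool negate;              // true when the divisor is negative
};

// Computes the constants for truncating signed 32-bit division by `divisor`.
inline SignedMagic ComputeSignedMagic(std::int32_t divisor)
{
  assert(divisor != 0 && divisor != 1 && divisor != -1);
  const std::uint32_t abs_d = divisor < 0 ? 0u - static_cast<std::uint32_t>(divisor) :
                                            static_cast<std::uint32_t>(divisor);
  int l = 1;  // l = ceil(log2(abs_d))
  while ((std::uint64_t{1} << l) < abs_d)
    ++l;
  const std::uint64_t m = 1 + (std::uint64_t{1} << (31 + l)) / abs_d;
  const std::int64_t m_prime = static_cast<std::int64_t>(m) - (std::int64_t{1} << 32);
  return {static_cast<std::int32_t>(m_prime), l - 1, divisor < 0};
}

// Divides using a multiply, an add and shifts. Relies on >> being an arithmetic
// shift for negative values (guaranteed since C++20).
inline std::int32_t DivideByConstant(std::int32_t n, const SignedMagic& magic)
{
  const std::int64_t product = std::int64_t{magic.multiplier} * n;
  const std::int64_t high = product >> 32;  // signed high half of the product
  std::int64_t q = (n + high) >> magic.shift;
  if (n < 0)
    q += 1;  // turn the floor produced by the shifts into truncation toward zero
  return static_cast<std::int32_t>(magic.negate ? -q : q);
}
```

For a divisor of 7 this yields the familiar multiplier 0x92492493 (stored as a negative 32-bit value) with a shift of 2. In a JIT, `ComputeSignedMagic` would run once at compile time and only the multiply, add and shifts would be emitted.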
Makes the enum strongly typed instead of interacting with a raw u32
value. While we're at it, we can add helpers to the NWC24Config to make
using code poke at the internals of the class a little bit less and also
make the querying a little nicer to read.
Previously we were using heap-allocating maps that last for the entire
duration of the emulator running.
Given that the size N of both of these maps is very small (< 20 elements),
we can just make use of an array of pairs and perform linear scans. This
is also fine, given this code isn't particularly "hot" either, so this
won't be run often.
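The pattern looks roughly like this. The table contents and the names `kNames` and `FindName` are made up for the example, since the real keys and values aren't shown here.

```cpp
#include <algorithm>
#include <array>
#include <optional>
#include <string_view>
#include <utility>

// Hypothetical table; the real keys and values are not shown in this message.
constexpr std::array<std::pair<int, std::string_view>, 3> kNames{{
    {1, "first"},
    {2, "second"},
    {7, "seventh"},
}};

// Linear scan over a fixed array: no heap allocation, and for a handful of
// elements the scan is cheap.
inline std::optional<std::string_view> FindName(int key)
{
  const auto it = std::find_if(kNames.begin(), kNames.end(),
                               [key](const auto& entry) { return entry.first == key; });
  if (it == kNames.end())
    return std::nullopt;
  return it->second;
}
```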
It seems like we spend a lot of the game list scanning time in
updateAdditionalMetadata, which I suppose makes sense considering
how many different files that function attempts to open.
With the addition of just one little atomic operation, we can make
it safe to call updateAdditionalMetadata without holding a lock.
These functions don't touch any class state, so they can be turned into
internal helper functions.
While we're at it, we can move the enumerations as well.
Although it's not clear what the xA and xB conditions are intended to do, the pattern indicates that xB is the regular version and xA is the inverted version, so for consistency, IsConditionB should be the main function.
Since the merge of b24b79e, we've gotten reports that the
following games are broken on JitArm64:
* Sonic Heroes
* The SpongeBob SquarePants Movie
* Astérix & Obélix XXL
* The Incredibles: Rise of the Underminer
Disabling the register cache avoids the issue, so the cause
of the bug might not actually have anything to do with the
newly implemented instructions. Nevertheless, I don't want
to ship a beta with this problem present, so I would like to
disable these instructions for the time being.
HandleFastmemFault works correctly when faults only happen in
expected locations, but it does some things that are rather
dangerous for faults in unexpected locations, like decrementing
an iterator without checking whether it's equal to begin.
This change cleans up the logic by making m_fault_to_handler's
key be the end of the fastmem region instead of the start.
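A rough sketch of why keying on the end address helps, using a plain `std::map` and hypothetical types (`FastmemArea`, `FindHandler`) rather than the actual JitArm64 structures:

```cpp
#include <cstdint>
#include <map>
#include <optional>

struct FastmemArea
{
  std::uintptr_t start;  // first address covered by the region
  int handler_id;        // hypothetical stand-in for the slowmem handler
};

// Keyed by the END (one past the last address) of each region.
using FaultToHandler = std::map<std::uintptr_t, FastmemArea>;

// upper_bound returns the first region whose end is greater than `address`, so no
// iterator ever has to be decremented. An unexpected address simply fails the
// end() check or the start check.
inline std::optional<int> FindHandler(const FaultToHandler& regions, std::uintptr_t address)
{
  const auto it = regions.upper_bound(address);
  if (it == regions.end() || address < it->second.start)
    return std::nullopt;
  return it->second.handler_id;
}
```

With the start address as the key, the equivalent lookup needs to step back one element from `upper_bound`, which is exactly the unchecked decrement described above.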
Hardware testing indicated that SRS uses a different list of registers than LRS (specifically, acS.h can be used with SRSH but not LRS, and SRS does not support AX registers, and there are 2 encodings that do nothing).
* DSP*Arithmetic: Fix grammar for ANDCF and ANDF
* DSP*Arithmetic: Fix registers used by MOVAX and MOV
* DSP*Branch: Fix documentation for JMPR
* DSP*Branch: Fix HALT encoding ("I think I saw a two")
* DSP*ExtOps: Fix 'LN encoding (The listed encoding was for 'L)
* DSP*ExtOps: Improve documentation for 'LD and 'LDAX
* DSPJitExtOps: Correct typo
* DSP*LoadStore: Remove obsolete comment about pc in SRS (This was fixed in 1419e7e5b2)
* DSP*LoadStore: Fix comments for LRR/SRR
* DSP*Misc: Improve documentation for SBCLR and SBSET
* DSP*Multiplier: Fix MULXAC encoding (The previous encoding was for MULXMVZ)
* DSP*Multiplier: Fix tabs in MULCAC and MULCMVZ (There are some other tabs in comments in the JIT, but these are the only ones that are in instruction comments instead of indicating the corresponding interpreter code. Those other comments can be corrected in a different PR, as they're not documentation related.)
* DSPJitMultiplier: Fix MULXMVZ typo
These instructions were already implemented by Dolphin, but never added to the manual. Extension instructions will be handled in a later commit, as will instructions that were not previously implemented by Dolphin.
We were using a "value" register to avoid clobbering physical_addr,
but this isn't actually needed anymore. The only bits we need from
physical_addr after we start clobbering it are bits 5-9, and
those bits are identical in effective_addr and physical_addr,
so we can read them from effective_addr instead.
Fixes https://bugs.dolphin-emu.org/issues/12620
The changed code did not match the corresponding code in VertexShaderGen. Some parts of the sky have 2 color channels in each vertex, while others only have 1, despite only color channel 0 being used and XFMEM_SETNUMCHAN being set to 1 for both of them. The old code (from #4601) caused channel 0 to be set to channel 1 if the vertex contained both color channels but the number of channels was set to 1, which is wrong.
Makes our conversions between the different signs explicit to indicate
that they're intentional and also silences compiler warnings when
compiling with sign conversion or stricter truncation warnings enabled.
The extension needs to happen in SetLongAcc, not GetLongAcc, as the extension needs to always be reflected in acS.h.
There is no functional difference with the write handler for acS.h, but it is more readable than 4 casts in a row.
`IsLess` would incorrectly return true if both `SR_OVERFLOW` and `SR_SIGN` are set, as `(sr & SR_OVERFLOW) != (sr & SR_SIGN)` becomes `SR_OVERFLOW != SR_SIGN` which is true as the two masks are different. This broke in e651592ef5.
This issue only affected the DSP LLE Interpreter, and not the DSP LLE JIT.
I've also included a simple test case for this. `ax0.l` (on the top left) is set to 0 if the instruction following `IFL` does not execute and to 1 if it is executed.
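The mistake can be reproduced in isolation. The mask values below are assumed for the sake of the example, and `IsLessBuggy` / `IsLessFixed` are hypothetical names for the two forms of the comparison.

```cpp
#include <cstdint>

// Status register flag masks, assumed here for illustration.
constexpr std::uint16_t SR_OVERFLOW = 0x0002;
constexpr std::uint16_t SR_SIGN = 0x0008;

// Buggy form: compares the masked values themselves, so with both flags set it
// evaluates 0x0002 != 0x0008, which is true.
constexpr bool IsLessBuggy(std::uint16_t sr)
{
  return (sr & SR_OVERFLOW) != (sr & SR_SIGN);
}

// Fixed form: reduce each flag to a bool first, then compare.
constexpr bool IsLessFixed(std::uint16_t sr)
{
  return ((sr & SR_OVERFLOW) != 0) != ((sr & SR_SIGN) != 0);
}
```

The two forms only disagree when both flags are set, which is probably why the bug went unnoticed.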
Retail-signed discs use the format: IOS56-64-v5661.wad
Debug-signed discs use the format: firmware.64.56.22.29.wad
Debug-signed discs usually have a 128 version of the firmware as well,
since some devkits have 128 MB MEM2. (Retail has 64 MB.)
I found it a little bit annoying that you can't start typing
the desired address immediately after opening the window.
Also getting rid of the window's ? button while I'm at it.
When you come across a cheat code in a place like the Dolphin
wiki, it's often posted like this:
$16:9 Widescreen
0441187C 3FE38E39
Sometimes users try to paste this in its entirety into the Code
field, which leads to Dolphin reporting an error on the first line.
I think it would be nice to make this a little smoother by having
Dolphin accept having a first line that starts with $.
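One way such handling could look is sketched below. `PastedCheat` and `SplitPastedCheat` are hypothetical, and whether Dolphin discards the `$` line or uses it as the code's name isn't stated here; the sketch just separates the two and leaves that decision to the caller.

```cpp
#include <cstddef>
#include <string>
#include <string_view>

struct PastedCheat
{
  std::string name;  // text after the leading '$', empty if there was no such line
  std::string code;  // everything else, untouched
};

// Splits an optional first line of the form "$Name" from the code lines below it.
// Only '\n' line endings are handled in this sketch.
inline PastedCheat SplitPastedCheat(std::string_view text)
{
  PastedCheat result;
  if (!text.empty() && text.front() == '$')
  {
    const std::size_t newline = text.find('\n');
    const bool has_newline = newline != std::string_view::npos;
    result.name = std::string(text.substr(1, has_newline ? newline - 1 : newline));
    text = has_newline ? text.substr(newline + 1) : std::string_view();
  }
  result.code = std::string(text);
  return result;
}
```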
For several reasons:
- It pegs the CPU at 95% for scanning even when Dolphin is idle
- WiimoteScannerHidapi works fine on macOS
- Less macOS code to maintain
When making 92d1d60, I checked whether the ~0x1f masking in dcbx
actually was necessary. I came to the conclusion that it wasn't,
so I removed it. However, I hadn't checked the second half of
InvalidateICache closely enough - the masking is actually needed.
This commit re-adds the masking, but this time in C++ code instead
of in jitted code in order to save icache. Though I suppose the
difference doesn't matter all that much, since this is in farcode
and all...
Hopefully fixes https://bugs.dolphin-emu.org/issues/12612.
This implements the behavior described in
https://bugs.dolphin-emu.org/issues/12565.
Thank you to eigenform, delroth, phire, marcan, segher, and Extrems
for all helping in one way or another with the efforts to reverse
engineer this behavior, and to Rylie for reporting the issue.
Write_U16_Swap leaves the upper 32 bits alone. Reimplementing this
correctly in the JIT would require more than one instruction,
so let's just call Write_U16_Swap instead, like Jit64 does.
One of the following commits will add emulation of a quirk
that only happens when writing to memory which is mapped as
write-through or cache-inhibited, so let's keep track of
which memory is mapped in this way.
This adds about a frame of latency, and since most games don't change
VI registers during scanout, we can get away with outputting the XFB at
the start of scanout. WWE Crush Hour is the (only currently known)
exception, which has flickering problems when doing it this way.
This adds a path to perform the output at the end of scanout, and gates
it behind an option which defaults to using the latency-reducing
pre-scanout path.
PR https://github.com/dolphin-emu/dolphin/pull/9700 removed spaces from within control names, which some users complained about, and their point of view is kind of understandable:
https://bugs.dolphin-emu.org/issues/12605
With this change, only spaces outside (between) control names are trimmed, which are the ones we wanted to trim in the first place.
This will still retain the major advantages from 9700.
Basically, "`Button 1` + `Button 2`" was showing as "`Button1`+`Button2`", while it will now show as "`Button 1`+`Button 2`".
Originally, 1479 (for example) would disassemble as `lsr $ACC0, #-7`. At some point (likely the conversion to fmt), this regressed to `lsr $ACC0, #4294967289`. Now, it disassembles as `lsr $ACC0, #7`.
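The three outputs can be reproduced from the opcode with a little arithmetic. In this sketch the shift immediate is assumed to be the low six bits of the opcode, treated as a signed two's complement value (an assumption that matches the numbers above); the function names are hypothetical.

```cpp
#include <cstdint>

// Extracts the low six bits of the opcode as a signed two's complement value.
constexpr std::int32_t SignedShiftImmediate(std::uint16_t opcode)
{
  const std::int32_t raw = opcode & 0x3f;
  return raw >= 0x20 ? raw - 0x40 : raw;
}

// For a right shift, the amount shown to the user is the negated immediate.
constexpr std::int32_t LsrDisplayAmount(std::uint16_t opcode)
{
  return -SignedShiftImmediate(opcode);
}
```

For 0x1479 the signed immediate is -7; printing that value as an unsigned 32-bit number gives 4294967289, and negating it gives the 7 that is now displayed.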
The CPU-side AX library enables it by default and uses hardcoded parameters.
CMD_COMPRESSOR_TABLE_ADDR (0x0A) was incorrect. It's always a nop on the
GameCube and was probably confused with the Wii version.
It was believed that this only mattered when the rounding mode was
set to round to infinity, which games generally don't do, but it
can also affect the sign of the output when the inputs are all zero.
So it turns out you have to pass XMM0 as the clobber register
to HandleNaNs, because HandleNaNs uses BLENDVPD and BLENDVPD
implicitly uses XMM0, and nobody noticed when I broke this in
2c38d64 because nobody plays the one game that needs accurate NaNs.
This was added because YAGCD's info on MAXANISO (near TX_SETMODE0 in Section 5.11.1) claims it's the case, but Extrems says it does work. I haven't tested anything myself, and dolphin still does not actually implement anisotropic filtering based on this field.
This reverts commit 66b992cfe4.
A new (additional) correctness issue was revealed in the old
AArch64 code when applying it on top of modern JitArm64:
LSR was being used when LSRV was intended. This commit uses LSRV.
The workaround added in 30f9f31 caused a regression where Dolphin
incorrectly replaced runs of one byte with runs of another byte
when writing WIA and RVZ files. ReuseID::operator< was always
returning false unless the ReuseIDs being compared had different
partition keys, which caused std::map<ReuseID, GroupEntry>
to treat all ReuseIDs with the same partition key as equal.
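To illustrate why this kind of comparison bug makes std::map merge keys, here is a sketch with a simplified, hypothetical `Key` type standing in for ReuseID (the real type has more fields). std::map treats two keys as equal when neither compares less than the other.

```cpp
#include <cstddef>
#include <cstdint>
#include <map>
#include <tuple>

// Simplified stand-in for ReuseID; the field names are assumptions.
struct Key
{
  int partition_key;
  std::uint8_t value;
};

// Broken: only the partition key takes part in the ordering, so two keys
// with the same partition key never compare less than each other.
struct BrokenLess
{
  bool operator()(const Key& a, const Key& b) const
  {
    return a.partition_key < b.partition_key;
  }
};

// Correct: a lexicographic comparison over every field.
struct CorrectLess
{
  bool operator()(const Key& a, const Key& b) const
  {
    return std::tie(a.partition_key, a.value) < std::tie(b.partition_key, b.value);
  }
};

// Inserts a run of 0x00 and a run of 0xFF from the same partition and
// reports how many distinct entries the map ends up holding.
template <typename Compare>
std::size_t CountDistinctRuns()
{
  std::map<Key, int, Compare> groups;
  groups.emplace(Key{0, 0x00}, 1);
  groups.emplace(Key{0, 0xFF}, 2);
  return groups.size();
}
```

With `BrokenLess` the second insertion finds the first entry and is dropped, which matches the symptom of one byte run being replaced by another.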
This actually eliminates any setting pertaining to SD cards from the
NetPlay dialog, as it would effectively just be a duplicate of the
setting in the Wii pane, potentially causing confusion.
This also enables save data writing by default, as this is probably
what most players want, and should avoid them losing hours of progress
because they forgot to tick a checkbox.
This implementation is pretty efficient in my opinion. And "As
long as we aren't falling back to interpreter we're winning a lot"
applies to basically every instruction to some degree anyway.
The dcbz instruction needs to lock W30 so that the slowmem code will
push and pop it when calling into C++. Also, the slowmem code expects
that the address is present in W0, so replace the use of W0 as a scratch
register in the fastmem code with the now locked W30.
We currently have a bug when calling Arm64GPRCache::Flush with
FlushMode::MaintainState, zero free host registers, and at least
one guest register containing an immediate. We end up grabbing
a temporary register from the register cache in order to be
able to write the immediate to memory, but grabbing a temporary
register when there are zero free registers causes the least
recently used register to be flushed in a way which does not
maintain the state of the register cache.
To get around this, require callers to pass in a temporary
register in the GPR MaintainState case. In other cases,
passing in a temporary register is not required but can help
avoid spilling a register (if the caller already had a
temporary register at hand anyway, which in particular will
be the case in my upcoming memcheck pull request).
release-ubu-x64 currently fails with "sorry, unimplemented: non-trivial
designated initializers not supported". pr-ubu-x64 doesn't for some
reason, but we might as well remove the designated initializer.
This fixes various texture offsetting issues with negative texture coordinates (bringing the software renderer in line with the hardware renderers). It also handles the invalid wrap mode accurately (as was done for the hardware renderers in the previous commit). Lastly, it handles wrapping with non-power-of-2 texture sizes in a hardware-accurate way (which is somewhat broken looking, as games aren't supposed to use wrapping with non-power-of-2 sizes); this has not been done for the hardware renderers.
A voice is considered running if and only if `running` equals 1,
not if `running` is not equal to 0.
This fixes https://bugs.dolphin-emu.org/issues/12508 because for some
reason *The Sims 2 - Castaway* sets `running` to 8 when a stream
finishes playing; previously our AX HLE would just loop the voice
and eventually crash after accessing invalid memory addresses.
Thanks to JMC47 and delroth's help, I've verified that this is the
correct check for the following ucodes:
GC:
* 0x3ad3b7ac
* 0x3daf59b9
* 0x4e8a8b21
* 0x07f88145
* 0xe2136399
* 0x3389a79e
Wii:
* 0x347112ba
* 0xfa450138
* 0xadbc06bd
And while I was fixing the running check, I noticed that the is_stream
field was also being handled incorrectly, so I've fixed that as well.
Putting AX functions from AXVoice.h in an anonymous namespace does
successfully prevent compilers from merging those functions and
allows us to avoid ODR violations.
However, tools such as gdb still mix up AX GC and AX Wii functions
and variables because those have the exact same symbol names.
This can be fixed by using inline namespaces, which are transparent
at the source code level but force AX GC and AX Wii symbols to be
different.
The fast path of using CVTSD2SS/FCVTN rounds the significand if it
can't be exactly represented as a single, whereas the accurate path
instead truncates the significand. So we should only use the fast
path if we know that the lower bits of the significand are not set.
This is not known to affect any games.
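A rough sketch of the "lower bits of the significand are not set" test, assuming IEEE-754 doubles. A double has 52 explicit significand bits and a single has 23, so a conversion discards the lowest 29. The function name is hypothetical, and the sketch deliberately ignores the exponent range, which a real fast-path check would also have to consider.

```cpp
#include <cstdint>
#include <cstring>

// Mask covering the 29 significand bits that do not fit in a single.
constexpr std::uint64_t DROPPED_SIGNIFICAND_BITS = (std::uint64_t{1} << 29) - 1;

// Returns true if converting to single cannot round or truncate the
// significand, because the bits that would be discarded are all zero.
// Exponent overflow/underflow is NOT checked here.
bool LowSignificandBitsClear(double value)
{
  std::uint64_t bits;
  std::memcpy(&bits, &value, sizeof(bits));
  return (bits & DROPPED_SIGNIFICAND_BITS) == 0;
}
```

When this returns true, rounding and truncating the significand give the same result, so the two paths cannot disagree on it.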
Passing a width of 64 and registers encoded as double to
DUP resulted in an invalid instruction. The registers should
be encoded as quads in this situation.
Fixes https://bugs.dolphin-emu.org/issues/12575.
Manually encoding and decoding logical immediates is error-prone.
Using ORRI2R and friends lets us avoid doing the work manually,
but in exchange, there is a runtime performance penalty. It's
probably rather small, but still, it would be nice if we could
let the compiler do the work at compile-time. And that's exactly
what this commit does, so now I have no excuse for trying to
manually write logical immediates anymore.
If a branch is unconditional, its target should not be in farcode,
since that defeats the purpose of farcode (putting seldom executed
code in farcode to keep it out of the icache when possible).
Fixes a 58698b8380 regression. (The EXCEPTION_EXTERNAL_INT
immediate being wrong meant that we never took the branch,
masking the problem of the MSR.EE immediate being wrong...)
In cases where we already know that there is an exception,
either because we just checked for it or because we were
the ones that generated the exception to begin with,
we can skip the branch inside WriteExceptionExit.
Unlike most constants we emit in JitArm64, these constants are
*not* inherent to the CPU we're emulating, and can have whatever
values we want. Let's handle them more robustly, in case we
decide to change their values in the future.
Public domain does not have an internationally agreed upon definition.
As such, it's generally preferred to use an extremely liberal license,
which can explicitly list the rights granted by the copyright holder.
The CC0 license is the usual choice here.
This "relicensing" is done without hunting down copyright holders, since
it is presumed that their release of this work into the public domain
authorizes us to redistribute this code under any other license of our
choosing.
This code was part of Dolphin's relicensing from v2 to v2+ a while back,
we just never updated these copyright headers. I double-checked that
segher gave us permission to relicense this code to v2+ on 2015-05-16.
SPDX standardizes how source code conveys its copyright and licensing
information. See https://spdx.github.io/spdx-spec/1-rationale/ . SPDX
tags are adopted in many large projects, including things like the Linux
kernel.
This broke ejecting Wii discs while the game is running, as the drive state was set to Ready even when no disc was present, but other code still reported the missing disc, which confused games as you can't be both ready to read and have no disc. That would cause games to show an unrecoverable error screen, instead of a "please insert the game disc" screen.
This only affected Wii games; the GameCube games used regular disc reads which worked fine.
Performance optimization, along with making the code a little
neater. Saves us from performing a single -> double -> single
conversion when calling UpdateFPRFSingle.
If we already have to use a GPR, we might as well take advantage
of the nice immediate encodings provided by GPR ORR. This is
faster, smaller, and saves a register.
Some of the code used when the carry flag is known to be a
constant value is really not much better than just setting
the carry flag and then using the normal code, and with how
rarely this code runs, it isn't well tested either.
Might as well get rid of some of this code and simplify things.
These optimizations were already present, but only when d == a. They
also make sense when this condition does not hold.
- imm == 0
Before:
41 BB 00 00 00 00 mov r11d,0
45 2B DF sub r11d,r15d
After:
45 8B DF mov r11d,r15d
41 F7 DB neg r11d
- imm == -1
Before:
41 BD FF FF FF FF mov r13d,0FFFFFFFFh
44 2B EE sub r13d,esi
0F 93 45 68 setae byte ptr [rbp+68h]
After:
44 8B EE mov r13d,esi
41 F7 D5 not r13d
C6 45 68 01 mov byte ptr [rbp+68h],1
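The two special cases can be checked against a small reference model. This sketch assumes the usual definition of a subtract-from-immediate with carry (result = ~a + imm + 1, carry = carry out of the 32-bit sum); `Subfic` and `SubficResult` are names invented for the example.

```cpp
#include <cstdint>

struct SubficResult
{
  std::uint32_t value;
  bool carry;
};

// Reference model: value = imm - a, computed as ~a + imm + 1 so that the
// carry out of bit 31 can be observed in a 64-bit sum.
SubficResult Subfic(std::uint32_t a, std::uint32_t imm)
{
  const std::uint64_t sum =
      std::uint64_t{static_cast<std::uint32_t>(~a)} + std::uint64_t{imm} + 1;
  return {static_cast<std::uint32_t>(sum), (sum >> 32) != 0};
}

// imm == 0:  0 - a is a plain negation (NEG).
// imm == -1: 0xFFFFFFFF - a flips every bit (NOT), and the sum
//            ~a + 0xFFFFFFFF + 1 always carries out, so the carry flag
//            is a constant 1 and can be stored directly.
```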
Without this, the code added in ac28b89 misbehaves and considers
AArch64 netplay clients to not have hardware FMA support, telling
all clients to disable FMA support, which causes a desync between
x64 and AArch64 due to JitArm64 not being able to disable FMA support.
Fixes a regression from 5.0-12066, where setting the GFXBackend variable
to one other than the current global backend would crash Dolphin upon
launching the game.
fcmpX only updates the FPCC bits, not the C bit.
This was already correctly implemented in the interpreter.
Not known to affect any games, but affects a hardware test.
This fixes bounding box shaders failing to compile under Vulkan, due to
differences between GLSL and HLSL in the return value of vector
comparisons and what types these functions accept. I included all() for
the sake of completeness.
At higher resolutions, our bounding box dimensions end up being
slightly larger than original hardware in some cases. This is not
necessarily wrong, it's just an artifact of rendering at a higher
resolution, due to bringing out detail that wouldn't have appeared on
original hardware. It causes a texel to fall partially on what would
have been a single pixel at native resolution, resulting in the
coordinates getting bumped up to the next valid value. In many cases,
these slightly larger bounding boxes are perfectly fine, as games don't
hard-code expected dimensions. It is problematic in Paper Mario TTYD
though, for a somewhat complicated reason.
Paper Mario TTYD frequently uses EFB copies to pre-render a bunch of
animation frames for a character sprite (especially in Chapter 2), so
that it can then render 100 or more of them without bringing the
GameCube to its knees. Based on my observation, the game seems to set
aside a region of memory to store these EFB copies. This region is
obviously fairly small, as the GameCube only has 24MB of RAM. There are
2 rooms in Chapter 2 where you fight a horde of as many as 100 Jabbies,
which are also rendered using EFB copies, so in these rooms the game ends
up making 130(!) EFB copies just for Puni and Jabbi sprites. This seems
to nearly fill the region of memory it set aside for them.
Unfortunately, our slightly larger bounding boxes at higher resolutions
result in overflowing this memory, causing very strange behavior. Some
EFB copies partially overlap game state, resulting in reading it as a
garbage RGB5A3 texture that constantly changes. Others apparently
somehow trigger a corner case in our persistent buffer mapping, causing
them to partially overwrite earlier EFB copies.
What this change does is only include the screen coordinates that align
with the equivalent native resolution pixel centers, which generally
results in the bounding boxes being more in line with original
hardware. It isn't perfect, but it's enough to fix Paper Mario TTYD's
Jabbi rooms by avoiding the buffer overflow. Notably, it is more
accurate at odd resolutions than at even resolutions. Native resolution
is completely unaffected by this change, as should be the case. This
change may also have a small positive impact on shader performance at
higher resolutions, as there will be fewer atomic operations performed.
Not doing this can cause desyncs when TASing. (I don't know
how common such desyncs would be, though. For games that
don't change rounding modes, they shouldn't be a problem.)
When I added the software FMA path in 2c38d64 and made us use
it when determinism is enabled, I was assuming that either the
performance impact of software FMA wouldn't be too large or CPUs
that were too old to have FMA instructions were too slow to run
Dolphin well anyway. This was wrong. To give an example, the
netplay performance went from 60 FPS to 30 FPS in one case.
This change makes netplay clients negotiate whether FMA should
be used. If all clients use an x64 CPU that supports FMA, or
AArch64, then FMA is enabled, and otherwise FMA is disabled.
In other words, we sacrifice accuracy if needed to avoid massive
slowdown, but not otherwise. When not using netplay, whether to
enable FMA is simply based on whether the host CPU supports it.
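The negotiation rule can be sketched as follows. The `ClientInfo` struct, the `Arch` enum and `ShouldUseFMA` are hypothetical names for illustration; the actual netplay code and packet layout are not shown in this message.

```cpp
#include <algorithm>
#include <vector>

enum class Arch
{
  X64,
  AArch64,
};

// Hypothetical per-client capability record.
struct ClientInfo
{
  Arch arch;
  bool x64_has_fma;  // Only meaningful when arch == Arch::X64.
};

// FMA is enabled only if every client is either AArch64 or an x64 CPU
// with FMA instructions; one client without FMA disables it for all.
bool ShouldUseFMA(const std::vector<ClientInfo>& clients)
{
  return std::all_of(clients.begin(), clients.end(), [](const ClientInfo& client) {
    return client.arch == Arch::AArch64 || client.x64_has_fma;
  });
}
```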
The only remaining case where the software FMA path gets used
under normal circumstances is when an input recording is created
on a CPU with FMA support and then played back on a CPU without.
This is not an especially common scenario (though it can happen),
and TASers are generally less picky about performance and more
picky about accuracy than other users anyway.
With this change, FMA desyncs are avoided between AArch64 and
modern x64 CPUs (unlike before 2c38d64), but we do get FMA
desyncs between AArch64 and old x64 CPUs (like before 2c38d64).
This desync can be avoided by adding a non-FMA path to JitArm64 as
an option, which I will wait with for another pull request so that
we can get the performance regression fixed as quickly as possible.
https://bugs.dolphin-emu.org/issues/12542
Back when I wrote this code, I believe I set it to use a custom path
so that the cache would end up in a directory which Android considers
to be a cache directory. But nowadays the directory which Dolphin's
C++ code considers to be the cache directory is such a directory,
so there's no longer any reason to override the default path.
This prevented some devices from being recreated correctly, as they were exclusive (e.g. DInput joysticks).
This is achieved by calling Settings::ReleaseDevices(), which releases all of the UI's device shared pointers.
If we are on the host (Qt) thread, DevicesChanged() is now called inline, to avoid devices being held onto by the UI.
For this, I had to add a method to Qt to check whether we are on the host thread.
Avoid calling ControllerInterface::RefreshDevices() from the CPU thread if the emulation is running
and we manually refresh devices from Qt, as that is not necessary anymore.
Refactored the way IOWindow lists devices to make it clearer and hold onto disconnected devices.
There were so many issues with the previous code:
-Device changes would not be reflected until the window was re-opened
-If there was no default device, it would fail to select the device at index 0
-It could have crashed if we had 0 devices
-The default device was not highlighted as such
This helps us keep the most important devices (e.g. Mouse and Keyboard) at the top
of the list of devices (they still are on all OSes supported by Dolphin)
and makes hotplug devices like DSU appear at the bottom.
-Fix Add/Remove/Refresh device safety: devices could be added and removed at the same time, causing missing or duplicated devices (rare but possible)
-Fix other device population race conditions in ControllerInterface
-Avoid re-creating all devices when Dolphin is being shut down
-Avoid re-creating devices when the render window handle has changed (just the relevant ones now)
-Avoid sending Devices Changed events if devices haven't actually changed
-Made most device population async, to increase performance and avoid hanging the host or CPU thread on manual device refreshes
A "devices changed" callback could have ended up waiting on another thread that was also populating devices
and waiting on the previous thread to release the callbacks mutex.
Running the min/max operation on the upside down, quad-rounded pixel
coordinates before inverting them to the standard upper-left origin
produces wrong results. Therefore, we need to do the inversion before
rounding to pixel quads.
Fragment coordinates always have a 0.5 offset from a whole integer, as
that's where the pixel center is on modern GPUs. Therefore, we want to
always round the fragment coordinates down for bounding box
calculations. This also renders the pixel center offset useless, as 0.5
vs ~0.5833333 makes no difference when rounding down.
The SDK seems to write "default" bounding box values before every draw
(1023 0 1023 0 are the only values encountered so far, which happen to
be the extents allowed by the BP registers) to reset the registers for
comparison in the pixel engine, and presumably to detect whether GX has
updated the registers with real values. Handling these writes and
returning them on read when bounding box emulation is disabled or
unsupported, even without computing real values from rendering, seems
to prevent games from corrupting memory or crashing.
This obviously does not fix any effects that rely on bounding box
emulation, but having the game not clobber its own code/data or just
outright crash is a definite improvement.
-Reworked thread waits to never hang the Host thread for more than a very short time
(e.g. when disabling DSU its thread now closes almost immediately)
-Improve robustness when a large number of devices are connected
-Add device disconnection detection (they'd stay there forever until manually refreshed)
We also need to ensure the CPU does not receive stale values
which have been updated by the GPU. Apparently the buffer here
is not coherent on NVIDIA drivers. Not sure if this is a driver
bug/spec violation or not, one would think that
glGetBufferSubData() would invalidate any caches as needed, but
this path is only used on NVIDIA anyway, so it's fine. A point
to note is that according to ARB_debug_report, it's moved from
video to host memory, which would explain why it needs the
cache invalidate.
This fixes rendering issues in Viewtiful Joe (https://bugs.dolphin-emu.org/issues/12525), but it is not entirely hardware accurate, as hardware testing showed other, more complex behavior in this case. However, it should be good enough for our purposes.
Looks like the option was added to the Wx UI at commit 198d3b69, which
was a few months after the advancedWidget was originally ported from
Wx to Qt, but before anyone was actually using Qt.
When determinism is enabled, we either want all CPUs to use FMA or
we want no CPUs to use FMA. Until now, Jit64 has been doing
the latter. However, this is inaccurate behavior, all CPUs since
Haswell support FMA, and getting JitArm64 to match the exact
inaccurate rounding used by Jit64 would be a bit annoying. This
commit switches us over to using FMA on all CPUs when determinism
is enabled, with older CPUs calling the std::fma function.
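For reference, a small sketch of why fused and unfused multiply-add can disagree, which is the source of the desyncs discussed here. It uses only the standard `std::fma`; the two wrapper function names are made up, and the `volatile` is there solely to stop the compiler from contracting the unfused version into an FMA.

```cpp
#include <cmath>

// Fused: a * b + c is computed exactly and rounded once at the end.
double FusedMultiplyAdd(double a, double b, double c)
{
  return std::fma(a, b, c);
}

// Unfused: the product is rounded to double first, then the sum is rounded.
double UnfusedMultiplyAdd(double a, double b, double c)
{
  // volatile forces the intermediate product to be materialized as a
  // double, so the compiler cannot merge the two operations.
  volatile double product = a * b;
  return product + c;
}
```

With a = 1 + 2^-27, b = 1 - 2^-27 and c = -1, the exact product is 1 - 2^-54, which rounds to 1.0 as a double. The unfused version therefore returns 0, while the fused version returns -2^-54.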
MAX_XFB_WIDTH/HEIGHT are the largest XFB sizes seen in practice, but do not make sense to use for the backbuffer size, which should be the size of the window. The old code created screenshots with a size of 720x540 on NTSC games when "Dump Frames at Internal Resolution" is unchecked; now, the window size is used.
Adds a new PlatformID for universal builds. This will allow single architecture
builds to be updated through the single architecture path, and universal builds
to be updated with universal builds.
-add a way to reset their value (from the mappings UI)
-fix "memory leak" where they would never be cleaned,
one would be created every time you wrote a character after a "$"
-fix ability to create variables with an empty string by just writing "$" (+added error for it)
-Add $ operator to the UI operators list, to expose this functionality even more
This fixes a nasty issue where you can change the Dual Core setting
during emulation, if it has been overridden by GameINI or NetPlay, by
simply changing any of the non-disabled settings. This is because
changing any of the settings will write all of them to the config.
This issue is particularly nasty because managing to disable Dual Core
during emulation, and then stopping it, results in the emulator core
being totally deadlocked. It's impossible to recover from this state,
and Dolphin will remain as a zombie process on the system, consuming
resources and holding locks, until forcibly killed.
Added an RAII wrapper around the JITPageWriteEnableExecuteDisable() and
JITPageWriteDisableExecuteEnable() calls to make it harder to forget to
pair them in all code branches, as suggested by leoetlino.
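A minimal sketch of what such a wrapper can look like. The two free functions below are stand-ins that only count calls so the example compiles on its own, and the class name `ScopedJitWrite` is a placeholder rather than the name used in the codebase.

```cpp
// Stand-ins for the real functions; they only track nesting depth here.
static int g_write_enable_depth = 0;
void JITPageWriteEnableExecuteDisable() { ++g_write_enable_depth; }
void JITPageWriteDisableExecuteEnable() { --g_write_enable_depth; }

// Placeholder name: enables writing on construction and restores execute
// permission on destruction, so every exit path is covered.
class ScopedJitWrite
{
public:
  ScopedJitWrite() { JITPageWriteEnableExecuteDisable(); }
  ~ScopedJitWrite() { JITPageWriteDisableExecuteEnable(); }

  // Copying would unbalance the enable/disable pairing.
  ScopedJitWrite(const ScopedJitWrite&) = delete;
  ScopedJitWrite& operator=(const ScopedJitWrite&) = delete;
};

// Example caller with an early return; the destructor still runs.
bool EmitCode(bool fail_early)
{
  ScopedJitWrite guard;
  if (fail_early)
    return false;
  return true;
}
```

The benefit over manual calls is that an early return or exception cannot leave the pages writable.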
Removed the unavailable CPU core dialog box that asked users to change their
selected CPU core to one that is available. Instead, Dolphin now just overrides
the core to the default, and logs that it performed the override.
In macOS 11.2, mprotect can no longer change the access protection settings of
pages that were previously marked as executable to anything but PROT_NONE. This
commit works around this new restriction by bypassing the mprotect based write
protection and instead relying on the write protection provided by MAP_JIT.
Analytics:
- Incorporated fix to allow the full set of analytics that was recommended by
spotlightishere
BuildMacOSUniversalBinary:
- The x86_64 slice for a universal binary is now built for 10.12
- The universal binary build script can now be configured through command line
options instead of modifying the script itself.
- os.system calls were replaced with equivalent subprocess calls
- Formatting was reworked to be more PEP 8 compliant
- The script was refactored to make it more modular
- The com.apple.security.cs.disable-library-validation entitlement was removed
Memory Management:
- Changed the JITPageWrite*Execute*() functions to incorporate support for
nesting
Other:
- Fixed several small lint errors
- Fixed doc and formatting mistakes
- Several small refactors to make things clearer
This commit adds support for compiling Dolphin for ARM on macOS so that it can
run natively on M1 processors without going through Rosetta 2 emulation,
providing a 30-50% performance speedup and fewer of the hitches caused by Rosetta 2.
It consists of several key changes:
- Adding support for W^X allocation(MAP_JIT) for the ARM JIT
- Adding the machine context and config info to identify the M1 processor
- Additions to the build system and docs to support building universal binaries
- Adding code signing entitlements to access the MAP_JIT functionality
- Updating the MoltenVK libvulkan.dylib to a newer version with M1 support
Shouldn't have any behaviour change for regular usage as both masks are 32MB
by default.
But fixes theoretical buffer overrun when memory size override is used.
The interpreter implementation of fctiwx was treating rounding
mode 0 as "round to nearest, ties towards zero", which is not
an actual IEEE-754 rounding mode. The IBM document mentioned
in a comment at the top of the function, on the other hand,
treats rounding mode 0 as "round to nearest, ties to even",
which makes more sense.
This fixes one of JMC's console-recorded F-Zero GX replays on
JitArm64. (JitArm64 uses an interpreter fallback for fctiwx.)
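The difference between the two tie-breaking rules only shows up on exact halves. The sketch below contrasts them using standard library calls; it assumes the floating-point environment is in its default round-to-nearest mode, and both function names are invented for the example.

```cpp
#include <cmath>

// Ties to even: std::nearbyint follows the current rounding mode, which is
// round-to-nearest-even unless the program has changed it.
double RoundTiesToEven(double value)
{
  return std::nearbyint(value);
}

// Ties towards zero, modelling the old behavior described above:
// only fractions strictly greater than one half round away from zero.
double RoundTiesTowardZero(double value)
{
  const double truncated = std::trunc(value);
  const double fraction = std::fabs(value - truncated);
  if (fraction > 0.5)
    return truncated + std::copysign(1.0, value);
  return truncated;
}
```

For 3.5 the first function returns 4 and the second returns 3, while both return 2 for 2.5.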
The GC/Wii GPU rasterizes in 2x2 pixel groups, so bounding box values
will be rounded to the extents of these groups, rather than the exact
pixel. To account for this, we'll round the top/left down to even and
the bottom/right up to odd. I have verified that the values resulting
from this change exactly match a real Wii.
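The rounding rule can be written with two bit operations. This is a sketch with a hypothetical `BoundingBox` struct and function name, not the shader code itself.

```cpp
#include <cstdint>

// Hypothetical container for the four bounding box registers.
struct BoundingBox
{
  std::uint16_t left;
  std::uint16_t right;
  std::uint16_t top;
  std::uint16_t bottom;
};

// Snaps the box to 2x2 pixel groups: top/left are rounded down to an even
// coordinate (clear bit 0), bottom/right up to an odd one (set bit 0).
constexpr BoundingBox RoundToPixelQuads(BoundingBox box)
{
  return BoundingBox{
      static_cast<std::uint16_t>(box.left & ~1),
      static_cast<std::uint16_t>(box.right | 1),
      static_cast<std::uint16_t>(box.top & ~1),
      static_cast<std::uint16_t>(box.bottom | 1),
  };
}
```

For example, a box with left 5, right 10, top 3, bottom 8 becomes left 4, right 11, top 2, bottom 9.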
Any file which includes scmrev.h must be rebuilt when scmrev.h
is regenerated. By not including scmrev.h from any file other
than Version.cpp, incremental builds become a little faster.
When trying to do a small optimization in 8a0f5ea, I failed to
take into account that WeakFlush and FlushOne update m_query_count.
Only D3D11 and OGL had this problem, not D3D12 and Vulkan.
Sorry, the fix I made to the empty string in a29660a was not
actually sufficient, as DolphinQt will call tr on the string
regardless of whether it's marked with _trans. The proper fix
is to use nullptr, which DolphinQt has a special check for.
Sending an empty string to the translation system will not
result in getting an empty string back, but rather a description
of the currently loaded translations file. So empty strings
should not be marked as translatable.
Also adding some i18n comments and rewording a string I thought
was hard to understand.
Casting a value to u32 when it's originally an int, and it's exposed as an int to users,
could lead to cases where a negative number would end up as a positive one.
This doesn't really affect the value range of the attachment enum,
but I still think the code was wrong.
Heavily tested.
Settings.SECTION_INI_ANDROID and Settings.SECTION_BINDINGS
both have the value "Android", but we only want the former
to be marked as being handled by the new config system.
This change fixes a problem where controller settings were
not being properly saved to Dolphin.ini.
The STL has everything we need nowadays.
I have tried to not alter any behavior or semantics with this
change wherever possible. In particular, WriteLow and WriteHigh
in CommandProcessor retain the ability to accidentally undo
another thread's write to the upper half or lower half
respectively. If that should be fixed, it should be done in a
separate commit for clarity. One thing did change: The places
where we were using += on a volatile variable (not an atomic
operation) are now using fetch_add (actually an atomic operation).
Tested with single core and dual core on x86-64 and AArch64.
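To make the two cases concrete, here is a sketch using std::atomic. The function signatures are simplified assumptions and do not match the real CommandProcessor code; they only show the shape of a half-word write built from a separate load and store, next to a genuinely atomic add.

```cpp
#include <atomic>
#include <cstdint>

// Writes the lower 16 bits using a separate load and store. Each access is
// atomic, but the pair is not: if another thread updates the upper half
// between the load and the store, that update is overwritten.
void WriteLow(std::atomic<std::uint32_t>& reg, std::uint16_t low)
{
  const std::uint32_t old_value = reg.load();
  reg.store((old_value & 0xFFFF0000u) | low);
}

// A single read-modify-write; no concurrent update can be lost.
// fetch_add returns the value held before the addition.
std::uint32_t AddAtomically(std::atomic<std::uint32_t>& counter, std::uint32_t amount)
{
  return counter.fetch_add(amount);
}
```

Making `WriteLow` race-free would need a compare-exchange loop or a similar single-step update, which is the kind of change left for a separate commit.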
NumericSettings support a max, so let's use it.
It might not do much now, but the max and min values will be used to give visual feedback
in the UI in one of my upcoming input PRs.
The control expression editor allows line breaks, but the serialization was
losing anything after the first line break (\r, \n).
Instead of opting to encode them and decode them on serialization
(which I tried but was not safe, as it would lose \n written in the string by users),
I opted to replace them with a space.
and replacing it with a ":" prefix. Also remove white spaces and \n \t \r.
Bugfix: fix EmulatedController::GetStateLock() not being acquired when reading the
expression reference
Bugfix: MappingButton::UpdateIndicator() calling State(0) on outputs, breaking ongoing
rumbles if a game was running
Improvement: make expression previews appear in italics if they failed to parse correctly
Previously we set the texture coordinate to zero, now we set
the texture coordinate *index* to zero. This fixes the ripple
effect of the Mario painting in Luigi's Mansion.
This change should have no behavioral differences itself, but allows for changing the behavior of out of bounds tex coord indices more easily in the next commit. Without this change, returning tex0 for out of bounds cases and then applying the fixed-point logic would use the wrong tex dimension info (tex0 with I_TEXDIMS[1] or such), which is inaccurate.
Previously we set the texture coordinate to zero, now we set
the texture coordinate *index* to zero. This fixes the ripple
effect of the Mario painting in Luigi's Mansion.
Co-authored-by: Pokechu22 <Pokechu022@gmail.com>
Since the description updating is tied to the selection changing on the detail list, and the detail list is recreated on each object change, behavior was somewhat broken. Clearing the list changed the current row to zero, but nothing else (particularly m_object_data_offsets) had been updated, so the description was not necessarily correct (this is easier to observe now since the vertex data is at the end, so it's easier to get different lengths of register updates). Furthermore, subsequent clears did not update the current row since there was no visible selection, so it only changed the description once. The current row is now always set to zero, which forces an update (and also scrolls the list back to the top). The presence of FRAME_ROLE and OBJECT_ROLE are also checked so that the description is cleared if no object is selected.
- Only one search result is generated per command/line, even if there are multiple matches in that line.
- Pressing enter on the edit field begins a search, just like clicking the begin button.
- The next and previous buttons are disabled until a search is begun.
- The search results are cleared when changing objects or frames.
- The previous button once again works (a regression from the previous commit), and the register updates and graphics data for the correct object are searched.
- currentRow() never returns -1, so checking that is unnecessary (and misleading).
- The 'Invalid search parameters (no object selected)' message previously never showed up because FRAME_ROLE is present if and only if OBJECT_ROLE is present.
This way, it can be focused with the render window behind it, instead of having the main window show up and cover the render window. This is useful for adjusting the object range, among other things.
If the number of objects varied, this would result in either missing objects on some frames, or too many objects on some frames; the latter case could cause crashes. Since it used the current frame to get the count, if the FIFO is started before the FIFO analyzer is opened, then the current frame is effectively random, making it hard to reproduce consistently.
This issue has existed since the FIFO analyzer was implemented for Qt.
The 'zero frames in the range' check can be removed because now there is always at least 1 frame; of course that might be the same frame over and over again, but that's still useful for e.g. Free Look (and the 1 frame repeating effect already occurred when frame count was exclusive).
A single object can be selected instead of 2 (it was already inclusive internally), and the maximum value is the highest number of objects in any frame (minus 1) to reduce jank when multiple frames are being played back.
Now that this is only called when playback actually starts (and not on unpausing), this change makes the experience a bit better (no more missing objects from not having reset the from object after changing FIFOs).
It is no longer relevant for the current set of loaders after 7030542546. If it becomes relevant again, a static function named IsUsable or IsCompatibleWithCurrentMachine or something would be a better approach.
By taking advantage of three-operand IMUL, we can eliminate a MOV
instruction. This is a small code size win. However, due to IMUL sign
extending the immediate value to 64 bits, we can only apply this when
the magic number's most significant bit is zero.
To ensure this can actually happen, we also minimize the magic number by
checking for trailing zeroes.
Example (Unsigned division by 18)
Before:
41 BE E4 38 8E E3 mov r14d,0E38E38E4h
4D 0F AF F5 imul r14,r13
49 C1 EE 24 shr r14,24h
After:
4D 69 F5 39 8E E3 38 imul r14,r13,38E38E39h
49 C1 EE 22 shr r14,22h
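The same two sequences can be written out in C++ to check that they agree with a real division. The constants are taken directly from the listing above; the function names are made up for the sketch. The "after" magic number is the "before" one shifted right by its two trailing zero bits, with the final shift reduced by two to match.

```cpp
#include <cstdint>

// Before: 32-bit magic 0xE38E38E4 with a shift of 0x24 (36).
constexpr std::uint32_t DivideBy18Before(std::uint32_t x)
{
  return static_cast<std::uint32_t>((std::uint64_t{x} * 0xE38E38E4u) >> 36);
}

// After: the minimized magic 0x38E38E39 with a shift of 0x22 (34).
constexpr std::uint32_t DivideBy18After(std::uint32_t x)
{
  return static_cast<std::uint32_t>((std::uint64_t{x} * 0x38E38E39u) >> 34);
}

// Removing the two trailing zero bits gives the smaller magic number.
static_assert(0xE38E38E4u >> 2 == 0x38E38E39u, "magic numbers are related by 2 bits");
// The minimized magic has a clear top bit, so sign-extending it as a
// 32-bit immediate does not change its value.
static_assert((0x38E38E39u >> 31) == 0, "safe as a sign-extended imm32");
static_assert(DivideBy18After(0xFFFFFFFFu) == 0xFFFFFFFFu / 18, "matches division");
```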
This isn't entirely necessary, as they are interpreted as bareword expressions,
but it's still nicer to have by default. And my upcoming input changes will
always put `` around single letter inputs.
-Add pause state to FPSCounter.
-Add ability to have more than one "OnStateChanged" callback in core.
-Add GetActualEmulationSpeed() to Core. Returns 1 by default. It's used by my input PRs.
The SaveToSYSCONF call in BootManager.cpp was unintentionally
overriding the temporary NAND set by the preceding
InitializeWiiRoot call. Fixes
https://bugs.dolphin-emu.org/issues/12500.
Verifying a Wii game creates an instance of IOS, and Dolphin
can't handle more than one instance of IOS at the same time.
Properly supporting it is probably more effort than it's worth.
Fixes https://bugs.dolphin-emu.org/issues/12494.
Avoids the need to copy the *.mo files manually *and* more importantly
this ensures that the mo files are always recreated if the build
output directory is cleared.
Update references was failing to update the references, causing input to stay nullptr and crashing.
I fixed the case that triggered that, though also added checks against nullptrs for safety.
(cherry picked from commit 4bdcf707555a5568eddff957fa3604975ffb6ed7)
I think the AArch64 JIT has come far enough that it doesn't have to
be called experimental anymore.
I'm also labeling the x86-64 JIT as x86-64 for consistency with the
AArch64 JIT. This will especially be helpful if we start supporting
AArch64 on macOS, as AArch64 macOS can run both the x86-64 JIT and
the AArch64 JIT depending on whether you enable Rosetta 2.
I haven't observed this breaking any game, but it didn't match
the behavior of the interpreter as far as I could tell from
reading the code, in that denormals weren't being flushed.
If we can prove that FCVT will provide a correct conversion,
we can use FCVT. This makes the common case a bit faster
and the less likely cases (unfortunately including zero,
which FCVT actually can convert correctly) a bit slower.
Preparation for following commits.
This commit intentionally doesn't touch paired stores,
since paired stores are supposed to flush to zero.
(Consistent with Jit64.)
This simplifies some of the following commits. It does require
an extra register, but hey, we have 32 of them.
Something I think would be nice to add to the register cache
in the future is the ability to keep both the single and double
version of a guest register in two different host registers
when that is useful. That way, the extra register we write to
here can be read by a later instruction, saving us from
having to perform the same conversion again.
Fixes https://bugs.dolphin-emu.org/issues/12388. Might also fix
other games that have problems with float/paired instructions
in JitArm64, but I haven't tested any.
-They might have never drawn if DrawMessages wasn't called before they actually expired
-Their fade was wrong if the duration of the message was less than the fade time
This makes them much more useful for debugging. I know there might be other means
of debugging like logs and imgui, but this was the simplest, so that's what I used.
If you want to print the same message every frame, but with a slightly different value
to see the changes, it now works.
Because messages are now always rendered at least once, a lot of old messages
(printed while the emulation was off) could show up on startup. To compensate,
I've added a "drop" time: if a message isn't rendered for the first
time within that time, it will be dropped and never rendered.
When the interpreter writes to a discarded register, its type
must be changed so that it is no longer considered discarded.
Fixes a 62ce1c7 regression.
We normally check for division by zero to know if we should set the
destination register to zero with a XOR. However, when the divisor and
destination registers are the same the explicit zeroing can be omitted.
In addition, some of the surrounding branching can be simplified as
well.
Before:
45 85 FF test r15d,r15d
75 05 jne normal_path
45 33 FF xor r15d,r15d
EB 0C jmp done
normal_path:
B8 5A 00 00 00 mov eax,5Ah
99 cdq
41 F7 FF idiv eax,r15d
44 8B F8 mov r15d,eax
done:
After:
45 85 FF test r15d,r15d
74 0C je done
B8 5A 00 00 00 mov eax,5Ah
99 cdq
41 F7 FF idiv eax,r15d
44 8B F8 mov r15d,eax
done:
Division by a power of two can be slightly improved when the
destination and dividend registers are the same.
Before:
8B C6 mov eax,esi
85 C0 test eax,eax
8D 70 03 lea esi,[rax+3]
0F 49 F0 cmovns esi,eax
C1 FE 02 sar esi,2
After:
85 F6 test esi,esi
8D 46 03 lea eax,[rsi+3]
0F 48 F0 cmovs esi,eax
C1 FE 02 sar esi,2
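In C++ terms, the sequence above computes a signed division that rounds toward zero. A minimal sketch, assuming arithmetic right shift for negative values (guaranteed since C++20 and the behavior of mainstream compilers before that); `DivideByPowerOfTwo` is a hypothetical name:

```cpp
#include <cassert>
#include <cstdint>

int32_t DivideByPowerOfTwo(int32_t dividend, int shift)
{
  assert(shift >= 1 && shift <= 30);
  // For a divisor of 4 this is 3, matching "lea eax,[rsi+3]".
  const int32_t bias = (int32_t{1} << shift) - 1;
  // Only negative dividends take the biased value (the cmovs), so that the
  // arithmetic shift rounds toward zero instead of toward negative infinity.
  const int32_t adjusted = dividend < 0 ? dividend + bias : dividend;
  return adjusted >> shift;
}
```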
Repeated erase() + iteration on a std::multimap is extremely slow.
Slow enough that it causes a 7 second long stutter during some
transitions in F-Zero X (an N64 VC game that triggers many, many icache
invalidations).
And slow enough that JitBaseBlockCache::DestroyBlock shows up on a
flame graph as taking >50% of total CPU time on the CPU-GPU thread:
https://i.imgur.com/vvqiFL6.png
This commit optimises those block link queries by replacing the
std::multimap (which is typically implemented with red-black trees)
with hash tables.
Master: https://i.imgur.com/vvqiFL6.png / 7s stutters
(starting from 5.0-2021 and with branch following disabled)
This commit: https://i.imgur.com/hAO74fy.png / ~0.7s stutters, which
is pretty close to 5.0 stable. (5.0-2021 introduced the performance
regression and it is especially noticeable when branch following
is disabled, which is the case for all N64 VC games since 5.0-8377.)
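The general shape of the replacement can be sketched like this. It is not the actual block cache code: `Block`, `LinkTable` and the member names are hypothetical, and the sketch only shows why removal no longer needs to iterate over and erase from a multimap range.

```cpp
#include <cstddef>
#include <cstdint>
#include <unordered_map>
#include <unordered_set>

// Hypothetical stand-in for a JIT block; only its identity matters here.
struct Block
{
  uint32_t start_address;
};

// Maps a destination address to the set of blocks that link to it.
class LinkTable
{
public:
  void AddLink(uint32_t destination, Block* source) { m_links_to[destination].insert(source); }

  // Two hash lookups on average, with no walk over an equal_range.
  void RemoveLink(uint32_t destination, Block* source)
  {
    const auto it = m_links_to.find(destination);
    if (it == m_links_to.end())
      return;
    it->second.erase(source);
    if (it->second.empty())
      m_links_to.erase(it);
  }

  std::size_t CountLinksTo(uint32_t destination) const
  {
    const auto it = m_links_to.find(destination);
    return it == m_links_to.end() ? 0 : it->second.size();
  }

private:
  std::unordered_map<uint32_t, std::unordered_set<Block*>> m_links_to;
};
```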
VideoCommon: Change the type of BPMemory.scissorOffset to 10-bit signed: S32X10Y10
VideoBackends: Fix the Software backend's Clipper.PerspectiveDivide function to use BPMemory.scissorOffset instead of the hard-coded 342
Oversight from #9545, which moved the "new game has been loaded" logic
to a separate OnNewTitleLoad function that has to be called explicitly
*after* a title has loaded.
Coupled with the commit that makes Dolphin not clobber 0x1800-0x3000
when using MIOS, this fixes Wind Waker and other MIOS-patched games
when they are launched from the System Menu.
MIOS puts patch data in low MEM1 (0x1800-0x3000) for its own use.
Overwriting data in this range can cause the IPL to crash when
launching games that get patched by MIOS.
See https://bugs.dolphin-emu.org/issues/11952 for more info.
Not applying the Gecko HLE patches means that Gecko codes will not work
under MIOS, but this is better than the alternative of having specific
games crash.
This particular range is kind of bizarre, and would only interpret
interleave mode 2 as a valid mode, while rejecting interleave mode 1 and
the extension byte mode.
As far as I know, based off the information on Wiibrew, we should be
considering all three values within this range as valid.
Texture serialization and deserialization used to involve many memory
allocations and deallocations, along with many copies to and from
those allocations. Avoid those by reserving a memory region inside the
output and writing there directly, skipping the allocation and copy to
an intermediate buffer entirely.
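A minimal sketch of the idea, using a plain `std::vector<uint8_t>` as the output. All names here (`ReserveRegion`, `ReadTextureInto`, `SerializeTexture`) are hypothetical and stand in for the real serialization code, which is not shown in this message.

```cpp
#include <cstddef>
#include <cstdint>
#include <cstring>
#include <vector>

// Grows the output by `size` bytes and returns a pointer to the new region.
uint8_t* ReserveRegion(std::vector<uint8_t>& out, std::size_t size)
{
  const std::size_t offset = out.size();
  out.resize(offset + size);
  return out.data() + offset;
}

// Hypothetical stand-in for a texture readback that fills `dst` with `size` bytes.
void ReadTextureInto(uint8_t* dst, std::size_t size, uint8_t fill)
{
  std::memset(dst, fill, size);
}

// The producer writes straight into the output; there is no temporary buffer
// that would have to be allocated, filled, copied from and freed again.
void SerializeTexture(std::vector<uint8_t>& out, std::size_t size, uint8_t fill)
{
  if (size == 0)
    return;
  ReadTextureInto(ReserveRegion(out, size), size, fill);
}
```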
This adds a CMake option (DOLPHIN_DEFAULT_UPDATE_TRACK) to allow
configuring SCM_UPDATE_TRACK_STR. This is needed to enable auto-updates
in Windows CMake builds by default.
This adds a function to get the emulated or real Bluetooth device for
an active emulation instance. This lets us deduplicate all the
`ios->GetDeviceByName("/dev/usb/oh1/57e/305")` calls that are currently
scattered in the codebase and ensures Bluetooth passthrough is being
handled correctly.
This also fixes the broken check in WiimoteCommon::UpdateSource.
There was a confusion between "emulated Bluetooth" (as opposed to
"real Bluetooth" aka Bluetooth passthrough) and "emulated Wiimote".
Specifically, 'Scooby-Doo! Mystery Mayhem', 'Scooby-Doo! Unmasked', 'Ed, Edd n Eddy: The Mis-Edventures', and the Wii version of 'Happy Feet'.
The JIT cache causes problems with emulated icache invalidation in these games, resulting in areas failing to load.
This avoids some warnings, which were originally fixed by ignoring loads with a value of zero (see 636bedb207 / #3242).
Note that FifoCI will report some changes, but only on the first frame; these seem to be timing related as they don't happen if a different write is used to replace skipped ones.
They appear to relate to perf queries, and combining them with truly unknown commands would probably hide useful information. Furthermore, 0x20 is issued by every title, so without this every title would be recorded as using an unknown command, which is very unhelpful.
The swaps are confusing and don't accomplish much.
It was originally written like this:
u32 pte = bswap(*(u32*)&base_mem[pteg_addr]);
then bswap was changed to Common::swap32, and then the array access
was replaced with Memory::Read_U32, leading to the useless swaps.
While 6xx_pem.pdf §7.6.1.1 mentions that the number of trailing
zeros in HTABORG must be equal to the number of trailing ones
in the mask (i.e. HTABORG must be properly aligned), this is actually
not a hard requirement. Real hardware will just OR the base address
anyway. Ignoring SDR changes would lead to incorrect emulation.
Logging a warning instead of dropping the SDR update silently is a
saner behaviour.
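To illustrate what "just OR the base address" means, here is a rough sketch of the PTEG address computation. It assumes the usual 32-bit SDR1 layout (HTABORG in the upper 16 bits, HTABMASK in the lower 9 bits) and a 19-bit hash; `PtegAddress` and `IsHtaborgAligned` are hypothetical names, not the emulator's functions.

```cpp
#include <cstdint>

// Assumed layout: SDR1 = HTABORG (upper 16 bits) | reserved | HTABMASK (lower 9 bits).
uint32_t PtegAddress(uint32_t sdr1, uint32_t hash)
{
  const uint32_t htaborg = sdr1 & 0xFFFF0000u;
  const uint32_t htabmask = sdr1 & 0x1FFu;
  // The low 10 hash bits are always used; the upper 9 are gated by HTABMASK.
  const uint32_t hash_mask = (htabmask << 10) | 0x3FFu;
  // A misaligned HTABORG simply has its low bits ORed with the hash bits.
  return htaborg | ((hash & hash_mask) << 6);
}

// True when no HTABORG bit overlaps a set HTABMASK bit, i.e. the case the
// manual describes as required.
bool IsHtaborgAligned(uint32_t sdr1)
{
  return ((sdr1 >> 16) & (sdr1 & 0x1FFu)) == 0;
}
```

With a misaligned value the result is still well defined, which is why logging a warning and applying the update is preferable to dropping it.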
debaf63fe8 moved the "Sonic epsilon hack"
to vertex shaders. However, it was only done for targets with depth
clamping. If this is not available, for example the target is OpenGL ES,
the Sonic problem appears (https://bugs.dolphin-emu.org/issues/11897).
A version of the "Sonic epsilon hack" is added for targets without
depth clamping.
This changes FileSystemProxy::Open to return a file descriptor wrapper
that will ensure the FD is closed when it goes out of scope.
By using such a wrapper we make it more difficult to forget to close
file descriptors.
This fixes a leak in ReadBootContent. I should have added such a class
from the beginning... In practice, I don't think this would have caused
any obvious issue because ReadBootContent is only called after an IOS
relaunch -- which clears all FDs -- and most titles do not get close
to the FD limit.
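Such a wrapper could look roughly like the sketch below. `FakeFileSystem`, `ScopedFd` and `OpenScoped` are hypothetical names used only for illustration; the real FileSystemProxy interface is different.

```cpp
#include <cstddef>
#include <set>
#include <utility>

// Hypothetical stand-in for the emulated filesystem: hands out and tracks FDs.
class FakeFileSystem
{
public:
  int Open()
  {
    const int fd = m_next_fd++;
    m_open_fds.insert(fd);
    return fd;
  }
  void Close(int fd) { m_open_fds.erase(fd); }
  std::size_t OpenCount() const { return m_open_fds.size(); }

private:
  std::set<int> m_open_fds;
  int m_next_fd = 0;
};

// Move-only owner of a file descriptor; closes it when going out of scope.
class ScopedFd
{
public:
  ScopedFd(FakeFileSystem* fs, int fd) : m_fs(fs), m_fd(fd) {}
  ScopedFd(const ScopedFd&) = delete;
  ScopedFd& operator=(const ScopedFd&) = delete;
  ScopedFd(ScopedFd&& other) noexcept : m_fs(other.m_fs), m_fd(std::exchange(other.m_fd, -1)) {}
  ~ScopedFd()
  {
    if (m_fd >= 0)
      m_fs->Close(m_fd);
  }
  int Get() const { return m_fd; }

private:
  FakeFileSystem* m_fs;
  int m_fd;
};

ScopedFd OpenScoped(FakeFileSystem& fs)
{
  return ScopedFd(&fs, fs.Open());
}
```

Any early return between opening and the end of the scope still closes the descriptor, which is the kind of path where a manual close is easy to forget.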
JitArm64::DoJit contains a check where it prints a warning and tries
to pause emulation if instructed to compile code at address 0. I'm
assuming this was done in order to provide a nicer error behavior
in cases where PC was accidentally set to null. Unfortunately, it
has started causing us problems recently, as 688bd61 writes and runs
some code at address 0 to simulate the PPC being held in reset.
What makes this worse is that calling Core::SetState from the CPU
thread is actually not allowed and will cause a deadlock instead of
the intended behavior. I don't believe there is anything on a real
console that would stop you from executing code at address 0 (as
long as the MMU has been set up to allow it), and Jit64::DoJit
doesn't contain any check like this, so let's remove the check.
This commit adds a new "discarded" state for registers.
Discarding a register is like flushing it, but without
actually writing its value back to memory. We can discard
a register only when it is guaranteed that no instruction
will read from the register before it is next written to.
Discarding reduces the register pressure a little, and can
also let us skip a few flushes on interpreter fallbacks.
The output of instructions like fabsx and ps_sel is store-safe
if and only if the relevant inputs are. The old code was always
marking the output as store-safe if the output was a single,
and never otherwise.
Also, the old code was treating the output of psq_l/psq_lu as
store-safe, which seems incorrect (if dequantization is disabled).
This improves the speed of verifying Wii WIA/RVZ files.
For me, the verification speed for LZMA2-compressed files
has gone from 11-12 MiB/s to 13-14 MiB/s.
One thing VolumeVerifier does to achieve parallelism is to
compute hashes for one chunk of data while reading the next
chunk of data. In master, when reading data from a Wii
partition, each such chunk is 32 KiB. This is normally fine,
but with WIA and RVZ it leads to rather lopsided read times
(without the compute times being lopsided): The first 32 KiB
of each 2 MiB takes a long time to read, and the remaining
part of the 2 MiB can be read nearly instantly. (The WIA/RVZ
code has to read the entire 2 MiB in order to compute hashes
which appear at the beginning of the 2 MiB, and then caches
the result afterwards.) This leads to us at times not doing
much reading and at other times not doing much computation.
To improve this, this change makes us use 2 MiB chunks
instead of 32 KiB chunks when reading from Wii partitions.
(block = 32 KiB, group = 2 MiB)
This can't actually happen in practice due to how WAD files work,
but it's very easy to add support for thanks to the last commit,
so we might as well add support for it.
The performance gains of doing this aren't too important since you
normally wouldn't run into any disc image that has overlapping blocks
(which by extension means overlapping partitions), but this change also
lets us get rid of things like VolumeVerifier's mutex that used to
exist just for the sake of handling overlapping blocks.
Panic alerts in DiscIO can potentially be very annoying since
large amounts of them can pop up when loading the game list
if you have some particularly weird files in your game list.
This was a much bigger problem back in 5.0 with its
"Tried to decrypt data from a non-Wii volume" panic alert, but
I figured I would take it all the way and remove the remaining
panic alerts that can show up when loading the game list.
I have exempted uses of ASSERT/ASSERT_MSG since they indicate
a bug in Dolphin rather than a malformed file.
If we know at compile time that the PPC carry flag definitely
has a certain value, we can bake that value into the emitted code
and skip having to read from PPCState.
When a save state was loaded, the IOS device serving Bluetooth
was cast to BluetoothEmuDevice. If, however, a real Wiimote
with BT passthrough was used, this caused the game to crash.
Now the proper device class is used.
At first glance it may look like a part of the code I added to
srawx in efeda3b has a bug when a == s. The code actually happens
to work correctly, but in the interest of making the code easier
to reason about, I'd like to change the way it's implemented. This
change should improve the pipelining a little in the a == s case too.
Fix Gamelist context menu item 'Open Containing Folder' opening wrong
target on Windows when game parent folder is [foobar] and grandparent
folder contains file [foobar].bat or [foobar].exe
Add trailing directory separator to parent folder path to force Windows
to interpret path as directory.
Fixes https://bugs.dolphin-emu.org/issues/12411
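The path adjustment itself is tiny. A sketch, with `WithTrailingSeparator` as a hypothetical helper name and the backslash default chosen because the problem is Windows-specific:

```cpp
#include <string>

// Appends a directory separator unless the path already ends with one, so
// that the shell cannot mistake "...\[foobar]" for "[foobar].bat" or
// "[foobar].exe" in the parent folder.
std::string WithTrailingSeparator(std::string path, char separator = '\\')
{
  if (!path.empty() && path.back() != separator)
    path += separator;
  return path;
}
```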
21c152f added a small hack to DVDInterface to keep WBFS and CISO
files working with Nintendo's "Error #001" anti-piracy check.
Unfortunately I don't think it's possible to support WBFS and
CISO without any kind of hack or heuristic, but what we can do
is replace the 21c152f hack (which applies regardless of file
format) with a hack that only is active when using WBFS or CISO.
This change is similar to 2a5a399, but the disc size is
calculated in a different way.
Add ! before unused variables to 'use' them.
Ubuntu-x64 emits warnings for unused variables because gcc decides
it should ignore the void cast around them. See thread for discussion:
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=66425
Loop index int i was being compared against GetControllerCount() which
returned a size_t. This was the only place GetControllerCount() was
called from so the change of return type doesn't disturb anything else.
Changing the loop index to size_t wouldn't work as well since it's
passed into GetController(), which takes an int and is called from many
places, so it would need a cast anyway on an already busy line.
...and let's optimize a divisor of 2 ever so slightly for good measure.
I wouldn't have bothered, but most GameCube games seem to hit this on
launch.
- Division by 2
Before:
41 BE 02 00 00 00 mov r14d,2
41 8B C2 mov eax,r10d
45 85 F6 test r14d,r14d
74 0D je overflow
3D 00 00 00 80 cmp eax,80000000h
75 0E jne normal_path
41 83 FE FF cmp r14d,0FFFFFFFFh
75 08 jne normal_path
overflow:
C1 F8 1F sar eax,1Fh
44 8B F0 mov r14d,eax
EB 07 jmp done
normal_path:
99 cdq
41 F7 FE idiv eax,r14d
44 8B F0 mov r14d,eax
done:
After:
45 8B F2 mov r14d,r10d
41 C1 EE 1F shr r14d,1Fh
45 03 F2 add r14d,r10d
41 D1 FE sar r14d,1
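The four-instruction "After" sequence corresponds to the following C++. This is a sketch assuming two's complement conversion and arithmetic right shift (both guaranteed since C++20); `DivideByTwo` is a hypothetical name.

```cpp
#include <cstdint>

int32_t DivideByTwo(int32_t dividend)
{
  const uint32_t bits = static_cast<uint32_t>(dividend);
  // shr r14d,1Fh: 1 for negative dividends, 0 otherwise.
  const uint32_t sign = bits >> 31;
  // add r14d,r10d: bias negative dividends by one so the shift rounds toward zero.
  const int32_t adjusted = static_cast<int32_t>(bits + sign);
  // sar r14d,1
  return adjusted >> 1;
}
```

A divisor of 2 can never hit the divide-by-zero or 0x80000000 / -1 cases, which is why all of the branches in the "Before" listing disappear.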
Add a function to calculate the magic constants required to optimize
signed 32-bit division.
Since this optimization is not exclusive to any particular architecture,
JitCommon seemed like a good place to put this.
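For reference, one well-known way to compute such constants is the algorithm from Hacker's Delight, sketched below. This is not necessarily what the JitCommon function looks like; `SignedMagic`, `SignedDivisionConstants` and `DivideWithMagic` are hypothetical names, and the sketch assumes arithmetic right shift on signed values.

```cpp
#include <cassert>
#include <cstdint>

struct SignedMagic
{
  int32_t multiplier;
  int shift;
};

// Magic number computation for signed 32-bit division (Hacker's Delight).
SignedMagic SignedDivisionConstants(int32_t divisor)
{
  assert(divisor < -1 || divisor > 1);
  const uint32_t two31 = 0x80000000u;
  const uint32_t ad =
      divisor < 0 ? 0u - static_cast<uint32_t>(divisor) : static_cast<uint32_t>(divisor);
  const uint32_t t = two31 + (static_cast<uint32_t>(divisor) >> 31);
  const uint32_t anc = t - 1 - t % ad;  // absolute value of nc
  uint32_t q1 = two31 / anc;            // q1 = 2^p / |nc|
  uint32_t r1 = two31 - q1 * anc;
  uint32_t q2 = two31 / ad;             // q2 = 2^p / |d|
  uint32_t r2 = two31 - q2 * ad;
  uint32_t delta = 0;
  int p = 31;
  do
  {
    ++p;
    q1 *= 2;
    r1 *= 2;
    if (r1 >= anc)
    {
      ++q1;
      r1 -= anc;
    }
    q2 *= 2;
    r2 *= 2;
    if (r2 >= ad)
    {
      ++q2;
      r2 -= ad;
    }
    delta = ad - r2;
  } while (q1 < delta || (q1 == delta && r1 == 0));

  uint32_t multiplier = q2 + 1;
  if (divisor < 0)
    multiplier = 0u - multiplier;
  return {static_cast<int32_t>(multiplier), p - 32};
}

// Reference use of the constants: high half of the product, a correction
// when the multiplier's sign differs from the divisor's, the shift, and a
// final +1 for negative quotients.
int32_t DivideWithMagic(int32_t dividend, int32_t divisor, SignedMagic magic)
{
  int64_t q = (int64_t{dividend} * magic.multiplier) >> 32;
  if (divisor > 0 && magic.multiplier < 0)
    q += dividend;
  if (divisor < 0 && magic.multiplier > 0)
    q -= dividend;
  q >>= magic.shift;
  if (q < 0)
    ++q;
  return static_cast<int32_t>(q);
}
```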
Zero divided by any number is still zero. For whatever reason, this case
shows up frequently too.
Before:
B8 00 00 00 00 mov eax,0
85 F6 test esi,esi
74 0C je overflow
3D 00 00 00 80 cmp eax,80000000h
75 0C jne normal_path
83 FE FF cmp esi,0FFFFFFFFh
75 07 jne normal_path
overflow:
C1 F8 1F sar eax,1Fh
8B F8 mov edi,eax
EB 05 jmp done
normal_path:
99 cdq
F7 FE idiv eax,esi
8B F8 mov edi,eax
done:
After:
Nothing!
When the dividend is known at compile time, we can eliminate some of the
branching and precompute the result for the overflow case.
Before:
B8 54 D3 E6 02 mov eax,2E6D354h
85 FF test edi,edi
74 0C je overflow
3D 00 00 00 80 cmp eax,80000000h
75 0C jne normal_path
83 FF FF cmp edi,0FFFFFFFFh
75 07 jne normal_path
overflow:
C1 F8 1F sar eax,1Fh
8B F8 mov edi,eax
EB 05 jmp done
normal_path:
99 cdq
F7 FF idiv eax,edi
8B F8 mov edi,eax
done:
After:
85 FF test edi,edi
75 04 jne normal_path
33 FF xor edi,edi
EB 0A jmp done
normal_path:
B8 54 D3 E6 02 mov eax,2E6D354h
99 cdq
F7 FF idiv eax,edi
8B F8 mov edi,eax
done:
Fairly common with constant dividend of zero. Non-zero values occur
frequently in Ocarina of Time Master Quest.
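The simplification can be expressed in C++ as follows. The sketch takes the overflow result from the listings above (the dividend shifted right arithmetically by 31, so 0 or -1); `DivwGeneric` and `DivwKnownDividend` are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// What the "Before" code computes when nothing is known about the operands.
int32_t DivwGeneric(int32_t dividend, int32_t divisor)
{
  if (divisor == 0 || (dividend == INT32_MIN && divisor == -1))
    return dividend < 0 ? -1 : 0;  // sar eax,1Fh
  return dividend / divisor;
}

// With a dividend known at compile time (and not 0x80000000), the check
// against -1 disappears and the overflow result becomes a constant.
int32_t DivwKnownDividend(int32_t dividend, int32_t divisor)
{
  assert(dividend != INT32_MIN);
  const int32_t overflow_result = dividend < 0 ? -1 : 0;  // precomputed
  if (divisor == 0)
    return overflow_result;
  return dividend / divisor;
}
```

For the positive constant 2E6D354h used above, the precomputed overflow result is zero, which is where the `xor edi,edi` in the "After" listing comes from.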
Whether the custom RTC setting is enabled shouldn't in itself
affect determinism (as long as the actual RTC value is properly
synced). Alters the logic added in 4b2906c.
I'm not entirely certain that this is correct, but the current
code doesn't really make sense to me... If we need to force the
RTC bias to 0 when custom RTC is enabled, why don't we need to
do it when custom RTC is disabled? The code for getting the
host system's current time doesn't contain any special handling
for the guest's RTC bias as far as I can tell.
The loop in WIARVZFileReader::Chunk::Read could terminate
prematurely if the size argument was smaller than the size
of an exception list which had only been partially loaded.
Fixes issue 11393.
The problem is that left and top make no sense for a width by height array; they only make sense for a larger array from which a smaller part is extracted. Thus, the overall size of the array is provided to CopyRegion in addition to the sub-region. EncodeXFB already handles the extraction, so CopyRegion's only use there is to resize the image (and thus no sub-region is provided).
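The indexing issue can be seen in a small sketch. `Rect` and `ExtractRegion` are hypothetical names, and unlike the real CopyRegion this version only extracts and does not resize.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>
#include <vector>

struct Rect
{
  int left;
  int top;
  int width;
  int height;
};

// The source dimensions are needed in addition to the sub-region: rows of
// the source are src_width apart, not region.width apart.
std::vector<uint32_t> ExtractRegion(const std::vector<uint32_t>& src, int src_width,
                                    int src_height, const Rect& region)
{
  assert(src_width >= 0 && src_height >= 0);
  assert(src.size() == static_cast<std::size_t>(src_width) * static_cast<std::size_t>(src_height));
  assert(region.left >= 0 && region.top >= 0 && region.width >= 0 && region.height >= 0);
  assert(region.left + region.width <= src_width);
  assert(region.top + region.height <= src_height);

  std::vector<uint32_t> dst;
  dst.reserve(static_cast<std::size_t>(region.width) * static_cast<std::size_t>(region.height));
  for (int y = 0; y < region.height; ++y)
  {
    const std::size_t row_start =
        static_cast<std::size_t>(region.top + y) * src_width + region.left;
    for (int x = 0; x < region.width; ++x)
      dst.push_back(src[row_start + x]);
  }
  return dst;
}
```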
BPMEM_TEV_COLOR_ENV + 6 (0xC6) was missing due to a typo. BPMEM_BP_MASK (0xFE) does not lend itself well to documentation with the current FIFO analyzer implementation (since it requires remembering the values in BP memory) but still shouldn't be treated as unknown. BPMEM_TX_SETMODE0_4 and BPMEM_TX_SETMODE1_4 (0xA4-0xAB) were missing entirely.
Additional changes:
- For TevStageCombiner's ColorCombiner and AlphaCombiner, op/comparison and scale/compare_mode have been split as there are different meanings and enums if bias is set to compare. (Shift has also been renamed to scale)
- In TexMode0, min_filter has been split into min_mip and min_filter.
- In TexImage1, image_type is now cache_manually_managed.
- The unused bit in GenMode is now exposed.
- LPSize's lineaspect is now named adjust_for_aspect_ratio.
Additionally, VCacheEnhance has been added to UVAT_group1. According to YAGCD, this field is always 1.
TVtxDesc also now has separate low and high fields whose hex values correspond with the proper registers, instead of having one 33-bit value. This change was made in a way that should be backwards-compatible.
The PPC is supposed to be held in reset when another version of IOS is
in the process of being launched for a PPC title launch.
Probably doesn't matter in practice, though the inaccuracy was
definitely observable from the PPC.
We should only try to load a symbol map for the new title *after* it
has been loaded into memory, not before. Likewise for applying HLE
patches and loading new custom textures.
In practice, loading/repatching too early was only a problem for
titles that are launched via ES_Launch. This commit fixes that.
The extra IPC ack is triggered by a syscall that is invoked in ES's
main function; the syscall literally just sets Y2, IX1 and IX2 in
HW_IPC_ARMCTRL -- there is no complicated ack queue or anything.
Low MEM1 is cleared by IOS before all the other constants are written.
This will overwrite the Gecko code handler but it should be fine
because HLE::Reload (which will set up the code handler hook again)
will be called after a title change is detected.
The Host constructor sets a callback on a lambda that in turn calls
Host_UpdateDisasmDialog. Since that function is not a member function
capturing this is unnecessary.
Fixes -Wunused-lambda-capture warning on freebsd-x64.
When reading a reply from a message sent to the data socket there is
the possibility that the other side gets sent multiple messages
before replying to any of them, which can lead to multiple replies
sent in a row. Though this only happens when things time out, it's
quite possible for these timeouts to happen or build up over time,
especially when initiating the connection.
This change makes sure to flush any pending bytes that have not been
read yet out of the socket after a successful POLL reply is received,
since that is the most common time when backups occur. It also uses
the exact number of bytes in an expected reply, to ensure
the received data and the message it's replying to do not get out of
sync.
The results of calls to PPCSTATE_OFF_PS0/1 were being cast to u32 and
passed to functions expecting s32 parameters. This changes the casts
to s32 instead.
One location was missing a cast and generated a warning with VS which
is now fixed.
Added `ToggleBreakPoint` to both the BreakPoints and MemChecks interfaces. This allows us to toggle the state of a breakpoint.
Also, TMemCheck::is_ranged is no longer serialized to string, since it can be deduced by comparing TMemCheck::start_address and TMemCheck::end_address.
DualShock UDP Client was the only place in the code that assumed OnConfigChanged()
is called at least once on startup (otherwise it wouldn't load its settings), so I took care of that.
This was caused by saving the `break_on_hit` flag with the letter `p` and then reading the flag with the letter `b` when loading the breakpoints, which resulted in the `break_on_hit` flag always being false.
Filesystem accesses aren't magically faster when they are done by ES,
so this commit changes our content wrapper IPC commands to take FS
access times and read operations into account.
This should make content read timings a lot more accurate and closer
to console. Note that the accuracy of the timings is limited to the
accuracy of the emulated FS timings, and currently performance
differences between IOS9-IOS28 and newer IOS versions are not emulated.
Part 1 of fixing https://bugs.dolphin-emu.org/issues/11346
(part 2 will involve emulating those differences)
This makes it more convenient to emulate timings for IPC commands that
perform internal IOS <-> IOS IPC, for example ES relying on FS
for filesystem access.
According to hwtests, older versions of IOS are slower at performing
various filesystem operations:
https://docs.google.com/spreadsheets/d/1OKo9IUuKCrniz4m0kYIaMP_qFtOCmAzHZ_zAmobvBcc/edit
(courtesy of JMC)
A quick glance at IOS9 reveals that older versions of IOS have a
simplistic implementation of memcpy that does not optimize large copies
by copying 16 bytes or 32 bytes per chunk, which makes cached reads
and writes noticeably slower -- the difference was significant enough
that the OoT speedrunning community noticed that IOS9 (the IOS that
is used for the OoT VC title) was slower.
More or less a complete rewrite of the function which aims
to be equally good or better for each given input, without
relying on special cases like the old implementation did.
In particular, we now have more extensive support for
MOVN, as mentioned in a TODO comment.
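As a rough illustration of why MOVN support matters, the sketch below counts how many instructions a plain MOVZ/MOVN + MOVK sequence needs for a 64-bit immediate. It deliberately ignores every other encoding the real function may consider (such as logical immediates), and `CountMovInstructions` is a hypothetical name.

```cpp
#include <algorithm>
#include <cstdint>

int CountMovInstructions(uint64_t value)
{
  int non_zero_halfwords = 0;
  int non_ones_halfwords = 0;
  for (int i = 0; i < 4; ++i)
  {
    const uint16_t halfword = static_cast<uint16_t>(value >> (16 * i));
    if (halfword != 0x0000)
      ++non_zero_halfwords;
    if (halfword != 0xFFFF)
      ++non_ones_halfwords;
  }
  // MOVZ leaves the other halfwords as 0x0000 and MOVN leaves them as 0xFFFF;
  // each halfword that still differs afterwards costs one MOVK. At least one
  // instruction is always needed.
  return std::max(1, std::min(non_zero_halfwords, non_ones_halfwords));
}
```

A value such as 0xFFFFFFFFFFFF1234 takes four instructions when starting from MOVZ but a single MOVN.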
Instead of constructing IPCCommandResult with static member functions
in the Device class, we can just add the relevant constructors to the
reply struct itself. Makes more sense than putting it in Device
when the struct is used in the kernel code and doesn't use any Device
specific members...
This commit also changes the IPC command handlers to return an optional
IPCCommandResult rather than an IPCCommandResult. This removes the need
for a separate boolean that indicates whether the "result" is actually
a reply, and also avoids the need to set dummy result values and ticks.
It also makes it really obvious which commands can result in no reply
being generated.
Finally, this commit renames IPCCommandResult to IPCReply since the
struct is now only used for actual replies. This new name is less
verbose in my opinion.
The diff is quite large since this touches every command handler, but
the only functional change is that I fixed EnqueueIPCReply to
take a s64 for cycles_in_future to match IPCReply.
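The resulting handler shape can be sketched as below. The struct layout, the default delay and the `Request` enum are assumptions made for the example; only the idea of returning `std::optional<IPCReply>` comes from this commit.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical reply struct with its own constructors.
struct IPCReply
{
  explicit IPCReply(int32_t return_value_, int64_t reply_delay_ticks_ = 0)
      : return_value(return_value_), reply_delay_ticks(reply_delay_ticks_)
  {
  }

  int32_t return_value;
  int64_t reply_delay_ticks;
};

// Hypothetical request kinds, just to have two code paths.
enum class Request
{
  Immediate,
  Deferred,
};

// An empty optional means "no reply is generated now"; no separate boolean
// and no dummy return value or tick count are needed.
std::optional<IPCReply> HandleRequest(Request request)
{
  if (request == Request::Deferred)
    return std::nullopt;
  return IPCReply(0);
}
```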
PrepareForState is now unnecessary with the new implementation of
HostFileSystem::DoState, which does what the old implementation
(CWII_IPC_HLE_Device_FileIO::PrepareForState) used to do.
I don't really see the use of this. (Maybe in the past it
was used for when we need a constant number of instructions
for backpatching? But we don't use MOVI2R for that now.)
Now that the ES class (now called ESDevice) and the ES namespace do
not conflict anymore, "IOS::" can be dropped in a lot of cases.
This also removes "IOS::HLE::" for code that is already in that
namespace. Some of those names used to be explicitly qualified
only for historical reasons.
There are no functional changes.
Some of the device names can be ambiguous and require fully or partly
qualifying the name (e.g. IOS::HLE::FS::) in a somewhat verbose way.
Additionally, insufficiently qualified names are prone to breaking.
Consider the example of IOS::HLE::FS:: (namespace) and
IOS::HLE::Device::FS (class). If we use FS::Foo in a file that doesn't
know about the class, everything will work fine. However, as soon as
Device::FS is declared via a header include or even just forward
declared, that code will cease to compile because FS:: now resolves
to Device::FS if FS::Foo was used in the Device namespace.
It also leads to having to write IOS::ES:: to access ES types and
utilities even for code that is already under the IOS namespace.
The fix for this is simple: rename the device classes and give them
a "device" suffix in their names if the existing ones may be ambiguous.
This makes it clear whether we're referring to the device class or to
something else.
This is not any longer to type, considering it lets us get rid of the
Device namespace, which is now wholly unnecessary.
There are no functional changes in this commit.
A future commit will fix unnecessarily qualified names.
According to the C standard, an offsetof expression must evaluate to an
address constant, otherwise it's undefined behavior.
Fixes https://bugs.dolphin-emu.org/issues/12409
See also https://gcc.gnu.org/bugzilla/show_bug.cgi?id=95942
There are still improper uses of offsetof (mostly in JitArm64) but
fixing that will take more effort since there's a PPCSTATE_OFF wrapper
macro that is sometimes used with non-array members and sometimes used
with arrays and variable indices... Let's keep that for another PR.
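A sketch of the kind of rewrite involved, using a made-up struct (`FakePPCState` and `GprOffset` are hypothetical names, not the real PowerPCState or PPCSTATE_OFF): the member designator passed to offsetof stays constant, and the variable index is applied with ordinary arithmetic.

```cpp
#include <cassert>
#include <cstddef>
#include <cstdint>

// Made-up stand-in for the CPU state struct.
struct FakePPCState
{
  uint32_t pc;
  uint32_t gpr[32];
};

// Pattern to avoid: offsetof(FakePPCState, gpr[index]) with a runtime index.
// Here offsetof only ever sees the array member itself.
std::size_t GprOffset(std::size_t index)
{
  assert(index < 32);
  return offsetof(FakePPCState, gpr) + index * sizeof(uint32_t);
}
```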
Fixes the expression window being spammed with the first entry in the
Operators or Functions select menus when scrolling the mouse wheel while
hovering over them.
Fixes https://bugs.dolphin-emu.org/issues/12405
The dolphin-redirect.php script seems to have been present since 2012
at least, but we accidentally stopped using it when the "open wiki"
feature was reimplemented in DolphinQt2 in 2016.
<@delroth> dolphin-redirect.php is slightly smarter and tries to find gameid aliases for e.g. same region
<@delroth> uh, I mean different region
PR 9262 added a bunch of Jit64 optimizations, some of
which were already in JitArm64 and some which weren't.
This change ports the latter ones to JitArm64.