This improves my PC's performance on RS2 Hoth by... 0.1% or so, which I
think is within the margin of error. But this change also cuts down on
boilerplate.
If the call to `Open` a perf map fails don't set `s_is_enabled` (though
it could already be true if you're also using VTUNE) and don't call
`std::setvbuf` with a null stream.
Also fix a typo in a comment (`if` -> `in`)
If all inputs to an fmadds instruction (including cousins like fmsubs,
fnmadd...) are single-precision, then the result is identical between a
double-precision calculation with an error-free transform (whether the
calculation is fused or not) and a single-precision FMA instruction
(must be fused). So as a performance optimization in JitArm64, if we
were going to use double precision with EFT but the inputs are singles,
instead we'll use a normal single-precision FMA instruction without
anything extra. This lets us skip both the EFT and double-to-single
conversions.
Also renaming `inaccurate_fma` to `nonfused` because it's confusing that
`inaccurate_fma` and `m_accurate_fmadds` have such similar names
despite controlling separate things.
If result_reg is set to a temporary register instead of VD because of
accurate NaNs, there's no need to allocate a secondary temporary
register because of inaccurate FMA.
TextureCacheBase::LoadImpl has a hot path where the passed-in
TextureInfo never gets used. Instead of passing in a TextureInfo, let's
pass in the stage and create the TextureInfo from the stage if needed.
This unlocks somewhere above an additional 4% performance boost in the
Hoth level of Rogue Squadron 2 on my PC. Performance varies, making it
difficult for me to measure, so treat this as a very approximate number.
When we're emulating single-precision FMA using an FMA instruction,
there's no precision benefit from using a double-precision instruction,
assuming all inputs are single-precision. But when we're emulating
single-precision FMA using separate multiplication and addition
instructions, there is.
This change increases the precision of inaccurate FMA to the same level
as Jit64, which matters since the only reason we have the inaccurate
FMA mode is for sync compatibility with Jit64.
The check only happened if `USE_VTUNE` wasn't defined, and in that case
it was immediately followed by the same check again.
The check used to avoid an unnecessary call to `StringFromFormatV` in
certain circumstances, but that call was removed in
be81fe86e1.
Building on macOS with a recent CMake version would result in the
following error:
CMake Error at Source/Core/MacUpdater/CMakeLists.txt:48 (add_custom_command):
The following keywords are not supported when using
add_custom_command(TARGET): DEPENDS
As it turns out, this form of `add_custom_command` does not accept
DEPENDS, but versions of CMake prior to 3.31 silently dropped it.
The TextureInfo constructor creates a vector of MipLevels. This could be
good for performance if MipLevels are accessed very often for each
TextureInfo, but that's not the case. Dolphin creates thousands of
TextureInfos per second that it never accesses the mipmap levels of
because there's a hit in the texture cache, and in the uncommon case of
a texture cache miss, the mipmap levels only get looped through once.
To make the common case of texture cache hits as fast as possible, let's
not create a vector in the TextureInfo constructor. This commit
implements a custom iterator for MipLevels instead.
In my testing on the Death Star level of Rogue Squadron 2, this speeds
up TextureInfo::FromStage by 200%, giving an overall emulation speedup
of a bit over 1%. Results on the Hoth level are even better, with
TextureInfo::FromStage being close to 300% faster and overall emulation
being over 4% faster. (Single core, no GPU texture decoding.)
Profiling Dolphin using Heaptrack, this turns out to be the cause of
more than half of all temporary allocations. Though since it's in a
separate thread, I suppose it wasn't affecting performance very much.
Combined with the previous commit, this brings the TrampolineInfo struct
down to 48 bytes. This matters, because Jit64 has a big
std::unordered_map where it stores many megabytes of TrampolineInfo
entries.
The key saving comes from shrinking the len member from u32 to u16. It
should be safe to even turn it into a u8, but going that far brings no
additional savings due to how the padding works out.
By moving members of the OpArg struct around, we can cut down on how
much padding the struct needs. Now it has a size of 16 bytes, small
enough for function calls to pass it in two registers instead of on the
stack.
Re-added a line of code from a cancelled PR (that I thought was in
main already) that RetroAchievements uses to properly synchronize
memory reads and writes with the emulator frames. This appears to
fix some pretty major freezing and deadlocks in the RA dev tools.
CMake support for c++23 was added in 3.20.
Remove statements explicitly setting the following policies to NEW.
These policies were introduced in or before 3.20, and now that 3.20 is
the minimum version they will automatically have the NEW behavior.
Policy: Introduced in
CMP0079: 3.13
CMP0084: 3.14
CMP0091: 3.15
CMP0092: 3.15
CMP0099: 3.17
CMP0117: 3.20
Disable scanning c++ source files for module imports (introduced as
CMP0155 in 3.28) since we don't use modules and that policy triggers
build errors with Clang if the clang-scan-deps tool isn't installed.
cpp-ipc is explicitely only available on Windows, Linux, QNX, and FreeBSD. Trying to build dolphin on any another BSD such as OpenBSD or Haiku currently leads to failure because of this.
During my quest to rid Dolphin of the memory-unsafe GetPointer function,
I didn't realize that DSPHLE had its own set of memory access functions
that were essentially duplicating the standard ones. This commit gets
rid of them and replaces calls to them with calls to MemoryManager.
Move the constructor and destructor after the definition of the class
`Internal`.
This fixes an error generated by Clang from the destructor of
`std::unique_ptr<Internal>` when setting the standard version to c++23:
`invalid application of 'sizeof' to an incomplete type 'Metal::ObjectCache::Internal'`.
Fix an error generated by Clang from the destructor of
`std::unique_ptr<FileInfo> m_file_info` when setting the standard
version to c++23:
`invalid application of 'sizeof' to an incomplete type 'DiscIO::FileInfo'`
* This fixes some UI elements (3-dot menu background, status bar when scrolling down in settings) not following Material You colors.
* It doesn't cause any issues on Android versions without dynamic colors (tested on Android 9, 11, 16)
Signed-off-by: Leonardo Ledda <leonardoledda@gmail.com>
Checked on hardware that this bias was not added because I had assumed the other way around would be true, forgot to ask about making a PR for this when I initially had done so
If we're on an x64 CPU that doesn't have the MOVBE extension, trying to
SwapAndStore a host register results in that register's value getting
clobbered with the swapped value. Jit64::stX and Jit64::stXx detect this
case, and if necessary, emit a MOV to a register that's fine to clobber.
This logic was broken by the merge of PR 12134. Jit64::stX and
Jit64::stXx were assuming that if RegCache::IsImm returns true for a
guest register, calling RegCache::Use or RegCache::BindOrImm for that
guest register would result in an immediate. However, PR 12134 made it
possible for a guest register to have both a host register and an
immediate in the register cache at the same time. When this happens,
RegCache::IsImm returns true, yet RegCache::Use and RegCache::BindForImm
return an RCOpArg whose Location returns a host register. (To make it
extra confusing, RCOpArg::IsImm calls RegCache::IsImm if the RCOpArg
came from RegCache, so RCOpArg::IsImm returns true!)
To fix this, in cases where Jit64::stX and Jit64::stXx explicitly need
an immediate to avoid having to emit an extra MOV, let's call
RegCache::Imm32 so that we're certain that we're getting an immediate.
This fixes an issue on older x64 CPUs that manifested as e.g. completely
broken graphics in Spyro: Enter the Dragonfly.
The constant propagation PR made it so that a guest register can be
present in the register cache as both a host register and an immediate
at the same time. If such a guest register is requested from the
register cache, the register cache prefers returning it as a host
register. However, RCOpArg::IsImm still returns true in this case. This
is confusing, especially since OpArg::IsImm does not return true if the
RCOpArg is converted into an OpArg.
This commit makes RCOpArg::IsImm check whether RCOpArg::Location returns
an immediate, so that RCOpArg::IsImm returns false when a host register
is being used. Code that wants to know whether an immediate exists in
the register cache rather than whether an immediate is currently being
used should call RegCache::IsImm instead.