
2254 Commits

Author SHA1 Message Date
LC
fa91b47863
Merge pull request #9054 from sepalani/hle-cleanup
HLE cleanup
2020-09-07 22:36:19 -04:00
Jordan Woyak
0a63340c20
Merge pull request #9037 from shuffle2/code-cleanup
Code cleanup
2020-08-30 19:43:23 -05:00
Sepalani
4c75b96254 HLE: Improve naming
Replace 'function' with 'hook' when appropriate
2020-08-28 20:29:05 +04:00
Kate
5981a1929d Add support for FreeBSD/arm64 2020-08-27 21:54:04 +01:00
Shawn Hoffman
938fd4e438 use constexpr for some compile-time expressions 2020-08-23 13:57:05 -07:00
Shawn Hoffman
79f5ea0474 initialize some variables which need to be 2020-08-23 13:57:05 -07:00
Tilka
a161e58591
Merge pull request #8914 from JosJuice/jit64-low-dcbz
Jit64: Implement low DCBZ hack
2020-08-08 21:19:16 +01:00
JosJuice
76228fa482 Jit64: Implement low DCBZ hack
I was hoping this would improve the performance of Cars 2 by
avoiding interpreter fallbacks, but it doesn't seem to have
made any measurable impact.
2020-08-08 22:03:34 +02:00
Tilka
3101d957b6
Merge pull request #8886 from JosJuice/stack-check-instruction
PatchEngine: Attempt to fix crash in IsStackSane
2020-08-08 20:59:48 +01:00
Tilka
76b955e090
Merge pull request #8940 from RenaKunisaki/master
add Break On Hit and Log On Hit for instruction breakpoints
2020-08-08 19:46:10 +01:00
Tilka
6d0bc03e00
Merge pull request #8992 from Sintendo/fselx-avx
Jit64: Avoid unnecessary MOVAPS instructions
2020-08-08 19:38:48 +01:00
JosJuice
8b4f16a310 JitArm64: Avoid double rounding in fctiwzx
FCVT doesn't necessarily round to zero, so the result
might be inaccurate if we use it. To ensure correct
rounding, we use FCVTS from double FPR to 32-bit GPR.
Unfortunately, FCVTS can't do double FPR to single FPR.
2020-08-07 22:44:04 +02:00
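The rounding concern in the commit above can be illustrated in portable C++. This is a minimal sketch, not the JIT code: `ConvertViaSingle` and `ConvertDirect` are hypothetical helper names, and the sketch assumes the default round-to-nearest floating-point mode. It shows that narrowing a double to single precision before truncating can change the integer result, whereas converting the double directly does not.

```cpp
#include <cstdint>

// Hypothetical helper: narrows to single precision first (rounds to nearest
// under the default rounding mode), then truncates toward zero.
inline std::int32_t ConvertViaSingle(double value)
{
  const float narrowed = static_cast<float>(value);
  return static_cast<std::int32_t>(narrowed);
}

// Hypothetical helper: truncates the double toward zero in a single step.
inline std::int32_t ConvertDirect(double value)
{
  return static_cast<std::int32_t>(value);
}
```

For the input 16777217.0 (2^24 + 1), which a double holds exactly but a float cannot, the two helpers disagree: the narrowing step rounds to 16777216 before the truncation happens.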
Sintendo
08bdeefe05 Jit64AsmCommon: Use AVX in ConvertDoubleToSingle
Using AVX we can eliminate another MOVAPS instruction here.

Before:
0F 28 C8                movaps      xmm1,xmm0
66 0F DB 0D CF 2C 00 00 pand        xmm1,xmmword ptr [1F8D283B220h]

After:
C5 F9 DB 0D D2 2C 00 00 vpand       xmm1,xmm0,xmmword ptr [271835FB220h]
2020-08-02 18:07:47 +02:00
Sintendo
31755bc13a Jit64: fselx - Optimize SSE4.1 packed
Pretty much the same optimization we did for AVX, although slightly more
constrained because we're stuck with the two-operand instruction where
destination and source have to match.

We could also specialize the case where registers b, c, and d are all
distinct, but I decided against it since I couldn't find any game that
does this.

Before:
66 0F 57 C0          xorpd       xmm0,xmm0
66 41 0F C2 C1 06    cmpnlepd    xmm0,xmm9
41 0F 28 CE          movaps      xmm1,xmm14
66 41 0F 38 15 CC    blendvpd    xmm1,xmm12,xmm0
44 0F 28 F1          movaps      xmm14,xmm1

After:
66 0F 57 C0          xorpd       xmm0,xmm0
66 41 0F C2 C1 06    cmpnlepd    xmm0,xmm9
66 45 0F 38 15 F4    blendvpd    xmm14,xmm12,xmm0
2020-07-29 17:28:48 +02:00
Sintendo
afb86a12ab Jit64: fselx - Optimize AVX packed
For the packed variant, we can skip the final MOVAPS and write the
result directly into the destination register.

Before:
66 0F 57 C0          xorpd       xmm0,xmm0
66 41 0F C2 C1 06    cmpnlepd    xmm0,xmm9
C4 C3 09 4B CC 00    vblendvpd   xmm1,xmm14,xmm12,xmm0
44 0F 28 F1          movaps      xmm14,xmm1

After:
66 0F 57 C0          xorpd       xmm0,xmm0
66 41 0F C2 C1 06    cmpnlepd    xmm0,xmm9
C4 43 09 4B F4 00    vblendvpd   xmm14,xmm14,xmm12,xmm0
2020-07-29 17:06:52 +02:00
Sintendo
a52774ca63 Jit64: fselx - Add AVX path
AVX has a four-operand VBLENDVPD instruction, which allows for the first
input and the destination to be different. By taking advantage of this,
we no longer need to copy one of the inputs around and we can just
reference it directly, provided it's already in a register (I have yet
to see this not be the case).

Before:
66 0F 57 C0          xorpd       xmm0,xmm0
F2 41 0F C2 C6 06    cmpnlesd    xmm0,xmm14
41 0F 28 CE          movaps      xmm1,xmm14
66 41 0F 38 15 CA    blendvpd    xmm1,xmm10,xmm0
F2 44 0F 10 F1       movsd       xmm14,xmm1

After:
66 0F 57 C0          xorpd       xmm0,xmm0
F2 41 0F C2 C6 06    cmpnlesd    xmm0,xmm14
C4 C3 09 4B CA 00    vblendvpd   xmm1,xmm14,xmm10,xmm0
F2 44 0F 10 F1       movsd       xmm14,xmm1
2020-07-28 23:17:18 +02:00
Rena Kunisaki
a553f22385 Add Break On Hit and Log On Hit for instruction breakpoints 2020-07-11 13:38:58 -04:00
MerryMage
a10447eae2 JitArm64_Paired: Fix ps_msub when d == b 2020-07-01 20:11:54 +01:00
Tillmann Karras
a04ac23794 JitArm64: no intermediate rounding for paired FMA 2020-07-01 00:24:08 +01:00
Tillmann Karras
2a46c1f86f JitArm64: annotate intentional fallthrough 2020-07-01 00:10:15 +01:00
OatmealDome
089ffb9ef4 JitArm64: Don't assume fastmem arena is available 2020-06-29 00:42:56 -04:00
JosJuice
364ef76ba1 PatchEngine: Attempt to fix crash in IsStackSane
HostIsInstructionRAMAddress uses XCheckTLBFlag::OpcodeNoException,
so we should also use XCheckTLBFlag::OpcodeNoException when reading,
to ensure that we use the IBAT (as opposed to the DBAT) for both.
2020-06-18 11:57:00 +02:00
Pierre Bourdon
dd1fc711c7
PowerPC: partially implement thermal related SPRs
Doesn't support triggering interrupts when the thermal threshold is
exceeded, but allows polling for temperature information.

The THRM[123] registers are documented in most PPC datasheets, see e.g.
this PPC750CX one: http://datasheets.chipdb.org/IBM/PowerPC/750/750cx_um3-17-05.pdf
2020-06-18 07:37:44 +02:00
Jun Su
bb75050f68 Jit: fix warning -Winvalid-offsetof
Remove the warning:
warning: offsetof within non-standard-layout type ‘JitBlock’ is conditionally-supported
JitBlock contains non-trivial types now. Split the fields with trivial
types that need to be accessed from JIT code into a JitBlockData structure.
2020-05-04 18:26:56 +02:00
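The split described in the commit above can be sketched as follows. The type and field names here (`BlockData`, `Block`, `effective_address`, and so on) are hypothetical stand-ins rather than the actual JitBlock layout. The idea is that `offsetof` is only guaranteed for standard-layout types, so the plain fields live in their own struct and the richer type builds on it.

```cpp
#include <cstddef>
#include <cstdint>
#include <type_traits>
#include <vector>

// Hypothetical stand-in for the plain fields that generated code reads by offset.
struct BlockData
{
  std::uint32_t effective_address;
  std::uint32_t code_size;
  const std::uint8_t* entry_point;
};

// Hypothetical stand-in for the full block, which also owns a container.
struct Block : BlockData
{
  std::vector<std::uint32_t> link_targets;
};

// offsetof is well-defined for the standard-layout part...
static_assert(std::is_standard_layout<BlockData>::value, "BlockData must be standard-layout");
// ...but not guaranteed for the full type, which declares data members in
// both the base class and the derived class.
static_assert(!std::is_standard_layout<Block>::value, "Block is not standard-layout");

constexpr std::size_t kCodeSizeOffset = offsetof(BlockData, code_size);
```

Taking offsets from `BlockData` rather than `Block` is what avoids the -Winvalid-offsetof diagnostic in this sketch.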
Minty-Meeo
cc858c63b8 Configurable MEM1 and MEM2 sizes at runtime via Dolphin.ini
Changed several enums from Memmap.h to be static vars and implemented Get functions to query them. This seems to have boosted speed a bit in some titles? The new variables and some previously statically initialized items are now initialized via Memory::Init() and the new AddressSpace::Init(). s_ram_size_real and the new s_exram_size_real in particular are initialized from new OnionConfig values "MAIN_MEM1_SIZE" and "MAIN_MEM2_SIZE", only if "MAIN_RAM_OVERRIDE_ENABLE" is true.

GUI features have been added to Config > Advanced to adjust the new OnionConfig values.

A check has been added to State::doState to ensure savestates with memory configurations different from the current settings aren't loaded. The STATE_VERSION is now 115.

FIFO Files have been updated from version 4 to version 5, now including the MEM1 and MEM2 sizes from the time of DFF creation. FIFO Logs not using the new features (OnionConfig MAIN_RAM_OVERRIDE_ENABLE is false) are still backwards compatible. FIFO Logs that do use the new features have a MIN_LOADER_VERSION of 5. Thanks to the order of function calls, FIFO logs are able to automatically configure the new OnionConfig settings to match what is needed. This is a bit hacky, though, so I also threw in a failsafe for if the conditions that allow this to work ever go away.

I took the liberty of adding a log message to explain why the core fails to initialize if the MIN_LOADER_VERSION is too great.

Some IOS code has had the function "RAMOverrideForIOSMemoryValues" appended to it to recalculate IOS Memory Values from retail IOSes/apploaders to fit the extended memory sizes. Worry not, if MAIN_RAM_OVERRIDE_ENABLE is false, this function does absolutely nothing.

A hotfix in DolphinQt/MenuBar.cpp has been implemented for RAM Override.
2020-04-28 12:10:50 -05:00
Lioncash
ee200d09eb Jit64/Jit64_Tables: Construct tables at compile-time
Utilizing constexpr, we can eliminate the need to construct the tables
at runtime and just do all the work at compile-time, making for fewer
moving parts overall.

The general structure is more or less the same, however rather than one
single initialization function, each table is built off an immediately
executed lambda function. This is nice, since it narrows the scope of
the table building logic down to the tables that actually need it.
2020-04-28 17:12:24 +02:00
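The pattern the commit above describes, a table built by an immediately executed lambda, looks roughly like this in C++17. The names (`Handler`, `TableEntry`, `kEntries`, `kDispatchTable`) and the table size are hypothetical and chosen only for illustration; they are not the actual Jit64 tables.

```cpp
#include <array>
#include <cstddef>
#include <cstdint>

// Hypothetical handler identifiers; Fallback is the zero value.
enum class Handler : std::uint8_t { Fallback, Add, Subtract };

// Hypothetical sparse description of which opcode maps to which handler.
struct TableEntry
{
  std::size_t opcode;
  Handler handler;
};

constexpr std::array<TableEntry, 2> kEntries{{{3, Handler::Add}, {5, Handler::Subtract}}};

// The table is produced by a lambda that is invoked on the spot, so the
// building logic is scoped to this one table and runs at compile time.
constexpr std::array<Handler, 8> kDispatchTable = [] {
  std::array<Handler, 8> table{};  // every slot starts as Handler::Fallback
  for (std::size_t i = 0; i < kEntries.size(); ++i)
    table[kEntries[i].opcode] = kEntries[i].handler;
  return table;
}();

static_assert(kDispatchTable[3] == Handler::Add, "built at compile time");
```

Because the result is a constexpr object, a mistake such as an out-of-range opcode index becomes a compile error instead of a runtime fault.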
Sintendo
19dda51a0d Jit64: subfx - Use LEA when possible
Similar to what we do for addx. Since we're calculating b - a and
because subtraction is not commutative, we can only apply this when
source register a holds the constant.

Before:
45 8B EE             mov         r13d,r14d
41 83 ED 08          sub         r13d,8

After:
45 8D 6E F8          lea         r13d,[r14-8]
2020-04-21 22:45:47 +02:00
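The condition in the subfx commit above can be expressed as a small decision helper. This is a sketch with hypothetical names (`Source`, `LeaDisplacementForSubf`), not the emitter code. Since the result is b - a, a constant a becomes a negated displacement on b, and the trick does not apply when only b is the constant.

```cpp
#include <cstdint>
#include <optional>

// Hypothetical description of a source operand: either a known constant
// or a value held in a host register.
struct Source
{
  bool is_constant;
  std::int32_t constant;  // only meaningful when is_constant is true
};

// Returns the displacement for "lea d,[b+disp]" when subf (d = b - a) can be
// emitted as a single LEA, or no value when it cannot.
inline std::optional<std::int32_t> LeaDisplacementForSubf(const Source& a, const Source& b)
{
  if (a.is_constant && !b.is_constant)
  {
    // Negate in unsigned arithmetic so the most negative constant wraps
    // instead of overflowing.
    return static_cast<std::int32_t>(0u - static_cast<std::uint32_t>(a.constant));
  }
  return std::nullopt;
}
```

With a = 8 and b in a register, the helper yields -8, matching the `lea r13d,[r14-8]` listing above.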
Sintendo
89646c898f Jit64: addx - Skip ADD after MOV when possible
We can get away with skipping the addition when we know we're dealing
with a constant zero. Just a MOV will suffice in this case.

Once again, we don't bother to add separate handling for when overflow
is needed, because no titles would ever hit that path during my testing.

Before:
8B 7D F8             mov         edi,dword ptr [rbp-8]
83 C7 00             add         edi,0

After:
8B 7D F8             mov         edi,dword ptr [rbp-8]
2020-04-21 22:45:47 +02:00
Sintendo
50f7a7d248 Jit64: addx - Prefer smaller MOV+ADD sequence
ADD has a smaller encoding for immediates that can be expressed as an
8-bit signed integer (in other words, between -128 and 127). MOV lacks
this compact representation.

Since addition allows us to swap the source registers, we can always get
the shortest sequence here by carefully checking if we're dealing with a
small immediate first. If we are, move the other source into the
destination and add the small immediate onto that. For large immediates
the reverse is preferable.

Before:
41 BE 40 00 00 00    mov         r14d,40h
44 03 75 A8          add         r14d,dword ptr [rbp-58h]

After:
44 8B 75 A8          mov         r14d,dword ptr [rbp-58h]
41 83 C6 40          add         r14d,40h

Before:
44 8B 7D F8          mov         r15d,dword ptr [rbp-8]
41 81 C7 00 68 00 CC add         r15d,0CC006800h

After:
41 BF 00 68 00 CC    mov         r15d,0CC006800h
44 03 7D F8          add         r15d,dword ptr [rbp-8]
2020-04-21 22:42:02 +02:00
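The ordering rule from the commit above reduces to one range check on the immediate. The sketch below uses hypothetical names (`FitsInSigned8`, `AddOrder`, `ChooseAddOrder`) and only models the decision, not the instruction emission.

```cpp
#include <cstdint>

// True when the immediate can use the compact sign-extended 8-bit ADD encoding.
constexpr bool FitsInSigned8(std::int32_t imm)
{
  return imm >= -128 && imm <= 127;
}

// Hypothetical labels for the two instruction orders shown in the listings.
enum class AddOrder
{
  MoveRegisterThenAddImmediate,  // mov d, other ; add d, imm8
  MoveImmediateThenAddRegister,  // mov d, imm32 ; add d, other
};

// Small immediates go on the ADD; large ones go on the MOV, because MOV
// has no compact immediate form.
constexpr AddOrder ChooseAddOrder(std::int32_t imm)
{
  return FitsInSigned8(imm) ? AddOrder::MoveRegisterThenAddImmediate
                            : AddOrder::MoveImmediateThenAddRegister;
}
```

For the two cases in the listings, 40h selects the first order and 0CC006800h selects the second.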
Sintendo
2481660519 Jit64: addx - Emit MOV when possible
When the source registers are a simple register and a constant zero and
overflow isn't needed, emitting LEA is kinda silly.

This will occasionally save a single byte for certain registers due to
how x86 encoding works. More importantly, LEA takes up execution
resources while MOV does not.

Before:
41 8D 7D 00          lea         edi,[r13]

After:
41 8B FD             mov         edi,r13d
2020-04-21 22:36:20 +02:00
Sintendo
1c25e6352a Jit64: addx - Emit nothing when possible
When the destination register matches a source register, the other
source register contains zero, and overflow isn't needed, the
instruction becomes a nop and we don't need to emit anything.

We could add specialized handling for the case where overflow is needed,
but none of the titles I tried would hit this path.

Before:
83 C7 00             add         edi,0

After:
2020-04-21 22:35:17 +02:00
Sintendo
f1c3ab359d Jit64: addx - Deduplicate branches part 2
No functional change, just simplify some repeated logic in the case
where we're dealing with exactly one immediate and one simple register
when overflow isn't needed.
2020-04-21 22:06:46 +02:00
Sintendo
72fbdf1a6b Jit64: addx - Deduplicate branches part 1
No functional change, just simplify some repeated logic for the cases
where the destination register matches one of the sources.
2020-04-21 22:06:39 +02:00
container1234
75a69b1145 Breakpoints: Fix crash after clearing all memory breakpoints 2020-03-14 21:57:09 +09:00
Tilka
e323f47ceb
Merge pull request #8472 from degasus/jitsetting
Core/Jits: Adds an option to disable the register cache.
2020-02-08 13:49:33 +00:00
Techjar
a106c99826 Jit64: Don't use PEXT in DoubleToSingle on AMD Zen
This was causing severe slowdown in some games.
2020-01-26 22:10:46 -05:00
Tilka
709862b818
Merge pull request #8120 from MerryMage/cdts
Jit64: Make DoubleToSingle a common asm routine
2020-01-25 19:10:37 +00:00
Connor McLaughlin
efc1ee8e6a
Merge pull request #8537 from degasus/fastmem
Core/HW -> PowerPC/JIT: Fastmem arena construction
2020-01-14 09:38:15 +10:00
Tilka
98f645daac
Merge pull request #8158 from Sintendo/jitopts
x64 micro-optimizations
2020-01-06 14:09:43 +01:00
Sintendo
12fcbac2a3 Jit64: addx - Emit LEA for register + immediate
Prefer LEA over MOV + ADD when dealing with immediates.

Before:
44 8B EE             mov         r13d,esi
41 83 C5 20          add         r13d,20h

After:
44 8D 6E 20          lea         r13d,[rsi+20h]
2020-01-05 23:39:13 +01:00
Sintendo
8e7b6f4178 Jit64: addx - Prefer ADD over LEA when possible
The old logic would always emit LEA when both sources are in a register
and OE is disabled. However, ADD is still preferable when one of the
sources matches the destination.

Before:
45 8D 6C 35 00       lea         r13d,[r13+rsi]

After:
44 03 EE             add         r13d,esi
2020-01-05 23:23:56 +01:00
David Korth
c2dd2e8a2e Use std::istringstream or std::ostringstream instead of std::stringstream where possible.
This removes std::iostream from the inheritance chain, which reduces
overhead slightly.
2019-12-29 23:45:02 -05:00
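The substitution in the commit above is mechanical: pick the one-directional stream that matches how the buffer is used. A minimal sketch, with hypothetical helper names `ParseInts` and `JoinInts`:

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Read-only use: std::istringstream is enough for extraction.
inline std::vector<int> ParseInts(const std::string& text)
{
  std::istringstream stream(text);
  std::vector<int> values;
  int value = 0;
  while (stream >> value)
    values.push_back(value);
  return values;
}

// Write-only use: std::ostringstream is enough for formatting.
inline std::string JoinInts(const std::vector<int>& values)
{
  std::ostringstream stream;
  for (std::size_t i = 0; i < values.size(); ++i)
  {
    if (i != 0)
      stream << ',';
    stream << values[i];
  }
  return stream.str();
}
```

`std::stringstream` remains the right choice only where the same object is both written to and read from.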
David Korth
9f3b9acad9 PowerPC.cpp: No need to explicitly initialize ppcState.
"ppcState{}" is stored in the .data segment, which means the full ~4 MB
is stored in the executable.

"ppcState" is stored in the .bss segment, which means it only stores a
note that tells it to allocate and zero ~4 MB at runtime.
2019-12-29 23:45:02 -05:00
degasus
aad8aab698 Jit64: Disable the fast address check if fastmem is disabled.
This was a huge speedup with disabled fastmem, but it still requires the fastmem arena.
So let's disable it for now, even if this commit has a huge performance hit with disabled fastmem.
2019-12-28 13:41:57 +01:00
degasus
d735943aa2 Jit64: Use safe memory helpers for psq_l* without fastmem.
RMEM won't help if there is no fastmem arena, so let's use our memory helpers.
2019-12-28 13:41:57 +01:00
degasus
74cb692591 Jit64: Only activate dcbz fastpath with fastmem.
The code is safe not to create memory errors, but it accesses the fastmem area.
2019-12-28 13:41:57 +01:00
degasus
c6019f9814 PowerPC/Jit: Create fastmem arena on init. 2019-12-28 13:41:57 +01:00
degasus
9d88180df7 MMU: Use the Memory helpers for physical memory.
physical_base is a fastmem helper. Its access is unsafe and might not be available without a Jit.
2019-12-28 12:57:51 +01:00
Stenzek
d744c5a148 Compile fixes for Windows-on-ARM64 2019-12-28 19:20:41 +10:00
Léo Lam
3cf2857aac
Merge pull request #8520 from lioncash/analyst-tidy
PowerPC/PPCAnalyst: Remove unimplemented LogFunctionCall prototype
2019-12-15 12:07:38 +01:00