~40% speed improvement for decoding GX_TF_CMPR (S3TC) textures using SSE2 intrinsics.
small fix for lle-int
Look for the DSP ROM files in Sys/GC, as before, before trying User/GC
to allow both system-wide (also Windows) and per-user installations.
The IPL ROM is another candidate for this, but I don’t have an
image to test with.
Perhaps a more general solution would be to just phase out the Sys
directory. It is used for data which would have been available in ROM
or flash on the real hardware, and it really contains two classes of
files that are both read-only during emulation:
– Settings and font files, which can be freely distributed
– ROM files which must be obtained from the user’s own GC or Wii
Since the two could be freely put together on Windows without any
problems and with the users of that platform being resistant to
change, it may be easiest to just treat Sys as another directory
to be copied from the application bundle into User/Sys at startup.
D3D9: Fall back to just creating a depth surface instead of a depth texture if the latter one isn’t supported by the hardware.
This is a workaround for issue 3256.
Some minor tidying of the OS X audio and wiimote code.
Retire the workaround for wxWidgets using the Objective-C reserved
keyword “id” in the public auibar.h header.
You now need at least r66546 of wxWidgets to build on OS X.
Move the expected location of the DSP ROM files from Sys/GC to User/GC.
These files are acquired from the hardware ROM by the individual user,
who does not normally have write access to Sys on Linux and OS X.
If you currently have these files in Sys/GC, just move them to User/GC
(~/Library/Application\ Support/Dolphin/GC on OS X).
Taking a random guess at what could possibly fix issue 3256…
– Assign width and height to the actual powers of two rather than to the exponents…
– Clean up FramebufferManager()
– Make use of more depth buffer formats to prevent some devices from failing to create a depth buffer
Should fix issue 3256.
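The exponent/value mix-up mentioned above can be illustrated with a minimal sketch. NextPowerOfTwo is a hypothetical helper, not the function used in the D3D9 plugin; the point is only that the dimension must be 2^exponent, not the exponent itself.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical helper (name not from the source): returns the smallest
// power of two that is >= v.
inline uint32_t NextPowerOfTwo(uint32_t v)
{
    // Zero and values above 2^31 have no sensible answer in 32 bits.
    assert(v != 0 && v <= 0x80000000u);

    uint32_t exponent = 0;
    while ((1u << exponent) < v)
        ++exponent;

    // Return the value 2^exponent. Returning `exponent` here would
    // yield a tiny render target (e.g. 10 instead of 1024).
    return 1u << exponent;
}
```

For example, a 640x480 target would be rounded up to 1024x512 under this scheme.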
LLE JIT: Minimised exception checking. Instructions which need to check for exceptions are now marked in the analyser. Moved the checking for external interrupts to the point where the CPU writes to the control register.
LLE JIT: Reworked the block linking code. It now keeps track of what each block is waiting on, minimising the amount of recompiling. Both jumps and calls can now become linked. The code also checks the cycle count before jumping to the linked block.
Warning fix for gcc ;)
Fix for bit reduction regression in GX_TF_RGB565 textures from previous commit.
D3D9: Make sure to use powers of two as render target dimensions if it’s needed by the device.
Some other cleanups.
Possibly fixes issue 3256.
GX_TF_RGB565 texture decoder optimized with SSE2 producing a ~78% speed increase over reference C implementation.
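For reference, the per-texel expansion that any RGB565 decoder has to reproduce can be sketched in scalar form. This is an illustration only: the function name is hypothetical, the output layout (R in the low byte, opaque alpha) is an assumption, and the big-endian byte swap and tiling of the real texture data are left out.

```cpp
#include <cstdint>

// Hypothetical scalar sketch: expand one host-endian RGB565 texel to
// 8 bits per channel. The top bits of each channel are replicated into
// the low bits so that full intensity maps to 255 rather than 248/252.
inline uint32_t DecodeRGB565(uint16_t texel)
{
    const uint32_t r5 = (texel >> 11) & 0x1F;
    const uint32_t g6 = (texel >> 5) & 0x3F;
    const uint32_t b5 = texel & 0x1F;

    const uint32_t r = (r5 << 3) | (r5 >> 2);
    const uint32_t g = (g6 << 2) | (g6 >> 4);
    const uint32_t b = (b5 << 3) | (b5 >> 2);

    // Assumed layout: R | G << 8 | B << 16 | A << 24, alpha fully opaque.
    return r | (g << 8) | (b << 16) | 0xFF000000u;
}
```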
Fixed crash in debugger when attempting to enable profiler before having run any game.
ISFS_Seek: Turns out POSIX allows seek past EOF, the Wii does not.
Should fix Issue 3761 (really!), please test this.
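A minimal sketch of the stricter seek behaviour, assuming a simple in-memory file state. FileState, SeekMode, kSeekError and SeekChecked are hypothetical names, the error value is a placeholder rather than the real return code, and allowing a seek to exactly EOF is an assumption.

```cpp
#include <cstdint>

// Hypothetical file state and seek modes, for illustration only.
struct FileState
{
    uint32_t size;
    uint32_t position;
};

enum class SeekMode { Set, Current, End };

constexpr int32_t kSeekError = -1;  // placeholder, not the real error code

// Unlike POSIX lseek(), refuse any target outside [0, size] and leave
// the current position untouched on failure.
inline int32_t SeekChecked(FileState& file, int32_t offset, SeekMode mode)
{
    int64_t base = 0;
    switch (mode)
    {
    case SeekMode::Set:     base = 0;             break;
    case SeekMode::Current: base = file.position; break;
    case SeekMode::End:     base = file.size;     break;
    }

    const int64_t target = base + offset;
    if (target < 0 || target > static_cast<int64_t>(file.size))
        return kSeekError;

    file.position = static_cast<uint32_t>(target);
    return static_cast<int32_t>(target);
}
```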
Fix an issue on Windows where found wiimotes were not being reported as found. This fixes issue 3832.
Cast size_t to unsigned long for printing.
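The cast pattern looks roughly like this. FormatSize is a hypothetical helper used only to make the example testable; the actual call sites are not shown in this log.

```cpp
#include <cstddef>
#include <cstdio>
#include <string>

// Hypothetical helper: format a size_t with "%lu". Passing a size_t
// directly to "%lu" is not portable, so the value is cast explicitly.
inline std::string FormatSize(std::size_t n)
{
    char buf[32];
    std::snprintf(buf, sizeof(buf), "%lu", static_cast<unsigned long>(n));
    return std::string(buf);
}
```

Note that unsigned long is 32 bits wide on 64-bit Windows, so values above 4 GiB would be truncated there.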
Audio volume slider support for OS X.
Core/Core/HW: Give small amounts of time to the dsp whenever the ppc
reads the high bits from the mailbox registers. It is probably waiting for
the dsp to read the data from the cpu-to-dsp mailbox or for the dsp to
write to the dsp-to-cpu mailbox.
This all but removes DSP::Read16 from LLE profiles, where it previously used
up to 2% of all system time. It also speeds up games quite a bit.
Handle FileIO Read/Write more like real hardware.
Fixes Issue 3761.
Fixes Issue 1749.
Possibly fixed game crash issues by switching to unaligned SSE2 loads/stores.
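The difference can be sketched as follows. _mm_load_si128 and _mm_store_si128 require 16-byte aligned addresses and fault otherwise, while the _mm_loadu_si128 and _mm_storeu_si128 variants accept any address. CopyBlocksUnaligned is a hypothetical function, not code from the texture decoder.

```cpp
#include <cstddef>
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical sketch: copy `blocks` 16-byte chunks between buffers
// whose alignment is unknown, using the unaligned load/store forms.
inline void CopyBlocksUnaligned(const uint8_t* src, uint8_t* dst, std::size_t blocks)
{
    for (std::size_t i = 0; i < blocks; ++i)
    {
        const __m128i v =
            _mm_loadu_si128(reinterpret_cast<const __m128i*>(src + 16 * i));
        _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 16 * i), v);
    }
}
```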
Removed unnecessary work being done in the file system when logging is disabled.
Patch applied from issue 3829, author firstname.lastname@example.org.
Tweaked SetMultiVSConstant functions to prefer glProgramEnvParameters4fvEXT over glProgramEnvParameter4fvARB with fall-back for older hardware.
Avoid shadowing variables.
Fix for r6707. Looks like I tried to do some invalid 16 bit addressing.
Also a small change to the mixer. This should fix audio throttling in cases where num_samples > RESERVED_SAMPLES. This seems to happen now with zelda ucode games, possibly others.
With the more aggressive polling by the per-wiimote threads,
additional input queueing in IOdarwin appears to be unnecessary.
IOBluetoothDeviceInquiry does not find already connected devices,
so no need to filter those out.
OGL: Clean up ClearScreen
Revive io_osx.mm revision history and reapply the changes in r6693.
Avoid sending the Wii OS bluetooth packets with uninitialized data
past the nominal length of the report.
XXX IOWin.cpp still always returns MAX_PAYLOAD because I don’t have
a Windows environment to test with.
Applied the logic from r6691 to the LLE dec/add/sub functions so they work without ToMask. This should give a modest speedup for these.
Pierre’s AR inc was already perfect and I only adjusted its logic a bit for visual consistency between the interpreter and JIT code.
Also applied Pierre’s optimization from the LLE inc to the Int inc.
GX_TF_I4 texture decoder optimized with SSE2 producing a ~76% speed increase over reference C implementation.
GX_TF_RGBA8 texture decoder optimized with SSE2 producing a ~68% speed increase over reference C implementation.
TABified the entire document per NeoBrainX. :)
Initialize bluetooth input queue when creating wiimote object.
Clamp the OGL depth clear value; this might fix a problem some people reported with r6678.
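A minimal sketch of such a clamp, assuming the clear value is a normalized float that should lie in [0, 1] before it is handed to the GL. ClampDepthClear is a hypothetical name and the range is an assumption.

```cpp
// Hypothetical helper: force a depth clear value into [0, 1].
inline float ClampDepthClear(float depth)
{
    if (depth < 0.0f)
        return 0.0f;
    if (depth > 1.0f)
        return 1.0f;
    return depth;
}
```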
This may fix the extension issue on windows. Please test. Thanks BhaaL.
This should fix wiimote extensions on linux. Now to figure out the windows issue, and then OSX … Sigh!
Removed left-overs from wiiuse, should fix the problems when building on windows.
Fixed crash when compressing 4+ GB ISOs on some builds.
Dolphin SVN r6699
~68% increase in GX_TF_IA8 decoding speed. Not an oft-used texture format. An example use is the Wii cursor in MKWii in the menus.
Dolphin SVN r6698
~80% speed improvement in decoding GX_TF_I8 textures. Yes, EIGHTY PERCENT. However, for MKWii movie playback I still can’t break the fluffin’ 48 FPS boundary on my machine! There’s something else at play here because this decoder is ridonkulously fast.
~25% speed improvement in decoding GX_TF_RGB5A3 textures which aren’t used very much. I thought it would help for movie playback but I misled myself. Video playback has nothing to do with this texture format.
Next I’ll see if I can knock out some of these other texture decoders. Byte swizzling I’m sure can somehow be accomplished using _mm_unpacklo_epi8 trickery, so that’d be another big win I hope.
TextureDecoder.cpp: new SSE2 optimized GX_TF_I8 decoder. Probably not ultimately optimal SSE2 code, but provably better (on my machine) than the memset version. Tested with __rdtsc counts in an independent project. I get about 6-7 FPS more on average during the intro movie playback in Mario Kart Wii. Hope this compiles for GCC okay.
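The general idea of widening intensity bytes with SSE2 unpack instructions can be sketched as below. This is not the decoder from TextureDecoder.cpp: ExpandI8ToRGBA is a hypothetical function, it ignores the tiled block layout of the texture data, and replicating the intensity into all four channels (including alpha) is an assumption.

```cpp
#include <cstdint>
#include <emmintrin.h>  // SSE2 intrinsics

// Hypothetical sketch: expand 16 8-bit intensity texels into 16 32-bit
// pixels where every channel equals the intensity.
inline void ExpandI8ToRGBA(const uint8_t* src, uint32_t* dst)
{
    // I0 I1 I2 ... I15
    const __m128i i8 = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));

    // Interleave the register with itself: each byte becomes a pair.
    const __m128i lo16 = _mm_unpacklo_epi8(i8, i8);  // I0 I0 I1 I1 ... I7 I7
    const __m128i hi16 = _mm_unpackhi_epi8(i8, i8);  // I8 I8 ... I15 I15

    // Interleave the pairs with themselves: each byte is now repeated 4x.
    const __m128i p0 = _mm_unpacklo_epi16(lo16, lo16);  // texels 0..3
    const __m128i p1 = _mm_unpackhi_epi16(lo16, lo16);  // texels 4..7
    const __m128i p2 = _mm_unpacklo_epi16(hi16, hi16);  // texels 8..11
    const __m128i p3 = _mm_unpackhi_epi16(hi16, hi16);  // texels 12..15

    __m128i* out = reinterpret_cast<__m128i*>(dst);
    _mm_storeu_si128(out + 0, p0);
    _mm_storeu_si128(out + 1, p1);
    _mm_storeu_si128(out + 2, p2);
    _mm_storeu_si128(out + 3, p3);
}
```

One 16-byte load produces four 16-byte stores, which is where a win over a per-texel memset would be expected to come from.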
TextureDecoder.cpp: merged two functionally identical decode5A3RGBA and decode5A3rgba methods.
OpcodeDecoding.cpp and DLCache.cpp: optimization for GX_LOAD_XF_REG. The PSHUFB solution sounds better for SSSE3, but this is a small win for the default case.
LLE Int: (addr add/sub/inc/dec)
Adjusted the code to work without ToMask.
This code should be functionally identical for all inputs to the previous code.
Fix issue with LinearDiskCache where only new files could be written to, Append() would fail on previously existing cache files.