GX_TF_I4 texture decoder optimized with SSE2 producing a ~76% speed increase over reference C implementation.
GX_TF_RGBA8 texture decoder optimized with SSE2 producing a ~68% speed increase over reference C implementation.
TABified the entire document per NeoBrainX. :)
Initialize bluetooth input queue when creating wiimote object.
Clamp the OGL depth clear value; this might fix a problem some people reported with r6678.
This may fix the extension issue on windows. Please test. Thanks BhaaL.
This should fix wiimote extensions on linux. Now to figure out the windows issue, and then OSX … Sigh!
r6706 x86 – download, mirror
r6706 x64 – download, mirror
Removed left-overs from wiiuse, should fix the problems when building on windows.
Fixed a crash when compressing 4+ GB ISOs on some builds.
Dolphin SVN r6699
~68% increase in GX_TF_IA8 decoding speed. Not an oft-used texture format. An example use is the Wii cursor in MKWii in the menus.
Dolphin SVN r6698
~80% speed improvement in decoding GX_TF_I8 textures. Yes, EIGHTY PERCENT. However, for MKWii movie playback I still can’t break the fluffin’ 48 FPS boundary on my machine! There’s something else at play here because this decoder is ridonkulously fast.
~25% speed improvement in decoding GX_TF_RGB5A3 textures which aren’t used very much. I thought it would help for movie playback but I misled myself. Video playback has nothing to do with this texture format.
Next I’ll see if I can knock out some of these other texture decoders. Byte swizzling I’m sure can somehow be accomplished using _mm_unpacklo_epi8 trickery, so that’d be another big win I hope.
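As an illustration of the kind of unpack trick mentioned above, here is a minimal sketch of expanding one row of eight I8 intensities into 32-bit texels. It is not the code from TextureDecoder.cpp: the names ExpandI8Scalar, ExpandI8 and HAVE_SSE2 are made up for this example, tile layout is ignored, and a scalar fallback is used where SSE2 is unavailable.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

// Scalar reference: each 8-bit intensity becomes one 32-bit texel with
// the same value in all four channels. (Hypothetical helper.)
void ExpandI8Scalar(const uint8_t* src, uint32_t* dst)
{
    assert(src && dst);
    for (int i = 0; i < 8; ++i)
        dst[i] = 0x01010101u * src[i];
}

// Same result for 8 texels, using SSE2 unpacks when available.
void ExpandI8(const uint8_t* src, uint32_t* dst)
{
    assert(src && dst);
#if HAVE_SSE2
    // Load 8 intensity bytes into the low half of a register.
    const __m128i i8 = _mm_loadl_epi64(reinterpret_cast<const __m128i*>(src));
    // Interleave the register with itself: i0 i0 i1 i1 ... i7 i7
    const __m128i i16 = _mm_unpacklo_epi8(i8, i8);
    // Interleave again at 16-bit granularity: each intensity now fills 4 bytes.
    const __m128i lo = _mm_unpacklo_epi16(i16, i16); // texels 0..3
    const __m128i hi = _mm_unpackhi_epi16(i16, i16); // texels 4..7
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), lo);
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst + 4), hi);
#else
    ExpandI8Scalar(src, dst);
#endif
}
```

Interleaving a register with itself duplicates every byte, so two rounds of unpacking replace eight scalar multiplies with a handful of instructions.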
r6701 x86 – download, mirror
r6701 x64 – download, mirror
TextureDecoder.cpp: new SSE2 optimized GX_TF_I8 decoder. Probably not ultimately optimal SSE2 code, but provably better (on my machine) than the memset version. Tested with __rdtsc counts in an independent project. I get about 6-7 FPS more on average during the intro movie playback in Mario Kart Wii. Hope this compiles for GCC okay.
TextureDecoder.cpp: merged two functionally identical decode5A3RGBA and decode5A3rgba methods.
OpcodeDecoding.cpp and DLCache.cpp: optimization for GX_LOAD_XF_REG. The PSHUFB solution sounds better for SSSE3, but this is a small win for the default case.
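The commit's code is not shown here, so the following is only a sketch of the tradeoff it alludes to, assuming the work in question is byte-swapping big-endian 32-bit words. With plain SSE2 a swap of four words takes two shifts, an OR and two shuffles, whereas SSSE3 PSHUFB could do it with a single shuffle. Swap32Scalar, Swap32x4 and HAVE_SSE2 are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>
#if defined(__SSE2__) || defined(_M_X64)
#include <emmintrin.h>
#define HAVE_SSE2 1
#else
#define HAVE_SSE2 0
#endif

// Scalar reference byte swap of one 32-bit word. (Hypothetical helper.)
inline uint32_t Swap32Scalar(uint32_t v)
{
    return (v >> 24) | ((v >> 8) & 0x0000FF00u) |
           ((v << 8) & 0x00FF0000u) | (v << 24);
}

// Byte-swap four 32-bit words at once.
void Swap32x4(const uint32_t* src, uint32_t* dst)
{
    assert(src && dst);
#if HAVE_SSE2
    __m128i v = _mm_loadu_si128(reinterpret_cast<const __m128i*>(src));
    // Step 1: swap the two bytes inside every 16-bit lane.
    v = _mm_or_si128(_mm_slli_epi16(v, 8), _mm_srli_epi16(v, 8));
    // Step 2: swap the two 16-bit halves inside every 32-bit lane.
    v = _mm_shufflelo_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
    v = _mm_shufflehi_epi16(v, _MM_SHUFFLE(2, 3, 0, 1));
    _mm_storeu_si128(reinterpret_cast<__m128i*>(dst), v);
#else
    for (int i = 0; i < 4; ++i)
        dst[i] = Swap32Scalar(src[i]);
#endif
}
```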
LLE Int: (addr add/sub/inc/dec)
Adjusted the code to work without ToMask.
This code should be functionally identical for all inputs to the previous code.
Fix issue with LinearDiskCache where only new files could be written to, Append() would fail on previously existing cache files.
r6692 x86 – download, mirror
r6692 x64 – download, mirror
Upgrade GLEW to version 1.5.7:
Improved mingw32 build support
Improved cygwin build support
Fixes issue 3798.
* Optimised out the R11 ImmPtr instructions
* Made the whitespacing a little more consistent
* Disabled block linking while it is being reworked
r6689 x86 – download, mirror
r6689 x64 – download, mirror
Core/DSPCore: Changed g_dsp._r back to g_dsp.r. Removed the check*Exclude
functions accidentally added. Fixed the jitted ar register arithmetic.
Added a CMakeLists.txt for the UnitTests, but did not add the subdirectory
Linux build fix.
Add ciso support.
Thanks to dolphin.user839 for the patch ;)
Fixes issue 2708.
DX9/DX11: Fixed some possible crashes that occurred if a game was started, the config dialog was opened and the game was closed again. Due to other issues this still happens quite often though…
Various warning fixes.
Compile fix for Unittests.
A lot of tests are failing right now, not sure yet if this is caused by the recent reg changes or other stuff (at least broken according to tests: iar, subarn, addarn, ‘ir, ‘nr, ‘l, ‘s, ‘sn, ‘ln, ‘lsn, ‘lsm, ‘lsnm, ‘sl etc — err, most/all that use increase/decrease)
r6687 x86 – download, mirror
r6687 x64 – download, mirror
More FIFO work. This is an experimental commit. Fixed «Monopoly Wii» (the «FIFOs linked but out of sync» problem in this game): re-sync the FIFO when it is in immediate mode and copy the CP register values to the PI registers. Now this game boots and goes ingame :)
* Fixed a bug in the JCC instruction.
* Changed the cycle count from u32 to u16.
* Added cycle counting to the block linker.
* Optimised the branch exit slightly.
Core/DSPCore: Reorganize register layout for accessing accumulators
(acc and ax) and product register with one read/write.
Gives a minuscule speedup of not more than 4%. In exchange, breaks all
your out-of-tree changes to dsp. Tests are not building again, yet.
Made wiimotes automatically connect on start based on their selected «Input Source» in the wiimote configuration (fixes issue 2261). Made the input source for wiimotes 2-4 default to «None».
r6682 x86 – download, mirror
r6682 x64 – download, mirror
Fix ClearScreen in OpenGL as well (uses clear quads instead of glClear now).
Thanks to kiesel and sl1nk3 for helping me out here ;)
Invalidate texture cache when the STC or native mipmaps options are changed.
Fixes minor graphical glitches in these cases.
r6678 x86 – download, mirror
r6678 x64 – download, mirror
Small fix in addition to r6669.
LLE JIT: Completed the remaining JIT DSP instructions (lrrn, srrn, ilrrn).
LLE JIT: Completed the JIT versions of the DSP Branch instructions (added ifcc, ret, rti and halt).
more WIP BBA crap…nothing nice to see.
All 64-bit capable Macs have the SSSE3 extension.
Make ARAM DMAs take time. Allows WWE Day of Reckoning 1 & 2 to go ingame…then they crash. Not entirely sure if the crash is related.
r6676 x86 – download, mirror
r6676 x64 – download, mirror
Yet another ClearScreen fix, should be the last one now.
Should fix almost all regressions of the recent ClearScreen changes and keep the fixed stuff.
The Super Mario Sunshine glitch is caused by another issue and will be addressed in my next commit.
Core/DSPCore: Improve Interpreter address register add/sub, convert to
assembler for JIT. Replace JIT ToMask() with a different variant. Remove
superfluous zeroWriteBackLog calls (added by me).
Core/Common: Don’t bother creating a string and calling into a Logs trigger()
when there is no one listening. Change AtomicLoadAcquire for gcc to just
make the compiler not reorder memory accesses around it instead of doing
a full memory barrier, per the comment in the win32 variant.
Core/AudioCommon: Fix a use of uninitialized variable inside libalsa.
Microbenchmarking results for ToMask variants (1 000 000 000 iterations):
cpu \ variant             | shifts | bit scan
intel mobile C2D@2.5GHz   | 5.5s   | 4.0s
amd athlon64x2@3GHz       | 6.1s   | 6.4s
(including some constant overhead identical to both variants)
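For readers wondering what the two columns compare, here is a sketch of how a "shifts" and a "bit scan" mask builder could look. This is my reading of the column labels and not the benchmarked code; ToMaskShifts and ToMaskBitScan are hypothetical names. Both turn a 16-bit value into a mask that covers every bit up to and including its highest set bit.

```cpp
#include <cstdint>

// "Shifts" variant: smear the highest set bit into every lower position.
inline uint16_t ToMaskShifts(uint16_t a)
{
    a = static_cast<uint16_t>(a | (a >> 8));
    a = static_cast<uint16_t>(a | (a >> 4));
    a = static_cast<uint16_t>(a | (a >> 2));
    return static_cast<uint16_t>(a | (a >> 1));
}

// "Bit scan" variant: locate the highest set bit, then build the mask from it.
inline uint16_t ToMaskBitScan(uint16_t a)
{
    if (a == 0)
        return 0; // a bit scan is undefined for zero, so handle it up front
#if defined(__GNUC__)
    // __builtin_clz counts leading zeros of an unsigned int (assumed 32-bit).
    const int highest = 31 - __builtin_clz(static_cast<unsigned>(a));
#else
    // Portable fallback: count how many times the value can be halved.
    int highest = 0;
    unsigned v = a;
    while (v >>= 1)
        ++highest;
#endif
    // highest is in 0..15, so (2 << highest) - 1 fits in 16 bits.
    return static_cast<uint16_t>((2u << highest) - 1u);
}
```

Which one wins depends on how fast the CPU's bit-scan instruction is, which fits the table: the two machines disagree on the faster variant.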
r6668 x86 – download, mirror
r6668 x64 – download, mirror
little fix for my last commit
Oops! Little fix for my prior commit.
Second Experimental commit:
Corrected peek color and peek Z to correctly emulate real hardware formats.
Implemented native gamma correction. (I don’t own any game that uses this functionality, so I would appreciate feedback.)
I need a lot of feedback on these changes, please.
First Experimental Commit:
Made some changes to the Clear code. Please test a lot; the point of this commit is to determine the correct behavior of EFB clearing, so feedback is welcome.
More FIFO work: a HACK solution for extreme overflow on breakpoints.
1) What is the FIFO? The FIFO is a ring queue into which the CPU writes graphics commands and from which the GPU reads them.
2) What is a breakpoint? A breakpoint is a mark in the FIFO that allows the CPU and GPU to work in parallel. When the GPU reaches the breakpoint, it must stop reading immediately until the CPU removes the breakpoint.
3) What is an overflow? The CPU fills all the available room in the FIFO and, because it is a ring, overwrites commands that have not been processed yet.
4) Why do overflows happen? In theory they never should, because the FIFO has another mark (the high watermark). When the CPU write pointer reaches this mark, a CP interrupt is raised and the CPU should stop writing to the FIFO until the distance between the read pointer and the write pointer drops to another mark (the low watermark), which prevents an overflow.
5) So if it is impossible, why do overflows happen? Simple: the CP interrupt is processed late and the overflow happens in the meantime. (There are a lot of theories about this.)
6) Why not simply process the pending graphics commands whenever the CPU write pointer gets near the end of the FIFO?
Because when this happens we are sometimes at a BREAKPOINT and it is IMPOSSIBLE to process the graphics commands.
This HACK processes the pending data when the CPU write pointer is 32 bytes before the end of the FIFO, and if there is a breakpoint it forces the situation so that the commands are processed and an overflow is prevented.
In theory you should not see «FIFO is overflown by GatherPipe nCPU thread is too fast!» anymore. But if you get a hang in a game where you used to see this, please read the NOTICE log in user\logs. I’ve added the message «FIFO is almost in overflown, BreakPoint», which is logged when the hack is activated. (I will delete this message very soon.)
Good luck!! PS: Shuffle, sorry for the large commit description :P
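The FIFO description above can be sketched as a toy model. This is only an illustration under simplifying assumptions (byte-granular reads, no interrupts, no real command processing) and not Dolphin's implementation; FifoModel and all of its members are hypothetical names.

```cpp
#include <cassert>
#include <cstdint>

// Toy model of the ring FIFO: the CPU writes, the GPU reads.
struct FifoModel
{
    uint32_t size = 0;         // capacity in bytes
    uint32_t read = 0;         // GPU read offset
    uint32_t write = 0;        // CPU write offset
    uint32_t pending = 0;      // bytes written but not yet read
    uint32_t hiWatermark = 0;  // CPU should stop writing at or above this
    uint32_t loWatermark = 0;  // CPU may resume writing at or below this
    uint32_t breakpoint = 0;   // GPU must stop when read == breakpoint
    bool breakpointEnabled = false;
    bool overflowed = false;

    bool AboveHighWatermark() const { return pending >= hiWatermark; }
    bool BelowLowWatermark() const { return pending <= loWatermark; }

    // CPU side. Returns false (and records an overflow) when the write
    // would wrap around onto commands the GPU has not read yet.
    bool CpuWrite(uint32_t bytes)
    {
        assert(size > 0);
        if (pending + bytes > size)
        {
            overflowed = true;
            return false;
        }
        write = (write + bytes) % size;
        pending += bytes;
        return true;
    }

    // GPU side. Consumes up to `bytes`, but never reads past an enabled
    // breakpoint. Returns the number of bytes actually consumed.
    uint32_t GpuRead(uint32_t bytes)
    {
        assert(size > 0);
        uint32_t consumed = 0;
        while (consumed < bytes && pending > 0)
        {
            if (breakpointEnabled && read == breakpoint)
                break; // stalled until the CPU clears the breakpoint
            read = (read + 1) % size;
            --pending;
            ++consumed;
        }
        return consumed;
    }
};
```

In this model the overflow from question 5 shows up when CpuWrite keeps being called after AboveHighWatermark() turns true while GpuRead is stalled on a breakpoint.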
r6666 x86 – download, mirror
r6666 x64 – download, mirror