Refactor all the SSSE3 functions in TextureDecoder so that the cpu_info check is done once, outside the decode loops, instead of on every iteration. Most textures now decode dramatically faster; the per-iteration check had previously made them slower.
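A minimal sketch of the hoisting idea, in C. `cpu_has_ssse3`, `decode_block_generic`, `decode_block_ssse3` and `decode_texture` are hypothetical names standing in for the real cpu_info flag and TextureDecoder functions. The SSSE3 path is only a placeholder that produces the same output as the generic one. The feature flag is tested once per texture, and each branch runs its own tight loop.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical stand-in for a CPU feature flag such as cpu_info's SSSE3 bit. */
static bool cpu_has_ssse3 = false;

/* Hypothetical per-block decoder: widens 4 intensity bytes to 4 gray RGBA texels. */
static void decode_block_generic(uint32_t *dst, const uint8_t *src) {
    for (int i = 0; i < 4; i++)
        dst[i] = 0xFF000000u | (uint32_t)src[i] * 0x010101u;
}

/* Placeholder for a SIMD path (a real one would use SSSE3 shuffles); same output. */
static void decode_block_ssse3(uint32_t *dst, const uint8_t *src) {
    decode_block_generic(dst, src);
}

/* The feature check runs once per texture, not once per block. */
static void decode_texture(uint32_t *dst, const uint8_t *src, size_t num_blocks) {
    if (cpu_has_ssse3) {
        for (size_t b = 0; b < num_blocks; b++)
            decode_block_ssse3(dst + b * 4, src + b * 4);
    } else {
        for (size_t b = 0; b < num_blocks; b++)
            decode_block_generic(dst + b * 4, src + b * 4);
    }
}
```

Duplicating the loop costs a little code size. In return, the hot loop has no feature-test branch and no indirect call, so the compiler can inline and schedule each path on its own.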
New SSSE3 implementation of the I4 texture decoder. It is 14% faster than the previous SSE4 implementation, which has been scrapped.
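A scalar sketch of the per-texel work an I4 decoder does. This is not the SSSE3 code from the commit. Each byte packs two 4-bit intensity texels, high nibble first. Each nibble is widened to 8 bits by replicating it, so `n * 0x11` maps 0x0 to 0x00 and 0xF to 0xFF. `decode_i4_row` is a hypothetical name. The sketch outputs one 8-bit intensity per texel and ignores block tiling and RGBA expansion.

```c
#include <stddef.h>
#include <stdint.h>

/* Expand packed I4 texels (two per byte, high nibble first) to 8-bit intensity.
   Hypothetical helper; tiling and RGBA output are left out. */
static void decode_i4_row(uint8_t *dst, const uint8_t *src, size_t num_bytes) {
    for (size_t i = 0; i < num_bytes; i++) {
        uint8_t hi = src[i] >> 4;
        uint8_t lo = src[i] & 0x0F;
        dst[2 * i]     = (uint8_t)(hi * 0x11); /* replicate nibble: 0xA -> 0xAA */
        dst[2 * i + 1] = (uint8_t)(lo * 0x11);
    }
}
```

A SIMD version can do the same split and replication on 16 bytes at a time. Exactly how the commit does this (for example with a byte shuffle) is not shown here.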
Add profiling (via oprofile) to the cmake build.
Follow-up to my last commit: fix the build on Linux; use SSSE3 instead of SSE3.
Remove some unused variables from the SSE2 CMPR decoder.
Fix an error introduced in my last commit.
New SSSE3 implementation of RGB5A3. It takes about 40% fewer cycles than the plain C version and 17% fewer than the SSE2 version.
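For reference, here is a scalar C sketch of what an RGB5A3 decoder computes for one texel. It is not the SSSE3 code from the commit. The 16-bit value is big-endian. If bit 15 is set, the texel is opaque RGB555. Otherwise it holds a 3-bit alpha followed by RGB444. Channels are widened to 8 bits by bit replication. `decode_rgb5a3` and the `expand*` helpers are hypothetical names. The output packing (R in the low byte, A in the high byte) is an assumption.

```c
#include <stdint.h>

/* Expand an n-bit channel to 8 bits by bit replication. */
static uint8_t expand5(uint32_t v) { return (uint8_t)((v << 3) | (v >> 2)); }
static uint8_t expand4(uint32_t v) { return (uint8_t)((v << 4) | v); }
static uint8_t expand3(uint32_t v) { return (uint8_t)((v << 5) | (v << 2) | (v >> 1)); }

/* Decode one big-endian RGB5A3 texel; output packed as R | G<<8 | B<<16 | A<<24 (assumed). */
static uint32_t decode_rgb5a3(const uint8_t *src) {
    uint32_t v = ((uint32_t)src[0] << 8) | src[1];
    uint32_t r, g, b, a;
    if (v & 0x8000) {
        /* Opaque: flag bit, then 5:5:5 RGB. */
        r = expand5((v >> 10) & 0x1F);
        g = expand5((v >> 5) & 0x1F);
        b = expand5(v & 0x1F);
        a = 0xFF;
    } else {
        /* Translucent: 3-bit alpha, then 4:4:4 RGB. */
        a = expand3((v >> 12) & 0x7);
        r = expand4((v >> 8) & 0xF);
        g = expand4((v >> 4) & 0xF);
        b = expand4(v & 0xF);
    }
    return r | (g << 8) | (b << 16) | (a << 24);
}
```

Because of the per-texel branch on bit 15, a SIMD version typically computes both decodings for a group of texels and then blends them with a mask. That pattern is given here as a general technique, not as a description of this commit's code.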