Ronald S. Bultje
e3f7bf774c
Fix SPLATB_REG mess. Used to be a if/elseif/elseif/elseif spaghetti, so this
...
splits it into small optimization-specific macros which are selected for each
DSP function. The advantage of this approach is that the sse4 functions now
use the ssse3 codepath also without needing an explicit sse4 codepath.
Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-24 19:33:05 +00:00
Jason Garrett-Glaser
3ae079a3c8
VP8: optimize DC-only chroma case in the same way as luma.
...
Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.
Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 06:02:52 +00:00
Jason Garrett-Glaser
51c9156438
VP8 asm: cosmetics (spacing)
...
Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 03:02:56 +00:00
Jason Garrett-Glaser
8a467b2d44
VP8: 30% faster idct_mb
...
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?
Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 02:58:27 +00:00
Jason Garrett-Glaser
c25c776708
VP8: clear DCT blocks in iDCT instead of using clear_blocks.
...
~0.3% faster overall.
Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 00:07:16 +00:00
Ronald S. Bultje
dc5eec8085
Use pextrw for SSE4 mbedge filter result writing, speedup 5-10cycles on
...
CPUs supporting it.
Originally committed as revision 24437 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 19:59:34 +00:00
Ronald S. Bultje
003243c3c2
Fix and enable horizontal >=SSE2 mbedge loopfilter.
...
Originally committed as revision 24409 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 01:35:26 +00:00
Jason Garrett-Glaser
8731dbd890
Eliminate one instruction in VP8 dc_add_sse4
...
Originally committed as revision 24405 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 22:41:37 +00:00
Jason Garrett-Glaser
7dd224a42d
Various VP8 x86 deblocking speedups
...
SSSE3 versions, improve SSE2 versions a bit.
SSE2/SSSE3 mbedge h functions are currently broken, so explicitly disable them.
Originally committed as revision 24403 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 22:11:03 +00:00
Jason Garrett-Glaser
b8b231b5dc
Make mmx VP8 WHT faster
...
Avoid pextrw, since it's slow on many older CPUs.
Now it doesn't require mmxext either.
Originally committed as revision 24397 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-21 20:51:01 +00:00
Ronald S. Bultje
e9e456d850
VP8 MBedge loopfilter MMX/MMX2/SSE2 functions for both luma (width=16)
...
and chroma (width=8).
Originally committed as revision 24378 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-20 22:58:56 +00:00
Ronald S. Bultje
268821e76e
Chroma (width=8) inner loopfilter MMX/MMX2/SSE2 for VP8 decoder.
...
Originally committed as revision 24377 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-20 22:04:18 +00:00
Ronald S. Bultje
c60ed66dbe
Revert r24339 (it causes fate failures on x86-64) - I'll figure out what's
...
wrong with it tomorrow or so, then re-submit.
Originally committed as revision 24341 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 23:57:09 +00:00
Ronald S. Bultje
1878f685c0
Implement chroma (width=8) inner loopfilter MMX/MMX2/SSE2 functions.
...
Originally committed as revision 24339 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 21:53:28 +00:00
Ronald S. Bultje
fb9bdf048c
Be more efficient with registers or stack memory. Saves 8/16 bytes stack
...
for x86-32, or 2 MM registers on x86-64.
Originally committed as revision 24338 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 21:45:36 +00:00
Ronald S. Bultje
3facfc99da
Change function prototypes for width=8 inner and mbedge loopfilter functions
...
so that it does both U and V planes at the same time. This will have speed
advantages when using SSE2 (or higher) optimizations, since we can do both
the U and V rows together in a single xmm register.
This also renames filter16 to filter16y and filter8 to filter8uv so that it's
more obvious what each function is used for.
Originally committed as revision 24337 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-19 21:18:04 +00:00
Ronald S. Bultje
819b2dd2b1
Attempt to fix x86-64 testsuite on fate.
...
Originally committed as revision 24275 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 21:35:30 +00:00
Ronald S. Bultje
6f323f1251
Remove duplicate define.
...
Originally committed as revision 24272 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 19:54:47 +00:00
Ronald S. Bultje
889b2c26ee
Revert 24270, it contained some stuff that shouldn't have been in there.
...
Originally committed as revision 24271 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 19:54:25 +00:00
Ronald S. Bultje
2356a7834b
Remove duplicate define.
...
Originally committed as revision 24270 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 19:42:32 +00:00
Ronald S. Bultje
ede1b9665a
Give x86 r%d registers names, this will simplify implementation of the chroma
...
inner loopfilter, and it also allows us to save one register on x86-64/sse2.
Originally committed as revision 24269 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 19:38:10 +00:00
Ronald S. Bultje
526e831a46
Change return statement, the REP_RET is a mistake since the else case (x86-64,
...
sse2) doesn't actually loop, so REP_RET isn't necessary.
Originally committed as revision 24268 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-16 18:29:14 +00:00
Ronald S. Bultje
a711eb4829
VP8 H/V inner loopfilter MMX/MMXEXT/SSE2 optimizations.
...
Originally committed as revision 24250 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-15 23:02:34 +00:00
Ronald S. Bultje
f2a30bd840
Simple H/V loopfilter for VP8 in MMX, MMX2 and SSE2 (yay for yasm macros).
...
Originally committed as revision 24029 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-03 19:26:30 +00:00
Jason Garrett-Glaser
b06855f18a
SSSE3 versions of vp8 width4 bilinear MC functions
...
Originally committed as revision 24013 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-03 00:48:12 +00:00
Jason Garrett-Glaser
dcc602d802
SSSE3 versions of width4 VP8 6-tap MC functions
...
Also make some small changes to saturation order of 4-tap SSSE3 MC to fix a
non-bitexactness bug.
Patch mostly by Eli Friedman <eli.friedman AT gmail DOT com>.
Originally committed as revision 23965 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-02 05:27:41 +00:00
Jason Garrett-Glaser
82a8d0f114
Use add instead of lshift in mmxext vp8 idct
...
Originally committed as revision 23891 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 17:23:17 +00:00
Ronald S. Bultje
565344e7e4
Remove unused macros (duplicates from the now-LGPL x86util.asm).
...
Originally committed as revision 23890 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 17:04:29 +00:00
Ronald S. Bultje
2dd2f71692
MMX idct_add for VP8.
...
Originally committed as revision 23886 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 14:43:11 +00:00
Jason Garrett-Glaser
004cda8e79
Add mmxext version of VP8 DC Hadamard transform
...
Originally committed as revision 23878 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-29 01:41:59 +00:00
Jason Garrett-Glaser
a912da761d
Fix VP8 bilinear mc on x86_64
...
Originally committed as revision 23872 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 22:13:14 +00:00
Jason Garrett-Glaser
0fecad09fe
Add x86 asm functions for VP8 put_pixels
...
Originally committed as revision 23858 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 19:14:40 +00:00
Jason Garrett-Glaser
a173aa8940
Add MMX, SSE2, SSSE3 asm for VP8 bilinear MC
...
Originally committed as revision 23857 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-28 18:56:24 +00:00
Jason Garrett-Glaser
0178d14fe5
First shot at VP8 optimizations:
...
- MMXEXT, SSE2 and SSSE3 MC functions
- MMX and SSE4 IDCT dc_add functions
Patch by Jason Garrett-Glaser <darkshikari gmail com> and myself.
Originally committed as revision 23815 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-06-27 02:01:45 +00:00