Commit Graph

12154 Commits

Author SHA1 Message Date
S.N. Hemanth Meenakshisundaram
9dd9d67bd0 Define static functions fill_image_linesize() and
fill_image_data_ptr(). ff_fill_linesize() and ff_fill_pointer() now wrap
these functions.

The new functions are more generic, and are going to be exported in a
future patch.

Patch by S.N. Hemanth Meenakshisundaram smeenaks # ucsd § edu.

Originally committed as revision 24512 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 14:30:47 +00:00
Ronald S. Bultje
48adb7e7a4 Enable no-loop memory/register saving for ssse3/sse4 also.
Originally committed as revision 24511 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 14:07:57 +00:00
Ronald S. Bultje
2a180c69ea Save a register (or regsize of stackspace for x86-32) for the no-loop
mbedge loopfilter functions, by re-using space that holds a variable
that we no longer need.

Originally committed as revision 24510 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 14:00:15 +00:00
Ronald S. Bultje
bcd4aa6498 Use nested ifs instead of &&, which appears to not work with %ifidn (i.e. this
construct was always enabled, even for <ssse3 versions).

Originally committed as revision 24509 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 13:56:51 +00:00
Axel Holzinger
0b2c75cb68 Rename pow variable to pwr.
Patch by Axel Holzinger <aholzinger gmx de>.

Originally committed as revision 24508 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 13:52:49 +00:00
Ronald S. Bultje
2208053bd3 Split pextrw macro-spaghetti into several opt-specific macros, this will make
future new optimizations (imagine a sse5) much easier. Also fix a bug where
we used the direction (%2) rather than optimization (%1) to enable this, which
means it wasn't ever actually used...

Originally committed as revision 24507 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 13:50:59 +00:00
Jason Garrett-Glaser
fca05ea8a0 VP8: add missing free
Fixes a tiny memory leak.

Originally committed as revision 24504 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-26 07:10:30 +00:00
Carl Eugen Hoyos
28e241de5d Fix r24445: Instead of needlessly initialising a variable, silence the warning.
Originally committed as revision 24498 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-25 14:49:45 +00:00
Carl Eugen Hoyos
b9542223a3 Only 4-bit ADPCM IMA WAV files are supported.
Originally committed as revision 24493 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-25 14:33:16 +00:00
Ronald S. Bultje
6de5b7c6b8 Fix obvious bug in assignment. Somehow, the test vectors don't test this...
Originally committed as revision 24489 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-25 02:42:40 +00:00
Aurelien Jacobs
ba2c508d0c add SubRip muxer and demuxer
Originally committed as revision 24488 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-24 22:50:12 +00:00
Ronald S. Bultje
e3f7bf774c Fix SPLATB_REG mess. Used to be a if/elseif/elseif/elseif spaghetti, so this
splits it into small optimization-specific macros which are selected for each
DSP function. The advantage of this approach is that the sse4 functions now
use the ssse3 codepath also without needing an explicit sse4 codepath.

Originally committed as revision 24487 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-24 19:33:05 +00:00
Alex Converse
63e1278d88 aacenc: TLS: Save maximum values for each swb in a table.
This gives an almost 20% speedup.

Originally committed as revision 24484 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-24 17:11:51 +00:00
Alex Converse
031d5cea04 10l: Remove some commented out code that slipped in.
Originally committed as revision 24483 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-24 17:01:50 +00:00
Alex Converse
fe461767e6 aacenc: TLS: Try to preserve some energy in each non-zero band.
Reduce scalefactors in non-zero bands that underflow by twice as much as those
in bands that just fail to hit psy targets.

Originally committed as revision 24482 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-24 16:36:01 +00:00
Reimar Döffinger
edac49daf5 Use "const" qualifier for pointers that point to input data of
audio encoders.
This is purely for clarity/documentation purposes.

Originally committed as revision 24481 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-24 13:59:49 +00:00
Alex Converse
c226fc5bfb aacenc: Prevent premature termination of the two loop search.
Originally committed as revision 24476 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-24 04:23:26 +00:00
Alex Converse
81824fe059 aacdec: Only load and write each predictor variable once.
This is slightly faster and opens the door for further optimization.

Originally committed as revision 24475 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-24 02:57:08 +00:00
Alex Converse
70c99adb48 aacdec: 4% faster main profile decoding.
Originally committed as revision 24474 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-24 02:41:47 +00:00
Alex Converse
51ffd3a62f aacenc: Favor log2f() and sqrtf() over log2() and sqrt().
Originally committed as revision 24473 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-24 02:10:59 +00:00
Alex Converse
04d72abf17 aacenc: Factorize some scalefactor utilities.
Originally committed as revision 24472 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 22:37:42 +00:00
Eli Friedman
3611e7a309 Inline asm for VP56 arith coder
This is a lot more reliable to get cmov rather than trying to trick gcc into
generating it, useful since it's 2% faster overall.

Patch by Eli Friedman <eli.friedman at gmail>

Originally committed as revision 24471 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 21:46:30 +00:00
David Conrad
ca18a478e3 VP8: Inline traversing vp8_small_mvtree
Much faster read_mv_component, slightly faster overall

Originally committed as revision 24470 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 21:46:25 +00:00
David Conrad
7697cdcf95 VP8: Use vp56_rac_get_prob_branchy when the bit is only used by an if()
Originally committed as revision 24469 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 21:46:20 +00:00
David Conrad
fe1b5d974a Decode DCT tokens by branching to a different code path for each branch
on the huffman tree, instead of traversing the tree in a while loop.

Based on the similar optimization in libvpx's detokenize.c

10% faster at normal bitrates, and 30% faster for high-bitrate intra-only

Originally committed as revision 24468 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 21:46:17 +00:00
David Conrad
5474ec2ac8 Move renormalization of the VP56 arith decoder to before decoding a bit
No difference at the moment, but allows a future branchy variant
of vp56_rac_get_prob to be significantly faster

Originally committed as revision 24467 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 21:46:14 +00:00
David Conrad
b3d755ec8b Split renorm of vp56 arith decoder to its own function
Originally committed as revision 24466 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 21:46:08 +00:00
David Conrad
24675b8093 vp56's arith decoder's code_word is only 16 bits, no need for unsigned long
Originally committed as revision 24465 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 21:46:01 +00:00
Jason Garrett-Glaser
13a1304bb3 Add myself to VP8 copyright and maintainers.
Also add Ronald to maintainers.

Originally committed as revision 24464 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 21:42:35 +00:00
Jason Garrett-Glaser
414ac27d8f VP8: always_inline some things to force gcc to do the right thing
Mostly seems to help in the MC code, which gets a hundred cycles faster.

Originally committed as revision 24463 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 21:36:21 +00:00
Jason Garrett-Glaser
06d50ca804 VP8: use AV_RL24 instead of defining a new RL24.
Originally committed as revision 24462 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 21:17:18 +00:00
Jason Garrett-Glaser
9fddd14a8e VP8: Slightly faster MV selection
Don't clamp best mv unless it's actually used.

Originally committed as revision 24461 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 19:06:22 +00:00
Jason Garrett-Glaser
14767f35ed VP8: use AV_ZERO32 instead of AV_WN32A where relevant
Originally committed as revision 24460 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 10:42:19 +00:00
Jason Garrett-Glaser
09959ec46e VP8: eliminate redundant code in r24458
Originally committed as revision 24459 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 10:34:21 +00:00
Jason Garrett-Glaser
a71abb714e VP8: shave a few clocks off check_intra_pred_mode
Originally committed as revision 24458 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 10:24:38 +00:00
Jason Garrett-Glaser
0087aa47d0 VP8: fix broken sign bias code in MV pred
Apparently the official conformance test vectors don't test this feature,
even though libvpx uses it.

Originally committed as revision 24456 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 06:41:35 +00:00
Jason Garrett-Glaser
3ae079a3c8 VP8: optimize DC-only chroma case in the same way as luma.
Add MMX idct_dc_add4uv function for this case.
~40% faster chroma idct.

Originally committed as revision 24455 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 06:02:52 +00:00
Jason Garrett-Glaser
3df56f4118 VP8: Clean up some variable shadowing.
Originally committed as revision 24454 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 03:44:37 +00:00
Jason Garrett-Glaser
51c9156438 VP8 asm: cosmetics (spacing)
Originally committed as revision 24453 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 03:02:56 +00:00
Jason Garrett-Glaser
8a467b2d44 VP8: 30% faster idct_mb
Take shortcuts based on statistically common situations.
Add 4-at-a-time idct_dc function (mmx and sse2) since rows of 4 DC-only DCT
blocks are common.
TODO: tie this more directly into the MB mode, since the DC-level transform is
only used for non-splitmv blocks?

Originally committed as revision 24452 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 02:58:27 +00:00
Jason Garrett-Glaser
ef38842f0b VP8: smarter prefetching
Don't prefetch reference frames that were used less than 1/32th of the time so
far in the frame.
This helps speed up to ~2% on videos that, in many frames, make near-zero
(but not entirely zero) use of golden and/or alt-refs.
This is a very common property of videos encoded by libvpx.

Originally committed as revision 24451 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 01:59:56 +00:00
Baptiste Coudurier
9479415e4e In h264 parser, return immediately if buf_size is 0, avoid printing
erroneous message for last frame.

Originally committed as revision 24450 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 00:34:09 +00:00
Jason Garrett-Glaser
c25c776708 VP8: clear DCT blocks in iDCT instead of using clear_blocks.
~0.3% faster overall.

Originally committed as revision 24448 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 00:07:16 +00:00
Jason Garrett-Glaser
b74f70d646 VP8: avoid a memset for non-i4x4 blocks with no coefficients
Originally committed as revision 24447 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-23 00:05:44 +00:00
Jason Garrett-Glaser
145d31865d Get rid of more unnecessary dereferences in VP8 deblocking
Originally committed as revision 24446 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 23:11:40 +00:00
Jason Garrett-Glaser
867215336d Shut up an uninitialized variable GCC warning in VP8.
Originally committed as revision 24445 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 23:04:51 +00:00
Jason Garrett-Glaser
c4211046d2 Smarter VP8 prefetching
Prefetch all refs (including altref), but only if they've been used so far this
frame.
~2.5% faster overall.

TODO: Do something even smarter, like using how often each ref has been used
so far, so that a couple blocks of a rarely-used ref don't force us to prefetch
it.

Originally committed as revision 24444 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 23:03:08 +00:00
Jason Garrett-Glaser
8cfae560ad Fix stupid bug in VP8 prefetching code
Originally committed as revision 24443 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 22:15:43 +00:00
Jason Garrett-Glaser
2a38c2e99a Eliminate a LUT in escape decoding in VP8 decode_block_coeffs
Originally committed as revision 24441 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 22:08:09 +00:00
Jason Garrett-Glaser
d292c3455e Eliminate some repeated dereferences in VP8 inter_predict
Originally committed as revision 24438 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-07-22 21:05:30 +00:00