140 Commits

Author SHA1 Message Date
Jason Garrett-Glaser
19fb234e4a H.264: split luma dc idct out and implement MMX/SSE2 versions
About 2.5x the speed.

NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.

Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
2011-01-14 21:34:25 +00:00
Diego Biurrun
ba87f0801d Remove explicit filename from Doxygen @file commands.
Passing an explicit filename to this command is only necessary if the
documentation in the @file block refers to a file different from the
one the block resides in.

Originally committed as revision 22921 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-04-20 14:45:34 +00:00
Benoit Fouet
32e543f866 Replace @returns by @return.
Originally committed as revision 22729 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-30 15:50:57 +00:00
Alexander Strange
767738f7a3 h264: Use + instead of | in some places
6 insns less on x86-64/gcc 4.2.

Originally committed as revision 22692 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-26 05:04:03 +00:00
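
The "+ instead of |" change above relies on a general identity: when two operands have no set bits in common, adding them produces no carries, so the sum equals the bitwise OR. The diff itself is not shown in this log, so the following is only a minimal sketch of that identity; pack_or() and pack_add() are hypothetical helpers, not FFmpeg functions. A plausible reason for the saved instructions is that compilers can fold additions into address arithmetic on x86, but that is an assumption, not something the commit message states.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helpers (not from FFmpeg): pack two 4-bit fields into a byte. */
static uint32_t pack_or(uint32_t hi, uint32_t lo)
{
    return (hi << 4) | lo;
}

static uint32_t pack_add(uint32_t hi, uint32_t lo)
{
    return (hi << 4) + lo;
}

/* The two forms agree as long as lo fits in the 4 bits left free by the shift. */
static void check_pack_equivalence(void)
{
    for (uint32_t hi = 0; hi < 16; hi++)
        for (uint32_t lo = 0; lo < 16; lo++)
            assert(pack_or(hi, lo) == pack_add(hi, lo));
}
```

The substitution is only valid while the bit ranges stay disjoint: pack_or(1, 0x1F) and pack_add(1, 0x1F) differ, because 0x1F overlaps the shifted field.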
Alexander Strange
601ca8c55c h264: Remove unused function argument
Originally committed as revision 22690 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-26 03:31:56 +00:00
Alexander Strange
f7ba470d58 h264: Simplify decode_cabac_residual() specialization
Gives more consistent inlining with some compilers (such as llvm).

Originally committed as revision 22689 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-03-26 03:29:31 +00:00
Michael Niedermayer
8897b247a5 Remove some unneeded fill_rectangle() for 16x16 blocks.
Originally committed as revision 22124 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-28 23:54:24 +00:00
Zhou Zongyi
821fe7f3e6 Optimize (amvd>2)+(amvd>32), about 1 cpu cycle faster.
patch by Zhou Zongyi @ zhouzy () os punkt pku dot edu speck cn

Originally committed as revision 22084 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-26 22:45:35 +00:00
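
The log does not show how (amvd>2)+(amvd>32) was optimized. One common branch-free way to evaluate such a pair of threshold tests is to extract the sign bit of amvd-3 and amvd-33; the sketch below illustrates that technique only and is not claimed to be the committed code. The function names are hypothetical, and the code assumes amvd is non-negative and that int has no padding bits.

```c
#include <assert.h>
#include <limits.h>

/* Reference form, as written in the commit message. */
static int mvd_ctx_compare(int amvd)
{
    return (amvd > 2) + (amvd > 32);
}

/* Sign-bit form: start from 2 and subtract 1 for each threshold not reached. */
static int mvd_ctx_signbit(int amvd)
{
    const int top = (int)(sizeof(int) * CHAR_BIT) - 1;
    /* Converting to unsigned keeps the shift well-defined for negative values:
     * the result is 1 when the difference is negative, otherwise 0. */
    unsigned below3  = (unsigned)(amvd - 3)  >> top;
    unsigned below33 = (unsigned)(amvd - 33) >> top;
    return 2 - (int)below3 - (int)below33;
}

/* Compare both forms over a range of non-negative inputs. */
static void check_signbit_equivalence(void)
{
    for (int amvd = 0; amvd <= 4096; amvd++)
        assert(mvd_ctx_compare(amvd) == mvd_ctx_signbit(amvd));
}
```

Whether this beats the comparison form depends on the compiler and target, which is consistent with the log switching between variants based on measured cycle counts.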
Michael Niedermayer
b5bd070029 Change mvd_cache & mvd_table to 8bit, this is overall a bit faster
for high resolution videos.
about 20 cycles faster per MB for cathedral.

Originally committed as revision 22038 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-24 20:43:06 +00:00
Michael Niedermayer
81b5e4ee92 Calculate mvd without abs()
same speed (ask gcc why, I don't know)

Originally committed as revision 22035 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-24 18:50:02 +00:00
Michael Niedermayer
855a1ba5e8 Switch back to (amvd>2)+(amvd>32), it's 5 cpu cycles faster now.
Originally committed as revision 22032 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-24 18:16:48 +00:00
Michael Niedermayer
01b35be14a Factorize common code from the top of decode_cabac_mb_mvd()
10-15 cpu cycles faster.

Originally committed as revision 22029 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-24 18:06:02 +00:00
Michael Niedermayer
6d0155c79c Replace (mvd>2) + (mvd>32) by MIN((mvd+28)*17>>9, 2)
same speed as far as I can measure but it might have fewer branches on some
archs.
Idea from x264 / jason

Originally committed as revision 22027 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-24 16:16:08 +00:00
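
The replacement MIN((mvd+28)*17>>9, 2) works because (mvd+28)*17 first reaches 512 at mvd=3 (31*17 = 527) and first reaches 1024 at mvd=33 (61*17 = 1037), so the shifted value steps from 0 to 1 to 2 at exactly the two thresholds, and MIN caps anything larger at 2. The sketch below checks this for non-negative inputs; the function names are hypothetical, and the range tested is an assumption about typical mvd magnitudes rather than a bound taken from the source.

```c
#include <assert.h>

static int min_int(int a, int b)
{
    return a < b ? a : b;
}

/* Comparison form: 0 for mvd <= 2, 1 for 3..32, 2 for mvd >= 33. */
static int mvd_ctx_reference(int mvd)
{
    return (mvd > 2) + (mvd > 32);
}

/* Multiply-and-shift form from the commit message. */
static int mvd_ctx_scaled(int mvd)
{
    return min_int(((mvd + 28) * 17) >> 9, 2);
}

/* Both forms agree for every non-negative mvd in the tested range. */
static void check_scaled_equivalence(void)
{
    for (int mvd = 0; mvd <= 4096; mvd++)
        assert(mvd_ctx_reference(mvd) == mvd_ctx_scaled(mvd));
}
```

The identity only holds for mvd >= 0, so the absolute value must already have been taken before the formula is applied.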
Michael Niedermayer
90332debfe Replace ad-hoc fill rectangle by fill_rectangle().
Originally committed as revision 22025 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-24 13:12:09 +00:00
Michael Niedermayer
f4ce853125 Get rid of an if(), 1 cpu cycle faster.
Originally committed as revision 21889 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-19 03:10:26 +00:00
Michael Niedermayer
e69bfde6b2 Get rid of a local variable, 10 cpu cycles faster.
Originally committed as revision 21888 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-19 02:37:11 +00:00
Michael Niedermayer
a305449df6 Move abs() from decode_cabac_mb_mvd() to the code that writes mvd_cache.
4-8 cycles faster

Originally committed as revision 21887 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-18 23:37:48 +00:00
Michael Niedermayer
90a5849efd Speedup decode_cabac_field_decoding_flag() by 9 cpu cycles.
Originally committed as revision 21875 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-18 12:13:21 +00:00
Michael Niedermayer
69cc31832f Move check for and call of predict_field_decoding_flag() from the mb code to
the row code. This function would only be needed on a MB basis for MBAFF+FMO

Originally committed as revision 21860 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-17 02:14:02 +00:00
Michael Niedermayer
59f733d1b1 2x faster ff_h264_init_cabac_states(), 4k cpu cycles less.
Sadly this is just per slice so the speedup with normal files should be negligible.

Originally committed as revision 21859 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-16 23:43:08 +00:00
Michael Niedermayer
37a9719a97 2 cpu cycles faster context calculation for decode_cabac_intra_mb_type()
Originally committed as revision 21845 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-16 02:51:37 +00:00
Michael Niedermayer
5806e8cd1f Drop a few redundant slice_num checks.
Originally committed as revision 21844 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-16 00:09:30 +00:00
Michael Niedermayer
053074276b Drop compute_mb_neighbors() and move fill_decode_neighbors() up to take its
role.
Should be faster as this is a strict code removal.

Originally committed as revision 21843 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-15 23:04:07 +00:00
Michael Niedermayer
c1bb66ac19 Split setting neighboring MBs from fill_decode_caches()
no speed change.

Originally committed as revision 21842 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-15 22:07:02 +00:00
Michael Niedermayer
cf55f59d5e Simplify decode_cabac_mb_intra4x4_pred_mode().
same speed

Originally committed as revision 21839 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-15 19:22:09 +00:00
Michael Niedermayer
f4060611e9 Merge decode_cabac_mb_type_b() into calling code.
This avoids a conditional branch and is about 3 cpu cycles faster.

Originally committed as revision 21838 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-15 19:20:49 +00:00
Michael Niedermayer
64dd1b0a1d Merge the single line function decode_cabac_mb_transform_size()
into the calling code.
8 cpu cycles faster

Originally committed as revision 21828 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-15 01:04:07 +00:00
Michael Niedermayer
8b38d10761 indent
Originally committed as revision 21827 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-14 23:10:02 +00:00
Michael Niedermayer
f4b8b82514 Merge decode_cabac_mb_dqp() with surrounding code.
~20 cpu cycles faster

Originally committed as revision 21826 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-14 23:06:25 +00:00
Michael Niedermayer
a59b9ee33d Set sub_mb_type in direct_cache instead of just the direct flag.
Simpler, cleaner and faster.

Originally committed as revision 21822 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-14 16:51:31 +00:00
Michael Niedermayer
2dc380ca8e Store sub_mb_type in direct_cache/direct_table.
This is equal complexity but could be more useful.

Originally committed as revision 21821 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-14 14:41:27 +00:00
Michael Niedermayer
3d2c3ef4b4 Remove slice_table checks from decode_cabac_mb_cbp_luma() and set left/top_cbp so
these checks aren't needed.

Originally committed as revision 21819 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-02-14 02:08:48 +00:00
Michael Niedermayer
2773920698 Optimize decode_cabac_field_decoding_flag().
~4 cpu cycles faster

Originally committed as revision 21447 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-25 02:44:34 +00:00
Måns Rullgård
c67278098d Move array specifiers outside DECLARE_ALIGNED() invocations
Originally committed as revision 21377 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-22 03:25:11 +00:00
Michael Niedermayer
7231ccf4d5 Cosmetic, get rid of &x[0]
Originally committed as revision 21309 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-18 23:55:19 +00:00
Michael Niedermayer
f432b43b08 Split fill_caches() between filter and decoder.
Originally committed as revision 21271 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-17 21:43:08 +00:00
Michael Niedermayer
c988f97566 Rearchitecturing the stitched up goose part 1
Run loop filter per row instead of per MB, this also should make it
much easier to switch to per frame filtering and also doing so in a
separate thread in the future if some volunteer wants to try.
Overall decoding speedup of 1.7% (single thread on pentium dual / cathedral sample)
This change also allows some optimizations to be tried that would not have
been possible before.

Originally committed as revision 21270 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-17 20:35:55 +00:00
Michael Niedermayer
ddd60f28d8 Replace cabac checks in inline functions from h264.h with constants.
No benchmark because it's just replacing variables with literal constants
(so no risk for slowdown outside gcc silliness) and I need sleep.

Originally committed as revision 21237 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-16 05:41:33 +00:00
Michael Niedermayer
cc51b28299 Split cabac decoding code out of h264.c.
not slower according to benchmarks.

Originally committed as revision 21181 to svn://svn.ffmpeg.org/ffmpeg/trunk
2010-01-13 02:35:36 +00:00
Ronald S. Bultje
f6f7d15041 h264: don't touch H264Context->ref_count[] during MB decoding
The variable is copied to subsequent threads at the same time, so this
may cause wrong ref_count[] values to be copied to subsequent threads.

This bug was found using TSAN.

Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2012-10-05 02:49:45 +02:00