The error accumulator stats values cpi->prediction_error and
cpi->intra_error were being populated with rd values not
distortion values.
These are only "currently" used in a limited way for RT compress
key frame detection.
Change-Id: I2702ba1cab6e49ab8dc096ba75b6b34ab3573021
on the same order as the sse2 fast quantize change: ~2%
except for 32bit. only a slight improvment there.
Change-Id: Iff80e5f1ce7e646eebfdc8871405458ff911986b
MV sad cost error is only used in full-pixel motion search,
which only need full-pixel resolution instead of quarter-pixel
resolution. This change reduced mvsadcost table size, and
removed unneccessary pamameter passing since this table is
constant once it is generated.
Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0
rather than look up rc in the zig zag table, embed it in the macro. this
also allows us to shuffle some values in the macro and keep *d in rsi
gains of about the same order as the obj_int_extract implementation: ~2%
Change-Id: Ib7252dd10eee66e0af8b0e567426122781dc053d
Address calculations moved from encodemb_arm.c file to neon
optimized assembly function to save cycles in function calls.
- vp8_subtract_b_neon_func replaced with vp8_subtract_b_neon
that contains all needed address calculations
- unnecessary file encodemb_arm.c removed
- consistent with ARMv6 optimized version
Change-Id: I6cbc1a2670b56c2077f59995fcf8f70786b4990b
Adds following ARMv6 optimized functions to encoder:
- vp8_subtract_b_armv6
- vp8_subtract_mby_armv6
- vp8_subtract_mbuv_armv6
Gives 1-5% speed-up depending on input sequence and encoding
parameters. Functions have one stall cycle inside the loop body
on Cortex pipeline.
Change-Id: I19cca5408b9861b96f378e818eefeb3855238639
now that we need asm_enc_offsets.c for x86 and arm and it is
harmless to build it for other targets, add it unconditionally
Change-Id: I320c5220afd94fee2b98bda9ff4e5e34c67062f3
Half pixel interpolations optimized in variance calculations. Separate
function calls to vp8_filter_block2d_bil_x_pass_armv6 are avoided.On
average, performance improvement is 6-7% for VGA@30fps sequences.
Change-Id: Idb5f118a9d51548e824719d2cfe5be0fa6996628
remove helper function and avoid shadowing all the arguments to the
stack on 64bit systems
when running with --good --cpu-used=0:
~2% on linux x86 and x86_64
~2% on win32 x86 msys and visual studio
more on darwin10 x86_64
significantly more on
x86_64-win64-vs9
Change-Id: Ib7be12edf511fbf2922f191afd5b33b19a0c4ae6
This declaration did not match the prototype_sad() prototype, but was
unused in this translation unit, so it is removed instead. Fixes
issue 290.
Change-Id: I168854f88a85f73ca9aaf61d1e5dc0f43fc3fdb3
Optimized fdct4x4 (8x4) for ARMv6 instruction set.
- No interlocks in Cortex-A8 pipeline
- One interlock cycle in ARM11 pipeline
- About 2.16 times faster than current C-code compiled with -O3
Change-Id: I60484ecd144365da45bb68a960d30196b59952b8