Went through the code and fixed it. Verified on Windows.
Where possible, remove dependencies on xmm[67]
Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.
Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
Passed SSE from sub-pixel search back to pick_inter_mode
function, which is compared with the encode_breakout to
see if we could skip evaluating the remaining modes.
Change-Id: I4a86442834f0d1b880a19e21ea52d17d505f941d
In vp8_pick_inter_mode(), for NEWMV mode, use the error result got
from motion search as distortion. This helps performance in real-
time mode.
Change-Id: I398c4e46cc5381f7d874e748cf78827ef0e0860c
The value of distortion2 returned by vp8_pick_intra4x4mby_modes
was being overwritten by the value returned by get16x16prederror
before it was tested.
Change-Id: If00e80332b272c5545c3a7e381c8041e8319b41a
Index i is used to detect early breakout from the first loop, but
its value is lost due to reuse in the second for loop. I moved
the position of the second loop and did some format cleanup.
Change-Id: I02780eae1bd89df4b6c000fb8a018b0837aac2e5
This patch cleans up the source buffer storage and copy mechanism to
allow access through a standard push/pop/peek interface. This approach
also avoids an extra copy in the case where the source is not a
multiple of 16, fixing issue #102.
Change-Id: I05808c39f5743625cb4c7af54cc841b9b10fdbd9
in encodframe.c, quant_shift is set to 0 or 1 in vp8cx_invert_quant
only use 8 bits to store this, instead of 16. will allow saving an
xmm register in an updated version of the regular quantize
Change-Id: Ie88c47fe2aff5af0283dab1147fb2791e4b12f90
This patch changes the rc_undershoot_pct and rc_overshoot_pct controls
to set the "aggressiveness" of rate adaptation, by limiting the
amount of difference between the target buffer level and the actual
buffer level which is applied to the target frame rate for this frame.
This patch was initially provided by arosenberg at logitech.com as
an attachment to issue #270. It was modified to separate these controls
from the other unrelated modifications in that patch, as well as to
use the pre-existing variables rather than introducing new ones.
Change-Id: Id542e3f5667dd92d857d5eabf29878f2fd730a62
Previous to commit de4e9e3, there was an early return in the alt-ref
case that was inadvertantly removed when the function was refactored
to return void. This patch restores the prior behavior.
Change-Id: I783ffd594a4690297e2742f99526fd7ad67698b2
The error accumulator stats values cpi->prediction_error and
cpi->intra_error were being populated with rd values not
distortion values.
These are only "currently" used in a limited way for RT compress
key frame detection.
Change-Id: I2702ba1cab6e49ab8dc096ba75b6b34ab3573021
This commit fixed an overflow in ssim calculation, added register
save and restore to make sure assembly code working for x64 platform.
It also changed the sampling points to every 4x4 instead of 8x8 and
adjusted the constants in SSIM calculation to match the scale of
previous VPXSSIM.
Change-Id: Ia4dbb8c69eac55812f4662c88ab4653b6720537b
on the same order as the sse2 fast quantize change: ~2%
except for 32bit. only a slight improvment there.
Change-Id: Iff80e5f1ce7e646eebfdc8871405458ff911986b
MV sad cost error is only used in full-pixel motion search,
which only need full-pixel resolution instead of quarter-pixel
resolution. This change reduced mvsadcost table size, and
removed unneccessary pamameter passing since this table is
constant once it is generated.
Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0
rather than look up rc in the zig zag table, embed it in the macro. this
also allows us to shuffle some values in the macro and keep *d in rsi
gains of about the same order as the obj_int_extract implementation: ~2%
Change-Id: Ib7252dd10eee66e0af8b0e567426122781dc053d
Address calculations moved from encodemb_arm.c file to neon
optimized assembly function to save cycles in function calls.
- vp8_subtract_b_neon_func replaced with vp8_subtract_b_neon
that contains all needed address calculations
- unnecessary file encodemb_arm.c removed
- consistent with ARMv6 optimized version
Change-Id: I6cbc1a2670b56c2077f59995fcf8f70786b4990b
Detect the number of available cores and limit the thread allocation
accordingly. On decoder side limit the number of threads to the max
number of token partition.
Core detetction works on Windows and
Posix platforms, which define _SC_NPROCESSORS_ONLN or _SC_NPROC_ONLN.
Change-Id: I76cbe37c18d3b8035e508b7a1795577674efc078
Adds following ARMv6 optimized functions to encoder:
- vp8_subtract_b_armv6
- vp8_subtract_mby_armv6
- vp8_subtract_mbuv_armv6
Gives 1-5% speed-up depending on input sequence and encoding
parameters. Functions have one stall cycle inside the loop body
on Cortex pipeline.
Change-Id: I19cca5408b9861b96f378e818eefeb3855238639
Half pixel interpolations optimized in variance calculations. Separate
function calls to vp8_filter_block2d_bil_x_pass_armv6 are avoided.On
average, performance improvement is 6-7% for VGA@30fps sequences.
Change-Id: Idb5f118a9d51548e824719d2cfe5be0fa6996628