This commit allows encoder to compare the SAD cost associated with
the best motion vector predictor, per frame. If one reference frame
has this cost more than 4 times of the best SAD cost given by other
reference frames, skip NEARESTMV, NEARMV, ZEROMV mode check of this
reference frame.
This setting is turned on in speed 2 and above. Compression quality
change in speed 2:
derf -0.014%
yt -0.097%
hd -0.023%
stdhd 0.046%
It reduces the speed 2 runtime of test sequences:
pedestrian_area_1080p 4000 kbps 310763 ms -> 303595 ms
bluesky_1080p 6000 kbps 259852 ms -> 251920 ms
Change-Id: I7f59cf79503d51836d61d56d50dc5bdf0e502e22
This commit further optimizes SSE2 operations in the second 1-D
inverse 16x16 DCT, with (<10) non-zero coefficients. The average
runtime of this module goes down from 779 cycles -> 725 cycles.
Change-Id: Iac31b123640d9b1e8f906e770702936b71f0ba7f
This commit is the first patch optimizing SSE2 implementation of inverse
16x16 DCT with <10 non-zero coefficients. It focused on the first 1-D (row)
transformation. It exploits the fact that only top-left 4x4 block contains
non-zero coefficients, in a 2-D inverse 16x16 DCT with <10 coeffients.
The average runtime of idct16x16_10 unit is reduced from
883 cycles -> 779 cycles (12% faster).
For pedestrian_area_1080p 300 frames at 4000 kbps, the speed 2 runtime goes
down from 310651 ms -> 305910 ms. The decoding speed goes up from
80.37 fps -> 80.87 fps.
Change-Id: Ic6f3ac5a637a76c07ba73ddaafe318a699fea645
Optimizing the variance functions: vp9_variance16x16, vp9_variance32x32,
vp9_variance64x64, vp9_variance32x16, vp9_variance64x32,
vp9_mse16x16 by migrating to AVX2
some of the functions were optimized by processing 32 elements instead of 16.
some of the functions were optimized by processing 2 loop strides of 16
elements in a single 256 bit register
This optimization gives between 2.4% - 2.7% user level performance gain
and 42% function level gain.
Change-Id: I265ae08a2b0196057a224a86450153ef3aebd85d
Some cleanups on frames_to_key, frames_since_key.
Also removes the unused fixed_q parameters in vp9.
Change-Id: If8743a32c71de30a8d17136477b53d607a7acda8
The previous implementation stops motion vector prediction test when
the zero motion vector appears for the second time. This commit fixes
it by simply skipping the second time check on zero mv and continuing
on to next mv candidate.
It slightly improves stdhd in speed 2 by 0.06% on average. Most static
sequences are not affected. A few hard ones, like jet, ped, and riverbed
were improved by 0.1 - 0.2%.
Change-Id: Ia8d4e2ffb7136669e8ad1fb24ea6e8fdd6b9a3c1
also:
o remove dead code, create_dummy_frame
o Fix a bug in command line handling that caused a segfault if wrong
number of arguments were given.
Change-Id: I78f026aee4e363967b750e6cde0982659c558a1f
The feature undergoes prior assumption that the recursive partition
size search from 4x4 to 64x64, hence utilizing information from small
blocks to determine early termination in large block rate-distortion
optimization search. The current codebase is now going from top down.
The previous function might go with not properly initialized values,
hence removed.
Tested on pedestrian_area_1080p at 4000 kbps running under speed 2.
No visible difference in runtime observed.
Change-Id: I553df415c6191413762db7ae34e8790c71d8118e