Optimizing 2 functions to process 32 elements in parallel instead of 16:
1. vp9_sub_pixel_variance64x64
2. vp9_sub_pixel_variance32x32
both of those function were calling vp9_sub_pixel_variance16xh_ssse3
instead of calling that function, it calls vp9_sub_pixel_variance32xh_avx2
that is written in avx2 and process 32 elements in parallel.
This Optimization gave 70% function level gain and 2% user level gain
Change-Id: I4f5cb386b346ff6c878a094e1c3b37e418e50bde
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
my optimization include:
-processing 2x8 elements in one 128 bit register instead of processing
8 elements in one 128 bit register.
-removing unecessary loads.
This optimization gives between 2.4% user level gain for 480p input
and 1.6% user level gain for 720p.
This Optimization is done only for 64 bit
Change-Id: Ic07fce2f9360329b4f2d956efda1480ae958766b
Only use layered average size if number_temporal_layers > 1.
Also removed unneeded commented-out line, and change some parameter
setting in vpx_temporal_scalable_patterns.c
Change-Id: Ic86e43e7daf0313e8c5a4aba1497299158111955
Added support for external frame buffers to libvpx's VP9 decoder.
If the external frame buffer functions are set then libvpx will
call the get function whenever it needs a new frame buffer to
decode a frame into. And it will call the release function
whenever there are no more references to that buffer.
Change-Id: Id2934d005f606af6e052fb6db0d5b7c02f567522
Prior to this commit, both encoder and decoder reset mode/mv info from
previous frame in error resilient mode to ensure bitstreams are able to
decode when there is loss of frame in decoder side. However, this is
not necessary. This commit changed to remove the reset, so encoder can
continue to use mode/mv/partition information from previously encoded
frame without affecting decodeablilty under loss of frame.
Change-Id: I0279f862900dc647fb471ae3389770bb1b9f454f
Silence signed/unsigned mismatch warnings by adding casts where
ts_number_layers does not match the sign of the variable to which
it is being compared.
Change-Id: Iab25e18c877d158b2b2b417de7da94669648b2fa
In rtc coding mode, the encoder is running non-RD mode decision. It
does not need dual buffer swap as was the case in the RD mode. This
commit initializes the internal buffer pointers outside the block
coding loop for rtc mode.
Change-Id: Ie076705c60d6b7919217e3f1dfd49e7db5064ac2
The functionalities of set_offsets() are subsumed in later
set_partitioning() and rtc_use_partition() functions, hence removed.
Change-Id: Ie514b13cb66c2379f13d0be9b1da4c12ca4581e5
Flipping arf on and off too often is hurting some clips.
This change makes no difference for 50-75% of our test
clips but helps some by a big margin. (eg. std-hd crew
by 6% and one of the YT and YT-hd clips by 14%)
Average improvements for 2 pass, speed 2 (psnr,ssim)
are as follows:-
derf 0.165%, 0.210%
yt 1.210%, 1.464%
yt-hd 1.189%, 1.471%
std-hd 1.031%, 0.886%
Change-Id: I121fe66cfb4a62d384b23b484a7d648789641969
Two convolve functions were optimized for AVX2:
1. vp9_filter_block1d16_h8
2. vp9_filter_block1d16_v8
vp9_filter_block1d16_v8 was optimized for AVX2 by reducing the number of
loop strides by half, two strides were processed in parallel.
vp9_filter_block1d16_v8 was also optimized in the same way also some of the
loads were being done outside of the loop and by that preventing redundant
loads.
This Optimization gives 43% function level gain and 1.3% user level gain.
Now can be compiled in Windows
Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
This function initializes the predictor buffer pointers and
calculates reference motion vectors. It is only needed in the settings
of inter frame coding. Hence removing it from the key frame coding
branch in rtc_use_partition.
Change-Id: Ic4e16c7467a5f32be4e0bf619ef9d57afb4a7075
This function is deprecated after the re-design of partition search
that runs big block size, then four-way split, followed by
rectangular block sizes. This commit removes the related functions.
Change-Id: I417549c8e0fa3cf35bd29816b805dd4e7c3660c6
The function rd_pick_reference_frame can be deprecated. Its use was
subsumed by the adaptive motion search control.
Change-Id: Icb0c2fa335f0f06fa7b79a71f972d9fa54d750db