This patch adds a new speed feature which doesn't do the rather
expensive entropy context lookup or save to the table, while
doing costing.
The speed up on desktop36p.y4m is around 10% other clips much less.
On the RTC test set this was + 1% in overall datarate.
Change-Id: Ia5144bbf45270671e7be9c8e4055369909e2f738
This gets more accurate mode hit stats. It's also the first step to
handling ZEROMV not being allowed more intelligently.
Change-Id: I5de6734507b5177bf73e9ddbad923f218c39f3e4
intra_y_mode_mask is already enforced for the sub8x8 case.
intra_uv_mode_mask is already enforced for all sizes.
Change-Id: Ia9dd14701cb49873c2e8f24eb5f8b255eaf76a1f
The function has evolved over time, now only calls vp9_rtcd(), so this
commit removes the function and changes to call vp9_rtcd() directly.
Change-Id: I8cfa6190daa4b28f6f3d1e11bb3a07f9c95322bf
Optimizing 2 functions to process 32 elements in parallel instead of 16:
1. vp9_sub_pixel_avg_variance64x64
2. vp9_sub_pixel_avg_variance32x32
both of those function were calling vp9_sub_pixel_avg_variance16xh_ssse3
instead of calling that function, it calls vp9_sub_pixel_avg_variance32xh_avx2
that is written in avx2 and process 32 elements in parallel.
This Optimization gave 80% function level gain and 2% user level gain
Change-Id: Iea694654e1b7612dc6ed11e2626208c2179502c8
Adds a speed 8 to VP9 where only the nearestmv (0 mv) is searched.
This seems to be about the same speed as vp8 speed 5.
Adds a new speed feature to disable inter modes based on a mask for
each blocksize.
Adds code for having lower complexity motion search methods
in nonrd pick mode function, even though speed 7 still uses DIAMOND
search for now.
Also uses HEX search for speed 6 rather than FAST_HEX which improves
psnr by 0.56% without any noticeable speed drop (tested on gipsmotion).
Change-Id: Ic13176572dbd3aed5884a26786940a4b1bbd8a75
For blocks at frame boundary, the selected block size sometimes needs
to be smaller than that was first given. This commit forces such block
size change only between square blocks, so as to avoid the potential
use case containing 32x16 + 16x8 + 16x8, for 1080p sequences.
Local test suggested no visible coding speed difference. Borg test
reveals no difference in terms of compression performance.
Change-Id: Ie8de87f3c6febc3acf11b4cbfdf2077f9f6def52