This commit refactors rtc_use_partition. It allows
the encoder to take a preferred block size for non-RD mode decision.
At frame boundaries, smaller block sizes that fit within the
boundary are used instead.
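The boundary handling described above can be sketched roughly as follows.
This is a minimal illustration, not the encoder's actual code: the
BlockSize enum, the kBlockDim table, and fit_block_size() are hypothetical
names, and only square block sizes are considered.

```c
/* Hypothetical square block sizes, smallest first. */
typedef enum { BLK_8X8, BLK_16X16, BLK_32X32, BLK_64X64, BLK_SIZES } BlockSize;

static const int kBlockDim[BLK_SIZES] = { 8, 16, 32, 64 };

/* Return the largest size not exceeding `preferred` that fits inside the
 * rows/columns remaining before the frame boundary. Falls back to 8x8
 * when nothing larger fits. */
static BlockSize fit_block_size(BlockSize preferred, int rows_left,
                                int cols_left) {
  int s = (int)preferred;
  while (s > BLK_8X8 &&
         (kBlockDim[s] > rows_left || kBlockDim[s] > cols_left)) {
    --s; /* step down to the next smaller square size */
  }
  return (BlockSize)s;
}
```

For example, with a preferred size of 64x64 and only 24 rows left before
the frame edge, this sketch steps down to 16x16.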
In rtc mode, the coding performance of speed -6 for pedestrian_1080p
goes from
158980 b/f, 38.934 dB, 22721 ms to
159008 b/f, 40.064 dB, 23721 ms.
For the rtc set, speed -6 compression performance improves by
26%. It is still about 2 dB behind speed -5 at this point.
Change-Id: If0944f0880eaf1ad340bc325d97cea8d0f9dd53f
* Reduce the number of short-circuit checks by pre-computing and combining like checks.
* Postpone non-trivial initializations until after the short-circuit checks are evaluated.
* Add some consts and const pointers.
No change to the actual results of the call or output of the encoder.
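The pattern behind the bullets above might look like the sketch below. The
BlockCtx struct, its fields, and both functions are hypothetical stand-ins
and do not correspond to the function touched by this commit.

```c
#include <stdbool.h>

/* Hypothetical per-block context; field names are illustrative only. */
typedef struct {
  int skip;
  int is_key_frame;
  int ref_enabled;
} BlockCtx;

/* Counts calls so the deferred initialization can be observed. */
static int expensive_init_calls = 0;

static int expensive_init(const BlockCtx *const ctx) {
  ++expensive_init_calls;
  return ctx->ref_enabled * 100;
}

static int evaluate_block(const BlockCtx *const ctx) {
  /* Combine the like checks once instead of testing them separately. */
  const bool can_exit_early =
      ctx->skip || ctx->is_key_frame || !ctx->ref_enabled;
  if (can_exit_early) return -1;

  /* Non-trivial setup runs only after the early exits have been ruled out. */
  const int state = expensive_init(ctx);
  return state + 1;
}
```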
Change-Id: Ie44c4702aec6e08cfe0b8b0ba3cd6b57206478d1
This commit enables the use of DC, vertical, and horizontal intra
prediction modes in rtc non-RD mode decision. When the best cost value
of inter modes is above a given threshold, the encoder runs the
above three intra modes and selects the one that has minimum
prediction residual in terms of SAD.
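A minimal sketch of this selection, assuming a 4x4 block and simplified
predictors. All names here (PredMode, predict, pick_intra_mode) are
hypothetical, and how the intra winner is then weighed against the best
inter mode is not stated above and is left out.

```c
#include <limits.h>
#include <stdlib.h>

enum { BS = 4 }; /* illustrative block size */
typedef enum { INTRA_NOT_TRIED = -1, MODE_DC, MODE_V, MODE_H } PredMode;

/* Sum of absolute differences between source and prediction. */
static int block_sad(const unsigned char *src, const unsigned char *pred) {
  int sad = 0;
  for (int i = 0; i < BS * BS; ++i) sad += abs(src[i] - pred[i]);
  return sad;
}

/* Build a DC, vertical, or horizontal prediction from the neighbors. */
static void predict(PredMode mode, const unsigned char *above,
                    const unsigned char *left, unsigned char *pred) {
  int dc = 0;
  for (int i = 0; i < BS; ++i) dc += above[i] + left[i];
  dc = (dc + BS) / (2 * BS); /* rounded mean of the 2*BS neighbors */
  for (int r = 0; r < BS; ++r) {
    for (int c = 0; c < BS; ++c) {
      pred[r * BS + c] = (unsigned char)(mode == MODE_V   ? above[c]
                                         : mode == MODE_H ? left[r]
                                                          : dc);
    }
  }
}

/* Try the three intra modes only when the best inter cost exceeds the
 * threshold; return the intra mode with the smallest SAD. */
static PredMode pick_intra_mode(const unsigned char *src,
                                const unsigned char *above,
                                const unsigned char *left,
                                int best_inter_cost, int threshold) {
  if (best_inter_cost <= threshold) return INTRA_NOT_TRIED;
  static const PredMode kIntra[3] = { MODE_DC, MODE_V, MODE_H };
  unsigned char pred[BS * BS];
  PredMode best = MODE_DC;
  int best_sad = INT_MAX;
  for (int i = 0; i < 3; ++i) {
    predict(kIntra[i], above, left, pred);
    const int sad = block_sad(src, pred);
    if (sad < best_sad) {
      best_sad = sad;
      best = kIntra[i];
    }
  }
  return best;
}
```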
This, together with recent changes to non-RD mode decision and coding
control, improves compression performance of speed -6 by
derf 91%
yt 61%
hd 46%
stdhd 52%
In terms of encoding speed, it is about 3 times faster than speed -5.
Change-Id: I6b483bfd0307e6482bb22a6676ae4e25a52b1310
When non-RD coding decision is used in rtc mode, the alt reference
is not used for inter frame prediction. This commit disables the alt ref
option whenever speed -6 is used.
Change-Id: I0b33ca03661de1db2d9bef1bcbff848cd4c9396f
In the first coding run of a 64x64 block, check the coding mode
for each 8x8 block. A second annealing stage will be needed to decide
the partition size to be encoded.
Change-Id: Ida9417805ff3358979b0c0429d4099c023c88866
In good quality mode motion search, the best matches are normally
found after searching in a large area. In real time mode, to make
encoding fast, a center-biased fast HEX search is used, which
converges quickly most of the time. A 4-point diamond search is
then carried out as a refinement step, which gives more
precise results and maintains good motion search quality.
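The two-stage search can be sketched as below. This is a generic
hexagon-then-diamond search under stated assumptions, not the encoder's
implementation: the MV struct, CostFn callback, the offset tables, and the
single diamond pass are all illustrative choices, and a real search would
also clamp candidates to the allowed motion vector range and include the
motion vector rate in the cost.

```c
typedef struct { int row, col; } MV;
typedef int (*CostFn)(MV mv, const void *ctx);

/* Example cost: squared distance to a target MV passed through ctx. */
static int sq_dist_cost(MV mv, const void *ctx) {
  const MV *t = (const MV *)ctx;
  const int dr = mv.row - t->row, dc = mv.col - t->col;
  return dr * dr + dc * dc;
}

static MV hex_then_diamond(MV start, int max_steps, CostFn cost,
                           const void *ctx) {
  static const MV kHex[6] = { { -1, -2 }, { 1, -2 }, { 2, 0 },
                              { 1, 2 },   { -1, 2 }, { -2, 0 } };
  static const MV kDiamond[4] = { { 0, -1 }, { 0, 1 }, { -1, 0 }, { 1, 0 } };
  MV best = start;
  int best_cost = cost(best, ctx);

  /* Coarse stage: re-center the hexagon until no vertex improves. */
  for (int step = 0; step < max_steps; ++step) {
    const MV center = best;
    int moved = 0;
    for (int i = 0; i < 6; ++i) {
      const MV cand = { center.row + kHex[i].row, center.col + kHex[i].col };
      const int c = cost(cand, ctx);
      if (c < best_cost) { best_cost = c; best = cand; moved = 1; }
    }
    if (!moved) break;
  }

  /* Refinement stage: one pass over the 4 nearest neighbors. */
  const MV center = best;
  for (int i = 0; i < 4; ++i) {
    const MV cand = { center.row + kDiamond[i].row,
                      center.col + kDiamond[i].col };
    const int c = cost(cand, ctx);
    if (c < best_cost) { best_cost = c; best = cand; }
  }
  return best;
}
```

With sq_dist_cost and a target of (3, -5), a search started at (0, 0)
converges on the hexagon grid at (3, -6) and the diamond pass then moves it
to the target.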
At speed 5, the borg test on rtc set showed an overall PSNR loss of
0.936%. The encoding speed gain is 4% - 5%.
Change-Id: I42cd68bb56a09ca1b86293c99d5f7312225ca7ae
Run sub-pixel motion search when NEWMV gives lower rate-distortion
cost. This improves coding performance of the derf set by 8% and
std-hd by 2.2%.
Change-Id: Ife50f7fda8463927784fe59a41cc439c833e941a
- Rename and make static
s/vp9_compute_qdelta_by_rate/compute_qdelta_by_rate/
- Make base_q_index an integer.
- Add a cast.
Change-Id: Iea8d1397fd2717e7373b182ec51f5db960ef2cca
Optimizing 2 functions to process 32 elements in parallel instead of 16:
1. vp9_sub_pixel_variance64x64
2. vp9_sub_pixel_variance32x32
Both of those functions were calling vp9_sub_pixel_variance16xh_ssse3.
They now call vp9_sub_pixel_variance32xh_avx2 instead, which is
written in AVX2 and processes 32 elements in parallel.
This optimization gave a 70% function-level gain and a 2% user-level gain.
Change-Id: I4f5cb386b346ff6c878a094e1c3b37e418e50bde