Speed 6 uses small tx size, namely 8x8. max_intra_bsize needs to
be modified accordingly to ensure valid intra mode checking.
Borg test on RTC set showed an overall PSNR gain of 0.335% in speed
-6.
This also changes speed -5 encoding by allowing DC_PRED checking
for block32x32. Borg test on RTC set showed a slight PSNR gain of
0.145%, and no noticeable speed change.
Change-Id: I1502978d8fbe265b3bb235db0f9c35ba0703cd45
This is the first step to rework the rate-distortion modeling used
in rtc coding mode. The overall goal is to make the modeling
customized for the statistics encountered in the rtc coding.
This commit makes encoder to perform rate-distortion modeling for
DC and AC coefficients separately. No speed changes observed.
The coding performance for pedestrian_area_1080p is largely
improved:
speed -5, from 79558 b/f, 37.871 dB -> 79598 b/f, 38.600 dB
speed -6, from 79515 b/f, 37.822 dB -> 79544 b/f, 38.130 dB
Overall performance for rtc set at speed -6 is improved by 0.67%.
Change-Id: I9153444567e5f75ccdcaac043c2365992c005c0c
This commit enables a fast path computational flow for forward
transformation. It checks the sse and variance of prediction
residuals and decides if the quantized coefficients are all
zero, dc only, or more. It then selects the corresponding coding
path in the forward transformation and quantization stage.
It is currently enabled in rtc coding mode. Will do it for rd
coding mode next.
In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
Overall coding performance for rtc set is changed by -0.18%.
Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
Making this consistent with intra mode masks: you need to specify
allowed inter/intra modes to use.
Change-Id: Iaecd28bf79047259707d8e7a59a57bb7b856383e
Added a skipping test in non-rd inter-mode. After interpolation
prediction step, the residuals are tested to see if they will be
quantized to 0 based on modeling between spatial domain and
frequency domain.
Set static-thresh to 800 for >=720p and 300 for <720p, rtc set
tests showed
1. Speed 5, psnr: -0.514%; ssim: -1.748%;
speedup on related clips: 5% -11%
2. Speed 6, psbr: -0.628%; ssim: -1.637%;
speedup on related clips: 4% - 9%
Change-Id: I62fbf26bc043ecd2b584f255f1a4ee5ab52bfcf3
When the variance is far less than sse, the block is considered to
be under light change. All the energy is compacted into DC coeff
and can be coded at low cost. In such situation, switch the rate-
distortion modeling from sse+var based back to variance based.
Note that this is a temporary solution to handle the rare situations
where the scene light changes.
Change-Id: I1ee0fe2b9eda6b5fac40152e1841bf23f4d229fd
This commit introduces a chessboard pattern search for the prediction
filter type search. It runs extensive search in alternate blocks and
allows the rest blocks to refer coding decisions of their nearby
neighbors.
For pedestrian 1080p at 4000 kbps, the runtime of speed -5 goes down
from 43990 ms to 42200 ms. The overall compression performance for
RTC set is changed by -1.37%.
Change-Id: Icfe220c49451cda796f0ca91d935c9ed01e56c9d
This commit allows the non-RD mode decision flow to select
prediction filter type in NEWMV mode. It provides 8.14% compression
performance gains in both settings of AQ=0 and 3. The current speed
impact is about 5% to 10% slower.
Change-Id: Id66ecebf77abd8f90fb3f6a066c0e8dfb4bf1c42
This commit estimates the motion vector rate cost right after full
pixel motion search. It combines this and the mode cost and compares
the corresponding rate-distortion cost. If it is already above the
current best one, skip the rest sub-pixel motion search and modeling
process. For pedestrian_area 1080p at 4000 kpbs, the speed -5 runtime
goes down from 39425 ms -> 38399 ms.
Change-Id: If4cd7119fd6c266798d5cf1d19d19ab425e52a26
This commit reduces the frequency of frames using finer quantizer
in non-RD coding flow, and slightly tune up the quantizer resolution
when used. It provides 1.7% compression gains in speed -5 at no speed
difference.
Change-Id: I430249a51260a841a0402666e5ec1566e4f7d5a6
This commit enables a recursive partition type search for non-RD
mode decisions. It allows the encoder to choose variable block
sizes in a 64x64 block based on rate-distortion modeling.
It improves speed -6 coding efficiency for rtc set by 2.4%. Most
of the gains come from 32-40dB range, where many sequences gain
about 5% to 20%. Local tests suggest there is no speed change.
Change-Id: I06300016e500a21652812b7b3b081db39a783d66
The third pred_mv is stored in x->pred_mv[ref_frame]. This commit make
sure the correct mv is read.
Change-Id: Ibed24daf36703a63f0394c87b2381ee1d2eb7910
In non-rd pick_mode code, added mode skipping according to
thresholds. Used rd thresholds now, but we can modified them
later for real-time case.
RTC set borg test showed a 0.095% PSNR gain. For different rtc
clips, the real-time(speed 7) encoder speedup is 2% - 10%.
Change-Id: Ic72535c96b891092c662453be32d3168f7e34dcc
This is an initial attempt to allow variable block size partition
in non-RD coding flow. It tests 8x8, 16x16 and 32x32 block size per
64x64 block, all using non-RD mode decision and the associated rate
distortion costs from modeling, then selects the best block size to
encode the entire 64x64 block. Such operations are triggered every
other 3 frames. The blocks of intermediate frames will reuse the
collocated block's partition type.
It improves the compression performance by 13.2%. Note that the gains
are not evenly distributed. For many hard clips, the compression
performance is improved by 20% to 28%. Local speed test shows that
it will also increase runtime by 50%, as compared to speed -7. It is
now enabled in speed -6 setting.
Change-Id: Ib4fb8659d21621c9075b3c369ddaa9ecb0a4b204
This commit allows the non-RD mode decision process to return the
rate and distortion costs associated with the selected mode.
Change-Id: Ibe0f67d323f65839fd9cb0a726c1219bf7b55da9
Brings back most of Jim's previous patch for choosing
partitioning based on variance while making it compatible
with the current state of the code. Also adds a
nonrd_use_partition() function to recursively encode for any
arbitrary sb_type decisions within a 64x64 block; and
includes some refactoring.
Currently, when the VAR_BASED_PARTITIONING mode is turned on
for speed 7, there is a 10+% speed-up observed.
Experiments/improvements with this new partitioning method
will be conducted subsequently.
Change-Id: Ie6f43bfbde30583e941f450bf07c3b48828c9571
This commit adjusts the rate-distortion modeling for non-RD mode
decision. It puts more weights on energy from AC coefficients when
estimating the cost. The coding performance for rtc testset is
improved by 0.72%.
Change-Id: Ifa6ff11311a513ec2b10586589e82a9a21f6c61c
Assign interp_kernel value in MACROBLOCKD. This will be used to
select prediction filter coefficient sets and generate motion
compensated prediction.
Change-Id: I28c8dfb2dae6566f6939bb328aca5875c94bee65
Adds a fast diamond search which is about 5% faster than FAST_HEX
with only a 0.1% drop in psnr when turned on for both speeds 5 and 7.
This search is turned on for speed 7.
Change-Id: I497630aa88a5148926086bb3038e7975e5f4eb98
This commit allows the non-RD mode decision to skip mode RD modelling
check, if the motion vector associated with the current mode is
same as that of NEARESTMV mode. This makes speed -7 about 2% faster.
Previous change that converts cost metric from SAD to model based RD
value makes the codec 6% slower at speed -7.
Change-Id: I30cfec5452f606a671b8432a2f7f0c94fbb49fc8
The rate-distortion model in non-RD coding mode is only applied to
luma component. This commit removed a few redundant addition steps.
Change-Id: Id8edc0a47c2dbef8deba43debe2c95db39454de3
This commit replaces SAD cost with modeled rate-distortion cost
for non-RD mode decision. It translates the prediction residual
SSE into estimate rate and reconstruction distorion costs, hence
capturing the quantization setting effect. The compression
performance of speed -7 for rtc set is improved by 14.79%.
Change-Id: Ifda014eb0501d13109fe7f92680bf1410b463632
The core motion estimation fucntions all return sad now consistently.
The only exception is vp9_full_pixel_diamond(), however the core diamond
and refining search routines called from vp9_full_pixel_diamond() also
return SAD. If variance of pred error + mv cost is desired it must be
calculated explicitly outside these functions. For very fast encoding,
hopefully this will eliminate some redundant computations.
Also suggests reimplementing FAST_HEX with the vp9_pattern_search
framework. It is not exactly the same as the existing FAST_HEX, but
performance is slightly better and speed is very similar. Enables
removing a lot of duplicate code.
Change-Id: I152736393438c25bdf7e96b37cbb8ce330f4f94a
Adds a speed 8 to VP9 where only the nearestmv (0 mv) is searched.
This seems to be about the same speed as vp8 speed 5.
Adds a new speed feature to disable inter modes based on a mask for
each blocksize.
Adds code for having lower complexity motion search methods
in nonrd pick mode function, even though speed 7 still uses DIAMOND
search for now.
Also uses HEX search for speed 6 rather than FAST_HEX which improves
psnr by 0.56% without any noticeable speed drop (tested on gipsmotion).
Change-Id: Ic13176572dbd3aed5884a26786940a4b1bbd8a75
This commit checks if the motion vector associated with the current
mode has been computed in previous mode tests. If possible, skip the
redundant reference block generation and SAD calculation in the
non-RD mode decision process.
For test sequence pedestrian_area 1080p, the runtime goes from
24261 ms to 23770 ms. This does not change compression performance.
The speed-up is mostly around places with consistent motion.
Change-Id: I97be63c6a2d07c57be26b3c600fbda3803adddda
Added fast HEX search while doing non-rd partition picking to
speed up the encoder.
Borg test(speed 7) on rtc set showed 1.8% overall PSNR loss.
Encoder speedup was 5% - 15% for different rtc clips.
Change-Id: I9c83026eabc70b69fcc747c90369ec60bfa3ca24
As Yunqing suggested, this commit makes non-RD mode decision always run
sub-pixel motion search in NEWMV mode. The compression performance
gains becomes fairly significant after we enabled sub-pixel accuracy
motion compensated prediction to calculate SAD cost.
For test sequences pedestrian_area at 1080p and vidyo1 at 720p, the
runtime goes slower by 5%. For rtc test set, the compression performance
is improved by 21.20%.
Change-Id: I38cbfdd5c53d79423e1fafb3154f8ddeed63bbf0
This commit builds the actual prediction block in sub-pixel accuracy
and uses which to calculate SAD for non-RD mode decision. In the trail
run on pedestrian_area at 1080p, rtc speed -7 runtime goes from
23495 ms -> 25107 ms (7% slower). The compression performance is
improved by 20.57% for rtc test set.
Change-Id: I438589cd103fe99f1b50c2d1939ac6ca43fa0157
Use a set of dedicated variables to buffer the current best mode
in non-RD mode decision. This allows to use mode_info for more
complicated test in the non-RD process.
Change-Id: I6024c9feb0662afd3eb29f7017f7b5a5446f303f
This commit makes a refactoring of the rtc_use_partition. It allows
the encoder to take a preferred block size for non-RD mode decision.
The boundary blocks are handled such that smaller block sizes that
fit in the boundary size will be used instread.
In rtc mode, the coding performance of speed -6 for pedestrian_1080p
goes from
158980 b/f, 38.934 dB, 22721 ms to
159008 b/f, 40.064 dB, 23721 ms.
For rtc set, the speed -6 compression performance is improved by
26%. Still about 2dB behind speed -5 at this point.
Change-Id: If0944f0880eaf1ad340bc325d97cea8d0f9dd53f
This commit enables the use of DC, vertical, and horizontal intra
prediction mode in rtc non-RD mode decision. When the best cost value
of inter modes is above a given threshold, the encoder runs the
above three intra modes and selects the one that has minimum
prediction residual in terms of SAD.
This together with recent changes on non-RD mode decision and coding
control improves compression performance of speed -6 by
derf 91%
yt 61%
hd 46%
stdhd 52%
In terms of encoding speed, it is about 3 times faster than speed -5.
Change-Id: I6b483bfd0307e6482bb22a6676ae4e25a52b1310
In the first coding run of a 64x64 block, check the coding mode
for each 8x8 block. Will need a second annealing stage to decide
the partition size to be encoded.
Change-Id: Ida9417805ff3358979b0c0429d4099c023c88866
Run sub-pixel motion search when NEWMV gives lower rate-distortion
cost. This improves coding performance of derf set by 8%, std-hd by
2.2%.
Change-Id: Ife50f7fda8463927784fe59a41cc439c833e941a
This patch only works if the video is a width and height that are both
a multiple of 32.. It sets every partition to 16x16, and does INTRADC
only on the first frame and ZEROMV on every other frame. It always does
does the largest possible transform, and loop filter level is set to 4.
Was ~20% faster than speed -5 of vp8
Now 20% slower but adds motion search ( every block ), nearest, near
and zeromv
The SVC test was changed because - while this realtime mode produces
bad quality albeit quickly, it isn't obeying all the rules it should
about which frames are available.
Change-Id: I235c0b22573957986d41497dfb84568ec1dec8c7
This commit setups a test framework for real-time coding. It enables
a light motion search for non-RD mode decision purpose.
Change-Id: I8bec656331539e963c2b685a70e43e0ae32a6e9d