This commit replaces the previous table based intra mode model
coding with a more balanced entropy coding system. It reduces the
decoder lookup table size by 1K bytes. The key frame compression
performance is about even on average. There are a few points where
the compression performance is improved by over 5%. Most test
points are fairly close to the lookup table approach.
Change-Id: I47154276c0a6a22ae87de8845bc2d494681b95f6
This commit allows the entropy coder for transform block partition
to account for its relative position with respect to the block size.
Change-Id: I2b5019c378bfb58c11b926fa50c0db1933f35852
Refactor the recursive transform block partition to reduce repeated
computation maximum transform block size per block.
Change-Id: Ib408c78dc6923fe7d337dc937e74f2701ac63859
This commit makes the tx_block_rd_b() compute the rate and
distortion cost per transform block, instead of accumulating these
costs.
Change-Id: Iff5adc4c27cc54f8e6eb3abd95f8d88ba00f462c
If the skip flag is already on, there is no need to further check
the all zero block case. This improves encoding speed at no coding
statistics change.
Change-Id: Icab997ca2977e650351a47ff1314def5ac4ecb1d
This commit allows the encoder to force all zero quantized
coefficient block per transform block, if that provides better
rate-distortion trade-off.
Change-Id: I5b57b28cccd257ebfaf7c1749dda7be482abc834
This commit refactors the transform block partition entropy
coding process to improve the encoding speed. There is no change
in the compression statistics.
Change-Id: I237466fd95c1b888df432babfa36e01f74240eef
This commit allows the encoder to account for the boundary block
information to estimate the transform block partitiion rate cost
in the rate-distortion optimization scheme.
Change-Id: Idb79cf936d96cdd15bcba27e47318295413a5f5d
Select the probability model for transform block partition coding
conditioned on the neighbor transform block sizes.
Change-Id: Ib701296e59009bad97dbd21d8dcd58bc5e552f39
This commit makes the encoder to properly compute the rate
distortion cost for blocks that partially cover extend pixels.
Change-Id: I44529af6f76925cdc0f6b24a5d190b51b0813983
If the frame header sets to use fixed transform block size, use
the univariate transform block partition search flow.
Change-Id: Ic422ecb6565642cd8ddb96dc67a37109ef3ce90f
This commit fixes the over count issue in the recursive transform
block partition rate cost estimation. It improves the compression
performance by about 0.45%.
Change-Id: I01ccda954ed0e120263977472c1c759c3c67170c
This commit enables the rate-distortion optimization for recursive
transform block partition for inter mode blocks based on luma
component. The chroma component infers the transform block size
decision from those of luma component.
Change-Id: I907cc52af888a606b718e087e717b189fa505748
Move the rate-distortion estimate function outside the recursion
as an individual operating module.
Change-Id: I662199223c256664bcd312084b3aebffb8a8034b
This commit makes the rate-distortion estimation of the chroma
components support the recursive transform block partition
inferred from the luma component mode decisions.
Change-Id: I2e038bebf558da406e966015952ad1058bdf4766
Allocate memory buffer to store the transform coding partition
information of inter prediction mode blocks.
Change-Id: I428b1dd0b26e8eaf24030a833554ceb4479c5551
This commit separates Hadamard transform/quantization operations
from rate and distortion computation in block_yrd. This allows one
to skip SATD computation when all transform blocks are quantized
to zero. It also uses a new block error function that skips
repeated computation of sum of squared residuals. It reduces the
CPU cycles spent on block error calculation in block_yrd by 40%.
Change-Id: I726acb2454b44af1c3bd95385abecac209959b10
To enable us to the scale-invariant motion estimation
code during mode selection, each of the reference
buffers is scaled to match the size of the frame
being encoded.
This fix ensures that a unit scaling factor is used in
this case rather than the one calculated assuming that
the reference frame is not scaled.
Change-Id: Id9a5c85dad402f3a7cc7ea9f30f204edad080ebf
Revised adjustment for rd based on source complexity.
Two cases:
1) Bias against low variance intra predictors
when the actual source variance is higher.
2) When the source variance is very low to give a slight
bias against predictors that might introduce false texture
or features.
The impact on metrics of this change across the test sets is
small and mixed.
derf -0.073%, -0.049%, -0.291%
std hd -0.093%, -0.1%, -0.557%
yt +0.186%, +0.04%, - 0.074%
ythd +0.625%, + 0.563%, +0.584%
Medium to strong psycho-visual improvements in some
problem clips.
This feature and intra weight on GF group length now
turned on by default.
Change-Id: Idefc8b633a7b7bc56c42dbe19f6b2f872d73851e
This experiment biases the rd decision based on the impact
a mode decision has on the relative spatial complexity of the
reconstruction vs the source.
The aim is to better retain a semblance of texture even if it
is slightly misaligned / wrong, rather than use a simple rd
measure that tends to favor use of a flat predictor if a perfect
match can't be found.
This improves the appearance of texture and visual quality
on specific test clips but is hidden under a flag and currently
off by default pending visual quality testing on a wider Yt set.
Change-Id: Idf6e754a8949bf39ed9d314c6f2daaa20c888aad
The joint_motion_search function alternates prediction
between two reference frames. In order to reuse existing
code, a pointer to the appropriate reference frame is
written into xd->plane[0].pre[0], that the motion
estimation code assumes points to the reference frame.
If this first reference frame was scaled then the
pointer was incorrectly being reset to point to the
unscaled reference frame rather than the scaled
version.
Change-Id: I76f73a8d8f4f15c1f3a5e7e08a35140cdb7886ab
It was tiny when it was orginally marked INLINE. Forcing this function
to be inlined prevents the compiler from inlining its much smaller
callers.
No measurable speed impact, 28320 byte smaller libvpx.a
Change-Id: I6bf4c917157d15cbadb3cd3e20a9e82d35dc7d6f
Frame buffers are now allocated dynamically on-demand.
Entries in the reference frame map, cm->ref_frame_map,
may now be set to -1 (INVALID_IDX) to indicate that
there is not a valid reference buffer in that "slot".
All slots in the reference frame map are now initialized
to the empty state (-1) and each buffer is initialized
to have a reference count of 0.
Change-Id: Id1afe98de98db4ae8b2dfefed7889c3b28c68582
Note: This feature is still in development.
Add an option for the encoder to decide the resolution
at which to encode each frame.
Each KF/GF/ARF goup is tested to see if it would be
better encoded at a lower resolution. At present, each
KF/GF/ARF is coded first at full-size and if the coded
size exceeds a threshold (twice target data rate) at
the maximum active Q then the entire group is encoded
at lower resolution.
This feature is enabled in vpxenc by setting:
--resize-allowed=1
In addition, if the vpxenc command line also specifies
valid frame dimensions using:
--resize-width=XXXX & --resize_height=YYYY
then *all* frames will be encoded at this resolution.
Change-Id: I13f341e0a82512f9e84e144e0f3b5aed8a65402b
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls.
Current frame parallel decode will only speed up the decoding for frame
parallel encoded videos. For non frame parallel encoded videos, frame
parallel decode is slower than serial decode due to lack of loopfilter
worker thread.
There are still some known issues that need to be addressed. For example:
decode frame parallel videos with segmentation enabled is not right sometimes.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
This reverts commit a18da9760a74d9ce6fb9f875706dc639c95402f5.
Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls. VP9 frame parallel decode is >30% faster than serial
decode with tile parallel threading which will makes devices play 1080P
VP9 videos more easily.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
This commit enables sub8x8 inter block coding for RTC mode. The
use of sub8x8 blocks can be turned on by allowing
choose_partitioning function to select 4x4/4x8/8x4 block sizes.
Change-Id: Ifbf1fb3888fe4c094fc85158ac3aa89867d8494a
This commit explicitly set the second reference frame type to be
NONE in key frame coding mode. This fixes a subtle dependency of
reference motion vector used by next inter frame on mode_info
reset before key frame coding.
Change-Id: I5ff0359753fdc9992b0bfe889490f7a32d7d5f6a
Where there is very subtle motion, especially when combined
with low spatial complexity, the codec sometimes fails to quickly
pick up the ambient motion field.
Once it has been established though the field propagates well using
Nearest and Near MV.
This patch looks specifically at the case where the Nearest and Near
have not been established as non zero vectors and in this case
discounts the cost of searching for a new vector in the rd code.
This will almost certainly have some implications in terms of encode
speed but it should be possible to mitigate the impact in a subsequent
using first pass stats and the local spatial complexity.
Average results for test sets approximately neutral.
Change-Id: I44a29e20f11f7ab10f8c93ffbdc50183d9801524
These speed-up features for key frame coding are only turned on
in the settings of hybrid non-RD and RD mode decision. It provides
about 20% speed-up to the hybrid key frame coding at the expense
of certain compression performance loss. For vidyo1, the key frame
coding statistics are changed
9838F, 35.020 dB, 61677 us -> 9920F, 34.834 dB, 47556 us
Overall rtc set compression performance is down by -0.257%.
Change-Id: I0025447fda26bb7855e982955642b5f55d71b51f