This commit introduces a new block match motion estimation
using integral projection measurement. The 2-D block and the nearby
region is projected onto the horizontal and vertical 1-D vectors,
respectively. It then runs vector match, instead of block match,
over the two separate 1-D vectors to locate the motion compensated
reference block.
This process is run per 64x64 block to align the reference before
choosing partitioning in speed 6. The overall CPU cycle cost due
to this additional 64x64 block match (SSE2 version) takes around 2%
at low bit-rate rtc speed 6. When strong motion activities exist in
the video sequence, it substantially improves the partition
selection accuracy, thereby achieving better compression performance
and lower CPU cycles.
The experiments were tested in RTC speed -6 setting:
cloud 1080p 500 kbps
17006 b/f, 37.086 dB, 5386 ms ->
16669 b/f, 37.970 dB, 5085 ms (>0.9dB gain and 6% faster)
pedestrian_area 1080p 500 kbps
53537 b/f, 36.771 dB, 18706 ms ->
51897 b/f, 36.792 dB, 18585 ms (4% bit-rate savings)
blue_sky 1080p 500 kbps
70214 b/f, 33.600 dB, 13979 ms ->
53885 b/f, 33.645 dB, 10878 ms (30% bit-rate savings, 25% faster)
jimred 400 kbps
13380 b/f, 36.014 dB, 5723 ms ->
13377 b/f, 36.087 dB, 5831 ms (2% bit-rate savings, 2% slower)
Change-Id: Iffdb6ea5b16b77016bfa3dd3904d284168ae649c
using this to control reallocation would miss a change if the function
were not called for every frame.
fixes potential memory corruption by the subsequent memset
Change-Id: I4c6bb6ab68803104fc824c7e27cc2f9b2cf53e33
MODE_INFO struct was modified, and vp9_print_modes_and_motion_vectors()
didn't work anymore. This patch modified vp9_debugmodes.c so that
this function works again for debug usage.
Change-Id: I293fae0295235deb2529a460a274caf7c045ac1a
This will fix the frame parallel decode hang on windows
due to not enough semaphores.
This will also make the frame parallel decode safer as
the number of frame buffers could only support maximum
8 threads.
Change-Id: Id9ef50692819dcbebbd74a0aabffbfb3f39a4309
This commit allows the encoder to account for additional chroma
plane costs in the mode decision process, if the current block
potentially contains significant color change. It improves the
visual quality at very low bit-rates.
The compression performance of dark720p is improved by 12.39% in
speed 6. For jimred at 150 kbps, the PSNR of V component (red)
increased by 0.2 dB, at the expense of about 5% increase in
encoding time. Note that for sequences where the chroma components
are fairly consistent, the encoding time increase is negligible.
On average the rtc set compression performance is improved by
1.172% in PSNR and 1.920% in SSIM.
Change-Id: Ia55b24ef23a25304f7ec9958fbf07fd6e658505c
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls.
Current frame parallel decode will only speed up the decoding for frame
parallel encoded videos. For non frame parallel encoded videos, frame
parallel decode is slower than serial decode due to lack of loopfilter
worker thread.
There are still some known issues that need to be addressed. For example:
decode frame parallel videos with segmentation enabled is not right sometimes.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
This reverts commit a18da9760a.
Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
+ nearest for consistency
near is a reserved word in windows builds so using it as a parameter
name may cause build failures with some configurations
Change-Id: Iddf1d4ecdb39843f14e95dbfd9dca55f07f81403
1. move the check of search method of USE_TX_8X8 up one level to
avoid operations of build_tree_distributions()
2. count tx used and avoid computaton for coef udpate when one size
is not used at all.
Change-Id: Ia3e54a2588aa531c41377a1bfaa64385d04a592c
On Nexus 7 speed -6 saw ~18% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
BUG=https://code.google.com/p/webm/issues/detail?id=908
Change-Id: I70ccdea0326750552ed946fb004507d6efe02d5c
On Nexus 7 speed -6 saw ~15% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
BUG=https://code.google.com/p/webm/issues/detail?id=908
Change-Id: I4b2006b644c488f42bf06d8a22ef0e6120a96bf9
On Nexus 7 speed -6 saw ~30% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
BUG=https://code.google.com/p/webm/issues/detail?id=908
Change-Id: Id12af7d1883243c23e6692e898aea82299633d58
On Nexus 7 speed -5 got ~2%, -6 got ~15%, -7 and -8 got ~30%
increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
Change-Id: I83246d63b96674d170098a572fa4fe28a05aaf51
This commit replaces an integer divide with a table-lookup. It is
to improve decoding speed, and at the same time, to reduce possible
complications with a bug in AMD Family 12h processors:
"665 Integer Divide Instruction May Cause Unpredictable Behavior"
Change-Id: I678b707a538798a923850bac467e66e847e6def7
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls. VP9 frame parallel decode is >30% faster than serial
decode with tile parallel threading which will makes devices play 1080P
VP9 videos more easily.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
The SSE2 code is from VP8 MFQE, reuse it in VP9. No change on VP8
side. In our testing, we achieve 2X speed by adopting this change.
Change-Id: Ib2b14144ae57c892005c1c4b84e3379d02e56716
1. Added row-based loopfilter in encoder;
2. Moved common multi-threaded loopfilter functions from decoder
to common;
3. Merged multi-threaded loopfilter code, and made encoder/
decoder call same function to reduce code duplication.
Encoder tests showed that 1% - 2% speedup was seen for good-quality
2-pass mode(at speed 3); 1% - 3% speedup using 2 threads and 4% - 6%
speedup using 4 threads were seen for real-time mode(at speed 7).
Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4
On Nexus 7 speed -5, -6, -7, and -8 saw about a 1% increase
in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 1.5%
increase in perf for 720p.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
Change-Id: Ibf17ebfd952a6aec941719bd8306df8ec4574bee
This commit adds encoder side control for vp9 to set color space info
in the output compressed bitstream.
It also amends the "vp9_encoder_params_get_to_decoder" test to verify
the correct color space information is passed from the encoder end to
decoder end.
Change-Id: Ibf5fba2edcb2a8dc37557f6fae5c7816efa52650
On Nexus 7 speed -5, -6, -7, and -8 saw about a 15% increase
in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 10%
increase in perf for 720p.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
Change-Id: I2fa5315845e3021c9a6e2ea47e52e68b398d8334
Add optimized Neon functions of:
vp9_variance32x64
vp9_variance64x32
vp9_variance64x64
On Nexus 7 speed -5 and -6 saw about a 4% increase in perf.
Speeds -7 and -8 saw about a 6% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
Change-Id: I5a81f13c9897eb927fa39662530f5524a0f768fa
When qdiff is larger, the sad/variance threshold should also be
higher which indicates a more aggressive action on MFQE.
Change-Id: I44c5c93572805458d4f87fdc7619cc9d8a522185
Separate functions and rename files. This will make it easier to disable
some functions later to help work around a compiler issue in chromium.
Change-Id: I7f30e109f77c4cd22e2eda7bd006672f090c1dc5
By using weighted averaging in the calculation of the frames to be
displayed, we get an average gain of more than 1 db for key frames
whose base qp are 20 higher than non-key frames.
Change-Id: I7bcb2e7b9c6420ea3f73f33204d18b072dffd17c