On rtc set:
speed 7 quality improves about 0.5%
speed 8 quality improves about 1.0%
Encoding time for speed 7 changes from 67804ms to 65889ms
Encoding time for speed 8 changes from 58659ms to 56808ms
Change-Id: Iabcfb53012fc1b9f3326cdbc167e5758b8c7ad30
After syncing the frame worker thread, avaiable thread count should
increase by 1 even the worker thread does not have displayable frame
to output.
Change-Id: I9eeb87720fed82dfe38555286833ff88e8a8e746
Reuse the yv12_mb array to fetch the buffer pointers/strides
corresponding to the current reference frame.
Change-Id: I5276b7494158b2cccef15213be2dc189e9036851
This commit allows the encoder to account for additional chroma
plane costs in the mode decision process, if the current block
potentially contains significant color change. It improves the
visual quality at very low bit-rates.
The compression performance of dark720p is improved by 12.39% in
speed 6. For jimred at 150 kbps, the PSNR of V component (red)
increased by 0.2 dB, at the expense of about 5% increase in
encoding time. Note that for sequences where the chroma components
are fairly consistent, the encoding time increase is negligible.
On average the rtc set compression performance is improved by
1.172% in PSNR and 1.920% in SSIM.
Change-Id: Ia55b24ef23a25304f7ec9958fbf07fd6e658505c
This patch continues the work to remove frame_parallel_decoding_mode
requirement in VP9 multi-threaded tile decoder. In order to do that,
the frame counts associated to each thread need to be accumulated
together after the frame is decoded.
Change-Id: Idba1a756cedfed3c154aef52ed82c8da3bbf9e0c
The original implementation had the following comment:
// Ignore mv costing if mvsadcost is NULL
However the current implementation does not allow for this.
If x exists then nmvsadcost must not be null.
This removes the only warning from -Wpointer-bool-conversion
https://code.google.com/p/webm/issues/detail?id=894
Change-Id: I1a2cee340d7972d41e1bbbe1ec8dfbe917667085
The current multi-threaded tile decoder requires that the videoes
are encoded with frame_parallel_decoding_mode = 1. This requirement
is not necessary, and is better to be removed. This patch includes
the first part of the work.
Change-Id: Ic7695fb3cfe13f9022582c9f0edd2aa6e2e36d28
1. Adjusted the threshold for coef update computation based on counts
of tx used, avoid coef update computation when count is low (<20)
2. Move sf->lpf_pick = LPF_PICK_MINIMAL_LPF to speed 8.
Change-Id: I02b44309e40fcdbf135c7934ae067a3f42502d30
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls.
Current frame parallel decode will only speed up the decoding for frame
parallel encoded videos. For non frame parallel encoded videos, frame
parallel decode is slower than serial decode due to lack of loopfilter
worker thread.
There are still some known issues that need to be addressed. For example:
decode frame parallel videos with segmentation enabled is not right sometimes.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
This reverts commit a18da9760a.
Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02
+ nearest for consistency
near is a reserved word in windows builds so using it as a parameter
name may cause build failures with some configurations
Change-Id: Iddf1d4ecdb39843f14e95dbfd9dca55f07f81403
1. move the check of search method of USE_TX_8X8 up one level to
avoid operations of build_tree_distributions()
2. count tx used and avoid computaton for coef udpate when one size
is not used at all.
Change-Id: Ia3e54a2588aa531c41377a1bfaa64385d04a592c
1. reduce the size of temporaray arrays on stack
2. avoid build_tree_distribution for tx size that is not used at all.
Change-Id: I0f8d7124e16a3789d3c15ad24cf02c1c12789e2c
This patch was to fix issue 924:
https://code.google.com/p/webm/issues/detail?id=924
The SECTION_RODATA macro was modified to support macho32 format.
The sub-pixel functions were modified to pass in 2 more parameters
to handle the global offsets for PIC build.
Change-Id: I3bfcd336bcae945edf300bca4ab40376a2628cd4
On Nexus 7 speed -6 saw ~18% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
BUG=https://code.google.com/p/webm/issues/detail?id=908
Change-Id: I70ccdea0326750552ed946fb004507d6efe02d5c
On Nexus 7 speed -6 saw ~15% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
BUG=https://code.google.com/p/webm/issues/detail?id=908
Change-Id: I4b2006b644c488f42bf06d8a22ef0e6120a96bf9
On Nexus 7 speed -6 saw ~30% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
BUG=https://code.google.com/p/webm/issues/detail?id=908
Change-Id: Id12af7d1883243c23e6692e898aea82299633d58
Floating point is used in vp9_convert_qindex_to_q(), so sometime unit
test ActiveMapTest would cause run time error without properly call
to clear_system_state to reset register status.
Change-Id: I181e9395148c44a6ca8b97d6e109bd4a152143c6
Add distortion threshold condition to refresh state of a coding block,
and allow for qp adjustment also for some intra modes and non-zero motion modes.
Also some code cleanup (remove unused variables/code).
Change-Id: I735fa2b28bc64f60e0323976b82510577b074203
Currently disabled by default: enabled using
#define GROUP_ADAPTIVE_MAXQ
In this patch the active max Q is adjusted for each GF
group based on the vbr bit allocation and raw first pass
group error.
This will tend to give a lower q for easy sections
and a higher value for very hard sections. As such it is
expected to improve quality in some of the easier
sections where quality issues have been reported.
This change tends to hurt overall psnr but help
average psnr. SSIM also shows a small gain.
Average results for derf, yt, std-hd and yt-hd test sets were
as follows (%change for average psnr, overal psnr and ssim):-
derf +0.291, - 0.252, -0.021
yt +6.466, -1.436, +0.552
std-hd +0.490, +0.014, +0.380
yt-hd +5.565, - 1.573, +0.099
Change-Id: Icc015499cebbf2a45054a05e8e31f3dfb43f944a
On Nexus 7 speed -5 got ~2%, -6 got ~15%, -7 and -8 got ~30%
increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
Change-Id: I83246d63b96674d170098a572fa4fe28a05aaf51
This commit replaces an integer divide with a table-lookup. It is
to improve decoding speed, and at the same time, to reduce possible
complications with a bug in AMD Family 12h processors:
"665 Integer Divide Instruction May Cause Unpredictable Behavior"
Change-Id: I678b707a538798a923850bac467e66e847e6def7
In frame parallel decode, libvpx decoder decodes several frames on all
cpus in parallel fashion. If not being flushed, it will only return frame
when all the cpus are busy. If getting flushed, it will return all the
frames in the decoder. Compare with current serial decode mode in which
libvpx decoder is idle between decode calls, libvpx decoder is busy
between decode calls. VP9 frame parallel decode is >30% faster than serial
decode with tile parallel threading which will makes devices play 1080P
VP9 videos more easily.
* frame-parallel:
Add error handling for frame parallel decode and unit test for that.
Fix a bug in frame parallel decode and add a unit test for that.
Add two test vectors to test frame parallel decode.
Add key frame seeking to webmdec and webm_video_source.
Implement frame parallel decode for VP9.
Increase the thread test range to cover 5, 6, 7, 8 threads.
Fix a bug in adding frame parallel unit test.
Add VP9 frame-parallel unit test.
Manually pick "Make the api behavior conform to api spec." from master branch.
Move vp9_dec_build_inter_predictors_* to decoder folder.
Add segmentation map array for current and last frame segmentation.
Include the right header for VP9 worker thread.
Move vp9_thread.* to common.
ctrl_get_reference does not need user_priv.
Seperate the frame buffers from VP9 encoder/decoder structure.
Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:"""
Conflicts:
test/codec_factory.h
test/decode_test_driver.cc
test/decode_test_driver.h
test/invalid_file_test.cc
test/test-data.sha1
test/test.mk
test/test_vectors.cc
vp8/vp8_dx_iface.c
vp9/common/vp9_alloccommon.c
vp9/common/vp9_entropymode.c
vp9/common/vp9_loopfilter_thread.c
vp9/common/vp9_loopfilter_thread.h
vp9/common/vp9_mvref_common.c
vp9/common/vp9_onyxc_int.h
vp9/common/vp9_reconinter.c
vp9/decoder/vp9_decodeframe.c
vp9/decoder/vp9_decodeframe.h
vp9/decoder/vp9_decodemv.c
vp9/decoder/vp9_decoder.c
vp9/decoder/vp9_decoder.h
vp9/encoder/vp9_encoder.c
vp9/encoder/vp9_pickmode.c
vp9/encoder/vp9_rdopt.c
vp9/vp9_cx_iface.c
vp9/vp9_dx_iface.c
Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64
For low spatial resolutions: bias partittion selection to smaller block sizes,
and base the variance computation on 4x4 down-sampling.
Also move the threshold computations into the choose_partitioning,
so they are computed once for each sb block.
On low-res clips (RTC_derf) PSNR/SSIMetrics increase by about 4-5%.
No change for resolutions above CIF.
Change-Id: I93f8ff742c8044786977bb6e31dcf8efda6dd1b0
Just before a forced key frame we often get a foreshortened
arf/gf group. In such a case, we do not want to update
rc->last_boosted_qindex, which is used to define the Q range
for the forced key frame itself.
This gives a small average metrics gain for the YT and YT-HD sets
(eg. YT SSIM +0.141%).
Change-Id: Ie06698bc4f249e87183b8f8fb27ff8f3fde216d9
The comparison of address in the condition is not necessary, since
they will constantly be non-null.
Change-Id: Id0b0075283f5af65215d5761a8160a4cb2a15c9b
The SSE2 code is from VP8 MFQE, reuse it in VP9. No change on VP8
side. In our testing, we achieve 2X speed by adopting this change.
Change-Id: Ib2b14144ae57c892005c1c4b84e3379d02e56716
1. Added row-based loopfilter in encoder;
2. Moved common multi-threaded loopfilter functions from decoder
to common;
3. Merged multi-threaded loopfilter code, and made encoder/
decoder call same function to reduce code duplication.
Encoder tests showed that 1% - 2% speedup was seen for good-quality
2-pass mode(at speed 3); 1% - 3% speedup using 2 threads and 4% - 6%
speedup using 4 threads were seen for real-time mode(at speed 7).
Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4
This commit fixes a bug in denoiser reference frame buffer swap,
which disables frame buffer update.
Change-Id: I39a9427180fd18f9692602064ad821f7af4714c0
On Nexus 7 speed -5, -6, -7, and -8 saw about a 1% increase
in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 1.5%
increase in perf for 720p.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
Change-Id: Ibf17ebfd952a6aec941719bd8306df8ec4574bee
On some platforms, such as 32bit Windows and 32bit Mac, the allocated
memory isn't aligned automatically. The thread data is aligned to
ensure the correct access in SIMD code.
Change-Id: I1108c145fe982ddbd3d9324952758297120e4806
This commit adds encoder side control for vp9 to set color space info
in the output compressed bitstream.
It also amends the "vp9_encoder_params_get_to_decoder" test to verify
the correct color space information is passed from the encoder end to
decoder end.
Change-Id: Ibf5fba2edcb2a8dc37557f6fae5c7816efa52650
On Nexus 7 speed -5, -6, -7, and -8 saw about a 15% increase
in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 10%
increase in perf for 720p.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
Change-Id: I2fa5315845e3021c9a6e2ea47e52e68b398d8334
Don't put small empty frame in front of a key frame. We will
put key frame flag in webm container if there's a visible key
frame. But there will be decoding error when we seek to here
if we put the small empty frame, which will be inter frame,
in front of it.
Change-Id: Id50c2c1fd31da0405ff6faa7375cc2f49c55402d
This commit added a field to vpx_image_t for indicating color space,
the field is also added to YUV_BUFFER_CONFIG. This allows the color
space information pass through the decoder from input stream to the
output buffer.
The commit also updated compare_img() function with added verification
of matching color space to ensure the color space information to be
correctly passed from encode to decoder in compressed vp9 streams.
Change-Id: I412776ec83defd8a09d76759aeb057b8fa690371
Add optimized Neon functions of:
vp9_variance32x64
vp9_variance64x32
vp9_variance64x64
On Nexus 7 speed -5 and -6 saw about a 4% increase in perf.
Speeds -7 and -8 saw about a 6% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.
Change-Id: I5a81f13c9897eb927fa39662530f5524a0f768fa
Replaced "color space" with "color format" in comments where color
sampling format is concerned, so to differentiate from the concept
defined in COLOR_SPACE.
Change-Id: I8c935034c166b24307a99352dab1686531276bb8
This commit refactors the motion compensated reference block fetch
process in denoiser. It skips the stage that generates motion
compensated reference block if denoiser decides to use copy block
mode. For high motion clips, this could speed up the denoising
process by about 10%.
Change-Id: I8ef4fa5fe766a8c4529119b9ec01faefb3d4ef53
Use frame buffer pointer swap instead of memcpy when possible.
These two CLs make the denoiser when running on vidyo1 720p at
speed -6 over 10% faster.
Change-Id: I64fe8a2422cafca6787a50c7f4dfb961191c0a9d
These two parameters are used to control the denoiser cut-off
thresholds. They should be properly initialized when starting
mode search of a given block.
Change-Id: Iba8a25487026a0dbe0d350c347d7e4e4e237b637
When qdiff is larger, the sad/variance threshold should also be
higher which indicates a more aggressive action on MFQE.
Change-Id: I44c5c93572805458d4f87fdc7619cc9d8a522185
The vp9_denoiser_free() function will internally check if the
buffer pointers are NULL. This commit makes the encoder always
call vp9_denoiser_free() after finishing encoding. It protects the
case where noise_sensitivity_level is changed during encoding
process and happen to be turned off towards the end of sequence,
which could result memory space allocated to denoiser not being
released.
Change-Id: Ie20dc2f2e6e5fb6333fbab3356bc153978a6a0f8
Use the correct frame size and stride value for chroma components
when setting the initial values. These control parameters are
assigned when the denoiser buffer was allocated and initialized.
Change-Id: Ia6318194c7738aff540bcbd34f77f0dac46221a1
Allocate the frame buffer allocation for denoiser once during the
encoder initialization. This avoids allocating frame buffer
multiple times and overwriting the buffer pointer without proper
releasing.
Change-Id: I9b3baa6283449d86fd164534d344c036bb035700
When testing frame sse to choose a loop filter value and
when checking ambient error in kf Q selection, use 64 bit
values for accumulating the sse, to avoid risk of overflow
for large image formats.
Change-Id: I03765d16c843d0ade61a45b0cd46312472697e57
Separate functions and rename files. This will make it easier to disable
some functions later to help work around a compiler issue in chromium.
Change-Id: I7f30e109f77c4cd22e2eda7bd006672f090c1dc5
This makes the inter_mode counts update consistent with other symbols.
Also, forward updates should work corerctly now.
Change-Id: Id98be26fd08875162e644bb8f1de6f0918f85396
With "show_existing_frame" frames:
Minimum data size for profile 0 and 1 is 1 byte (8bits)
Minimum data size for profile 2 and 3 is 2 bytes (9bits)
Otherwise:
Minimum data size is 8 bytes.
This resolves the VP9 failure in fuzzing test build #56.
Change-Id: I146d9d37688f535dd68d24aacc76d464ccffdf04
By using weighted averaging in the calculation of the frames to be
displayed, we get an average gain of more than 1 db for key frames
whose base qp are 20 higher than non-key frames.
Change-Id: I7bcb2e7b9c6420ea3f73f33204d18b072dffd17c
This commit fixes the buffer alignment control in denoised video
output function. The encoder is now able to properly store the
denoised input video into provided file when enabled.
Change-Id: I258e272c8d4a9b52592e16d6d09976c6f5c21728
Use mbmi->segment_id directly in vp9_pick_inter_mode. The value is
set outside this function, hence no need to assign it again.
Change-Id: I3d63cdd2e4fadf62ccdefada638b00d979eb3741
Check if block size is below 8x8 for rectangular block coding. It
is added to support 4x8 and 8x4 block coding for RTC mode.
Change-Id: I760b328f45b98ae48adc45ed5a39fb643cd8aebd
This commit simplifies the reference motion vector part for sub8x8
block coding in RTC mode and reduces the required local variables.
Change-Id: I470d1482092563b68af22404dc1f497e7457b0a8
This commit enables sub8x8 inter block coding for RTC mode. The
use of sub8x8 blocks can be turned on by allowing
choose_partitioning function to select 4x4/4x8/8x4 block sizes.
Change-Id: Ifbf1fb3888fe4c094fc85158ac3aa89867d8494a
Properly set the corresponding scaling factor of the reference
frame in the non-RD mode decision process. This allows the mode
search process to account for the scaled reference frame when
selecting coding mode.
Change-Id: I9d41bff6931c98e5a82b413e37ac5e6e14b93b23
Local variables used at the setjmp() site need to be marked volatile.
Relevant excerpt from the 'man longjmp':
===============
The values of automatic variables are unspecified after a call to
longjmp() if they meet all the following criteria:
· they are local to the function that made the corresponding setjmp(3) call;
· their values are changed between the calls to setjmp(3) and longjmp(); and
· they are not declared as volatile.
===============
Change-Id: I093e6eeeedbf5f781d202248ca701ba2c29d3064
This commit adds a guard condition to the intra mode test skip
control in RTC coding mode. If all inter modes are skipped, force
the encoder to check intra mode. It avoids situations where the
encoder processes without properly assigning required mode
information.
Change-Id: Ibb349fee997d6584ce901d08b06e8df3ca9c01b1
Initial patch to remove get_zbin_mode_boost() and
cpi->zbin_mode_boost.
For now sets a dummy value of 0 for zbin extra pending
a further clean up patch.
Change-Id: I64a1e1eca2d39baa8ffb0871b515a0be05c9a6af
The alternate reference frame is disabled in non-RD mode. No need
to keep the related entries in the THR_MODES array.
Change-Id: I53386f4bb1c6284f582801f27246c5edf55bc24b
In RTC coding mode, the alternate reference frame modes and compound
inter prediction modes are disabled. This commit reworks the
related mode search threshold update process to skip interacting
with these coding modes. It provides about 1.5% speed-up for speed
-6 on average.
vidyo1
16551 b/f, 40.451 dB, 6261 ms -> 16550 b/f, 40.459 dB, 6190 ms
nik720p
33316 b/f, 38.795 dB, 6335 ms -> 33310 b/f, 38.798 dB, 6237 ms
mmmoving
33265 b/f, 41.055 dB, 7176 ms -> 33267 b/f, 41.064 dB, 7084 ms
dark720
33329 b/f, 39.729 dB, 11235 ms -> 33331 b/f, 39.733 dB, 10731 ms
Change-Id: If2a4090a371cd28f579be219c013b972d7d9b97f
This commit removes undefined value options of cpu-used for VP9 and
changed vpxenc prompt to reflect the usable range of [-8,8]
Change-Id: Ib80fef3dbb6ec9aabac45ed13e8ab6fbaf94f55e
Use a temporary variable to store the transform size associated
with the best intra mode and restore the mode_info if the overall
best mode is intra mode.
Change-Id: I2606e0061ad32f91b095462902b1eb734b128eea
Only for the rectangle blocks larger than 16X16, SAD and Variance are
still based on the internal square blocks.
Change-Id: I3754da1b0254147313f86a0140dbf4f980f06a5a
The mode_info array was unnecessarily reset to zero every frame
when error resilient mode turned on, given that the mode info
values per block will be assigned during mode search stage.
This commit removes this reset operation. It reduces the runtime
cost on memset operation to 1/3. The overall speed -6 runtime is
reduced by 2%.
Change-Id: I32ecb73338d8995cc0c5147de09357364f13d45b
This commit explicitly set the second reference frame type to be
NONE in key frame coding mode. This fixes a subtle dependency of
reference motion vector used by next inter frame on mode_info
reset before key frame coding.
Change-Id: I5ff0359753fdc9992b0bfe889490f7a32d7d5f6a
These were established for compatibility. Make sure to use them.
Most frequently they manifest as issues on Visual Studio builds.
Change-Id: I39d764d2eb341b999d7a6132cb44b2acfc511160
Delete vp9_dc_only_idct_add_neon.c
The function was merged with vp9_short_idct4x4_1_add (later
vp9_idct4x4_1_add) in d2de1ca and should have been deleted then.
Change-Id: Ie58ba3dd9dc7330a8f1238dd7dd71c9ed4639b94
Signed-off-by: James Yu <james.yu@linaro.org>
Where there is very subtle motion, especially when combined
with low spatial complexity, the codec sometimes fails to quickly
pick up the ambient motion field.
Once it has been established though the field propagates well using
Nearest and Near MV.
This patch looks specifically at the case where the Nearest and Near
have not been established as non zero vectors and in this case
discounts the cost of searching for a new vector in the rd code.
This will almost certainly have some implications in terms of encode
speed but it should be possible to mitigate the impact in a subsequent
using first pass stats and the local spatial complexity.
Average results for test sets approximately neutral.
Change-Id: I44a29e20f11f7ab10f8c93ffbdc50183d9801524
Change 72141 introduced a new use of vp9_avg_4x4.
This call needs to switch to using vp9_highbd_avg_4x4
when performing high bitdepth encodes.
Change-Id: I6a8ba4b62f8a75d0a917b365a55245e2f0438ea1
When multiple intra modes are tested, the previous mode info
update process may overwrite the selected best intra mode and make
the final selection use an inter mode. This commit fixes this
issue by moving the mode_info reset outside the intra mode search
loop.
Change-Id: I15ed4288a6b3cb0832104a5e6d5d9a25cd1a5b2b
If vp9_pick_inter_mode works properly, it should at least check
one coding mode and hence get best_tx_size assigned a valid value.
There is no need to initialize best_tx_size with a legitimate
value before starting the mode search.
Change-Id: Ic0496cd89672ea9c2c512a9bd1da952190af9cba
Make the variable reduction_fac log2 based and explicitly use
right shift when computing intra_cost_penalty.
Change-Id: I208f1fb879a02debb3b3fc64f9fd06260dcf1c86
Add vp9_iht8x8_add_neon.c
- vp9_iht8x8_64_add_neon
The assembly did not previously implement tx_type 0
BUG=716
Change-Id: Icfc99dd24f3d59047f9184a7d0c761ba7e3de934
Signed-off-by: James Yu <james.yu@linaro.org>
Add vp9_iht4x4_add_neon.c
- vp9_iht4x4_16_add_neon
The assembly did not previously implement tx_type 0
BUG=715
Change-Id: I60034d1568de034edba45c5cdd13f3d87dbc73b6
Signed-off-by: James Yu <james.yu@linaro.org>
Fails to compile. Bad calls to vp9_alloc_frame_buffer
and vp9_realloc_frame_buffer in postproc.c
This reverts commit 399823b6f5.
Change-Id: I29f0e173f8e185d3a303cfdb17813e1eccb51e3a
Add support for setting byte alignment on the Y, U, and V plane of the
reference buffers. The byte alignment must be a power of 2, from 32 to
1024. A value of 0 sets legacy alignment.
Change-Id: I7c1399622f7aa68e123646369216b32047dda73d
the entire module is wrapped in CONFIG_VP9_POSTPROC which is forcibly
enabled with CONFIG_INTERNAL_STATS
+ a similar change in vp9_alloccommon.c
Change-Id: I374993297a9fba5bef2f0b71f984eba42f0995a3
This commit fixes a bug in the PICK_MODE_CONTEXT index for
horizontal partition case. The compression performance change
is less than 0.01% level, since most blocks are selected to
use square block size in RTC coding mode.
Change-Id: I67effc18ae8795fccdd82a55f4efc609fa5cb3e1
For key frame under variance source partition: 4x4 prediction blocks
may be selected when variance of 8x8 block is very high (threshold is set fairly high for now).
Testing on some RTC clips shows this helps to reduce some ringing artifacts on key frame.
Encoded key frame size increases about ~10%. Key frame PSNR increases about ~0.1-0.2dB.
Change-Id: I56e203fac32ea6ef69897fb3ea269c59cb50d174
This commit explicitly uses the bit shift operation instead of
division for computing block variance.
Change-Id: Id19c0ff27dd1d1ae4aceee6657e1aad0d406bd74
If decoding starts with intra-only frame, there is a possibility
of using uninitialized entropy context, what leads to undefined
behavior.
Change-Id: Icbb64b5b1bd1e5de2a4bfa2884e56bc0a20840af
The 8x8 DCT uses a fast version whenever possible.
There was a mistake in the checking code which
meant sometimes the fast version was used when it
was not safe to do so.
Change-Id: I154c84c9e2d836764768a11082947ca30f4b5ab7
(cherry picked from commit fd05fb0c21)
This commit refactors the choose_partitioning function. It removes
redundant memset calls and makes the encoder to calculate
variance value per block only when it is needed. It reduces the
average runtime cost of choose_partitioning by 60%. Overall it
reduces speed -6 runtime by 2-5%.
Change-Id: I951922c50d901d0fff77a3bafc45992179bacef9
It is the first version of MFQE in VP9. There are a few TODOs included
in this version.
Usage: Add flag --enable-vp9-postproc to config the project.
In decoder, use flag --mfqe in the command line to enable
MFQE in postproc.
Note: Need to have key frame with low quality to see the effect of this
new patch. In my experiment, I fixed the qindex to 200 in key frame.
Change-Id: I021f9ce4616ed3574c81e48d968662994b56a396
Replace error_resilient flag with use_prev_frame_mvs in
vp9_pick_inter_mode reference motion vector search selection.
This effectively turns off the simplified ref mv search in the
settings of frame resizing, even if error-resilient mode is off.
Change-Id: I7fed814ee7bc0cb419a03b846e0fc2de46ba7686
Update the frame motion vector only if previous frame motion vector
is needed for next frame reference motion vector.
Change-Id: Ica50f9d7b46ad4f815bba0d9e30f5546df29546f
The warning only happens in VP9 encoder's first pass due to src_mi
is not set up yet. But it will not fail the encoder as left_mi and
above_mi are not used in the first_pass and they will be set up again
in the second pass.
Change-Id: I12dffcd5fb1002b2b2dabb083c8726650e4b5f08
This commit enables the use of sub8x8 blocks in RTC key frame
encoding. It requires the block size to be preset and will decide
the coding mode and encode the bit-stream.
Change-Id: I35aaf8ee2d4d6085432410c7963f339f85a2c19b
Rename set_modeinfo_offsets as set_mode_info_offsets, to be more
consistent with naming convention.
Change-Id: I68ca1f36c4a78127d9439a50c1506a2afd07927d
The later encoding process will take the top-left block's
mode_info for pre-determined block size.
Change-Id: I76a90f9ce7f3b2dbc2975b52442114e461c465b5
The restructure moves the decision into the rd pick
modes loop and makes a decision based at the 16x16
block level instead of only the 64x64 level.
This gives finer granularity and better visual results
on the clips I have tested. Metrics results are worse
than the old AQ2 especially for PSNR and this mode
now falls between AQ0 and AQ1 in terms of visual
impact and metrics results.
Further tuning of this to follow.
It should be noted that if there are multiple iterations
of the recode loop the segment for a MB could change
in each loop if the previous loop causes a change in the
complexity / variance bin of the block. Also where a block
gets a delta Q this will alter the rd multiplier for this block
in subsequent recode iterations and frames where the
segmentation is applied.
Change-Id: I20256c125daa14734c16f7cc9aefab656ab808f7
The function vp9_filter_block1d16_h8_ssse3 uses the PSHUFB instruction which has a 3 cycle latency and slows execution when done in blocks of 5 or more on Atom processors.
By replacing the PSHUFB instructions with other more efficient single cycle instructions (PUNPCKLBW + PUNPCHBW + PALIGNR) performance can be improved.
In the original code, the PSHUBF uses every byte and is consecutively copied.
This is done more efficiently by PUNPCKLBW and PUNPCHBW, using PALIGNR to concatenate the intermediate result and then shift right the next consecutive 16 bytes for the final result.
For example:
filter = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8
Reg = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
REG1 = PUNPCKLBW Reg, Reg = 0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7
REG2 = PUNPCHBW Reg, Reg = 8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15
PALIGNR REG2, REG1, 1 = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8
This optimization improved the function performance by 23% and produced a 3% user level gain on 1080p content on Atom processors.
There was no observed performance impact on Core processors (expected).
Change-Id: I3cec701158993d95ed23ff04516942b5a4a461c0
the flag in the header wasn't being set based on the encoder
configuration in non-intra only mode
broken since:
fbc2fbf Adding oxcf temp variable.
Change-Id: Ib4cff9901889824bc4e68d7f0f6deb1e41df2f53
The initial reset of this_rdc in vp9_pick_inter_mode is not needed,
since it will be re-assign when used.
Change-Id: Ic0e12d741cbab292fc214c1eabb48b129af7839b
Compare the current best mode rate-distortion cost with the skip
threshold to decide if performing motion search.
Change-Id: Ia071824f8dd3b7db485f424692a485a2da6a1a9f
These speed-up features for key frame coding are only turned on
in the settings of hybrid non-RD and RD mode decision. It provides
about 20% speed-up to the hybrid key frame coding at the expense
of certain compression performance loss. For vidyo1, the key frame
coding statistics are changed
9838F, 35.020 dB, 61677 us -> 9920F, 34.834 dB, 47556 us
Overall rtc set compression performance is down by -0.257%.
Change-Id: I0025447fda26bb7855e982955642b5f55d71b51f
When block size is below 16x16, the encoder swap from non-RD to
RD mode for key frame coding. This largely brough back the key
frame compression performance. For vidyo1 at 1000 kbps, the key
frame coding statistics are changed
9978F, 34.183 dB, 36807 us -> 9838F, 35.020 dB, 61677 us
As compared to the full RD case
7187F, 34.930 dB, 214470 us
The overall rtc set coding performance (single key frame setting)
is improved by 1.5%.
Change-Id: I78a4ecf025d7b24ec911e85be94e01da05e77878
Change 72193 made the encoder behave differently
when configured with and without high bitdepth.
This change means the same algorithm is used for both.
Change-Id: I707a44a94afca773a9e0c2f7ebeeea83030257c5
No more checking of corrupted reference frame as we skip
decoding any non-intra frame in case of frame corrupted.
Change-Id: I77d41bbb02fc5f61972740e2d411441eb6a17073
Currently, VP9 supports column-tile encoding, which allows a frame
to be encoded in multiple column tiles independently. The number of
column tiles are set by encoder option "--tile-columns". This
provides a way to encode a frame in parallel.
Based on previous set of patches, this patch implemented the tile-
based multi-threaded encoder. Each thread processes one or more
tiles.
Usage:
For HD clips:
--tile-columns=2 --threads=1/2/3/4
While using 4 threads, tests showed that the encoder achieved
2.3X - 2.5X speedup at good-quality speed 3, and 2X speedup at
realtime speed 5.
Change-Id: Ied987f8f2618b1283a8643ad255e88341733c9d4
Change 71789 renamed CONFIG_VP9_HIGH to CONFIG_VP9_HIGHBITDEPTH.
However, one use of CONFIG_VP9_HIGH was missed.
Change-Id: I0ebb9c71380c6d810a25708d15471abf9533e695
For key frame at speed 6: enable the non-rd mode selection in speed setting
and use the (non-rd) variance_based partition.
Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames),
mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16.
Loss in key frame quality (~0.6-0.7dB) compared to rd coding,
but speeds up key frame encoding by at least 6x.
Average PNSR/SSIM metrics over RTC clips go down by ~1-2% for speed 6.
Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405
This commit reworks the ONE_LOOP_REDUCED coefficient probability
model update process. It allows model update for every coefficient
across the spectrum at a coarser resolution, instead of performing
precise update only for certain subset of probability models.
The overall runtime remains nearly same (<1% change) for speed -6.
The compression performance is improved by 7.5% in PSNR for speed
-5 and 4.57% for speed -6, respectively.
Change-Id: Ifb17136382ee7e39a9f34ff4a4f09a753125c8d1
Synchronize all threads immediately as a subsequent decode call may
cause a resize invalidating some allocations.
fixes one aspect of crbug.com/437655
Change-Id: Ie993b62c2756478543206ddbe43ec6268d90a470
Change 72056 unfolded some macro definitions,
but lost some alternative behaviour required for
high bitdepth encodes.
This causes the encoder to crash, see issue 884.
Change-Id: I8ce4d73c9fe0a3c10ccb86fba210fabc8b2f0ccc
Also removes some spurious changes in common/vp9_blockd.h which
was introduced by a rebase issue between nextgen and master branches.
Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282
(cherry picked from commit 005d80cd05)
(cherry picked from commit 08d2f54800)
(cherry picked from commit 4230c2306c)
AQ2 modified to use mb_av_energy in defining variance
thresholds used alongside complexity when defining the
segment to be used for an SB64.
Slight improvements in metrics (ssim and PSNR).
Change-Id: Idb9cb73f7d9c4f7118cd7e84ac77b0f25cacbf81
Incorporate segment delta-q into estimated bits.
This generally improves the rate control under cyclic refresh (aq=3) mode.
Change-Id: I1dc60fb230e7d08357fae18909d8ed27bf58e037
This patch greatly increase the strength of AQ1.
Visual tests show strong gains on many clips but their is a big
hit on psnr.
SSIM is more mixed with some winners and losers.
Change-Id: Idaa5d3b41d8576096bfa000b62bc531c3d8bf6a1
Each tile's tok starting address is calculated before the encoding
process. These addresses are stored so that the same calculation
won't be done again in packing bit stream.
Change-Id: I0a3be0301f002260c19a850303f2f73ebc47aa50
When the golden frame is boosted, the rate correction factor is not
correlated well with other inter frames even in CBR mode. This commit
changes to use GF specific rate_correction_factor when gf_cbr_boost
is greater than 20%.
Change-Id: I6312c1564387bcacc11f4c5e8a9cfdc781b5c3ab
This commit allows the encoder to increase the mode test kick-off
thresholds if the previous best mode renders all zero quantized
coefficients, thereby saving motion search runs when possible.
The compression performance of speed -5 and -6 is down by -0.446%
and 0.591%, respectively. The runtime of speed -6 is improved by
10% for many test clips.
vidyo1, 1000 kbps
16578 b/f, 40.316 dB, 7873 ms -> 16575 b/f, 40.262 dB, 7126 ms
nik720p, 1000 kbps
33311 b/f, 38.651 dB, 7263 ms -> 33304 b/f, 38.629 dB, 6865 ms
dark720p, 1000 kbps
33331 b/f, 39.718 dB, 13596 ms -> 33324 b/f, 39.651 dB, 12000 ms
mmoving, 1000 kbps
33263 b/f, 40.983 dB, 7566 ms -> 33259 b/f, 40.978 dB, 7531 ms
Change-Id: I7591617ff113e91125ec32c9b853e257fbc41d90
This patch modified struct VP9_COMP. Created a struct ThreadData
to include data that need to be copied for each thread. In
multiple thread case, one thread processes one tile. all threads
share one copy of VP9_COMP,
(refer to VP9_COMP *cpi in the code)
but each thread has its own copy of ThreadData,
(refer to ThreadData *td in the code).
Therefore, within the scope of encode_tiles(), both cpi and td
need to be passed as function parameters.
In single thread case, the FRAME_COUNTS pointer in ThreadData
points to "counts" in VP9_COMMON.
Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e
The intra mode penalty is covered by intra_cost_penalty. This
commit removes the other intra cost threshold, provided that the
constant 50 is negligible in normal rate-distortion cost.
Change-Id: I9b8b7483c43b9a41741622e7057def1f7d51bb72
This change is made in preparation for a
subsequent patch which adds acceleration
for the highbitdepth transform functions.
The highbitdepth transform functions attempt
to use 16/32bit sse instructions where possible,
but fallback to using the C implementations if
potential overflow is detected. For this reason
the dct routines are made global so they can be
called from the acceleration functions in the
subsequent patch.
Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665
(cherry picked from commit 454342d4e7)
This commit makes a non-RD coding mode decision process for key
frame coding. It can be optionally turned on in speed -6 and above.
Change-Id: I0847258b392877a0210b4768bef88ebc9ad009b5
This commit allows more aggressive decision to skip forward
transform and quantization for luma component in RTC coding mode.
The chroma components remains going through the normal coding
routine, since they are not included in the non-RD mode search
process.
It reduces the runtime cost by 2% - 10%. In speed -6,
vidyo1 1000 kbps
16576 b/f, 40.281 dB, 8402 ms -> 16576 b/f, 40.323 dB, 7764 ms
nik720p 1000 kbps
33337 b/f, 38.622 dB, 7473 ms -> 33299 b/f, 38.660 dB, 7314 ms
dark720p 1000 kbps
33330 b/f, 39.785 dB, 13505 ms -> 33325 b/f, 39.714 dB, 13105 ms
The compression performance of speed -6 is improved by 0.44% in
PSNR and 1.31% in SSIM.
Change-Id: Iae9e3738de6255babea734e5897f29118bebc6d7
In AQ1 a rate adjustment was applied for blocks coded with a
deltaq. This tends to skew the partition selection and cause
rate overshoot.
For example, consider a 64x64 super block where some but not all
sub blocks are in a low q segment and some are in a high q segment.
The choice of Q when considering large partition and transform sizes
is defined by the lowest sub block segment id (currently this implies the
lowest Q). If some parts of the larger partition are very hard this will
cause a high rate component.
The correct behavior here is for the rd code to discard the large partition
choice and break down to sub blocks where some have low and some
have high Q. However the rate correction factor above mask the high
cost of coding at a larger partition size.
Change-Id: Ie077edd0b1b43c094898f481df772ea280b35960
Make the midpoint variance used in AQ mode 1 segmentation
depend on the overall complexity of the frame in two pass.
Change-Id: I452814ec57f7a32352e41bb250e78066abe952dd
Add an additional restriction to bit/complexity based
segmentation based on spatial variance.
Only lower Q when both the number of bits spent
in the initial encoding pass and the spatial complexity are
below a threshold. This will prevent the low Q segments
being used just because there is a surfeit of bits.
Small metrics gains especially opsnr.
derf ~0.2% std-hd ~0.3%
Change-Id: I6a8496d466d673f9b0e2b2ca6304ea7b6d8e1cce
This is the first of a series of patches to restructure and
improve AQ mode 1 (variance based AQ).
Change-Id: Idcf693131a3ea2459dcfd957a54a65b971fa4a2a
Similar to mask_filter, the filter_cache in RD_OPT struct can be
moved out, and declared as a local variable since it is only
used in pick_inter_mode functions.
Change-Id: I412b99cca82bade07ac912064ec03dd1de6b2c17
Correct calculation of number of mbs in two pass code when
frame resizing is enabled. Always use initial number of mbs if
scaling is enabled, as this is what was used in the first pass.
Change-Id: I49a4280ab5a8b1000efcc157a449a081cbb6d410
The mask_filter in RD_OPT struct is used to record rd result in
filter decision. It is only used in pick_inter_mode functions,
and is removed from the struct and declared as a local variable.
Change-Id: I3c95c8632ba7241591ce00ef2ef5677b5e297d7b
The max_partition_size and max_partition_size are set at the
beginning while setting speed features, and then adjusted at
SB level. Moving them to mb struct ensures there is a local
copy for each thread.
Change-Id: I7dd08dc918d9f772fcd718bbd6533e0787720ad4
VP9/DatarateTestVP9Large.ChangingDropFrameThresh/[34] fails post the
merge of commit#ffa06b37. This commit adds reset of rc tracking info
when frame is dropped, and fixes the causes of the bad interaction
between the tests and the previous commit.
Change-Id: I848acfd9fcb336359662274325190f94aac76eae
This commit reworks the forward transform and quantization process
for 8x8 block coding. It combines the two operations in a single
function to save a store/load stage of the original transform
coefficients. Overall the speed -6 is slightly faster (around 1%
range). The compression performance of speed -6 is improved by
3.4%.
Change-Id: Id6628daef123f3e4649248735ec2ad7423629387
In rare cases, the interaction between rate correction factor and Q
choices may cause severe oscillating frame sizes that are way off
target bandwidth. This commit adds tracking of rate control results
for last two frames, and use the information to prevent oscillating
Q choices.
Change-Id: I9a6d125a15652b9bcac0e1fec6d7a1aedc4ed97e
vp9_quantize_fp is the quantization process used by rtc coding
mode. This commit adds a sse2 implementation of it. The
implementation is modified based on vp9_quantize_b_sse2. No speed
difference from ssse3 version.
Change-Id: I24949c5b27df160b4f35117d28858d269454e64a
Current setting had active_worst_quality set too high (close to worst_quality)
for first frame(s) following first key frame. This changes that to be somewhat
more aggressive in allowing active_worst_quality to be lower following key frame.
Also remove the 4/5 reduction in active_worst for key frame as
this should be set by the user qp_max setting.
Change-Id: I0530b3ddcc85c00e3eb7568de1b14a31206c4a4c
The function pointer in compressor instance does not change, so this
commit changes to call the function directly.
Change-Id: I9c9c460e3475711c384b74c9842f0b4f3d037cc5
This commit adds a check condition to the prediction buffering
operation used in the rtc coding mode. This resolves a unit test
warning in example/vpx_tsvc_encoder_vp9_mode_7.
Change-Id: I9fd50d5956948b73b53bd8fc5a16ee66aff61995
These 2 members in RD_OPT were moved to TileDataEnc struct
already, and therefore were removed here.
Change-Id: I22fee3b67f96e473a58e194a7edc76dbd48bfa04
Several frame counters in encoder are updated at SB level. Combine
those counters and put them in a separate struct, which allows us
to allocate one copy for each thread.
Change-Id: I00366296a13c0ada4d8fa12f5e07728388b6cab7
Modified VP9_COMP struct to include MACROBLOCK *mb. This change
makes it feasible in multi-thread case to allocate a mb for each
thread.
Change-Id: I624d6d1aa9c132362200753e5d90b581b1738d6e
A flush bug is discovered during putting frame parallel decoder
into Android. This test will expose that bug.
Change-Id: Ia047f27972f4da0471649f79f1f91e7695297473
Two members in struct CYCLIC_REFRESH
int64_t projected_rate_sb;
int64_t projected_dist_sb;
are updated at the superblock level, which makes them shared data
in the multi-thread situation, and requires extra work to handle
them. However, those values are updated and used immediately, and
therefore can be removed. This patch cleaned up the code and
removed the two members.
Change-Id: I2c6ee4552bf49fb63ce590cdb47f9723974fffb1
Prepare for the introduction of frame-size change
logic into the recode loop.
Separated the speed dependent features into
separate static and dynamic parts, the latter being
those features that are dependent on the frame size.
Change-Id: Ia693e28c5cf069a1a7bf12e49ecf83e440e1d313
Add extra vp9_clear_system_state() calls to fix
double / mmx issue introduced into first pass
code for 32 bit builds.
Change-Id: I84cd2986b80d83650a091ab25c43755efeb82e03
Rate correction factor is used to correct the estimated rate for any
given quantizer, and feeds into rate control for quantizer selection.
We make use of the actual bits used to calculate this rate correction
factor with an adjustment limit to prevent over-adjustment.
This commit adapts the adjustment limit to the difference between the
estimated bits and the actual bits, allows the adjustment limit to vary
between 0.125 (when estimate is close to actual) and 0.625 (when there
is >10X factor off between estimated and actual bits). By doing this,
the commit appears to have largely corrected two observed issues:
1. Adjustment is too slow when the actual bits used is way off from
estimate due to the small adjustment limit.
2. Extreme oscillating quantizer choices due to the feedback loop.
Change-Id: I4ee148d2c9d26d173b6c48011313ddb07ce2d7d6
This commit makes the speed -6 and above use the reconstructed
boundary pixels for precise intra prediction. This allows more
intra prediction modes to be tested in the non-RD coding process.
Enabling horizontal and vertical intra prediction modes can
improve the speed -6 compression performance for rtc set
by 0.331%.
Change-Id: I3a99f9d12c6af54de2bdbf28c76eab8e0905f744
I0c5f010 changed to allow update golden reference buffer in CBR mode,
this commit changes the use of rate_correction_factor for those frames
to be aligned with the new usage. This commit attempts to solve two
issues:
a. Initialization of rate correction factor for Golden Frame
Prior to this patch, even the regular inter frame has been update
the rate correction factor based on content and encoding results,
the first golden frame would still use the ininitialized value
that can be way off.
b. Allowing rate correction factor update to be slightly faster
Prior to this patch, when the rate correction factor is off, the
update to the factor is too slow, the factor could not get close
to a semi-correct value even after many frames.
The commit helps all clips in psnr/ssim metric, but especially to
a few clip in RTC set that rate correction was way off. For example
thaloundeskmtgvga gained about .5dB for both overall/average psnr.
Change-Id: I0be5c41691be57891d824505348b64be87fa3545
Adds support for one-pass rc-enabled SVC encoder with callbacks for
getting per-layer packets.
- the callback function registration is implemented as an encoder
control function.
- if the callback function is not registered, the old way of
aggregating packets with superframe will take effect.
- one more control function “VP9E_GET_SVC_LAYER_ID” has been
implemented to get the temporal/spatial id from the encoder
within the callback. This can be used to get the ids to put on RTP
packet.
Change-Id: I1a90e00135dde65da128b758e6c00b57299a111a
This commit rename a reserved color space entry to BT_2020, it intends
to provide support for VP9 bitstream to pass along the color space
type defined in BT.2020(Rec.2020)
please note this entry does not have any effect on encoding/decoding
behavior, but allow applications to the pass the information along
from encoding end to decoding end.
Change-Id: I4678520e89141ea5e8900f7bd1c0e95b710b7091
This commit integrates the non-RD mode decision process and the
encoding process into a single recursion scheme.
Change-Id: I6a7e72a0b84d567554801ebbe01ec75d54c1f77d
This patch was to fix the vpxdec fuzzing3 test failure. When an
error occurs, setjmp() is invoked, which calls the decoder
removing routine. In multiple thread situation, other threads
could try to access the frame context memory that is already
deallocated, thus causing a segfault.
An invalid unit test was added for this issue.
Change-Id: Ida7442154f3d89759483f0f4fe0324041fffb952
The aim of this patch is to apply a positive weighting to
frames that have a significant number of blocks that are
of low spatial complexity and are dark. The rationale behind
this is that artifacts tend to be more visible in such frames.
In this patch the weight is only applied in regard to the distribution
of bits between frames. Hence if all the frames share similar
characteristics (as is the case for most of our short test clips) there
will be little or no net effect.
However, the effect can be seen on some longer form test content.
For example Tears of steel baseline test:
2323.09 Kbit/s opsnr 39.915 ssim 74.729
With this patch:-
2213.34 Kbit/s opsnr 39.963 ssim 74.808
(Sligtly better metrics and about 5% smaller)
The weighting may well need some further tuning along side changes
to the aq modes.
Change-Id: Ieced379bca03938166ab87b2b97f55d94948904c
This commit removes the cyclic aq mode dependency on
in_static_area and reworks the corresponding cut-off thresholds.
It improves the compression performance of speed -5 by 1.47% in
PSNR and 2.07% in SSIM, and the compression performance of speed
-6 by 3.10% in PSNR and 5.25% in SSIM. Speed wise, about 1% faster
in both settings at high bit-rates.
Change-Id: I1ffc775afdc047964448d9dff5751491ba4ff4a9
This will save the memory and improve the decode speed due to
removing unnecessary memset of big prev_mi array for
all the key frames.
Decoding a all key frames 1080p video shows speed improve around 2%.
Change-Id: I6284a445c1291056e3c15135c3c20d502f791c10
This commit makes the RTC coding mode to conditionally skip the
reference frame mode search, when the predicted motion vector of
the current reference frame gives more than two times sum of
absolute difference compared to that of other reference frames.
It reduces the runtim by 1% - 4% for speed -5 and -6. The average
compression performance is improved by about 0.1% in both settings.
It is of particular benefit to light change scenarios. The
compression performance of test clip mmmovingvga.y4m is improved by
6.39% and 15.69% at high bit rates for speed -5 and -6, respectively.
Speed -5
vidyo1 16555 b/f, 40.818 dB, 12422 ms ->
16552 b/f, 40.804 dB, 12100 ms
nik 33211 b/f, 39.138 dB, 11341 ms ->
33228 b/f, 39.139 dB, 11023 ms
mmmoving 33263 b/f, 40.935 dB, 13508 ms ->
33256 b/f, 41.068 dB, 12861 ms
Speed -6
vidyo1 16541 b/f, 40.227 dB, 8437 ms ->
16540 b/f, 40.220 dB, 8216 ms
nik 33272 b/f, 38.399 dB, 7610 ms ->
33267 b/f, 38.414 dB, 7490 ms
mmmoving 33255 b/f, 40.555 dB, 7523 ms ->
33257 b/f, 40.975 dB, 7493 ms
Change-Id: Id2aef76ef74a3cba5e9a82a83b792144948c6a91
This commit unfolds the legacy macro definitions used in the
sub-pixel motion search and refactors the operational flow for
later optimizations.
Change-Id: I3e3f770cad961d03d1a6eb0b2a0186cc77eaf2b8
The current logic was allowing for disabling golden refresh only
for two pass svc encoding. This change disables it as long as
more than 1 layer encoding is used (for example temporal layers under 1pass CBR).
Change-Id: I4dc5204a7ad365c821ec7963e93b59da82e1826b
In the function mb_lpf_horizontal_edge_w_avx2_16 the usage of the intrinsic
_mm256_cvtepu8_epi16 cause a compiler bug in gcc 4.9.1.
until it will be fixed I created a workaround that create the up convert by
using broadcast128+shuffle.
The bug was reported here:
https://code.google.com/p/webm/issues/detail?id=867
Change-Id: I73452e6806f42e0fadcde96b804ea3afa7eeb351
A recent change has introduced big quality drops for speed 7 and 12
for --rt mode. The change reverted the big drop and improved quality
by 9.5% for speed 7 and 13.4% for speed 12.
Change-Id: I07b82e3bb6002a73af486a083458c88877bdad01
This will save a lot of memory for decoder due to removing of prev_mi,
but prev_mi is still needed in encoder. So this will increase a little bit
memory for encoder.
Change-Id: I24b2f1a423ebffa55a9bd2fcee1077dac995b2ed
This commit makes the inter prediction buffer system to support
hybrid partition search. It reduces the runtime of speed -5 by
about 3%. No compression performance change.
vidyo1 720p 1000 kbps
11831 ms -> 11497 ms
nik 720p 1000 kbps
10919 ms -> 10645 ms
Change-Id: I5b2da747c6395c253cd074d3907f5402e1840c36
Combined vp9_denoiser_8xM_sse2 and vp9_denoiser_4xM_sse2 into one
function vp9_denoiser_NxM_sse2_small and passed the bitexact testing.
Changed the name of the function vp9_denoiser_64_32_16xM_sse2 to
vp9_denoiser_NxM_sse2_big.
Change-Id: Ib22478df585994dd347ebae04202c0b701e7f451