Function level timing test shows about 27% time saving on
a Xeon E5-2680 v2 desktop.
Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
duplicate basenames.
Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
are identical. TODO: They should be unified later if there is
no intention to keep a duplicate.
Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
Major parts have been implemented as follows:
(1) Added BRF_UPDATE, LASTNRF_UPDATE, and NRF_UPDATE in firstpass.c;
(2) Added the handling for the scenario of
"cpi->common.show_existing_frame == 1" at the encoder;
(3) Added a new reference frame of BWDREF_FRAME;
(4) Have bwd-ref work with upsampled references.
Note that when the experiment of "ext_refs" turned on, this experiment
will be turned off automatically currently.
RD performance in Overall PSNR has been improved, compared against the
VP10 baseline:
lowres: Avg -3.312; BDRate -3.154
derflr: Avg -1.927; BDRate -1.176
midres: Avg -2.149; BDRate -2.001
hdres : Avg -0.567; BDRate -0.588
Change-Id: I4c06ff51cc20194bffbd4d2346e57ba3dcf6b62c
Removing redundant calls to memcpy from
build_wedge_inter_predictor_from_buf yields a net 4% encoder speedup
with ext-inter only. The output is identical.
Change-Id: If97d4e323a5c8aca90c84a25a72085e006b05446
This patch always compares the most recent show frames between
the encoder and the decoder to test the mismatch.
Change-Id: I68a91ad0996a598231450debfd616e24992419b5
This is to replace vp10/common/reconinter.c:build_masked_compound.
Functionality is equivalent, but the interface is slightly more
generic.
Total encoder speedup with ext-inter: ~7.5%
Change-Id: Iee18b83ae324ffc9c7f7dc16d4b2b06adb4d4305
This commit makes the filter extension in highbd aware of the
dual filter and ext-interp experiments to prevent enc/dec mismatch
when both experiments are turned on.
Change-Id: I11ac1f041bd5f73d61e839d6386d9c5d008da3f7
This commit makes the sub8x8 chroma component inter predictor
operate at 2x2 block level. This allows one to use the actual motion
vector associated with each individal pixel block. It improves the
compression performance
lowres 0.40%
midres 0.25%
hdres 0.15%
Change-Id: Ia40e07cc7fde463dbf660018850e024932136c4f
Use the same rounding method that is used throughout the codebase,
where the halfway value is rounded up rather than down.
Change-Id: Ie969ed7eb9fcc88a93a90c7e274fd82f336c7e4d
With ext-interp, a switchable interpolation filter is coded iff the
motion vector uses fractional pixel movement (ie, true subpixel
movement). With ext-interp and obmc enabled at the same time, the RD
search proceeds as:
1. Do motion search
2. Do interpolation filter search iff subpixel motion, otherwise use
EIGHTTAP_REGULAR
3. Evaluate obmc=0
4. Evaluete obmc=1 - This involves another motion search
If the motion search in step 4 yields an integer motion vector, while
the search in step 1 did not, then an interp_filter value other than
EIGHTTAP_REGULAR is invalid, and will cause an assertion failure
at output time, or a mismatch if not using --enable-debug.
The fix sets the interp_filter to EIGHTTAP_REGULAR if obmc=1 is picked
with an integer motion vector.
Change-Id: I4685d1ad537f41d833dc9eb64845956b67886cca
- Confirm input coeff buffer is 16-byte aligned.
- sizeof() prefer variable name instead of type.
- Fix function name (Capital first letter then Pascal case).
- Long base class name uses a newline (with colon and 4 space indent).
- Remove a unnecessary reference function variable.
- Method declaration precedes variable declaration in class definition.
Change-Id: I317f7e679926b5219f58c5f7d14512e94985e7fe
If a reference block is coded with sub8x8 block size, and if it
has sub-pixel level motion vectors, its prediction filter type
should be used as context information.
The coding performance gains of dual filter type coding scheme are
lowres 0.57%
hdres 0.88%
Change-Id: I68b98f2518d02f11c29d0256aeb45b2580fe5cac
This commit reworks the probability model contexts used in the
prediction filter type entropy coding.
Change-Id: I7abc68cb469248d0d7ca1046da3c086ecb7b066a
- Integrate 5 flip transform types for each 4x4, 8x8, and 16x16
block, for experiment, EXT_TX.
- Encoder speed improves about 12%-15%.
- Update the unit tests for bit-exact result against C.
Change-Id: Idf27c87f1e516ca5b66c7b70142477a115404ccb
Use model for interintra mode search.
Speed-up about 5-10% with about 0.04 drop in efficiency.
lowres: -2.60%
Change-Id: I825bf0ba8a46eb7f19fc528c25b8df066fb8ea95
minus the non-existent nonrd portion. original change:
commit d642294b1c57a5adacb1038ff45766c38bae8a6d
Author: Jingning Han <jingning@google.com>
Date: Thu Feb 11 12:36:49 2016 -0800
Fix tsan error in VP9 sub8x8 intra mode search
This commit fixes issue 1141. The issue was triggered in multi-tile
encoding. The change properly saves and restores the block context
information in the real-time mode selection process. It removes
several redundant memcpy operations in sub8x8 intra block mode
search.
Change-Id: I35c9ad197f4bd500ec39b5fc833f052f19eee010
Change-Id: Idfa38c54c9e645479f6870d46f71fb1e91c071da
For the current stage, we assume a single prediction filter type
per direction in the settings of compound inter prediction modes.
Change-Id: I12a1afdd364b93fcee870bd11ad01fc40ab48cff