678 Commits

Author SHA1 Message Date
Debargha Mukherjee
560a15e62d Adds higher precision for homography model 3rd row
Also adds a function to integerize a double model.

Change-Id: Ie09b3e165492cf66ab81fe25d4bc2422a5e6defd
2016-06-09 04:12:57 -07:00
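A minimal sketch of what integerizing a double model (560a15e62d) could look like, assuming a fixed-point representation in which the third homography row gets more fractional bits than the first two. The precision constants, the parameter layout, and the function names are all hypothetical and are not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define AFFINE_PREC_BITS 8 /* hypothetical precision for rows 1 and 2 */
#define ROW3_PREC_BITS 12  /* hypothetical higher precision for row 3 */

/* Convert to fixed point with `bits` fractional bits, rounding to nearest. */
static int32_t to_fixed(double v, int bits) {
  assert(bits >= 0 && bits < 31);
  const double scaled = v * (double)(1 << bits);
  return (int32_t)(scaled >= 0 ? scaled + 0.5 : scaled - 0.5);
}

/* h holds 8 homography parameters in an assumed row-major order:
 * h[6] and h[7] form the third row, and the ninth entry is fixed at 1. */
static void integerize_model(const double h[8], int32_t out[8]) {
  for (int i = 0; i < 8; ++i)
    out[i] = to_fixed(h[i], i < 6 ? AFFINE_PREC_BITS : ROW3_PREC_BITS);
}
```

Third-row entries of a homography are typically small, so in this sketch a value such as 0.001 collapses to 0 at 8 bits but survives as 4 at 12 bits.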
Jingning Han
68cd946994 Add MIN_TX_SIZE definition
Change-Id: I399d601d40827ac383a6687cbeaec59e9a9c63e4
2016-06-08 11:29:02 -07:00
Debargha Mukherjee
d3180b8b97 Merge "Fix build failure happened in reconinter.c" into nextgenv2 2016-06-07 14:22:25 +00:00
Debargha Mukherjee
13155e7725 Merge "Optimize wedge partition selection." into nextgenv2 2016-06-07 09:50:13 +00:00
Aamir Anis
99d9a8fe30 Updated loop restoration
1. Wiener restoration filter now has normalization and evaluation of
quantization procedure.
2. Corrected scaling of bits in RD cost computation.
3. Changed dynamic range and number of bits for Wiener filter.
Observed gains: Overall 0.58% for low_res, 0.7% for mid_res sequences.

Change-Id: I8928b3ea493bfe1790926b00388d6c4bafc08e19
2016-06-06 15:49:52 -07:00
Angie Chiang
2250c6b07b Fix build failure happened in reconinter.c
Change-Id: Ifd5ed91e4e91238fb53a202c8d76c11fbb9ccf7c
2016-06-06 14:41:14 -07:00
Debargha Mukherjee
c2ebd0e6da Merge "Move range checks into WRAPLOW" 2016-06-06 16:28:24 +00:00
Geza Lore
efda2831e5 Optimize wedge partition selection.
We can optimize wedge partition selection by pre-computing the
residuals of the 2 underlying predictors, and then blend these
to compute the sse of the compound predictor, without actually
having to compute and subtract the compound predictor.

Similarly we can pre-compute a proxy array which we can use to
cheaply check which mask sign would have lower sse.

Details are in wedge_utils.c.

Mathematically these are equivalence transformations, but due to the
finite precision the encoder output will be perturbed, though on
average this should make 0% difference.

ext-inter gains a ~4.5% speedup.

Change-Id: Ib2657c3209ae161b4090b58b4b6c392641bf2792
2016-06-06 14:43:10 +01:00
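The identity behind efda2831e5 can be sketched as follows. With a mask weight m in [0, 64] (an assumption, borrowed from the 6-bit mask naming used elsewhere in this log), the compound residual is the same blend of the two single-predictor residuals, so the SSE can be accumulated from the pre-computed residuals alone. Function names are hypothetical, and rounding is deliberately left out so both paths match exactly; the actual implementation is in wedge_utils.c.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

#define MAX_MASK 64 /* assumed mask range: weights in [0, 64] */

/* Reference path: form 64 * (src - compound) per pixel, then square. */
static uint64_t sse_via_compound(const uint8_t *src, const uint8_t *p0,
                                 const uint8_t *p1, const uint8_t *mask,
                                 size_t n) {
  uint64_t sse = 0;
  for (size_t i = 0; i < n; ++i) {
    assert(mask[i] <= MAX_MASK);
    const int32_t pred = mask[i] * p0[i] + (MAX_MASK - mask[i]) * p1[i];
    const int64_t diff = MAX_MASK * src[i] - pred;
    sse += (uint64_t)(diff * diff);
  }
  return sse;
}

/* Optimized path: r0 = src - p0 and r1 = src - p1 are computed once and
 * reused for every candidate mask; no compound predictor is built. */
static uint64_t sse_via_residuals(const int16_t *r0, const int16_t *r1,
                                  const uint8_t *mask, size_t n) {
  uint64_t sse = 0;
  for (size_t i = 0; i < n; ++i) {
    assert(mask[i] <= MAX_MASK);
    const int64_t diff = mask[i] * r0[i] + (MAX_MASK - mask[i]) * r1[i];
    sse += (uint64_t)(diff * diff);
  }
  return sse;
}
```

Both functions return the SSE scaled by 64^2. In a real encoder the compound predictor is rounded to pixel precision, which is where the small perturbation mentioned in the commit message comes from.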
Debargha Mukherjee
aa90983696 Move range checks into WRAPLOW
Provides more comprehensive coverage for --enable-coefficient-checking.
The intent is to make the --enable-coefficient-checking option
consistent with the VP9 spec.

Change-Id: I12d0120756d17572ca2b2d7e6a2ab9d8071d8d58
2016-06-03 11:27:33 -07:00
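A rough sketch of the idea in aa90983696, assuming WRAPLOW is the point where intermediate inverse-transform values are narrowed to 16 bits. Putting the range check inside the wrapping helper means every call site is checked. The names below, including the COEFF_RANGE_CHECK switch, are hypothetical stand-ins for the real macro and configure option.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical stand-in for the build-time checking option. */
#ifndef COEFF_RANGE_CHECK
#define COEFF_RANGE_CHECK 1
#endif

/* Abort (in debug builds) if a value would not fit in 16 bits. */
static int32_t check_range(int32_t x) {
#if COEFF_RANGE_CHECK
  assert(x >= INT16_MIN && x <= INT16_MAX);
#endif
  return x;
}

/* Every caller that narrows a value goes through the check first. */
static int32_t wraplow(int32_t x) { return (int16_t)check_range(x); }
```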
Debargha Mukherjee
cbf51c5ba0 Merge "Pre-compute and use contiguous wedge masks." into nextgenv2 2016-06-03 13:27:02 +00:00
Geza Lore
ab29978e9f Pre-compute and use contiguous wedge masks.
This is purely a refactoring patch and has no functional effect.

Uses of these masks can be arranged such that all input blocks are
contiguous in memory (stride == block width). In this case 1D versions
of operations can be used. 1D vector operations have superior performance
over 2D block equivalents as they are more processor cache friendly and
they can do away with a second loop overhead.

Change-Id: I2b76c9888aea2c857cc497e8a4b2841fd3dad54e
2016-06-03 00:16:22 -07:00
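The contiguity argument in ab29978e9f can be illustrated with a mask-weighted sum. When stride equals the block width, the rows sit back to back in memory and the nested loop collapses into a single pass. This is only a sketch with hypothetical function names, not the code from the patch.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 2D form: works for any stride, but needs a loop per row. */
static uint32_t weighted_sum_2d(const uint8_t *buf, size_t buf_stride,
                                const uint8_t *mask, size_t mask_stride,
                                size_t w, size_t h) {
  assert(buf_stride >= w && mask_stride >= w);
  uint32_t sum = 0;
  for (size_t y = 0; y < h; ++y)
    for (size_t x = 0; x < w; ++x)
      sum += (uint32_t)mask[y * mask_stride + x] * buf[y * buf_stride + x];
  return sum;
}

/* 1D form: valid only when both strides equal the block width, so the
 * block is one contiguous run of n = w * h samples. */
static uint32_t weighted_sum_1d(const uint8_t *buf, const uint8_t *mask,
                                size_t n) {
  uint32_t sum = 0;
  for (size_t i = 0; i < n; ++i) sum += (uint32_t)mask[i] * buf[i];
  return sum;
}
```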
Debargha Mukherjee
17c4f1c7f5 Merge "Use standard rounding in combine_interintra." into nextgenv2 2016-06-02 19:29:16 +00:00
Debargha Mukherjee
7534a15c3a Merge "Warped motion functions added" into nextgenv2 2016-06-02 19:28:03 +00:00
Geza Lore
888e90e823 Use standard rounding in combine_interintra.
Use the same rounding method that is used throughout the codebase,
where the halfway value is rounded up rather than down.

Change-Id: I04e92850bc69a7d7a07b06e3d2ce97f6f2ada321
2016-06-02 16:26:05 +01:00
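The rounding convention named in 888e90e823 can be sketched for a power-of-two divisor. The half-up form adds half the divisor before shifting; the half-down form is shown only for contrast and is not taken from the code that the commit replaced. Function names are hypothetical, and inputs are assumed to be non-negative.

```c
#include <assert.h>
#include <stdint.h>

/* Divide by 2^n, rounding the halfway value up. */
static int32_t round_half_up_pow2(int32_t value, int n) {
  assert(value >= 0 && n >= 1 && n < 31);
  return (value + (1 << (n - 1))) >> n;
}

/* Divide by 2^n, rounding the halfway value down (for contrast only). */
static int32_t round_half_down_pow2(int32_t value, int n) {
  assert(value >= 0 && n >= 1 && n < 31);
  return (value + (1 << (n - 1)) - 1) >> n;
}
```

The two forms differ only on exact halfway values: 96 / 64 = 1.5 becomes 2 with the first and 1 with the second, while 97 / 64 rounds to 2 with both.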
Debargha Mukherjee
faf3c2cd38 Warped motion functions added
Change-Id: I5064ef1421e17c3ecafe70e7ff1fc7db0c16cc8f
2016-05-31 14:03:23 -07:00
Zoe Liu
e89ca180c2 Make the bi-predictive frame group interval adjustable
This is for the bidir-pred experiment. Previously the length of the
bi-predictive frame group interval was fixed at 2, i.e. one
bi-predictive frame may be inserted every other frame. This patch
makes the length adjustable, i.e. any positive number may be
specified, but the use of the backward ref will be turned off if the
bi-predictive frame group interval is larger than the golden frame
group.

Further, an additional rate factor level has been added: INTER_LOW,
which applies to LAST_BIPRED_UPDATE frames that are not used as
references.

Change-Id: I5514d34a64dd486bbb5756c2d0612946f598a789
2016-05-28 16:46:45 -07:00
Linfeng Zhang
af7fb17c09 Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10.
Function level timing test shows about 27% time saving on
a Xeon E5-2680 v2 desktop.

Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
duplicate basenames.

Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
are identical. TODO: They should be unified later if there is
no intention to keep a duplicate.

Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
2016-05-27 09:51:16 -07:00
Hui Su
e717ece4ab Merge "Add a quick path in build_intra_predictors" into nextgenv2 2016-05-26 22:12:53 +00:00
hui su
bad6e169bf Add a quick path in build_intra_predictors
For the cases where no reference data is available.

Change-Id: Ibf1ac9b7073acc2c7fc44da893f3d608dc74bc1e
2016-05-25 15:21:57 -07:00
Yi Luo
bfe4c0ae07 Integrate HBD inverse HT flip types sse4.1 optimization
- tx_size: 4x4, 8x8, 16x16.
- tx_type: FLIPADST_DCT, DCT_FLIPADST, FLIPADST_FLIPADST,
  ADST_FLIPADST, FLIPADST_ADST.
- Encoder speed improvement:
  park_joy_1080p_12: ~11%, crowd_run_1080p_12: ~7%.
- Add unit test cases for bit-exactness against C.

Change-Id: Ia69d069031fa76c4625e845bfbfe7e6f6ed6e841
2016-05-25 12:32:10 -07:00
Yi Luo
cb507ff29a Merge "HBD inverse HT 8x8 and 16x16 sse4.1 optimization" into nextgenv2 2016-05-24 22:06:07 +00:00
Zoe Liu
cf5083d4cd Added an experiment "bidir_pred" for backward prediction
Major parts have been implemented as follows:
(1) Added BRF_UPDATE, LASTNRF_UPDATE, and NRF_UPDATE in firstpass.c;
(2) Added the handling for the scenario of
"cpi->common.show_existing_frame == 1" at the encoder;
(3) Added a new reference frame of BWDREF_FRAME;
(4) Have bwd-ref work with upsampled references.

Note that when the "ext_refs" experiment is turned on, this experiment
is currently turned off automatically.

RD performance in Overall PSNR has been improved, compared against the
VP10 baseline:

lowres: Avg -3.312; BDRate -3.154
derflr: Avg -1.927; BDRate -1.176
midres: Avg -2.149; BDRate -2.001
hdres : Avg -0.567; BDRate -0.588

Change-Id: I4c06ff51cc20194bffbd4d2346e57ba3dcf6b62c
2016-05-24 13:55:57 -07:00
Yi Luo
28cdee448d HBD inverse HT 8x8 and 16x16 sse4.1 optimization
- Covers tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Encoding speed improves ~27% on crowd_run_1080p_12.
- Merge 4x4, 8x8, 16x16 unit tests in one test file.

Change-Id: I058ef5254d068a9523a826480c78ebbdd231824c
2016-05-24 12:55:30 -07:00
Geza Lore
2935b4db0e Remove redundant memcpy from wedge predictor.
Removing redundant calls to memcpy from
build_wedge_inter_predictor_from_buf yields a net 4% encoder speedup
with ext-inter only. The output is identical.

Change-Id: If97d4e323a5c8aca90c84a25a72085e006b05446
2016-05-24 11:31:18 +01:00
Geza Lore
62b6331753 Pick up bit-depth from the right place
Change-Id: Icbdb036d7927b77b84bd78e8348ec8b5be88df08
2016-05-24 11:08:23 +01:00
Debargha Mukherjee
fb65f9b54b Merge "Add optimized vpx_blend_mask6" into nextgenv2 2016-05-23 23:43:52 +00:00
Geza Lore
a661bc87c4 Add optimized vpx_blend_mask6
This is to replace vp10/common/reconinter.c:build_masked_compound.
Functionality is equivalent, but the interface is slightly more
generic.

Total encoder speedup with ext-inter: ~7.5%

Change-Id: Iee18b83ae324ffc9c7f7dc16d4b2b06adb4d4305
2016-05-23 16:28:58 +01:00
Debargha Mukherjee
fa5022978d Merge "Wedge refactoring to handle signs better" into nextgenv2 2016-05-20 23:19:39 +00:00
Debargha Mukherjee
e5de2ad632 Wedge refactoring to handle signs better
Mostly refactoring. Handles signs better though results are
more or less neutral.

Change-Id: If499537c8f8da4f34d104ebfda072eb4c85fb12f
2016-05-20 14:12:52 -07:00
Yaowu Xu
93921097a6 Merge "Properly handle the filter extension in highbd setting" into nextgenv2 2016-05-20 20:00:51 +00:00
Yaowu Xu
7fd0e1b991 Merge "Port change to highbitdepth code path" into nextgenv2 2016-05-20 19:59:41 +00:00
Yaowu Xu
ba794ea356 Port change to highbitdepth code path
This fixes the encoder crash when configured with both highbitdepth
and dual-filter.

Change-Id: Ie06cc528094f4b31b7fc0ba75e7b15cae031d707
2016-05-20 11:30:37 -07:00
Hui Su
83713e7059 Merge "Use standard rounding in intra filters." into nextgenv2 2016-05-20 16:36:04 +00:00
Jingning Han
f1c283f4de Merge "Rework sub8x8 chroma component inter predictor" into nextgenv2 2016-05-20 15:50:32 +00:00
Jingning Han
d84a2e7dc0 Properly handle the filter extension in highbd setting
This commit makes the filter extension in highbd aware of the
dual filter and ext-interp experiments to prevent enc/dec mismatch
when both experiments are turned on.

Change-Id: I11ac1f041bd5f73d61e839d6386d9c5d008da3f7
2016-05-19 09:59:48 -07:00
Yi Luo
5fec33012e Merge "Fix to conform Google's coding convention" into nextgenv2 2016-05-19 16:07:01 +00:00
Jingning Han
0f513752a0 Rework sub8x8 chroma component inter predictor
This commit makes the sub8x8 chroma component inter predictor
operate at 2x2 block level. This allows one to use the actual motion
vector associated with each individual pixel block. It improves the
compression performance:

lowres  0.40%
midres  0.25%
hdres   0.15%

Change-Id: Ia40e07cc7fde463dbf660018850e024932136c4f
2016-05-19 09:03:57 -07:00
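One way to picture the 2x2 chroma handling in 0f513752a0, assuming 4:2:0 subsampling and an 8x8 luma block split into four 4x4 sub-blocks, each with its own motion vector. The 4x4 chroma block then divides into four 2x2 blocks, one per luma sub-block. The type and function names are hypothetical.

```c
#include <assert.h>

typedef struct {
  int row;
  int col;
} MotionVector; /* hypothetical type for this sketch */

/* Raster index (0..3) of the 4x4 luma sub-block that covers chroma pixel
 * (cx, cy) of the 4x4 chroma block, assuming 4:2:0 subsampling. */
static int luma_subblock_for_chroma(int cx, int cy) {
  assert(cx >= 0 && cx < 4 && cy >= 0 && cy < 4);
  return (cy >> 1) * 2 + (cx >> 1);
}

/* Pick the motion vector actually associated with a chroma pixel, rather
 * than one shared vector for the whole chroma block. */
static MotionVector chroma_mv(const MotionVector luma_mv[4], int cx, int cy) {
  return luma_mv[luma_subblock_for_chroma(cx, cy)];
}
```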
Jingning Han
936ed0804d Merge "Account sub8x8 block reference filter type for prob context" into nextgenv2 2016-05-19 15:04:31 +00:00
Jingning Han
300083da27 Merge "Re-structure probability model context for prediction filter type" into nextgenv2 2016-05-19 15:04:18 +00:00
Geza Lore
fa63b5514a Use standard rounding in intra filters.
Use the same rounding method that is used throughout the codebase,
where the halfway value is rounded up rather than down.

Change-Id: Ie969ed7eb9fcc88a93a90c7e274fd82f336c7e4d
2016-05-19 13:16:42 +01:00
Yi Luo
346d2449f0 Fix to conform Google's coding convention
- Confirm input coeff buffer is 16-byte aligned.
- sizeof() prefer variable name instead of type.
- Fix function name (Capital first letter then Pascal case).
- Long base class name uses a newline (with colon and 4 space indent).
- Remove an unnecessary reference function variable.
- Method declaration precedes variable declaration in class definition.

Change-Id: I317f7e679926b5219f58c5f7d14512e94985e7fe
2016-05-18 18:15:53 -07:00
Jingning Han
9161464f6c Account sub8x8 block reference filter type for prob context
If a reference block is coded with sub8x8 block size, and if it
has sub-pixel level motion vectors, its prediction filter type
should be used as context information.

The coding performance gains of the dual filter type coding scheme are:
lowres  0.57%
hdres   0.88%

Change-Id: I68b98f2518d02f11c29d0256aeb45b2580fe5cac
2016-05-18 12:35:31 -07:00
Angie Chiang
6f28581b26 Turn on flip in inverse txfm2d
Fix build failure
Reduce txfm test time

Change-Id: Ieaf6b27f3a272d06286f817f01230413fa8adcf6
2016-05-18 11:26:57 -07:00
Jingning Han
27d44a1843 Re-structure probability model context for prediction filter type
This commit reworks the probability model contexts used in the
prediction filter type entropy coding.

Change-Id: I7abc68cb469248d0d7ca1046da3c086ecb7b066a
2016-05-18 11:11:43 -07:00
Jingning Han
436f78fab7 Silence compiler warnings in unsigned int
Add suffix u to clarify the unsigned int constant when the value
is above 2^31.

Change-Id: Ic712096285b7bf37deaeb5ad1b6b117fc0d67093
2016-05-17 16:46:42 -07:00
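A small illustration of the kind of constant 436f78fab7 refers to. An unsuffixed decimal constant that does not fit in int is given a different type depending on the C standard in use, which compilers commonly warn about; the u suffix states the intent explicitly. The function below is a hypothetical example, not code from the patch.

```c
#include <assert.h>
#include <stdint.h>

/* 2^31 does not fit in a 32-bit signed int, so the constant carries a
 * `u` suffix to make it unsigned regardless of language standard. */
static uint32_t set_top_bit(uint32_t x) {
  const uint32_t top = 2147483648u;
  assert((top >> 31) == 1u);
  return x | top;
}
```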
Jingning Han
c4e7fde68a Merge "Properly handle 2D filter boundary extension" into nextgenv2 2016-05-16 21:34:28 +00:00
Yi Luo
ceabb00704 Merge "HBD inverse HT 4x4 SSE4.1 optimization" into nextgenv2 2016-05-16 21:15:08 +00:00
Debargha Mukherjee
250c6af087 Merge "Various wedge enhancements" into nextgenv2 2016-05-16 21:11:35 +00:00
Debargha Mukherjee
fb8ea1736b Various wedge enhancements
Increases number of wedges for smaller blocks and removes
wedge coding mode for blocks larger than 32x32.

Also adds various other enhancements for subsequent experimentation,
including adding provision for multiple smoothing functions
(though one is used currently), adds a speed feature that decides
the sign for interinter wedges using a fast mechanism, and refactors
wedge representations.

lowres: -2.651% BDRATE

Most of the gain is due to increase in codebook size for 8x8 - 16x16.

Change-Id: I50669f558c8d0d45e5a6f70aca4385a185b58b5b
2016-05-16 12:41:47 -07:00
Jingning Han
14dd5538e9 Properly handle 2D filter boundary extension
The amount of border extension needed in the first stage inter
filtering is decided by the length of the second stage filter
kernel.

Change-Id: Icddbc58c02234d5df09ff0eeebcf166ffe689203
2016-05-16 11:49:27 -07:00
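The dependency described in 14dd5538e9 can be sketched for a separable filter applied horizontally first and vertically second, assuming unscaled prediction. The first stage has to produce enough extra rows for the second-stage kernel, so the count follows the vertical tap length rather than the horizontal one. The split of extra rows above and below assumes an even-length kernel; all names are hypothetical.

```c
#include <assert.h>

/* Rows the first (horizontal) stage must output so that a vertical filter
 * with taps_v taps can produce h output rows. */
static int intermediate_rows(int h, int taps_v) {
  assert(h > 0 && taps_v > 0);
  return h + taps_v - 1;
}

/* Extra rows needed above and below the block, assuming an even-length
 * kernel with taps_v / 2 - 1 taps before the current row. */
static int rows_above(int taps_v) { return taps_v / 2 - 1; }
static int rows_below(int taps_v) { return taps_v / 2; }
```

With an 8-tap vertical kernel an 8-row block needs 15 intermediate rows (3 above, 4 below); with a 12-tap vertical kernel the same block needs 19, whatever the horizontal kernel length is.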