45 Commits

Author SHA1 Message Date
Yaowu Xu
7e89c102c4 vp9-highbitdepth -> vpx-highbitdepth
Change-Id: I1e90cf7ab4bb02c0ef119b0bd1596771edefedff
2016-08-05 15:41:33 -07:00
Yaowu Xu
d4c4724090 Cherry pick renaming changes from AOMedia branch
Manually cherry-picked the following changes:
8c8d16de vp9 -> vpx in names
75b57d39 VP9_ -> VPX_ in function names
761a7088 VP9_INTERP_EXTEND -> VPX_INTERP_EXTEND
4273a52c VP9->VPX in border pixel macros
03568c31 VP9_FRAME_MARKER -> VPX_FRAME_MARKER
2334f51d VP9->VPX in fdct function names

Change-Id: Icc18dbf4b416dd0fa21033b3e19ab8a47c893508
2016-07-29 13:31:32 -07:00
Sarah Parker
166c3250a3 Add buf0, width, height fields to buf_2d
These are needed for the warping function in the global motion
experiment.

Change-Id: Iaab176d0c0b90f6b938e2bac48b24c07e87e3cd9
2016-07-18 11:04:56 -07:00
Geza Lore
4c4f04ac11 Optimize and cleanup obmc predictor and rd search.
Use vpx_blend_a64_hmask and vpx_blend_a64_vmask to speed up
computing the obmc predictor. Clean up calc_target_weighted_pred.

Encoder speedup: 1.3%
Decoder speedup: 6.5%

Change-Id: I0c774fe53d22399e92a10d1daf3af0010d88d2c5
2016-07-13 16:54:20 +00:00
Geza Lore
cd489264e1 Optimize and cleanup supertx predictor.
Use vpx_blend_a64_hmask and vpx_blend_a64_vmask to speed up
computing the supertx predictor.

Decoder speedup of up to 4% has been observed.

Change-Id: I255a5ba4cc24f78dc905d25b6e2f7fbafac13253
2016-07-11 18:14:21 +00:00
Geza Lore
bfa59b4a5f Improve vpx_blend_* functions.
- Made source buffers pointers to const.
- Renamed vpx_blend_mask6b to vpx_blend_a64_mask. This is more
  indicative that the function does alpha blending. The 6, or 6b
  suffix was misleading, as the max mask value (64) does not fit into
  6 bits.
- Added VPX_BLEND_* macros to use when needing to blend scalars.
- Use VPX_BLEND_A256 in combine_interintra to be more explicit about
  the operation being done.
- Added versions of vpx_blend_a64_* which take 1D horizontal/vertical
  masks directly and apply them to all rows/columns
  (vpx_blend_a64_hmask and vpx_blend_a64_vmask). The SSE4.1 optimzied
  horizontal version now falls back on the 2D version. This can be
  improved upon if it show up high enough in a profile.
- All vpx_blend_a64_* functions now support block sizes down to 1x1
  (ie: a single pixel). This is for usage convenience. The SSE4.1
  optimized versions fall back on the C implementation if
  w <= 2 or h <= 2. This can again be improved if it becomes hot code.

Change-Id: I13ab3835146ffafe3e1d74d8e9cf64a5abe4144d
2016-07-11 19:05:17 +01:00
Geza Lore
135d663159 Reinstate "Optimize wedge partition selection." without tests.
This reinstates commit efda2831e5f758b4f350679b5c55c0b9282449b0
without the tests and with fixes for 32 bit x86 builds.

Change-Id: I34be4fe1e8a67686d26ba256fd7efe0eb6a569e8
2016-06-21 20:31:50 +01:00
Debargha Mukherjee
902ee5060c A crash fix for supertx / ext-inter combination.
Change-Id: I9860376c98aa3b25f5bf86ed13d4a7631fa6b153
2016-06-13 13:57:30 -07:00
Angie Chiang
95340fccb3 Revert "Optimize wedge partition selection."
This reverts commit efda2831e5f758b4f350679b5c55c0b9282449b0.

This commit causes segmentation fault at SSE2/SumSquares2DTest.RandomValues/0

Change-Id: I171937e4daf6f15323e8206418773deb03bd8c53
2016-06-09 19:17:37 -07:00
Geza Lore
efda2831e5 Optimize wedge partition selection.
We can optimize wedge partition selection by pre-computing the
residuals of the 2 underlying predictors, and then blend these
to compute the sse of the compound predictor, without actually
having to compute and subtract the compound predictor.

Similarly we can pre-compute a proxy array which we can use to
cheaply check which mask sign would have lower sse.

Details are in wedge_utils.c.

Mathematically these are equivalence transformations, but due to the
finite precision the encoder output will be perturbed, though on
average this should make 0% difference.

ext-inter gains about ~4.5% speedup.

Change-Id: Ib2657c3209ae161b4090b58b4b6c392641bf2792
2016-06-06 14:43:10 +01:00
Geza Lore
ab29978e9f Pre-compute and use contiguous wedge masks.
This is purely a refactoring patch and has no functional effect.

Uses of these masks can be arranged such that all input blocks are
contiguous in memory (stride == block width). In this case 1D versions
of  operations can be used. 1D vector operations have superior performance
over 2D block equivalents as they are more processor cache friendly and
they can do away with a second loop overhead.

Change-Id: I2b76c9888aea2c857cc497e8a4b2841fd3dad54e
2016-06-03 00:16:22 -07:00
Debargha Mukherjee
fa5022978d Merge "Wedge refactoring to handle signs better" into nextgenv2 2016-05-20 23:19:39 +00:00
Debargha Mukherjee
e5de2ad632 Wedge refactoring to handle signs better
Mostly refactoring. Handles signs better though results are
more or less neutral.

Change-Id: If499537c8f8da4f34d104ebfda072eb4c85fb12f
2016-05-20 14:12:52 -07:00
Yaowu Xu
ba794ea356 Port change to highbitdepth code path
This fixes the crash in encoder when configure with both  highbitdepth
and dual-filter.

Change-Id: Ie06cc528094f4b31b7fc0ba75e7b15cae031d707
2016-05-20 11:30:37 -07:00
Jingning Han
0f513752a0 Rework sub8x8 chroma component inter predictor
This commit makes the sub8x8 chroma component inter predictor
operate at 2x2 block level. This allows one to use the actual motion
vector associated with each individal pixel block. It improves the
compression performance

lowres  0.40%
midres  0.25%
hdres   0.15%

Change-Id: Ia40e07cc7fde463dbf660018850e024932136c4f
2016-05-19 09:03:57 -07:00
Jingning Han
9161464f6c Account sub8x8 block reference filter type for prob context
If a reference block is coded with sub8x8 block size, and if it
has sub-pixel level motion vectors, its prediction filter type
should be used as context information.

The coding performance gains of dual filter type coding scheme are
lowres  0.57%
hdres   0.88%

Change-Id: I68b98f2518d02f11c29d0256aeb45b2580fe5cac
2016-05-18 12:35:31 -07:00
Debargha Mukherjee
fb8ea1736b Various wedge enhancements
Increases number of wedges for smaller block and removes
wedge coding mode for blocks larger than 32x32.

Also adds various other enhancements for subsequent experimentation,
including adding provision for multiple smoothing functions
(though one is used currently), adds a speed feature that decides
the sign for interinter wedges using a fast mechanism, and refactors
wedge representations.

lowres: -2.651% BDRATE

Most of the gain is due to increase in codebook size for 8x8 - 16x16.

Change-Id: I50669f558c8d0d45e5a6f70aca4385a185b58b5b
2016-05-16 12:41:47 -07:00
Yue Chen
372e12b959 Merge "Add single motion search for OBMC predictor" into nextgenv2 2016-05-11 17:20:32 +00:00
Yue Chen
370f203a40 Add single motion search for OBMC predictor
Weighted single motion search is implemented for obmc predictor.
When NEWMV mode is used, to determine the MV for the current block,
we run weighted motion search to compare the weighted prediction
with (source - weighted prediction using neighbors' MVs), in which
the distortion is the actual prediction error of obmc prediction.

Coding gain: 0.404/0.425/0.366 for lowres/midres/hdres
Speed impact: +14% encoding time
              (obmc w/o mv search 13%-> obmc w/ mv search 27%)

Change-Id: Id7ad3fc6ba295b23d9c53c8a16a4ac1677ad835c
2016-05-10 18:27:45 -07:00
Debargha Mukherjee
3fbe6e5e49 Merge "Wedge rd improvements" into nextgenv2 2016-05-10 20:34:00 +00:00
Debargha Mukherjee
447032eb32 Wedge rd improvements
Improves speed by about 10-15% by combining y-only rd with
modeling function in a better way.
Also, coding efficiency improves by about 0.1%

lowres: -1.805% BDRATE with ext-inter

Change-Id: I6ef1f8942ec6806252f3fcf749ae4f30dffe42b1
2016-05-10 11:47:48 -07:00
Jingning Han
9de916eb20 Fix dual filter type for high bit-depth
This commit fixes the compiler error in high bit-depth inter
predictor when dual filter type experiment is turned on.

Change-Id: I404a76a246477f2fcffc38a3275007d5dfe229cd
2016-05-09 02:14:48 +00:00
Jingning Han
bd33326372 Dual prediction filter type for motion compensated reference
Make the bit-stream level support per direction filter type coding
for motion compensated reference.

Change-Id: I61a2360b301075f6734cfd9711b7ae68f214174d
2016-05-07 03:03:04 +00:00
Debargha Mukherjee
3407785536 Refactoring and uv fix for wedge
lowres: -1.72%

Change-Id: I4c883097caac72fab8e01945454579891617145e
2016-05-03 08:02:08 -07:00
Debargha Mukherjee
88fe7871be Refactor wedge generation
Change-Id: I2ec4f562e28a4673477e20186f9d6167b24b76b8
2016-04-28 17:51:21 -07:00
Debargha Mukherjee
0fc82ea1cf Refactoring and cosmetic changes to ext-inter expt
Change-Id: Icd457480744b7734b3c412c9fed43be738373334
2016-04-05 15:16:18 -07:00
Debargha Mukherjee
8d3a4aa891 Some fixes/speed-ups on inter-intra part of ext-inter
Fixes an issue with rectangular inter-intra blocks.
Includes various other refactoring and cleanups to enable fast mixing
of inter and intra predictors.
Uses only the best single inter reference so far for the inter-intra
search.

About 30% speed-up with a 0.1% hit in performance.

This is part one of overhauling on the ext-inter experiment. To be
continued in subsequent patches.

Change-Id: Id10ee100c78c6e00009a3a4f930a4435ef403a95
2016-03-30 14:39:29 -07:00
Debargha Mukherjee
91707ac79e Merge "Extend superblock size fo 128x128 pixels." into nextgenv2 2016-03-30 20:55:32 +00:00
Geza Lore
552d5cd715 Extend superblock size fo 128x128 pixels.
If --enable-ext-partition is used at build time, the superblock size
(sometimes also referred to as coding unit (CU) size) is extended to
128x128 pixels.

Change-Id: Ie09cec6b7e8d765b7555ff5d80974aab60803f3a
2016-03-30 18:23:06 +01:00
Julia Robson
068e799459 Fix for ext_interp experiment
Amends previous commit to also handle subsampling correctly.
Change ID of prev commit: I6b07e6cf9b287ba4b5bd6599af4a7412e50b3bdc

Was causing occassional failures for 422 streams due to accessing
elements beyond the extent of the bmi array.

Change-Id: I37ebabf4c01ca84bcd1851428172bdf753805d98
2016-03-29 16:09:49 +01:00
Yue Chen
2e3f77316d Refactor prediction functions of OBMC
Merge the functions that generate prediction by above/left predictors
for the encoder and the decoder.

Change-Id: I57e53a8f2eb8d3028c4ed0c9abdcbf00503f95a0
2016-03-21 17:04:13 -07:00
Debargha Mukherjee
f34deab243 Adds compound wedge prediction modes
Incorporates wedge compound prediction modes.

Change-Id: Ie73b54b629105b9dcc5f3763be87f35b09ad2ec7
2016-03-10 07:19:54 -08:00
Debargha Mukherjee
48589e8d07 Merge "Some refactoring and cleanups of interp filter" into nextgenv2 2016-02-29 15:55:48 +00:00
Debargha Mukherjee
bab2912b5e Some refactoring and cleanups of interp filter
Includes various cosmetic changes and refactoring including
naming the sharp filters differently (since they are no longer
8-tap).

Change-Id: Ida5a19ca0daa9f6a64a6734394c685b2a4a2564a
2016-02-26 15:42:49 -08:00
Geza Lore
7ded038af5 Port interintra experiment from nextgen.
The interintra experiment, which combines an inter prediction and an
inter prediction have been ported from the nextgen branch. The
experiment is merged into ext_inter, so there is no separate configure
option to enable it.

Change-Id: I0cc20cefd29e9b77ab7bbbb709abc11512320325
2016-02-26 13:01:51 -08:00
Yue Chen
d1cad9c3f5 Overlapped block motion compensation experiment
In this experiment, an obmc inter prediction mode is enabled for
>= 8X8 inter blocks. When the obmc flag is on, the regular block-
based motion compensation will be refined by using predictors of
the above and left blocks.
Fixed some compatibility issues with vp9_highbitdepth, supertx,
ref_mv, and ext_interp.

Coding gain (%) on derflr/hevcmr/hevchd
OBMC:
1.047/1.022/0.708
OBMC + SUPERTX:
1.652/1.616/1.137
SUPERTX:
0.862/0.779/0.630

Change-Id: I5d8d3c4729c6d3ccb03ec7034563107893103b7f
2016-02-12 13:36:25 -08:00
Angie Chiang
d5349112e8 add convolution function with adjustable length
Change-Id: I1a5b1e15a188ef11594d0c6ac0dbd42aac59cfca
2016-02-05 17:33:19 -08:00
Angie Chiang
10ad97bc55 Pass filter type instead of filter array
Change-Id: I25f2149ddaa332722f7ab82e8f832a253c4b6ab3
2016-02-01 17:03:50 -08:00
Debargha Mukherjee
eef57c1e99 Fixes ext-interp experiment
Fixes integer pel MV usage for the sub8x8 case, which fixes a
rare mismatch issue.

Also adds some other minor missing code related to filter threshes.

Change-Id: I6b07e6cf9b287ba4b5bd6599af4a7412e50b3bdc
2016-01-27 09:24:48 -08:00
Debargha Mukherjee
3787b17439 Super transform - ported from nextgen branch
Various additional changes were made to make the experiment
compatible with misc_fixes.

derflr: +0.979%
hevcmr: +0.865%

Speed-wise with --enable-supertx the encoder is only about 10%
slower than without. Decoding impact is about 30% slowdown.

Note this does not work with ext-tx or var-tx yet. That is
a TODO.

Change-Id: If25af4241a7a9efbd28f58eda3c4f044c7a7ef4b
2016-01-04 22:12:57 -08:00
Debargha Mukherjee
85514c40ae New interpolation experiment
Adds a new interpolation experiment.

Improves entropy coding to send the filter type only if
the motion vectors have subpel components.
Adds one new 8-tap smooth filter, and tweaks the others.

derflr: +0.695%
hevcmr: +0.305%

About 5% encode slowdown. No visible impact for decoding.

Also makes the interpolation framework flexible to support both
strictly interpolating filters as well as non-interpolating
filters that filter integer offsets. This is mainly for
further experimentation and if not found useful the code will
be removed.

Change-Id: I8db9cde56ca916be771fe54a130d608bf10786e6
2015-11-06 09:51:34 -08:00
Yaowu Xu
7c514e2dfd Merged branch 'master' into nextgenv2
Resolved Conflicts in the following files:
        configure
        vp10/common/idct.c
        vp10/encoder/dct.c
        vp10/encoder/encodemb.c
        vp10/encoder/rdopt.c

Change-Id: I4cb3986b0b80de65c722ca29d53a0a57f5a94316
2015-09-29 16:17:32 -07:00
Johann
7e14baa1da Mark VP10 functions as 'INLINE'
Change-Id: I3dce6c702344a5cb5aaf9de1e4be44c53f9ce7e9
2015-09-01 17:05:04 -07:00
Yaowu Xu
2dcefd9c7f Correct guard macros in header files
Change-Id: Ifce12a95c1cdc36dc6ac5a72759249a17407da9e
2015-08-13 09:25:39 -07:00
Jingning Han
54d66ef165 Remove vp9_ prefix from vp10 files
Remove the vp9_ prefix from vp10 file names.

Change-Id: I513a211b286a57d6126fc1b0fbfd6405120014f1
2015-08-11 21:24:08 -07:00