1473 Commits

Author SHA1 Message Date
Jingning Han
35b3bd3e3b Fix an encoding failure case when speed features are on
This commit fixes an encoding failure triggered when the early
termination speed feature is turned on for the transform block size
search. It resolves the corresponding encoder/decoder mismatch.

Change-Id: I2c5b7d8b1efe25fe3810e6ed307f4b1865dede49
2015-11-10 16:04:00 -08:00
Yaowu Xu
b49ac0b160 Merge branch 'master' into nextgenv2
Change-Id: I8811bfd8fc132b9f515707e795bb6308e4bf263b
2015-11-09 09:52:18 -08:00
Debargha Mukherjee
bc54f9dc00 Merge "Resolve conflicts caused by master branch merging" into nextgenv2 2015-11-06 23:35:07 +00:00
Angie Chiang
c7c69d88af Merge changes I7ca0cc34,I97189d6e,I4e2b51cf,I21158867,I8d73beee into nextgenv2
* changes:
  Add adst_dct config to vp10_fwd_txfm2d_cfg
  Add adst_adst config to vp10_fwd_txfm2d_cfg
  Add dct_adst config to vp10_fwd_txfm2d_cfg
  Add dct_dct config to vp10_fwd_txfm2d_cfg
  Add vp10_fwd_txfm2d_8x8/16x16/32x32
2015-11-06 23:34:56 +00:00
Angie Chiang
e26c712ab2 Merge "Add vp10_fwd_txfm2d_4x4" into nextgenv2 2015-11-06 23:34:35 +00:00
hui su
6ab6ac450b Use accurate bit cost for uv_mode in UV intra mode RD selection
On derflr: +0.1% for VP10, but -0.03% for VP9.

Change-Id: I09c724232ede74254043d61d3cadc506256af0af
2015-11-06 14:45:43 -08:00
hui su
707cd03658 Resolve conflicts caused by master branch merging
Change-Id: I167e241b789331572581fcb0567ebe535b4b9345
2015-11-06 14:35:08 -08:00
Angie Chiang
45222e5b20 Add adst_dct config to vp10_fwd_txfm2d_cfg
Change-Id: I7ca0cc341ae36ac9f7aa24789f8872161b832b7b
2015-11-06 10:47:46 -08:00
Angie Chiang
786f1af891 Add adst_adst config to vp10_fwd_txfm2d_cfg
Change-Id: I97189d6e917929c756a3f89fe0ab66077a0a5436
2015-11-06 10:47:46 -08:00
Angie Chiang
634d0bdc7c Add dct_adst config to vp10_fwd_txfm2d_cfg
Change-Id: I4e2b51cf5b0dedb9ea1106747edb76835804fffc
2015-11-06 10:47:46 -08:00
Angie Chiang
51c0c35c6a Add dct_dct config to vp10_fwd_txfm2d_cfg
Change-Id: I21158867fb2b762d3632d0664ebe70c68d0953e1
2015-11-06 10:47:46 -08:00
Angie Chiang
f08141c734 Add vp10_fwd_txfm2d_8x8/16x16/32x32
Change-Id: I8d73beee5a619d26f3f8640a6679150d874522c4
2015-11-06 10:47:45 -08:00
Angie Chiang
ff7fe99342 Add vp10_fwd_txfm2d_4x4
Change-Id: I9bca3b1c76b64575366d71ab65ffef7264ce0c9b
2015-11-06 10:39:27 -08:00
Debargha Mukherjee
85514c40ae New interpolation experiment
Adds a new interpolation experiment.

Improves entropy coding to send the filter type only if
the motion vectors have subpel components.
Adds one new 8-tap smooth filter, and tweaks the others.
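A minimal C sketch of the signalling condition described above, assuming motion vectors are stored in 1/8-pel units (three fractional bits, as in VP9). The type and function names are hypothetical, not the vp10 code. A filter type only changes the prediction when at least one component of at least one motion vector has a fractional part:

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical motion vector type; components in 1/8-pel units (assumption). */
typedef struct {
  int16_t row;
  int16_t col;
} MotionVector;

/* Three fractional bits for 1/8-pel precision. */
enum { SUBPEL_BITS = 3, SUBPEL_MASK = (1 << SUBPEL_BITS) - 1 };

/* Nonzero if either component points between integer pixel positions. */
static int mv_has_subpel(MotionVector mv) {
  return (mv.row & SUBPEL_MASK) != 0 || (mv.col & SUBPEL_MASK) != 0;
}

/* The filter type only matters when some MV of the block (one for single,
 * two for compound prediction) has a subpel component; otherwise it can be
 * left out of the bitstream. */
static int should_code_interp_filter(const MotionVector *mvs, int num_refs) {
  assert(num_refs == 1 || num_refs == 2);
  for (int i = 0; i < num_refs; ++i) {
    if (mv_has_subpel(mvs[i])) return 1;
  }
  return 0;
}
```

For example, an MV of (16, -8) lands on whole pixels, so no filter type would be sent, while (16, -3) needs interpolation.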

derflr: +0.695%
hevcmr: +0.305%

About 5% encode slowdown. No visible impact for decoding.

Also makes the interpolation framework flexible to support both
strictly interpolating filters as well as non-interpolating
filters that filter integer offsets. This is mainly for
further experimentation; if it is not found useful, the code will
be removed.

Change-Id: I8db9cde56ca916be771fe54a130d608bf10786e6
2015-11-06 09:51:34 -08:00
Hui Su
9b3ad185dc Merge "ext-intra experiment" into nextgenv2 2015-11-06 17:40:49 +00:00
Debargha Mukherjee
70e514ce78 Merge "Flip the result of the inverse transform for FLIPADST." into nextgenv2 2015-11-06 09:20:46 +00:00
Debargha Mukherjee
46d2cc5714 Merge "Eliminate copying for FLIPADST in fwd transforms." into nextgenv2 2015-11-06 08:37:25 +00:00
Angie Chiang
b0df5e0f9e Add iadst32
Change-Id: I3a53ee51146d0bd4b0fe4b27c286e8c921f9823b
2015-11-04 14:23:56 -08:00
Angie Chiang
35486a6b88 Add iadst16
Change-Id: I093881aacaf9a070f78cc4eea2e8a6ede8a71792
2015-11-04 14:23:56 -08:00
Angie Chiang
0ca0cc240b Add iadst8
Change-Id: Ia58e4735d7d7bfd2ac55259c32705118c6745c6d
2015-11-04 14:23:56 -08:00
Angie Chiang
ba69089e65 Add iadst4
Change-Id: Ie419b2b1e939a41c30ed609e1ba46f5f6609b2a5
2015-11-04 14:23:56 -08:00
Angie Chiang
7467833401 Add idct32
Change-Id: I75412bdc4bd0d9c90e8b56e02e0e467a2d9957f9
2015-11-04 14:23:56 -08:00
Angie Chiang
d3cee565ad Add idct16
Change-Id: I8e5ba3a3f9b64ccbf038e371525e897774729b06
2015-11-04 14:23:56 -08:00
Angie Chiang
bd9db2f55b Add idct8
Change-Id: I8092a6f229b196c5c8b7dcd2dff8aaf68253e422
2015-11-04 14:23:56 -08:00
Angie Chiang
7d2b7b6944 Add idct4
Change-Id: I1d1b6822452772cec95160491c7bc6d3bba1f5c2
2015-11-04 14:23:56 -08:00
Angie Chiang
a9253a2029 Add fadst32
Change-Id: I77299f0e39fc7cef91e7e420513dbd05194f320a
2015-11-04 14:23:56 -08:00
Angie Chiang
a7d26f4e80 Add fadst16
Change-Id: I5175e39b5df73646488f74b2a9e4a463ae79d91a
2015-11-04 14:23:56 -08:00
Debargha Mukherjee
12fac1c281 Merge "Fix transform tables in C implementations." into nextgenv2 2015-11-04 21:11:38 +00:00
Angie Chiang
3813c2bc46 Merge "Add fadst8" into nextgenv2 2015-11-04 20:21:08 +00:00
Angie Chiang
498866b699 Merge "Add fadst4" into nextgenv2 2015-11-04 20:20:57 +00:00
Jingning Han
de00c163c7 Merge "Simplify txfm rate-distortion optimization" into nextgenv2 2015-11-04 19:31:03 +00:00
Jingning Han
493d02347c Simplify txfm rate-distortion optimization
This commit refactors the rate-distortion optimization scheme for
transform block coding. When both ext-tx and var-tx experiments
are turned on, the encoding time for bus_cif at 1000 kbps goes down
from 706377 ms to 666503 ms (a 5.6% reduction in encode time). The coding
statistics remain unchanged.
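As a quick check of the timing arithmetic, assuming the 5.6% figure is the relative reduction in encode time (the helper names are hypothetical):

```c
#include <assert.h>

/* Relative reduction in encode time, in percent. */
static double time_reduction_pct(double before_ms, double after_ms) {
  assert(before_ms > 0.0);
  return (before_ms - after_ms) / before_ms * 100.0;
}

/* Ratio of old to new encode time (> 1 means faster). */
static double speedup_ratio(double before_ms, double after_ms) {
  assert(after_ms > 0.0);
  return before_ms / after_ms;
}
```

time_reduction_pct(706377, 666503) evaluates to about 5.645, which rounds to the quoted 5.6%; expressed as a throughput ratio, speedup_ratio on the same inputs is about 1.060.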

Change-Id: I20835db573725580aad79c16220f799ce01f2093
2015-11-04 10:25:48 -08:00
Geza Lore
4f5108090a Flip the result of the inverse transform for FLIPADST.
When using FLIPADST, the vp10_inv_txfm_add functions used to flip
the destination array, add the result of the inverse transform to it,
and then flip the destination back. This has been replaced by
flipping the result of the inverse transform before adding it to the
destination. Up-down flipping is done by negating the destination
stride and starting from the bottom row, so it should now be free.
Left-right flipping is done with the usual SSE2 instructions in the
optimized code.
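A plain-C sketch of the flipped add described above for an n x n residual block. The function names are hypothetical and this is not the actual vp10_inv_txfm_add* code; the SSE2 column shuffle is replaced here by reversed indexing. Up-down flipping costs nothing extra because it only changes the starting row and the sign of the stride:

```c
#include <assert.h>
#include <stdint.h>

static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Adds an n x n inverse-transform output `residual` (row-major) to `dest`.
 * flip_ud: start at the bottom row of dest and walk upwards with a negated
 *          stride, so the vertical flip needs no extra work.
 * flip_lr: read each residual row back to front (the part SIMD code has to
 *          shuffle in registers). */
static void add_residual_flipped(const int16_t *residual, uint8_t *dest,
                                 int stride, int n, int flip_ud, int flip_lr) {
  assert(n > 0 && stride >= n);
  if (flip_ud) {
    dest += (n - 1) * stride; /* first output row is the bottom row */
    stride = -stride;         /* then step upwards */
  }
  for (int r = 0; r < n; ++r) {
    uint8_t *row = dest + r * stride;
    const int16_t *res = residual + r * n;
    for (int c = 0; c < n; ++c) {
      const int src_c = flip_lr ? n - 1 - c : c;
      row[c] = clip_pixel(row[c] + res[src_c]);
    }
  }
}
```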

The C functions are expected to match the SSE2 functions, so the C
functions now do the flipping as well when required. Adding this cleanly required
some refactoring of the C functions, but there is no measurable
performance impact when ext-tx is not enabled.

Encode speedup with ext-tx enabled is about 3%.

Change-Id: I5b04e5d720f0b9f0d54fd8607a8764f2314c7234
2015-11-04 17:11:44 +00:00
Yaowu Xu
4aafd01861 Merge branch 'master' into nextgenv2 2015-11-04 05:00:05 -08:00
hui su
be3559ba07 ext-intra experiment
Currently there are two parts in this experiment: extra directional intra
prediction modes and the filter intra modes migrated from the nextgen branch.

Several macros defined in "blockd.h" control the experiment
settings. Setting "DR_ONLY" to 1 (default is 0) uses only the directional
modes and skips the filter-intra modes; "EXT_INTRA_ANGLES" (default is 128)
defines the number of different angles supported; setting
"ANGLE_FAST_SEARCH" to 1 (default is 1) uses a fast sub-optimal search
for the best prediction angle instead of an exhaustive search. The fast search
is about 6 times faster than the exhaustive search, while preserving about
60% of the coding gains.
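The commit does not describe how the fast search works. As an illustration only, here is a generic coarse-to-fine angle search in C next to an exhaustive scan; it is an assumption, not the ext-intra algorithm, and the cost callback and toy cost are hypothetical stand-ins for the real rate-distortion cost. With 128 angles and a coarse step of 16 it evaluates at most 16 angles instead of 128, and on costs with several local minima it can miss the global optimum, which is the usual speed/quality trade-off of such searches:

```c
#include <assert.h>
#include <limits.h>

/* Cost of predicting with a given angle index; stands in for an RD cost. */
typedef long (*AngleCostFn)(int angle, void *ctx);

/* Exhaustive: evaluate every angle index in [0, num_angles). */
static int search_angle_exhaustive(int num_angles, AngleCostFn cost, void *ctx) {
  assert(num_angles > 0);
  int best = 0;
  long best_cost = LONG_MAX;
  for (int a = 0; a < num_angles; ++a) {
    const long c = cost(a, ctx);
    if (c < best_cost) { best_cost = c; best = a; }
  }
  return best;
}

/* Coarse-to-fine (illustrative): evaluate every `step`-th angle, then probe
 * best - s and best + s for s = step/2, step/4, ..., 1, keeping any
 * cheaper neighbour. */
static int search_angle_coarse_to_fine(int num_angles, int step,
                                       AngleCostFn cost, void *ctx) {
  assert(num_angles > 0 && step > 0);
  int best = 0;
  long best_cost = LONG_MAX;
  for (int a = 0; a < num_angles; a += step) {
    const long c = cost(a, ctx);
    if (c < best_cost) { best_cost = c; best = a; }
  }
  for (int s = step / 2; s >= 1; s /= 2) {
    const int center = best;
    const int candidates[2] = { center - s, center + s };
    for (int i = 0; i < 2; ++i) {
      const int a = candidates[i];
      if (a < 0 || a >= num_angles) continue;
      const long c = cost(a, ctx);
      if (c < best_cost) { best_cost = c; best = a; }
    }
  }
  return best;
}

/* Toy convex cost for demonstration: squared distance to *(int *)ctx. */
static long toy_quadratic_cost(int angle, void *ctx) {
  const long d = angle - *(const int *)ctx;
  return d * d;
}
```

On the toy cost with its minimum at angle 37, both searches return 37; the coarse-to-fine version gets there with 16 cost evaluations.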

With the extra directional prediction modes (fast search), we observe the
following coding gains (numbers in parentheses are for the all-key-frame setting):
derflr +0.42%  (+1.79%)
hevclr +0.78%  (+2.19%)
hevcmr +1.20%  (+3.49%)
stdhd  +0.56%
Speed-wise, about 110% slower for key frames, and 30% slower overall.

The gains of the filter intra modes are largely additive with those of the
directional modes. The overall coding gain of this experiment:
derflr +0.94%
hevclr +1.46%
hevcmr +1.94%
stdhd  +1.58%

Change-Id: Ida9ad00cdb33aff422d06eb42b4f4e5f25df8a2a
2015-11-03 18:46:02 -08:00
Jingning Han
4101154d5b Merge "Re-work rate-distortion optimization scheme for transform coding" into nextgenv2 2015-11-03 22:47:21 +00:00
Hui Su
3cbe767972 Merge "Generate intra prediction reference values only when necessary" 2015-11-03 20:55:14 +00:00
Alex Converse
255bcf8697 Merge "misc fixes: Remove a wasted value." 2015-11-03 17:52:34 +00:00
Geza Lore
01bb4a318d Eliminate copying for FLIPADST in fwd transforms.
This patch eliminates the copying of data when using FLIPADST forward
transforms, by incorporating the necessary data flipping into the
load_buffer_* functions of the SSE2 optimized forward transforms. The
load_buffer_* functions are normally inlined, so the overhead of copying
the data is removed and the overhead of flipping is minimized. Left-right
flipping is still not free, as the columns need to be shuffled in
registers.

To preserve identity between the C and SSE2 implementations, the
appropriate C implementations now also do the data flipping as part of
the transform, rather than relying on the caller for flipping the input.
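A scalar C sketch of folding the flip into the load step, assuming an n x n block read with a row stride. The function name is hypothetical and stands in for the SSE2 load_buffer_* helpers; the row order and column order are chosen while loading, so no separately flipped copy of the input is made:

```c
#include <assert.h>
#include <stdint.h>

/* Copies an n x n block from `input` (row stride `stride`) into the
 * row-major transform buffer `out`, applying FLIPADST flips on the way in. */
static void load_block_flipped(const int16_t *input, int stride, int16_t *out,
                               int n, int flip_ud, int flip_lr) {
  assert(n > 0 && stride >= n);
  for (int r = 0; r < n; ++r) {
    /* Up-down flip only changes which source row is read. */
    const int16_t *src = input + (flip_ud ? n - 1 - r : r) * stride;
    for (int c = 0; c < n; ++c) {
      /* Left-right flip reverses the column order within a row. */
      out[r * n + c] = src[flip_lr ? n - 1 - c : c];
    }
  }
}
```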

Overall encode speedup is about 1.5-2% in my tests. Note that this patch
only covers the forward transforms; the inverse transforms will follow in
a later patch.

There are also a few code hygiene changes:
- Fixed some indents of switch statements.
- The DCT_DCT transform now always uses the vp10_fht* functions, which
  dispatch to vpx_fdct* for DCT_DCT (previously some callers invoked
  vpx_fdct* directly and others called vp10_fht*).

Change-Id: I93439257dc5cd104ac6129cfed45af142fb64574
2015-11-03 17:10:55 +00:00
Geza Lore
2b39bcec29 Fix transform tables in C implementations.
These tables were out of sync with the indexing enum since the
refactoring in commit 4f16f119 (change 303389), due to the removal
of the ext_tx_to_txtype lookup table. This patch just puts them
back in order.
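One way to keep such tables from drifting out of sync with the enum is C99 designated initializers, which tie each entry to its enumerator rather than to its position. The reduced enum and kernel names below are hypothetical; the column/row assignment is an assumption modeled on the VP9 convention that ADST_DCT applies ADST vertically and DCT horizontally:

```c
#include <assert.h>

/* Reduced, hypothetical transform-type enum (the real one has more entries). */
typedef enum { DCT_DCT, ADST_DCT, DCT_ADST, ADST_ADST, TX_TYPES } TxType;

typedef enum { KERNEL_DCT, KERNEL_ADST } Kernel1D;

/* Which 1-D kernel is applied to the columns and to the rows. */
typedef struct {
  Kernel1D cols;
  Kernel1D rows;
} Transform2D;

/* Designated initializers bind each entry to its enumerator: reordering the
 * enum cannot silently shift entries, and removing an enumerator breaks the
 * build instead of misindexing the table. */
static const Transform2D kTransformTable[TX_TYPES] = {
  [DCT_DCT]   = { KERNEL_DCT,  KERNEL_DCT  },
  [ADST_DCT]  = { KERNEL_ADST, KERNEL_DCT  },
  [DCT_ADST]  = { KERNEL_DCT,  KERNEL_ADST },
  [ADST_ADST] = { KERNEL_ADST, KERNEL_ADST },
};

static Transform2D lookup_transform(TxType tx_type) {
  assert((int)tx_type >= 0 && tx_type < TX_TYPES);
  return kTransformTable[tx_type];
}
```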

Change-Id: Ieb7d57654f61b99b511d54c9ba09abbd5e8d0d14
2015-11-03 17:10:51 +00:00
Jingning Han
696ee004a5 Re-work rate-distortion optimization scheme for transform coding
This commit re-works the rate-distortion optimization scheme for
transform coding. It improves the overall compression performance.
For the derf set, the ext-tx experiment provides 2.27% coding gains,
and the new scheme that integrates multiple transform type selection
and recursive transform block partitioning provides a total of 3.24%
coding gains.

Change-Id: Ia1887c4c44b73dfb915d091d96660a99f09d5cc3
2015-11-03 09:03:53 -08:00
Jingning Han
6d43a53a0c Merge "Incorporate flexible tx type and tx partition in RD scheme" into nextgenv2 2015-11-03 16:43:48 +00:00
Yaowu Xu
2c32861814 Merge branch 'master' into nextgenv2 2015-11-03 05:00:04 -08:00
Jingning Han
4b594d3d00 Incorporate flexible tx type and tx partition in RD scheme
This commit hooks up the rate-distortion optimization system to
fully exploit recursive transform block partitioning and multiple
transform types. The compression gains of the two experiments
largely add up. For the derf set, ext-tx provides an additional 2.1%
coding gain on top of the gain due to recursive transform block
partitioning (0.69%).

Change-Id: I1091fb9545f74e489a6a2489dc3c12f5abd05043
2015-11-02 17:40:05 -08:00
Jingning Han
4b0ef55f10 Fix block size computation in coeff token packing
Correctly compute the block size in bit-stream coefficient token
packing. This fixes an enc/dec mismatch at very high bit-rates.

Change-Id: I37bf084731dc660df0c695cad406ddcd0f9eb904
2015-11-02 14:55:55 -08:00
hui su
16bf821dfc Move palette-based intra prediction out of misc-fixes
Change-Id: Ia59724413c4a4831390119a33d40a7d713b4b69f
2015-11-02 11:11:25 -08:00
Jingning Han
dfd054649f Merge "Make loop filter support recursive transform block partitioning" into nextgenv2 2015-11-02 19:06:48 +00:00
hui su
e085fb643f Generate intra prediction reference values only when necessary
This can help increase encoding speed substantially.

Change-Id: Id0c009146e6e74d9365add71c7b10b9a57a84676
2015-11-02 10:26:50 -08:00
Jingning Han
88b3b23619 Merge "Refactor loop filter mask" into nextgenv2 2015-10-31 18:21:33 +00:00
Jingning Han
365fa8d12d Merge "Fix a switch condition in select_tx_block" into nextgenv2 2015-10-31 18:20:54 +00:00