62 Commits

Author SHA1 Message Date
Debargha Mukherjee
e6790e30c5 Replace DST1 with DST2 for ext-tx experiment
A small gain (0.1 - 0.2%) with this experiment on derflr/hevcmr.

The DST2 can be implemened very efficiently using sign flipping
of odd indexed inputs, followed by DCT, followed by reversal of
the output. This is how it is implemented in this patch.
SIMD optimization is pending.

Change-Id: Ic2fc211ce0e6b7c6702974d76d6573f55cc4da0e
2015-12-14 13:54:41 -08:00
Peter de Rivaz
d79850e60c Bug fix for high bitdepth using flipped transforms.
Fixes mismatch and performance drop.

Change-Id: Ib99711eb3b78257a8105073e2b6d7031459357bb
2015-11-23 12:04:55 -08:00
Geza Lore
85ab9d56cc Flip the result of the inv transform for FLIPADST.
This is a port of 4f5108090a6047d5d4d9ce1df302da23b2ef4bc5

This commit also fixes a bug where FLIPADST transforms when combined
with a DST (that is FLIPADST_DST and DST_FLIPADST) did not actually did
a flipped transform but a straight ADST instead. This was due to the C
implementation that it fell back on not implementing flipping.  This is
now fixed as well and FLIPADST_DST and DST_FLIPADST does what it is
supposed to do.

There are 3 functions in the SR_MODE experiment that should be updated,
but given that the build of SR_MODE is broken at the upstream tip of
nextgen, I could not test these, so I have put in assertions and FIXME
notes at the problematic places.

Change-Id: I5b8175b85f944f2369b183a26256e08d97f4bdef
2015-11-17 14:31:28 -08:00
Debargha Mukherjee
3a45a1edfd Remove dst1 config option and merge with ext-tx
Change-Id: I0152ed352ae2a0a725a508b5c209ef2c1dc2302d
2015-11-13 11:24:38 -08:00
Debargha Mukherjee
ff9aa146cb Reimplementatio of dst1 for speed
Encoder with --enable-ext-tx --enable-dst1 is now 4 times faster.

Change-Id: Ia750ad3516698ce94da4ceb566b1c51539537a95
2015-10-02 11:06:55 -07:00
Debargha Mukherjee
5a25d567d7 64x64 idct fix.
Change-Id: If0e0cd7cbe71e9586657f5e8ffa87dcdebc686ba
2015-09-25 05:54:23 -07:00
Debargha Mukherjee
cdcebfba29 tx64x64 experiment fix for high-bitdepth
Change-Id: Ia8d769b43bad0f9ad0684ecf6925e580339c7397
2015-09-24 05:45:03 -07:00
Debargha Mukherjee
4dbaf9a5ab Redo DST1 in the ext-tx experiment
Moved from nextgenv2 branch to test with other experiments.

derflr: +1.629%

Change-Id: Ie7c720053ed8b628177679c4351bb31b54716a71
2015-09-16 09:46:13 -07:00
Shunyao Li
2de18d1fd2 Super resolution mode (+CONFIG_SR_MODE)
CONFIG_SR_MODE=1, enable SR mode
USE_POST_F=1, enable SR post filter
SR_USE_MULTI_F=1, enable SR post filter family
Not compatible with other experiments yet

Change-Id: I116f1d898cc2ff7dd114d7379664304907afe0ec
2015-08-31 15:29:39 -07:00
Debargha Mukherjee
23690fc5d1 Adds support for DST1 transforms for inter blocks
Adds an additional transform in the ext_tx experiment that
is a 2d DST1-DST1 combination.

To enable use --enable-ext-tx --enable-dst1.

This needs to be later extended to combine DST1 with DCT
or ADST.

Change-Id: I6d29f1b778ef8294bcfb6a512a78fc5eda20723b
2015-07-24 16:23:09 -07:00
Debargha Mukherjee
4b57a8b356 Add extended transforms for 32x32 and 64x64
Framework for alternate transforms for inter 32x32 and larger based
on dwt-dct hybrid is implemented.
Further experiments are to be condcuted with different
variations of hybrid dct/dwt or plain dwt, as well as super-resolution
mode.

Change-Id: I9a2bf49ba317e7668002cf1499211d7da6fa14ad
2015-07-23 18:01:22 -07:00
Debargha Mukherjee
b433dd4443 Adds wavelet transforms + hybrid dct/dwt variants
The wavelets implemented are 2/6, 5/3 and 9/7 each with
a lifting based scheme for even block sizes. The 9/7
one is a double implementation currently.

This is to start experiments with:
1. Replacing large transforms (32x32 and 64x64) with wavelets
or wavelet-dct hybrids that can hopefully localize errors better
spatially. (Will also need alternate entropy coder)
2. Super-resolution modes where the higher sub-bands may be
selectively skipped from being conveyed, while a smart
reconstruction recovers the lost frequencies.

The current patch includes two types of 32x32 and 64x64
transforms: one where only wavelets are used, and another
where a single level wavelet decomposition is followed
by a lower resolution dct on the low-low band.

Change-Id: I2d6755c4e6c8ec9386a04633dacbe0de3b0043ec
2015-06-08 23:30:38 -07:00
Deb Mukherjee
963393321c Iadst transforms to use internal low precision
Change-Id: I266777d40c300bc53b45b205144520b85b0d6e58
2015-05-06 10:10:18 -07:00
Peter de Rivaz
41973e0e3e Refactored idct routines and headers
This change is made in preparation for a
subsequent patch which adds acceleration
for the highbitdepth transform functions.

The highbitdepth transform functions attempt
to use 16/32bit sse instructions where possible,
but fallback to using the C implementations if
potential overflow is detected.  For this reason
the dct routines are made global so they can be
called from the acceleration functions in the
subsequent patch.

Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665
2015-05-06 09:59:20 -07:00
hui su
b69152db79 Add high bit depth support for tx-skip expt
+0.3% on 10-bit
+0.3% on 12-bit

With other high bit compatible experiments on 12-bit
+12.44% (+0.17) over 8-bit baseline

Change-Id: I40b4c382fa54ba4640d08d9d01950ea8c1200bc9
2015-04-16 14:54:39 -07:00
punksu
571fdbb05f dpcm intra prediction for tx_skip
Implements vertical, horizontal, and tm dpcm intra prediction for
blocks in tx_skip mode. Typical coding gain on screen content video
is 2%~5%.

Change-Id: Idd5bd84ac59daa586ec0cd724680cef695981651
2015-01-14 14:54:09 -08:00
hui su
5de9280ae9 tx_skip mode for lossy coding
This patch improves the non-transform coding mode. At this
point, the coding gain on screen content videos is about
12% for lossless, an 15% for lossy case.

1. Encode tx_skip flags with context. Y tx_skip flag context is
whether the prediction mode is inter or intra. UV flag context
is Y flag.

2. Transform skipping is less helpful when the Q-index is high.
So it is enabled only when the Q-index is smaller than a
threshold. Currently the threshold is set as 255 for intra blocks,
and 0 for inter blocks.

3. The shift of the prediction residue, when copying them to the
coeff buffer, is set as 3 when the Q-index is larger than a
threshold (currently set as 0), and 2 otherwise.

Change-Id: I372973c7518cf385f6e542b22d0f803016e693b0
2014-12-15 10:46:41 -08:00
Deb Mukherjee
c82de3bede Extending ext_tx expt to include dst variants
Extends the ext-tx experiment to include regular and flipped
DST variants. A total of 9 transforms are thus possible for
each inter block with transform size <= 16x16.

In this patch currently only the four ADST_ADST variants
(flipped or non-flipped in both dimensions) are enabled
for inter blocks.

The gain with the ext-tx experiment grows to +1.12 on derflr.
Further experiments are underway.

Change-Id: Ia2ed19a334face6135b064748f727fdc9db278ec
2014-11-20 14:25:40 -08:00
hui su
d97fd3eef6 Non transform coding experiment
Non-transform option is enabled in both intra and inter modes.
In lossless case, the average coding gain on screen content
clips is 11.3% in my test.

Change-Id: I2e8de515fb39e74c61bb86ce0f682d5f79e15188
2014-11-19 21:20:21 -08:00
Deb Mukherjee
0c7a94f49b Adding a 64x64 transform mode
Preliminary 64x64 transform implementation.
Includes all code changes.
All mismatches resolved.

Coding results for derf and stdhd are within noise. stdhd is slightly
higher, derf is slightly lower.

To be further refined.

Change-Id: I091c183f62b156d23ed6f648202eb96c82e69b4b
2014-10-30 00:45:57 -07:00
Deb Mukherjee
1929c9b391 Rename highbitdepth functions to use highbd prefix
Uses highbd_ prefix convention consistently.

Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e
2014-10-09 14:40:40 -07:00
Jingning Han
12344f2697 Add range check in inverse ADST 16x16
Bit-stream clarification related to Issue 868.

Change-Id: I92a7bc5b7782c9ea5c3f6cceec761742183c9514
2014-10-06 11:07:58 -07:00
Deb Mukherjee
3bcc2af8cd Some data type changes in vp9_idct.c
Resolves a visual studio warning, and includes some cleanups.

Change-Id: I6a7576ef323c475b7d1c659800cd82c6cb1fd18d
2014-10-04 16:03:04 -07:00
Deb Mukherjee
d50716face Incorporate WRAPLOW macro into non-highbitdepth tx
Incorporates the WRAPLOW macro into the non-highbitdepth transforms
to aid hardware verification between a software C model and an
intended hardware implementation though the use of the configure
options: --enable-experimental --enable-emulate-hardware.
Note that to avoid further discrepancies between the sse/sse2
implementations of the transforms and the C implementation, when the
emulate hardware option is invoked, we also disable sse/sse2/etc.

Also incudes some minor cleanups/renaming etc.

Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287
2014-10-03 11:38:05 -07:00
Jingning Han
0829d2be7f Remove redundant header file declaration
Some header file in vp9_idct.c has been included in vp9_idct.h.
This commit removes these redundant declarations.

Change-Id: I0238c27e4efff5c981eb437022c6bc6970c4e445
2014-09-30 09:13:00 -07:00
Deb Mukherjee
10783d4f3a Adds high bitdepth transform functions and tests
Adds various high bitdepth transform functions and tests.
Much of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform cofficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map tp int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.

Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
2014-09-11 19:56:33 -07:00
Jingning Han
41a350a83d Change eob threshold for partial inverse 8x8 2D-DCT to 12
The scanning order has the first 12 coefficients of the 8x8 2D-DCT
sitting in the top left 4x4 block. Hence the partial inverse 8x8
2D-DCT allows to handle cases with eob below 12.

The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
166 cycles (using SSE2) to 150 cycles (using SSSE3).

Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
2014-05-08 09:48:58 -07:00
Dmitry Kovalev
ff41764920 Removing _1d suffix from transform names.
It is enough to specify (e.g.) idct16, it is obviously different from
idct16x16.

Change-Id: I6b408a37a945de3162429380b59a775b03b95db0
2014-01-27 16:15:36 -08:00
hkuang
6debc446e0 Remove unnecessary eob checking.
Change-Id: Ia568f70bddc1a2b62141a0197459119ca74c22b5
2013-11-20 11:58:11 -08:00
Jingning Han
7637387cf1 Fix coding format in vp9_idct
Change-Id: If97ae16a4478717933345b6b9d5bc1b417b8dd84
2013-11-14 16:05:22 -08:00
Yunqing Wang
f88315cb29 Add 32x32 idct function for eob<=34 case
When only upper-left 8x8 area has non-zero dct coefficients, we
could skip 1D IDCT for 9th to 32th rows to save operations. This
function is called when eob <= 34.

Change-Id: I9684b75947bdde346cfe3720f08a953aa7a13fb5
2013-10-24 16:13:21 -07:00
Dmitry Kovalev
65f118d72f Making input pointer of any inverse transform constant.
Also renaming dest_stride to stride in some places.

Change-Id: I75f602b623a5a7071d4922b747c45fa0b7d7a940
2013-10-11 18:27:12 -07:00
Dmitry Kovalev
ac468dde46 Consistent names for inverse hybrid transforms (2 of 2).
Renames:
  vp9_iht_add       -> vp9_iht4x4_add
  vp9_iht_add_8x8   -> vp9_iht8x8_add
  vp9_iht_add_16x16 -> vp9_iht16x16_add

Change-Id: I8f1a2913e02d90d41f174f27e4ee2fad0dbd4a21
2013-10-11 15:49:05 -07:00
Dmitry Kovalev
7ef573914d Consistent names for inverse hybrid transforms (1 of 2).
Renames:
  vp9_short_iht4x4_add     -> vp9_iht4x4_16_add
  vp9_short_iht8x8_add     -> vp9_iht8x8_64_add
  vp9_short_iht16x16_add_c -> vp9_iht16x16_256_add

Change-Id: Ibca7a188fd062b196787ac5efc1ea545e7f166c0
2013-10-11 13:31:32 -07:00
Dmitry Kovalev
44195fda71 Adding const to the input argument of all 1D transforms.
Also adding static to iadst16_1d and fadst16 functions.

Change-Id: I13c7df3b776f0f8efc6e80099bdb0a2f6d29edaf
2013-10-11 11:19:58 -07:00
Dmitry Kovalev
ddf1b76205 Removing vp9_idct4_1d_sse2 function.
We have two SSE2-optimized functions for idct4_1d:
  vp9_idct4_1d_sse2 <-- removing this one
  idct4_1d_sse2

vp9_idct4_1d_sse2 was used only by the following functions which already
have SSE2 optimized variants:
  vp9_idct4x4_16_add_c   -> vp9_idct4x4_16_add_see2
  idct8_1d               -> vp9_idct8x8_{16, 10, 1}_see2
  vp9_short_iht4x4_add_c -> vp9_short_iht4x4_add_see2

Change-Id: Ib0a7f6d1373dbaf7a4a41208cd9d0671fdf15edb
2013-10-10 16:50:43 -07:00
Dmitry Kovalev
1e766b50e2 Giving consistent names to IDCT 32x32 functions.
Renames:
  vp9_short_idct32x32_add   -> vp9_idct32x32_1024_add
  vp9_short_idct32x32_1_add -> vp9_idct32x32_1_add
  vp9_idct_add_32x32        -> vp9_idct32x32_add

Change-Id: Id85306f5814bac6c47463a6b5901a93082510666
2013-10-10 11:27:39 -07:00
Dmitry Kovalev
419c3f6fba Merge "Giving consistent names to IDCT 16x16 functions." 2013-10-10 10:43:14 -07:00
Jingning Han
6594ca8897 All zero coeff skip in IDCT 32x32
When all coefficients are zeros, skip the corresponding 1-D inverse
transform. This practice has been used in the SSE2 implementation of
inverse 32x32 DCT. This commit imports this algorithm into the C code.

Change-Id: I0f58bfcb183a569fab85d524d5d9cf8ae8653f86
2013-10-08 11:47:29 -07:00
Dmitry Kovalev
b096c5a336 Giving consistent names to IDCT 16x16 functions.
Renames:
  vp9_short_idct16x16_add    -> vp9_idct16x16_256_add
  vp9_short_idct16x16_10_add -> vp9_idct16x16_10_add
  vp9_short_idct16x16_1_add  -> vp9_idct16x16_1_add
  vp9_idct_add_16x16         -> vp9_idct16x16_add

Change-Id: Ief8a3904de78deab0f4ede944c4d0339c228cfc3
2013-10-07 14:31:10 -07:00
Dmitry Kovalev
c6ad70d5f1 Giving consistent names to IDCT 8x8 functions.
Renames:
  vp9_short_idct8x8_add    -> vp9_idct8x8_64_add
  vp9_short_idct8x8_1_add  -> vp9_idct8x8_1_add
  vp9_short_idct8x8_10_add -> vp9_idct8x8_10_add
  vp9_idct_add_8x8         -> vp9_idct8x8_add

Change-Id: Ifb8d3a45b4c0397aa805b30463f3d14581bf72c1
2013-10-06 00:24:09 -07:00
Dmitry Kovalev
3a0602578e Giving consistent names to IDCT/IWHT functions.
The idea is to have the following names for each transform size:

vp9_idct4x4_add
  vp9_idct4x4_1_add
  vp9_idct4x4_10_add
  vp9_idct4x4_16_add

vp9_idct8x8_add
  vp9_idct8x8_1_add
  vp9_idct8x8_10_add
  vp9_idct8x8_64_add

etc for 16x16, 32x32

The actual list of renames in this patch:

vp9_idct_add_lossless     -> vp9_iwht4x4_add
vp9_short_iwalsh4x4_add   -> vp9_iwht4x4_16_add
vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add

vp9_idct_add            -> vp9_idct4x4_add
vp9_short_idct4x4_add   -> vp9_idct4x4_16_add
vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add

Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
2013-10-04 14:17:06 -07:00
Dmitry Kovalev
be7eec79be Moving all idct/iht functions in one place.
Moving functions from vp9_idct_blk to vp9_idct because these functions are
used from both encoder and decoder. Removing duplicated code from
vp9_encodemb.c and reusing existing functions.

Change-Id: Ia0a6782f8c4c409efb891651b871dd4bf22d5fe8
2013-10-02 14:13:33 -07:00
Dmitry Kovalev
548671dd20 Removing vp9_add_constant_residual_{8x8, 16x16, 32x32} functions.
We don't need these functions anymore. The only one which was actually
used is vp9_add_constant_residual_32x32. Addition of
vp9_short_idct32x32_1_add eliminates this single usage. SSE2 optimized
version of vp9_short_idct32x32_1_add will be added in the next patch set,
right now it is only C implementation. Now we have all idct functions
implemented in a consistent manner.

Change-Id: I63df79a13cf62aa2c9360a7a26933c100f9ebda3
2013-09-30 10:56:37 -07:00
Dmitry Kovalev
3fab2125ff Renaming vp9_short_idct10_8x8_add to vp9_short_idct8x8_10_add.
Making name consistent with vp9_short_idct8x8 and vp9_short_idct8x8_1.

Change-Id: I99e0be040ec893f9571dcf090e18f98dc58339f5
2013-09-27 15:26:27 -07:00
Dmitry Kovalev
15a36a0a0d Renaming vp9_short_idct10_16x16 to vp9_short_idct16x16_10.
Making function name consistent with vp9_short_idct16x16 and
vp9_short_idct16x16_1.

Change-Id: I70e54be9e6b9a1dddab0de470686591e96d05517
2013-09-26 14:01:25 -07:00
Yaowu Xu
6037f17942 Rename defined constants
The change is to better reflect the nature of the constants.

Change-Id: Icabac6e9bceefbdb3f03f8218f88ef75943c30fb
2013-09-24 10:53:01 -07:00
Jingning Han
67719abde1 Remove unused vp9_short_idct10_32x32_add
The inverse 32x32 transform detects all zero entries and skips the
computations accordingly per 8 rows in the first 1-D operation. The
function vp9_short_idct10_32x32_add performs differently and is not
used anywhere, hence removed.

Change-Id: Ic4fad422debbde7b6b6ffed47c69fbd4268a906c
2013-08-01 12:45:16 -07:00
Jingning Han
a7c4de22e1 16x16 inverse 2D-DCT with DC only
This commit provides special handle on 16x16 inverse 2D-DCT, where
only DC coefficient is quantized to be non-zero value.

Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
2013-07-29 14:45:53 -07:00
Jingning Han
325e0aa650 Special handle on DC only inverse 8x8 2D-DCT
This commit enables a special handle for the 8x8 inverse 2D-DCT,
where only DC coefficient is quantized to be non-zero. For bus_cif
at 2000 kbps, it provides about 1% speed-up at speed 0.

Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011
2013-07-26 14:16:51 -07:00