86 Commits

Author SHA1 Message Date
Debargha Mukherjee
e6790e30c5 Replace DST1 with DST2 for ext-tx experiment
A small gain (0.1 - 0.2%) with this experiment on derflr/hevcmr.

The DST2 can be implemened very efficiently using sign flipping
of odd indexed inputs, followed by DCT, followed by reversal of
the output. This is how it is implemented in this patch.
SIMD optimization is pending.

Change-Id: Ic2fc211ce0e6b7c6702974d76d6573f55cc4da0e
2015-12-14 13:54:41 -08:00
Debargha Mukherjee
3a45a1edfd Remove dst1 config option and merge with ext-tx
Change-Id: I0152ed352ae2a0a725a508b5c209ef2c1dc2302d
2015-11-13 11:24:38 -08:00
Geza Lore
177ad11981 Eliminate copying for FLIPADST in fwd transforms.
This is a port of 01bb4a318dc0f9069264b7fd5641bc3014f47f32

This commit also fixes a bug where FLIPADST transforms when combined
with a DST (that is FLIPADST_DST and DST_FLIPADST) did not actually did
a flipped transform but a straight ADST instead. This was due to the C
implementation that it fell back on not implementing flipping.  This is
now fixed as well and FLIPADST_DST and DST_FLIPADST does what it is
supposed to do.

Change-Id: I89c67ca1d5e06808a1567c51e7d6bec4998182bd
2015-11-13 09:34:26 -08:00
Debargha Mukherjee
ff9aa146cb Reimplementatio of dst1 for speed
Encoder with --enable-ext-tx --enable-dst1 is now 4 times faster.

Change-Id: Ia750ad3516698ce94da4ceb566b1c51539537a95
2015-10-02 11:06:55 -07:00
Debargha Mukherjee
4dbaf9a5ab Redo DST1 in the ext-tx experiment
Moved from nextgenv2 branch to test with other experiments.

derflr: +1.629%

Change-Id: Ie7c720053ed8b628177679c4351bb31b54716a71
2015-09-16 09:46:13 -07:00
Cheng Chen
cc4d523d9f Resolve bug of DST1 in ext_tx experiment.
Change-Id: I828569e3596f9b9e8487aec7c4056e66cf1fc1f2
2015-07-28 10:58:14 -07:00
Debargha Mukherjee
23690fc5d1 Adds support for DST1 transforms for inter blocks
Adds an additional transform in the ext_tx experiment that
is a 2d DST1-DST1 combination.

To enable use --enable-ext-tx --enable-dst1.

This needs to be later extended to combine DST1 with DCT
or ADST.

Change-Id: I6d29f1b778ef8294bcfb6a512a78fc5eda20723b
2015-07-24 16:23:09 -07:00
Debargha Mukherjee
b433dd4443 Adds wavelet transforms + hybrid dct/dwt variants
The wavelets implemented are 2/6, 5/3 and 9/7 each with
a lifting based scheme for even block sizes. The 9/7
one is a double implementation currently.

This is to start experiments with:
1. Replacing large transforms (32x32 and 64x64) with wavelets
or wavelet-dct hybrids that can hopefully localize errors better
spatially. (Will also need alternate entropy coder)
2. Super-resolution modes where the higher sub-bands may be
selectively skipped from being conveyed, while a smart
reconstruction recovers the lost frequencies.

The current patch includes two types of 32x32 and 64x64
transforms: one where only wavelets are used, and another
where a single level wavelet decomposition is followed
by a lower resolution dct on the low-low band.

Change-Id: I2d6755c4e6c8ec9386a04633dacbe0de3b0043ec
2015-06-08 23:30:38 -07:00
Peter de Rivaz
41973e0e3e Refactored idct routines and headers
This change is made in preparation for a
subsequent patch which adds acceleration
for the highbitdepth transform functions.

The highbitdepth transform functions attempt
to use 16/32bit sse instructions where possible,
but fallback to using the C implementations if
potential overflow is detected.  For this reason
the dct routines are made global so they can be
called from the acceleration functions in the
subsequent patch.

Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665
2015-05-06 09:59:20 -07:00
Alex Converse
9b638cded6 tx_skip: Avoid undefined shift behavior.
vp9_quantize_rect did illegal shifts but didn't use the results.
The shift |a << b| is unfortunately undefined if |a < 0|, but the
more verbose |a * (1 << b)| generates the same machine code.

Change-Id: I7ceac66fa20a700630cf8ed008949146b161dab4
2015-04-30 12:56:27 -07:00
Deb Mukherjee
35d38646ec Misc changes to support high-bitdepth with supertx
Change-Id: I0331646d1c55deb6e4631e64bd6b092fb892a43e
2015-03-12 16:52:25 -07:00
punksu
571fdbb05f dpcm intra prediction for tx_skip
Implements vertical, horizontal, and tm dpcm intra prediction for
blocks in tx_skip mode. Typical coding gain on screen content video
is 2%~5%.

Change-Id: Idd5bd84ac59daa586ec0cd724680cef695981651
2015-01-14 14:54:09 -08:00
hui su
5de9280ae9 tx_skip mode for lossy coding
This patch improves the non-transform coding mode. At this
point, the coding gain on screen content videos is about
12% for lossless, an 15% for lossy case.

1. Encode tx_skip flags with context. Y tx_skip flag context is
whether the prediction mode is inter or intra. UV flag context
is Y flag.

2. Transform skipping is less helpful when the Q-index is high.
So it is enabled only when the Q-index is smaller than a
threshold. Currently the threshold is set as 255 for intra blocks,
and 0 for inter blocks.

3. The shift of the prediction residue, when copying them to the
coeff buffer, is set as 3 when the Q-index is larger than a
threshold (currently set as 0), and 2 otherwise.

Change-Id: I372973c7518cf385f6e542b22d0f803016e693b0
2014-12-15 10:46:41 -08:00
hui su
d97fd3eef6 Non transform coding experiment
Non-transform option is enabled in both intra and inter modes.
In lossless case, the average coding gain on screen content
clips is 11.3% in my test.

Change-Id: I2e8de515fb39e74c61bb86ce0f682d5f79e15188
2014-11-19 21:20:21 -08:00
Deb Mukherjee
0c7a94f49b Adding a 64x64 transform mode
Preliminary 64x64 transform implementation.
Includes all code changes.
All mismatches resolved.

Coding results for derf and stdhd are within noise. stdhd is slightly
higher, derf is slightly lower.

To be further refined.

Change-Id: I091c183f62b156d23ed6f648202eb96c82e69b4b
2014-10-30 00:45:57 -07:00
Deb Mukherjee
1929c9b391 Rename highbitdepth functions to use highbd prefix
Uses highbd_ prefix convention consistently.

Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e
2014-10-09 14:40:40 -07:00
Deb Mukherjee
10783d4f3a Adds high bitdepth transform functions and tests
Adds various high bitdepth transform functions and tests.
Much of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform cofficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map tp int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.

Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
2014-09-11 19:56:33 -07:00
Jingning Han
6b0bc34b62 Fix C versions of DC calculation functions
This commit fixes the scaling factors used in the C versions of the
DC calculation functions.

Change-Id: Iab41108c2bb93c2f2e78667214f3a772a2b707b5
2014-06-13 16:09:40 -07:00
Jingning Han
ccba289f8d Fast computation path for forward transform and quantization
This commit enables a fast path computational flow for forward
transformation. It checks the sse and variance of prediction
residuals and decides if the quantized coefficients are all
zero, dc only, or more. It then selects the corresponding coding
path in the forward transformation and quantization stage.

It is currently enabled in rtc coding mode. Will do it for rd
coding mode next.

In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
Overall coding performance for rtc set is changed by -0.18%.

Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
2014-06-12 11:10:54 -07:00
Jingning Han
7f547336b7 Adjust the forward 16x16 DCT computation steps
This commit adjusts the forward 16x16 DCT computation steps to
simplify the register level operations. It fixes the corresponding
sse2 version accordingly.

Change-Id: I72a9c25b8ca9442fc5e113f47cd701ae55aa7f08
2014-05-19 12:39:26 -07:00
Andrew Russell
549c31f8ae minor spelling cleanup in comments
Change-Id: Ia91c6c406273345b08505097ffe1af3896980f06
2014-02-12 16:32:51 -08:00
Dmitry Kovalev
005fc6970b Finally removing "short" from transform names.
Change-Id: I5259b68dc1bcceb153e3ffe638a79a59a3019e9d
2014-02-06 11:54:15 -08:00
Dmitry Kovalev
ff41764920 Removing _1d suffix from transform names.
It is enough to specify (e.g.) idct16, it is obviously different from
idct16x16.

Change-Id: I6b408a37a945de3162429380b59a775b03b95db0
2014-01-27 16:15:36 -08:00
Jingning Han
bdc4371174 Take out assertion from inverse transforms
Separate the rounding and right shift operations of forward transform
from those of inverse transform. Take out the assertion check from
inverse transforms. If the transform coefficients were constructed to
cause intermediate steps of inverse transform overflow, the codec will
just let it overflow without breaking the decoding flow.

Change-Id: I73cfc3706c4e840fc543a77cbc4cdb0b05d07730
2013-11-15 15:30:47 -08:00
Dmitry Kovalev
ae2f732e8c Adding fht{4x4, 8x8, 16x16} functions.
Adding these functions to encapsulate tx_type check. Changing TX_TYPE to
int to match the declaration in vo9_rtch.h.

Change-Id: I6f3a2df6e35595ca73b6aaa9e3909ee7bc3fd16f
2013-10-25 17:55:07 -07:00
Dmitry Kovalev
600a3860a4 Making input pointer constant for all fdct/fht functions.
Change-Id: I78f7012f967a777ddd39bae6671eb501df6bbfe8
2013-10-24 11:48:25 -07:00
Dmitry Kovalev
fd724f13b0 Renaming vp9_short_fdct4x4 and vp9_short_walsh4x4.
For consistency with idct function names. Renames:
  vp9_short_fdct4x4  -> vp9_fdct4x4
  vp9_short_walsh4x4 -> vp9_fwht4x4

Change-Id: Id15497cc1270acca626447d846f0ce9199770f58
2013-10-23 14:28:39 -07:00
Dmitry Kovalev
a018988ce8 Renaming vp9_short_fdct32x32 to vp9_fdct32x32.
For consistency with idct function names.

Change-Id: Ie77b7178e0894c57cd5cb9243c949eb9224ece18
2013-10-23 13:41:40 -07:00
Dmitry Kovalev
5bdd4d9ccf Merge "Renaming vp9_short_fdct16x16 to vp9_fdct16x16." 2013-10-23 13:37:09 -07:00
Dmitry Kovalev
02feb63684 Renaming vp9_short_fdct16x16 to vp9_fdct16x16.
For consistency with idct function names.

Change-Id: I5ca355ba99fdba04f09254be95cf79808b534f71
2013-10-23 10:57:12 -07:00
Dmitry Kovalev
fa143dbc8e Renaming vp9_short_fdct8x8 to vp9_fdct8x8.
For consistency with idct function names.

Change-Id: I7b6af2f92c66eff56f84ed29edc3a66af8dc421f
2013-10-23 10:52:33 -07:00
Dmitry Kovalev
9f09618bd4 Merge "Using stride (# of elements) instead of pitch (bytes) in fdct4x4." 2013-10-22 13:05:24 -07:00
Dmitry Kovalev
a767d10fa5 Merge "Using stride (# of elements) instead of pitch (bytes) in fdct8x8." 2013-10-22 11:34:17 -07:00
Dmitry Kovalev
190c2b4591 Using stride (# of elements) instead of pitch (bytes) in fdct4x4.
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.

Change-Id: I0ba3c52513a5fdd194f1e7e2901092671398985b
2013-10-21 15:27:35 -07:00
Dmitry Kovalev
e5fa44c869 Using stride (# of elements) instead of pitch (bytes) in fdct8x8.
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.

Change-Id: Ibc944952a192e6c7b2b6a869ec2894c01da82ed1
2013-10-18 12:20:26 -07:00
Dmitry Kovalev
1aa7fd5aef Using stride (# of elements) instead of pitch (bytes) in fdct16x16.
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.

Change-Id: I2d95fdcbba96aaa0ed24a80870cb38f53487a97d
2013-10-18 11:49:33 -07:00
Dmitry Kovalev
e05412fc23 Using stride (# of elements) instead of pitch (bytes) in fdct32x32.
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.

Change-Id: Id623c5113262655fa50f7c9d6cec9a91fcb20bb4
2013-10-17 13:02:28 -07:00
Dmitry Kovalev
a4585285ed Removing unused 8x4 transform from the encoder.
Change-Id: Icbcf68b5b685a56f255ebc3859c9692accdadf9e
2013-10-15 11:27:28 -07:00
Dmitry Kovalev
44195fda71 Adding const to the input argument of all 1D transforms.
Also adding static to iadst16_1d and fadst16 functions.

Change-Id: I13c7df3b776f0f8efc6e80099bdb0a2f6d29edaf
2013-10-11 11:19:58 -07:00
Dmitry Kovalev
fc82dbb434 Consistent names for FDCT functions.
Renames:
  fdct4_1d   -> fdct4
  fadst4_1d  -> fadst4
  fdct8_1d   -> fdct8
  fadst8_1d  -> fadst8
  fdct16_1d  -> fdct16
  fadst16_1d -> fadst16

"_1d" suffix is redundant, so removing it. The same will happen with idct
in the next change sets.

Change-Id: Ibf421cd2f569146c6079269df7a31819c098265e
2013-10-10 11:53:55 -07:00
Jim Bankoski
424c74e736 cpplint vp9_dct.c issues resolved
Change-Id: Ia21653a447040f1b472d21ebd19103b0558c4b16
2013-10-04 13:47:59 -07:00
Yaowu Xu
6037f17942 Rename defined constants
The change is to better reflect the nature of the constants.

Change-Id: Icabac6e9bceefbdb3f03f8218f88ef75943c30fb
2013-09-24 10:53:01 -07:00
Yaowu Xu
014acfa2af fix integer overflow errors
Change-Id: I76f440a917832c02d7a727697b225bac66b99f56
2013-09-19 08:14:26 -07:00
Jingning Han
3cf46fa591 Fix 32x32 forward transform SSE2 version
This commit fixed the potential overflow issue in the SSE2
implementation of 32x32 forward DCT. It resolved the corrupted
coded frames in the border of scenes.

Change-Id: If87eef2d46209269f74ef27e7295b6707fbf56f9
2013-08-31 18:47:08 -07:00
Jingning Han
2cb75c9607 Refactor SSE2 8x8 functional units
These serve as building blocks for SSE2 8x8 and 16x16 ADST/DCT
hybrid transform coding.

Change-Id: I4089a754c66e0c986f67d9b8ec4dfb9627ad430d
2013-07-03 10:11:59 -07:00
Christian Duvivier
466e0cf303 SSE2 version of vp9_short_fdct32x32_rd.
43,000 -> 5,750 cycles, about 7.5x faster.

Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0
2013-06-29 13:53:00 -07:00
Jingning Han
ab362621fe Add 8x8 dct/adst unit tests
This commit enables 8x8 DCT and hybrid transform unit tests. It
also tunes the forward hybrid transform rounding opertions for
more precise round-trip performance.

Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3
2013-06-25 09:57:01 -07:00
Jingning Han
a41a4860c0 Make fdct32 computation flow within 16bit range
This commit makes use of dual fdct32x32 versions for rate-distortion
optimization loop and encoding process, respectively. The one for
rd loop requires only 16 bits precision for intermediate steps.
The original fdct32x32 that allows higher intermediate precision (18
bits) was retained for the encoding process only.

This allows speed-up for fdct32x32 in the rd loop. No performance
loss observed.

Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3
2013-06-18 09:46:24 -07:00
Yaowu Xu
042e70e45e Changed to use a new variant of WHT
The commit changed to use a new variant of Walsh-Hadamard Transform
by Tim Terriberry. This new variant has the best compression among a
number of variants that developed by Tim.

Change-Id: Icb3a88515463cfc644b17ca046fcd139db2557e9
2013-05-30 15:37:52 -07:00
Timothy B. Terriberry
95339d6825 Reduce WHT complexity.
Saves 1 add, 3 shifts (and a shift bias) per 1-D transform.

Change-Id: I1104bb1679fe342b2f9677df8a9cdc0cb9699e7d
2013-05-27 13:23:52 -07:00