generic-library/vpx

Author	SHA1	Message	Date
Debargha Mukherjee	e6790e30c5	Replace DST1 with DST2 for ext-tx experiment A small gain (0.1 - 0.2%) with this experiment on derflr/hevcmr. The DST2 can be implemented very efficiently using sign flipping of odd indexed inputs, followed by DCT, followed by reversal of the output. This is how it is implemented in this patch. SIMD optimization is pending. Change-Id: Ic2fc211ce0e6b7c6702974d76d6573f55cc4da0e	2015-12-14 13:54:41 -08:00
Debargha Mukherjee	3a45a1edfd	Remove dst1 config option and merge with ext-tx Change-Id: I0152ed352ae2a0a725a508b5c209ef2c1dc2302d	2015-11-13 11:24:38 -08:00
Geza Lore	177ad11981	Eliminate copying for FLIPADST in fwd transforms. This is a port of 01bb4a318dc0f9069264b7fd5641bc3014f47f32 This commit also fixes a bug where FLIPADST transforms when combined with a DST (that is FLIPADST_DST and DST_FLIPADST) did not actually do a flipped transform but a straight ADST instead. This was due to the C implementation that it fell back on not implementing flipping. This is now fixed as well and FLIPADST_DST and DST_FLIPADST do what they are supposed to do. Change-Id: I89c67ca1d5e06808a1567c51e7d6bec4998182bd	2015-11-13 09:34:26 -08:00
Debargha Mukherjee	ff9aa146cb	Reimplementation of dst1 for speed Encoder with --enable-ext-tx --enable-dst1 is now 4 times faster. Change-Id: Ia750ad3516698ce94da4ceb566b1c51539537a95	2015-10-02 11:06:55 -07:00
Debargha Mukherjee	4dbaf9a5ab	Redo DST1 in the ext-tx experiment Moved from nextgenv2 branch to test with other experiments. derflr: +1.629% Change-Id: Ie7c720053ed8b628177679c4351bb31b54716a71	2015-09-16 09:46:13 -07:00
Cheng Chen	cc4d523d9f	Resolve bug of DST1 in ext_tx experiment. Change-Id: I828569e3596f9b9e8487aec7c4056e66cf1fc1f2	2015-07-28 10:58:14 -07:00
Debargha Mukherjee	23690fc5d1	Adds support for DST1 transforms for inter blocks Adds an additional transform in the ext_tx experiment that is a 2d DST1-DST1 combination. To enable use --enable-ext-tx --enable-dst1. This needs to be later extended to combine DST1 with DCT or ADST. Change-Id: I6d29f1b778ef8294bcfb6a512a78fc5eda20723b	2015-07-24 16:23:09 -07:00
Debargha Mukherjee	b433dd4443	Adds wavelet transforms + hybrid dct/dwt variants The wavelets implemented are 2/6, 5/3 and 9/7 each with a lifting based scheme for even block sizes. The 9/7 one is a double implementation currently. This is to start experiments with: 1. Replacing large transforms (32x32 and 64x64) with wavelets or wavelet-dct hybrids that can hopefully localize errors better spatially. (Will also need alternate entropy coder) 2. Super-resolution modes where the higher sub-bands may be selectively skipped from being conveyed, while a smart reconstruction recovers the lost frequencies. The current patch includes two types of 32x32 and 64x64 transforms: one where only wavelets are used, and another where a single level wavelet decomposition is followed by a lower resolution dct on the low-low band. Change-Id: I2d6755c4e6c8ec9386a04633dacbe0de3b0043ec	2015-06-08 23:30:38 -07:00
Peter de Rivaz	41973e0e3e	Refactored idct routines and headers This change is made in preparation for a subsequent patch which adds acceleration for the highbitdepth transform functions. The highbitdepth transform functions attempt to use 16/32bit sse instructions where possible, but fallback to using the C implementations if potential overflow is detected. For this reason the dct routines are made global so they can be called from the acceleration functions in the subsequent patch. Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665	2015-05-06 09:59:20 -07:00
Alex Converse	9b638cded6	tx_skip: Avoid undefined shift behavior. vp9_quantize_rect did illegal shifts but didn't use the results. The shift |a << b| is unfortunately undefined if |a < 0|, but the more verbose |a * (1 << b)| generates the same machine code. Change-Id: I7ceac66fa20a700630cf8ed008949146b161dab4	2015-04-30 12:56:27 -07:00
Deb Mukherjee	35d38646ec	Misc changes to support high-bitdepth with supertx Change-Id: I0331646d1c55deb6e4631e64bd6b092fb892a43e	2015-03-12 16:52:25 -07:00
punksu	571fdbb05f	dpcm intra prediction for tx_skip Implements vertical, horizontal, and tm dpcm intra prediction for blocks in tx_skip mode. Typical coding gain on screen content video is 2%~5%. Change-Id: Idd5bd84ac59daa586ec0cd724680cef695981651	2015-01-14 14:54:09 -08:00
hui su	5de9280ae9	tx_skip mode for lossy coding This patch improves the non-transform coding mode. At this point, the coding gain on screen content videos is about 12% for the lossless case, and 15% for the lossy case. 1. Encode tx_skip flags with context. Y tx_skip flag context is whether the prediction mode is inter or intra. UV flag context is Y flag. 2. Transform skipping is less helpful when the Q-index is high. So it is enabled only when the Q-index is smaller than a threshold. Currently the threshold is set as 255 for intra blocks, and 0 for inter blocks. 3. The shift of the prediction residue, when copying them to the coeff buffer, is set as 3 when the Q-index is larger than a threshold (currently set as 0), and 2 otherwise. Change-Id: I372973c7518cf385f6e542b22d0f803016e693b0	2014-12-15 10:46:41 -08:00
hui su	d97fd3eef6	Non transform coding experiment Non-transform option is enabled in both intra and inter modes. In lossless case, the average coding gain on screen content clips is 11.3% in my test. Change-Id: I2e8de515fb39e74c61bb86ce0f682d5f79e15188	2014-11-19 21:20:21 -08:00
Deb Mukherjee	0c7a94f49b	Adding a 64x64 transform mode Preliminary 64x64 transform implementation. Includes all code changes. All mismatches resolved. Coding results for derf and stdhd are within noise. stdhd is slightly higher, derf is slightly lower. To be further refined. Change-Id: I091c183f62b156d23ed6f648202eb96c82e69b4b	2014-10-30 00:45:57 -07:00
Deb Mukherjee	1929c9b391	Rename highbitdepth functions to use highbd prefix Uses highbd_ prefix convention consistently. Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e	2014-10-09 14:40:40 -07:00
Deb Mukherjee	10783d4f3a	Adds high bitdepth transform functions and tests Adds various high bitdepth transform functions and tests. Many of the changes are related to using typedefs tran_low_t and tran_high_t for the final transform coefficients and intermediate stages of the transform computation respectively rather than fixed types int16_t/int. When vp9_highbitdepth configure flag is off, these map to int16_t/int32_t, but when the flag is on, they map to int32_t/int64_t to make space for needed extra precision. Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8	2014-09-11 19:56:33 -07:00
Jingning Han	6b0bc34b62	Fix C versions of DC calculation functions This commit fixes the scaling factors used in the C versions of the DC calculation functions. Change-Id: Iab41108c2bb93c2f2e78667214f3a772a2b707b5	2014-06-13 16:09:40 -07:00
Jingning Han	ccba289f8d	Fast computation path for forward transform and quantization This commit enables a fast path computational flow for forward transformation. It checks the sse and variance of prediction residuals and decides if the quantized coefficients are all zero, dc only, or more. It then selects the corresponding coding path in the forward transformation and quantization stage. It is currently enabled in rtc coding mode. Will do it for rd coding mode next. In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up. Overall coding performance for rtc set is changed by -0.18%. Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1	2014-06-12 11:10:54 -07:00
Jingning Han	7f547336b7	Adjust the forward 16x16 DCT computation steps This commit adjusts the forward 16x16 DCT computation steps to simplify the register level operations. It fixes the corresponding sse2 version accordingly. Change-Id: I72a9c25b8ca9442fc5e113f47cd701ae55aa7f08	2014-05-19 12:39:26 -07:00
Andrew Russell	549c31f8ae	minor spelling cleanup in comments Change-Id: Ia91c6c406273345b08505097ffe1af3896980f06	2014-02-12 16:32:51 -08:00
Dmitry Kovalev	005fc6970b	Finally removing "short" from transform names. Change-Id: I5259b68dc1bcceb153e3ffe638a79a59a3019e9d	2014-02-06 11:54:15 -08:00
Dmitry Kovalev	ff41764920	Removing _1d suffix from transform names. It is enough to specify (e.g.) idct16, it is obviously different from idct16x16. Change-Id: I6b408a37a945de3162429380b59a775b03b95db0	2014-01-27 16:15:36 -08:00
Jingning Han	bdc4371174	Take out assertion from inverse transforms Separate the rounding and right shift operations of forward transform from those of inverse transform. Take out the assertion check from inverse transforms. If the transform coefficients were constructed to cause intermediate steps of inverse transform overflow, the codec will just let it overflow without breaking the decoding flow. Change-Id: I73cfc3706c4e840fc543a77cbc4cdb0b05d07730	2013-11-15 15:30:47 -08:00
Dmitry Kovalev	ae2f732e8c	Adding fht{4x4, 8x8, 16x16} functions. Adding these functions to encapsulate tx_type check. Changing TX_TYPE to int to match the declaration in vo9_rtch.h. Change-Id: I6f3a2df6e35595ca73b6aaa9e3909ee7bc3fd16f	2013-10-25 17:55:07 -07:00
Dmitry Kovalev	600a3860a4	Making input pointer constant for all fdct/fht functions. Change-Id: I78f7012f967a777ddd39bae6671eb501df6bbfe8	2013-10-24 11:48:25 -07:00
Dmitry Kovalev	fd724f13b0	Renaming vp9_short_fdct4x4 and vp9_short_walsh4x4. For consistency with idct function names. Renames: vp9_short_fdct4x4 -> vp9_fdct4x4 vp9_short_walsh4x4 -> vp9_fwht4x4 Change-Id: Id15497cc1270acca626447d846f0ce9199770f58	2013-10-23 14:28:39 -07:00
Dmitry Kovalev	a018988ce8	Renaming vp9_short_fdct32x32 to vp9_fdct32x32. For consistency with idct function names. Change-Id: Ie77b7178e0894c57cd5cb9243c949eb9224ece18	2013-10-23 13:41:40 -07:00
Dmitry Kovalev	5bdd4d9ccf	Merge "Renaming vp9_short_fdct16x16 to vp9_fdct16x16."	2013-10-23 13:37:09 -07:00
Dmitry Kovalev	02feb63684	Renaming vp9_short_fdct16x16 to vp9_fdct16x16. For consistency with idct function names. Change-Id: I5ca355ba99fdba04f09254be95cf79808b534f71	2013-10-23 10:57:12 -07:00
Dmitry Kovalev	fa143dbc8e	Renaming vp9_short_fdct8x8 to vp9_fdct8x8. For consistency with idct function names. Change-Id: I7b6af2f92c66eff56f84ed29edc3a66af8dc421f	2013-10-23 10:52:33 -07:00
Dmitry Kovalev	9f09618bd4	Merge "Using stride (# of elements) instead of pitch (bytes) in fdct4x4."	2013-10-22 13:05:24 -07:00
Dmitry Kovalev	a767d10fa5	Merge "Using stride (# of elements) instead of pitch (bytes) in fdct8x8."	2013-10-22 11:34:17 -07:00
Dmitry Kovalev	190c2b4591	Using stride (# of elements) instead of pitch (bytes) in fdct4x4. Just making fdct consistent with iht/idct/fht functions which all use stride (# of elements) as input argument. Change-Id: I0ba3c52513a5fdd194f1e7e2901092671398985b	2013-10-21 15:27:35 -07:00
Dmitry Kovalev	e5fa44c869	Using stride (# of elements) instead of pitch (bytes) in fdct8x8. Just making fdct consistent with iht/idct/fht functions which all use stride (# of elements) as input argument. Change-Id: Ibc944952a192e6c7b2b6a869ec2894c01da82ed1	2013-10-18 12:20:26 -07:00
Dmitry Kovalev	1aa7fd5aef	Using stride (# of elements) instead of pitch (bytes) in fdct16x16. Just making fdct consistent with iht/idct/fht functions which all use stride (# of elements) as input argument. Change-Id: I2d95fdcbba96aaa0ed24a80870cb38f53487a97d	2013-10-18 11:49:33 -07:00
Dmitry Kovalev	e05412fc23	Using stride (# of elements) instead of pitch (bytes) in fdct32x32. Just making fdct consistent with iht/idct/fht functions which all use stride (# of elements) as input argument. Change-Id: Id623c5113262655fa50f7c9d6cec9a91fcb20bb4	2013-10-17 13:02:28 -07:00
Dmitry Kovalev	a4585285ed	Removing unused 8x4 transform from the encoder. Change-Id: Icbcf68b5b685a56f255ebc3859c9692accdadf9e	2013-10-15 11:27:28 -07:00
Dmitry Kovalev	44195fda71	Adding const to the input argument of all 1D transforms. Also adding static to iadst16_1d and fadst16 functions. Change-Id: I13c7df3b776f0f8efc6e80099bdb0a2f6d29edaf	2013-10-11 11:19:58 -07:00
Dmitry Kovalev	fc82dbb434	Consistent names for FDCT functions. Renames: fdct4_1d -> fdct4 fadst4_1d -> fadst4 fdct8_1d -> fdct8 fadst8_1d -> fadst8 fdct16_1d -> fdct16 fadst16_1d -> fadst16 "_1d" suffix is redundant, so removing it. The same will happen with idct in the next change sets. Change-Id: Ibf421cd2f569146c6079269df7a31819c098265e	2013-10-10 11:53:55 -07:00
Jim Bankoski	424c74e736	cpplint vp9_dct.c issues resolved Change-Id: Ia21653a447040f1b472d21ebd19103b0558c4b16	2013-10-04 13:47:59 -07:00
Yaowu Xu	6037f17942	Rename defined constants The change is to better reflect the nature of the constants. Change-Id: Icabac6e9bceefbdb3f03f8218f88ef75943c30fb	2013-09-24 10:53:01 -07:00
Yaowu Xu	014acfa2af	fix integer overflow errors Change-Id: I76f440a917832c02d7a727697b225bac66b99f56	2013-09-19 08:14:26 -07:00
Jingning Han	3cf46fa591	Fix 32x32 forward transform SSE2 version This commit fixed the potential overflow issue in the SSE2 implementation of 32x32 forward DCT. It resolved the corrupted coded frames in the border of scenes. Change-Id: If87eef2d46209269f74ef27e7295b6707fbf56f9	2013-08-31 18:47:08 -07:00
Jingning Han	2cb75c9607	Refactor SSE2 8x8 functional units These serve as building blocks for SSE2 8x8 and 16x16 ADST/DCT hybrid transform coding. Change-Id: I4089a754c66e0c986f67d9b8ec4dfb9627ad430d	2013-07-03 10:11:59 -07:00
Christian Duvivier	466e0cf303	SSE2 version of vp9_short_fdct32x32_rd. 43,000 -> 5,750 cycles, about 7.5x faster. Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0	2013-06-29 13:53:00 -07:00
Jingning Han	ab362621fe	Add 8x8 dct/adst unit tests This commit enables 8x8 DCT and hybrid transform unit tests. It also tunes the forward hybrid transform rounding operations for more precise round-trip performance. Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3	2013-06-25 09:57:01 -07:00
Jingning Han	a41a4860c0	Make fdct32 computation flow within 16bit range This commit makes use of dual fdct32x32 versions for rate-distortion optimization loop and encoding process, respectively. The one for rd loop requires only 16 bits precision for intermediate steps. The original fdct32x32 that allows higher intermediate precision (18 bits) was retained for the encoding process only. This allows speed-up for fdct32x32 in the rd loop. No performance loss observed. Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3	2013-06-18 09:46:24 -07:00
Yaowu Xu	042e70e45e	Changed to use a new variant of WHT This commit switches to a new variant of the Walsh-Hadamard Transform by Tim Terriberry. This new variant has the best compression among a number of variants that were developed by Tim. Change-Id: Icb3a88515463cfc644b17ca046fcd139db2557e9	2013-05-30 15:37:52 -07:00
Timothy B. Terriberry	95339d6825	Reduce WHT complexity. Saves 1 add, 3 shifts (and a shift bias) per 1-D transform. Change-Id: I1104bb1679fe342b2f9677df8a9cdc0cb9699e7d	2013-05-27 13:23:52 -07:00

86 Commits
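
Commit 9b638cded6 above replaces a left shift of a possibly negative value with a multiplication by a power of two. A small sketch of that pattern follows. The helper name scale_by_pow2 is hypothetical and does not appear in the codebase, and the claim that compilers emit the same machine code for both forms comes from the commit message rather than from this example.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: multiply a possibly negative value by 2^shift.
 * In C, `value << shift` is undefined behavior when value is negative,
 * so the scaling is written as a multiplication instead. The caller must
 * still make sure the product fits in int32_t, because signed overflow
 * is also undefined. */
static int32_t scale_by_pow2(int32_t value, int shift) {
  assert(shift >= 0 && shift < 31); /* keeps 1 << shift well defined */
  return value * (1 << shift);
}
```

For example, scale_by_pow2(-3, 2) yields -12, whereas the expression -3 << 2 has no defined result under the C standard.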