generic-library/vpx

Author	SHA1	Message	Date
Debargha Mukherjee	e6790e30c5	Replace DST1 with DST2 for ext-tx experiment A small gain (0.1 - 0.2%) with this experiment on derflr/hevcmr. The DST2 can be implemented very efficiently using sign flipping of odd indexed inputs, followed by DCT, followed by reversal of the output. This is how it is implemented in this patch. SIMD optimization is pending. Change-Id: Ic2fc211ce0e6b7c6702974d76d6573f55cc4da0e	2015-12-14 13:54:41 -08:00
Debargha Mukherjee	3a45a1edfd	Remove dst1 config option and merge with ext-tx Change-Id: I0152ed352ae2a0a725a508b5c209ef2c1dc2302d	2015-11-13 11:24:38 -08:00
Debargha Mukherjee	4dbaf9a5ab	Redo DST1 in the ext-tx experiment Moved from nextgenv2 branch to test with other experiments. derflr: +1.629% Change-Id: Ie7c720053ed8b628177679c4351bb31b54716a71	2015-09-16 09:46:13 -07:00
Shunyao Li	2de18d1fd2	Super resolution mode (+CONFIG_SR_MODE) CONFIG_SR_MODE=1, enable SR mode USE_POST_F=1, enable SR post filter SR_USE_MULTI_F=1, enable SR post filter family Not compatible with other experiments yet Change-Id: I116f1d898cc2ff7dd114d7379664304907afe0ec	2015-08-31 15:29:39 -07:00
Debargha Mukherjee	23690fc5d1	Adds support for DST1 transforms for inter blocks Adds an additional transform in the ext_tx experiment that is a 2d DST1-DST1 combination. To enable use --enable-ext-tx --enable-dst1. This needs to be later extended to combine DST1 with DCT or ADST. Change-Id: I6d29f1b778ef8294bcfb6a512a78fc5eda20723b	2015-07-24 16:23:09 -07:00
Debargha Mukherjee	4b57a8b356	Add extended transforms for 32x32 and 64x64 Framework for alternate transforms for inter 32x32 and larger based on dwt-dct hybrid is implemented. Further experiments are to be conducted with different variations of hybrid dct/dwt or plain dwt, as well as super-resolution mode. Change-Id: I9a2bf49ba317e7668002cf1499211d7da6fa14ad	2015-07-23 18:01:22 -07:00
Debargha Mukherjee	b433dd4443	Adds wavelet transforms + hybrid dct/dwt variants The wavelets implemented are 2/6, 5/3 and 9/7 each with a lifting based scheme for even block sizes. The 9/7 one is a double implementation currently. This is to start experiments with: 1. Replacing large transforms (32x32 and 64x64) with wavelets or wavelet-dct hybrids that can hopefully localize errors better spatially. (Will also need alternate entropy coder) 2. Super-resolution modes where the higher sub-bands may be selectively skipped from being conveyed, while a smart reconstruction recovers the lost frequencies. The current patch includes two types of 32x32 and 64x64 transforms: one where only wavelets are used, and another where a single level wavelet decomposition is followed by a lower resolution dct on the low-low band. Change-Id: I2d6755c4e6c8ec9386a04633dacbe0de3b0043ec	2015-06-08 23:30:38 -07:00
Peter de Rivaz	41973e0e3e	Refactored idct routines and headers This change is made in preparation for a subsequent patch which adds acceleration for the highbitdepth transform functions. The highbitdepth transform functions attempt to use 16/32bit sse instructions where possible, but fallback to using the C implementations if potential overflow is detected. For this reason the dct routines are made global so they can be called from the acceleration functions in the subsequent patch. Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665	2015-05-06 09:59:20 -07:00
hui su	b69152db79	Add high bit depth support for tx-skip expt +0.3% on 10-bit +0.3% on 12-bit With other high bit compatible experiments on 12-bit +12.44% (+0.17) over 8-bit baseline Change-Id: I40b4c382fa54ba4640d08d9d01950ea8c1200bc9	2015-04-16 14:54:39 -07:00
punksu	571fdbb05f	dpcm intra prediction for tx_skip Implements vertical, horizontal, and tm dpcm intra prediction for blocks in tx_skip mode. Typical coding gain on screen content video is 2%~5%. Change-Id: Idd5bd84ac59daa586ec0cd724680cef695981651	2015-01-14 14:54:09 -08:00
hui su	5de9280ae9	tx_skip mode for lossy coding This patch improves the non-transform coding mode. At this point, the coding gain on screen content videos is about 12% for the lossless case, and 15% for the lossy case. 1. Encode tx_skip flags with context. Y tx_skip flag context is whether the prediction mode is inter or intra. UV flag context is Y flag. 2. Transform skipping is less helpful when the Q-index is high. So it is enabled only when the Q-index is smaller than a threshold. Currently the threshold is set as 255 for intra blocks, and 0 for inter blocks. 3. The shift of the prediction residue, when copying them to the coeff buffer, is set as 3 when the Q-index is larger than a threshold (currently set as 0), and 2 otherwise. Change-Id: I372973c7518cf385f6e542b22d0f803016e693b0	2014-12-15 10:46:41 -08:00
hui su	d97fd3eef6	Non transform coding experiment Non-transform option is enabled in both intra and inter modes. In lossless case, the average coding gain on screen content clips is 11.3% in my test. Change-Id: I2e8de515fb39e74c61bb86ce0f682d5f79e15188	2014-11-19 21:20:21 -08:00
Deb Mukherjee	0c7a94f49b	Adding a 64x64 transform mode Preliminary 64x64 transform implementation. Includes all code changes. All mismatches resolved. Coding results for derf and stdhd are within noise. stdhd is slightly higher, derf is slightly lower. To be further refined. Change-Id: I091c183f62b156d23ed6f648202eb96c82e69b4b	2014-10-30 00:45:57 -07:00
Deb Mukherjee	1929c9b391	Rename highbitdepth functions to use highbd prefix Uses highbd_ prefix convention consistently. Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e	2014-10-09 14:40:40 -07:00
Jingning Han	12344f2697	Add range check in inverse ADST 16x16 Bit-stream clarification related to Issue 868. Change-Id: I92a7bc5b7782c9ea5c3f6cceec761742183c9514	2014-10-06 11:07:58 -07:00
Jingning Han	74c2997bc9	Remove redundant header file from vp9_idct.h Change-Id: Id92544762e7b96d3c729dfc8e04ecff91cbcc7f9	2014-10-01 14:58:27 -07:00
Deb Mukherjee	872b207b78	Moves transform type defines to vp9_common Moves transform type defines to vp9_common.h from vp9_idct.h so that they can be included in vp9_rtcd_defs.pl safely. Change-Id: Id5106227bee5934f7ce8b06f2eb9fa8a9a2e0ddb	2014-09-30 19:44:17 -07:00
James Zern	4a296e6baa	Revert "Fix compiling error in vp9_idct.h" This reverts commit eafc8c9c40d712aabe234bed5269a02c62fa0bfc. tran_low_t/tran_high_t don't belong in a public header, they're private. Similarly the public headers shouldn't rely on config defines, vpx_config.h isn't installed. Change-Id: I194ec273598da418df8dd727b6c0e78a556740ad	2014-09-30 16:08:55 -07:00
Jingning Han	eafc8c9c40	Fix compiling error in vp9_idct.h This commit fixes a compiling error in vp9_idct.h, where the codec checks that the intermediate steps of transformation fit within 16-bit length. The issue was due to broken file dependency. Change-Id: Ib22bba13a1e6df28489cb23d6774c561969f1fdc	2014-09-30 09:11:59 -07:00
Deb Mukherjee	10783d4f3a	Adds high bitdepth transform functions and tests Adds various high bitdepth transform functions and tests. Much of the changes are related to using typedefs tran_low_t and tran_high_t for the final transform coefficients and intermediate stages of the transform computation respectively rather than fixed types int16_t/int. When vp9_highbitdepth configure flag is off, these map to int16_t/int32_t, but when the flag is on, they map to int32_t/int64_t to make space for needed extra precision. Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8	2014-09-11 19:56:33 -07:00
Yaowu Xu	0a2b25dcb9	configure: add --enable-coefficient-range-checking This commit adds a configure time option used to enable strict error checking in decoder to make sure intermediate stage coefficients of inverse transforms are within valid range of signed 16 bit integer. For valid VP9 input streams, intermediate stage coefficients should always stay within the range of a signed 16 bit integer. Coefficients can go out of this range for invalid/corrupt VP9 streams. However, strictly checking this range for every intermediate coefficient can be a burden for decoder, therefore such validation is only enabled with configure option --enable-coefficient-range-checking. Change-Id: I47d47c8c4e48a922c3d223ca59064f51b3f0f5ed	2014-08-06 17:13:16 -07:00
Jingning Han	6d21cbd20b	Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs This commit enables SSSE3 implementation of the inverse 2D-DCT with only first 10 coefficients non-zero. It reduces the runtime of SSE2 version from 745 cycles to 538 cycles, i.e., 27% speed-up. Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe	2014-05-28 10:53:33 -07:00
Dmitry Kovalev	25a666ef39	Moving pair_set_epi32 macro into vp9_dct32x32_sse2.c. Change-Id: I642a7d343677bf934e9a54cf4ad78e908620e39a	2014-05-01 16:45:49 -07:00
James Zern	0940c9cfde	vp9/common: add extern "C" to headers Change-Id: Ic334da9aee968e33762c2b25d9fbad24c844b411	2014-01-23 16:21:24 -08:00
Jingning Han	bdc4371174	Take out assertion from inverse transforms Separate the rounding and right shift operations of forward transform from those of inverse transform. Take out the assertion check from inverse transforms. If the transform coefficients were constructed to cause intermediate steps of inverse transform overflow, the codec will just let it overflow without breaking the decoding flow. Change-Id: I73cfc3706c4e840fc543a77cbc4cdb0b05d07730	2013-11-15 15:30:47 -08:00
Dmitry Kovalev	65f118d72f	Making input pointer of any inverse transform constant. Also renaming dest_stride to stride in some places. Change-Id: I75f602b623a5a7071d4922b747c45fa0b7d7a940	2013-10-11 18:27:12 -07:00
Dmitry Kovalev	ac468dde46	Consistent names for inverse hybrid transforms (2 of 2). Renames: vp9_iht_add -> vp9_iht4x4_add vp9_iht_add_8x8 -> vp9_iht8x8_add vp9_iht_add_16x16 -> vp9_iht16x16_add Change-Id: I8f1a2913e02d90d41f174f27e4ee2fad0dbd4a21	2013-10-11 15:49:05 -07:00
Dmitry Kovalev	44195fda71	Adding const to the input argument of all 1D transforms. Also adding static to iadst16_1d and fadst16 functions. Change-Id: I13c7df3b776f0f8efc6e80099bdb0a2f6d29edaf	2013-10-11 11:19:58 -07:00
Dmitry Kovalev	1e766b50e2	Giving consistent names to IDCT 32x32 functions. Renames: vp9_short_idct32x32_add -> vp9_idct32x32_1024_add vp9_short_idct32x32_1_add -> vp9_idct32x32_1_add vp9_idct_add_32x32 -> vp9_idct32x32_add Change-Id: Id85306f5814bac6c47463a6b5901a93082510666	2013-10-10 11:27:39 -07:00
Dmitry Kovalev	b096c5a336	Giving consistent names to IDCT 16x16 functions. Renames: vp9_short_idct16x16_add -> vp9_idct16x16_256_add vp9_short_idct16x16_10_add -> vp9_idct16x16_10_add vp9_short_idct16x16_1_add -> vp9_idct16x16_1_add vp9_idct_add_16x16 -> vp9_idct16x16_add Change-Id: Ief8a3904de78deab0f4ede944c4d0339c228cfc3	2013-10-07 14:31:10 -07:00
Dmitry Kovalev	c6ad70d5f1	Giving consistent names to IDCT 8x8 functions. Renames: vp9_short_idct8x8_add -> vp9_idct8x8_64_add vp9_short_idct8x8_1_add -> vp9_idct8x8_1_add vp9_short_idct8x8_10_add -> vp9_idct8x8_10_add vp9_idct_add_8x8 -> vp9_idct8x8_add Change-Id: Ifb8d3a45b4c0397aa805b30463f3d14581bf72c1	2013-10-06 00:24:09 -07:00
Dmitry Kovalev	3a0602578e	Giving consistent names to IDCT/IWHT functions. The idea is to have the following names for each transform size: vp9_idct4x4_add vp9_idct4x4_1_add vp9_idct4x4_10_add vp9_idct4x4_16_add vp9_idct8x8_add vp9_idct8x8_1_add vp9_idct8x8_10_add vp9_idct8x8_64_add etc for 16x16, 32x32 The actual list of renames in this patch: vp9_idct_add_lossless -> vp9_iwht4x4_add vp9_short_iwalsh4x4_add -> vp9_iwht4x4_16_add vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add vp9_idct_add -> vp9_idct4x4_add vp9_short_idct4x4_add -> vp9_idct4x4_16_add vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1	2013-10-04 14:17:06 -07:00
Dmitry Kovalev	be7eec79be	Moving all idct/iht functions in one place. Moving functions from vp9_idct_blk to vp9_idct because these functions are used from both encoder and decoder. Removing duplicated code from vp9_encodemb.c and reusing existing functions. Change-Id: Ia0a6782f8c4c409efb891651b871dd4bf22d5fe8	2013-10-02 14:13:33 -07:00
Yaowu Xu	6037f17942	Rename defined constants The change is to better reflect the nature of the constants. Change-Id: Icabac6e9bceefbdb3f03f8218f88ef75943c30fb	2013-09-24 10:53:01 -07:00
Yaowu Xu	014acfa2af	fix integer overflow errors Change-Id: I76f440a917832c02d7a727697b225bac66b99f56	2013-09-19 08:14:26 -07:00
Jingning Han	78136edcdc	SSE2 high precision 32x32 forward DCT Enable SSE2 implementation of high precision 32x32 forward DCT. The intermediate stacks are of 32-bits. The run-time goes down from 32126 cycles to 13442 cycles. Change-Id: Ib5ccafe3176c65bd6f2dbdef790bd47bbc880e56	2013-08-12 16:52:53 -07:00
Dmitry Kovalev	ca75f1255f	Removing and moving around constant definitions. Removing unused and duplicated constants, moving them from .h to .c if possible. Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f	2013-07-15 19:26:30 -07:00
Christian Duvivier	466e0cf303	SSE2 version of vp9_short_fdct32x32_rd. 43,000 -> 5,750 cycles, about 7.5x faster. Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0	2013-06-29 13:53:00 -07:00
Jingning Han	a41a4860c0	Make fdct32 computation flow within 16bit range This commit makes use of dual fdct32x32 versions for rate-distortion optimization loop and encoding process, respectively. The one for rd loop requires only 16 bits precision for intermediate steps. The original fdct32x32 that allows higher intermediate precision (18 bits) was retained for the encoding process only. This allows speed-up for fdct32x32 in the rd loop. No performance loss observed. Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3	2013-06-18 09:46:24 -07:00
Yaowu Xu	042e70e45e	Changed to use a new variant of WHT The commit changed to use a new variant of Walsh-Hadamard Transform by Tim Terriberry. This new variant has the best compression among a number of variants developed by Tim. Change-Id: Icb3a88515463cfc644b17ca046fcd139db2557e9	2013-05-30 15:37:52 -07:00
Yunqing Wang	6344c84c82	Optimize 8x8 idct function Wrote sse2 functions of vp9_short_idct8x8 and vp9_short_idct10_8x8. Compared to c version, the sse2 version is 2X faster. The decoder test didn't show noticeable gain since 8x8 idct doesn't take much of decoding time (less than 1% in my test). Change-Id: I56313e18cd481700b3b52c4eda5ca204ca6365f3	2013-03-18 15:34:14 -07:00
Dmitry Kovalev	3603dfb62c	Consistent usage of ROUND_POWER_OF_TWO macro. Change-Id: I44660975e9985310d8c654c158ee7a61291b5a08	2013-03-07 12:24:35 -08:00
Yunqing Wang	e8bc9f4220	Optimize vp9_short_idct4x4llm function Wrote a SSE2 vp9_short_idct4x4llm to improve the decoder performance. Change-Id: I90b9d48c4bf37aaf47995bffe7e584e6d4a2c000	2013-03-04 12:01:27 -08:00
Christian Duvivier	c129203f7e	Faster vp9_short_fdct8x8. Scalar path is about 1.4x faster (4% overall encoder speedup). SSE2 path is about 7x faster (13% overall encoder speedup). Change-Id: I7e85d8225a914a74c61ea370210414696560094d	2013-02-27 17:23:08 -08:00
Dmitry Kovalev	347f3a0aa8	Code cleanup. Fixing code style, using array lookup instead of switch statements for forward hybrid transforms (in the same way as for their inverses). Consistent usage of ROUND_POWER_OF_TWO macro in appropriate places. Change-Id: I0d3822ae11f928905fdbfbe4158f91d97c71015f	2013-02-27 13:51:04 -08:00
Yunqing Wang	8092aaf9ec	Merge "Optimize vp9_dc_only_idct_add_c function" into experimental	2013-02-27 11:38:45 -08:00
Yunqing Wang	35bc02c6eb	Optimize vp9_dc_only_idct_add_c function Wrote SSE2 version of vp9_dc_only_idct_add_c function. In order to improve performance, clipped the absolute diff values to [0, 255]. This allowed us to keep the additions/subtractions in 8 bits. Test showed an over 2% decoder performance increase. Change-Id: Ie1a236d23d207e4ffcd1fc9f3d77462a9c7fe09d	2013-02-26 17:16:13 -08:00
Yaowu Xu	66d94ac13c	Improve 32x32 forward dct The commit improves the 32x32 forward dct implementation: 1. change to use same constants and rounding as other forward dcts 2. select rounding to specifically minimize the roundtrip error, which improved average 19/block to .77/block using 100000 random input. Test showed a small but consistent gain on all test sets, about .15% Change-Id: If0afd6a71880a522f60c1c234be0462092c2eb53	2013-02-26 09:23:01 -08:00
Jingning Han	77a3becf92	clean up forward and inverse hybrid transform Rebased. Remove the old matrix multiplication transform computation. The 16x16 ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16 300/0 in vp9/common/vp9_blockd.h. Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f	2013-02-25 09:16:12 -08:00
Dmitry Kovalev	548b4dd5f2	Code cleanup. Removing redundant 'extern' keywords and parentheses, fixing indentation, making variable names lower case, using short expressions x = c instead of x = x c, minor code simplifications. Change-Id: If6a25fcf306d1db26e90d27e3c24a32735c607de	2013-02-22 11:03:14 -08:00

53 Commits
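
Commit e6790e30c5 above describes computing the DST2 by flipping the sign of odd-indexed inputs, running a DCT, and reversing the output. The sketch below illustrates that identity only; it is not the patch's code. It assumes unnormalized floating-point DCT-II and DST-II definitions, whereas the integer transforms in the library use their own scaling and rounding. The function names `dct2`, `dst2_direct`, and `dst2_via_dct` are hypothetical.

```python
import math


def dct2(x):
    """Naive O(N^2) unnormalized DCT-II (assumed definition, for illustration)."""
    n = len(x)
    return [
        sum(x[i] * math.cos(math.pi * (i + 0.5) * k / n) for i in range(n))
        for k in range(n)
    ]


def dst2_direct(x):
    """Naive O(N^2) unnormalized DST-II, used only as a reference result."""
    n = len(x)
    return [
        sum(x[i] * math.sin(math.pi * (i + 0.5) * (k + 1) / n) for i in range(n))
        for k in range(n)
    ]


def dst2_via_dct(x):
    """DST-II computed as: flip odd-indexed signs, DCT-II, reverse the output."""
    flipped = [-v if i % 2 else v for i, v in enumerate(x)]
    return dct2(flipped)[::-1]


def max_abs_diff(a, b):
    """Largest element-wise absolute difference between two sequences."""
    return max(abs(p - q) for p, q in zip(a, b))
```

Comparing `dst2_via_dct(x)` with `dst2_direct(x)` through `max_abs_diff` on any input should give a difference at floating-point noise level. The identity holds because sin(pi(n + 1/2)) equals (-1)^n and cos(pi(n + 1/2)) equals 0, so output index N-1-k of the DST-II is the DCT-II coefficient k of the sign-flipped input.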