generic-library/vpx

Author	SHA1	Message	Date
Jingning Han	ab362621fe	Add 8x8 dct/adst unit tests This commit enables 8x8 DCT and hybrid transform unit tests. It also tunes the forward hybrid transform rounding operations for more precise round-trip performance. Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3	2013-06-25 09:57:01 -07:00
Jingning Han	a41a4860c0	Make fdct32 computation flow within 16bit range This commit makes use of dual fdct32x32 versions for rate-distortion optimization loop and encoding process, respectively. The one for rd loop requires only 16 bits precision for intermediate steps. The original fdct32x32 that allows higher intermediate precision (18 bits) was retained for the encoding process only. This allows speed-up for fdct32x32 in the rd loop. No performance loss observed. Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3	2013-06-18 09:46:24 -07:00
Yaowu Xu	042e70e45e	Changed to use a new variant of WHT The commit changed to use a new variant of Walsh-Hadamard Transform by Tim Terriberry. This new variant has the best compression among a number of variants that Tim developed. Change-Id: Icb3a88515463cfc644b17ca046fcd139db2557e9	2013-05-30 15:37:52 -07:00
Timothy B. Terriberry	95339d6825	Reduce WHT complexity. Saves 1 add, 3 shifts (and a shift bias) per 1-D transform. Change-Id: I1104bb1679fe342b2f9677df8a9cdc0cb9699e7d	2013-05-27 13:23:52 -07:00
Christian Duvivier	5b6d33f9af	Faster vp9_short_fdct4x4 and vp9_short_fdct8x4. Scalar path is about 1.3x faster (2.1% overall encoder speedup). SSE2 path is about 5.0x faster (8.4% overall encoder speedup). Change-Id: I360d167b5ad6f387bba00406129323e2fe6e7dda	2013-04-16 16:38:30 -07:00
John Koleszar	7f7d1357a2	Merge branch 'experimental' into master VP9 preview bitstream 2, commit '868ecb55a1528ca3f19286e7d1551572bf89b642' Conflicts: vp9/vp9_common.mk Change-Id: I3f0f6e692c987ff24f98ceafbb86cb9cf64ad8d3	2013-04-16 06:49:46 -07:00
Yaowu Xu	12ade55719	Merge "removed reference to "LLM" and "x8"" into experimental	2013-03-18 08:51:19 -07:00
Christian Duvivier	4418b790a7	Faster vp9_short_fdct16x16. Scalar path is about 1.5x faster (3.1% overall encoder speedup). SSE2 path is about 7.2x faster (7.8% overall encoder speedup). Change-Id: I06da5ad0cdae2488431eabf002b0d898d66d8289	2013-03-15 15:55:31 -07:00
Yaowu Xu	005552639b	removed reference to "LLM" and "x8" The commit changed the name of files and function to remove obsolete reference to LLM and x8. Change-Id: I973b20fc1a55149ed68b5408b3874768e6f88516	2013-03-13 08:35:46 -07:00
Christian Duvivier	c129203f7e	Faster vp9_short_fdct8x8. Scalar path is about 1.4x faster (4% overall encoder speedup). SSE2 path is about 7x faster (13% overall encoder speedup). Change-Id: I7e85d8225a914a74c61ea370210414696560094d	2013-02-27 17:23:08 -08:00
Dmitry Kovalev	347f3a0aa8	Code cleanup. Fixing code style, using array lookup instead of switch statements for forward hybrid transforms (in the same way as for their inverses). Consistent usage of ROUND_POWER_OF_TWO macro in appropriate places. Change-Id: I0d3822ae11f928905fdbfbe4158f91d97c71015f	2013-02-27 13:51:04 -08:00
Yaowu Xu	858b60e8d0	Merge "Improve 32x32 forward dct" into experimental	2013-02-27 07:56:42 -08:00
Yaowu Xu	66d94ac13c	Improve 32x32 forward dct The commit improves the 32x32 forward dct implementation: 1. change to use same constants and rounding as other forward dcts 2. select rounding to specifically minimize the roundtrip error, which improved average 19/block to .77/block using 100000 random input. Test showed a small but consistent gain on all test sets, about .15% Change-Id: If0afd6a71880a522f60c1c234be0462092c2eb53	2013-02-26 09:23:01 -08:00
Dmitry Kovalev	9bf3f75168	Changing pitch value meaning for fht and iht transforms. Pitch now means the number of elements, not the number of bytes. Change-Id: Idb9f2f012e39b09d596a3cc1802305a80b7c13af	2013-02-25 18:19:55 -08:00
Jingning Han	65821d6680	Improving the forward 16x16 ADST/DCT accuracy Increase the first stage dynamic range by 4 times, and reduce it back with proper rounding before applying the second stage. Hence it still fits in the given dynamic range and slightly improves the key frame coding performance. Change-Id: Ia4c5907446f20a95dc3de079c314b3ad1221d8aa	2013-02-25 12:13:37 -08:00
Jingning Han	77a3becf92	clean up forward and inverse hybrid transform Rebased. Remove the old matrix multiplication transform computation. The 16x16 ADST/DCT can be switched on/off and evaluated by setting ACTIVE_HT16 300/0 in vp9/common/vp9_blockd.h. Change-Id: Icab2dbd18538987e1dc4e88c45abfc4cfc6e133f	2013-02-25 09:16:12 -08:00
Yaowu Xu	499fe05dc0	optimize forward 16x16 DCT for accuracy This commit added pre/post scaling for first half of fDCT16x16 to reduce error, by simulation of 100,000 blocks for random inputs, the average sse reduced from 2.1/block to 0.0498/block. also enabled tests for 16x16 fDCT and iDCT Change-Id: Id2a95f0464c6dd4118797d456237ae90274c0f02	2013-02-25 07:47:27 -08:00
Yaowu Xu	22012ee994	optimize 8x8 fdct rounding for accuracy The commit added a final rounding choice for 8x8 forward dct to get rid of a sign bias at DC position and improve the accuracy in terms of round trip error for 8x8 fDCT/iDCT. This commit also enabled forward 8x8 dct test. Change-Id: Ib67f99b0a24d513e230c7812bc04569d472fdc50	2013-02-22 16:55:30 -08:00
Jingning Han	babbd5d170	Forward butterfly hybrid transform This patch includes 4x4, 8x8, and 16x16 forward butterfly ADST/DCT hybrid transform. The kernel of 4x4 ADST is sin((2k+1)(n+1)/(2N+1)). The kernel of 8x8/16x16 ADST is of the form sin((2k+1)(2n+1)/4N). Change-Id: I8f1ab3843ce32eb287ab766f92e0611e1c5cb4c1	2013-02-21 18:24:28 -08:00
Yaowu Xu	d262e26cc7	Merge lossless experiment Change-Id: I7b7b8d4fda3a23699e0c920d727f8c15d37d43aa	2013-02-20 07:54:28 -08:00
Ronald S. Bultje	46dff5d233	Remove some Y2-related code. Change-Id: I4f46d142c2a8d1e8a880cfac63702dcbfb999b78	2013-02-15 14:06:25 -08:00
Yunqing Wang	048b9d41a6	Rewrote fdct16x16 Used same algorithm as others. Change-Id: Ifdac560762aec9735cb4bb6f1dbf549e415c38a0	2013-02-13 16:19:10 -08:00
Paul Wilkins	649be94cf0	Removal of Hybrid DWT/DCT experiment. Removal of experiment to simplify code base for other changes. Change-Id: If0a33952504558511926ad212bc311fc2bffb19a	2013-02-13 15:08:48 +00:00
Yunqing Wang	aa295918ed	Rewrote fdct8x8 Use consistent algorithm. Change-Id: Ib8484821ebc454b9d3380a3d6571798decd037f3	2013-02-11 22:28:05 -08:00
Yunqing Wang	ab2dc6ae57	Merge "Integerization of dct32x32" into experimental	2013-02-11 12:15:26 -08:00
Yunqing Wang	dbccffe299	Integerization of dct32x32 Test on derf set showed 0.047% overall psnr change. Change-Id: Id16c276c251a3943850ac9b95e9b09a56cf42b19	2013-02-08 08:50:47 -08:00
Yaowu Xu	e6ad9ab02c	move dct/idct constants to a header file also removed some unused functions. Change-Id: Ie363bcc8d94441d054137d2ef7c4fe59f56027e5	2013-02-07 13:51:45 -08:00
Jingning Han	d15e1da494	Butterfly ADST based hybrid transform Refactor the 8x8 inverse hybrid transform. It is now consistent with the new inverse DCT. Overall performance loss (due to the use of this variant ADST, and the rounding errors in the butterfly implementation) for std-hd is -0.02. Fixed BUILD warning. Devise a variant of the original ADST, which allows butterfly computation structure. This new transform has kernel of the form: sin((2k+1)*(2n+1) / (4N)). One of its butterfly structures using floating-point multiplications was reported in Z. Wang, "Fast algorithms for the discrete W transform and for the discrete Fourier transform", IEEE Trans. on ASSP, 1984. This patch includes the butterfly implementation of the inverse ADST/DCT hybrid transform of dimension 8x8. Change-Id: I3533cb715f749343a80b9087ce34b3e776d1581d	2013-02-07 10:07:46 -08:00
Ronald S. Bultje	aac73df1a7	Use configure checks for various inline keywords. Change-Id: I8508f1a3d3430f998bb9295f849e88e626a52a24	2013-02-06 16:12:56 -08:00
Yaowu Xu	fa36981ec8	rewrite 4x4 idct and fdct This commit changes the 4x4 iDCT to use same algorithm & constants as other iDCTs. The 4x4 fDCT is also changed to be based on the new iDCT. Change-Id: Ib1a902693228af903862e1f5a08078c36f2089b0	2013-02-05 11:42:49 -08:00
Yaowu Xu	ab1cad9bdd	fix a small bug in 16 point forward dct The commit fixes a minor error in 16 point fdct wherein a rotation can produce a result of -1 instead of 0. Change-Id: I45aac4a52bcd06225c6d04e643547a13e1c1aade	2013-01-31 15:39:41 -08:00
John Koleszar	76ac5b3937	Fix unused variable warnings Previous commit does not build cleanly on Jenkins with the DWT/DCT hybrid experiment enabled (--enable-dwtdcthybrid). Change-Id: Ia67e8f59d17ef2d5200ec6b90dfe6711ed6835a5	2013-01-14 12:12:43 -08:00
Deb Mukherjee	516db21c2c	Further enhancements/fixes on dct/dwt hybrid txfm Fixes some scaling issues. Adds an option to only compute the dct on the low-low subband for 32x32 and 64x64 blocks using only a single 16x16 dct after 1 and 2 wavelet decomposition levels respectively. Also adds an option to use a 8x8 dct as building block. Currently with the 2/6 filter and with a single 16x16 dct on the low low band, the results compared to full 32x32 dct are as follows: derf: -0.15% yt: -0.29% std-hd: -0.18% hd: -0.6% These are my current recommended settings, since the 2/6 filter is very simple. Results with 8x8 dct are about 0.3% worse. Change-Id: I00100cdc96e32deced591985785ef0d06f325e44	2013-01-12 16:00:53 -08:00
Ronald S. Bultje	aa2effa954	Merge tx32x32 experiment. Change-Id: I615651e4c7b09e576a341ad425cf80c393637833	2013-01-10 08:23:59 -08:00
Deb Mukherjee	4b7304ee68	Adds 64x64 hybrid dct/dwt transform This is to add to the 64x64 transform experiment as an alternative to a 64x64 DCT. Two levels of wavelet decomposition is used on a 64x64 block, followed by 16x16 DCT on the four lowest subbands. The highest three subbands are left untransformed after the first level DWT. Change-Id: I3d48d5800468d655191933894df6b46e15adca56	2013-01-08 14:05:58 -08:00
John Koleszar	879cb7d962	Merge vp9-preview changes into experimental branch Incorporate vp9-preview changes by merging master branch into experimental. Conflicts: test/test.mk vp9/common/vp9_filter.c vp9/common/vp9_idctllm.c vp9/common/vp9_invtrans.h vp9/common/vp9_mbpitch.c vp9/common/vp9_rtcd_defs.sh vp9/common/vp9_systemdependent.h vp9/common/vp9_type_aliases.h vp9/common/x86/vp9_asm_stubs.c vp9/common/x86/vp9_subpixel_mmx.asm vp9/decoder/vp9_decodframe.c vp9/decoder/vp9_dequantize.c vp9/decoder/vp9_dequantize.h vp9/decoder/vp9_onyxd_int.h vp9/encoder/vp9_bitstream.c vp9/encoder/vp9_encodeframe.c vp9/encoder/vp9_rdopt.c Change-Id: I17f51c3666d1b59cf1a699f87607cbc5d30a87c5	2013-01-08 10:19:59 -08:00
John Koleszar	5ebe94f9f1	Build fixes to merge vp9-preview into master Various fixups to resolve issues when building vp9-preview under the more stringent checks placed on the experimental branch. Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07	2012-12-26 11:21:09 -08:00
Deb Mukherjee	210dc5b2db	Further improvements on the hybrid dwt/dct expt Modifies the scanning pattern and uses a floating point 16x16 dct implementation for now to handle scaling better. Also experiments are in progress with 2/6 and 9/7 wavelets. Results have improved to within ~0.25% of 32x32 dct for std-hd and about 0.03% for derf. This difference can probably be bridged by re-optimizing the entropy stats for these transforms. Currently the stats used are common between 32x32 dct and dwt/dct. Experiments are in progress with various scan pattern - wavelet combinations. Ideally the subbands should be tokenized separately, and an experiment will be conducted next on that. Change-Id: Ia9cbfc2d63cb7a47e562b2cd9341caf962bcc110	2012-12-13 10:37:49 -08:00
Ronald S. Bultje	c456b35fdf	32x32 transform for superblocks. This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds code all over the place to wrap that in the bitstream/encoder/decoder/RD. Some implementation notes (these probably need careful review): - token range is extended by 1 bit, since the value range out of this transform is [-16384,16383]. - the coefficients coming out of the FDCT are manually scaled back by 1 bit, or else they won't fit in int16_t (they are 17 bits). Because of this, the RD error scoring does not right-shift the MSE score by two (unlike for 4x4/8x8/16x16). - to compensate for this loss in precision, the quantizer is halved also. This is currently a little hacky. - FDCT and IDCT is double-only right now. Needs a fixed-point impl. - There are no default probabilities for the 32x32 transform yet; I'm simply using the 16x16 luma ones. A future commit will add newly generated probabilities for all transforms. - No ADST version. I don't think we'll add one for this level; if an ADST is desired, transform-size selection can scale back to 16x16 or lower, and use an ADST at that level. Additional notes specific to Debargha's DWT/DCT hybrid: - coefficient scale is different for the top/left 16x16 (DCT-over-DWT) block than for the rest (DWT pixel differences) of the block. Therefore, RD error scoring isn't easily scalable between coefficient and pixel domain. Thus, unfortunately, we need to compute the RD distortion in the pixel domain until we figure out how to scale these appropriately. Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b	2012-12-07 14:45:05 -08:00
John Koleszar	fcccbcbb39	Add vp9_ prefix to all vp9 files Support for gyp which doesn't support multiple objects in the same static library having the same basename. Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc	2012-11-27 14:12:30 -08:00

40 Commits