generic-library/vpx

Author	SHA1	Message	Date
Yaowu Xu	bc436734bd	Use consistent parameter type Change-Id: I2ba5c9cb7e9ac6761ac18262564182039cfaad5b	2016-08-05 16:32:48 -07:00
Yaowu Xu	7e89c102c4	vp9-highbitdepth -> vpx-highbitdepth Change-Id: I1e90cf7ab4bb02c0ef119b0bd1596771edefedff	2016-08-05 15:41:33 -07:00
Yaowu Xu	8bf837f153	Cherry pick from AOM: 68e7e4d0 Remove VP9_CAP_POSTPROC 0738390c Remove vp9_temporal denoise b89861a4 Remove vp9-postproc Change-Id: I4ecaa0ac83a519c8174a494378fc23df610ff2a8	2016-08-02 15:29:50 -07:00
Yi Luo	b2663a8a67	HBD fast path quantization speed improvement - HBD encoder speed improvement (SSE4.1): Enable CONFIG_VP9_HIGHBITDEPTH, on Xeon E5-2680, 50 frames, park_joy_1080p, 12-bit, Encoding time reduces from 4846481 to 4177471 (ms) - Add unit test to verify bit-exact and EOB calculation Change-Id: I08e8ef3549ddad5ab36d86e78557df3b288537ea	2016-07-20 14:11:10 -07:00
Yaowu Xu	6fe07a207b	Merge branch 'master' into nextgenv2 Change-Id: Ia3c0f2103fd997613d9f16156795028f89f63265	2016-07-14 16:05:48 -07:00
Geza Lore	135d663159	Reinstate "Optimize wedge partition selection." without tests. This reinstates commit efda2831e5f758b4f350679b5c55c0b9282449b0 without the tests and with fixes for 32 bit x86 builds. Change-Id: I34be4fe1e8a67686d26ba256fd7efe0eb6a569e8	2016-06-21 20:31:50 +01:00
Angie Chiang	95340fccb3	Revert "Optimize wedge partition selection." This reverts commit efda2831e5f758b4f350679b5c55c0b9282449b0. This commit causes segmentation fault at SSE2/SumSquares2DTest.RandomValues/0 Change-Id: I171937e4daf6f15323e8206418773deb03bd8c53	2016-06-09 19:17:37 -07:00
Geza Lore	efda2831e5	Optimize wedge partition selection. We can optimize wedge partition selection by pre-computing the residuals of the 2 underlying predictors, and then blend these to compute the sse of the compound predictor, without actually having to compute and subtract the compound predictor. Similarly we can pre-compute a proxy array which we can use to cheaply check which mask sign would have lower sse. Details are in wedge_utils.c. Mathematically these are equivalence transformations, but due to the finite precision the encoder output will be perturbed, though on average this should make 0% difference. ext-inter gains about 4.5% speedup. Change-Id: Ib2657c3209ae161b4090b58b4b6c392641bf2792	2016-06-06 14:43:10 +01:00
Linfeng Zhang	af7fb17c09	Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10. Function level timing test shows about 27% time saving on a Xeon E5-2680 v2 desktop. Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid duplicate basenames. Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2() are identical. TODO: They should be unified later if there is no intention to keep a duplicate. Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d	2016-05-27 09:51:16 -07:00
Yi Luo	28cdee448d	HBD inverse HT 8x8 and 16x16 sse4.1 optimization - Covers tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST. - Encoding speed improves ~27% on crowd_run_1080p_12. - Merge 4x4, 8x8, 16x16 unit tests in one test file. Change-Id: I058ef5254d068a9523a826480c78ebbdd231824c	2016-05-24 12:55:30 -07:00
Angie Chiang	6f28581b26	Turn on flip in inverse txfm2d Fix build failed Reduce txfm test time Change-Id: Ieaf6b27f3a272d06286f817f01230413fa8adcf6	2016-05-18 11:26:57 -07:00
Yi Luo	1d307368a9	Integrate HBD row/column flip fwd txfm SSE4.1 optimization - Integrate 5 flip transform types for each 4x4, 8x8, and 16x16 block, for experiment, EXT_TX. - Encoder speed improves about 12%-15%. - Update the unit tests for bit-exact result against C. Change-Id: Idf27c87f1e516ca5b66c7b70142477a115404ccb	2016-05-18 03:48:01 +00:00
Yaowu Xu	fc9deb6b0c	Remove "const" for parameters passed by value This commit removes const from parameters that are passed by value for consistency in code style. Change-Id: I2947c4e9cc6e809c4b9b4c162046e45127b8a41c	2016-05-10 09:30:44 -07:00
Yi Luo	412ad22f46	HBD hybrid transform 16x16 SSE4.1 optimization - Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST. - Update vp10_fht16x16_test.cc to do bit-exact test against latest C version. - HBD encoder speed improves ~1.8%. Change-Id: Icfc799a212e5289bcf6cedcae3722032133a2bc6	2016-05-09 11:07:01 -07:00
Yaowu Xu	ad841b7dac	Make parameter types consistent This fixes compiler warnings from MSVC. Change-Id: Iaac0e994869561371295578a893f766493ce0544	2016-05-06 23:39:46 +00:00
Yi Luo	299c5fc202	HBD hybrid transform 8x8 SSE4.1 optimization - Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST. - Update bit-exact unit test against current C version. - HBD encoder speed improves ~3.8%. Change-Id: Ie13925ba11214eef2b5326814940638507bf68ec	2016-04-29 17:04:52 -07:00
Yi Luo	a4593f17ca	HBD hybrid transform 4x4 SSE4.1 optimization - Optimization on tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST. - Overall encoder speed improves ~4.5%-6%. - Update bit-exact unit test against current C version. Change-Id: If751c030612245b1c2470200c9570cf40d655504	2016-04-25 09:53:09 -07:00
Yi Luo	f095ea7dd6	Improvement on hybrid transform 4x4 DCT_DCT SSE4.1 optimization - Implemented Angie's new fwd txfm algorithm. - Improves ~100% over the last 64-bit version; 3 times faster than original C code. - Passed bit-exact unit test. Change-Id: Ica30b9768706604a6d69fe42da778441f0f5f02e	2016-04-15 14:16:30 -07:00
Geza Lore	552d5cd715	Extend superblock size to 128x128 pixels. If --enable-ext-partition is used at build time, the superblock size (sometimes also referred to as coding unit (CU) size) is extended to 128x128 pixels. Change-Id: Ie09cec6b7e8d765b7555ff5d80974aab60803f3a	2016-03-30 18:23:06 +01:00
Yi Luo	770bf71503	8x8/16x16 HT types V_DCT to H_FLIPADST SSE2 optimization - Wrote function: fidtx8_sse2() and fidtx16_sse2(). - Turned on vp10_fht8x8_sse2()/vp10_fht16x16_sse2() for new types. - Updated 8x8/16x16 unit tests for accuracy/speed. - Running 20K times with random numbers and getting through tx type from V_DCT to H_FLIPADST, SSE2 speed improvement: 8x8: ~131% 16x16: ~66% Change-Id: Ibbb707e932a08fec3b1f423a7dab280a1d696c9a	2016-03-25 16:48:19 -07:00
Yi Luo	4970388c23	4x4 hybrid transform type V_DCT to H_FLIPADST SSE2 optimization - Added function fidtx4_sse2(). - Turned on vp10_fht4x4_sse2() for these tx types. - Updated 4x4 unit test for speed/accuracy. - 4x4 Unit test passed. - Running 20K times with random numbers for tx type from V_DCT to H_FLIPADST, SSE2 against C, speed improves ~46%. Change-Id: I828088b7f98dc0f5939a72e3fcd6cb0b8d8dd8bf	2016-03-24 15:09:18 -07:00
Yi Luo	659c2c98e1	Misc. updates for highbd changes - Use Makefile to control the build for highbd_fwd_txfm_sse4.c. - Fixed hybrid transform (HT) types due to recent update. - Added new unit test cases for highbd HT. Change-Id: Ifd768a9b429a8c21ed40c1de8152fb5ac71e2f90	2016-03-23 12:10:52 -07:00
Yi Luo	deb33056d1	Merge "Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode - Setup function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1 intrinsics optimization. - Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(), and fdct4x4_sse4_1(). - Used logic right shift to avoid coeff memory write/read. - Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only. - Improved overall encoding performance >2.3% for 50 frames sequence, park_joy_1080p_12.y4m, in which, --input-bit-depth=12, --bit-depth=12, 50 frames. - Unit test passed." into nextgenv2	2016-03-23 18:30:40 +00:00
Yi Luo	977dccd12c	Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode - Setup function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1 intrinsics optimization. - Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(), and fdct4x4_sse4_1(). - Used logic right shift to avoid coeff memory write/read. - Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only. - Improved overall encoding performance >2.3% for 50 frames sequence, park_joy_1080p_12.y4m, in which, --input-bit-depth=12, --bit-depth=12, 50 frames. - Unit test passed. Change-Id: Idd6dc6e472cbbf235f0ade4f66fbe859a860a004	2016-03-23 09:13:45 -07:00
Debargha Mukherjee	1b17559327	Adds 1D transforms for ADST/FlipADST to make 16 Makes a set of 16 transforms total, adding all 1D combinations of ADST and FlipADST, and removing all DST transforms. lowres, midres both improve by about 0.1% and hdres by -0.378% in BDRATE but with fewer transforms that are also simpler. Further experiments to continue later. Change-Id: I7348a4c0e12078fdea5ae3a2d36a89a319ffcc6e	2016-03-21 11:19:36 -07:00
Yi Luo	50a164a1f6	Implemented DST 16x16 SSE2 intrinsics optimization - Implemented fdst16_sse2(), fdst16_8col() against C version: fdst16(). - Turned on 7 DST related hybrid txfm types in vp10_fht16x16_sse2(). - Replaced vp10_fht16x16_c() with vp10_fht16x16_sse2() in fwd_txfm_16x16(). - Added vp10_fht16x16_sse2() unit test against C version: vp10_fht16x16_c() (--gtest_filter=VP10Trans16x16). - Unit test passed. - Speed improvement: 2.4%, 3.2%, 3.2%, for city_cif.y4m, garden_sif.y4m, and mobile_cif.y4m. Change-Id: Ib30a67ce5d5964bef143d588d0f8fa438be8901f	2016-03-08 14:56:38 -08:00
Yi Luo	68d6a5073a	Fixed a computation bug in fdct16_sse2() fdct16_sse2() was not bit-exact with C reference, fdct16(). The inconsistency was found by writing a unit test for vp10_fht16x16_sse2(). Since the unit test needs a pending change on the inherited base class. I will commit this unit test after making a header file for this base class. Passed the uncommitted unit test: vp10_fht16x16_test.cc. Change-Id: If2b617883c633a3ea90c19e1d018240c8007102b	2016-03-02 15:20:12 -08:00
Yi Luo	0353f596e9	Implemented DST 8x8 with SSE2 intrinsics. Implemented fdst8_sse2() function against C version: fdst8(). Added seven DST related hybrid transform types in vp10_fht8x8_sse2(). Replaced vp10_fht8x8_c() with vp10_fht8x8_sse2() in fwd_txfm_8x8(). Speedup: 18.1%, 11.5%, 22.0% based on speed test from city_cif.y4m, garden_sif.y4m, mobile_cif.y4m. Change-Id: Ia4aa1ea44c7a33e494f64ce843037f8703f975e3	2016-02-24 14:58:01 -08:00
Yi Luo	5456aee6fc	Initial SSE2 function fdst4_sse2(). Applied DST sse2 to 4x4 transform. Fixed DST coefficient packing to satisfy 4x4 transpose requirement. Change-Id: I9164714c77049523dbbc9e145ebb10d7911fba9d	2016-02-19 11:13:37 -08:00
Yaowu Xu	b37e8b0e00	Merge branch 'master' into nextgenv2	2015-12-15 05:00:05 -08:00
James Zern	d36659cec7	move vp9_avg to vpx_dsp Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f	2015-12-14 14:42:12 -08:00
Geza Lore	01bb4a318d	Eliminate copying for FLIPADST in fwd transforms. This patch eliminates the copying of data when using FLIPADST forward transforms, by incorporating the necessary data flipping into the load_buffer_* functions of the SSE2 optimized forward transforms. The load_buffer_* functions are normally inlined, so the overhead of copying the data is removed and the overhead of flipping is minimized. Left to right flipping is still not free, as the columns need to be shuffled in registers. To preserve identity between the C and SSE2 implementations, the appropriate C implementations now also do the data flipping as part of the transform, rather than relying on the caller for flipping the input. Overall speedup is about 1.5-2% in encode on my tests. Note that these are only the forward transforms. Inverse transforms to come in a later patch. There are also a few code hygiene changes: - Fixed some indents of switch statements. - DCT_DCT transform now always use vp10_fht* functions, which dispatch to vpx_fdct* for DCT_DCT (some of them used to call vpx_fdct* directly, some of them used to call vp10_fht*). Change-Id: I93439257dc5cd104ac6129cfed45af142fb64574	2015-11-03 17:10:55 +00:00
Jingning Han	54d66ef165	Remove vp9_ prefix from vp10 files Remove the vp9_ prefix from vp10 file names. Change-Id: I513a211b286a57d6126fc1b0fbfd6405120014f1	2015-08-11 21:24:08 -07:00
Jingning Han	3ee6db6c81	Fork VP9 and VP10 codebase This commit forks the VP9 and VP10 codebase and makes libvpx support VP8, VP9, and VP10. Change-Id: I81782e0b809acb3c9844bee8c8ec8f4d5e8fa356	2015-08-11 17:05:28 -07:00

34 Commits