generic-library/vpx

Author	SHA1	Message	Date
Yaowu Xu	7e89c102c4	vp9-highbitdepth -> vpx-highbitdepth Change-Id: I1e90cf7ab4bb02c0ef119b0bd1596771edefedff	2016-08-05 15:41:33 -07:00
Debargha Mukherjee	e5848dea5a	Rectangular transforms 4x8 & 8x4 Added a new expt rect-tx to be used in conjunction with ext-tx. [rect-tx is a temporary config flag and will eventually be merged into ext-tx once it works correctly with all other experiments]. Added 4x8 and 8x4 transforms for use initially with rectangular sub8x8 y blocks as part of this experiment. There is about a -0.2% BDRATE improvement on lowres, others pending. When var-tx is on, rectangular transforms are currently not used. That will be enabled in a subsequent patch. Change-Id: Iaf3f88ede2740ffe6a0ffb1ef5fc01a16cd0283a	2016-07-21 10:46:41 -07:00
Jingning Han	bbe1b2217b	Refactor sub8x8 block transform and quantization process This commit refactors the transform and quantization process for sub8x8 blocks and unifies the related functions. Change-Id: I005f61f3eb49eec44f947b906c4e308cab9935a2	2016-06-23 16:56:05 -07:00
Yi Luo	1d307368a9	Integrate HBD row/column flip fwd txfm SSE4.1 optimization - Integrate 5 flip transform types for each 4x4, 8x8, and 16x16 block, for experiment, EXT_TX. - Encoder speed improves about 12%-15%. - Update the unit tests for bit-exact result against C. Change-Id: Idf27c87f1e516ca5b66c7b70142477a115404ccb	2016-05-18 03:48:01 +00:00
Angie Chiang	1e587ae616	Merge "Add flip option for vp10_fwd_txfm2d_#x#_c" into nextgenv2	2016-05-12 18:08:28 +00:00
Angie Chiang	1954fa390f	Add flip option for vp10_fwd_txfm2d_#x#_c Will add unit test to test/vp10_fwd_txfm2d_test.cc later Change-Id: I626900c67fca4eee2ad0ae1828188527a04a5362	2016-05-10 18:14:57 -07:00
Jingning Han	5cf3408ba1	Remove unused highbd_fdct32x32 function The encoder is using vp10_fwd_txfm2d_32x32 now. Change-Id: I719f18ec0b065f1e062d01fd300533dd2f17c712	2016-05-10 14:33:34 -07:00
Yi Luo	cf7f00691f	Change hybrid transform function argument from TXFM_2D_CFG* to int Unit test shows manually developed SSE4.1 code would perform ~30% better if TXFM_2D_CFG configuration is set in lower level. This change only updates function signature. There is no performance impact. Change-Id: I62692bd50a21ffc8a944bbd6c155c0a2020ad77b	2016-04-21 18:37:21 -07:00
Angie Chiang	02d23fbbf4	Fit adst/dct's stage range into 32-bit in bd12 Change-Id: Ie428c6f0655873de3e77e844a2f2e4203cf47dff	2016-04-14 15:44:05 -07:00
Angie Chiang	ff8c490b9a	Branch dct to new implementation for bd12 Change-Id: I9281935653aacce22ac3100f79fb956c249e2bf3	2016-04-04 12:40:10 -07:00
Angie Chiang	4144a11552	Merge "Use vp10_[fwd/inv]_txfm2d_add_32x32 for bd 10" into nextgenv2	2016-03-28 19:20:48 +00:00
Angie Chiang	33833aefdd	Merge "Use vp10_[fwd/inv]_txfm2d_add_#x# for bd 10" into nextgenv2	2016-03-28 18:11:47 +00:00
Angie Chiang	46b234478f	Use vp10_[fwd/inv]_txfm2d_add_32x32 for bd 10 Change-Id: I996c48a90d7d71b52594a91a35cb8712c7fc212e	2016-03-28 11:08:40 -07:00
Yi Luo	770bf71503	8x8/16x16 HT types V_DCT to H_FLIPADST SSE2 optimization - Wrote function: fidtx8_sse2() and fidtx16_sse2(). - Turned on vp10_fht8x8_sse2()/vp10_fht16x16_sse2() for new types. - Updated 8x8/16x16 unit tests for accuracy/speed. - Running 20K times with random numbers and getting through tx type from V_DCT to H_FLIPADST, SSE2 speed improvement: 8x8: ~131% 16x16: ~66% Change-Id: Ibbb707e932a08fec3b1f423a7dab280a1d696c9a	2016-03-25 16:48:19 -07:00
Yi Luo	4970388c23	4x4 hybrid transform type V_DCT to H_FLIPADST SSE2 optimization - Added function fidtx4_sse2(). - Turned on vp10_fht4x4_sse2() for these tx types. - Updated 4x4 unit test for speed/accuracy. - 4x4 Unit test passed. - Running 20K times with random numbers for tx type from V_DCT to H_FLIPADST, SSE2 against C, speed improves ~46%. Change-Id: I828088b7f98dc0f5939a72e3fcd6cb0b8d8dd8bf	2016-03-24 15:09:18 -07:00
Angie Chiang	d9a0cbb1b7	Use vp10_[fwd/inv]_txfm2d_add_#x# for bd 10 Change-Id: Ie35bdbd7aafae693e3106d7ccbbdd8e65ee8800c	2016-03-23 12:05:12 -07:00
Yi Luo	deb33056d1	Merge "Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode - Setup function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1 intrinsics optimization. - Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(), and fdct4x4_sse4_1(). - Used logic right shift to avoid coeff memory write/read. - Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only. - Improved overall encoding performance >2.3% for 50 frames sequence, park_joy_1080p_12.y4m, in which, --input-bit-depth=12, --bit-depth=12, 50 frames. - Unit test passed." into nextgenv2	2016-03-23 18:30:40 +00:00
Yi Luo	977dccd12c	Highbd fht4x4 SSE4.1 optimization for DCT_DCT mode - Setup function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1 intrinsics optimization. - Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(), and fdct4x4_sse4_1(). - Used logic right shift to avoid coeff memory write/read. - Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only. - Improved overall encoding performance >2.3% for 50 frames sequence, park_joy_1080p_12.y4m, in which, --input-bit-depth=12, --bit-depth=12, 50 frames. - Unit test passed. Change-Id: Idd6dc6e472cbbf235f0ade4f66fbe859a860a004	2016-03-23 09:13:45 -07:00
Debargha Mukherjee	1b17559327	Adds 1D transforms for ADST/FlipADST to make 16 Makes a set of 16 transforms total, adding all 1D combinations of ADST and FlipADST, and removing all DST transforms. lowres, midres both improve by about 0.1% and hdres by -0.378% in BDRATE but with fewer transforms that are also simpler. Further experiments to continue later. Change-Id: I7348a4c0e12078fdea5ae3a2d36a89a319ffcc6e	2016-03-21 11:19:36 -07:00
Debargha Mukherjee	9b88762b17	Refactor 1D transforms In preparation for adding more 1D variants with ADST/FlipADST/etc. BDRATE actually improves by 0.21% on lowres. Change-Id: I2fa4720c69fe001fa666119a284dfc6b17fffab2	2016-03-14 22:30:09 -07:00
Jingning Han	68d9a14e9f	Merge "Enable hybrid 1-D/2-D transform coding for highbd setting" into nextgenv2	2016-03-11 18:09:11 +00:00
Jingning Han	c453ae53d0	Enable hybrid 1-D/2-D transform coding for highbd setting This commit enables the hybrid 1-D/2-D transform coding scheme for high bit-depth setting. It improves the compression performance of ext-tx experiment by 0.98% for lowres_all set. Change-Id: Ic27f5037f2c36b095a93b9f15dbae34bdcdf00aa	2016-03-10 08:58:07 -08:00
Yi Luo	50a164a1f6	Implemented DST 16x16 SSE2 intrinsics optimization - Implemented fdst16_sse2(), fdst16_8col() against C version: fdst16(). - Turned on 7 DST related hybrid txfm types in vp10_fht16x16_sse2(). - Replaced vp10_fht10x10_c() with vp10_fht16x16_sse2() in fwd_txfm_16x16(). - Added vp10_fht16x16_sse2() unit test against C version: vp10_fht16x16_c() (--gtest_filter=VP10Trans16x16). - Unit test passed. - Speed improvement: 2.4%, 3.2%, 3.2%, for city_cif.y4m, garden_sif.y4m, and mobile_cif.y4m. Change-Id: Ib30a67ce5d5964bef143d588d0f8fa438be8901f	2016-03-08 14:56:38 -08:00
Jingning Han	a8dc9694a4	Hybrid 1-D/2-D transform coding This commit enables a hybrid 1-D/2-D transform coding scheme and the accompanying entropy coding system. It currently uses hybrid 1-D/2-D DCT transform coding. It provides coding performance gains: lowres_all 0.55% hdres_all 0.43% Change-Id: I2b30dcafd21eb2bb3371f6e854cbab440a4dfa78	2016-03-07 09:27:46 -08:00
Debargha Mukherjee	3287f5519e	Merge "Hooks to use 32x32 masked transforms for ext-tx" into nextgenv2	2016-02-26 20:54:37 +00:00
Yi Luo	0353f596e9	Implemented DST 8x8 with SSE2 intrinsics. Implemented fdst8_sse2() function against C version: fdst8(). Added seven DST related hybrid transform types in vp10_fht8x8_sse2(). Replaced vp10_fht8x8_c() with vp10_fht8x8_sse2() in fwd_txfm_8x8(). Speedup: 18.1%, 11.5%, 22.0% based on speed test from city_cif.y4m, garden_sif.y4m, mobile_cif.y4m. Change-Id: Ia4aa1ea44c7a33e494f64ce843037f8703f975e3	2016-02-24 14:58:01 -08:00
Debargha Mukherjee	da2d4a7afc	Hooks to use 32x32 masked transforms for ext-tx Adds hooks to use 32x32 ext-tx. Also adds scan orders for the masked transforms for 32x32. Make macro USE_MSKTX_FOR_32X32 1 in blockd.h to support 32x32 masked transforms for ext-tx. Change-Id: Ie6564830266651fcafae2d536c274dafd664ce17	2016-02-24 13:08:37 -08:00
Yi Luo	5456aee6fc	Initial SSE2 function fdst4_sse2(). Applied DST sse2 to 4x4 transform. Fixed DST coefficient packing to satisfy 4x4 transpose requirement. Change-Id: I9164714c77049523dbbc9e145ebb10d7911fba9d	2016-02-19 11:13:37 -08:00
Julia Robson	c178b2d192	Making the forward transform consistent with high bit depth This patch changes the code for 16bit buffers to use the same optimisation as is used for 8bit buffers. (See change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1 for more information about the optimisation) Change-Id: I5f327a13a7b01fc356114a2aa9d1261bf76d8d69	2016-01-20 12:03:16 +00:00
Angie Chiang	96baa73ed9	Create hybrid_fwd_txfm.c Move txfm functions from encodemb to hybrid_fwd_txfm.c to make encodemb's code flow clear Change-Id: If174d8ddb490d149c103e5127d30ef19adfbed13	2015-11-25 12:51:25 -08:00

30 Commits