generic-library/vpx

Author	SHA1	Message	Date
Yi Luo	0c552dfd82	Fix aom_fdct32x32_avx2 output as CONFIG_AOM_HIGHBITDEPTH=1 - Change FDCT32x32_2D_AVX2 output parameter to tran_low_t. - Add unit tests for CONFIG_AOM_HIGHBITDEPTH=1. - Update TODO notes. BUG=webm:1323 Change-Id: If4766c919a24231fce886de74658b6dd7a011246	2016-10-25 14:33:21 -07:00
Yi Luo	1a0f27aaa6	Fix avx2 16x16/32x32 fwd txfm coeff output on HBD Change-Id: Ida036defe5688894a63007a31aa2dd0b3f0b5d59	2016-10-21 14:14:00 -07:00
Yi Luo	157e45a44b	Fix the overflow of av1_fht32x32() in 2D DCT_DCT - Use range check function to avoid DCT_DCT overflow. We need to re-develop the column txfm side scaling/rounding. Now, we prefer to maintain the current BDRate level. - Encoder user level time reduction <1% owing to av1_fht32x32_avx2. - Add MemCheck unit test and fdct32() unit test. Change-Id: I1e67030f67bc637859798ebe2f6698afffb8531c	2016-10-20 09:22:24 -07:00
Yi Luo	fed8e1c06d	Hybrid forward transform 32x32 AVX2 optimization - av1_fht32x32 AVX2 function level time reduction ~89% compared to C. - av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2() But function replacement must go with the corresponding inverse txfm. - No obvious user level time reduction due to 32x32 TX_TYPE selection. - Zero high 128b YMM to avoid AVX-SSE transition penalties (fix 16x16 case). - Added 32x32 AVX2 unit tests to verify bitexact. - AVX2 optimization summary: On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results: C to AVX2: function level time reduction, ~86-89%. SSE2 to AVX2: function level time reduction, ~51%. Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036	2016-10-12 14:19:53 -07:00
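Two ideas recur in these commits. Commits 0c552dfd82 and 1a0f27aaa6 fix how 16-bit SIMD results are written to a `tran_low_t` output in high-bitdepth builds. Commit 157e45a44b uses a range check to catch DCT_DCT overflow. The C sketch below illustrates both under stated assumptions:

- `tran_low_t` is assumed to be `int32_t`, as in high-bitdepth configurations.
- `check_range` and `store_coeffs` are hypothetical helpers written for illustration. They are not the libaom functions changed by these commits.

```c
#include <assert.h>
#include <stdint.h>
#include <stdio.h>

/* Assumption: high-bitdepth build, where tran_low_t is 32 bits wide. */
typedef int32_t tran_low_t;

/* Hypothetical helper: returns 1 if every value in buf fits in a signed
 * integer of `bits` bits, otherwise reports the first offender and returns 0. */
static int check_range(const int32_t *buf, int n, int bits) {
  assert(bits > 0 && bits <= 32);
  const int32_t max_val = (int32_t)((1u << (bits - 1)) - 1);
  const int32_t min_val = -max_val - 1;
  for (int i = 0; i < n; ++i) {
    if (buf[i] < min_val || buf[i] > max_val) {
      fprintf(stderr, "value %d at index %d overflows %d bits\n",
              (int)buf[i], i, bits);
      return 0;
    }
  }
  return 1;
}

/* Hypothetical helper: sign-extend 16-bit results (as produced in 16-bit
 * SIMD lanes) into the 32-bit tran_low_t output, one coefficient per slot.
 * Storing them through an int16_t pointer into this buffer would instead
 * pack two coefficients into each 32-bit element. */
static void store_coeffs(const int16_t *src, tran_low_t *dst, int n) {
  for (int i = 0; i < n; ++i) dst[i] = (tran_low_t)src[i];
}
```

Consider the input `{100, -40000, 7}`. Here `check_range(stage, 3, 16)` returns 0 because -40000 is below -32768, while an 18-bit budget accepts all three values. The widening step keeps extreme values such as -32768 and 32767 intact when they move into the 32-bit output.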

4 Commits