generic-library/vpx

Author	SHA1	Message	Date
Yaowu Xu	7e89c102c4	vp9-highbitdepth -> vpx-highbitdepth Change-Id: I1e90cf7ab4bb02c0ef119b0bd1596771edefedff	2016-08-05 15:41:33 -07:00
Yaowu Xu	476e44a689	Merge "Replace variants of 'vp8' and 'vp9' with 'vpx'" into nextgenv2	2016-08-04 23:48:59 +00:00
Yaowu Xu	fe291b647e	Replace variants of 'vp8' and 'vp9' with 'vpx' Change-Id: Id6cb96b0b15efdda63348d8bfe59fc0533c85ba1	2016-08-04 22:20:38 +00:00
Yaowu Xu	56a91f139d	Fix a number of msvc warnings Change-Id: Ic5ddba3ca0c87245617b6dbc78c0f13dc952ce8b	2016-08-04 21:42:56 +00:00
Yaowu Xu	8bf837f153	Cherry pick from AOM: 68e7e4d0 Remove VP9_CAP_POSTPROC 0738390c Remove vp9_temporal denoise b89861a4 Remove vp9-postproc Change-Id: I4ecaa0ac83a519c8174a494378fc23df610ff2a8	2016-08-02 15:29:50 -07:00
Yaowu Xu	6fe07a207b	Merge branch 'master' into nextgenv2 Change-Id: Ia3c0f2103fd997613d9f16156795028f89f63265	2016-07-14 16:05:48 -07:00
Jingning Han	edbbce8e61	Fix highbd inter prediction filter sse4 overwriting issue Properly handle the case where the height is an integer multiple of 4. Change-Id: I11ac188c13f78db20902e2e333c60ce76ce837c5	2016-07-13 12:51:02 -07:00
Yi Luo	fde48c980a	Merge "HBD convolution filtering (10/12 taps) SSE4.1 optimization" into nextgenv2	2016-07-12 19:28:48 +00:00
Yi Luo	8cacca73bf	HBD convolution filtering (10/12 taps) SSE4.1 optimization - For experiment EXT_INTERP under high bit depth. - Add unit test to verify bit-exact. - Speed performance improvement: On Xeon E5-2680, park_joy_1080p_12.y4m, 50 frames, encoding time drops from 6682503 ms to 5390270 ms. Change-Id: Iea4debf5414f3accf1eb5672abeab56a0539ac77	2016-07-12 10:13:30 -07:00
James Zern	08bd57ef0d	vp10_convolve_ssse3.c: make some functions static quiets -Wmissing-prototypes warnings BUG=b/29584271 Change-Id: I4d2eb7f4b45d7b829421976641b3212bcf29e7dd	2016-07-11 16:52:10 -07:00
James Zern	bc4341fd94	vp10: add some missing includes quiets some -Wmissing-prototypes warnings BUG=b/29584271 Change-Id: I9174728459fcabb6d9ac0028ae58029e52c0da92	2016-07-11 16:52:07 -07:00
Yi Luo	8404253f81	Fix bugs in convolution filter optimization - Fix the over-writing bug in horizontal filtering when width = 2. - Fix 10-tap vertical filtering which no longer reads one row of pixel above the block. - Fix 10-tap filter zero padding. - Encoder speed slows down ~4.0% compared to 81ad953 (Convolution vertical filter SSSE3 optimization). Change-Id: I9bb294a4529300081c29bf284e6bc6eb081cc536	2016-06-27 10:23:38 -07:00
Yi Luo	81ad95363a	Convolution vertical filter SSSE3 optimization - Apply 8-pixel vertical filtering direction parallelism. - Add unit tests to verify bit exact. - Encoder speed improves ~29% (enable EXT_INTERP) on Xeon E5-2680. - Combinational cycle count of vp10_convolve() drops from 26.06% to 6.73%. Change-Id: Ic1ae48f8fb1909991577947a8c00d07832737e57	2016-06-23 12:56:47 -07:00
Yi Luo	229690a95c	Convolution horizontal filter SSSE3 optimization - Apply signal direction/4-pixel vertical/8-pixel vertical parallelism. - Add unit test to verify the bit exact result. - Overall encoding time improves ~24% on Xeon E5-2680 CPU. Change-Id: I104dcbfd43451476fee1f94cd16ca5f965878e59	2016-06-20 11:10:30 -07:00
Yi Luo	bfe4c0ae07	Integrate HBD inverse HT flip types sse4.1 optimization - tx_size: 4x4, 8x8, 16x16. - tx_type: FLIPADST_DCT, DCT_FLIPADST, FLIPADST_FLIPADST, ADST_FLIPADST, FLIPADST_ADST. - Encoder speed improvement: park_joy_1080p_12: ~11%, crowd_run_1080p_12: ~7%. - Add unit test cases for bit-exact against C. Change-Id: Ia69d069031fa76c4625e845bfbfe7e6f6ed6e841	2016-05-25 12:32:10 -07:00
Yi Luo	28cdee448d	HBD inverse HT 8x8 and 16x16 sse4.1 optimization - Covers tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST. - Encoding speed improves ~27% on crowd_run_1080p_12. - Merge 4x4, 8x8, 16x16 unit tests in one test file. Change-Id: I058ef5254d068a9523a826480c78ebbdd231824c	2016-05-24 12:55:30 -07:00
Yi Luo	346d2449f0	Fix to conform to Google's coding convention - Confirm input coeff buffer is 16-byte aligned. - sizeof() prefer variable name instead of type. - Fix function name (Capital first letter then Pascal case). - Long base class name uses a newline (with colon and 4 space indent). - Remove an unnecessary reference function variable. - Method declaration precedes variable declaration in class definition. Change-Id: I317f7e679926b5219f58c5f7d14512e94985e7fe	2016-05-18 18:15:53 -07:00
Yi Luo	ceabb00704	Merge "HBD inverse HT 4x4 SSE4.1 optimization" into nextgenv2	2016-05-16 21:15:08 +00:00
Yi Luo	a3a69b400c	HBD inverse HT 4x4 SSE4.1 optimization - Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST. - Encoder overall instruction count drops 2.91%. - Decoder overall instruction count drops 1.01%. - Add unit test to test bit-exact result against C. Change-Id: I908c9e0e5106c58f67dd72d28760e6c9ce54278e	2016-05-13 12:08:43 -07:00
Angie Chiang	1954fa390f	Add flip option for vp10_fwd_txfm2d_#x#_c Will add unit test to test/vp10_fwd_txfm2d_test.cc later Change-Id: I626900c67fca4eee2ad0ae1828188527a04a5362	2016-05-10 18:14:57 -07:00
Yi Luo	412ad22f46	HBD hybrid transform 16x16 SSE4.1 optimization - Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST. - Update vp10_fht16x16_test.cc to do bit-exact test against latest C version. - HBD encoder speed improves ~1.8%. Change-Id: Icfc799a212e5289bcf6cedcae3722032133a2bc6	2016-05-09 11:07:01 -07:00
Yaowu Xu	f2512710d5	Replace inline with INLINE This fixes build issues under MSVC Change-Id: I6db6a43cba2e8ddb099b676f1ae019fe2742f366	2016-05-05 18:28:04 -07:00
Jim Bankoski	fce3cee8dd	Move vpx_add_plane from codec to vpx_dsp and dedup. Change-Id: I12218d8331c0558c0587a66321e3ca46da7e5cc7	2016-05-02 12:17:39 -07:00
Yi Luo	299c5fc202	HBD hybrid transform 8x8 SSE4.1 optimization - Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST. - Update bit-exact unit test against current C version. - HBD encoder speed improves ~3.8%. Change-Id: Ie13925ba11214eef2b5326814940638507bf68ec	2016-04-29 17:04:52 -07:00
Alex Converse	97673cb128	Fix vp10 txfm on MSVC 2015. BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1187 Change-Id: Ied6d3d003ed6ab9cf4f03cdd1d0037ae755254f4	2016-04-27 19:40:02 +00:00
Yi Luo	a4593f17ca	HBD hybrid transform 4x4 SSE4.1 optimization - Optimization on tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST. - Overall encoder speed improves ~4.5%-6%. - Update bit-exact unit test against current C version. Change-Id: If751c030612245b1c2470200c9570cf40d655504	2016-04-25 09:53:09 -07:00
Yi Luo	cf7f00691f	Change hybrid transform function argument from TXFM_2D_CFG* to int Unit test shows manually developed SSE4.1 code would perform ~30% better if TXFM_2D_CFG configuration is set in lower level. This change only updates function signature. There is no performance impact. Change-Id: I62692bd50a21ffc8a944bbd6c155c0a2020ad77b	2016-04-21 18:37:21 -07:00
Angie Chiang	25520d8dc3	change vp10_fwd_txfm2d_#x#_sse2 to vp10_fwd_txfm2d_#x#_sse4_1 The speed performance for running 20k times is as follows Notice that the vp10_highbd_fdct#x#_sse2 version is 16-bit version plus range check The rest are 32-bit version vp10_fwd_txfm2d_4x4_c (2 ms) vp10_fwd_txfm2d_8x8_c (9 ms) vp10_fwd_txfm2d_16x16_c (45 ms) vp10_fwd_txfm2d_32x32_c (233 ms) vp10_fwd_txfm2d_4x4_sse4_1 (2 ms) vp10_fwd_txfm2d_8x8_sse4_1 (3 ms) vp10_fwd_txfm2d_16x16_sse4_1 (16 ms) vp10_fwd_txfm2d_32x32_sse4_1 (80 ms) vp10_highbd_fdct4x4_c (1 ms) vp10_highbd_fdct8x8_c (3 ms) vp10_highbd_fdct16x16_c (17 ms) highbd_fdct32x32_c (160 ms) vp10_highbd_fdct4x4_sse2 (0 ms) vp10_highbd_fdct8x8_sse2 (2 ms) vp10_highbd_fdct16x16_sse2 (8 ms) highbd_fdct32x32_sse2 (105 ms) Change-Id: I24daf1e0d4d66e91e4ce61ef71cefa7b70ee90ce	2016-03-30 15:25:26 -07:00
Angie Chiang	f2b311f580	Simplify rounding in vp10_[fwd/inv]_txfm[1/2]d_#x# Change-Id: I24ce46e157dc5b9c0d75000a1a48e9c136ed4ee1	2016-03-30 15:25:26 -07:00
Angie Chiang	11d2bb5429	Add vp10_fwd_txfm2d_sse2 Change-Id: Idfbe3c7f5a7eb799c03968171006f21bf3d96091	2016-03-30 15:25:26 -07:00
Geza Lore	4f5108090a	Flip the result of the inverse transform for FLIPADST. When using FLIPADST, the vp10_inv_txfm_add functions used to flip the destination array, add the result of the inverse transform to it, and then flip the destination back. This has been replaced by flipping the result of the inverse transform before adding it to the destination. Up-Down flipping is done by negating the destination stride, and starting from the bottom, so it should now be free. Left-right flipping is done with the usual SSE2 instructions in the optimized code. The C functions match the SSE2 functions as expected, so the C functions now do the flipping as well when required. Adding this cleanly required some refactoring of the C functions, but there is no measurable performance impact when ext-tx is not enabled. Encode speedup with ext-tx enabled is about 3%. Change-Id: I5b04e5d720f0b9f0d54fd8607a8764f2314c7234	2015-11-04 17:11:44 +00:00
Debargha Mukherjee	f18322262f	Backports highbitdepth accelerations into vp10 Ports the changes in https://chromium-review.googlesource.com/#/c/302372/3 into vp10. Change-Id: I334c409f693691227ad16fc703c91899592dd8dc	2015-10-02 00:57:37 -07:00
Angie Chiang	894ab8be7e	fix implicit declaration include vpx_dsp_rtcd.h to avoid implicit declaration of vp10_highbd_fdct32x32_rd_c Change-Id: I0b9ad50381a302750138deab14d2d5ac31f286ee	2015-09-11 12:17:15 -07:00
Angie Chiang	501efcad4a	Merge "Isolate vp10's fwd_txfm from vp9"	2015-09-11 00:10:45 +00:00
Angie Chiang	ee5b80597e	Isolate vp10's fwd_txfm from vp9 1) copy fw_txfm related files from vpx_dsp to vp10 vpx_dsp/fwd_txfm.h → vp10/common/vp10_fwd_txfm.h vpx_dsp/fwd_txfm.c → vp10/common/vp10_fwd_txfm.c vpx_dsp/x86/fwd_dct32x32_impl_sse2.h → vp10/common/x86/vp10_fwd_dct32x32_impl_sse2.h vpx_dsp/x86/fwd_txfm_sse2.c → vp10/common/x86/vp10_fwd_txfm_sse2.c vpx_dsp/x86/fwd_txfm_impl_sse2.h → vp10/common/vp10_fwd_txfm_impl_sse2.h Change-Id: Ie9428b2ab1ffeb28e17981bb8a142ebe204f3bba	2015-09-10 15:19:43 -07:00
Angie Chiang	87175ed592	Isolate vp10's inv_txfm from vp9 1) copy following files from vpx_dsp/ to vp10/common/ vp10_inv_txfm.c vp10_inv_txfm.h vp10_inv_txfm_sse2.c vp10_inv_txfm_sse2.h 2) change the function prefix "vpx_" to "vp10_" in above files 3) add unit test at vp10_inv_txfm_test.cc Change-Id: I206f10f60c8b27d872c84b7482c3bb1d1cb4b913	2015-09-10 15:08:37 -07:00
Jingning Han	54d66ef165	Remove vp9_ prefix from vp10 files Remove the vp9_ prefix from vp10 file names. Change-Id: I513a211b286a57d6126fc1b0fbfd6405120014f1	2015-08-11 21:24:08 -07:00
Jingning Han	3ee6db6c81	Fork VP9 and VP10 codebase This commit forks the VP9 and VP10 codebase and makes libvpx support VP8, VP9, and VP10. Change-Id: I81782e0b809acb3c9844bee8c8ec8f4d5e8fa356	2015-08-11 17:05:28 -07:00

38 Commits
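
Commit 4f5108090a describes up-down flipping as "negating the destination stride, and starting from the bottom". The following is a minimal sketch of that idea only, not the code from that commit: the names `add_residual` and `clip_pixel`, the 8-bit pixel type, and the tightly packed residual layout are all assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp a value to the 8-bit pixel range. */
static uint8_t clip_pixel(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical helper: add a w x h residual block (packed, row-major) to
 * dest. When flip_ud is nonzero, the residual is applied upside down by
 * starting at the bottom row of dest and walking with a negated stride,
 * so no separate flipping pass over either buffer is needed. */
static void add_residual(const int16_t *res, int w, int h, uint8_t *dest,
                         int stride, int flip_ud) {
  assert(w > 0 && h > 0);
  if (flip_ud) {
    dest += (ptrdiff_t)(h - 1) * stride; /* start from the bottom row */
    stride = -stride;                    /* and step upward */
  }
  for (int r = 0; r < h; ++r) {
    /* Row pointer is recomputed from the base so it never leaves the block. */
    uint8_t *row = dest + (ptrdiff_t)r * stride;
    for (int c = 0; c < w; ++c) {
      row[c] = clip_pixel(row[c] + res[r * w + c]);
    }
  }
}
```

The inner loop is identical in the flipped and unflipped cases; only the starting pointer and the sign of the stride change, which is why the commit message expects the up-down flip to cost nothing extra.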