generic-library/vpx

Author	SHA1	Message	Date
Angie Chiang	ed8cd9a9b4	adapt_scan experiment Performance improvement BDRate lowres 0.921% midres 0.730% hdres 1.019% Change-Id: I26208d6c0531937bff44de505b4ea355c7852802	2016-10-24 18:24:56 -07:00
hui su	9ff4134f54	Renaming in filter-intra sse4 code Change-Id: Iff1786a92d164e6b9cfaf4a59ece79819494276f	2016-10-19 21:41:06 -07:00
hui su	5db9743fbb	Separate FILTER_INTRA from EXT_INTRA experiment Prepare for the av1/nextgenv2 merge. Coding gain (%): ext-intra 0.69 (lowres), 0.97 (midres); filter-intra 0.67 (lowres), 0.83 (midres); both 1.05 (lowres), 1.48 (midres). Change-Id: Ia24d6fafb3e484c4f92192e0b7eee5e39f4f4ee6	2016-10-19 21:40:49 -07:00
Michael Bebenita	6048d05225	Bit accounting. This patch adds bit accounting infrastructure to the bit reader API. When configured with --enable-accounting, every bit reader API function records the number of bits necessary to decode a symbol. Accounting symbol entries are collected in a global accounting data structure that can be used to understand exactly where bits are spent (http://aomanalyzer.org). The data structure is cleared and reused each frame to reduce memory usage. When configured without --enable-accounting, bit accounting does not incur any runtime overhead. All aom_read_xxx functions now have an additional string parameter that specifies the symbol name. By default, the ACCT_STR macro is used (which expands to __func__). For more precise accounting, these should be replaced with more descriptive names. Change-Id: Ia2e1343cb842c9391b12b77272587dfbe307a56d	2016-10-19 04:34:29 +00:00
Debargha Mukherjee	078856a4df	Merge "Simplify 8x16 and 16x8 inverse transform tests" into nextgenv2	2016-10-14 02:53:38 +00:00
Debargha Mukherjee	a720f4b3b5	Merge "Add sse2 forward and inverse 16x32 and 32x16 transforms" into nextgenv2	2016-10-14 02:49:20 +00:00
Yue Chen	a48764d05f	Merge "Renamings for OBMC experiment" into nextgenv2	2016-10-14 01:33:00 +00:00
Yue Chen	cb60b185c7	Renamings for OBMC experiment To get ready for pulling AV1 to nextgenv2 Replace the experimental flag by MOTION_VAR. Rename major variables. Change-Id: If6cf4f37b9319c46d8f90df551cc7295d66ca205	2016-10-13 15:51:22 -07:00
David Barker	4f803efac1	Simplify 8x16 and 16x8 inverse transform tests Change-Id: Ie86aedfb1f3e0d9c0cf58d7183861a0ed0e8ccc8	2016-10-13 16:02:59 +01:00
David Barker	33231d4801	Add sse2 forward and inverse 16x32 and 32x16 transforms Change-Id: I1241257430f1e08ead1ce0f31db8272b50783102	2016-10-13 14:01:22 +01:00
Yi Luo	fed8e1c06d	Hybrid forward transform 32x32 AVX2 optimization - av1_fht32x32 AVX2 function level time reduction ~89% compared to C. - av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2() But function replacement must go with the corresponding inverse txfm. - No obvious user level time reduction due to 32x32 TX_TYPE selection. - Zero high 128b YMM to avoid AVX-SSE transition penalties (fix 16x16 case). - Added 32x32 AVX2 unit tests to verify bitexact. - AVX2 optimization summary: On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results: C to AVX2: function level time reduction, ~86-89%. SSE2 to AVX2: function level time reduction, ~51%. Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036	2016-10-12 14:19:53 -07:00
Yaowu Xu	4960f7c3bd	Merge "Added generic SIMD support for CLPF." into nextgenv2	2016-10-11 16:05:18 +00:00
Steinar Midtskogen	be668e92c3	Added generic SIMD support for CLPF. Change-Id: Ie03f9a5b0a4c708a586532198d755a1e7509f149	2016-10-10 11:19:37 -07:00
David Barker	4d03d6fc6f	Add sse2 forward / inverse 4x8 and 8x4 transforms Change-Id: I89ed93fb20cf975c2b463cff58879521ceaa4163	2016-10-10 09:02:45 -07:00
Peter de Rivaz	1baecfeb03	Added sse2 inverse 8x16 and 16x8 transforms Change-Id: I43628407b11e5c8e6af4df69f2acdc67ac827834	2016-10-06 11:23:14 -07:00
Alex Converse	0ad82c6edb	Rename av1_ans_test to match aom/master. Change-Id: I3b2137903a87a1f8169ff45e940575b917c26a6a	2016-09-26 13:15:41 -07:00
Alex Converse	080a2cccba	Migrate bitwriter to the interface in aom/master Change-Id: I73d46229f0feea43cbe933e51da997833cce032b	2016-09-21 11:17:08 -07:00
Geza Lore	1a800f6539	Add SSE2 versions of av1_fht8x16 and av1_fht16x8 Encoder speedup ~2% with ext-tx + rect-tx Change-Id: Id56ddf102a887de31d181bde6d8ef8c4f03da945	2016-09-09 11:29:41 -07:00
Yaowu Xu	f883b42cab	Port renaming changes from AOMedia Cherry-Picked the following commits: 0defd8f Changed "WebM" to "AOMedia" & "webm" to "aomedia" 54e6676 Replace "VPx" by "AVx" 5082a36 Change "Vpx" to "Avx" 7df44f1 Replace "Vp9" w/ "Av1" 967f722 Remove kVp9CodecId 828f30c Change "Vp8" to "AOM" 030b5ff AUTHORS regenerated 2524cae Add ref-mv experimental flag 016762b Change copyright notice to AOMedia form 81e5526 Replace vp9 w/ av1 9b94565 Add missing files fa8ca9f Change "vp9" to "av1" ec838b7 Convert "vp8" to "aom" 80edfa0 Change "VP9" to "AV1" d1a11fb Change "vp8" to "aom" 7b58251 Point to WebM test data dd1a5c8 Replace "VP8" with "AOM" ff00fc0 Change "VPX" to "AOM" 01dee0b Change "vp10" to "av1" in source code cebe6f0 Convert "vpx" to "aom" 17b0567 rename vp10.mk to av1_.mk fe5f8a8 rename files vp10_* to av1_* Change-Id: I6fc3d18eb11fc171e46140c836ad5339cf6c9419	2016-08-31 18:19:03 -07:00
Yaowu Xu	c27fc14b02	Port folder renaming changes from AOM Manually cherry-picked commits: ceef058 libvpx->libaom part2 3d26d91 libvpx -> libaom cfea7dd vp10/ -> av1/ 3a8eff7 Fix a build issue for a test bf4202e Rename vpx to aom Change-Id: I1b0eb5a40796e3aaf41c58984b4229a439a597dc	2016-08-31 17:26:24 -07:00
Yaowu Xu	253c001f8f	Port dering experiment from aom Manually cherry-picked: 1579133 Use OD_DIVU for small divisions in temporal_filter. 0312229 Replace divides by small values with multiplies. 9c48eec Removing divisions from od_dir_find8() 0950ed8 Merge "Port active map / cyclic refresh fixes to vp10." efefdad Port active map / cyclic refresh fixes to vp10. 1eaf748 Port switch to 9-bit rate cost to aom. 0b1606e Only build deringing code when --enable-dering. e2511e1 Deringing cleanup: don't hardcode the number of levels 8fe5c5d Rename dering_in to od_dering_in to sync with Daala 4eb1380 Makes second filters for 45-degree directions horizontal 7f4c3f5 Removes the superblock variance contribution to the threshold 3dc56f9 Simplifying arithmetic by using multiply+shift cf2aaba Return 0 explicitly for OD_ILOG(0). 49ca22a Use the Daala implementation of OD_ILOG(). 8518724 Fix compiler warning in od_dering.c. 485d6a6 Prevent multiple inclusion of odintrin.h. 51b7a99 Adds the Daala deringing filter as experimental Note that a few of the changes were already in libvpx codebase. Change-Id: I1c32ee7694e5ad22c98b06ff97737cd792cd88ae	2016-08-16 13:47:18 +00:00
James Zern	cc73e1fcd4	remove SVC spatial/temporal scalability are not supported in VP10 currently. + remove the unused vp10/encoder/skin_detection.[hc] this also enables DatarateTestLarge for VP10 which passes with no experiments enabled. these were removed previously when only the SVC tests should have been: 134710a Disable tests not applicable to VP10 Change-Id: I9ee7a0dd5ad3d8cc1e8fd5f0a90260fa43da387c	2016-08-09 18:42:20 -07:00
Yi Luo	57c4711b5c	Optimization EXT_INTRA's filtered intra predictor (SSE4.1) - Add unit tests to verify the bit-exact result. - In speed test, function speed (for each mode/tx_size) improves about 23%~35%. - On E5-2680, park_joy_1080p, 10 frames, --kf-max-dist=1, encoding time improves about 1%~2%. Change-Id: Id89f313d44eea562c02e775a6253dc4df7e046a9	2016-08-08 10:02:36 -07:00
Yaowu Xu	a3cff08259	more cleanup of vp8 and vp9 Change-Id: Ic90ebe6136f4b75645ba699d49c0bcb3764ddccf	2016-08-03 12:20:33 -07:00
Yaowu Xu	5eee90730b	Rename files with vp9_ prefix Change-Id: I9c51ae3a2af698efe32288b807f881385e19822b	2016-07-29 16:45:08 +00:00
Yaowu Xu	3fa28d51af	More vp8/vp9 clean up Change-Id: I8101de20e873c19d03c7fd2977bc22003e395807	2016-07-28 18:22:47 -07:00
Yaowu Xu	3bd709fafe	Remove vp8, vp9 folders Change-Id: I09b8acd22d031ece52e1fee18b998349bf1cf06b	2016-07-28 14:33:21 +00:00
Yi Luo	b2663a8a67	HBD fast path quantization speed improvement - HBD encoder speed improvement (SSE4.1): Enable CONFIG_VP9_HIGHBITDEPTH, on Xeon E5-2680, 50 frames, park_joy_1080p, 12-bit, Encoding time reduces from 4846481 to 4177471 (ms) - Add unit test to verify bit-exact and EOB calculation Change-Id: I08e8ef3549ddad5ab36d86e78557df3b288537ea	2016-07-20 14:11:10 -07:00
Yaowu Xu	6fe07a207b	Merge branch 'master' into nextgenv2 Change-Id: Ia3c0f2103fd997613d9f16156795028f89f63265	2016-07-14 16:05:48 -07:00
Geza Lore	ebc2d34cd9	Add SSE4.1 vpx_obmc_variance* implementations and cosmetics Speedup for these functions: 4x Also include some cosmetic changes to SAD functions Change-Id: I344c32c795492507ae08742f52d035a13f583799	2016-07-12 21:04:46 -07:00
Geza Lore	bfa59b4a5f	Improve vpx_blend_* functions. - Made source buffers pointers to const. - Renamed vpx_blend_mask6b to vpx_blend_a64_mask. This is more indicative that the function does alpha blending. The 6, or 6b suffix was misleading, as the max mask value (64) does not fit into 6 bits. - Added VPX_BLEND_* macros to use when needing to blend scalars. - Use VPX_BLEND_A256 in combine_interintra to be more explicit about the operation being done. - Added versions of vpx_blend_a64_* which take 1D horizontal/vertical masks directly and apply them to all rows/columns (vpx_blend_a64_hmask and vpx_blend_a64_vmask). The SSE4.1 optimized horizontal version now falls back on the 2D version. This can be improved upon if it shows up high enough in a profile. - All vpx_blend_a64_* functions now support block sizes down to 1x1 (ie: a single pixel). This is for usage convenience. The SSE4.1 optimized versions fall back on the C implementation if w <= 2 or h <= 2. This can again be improved if it becomes hot code. Change-Id: I13ab3835146ffafe3e1d74d8e9cf64a5abe4144d	2016-07-11 19:05:17 +01:00
Debargha Mukherjee	72ef6d7704	Refactor and clean up on blend_mask6 Change-Id: Ie9188471e7dc07ab9c95b22f258b1662e895c533	2016-07-08 15:02:57 -07:00
Geza Lore	e6f8c17ac5	Remove various testing utilities. test/assertion_helpers.h test/randomise.{cc,h} test/snapshot.h Modify blend_mask6_test.cc not to rely on these. Change-Id: I88b8933fe0a729a606797e5cd421795a544c612d	2016-07-07 16:22:07 +01:00
Debargha Mukherjee	fabc0ed7ad	Merge "Reinstate tests for wedge partition selection optimizations." into nextgenv2	2016-07-07 05:55:07 +00:00
Geza Lore	aacdf98c9a	Add SSE4.1 vpx_obmc_sad* implementations. Speedup for these functions: 4x Change-Id: I21baa04f53c6ab308ea3edf3ebacc62970e97454	2016-07-06 19:46:13 +00:00
Geza Lore	2791d9db1e	Reinstate tests for wedge partition selection optimizations. This reinstates the tests from commit efda2831e5f758b4f350679b5c55c0b9282449b0 with the appropriate fixes for 32 bit x86 builds. Change-Id: Ib331906c5b448ca964895ee9cbfd4266f67d1089	2016-07-06 15:09:46 +01:00
Yi Luo	f1a50db2d1	Merge "Convolution horizontal filter SSSE3 optimization" into nextgenv2	2016-06-20 20:06:02 +00:00
Yi Luo	229690a95c	Convolution horizontal filter SSSE3 optimization - Apply signal direction/4-pixel vertical/8-pixel vertical parallelism. - Add unit test to verify the bit exact result. - Overall encoding time improves ~24% on Xeon E5-2680 CPU. Change-Id: I104dcbfd43451476fee1f94cd16ca5f965878e59	2016-06-20 11:10:30 -07:00
Tom Finegan	5a9f21db54	Output frames in first pass for VPX_DL_REALTIME. Since combining VPX_DL_REALTIME with VPX_RC_FIRST_PASS is basically nonsense, ignore the user's pass setting when this happens and behave as if the requested encode is a single pass encode. BUG=webm:1233 Change-Id: I5ee4c4e5838c4ca6d24988890aae490b10826db2	2016-06-17 11:25:55 -07:00
James Zern	94e84bbc07	cosmetics,test.mk: fix a typo Change-Id: Ib74a494e1cf50a356f51e8185e19ca66fcb896a2	2016-06-15 20:33:04 -07:00
James Zern	fba6f748e8	rename vp9_end_to_end_test.cc -> end_to_end_test.cc this is shared between vp9/10 BUG=webm:1235 Change-Id: I2f44b15268a33453a1c1e0c691d4fc1fc12d0263	2016-06-15 18:30:22 -07:00
James Zern	2710f76692	vp9_end_to_end_test: enable in vp10-only builds this file is shared between vp9 & vp10; this makes it available in the presence of --disable-vp9 BUG=webm:1235 Change-Id: Iaf060c3c09afd2c7df69995b0c01589f78d4945e	2016-06-15 18:28:30 -07:00
Angie Chiang	95340fccb3	Revert "Optimize wedge partition selection." This reverts commit efda2831e5f758b4f350679b5c55c0b9282449b0. This commit causes segmentation fault at SSE2/SumSquares2DTest.RandomValues/0 Change-Id: I171937e4daf6f15323e8206418773deb03bd8c53	2016-06-09 19:17:37 -07:00
Geza Lore	efda2831e5	Optimize wedge partition selection. We can optimize wedge partition selection by pre-computing the residuals of the 2 underlying predictors, and then blend these to compute the sse of the compound predictor, without actually having to compute and subtract the compound predictor. Similarly we can pre-compute a proxy array which we can use to cheaply check which mask sign would have lower sse. Details are in wedge_utils.c. Mathematically these are equivalence transformations, but due to the finite precision the encoder output will be perturbed, though on average this should make 0% difference. ext-inter gains about ~4.5% speedup. Change-Id: Ib2657c3209ae161b4090b58b4b6c392641bf2792	2016-06-06 14:43:10 +01:00
Geza Lore	5a69ee0e11	Move template specializations into .cc from .h Change-Id: I6d8775c1fa228fde25016a401e3c22a8e3da42f9	2016-06-03 09:34:55 +01:00
James Zern	008f27e70a	Merge "add vp10 ActiveMap/ActiveMapRefreshTest" into nextgenv2	2016-05-25 19:05:02 +00:00
Yi Luo	28cdee448d	HBD inverse HT 8x8 and 16x16 sse4.1 optimization - Covers tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST. - Encoding speed improves ~27% on crowd_run_1080p_12. - Merge 4x4, 8x8, 16x16 unit tests in one test file. Change-Id: I058ef5254d068a9523a826480c78ebbdd231824c	2016-05-24 12:55:30 -07:00
Geza Lore	a661bc87c4	Add optimized vpx_blend_mask6 This is to replace vp10/common/reconinter.c:build_masked_compound. Functionality is equivalent, but the interface is slightly more generic. Total encoder speedup with ext-inter: ~7.5% Change-Id: Iee18b83ae324ffc9c7f7dc16d4b2b06adb4d4305	2016-05-23 16:28:58 +01:00
Yi Luo	ceabb00704	Merge "HBD inverse HT 4x4 SSE4.1 optimization" into nextgenv2	2016-05-16 21:15:08 +00:00
hui su	cafbf63d30	Add level test for VP9 Change-Id: I99f50bdd5af3f64a029c2f5f6f5fb1ff45bad67e	2016-05-16 09:54:23 -07:00


309 Commits