generic-library/vpx

Author	SHA1	Message	Date
Scott LaVarnway	fa85cf131c	vp9: strip temporal filter code when CONFIG_REALTIME_ONLY is enabled. BUG=webm:1446 Change-Id: Id547783ec75383966c40ab5cf6abb4a0f7984f52	2017-08-14 14:27:53 -07:00
Johann Koenig	ff184e482a	Merge changes I4b4beab1,I02f74dec * changes: quantize test: check skip_block quantize test: use negative input	2017-08-14 20:52:52 +00:00
Johann Koenig	45b39750d6	Merge "temporal filter test: adjust inputs and runtime"	2017-08-14 20:46:22 +00:00
Johann	c06d6649c5	temporal filter test: adjust inputs and runtime Use input with a narrow range because the filter only applies when the frames are similar. Run CompareReferenceRandom more times. Especially before narrowing the input range, the filter frequently did not apply. Change-Id: Ie249bedf6d0d33dfa5884611cb1835788e418b38	2017-08-14 17:24:11 +00:00
James Zern	746c0eab3b	disable SSSE3/VP9QuantizeTest* in hbd builds this test fails with the configuration similar to the assembly prior to: `d52cb5972` quantize: copy ssse3 optimizations to intrinsics BUG=webm:1458 Change-Id: Idc5c0b84c0598259fc49609a9f0756de531d3baf	2017-08-14 09:31:14 -07:00
Johann Koenig	9bb8ce5efb	Merge "neon: vpx_quantize_b_32x32"	2017-08-10 15:42:49 +00:00
Johann Koenig	0b393ae505	Merge "quantize: copy ssse3 optimizations to intrinsics"	2017-08-10 15:42:20 +00:00
Johann	357adb68b2	quantize test: check skip_block Not all sizes were tested previously. Only 4x4 and 32x32 Change-Id: I4b4beab1b92a810a097a7306de04cc9e0e260315	2017-08-08 14:21:58 -07:00
Johann	1092cc7f1a	quantize test: use negative input coeff contains signed values. Change-Id: I02f74decf30379a28122169ab3e844d0f3bd7d23	2017-08-08 14:19:56 -07:00
Johann	93166c5e51	neon: vpx_quantize_b_32x32 With skip block the neon is about twice as fast as C. The neon has no shortcut for coeff < zbin so it always takes the same amount of time. Even if the C can take the shortcut, it is over twice as fast in neon. If it can't, that gap increases to over 10x. BUG=webm:1426 Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6	2017-08-08 14:05:18 -07:00
Johann	d52cb59729	quantize: copy ssse3 optimizations to intrinsics Fairly minor differences from sse2. pabsw and psignw are the big gains. Also re-uses some values in eob calculation to avoid an extra pcmp. Fixes test failures in HBD and OS X builds. Allows using it in 32bit builds, where it is about 40% faster than sse2. Substantially faster than the assembly for skip_block. 10-20% faster the rest of the time. Change-Id: If783bb3567e561e47667e10133b9c84414a334e2	2017-08-08 12:22:14 -07:00
Linfeng Zhang	853165ba39	Update 32x32 idct sse2 funcs, add partial case 135 Change-Id: I2b9add83f6fd8f9138fed3bec04a59877a237a6a	2017-08-07 17:37:02 -07:00
Linfeng Zhang	7f20c3ac44	Add vpx_highbd_idct16x16_{10, 38, 256}_add_sse4_1 BUG=webm:1412 Change-Id: I8877c986b4042f7b8e33f5674c86700675a0e4ca	2017-08-04 15:31:17 -07:00
Johann Koenig	cbb83ba4aa	Merge "quantize test: consolidate sizes"	2017-08-04 20:34:50 +00:00
Johann	9578a84205	quantize test: consolidate sizes Pass a max txfm size parameter and combine the base quantize test with the 32x32 test. Change-Id: I72ddf020fe6888e864ea9f3642ee2d9a8e48a04b	2017-08-04 12:45:32 -07:00
Linfeng Zhang	563d58ab84	Rewrite vpx_idct16x16_{10,256}_add_sse2() and add case 38 function BUG=webm:1412 Change-Id: I945f0fb6807b8948747243794dc7352b959221f7	2017-08-03 13:59:47 -07:00
Yunqing Wang	6843e7c7f3	Merge "Force the bit exactness in the first pass"	2017-08-03 00:03:10 +00:00
Yunqing Wang	bfd0f41f9b	Force the bit exactness in the first pass Originally, for the purpose of keeping a fast first pass, the first-pass stats between row_mt_mode = 0 and row_mt_mode = 1 are not bit exact, but the difference is so small that it doesn't cause a mismatch between the final bitstreams. However, if the encoder changes, this minor difference may cause a mismatch. Thus, this patch always forces the first pass to be bit exact. BUG=webm:1453 Change-Id: I2b67cf529dee81f660f9d9e7fe9a60ea3c7b12b8	2017-08-02 15:58:39 -07:00
Johann	1059b5cc52	quantize test: add speed comparison Test some possible scenarios. Change-Id: I1a612e7153b31756be66390ceea55877856d5a33	2017-08-02 09:33:35 -07:00
Johann Koenig	847394fe77	Merge "neon: vpx_quantize_b"	2017-08-01 16:44:31 +00:00
Johann	2d6b5df657	neon: vpx_quantize_b With skip block or coeff < zbin it is about twice as fast as C. If most coeff values are > zbin it is about 10-15x as fast as C. BUG=webm:1426 Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7	2017-07-31 10:38:46 -07:00
Linfeng Zhang	75653b7032	Merge changes Ia0e20f5f,I28150789,I35df041b,I221dff34 * changes: Update vpx_idct16x16_10_add_sse2() Add vpx_idct16x16_38_add_sse2() Rewrite vpx_highbd_idct8x8_{12,64}_add_sse2 Refactor highbd idct 4x4 and 8x8 x86 functions	2017-07-28 22:43:00 +00:00
James Zern	3c73e587d1	Revert "quantize ssse3: declare all variables" This reverts commit `03f5e300d6`. This causes test failures under OSX: SSSE3/VP9QuantizeTest.EOBCheck/0 SSSE3/VP9QuantizeTest.OperationCheck/0 Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b	2017-07-28 01:22:16 -07:00
Linfeng Zhang	7f4acf8700	Add vpx_idct16x16_38_add_sse2() Change-Id: I28150789feadc0b63d2fadc707e48971b41f9898	2017-07-27 18:02:43 -07:00
Linfeng Zhang	9c43d81bc2	Refactor highbd idct 4x4 and 8x8 x86 functions BUG=webm:1412 Change-Id: I221dff34dd5f71b390b5e043d0a137ccb0a01dec	2017-07-27 18:01:03 -07:00
Johann Koenig	a83e1f1d53	Merge "quantize ssse3: declare all variables"	2017-07-27 21:18:35 +00:00
Alexandra Hájková	666c543f7b	ppc: Add vpx_idct16x16_256_add_vsx Change-Id: Ibc3f7965423fd91179f8d8e77c7ae3e6d7f80572	2017-07-25 12:34:15 +00:00
Johann	af08fbb444	quantize test: promote RandRange() result to signed Avoid unsigned overflow warning: unsigned integer overflow: 19974 - 32703 cannot be represented in type 'unsigned int' Change-Id: Ifebee014342e4c6f3b53306c0cad6ae0b465ac12	2017-07-20 08:17:48 -07:00
Johann	c782f27ead	quantize test: lowbd functions do not pass in highbd qcoeff output looks OK but dqcoeff is no good. BUG=webm:1448 Change-Id: I07211db8a8b74f1f45fdd059852e2de0e5ee18fd	2017-07-20 08:17:48 -07:00
Johann	bde2e4aa36	quantize test: eob is output eob values are generated by the function. Change-Id: I8ce92100e83022bff99888a5a7e6ef378c49fda3	2017-07-19 14:17:19 -07:00
Johann	03f5e300d6	quantize ssse3: declare all variables Copy missing line from avx implementation. Change-Id: I9755c5b4d4034867de6fa9f741c24bf49dce3a27	2017-07-18 12:32:57 -07:00
Johann	101981b736	quantize test: test sse2 and avx optimizations ssse3 does not pass either of the tests. avx 32x32 does not pass. Change-Id: I62c2e31336fd2327327afaa0da896ad79a3def44	2017-07-18 12:08:16 -07:00
Johann	c7ebe82253	quantize test: extend arrays Officially the quant structures are 8 elements, with one dc element and 7 repeated ac elements. The low bit depth optimizations take advantage of this to fill the xmm registers. The high bit depth version manually duplicates the values. If all the optimizations were unified, the structure sizes could be greatly reduced. Change-Id: Ibd7a0337a7832ce2a1a05ee433c310077e1059ae	2017-07-18 09:55:47 -07:00
Johann	cb61ba02f4	quantize test: restrict and correct input Use only valid values for quantize inputs. These were determined by looping over vp9_init_quantizer and looking for max and min values. This allows extending the test to the low bit depth functions which were not designed to handle all possible inputs but only valid inputs. Change-Id: I94e1d8863a49ac227845b65c6b50130e10e6319e	2017-07-18 09:40:45 -07:00
James Zern	9223b947ca	Merge "fix 'make exampletest' w/CONFIG_REALTIME_ONLY"	2017-07-15 18:37:10 +00:00
Johann	e3fa4ae8e3	quantize test: use Buffer Although the low bitdepth functions are identical (excepting the need for larger intermediate values) they do not pass these tests. This improves the error output to aid debugging. Simplify buffer usage with Buffer and removing unnecessarily aligned variables. eob is a single element and never written using aligned instructions. BUG=webm:1426 Change-Id: Ic95789a135cf1e8a3846d85270f2b818f6ec7e35	2017-07-13 15:54:48 -07:00
James Zern	960466939d	fix 'make exampletest' w/CONFIG_REALTIME_ONLY for tests that aren't explicitly testing 2-pass behavior use --passes=1 with this configuration Change-Id: I6a1520ecc65d0f626486604310af29dacb9f197f	2017-07-13 10:47:20 -07:00
Johann	e381753926	sad4d neon: 64x[32,64] Rewrite 64x64. BUG=webm:1425 Change-Id: I336bf5a3aa4b783389c10b16a50f0f559346ecbf	2017-07-12 13:26:39 +00:00
Johann	e1bde306c8	sad4d neon: 32x[16,32,64] Rewrite 32x32. Use half the accumulator registers. BUG=webm:1425 Change-Id: Ibf5e61dc4ba15056102aef8495f4a02c668c5d13	2017-07-12 13:25:18 +00:00
Johann	807ce8fb1e	sad4d neon: 16x[8,16,32] Rewrite 16x16. Use half the accumulator registers. BUG=webm:1425 Change-Id: I44b48512b1e3629505d83c2645e800f53878ccc2	2017-07-12 13:25:11 +00:00
Johann	8152b0904d	sad4d neon: 8x[4,8,16] BUG=webm:1425 Change-Id: I7de2500cca4b621f21478c4b0333c56d76dbc9a4	2017-07-12 13:25:03 +00:00
Johann	dd4347e9ec	sad4d neon: 4x4, 4x8 BUG=webm:1425 Change-Id: I5081b5ce131821d590c53ac1206a94f50cb8b468	2017-07-12 03:38:03 +00:00
Johann Koenig	4e16f70703	Merge changes Id84d9780,Iaa6ea75b,I3362e0dd,I0020a49e,Ia42e4f36, ... * changes: sad neon: avg for 64x[32,64] sad neon: macroize 64xN definitions sad neon: avg for 32x[16,32,64] sad neon: macroize 32xN definitions sad neon: avg for 16x[8,16,32] sad neon: macroize 16xN definitions	2017-07-07 21:01:23 +00:00
James Zern	5d6060b62f	Merge "cosmetics,vp9/: normalize inv/fwd_txfm naming"	2017-07-07 19:15:02 +00:00
Johann Koenig	6c375b9cd0	Merge "fdct neon: 32x32_rd"	2017-07-07 14:05:51 +00:00
Johann	e4e08556db	sad neon: avg for 64x[32,64] BUG=webm:1425 Change-Id: Id84d97807a6a0fbcc889c4dfe11929d54f85493d	2017-07-07 07:04:04 -07:00
Johann	67cffc1ef6	sad neon: avg for 32x[16,32,64] BUG=webm:1425 Change-Id: I3362e0dded3b46ca032caa7f44db42f324bc596d	2017-07-07 07:04:04 -07:00
Johann	527e0c9b1c	sad neon: avg for 16x[8,16,32] BUG=webm:1425 Change-Id: Ia42e4f36547c5fe12114fb58379e34bce82eb2f2	2017-07-07 07:04:04 -07:00
Johann Koenig	9b253f9f0a	Merge changes I7b36a57e,If2ab51e3,Ifc685a96 * changes: sad neon: macroize 8xN definitions sad neon: avg for 8x[4,8,16] sad neon: avg for 4x4 and 4x8	2017-07-07 14:03:13 +00:00
James Zern	80b83c73ba	cosmetics,vp9/: normalize inv/fwd_txfm naming + vpx_dsp/, test/ itxfm -> inv_txfm, ftxfm -> fwd_txfm Change-Id: I3aacdb65143576d64cfe5c9b14dd358c17c1fe7e	2017-07-06 18:35:44 -07:00
Johann	63bdc574e5	sad neon: avg for 8x[4,8,16] BUG=webm:1425 Change-Id: If2ab51e3050e078b0011b174efe41fcb65a15f44	2017-07-06 07:43:09 -07:00
Johann	6bac3f80ee	sad neon: avg for 4x4 and 4x8 BUG=webm:1425 Change-Id: Ifc685a96cb34f7fd9243b4c674027480564b84fb	2017-07-06 07:12:47 -07:00
Johann	75b00592c7	fdct neon: 32x32_rd About 40% faster than the non-rd version. BUG=webm:1424 Change-Id: Ia99d14eb9532302eeaab8cd3e503395b0374b5a2	2017-07-06 06:30:50 -07:00
James Zern	5227b8200b	vp9: remove FrameWorkerData & vp9_dthread.h the file was empty after the struct removal. the only remaining use was within vp9_dx_iface, but the wrapper became unnecessary after the removal of frame_parallel_decode. BUG=webm:1395 Change-Id: I515ab585d701e77d388d12b2802d844c424f9bcd	2017-07-05 22:32:00 -07:00
James Zern	0d245d42c4	Merge "test_vector_test,vp8: correct thread range"	2017-07-05 22:33:51 +00:00
Johann Koenig	9a05f9771a	Merge "test/buffer.h: move range checking to compiler"	2017-07-05 21:15:13 +00:00
James Zern	a22bb9809e	Merge "dct_partial_test: cover vpx_fdct8x8_1_msa in hbd"	2017-07-05 21:08:46 +00:00
Hui Su	3e08a88854	Merge "level tests: allow level undershoot"	2017-07-05 20:47:20 +00:00
James Zern	23d60be414	dct_partial_test: cover vpx_fdct8x8_1_msa in hbd this was enabled in: `5ac88162b` partial fdct test Change-Id: Ibae2031ec1308fe3a3b84a1ce6e7bacda3a7cb82	2017-07-05 13:01:41 -07:00
Johann	da2ad47d66	test/buffer.h: move range checking to compiler Pass low/high values as type T. Out of range values should be caught by static analysis instead. Change-Id: I0a3ee8820af05f4c791ab097626174e2206fa6d5	2017-07-05 11:21:18 -07:00
James Zern	7d526c1654	Merge "buffer.h: incorrect RandRange results"	2017-07-02 03:48:53 +00:00
Johann	6cb3178192	buffer.h: incorrect RandRange results 'low' was promoted to unsigned, triggering a ubsan warning Change-Id: Id49340079d39c105da93cf13e96cf852a93a94ba	2017-07-01 20:01:22 -07:00
Alexandra Hájková	c757d6dde4	ppc: Add vpx_idct8x8_64_add_vsx Change-Id: I4ed1312f365509e0595dcc09890ecb050f6f2069	2017-07-01 12:55:47 -07:00
Alexandra Hájková	d8c277030c	ppc: Add vpx_idct4x4_16_add_vsx Change-Id: Id2673eece32027fb245919c7a5c81994a4a19fd8	2017-07-01 12:32:18 -07:00
James Zern	af3ab45867	test_vector_test,vp8: correct thread range testing::Range does not include the end parameter in the set of values. also adjust the start to 2 as the single threaded case is already covered in another instantiation Change-Id: Iae3bf3ed4363dd434eccfa5ad4e3c5e553fbee60	2017-06-30 16:21:06 -07:00
Johann	c2044fda1d	buffer.h: use stride_ instead of stride() Change-Id: Ib51231349bf0ff3e23672762dc7bfa49b5fe4083	2017-06-30 07:37:20 -07:00
Johann	ce5b17f9ad	testing: ranges for random values Add a method to acm_random.h to generate ranges of values Add a way to call that method to buffer.h Adjust dct_[partial_]test.cc to use it. Change-Id: I8c23ae9d27612c28f050b0e44c41cb4ad2494086	2017-06-30 07:25:30 -07:00
Johann Koenig	89d3dc043e	Merge changes Id5beb35d,I2945fe54,Ib0f3cfd6,I78a2eba8 * changes: partial fdct neon: add 32x32_1 partial fdct neon: add 16x16_1 partial fdct neon: add 4x4_1 partial fdct neon: move 8x8_1 and enable hbd tests	2017-06-30 01:00:07 +00:00
James Zern	67d7a6df2d	Merge changes from topic 'rm-dec-frame-parallel' * changes: rm vp9_frame_parallel_test.cc test_vector_test: rm ref to VPX_CODEC_USE_FRAME_THREADING	2017-06-29 23:21:18 +00:00
James Zern	e5bdab98e9	rm vp9_frame_parallel_test.cc VPX_CODEC_USE_FRAME_THREADING was made a no-op in: `01d23109a` vp9: make VPX_CODEC_USE_FRAME_THREADING a no-op and the tests in this file have been disabled since: `6ab0870d4` disable VP9MultiThreadedFrameParallel tests BUG=webm:1395 Change-Id: I2c7a250acb65cf9522cf8a7bb724bb92070e41c6	2017-06-29 15:15:56 -07:00
James Zern	508ef2a6e3	test_vector_test: rm ref to VPX_CODEC_USE_FRAME_THREADING this was made a no-op in: `01d23109a` vp9: make VPX_CODEC_USE_FRAME_THREADING a no-op and the test hitting this branch has been disabled since: `6ab0870d4` disable VP9MultiThreadedFrameParallel tests rename the test to VP9MultiThreaded to exercise the tile-based threading BUG=webm:1395 Change-Id: I35564a75eb5a7d7f7ccb923133b1b07295201f4c	2017-06-29 15:15:48 -07:00
James Zern	bd77931421	dct_partial_test,fwd_txfm: change << to * left shift of a negative number is undefined in C; quiets a ubsan warning Change-Id: Ib1624ad5326ac8e0eead9348468ef7fe5d4df9a4	2017-06-29 14:42:03 -07:00
Johann	9fe510c12a	partial fdct neon: add 32x32_1 Always return an int32_t. Since it needs to be moved to a register for shifting, this doesn't really penalize the smaller transforms. The values could potentially be summed and shifted in place. BUG=webm:1424 Change-Id: Id5beb35d79c7574ebd99285fc4182788cf2bb972	2017-06-28 15:37:44 -07:00
Johann	f310ddc470	partial fdct neon: add 16x16_1 For the 8x8_1, the highbd output fit nicely in the existing function. 12 bit input will overflow this implementation of 16x16_1. BUG=webm:1424 Change-Id: I2945fe5478b18f996f1a5de80110fa30f3f4e7ec	2017-06-28 15:37:44 -07:00
Johann	4959dd3eb3	partial fdct neon: add 4x4_1 BUG=webm:1424 Change-Id: Ib0f3cfd6116fc1f5a99acb8bfd76e25b90177ffc	2017-06-28 15:37:44 -07:00
Johann	cf75ab6ccd	partial fdct neon: move 8x8_1 and enable hbd tests The function was originally written with HBD in mind. Enable it and configure the tests. BUG=webm:1424 Change-Id: I78a2eba8d4d9d59db98a344ba0840d4a60ebe9a1	2017-06-28 15:37:43 -07:00
Johann Koenig	81e25512c3	Merge changes Ib454762d,I966650df,Ie126553e,I068f06c6,Icb72a94e * changes: sad neon: rewrite 64x64 and add 64x32 sad neon: rewrite 32x32, add 32x16 and 32x64 sad neon: rewrite 16x8, 16x16, add 16x32 sad neon: rewrite 8x8 and 8x16 sad neon: rewrite 4x4 and add 4x8	2017-06-28 22:37:00 +00:00
Johann Koenig	d91af5f905	Merge "buffer.h: Only allow Init() to be called once."	2017-06-28 22:36:05 +00:00
Johann Koenig	35f8515c3f	Merge "partial fdct test"	2017-06-28 22:34:53 +00:00
Johann	5ac88162b9	partial fdct test Test the _1 variant of the fdct, which simply sums the block and applies a modifying shift based on the block size. BUG=webm:1424 Change-Id: Ic80d6008abba0c596b575fa0484d5b5855321468	2017-06-28 20:32:20 +00:00
Johann	ad011aaab8	sad neon: rewrite 64x64 and add 64x32 BUG=webm:1425 Change-Id: Ib454762d1c61b05a98324fe81ad58c9e09784717	2017-06-28 12:21:34 -07:00
Johann	469643757f	sad neon: rewrite 16x8, 16x16, add 16x32 BUG=webm:1425 Change-Id: Ie126553e5fffcdfaf3d82a85b368ac10ce9ab082	2017-06-28 12:16:00 -07:00
Johann	e40e78be24	sad neon: rewrite 8x8 and 8x16 BUG=webm:1425 Change-Id: I068f06c67b841f09ea07c04ada0c2f1706102138	2017-06-28 12:15:57 -07:00
Johann	46d8660ce3	sad neon: rewrite 4x4 and add 4x8 The previous implementation loaded 8 values (discarding half) BUG=webm:1425 Change-Id: Icb72a94e2557a4ee2db7091266ab58fd92f72158	2017-06-28 11:14:59 -07:00
Johann	e0330c4810	buffer.h: Only allow Init() to be called once. Change-Id: I041c8b6f314802833c5287a176dbfeec9461b08e	2017-06-28 10:59:39 -07:00
hui su	d4595de5db	level tests: allow level undershoot Obtaining a level that is lower than the target should be tolerated. Change-Id: I90a55ee6d7142e9f6cc525ebbd1e0501defcbe28	2017-06-26 15:17:04 -07:00
Linfeng Zhang	ec4afbf74a	Merge "Add vpx_highbd_idct4x4_16_add_sse4_1()"	2017-06-24 01:15:14 +00:00
James Zern	ee1fcb0e69	Merge "variance_test: move Subpel* from tuples to TestParams"	2017-06-23 22:48:40 +00:00
Linfeng Zhang	8253a27904	Add vpx_highbd_idct4x4_16_add_sse4_1() BUG=webm:1412 Change-Id: Ie33482409351a01be4e89466b0441834eb1e905a	2017-06-23 14:30:12 -07:00
James Zern	0d1c782306	Merge "datarate_test: rename thread -> Thread in test name"	2017-06-23 20:00:51 +00:00
James Zern	54bcd98314	variance_test: move Subpel* from tuples to TestParams this normalizes these tests with the regular variance ones both in implementation and test list output Change-Id: I387aea81456f94b8223b8fb2a28cab94bc1aa9d5	2017-06-23 12:54:18 -07:00
Johann Koenig	794a5ad713	Merge "fdct32x32 neon implementation"	2017-06-23 01:58:00 +00:00
Linfeng Zhang	c5f9de573f	Merge changes I783c5f4f,I365f8e53,I5dac0e98 * changes: Clean vpx_idct16x16_256_add_sse2() Update vpx_idct{8x8,16x16,32x32}_1_add_sse2() Clean 32x32 full idct sse2 and ssse3 code	2017-06-22 21:42:23 +00:00
Johann	e67660cf37	fdct32x32 neon implementation Almost 3x faster in constrained loop testing. Over 10x faster in HBD builds. BUG=webm:1424 Change-Id: I2b7f8453e1d4ada63cde729d8115d684c4a71ff9	2017-06-22 06:40:17 -07:00
James Zern	dd88bd87db	datarate_test: rename thread -> Thread in test name this is consistent with other threaded tests and ensures gtest_filters meant to operate on these pick them up Change-Id: I99ce53720553a22c4b9905a2882273c2be2c031b	2017-06-21 20:05:31 -07:00
Linfeng Zhang	2b43a1ee18	Clean 32x32 full idct sse2 and ssse3 code vpx_idct32x32_1024_add_ssse3() is actually a sse2 function and faster than vpx_idct32x32_1024_add_sse2(). Replace the slow one. All are code relocations, no new code. Change-Id: I5dac0e98cc411a4ce05660406921118986638d19	2017-06-21 13:46:49 -07:00
Johann	1c48915233	dct tests: align InvAccuracyCheck buffers 'in' is used for the reference fdct. 'coeff' is input to the idct being tested and 'dst[16]' is output Fixes a segfault on unaligned memory access on x86. Change-Id: I3691b1380ed49986897dd89a63ce63a80a0e0962	2017-06-21 11:47:00 -07:00
James Zern	0aa3677d9d	fix build, rm ref to vpx_idct8x8_64_add_ssse3 this was deleted in: `98967645a` Remove vpx_idct8x8_64_add_ssse3() but this was merged in: `9e03eedf6` Merge changes Ib26dd515,Ie60dabc3 after: `a92991133` Merge "dct tests: run all possible sizes in one test" which added a new reference Change-Id: I8da4a6c80d27b237a378ff15eead1daab89e7e25	2017-06-20 19:46:45 -07:00
Linfeng Zhang	9e03eedf62	Merge changes Ib26dd515,Ie60dabc3 * changes: Clean 8x8 idct x86 optimization Remove vpx_idct8x8_64_add_ssse3()	2017-06-21 00:38:25 +00:00
Johann	4ebb9a36f1	dct tests: run all possible sizes in one test Modify fdct4x4_test.cc to support all size combinations. This does not add any new tests and in fact fails a few. There were minimal changes made to the tests so it's not entirely surprising that some of the larger 12 bit transforms are failing since it was initially only used for 4x4. In follow up patches the tests in fdct8x8_test.cc, dct16x16_test.cc and dct32x32_test.cc will be evaluated and moved to dct_test.cc. BUG=webm:1424 Change-Id: I72a23430f457d7fae8c91e706adc0e77c25abc8f	2017-06-19 15:39:35 -07:00
Linfeng Zhang	98967645a1	Remove vpx_idct8x8_64_add_ssse3() It's almost identical to vpx_idct8x8_64_add_sse2(), except for a small difference in instruction order. Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f	2017-06-15 14:09:33 -07:00
Johann Koenig	6dcd9b37ea	Merge "idct_test: don't use std::nothrow anymore"	2017-06-09 20:42:39 +00:00
Johann Koenig	8aa4ee1f10	Merge "buffer.h: allow declaring an alignment"	2017-06-09 20:42:21 +00:00
Johann	92373a5bb2	idct_test: don't use std::nothrow anymore But still check for NULL before calling Init() Change-Id: I2bf2887e1064c9103d29c542d20365c0aea75d76	2017-06-09 11:09:06 -07:00
Johann	5aee8ea752	buffer.h: allow declaring an alignment x86 simd register operations generally prefer and may require 16 byte alignment. Change-Id: I73ce577a90dc66af60743c5727c36f23200950ba	2017-06-09 11:03:15 -07:00
James Zern	b3a262dff3	Merge "vp8_decode_frame: fix oob read on truncated key frame"	2017-06-08 23:17:50 +00:00
James Zern	45daecb4f7	vp8_decode_frame: fix oob read on truncated key frame the check for error correction being disabled was overriding the data length checks. this avoids returning incorrect information (width / height) for the decoded frame which could result in inconsistent sizes returned to an application causing it to read beyond the bounds of the frame allocation. BUG=webm:1443 BUG=b/62458770 Change-Id: I063459674e01b57c0990cb29372e0eb9a1fbf342	2017-06-08 23:16:04 +00:00
Johann	e50ea014c3	Revert "buffer.h: use size_t" This reverts commit `f08581c1d0`. type conversion warnings abound. Change-Id: I41d4c0e7a388e1008bdbc55fefda4bbca3f89f00	2017-06-08 10:20:21 -07:00
Johann Koenig	903375a48a	Merge "fdct16x16 neon optimization"	2017-06-08 15:19:36 +00:00
Johann	eae7cf2368	fdct16x16 neon optimization Roughly 2x speedup. Since the only change for HBD is to store(), the improvement appears to hold there as well. BUG=webm:1424 Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19	2017-06-07 14:59:55 -07:00
Johann Koenig	0c4f74d129	Merge changes Iade45f69,I18d90658,Ieca3f1ef * changes: buffer.h: add num_elements_ buffer.h: zero-init all values buffer.h: use size_t	2017-06-07 19:20:16 +00:00
Johann	902d63759e	buffer.h: add num_elements_ raw_size_ was being incorrectly computed and used Change-Id: Iade45f69964c567ffb258880f26006a96ae5a30d	2017-06-07 11:31:20 -07:00
Johann	4a37e3e2a0	buffer.h: zero-init all values Change-Id: I18d90658bcd4365d49adcadd6954090b3b399aa8	2017-06-07 11:27:26 -07:00
Johann	f08581c1d0	buffer.h: use size_t Change-Id: Ieca3f1ef23cd1d7b844ea3ecb054007ed280b04f	2017-06-07 11:24:27 -07:00
James Zern	ff42e04f9c	Merge "ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64}"	2017-06-06 23:52:39 +00:00
Johann	de4cb716ee	buffer.h: split out init Change-Id: Idfbd2e01714ca9d00525c5aeba78678b43fb0287	2017-06-06 15:02:50 -07:00
Johann	8659764a07	buffer.h: Use T for values Change-Id: I2da4110e843b6e361028b921c24b6ca2ea9077d9	2017-06-06 12:05:14 -07:00
James Zern	4753c23983	Merge "ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx"	2017-06-06 02:19:41 +00:00
Johann Koenig	755b3daf90	Merge "comp_avg_pred neon: used by sub pixel avg variance"	2017-05-31 18:17:28 +00:00
Johann	f695b30ac2	comp_avg_pred neon: used by sub pixel avg variance BUG=webm:1423 Change-Id: I33de537f238f58f89b7a6c1c2d6e8110de4b8804	2017-05-30 22:47:34 +00:00
Jerome Jiang	a5ab38093f	Merge "Fix vp8 race when build --enable-vp9-highbitdepth."	2017-05-30 05:47:44 +00:00
Jerome Jiang	0afa2dad76	Fix vp8 race when build --enable-vp9-highbitdepth. Split vp8/vp9 implementations on yv12_copy_frame_c. Remove high-bitdepth codes from vp8_yv12_extend_frame_borders_c. Clean up vp8 codes usage in vp9. BUG=webm:1435 Change-Id: Ic68e79e9d71e1b20ddfc451fb8dcf2447861236d	2017-05-26 09:45:01 -07:00
Johann Koenig	de1a9c77a7	Merge changes Iaab2b9a1,Idfb458d3 * changes: sub pel avg variance neon: 4x block sizes sub pel variance neon: 4x block sizes	2017-05-24 18:33:53 +00:00
Johann Koenig	b11a37f540	Merge changes I31fa6ef8,I228c6f29 * changes: sub pel avg variance neon: add neon optimizations sub pel variance neon: normalize variable names	2017-05-24 18:32:02 +00:00
James Zern	566f6d75bd	partial_idct_test,InitInput: fix rollover in mult promote coeff to signed 64-bit to avoid exceeding integer bounds when squaring the value Change-Id: If77bef6bc0a6a4c39ca3013e5e2ddb426a1c6e1f	2017-05-24 15:27:38 +02:00
Alexandra Hájková	8bf6eaf433	ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64} Change-Id: I547d0099e15591655eae954e3ce65fdf3b003123	2017-05-24 13:27:09 +00:00
Linfeng Zhang	36f1b183e4	Update InitInput() in test/partial_idct_test.cc Make it work in high bit depth. BUG=webm:1412 Change-Id: Ic5cfd410a69709f01e2924774356a108a349d273	2017-05-23 14:24:23 -07:00
Johann	f6fcd3410d	sub pel avg variance neon: 4x block sizes BUG=webm:1423 Change-Id: Iaab2b9a183fdb54aae5f717aba95d90dc36a9e3b	2017-05-22 14:40:05 -07:00
Johann	188d58eaa9	sub pel variance neon: 4x block sizes Add optimizations for blocks of width 4 BUG=webm:1423 Change-Id: Idfb458d36db3014d48fbfbe7f5462aa6eb249938	2017-05-22 14:40:01 -07:00
Johann	9b0d306a2f	sub pel avg variance neon: add neon optimizations These are missing an optimized version of vpx_comp_avg_pred BUG=webm:1423 Change-Id: I31fa6ef842e98f7ff3ea079ffed51ae33178e2ed	2017-05-22 13:58:43 -07:00
Linfeng Zhang	c167345ffb	Add vpx_highbd_idct{4x4,8x8,16x16}_1_add_sse2 BUG=webm:1412 Change-Id: Ia338a6057d36f9ed7eaa9cbd4dfbf0c3cbdc6468	2017-05-22 11:24:21 -07:00
Johann Koenig	e7cac13016	Merge changes Ib8dd96f7,Ie9854b77 * changes: neon variance: process 4x blocks use memcpy for unaligned neon stores	2017-05-22 17:48:33 +00:00
Johann Koenig	3c603eadb4	Merge "neon fdct: 4x4 implementation"	2017-05-19 17:08:58 +00:00
Johann	7b742da63e	neon variance: process 4x blocks Continue processing sets of 16 values. Plenty of improvement for 4x8 (doubles the speed) but only about 30% for 4x4. BUG=webm:1422 Change-Id: Ib8dd96f75d474f0348800271d11e58356b620905	2017-05-17 17:35:01 -07:00
Marco Paniconi	a2dfbbd7d6	Merge "vp9: Modify ChangingDropFrameThresh unittest."	2017-05-17 18:42:51 +00:00
Marco	4733df333f	vp9: Modify ChangingDropFrameThresh unittest. Add another (lower) bitrate to the test, to cover frame drop behavior at low bitrate range. Change-Id: Iaad003974159daf3d2d65ef3a6575a3e72e498d6	2017-05-17 09:38:21 -07:00
Linfeng Zhang	3210ca6d60	Update partial idct testing code Add PartialIDctTest::PrintDiff() to help debugging. In RunQuantCheck, try all combinations of +/-mask_ input for 4x4 idct. Update PartialIDctTest::InitInput(). Change-Id: I13fd163954a4c1a3a6cfeb5e4a4d3d0e7ff901f4	2017-05-17 09:28:32 -07:00
Johann	105503b839	neon fdct: 4x4 implementation Approximately twice as fast as C implementation. BUG=webm:1424 Change-Id: I3c0307fb08ddc23df42545cd089a78e2ed5c9d3f	2017-05-17 07:38:18 -07:00
Alexandra Hájková	bcbc3929ae	ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx Change-Id: Ic9639b1331d8c5cbc207c2a036891ff0137fc56f	2017-05-13 13:13:15 +00:00
James Zern	ac8f58f6ab	Merge changes I1b54a7a5,I3028bdad,I59788cd9 * changes: ppc: Add get_mb_ss_vsx ppc: Add get4x4sse_cs_vsx ppc: Add comp_avg_pred_vsx	2017-05-12 15:24:59 +00:00
Luca Barbato	143b21e362	ppc: Add get_mb_ss_vsx Change-Id: I1b54a7a5bb642e4b836d786ea1ae506eed025e3f	2017-05-12 17:23:00 +02:00
Luca Barbato	6d225eb5f9	ppc: Add get4x4sse_cs_vsx Change-Id: I3028bdadf653665d18e781d28e9625f62804b3d8	2017-05-12 17:23:00 +02:00
Luca Barbato	a7f8bd451b	ppc: Add comp_avg_pred_vsx Change-Id: I59788cd98231e707239c2ad95ae54f67cfe24e10	2017-05-12 17:22:55 +02:00
Alexandra Hájková	f48532e271	ppc: Add vpx_sad64x32/64_vsx Change-Id: I84e3705fa52f75cb91b2bab4abf5cc77585ee3e2	2017-05-12 16:10:16 +02:00
Alexandra Hájková	0b15bf1e54	ppc: Add vpx_sad32x16/32/64_vsx Change-Id: I3c4f9d595275669580413a71b3c3c810e7ddcacd	2017-05-12 16:10:11 +02:00
James Zern	a12ea1d5e9	Merge "ppc: Add vpx_sad16x8/16/32_vsx"	2017-05-12 13:33:51 +00:00
Marco	c5a4376aed	vp9: SVC: allow for setting the interp_filter in non-rd pickmode. For SVC 1 pass non-rd pickmode, the interpolation filter for the upsampling of the golden (spatial) reference was not being explicitly set and instead was taking whatever value was set in the previous mode/block (which would be either EIGHTTAP or EIGHTTAP_SMOOTH). Fix it to the default EIGHTTAP for now, to be updated/selected adaptively in a later change. Minor adjustment to rate targeting thresholds in datarate unittests. Change-Id: I52085048674072c6cfb7163e11e9a2658d773826	2017-05-11 11:45:09 -07:00
Alexandra Hájková	cc7f0c0f3e	ppc: Add vpx_sad16x8/16/32_vsx Change-Id: I60619d28fffd9809f93b1af510a50e1aa02519a9	2017-05-10 19:57:30 +00:00
Johann Koenig	d713ec3c46	Merge changes I92eb4312,Ibb2afe4e * changes: subpel variance neon: add mixed sizes sub pixel variance neon: use generic variance	2017-05-10 18:19:52 +00:00
Linfeng Zhang	870cf4356c	Update test/partial_idct_test.cc Makes more sense to call the corresponding partial idct C function instead of the full idct C function as the reference. Change-Id: Ibb7681dd063edd6307ba582c10c26c4c6a4b78c6	2017-05-09 13:07:47 -07:00
Johann Koenig	1814463864	Merge changes Id602909a,Ib0e85608 * changes: neon variance: process two rows of 8 at a time neon variance: add small missing sizes	2017-05-08 17:34:20 +00:00
Linfeng Zhang	2c3a2ad6f1	Merge changes I0cfe4117,I3581d80d,Ida62c941 * changes: Split dsp/x86/inv_txfm_sse2.c Update highbd idct functions arguments to use uint16_t dst Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct	2017-05-08 16:15:57 +00:00
Jerome Jiang	3453c8d6c4	Merge "vp9: Neon optimization for denoiser. Add unit tests."	2017-05-06 01:28:32 +00:00
Jerome Jiang	83a2bfd7dc	Merge "Change target bitrate thresh in denoiser test."	2017-05-06 01:28:15 +00:00
Jerome Jiang	fff358fb06	Change target bitrate thresh in denoiser test. An intended behavior change disabling exhaustive searches in speed feature causes VP9/DatarateTestVP9LargeDenoiser.4threads test failure. Change the threshold to make it pass. BUG=webm:1429 Change-Id: Ibcbe2314c6b2525799894f5d7204fc8eb4ec2a1e	2017-05-05 16:50:19 -07:00
Jerome Jiang	069eedb3a0	vp9: Neon optimization for denoiser. Add unit tests. Denoiser on Neon is 5x faster than C code. BUG=webm:1420 Change-Id: I805ab64f809ff2137354116be6213e7ec29c1dcb	2017-05-05 16:40:52 -07:00
Johann	2346a6da4a	subpel variance neon: add mixed sizes Add support for everything except block sizes of 4. Performance is better but numbers will improve again when the variance optimizations land. BUG=webm:1423 Change-Id: I92eb4312b20be423fa2fe6fdb18167a604ff4d80	2017-05-04 15:30:01 -07:00
Johann	462e29703c	fdct 8x8 neon: minor comment cleanup Simplify HBD/non distinction in test. Document why transpose_neon.h is not used Change-Id: I17659414206ddbb8c2f1ef0d9f4a17f1745d5a52	2017-05-04 15:14:23 -07:00
Johann	cb9133c72f	neon variance: add small missing sizes Some of the mixed sizes were missing. They can be implemented trivially using the existing helper function. When comparing the previous 16x8 and 8x16 implementations, the helper function is about 10% faster than the 16x8 version. The 8x16 is very close, but the existing version appears to be faster. BUG=webm:1422 Change-Id: Ib0e856083c1893e1bd399373c5fbcd6271a7f004	2017-05-04 08:59:42 -07:00
Linfeng Zhang	d5de63d2be	Update highbd idct functions arguments to use uint16_t dst BUG=webm:1388 Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5	2017-05-03 13:59:16 -07:00
Linfeng Zhang	081b39f2b7	Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct BUG=webm:1388 Change-Id: Ida62c941f2b836d6c9e27b427a7d5008ab6dc112	2017-05-03 13:58:31 -07:00
Yi Luo	a3452996a1	High bit depth inter prediction horizontal/vertical filters AVX2 User level speed improvement on i7-6700, cpu-used=1, x86_64 Linux, bitrate, 1080p, 8Mbps, 4K, 16Mbps: - Decoder: 1080p: ~4% 4K: ~5% - Encoder: 1080p: ~1% 4K: ~3% Change-Id: I51b48f9c5de0d62487d5a11aa579c97bd03dd640	2017-05-03 12:18:01 -07:00
James Zern	5599e4275a	Merge changes Ia5293d94,I90d481d3,Ia509d622,I54549b03,I89b635d6 * changes: ppc: Add convolve8_vsx and convolve8_avg_vsx ppc: Add convolve8_avg_vert_vsx ppc: Add convolve8_vert ppc: Add convolve8_horiz_avg ppc: Add convolve8_horiz	2017-05-03 03:31:19 +00:00
Luca Barbato	e2ad89092d	ppc: Add convolve8_vsx and convolve8_avg_vsx Change-Id: Ia5293d948003a7fff5a7cbad6e83d8a72717c857	2017-05-02 20:27:47 -07:00
Luca Barbato	e6ca81ee67	ppc: Add convolve8_avg_vert_vsx Only the generic one again, speedups for 8x8 and larger blocks to come later. Change-Id: I90d481d3a602d1e277ead8f3934eca126b86b72d	2017-05-02 20:27:42 -07:00
Luca Barbato	a65f1771ad	ppc: Add convolve8_vert Only the generic one again, speedups for 8x8 and larger blocks to come later. Change-Id: Ia509d6225984b4930ec03928c9bcbf51486da99f	2017-05-02 20:27:33 -07:00
Luca Barbato	77772350f3	ppc: Add convolve8_horiz_avg The 8x8 and larger blocks cases can be sped up further. Change-Id: I54549b03ac6c7a4e3f485738b100c3cac7ac2e15	2017-05-02 20:27:28 -07:00
Luca Barbato	08edb85bd0	ppc: Add convolve8_horiz The 8x8 and larger blocks cases can be sped up further. Change-Id: I89b635d6b01c59f523f2d54b1284ed32916c5046	2017-05-02 20:27:16 -07:00
James Zern	ee3df31d74	Merge "vpx_scale_test: fix segfault on alloc failure"	2017-05-01 19:22:22 +00:00
James Zern	2930903d51	vpx_scale_test: fix segfault on alloc failure check the return of ResetImage() before continuing Change-Id: Iff0b038f7b9761113b8cf33a511a5306640d1273	2017-04-29 13:12:53 -07:00
Luca Barbato	d51d3934f5	ppc: Add convolve_avg Change-Id: Ib203c444c708f42072e38301ee3db97b5b53d014	2017-04-29 15:47:25 +02:00
Luca Barbato	63860ba7b8	ppc: Add convolve_copy Change-Id: Ie26d6dbe090e711d84bac01ba7da270db983f405	2017-04-29 15:47:25 +02:00
Jerome Jiang	bea27a5809	Merge "Generalize vp9 sse2 denoiser test for other platforms."	2017-04-28 15:45:52 +00:00
Johann Koenig	94ebdba71d	Merge "vp9 temporal filter: sse4 implementation"	2017-04-28 13:22:41 +00:00
Jerome Jiang	26aebd77b8	Generalize vp9 sse2 denoiser test for other platforms. Renamed to vp9_denoiser_test. Change-Id: I0d8f4c94bcb81a60949a13d9fe839cee95d03f77	2017-04-27 22:47:41 -07:00
Johann	6dfeea6592	vp9 temporal filter: sse4 implementation Approximates division using multiply and shift. Speeds up both sizes (8x8 and 16x16) by 30 times. Fix the call sites to use the RTCD function. Delete sse2 and mips implementation. They were based on a previous implementation of the filter. It was changed in Dec 2015: `ece4fd5d22` BUG=webm:1378 Change-Id: I0818e767a802966520b5c6e7999584ad13159276	2017-04-26 22:03:05 -07:00
Yunqing Wang	b68f14d0ed	Merge "Make the row based multi-threaded encoder deterministic"	2017-04-26 16:12:14 +00:00
Linfeng Zhang	51dc998f3a	Update highbd convolve functions arguments to use uint16_t src/dst BUG=webm:1388 Change-Id: I6912de2639895d817ce850da8ea9f6c8fe21da42	2017-04-25 14:22:19 -07:00
Yunqing Wang	10a497bd38	Make the row based multi-threaded encoder deterministic This patch followed allow_exhaustive_searches feature modification and continued to modify the encoder to achieve the determinism in the row based multi-threaded encoding. While row-mt = 1 and using multiple threads, the adaptive feature in encoder was disabled, which gave BDRate gain(at speed 1, -0.6% ~ -0.7%; at speed 2, -0.46% ~ -0.59%), but some encoder speed losses(7% ~ 10% at speed 1 and 3% ~ 6% at speed 2). These speed losses were acceptable considering the speed gains obtained from row-mt. Change-Id: I60d87a25346ebc487a864b57d559f560b7e398bb	2017-04-24 16:28:27 -07:00
Marco	85ca2e8a8b	vp9: Re-enable SVC datarate tests. Re-enable the SVC tests, wrap the non-zero expectation in GetMismatchFrames around #if CONFIG_VP9_DECODER. Change-Id: I0e8a2d78b868c32f18fe597540f397d3a1b303b5	2017-04-20 12:08:08 -07:00
Luca Barbato	8975436466	ppc: Add the intra predictor tests Change-Id: Idea15b916044ab3d8e74519337880a484ecfd87e	2017-04-19 20:21:40 -07:00
Luca Barbato	914b160fb5	ppc: h predictor 8x8 Slightly faster with the current compiler. Change-Id: Iae225fac08395eb430c97a2abec69c60f5cf5c47	2017-04-19 19:57:51 -07:00
Luca Barbato	0b9be93205	ppc: d63 predictor 8x8 10x faster. Change-Id: I7cedbf4df2ce7df5b6f1108b11815d088fdb9ba8	2017-04-19 19:57:51 -07:00
Luca Barbato	ee9325b0bd	ppc: tm predictor 4x4 Slightly faster. Change-Id: I0ca43f309b3d9b50435d69bd5be64b53a99bd191	2017-04-19 19:57:51 -07:00
Luca Barbato	2904eb5800	ppc: h predictor 4x4 2x faster. Change-Id: I0583dec353299c6797401b646099f18db4e0420d	2017-04-19 19:57:51 -07:00
Luca Barbato	58245d7050	ppc: dc predictor 8x8 Slightly faster, the other dc predictors cannot be faster since the computation speedup is overwhelmed by the time spent reading dst to write just the 8x8 part. Change-Id: I94a0b50500adf8b7b6bb919dbf5c7adf5b9fba66	2017-04-19 19:57:51 -07:00
Luca Barbato	6b4a65e8b1	ppc: d45 predictor 8x8 11x faster. Change-Id: I5b8f39213ee1f5260724fc254e3fb5c462435798	2017-04-19 19:57:51 -07:00
Luca Barbato	92e33c7b31	ppc: d63 predictor 32x32 About 10x faster. Change-Id: If7d0645f75c5d7deb9751edd0bf47e2f9068e9e7	2017-04-19 19:57:51 -07:00
Luca Barbato	a5469a00a8	ppc: d63 predictor 16x16 About 18x faster. Change-Id: Id043bf76c011e03e992085bb5e20f330d3e98cd4	2017-04-19 19:57:51 -07:00
Luca Barbato	cc868da526	ppc: d45 predictor 32x32 About 12x faster. Change-Id: I22c150256aefb4941861ab1f6c17d554fb694bed	2017-04-19 19:57:51 -07:00
Luca Barbato	7a7dc9e624	ppc: d45 predictor 16x16 About 16x faster. Change-Id: Ie5469fb32d5fd11bb6cb06318cea475d8a5b00b9	2017-04-19 19:57:51 -07:00
Luca Barbato	c08baa2900	ppc: dc predictor 32x32 10x and 5x faster. Change-Id: I7913c58c768334d818f541a5e219f1035791eeaf	2017-04-19 19:57:47 -07:00
Luca Barbato	22ca468c7c	ppc: dc top and left predictor 32x32 6x faster. Change-Id: I717995b4056e5579c68191d11b495372971fe1ae	2017-04-19 19:49:31 -07:00
Luca Barbato	ad9dea1f6d	ppc: dc top and left predictor 16x16 13x faster. Change-Id: I1771ac39fda599153f933cb3f0506c9f97a6cbe6	2017-04-19 19:49:31 -07:00
Luca Barbato	d68d37872c	ppc: dc_128 predictor 32x32 6x faster. Change-Id: I1da8f51b4262871cb98f0aa03ccda41b0ac2b08b	2017-04-19 19:49:31 -07:00
Luca Barbato	f9d20e6df2	ppc: dc_128 predictor 16x16 20x faster. Change-Id: I05f0deb2d38ae7966eae6b71fbc0aa51880e5709	2017-04-19 19:49:31 -07:00
Luca Barbato	0d9417de4a	ppc: tm predictor 32x32 About 8x faster. Change-Id: I9bad827ccbdf47ec95406e961c74ac2ff45f80cf	2017-04-19 19:49:26 -07:00
James Zern	a81f037f15	Merge changes I1f5a3752,I95123051,I3bb724e0,Ie81077fa,Ic80f3c05, ... * changes: ppc: tm predictor 16x16 ppc: tm predictor 8x8 ppc: horizontal predictor 32x32 ppc: horizontal predictor 16x16 ppc: vertical intrapred 16x16 and 32x32 configure: Workaround clang not enabling altivec on -mvsx configure: Match power64 as ppc64	2017-04-20 02:45:45 +00:00
Linfeng Zhang	fbbdba3b04	Merge changes I9e18a73b,Ie47c8cd4 * changes: Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve Create CAST_TO_BYTEPTR/SHORTPTR	2017-04-19 23:55:58 +00:00
Linfeng Zhang	bf8a49abbd	Clean CONVERT_TO_BYTEPTR/SHORTPTR in convolve Replace by CAST_TO_BYTEPTR/SHORTPTR. The rule is: if a short ptr is casted to a byte ptr, any offset operation on the byte ptr must be doubled. We do this by casting to short ptr first, adding offset, then casting back to byte ptr. BUG=webm:1388 Change-Id: I9e18a73ba45ddae58fc9dae470c0ff34951fe248	2017-04-19 12:13:49 -07:00
Marco	f34be01190	vp9: Fix the disabling of a SVC 3TL datarate test. Change-Id: Ib42d23ab5ee39ab3c85e1d9a84e36249e59fe74e	2017-04-19 08:01:44 -07:00
Luca Barbato	479443a570	ppc: tm predictor 16x16 About 10x faster. Change-Id: I1f5a3752d346459df3b45f92963208bf3e520f06	2017-04-19 01:48:10 +02:00
Luca Barbato	c8f5a55df4	ppc: tm predictor 8x8 About 5x faster. Change-Id: I951230517f49c0dca9ac9eac2efa8916a303b85a	2017-04-19 01:48:09 +02:00
Luca Barbato	7b0e12934e	ppc: horizontal predictor 32x32 About 5x faster. Change-Id: I3bb724e07baffd901aa2d0f65060ba48882cc9b8	2017-04-19 01:48:09 +02:00
Luca Barbato	a7a2d1653b	ppc: horizontal predictor 16x16 About 10x faster. Change-Id: Ie81077fa32ad214cdb46bdcb0be4e9e2c7df47c2	2017-04-19 01:48:09 +02:00
Luca Barbato	7ad1faa6f8	ppc: vertical intrapred 16x16 and 32x32 Change-Id: Ic80f3c050cfbe7697e81a311b4edaaa597b85cab	2017-04-19 01:48:09 +02:00
Marco	15afee1938	vp9: Disable some SVC tests for now. Disable the 1 pass CBR SVC tests with temporal_layers > 1. Issue with the commit `863f860`, which will cause encoder/decoder mismatch due to skipping encoder loopfilter for non-reference frames. Will re-enable the tests when fixed. Change-Id: I74918a0045a17976b069c4be63fbeb921974df0d	2017-04-18 09:51:42 -07:00
Johann Koenig	a6095333a7	Merge "re-enable vpx_comp_avg_pred_sse2"	2017-04-17 22:07:34 +00:00
Marco Paniconi	9aa429a66d	Revert "Revert "vp9: Avoid encoder loopfilter for non-reference frames."" This reverts commit `e9b7f98c56`. Reason for revert: Commit `d578bdad` fixes the issue (encoder/decoder mismatch in 3TL datarate test) that causes the original revert. Original change's description: > Revert "vp9: Avoid encoder loopfilter for non-reference frames." > > This reverts commit `863f860bfc`. > > This causes encoder / decoder mismatches in various > VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers tests > > BUG=webm:1408 > > Change-Id: Ic200c39d7ed9c0b0247ef562f5d6f7b2625f7e14 > TBR=jzern@google.com,marpan@google.com,builds@webmproject.org,jianj@google.com BUG=webm:1408 Change-Id: Ifeb81460856d1d56482d4e0477a70ee98f8bfaa6	2017-04-17 11:02:02 -07:00
Marco	d578bdad02	vp9: Datarate test: modify frame flags for 3 TL. Modify the frame flags to update the ARF on top layer, for the tests: VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayersFrameDropping This is needed to fix the encode/decoder mismatches caused by `863f860`, and removed in the revert `e9b7f98`. Change-Id: I6b9fecfdd17315fc0179e29949338c77636026c0	2017-04-17 09:33:20 -07:00
Johann	9fa24f03b5	re-enable vpx_comp_avg_pred_sse2 Buffers on 32 bit x86 builds only guaranteed 8 byte alignment. Fixed with "AvgPred test: use aligned buffers" and "sad avg: align intermediate buffer" Also re-enable asserts on the C version. BUG=webm:1390 Change-Id: I93081f1b0002a352bb0a3371ac35452417fa8514	2017-04-17 08:40:43 -07:00
Johann Koenig	9e19102972	Merge "AvgPred test: use aligned buffers"	2017-04-17 15:36:41 +00:00
James Zern	4ba20da8b1	Merge "Add AVX2 optimization to copy/avg functions"	2017-04-15 00:26:08 +00:00
Yi Luo	aa5a941992	Add AVX2 optimization to copy/avg functions Change-Id: Ibcef70e4fead74e2c2909330a7044a29381a8074	2017-04-14 16:50:10 -07:00
Johann Koenig	7178e68bbe	Merge "Disable vpx_comp_avg_pred_sse2"	2017-04-14 22:01:39 +00:00
Johann	e3b2710b04	AvgPred test: use aligned buffers BUG=webm:1390 Change-Id: Idb6d1ce119a09c5e7c9f3c58bbbae3de63463d1d	2017-04-14 12:49:56 -07:00
James Zern	e9b7f98c56	Revert "vp9: Avoid encoder loopfilter for non-reference frames." This reverts commit `863f860bfc`. This causes encoder / decoder mismatches in various VP9/DatarateTestVP9Large.BasicRateTargeting3TemporalLayers tests BUG=webm:1408 Change-Id: Ic200c39d7ed9c0b0247ef562f5d6f7b2625f7e14	2017-04-14 11:50:06 -07:00
Johann	eaa7cdf05d	Disable vpx_comp_avg_pred_sse2 Failures on windows: unknown file: error: SEH exception with code 0xc0000005 thrown in the test body. Alignment check errors on linux: test_libvpx: ../libvpx/vpx_dsp/variance.c:230: void vpx_comp_avg_pred_c(uint8_t *, const uint8_t *, int, int, const uint8_t *, int): Assertion `((intptr_t)comp_pred & 0xf) == 0' failed. BUG=webm:1390 Change-Id: I5eed5381c0f1a8fe594a128eb415e77232f544ea	2017-04-14 08:43:06 -07:00
Johann Koenig	bdb593ab20	Merge "vpx_comp_avg_pred: sse2 optimization"	2017-04-14 04:10:56 +00:00
Marco	863f860bfc	vp9: Avoid encoder loopfilter for non-reference frames. Useful for SVC, where the top layer enhancement frames may not update any reference buffers, as is the case for the patterns in the 1 pass CBR SVC when #temporal_layers > 1. ~3% encoder speedup for SVC patterns with temporal layers in 1 pass CBR mode. Updated the SVC datarate tests for the mismatch frames. Set the frame-dropper off in some tests with #temporal_layers > 1 so we can correctly set #mismatch frames. Adjusted rate target threshold for tests where frame-dropper was turned off. Change-Id: Ia0c142f02100be0fed61cd2049691be9c59d6793	2017-04-13 09:51:55 -07:00
Johann	28a8622143	vpx_comp_avg_pred: sse2 optimization Provides over 15x speedup for width > 8. Due to smaller loads and shifting for width == 8 it gets about 8x speedup. For width == 4 it's only about 4x speedup because there is a lot of shuffling and shifting to get the data properly situated. BUG=webm:1390 Change-Id: Ice0b3dbbf007be3d9509786a61e7f35e94bdffa8	2017-04-13 08:44:52 -07:00
Yunqing Wang	1aa46abbdf	VP9 motion vector unit test To prevent the motion vector out of range bug, added a motion vector unit test in VP9. In the 4k video encoding, always forced to use extreme motion vectors and also encouraged to use INTER modes. In the decoding, checked if the motion vector was valid, and also checked the encoder/decoder mismatch. The tests showed that this unit test could reveal the issue we saw before. Change-Id: I0a880bd847dad8a13f7fd2012faf6868b02fa3b4	2017-04-06 00:50:56 +00:00
Johann Koenig	eec92e8a5b	Merge "vpx_comp_avg_pred: add test"	2017-03-28 21:50:01 +00:00
Johann	6e99ed72a5	vpx_comp_avg_pred: add test BUG=webm:1389 Change-Id: I23cd65f1939db026958ccb5d70b8c5cc9aa5bc51	2017-03-28 14:11:14 -07:00
Marco	07ad5a15c2	vp9: Fix to condition on using source_sad for 1 pass real-time. Make the source_sad feature work properly for cases of VBR or screen_content with SVC. Added unittest for SVC with screen-content on. Change-Id: Iba5254fd8833fb11da521e00cc1317ec81d3f89b	2017-03-24 10:21:47 -07:00
Johann	83dd9b36f4	vp9 temporal filter: additional test Change tests to reflect use. Input sizes will be 8 or 16 (but not necessarily square). filter_weight is capped at 2 and filter_strength at 6 Speed test, disabled by default. Change-Id: Idfde9d6c4b7d93aaf0e641b0f4862c15e2a2af7a	2017-03-22 19:37:04 +00:00
Johann	36d732c22b	vp9 temporal filter: add const to function prototype The input frames are not modified. Change-Id: Ideb810e3c5afeb4dbdc4c7d54024c43a8129ad39	2017-03-22 18:14:21 +00:00
Marco	4ddde47d8c	vp9: Modify datarate tests to cover denoising with multi-threading. Change-Id: I6ed48a630edf9923c25a05deaca50e0afec43918	2017-03-21 15:57:33 -07:00
James Zern	e0b4c4d1ae	Merge "Add vpx_highbd_idct32x32_1024_add_neon()"	2017-03-21 03:27:35 +00:00
James Zern	6d71d33d55	Merge "Add vpx_highbd_idct32x32_34_add_neon()"	2017-03-21 03:02:51 +00:00
Johann	775569473d	temporal filter test: update types Use 'int' for w/h since it is that way everywhere else. Pass Buffer pointers Change-Id: I9eef6890af657baba171c6bcfcc85fc976173399	2017-03-17 13:22:28 -07:00
Johann Koenig	9675affae0	Merge "test: add vp9_temporal_filter_apply test"	2017-03-17 18:18:06 +00:00
Linfeng Zhang	27530d484e	Add vpx_highbd_idct32x32_1024_add_neon() BUG=webm:1301 Change-Id: Ib90af0c1712e56b301d0e981dbe9a641e15e36ca	2017-03-17 00:27:46 -07:00
Linfeng Zhang	50b13f75b8	Add vpx_highbd_idct32x32_34_add_neon() BUG=webm:1301 Change-Id: I74dd16c6c64e7bb71aa991cedccddf0663ef5e06	2017-03-17 00:27:46 -07:00
James Zern	2882778310	Merge "Add vpx_highbd_idct32x32_135_add_neon()"	2017-03-17 07:26:52 +00:00
Linfeng Zhang	65e9fb65e8	Add vpx_highbd_idct32x32_135_add_neon() BUG=webm:1301 Change-Id: I58c2d65d385080711c3666d6d8f9d241dac7b21a	2017-03-16 22:37:55 -07:00
Rafael de Lucena Valle	405b94c661	Add Hadamard for Power8 Change-Id: I3b4b043c1402b4100653ace4869847e030861b18 Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>	2017-03-15 23:46:18 -03:00
Jerome Jiang	2fa7092808	Merge "vp9: Enable row multithreading for SVC in real-time mode."	2017-03-14 23:29:46 +00:00
Johann	a14a987c82	test: add vp9_temporal_filter_apply test Add an independent implementation of the filter. BUG=webm:1379 Change-Id: I309c459b493c3011273b78b127a786bb23c59f9c	2017-03-13 15:26:26 -07:00
Linfeng Zhang	b0bfcc368c	Merge "Add vpx_highbd_idct32x32_135_add_c()"	2017-03-13 18:49:01 +00:00
Marco	ffb3c50da1	vp9: Enable row multithreading for SVC in real-time mode. Enable row-mt for SVC for real-time mode, speed >=5. Add the controls to the sample encoders, but keep it off for now. Add the control and enable it for the 1 pass CBR unittests. For speed 7, 3 layer SVC, 2 threads, row-mt enabled gives about ~5% speedup. Change-Id: Ie8e77323c17263e3e7a7b9858aec12a3a93ec0c1	2017-03-10 01:01:07 +00:00
Linfeng Zhang	48f5886605	Add vpx_highbd_idct32x32_135_add_c() When eob is less than or equal to 135 for high-bitdepth 32x32 idct, call this function. BUG=webm:1301 Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6	2017-03-08 10:46:33 -08:00
Jerome Jiang	c4c0331f65	Shift speed 2 from non-large VP9 tests to large ones. This may fix the time out failure of valgrind tests in nightly since more coverages were added on row-mt. Change-Id: Id9414e66d1a266602c7495243d9f5cb69e17ccdc	2017-03-07 13:58:11 -08:00
Vignesh Venkatasubramanian	453f18040f	vp9,realtime: Enable row multithreading for non-rd Enable row level multithreading for realtime encodes where non-rd path is used (speed >= 5). Change-Id: I5439cb49a02171166d8e1de06c7d5e6f8e819a41	2017-03-02 11:03:56 -08:00
Chrome Cunningham	b71245683b	Merge "VPX_CODEC_CAP_HIGHBITDEPTH for decoder interface"	2017-03-01 18:01:14 +00:00
Chris Cunningham	bcd0c49af3	VPX_CODEC_CAP_HIGHBITDEPTH for decoder interface Moves the def from vpx_encoder.h -> vpx_codec.h. The defined value is changed as part of this move. Adds the value to decoder capabilities when CONFIG_VP9_HIGHBITDEPTH. Change-Id: I7d61fc821cda29f1e32bb9b2b9ffd3d83966e419	2017-02-28 17:10:34 -08:00
James Zern	66919e370b	vp9_ethread_test,cosmetics: s/new-mt/row-mt/ Change-Id: I8c145337adf49d30b88a17ff31501b8751ed1fa0	2017-02-28 15:13:11 -08:00
James Zern	3ab8a05b37	stress.sh: add vp9_stress_test_row_mt vp9_stress_test now forces --row-mt=0 to cover both versions Change-Id: I8d134879435bf1d8e76ab3fd89e698efba0e86b2	2017-02-28 15:09:30 -08:00
James Zern	b58a8ccb02	stress.sh: parameterize thread count Change-Id: Iae45266cea86585f0935af4012335198cf93719f	2017-02-28 15:09:30 -08:00
James Zern	4684d286de	stress.sh: add one pass encodes Change-Id: I38e6c988f17c56fbfacd95378b27ef8d77c75f90	2017-02-28 15:09:30 -08:00

2234 Commits