generic-library/vpx

Author	SHA1	Message	Date
Scott LaVarnway	4e6b5079c6	VPX: remove scaled calls from FUN_CONV_1D and FUN_CONV_2D macros. The predict lut now handles this case. The encoder now calls vpx_scaled_2d() instead of vpx_convolve8() for scaling. Change-Id: Ia1c8af8a31e4cb4887a587143108cb45835f7df7	2015-08-05 10:47:06 -07:00
Zoe Liu	7186a2dd86	Code refactor on InterpKernel It in essence refactors the code for both the interpolation filtering and the convolution. This change includes the moving of all the files as well as the changing of the code from vp9_ prefix to vpx_ prefix accordingly, for underneath architectures: (1) x86; (2) arm/neon; and (3) mips/msa. The work on mips/drsp2 will be done in a separate change list. Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46	2015-07-31 10:27:33 -07:00
Jingning Han	097d59c28c	Cosmetics - Fix header file order in unit tests Change-Id: I9582a8d74990125b71e8fe620f7f3f2585a30798	2015-07-29 20:48:25 -07:00
Scott LaVarnway	1ec0853d17	Delete ChangeFilterWorks test This test places 128 in positions that would not be found in the VP9 filter tables. The ssse3 code packs this table into chars and uses the pmaddubsw instruction, which treats the value as signed. The ssse3 code checks for 128 in position 3, skipping the ssse3 code if found, and calls vp9_convolve8_c(). vp9_convolve8_c() is also used for scaling. ChangeFilterWorks breaks the ssse3 scaling code found in other commits. Change-Id: I1f5a76834bc35180b9094c48f9421bdb19d3d1cb	2015-07-22 09:05:17 -07:00
James Zern	017253b7a3	remove vp9_get_interp_kernel() expose filter_kernels[] and do the table lookup directly Change-Id: I0b10bff0327c3e01a723736141a9ffd377cd3d20	2015-07-06 13:04:05 -07:00
Johann	ff8505a54d	Fix --disable-use-x86inc Change-Id: I374fcd8fb45a6893dcdeac6896671be142a99f06	2015-07-01 13:15:51 -07:00
Parag Salasakar	bdfbc3e876	mips msa vp9 convolve8 avg hv optimization average improvement ~4x-6x Change-Id: I7c8b4f2334491be8a859592606e568bc95d019aa	2015-06-04 08:11:01 +05:30
Parag Salasakar	b8c1cdcd12	mips msa vp9 convolve8 avg horiz optimization average improvement ~5x-8x Change-Id: I179a69ec620fbd69979bd128f05d18113618aab4	2015-06-03 11:33:42 +05:30
Parag Salasakar	c543d38ac7	mips msa vp9 convolve8 avg vert optimization average improvement ~4x-6x Change-Id: Ia2e6f770da46416ebec31fdcea5cc7878879a9d9	2015-06-03 09:55:25 +05:30
Parag Salasakar	ebf7466cd8	mips msa vp9 updated convolve horiz, vert, hv, copy, avg module Updated sources according to improved version of common MSA macros. Enabled respective convolve MSA hooks and tests. Overall, this is just upgrading the code with styling changes. Change-Id: If5ad6ef8ea7ca47feed6d2fc9f34f0f0e8b6694d	2015-06-02 12:03:51 +05:30
Parag Salasakar	f9f078ebb6	mips msa vp9 updated macros and disable all MSA functions Done little restructuring/styling changes to the sources like generic macro definitions, their use to reduce code lines, better code alignments etc. Disabled all MSA hooks and tests Change-Id: Ic6f2dce0b501f46b80c06c46c0fe2043d557b190	2015-05-29 13:34:33 +05:30
Parag Salasakar	95cb130f32	Merge "mips msa vp9 copy and avg convolve optimization"	2015-04-30 04:39:13 +00:00
Parag Salasakar	2301d10f73	mips msa vp9 copy and avg convolve optimization average improvement ~3x-5x Change-Id: I422e4c33ea7e6d6783ba40029438ccf21b0e76bb	2015-04-29 12:28:17 +05:30
James Zern	f274c2199b	vpx_mem: remove vpx_memcpy vestigial. replace instances with memcpy() which they already were being defined to. Change-Id: Icfd1b0bc5d95b70efab91b9ae777ace1e81d2d7c	2015-04-28 19:59:41 -07:00
Parag Salasakar	ca90d4fd96	mips msa vp9 convolve8 horiz optimization average improvement ~6x-8x Change-Id: I7c91eec41aada3b0a5231dda7869b3b968f3ad18	2015-04-21 12:31:26 +05:30
Parag Salasakar	ef51c1ab5b	mips msa vp9 convolve8 hv optimization average improvement ~5x-8x Change-Id: I3214734cb3716e742907ce0d2d7a042d953df82b	2015-04-21 09:17:49 +05:30
Parag Salasakar	27d083c1b9	mips msa vp9 convolve8 vert optimization average improvement ~6x-10x Change-Id: Ie3f3ab3a9005be84935919701e56b404e420affa	2015-04-18 08:13:04 +05:30
Jim Bankoski	18d323606d	Fix test to call clear system state in convolve_test. Assembly tests should clear system state, as we have no expectation of proper system state in between test runs. Change-Id: I0f591996c1f17ef2a5a8572a6b445f757223a144	2014-12-12 06:18:56 -08:00
James Yu	01fc6f51e0	VP9 common for ARMv8 by using NEON intrinsics 07 Add vp9_convolve8_neon.c - vp9_convolve8_horiz_neon - vp9_convolve8_vert_neon Change-Id: I0bdd99ff72d275223fe211ac7243c25a5a60cf87 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	893534a996	VP9 common for ARMv8 by using NEON intrinsics 04 Add vp9_convolve8_avg_neon.c - vp9_convolve8_avg_horiz_neon - vp9_convolve8_avg_vert_neon Change-Id: I617971e37b02186fec5aca181f4f9622050ea2df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	d12757f5c6	VP9 common for ARMv8 by using NEON intrinsics 03 Add vp9_copy_neon.c - vp9_convolve_copy_neon Change-Id: I291fc5423d06240876411bbceab03eae5ef585be Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:02:46 -08:00
Scott LaVarnway	617382a2e3	VP9 common for ARMv8 by using NEON intrinsics 02 Add vp9_avg_neon.c - vp9_convolve_avg_neon Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 19:00:21 -08:00
Johann	1c3594c334	Add convolve_copy and convolve_avg to the test Change-Id: Ic9438031282e63e627550f7e4cdeda36e43e647b	2014-12-09 12:56:38 -08:00
Deb Mukherjee	27dce0f324	Test name changes to use SSE/SSE2 exactly Change-Id: I3b5a478d198868c2796366f0ac59d0e2036308b8	2014-11-07 13:44:19 -08:00
Deb Mukherjee	1929c9b391	Rename highbitdepth functions to use highbd prefix Uses highbd_ prefix convention consistently. Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e	2014-10-09 14:40:40 -07:00
Deb Mukherjee	d50716face	Incorporate WRAPLOW macro into non-highbitdepth tx Incorporates the WRAPLOW macro into the non-highbitdepth transforms to aid hardware verification between a software C model and an intended hardware implementation through the use of the configure options: --enable-experimental --enable-emulate-hardware. Note that to avoid further discrepancies between the sse/sse2 implementations of the transforms and the C implementation, when the emulate hardware option is invoked, we also disable sse/sse2/etc. Also includes some minor cleanups/renaming etc. Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287	2014-10-03 11:38:05 -07:00
hkuang	db71c1bd55	Fix compile warning. warning: comparison between signed and unsigned integer expressions. Change-Id: Ib6ee7500fe910983f290fc321ad89c0ab9989455	2014-09-19 22:48:38 -07:00
Deb Mukherjee	0d3c3d3ce7	Adds high bitdepth convolve, interpred & scaling Change-Id: Ie51c352a6b250547207cbc1ebba833a01ed053e3	2014-09-18 07:26:17 -07:00
Deb Mukherjee	10783d4f3a	Adds high bitdepth transform functions and tests Adds various high bitdepth transform functions and tests. Many of the changes are related to using typedefs tran_low_t and tran_high_t for the final transform coefficients and intermediate stages of the transform computation respectively rather than fixed types int16_t/int. When vp9_highbitdepth configure flag is off, these map to int16_t/int32_t, but when the flag is on, they map to int32_t/int64_t to make space for needed extra precision. Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8	2014-09-11 19:56:33 -07:00
Johann	8645a53039	Allow specifying opt dependencies If optimizations use more than one cpu feature, allow specifying them so that '--disable-X' still works https://code.google.com/p/webm/issues/detail?id=854 Change-Id: I3108ea37b397371a2be84dd5f2380b304db23f18	2014-09-11 13:43:48 -07:00
levytamar82	839911fb6d	Fix bug 804 A bug in the Microsoft compiler was found in the function vp9_filter_block1d16_v8_avx2 and a workaround applied. The bug occurs when there are 4 consecutive maddubs + min + adds intrinsic instructions. Change-Id: I83499faeb70971e650e5663fd2490360ddb1a51b	2014-08-07 15:09:24 -07:00
James Zern	dfc4e8f012	convolve_test: drop '_t' from local typenames _t is reserved by posix + switch to camelcase http://google-styleguide.googlecode.com/svn/trunk/cppguide.xml#Type_Names Change-Id: I2a22ffc36e9f88781bc7db0d5a28a7ed924bab1a	2014-07-18 20:38:08 -07:00
James Zern	29e1b1a4b0	tests: add API_REGISTER_STATE_CHECK used to wrap API functions to ensure full environment consistency as opposed to the renamed ASM_REGISTER_STATE_CHECK which is used with assembly functions. currently checks the FPU tag word in x86/x86_64 gcc builds to ensure emms has been called. Change-Id: Ie241772dbf903d33d516a1add4c8c6783f2e1490	2014-07-10 12:40:31 -07:00
James Zern	5704578f5f	convolve: disable avx2 variants tests failing under Win32/Win64 Change-Id: I5d49d11911bcda3a832b14efe5500d22597bedcf	2014-06-09 18:42:03 -07:00
Yunqing Wang	4f0943b996	Turn on unit tests for AVX2 convolve functions This patch turned on unit tests for AVX2 convolve functions. Change-Id: I51b8bfdaa290fb22862c68af61abf2394d00d47c	2014-05-27 10:36:56 -07:00
Yaowu Xu	077144d206	Use extreme values for input in convolve tests The interpolation filter functions can be better tested with extreme values, especially given the optimized functions are prone to overflowing signed 16-bit intermediate values when the operation order is wrong. Change-Id: I712142b0bc1e5969c692c0486a57ffa37c9742b5	2014-05-23 13:32:54 -07:00
Dmitry Kovalev	021eaabdb8	Hiding vp9_sub_pel_filters_{8, 8s, 8lp} filters in *.c file. Change-Id: Id401da740b0a0141caaef9e1bcccd981e5cef4a4	2014-05-14 16:21:41 -07:00
Johann	ce23931a3f	Only build neon assembly for armv7 targets Allow selectively building just the intrinsics for armv8 Change-Id: I2f29b2e4508b8b8e5649c2906b3159ad1d4ec477	2014-05-12 08:52:02 -07:00
Dmitry Kovalev	3d4ed278e6	Reusing vp9_get_interp_kernel() function in unit tests. Change-Id: Ic24a371817c9dd5c4035a6fe01111bd9ab63f552	2014-04-21 14:15:35 -07:00
James Zern	002ad40897	test/: remove unnecessary extern "C"s Change-Id: I826655a708010149de231ca31a2e3ba4f1842c0c	2014-01-23 19:42:59 -08:00
Joshua Litt	51490e5654	Removing PARAMS macro for consistency Change-Id: I23ed873a6c47b15491a2ffbcdd4f0fdeef1207a0	2013-11-19 09:28:18 -08:00
Yunqing Wang	3fb728c749	SSE2 8-tap sub-pixel filter optimization To ensure fast encoding/decoding on devices without ssse3 support, SSE2 optimization of sub-pixel filters was done. Test using 1080p clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps with sse2 filters, and ~15fps with c filters. Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c	2013-10-10 14:12:47 -07:00
Parag Salasakar	40edab5e39	mips dsp-ase r2 vp9 decoder convolve module optimizations Change-Id: I401536778e3c68ba2b3ae3955c689d005e1f1d59	2013-10-02 16:58:37 -07:00
Tero Rintaluoma	e326cecf18	Fix intermediate height in convolve_c - Intermediate height was not correct i.e. when block size is 4 and y_step_q4 is 6. In this case intermediate height was (4*6) >> 4 = 1 and vertical interpolation needs two source pixels plus 7 extra pixels for taps. - Also if the current output block is 16x16 and we are using 4x upscaling we need only 12 rows after horizontal filtering instead of 16. Patch Set 2: Intermediate_height updated after CL 66723 "Fix bug in convolution functions (filter selection)" Change-Id: I5a1a1bc2ac9d5edb3a6e0818de618bf318fdd589	2013-08-30 10:31:21 +03:00
Adrian Grange	3f10831308	Fix bug in convolution functions (filter selection) (In response to Issue 604: https://code.google.com/p/webm/issues/detail?id=604) There were bugs in the convolution code for two cases: 1. Where the filter table was assumed to be aligned to a 256 byte boundary. The offset of the pixel in the source buffer was computed incorrectly. 2. Where no such alignment assumption was made. An incorrect address for the filter table base was used. To fix both problems, I now assume that the filter table is 256-byte aligned and modify the pixel offset calculation to match. A later patch should remove the restriction that the filter table is aligned to a 256-byte boundary. There was also a bug in the ConvolveTest unit test (convolve_test.cc). (Bug & initial fix suggestion submitted by Tero Rintaluoma and Sami Pietilä). Change-Id: I71985551e62846e55e40de9e7e3959d4805baa82	2013-08-23 11:16:08 -07:00
Jim Bankoski	c3809f3de5	Begin to restrict x86inc.asm usage Chromium does not support 32bit builds for Mac which use x86inc.asm. Make the files which include it work if 64bit or not PIC enabled starting with vp9_copy_sse2.asm Consolidate these targets in vp9_rtcd_defs.sh Change-Id: If18f0b957a611efd085a3ee7d245cf1eb91e8248	2013-08-05 12:07:30 -07:00
Johann	59dc4e9cdd	vp9_convolve8_neon placeholder Call the individually optimized horizontal and vertical functions. This implementation abuses the temp buffer. This will be replaced with a custom optimized function. Over 2x speedup. Change-Id: I5b908d2a73d264e9810d6022bbff73207a3055dd	2013-07-17 08:39:27 -07:00
Johann	a15bebfc0a	vp9_convolve8_[horiz|vert]_avg Super basic conversion from the other implementations. Any changes to one should be trivial to copy over keep in sync. Change-Id: I1720b4128e0aba4b2779e3761f6494f8a09d3ea8	2013-07-12 16:21:33 -07:00
Johann	158c80cbb0	convolve8 optimizations for neon Independent horizontal and vertical implementations. Requires that blocks be built from 4x4 and [xy]_step_q4 == 16 6-10% improvement. CIF improved the least. Change-Id: I137f5ceae4440adc0960bf88e4453e55a618bcda	2013-07-11 11:08:19 -07:00
Ronald S. Bultje	decead7336	Replace copy_memNxM functions with a generic copy/avg function. Change-Id: I3ce849452ed4f08527de9565a9914d5ee36170aa	2013-07-10 18:27:24 -07:00
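
The intermediate-height arithmetic described in commit e326cecf18 can be illustrated with a small sketch. This is not the code from that commit: the function names, the `y0_q4` starting-phase parameter, and the two macros are assumptions chosen for illustration. It only shows why the naive estimate `(h * y_step_q4) >> 4` undercounts the rows that an 8-tap vertical filter reads.

```c
#include <assert.h>

/* Assumed constants for this sketch: positions are tracked in 1/16-pixel
 * (q4) units and the interpolation kernel has 8 taps. */
#define SKETCH_SUBPEL_BITS 4
#define SKETCH_FILTER_TAPS 8

/* The naive estimate quoted in the commit message. */
static int naive_intermediate_height(int h, int y_step_q4) {
  return (h * y_step_q4) >> SKETCH_SUBPEL_BITS;
}

/* Rows of horizontally filtered data needed to produce h output rows.
 * y0_q4 is a hypothetical starting sub-pixel phase in [0, 15]. */
static int intermediate_height(int h, int y_step_q4, int y0_q4) {
  assert(h > 0 && y_step_q4 > 0 && y0_q4 >= 0 && y0_q4 < 16);
  /* Integer source row on which the last output row is centered. */
  const int last_row = ((h - 1) * y_step_q4 + y0_q4) >> SKETCH_SUBPEL_BITS;
  /* Rows 0..last_row, plus (taps - 1) extra rows for the filter support. */
  return last_row + 1 + (SKETCH_FILTER_TAPS - 1);
}
```

Under these assumptions, a block of height 4 with `y_step_q4 == 6` and phase 0 gives a naive estimate of 1 row, while the sketch gives 9 (two source rows plus 7 for the taps). A block of height 16 with `y_step_q4 == 4` (4x upscaling) and the worst-case phase of 15 gives 12 rows, in line with the figures in the commit message.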

61 Commits