generic-library/vpx

Author	SHA1	Message	Date
Jingning Han	d9892e846f	Merge "Refactor choose_partitioning computing scheme"	2014-12-11 11:14:07 -08:00
Jingning Han	377d2f027a	Refactor choose_partitioning computing scheme This commit refactors the choose_partitioning function. It removes redundant memset calls and makes the encoder to calculate variance value per block only when it is needed. It reduces the average runtime cost of choose_partitioning by 60%. Overall it reduces speed -6 runtime by 2-5%. Change-Id: I951922c50d901d0fff77a3bafc45992179bacef9	2014-12-11 09:33:40 -08:00
Johann	26a0721268	Enable neon idct tests for intrinsics Change-Id: I45d4a22f3ecb9af172e37c95f168805e492c5493	2014-12-10 18:20:04 -08:00
James Yu	3f7c12dab9	VP9 common for ARMv8 by using NEON intrinsics 18 Add vp9_idct32x32_add_neon.c - vp9_idct32x32_1024_add_neon Change-Id: Ic598b772c28bd3487a8ead7a4598a66b25f9b00f Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 18:20:04 -08:00
James Yu	3cfed4bf76	VP9 common for ARMv8 by using NEON intrinsics 14 Add vp9_idct16x16_add_neon.c - vp9_idct16x16_256_add_neon_pass1 - vp9_idct16x16_256_add_neon_pass2 - vp9_idct16x16_10_add_neon_pass1 - vp9_idct16x16_10_add_neon_pass2 Change-Id: I54d25b54a36f4371760f54e4036693aaea40a5de Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 18:19:54 -08:00
James Yu	ce76aeb00d	VP9 common for ARMv8 by using NEON intrinsics 13 Add vp9_idct8x8_add_neon.c - vp9_idct8x8_64_add_neon - vp9_idct8x8_10_add_neon Change-Id: I6ee7b4496765aa36ed52990f2ef73e9f24459610 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 14:56:54 -08:00
James Yu	8c25f4af6a	VP9 common for ARMv8 by using NEON intrinsics 12 Add vp9_idct4x4_add_neon.c - vp9_idct4x4_16_add_neon Change-Id: I011a96b10f1992dbd52246019ce05bae7ca8ea4f Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 14:49:59 -08:00
James Yu	420f58f2d2	VP9 common for ARMv8 by using NEON intrinsics 11 Add vp9_idct16x16_1_add_neon.c - vp9_idct16x16_1_add_neon Change-Id: I7c6524024ad4cb4e66aa38f1c887e733503c39df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 13:06:58 -08:00
James Yu	030ca4d0e5	VP9 common for ARMv8 by using NEON intrinsics 10 Add vp9_idct32x32_1_add_neon.c - vp9_idct32x32_1_add_neon Change-Id: If9ffe9a857228f5c67f61dc2b428b40965816eda Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 13:04:29 -08:00
James Yu	2772b45ac0	VP9 common for ARMv8 by using NEON intrinsics 09 Add vp9_idct8x8_1_add_neon.c - vp9_idct8x8_1_add_neon Change-Id: I9d23e01fa96013febbf64db6c76c6c955f14e3ff Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 12:52:33 -08:00
James Yu	9114f0afdb	VP9 common for ARMv8 by using NEON intrinsics 08 Add vp9_idct4x4_1_add_neon.c - vp9_idct4x4_1_add_neon Change-Id: Ieab9af107dbd07a4f9503bc945890c90faccb8ac Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 12:49:28 -08:00
Johann	2d8f581330	Merge "VP9 common for ARMv8 by using NEON intrinsics 07"	2014-12-10 11:40:46 -08:00
Johann	913d0adbaf	Merge "VP9 common for ARMv8 by using NEON intrinsics 04"	2014-12-10 11:40:29 -08:00
Paul Wilkins	65cfb808d0	Merge "Substantial restructuring of AQ mode 2."	2014-12-10 10:44:27 -08:00
Jingning Han	ad19724f1a	Merge "Use use_prev_frame_mvs flag for ref mv search branch"	2014-12-10 09:25:12 -08:00
Jingning Han	6fc289b9c0	Merge "Refactor update_state_rt"	2014-12-10 09:25:05 -08:00
Jingning Han	8bd88a3c83	Merge "Make RTC coding flow support sub8x8 in key frame coding"	2014-12-10 09:24:56 -08:00
Jingning Han	4cda7a1a9a	Merge "Cosmetic naming change"	2014-12-10 09:05:34 -08:00
Jingning Han	fb3cc0ed57	Merge "Take out redundant setting of mode_info from set_block_size"	2014-12-10 09:05:26 -08:00
Jingning Han	161f636809	Merge "Remove unused rd cost calculation from nonrd_use_partition"	2014-12-10 09:05:18 -08:00
Jim Bankoski	d01c7e3544	Merge changes I92251a8b,I5d23a685 * changes: Adds a decode perf test that builds a new file. Make the decoder Cfg available to encoder tests..	2014-12-10 06:42:08 -08:00
James Yu	01fc6f51e0	VP9 common for ARMv8 by using NEON intrinsics 07 Add vp9_convolve8_neon.c - vp9_convolve8_horiz_neon - vp9_convolve8_vert_neon Change-Id: I0bdd99ff72d275223fe211ac7243c25a5a60cf87 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	893534a996	VP9 common for ARMv8 by using NEON intrinsics 04 Add vp9_convolve8_avg_neon.c - vp9_convolve8_avg_horiz_neon - vp9_convolve8_avg_vert_neon Change-Id: I617971e37b02186fec5aca181f4f9622050ea2df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	d12757f5c6	VP9 common for ARMv8 by using NEON intrinsics 03 Add vp9_copy_neon.c - vp9_convolve_copy_neon Change-Id: I291fc5423d06240876411bbceab03eae5ef585be Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:02:46 -08:00
Scott LaVarnway	617382a2e3	VP9 common for ARMv8 by using NEON intrinsics 02 Add vp9_avg_neon.c - vp9_convolve_avg_neon Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 19:00:21 -08:00
James Zern	10252275f8	Merge "Fix clang ioc warning due to NULL src_mi pointer."	2014-12-09 18:31:46 -08:00
Jingning Han	0cac834b5a	Use use_prev_frame_mvs flag for ref mv search branch Replace error_resilient flag with use_prev_frame_mvs in vp9_pick_inter_mode reference motion vector search selection. This effectively turns off the simplified ref mv search in the settings of frame resizing, even if error-resilient mode is off. Change-Id: I7fed814ee7bc0cb419a03b846e0fc2de46ba7686	2014-12-09 18:18:40 -08:00
Johann	65247d8f59	Merge "Add convolve_copy and convolve_avg to the test"	2014-12-09 16:51:35 -08:00
Jingning Han	e728678c50	Refactor update_state_rt Update the frame motion vector only if previous frame motion vector is needed for next frame reference motion vector. Change-Id: Ica50f9d7b46ad4f815bba0d9e30f5546df29546f	2014-12-09 15:35:49 -08:00
hkuang	4eee74d6ed	Fix clang ioc warning due to NULL src_mi pointer. The warning only happens in VP9 encoder's first pass due to src_mi is not set up yet. But it will not fail the encoder as left_mi and above_mi are not used in the first_pass and they will be set up again in the second pass. Change-Id: I12dffcd5fb1002b2b2dabb083c8726650e4b5f08	2014-12-09 14:32:48 -08:00
Johann	5810f1b4cd	Merge "VP9 common for ARMv8 by using NEON intrinsics 01"	2014-12-09 13:41:49 -08:00
Johann	1c3594c334	Add convolve_copy and convolve_avg to the test Change-Id: Ic9438031282e63e627550f7e4cdeda36e43e647b	2014-12-09 12:56:38 -08:00
Johann	09b7ad42e7	Merge "Disable neon assembly when neon is disabled"	2014-12-09 12:47:12 -08:00
Jim Bankoski	c2638bd80a	Adds a decode perf test that builds a new file. This allows us to track decode speed for new encodes so that we catch problems like an encode change that makes decode really slow. Change-Id: I92251a8b1f710b241f66e1042413df1b71b76038	2014-12-09 12:44:45 -08:00
James Yu	5b098b1825	VP9 common for ARMv8 by using NEON intrinsics 01 Add vp9_loopfilter_neon.c - vp9_lpf_horizontal_4_neon - vp9_lpf_vertical_4_neon - vp9_lpf_horizontal_8_neon - vp9_lpf_vertical_8_neon Change-Id: I97a0d7b399a431c21ee77396be3d5f5a1f7ebccb Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 12:26:56 -08:00
Jingning Han	225cdef665	Make RTC coding flow support sub8x8 in key frame coding This commit enables the use of sub8x8 blocks in RTC key frame encoding. It requires the block size to be preset and will decide the coding mode and encode the bit-stream. Change-Id: I35aaf8ee2d4d6085432410c7963f339f85a2c19b	2014-12-09 11:34:58 -08:00
Johann	e577098079	Disable neon assembly when neon is disabled Change-Id: Idde266cd7287bb6bee016c90efeafa67550f94c6	2014-12-09 10:46:32 -08:00
Jingning Han	4bacaab46d	Cosmetic naming change Rename set_modeinfo_offsets as set_mode_info_offsets, to be more consistent with naming convention. Change-Id: I68ca1f36c4a78127d9439a50c1506a2afd07927d	2014-12-09 10:32:04 -08:00
Jingning Han	f051a7beab	Take out redundant setting of mode_info from set_block_size The later encoding process will take the top-left block's mode_info for pre-determined block size. Change-Id: I76a90f9ce7f3b2dbc2975b52442114e461c465b5	2014-12-09 10:27:18 -08:00
hkuang	3dfdfd5c86	Merge "Clean up the logic of handling corrupted frame."	2014-12-09 10:23:18 -08:00
Paul Wilkins	e68c8dcfd2	Substantial restructuring of AQ mode 2. The restructure moves the decision into the rd pick modes loop and makes a decision based at the 16x16 block level instead of only the 64x64 level. This gives finer granularity and better visual results on the clips I have tested. Metrics results are worse than the old AQ2 especially for PSNR and this mode now falls between AQ0 and AQ1 in terms of visual impact and metrics results. Further tuning of this to follow. It should be noted that if there are multiple iterations of the recode loop the segment for a MB could change in each loop if the previous loop causes a change in the complexity / variance bin of the block. Also where a block gets a delta Q this will alter the rd multiplier for this block in subsequent recode iterations and frames where the segmentation is applied. Change-Id: I20256c125daa14734c16f7cc9aefab656ab808f7	2014-12-09 15:10:52 +00:00
Jingning Han	1395ded2a7	Remove unused rd cost calculation from nonrd_use_partition The per block rd cost calculation is not needed when partition size is preset. Change-Id: Ie5575248bbffb584e908aa13097f697ace6ec747	2014-12-08 18:45:19 -08:00
Johann	547cb14e15	Merge "Extend x32 check by also checking for __x86_64__."	2014-12-08 14:52:31 -08:00
Yunqing Wang	cddbdeabd0	Merge "SSSE3 Optimization for Atom processors using new instruction selection and ordering"	2014-12-08 13:34:54 -08:00
James Zern	c38d0490b3	Merge "Changes to assembler for NASM on mac."	2014-12-08 12:55:06 -08:00
levytamar82	8f9d94ec17	SSSE3 Optimization for Atom processors using new instruction selection and ordering The function vp9_filter_block1d16_h8_ssse3 uses the PSHUFB instruction which has a 3 cycle latency and slows execution when done in blocks of 5 or more on Atom processors. By replacing the PSHUFB instructions with other more efficient single cycle instructions (PUNPCKLBW + PUNPCHBW + PALIGNR) performance can be improved. In the original code, the PSHUBF uses every byte and is consecutively copied. This is done more efficiently by PUNPCKLBW and PUNPCHBW, using PALIGNR to concatenate the intermediate result and then shift right the next consecutive 16 bytes for the final result. For example: filter = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8 Reg = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 REG1 = PUNPCKLBW Reg, Reg = 0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7 REG2 = PUNPCHBW Reg, Reg = 8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15 PALIGNR REG2, REG1, 1 = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8 This optimization improved the function performance by 23% and produced a 3% user level gain on 1080p content on Atom processors. There was no observed performance impact on Core processors (expected). Change-Id: I3cec701158993d95ed23ff04516942b5a4a461c0	2014-12-08 13:11:01 -07:00
hkuang	f925e5ce0f	Merge "Improve the performance by caching the left_mi and right_mi in macroblockd."	2014-12-08 10:24:17 -08:00
Paul Wilkins	127f65531b	Merge "Use average mb energy from first pass in AQ2 test."	2014-12-08 09:01:39 -08:00
Frank Galligan	0f8e8330eb	Merge "Fix potential integer overflow."	2014-12-07 21:37:39 -08:00
Jim Bankoski	be7a285820	Make the decoder Cfg available to encoder tests.. Adds decoder config as a changeable parameter to unit tests, and changes end to end test to use commonly used parameters to enable base test of tiles encoding and frame parallel decoding. Change-Id: I5d23a6857303b4d68b92b15c3f2f04a1bcb4c2bb	2014-12-07 11:28:51 -08:00

1 2 3 4 5 ...

12246 Commits