generic-library/vpx

Author	SHA1	Message	Date
Jingning Han	56a8bc54a6	Properly store the tx_size of selected intra mode Use a temporary variable to store the transform size associated with the best intra mode and restore the mode_info if the overall best mode is intra mode. Change-Id: I2606e0061ad32f91b095462902b1eb734b128eea	2014-12-17 09:25:14 -08:00
Jingning Han	00d2211929	Merge "Remove reset mode_info array per frame"	2014-12-17 09:24:44 -08:00
Jingning Han	cc8a11d8a1	Merge "Set second ref frame to be NONE in key frame coding"	2014-12-17 09:24:39 -08:00
Paul Wilkins	b76312124d	Deleted unused #define FAST_MOTION_MV_THRESH no longer referenced. Change-Id: Idee6ee5a59ba330904c42b20c9ec35b6fc16f7a2	2014-12-17 14:59:22 +00:00
JackyChen	b363cedcd1	Use bit_depth in VP9Common as the flag of highbit. Change-Id: I881aefbe68f9c10bb4629a2a5ee1e42a225d5ab7	2014-12-16 21:45:01 -08:00
James Yu	aeeaa67987	VP9 common for ARMv8 by using NEON intrinsics 15 Re-write - vp9_lpf_horizontal_4_dual_neon in vp9_loopfilter_16_neon.c Change-Id: Ie14f63d352f9564ad01db3939a61d91cf6d21a31 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-16 20:00:26 -08:00
Johann	ebc1951c7c	Merge "Use defines for inline and __builtin_prefetch"	2014-12-16 18:04:04 -08:00
Jingning Han	200d93545e	Merge "Fix intra mode update process in vp9_pick_inter_mode"	2014-12-16 17:04:04 -08:00
JackyChen	9931070094	Add rectangle block support for MFQE. Only for the rectangle blocks larger than 16X16, SAD and Variance are still based on the internal square blocks. Change-Id: I3754da1b0254147313f86a0140dbf4f980f06a5a	2014-12-16 16:35:54 -08:00
Johann	4f7060a431	Merge "VP9 common for ARMv8 by using NEON intrinsics 16"	2014-12-16 16:15:48 -08:00
Jingning Han	ccdc448b70	Remove reset mode_info array per frame The mode_info array was unnecessarily reset to zero every frame when error resilient mode was turned on, given that the mode info values per block will be assigned during mode search stage. This commit removes this reset operation. It reduces the runtime cost on memset operation to 1/3. The overall speed -6 runtime is reduced by 2%. Change-Id: I32ecb73338d8995cc0c5147de09357364f13d45b	2014-12-16 15:54:24 -08:00
Jingning Han	01613aa753	Set second ref frame to be NONE in key frame coding This commit explicitly sets the second reference frame type to be NONE in key frame coding mode. This fixes a subtle dependency of reference motion vector used by next inter frame on mode_info reset before key frame coding. Change-Id: I5ff0359753fdc9992b0bfe889490f7a32d7d5f6a	2014-12-16 15:49:58 -08:00
Johann	2fdbf70d40	Use defines for inline and __builtin_prefetch These were established for compatibility. Make sure to use them. Most frequently they manifest as issues on Visual Studio builds. Change-Id: I39d764d2eb341b999d7a6132cb44b2acfc511160	2014-12-16 15:21:19 -08:00
Frank Galligan	5fdd0f1fe0	Merge "Revert "Revert "Add support for setting byte alignment."""	2014-12-16 15:14:17 -08:00
James Yu	aa8dd897c1	VP9 common for ARMv8 by using NEON intrinsics 16 Add vp9_reconintra_neon.c - vp9_v_predictor_4x4_neon - vp9_v_predictor_8x8_neon - vp9_v_predictor_16x16_neon - vp9_v_predictor_32x32_neon - vp9_h_predictor_4x4_neon - vp9_h_predictor_8x8_neon - vp9_h_predictor_16x16_neon - vp9_h_predictor_32x32_neon - vp9_tm_predictor_4x4_neon - vp9_tm_predictor_8x8_neon - vp9_tm_predictor_16x16_neon - vp9_tm_predictor_32x32_neon Change-Id: Ib5d54a4766a1b5127169045659974f33aa98376d Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-16 12:57:52 -08:00
James Yu	ba05a4c640	VP9 common for ARMv8 by using NEON intrinsics 19 Delete vp9_dc_only_idct_add_neon.c The function was merged with vp9_short_idct4x4_1_add (later vp9_idct4x4_1_add) in `d2de1ca` and should have been deleted then. Change-Id: Ie58ba3dd9dc7330a8f1238dd7dd71c9ed4639b94 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-16 11:14:12 -08:00
JackyChen	603cdcfce5	Merge "Fixed MFQE crash issue for highbit depth."	2014-12-16 11:12:03 -08:00
JackyChen	e7bad92689	Fixed MFQE crash issue for highbit depth. Check the flags, no MFQE for highbit now. Will add highbit support later. Change-Id: I548c27593e0f47ab7f4c92b45f14fb037dc86591	2014-12-16 10:07:38 -08:00
Jingning Han	581c8dbd33	Merge "Initialize best_tx_size with invalid value"	2014-12-16 10:01:03 -08:00
Yaowu Xu	b60ae45f36	Merge "Prevent decoder from using uninitialized entropy context."	2014-12-16 09:30:24 -08:00
Jingning Han	b47f9c5802	Merge "Use right shift to replace division in vp9_pick_inter_mode"	2014-12-16 09:26:51 -08:00
Paul Wilkins	b6c75c5a8d	Improve motion detection for low complexity regions. Where there is very subtle motion, especially when combined with low spatial complexity, the codec sometimes fails to quickly pick up the ambient motion field. Once it has been established though the field propagates well using Nearest and Near MV. This patch looks specifically at the case where the Nearest and Near have not been established as non zero vectors and in this case discounts the cost of searching for a new vector in the rd code. This will almost certainly have some implications in terms of encode speed but it should be possible to mitigate the impact in a subsequent patch using first pass stats and the local spatial complexity. Average results for test sets approximately neutral. Change-Id: I44a29e20f11f7ab10f8c93ffbdc50183d9801524	2014-12-16 17:22:54 +00:00
Debargha Mukherjee	4ebdb4a1b9	Merge "Fix for crash in highbitdepth rt mode"	2014-12-16 06:41:54 -08:00
Jim Bankoski	abc5a66770	Merge "Fix the comments."	2014-12-16 06:25:01 -08:00
Peter de Rivaz	e3d19bfc63	Fix for crash in highbitdepth rt mode Change 72141 introduced a new use of vp9_avg_4x4. This call needs to switch to using vp9_highbd_avg_4x4 when performing high bitdepth encodes. Change-Id: I6a8ba4b62f8a75d0a917b365a55245e2f0438ea1	2014-12-16 10:55:49 +00:00
Jingning Han	df3e3ab6ff	Fix intra mode update process in vp9_pick_inter_mode When multiple intra modes are tested, the previous mode info update process may overwrite the selected best intra mode and make the final selection use an inter mode. This commit fixes this issue by moving the mode_info reset outside the intra mode search loop. Change-Id: I15ed4288a6b3cb0832104a5e6d5d9a25cd1a5b2b	2014-12-15 17:52:09 -08:00
Johann	1d059fa23e	Merge "VP9 common for ARMv8 by using NEON intrinsics 06"	2014-12-15 14:49:33 -08:00
Johann	37ea1e1218	Merge "VP9 common for ARMv8 by using NEON intrinsics 05"	2014-12-15 14:48:53 -08:00
Jingning Han	5c93dca3d3	Merge "Simplify rate-distortion modeling function"	2014-12-15 14:37:19 -08:00
Jingning Han	c2c7596fc7	Initialize best_tx_size with invalid value If vp9_pick_inter_mode works properly, it should at least check one coding mode and hence get best_tx_size assigned a valid value. There is no need to initialize best_tx_size with a legitimate value before starting the mode search. Change-Id: Ic0496cd89672ea9c2c512a9bd1da952190af9cba	2014-12-15 12:58:34 -08:00
Jingning Han	83e2c62aba	Use right shift to replace division in vp9_pick_inter_mode Make the variable reduction_fac log2 based and explicitly use right shift when computing intra_cost_penalty. Change-Id: I208f1fb879a02debb3b3fc64f9fd06260dcf1c86	2014-12-15 12:48:07 -08:00
Frank Galligan	c4f7079ad4	Revert "Revert "Add support for setting byte alignment."" This reverts commit `91471d6aad`. Fixes the compile issues if post_proc is enabled. Change-Id: Ib40a15ce2c194f9b5adfa65a17ab01ddf60f5a59	2014-12-15 12:20:37 -08:00
James Yu	4f856cd7fa	VP9 common for ARMv8 by using NEON intrinsics 06 Add vp9_iht8x8_add_neon.c - vp9_iht8x8_64_add_neon The assembly did not previously implement tx_type 0 BUG=716 Change-Id: Icfc99dd24f3d59047f9184a7d0c761ba7e3de934 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-15 12:18:06 -08:00
James Yu	6b71013277	VP9 common for ARMv8 by using NEON intrinsics 05 Add vp9_iht4x4_add_neon.c - vp9_iht4x4_16_add_neon The assembly did not previously implement tx_type 0 BUG=715 Change-Id: I60034d1568de034edba45c5cdd13f3d87dbc73b6 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-15 12:16:19 -08:00
James Zern	8d558f2ca5	Merge "vp9/MACROBLOCKD: reorder struct members"	2014-12-15 11:54:51 -08:00
Jingning Han	eefe869291	Simplify rate-distortion modeling function Use left shift to replace one multiplication. The computation outcome remains identical. Change-Id: I1e1737af0a245de0d2a2bde10f0c171477199fc1	2014-12-15 11:51:16 -08:00
Paul Wilkins	91471d6aad	Revert "Add support for setting byte alignment." Fails to compile. Bad calls to vp9_alloc_frame_buffer and vp9_realloc_frame_buffer in postproc.c This reverts commit `399823b6f5`. Change-Id: I29f0e173f8e185d3a303cfdb17813e1eccb51e3a	2014-12-15 11:54:13 +00:00
James Zern	c58c579ec4	vp9/MACROBLOCKD: reorder struct members improves locality of reference Change-Id: I0639b98bf38879f918173b3a1b25dd93090e88b4	2014-12-12 18:01:24 -08:00
James Zern	089086bc25	Merge "Optimize bit_read_buffer."	2014-12-12 16:29:42 -08:00
Frank Galligan	9c2601eb68	Merge "Add support for setting byte alignment."	2014-12-12 15:47:11 -08:00
hkuang	3cecce916b	Optimize bit_read_buffer. Change-Id: Iee43c34909deec9787b29c1c33672213b9f049df	2014-12-12 14:38:12 -08:00
James Zern	89ee8923a8	Merge "Remove redundant loads on 1d16_v8 filter."	2014-12-12 14:32:52 -08:00
James Zern	f82d7fd854	Merge "Remove redundant loads on 1d8_v8 filter."	2014-12-12 14:32:26 -08:00
James Zern	4d40a046da	Merge "vp9: move encoder-only member from common"	2014-12-12 14:28:55 -08:00
James Zern	2bf4b4852f	Merge changes Id6421838,I37499329 * changes: vp9: make postproc members depend on CONFIG_VP9_POSTPROC vp9_postproc: remove redundant CONFIG_* checks	2014-12-12 14:27:56 -08:00
Marco	7f59cff53d	Merge "Allow for 4x4 prediction blocks for key frame, speed 6."	2014-12-12 14:27:31 -08:00
James Zern	5ccff43292	Merge "vp9_loopfilter_mmx: remove some unused tables"	2014-12-12 14:25:53 -08:00
Frank Galligan	399823b6f5	Add support for setting byte alignment. Add support for setting byte alignment on the Y, U, and V plane of the reference buffers. The byte alignment must be a power of 2, from 32 to 1024. A value of 0 sets legacy alignment. Change-Id: I7c1399622f7aa68e123646369216b32047dda73d	2014-12-12 13:34:36 -08:00
James Zern	6d1a63a02a	Merge "Remove unnecessary dqcoeff memset."	2014-12-12 12:16:32 -08:00
Frank Galligan	6a24dbd71f	Remove redundant loads on 1d16_v8 filter. This CL showed about a 3% gain in performance on some systems. Change-Id: Id27e7e0b8e69068aa364e67859436da852669250	2014-12-12 11:48:47 -08:00
Frank Galligan	44ee777905	Remove redundant loads on 1d8_v8 filter. This CL showed a modest gain in performance on some systems. Change-Id: Iad636a89a1a9804ab7a0dea302bf2c6a4d1653a4	2014-12-12 11:34:24 -08:00
James Zern	72ece1308b	vp9: move encoder-only member from common allow_comp_inter_inter VP9_COMMON -> VP9_COMP Change-Id: I6d9dc25d1cdd7e2ab62f5be69cd9fa883d21dbb6	2014-12-12 11:17:44 -08:00
James Zern	ef06de33fe	vp9: make postproc members depend on CONFIG_VP9_POSTPROC Change-Id: Id64218386968cee3132269e4a0572650f20fd980	2014-12-12 11:17:17 -08:00
James Zern	890f7bedf3	vp9_postproc: remove redundant CONFIG_* checks the entire module is wrapped in CONFIG_VP9_POSTPROC which is forcibly enabled with CONFIG_INTERNAL_STATS + a similar change in vp9_alloccommon.c Change-Id: I374993297a9fba5bef2f0b71f984eba42f0995a3	2014-12-12 11:17:16 -08:00
James Zern	d456ccbc9d	vp9_loopfilter_mmx: remove some unused tables Change-Id: I964d25cc91c8e4864d73b142d9c7a1b39cb6cfbb	2014-12-12 11:16:24 -08:00
Jim Bankoski	d916b0f22f	Merge "vp9_dx_iface.c uses CONFIG_VP9_POSTPROC but config.h not included"	2014-12-12 11:10:17 -08:00
Jingning Han	3e0793b80b	Merge "Fix PICK_MODE_CONTEXT index in non-RD coding mode"	2014-12-12 09:16:01 -08:00
Jim Bankoski	c67859f737	vp9_dx_iface.c uses CONFIG_VP9_POSTPROC but config.h not included Change-Id: Id316b3786214bf1028992968955da917e3f2d4a3	2014-12-12 08:42:36 -08:00
Jingning Han	e2c2a65695	Fix PICK_MODE_CONTEXT index in non-RD coding mode This commit fixes a bug in the PICK_MODE_CONTEXT index for horizontal partition case. The compression performance change is less than 0.01% level, since most blocks are selected to use square block size in RTC coding mode. Change-Id: I67effc18ae8795fccdd82a55f4efc609fa5cb3e1	2014-12-11 17:21:24 -08:00
JackyChen	3425d6c83e	Merge "Multiframe Quality Enhancement(MFQE) in VP9."	2014-12-11 16:24:08 -08:00
Marco	7e99cd2a9b	Allow for 4x4 prediction blocks for key frame, speed 6. For key frame under variance source partition: 4x4 prediction blocks may be selected when variance of 8x8 block is very high (threshold is set fairly high for now). Testing on some RTC clips shows this helps to reduce some ringing artifacts on key frame. Encoded key frame size increases by about 10%. Key frame PSNR increases by about 0.1-0.2dB. Change-Id: I56e203fac32ea6ef69897fb3ea269c59cb50d174	2014-12-11 15:36:16 -08:00
Jingning Han	811c74cdfa	Merge "Replace division with bit shift in choose_partitioning"	2014-12-11 13:30:03 -08:00
Debargha Mukherjee	dd33c656da	Merge "Corrected optimization of 8x8 DCT code"	2014-12-11 12:28:45 -08:00
hkuang	3c7a06c3cc	Remove unnecessary dqcoeff memset. dqcoeff is set to 0 on initialization and is set back to 0 every time after being used. Change-Id: I32b8e149bba40a8d707849f737a8e49a691f319c	2014-12-11 12:27:25 -08:00
Jingning Han	d9892e846f	Merge "Refactor choose_partitioning computing scheme"	2014-12-11 11:14:07 -08:00
Jingning Han	d5c396a902	Replace division with bit shift in choose_partitioning This commit explicitly uses the bit shift operation instead of division for computing block variance. Change-Id: Id19c0ff27dd1d1ae4aceee6657e1aad0d406bd74	2014-12-11 11:06:57 -08:00
Alexander Voronov	6c6a97814f	Prevent decoder from using uninitialized entropy context. If decoding starts with intra-only frame, there is a possibility of using uninitialized entropy context, which leads to undefined behavior. Change-Id: Icbb64b5b1bd1e5de2a4bfa2884e56bc0a20840af	2014-12-11 20:44:19 +03:00
Peter de Rivaz	5c22224e9e	Corrected optimization of 8x8 DCT code The 8x8 DCT uses a fast version whenever possible. There was a mistake in the checking code which meant sometimes the fast version was used when it was not safe to do so. Change-Id: I154c84c9e2d836764768a11082947ca30f4b5ab7 (cherry picked from commit `fd05fb0c21`)	2014-12-11 09:42:57 -08:00
Jingning Han	377d2f027a	Refactor choose_partitioning computing scheme This commit refactors the choose_partitioning function. It removes redundant memset calls and makes the encoder calculate the variance value per block only when it is needed. It reduces the average runtime cost of choose_partitioning by 60%. Overall it reduces speed -6 runtime by 2-5%. Change-Id: I951922c50d901d0fff77a3bafc45992179bacef9	2014-12-11 09:33:40 -08:00
JackyChen	7ac3e3c1d6	Multiframe Quality Enhancement(MFQE) in VP9. It is the first version of MFQE in VP9. There are a few TODOs included in this version. Usage: Add flag --enable-vp9-postproc to config the project. In decoder, use flag --mfqe in the command line to enable MFQE in postproc. Note: Need to have key frame with low quality to see the effect of this new patch. In my experiment, I fixed the qindex to 200 in key frame. Change-Id: I021f9ce4616ed3574c81e48d968662994b56a396	2014-12-11 09:19:39 -08:00
James Yu	3f7c12dab9	VP9 common for ARMv8 by using NEON intrinsics 18 Add vp9_idct32x32_add_neon.c - vp9_idct32x32_1024_add_neon Change-Id: Ic598b772c28bd3487a8ead7a4598a66b25f9b00f Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 18:20:04 -08:00
James Yu	3cfed4bf76	VP9 common for ARMv8 by using NEON intrinsics 14 Add vp9_idct16x16_add_neon.c - vp9_idct16x16_256_add_neon_pass1 - vp9_idct16x16_256_add_neon_pass2 - vp9_idct16x16_10_add_neon_pass1 - vp9_idct16x16_10_add_neon_pass2 Change-Id: I54d25b54a36f4371760f54e4036693aaea40a5de Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 18:19:54 -08:00
James Yu	ce76aeb00d	VP9 common for ARMv8 by using NEON intrinsics 13 Add vp9_idct8x8_add_neon.c - vp9_idct8x8_64_add_neon - vp9_idct8x8_10_add_neon Change-Id: I6ee7b4496765aa36ed52990f2ef73e9f24459610 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 14:56:54 -08:00
James Yu	8c25f4af6a	VP9 common for ARMv8 by using NEON intrinsics 12 Add vp9_idct4x4_add_neon.c - vp9_idct4x4_16_add_neon Change-Id: I011a96b10f1992dbd52246019ce05bae7ca8ea4f Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 14:49:59 -08:00
James Yu	420f58f2d2	VP9 common for ARMv8 by using NEON intrinsics 11 Add vp9_idct16x16_1_add_neon.c - vp9_idct16x16_1_add_neon Change-Id: I7c6524024ad4cb4e66aa38f1c887e733503c39df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 13:06:58 -08:00
James Yu	030ca4d0e5	VP9 common for ARMv8 by using NEON intrinsics 10 Add vp9_idct32x32_1_add_neon.c - vp9_idct32x32_1_add_neon Change-Id: If9ffe9a857228f5c67f61dc2b428b40965816eda Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 13:04:29 -08:00
James Yu	2772b45ac0	VP9 common for ARMv8 by using NEON intrinsics 09 Add vp9_idct8x8_1_add_neon.c - vp9_idct8x8_1_add_neon Change-Id: I9d23e01fa96013febbf64db6c76c6c955f14e3ff Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 12:52:33 -08:00
James Yu	9114f0afdb	VP9 common for ARMv8 by using NEON intrinsics 08 Add vp9_idct4x4_1_add_neon.c - vp9_idct4x4_1_add_neon Change-Id: Ieab9af107dbd07a4f9503bc945890c90faccb8ac Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-10 12:49:28 -08:00
Johann	2d8f581330	Merge "VP9 common for ARMv8 by using NEON intrinsics 07"	2014-12-10 11:40:46 -08:00
Johann	913d0adbaf	Merge "VP9 common for ARMv8 by using NEON intrinsics 04"	2014-12-10 11:40:29 -08:00
Paul Wilkins	65cfb808d0	Merge "Substantial restructuring of AQ mode 2."	2014-12-10 10:44:27 -08:00
Jingning Han	ad19724f1a	Merge "Use use_prev_frame_mvs flag for ref mv search branch"	2014-12-10 09:25:12 -08:00
Jingning Han	6fc289b9c0	Merge "Refactor update_state_rt"	2014-12-10 09:25:05 -08:00
Jingning Han	8bd88a3c83	Merge "Make RTC coding flow support sub8x8 in key frame coding"	2014-12-10 09:24:56 -08:00
Jingning Han	4cda7a1a9a	Merge "Cosmetic naming change"	2014-12-10 09:05:34 -08:00
Jingning Han	fb3cc0ed57	Merge "Take out redundant setting of mode_info from set_block_size"	2014-12-10 09:05:26 -08:00
Jingning Han	161f636809	Merge "Remove unused rd cost calculation from nonrd_use_partition"	2014-12-10 09:05:18 -08:00
James Yu	01fc6f51e0	VP9 common for ARMv8 by using NEON intrinsics 07 Add vp9_convolve8_neon.c - vp9_convolve8_horiz_neon - vp9_convolve8_vert_neon Change-Id: I0bdd99ff72d275223fe211ac7243c25a5a60cf87 Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	893534a996	VP9 common for ARMv8 by using NEON intrinsics 04 Add vp9_convolve8_avg_neon.c - vp9_convolve8_avg_horiz_neon - vp9_convolve8_avg_vert_neon Change-Id: I617971e37b02186fec5aca181f4f9622050ea2df Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:03:07 -08:00
James Yu	d12757f5c6	VP9 common for ARMv8 by using NEON intrinsics 03 Add vp9_copy_neon.c - vp9_convolve_copy_neon Change-Id: I291fc5423d06240876411bbceab03eae5ef585be Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 20:02:46 -08:00
Scott LaVarnway	617382a2e3	VP9 common for ARMv8 by using NEON intrinsics 02 Add vp9_avg_neon.c - vp9_convolve_avg_neon Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 19:00:21 -08:00
Jingning Han	0cac834b5a	Use use_prev_frame_mvs flag for ref mv search branch Replace error_resilient flag with use_prev_frame_mvs in vp9_pick_inter_mode reference motion vector search selection. This effectively turns off the simplified ref mv search in the settings of frame resizing, even if error-resilient mode is off. Change-Id: I7fed814ee7bc0cb419a03b846e0fc2de46ba7686	2014-12-09 18:18:40 -08:00
Jingning Han	e728678c50	Refactor update_state_rt Update the frame motion vector only if previous frame motion vector is needed for next frame reference motion vector. Change-Id: Ica50f9d7b46ad4f815bba0d9e30f5546df29546f	2014-12-09 15:35:49 -08:00
hkuang	4eee74d6ed	Fix clang ioc warning due to NULL src_mi pointer. The warning only happens in VP9 encoder's first pass due to src_mi is not set up yet. But it will not fail the encoder as left_mi and above_mi are not used in the first_pass and they will be set up again in the second pass. Change-Id: I12dffcd5fb1002b2b2dabb083c8726650e4b5f08	2014-12-09 14:32:48 -08:00
Johann	5810f1b4cd	Merge "VP9 common for ARMv8 by using NEON intrinsics 01"	2014-12-09 13:41:49 -08:00
James Yu	5b098b1825	VP9 common for ARMv8 by using NEON intrinsics 01 Add vp9_loopfilter_neon.c - vp9_lpf_horizontal_4_neon - vp9_lpf_vertical_4_neon - vp9_lpf_horizontal_8_neon - vp9_lpf_vertical_8_neon Change-Id: I97a0d7b399a431c21ee77396be3d5f5a1f7ebccb Signed-off-by: James Yu <james.yu@linaro.org>	2014-12-09 12:26:56 -08:00
Jingning Han	225cdef665	Make RTC coding flow support sub8x8 in key frame coding This commit enables the use of sub8x8 blocks in RTC key frame encoding. It requires the block size to be preset and will decide the coding mode and encode the bit-stream. Change-Id: I35aaf8ee2d4d6085432410c7963f339f85a2c19b	2014-12-09 11:34:58 -08:00
Jingning Han	4bacaab46d	Cosmetic naming change Rename set_modeinfo_offsets as set_mode_info_offsets, to be more consistent with naming convention. Change-Id: I68ca1f36c4a78127d9439a50c1506a2afd07927d	2014-12-09 10:32:04 -08:00
Jingning Han	f051a7beab	Take out redundant setting of mode_info from set_block_size The later encoding process will take the top-left block's mode_info for pre-determined block size. Change-Id: I76a90f9ce7f3b2dbc2975b52442114e461c465b5	2014-12-09 10:27:18 -08:00
hkuang	3dfdfd5c86	Merge "Clean up the logic of handling corrupted frame."	2014-12-09 10:23:18 -08:00
Paul Wilkins	e68c8dcfd2	Substantial restructuring of AQ mode 2. The restructure moves the decision into the rd pick modes loop and makes a decision based at the 16x16 block level instead of only the 64x64 level. This gives finer granularity and better visual results on the clips I have tested. Metrics results are worse than the old AQ2 especially for PSNR and this mode now falls between AQ0 and AQ1 in terms of visual impact and metrics results. Further tuning of this to follow. It should be noted that if there are multiple iterations of the recode loop the segment for a MB could change in each loop if the previous loop causes a change in the complexity / variance bin of the block. Also where a block gets a delta Q this will alter the rd multiplier for this block in subsequent recode iterations and frames where the segmentation is applied. Change-Id: I20256c125daa14734c16f7cc9aefab656ab808f7	2014-12-09 15:10:52 +00:00
Jingning Han	1395ded2a7	Remove unused rd cost calculation from nonrd_use_partition The per block rd cost calculation is not needed when partition size is preset. Change-Id: Ie5575248bbffb584e908aa13097f697ace6ec747	2014-12-08 18:45:19 -08:00
Yunqing Wang	cddbdeabd0	Merge "SSSE3 Optimization for Atom processors using new instruction selection and ordering"	2014-12-08 13:34:54 -08:00
James Zern	c38d0490b3	Merge "Changes to assembler for NASM on mac."	2014-12-08 12:55:06 -08:00
hkuang	81e5cb86d3	Fix the comments. Change-Id: I9789476865a1b24dad54115d8f7edb4fed780b90	2014-12-08 12:44:09 -08:00
hkuang	d05cf10fe7	Add error handling for frame parallel decode and unit test for that. Change-Id: I6e309e11f1641618d2424b7a2c0fe744b8974dec	2014-12-08 12:30:19 -08:00
levytamar82	8f9d94ec17	SSSE3 Optimization for Atom processors using new instruction selection and ordering The function vp9_filter_block1d16_h8_ssse3 uses the PSHUFB instruction which has a 3 cycle latency and slows execution when done in blocks of 5 or more on Atom processors. By replacing the PSHUFB instructions with other more efficient single cycle instructions (PUNPCKLBW + PUNPCKHBW + PALIGNR) performance can be improved. In the original code, the PSHUFB uses every byte and is consecutively copied. This is done more efficiently by PUNPCKLBW and PUNPCKHBW, using PALIGNR to concatenate the intermediate result and then shift right the next consecutive 16 bytes for the final result. For example: filter = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8 Reg = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15 REG1 = PUNPCKLBW Reg, Reg = 0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7 REG2 = PUNPCKHBW Reg, Reg = 8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15 PALIGNR REG2, REG1, 1 = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8 This optimization improved the function performance by 23% and produced a 3% user level gain on 1080p content on Atom processors. There was no observed performance impact on Core processors (expected). Change-Id: I3cec701158993d95ed23ff04516942b5a4a461c0	2014-12-08 13:11:01 -07:00
hkuang	f925e5ce0f	Merge "Improve the performance by caching the left_mi and right_mi in macroblockd."	2014-12-08 10:24:17 -08:00
Paul Wilkins	127f65531b	Merge "Use average mb energy from first pass in AQ2 test."	2014-12-08 09:01:39 -08:00
Frank Galligan	0f8e8330eb	Merge "Fix potential integer overflow."	2014-12-07 21:37:39 -08:00
James Zern	da464c483f	Merge "vp9 asserts: fix compile warning"	2014-12-05 21:09:42 -08:00
James Zern	3db785facc	Merge "vp9: fix frame-parallel encoding"	2014-12-05 19:00:48 -08:00
Deb Mukherjee	0d367474d0	Merge "Some internal-stats, vp9-highbitdepth bug fixes"	2014-12-05 17:49:52 -08:00
James Zern	6db81fd629	vp9: fix frame-parallel encoding the flag in the header wasn't being set based on the encoder configuration in non-intra only mode broken since: `fbc2fbf` Adding oxcf temp variable. Change-Id: Ib4cff9901889824bc4e68d7f0f6deb1e41df2f53	2014-12-05 17:44:46 -08:00
Jingning Han	bd6bfb93b0	Merge "Remove redundant rdcost reset"	2014-12-05 17:35:07 -08:00
Jingning Han	296afb9440	Merge "Fix a motion search skip condition in vp9_pick_inter_mode"	2014-12-05 17:35:04 -08:00
Jingning Han	3d8d1e374e	Merge "Remove redundant MB_MODE_INFO reset from vp9_pick_mode_inter"	2014-12-05 16:59:50 -08:00
hkuang	382f86f945	Improve the performance by caching the left_mi and right_mi in macroblockd. This improves the decode performance by ~2% on Nexus 7 2013. Change-Id: Ie9c4ba0371a149eb7fddc687a6a291c17298d6c3	2014-12-05 16:25:42 -08:00
James Zern	616b3a810f	vp9 asserts: fix compile warning string literal to int within an assert Change-Id: I76a173f96b9add5bf27c3f5ad5d72c6f30e51629	2014-12-05 16:20:42 -08:00
Jingning Han	17bedc54f5	Remove redundant rdcost reset The initial reset of this_rdc in vp9_pick_inter_mode is not needed, since it will be re-assigned when used. Change-Id: Ic0e12d741cbab292fc214c1eabb48b129af7839b	2014-12-05 16:06:17 -08:00
Jingning Han	eadffb2d6e	Fix a motion search skip condition in vp9_pick_inter_mode Compare the current best mode rate-distortion cost with the skip threshold to decide whether to perform motion search. Change-Id: Ia071824f8dd3b7db485f424692a485a2da6a1a9f	2014-12-05 15:58:36 -08:00
Jingning Han	732d57c2b5	Remove redundant MB_MODE_INFO reset from vp9_pick_mode_inter Change-Id: I0222f7abc61202f4a83b117bbfb042ada6304562	2014-12-05 15:51:11 -08:00
hkuang	eaa6deee5b	Merge "Merge set_prev_mi function into encoder function."	2014-12-05 15:12:50 -08:00
Deb Mukherjee	37448d3e1f	Some internal-stats, vp9-highbitdepth bug fixes Change-Id: I0363d98f6f6558a43276aec48f27dca37c93f5ad	2014-12-05 13:40:50 -08:00
Jingning Han	6ae829088f	Merge "Remove redundant vp9_zero in choose_partitioning"	2014-12-05 11:47:58 -08:00
Jingning Han	69a9dc5cd3	Merge "Enable conditional skip path in rd_pick_intra_sby_mode"	2014-12-05 11:25:30 -08:00
Jingning Han	62c7356098	Merge "Use hybrid RD and non-RD coding flow for key frame coding"	2014-12-05 11:25:19 -08:00
Jingning Han	9d88b30854	Remove redundant vp9_zero in choose_partitioning It makes the overall speed -6 about 2% faster with no compression performance change. Change-Id: I680a967b421caa2c5a5cdb821311c4726a2df45a	2014-12-05 10:39:39 -08:00
Jingning Han	74ded4863e	Enable conditional skip path in rd_pick_intra_sby_mode These speed-up features for key frame coding are only turned on in the settings of hybrid non-RD and RD mode decision. It provides about 20% speed-up to the hybrid key frame coding at the expense of certain compression performance loss. For vidyo1, the key frame coding statistics are changed 9838F, 35.020 dB, 61677 us -> 9920F, 34.834 dB, 47556 us Overall rtc set compression performance is down by -0.257%. Change-Id: I0025447fda26bb7855e982955642b5f55d71b51f	2014-12-05 09:36:09 -08:00
Jingning Han	07711e9b27	Use hybrid RD and non-RD coding flow for key frame coding When block size is below 16x16, the encoder swaps from non-RD to RD mode for key frame coding. This largely brought back the key frame compression performance. For vidyo1 at 1000 kbps, the key frame coding statistics are changed 9978F, 34.183 dB, 36807 us -> 9838F, 35.020 dB, 61677 us As compared to the full RD case 7187F, 34.930 dB, 214470 us The overall rtc set coding performance (single key frame setting) is improved by 1.5%. Change-Id: I78a4ecf025d7b24ec911e85be94e01da05e77878	2014-12-05 09:35:27 -08:00
Yunqing Wang	a3a4a34c60	Merge "vp9_ethread: the tile-based multi-threaded encoder"	2014-12-05 08:23:49 -08:00
Frank Galligan	4c4d7261e4	Fix potential integer overflow. ioc found a potential integer overflow in the rate control. This is related to https://code.google.com/p/webm/issues/detail?id=821 Change-Id: Ib6c4acd6e964972f932fce7490592eb134f2b7ea	2014-12-05 08:02:12 -08:00
Paul Wilkins	bb6e47c1c9	Merge "Increase strength of AQ1."	2014-12-05 04:11:43 -08:00
Debargha Mukherjee	15cf55b3ca	Merge "Use the RTC optimizations when in high bitdepth mode."	2014-12-04 19:22:27 -08:00
James Zern	b43c27ab6e	Merge "vp9_reader: reorder struct members"	2014-12-04 16:08:08 -08:00
Debargha Mukherjee	4bfde1071e	Merge "Corrected the renaming of CONFIG_VP9_HIGH to CONFIG_VP9_HIGHBITDEPTH."	2014-12-04 15:52:35 -08:00
Peter de Rivaz	a306bd8274	Use the RTC optimizations when in high bitdepth mode. Change 72193 made the encoder behave differently when configured with and without high bitdepth. This change means the same algorithm is used for both. Change-Id: I707a44a94afca773a9e0c2f7ebeeea83030257c5	2014-12-04 15:48:42 -08:00
hkuang	dde819599b	Clean up the logic of handling corrupted frame. No more checking of corrupted reference frame as we skip decoding any non-intra frame in case of frame corrupted. Change-Id: I77d41bbb02fc5f61972740e2d411441eb6a17073	2014-12-04 15:07:59 -08:00
hkuang	62de07c8c6	Merge set_prev_mi function into encoder function. Change-Id: Ifcf2efbb232ea4cabcdebbe77e0820d121e4a6da	2014-12-04 14:44:23 -08:00
Yunqing Wang	eba9c762a1	vp9_ethread: the tile-based multi-threaded encoder Currently, VP9 supports column-tile encoding, which allows a frame to be encoded in multiple column tiles independently. The number of column tiles is set by encoder option "--tile-columns". This provides a way to encode a frame in parallel. Based on previous set of patches, this patch implemented the tile-based multi-threaded encoder. Each thread processes one or more tiles. Usage: For HD clips: --tile-columns=2 --threads=1/2/3/4 While using 4 threads, tests showed that the encoder achieved 2.3X - 2.5X speedup at good-quality speed 3, and 2X speedup at realtime speed 5. Change-Id: Ied987f8f2618b1283a8643ad255e88341733c9d4	2014-12-04 11:21:34 -08:00
Deb Mukherjee	4f860dba78	Merge "Fixes a missing highbitdepth convolve call bug"	2014-12-04 11:19:59 -08:00
Adrian Grange	9065da983f	Merge "Free motion vector array before re-allocating"	2014-12-04 07:08:37 -08:00
Peter de Rivaz	f610f88be4	Corrected the renaming of CONFIG_VP9_HIGH to CONFIG_VP9_HIGHBITDEPTH. Change 71789 renamed CONFIG_VP9_HIGH to CONFIG_VP9_HIGHBITDEPTH. However, one use of CONFIG_VP9_HIGH was missed. Change-Id: I0ebb9c71380c6d810a25708d15471abf9533e695	2014-12-04 11:01:46 +00:00
Tom Finegan	7339681ee9	Merge "sse2 visual studio build fix"	2014-12-03 18:05:03 -08:00
Deb Mukherjee	70d9dbd818	Fixes a missing highbitdepth convolve call bug Bug was introduced in https://gerrit.chromium.org/gerrit/#/c/72122/ Change-Id: Idb500ea619a30e7bc50e22fb8ee03be5282f41db	2014-12-03 17:48:50 -08:00
Adrian Grange	b56451f488	Merge "Use memset for initialization to 0"	2014-12-03 16:50:39 -08:00
Deb Mukherjee	6615706af2	sse2 visual studio build fix Change-Id: Id8c8c3be882bcd92afea3ccec6ebdf3f208d28ef	2014-12-03 16:35:26 -08:00
Adrian Grange	979ee6e4c9	Free motion vector array before re-allocating Change-Id: I0c39136d67e1e83020d61f86b062a04182ec9b00	2014-12-03 16:07:32 -08:00
Marco	fb20a07c36	Merge "Increase delta-qp for aq=3 mode, after key frame."	2014-12-03 16:03:06 -08:00
Jingning Han	3665f194fa	Merge "Fix indent in source_var_based_partition_search_method"	2014-12-03 15:43:40 -08:00
Adrian Grange	73caef0500	Use memset for initialization to 0 Change-Id: I714ca22b5d51016bf8b035cf457616c707257641	2014-12-03 15:22:02 -08:00
James Zern	d5937cd268	Merge "vp9: sync threads after a longjmp"	2014-12-03 14:30:55 -08:00
Marco	a047e7cdf8	Increase delta-qp for aq=3 mode, after key frame. For a few refresh periods after key frame, use large qp-delta to increase quality ramp-up. Change-Id: Ib5a150fb2dfa6bafd0d4e6b5d28dfd0724b61319	2014-12-03 13:04:45 -08:00
Jingning Han	17176cd452	Fix indent in source_var_based_partition_search_method Change-Id: I6e5e0571d6967b9b992966336715e35bb97f187e	2014-12-03 12:37:36 -08:00
Jingning Han	8f3db5f22e	Merge "Remove unused ONE_LOOP entry from speed feature"	2014-12-03 11:34:42 -08:00
Jingning Han	228ec17ff2	Merge "Rework coeff probability model update for rtc coding"	2014-12-03 11:34:35 -08:00
Marco	8fd3f9a2fb	Enable non-rd mode coding on key frame, for speed 6. For key frame at speed 6: enable the non-rd mode selection in speed setting and use the (non-rd) variance_based partition. Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames), mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16. Loss in key frame quality (~0.6-0.7dB) compared to rd coding, but speeds up key frame encoding by at least 6x. Average PSNR/SSIM metrics over RTC clips go down by ~1-2% for speed 6. Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405	2014-12-03 09:18:08 -08:00
Jingning Han	a8d8c0f633	Remove unused ONE_LOOP entry from speed feature Change-Id: I56ead0ebc2491144c4e79e5859b05e126176702c	2014-12-03 09:17:08 -08:00
Jingning Han	8fe50191c6	Rework coeff probability model update for rtc coding This commit reworks the ONE_LOOP_REDUCED coefficient probability model update process. It allows model update for every coefficient across the spectrum at a coarser resolution, instead of performing precise update only for certain subset of probability models. The overall runtime remains nearly same (<1% change) for speed -6. The compression performance is improved by 7.5% in PSNR for speed -5 and 4.57% for speed -6, respectively. Change-Id: Ifb17136382ee7e39a9f34ff4a4f09a753125c8d1	2014-12-03 09:15:25 -08:00
James Zern	6f7ab01451	vp9: sync threads after a longjmp Synchronize all threads immediately as a subsequent decode call may cause a resize invalidating some allocations. fixes one aspect of crbug.com/437655 Change-Id: Ie993b62c2756478543206ddbe43ec6268d90a470	2014-12-02 16:51:27 -08:00
Debargha Mukherjee	99874f55fb	Merge "Reinsert macro to fix issue 884."	2014-12-02 15:32:24 -08:00
Deb Mukherjee	1fbe0c7615	Merge "Fix a warning related to VPX_EFLAG_FORCE_KF check"	2014-12-02 14:03:55 -08:00
Peter de Rivaz	2c886953d1	Reinsert macro to fix issue 884. Change 72056 unfolded some macro definitions, but lost some alternative behaviour required for high bitdepth encodes. This causes the encoder to crash, see issue 884. Change-Id: I8ce4d73c9fe0a3c10ccb86fba210fabc8b2f0ccc	2014-12-02 13:45:26 -08:00
Deb Mukherjee	02941b0df2	Fix a warning related to VPX_EFLAG_FORCE_KF check Fixes a warning in chrome build. Change-Id: I8fa0fd3e7ba1aecf89e5f79ce94cd64ed6a9567c	2014-12-02 11:35:52 -08:00
Peter de Rivaz	7e40a55ef9	Added high bitdepth sse2 transform functions Also removes some spurious changes in common/vp9_blockd.h which was introduced by a rebase issue between nextgen and master branches. Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282 (cherry picked from commit `005d80cd05`) (cherry picked from commit `08d2f54800`) (cherry picked from commit `4230c2306c`)	2014-12-02 11:16:24 -08:00
Paul Wilkins	00e3626e13	Use average mb energy from first pass in AQ2 test. AQ2 modified to use mb_av_energy in defining variance thresholds used alongside complexity when defining the segment to be used for an SB64. Slight improvements in metrics (ssim and PSNR). Change-Id: Idb9cb73f7d9c4f7118cd7e84ac77b0f25cacbf81	2014-12-02 16:07:30 +00:00
Marco Paniconi	83fd18977f	Cyclic refresh: factor segment delta-q into rate control. Incorporate segment delta-q into estimated bits. This generally improves the rate control under cyclic refresh (aq=3) mode. Change-Id: I1dc60fb230e7d08357fae18909d8ed27bf58e037	2014-12-01 16:56:43 -08:00
Jingning Han	f59cb45e90	Merge "Remove repeated search_type_check_frequency assign"	2014-12-01 14:02:10 -08:00
Yunqing Wang	7af927e324	Merge "vp9_ethread: calculate and save the tok starting address for tiles"	2014-12-01 12:49:03 -08:00
Paul Wilkins	0d3d6e0e31	Increase strength of AQ1. This patch greatly increases the strength of AQ1. Visual tests show strong gains on many clips but there is a big hit on psnr. SSIM is more mixed with some winners and losers. Change-Id: Idaa5d3b41d8576096bfa000b62bc531c3d8bf6a1	2014-11-27 10:53:37 +00:00
Jingning Han	a6df0cbcca	Remove repeated search_type_check_frequency assign This parameter is initialized as 50. No need to re-assign the same value in speed -6. Change-Id: I8735a5593412df2fdcee53ae45c8ebd1c3d792e7	2014-11-25 18:36:41 -08:00
Yunqing Wang	0993bef7e9	vp9_ethread: calculate and save the tok starting address for tiles Each tile's tok starting address is calculated before the encoding process. These addresses are stored so that the same calculation won't be done again in packing bit stream. Change-Id: I0a3be0301f002260c19a850303f2f73ebc47aa50	2014-11-25 17:19:35 -08:00
Yaowu Xu	e4234b3f8b	Separate rate_correction_factor for boosted GFs When the golden frame is boosted, the rate correction factor is not correlated well with other inter frames even in CBR mode. This commit changes to use GF specific rate_correction_factor when gf_cbr_boost is greater than 20%. Change-Id: I6312c1564387bcacc11f4c5e8a9cfdc781b5c3ab	2014-11-25 14:32:07 -08:00
Jingning Han	a04ed98482	Cosmetic change in vp9_pick_inter_mode Change-Id: Ic072585ebffdb36982ed7b8b9f875ca6c1c656c4	2014-11-25 09:42:57 -08:00
Jingning Han	92a7cfc8bf	Adaptively adjust mode test kick-off thresholds in RTC coding This commit allows the encoder to increase the mode test kick-off thresholds if the previous best mode renders all zero quantized coefficients, thereby saving motion search runs when possible. The compression performance of speed -5 and -6 is down by -0.446% and 0.591%, respectively. The runtime of speed -6 is improved by 10% for many test clips. vidyo1, 1000 kbps 16578 b/f, 40.316 dB, 7873 ms -> 16575 b/f, 40.262 dB, 7126 ms nik720p, 1000 kbps 33311 b/f, 38.651 dB, 7263 ms -> 33304 b/f, 38.629 dB, 6865 ms dark720p, 1000 kbps 33331 b/f, 39.718 dB, 13596 ms -> 33324 b/f, 39.651 dB, 12000 ms mmoving, 1000 kbps 33263 b/f, 40.983 dB, 7566 ms -> 33259 b/f, 40.978 dB, 7531 ms Change-Id: I7591617ff113e91125ec32c9b853e257fbc41d90	2014-11-25 09:42:08 -08:00
Jingning Han	30104207fd	Merge "Rework forward txfm/quantization skip system in RTC coding mode"	2014-11-25 09:33:57 -08:00
Jingning Han	6912c44135	Merge "Remove redundant intra mode penalty from vp9_pick_inter_mode"	2014-11-24 22:13:44 -08:00
James Zern	e1f55e0441	vp9_reader: reorder struct members improves locality of reference Change-Id: Ia4d55bb8c98e479528d88303fa35e8c74fbf939d	2014-11-24 22:10:39 -08:00
Yunqing Wang	edbd61e136	vp9_ethread: modify VP9_COMP structure This patch modified struct VP9_COMP. Created a struct ThreadData to include data that need to be copied for each thread. In multiple thread case, one thread processes one tile. all threads share one copy of VP9_COMP, (refer to VP9_COMP cpi in the code) but each thread has its own copy of ThreadData, (refer to ThreadData td in the code). Therefore, within the scope of encode_tiles(), both cpi and td need to be passed as function parameters. In single thread case, the FRAME_COUNTS pointer in ThreadData points to "counts" in VP9_COMMON. Change-Id: Ib37908b2d8e2c0f4f9c18f38017df5ce60e8b13e	2014-11-24 17:57:38 -08:00
Alex Converse	60ef6c0735	Merge "Fix a tautological assert."	2014-11-24 16:36:53 -08:00
Alex Converse	0496d11486	Fix a tautological assert. Change-Id: I90ad08823e1d038384536fa9f458caadc2c87f38	2014-11-24 15:01:01 -08:00
Jingning Han	25be81e2dd	Remove redundant intra mode penalty from vp9_pick_inter_mode The intra mode penalty is covered by intra_cost_penalty. This commit removes the other intra cost threshold, provided that the constant 50 is negligible in normal rate-distortion cost. Change-Id: I9b8b7483c43b9a41741622e7057def1f7d51bb72	2014-11-24 14:55:59 -08:00
Jingning Han	e6fb9c0b0b	Merge "Key frame non-RD mode decision process"	2014-11-24 13:21:56 -08:00
Debargha Mukherjee	e9d9f1adab	Merge "Refactored idct routines and headers"	2014-11-24 12:47:03 -08:00
John Stark	71379b87df	Changes to assembler for NASM on mac. fixes non-Apple nasm part of issue #755 Change-Id: I11955d270c4ee55e3c00e99f568de01b95e7ea9a	2014-11-24 12:00:50 -08:00
Peter de Rivaz	3a8c43a479	Refactored idct routines and headers This change is made in preparation for a subsequent patch which adds acceleration for the highbitdepth transform functions. The highbitdepth transform functions attempt to use 16/32bit sse instructions where possible, but fallback to using the C implementations if potential overflow is detected. For this reason the dct routines are made global so they can be called from the acceleration functions in the subsequent patch. Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665 (cherry picked from commit `454342d4e7`)	2014-11-24 09:57:40 -08:00
Jingning Han	2fbdfd2c66	Key frame non-RD mode decision process This commit makes a non-RD coding mode decision process for key frame coding. It can be optionally turned on in speed -6 and above. Change-Id: I0847258b392877a0210b4768bef88ebc9ad009b5	2014-11-24 09:04:28 -08:00
Marco	681d5e9024	Merge "Only allow for cyclic refresh (aq=3 mode) for base layer."	2014-11-24 07:46:36 -08:00
Paul Wilkins	2232c3e34b	Merge "Fix some minor nits."	2014-11-21 17:39:43 -08:00
Debargha Mukherjee	02355a4abf	Merge "Added highbitdepth sse2 acceleration for quantize"	2014-11-21 16:08:47 -08:00
Paul Wilkins	d28e9ed452	Merge changes Ie077edd0,Id31a74fc * changes: Remove rate component adjustment for AQ1 Switch AQ1 segment basis from q ratio to rate ratio.	2014-11-21 15:38:32 -08:00
Paul Wilkins	771259fe10	Merge "Add adaptive midpoint for AQ1."	2014-11-21 15:26:18 -08:00
Paul Wilkins	6dbf83d082	Merge "Add variance restriction to AQ2."	2014-11-21 15:25:43 -08:00
Marco	53c3f2ca4d	Only allow for cyclic refresh (aq=3 mode) for base layer. Condition existed for temporal case, added it for spatial as well. Issue: https://code.google.com/p/webm/issues/detail?id=878. Change-Id: I38339207f9a94924f5568a081eabe64f867a686d	2014-11-21 14:47:32 -08:00
Paul Wilkins	ea494c0e76	Fix some minor nits. Change-Id: Ib8810d431fa20a2c78e0caaa28eb2c99903e60fb	2014-11-21 14:13:59 -08:00
Paul Wilkins	a867bb538b	Merge "Further AQ1 clean up."	2014-11-21 12:58:03 -08:00
Jingning Han	7428cebe4f	Rework forward txfm/quantization skip system in RTC coding mode This commit allows more aggressive decision to skip forward transform and quantization for luma component in RTC coding mode. The chroma components remains going through the normal coding routine, since they are not included in the non-RD mode search process. It reduces the runtime cost by 2% - 10%. In speed -6, vidyo1 1000 kbps 16576 b/f, 40.281 dB, 8402 ms -> 16576 b/f, 40.323 dB, 7764 ms nik720p 1000 kbps 33337 b/f, 38.622 dB, 7473 ms -> 33299 b/f, 38.660 dB, 7314 ms dark720p 1000 kbps 33330 b/f, 39.785 dB, 13505 ms -> 33325 b/f, 39.714 dB, 13105 ms The compression performance of speed -6 is improved by 0.44% in PSNR and 1.31% in SSIM. Change-Id: Iae9e3738de6255babea734e5897f29118bebc6d7	2014-11-21 12:46:40 -08:00
Paul Wilkins	b87c51ce55	Merge "Initial AQ1 restructuring."	2014-11-21 12:10:03 -08:00
Paul Wilkins	f5209d7e01	Remove rate component adjustment for AQ1 In AQ1 a rate adjustment was applied for blocks coded with a deltaq. This tends to skew the partition selection and cause rate overshoot. For example, consider a 64x64 super block where some but not all sub blocks are in a low q segment and some are in a high q segment. The choice of Q when considering large partition and transform sizes is defined by the lowest sub block segment id (currently this implies the lowest Q). If some parts of the larger partition are very hard this will cause a high rate component. The correct behavior here is for the rd code to discard the large partition choice and break down to sub blocks where some have low and some have high Q. However the rate correction factor above mask the high cost of coding at a larger partition size. Change-Id: Ie077edd0b1b43c094898f481df772ea280b35960	2014-11-21 08:51:58 -08:00
Paul Wilkins	1663eff7f8	Switch AQ1 segment basis from q ratio to rate ratio. In defining the Q deltas for segments in AQ1 use a rate ratio rather than a q ratio. Change-Id: Id31a74fcf2b7e55437e42a51c21b3cbcb57028d4	2014-11-21 08:50:57 -08:00
Paul Wilkins	fc47c5d653	Add adaptive midpoint for AQ1. Make the midpoint variance used in AQ mode 1 segmentation depend on the overall complexity of the frame in two pass. Change-Id: I452814ec57f7a32352e41bb250e78066abe952dd	2014-11-20 18:37:34 -08:00
Alex Converse	bc1b3d8412	Allow DC/H/V/TM on screen content. 6.3% better compression less than 1% compression time increase Change-Id: Ie83c059436e54c09de9e7c87e06e0a6d40dc38fe	2014-11-20 18:04:57 -08:00
Alex Converse	722e9d611b	Drop special inter mode selection for screen content. Better mode selection was implemented for all content. Change-Id: I479778ed21d3968892f4dce396c83733583f4f23	2014-11-20 18:04:57 -08:00
Yunqing Wang	72522dbc86	Merge "vp9_ethread: move filter_cache out of RD_OPT struct"	2014-11-20 16:51:31 -08:00
Paul Wilkins	d031237999	Add variance restriction to AQ2. Add an additional restriction to bit/complexity based segmentation based on spatial variance. Only lower Q when both the number of bits spent in the initial encoding pass and the spatial complexity are below a threshold. This will prevent the low Q segments being used just because there is a surfeit of bits. Small metrics gains especially opsnr. derf ~0.2% std-hd ~0.3% Change-Id: I6a8496d466d673f9b0e2b2ca6304ea7b6d8e1cce	2014-11-20 16:23:35 -08:00
Paul Wilkins	3d1e8c9a85	Further AQ1 clean up. Further patch to restructure AQ mode 1. Change-Id: I566452a033d047a49a40441a7be24690ea69412d	2014-11-20 16:00:51 -08:00
Paul Wilkins	6a760d483d	Initial AQ1 restructuring. This is the first of a series of patches to restructure and improve AQ mode 1 (variance based AQ). Change-Id: Idcf693131a3ea2459dcfd957a54a65b971fa4a2a	2014-11-20 15:50:15 -08:00
Paul Wilkins	b74eeb8675	Merge "Fix bug in calculating number of mbs with scaling."	2014-11-20 15:45:41 -08:00
Yunqing Wang	54ba65a63e	Merge "vp9_ethread: move max/min partition size to mb struct"	2014-11-20 14:00:37 -08:00
Yunqing Wang	379334c2d8	vp9_ethread: move filter_cache out of RD_OPT struct Similar to mask_filter, the filter_cache in RD_OPT struct can be moved out, and declared as a local variable since it is only used in pick_inter_mode functions. Change-Id: I412b99cca82bade07ac912064ec03dd1de6b2c17	2014-11-20 13:44:16 -08:00
Yunqing Wang	0b71fdbf80	Merge "vp9_ethread: change mask_filter to a local variable"	2014-11-20 13:02:55 -08:00
Yunqing Wang	bdaa3eaf43	Merge "Revert "vp9_ethread: include a pointer to mb in VP9_COMP""	2014-11-20 12:27:34 -08:00
Paul Wilkins	5e5da2e963	Fix bug in calculating number of mbs with scaling. Correct calculation of number of mbs in two pass code when frame resizing is enabled. Always use initial number of mbs if scaling is enabled, as this is what was used in the first pass. Change-Id: I49a4280ab5a8b1000efcc157a449a081cbb6d410	2014-11-20 12:24:43 -08:00
Yunqing Wang	b0efddd8e6	vp9_ethread: change mask_filter to a local variable The mask_filter in RD_OPT struct is used to record rd result in filter decision. It is only used in pick_inter_mode functions, and is removed from the struct and declared as a local variable. Change-Id: I3c95c8632ba7241591ce00ef2ef5677b5e297d7b	2014-11-20 09:41:49 -08:00
Yunqing Wang	ad7586a9e1	vp9_ethread: move max/min partition size to mb struct The max_partition_size and min_partition_size are set at the beginning while setting speed features, and then adjusted at SB level. Moving them to mb struct ensures there is a local copy for each thread. Change-Id: I7dd08dc918d9f772fcd718bbd6533e0787720ad4	2014-11-20 09:24:50 -08:00
Yunqing Wang	70c9d2983b	Revert "vp9_ethread: include a pointer to mb in VP9_COMP" This reverts commit `6906d218dd`. Another way will be used to handle mb struct. Change-Id: Ic1111a46b2b1ee00f8f9e3fcd4cf3eb6030b2dc4	2014-11-20 08:31:12 -08:00
Peter de Rivaz	a7b2d09f36	Added highbitdepth sse2 acceleration for quantize Also includes block error. (This patch is mostly cherry picked from commit `db7192e0b0`) Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78	2014-11-19 23:55:19 -08:00
Jingning Han	c42715b721	Enable ssse3 version of vp9_fdct8x8_quant It improves the speed performance of vp9_fdct8x8_quant_sse2 by about 5%. Change-Id: I74b093ba4d81df64caf71ac7693f3d917f673097	2014-11-19 22:14:19 -08:00
Yaowu Xu	21db24efcb	Add a reset to rc tracking for dropped frames VP9/DatarateTestVP9Large.ChangingDropFrameThresh/[34] fails post the merge of commit#ffa06b37. This commit adds reset of rc tracking info when frame is dropped, and fixes the causes of the bad interaction between the tests and the previous commit. Change-Id: I848acfd9fcb336359662274325190f94aac76eae	2014-11-19 15:32:11 -08:00
Jingning Han	bf63652d34	Merge "Combine fdct8x8 and quantization process"	2014-11-19 11:17:44 -08:00
Jingning Han	ce77a7bcb0	Merge "Add sse2 version for vp9_quantize_fp"	2014-11-19 11:17:36 -08:00
Jingning Han	c6908fd5f7	Combine fdct8x8 and quantization process This commit reworks the forward transform and quantization process for 8x8 block coding. It combines the two operations in a single function to save a store/load stage of the original transform coefficients. Overall the speed -6 is slightly faster (around 1% range). The compression performance of speed -6 is improved by 3.4%. Change-Id: Id6628daef123f3e4649248735ec2ad7423629387	2014-11-18 18:10:56 -08:00
Yaowu Xu	587f0cd39d	Merge "Prevent severe rate control errors in CBR mode"	2014-11-18 16:56:06 -08:00
Marco	3c715863da	Merge "Modify active_worst_quality setting for one pass CBR."	2014-11-18 11:22:18 -08:00
Yaowu Xu	550707b9e1	Merge "change to call vp9_refining_search_sad() directly"	2014-11-18 09:14:02 -08:00
Yaowu Xu	ffa06b3708	Prevent severe rate control errors in CBR mode In rare cases, the interaction between rate correction factor and Q choices may cause severe oscillating frame sizes that are way off target bandwidth. This commit adds tracking of rate control results for last two frames, and use the information to prevent oscillating Q choices. Change-Id: I9a6d125a15652b9bcac0e1fec6d7a1aedc4ed97e	2014-11-18 09:05:57 -08:00
Jingning Han	2d3cc8ea2b	Add sse2 version for vp9_quantize_fp vp9_quantize_fp is the quantization process used by rtc coding mode. This commit adds a sse2 implementation of it. The implementation is modified based on vp9_quantize_b_sse2. No speed difference from ssse3 version. Change-Id: I24949c5b27df160b4f35117d28858d269454e64a	2014-11-18 09:01:41 -08:00
Jingning Han	13a999de8e	Merge "Add empty pointer check to pred buffering in rtc coding mode"	2014-11-17 17:40:54 -08:00
Marco	b660f723b4	Modify active_worst_quality setting for one pass CBR. Current setting had active_worst_quality set too high (close to worst_quality) for first frame(s) following first key frame. This changes that to be somewhat more aggressive in allowing active_worst_quality to be lower following key frame. Also remove the 4/5 reduction in active_worst for key frame as this should be set by the user qp_max setting. Change-Id: I0530b3ddcc85c00e3eb7568de1b14a31206c4a4c	2014-11-17 11:46:49 -08:00
Yaowu Xu	1687c47bfd	change to call vp9_refining_search_sad() directly The function pointer in compressor instance does not change, so this commit changes to call the function directly. Change-Id: I9c9c460e3475711c384b74c9842f0b4f3d037cc5	2014-11-17 11:30:17 -08:00
Jingning Han	a62c87fb04	Add empty pointer check to pred buffering in rtc coding mode This commit adds a check condition to the prediction buffering operation used in the rtc coding mode. This resolves a unit test warning in example/vpx_tsvc_encoder_vp9_mode_7. Change-Id: I9fd50d5956948b73b53bd8fc5a16ee66aff61995	2014-11-17 11:24:07 -08:00
Yunqing Wang	4539c496bc	Merge "Code cleanup: remove unused members in RD_OPT"	2014-11-17 09:10:28 -08:00
Yunqing Wang	3f4a93baf2	Merge "vp9_ethread: combine encoder counts in separate struct"	2014-11-17 08:57:38 -08:00
Debargha Mukherjee	c3a9056df4	Merge "Added sse2 acceleration for highbitdepth variance"	2014-11-14 21:11:27 -08:00
Yunqing Wang	87ae6d73d4	Code cleanup: remove unused members in RD_OPT These 2 members in RD_OPT were moved to TileDataEnc struct already, and therefore were removed here. Change-Id: I22fee3b67f96e473a58e194a7edc76dbd48bfa04	2014-11-14 16:33:25 -08:00
Yunqing Wang	d0b547c676	vp9_ethread: combine encoder counts in separate struct Several frame counters in encoder are updated at SB level. Combine those counters and put them in a separate struct, which allows us to allocate one copy for each thread. Change-Id: I00366296a13c0ada4d8fa12f5e07728388b6cab7	2014-11-14 16:09:22 -08:00
Peter de Rivaz	48032bfcdb	Added sse2 acceleration for highbitdepth variance Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f (cherry picked from commit `d7422b2b1e`) (cherry picked from commit `6d741e4d76`)	2014-11-14 15:18:53 -08:00
Yunqing Wang	6906d218dd	vp9_ethread: include a pointer to mb in VP9_COMP Modified VP9_COMP struct to include MACROBLOCK *mb. This change makes it feasible in multi-thread case to allocate a mb for each thread. Change-Id: I624d6d1aa9c132362200753e5d90b581b1738d6e	2014-11-14 12:31:06 -08:00
hkuang	a9a20a1040	Fix a bug in frame parallel decode and add a unit test for that. A flush bug is discovered during putting frame parallel decoder into Android. This test will expose that bug. Change-Id: Ia047f27972f4da0471649f79f1f91e7695297473	2014-11-14 10:16:34 -08:00
Yunqing Wang	807885b5e0	Merge "vp9_ethread: modify the cyclic refresh struct"	2014-11-13 18:35:01 -08:00
Yaowu Xu	e4e85ad4a8	Merge "adapt the adjustment limit for rate correction factor in RTC mode"	2014-11-13 15:50:30 -08:00
Yunqing Wang	8ee605f188	vp9_ethread: modify the cyclic refresh struct Two members in struct CYCLIC_REFRESH int64_t projected_rate_sb; int64_t projected_dist_sb; are updated at the superblock level, which makes them shared data in the multi-thread situation, and requires extra work to handle them. However, those values are updated and used immediately, and therefore can be removed. This patch cleaned up the code and removed the two members. Change-Id: I2c6ee4552bf49fb63ce590cdb47f9723974fffb1	2014-11-13 15:05:46 -08:00
Adrian Grange	35de9db312	Merge "Prepare for dynamic frame resizing in the recode loop"	2014-11-13 15:01:49 -08:00
Paul Wilkins	20517e0627	Merge "Fix 32 bit build emms problem."	2014-11-13 15:00:41 -08:00
Jingning Han	aeff1f7ec2	Merge "Use reconstructed pixels for intra prediction"	2014-11-13 13:59:02 -08:00
Jingning Han	6efafda738	Merge "Refactor nonrd_use_partition coding process"	2014-11-13 13:58:21 -08:00
Adrian Grange	0d085ebc0a	Prepare for dynamic frame resizing in the recode loop Prepare for the introduction of frame-size change logic into the recode loop. Separated the speed dependent features into separate static and dynamic parts, the latter being those features that are dependent on the frame size. Change-Id: Ia693e28c5cf069a1a7bf12e49ecf83e440e1d313	2014-11-13 11:41:20 -08:00
Paul Wilkins	b9c4f9a7db	Fix 32 bit build emms problem. Add extra vp9_clear_system_state() calls to fix double / mmx issue introduced into first pass code for 32 bit builds. Change-Id: I84cd2986b80d83650a091ab25c43755efeb82e03	2014-11-13 11:33:55 -08:00
Yaowu Xu	9f79259e54	adapt the adjustment limit for rate correction factor in RTC mode Rate correction factor is used to correct the estimated rate for any given quantizer, and feeds into rate control for quantizer selection. We make use of the actual bits used to calculate this rate correction factor with an adjustment limit to prevent over-adjustment. This commit adapts the adjustment limit to the difference between the estimated bits and the actual bits, allows the adjustment limit to vary between 0.125 (when estimate is close to actual) and 0.625 (when there is >10X factor off between estimated and actual bits). By doing this, the commit appears to have largely corrected two observed issues: 1. Adjustment is too slow when the actual bits used is way off from estimate due to the small adjustment limit. 2. Extreme oscillating quantizer choices due to the feedback loop. Change-Id: I4ee148d2c9d26d173b6c48011313ddb07ce2d7d6	2014-11-13 11:26:52 -08:00
Deb Mukherjee	7621a19aa5	Merge "Vidyo: Turn off keyframes in higher spatial layers"	2014-11-13 03:27:11 -08:00
Debargha Mukherjee	002172efd6	Merge "Added highbitdepth sse2 SAD acceleration and tests"	2014-11-12 21:20:34 -08:00
Peter de Rivaz	7eee487c00	Added highbitdepth sse2 SAD acceleration and tests Change-Id: I1a74a1b032b198793ef9cc526327987f7799125f (cherry picked from commit `b1a6f6b9cb`)	2014-11-12 14:25:45 -08:00
Yaowu Xu	8e112d9586	Merge "Use normal rate_correction_factor for gf in CBR mode"	2014-11-12 08:00:26 -08:00
Deb Mukherjee	48a7627316	Vidyo: Turn off keyframes in higher spatial layers Change-Id: Icdd5e71cd6a2b59bc4b3b972af9e4d4a36821792	2014-11-11 16:09:07 -08:00
Deb Mukherjee	c7a905ca3d	Merge "Vidyo: Support for one-pass rc-enabled SVC encoder"	2014-11-11 16:03:11 -08:00
Jingning Han	e717d22b63	Use reconstructed pixels for intra prediction This commit makes the speed -6 and above use the reconstructed boundary pixels for precise intra prediction. This allows more intra prediction modes to be tested in the non-RD coding process. Enabling horizontal and vertical intra prediction modes can improve the speed -6 compression performance for rtc set by 0.331%. Change-Id: I3a99f9d12c6af54de2bdbf28c76eab8e0905f744	2014-11-11 10:04:43 -08:00
Paul Wilkins	4999505472	Merge "AQ1 - remove first pass weights."	2014-11-11 09:17:33 -08:00
Yaowu Xu	f2b978e895	Use normal rate_correction_factor for gf in CBR mode I0c5f010 changed to allow update golden reference buffer in CBR mode, this commit changes the use of rate_correction_factor for those frames to be aligned with the new usage. This commit attempts to solve two issues: a. Initialization of rate correction factor for Golden Frame Prior to this patch, even when the regular inter frames had been updating the rate correction factor based on content and encoding results, the first golden frame would still use the initialized value, which can be way off. b. Allowing rate correction factor update to be slightly faster Prior to this patch, when the rate correction factor is off, the update to the factor is too slow, and the factor could not get close to a semi-correct value even after many frames. The commit helps all clips in psnr/ssim metric, but especially a few clips in the RTC set where rate correction was way off. For example thaloundeskmtgvga gained about .5dB for both overall/average psnr. Change-Id: I0be5c41691be57891d824505348b64be87fa3545	2014-11-10 16:55:13 -08:00
Deb Mukherjee	0ba1542f12	Vidyo: Support for one-pass rc-enabled SVC encoder Adds support for one-pass rc-enabled SVC encoder with callbacks for getting per-layer packets. - the callback function registration is implemented as an encoder control function. - if the callback function is not registered, the old way of aggregating packets with superframe will take effect. - one more control function “VP9E_GET_SVC_LAYER_ID” has been implemented to get the temporal/spatial id from the encoder within the callback. This can be used to get the ids to put on RTP packet. Change-Id: I1a90e00135dde65da128b758e6c00b57299a111a	2014-11-10 16:08:58 -08:00
Deb Mukherjee	130c6d7455	Merge "Iadst transforms to use internal low precision"	2014-11-10 15:39:46 -08:00
Deb Mukherjee	cc57c5e4af	Iadst transforms to use internal low precision Change-Id: I266777d40c300bc53b45b205144520b85b0d6e58 (cherry picked from commit `a1b726117f`)	2014-11-07 14:19:45 -08:00
Alex Converse	ce9ba97a9d	Fix LAST SKIP when considering GOLDEN Change-Id: I39d9f13fa34984ee9dad0c4f303ef672635f420e	2014-11-07 13:44:17 -08:00
Paul Wilkins	08d86bc904	Merge "Add intra complexity and brightness weight to first pass."	2014-11-07 09:22:12 -08:00
Yaowu Xu	98492c1091	Merge "Change the use of a reserved color space entry"	2014-11-07 06:24:59 -08:00
Paul Wilkins	31b6d7c1eb	AQ1 - remove first pass weights. Removed redundant weighting function tied for AQ1 from first pass code. Improvement in baseline AQ1 results:- Derf opsnr +0.142% SSIM +0.258% YT opsnr +0.173% SSIM +0.3% Change-Id: I16ef91caf2d7f302cd5940cc5e2626d48ebcb212	2014-11-07 14:11:29 +00:00
Yaowu Xu	af3519a385	Change the use of a reserved color space entry This commit rename a reserved color space entry to BT_2020, it intends to provide support for VP9 bitstream to pass along the color space type defined in BT.2020(Rec.2020) please note this entry does not have any effect on encoding/decoding behavior, but allow applications to the pass the information along from encoding end to decoding end. Change-Id: I4678520e89141ea5e8900f7bd1c0e95b710b7091	2014-11-06 19:14:21 -08:00
Jingning Han	754b05a4de	Refactor nonrd_use_partition coding process This commit integrates the non-RD mode decision process and the encoding process into a single recursion scheme. Change-Id: I6a7e72a0b84d567554801ebbe01ec75d54c1f77d	2014-11-06 17:00:48 -08:00
Yunqing Wang	bf44117d5f	Merge "Modify the frame context memory deallocation"	2014-11-06 13:08:57 -08:00
Jingning Han	417e754f56	Merge "Remove unused is_background function"	2014-11-06 12:03:15 -08:00
Jingning Han	e97f404e52	Merge "Rework cut-off decisions in cyclic refresh aq mode"	2014-11-06 12:03:07 -08:00
Yunqing Wang	1228433430	Modify the frame context memory deallocation This patch was to fix the vpxdec fuzzing3 test failure. When an error occurs, setjmp() is invoked, which calls the decoder removing routine. In multiple thread situation, other threads could try to access the frame context memory that is already deallocated, thus causing a segfault. An invalid unit test was added for this issue. Change-Id: Ida7442154f3d89759483f0f4fe0324041fffb952	2014-11-06 11:34:19 -08:00
Paul Wilkins	5e935126a6	Add intra complexity and brightness weight to first pass. The aim of this patch is to apply a positive weighting to frames that have a significant number of blocks that are of low spatial complexity and are dark. The rationale behind this is that artifacts tend to be more visible in such frames. In this patch the weight is only applied in regard to the distribution of bits between frames. Hence if all the frames share similar characteristics (as is the case for most of our short test clips) there will be little or no net effect. However, the effect can be seen on some longer form test content. For example Tears of steel baseline test: 2323.09 Kbit/s opsnr 39.915 ssim 74.729 With this patch:- 2213.34 Kbit/s opsnr 39.963 ssim 74.808 (Slightly better metrics and about 5% smaller) The weighting may well need some further tuning alongside changes to the aq modes. Change-Id: Ieced379bca03938166ab87b2b97f55d94948904c	2014-11-06 10:45:00 +00:00
Jingning Han	10da059b52	Remove unused is_background function Change-Id: Ia540eac5f066ae95280c2f898370eddf0110c279	2014-11-05 21:19:23 -08:00
Jingning Han	caaf63b2c4	Rework cut-off decisions in cyclic refresh aq mode This commit removes the cyclic aq mode dependency on in_static_area and reworks the corresponding cut-off thresholds. It improves the compression performance of speed -5 by 1.47% in PSNR and 2.07% in SSIM, and the compression performance of speed -6 by 3.10% in PSNR and 5.25% in SSIM. Speed wise, about 1% faster in both settings at high bit-rates. Change-Id: I1ffc775afdc047964448d9dff5751491ba4ff4a9	2014-11-05 21:17:09 -08:00
hkuang	e8860693ea	Merge "Totally remove prev_mi in VP9 decoder."	2014-11-05 17:48:47 -08:00
hkuang	4cc7c5a17f	Totally remove prev_mi in VP9 decoder. This will save memory and improve the decode speed by removing the unnecessary memset of the big prev_mi array for all the key frames. Decoding an all-key-frame 1080p video shows a speed improvement of around 2%. Change-Id: I6284a445c1291056e3c15135c3c20d502f791c10	2014-11-05 16:14:30 -08:00
Yaowu Xu	2c4fee17bc	Fix visual studio 2013 compiler warnings when configured with --enable-vp9-highbitdepth Change-Id: I2b181519d7192f8d7a241ad5760c3578255f24e6	2014-11-05 13:47:28 -08:00
Hui Su	2c95a3f374	Merge "Simplify interface of write_selected_tx_size and read_tx_size"	2014-11-05 13:33:09 -08:00
Jingning Han	a7889cac9a	Merge "Skip ref frame mode search conditioned on predicted mv residuals"	2014-11-05 12:04:10 -08:00
Hui Su	709c634b84	Simplify interface of write_selected_tx_size and read_tx_size Change-Id: Ia2b2a895deefaaf7b34bf26df86add56dbab082c	2014-11-04 16:11:50 -08:00
Minghai Shang	9f9e30d7bf	Merge "[spatial svc] Make spatial svc working for one pass rate control"	2014-11-04 15:57:16 -08:00
hkuang	23da920a8e	Fix the memory leak due to missing free frame_mvs. Change-Id: I2ceee7341d906259002c0ea31ea009ae32c04bfd	2014-11-04 13:28:31 -08:00
Minghai Shang	86c36a504d	[spatial svc] Make spatial svc working for one pass rate control Change-Id: Ibd9114485c3d747f9d148f64f706bf873ea473ac	2014-11-04 11:46:48 -08:00
Jingning Han	1e753387c8	Merge "Refactor sub-pixel motion search unit"	2014-11-04 09:11:15 -08:00
Jingning Han	1434f7695b	Skip ref frame mode search conditioned on predicted mv residuals This commit makes the RTC coding mode conditionally skip the reference frame mode search when the predicted motion vector of the current reference frame gives more than two times the sum of absolute difference compared to that of other reference frames. It reduces the runtime by 1% - 4% for speed -5 and -6. The average compression performance is improved by about 0.1% in both settings. It is of particular benefit to light change scenarios. The compression performance of test clip mmmovingvga.y4m is improved by 6.39% and 15.69% at high bit rates for speed -5 and -6, respectively. Speed -5 vidyo1 16555 b/f, 40.818 dB, 12422 ms -> 16552 b/f, 40.804 dB, 12100 ms nik 33211 b/f, 39.138 dB, 11341 ms -> 33228 b/f, 39.139 dB, 11023 ms mmmoving 33263 b/f, 40.935 dB, 13508 ms -> 33256 b/f, 41.068 dB, 12861 ms Speed -6 vidyo1 16541 b/f, 40.227 dB, 8437 ms -> 16540 b/f, 40.220 dB, 8216 ms nik 33272 b/f, 38.399 dB, 7610 ms -> 33267 b/f, 38.414 dB, 7490 ms mmmoving 33255 b/f, 40.555 dB, 7523 ms -> 33257 b/f, 40.975 dB, 7493 ms Change-Id: Id2aef76ef74a3cba5e9a82a83b792144948c6a91	2014-11-04 09:10:19 -08:00
Yunqing Wang	6d90a9d289	Merge "WORKAROUND FIX FOR GCC4.9.1"	2014-11-03 16:56:38 -08:00
Marco	343acaa8f2	Merge "Allow disable of refresh golden for more than 1 layer encoding."	2014-11-03 14:38:05 -08:00
Jingning Han	e083f6bd08	Refactor sub-pixel motion search unit This commit unfolds the legacy macro definitions used in the sub-pixel motion search and refactors the operational flow for later optimizations. Change-Id: I3e3f770cad961d03d1a6eb0b2a0186cc77eaf2b8	2014-11-03 09:02:57 -08:00
Jingning Han	0ca5908ff6	Merge "Fix the THR_MODES array used in vp9_pick_inter_mode"	2014-11-03 08:46:42 -08:00
Yaowu Xu	2fe893c94f	Merge "Fix speed 7 and speed 12 for rt"	2014-11-03 08:02:58 -08:00
Marco	d6b688375f	Allow disable of refresh golden for more than 1 layer encoding. The current logic was allowing for disabling golden refresh only for two pass svc encoding. This change disables it as long as more than 1 layer encoding is used (for example temporal layers under 1pass CBR). Change-Id: I4dc5204a7ad365c821ec7963e93b59da82e1826b	2014-11-02 22:24:00 -08:00
Jingning Han	7e119e2946	Fix the THR_MODES array used in vp9_pick_inter_mode Fix the alignment of entries for intra prediction modes. Change-Id: Ie32ad87cf90694efd591a4b1cc29c916c4cd56f7	2014-11-02 12:25:57 -08:00
levytamar82	86175a5788	WORKAROUND FIX FOR GCC4.9.1 In the function mb_lpf_horizontal_edge_w_avx2_16, the use of the intrinsic _mm256_cvtepu8_epi16 triggers a compiler bug in gcc 4.9.1. Until it is fixed, I created a workaround that performs the up-convert using broadcast128+shuffle. The bug was reported here: https://code.google.com/p/webm/issues/detail?id=867 Change-Id: I73452e6806f42e0fadcde96b804ea3afa7eeb351	2014-11-01 11:27:28 -07:00
Yaowu Xu	0271ff7775	Fix speed 7 and speed 12 for rt A recent change has introduced big quality drops for speed 7 and 12 for --rt mode. The change reverted the big drop and improved quality by 9.5% for speed 7 and 13.4% for speed 12. Change-Id: I07b82e3bb6002a73af486a083458c88877bdad01	2014-10-31 17:29:02 -07:00
hkuang	55577431ae	Bind motion vectors with frame buffer structure. This will save a lot of memory for decoder due to removing of prev_mi, but prev_mi is still needed in encoder. So this will increase a little bit memory for encoder. Change-Id: I24b2f1a423ebffa55a9bd2fcee1077dac995b2ed	2014-10-31 17:01:08 -07:00
Jingning Han	1c84e73ebd	Merge "Fix mode index use case in vp9_pick_inter_mode"	2014-10-31 08:55:40 -07:00
Jingning Han	61966b1d10	Merge "Refactor vp9_update_rd_thresh_fact"	2014-10-31 08:55:28 -07:00
Jingning Han	1cffea9fb7	Merge "Rework pred pixel buffer system in non-RD coding mode"	2014-10-31 08:55:24 -07:00
Jingning Han	64348d9f8d	Fix mode index use case in vp9_pick_inter_mode This improves coding performance of speed -5 and -6 by 0.6%, respectively. Change-Id: Ic5a7746a88c73285f0b14333d35dc16b02152c25	2014-10-30 11:10:06 -07:00
Jingning Han	f7b46d8c5e	Refactor vp9_update_rd_thresh_fact Reduce the scope of function parameters. Change-Id: Ifef2cfb559908a97498ffdbd6ea53da1cd45a73c	2014-10-30 11:09:40 -07:00
Jingning Han	7bea8c59f9	Rework pred pixel buffer system in non-RD coding mode This commit makes the inter prediction buffer system to support hybrid partition search. It reduces the runtime of speed -5 by about 3%. No compression performance change. vidyo1 720p 1000 kbps 11831 ms -> 11497 ms nik 720p 1000 kbps 10919 ms -> 10645 ms Change-Id: I5b2da747c6395c253cd074d3907f5402e1840c36	2014-10-30 11:08:35 -07:00
Hui Su	d478d2df37	Merge "Move the definition of switchable filter numbers into enum INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and IF_DIFF_REF_FRAME_ADD_MV."	2014-10-30 11:05:04 -07:00
Hui Su	66906da066	Merge "Combine vp9_encode_block_intra and encode_block_intra"	2014-10-30 11:02:31 -07:00
Yunqing Wang	aed48c786a	Remove unused speed feature Partition_check was unused and removed. Change-Id: I15ec9162d86dc61f04c09229c498629878ed7155	2014-10-29 17:05:04 -07:00
Jingning Han	afa31ab9b8	Merge "Enable mode search threshold update in non-RD coding mode"	2014-10-29 12:42:22 -07:00
Jingning Han	9349a28e80	Enable mode search threshold update in non-RD coding mode Adaptively adjust the mode thresholds after each mode search round to skip checking less likely selected modes. Local tests indicate 5% - 10% speed-up in speed -5 and -6. Average coding performance loss is -1.055%. speed -5 vidyo1 720p 1000 kbps 16533 b/f, 40.851 dB, 12607 ms -> 16556 b/f, 40.796 dB, 11831 ms nik 720p 1000 kbps 33229 b/f, 39.127 dB, 11468 ms -> 33235 b/f, 39.131 dB, 10919 ms speed -6 vidyo1 720p 1000 kbps 16549 b/f, 40.268 dB, 10138 ms -> 16538 b/f, 40.212 dB, 8456 ms nik 720p 1000 kbps 33271 b/f, 38.433 dB, 7886 ms -> 33279 b/f, 38.416 dB, 7843 ms Change-Id: I2c2963f1ce4ed9c1cf233b5b2c880b682e1c1e8b	2014-10-29 10:55:34 -07:00
Adrian Grange	4074099ed8	Simplify vp9_set_rd_speed_thresholds_sub8x8 Change-Id: I4bf0f9a38697f5aea564a47afd7f02bb8b2888b6	2014-10-29 09:09:46 -07:00
Hui Su	0928da3b6e	Combine vp9_encode_block_intra and encode_block_intra Change-Id: I79091fb677b64892ecca2fb466fde14602d8cdfc	2014-10-28 18:57:01 -07:00
Jingning Han	982dab6050	Merge "Use zero motion vector in choose_partitioning"	2014-10-28 12:00:13 -07:00
JackyChen	50e5c30536	Merge "vp9_denoiser_sse2: refactor the code."	2014-10-28 11:06:05 -07:00
Yaowu Xu	7d7b43b9af	Merge "Allow update of golden reference buffer in CBR mode"	2014-10-28 10:48:02 -07:00
JackyChen	99a8dac4de	vp9_denoiser_sse2: refactor the code. Combined vp9_denoiser_8xM_sse2 and vp9_denoiser_4xM_sse2 into one function vp9_denoiser_NxM_sse2_small and passed the bitexact testing. Changed the name of the function vp9_denoiser_64_32_16xM_sse2 to vp9_denoiser_NxM_sse2_big. Change-Id: Ib22478df585994dd347ebae04202c0b701e7f451	2014-10-28 09:36:58 -07:00
Yaowu Xu	2a506e33b4	Merge "Add a new control of golden frame boost in CBR mode"	2014-10-28 09:32:58 -07:00
Yaowu Xu	e5cd51880e	Allow update of golden reference buffer in CBR mode This commit allows the usage of the golden reference frame in VP9 CBR mode to improve quality. VP9 supports potentially up to 8 reference buffers, so it has reference buffers available for this purpose. This was not possible in VP8 as golden and alt-ref buffers were used for temporal scalability purpose in CBR mode in WebRTC. For frames that update golden frame, there can be a quality boost. The amount of allowed bitrate boost can be controlled via parameter rc_max_inter_bitrate_pct. The initial value of the boost ratio is currently based on over_shoot_pct. Further experiments will work out the adaptation of this boost value. Change-Id: I0c5f010c8fd8b7b598f69779c1b30e5b2ac30a4d	2014-10-28 09:31:10 -07:00
Paul Wilkins	422d7bc918	Relax maximum Q for extreme overshoot. Added code to relax the active maximum Q in response to extreme local overshoot to reduce bandwidth peaks. The impact is small in metrics terms, but this helps reduce bandwidth spikes and overall overshoot in a number of clips in our test sets (especially the YT test set). In particular this should help prevent very big spikes where a clip is mainly easy but has a short hard section. In such a case a choice of maximum Q for the clip as a whole may allow us to hit the overall target rate but give some extreme spikes. The chunked encoding in YT mitigates this problem but it can show up where a longer clip is coded as a single chunk. Change-Id: I213d09950ccb8489d10adf00fda1e53235b39203	2014-10-28 13:03:06 +00:00
Jingning Han	07436abb86	Use zero motion vector in choose_partitioning The zero motion vector was effectively used in the subsampled pixel based variance calculation. This commit makes it directly use zero mv to generate prediction. Change-Id: Ica83dc843e9f8da2f89c3ef451e50f16214c0def	2014-10-27 19:38:43 -07:00
Jingning Han	d56b3eb0cf	Refactor encoder tile data structure Make the common tile info as one element in the encoder tile data struct. Change-Id: I8c474b4ba67ee3e2c86ab164f353ff71ea9992be	2014-10-27 19:37:13 -07:00
Yaowu Xu	03a60b78db	Add a new control of golden frame boost in CBR mode 0 means that golden boost is off, and uses average frame target rate, a non-zero number means the percentage of boost over average frame bitrate is given initially to golden frames in CBR mode. Change-Id: If4334fe2cc424b65ae0cce27f71b5561bf1e577d	2014-10-27 13:55:18 -07:00
Jingning Han	192010d218	Refactor rtc coding mode to support tile encoding Use per tile threshold in the prediction mode search process. Change-Id: I6c74ee5a3b069bb4281002dfe51310911a0756c0	2014-10-27 09:53:46 -07:00
Yaowu Xu	aa2af3ff6e	Merge "Add a new control of max bitrate for inter frame"	2014-10-27 08:11:54 -07:00
Jingning Han	ac53c41e64	Merge "Tile based adaptive mode search in RD loop"	2014-10-24 18:44:52 -07:00
James Zern	01900edc40	Merge changes I8a9c9019,Ic7b2faa3,I44d42a50,I3f3a3924,I10747b32,I31b49c9e * changes: add vp9_loop_filter_data_reset move LFWorkerData allocation to VP9LfSync vp9_loop_filter_frame_mt: remove pbi dependency vp9_loop_filter_frame_mt: pass planes directly vp9_loop_filter_frame_mt: pass VP9LfSync directly vp9: store TileWorkerData allocations separately	2014-10-24 11:43:51 -07:00
Yaowu Xu	636099f7b6	Add a new control of max bitrate for inter frame Change-Id: I205de3611622cff7f751ea8baf9f82784581730a	2014-10-24 10:19:28 -07:00
Jingning Han	eee201c221	Tile based adaptive mode search in RD loop Make the spatially adaptive mode search in rate-distortion optimization loop inter tile independent. Experiments suggest that this does not significantly change the coding statistics. Single tile, speed 3: pedestrian_area 1080p 1500 kbps 59192 b/f, 40.611 dB, 101689 ms blue_sky 1080p 1500 kbps 58505 b/f, 36.347 dB, 62458 ms mobile_cal 720p 1000 kbps 13335 b/f, 35.646 dB, 45655 ms as compared to 4 column tiles, speed 3: pedestrian_area 1080p 1500 kbps 59329 b/f, 40.597 dB, 101917 ms blue_sky 1080p 1500 kbps 58712 b/f, 36.320 dB, 62693 ms mobile_cal 720p 1000 kbps 13191 b/f, 35.485 dB, 45319 ms Change-Id: I35c6e1e0a859fece8f4145dec28623cbc6a12325	2014-10-24 10:00:27 -07:00
Paul Wilkins	60d192db04	Merge "Enable dual arf with constant q."	2014-10-24 05:51:25 -07:00
Paul Wilkins	3758650c98	Merge "Move frame re-sizing into the recode loop"	2014-10-24 05:50:39 -07:00
Adrian Grange	65753eeb8a	Move frame re-sizing into the recode loop The point at which frames are scaled to their coded dimensions is moved into the re-code loop. This is in preparation for a further patch that will add logic into the re-code loop to reduce the coded frame size if the encoder is struggling to hit the target data rate at the native frame size. Change-Id: Ie4131f5ec6fb93148879f6ce96123296442bf2d1	2014-10-23 16:20:57 -07:00
Yaowu Xu	86777f2e1e	Merge "Move filter_ref initialization"	2014-10-23 11:20:22 -07:00
James Zern	01483677e5	add vp9_loop_filter_data_reset Change-Id: I8a9c9019242ec10fa499a78db322221bf96a0275	2014-10-23 19:43:48 +02:00
Yaowu Xu	065809d286	Move filter_ref initialization To outside the loop to avoid repeating the operations. Change-Id: I66c1986e98ce0d7594caad3d3b45de655b299bff	2014-10-23 08:27:25 -07:00
Paul Wilkins	8fc3ab774f	Enable dual arf with constant q. Add second level arf Q adjustment when using dual arfs in constant Q mode. Previously in constant Q mode enabling dual arf hurt by ~5% but with this change the average benefit is ~1-1.5% with some mid range data points up ~10%. Note however that it still hurts on some clips including some very low motion show content. Change-Id: I5b7789a2f42a6127d9e801cc010c20a7113bdd9b	2014-10-23 13:19:31 +01:00
Paul Wilkins	9363425daa	Merge "Initialization bug for multi arf."	2014-10-23 02:02:48 -07:00
Jingning Han	41a17f4457	Merge "Allow checking zeromv mode in vp9_pick_inter_mode"	2014-10-22 18:46:20 -07:00
Yunqing Wang	330a6b2756	Merge "vp9_ethread: allocate frame contexts outside VP9_COMMON struct"	2014-10-22 17:10:39 -07:00
Frank Galligan	f271bed671	Merge "Fix Neon convolve profiling"	2014-10-22 15:50:36 -07:00
Yunqing Wang	7c7e4d4eb8	vp9_ethread: allocate frame contexts outside VP9_COMMON struct This patch allocated frame contexts outside VP9_COMMON. This allows multiple threads to share the same copy of frame contexts, and reduces the overhead. It also guarantees the correct update of these contexts during bitstream packing. This patch doesn't change encoding result. Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353	2014-10-22 15:03:12 -07:00
Yaowu Xu	7c48a295ae	Merge "Fix a subtle issue in re-use inter_pred"	2014-10-22 14:53:06 -07:00
Jingning Han	08cdd006e1	Allow checking zeromv mode in vp9_pick_inter_mode This improves the compression performance of speed -5 by 0.6%. The speed impact is less than 1%. Change-Id: Ie77daa561976dfc8b479061e1221bdf428eb0c3b	2014-10-22 14:47:15 -07:00
JackyChen	897500b9ba	Merge "vp9_denoiser_sse2.c: improve code style."	2014-10-22 13:52:03 -07:00
Yaowu Xu	3f79359e0a	Fix a subtle issue in re-use inter_pred The initialization of this_mode_pred does not work when the ref_frame loop ever goes beyond LAST_FRAME. This commit fixes the subtle issue and allows potentially expanding the loop to test GOLDEN_FRAME. Change-Id: Ibbd427a22160d1d9eacb8ed0c87f88d6cef9c0f3	2014-10-22 12:06:27 -07:00
JackyChen	5cba6516aa	vp9_denoiser_sse2.c: improve code style. denoiser_sse2.c: fix typos in comment. Change-Id: Ic0fb102331b0e533c058da3cab1fbc30de9a0070	2014-10-22 10:55:54 -07:00
Frank Galligan	95a568b3a8	Fix Neon convolve profiling When profiling, gprof can't distinguish between matching labels in different files. Change-Id: I56770df212ed314a0d8568071fa8157624ef1e8f	2014-10-22 10:51:53 -07:00
Paul Wilkins	7cd6330ef3	Initialization bug for multi arf. Moved erroneous reset of cpi->multi_arf_last_grp_enabled. Change-Id: Ibb0b96f6ed1d5eeb575a3b1c798e0fe2ee651d06	2014-10-22 18:51:07 +01:00
Hangyu Kuang	9ce3a7d76c	Implement frame parallel decode for VP9. Using 4 threads, frame parallel decode is ~3x faster than single thread decode and around 30% faster than tile parallel decode for frame parallel encoded video on both Android and desktop with 4 threads. Decode speed is scalable to threads too which means decode could be even faster with more threads. Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e	2014-10-22 10:50:58 -07:00
Jingning Han	0e64aa5073	Merge "Refactor rate distortion cost structure in non-RD coding mode"	2014-10-22 08:41:36 -07:00
Yaowu Xu	87665f16f4	Merge "Change speed features for good quality(cpu-used=5)"	2014-10-22 08:40:15 -07:00
Jingning Han	be212d4db3	Refactor rate distortion cost structure in non-RD coding mode This commit refactors the rate distortion structure used in the non-RD coding mode and saves a few RDCOST calculations. Change-Id: I62c3416c300d2c5372f21b96d93a6b633a34ab3a	2014-10-21 17:17:11 -07:00
Hui Su	8947b18fa3	Move the definition of switchable filter numbers into enum INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and IF_DIFF_REF_FRAME_ADD_MV. Change-Id: Ic36c9eb6ccb8ec324d991f7241e42b40b60b1dcb	2014-10-21 15:41:37 -07:00
Yaowu Xu	c30f7e6cc5	Change speed features for good quality(cpu-used=5) The existing speed features produce horrible encoding results, almost 30% worse than cpu-used=4. This commit adjusts the speed features to produce relatively reasonable results, within 3%-5% of cpu-used=4. Change-Id: I0ca6ebafb33024d4a0cbcf04c78a4a00b8dd1ecf	2014-10-21 11:59:12 -07:00
Jingning Han	1ed1dde06d	Remove unused copy_partitioning Change-Id: I75a2a3772ed17e73180eb4f263cc838cae4927b0	2014-10-21 09:47:58 -07:00

7346 Commits