generic-library/vpx

Author	SHA1	Message	Date
Deb Mukherjee	8d3d2b76f3	Tx size selection enhancements (1) Refines the modeling function and uses that to add some speed features. Specifically, instead of using a flag use_largest_txfm as a speed feature, an enum tx_size_search_method is used, of which two of the types are USE_FULL_RD and USE_LARGESTALL. Two other new types are added: USE_LARGESTINTRA (use largest only for intra) USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for inter) (2) Another change is that the framework for deciding transform type is simplified to use a heuristic count based method rather than an rd based method using txfm_cache. In practice the new method is found to work just as well - with derf only -0.01 down. The new method is more compatible with the new framework where certain rd costs are based on full rd and certain others are based on modeled rd or are not computed. In this patch the existing rd based method is still kept for use in the USE_FULL_RD mode. In the other modes, the count based method is used. However the recommendation is to remove it eventually since the benefit is limited, and will remove a lot of complications in the code (3) Finally a bug is fixed with the existing use_largest_txfm speed feature that causes mismatches when the lossless mode and 4x4 WH transform is forced. Results on derf: USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a pretty good compromise) USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction (currently the benefit of modeling is limited for txfm size selection, but keeping this enum as a placeholder) USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing use_largest_txfm speed feature). Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936	2013-07-02 13:54:00 -07:00
Deb Mukherjee	9c20cedd93	Clean-up in forward update to use mapping tables Uses mapping tables instead of complicated modulo/division operations for prob mapping for forward updates. No bit-stream or output change. Change-Id: Ifd9ce8ac1437835c305c94f64c18273c7a68f546	2013-07-02 12:48:20 -07:00
Dmitry Kovalev	904070ca64	Merge "Removing unused implicit segmentation code."	2013-07-02 11:58:48 -07:00
Ronald S. Bultje	3cc6eb7c00	Merge "Make get_coef_context() branchless."	2013-07-02 11:48:15 -07:00
Dmitry Kovalev	3140c443e4	Merge "Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h."	2013-07-02 11:31:35 -07:00
Dmitry Kovalev	18fd43601c	Merge "Additional vp9_decodemv.c cleanup."	2013-07-02 11:31:23 -07:00
Dmitry Kovalev	a3d2e6c98b	Removing unused implicit segmentation code. Change-Id: I8a2983fb14274a6ac53681fa4cd5d4209cbd2905	2013-07-02 11:16:42 -07:00
Yunqing Wang	f4bee75c2b	Merge "Add speed feature to disable splitmv"	2013-07-02 10:54:22 -07:00
Yunqing Wang	b12e060b55	Add speed feature to disable splitmv Added a speed feature in speed 1 to disable splitmv for HD (>=720) clips. Test result on stdhd set: 0.3% psnr loss and 0.07% ssim loss. Encoding speedup is 36%. (For reference: The test result on derf set showed 2% psnr loss and 1.6% ssim loss. Encoding speedup is 34%. SPLITMV should be enabled for small resolution videos.) Change-Id: I54f72b94f506c6d404b47c42e71acaa5374d6ee6	2013-07-02 10:02:34 -07:00
Jingning Han	b91a1586a3	Calculate rd cost per transformed block Compute the rate-distortion cost per transformed block, and accumulate the cost through all blocks inside a partition. This allows the encoder to detect if the cumulative rd cost is already above the best rd cost, thereby enabling early termination in the rate-distortion optimization search. Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac	2013-07-02 09:58:46 -07:00
Ronald S. Bultje	9df24b41ca	Merge "Update quantize SSSE3 SIMD to cover 32x32 transform case also."	2013-07-02 09:38:08 -07:00
Paul Wilkins	1319d9c077	Adjust Speed 0 settings. Remove the use of sf->comp_inter_joint_search_thresh from the baseline speed 0. Approx +0.4% on derf. Change-Id: Icc14db98909830f40e5ac66130d40e78d2e55c71	2013-07-02 15:42:14 +01:00
Paul Wilkins	b7cd01ed73	Revert "New motion threshold factor - speed feature." This reverts commit `1377278180`. Also fixes a spelling mistake. Change-Id: I5be8aa4d8d3c0323d4a6f41968a7b2c048949c3f	2013-07-02 15:06:40 +01:00
Yaowu Xu	9e408e3504	fix the mismatch again in cpu_used 2 Change-Id: Icc4f70f0b0f91c9e7d5d00eedd67841afe2f2679	2013-07-01 19:13:18 -07:00
Jim Bankoski	d4158283e7	use partitioning from last frame This cl converts use partition from last frame to do the following: if part is none,horz, vert -> try split if part != none and one of the children is not split - try none Change-Id: I5b6c659e35f3ac9f11c051b92ba98af6d7e8aa87 Signed-off-by: Jim Bankoski <jimbankoski@google.com>	2013-07-01 18:18:50 -07:00
Dmitry Kovalev	1ac0540296	Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h. Change-Id: Ia547a5dd7650b771fd00edd673ab9f920270731c	2013-07-01 17:28:08 -07:00
Ronald S. Bultje	26b6318de8	Make get_coef_context() branchless. This should significantly speedup cost_coeffs(). Basically what the patch does is to make the neighbour arrays padded by one item to prevent an eob check in get_coef_context(), then it populates each col/row scan and left/top edge coefficient with two times the same neighbour - this prevents a single/double context branch in get_coef_context(). Lastly, it populates neighbour arrays in pixel order (rather than scan order), so we don't have to dereference the scantable to get the correct neighbours. Total encoding time of first 50 frames of bus (speed 0) at 1500kbps goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase. Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56	2013-07-01 16:34:10 -07:00
Dmitry Kovalev	7ed754995d	Additional vp9_decodemv.c cleanup. Change-Id: I5b413bc0884af0bda38c05332d86490103905b3b	2013-07-01 16:14:13 -07:00
Yaowu Xu	ba3b2604f0	Merge "Quantize (64-bit only, for now) SSSE3 SIMD."	2013-07-01 15:58:57 -07:00
Dmitry Kovalev	6411228aca	Merge "Removing vp9_modecont.{h, c}."	2013-07-01 14:58:48 -07:00
Dmitry Kovalev	d9db0d96ec	Merge "Moving encoder subexp encoding functions to subexp.{h, c}."	2013-07-01 14:58:36 -07:00
Dmitry Kovalev	a4e14d7f9f	Merge "Adding vp9_rb_read_signed_literal function."	2013-07-01 14:58:20 -07:00
Dmitry Kovalev	33fffc155e	Merge "Inlining decode_atom, decode_sb_intra, and decode_sb."	2013-07-01 14:58:06 -07:00
Dmitry Kovalev	2381d46480	Merge "Cleanup inside vp9_decodemv.c."	2013-07-01 14:50:32 -07:00
Ronald S. Bultje	c8defcfdee	Update quantize SSSE3 SIMD to cover 32x32 transform case also. Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to 2min10.1, i.e. a 2.3% overall speed increase. Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87	2013-07-01 11:36:33 -07:00
Ronald S. Bultje	7353ceab9d	Quantize (64-bit only, for now) SSSE3 SIMD. Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is x86-64 only, it needs some minor modifications to be 32bit compatible, because it uses 15 xmm registers, whereas 32bit only has 8. Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904	2013-07-01 11:36:07 -07:00
Dmitry Kovalev	2ab3bc8871	Removing vp9_modecont.{h, c}. Moving vp9_default_inter_mode_probs array to vp9_entropymode.c. Change-Id: I88ebda86ccc07f2a43c6c01d4b37898214cfb6de	2013-07-01 10:17:15 -07:00
Paul Wilkins	7bb436feee	Merge "New motion threshold factor - speed feature."	2013-07-01 09:39:02 -07:00
Yaowu Xu	632289b31f	fix a mismatch in cpuused 2 Change-Id: I921c9faba6386535aaf717a54301dd346a9b8540	2013-07-01 08:54:50 -07:00
Paul Wilkins	1377278180	New motion threshold factor - speed feature. Added a speed feature that focuses only on thresholds for new motion modes. Moved sf->comp_inter_joint_search_thresh into speed 1. This has ~+0.4% impact on quality at speed 0 as our quality reference baseline. Slight adjustment to baseline thresholds. Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5	2013-07-01 12:11:21 +01:00
Dmitry Kovalev	e5e15eb38e	Adding vp9_rb_read_signed_literal function. Change-Id: I30ea91561ffac7e5065ba41b2d3ab7dedb720593	2013-07-01 02:09:36 -07:00
Jingning Han	993942ce0c	Merge "Enable SSE2 4x4 ADST/DCT transform"	2013-06-29 15:57:04 -07:00
Christian Duvivier	466e0cf303	SSE2 version of vp9_short_fdct32x32_rd. 43,000 -> 5,750 cycles, about 7.5x faster. Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0	2013-06-29 13:53:00 -07:00
Dmitry Kovalev	bb8ccf1caf	Moving encoder subexp encoding functions to subexp.{h, c}. Change-Id: I83ca53bf6def871f199a382a671f26ad7cbecbca	2013-06-29 11:50:45 -07:00
Ronald S. Bultje	bc70c60b25	Merge "fixed a bug where sse is not populated"	2013-06-29 07:42:41 -07:00
Johann	6098e359f4	Merge "add Neon optimized add constant residual functions"	2013-06-28 19:50:38 -07:00
James Zern	84d08fa9c4	Merge "fix test compile error"	2013-06-28 19:48:05 -07:00
Ronald S. Bultje	a487af8d35	Merge "Inline vp9_get_coef_context() (and remove vp9_ prefix)."	2013-06-28 19:37:11 -07:00
Ronald S. Bultje	7731e53839	Merge "Minor change to prevent one level of dereference in cost_coeffs()."	2013-06-28 19:36:56 -07:00
chm	a83cfd4da1	add Neon optimized add constant residual functions - Add add_constant_residual_8x8 16x16 32x32 functions - Tested under RealView debugger environment Change-Id: I5c3a432f651b49bf375de6496353706a33e3e68e	2013-06-28 19:06:51 -07:00
Dmitry Kovalev	d6264f9adf	Merge "Cosmetic reordering of FRAME_CONTEXT members."	2013-06-28 18:38:02 -07:00
Dmitry Kovalev	1947828c3d	Inlining decode_atom, decode_sb_intra, and decode_sb. Change-Id: I41711bb994f542c5ba3d0cefd9b2e79db3c2c3a1	2013-06-28 18:34:30 -07:00
James Zern	a63e31e81e	fix test compile error since: `92479d9` Make update_partition_context faster fixes: vp9/common/vp9_blockd.h:408:22: error: non-constant-expression cannot be narrowed from type 'int' to 'char' in initializer list [-Wc++11-narrowing] char pcvalue[2] = {~(0xe << boffset), ~(0xf <<boffset)}; ^~~~~~~~~~~~~~~~~ Change-Id: Id5b00b9a72d00a2b314081a23879bd1fa3ce983b	2013-06-28 18:07:37 -07:00
Jingning Han	1109b6b888	Enable SSE2 4x4 ADST/DCT transform This commit enables SSE2 4x4 forward hybrid transform. The runtime goes from 249 cycles down to 74 cycles. Overall around 2% speed-up at no compression performance change. Change-Id: Iad4d526346e05c7be896466c05500711bb763660	2013-06-28 17:24:43 -07:00
Yaowu Xu	f853e662b7	fixed a bug where sse is not populated Change-Id: I692d800af1f976c84a76f8bd66864c4b39540abc	2013-06-28 17:10:22 -07:00
Jingning Han	07b72ace70	Merge "Fix switch statement in 8x8 transform"	2013-06-28 16:49:59 -07:00
Dmitry Kovalev	228b8232d3	Cosmetic reordering of FRAME_CONTEXT members. Change-Id: Id641e5188adf55e53e606e5813ae45feaf7abbd2	2013-06-28 16:16:03 -07:00
Dmitry Kovalev	15fefced7d	Cleanup inside vp9_decodemv.c. Adding read_skip_coeff function. Renaming decode_mv to read_mv for consistency with other function names. Removing redundant function arguments. Renaming kfread_modes to read_intra_mode_info, read_mb_modes_mv to read_inter_mode_info, vp9_decode_mb_mode_mv to vp9_read_mode_info, vp9_decode_mode_mvs_init to vp9_prepare_read_mode_info. Inlining function mb_mode_mv_init inside vp9_prepare_read_mode_info. Change-Id: Ifee05d333da4cd331d4aff40ce41ccd9b70e494a	2013-06-28 15:32:31 -07:00
Dmitry Kovalev	59070f6e3c	Merge "Removing CONFIG_DEBUG checks on assertions."	2013-06-28 14:03:28 -07:00
Jingning Han	9def7f72a0	Fix switch statement in 8x8 transform Change-Id: I7c46354c4983feb5f6202c3ab4a1d9534da7e30f	2013-06-28 13:40:36 -07:00
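
Commit b91a1586a3 above describes accumulating the rate-distortion cost per transformed block so that the search can stop once the running total exceeds the best cost found so far. The following is a minimal sketch of that idea in C, not the libvpx code: `block_cost`, `rd_cost`, and `partition_rd_cost` are hypothetical names, and the cost formula (distortion plus lambda times rate) is a simplification of whatever scaling the encoder actually applies.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical per-block cost record; not a libvpx type. */
typedef struct {
  int64_t rate;       /* bits, in arbitrary units */
  int64_t distortion; /* e.g. sum of squared error */
} block_cost;

/* Illustrative RD cost: distortion + lambda * rate (assumed formula). */
static int64_t rd_cost(int64_t lambda, int64_t rate, int64_t distortion) {
  return distortion + lambda * rate;
}

/* Accumulates rate and distortion block by block and gives up as soon as
 * the running RD cost exceeds best_rd. Returns the total cost of the
 * partition, or INT64_MAX if the search was abandoned early.
 * *blocks_visited receives the number of blocks actually examined. */
static int64_t partition_rd_cost(const block_cost *blocks, size_t n,
                                 int64_t lambda, int64_t best_rd,
                                 size_t *blocks_visited) {
  int64_t rate = 0, dist = 0;
  size_t i;
  assert(blocks != NULL || n == 0);
  assert(blocks_visited != NULL);
  for (i = 0; i < n; ++i) {
    rate += blocks[i].rate;
    dist += blocks[i].distortion;
    if (rd_cost(lambda, rate, dist) > best_rd) {
      *blocks_visited = i + 1; /* early termination */
      return INT64_MAX;
    }
  }
  *blocks_visited = n;
  return rd_cost(lambda, rate, dist);
}
```

The saving comes from the blocks that are never evaluated: a candidate that is already worse than the current best after a few blocks costs only those few blocks, rather than the whole partition.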


6080 Commits