generic-library/vpx

Author	SHA1	Message	Date
James Zern	9581eb6e8a	use consistent framerate naming s/frame_rate/framerate/g Change-Id: I6fc3e088e419c5f46e3a9390dd8a2cad2677a2fc	2013-07-16 14:12:47 -07:00
Dmitry Kovalev	5de96b3ce6	Merge "Rewriting vp9_set_pred_flag_{seg_id, mbskip}."	2013-07-16 13:34:42 -07:00
James Zern	5baa416b6c	Merge "vp9: remove frames_{since,till}.. from MACROBLOCKD"	2013-07-16 13:00:14 -07:00
James Zern	3a7c2665d0	Merge "yv12config: remove YUV_TYPE"	2013-07-16 12:16:04 -07:00
Dmitry Kovalev	863138a2ad	Rewriting vp9_set_pred_flag_{seg_id, mbskip}. Making implementation of vp9_set_pred_flag_{seg_id, mbskip} consistent with vp9_get_segment_id without using confusing sub(a, b) macro. Passing mi_row and mi_col to functions explicitly instead of relying on mb_to_right_edge and mb_to_bottom_edge. Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435	2013-07-16 10:44:48 -07:00
Paul Wilkins	30d2ea45ce	Minor cleanup in code to find uv tx_size. Change-Id: I94b97a966b5efbc9a243048f1f5ddbbdc4b1846e	2013-07-16 18:27:33 +01:00
Jingning Han	dd97c62ab8	Merge "Skip inter-coded block reconstruction in rd loop"	2013-07-16 09:03:38 -07:00
Dmitry Kovalev	e8e7620a1f	Merge "Removing and moving around constant definitions."	2013-07-16 00:52:53 -07:00
Yaowu Xu	c5b0cd8405	Merge "Change to extend full border only when needed"	2013-07-15 21:35:32 -07:00
Yaowu Xu	5b915ebd92	Change to extend full border only when needed This is a short term optimization till we work out a decoder implementation requiring no frame border extension. Change-Id: I02d15bfde4d926b50a4e58b393d8c4062d1be70f	2013-07-15 20:52:13 -07:00
Dmitry Kovalev	ca75f1255f	Removing and moving around constant definitions. Removing unused and duplicated constants, moving them from .h to .c if possible. Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f	2013-07-15 19:26:30 -07:00
Johann	6eae37f45c	Merge "Remove print_nmvcounts"	2013-07-15 18:43:41 -07:00
Ronald S. Bultje	1ff94fea56	Inline vp9_quantize() in xform_quant(). Cycle times: 4x4: 151 to 131 cycles (15% faster) 8x8: 334 to 306 cycles (9% faster) 16x16: 1401 to 1368 cycles (2.5% faster) 32x32: 7403 to 7367 cycles (0.5% faster) Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup. Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f	2013-07-15 17:30:57 -07:00
Ronald S. Bultje	6fb418741f	Inline xform_quant() in encode_block_intra(). Also inline some of the block calculations to assist the compiler to not do silly things like calculating the same offset (or converting between raster/transform block offset or block, mi and pixel unit) many, many, many times. Cycle times: 4x4: 584 -> 505 cycles (16% faster) 8x8: 1651 -> 1560 cycles (6% faster) 16x16: 7897 -> 7704 cycles (2.5% faster) 32x32: 16096 -> 15852 cycles (1.5% faster) Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall. Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80	2013-07-15 16:00:42 -07:00
Jingning Han	043e0f9dad	Skip inter-coded block reconstruction in rd loop Skip the inverse transform and reconstruction of inter-mode coded blocks in the rate-distortion optimization loop, when skip_encode_sb feature is turned on. This provides about 1% speed-up at speed 0, and 1.5% speed-up at speed 1. No performance change in both settings. Change-Id: I2932718bf4d007163702b61b16b6ff100cf9d007	2013-07-15 11:32:14 -07:00
Jingning Han	faff6ed0fb	Skip duplicate block encoding in the rd loop This speed feature allows the encoder to largely remove the spatial dependency between blocks inside a 64x64 superblock, thereby removing the need to repeatedly encode superblocks per partition type in the rate-distortion optimization loop. A major challenge lies in the intra modes tested in the rate-distortion optimization loop. The subsequent blocks do not have access to the reconstructed boundary pixels without the intermediate coding steps. This was resolved by using the original pixels for intra prediction in the rd loop, followed by an appropriately designed distortion modeling on the quantization parameters. Experiments also suggested that the performance impact is more discernible at lower bit-rate/psnr settings. Hence a quantizer dependent threshold is applied to deactivate skip of block coding. For bus_cif at 2000 kbps, speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB performance loss. speed 1: runtime 65312ms -> 61536ms, (7% speed-up) at 0.04dB performance loss. This operation is currently turned on in settings of speed 1. Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec	2013-07-15 11:08:58 -07:00
James Zern	dc1d2331f6	vp9: remove frames_{since,till}.. from MACROBLOCKD frames_since_golden / frames_till_alt_ref_frame are unused. Change-Id: I348e7689d4d75412cf4de7703d885be942e4a26b	2013-07-13 18:02:11 -07:00
Dmitry Kovalev	429070987a	Using vp9_copy and vp9_zero instead of custom code. Change-Id: Id9b6ceeddca3f9b34bfada5c499b1e7a2f42c30b	2013-07-12 18:07:43 -07:00
Yaowu Xu	cdea4a7c66	Merge "Fix a build issue"	2013-07-12 16:17:22 -07:00
James Zern	4fc6c88e9c	yv12config: remove YUV_TYPE this was never fleshed out in the context of VP8, for which it was added. for VP9 it has no meaning. Change-Id: Iba2ecc026d9e947067b96690245d337e51e26eff	2013-07-12 15:25:48 -07:00
Dmitry Kovalev	cc662dd768	Adding struct tx_probs and struct tx_counts to cleanup the code. Also removing unused declarations from vp9_entropymode.h file. Change-Id: Ib9c5826db3584a32f6bb3297a76c522b99d83402	2013-07-12 15:22:38 -07:00
Yaowu Xu	fb754b182f	Fix a build issue Change-Id: I23a75c495ed7ea917d7f312bef0990e20a6b53d9	2013-07-12 11:38:44 -07:00
James Zern	0195fb53cb	vp9: consistent 'log2' variable naming lg2 -> log2 Change-Id: I0602ddff49e42c9c40c29c084d04b7592b9f8edf	2013-07-12 11:37:43 -07:00
Deb Mukherjee	94c481f9f1	Some minor cleanups for efficiency Implements some of the helper functions more efficiently with lookups rather than branches. Modeling function is consolidated to reduce some computations. Also merged the two enums BLOCK_SIZE_TYPES and BlockSize into one because there is no need to keep them separate (even though the semantics are a little different). No bitstream or output change. About 0.5% speedup Change-Id: I7d71a66e8031ddb340744dc493f22976052b8f9f	2013-07-12 10:22:56 -07:00
Dmitry Kovalev	727631873d	Merge "Removing redundant code mostly from vp9_pred_common.{h, c}."	2013-07-12 10:22:30 -07:00
Paul Wilkins	b8ddc9f0d3	Merge "Speed 2 feature adjustment."	2013-07-12 02:14:01 -07:00
Jingning Han	84c3ac0476	Merge "Remove unnecessary tx_type branch in encode_block"	2013-07-11 21:52:27 -07:00
Dmitry Kovalev	dd150e8ea9	Removing redundant code mostly from vp9_pred_common.{h, c}. Removing redundant function arguments and curly braces. Change-Id: I46e02561f33fe02e84a3b19756f03b9504bd6a1b	2013-07-11 18:39:10 -07:00
Johann	e6ab476dd4	Remove print_nmvcounts For some reason iOS builds take a really long time to sort this function out. It's not used anywhere so remove it. Change-Id: Ia5c8513a0d9c7eb32641cca58ca1c1113e2dd9f4	2013-07-11 17:22:03 -07:00
Ronald S. Bultje	ee09dd9949	Remove unused function block_error(). Change-Id: I78a79fc51c2d7cc3c261f35b569155397f3dc0c4	2013-07-11 17:14:03 -07:00
Dmitry Kovalev	8c05e59065	Calling is_inter_mode() instead of custom code. Change-Id: Iccd4ab95ea51a6d57ed43947f2fd7ad92e8979cf	2013-07-11 14:14:47 -07:00
Dmitry Kovalev	c4ad3273c7	Moving segmentation related vars into separate struct. Adding segmentation struct to vp9_seg_common.h. Struct members are from macroblockd and VP9Common structs. Moving segmentation related constants and enums to vp9_seg_common.h. Change-Id: I23fabc33f11a359249f5f80d161daf569d02ec03	2013-07-11 11:57:57 -07:00
Dmitry Kovalev	f70c021d36	Merge "Adding write_compressed_header function."	2013-07-11 11:57:17 -07:00
Dmitry Kovalev	802e57535a	Merge "Removing unused TOKENEXTRA arg from pick_sb_modes function."	2013-07-11 11:46:06 -07:00
Jingning Han	b9381b6faf	Remove unnecessary tx_type branch in encode_block The function encode_block is called only by inter-prediction modes, hence removing the transform type branching there. Change-Id: I34a3172e28ce2388835efd0f8781922211bff857	2013-07-11 09:11:35 -07:00
Scott LaVarnway	f2a6bcfb18	Eliminated prev_mip memsets/memcpys in encoder This patch is in experimental but was not merged into master. This patch swaps ptrs instead of copying and uses the last show_frame flag instead of setting the entire buffer to zero. Change-Id: Ia0950466c8ba301a2a5bf917ff3d07bc1a2c2311	2013-07-11 10:47:28 -04:00
Paul Wilkins	5290eeab88	Speed 2 feature adjustment. With sf->auto_mv_step_size on it is questionable whether sf->reduce_first_step_size is worthwhile. At speed 2 it was not having a big impact. Even at speed 2 sf->optimize_coefficients = 0 is not having a big speed impact so for now I have moved it down into a higher speed setting. Change-Id: I8a54de76d486ad37aabce76474889da2768b14c1	2013-07-11 13:59:12 +01:00
Jingning Han	aedc7c59b1	Merge "Fix tx_type bug in intra4x4 rd loop"	2013-07-10 20:13:25 -07:00
Ronald S. Bultje	c13e0bcb52	Remove unused fwalsh/fdct x86 SIMD implementations. Change-Id: Ia942e56cf322821d42ba06178672791eeee2847e	2013-07-10 18:22:51 -07:00
Dmitry Kovalev	544d8c3316	Removing unused TOKENEXTRA arg from pick_sb_modes function. Change-Id: I0543e72fa092eef3976b65e16bb597197c364873	2013-07-10 15:57:28 -07:00
Jingning Han	18803f9cc4	Fix tx_type bug in intra4x4 rd loop This commit fixed the mis-use of the tx_type for inverse transform in intra4x4 rate-distortion optimization loop. It improves the overall coding performance. Change-Id: I7fe9953175b74890357dbcee33c138573766e980	2013-07-10 15:49:49 -07:00
Deb Mukherjee	7494bba66b	Merge "Prunes out full-rd computation based on modeled rd"	2013-07-10 15:37:11 -07:00
Dmitry Kovalev	0ac5e4dd58	Adding write_compressed_header function. Change-Id: Ic5257fa8278e9b6297de230e4fd26a1e23ad2bb7	2013-07-10 15:08:34 -07:00
Jim Bankoski	68ef7a6b8a	configure with internal stats not working Change-Id: I5dea4570cb05df27a522abf6e7b695998654284a	2013-07-10 15:07:53 -07:00
Jim Bankoski	865ca76604	Merge "remove warnings when NDEBUG is set"	2013-07-10 14:39:39 -07:00
Jim Bankoski	6591cf2f7e	remove warnings when NDEBUG is set Change-Id: Ie0cb732fdcb98616a422c4463bff80642248d136	2013-07-10 14:27:20 -07:00
Deb Mukherjee	53ff43adc3	Prunes out full-rd computation based on modeled rd Adds a speed feature to eliminate full-rd computation if the modeled rd or rd based on a different parameter in the same mode is already a lot larger than the best rd yet. Specifically, only search the sharp and smooth filters if the modeled rd cost based on the regular filter is within a certain factor of the best rd cost so far. Also, skip full-rd computation of non splitmv inter modes if the modeled rd cost based on pred error is within the same factor of the best rd cost so far. Also adds some enhancements in the rd search for splitmv mode to speed things up by early breakouts. Negligible impact on performance. Results on derfraw300: psnr: -0.013% with the splitmv enhancements, -0.24% with the rd breakout feature on. speedup: 6% with splitmv enhancements, 20% with also residual breakout (tested on football sequence at 600 Kbps) Change-Id: I37abc308ea9f110c1679ce649b6a7e73ab1ad5fc	2013-07-10 13:49:49 -07:00
Jingning Han	114423538f	SSE2 16x16 ADST/DCT hybrid transform This commit enables 16x16 ADST/DCT forward hybrid transform using SSE2 operations. It reduces the runtime from 5433 cycles to 1621 cycles, at no compression performance loss. Change-Id: I75fd7f1984e9e28846af459f810ff0d6ae125230	2013-07-10 12:14:53 -07:00
Dmitry Kovalev	417df1d42e	Merge "Adding encode_tiles function to vp9_bitstream.c."	2013-07-10 11:43:50 -07:00
Yaowu Xu	e52eec490c	Merge "Add a feature to reduce chroma intra mode search"	2013-07-10 11:35:47 -07:00
Ronald S. Bultje	b1df674a99	Remove memcpy() in handle_inter_mode() filter selection. Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from 2min4.9 to 2min3.1, i.e. a 1.4% speedup overall. Change-Id: I9b25e87974430cb942caa276410bb2eda815bd83	2013-07-10 09:27:56 -07:00
Yaowu Xu	bed27a960a	Add a feature to reduce chroma intra mode search Change-Id: I721ebdeef2b53ce3e5c3eba3f7462ae2103c95a8	2013-07-10 08:59:18 -07:00
Jim Bankoski	fb027a7658	removing case statements around prediction entropy coding Removes SEG_ID Removes MBSKIP Removes SWITCHABLE_INTERP Removes INTRA_INTER Removes COMP_INTER_INTER Removes COMP_REF_P Removes SINGLE_REF_P1 Removes SINGLE_REF_P2 Removes TX_SIZE Change-Id: Ie4520ae1f65c8cac312432c0616cc80dea5bf34b	2013-07-09 20:10:16 -07:00
Yaowu Xu	059f2929e9	Merge "Revert "Remove memcpy() in handle_inter_mode() filter selection.""	2013-07-09 20:10:06 -07:00
Yaowu Xu	205efbc153	Revert "Remove memcpy() in handle_inter_mode() filter selection." This reverts commit `fcf7998a47`. Change-Id: Ic6532223faec9f1483b78adb2e37b79c7b1a0efb	2013-07-09 17:42:10 -07:00
Dmitry Kovalev	d82f459d1a	Adding encode_tiles function to vp9_bitstream.c. Change-Id: Ie44824ec25fd8fdb25d7c8124a9b28c26d802029	2013-07-09 15:59:19 -07:00
John Koleszar	f0d9f10d24	Remove all asm offset files from VP9 The files are empty and unused. Change-Id: Ieb4242d14273efdf24149bda33f9591540bba06a	2013-07-09 14:26:53 -07:00
Ronald S. Bultje	204d1b7058	Merge "Unbreak lossless."	2013-07-09 09:54:48 -07:00
Ronald S. Bultje	d8fa5d45cc	Merge "Make intra prediction pointers RTCD-based."	2013-07-09 09:54:43 -07:00
Ronald S. Bultje	059c0ba5d4	Unbreak lossless. Change-Id: I8130ec9b5371c65e885f245a5ac73840c23cb4a1	2013-07-09 09:46:37 -07:00
Dmitry Kovalev	c6c279aff0	Merge "Using mi_cols instead of mb_cols."	2013-07-08 20:09:19 -07:00
Dmitry Kovalev	1c65c580d6	Merge "Refactoring setup_pre_planes function."	2013-07-08 20:08:05 -07:00
Dmitry Kovalev	6254c8d780	Merge "Calling set_partition_seg_context() instead of code duplication."	2013-07-08 20:07:06 -07:00
Ronald S. Bultje	8350e7fe38	Make intra prediction pointers RTCD-based. This probably has a mildly negative impact on performance, but will (in future commits - or possibly merged with this one) allow SIMD implementations of individual intra prediction functions. We may perhaps want to consider having separate functions per txfm-size also (i.e. 4x4, 8x8, 16x16 and 32x32 intra prediction functions for each intra prediction mode), but I haven't played much with that yet. Change-Id: Ie739985eee0a3fcbb7aed29ee6910fdb653ea269	2013-07-08 17:25:51 -07:00
Ronald S. Bultje	a5062cc635	Don't call encode_sb() for the final of 4-split subpartitions. The resulting reconstruction is never used, thus it just wastes CPU cycles. Reduces encode time of first 50 frames of bus (speed 0) @ 1500kbps from 2min2.0 to 2min1.2, i.e. a 0.65% overall speedup. Change-Id: I74755ca3aadc21e2be220f486259060bd4088c45	2013-07-08 16:22:39 -07:00
Ronald S. Bultje	8fde07a3ae	Don't recalculate mv_ref costs for each block/partition. Changes cost_mv_ref() into doing a LUT into pre-calculated cost arrays instead. Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from 2min11.6 to 2min10.9, i.e. 0.5% faster overall. Change-Id: If186e92c34c201b29cbbc058785a15c9c09e433a	2013-07-08 16:22:39 -07:00
Ronald S. Bultje	5a73254918	Remove unnecessary memset(best_index, 0) from trellis/optimize. First 50 frames of bus @ 1500kbps (speed 0) goes from 2min12.6 to 2min11.6, i.e. 0.75% overall speedup. Change-Id: I67054f8146e82a02b6457c51a1c8627a937e5e1e	2013-07-08 16:22:39 -07:00
Ronald S. Bultje	fcf7998a47	Remove memcpy() in handle_inter_mode() filter selection. Encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from 2min4.9 to 2min3.1, i.e. a 1.4% speedup overall. Change-Id: Ibe8b08d159797504c5d0c5122de1b6da3b6595e0	2013-07-08 16:22:39 -07:00
Ronald S. Bultje	ed995afba1	Make frame-wide filter-type decision fully RD-based. Overall, on all test sets, this gains about +0.2% on all metrics. City is a clip where this really hurts (-1.0% on all metrics), I'm not quite sure why yet. Maybe interesting to look into in the future. Change-Id: I6f0eecb20e72f0194633270d30bf00d76d9eae78	2013-07-08 16:22:37 -07:00
Dmitry Kovalev	b7559258a4	Using mi_cols instead of mb_cols. Eliminating usage of mb-units, switching to mi-units. Adding ALIGN_POWER_OF_TWO macro. Change-Id: I2491c969f713207c062011878b57e4e531818607	2013-07-08 14:54:04 -07:00
Deb Mukherjee	d9b62160a0	Implements several heuristics to prune mode search Skips mode searches for intra and compound inter modes depending on the best mode so far and the reference frames. The various heuristics to be used are selected by bits from a flag. The previous direction based intra mode search pruning is also absorbed in this framework. Specifically the flags and their impact are: 1) FLAG_SKIP_INTRA_BESTINTER (skip intra mode search for oblique directional modes and TM_PRED if the best so far is an inter mode) derfraw300: -0.15%, 10% speedup 2) FLAG_SKIP_INTRA_DIRMISMATCH (skip D27, D63, D117 and D153 mode search if the best so far is not one of the closest hor/vert/diagonal directions) derfraw300: -0.05%, about 9% speedup 3) FLAG_SKIP_COMP_BESTINTRA (skip compound prediction mode search if the best so far is an intra mode) derfraw300: -0.06%, about 7-8% speedup 4) FLAG_SKIP_COMP_REFMISMATCH (skip compound prediction search if the best single ref inter mode does not have the same ref as one of the two references being tested in the compound mode) derfraw300: -0.56%, about 10% speedup Change-Id: I1a736cd29b36325489e7af9f32698d6394b2c495	2013-07-08 12:17:12 -07:00
Jingning Han	a38cf2658a	Merge "Refactor SSE2 8x8 functional units"	2013-07-05 11:18:18 -07:00
Paul Wilkins	ef0ca2deaa	Merge "Fix to comp_inter_joint_search_thresh feature."	2013-07-04 03:27:00 -07:00
Dmitry Kovalev	f72e072555	Refactoring setup_pre_planes function. Removing set_refs, adding set_ref function. Change-Id: I5635c478b106ae4e57d317f1c83d929644307e63	2013-07-03 17:42:01 -07:00
Dmitry Kovalev	2ce6b23473	Merge "Adding write_skip_coeff function."	2013-07-03 16:33:58 -07:00
Jingning Han	68172dbede	Merge "Enable early termination in rd search"	2013-07-03 14:20:41 -07:00
Dmitry Kovalev	430bd0c94a	Merge "Replacing 64 / MI_SIZE with MI_BLOCK_SIZE."	2013-07-03 14:16:02 -07:00
Dmitry Kovalev	dda1835dc6	Adding write_skip_coeff function. Change-Id: I221126f22ab9067348eb0efb8a73b15a8f49c3fd	2013-07-03 13:23:47 -07:00
Jingning Han	2bd6fe08f8	Enable early termination in rd search This commit allows the encoder to track the cumulative rate-distortion cost per transformed block inside a partition. If the cumulative rd cost is already above the best rd value, it terminates the remaining operations and continues to the next prediction mode test. It reduces the runtime of bus at target bit-rate 2000 from 308 seconds to 266 seconds, i.e., about 13% speed-up at no performance penalty. Change-Id: I5f15a3d8955d97031d5653006027866a00654e7a	2013-07-03 12:54:18 -07:00
Dmitry Kovalev	2ad62c9312	Calling set_partition_seg_context() instead of code duplication. Change-Id: I65be6acc54c99688fd1f0c946cec3511514b8555	2013-07-03 11:15:58 -07:00
Dmitry Kovalev	5a21de8418	Replacing 64 / MI_SIZE with MI_BLOCK_SIZE. Change-Id: I32276552b3ea6dc1dce8e298be114cfe1019b31c	2013-07-03 10:54:50 -07:00
Dmitry Kovalev	60198a595d	Merge "Adding write_selected_txfm_size function."	2013-07-03 10:33:55 -07:00
Jingning Han	2cb75c9607	Refactor SSE2 8x8 functional units These serve as building blocks for SSE2 8x8 and 16x16 ADST/DCT hybrid transform coding. Change-Id: I4089a754c66e0c986f67d9b8ec4dfb9627ad430d	2013-07-03 10:11:59 -07:00
Ronald S. Bultje	61fe678f36	Merge "Use pmovmskb to skip quantize loops over empty coefficients."	2013-07-03 09:05:48 -07:00
Paul Wilkins	f58b44ad62	Fix to comp_inter_joint_search_thresh feature. When this is 0 (BLOCK_SIZE_AB4X4) we want to do the inter joint search for all sizes. Change-Id: Id40cd6fe7790e7e1165352b9cef5e12fa8c0bc88	2013-07-03 16:58:34 +01:00
Paul Wilkins	72c5778ec5	Added two new skip experiments. sf->unused_mode_skip_lvl. Tests modes as normal for all sizes at or below the given level. At larger sizes it skips all modes that were not chosen at any smaller size. Hence setting BLOCK_SIZE_SB64X64 is in effect off. Setting BLOCK_SIZE_AB4X4 will only consider modes that were chosen for one or more 4x4 blocks at larger sizes. sf->reference_masking. Do a test encode of the NONE partition at one size and create a reference frame mask based on the best rd choice. In the full search only allow this reference frame. Currently it is testing 64x64 and repeats this in the full search. This does not work well with Jim's Partition code just now and is disabled by default. Change-Id: I8f8c52d2ef4a0c08100150b0ea4155d1aaab93dd	2013-07-03 16:56:06 +01:00
Paul Wilkins	b0a2871c35	Merge "Adjust Speed 0 settings."	2013-07-03 02:47:18 -07:00
Dmitry Kovalev	1f6e95e76a	Merge "Removing redundant struct from union b_mode_info."	2013-07-02 18:09:31 -07:00
Dmitry Kovalev	be77f6bbbf	Removing redundant struct from union b_mode_info. Change-Id: I08fc6e474ff2c12cfa065bae4989c724276e2c83	2013-07-02 16:51:57 -07:00
Dmitry Kovalev	edb060a77c	Adding write_selected_txfm_size function. Change-Id: I143b430b7c24a964ccd0ebb75944cf317a072214	2013-07-02 16:41:22 -07:00
Yaowu Xu	0d7b7c09cb	Added a speed feature use_square_partition_only This commit adds a speed feature where only square partitions are evaluated in partition picking. Enabling this feature at cpu-used 2 reduces encoding time by ~30%. loss of compression: -0.9% on cif set -1.23% on stdhd Change-Id: Ia6fad11210f0b78365abb889f9245604513be5b9	2013-07-02 16:40:15 -07:00
Ronald S. Bultje	e5fb4b61b6	Use pmovmskb to skip quantize loops over empty coefficients. If none of the 16 coefficients that we quantize per loop iteration are larger than the zbin, directly skip to the next round of coeffs, rather than doing a full quantize loop that will eventually result in 16 zeroes. This incurs a jump cost, but saves a lot of other work. 32x32 quant goes from 1349 -> 1184 cycles. The same approach yielded no significantly positive results for smaller transforms, so is not used there (8x8: 103 -> 101 cycles; 16x16: 302 -> 306 cycles). Change-Id: I8fca17dc2543fc8eed1dbcd5100145e3c3a9b647	2013-07-02 16:34:24 -07:00
Deb Mukherjee	37501d687c	Speed feature to binary search dir intramodes This speed feature will skip searching the directional intra prediction modes D63, D117, D27, D153 if the best intra mode so far is not one of the diagonal, horizontal or vertical directions closest to the respective directions being tested. In other words, this implements a sort of binary search in the angular domain. Speedup: about 9-10% Results: -0.05% only on derfraw300. Change-Id: I413584c41f2a3e8dabfbdeb40718c8fc4b1d63a2	2013-07-02 14:07:19 -07:00
Deb Mukherjee	66324d501a	Merge "Clean-up in forward update to use mapping tables"	2013-07-02 14:02:53 -07:00
Deb Mukherjee	8d3d2b76f3	Tx size selection enhancements (1) Refines the modeling function and uses that to add some speed features. Specifically, instead of using a flag use_largest_txfm as a speed feature, an enum tx_size_search_method is used, of which two of the types are USE_FULL_RD and USE_LARGESTALL. Two other new types are added: USE_LARGESTINTRA (use largest only for intra) USE_LARGESTINTRA_MODELINTER (use largest for intra, and model for inter) (2) Another change is that the framework for deciding transform type is simplified to use a heuristic count based method rather than an rd based method using txfm_cache. In practice the new method is found to work just as well - with derf only -0.01 down. The new method is more compatible with the new framework where certain rd costs are based on full rd and certain others are based on modeled rd or are not computed. In this patch the existing rd based method is still kept for use in the USE_FULL_RD mode. In the other modes, the count based method is used. However the recommendation is to remove it eventually since the benefit is limited, and will remove a lot of complications in the code (3) Finally a bug is fixed with the existing use_largest_txfm speed feature that causes mismatches when the lossless mode and 4x4 WH transform is forced. Results on derf: USE_FULL_RD: +0.03% (due to change in the tables), 0% encode time reduction USE_LARGESTINTRA: -0.21%, 15% encode time reduction (this one is a pretty good compromise) USE_LARGESTINTRA_MODELINTER: -0.98%, 22% encode time reduction (currently the benefit of modeling is limited for txfm size selection, but keeping this enum as a placeholder). USE_LARGESTALL: -1.05%, 27% encode-time reduction (same as existing use_largest_txfm speed feature). Change-Id: I4d60a5f9ce78fbc90cddf2f97ed91d8bc0d4f936	2013-07-02 13:54:00 -07:00
Deb Mukherjee	9c20cedd93	Clean-up in forward update to use mapping tables Uses mapping tables instead of complicated modulo/division operations for prob mapping for forward updates. No bit-stream or output change. Change-Id: Ifd9ce8ac1437835c305c94f64c18273c7a68f546	2013-07-02 12:48:20 -07:00
Ronald S. Bultje	3cc6eb7c00	Merge "Make get_coef_context() branchless."	2013-07-02 11:48:15 -07:00
Dmitry Kovalev	3140c443e4	Merge "Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h."	2013-07-02 11:31:35 -07:00
Yunqing Wang	f4bee75c2b	Merge "Add speed feature to disable splitmv"	2013-07-02 10:54:22 -07:00
Yunqing Wang	b12e060b55	Add speed feature to disable splitmv Added a speed feature in speed 1 to disable splitmv for HD (>=720) clips. Test result on stdhd set: 0.3% psnr loss and 0.07% ssim loss. Encoding speedup is 36%. (For reference: The test result on derf set showed 2% psnr loss and 1.6% ssim loss. Encoding speedup is 34%. SPLITMV should be enabled for small resolution videos.) Change-Id: I54f72b94f506c6d404b47c42e71acaa5374d6ee6	2013-07-02 10:02:34 -07:00
Jingning Han	b91a1586a3	Calculate rd cost per transformed block Compute the rate-distortion cost per transformed block, and accumulate the cost through all blocks inside a partition. This allows the encoder to detect if the cumulative rd cost is already above the best rd cost, thereby enabling early termination in the rate-distortion optimization search. Change-Id: I0a856367a9a7b6dd0b466e7b767f54d5018d09ac	2013-07-02 09:58:46 -07:00
Ronald S. Bultje	9df24b41ca	Merge "Update quantize SSSE3 SIMD to cover 32x32 transform case also."	2013-07-02 09:38:08 -07:00
Paul Wilkins	1319d9c077	Adjust Speed 0 settings. Remove the use of sf->comp_inter_joint_search_thresh from the baseline speed 0. Approx +0.4% on derf. Change-Id: Icc14db98909830f40e5ac66130d40e78d2e55c71	2013-07-02 15:42:14 +01:00
Paul Wilkins	b7cd01ed73	Revert "New motion threshold factor - speed feature." This reverts commit `1377278180`. Also fixes a spelling mistake. Change-Id: I5be8aa4d8d3c0323d4a6f41968a7b2c048949c3f	2013-07-02 15:06:40 +01:00
Yaowu Xu	9e408e3504	fix the mismatch again in cpu_used 2 Change-Id: Icc4f70f0b0f91c9e7d5d00eedd67841afe2f2679	2013-07-01 19:13:18 -07:00
Jim Bankoski	d4158283e7	use partitioning from last frame This cl converts use partition from last frame to do the following: if part is none,horz, vert -> try split if part != none and one of the children is not split - try none Change-Id: I5b6c659e35f3ac9f11c051b92ba98af6d7e8aa87 Signed-off-by: Jim Bankoski <jimbankoski@google.com>	2013-07-01 18:18:50 -07:00
Dmitry Kovalev	1ac0540296	Removing vp9_mbpitch.c, moving vp9_setup_block_dptrs to vp9_block.h. Change-Id: Ia547a5dd7650b771fd00edd673ab9f920270731c	2013-07-01 17:28:08 -07:00
Ronald S. Bultje	26b6318de8	Make get_coef_context() branchless. This should significantly speedup cost_coeffs(). Basically what the patch does is to make the neighbour arrays padded by one item to prevent an eob check in get_coef_context(), then it populates each col/row scan and left/top edge coefficient with two times the same neighbour - this prevents a single/double context branch in get_coef_context(). Lastly, it populates neighbour arrays in pixel order (rather than scan order), so we don't have to dereference the scantable to get the correct neighbours. Total encoding time of first 50 frames of bus (speed 0) at 1500kbps goes from 2min10.1 to 2min5.3, i.e. a 2.6% overall speed increase. Change-Id: I42bcd2210fd7bec03767ef0e2945a665b851df56	2013-07-01 16:34:10 -07:00
Yaowu Xu	ba3b2604f0	Merge "Quantize (64-bit only, for now) SSSE3 SIMD."	2013-07-01 15:58:57 -07:00
Dmitry Kovalev	6411228aca	Merge "Removing vp9_modecont.{h, c}."	2013-07-01 14:58:48 -07:00
Dmitry Kovalev	d9db0d96ec	Merge "Moving encoder subexp encoding functions to subexp.{h, c}."	2013-07-01 14:58:36 -07:00
Ronald S. Bultje	c8defcfdee	Update quantize SSSE3 SIMD to cover 32x32 transform case also. Encode time of bus (speed 0) 50 frames @ 1500kbps goes from 2min14.4 to 2min10.1, i.e. a 2.3% overall speed increase. Change-Id: I3699580e74ec26c7d24e03681bc47ba25ee1ee87	2013-07-01 11:36:33 -07:00
Ronald S. Bultje	7353ceab9d	Quantize (64-bit only, for now) SSSE3 SIMD. Total encoding time for first 50 frames of bus (speed 0) @ 1500kbps goes 2min34.8 to 2min14.4, i.e. a 10.4% overall speedup. The code is x86-64 only, it needs some minor modifications to be 32bit compatible, because it uses 15 xmm registers, whereas 32bit only has 8. Change-Id: I2df53770c2e850813ffa713e1a91b45b0082b904	2013-07-01 11:36:07 -07:00
Dmitry Kovalev	2ab3bc8871	Removing vp9_modecont.{h, c}. Moving vp9_default_inter_mode_probs array to vp9_entropymode.c. Change-Id: I88ebda86ccc07f2a43c6c01d4b37898214cfb6de	2013-07-01 10:17:15 -07:00
Paul Wilkins	7bb436feee	Merge "New motion threshold factor - speed feature."	2013-07-01 09:39:02 -07:00
Yaowu Xu	632289b31f	fix a mismatch in cpuused 2 Change-Id: I921c9faba6386535aaf717a54301dd346a9b8540	2013-07-01 08:54:50 -07:00
Paul Wilkins	1377278180	New motion threshold factor - speed feature. Added a speed feature that focuses only on thresholds for new motion modes. Moved sf->comp_inter_joint_search_thresh into speed 1. This has ~+0.4% impact on quality at speed 0 as our quality reference baseline. Slight adjustment to baseline thresholds. Change-Id: I7ebf104f1fe29af77ed4837b2e84be065621bbe5	2013-07-01 12:11:21 +01:00
Jingning Han	993942ce0c	Merge "Enable SSE2 4x4 ADST/DCT transform"	2013-06-29 15:57:04 -07:00
Christian Duvivier	466e0cf303	SSE2 version of vp9_short_fdct32x32_rd. 43,000 -> 5,750 cycles, about 7.5x faster. Change-Id: Ibfd92821b9603f4ed9c256e0ececec14fa4565d0	2013-06-29 13:53:00 -07:00
Dmitry Kovalev	bb8ccf1caf	Moving encoder subexp encoding functions to subexp.{h, c}. Change-Id: I83ca53bf6def871f199a382a671f26ad7cbecbca	2013-06-29 11:50:45 -07:00
Ronald S. Bultje	bc70c60b25	Merge "fixed a bug where sse is not populated"	2013-06-29 07:42:41 -07:00
Ronald S. Bultje	a487af8d35	Merge "Inline vp9_get_coef_context() (and remove vp9_ prefix)."	2013-06-28 19:37:11 -07:00
Ronald S. Bultje	7731e53839	Merge "Minor change to prevent one level of dereference in cost_coeffs()."	2013-06-28 19:36:56 -07:00
Jingning Han	1109b6b888	Enable SSE2 4x4 ADST/DCT transform This commit enables SSE2 4x4 forward hybrid transform. The runtime goes from 249 cycles down to 74 cycles. Overall around 2% speed-up at no compression performance change. Change-Id: Iad4d526346e05c7be896466c05500711bb763660	2013-06-28 17:24:43 -07:00
Yaowu Xu	f853e662b7	fixed a bug where sse is not populated Change-Id: I692d800af1f976c84a76f8bd66864c4b39540abc	2013-06-28 17:10:22 -07:00
Jingning Han	07b72ace70	Merge "Fix switch statement in 8x8 transform"	2013-06-28 16:49:59 -07:00
Dmitry Kovalev	59070f6e3c	Merge "Removing CONFIG_DEBUG checks on assertions."	2013-06-28 14:03:28 -07:00
Jingning Han	9def7f72a0	Fix switch statement in 8x8 transform Change-Id: I7c46354c4983feb5f6202c3ab4a1d9534da7e30f	2013-06-28 13:40:36 -07:00
Ronald S. Bultje	cee3bc6ffa	Merge "Some minor optimizations for cost_coeffs()."	2013-06-28 11:54:50 -07:00
Ronald S. Bultje	ec5d09b950	Merge "Make coefficient skip condition an explicit RD choice."	2013-06-28 11:54:28 -07:00
Ronald S. Bultje	d00b8e5f82	Inline vp9_get_coef_context() (and remove vp9_ prefix). Makes cost_coeffs() a lot faster: 4x4: 236 -> 181 cycles 8x8: 888 -> 588 cycles 16x16: 3550 -> 2483 cycles 32x32: 17392 -> 12010 cycles Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup. Change-Id: I16b8d595946393c8dc661599550b3f37f5718896	2013-06-28 10:40:21 -07:00
Dmitry Kovalev	0345fc3ad9	Merge "Decoder's code cleanup."	2013-06-28 10:38:54 -07:00
Dmitry Kovalev	8e6ce6bb9e	Removing CONFIG_DEBUG checks on assertions. Adding CHECK_MEM_ERROR macro to vp9_common.h and removing two duplicated ones from vp9_onyx_int.h and vp9_onyxd_int.h. Change-Id: I916afec61b3019f18193135dac7c35ed0f89b8b6	2013-06-28 10:36:20 -07:00
Ronald S. Bultje	e3ce2b2ab3	Minor change to prevent one level of dereference in cost_coeffs(). 4x4: 234 -> 236 cycles 8x8: 878 -> 888 cycles 16x16: 3664 -> 3550 cycles 32x32: 18134 -> 17392 cycles Change-Id: I37a51bfbb0060a3a54f09c6045c14a989811ed78	2013-06-28 10:29:07 -07:00
Ronald S. Bultje	91d223bd5c	Some minor optimizations for cost_coeffs(). Cycle timings for first 3 frames of bus (speed 0) at 1500kbps: 4x4: 298 -> 234 cycles 8x8: 1227 -> 878 cycles 16x16: 4906 -> 3664 cycles 32x32: 23426 -> 18134 cycles Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes from 3min0.7 to 2min51.6 seconds, i.e. 5.3% faster. Change-Id: I68a0e1b530b0563b84a67342cca4b45146077e95	2013-06-28 10:29:02 -07:00
Ronald S. Bultje	af660715c0	Make coefficient skip condition an explicit RD choice. This commit replaces zrun_zbin_boost, a method of biasing non-zero coefficients following runs of zero-coefficients to be rounded towards zero, with an explicit skip-block choice in the RD loop. The logic is basically that if individual coefficients should be rounded towards zero (from a RD point of view), the trellis/optimize loop should take care of it. If whole blocks should be zero (from a RD point of view), a single RD check is much more efficient than a complete serialization of the quantization loop. Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim. SIMD for quantize will follow in a separate patch. Results for other test sets pending. Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4	2013-06-28 10:28:49 -07:00
Yaowu Xu	1b5421f3c5	Merge "Minor cleanups"	2013-06-28 09:56:11 -07:00
Yaowu Xu	64bb996e03	Merge "Optimize partition search order"	2013-06-28 09:29:39 -07:00
Yaowu Xu	8b9eea0a34	Minor cleanups Change-Id: I379617c1c731a686b3f7e032b8805860c1055b12	2013-06-28 09:19:50 -07:00
Yaowu Xu	1374a06bd8	Optimize partition search order This commit changes the partition search order to allow checking of rectangular partitions to be done after square partitions. It also adds a speed feature to skip the rectangular partition check when NONE is better than SPLIT in the RD sense. This feature speeds up the encoder by roughly 1.5X, with a compression loss of -0.91% on the cif set and -0.56% on the stdhd set. Change-Id: I0d2d06993041aa9ea9073fcc39c54f73a127dfa4	2013-06-28 07:13:54 -07:00
Ronald S. Bultje	fd4eed3b08	Fix tile independence with both column tiling and static_thresh set. Change-Id: I0b2be0ec2c410a527f88b95a44f24ac967b2dac1	2013-06-27 21:56:40 -07:00
Dmitry Kovalev	3231da0a9e	Decoder's code cleanup. Using vp9_set_pred_flag function instead of custom code, adding decode_tokens function which is now called from decode_atom, decode_sb_intra, and decode_sb. Change-Id: Ie163a7106c0241099da9c5fe03069bd71f9d9ff8	2013-06-27 16:15:43 -07:00
Dmitry Kovalev	a3664258c5	Merge "General cleanup in segmentation-related code."	2013-06-27 14:57:07 -07:00
Dmitry Kovalev	be83ef3104	Merge "Moving subexp encoding functions in separate vp9_dsubexp.c file."	2013-06-27 14:55:18 -07:00
Ronald S. Bultje	7a049be6bf	Inline quantize so idiv instruction gets removed from inner loop. Encoding time of first 50 frames of bus @ 1500kbps (speed 0) goes from 3min15.0 to 3min10.9, i.e. 2.1% faster overall. Change-Id: If592ee99be09bcd34a7c8498347f44e7305e982c	2013-06-27 09:01:04 -07:00
Paul Wilkins	05ffdf2625	Merge "Auto adapt step size feature."	2013-06-27 02:28:41 -07:00
Paul Wilkins	59af9049d3	Merge "Start adaptive threshold for each mode at max."	2013-06-27 02:28:36 -07:00
Paul Wilkins	5bcf069c6b	Merge "Change meaning of cpi->sf.first_step and rename."	2013-06-27 02:28:21 -07:00
Jingning Han	fc1cfd8e32	Merge "Make intra predictor reference buffer configurable"	2013-06-26 19:02:02 -07:00
Jingning Han	861cb06c67	Make intra predictor reference buffer configurable This commit enables configurable reference buffer pointer for intra predictor. This allows later removal of spatial dependency between blocks inside a 64x64 superblock in the rate-distortion optimization loop. Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1	2013-06-26 17:17:21 -07:00
Jingning Han	a9e7243d1a	Merge "Remove empty function vp9_build_block_offsets"	2013-06-26 17:06:56 -07:00
Ronald S. Bultje	b5468155b6	Remove unused macro RDTRUNC_8x8 from encodemb.c. Change-Id: I0c097567adab24215d807963ccb34810a2afe007	2013-06-26 15:34:01 -07:00
Jingning Han	bd9bac0391	Remove empty function vp9_build_block_offsets This function is empty, hence is removed. Change-Id: Ia9d01710806bffe0398a6dc9405f8a5a81b27d74	2013-06-26 14:55:47 -07:00
Paul Wilkins	f36f66cd37	Merge "fixed a compiling problem with MSVC win32 build"	2013-06-26 11:58:16 -07:00
Paul Wilkins	9f3ab83486	Auto adapt step size feature. Also tweaks to other features and experiments with what is on and off at different speed settings. Change-Id: I3e1d0be0d195216bf17c2ac5df67f34ce0b306b2	2013-06-26 19:48:39 +01:00
Dmitry Kovalev	be07485e9a	General cleanup in segmentation-related code. Using consistent function and variable names. Change-Id: I2deb3fded8797453a2081836c9ce2e79ade06eb7	2013-06-26 10:27:28 -07:00
Dmitry Kovalev	49dee16879	Merge "Using get_plane_block_{width, height} instead of custom code."	2013-06-26 10:23:27 -07:00
Yaowu Xu	60dc7375af	fixed a compiling problem with MSVC win32 build The aligned array in the parameter list caused the win32 build to report a C2719 error. This commit fixes the issue by making the parameter type a pointer instead of an array. Change-Id: I4ed654ce4eba2db4995d9cdc136c68e9a6acc992	2013-06-26 09:33:16 -07:00
Paul Wilkins	689957e3ad	Start adaptive threshold for each mode at max. Each frame we reset all adaptive thresholds to MAX rather than base. As modes are picked their thresholds drop down. Change-Id: Ia37f03a73003c2d9bfcda57edea07205e9a0e5e8	2013-06-26 17:04:47 +01:00
Paul Wilkins	e606cac046	Change meaning of cpi->sf.first_step and rename. Renamed cpi->sf.first_step to cpi->sf.reduce_first_step_size and changed its meaning such that it is a delta applied to reduce the default first step size (>> x) in the motion search rather than an absolute value. The default first step size is already changed according to the image dimensions (smaller for smaller images). cpi->sf.reduce_first_step_size now applies a further correction from the default. Change-Id: Ia94e08bc24c67b604831f980909af7e982fcd16d	2013-06-26 17:04:06 +01:00
John Koleszar	8137e24f3d	Merge "Move vp9_counts_to_nmv_context to encoder"	2013-06-25 22:44:21 -07:00
John Koleszar	7bbb0633cd	Merge "Move vp9_full_to_model_counts to encoder"	2013-06-25 22:44:16 -07:00
Jingning Han	3cc8c8c3a0	Merge "Refactor intra predictor block"	2013-06-25 19:46:55 -07:00
Jingning Han	d19ea3861d	Refactor intra predictor block Remove vp9_intra4x4_predict(). Use the common intra prediction function for all block sizes. Change-Id: Ibd19d51dfa3da8bbdfb79ddeb81530b2e2089560	2013-06-25 16:33:13 -07:00
Dmitry Kovalev	6fb10f2de4	Renaming "nmv" to "mv". Change-Id: I8299f55c3b930221e52c2237f2ddea65b94fd33b	2013-06-25 15:19:18 -07:00
Dmitry Kovalev	dc0f457c94	Using get_plane_block_{width, height} instead of custom code. Change-Id: I453ed11b965e857a14c18ea5c0f4a0a48e7dc0d9	2013-06-25 14:11:18 -07:00
Ronald S. Bultje	0441e0a2fc	Merge "Only do metrics on cropped (visible) area of picture."	2013-06-25 13:51:18 -07:00
Ronald S. Bultje	1d0ae2e63c	Merge "Don't skip right/bottom border pixels in SSIM calculations."	2013-06-25 13:51:04 -07:00
Ronald S. Bultje	c5be54eef3	Merge "Add averaging-SAD functions for 8-point comp-inter motion search."	2013-06-25 13:50:53 -07:00
Jingning Han	d52c359d43	Merge "Tune the rounding operations in 8x8 ADST/DCT sse2"	2013-06-25 13:17:05 -07:00
Ronald S. Bultje	450c7b57a8	Only do metrics on cropped (visible) area of picture. The part where we align it by 8 or 16 is an implementation detail that shouldn't matter to the outside world. Change-Id: I9edd6f08b51b31c839c0ea91f767640bccb08d53	2013-06-25 12:57:28 -07:00
Ronald S. Bultje	44f349df62	Don't skip right/bottom border pixels in SSIM calculations. Change-Id: I75acb55ade54bef6ad7703ed5e691581fa2f8fe1	2013-06-25 12:57:28 -07:00
Ronald S. Bultje	c24d922396	Add averaging-SAD functions for 8-point comp-inter motion search. Makes first 50 frames of bus @ 1500kbps encode from 3min22.7 to 3min18.2, i.e. 2.3% faster. In addition, use the sub_pixel_avg functions to calc the variance of the averaging predictor. This is slightly suboptimal because the function is subpixel-position-aware, but it will (at least for the SSE2 version) not actually use a bilinear filter for a full-pixel position, thus leading to approximately the same performance compared to if we implemented an actual average-aware full-pixel variance function. That gains another 0.3 seconds (i.e. encode time goes to 3min17.4), thus leading to a total gain of 2.7%. Change-Id: I3f059d2b04243921868cfed2568d4fa65d7b5acd	2013-06-25 12:57:28 -07:00
Jingning Han	0084e61d5f	Tune the rounding operations in 8x8 ADST/DCT sse2 Improve the round-trip precision to meet the unit test settings. Change-Id: I303febae56b4b990ea3798b8ebed94c0510ecf79	2013-06-25 12:02:26 -07:00
Ronald S. Bultje	5ebe47747d	Merge "Don't re-allocate comp_pred buffers for each call to comp motion search."	2013-06-25 12:00:36 -07:00
Dmitry Kovalev	9467571777	Moving subexp encoding functions in separate vp9_dsubexp.c file. Change-Id: Idbb2ea80f764fa830fe2ddcfc54ef7fe232f05a8	2013-06-25 11:53:17 -07:00
Dmitry Kovalev	5ae096778e	Merge "Removing unused code."	2013-06-25 11:50:55 -07:00
Jingning Han	cd6932db77	Merge "Add 8x8 dct/adst unit tests"	2013-06-25 11:21:17 -07:00
Dmitry Kovalev	87ee34aacb	Removing unused code. Removing block index (ib) parameter from get_tx_type_{8x8, 16x16} functions. Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1	2013-06-25 10:17:19 -07:00
Dmitry Kovalev	70e9622185	Merge "Removing find_seg_id and using vp9_get_pred_mi_segid instead."	2013-06-25 10:16:06 -07:00
Dmitry Kovalev	529679bd52	Merge "Transforming scale_mv_component_q4 into scale_mv_q4 function."	2013-06-25 10:15:33 -07:00
Jingning Han	ab362621fe	Add 8x8 dct/adst unit tests This commit enables 8x8 DCT and hybrid transform unit tests. It also tunes the forward hybrid transform rounding operations for more precise round-trip performance. Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3	2013-06-25 09:57:01 -07:00
Jingning Han	67365520e7	Merge "Use aligned buffer operations in 8x8/16x16 2D-DCT"	2013-06-25 09:49:03 -07:00
Yaowu Xu	b9c934df8e	Merge "Enable sse2 implementation of 8x8 ADST/DCT"	2013-06-25 09:13:22 -07:00
Jingning Han	82d504b50f	Use aligned buffer operations in 8x8/16x16 2D-DCT This reduces 16x16 2D-DCT runtime from 865 cycles to 837 cycles. Change-Id: I137758b81cd127b936175284310e81378db64552	2013-06-24 19:56:23 -07:00
Jingning Han	a32a086d23	Enable sse2 implementation of 8x8 ADST/DCT This commit makes use of the butterfly structure to enable the sse2 version implementation of 8x8 ADST/DCT hybrid transform coding. The runtime of hybrid transform module goes down from 1170 cycles to 245 cycles. Overall speed-up around 1.5%. Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f	2013-06-24 18:41:33 -07:00
Yaowu Xu	e371cd73a3	change to enable use_largest_txform feature for all regular inter frames at speed 1 Change-Id: I0a8b301273ecf2b8730ab1f6b7a05f89f4d498e0	2013-06-24 16:43:26 -07:00
John Koleszar	4ecd6dbead	Move vp9_counts_to_nmv_context to encoder This function only used from within vp9_encodemv.c. Change-Id: Ib3fc7c30b1e2d27321397ac474cbc8976bc1f4b1	2013-06-24 15:58:18 -07:00
John Koleszar	08b1798ae7	Move vp9_full_to_model_counts to encoder This function is not called from the decoder, so it doesn't need to be in common/. Change-Id: I6977dd462a25b4ff39c9c7e1b0b5b16aa58ee733	2013-06-24 15:46:15 -07:00
Ronald S. Bultje	4dc70fa7f9	Don't re-allocate comp_pred buffers for each call to comp motion search. Instead, just allocate a few bytes on the stack, this is 4k, which isn't all that much. Change-Id: I82af6ee89e6ed01faaa23ff891ee7ced76df8c16	2013-06-24 14:05:13 -07:00
Dmitry Kovalev	f27f76dfb3	Transforming scale_mv_component_q4 into scale_mv_q4 function. Using MV instead of int_mv for function arguments. Change-Id: Ic25e13dccbc98fac1fa1b3255127e00cca2a57f6	2013-06-21 15:34:29 -07:00
Ronald S. Bultje	fc033b38ee	Remove emms - that shouldn't be there. Change-Id: I8fcab81e390f93dc17e9666bbf8f77883b5aa897	2013-06-21 14:45:04 -07:00
Dmitry Kovalev	40141681c0	Removing find_seg_id and using vp9_get_pred_mi_segid instead. Change-Id: Ia40229903c08f14020e90e94cfdf494aba1be827	2013-06-21 13:05:10 -07:00
Ronald S. Bultje	ba42c02654	Add missing SECTION .text marker in assembly file. Fixes a crash on Windows when building with MSVC. Change-Id: I124ac756a1be55d190fadda5fcc46d23b1445dbf	2013-06-21 12:55:46 -07:00
Ronald S. Bultje	54b2a59623	Implement SSE2 block_error. Change vp9_block_error() to return a 64bit error variable, change all callers to expect a 64bit return value (this will prevent overflows, which we basically don't check for at all right now). Remove duplicate block_error() function, which fixed that through truncation. Remove old (incompatible) mmx/sse2 block_error SIMD versions and replace with a new one that returns a 64bit value. Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to 3min23, i.e. a 3% overall speedup. Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68	2013-06-21 12:54:52 -07:00
Ronald S. Bultje	7756e9892b	Merge "Add subtract_block SSE2 version and unit test."	2013-06-21 12:49:50 -07:00
Ronald S. Bultje	9a480482cb	Merge "SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance()."	2013-06-21 12:49:43 -07:00
Ronald S. Bultje	25c588b1e4	Add subtract_block SSE2 version and unit test. 3% faster overall (3min35.0 to 3min28.5). Change-Id: I5ff8a5c2c91586b6632ca5009ad1ea51ce94af5e	2013-06-21 09:35:37 -07:00
Yaowu Xu	869d770610	Merge "Get some speed back for cpuused 1"	2013-06-20 22:37:01 -07:00
Yaowu Xu	45e25a7814	Get some speed back for cpuused 1 and remove unused code. Change-Id: If380440c4450294b5450b7a9eeb94a376846ec01	2013-06-20 19:05:18 -07:00
Yaowu Xu	61721181ec	Merge "rename variables to avoid build error in MSVC"	2013-06-20 19:04:30 -07:00
Yaowu Xu	ee07a261a0	rename variables to avoid build error in MSVC Change-Id: I7960178c95c54d5c4497e44cfc8c493566294b34	2013-06-20 18:31:48 -07:00
Yaowu Xu	e6cd5ed307	Merge "Implement sse2 and ssse3 versions for all sub_pixel_variance sizes."	2013-06-20 17:42:50 -07:00
Ronald S. Bultje	1e6a32f1af	SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance(). Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to 3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions which use a bilinear filter (x_offset & 7 \|\| y_offset & 7) aren't perfectly interleaved, and can probably be improved further in the future. I've marked this with a few TODOs/FIXMEs in the code. Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9	2013-06-20 15:59:48 -07:00
Dmitry Kovalev	8283d893eb	Merge "Renaming 'nmv' to 'mv' for several functions."	2013-06-20 10:17:12 -07:00
Deb Mukherjee	7947a33d72	Improving model rd with variance and quant step Improves the rd modeling function and implements them using interpolation from a table which is a little faster. Also uses sse as input to the modeling function rather than var - since there is no dc prediction used and as a result the sse works a little better. derfraw300: +0.05% Speedup: ~1% Change-Id: I151353c6451e0e8fe3ae18ab9842f8f67e5151ff	2013-06-20 10:06:28 -07:00
Jim Bankoski	9f2a1ae23e	adds force partitioning greater than or less than block size adds a new speed feature to force partitioning to be greater than or less than a certain size Change-Id: I8c048eeeef93700ae822eccf98f8751a45b2e7d0	2013-06-20 09:51:42 -07:00
Jim Bankoski	18bdf708e7	adds a set partitioning to speed features this feature lets you set a partitioning size to be used by the entire frame. Change-Id: I208a4c8c701375cbb054418266f677768b6f8f06	2013-06-20 09:50:44 -07:00
Jim Bankoski	476d73d294	partition by variance using var from last frame This uses variance to split partition. Variance is calculated using nearest mv, always from last ref frame. Change-Id: Idd015b4a9aa3bc82591759eac239680c07496896	2013-06-20 09:48:22 -07:00
Jim Bankoski	1f94b97694	convert all speed things to speed features Change-Id: Ie24489a4d39f3e53e816eeebf75a1c9c7d94515a	2013-06-20 09:42:44 -07:00
Jim Bankoski	727fa7b1e4	new partition via variance Change-Id: Ideee45cad8b38087c509cd404484728e85d0c427	2013-06-20 09:42:05 -07:00
Jim Bankoski	0fad6a9d99	fix to set up new speed feature This uses the speed feature functionality for code. Change-Id: I9cd16c0c5f98520ae27ebba81aa2c178546587f8	2013-06-20 09:35:02 -07:00
Jim Bankoski	df2314cfdd	don't copy partitions for key frames or altrefs force us to go through slow partitioning for keyframes, altref and overlays. Change-Id: I1a286361bf74083e71973575a7296be46eb98742	2013-06-20 09:34:32 -07:00
Ronald S. Bultje	8fb6c58191	Implement sse2 and ssse3 versions for all sub_pixel_variance sizes. Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 -> 3min58). Specific changes to timings for each function compared to original assembly-optimized versions (or just new version timings if no previous assembly-optimized version was available): sse2 4x4: 99 -> 82 cycles sse2 4x8: 128 cycles sse2 8x4: 121 cycles sse2 8x8: 149 -> 129 cycles sse2 8x16: 235 -> 245 cycles (?) sse2 16x8: 269 -> 203 cycles sse2 16x16: 441 -> 349 cycles sse2 16x32: 641 cycles sse2 32x16: 643 cycles sse2 32x32: 1733 -> 1154 cycles sse2 32x64: 2247 cycles sse2 64x32: 2323 cycles sse2 64x64: 6984 -> 4442 cycles ssse3 4x4: 100 cycles (?) ssse3 4x8: 103 cycles ssse3 8x4: 71 cycles ssse3 8x8: 147 cycles ssse3 8x16: 158 cycles ssse3 16x8: 188 -> 162 cycles ssse3 16x16: 316 -> 273 cycles ssse3 16x32: 535 cycles ssse3 32x16: 564 cycles ssse3 32x32: 973 cycles ssse3 32x64: 1930 cycles ssse3 64x32: 1922 cycles ssse3 64x64: 3760 cycles Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d	2013-06-20 09:34:25 -07:00
Jim Bankoski	f954490bbf	disable speed > 1 speed corrections in firstpass need to rework these Change-Id: I17dc2c88d2faadd2f8fb117c52c25f04ea2e9856	2013-06-20 09:34:03 -07:00
Jim Bankoski	fbcce4dd6f	Merge "copy partitioning from last frame"	2013-06-20 09:32:43 -07:00
Jim Bankoski	f033b44e74	copy partitioning from last frame Change-Id: I26e80ede80cb4389378a95afa95d229092a9859a	2013-06-20 09:32:19 -07:00
Yunqing Wang	3656835771	Merge "Add two-pass quantization"	2013-06-19 11:35:40 -07:00
Yunqing Wang	b5bf7b13a8	Add two-pass quantization Optimized the quantization function by making it a two-pass process. The first pass does a quick check of the transform coefficients against the base ZBIN, and keeps only the good-enough set of coefficients for quantization. A skipping check is added. If all coefficients are within the base ZBIN, no quantization is needed. The second pass is the actual quantization pass, which only processes the coefficient subset determined in the first pass. This reduces the computation. Furthermore, an alternative method is used for large transform sizes, which often have sparse nonzero quantized coefficients. Overall, the encoder speedup is about 4%. The quantization function itself gets 20% faster. Change-Id: I3a9dd0da6db030260b6d9c314a9fa48ecae89f22	2013-06-19 10:35:02 -07:00
Yaowu Xu	12180c8329	Remove unnecessary copying of probs. Change-Id: Ic924f07c6ab0c929c6cdf11880d3c625806e272c	2013-06-18 23:02:27 -07:00
Dmitry Kovalev	87e1fa7627	Renaming 'nmv' to 'mv' for several functions. Change-Id: I183a38997a9d01e4a1b869e92509f6915216fa09	2013-06-18 18:28:10 -07:00
Jingning Han	7088426976	Merge "Make fdct32 computation flow within 16bit range"	2013-06-18 11:40:14 -07:00
Dmitry Kovalev	dfc0385291	Merge "Removing vp9_invtrans.{c, h} files."	2013-06-18 10:16:25 -07:00
Jingning Han	a41a4860c0	Make fdct32 computation flow within 16bit range This commit makes use of dual fdct32x32 versions for rate-distortion optimization loop and encoding process, respectively. The one for rd loop requires only 16 bits precision for intermediate steps. The original fdct32x32 that allows higher intermediate precision (18 bits) was retained for the encoding process only. This allows speed-up for fdct32x32 in the rd loop. No performance loss observed. Change-Id: I3237770e39a8f87ed17ae5513c87228533397cc3	2013-06-18 09:46:24 -07:00
Ronald S. Bultje	d9fc451666	Move subpixel variance function from common/ to encoder/. This seems to only be used in the encoder. Also remove an empty wrapper file that contained forward declarations for this function, but didn't actually define any actual functions. Change-Id: Ifc561eef7ebe374a7d03698055e51e105f6d614b	2013-06-17 16:54:09 -07:00
Dmitry Kovalev	686b99741c	Removing vp9_invtrans.{c, h} files. Moving single function from vp9_invtrans.c to vp9_encodemb.c. Change-Id: I26bf6bb90de342a3036c0dbfba78a7dd75a61fe7	2013-06-17 16:09:03 -07:00
Ronald S. Bultje	a2f33e2505	Use assembly-optimized variance functions in sub_pixel_{avg}_var(). 2.5% faster when encoding first 50 frames of bus @ 1500kbps. Change-Id: I5a64703996cf7fd39b07e32c72311c4b125ec6d4	2013-06-17 14:57:13 -07:00
Ronald S. Bultje	53729c7786	Fix typo ('weight' instead of 'width'). Change-Id: I5d3944051d091b4bf3eb13e2a30132d34203ef74	2013-06-17 13:56:24 -07:00
John Koleszar	c2da365484	Merge "Remove constant vp9_coef_update_prob table"	2013-06-14 17:07:19 -07:00
John Koleszar	0f7a66e962	Remove constant vp9_coef_update_prob table All elements of this table are equal to 252, so replace it with a single constant VP9_COEF_UPDATE_PROB. Change-Id: I1e2d1d284326ce6df9899a740c2fc344b3ec81c9	2013-06-14 15:12:31 -07:00
Jingning Han	0b7910b9ff	Merge "Enable sse2 version of sad8x4/4x8"	2013-06-14 13:15:49 -07:00
Jingning Han	c43af9a8a3	Enable sse2 version of sad8x4/4x8 The encoding time for bus at CIF goes from 661s to 625s. This commit also enabled unit test of sad8x4/4x8 in sad_test.cc. Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1	2013-06-14 09:19:28 -07:00
Deb Mukherjee	4ad96115cd	Some cleanups in rd motion search No bitstream or output change - only cosmetics. Change-Id: Ic8c1d7ad010a87dcf27d12a38cd7dd5adba683a7	2013-06-13 17:25:23 -07:00
Jingning Han	15f50e7b42	Enable sse2 version of sad8x4/4x8 The encoding time for bus at CIF goes from 661s to 625s. This commit also enabled unit test of sad8x4/4x8 in sad_test.cc. Change-Id: If3d10ebb56bda584bdb69bcf056599d580b12cb1	2013-06-13 16:18:18 -07:00
Ronald S. Bultje	fa96eeb835	Implement SSE version for sad4x8x4d and SSE2 version for sad8x4x4d. Encoding time of crew (CIF, first 50 frames) @ 1500kbps goes from 4min56 to 4min42. Change-Id: I92c0c8b32980d2ae7c6dafc8b883a2c7fcd14a9f	2013-06-12 17:40:01 -04:00
Ronald S. Bultje	b55f8b696a	Merge "Fix row tiling."	2013-06-12 12:41:57 -07:00
John Koleszar	ad3b12f857	Merge "Fix chroma output when scaling"	2013-06-12 12:39:10 -07:00
Ronald S. Bultje	8a0808a145	Fix row tiling. Change-Id: I57be4eeaea6e4402f6a0cc04f5c6b7a5d9aedf9b	2013-06-12 13:42:59 -04:00
John Koleszar	01016ff9a6	Fix chroma output when scaling The encode-side scaling was not indexing through the image correctly for the chroma planes, causing a green checkerboard-like output in the unit test. Change-Id: I9abbd73615404cd6699588be3e64dcf59005bc14	2013-06-12 10:11:53 -07:00
John Koleszar	d0ed677a34	Merge branch 'master' into experimental Change-Id: Ie648398b82f7311143709f55c0e30ba452f50eff	2013-06-11 16:29:28 -07:00
Deb Mukherjee	e3d3ace314	Merge "Minor change in forward updates" into experimental	2013-06-11 12:48:41 -07:00
Deb Mukherjee	a4d906c132	Minor change in forward updates Removes the case of coding prob = 0 for forward updates, since that is not an allowed probability to code. Slightly improves efficiency but may not matter in practice. Change-Id: I3b4caf82e8f0891992f0706d4089cc5a27568dba	2013-06-11 10:33:07 -07:00
Jim Bankoski	fca6c82b29	Fix rd partition search for corner blocks This commit enables proper partition type search for the bottom- right corner blocks. Change-Id: Id1123d0e4e81eba648ed4f3c0c7ab587e174f650	2013-06-11 09:29:21 -07:00
Deb Mukherjee	f18328cbf1	Adds a zero check in model_rd function Avoids divide-by-zero when variance is 0. Change-Id: I3c7f526979046ff7d17714ce960fe81d6e1442a0	2013-06-10 17:04:47 -07:00
John Koleszar	9b78ed8229	Merge "Using network byte order (big-endian) to encode tile size." into experimental	2013-06-10 16:48:11 -07:00
Deb Mukherjee	51a7c7631d	Merge "New probs for filters/tx_size and a few others" into experimental	2013-06-10 16:39:43 -07:00
Deb Mukherjee	a43ff15399	New probs for filters/tx_size and a few others * New probs for subpel filters/tx_count * Makes a change to not reset to defaults for the tx_size probs if an intermediate frame reverts to using a fixed tx_size. * A few updates to the parameters for backward adaptation for mode/mv * some cosmetic cleanups derf300: +0.06% Change-Id: I22994d659bc31ca7a4fc8820fde24001e64a2920	2013-06-10 16:38:47 -07:00
Dmitry Kovalev	85381e3416	Using network byte order (big-endian) to encode tile size. This is consistent with uncompressed header encoding. Change-Id: Iccf40a44b493ed36ee085b81ed56f7952cde70a9	2013-06-10 16:13:08 -07:00
John Koleszar	0fcb625e35	Remove remnants of VP8 profiles/versions Remove the bilinear filter mode, and the no-loopfilter mode, and the related vp9_setup_version() function. Change-Id: I32311367812faf37863131df3af37d63d03973d7	2013-06-10 15:55:03 -07:00
John Koleszar	2f3cbfdde1	Merge "Fix use of get_uv_tx_size in loopfilter" into experimental	2013-06-10 12:17:11 -07:00

1453 Commits