generic-library/vpx

Author	SHA1	Message	Date
Jingning Han	d05f66aa10	SSE2 16x16 inverse ADST/DCT hybrid transform This commit enables SSE2 implementation of 16x16 inverse ADST/DCT hybrid transform. The runtime goes from 5742 cycles -> 1821 cycles. This provides about 1% encoding speed-up at speed 0. Change-Id: I1678d0988bf30b9efd524877705bbb3645edb17b	2013-07-16 12:51:42 -07:00
James Zern	c0562d08f6	Merge "VP[89]_COMMON: remove unused near_boffset"	2013-07-16 12:17:04 -07:00
James Zern	63e914bde4	Merge "VP9_COMMON: remove unused framerate/bitrate"	2013-07-16 12:16:37 -07:00
James Zern	3a7c2665d0	Merge "yv12config: remove YUV_TYPE"	2013-07-16 12:16:04 -07:00
Ronald S. Bultje	58a2005367	Merge "Replace generated quant tables with static lookup tables."	2013-07-16 12:07:17 -07:00
Ronald S. Bultje	e965cccce5	Replace generated quant tables with static lookup tables. This prevents possible float rounding issues between architectures. Change-Id: I6ed260aebd49feb4cfb5596a5370c44be5f72167	2013-07-16 12:06:26 -07:00
John Koleszar	cc1aac1b3c	Merge "Fix above context pointers"	2013-07-16 11:23:38 -07:00
Jingning Han	5851904744	Merge "SSE2 8x8 inverse ADST/DCT transform"	2013-07-16 11:00:11 -07:00
Dmitry Kovalev	baf0c959c7	Moving vp9_kf_default_bmode_probs to vp9_entropymode.c. Removing vp9_modelcontext.c. Change-Id: If2316c58dead2708d9f95b52d9494ba4c1dd7427	2013-07-16 10:54:34 -07:00
Dmitry Kovalev	863138a2ad	Rewriting vp9_set_pred_flag_{seg_id, mbskip}. Making implementation of vp9_set_pred_flag_{seg_id, mbskip} consistent with vp9_get_segment_id without using confusing sub(a, b) macro. Passing mi_row and mi_col to functions explicitly instead of relying on mb_to_right_edge and mb_to_bottom_edge. Change-Id: I54c1087dd2ba9036f8ba7eb165b073e807d00435	2013-07-16 10:44:48 -07:00
Paul Wilkins	30d2ea45ce	Minor cleanup in code to find uv tx_size. Change-Id: I94b97a966b5efbc9a243048f1f5ddbbdc4b1846e	2013-07-16 18:27:33 +01:00
John Koleszar	5efd9609e3	Fix above context pointers In the prior code, the above context pointers used for entropy decoding were initialized on the first frame, and not updated when the frame size changed. The per-frame code which initializes the contexts assumes that the contexts are contiguous, leading to an incomplete initialization when the frame is smaller. This commit updates the pointers so that the context is contiguous whenever the frame size changes. Change-Id: I08b53e3a30c8289491212311682ff1b8028cff6c	2013-07-16 10:26:56 -07:00
Johann	90ebfe621f	Merge "vp9_convolve8_[horiz|vert]_avg"	2013-07-16 09:42:52 -07:00
Jingning Han	dd97c62ab8	Merge "Skip inter-coded block reconstruction in rd loop"	2013-07-16 09:03:38 -07:00
Dmitry Kovalev	e8e7620a1f	Merge "Removing and moving around constant definitions."	2013-07-16 00:52:53 -07:00
Yaowu Xu	c5b0cd8405	Merge "Change to extend full border only when needed"	2013-07-15 21:35:32 -07:00
Yaowu Xu	5b915ebd92	Change to extend full border only when needed This is a short term optimization till we work out a decoder implementation requiring no frame border extension. Change-Id: I02d15bfde4d926b50a4e58b393d8c4062d1be70f	2013-07-15 20:52:13 -07:00
Dmitry Kovalev	ca75f1255f	Removing and moving around constant definitions. Removing unused and duplicated constants, moving them from .h to .c if possible. Change-Id: Ief4d6b984a3ca2e9b38504f0d855ed072cf7133f	2013-07-15 19:26:30 -07:00
Dmitry Kovalev	65762849d1	Merge "Consistent naming for loop-filter filters."	2013-07-15 19:21:32 -07:00
Johann	6eae37f45c	Merge "Remove print_nmvcounts"	2013-07-15 18:43:41 -07:00
Ronald S. Bultje	b02c4d364f	Increase border size from 96 to 160. This is required because upon downscaling, if a motion vector points partially into the UMV (e.g. all minus 1 of 64+7 pixels, i.e. 70), then we can point up to 140 pixels into the larger-resolution (2x) reference buffer UMV, which means the UMV for reference buffers in downscaling needs to be 140 rounded up to the nearest multiple of 32, i.e. 160. Longer-term, we should probably handle the UMV differently by detecting edge coverage on-the-fly and using a temporary buffer for edge extensions instead of adding 160 pixels on all sides of the image (which means a CIF image uses 3x its own area size for borders). Change-Id: I5184443e6731cd6721fc6a5d430a53e7d91b4f7e	2013-07-15 17:30:57 -07:00
Ronald S. Bultje	1ff94fea56	Inline vp9_quantize() in xform_quant(). Cycle times: 4x4: 151 to 131 cycles (15% faster) 8x8: 334 to 306 cycles (9% faster) 16x16: 1401 to 1368 cycles (2.5% faster) 32x32: 7403 to 7367 cycles (0.5% faster) Total encode time of first 50 frames of bus @ 1500kbps (speed 0) goes from 1min39.2 to 1min38.6, i.e. a 0.67% overall speedup. Change-Id: I799a49460e5e3fcab01725564dd49c629bfe935f	2013-07-15 17:30:57 -07:00
Ronald S. Bultje	7e684e2009	Merge "Inline xform_quant() in encode_block_intra()."	2013-07-15 17:29:39 -07:00
Frank Galligan	ce1d69aed9	Merge "Neon: Update mbfilter if all vectors follow one branch."	2013-07-15 17:11:55 -07:00
Dmitry Kovalev	e973b4e2d9	Consistent naming for loop-filter filters. Renaming flatmask4 to flat_mask4, flatmask5 to flat_mask5, hevmask to hev_mask, filter to filter4, mbfilter to filter8, wide_mbfilter to filter16. Change-Id: Ic61c73e59c2eee505257584867aafac99833cea1	2013-07-15 16:01:31 -07:00
Ronald S. Bultje	6fb418741f	Inline xform_quant() in encode_block_intra(). Also inline some of the block calculations to assist the compiler to not do silly things like calculating the same offset (or converting between raster/transform block offset or block, mi and pixel unit) many, many, many times. Cycle times: 4x4: 584 -> 505 cycles (16% faster) 8x8: 1651 -> 1560 cycles (6% faster) 16x16: 7897 -> 7704 cycles (2.5% faster) 32x32: 16096 -> 15852 cycles (1.5% faster) Overall, this saves about 0.5 seconds (1min49.8 -> 1min49.3) on the first 50 frames of bus (speed 0) @ 1500kbps, i.e. 0.5% overall. Change-Id: If3dd62453f8e2ab9d4ee616bc4ea956fb8874b80	2013-07-15 16:00:42 -07:00
Dmitry Kovalev	2c31729839	Code cleanup inside vp9_decodeframe.c. Removing unused DEC_DEBUG define and dec_debug variable. Changing function signatures to eliminate code duplication, renaming function mb_init_dequantizer to init_dequantizer. Also removing redundant curly braces, and comments. Change-Id: Ia56ee1b0be5f24abb0e878581845be8a4773c298	2013-07-15 14:47:25 -07:00
Frank Galligan	f4f60f6005	Neon: Update mbfilter if all vectors follow one branch. Change the mbfilter Neon code from executing both branches if all vectors follow only one branch. The code is about 5% faster when executing only one branch and about 1% slower when executing both branches. -PS5: Remove local stack space from mbfilter. Change-Id: I6a23f9b318a9f4568a2718b4c9348db988fe2182	2013-07-15 13:08:28 -07:00
Jingning Han	6094bf37c5	Cosmetic changes in 4x4 and 8x8 fdct unit tests Make the code consistent with conventions. Change-Id: Id044ed8382f83a3c3f54f9edd569f00bcd0523db	2013-07-15 11:37:17 -07:00
Jingning Han	043e0f9dad	Skip inter-coded block reconstruction in rd loop Skip the inverse transform and reconstruction of inter-mode coded blocks in the rate-distortion optimization loop, when skip_encode_sb feature is turned on. This provides about 1% speed-up at speed 0, and 1.5% speed-up at speed 1. No performance change in either setting. Change-Id: I2932718bf4d007163702b61b16b6ff100cf9d007	2013-07-15 11:32:14 -07:00
Jingning Han	faff6ed0fb	Skip duplicate block encoding in the rd loop This speed feature allows the encoder to largely remove the spatial dependency between blocks inside a 64x64 superblock, thereby removing the need to repeatedly encode superblocks per partition type in the rate-distortion optimization loop. A major challenge lies in the intra modes tested in the rate-distortion optimization loop. The subsequent blocks do not have access to the reconstructed boundary pixels without the intermediate coding steps. This was resolved by using the original pixels for intra prediction in the rd loop, followed by an appropriately designed distortion modeling on the quantization parameters. Experiments also suggested that the performance impact is more discernible at lower bit-rate/psnr settings. Hence a quantizer dependent threshold is applied to deactivate skip of block coding. For bus_cif at 2000 kbps, speed 0: runtime 269854ms -> 237774ms (12% speed-up) at 0.05dB performance loss. speed 1: runtime 65312ms -> 61536ms, (7% speed-up) at 0.04dB performance loss. This operation is currently turned on in settings of speed 1. Change-Id: Ib689741dfff8dd38365d8c1b92860a3e176f56ec	2013-07-15 11:08:58 -07:00
Dmitry Kovalev	1f14bbb624	Merge "Fixing vp9_get_pred_context_comp_ref_p function."	2013-07-15 10:51:42 -07:00
James Zern	04606d7258	vp9_loopfilter_intrin_sse2: make some funcs static + drop 'vp9_' Change-Id: I4a4bec175316aab8f65c3a23bacc8362399a1357	2013-07-13 18:48:00 -07:00
James Zern	dc968d3d45	vp9_loopfilter_intrin_sse2: remove unused uv funcs vp9_mbloop_filter_horizontal_edge_sse2 / vp9_mbloop_filter_vertical_edge_uv_sse2 Change-Id: I61c4351ef0cce79fa4156a47ddace781f1566869	2013-07-13 18:44:32 -07:00
James Zern	bd6b79c44d	vp9_loopfilter: remove uv function typedef loop_filter_uvfunction is unused Change-Id: I37eb3559e9eb2808f1f29dfea429441c94c9df2a	2013-07-13 18:38:28 -07:00
James Zern	9a4e175a64	filter_block_plane: reuse some constants + light const application + limit scope of params to build_lfi Change-Id: I1031c556aec160a690921dc10e7aa8a707f43ecd	2013-07-13 18:21:05 -07:00
James Zern	b09d37af0c	vp9_loopfilter.c: make some functions static + drop 'vp9_' Change-Id: I8c8f1f421f7fc84d2efb80349cd725de3c9bf6bd	2013-07-13 18:14:03 -07:00
James Zern	dc1d2331f6	vp9: remove frames_{since,till}.. from MACROBLOCKD frames_since_golden / frames_till_alt_ref_frame are unused. Change-Id: I348e7689d4d75412cf4de7703d885be942e4a26b	2013-07-13 18:02:11 -07:00
James Zern	04092764f7	VP9_COMMON: remove unused framerate/bitrate + VP8_COMMON: place them under CONFIG_POSTPROC_VISUALIZER Change-Id: I2702d5a3e1134b9c5f7ddc14b4173955a400f2cf	2013-07-12 21:43:23 -07:00
Jingning Han	91365addf8	SSE2 8x8 inverse ADST/DCT transform This commit enables SSE2 implementation of 8x8 inverse ADST/DCT transform. The runtime goes from 1216 cycles -> 266 cycles. For bus_cif at 2000 kbps, the overall runtime reduces from 253707ms -> 248430ms, i.e., 2% speed-up at speed 0. Change-Id: Ib0372e17e9162d7b11a10d653b1c8be547c878fb	2013-07-12 21:03:16 -07:00
James Zern	ce0324d8dd	VP[89]_COMMON: remove unused near_boffset Change-Id: If9b9ca703b997312df85241a0758d414cfdc5228	2013-07-12 19:41:27 -07:00
Dmitry Kovalev	429070987a	Using vp9_copy and vp9_zero instead of custom code. Change-Id: Id9b6ceeddca3f9b34bfada5c499b1e7a2f42c30b	2013-07-12 18:07:43 -07:00
Dmitry Kovalev	31a68bcdff	Fixing vp9_get_pred_context_comp_ref_p function. Adding missing parentheses around boolean expressions. Bitstream is changed. Regenerating test vectors. Change-Id: I4cc00b761e9473f92f180a9fc3a0c607f0aaae56	2013-07-12 17:46:02 -07:00
Dmitry Kovalev	31403080ff	Merge "Removing redundant call to set_mi_row_col."	2013-07-12 17:08:23 -07:00
Dmitry Kovalev	3c94fffdb0	Removing redundant call to set_mi_row_col. This function is actually called from set_offsets which is called right before vp9_read_mode_info. Change-Id: Ibb9d5ad606194bc80eab264fad85b31c9dfd8f77	2013-07-12 16:25:23 -07:00
Johann	a15bebfc0a	vp9_convolve8_[horiz|vert]_avg Super basic conversion from the other implementations. Any changes to one should be trivial to copy over to keep them in sync. Change-Id: I1720b4128e0aba4b2779e3761f6494f8a09d3ea8	2013-07-12 16:21:33 -07:00
Yaowu Xu	cdea4a7c66	Merge "Fix a build issue"	2013-07-12 16:17:22 -07:00
Dmitry Kovalev	aa518af8c7	Merge "Adding struct tx_probs and struct tx_counts to cleanup the code."	2013-07-12 16:02:09 -07:00
Dmitry Kovalev	444c8d4c53	Merge "Making functions read_{inter, intra}_segment_id more similar."	2013-07-12 15:50:02 -07:00
James Zern	c9a2a06c20	Merge "vp9_postproc: remove useless self-assign"	2013-07-12 15:41:41 -07:00
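
The border arithmetic described in commit b02c4d364f ("Increase border size from 96 to 160") can be checked with a short sketch. This is only an illustration of the numbers quoted in that commit message, not code from the library; the function names and parameters (`round_up`, `required_border`, `border_to_image_ratio`) are hypothetical, and the CIF dimensions of 352x288 are an assumption not stated in the log.

```python
def round_up(value, multiple):
    """Round value up to the nearest multiple of `multiple`."""
    return ((value + multiple - 1) // multiple) * multiple


def required_border(block=64, extension=7, scale=2, alignment=32):
    """Border needed when referencing a `scale`-times larger reference buffer.

    All parameter names are illustrative; the defaults are the figures
    quoted in the commit message.
    """
    # A motion vector may point all but one pixel of (64 + 7) into the border.
    max_overshoot = block + extension - 1      # 70
    # In the 2x reference buffer the same overshoot doubles.
    scaled_overshoot = max_overshoot * scale   # 140
    # Round up to the alignment the commit message uses (multiple of 32).
    return round_up(scaled_overshoot, alignment)


def border_to_image_ratio(width, height, border):
    """Ratio of border area to image area for a border on all four sides."""
    image_area = width * height
    padded_area = (width + 2 * border) * (height + 2 * border)
    return (padded_area - image_area) / image_area


border = required_border()
# Assumed CIF resolution: 352x288.
ratio = border_to_image_ratio(352, 288, border)
print(border, round(ratio, 2))
```

With these assumptions the border comes out at 160 pixels, and the border area for a CIF frame is roughly three times the image area, which matches the "3x its own area size" remark in the commit message.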

5732 Commits