generic-library/vpx

Author	SHA1	Message	Date
Ronald S. Bultje	d00b8e5f82	Inline vp9_get_coef_context() (and remove vp9_ prefix). Makes cost_coeffs() a lot faster: 4x4: 236 -> 181 cycles 8x8: 888 -> 588 cycles 16x16: 3550 -> 2483 cycles 32x32: 17392 -> 12010 cycles Total encode time of first 50 frames of bus (speed 0) @ 1500kbps goes from 2min51.6 to 2min43.9, i.e. 4.7% overall speedup. Change-Id: I16b8d595946393c8dc661599550b3f37f5718896	2013-06-28 10:40:21 -07:00
Ronald S. Bultje	af660715c0	Make coefficient skip condition an explicit RD choice. This commit replaces zrun_zbin_boost, a method of biasing non-zero coefficients following runs of zero-coefficients to be rounded towards zero, with an explicit skip-block choice in the RD loop. The logic is basically that if individual coefficients should be rounded towards zero (from a RD point of view), the trellis/optimize loop should take care of it. If whole blocks should be zero (from a RD point of view), a single RD check is much more efficient than a complete serialization of the quantization loop. Quality change: derf +0.5% psnr, +1.6% ssim; yt +0.6% psnr, +1.1% ssim. SIMD for quantize will follow in a separate patch. Results for other test sets pending. Change-Id: Ife5fa641163ac5150ac428011e87188f1937c1f4	2013-06-28 10:28:49 -07:00
Frank Galligan	1d6dc1b702	Add Neon optimized loop filter functions. - Added vp9_loop_filter_horizontal_edge_neon and vp9_loop_filter_vertical_edge_neon. - The functions are based off the vp8 loopfilter functions. - Matches x86 md5 checksum. Change-Id: Id1c4dddb03584227e5ecd29f574a6ac27738fdd0	2013-06-27 16:14:45 -07:00
Dmitry Kovalev	a3664258c5	Merge "General cleanup in segmentation-related code."	2013-06-27 14:57:07 -07:00
Dmitry Kovalev	be83ef3104	Merge "Moving subexp encoding functions in separate vp9_dsubexp.c file."	2013-06-27 14:55:18 -07:00
Jingning Han	fc1cfd8e32	Merge "Make intra predictor reference buffer configurable"	2013-06-26 19:02:02 -07:00
Jingning Han	4c10515f89	Merge "Make update_partition_context faster"	2013-06-26 19:01:45 -07:00
Yaowu Xu	896dc47cac	Merge "Change to use LUT for mode-to-txfm conversion"	2013-06-26 17:19:47 -07:00
Jingning Han	861cb06c67	Make intra predictor reference buffer configurable This commit enables configurable reference buffer pointer for intra predictor. This allows later removal of spatial dependency between blocks inside a 64x64 superblock in the rate-distortion optimization loop. Change-Id: I02418c2077efe19adc86e046a6b49364a980f5b1	2013-06-26 17:17:21 -07:00
Jingning Han	92479d9526	Make update_partition_context faster Use vpx_memset for updating the partition contexts. Thanks to Noah for pointing out the need of refactoring in this part. Change-Id: I67fb78429d632298f1cd8a0be346cc76f79392a6	2013-06-26 17:05:51 -07:00
Yaowu Xu	25fe05fd92	Change to use LUT for mode-to-txfm conversion Change-Id: Ieb989830f49e6708ee7728eddebf7a2144c37c6f	2013-06-26 14:10:43 -07:00
Dmitry Kovalev	be07485e9a	General cleanup in segmentation-related code. Using consistent function and variable names. Change-Id: I2deb3fded8797453a2081836c9ce2e79ade06eb7	2013-06-26 10:27:28 -07:00
John Koleszar	8137e24f3d	Merge "Move vp9_counts_to_nmv_context to encoder"	2013-06-25 22:44:21 -07:00
John Koleszar	7bbb0633cd	Merge "Move vp9_full_to_model_counts to encoder"	2013-06-25 22:44:16 -07:00
Jingning Han	3cc8c8c3a0	Merge "Refactor intra predictor block"	2013-06-25 19:46:55 -07:00
Jingning Han	d19ea3861d	Refactor intra predictor block Remove vp9_intra4x4_predict(). Use the common intra prediction function for all block sizes. Change-Id: Ibd19d51dfa3da8bbdfb79ddeb81530b2e2089560	2013-06-25 16:33:13 -07:00
Dmitry Kovalev	6fb10f2de4	Renaming "nmv" to "mv". Change-Id: I8299f55c3b930221e52c2237f2ddea65b94fd33b	2013-06-25 15:19:18 -07:00
Ronald S. Bultje	c24d922396	Add averaging-SAD functions for 8-point comp-inter motion search. Makes first 50 frames of bus @ 1500kbps encode from 3min22.7 to 3min18.2, i.e. 2.3% faster. In addition, use the sub_pixel_avg functions to calc the variance of the averaging predictor. This is slightly suboptimal because the function is subpixel-position-aware, but it will (at least for the SSE2 version) not actually use a bilinear filter for a full-pixel position, thus leading to approximately the same performance compared to if we implemented an actual average-aware full-pixel variance function. That gains another 0.3 seconds (i.e. encode time goes to 3min17.4), thus leading to a total gain of 2.7%. Change-Id: I3f059d2b04243921868cfed2568d4fa65d7b5acd	2013-06-25 12:57:28 -07:00
Dmitry Kovalev	9467571777	Moving subexp encoding functions in separate vp9_dsubexp.c file. Change-Id: Idbb2ea80f764fa830fe2ddcfc54ef7fe232f05a8	2013-06-25 11:53:17 -07:00
Dmitry Kovalev	5ae096778e	Merge "Removing unused code."	2013-06-25 11:50:55 -07:00
Yaowu Xu	c2e3ee13e7	Merge "Changed size of mb_mode_context to 8 bits"	2013-06-25 10:44:47 -07:00
Scott LaVarnway	855e23ce8c	Merge "Small mode_info_context cleanup in filter_block_plane"	2013-06-25 10:34:19 -07:00
Dmitry Kovalev	87ee34aacb	Removing unused code. Removing block index (ib) parameter from get_tx_type_{8x8, 16x16} functions. Change-Id: Ia213335aae7a7cb027f97b9cc9b04519840250f1	2013-06-25 10:17:19 -07:00
Dmitry Kovalev	70e9622185	Merge "Removing find_seg_id and using vp9_get_pred_mi_segid instead."	2013-06-25 10:16:06 -07:00
Dmitry Kovalev	529679bd52	Merge "Transforming scale_mv_component_q4 into scale_mv_q4 function."	2013-06-25 10:15:33 -07:00
Scott LaVarnway	c787f40bc4	Small mode_info_context cleanup in filter_block_plane Unnecessary updates to xd->mode_info_context. Change-Id: I36d2d68ca48366f727548526726b1b5437f62968	2013-06-25 12:28:50 -04:00
Yaowu Xu	b9c934df8e	Merge "Enable sse2 implmentation of 8x8 ADST/DCT"	2013-06-25 09:13:22 -07:00
Jingning Han	a32a086d23	Enable sse2 implmentation of 8x8 ADST/DCT This commit makes use of the butterfly structure to enable the sse2 version implementation of 8x8 ADST/DCT hybrid transform coding. The runtime of hybrid transform module goes down from 1170 cycles to 245 cycles. Overall speed-up around 1.5%. Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f	2013-06-24 18:41:33 -07:00
John Koleszar	4ecd6dbead	Move vp9_counts_to_nmv_context to encoder This function only used from within vp9_encodemv.c. Change-Id: Ib3fc7c30b1e2d27321397ac474cbc8976bc1f4b1	2013-06-24 15:58:18 -07:00
John Koleszar	08b1798ae7	Move vp9_full_to_model_counts to encoder This function is not called from the decoder, so it doesn't need to be in common/. Change-Id: I6977dd462a25b4ff39c9c7e1b0b5b16aa58ee733	2013-06-24 15:46:15 -07:00
John Koleszar	ece724ae16	Merge "Remove unused vp9_build_intra_predictors_sb{y,uv}_s"	2013-06-24 15:08:58 -07:00
John Koleszar	ee4a7e4e46	Merge "Remove unused vp9_model_to_full_probs_sb()"	2013-06-24 15:08:54 -07:00
Scott LaVarnway	dfa2ecc3f1	Changed size of mb_mode_context to 8 bits This reduced the size of the MODE_INFO array (mip and prev_mip) by 425,568 bytes each for 1080p resolutions. Change-Id: Ifa513ec2d0a49e8ec0867ec90620762fb7f1261d	2013-06-24 17:11:16 -04:00
John Koleszar	858475a03a	Fix loopfilter of leftmost 4x4 edges in SB For cases where there's no transform set in bit 0 (the left edge of the SB) but bit 0 of mask_4x4_int is set (the edge 4 pixels from the left edge needs filtering), it was incorrectly being skipped before. This situation only happens on the leftmost edge of the image, as the edge at column 0 is intentionally skipped since there aren't pixels to the left to read. Change-Id: Ib2fbbcb40166e90af31b1a0e13b85b68c226cbd3	2013-06-24 08:26:00 -07:00
John Koleszar	9e7019f7df	Remove unused vp9_build_intra_predictors_sb{y,uv}_s The functions no longer referenced. Change-Id: If2705dfbc607f79ec8ec2242d5e03bec27a35aaf	2013-06-21 16:10:05 -07:00
John Koleszar	5c32215e27	Remove unused vp9_model_to_full_probs_sb() This function never referenced. Change-Id: I1c42cd355bfa88e17d169f7335a44be682af58cc	2013-06-21 15:38:55 -07:00
Dmitry Kovalev	f27f76dfb3	Transforming scale_mv_component_q4 into scale_mv_q4 function. Using MV instead of int_mv for function arguments. Change-Id: Ic25e13dccbc98fac1fa1b3255127e00cca2a57f6	2013-06-21 15:34:29 -07:00
Dmitry Kovalev	40141681c0	Removing find_seg_id and using vp9_get_pred_mi_segid instead. Change-Id: Ia40229903c08f14020e90e94cfdf494aba1be827	2013-06-21 13:05:10 -07:00
Ronald S. Bultje	54b2a59623	Implement SSE2 block_error. Change vp9_block_error() to return a 64bit error variable, change all callers to expect a 64bit return value (this will prevent overflows, which we basically don't check for at all right now). Remove duplicate block_error() function, which fixed that through truncation. Remove old (incompatible) mmx/sse2 block_error SIMD versions and replace with a new one that returns a 64bit value. Encoding time of first 50 frames of bus @ 1500kbps goes from 3min29 to 3min23, i.e. a 3% overall speedup. Change-Id: Ib71ac5508b5ee8a80f1753cd85d72df1629abe68	2013-06-21 12:54:52 -07:00
Ronald S. Bultje	7756e9892b	Merge "Add subtract_block SSE2 version and unit test."	2013-06-21 12:49:50 -07:00
Ronald S. Bultje	9a480482cb	Merge "SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance()."	2013-06-21 12:49:43 -07:00
Ronald S. Bultje	25c588b1e4	Add subtract_block SSE2 version and unit test. 3% faster overall (3min35.0 to 3min28.5). Change-Id: I5ff8a5c2c91586b6632ca5009ad1ea51ce94af5e	2013-06-21 09:35:37 -07:00
Yaowu Xu	e6cd5ed307	Merge "Implement sse2 and ssse3 versions for all sub_pixel_variance sizes."	2013-06-20 17:42:50 -07:00
Ronald S. Bultje	1e6a32f1af	SSE2/SSSE3 optimizations and unit test for sub_pixel_avg_variance(). Encoding of bus @ 1500kbps (first 50 frames) goes from 3min57 to 3min35, i.e. approximately a 10.5% speedup. Note that the SIMD versions which use a bilinear filter (x_offset & 7 || y_offset & 7) aren't perfectly interleaved, and can probably be improved further in the future. I've marked this with a few TODOs/FIXMEs in the code. Change-Id: I5c9e900c0f0d32e431a50fecae213b510b2549f9	2013-06-20 15:59:48 -07:00
Frank Galligan	c259af4f73	Fix win64 warning. - size_t vs int. Change-Id: Ib47ebd932a4b69db9f52a43000bb69d0a96b9134	2013-06-20 14:07:11 -07:00
Dmitry Kovalev	8283d893eb	Merge "Renaming 'nmv' to 'mv' for several functions."	2013-06-20 10:17:12 -07:00
Ronald S. Bultje	8fb6c58191	Implement sse2 and ssse3 versions for all sub_pixel_variance sizes. Overall speedup around 5% (bus @ 1500kbps first 50 frames 4min10 -> 3min58). Specific changes to timings for each function compared to original assembly-optimized versions (or just new version timings if no previous assembly-optimized version was available): sse2 4x4: 99 -> 82 cycles sse2 4x8: 128 cycles sse2 8x4: 121 cycles sse2 8x8: 149 -> 129 cycles sse2 8x16: 235 -> 245 cycles (?) sse2 16x8: 269 -> 203 cycles sse2 16x16: 441 -> 349 cycles sse2 16x32: 641 cycles sse2 32x16: 643 cycles sse2 32x32: 1733 -> 1154 cycles sse2 32x64: 2247 cycles sse2 64x32: 2323 cycles sse2 64x64: 6984 -> 4442 cycles ssse3 4x4: 100 cycles (?) ssse3 4x8: 103 cycles ssse3 8x4: 71 cycles ssse3 8x8: 147 cycles ssse3 8x16: 158 cycles ssse3 16x8: 188 -> 162 cycles ssse3 16x16: 316 -> 273 cycles ssse3 16x32: 535 cycles ssse3 32x16: 564 cycles ssse3 32x32: 973 cycles ssse3 32x64: 1930 cycles ssse3 64x32: 1922 cycles ssse3 64x64: 3760 cycles Change-Id: I81ff6fe51daf35a40d19785167004664d7e0c59d	2013-06-20 09:34:25 -07:00
Jim Bankoski	2c6bdbbc78	new debug modes code The new print out includes skips and has prefixed sections so you can grep to find things like transforms chosen on each frame. Change-Id: I195043424647d9514cfc3ff6720a5b20d010fa1b	2013-06-20 09:33:11 -07:00
Yaowu Xu	12180c8329	Remove unnecessary copying of probs. Change-Id: Ic924f07c6ab0c929c6cdf11880d3c625806e272c	2013-06-18 23:02:27 -07:00
Dmitry Kovalev	87e1fa7627	Renaming 'nmv' to 'mv' for several functions. Change-Id: I183a38997a9d01e4a1b869e92509f6915216fa09	2013-06-18 18:28:10 -07:00
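
Note on 54b2a59623 (Implement SSE2 block_error): the commit message says vp9_block_error() now returns a 64bit error value to prevent overflows. The following is only a minimal plain-C sketch of that idea, not the libvpx code. The name block_error_sketch, the int16_t coefficient type, and the (coeff, dqcoeff, n) signature are assumptions made for illustration.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper (not the libvpx function): sum of squared differences
 * between original and dequantized coefficients, accumulated in 64 bits so
 * that large blocks with large differences cannot wrap a 32-bit total. */
static int64_t block_error_sketch(const int16_t *coeff,
                                  const int16_t *dqcoeff, size_t n) {
  int64_t error = 0;
  size_t i;

  assert(coeff != NULL && dqcoeff != NULL);
  for (i = 0; i < n; ++i) {
    /* The difference of two int16_t values always fits in 32 bits. */
    const int32_t diff = (int32_t)coeff[i] - (int32_t)dqcoeff[i];
    /* Widen before multiplying: diff * diff can exceed INT32_MAX. */
    error += (int64_t)diff * diff;
  }
  return error;
}
```

With a 32x32 block (1024 coefficients) at the extremes of the assumed int16_t range, the total is far beyond what a 32-bit return value could hold, which is the overflow case the commit message refers to.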

981 Commits