generic-library/vpx

Author	SHA1	Message	Date
Urvang Joshi	5322a31b18	Remove the token state array from greedy optimize_b. Reduces memory usage, and speeds up encoding for some difficult clips. No impact on output or metrics. Ported from aomedia patch: https://aomedia-review.googlesource.com/c/14501 Change-Id: I26ec69af8336f9e80da486a1cfbfc89a3596954d	2017-07-11 13:05:29 -07:00
James Zern	80b83c73ba	cosmetics,vp9/: normalize inv/fwd_txfm naming + vpx_dsp/, test/ itxfm -> inv_txfm, ftxfm -> fwd_txfm Change-Id: I3aacdb65143576d64cfe5c9b14dd358c17c1fe7e	2017-07-06 18:35:44 -07:00
James Zern	8d1bda93f4	cosmetics,vp9/encoder: s/txm/txfm/ txfm is more commonly used as an abbreviation through the codebase Change-Id: I86fd90ef132468f9da270091c05daa1f5a49ece2	2017-06-29 15:08:47 -07:00
Urvang Joshi	4bb99ee27e	Enable greedy version of optimize_b() in VP9 by default. Improvements were already mentioned in the previous patch: https://chromium-review.googlesource.com/#/c/531675/ Change-Id: I4906ab1c61c25a815bdeb986016fad6dcb69eb71	2017-06-23 17:04:58 -07:00
Urvang Joshi	a4ea7e131b	VP9: Add greedy version of av1_optimize_b(). This was ported from the greedy version in AV1, written by Dake He (dkhe@google.com). See: https://aomedia.googlesource.com/aom/+/master/av1/encoder/encodemb.c#137 Greedy version is disabled by default, but can be picked by setting USE_GREEDY_OPTIMIZE_B to 1. To be enabled by default later. This is both faster and better in terms of compression. Compression Improvement: lowres: -0.119, midres: -0.064, hdres: -0.405. Speed Improvement (based on encode time of 3 videos of different difficulties at 3 different target bitrates): with --cpu-used=0: 0.38% to 5.55% faster; with --cpu-used=1: 0.24% to 2.79% faster; with --cpu-used=2: 0.29% to 1.46% faster. Change-Id: Ia7a23b3b244ad8eb253ac9e43cd03c5e021d2635	2017-06-15 11:19:08 -07:00
Linfeng Zhang	d5de63d2be	Update highbd idct functions arguments to use uint16_t dst BUG=webm:1388 Change-Id: I3581d80d0389b99166e70987d38aba2db6c469d5	2017-05-03 13:59:16 -07:00
Linfeng Zhang	081b39f2b7	Clean CONVERT_TO_BYTEPTR/SHORTPTR in idct BUG=webm:1388 Change-Id: Ida62c941f2b836d6c9e27b427a7d5008ab6dc112	2017-05-03 13:58:31 -07:00
Peter de Rivaz	66117b97c5	VP9: enable trellis for high bitdepth intra BUG=webm:1409 Change-Id: I5236595aac1c09386c60ffe8ad621e01422ed5a7	2017-04-26 11:43:01 +01:00
Jingning Han	ca9bedd538	Backport "Optimize the use case of token_cost table" to VP9 cherry picked from nextgenv2 `90ea281f29` Change-Id: Ie989e60c6479ac3251cadaac9c7e795ccba52f4e	2017-03-17 16:54:22 -07:00
Alex Converse	ab71181545	Drop vp9_get_token_extracost vp9_get_token_cost does the same thing with one fewer lookup. Change-Id: Ifc110b12403cb1a04a3f91357ab435c67b4815d6	2017-03-17 16:53:09 -07:00
Alex Converse	3a6ec9ea72	vp9_optimize_b: Combine extrabits cost with token lookup About 0.6% fewer cycles spent in vp9_optimize_b. Change-Id: I2ae62a78374c594ed81d4e3100a5848e2f6f2c4e	2017-03-16 17:03:22 -07:00
Alex Converse	15dac923b9	Merge "Narrow cat6_high_cost tables to uint16_t"	2017-03-03 23:45:39 +00:00
Alex Converse	bcd12de6c3	Narrow cat6_high_cost tables to uint16_t Saves 2688 bytes of rodata. Change-Id: I46633b6e50c2845181c70fff6273a8e58fdd1e56	2017-03-03 23:09:12 +00:00
Johann	ca4e27f5da	Drop zbin_ptr and quant_shift_ptr vp9[_highbd]_quantize_fp[_32x32] and vp9_fdct8x8_quant do not make use of these parameters. scan is used for C code and iscan is used for SIMD implementations. Change-Id: I908a0ff7d3febac33da97e0596e040ec7bc18ca5	2017-02-16 13:20:32 -08:00
Hui Su	37cd112b0f	Merge "Fix an overflow warning in optimize_b()"	2017-01-25 22:49:30 +00:00
hui su	519b2e48a8	Fix an overflow warning in optimize_b() BUG=webm:1361 Change-Id: Ib840bf3b39f7b3c8c017d3488a83434e9a0f45f5	2017-01-25 10:54:39 -08:00
Ranjit Kumar Tulabandu	8b0c11c358	Multi-threading of first pass stats collection (yunqingwang) 1. Rebased the patch. Incorporated recent first pass changes. 2. Turned on the first pass unit test. Change-Id: Ia2f7ba8152d0b6dd6bf8efb9dfaf505ba7d8edee	2017-01-24 15:48:02 -08:00
Gabriel Marin	fce163cd54	Remove superfluous conditional on 'shortcut' Remove superfluous test. Produces a small improvement in instruction scheduling. Measured a 1% to 1.5% reduction in execution time for routine vp9_optimize_b with different compilers. No change in behavior. TEST=Verified that encoded files match bit for bit, with and without this change. BUG=b/33678225 Change-Id: I2bf248d4c25fc0256147d7a8766ff9108ae9cba3	2016-12-20 12:20:21 -08:00
Gabriel Marin	0549f5aae9	Simplify address arithmetic in vp9_optimize_b Simplify address arithmetic on token_costs to reduce the number of generated instructions that are used for address arithmetic inside routine vp9_optimize_b. It also helps improve instruction scheduling depending on compiler and optimization level. Measured a 9.3% reduction in retired instructions and 5.3% reduction in execution time for this routine with GCC v4.8.4 and optimization flags -O3, and a reduction of up to 11.6% in execution time with other compilers. No change in behavior. TEST=Verified that encoded files match bit for bit, with and without this change. BUG=b/33678225 Change-Id: I6098650fb5cd2aa04e014fe6e68ca20761f3a21f	2016-12-19 13:10:04 -08:00
clang-format	5f6d143b41	apply clang-format Change-Id: I501597b7c1e0f0c7ae2aea3ee8073f0a641b3487	2016-09-15 15:07:53 -07:00
clang-format	e0cc52db3f	vp9/encoder: apply clang-format Change-Id: I45d9fb4013f50766b24363a86365e8063e8954c2	2016-08-02 16:47:11 -07:00
Alex Converse	e446ffda45	Cache optimizations in optimize_b(). Move best index into the token state. Shrink it down to one byte. This is more cache friendly (accesses are grouped together) and uses less total memory. Results in 4% fewer cycles in optimize_b(). Change-Id: I75db484fb3dc82f59928d54b659d79c80ee40452	2016-07-29 12:06:49 -07:00
hui su	248f6ad771	Revert "Eliminate isolated and small tail coefficients:" This reverts commit `ff19cdafdb`. Change-Id: I81f68870ca27a1ff683ee22090530b6997815fb2	2016-07-13 11:14:44 -07:00
Jingning Han	2f28f9072e	Enable coeff optimization for intra modes This further improves the coding performance by lowres 0.3% midres 0.5% hdres 0.6% Change-Id: I6a03b6da210b9cbc261474bad4a103e0ba021c68	2016-07-07 12:25:41 -07:00
Jingning Han	44354ee7bf	Use precise context to estimate coeff rate cost Use the precise context to estimate the zero token cost in trellis optimization process. This improves the speed 0 coding performance by 0.15% for lowres and 0.1% for midres. It improves the speed 1 coding performance by 0.2% for midres and hdres. Change-Id: I59c7c08702fc79dc4f8534b64ca594da909e2c91	2016-07-07 12:25:33 -07:00
Jingning Han	62aa642d71	Enable uniform quantization with trellis optimization in speed 0 This commit allows the inter prediction residual to use uniform quantization followed by trellis coefficient optimization in speed 0. It improves the coding performance by lowres 0.79% midres 1.07% hdres 1.44% Change-Id: I46ef8cfe042a4ccc7a0055515012cd6cbf5c9619	2016-07-07 12:25:33 -07:00
Min Ye	ff19cdafdb	Eliminate isolated and small tail coefficients: Improve hdres PSNR by 0.696% Improve midres PSNR by 0.313% Improve lowres PSNR by 0.142% Change-Id: Icabde78aa9689f539f6a03ec09f712c20758796c	2016-07-06 11:08:23 -07:00
Jingning Han	14011f037d	Remove txfrm_block_to_raster_xy() from vp9 encoder The transform block row and column positions are always available outside the callees. There is no need to re-compute these values again. This approach has been used by the decoder. This commit removes txfrm_block_to_raster_xy() function. Change-Id: I5b90f91a0d8b7c35cfa7d171da9edf8202630108	2016-07-04 18:41:47 -07:00
Alex Converse	50d3629c61	Repack vp9_token_state. Reduces size from 32 bytes to 24 bytes on x86_64. Change-Id: I8a22552343a1fc916117f35267fe6a295250f742	2016-06-20 12:56:32 -07:00
Jingning Han	9e185ed177	Refactor optimize_b for speed performance This commit refactors the trellis coefficient optimization process. It saves multiplications used to generate the final dequantized coefficients. It removes two memset operations on quantized and dequantized coefficient sets. This improves the unit speed by 10%. Change-Id: I23f47c6e14582520a7f952f03ce8f72183e7f0e6	2016-06-17 17:41:09 -07:00
Jingning Han	dba1d1a63d	Port optimize_b speed-up from vp10 This commit back ports the speed-up from vp10. It improves the unit speed by 15%. Change-Id: Ibe8c0e0974b03266d6abd16a41e89c3b91d8db2a	2016-06-17 17:41:05 -07:00
Jingning Han	f99f78c7af	Use 64-bit integer to store distortion in optimize_b This fixes the overflow issue. Bug=webm:1241 Change-Id: Ia168b7fae1ad214a6837aaa785a08bf8506987dd	2016-06-17 15:07:00 -07:00
hui su	a554bd8dac	Avoid a potential assertion failure in optimize_b() The eob of a block is not properly set when skip_recode is true, thus triggering assert(eob <= default_eob) to fail. Change-Id: Ifecbe33dce2dc4903e0a80bd384dc09bf0dd8a44	2016-06-07 15:45:04 -07:00
Yaowu Xu	81eb71f00c	Change to use proper type in vp{9,10}_token_state "qc" in vp{9,10}_token_state is used to save quantized coefficients, this commit changes the type from short to tran_low_t to properly reflect the value range for highbitdepth build. This fixes an out-of-range bug when optimize_b is used in highbitdepth build. Change-Id: Ibf330879e6ac6ae8f099e085caa9d3d9a889fde8	2016-05-04 12:14:11 -07:00
hui su	c3a9247e09	VP9: adjust trellis quant optimization RD parameters Coding gain: lowres 0.64% midres 0.38% hdres 0.58% Change-Id: I233fa2a4b24bd1e15091a5f5ef6aff661f3f50ec	2016-04-26 10:17:38 -07:00
hui su	c8f56d2303	VP9: enable trellis quantization optimization for intra blocks Coding gain: lowres 0.18% midres 0.23% hdres 0.36% Change-Id: I044c8afbc481fc55b23d440352941071355b0afb	2016-04-26 10:17:29 -07:00
Jim Bankoski	1de659af06	vp9_encodemb.c: TODO clean up huisu did in nextgen branch -> please try in vp9 Change-Id: I0ff35db07ac38464e0e2858e303be686c03a5d0e	2016-04-21 20:35:54 +00:00
Alex Converse	d13385cee7	Switch to 9-bit rate cost constants built on a 256 probability denominator. -.220 BDRATE derf: https://x20web.corp.google.com/~aconverse/results/cost256_derf.html -.675 BDRATE hevcmr: https://x20web.corp.google.com/~aconverse/results/cost256_hevcmr.html Change-Id: Ifb1646d8ce65ffe0eff9953a911b1b88735b335f	2016-01-27 19:34:30 +00:00
Alex Converse	4326cffa65	Merge "Tie the bit cost scale to a define."	2016-01-21 19:17:56 +00:00
Scott LaVarnway	5232326716	VP9: Eliminate MB_MODE_INFO Change-Id: Ifa607dd2bb366ce09fa16dfcad3cc45a2440c185	2016-01-19 16:40:20 -08:00
Alex Converse	269428e35c	Tie the bit cost scale to a define. This is a pure-refactor in preparation to potentially raise the bit-cost resolution. Verified at good speed 0 and rt speed -6. Change-Id: I5347e6e8c28a9ad9dd0aae1d76a3d0f3c2335bb9	2016-01-15 15:59:31 -08:00
Scott LaVarnway	2f8625d824	VP9: remove plane_type from macroblockd_plane Change-Id: Ia5072a3a92212d8565f33359f6c146469bdfbbec	2015-09-30 15:15:11 -07:00
Alex Converse	a8a08ce57e	Move vp9_systemdependent.h to vpx_ports bitops.h and system_state.h Use system_state.h in vpx_dsp and remove unneeded includes of vp9_systemdependent.h. Change-Id: I92557ec6dd5aa790160b4f31fe7967db0d7ec3c4	2015-08-10 15:37:14 -07:00
Jingning Han	b4f2c567c8	Cosmetic - align format in vp9 Change-Id: I83ed3422f1f4009675ad2f5c4b7236bc7b83b30e	2015-08-06 15:56:11 -07:00
Jingning Han	d621de7e8d	Change vp9_quantize to vpx_quantize This commit clears all the vp9_ prefix use case in vpx_dsp. It gets the vp9 folder ready to branch out vp10. Change-Id: I2906eec179ee792b4af8c9b4161313653050e931	2015-08-04 15:31:49 -07:00
Alex Converse	4ac5058afc	Give skip_txfm constants names. This is using a define instead of an enum to keep byte packing. Change-Id: I3abb07c8bfe377e19be4531b624af7b7b4207792	2015-07-31 10:08:08 -07:00
Jingning Han	4b5109cd73	Replace vp9_ prefix in 2D-DCT functions with vpx_ Clean up the forward 2D-DCT function names in vpx_dsp. Change-Id: I3117978596d198b690036e7eb05fe429caf3bc25	2015-07-28 16:06:44 -07:00
Hui Su	a15edeb76d	Merge "Code cleanup in vp9_encode_block_intra"	2015-07-24 17:40:37 +00:00
hui su	e298d650cb	Code cleanup in vp9_encode_block_intra Change-Id: Ie4d958b26e586db218f8ee95d5df4bf11f2345a1	2015-07-22 10:53:12 -07:00
Jingning Han	389ed6da10	Refactor highbd forward transform use case Separate the hybrid transform case from 2D-DCT case. This will allow us to clear up cross dependency between c and SIMD implementations later. Change-Id: Iaa499e8b096850a1c5a0c50a3b6e63e15d0184bf	2015-07-20 10:31:17 -07:00


359 Commits