
341 Commits

Author SHA1 Message Date
Gabriel Marin
0549f5aae9 Simplify address arithmetic in vp9_optimize_b
Simplify address arithmetic on token_costs to reduce the number of
instructions generated for address arithmetic inside the routine
vp9_optimize_b. It also helps improve instruction scheduling, depending on
the compiler and optimization level.

Measured a 9.3% reduction in retired instructions and 5.3% reduction in
execution time for this routine with GCC v4.8.4 and optimization flags -O3,
and a reduction of up to 11.6% in execution time with other compilers.

No change in behavior.

TEST=Verified that encoded files match bit for bit, with and without this
change.
BUG=b/33678225

Change-Id: I6098650fb5cd2aa04e014fe6e68ca20761f3a21f
2016-12-19 13:10:04 -08:00
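A minimal sketch of the kind of change described, assuming token costs live in a flat band-by-context-by-token table. The dimensions, table layout, and function names below are hypothetical, not libvpx's. The second function resolves the band's base address once instead of rebuilding the full offset on every lookup. Both return the same sum.

```c
#include <assert.h>

/* Hypothetical table dimensions, for illustration only. */
#define NUM_BANDS 6
#define NUM_CTX 6
#define NUM_TOKENS 12

/* Recomputes the full flattened (band, ctx, token) offset on every access. */
static int sum_costs_full_index(const int *costs, int band, const int *ctx,
                                const int *tok, int n) {
  assert(band >= 0 && band < NUM_BANDS);
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += costs[(band * NUM_CTX + ctx[i]) * NUM_TOKENS + tok[i]];
  return total;
}

/* Resolves the band's base address once, so each access inside the loop
 * only needs the (ctx, token) part of the offset. */
static int sum_costs_hoisted(const int *costs, int band, const int *ctx,
                             const int *tok, int n) {
  assert(band >= 0 && band < NUM_BANDS);
  const int *band_costs = costs + band * NUM_CTX * NUM_TOKENS;
  int total = 0;
  for (int i = 0; i < n; ++i)
    total += band_costs[ctx[i] * NUM_TOKENS + tok[i]];
  return total;
}
```

Whether this saves instructions in practice depends on the compiler, which may already hoist the loop-invariant term by itself. That is consistent with the commit's remark that the benefit varies with compiler and optimization level.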
clang-format
5f6d143b41 apply clang-format
Change-Id: I501597b7c1e0f0c7ae2aea3ee8073f0a641b3487
2016-09-15 15:07:53 -07:00
clang-format
e0cc52db3f vp9/encoder: apply clang-format
Change-Id: I45d9fb4013f50766b24363a86365e8063e8954c2
2016-08-02 16:47:11 -07:00
Alex Converse
e446ffda45 Cache optimizations in optimize_b().
Move best index into the token state. Shrink it down to one byte. This
is more cache friendly (accesses are grouped together) and uses less total
memory.

Results in 4% fewer cycles in optimize_b().

Change-Id: I75db484fb3dc82f59928d54b659d79c80ee40452
2016-07-29 12:06:49 -07:00
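A sketch of storing a one-byte best index inside each token state rather than in a separate array, assuming a hypothetical layout with two candidates per position. The names state_without_index, state_with_index, and kept_token are illustrative, not libvpx's. On x86_64 the extra uint8_t field fits into existing tail padding, so the state does not grow, and the choice is read from the same cache line as the state it selects.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical state without the choice; a separate int array would hold it. */
typedef struct {
  int64_t error;
  int32_t rate;
  int32_t next;
  int32_t qc;
  int16_t token;
} state_without_index;

/* Same state with a one-byte choice stored alongside the data it selects. */
typedef struct {
  int64_t error;
  int32_t rate;
  int32_t next;
  int32_t qc;
  int16_t token;
  uint8_t best_index; /* 0 or 1: which of the two candidates was kept */
} state_with_index;

/* In this sketch, tokens[i][0].best_index records which candidate at
 * position i was kept; returns that candidate's token. */
static int kept_token(state_with_index tokens[][2], int i) {
  const int which = tokens[i][0].best_index;
  assert(which == 0 || which == 1);
  return tokens[i][which].token;
}
```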
hui su
248f6ad771 Revert "Eliminate isolated and small tail coefficients:"
This reverts commit ff19cdafdb.

Change-Id: I81f68870ca27a1ff683ee22090530b6997815fb2
2016-07-13 11:14:44 -07:00
Jingning Han
2f28f9072e Enable coeff optimization for intra modes
This further improves the coding performance by
lowres 0.3%
midres 0.5%
hdres  0.6%

Change-Id: I6a03b6da210b9cbc261474bad4a103e0ba021c68
2016-07-07 12:25:41 -07:00
Jingning Han
44354ee7bf Use precise context to estimate coeff rate cost
Use the precise context to estimate the zero token cost in trellis
optimization process. This improves the speed 0 coding performance
by 0.15% for lowres and 0.1% for midres. It improves the speed 1
coding performance by 0.2% for midres and hdres.

Change-Id: I59c7c08702fc79dc4f8534b64ca594da909e2c91
2016-07-07 12:25:33 -07:00
Jingning Han
62aa642d71 Enable uniform quantization with trellis optimization in speed 0
This commit allows the inter prediction residual to use uniform
quantization followed by trellis coefficient optimization in
speed 0. It improves the coding performance by

lowres 0.79%
midres 1.07%
hdres  1.44%

Change-Id: I46ef8cfe042a4ccc7a0055515012cd6cbf5c9619
2016-07-07 12:25:33 -07:00
Min Ye
ff19cdafdb Eliminate isolated and small tail coefficients:
Improve hdres PSNR by 0.696%
Improve midres PSNR by 0.313%
Improve lowres PSNR by 0.142%

Change-Id: Icabde78aa9689f539f6a03ec09f712c20758796c
2016-07-06 11:08:23 -07:00
Jingning Han
14011f037d Remove txfrm_block_to_raster_xy() from vp9 encoder
The transform block row and column positions are always available
outside the callees, so there is no need to recompute them. The
decoder already uses this approach. This commit removes the
txfrm_block_to_raster_xy() function.

Change-Id: I5b90f91a0d8b7c35cfa7d171da9edf8202630108
2016-07-04 18:41:47 -07:00
Alex Converse
50d3629c61 Repack vp9_token_state.
Reduces size from 32 bytes to 24 bytes on x86_64.

Change-Id: I8a22552343a1fc916117f35267fe6a295250f742
2016-06-20 12:56:32 -07:00
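A sketch of the padding effect behind this kind of repack, assuming a hypothetical field set (not the actual vp9_token_state fields). The padding comments assume the x86_64 System V ABI, where int64_t is 8-byte aligned. Ordering the fields from widest to narrowest alignment drops the layout from 32 to 24 bytes there.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical fields, declared in an order that forces padding (x86_64). */
struct state_unordered {
  int32_t rate;   /* 4 bytes, then 4 bytes of padding before error */
  int64_t error;
  int32_t next;
  int16_t token;  /* 2 bytes of padding follow, before qc */
  int32_t qc;     /* 4 bytes of tail padding follow */
};

/* Same fields, sorted from widest to narrowest alignment. */
struct state_repacked {
  int64_t error;
  int32_t rate;
  int32_t next;
  int32_t qc;
  int16_t token;  /* 2 bytes of tail padding follow */
};

/* Bytes of field data in either layout: 8 + 3 * 4 + 2 = 22. */
#define STATE_PAYLOAD (sizeof(int64_t) + 3 * sizeof(int32_t) + sizeof(int16_t))

/* Padding the compiler inserted for a layout of the given size. */
static size_t padding_of(size_t struct_size) {
  assert(struct_size >= STATE_PAYLOAD);
  return struct_size - STATE_PAYLOAD;
}
```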
Jingning Han
9e185ed177 Refactor optimize_b for speed performance
This commit refactors the trellis coefficient optimization process.
It saves multiplications used to generate the final dequantized
coefficients. It removes two memset operations on quantized
and dequantized coefficient sets. This improves the unit speed
by 10%.

Change-Id: I23f47c6e14582520a7f952f03ce8f72183e7f0e6
2016-06-17 17:41:09 -07:00
Jingning Han
dba1d1a63d Port optimize_b speed-up from vp10
This commit backports the speed-up from vp10. It improves the
unit speed by 15%.

Change-Id: Ibe8c0e0974b03266d6abd16a41e89c3b91d8db2a
2016-06-17 17:41:05 -07:00
Jingning Han
f99f78c7af Use 64-bit integer to store distortion in optimize_b
This fixes an overflow of the 32-bit distortion value.

Bug=webm:1241

Change-Id: Ia168b7fae1ad214a6837aaa785a08bf8506987dd
2016-06-17 15:07:00 -07:00
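A minimal sketch of accumulating squared coefficient error in 64 bits, assuming distortion is a sum of squared differences between original and dequantized coefficients. The function name and signature are hypothetical. With 1024 coefficients each off by 40000, the total is 1,638,400,000,000, far beyond INT32_MAX.

```c
#include <assert.h>
#include <stdint.h>

/* Sum of squared coefficient errors. Each difference is widened to 64 bits
 * before multiplying, so the product and the running sum are computed in
 * 64 bits rather than 32. */
static int64_t sum_squared_error(const int32_t *coeff, const int32_t *dqcoeff,
                                 int n) {
  assert(n >= 0);
  int64_t total = 0;
  for (int i = 0; i < n; ++i) {
    const int64_t diff = (int64_t)coeff[i] - dqcoeff[i];
    total += diff * diff;
  }
  return total;
}
```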
hui su
a554bd8dac Avoid a potential assertion fail in optimize_b()
The eob of a block is not properly set when skip_recode is true,
which causes assert(eob <= default_eob) to fail.

Change-Id: Ifecbe33dce2dc4903e0a80bd384dc09bf0dd8a44
2016-06-07 15:45:04 -07:00
Yaowu Xu
81eb71f00c Change to use proper type in vp{9,10}_token_state
"qc" in vp{9,10}_token_state is used to save quantized coefficients. This
commit changes its type from short to tran_low_t to properly reflect
the value range for the highbitdepth build.

This fixes an out-of-range bug when optimize_b is used in highbitdepth
build.

Change-Id: Ibf330879e6ac6ae8f099e085caa9d3d9a889fde8
2016-05-04 12:14:11 -07:00
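A sketch of the configuration-dependent coefficient type, assuming tran_low_t follows the libvpx pattern of int32_t in high-bitdepth builds and int16_t otherwise. CONFIG_VP9_HIGHBITDEPTH is forced to 1 here, and token_state, fits_tran_low, and set_qc are hypothetical. A value such as 40000, outside the 16-bit range, is preserved only when qc uses the wider type.

```c
#include <assert.h>
#include <stdint.h>

#ifndef CONFIG_VP9_HIGHBITDEPTH
#define CONFIG_VP9_HIGHBITDEPTH 1 /* assumption for this sketch */
#endif

#if CONFIG_VP9_HIGHBITDEPTH
typedef int32_t tran_low_t;
#else
typedef int16_t tran_low_t;
#endif

/* Hypothetical token state: qc holds a quantized coefficient, so it takes
 * the coefficient type instead of a fixed short. */
typedef struct {
  tran_low_t qc;
} token_state;

/* Returns nonzero if v is unchanged by a round trip through tran_low_t. */
static int fits_tran_low(int64_t v) {
  const tran_low_t stored = (tran_low_t)v;
  return (int64_t)stored == v;
}

static void set_qc(token_state *s, int64_t v) {
  assert(fits_tran_low(v)); /* would fire for 40000 with a 16-bit type */
  s->qc = (tran_low_t)v;
}
```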
hui su
c3a9247e09 VP9: adjust trellis quant optimization RD parameters
Coding gain:
lowres  0.64%
midres  0.38%
hdres   0.58%

Change-Id: I233fa2a4b24bd1e15091a5f5ef6aff661f3f50ec
2016-04-26 10:17:38 -07:00
hui su
c8f56d2303 VP9: enable trellis quantization optimization for intra blocks
Coding gain:
lowres  0.18%
midres  0.23%
hdres   0.36%

Change-Id: I044c8afbc481fc55b23d440352941071355b0afb
2016-04-26 10:17:29 -07:00
Jim Bankoski
1de659af06 vp9_encodemb.c: TODO clean up
Clean-up that huisu did in the nextgen branch; please try it in vp9.

Change-Id: I0ff35db07ac38464e0e2858e303be686c03a5d0e
2016-04-21 20:35:54 +00:00
Alex Converse
d13385cee7 Switch to 9-bit rate cost constants built on a 256 probability denominator.
-.220 BDRATE derf: https://x20web.corp.google.com/~aconverse/results/cost256_derf.html
-.675 BDRATE hevcmr: https://x20web.corp.google.com/~aconverse/results/cost256_hevcmr.html

Change-Id: Ifb1646d8ce65ffe0eff9953a911b1b88735b335f
2016-01-27 19:34:30 +00:00
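A minimal sketch of what a 9-bit cost constant over a 256 denominator represents, assuming the cost of a symbol with probability p/256 is -log2(p/256) expressed in 1/512-bit units. The integer log2 routine and the function names are hypothetical, not libvpx's table generator. Because the routine truncates, values for non-power-of-two probabilities may differ slightly from the real constants.

```c
#include <assert.h>
#include <stdint.h>

/* Fractional bits in a cost value: 9 means units of 1/512 bit. */
#define COST_SHIFT 9
/* Probabilities are p / 256 with 1 <= p <= 255, and log2(256) = 8. */
#define PROB_BITS 8

/* Truncated log2(p) in Q9 fixed point, using integer arithmetic only.
 * Fractional bits come from repeated squaring of the mantissa. */
static int log2_q9(unsigned p) {
  int k = 0;
  while ((p >> (k + 1)) != 0) ++k;       /* k = floor(log2(p)) */
  uint64_t m = ((uint64_t)p << 30) >> k; /* p / 2^k in [1, 2), Q30 */
  int frac = 0;
  for (int bit = 0; bit < COST_SHIFT; ++bit) {
    m = (m * m) >> 30;                   /* square: now in [1, 4) */
    frac <<= 1;
    if (m >= ((uint64_t)2 << 30)) {      /* >= 2.0: emit a 1 bit */
      m >>= 1;
      frac |= 1;
    }
  }
  return (k << COST_SHIFT) | frac;
}

/* Cost of a symbol with probability p / 256 in 1/512-bit units:
 * -log2(p / 256) = 8 - log2(p). */
static int prob_cost(unsigned p) {
  assert(p >= 1 && p <= 255);
  return (PROB_BITS << COST_SHIFT) - log2_q9(p);
}
```

For example, p = 128 (probability one half) costs 512 units, exactly one bit, and p = 64 costs 1024 units.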
Alex Converse
4326cffa65 Merge "Tie the bit cost scale to a define." 2016-01-21 19:17:56 +00:00
Scott LaVarnway
5232326716 VP9: Eliminate MB_MODE_INFO
Change-Id: Ifa607dd2bb366ce09fa16dfcad3cc45a2440c185
2016-01-19 16:40:20 -08:00
Alex Converse
269428e35c Tie the bit cost scale to a define.
This is a pure refactor in preparation for potentially raising the bit-cost
resolution.

Verified at good speed 0 and rt speed -6.

Change-Id: I5347e6e8c28a9ad9dd0aae1d76a3d0f3c2335bb9
2016-01-15 15:59:31 -08:00
Scott LaVarnway
2f8625d824 VP9: remove plane_type from macroblockd_plane
Change-Id: Ia5072a3a92212d8565f33359f6c146469bdfbbec
2015-09-30 15:15:11 -07:00
Alex Converse
a8a08ce57e Move vp9_systemdependent.h to vpx_ports bitops.h and system_state.h
Use system_state.h in vpx_dsp and remove unneeded includes of
vp9_systemdependent.h.

Change-Id: I92557ec6dd5aa790160b4f31fe7967db0d7ec3c4
2015-08-10 15:37:14 -07:00
Jingning Han
b4f2c567c8 Cosmetic - align format in vp9
Change-Id: I83ed3422f1f4009675ad2f5c4b7236bc7b83b30e
2015-08-06 15:56:11 -07:00
Jingning Han
d621de7e8d Change vp9_quantize to vpx_quantize
This commit removes all uses of the vp9_ prefix in vpx_dsp. It gets
the vp9 folder ready to branch out vp10.

Change-Id: I2906eec179ee792b4af8c9b4161313653050e931
2015-08-04 15:31:49 -07:00
Alex Converse
4ac5058afc Give skip_txfm constants names.
This uses defines instead of an enum to keep byte packing.

Change-Id: I3abb07c8bfe377e19be4531b624af7b7b4207792
2015-07-31 10:08:08 -07:00
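A sketch of why defines can preserve byte packing where an enum may not, assuming a hypothetical array of per-block flags. The SKIP_TXFM_* names, their values, and the array length are illustrative. A uint8_t field holds each flag in one byte, while an enum's size is implementation-defined and is usually that of int.

```c
#include <assert.h>
#include <stdint.h>

/* Illustrative constants stored in a byte-sized field. */
#define SKIP_TXFM_NONE 0
#define SKIP_TXFM_AC_DC 1
#define SKIP_TXFM_AC_ONLY 2

#define NUM_SKIP_FLAGS 16 /* hypothetical array length */

/* Each flag occupies one byte because the element type is uint8_t. */
typedef struct {
  uint8_t skip_txfm[NUM_SKIP_FLAGS];
} packed_flags;

/* The same flags typed as an enum usually take sizeof(int) bytes each. */
enum skip_txfm_kind { KIND_NONE, KIND_AC_DC, KIND_AC_ONLY };
typedef struct {
  enum skip_txfm_kind skip_txfm[NUM_SKIP_FLAGS];
} enum_flags;

static void set_skip_txfm(packed_flags *f, int idx, int value) {
  assert(idx >= 0 && idx < NUM_SKIP_FLAGS);
  assert(value >= SKIP_TXFM_NONE && value <= SKIP_TXFM_AC_ONLY);
  f->skip_txfm[idx] = (uint8_t)value;
}
```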
Jingning Han
4b5109cd73 Replace vp9_ prefix in 2D-DCT functions with vpx_
Clean up the forward 2D-DCT function names in vpx_dsp.

Change-Id: I3117978596d198b690036e7eb05fe429caf3bc25
2015-07-28 16:06:44 -07:00
Hui Su
a15edeb76d Merge "Code cleanup in vp9_encode_block_intra" 2015-07-24 17:40:37 +00:00
hui su
e298d650cb Code cleanup in vp9_encode_block_intra
Change-Id: Ie4d958b26e586db218f8ee95d5df4bf11f2345a1
2015-07-22 10:53:12 -07:00
Jingning Han
389ed6da10 Refactor highbd forward transform use case
Separate the hybrid transform case from 2D-DCT case. This will
allow us to clear up cross dependency between c and SIMD
implementations later.

Change-Id: Iaa499e8b096850a1c5a0c50a3b6e63e15d0184bf
2015-07-20 10:31:17 -07:00
Yunqing Wang
38f1fbbb75 Migrate quantization functions from vp9/ to vpx_dsp/
The following quantization functions were moved:
vp9_quantize_b
vp9_quantize_b_32x32
vp9_highbd_quantize_b
vp9_highbd_quantize_b_32x32

vp9_quantize_dc
vp9_quantize_dc_32x32
vp9_highbd_quantize_dc
vp9_highbd_quantize_dc_32x32

The purpose of doing that was to allow these functions to be shared
by multiple codecs.

Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
2015-07-17 16:38:14 -07:00
Jingning Han
81452cf0b7 Refactor intra block prediction function
This commit simplifies the intra block boundary condition logic.
It removes the block index from the argument set.

Change-Id: If00142512eb88992613d6609356dfd73ba390138
2015-07-13 15:20:47 -07:00
Jingning Han
535cc6d87f Format fixes in vp9_encodeframe.c and vp9_encodemb.c
Change-Id: Ib1303dac9043ab1b1f8fce54611cf4ea8a208038
2015-07-09 00:04:28 +00:00
Jingning Han
432cd4bfb7 Move subtract functions from vp9 to vpx_dsp
Factor out the subtraction operator as common function.

Change-Id: I526e703477c6a290e0e3e3c8898f8bb1ca82779b
2015-07-06 12:22:47 -07:00
Scott LaVarnway
b962646fc5 Re-worked header files
Various header/test files had to be re-worked in order to
build "Remove cm parameter from vp9_decode_block_tokens()".

This patch reverts the "Remove cm" part and only contains
the re-worked header files.

Change-Id: I520958a88d1991fee988a3c784d0eac40e117a32
2015-05-22 11:19:51 -07:00
Johann
1d7ccd5325 Relocate memory operations for common code
With the SAD functions (and, hopefully soon, the variance functions)
moving to vpx_dsp, place the defines used in the reference C code in
a common location.

Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca
2015-05-13 11:41:15 -07:00
James Zern
f58011ada5 vpx_mem: remove vpx_memset
Vestigial. Replace instances with memset(), which they were already
defined to.

Change-Id: Ie030cfaaa3e890dd92cf1a995fcb1927ba175201
2015-04-28 20:00:59 -07:00
Scott LaVarnway
8b17f7f4eb Revert "Remove mi_grid_* structures."
(see I3a05cf1610679fed26e0b2eadd315a9ae91afdd6)

For the test clip used, the decoder performance improved by ~2%.
This is also an intermediate step towards adding back the
mode_info streams.

Change-Id: Idddc4a3f46e4180fbebddc156c4bbf177d5c2e0d
2015-04-21 11:16:45 -07:00
Deb Mukherjee
6910e92d04 dc quantizer fix for 32x32 transforms
The rounding factor needs to be scaled down by a factor of 2.
Also, the quantized and dequantized coefficients are memset to 0
when the dc quantizer is used.

Change-Id: Ifa68bab02addbf1b83d249c5b4cbd5cda796b1cf
2015-03-03 15:58:27 -08:00
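A minimal sketch of a DC-only quantizer reflecting the two points above, assuming a division-based quantizer in place of the multiply-and-shift a production encoder would use, and reading "scaled down by a factor of 2" as integer halving of the rounding offset. The function name and parameters are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Quantizes only the DC term. All quantized and dequantized outputs are
 * cleared first; for 32x32 transforms the rounding offset is halved. */
static void quantize_dc_only(const int32_t *coeff, int n, int32_t round,
                             int32_t step, int is_32x32, int32_t *qcoeff,
                             int32_t *dqcoeff) {
  assert(n >= 1 && step > 0 && round >= 0);
  memset(qcoeff, 0, (size_t)n * sizeof(*qcoeff));
  memset(dqcoeff, 0, (size_t)n * sizeof(*dqcoeff));
  const int32_t rnd = is_32x32 ? round / 2 : round;
  int32_t q = (abs(coeff[0]) + rnd) / step;
  if (coeff[0] < 0) q = -q; /* restore the sign of the DC coefficient */
  qcoeff[0] = q;
  dqcoeff[0] = q * step;
}
```

With round = 8 and step = 16, a DC value of 90 quantizes to 6 for smaller transforms but to 5 in the 32x32 case, since the halved offset leaves 94, which no longer reaches 96.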
Yaowu Xu
364b92dc88 Fix compiler warnings for msvc2013
Change-Id: I1e32bf8f6872a6fb7e9cabe86483e94805e2f790
2015-01-05 17:31:19 -08:00
Jim Bankoski
b3c66f8a2f WIP: Remove giant value cost table
Change-Id: Iabe8a8868a747626c24bb13f1796f4c7827af367
2014-12-23 15:06:17 -08:00
Jim Bankoski
d6d431c476 Merge "Revert "Revert "Removal of legacy zbin_extra / zbin_oq_value.""" 2014-12-22 13:43:56 -08:00
Jingning Han
d0f2377027 Revert "Revert "Removal of legacy zbin_extra / zbin_oq_value.""
This reverts commit 9946ee23e0.

Fix the ssse3 asm function.

Change-Id: I07f77a63aa98087626e45c4e87aa5dcafc0b0b07
2014-12-22 10:09:25 -08:00
Jim Bankoski
4b8c6d96ec Tokenization without huge tables.
Change-Id: Iff528c4b7528cc70320343b3a7ce07a92b024dfd
2014-12-22 08:42:52 -08:00
Paul Wilkins
9946ee23e0 Revert "Removal of legacy zbin_extra / zbin_oq_value."
This reverts commit e9b586e21b.

Change-Id: I5b36e6727da6c05278d97e2c37b80c109f79bed4
2014-12-19 15:02:58 +00:00
Paul Wilkins
e9b586e21b Removal of legacy zbin_extra / zbin_oq_value.
zbin_extra / zbin_oq_value was widely passed around,
hence its removal touches a lot of code.

Change-Id: Idc94359735b60c38a160e4385ae09d5ca8b6b8e5
2014-12-18 16:49:11 +00:00
Peter de Rivaz
a306bd8274 Use the RTC optimizations when in high bitdepth mode.
Change 72193 made the encoder behave differently
when configured with and without high bitdepth.
This change means the same algorithm is used for both.

Change-Id: I707a44a94afca773a9e0c2f7ebeeea83030257c5
2014-12-04 15:48:42 -08:00
Jingning Han
7428cebe4f Rework forward txfm/quantization skip system in RTC coding mode
This commit allows more aggressive decisions to skip the forward
transform and quantization for the luma component in RTC coding mode.
The chroma components still go through the normal coding
routine, since they are not included in the non-RD mode search
process.

It reduces the runtime cost by 2% - 10%. In speed -6,
vidyo1 1000 kbps
16576 b/f, 40.281 dB, 8402 ms -> 16576 b/f, 40.323 dB, 7764 ms

nik720p 1000 kbps
33337 b/f, 38.622 dB, 7473 ms -> 33299 b/f, 38.660 dB, 7314 ms

dark720p 1000 kbps
33330 b/f, 39.785 dB, 13505 ms -> 33325 b/f, 39.714 dB, 13105 ms

The compression performance of speed -6 is improved by 0.44% in
PSNR and 1.31% in SSIM.

Change-Id: Iae9e3738de6255babea734e5897f29118bebc6d7
2014-11-21 12:46:40 -08:00
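The per-clip runtime savings implied by the timings above can be checked with a small helper (the function is illustrative). The three clips work out to roughly 7.6%, 2.1%, and 3.0%, within the quoted 2% - 10% range.

```c
#include <assert.h>

/* Percentage of runtime saved going from before_ms to after_ms. */
static double runtime_reduction_pct(double before_ms, double after_ms) {
  assert(before_ms > 0.0);
  return 100.0 * (before_ms - after_ms) / before_ms;
}
```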