Commit Graph

7760 Commits

Author SHA1 Message Date
James Zern
c77b1f5acd vp9: RECON_AND_STORE4X4: remove dest offset
offsetting by a variable stride prevents instruction reordering,
resulting in poor assembly

Change-Id: Id62d6b3299cdd23f8c44f97b630abf4fea241446
2015-04-30 19:14:17 -07:00
James Zern
778845da05 vp9_idct_intrin_*: RECON_AND_STORE: remove dest offset
offsetting by a variable stride prevents instruction reordering,
resulting in poor assembly.
additionally reroll 16x16/32x32 loops to reduce register spill with this
new format

Change-Id: I0635b8ba21ecdb88116e927dbdab53acdf256e11
2015-04-30 19:14:17 -07:00
Yaowu Xu
2061359fcf Merge "Remove vp9_idct16x16_10_add_ssse3()" 2015-04-30 23:13:33 +00:00
James Zern
0ae1e4a95a Merge "vp9_decodeframe: simplify compare_tile_buffers" 2015-04-30 23:05:42 +00:00
hkuang
493a8579f1 Add some sse2 code for intra prediction.
Change-Id: I16c0a62e52dab62837c547345df31e7518620ed4
2015-04-30 15:42:57 -07:00
Yunqing Wang
4907c29904 Reduce intra_cost_penalty for BLOCK_8X8
This patch reduced the BLOCK_8X8's intra_cost_penalty, which
allows 8x8 blocks to conduct intra mode search. Borg test
result(rtc set): 0.077% PSNR gain, 0.228% SSIM gain. No speed
changes.

Change-Id: Icfe90c4f6969de24bda8ecacbd3da50330bf22b2
2015-04-30 11:03:06 -07:00
Yaowu Xu
47767609fe Remove vp9_idct16x16_10_add_ssse3()
The rotation computation using 2X of cos(pi/16) has a potential to
overflow 32 bit, this commit disable the function to allow further
investigation and optimization.

Change-Id: I4a9803bc71303d459cb1ec5bbd7c4aaf8968e5cf
2015-04-30 09:07:30 -07:00
Yunqing Wang
fd90ce2711 Merge "Improve golden frame refreshing in non-rd mode" 2015-04-30 15:57:55 +00:00
Yunqing Wang
a257e469e1 Adjust the vbp early termination threshold slightly
Calculated cpi->vbp_threshold_sad from this frame's dequant value.
The encoding quality and speed didn't change much. Borg test
result: PSNR: -0.002%, SSIM: -0.003%.

Change-Id: I97c9826986f39582f29910d637d08a69c90afdee
2015-04-30 08:51:02 -07:00
Parag Salasakar
95cb130f32 Merge "mips msa vp9 copy and avg convolve optimization" 2015-04-30 04:39:13 +00:00
Yaowu Xu
d45870be8d Merge "Disable ssse3 version idct16x16_256_add()" 2015-04-30 03:09:23 +00:00
James Zern
9e81112df2 vp9_decodeframe: simplify compare_tile_buffers
return the difference between the 2 buffer sizes rather than exactly
-1/0/1.

Change-Id: Idf1ccff7088b31845470bcc71bea5927b0598cc7
2015-04-29 17:42:30 -07:00
Yaowu Xu
486a73a9ce Disable ssse3 version idct16x16_256_add()
The version is currently producing different result from c version
for some input. Disable the use of it for now to allow time for
investigation the source of mismatch.

Change-Id: Id039455494ee531db4886a9f1fa4761174ef6df3
2015-04-29 16:58:59 -07:00
Yunqing Wang
d31698b0e0 Improve golden frame refreshing in non-rd mode
The default golden frame interval was doubled. After encoding a
frame, the background motion was measured. If the motion was high,
the current frame was set as the golden frame. Currently, the
changes were applied only while aq-mode 3 was on.

Borg tests(rtc set) showed a 0.226% PSNR gain and 0.312% SSIM gain.
No speed changes.

Change-Id: Id1e2793cc5be37e8a9bacec1380af6f36182f9b1
2015-04-29 16:43:43 -07:00
Parag Salasakar
2301d10f73 mips msa vp9 copy and avg convolve optimization
average improvement ~3x-5x

Change-Id: I422e4c33ea7e6d6783ba40029438ccf21b0e76bb
2015-04-29 12:28:17 +05:30
James Zern
f58011ada5 vpx_mem: remove vpx_memset
vestigial. replace instances with memset() which they already were being
defined to.

Change-Id: Ie030cfaaa3e890dd92cf1a995fcb1927ba175201
2015-04-28 20:00:59 -07:00
James Zern
f274c2199b vpx_mem: remove vpx_memcpy
vestigial. replace instances with memcpy() which they already were being
defined to.

Change-Id: Icfd1b0bc5d95b70efab91b9ae777ace1e81d2d7c
2015-04-28 19:59:41 -07:00
James Zern
fbd3b89488 vpx_mem: remove vpx_memmove
vestigial. replace instances with memmove() which they already were
being defined to.

Change-Id: If396d3f9e3cf79c0ee5d7429615ef3d6b2a34afa
2015-04-28 19:59:40 -07:00
Frank Galligan
2be50a1c9c Merge "WIP: Use LUT for y_dequant/uv_dequant" 2015-04-28 16:12:10 +00:00
Yunqing Wang
bfce02971e Merge "Fix debugmodes file to print modes and MVs correctly" 2015-04-28 15:47:46 +00:00
Scott LaVarnway
afcb62b414 WIP: Use LUT for y_dequant/uv_dequant
instead of calculating every block.

Change-Id: Ib19ff2546be8441f8755ae971ba2910f29412029
2015-04-28 07:52:06 -07:00
Yunqing Wang
297b2b99de Fix debugmodes file to print modes and MVs correctly
This patch fixed the issues in debugmodes file because of the recent
changes in MODE_INFO struct.

Change-Id: I4df83379ecc887c1f009d4a8329c9809c5b299d6
2015-04-27 17:09:38 -07:00
Yaowu Xu
b3e411e481 Add validation of UV partition size
For color sampling format other than 420, valid partion size in Y may
not work for UV plane. This commit adds validation of UV partition
size before select the partition choice.

This fixes a crash for real time encoding of 422 input.

Change-Id: I1fe3282accfd58625e8b5e6a4c8d2c84199751b6
2015-04-24 12:34:18 -07:00
Jim Bankoski
a6e9ae9066 Adds worst frame metrics for a bunch of metrics.
Change-Id: Ieaccc36ed1bee024bb644a9cfaafdaaa65d31772
2015-04-22 06:45:56 -07:00
paulwilkins
e07b141da0 Merge "Modified test for auto key frame detection." 2015-04-22 02:29:17 -07:00
paulwilkins
5d8877a944 Merge "Limit arf interval for low fpf clips." 2015-04-22 02:25:38 -07:00
Parag Salasakar
1c9af9833d Merge "mips msa vp9 convolve8 horiz optimization" 2015-04-21 22:08:25 -07:00
Jim Bankoski
3b35e962e2 Merge "Adds a new temporal consistency metric to libvpx." 2015-04-21 16:11:11 -07:00
Johann
931c0a954f Merge "Rename neon convolve avg file" 2015-04-21 15:45:29 -07:00
Johann
66b9933b8d Rename neon convolve avg file
Some build systems use just the basename for object files.

Change-Id: I333e1107ee866f3906cc46476ef8d04c6200a8a0
2015-04-21 14:18:17 -07:00
Scott LaVarnway
8b17f7f4eb Revert "Remove mi_grid_* structures."
(see I3a05cf1610679fed26e0b2eadd315a9ae91afdd6)

For the test clip used, the decoder performance improved by ~2%.
This is also an intermediate step towards adding back the
mode_info streams.

Change-Id: Idddc4a3f46e4180fbebddc156c4bbf177d5c2e0d
2015-04-21 11:16:45 -07:00
Jim Bankoski
ee87e20d53 Adds a new temporal consistency metric to libvpx.
Change-Id: Id61699ebf57ae4f8af96a468740c852b2f45f8e1
2015-04-21 10:05:37 -07:00
Yaowu Xu
924d06a075 Merge "Resolve configuration conflict" 2015-04-21 08:00:49 -07:00
paulwilkins
3606b78108 Modified test for auto key frame detection.
The existing test was triggering a lot of false positives on some types
of animated material with very plain backgrounds. These were triggering
code designed to catch key frames in letter box format clips.

This patch tightens up the criteria and imposes a minimum requirement
on the % blocks coded intra in the first pass and the ratio between the
% coded intra and the modified inter % after discounting neutral (flat)
blocks that are coded equally well either way.

On a particular problem animation clip this change eliminated a large
number of false positives including some cases where the old code
selected kf several times in a row. Marginal false negatives are less
damaging typically to compression and in the problem clip there are now
a couple of cases where "visual" scene cuts are ignored because of well
correlated content across the scene cut.

Replaced some magic numbers related to this with #defines and added
explanatory comments.

Change-Id: Ia3d304ac60eb7e4323e3817eaf83b4752cd63ecf
2015-04-21 12:50:11 +01:00
Parag Salasakar
ca90d4fd96 mips msa vp9 convolve8 horiz optimization
average improvement ~6x-8x

Change-Id: I7c91eec41aada3b0a5231dda7869b3b968f3ad18
2015-04-21 12:31:26 +05:30
Parag Salasakar
391ecffed9 Merge "mips msa vp9 convolve8 hv optimization" 2015-04-20 23:39:24 -07:00
Parag Salasakar
ef51c1ab5b mips msa vp9 convolve8 hv optimization
average improvement ~5x-8x

Change-Id: I3214734cb3716e742907ce0d2d7a042d953df82b
2015-04-21 09:17:49 +05:30
Yaowu Xu
b423a6b212 Resolve configuration conflict
Between --enable-internal-stats and --enable-vp9-highbitdepth

Change-Id: I36b741554e835033e69883270b6b0e5374a1aafa
2015-04-20 16:44:12 -07:00
Yaowu Xu
305492c375 Move declaration before statement
Change-Id: Ib64786fcc0d6dc11c4e66f5b7f3e93b2a4fcb664
2015-04-20 09:50:59 -07:00
Parag Salasakar
2e36149ccd Merge "mips msa vp9 convolve8 vert optimization" 2015-04-18 23:39:25 -07:00
Parag Salasakar
27d083c1b9 mips msa vp9 convolve8 vert optimization
average improvement ~6x-10x

Change-Id: Ie3f3ab3a9005be84935919701e56b404e420affa
2015-04-18 08:13:04 +05:30
Jim Bankoski
03829f2fea Merge "Adds a blockiness metric to internal stats." 2015-04-17 16:06:26 -07:00
Jim Bankoski
3d2f037a44 Merge "adds psnrhvs to internal stats." 2015-04-17 16:06:10 -07:00
Jim Bankoski
f2cbee9a04 Merge "Adds a fastssim metric to VPX internal stats." 2015-04-17 16:05:53 -07:00
Jim Bankoski
1777413a2a Adds a blockiness metric to internal stats.
Change-Id: Iedceeb020492050063acf3fd2326f96c29db9ae5
2015-04-17 11:13:18 -07:00
Jim Bankoski
9757c1aded adds psnrhvs to internal stats.
PSNR HVS is a human visual system weighted version of SNR that's
gained some popularity from academia and apparently better matches
MOS testing.

This code is borrowed from the Daala Project but uses our FDCT code.

Change-Id: Idd10fbc93129f7f4734946f6009f87d0f44cd2d7
2015-04-17 10:29:27 -07:00
Jim Bankoski
3f7f194304 Adds a fastssim metric to VPX internal stats.
This code appeared in the Daala project first and was originally
committed by Nathan Egge.

Change-Id: Iadce416a091929c51b46637ebdec984cddcaf18c
2015-04-17 10:23:24 -07:00
Jingning Han
73bce9ec7e Merge "Remove unnecessary backup token stream pointer" 2015-04-17 09:13:53 -07:00
Marco Paniconi
f76ccce5bc Revert "Revert "Force_split on 16x16 blocks in variance partition.""
This reverts commit 004b9d83e3

Change-Id: I2f2d0bdb9368c2c07f1d29a69cd461267a3a8743
2015-04-16 17:52:13 -07:00
Jingning Han
645c70f852 Remove unnecessary backup token stream pointer
When the tokenization is not taking effect, the tokenization
pointer remains unchanged. No need to re-assign the backup pointer
value.

Change-Id: I58fe1f6285aa3b4a88ceb864c11d5de8ac6235dd
2015-04-16 16:44:44 -07:00
Minghai Shang
29b5cf6a9d Merge "[svc] Fix syntax error when encoding multiple tiles." 2015-04-16 13:43:44 -07:00
Minghai Shang
4aa9255efa [svc] Fix syntax error when encoding multiple tiles.
Change-Id: Ia77b551415f3b3386e22a6c805f244f2d13fe3e3
2015-04-16 12:56:30 -07:00
paulwilkins
effd974b16 Limit arf interval for low fpf clips.
This patch limits  the maximum arf interval length to
approximately half a second. In some low fps animations in
particular the existing code was selecting an overly long interval
which was hurting visual quality. For a sample problem test clip
(360P animation , 15fps, ~200Kbit/s) this change also improved
metrics by >0.5 db.

There may be some clips where this hurts metrics a little, but the
worst case impact visually is likely to be less than having an
interval that is much too long. On more normal material at 24
fps or higher, the impact is likely to be nil/minimal.

Change-Id: Id8b57413931a670c861213ea91d7cc596375a297
2015-04-16 11:50:37 +01:00
Yunqing Wang
14e7203e7b Merge "Fix Tsan errors" 2015-04-15 15:34:03 -07:00
Yunqing Wang
63c5bf2b9c Fix Tsan errors
This patch fixed 2 reported Tsan errors while running VP9 real-time
encoder.

Change-Id: Ib0278fe802852862c3ce87c4a500e544d7089f67
2015-04-15 12:33:39 -07:00
Johann
14ef4aeafb Reorganize *_rtcd() calling conventions
Change-Id: Ib1e17d8aae9b713b87f560ab5e49952ee2bfdcc2
2015-04-15 11:12:05 -04:00
Yunqing Wang
004b9d83e3 Revert "Force_split on 16x16 blocks in variance partition."
This reverts commit eb8c667570.
The patch caused mismatch while using multi-threads.

Change-Id: Icd646340af25b5d91e32f03ed3ea212e00e3e0be
2015-04-14 15:19:31 -07:00
Marco
2baa3debd5 Merge "Force_split on 16x16 blocks in variance partition." 2015-04-14 09:44:58 -07:00
hkuang
3b2510374a Merge "Remove unnecessary set postproc flags." 2015-04-13 14:33:43 -07:00
Marco
eb8c667570 Force_split on 16x16 blocks in variance partition.
Force split on 16x16 block (to 8x8) based on the minmax over the 8x8 sub-blocks.

Also increase variance threshold for 32x32, and add exit condiiton in choose_partition
(with very safe threshold) based on sad used to select reference frame.

Some visual improvement near moving boundaries.
Average gain in psnr/ssim: ~0.6%, some clips go up ~1 or 2%.
Encoding time increase (due to more 8x8 blocks) from ~1-4%, depending on clip.

Change-Id: I4759bb181251ac41517cd45e326ce2997dadb577
2015-04-13 12:05:07 -07:00
Parag Salasakar
2f693be8f8 Merge "mips msa vp9 common headers added" 2015-04-09 21:50:15 -07:00
Jingning Han
2404332c1b Merge "Remove get_nonrd_var_based_fixed_partition function" 2015-04-09 14:45:19 -07:00
Jingning Han
4565812032 Merge "Compute prediction filter type cost only when needed" 2015-04-09 14:45:11 -07:00
Jingning Han
93d9c50419 Merge "SSSE3 assembly implementation of 8x8 Hadamard transform" 2015-04-09 11:16:11 -07:00
Jingning Han
208aa6158b Remove get_nonrd_var_based_fixed_partition function
This function has been replaced by other approaches and is not
in use now.

Change-Id: I387f45b5607d202539e482468ccc70e6c0f9341f
2015-04-09 09:49:55 -07:00
Parag Salasakar
481fb7640c mips msa vp9 common headers added
Change-Id: Ia31ada59172eb1818e1eb91009f83cbb1f581223
2015-04-09 15:35:12 +05:30
hkuang
7e8e507bfb Remove unnecessary mv clamp with on demand border extension.
Change-Id: Ia2956f06f409b9b0ca8320ca4c1ea5680e938402
2015-04-08 17:16:52 -07:00
Frank Galligan
5668dcc7b9 Refactor dec_build_inter_predictors
Refactor the loops in dec_build_inter_predictors to try and decrease
the number of instructions. Limited testing saw about 1% perf
increase on x86 and about 0.67 % perf increase on Arm.

Change-Id: I69cfe6335bb562fbaaebf43fb3f5c5a2a28882a2
2015-04-08 15:00:29 -07:00
Debargha Mukherjee
59681be0a0 Merge "Improve accuracy of rate control in CQ mode" 2015-04-08 10:48:17 -07:00
James Zern
2ed0cf06f9 Merge "vp9_full_search_sadx[38]: align sad arrays" 2015-04-07 20:57:21 -07:00
Yaowu Xu
c88ce84bb5 Merge "Optimize the checking for transform skipping" 2015-04-07 16:29:51 -07:00
Yaowu Xu
90517b5e85 Merge "move ref_frame_cost computations into a function" 2015-04-07 16:29:45 -07:00
Debargha Mukherjee
60bd744c88 Improve accuracy of rate control in CQ mode
Modifies a special handling that improves rate control accuracy in
the constrained quality mode, when the undershoot and overshoot
limits are set tighter.

Change-Id: If62103f0ef3ed1cac92807400678c93da50cf046
2015-04-07 16:29:21 -07:00
James Zern
e1ff83f4b0 vp9_full_search_sadx[38]: align sad arrays
the sse4 code expects 16-byte aligned arrays; vp8 already had a similar
change applied:
b2aa401 Align SAD output array to be 16-byte aligned

Change-Id: I5e902035e5a87e23309e151113f3c0d4a8372226
2015-04-07 14:34:06 -07:00
Jingning Han
927693a991 Merge "Enable Hadamard transform based cost estimate for all block sizes" 2015-04-07 12:51:27 -07:00
Jingning Han
6de407b638 Merge "Account for eob cost in the RTC mode decision process" 2015-04-07 12:50:30 -07:00
Jingning Han
25206e7b7f Compute prediction filter type cost only when needed
Skip redundant prediction filter type cost in filter search loop,
if the rate value will be reset in Hadamard transform based rate
distortion estimate.

Change-Id: Ie5221f4bc8da9461c449df367251aeeac52c6e5d
2015-04-07 12:41:46 -07:00
Yaowu Xu
0bb897211d Optimize the checking for transform skipping
If U is not skippable, then do not perform the check on V.

Change-Id: Iba5e8362bd42390197f373c44388a426a4404549
2015-04-06 17:54:05 -07:00
Jingning Han
7f629dfca4 SSSE3 assembly implementation of 8x8 Hadamard transform
It uses about 10% less CPU cycles than the SSE2 intrinsic
implementation.

Change-Id: I91017c0c068679a214b98cdd4cff3a6facfb7499
2015-04-04 09:59:37 -07:00
Jingning Han
9922e4344a Enable Hadamard transform based cost estimate for all block sizes
This commit turns on the Hadamard transform based rate distortion
estimate for all block sizes in RTC coding mode. It conditionally
skips the rate distortion estimation if all zero block flag is set
on. No significant encoding speed change is observed. The
compression performance of speed -6 is improved by 1.7% over using
it only for block sizes of 32x32 and below.

Change-Id: I768145e6f05c737b05b5b5f1ee674e929532cafb
2015-04-04 09:58:45 -07:00
Yunqing Wang
b2baaa215b Merge "Fix the scaling factor in UV skipping test" 2015-04-03 17:09:59 -07:00
Yunqing Wang
1a1114d21c Fix the scaling factor in UV skipping test
The threshold scaling factor was calculated wrong using partition
size "bsize". Thank Yaowu for pointing it out. It was fixed and no
speed change was seen.

Change-Id: If7a5564456f0f68d6957df3bd2d1876bbb8dfd27
2015-04-03 16:07:43 -07:00
James Zern
44e3640923 Merge "vp9: enable sse4 sad functions" 2015-04-03 14:57:52 -07:00
Jingning Han
30e9c091c0 Merge "Tune SSSE3 assembly implementation to improve quantization speed" 2015-04-03 11:24:28 -07:00
Jingning Han
60e01c6530 Account for eob cost in the RTC mode decision process
This commit accounts for the transform block end of coefficient flag
cost in the RTC mode decision process. This allows a more precise
rate estimate. It also turns on the model to block sizes up to 32x32.
The test sequences shows about 3% - 5% speed penalty for speed -6.
The average compression performance improvement for speed -6 is
1.58% in PSNR. The compression gains for hard clips like jimredvga,
mmmoving, and tacomascmv at low bit-rate range are 1.8%, 2.1%, and
3.2%, respectively.

Change-Id: Ic2ae211888e25a93979eac56b274c6e5ebcc21fb
2015-04-03 10:31:51 -07:00
hkuang
d72ed35374 Merge "Fix error of "Left shift of negative value -1"." 2015-04-02 21:35:12 -07:00
Yunqing Wang
12cb30d4bd Merge "Set vbp thresholds for aq3 boosted blocks" 2015-04-02 18:22:08 -07:00
Yaowu Xu
718feb0f69 move ref_frame_cost computations into a function
Change-Id: Iebf2ad2b1db7e2874788fda8d55e67f4cb1149f1
2015-04-02 18:10:55 -07:00
hkuang
73c8fe5deb Fix error of "Left shift of negative value -1".
Change-Id: Ia4f3feb20df0e89cc51b02def858e12e927312cc
2015-04-02 17:35:33 -07:00
Marco
f85f79f630 Merge "Code cleanup: put (8x8/4x4)fill_variance into separate function." 2015-04-02 17:33:01 -07:00
Yunqing Wang
cae03a7ef5 Set vbp thresholds for aq3 boosted blocks
The vbp thresholds are set seperately for boosted/non-boosted
superblocks according to their segment_id. This way we don't
have to force the boosted blocks to split to 32x32.

Speed 6 RTC set borg test result showed some quality gains.
Overall PSNR: +0.199%; Avg PSNR: +0.245%; SSIM: +0.802%.
No speed change was observed.

Change-Id: I37c6643a3e2da59c4b7dc10ebe05abc8abf4026a
2015-04-02 15:48:32 -07:00
Marco
77ea408983 Code cleanup: put (8x8/4x4)fill_variance into separate function.
Code cleanup, no change in behavior.

Change-Id: I043b889f8f0b3afb49de0da00873bc3499ebda24
2015-04-02 13:37:35 -07:00
Marco
6eb05c9ed0 Small fix to segment check in pickmode.
Change-Id: Id5fd82a504def2523292466fbaad5dade9424c72
2015-04-02 09:55:13 -07:00
James Zern
b8a1de86fd Merge "vp9/neon: skip some files in high-bitdepth build" 2015-04-01 23:36:56 -07:00
James Zern
b644384bb5 Merge "vp9: fix high-bitdepth NEON build" 2015-04-01 23:36:17 -07:00
Yaowu Xu
54210f706c Merge "use MAX_MB_PLANE consistently" 2015-04-01 18:24:39 -07:00
hkuang
f3bea3de5b Remove unnecessary set postproc flags.
Change-Id: Iaf136969bc368a890f9671647576ee9d54eef03b
2015-04-01 17:11:35 -07:00
hkuang
4cf68be17a Merge "Fix 10-bit video decode failure with --frame-parallel mode." 2015-04-01 17:07:58 -07:00
Jingning Han
2149f214d5 Merge "Reduce required xmm number by one in block_error_fp" 2015-04-01 15:46:22 -07:00
Jingning Han
657cabe0f7 Tune SSSE3 assembly implementation to improve quantization speed
Change-Id: If0ca8b25b4800d4336e6cbc97194cd9b01c5b5a3
2015-04-01 15:28:01 -07:00
Yaowu Xu
f26b8c84f8 use MAX_MB_PLANE consistently
Change-Id: Ic416a7f145001a88f5a7f70dde9b1edbc1b69381
2015-04-01 15:21:20 -07:00
Yaowu Xu
fff4654d36 Merge "Simplify bsize calculation" 2015-04-01 15:06:55 -07:00
Jingning Han
cf4447339e Merge "Optimize quantization simd implementation" 2015-04-01 14:55:18 -07:00
Jingning Han
a4364e5146 Merge "Simplify effective src_diff address computation" 2015-04-01 14:55:03 -07:00
Jingning Han
7acb2a8795 Merge "Refactor block_yrd function for RTC coding mode" 2015-04-01 14:54:24 -07:00
Yaowu Xu
ba91b54d7c Simplify bsize calculation
Change-Id: Ibc514684def9914c66f04cb7931f773e2b79c168
2015-04-01 12:15:06 -07:00
Jingning Han
19da916716 Simplify effective src_diff address computation
Remove redundant offset calculation for effective src_diff address.

Change-Id: I4aab241a36abcef7fd8adf74aed5e12b8b88e0ef
2015-04-01 12:07:47 -07:00
Jingning Han
f2cf3c06a0 Reduce required xmm number by one in block_error_fp
Use 6 xmms instead of 8.

Change-Id: If976ad85d09191d2fb0565399d690f2869dbbcc7
2015-04-01 12:07:35 -07:00
Jingning Han
1470529f62 Refactor block_yrd function for RTC coding mode
This commit separates Hadamard transform/quantization operations
from rate and distortion computation in block_yrd. This allows one
to skip SATD computation when all transform blocks are quantized
to zero. It also uses a new block error function that skips
repeated computation of sum of squared residuals. It reduces the
CPU cycles spent on block error calculation in block_yrd by 40%.

Change-Id: I726acb2454b44af1c3bd95385abecac209959b10
2015-04-01 12:00:43 -07:00
Jingning Han
eed1badedd Optimize quantization simd implementation
This commit allows the quantizer to compare the AC coefficients to
the quantization step size to determine if further multiplication
operations are needed. It makes the quantization process 20% faster
without coding statistics change.

Change-Id: I735aaf6a9c0874c82175bb565b20e131464db64a
2015-04-01 11:47:09 -07:00
Yunqing Wang
a0043c6d30 Enhance the transform skipping decision-making in non-rd mode
For large partition blocks(block_size > 32x32), the variance
calculation is modified so that every 8x8 block's variance
is stored during the calculation, which is used in the
following transform skipping test. Also, the variance for
every tx block is calculated. The skipping test checks all tx
blocks in the partition, and sets the skip flag only if all tx
blocks are skippable. If the skip flag of Y plane is 1, a
quick evaluation is done on UV planes. If the current partition
block is skippable in YUV planes, the mode search checks fewer
inter modes and doesn't check intra modes.

The rtc set borg test(at speed 6) showed that:
Overall psnr: -0.527%; Avg psnr: -0.510%; ssim: -0.573%.
Average single-thread speedup on rtc set was 3.5%.
For 720p clips, more speedups were seen.
gipsrecmotion: 13%
gipsrestat: 12%
vidyo: 5 - 9%
dark: 15%
niklas: 6%

Change-Id: I8d8ebec0cb305f1de016516400bf007c3042666e
2015-04-01 09:43:40 -07:00
hkuang
1582ac851f Fix 10-bit video decode failure with --frame-parallel mode.
Also add unit test to avoid same error in the future.

Issue:981

Change-Id: Iaf9889d8d5514cfdff1ea098e6ae133be56d501f
2015-04-01 09:19:35 -07:00
James Zern
14e24a1297 vp9: enable sse4 sad functions
sse4 isn't set by configure or used in rtcd, correct the sad entries to
use sse4_1 without changing the signatures for now.
this was done in vp8 post-vp9 branch.

Change-Id: Ia9f1fff9f2476fdfa53ed022778dd2f708caa271
2015-03-31 21:00:55 -07:00
James Zern
a98f6c0254 vp9/neon: skip some files in high-bitdepth build
exclude files that only contain functions for non-high-bitdepth builds.
this removes some warnings related to missing prototypes

Change-Id: Ic6642998c46a7b808c6c53b2f9c34bcd4d037abe
2015-03-31 18:06:21 -07:00
James Zern
8845334097 vp9: fix high-bitdepth NEON build
remove incorrect specializations in rtcd and update a configuration
check in partial_idct_test.cc

Change-Id: I20f551f38ce502092b476fb16d3ca0969dba56f0
2015-03-31 17:45:25 -07:00
Yunqing Wang
fc98114761 Merge "Rename vbp thresholds" 2015-03-31 16:33:30 -07:00
Marco
c2b8218eba Merge "Set postproc flags in decoder_get_frame." 2015-03-31 15:22:14 -07:00
Yunqing Wang
c28ff1a9de Rename vbp thresholds
Code refactoring

Change-Id: I410fcce1bc6d95c62c474445f4c97ea8469f1e79
2015-03-31 15:14:44 -07:00
Jingning Han
502ac72233 Merge "Tuning SATD rate calculation for speed" 2015-03-31 14:24:26 -07:00
Jingning Han
1c39c5b96f Merge "Use aligned copy in 8x8 Hadamard transform SSE2" 2015-03-31 12:16:47 -07:00
Jingning Han
fa4289522e Merge "Allow block skip coding option in RTC mode" 2015-03-31 12:16:36 -07:00
Jingning Han
1638d7dc96 Merge "Fix 8x8 Hadamard SSE2 implementation" 2015-03-31 12:16:27 -07:00
Alex Converse
9670d766ab Merge "VP9E_GET_ACTIVE_MAP API function." 2015-03-31 11:52:56 -07:00
Jingning Han
531468a07a Tuning SATD rate calculation for speed
This commit allows the encoder to check the eob per transform
block to decide how to compute the SATD rate cost. If the entire
block is quantized to zero, there is no need to add anything; if
only the DC coefficient is non-zero, add its absolute value;
otherwise, sum over the block. This reduces the CPU cycles spent
on vp9_satd_sse2 to one third.

Change-Id: I0d56044b793b286efc0875fafc0b8bf2d2047e32
2015-03-31 11:02:20 -07:00
hui su
d4f2f1dd5b Merge "Move vp9_coef_con_tree to common/" 2015-03-31 10:51:10 -07:00
Jingning Han
014fa45298 Use aligned copy in 8x8 Hadamard transform SSE2
This reduces the 8x8 Hadamard transform cycles by 20%.

Change-Id: If34c5e02f3afa42244c6efabe121f7cf5d2df41b
2015-03-31 10:21:52 -07:00
Jingning Han
db5ec37edc Merge "Enable 16x16 Hadamard transform in SATD based mode decision" 2015-03-31 09:55:41 -07:00
Jingning Han
8c5670bb6f Merge "Use SATD based mode decision for block sizes below 16x16" 2015-03-31 09:47:47 -07:00
Jingning Han
ebe1be9186 Allow block skip coding option in RTC mode
When the estimated rate-distortion cost of skip coding mode is
lower than that of sending quantized coefficients, allow the
encoder to drop these coefficients. This improves the compression
performance of speed -6 by 0.268% and makes the encoding speed
slightly faster.

Change-Id: Idff2d7ba59f27ead33dd5a0e9f68746ed3c2ab68
2015-03-31 09:32:53 -07:00
hui su
302e24cb3e Move vp9_coef_con_tree to common/
This tree should be defined in common/, as it is needed for
both encoder and decoder.

Change-Id: I4f5cbc80025cf2ced14182c98f7c82dc7d0f87db
2015-03-31 09:20:46 -07:00
Marco
385ca8f741 Set postproc flags in decoder_get_frame.
The postproc settings were not set in decoder_get_frame().

Change-Id: I20d23de3ea18f6df061a53d691d4095d5c62532a
2015-03-30 16:15:57 -07:00
Jingning Han
9b99eb2e12 Merge "Reuse inter prediction pixel block for Hadamard transform" 2015-03-30 16:09:38 -07:00
Jingning Han
34a996ac1e Fix 8x8 Hadamard SSE2 implementation
This commit fixes the SSE2 version 8x8 Hadamard transform
alignment and makes it consistent with the C version.

Change-Id: I1304e5f97e0e5ef2d798fe38081609c39f5bfe74
2015-03-30 15:54:08 -07:00
Jingning Han
26d3d3af6a Enable 16x16 Hadamard transform in SATD based mode decision
This commit replaces the 16x16 2D-DCT transform with Hadamard
transform for RTC coding mode. It reduces the CPU cycles cost
on 16x16 transform by 5X. Overall it makes the speed -6 encoding
speed 1.5% faster without compromise on compression performance.

Change-Id: If6c993831dc4c678d841edc804ff395ed37f2a1b
2015-03-30 15:43:31 -07:00
Jingning Han
f0ac5aaa08 Merge "Hadamard transform based coding mode decision process" 2015-03-30 15:43:15 -07:00
Jingning Han
b4b5af6acd Use SATD based mode decision for block sizes below 16x16
This commit makes the encoder to select between SATD/variance as
metric for mode decision. It also allows to account chroma
component costs for mode decision as well. The overall encoding
time increase as compared to variance based mode selection is about
15% for speed -6. The compression performance is on average 2.2%
better than variance based approach, with about 5% compression
performance gains for hard clips (e.g., jimredvga, nikas720p, and
mmmoving) at lower bit-rate range.

Change-Id: I4d04a31d36f4fcb3f5f491dacd6e7fe44cb9d815
2015-03-30 15:20:07 -07:00
Jingning Han
8a927a1b7a Reuse inter prediction pixel block for Hadamard transform
It saves one unnecessary motion compensated prediction constructed
by using 8-tap filter.

Change-Id: I101215131e6f38621d5935885f94cc74de6a5377
2015-03-30 15:04:33 -07:00
Jingning Han
8c411f74e0 Hadamard transform based coding mode decision process
This commit uses Hadamard transform based rate-distortion cost
estimate for rtc coding mode decision. It improves the compression
performance of speed -6 for many hard clips at lower bit-rates.
For example, 5.5% for jimredvga, 6.7% for mmmoving, 6.1% for
niklas720p. This will introduce extra encoding cycle costs at
this point.

Change-Id: Iaf70634fa2417a705ee29f2456175b981db3d375
2015-03-30 14:46:05 -07:00
Alex Converse
bf7def9a43 Merge "Simplify skip check." 2015-03-30 11:31:45 -07:00
jackychen
b38b32a794 Merge "vp9_postproc.c: eliminate -Wshadow build warnings." 2015-03-30 10:29:39 -07:00
jackychen
68610ae568 vp9_postproc.c: eliminate -Wshadow build warnings.
Change-Id: I6df525a9ad1ae3cfbba8710d21db8fee76e64dbb
2015-03-27 20:27:30 -07:00
Marco
fa20a60f0d Speed 5: use non-rd mode for key frame coding.
Metrics on RTC set go down by ~1.5% on average.
Key frame encoding time goes down by factor of ~5.

Change-Id: Ia83acc55848613870e5ac6efe7f3d904d877febb
2015-03-27 16:19:26 -07:00
Adrian Grange
ad18b2b641 Remove 8-bit array in HBD
Creating both 8- and 16-bit arrays and then only using one
of them is wasteful.

Change-Id: Ic5b397c283efaff7bcfff2d2413838ba3e065561
2015-03-25 15:37:03 -07:00
Adrian Grange
65df3d138a Replace heap with stack memory allocation
Replaced the dynamic memory allocation of the
second_pred buffer with an allocation on the stack.

Change-Id: I2716c46b71e8587714ca5733a99eca2c68419b23
2015-03-25 15:36:43 -07:00
Adrian Grange
8d8d7bfde5 Fix use of scaling in joint motion search
To enable us to the scale-invariant motion estimation
code during mode selection, each of the reference
buffers is scaled to match the size of the frame
being encoded.

This fix ensures that a unit scaling factor is used in
this case rather than the one calculated assuming that
the reference frame is not scaled.

Change-Id: Id9a5c85dad402f3a7cc7ea9f30f204edad080ebf
2015-03-25 15:35:29 -07:00
paulwilkins
ab788c5380 Merge "Enable group adaptive max q by default." 2015-03-24 15:00:12 -07:00
Alex Converse
4dcb839607 VP9E_GET_ACTIVE_MAP API function.
This is useful when aq mode 3 (cyclic refresh) reactivates segments for refresh.

Change-Id: I3ad1d9410b899ede393d82bb8db14e2da4d84eca
2015-03-24 11:19:47 -07:00
Alex Converse
a1e20ec58f Refactor fast loop filter code to handle 444.
Change-Id: I921b1ebabdf617049f8fa26fbe462c3ff115c1ce
2015-03-24 11:17:50 -07:00
Yaowu Xu
c77d4dcb35 Merge "vp9_pred_mv(): misc fixes and optimizations" 2015-03-24 10:36:51 -07:00
Alex Converse
02697e35dc Merge "A tiny cyclic refresh / active map fix." 2015-03-24 09:43:24 -07:00