
16948 Commits

Author SHA1 Message Date
Rafael de Lucena Valle
405b94c661 Add Hadamard for Power8
Change-Id: I3b4b043c1402b4100653ace4869847e030861b18
Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>
2017-03-15 23:46:18 -03:00
Marco Paniconi
cd47c1942e Merge "vp9: Fix some issues with denoiser and SVC." 2017-03-16 02:42:55 +00:00
Marco
a340c64a79 vp9: Fix some issues with denoiser and SVC.
Fix the update of the denoiser buffer when the base
spatial layer is a key frame. And allow for better/lower
QP on high spatial layers when their base layer is key frame.

Change-Id: I96b2426f1eaa43b8b8d4c31a68b0c6d68c3024a2
2017-03-15 17:19:17 -07:00
Marco
2c8430e223 vp9: Turn off ml_partition_search_early_termination.
Fails on nightly ubsan, valgrind tests.
Enabled on commit:6701014

Change-Id: Ied3f5cb38e39cba54ac134f4514107cdfdfce159
2017-03-15 15:00:38 -07:00
Yi Luo
8440cc4817 Merge "Improve idct32x32_1024_add SSSE3 intrinsics performance" 2017-03-15 02:32:52 +00:00
Linfeng Zhang
d9a9a4ffea Merge "Fix overflow issue in 32x32 idct NEON intrinsics" 2017-03-15 00:38:17 +00:00
Jerome Jiang
27d5a57072 Merge "vp9: Using source sad for speedup for dynamic resizing." 2017-03-15 00:03:52 +00:00
Linfeng Zhang
c756eb01c8 Fix overflow issue in 32x32 idct NEON intrinsics
Similar issue as Change bc1c18e.

The PartialIDctTest.ResultsMatch test on vpx_idct32x32_135_add_neon()
in high bit-depth mode exposes 16-bit overflow in final stage of pass
2, when changing the test number from 1,000 to 1,000,000.

Change to use saturating add/sub for vpx_idct32x32_34_add_neon(),
vpx_idct32x32_135_add_neon() and vpx_idct32x32_1024_add_neon() in high
bit-depth mode.

Change-Id: Iaec0e9aeab41a3fdb4e170d7e9b3ad1fda922f6f
2017-03-14 16:59:14 -07:00
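The overflow described in the commit above can be illustrated with a small model of 16-bit arithmetic. This is only a sketch in Python, not the NEON code from the change: the helper names `wrap_add_s16`, `sat_add_s16` and `sat_sub_s16` are hypothetical, and the assumption is that the final stage of pass 2 adds or subtracts two signed 16-bit lanes whose true result can exceed the int16 range in high bit-depth mode.

```python
# Hypothetical model of signed 16-bit lane arithmetic; names are illustrative only.
INT16_MIN, INT16_MAX = -32768, 32767


def wrap_add_s16(a, b):
    """Add two int16 values with two's-complement wraparound (the overflow case)."""
    return ((a + b + 0x8000) & 0xFFFF) - 0x8000


def sat_add_s16(a, b):
    """Add two int16 values, clamping the result to the int16 range."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


def sat_sub_s16(a, b):
    """Subtract two int16 values, clamping the result to the int16 range."""
    return max(INT16_MIN, min(INT16_MAX, a - b))


if __name__ == "__main__":
    # A wrapping add flips the sign at the boundary; a saturating add clamps.
    print(wrap_add_s16(32767, 1))  # -32768
    print(sat_add_s16(32767, 1))   # 32767
    print(sat_sub_s16(-32768, 1))  # -32768
```

With wraparound, a value just past the positive limit becomes a large negative number, which is the kind of mismatch a results-comparison test would expose. Clamping keeps the result at the nearest representable value instead.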
Jerome Jiang
2fa7092808 Merge "vp9: Enable row multithreading for SVC in real-time mode." 2017-03-14 23:29:46 +00:00
Jerome Jiang
02463273c9 vp9: Using source sad for speedup for dynamic resizing.
Only for speed >= 7.

Change-Id: I3ac85fbb4023cf7e6f8333806b345b0174382a09
2017-03-14 15:47:19 -07:00
Yi Luo
fedcf83f33 Improve idct32x32_1024_add SSSE3 intrinsics performance
- Function level speed improves ~12%.

Change-Id: I9b7dbddabf08c7d0f6b25264e6074d5ccbe39290
2017-03-14 14:04:08 -07:00
James Zern
1b91f41935 Merge "vp9/encoder: fix segfault on win32 using vs < 2015" 2017-03-14 19:21:42 +00:00
Yunqing Wang
c3e290963d Merge "Apply machine learning-based early termination in VP9 partition search" 2017-03-14 18:07:05 +00:00
Marco Paniconi
78a6946904 Merge "vp9: Speed >= 8: Enable simple_block_yrd speed feature." 2017-03-14 17:50:17 +00:00
Marco
c0c789ab50 vp9: Adjust copy partition threshold, for speed 8.
Reduce it from 5 to 4, small/no change in metrics or speed.
Small reduction in dragging artifact near moving head.

Change-Id: Ic3bc5ca67c70bf0c89fc2ed14454840a28ae5b6a
2017-03-14 09:18:53 -07:00
Marco
c216c8d6f2 vp9: Speed >= 8: Enable simple_block_yrd speed feature.
Enable speed feature for resolutions > VGA.
avgPSNR on RTC down by ~1.7%.
Speedup on ARM: ~5%.

Change-Id: I7a3fe5f7425aa8df3f4a2eced1afa355bc0d4c95
2017-03-14 09:10:28 -07:00
Marco Paniconi
507204316a Merge "vp9: Fix to source_sad feature for SVC." 2017-03-13 19:18:31 +00:00
Linfeng Zhang
b0bfcc368c Merge "Add vpx_highbd_idct32x32_135_add_c()" 2017-03-13 18:49:01 +00:00
Marco
f0a22b23fe vp9: Fix to source_sad feature for SVC.
Allow speed feature sf->use_source_sad to be used
on highest spatial layer for SVC.

Change-Id: I260eb0478902764f49f83e43b17024fe86ff3b22
2017-03-13 11:00:40 -07:00
Yunqing Wang
670101439f Apply machine learning-based early termination in VP9 partition search
This patch was based on Yang Xian's intern project code. Further modifications
were done.
1. Moved machine-learning related parameters into the context structure.
2. Corrected the calculation of sum_eobs.
3. Removed unused parameters and calculations.
4. Made it work with multiple tiles.
5. Added a speed feature for the machine-learning based partition search
early termination.
6. Re-organized the code.

The patch was rebased to the top-of-tree.

Borg test BDRATE result:
4k set:     PSNR: +0.144%; SSIM: +0.043%;
hdres set:  PSNR: +0.149%; SSIM: +0.269%;
midres set: PSNR: +0.127%; SSIM: +0.257%;

Average speed gain result:
4k clips: 22%;
hd clips: 23%;
midres clips: 15%.

Change-Id: I0220e93a8277e6a7ea4b2c34b605966e3b1584ac
2017-03-13 09:54:18 -07:00
Marco Paniconi
b39f7c3364 Merge "vp9: Fix condition for intra search in non-rd pickmode." 2017-03-13 06:11:13 +00:00
Marco
8c18df7fcd vp9: Fix condition for intra search in non-rd pickmode.
Fixes an issue when LAST and golden are not used as a reference,
in which case it's possible no encoding mode is set (since intra may be
skipped under certain conditions). Fix is to make sure intra is searched
if no inter mode is checked.

Issue can happen for temporal layer pattern#7 in vpx_temporal_svc_encoder.c

Change-Id: I5ab4999b2f9dbd739044888e0916b5ec491d966b
2017-03-12 22:30:39 -07:00
James Zern
48fca113d1 inv_txfm_ssse3,butterfly: fix win32 abi compatibility
only the first 3 parameters can be aligned to 16 as required by __m128i,
make them all pointers for consistency.

since:
07c48ccfe Improve idct32x32_34_add SSSE3 intrinsics performance

BUG=webm:1384

Change-Id: I0324f701e723a27cb470036a180693ba8829d01d
2017-03-10 19:57:17 -08:00
James Zern
c09b290cea vp9/encoder: fix segfault on win32 using vs < 2015
shift the bsse[] member of the macroblock struct to the front to avoid
an incorrect offset (0) to the upper half of bsse[0] which leads to a
negative resulting in a crash. restrict this to visual studio versions
before 2015 (the bug was observed with 2013, fixed in 2015) to avoid any
potential cache impact on other platforms.

https://connect.microsoft.com/VisualStudio/feedback/details/2396360/bad-structure-offset-in-32-bit-code

BUG=webm:1054

Change-Id: I40f68a1d421ccc503cc712192263bab4f7dde076
2017-03-10 17:37:17 -08:00
Marco Paniconi
0af189c00d Merge "vp9: Sample encoder vpx_temporal_svc_encoder: enable row-mt" 2017-03-10 18:26:06 +00:00
Marco
169c846575 vp9: Sample encoder vpx_temporal_svc_encoder: enable row-mt
Enable row-mt in the sample encoder vpx_temporal_svc_encoder.c,
under certain conditions.

Change-Id: Ic103ee81a9d80be5bf6e5778cc21fc3199db909d
2017-03-10 10:11:39 -08:00
Yi Luo
018290a344 Merge "Improve idct32x32_135_add SSSE3 intrinsics performance" 2017-03-10 17:14:30 +00:00
Marco
ffb3c50da1 vp9: Enable row multithreading for SVC in real-time mode.
Enable row-mt for SVC for real-time mode, speed >=5.

Add the controls to the sample encoders, but keep it off for now.
Add the control and enable it for the 1 pass CBR unittests.

For speed 7, 3 layer SVC, 2 threads, row-mt enabled gives a ~5% speedup.

Change-Id: Ie8e77323c17263e3e7a7b9858aec12a3a93ec0c1
2017-03-10 01:01:07 +00:00
Yi Luo
327add990f Improve idct32x32_135_add SSSE3 intrinsics performance
- Split the inv txfm into three parts to avoid stack spillover.
- Function level speed improves ~12%.
- Use function and macro to remove some repeated code.

Change-Id: I14f5f072334fd766808cb52bf648df792e7379ee
2017-03-09 16:17:54 -08:00
Johann Koenig
f951881e8c Merge "ppc: include ppc.h for ppc_simd_caps()" 2017-03-09 23:12:37 +00:00
James Zern
cb60e66085 Merge "move vp9_scale_and_extend_frame_c to vp9_frame_scale.c" 2017-03-09 22:51:08 +00:00
Johann
ccd23215ed ppc: include ppc.h for ppc_simd_caps()
Change-Id: Idc829eb066cf4e905d062cb9c08424e0f1b7e1a7
2017-03-09 09:26:45 -08:00
James Zern
2f31a16445 move vp9_scale_and_extend_frame_c to vp9_frame_scale.c
this is similar to the x86 configuration and helps mitigate an issue
with a circular dependency between this function and the ssse3 variant
causing an outsized increase in binary size (~300K for chrome)
chrome.dll:
.text 255B000 -> 252B000
.data 7B000 -> 75000
-221184 bytes

BUG=chromium:697956

Change-Id: Ic95b142ecd62dd4f1795788aa27dd8fab59b708c
2017-03-08 21:13:50 -08:00
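The size figures quoted in the commit above can be checked with a few lines of arithmetic. The sketch below assumes the section sizes are hexadecimal byte counts, as the digits suggest; the helper name `section_delta` is hypothetical.

```python
def section_delta(before_hex, after_hex):
    """Return the change in bytes between two hexadecimal section sizes."""
    return int(after_hex, 16) - int(before_hex, 16)


# Values copied from the commit message (chrome.dll section sizes).
text_delta = section_delta("255B000", "252B000")  # .text
data_delta = section_delta("7B000", "75000")      # .data
total_delta = text_delta + data_delta

if __name__ == "__main__":
    print(text_delta, data_delta, total_delta)  # -196608 -24576 -221184
```

The .text section shrinks by 0x30000 bytes and .data by 0x6000 bytes, which together match the -221184 bytes stated in the message.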
Marco Paniconi
04aa9e28d5 Merge "vp9: Enable two speed features for SVC real-time mode." 2017-03-09 03:58:14 +00:00
Marco
ea3c817ac2 vp9: Enable two speed features for SVC real-time mode.
Enable short_circuit_low_temp_var and limit_newmv_early_exit
for SVC, 1 pass CBR mode.

Change-Id: I77df2b2c6cc40657bb8ea76e19dfc2fdaad6389e
2017-03-08 16:13:59 -08:00
Marco
97b6a6f037 vp9: Add control to vpx_temporal_svc_encoder for row-mt.
Keep it off as default for now.

Change-Id: Ia2518a8ce96c9735c3fe67215dde25a35e8620af
2017-03-08 16:03:44 -08:00
Jerome Jiang
834f26c3b9 Merge "Shift speed 2 from non-large VP9 tests to large ones." 2017-03-08 23:14:27 +00:00
Johann Koenig
42a1b310e1 Merge "Add support for POWER8/VSX" 2017-03-08 22:38:21 +00:00
Yunqing Wang
6a86492adf Merge "Make the partition search early termination feature to be frame size dependent" 2017-03-08 22:31:30 +00:00
Yunqing Wang
099e9bf1ff Make the partition search early termination feature to be frame size dependent
The 2 thresholds (i.e. partition_search_breakout_dist_thr and
partition_search_breakout_rate_thr) are used as the partition search
early termination speed feature. This refactoring patch makes this
feature frame size dependent consistently throughout the code.

Change-Id: Idaa0bd8400badaa0f8e2091e3f41ed2544e71be9
2017-03-08 12:56:41 -08:00
Linfeng Zhang
77311e0dff Update vpx_idct32x32_1024_add_neon()
Mostly cosmetic changes.
Speed is unchanged with clang 3.8, and about 5% faster with gcc 4.8.4.

Tried the strategy used in 8x8 and 16x16 (whose order of operations is
similar to the C code); though speed gets better with gcc, it's worse
with clang.

Tried to remove store_in_output(), but speed gets worse.

Change-Id: I93c8d284e90836f98962bb23d63a454cd40f776e
2017-03-08 12:39:04 -08:00
Rafael de Lucena Valle
51289302ab Add support for POWER8/VSX
Add ppc, ppc64 and ppc64le on all_platforms and ARCH_LIST

Add VSX flags and check for -mvsx

Define empty setup_rtcd_internal

Add Altivec detection based on:
http://freevec.org/function/altivec_runtime_detection_linux

Detect VSX at runtime when enabled

Change-Id: I304f4d8c5fee0ff19b6483cd2e9cc50d6ddec472
Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>
2017-03-08 20:28:08 +00:00
Linfeng Zhang
48f5886605 Add vpx_highbd_idct32x32_135_add_c()
When eob is less than or equal to 135 for high-bitdepth 32x32 idct,
call this function.

BUG=webm:1301

Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
2017-03-08 10:46:33 -08:00
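The eob-based selection mentioned in the commit above can be sketched as a simple threshold dispatch. This is an illustration only, not the libvpx source: the function `pick_highbd_idct32x32` is hypothetical, the 34 and 1024 variant names are assumed by analogy with the vpx_idct32x32_{34,135,1024}_add_neon() names that appear elsewhere in this log, and eob is taken to mean the end-of-block position (the number of coefficients up to the last nonzero one).

```python
def pick_highbd_idct32x32(eob):
    """Return the name of the 32x32 inverse transform variant for a given eob.

    Smaller eob means fewer nonzero coefficients, so a cheaper partial
    transform can be used. Names other than the 135 variant are assumptions.
    """
    if eob <= 34:
        return "vpx_highbd_idct32x32_34_add_c"    # assumed name
    if eob <= 135:
        return "vpx_highbd_idct32x32_135_add_c"   # added by the commit above
    return "vpx_highbd_idct32x32_1024_add_c"      # assumed name


if __name__ == "__main__":
    for eob in (20, 100, 135, 136, 1024):
        print(eob, pick_highbd_idct32x32(eob))
```

Any special handling for very small eob values (for example a DC-only path) is left out of this sketch.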
Marco Paniconi
2fa710aa6d Merge "vp9: Fix for denoising with SVC." 2017-03-08 18:26:12 +00:00
Marco
45de35fc58 vp9: Fix for denoising with SVC.
Fix the condition for getting last_source when denoising is on.
This avoids unneeded scaling in the case of SVC.

No change in quality.

Change-Id: I32c1c2c9085104da51af8535716bcc4d55fb0f42
2017-03-08 09:45:58 -08:00
Linfeng Zhang
c4e5c54d69 cosmetics,dsp/arm/: vpx_idct32x32_{34,135}_add_neon()
No speed changes and disassembly is almost identical.

Change-Id: Id07996237d2607ca6004da5906b7d288b8307e1f
2017-03-08 08:58:32 -08:00
Linfeng Zhang
3cf5c213f1 cosmetics,dsp/arm/: rename a variable
Rename cospi_6_26_14_18N to cospi_6_26N_14_18N for consistency.

Change-Id: I00498b43bb612b368219a489b3adaa41729bf31a
2017-03-08 08:55:41 -08:00
Jerome Jiang
c4c0331f65 Shift speed 2 from non-large VP9 tests to large ones.
This may fix the timeout failure of the nightly valgrind tests,
since more coverage was added for row-mt.

Change-Id: Id9414e66d1a266602c7495243d9f5cb69e17ccdc
2017-03-07 13:58:11 -08:00
James Bankoski
88a888f022 Merge "tiny_ssim.c : adds y4m support to tiny_ssim." 2017-03-07 18:49:14 +00:00
Jim Bankoski
393d9d0195 tiny_ssim.c : adds y4m support to tiny_ssim.
Change-Id: I7a13b7e3a1e11ddbe4be3009edf03528e1bc7647
2017-03-07 08:37:00 -08:00