17127 Commits

Author SHA1 Message Date
James Zern
48fca113d1 inv_txfm_ssse3,butterfly: fix win32 abi compatibility
only the first 3 parameters can be aligned to 16 as required by __m128i,
make them all pointers for consistency.

since:
07c48ccfe Improve idct32x32_34_add SSSE3 intrinsics performance

BUG=webm:1384

Change-Id: I0324f701e723a27cb470036a180693ba8829d01d
2017-03-10 19:57:17 -08:00
James Zern
c09b290cea vp9/encoder: fix segfault on win32 using vs < 2015
shift the bsse[] member of the macroblock struct to the front to avoid
an incorrect offset (0) to the upper half of bsse[0] which leads to a
negative resulting in a crash. restrict this to visual studio versions
before 2015 (the bug was observed with 2013, fixed in 2015) to avoid any
potential cache impact on other platforms.

https://connect.microsoft.com/VisualStudio/feedback/details/2396360/bad-structure-offset-in-32-bit-code

BUG=webm:1054

Change-Id: I40f68a1d421ccc503cc712192263bab4f7dde076
2017-03-10 17:37:17 -08:00
Marco Paniconi
0af189c00d Merge "vp9: Sample encoder vpx_temporal_svc_encoder: enable row-mt" 2017-03-10 18:26:06 +00:00
Marco
169c846575 vp9: Sample encoder vpx_temporal_svc_encoder: enable row-mt
Enable row-mt in the sample encoder vpx_temporal_svc_encoder.c,
under certain conditions.

Change-Id: Ic103ee81a9d80be5bf6e5778cc21fc3199db909d
2017-03-10 10:11:39 -08:00
Yi Luo
018290a344 Merge "Improve idct32x32_135_add SSSE3 intrinsics performance" 2017-03-10 17:14:30 +00:00
Marco
ffb3c50da1 vp9: Enable row multithreading for SVC in real-time mode.
Enable row-mt for SVC for real-time mode, speed >=5.

Add the controls to the sample encoders, but keep it off for now.
Add the control and enable it for the 1 pass CBR unittests.

For speed 7, 3 layer SVC, 2 threads, row-mt enabled gives about a 5% speedup.

Change-Id: Ie8e77323c17263e3e7a7b9858aec12a3a93ec0c1
2017-03-10 01:01:07 +00:00
Yi Luo
327add990f Improve idct32x32_135_add SSSE3 intrinsics performance
- Split the inv txfm into three parts to avoid stack spillover.
- Function level speed improves ~12%.
- Use function and macro to remove some repeated code.

Change-Id: I14f5f072334fd766808cb52bf648df792e7379ee
2017-03-09 16:17:54 -08:00
Johann Koenig
f951881e8c Merge "ppc: include ppc.h for ppc_simd_caps()" 2017-03-09 23:12:37 +00:00
James Zern
cb60e66085 Merge "move vp9_scale_and_extend_frame_c to vp9_frame_scale.c" 2017-03-09 22:51:08 +00:00
Johann
94655569fe Remove ppc-linux-gcc target
Change-Id: Iec2430966f54e2e5ba79f6bb703f47adde46479f
2017-03-09 11:33:33 -08:00
Johann
ccd23215ed ppc: include ppc.h for ppc_simd_caps()
Change-Id: Idc829eb066cf4e905d062cb9c08424e0f1b7e1a7
2017-03-09 09:26:45 -08:00
James Zern
2f31a16445 move vp9_scale_and_extend_frame_c to vp9_frame_scale.c
this is similar to the x86 configuration and helps mitigate an issue
with a circular dependency between this function and the ssse3 variant
causing an outsized increase in binary size (~300K for chrome)
chrome.dll:
.text 255B000 -> 252B000
.data 7B000 -> 75000
-221184 bytes

BUG=chromium:697956

Change-Id: Ic95b142ecd62dd4f1795788aa27dd8fab59b708c
2017-03-08 21:13:50 -08:00
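
The byte total quoted in the message above follows from the hexadecimal section sizes it lists. A minimal sketch of that arithmetic, assuming the `.text` and `.data` values are section sizes in bytes; the helper names `section_delta` and `total_delta` are hypothetical and exist only for this illustration:

```c
/* Signed size change of one section, in bytes (hypothetical helper). */
static long section_delta(unsigned long before, unsigned long after) {
  return (long)after - (long)before;
}

/* Sum of the .text and .data changes reported for chrome.dll. */
static long total_delta(void) {
  /* .text 255B000 -> 252B000: -0x30000 = -196608 bytes */
  long text = section_delta(0x255B000UL, 0x252B000UL);
  /* .data 7B000 -> 75000: -0x6000 = -24576 bytes */
  long data = section_delta(0x7B000UL, 0x75000UL);
  return text + data;
}
```

Calling `total_delta()` returns -221184, which matches the figure in the commit message.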
Marco Paniconi
04aa9e28d5 Merge "vp9: Enable two speed features for SVC real-time mode." 2017-03-09 03:58:14 +00:00
Marco
ea3c817ac2 vp9: Enable two speed features for SVC real-time mode.
Enable short_circuit_low_temp_var and limit_newmv_early_exit
for SVC, 1 pass CBR mode.

Change-Id: I77df2b2c6cc40657bb8ea76e19dfc2fdaad6389e
2017-03-08 16:13:59 -08:00
Marco
97b6a6f037 vp9: Add control to vpx_temporal_svc_encoder for row-mt.
Keep it off as default for now.

Change-Id: Ia2518a8ce96c9735c3fe67215dde25a35e8620af
2017-03-08 16:03:44 -08:00
Jerome Jiang
834f26c3b9 Merge "Shift speed 2 from non-large VP9 tests to large ones." 2017-03-08 23:14:27 +00:00
Johann Koenig
42a1b310e1 Merge "Add support for POWER8/VSX" 2017-03-08 22:38:21 +00:00
Yunqing Wang
6a86492adf Merge "Make the partition search early termination feature to be frame size dependent" 2017-03-08 22:31:30 +00:00
Yunqing Wang
099e9bf1ff Make the partition search early termination feature to be frame size dependent
The 2 thresholds (i.e. partition_search_breakout_dist_thr and
partition_search_breakout_rate_thr) are used as the partition search
early termination speed feature. This refactoring patch makes this
feature frame size dependent consistently throughout the code.

Change-Id: Idaa0bd8400badaa0f8e2091e3f41ed2544e71be9
2017-03-08 12:56:41 -08:00
Linfeng Zhang
77311e0dff Update vpx_idct32x32_1024_add_neon()
Most are cosmetic changes.
Speed has no change with clang 3.8, and about 5% faster with gcc 4.8.4

Tried the strategy used in 8x8 and 16x16 (whose order of operations is
similar to the C code); though speed gets better with gcc, it's worse
with clang.

Tried to remove store_in_output(), but speed gets worse.

Change-Id: I93c8d284e90836f98962bb23d63a454cd40f776e
2017-03-08 12:39:04 -08:00
Rafael de Lucena Valle
51289302ab Add support for POWER8/VSX
Add ppc, ppc64 and ppc64le on all_platforms and ARCH_LIST

Add VSX flags and check for -mvsx

Define empty setup_rtcd_internal

Add Altivec detection based on:
http://freevec.org/function/altivec_runtime_detection_linux

Detect VSX at runtime when enabled

Change-Id: I304f4d8c5fee0ff19b6483cd2e9cc50d6ddec472
Signed-off-by: Rafael de Lucena Valle <rafaeldelucena@gmail.com>
2017-03-08 20:28:08 +00:00
Linfeng Zhang
48f5886605 Add vpx_highbd_idct32x32_135_add_c()
When eob is less than or equal to 135 for high-bitdepth 32x32 idct,
call this function.

BUG=webm:1301

Change-Id: I8a5864f5c076e449c984e602946547a7b09c9fe6
2017-03-08 10:46:33 -08:00
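
The eob-based selection described in the message above can be sketched as a simple threshold check. This is an illustration under stated assumptions, not the libvpx dispatch code: the 135 threshold comes from the commit message, the 34 threshold is assumed from the `_34_` function names elsewhere in this log, and the enum and function names are hypothetical.

```c
/* Hypothetical labels for the 32x32 inverse-transform variants. */
typedef enum { IDCT32_34, IDCT32_135, IDCT32_1024 } idct32_variant;

/* Pick the cheapest variant that covers all nonzero coefficients.
 * eob is the end-of-block position: coefficients at or beyond it are zero. */
static idct32_variant select_idct32_variant(int eob) {
  if (eob <= 34) return IDCT32_34;    /* assumed threshold, from the _34_ name */
  if (eob <= 135) return IDCT32_135;  /* threshold stated in the commit message */
  return IDCT32_1024;                 /* full transform */
}
```

For example, `select_idct32_variant(100)` yields `IDCT32_135`, while any eob above 135 falls through to the full 1024-coefficient path.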
Marco Paniconi
2fa710aa6d Merge "vp9: Fix for denoising with SVC." 2017-03-08 18:26:12 +00:00
Marco
45de35fc58 vp9: Fix for denoising with SVC.
Fix the condition for getting last_source when denoising is on.
This avoids unneeded scaling in the case of SVC.

No change in quality.

Change-Id: I32c1c2c9085104da51af8535716bcc4d55fb0f42
2017-03-08 09:45:58 -08:00
Linfeng Zhang
c4e5c54d69 cosmetics,dsp/arm/: vpx_idct32x32_{34,135}_add_neon()
No speed changes and disassembly is almost identical.

Change-Id: Id07996237d2607ca6004da5906b7d288b8307e1f
2017-03-08 08:58:32 -08:00
Linfeng Zhang
3cf5c213f1 cosmetics,dsp/arm/: rename a variable
Rename cospi_6_26_14_18N to cospi_6_26N_14_18N for consistency.

Change-Id: I00498b43bb612b368219a489b3adaa41729bf31a
2017-03-08 08:55:41 -08:00
Jerome Jiang
c4c0331f65 Shift speed 2 from non-large VP9 tests to large ones.
This may fix the timeout failure of valgrind tests in nightly
since more coverage was added for row-mt.

Change-Id: Id9414e66d1a266602c7495243d9f5cb69e17ccdc
2017-03-07 13:58:11 -08:00
James Bankoski
88a888f022 Merge "tiny_ssim.c : adds y4m support to tiny_ssim." 2017-03-07 18:49:14 +00:00
Jim Bankoski
393d9d0195 tiny_ssim.c : adds y4m support to tiny_ssim.
Change-Id: I7a13b7e3a1e11ddbe4be3009edf03528e1bc7647
2017-03-07 08:37:00 -08:00
James Zern
47cf7c25a2 Merge "vp8_create_decoder_instances: correct pbi[] memset" 2017-03-04 00:47:18 +00:00
Alex Converse
15dac923b9 Merge "Narrow cat6_high_cost tables to uint16_t" 2017-03-03 23:45:39 +00:00
James Zern
9c3c1f3725 vp8_create_decoder_instances: correct pbi[] memset
clear the entire array on error. the size used previously was equal to
the number of elements.

BUG=webm:1364

Change-Id: I2f2e16ed6e867f41d4774a5a8ac9cedaee11ce46
2017-03-03 15:23:32 -08:00
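
The mistake described in the message above, passing an element count where `memset` expects a byte count, can be sketched as follows. This is a minimal illustration and not the libvpx code: `decoder_slot`, `MAX_SLOTS`, and the three helper functions are hypothetical names.

```c
#include <stddef.h>
#include <string.h>

/* Hypothetical stand-ins for the real decoder types. */
#define MAX_SLOTS 4
typedef struct { int id; } decoder_slot;

/* Buggy pattern: the length is the number of elements, so only the
 * first MAX_SLOTS bytes of the pointer array are cleared. */
static void clear_slots_buggy(decoder_slot *slots[MAX_SLOTS]) {
  memset(slots, 0, MAX_SLOTS);
}

/* Fixed pattern: the length is element count times element size,
 * so every pointer in the array is cleared. */
static void clear_slots_fixed(decoder_slot *slots[MAX_SLOTS]) {
  memset(slots, 0, MAX_SLOTS * sizeof(slots[0]));
}

/* Count the entries that are still non-NULL. */
static size_t count_live(decoder_slot *const slots[MAX_SLOTS]) {
  size_t live = 0;
  for (size_t i = 0; i < MAX_SLOTS; ++i) {
    if (slots[i] != NULL) ++live;
  }
  return live;
}
```

With an array of four valid pointers, the buggy version leaves most entries untouched because four bytes cover at most one pointer, while the fixed version leaves `count_live` at zero.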
Alex Converse
bcd12de6c3 Narrow cat6_high_cost tables to uint16_t
Saves 2688 bytes of rodata.

Change-Id: I46633b6e50c2845181c70fff6273a8e58fdd1e56
2017-03-03 23:09:12 +00:00
Vignesh Venkatasubramanian
9e7140b451 Merge "vp9,realtime: Enable row multithreading for non-rd" 2017-03-03 19:05:52 +00:00
Marco Paniconi
1bb63bf669 Merge "vp9: Speed 8: reduce the adaptive_rd_thresh level." 2017-03-02 22:25:03 +00:00
Marco
b60617f5ff vp9: Speed 8: reduce the adaptive_rd_thresh level.
Reduce the level from 4 to 2.
This gives ~1-2% quality gain on RTC set, with a small decrease in speed (~1-2% on mac).

Change-Id: I7d959731badcee3d45b2f4a08efe378765016a13
2017-03-02 13:34:10 -08:00
Vignesh Venkatasubramanian
453f18040f vp9,realtime: Enable row multithreading for non-rd
Enable row level multithreading for realtime encodes where non-rd
path is used (speed >= 5).

Change-Id: I5439cb49a02171166d8e1de06c7d5e6f8e819a41
2017-03-02 11:03:56 -08:00
Yi Luo
07c48ccfe0 Improve idct32x32_34_add SSSE3 intrinsics performance
- Split the transform into first half and second half.
- Reschedule the instructions to avoid stack spillover.
- Function level speed improves ~16%.

Change-Id: I166889840d23aa8a273eca00f6fbdae8b4566f35
2017-03-01 11:14:48 -08:00
Chrome Cunningham
b71245683b Merge "VPX_CODEC_CAP_HIGHBITDEPTH for decoder interface" 2017-03-01 18:01:14 +00:00
Chris Cunningham
bcd0c49af3 VPX_CODEC_CAP_HIGHBITDEPTH for decoder interface
Moves the def from vpx_encoder.h -> vpx_codec.h. The defined value
is changed as part of this move.

Adds the value to decoder capabilities when CONFIG_VP9_HIGHBITDEPTH.

Change-Id: I7d61fc821cda29f1e32bb9b2b9ffd3d83966e419
2017-02-28 17:10:34 -08:00
James Zern
8697d14ec8 Revert "Fix for max qindex calculation of a gf interval"
This reverts commit d3db846cc5.

This change causes a large drop in psnr (4-5db) on low framerate
difficult content (tested at 360/480p)

BUG=b/35804225

Change-Id: I8e90012d3b9c8a0cddb062ba93b01b36c0e0c0a0
2017-02-28 16:26:13 -08:00
James Zern
66919e370b vp9_ethread_test,cosmetics: s/new-mt/row-mt/
Change-Id: I8c145337adf49d30b88a17ff31501b8751ed1fa0
2017-02-28 15:13:11 -08:00
James Zern
3ab8a05b37 stress.sh: add vp9_stress_test_row_mt
vp9_stress_test now forces --row-mt=0 to cover both versions

Change-Id: I8d134879435bf1d8e76ab3fd89e698efba0e86b2
2017-02-28 15:09:30 -08:00
James Zern
b58a8ccb02 stress.sh: parameterize thread count
Change-Id: Iae45266cea86585f0935af4012335198cf93719f
2017-02-28 15:09:30 -08:00
James Zern
4684d286de stress.sh: add one pass encodes
Change-Id: I38e6c988f17c56fbfacd95378b27ef8d77c75f90
2017-02-28 15:09:30 -08:00
Yunqing Wang
3833905ff2 Add a comment in encoder thread test
Added a comment.

Change-Id: I82f71c72598ad6f1eaa0b57b0b8ec56ab9658e81
2017-02-28 11:13:09 -08:00
Yunqing Wang
3fa7e5c62c Set row_mt to 0 by default
Set row_mt to 0 for now.

Change-Id: I922536a6d71a765e435daeaf4d932ef14363d19a
2017-02-28 11:00:56 -08:00
Marco
defe094e9e vp9: Fix an issue with setting variance thresholds.
From commit:
https://chromium-review.googlesource.com/c/441393/

On non-segment the set_vbp_thresholds() should be called
again to adjust thresholds based on content_state of superblock.
This was the intended behavior from 441393.

Small change in RTC metrics and speed.

Change-Id: I45e5fbdc4af74db76b3cb4f13074fcae0eb2219e
2017-02-27 12:09:51 -08:00
Vignesh Venkatasubramanian
ddfe906be2 vp9_ethread_test: Rename new_mt to row_mt
Rename leftover occurrences of new_mt.

Change-Id: Ib884e84c801fcd366ca4b57ec912ac5972023375
2017-02-27 10:50:02 -08:00
Vignesh Venkatasubramanian
5881601488 vp9: Rename new_mt to row_mt
new_mt is a very generic name that will get obsolete soon enough.
Since this is exposed as a codec control, renaming it to row_mt to
signify row level parallelism. Also renaming the ETHREAD_BIT_MATCH
codec control to ROW_MT_BIT_EXACT.

Change-Id: Ic7872d78bb3b12fb4cf92ba028ec8e08eb3a9558
2017-02-27 09:43:26 -08:00