Commit Graph

17597 Commits

Author SHA1 Message Date
Johann
2d6b5df657 neon: vpx_quantize_b
With skip block or coeff < zbin it is about twice as fast as C.

If most coeff values are > zbin it is about 10-15x as fast as C.

BUG=webm:1426

Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
2017-07-31 10:38:46 -07:00
James Zern
1c666465af inv_txfm_{sse2,ssse3}: clear conversion warnings
visual studio reports tran_high_t (int64) -> short in calls to
_mm_set1_epi16

Change-Id: Icb8d1baee77ad3d45edb1477a443d3e648f0b745
2017-07-25 20:13:49 -07:00
James Zern
62682ac8ad highbd_idct*_sse*.c: clear conversion warnings
visual studio reports tran_high_t (int64) -> int in calls to
_mm_setr_epi32

Change-Id: Ic2247c8e3800991202151790d78bd94c4f4aed05
2017-07-25 20:11:09 -07:00
James Zern
85736e616e vpx_variance16x16_sse2: correct cast order
allow the right shift to operate on 64-bits, this matches the rest of
the implementations

previously:
b0f1ae147 vpx_get16x16var_avx2: correct cast order

Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
2017-07-25 16:45:40 -07:00
James Zern
b0f1ae1475 vpx_get16x16var_avx2: correct cast order
allow the right shift to operate on 64-bits, this matches the rest of
the implementations

missed in:
6acd061aa variance_avx2: sync variance functions with c-code

Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617
2017-07-24 16:29:44 -07:00
James Zern
8836e46ffd set_var_thresh_from_histogram: prevent negative variance
For 8-bit the subtrahend is small enough to fit into uint32_t.

For 10/12-bit apply:
63a37d16f Prevent negative variance

previously:
47b9a0912 Resolve -Wshorten-64-to-32 in highbd variance.
c0241664a Resolve -Wshorten-64-to-32 in variance.

Change-Id: I181c85f0b9a03da37c2e8b89482d48aa3dbc0aee
2017-07-22 13:27:32 -07:00
Marco
8c7a60e04d vp8: Fix compile warning in vp8_multi_resolution_encoder.c
Change-Id: I49c960179dfc1902aa5e5c99915789878c06bc3d
2017-07-20 14:19:43 -07:00
Johann Koenig
e8bd534c42 Merge "quantize test: promote RandRange() result to signed" 2017-07-20 19:46:05 +00:00
Johann Koenig
0c30b75f40 Merge "quantize test: lowbd functions do not pass in highbd" 2017-07-20 19:45:59 +00:00
Jerome Jiang
494188505b Merge "vp9: Removed unused skin detection function." 2017-07-20 16:58:01 +00:00
Johann
af08fbb444 quantize test: promote RandRange() result to signed
Avoid unsigned overflow warning:
unsigned integer overflow: 19974 - 32703 cannot be represented in type
'unsigned int'

Change-Id: Ifebee014342e4c6f3b53306c0cad6ae0b465ac12
2017-07-20 08:17:48 -07:00
Johann
c782f27ead quantize test: lowbd functions do not pass in highbd
qcoeff output looks OK but dqcoeff is no good.

BUG=webm:1448

Change-Id: I07211db8a8b74f1f45fdd059852e2de0e5ee18fd
2017-07-20 08:17:48 -07:00
Johann Koenig
4702bb26be Merge "quantize test: eob is output" 2017-07-20 15:17:26 +00:00
Johann Koenig
e1809501d0 Merge "Earmark extra space for VSX." 2017-07-19 21:35:57 +00:00
Jerome Jiang
9dd992b6f0 Merge "Roll libwebm: Fix android build failure with NDK r15b." 2017-07-19 21:30:21 +00:00
Johann
bde2e4aa36 quantize test: eob is output
eob values are generated by the function.

Change-Id: I8ce92100e83022bff99888a5a7e6ef378c49fda3
2017-07-19 14:17:19 -07:00
Han Shen
b72d3e8a25 Earmark extra space for VSX.
Backend specific optimization for PPC VSX reads 16 bytes, whereas arm neon /
sse2 only reads <= 8 bytes. Although the extra bytes read are actually never
used, this is not a warrant for groping around.  Fixed by allocating more when
building for VSX. This is reported by asan.

Also note - PPC does have assembly that loads 64-bit content from memory - lxsdx
loads one 64-bit doubleword (whereas lxvd2x loads two 64-bit doubleword) from
memory. However, we only have "vec_vsx_ld" builtins that mapped to lxvd2x, no
builtins to lxsdx. The only way to access lxsdx is through inline assembly,
which does not fit well in the origin paradigm.

Refer:
  vsx:
    vpx_tm_predictor_4x4_vsx @ third_party/libvpx/git_root/vpx_dsp/ppc/intrapred_vsx.c
  neon:
    vpx_tm_predictor_4x4_neon @ third_party/libvpx/git_root/vpx_dsp/arm/intrapred_neon_asm.asm
  sse2:
    tm_predictor_4x4 @ third_party/libvpx/git_root/vpx_dsp/x86/intrapred_sse2.asm

BUG=b/63112600

Tested:
  asan tests passed.

Change-Id: I5f74b56e35c05b67851de8b5530aece213f2ce9d
2017-07-19 13:59:32 -07:00
Johann Koenig
89a116f4cb Merge "variance: call C comp_avg_pred" 2017-07-19 20:34:13 +00:00
Jerome Jiang
8ad9338e2e Roll libwebm: Fix android build failure with NDK r15b.
BUG=webm:1447

Change-Id: I8defe45cb94eb9c209ba72ce446786f24c14c0b8
2017-07-18 16:52:46 -07:00
Jerome Jiang
4526644615 vp9: Removed unused skin detection function.
Change-Id: I6702b7b11aa4ac9aac5fd54deef4377cdcb29c64
2017-07-18 14:52:04 -07:00
Jerome Jiang
59e461db1f Merge "vp9: Allocate alt-ref in denoiser for SVC." 2017-07-18 21:30:04 +00:00
Jerome Jiang
babef23a5f Merge "vp9: Remove isolated skin & non-skin blocks." 2017-07-18 20:48:32 +00:00
Johann Koenig
56d3f1573a Merge changes I62c2e313,Ibd7a0337,I94e1d886
* changes:
  quantize test: test sse2 and avx optimizations
  quantize test: extend arrays
  quantize test: restrict and correct input
2017-07-18 20:42:39 +00:00
Johann
4b9a848bb3 variance: call C comp_avg_pred
Keep optimized code out of the reference implementation. This matches
the style of the other sub calls.

Change-Id: I3da6acd4f2c647b029c420e22ac9410a18259689
2017-07-18 20:22:53 +00:00
Jerome Jiang
fd216268ad vp9: Allocate alt-ref in denoiser for SVC.
When SVC is used, allocate alt-ref in denoiser.

Change-Id: I1b17221b55b9444cd23b97d481b54ff8d296d857
2017-07-18 13:22:47 -07:00
Johann
101981b736 quantize test: test sse2 and avx optimizations
ssse3 does not pass either of the tests.

avx 32x32 does not pass.

Change-Id: I62c2e31336fd2327327afaa0da896ad79a3def44
2017-07-18 12:08:16 -07:00
Jerome Jiang
adbfc4308a vp9: Remove isolated skin & non-skin blocks.
0.007% regression on rtc and 0.004% gain on rtc_derf.
1 thread on QVGA,VGA and HD has ~0.2% speed regression while 2 threads has
~0.2% speed gain on Google Pixel.

Change-Id: Ia4a6ec904df670d7001e35e070b01e34149d23dc
2017-07-18 11:29:14 -07:00
Johann
c7ebe82253 quantize test: extend arrays
Officially the quant structures are 8 elements, with one dc element and
7 repeated ac elements. The low bit depth optimizations take advantage
of this to fill the xmm registers. The high bit depth version manually
duplicates the values.

If all the optimizations were unified, the structure sizes could be
greatly reduced.

Change-Id: Ibd7a0337a7832ce2a1a05ee433c310077e1059ae
2017-07-18 09:55:47 -07:00
Johann
cb61ba02f4 quantize test: restrict and correct input
Use only valid values for quantize inputs. These were determined by
looping over vp9_init_quantizer and looking for max and min values.

This allows extending the test to the low bit depth functions which were
not designed to handle all possible inputs but only valid inputs.

Change-Id: I94e1d8863a49ac227845b65c6b50130e10e6319e
2017-07-18 09:40:45 -07:00
Marco
817f68cdcf vp9: Disable usage of sb_use_mv_part for SVC.
To fix valgrind issueis with SVC tests.
SVC encoding uses prune_evenmore which is causing uinit value.

Will re-enable later when issue is resolved.

Change-Id: I257ff878cf78197ddd813db056582a4d5fe94f44
2017-07-18 09:28:56 -07:00
Marco
ad56371343 vp9: Fix to setting content_state for real-time mode.
When content_state_sb is set to LowVarHighSumdiff, don't reset
it to VeryHighSad. Visually better on clips with strong lighting changes.

Small/negligible change in RTC metrics and speed.

Change-Id: I20c383e3c4cf8d1149de5f9260449c0b7cf7c6aa
2017-07-17 16:21:25 -07:00
Marco
0c9e2f4c15 vp9: Reuse motion from choose_partitioning in NEWMV search.
When int_pro_motion_estimation is done for superblock in
choose_partitioning, use it to avoid the full_pixel_search
for NEWMV mode, if bsize is >= 32X32.

For speed > 7.
Small/neutral change on RTC metrics.
~1-2% speedup on arm on high motion clip.

Change-Id: I3cfe6833ff4bf75d4afa83eaf058ad45729de85b
2017-07-17 13:15:48 -07:00
James Zern
9223b947ca Merge "fix 'make exampletest' w/CONFIG_REALTIME_ONLY" 2017-07-15 18:37:10 +00:00
Jerome Jiang
682135fa60 vp9: Compute skin only for blocks eligible for noise estimation.
Change-Id: Iddcb83a5968db57cfd312c5bc44b2a226a2a3264
2017-07-14 15:14:30 -07:00
Marco
666e394d41 vp9: Adjust minmax threshold for variance partitioning.
Only affects speed 7. Improvement on high motion clips.

Change-Id: Ibddb68fed9c63207df29ffd790f9205b1cecf687
2017-07-13 21:19:37 -07:00
Johann
e3fa4ae8e3 quantize test: use Buffer
Although the low bitdepth functions are identical (excepting the need
for larger intermediate values) they do not pass these tests. This
improves the error output to aid debugging.

Simplify buffer usage with Buffer and removing unnecessarily aligned
variables.

eob is a single element and never written using aligned instructions.

BUG=webm:1426

Change-Id: Ic95789a135cf1e8a3846d85270f2b818f6ec7e35
2017-07-13 15:54:48 -07:00
James Zern
960466939d fix 'make exampletest' w/CONFIG_REALTIME_ONLY
for tests that aren't explicitly testing 2-pass behavior use --passes=1
with this configuration

Change-Id: I6a1520ecc65d0f626486604310af29dacb9f197f
2017-07-13 10:47:20 -07:00
James Zern
b578d59623 Merge "remove vp9_firstpass.c w/CONFIG_REALTIME_ONLY" 2017-07-12 23:30:04 +00:00
Johann Koenig
e0d79bc7b5 Merge "sad4d neon: 64x[32,64]" 2017-07-12 20:15:00 +00:00
Marco Paniconi
f6586b8bf8 Merge "vp9: Fix to SVC and denoising for fixed pattern case." 2017-07-12 19:13:05 +00:00
Johann Koenig
3158752980 Merge changes Ibf5e61dc,I44b48512,I7de2500c,I5081b5ce
* changes:
  sad4d neon: 32x[16,32,64]
  sad4d neon: 16x[8,16,32]
  sad4d neon: 8x[4,8,16]
  sad4d neon: 4x4, 4x8
2017-07-12 15:01:30 +00:00
Johann
e381753926 sad4d neon: 64x[32,64]
Rewrite 64x64.

BUG=webm:1425

Change-Id: I336bf5a3aa4b783389c10b16a50f0f559346ecbf
2017-07-12 13:26:39 +00:00
Johann
e1bde306c8 sad4d neon: 32x[16,32,64]
Rewrite 32x32. Use half the accumulator registers.

BUG=webm:1425

Change-Id: Ibf5e61dc4ba15056102aef8495f4a02c668c5d13
2017-07-12 13:25:18 +00:00
Johann
807ce8fb1e sad4d neon: 16x[8,16,32]
Rewrite 16x16. Use half the accumulator registers.

BUG=webm:1425

Change-Id: I44b48512b1e3629505d83c2645e800f53878ccc2
2017-07-12 13:25:11 +00:00
Johann
8152b0904d sad4d neon: 8x[4,8,16]
BUG=webm:1425

Change-Id: I7de2500cca4b621f21478c4b0333c56d76dbc9a4
2017-07-12 13:25:03 +00:00
Johann
dd4347e9ec sad4d neon: 4x4, 4x8
BUG=webm:1425

Change-Id: I5081b5ce131821d590c53ac1206a94f50cb8b468
2017-07-12 03:38:03 +00:00
Urvang Joshi
1dee320446 Merge "Remove the token state array from greedy optimize_b." 2017-07-12 00:08:56 +00:00
James Zern
df18412f32 remove vp9_firstpass.c w/CONFIG_REALTIME_ONLY
BUG=webm:1446

Change-Id: I6e0ea9342c715d354c641109737172afa649b85b
2017-07-11 13:10:16 -07:00
Urvang Joshi
5322a31b18 Remove the token state array from greedy optimize_b.
Reduces memory usage, and speeds up encoding for some difficult clips.
No impact on output or metrics.

Ported from aomedia patch:
https://aomedia-review.googlesource.com/c/14501

Change-Id: I26ec69af8336f9e80da486a1cfbfc89a3596954d
2017-07-11 13:05:29 -07:00
James Bankoski
7d5afa227a Merge "Reintroduce fix for max qindex calculation of a gf interval" 2017-07-11 19:47:16 +00:00