17488 Commits

Author SHA1 Message Date
Johann
eae7cf2368 fdct16x16 neon optimization
Roughly 2x speedup. Since the only change for HBD is to store(), the
improvement appears to hold there as well.

BUG=webm:1424

Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19
2017-06-07 14:59:55 -07:00
Marco Paniconi
9cea3a3c4e Merge "vp9: SVC: Enable simple_block_yrd for temporal layers." 2017-06-07 21:12:14 +00:00
Johann Koenig
0c4f74d129 Merge changes Iade45f69,I18d90658,Ieca3f1ef
* changes:
  buffer.h: add num_elements_
  buffer.h: zero-init all values
  buffer.h: use size_t
2017-06-07 19:20:16 +00:00
Marco
14d4718043 vp9: SVC: Enable simple_block_yrd for temporal layers.
Enable simple_block_yrd for temporal enhancement layers (TL > 0).
And remove block size condition for SVC mode.
Only affects speed >= 7 SVC.

Speedup ~3-4%.
avgPSNR regression on RTC for (3 spatial, 3 temporal) layers: ~1%.

Change-Id: Iff4fc191623b71c69cd373e7c0823385e7ac67ed
2017-06-07 11:41:50 -07:00
Johann
902d63759e buffer.h: add num_elements_
raw_size_ was being incorrectly computed and used

Change-Id: Iade45f69964c567ffb258880f26006a96ae5a30d
2017-06-07 11:31:20 -07:00
Johann
4a37e3e2a0 buffer.h: zero-init all values
Change-Id: I18d90658bcd4365d49adcadd6954090b3b399aa8
2017-06-07 11:27:26 -07:00
Johann
f08581c1d0 buffer.h: use size_t
Change-Id: Ieca3f1ef23cd1d7b844ea3ecb054007ed280b04f
2017-06-07 11:24:27 -07:00
Marco
13b02a8efe vp9: SVC: Enable row-mt in sample encoder.
Change-Id: I4b51043cb3f5955efe947fe4685aed4a21adb8bd
2017-06-07 10:32:44 -07:00
James Zern
ff42e04f9c Merge "ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64}" 2017-06-06 23:52:39 +00:00
Marco Paniconi
27b34a109d Merge "vp9: SVC: Adjust some speed settings for SVC speed >= 7." 2017-06-06 23:07:45 +00:00
Marco
7d2f5f8e9d vp9: SVC: Adjust some speed settings for SVC speed >= 7.
Keep the 1/4subpel for all frames, use SUBPEL_TREE_PRUNED_EVENMORE
for all temporal enhancement layer frames.

Change-Id: Ibc681acbb6fc75b7b3c57fc483fcb11d591dfc9a
2017-06-06 15:30:24 -07:00
Johann
de4cb716ee buffer.h: split out init
Change-Id: Idfbd2e01714ca9d00525c5aeba78678b43fb0287
2017-06-06 15:02:50 -07:00
Johann
8659764a07 buffer.h: Use T for values
Change-Id: I2da4110e843b6e361028b921c24b6ca2ea9077d9
2017-06-06 12:05:14 -07:00
Jerome Jiang
cf07d85809 Initialize cost_list all to INT_MAX.
It is initialized to be { INT_MAX, 0, ... } in ffe0f9b.
No effect on encoders.
Make it consistent with other initializations.

BUG=webm:1440

Change-Id: Ie2a180d93626b55914c8c4255e466a1986d2b922
2017-06-06 10:42:37 -07:00
James Zern
6df142e2ab vp9_mcomp,get_cost_surf_min: quiet conversion warning
Visual Studio will warn if a 32-bit shift is implicitly converted to 64.
in this case integer storage is enough for the result.
since:
f3a9ae5ba Fix ubsan failure in vp9_mcomp.c.

Change-Id: I7e0e199ef8d3c64e07b780c8905da8c53c1d09fc
2017-06-05 22:52:58 -07:00
Jerome Jiang
968a5d6bc2 Merge "Fix valgrind failure on uninitialized variables." 2017-06-06 03:47:31 +00:00
James Zern
4753c23983 Merge "ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx" 2017-06-06 02:19:41 +00:00
Jerome Jiang
ffe0f9b7fb Fix valgrind failure on uninitialized variables.
BUG=webm:1440

Change-Id: I7074e42bdfa8dd25f11bbb3f2ab1b41d6f4c12e4
2017-06-05 13:09:29 -07:00
Jerome Jiang
f3a9ae5baa Fix ubsan failure in vp9_mcomp.c.
Change-Id: Iff1dea1fe9d4ea1d3fc95ea736ddf12f30e6f48d
2017-06-02 21:37:13 -07:00
Marco
e30781ff80 vp9: SVC: Force subpel search off under certain conditions.
For SVC 1 pass non-rd mode:
Force subpel search off for SVC for non-reference frames
under motion threshold.

Add flag to svc context to indicate if the frame is not used
as a reference.

Little/no quality loss, ~2% speedup.

Change-Id: Ic433c44b514d19d08b28f80ff05231dc943b28e9
2017-06-01 20:48:52 -07:00
Marco Paniconi
ff637d1903 Merge "vp9: Speed >8: Set subpel_search_method for low motion." 2017-06-01 23:57:19 +00:00
Marco
8c6fa5c5e3 vp9: Speed >8: Set subpel_search_method for low motion.
Speed >=8: for resolutions above CIF, and for low motion content,
set subpel_search_method to SUBPEL_TREE_PRUNED_EVENMORE.

Small speed gain (~2%) on vga clips,
RTC metrics up by ~2-3% on average.

Change-Id: Ie26ba0264589652f92dfe74308740debf94cf0cc
2017-06-01 16:16:13 -07:00
Jerome Jiang
68f035026f vp8 skin detection: Fix visual studio build failure.
Change-Id: I510b755550ebbfa2aaf9b974920d7f1c6454a845
2017-06-01 13:46:46 -07:00
Jerome Jiang
e254969df2 Fix corruption in skin map debugging output yuv.
For both vp8 and vp9.

BUG=webm:1437

Change-Id: Ifd06f68a876ade91cc2cc27c574c4641b77cce28
2017-06-01 16:59:43 +00:00
Jerome Jiang
f1a300acc4 vp8: Clean up skin detection.
Use only the average of center 2x2 pixels in vp8.

Change-Id: I2b23ff19a90827226273e0fca49e90c734eda59b
2017-05-31 14:57:10 -07:00
Johann Koenig
755b3daf90 Merge "comp_avg_pred neon: used by sub pixel avg variance" 2017-05-31 18:17:28 +00:00
Jerome Jiang
32d8992147 Merge "Write skin map of vp8 skin detection for debug." 2017-05-31 16:37:07 +00:00
Linfeng Zhang
30ea3ef283 Merge "Update vpx_highbd_idct4x4_16_add_sse2()" 2017-05-31 15:56:20 +00:00
Johann
f695b30ac2 comp_avg_pred neon: used by sub pixel avg variance
BUG=webm:1423

Change-Id: I33de537f238f58f89b7a6c1c2d6e8110de4b8804
2017-05-30 22:47:34 +00:00
Jerome Jiang
c39526da8a Write skin map of vp8 skin detection for debug.
Change-Id: Ica1b4e918aa759cd0ce65920f9d88452bbf9e3b4
2017-05-30 10:30:05 -07:00
Linfeng Zhang
45048dc9dc Update vpx_highbd_idct4x4_16_add_sse2()
BUG=webm:1412

Change-Id: I26e4b34ae9bc1ae80c24f56d740d737a95f1ab84
2017-05-30 09:25:30 -07:00
Johann Koenig
b9649d2407 Merge "comp_avg_pred: alignment" 2017-05-30 16:21:05 +00:00
Johann Koenig
48c0e13286 Merge "remove DECLARE_ALIGNED from neon code" 2017-05-30 15:58:17 +00:00
Johann
ea8b4a450d comp_avg_pred: alignment
x86 requires 16 byte alignment for some vector loads/stores.

arm does not have the same requirement.

The asserts are still in avg_pred_sse2.c. This just removes them from
the common code.

Change-Id: Ic5175c607a94d2abf0b80d431c4e30c8a6f731b6
2017-05-30 07:46:43 -07:00
Jerome Jiang
a5ab38093f Merge "Fix vp8 race when build --enable-vp9-highbitdepth." 2017-05-30 05:47:44 +00:00
Johann
42ce25821d remove DECLARE_ALIGNED from neon code
Unlike x86, neon only requires type alignment when loading into vectors.

Change-Id: I7bbbe4d51f78776e499ce137578d8c0effdbc02f
2017-05-26 10:41:57 -07:00
Johann Koenig
2693b89c19 Merge "subpel variance neon: reduce stack usage" 2017-05-26 17:25:47 +00:00
Johann Koenig
47174d60c8 Merge "Use vdup instead of vmov" 2017-05-26 17:25:24 +00:00
Jerome Jiang
0afa2dad76 Fix vp8 race when build --enable-vp9-highbitdepth.
Split vp8/vp9 implementations on yv12_copy_frame_c.
Remove high-bitdepth code from vp8_yv12_extend_frame_borders_c.
Clean up vp8 code usage in vp9.

BUG=webm:1435

Change-Id: Ic68e79e9d71e1b20ddfc451fb8dcf2447861236d
2017-05-26 09:45:01 -07:00
Marco
146005a911 vp9: SVC: Fix to condition on using source_sad.
Fix the condition on usage of source_sad for temporal layers.
Fix allows it to be used for the case of 1 temporal layer.

Change-Id: I02b1b0ade67a7889d1b93cee66d27c0951131fc3
2017-05-26 08:46:50 -07:00
Marco Paniconi
9ec9415fd9 Merge "vp9: Use source_sad only on top temporal enhancement layer." 2017-05-26 05:24:06 +00:00
Marco Paniconi
4be18ab295 Merge "vp9: SVC: Enable copy partition for SVC speed >= 7." 2017-05-26 05:23:47 +00:00
Marco
ea914456af vp9: Use source_sad only on top temporal enhancement layer.
For 1 pass CBR SVC mode.

Change-Id: Ic026740f9d0ec5eee7c5845be9c5b15884fec48d
2017-05-25 16:32:05 -07:00
Jerome Jiang
327c9bb1da Refactor: Move vp8 skin detection to new files.
Change-Id: If760f28cbbf22beac1cc9bd1546f13831e9dd3f0
2017-05-25 16:12:27 -07:00
Marco
747cf7a505 vp9: SVC: Enable copy partition for SVC speed >= 7.
Adjust the max_copied_frame setting for temporal layers.
Keep the same setting for non-SVC at speed 8.
This change also enables copy_partition for non-SVC at speed 7,
but with smaller value of max_copied_frame (=2).

~2% speedup for SVC speed 7, 3 layers, with little/no quality loss.

Change-Id: Ic65ac9aad764ec65a35770d263424b2393ec6780
2017-05-25 12:21:46 -07:00
Johann
f3c97ed32e subpel variance neon: reduce stack usage
Unlike x86, arm does not impose additional alignment restrictions on
vector loads. For incoming values to the first pass, it uses vld1_u32()
which typically does impose a 4 byte alignment. However, as the first
pass operates on user-supplied values we must prepare for unaligned
values anyway (and have, see mem_neon.h).

But for the local temporary values there is no stride and the load will
use vld1_u8 which does not require 4 byte alignment.

There are 3 temporary structures. In the C, one is uint16_t. The arm
saturates between passes but still passes tests. If this becomes an
issue new functions will be needed.

Change-Id: I3c9d4701bfeb14b77c783d0164608e621bfecfb1
2017-05-24 13:28:13 -07:00
Johann
d204c4bf01 Use vdup instead of vmov
Change-Id: Idb6248c1429b55176bb3e9f4e8365ea0ed2be62a
2017-05-24 11:38:15 -07:00
Johann Koenig
de1a9c77a7 Merge changes Iaab2b9a1,Idfb458d3
* changes:
  sub pel avg variance neon: 4x block sizes
  sub pel variance neon: 4x block sizes
2017-05-24 18:33:53 +00:00
Johann Koenig
b11a37f540 Merge changes I31fa6ef8,I228c6f29
* changes:
  sub pel avg variance neon: add neon optimizations
  sub pel variance neon: normalize variable names
2017-05-24 18:32:02 +00:00
James Zern
f0279ceb92 Merge "partial_idct_test,InitInput: fix rollover in mult" 2017-05-24 16:27:21 +00:00