Commit Graph

1942 Commits

Author SHA1 Message Date
Johann Koenig
794a5ad713 Merge "fdct32x32 neon implementation" 2017-06-23 01:58:00 +00:00
Linfeng Zhang
c5f9de573f Merge changes I783c5f4f,I365f8e53,I5dac0e98
* changes:
  Clean vpx_idct16x16_256_add_sse2()
  Update vpx_idct{8x8,16x16,32x32}_1_add_sse2()
  Clean 32x32 full idct sse2 and ssse3 code
2017-06-22 21:42:23 +00:00
Johann
e67660cf37 fdct32x32 neon implementation
Almost 3x faster in constrained loop testing. Over 10x faster in HBD
builds.

BUG=webm:1424

Change-Id: I2b7f8453e1d4ada63cde729d8115d684c4a71ff9
2017-06-22 06:40:17 -07:00
Linfeng Zhang
2b43a1ee18 Clean 32x32 full idct sse2 and ssse3 code
vpx_idct32x32_1024_add_ssse3() is actually an sse2 function and faster
than vpx_idct32x32_1024_add_sse2(). Replace the slow one. All changes
are code relocations, no new code.

Change-Id: I5dac0e98cc411a4ce05660406921118986638d19
2017-06-21 13:46:49 -07:00
Johann
1c48915233 dct tests: align InvAccuracyCheck buffers
'in' is used for the reference fdct. 'coeff' is input to the idct being
tested and 'dst[16]' is output.

Fixes a segfault on unaligned memory access on x86.

Change-Id: I3691b1380ed49986897dd89a63ce63a80a0e0962
2017-06-21 11:47:00 -07:00
James Zern
0aa3677d9d fix build, rm ref to vpx_idct8x8_64_add_ssse3
this was deleted in:
98967645a Remove vpx_idct8x8_64_add_ssse3()

but this was merged in:
9e03eedf6 Merge changes Ib26dd515,Ie60dabc3

after:
a92991133 Merge "dct tests: run all possible sizes in one test"

which added a new reference

Change-Id: I8da4a6c80d27b237a378ff15eead1daab89e7e25
2017-06-20 19:46:45 -07:00
Linfeng Zhang
9e03eedf62 Merge changes Ib26dd515,Ie60dabc3
* changes:
  Clean 8x8 idct x86 optimization
  Remove vpx_idct8x8_64_add_ssse3()
2017-06-21 00:38:25 +00:00
Johann
4ebb9a36f1 dct tests: run all possible sizes in one test
Modify fdct4x4_test.cc to support all size combinations. This does not
add any new tests and in fact fails a few. There were minimal changes
made to the tests so it's not entirely surprising that some of the
larger 12 bit transforms are failing since it was initially only used
for 4x4.

In follow up patches the tests in fdct8x8_test.cc, dct16x16_test.cc and
dct32x32_test.cc will be evaluated and moved to dct_test.cc.

BUG=webm:1424

Change-Id: I72a23430f457d7fae8c91e706adc0e77c25abc8f
2017-06-19 15:39:35 -07:00
Linfeng Zhang
98967645a1 Remove vpx_idct8x8_64_add_ssse3()
It's almost identical to vpx_idct8x8_64_add_sse2(), except for a small
difference in instruction order.

Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f
2017-06-15 14:09:33 -07:00
Johann Koenig
6dcd9b37ea Merge "idct_test: don't use std::nothrow anymore" 2017-06-09 20:42:39 +00:00
Johann Koenig
8aa4ee1f10 Merge "buffer.h: allow declaring an alignment" 2017-06-09 20:42:21 +00:00
Johann
92373a5bb2 idct_test: don't use std::nothrow anymore
But still check for NULL before calling Init()

Change-Id: I2bf2887e1064c9103d29c542d20365c0aea75d76
2017-06-09 11:09:06 -07:00
Johann
5aee8ea752 buffer.h: allow declaring an alignment
x86 simd register operations generally prefer and may require 16 byte
alignment.

Change-Id: I73ce577a90dc66af60743c5727c36f23200950ba
2017-06-09 11:03:15 -07:00
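
Note on the alignment requirement described in 5aee8ea752: a minimal sketch, assuming C++11 `alignas`, of declaring a buffer with a requested alignment and checking it. `AlignedBlock` and `IsAligned` are hypothetical names used for illustration only; they are not taken from buffer.h, whose actual interface is not shown in this log.

```cpp
#include <cstddef>
#include <cstdint>

// Hypothetical illustration, not the libvpx buffer.h implementation.
// Holds 64 values of type T whose storage starts on a kAlignment-byte
// boundary, as requested through alignas.
template <typename T, size_t kAlignment>
struct AlignedBlock {
  alignas(kAlignment) T data[64];
};

// Returns true when the address p is a multiple of `alignment` bytes.
inline bool IsAligned(const void *p, size_t alignment) {
  return reinterpret_cast<uintptr_t>(p) % alignment == 0;
}
```

A test could declare `AlignedBlock<int16_t, 16> block;` and verify `IsAligned(block.data, 16)` before passing `block.data` to a SIMD routine that uses aligned loads or stores.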
James Zern
b3a262dff3 Merge "vp8_decode_frame: fix oob read on truncated key frame" 2017-06-08 23:17:50 +00:00
James Zern
45daecb4f7 vp8_decode_frame: fix oob read on truncated key frame
the check for error correction being disabled was overriding the data
length checks. this avoids returning incorrect information (width /
height) for the decoded frame which could result in inconsistent sizes
returned to an application causing it to read beyond the bounds of
the frame allocation.

BUG=webm:1443
BUG=b/62458770

Change-Id: I063459674e01b57c0990cb29372e0eb9a1fbf342
2017-06-08 23:16:04 +00:00
Johann
e50ea014c3 Revert "buffer.h: use size_t"
This reverts commit f08581c1d0.

type conversion warnings abound.

Change-Id: I41d4c0e7a388e1008bdbc55fefda4bbca3f89f00
2017-06-08 10:20:21 -07:00
Johann Koenig
903375a48a Merge "fdct16x16 neon optimization" 2017-06-08 15:19:36 +00:00
Johann
eae7cf2368 fdct16x16 neon optimization
Roughly 2x speedup. Since the only change for HBD is to store(), the
improvement appears to hold there as well.

BUG=webm:1424

Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19
2017-06-07 14:59:55 -07:00
Johann Koenig
0c4f74d129 Merge changes Iade45f69,I18d90658,Ieca3f1ef
* changes:
  buffer.h: add num_elements_
  buffer.h: zero-init all values
  buffer.h: use size_t
2017-06-07 19:20:16 +00:00
Johann
902d63759e buffer.h: add num_elements_
raw_size_ was being incorrectly computed and used

Change-Id: Iade45f69964c567ffb258880f26006a96ae5a30d
2017-06-07 11:31:20 -07:00
Johann
4a37e3e2a0 buffer.h: zero-init all values
Change-Id: I18d90658bcd4365d49adcadd6954090b3b399aa8
2017-06-07 11:27:26 -07:00
Johann
f08581c1d0 buffer.h: use size_t
Change-Id: Ieca3f1ef23cd1d7b844ea3ecb054007ed280b04f
2017-06-07 11:24:27 -07:00
James Zern
ff42e04f9c Merge "ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64}" 2017-06-06 23:52:39 +00:00
Johann
de4cb716ee buffer.h: split out init
Change-Id: Idfbd2e01714ca9d00525c5aeba78678b43fb0287
2017-06-06 15:02:50 -07:00
Johann
8659764a07 buffer.h: Use T for values
Change-Id: I2da4110e843b6e361028b921c24b6ca2ea9077d9
2017-06-06 12:05:14 -07:00
James Zern
4753c23983 Merge "ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx" 2017-06-06 02:19:41 +00:00
Johann Koenig
755b3daf90 Merge "comp_avg_pred neon: used by sub pixel avg variance" 2017-05-31 18:17:28 +00:00
Johann
f695b30ac2 comp_avg_pred neon: used by sub pixel avg variance
BUG=webm:1423

Change-Id: I33de537f238f58f89b7a6c1c2d6e8110de4b8804
2017-05-30 22:47:34 +00:00
Jerome Jiang
a5ab38093f Merge "Fix vp8 race when build --enable-vp9-highbitdepth." 2017-05-30 05:47:44 +00:00
Jerome Jiang
0afa2dad76 Fix vp8 race when build --enable-vp9-highbitdepth.
Split vp8/vp9 implementations on yv12_copy_frame_c.
Remove high-bitdepth code from vp8_yv12_extend_frame_borders_c.
Clean up vp8 code usage in vp9.

BUG=webm:1435

Change-Id: Ic68e79e9d71e1b20ddfc451fb8dcf2447861236d
2017-05-26 09:45:01 -07:00
Johann Koenig
de1a9c77a7 Merge changes Iaab2b9a1,Idfb458d3
* changes:
  sub pel avg variance neon: 4x block sizes
  sub pel variance neon: 4x block sizes
2017-05-24 18:33:53 +00:00
Johann Koenig
b11a37f540 Merge changes I31fa6ef8,I228c6f29
* changes:
  sub pel avg variance neon: add neon optimizations
  sub pel variance neon: normalize variable names
2017-05-24 18:32:02 +00:00
James Zern
566f6d75bd partial_idct_test,InitInput: fix rollover in mult
promote coeff to signed 64-bit to avoid exceeding integer bounds when
squaring the value

Change-Id: If77bef6bc0a6a4c39ca3013e5e2ddb426a1c6e1f
2017-05-24 15:27:38 +02:00
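
Note on the rollover described in 566f6d75bd: a minimal sketch of the general technique, widening a value to signed 64-bit before squaring it. `SquareWide` is a hypothetical helper written for illustration; the actual change to InitInput() is not reproduced in this log.

```cpp
#include <cstdint>

// Hypothetical illustration, not the code from partial_idct_test.cc.
// Casting one operand to int64_t makes the multiplication happen in
// 64 bits, so the product cannot exceed the range of a 32-bit int.
inline int64_t SquareWide(int32_t coeff) {
  return static_cast<int64_t>(coeff) * coeff;
}
```

For example, squaring 70000 gives 4900000000, which is larger than INT32_MAX (2147483647); computing `coeff * coeff` directly in 32-bit signed arithmetic would overflow, which is undefined behavior in C and C++.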
Alexandra Hájková
8bf6eaf433 ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64}
Change-Id: I547d0099e15591655eae954e3ce65fdf3b003123
2017-05-24 13:27:09 +00:00
Linfeng Zhang
36f1b183e4 Update InitInput() in test/partial_idct_test.cc
Make it work in high bit depth.

BUG=webm:1412

Change-Id: Ic5cfd410a69709f01e2924774356a108a349d273
2017-05-23 14:24:23 -07:00
Johann
f6fcd3410d sub pel avg variance neon: 4x block sizes
BUG=webm:1423

Change-Id: Iaab2b9a183fdb54aae5f717aba95d90dc36a9e3b
2017-05-22 14:40:05 -07:00
Johann
188d58eaa9 sub pel variance neon: 4x block sizes
Add optimizations for blocks of width 4

BUG=webm:1423

Change-Id: Idfb458d36db3014d48fbfbe7f5462aa6eb249938
2017-05-22 14:40:01 -07:00
Johann
9b0d306a2f sub pel avg variance neon: add neon optimizations
These are missing an optimized version of vpx_comp_avg_pred

BUG=webm:1423

Change-Id: I31fa6ef842e98f7ff3ea079ffed51ae33178e2ed
2017-05-22 13:58:43 -07:00
Linfeng Zhang
c167345ffb Add vpx_highbd_idct{4x4,8x8,16x16}_1_add_sse2
BUG=webm:1412

Change-Id: Ia338a6057d36f9ed7eaa9cbd4dfbf0c3cbdc6468
2017-05-22 11:24:21 -07:00
Johann Koenig
e7cac13016 Merge changes Ib8dd96f7,Ie9854b77
* changes:
  neon variance: process 4x blocks
  use memcpy for unaligned neon stores
2017-05-22 17:48:33 +00:00
Johann Koenig
3c603eadb4 Merge "neon fdct: 4x4 implementation" 2017-05-19 17:08:58 +00:00
Johann
7b742da63e neon variance: process 4x blocks
Continue processing sets of 16 values. Plenty of improvement for 4x8
(doubles the speed) but only about 30% for 4x4.

BUG=webm:1422

Change-Id: Ib8dd96f75d474f0348800271d11e58356b620905
2017-05-17 17:35:01 -07:00
Marco Paniconi
a2dfbbd7d6 Merge "vp9: Modify ChangingDropFrameThresh unittest." 2017-05-17 18:42:51 +00:00
Marco
4733df333f vp9: Modify ChangingDropFrameThresh unittest.
Add another (lower) bitrate to the test, to cover
frame drop behavior at low bitrate range.

Change-Id: Iaad003974159daf3d2d65ef3a6575a3e72e498d6
2017-05-17 09:38:21 -07:00
Linfeng Zhang
3210ca6d60 Update partial idct testing code
Add PartialIDctTest::PrintDiff() to help debugging.
In RunQuantCheck, try all combinations of +/-mask_ input for 4x4 idct.
Update PartialIDctTest::InitInput().

Change-Id: I13fd163954a4c1a3a6cfeb5e4a4d3d0e7ff901f4
2017-05-17 09:28:32 -07:00
Johann
105503b839 neon fdct: 4x4 implementation
Approximately twice as fast as C implementation.

BUG=webm:1424

Change-Id: I3c0307fb08ddc23df42545cd089a78e2ed5c9d3f
2017-05-17 07:38:18 -07:00
Alexandra Hájková
bcbc3929ae ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx
Change-Id: Ic9639b1331d8c5cbc207c2a036891ff0137fc56f
2017-05-13 13:13:15 +00:00
James Zern
ac8f58f6ab Merge changes I1b54a7a5,I3028bdad,I59788cd9
* changes:
  ppc: Add get_mb_ss_vsx
  ppc: Add get4x4sse_cs_vsx
  ppc: Add comp_avg_pred_vsx
2017-05-12 15:24:59 +00:00
Luca Barbato
143b21e362 ppc: Add get_mb_ss_vsx
Change-Id: I1b54a7a5bb642e4b836d786ea1ae506eed025e3f
2017-05-12 17:23:00 +02:00
Luca Barbato
6d225eb5f9 ppc: Add get4x4sse_cs_vsx
Change-Id: I3028bdadf653665d18e781d28e9625f62804b3d8
2017-05-12 17:23:00 +02:00