Johann
3ae458f2f3
partial fdct neon: maintain neon registers
...
Finish the calulations in neon registers. This avoids a potentially
expensive move from neon to gp and allows at least clang to store
directly to memory.
BUG=webm:1424
Change-Id: Idef25eec95f7610947167818e9194bde8b00d282
2017-07-01 09:29:38 -07:00
James Zern
27e37e1a8a
fwd_txfm_msa.c: correct vpx_fdct8x8_1_msa prototype
...
this makes the function compatible with high-bitdepth and fixes test
failures since:
5ac88162b partial fdct test
Change-Id: Ib630694608237f0c515948942e05dbea259ba338
2017-06-30 18:50:47 -07:00
Linfeng Zhang
1e3a93e72e
Merge changes I5d038b4f,I9d00d1dd,I0722841d,I1f640db7
...
* changes:
Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1
sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4()
Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h
Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h
2017-06-30 20:49:19 +00:00
Johann Koenig
89d3dc043e
Merge changes Id5beb35d,I2945fe54,Ib0f3cfd6,I78a2eba8
...
* changes:
partial fdct neon: add 32x32_1
partial fdct neon: add 16x16_1
partial fdct neon: add 4x4_1
partial fdct neon: move 8x8_1 and enable hbd tests
2017-06-30 01:00:07 +00:00
Linfeng Zhang
c338f3635e
Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1
...
BUG=webm:1412
Change-Id: I5d038b4fa842ce2f6b9bd5c8c44c70647bda9591
2017-06-29 17:19:34 -07:00
Linfeng Zhang
ee5cb8d87f
sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4()
...
BUG=webm:1412
Change-Id: I9d00d1ddbd724fd5f825fd974c4cf46a9bca6cb3
2017-06-29 17:18:01 -07:00
Linfeng Zhang
0fa59a4baf
Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h
...
Also clean highbd_inv_txfm_sse2.h
BUG=webm:1412
Change-Id: I0722841d824ce602874019bd9779b10d49d10c0b
2017-06-29 17:17:43 -07:00
Linfeng Zhang
9ac78ae35f
Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h
...
BUG=webm:1412
Change-Id: I1f640db71ad4c644b7521305a781f2218eb1ba9d
2017-06-29 17:13:28 -07:00
James Zern
bd77931421
dct_partial_test,fwd_txfm: change << to *
...
left shift of a negative number is undefined in C; quiets a ubsan
warning
Change-Id: Ib1624ad5326ac8e0eead9348468ef7fe5d4df9a4
2017-06-29 14:42:03 -07:00
Johann
9fe510c12a
partial fdct neon: add 32x32_1
...
Always return an int32_t. Since it needs to be moved to a register for
shifting, this doesn't really penalize the smaller transforms.
The values could potentially be summed and shifted in place.
BUG=webm:1424
Change-Id: Id5beb35d79c7574ebd99285fc4182788cf2bb972
2017-06-28 15:37:44 -07:00
Johann
f310ddc470
partial fdct neon: add 16x16_1
...
For the 8x8_1, the highbd output fit nicely in the existing function. 12
bit input will overflow this implementation of 16x16_1.
BUG=webm:1424
Change-Id: I2945fe5478b18f996f1a5de80110fa30f3f4e7ec
2017-06-28 15:37:44 -07:00
Johann
4959dd3eb3
partial fdct neon: add 4x4_1
...
BUG=webm:1424
Change-Id: Ib0f3cfd6116fc1f5a99acb8bfd76e25b90177ffc
2017-06-28 15:37:44 -07:00
Johann
cf75ab6ccd
partial fdct neon: move 8x8_1 and enable hbd tests
...
The function was originally written with HBD in mind. Enable it and
configure the tests.
BUG=webm:1424
Change-Id: I78a2eba8d4d9d59db98a344ba0840d4a60ebe9a1
2017-06-28 15:37:43 -07:00
Johann Koenig
81e25512c3
Merge changes Ib454762d,I966650df,Ie126553e,I068f06c6,Icb72a94e
...
* changes:
sad neon: rewrite 64x64 and add 64x32
sad neon: rewrite 32x32, add 32x16 and 32x64
sad neon: rewrite 16x8, 16x16, add 16x32
sad neon: rewrite 8x8 and 8x16
sad neon: rewrite 4x4 and add 4x8
2017-06-28 22:37:00 +00:00
Johann Koenig
35f8515c3f
Merge "partial fdct test"
2017-06-28 22:34:53 +00:00
Johann
5ac88162b9
partial fdct test
...
Test the _1 variant of the fdct, which simply sums the block and applies
a modifying shift based on the block size.
BUG=webm:1424
Change-Id: Ic80d6008abba0c596b575fa0484d5b5855321468
2017-06-28 20:32:20 +00:00
Johann
ad011aaab8
sad neon: rewrite 64x64 and add 64x32
...
BUG=webm:1425
Change-Id: Ib454762d1c61b05a98324fe81ad58c9e09784717
2017-06-28 12:21:34 -07:00
Johann
77a648885c
sad neon: rewrite 32x32, add 32x16 and 32x64
...
BUG=webm:1425
Change-Id: I966650df7e3face93e1e771634d1cc5458a35f85
2017-06-28 12:20:27 -07:00
Johann
469643757f
sad neon: rewrite 16x8, 16x16, add 16x32
...
BUG=webm:1425
Change-Id: Ie126553e5fffcdfaf3d82a85b368ac10ce9ab082
2017-06-28 12:16:00 -07:00
Johann
e40e78be24
sad neon: rewrite 8x8 and 8x16
...
BUG=webm:1425
Change-Id: I068f06c67b841f09ea07c04ada0c2f1706102138
2017-06-28 12:15:57 -07:00
Johann
46d8660ce3
sad neon: rewrite 4x4 and add 4x8
...
The previous implementation loaded 8 values (discarding half)
BUG=webm:1425
Change-Id: Icb72a94e2557a4ee2db7091266ab58fd92f72158
2017-06-28 11:14:59 -07:00
Linfeng Zhang
0bb31a46a4
Update vpx_idct8x8_12_add_ssse3()
...
Change-Id: I0f38801c391db87ddae168602a786a062cd34b1d
2017-06-26 14:57:41 -07:00
Linfeng Zhang
a76b6b232c
Update load_input_data() in x86
...
Split to load_input_data4() and load_input_data8().
Use pack with signed saturation instruction for high bitdepth.
Change-Id: Icda3e0129a6fdb4a51d1cafbdc652ae3a65f4e06
2017-06-26 13:38:33 -07:00
Linfeng Zhang
8253a27904
Add vpx_highbd_idct4x4_16_add_sse4_1()
...
BUG=webm:1412
Change-Id: Ie33482409351a01be4e89466b0441834eb1e905a
2017-06-23 14:30:12 -07:00
Linfeng Zhang
b8a4b5dd8d
Cosmetics, 8x8 idct SSE2 optimization
...
Change-Id: Id21fa94fd323e36cd19a2d890bf4a0cafb7d964d
2017-06-23 14:30:12 -07:00
James Zern
88a302e743
Merge changes from topic 'missing-proto'
...
* changes:
onyxd_int.h: add missing prototypes
onyxd.h: add vp8dx_references_buffer prototype
vp[89],vpx_dsp: add missing includes
vp8,encodeframe.h: correct prototypes
vp8: add temporal_filter.h
add picklpf.h
add ethreading.h
vp8,bitstream.h: add missing prototypes
vp8: remove vp8_fast_quantize_b_mmx
vp8,loopfilter_filters: make some functions static
vp9_ratectrl: make adjust_gf_boost_lag_one_pass_vbr static
vp9_encodeframe: make scale_part_thresh_sumdiff static
vp9_alt_ref_aq: correct vp9_alt_ref_aq_create proto
tiny_ssim: make some functions static
2017-06-23 05:44:24 +00:00
Johann Koenig
794a5ad713
Merge "fdct32x32 neon implementation"
2017-06-23 01:58:00 +00:00
Johann
e67660cf37
fdct32x32 neon implementation
...
Almost 3x faster in constrained loop testing. Over 10x faster in HBD
builds.
BUG=webm:1424
Change-Id: I2b7f8453e1d4ada63cde729d8115d684c4a71ff9
2017-06-22 06:40:17 -07:00
James Zern
44418c659f
vp[89],vpx_dsp: add missing includes
...
quiets -Wmissing-prototypes
Change-Id: I841cfc019d592f2bc6b3fec5818051a31f4c53b5
2017-06-21 19:00:15 -07:00
Linfeng Zhang
466b667ff3
Clean vpx_idct16x16_256_add_sse2()
...
Remove macro IDCT16 which is redundant with idct16_8col().
Change-Id: I783c5f4fda038a22d5ee5c2b22e8c2cdfb38432c
2017-06-21 13:47:15 -07:00
Linfeng Zhang
42522ce0b7
Update vpx_idct{8x8,16x16,32x32}_1_add_sse2()
...
Change-Id: I365f8e53d9ccd028cef0f561d4de9e5916278609
2017-06-21 13:47:05 -07:00
Linfeng Zhang
2b43a1ee18
Clean 32x32 full idct sse2 and ssse3 code
...
vpx_idct32x32_1024_add_ssse3() is actually a sse2 function and faster
than vpx_idct32x32_1024_add_sse2(). Replace the slow one. All are
code relocations, no new code.
Change-Id: I5dac0e98cc411a4ce05660406921118986638d19
2017-06-21 13:46:49 -07:00
Linfeng Zhang
c7e4917e97
Clean 8x8 idct x86 optimization
...
Create load_buffer_8x8() and write_buffer_8x8().
Change-Id: Ib26dd515d734a5402971c91de336ab481b213fdf
2017-06-15 14:30:00 -07:00
Linfeng Zhang
98967645a1
Remove vpx_idct8x8_64_add_ssse3()
...
It's almost identical with vpx_idct8x8_64_add_sse2(), except little
difference in instructions order.
Change-Id: Ie60dabc35eaa6ebae7c755e6cff00a710aad284f
2017-06-15 14:09:33 -07:00
Linfeng Zhang
6da6a23291
Update high bitdepth load_input_data() in x86
...
BUG=webm:1412
Change-Id: Ibf9d120b80c7d3a7637e79e123cf2f0aae6dd78c
2017-06-13 16:53:53 -07:00
Linfeng Zhang
d6eeef9ee6
Clean array_transpose_{4X8,16x16,16x16_2) in x86
...
Change-Id: I341399ecbde37065375ea7e63511a26bfc285ea0
2017-06-13 16:50:44 -07:00
Linfeng Zhang
9c72e85e4c
Remove array_transpose_8x8() in x86
...
Duplicate of transpose_16bit_8x8()
Change-Id: Iaa5dd63b5cccb044974a65af22c90e13418e311f
2017-06-13 16:50:44 -07:00
Linfeng Zhang
cbb991b6b8
Convert 8x8 idct x86 macros to inline functions
...
Change-Id: Id59865fd6c453a24121ce7160048d67875fc67ce
2017-06-13 16:50:43 -07:00
Jerome Jiang
943f9ee25c
Merge "Merge skin detection code in vp8/9."
2017-06-08 16:36:00 +00:00
Johann Koenig
903375a48a
Merge "fdct16x16 neon optimization"
2017-06-08 15:19:36 +00:00
Jerome Jiang
658e854252
Merge skin detection code in vp8/9.
...
BUG=webm:1438
Change-Id: Ie3dc034c7dbb498a0b088a767b1936ddeed4df14
2017-06-07 21:20:34 -07:00
Johann
eae7cf2368
fdct16x16 neon optimization
...
Roughly 2x speedup. Since the only change for HBD is to store(), the
improvement appears to hold there as well.
BUG=webm:1424
Change-Id: I15b813d50deb2e47b49a6b0705945de748e83c19
2017-06-07 14:59:55 -07:00
James Zern
ff42e04f9c
Merge "ppc: Add vpx_sadnxmx4d_vsx for n,m = {8, 16, 32 ,64}"
2017-06-06 23:52:39 +00:00
James Zern
4753c23983
Merge "ppc: Add vpx_sad64/32/16x64/32/16_avg_vsx"
2017-06-06 02:19:41 +00:00
Johann Koenig
755b3daf90
Merge "comp_avg_pred neon: used by sub pixel avg variance"
2017-05-31 18:17:28 +00:00
Linfeng Zhang
30ea3ef283
Merge "Update vpx_highbd_idct4x4_16_add_sse2()"
2017-05-31 15:56:20 +00:00
Johann
f695b30ac2
comp_avg_pred neon: used by sub pixel avg variance
...
BUG=webm:1423
Change-Id: I33de537f238f58f89b7a6c1c2d6e8110de4b8804
2017-05-30 22:47:34 +00:00
Linfeng Zhang
45048dc9dc
Update vpx_highbd_idct4x4_16_add_sse2()
...
BUG=webm:1412
Change-Id: I26e4b34ae9bc1ae80c24f56d740d737a95f1ab84
2017-05-30 09:25:30 -07:00
Johann Koenig
b9649d2407
Merge "comp_avg_pred: alignment"
2017-05-30 16:21:05 +00:00
Johann
ea8b4a450d
comp_avg_pred: alignment
...
x86 requires 16 byte alignment for some vector loads/stores.
arm does not have the same requirement.
The asserts are still in avg_pred_sse2.c. This just removes them from
the common code.
Change-Id: Ic5175c607a94d2abf0b80d431c4e30c8a6f731b6
2017-05-30 07:46:43 -07:00