Johann Koenig
a83e1f1d53
Merge "quantize ssse3: declare all variables"
2017-07-27 21:18:35 +00:00
James Zern
1c666465af
inv_txfm_{sse2,ssse3}: clear conversion warnings
...
visual studio reports tran_high_t (int64) -> short in calls to
_mm_set1_epi16
Change-Id: Icb8d1baee77ad3d45edb1477a443d3e648f0b745
2017-07-25 20:13:49 -07:00
James Zern
62682ac8ad
highbd_idct*_sse*.c: clear conversion warnings
...
visual studio reports tran_high_t (int64) -> int in calls to
_mm_setr_epi32
Change-Id: Ic2247c8e3800991202151790d78bd94c4f4aed05
2017-07-25 20:11:09 -07:00
James Zern
85736e616e
vpx_variance16x16_sse2: correct cast order
...
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
previously:
b0f1ae147
vpx_get16x16var_avx2: correct cast order
Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
2017-07-25 16:45:40 -07:00
James Zern
b0f1ae1475
vpx_get16x16var_avx2: correct cast order
...
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
missed in:
6acd061aa
variance_avx2: sync variance functions with c-code
Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617
2017-07-24 16:29:44 -07:00
Johann
4b9a848bb3
variance: call C comp_avg_pred
...
Keep optimized code out of the reference implementation. This matches
the style of the other sub calls.
Change-Id: I3da6acd4f2c647b029c420e22ac9410a18259689
2017-07-18 20:22:53 +00:00
Johann
03f5e300d6
quantize ssse3: declare all variables
...
Copy missing line from avx implementation.
Change-Id: I9755c5b4d4034867de6fa9f741c24bf49dce3a27
2017-07-18 12:32:57 -07:00
Johann
e381753926
sad4d neon: 64x[32,64]
...
Rewrite 64x64.
BUG=webm:1425
Change-Id: I336bf5a3aa4b783389c10b16a50f0f559346ecbf
2017-07-12 13:26:39 +00:00
Johann
e1bde306c8
sad4d neon: 32x[16,32,64]
...
Rewrite 32x32. Use half the accumulator registers.
BUG=webm:1425
Change-Id: Ibf5e61dc4ba15056102aef8495f4a02c668c5d13
2017-07-12 13:25:18 +00:00
Johann
807ce8fb1e
sad4d neon: 16x[8,16,32]
...
Rewrite 16x16. Use half the accumulator registers.
BUG=webm:1425
Change-Id: I44b48512b1e3629505d83c2645e800f53878ccc2
2017-07-12 13:25:11 +00:00
Johann
8152b0904d
sad4d neon: 8x[4,8,16]
...
BUG=webm:1425
Change-Id: I7de2500cca4b621f21478c4b0333c56d76dbc9a4
2017-07-12 13:25:03 +00:00
Johann
dd4347e9ec
sad4d neon: 4x4, 4x8
...
BUG=webm:1425
Change-Id: I5081b5ce131821d590c53ac1206a94f50cb8b468
2017-07-12 03:38:03 +00:00
Johann
66a96fd3de
avg_neon: fix 4x4, update 8x8
...
4x4 was failing with a bus error. Most likely due to clang alignment
hints on 32bit loads.
Change-Id: Ib191ce0e6239fc55d85f10e4dbe15876e5052edb
2017-07-10 15:29:34 -07:00
Johann
87610ac45e
neon: consolidate horizontal adds
...
Change-Id: Iaf9e88ff636ccf8f0ef310869c6827f3f205cca8
2017-07-10 15:29:13 -07:00
Johann Koenig
4b78c6e6f7
Merge "remove vp9_full_sad_search"
2017-07-10 20:42:40 +00:00
Johann
109faffe9b
remove vp9_full_sad_search
...
This code is unused in vp9. Only vp8 still contains references to
vpx_sad_NxMx[3|8] and only for sizes 16x16, 16x8, 8x16, 8x8 and 4x4.
Remove the remaining sizes and all the highbitdepth versions.
BUG=webm:1425
Change-Id: If6a253977c8e0c04599e25cbeb45f71a94f563e8
2017-07-10 11:20:35 -07:00
Johann Koenig
4e16f70703
Merge changes Id84d9780,Iaa6ea75b,I3362e0dd,I0020a49e,Ia42e4f36, ...
...
* changes:
sad neon: avg for 64x[32,64]
sad neon: macroize 64xN definitions
sad neon: avg for 32x[16,32,64]
sad neon: macroize 32xN definitions
sad neon: avg for 16x[8,16,32]
sad neon: macroize 16xN definitions
2017-07-07 21:01:23 +00:00
Johann Koenig
6c375b9cd0
Merge "fdct neon: 32x32_rd"
2017-07-07 14:05:51 +00:00
Johann
e4e08556db
sad neon: avg for 64x[32,64]
...
BUG=webm:1425
Change-Id: Id84d97807a6a0fbcc889c4dfe11929d54f85493d
2017-07-07 07:04:04 -07:00
Johann
6ae8f8dbe8
sad neon: macroize 64xN definitions
...
Change-Id: Iaa6ea75b10e75784f31b1e08637eecf0dcb5cff9
2017-07-07 07:04:04 -07:00
Johann
67cffc1ef6
sad neon: avg for 32x[16,32,64]
...
BUG=webm:1425
Change-Id: I3362e0dded3b46ca032caa7f44db42f324bc596d
2017-07-07 07:04:04 -07:00
Johann
b0d15713be
sad neon: macroize 32xN definitions
...
Change-Id: I0020a49e77d27514375a03095d5821dc0aa7d128
2017-07-07 07:04:04 -07:00
Johann
527e0c9b1c
sad neon: avg for 16x[8,16,32]
...
BUG=webm:1425
Change-Id: Ia42e4f36547c5fe12114fb58379e34bce82eb2f2
2017-07-07 07:04:04 -07:00
Johann
3c18acf452
sad neon: macroize 16xN definitions
...
Change-Id: I5aea6ffbfa48eb1970afe3be54f0bba275d7fa58
2017-07-07 07:04:04 -07:00
Johann
d6423b3166
sad neon: macroize 8xN definitions
...
Change-Id: I7b36a57e893c1795a37ba7994995bec7ff021409
2017-07-06 07:51:59 -07:00
Johann
63bdc574e5
sad neon: avg for 8x[4,8,16]
...
BUG=webm:1425
Change-Id: If2ab51e3050e078b0011b174efe41fcb65a15f44
2017-07-06 07:43:09 -07:00
Johann
6bac3f80ee
sad neon: avg for 4x4 and 4x8
...
BUG=webm:1425
Change-Id: Ifc685a96cb34f7fd9243b4c674027480564b84fb
2017-07-06 07:12:47 -07:00
Johann
75b00592c7
fdct neon: 32x32_rd
...
About 40% faster than the non-rd version.
BUG=webm:1424
Change-Id: Ia99d14eb9532302eeaab8cd3e503395b0374b5a2
2017-07-06 06:30:50 -07:00
James Zern
a6531cbc54
Merge changes from topic 'missing-proto'
...
* changes:
fwd_txfm_msa.c: add missing vpx_dsp_rtcd.h
vpx_convolve_*_msa.c: add missing vpx_dsp_rtcd.h
loopfilter_*_msa.c: add missing vpx_dsp_rtcd.h
2017-07-05 20:00:25 +00:00
Johann Koenig
b6321025cd
Merge "partial fdct neon: maintain neon registers"
2017-07-05 19:12:38 +00:00
James Zern
fb135ff050
Merge changes I4ed1312f,Id2673eec
...
* changes:
ppc: Add vpx_idct8x8_64_add_vsx
ppc: Add vpx_idct4x4_16_add_vsx
2017-07-02 02:38:39 +00:00
Alexandra Hájková
c757d6dde4
ppc: Add vpx_idct8x8_64_add_vsx
...
Change-Id: I4ed1312f365509e0595dcc09890ecb050f6f2069
2017-07-01 12:55:47 -07:00
Alexandra Hájková
d8c277030c
ppc: Add vpx_idct4x4_16_add_vsx
...
Change-Id: Id2673eece32027fb245919c7a5c81994a4a19fd8
2017-07-01 12:32:18 -07:00
James Zern
3dd993e4be
highbd_idct8x8_add_sse4: make << of neg. val a multiply
...
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.
Change-Id: Ia17a7672d4832463decbc4afd6cd42974d02698e
2017-07-01 11:56:56 -07:00
Johann
3ae458f2f3
partial fdct neon: maintain neon registers
...
Finish the calulations in neon registers. This avoids a potentially
expensive move from neon to gp and allows at least clang to store
directly to memory.
BUG=webm:1424
Change-Id: Idef25eec95f7610947167818e9194bde8b00d282
2017-07-01 09:29:38 -07:00
James Zern
a876d04072
fwd_txfm_msa.c: add missing vpx_dsp_rtcd.h
...
+ only expose compatible functions in high-bitdepth build
quiets -Wmissing-prototypes warnings
Change-Id: I8ef7db08a34c5c54b5cde6e732c0d70f4287c89a
2017-06-30 18:53:30 -07:00
James Zern
8710c6d884
vpx_convolve_*_msa.c: add missing vpx_dsp_rtcd.h
...
quiets -Wmissing-prototypes warnings
Change-Id: I1ab5b8ae4a62f54e0f9eb3fc81371c9b99972c30
2017-06-30 18:50:56 -07:00
James Zern
329dabf57e
loopfilter_*_msa.c: add missing vpx_dsp_rtcd.h
...
+ make some functions static
quiets -Wmissing-prototypes warnings
Change-Id: I2130e06142e71a004a1eb30e173feba4f6fe68a0
2017-06-30 18:50:52 -07:00
James Zern
27e37e1a8a
fwd_txfm_msa.c: correct vpx_fdct8x8_1_msa prototype
...
this makes the function compatible with high-bitdepth and fixes test
failures since:
5ac88162b
partial fdct test
Change-Id: Ib630694608237f0c515948942e05dbea259ba338
2017-06-30 18:50:47 -07:00
Linfeng Zhang
1e3a93e72e
Merge changes I5d038b4f,I9d00d1dd,I0722841d,I1f640db7
...
* changes:
Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1
sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4()
Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h
Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h
2017-06-30 20:49:19 +00:00
Johann Koenig
89d3dc043e
Merge changes Id5beb35d,I2945fe54,Ib0f3cfd6,I78a2eba8
...
* changes:
partial fdct neon: add 32x32_1
partial fdct neon: add 16x16_1
partial fdct neon: add 4x4_1
partial fdct neon: move 8x8_1 and enable hbd tests
2017-06-30 01:00:07 +00:00
Linfeng Zhang
c338f3635e
Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1
...
BUG=webm:1412
Change-Id: I5d038b4fa842ce2f6b9bd5c8c44c70647bda9591
2017-06-29 17:19:34 -07:00
Linfeng Zhang
ee5cb8d87f
sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4()
...
BUG=webm:1412
Change-Id: I9d00d1ddbd724fd5f825fd974c4cf46a9bca6cb3
2017-06-29 17:18:01 -07:00
Linfeng Zhang
0fa59a4baf
Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h
...
Also clean highbd_inv_txfm_sse2.h
BUG=webm:1412
Change-Id: I0722841d824ce602874019bd9779b10d49d10c0b
2017-06-29 17:17:43 -07:00
Linfeng Zhang
9ac78ae35f
Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h
...
BUG=webm:1412
Change-Id: I1f640db71ad4c644b7521305a781f2218eb1ba9d
2017-06-29 17:13:28 -07:00
James Zern
bd77931421
dct_partial_test,fwd_txfm: change << to *
...
left shift of a negative number is undefined in C; quiets a ubsan
warning
Change-Id: Ib1624ad5326ac8e0eead9348468ef7fe5d4df9a4
2017-06-29 14:42:03 -07:00
Johann
9fe510c12a
partial fdct neon: add 32x32_1
...
Always return an int32_t. Since it needs to be moved to a register for
shifting, this doesn't really penalize the smaller transforms.
The values could potentially be summed and shifted in place.
BUG=webm:1424
Change-Id: Id5beb35d79c7574ebd99285fc4182788cf2bb972
2017-06-28 15:37:44 -07:00
Johann
f310ddc470
partial fdct neon: add 16x16_1
...
For the 8x8_1, the highbd output fit nicely in the existing function. 12
bit input will overflow this implementation of 16x16_1.
BUG=webm:1424
Change-Id: I2945fe5478b18f996f1a5de80110fa30f3f4e7ec
2017-06-28 15:37:44 -07:00
Johann
4959dd3eb3
partial fdct neon: add 4x4_1
...
BUG=webm:1424
Change-Id: Ib0f3cfd6116fc1f5a99acb8bfd76e25b90177ffc
2017-06-28 15:37:44 -07:00
Johann
cf75ab6ccd
partial fdct neon: move 8x8_1 and enable hbd tests
...
The function was originally written with HBD in mind. Enable it and
configure the tests.
BUG=webm:1424
Change-Id: I78a2eba8d4d9d59db98a344ba0840d4a60ebe9a1
2017-06-28 15:37:43 -07:00