Commit Graph

805 Commits

Author SHA1 Message Date
Linfeng Zhang
9c43d81bc2 Refactor highbd idct 4x4 and 8x8 x86 functions
BUG=webm:1412

Change-Id: I221dff34dd5f71b390b5e043d0a137ccb0a01dec
2017-07-27 18:01:03 -07:00
Johann Koenig
a83e1f1d53 Merge "quantize ssse3: declare all variables" 2017-07-27 21:18:35 +00:00
James Zern
1c666465af inv_txfm_{sse2,ssse3}: clear conversion warnings
visual studio reports tran_high_t (int64) -> short in calls to
_mm_set1_epi16

Change-Id: Icb8d1baee77ad3d45edb1477a443d3e648f0b745
2017-07-25 20:13:49 -07:00
James Zern
62682ac8ad highbd_idct*_sse*.c: clear conversion warnings
visual studio reports tran_high_t (int64) -> int in calls to
_mm_setr_epi32

Change-Id: Ic2247c8e3800991202151790d78bd94c4f4aed05
2017-07-25 20:11:09 -07:00
James Zern
85736e616e vpx_variance16x16_sse2: correct cast order
allow the right shift to operate on 64-bits, this matches the rest of
the implementations

previously:
b0f1ae147 vpx_get16x16var_avx2: correct cast order

Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
2017-07-25 16:45:40 -07:00
James Zern
b0f1ae1475 vpx_get16x16var_avx2: correct cast order
allow the right shift to operate on 64-bits, this matches the rest of
the implementations

missed in:
6acd061aa variance_avx2: sync variance functions with c-code

Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617
2017-07-24 16:29:44 -07:00
Johann
4b9a848bb3 variance: call C comp_avg_pred
Keep optimized code out of the reference implementation. This matches
the style of the other sub calls.

Change-Id: I3da6acd4f2c647b029c420e22ac9410a18259689
2017-07-18 20:22:53 +00:00
Johann
03f5e300d6 quantize ssse3: declare all variables
Copy missing line from avx implementation.

Change-Id: I9755c5b4d4034867de6fa9f741c24bf49dce3a27
2017-07-18 12:32:57 -07:00
Johann
e381753926 sad4d neon: 64x[32,64]
Rewrite 64x64.

BUG=webm:1425

Change-Id: I336bf5a3aa4b783389c10b16a50f0f559346ecbf
2017-07-12 13:26:39 +00:00
Johann
e1bde306c8 sad4d neon: 32x[16,32,64]
Rewrite 32x32. Use half the accumulator registers.

BUG=webm:1425

Change-Id: Ibf5e61dc4ba15056102aef8495f4a02c668c5d13
2017-07-12 13:25:18 +00:00
Johann
807ce8fb1e sad4d neon: 16x[8,16,32]
Rewrite 16x16. Use half the accumulator registers.

BUG=webm:1425

Change-Id: I44b48512b1e3629505d83c2645e800f53878ccc2
2017-07-12 13:25:11 +00:00
Johann
8152b0904d sad4d neon: 8x[4,8,16]
BUG=webm:1425

Change-Id: I7de2500cca4b621f21478c4b0333c56d76dbc9a4
2017-07-12 13:25:03 +00:00
Johann
dd4347e9ec sad4d neon: 4x4, 4x8
BUG=webm:1425

Change-Id: I5081b5ce131821d590c53ac1206a94f50cb8b468
2017-07-12 03:38:03 +00:00
Johann
66a96fd3de avg_neon: fix 4x4, update 8x8
4x4 was failing with a bus error. Most likely due to clang alignment
hints on 32bit loads.

Change-Id: Ib191ce0e6239fc55d85f10e4dbe15876e5052edb
2017-07-10 15:29:34 -07:00
Johann
87610ac45e neon: consolidate horizontal adds
Change-Id: Iaf9e88ff636ccf8f0ef310869c6827f3f205cca8
2017-07-10 15:29:13 -07:00
Johann Koenig
4b78c6e6f7 Merge "remove vp9_full_sad_search" 2017-07-10 20:42:40 +00:00
Johann
109faffe9b remove vp9_full_sad_search
This code is unused in vp9. Only vp8 still contains references to
vpx_sad_NxMx[3|8] and only for sizes 16x16, 16x8, 8x16, 8x8 and 4x4.

Remove the remaining sizes and all the highbitdepth versions.

BUG=webm:1425

Change-Id: If6a253977c8e0c04599e25cbeb45f71a94f563e8
2017-07-10 11:20:35 -07:00
Johann Koenig
4e16f70703 Merge changes Id84d9780,Iaa6ea75b,I3362e0dd,I0020a49e,Ia42e4f36, ...
* changes:
  sad neon: avg for 64x[32,64]
  sad neon: macroize 64xN definitions
  sad neon: avg for 32x[16,32,64]
  sad neon: macroize 32xN definitions
  sad neon: avg for 16x[8,16,32]
  sad neon: macroize 16xN definitions
2017-07-07 21:01:23 +00:00
Johann Koenig
6c375b9cd0 Merge "fdct neon: 32x32_rd" 2017-07-07 14:05:51 +00:00
Johann
e4e08556db sad neon: avg for 64x[32,64]
BUG=webm:1425

Change-Id: Id84d97807a6a0fbcc889c4dfe11929d54f85493d
2017-07-07 07:04:04 -07:00
Johann
6ae8f8dbe8 sad neon: macroize 64xN definitions
Change-Id: Iaa6ea75b10e75784f31b1e08637eecf0dcb5cff9
2017-07-07 07:04:04 -07:00
Johann
67cffc1ef6 sad neon: avg for 32x[16,32,64]
BUG=webm:1425

Change-Id: I3362e0dded3b46ca032caa7f44db42f324bc596d
2017-07-07 07:04:04 -07:00
Johann
b0d15713be sad neon: macroize 32xN definitions
Change-Id: I0020a49e77d27514375a03095d5821dc0aa7d128
2017-07-07 07:04:04 -07:00
Johann
527e0c9b1c sad neon: avg for 16x[8,16,32]
BUG=webm:1425

Change-Id: Ia42e4f36547c5fe12114fb58379e34bce82eb2f2
2017-07-07 07:04:04 -07:00
Johann
3c18acf452 sad neon: macroize 16xN definitions
Change-Id: I5aea6ffbfa48eb1970afe3be54f0bba275d7fa58
2017-07-07 07:04:04 -07:00
Johann
d6423b3166 sad neon: macroize 8xN definitions
Change-Id: I7b36a57e893c1795a37ba7994995bec7ff021409
2017-07-06 07:51:59 -07:00
Johann
63bdc574e5 sad neon: avg for 8x[4,8,16]
BUG=webm:1425

Change-Id: If2ab51e3050e078b0011b174efe41fcb65a15f44
2017-07-06 07:43:09 -07:00
Johann
6bac3f80ee sad neon: avg for 4x4 and 4x8
BUG=webm:1425

Change-Id: Ifc685a96cb34f7fd9243b4c674027480564b84fb
2017-07-06 07:12:47 -07:00
Johann
75b00592c7 fdct neon: 32x32_rd
About 40% faster than the non-rd version.

BUG=webm:1424

Change-Id: Ia99d14eb9532302eeaab8cd3e503395b0374b5a2
2017-07-06 06:30:50 -07:00
James Zern
a6531cbc54 Merge changes from topic 'missing-proto'
* changes:
  fwd_txfm_msa.c: add missing vpx_dsp_rtcd.h
  vpx_convolve_*_msa.c: add missing vpx_dsp_rtcd.h
  loopfilter_*_msa.c: add missing vpx_dsp_rtcd.h
2017-07-05 20:00:25 +00:00
Johann Koenig
b6321025cd Merge "partial fdct neon: maintain neon registers" 2017-07-05 19:12:38 +00:00
James Zern
fb135ff050 Merge changes I4ed1312f,Id2673eec
* changes:
  ppc: Add vpx_idct8x8_64_add_vsx
  ppc: Add vpx_idct4x4_16_add_vsx
2017-07-02 02:38:39 +00:00
Alexandra Hájková
c757d6dde4 ppc: Add vpx_idct8x8_64_add_vsx
Change-Id: I4ed1312f365509e0595dcc09890ecb050f6f2069
2017-07-01 12:55:47 -07:00
Alexandra Hájková
d8c277030c ppc: Add vpx_idct4x4_16_add_vsx
Change-Id: Id2673eece32027fb245919c7a5c81994a4a19fd8
2017-07-01 12:32:18 -07:00
James Zern
3dd993e4be highbd_idct8x8_add_sse4: make << of neg. val a multiply
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.

Change-Id: Ia17a7672d4832463decbc4afd6cd42974d02698e
2017-07-01 11:56:56 -07:00
Johann
3ae458f2f3 partial fdct neon: maintain neon registers
Finish the calulations in neon registers. This avoids a potentially
expensive move from neon to gp and allows at least clang to store
directly to memory.

BUG=webm:1424

Change-Id: Idef25eec95f7610947167818e9194bde8b00d282
2017-07-01 09:29:38 -07:00
James Zern
a876d04072 fwd_txfm_msa.c: add missing vpx_dsp_rtcd.h
+ only expose compatible functions in high-bitdepth build

quiets -Wmissing-prototypes warnings

Change-Id: I8ef7db08a34c5c54b5cde6e732c0d70f4287c89a
2017-06-30 18:53:30 -07:00
James Zern
8710c6d884 vpx_convolve_*_msa.c: add missing vpx_dsp_rtcd.h
quiets -Wmissing-prototypes warnings

Change-Id: I1ab5b8ae4a62f54e0f9eb3fc81371c9b99972c30
2017-06-30 18:50:56 -07:00
James Zern
329dabf57e loopfilter_*_msa.c: add missing vpx_dsp_rtcd.h
+ make some functions static

quiets -Wmissing-prototypes warnings

Change-Id: I2130e06142e71a004a1eb30e173feba4f6fe68a0
2017-06-30 18:50:52 -07:00
James Zern
27e37e1a8a fwd_txfm_msa.c: correct vpx_fdct8x8_1_msa prototype
this makes the function compatible with high-bitdepth and fixes test
failures since:
5ac88162b partial fdct test

Change-Id: Ib630694608237f0c515948942e05dbea259ba338
2017-06-30 18:50:47 -07:00
Linfeng Zhang
1e3a93e72e Merge changes I5d038b4f,I9d00d1dd,I0722841d,I1f640db7
* changes:
  Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1
  sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4()
  Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h
  Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h
2017-06-30 20:49:19 +00:00
Johann Koenig
89d3dc043e Merge changes Id5beb35d,I2945fe54,Ib0f3cfd6,I78a2eba8
* changes:
  partial fdct neon: add 32x32_1
  partial fdct neon: add 16x16_1
  partial fdct neon: add 4x4_1
  partial fdct neon: move 8x8_1 and enable hbd tests
2017-06-30 01:00:07 +00:00
Linfeng Zhang
c338f3635e Add vpx_highbd_idct8x8_{12, 64}_add_sse4_1
BUG=webm:1412

Change-Id: I5d038b4fa842ce2f6b9bd5c8c44c70647bda9591
2017-06-29 17:19:34 -07:00
Linfeng Zhang
ee5cb8d87f sse2: Add transpose_32bit_4x4x2() and update transpose_32bit_4x4()
BUG=webm:1412

Change-Id: I9d00d1ddbd724fd5f825fd974c4cf46a9bca6cb3
2017-06-29 17:18:01 -07:00
Linfeng Zhang
0fa59a4baf Refactor highbd idct 4x4 sse4.1 code and add highbd_inv_txfm_sse4.h
Also clean highbd_inv_txfm_sse2.h

BUG=webm:1412

Change-Id: I0722841d824ce602874019bd9779b10d49d10c0b
2017-06-29 17:17:43 -07:00
Linfeng Zhang
9ac78ae35f Refactor vpx_idct8x8_12_add_ssse3() and add inv_txfm_ssse3.h
BUG=webm:1412

Change-Id: I1f640db71ad4c644b7521305a781f2218eb1ba9d
2017-06-29 17:13:28 -07:00
James Zern
bd77931421 dct_partial_test,fwd_txfm: change << to *
left shift of a negative number is undefined in C; quiets a ubsan
warning

Change-Id: Ib1624ad5326ac8e0eead9348468ef7fe5d4df9a4
2017-06-29 14:42:03 -07:00
Johann
9fe510c12a partial fdct neon: add 32x32_1
Always return an int32_t. Since it needs to be moved to a register for
shifting, this doesn't really penalize the smaller transforms.

The values could potentially be summed and shifted in place.

BUG=webm:1424

Change-Id: Id5beb35d79c7574ebd99285fc4182788cf2bb972
2017-06-28 15:37:44 -07:00
Johann
f310ddc470 partial fdct neon: add 16x16_1
For the 8x8_1, the highbd output fit nicely in the existing function. 12
bit input will overflow this implementation of 16x16_1.

BUG=webm:1424

Change-Id: I2945fe5478b18f996f1a5de80110fa30f3f4e7ec
2017-06-28 15:37:44 -07:00
Johann
4959dd3eb3 partial fdct neon: add 4x4_1
BUG=webm:1424

Change-Id: Ib0f3cfd6116fc1f5a99acb8bfd76e25b90177ffc
2017-06-28 15:37:44 -07:00