allow the right shift to operate on 64 bits; this matches the rest of
the implementations
previously:
b0f1ae147 vpx_get16x16var_avx2: correct cast order
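
A minimal sketch of why the cast order and shift width matter. The helper
name variance_from_sums is hypothetical and is not the libvpx function; it
only illustrates widening the sum before the multiply so that both the square
and the right shift are done in 64 bits. For a 16x16 block of 8-bit pixels the
sum can reach 65280, and its square does not fit in a signed 32-bit value.

```c
#include <stdint.h>

/* Hypothetical helper, not part of libvpx: derive a block variance from the
 * accumulated squared error (sse) and the signed sum of differences.
 * log2_count is log2 of the number of pixels, e.g. 8 for a 16x16 block. */
static uint32_t variance_from_sums(uint32_t sse, int sum, int log2_count) {
  /* Widen before multiplying so the square and the shift are 64-bit. */
  const int64_t sum_sq = (int64_t)sum * sum;
  return sse - (uint32_t)(sum_sq >> log2_count);
}
```

Casting after the multiply, as in (int64_t)(sum * sum), would still perform
the multiplication in int and could overflow before the widening happens.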
Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
allow the right shift to operate on 64 bits; this matches the rest of
the implementations
missed in:
6acd061aa variance_avx2: sync variance functions with c-code
Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617
Keep optimized code out of the reference implementation. This matches
the style of the other sub calls.
Change-Id: I3da6acd4f2c647b029c420e22ac9410a18259689
This code is unused in vp9. Only vp8 still contains references to
vpx_sad_NxMx[3|8] and only for sizes 16x16, 16x8, 8x16, 8x8 and 4x4.
Remove the remaining sizes and all the highbitdepth versions.
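
For context, a rough C sketch of what an x3 SAD computes, assuming the xK
variants evaluate the sum of absolute differences at K consecutive horizontal
reference offsets. The names sad_block and sad_block_x3 are hypothetical and
take the block size as parameters rather than encoding it in the name.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical generic SAD over a w x h block. */
static unsigned int sad_block(const uint8_t *src, int src_stride,
                              const uint8_t *ref, int ref_stride, int w,
                              int h) {
  unsigned int sad = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      sad += (unsigned int)abs(src[r * src_stride + c] -
                               ref[r * ref_stride + c]);
    }
  }
  return sad;
}

/* x3 form (assumed semantics): SADs against ref, ref + 1 and ref + 2. */
static void sad_block_x3(const uint8_t *src, int src_stride,
                         const uint8_t *ref, int ref_stride, int w, int h,
                         unsigned int sad_array[3]) {
  for (int i = 0; i < 3; ++i) {
    sad_array[i] = sad_block(src, src_stride, ref + i, ref_stride, w, h);
  }
}
```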
BUG=webm:1425
Change-Id: If6a253977c8e0c04599e25cbeb45f71a94f563e8
* changes:
sad neon: avg for 64x[32,64]
sad neon: macroize 64xN definitions
sad neon: avg for 32x[16,32,64]
sad neon: macroize 32xN definitions
sad neon: avg for 16x[8,16,32]
sad neon: macroize 16xN definitions
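
A scalar sketch of what the avg variants listed above compute, assuming the
usual compound-prediction form: the reference is averaged with a second
predictor (rounding up on ties) before the absolute difference is taken. The
name sad_avg_block is hypothetical, and second_pred is assumed to be stored
contiguously with a stride equal to the block width.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical scalar model of an averaging SAD over a w x h block. */
static unsigned int sad_avg_block(const uint8_t *src, int src_stride,
                                  const uint8_t *ref, int ref_stride,
                                  const uint8_t *second_pred, int w, int h) {
  unsigned int sad = 0;
  for (int r = 0; r < h; ++r) {
    for (int c = 0; c < w; ++c) {
      /* Rounded average of the reference and the second predictor. */
      const int avg =
          (ref[r * ref_stride + c] + second_pred[r * w + c] + 1) >> 1;
      sad += (unsigned int)abs(src[r * src_stride + c] - avg);
    }
  }
  return sad;
}
```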
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.
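
An illustrative sketch of the pattern. The constant and the shift amount here
are hypothetical, since the message does not name the value involved. Left
shifting a negative value is undefined in C, while multiplying by the
equivalent power of two is well defined and yields the intended value.

```c
#include <stdint.h>

/* Hypothetical shift amount, chosen only for illustration. */
#define SHIFT_BITS 4

/* Undefined behaviour in C: (-8) << SHIFT_BITS.
 * Well-defined alternative: scale by the equivalent power of two. */
static int32_t scaled_constant(void) { return -8 * (1 << SHIFT_BITS); }
```

With a constant operand a compiler can fold the expression at compile time,
which is consistent with the generated code being unchanged.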
Change-Id: Ia17a7672d4832463decbc4afd6cd42974d02698e
Finish the calculations in neon registers. This avoids a potentially
expensive move from neon to gp and allows at least clang to store
directly to memory.
BUG=webm:1424
Change-Id: Idef25eec95f7610947167818e9194bde8b00d282
this makes the function compatible with high-bitdepth and fixes test
failures since:
5ac88162b partial fdct test
Change-Id: Ib630694608237f0c515948942e05dbea259ba338
Always return an int32_t. Since it needs to be moved to a register for
shifting, this doesn't really penalize the smaller transforms.
The values could potentially be summed and shifted in place.
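
A small worked bound on the sums involved, as a sketch. It assumes each
residual lies in [-(2^bd - 1), 2^bd - 1] for bit depth bd; the helper name
max_block_sum is hypothetical.

```c
#include <stdint.h>

/* Largest magnitude a sum over an n x n block of residuals can reach,
 * assuming each residual lies in [-(2^bit_depth - 1), 2^bit_depth - 1]. */
static int32_t max_block_sum(int n, int bit_depth) {
  const int32_t max_residual = (1 << bit_depth) - 1;
  return n * n * max_residual;
}
```

Under that assumption, 8-bit input gives 4080 for 4x4 and 16320 for 8x8,
which fit in 16 bits, and 65280 for 16x16, which does not fit in a signed
16-bit value. A 32-bit return type covers every size.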
BUG=webm:1424
Change-Id: Id5beb35d79c7574ebd99285fc4182788cf2bb972
For the 8x8_1, the highbd output fit nicely in the existing function.
12-bit input will overflow this implementation of 16x16_1.
BUG=webm:1424
Change-Id: I2945fe5478b18f996f1a5de80110fa30f3f4e7ec
The function was originally written with HBD in mind. Enable it and
configure the tests.
BUG=webm:1424
Change-Id: I78a2eba8d4d9d59db98a344ba0840d4a60ebe9a1
* changes:
sad neon: rewrite 64x64 and add 64x32
sad neon: rewrite 32x32, add 32x16 and 32x64
sad neon: rewrite 16x8, 16x16, add 16x32
sad neon: rewrite 8x8 and 8x16
sad neon: rewrite 4x4 and add 4x8
Test the _1 variant of the fdct, which simply sums the block and applies
a modifying shift based on the block size.
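
A scalar sketch of the behaviour being tested, not the libvpx code. The names
block_sum and fdct_dc_only are hypothetical, and the shift amount is passed in
by the caller because the per-size values are not listed here.

```c
#include <stdint.h>

/* Hypothetical helper: sum an n x n block of residuals. */
static int32_t block_sum(const int16_t *input, int stride, int n) {
  int32_t sum = 0;
  for (int r = 0; r < n; ++r) {
    for (int c = 0; c < n; ++c) sum += input[r * stride + c];
  }
  return sum;
}

/* DC-only output: the block sum shifted right by a size-dependent amount.
 * The shift value is an assumption supplied by the caller. Right shifting a
 * negative sum is implementation-defined in C (arithmetic on common
 * compilers). */
static int32_t fdct_dc_only(const int16_t *input, int stride, int n,
                            int shift) {
  return block_sum(input, stride, n) >> shift;
}
```

A test can then compare an optimized _1 function against this kind of
reference over random and extreme-valued blocks.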
BUG=webm:1424
Change-Id: Ic80d6008abba0c596b575fa0484d5b5855321468