Change legacy vp8/9_write_yuv_frame to vpx_write_yuv_files.
Delete some flags that can be enabled during build.
To enable writing denoised YUV, use the following command line:
CFLAGS='-DOUTPUT_YUV_DENOISED' ./configure
--enable-vp9-temporal-denoising
For skinmap, use CFLAGS='-DOUTPUT_YUV_SKINMAP'
Change-Id: I236974ac8b3cf279d20c4dc7f6162d8b480b6528
The result of the xor operation is unsigned. If coeff was negative,
this results in an unsigned value - INT_MIN.
Change-Id: I1f1edeaa6de1f4c68b848e8a82a666d390b749f0
Created inline functions highbd_butterfly_cospi16_sse2()
and highbd_butterfly_cospi16_sse4_1()
BUG=webm:1412
Change-Id: Icbc53a73712b6207379872a5e88d0a4d09e2322a
With skip block the neon is about twice as fast as C.
The neon has no shortcut for coeff < zbin so it always takes the
same amount of time. Even if the C can take the shortcut, it is over
twice as fast in neon. If it can't, that gap increases to over 10x.
BUG=webm:1426
Change-Id: I400722146c1b5a5f6289f67d85fd642463d2bfc6
Fairly minor differences from sse2. pabsw and psignw are the big gains.
Also re-uses some values in eob calculation to avoid an extra pcmp.
Fixes test failures in HBD and OS X builds.
Allows using it in 32bit builds, where it is about 40% faster than sse2.
Substantially faster than the assembly for skip_block. 10-20% faster the
rest of the time.
Change-Id: If783bb3567e561e47667e10133b9c84414a334e2
Prepare for high bitdepth 16x16 idct sse4.1 code.
Just functions moving and renaming.
BUG=webm:1412
Change-Id: Ie056fe4494b1f299491968beadcef990e2ab714a
vpx_sub_pixel_variance32xh_avx2() and
vpx_sub_pixel_avg_variance32xh_avx2
see:
17fae3a Change to use correct check for halfpel
Change-Id: Ib0741c5c2fd011e9650ca62b76009f1b59fdbe4c
This replaces commit aa1c4cd, which has a bug and was reverted in
commit 3c73e58.
The bug is caused by rounding -step1[5] in highbd_idct8x8_12_half1d().
Change-Id: I37b3a5f0d91815f2dc570209091dc6626fd178a8
With skip block or coeff < zbin it is about twice as fast as C.
If most coeff values are > zbin it is about 10-15x as fast as C.
BUG=webm:1426
Change-Id: I5d3c007b014a372d5ef0882b39bb48983b4131c7
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.
Change-Id: I595f0ff7904ef025e07bb80234293d958dc9f254
This reverts commit aa1c4cd140.
This fails the following tests with extreme input coefficients:
SSE2/InvTrans8x8DCT.CompareReference/0
SSE2/InvTrans8x8DCT.CompareReference/2
previously the optimized path was skipped in this range
Change-Id: I9af015a46eba96208834a219fafd651d37556a80
This reverts commit 03f5e300d6.
This causes test failures under OSX:
SSSE3/VP9QuantizeTest.EOBCheck/0
SSSE3/VP9QuantizeTest.OperationCheck/0
Change-Id: I122732717ead1f7af5b04c529a6948e382e5e59b
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
previously:
b0f1ae147 vpx_get16x16var_avx2: correct cast order
Change-Id: I632ee5e418f3f9b30e79ecd05588eb172b0783aa
allow the right shift to operate on 64-bits, this matches the rest of
the implementations
missed in:
6acd061aa variance_avx2: sync variance functions with c-code
Change-Id: Icae436b881251ccb9f9ed64fcbf8d358c58a4617
Keep optimized code out of the reference implementation. This matches
the style of the other sub calls.
Change-Id: I3da6acd4f2c647b029c420e22ac9410a18259689
This code is unused in vp9. Only vp8 still contains references to
vpx_sad_NxMx[3|8] and only for sizes 16x16, 16x8, 8x16, 8x8 and 4x4.
Remove the remaining sizes and all the highbitdepth versions.
BUG=webm:1425
Change-Id: If6a253977c8e0c04599e25cbeb45f71a94f563e8
* changes:
sad neon: avg for 64x[32,64]
sad neon: macroize 64xN definitions
sad neon: avg for 32x[16,32,64]
sad neon: macroize 32xN definitions
sad neon: avg for 16x[8,16,32]
sad neon: macroize 16xN definitions
left shifting a negative value is undefined; quiets a ubsan warning.
this is applied to a constant, no change in the generated code.
Change-Id: Ia17a7672d4832463decbc4afd6cd42974d02698e
Finish the calulations in neon registers. This avoids a potentially
expensive move from neon to gp and allows at least clang to store
directly to memory.
BUG=webm:1424
Change-Id: Idef25eec95f7610947167818e9194bde8b00d282
this makes the function compatible with high-bitdepth and fixes test
failures since:
5ac88162b partial fdct test
Change-Id: Ib630694608237f0c515948942e05dbea259ba338
Always return an int32_t. Since it needs to be moved to a register for
shifting, this doesn't really penalize the smaller transforms.
The values could potentially be summed and shifted in place.
BUG=webm:1424
Change-Id: Id5beb35d79c7574ebd99285fc4182788cf2bb972
For the 8x8_1, the highbd output fit nicely in the existing function. 12
bit input will overflow this implementation of 16x16_1.
BUG=webm:1424
Change-Id: I2945fe5478b18f996f1a5de80110fa30f3f4e7ec
The function was originally written with HBD in mind. Enable it and
configure the tests.
BUG=webm:1424
Change-Id: I78a2eba8d4d9d59db98a344ba0840d4a60ebe9a1
* changes:
sad neon: rewrite 64x64 and add 64x32
sad neon: rewrite 32x32, add 32x16 and 32x64
sad neon: rewrite 16x8, 16x16, add 16x32
sad neon: rewrite 8x8 and 8x16
sad neon: rewrite 4x4 and add 4x8
Test the _1 variant of the fdct, which simply sums the block and applies
a modifying shift based on the block size.
BUG=webm:1424
Change-Id: Ic80d6008abba0c596b575fa0484d5b5855321468