The existing code applied a 6-tap filter with 0's on either end.
We're already paying the branch penalty to avoid computing the two
extra columns needed as input to this filter.
We might as well save time computing the filter as well.
This reduces the inner loop from 21 instructions to 16, the number
of loads per iteration from 4 to 1, and the number of multiplies
from 7 to 4.
The gain in overall decoding performance, however, is small (less
than 1%).
This change also means we now valgrind clean on ARMv6, which is
its real purpose.
The errors reported here were valgrind's fault (it does not detect
that 0 times an uninitialized value is initialized), but Julian
Seward says it would slow down valgrind considerably to make such
checks.
Speeding up libvpx rather, even by a small amount, seems a much
better idea if only to enable proper valgrind checking of the
rest of the codec.
Change-Id: Ifb376ea195e086b60f61daf1097d8910c4d8ff16
This function was accessing values below the stack pointer, which
can be corrupted by signal delivery at any time.
Change-Id: I92945b30817562eb0340f289e74c108da72aeaca
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.
Fixes issue #97.
Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
move some things around, reorder some instructions
constant 0 is used several times. load it once per call in horiz,
once per loop in vert.
separate saturating instructions to avoid stalls.
just use one usub8 call to set GE flags, rather than uqsub8 followed by
usub8 w/ 0
document some stalls for further consideration
Change-Id: Ic3877e0ddbe314bb8a17fd5db73501a7d64570ec
test cases were causing a crash because the count was being read
incorrectly. after fixing that, noticed that the output was not
matching. fixed that.
Change-Id: Idb0edb887736bd566a3cf6d4aa1a03ea8d20eb27
Jeff Muizelaar posted some changes to the idct/reconstruction c code.
This is the equivalent update for the arm assembly.
This shows a good boost on v6, and a minor boost on neon.
Here are some numbers for highway in qcif, 2641 frames:
HEAD neon: ~161 fps
new neon: ~162 fps
HEAD v6: ~102 fps
new v6: ~106 fps
The following functions have been updated for armv6 and neon:
vp8_dc_only_idct_add
vp8_dequant_idct_add
vp8_dequant_dc_idct_add
Conflicts:
vp8/decoder/arm/armv6/dequantdcidct_v6.asm
vp8/decoder/arm/armv6/dequantidct_v6.asm
Resolved by removing these files. When I rewrote the functions, I also
moved the files to dequant_dc_idct_v6.asm/dequant_idct_v6.asm
Change-Id: Ie3300df824d52474eca1a5134cf22d8b7809a5d4
When the license headers were updated, they accidentally contained
trailing whitespace, so unfortunately we have to touch all the files
again.
Change-Id: I236c05fade06589e417179c0444cb39b09e4200d