This commit replaces the vp8_ prefixed subtract function with the
common vpx_subtract_block function. It removes redundant SIMD
optimization codes and unit tests.
Change-Id: I42e086c32c93c6125e452dcaa6ed04337fe028d9
By using 0xff for a short it was not setting the high bits. When
comparing the output with vtst to find non-zero elements it was skipping
vaules which had no low bits set such as -512 / 0xFE00.
Using -8191 as the first element of coeff will generate this condition.
BUG=883
Change-Id: Ia1e10fb809d1e7866f28c56769fe703e6231a657
Use intrinsics for neon quantization. Slight loss (<5%) of performance
compared to the assembly. Roughly 10x faster on arm64 because that was
running C code before.
Change-Id: I7cf5242d8f29b7eab5bca6a1c20c89c9fc9ca66d
The version of gcc4.6 included with the Android NDK through r10b
fails to compile this function. Replace it with C code.
BUG=860
Change-Id: Ifcc0476664071aec46a171cdd5ad17305930986a
Now match the "C" version of "Fix to reduce block
artifacts from vp8 temporal denoiser."
(see change id Id9b56e59e33f3c22e79d2f89f763bdde246fdf3f)
Change-Id: I99e569bb6af4ae3532621127e12bf917a48ba08e
Used horizonal add instructions instead of adding
byte lanes. The encoder performance improved by
~4% for the test clip used.
Change-Id: Iaddd10403fcffb5b3f53b1f591ab2fe0ff002c08
The recent compiler can generate optimized code that uses NEON registers
for various operations besides floating-point operations. Therefore,
only saving callee-saved registers d8 - d15 at the beginning of the
encoder/decoder is not enough anymore. This patch added register saving
code in VP8 NEON functions that use those registers.
Change-Id: Ie9e44f5188cf410990c8aaaac68faceee9dffd31
Creates a merge between the master and experimental branches. Fixes a
number of conflicts in the build system to allow *either* VP8 or VP9
to be built. Specifically either:
$ configure --disable-vp9 $ configure --disable-vp8
--disable-unit-tests
VP9 still exports its symbols and files as VP8, so that will be
resolved in the next commit.
Unit tests are broken in VP9, but this isn't a new issue. They are
fixed upstream on origin/experimental as of this writing, but rebasing
this merge proved difficult, so will tackle that in a second merge
commit.
Change-Id: I2b7d852c18efd58d1ebc621b8041fe0260442c21
For non-static functions, change the prefix to vp9_. For static functions,
remove the prefix. Also fix some comments, remove unused code or unused
function prototypes.
Change-Id: I1f8be05362f66060fe421c3d4c9a906fdf835de5
Besides imposing a performance penalty at startup in most
configurations, these relocations break the dynamic linker for
native Fennec, since it does not support them at all.
Change-Id: Id5dc768609354ebb4379966eb61a7313e6fd18de
In the variance calculations the difference is summed and later squared.
When the sum exceeds sqrt(2^31) the value is treated as a negative when
it is shifted which gives incorrect results.
To fix this we cast the result of the multiplication as unsigned.
The alternative fix is to shift sum down by 4 before multiplying.
However that will reduce precision.
For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and
change).
PPC change is untested.
Change-Id: I1bad27ea0720067def6d71a6da5f789508cec265
Patch set 2: 64 bit build fix
Patch set 3: 64 bit crash fix
[Tero]
Patch set 4: Updated ARMv6 and NEON assembly.
Added also minor NEON optimizations to subtract
functions.
Patch set 5: x86 stride bug fix
Change-Id: I1fcca93e90c89b89ddc204e1c18f208682675c15
Changed 'int eob' to 'char *eob' in BLOCKD so that both encoder and
decoder will use eobs[25] array from MACROBLOCKD structure. In future,
this will enable use of the decoder side IDCT in the encoder.
Change-Id: I6e1c011628cb8864fd4a0b80f0279ce16a5ca978
The partial frame copy function used to copy an extra 8 lines above
and below. The partial frame filtering can only modify 3 pixel rows
above the partial frame. Reduce copy to bare minimum needed, which is
4 lines, so that partial filtering on copied frame is possible.
Define the "magic" fraction number for partial filtering in
loopfilter.h .
Change-Id: I4791ffc541b6884b12759a0d0714a8faf16147ec
- Removed fast_fdct4x4_neon and fast_fdct8x4_neon
- Uses now short_fdct4x4 and short_fdct8x4
- Gives ~1-2% speed-up on Cortex-A8/A9
Change-Id: Ib62f2cb2080ae719f8fa1d518a3a5e71278a41ec
Modified original patch If2f07220885c4c3a0cae0dace34ea0e36124f001
according to comments. Scheduled code a little bit to prevent some
interlocks.
Change-Id: I338f02b881098782f82af63d97f042b85e63e902
vp8_fast_quantize_b_pair_neon function added to quantize
two adjacent blocks at the same time to improve performance.
- Additional 3-6% speedup compared to neon optimized fast
quantizer (Tanya VGA@30fps, 1Mbps stream, cpu-used=-5..-16)
Change-Id: I3fcbf141e5d05e9118c38ca37310458afbabaa4e
vp8_fast_quantize_b_neon function updated and further optimized.
- match current C implementation of fast quantizer
- updated to use asm_enc_offsets for structure members
- updated ads2gas scripts to handle alignment issues
Change-Id: I5cbad9c460ad8ddb35d2970a8684cc620711c56d
Address calculations moved from encodemb_arm.c file to neon
optimized assembly function to save cycles in function calls.
- vp8_subtract_b_neon_func replaced with vp8_subtract_b_neon
that contains all needed address calculations
- unnecessary file encodemb_arm.c removed
- consistent with ARMv6 optimized version
Change-Id: I6cbc1a2670b56c2077f59995fcf8f70786b4990b
Adds following targets to configure script to support RVCT compilation
without operating system support (for Profiler or bare metal images).
- armv5te-none-rvct
- armv6-none-rvct
- armv7-none-rvct
To strip OS specific parts from the code "os_support"-config was added
to script and CONFIG_OS_SUPPORT flag is used in the code to exclude OS
specific parts such as OS specific includes and function calls for
timers and threads etc. This was done to enable RVCT compilation for
profiling purposes or running the image on bare metal target with
Lauterbach.
Removed separate AREA directives for READONLY data in armv6 and neon
assembly files to fix the RVCT compilation. Otherwise
"ldr <reg>, =label" syntax would have been needed to prevent linker
errors. This syntax is not supported by older gnu assemblers.
Change-Id: I14f4c68529e8c27397502fbc3010a54e505ddb43
In vp8_find_best_sub_pixel_step_iteratively(), many times xoffset
and yoffset are specific values - (4,0) (0,4) and (4,4). Modified
code to call simplified NEON version at these specific offsets to
help with the performance.
Change-Id: Iaf896a0f7aae4697bd36a49e182525dd1ef1ab4d