A 16x16 pixel block is copied to the destination pointed to by the
target motion vector. Since the motion vector is relative to the
center of the buffer, the upper bound of the range is size/2-16.
Previously we never used negative motion vectors, but there is no
reason not to test that direction. Therefore, the possible range
is [-size/2, size/2-16]. Additionally, pad this range with
INTPEL_NEEDED_MARGIN.
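The range above can be sketched roughly as follows. This is only an
illustration: MvRange and ComputeMvRange are hypothetical names, the
margin is passed in as a parameter rather than taken from the real
INTPEL_NEEDED_MARGIN, and reading "pad" as moving both ends inward by
the margin is an assumption.

```cpp
#include <cassert>

// Hypothetical helper (not from the test code): inclusive range of
// motion-vector components, measured from the buffer center, that keep
// a 16-pixel block inside a buffer of `size` pixels.
struct MvRange {
  int min;
  int max;
};

MvRange ComputeMvRange(int size, int margin) {
  const int kBlockSize = 16;
  MvRange range;
  // Lowest offset from the center, moved inward by the margin (assumed).
  range.min = -size / 2 + margin;
  // Highest offset at which a 16-pixel block still fits, minus the margin.
  range.max = size / 2 - kBlockSize - margin;
  return range;
}
```

With size = 64 and no margin this gives [-32, 16], matching
[-size/2, size/2-16].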
The block size is chosen randomly; if the block size is 16,
LineFullSearch_c will read a block with 16 pixels from kiMaxPos;
thus kiMaxPos cannot be larger than height-16, otherwise the calls
end up with reads out of bounds.
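A minimal sketch of the bound described above, assuming the search
position is a row index and the block covers `blockSize` rows.
ClampMaxPos is a hypothetical helper used only for illustration; it is
not part of the test or of LineFullSearch_c.

```cpp
#include <algorithm>
#include <cassert>

// Hypothetical helper: limit the last search position so that a block of
// `blockSize` rows starting there still ends inside `height` rows.
int ClampMaxPos(int requestedMaxPos, int height, int blockSize) {
  return std::min(requestedMaxPos, height - blockSize);
}
```

For example, with height = 64 and a block size of 16, any requested
position above 48 is reduced to 48.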
This avoids the following error when doing "make OS=ios" if gtest
isn't installed:
make: *** No rule to make target `binaries', needed by `all'. Stop.
This fixes issue #752.
This only solves the problem temporarily, but merge it first so that
the tests build again.
We need to reconsider how the typedef headers are included.
This reduces the build time from 69 s to 30 s, reduces the size of
the built wels.lib from 30 MB to 3.9 MB, and reduces the number of
warnings when building wels.lib.
Instead of loading the registers one lane at a time, load full
registers and then transpose them.
This is faster, reducing the runtime for the function from about
506 cycles to 434 cycles (tested on a Cortex A8).
This also avoids an issue which seems like a cpu bug, present
on Sony Xperia T (cpu implementer 0x51 architecture 7 variant 0x1
part 0x04d). On such a device, it seemed like the "vswp q9, q10"
could start executing before the previous
vld4.u8 {d20[x],d21[x],d22[x],d23[x]}, [r3], r1
had finished and written back its result. Changing the
"vswp q9, q10" into "vswp q10, q9", or into separate
"vswp d18, d20; vswp d19, d21" (or the other way around) seemed to
avoid the issue. This happened occasionally (a couple times per
100000 invocations or so).
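The idea of loading whole rows and then transposing, instead of
filling one lane at a time, can be modelled in scalar C++ as below.
This is only a sketch of the data movement: it does not reproduce the
NEON code, its register allocation, or its timing, and the 8x8 size
and all names are assumptions made for illustration.

```cpp
#include <algorithm>
#include <array>
#include <cassert>
#include <cstdint>
#include <utility>

typedef std::array<std::array<uint8_t, 8>, 8> Block8x8;

// Lane-at-a-time model: every output element is fetched on its own
// with a strided access into the source.
Block8x8 LoadColumnsLaneByLane(const uint8_t* src, int stride) {
  Block8x8 out{};
  for (int col = 0; col < 8; ++col)
    for (int row = 0; row < 8; ++row)
      out[col][row] = src[row * stride + col];
  return out;
}

// Full-load model: read each row contiguously, then transpose in place.
Block8x8 LoadRowsThenTranspose(const uint8_t* src, int stride) {
  Block8x8 rows{};
  for (int row = 0; row < 8; ++row)
    std::copy(src + row * stride, src + row * stride + 8, rows[row].begin());
  for (int i = 0; i < 8; ++i)
    for (int j = i + 1; j < 8; ++j)
      std::swap(rows[i][j], rows[j][i]);
  return rows;
}
```

Both functions produce the same column-major result; only the access
pattern differs.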
Checking HAVE_NEON is not enough; e.g. Android devices with
armeabi-v7a are not required to have NEON, so every use of such
functions should also check WELS_CPU_NEON in the CPU features.
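The combined check might look like the sketch below. The flag value,
the Impl enum, and SelectImpl are hypothetical stand-ins: in the real
code the build-time condition is the HAVE_NEON macro and the runtime
condition is the WELS_CPU_NEON bit in the detected CPU features.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical flag value standing in for WELS_CPU_NEON.
const uint32_t kCpuNeonFlag = 0x000004;

enum Impl { kImplC, kImplNeon };

// Pick the NEON implementation only if it was compiled in (the role of
// HAVE_NEON) and the running CPU actually reports NEON support.
Impl SelectImpl(bool builtWithNeon, uint32_t cpuFeatures) {
  if (builtWithNeon && (cpuFeatures & kCpuNeonFlag) != 0)
    return kImplNeon;
  return kImplC;
}
```

A build with NEON enabled that runs on a device without NEON falls
back to the C implementation instead of executing unsupported
instructions.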