and FUN_CONV_2D macros. The predict lut now handles
this case. The encoder now calls vpx_scaled_2d() instead
of vpx_convolve8() for scaling.
Change-Id: Ia1c8af8a31e4cb4887a587143108cb45835f7df7
It in essence refactors the code for both the interpolation
filtering and the convolution. This change includes the moving
of all the files as well as the changing of the code from vp9_
prefix to vpx_ prefix accordingly, for underneath architectures:
(1) x86;
(2) arm/neon; and
(3) mips/msa.
The work on mips/drsp2 will be done in a separate change list.
Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
This test places 128 in positions that would not be found
in the VP9 filter tables. The ssse3 code packs this table
into chars and uses the pmaddubsw instruction, which treats
the value as signed. The ssse3 code checks for 128 in
position 3, skipping the ssse3 code if found, and calls
vp9_convolve8_c(). vp9_convolve8_c() is also used for scaling.
ChangeFilterWorks breaks the ssse3 scaling code found in other
commits.
Change-Id: I1f5a76834bc35180b9094c48f9421bdb19d3d1cb
Updated sources according to improved version of common MSA macros.
Enabled respective convolve MSA hooks and tests.
Overall, this is just upgrading the code with styling changes.
Change-Id: If5ad6ef8ea7ca47feed6d2fc9f34f0f0e8b6694d
Done little restructuring/styling changes to the sources like generic macro definitions, their use to reduce code lines, better code alignments etc.
Disabled all MSA hooks and tests
Change-Id: Ic6f2dce0b501f46b80c06c46c0fe2043d557b190
Assembly tests should clear system state, as we have no
expectation of proper system state in between test runs..
Change-Id: I0f591996c1f17ef2a5a8572a6b445f757223a144
Incorporates the WRAPLOW macro into the non-highbitdepth transforms
to aid hardware verification between a software C model and an
intended hardware implementation though the use of the configure
options: --enable-experimental --enable-emulate-hardware.
Note that to avoid further discrepancies between the sse/sse2
implementations of the transforms and the C implementation, when the
emulate hardware option is invoked, we also disable sse/sse2/etc.
Also incudes some minor cleanups/renaming etc.
Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287
Adds various high bitdepth transform functions and tests.
Much of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform cofficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map tp int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.
Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
If optimizations use more than one cpu feature, allow
specifying them so that '--disable-X' still works
https://code.google.com/p/webm/issues/detail?id=854
Change-Id: I3108ea37b397371a2be84dd5f2380b304db23f18
A bug in Microsoft compiler was found in the function
vp9_filter_block1d16_v8_avx2 and a workaround applied.
the bug occur when there was 4 consecutive maddubs + min + adds
intrinsic instructions.
Change-Id: I83499faeb70971e650e5663fd2490360ddb1a51b
used to wrap API functions to ensure full environment consistency as
opposed to the renamed ASM_REGISTER_STATE_CHECK which is used with
assembly functions.
currently checks the FPU tag word in x86/x86_64 gcc builds to ensure
emms has been called.
Change-Id: Ie241772dbf903d33d516a1add4c8c6783f2e1490
The intepolation filter functions can be better tested withe extreme
values, especially given the optimization functions are prone to
overflow signed 16 bit intermediate value when operation order is
wrong.
Change-Id: I712142b0bc1e5969c692c0486a57ffa37c9742b5
To ensure fast encoding/decoding on devices without ssse3 support,
SSE2 optimization of sub-pixel filters was done. Test using 1080p
clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps
with sse2 filters, and ~15fps with c filters.
Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
- Intermediate height was not correct i.e. when block size is 4 and
y_step_q4 is 6. In this case intermediate height was
(4*6) >> 4 = 1 and vertical interpolation needs two source pixels
plus 7 extra pixels for taps.
- Also if the current output block is 16x16 and we are using 4x upscaling
we need only 12 rows after horizontal filtering instead of 16.
Patch Set 2: Intermediate_height updated after CL 66723
"Fix bug in convolution functions (filter selection)"
Change-Id: I5a1a1bc2ac9d5edb3a6e0818de618bf318fdd589