Uses a single 1D table to implement the weighting of the predictors
for the compound inter-intra experiment.
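
One plausible shape for such a 1D weighting table, as a rough sketch only: the table values, the 8-bit weight precision, and the names kWeights1d and blend_inter_intra are all assumptions made for illustration and are not taken from the commit.

```c
#include <assert.h>
#include <stdint.h>

#define WEIGHT_BITS 8                         /* hypothetical weight precision */
#define WEIGHT_MAX (1 << WEIGHT_BITS)
#define WEIGHT_ROUND (1 << (WEIGHT_BITS - 1))
#define WEIGHT_TABLE_SIZE 8

/* Hypothetical 1D table: the intra weight decays with distance from the
 * edge the intra predictor was built from. */
static const int kWeights1d[WEIGHT_TABLE_SIZE] = {
  256, 208, 168, 136, 112, 96, 80, 72
};

/* Blend one inter and one intra predictor sample; dist indexes the table. */
uint8_t blend_inter_intra(uint8_t inter, uint8_t intra, int dist) {
  assert(dist >= 0 && dist < WEIGHT_TABLE_SIZE);
  const int w = kWeights1d[dist];
  return (uint8_t)((intra * w + inter * (WEIGHT_MAX - w) + WEIGHT_ROUND) >>
                   WEIGHT_BITS);
}
```

With a single table, a 2D weighting can be obtained by choosing which coordinate (row, column, or a combination) is used as dist, rather than storing one table per mode.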
Change-Id: I204ffbe4f9fc79d5d43b6c724ad253d800461012
These variables have the type int64_t, not long long; long long could
be wider than 64 bits. Emulate INT64_MAX for older versions of
MSVC, and remove the unreferenced vpx_ports/vpxtypes.h.
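
A minimal sketch of what emulating INT64_MAX could look like. The guard and the literal are an assumption about the approach, not the exact lines of the commit, and best_cost_init is a hypothetical helper. On a toolchain that ships <stdint.h> the guard is a no-op.

```c
#include <stdint.h>

/* Assumption: some older MSVC toolchains do not provide INT64_MAX,
 * so define it only when it is missing. */
#ifndef INT64_MAX
#define INT64_MAX 9223372036854775807LL
#endif

/* Hypothetical helper: a "worst possible cost" typed as int64_t
 * rather than long long. */
int64_t best_cost_init(void) {
  return INT64_MAX;
}
```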
Change-Id: Ideaca71838fcd3849d816d5ab17aa347c97d03b0
This experiment yields little gain and adds considerable code
complexity (and it hinders other experiments), so let's get rid of
it.
Change-Id: Id25e79a137a1b8a01138aa27a1fa0ba4a2df274a
In commit 9a1d73d, loop filtering was added for UV 4x4 boundaries
when TX_8X8 is used by a MB. This commit further refines the decision
so that it is based on the actual transform used for the UV planes. When
the UV planes use the 4x4 transform, i.e. when the prediction mode is
either I8X8_PRED or SPLITMV, they are filtered on 4x4 boundaries; when
the UV planes use the 8x8 transform, no filtering is applied on 4x4
block boundaries.
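
The rule described here can be sketched as a small predicate. The enum below is a hypothetical subset of prediction modes and the function name is made up for illustration; only I8X8_PRED and SPLITMV come from the text.

```c
/* Hypothetical subset of prediction modes; only the two modes named in
 * the commit message matter for this decision. */
typedef enum { DC_PRED, I8X8_PRED, NEWMV, SPLITMV } PredMode;

/* Returns 1 when the UV planes use the 4x4 transform and should therefore
 * be loop filtered on 4x4 boundaries, 0 when they use the 8x8 transform. */
int uv_filter_4x4_boundaries(PredMode mode) {
  return mode == I8X8_PRED || mode == SPLITMV;
}
```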
Change-Id: Ibb404face0a1d129b4b4abaf67c55d82e8df8bec
Fixes some scaling issues. Adds an option to only compute the
dct on the low-low subband for 32x32 and 64x64 blocks using
only a single 16x16 dct after 1 and 2 wavelet decomposition
levels respectively. Also adds an option to use an 8x8 dct
as building block.
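
As a quick check of the sizes involved, assuming a dyadic decomposition that halves each dimension per level (the helper name is hypothetical):

```c
#include <assert.h>

/* Side length of the low-low subband after `levels` dyadic wavelet
 * decompositions of a square block. */
int low_low_size(int block_size, int levels) {
  assert(block_size > 0 && levels >= 0);
  return block_size >> levels;
}
```

Both low_low_size(32, 1) and low_low_size(64, 2) give 16, which is why a single 16x16 dct covers the low-low subband in either case.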
Currently, with the 2/6 filter and with a single 16x16 dct on
the low-low band, the results compared to the full 32x32 dct are
as follows:
derf: -0.15%
yt: -0.29%
std-hd: -0.18%
hd: -0.6%
These are my current recommended settings, since the 2/6 filter
is very simple.
Results with 8x8 dct are about 0.3% worse.
Change-Id: I00100cdc96e32deced591985785ef0d06f325e44
and vp9_mb_lpf_vertical_edge_w_sse2. This was quickly done so we can
run some tests over the weekend. Future commits will optimize/refactor these
functions further.
The decoder performance improved by ~17% for the clip used.
Change-Id: I612687cd5a7670ee840a0cbc3c68dc2b84d4af76
Updated the rtcd_defs and used the sse2 uv version
of the loopfilter. The performance improved by ~8%
for the test clip used.
Change-Id: I5a0bca3b6674198d40ca4a77b8cc722ddde79c36
This commit no longer uses the wider lpf within a superblock when
the 32x32 transform is used for the block.
It also switches to the shorter version of loop filtering
for the UV planes.
Change-Id: I344c1fb9a3be9d1200782a788bcb0b001fedcff8
This patch removes the old pred-filter experiment and replaces it
with one that is implemented using the switchable filter framework.
If the pred-filter experiment is enabled, three interpolation
filters are tested during mode selection: the standard 8-tap
interpolation filter, a sharp 8-tap filter and a (new) 8-tap
smoothing filter.
The 6-tap filter code has been preserved for now and if the
enable-6tap experiment is enabled (in addition to the pred-filter
experiment) the original 6-tap filter replaces the new 8-tap smooth
filter in the switchable mode.
The new experiment applies the prediction filter in cases of a
fractional-pel motion vector. Future patches will apply the filter
where the mv is pel-aligned and also to intra predicted blocks.
Change-Id: I08e8cba978f2bbf3019f8413f376b8e2cd85eba4
This is to add to the 64x64 transform experiment as an alternative to
a 64x64 DCT.
Two levels of wavelet decomposition are used on a 64x64 block, followed
by a 16x16 DCT on each of the four lowest subbands. The highest three
subbands are left untransformed after the first level DWT.
Change-Id: I3d48d5800468d655191933894df6b46e15adca56
This commit makes a couple of minor cleanups/refactorings to prepare for
further loop filter experiments. It merges the y_only version of the loop
filter function into the regular one, which ensures that the same logic is
used both for picking the level and for the actual loop filtering.
Change-Id: Id10c94dccd45f58e5310bacfdf6ee63cbb60b86f
This experimental change reorders the search so
that all possible references that match the target
reference frame are tested first, in order
of distance from the current block. These will usually
be the highest scoring candidates.
If we do not find enough good candidates this way
we try non-matching cases. These will usually be lower
scoring candidates.
The change in order together with breakouts when
we have found enough candidates should reduce
the computational cost and especially reduce the number
of sort operations.
Quality Results:
Std Hd +0.228%, Hd +0.074%, YT +0.046%, derf +0.137%
This effect is probably due to the fact that more distant
weak candidates are now less likely to get "promoted" over
near candidates even if they are repeated.
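
A simplified sketch of the two-pass search with breakouts described above. The Candidate struct and collect_candidates are hypothetical, the input is assumed to be already ordered by distance from the current block, and scoring and sorting are left out.

```c
#include <stddef.h>

/* Hypothetical candidate record. */
typedef struct {
  int ref_frame;
  int mv_row;
  int mv_col;
} Candidate;

/* Collects up to max_out candidates from `neighbors`, which is assumed to
 * be ordered nearest first. Pass 0 takes candidates whose reference frame
 * matches target_ref; pass 1 takes the non-matching ones. Both loops break
 * out as soon as enough candidates have been found. */
size_t collect_candidates(const Candidate *neighbors, size_t n,
                          int target_ref, Candidate *out, size_t max_out) {
  size_t found = 0;
  for (int pass = 0; pass < 2 && found < max_out; ++pass) {
    for (size_t i = 0; i < n && found < max_out; ++i) {
      const int matches = (neighbors[i].ref_frame == target_ref);
      if (matches == (pass == 0)) out[found++] = neighbors[i];
    }
  }
  return found;
}
```

Because matching candidates fill the list first, the non-matching pass is skipped entirely whenever the first pass already finds enough.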
Change-Id: Iec37e77d88a48ad0ee1f315b14327a95d63f81f6
The 2-D inverse transform X = M1*Z*Transposed_M2 was calculated
in 2 steps from left to right:
1. Vertical transform: Y = M1*Z
2. Horizontal transform: X = Y*Transposed_M2
In SIMD, a transpose is needed in the vertical transform.
This change switches the calculation order to go from right to left.
In this way, we can eliminate that transpose by writing the
intermediate results out to their transposed positions.
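
The reordering relies on matrix multiplication being associative. The scalar C sketch below compares the two orders. It uses hypothetical function names, a fixed 4x4 size and row-major flat arrays, and it is not the SIMD code from the commit. The right-to-left version stores the horizontal-pass result at transposed positions, so the second pass reads it row by row.

```c
#define N 4

/* Left to right: Y = M1*Z (vertical), then X = Y*M2^T (horizontal). */
void inv2d_left_to_right(const int *m1, const int *m2, const int *z, int *x) {
  int y[N * N];
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j) {
      int sum = 0;
      for (int k = 0; k < N; ++k) sum += m1[i * N + k] * z[k * N + j];
      y[i * N + j] = sum;
    }
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j) {
      int sum = 0;
      for (int k = 0; k < N; ++k) sum += y[i * N + k] * m2[j * N + k];
      x[i * N + j] = sum;
    }
}

/* Right to left: W = Z*M2^T, written to transposed positions,
 * then X = M1*W. */
void inv2d_right_to_left(const int *m1, const int *m2, const int *z, int *x) {
  int wt[N * N]; /* wt[j * N + i] holds W[i][j] */
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j) {
      int sum = 0;
      for (int k = 0; k < N; ++k) sum += z[i * N + k] * m2[j * N + k];
      wt[j * N + i] = sum;
    }
  for (int i = 0; i < N; ++i)
    for (int j = 0; j < N; ++j) {
      int sum = 0;
      /* W[k][j] is stored at wt[j * N + k], so this walks a row of wt. */
      for (int k = 0; k < N; ++k) sum += m1[i * N + k] * wt[j * N + k];
      x[i * N + j] = sum;
    }
}
```

Both functions produce the same X for integer inputs that do not overflow; only the order of the passes and the layout of the intermediate differ.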
Change-Id: I34dfe5eb01292f6e363712420d99475e2e81e12c
Various fixups to resolve issues when building vp9-preview under the more stringent
checks placed on the experimental branch.
Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07
Adds an experiment to derive the previous context of a coefficient
not just from the previous coefficient in the scan order but from a
combination of several neighboring coefficients previously encountered
in scan order. A precomputed table of neighbors for each location
for each scan type and block size is used. Currently 5 neighbors are
used.
Results are about 0.2% positive using a strategy where the max coef
magnitude from the 5 neighbors is used to derive the context.
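
A rough sketch of the max-magnitude strategy, assuming the precomputed table supplies five indices of previously coded coefficients for the current position. The function name and table layout are hypothetical, and the mapping from magnitude to an actual context is not specified in the message, so it is left out.

```c
#include <stdint.h>
#include <stdlib.h>

#define MAX_NEIGHBORS 5

/* Returns the largest absolute value among the neighbors of the current
 * position. `neighbor_idx` points at that position's row of the
 * (hypothetical) precomputed neighbor table. */
int max_neighbor_magnitude(const int16_t *coefs, const int *neighbor_idx) {
  int max_mag = 0;
  for (int n = 0; n < MAX_NEIGHBORS; ++n) {
    const int mag = abs(coefs[neighbor_idx[n]]);
    if (mag > max_mag) max_mag = mag;
  }
  return max_mag;
}
```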
Change-Id: Ie708b54d8e1898af742846ce2d1e2b0d89fd4ad5
For coefficients, use int16_t (instead of short); for pixel values in
16-bit intermediates, use uint16_t (instead of unsigned short); for all
others, use uint8_t (instead of unsigned char).
Change-Id: I3619cd9abf106c3742eccc2e2f5e89a62774f7da