Includes tests which compare output of new SSSE3 functions with
their C equivalents, and fixes to the C code to ensure these tests
pass.
Change-Id: Iec3980cce95a8ee6bf9421fa4793130e92c162e3
Changes ensure wedge_partition, interintra and palette expts all
work alongside the ext_coding_unit_size experiment.
Change-Id: I18f17acb29071f6fc6784e815661c73cc21144d6
Extending the SSE2 implementation of vp9_subtract_block to work
with the 128x128 coding unit experiment
Change-Id: Ib3cc16bf5801ef2c7eecc19d3cc07a8c50631580
CONFIG_SR_MODE=1, enable SR mode
USE_POST_F=1, enable SR post filter
SR_USE_MULTI_F=1, enable SR post filter family
Not compatible with other experiments yet
Change-Id: I116f1d898cc2ff7dd114d7379664304907afe0ec
Framework for alternate transforms for inter 32x32 and larger based
on dwt-dct hybrid is implemented.
Further experiments are to be condcuted with different
variations of hybrid dct/dwt or plain dwt, as well as super-resolution
mode.
Change-Id: I9a2bf49ba317e7668002cf1499211d7da6fa14ad
The wavelets implemented are 2/6, 5/3 and 9/7 each with
a lifting based scheme for even block sizes. The 9/7
one is a double implementation currently.
This is to start experiments with:
1. Replacing large transforms (32x32 and 64x64) with wavelets
or wavelet-dct hybrids that can hopefully localize errors better
spatially. (Will also need alternate entropy coder)
2. Super-resolution modes where the higher sub-bands may be
selectively skipped from being conveyed, while a smart
reconstruction recovers the lost frequencies.
The current patch includes two types of 32x32 and 64x64
transforms: one where only wavelets are used, and another
where a single level wavelet decomposition is followed
by a lower resolution dct on the low-low band.
Change-Id: I2d6755c4e6c8ec9386a04633dacbe0de3b0043ec
This is a combination of:
4a19fa6 Added sse2 acceleration for highbitdepth variance
c6f5d3b Fix high bit depth assembly function bugs
Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f
This framework allows lower quantization bins to be shrunk down or
expanded to match closer the source distribution (assuming a generalized
gaussian-like central peaky model for the coefficients) in an
entropy-constrained sense. Specifically, the width of the bins 0-4 are
modified as a factor of the nominal quantization step size and from 5
onwards all bins become the same as the nominal quantization step size.
Further, different bin width profiles as well as reconstruction values
can be used based on the coefficient band as well as the quantization step
size divided into 5 ranges.
A small gain currently on derflr of about 0.16% is observed with the
same paraemters for all q values.
Optimizing the parameters based on qstep value is left as a TODO for now.
Results on derflr with all expts on is +6.08% (up from 5.88%).
Experiments are in progress to tune the parameters for different
coefficient bands and quantization step ranges.
Change-Id: I88429d8cb0777021bfbb689ef69b764eafb3a1de
Preliminary 64x64 transform implementation.
Includes all code changes.
All mismatches resolved.
Coding results for derf and stdhd are within noise. stdhd is slightly
higher, derf is slightly lower.
To be further refined.
Change-Id: I091c183f62b156d23ed6f648202eb96c82e69b4b
All sad function that process above 32 consecutive elements are optimized
for AVX2:
vp9_sad64x64
vp9_sad64x32
vp9_sad32x64
vp9_sad32x32
vp9_sad32x16
vp9_sad64x64_avg
vp9_sad64x32_avg
vp9_sad32x64_avg
vp9_sad32x32_avg
vp9_sad32x16_avg
The functions that appeared as a hotspot is vp9_sad32x32 and vp9_sad64x64
vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90%
both of them gave and overall ~2.3% user level gain
Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd
Cherry-picked from https://gerrit.chromium.org/gerrit/#/c/71914/
(a92f987a6b7819ae5c62a429e126e1c26bdb1b71) on highbitdepth branch.
Change-Id: I6903e4e4cb57d90590725c8a1c64c23da7ae65e8
The concept:
There's too much noise in source pixels for variance and at low bitrate
the reconstructed looks nothing like the source so we have problems
getting good partitionings with either. This skirts the issue by using
a box blur scaled down version for variance calculations. To compare
against source_var_ moved keyframe to be rd based like source_var.
Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624
This SSE2 is based on VP8 denoiser's SSE2 code. In VP8, there are
only 16x16 blocks in denoiser, while in VP9, there are 13 different
block sizes.
By adding this SSE2 code, the improvement of encoder speed is around
20%(using C code vs using SSE2 code), vary for different clips.
The unit test for VP9 denoiser is to confirm that the SSE2 code is
bit-exact with the C code. The unit test covers all block size.
Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d
Incorporates the WRAPLOW macro into the non-highbitdepth transforms
to aid hardware verification between a software C model and an
intended hardware implementation though the use of the configure
options: --enable-experimental --enable-emulate-hardware.
Note that to avoid further discrepancies between the sse/sse2
implementations of the transforms and the C implementation, when the
emulate hardware option is invoked, we also disable sse/sse2/etc.
Also incudes some minor cleanups/renaming etc.
Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287
Miscellaneous bug-fixes for high bitdepth functionality.
With this patch, high bit-depth profiles become mostly functional,
except for an intermittent assert failure issue that is being
tracked.
Change-Id: I6a7fcbdcf1e5b09842e88535f8442d2e1230748c
Moves transform type defines to vp9_common.h from vp9_idct.h
so that they can be included in vp9_rtcd_defs.pl safely.
Change-Id: Id5106227bee5934f7ce8b06f2eb9fa8a9a2e0ddb
This reverts commit eafc8c9c40d712aabe234bed5269a02c62fa0bfc.
tran_low_t/tran_high_t don't belong in a public header, they're private.
Similarly the public headers shouldn't rely on config defines,
vpx_config.h isn't installed.
Change-Id: I194ec273598da418df8dd727b6c0e78a556740ad
This commit fixes a compiling error in vp9_idct.h, where the codec
checks that the intermediate steps of transformation fit within
16-bit length. The issue was due to broken file dependency.
Change-Id: Ib22bba13a1e6df28489cb23d6774c561969f1fdc
This commit adds back sse2 or ssse3 optimized versio of a couple of
functions, fixes a ~10% performance regression.
Change-Id: I049786906e5a641224dced63c6492aec9d86d183
Adds various high bitdepth transform functions and tests.
Much of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform cofficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map tp int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.
Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
If optimizations use more than one cpu feature, allow
specifying them so that '--disable-X' still works
https://code.google.com/p/webm/issues/detail?id=854
Change-Id: I3108ea37b397371a2be84dd5f2380b304db23f18
Removed functions:
* vp9_post_proc_down_and_across_mmx
* vp9_mbpost_proc_down_mmx
* vp9_plane_add_noise_mmx
They all have sse2 equivalent.
Change-Id: I59c1fac12b7c96ca4538d455e4400c2b7875feff
vp9_variance_sse2.c contains a mix of intrinsics and references to
assembly which uses x86inc.asm; it's conditionally included as a result.
Change-Id: I254451483a65881c0b8e18e27bf0c3ddef60c4ec