Reallocate the xmm register usage so that no ARCH_X86_64 required.
Reduce memory access to the left neighbor by half.
Speed up by single digit on big core machine.
Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990
Relocate the function from SSSE3 to SSE2, Unroll loop from 16 to 8,
and reduce mem access to left.
Speed up by single digit in ./test_intra_pred_speed on big core
machines.
Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960
4x4 Intra predictor implemented with MMX is replaced with SSE2.
Segfault in change 315561 when decoding vp8 is taken care of.
Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76
use CONFIG_VP[89] to protect white-box tests and drop redundant
uses of CONFIG_VP9 in variable assignments within that block
Change-Id: Id3c6cf5c7822aa161b19768b295f58829a1c6447
Relocate the function from SSSE3 to SSE2, Unroll loop from 8 to 4,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
Relocate the function from SSSE3 to SSE2, Unroll loop from 4 to 2,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
Always round sum error and sum square error toward zero in variance
calculations. This prevents variance from becoming negative.
Avoiding rounding variance at all might be better but would be far
more invasive.
Change-Id: Icf24e0e75ff94952fc026ba6a4d26adf8d373f1c
the final sum may use up to 26 bits
+ add a unit test
+ disable the sse2 as the result will rollover; this will be fixed in a
future commit
Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce
tm_predictor_4x4 is implemented with SSE2 using XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0
For 1 pass CBR mode: increase waiting time after key frame
before we start sampling rate control behavior for determining
resize. This change need to disable one internal resize(DownUp)
temporally since it requires a longer clip to do so.
Change-Id: If21beda1be23f169ee541ab4dd642f718347887a
this avoids redefining vpx_codec_vp9_dx, vpx_codec_vp9_dx_algo in
vp9_encoder_parms_get_to_decoder.cc
Change-Id: I3b89e7a62497227ee32419f1a7d30e4c10a13c05
A new version of vp9_highbd_error_8bit is now available which is
optimized with AVX assembly. AVX itself does not buy us too much, but
the non-destructive 3 operand format encoding of the 128bit SSEn integer
instructions helps to eliminate move instructions. The Sandy Bridge
micro-architecture cannot eliminate move instructions in the processor
front end, so AVX will help on these machines.
Further 2 optimizations are applied:
1. The common case of computing block error on 4x4 blocks is optimized
as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of
the loop, the code detects if overflow might have happened and if so,
the whole computation is re-executed using higher precision arithmetic.
This case however is extremely rare in real use, so we can achieve a
large net gain here.
The optimizations rely on the fact that the coefficients are in the
range [-(2^15-1), 2^15-1], and that the quantized coefficients always
have the same sign as the input coefficients (in the worst case they are
0). These are the same assumptions that the old SSE2 assembly code for
the non high bitdepth configuration relied on. The unit tests have been
updated to take this constraint into consideration when generating test
input data.
Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
If high bit depth configuration is enabled, but encoding in profile 0,
the code now falls back on optimized SSE2 assembler to compute the
block errors, similar to when high bit depth is not enabled.
Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
The serial decode check is too strict for tile-threaded decoding as
there is no guarantee on the decode order nor which specific error
will take precedence. Currently a tile-level error is not forwarded so
the frame will simply be marked corrupt.
Change-Id: I51cf1e39e44bedeac93746154b36a4ccb2f059b1
define NOMINMAX to allow the std:: versions to be used; min/max will be
defined transitively via windows.h otherwise
Change-Id: I692b03fa3e70b7a53962d3fd209498f70f712fed
In the decoder, map this to the output variable vpx_image_t.r_w/h.
This is intended as an improved version of VP9D_GET_DISPLAY_SIZE,
which doesn't work with parallel frame decoding. In the encoder,
map this to a codec control func (VP9E_SET_RENDER_SIZE) that takes
a w/h pair argument in a int[2] (identical to VP9D_GET_DISPLAY_SIZE).
Also add render_size to the encoder_param_get_to_decoder unit test.
See issue 1030.
Change-Id: I12124c13602d832bf4c44090db08c1009c94c7e8
these have been supported in tile-threaded decoding since:
b3b7645 vp9_dthread: remove frame_parallel_decoding_mode requirement
Change-Id: Ia5a752db9be937153cf4830d9258752136356d1b
remove static from fdct4/8/16/32 in vp10/encoder/dct.c
add prefix vp10_ to fdct4/8/16/32
add vp10/encoder/dct.h
Change-Id: I644827a191c1a7761850ec0b1da705638b618c66
Reallocation of mi buffer fails if change size on the first frame and
change config in subsequent frames. Add a condition for resolution
check to avoid assertion failure.
BUG=1074
Change-Id: Ie26ed816a57fa871ba27a72db9805baaaeaba9f3
the range check in dct.c (abs(input[i]) < (1 << bit)) will fail in many
cases. this was broken at the time this check was added
BUG=1076
Change-Id: I3df8c7a555e95567d73ac16acda997096ab8d6e2
the range check in dct.c (abs(input[i]) < (1 << bit)) will fail in the
25-29 range. this was broken at the time this check was added
Change-Id: I8ca9607f6cbdc8be7f47696ffeabbab3ac5727e2
In decoder, export (eventually) into vpx_image_t.range field. In
encoder, use oxcf->color_range to set it (same way as for
color_space).
See issue 1059.
Change-Id: Ieabbb2a785fa58cc4044bd54eee66f328f3906ce
Verify the dynamic resizer behavior for real time, 1 pass CBR mode.
Start at low target bitrate, raise the bitrate in the middle of the
clip, verify that scaling-up does occur after bitrate changed.
Change-Id: I7ad8c9a4c8288387d897dd6bdda592f142d8870c
Verify the dynamic resizer behavior for real time, 1 pass CBR mode.
Run at low bitrate, with resize_allowed = 1, and verify that we get
one resize down event.
Change-Id: Ic347be60972fa87f7d68310da2a055679788929d
Unify the style of fdct4() fdct8() fdct16()
Add fdct32()
Add range_check() at each stage
Add unit test at ../../test/vp10_dct_test.cc
Change-Id: I13f76d9046c3ea473c82024b09a5bc8662e2c28e
1) copy following files from vpx_dsp/ to vp10/common/
vp10_inv_txfm.c
vp10_inv_txfm.h
vp10_inv_txfm_sse2.c
vp10_inv_txfm_sse2.h
2) change the function prefix "vpx_" to "vp10_" in above files
3) add unit test at vp10_inv_txfm_test.cc
Change-Id: I206f10f60c8b27d872c84b7482c3bb1d1cb4b913
avoid duplicating internal structures and include vp9_dx_iface.c
directly. these had fallen out of sync after the frame-parallel branch
merge.
Change-Id: I604cfbffa95abe2a1c8e906a696f32436b1422ed
this file needs to be reworked to remove the duplication of codec
internals + allow for divergence of vp9 and vp10
Change-Id: I6266b94ccfbc24dae30148f134804b52aa411b88
Unify the style of fdct4() fdct8() fdct16()
Add fdct32()
Add range_check() at each stage
Add unit test at ../../test/vp10_dct_test.cc
Change-Id: I9e912b2c5683862e65c5a21abc3e1c260cca4576
* changes:
test: limit the valid image size on OS/2
configure: add -Zhigh-mem to LDFLAGS on OS/2
configure: disable PIC on OS/2
Makefile: add $(STACKREALIGN) to CFLAGS for vp9_reconintra.c
x86inc.asm: fix NASM compilation
The test file compiler fails if one uses --disable-vp8-decoder
--enable-vp9-decoder. It effectively turns on CONFIG_VP8 and
CONFIG_DECODERS, but turns off CONFIG_VP8_DECODER, which causes
compiler error at test_vector_test.cc.
This commit fixes this issue by adding vp8/9 decoder flags to
the decoder behavior test, respectively.
Change-Id: I097ff8fd5e12715a94a565a82e54503885eb7187
and FUN_CONV_2D macros. The predict lut now handles
this case. The encoder now calls vpx_scaled_2d() instead
of vpx_convolve8() for scaling.
Change-Id: Ia1c8af8a31e4cb4887a587143108cb45835f7df7
This commit clears all the vp9_ prefix use case in vpx_dsp. It gets
the vp9 folder ready to branch out vp10.
Change-Id: I2906eec179ee792b4af8c9b4161313653050e931
This commit clears the function naming convention in vpx_dsp. It
replaces vp9_ prefix of global functions with vpx_ prefix. It also
removes the vp9_ prefix from static functions.
Change-Id: I6394359a63b71a51dda01342eec6a3cc08dfeedf
This commit moves the module inverse transform functions from vp9
to vpx_dsp folder. The hybrid transform wrapper functions stay in
the vp9 folder, since it involves codec-specific data structures.
Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
It in essence refactors the code for both the interpolation
filtering and the convolution. This change includes the moving
of all the files as well as the changing of the code from vp9_
prefix to vpx_ prefix accordingly, for underneath architectures:
(1) x86;
(2) arm/neon; and
(3) mips/msa.
The work on mips/drsp2 will be done in a separate change list.
Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
This commit factors the 4x4, 8x8, and 16x16 2D-DCT forward
transform operations into vpx_dsp folder.
Change-Id: I084b117b79c0925edcbcabb93f62b9f4bf8dbe7d
This test places 128 in positions that would not be found
in the VP9 filter tables. The ssse3 code packs this table
into chars and uses the pmaddubsw instruction, which treats
the value as signed. The ssse3 code checks for 128 in
position 3, skipping the ssse3 code if found, and calls
vp9_convolve8_c(). vp9_convolve8_c() is also used for scaling.
ChangeFilterWorks breaks the ssse3 scaling code found in other
commits.
Change-Id: I1f5a76834bc35180b9094c48f9421bdb19d3d1cb
The SSE2 version high bit-depth forward hybrid transforms are
essentially using the C functions via cross referencing to 1-D
functions in vp9_dct.c. This commit unifies the two versions and
removes the unnecessary dependency.
Change-Id: Ib4d0702a138f8daf7d0bd97c141ee7088f293765
The following quantization functions were moved:
vp9_quantize_b
vp9_quantize_b_32x32
vp9_highbd_quantize_b
vp9_highbd_quantize_b_32x32
vp9_quantize_dc
vp9_quantize_dc_32x32
vp9_highbd_quantize_dc
vp9_highbd_quantize_dc_32x32
The purpose of doing that was to allow these functions to be shared
by multiple codecs.
Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
For vp9 decoder build without profile 2 and profile 3 support, this
commit changes to report error "Unsupported bitstream profile" for
input streams in profile 2 or 3, rather than other misleading error
information.
In addition, one of the invalid files in unit tests is actually coded
profile 2, this commit makes it tested only when the decoder is built
with vp9-highbitdepth.
This fixes issue #1028.
Change-Id: I8b6c1210787c8f89c703a546687dcf973ac20fc0
The various tap loop filter operations are common functions across
codec. This commit moves them along with SIMD optimizations to
vpx_dsp folder.
Change-Id: Ia5fa0b2e5289cdb98467502a549c380b9c60e92c
From Change Ibf0c30b72074b3f71918ab278ccccc02a95a70a0
There is still an issue relating to one animated test clip with repeat
patterns where this change effectively increase the default maximum
arf interval by +1. This can be examined seperately.
Change-Id: Idd01d5480fc45202d8a059a0c3afc0997cc5bdd1
This commit replaces the vp8_ prefixed subtract function with the
common vpx_subtract_block function. It removes redundant SIMD
optimization codes and unit tests.
Change-Id: I42e086c32c93c6125e452dcaa6ed04337fe028d9
Adds two new vp9 parameters --min-gf-interval and --max-gf-interval
to enable testing based on frequency of alt-ref frames.
Also adds a unit-test to test enforcement of min-gf-interval.
For both these parameters the default value is 0, which indicates
they are picked by the encoder, based on resolution and framerate
considerations. If they are greater than zero, the specified
parameter is honored.
(Additional note by paulwilkins)
Note that there is a slight oddity in that key frames are also GFs and
considered part of GF only group. However they are treated as not
being part of an arf group because for arf groups the previous GF is
assumed to be the terminal or overlay frame for the previous group.
(end note)
Change-Id: Ibf0c30b72074b3f71918ab278ccccc02a95a70a0
This reverts commit a42df86c03.
this change causes MSA/VP9SubpelVarianceTest.Ref and
MSA/VP9SubpelVarianceTest.ExtremeRef failures under
mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu
Change-Id: I40b71a0b774eaeb31f66f795733f95cf360909f7
This reverts commit 61774ad1c4.
this change causes MSA/VP9SubpelAvgVarianceTest.Ref failures under
mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu
Change-Id: I7fb520c12b2a3b212d5e84b7619a380a48e49bb0
This commit moves the primitive multi-threading files from vp9
folder to vpx_thread, which will be accessible by all vpx codec.
Change-Id: Ib51e66e9c69801c10631fab56d35a0c0aaed5883
- Change default real time speed to -6.
- Add vpxenc_vp9_webm_rt_multithread, which encodes
niklas_1280_720_30.y4m with 2 to 4 threads using 2 to 4
tile columns.
Change-Id: I4d86c3360aec67ae5d1ba82eb6e0f0be8068b5af
add a dependency on *_rtcd.h to ensure they're generated before
attempting to build the test files
Change-Id: Ibbbd1f6ea77912bfd297129e7c83b9a80923ea12
fails unit tests:
[ FAILED ] NEON/VP8SubpelVarianceTest.ExtremeRef/0, where GetParam() = (3, 3, 0x14e36d, 0)
[ FAILED ] NEON/VP8SubpelVarianceTest.Ref/0, where GetParam() = (3, 3, 0x14e36d, 0)
the tests were recently enabled in:
eb88b17 Make vp9 subpixel match vp8
the functions likely haven't changed since being converted from assembly
Change-Id: I6141717b111b8f735f436c160d74270af53ef722
This control allows the application to skip the loop filter in the
decoder. This is an advanced control that should only be used in
extreme circumstances as it may introduce and accumulate decode
artifacts.
Change-Id: I278c65c60826f84c9141ebe06c6eeed3c2335fa8
WebM files will adjust the display width and height according to the
input pixel aspect ratio. The default pixel aspect ratio is 1:1.
BUG=https://code.google.com/p/webm/issues/detail?id=1005
Change-Id: I23e0a601b7259fa9513cb86110c41b8437769808
The only difference between the two was that the vp9 function allowed
for every step in the bilinear filter (16 steps) while vp8 only allowed
for half of those. Since all the call sites in vp9 (<< 1) the input, it
only ever used the same steps as vp8.
This will allow moving the subpel variance to vpx_dsp with the rest of
the variance functions.
Change-Id: I6fa2509350a2dc610c46b3e15bde98a15a084b75