Commit Graph

84 Commits

Author SHA1 Message Date
John Koleszar
7a07eea13f Convert subpixel filters to use convolve framework
Update the code to call the new convolution functions to do subpixel
prediction rather than the existing functions. Remove the old C and
assembly code, since it is unused. This causes a 50% performance
reduction on the decoder, but that will be resolved when the asm for
the new functions is available.

There is no consensus for whether 6-tap or 2-tap predictors will be
supported in the final codec, so these filters are implemented in
terms of the 8-tap code, so that quality testing of these modes
can continue. Implementing the lower complexity algorithms is a
simple exercise, should it be necessary.

This code produces slightly better results in the EIGHTTAP_SMOOTH
case, since the filter is now applied in only one direction when
the subpel motion is only in one direction. Like the previous code,
the filtering is skipped entirely on full-pel MVs. This combination
seems to give the best quality gains, but this may be indicative of a
bug in the encoder's filter selection, since the encoder could
achieve the result of skipping the filtering on full-pel by selecting
one of the other filters. This should be revisited.

Quality gains on derf positive on almost all clips. The only clip
that seemed to be hurt at all datarates was football
(-0.115% PSNR average, -0.587% min). Overall averages 0.375% PSNR,
0.347% SSIM.

Change-Id: I7d469716091b1d89b4b08adde5863999319d69ff
2013-02-05 14:23:17 -08:00
John Koleszar
5ca6a3667f Add 8-tap generic convolver
This commit introduces a new convolution function which will be used to
replace the existing subpixel interpolation functions. It is much the
same as the existing functions, but allows for changing the filter
kernel on a per-pixel basis, and doesn't bake in knowledge of the
filter to be applied or the size of the resulting block into the
function name.

Replacing the existing subpel filters will come in a later commit.

Change-Id: Ic9a5615f2f456cb77f96741856fc650d6d78bb91
2013-02-05 14:19:28 -08:00
Scott LaVarnway
5780c4cbd5 Added vp9_short_idct1_32x32_c
and called this function in vp9_dequant_idct_add_32x32_c when
eob == 1.  For the test clip used, the decoder performance improved
by 21+%.  Based on Yaowu's 16 point idct work.

Change-Id: Ib579a90fed531d45777980e04bf0c9b23c093c43
2013-02-04 16:49:17 -08:00
Yaowu Xu
1eb79dc1dc re-write 8 point idct
to be consistent with idct16 and idct32.

Change-Id: Ie89dbd32b65c33274b7fecb4b41160fcf1962204
2013-02-04 07:31:25 -08:00
Yaowu Xu
ccaaeb4b5a a couple of minor fixes
fixed a function prototypes to prevent compiler warnings;
removed a function not in use;
un-capitialize "Refstride" to ref_stride

Change-Id: Ib4472b6084f357d96328c6a06e795b6813a9edba
2013-02-04 07:19:32 -08:00
Yaowu Xu
91e0e80142 Changes 16 point idct
This commit changes the inverse 16 point dct to use the same algorithm
as the one for 32 point idct. In fact, now 16 point dct uses the exact
version of the souce code for even portion of the 32 point idct.

Tests showed current implementation has significant better accuracy
than the previous version. With this implementation and the minor bug
fix on forward 16 point dct, encoding tests showed about 0.2% better
compression of CIF set, test results on std-hd setting pending.

Change-Id: I68224b60c816ba03434e9f08bee147c7e344fb63
2013-01-31 19:52:18 -08:00
Ronald S. Bultje
c9071601a2 Remove compound intra-intra experiment.
This experiment gives little gains and adds relatively much code
complexity (and it hinders other experiments), so let's get rid of
it.

Change-Id: Id25e79a137a1b8a01138aa27a1fa0ba4a2df274a
2013-01-14 15:47:25 -08:00
Yaowu Xu
741fbe9656 Merge experiment "subpelrefmv"
Change-Id: Iac7f3d108863552b850c92c727e00c95571c9e96
2013-01-14 15:18:47 -08:00
Yaowu Xu
f7dab60096 Merge experiment "widerlpf"
Change-Id: I0c94475075e66e13cfe4c20fab7db6474441ae86
2013-01-14 15:17:35 -08:00
Scott LaVarnway
4987c0f07e Initial sse2 version of the wide loopfilters
Updated the rtcd_defs and used the sse2 uv version
of the loopfilter.  The performance improved by ~8%
for the test clip used.

Change-Id: I5a0bca3b6674198d40ca4a77b8cc722ddde79c36
2013-01-11 14:54:14 -08:00
Jim Bankoski
9431536045 rtcd for new wider loop filters
Change-Id: I8826bcdcf72ba6d86bde31cd13902a710399805c
2013-01-11 09:45:45 -08:00
Ronald S. Bultje
aa2effa954 Merge tx32x32 experiment.
Change-Id: I615651e4c7b09e576a341ad425cf80c393637833
2013-01-10 08:23:59 -08:00
Ronald S. Bultje
6884a83f06 Merge superblocks64 experiment.
Change-Id: If6c88752dffdb566f8d4322f135145270716fb8e
2013-01-09 17:21:40 -08:00
Adrian Grange
7d6b5425d7 New prediction filter
This patch removes the old pred-filter experiment and replaces it
with one that is implemented using the switchable filter framework.

If the pred-filter experiment is enabled, three interopolation
filters are tested during mode selection; the standard 8-tap
interpolation filter, a sharp 8-tap filter and a (new) 8-tap
smoothing filter.

The 6-tap filter code has been preserved for now and if the
enable-6tap experiment is enabled (in addition to the pred-filter
experiment) the original 6-tap filter replaces the new 8-tap smooth
filter in the switchable mode.

The new experiment applies the prediction filter in cases of a
fractional-pel motion vector. Future patches will apply the filter
where the mv is pel-aligned and also to intra predicted blocks.

Change-Id: I08e8cba978f2bbf3019f8413f376b8e2cd85eba4
2013-01-09 12:00:39 -08:00
Ronald S. Bultje
cd0f36b24f Merge "Merge superblocks (32x32) experiment." into experimental 2013-01-08 13:31:37 -08:00
Yunqing Wang
f1c56a8c8c Merge "vp9_sub_pixel_variance16x2 SSE2 optimization" into experimental 2013-01-08 12:59:08 -08:00
Ronald S. Bultje
4455036cfc Merge superblocks (32x32) experiment.
Change-Id: I0df99742029834a85c4933652b0587cf5b6b2587
2013-01-08 12:54:45 -08:00
Yunqing Wang
8d568312a2 vp9_sub_pixel_variance16x2 SSE2 optimization
About 5% decoder speedup.

Change-Id: Ib6687d337af758a536a0e7e289f400990f1f9794
2013-01-08 12:01:55 -08:00
John Koleszar
879cb7d962 Merge vp9-preview changes into experimental branch
Incorportate vp9-preview changes by merging master branch into experimental.

Conflicts:
	test/test.mk
	vp9/common/vp9_filter.c
	vp9/common/vp9_idctllm.c
	vp9/common/vp9_invtrans.h
	vp9/common/vp9_mbpitch.c
	vp9/common/vp9_rtcd_defs.sh
	vp9/common/vp9_systemdependent.h
	vp9/common/vp9_type_aliases.h
	vp9/common/x86/vp9_asm_stubs.c
	vp9/common/x86/vp9_subpixel_mmx.asm
	vp9/decoder/vp9_decodframe.c
	vp9/decoder/vp9_dequantize.c
	vp9/decoder/vp9_dequantize.h
	vp9/decoder/vp9_onyxd_int.h
	vp9/encoder/vp9_bitstream.c
	vp9/encoder/vp9_encodeframe.c
	vp9/encoder/vp9_rdopt.c

Change-Id: I17f51c3666d1b59cf1a699f87607cbc5d30a87c5
2013-01-08 10:19:59 -08:00
Ronald S. Bultje
c3941665e9 64x64 blocksize support.
3.2% gains on std/hd, 1.0% gains on hd.

Change-Id: I481d5df23d8a4fc650a5bcba956554490b2bd200
2013-01-05 18:20:25 -08:00
John Koleszar
5ebe94f9f1 Build fixes to merge vp9-preview into master
Various fixups to resolve issues when building vp9-preview under the more stringent
checks placed on the experimental branch.

Change-Id: I21749de83552e1e75c799003f849e6a0f1a35b07
2012-12-26 11:21:09 -08:00
Scott LaVarnway
89ac94f8fb Removed mmx versions of vp9_bilinear_predict filters
These filters will not work with VP9.

Change-Id: Ic26c77961084fcea6bfa97f4cd95afdea2282e85
2012-12-21 14:41:49 -08:00
Scott LaVarnway
08dabbcee1 Disabled x86inc style assembly functions
Temporary fix for 32-bit mac build errors.

Change-Id: I2038f033cac16ea796097d0edd0f1c3da03246d7
2012-12-19 11:53:43 -08:00
Ronald S. Bultje
4cca47b538 Use standard integer types for pixel values and coefficients.
For coefficients, use int16_t (instead of short); for pixel values in
16-bit intermediates, use uint16_t (instead of unsigned short); for all
others, use uint8_t (instead of unsigned char).

Change-Id: I3619cd9abf106c3742eccc2e2f5e89a62774f7da
2012-12-18 15:31:19 -08:00
Scott LaVarnway
b575394e21 Improved vp9_ihtllm_c
As suggested by Yaowu, we can use eob to reduce the complexity
of the vp9_ihtllm_c function.  For the 1080p test clip used, the decoder
performance improved by 17%.

Change-Id: I32486f2f06f9b8f60467d2a574209aa3a3daa435
2012-12-12 15:49:39 -08:00
Ronald S. Bultje
c456b35fdf 32x32 transform for superblocks.
This adds Debargha's DCT/DWT hybrid and a regular 32x32 DCT, and adds
code all over the place to wrap that in the bitstream/encoder/decoder/RD.

Some implementation notes (these probably need careful review):
- token range is extended by 1 bit, since the value range out of this
  transform is [-16384,16383].
- the coefficients coming out of the FDCT are manually scaled back by
  1 bit, or else they won't fit in int16_t (they are 17 bits). Because
  of this, the RD error scoring does not right-shift the MSE score by
  two (unlike for 4x4/8x8/16x16).
- to compensate for this loss in precision, the quantizer is halved
  also. This is currently a little hacky.
- FDCT and IDCT is double-only right now. Needs a fixed-point impl.
- There are no default probabilities for the 32x32 transform yet; I'm
  simply using the 16x16 luma ones. A future commit will add newly
  generated probabilities for all transforms.
- No ADST version. I don't think we'll add one for this level; if an
  ADST is desired, transform-size selection can scale back to 16x16
  or lower, and use an ADST at that level.

Additional notes specific to Debargha's DWT/DCT hybrid:
- coefficient scale is different for the top/left 16x16 (DCT-over-DWT)
  block than for the rest (DWT pixel differences) of the block. Therefore,
  RD error scoring isn't easily scalable between coefficient and pixel
  domain. Thus, unfortunately, we need to compute the RD distortion in
  the pixel domain until we figure out how to scale these appropriately.

Change-Id: I00386f20f35d7fabb19aba94c8162f8aee64ef2b
2012-12-07 14:45:05 -08:00
Johann
52d350febf Begin to refactor vpx_scale usage in VP9
Only declare the functions in vpx_scale RTCD and include the relevant
header.

Remove unused files and functions in vpx_scale to avoid wasting time
renaming. vpx_scale/win32/scaleopt.c contains functions which have not
been called in a long time but are potentially optimized.

The 'vp8' functions have not been renamed yet. That is for after the
cleanup.

Change-Id: I2c325a101d60fa9d27e7dfcd5b52a864b4a1e09c
2012-12-05 08:59:40 -08:00
Johann
a905672906 Remove ARM optimizations from VP9
Change-Id: I9f0ae635fb9a95c4aa1529c177ccb07e2b76970b
2012-12-05 08:59:25 -08:00
Johann
c6bd29e2f5 Begin to refactor vpx_scale usage in VP9
Only declare the functions in vpx_scale RTCD and include the relevant
header.

Remove unused files and functions in vpx_scale to avoid wasting time
renaming. vpx_scale/win32/scaleopt.c contains functions which have not
been called in a long time but are potentially optimized.

The 'vp8' functions have not been renamed yet. That is for after the
cleanup.

Change-Id: I2c325a101d60fa9d27e7dfcd5b52a864b4a1e09c
2012-12-03 12:51:56 -08:00
Johann
34591b54dd Remove ARM optimizations from VP9
Change-Id: I9f0ae635fb9a95c4aa1529c177ccb07e2b76970b
2012-12-03 12:50:15 -08:00
Jim Bankoski
9f9370425b warnings in various experiments
Change-Id: Ib5106d4772450f8026f823dd743f162ab833b1d6
2012-11-30 07:31:37 -08:00
Jim Bankoski
030e268a90 ihtllm moves to rtcd
clears up some warnings

Change-Id: I9899637497c6ad7519f098e055ab98580ae6d688
2012-11-29 07:19:38 -08:00
Jim Bankoski
85cba19e16 remove postproc invokes
and some miscellaneous invoke left overs

Change-Id: I63191b1bfd3bea4ce30cceaeb686ec850570fc43
2012-11-28 10:00:25 -08:00
John Koleszar
fcccbcbb39 Add vp9_ prefix to all vp9 files
Support for gyp which doesn't support multiple objects in the same
static library having the same basename.

Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
2012-11-27 14:12:30 -08:00