Commit Graph

2390 Commits

Author SHA1 Message Date
Adrian Grange
8cb8aef7c7 Merge "Modified frame buffer handling" 2014-07-17 12:15:16 -07:00
Scott LaVarnway
ba0652e83a Merge "Added vp9_sad64x64_neon(), vp9_sad32x32_neon()" 2014-07-17 11:42:16 -07:00
Adrian Grange
f68aaa38d6 Modified frame buffer handling
This patch is the first step toward simplifying the
frame buffer handling.

The final goal is to have a common frame buffer handling
framework for both encoder and decoder that incorporates
the existing ability to use externally allocated memory.

Change-Id: I2c378a4f54a39908915f46c4260e17a080db7ff1
2014-07-17 11:06:35 -07:00
Scott LaVarnway
696fa52eaa Added vp9_sad64x64_neon(), vp9_sad32x32_neon()
and vp9_sad16x16_neon()

On a Nexus 7, vpxenc (in realtime mode, speed -6)
reported a performance improvement of ~17%.

Change-Id: I91e070cde2973451083d3f3d63b49b7886de9a85
2014-07-16 12:54:46 -07:00
Deb Mukherjee
1f6aaeddc5 Merge "Some extra bit probability cleanups" 2014-07-14 17:26:54 -07:00
Jingning Han
6ce515b9ff Merge "Fix chrome valgrind warning due to the use of mismatched bsize" 2014-07-13 11:07:44 -07:00
James Zern
0999a2a24e Merge "vp9_loopfilter.c: cosmetics" 2014-07-11 16:02:21 -07:00
Jingning Han
3cddd81c6d Fix chrome valgrind warning due to the use of mismatched bsize
This commit fixes a mismatched use of block size in the non-RD
intra prediction check. The residual SSE and variance should be
calculated per transform block size, instead of per operating block
size, which caused a chrome valgrind warning on a conditional jump
based on an uninitialized value (webm issue 823). This commit
resolves this issue.

Change-Id: I595c06599c7e0fd0e4a08736519ba68fc14bc79a
2014-07-11 15:49:22 -07:00
Yunqing Wang
7e340614c1 Merge "Remove unnecessary assertions" 2014-07-11 13:47:03 -07:00
Deb Mukherjee
6957e7a077 Some extra bit probability cleanups
Refactoring to remove some duplication of probability
tables between tokenization and detokenization.

Change-Id: I2fc6a6497f9c0410021a9b41f828bc58a864e466
2014-07-11 11:39:18 -07:00
Yunqing Wang
978642a426 Remove unnecessary assertions
Removed 2 unnecessary assertions.

Change-Id: I0f8877d0494bf3ecdb0d7931ccbcaa8289e01d8b
2014-07-11 10:48:57 -07:00
Yaowu Xu
a75d55df1b Remove an unused parameter
Change-Id: I6ad6fd75dc3c9e6218d88148cf49e205398e2af5
2014-07-11 08:10:04 -07:00
James Zern
8a7cc1f47b Merge "update vp9_thread.c" 2014-07-10 23:19:55 -07:00
James Zern
8701ed0270 update vp9_thread.c
pull the latest from libwebp.

Original source:
 http://git.chromium.org/webm/libwebp.git
 100644 blob 264210ba2807e4da47eb5d18c04cf869d89b9784 src/utils/thread.c

commit 46fd44c1042c9903b2f1ab87e9f200a13c7e702d
Author: James Zern <jzern@google.com>
Date:   Tue Jul 8 19:53:28 2014 -0700

    thread: remove harmless race on status_ in End()

    if a thread was still doing work when End() was called there'd be a race
    on worker->status_. in these cases, however, the specific value is
    meaningless as it would be >= OK and the thread would have been shut
    down properly, but we'll check 'impl_' instead to avoid any potential
    TSan/DRD reports.

    Change-Id: Ib93cbc226a099f07761f7bad765549dffb8054b1

Change-Id: Ib0ef25737b3c6d017fa74822e21ed58508230b91
2014-07-10 12:20:54 -07:00
Yunqing Wang
1226d133df Merge "Refactor vp9_diamond_search_sad function" 2014-07-10 11:06:32 -07:00
Yunqing Wang
46441ec5c8 Merge "Refactor refining_search_sad code" 2014-07-10 10:43:00 -07:00
hkuang
51e9788e58 Fix a bug in boundary checking.
Change-Id: Ifc741da9da6f61c8d3c1f675ec6b8a96570f877d
2014-07-10 09:43:04 -07:00
Yunqing Wang
75cd57503d Refactor vp9_diamond_search_sad function
Currently, vp9_diamond_search_sadx4() is only called when sse3 is
enabled, which is improper since sse2 optimizations of the sdx4df
functions are available. Changed to always use
vp9_diamond_search_sadx4().

Change-Id: I4b95d6b7a3c6c645783c373f0ba8d645ece24717
2014-07-10 09:19:03 -07:00
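
Note on 75cd57503d: the sdx4df functions mentioned above compute the sum of absolute differences (SAD) of one source block against four candidate reference blocks in a single call, which is what makes the x4 search variants attractive once a SIMD version exists. The following is a minimal plain-C sketch of that idea only. The name sad_x4_sketch and its parameter layout are hypothetical and are not the libvpx signatures.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Plain-C model of an "x4" SAD: one source block is compared against four
 * candidate reference blocks in one call. Hypothetical name and signature;
 * this is an illustration, not the libvpx implementation. */
static void sad_x4_sketch(const uint8_t *src, int src_stride,
                          const uint8_t *const refs[4], int ref_stride,
                          int width, int height, unsigned int sad_out[4]) {
  assert(width > 0 && height > 0);
  for (int i = 0; i < 4; ++i) {
    unsigned int sad = 0;
    for (int r = 0; r < height; ++r) {
      for (int c = 0; c < width; ++c) {
        /* Accumulate the absolute pixel difference for candidate i. */
        sad += (unsigned int)abs(src[r * src_stride + c] -
                                 refs[i][r * ref_stride + c]);
      }
    }
    sad_out[i] = sad;
  }
}
```

A search loop would pass the four neighbouring candidate positions as refs and pick the smallest entry of sad_out; an optimized version can share the loads of src across all four candidates.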
James Zern
58609335b1 vp9_loopfilter.c: cosmetics
- fix indent, spelling
- drop some whitespace in some comments
- add an assert in vp9_setup_mask, it shouldn't be called on decode
  error

Change-Id: Ic312a815e977a6f9cb81ceb7b039eeada76c5aa0
2014-07-09 17:27:57 -07:00
Yunqing Wang
30117a576d Refactor refining_search_sad code
There are sse2 optimizations of the sdx4df functions. Instead of calling
vp9_refining_search_sadx4 only when sse3 is enabled, always call it.

Change-Id: I24f93818f7d4209d1425039e0eb099ff9ff08fe9
2014-07-09 16:50:11 -07:00
Jingning Han
f6bf614b2f Merge "Re-design quantization process for 32x32 transform block" 2014-07-09 11:55:26 -07:00
hkuang
b84ee5a3d0 Merge "Move vp9_thread.* to common." 2014-07-09 10:16:13 -07:00
Jingning Han
9ad1b9fc67 Re-design quantization process for 32x32 transform block
This commit enables a new quantization process for 32x32 2D-DCT
transform coefficient blocks. It improves the compression
performance of speed 5 by 1.4%. The overall compression gain of
speed 5 due to the new quantization scheme is 4.7%. It also includes
the SSSE3 implementation of the 32x32 quantization process.

Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b
2014-07-08 16:55:28 -07:00
Adrian Grange
7c43fb67ae Fix decoder handling of intra-only frames
This patch fixes bug 633:
https://code.google.com/p/webm/issues/detail?id=633

The first decoded frame does not have to be a keyframe,
it could be an inter-frame that is coded intra-only.

This patch fixes the handling of intra-only frames.

A test vector has also been added that encodes 3
intra-only frames at the start of the clip. The
test vector was generated using the code in the
following patch:
https://gerrit.chromium.org/gerrit/#/c/70680/

Change-Id: Ib40b1dbf91aae2bc047e23c626eaef09d1860147
2014-07-08 16:24:03 -07:00
hkuang
337e8015c9 Move vp9_thread.* to common.
To prepare for frame-parallel decoding, the reference count buffers
need to be protected by a mutex. Move vp9_thread.* to the common
folder so that those buffers can use the cross-platform mutex
from vp9_thread.*.

Change-Id: I541277cf15eefed6641555944f67f4a0bcdc8154
2014-07-07 14:52:19 -07:00
Yaowu Xu
82fd084b35 Merge "Re-design quantization process" 2014-07-01 19:04:01 -07:00
Jingning Han
9ac2f66320 Re-design quantization process
This commit re-designs the quantization process for transform
coefficient blocks of size 4x4 to 16x16. It improves compression
performance for speed 7 by 3.85%. The SSSE3 version for the
new quantization process is included.

The average runtime of the 8x8 block quantization is reduced
from 285 cycles -> 255 cycles, i.e., over 10% faster.

Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
2014-07-01 17:00:07 -07:00
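
Note on 9ac2f66320: the "over 10% faster" figure for the 8x8 quantizer can be checked from the two cycle counts quoted above. The helpers below are a small sketch for that arithmetic; their names are made up for this note. They distinguish the reduction in cost (relative to the old cost) from the gain in throughput (relative to the new cost), which give slightly different percentages for the same pair of numbers.

```c
#include <assert.h>

/* Percentage by which a cost (cycles, milliseconds) shrinks,
 * measured against the original cost. Hypothetical helper. */
static double percent_reduction(double before, double after) {
  assert(before > 0.0);
  return 100.0 * (before - after) / before;
}

/* Percentage by which throughput rises for the same change in cost,
 * measured against the new cost. Hypothetical helper. */
static double percent_speedup(double before, double after) {
  assert(after > 0.0);
  return 100.0 * (before / after - 1.0);
}
```

For 285 -> 255 cycles, percent_reduction gives roughly 10.5 and percent_speedup roughly 11.8, so both readings agree with "over 10%".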
Alex Converse
6c54dbcb69 Merge "BITSTREAM: Handle transform size and motion vectors more logically for non-420." 2014-06-30 17:44:01 -07:00
James Zern
44472cde55 vp9: disable postproc buffer alloc when unnecessary
the buffer is only used in encoding and only when
CONFIG_INTERNAL_STATS or CONFIG_VP9_POSTPROC is enabled.
a future change should decouple this from the frame buffer allocation
and make it conditional based on runtime flags when the above config
options are enabled.
reduces decode heap usage by at least 12%

Change-Id: Id0b97620d4936afefa538d3aadf32106743d9caf
2014-06-27 20:59:56 -07:00
Jim Bankoski
52b63c238e Merge "Better validation of invalid files" 2014-06-27 11:05:21 -07:00
Jim Bankoski
9f37d149c1 Better validation of invalid files
This patch checks that a decoder never tries to reference a frame that's
outside the range of 2x to 1/16th the size of this frame.  Any attempt
to do so causes a failure.

Change-Id: I5c98fa7bb95ac4f29146f29dd92b62fe96164e4c
2014-06-27 10:03:15 -07:00
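
Note on 9f37d149c1: the range check described above can be sketched as a predicate on frame dimensions. This is an illustration under assumptions, not the code from the patch: the function name is hypothetical, and applying the limits separately to width and height is an assumption of this sketch.

```c
#include <assert.h>
#include <stdbool.h>

/* Returns true when a reference frame is no larger than 2x and no smaller
 * than 1/16th of the current frame. Hypothetical helper; the per-dimension
 * check is an assumption of this sketch. */
static bool ref_size_in_range(int ref_w, int ref_h, int cur_w, int cur_h) {
  assert(ref_w > 0 && ref_h > 0 && cur_w > 0 && cur_h > 0);
  return ref_w <= 2 * cur_w && ref_h <= 2 * cur_h &&  /* at most 2x      */
         16 * ref_w >= cur_w && 16 * ref_h >= cur_h;  /* at least 1/16th */
}
```

A decoder following the commit message would treat a false result as a decode failure rather than attempting to scale the reference.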
Jingning Han
46ea9ec719 Enable real-time version reference motion vector search
This commit enables a fast reference motion vector search scheme.
It checks the nearest top and left neighboring blocks to decide the
most probable predicted motion vector. If it finds the two have
the same motion vector, it then skips finding the exterior range for
the second most probable motion vector, and correspondingly skips
the check for NEARMV.

The runtime of speed -5 goes down
pedestrian at 1080p 29377 ms -> 27783 ms
vidyo at 720p       11830 ms -> 10990 ms
i.e., 6%-8% speed-up.

For the rtc set, the compression performance
goes down by about 1.3% for both speed -5 and -6.

Change-Id: I2a7794fa99734f739f8b30519ad4dfd511ab91a5
2014-06-26 09:49:13 -07:00
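
Note on 46ea9ec719: the early exit described above can be sketched as follows. All type and function names are hypothetical, and the choice of the above neighbour as the first candidate is an assumption of this sketch; the commit message only says that the top and left neighbours are compared and that the second-candidate search and the NEARMV check are skipped when they match.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>

typedef struct { int row; int col; } MvSketch;

typedef struct {
  MvSketch nearest;   /* most probable predicted motion vector        */
  MvSketch second;    /* second candidate, valid only if check_nearmv */
  bool check_nearmv;  /* false: the caller skips the NEARMV mode      */
} RefMvResult;

static bool mv_equal(const MvSketch *a, const MvSketch *b) {
  return a->row == b->row && a->col == b->col;
}

/* Compare the above and left neighbours. If they agree, stop early
 * instead of scanning further neighbours for a second candidate. */
static RefMvResult fast_ref_mv_sketch(const MvSketch *above,
                                      const MvSketch *left) {
  assert(above != NULL && left != NULL);
  RefMvResult out;
  out.nearest = *above;
  if (mv_equal(above, left)) {
    out.second = *above;       /* placeholder, not used by the caller */
    out.check_nearmv = false;  /* skip the NEARMV check               */
  } else {
    out.second = *left;
    out.check_nearmv = true;
  }
  return out;
}
```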
Adrian Grange
8357292a5a Fix test on maximum downscaling limits
There is a normative scaling range of (x1/2, x16)
for VP9. This patch fixes the maximum downscaling
tests that are applied in the convolve function.

The code used a maximum downscaling limit of x1/5
for historic reasons related to the scalable
coding work. Since the downsampling in this
application is non-normative it will revert to
using a separate non-normative scaler.

Change-Id: Ide80ed712cee82fe5cb3c55076ac428295a6019f
2014-06-24 10:26:09 -07:00
Adrian Grange
8c1f071f1e Allocate buffers based on correct chroma format
The encoder currently allocates frame buffers before
it establishes what the chroma sub-sampling factor is,
always allocating based on the 4:4:4 format.

This patch detects the chroma format as early as
possible allowing the encoder to allocate buffers of
the correct size.

Future patches will change the encoder to allocate
frame buffers on demand to further reduce the memory
profile of the encoder and rationalize the buffer
management in the encoder and decoder.

Change-Id: Ifd41dd96e67d0011719ba40fada0bae74f3a0d57
2014-06-23 11:45:13 -07:00
Jingning Han
961bafc366 Merge "Remove unused vp9_init_quant_tables function" 2014-06-23 09:37:30 -07:00
Johann
1fc2b0fd00 Merge "Include type defines" 2014-06-20 11:29:19 -07:00
Johann
d658216276 Don't return value for void functions
Clears "warning: 'return' with a value, in function returning void"

Change-Id: I93972610d67e243ec772a1021d2fdfcfc689c8c2
2014-06-20 11:26:44 -07:00
Johann
baef0b89da Include type defines
Clears error: unknown type name 'uint8_t'

Change-Id: I9b6eff66a5c69bc24aeaeb5ade29255a164ef0e2
2014-06-20 11:26:13 -07:00
Alex Converse
7557a65d16 BITSTREAM: Handle transform size and motion vectors more logically for non-420.
This breaks the profile 1 bitstream.

Don't force non420 uv transform size to 1/4 y size. In the 4:2:0 case the
chroma corresponding to a luma block is 1/4 its size. In the 4:4:4 case
chroma and luma planes are the same size. Disallowing larger transforms
can result in a loss of compression efficiency and is inconsistent.

For sub-8x8 blocks only average corresponding motion vectors.

4:2:0 and profile 0 behavior remains unchanged.

Change-Id: I560ae07183012c6734dd1860ea54ed6f62f3cae8
2014-06-18 13:07:51 -07:00
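
Note on 7557a65d16: one way to read the change above is that the chroma transform size is limited by the chroma block that actually exists for the given subsampling, not by a fixed fraction of the luma transform. The sketch below illustrates that reading only. The function name, the use of pixel edge lengths as sizes, and the 32-pixel cap are assumptions of this sketch, and the sub-8x8 special case is left out.

```c
#include <assert.h>

static int min_int(int a, int b) { return a < b ? a : b; }

/* Sizes are square edge lengths in pixels (4, 8, 16, 32).
 * ss_x / ss_y are the chroma subsampling shifts: 1,1 for 4:2:0 and
 * 0,0 for 4:4:4. Hypothetical helper, not the libvpx function. */
static int uv_tx_size_sketch(int y_tx_size, int block_w, int block_h,
                             int ss_x, int ss_y) {
  assert(ss_x == 0 || ss_x == 1);
  assert(ss_y == 0 || ss_y == 1);
  const int chroma_w = block_w >> ss_x;
  const int chroma_h = block_h >> ss_y;
  /* Largest square transform that fits the chroma block, capped at 32. */
  const int largest_fit = min_int(min_int(chroma_w, chroma_h), 32);
  return min_int(y_tx_size, largest_fit);
}
```

With a 16x16 block and a 16x16 luma transform, this yields an 8x8 chroma transform for 4:2:0 and a 16x16 chroma transform for 4:4:4, so the 4:2:0 result is the same as under a fixed one-quarter rule.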
Jingning Han
3b9c19aaa7 Remove unused vp9_init_quant_tables function
This function is not effectively used, hence removed.

Change-Id: I2e8e48fa07c7518931690f3b04bae920cb360e49
2014-06-18 11:51:41 -07:00
James Zern
88df435d6b Merge "vp9_rtcd: correct avx2 references" 2014-06-16 17:39:13 -07:00
Johann
79afb5eb41 Use lrand48 on Android
When building x86 assembly use lrand48 instead of the
undocumented inlined _rand function.

Android now supports rand()
https://android-review.googlesource.com/97731
but only for new versions. Original workaround:
https://gerrit.chromium.org/gerrit/15744

Change-Id: I130566837d5bfc9e54187ebe9807350d1a7dab2a
2014-06-12 19:57:25 -07:00
Jingning Han
d5ae43318e Merge "Fast computation path for forward transform and quantization" 2014-06-12 11:59:52 -07:00
Jingning Han
ccba289f8d Fast computation path for forward transform and quantization
This commit enables a fast path computational flow for forward
transformation. It checks the sse and variance of prediction
residuals and decides if the quantized coefficients are all
zero, dc only, or more. It then selects the corresponding coding
path in the forward transformation and quantization stage.

It is currently enabled in rtc coding mode. Will do it for rd
coding mode next.

In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
Overall coding performance for rtc set is changed by -0.18%.

Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
2014-06-12 11:10:54 -07:00
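
Note on ccba289f8d: the three-way decision described above (all zero, dc only, or more) can be sketched as a classifier on the residual's SSE and variance. The thresholds are passed in as parameters because the commit message does not state how they are derived; the names and the exact comparison order are assumptions of this sketch.

```c
#include <assert.h>

typedef enum { COEFFS_ALL_ZERO, COEFFS_DC_ONLY, COEFFS_FULL } CoeffClass;

/* sse: total energy of the prediction residual.
 * var: its variance-style (AC) energy, so var <= sse.
 * zero_thresh, ac_thresh: hypothetical thresholds, e.g. tied to the
 * quantizer step; their derivation is not part of this sketch. */
static CoeffClass classify_residual_sketch(unsigned int sse, unsigned int var,
                                           unsigned int zero_thresh,
                                           unsigned int ac_thresh) {
  assert(var <= sse);
  if (sse < zero_thresh) return COEFFS_ALL_ZERO; /* skip transform and quant */
  if (var < ac_thresh) return COEFFS_DC_ONLY;    /* only the DC term matters */
  return COEFFS_FULL;                            /* regular full path        */
}
```

The caller would then pick the cheapest forward transform and quantization path that matches the returned class.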
James Zern
9f3a0dbb5e vp9_rtcd: correct avx2 references
s/"\$avx2_x86inc"/"avx2"/

avx2 code is all intrinsics and as a result doesn't rely on x86inc.asm

Change-Id: I76ad39474d8a00658f3e43131830ef0f4f34772a
2014-06-10 16:26:36 -07:00
James Zern
cbce09ce62 Merge changes I6abc0657,I8224fba2,I04f64a45,I5d49d119,I76b4d171,I88c11ac3
* changes:
  vp9_sub_pixel_*variance*: disable avx2 variants
  vp9_sad*x4d: disable avx2 variants
  vp9_f(dct|ht): disable avx2 variants
  convolve: disable avx2 variants
  fdct8x8_test: add missing avx2 functions
  dct4x4_test: add missing avx2 functions
2014-06-10 16:14:45 -07:00
James Zern
520cb3f39f vp9_sub_pixel_*variance*: disable avx2 variants
tests failing under Win32/Win64

+ variance_test: add missing avx2 functions (partially disabled)

Change-Id: I6abc0657ea076379ab9ca65c12678b9ea199849d
2014-06-10 16:11:15 -07:00
James Zern
d3ff009d84 vp9_sad*x4d: disable avx2 variants
tests failing under Win32/Win64

+ sad_test: add missing avx2 functions (disabled)

Change-Id: I8224fba2b270f6039ab1877d71e1e512f0081856
2014-06-10 16:10:12 -07:00
hkuang
cdffeaaae0 Add mode info arrays and mode info index.
In non frame-parallel decoding, this works the same way as the
current decoding scheme. Each time the decoder finishes
decoding a frame, it swaps the current mode info pointer
and the previous mode info pointer if the decoded frame needs
to be shown. Both the mode info pointer and the previous mode info
pointer are taken from the mode info arrays.

In frame-parallel decoding, this will become more complicated
as current frame's mode info pointer will be shared with next
frame as previous mode info pointer. But when one decoder
thread finishes decoding one frame and starts to work on next
available frame, it needs to retain the decoded frame's mode
info pointers until next frame finishes decoding. The mode info
index will serve this purpose. The decoder will use a different
buffer in the mode info arrays and use the other buffer to save
the previously decoded frame’s mode info.

Change-Id: If11d57d8eb0ee38c8876158e5482177fcb229428
2014-06-10 13:43:36 -07:00
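
Note on cdffeaaae0: the non frame-parallel behaviour described above amounts to a two-entry array plus an index that flips when a shown frame finishes. The sketch below models only that case. The struct, field, and function names are hypothetical, and the frame-parallel retention logic is not modelled.

```c
#include <assert.h>
#include <stddef.h>

typedef struct { int mode; } ModeInfoSketch;

typedef struct {
  ModeInfoSketch *mi_array[2]; /* the two mode info buffers           */
  int mi_idx;                  /* index of the buffer currently in use */
  ModeInfoSketch *mi;          /* current frame's mode info            */
  ModeInfoSketch *prev_mi;     /* previous frame's mode info           */
} ModeInfoCtx;

static void mode_info_init(ModeInfoCtx *ctx, ModeInfoSketch *buf0,
                           ModeInfoSketch *buf1) {
  assert(ctx != NULL && buf0 != NULL && buf1 != NULL);
  ctx->mi_array[0] = buf0;
  ctx->mi_array[1] = buf1;
  ctx->mi_idx = 0;
  ctx->mi = buf0;
  ctx->prev_mi = buf1;
}

/* Call after a frame has been decoded. The pointers are swapped only
 * when the frame is shown, as in the description above. */
static void mode_info_finish_frame(ModeInfoCtx *ctx, int show_frame) {
  assert(ctx != NULL);
  if (!show_frame) return;
  ctx->prev_mi = ctx->mi_array[ctx->mi_idx];
  ctx->mi_idx ^= 1; /* flip between buffer 0 and buffer 1 */
  ctx->mi = ctx->mi_array[ctx->mi_idx];
}
```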
James Zern
dd9f502933 vp9_f(dct|ht): disable avx2 variants
tests failing under Win32/Win64

+ dct16x16_test: add missing avx2 functions (partially disabled)

exercises the forward transforms
no idct/iht implementations, so the c-code is used

Change-Id: I04f64a457fa0828a00f32b5c9fe4f55294f21f61
2014-06-09 18:48:11 -07:00