Commit Graph

2408 Commits

Author SHA1 Message Date
James Zern
3a924f6ed1 Merge "signed unsigned mismatch - warning error" 2014-08-01 16:28:38 -07:00
James Zern
1b6ac28a2f Merge "removed sign mismatch warning" 2014-08-01 14:45:12 -07:00
Frank Galligan
5f8fa13258 Merge "Added vp9_sad8x8_neon()" 2014-08-01 14:11:38 -07:00
Scott LaVarnway
98165ec074 Neon version of vp9_sub_pixel_variance8x8(),
vp9_variance8x8(), and vp9_get8x8var().

On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~1.2%.

Change-Id: I8a66ac2a0f550b407caa27816833bdc563395102
2014-08-01 11:35:55 -07:00
Frank Galligan
5487b6067c Merge "Neon version of vp9_sub_pixel_variance32x32()," 2014-08-01 09:46:37 -07:00
Scott LaVarnway
545be78136 Added vp9_sad8x8_neon()
Change-Id: I3be8911121ef9a5f39f6c1a2e28f9e00972e0624
2014-08-01 06:36:18 -07:00
Jim Bankoski
0f3689d32d signed unsigned mismatch - warning error
Change-Id: I991e36aa3cfa62aae6d27b253297dd9ca9e8bc12
2014-08-01 06:29:32 -07:00
Jim Bankoski
512f9b631f removed sign mismatch warning
Change-Id: Iaa40b472f6c1c48bb3bb47332b6fcf36d7f3c10e
2014-08-01 06:28:00 -07:00
Scott LaVarnway
6f4b8dcdc2 Neon version of vp9_subtract_block()
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~3.2%

Change-Id: I8862497264142171b7efc32df1a67714a23539f4
2014-07-31 09:28:06 -07:00
Scott LaVarnway
d39448e2d4 Neon version of vp9_sub_pixel_variance32x32(),
vp9_variance32x32(), and vp9_get32x32var().

Change-Id: I8137e2540e50984744da59ae3a41e94f8af4a548
2014-07-31 08:00:36 -07:00
Scott LaVarnway
d4a37db5b8 Neon version of vp9_quantize_fp()
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~12.4%

Change-Id: Id29d215acf58bb108489e218a259adf74b4768d7
2014-07-30 09:33:46 -07:00
Scott LaVarnway
521cf7e879 Neon version of vp9_sub_pixel_variance16x16(),
vp9_variance16x16(), and vp9_get16x16var().

On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~16.7%.

Change-Id: Ib163aa99f56e680194aabe00dacdd7f0899a4ecb
2014-07-30 08:17:32 -07:00
Scott LaVarnway
d19d222db6 Added vp9_fdct8x8_neon(), vp9_fdct8x8_1_neon()
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~3.7%.

Change-Id: I428c72c40df82c6d537955e320a8debf99343004
2014-07-29 08:56:05 -07:00
levytamar82
4ba92dc5ab Fix bug 805
Remove all the redundant dct functions (dct4x4, dct8x8)
in avx2 except dct32x32 those functions were copied originally from dct_sse2

Change-Id: I742576fbf5175f3ac09f2076976a9247b259323e
2014-07-28 15:46:01 -07:00
Jingning Han
53844275e9 Fix potential ioc issue in vp9_get_prob for 4K above sizes
This commit turns on the existing vp9_get_prob function using
64 bit in the intermediate step. It fixes the ioc issue for 4K
above frame sizes (issue 828).

Change-Id: I9f627f3beca2c522f73b38fd2a3e7eefdff01a7c
2014-07-24 15:35:51 -07:00
Alex Converse
5926e7c0e8 Remove unfinished VP9 alpha channel.
Change-Id: Ic5d3a3a0dac10b49495771886a31e793bb78b5ca
2014-07-21 15:55:50 -07:00
Deb Mukherjee
727f384085 Merge "Separates profile 2 into 2 profiles 2 and 3" 2014-07-18 03:23:51 -07:00
Deb Mukherjee
c447a50aea Separates profile 2 into 2 profiles 2 and 3
Separates HBD profile int two profiles (2 and 3) consistent with the
highbitdepth branch. This patch is ported from the original highbitdepth
branch patch: https://gerrit.chromium.org/gerrit/#/c/70460/

Two of the invalid file tests needed to be updated.

Change-Id: I6a4acd2f7a60b1fb4cbcc8e0dad4eab4248431e3
2014-07-17 20:51:59 -07:00
Adrian Grange
8cb8aef7c7 Merge "Modified frame buffer handling" 2014-07-17 12:15:16 -07:00
Scott LaVarnway
ba0652e83a Merge "Added vp9_sad64x64_neon(), vp9_sad32x32_neon()" 2014-07-17 11:42:16 -07:00
Adrian Grange
f68aaa38d6 Modified frame buffer handling
This patch is the first step toward simplifying the
frame buffer handling.

The final goal is to have a common frame buffer handling
framework for both encoder and decoder that incorporates
the existing ability to use externally allocated memory.

Change-Id: I2c378a4f54a39908915f46c4260e17a080db7ff1
2014-07-17 11:06:35 -07:00
Scott LaVarnway
696fa52eaa Added vp9_sad64x64_neon(), vp9_sad32x32_neon()
and vp9_sad16x16_neon()

On a Nexus 7, vpxenc (in realtime mode, speed -6)
reported a performance improvement of ~17%.

Change-Id: I91e070cde2973451083d3f3d63b49b7886de9a85
2014-07-16 12:54:46 -07:00
Deb Mukherjee
1f6aaeddc5 Merge "Some extra bit probability cleanups" 2014-07-14 17:26:54 -07:00
Jingning Han
6ce515b9ff Merge "Fix chrome valgrind warning due to the use of mismatched bsize" 2014-07-13 11:07:44 -07:00
James Zern
0999a2a24e Merge "vp9_loopfilter.c: cosmetics" 2014-07-11 16:02:21 -07:00
Jingning Han
3cddd81c6d Fix chrome valgrind warning due to the use of mismatched bsize
This commit fixes a mismatched use case of block size in non-RD
intra prediction check. The residual SSE and variance should be
calculated per transform block size, instead of operating block
size, which caused chrome valgrind warning on conditional jump
based on uninitialized value (webm issue 823). This commit
resolves this issue.

Change-Id: I595c06599c7e0fd0e4a08736519ba68fc14bc79a
2014-07-11 15:49:22 -07:00
Yunqing Wang
7e340614c1 Merge "Remove unnecessary assertions" 2014-07-11 13:47:03 -07:00
Deb Mukherjee
6957e7a077 Some extra bit probability cleanups
Refactoring to remove some duplication of probability
tables between tokenization and detokenization.

Change-Id: I2fc6a6497f9c0410021a9b41f828bc58a864e466
2014-07-11 11:39:18 -07:00
Yunqing Wang
978642a426 Remove unnecessary assertions
Removed 2 unnecessary assertions.

Change-Id: I0f8877d0494bf3ecdb0d7931ccbcaa8289e01d8b
2014-07-11 10:48:57 -07:00
Yaowu Xu
a75d55df1b Remove an unused parameter
Change-Id: I6ad6fd75dc3c9e6218d88148cf49e205398e2af5
2014-07-11 08:10:04 -07:00
James Zern
8a7cc1f47b Merge "update vp9_thread.c" 2014-07-10 23:19:55 -07:00
James Zern
8701ed0270 update vp9_thread.c
pull the latest from libwebp.

Original source:
 http://git.chromium.org/webm/libwebp.git
 100644 blob 264210ba2807e4da47eb5d18c04cf869d89b9784 src/utils/thread.c

commit 46fd44c1042c9903b2f1ab87e9f200a13c7e702d
Author: James Zern <jzern@google.com>
Date:   Tue Jul 8 19:53:28 2014 -0700

    thread: remove harmless race on status_ in End()

    if a thread was still doing work when End() was called there'd be a race
    on worker->status_. in these cases, however, the specific value is
    meaningless as it would be >= OK and the thread would have been shut
    down properly, but we'll check 'impl_' instead to avoid any potential
    TSan/DRD reports.

    Change-Id: Ib93cbc226a099f07761f7bad765549dffb8054b1

Change-Id: Ib0ef25737b3c6d017fa74822e21ed58508230b91
2014-07-10 12:20:54 -07:00
Yunqing Wang
1226d133df Merge "Refactor vp9_diamond_search_sad function" 2014-07-10 11:06:32 -07:00
Yunqing Wang
46441ec5c8 Merge "Refactor refining_search_sad code" 2014-07-10 10:43:00 -07:00
hkuang
51e9788e58 Fix a bug in boundary checking.
Change-Id: Ifc741da9da6f61c8d3c1f675ec6b8a96570f877d
2014-07-10 09:43:04 -07:00
Yunqing Wang
75cd57503d Refactor vp9_diamond_search_sad function
Currently, vp9_diamond_search_sadx4() is only called when sse3 is
enabled, which is improper since sse2 optimization of sdx4df
functions are available. Changed to always use
vp9_diamond_search_sadx4().

Change-Id: I4b95d6b7a3c6c645783c373f0ba8d645ece24717
2014-07-10 09:19:03 -07:00
James Zern
58609335b1 vp9_loopfilter.c: cosmetics
- fix indent, spelling
- drop some whitespace in some comments
- add an assert in vp9_setup_mask, it shouldn't be called on decode
  error

Change-Id: Ic312a815e977a6f9cb81ceb7b039eeada76c5aa0
2014-07-09 17:27:57 -07:00
Yunqing Wang
30117a576d Refactor refining_search_sad code
There are sse2 optimization of sdx4df functions. Instead of calling
vp9_refining_search_sadx4 only when sse3 is enabled, call it always.

Change-Id: I24f93818f7d4209d1425039e0eb099ff9ff08fe9
2014-07-09 16:50:11 -07:00
Jingning Han
f6bf614b2f Merge "Re-design quantization process for 32x32 transform block" 2014-07-09 11:55:26 -07:00
hkuang
b84ee5a3d0 Merge "Move vp9_thread.* to common." 2014-07-09 10:16:13 -07:00
Jingning Han
9ad1b9fc67 Re-design quantization process for 32x32 transform block
This commit enables a new quantization process for 32x32 2D-DCT
transform coefficient blocks. It improves the compression
performance of speed 5 by 1.4%. The overall compression gains of
speed 5 due to the new quantization scheme is 4.7%. It also includes
the SSSE3 implementation of the 32x32 quantization process.

Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b
2014-07-08 16:55:28 -07:00
Adrian Grange
7c43fb67ae Fix decoder handling of intra-only frames
This patch fixes bug 633:
https://code.google.com/p/webm/issues/detail?id=633

The first decoded frame does not have to be a keyframe,
it could be an inter-frame that is coded intra-only.

This patch fixes the handling of intra-only frames.

A test vector has also been added that encodes 3
intra-only frames at the start of the clip. The
test vector was generated using the code in the
following patch:
https://gerrit.chromium.org/gerrit/#/c/70680/

Change-Id: Ib40b1dbf91aae2bc047e23c626eaef09d1860147
2014-07-08 16:24:03 -07:00
hkuang
337e8015c9 Move vp9_thread.* to common.
Prepare for frame parallel decoding, the reference count buffers
need to be protected by mutex. Move vp9_thread.* to common
folder so that those buffers could use cross-platform mutex
from vp9_thread.*.

Change-Id: I541277cf15eefed6641555944f67f4a0bcdc8154
2014-07-07 14:52:19 -07:00
Yaowu Xu
82fd084b35 Merge "Re-design quantization process" 2014-07-01 19:04:01 -07:00
Jingning Han
9ac2f66320 Re-design quantization process
This commit re-designs the quantization process for transform
coefficient blocks of size 4x4 to 16x16. It improves compression
performance for speed 7 by 3.85%. The SSSE3 version for the
new quantization process is included.

The average runtime of the 8x8 block quantization is reduced
from 285 cycles -> 255 cycles, i.e., over 10% faster.

Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
2014-07-01 17:00:07 -07:00
Alex Converse
6c54dbcb69 Merge "BITSTREAM: Handle transform size and motion vectors more logically for non-420." 2014-06-30 17:44:01 -07:00
James Zern
44472cde55 vp9: disable postproc buffer alloc when unnecessary
the buffer is only used in encoding and only when
CONFIG_INTERNAL_STATS or CONFIG_VP9_POSTPROC is enabled.
a future change should decouple this from the frame buffer allocation
and make it conditional based on runtime flags when the above config
options are enabled.
reduces decode heap usage by at least 12%

Change-Id: Id0b97620d4936afefa538d3aadf32106743d9caf
2014-06-27 20:59:56 -07:00
Jim Bankoski
52b63c238e Merge "Better validation of invalid files" 2014-06-27 11:05:21 -07:00
Jim Bankoski
9f37d149c1 Better validation of invalid files
This patch checks that a decoder never tries to reference frame that's
outside the range of 2x to 1/16th the size of this frame.  Any attempt
to do so causes a failure.

Change-Id: I5c98fa7bb95ac4f29146f29dd92b62fe96164e4c
2014-06-27 10:03:15 -07:00
Jingning Han
46ea9ec719 Enable real-time version reference motion vector search
This commit enables a fast reference motion vector search scheme.
It checks the nearest top and left neighboring blocks to decide the
most probable predicted motion vector. If it finds the two have
the same motion vectors, it then skip finding exterior range for
the second most probable motion vector, and correspondingly skips
the check for NEARMV.

The runtime of speed -5 goes down
pedestrian at 1080p 29377 ms -> 27783 ms
vidyo at 720p       11830 ms -> 10990 ms
i.e., 6%-8% speed-up.

For rtc set, the compression performance
goes down by about -1.3% for both speed -5 and -6.

Change-Id: I2a7794fa99734f739f8b30519ad4dfd511ab91a5
2014-06-26 09:49:13 -07:00