Commit Graph

3255 Commits

Author SHA1 Message Date
James Zern
d456ccbc9d vp9_loopfilter_mmx: remove some unused tables
Change-Id: I964d25cc91c8e4864d73b142d9c7a1b39cb6cfbb
2014-12-12 11:16:24 -08:00
JackyChen
3425d6c83e Merge "Multiframe Quality Enhancement(MFQE) in VP9." 2014-12-11 16:24:08 -08:00
Alexander Voronov
6c6a97814f Prevent decoder from using uninitialized entropy context.
If decoding starts with an intra-only frame, there is a possibility
of using an uninitialized entropy context, which leads to undefined
behavior.

Change-Id: Icbb64b5b1bd1e5de2a4bfa2884e56bc0a20840af
2014-12-11 20:44:19 +03:00
Peter de Rivaz
5c22224e9e Corrected optimization of 8x8 DCT code
The 8x8 DCT uses a fast version whenever possible.
There was a mistake in the checking code which
meant sometimes the fast version was used when it
was not safe to do so.

Change-Id: I154c84c9e2d836764768a11082947ca30f4b5ab7
(cherry picked from commit fd05fb0c21)
2014-12-11 09:42:57 -08:00
JackyChen
7ac3e3c1d6 Multiframe Quality Enhancement(MFQE) in VP9.
It is the first version of MFQE in VP9. There are a few TODOs included
in this version.
Usage: Add the flag --enable-vp9-postproc when configuring the project.
In the decoder, use the flag --mfqe on the command line to enable
MFQE in postproc.
Note: A low-quality key frame is needed to see the effect of this
new patch. In my experiment, I fixed the qindex to 200 in the key frame.

Change-Id: I021f9ce4616ed3574c81e48d968662994b56a396
2014-12-11 09:19:39 -08:00
James Yu
3f7c12dab9 VP9 common for ARMv8 by using NEON intrinsics 18
Add vp9_idct32x32_add_neon.c
- vp9_idct32x32_1024_add_neon

Change-Id: Ic598b772c28bd3487a8ead7a4598a66b25f9b00f
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 18:20:04 -08:00
James Yu
3cfed4bf76 VP9 common for ARMv8 by using NEON intrinsics 14
Add vp9_idct16x16_add_neon.c
- vp9_idct16x16_256_add_neon_pass1
- vp9_idct16x16_256_add_neon_pass2
- vp9_idct16x16_10_add_neon_pass1
- vp9_idct16x16_10_add_neon_pass2

Change-Id: I54d25b54a36f4371760f54e4036693aaea40a5de
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 18:19:54 -08:00
James Yu
ce76aeb00d VP9 common for ARMv8 by using NEON intrinsics 13
Add vp9_idct8x8_add_neon.c
- vp9_idct8x8_64_add_neon
- vp9_idct8x8_10_add_neon

Change-Id: I6ee7b4496765aa36ed52990f2ef73e9f24459610
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 14:56:54 -08:00
James Yu
8c25f4af6a VP9 common for ARMv8 by using NEON intrinsics 12
Add vp9_idct4x4_add_neon.c
- vp9_idct4x4_16_add_neon

Change-Id: I011a96b10f1992dbd52246019ce05bae7ca8ea4f
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 14:49:59 -08:00
James Yu
420f58f2d2 VP9 common for ARMv8 by using NEON intrinsics 11
Add vp9_idct16x16_1_add_neon.c
- vp9_idct16x16_1_add_neon

Change-Id: I7c6524024ad4cb4e66aa38f1c887e733503c39df
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 13:06:58 -08:00
James Yu
030ca4d0e5 VP9 common for ARMv8 by using NEON intrinsics 10
Add vp9_idct32x32_1_add_neon.c
- vp9_idct32x32_1_add_neon

Change-Id: If9ffe9a857228f5c67f61dc2b428b40965816eda
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 13:04:29 -08:00
James Yu
2772b45ac0 VP9 common for ARMv8 by using NEON intrinsics 09
Add vp9_idct8x8_1_add_neon.c
- vp9_idct8x8_1_add_neon

Change-Id: I9d23e01fa96013febbf64db6c76c6c955f14e3ff
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 12:52:33 -08:00
James Yu
9114f0afdb VP9 common for ARMv8 by using NEON intrinsics 08
Add vp9_idct4x4_1_add_neon.c
- vp9_idct4x4_1_add_neon

Change-Id: Ieab9af107dbd07a4f9503bc945890c90faccb8ac
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 12:49:28 -08:00
James Yu
01fc6f51e0 VP9 common for ARMv8 by using NEON intrinsics 07
Add vp9_convolve8_neon.c
- vp9_convolve8_horiz_neon
- vp9_convolve8_vert_neon

Change-Id: I0bdd99ff72d275223fe211ac7243c25a5a60cf87
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:03:07 -08:00
James Yu
893534a996 VP9 common for ARMv8 by using NEON intrinsics 04
Add vp9_convolve8_avg_neon.c
- vp9_convolve8_avg_horiz_neon
- vp9_convolve8_avg_vert_neon

Change-Id: I617971e37b02186fec5aca181f4f9622050ea2df
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:03:07 -08:00
James Yu
d12757f5c6 VP9 common for ARMv8 by using NEON intrinsics 03
Add vp9_copy_neon.c
- vp9_convolve_copy_neon

Change-Id: I291fc5423d06240876411bbceab03eae5ef585be
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:02:46 -08:00
Scott LaVarnway
617382a2e3 VP9 common for ARMv8 by using NEON intrinsics 02
Add vp9_avg_neon.c
- vp9_convolve_avg_neon

Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 19:00:21 -08:00
hkuang
4eee74d6ed Fix clang ioc warning due to NULL src_mi pointer.
The warning only happens in the VP9 encoder's first pass because src_mi
is not set up yet. It will not cause the encoder to fail, as left_mi and
above_mi are not used in the first pass and they will be set up again
in the second pass.

Change-Id: I12dffcd5fb1002b2b2dabb083c8726650e4b5f08
2014-12-09 14:32:48 -08:00
James Yu
5b098b1825 VP9 common for ARMv8 by using NEON intrinsics 01
Add vp9_loopfilter_neon.c
- vp9_lpf_horizontal_4_neon
- vp9_lpf_vertical_4_neon
- vp9_lpf_horizontal_8_neon
- vp9_lpf_vertical_8_neon

Change-Id: I97a0d7b399a431c21ee77396be3d5f5a1f7ebccb
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 12:26:56 -08:00
Yunqing Wang
cddbdeabd0 Merge "SSSE3 Optimization for Atom processors using new instruction selection and ordering" 2014-12-08 13:34:54 -08:00
James Zern
c38d0490b3 Merge "Changes to assembler for NASM on mac." 2014-12-08 12:55:06 -08:00
hkuang
81e5cb86d3 Fix the comments.
Change-Id: I9789476865a1b24dad54115d8f7edb4fed780b90
2014-12-08 12:44:09 -08:00
levytamar82
8f9d94ec17 SSSE3 Optimization for Atom processors using new instruction selection and ordering
The function vp9_filter_block1d16_h8_ssse3 uses the PSHUFB instruction, which has a 3-cycle latency and slows execution when done in blocks of 5 or more on Atom processors.
By replacing the PSHUFB instructions with other, more efficient single-cycle instructions (PUNPCKLBW + PUNPCKHBW + PALIGNR), performance can be improved.
In the original code, the PSHUFB uses every byte and is consecutively copied.
This is done more efficiently by PUNPCKLBW and PUNPCKHBW, using PALIGNR to concatenate the intermediate result and then shift right the next consecutive 16 bytes for the final result.

For example:
filter = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8
Reg = 0,1,2,3,4,5,6,7,8,9,10,11,12,13,14,15
REG1 = PUNPCKLBW Reg, Reg = 0,0,1,1,2,2,3,3,4,4,5,5,6,6,7,7
REG2 = PUNPCKHBW Reg, Reg = 8,8,9,9,10,10,11,11,12,12,13,13,14,14,15,15
PALIGNR REG2, REG1, 1 = 0,1,1,2,2,3,3,4,4,5,5,6,6,7,7,8

This optimization improved the function performance by 23% and produced a 3% user level gain on 1080p content on Atom processors.
There was no observed performance impact on Core processors (expected).
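
The worked example above can be checked with a byte-level model. This is only a sketch in Python, not SIMD code and not the code from this commit: each function imitates one instruction on a list of 16 byte values, and the helper name shuffle_via_unpack is hypothetical. The PSHUFB model ignores the instruction's zeroing behavior for mask bytes with the high bit set, which the filter mask in the example does not use.

```python
# Byte-level model of the instructions named in the commit message.
# Registers are modelled as lists of 16 byte values, index 0 = lowest byte.

FILTER = [0, 1, 1, 2, 2, 3, 3, 4, 4, 5, 5, 6, 6, 7, 7, 8]


def pshufb(reg, mask):
    """PSHUFB model: output byte i is reg[mask[i]] (high-bit zeroing not modelled)."""
    return [reg[m & 0x0F] for m in mask]


def punpcklbw(a, b):
    """PUNPCKLBW model: interleave the low 8 bytes of a and b."""
    out = []
    for x, y in zip(a[:8], b[:8]):
        out += [x, y]
    return out


def punpckhbw(a, b):
    """PUNPCKHBW model: interleave the high 8 bytes of a and b."""
    out = []
    for x, y in zip(a[8:], b[8:]):
        out += [x, y]
    return out


def palignr(hi, lo, shift):
    """PALIGNR model: concatenate hi:lo, shift right by `shift` bytes, keep 16 bytes."""
    return (lo + hi)[shift:shift + 16]


def shuffle_via_unpack(reg):
    """Hypothetical helper: reproduce pshufb(reg, FILTER) without PSHUFB."""
    reg1 = punpcklbw(reg, reg)
    reg2 = punpckhbw(reg, reg)
    return palignr(reg2, reg1, 1)


if __name__ == "__main__":
    reg = list(range(16))
    print(shuffle_via_unpack(reg))
```

With Reg = 0..15 the model reproduces the sequence given in the example, and it agrees with the PSHUFB model for arbitrary byte contents.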

Change-Id: I3cec701158993d95ed23ff04516942b5a4a461c0
2014-12-08 13:11:01 -07:00
hkuang
f925e5ce0f Merge "Improve the performance by caching the left_mi and right_mi in macroblockd." 2014-12-08 10:24:17 -08:00
hkuang
382f86f945 Improve the performance by caching the left_mi and right_mi in macroblockd.
This improves the decode performance by ~2% on Nexus 7 2013.

Change-Id: Ie9c4ba0371a149eb7fddc687a6a291c17298d6c3
2014-12-05 16:25:42 -08:00
hkuang
eaa6deee5b Merge "Merge set_prev_mi function into encoder function." 2014-12-05 15:12:50 -08:00
Peter de Rivaz
a306bd8274 Use the RTC optimizations when in high bitdepth mode.
Change 72193 made the encoder behave differently
when configured with and without high bitdepth.
This change means the same algorithm is used for both.

Change-Id: I707a44a94afca773a9e0c2f7ebeeea83030257c5
2014-12-04 15:48:42 -08:00
hkuang
62de07c8c6 Merge set_prev_mi function into encoder function.
Change-Id: Ifcf2efbb232ea4cabcdebbe77e0820d121e4a6da
2014-12-04 14:44:23 -08:00
Marco
8fd3f9a2fb Enable non-rd mode coding on key frame, for speed 6.
For key frame at speed 6: enable the non-rd mode selection in speed setting
and use the (non-rd) variance_based partition.

Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames),
mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16.

Loss in key frame quality (~0.6-0.7dB) compared to rd coding,
but speeds up key frame encoding by at least 6x.
Average PSNR/SSIM metrics over RTC clips go down by ~1-2% for speed 6.

Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405
2014-12-03 09:18:08 -08:00
Peter de Rivaz
7e40a55ef9 Added high bitdepth sse2 transform functions
Also removes some spurious changes in common/vp9_blockd.h which
were introduced by a rebase issue between nextgen and master branches.

Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282
(cherry picked from commit 005d80cd05)
(cherry picked from commit 08d2f54800)
(cherry picked from commit 4230c2306c)
2014-12-02 11:16:24 -08:00
Alex Converse
0496d11486 Fix a tautological assert.
Change-Id: I90ad08823e1d038384536fa9f458caadc2c87f38
2014-11-24 15:01:01 -08:00
Debargha Mukherjee
e9d9f1adab Merge "Refactored idct routines and headers" 2014-11-24 12:47:03 -08:00
John Stark
71379b87df Changes to assembler for NASM on mac.
fixes non-Apple nasm part of issue #755

Change-Id: I11955d270c4ee55e3c00e99f568de01b95e7ea9a
2014-11-24 12:00:50 -08:00
Peter de Rivaz
3a8c43a479 Refactored idct routines and headers
This change is made in preparation for a
subsequent patch which adds acceleration
for the highbitdepth transform functions.

The highbitdepth transform functions attempt
to use 16/32bit sse instructions where possible,
but fall back to using the C implementations if
potential overflow is detected.  For this reason
the dct routines are made global so they can be
called from the acceleration functions in the
subsequent patch.

Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665
(cherry picked from commit 454342d4e7)
2014-11-24 09:57:40 -08:00
Debargha Mukherjee
02355a4abf Merge "Added highbitdepth sse2 acceleration for quantize" 2014-11-21 16:08:47 -08:00
Peter de Rivaz
a7b2d09f36 Added highbitdepth sse2 acceleration for quantize
Also includes block error.

(This patch is mostly cherry picked from
commit db7192e0b0)

Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78
2014-11-19 23:55:19 -08:00
Jingning Han
c42715b721 Enable ssse3 version of vp9_fdct8x8_quant
It improves the speed performance of vp9_fdct8x8_quant_sse2 by
about 5%.

Change-Id: I74b093ba4d81df64caf71ac7693f3d917f673097
2014-11-19 22:14:19 -08:00
Jingning Han
bf63652d34 Merge "Combine fdct8x8 and quantization process" 2014-11-19 11:17:44 -08:00
Jingning Han
ce77a7bcb0 Merge "Add sse2 version for vp9_quantize_fp" 2014-11-19 11:17:36 -08:00
Jingning Han
c6908fd5f7 Combine fdct8x8 and quantization process
This commit reworks the forward transform and quantization process
for 8x8 block coding. It combines the two operations in a single
function to save a store/load stage of the original transform
coefficients. Overall the speed -6 is slightly faster (around 1%
range). The compression performance of speed -6 is improved by
3.4%.

Change-Id: Id6628daef123f3e4649248735ec2ad7423629387
2014-11-18 18:10:56 -08:00
Jingning Han
2d3cc8ea2b Add sse2 version for vp9_quantize_fp
vp9_quantize_fp is the quantization process used by rtc coding
mode. This commit adds a sse2 implementation of it. The
implementation is modified based on vp9_quantize_b_sse2. No speed
difference from ssse3 version.

Change-Id: I24949c5b27df160b4f35117d28858d269454e64a
2014-11-18 09:01:41 -08:00
Yaowu Xu
1687c47bfd change to call vp9_refining_search_sad() directly
The function pointer in compressor instance does not change, so this
commit changes to call the function directly.

Change-Id: I9c9c460e3475711c384b74c9842f0b4f3d037cc5
2014-11-17 11:30:17 -08:00
Peter de Rivaz
48032bfcdb Added sse2 acceleration for highbitdepth variance
Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f
(cherry picked from commit d7422b2b1e)
(cherry picked from commit 6d741e4d76)
2014-11-14 15:18:53 -08:00
Debargha Mukherjee
002172efd6 Merge "Added highbitdepth sse2 SAD acceleration and tests" 2014-11-12 21:20:34 -08:00
Peter de Rivaz
7eee487c00 Added highbitdepth sse2 SAD acceleration and tests
Change-Id: I1a74a1b032b198793ef9cc526327987f7799125f
(cherry picked from commit b1a6f6b9cb)
2014-11-12 14:25:45 -08:00
Deb Mukherjee
cc57c5e4af Iadst transforms to use internal low precision
Change-Id: I266777d40c300bc53b45b205144520b85b0d6e58
(cherry picked from commit a1b726117f)
2014-11-07 14:19:45 -08:00
Yaowu Xu
98492c1091 Merge "Change the use of a reserved color space entry" 2014-11-07 06:24:59 -08:00
Yaowu Xu
af3519a385 Change the use of a reserved color space entry
This commit renames a reserved color space entry to BT_2020. It intends
to provide support for the VP9 bitstream to pass along the color space
type defined in BT.2020 (Rec.2020).

Please note this entry does not have any effect on encoding/decoding
behavior, but allows applications to pass the information along
from the encoding end to the decoding end.

Change-Id: I4678520e89141ea5e8900f7bd1c0e95b710b7091
2014-11-06 19:14:21 -08:00
Yunqing Wang
1228433430 Modify the frame context memory deallocation
This patch was to fix the vpxdec fuzzing3 test failure. When an
error occurs, setjmp() is invoked, which calls the decoder
removing routine. In multiple thread situation, other threads
could try to access the frame context memory that is already
deallocated, thus causing a segfault.

An invalid unit test was added for this issue.

Change-Id: Ida7442154f3d89759483f0f4fe0324041fffb952
2014-11-06 11:34:19 -08:00
hkuang
e8860693ea Merge "Totally remove prev_mi in VP9 decoder." 2014-11-05 17:48:47 -08:00
hkuang
4cc7c5a17f Totally remove prev_mi in VP9 decoder.
This will save the memory and improve the decode speed due to
removing unnecessary memset of big prev_mi array for
all the key frames.

Decoding an all-key-frame 1080p video shows a speed improvement of around 2%.

Change-Id: I6284a445c1291056e3c15135c3c20d502f791c10
2014-11-05 16:14:30 -08:00
Yaowu Xu
2c4fee17bc Fix visual studio 2013 compiler warnings
For builds configured with --enable-vp9-highbitdepth

Change-Id: I2b181519d7192f8d7a241ad5760c3578255f24e6
2014-11-05 13:47:28 -08:00
hkuang
23da920a8e Fix the memory leak due to missing free frame_mvs.
Change-Id: I2ceee7341d906259002c0ea31ea009ae32c04bfd
2014-11-04 13:28:31 -08:00
Yunqing Wang
6d90a9d289 Merge "WORKAROUND FIX FOR GCC4.9.1" 2014-11-03 16:56:38 -08:00
levytamar82
86175a5788 WORKAROUND FIX FOR GCC4.9.1
In the function mb_lpf_horizontal_edge_w_avx2_16, the usage of the intrinsic
_mm256_cvtepu8_epi16 triggers a compiler bug in gcc 4.9.1.
Until it is fixed, I created a workaround that performs the up-convert by
using broadcast128+shuffle.
The bug was reported here:
https://code.google.com/p/webm/issues/detail?id=867

Change-Id: I73452e6806f42e0fadcde96b804ea3afa7eeb351
2014-11-01 11:27:28 -07:00
hkuang
55577431ae Bind motion vectors with frame buffer structure.
This will save a lot of memory in the decoder by removing prev_mi,
but prev_mi is still needed in the encoder, so this will slightly increase
memory use in the encoder.

Change-Id: I24b2f1a423ebffa55a9bd2fcee1077dac995b2ed
2014-10-31 17:01:08 -07:00
Hui Su
d478d2df37 Merge "Move the definition of switchable filter numbers into enum INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and IF_DIFF_REF_FRAME_ADD_MV." 2014-10-30 11:05:04 -07:00
James Zern
01900edc40 Merge changes I8a9c9019,Ic7b2faa3,I44d42a50,I3f3a3924,I10747b32,I31b49c9e
* changes:
  add vp9_loop_filter_data_reset
  move LFWorkerData allocation to VP9LfSync
  vp9_loop_filter_frame_mt: remove pbi dependency
  vp9_loop_filter_frame_mt: pass planes directly
  vp9_loop_filter_frame_mt: pass VP9LfSync directly
  vp9: store TileWorkerData allocations separately
2014-10-24 11:43:51 -07:00
James Zern
01483677e5 add vp9_loop_filter_data_reset
Change-Id: I8a9c9019242ec10fa499a78db322221bf96a0275
2014-10-23 19:43:48 +02:00
Yunqing Wang
330a6b2756 Merge "vp9_ethread: allocate frame contexts outside VP9_COMMON struct" 2014-10-22 17:10:39 -07:00
Yunqing Wang
7c7e4d4eb8 vp9_ethread: allocate frame contexts outside VP9_COMMON struct
This patch allocated frame contexts outside VP9_COMMON. This allows
multiple threads to share the same copy of frame contexts, and
reduces the overhead. It also guarantees the correct update of
these contexts during bitstream packing. This patch doesn't change
encoding result.

Change-Id: Ic181a2460b891d1d587278a6d02d8057b9dbd353
2014-10-22 15:03:12 -07:00
Frank Galligan
95a568b3a8 Fix Neon convolve profiling
When profiling, gprof can't distinguish between matching labels in
different files.

Change-Id: I56770df212ed314a0d8568071fa8157624ef1e8f
2014-10-22 10:51:53 -07:00
Hangyu Kuang
9ce3a7d76c Implement frame parallel decode for VP9.
Using 4 threads, frame parallel decode is ~3x faster than single thread
decode and around 30% faster than tile parallel decode for frame parallel
encoded video on both Android and desktop with 4 threads. Decode speed is
scalable to threads too which means decode could be even faster with more threads.

Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e
2014-10-22 10:50:58 -07:00
Hui Su
8947b18fa3 Move the definition of switchable filter numbers into enum
INTERP_FILTER; Modify the macro ADD_MV_REF_LIST and
IF_DIFF_REF_FRAME_ADD_MV.

Change-Id: Ic36c9eb6ccb8ec324d991f7241e42b40b60b1dcb
2014-10-21 15:41:37 -07:00
Yunqing Wang
687c56e802 Merge "SAD32xh and SAD64xh for AVX2" 2014-10-20 12:37:55 -07:00
levytamar82
7045aec00a SAD32xh and SAD64xh for AVX2
All SAD functions that process at least 32 consecutive elements are optimized
for AVX2:
vp9_sad64x64
vp9_sad64x32
vp9_sad32x64
vp9_sad32x32
vp9_sad32x16
vp9_sad64x64_avg
vp9_sad64x32_avg
vp9_sad32x64_avg
vp9_sad32x32_avg
vp9_sad32x16_avg
The functions that appeared as hotspots are vp9_sad32x32 and vp9_sad64x64.
vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90%;
both of them gave an overall ~2.3% user-level gain.
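
For reference, the quantity these functions compute is a sum of absolute differences over a block. The following is a minimal scalar sketch in Python, not the AVX2 code from this commit; the function name sad and its parameter layout (flat sample lists plus strides) are assumptions made for illustration, and the _avg variants are not modelled.

```python
def sad(src, ref, width, height, src_stride, ref_stride):
    """Sum of absolute differences between two width x height blocks.

    src and ref are flat lists of samples; a stride is the distance
    between the starts of consecutive rows in the respective list.
    """
    total = 0
    for row in range(height):
        s = row * src_stride
        r = row * ref_stride
        for col in range(width):
            total += abs(src[s + col] - ref[r + col])
    return total


if __name__ == "__main__":
    # 32x32 block with stride 32: every sample differs by 3.
    src = [10] * (32 * 32)
    ref = [7] * (32 * 32)
    print(sad(src, ref, 32, 32, 32, 32))
```

Each row of a 32- or 64-wide block is a run of 32 or more consecutive bytes, which is why wide vector registers fit these block sizes well.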

Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd
2014-10-19 13:59:10 -07:00
Peter de Rivaz
73ae6e495c Add highbitdepth function for vp9_avg_8x8
Cherry-picked from https://gerrit.chromium.org/gerrit/#/c/71914/
(a92f987a6b) on highbitdepth branch.

Change-Id: I6903e4e4cb57d90590725c8a1c64c23da7ae65e8
2014-10-17 17:04:37 -07:00
James Zern
e9b8810b4d move LFWorkerData allocation to VP9LfSync
this removes an assumption that worker->data1 would be pointing to a
TileWorkerData allocation.
additionally, within the multi-threaded loopfilter pass VP9LfSync as a
parameter to the worker hook, removing the need for a shadow pointer in
LFWorkerData.

Change-Id: Ic7b2faa34e3eb59dbcb8a7c67f333448fa047c88
2014-10-16 18:55:46 +02:00
Alex Converse
00a9671bbd Merge "Add a 32-bit friendly sse2 quantizer." 2014-10-14 14:35:02 -07:00
Alex Converse
7497d2fb23 Add a 32-bit friendly sse2 quantizer.
This is based on the 64-bit ssse3 quantizer.

1.1x speedup for screen content at speed 7.

Change-Id: I57d15415ef97c49165954bbe3daaaf9318e37448
2014-10-14 11:37:41 -07:00
hkuang
c38a8edf16 Merge "Remove extra line." 2014-10-14 11:05:01 -07:00
Adrian Grange
f7c336aa19 Merge "Remove mi_grid_base_array from VP9_COMMON (unused)" 2014-10-14 07:50:17 -07:00
hkuang
c5fd035ce0 Use pre increment.
Change-Id: I016b4e77d8268e189473f4c382603afe1ae1750f
2014-10-13 14:07:03 -07:00
Adrian Grange
83b63d573a Remove mi_grid_base_array from VP9_COMMON (unused)
Change-Id: I4b4764463f5a7cdc01ec004b882c6237466c74b0
2014-10-13 11:54:05 -07:00
hkuang
dbe91de6d4 Remove extra line.
Change-Id: I5e79c276d8953ae17cd35b2846e6e40660c037c3
2014-10-10 14:59:04 -07:00
hkuang
effc1a6f56 Correct the code format.
Change-Id: If2de420f8123a4e8bf635dd29205dd74ee174eee
2014-10-09 17:57:45 -07:00
Deb Mukherjee
9a29fdbae7 Merge "Rename highbitdepth functions to use highbd prefix" 2014-10-09 15:39:56 -07:00
Deb Mukherjee
1929c9b391 Rename highbitdepth functions to use highbd prefix
Uses highbd_ prefix convention consistently.

Change-Id: I58f7f799a7ff8e32701bcd71c955bcf1cdd4581e
2014-10-09 14:40:40 -07:00
James Zern
caa0f81914 vp9_rtcd_defs: fix vp9_avg_8x8 declaration
vp9_avg_8x8 does not depend on x86inc, fixes 32-bit OS X build

Change-Id: I709b874ea84bf57c8cdb5ac7d43eecc6b8c1a2dd
2014-10-09 10:44:42 +02:00
Jingning Han
f6ff752c63 Merge "Clean up header files in vp9_blockd.h and related files" 2014-10-08 15:25:09 -07:00
Jingning Han
1c3398675f Merge "Use #define statement for MAX_MB_PLANE" 2014-10-08 15:24:56 -07:00
Jim Bankoski
20254d1daa Merge "experimental : partition using 1/8 x 1/8 image" 2014-10-08 09:04:26 -07:00
Jim Bankoski
0ce51d823f experimental : partition using 1/8 x 1/8 image
The concept:

There's too much noise in source pixels for variance and at low bitrate
the reconstructed looks nothing like the source so we have problems
getting good partitionings with either.   This skirts the issue by using
a box blur scaled down version for variance calculations.  To compare
against source_var_ moved keyframe to be rd based like source_var.
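
The idea of measuring variance on a blurred, scaled-down image can be illustrated with a small sketch. This is an assumption-laden Python model, not the code from this commit: the function names are hypothetical, the blur is a plain 8x8 block average with rounding, and the image is a list of rows.

```python
def box_downscale_8(img):
    """Average each full 8x8 block of img (a list of rows) into one sample."""
    height = len(img) - len(img) % 8
    width = len(img[0]) - len(img[0]) % 8
    out = []
    for by in range(0, height, 8):
        row = []
        for bx in range(0, width, 8):
            total = sum(img[y][x]
                        for y in range(by, by + 8)
                        for x in range(bx, bx + 8))
            row.append((total + 32) // 64)  # rounded mean of 64 samples
        out.append(row)
    return out


def variance(img):
    """Population variance of all samples in img (a list of rows)."""
    samples = [v for row in img for v in row]
    n = len(samples)
    mean = sum(samples) / n
    return sum(v * v for v in samples) / n - mean * mean


if __name__ == "__main__":
    # Pixel-level checkerboard "noise" on a flat 16x16 area.
    noisy = [[2 * ((x + y) % 2) for x in range(16)] for y in range(16)]
    print(variance(noisy), variance(box_downscale_8(noisy)))
```

In this toy case the source has nonzero variance purely from pixel-level noise, while the 1/8 x 1/8 version is flat, so a variance-based partition decision made on the small image is not misled by the noise.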

Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624
2014-10-07 16:36:14 -07:00
Jingning Han
608c4acc1f Merge "Remove vp9_blockd.h from vp9_common_data.c" 2014-10-07 15:34:07 -07:00
Jingning Han
3bbec7b422 Merge "Replace mi_width_log2() with mi_width_log2_lookup table" 2014-10-07 15:33:52 -07:00
Jingning Han
27c9577f8e Merge "Take out repeated block width/height lookup functions" 2014-10-07 15:33:45 -07:00
Jingning Han
6ad272cb84 Clean up header files in vp9_blockd.h and related files
This commit breaks the overly broad header files into more
targeted and smaller ones, to help better structure the system
layout.

Change-Id: I7b24559d3ea6e582cf5d9bbe8f71459f9824d71b
2014-10-07 15:17:10 -07:00
Jingning Han
3c28fb768d Use #define statement for MAX_MB_PLANE
Change-Id: I3a7f83ab1dbfcedc8a82fe798c2fa30dd9c7d696
2014-10-07 15:00:22 -07:00
Jingning Han
d7febaf5c5 Remove extra empty line
Change-Id: I6f2865bb8ba9295f5c45a4cad065aecbe1e63c32
2014-10-07 14:06:54 -07:00
Jingning Han
bd9706506f Merge "Move inter filter defs to vp9_filter.h" 2014-10-07 13:42:26 -07:00
Jingning Han
ebd724852e Remove vp9_blockd.h from vp9_common_data.c
The basic data defs should be above block operation level.

Change-Id: I7dd9836d01120ab75e0c472baac9f15495ed0db5
2014-10-07 13:02:54 -07:00
Jingning Han
7ee58985bd Replace mi_width_log2() with mi_width_log2_lookup table
Change-Id: If0ea98aa139d14d40cd924114e18396aff36b5a5
2014-10-07 12:45:25 -07:00
Jingning Han
b66f7016c1 Take out repeated block width/height lookup functions
The functions b_width_log2 and b_height_log2 only do direct
table fetch. This commit unifies such use cases by using the
table directly and removes these functions.

Change-Id: I3103fc6ba959c1182886a2799d21b8b77c8a7b6b
2014-10-07 12:33:07 -07:00
Jingning Han
5d9cdac087 Move inter filter defs to vp9_filter.h
Add comments on the use case of these definitions. Further reduce
the scope of header file in vp9_context_tree.h.

Change-Id: Ic4a7638e838d0ac441b64abfc56e57354c059d75
2014-10-07 12:16:37 -07:00
Deb Mukherjee
cfc337aae8 Merge "Resolves some static analysis / undefined warnings" 2014-10-07 12:15:26 -07:00
Deb Mukherjee
fced63ed30 Resolves some static analysis / undefined warnings
Also fixes a case of distortion becoming negative and messing
up the RDCOST computation.

Change-Id: Id345af9e8dfff31ade622be5756e51f2cdface53
2014-10-07 11:20:56 -07:00
JackyChen
a9f479682a Merge "Add SSE2 code and unit test for VP9 denoiser." 2014-10-07 10:51:55 -07:00
JackyChen
80465dae88 Add SSE2 code and unit test for VP9 denoiser.
This SSE2 is based on VP8 denoiser's SSE2 code. In VP8, there are
only 16x16 blocks in denoiser, while in VP9, there are 13 different
block sizes.

By adding this SSE2 code, encoder speed improves by around
20% (C code vs. SSE2 code), varying between clips.

The unit test for VP9 denoiser is to confirm that the SSE2 code is
bit-exact with the C code. The unit test covers all block sizes.

Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d
2014-10-06 15:27:40 -07:00
Jingning Han
12344f2697 Add range check in inverse ADST 16x16
Bit-stream clarification related to Issue 868.

Change-Id: I92a7bc5b7782c9ea5c3f6cceec761742183c9514
2014-10-06 11:07:58 -07:00
Deb Mukherjee
3bcc2af8cd Some data type changes in vp9_idct.c
Resolves a visual studio warning, and includes some cleanups.

Change-Id: I6a7576ef323c475b7d1c659800cd82c6cb1fd18d
2014-10-04 16:03:04 -07:00
Deb Mukherjee
8a01074d04 Merge "Incorporate WRAPLOW macro into non-highbitdepth tx" 2014-10-03 12:45:39 -07:00
Deb Mukherjee
d50716face Incorporate WRAPLOW macro into non-highbitdepth tx
Incorporates the WRAPLOW macro into the non-highbitdepth transforms
to aid hardware verification between a software C model and an
intended hardware implementation through the use of the configure
options: --enable-experimental --enable-emulate-hardware.
Note that to avoid further discrepancies between the sse/sse2
implementations of the transforms and the C implementation, when the
emulate hardware option is invoked, we also disable sse/sse2/etc.
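
As a rough illustration of what emulating hardware means here: the definition of WRAPLOW is not shown in this log, so the Python sketch below simply assumes that the emulation wraps intermediate transform values to a signed 16-bit range, as fixed-width hardware registers would. The function name wrap_low_16 is hypothetical.

```python
def wrap_low_16(x):
    """Keep the low 16 bits of x and reinterpret them as a signed value.

    Assumption: this models 16-bit hardware arithmetic; it is not the
    WRAPLOW macro itself.
    """
    x &= 0xFFFF
    return x - 0x10000 if x & 0x8000 else x


if __name__ == "__main__":
    for value in (32767, 32768, -32769, 65536):
        print(value, "->", wrap_low_16(value))
```

Values that already fit in 16 bits pass through unchanged, so a C model built this way differs from the unwrapped computation only when an intermediate value overflows.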

Also includes some minor cleanups/renaming etc.

Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287
2014-10-03 11:38:05 -07:00
Yaowu Xu
f809475c73 Merge "Make iscan and scan neighbor arrays static const." 2014-10-02 15:15:58 -07:00
Yaowu Xu
9712bc691d Make iscan and scan neighbor arrays static const.
This commit changes the tables to be read only, which fixes
issue #866

Change-Id: I85bbe03f9d344f50570f8c1c61699bdc5cee248f
2014-10-02 14:08:14 -07:00
Alexander Voronov
befc36d4a7 Fix invalid memory access in inter prediction (issue 853).
Change-Id: I5a566d6ade720f212a60c0ad5d6f1ee1d1d37f2e
2014-10-02 18:57:47 +04:00
Jingning Han
c7d719325e Merge "Remove redundant header file from vp9_idct.h" 2014-10-01 17:05:36 -07:00
Deb Mukherjee
30fbf23fda Merge "High-bitdepth bugfixes" 2014-10-01 16:47:43 -07:00
Jingning Han
74c2997bc9 Remove redundant header file from vp9_idct.h
Change-Id: Id92544762e7b96d3c729dfc8e04ecff91cbcc7f9
2014-10-01 14:58:27 -07:00
Deb Mukherjee
a160d72522 High-bitdepth bugfixes
Miscellaneous bug-fixes for high bitdepth functionality.
With this patch, high bit-depth profiles become mostly functional,
except for an intermittent assert failure issue that is being
tracked.

Change-Id: I6a7fcbdcf1e5b09842e88535f8442d2e1230748c
2014-10-01 14:18:11 -07:00
Jingning Han
3d17f0d45f Remove repeated vpx_integer.h from vp9_prob.h
The file vpx_integer.h has been included and used in the parent
file vp9_common.h.

Change-Id: I9c65f08353576f9ef1e5ea17244fc5ca964ec002
2014-10-01 12:45:52 -07:00
Jingning Han
764c00ab50 Use precise header files in vp9_entropymv.h
The commit cleans up the header files in vp9_entropymv.h. This
file should only depend on vp9_mv.h and vp9_prob.h. Remove the
giant vp9_blockd.h from header file list.

Change-Id: I44cd26d2cfd10a16a9325778347dd53f888a874c
2014-10-01 12:41:08 -07:00
Deb Mukherjee
872b207b78 Moves transform type defines to vp9_common
Moves transform type defines to vp9_common.h from vp9_idct.h
so that they can be included in vp9_rtcd_defs.pl safely.

Change-Id: Id5106227bee5934f7ce8b06f2eb9fa8a9a2e0ddb
2014-09-30 19:44:17 -07:00
James Zern
4a296e6baa Revert "Fix compiling error in vp9_idct.h"
This reverts commit eafc8c9c40.

tran_low_t/tran_high_t don't belong in a public header, they're private.
Similarly the public headers shouldn't rely on config defines,
vpx_config.h isn't installed.

Change-Id: I194ec273598da418df8dd727b6c0e78a556740ad
2014-09-30 16:08:55 -07:00
Jingning Han
0829d2be7f Remove redundant header file declaration
Some header files included in vp9_idct.c are already included in vp9_idct.h.
This commit removes these redundant declarations.

Change-Id: I0238c27e4efff5c981eb437022c6bc6970c4e445
2014-09-30 09:13:00 -07:00
Jingning Han
eafc8c9c40 Fix compiling error in vp9_idct.h
This commit fixes a compiling error in vp9_idct.h, where the codec
checks that the intermediate steps of transformation fit within
16-bit length. The issue was due to broken file dependency.

Change-Id: Ib22bba13a1e6df28489cb23d6774c561969f1fdc
2014-09-30 09:11:59 -07:00
Deb Mukherjee
9ed23de13f Miscellaneous decoder changes for high bitdepth
Also includes yv12 config changes.

Change-Id: Iacf40d8bf486815b54c32a127ce3cd4516b7e44f
2014-09-29 11:27:45 -07:00
hkuang
c53a95ad1d Avoid calling vp9_is_scaled two times in a function.
Use a local variable to hold the result of vp9_is_scaled.

Change-Id: I5e203909805923e20eefef596bc84424da47dbe2
2014-09-25 11:52:16 -07:00
Yaowu Xu
845d4f333d Fix a couple of comments
The first comment is obsolete given the way is now normative in VP9
bitstream. The second comment line was too long.

Change-Id: I6546585babf60d466485ddcf2daa6d2fa79e999a
2014-09-25 08:24:16 -07:00
Yaowu Xu
d237d483a5 Correct the condition for border extension
As reported in issue #850, the condition for border extension was not
complete. This commit added the case when the scaling is enabled.

This fixes issue #850.

Change-Id: I67768b23f0dcc4ac9a9aa0a0825b0fe8cb85a72e
2014-09-24 11:26:40 -07:00
Yaowu Xu
148c57d231 Merge "Fix invalid memory access on 2x downscale." 2014-09-24 09:58:05 -07:00
Alexander Voronov
eafd842a3e Fix incorrect subsampling used in VP9 non420 loopfilter.
Change-Id: Ia959e24b4676242c80a8867d2c39a6fee90f71a5
2014-09-24 17:01:09 +04:00
Deb Mukherjee
e2a90c0b21 Merge "High bit-depth loop/arf/postproc filter functions" 2014-09-23 17:26:32 -07:00
Deb Mukherjee
931ed516ba High bit-depth loop/arf/postproc filter functions
Adds high-bitdepth loopfilter, temporal filter and postproc functions

Change-Id: I81c8a9176890784686bc4f2af0d550d243b3b2d3
2014-09-23 16:20:43 -07:00
hkuang
c70cea97ac Remove mi_grid_* structures.
mi_grid_* are arrays of pointer to pointer. They save the pointers that point
to the MIs in cm->mi. But they are unnecessary and complicated. The original
goal was to remove MODE_INFO_t copy. But with an extra MODE_INFO_t pointer
inside MODE_INFO_t, same goal could be achieved.

This commit totally removes the mi_grid_* structures. But there are still
many dummy MODE_INFO_t inside cm->mi which are a waste of memory. Next commit
will do on-demand MODE_INFO_t allocation in order to save this memory.

Change-Id: I3a05cf1610679fed26e0b2eadd315a9ae91afdd6
2014-09-19 21:27:11 -07:00
Deb Mukherjee
822b51609b High bit-depth coefficient coding functions
Tokenization and Detokenization enhancements for 10/12 bit

Change-Id: I3c269ec30f8eb160ee024905638a193975237559
2014-09-19 15:21:24 -07:00
Frank Galligan
49dc7b05d0 Merge "FIX: vp9_loopfilter_intrin_sse2.c" 2014-09-18 15:10:16 -07:00
Scott LaVarnway
13284311eb FIX: vp9_loopfilter_intrin_sse2.c
Fixes Visual Studio build failures

Change-Id: I233719cd63b3ad0db16e2834bf1d7ea1df805880
2014-09-18 13:09:13 -07:00
Deb Mukherjee
6d0ee9860e Merge "Adds high bitdepth convolve, interpred & scaling" 2014-09-18 10:52:23 -07:00
Deb Mukherjee
0d3c3d3ce7 Adds high bitdepth convolve, interpred & scaling
Change-Id: Ie51c352a6b250547207cbc1ebba833a01ed053e3
2014-09-18 07:26:17 -07:00
Frank Galligan
4e066299d9 Merge "Improved mb_lpf_horizontal_edge_w_sse2_16() #2" 2014-09-17 18:52:30 -07:00
Scott LaVarnway
217e3cb1fb Improved mb_lpf_horizontal_edge_w_sse2_16() #2
The decoder performance improved up to 1% for the
test clips used.

Change-Id: I4621112bdccfba01640322facfa4ba8da8290ea5
2014-09-17 17:25:20 -07:00
Deb Mukherjee
7d0e4f9ad1 Resolves a few gcc warnings
clang is fine.

Change-Id: Ia4e9ff17ea3b86bc87dca35828ee7ce45bea6994
2014-09-16 22:44:40 -07:00
Deb Mukherjee
f7cf05cfe0 Merge "Adding high-bitdepth intra prediction functions" 2014-09-16 17:10:24 -07:00
Frank Galligan
ecd7e3d2b7 Merge "Remove memset of every external frame buffer." 2014-09-16 15:17:26 -07:00
Deb Mukherjee
81a8138fc3 Adding high-bitdepth intra prediction functions
Change-Id: I6f5cb101e2dc57c3d3f4d7e0ffb4ddbed027d111
2014-09-16 15:04:39 -07:00
Deb Mukherjee
5cd0aab81a Adds high bitdepth quantization functions
Adds various high bitdepth quantization functions.

Change-Id: I36fc0bf75a1bd15128ed271df8723de0ac134b0c
2014-09-16 14:55:37 -07:00
Yaowu Xu
601f3a886e Fix a performance regression
This commit adds back sse2 or ssse3 optimized versions of a couple of
functions, fixing a ~10% performance regression.

Change-Id: I049786906e5a641224dced63c6492aec9d86d183
2014-09-16 11:18:46 -07:00
Frank Galligan
175d9dfe0a Remove memset of every external frame buffer.
Libvpx was memsetting every external frame buffer before decode. This
was to work around a valgrind issue in our C loop filter. Most of
the time this was not needed and we have noticed some significant
performance loss on some platforms. Now we require the application to
zero out the buffers if it is using external frame buffers.

Change-Id: I7330d00a315e65137ed30edd5f813e8929b76242
2014-09-15 15:37:36 -07:00
Alexander Voronov
29071a418e Fix invalid memory access on 2x downscale.
The issue was discovered on bitstream with 2x vertical downscale. For
zero MVs, y_pad is set to 1 only when vertical convolution is
required. The original code assumes that for y_step_q4 == 32 we don't
perform vertical convolution. But vp9_setup_scale_factors_for_frame()
sets convolve functions so that when x_step and y_step are both not
equal to 16, convolve in both directions is performed. And convolve()
unconditionally subtracts one stride from the source pointer when it calls
convolve_horiz(). This leads to invalid memory access.

Change-Id: I882dfa6081a58e172b5ffa55842bfcd6727f10bf
2014-09-15 17:50:20 +04:00
Jingning Han
82fad6f4b6 Merge "Add a note for enum values of MV_REFERENCE_FRAME" 2014-09-13 10:42:45 -07:00
Deb Mukherjee
10783d4f3a Adds high bitdepth transform functions and tests
Adds various high bitdepth transform functions and tests.
Most of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform coefficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map to int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.

Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
2014-09-11 19:56:33 -07:00
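The typedef switch described in the commit above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the macro name `CONFIG_VP9_HIGHBITDEPTH` is assumed here as the build-time stand-in for the vp9_highbitdepth configure flag, and `round_shift` is a hypothetical helper added only to show an intermediate value being narrowed back to coefficient width.

```c
#include <stdint.h>

/* Assumed stand-in for the build flag derived from the configure option. */
#ifndef CONFIG_VP9_HIGHBITDEPTH
#define CONFIG_VP9_HIGHBITDEPTH 0
#endif

#if CONFIG_VP9_HIGHBITDEPTH
typedef int64_t tran_high_t; /* intermediate transform stages */
typedef int32_t tran_low_t;  /* final transform coefficients */
#else
typedef int32_t tran_high_t;
typedef int16_t tran_low_t;
#endif

/* Hypothetical helper: round an intermediate value and narrow it to
 * coefficient width. Assumes bits >= 1. */
static tran_low_t round_shift(tran_high_t x, int bits) {
  return (tran_low_t)((x + ((tran_high_t)1 << (bits - 1))) >> bits);
}
```

Either way the flag is set, the intermediate type is twice as wide as the coefficient type, which is where the extra precision comes from.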
Deb Mukherjee
1e4136d35d Adds high bit depth sad and variance functions
Moves high bit depth sad/var functions from highbitdepth
branch to master.

Change-Id: If03845d8ef9c9c494e13350e7a587c289306b94d
2014-09-11 17:30:44 -07:00
Johann
ac2f2e7855 Merge "Allow specifying opt dependencies" 2014-09-11 16:02:41 -07:00
Johann
8645a53039 Allow specifying opt dependencies
If optimizations use more than one cpu feature, allow
specifying them so that '--disable-X' still works

https://code.google.com/p/webm/issues/detail?id=854

Change-Id: I3108ea37b397371a2be84dd5f2380b304db23f18
2014-09-11 13:43:48 -07:00
Jingning Han
3ef9786b7e Add a note for enum values of MV_REFERENCE_FRAME
Change-Id: Ifaf6738f26e86ded6eb6ea1465bad7a229612999
2014-09-11 10:55:42 -07:00
Jim Bankoski
0e66848081 Merge "LoopFilterWorkerData: remove misleading 'const'" 2014-09-10 06:33:51 -07:00
James Zern
2215d2f135 Merge changes If8887e1d,I36bfc9c8,I3d1e6c42
* changes:
  vp9_dthread: simplify loop_filter_row_worker signature
  simplify vp9_loop_filter_worker signature
  vp9_decodeframe: simplify tile_work_hook signature
2014-09-09 16:50:28 -07:00
Dmitry Kovalev
8e205a2a09 Merge "Cleaning up and speeding up vp9_idct32x32_1024_add_sse2()." 2014-09-09 12:50:23 -07:00
James Zern
7b572c9806 LoopFilterWorkerData: remove misleading 'const'
'frame_buffer' is modified indirectly via 'planes'.

+ do the same for vp9_loop_filter_rows

Change-Id: Ibb7daa2e261064e4a5317a2969e3490e59891b82
2014-09-08 20:06:48 -07:00
James Zern
48662747bd simplify vp9_loop_filter_worker signature
use the type names directly in the function declaration rather than
(void *arg1, void *arg2)

Change-Id: I36bfc9c886310ce370bf0ca7c679ebd6e95109cc
2014-09-08 19:53:46 -07:00
Dmitry Kovalev
980abf6078 Fixing Mac OS build.
Change-Id: Ifae8906185a868a07685eb7a7da2484af95e70a7
2014-09-08 08:53:12 -07:00
Dmitry Kovalev
70092af5c0 Cleaning up and speeding up vp9_idct32x32_1024_add_sse2().
Change-Id: If91017b792572c9db6e257011ca307bef8428486
2014-09-05 18:12:30 -07:00
Dmitry Kovalev
89963bf586 Merge "Removing postproc mmx code." 2014-09-05 18:11:08 -07:00
Dmitry Kovalev
54bec0971f Merge "Initializing intra modes without vpx_once()." 2014-09-05 12:03:36 -07:00
Dmitry Kovalev
1100e262c5 Removing postproc mmx code.
Removed functions:
* vp9_post_proc_down_and_across_mmx
* vp9_mbpost_proc_down_mmx
* vp9_plane_add_noise_mmx

They all have sse2 equivalent.

Change-Id: I59c1fac12b7c96ca4538d455e4400c2b7875feff
2014-09-05 11:52:50 -07:00
James Zern
a8083449e9 fix x86-darwin* build
vp9_variance_sse2.c contains a mix of intrinsics and references to
assembly which uses x86inc.asm; it's conditionally included as a result.

Change-Id: I254451483a65881c0b8e18e27bf0c3ddef60c4ec
2014-09-04 23:32:13 -07:00
Dmitry Kovalev
490943552f Removing unused function prototypes.
Change-Id: Ia5e383e2cf18052f6f1eacf8b9495ab8e4d58878
2014-09-04 14:26:30 -07:00
Dmitry Kovalev
48197f0a70 Adding sse2 variant for vp9_mse{8x8, 8x16, 16x8}.
Change-Id: I6786d25ce4f32b8d8912f2d239a45ca15b310c4b
2014-09-03 19:02:14 -07:00
Dmitry Kovalev
bf778e7d8e Initializing intra modes without vpx_once().
Change-Id: I0a9d52432f2500f1bd8f43f229e70e38bb9a0343
2014-09-03 11:39:02 -07:00
Dmitry Kovalev
0ecc75c819 Merge "Removing MMX SAD calculation code." 2014-09-02 17:35:59 -07:00
Dmitry Kovalev
318fc0c34f Removing MMX SAD calculation code.
Removed functions:
* vp9_sad_16x16_mmx
* vp9_sad_8x16_mmx
* vp9_sad_16x8_mmx
* vp9_sad_8x8_mmx
* vp9_sad_4x4_mmx

Change-Id: Ic5174b93b64d65d846f0c11e72cab149e9472bc3
2014-09-02 14:41:36 -07:00
Deb Mukherjee
5acfafb18e Adds config opt for highbitdepth + misc. vpx
Adds config parameter vp9_highbitdepth, to support highbitdepth profiles.
Also includes most vpx level high bit-depth functions. However
encode/decode in the highbitdepth profiles will not work until
the rest of the code is in place.

Change-Id: I34c53b253c38873611057a6cbc89a1361b8985a6
2014-09-02 14:37:10 -07:00
Dmitry Kovalev
12cd6f421d Removing variance MMX code.
Removed functions:
* vp9_mse16x16_mmx
* vp9_get_mb_ss_mmx
* vp9_get4x4var_mmx
* vp9_get8x8var_mmx
* vp9_variance4x4_mmx
* vp9_variance8x8_mmx
* vp9_variance16x16_mmx
* vp9_variance16x8_mmx
* vp9_variance8x16_mmx

They all have SSE2 equivalent.

Change-Id: I3796f2477c4f59b35b4828f46a300c16e62a2615
2014-08-29 10:26:42 -07:00
Dmitry Kovalev
eba83a0fdb Merge "Replacing int_mv with MV inside the first pass code." 2014-08-25 13:56:14 -07:00
Dmitry Kovalev
a459e582cb Replacing int_mv with MV inside the first pass code.
Change-Id: Ia3be6b5a18e1ff6cc5c5f4d37e4a5d0972388308
2014-08-22 16:20:18 -07:00
Jim Bankoski
cebe2c8d88 vp9_postproc.c: unused parameter warning resolved
Change-Id: I6d77a7c775c0482fd1f9bb03ea6f336dd2973fa0
2014-08-22 13:41:07 -07:00
Yaowu Xu
23c88870ec Merge "Fix bug 804" 2014-08-21 08:56:32 -07:00
Adrian Grange
c5d8c1e785 Merge "get_ref_frame: fix test for valid buffer." 2014-08-15 10:41:28 -07:00
Adrian Grange
54f8cb78c6 Merge "Fix bug 837: realloc mode info buffers on resize" 2014-08-14 14:53:33 -07:00
Adrian Grange
89a213b4b0 get_ref_frame: fix test for valid buffer.
In the current implementation of the encoder,
frame buffers may come from the wider set of
12 such buffers, and are not restricted to the
8 allowed as reference frames. This is only
an implementation detail and does not affect
the constraint of having a total of 8 reference
buffers overall.

Change-Id: I075f777146c2df49c275d89232933f8127235175
2014-08-14 12:42:11 -07:00
Adrian Grange
4e30565a9f Fix bug 837: realloc mode info buffers on resize
The test to determine if the mode info buffers need
to be resized when the frame size changes was
incorrect, as per bug 837.

By storing the size of the allocated data structure,
a simple test determines whether to allocate more
memory when the frame size changes.

Change-Id: I1544698f2882cf958fc672485614f2f46e9719bd
2014-08-14 08:59:15 -07:00
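The "store the allocated size" approach from the bug 837 fix above might look something like the sketch below. All names (`ModeInfoStore`, `mi_alloc_size`, `ensure_mi_capacity`) are hypothetical, and a plain `int` array stands in for the mode info buffers; the real structures differ.

```c
#include <stdlib.h>

/* Hypothetical container: an int array stands in for the mode info buffer. */
typedef struct {
  int *mi;           /* buffer, NULL when nothing is allocated */
  int mi_alloc_size; /* number of elements currently allocated */
} ModeInfoStore;

/* Grow the buffer only when the new frame size needs more elements than
 * are already allocated. Returns 0 on success, -1 on allocation failure. */
static int ensure_mi_capacity(ModeInfoStore *s, int mi_rows, int mi_cols) {
  const int needed = mi_rows * mi_cols;
  if (needed <= s->mi_alloc_size) return 0; /* existing buffer is big enough */
  int *p = (int *)calloc((size_t)needed, sizeof(*p));
  if (p == NULL) return -1;
  free(s->mi);
  s->mi = p;
  s->mi_alloc_size = needed;
  return 0;
}
```

A shrinking frame size reuses the existing buffer, while a growing one triggers a fresh allocation.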
James Zern
4b79563805 Merge "get_ref_frame: check ref_frame_map value" 2014-08-12 22:48:27 -07:00
James Zern
a6b7bd6a1c Merge "fixes several -Wunused-function warnings" 2014-08-12 20:15:14 -07:00
James Zern
3caed4f8fd get_ref_frame: check ref_frame_map value
'ref_frame_map' is initialized to -1. This avoids using an invalid index if
VP9_GET_REFERENCE/VP8_COPY_REFERENCE controls are issued after a decode
error.

Change-Id: I4599762c4d0b07a5943a72bf4a86ccb596cc062a
2014-08-12 17:47:04 -07:00
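A rough sketch of the kind of guard this commit describes is shown below. The types and constants (`Pool`, `FrameBuf`, `kRefSlots`, `kFrameBufs`) are hypothetical; the sizes 8 and 12 are taken from the get_ref_frame commit listed further up, and the function body is an illustration rather than the actual implementation.

```c
#include <stddef.h>

enum { kRefSlots = 8, kFrameBufs = 12 }; /* assumed sizes, see lead-in */

typedef struct { int id; } FrameBuf; /* hypothetical frame buffer */

typedef struct {
  int ref_frame_map[kRefSlots]; /* -1 means "no buffer assigned" */
  FrameBuf bufs[kFrameBufs];
} Pool;

/* Return the buffer mapped to a reference slot, or NULL when the slot
 * index is out of range or the map entry is still unset or invalid. */
static FrameBuf *get_ref_frame(Pool *p, int index) {
  if (index < 0 || index >= kRefSlots) return NULL;
  const int buf = p->ref_frame_map[index];
  if (buf < 0 || buf >= kFrameBufs) return NULL;
  return &p->bufs[buf];
}
```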
Jim Bankoski
f452961765 fixes several -Wunused-function warnings
Change-Id: I4dc2cb255f4fe30998b6ee61184895dee9f5da8e
2014-08-12 16:51:07 -07:00
Adrian Grange
1ebf52df2c Common encode/decode function to get reference frame
Replaced encoder and decoder functions to get a pointer
to a reference frame with a common function, vp9_get_ref_frame,
and simplified it.

Change-Id: Icb206fcce8caace3bfd1db3dbfa318dde79043ee
2014-08-08 11:37:11 -07:00
Adrian Grange
75b42a4977 Remove coding_use_prev_mi member from VP9_COMMON
This was shadowing the use of error_resilient_mode, but with
the opposite sense.

Change-Id: Ie4d30263a304fe4b3e94f0c7741db6888cc6afd8
2014-08-08 09:40:38 -07:00
levytamar82
69a5f5ecf7 Fix bug 807
in the sub_pixel_*variance* function the dst is aligned to 16 bytes and not
to 32 bytes - now load unaligned data

Change-Id: I2e0b9745543697efc56fefa32857ea10117af135
2014-08-07 18:51:02 -07:00
levytamar82
839911fb6d Fix bug 804
A bug in the Microsoft compiler was found in the function
vp9_filter_block1d16_v8_avx2 and a workaround was applied.
The bug occurs when there are 4 consecutive maddubs + min + adds
intrinsic instructions.

Change-Id: I83499faeb70971e650e5663fd2490360ddb1a51b
2014-08-07 15:09:24 -07:00
levytamar82
af10457e02 Fix bug 806
in the function sad32x32x4d and sad64x64x4d the source is aligned to 16 bytes
and not to 32 bytes - the load is now unaligned.

Change-Id: I922fdba56d0936b5cf72e4503519f185645a168c
2014-08-07 14:13:30 -07:00
Dmitry Kovalev
65234504b9 Merge "Removing direct references to VP9_COMP." 2014-08-07 14:12:32 -07:00
Deb Mukherjee
a468170804 Merge "Changes hdr for profiles > 1 for intraonly frames" 2014-08-07 11:15:38 -07:00
Deb Mukherjee
09bf1d61ca Changes hdr for profiles > 1 for intraonly frames
Specifies the bit-depth, color sampling and colorspace
for intra only frames for profiles > 0

Also adds checks to ensure that profile 1 and 3 are
exclusively used for non 420 streams.

Change-Id: Icfb15fa1acccbce8f757c78fa8a2f60591360745
2014-08-07 09:47:14 -07:00
Yaowu Xu
0a2b25dcb9 configure: add --enable-coefficient-range-checking
This commit adds a configure time option used to enable strict error
checking in decoder to make sure intermediate stage coefficients of
inverse transforms are within valid range of signed 16 bit integer.

For valid VP9 input streams, intermediate stage coefficients should
always stay within the range of a signed 16 bit integer. Coefficients
can go out of this range for invalid/corrupt VP9 streams. However,
strictly checking this range for every intermediate coefficient can
be a burden for decoder, therefore such validation is only enabled
with configure option --enable-coefficient-range-checking.

Change-Id: I47d47c8c4e48a922c3d223ca59064f51b3f0f5ed
2014-08-06 17:13:16 -07:00
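The compile-time gating described in the commit above can be pictured with a sketch like this one. `COEFF_RANGE_CHECKING` is a hypothetical macro standing in for whatever the configure option defines, and both functions are illustrative only.

```c
#include <stdint.h>

/* Hypothetical stand-in for the macro set by the configure option. */
#ifndef COEFF_RANGE_CHECKING
#define COEFF_RANGE_CHECKING 1
#endif

/* True when v fits in a signed 16 bit integer. */
static int coeff_in_range(int32_t v) {
  return v >= INT16_MIN && v <= INT16_MAX;
}

/* Count intermediate coefficients outside the signed 16 bit range.
 * When checking is compiled out, the scan is skipped and 0 is returned. */
static int count_out_of_range(const int32_t *coeffs, int n) {
#if COEFF_RANGE_CHECKING
  int bad = 0;
  for (int i = 0; i < n; ++i) bad += !coeff_in_range(coeffs[i]);
  return bad;
#else
  (void)coeffs;
  (void)n;
  return 0;
#endif
}
```

Keeping the check behind a build flag means a default build pays nothing for it, which matches the cost concern stated in the commit message.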
Dmitry Kovalev
09b3d04aac Removing direct references to VP9_COMP.
Change-Id: Ic37624d807884e71f08b50fd04892f03f2708ba7
2014-08-06 12:59:02 -07:00
Johann
7516abc7dc Remove vp9_postproc_x86.h
This configuration has moved to vp9_rtcd_defs.pl

Change-Id: I71a31dbb8d79df226b60dd834324a5af69956c51
2014-08-05 15:46:13 -07:00
Jim Bankoski
128827d947 cast enums to int to avoid gcc warning in pred_common
Change-Id: Ie3e478ef4fa565225d9e19a14d2f40aad966c2b6
2014-08-04 12:07:37 -07:00
Jim Bankoski
7f63dabfe9 break at the end of clauses with assert(0) to avoid gcc warning
Change-Id: I1b3c5337f018dde27dc819ab18bd081d169a91e8
2014-08-04 08:52:53 -07:00
Jim Bankoski
3cf5908e24 uint8_t segment and skip to avoid signed / unsigned warnings
Change-Id: I2e2765b851fb0a1b15351c2aa0e079197cbee373
2014-08-04 08:52:40 -07:00
James Zern
ce896df057 Merge "vp9_entropy: inline comes first to avoid warning." 2014-08-01 19:15:34 -07:00
James Zern
3a924f6ed1 Merge "signed unsigned mismatch - warning error" 2014-08-01 16:28:38 -07:00
Jim Bankoski
9c74e6aac7 vp9_entropy: inline comes first to avoid warning.
Change-Id: I5b050122e6ed183a5b33c1f38e4fbf63b6721062
2014-08-01 16:05:30 -07:00
James Zern
1b6ac28a2f Merge "removed sign mismatch warning" 2014-08-01 14:45:12 -07:00
Frank Galligan
5f8fa13258 Merge "Added vp9_sad8x8_neon()" 2014-08-01 14:11:38 -07:00
Scott LaVarnway
98165ec074 Neon version of vp9_sub_pixel_variance8x8(),
vp9_variance8x8(), and vp9_get8x8var().

On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~1.2%.

Change-Id: I8a66ac2a0f550b407caa27816833bdc563395102
2014-08-01 11:35:55 -07:00
Frank Galligan
5487b6067c Merge "Neon version of vp9_sub_pixel_variance32x32()," 2014-08-01 09:46:37 -07:00
Scott LaVarnway
545be78136 Added vp9_sad8x8_neon()
Change-Id: I3be8911121ef9a5f39f6c1a2e28f9e00972e0624
2014-08-01 06:36:18 -07:00
Jim Bankoski
0f3689d32d signed unsigned mismatch - warning error
Change-Id: I991e36aa3cfa62aae6d27b253297dd9ca9e8bc12
2014-08-01 06:29:32 -07:00
Jim Bankoski
512f9b631f removed sign mismatch warning
Change-Id: Iaa40b472f6c1c48bb3bb47332b6fcf36d7f3c10e
2014-08-01 06:28:00 -07:00
Scott LaVarnway
6f4b8dcdc2 Neon version of vp9_subtract_block()
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~3.2%

Change-Id: I8862497264142171b7efc32df1a67714a23539f4
2014-07-31 09:28:06 -07:00
Scott LaVarnway
d39448e2d4 Neon version of vp9_sub_pixel_variance32x32(),
vp9_variance32x32(), and vp9_get32x32var().

Change-Id: I8137e2540e50984744da59ae3a41e94f8af4a548
2014-07-31 08:00:36 -07:00
Scott LaVarnway
d4a37db5b8 Neon version of vp9_quantize_fp()
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~12.4%

Change-Id: Id29d215acf58bb108489e218a259adf74b4768d7
2014-07-30 09:33:46 -07:00
Scott LaVarnway
521cf7e879 Neon version of vp9_sub_pixel_variance16x16(),
vp9_variance16x16(), and vp9_get16x16var().

On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~16.7%.

Change-Id: Ib163aa99f56e680194aabe00dacdd7f0899a4ecb
2014-07-30 08:17:32 -07:00
Scott LaVarnway
d19d222db6 Added vp9_fdct8x8_neon(), vp9_fdct8x8_1_neon()
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~3.7%.

Change-Id: I428c72c40df82c6d537955e320a8debf99343004
2014-07-29 08:56:05 -07:00
levytamar82
4ba92dc5ab Fix bug 805
Remove all the redundant dct functions (dct4x4, dct8x8) in avx2
except dct32x32; those functions were originally copied from dct_sse2.

Change-Id: I742576fbf5175f3ac09f2076976a9247b259323e
2014-07-28 15:46:01 -07:00
hkuang
44395a21da Move vp9_dec_build_inter_predictors_* to decoder folder.
Change-Id: Ibe9fa28440cc79ba9f3504d78c7dca7bb01a23e1
2014-07-28 11:09:11 -07:00
hkuang
7eca086707 Add segmentation map array for current and last frame segmentation.
The original implementation only allocates one segmentation map and this
works fine for serial decode. But for frame parallel decode, each thread
needs to have its own segmentation map and the last frame segmentation map
should be provided from the last frame decoding thread.

After finishing decoding a frame, a thread needs to serve the old segmentation
map that is associated with the previously decoded frame. The thread also needs
to use another segmentation map for decoding the current frame.

Change-Id: I442ddff36b5de9cb8a7eb59e225744c78f4492d8
2014-07-28 10:44:02 -07:00
Jingning Han
53844275e9 Fix potential ioc issue in vp9_get_prob for 4K above sizes
This commit turns on the existing vp9_get_prob function using
64 bit in the intermediate step. It fixes the ioc issue for 4K
above frame sizes (issue 828).

Change-Id: I9f627f3beca2c522f73b38fd2a3e7eefdff01a7c
2014-07-24 15:35:51 -07:00
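To illustrate why a 64 bit intermediate matters here, the sketch below computes a probability from two counts. The names `prob_t`, `clip_prob` and `get_prob64` are hypothetical and the rounding shown is an assumption; the point is only that `num * 256` can exceed 32 bits once counts grow large, as they may for very large frames.

```c
#include <stdint.h>

typedef uint8_t prob_t; /* hypothetical 8 bit probability type */

/* Clamp a value to the range [1, 255]. */
static prob_t clip_prob(int64_t p) {
  return (prob_t)(p > 255 ? 255 : (p < 1 ? 1 : p));
}

/* Rounded num * 256 / den, clamped to [1, 255]. The multiply is done in
 * 64 bits so that large counts cannot overflow. */
static prob_t get_prob64(unsigned int num, unsigned int den) {
  if (den == 0) return 128; /* no data: fall back to an even probability */
  return clip_prob(((int64_t)num * 256 + (den >> 1)) / den);
}
```

With `num = 20000000`, the product `num * 256` is about 5.12e9, which no longer fits in 32 bits; the 64 bit cast keeps the result correct.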
Alex Converse
5926e7c0e8 Remove unfinished VP9 alpha channel.
Change-Id: Ic5d3a3a0dac10b49495771886a31e793bb78b5ca
2014-07-21 15:55:50 -07:00
Deb Mukherjee
727f384085 Merge "Separates profile 2 into 2 profiles 2 and 3" 2014-07-18 03:23:51 -07:00
Deb Mukherjee
c447a50aea Separates profile 2 into 2 profiles 2 and 3
Separates HBD profile into two profiles (2 and 3) consistent with the
highbitdepth branch. This patch is ported from the original highbitdepth
branch patch: https://gerrit.chromium.org/gerrit/#/c/70460/

Two of the invalid file tests needed to be updated.

Change-Id: I6a4acd2f7a60b1fb4cbcc8e0dad4eab4248431e3
2014-07-17 20:51:59 -07:00
Adrian Grange
8cb8aef7c7 Merge "Modified frame buffer handling" 2014-07-17 12:15:16 -07:00
Scott LaVarnway
ba0652e83a Merge "Added vp9_sad64x64_neon(), vp9_sad32x32_neon()" 2014-07-17 11:42:16 -07:00
Adrian Grange
f68aaa38d6 Modified frame buffer handling
This patch is the first step toward simplifying the
frame buffer handling.

The final goal is to have a common frame buffer handling
framework for both encoder and decoder that incorporates
the existing ability to use externally allocated memory.

Change-Id: I2c378a4f54a39908915f46c4260e17a080db7ff1
2014-07-17 11:06:35 -07:00
Scott LaVarnway
696fa52eaa Added vp9_sad64x64_neon(), vp9_sad32x32_neon()
and vp9_sad16x16_neon()

On a Nexus 7, vpxenc (in realtime mode, speed -6)
reported a performance improvement of ~17%.

Change-Id: I91e070cde2973451083d3f3d63b49b7886de9a85
2014-07-16 12:54:46 -07:00
Deb Mukherjee
1f6aaeddc5 Merge "Some extra bit probability cleanups" 2014-07-14 17:26:54 -07:00
hkuang
4c08120ca0 Merge "Include the right header for VP9 worker thread." into frame_parallel 2014-07-14 16:09:16 -07:00
hkuang
294b849796 Include the right header for VP9 worker thread.
pthread.h is not supported in windows. vp9_thread.h includes
the emulation layer for pthread in windows.

Change-Id: I2b1c8ec299928472faca7ebeea998170c9f4d744
2014-07-14 16:03:38 -07:00
Jingning Han
6ce515b9ff Merge "Fix chrome valgrind warning due to the use of mismatched bsize" 2014-07-13 11:07:44 -07:00
James Zern
0999a2a24e Merge "vp9_loopfilter.c: cosmetics" 2014-07-11 16:02:21 -07:00
Jingning Han
3cddd81c6d Fix chrome valgrind warning due to the use of mismatched bsize
This commit fixes a mismatched use case of block size in non-RD
intra prediction check. The residual SSE and variance should be
calculated per transform block size, instead of operating block
size, which caused chrome valgrind warning on conditional jump
based on uninitialized value (webm issue 823). This commit
resolves this issue.

Change-Id: I595c06599c7e0fd0e4a08736519ba68fc14bc79a
2014-07-11 15:49:22 -07:00
hkuang
3cffa0c74e Move vp9_thread.* to common.
To prepare for frame parallel decoding, the reference count buffers
need to be protected by mutex. Move vp9_thread.* to common
folder so that those buffers could use cross-platform mutex
from vp9_thread.*.

(cherry picked from commit 337e8015c9)

Change-Id: I0587a08447925f4554d7788686a31483c2ae3f37
2014-07-11 15:24:31 -07:00
Yunqing Wang
7e340614c1 Merge "Remove unnecessary assertions" 2014-07-11 13:47:03 -07:00
Deb Mukherjee
6957e7a077 Some extra bit probability cleanups
Refactoring to remove some duplication of probability
tables between tokenization and detokenization.

Change-Id: I2fc6a6497f9c0410021a9b41f828bc58a864e466
2014-07-11 11:39:18 -07:00
Yunqing Wang
978642a426 Remove unnecessary assertions
Removed 2 unnecessary assertions.

Change-Id: I0f8877d0494bf3ecdb0d7931ccbcaa8289e01d8b
2014-07-11 10:48:57 -07:00
Yaowu Xu
a75d55df1b Remove an unused parameter
Change-Id: I6ad6fd75dc3c9e6218d88148cf49e205398e2af5
2014-07-11 08:10:04 -07:00
James Zern
8a7cc1f47b Merge "update vp9_thread.c" 2014-07-10 23:19:55 -07:00
James Zern
8701ed0270 update vp9_thread.c
pull the latest from libwebp.

Original source:
 http://git.chromium.org/webm/libwebp.git
 100644 blob 264210ba2807e4da47eb5d18c04cf869d89b9784 src/utils/thread.c

commit 46fd44c1042c9903b2f1ab87e9f200a13c7e702d
Author: James Zern <jzern@google.com>
Date:   Tue Jul 8 19:53:28 2014 -0700

    thread: remove harmless race on status_ in End()

    if a thread was still doing work when End() was called there'd be a race
    on worker->status_. in these cases, however, the specific value is
    meaningless as it would be >= OK and the thread would have been shut
    down properly, but we'll check 'impl_' instead to avoid any potential
    TSan/DRD reports.

    Change-Id: Ib93cbc226a099f07761f7bad765549dffb8054b1

Change-Id: Ib0ef25737b3c6d017fa74822e21ed58508230b91
2014-07-10 12:20:54 -07:00
Yunqing Wang
1226d133df Merge "Refactor vp9_diamond_search_sad function" 2014-07-10 11:06:32 -07:00
Yunqing Wang
46441ec5c8 Merge "Refactor refining_search_sad code" 2014-07-10 10:43:00 -07:00
hkuang
51e9788e58 Fix a bug in boundary checking.
Change-Id: Ifc741da9da6f61c8d3c1f675ec6b8a96570f877d
2014-07-10 09:43:04 -07:00
Yunqing Wang
75cd57503d Refactor vp9_diamond_search_sad function
Currently, vp9_diamond_search_sadx4() is only called when sse3 is
enabled, which is improper since sse2 optimizations of sdx4df
functions are available. Changed to always use
vp9_diamond_search_sadx4().

Change-Id: I4b95d6b7a3c6c645783c373f0ba8d645ece24717
2014-07-10 09:19:03 -07:00
James Zern
58609335b1 vp9_loopfilter.c: cosmetics
- fix indent, spelling
- drop some whitespace in some comments
- add an assert in vp9_setup_mask, it shouldn't be called on decode
  error

Change-Id: Ic312a815e977a6f9cb81ceb7b039eeada76c5aa0
2014-07-09 17:27:57 -07:00
Yunqing Wang
30117a576d Refactor refining_search_sad code
There are sse2 optimizations of sdx4df functions. Instead of calling
vp9_refining_search_sadx4 only when sse3 is enabled, call it always.

Change-Id: I24f93818f7d4209d1425039e0eb099ff9ff08fe9
2014-07-09 16:50:11 -07:00
Jingning Han
f6bf614b2f Merge "Re-design quantization process for 32x32 transform block" 2014-07-09 11:55:26 -07:00
hkuang
b84ee5a3d0 Merge "Move vp9_thread.* to common." 2014-07-09 10:16:13 -07:00
Jingning Han
9ad1b9fc67 Re-design quantization process for 32x32 transform block
This commit enables a new quantization process for 32x32 2D-DCT
transform coefficient blocks. It improves the compression
performance of speed 5 by 1.4%. The overall compression gain of
speed 5 due to the new quantization scheme is 4.7%. It also includes
the SSSE3 implementation of the 32x32 quantization process.

Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b
2014-07-08 16:55:28 -07:00
Adrian Grange
7c43fb67ae Fix decoder handling of intra-only frames
This patch fixes bug 633:
https://code.google.com/p/webm/issues/detail?id=633

The first decoded frame does not have to be a keyframe,
it could be an inter-frame that is coded intra-only.

This patch fixes the handling of intra-only frames.

A test vector has also been added that encodes 3
intra-only frames at the start of the clip. The
test vector was generated using the code in the
following patch:
https://gerrit.chromium.org/gerrit/#/c/70680/

Change-Id: Ib40b1dbf91aae2bc047e23c626eaef09d1860147
2014-07-08 16:24:03 -07:00
hkuang
337e8015c9 Move vp9_thread.* to common.
To prepare for frame parallel decoding, the reference count buffers
need to be protected by mutex. Move vp9_thread.* to common
folder so that those buffers could use cross-platform mutex
from vp9_thread.*.

Change-Id: I541277cf15eefed6641555944f67f4a0bcdc8154
2014-07-07 14:52:19 -07:00
hkuang
28a794f680 Separate the frame buffers from VP9 encoder/decoder structure.
To prepare for frame parallel decoding, the frame buffers must be
separated from the encoder and decoder structure, while the encoder
and decoder will hold the pointer of the BufferPool.

Change-Id: I172c78f876e41fb5aea11be5f632adadf2a6f466
2014-07-02 15:34:20 -07:00
Yaowu Xu
82fd084b35 Merge "Re-design quantization process" 2014-07-01 19:04:01 -07:00
Jingning Han
9ac2f66320 Re-design quantization process
This commit re-designs the quantization process for transform
coefficient blocks of size 4x4 to 16x16. It improves compression
performance for speed 7 by 3.85%. The SSSE3 version for the
new quantization process is included.

The average runtime of the 8x8 block quantization is reduced
from 285 cycles -> 255 cycles, i.e., over 10% faster.

Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
2014-07-01 17:00:07 -07:00
Alex Converse
6c54dbcb69 Merge "BITSTREAM: Handle transform size and motion vectors more logically for non-420." 2014-06-30 17:44:01 -07:00
James Zern
44472cde55 vp9: disable postproc buffer alloc when unnecessary
the buffer is only used in encoding and only when
CONFIG_INTERNAL_STATS or CONFIG_VP9_POSTPROC is enabled.
a future change should decouple this from the frame buffer allocation
and make it conditional based on runtime flags when the above config
options are enabled.
reduces decode heap usage by at least 12%

Change-Id: Id0b97620d4936afefa538d3aadf32106743d9caf
2014-06-27 20:59:56 -07:00
Jim Bankoski
52b63c238e Merge "Better validation of invalid files" 2014-06-27 11:05:21 -07:00
Jim Bankoski
9f37d149c1 Better validation of invalid files
This patch checks that a decoder never tries to reference a frame that's
outside the range of 2x to 1/16th the size of this frame.  Any attempt
to do so causes a failure.

Change-Id: I5c98fa7bb95ac4f29146f29dd92b62fe96164e4c
2014-06-27 10:03:15 -07:00
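The size check described in the commit above could be expressed along these lines. The function name and signature are hypothetical; the bounds (a reference at most 2x and at least 1/16th the size of the current frame, per dimension) are taken from the commit message.

```c
/* Hypothetical check: a reference frame is usable only if each of its
 * dimensions is at most 2x and at least 1/16th of the current frame's. */
static int valid_ref_dims(int ref_w, int ref_h, int cur_w, int cur_h) {
  return ref_w <= 2 * cur_w && ref_h <= 2 * cur_h &&
         16 * ref_w >= cur_w && 16 * ref_h >= cur_h;
}
```

The comparisons are written as multiplications rather than divisions so that odd dimensions are not affected by integer truncation.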
Jingning Han
46ea9ec719 Enable real-time version reference motion vector search
This commit enables a fast reference motion vector search scheme.
It checks the nearest top and left neighboring blocks to decide the
most probable predicted motion vector. If it finds the two have
the same motion vectors, it then skips finding exterior range for
the second most probable motion vector, and correspondingly skips
the check for NEARMV.

The runtime of speed -5 goes down
pedestrian at 1080p 29377 ms -> 27783 ms
vidyo at 720p       11830 ms -> 10990 ms
i.e., 6%-8% speed-up.

For rtc set, the compression performance
goes down by about -1.3% for both speed -5 and -6.

Change-Id: I2a7794fa99734f739f8b30519ad4dfd511ab91a5
2014-06-26 09:49:13 -07:00
Adrian Grange
8357292a5a Fix test on maximum downscaling limits
There is a normative scaling range of (x1/2, x16)
for VP9. This patch fixes the maximum downscaling
tests that are applied in the convolve function.

The code used a maximum downscaling limit of x1/5
for historic reasons related to the scalable
coding work. Since the downsampling in this
application is non-normative it will revert to
using a separate non-normative scaler.

Change-Id: Ide80ed712cee82fe5cb3c55076ac428295a6019f
2014-06-24 10:26:09 -07:00
Adrian Grange
8c1f071f1e Allocate buffers based on correct chroma format
The encoder currently allocates frame buffers before
it establishes what the chroma sub-sampling factor is,
always allocating based on the 4:4:4 format.

This patch detects the chroma format as early as
possible allowing the encoder to allocate buffers of
the correct size.

Future patches will change the encoder to allocate
frame buffers on demand to further reduce the memory
profile of the encoder and rationalize the buffer
management in the encoder and decoder.

Change-Id: Ifd41dd96e67d0011719ba40fada0bae74f3a0d57
2014-06-23 11:45:13 -07:00
Jingning Han
961bafc366 Merge "Remove unused vp9_init_quant_tables function" 2014-06-23 09:37:30 -07:00
Johann
1fc2b0fd00 Merge "Include type defines" 2014-06-20 11:29:19 -07:00
Johann
d658216276 Don't return value for void functions
Clears "warning: 'return' with a value, in function returning void"

Change-Id: I93972610d67e243ec772a1021d2fdfcfc689c8c2
2014-06-20 11:26:44 -07:00
Johann
baef0b89da Include type defines
Clears error: unknown type name 'uint8_t'

Change-Id: I9b6eff66a5c69bc24aeaeb5ade29255a164ef0e2
2014-06-20 11:26:13 -07:00
Alex Converse
7557a65d16 BITSTREAM: Handle transform size and motion vectors more logically for non-420.
This breaks the profile 1 bitstream.

Don't force non420 uv transform size to 1/4 y size. In the 4:2:0 case the
chroma corresponding to a luma block is 1/4 its size. In the 4:4:4 case
chroma and luma planes are the same size. Disallowing larger transforms
can result in a loss of compression efficiency and is inconsistent.

For sub-8x8 blocks only average corresponding motion vectors.

4:2:0 and profile 0 behavior remains unchanged.

Change-Id: I560ae07183012c6734dd1860ea54ed6f62f3cae8
2014-06-18 13:07:51 -07:00
Jingning Han
3b9c19aaa7 Remove unused vp9_init_quant_tables function
This function is not effectively used, hence removed.

Change-Id: I2e8e48fa07c7518931690f3b04bae920cb360e49
2014-06-18 11:51:41 -07:00
James Zern
88df435d6b Merge "vp9_rtcd: correct avx2 references" 2014-06-16 17:39:13 -07:00
Johann
79afb5eb41 Use lrand48 on Android
When building x86 assembly use lrand48 instead of the
undocumented inlined _rand function.

Android now supports rand()
https://android-review.googlesource.com/97731
but only for new versions. Original workaround:
https://gerrit.chromium.org/gerrit/15744

Change-Id: I130566837d5bfc9e54187ebe9807350d1a7dab2a
2014-06-12 19:57:25 -07:00
Jingning Han
d5ae43318e Merge "Fast computation path for forward transform and quantization" 2014-06-12 11:59:52 -07:00
Jingning Han
ccba289f8d Fast computation path for forward transform and quantization
This commit enables a fast path computational flow for forward
transformation. It checks the sse and variance of prediction
residuals and decides if the quantized coefficients are all
zero, dc only, or more. It then selects the corresponding coding
path in the forward transformation and quantization stage.

It is currently enabled in rtc coding mode. Will do it for rd
coding mode next.

In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
Overall coding performance for rtc set is changed by -0.18%.

Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
2014-06-12 11:10:54 -07:00
James Zern
9f3a0dbb5e vp9_rtcd: correct avx2 references
s/"\$avx2_x86inc"/"avx2"/

avx2 code is all intrinsics and as a result doesn't rely on x86inc.asm

Change-Id: I76ad39474d8a00658f3e43131830ef0f4f34772a
2014-06-10 16:26:36 -07:00
James Zern
cbce09ce62 Merge changes I6abc0657,I8224fba2,I04f64a45,I5d49d119,I76b4d171,I88c11ac3
* changes:
  vp9_sub_pixel_*variance*: disable avx2 variants
  vp9_sad*x4d: disable avx2 variants
  vp9_f(dct|ht): disable avx2 variants
  convolve: disable avx2 variants
  fdct8x8_test: add missing avx2 functions
  dct4x4_test: add missing avx2 functions
2014-06-10 16:14:45 -07:00
James Zern
520cb3f39f vp9_sub_pixel_*variance*: disable avx2 variants
tests failing under Win32/Win64

+ variance_test: add missing avx2 functions (partially disabled)

Change-Id: I6abc0657ea076379ab9ca65c12678b9ea199849d
2014-06-10 16:11:15 -07:00
James Zern
d3ff009d84 vp9_sad*x4d: disable avx2 variants
tests failing under Win32/Win64

+ sad_test: add missing avx2 functions (disabled)

Change-Id: I8224fba2b270f6039ab1877d71e1e512f0081856
2014-06-10 16:10:12 -07:00
hkuang
cdffeaaae0 Add mode info arrays and mode info index.
In non frame-parallel decoding, this works the same way as the
current decoding scheme. Every time the decoder finishes
decoding a frame, it will swap the current mode info pointer
and previous mode info pointer if the decoded frame needs
to be shown. Both mode info pointer and previous mode info
pointer are from mode info arrays.

In frame-parallel decoding, this will become more complicated
as current frame's mode info pointer will be shared with next
frame as previous mode info pointer. But when one decoder
thread finishes decoding one frame and starts to work on next
available frame, it needs to retain the decoded frame's mode
info pointers until next frame finishes decoding. The mode info
index will serve this purpose. The decoder will use different
buffer in the mode info arrays and use the other buffer to save
previous decoded frame’s mode info.
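
The pointer swap described above can be sketched roughly as follows. This is a minimal illustration, not the code from this commit: the type and function names (ModeInfo, MiState, mi_state_init, mi_state_finish_frame) are hypothetical, and the use of exactly two buffers is an assumption.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real mode info type. */
typedef struct { int mode; } ModeInfo;

/* Assumed layout: two buffers plus an index selecting the current one. */
typedef struct {
  ModeInfo *mi_array[2];  /* backing buffers for mode info */
  int mi_idx;             /* index of the buffer used by the current frame */
  ModeInfo *mi;           /* current frame's mode info */
  ModeInfo *prev_mi;      /* previously decoded frame's mode info */
} MiState;

static void mi_state_init(MiState *s, ModeInfo *buf0, ModeInfo *buf1) {
  assert(s != NULL && buf0 != NULL && buf1 != NULL);
  s->mi_array[0] = buf0;
  s->mi_array[1] = buf1;
  s->mi_idx = 0;
  s->mi = s->mi_array[0];
  s->prev_mi = s->mi_array[1];
}

/* After a frame is decoded: flip the index only if the frame is shown, so
 * the just-decoded buffer becomes "previous" and the other one is reused. */
static void mi_state_finish_frame(MiState *s, int show_frame) {
  assert(s != NULL);
  if (!show_frame) return;
  s->prev_mi = s->mi_array[s->mi_idx];
  s->mi_idx ^= 1;
  s->mi = s->mi_array[s->mi_idx];
}
```

Keeping an index next to the pointers means a thread can tell which buffer still holds the previous frame's data without comparing pointers.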

Change-Id: If11d57d8eb0ee38c8876158e5482177fcb229428
2014-06-10 13:43:36 -07:00
James Zern
dd9f502933 vp9_f(dct|ht): disable avx2 variants
tests failing under Win32/Win64

+ dct16x16_test: add missing avx2 functions (partially disabled)

exercises the forward transforms
no idct/iht implementations, so the c-code is used

Change-Id: I04f64a457fa0828a00f32b5c9fe4f55294f21f61
2014-06-09 18:48:11 -07:00
James Zern
5704578f5f convolve: disable avx2 variants
tests failing under Win32/Win64

Change-Id: I5d49d11911bcda3a832b14efe5500d22597bedcf
2014-06-09 18:42:03 -07:00
Jingning Han
0c4a4225ec Merge "Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs" 2014-06-03 16:51:39 -07:00
Dmitry Kovalev
19c492a749 Merge "Reusing existing vp9_get{8x8, 16x16}var() instead of new ones." 2014-06-03 10:04:27 -07:00
Deb Mukherjee
fc88292ef2 Remove Wextra warnings from vp9_sad.c
As a side-effect, the sad unit tests for VP8 and VP9
had to be separated.

Fixes a bug in original patch:
(https://gerrit.chromium.org/gerrit/#/c/70163/8)
that was reverted due to a nightly test failure.

Change-Id: Ia2a4e9e278fd3c89d6c3c82fcc6381320ec2a8a6
2014-06-02 13:50:20 -07:00
Frank Galligan
c40a968e13 Merge "Revert "Remove Wextra warnings from vp9_sad.c"" 2014-06-01 16:58:11 -07:00
Frank Galligan
0b44988952 Revert "Remove Wextra warnings from vp9_sad.c"
This reverts commit 916550428d

Change-Id: I500822b03f09c64ff6ec5396c68edee9ca3b75cb
2014-06-01 16:20:26 -07:00
Jingning Han
ba6bed372b Merge "Fix a potential overflow issue in inverse 16x16 full 2D-DCT" 2014-05-30 15:52:53 -07:00
Jingning Han
2c1cdf69b6 Fix a potential overflow issue in inverse 16x16 full 2D-DCT
An overflow issue could potentially happen in the second round 1-D
transform of the SSSE3 full inverse 16x16 2D-DCT. This commit fixes
this issue.

Change-Id: Ia19e4888fda1cc929a28a5f89a5beec612d628dc
2014-05-29 11:46:32 -07:00
Dmitry Kovalev
e14f900ae3 Merge "Moving itxm_add pointer from MACROBLOCKD to MACROBLOCK." 2014-05-29 11:16:39 -07:00
Dmitry Kovalev
f7ff24cdd0 Reusing existing vp9_get{8x8, 16x16}var() instead of new ones.
Change-Id: I87b7c657d8813d7fb383ab519d150c0ffb1dd377
2014-05-29 11:14:06 -07:00
Jingning Han
6d21cbd20b Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs
This commit enables the SSSE3 implementation of the inverse 2D-DCT
with only the first 10 coefficients non-zero. It reduces the runtime
from 745 cycles (SSE2 version) to 538 cycles, i.e., a 27% speed-up.

Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
2014-05-28 10:53:33 -07:00
Jingning Han
d5bcef5242 Merge "Fix compiling error in MSVS" 2014-05-27 16:58:00 -07:00
Jingning Han
239e68ddbf Fix compiling error in MSVS
Need to include math.h before tmmintrin.h in some versions of MSVS.

Change-Id: Ia6b83ae599316887ecf30c4e4b9e4355fb8a4219
2014-05-27 15:58:47 -07:00
Yunqing Wang
1f2200080b Revert "Making vp9_get_sse_sum_{8x8, 16x16} static."
This reverts commit e8bbb3d9db.

Change-Id: Ie368d36fd249d323d859d208609c711f04537bbc
2014-05-27 13:37:08 -07:00
Deb Mukherjee
444f93945b Merge "Remove Wextra warnings from vp9_sad.c" 2014-05-27 11:54:05 -07:00
Yunqing Wang
a591ac9e5a Merge "Fix decoder mismatch in sub-pixel AVX2 intrinsic filters" 2014-05-27 10:52:16 -07:00
levytamar82
773596050f Fix decoder mismatch in sub-pixel AVX2 intrinsic filters
The subpixel SSSE3 was fixed in this patch:
https://gerrit.chromium.org/gerrit/#/c/70283/
So the equivalent AVX2 is fixed accordingly.

Change-Id: Ieebbc1949c99d34b12b8b47692df71aca5001f3a
2014-05-23 16:48:40 -07:00
Jingning Han
59c3f446fe Merge "Inverse 16x16 2D-DCT SSSE3 implementation" 2014-05-23 16:01:22 -07:00
Jingning Han
48b0891370 Inverse 16x16 2D-DCT SSSE3 implementation
This commit enables the SSSE3 implementation of full inverse 16x16
2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles,
about 7% speed-up.

Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
2014-05-23 15:09:35 -07:00
Yunqing Wang
67ca5b586a Merge "Fix decoder mismatch in sub-pixel SSSE3 intrinsic filters" 2014-05-23 14:24:48 -07:00
Dmitry Kovalev
d7d7cedaaa Merge "Removing vp9_pragmas.h." 2014-05-23 12:58:00 -07:00
Yunqing Wang
c5443fc881 Fix decoder mismatch in sub-pixel SSSE3 intrinsic filters
In 8-tap filtering, to guarantee that the intermediate results fit in
16 bits, the products must be accumulated in the right order, with the
largest product added last. This patch fixes the problem using the
method in commit "Correct ssse3 8/16-pixel wide sub-pixel filter
calculation".
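
Why the accumulation order matters can be seen in a scalar model of 16-bit saturating addition. This is only a sketch under assumptions: the four partial sums are hypothetical values, the helper names are invented, and the "sort the two centre terms and add the larger one last" step is one plausible reading of "largest product added last", not necessarily the exact method of the referenced commit.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* 16-bit saturating add, the scalar equivalent of a saturating SIMD add. */
static int16_t sat_add16(int16_t a, int16_t b) {
  const int32_t s = (int32_t)a + (int32_t)b;
  if (s > INT16_MAX) return INT16_MAX;
  if (s < INT16_MIN) return INT16_MIN;
  return (int16_t)s;
}

/* Naive order: the two large centre terms are added first and may saturate
 * even though the full sum would fit in 16 bits. */
static int16_t sum_naive(const int16_t x[4]) {
  int16_t s;
  assert(x != NULL);
  s = sat_add16(x[1], x[2]);
  s = sat_add16(s, x[0]);
  return sat_add16(s, x[3]);
}

/* Safer order: outer terms first, then the smaller centre term, and the
 * largest term last. */
static int16_t sum_largest_last(const int16_t x[4]) {
  int16_t lo, hi, s;
  assert(x != NULL);
  lo = x[1] < x[2] ? x[1] : x[2];
  hi = x[1] < x[2] ? x[2] : x[1];
  s = sat_add16(x[0], x[3]);
  s = sat_add16(s, lo);
  return sat_add16(s, hi);
}
```

With the hypothetical partial sums {-3000, 20000, 15000, -1000}, the true total is 31000. The naive order saturates at 32767 after the first add and ends at 28767, while the largest-last order returns 31000.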

Change-Id: I79d0ad60c057b15011ece84cda9648eee0809423
2014-05-23 11:52:20 -07:00
Yaowu Xu
9410330893 Merge "change to use assembly version of ssse3 filter code" 2014-05-23 08:02:28 -07:00
Deb Mukherjee
916550428d Remove Wextra warnings from vp9_sad.c
As a side-effect, the sad unit tests for VP8 and VP9
had to be separated.

Change-Id: I068cc2391eed51e9b140ea6aba78338c5fec8d71
2014-05-22 22:21:16 -07:00
Yaowu Xu
7a0c9b82f2 change to use assembly version of ssse3 filter code
Mismatches were found between the intrinsic version and the C-only
version. This commit temporarily reverts to the matching assembly
version to allow further investigation.

Change-Id: I08436c47d4888b562c0eac8e8856d90a831442df
2014-05-22 17:11:57 -07:00
Yunqing Wang
aaf204e550 Merge "Fix a decoding mismatch in sub-pixel filters" 2014-05-22 17:09:14 -07:00
Yunqing Wang
efcdf946ed Fix a decoding mismatch in sub-pixel filters
This did the same correction as the one in commit "Correct ssse3
8/16-pixel wide sub-pixel filter calculation" to avoid saturation
during filtering.

Change-Id: Ife9aa3f62daf9114eb24fe38f7baa3c3f361b2d6
2014-05-22 15:42:13 -07:00
Dmitry Kovalev
72ab966d5e Removing vp9_pragmas.h.
Change-Id: I9120a87e27e73e496932d11716937e2fad246521
2014-05-22 13:46:31 -07:00
Deb Mukherjee
e272273443 Renames x86_64 specific asm files
Renames all x86_64 specific assembly files to consistently
end in _x86_64.asm. This will be useful for build systems to
handle these files differently.
All new 64-bit specific assembly files should use the new
naming convention.

Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
2014-05-21 13:55:56 -07:00
Dmitry Kovalev
35a83677a5 Moving itxm_add pointer from MACROBLOCKD to MACROBLOCK.
The final goal is eventually to get rid of both itxm_add and fwd_txm4x4.
This patch does it in the decoder.

Change-Id: Ibb3db57efbcbb1ac387c6742538a9fcf2c6f24a5
2014-05-21 11:09:44 -07:00
Deb Mukherjee
ef750d8472 Merge "Extends temporal filtering to work for 422 data" 2014-05-20 16:31:28 -07:00
Deb Mukherjee
a185bc3350 Extends temporal filtering to work for 422 data
This is needed for profiles 1 and 2.

Change-Id: I5dd7644c2932d055ab89e050d4be7d4117cd1028
2014-05-20 15:19:40 -07:00
hkuang
20c1edf612 Refactor decode_tiles and loopfilter code.
The current decode_tiles decodes the frame tile by tile
and then either loop-filters the whole frame or uses another worker
thread to do the loop filtering.

|------|------|------|------|
|Tile1-|Tile2-|Tile3-|Tile4-|
|------|------|------|------|

For example, if a tiled video has one row and four columns, decode_tiles
will decode Tile1, then Tile2, then Tile3, then Tile4.
While decoding each tile, decode_tile decodes row by row within
the tile.

For frame parallel decoding, decode_tiles will decode video in row order
across the tiles. So the order will be:
"Decode 1st row of Tile1" -> "Decode 1st row of Tile2"
-> "Decode 1st row of Tile3" -> "Decode 1st row of Tile4"
-> "Decode 2nd row of Tile1" -> "Decode 2nd row of Tile2"
-> "Decode 2nd row of Tile3" -> "Decode 2nd row of Tile4"-> "loopfilter 1st row"
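
The two decode orders described above differ only in loop nesting, as the following sketch shows. The Step type and both function names are hypothetical, and loop filtering is left out; the sketch only enumerates which (tile, row) pair is decoded at each step.

```c
#include <assert.h>
#include <stddef.h>

/* One decode step: which tile, and which row inside that tile. */
typedef struct { int tile; int row; } Step;

/* Tile-by-tile order: finish every row of a tile before the next tile. */
static int order_tile_by_tile(int tiles, int rows, Step *out) {
  int n = 0;
  assert(out != NULL);
  for (int t = 0; t < tiles; ++t) {
    for (int r = 0; r < rows; ++r) {
      out[n].tile = t;
      out[n].row = r;
      ++n;
    }
  }
  return n;
}

/* Row order across tiles: decode row r of every tile before row r + 1. */
static int order_row_across_tiles(int tiles, int rows, Step *out) {
  int n = 0;
  assert(out != NULL);
  for (int r = 0; r < rows; ++r) {
    for (int t = 0; t < tiles; ++t) {
      out[n].tile = t;
      out[n].row = r;
      ++n;
    }
  }
  return n;
}
```

In the second order a complete picture row is available once all tiles have produced that row, which is what allows loop filtering to start before the whole frame is decoded.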

Change-Id: I2211f9adc6d142fbf411d491031203cb8a6dbf6b
2014-05-20 14:47:45 -07:00
Dmitry Kovalev
c23c613fdf Merge "Hiding vp9_sub_pel_filters_{8, 8s, 8lp} filters in *.c file." 2014-05-19 10:27:16 -07:00
Dmitry Kovalev
79ba41903f Removing MACROBLOCKD dependency from loop filter.
Change-Id: I9ef40f3d95ab8f94f69e92ea25678a40956bc1ce
2014-05-16 09:48:26 -07:00
Adrian Grange
9dc9f17814 Merge "Fix post-processor macros & remove vizualization" 2014-05-16 09:01:41 -07:00
Dmitry Kovalev
619e6b539a Merge "Removing redundant "8x8" suffix from MODE_INFO vars." 2014-05-15 17:53:31 -07:00
Jim Bankoski
ec82d2dfec Merge "Revert "Remove Wextra warnings from vp9_sad.c"" 2014-05-15 11:54:23 -07:00
Yunqing Wang
c661cf0dad Merge "AVX2 To VP9 Block Error Optimization" 2014-05-15 11:29:29 -07:00
Dmitry Kovalev
ed784a0bc4 Removing redundant "8x8" suffix from MODE_INFO vars.
Change-Id: I7ed7fecc959c6598ff98895f1a5cf7e11ac1615f
2014-05-15 11:14:42 -07:00
Adrian Grange
384bc5163c Fix post-processor macros & remove vizualization
Make all post-processor code conditionally
compilable based on the CONFIG_VP9_POSTPROC
macro.

Also, remove the visualization code from VP9
since it is out of date and will not compile.

Change-Id: I1e9e13a09ecd43e9a3f3704c175ae8cd258ababd
2014-05-15 08:35:36 -07:00
Jim Bankoski
a16794dd31 Revert "Remove Wextra warnings from vp9_sad.c"
This reverts commit 7ab9a9587b

Nightly test failed:
http://build.webmproject.org/jenkins/view/libvpx-nightly-tests/job/libvpx%20unit%20tests%20(valgrind-2)/arch=x86_64-linux-gcc,filter=-*VP8*:*Large.*/276/console

This patch did not address all the assembly issues: some of the VP8
assembly counts on 5 arguments being passed in to this function.

One example: vp8_sad8x16_wmt

Please address this or split the change into VP9 and VP8 patches.

Change-Id: I78afcc171649894f887bb8ee3c66de24aaddc7ca
2014-05-15 08:31:20 -07:00
Yaowu Xu
71854f3a6e Merge "vp9_decodeframe.c: cleanup -wextra warnings" 2014-05-15 06:50:51 -07:00
Dmitry Kovalev
021eaabdb8 Hiding vp9_sub_pel_filters_{8, 8s, 8lp} filters in *.c file.
Change-Id: Id401da740b0a0141caaef9e1bcccd981e5cef4a4
2014-05-14 16:21:41 -07:00
levytamar82
1fbab853c8 AVX2 To VP9 Block Error Optimization
vp9_block_error_sse2 can only handle 16 bytes at a time, but
the function needs to process 32 bytes per iteration,
so each 16-byte half is handled in a different register.
With the AVX2 optimization, the 32 bytes can be handled in one register
instead of the two needed by SSE2.
The function-level gain for vp9_block_error is 85%.
The user-level gain is 1.2%.
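
For orientation, the quantity being vectorized can be written as a scalar loop. This is a hedged reference sketch, not the function from this commit: it assumes 16-bit coefficients and covers only the sum of squared differences, and the name block_error_scalar is hypothetical. Under that assumption a 16-byte register holds 8 coefficients and a 32-byte register holds 16.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar reference: sum of squared differences between the original and the
 * dequantized coefficients (assumed here to be 16-bit values). */
static int64_t block_error_scalar(const int16_t *coeff,
                                  const int16_t *dqcoeff, size_t n) {
  int64_t error = 0;
  assert(coeff != NULL && dqcoeff != NULL);
  for (size_t i = 0; i < n; ++i) {
    const int32_t diff = (int32_t)coeff[i] - (int32_t)dqcoeff[i];
    error += (int64_t)diff * diff;  /* accumulate in 64 bits to avoid overflow */
  }
  return error;
}
```

A SIMD version computes the same sum, but consumes one register's worth of coefficients per iteration instead of one coefficient.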

Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd
2014-05-14 11:51:07 -07:00
Deb Mukherjee
9687c057f8 Merge "Remove Wextra warnings from vp9_sad.c" 2014-05-14 10:01:50 -07:00
Yaowu Xu
ed09580777 vp9_decodeframe.c: cleanup -wextra warnings
Change-Id: I0315cea6a5e58182bc2556e9825ec2ef0b1480c3
2014-05-14 09:46:11 -07:00
Jingning Han
e5bbb4cfd8 Merge "Silience -wextra warnings in vp9_reconintra.c" 2014-05-14 09:25:08 -07:00
Deb Mukherjee
7ab9a9587b Remove Wextra warnings from vp9_sad.c
As a side-effect, the max_sad check is removed from the
C-implementation of VP8, for consistency with VP9, and to
ensure that the SAD tests common to VP8/VP9 pass.
That will make the VP8 C implementation of sad a little slower
but given that is rarely used in practice, the impact will be
minimal.

Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca
2014-05-14 03:17:31 -07:00
Dmitry Kovalev
eecc750b33 Merge "Moving loopfilter call to vp9_decode_frame()." 2014-05-13 17:20:26 -07:00
Jingning Han
806fa6aaca Silience -wextra warnings in vp9_reconintra.c
The warning messages complained about unused arguments
in a few prediction modes. This structure is intentional:
a wrapper function can cover all prediction mode cases
and make them readily accessible as a pointer array.

This commit silences such warnings.

Change-Id: I7036b6bdb70747e5327d8f6fceb154f100abc4c0
2014-05-13 12:54:23 -07:00
Adrian Grange
fd6bf31b8a vp9_convolve.c: cleanup -wextra warnings
Change-Id: I04930aca2293ebbaeb96dfedd2f9c5a55762fd2e
2014-05-13 09:57:24 -07:00
Dmitry Kovalev
ae7d3ef39f Moving loopfilter call to vp9_decode_frame().
Inline loopfilter has been already handled in vp9_decode_frame().
Collecting all similar code in one place now.

Change-Id: I358a0280fc7c2b27cca520bc1e8c16c4eb6491dd
2014-05-12 16:19:19 -07:00
Johann
ce23931a3f Only build neon assembly for armv7 targets
Allow selectively building just the intrinsics for armv8

Change-Id: I2f29b2e4508b8b8e5649c2906b3159ad1d4ec477
2014-05-12 08:52:02 -07:00
Alex Converse
ec8a3272fa Merge "Add an x86inc MMX fwht4x4." 2014-05-09 13:48:49 -07:00
Jingning Han
9412785b02 Merge changes I3edd4b95,I4514f974,Ie7fa4386
* changes:
  Turn on unit tests for SSSE3 8x8 forward and inverse 2D-DCT
  Change eob threshold for partial inverse 8x8 2D-DCT to 12
  SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero
2014-05-09 09:58:39 -07:00
Alex Converse
b5422fab46 Add an x86inc MMX fwht4x4.
Change-Id: Ib0a73d4863478f9b8a00976379d25d2f6ebbb197
2014-05-08 12:01:27 -07:00
Jingning Han
41a350a83d Change eob threshold for partial inverse 8x8 2D-DCT to 12
The scanning order places the first 12 coefficients of the 8x8 2D-DCT
in the top-left 4x4 block. Hence the partial inverse 8x8
2D-DCT can handle cases with eob below 12.
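
A dispatch on eob of the kind this message implies might look like the sketch below. The enum, the function name, and the exact boundary conditions are assumptions for illustration: the subject line says "to 12" while the text says "below 12", and the sketch uses `eob <= 12`.

```c
#include <assert.h>

/* Hypothetical labels for the three inverse-transform variants. */
typedef enum {
  IDCT8X8_DC_ONLY,  /* only the DC coefficient is non-zero */
  IDCT8X8_PARTIAL,  /* non-zero coefficients confined to the top-left 4x4 */
  IDCT8X8_FULL      /* general case, all 64 coefficients */
} Idct8x8Kind;

/* Pick the cheapest variant that is still exact for the given end-of-block
 * position (eob counts coefficients in scan order). */
static Idct8x8Kind select_idct8x8(int eob) {
  assert(eob > 0 && eob <= 64);
  if (eob == 1) return IDCT8X8_DC_ONLY;
  if (eob <= 12) return IDCT8X8_PARTIAL;  /* assumed threshold, see lead-in */
  return IDCT8X8_FULL;
}
```

The partial path is safe only because every coefficient up to the threshold lies in the top-left 4x4 block in scan order, so the rest of the block is known to be zero.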

The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
166 cycles (using SSE2) to 150 cycles (using SSSE3).

Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
2014-05-08 09:48:58 -07:00
Jingning Han
9e7b09bc5d SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero
This commit enables ssse3 assembly implementation of the 8x8
inverse 2D-DCT with only first 10 coefficients non-zero. The
average runtime for this unit goes down from 198 cycles to 129
cycles (34.8% faster).

Change-Id: Ie7fa4386f6d3a2fe0d47a2eb26fc2a6bbc592ac7
2014-05-07 17:40:02 -07:00
Dmitry Kovalev
68a600d82a Merge "Moving pair_set_epi32 macro into vp9_dct32x32_sse2.c." 2014-05-07 13:34:05 -07:00
Paul Wilkins
33b1c457ed Revert "Add an MMX fwht4x4"
Includes changes that are not compatible with VS windows builds.
Amongst other things stdint.h is not supported in VS.

This reverts commit 89fbf3de50.

Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd
2014-05-07 12:53:27 +01:00
Alex Converse
75d05d5ed4 Merge "Add an MMX fwht4x4" 2014-05-06 11:12:27 -07:00
Jingning Han
d289deb04c Merge "SSSE3 implementation of full inverse 8x8 2D-DCT" 2014-05-06 09:17:22 -07:00
Dmitry Kovalev
e8bbb3d9db Making vp9_get_sse_sum_{8x8, 16x16} static.
Change-Id: Ifb7937c977308c682986f0ce9645a0807d2aa46a
2014-05-05 19:12:38 -07:00
Alex Converse
89fbf3de50 Add an MMX fwht4x4
7% faster encoding a desktop lossless at RT speed 4.

Change-Id: I41627f5b737752616b6512bb91a36ec45995bf64
2014-05-05 15:10:48 -07:00
Jingning Han
52ae97b6aa SSSE3 implementation of full inverse 8x8 2D-DCT
This commit enables the SSSE3 version of the full inverse 8x8 2D-DCT and
reconstruction. It brings the runtime of vp9_idct8x8_64_add down
from 256 cycles (SSE2) to 246 cycles.

Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
2014-05-05 10:49:27 -07:00
Dmitry Kovalev
25a666ef39 Moving pair_set_epi32 macro into vp9_dct32x32_sse2.c.
Change-Id: I642a7d343677bf934e9a54cf4ad78e908620e39a
2014-05-01 16:45:49 -07:00
Jingning Han
39761eb5d6 Merge "Enable SSSE3 implementation of 8x8 forward 2D-DCT" 2014-04-30 13:41:36 -07:00
Dmitry Kovalev
d2bc8816a1 Merge "Adding search_site_config struct." 2014-04-29 16:59:47 -07:00
Jingning Han
1eaa3a76dc Enable SSSE3 implementation of 8x8 forward 2D-DCT
Assembly implementation of ssse3 8x8 forward 2D-DCT. The current
version is turned on only for x86_64. The average unit runtime
goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster.
This translates into about 1.5% speed-up for pedestrian_area 1080p
at speed 2.

Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4
2014-04-29 15:49:18 -07:00
Dmitry Kovalev
9b042dc04c Merge "Removing unused vp9_variance_halfpixvar*() functions." 2014-04-29 14:52:58 -07:00
Dmitry Kovalev
aa464eca5e Adding search_site_config struct.
Change-Id: I2ad333553e673dbabcdc0f0366aea311e90849bf
2014-04-29 10:34:53 -07:00
Dmitry Kovalev
7b59014b74 Removing old unused vp9_tapify.py.
Change-Id: I7d66987fd04a3f98c140fc5f99ed0e9bc01f61d0
2014-04-25 15:19:31 -07:00
Dmitry Kovalev
6e01079cc0 Removing unused vp9_variance_halfpixvar*() functions.
Change-Id: I99695564a3aa9bc8c79ac0a551d257e2ff3ad3c3
2014-04-25 11:50:07 -07:00
Dmitry Kovalev
03e7deae4f Removing unused vp9_sub_pixel_mse* functions.
Change-Id: I8d906da3bd6de0d3042676846f61a8b2a3444508
2014-04-24 11:49:12 -07:00
Dmitry Kovalev
e608418899 Renaming MB_PREDICTION_MODE to PREDICTION_MODE.
Actually, it would be great to have two separate enums INTRA_MODES and
INTER_MODES in future.

Change-Id: I6c4147cf0002853da9c1e03fe9514eab876f01c8
2014-04-22 17:48:31 -07:00
Dmitry Kovalev
55977e4a4f Merge "Moving frame_frags field from VP9Common to VP9_COMP." 2014-04-15 10:39:31 -07:00
Dmitry Kovalev
63fa722179 Removing unused cost arguments from mcomp functions.
Change-Id: Id81a76d18be6b2de69f81bb563d74c3bb356d434
2014-04-11 10:24:36 -07:00
Yunqing Wang
23ccf71924 Merge "Fix encoder uninitialized read errors reported by drmemory" 2014-04-10 09:45:08 -07:00
Dmitry Kovalev
1d5ed021fb Moving frame_frags field from VP9Common to VP9_COMP.
Change-Id: I0f4a5c50561a2653d22c366c214a937272ecfa2c
2014-04-09 20:56:06 -07:00
Dmitry Kovalev
65e650e0c0 Merge "Revert "Converting set_prev_mi() to get_prev_mi()."" 2014-04-09 20:44:30 -07:00
Dmitry Kovalev
60def47f21 Revert "Converting set_prev_mi() to get_prev_mi()."
This reverts commit 22a3e30790

Change-Id: I460d905edf5fb2006da58c18fbe02c04d0c631bb
2014-04-09 15:23:16 -07:00
Tom Finegan
4fffefe189 Merge "Fix avx builds on macosx with clang 5.0." 2014-04-09 13:03:26 -07:00
Dmitry Kovalev
5ed83c3220 Merge "Converting set_prev_mi() to get_prev_mi()." 2014-04-09 10:27:05 -07:00
Yunqing Wang
2e7d327789 Merge "Use source frame difference to make partition decision" 2014-04-09 10:26:42 -07:00
Yunqing Wang
3a6670fcf8 Fix encoder uninitialized read errors reported by drmemory
This patch fixed the uninitialized read errors in Issue 748:
"dr memory VP9 encode errors". In vp9_convolve_avg_sse2,
when width is 4, pavgb reads 8 bytes from dst buffer that is
out of range. An error is reported although the data is not
actually used later. This issue was resolved by preventing
uninitialized reads.

Change-Id: I109a54910aa47139cb13119de86f2062cff207df
2014-04-09 09:59:15 -07:00
Tom Finegan
f600b50a6e Fix avx builds on macosx with clang 5.0.
The macosx release of clang v5.0 identifies itself as:
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)

This version of clang uses the older _mm_broadcastsi128_si256, like
v3.3, as given away in the LLVM svn version above.

Change-Id: I4d6d59d5454efd57d2ae9e75f5eb7486af7cbd0c
2014-04-08 18:56:03 -07:00
Yunqing Wang
4e66293fcb Use source frame difference to make partition decision
Calculate the difference variance between last source frame and
current source frame. The variance is calculated at 16x16 block
level. The variances are compared to several thresholds to decide
final partition sizes.
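
The first paragraph can be sketched as a per-block variance of the frame difference followed by a threshold test. Everything named here is hypothetical (PartitionChoice, diff_variance_16x16, pick_partition, and both thresholds), and the variance is left unnormalized, which is an assumption about the convention rather than something this message states.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

typedef enum { PART_LARGE, PART_MEDIUM, PART_SMALL } PartitionChoice;

/* Unnormalized variance of (cur - last) over one 16x16 block:
 * sum of squares minus (sum * sum) / 256. */
static uint32_t diff_variance_16x16(const uint8_t *cur, const uint8_t *last,
                                    int stride) {
  int64_t sum = 0, sse = 0;
  assert(cur != NULL && last != NULL && stride >= 16);
  for (int r = 0; r < 16; ++r) {
    for (int c = 0; c < 16; ++c) {
      const int d = cur[r * stride + c] - last[r * stride + c];
      sum += d;
      sse += d * d;
    }
  }
  return (uint32_t)(sse - (sum * sum) / 256);
}

/* Low variance means a static area, so a large partition is enough; high
 * variance suggests motion or detail, so smaller partitions are chosen. */
static PartitionChoice pick_partition(uint32_t var, uint32_t low_thresh,
                                      uint32_t high_thresh) {
  if (var < low_thresh) return PART_LARGE;
  if (var < high_thresh) return PART_MEDIUM;
  return PART_SMALL;
}
```

A uniform brightness change between frames gives a variance of zero under this measure, so it is treated the same as a static block.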

An adaptive strategy is implemented to decide using
SOURCE_VAR_BASED_PARTITION or FIXED_PARTITION based on motions
in the video. The switching test is done once every
search_type_check_frequency frames.

The selection of source_var_thresh needs to be investigated
further later.

RTC set Borg test showed 0.424% overall psnr gain, and 0.357%
ssim gain. For clips with large enough static area, the
encoding speedup is around 2% to 15%.

Change-Id: Id7d268f1d8cbca7fb8026aa4a53b3c77459dc156
2014-04-08 17:03:02 -07:00
Deb Mukherjee
d35df2d8ea High-level hooks for Profile 2 (10/12 bit)
Adds some high-level hooks for profile 2 before further
progress on the implementation.

According to the definition in this patch:
1. Profile 2 only supports 10 or 12 bit color but not 8
2. Profile 2 supports all color sampling modes: 444, 422 and 420,
and alpha plane.
3. Profile 3 is currently undefined.

Please consider the definition carefully and suggest modifications
to the definition as needed.

Change-Id: I5b284fc679e54ac5aee171af72fa7994cfd28995
2014-04-08 16:18:34 -07:00
Dmitry Kovalev
22a3e30790 Converting set_prev_mi() to get_prev_mi().
Change-Id: Iad4002d7aecaae0e25d88e286bacde7e6cd7264f
2014-04-07 16:01:34 -07:00
Dmitry Kovalev
b5e12dda52 Cleaning up vp9_{cx, dx}_iface.c files.
Change-Id: Ib4e31ba74c4b882bd93942ef743f4a189892738d
2014-04-07 10:38:51 -07:00
Dmitry Kovalev
a9f324fa7f Removing interp_kernel from MACROBLOCKD.
Now interp_kernel is obtained when it is really required (based on
mbmi->interp_filter value).

Change-Id: I4c7a93c179d1045eba16e7526c293d02c9b8b47e
2014-04-03 15:28:42 -07:00
Dmitry Kovalev
8b8606a737 Merge "Cleaning up vp9_mvref_common.c." 2014-04-02 11:03:36 -07:00
Dmitry Kovalev
68027a0b8a Merge "Grouping members in MB_MODE_INFO struct." 2014-04-02 11:00:58 -07:00
Dmitry Kovalev
86f44a91f4 Renaming two members in MACROBLOCKD struct.
Renames:
  mi_8x8 -> mi
  mode_info_stride -> mi_stride

Change-Id: I66f3e5fd1e7b7f46f108af5bb711c5fd9493c1be
2014-04-01 17:46:40 -07:00
Dmitry Kovalev
d42976c515 Common configuration for MACROBLOCKD struct.
Change-Id: Ie2ea9dd8bd338cc9fe12ca9033df64f7644c68b3
2014-04-01 10:57:59 -07:00
Dmitry Kovalev
20d868f05d Grouping members in MB_MODE_INFO struct.
Change-Id: Ia6d7e7a08810e0c3401da4d10266828d560e6851
2014-03-28 17:44:13 -07:00
Yaowu Xu
4f857bacd2 [BITSTREAM]Fix the scaling calculation
For very large video images, the scaling calculation may need
values beyond the range of int. This commit widens the value to 64 bits
to make sure the calculation does not wrap around INT_MAX.

The change corrects the decoder behavior.

The bug affects only very large resolution video because the previous
calculation was sufficient for image sizes smaller than 2^13.
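
The kind of widening described here can be illustrated with a fixed-point multiply. This is a sketch only: the function name scale_coord and the 14 fractional bits in SCALE_SHIFT are assumptions, not taken from this message.

```c
#include <assert.h>
#include <stdint.h>

#define SCALE_SHIFT 14  /* assumed number of fractional bits in the scale */

/* Scale a non-negative coordinate by a fixed-point factor. The product is
 * formed in 64 bits so it cannot exceed INT_MAX before the shift. */
static int scale_coord(int val, int scale_fp) {
  assert(val >= 0 && scale_fp >= 0);
  return (int)(((int64_t)val * scale_fp) >> SCALE_SHIFT);
}
```

For example, with val = 2^18 and a 2x scale factor of 2^15 the intermediate product is 2^33, which does not fit in a 32-bit int, while the final result 2^19 does.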

This resolves issue:
https://code.google.com/p/webm/issues/detail?id=750

Change-Id: I2d2ed303ca6482f31f819f3c07d6d3e98ef3adc5
2014-03-28 16:40:29 -07:00
Dmitry Kovalev
03349d2ba2 Moving dqcoeff array to MACROBLOCKD in decoder.
Change-Id: I3e20c0cdb9d2437bddf21afb255855f2dead8e02
2014-03-28 10:36:16 -07:00
Dmitry Kovalev
38053687bc Cleaning up vp9_mvref_common.c.
Change-Id: I4eb815156ecaab02c9182e6e1abbea0e4d86c441
2014-03-27 17:50:02 -07:00
Dmitry Kovalev
0437575848 Merge "Removing prev_mi_8x8 from MACROBLOCKD." 2014-03-26 15:45:11 -07:00
Dmitry Kovalev
38c2d37b9d Merge "Cleaning up vp9_entropymv.c." 2014-03-26 14:28:45 -07:00
Dmitry Kovalev
63f86c149a Removing prev_mi_8x8 from MACROBLOCKD.
Change-Id: I32beb5f18c10b5771146c55933b5555487f53633
2014-03-26 10:50:34 -07:00
Dmitry Kovalev
ed39c40a2e Moving above_context to VP9_COMMON.
Change-Id: I713af99d1e17e05a20eab20df51d74ebfd1a68d2
2014-03-25 10:40:08 -07:00
Yaowu Xu
34a3628a45 Merge "Fixed a build issue" 2014-03-25 10:22:18 -07:00
Yaowu Xu
59872069d2 Merge "Change back the scaling calculation." 2014-03-25 09:48:21 -07:00
Yaowu Xu
8051563972 Fixed a build issue
Adding the missed include file.

Change-Id: I7e48df6b0633afbebaf1ccb3062ae404e7203dc9
2014-03-25 09:45:54 -07:00
Dmitry Kovalev
5b8c834c1a Initialization code cleanup.
Change-Id: I47a8b4bf9a6cc0063d1a6785eaaad641d0659e24
2014-03-24 12:21:22 -07:00
Dmitry Kovalev
49bb6df0e2 Cleaning up vp9_entropymv.c.
Change-Id: I01b3530779da89acb84c71bac5ccac456f00c5ac
2014-03-24 11:02:27 -07:00
Yunqing Wang
b458bb7c20 Merge "AVX2 SAD Optimization:" 2014-03-24 10:52:32 -07:00
Dmitry Kovalev
ac5bdc0ed8 Merge "Cleaning up vp9_loopfilter.c." 2014-03-24 09:02:06 -07:00
hkuang
22232ec602 Change back the scaling calculation.
Make the calculation compatible with Google's HW implementation.

Change-Id: I22e179888cdb0419e230351c0a47661b37051fef
2014-03-24 08:32:56 -07:00
Dmitry Kovalev
9895c9d4dd Merge "Removing redundant {above, left}_seg_context manipulation code." 2014-03-22 22:31:48 -07:00
Dmitry Kovalev
2786938a3c Merge "Renaming and making vp9_update_mode_info_border() static." 2014-03-21 21:19:18 -07:00
Dmitry Kovalev
58cc06f9b3 Cleaning up vp9_loopfilter.c.
Change-Id: I7c7cf7d3c7b00d1c74ffa8aa8fb8d78a0e48326f
2014-03-21 16:31:15 -07:00
Frank Galligan
8345e76d61 Merge "Fix libvpx VP9 decoder dr memory errors" 2014-03-21 15:24:39 -07:00
Dmitry Kovalev
e141f10bfc Renaming and making vp9_update_mode_info_border() static.
Change-Id: Ibb72a29cae9ca9443aae56fc4c5458d190eae279
2014-03-21 14:02:25 -07:00
levytamar82
0fa8b668c1 AVX2 SAD Optimization:
Two functions were optimized for AVX2 by using the full 256-bit registers
to handle 32 elements in parallel instead of only 16:
1. vp9_sad32x32x4d
2. vp9_sad64x64x4d

The function-level gain is 66% and the user-level gain is ~1%.

Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb
2014-03-21 13:53:32 -07:00
Yunqing Wang
9b5df3fabe Fix libvpx VP9 decoder dr memory errors
Fixed dr memory errors reported in Issue 736:
https://code.google.com/p/webm/issues/detail?id=736

All elements in left_col buffer need to be initialized to ensure
the correctness of SIMD operations in x86 optimized code.

Change-Id: I8e7f26ab45cca8099c1f9342bcf852f828bda7e4
2014-03-21 12:23:47 -07:00
Dmitry Kovalev
4cb37bff96 Removing redundant {above, left}_seg_context manipulation code.
Change-Id: Ib3c1746e61220c629cbd971b2458aa686b5c9e36
2014-03-21 12:12:55 -07:00
Dmitry Kovalev
a57de9da03 Merge "Reusing {above, left}_seg_context vars in both encoder and decoder." 2014-03-21 12:02:42 -07:00
Yaowu Xu
46c71e5eba Merge "Remove duplicate declaration" 2014-03-21 08:44:04 -07:00
Dmitry Kovalev
7ad40117f1 Reusing {above, left}_seg_context vars in both encoder and decoder.
Change-Id: Id1fa36c92cb007b73a450cc8552e810cedad38b9
2014-03-20 16:15:57 -07:00
Dmitry Kovalev
03781ff22d Merge "Removing mi_stream." 2014-03-20 13:43:13 -07:00
Dmitry Kovalev
4b37dc8d87 Adding alloc_mi() function.
Change-Id: I3b944884c048f589c86e0169aeb3c3855bc8b729
2014-03-19 13:31:47 -07:00
Yaowu Xu
7ef16efca1 Remove duplicate declaration
Change-Id: Ic8e52a89e0df816c38cd8ff1b7c53862b9a6dff2
2014-03-19 12:23:32 -07:00
Yaowu Xu
8cb59992e8 Merge "Fix the md5 mismatch for some scale cases." 2014-03-19 11:13:28 -07:00
Dmitry Kovalev
8ccfcb765f Removing mi_stream.
Change-Id: If674140e30c223c88894b983fd22a583efb99dcf
2014-03-19 10:47:32 -07:00
Dmitry Kovalev
b8bc2d337a Fixing warnings/errors from c++ compiler.
Change-Id: Ia561dda53f2dd10e3a10a2df2adb8027ab19397a
2014-03-18 10:47:51 -07:00
hkuang
1f7e4856f8 Fix the md5 mismatch for some scale cases.
Fixes issue #731
Change-Id: Id313e84b8fb4ff20f6a4e1ed11cb601927888318
2014-03-17 11:21:43 -07:00
Dmitry Kovalev
7c6337ba9e Merge "Adding vp9_swap_mi_and_prev_mi() function." 2014-03-13 17:47:27 -07:00
Dmitry Kovalev
d8e5564129 Using MB_PREDICTION_MODE enum instead of int.
Change-Id: I652d17f7bff84f75d015f4f39652472e14eb3134
2014-03-13 15:03:00 -07:00
Dmitry Kovalev
e65c564c78 Adding vp9_swap_mi_and_prev_mi() function.
Change-Id: I18b3939f0b51085cdd25c9182c3a9c7536ca7e3e
2014-03-13 13:55:33 -07:00
Dmitry Kovalev
3dca8ca7af Merge "Renaming mode2txfm_map to intra_mode_to_tx_type_lookup." 2014-03-12 23:29:29 -07:00
Yaowu Xu
17256ad763 Revert "With on demand border extension, clamping the MV"
This reverts commit b0fec6ab4a.

Change-Id: I9acd8ee0423f22d92138f11579611ff959331013
2014-03-12 19:40:15 -07:00
Yaowu Xu
acf2eb73e7 Revert "Remove dec_build_inter_predictors() parameters"
This reverts commit 9650b9d72a.

Change-Id: I841c4a4734170fda63469e32adc10703aa4bf0fa
2014-03-12 19:39:59 -07:00
Dmitry Kovalev
95aed4a3fa Renaming mode2txfm_map to intra_mode_to_tx_type_lookup.
Change-Id: I9a19eb96907f674e3ce1e573f5dd49f0fbf2ae4f
2014-03-12 17:23:26 -07:00
Dmitry Kovalev
c909b43e3c Merge "Moving mi_streams from VP9Decompressor to VP9Common." 2014-03-12 12:20:18 -07:00
Dmitry Kovalev
fec0d4bc7d Merge "Removing last_mi from MACROBLOCKD struct." 2014-03-12 12:19:43 -07:00
Dmitry Kovalev
dff81e6c7a Moving mi_streams from VP9Decompressor to VP9Common.
Change-Id: I7ad79c061ad4efbc4914ac49723b48183fdbdd47
2014-03-10 16:12:45 -07:00
Dmitry Kovalev
ff935ff781 Removing last_mi from MACROBLOCKD struct.
Change-Id: Ied12b39c55667b26fd3bf90eb331e601c53a10f6
2014-03-10 16:02:03 -07:00
Dmitry Kovalev
6281a9abbb Adding type casts to remove C++ compiler errors.
Change-Id: I224e49955ad6c833d204feb8efc4056e37d206be
2014-03-10 14:53:30 -07:00
Dmitry Kovalev
f8f8c6d44c Adding reusable get_y_mode_prob() function.
Change-Id: Iebd182d7aeebc0f8964b6fd35057449bb25b00c1
2014-03-10 10:50:16 -07:00
Jim Bankoski
622f06eb59 Merge "vp9_reconinter.h static functions in header converted to global" 2014-03-10 07:36:05 -07:00
Jim Bankoski
ffda0cde7b Merge "vp9_onyxc_int.h static -> static inline in header" 2014-03-10 07:35:54 -07:00
Dmitry Kovalev
0ac2139d02 Merge "Removing vp9_onyx.h and moving its content to the encoder." 2014-03-06 11:49:41 -08:00
James Zern
e7fe1543f6 Merge "vp9_systemdependent: reorder includes avoid proto mismatch" 2014-03-06 11:42:50 -08:00
James Zern
fe49c05214 Merge "vp9_subpixel_8t_intrin_avx2: fix build w/clang 3.4+" 2014-03-06 11:41:44 -08:00
James Zern
caecedc92f vp9_subpixel_8t_intrin_avx2: fix build w/clang 3.4+
clang reports gcc-4.2.1 in e.g., 3.3, 3.4; add a specific clang version
check for _mm256_broadcastsi128_si256

fixes issue #720

Change-Id: I5c8e3c27fdea05d8a5b050e8cb74894b595f4709
2014-03-06 10:55:44 -08:00
Dmitry Kovalev
3f1ab25812 Removing vp9_onyx.h and moving its content to the encoder.
Change-Id: I03451c88536bc498edddbe0cd9773ff79da085c2
2014-03-05 23:33:22 -08:00
James Zern
e9680bef22 vp9_systemdependent: reorder includes avoid proto mismatch
fixes a warning in vs9/x64 related to ceil()

Change-Id: Ic4bde9d0b7e961546dbe304de74aa37fc02fcf94
2014-03-05 22:02:29 -08:00
Dmitry Kovalev
08a7d7e405 Merge "Renaming NMV_UPDATE_PROB to MV_UPDATE_PROB." 2014-03-05 21:39:09 -08:00
Dmitry Kovalev
bb9b6a9568 Merge "Cleaning up vp9_mvref_common.c." 2014-03-05 10:57:37 -08:00
Dmitry Kovalev
791751015f Merge "Removing VP9_PTR." 2014-03-05 10:57:10 -08:00
Dmitry Kovalev
d31fc628a7 Renaming NMV_UPDATE_PROB to MV_UPDATE_PROB.
Change-Id: I7f3bcca103f0b1f6b3c064b61472543de9a8288a
2014-03-05 10:37:52 -08:00
Dmitry Kovalev
fe7b1d0a8d Removing VP9_PTR.
Change-Id: Ib49d8dbc67c590f22a1a70251ff607c9f38febd7
2014-03-03 16:50:16 -08:00
Jim Bankoski
e5e9b05d68 vp9_reconinter.h static functions in header converted to global
Change-Id: I916944950deb22f4c2301d83a803b732bf3ecd77
2014-03-03 14:58:43 -08:00
Jim Bankoski
3d12e65483 vp9_onyxc_int.h static -> static inline in header
Change-Id: Ib65fb0679156960305b10fbf590254ff6bf1bfe1
2014-03-03 14:50:07 -08:00
James Zern
805078a1bf build: convert rtcd.sh to perl
significantly speeds up file generation.

the goal of this change is to convert rtcd.sh to perl as directly as
possible to allow for simple comparison. future changes can make it more
perl-like.

---
Linux
    [CREATE] vpx_scale_rtcd.h
real    0m0.485s ->    0m0.022s
    [CREATE] vp8_rtcd.h
real    0m4.619s ->    0m0.060s
    [CREATE] vp9_rtcd.h
real    0m10.102s ->    0m0.087s

Windows
    [CREATE] vpx_scale_rtcd.h
real    0m8.360s ->    0m0.080s
    [CREATE] vp8_rtcd.h
real    1m8.083s ->    0m0.160s
    [CREATE] vp9_rtcd.h
real    2m6.489s ->    0m0.233s

Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
2014-03-03 14:47:11 -08:00
Dmitry Kovalev
be647f7b83 Merge "Adding get_tx_type() instead of get_tx_type_{8x8, 16x16}." 2014-03-03 14:24:28 -08:00
Dmitry Kovalev
594677a76b Merge "Moving FRAME_CONTEXT & FRAME_COUNTS to vp9_entropymode.h." 2014-03-03 14:24:04 -08:00
Dmitry Kovalev
46af01d719 Adding get_tx_type() instead of get_tx_type_{8x8, 16x16}.
Change-Id: I4a54b12e5229705222c5a101258b9d1f81e2948d
2014-03-03 12:20:51 -08:00
Dmitry Kovalev
c288367678 Adding consts and cleaning up vp9_rdopt.
Change-Id: I9423b543e1be414e5c9e10480b813f06e6b88f8a
2014-03-03 12:19:51 -08:00
Yunqing Wang
d4648d93f4 Merge "AVX2 SubPixel AVG Variance Optimization" 2014-03-03 09:01:36 -08:00
Yaowu Xu
9650b9d72a Remove dec_build_inter_predictors() parameters
There were two parameters not in use, this commit removed them.

Change-Id: Ia03a73b9a2521400bed539df45574e34214ed93a
2014-03-01 11:14:00 -08:00
Yaowu Xu
2f4eb5f096 Remove vp9_create_common()
The function has evolved over time and now only calls vp9_rtcd(), so this
commit removes it and calls vp9_rtcd() directly.

Change-Id: I8cfa6190daa4b28f6f3d1e11bb3a07f9c95322bf
2014-03-01 10:59:24 -08:00
levytamar82
ea14909687 AVX2 SubPixel AVG Variance Optimization
Optimizing 2 functions to process 32 elements in parallel instead of 16:
1. vp9_sub_pixel_avg_variance64x64
2. vp9_sub_pixel_avg_variance32x32
Both functions were calling vp9_sub_pixel_avg_variance16xh_ssse3.
They now call vp9_sub_pixel_avg_variance32xh_avx2 instead, which is
written in AVX2 and processes 32 elements in parallel.
This optimization gave an 80% function-level gain and a 2% user-level gain.

Change-Id: Iea694654e1b7612dc6ed11e2626208c2179502c8
2014-02-28 22:51:04 -07:00
Dmitry Kovalev
d689f2ad33 Cleaning up vp9_mvref_common.c.
different_ref_found is always equal to one (if calculated) because
ref_frame[0] != ref_frame[1] for each mi-block.

Change-Id: Ibd7625b7b29dec2fd3c40edbc3de1169abb78585
2014-02-28 15:12:33 -08:00
Dmitry Kovalev
e68cc30bb5 Moving FRAME_CONTEXT & FRAME_COUNTS to vp9_entropymode.h.
Change-Id: I1fe71e35b1e44da693b43d26607abb33efd56820
2014-02-28 13:56:43 -08:00
Dmitry Kovalev
e4159100bc Merge "Adding get_y_mode() function." 2014-02-28 11:12:22 -08:00
Dmitry Kovalev
28bd1dd15e Merge "Adding consts to arguments of vp9_block_error()." 2014-02-28 10:51:43 -08:00
Dmitry Kovalev
3a83d08a08 Merge "Moving get_tx_eob() from common to encoder." 2014-02-28 10:49:47 -08:00
hkuang
edcbbf2ee3 Merge "Fix a bug in neon that has not save and restore q4-q7 registers." 2014-02-28 09:48:26 -08:00
Dmitry Kovalev
3b2cd9137a Moving get_tx_eob() from common to encoder.
Change-Id: I7d11c6ae259aff6560710d16fea3032c661e5b02
2014-02-27 18:26:44 -08:00
Dmitry Kovalev
791e9bdac9 Adding consts to arguments of vp9_block_error().
Change-Id: Id145da99259866109cfee8b47a1d8f309944b937
2014-02-27 18:17:08 -08:00
Dmitry Kovalev
1ae91f7784 Adding get_y_mode() function.
Change-Id: Iaac57b24f79cd205a8c62bc1177412d22f5787a8
2014-02-27 16:05:50 -08:00
hkuang
f3d8e315ac Fix a bug in neon that has not save and restore q4-q7 registers.
Change-Id: Ie21b5ae89100389b80f919710839084f935a8545
2014-02-27 14:06:52 -08:00
Minghai Shang
3a8deeb8b6 Merge "[svc] Add target bitrate settings for each layers." 2014-02-27 10:51:26 -08:00
Dmitry Kovalev
2c594a5275 Removing vp9_systemdependent.c.
Change-Id: I7b9738a7113c0c4687e5d320581ff69d98a8b271
2014-02-26 18:07:23 -08:00
Minghai Shang
8c196b27b3 [svc] Add target bitrate settings for each layers.
Change-Id: Ia7677fb436667bc4f76db71f65e4784f433f7826
2014-02-26 13:30:50 -08:00
hkuang
08f250f565 Merge "Fix a bug in intra prediction due to change in 25e55526301eba7d6e5c68e25402e9b2102976d8." 2014-02-26 11:56:45 -08:00
hkuang
1c4e449133 Fix a bug in intra prediction due to change in
25e5552630.

Change-Id: I17ac67c3ced91ad4f057b296f7e8dc86a3389f26
2014-02-25 17:54:33 -08:00
Dmitry Kovalev
7bca32a6a3 Merge "Changing vp9_full_search_sad{, x3, x8} signatures." 2014-02-25 10:51:17 -08:00
Yaowu Xu
05e850cb9e added clamp of segment loop filter level
for ABSDATA mode, so the segment loop filter level always falls in the valid
range for both absolute and delta modes.

Change-Id: If90df3411479533dbdab63f8ae088d2f5dd174a9
2014-02-24 09:56:48 -08:00
Yaowu Xu
bfaf415ea7 Merge "Added clamp of qindex to valid range" 2014-02-24 08:28:07 -08:00
Dmitry Kovalev
2aacc66b66 Merge "Cleaning up vp9_mvref_common.{h, c}." 2014-02-23 08:25:40 -08:00
Yaowu Xu
e22b12e304 Added clamp of qindex to valid range
The qindex for a segment was not clamped in ABSDATA mode, which may
cause invalid memory access if an ill-formed stream has a negative
value in ABSDATA mode. This commit adds a clamp to make sure the qindex
for a segment always falls into the valid range.

Change-Id: I0a74d00f4ef40aec7edaeca1d03c8645e23ab08c
2014-02-22 12:30:18 -08:00
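The clamp described in e22b12e304 above can be sketched roughly as follows. This is a minimal illustration, not the code from the commit: the names segment_qindex, clamp_int and the SKETCH_* constants are hypothetical, and the 0..255 range is an assumption about the valid qindex range.

```c
#include <assert.h>

/* Assumed constants for this sketch; not copied from libvpx headers. */
enum { SKETCH_MAXQ = 255 };
enum { SKETCH_SEGMENT_DELTADATA = 0, SKETCH_SEGMENT_ABSDATA = 1 };

/* Restrict value to the closed range [low, high]. */
static int clamp_int(int value, int low, int high) {
  return value < low ? low : (value > high ? high : value);
}

/* Return the quantizer index for a segment.
 * In ABSDATA mode seg_data is used directly; in delta mode it is added to
 * the frame-level base_qindex. The result is clamped in both modes, so a
 * negative or oversized value from an ill-formed stream cannot be used to
 * index outside the quantizer tables. */
static int segment_qindex(int base_qindex, int seg_data, int abs_delta) {
  const int q = (abs_delta == SKETCH_SEGMENT_ABSDATA)
                    ? seg_data
                    : base_qindex + seg_data;
  return clamp_int(q, 0, SKETCH_MAXQ);
}
```

Clamping after the mode switch, rather than only in the delta branch, is the point of the fix: the absolute path is the one that previously let an out-of-range value through.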
Yaowu Xu
f1633e5844 Merge "Remove an unused variable" 2014-02-21 22:44:05 -08:00
Alex Converse
6e3cf6ec1d Stop gating non420 features with a configure flag.
Change-Id: I8cc38fdef6a2a0968af8dfe15e7c2b3c46c531ea
2014-02-21 12:05:29 -08:00
James Zern
e2f614be53 Merge "vp9_subpixel_8t_intrin_ssse3.c: make some tables static" 2014-02-20 16:02:16 -08:00
James Zern
3240db7407 Merge "vp9_subpixel_8t_intrin_avx2.c: make some tables static" 2014-02-20 16:01:50 -08:00
Yaowu Xu
c58e1c7be9 Remove an unused variable
Change-Id: I8eeec70a7d4403243762f14d0b560792801645e8
2014-02-20 14:49:44 -08:00
James Zern
10f2db2b1f Merge "vp9: normalize DECLARE_ALIGNED use on global tables" 2014-02-19 11:38:47 -08:00
Dmitry Kovalev
d43c5cc5ea Cleaning up vp9_mvref_common.{h, c}.
Hiding vp9_find_mv_refs_idx() inside vp9_mvref_common.c, moving definition
of vp9_find_mv_refs() to vp9_mvref_common.c.

Change-Id: I0c9f34b03648785a7d18edf6d4fddd34e55dfcc5
2014-02-19 14:23:51 +01:00
Dmitry Kovalev
35bd886864 Merge "Cleaning up pack_inter_mode_mvs() function." 2014-02-19 01:04:36 -08:00
James Zern
b78c219c80 vp9: normalize DECLARE_ALIGNED use on global tables
- place extern within the macro
- use in the header only

Change-Id: I4274b345d8af9ef329c0eb9553a3ddaad70d1d26
2014-02-18 22:57:43 -08:00
James Zern
d73d621e5d vp9_subpixel_8t_intrin_ssse3.c: make some tables static
+ fix formatting

Change-Id: I344d4de089d03e403f0c7b3e64aeb7086cce86ac
2014-02-18 20:42:00 -08:00
James Zern
a96af49bab vp9_subpixel_8t_intrin_avx2.c: make some tables static
+ fix formatting

Change-Id: Ia62610bff3d63855104366d7860749b6a3cf4577
2014-02-18 20:40:40 -08:00
James Zern
26c8e720ca Merge "vp9_filter: move table alignment decl's to header" 2014-02-18 20:15:33 -08:00
Yunqing Wang
0cc71c9c9f Merge "SSSE3 convolution optimization" 2014-02-18 12:55:34 -08:00
Yunqing Wang
ad8d4454f0 Merge "AVX2 SubPixel Variance Optimization" 2014-02-18 12:18:13 -08:00
Dmitry Kovalev
36420009ea Changing vp9_full_search_sad{, x3, x8} signatures.
Passing block MV pointer instead of block index into
vp9_full_search_sad{, x3, x8} functions.

Change-Id: Ica07356633471c2c8f81b583a7aeba85a436bafb
2014-02-17 14:24:57 +01:00
James Zern
8092080216 vp9_filter: move table alignment decl's to header
avoids mismatched alignment warnings in visual studio builds

Change-Id: I2cedb8042fd47e708bde3f7168a6fb4bd9aaa569
2014-02-15 10:18:24 -08:00
James Yu
e486488ce8 Replace vqshrun by vqmovun if shift #0 bit
Change-Id: Ifabb8c7ec0c327fea9d6739cab10addb060ff435
Signed-off-by: James Yu <james.yu@linaro.org>
2014-02-14 21:03:40 -08:00
Johann
4378503665 Merge "Remove redundant arm neon instructions." 2014-02-14 20:02:51 -08:00
levytamar82
52dac5d1cb AVX2 SubPixel Variance Optimization
Optimizing 2 functions to process 32 elements in parallel instead of 16:
1. vp9_sub_pixel_variance64x64
2. vp9_sub_pixel_variance32x32
Both functions were calling vp9_sub_pixel_variance16xh_ssse3.
They now call vp9_sub_pixel_variance32xh_avx2 instead, which is
written in AVX2 and processes 32 elements in parallel.
This optimization gave a 70% function-level gain and a 2% user-level gain.

Change-Id: I4f5cb386b346ff6c878a094e1c3b37e418e50bde
2014-02-14 16:59:11 -07:00
Adrian Grange
b7be30eb36 Cleanup some comments.
Change-Id: I568861ba1d43620865ad9a98a97eef37a51fd856
2014-02-14 15:05:30 -08:00
Yaowu Xu
ecf392a155 Merge "minor spelling cleanup in comments" 2014-02-14 14:29:35 -08:00
levytamar82
3068d7d944 SSSE3 convolution optimization
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
The optimizations include:
- processing 2x8 elements in one 128-bit register instead of processing
8 elements in one 128-bit register.
- removing unnecessary loads.
This optimization gives a 2.4% user-level gain for 480p input
and a 1.6% user-level gain for 720p.
This optimization is done only for 64-bit.

Change-Id: Ic07fce2f9360329b4f2d956efda1480ae958766b
2014-02-14 15:08:42 -07:00
Dmitry Kovalev
19a8eee1f0 Cleaning up pack_inter_mode_mvs() function.
Change-Id: I48ad06e3e1ae9720a0683022621f4504e3bebce6
2014-02-13 19:21:10 -08:00
Yaowu Xu
8d646becb6 Merge "Removed the reset of mode_info from previous frame" 2014-02-13 17:03:50 -08:00
Frank Galligan
fb8c246b70 Merge "Add VP9 decoder support for external frame buffers" 2014-02-13 15:29:52 -08:00
Frank Galligan
a4f30a5023 Add VP9 decoder support for external frame buffers
Added support for external frame buffers to libvpx's VP9 decoder.
If the external frame buffer functions are set then libvpx will
call the get function whenever it needs a new frame buffer to
decode a frame into. And it will call the release function
whenever there are no more references to that buffer.

Change-Id: Id2934d005f606af6e052fb6db0d5b7c02f567522
2014-02-13 13:14:19 -08:00
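The get/release callback pair described in a4f30a5023 above could look something like the sketch below from the application side. Everything here is hypothetical (SketchFrameBuffer, SketchPool, the callback names and their signatures); it only illustrates the idea of a pool that hands out a buffer on "get" and marks it free on "release", and it is not the libvpx interface.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical descriptor for one externally owned frame buffer. */
typedef struct {
  unsigned char *data; /* Backing memory, grown on demand. */
  size_t size;         /* Current capacity in bytes. */
  int in_use;          /* Non-zero while the decoder references it. */
} SketchFrameBuffer;

enum { SKETCH_POOL_SIZE = 4 };

/* Hypothetical application-side pool, passed to callbacks as priv. */
typedef struct {
  SketchFrameBuffer slots[SKETCH_POOL_SIZE];
} SketchPool;

/* "get" callback: called when the decoder needs a buffer for a new frame.
 * Picks the first free slot, grows it to at least min_size bytes, and
 * returns it through out. Returns 0 on success, -1 on failure. */
static int sketch_get_frame_buffer(void *priv, size_t min_size,
                                   SketchFrameBuffer **out) {
  SketchPool *const pool = (SketchPool *)priv;
  int i;
  for (i = 0; i < SKETCH_POOL_SIZE; ++i) {
    SketchFrameBuffer *const fb = &pool->slots[i];
    if (fb->in_use) continue;
    if (fb->size < min_size) {
      unsigned char *const grown =
          (unsigned char *)realloc(fb->data, min_size);
      if (grown == NULL) return -1;
      fb->data = grown;
      fb->size = min_size;
    }
    fb->in_use = 1;
    *out = fb;
    return 0;
  }
  return -1; /* Every slot is still referenced. */
}

/* "release" callback: called when no frame references the buffer anymore.
 * The memory is kept so a later "get" can reuse it. */
static int sketch_release_frame_buffer(void *priv, SketchFrameBuffer *fb) {
  (void)priv;
  fb->in_use = 0;
  return 0;
}
```

Keeping the allocation alive across release is a design choice of this sketch: it trades memory for fewer allocations when frames of the same size are decoded repeatedly.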
Yaowu Xu
896d79a57e Removed the reset of mode_info from previous frame
Prior to this commit, both encoder and decoder reset mode/mv info from
the previous frame in error resilient mode to ensure bitstreams remain
decodable when a frame is lost on the decoder side. However, this is
not necessary. This commit removes the reset, so the encoder can
continue to use mode/mv/partition information from the previously encoded
frame without affecting decodability under frame loss.

Change-Id: I0279f862900dc647fb471ae3389770bb1b9f454f
2014-02-13 12:48:08 -08:00
Dmitry Kovalev
df6c523fed Merge "Renaming skip_coeff to skip for consistency." 2014-02-13 11:04:34 -08:00
Frank Galligan
e5a1b214f7 Merge "Fix neon wide loopfilter for filter8 only branch" 2014-02-13 09:52:48 -08:00
Yunqing Wang
92824a9cbc Merge "AVX2 Convolve Optimization" 2014-02-13 09:43:55 -08:00
levytamar82
876c72a093 AVX2 Convolve Optimization
Two convolve functions were optimized for AVX2:
1. vp9_filter_block1d16_h8
2. vp9_filter_block1d16_v8
vp9_filter_block1d16_v8 was optimized for AVX2 by halving the number of
loop strides: two strides are processed in parallel.
vp9_filter_block1d16_v8 was also optimized in the same way, and some of the
loads were moved outside of the loop to avoid redundant
loads.
This optimization gives a 43% function-level gain and a 1.3% user-level gain.
It can now be compiled on Windows.

Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
2014-02-12 20:45:31 -07:00
Frank Galligan
b41acbf9bb Fix neon wide loopfilter for filter8 only branch
The current code removed the check to only perform the filter8.

Change-Id: Ie54e19a77745042a5660eab986d9ef1c42e82410
2014-02-12 18:36:17 -08:00
Dmitry Kovalev
004c8c636e Renaming skip_coeff to skip for consistency.
Change-Id: I036e815ca63d00cba71202ae09ba0f6ef745dcb8
2014-02-12 17:44:12 -08:00
Andrew Russell
549c31f8ae minor spelling cleanup in comments
Change-Id: Ia91c6c406273345b08505097ffe1af3896980f06
2014-02-12 16:32:51 -08:00
Dmitry Kovalev
50712fcaa9 Adding consts to mv search function arguments.
Change-Id: Ie79114bba4f0cea55d9f701e20d2be2017630f3b
2014-02-12 14:28:23 -08:00
Dmitry Kovalev
0109d757ee Merge "Removing vp9_foreach_transformed_block_uv() function." 2014-02-12 12:11:14 -08:00
Jingning Han
e8b7610e8f Use INTER_OFFSET in vp9_pick_inter_mode
Cosmetic change to use pre-defined macros.

Change-Id: I93e9fa90113d0242599048940b39694660385a6f
2014-02-12 09:14:29 -08:00
James Yu
619f29cdb0 Remove redundant arm neon instructions.
Change-Id: I1fabad59747eb5f68c64275a36c3a1d94daf32a3
Signed-off-by: James Yu <james.yu@linaro.org>
2014-02-11 21:19:12 -08:00
Dmitry Kovalev
79dd1f8441 Removing vp9_foreach_transformed_block_uv() function.
Change-Id: I35ec77b71e6fd686865cead9281e4dd9e9bc9e86
2014-02-11 18:06:00 -08:00
Tom Finegan
c49c75fde0 Merge "vp9/common/x86: Silence MSVC warnings in vp9_asm_stubs.c." 2014-02-11 14:39:27 -08:00
Frank Galligan
d51ca0db00 Merge "Add get release decoder frame buffer functions." 2014-02-11 08:19:37 -08:00
Dmitry Kovalev
803a5c67dd Merge "Encoder quantization cleanup." 2014-02-10 21:32:04 -08:00
Tom Finegan
60e91a92c3 vp9/common/x86: Silence MSVC warnings in vp9_asm_stubs.c.
Update filter_1dfunction definition to match usage.

Change-Id: Ie3cae13dc1ec3f5838c5f29d1c76a1a98a9217fa
2014-02-10 15:08:42 -08:00
Frank Galligan
e8e152799b Add get release decoder frame buffer functions.
This CL changes libvpx to call a function when a frame buffer
is needed for decode. Libvpx will call a release callback when
no other frames reference the frame buffer. This CL adds a
default implementation of the frame buffer callbacks. Currently
only VP9 is supported. A future CL will add support for
applications to supply their own frame buffer callbacks.

Change-Id: I1405a320118f1cdd95f80c670d52b085a62cb10d
2014-02-10 14:08:11 -08:00
Jim Bankoski
3c790ec0f8 Convert small static header functions to inline
Change-Id: I467b28346a0d8d4d8b96d6c05fc39c34eec26e5c
2014-02-10 07:56:45 -08:00
Jim Bankoski
b5f59ea280 Convert small static functions in header to inline..
Change-Id: Ic4fc01be7738fbabf8c7860dbe3476ab4caf5fc2
2014-02-10 07:56:38 -08:00
Jim Bankoski
7341725e13 Convert small header functions to inline
Change-Id: I4e5575f0d7ccfe2361b8cbf78e7dc079272c9f5f
2014-02-10 07:56:29 -08:00
Jim Bankoski
69f58b40e0 Convert header static functions to inline or make them global.
Change-Id: Ib26fbfef3505299f754e5af6c437a85d7746fc28
2014-02-10 07:39:12 -08:00
Jim Bankoski
6a9e58cb1d Converted functions in header to INLINE...
Change-Id: I00512c6cef3a4af8df57c7263ceb853fb2db8140
2014-02-09 20:12:04 -08:00
Jim Bankoski
18c8deabbf Convert functions to inline that are small .
Change-Id: I3b160e93d9319c8e1abda2a60f49f89c409d534b
2014-02-09 20:08:58 -08:00
Jim Bankoski
9768d0b184 Convert functions to inline that are in headers static.
Change-Id: If1ec3b64be327e8c48ec7efbacde208d2129fdb0
2014-02-09 20:06:35 -08:00
Jim Bankoski
99e4c508b2 Converted function to inline
Change-Id: Iaa4880c8a207cfea509608e1ef4593794b6b31f2
2014-02-09 20:04:54 -08:00
Jim Bankoski
3a3aa3f4e3 Converted short static functions to inline.
Change-Id: I859719d41ced2e35d2765b636e627bb7edc3651e
2014-02-09 19:58:54 -08:00
Tom Finegan
bf79a4da77 vp9/common: Silence MSVC warning in vp9_convolve.c.
Added cast to int to silence MSVC warning.

Change-Id: I9ef4709d2e4cf0db070d9e52385c1b3f138b00a5
2014-02-07 10:13:57 -08:00
Dmitry Kovalev
005fc6970b Finally removing "short" from transform names.
Change-Id: I5259b68dc1bcceb153e3ffe638a79a59a3019e9d
2014-02-06 11:54:15 -08:00
Marco Paniconi
4864ab21b0 Layer based rate control for CBR mode.
This patch adds a buffer-based rate control for temporal layers,
under CBR mode.

Added vpx_temporal_scalable_patterns.c encoder for testing temporal
layers, for both vp9 and vp8 (replaces the old vp8_scalable_patterns).

Updated datarate unittest with tests for temporal layer rate-targeting.

Change-Id: I8900a854288b9354d9c697cfeb0243a9fd6790b1
2014-02-06 09:24:45 -08:00
Dmitry Kovalev
f32fa45cba Merge "Cleaning up vp9_get_pred_context_single_ref_p1()." 2014-02-05 18:38:38 -08:00
Dmitry Kovalev
4a1a7919da Merge "Removing "_1d" suffix from mips transform code." 2014-02-05 18:37:49 -08:00
Yunqing Wang
7ad56bf3c9 Merge "Optimize bilinear sub-pixel filters in ssse3" 2014-02-05 17:20:52 -08:00
Dmitry Kovalev
724fefb4cf Cleaning up vp9_get_pred_context_single_ref_p1().
Change-Id: I279343b474d7ff41afcf8f1493b6fbf716b51823
2014-02-05 11:48:01 -08:00
Dmitry Kovalev
a536237228 Merge "Cleaning up vp9_get_pred_context_single_ref_p2()." 2014-02-05 11:37:17 -08:00
Martin Storsjo
03bc491721 arm: Consistently use braces around doubleword arguments to vld
This isn't strictly necessary, but makes the file more consistent
with the other arm assembly source files.

Change-Id: I245c9677d89e0ab3f31991e473764858af35b180
2014-02-05 13:24:25 +02:00
Martin Storsjo
c2bb1aa544 arm: Use {} around quadword arguments to vld
This fixes building for iOS.

Change-Id: Ice082648c02a3faf93891f7ddc122875e2bdc9cb
2014-02-05 13:24:17 +02:00
James Zern
d89f861f4b vp9_systemdependent.h: relocate system includes
avoid wrapping msvc includes with extern "C"; this breaks some visual
studio builds of the (c++) tests.

Change-Id: Ie8062d55d4f4c049f6cd360a36da6a67607df132
2014-02-04 18:28:45 -08:00
Dmitry Kovalev
c31cf0d647 Merge "Moving x1 & y1 calculation under if condition." 2014-02-04 14:50:25 -08:00
hkuang
b0fec6ab4a With on demand border extension, clamping the MV
is no longer needed.

Change-Id: I40c37ef18c67ab27fc336694dfca3c43a87c47ca
2014-02-04 13:57:40 -08:00
Yunqing Wang
d1961e6fbf Optimize bilinear sub-pixel filters in ssse3
This patch added ssse3 optimization of bilinear sub-pixel filters.
The real-time encoder was sped up by ~1%.

Change-Id: Ie82e98976f411183cb8c61ab8d2ba0276e55a338
2014-02-04 08:01:55 -08:00
James Zern
2b7338aca4 Merge "vp9_filter.h: rename interp_kernel type" 2014-02-03 23:12:28 -08:00
Dmitry Kovalev
5daaff527e Moving x1 & y1 calculation under if condition.
Change-Id: Iae787d491f7cfe24855ef8f2d04e2c6c19350378
2014-02-03 18:03:17 -08:00
Dmitry Kovalev
64cca45c1d Cleaning up vp9_get_pred_context_single_ref_p2().
Change-Id: I294075acd3073c41e153079ff4462816898b3778
2014-02-03 17:46:34 -08:00
James Zern
cca4276dac vp9_filter.h: rename interp_kernel type
-> InterpKernel
avoids conflicts in variable names, fixing the build with various
toolchains.

broken since:
8691565 Removing subpix_fn_table struct.

Change-Id: Ib5f6fdbcb494a97b62c75b99d4d826ff25d4c981
2014-02-03 16:48:38 -08:00
Alex Converse
be1b41673f Merge "INLINE and reimplement get_unsigned_bits()." 2014-02-03 16:26:33 -08:00
Dmitry Kovalev
220b8f8644 Encoder quantization cleanup.
Change-Id: I633205c95f0e81ce0589580501d0be4425a3cb8e
2014-02-03 14:57:28 -08:00
Dmitry Kovalev
282f36adc4 Merge "Removing "_short" suffix from arm transform file names." 2014-02-03 14:28:47 -08:00
Alex Converse
ffd3d4834b INLINE and reimplement get_unsigned_bits().
The new implementation disagrees when the argument is equal to 2**n but
that is never called in practice and based on how it is used the new
implementation is correct in that case.

Change-Id: Ifbac4ad87d459fe6bd2fd0f400c0340f96617342
2014-02-03 12:16:22 -08:00
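The disagreement at 2**n mentioned in ffd3d4834b above can be illustrated with two small bit-count functions. Both are illustrative reconstructions under assumption, not the code from the commit: bits_by_loop, bits_by_msb and msb_index are hypothetical names, and msb_index is a portable stand-in for a most-significant-bit helper.

```c
#include <assert.h>

/* Index of the most significant set bit; n must be non-zero. */
static int msb_index(unsigned int n) {
  int index = 0;
  while ((n >>= 1) != 0) ++index;
  return index;
}

/* Loop-based count: bits needed to represent num_values - 1. */
static int bits_by_loop(unsigned int num_values) {
  int bits = 0;
  if (num_values <= 1) return 0;
  --num_values;
  while (num_values > 0) {
    ++bits;
    num_values >>= 1;
  }
  return bits;
}

/* MSB-based count: bits needed to represent num_values itself. */
static int bits_by_msb(unsigned int num_values) {
  return num_values > 0 ? msb_index(num_values) + 1 : 0;
}
```

For arguments that are not powers of two the two counts match (for example both give 3 for 5, 6 and 7); for an exact power of two the MSB-based count is one larger (3 instead of 2 for 4).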
Yunqing Wang
2488cb34bc Optimize bilinear sub-pixel filters in sse2
Using bilinear filters could speed up the codec in real-time mode.
This patch added sse2 optimizations of bilinear filters that
operate on different-sized blocks.

Tests showed that the real-time encoder was sped up by 3%.

Change-Id: If99a7ee4385fcc225c3ee7445d962d5752e57c3f
2014-02-03 10:34:45 -08:00
Marco Paniconi
6be2b750b8 Layer based rate control for CBR mode.
This patch adds a buffer-based rate control for temporal layers,
under CBR mode.

Added vpx_temporal_scalable_patterns.c encoder for testing temporal
layers, for both vp9 and vp8 (replaces the old vp8_scalable_patterns).

Updated datarate unittest with tests for temporal layer rate-targeting.

Change-Id: I9cb6cce2494390ae6096ee17774af7fb9308bde7
2014-02-02 14:30:43 -08:00
Jim Bankoski
9dec7712ab static function convert to inline or global vp9_blockd.h
Change-Id: Ifdd951f24932839f06d1c700371662511dde6ebe
2014-01-31 19:50:40 -08:00
Yunqing Wang
7c6a49bada Merge "Rename a loopfilter parameter" 2014-01-31 18:33:33 -08:00
Dmitry Kovalev
c2ca97caaf Merge "Cleaning up motion compensation code." 2014-01-31 17:33:40 -08:00
Dmitry Kovalev
c49b08c9a1 Removing "_short" suffix from arm transform file names.
Change-Id: Iefe118f61a335e88821a21a9f50fb919212c1507
2014-01-31 17:19:02 -08:00
Dmitry Kovalev
6e4a03e844 Removing "_1d" suffix from mips transform code.
Unifying transform function names across libvpx, 1d is a redundant suffix.

Change-Id: I077c19f3bc7d4842ed7ca5814d77b3dce1728e13
2014-01-31 17:05:03 -08:00
Yunqing Wang
11a9366e3b Rename a loopfilter parameter
As pointed out by Dmitry and James, "partial" is a Microsoft-
specific C++ keyword, so the parameter is renamed.

Change-Id: Ia0fc11ceb89e54b3195287f89f7e26edbbe9beb8
2014-01-31 16:30:04 -08:00
Dmitry Kovalev
88340b173b Merge "Combining fb_idx_ref_cnt[] and yv12_fb[] arrays." 2014-01-31 15:55:04 -08:00
Dmitry Kovalev
a8a2f22958 Merge "Renaming "mbskip" to "skip"." 2014-01-31 15:52:35 -08:00
Yunqing Wang
903801f1ef vp9 decoder: row-based multi-threaded loopfilter
Implemented parallel loopfiltering, which uses existing tile-
decoding threads. Each thread works on one row, and when that row
is loopfiltered, it moves to the next unattended row. To ensure the
correct filtering order, threads are synchronized and one
superblock is filtered only if the superblocks it depends on are
filtered already.

To reduce synchronization overhead and speed up the decoder, we use
nsync > 1 for high resolution.

Performance tests:
1. on desktop:
8-tile 4k video using 8 threads, speedup: 70% - 80%
4-tile HD video using 4 threads, speedup: ~35%
2. on mobile device(Nexus 7):
4-tile 1080p video using 4 threads, speedup: 18% - 25%
4-tile 1080p video using 2 threads, speedup: 10% - 15%

Change-Id: If54b4a11960dd706c22d5ad145ad94156031f36a
2014-01-31 14:44:53 -08:00
Yaowu Xu
96dc80da61 Merge "create super fast rtc mode" 2014-01-29 16:36:20 -08:00
Dmitry Kovalev
b107f2c470 Renaming "mbskip" to "skip".
Change-Id: I27a30b43eae026a77f92958e2238d02d9cdf7832
2014-01-29 14:48:42 -08:00
Dmitry Kovalev
5670f1e2a8 Merge "Finally removing vp9_setup_interp_filters() function." 2014-01-29 12:54:21 -08:00
Dmitry Kovalev
6332063475 Combining fb_idx_ref_cnt[] and yv12_fb[] arrays.
Adding new RefCntBuffer struct which contains reference counter and image
buffer.

Change-Id: I71c1f532faa13442c32c43fc03ec45b6f88fb844
2014-01-29 12:48:01 -08:00
Dmitry Kovalev
b00eb5c464 Finally removing vp9_setup_interp_filters() function.
Change-Id: If446225afbb49f6033c2a4516a37c377de6f70f7
2014-01-29 11:29:34 -08:00
Jim Bankoski
ea8aaf15b5 create super fast rtc mode
This patch only works if the video width and height are both
multiples of 32. It sets every partition to 16x16, and does INTRADC
only on the first frame and ZEROMV on every other frame. It always
does the largest possible transform, and loop filter level is set to 4.

Was ~20% faster than speed -5 of vp8.

Now 20% slower but adds motion search (every block), nearest, near
and zeromv.

The SVC test was changed because, while this realtime mode produces
bad quality albeit quickly, it isn't obeying all the rules it should
about which frames are available.

Change-Id: I235c0b22573957986d41497dfb84568ec1dec8c7
2014-01-29 08:39:39 -08:00
Yunqing Wang
3c29cbffbf Add macros for convolve functions
Added macros to reduce the code duplication.

Change-Id: I1916aa5a386ea07d961d4ec439ab09bb8c45487d
2014-01-28 18:40:23 -08:00
Dmitry Kovalev
b098c04290 Merge "Decoupling set_ref_ptrs() and vp9_setup_interp_filters()." 2014-01-28 10:37:58 -08:00
Dmitry Kovalev
4ce35d8f2d Merge "Removing _1d suffix from transform names." 2014-01-28 10:37:26 -08:00
hkuang
af87148a22 Merge "Add vp9_tm_predictor_32x32 neon implementation which is 7.8 times faster than C." 2014-01-28 09:57:08 -08:00
Dmitry Kovalev
ff41764920 Removing _1d suffix from transform names.
It is enough to specify (e.g.) idct16, it is obviously different from
idct16x16.

Change-Id: I6b408a37a945de3162429380b59a775b03b95db0
2014-01-27 16:15:36 -08:00
hkuang
770454f3a8 Add vp9_tm_predictor_32x32 neon implementation
which is 7.8 times faster than C.

Change-Id: I858ef4ec09202a07d445da8db702783d6d9d7321
2014-01-27 16:01:07 -08:00
Dmitry Kovalev
e5b31a1d8c Decoupling set_ref_ptrs() and vp9_setup_interp_filters().
Change-Id: I8d17867a4772554cbba2bd113cc5b4c99d50146d
2014-01-27 16:00:20 -08:00
Dmitry Kovalev
b2f0ae65c7 Merge "Removing subpix_fn_table struct." 2014-01-27 10:42:42 -08:00
hkuang
05d2081d38 Fix the vp9_tm_predictor_8x8_neon.
Change-Id: I832cf83871044bfee7b7e57dbd31bae05cbd53e9
2014-01-27 10:17:20 -08:00
Dmitry Kovalev
8691565441 Removing subpix_fn_table struct.
We don't use different filter kernels for x and y, it is always one kernel
for both directions.

Change-Id: Iefcbb02ec74bf46ea20d9dca672a3efd5d631517
2014-01-24 17:06:26 -08:00
Dmitry Kovalev
f9f936b82f Merge "Renaming INTERPOLATION_TYPE to INTERP_FILTER." 2014-01-24 16:52:10 -08:00
Frank Galligan
183361dadb Merge "Optimize vp9_tm_predictor_8x8_neon function" 2014-01-24 16:21:56 -08:00
Dmitry Kovalev
4264c93844 Renaming INTERPOLATION_TYPE to INTERP_FILTER.
Corresponding renames:
  subpel_kernel              => interp_kernel
  vp9_get_filter_kernel()    => vp9_get_interp_kernel()
  pred_filter_type           => pred_interp_filter
  adaptive_pred_filter_type  => adaptive_pred_interp_filter
  mcomp_filter_type          => interp_filter
  read_interp_filter_type()  => read_interp_filter()
  write_interp_filter_type() => write_interp_filter()
  fix_mcomp_filter_type()    => fix_interp_filter()

Change-Id: I1fa61fa1dc81ebbf043457c3ee2d8d4515bee6d3
2014-01-24 15:57:28 -08:00
Dmitry Kovalev
03eb63c114 Merge "Removing MODE_STATS." 2014-01-24 15:53:12 -08:00
Frank Galligan
c6d537155c Merge "Revert external frame buffer code." 2014-01-24 11:31:23 -08:00
Frank Galligan
56a8a0b54b Optimize vp9_tm_predictor_8x8_neon function
Change-Id: Ia12aae491202098ff66366145aa0c3da38dc97e5
2014-01-24 11:07:14 -08:00
hkuang
92ab96a7ae Merge "Add vp9_tm_predictor_16x16 neon implementation which is 3.5 times faster than C." 2014-01-24 10:48:44 -08:00
James Zern
26c88ec14e Merge changes I826655a7,I5164df72,Iba9b198c,Ide9a6846,I4f51ce85,I0e6aa00f,Ic334da9a,I252f5f8a,I7865db2d,I13b434b1
* changes:
  test/: remove unnecessary extern "C"s
  top-level: add extern "C" to headers
  vpx_ports: add extern "C" to headers
  vpx: add extern "C" to headers
  vp9/encoder: add extern "C" to headers
  vp9/decoder: add extern "C" to headers
  vp9/common: add extern "C" to headers
  vp8/encoder: add extern "C" to headers
  vp8/decoder: add extern "C" to headers
  vp8/common: add extern "C" to headers
2014-01-24 10:47:00 -08:00
hkuang
3633ffcbf7 Add vp9_tm_predictor_16x16 neon implementation
which is 3.5 times faster than C.

Change-Id: I24439ba7a2971829c11620f34848facf2c916678
2014-01-24 10:22:58 -08:00
Frank Galligan
b1c72b633e Revert external frame buffer code.
A future CL will add external frame buffers
differently.

Squash commit of four revert commits:
Revert "Increase required number of external frame buffers"

This reverts commit 9e41d569d7.

Revert "Add external constants."

This reverts commit bbf53047b0.

Revert "Add frame buffer lru cache."

This reverts commit fbada948fa.

Conflicts:
	vpxdec.c

Change-Id: I76fe42419923a6ea6c75d9997cbbf941d73d3005

Revert "Add support to pass in external frame buffers."

This reverts commit 10f891696b.

Conflicts:
	test/external_frame_buffer_test.cc
	vp9/common/vp9_alloccommon.c
	vp9/common/vp9_reconinter.c
	vp9/decoder/vp9_decodeframe.c
	vp9/encoder/vp9_onyx_if.c
	vp9/vp9_dx_iface.c
	vpx/vpx_decoder.h
	vpx/vpx_external_frame_buffer.h
	vpx_scale/generic/yv12config.c
	vpxdec.c

Change-Id: I7434cf590f1c852b38569980e4247fad0d939c2e
2014-01-24 10:10:20 -08:00
Adrian Grange
8b0537f631 Merge changes I24ad1f0f,I33be1366
* changes:
  Reorder functions to avoid forward declaration
  Rename set_scale_factors as set_ref_ptrs
2014-01-24 08:38:52 -08:00
Dmitry Kovalev
6c98df29e4 Cleaning up motion compensation code.
Change-Id: I74cf028e8c732cd0dbc070326152d3085b824a80
2014-01-23 17:15:30 -08:00
James Zern
0940c9cfde vp9/common: add extern "C" to headers
Change-Id: Ic334da9aee968e33762c2b25d9fbad24c844b411
2014-01-23 16:21:24 -08:00
Dmitry Kovalev
5f75fda9e9 Merge "Cleaning up vp9_refining_search_sad() function." 2014-01-22 17:15:22 -08:00
hkuang
97826df96b Add tm_predictor_8x8 neon implementation.
Change-Id: I76c2720546b737cb63018a8ab6a3ff62a291786d
2014-01-22 13:43:20 -08:00
Adrian Grange
e37eb0ade7 Rename set_scale_factors as set_ref_ptrs
New name better describes what the function does.

Change-Id: I33be1366a81f058a9854b804bcde211061187dc7
2014-01-22 13:04:30 -08:00
Johann
4e9dc6d45d Merge "Match vp9_coefband_trans_* declarations" 2014-01-22 11:10:51 -08:00
Johann
6c492fc2f9 Match vp9_coefband_trans_* declarations
VS2013 Chromium builds failed with:
warning C4742: 'vp9_coefband_trans_8x8plus' has different alignment in

https://code.google.com/p/chromium/issues/detail?id=336620

Change-Id: I865f72bc23ae958531eeb5f497002c12e9a36fcd
2014-01-21 17:07:23 -08:00
hkuang
437004c710 Separate the border size for encoder and decoder.
Encoder's border is still 160, while decoder's border will be 32.
With on-demand border extension using a separate border buffer,
the decoder's border does not need to be 160 anymore.

Change-Id: I93d5aaff15a33a2213e9761eaa37c5f2870747db
2014-01-21 15:28:41 -08:00
Dmitry Kovalev
a001016996 Removing MODE_STATS.
Change-Id: I7520e1cc82b749187c9445356dd7b54f3f3826cc
2014-01-17 17:30:22 -08:00
Jingning Han
b461c0884e Deprecate best_mv from encoder
This commit deprecates the use of best_mv from encoding and bit-stream
writing stages. It hence removes the definition from MACROBLOCKD.

Change-Id: I8e5302775a2aa4a18900726df407bff881f2dfb1
2014-01-17 17:15:34 -08:00
hkuang
671df8486d Merge "Use a temp buffer for reconstruction when reference buffer is out of boarder." 2014-01-17 16:17:36 -08:00
hkuang
7459fee8c6 Use a temp buffer for reconstruction when
reference buffer is out of boarder.

Change-Id: Ic7ad136e54a4d68abe0fd4345146a86b0ba824e1
2014-01-17 16:15:54 -08:00
Dmitry Kovalev
d8bfe9e24c Cleaning up vp9_refining_search_sad() function.
Change-Id: I660b53da8ebf3049832ce8a10721051c4e0ebb00
2014-01-17 15:20:28 -08:00
Dmitry Kovalev
ac40c87f68 Removing unused vp9_yv12_copy_partial_frame() function.
Change-Id: I3149e562fe9500914f67b6f908283edcdc381ac6
2014-01-16 18:16:34 -08:00
Yunqing Wang
d2bb0c51d3 Revert "Revert "Revert "SSSE3 convolution optimization"""
This reverts commit f9404f2406.

This patch caused some ASAN error.

Change-Id: If15b7e581310e19061d111c69f2931809662ed19
2014-01-16 16:11:46 -08:00
hkuang
2a2d8c140f Merge "Add vp9_tm_predictor_4x4 neon implementation" 2014-01-16 10:18:12 -08:00
Dmitry Kovalev
67e4ca2a1a Merge "Cleaning up postproc code." 2014-01-15 16:23:54 -08:00
Yaowu Xu
056db03d17 Merge "Revert "Revert "SSSE3 convolution optimization""" 2014-01-15 15:03:25 -08:00
Deb Mukherjee
8ce5f68fe4 Merge "Rearranges the END_USAGE typedef" 2014-01-15 14:01:30 -08:00
hkuang
f2ef389256 Add vp9_tm_predictor_4x4 neon implementation
Change-Id: I10c423bde7ea5a3bac9f14f35c73b6bc31c8f3e3
2014-01-15 11:51:36 -08:00
Deb Mukherjee
f32106951a Rearranges the END_USAGE typedef
Rearranges the END_USAGE typedef to make it compatible with the
vpx user input.

Change-Id: Ic9fa9e9edbee7c0ad01e12e685b219582fcecd16
2014-01-15 10:10:23 -08:00
Adrian Grange
c3011e6f90 Delete outdated comment & tidy-up others
Change-Id: I83031180723ee59270ec8fb66b2f73c0796bee25
2014-01-15 09:53:03 -08:00
Dmitry Kovalev
a540f8a0b0 Cleaning up postproc code.
Change-Id: I7e53f6345a4cf89309262f50850c9ad08ed3c527
2014-01-14 15:49:19 -08:00
Yunqing Wang
f9404f2406 Revert "Revert "SSSE3 convolution optimization""
This reverts commit b645257121.

Change-Id: I60d1bf57ae8e9eb6127f42f2d5a780124ac51b45
2014-01-13 12:29:55 -08:00
James Zern
f83c12b540 Merge "cosmetics: vp9_reconinter.h: make some variables const" 2014-01-11 12:39:32 -08:00
Dmitry Kovalev
96be0a50ab Removing mi_height_log2_lookup table.
Change-Id: I1f0ae2edc3a96b33c0494d165ae756a8feba6184
2014-01-10 13:29:47 -08:00
Paul Wilkins
b645257121 Revert "SSSE3 convolution optimization"
This reverts commit 511d218c60.

In current form intrinsics break borg build.

Change-Id: Ied37936af841250ecff449802e69a3d3761c91b9
2014-01-10 13:38:26 +00:00
Jingning Han
a4c94a94cc Merge "Optimze inv 16x16 DCT with 10 non-zero coeffs - P2" 2014-01-09 18:17:25 -08:00
Jingning Han
faa2ba86cc Merge "Optimze inv 16x16 DCT with 10 non-zero coeffs - P1" 2014-01-09 18:17:12 -08:00
Dmitry Kovalev
c8e8d3a461 Merge "Renaming 'Sharpness' to 'sharpness'." 2014-01-09 13:42:55 -08:00
Jingning Han
af31b27aae Optimze inv 16x16 DCT with 10 non-zero coeffs - P2
This commit further optimizes SSE2 operations in the second 1-D
inverse 16x16 DCT, with (<10) non-zero coefficients. The average
runtime of this module goes down from 779 cycles -> 725 cycles.

Change-Id: Iac31b123640d9b1e8f906e770702936b71f0ba7f
2014-01-09 12:46:09 -08:00
Yunqing Wang
f3b9b97c0e Merge "SSSE3 convolution optimization" 2014-01-09 12:39:47 -08:00
levytamar82
511d218c60 SSSE3 convolution optimization
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
The optimizations include:
- processing 2x8 elements in one 128-bit register instead of processing
  8 elements in one 128-bit register.
- removing unnecessary loads.
This optimization gives a 2.4% user-level gain for 480p input
and a 1.6% user-level gain for 720p.
This optimization is done only for 64-bit.

Change-Id: Icb586dc0c938b56699864fcee6c52fd43b36b969
2014-01-09 12:27:51 -07:00
Dmitry Kovalev
4fbe54d201 Merge "Renaming 'Mode' to 'mode'." 2014-01-08 16:29:29 -08:00
Jingning Han
ba6ab46cdc Optimize inv 16x16 DCT with 10 non-zero coeffs - P1
This commit is the first patch optimizing the SSE2 implementation of inverse
16x16 DCT with <10 non-zero coefficients. It focuses on the first 1-D (row)
transformation. It exploits the fact that only the top-left 4x4 block contains
non-zero coefficients in a 2-D inverse 16x16 DCT with <10 coefficients.

The average runtime of idct16x16_10 unit is reduced from
883 cycles -> 779 cycles (12% faster).

For pedestrian_area_1080p 300 frames at 4000 kbps, the speed 2 runtime goes
down from 310651 ms  -> 305910 ms. The decoding speed goes up from
80.37 fps -> 80.87 fps.

Change-Id: Ic6f3ac5a637a76c07ba73ddaafe318a699fea645
2014-01-08 15:36:45 -08:00
Alex Converse
8fcb74e6bb Merge "Add a C fallback for get_msb() and change inline to INLINE." 2014-01-08 14:43:46 -08:00
hkuang
5be0ed30dc Merge "Add initial intra frame neon optimization. 1~2% gain." 2014-01-08 14:41:43 -08:00
Dmitry Kovalev
962c8b241e Renaming 'Mode' to 'mode'.
Change-Id: I6cdd670d66288dbd66228f38bba6b30502d25362
2014-01-08 14:33:59 -08:00
Dmitry Kovalev
57be81369a Renaming 'Sharpness' to 'sharpness'.
Change-Id: I54513dc3b3321e0c0bb6b15ea5c34085ed80b4a4
2014-01-08 14:19:14 -08:00
Alex Converse
ce7ff3b63d Add a C fallback for get_msb() and change inline to INLINE.
For systems without __builtin_clz() or _BitScanReverse(). Taken from libwebp.

Change-Id: Iead257efc1772c466c79e1dc0356ed571d38d43e
2014-01-08 12:25:47 -08:00
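A minimal sketch of what a portable get_msb() fallback can look like,
assuming the function returns the index of the highest set bit of a
non-zero 32-bit value. The name get_msb_fallback and the compiler check
are illustrative assumptions, not the code from this commit; an MSVC
build would use _BitScanReverse() in the same place as the builtin.

```c
#include <assert.h>
#include <stdint.h>

/* Portable fallback: index of the most significant set bit of n.
 * n must be non-zero. The name is hypothetical. */
static int get_msb_fallback(uint32_t n) {
  int log = 0;
  int i;
  assert(n != 0);
  /* Binary search over shift widths 16, 8, 4, 2, 1. */
  for (i = 4; i >= 0; --i) {
    const int shift = 1 << i;
    const uint32_t x = n >> shift;
    if (x != 0) {
      n = x;
      log += shift;
    }
  }
  return log;
}

/* Use the compiler builtin when available, else the C fallback. */
static int get_msb(uint32_t n) {
  assert(n != 0);
#if defined(__GNUC__) || defined(__clang__)
  /* For non-zero n, 31 - clz(n) equals 31 ^ clz(n). */
  return 31 ^ __builtin_clz(n);
#else
  return get_msb_fallback(n);
#endif
}
```

Both paths are expected to agree for every non-zero input, so the
fallback can be checked against the builtin on platforms that have it.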
hkuang
691111aacf Add initial intra frame neon optimization. 1~2% gain.
More intra optimizations will be added.

Change-Id: I33ae8d93f6002bf7b64cc2669602d9e6bfa5a6e8
2014-01-08 11:58:42 -08:00
Yunqing Wang
a84029ad9c Merge "AVX2 Variance Optimization" 2014-01-08 11:33:42 -08:00
levytamar82
357b65369f AVX2 Variance Optimization
Optimizing the variance functions vp9_variance16x16, vp9_variance32x32,
vp9_variance64x64, vp9_variance32x16, vp9_variance64x32, and
vp9_mse16x16 by migrating to AVX2.
Some of the functions were optimized by processing 32 elements instead of 16.
Some of the functions were optimized by processing 2 loop strides of 16
elements in a single 256-bit register.
This optimization gives a 2.4% - 2.7% user-level performance gain
and a 42% function-level gain.

Change-Id: I265ae08a2b0196057a224a86450153ef3aebd85d
2014-01-08 12:05:53 -07:00
Alex Converse
f2ca665f1c Replace RD modeling with a fixed point approximation.
Change-Id: I44eb44eb3f36c05d916ef140ef42cc84f72f99ec
2014-01-08 10:37:24 -08:00
Dmitry Kovalev
bbb25e6a39 Merge "Adding RefBuffer struct." 2014-01-06 14:19:44 -08:00
Jingning Han
b49e9fb433 Merge "Tune IDCT8_1D macro function interface" 2014-01-06 09:38:19 -08:00
Dmitry Kovalev
0c5575fe57 Merge "Moving hev mask calculation into filter4() function." 2014-01-03 15:56:16 -08:00
Jingning Han
3e0c62b53f Tune IDCT8_1D macro function interface
This commit adds input/output ports for IDCT8_1D macro function to
provide more flexibility in variable use. This allows several
buffer swap operations to be skipped.

Change-Id: I21f3450509537322293043b3281bfd3949868677
2014-01-03 15:23:47 -08:00
Dmitry Kovalev
ba41e9d459 Adding RefBuffer struct.
Adding RefBuffer to simplify reference buffer management. The struct has a
pointer to image data and scale factors relative to the current frame.

Change-Id: If38eb1491ff687cc11428aee339f3e052e2c5d9e
2014-01-03 15:21:55 -08:00
Jingning Han
0b1a27135a Reduce num of buffer swap calls in idct8_1d_sse2
This commit merges the initial buffer swap operations in idct8_1d_sse2
into the array transpose step, hence reducing number of instructions
therein.

Change-Id: I219f6f50813390d2ec3ee37eecf2a4a2b44ae479
2014-01-03 12:12:03 -08:00
Jingning Han
1bb11781e2 Rework idct8x8_10 SSE2 implementation
This commit optimizes the SSE2 implementation of idct8x8_10. It exploits
the fact that only top-left 4x4 block contains non-zero coefficients,
and hence reduces the instructions needed.

The runtime of idct8x8_10_sse2 goes down from 216 to 198 CPU cycles,
estimated by averaging over 100000 runs. For pedestrian_area_1080p 300
frames coded at 4000kbps, the average decoding speed goes up from
79.3 fps to 79.7 fps.

Change-Id: I6d277bbaa3ec9e1562667906975bae06904cb180
2014-01-03 12:04:09 -08:00
Yaowu Xu
8458c8c450 Merge "Fix show existing frame" 2014-01-02 09:27:28 -08:00
Dmitry Kovalev
f3beca079c Merge "Calculating has_second_ref only once for single_ref context." 2013-12-26 13:41:02 -08:00
Dmitry Kovalev
1e8b5bf4ac Merge "Removing vp9_findnearmv.{h, c} files." 2013-12-26 13:38:38 -08:00
James Zern
44963dfd37 cosmetics: vp9_reconinter.h: make some variables const
Change-Id: If5cd0a1487e97c8e9d13dc2e078c6dceaf79de4f
2013-12-26 14:02:46 -05:00
Dmitry Kovalev
87440aeb82 Moving MAX_PROB constant to vp9_prob.h.
Change-Id: I07470ad1b7a0344d088911428ffab8ba9a0d8708
2013-12-20 15:56:59 -08:00
Dmitry Kovalev
b3b9f4a4d0 Merge "Using single struct to represent scale factors." 2013-12-20 11:22:02 -08:00
Yunqing Wang
b6a0ac11f0 Merge "Code clean up" 2013-12-20 08:46:11 -08:00
Dmitry Kovalev
987810ad95 Removing vp9_findnearmv.{h, c} files.
Moving all code from those files to vp9_mvref_common.{h, c}.

Change-Id: Ibc4afcb8cea6847166ff411130e93611ebe63b20
2013-12-19 17:39:57 -08:00
Dmitry Kovalev
a3fbcc88bb Using single struct to represent scale factors.
Moving back to the scale_factors struct. We no longer need x_offset_q4 and
y_offset_q4 because both values are calculated locally inside the
vp9_scale_mv function.

Change-Id: I78a2122ba253c428a14558bda0e78ece738d2b5b
2013-12-19 16:06:33 -08:00
Dmitry Kovalev
c872d2be65 Call set_scaled_offsets() just before scale_mv() call.
Before mv scaling it is required to calculate x_offset_q4/y_offset_q4
by calling set_scaled_offsets(). Now offset configuration cannot be
missed because it happens just before scale_mv().

Change-Id: I7dd1a85b85811a6cc67c46c9b01e6ccbbb06ce3a
2013-12-19 14:55:13 -08:00
Yunqing Wang
09faf55916 Code clean up
Removed unused filter coefficients.

Change-Id: Ib395a51305e23ff41ab69c1808d56946d25961cd
2013-12-19 11:09:23 -08:00
Dmitry Kovalev
c67ee5ea24 Merge "Converting vp9_treecoder.h to vp9_prob.{h, c}" 2013-12-19 11:03:30 -08:00
Marco Paniconi
02d5ebcfdc Merge "Updates for 1-pass CBR rate control." 2013-12-18 10:28:33 -08:00
Marco Paniconi
1b8b8b0d0d Updates for 1-pass CBR rate control.
Adjustments based on buffer level, frame dropper.

Change-Id: Iaa85b570493526a60c4b9fb7ded4c0226b1b3a33
2013-12-18 09:24:24 -08:00
Jim Bankoski
9d754dcca8 Merge "rename loop filter functions" 2013-12-17 18:56:09 -08:00
Jim Bankoski
b720ba165f rename loop filter functions
This renames all the loop filter functions so that they no
longer refer to mb

Change-Id: I8a58a8c7fd253d835cb619bde13913e896ece90b
2013-12-17 17:34:34 -08:00
Dmitry Kovalev
118c8fb3fb Calculating has_second_ref only once for single_ref context.
Change-Id: Ib1253e0606426850f53060a4c5303af86bf1c093
2013-12-17 17:02:24 -08:00
Dmitry Kovalev
c6a1ff223b Merge "Calling is_inter_block() only if mbmi is available." 2013-12-17 16:10:56 -08:00
Dmitry Kovalev
4821084b3f Moving hev mask calculation into filter4() function.
Change-Id: Ieccf2070b2b01b4135f4c5f9857667eb7825c761
2013-12-17 15:23:23 -08:00
Dmitry Kovalev
eb0c73b6e0 Merge "Converting mode_lf_lut struct member into static lookup table." 2013-12-17 15:20:05 -08:00
James Zern
bd9a388a06 vp9: normalize include guards
Change-Id: If4ddbdcfb3ab387cbca6910b42cf4df8111e6879
2013-12-16 19:40:49 -08:00
Yaowu Xu
3cce464342 Define POSITION to differentiate from MV
The MV struct was used to indicate the position of a MI_BLOCK with row
and col components. This usage was confusing, so this commit adds a
new structure, "POSITION", with row and col components to better describe
the position of a mi_block.

Change-Id: I59fdd4b45010fe7d85a8db22a55503265c4f5b2b
2013-12-16 17:28:00 -08:00
Yaowu Xu
50ec6311e6 Move two functions to encoder
As they are used by encoder only.

Change-Id: I7b1e6955b218aba66fe156523521a8121c9a84a4
2013-12-16 17:27:48 -08:00
Dmitry Kovalev
bb7b4bad6d Merge "Getting rid of b_{width, height}_log2 calls in non-420 loop filter." 2013-12-16 15:10:25 -08:00
Dmitry Kovalev
865d5b83f2 Calling is_inter_block() only if mbmi is available.
Modifying vp9_get_intra_inter_context(), vp9_get_reference_mode_context(),
vp9_get_pred_context_single_ref_p1(), vp9_get_pred_context_single_ref_p2()
functions.

Change-Id: Ifaa2c3eb0c76a544ae8bd1fe3155aada266eae78
2013-12-16 15:09:33 -08:00
hkuang
fb53409d2a Merge "Remove border extension in intra frame prediction." 2013-12-16 14:48:54 -08:00
Dmitry Kovalev
b1d821704b Merge "Yet another vp9_pred_common.c cleanup." 2013-12-16 14:10:52 -08:00
hkuang
25e5552630 Remove border extension in intra frame prediction.
Change-Id: Id677df4d3dbbed6fdf7319ca6464f19cf32c8176
2013-12-16 14:05:58 -08:00
Dmitry Kovalev
b5c9261832 Converting vp9_treecoder.h to vp9_prob.{h, c}
Moving vp9_norm probability table from vp9_entropy.c to vp9_prob.c

Change-Id: Ie757b73860c6f43130790c332b292e2a1a81b788
2013-12-16 12:53:09 -08:00
Frank Galligan
fbada948fa Add frame buffer lru cache.
Add an option for libvpx to return the least recently used
frame buffer.

Change-Id: I886a96ffb94984f1c42de53086e0131922df3260
2013-12-15 19:57:42 -08:00
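A rough sketch of least-recently-used selection over a pool of frame
buffers. The PoolEntry type, its fields and pick_lru_free() are
hypothetical names chosen for illustration; the commit message does not
describe the data structures libvpx actually uses.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical pool entry: whether the buffer is currently held, and a
 * counter value recorded the last time it was handed out. */
typedef struct {
  int in_use;
  unsigned long last_used;
} PoolEntry;

/* Return the index of the free entry with the smallest last_used stamp,
 * or -1 if every entry is in use. */
static int pick_lru_free(const PoolEntry *pool, size_t n) {
  int best = -1;
  size_t i;
  assert(pool != NULL);
  for (i = 0; i < n; ++i) {
    if (pool[i].in_use) continue;
    if (best < 0 || pool[i].last_used < pool[(size_t)best].last_used) {
      best = (int)i;
    }
  }
  return best;
}
```

A caller would mark the returned entry as in use and stamp it with an
incrementing counter, so the next request picks a different buffer.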
Frank Galligan
d0ee1fd797 Merge "Add support to pass in external frame buffers." 2013-12-15 19:18:25 -08:00
Frank Galligan
10f891696b Add support to pass in external frame buffers.
VP9 decoder can now use frame buffers passed in by the application.

Change-Id: I599527ec85c577f3f5552831d79a693884fafb73
2013-12-15 18:45:46 -08:00
Dmitry Kovalev
4d2d1591a3 Converting mode_lf_lut struct member into static lookup table.
Change-Id: I6e6c7cb5ff5b60fbe6a7c314daec5ccdc2cafcc3
2013-12-14 17:42:12 -08:00
Dmitry Kovalev
2aadc06e0d Yet another vp9_pred_common.c cleanup.
Change-Id: I617d6c610d181076773c5c3d6f3dbc6717b02580
2013-12-14 17:39:24 -08:00
Dmitry Kovalev
64cf398713 Merge "Using MV struct instead of int_mv union in encoder." 2013-12-13 16:42:54 -08:00
Dmitry Kovalev
33df4f0483 Merge "vp9_convolve.c cleanup." 2013-12-13 15:40:00 -08:00
Dmitry Kovalev
f54b515797 Merge "Cleaning up vp9_append_sub8x8_mvs_for_idx()." 2013-12-13 15:38:53 -08:00
Dmitry Kovalev
25da21b14e Using MV struct instead of int_mv union in encoder.
Change-Id: I8b81a3e4b4fa530a654c28d9c136afa0c1d379fd
2013-12-13 15:24:48 -08:00
Dmitry Kovalev
466cc94e7a Getting rid of b_{width, height}_log2 calls in non-420 loop filter.
Using num_{4x4, 8x8}_blocks_{wide, high}_lookup instead.

Change-Id: I66a7ab807fa57395253b2d0e636c2479fa8c4adf
2013-12-13 12:53:41 -08:00
James Zern
178db94cd6 vp9 asserts: fix compile warning
string literal to int within an assert

Change-Id: I0c889256b67a078e6e2a79577f0b7ae084243258
2013-12-12 19:49:19 -08:00
Dmitry Kovalev
629fb85f17 vp9_convolve.c cleanup.
Making the overall logic clearer, moving the "hacked" calculation of the base
filter array pointer to the get_filter_base() function.

Change-Id: Ibbd38a9f937e48d35bbbfef3ad933ab36664cccb
2013-12-12 11:14:06 -08:00
Deb Mukherjee
7edd5170b5 Merge "Changes interfaces to vp9_get_compressed_data fn" 2013-12-11 15:50:40 -08:00
Dmitry Kovalev
e79103166f Merge "Renames for consistency in vp9_pred_common.{c, h} files." 2013-12-11 14:30:44 -08:00
Deb Mukherjee
e33855cc47 Changes interfaces to vp9_get_compressed_data fn
Silences some lint warnings in previous patches

Change-Id: I04bf47ebe7e63a95fd322719a3154e589c115d78
2013-12-11 14:22:51 -08:00
hkuang
9460226acd Merge "Fix valgrind error." 2013-12-11 13:22:32 -08:00
hkuang
1339f3842c Fix valgrind error.
Temporarily change memcpy to memmove.

Change-Id: I700a197bc1ce496be1ddad7118429c5da465b0ca
2013-12-11 13:21:28 -08:00
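The commit message does not say why valgrind complained. A common cause
is memcpy() being called with overlapping source and destination, which
is undefined behavior, whereas memmove() is defined for overlapping
ranges. The sketch below assumes that situation; shift_tail_to_front()
is a hypothetical helper, not code from libvpx.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical helper: move the last `keep` bytes of buf to the front.
 * When keep > len / 2 the source and destination ranges overlap, so
 * memmove() is required; memcpy() would be undefined behavior here. */
static void shift_tail_to_front(unsigned char *buf, size_t len, size_t keep) {
  assert(buf != NULL);
  assert(keep <= len);
  memmove(buf, buf + (len - keep), keep);
}
```

With a 6-byte buffer and keep = 4, bytes 2..5 are copied onto bytes
0..3, so the two ranges share bytes 2 and 3.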
Dmitry Kovalev
3274fc30ee Renames for consistency in vp9_pred_common.{c, h} files.
Change-Id: Icba06e84ca55c419abbacedf5825eeb394a1b140
2013-12-10 18:31:46 -08:00
Dmitry Kovalev
098d13ba10 Cleaning up vp9_append_sub8x8_mvs_for_idx().
Replacing if-else with switch statement, reordering function arguments.

Change-Id: I4825d2ef311ba8999b6d4ceb0eef003587a13434
2013-12-10 17:56:53 -08:00
Dmitry Kovalev
2dd20e468a Cleaning up skip context calculation.
Renames:
  vp9_get_pred_context_mbskip => vp9_get_skip_context
  vp9_get_pred_prob_mbskip    => vp9_get_skip_prob

Change-Id: I2af499848ef73f3f5cd8cdb27852d0bcdfe31d09
2013-12-10 14:11:26 -08:00
Dmitry Kovalev
35b7b0b549 Merge "Removing unused vp9_get_pred_flag_mbskip() function." 2013-12-10 13:58:35 -08:00
hkuang
19bbe41c71 Merge "Refactor inter_predictor function." 2013-12-10 13:34:24 -08:00
Dmitry Kovalev
48088f210d Removing unused vp9_get_pred_flag_mbskip() function.
Change-Id: Ib46a97d8ff9f2915b9fa2abba3cd18b6711fcb0c
2013-12-10 12:53:17 -08:00
Dmitry Kovalev
e18eb7721e Merge "Renaming comp_pred_mode to reference_mode." 2013-12-10 10:52:34 -08:00
hkuang
6c9dcae532 Refactor inter_predictor function.
Change-Id: Ic429b2f16462e926f30efb3af4da3080026359d8
2013-12-10 10:36:44 -08:00
Dmitry Kovalev
d2dad31e79 Merge "Cleaning up vp9_get_pred_context_switchable_interp() function." 2013-12-09 17:34:30 -08:00
hkuang
d70a8c09c6 Merge "Implement on demand border extension. In place extend the border now. Next commit will totally remove the border." 2013-12-09 17:16:31 -08:00
Dmitry Kovalev
9edd4d4db7 Cleaning up vp9_get_pred_context_switchable_interp() function.
Change-Id: I67a45a41312ca0efd8fe00ccd8bdc0f97675d09f
2013-12-09 17:02:38 -08:00
hkuang
ff2c96be1f Implement on demand border extension. In place extend
the border now. Next commit will totally remove the border.

Change-Id: Ic1e1ca9cc34f81c688715b3948689b47df63a151
2013-12-09 16:44:08 -08:00
Jingning Han
f92b5842bf Merge "Full range motion search for regular block sizes" 2013-12-09 16:12:35 -08:00
Dmitry Kovalev
08c48ddc01 Renaming comp_pred_mode to reference_mode.
Change-Id: I83ffed2b1878a35ac35f07f9ee74309adc9c7b11
2013-12-09 15:13:34 -08:00
Dmitry Kovalev
347df4ce55 Merge "Renaming vp9_get_pred_context_tx_size() function." 2013-12-09 15:10:49 -08:00
Dmitry Kovalev
2c3120274a Removing max_uv_txsize_lookup lookup table.
Adding get_uv_tx_size_impl() with tx size selection logic, rewriting
get_uv_tx_size().

Change-Id: I3ecb108059a41be227a8c89a0710bd174f508951
2013-12-09 14:03:23 -08:00
Dmitry Kovalev
a19d694f09 Merge "Removing BLOCK_TYPES and adding PLANE_TYPES constant instead." 2013-12-07 02:20:41 -08:00
Dmitry Kovalev
cb92f4f042 Renaming vp9_get_pred_context_tx_size() function.
Change-Id: Ia6d6f4dfb1fd1ec0f8ba53796b59a802e9d7881d
2013-12-06 15:31:06 -08:00
Dmitry Kovalev
b6e5bb27c9 Merge "Renaming reference mode context calculation function." 2013-12-06 14:22:47 -08:00
Jingning Han
b295092b8f Full range motion search for regular block sizes
Add a full range motion search for regular block sizes. This runs an
exhaustive search within the given reference area. This commit further
optimizes the search process by combining 4 point tests into one
pipeline, which gives a 30% speed-up compared to testing each point
individually.

This full range search serves as a best possible motion search reference.
When replacing the diamond search with full range search, the speed 0
runtime of bus CIF at 2000 kbps goes from 153872ms to 623051ms. The
compression performance compared to speed 0 setting gains 0.585% for
derf set.

Change-Id: Ieef1225216b0b86b4ac4872fa7fb9e18bf2eabb3
2013-12-06 12:24:53 -08:00
Dmitry Kovalev
2da30a96d4 Merge "Removing duplicated C code from vp9_loopfilter_filters.c file." 2013-12-06 12:13:24 -08:00
Dmitry Kovalev
63963f51ef Renaming reference mode context calculation function.
Renames:
  vp9_get_pred_context_comp_inter_inter => vp9_get_reference_mode_context
  vp9_get_pred_prob_comp_inter_inter    => vp9_get_reference_mode_prob

Change-Id: I3bbb69481e6b0c848028667c9269f567f293d3bd
2013-12-06 11:23:01 -08:00
Dmitry Kovalev
d6b159d4a6 Removing BLOCK_TYPES and adding PLANE_TYPES constant instead.
Change-Id: Ic3bb862e93aedf6a489a33ea6f7e5097d96855ee
2013-12-06 10:54:00 -08:00
Dmitry Kovalev
cf4dfdc8e7 Merge "Moving vp9_tree_probs_from_distribution() to encoder." 2013-12-06 10:18:30 -08:00
Dmitry Kovalev
8eac2ca840 Merge "Renaming constants." 2013-12-06 09:55:02 -08:00
Dmitry Kovalev
5be34ba80f Merge "vp9_get_pred_context_intra_inter() clean up." 2013-12-06 09:14:36 -08:00
Adrian Grange
de2046275d Merge "Remove redundant calls to vp9_update_mode_info_border" 2013-12-06 08:59:47 -08:00
Dmitry Kovalev
4ac6a2552b Moving vp9_tree_probs_from_distribution() to encoder.
Writing custom coeff branch count calculation (which is much clearer) in
adapt_coef_probs() function. Removing vp9_treecoder.c file.

Change-Id: I8880fb7a39996c8bcf6cd0acf9898a8c712ba91f
2013-12-05 18:13:26 -08:00
Dmitry Kovalev
377fa8aff8 Renaming PREV_COEF_CONTEXTS to COEFF_CONTEXTS.
Also adding BAND_COEFF_CONTEXTS macro to simplify for loop logic.

Change-Id: I12a78a49cf1addf81e6b3fe2a3736ec2b79bd79e
2013-12-05 17:08:06 -08:00
Dmitry Kovalev
6fd71e1b09 vp9_get_pred_context_intra_inter() clean up.
Renaming:
 vp9_get_pred_context_intra_inter => vp9_get_intra_inter_context
 vp9_get_pred_prob_intra_inter    => vp9_get_intra_inter_prob

Change-Id: I2c1affea2e84f4e616137c6df82adb11c7845781
2013-12-05 17:01:03 -08:00
Dmitry Kovalev
f7396f3394 Merge "Removing vp9_default_coef_probs.h file." 2013-12-05 16:44:26 -08:00
Dmitry Kovalev
0d4b8d7e43 Renaming constants.
NUM_YV12_BUFFERS        => FRAME_BUFFERS
ALLOWED_REFS_PER_FRAME  => REFS_PER_FRAME
NUM_REF_FRAMES_LOG2     => REF_FRAMES_LOG2
NUM_REF_FRAMES          => REF_FRAMES
NUM_FRAME_CONTEXTS_LOG2 => FRAME_CONTEXTS_LOG2
NUM_FRAME_CONTEXTS      => FRAME_CONTEXTS

Change-Id: I4e1ada08f25d8fa30fdf03aebe1b1c9df0f87e63
2013-12-05 16:23:09 -08:00
Dmitry Kovalev
2b95a05bf6 Removing duplicated C code from vp9_loopfilter_filters.c file.
Change-Id: I299b621fca1c8ff5d296afde9698cdcccfecaf3f
2013-12-05 15:49:57 -08:00
Adrian Grange
93d8a3fd29 Remove redundant calls to vp9_update_mode_info_border
Removed calls to vp9_update_mode_info_border since
they immediately followed code that initialized the
entire buffer to 0.

Change-Id: Ife06794daa20439a0b607a83a87f88df59afac40
2013-12-05 15:02:32 -08:00
Dmitry Kovalev
6df9ec52a0 Merge "Cleaning up vp9_get_pred_context_tx_size() function." 2013-12-05 09:59:00 -08:00
Tero Rintaluoma
047b0b01bb Fix show existing frame
- Disable mode info update in case where current frame is coded
  as "show existing frame".
- Should fix issue 676.

Change-Id: Ibee681850eb307f982da6528d3e31cb94f881c08
2013-12-05 12:10:10 +02:00
Frank Galligan
7ecf3bc91c Fix ref count decrement code.
Buffer 0 would never be decremented, so it could only be used
once.

Change-Id: I605d99fa2a513eadae6a0e230161729880653282
2013-12-04 22:21:00 -08:00
Dmitry Kovalev
5eeffc9fc5 Cleaning up vp9_get_pred_context_tx_size() function.
Change-Id: Ia6ef876e3d1e66b2182a9c0bce3fd758691cd381
2013-12-04 21:35:30 -08:00