Commit Graph

2467 Commits

Author SHA1 Message Date
Adrian Grange
7c43fb67ae Fix decoder handling of intra-only frames
This patch fixes bug 633:
https://code.google.com/p/webm/issues/detail?id=633

The first decoded frame does not have to be a keyframe,
it could be an inter-frame that is coded intra-only.

This patch fixes the handling of intra-only frames.

A test vector has also been added that encodes 3
intra-only frames at the start of the clip. The
test vector was generated using the code in the
following patch:
https://gerrit.chromium.org/gerrit/#/c/70680/

Change-Id: Ib40b1dbf91aae2bc047e23c626eaef09d1860147
2014-07-08 16:24:03 -07:00
hkuang
337e8015c9 Move vp9_thread.* to common.
To prepare for frame-parallel decoding, the reference count buffers
need to be protected by a mutex. Move vp9_thread.* to the common
folder so that those buffers can use the cross-platform mutex
from vp9_thread.*.

Change-Id: I541277cf15eefed6641555944f67f4a0bcdc8154
2014-07-07 14:52:19 -07:00
Yaowu Xu
82fd084b35 Merge "Re-design quantization process" 2014-07-01 19:04:01 -07:00
Jingning Han
9ac2f66320 Re-design quantization process
This commit re-designs the quantization process for transform
coefficient blocks of size 4x4 to 16x16. It improves compression
performance for speed 7 by 3.85%. The SSSE3 version for the
new quantization process is included.

The average runtime of the 8x8 block quantization is reduced
from 285 cycles -> 255 cycles, i.e., over 10% faster.

Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
2014-07-01 17:00:07 -07:00
Alex Converse
6c54dbcb69 Merge "BITSTREAM: Handle transform size and motion vectors more logically for non-420." 2014-06-30 17:44:01 -07:00
James Zern
44472cde55 vp9: disable postproc buffer alloc when unnecessary
the buffer is only used in encoding and only when
CONFIG_INTERNAL_STATS or CONFIG_VP9_POSTPROC is enabled.
a future change should decouple this from the frame buffer allocation
and make it conditional based on runtime flags when the above config
options are enabled.
reduces decode heap usage by at least 12%

Change-Id: Id0b97620d4936afefa538d3aadf32106743d9caf
2014-06-27 20:59:56 -07:00
Jim Bankoski
52b63c238e Merge "Better validation of invalid files" 2014-06-27 11:05:21 -07:00
Jim Bankoski
9f37d149c1 Better validation of invalid files
This patch checks that a decoder never tries to reference a frame that's
outside the range of 2x to 1/16th the size of this frame. Any attempt
to do so causes a failure.

Change-Id: I5c98fa7bb95ac4f29146f29dd92b62fe96164e4c
2014-06-27 10:03:15 -07:00
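The size check described in commit 9f37d149c1 can be sketched as follows. This is a minimal illustration, not the decoder's code: the function name and parameters are hypothetical, and treating both ends of the range (exactly 2x and exactly 1/16th) as valid is an assumption.

```python
def is_valid_reference_size(ref_w, ref_h, cur_w, cur_h):
    """Return True if a reference frame is within the allowed size range.

    Assumed reading of the commit message: per dimension, the reference
    may be at most 2x and at least 1/16th the size of the current frame.
    Both limits are treated as inclusive here (an assumption).
    """
    not_too_large = ref_w <= 2 * cur_w and ref_h <= 2 * cur_h
    not_too_small = 16 * ref_w >= cur_w and 16 * ref_h >= cur_h
    return not_too_large and not_too_small


# Hypothetical frame sizes, chosen only to exercise the limits.
exactly_double = is_valid_reference_size(1280, 720, 640, 360)
too_wide = is_valid_reference_size(1281, 720, 640, 360)
one_sixteenth = is_valid_reference_size(40, 30, 640, 480)
```

Using integer multiplication on both sides avoids the rounding that a division-based ratio test would introduce.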
Jingning Han
46ea9ec719 Enable real-time version reference motion vector search
This commit enables a fast reference motion vector search scheme.
It checks the nearest top and left neighboring blocks to decide the
most probable predicted motion vector. If it finds the two have
the same motion vectors, it then skips finding the exterior range for
the second most probable motion vector, and correspondingly skips
the check for NEARMV.

The runtime of speed -5 goes down
pedestrian at 1080p 29377 ms -> 27783 ms
vidyo at 720p       11830 ms -> 10990 ms
i.e., 6%-8% speed-up.

For the rtc set, the compression performance
goes down by about 1.3% for both speed -5 and -6.

Change-Id: I2a7794fa99734f739f8b30519ad4dfd511ab91a5
2014-06-26 09:49:13 -07:00
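The shortcut in commit 46ea9ec719 can be illustrated roughly as below. This is a sketch under stated assumptions: the function name is hypothetical, NEARESTMV is a mode name not mentioned in the message, and using the left neighbour directly as the second candidate stands in for the wider search the commit message alludes to.

```python
def candidate_modes(top_mv, left_mv):
    """Pick the motion-vector modes worth checking for a block.

    top_mv and left_mv are (row, col) tuples taken from the nearest top
    and left neighbouring blocks. Only NEARMV is named in the commit
    message; everything else here is illustrative.
    """
    if top_mv == left_mv:
        # Both neighbours agree: there is no distinct second candidate,
        # so the NEARMV check is skipped entirely.
        return {"NEARESTMV": top_mv}
    # Neighbours disagree: keep the other vector as the second candidate
    # (assumption; the real encoder searches a wider neighbourhood).
    return {"NEARESTMV": top_mv, "NEARMV": left_mv}


same = candidate_modes((4, -2), (4, -2))
different = candidate_modes((4, -2), (0, 8))
```

The saving comes from the first branch: when the neighbours agree, one fewer mode is evaluated per block.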
Adrian Grange
8357292a5a Fix test on maximum downscaling limits
There is a normative scaling range of (x1/2, x16)
for VP9. This patch fixes the maximum downscaling
tests that are applied in the convolve function.

The code used a maximum downscaling limit of x1/5
for historic reasons related to the scalable
coding work. Since the downsampling in this
application is non-normative it will revert to
using a separate non-normative scaler.

Change-Id: Ide80ed712cee82fe5cb3c55076ac428295a6019f
2014-06-24 10:26:09 -07:00
Adrian Grange
8c1f071f1e Allocate buffers based on correct chroma format
The encoder currently allocates frame buffers before
it establishes what the chroma sub-sampling factor is,
always allocating based on the 4:4:4 format.

This patch detects the chroma format as early as
possible allowing the encoder to allocate buffers of
the correct size.

Future patches will change the encoder to allocate
frame buffers on demand to further reduce the memory
profile of the encoder and rationalize the buffer
management in the encoder and decoder.

Change-Id: Ifd41dd96e67d0011719ba40fada0bae74f3a0d57
2014-06-23 11:45:13 -07:00
Jingning Han
961bafc366 Merge "Remove unused vp9_init_quant_tables function" 2014-06-23 09:37:30 -07:00
Johann
1fc2b0fd00 Merge "Include type defines" 2014-06-20 11:29:19 -07:00
Johann
d658216276 Don't return value for void functions
Clears "warning: 'return' with a value, in function returning void"

Change-Id: I93972610d67e243ec772a1021d2fdfcfc689c8c2
2014-06-20 11:26:44 -07:00
Johann
baef0b89da Include type defines
Clears error: unknown type name 'uint8_t'

Change-Id: I9b6eff66a5c69bc24aeaeb5ade29255a164ef0e2
2014-06-20 11:26:13 -07:00
Alex Converse
7557a65d16 BITSTREAM: Handle transform size and motion vectors more logically for non-420.
This breaks the profile 1 bitstream.

Don't force non420 uv transform size to 1/4 y size. In the 4:2:0 case the
chroma corresponding to a luma block is 1/4 its size. In the 4:4:4 case
chroma and luma planes are the same size. Disallowing larger transforms
can result in a loss of compression efficiency and is inconsistent.

For sub-8x8 blocks only average corresponding motion vectors.

4:2:0 and profile 0 behavior remains unchanged.

Change-Id: I560ae07183012c6734dd1860ea54ed6f62f3cae8
2014-06-18 13:07:51 -07:00
Jingning Han
3b9c19aaa7 Remove unused vp9_init_quant_tables function
This function is not effectively used, hence removed.

Change-Id: I2e8e48fa07c7518931690f3b04bae920cb360e49
2014-06-18 11:51:41 -07:00
James Zern
88df435d6b Merge "vp9_rtcd: correct avx2 references" 2014-06-16 17:39:13 -07:00
Johann
79afb5eb41 Use lrand48 on Android
When building x86 assembly use lrand48 instead of the
undocumented inlined _rand function.

Android now supports rand()
https://android-review.googlesource.com/97731
but only for new versions. Original workaround:
https://gerrit.chromium.org/gerrit/15744

Change-Id: I130566837d5bfc9e54187ebe9807350d1a7dab2a
2014-06-12 19:57:25 -07:00
Jingning Han
d5ae43318e Merge "Fast computation path for forward transform and quantization" 2014-06-12 11:59:52 -07:00
Jingning Han
ccba289f8d Fast computation path for forward transform and quantization
This commit enables a fast path computational flow for forward
transformation. It checks the sse and variance of prediction
residuals and decides if the quantized coefficients are all
zero, dc only, or more. It then selects the corresponding coding
path in the forward transformation and quantization stage.

It is currently enabled in rtc coding mode. Will do it for rd
coding mode next.

In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
Overall coding performance for rtc set is changed by -0.18%.

Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
2014-06-12 11:10:54 -07:00
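The three-way decision described in commit ccba289f8d might look like the sketch below. The function name, the two thresholds, and the reading of `sse - variance` as the DC energy of the residual are all assumptions made for illustration; the commit message only says that sse and variance drive the choice.

```python
def classify_residual(sse, variance, ac_threshold, dc_threshold):
    """Guess how many quantized coefficients a residual block produces.

    sse - variance is treated here as the DC (mean) energy of the block.
    Both thresholds are hypothetical stand-ins for values that would be
    derived from the quantizer.
    """
    if variance >= ac_threshold:
        return "full"        # AC energy survives: run the full transform
    if sse - variance < dc_threshold:
        return "all_zero"    # nothing survives: skip transform and quant
    return "dc_only"         # only the DC coefficient needs computing


# Hypothetical inputs, one per coding path.
paths = [
    classify_residual(100, 90, ac_threshold=50, dc_threshold=20),
    classify_residual(30, 25, ac_threshold=50, dc_threshold=20),
    classify_residual(100, 25, ac_threshold=50, dc_threshold=20),
]
```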
James Zern
9f3a0dbb5e vp9_rtcd: correct avx2 references
s/"\$avx2_x86inc"/"avx2"/

avx2 code is all intrinsics and as a result doesn't rely on x86inc.asm

Change-Id: I76ad39474d8a00658f3e43131830ef0f4f34772a
2014-06-10 16:26:36 -07:00
James Zern
cbce09ce62 Merge changes I6abc0657,I8224fba2,I04f64a45,I5d49d119,I76b4d171,I88c11ac3
* changes:
  vp9_sub_pixel_*variance*: disable avx2 variants
  vp9_sad*x4d: disable avx2 variants
  vp9_f(dct|ht): disable avx2 variants
  convolve: disable avx2 variants
  fdct8x8_test: add missing avx2 functions
  dct4x4_test: add missing avx2 functions
2014-06-10 16:14:45 -07:00
James Zern
520cb3f39f vp9_sub_pixel_*variance*: disable avx2 variants
tests failing under Win32/Win64

+ variance_test: add missing avx2 functions (partially disabled)

Change-Id: I6abc0657ea076379ab9ca65c12678b9ea199849d
2014-06-10 16:11:15 -07:00
James Zern
d3ff009d84 vp9_sad*x4d: disable avx2 variants
tests failing under Win32/Win64

+ sad_test: add missing avx2 functions (disabled)

Change-Id: I8224fba2b270f6039ab1877d71e1e512f0081856
2014-06-10 16:10:12 -07:00
hkuang
cdffeaaae0 Add mode info arrays and mode info index.
In non-frame-parallel decoding, this works the same way as
the current decoding scheme. Every time the decoder finishes
decoding a frame, it swaps the current mode info pointer
and the previous mode info pointer if the decoded frame needs
to be shown. Both the mode info pointer and the previous mode info
pointer come from the mode info arrays.

In frame-parallel decoding, this becomes more complicated,
as the current frame's mode info pointer is shared with the next
frame as its previous mode info pointer. But when one decoder
thread finishes decoding a frame and starts to work on the next
available frame, it needs to retain the decoded frame's mode
info pointers until the next frame finishes decoding. The mode info
index serves this purpose. The decoder will use a different
buffer in the mode info arrays and use the other buffer to save
the previously decoded frame's mode info.

Change-Id: If11d57d8eb0ee38c8876158e5482177fcb229428
2014-06-10 13:43:36 -07:00
James Zern
dd9f502933 vp9_f(dct|ht): disable avx2 variants
tests failing under Win32/Win64

+ dct16x16_test: add missing avx2 functions (partially disabled)

exercises the forward transforms
no idct/iht implementations, so the c-code is used

Change-Id: I04f64a457fa0828a00f32b5c9fe4f55294f21f61
2014-06-09 18:48:11 -07:00
James Zern
5704578f5f convolve: disable avx2 variants
tests failing under Win32/Win64

Change-Id: I5d49d11911bcda3a832b14efe5500d22597bedcf
2014-06-09 18:42:03 -07:00
Jingning Han
0c4a4225ec Merge "Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs" 2014-06-03 16:51:39 -07:00
Dmitry Kovalev
19c492a749 Merge "Reusing existing vp9_get{8x8, 16x16}var() instead of new ones." 2014-06-03 10:04:27 -07:00
Deb Mukherjee
fc88292ef2 Remove Wextra warnings from vp9_sad.c
As a side-effect, the sad unit tests for VP8 and VP9
had to be separated.

Fixes a bug in original patch:
(https://gerrit.chromium.org/gerrit/#/c/70163/8)
that was reverted due to a nightly test failure.

Change-Id: Ia2a4e9e278fd3c89d6c3c82fcc6381320ec2a8a6
2014-06-02 13:50:20 -07:00
Frank Galligan
c40a968e13 Merge "Revert "Remove Wextra warnings from vp9_sad.c"" 2014-06-01 16:58:11 -07:00
Frank Galligan
0b44988952 Revert "Remove Wextra warnings from vp9_sad.c"
This reverts commit 916550428d

Change-Id: I500822b03f09c64ff6ec5396c68edee9ca3b75cb
2014-06-01 16:20:26 -07:00
Jingning Han
ba6bed372b Merge "Fix a potential overflow issue in inverse 16x16 full 2D-DCT" 2014-05-30 15:52:53 -07:00
Jingning Han
2c1cdf69b6 Fix a potential overflow issue in inverse 16x16 full 2D-DCT
An overflow issue could potentially happen in the second round 1-D
transform of the SSSE3 full inverse 16x16 2D-DCT. This commit fixes
this issue.

Change-Id: Ia19e4888fda1cc929a28a5f89a5beec612d628dc
2014-05-29 11:46:32 -07:00
Dmitry Kovalev
e14f900ae3 Merge "Moving itxm_add pointer from MACROBLOCKD to MACROBLOCK." 2014-05-29 11:16:39 -07:00
Dmitry Kovalev
f7ff24cdd0 Reusing existing vp9_get{8x8, 16x16}var() instead of new ones.
Change-Id: I87b7c657d8813d7fb383ab519d150c0ffb1dd377
2014-05-29 11:14:06 -07:00
Jingning Han
6d21cbd20b Enable SSSE3 inverse 2D-DCT with 10 non-zero coeffs
This commit enables SSSE3 implementation of the inverse 2D-DCT
with only first 10 coefficients non-zero. It reduces the runtime
of SSE2 version from 745 cycles to 538 cycles, i.e., 27% speed-up.

Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
2014-05-28 10:53:33 -07:00
Jingning Han
d5bcef5242 Merge "Fix compiling error in MSVS" 2014-05-27 16:58:00 -07:00
Jingning Han
239e68ddbf Fix compiling error in MSVS
Need to include math.h before tmmintrin.h in some versions of MSVS.

Change-Id: Ia6b83ae599316887ecf30c4e4b9e4355fb8a4219
2014-05-27 15:58:47 -07:00
Yunqing Wang
1f2200080b Revert "Making vp9_get_sse_sum_{8x8, 16x16} static."
This reverts commit e8bbb3d9db.

Change-Id: Ie368d36fd249d323d859d208609c711f04537bbc
2014-05-27 13:37:08 -07:00
Deb Mukherjee
444f93945b Merge "Remove Wextra warnings from vp9_sad.c" 2014-05-27 11:54:05 -07:00
Yunqing Wang
a591ac9e5a Merge "Fix decoder mismatch in sub-pixel AVX2 intrinsic filters" 2014-05-27 10:52:16 -07:00
levytamar82
773596050f Fix decoder mismatch in sub-pixel AVX2 intrinsic filters
The subpixel SSSE3 was fixed in this patch:
https://gerrit.chromium.org/gerrit/#/c/70283/
So the equivalent AVX2 is fixed accordingly.

Change-Id: Ieebbc1949c99d34b12b8b47692df71aca5001f3a
2014-05-23 16:48:40 -07:00
Jingning Han
59c3f446fe Merge "Inverse 16x16 2D-DCT SSSE3 implementation" 2014-05-23 16:01:22 -07:00
Jingning Han
48b0891370 Inverse 16x16 2D-DCT SSSE3 implementation
This commit enables the SSSE3 implementation of full inverse 16x16
2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles,
about 7% speed-up.

Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
2014-05-23 15:09:35 -07:00
Yunqing Wang
67ca5b586a Merge "Fix decoder mismatch in sub-pixel SSSE3 intrinsic filters" 2014-05-23 14:24:48 -07:00
Dmitry Kovalev
d7d7cedaaa Merge "Removing vp9_pragmas.h." 2014-05-23 12:58:00 -07:00
Yunqing Wang
c5443fc881 Fix decoder mismatch in sub-pixel SSSE3 intrinsic filters
In 8-tap filtering, to guarantee the intermediate results fit in
16 bits, the order of accumulating the products needs to be done
correctly, and the largest product should be added last. This
patch fixed the problem using the method in commit "Correct ssse3
8/16-pixel wide sub-pixel filter calculation".

Change-Id: I79d0ad60c057b15011ece84cda9648eee0809423
2014-05-23 11:52:20 -07:00
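The ordering problem fixed in commit c5443fc881 can be reproduced with a small model of 16-bit saturating addition (the behaviour of instructions such as `paddsw`). The partial products below are illustrative values, not taken from an actual VP9 filter kernel; their exact sum still fits in 16 bits.

```python
INT16_MIN, INT16_MAX = -32768, 32767


def sat_add16(a, b):
    """Signed 16-bit addition that clamps instead of wrapping."""
    return max(INT16_MIN, min(INT16_MAX, a + b))


def accumulate(partials):
    """Sum partial products left to right using saturating adds."""
    total = 0
    for p in partials:
        total = sat_add16(total, p)
    return total


# Illustrative partial products whose exact sum (32640) fits in int16.
exact = 24480 + 9945 - 1020 - 765

# Largest product first: the intermediate sum 34425 clamps to 32767,
# and the later negative terms cannot undo the lost precision.
largest_first = accumulate([24480, 9945, -1020, -765])

# Largest product last: every intermediate sum stays in range.
largest_last = accumulate([-1020, -765, 9945, 24480])
```

Only the second ordering matches the exact result, which is why the accumulation order matters even when the final value is representable.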
Yaowu Xu
9410330893 Merge "change to use assembly version of ssse3 filter code" 2014-05-23 08:02:28 -07:00
Deb Mukherjee
916550428d Remove Wextra warnings from vp9_sad.c
As a side-effect, the sad unit tests for VP8 and VP9
had to be separated.

Change-Id: I068cc2391eed51e9b140ea6aba78338c5fec8d71
2014-05-22 22:21:16 -07:00
Yaowu Xu
7a0c9b82f2 change to use assembly version of ssse3 filter code
Mismatches were found between the intrinsic version and the C-only
version. This commit temporarily reverts to the matching assembly
version to allow further investigation.

Change-Id: I08436c47d4888b562c0eac8e8856d90a831442df
2014-05-22 17:11:57 -07:00
Yunqing Wang
aaf204e550 Merge "Fix a decoding mismatch in sub-pixel filters" 2014-05-22 17:09:14 -07:00
Yunqing Wang
efcdf946ed Fix a decoding mismatch in sub-pixel filters
This did the same correction as the one in commit "Correct ssse3
8/16-pixel wide sub-pixel filter calculation" to avoid saturation
during filtering.

Change-Id: Ife9aa3f62daf9114eb24fe38f7baa3c3f361b2d6
2014-05-22 15:42:13 -07:00
Dmitry Kovalev
72ab966d5e Removing vp9_pragmas.h.
Change-Id: I9120a87e27e73e496932d11716937e2fad246521
2014-05-22 13:46:31 -07:00
Deb Mukherjee
e272273443 Renames x86_64 specific asm files
Renames all x86_64 specific assembly files to consistently
end in _x86_64.asm. This will be useful for build systems to
handle these files differently.
All new 64-bit specific assembly files should use the new
naming convention.

Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
2014-05-21 13:55:56 -07:00
Dmitry Kovalev
35a83677a5 Moving itxm_add pointer from MACROBLOCKD to MACROBLOCK.
The final goal is eventually to get rid of both itxm_add and fwd_txm4x4.
This patch does it in the decoder.

Change-Id: Ibb3db57efbcbb1ac387c6742538a9fcf2c6f24a5
2014-05-21 11:09:44 -07:00
Deb Mukherjee
ef750d8472 Merge "Extends temporal filtering to work for 422 data" 2014-05-20 16:31:28 -07:00
Deb Mukherjee
a185bc3350 Extends temporal filtering to work for 422 data
This is needed for profiles 1 and 2.

Change-Id: I5dd7644c2932d055ab89e050d4be7d4117cd1028
2014-05-20 15:19:40 -07:00
hkuang
20c1edf612 Refactor decode_tiles and loopfilter code.
The current decode_tiles decodes the frame one tile at a time
and then loop-filters the whole frame, or uses another worker thread to
do the loop filtering.

|------|------|------|------|
|Tile1-|Tile2-|Tile3-|Tile4-|
|------|------|------|------|

For example, if a tiled video has one row and four columns, decode_tiles
will decode Tile1, then Tile2, then Tile3, then Tile4.
And while decoding each tile, decode_tile will decode row by row
within that tile.

For frame parallel decoding, decode_tiles will decode video in row order
across the tiles. So the order will be:
"Decode 1st row of Tile1" -> "Decode 1st row of Tile2"
-> "Decode 1st row of Tile3" -> "Decode 1st row of Tile4"
-> "Decode 2nd row of Tile1" -> "Decode 2nd row of Tile2"
-> "Decode 2nd row of Tile3" -> "Decode 2nd row of Tile4"-> "loopfilter 1st row"

Change-Id: I2211f9adc6d142fbf411d491031203cb8a6dbf6b
2014-05-20 14:47:45 -07:00
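The two decode orders contrasted in commit 20c1edf612 can be written out as plain iteration orders. This is only a model of the scheduling, with hypothetical function names; tiles and rows are numbered from 1 to match the message.

```python
def tile_by_tile_order(num_tiles, rows_per_tile):
    """Existing scheme: finish every row of a tile before the next tile."""
    return [(tile, row)
            for tile in range(1, num_tiles + 1)
            for row in range(1, rows_per_tile + 1)]


def row_interleaved_order(num_tiles, rows_per_tile):
    """Frame-parallel scheme: decode one row across all tiles, then the next."""
    return [(tile, row)
            for row in range(1, rows_per_tile + 1)
            for tile in range(1, num_tiles + 1)]


# Four tile columns, two rows per tile, as (tile, row) pairs.
old_order = tile_by_tile_order(4, 2)
new_order = row_interleaved_order(4, 2)
```

In the interleaved order a full picture row is complete after every `num_tiles` steps, which is what lets loop filtering of earlier rows begin before the whole frame is decoded.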
Dmitry Kovalev
c23c613fdf Merge "Hiding vp9_sub_pel_filters_{8, 8s, 8lp} filters in *.c file." 2014-05-19 10:27:16 -07:00
Dmitry Kovalev
79ba41903f Removing MACROBLOCKD dependency from loop filter.
Change-Id: I9ef40f3d95ab8f94f69e92ea25678a40956bc1ce
2014-05-16 09:48:26 -07:00
Adrian Grange
9dc9f17814 Merge "Fix post-processor macros & remove vizualization" 2014-05-16 09:01:41 -07:00
Dmitry Kovalev
619e6b539a Merge "Removing redundant "8x8" suffix from MODE_INFO vars." 2014-05-15 17:53:31 -07:00
Jim Bankoski
ec82d2dfec Merge "Revert "Remove Wextra warnings from vp9_sad.c"" 2014-05-15 11:54:23 -07:00
Yunqing Wang
c661cf0dad Merge "AVX2 To VP9 Block Error Optimization" 2014-05-15 11:29:29 -07:00
Dmitry Kovalev
ed784a0bc4 Removing redundant "8x8" suffix from MODE_INFO vars.
Change-Id: I7ed7fecc959c6598ff98895f1a5cf7e11ac1615f
2014-05-15 11:14:42 -07:00
Adrian Grange
384bc5163c Fix post-processor macros & remove vizualization
Make all post-processor code conditionally
compilable based on the CONFIG_VP9_POSTPROC
macro.

Also, remove the visualization code from VP9
since it is out of date and will not compile.

Change-Id: I1e9e13a09ecd43e9a3f3704c175ae8cd258ababd
2014-05-15 08:35:36 -07:00
Jim Bankoski
a16794dd31 Revert "Remove Wextra warnings from vp9_sad.c"
This reverts commit 7ab9a9587b

Nightly test http://build.webmproject.org/jenkins/view/libvpx-nightly-tests/job/libvpx%20unit%20tests%20(valgrind-2)/arch=x86_64-linux-gcc,filter=-*VP8*:*Large.*/276/console

Failed.

This patch did not address all the assembly issues:
some of the vp8 assembly counts on 5 arguments being passed in to this function.

One example: vp8_sad8x16_wmt

Please address or split this into vp9 and vp8 patches.

Change-Id: I78afcc171649894f887bb8ee3c66de24aaddc7ca
2014-05-15 08:31:20 -07:00
Yaowu Xu
71854f3a6e Merge "vp9_decodeframe.c: cleanup -wextra warnings" 2014-05-15 06:50:51 -07:00
Dmitry Kovalev
021eaabdb8 Hiding vp9_sub_pel_filters_{8, 8s, 8lp} filters in *.c file.
Change-Id: Id401da740b0a0141caaef9e1bcccd981e5cef4a4
2014-05-14 16:21:41 -07:00
levytamar82
1fbab853c8 AVX2 To VP9 Block Error Optimization
vp9_block_error_sse2 can only handle 16 bytes at a time, but
the function needs to process a sequence of 32 bytes at a time,
so each 16 bytes is handled in a different register.
With the AVX2 optimization, the 32 bytes can be handled in one register instead
of the two used by SSE2.
The vp9_block_error function was optimized by 85%.
The user-level gain was 1.2%.

Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd
2014-05-14 11:51:07 -07:00
Deb Mukherjee
9687c057f8 Merge "Remove Wextra warnings from vp9_sad.c" 2014-05-14 10:01:50 -07:00
Yaowu Xu
ed09580777 vp9_decodeframe.c: cleanup -wextra warnings
Change-Id: I0315cea6a5e58182bc2556e9825ec2ef0b1480c3
2014-05-14 09:46:11 -07:00
Jingning Han
e5bbb4cfd8 Merge "Silience -wextra warnings in vp9_reconintra.c" 2014-05-14 09:25:08 -07:00
Deb Mukherjee
7ab9a9587b Remove Wextra warnings from vp9_sad.c
As a side-effect, the max_sad check is removed from the
C-implementation of VP8, for consistency with VP9, and to
ensure that the SAD tests common to VP8/VP9 pass.
That will make the VP8 C implementation of sad a little slower
but given that is rarely used in practice, the impact will be
minimal.

Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca
2014-05-14 03:17:31 -07:00
Dmitry Kovalev
eecc750b33 Merge "Moving loopfilter call to vp9_decode_frame()." 2014-05-13 17:20:26 -07:00
Jingning Han
806fa6aaca Silience -wextra warnings in vp9_reconintra.c
The warning messages complained that there are unused arguments
in a few prediction modes. This structure was designed on purpose,
such that a wrapper function can cover all prediction mode cases
and make them readily accessible as a pointer array.

This commit silences such warnings.

Change-Id: I7036b6bdb70747e5327d8f6fceb154f100abc4c0
2014-05-13 12:54:23 -07:00
Adrian Grange
fd6bf31b8a vp9_convolve.c: cleanup -wextra warnings
Change-Id: I04930aca2293ebbaeb96dfedd2f9c5a55762fd2e
2014-05-13 09:57:24 -07:00
Dmitry Kovalev
ae7d3ef39f Moving loopfilter call to vp9_decode_frame().
Inline loopfilter has been already handled in vp9_decode_frame().
Collecting all similar code in one place now.

Change-Id: I358a0280fc7c2b27cca520bc1e8c16c4eb6491dd
2014-05-12 16:19:19 -07:00
Johann
ce23931a3f Only build neon assembly for armv7 targets
Allow selectively building just the intrinsics for armv8

Change-Id: I2f29b2e4508b8b8e5649c2906b3159ad1d4ec477
2014-05-12 08:52:02 -07:00
Alex Converse
ec8a3272fa Merge "Add an x86inc MMX fwht4x4." 2014-05-09 13:48:49 -07:00
Jingning Han
9412785b02 Merge changes I3edd4b95,I4514f974,Ie7fa4386
* changes:
  Turn on unit tests for SSSE3 8x8 forward and inverse 2D-DCT
  Change eob threshold for partial inverse 8x8 2D-DCT to 12
  SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero
2014-05-09 09:58:39 -07:00
Alex Converse
b5422fab46 Add an x86inc MMX fwht4x4.
Change-Id: Ib0a73d4863478f9b8a00976379d25d2f6ebbb197
2014-05-08 12:01:27 -07:00
Jingning Han
41a350a83d Change eob threshold for partial inverse 8x8 2D-DCT to 12
The scanning order has the first 12 coefficients of the 8x8 2D-DCT
sitting in the top left 4x4 block. Hence the partial inverse 8x8
2D-DCT can handle cases with eob below 12.

The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
166 cycles (using SSE2) to 150 cycles (using SSSE3).

Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
2014-05-08 09:48:58 -07:00
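The eob threshold in commit 41a350a83d implies a dispatch on the end-of-block position along these lines. This is a sketch only: the function name and return labels are hypothetical, the separate DC-only case is an assumption, and whether the boundary at 12 is inclusive is also assumed rather than stated in the message.

```python
PARTIAL_EOB_THRESHOLD = 12  # value from the commit title


def select_inverse_dct8x8(eob):
    """Choose an inverse 8x8 2D-DCT variant from the end-of-block position.

    eob counts coefficients in scan order up to the last non-zero one.
    """
    if eob == 1:
        return "dc_only"     # assumed shortcut: only the DC term is set
    if eob <= PARTIAL_EOB_THRESHOLD:
        return "partial"     # non-zero coefficients all lie in the top-left 4x4
    return "full"            # general case, e.g. vp9_idct8x8_64_add


choices = [select_inverse_dct8x8(eob) for eob in (1, 10, 12, 13, 64)]
```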
Jingning Han
9e7b09bc5d SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero
This commit enables ssse3 assembly implementation of the 8x8
inverse 2D-DCT with only first 10 coefficients non-zero. The
average runtime for this unit goes down from 198 cycles to 129
cycles (34.8% faster).

Change-Id: Ie7fa4386f6d3a2fe0d47a2eb26fc2a6bbc592ac7
2014-05-07 17:40:02 -07:00
Dmitry Kovalev
68a600d82a Merge "Moving pair_set_epi32 macro into vp9_dct32x32_sse2.c." 2014-05-07 13:34:05 -07:00
Paul Wilkins
33b1c457ed Revert "Add an MMX fwht4x4"
Includes changes that are not compatible with VS windows builds.
Amongst other things stdint.h is not supported in VS.

This reverts commit 89fbf3de50.

Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd
2014-05-07 12:53:27 +01:00
Alex Converse
75d05d5ed4 Merge "Add an MMX fwht4x4" 2014-05-06 11:12:27 -07:00
Jingning Han
d289deb04c Merge "SSSE3 implementation of full inverse 8x8 2D-DCT" 2014-05-06 09:17:22 -07:00
Dmitry Kovalev
e8bbb3d9db Making vp9_get_sse_sum_{8x8, 16x16} static.
Change-Id: Ifb7937c977308c682986f0ce9645a0807d2aa46a
2014-05-05 19:12:38 -07:00
Alex Converse
89fbf3de50 Add an MMX fwht4x4
7% faster when encoding a lossless desktop clip at RT speed 4.

Change-Id: I41627f5b737752616b6512bb91a36ec45995bf64
2014-05-05 15:10:48 -07:00
Jingning Han
52ae97b6aa SSSE3 implementation of full inverse 8x8 2D-DCT
This commit enables the SSSE3 version of the full inverse 8x8 2D-DCT and
reconstruction. It brings the runtime of vp9_idct8x8_64_add down
from 256 cycles (SSE2) to 246 cycles.

Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
2014-05-05 10:49:27 -07:00
Dmitry Kovalev
25a666ef39 Moving pair_set_epi32 macro into vp9_dct32x32_sse2.c.
Change-Id: I642a7d343677bf934e9a54cf4ad78e908620e39a
2014-05-01 16:45:49 -07:00
Jingning Han
39761eb5d6 Merge "Enable SSSE3 implementation of 8x8 forward 2D-DCT" 2014-04-30 13:41:36 -07:00
Dmitry Kovalev
d2bc8816a1 Merge "Adding search_site_config struct." 2014-04-29 16:59:47 -07:00
Jingning Han
1eaa3a76dc Enable SSSE3 implementation of 8x8 forward 2D-DCT
Assembly implementation of ssse3 8x8 forward 2D-DCT. The current
version is turned on only for x86_64. The average unit runtime
goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster.
This translates into about 1.5% speed-up for pedestrian_area 1080p
at speed 2.

Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4
2014-04-29 15:49:18 -07:00
Dmitry Kovalev
9b042dc04c Merge "Removing unused vp9_variance_halfpixvar*() functions." 2014-04-29 14:52:58 -07:00
Dmitry Kovalev
aa464eca5e Adding search_site_config struct.
Change-Id: I2ad333553e673dbabcdc0f0366aea311e90849bf
2014-04-29 10:34:53 -07:00
Dmitry Kovalev
7b59014b74 Removing old unused vp9_tapify.py.
Change-Id: I7d66987fd04a3f98c140fc5f99ed0e9bc01f61d0
2014-04-25 15:19:31 -07:00
Dmitry Kovalev
6e01079cc0 Removing unused vp9_variance_halfpixvar*() functions.
Change-Id: I99695564a3aa9bc8c79ac0a551d257e2ff3ad3c3
2014-04-25 11:50:07 -07:00
Dmitry Kovalev
03e7deae4f Removing unused vp9_sub_pixel_mse* functions.
Change-Id: I8d906da3bd6de0d3042676846f61a8b2a3444508
2014-04-24 11:49:12 -07:00
Dmitry Kovalev
e608418899 Renaming MB_PREDICTION_MODE to PREDICTION_MODE.
Actually, it would be great to have two separate enums INTRA_MODES and
INTER_MODES in future.

Change-Id: I6c4147cf0002853da9c1e03fe9514eab876f01c8
2014-04-22 17:48:31 -07:00
Dmitry Kovalev
55977e4a4f Merge "Moving frame_frags field from VP9Common to VP9_COMP." 2014-04-15 10:39:31 -07:00
Dmitry Kovalev
63fa722179 Removing unused cost arguments from mcomp functions.
Change-Id: Id81a76d18be6b2de69f81bb563d74c3bb356d434
2014-04-11 10:24:36 -07:00
Yunqing Wang
23ccf71924 Merge "Fix encoder uninitialized read errors reported by drmemory" 2014-04-10 09:45:08 -07:00
Dmitry Kovalev
1d5ed021fb Moving frame_frags field from VP9Common to VP9_COMP.
Change-Id: I0f4a5c50561a2653d22c366c214a937272ecfa2c
2014-04-09 20:56:06 -07:00
Dmitry Kovalev
65e650e0c0 Merge "Revert "Converting set_prev_mi() to get_prev_mi()."" 2014-04-09 20:44:30 -07:00
Dmitry Kovalev
60def47f21 Revert "Converting set_prev_mi() to get_prev_mi()."
This reverts commit 22a3e30790

Change-Id: I460d905edf5fb2006da58c18fbe02c04d0c631bb
2014-04-09 15:23:16 -07:00
Tom Finegan
4fffefe189 Merge "Fix avx builds on macosx with clang 5.0." 2014-04-09 13:03:26 -07:00
Dmitry Kovalev
5ed83c3220 Merge "Converting set_prev_mi() to get_prev_mi()." 2014-04-09 10:27:05 -07:00
Yunqing Wang
2e7d327789 Merge "Use source frame difference to make partition decision" 2014-04-09 10:26:42 -07:00
Yunqing Wang
3a6670fcf8 Fix encoder uninitialized read errors reported by drmemory
This patch fixed the uninitialized read errors in Issue 748:
"dr memory VP9 encode errors". In vp9_convolve_avg_sse2,
when width is 4, pavgb reads 8 bytes from dst buffer that is
out of range. An error is reported although the data is not
actually used later. This issue was resolved by preventing
uninitialized reads.

Change-Id: I109a54910aa47139cb13119de86f2062cff207df
2014-04-09 09:59:15 -07:00
Tom Finegan
f600b50a6e Fix avx builds on macosx with clang 5.0.
The macosx release of clang v5.0 identifies itself as:
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)

This version of clang uses the older _mm_broadcastsi128_si256, like
v3.3, as given away in the LLVM svn version above.

Change-Id: I4d6d59d5454efd57d2ae9e75f5eb7486af7cbd0c
2014-04-08 18:56:03 -07:00
Yunqing Wang
4e66293fcb Use source frame difference to make partition decision
Calculate the difference variance between last source frame and
current source frame. The variance is calculated at 16x16 block
level. The variances are compared to several thresholds to decide
final partition sizes.

An adaptive strategy is implemented to decide whether to use
SOURCE_VAR_BASED_PARTITION or FIXED_PARTITION based on the motion
in the video. The switching test is done once every
search_type_check_frequency frames.

The selection of source_var_thresh needs to be investigated
further later.

RTC set Borg test showed 0.424% overall psnr gain, and 0.357%
ssim gain. For clips with large enough static area, the
encoding speedup is around 2% to 15%.

Change-Id: Id7d268f1d8cbca7fb8026aa4a53b3c77459dc156
2014-04-08 17:03:02 -07:00
Deb Mukherjee
d35df2d8ea High-level hooks for Profile 2 (10/12 bit)
Adds some high-level hooks for profile 2 before further
progress on the implementation.

According to the definition in this patch:
1. Profile 2 only supports 10 or 12 bit color but not 8
2. Profile 2 supports all color sampling modes: 444, 422 and 420,
and alpha plane.
3. Profile 3 is currently undefined.

Please consider the definition carefully and suggest modifications
to the definition as needed.

Change-Id: I5b284fc679e54ac5aee171af72fa7994cfd28995
2014-04-08 16:18:34 -07:00
Dmitry Kovalev
22a3e30790 Converting set_prev_mi() to get_prev_mi().
Change-Id: Iad4002d7aecaae0e25d88e286bacde7e6cd7264f
2014-04-07 16:01:34 -07:00
Dmitry Kovalev
b5e12dda52 Cleaning up vp9_{cx, dx}_iface.c files.
Change-Id: Ib4e31ba74c4b882bd93942ef743f4a189892738d
2014-04-07 10:38:51 -07:00
Dmitry Kovalev
a9f324fa7f Removing interp_kernel from MACROBLOCKD.
Now interp_kernel is obtained when it is really required (based on
mbmi->interp_filter value).

Change-Id: I4c7a93c179d1045eba16e7526c293d02c9b8b47e
2014-04-03 15:28:42 -07:00
Dmitry Kovalev
8b8606a737 Merge "Cleaning up vp9_mvref_common.c." 2014-04-02 11:03:36 -07:00
Dmitry Kovalev
68027a0b8a Merge "Grouping members in MB_MODE_INFO struct." 2014-04-02 11:00:58 -07:00
Dmitry Kovalev
86f44a91f4 Renaming two members in MACROBLOCKD struct.
Renames:
  mi_8x8 -> mi
  mode_info_stride -> mi_stride

Change-Id: I66f3e5fd1e7b7f46f108af5bb711c5fd9493c1be
2014-04-01 17:46:40 -07:00
Dmitry Kovalev
d42976c515 Common configuration for MACROBLOCKD struct.
Change-Id: Ie2ea9dd8bd338cc9fe12ca9033df64f7644c68b3
2014-04-01 10:57:59 -07:00
Dmitry Kovalev
20d868f05d Grouping members in MB_MODE_INFO struct.
Change-Id: Ia6d7e7a08810e0c3401da4d10266828d560e6851
2014-03-28 17:44:13 -07:00
Yaowu Xu
4f857bacd2 [BITSTREAM]Fix the scaling calculation
For very large video images, the scaling calculation may need to use
values beyond the range of int. This commit upgrades the value to 64 bits
to make sure the calculation does not wrap around INT_MAX.

The change corrects the decoder behavior.

The bug affects only very large resolution video because the scaling
calculation was sufficient for image sizes smaller than 2^13.

This resolves issue:
https://code.google.com/p/webm/issues/detail?id=750

Change-Id: I2d2ed303ca6482f31f819f3c07d6d3e98ef3adc5
2014-03-28 16:40:29 -07:00
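The wrap-around fixed in commit 4f857bacd2 can be modelled as below. The sketch assumes a 14-bit fixed-point scale factor and positions expressed in 1/16-pixel units; neither detail is stated in the message, but together they put the overflow boundary at the 2^13 image size it mentions. Python integers do not overflow, so 32-bit wrap-around is emulated explicitly (in C, signed overflow is undefined behaviour; wrapping is merely the typical outcome).

```python
SCALE_SHIFT = 14        # assumed fixed-point precision of the scale factor
SUBPEL_UNITS = 16       # assumed 1/16-pixel position units


def wrap_int32(value):
    """Emulate two's-complement wrap-around of a 32-bit signed int."""
    value &= 0xFFFFFFFF
    return value - (1 << 32) if value >= (1 << 31) else value


def scale_32bit(position, scale_fp):
    """Scaling with a 32-bit intermediate product (the buggy behaviour)."""
    return wrap_int32(position * scale_fp) >> SCALE_SHIFT


def scale_64bit(position, scale_fp):
    """Scaling with a wide intermediate product (the fixed behaviour)."""
    return (position * scale_fp) >> SCALE_SHIFT


unity = 1 << SCALE_SHIFT                 # scale factor of exactly 1.0
below_limit = 8191 * SUBPEL_UNITS        # just under 2^13 pixels
at_limit = 8192 * SUBPEL_UNITS           # exactly 2^13 pixels

ok_32 = scale_32bit(below_limit, unity)
bad_32 = scale_32bit(at_limit, unity)
good_64 = scale_64bit(at_limit, unity)
```

With a unity scale factor the result should equal the input position; the 32-bit path fails that check as soon as the product reaches 2^31.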
Dmitry Kovalev
03349d2ba2 Moving dqcoeff array to MACROBLOCKD in decoder.
Change-Id: I3e20c0cdb9d2437bddf21afb255855f2dead8e02
2014-03-28 10:36:16 -07:00
Dmitry Kovalev
38053687bc Cleaning up vp9_mvref_common.c.
Change-Id: I4eb815156ecaab02c9182e6e1abbea0e4d86c441
2014-03-27 17:50:02 -07:00
Dmitry Kovalev
0437575848 Merge "Removing prev_mi_8x8 from MACROBLOCKD." 2014-03-26 15:45:11 -07:00
Dmitry Kovalev
38c2d37b9d Merge "Cleaning up vp9_entropymv.c." 2014-03-26 14:28:45 -07:00
Dmitry Kovalev
63f86c149a Removing prev_mi_8x8 from MACROBLOCKD.
Change-Id: I32beb5f18c10b5771146c55933b5555487f53633
2014-03-26 10:50:34 -07:00
Dmitry Kovalev
ed39c40a2e Moving above_context to VP9_COMMON.
Change-Id: I713af99d1e17e05a20eab20df51d74ebfd1a68d2
2014-03-25 10:40:08 -07:00
Yaowu Xu
34a3628a45 Merge "Fixed a build issue" 2014-03-25 10:22:18 -07:00
Yaowu Xu
59872069d2 Merge "Change back the scaling calculation." 2014-03-25 09:48:21 -07:00
Yaowu Xu
8051563972 Fixed a build issue
Adding the missed include file.

Change-Id: I7e48df6b0633afbebaf1ccb3062ae404e7203dc9
2014-03-25 09:45:54 -07:00
Dmitry Kovalev
5b8c834c1a Initialization code cleanup.
Change-Id: I47a8b4bf9a6cc0063d1a6785eaaad641d0659e24
2014-03-24 12:21:22 -07:00
Dmitry Kovalev
49bb6df0e2 Cleaning up vp9_entropymv.c.
Change-Id: I01b3530779da89acb84c71bac5ccac456f00c5ac
2014-03-24 11:02:27 -07:00
Yunqing Wang
b458bb7c20 Merge "AVX2 SAD Optimization:" 2014-03-24 10:52:32 -07:00
Dmitry Kovalev
ac5bdc0ed8 Merge "Cleaning up vp9_loopfilter.c." 2014-03-24 09:02:06 -07:00
hkuang
22232ec602 Change back the scaling calculation.
Make the calculation compatible with Google's HW implementation.

Change-Id: I22e179888cdb0419e230351c0a47661b37051fef
2014-03-24 08:32:56 -07:00
Dmitry Kovalev
9895c9d4dd Merge "Removing redundant {above, left}_seg_context manipulation code." 2014-03-22 22:31:48 -07:00
Dmitry Kovalev
2786938a3c Merge "Renaming and making vp9_update_mode_info_border() static." 2014-03-21 21:19:18 -07:00
Dmitry Kovalev
58cc06f9b3 Cleaning up vp9_loopfilter.c.
Change-Id: I7c7cf7d3c7b00d1c74ffa8aa8fb8d78a0e48326f
2014-03-21 16:31:15 -07:00
Frank Galligan
8345e76d61 Merge "Fix libvpx VP9 decoder dr memory errors" 2014-03-21 15:24:39 -07:00
Dmitry Kovalev
e141f10bfc Renaming and making vp9_update_mode_info_border() static.
Change-Id: Ibb72a29cae9ca9443aae56fc4c5458d190eae279
2014-03-21 14:02:25 -07:00
levytamar82
0fa8b668c1 AVX2 SAD Optimization:
Two functions were optimized for AVX2 by using full 256-bit registers
in order to handle 32 elements in parallel instead of only 16:
1. vp9_sad32x32x4d
2. vp9_sad64x64x4d

The function-level gain is 66% and the user-level gain is ~1%.

Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb
2014-03-21 13:53:32 -07:00
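For readers unfamiliar with the "x4d" SAD functions optimized in commit 0fa8b668c1, the sketch below shows what such a function computes in scalar form: the sum of absolute differences between one source block and four candidate reference blocks. The function names and the list-of-rows block layout are illustrative; the real functions operate on strided pixel buffers, and the AVX2 version processes 32 pixels per instruction instead of looping.

```python
def sad(src, ref):
    """Sum of absolute differences between two equally sized blocks."""
    return sum(abs(s - r)
               for src_row, ref_row in zip(src, ref)
               for s, r in zip(src_row, ref_row))


def sad_x4d(src, refs):
    """SAD of one source block against four reference blocks at once."""
    if len(refs) != 4:
        raise ValueError("expected exactly four reference blocks")
    return [sad(src, ref) for ref in refs]


# Tiny 2x2 blocks stand in for the 32x32 and 64x64 blocks of the commit.
source = [[1, 2], [3, 4]]
references = [
    [[1, 2], [3, 4]],   # identical block
    [[0, 0], [0, 0]],   # all zeros
    [[2, 3], [4, 5]],   # every pixel off by one
    [[1, 2], [3, 5]],   # one pixel off by one
]
sads = sad_x4d(source, references)
```

Evaluating four candidates per call lets the source block be loaded once and reused, which is where a motion search benefits.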
Yunqing Wang
9b5df3fabe Fix libvpx VP9 decoder dr memory errors
Fixed dr memory errors reported in Issue 736:
https://code.google.com/p/webm/issues/detail?id=736

All elements in left_col buffer need to be initialized to ensure
the correctness of SIMD operations in x86 optimized code.

Change-Id: I8e7f26ab45cca8099c1f9342bcf852f828bda7e4
2014-03-21 12:23:47 -07:00
Dmitry Kovalev
4cb37bff96 Removing redundant {above, left}_seg_context manipulation code.
Change-Id: Ib3c1746e61220c629cbd971b2458aa686b5c9e36
2014-03-21 12:12:55 -07:00
Dmitry Kovalev
a57de9da03 Merge "Reusing {above, left}_seg_context vars in both encoder and decoder." 2014-03-21 12:02:42 -07:00
Yaowu Xu
46c71e5eba Merge "Remove duplicate declaration" 2014-03-21 08:44:04 -07:00
Dmitry Kovalev
7ad40117f1 Reusing {above, left}_seg_context vars in both encoder and decoder.
Change-Id: Id1fa36c92cb007b73a450cc8552e810cedad38b9
2014-03-20 16:15:57 -07:00