Commit Graph

3250 Commits

Author SHA1 Message Date
Jim Bankoski
da1bda0fb2 vp9_postproc.c : unused variable if not vp9_highbitdepth.
Change-Id: Ib89b128f23767934c40b5add3fcf9dbd875e82f9
2016-07-15 15:04:57 -07:00
Jim Bankoski
0dc69c70f7 postproc : fix function parameters for noise functions.
Change-Id: I582b6307f28bfc987dcf8910379a52c6f679173c
2016-07-15 08:27:34 -07:00
James Bankoski
7eec1f31b5 Merge "postproc: noise style fixes." 2016-07-13 22:04:47 +00:00
Jim Bankoski
e736691a6d postproc: noise style fixes.
Change-Id: Ifdcb36b8e77b65faeeb10644256e175acb32275d
2016-07-13 12:39:01 -07:00
James Bankoski
e93f2fdb83 Merge "postproc - move filling of noise buffer to vpx_dsp." 2016-07-13 15:31:17 +00:00
Jim Bankoski
2ca24b0075 postproc - move filling of noise buffer to vpx_dsp.
Change-Id: I63ba35dc0ae9286c9812367a531e01d79a4c1635
2016-07-13 07:35:25 -07:00
Jim Bankoski
b24373fec2 deblock: missing const on extern const.
Change-Id: I0df08f7c431daf939e266f008bf5158b0c97358b
2016-07-13 07:27:29 -07:00
Jim Bankoski
6f424a768e vp9_postproc.c missing extern.
BUG=webm:1256

Change-Id: I5271e71bc53cce033fb906040643dcdd5ccb2381
2016-07-12 17:47:49 -07:00
Jim Bankoski
88e6951465 deblock filter : moved from vp8 code branch
The deblocking filters used in vp8 have been moved to vpx_dsp for
use by both vp8 and vp9.

Change-Id: I5209d76edafc894b550f751fc76d3aa6799b392d
2016-07-12 05:53:00 -07:00
Debargha Mukherjee
adbad6092f Merge "Remove decode asserts from better-hw-compatibility" 2016-07-06 20:55:29 +00:00
Debargha Mukherjee
4b6e4e1813 Remove decode asserts from better-hw-compatibility
Safer to have the decoder operate normally and have
better-hw-compatibility only implement encoding changes.
Fixes some test failures.

Change-Id: I0dd70d002e4e893992f0cd59774b9363e6f7fe76
2016-07-06 12:26:38 -07:00
Jingning Han
51aad61c8c Merge "Remove txfrm_block_to_raster_xy() from vp9 encoder" 2016-07-06 16:00:18 +00:00
Jingning Han
14011f037d Remove txfrm_block_to_raster_xy() from vp9 encoder
The transform block row and column positions are always available
outside the callees. There is no need to re-compute these values
again. This approach has been used by the decoder. This commit
removes txfrm_block_to_raster_xy() function.

Change-Id: I5b90f91a0d8b7c35cfa7d171da9edf8202630108
2016-07-04 18:41:47 -07:00
Jacky Chen
ee78c541a4 Merge "vp9 postproc: Bug fix and code clean." 2016-06-30 21:59:44 +00:00
Johann
fe96dbda15 vp9: remove x86inc.asm distinction
BUG=b:29583530

Change-Id: I952da3fc0d4716dec897be0d2e9806af6612722b
2016-06-29 18:55:28 -07:00
jackychen
6b4463dc1f vp9 postproc: Bug fix and code clean.
Bug fix: The crash is caused by not allocating buffer for prev_mip in
postproc_state and prev_mip in postproc_state is only used for MFQE,
ohter postproc modules, deblocking and etc., should not use it.

BUG=webm:1251

Change-Id: I3120d2f50603b4a2d400e92d583960a513953a28
2016-06-28 16:13:44 -07:00
James Zern
f51f67602e *.asm: normalize label format
add a trailing ':', though it's optional with the tools we support, it's
more common to use it to mark a label. this also quiets the
orphan-labels warning with nasm/yasm.

BUG=b/29583530

Change-Id: I46e95255e12026dd542d9838e2dd3fbddf7b56e2
2016-06-27 19:46:57 -07:00
James Zern
efad6feb9a Merge "cosmetics: Change few types to their posix version" 2016-06-24 21:50:45 +00:00
Yury Gitman
3b2e2f2f77 cosmetics: Change few types to their posix version
Change-Id: I6d7bc9ed7396e7b0d63ee97bfa473fdea002f9ee
2016-06-24 10:18:06 -07:00
hui su
a5af392aae Add a hardware compatibility feature
This commit adds an encoder workaround to support better
compatibility with a non-compliant hardware vp9 profile 2 decoder.

The known issue with this decoder is:
The decoder assumes a wrong value, 127 instead of the correct
value of 511 and 2047, for any assumed top-left corner pixel in
UV planes for 10 and 12 bit, respectively. Such assumed
top-left corner pixel is used for INTRA prediction when a real
decoded/reconstructed pixel is not avalable, e.g. when it is
located inside the row above the top row or inside the column
left to the leftest column of a video image.

Change-Id: Ic15a938a3107e1b85e96cb7903a5c4220986b99d
2016-06-21 10:33:57 -07:00
hui su
72d4890caf Add vp9 encoder API VP9E_GET_LEVEL to provide bitstream level
Change-Id: I1ef3df0192491035728fe9d5eb25cc66dc2965de
2016-06-15 12:53:28 -07:00
James Zern
97b4f8fe92 Merge "Revert "remove vp9_diamond_search_sad_avx.c"" 2016-06-08 02:56:00 +00:00
Scott LaVarnway
eb09bbe88b Revert "remove vp9_diamond_search_sad_avx.c"
This reverts commit be12fefa4b
and commit 057c1c4034.

Also, the mismatch between the avx version and the
c version has been fixed.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168

For a rt encode using 1080p@60fps material, up to 11% performance
improvement overall was seen.

Change-Id: Icd1f216209ebc6fc0b8da885f32f356fa4355ed0
2016-06-07 17:21:01 -07:00
Linfeng Zhang
304d310975 Fix Visual Studio build failure in filter_selectively_vert_row2() calls
Error messages:
 ..\vp9\common\vp9_loopfilter.c(1312): warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
[.build-x86_64-win64-vs10\vpx.vcxproj]

..\vp9\common\vp9_loopfilter.c(1313): warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
[.build-x86_64-win64-vs10\vpx.vcxproj]

..\vp9\common\vp9_loopfilter.c(1312): error C2220: warning treated as
error - no 'object' file generated
[.build-x86_64-win64-vs10\vpx.vcxproj]

Change-Id: Ia69260611997cd2ba41c7184a85ecead740a7c07
2016-06-03 09:36:58 -07:00
Linfeng Zhang
b26232eb1b Update filter_selectively_vert_row2()
Reduce operations and jumps. perf shows CPU time reduced from 1.9% to
1.6% when decoding fdJc1_IBKJA.248.webm on Xeon E5.
Will apply the changes to vp10 after code review.

Change-Id: I9351509922855d8896ddef1ed093b3ca12619a61
2016-06-01 11:20:47 -07:00
Linfeng Zhang
2ab7b9a6c9 Merge "Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10." 2016-05-27 17:51:35 +00:00
Linfeng Zhang
af7fb17c09 Upgrade fwht4x4_mmx() to fwht4x4_sse2() for vp9 and vp10.
Function level timing test shows about 27% time saving on
a Xeon E5-2680 v2 desktop.

Rename vp9_dct_sse2.c to vp9_dct_intrin_sse2.c for vp9 and
rename dct_sse2.c to dct_intrin_sse2.c for vp10 to avoid
duplicate basenames.

Actually vp9_fwht4x4_mmx/sse2() and vp10_fwht4x4_mmx/sse2()
are identical. TODO: They should be unified later if there is
no intention to keep a duplicate.

Change-Id: I3e537b7bbd9ba417c606cd7c68c4dbbfa583f77d
2016-05-27 09:51:16 -07:00
Yaowu Xu
ba8651d474 Fix comments in build_intra_predictors_high()
1. Removed TODOs, no longer applicable to finalized vp9 profiles.
2. Added explanation on assumed values for highbitdepth profiles.

Change-Id: I59e0bebaaab900cc611ed284daa5fa0bdedb8097
2016-05-25 12:18:35 -07:00
Alex Converse
284be1c9e0 Merge "Move, rename, and inline high_inter_predictor." 2016-05-18 21:20:04 +00:00
Alex Converse
a5191f3e60 Move, rename, and inline high_inter_predictor.
The inlining mirrors what was done with the low bit depth
inter_predictor. And the new highbd_inter_predictor name is more
consistent with other high bit depth functions.

Change-Id: I96437f745759aeec6260c6e39a974bf36f1c211c
2016-05-18 09:39:49 -07:00
Scott LaVarnway
3036fd761a VP9: _get_pred_context_switchable_interp()
Remove unnecessary checks.

Change-Id: Ic7bce8277ac5f4ae88d4ab7d0ae3ab110b2f225b
2016-05-17 15:26:12 -07:00
hui su
be3f0698b0 Add VP9 encoder API for level specification.
Add control API VP9E_SET_TARGET_LEVEL that allows the encoder to
control the output bitstream level and/or keep level related
statistics.

Usage:
               255         do not care about level (default)
               0           keep level related stats only
               10          target for level 1
               11          target for level 1.1
               .
               .
               .
               62          target for level 6.2

Usage for vpxenc:

--target-level=0/255/10/11...

Change-Id: I31d1aeca19358b893e7577b4e63748c8e614034a
2016-05-10 11:48:16 -07:00
James Zern
70c149db7f vp9_idct_intrin_sse2: add missing vp9_rtcd.h include
Change-Id: I39a67ffea7b0a55b45cdf935986439537b65601f
2016-05-04 15:07:27 -07:00
Jim Bankoski
fce3cee8dd Move vpx_add_plane from codec to vpx_dsp and dedup.
Change-Id: I12218d8331c0558c0587a66321e3ca46da7e5cc7
2016-05-02 12:17:39 -07:00
Paul Wilkins
7a4c2c7671 Merge "Experiment to adapt for net AQ offset." 2016-04-26 12:56:52 +00:00
paulwilkins
4b590058c8 Experiment to adapt for net AQ offset.
In Aq mode 1 the segment and AQ delta for each block is based
on spatial variance. There may be a net imbalance between blocks
that have lower Q than the baseline value and those that have higher Q.

This patch monitors that imbalance and extends the allowed baseline
Q range for the frame to accommodate adjustment of that baseline value
to compensate.

Change-Id: Iae8a48c7c01fe2af94a141e149d03acf467237ca
2016-04-25 12:07:07 +01:00
Scott LaVarnway
9e0efb6008 VP9: Do not call vp9_adjust_mask() in vp9_setup_mask()
vp9_adjust_mask() is called again in loop_filter_rows().

Change-Id: If52f5339dfa7971c47b12f9e05f87951044d9391
2016-04-22 13:53:32 -07:00
Jim Bankoski
996ccc3311 vp9_loopfilter.c : fix / clarify todo
Change-Id: Ie3ec67a83d1877d3deae9c7922b6899d915aa19e
2016-04-21 20:39:56 +00:00
Jim Bankoski
6bd28a2d05 vp9_loopfilter.c: Todo clean up encoder should work like decoder.
Change-Id: I570c6859d6e18cd94ce4a29068477b937489399c
2016-04-21 20:36:39 +00:00
Jim Bankoski
df4c95afce vp9_loopfilter.c : todo cleanup
Removed this todo because of another todo which says none of this code
should exist. It should be integrated into the block by block encode
process as per the decoder.

Change-Id: I076bd15140a060e69c014dd7d7cd07fea260aba3
2016-04-21 20:35:30 +00:00
Jim Bankoski
b627af0eb0 vp9_loopfilter.c: Todo try inloop calculation.
This is implemented in the decoder already.  Will add a todo for the
encoder.

Change-Id: I5e78c045cb2edb5ba171022aeeb70051a708b916
2016-04-20 23:53:20 +00:00
Scott LaVarnway
ef98a8f61f VP9: inline vp9_get_intra_inter_context()
Change-Id: I71366140799b9b39474b9b459082cdb250bd1905
2016-04-15 04:58:37 -07:00
hui su
69c7ad3407 Correct comments for scan order neighbors
Change-Id: I5e2dc39bf0ee8e501e4dd358be2e92ae50934593
2016-04-07 11:07:21 -07:00
Scott LaVarnway
a2a97b869f VP9: Refactor vp9_decode_block_tokens()
Change-Id: I30ab27808ec903f9490f36621fb16c197bd35d16
2016-04-01 04:57:39 -07:00
James Zern
74ed95a33e Merge "disable vp9_diamond_search_sad_avx" 2016-04-01 03:33:51 +00:00
James Zern
057c1c4034 disable vp9_diamond_search_sad_avx
this results in different output than C, observed with 1080p input at
speed 2.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1168

Change-Id: Ie58cf20057f4531d1b1d19c7b7eae9e642587ce5
2016-03-30 17:31:28 -07:00
Scott LaVarnway
ba962a5f37 VP9: Eliminate up_available and left_available
Use above_mi and left_mi instead.

Change-Id: I0b50e232c31d11da30aa2fb6f91a695aaf725e0c
2016-03-30 04:47:39 -07:00
James Zern
4950dbceaf Merge changes from topic 'rm-loopfilter-count-param'
* changes:
  lpf_8_test: remove unneeded function wrapper
  remove loopfilter 'count' param TODOs
  split vpx_highbd_lpf_horizontal_16 in two
  split vpx_lpf_horizontal_16 in two
  vpx_highbd_lpf_horizontal_4: remove unused count param
  vpx_highbd_lpf_horizontal_8: remove unused count param
  vpx_highbd_lpf_vertical_4: remove unused count param
  vpx_highbd_lpf_vertical_8: remove unused count param
  vpx_lpf_horizontal_4: remove unused count param
  vpx_lpf_horizontal_8: remove unused count param
  vpx_lpf_vertical_4: remove unused count param
  vpx_lpf_vertical_8: remove unused count param
  lpf_8_test: add missing dspr2 tests
  lpf_8_test: add missing vpx_lpf_horizontal_4 tests
  lpf_8_test: add missing vpx_lpf_vertical_4 tests
  lpf_8_test: simplify function wrapper generation
2016-02-18 18:47:48 +00:00
Alex Converse
09f9c5d7f9 Better workaround for Bug 1089.
Don't initialize first pass costs for a number of symbols where first
pass probabilities aren't initialized.

This brings a 1.22x first pass speedup.

https://bugs.chromium.org/p/webm/issues/detail?id=1089

Change-Id: I97438c357bd88f52f5a15c697031cf0c3cc8f510
2016-02-17 14:46:26 -08:00
James Zern
110d377899 remove loopfilter 'count' param TODOs
Change-Id: I25ce7314372ce2f521526ea7864ffc4ab62e4519
2016-02-16 23:14:03 -08:00
James Zern
9b44d9d00f split vpx_highbd_lpf_horizontal_16 in two
replace with vpx_highbd_lpf_horizontal_edge_16 and
vpx_highbd_lpf_horizontal_edge_8 to avoid passing a count parameter

Change-Id: I551f8cec0fce57032cb2652584bb802e2248644d
2016-02-16 23:13:58 -08:00
James Zern
1b519fb666 split vpx_lpf_horizontal_16 in two
replace with vpx_lpf_horizontal_edge_16 and vpx_lpf_horizontal_edge_8 to
avoid passing a count parameter

Change-Id: I848c95c02a3c6ebaa6c2bdf0983dce05cd645271
2016-02-16 22:57:45 -08:00
James Zern
e7a23d703b vpx_highbd_lpf_horizontal_4: remove unused count param
Change-Id: I655a771e1b1a8753be5669ef9348a312ba6cfdbc
2016-02-16 22:57:45 -08:00
James Zern
5171857329 vpx_highbd_lpf_horizontal_8: remove unused count param
Change-Id: Iaca71ea3796115d4c2d43563b4e6f3914e21f1bf
2016-02-16 22:57:44 -08:00
James Zern
3c1019e49d vpx_highbd_lpf_vertical_4: remove unused count param
Change-Id: Ic6da723c5cf3cd8127db1f476c3e46ea134cb774
2016-02-16 22:57:44 -08:00
James Zern
72a9f06ac2 vpx_highbd_lpf_vertical_8: remove unused count param
Change-Id: Id16f7259897654831d31642c2d5e0bbe5e13416c
2016-02-16 22:57:44 -08:00
James Zern
b1e97c6a25 vpx_lpf_horizontal_4: remove unused count param
Change-Id: Iec7d8eda343991f7d7d46931dca17af23c821d11
2016-02-16 22:57:27 -08:00
James Zern
bd5a5bb561 vpx_lpf_horizontal_8: remove unused count param
Change-Id: I48741e167a7b09b7c9ad3bfc1c4b88ef1029ae46
2016-02-16 22:54:40 -08:00
James Zern
109a47b342 vpx_lpf_vertical_4: remove unused count param
Change-Id: I43a191cb3d42e51e7bca266adfa11c6239a8064c
2016-02-16 14:59:00 -08:00
James Zern
37225744db vpx_lpf_vertical_8: remove unused count param
Change-Id: Ic69406da00afb0f06588e8c0deb2b043952b078c
2016-02-16 14:59:00 -08:00
Marco
3cbc26f31b vp9-resize: Fix an issue with external dynamic resize.
External dynamic resize with swapping width and height was
not handled properly.
Fix is to re-init loop-filter under certain condtions.

Modify unittest to test this case.
Without this change test will fail.

Relates to: https://bugs.chromium.org/p/webm/issues/detail?id=1140

Change-Id: I7d81ca7fe0783b3bc103a52a7b7cf073a96be26e
2016-02-12 15:06:48 -08:00
James Zern
ecd32d6faa Merge "Vidyo patch: Optimization for 1-to-2 downsampling and upsampling." 2016-02-05 02:36:03 +00:00
Scott LaVarnway
989c69303d Vidyo patch: Optimization for 1-to-2 downsampling and upsampling.
Change-Id: I9cc9780f506e025aea57485a9e21f0835faf173c
2016-02-04 14:50:26 -08:00
Paul Wilkins
e062eb16fb Merge "Loop filter search resets on overlay frame." 2016-02-02 14:44:47 +00:00
hui su
5afc4e4c77 Fix some typos.
Change-Id: I32aacd014df6c927cf2893dc096cbe6ec7604b9b
2016-01-27 16:12:49 -08:00
Scott LaVarnway
5232326716 VP9: Eliminate MB_MODE_INFO
Change-Id: Ifa607dd2bb366ce09fa16dfcad3cc45a2440c185
2016-01-19 16:40:20 -08:00
paulwilkins
733bbab53a Loop filter search resets on overlay frame.
This patch fixes a bug that causes the loop filter search to reset to
a low value or zero after each arf overlay frame. We expect the overlay
frames to need little or no loop filtering but this should not propagate.

Change-Id: I895b28474cf200f20d82793f3de40b60b19579fd
2016-01-19 13:05:15 +00:00
Scott LaVarnway
d4bc17d696 Merge "VP9: inline vp9_use_mv_hp()" 2016-01-14 13:36:40 +00:00
Scott LaVarnway
a85e552d95 VP9: Remove decoder args from find_mv_refs_idx()
The decoder does not use this function.

Change-Id: Ie67f909c0f4108ef286789c70df867d4b960a780
2016-01-13 13:30:40 -08:00
Scott LaVarnway
de993a847f VP9: inline vp9_use_mv_hp()
Change-Id: Ib275bfc4c29c572d6c70e5ec6dbfc241590d3e3e
2016-01-13 08:02:05 -08:00
Scott LaVarnway
15939cb2d7 Merge "VP9: Eliminate unnecessary nearest/near searches" 2016-01-12 20:00:59 +00:00
Scott LaVarnway
d8aa40634a VP9: Eliminate unnecessary nearest/near searches
Prior to this patch, read_inter_block_mode_info() would
find the nearmv and nearestmv for all modes.  Now it does not
search for ZEROMV modes and breaks out early for NEARMV and
NEWMV modes.

Change-Id: Ifa7b1eaf58bb03b9c7792ea5012fef477527d0fd
2016-01-12 05:09:06 -08:00
Yaowu Xu
2bd4f44409 Assert no mv clamping for scaled references
Under --enable-better-hw-compabibility, this commit adds the asserts
that no mv clamping is applied for scaled references, so when built
with this configure option, decoder will assert if an input bitstream
triggger mv clamping for scaled reference frames.

Change-Id: I786e86a2bbbfb5bc2d2b706a31b0ffa8fe2eb0cb
2016-01-05 14:55:05 -08:00
Yaowu Xu
ce6d3f1de4 Merge "Assert no 8x4/4x8 partition for scaled references" 2016-01-05 20:35:46 +00:00
Yaowu Xu
03a021a6fc Assert no 8x4/4x8 partition for scaled references
This commit adds a new configure option:

--enable-better-hw-compatibility

The purpose of the configure option is to provide information on known
hardware decoder implementation bugs, so encoder implementers may
choose to implement their encoders in a way to avoid triggering these
decoder bugs.

The WebM team were made aware of that a number of hardware decoders
have trouble in handling the combination of scaled frame reference
frame and 8x4 or 4x8 partitions. This commit added asserts to vp9
decoder, so when built with above configure option, the decoder can
assert if an input bitstream triggers such decoder bug.

Change-Id: I386204cfa80ed16b50ebde57f886121ed76200bf
2016-01-04 18:33:37 -08:00
James Zern
d36659cec7 move vp9_avg to vpx_dsp
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
2015-12-14 14:42:12 -08:00
Jacky Chen
d9bba21306 Merge "Add vp9_avg_4x4_neon and the unit test." 2015-12-09 06:09:33 +00:00
jackychen
303f144eef Add vp9_avg_4x4_neon and the unit test.
Change-Id: I3ef9a9648841374ed3cc865a02053c14ad821a20
2015-12-08 17:23:36 -08:00
Scott LaVarnway
f0b0b1fe62 VP9: Add ssse3 version of vpx_idct32x32_135_add()
Change-Id: I9a780131efaad28cf1ad233ae64c5c319a329727
2015-12-02 04:50:46 -08:00
James Zern
fd51d90159 Merge changes Iaf8cbe95,I6748183d,I2a49811d
* changes:
  add vp9_satd_neon
  fix vp9_satd_sse2
  vp9_satd: return an int
2015-11-25 01:48:53 +00:00
James Zern
eb1d0f8d60 add vp9_satd_neon
~60-65% faster at the function level across block sizes

Change-Id: Iaf8cbe95731c43fdcbf68256e44284ba51a93893
2015-11-24 16:09:10 -08:00
Alex Converse
4b038ad2ef Merge "Deduplicate some high bit depth tables" 2015-11-24 18:24:32 +00:00
James Zern
60760f710f fix vp9_satd_sse2
accumulate satd in 32-bits
+ add unit test

Change-Id: I6748183df3662ddb9d635f9641f9586f2fd38ad5
2015-11-20 14:35:46 -08:00
James Zern
3e0138edb7 vp9_satd: return an int
the final sum may use up to 26 bits

+ add a unit test
+ disable the sse2 as the result will rollover; this will be fixed in a
future commit

Change-Id: I2a49811dfaa06abfd9fa1e1e65ed7cd68e4c97ce
2015-11-20 14:35:38 -08:00
paulwilkins
0149fb3d6b Changes to exhaustive motion search.
This change alters the nature and use of exhaustive motion search.

Firstly any exhaustive search is preceded by a normal step search.
The exhaustive search is only carried out if the distortion resulting
from the step search is above a threshold value.

Secondly the simple +/- 64 exhaustive search is replaced by a
multi stage mesh based search where each stage has a range
and step/interval size. Subsequent stages use the best position from
the previous stage as the center of the search but use a reduced range
and interval size.

For example:
  stage 1: Range +/- 64 interval 4
  stage 2: Range +/- 32 interval 2
  stage 3: Range +/- 15 interval 1

This process, especially when it follows on from a normal step
search, has shown itself to be almost as effective as a full range
exhaustive search with step 1 but greatly lowers the computational
complexity such that it can be used in some cases for speeds 0-2.

This patch also removes a double exhaustive search for sub 8x8 blocks
which also contained  a bug (the two searches used different distortion
metrics).

For best quality in my test animation sequence this patch has almost
no impact on quality but improves encode speed by more than 5X.

Restricted use in good quality speeds 0-2 yields significant quality gains
on the animation test of 0.2 - 0.5 db with only a small impact on encode
speed. On most clips though the quality gain and speed impact are small.

Change-Id: Id22967a840e996e1db273f6ac4ff03f4f52d49aa
2015-11-13 10:16:31 +00:00
Geza Lore
5eefd3ebfd Add AVX vectorized vp9_diamond_search_sad
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
  - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
  - For all i: mvsadcost[0][i] == mvsadcost[1][i]
        (equal per component cost)
  - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
        (Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.

Change-Id: I6c2791d43022822a9e6ab43cd124a773946d0bdc
2015-11-11 14:03:47 +00:00
James Zern
30466f26b4 Revert "Add AVX vectorized vp9_diamond_search_sad"
This reverts commit f1342a7b07.

This breaks 32-bit builds:
 runtime error: load of misaligned address 0xf72fdd48 for type 'const
__m128i' (vector of 2 'long long' values), which requires 16 byte
alignment

+ _mm_set1_epi64x is incompatible with some versions of visual studio

Change-Id: I6f6fc3c11403344cef78d1c432cdc9147e5c1673
2015-11-06 13:15:01 -08:00
Yunqing Wang
57cae22c1e Merge "Add AVX vectorized vp9_diamond_search_sad" 2015-11-05 20:17:13 +00:00
Geza Lore
f1342a7b07 Add AVX vectorized vp9_diamond_search_sad
This function now has an AVX intrinsics version which is about 80%
faster compared to the C implementation. This provides a 2-4% total
speed-up for encode, depending on encoding parameters. The function
utilizes 3 properties of the cost function lookup table, constructed
in 'cal_nmvjointsadcost' and 'cal_nmvsadcosts'.
For the joint cost:
  - mvjointsadcost[1] == mvjointsadcost[2] == mvjointsadcost[3]
For the component costs:
  - For all i: mvsadcost[0][i] == mvsadcost[1][i]
        (equal per component cost)
  - For all i: mvsadcost[0][i] == mvsadcost[0][-i]
        (Cost function is even)
These must hold, otherwise the AVX version of the function cannot be used.

Change-Id: I184055b864c5a2dc37b2d8c5c9012eb801e9daf6
2015-11-05 10:02:17 +00:00
Alex Converse
246e0eaa71 Deduplicate some high bit depth tables
Change-Id: I6977f7d155cc1e81ae2393933893caac6770821f
2015-11-03 15:40:44 -08:00
hui su
e085fb643f Generate intra prediction reference values only when necessary
This can help increase encoding speed substantially.

Change-Id: Id0c009146e6e74d9365add71c7b10b9a57a84676
2015-11-02 10:26:50 -08:00
Alex Converse
989193c797 Make the zero handling in extend_to_full_distribution more explicit.
The old workaround "p = 0 ? 0 : p -1" is misleading.

?: happens before =
assigning back to p truncates to one byte.

Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists
to work around a first pass bug, so let's make the work around more
clear.

https://bugs.chromium.org/p/webm/issues/detail?id=1089

Change-Id: I587c44dd61c1f3767543c0126376f881889935af
2015-10-29 14:46:55 -07:00
Alex Converse
663960e757 Revert "Replace the zero handling in extend_to_full_distribution."
This reverts commit 7f56cb2978.

It causes uninitialized reads in the first pass setting up later cost tables.

Change-Id: I2df498df3f5c03eff359f79edf045aed0c618dc9
2015-10-28 11:51:40 -07:00
Alex Converse
7f56cb2978 Replace the zero handling in extend_to_full_distribution.
The old workaround "p = 0 ? 0 : p -1" is misleading.

?: happens before =
assigning back to p truncates to one byte.

Therefore it is equivalent to (p - 1) & 0xFF, but the check just exists
to work around a first pass bug, so let's make the work around more
clear.

https://code.google.com/p/webm/issues/detail?id=1089

Change-Id: Ia6dcc8922e1acbac0eeca23a4d564a355c489572
2015-10-26 11:29:46 -07:00
Geza Lore
aa8f85223b Optimize vp9_highbd_block_error_8bit assembly.
A new version of vp9_highbd_error_8bit is now available which is
optimized with AVX assembly. AVX itself does not buy us too much, but
the non-destructive 3 operand format encoding of the 128bit SSEn integer
instructions helps to eliminate move instructions. The Sandy Bridge
micro-architecture cannot eliminate move instructions in the processor
front end, so AVX will help on these machines.

Further 2 optimizations are applied:

1. The common case of computing block error on 4x4 blocks is optimized
as a special case.
2. All arithmetic is speculatively done on 32 bits only. At the end of
the loop, the code detects if overflow might have happened and if so,
the whole computation is re-executed using higher precision arithmetic.
This case however is extremely rare in real use, so we can achieve a
large net gain here.

The optimizations rely on the fact that the coefficients are in the
range [-(2^15-1), 2^15-1], and that the quantized coefficients always
have the same sign as the input coefficients (in the worst case they are
0). These are the same assumptions that the old SSE2 assembly code for
the non high bitdepth configuration relied on. The unit tests have been
updated to take this constraint into consideration when generating test
input data.

Change-Id: I57d9888a74715e7145a5d9987d67891ef68f39b7
2015-10-21 12:30:40 +01:00
Yaowu Xu
568429512e Add a new enum type vpx_color_range_t
to make meaning of color_range obvious.

Change-Id: I303582e448b82b3203b497e27b22601cc718dfff
2015-10-16 16:27:18 -07:00
Geza Lore
0134764fa6 Optimization of 8bit block error for high bitdepth
If high bit depth configuration is enabled, but encoding in profile 0,
the code now falls back on optimized SSE2 assembler to compute the
block errors, similar to when high bit depth is not enabled.

Change-Id: I471d1494e541de61a4008f852dbc0d548856484f
2015-10-08 14:05:25 -07:00
Alex Converse
2f7f482c77 vp9: simplify extrabits encoding
Change-Id: I5a2abd35cb303d8f6354b3119ab95acf90405116
2015-10-06 16:26:08 -07:00
Debargha Mukherjee
cb5c47f20d Merge "Accelerated transform in high bit depth" 2015-10-02 06:55:55 +00:00
hui su
06bdc7f6db Small cleanup
Change-Id: I5aeaa94b743f84738d288f8b027fec4c164f2ec3
2015-10-01 11:19:13 -07:00
Scott LaVarnway
2f8625d824 VP9: remove plane_type from macroblockd_plane
Change-Id: Ia5072a3a92212d8565f33359f6c146469bdfbbec
2015-09-30 15:15:11 -07:00
Scott LaVarnway
13888e0eef Merge "VP9: remove plane_type checks in loopfilter functions" 2015-09-30 22:11:21 +00:00
James Zern
a18cc591a5 vp9_loopfilter: remove unnecessary masks
Change-Id: I264e75bf3ddd083ee5311c50a37fb18fe634ddc3
2015-09-30 12:12:53 -07:00
James Zern
a1914dbb31 vp9_reset_lfm: harmonize function signature
Change-Id: Ifb0f41fb43564a777be29b4c66443b366fa146a3
2015-09-29 20:46:37 -07:00
Scott LaVarnway
18373264d9 VP9: remove plane_type checks in loopfilter functions
vp9_filter_block_plane_ss11() and vp9_filter_block_plane_non420()
are only called for the uv planes.

Change-Id: Iacd3b3242c8ce581edd37c8f06d95efc8a0f88a3
2015-09-29 15:54:33 -07:00
Scott LaVarnway
66de2b710f Merge "VP9: move loopfilter build masks to decode loop" 2015-09-29 21:40:48 +00:00
Yaowu Xu
45948a03c0 Fix a macro definition
to be consistent with the head file name.

Change-Id: I9634332a2b3fac7e7f3b7ef58821ea7c81c5c813
2015-09-29 09:34:42 -07:00
Scott LaVarnway
7718117104 VP9: move loopfilter build masks to decode loop
The loopfilter masks are now built in the decode loop.
This is done so we can eventually reduce the number of
MODE_INFO structs required by the decoder.

The encoder builds the masks for the entire frame prior
to calling the loopfilter.

Change-Id: Ia2146b07e0acb8c50203e586dfae0c4c5b316f11
2015-09-29 05:20:49 -07:00
Julia Robson
406030d1b0 Accelerated transform in high bit depth
When configured with high bitdepth enabled, the 8bit transform
stopped using optimised code. This made 8bit content decode slowly.

Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
2015-09-28 21:09:16 -07:00
Ronald S. Bultje
36ffe64498 Rename display_{size,width,height} to render_*.
The name "display_*" (or "d_*") is used for non-compatible information
(that is, the cropped frame dimensions in pixels, as opposed to the
intended screen rendering surface size). Therefore, continuing to use
display_* would be confusing to end users. Instead, rename the field
to render_*, so that struct vpx_image can include it.

Change-Id: Iab8d2eae96492b71c4ea60c4bce8121cb2a1fe2d
2015-09-25 21:34:29 -04:00
Scott LaVarnway
5404978825 VP9: Remove frame_parallel_decoding_mode from macroblockd
Not used.

Change-Id: I71527d0ee43a5730f1a2527e7ab687a77a137db4
2015-09-23 16:06:46 -07:00
James Zern
9d8decc162 Merge changes from topic 'tile-thread-cleanup'
* changes:
  vp9/decode_tiles_mt: move frame count accum from loop
  VP9Decoder: remove duplicate tile_worker_info
  vp9/decode_tiles_mt: move some inits from inner loop
  vp9_accumulate_frame_counts: pass counts directly
2015-09-17 22:00:23 +00:00
Ronald S. Bultje
eeb5ef0a24 Add support for color-range.
In decoder, export (eventually) into vpx_image_t.range field. In
encoder, use oxcf->color_range to set it (same way as for
color_space).

See issue 1059.

Change-Id: Ieabbb2a785fa58cc4044bd54eee66f328f3906ce
2015-09-16 06:41:46 -04:00
James Zern
b09aa3ac54 vp9: add extern "C" to headers
Change-Id: I1b6927ad820f99340985b094d415aaab14defaf4
2015-09-09 23:15:59 -07:00
Jingning Han
42b0560319 Fix the sub8x8 block inter prediction with scaled reference frame
Sync the encoder's buffer offset calculation for sub8x8 block motion
compensated prediction with scaled reference frame to match the
decoder's behavior. This resolves an enc/dec mismatch issue when
sub8x8 inter mode with scaled is turned on.

Change-Id: I4bab3672b007a5ae0c992f8a701341892d2458b0
2015-09-08 11:09:30 -07:00
James Zern
0548046ae3 vp9_accumulate_frame_counts: pass counts directly
Change-Id: Ic3c6cfba5b1867c335f2834da936e20caec8597a
2015-09-04 19:47:33 -07:00
Johann
c5f11912ae Include vpx_dsp_common.h when using VPXMIN/MAX
Change-Id: I2e387a06484a06301f3cd6600c4ba2f4335b61ee
2015-08-31 14:36:35 -07:00
James Zern
5e16d397bd vpx_dsp_common: add VPX prefix to MIN/MAX
prevents redeclaration warnings;
vp8 has its own define which will be resolved in a future commit

Change-Id: Ic941fef3dd4262fcdce48b73075fe6b375f11c9c
2015-08-26 20:11:32 -07:00
Jingning Han
89af744ba6 Change vp9_ prefix function names in vpx_scale to vpx_
Change-Id: Iac85902cbbb3e752801dc85de9a3c778e47304aa
2015-08-14 15:27:43 -07:00
hui su
088b05fd99 Use sizeof(variable) instead of sizeof(type)
Change-Id: Ia069da11eebb271063e9eb837bdb3e7175ecce13
2015-08-12 11:25:38 -07:00
Scott LaVarnway
4ef08dcec8 Merge "VPX: Add rtcd support for scaling." 2015-08-11 13:19:00 +00:00
Alex Converse
a8a08ce57e Move vp9_systemdependent.h to vpx_ports bitops.h and system_state.h
Use system_state.h in vpx_dsp and remove unneeded includes of
vp9_systemdependent.h.

Change-Id: I92557ec6dd5aa790160b4f31fe7967db0d7ec3c4
2015-08-10 15:37:14 -07:00
Jingning Han
244912d506 Make build_inter_predictors static function
Remove the function declaration from vp9_reconinter.h file.

Change-Id: I193562151b69ece19b9ee2efa1a791fe2522cca0
2015-08-10 15:51:13 +00:00
Alex Converse
f2e44aa664 Move the msvc round() replacement to msvc.h
Change-Id: If470411c3c62a27f52261f4ece2c5054b71789c7
2015-08-07 18:27:48 -07:00
Alex Converse
610e258cc5 Make the round() replacement match C99 and POSIX.
http://pubs.opengroup.org/onlinepubs/009695399/functions/round.html

Change-Id: Idf387d944d36bf593f8797db9053e11e5c9b9b39
2015-08-07 18:24:21 -07:00
Jingning Han
a9aa29d901 Merge "Add static syntax to copy_mem64x64" 2015-08-07 21:41:32 +00:00
Jingning Han
1057ee4847 Add static syntax to copy_mem64x64
Change-Id: Iee4c853ea4a44ae9f5de60c09e5a7b810f15d2dd
2015-08-07 10:16:27 -07:00
Aℓex Converse
eaa8043a31 Merge "Move VP9 SSIM metrics to vpx_dsp." 2015-08-07 16:43:28 +00:00
Alex Converse
c7b7011b9b Move VP9 SSIM metrics to vpx_dsp.
Change-Id: I20c7b42631b579fade6cf7ebf6d4c69b2fcb5e5e
2015-08-06 18:25:25 -07:00
Jingning Han
b4f2c567c8 Cosmetic - align format in vp9
Change-Id: I83ed3422f1f4009675ad2f5c4b7236bc7b83b30e
2015-08-06 15:56:11 -07:00
Jingning Han
3ad75fc623 Merge "Replace vp9_ prefix with vpx_ prefix in vpx_dsp function names" 2015-08-04 22:30:36 +00:00
Jingning Han
08a453b9de Replace vp9_ prefix with vpx_ prefix in vpx_dsp function names
This commit clears the function naming convention in vpx_dsp. It
replaces vp9_ prefix of global functions with vpx_ prefix. It also
removes the vp9_ prefix from static functions.

Change-Id: I6394359a63b71a51dda01342eec6a3cc08dfeedf
2015-08-04 13:46:11 -07:00
Jingning Han
457a87d986 Merge "Move inverse transfrom dspr2 functions from vp9 to vpx_dsp" 2015-08-04 04:16:22 +00:00
James Zern
a0fd7a9831 Merge "add vp9_vector_var_neon" 2015-08-04 02:30:41 +00:00
Jingning Han
bfad9d2fe6 Move inverse transfrom dspr2 functions from vp9 to vpx_dsp
Change-Id: Ia9cf7c31cab4ba3dd6b9bb668c4b3e84bd55cf69
2015-08-03 11:59:50 -07:00
Jingning Han
92b08f516a Add common_dspr2.c file to vpx_dsp/mips
Move the declaration of commonly referenced variable to
vpx_dsp/mips/common_dspr2.c.

Change-Id: Ia51287b02e2ac5cfae0fba98c721f0810618f28e
2015-08-03 10:53:47 -07:00
Scott LaVarnway
8f6b943100 VPX: Add rtcd support for scaling.
Change-Id: If34bfb0d918967445aea7dc30cd7b55ebfedb1f2
2015-08-03 09:43:34 -07:00
Jingning Han
0b0eba728d Add _dspr2 to local function names
It avoids symbol conflicts between function names of various
implementation versions.

Change-Id: Iad79ebcb8e289457801812a7745c8380b5b06a46
2015-08-02 20:21:59 -07:00
Jingning Han
44849516d4 Factor out mips/msa inverse transform implementations
Move mips/msa inverse transform implementations from vp9 folder to
vpx_dsp.

Change-Id: Ic4cf3f05247c3c63db7b532a0e5000017a962391
2015-08-01 09:25:12 -07:00
Jingning Han
b37494cfb5 Merge "Use precise header files in inverse transform msa implementations" 2015-08-01 16:20:43 +00:00
James Zern
7dc5a689b4 add vp9_vector_var_neon
~50-60% faster depending on the width

Change-Id: I9d007cfa10b9aaa2169c8c009d95522df6123a92
2015-07-31 17:31:58 -07:00
Jingning Han
56c2cb7553 Use precise header files in inverse transform msa implementations
Change-Id: Ie8a79d9e2837842c3f60776b661cd42782b108d5
2015-07-31 23:24:54 +00:00
Jingning Han
e8b133c79c Factor inverse transform functions into vpx_dsp
This commit moves the module inverse transform functions from vp9
to vpx_dsp folder. The hybrid transform wrapper functions stay in
the vp9 folder, since it involves codec-specific data structures.

Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
2015-07-31 16:21:00 -07:00
Zoe Liu
7cfdc00337 Refactor mips/dspr2 on convolution.
Change-Id: If59a39d5a92c261537342726f94bb7f7f26dfff3
2015-07-31 10:27:42 -07:00
Zoe Liu
7186a2dd86 Code refactor on InterpKernel
It in essence refactors the code for both the interpolation
filtering and the convolution. This change includes the moving
of all the files as well as the changing of the code from vp9_
prefix to vpx_ prefix accordingly, for underneath architectures:
(1) x86;
(2) arm/neon; and
(3) mips/msa.
The work on mips/drsp2 will be done in a separate change list.

Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
2015-07-31 10:27:33 -07:00
James Zern
f42012e526 Merge "add vp9_block_error_fp_neon" 2015-07-29 00:47:09 +00:00
Hui Su
4cbf36b105 Merge "Replace prefix vp9_ with vpx_ for intra prediction functions" 2015-07-29 00:38:48 +00:00
Jingning Han
fc18cf7a11 Merge "Move DC only forward 2D-DCT functions to vpx_dsp" 2015-07-29 00:06:37 +00:00
Aℓex Converse
08d5cf226e Merge "Remove branch in inner loop of foreach_transformed_block_in_plane()" 2015-07-28 21:59:33 +00:00
Jingning Han
d19033fa4e Move DC only forward 2D-DCT functions to vpx_dsp
This completes the forward transform functions layout refactoring.

Change-Id: I996fb0fb795f41e2040f7b21db985774098aedbd
2015-07-28 14:52:30 -07:00