Commit Graph

486 Commits

Author SHA1 Message Date
Johann Koenig
ad55b1d270 Merge changes Ia3e9122f,Id33eb6c8,I956bd8ce
* changes:
  Remove vp8_clear_system_state
  vpx_dsp: clean up rtcd
  vp8: clean up rtcd
2016-09-29 23:16:45 +00:00
Linfeng Zhang
b3cb065ee4 Refine vpx convolve8 NEON intrinsics optimization
BUG=webm:1290

Change-Id: I5d7fce62270f9d76ef9ce98b3d188ad11fb21873
2016-09-29 12:48:59 -07:00
Johann
7b5a348088 vpx_dsp: clean up rtcd
Remove avx2+ssse3 specialization. Disabling ssse3 now automatically
disables avx2.

Change-Id: Id33eb6c85d1c4ee57128ebe45c995eb15cfcc765
2016-09-29 12:10:07 -07:00
James Zern
93c823e24b vpx_dsp/get_prob: relocate den == 0 test
to get_binary_prob(). the only other caller mode_mv_merge_probs() does
its own test on 0.

BUG=chromium:639712

Change-Id: I1178688706baeca2883f7aadbc254abb219a44ce
2016-09-28 17:42:49 -07:00
James Zern
7481edb33f vpx_dsp/get_prob: make clip_prob branchless
+ inline the function directly as there was only one consumer
(get_prob())

this is an attempt to reduce the amount of branches to workaround an amd
bug. this change is mildly faster or neutral across x86-64, arm.

http://support.amd.com/TechDocs/44739_12h_Rev_Gd.pdf
665 Integer Divide Instruction May Cause Unpredictable Behavior

BUG=chromium:639712

Suggested-by: Pascal Massimino <pascal.massimino@gmail.com>
Change-Id: Ia91823aded79aab469dd68095d44300e8df04ed2
2016-09-28 11:51:46 -07:00
Johann
02fa245d15 mips: clean up wextra warnings
Remove unused zbin variable:
warning: unused parameter ‘zbin’

Use int for loop variables to avoid unsigned conversion:
warning: comparison between signed and unsigned integer expressions

Change-Id: Icea74b870c0ee68a8bf687e796a69392af25a8ad
2016-09-27 13:19:18 -07:00
Urvang Joshi
0aa3e2564f Add compiler warning flag -Wextra and fix related warnings.
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.

Cherry-picked from AOM 4790a69faaec8f03d65f64ff070f6ab4307dbb16

Expands use of (void)x; on unused variables. AOM only supports one codec
in codec_factory.h

Does not include changes to HandleDecodeResult. AOM removed
invalid_file_test.cc which does use the video parameter.

Does not enable -Wextra yet. There are more issues to fix.

BUG=webm:1069

Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
2016-09-27 12:05:01 -07:00
Linfeng Zhang
b46243d7ff Merge "Refactor lpf (size 4 and 8) NEON intrinsics optimization" 2016-09-26 16:11:12 +00:00
James Zern
deadda3dea Merge "vpx_idct32x32_34_add_sse2: rm unneeded transposes" 2016-09-23 02:49:26 +00:00
James Zern
fdd1186f97 vpx_idct32x32_34_add_sse2: rm unneeded transposes
this change is neutral to mildly positive across various x86-64
platforms

Change-Id: I28fb5ae598fc1317b7a42c9a846ac5d57d104784
2016-09-21 19:49:25 -07:00
James Zern
e372bfd5ac variance_neon: sync variance*() w/c,sse2
removes some unnecessary casts and adds a few explicit uint32 ones for
larger sizes to quiet -Wshorten-64-to-32 warnings

Change-Id: I63c5fce8e62c426d5cf5c10a66a113c119a43518
2016-09-21 18:04:45 -07:00
Linfeng Zhang
761e5ec2f6 Refactor lpf (size 4 and 8) NEON intrinsics optimization
Also check in 8x8 8-bit transpose NEON intrinsics optimization
transpose_u8_8x8()

Change-Id: I32d321cf97ea21eab158ac4896990fc9a51681c4
2016-09-19 16:41:37 -07:00
James Zern
6acd061aad variance_avx2: sync variance functions with c-code
add missing int64 -> uint32 cast; quiets -Wshorten-64-to-32 warnings

Change-Id: I4850b36e18dc8b399108342be4bfe0b684aefb78
2016-09-19 16:19:29 -07:00
James Zern
aa0eb67bf7 loopfilter_mb_neon: remove unused load_8x8()
quiets a -Wunused-function warning for arm targets

Change-Id: I293a7e3d3d7d61d6af2fbedad5e8c25126c418b6
2016-09-17 11:00:31 -07:00
Linfeng Zhang
5d73639d8f Merge "Refactor lpf (size 16) NEON intrinsics optimization" 2016-09-17 00:33:30 +00:00
Linfeng Zhang
8107368000 Refactor lpf (size 16) NEON intrinsics optimization
Extract shared code so later lpf size 4 and 8 functions can reuse.

Change-Id: Ibb43ef1fd8651bd2e32fcc4c56cf6fa7ca237401
2016-09-16 09:12:13 -07:00
James Zern
33aef48f29 vpx_subpixel_8t_intrin_avx2: tolerate unversioned clang
assume __clang_major__==0 has the latest version of
_mm256_broadcastsi128_si256. fixes builds with custom clang toolchains.

BUG=b/30970831

Change-Id: I90becd56278e4716bd46e2ba9d910af977e8dfa6
2016-09-16 07:14:17 +00:00
clang-format
5f6d143b41 apply clang-format
Change-Id: I501597b7c1e0f0c7ae2aea3ee8073f0a641b3487
2016-09-15 15:07:53 -07:00
James Zern
4b0e78bfda Merge "vpx_dsp: added vpx_highbd_idct32x32_1_add_sse2()" 2016-09-08 01:05:18 +00:00
Scott LaVarnway
309125b1e7 vpx_dsp: added vpx_highbd_idct32x32_1_add_sse2()
Change-Id: I140d93aebadb0eaf6220881e61a0451450081227
2016-09-07 05:58:29 -07:00
Johann Koenig
4d1540f8ce Merge changes from topic 'Wundef'
* changes:
  Enable -Wundef by default
  Define VP8_TEMPORAL_ALT_REF to !CONFIG_REALTIME_ONLY
  Remove CONFIG_DEBUG guards from assert()
  Remove unused function vpx_de_mblock
  Fix -Wundef warning for OUTPUT_FPF
  Fix -Wundef warning for __SANITIZE_ADDRESS__
2016-09-02 01:39:18 +00:00
Johann
24f534ac90 Remove unused function vpx_de_mblock
vpx_config.h was not included so CONFIG_POSTPROC was never defined.

Change-Id: I777de499823afa286734549a8e7f4a93e7ad97f3
2016-08-31 23:01:45 -07:00
Linfeng Zhang
bee7d837ab Update NEON transpose functions.
Unify coding style.

Change-Id: I5826f40c02c882df7353391e0c9dd6cef6bd4b97
2016-08-31 14:58:40 -07:00
Linfeng Zhang
f7cbfed682 Update vpx_lpf_vertical_16_dual_neon() intrinsics
Process 16 samples together.

Change-Id: If6ee8e3377aa2786417f2fc411ba7d87ea8b6799
2016-08-30 11:17:33 -07:00
Linfeng Zhang
4916515511 Update vpx_lpf_horizontal_edge_16_neon() intrinsics
Process 16 samples together.

Change-Id: I9cfbe04c9d25d8b89f63f48f519e812746db754d
2016-08-27 14:47:48 -07:00
Johann Koenig
a70861c435 Merge "Remove halfpix specialization" 2016-08-26 21:28:01 +00:00
James Zern
3ddff4503a add_noise,vpx_setup_noise: correct 'char_dist' type
fixes SSE2/AddNoiseTest.CheckCvsAssembly/0 with -funsigned-char.
visibly broken since:
0dc69c7 postproc : fix function parameters for noise functions.
where the types diverged (char vs. int8)
but likely the return changed in:
2ca24b0 postproc - move filling of noise buffer to vpx_dsp.
when multiple implementations were merged.

Change-Id: I176ca1f170217f05ba7872b0c4de63e41949e999
2016-08-24 21:46:26 -07:00
Johann
d393885af1 Remove halfpix specialization
This function only exists as a shortcut to subpixel variance with
predefined offsets. xoffset = 4 for horizontal, yoffset = 4 for vertical
and both for "hv"

Removing this allows the existing optimizations for the variance
functions to be called. Instead of having only sse2 optimizations, this
gives sse2, ssse3, msa and neon.

BUG=webm:1273

Change-Id: Ieb407b423b91b87d33c4263c6a1ad5e673b0efd6
2016-08-23 17:05:39 -07:00
Linfeng Zhang
f9efbad392 NEON asm of vpx_lpf_{horizontal,vertical}_8_dual_neon()
Also expose the NEON intrinsics version.

BUG=webm:1261, webm:1266.

Change-Id: I8c4ae658467dcf66ebf7a75982b2ef712dbb4535
2016-08-16 08:50:57 -07:00
James Zern
dfcefe06fa Merge "variance_impl_avx2: restore table layout" 2016-08-12 23:02:27 +00:00
James Zern
bd7cfb46fb variance_impl_avx2: restore table layout
disable clang-format for bilinear_filters_avx2

restores the row layout prior to:
099bd7f vpx_dsp: apply clang-format
but keeps the justification used by clang-format

Change-Id: Icf1733a37edb807e74c26b23a93963c03bd08fd7
2016-08-12 11:52:53 -07:00
Linfeng Zhang
f09b5a3328 NEON intrinsics for 4 loopfilter functions
New NEON intrinsics functions:
vpx_lpf_horizontal_edge_8_neon()
vpx_lpf_horizontal_edge_16_neon()
vpx_lpf_vertical_16_neon()
vpx_lpf_vertical_16_dual_neon()

BUG=webm:1262, webm:1263, webm:1264, webm:1265.

Change-Id: I7a2aff2a358b22277429329adec606e08efbc8cb
2016-08-12 09:58:17 -07:00
Johann Koenig
57f49db81f Merge changes I6ef79702,Id332c641,I354b5d22,I84438013
* changes:
  Use common transpose for vpx_idct32x32_1024_add_neon
  Use common transpose for vpx_idct8x8_[12|64]_add_neon
  Use common transpose for vp9_iht8x8_add_neon
  Use common transpose for vpx_idct16x16_[10|256]_add_neon
2016-08-04 22:30:47 +00:00
Johann Koenig
17720b60bb Merge "Remove armv6 target" 2016-08-04 22:21:13 +00:00
James Zern
7f7c888c14 Merge "correct break placement" 2016-08-04 22:19:30 +00:00
Johann
0325b95938 Use common transpose for vpx_idct32x32_1024_add_neon
Change-Id: I6ef7970206d588761ebe80005aecd35365ec50ff
2016-08-04 20:13:18 +00:00
Johann
f4e4ce7549 Use common transpose for vpx_idct8x8_[12|64]_add_neon
Change-Id: Id332c641f05336ef9a45e17493ff149fd0a168f0
2016-08-04 20:13:12 +00:00
Johann
8619203ddc Use common transpose for vpx_idct16x16_[10|256]_add_neon
Change-Id: I84438013f483e82084d33ba9a63c33273d35fcaa
2016-08-04 20:12:53 +00:00
Johann Koenig
b757d89ff9 Merge "Extract neon transpose for re-use" 2016-08-04 20:12:38 +00:00
James Zern
70a7885a65 correct break placement
these should be placed within {}s when present

Change-Id: Ia775fac5373603e77360398f19b07958fb43f476
2016-08-04 13:00:14 -07:00
Johann
d55724fae9 Remove armv6 target
Change-Id: I1fa81cc9cabf362a185fc3a53f1e58de533a41e5
2016-08-04 12:55:06 -07:00
Johann
377cfa31f0 Extract neon transpose for re-use
Change-Id: I5e1c7f4c80d1c6f7fd582ac468c6eaaa3603a06c
2016-08-04 19:04:25 +00:00
Johann
df69c751a7 Don't expand to Q register for 4x4 intrapred
The code was expanding to Q registers so that vqrshn could be used, for
vector quad round shift and narrow. If 4 values are added together,
there is a shift by 2. If 8 values, a shift by 3. Since this accounts
for any possibility of overflow, we can skip the narrowing shift.

This allows keeping the values in D registers and casting the 16 bit
value to 8 bits.

Change-Id: I8d9cfa07176271f492c116ffa6a7b351af0b8751
2016-08-04 18:51:46 +00:00
Alex Converse
d089ac4dda Resolve -Wshorten-64-to-32 warnings in prob.h.
Change-Id: I1244ee908d81467f0fc8a8fce979fc8077a325b4
2016-08-02 15:40:23 -07:00
Alex Converse
3a04c9c9c4 Merge "Resolve -Wshorten-64-to-32 in variance." 2016-08-02 22:26:55 +00:00
Min Chen
407c2e2974 replace by VSTM/VLDM to reduce one of VST1/VLD1
Change-Id: I596567570580babb1a52925541d1fd1045c352f5
2016-07-28 23:01:38 +00:00
Alex Converse
c0241664aa Resolve -Wshorten-64-to-32 in variance.
The subtrahend is small enough to fit into uint32_t.

Change-Id: Ic4d7128aaa665eaf6b25d562610ba8942c46137f
2016-07-28 10:16:31 -07:00
clang-format
956af1d478 vpx_dsp/x86/quantize_sse2.c: apply clang-format
post:
e429080 .clang-format: disable DerivePointerAlignment

Change-Id: I21a0546668edb2b09660e216d4875a1d2ad24d53
2016-07-27 21:41:18 -07:00
clang-format
099bd7f07e vpx_dsp: apply clang-format
Change-Id: I3ea3e77364879928bd916f2b0a7838073ade5975
2016-07-25 14:14:19 -07:00
Ivan Krasin
91369fd9b7 Fix compilation error under Clang 4.0.
The LLVM trunk has reached 4.0 and now __clang_major__ is not enough
to distinguish between old XCode Clang and the new 'real' Clang.
Using __apple_build_version__ allows to make this distinction.

BUG=chromium:631144

Change-Id: I0b6e46fddfe4f409c7b7e558bda34872e60ee2d9
2016-07-25 19:18:49 +00:00
Yury Gitman
3d3f51262c Add VPX_SWAP macro
Change-Id: I60e233eddef238ad918183392794084673f27d2d
2016-07-22 15:41:25 -07:00
James Zern
3d791194f8 vpx_plane_add_noise_c: normalize int types
quiets signed/unsigned mismatch warning

Change-Id: Iaabd7dfff110ba26056258457541f5635d2e85e6
2016-07-16 11:56:55 -07:00
Jim Bankoski
0dc69c70f7 postproc : fix function parameters for noise functions.
Change-Id: I582b6307f28bfc987dcf8910379a52c6f679173c
2016-07-15 08:27:34 -07:00
James Bankoski
7eec1f31b5 Merge "postproc: noise style fixes." 2016-07-13 22:04:47 +00:00
Yaowu Xu
d6197b621d Merge "Fix encoder crashes for odd size input" 2016-07-13 20:05:09 +00:00
Jim Bankoski
e736691a6d postproc: noise style fixes.
Change-Id: Ifdcb36b8e77b65faeeb10644256e175acb32275d
2016-07-13 12:39:01 -07:00
James Bankoski
e93f2fdb83 Merge "postproc - move filling of noise buffer to vpx_dsp." 2016-07-13 15:31:17 +00:00
Jim Bankoski
2ca24b0075 postproc - move filling of noise buffer to vpx_dsp.
Change-Id: I63ba35dc0ae9286c9812367a531e01d79a4c1635
2016-07-13 07:35:25 -07:00
Jim Bankoski
b24373fec2 deblock: missing const on extern const.
Change-Id: I0df08f7c431daf939e266f008bf5158b0c97358b
2016-07-13 07:27:29 -07:00
Jim Bankoski
6f424a768e vp9_postproc.c missing extern.
BUG=webm:1256

Change-Id: I5271e71bc53cce033fb906040643dcdd5ccb2381
2016-07-12 17:47:49 -07:00
Yaowu Xu
98431cde07 Fix encoder crashes for odd size input
Change-Id: Id5c30c419282369cc8c3280d9a70b34a859a71d8
2016-07-12 11:11:26 -07:00
Jim Bankoski
88e6951465 deblock filter : moved from vp8 code branch
The deblocking filters used in vp8 have been moved to vpx_dsp for
use by both vp8 and vp9.

Change-Id: I5209d76edafc894b550f751fc76d3aa6799b392d
2016-07-12 05:53:00 -07:00
Jingning Han
7c1fdf02cd Merge "Support measure distortion in the pixel domain" 2016-07-07 18:09:20 +00:00
Jingning Han
e357b9efe0 Support measure distortion in the pixel domain
Use pixel domain distortion metric in speed 0. This improves the
compression performance by 0.3% for both low and high resolution
test sets.

Change-Id: I5b5b7115960de73f0b5e5d0c69db305e490e6f1d
2016-07-06 18:25:17 -07:00
James Zern
5afa3b9150 Merge "improve vpx_filter_block1d* based on replace paddsw+psrlw to pmulhrsw" 2016-07-02 03:08:33 +00:00
James Zern
3197172405 Merge "Update vpx subpixel 1d filter ssse3 asm" 2016-07-02 03:08:17 +00:00
Johann
1b833d63d9 vpx_dsp: remove x86inc.asm distinction
BUG=b:29583530

Change-Id: I397d77536b0d3cee0a92cdfe8b76bc4e434d0720
2016-06-29 18:55:58 -07:00
James Zern
3a6a81fc9a Merge changes I9433d858,Iafd05637,If08ce6ca
* changes:
  tests: remove redundant round() definition
  remove visual studio < 2010 workarounds
  configure: remove old visual studio support (<2010)
2016-06-29 23:07:16 +00:00
Linfeng Zhang
6b350766bd Update vpx subpixel 1d filter ssse3 asm
Speed test shows the new vertical filters have degradation on Celeron
Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
the vertical filters activated code. Now just simply active the code
without degradation on Celeron. Later there should be 2 set of vertical
filters ssse3 functions, and let jump table to choose based on CPU type.

Change-Id: Iba2f1f2fe059a9d142c396d03a6b8d2d3b981e87
2016-06-29 13:48:41 -07:00
Yaowu Xu
63a37d16f3 Prevent negative variance
Due to rounding, hbd variance may become negative. This commit put in
check and clamp of negative values to 0.

Change-Id: I610d9c8aa2d4eebe7bc5f2c5624a9e3cadad4c94
2016-06-29 11:08:17 -07:00
James Zern
c125f4a594 remove visual studio < 2010 workarounds
BUG=b/29583530

Change-Id: Iafd05637eb65f4da54a9c857e79204a77646858a
2016-06-28 20:58:49 -07:00
James Zern
0afe5e405d Merge "*.asm: normalize label format" 2016-06-28 19:22:10 +00:00
Yaowu Xu
d34b49d7b9 psnr.c: use int64_t for sum of differences
Since the values can be negative.

Change-Id: Idda69e9fb47bb34696aeb20170341a0191c5d85e
2016-06-28 09:53:11 -07:00
James Zern
f51f67602e *.asm: normalize label format
add a trailing ':', though it's optional with the tools we support, it's
more common to use it to mark a label. this also quiets the
orphan-labels warning with nasm/yasm.

BUG=b/29583530

Change-Id: I46e95255e12026dd542d9838e2dd3fbddf7b56e2
2016-06-27 19:46:57 -07:00
Yaowu Xu
7676defca9 Merge "Port metric computation changes from nextgenv2" 2016-06-27 19:18:00 +00:00
Min Chen
b2fb48cfcf improve vpx_filter_block1d* based on replace paddsw+psrlw to pmulhrsw
Change-Id: I14c0c2e54d0b0584df88e9a3f0a256ec096bea6e
2016-06-27 17:50:45 +00:00
James Zern
cfd5e0221c Revert "Update vpx subpixel 1d filter ssse3 asm"
This reverts commit 1517fb74fd.

Fixes a segfault in windows x64 builds.

Change-Id: I6a6959cd7e64a28376849a9f2b11fc852a7c1fbe
2016-06-25 11:37:20 -07:00
Yaowu Xu
003a9d20ad Port metric computation changes from nextgenv2
Change-Id: I4aceffcdf7af59ffeb51984f0345c3a4c7e76a9f
2016-06-24 13:52:50 -07:00
Linfeng Zhang
bdeb5febe4 Merge "Update vpx subpixel 1d filter ssse3 asm" 2016-06-23 19:08:04 +00:00
Alex Converse
83db21b2fd vpx_lpf_horizontal_4_sse2: Remove dead load.
Change-Id: I51026c52baa1f0881fcd5b68e1fdf08a2dc0916e
2016-06-22 18:17:41 -07:00
James Zern
527a9fea76 Merge "remove vp10" 2016-06-22 22:35:57 +00:00
Linfeng Zhang
1517fb74fd Update vpx subpixel 1d filter ssse3 asm
Speed test shows the new vertical filters have degradation on Celeron
Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
the vertical filters activated code. Now just simply active the code
without degradation on Celeron. Later there should be 2 set of vertical
filters ssse3 functions, and let jump table to choose based on CPU type.

Change-Id: I37e3e9c5694737d9134a6bce6698d3e43f8fc962
2016-06-22 13:15:00 -07:00
Yaowu Xu
ef665996ae Prevent negative variance
Due to rounding used computation, HDB variance computation may produce
slightly negative values. This commit adds clamping to make sure
output variance values for 10 and 12 to be non-negative.

Change-Id: Id679aa55a4c201958c4c7d28cd8733b9246a71c8
2016-06-22 17:55:14 +00:00
Yaowu Xu
543ea3eb3e Make type conversion explicit
This fixes MSVC warnings.

Change-Id: I675d8486230b2b74d7973d95720a4995c4750282
2016-06-20 12:05:29 -07:00
James Zern
67edc5e83b remove vp10
development has moved to the nextgenv2 branch and a snapshot from here
was used to seed aomedia

BUG=b/29457125

Change-Id: Iedaca11ec7870fb3a4e50b2c9ea0c2b056a0d3c0
2016-06-17 18:26:08 -07:00
Yaowu Xu
de3a8f23c8 vpx_dsp/quantize.c: fix ubsan warnings
BUG=webm:1219

Change-Id: I0c80271c6b78adf40aa7a4cac9e6b431d56958cb
2016-06-16 21:46:14 +00:00
Yaowu Xu
e5e998a6eb vpx_dsp/variance.c: change to use correct type
This commit change to use int64_t to represent the sum of pixel
differences, which can be negative.

This fixes a number of ubsan warnings.

BUG=webm:1219

Change-Id: I885f245ae895ab92ca5f3b9848d37024b07aac98
2016-06-16 21:45:48 +00:00
Johann
c516dd67bc neon hadamard 16x16
Runs about twice as fast as C

BUG=webm:1027

Change-Id: I6760d99f4e22259439ca35d746194b12a81bfa71
2016-06-14 19:23:38 +00:00
Debargha Mukherjee
697bcef677 Add a couple of missing WRAPLOW checks
To make coefficient checking consistent with the VP9 spec sections
8.7.1.6 and 8.7.1.1.

Change-Id: I92e38e89a41d1e482317bb478c48ffa608d2d6ee
2016-06-09 12:58:27 -07:00
Debargha Mukherjee
c2ebd0e6da Merge "Move range checks into WRAPLOW" 2016-06-06 16:28:24 +00:00
James Zern
e34e684059 Merge changes If31d36c8,I10b947e7
* changes:
  vpx_dsp,add_noise: remove mmx implementation
  vpx_dsp: remove mmx variance implementations
2016-06-04 00:56:06 +00:00
Debargha Mukherjee
aa90983696 Move range checks into WRAPLOW
Provides more comprehensive coverage for --enable-coefficient-checking.
The intent is to make the --enable-coefficient-checking option
consistent with the VP9 spec.

Change-Id: I12d0120756d17572ca2b2d7e6a2ab9d8071d8d58
2016-06-03 11:27:33 -07:00
Linfeng Zhang
b90166665f Merge "Slow pshufb removal in 3 intra prediction functions." 2016-06-03 16:35:14 +00:00
James Zern
462e0ff88b vpx_dsp,add_noise: remove mmx implementation
a sse2 version exists, this is a reasonable modern baseline.

Change-Id: If31d36c8412d25b53f41b4a93cf02f46802c0c33
2016-06-02 23:51:22 -07:00
James Zern
eea8ea88ab vpx_dsp: remove mmx variance implementations
there are sse2 equivalents for all remaining variance implementations

Change-Id: I10b947e73fc0067688181f819b59e47966bec3d2
2016-06-02 23:46:16 -07:00
Linfeng Zhang
ad0646cb84 Slow pshufb removal in 3 intra prediction functions.
Replaced vpx_d45_predictor_4x4_ssse3(), vpx_d45_predictor_8x8_ssse3()
and vpx_d207_predictor_4x4_ssse3() with
created vpx_d45_predictor_4x4_sse2(), vpx_d45_predictor_8x8_sse2()
and vpx_d207_predictor_4x4_sse2() respectively.
It's mostly neutral or slightly worse than ssse3 in good cases and
better than ssse3 in the bad cases (but still worse than using the mmx
regs).

Change-Id: Ib0237ceb71d2c57b8a93fd3170330cfed9d56bdd
2016-06-02 10:55:58 -07:00
Yaowu Xu
46ff1072b3 variance_avx2.c: UBSAN/IOC fix
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1222

Change-Id: Ifb3bedf9b4e1b007b21aebaa4beb9ba50424efef
2016-05-31 16:44:35 -07:00
Linfeng Zhang
0ba9b299e9 Merge "Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2" 2016-05-27 15:47:28 +00:00
Linfeng Zhang
4b5e462d08 Upgrade vpx_lpf_{vertical,horizontal}_4 mmx to sse2
Followed the code style of other lpf fuctions.
These 2 functions put 2 rows of data in a single xmm register,
so they have similar but not identical filter operations,
and cannot share the same macros.

Change-Id: I3bab55a5d1a1232926ac8fd1f03251acc38302bc
2016-05-26 14:55:18 -07:00
Scott LaVarnway
9d24fe60f1 Merge "Code clean of sub_pixel_variance4xh -- 2" 2016-05-26 13:20:24 +00:00
Scott LaVarnway
a4f3751be5 Code clean of sub_pixel_variance4xh -- 2
Replace MMX with SSE2.

Change-Id: Id8482d2589131f9427e7f36bc64413f058caf31f
2016-05-24 04:44:05 -07:00
James Zern
3fb55d24e8 Revert "Code clean of sub_pixel_variance4xh"
This reverts commit 2468163e07.

causes valgrind errors for overread of buffer in SubpelVarianceTest

Change-Id: I448e52c76f815ac199305b71f7d169f2bc167679
2016-05-19 23:37:27 -07:00
Yaowu Xu
d1f0f4cc63 Merge "Clarify integer value ranges" 2016-05-18 23:55:05 +00:00
James Zern
146ccd304f Merge "Code clean of sub_pixel_variance4xh" 2016-05-18 23:18:35 +00:00
Johann Koenig
36b610d8c1 Merge "neon hadamard 8x8" 2016-05-18 20:11:16 +00:00
Yaowu Xu
a564b18d7f Clarify integer value ranges
This commit clarifies integer value range for vairables used in
several variance functions, also change to use proper type
conversion to reflect the value ranges.

Change-Id: Ic3234b83a912ce1ad12d1b254f3378763e15cc5c
2016-05-18 10:25:12 -07:00
Scott LaVarnway
2468163e07 Code clean of sub_pixel_variance4xh
Replace MMX with SSE2.

Change-Id: Ia8fcba755952804e347d7d7736f57d1f90c988a0
2016-05-18 04:24:41 -07:00
Johann
9b54e812f7 neon hadamard 8x8
Runs about 30% faster than the C

BUG=webm:1021

Change-Id: I6809d6d84c3077ab619c53298296950e976bdaba
2016-05-16 11:58:02 -07:00
Yaowu Xu
c1e4f5a80d Merge "Change to use correct check for halfpel" 2016-05-13 01:27:47 +00:00
Linfeng Zhang
2f55beb355 Merge "remove mmx variance functions" 2016-05-11 22:21:23 +00:00
Yaowu Xu
17fae3ad0a Change to use correct check for halfpel
In motion estimation stage for subpel motion, subpel variance is
computed use bilinear interpolation. The motion vector precision
used is at 1/8 pel and three bits are used to represent the x and y
subpel offsets. Based on this, the half pel check should be against
4, not 8.

Change-Id: I1f56fa1fa3f2f5e19a20d27983efe628557f170e
2016-05-11 13:52:59 -07:00
Linfeng Zhang
d0ffae825d remove mmx variance functions
there are sse2 equivalents which is a reasonable modern baseline
Removed mmx variance functions:
vpx_get_mb_ss_mmx()
vpx_get8x8var_mmx()
vpx_get4x4var_mmx()
vpx_variance4x4_mmx()
vpx_variance8x8_mmx()
vpx_mse16x16_mmx()
vpx_variance16x16_mmx()
vpx_variance16x8_mmx()
vpx_variance8x16_mmx()

Change-Id: Iffaf85344c6676a3dd337c0645a2dd5deb2f86a1
2016-05-11 12:39:42 -07:00
Linfeng Zhang
d0e687bf8c remove mmx sad functions
there are sse2 equivalents which is a reasonable modern baseline

Change-Id: Ibbe536a5ad1c2cccef6bdcc75c13b3dde35a56ba
2016-05-11 10:50:04 -07:00
Jim Bankoski
da33728f48 vpx_dsp: Rename postproc.c add_noise.
Change-Id: I4906d1b79a2951e659995202b9fa97e2ea5cfba0
2016-05-10 06:52:58 -07:00
Scott LaVarnway
c2c5297595 Merge "VPX: refactor vpx_idct16x16_1_add_sse2()" 2016-05-09 22:15:17 +00:00
James Bankoski
7cced7b3ea Merge "libvpx: vpx_add_plane_noise make c match assembly" 2016-05-09 20:17:38 +00:00
Johann Koenig
9e5811f485 Merge changes Id13b97f4,I1d342725
* changes:
  The subfunctions are only defined for sse2
  Unlike non-hbd variance, opt2 is never used
2016-05-09 18:38:59 +00:00
Scott LaVarnway
1490342be5 VPX: refactor vpx_idct16x16_1_add_sse2()
Change-Id: I431ea0d9abe764d110a1ba32a8cb15e2fdac8805
2016-05-09 09:50:00 -07:00
Jim Bankoski
7a91d21d69 libvpx: vpx_add_plane_noise make c match assembly
This change makes the c match the assembly and removes the todo's
associated with getting this to work.

Change-Id: Ie32e9ebb584a9d60399662d8bcb71b74fbd19d1e
2016-05-07 12:47:49 -07:00
Johann
7e4c306981 Use canonical avg_pred functions
Change-Id: Ibe0cc388226622561d2b4a00e5bdc1016a3c4a94
2016-05-06 19:06:03 -07:00
Johann
b23bd2360f The subfunctions are only defined for sse2
See highbd_subpel_variance_impl_sse2.asm

Change-Id: Id13b97f4f6d189ed71cdc6d52b3c4ea63dc1da05
2016-05-06 18:58:49 -07:00
Johann
a761197fbd Unlike non-hbd variance, opt2 is never used
Change-Id: I1d342725df332c4efc6006d9e3dcb7372c41f448
2016-05-06 18:38:04 -07:00
James Zern
5e679848e8 Merge changes from topic 'missing-proto'
* changes:
  vp9_frame_scale_ssse3.c: make 2 functions static
  vp9_pickmode.c: make function static
  vp9_noise_estimate.c: make function static
  vp9_aq_360.c: add missing include
  vp9_idct_intrin_sse2: add missing vp9_rtcd.h include
  vpx_dsp/*.[hc]: add missing vpx_dsp_rtcd.h include
2016-05-06 02:25:29 +00:00
James Zern
2184692c07 vpx_dsp/*.[hc]: add missing vpx_dsp_rtcd.h include
Change-Id: I103be7eee36492f8619144ce8325bc916d4975c7
2016-05-04 15:06:44 -07:00
James Zern
4f69f741d8 vpx_dsp_common.h: remove circular include
Change-Id: I05b3028a38bbc062c388eeb95e99a3fee583ae6b
2016-05-04 14:54:53 -07:00
James Zern
aa68a8301e vpx_dsp_common.h: fix include guard
Change-Id: I1ad41c096ec86870f9aecab6fdbc3af03e972afc
2016-05-04 14:54:32 -07:00
James Bankoski
89f905e5e5 Merge "libvpx: add a unit test for plane_add_noise." 2016-05-04 13:09:05 +00:00
Jim Bankoski
34d5aff747 libvpx: add a unit test for plane_add_noise.
In so doing this fixes a couple of bugs:

vpx_plane_add_noise.c needed to subtract a clamp instead of add.
And the assembly (mmx sse) had assumptions that parameters were
continuous in memory which was not true.

Change-Id: I76f2c43cf54bfc838eb2edf8a443eaaa7565d7b5
2016-05-03 16:23:06 -07:00
James Bankoski
e755a283dd Merge "Move vpx_add_plane from codec to vpx_dsp and dedup." 2016-05-03 14:11:57 +00:00
Jim Bankoski
fce3cee8dd Move vpx_add_plane from codec to vpx_dsp and dedup.
Change-Id: I12218d8331c0558c0587a66321e3ca46da7e5cc7
2016-05-02 12:17:39 -07:00
Alex Converse
f6d13e7be5 Merge "bitreader: remove an unsigned overflow." 2016-04-28 16:26:37 +00:00
Alex Converse
a68b24fdee Tweak casts on vpx_sub_pixel_variance to avoid implicit overflow.
Change-Id: I481eb271b082fa3497b0283f37d9b4d1f6de270c
2016-04-27 16:37:18 -07:00
Alex Converse
36a0c7ffe3 bitreader: remove an unsigned overflow.
bits_left is in the range [0, 64 (= BD_VALUE_SIZE)] , so the narrowing
conversion should be safe.

Change-Id: I943fcd359eaad76249ee1e1fb03a2ac16945d2fd
2016-04-27 15:31:35 -07:00
Alex Converse
6c4007be1c Be explicit about overflow in vpx_variance16x16_sse2.
The product always fits in uint32_t, but the operands don't.

An optimizing compiler should generate the wraparound code.
(Verified with clang).

Change-Id: I25eb64df99152992bc898b8ccbb01d55c8d16e3c
2016-04-27 15:22:17 -07:00
Alex Converse
ccb894ce73 Remove casts on < 16x16 variance.
These blocks will never overflow since max sum is +/-255*w*h.

Change-Id: Ia2c630339fd9cfb411b56b6040ff402095f12a2e
2016-04-27 15:21:58 -07:00
Johann
2f5840de3e vpx_minmax_8x8_neon and test
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1156

Change-Id: Ief0ad8d6255b0ef0f233cda153799e3c72d3dbc6
2016-04-21 21:40:25 -07:00
Johann
8c02a36953 hadamard 8x8 test
The order of the output structure is not currently important.

BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1021

Change-Id: Ibc0006d569675db6c5060c4529f5d9e73f2e96a6
2016-04-21 22:28:21 +00:00
Johann Koenig
c59c5cbeff Merge "Enable vpx_idct32x32_1024_add_neon for neon as well, not only for neon_asm" 2016-04-15 16:00:51 +00:00
Martin Storsjo
d8b3e29ee7 Enable vpx_idct32x32_1024_add_neon for neon as well, not only for neon_asm
This was never hooked up for the 32x32_34 case as the neon_asm version
in 3f7c12da, when the intrinsics version was added.

Change-Id: Ic7db4ce5850c637315f9fe9e2de93a4f8cf9e320
2016-04-15 10:25:47 +03:00
Johann
26faa3ec7a Apply 'const' to data not pointer
Change-Id: Ic6b695442e319f7582a7ee8e52a47ae3e38c7298
2016-04-14 14:47:16 -07:00
James Zern
5ab46e0ecd Merge changes I7a1c0cba,Ie02b5caf,I2cbd85d7,I644f35b0
* changes:
  vpx_fdct16x16_1_sse2: improve load pattern
  vpx_fdct16x16_1_c/msa: fix accumulator overflow
  vpx_fdctNxN_1_sse2: reduce store size
  dct32x32_test: add PartialTrans32x32Test, Random
2016-04-06 02:51:53 +00:00
James Zern
38bc1d0f4b vpx_fdct16x16_1_sse2: improve load pattern
load the full row rather than doing 2 8-wide columns

Change-Id: I7a1c0cba06b0dc1ae86046410922b1efccb95c95
2016-04-04 16:03:42 -07:00
James Zern
eb64ea3e89 vpx_fdct16x16_1_c/msa: fix accumulator overflow
tran_low_t is only signed 16-bits in non-high-bitdepth mode

Change-Id: Ie02b5caf2658e8e71f995c17dd5ce666a4d64918
2016-04-04 16:03:41 -07:00
James Zern
3735def667 vpx_fdctNxN_1_sse2: reduce store size
only output[0] needs to be set, store_output is more involved than a
movdqa in the high bitdepth case

Change-Id: I2cbd85d7cf74688bdf47eb767934fe42e02bff67
2016-04-04 16:02:06 -07:00
James Zern
c21d437052 vpx_fdct32x32_1_msa: fix accumulator overflow
Change-Id: I33a5432eda3416382e1cea06b45082c0c65faa75
2016-04-02 11:04:38 -07:00
James Zern
f4cae05cd4 vpx_fdctNxN_1_c: remove unnecessary store
only output[0] needs to be set, the other values will be ignored in this
case.

Change-Id: I8e9692fc0d6d85700ba46f70c2e899a956023910
2016-04-01 12:21:59 -07:00
James Zern
0269df41c1 vpx_fdct32x32_1_c: fix accumulator overflow
tran_low_t is only 16-bits in non-high-bitdepth mode

Change-Id: Ifc06110c95e86e6d790c44250d52a538b2e9713b
2016-03-30 15:20:20 -07:00
Scott LaVarnway
67c4c8244a VPX: loopfilter_mmx.asm using x86inc 2
This reverts commit 9aa083d164.

Fixes a decoder mismatch with 32bit PIC builds.

Change-Id: I94717df662834810302fe3594b38c53084a4e284
2016-03-08 04:24:47 -08:00
James Zern
9aa083d164 Revert "VPX: loopfilter_mmx.asm using x86inc"
This reverts commit 15ecdc3970.

breaks 32-bit pic builds

Change-Id: I8bb1b9471a293f05ac7423aaba0339d408931b7a
2016-03-04 18:23:45 -08:00
Scott LaVarnway
dd6729f826 VPX: Remove pmin/pmax from subpixel functions.
These instructions are unnecessary if the adds
are done in the correct order.

Change-Id: I4e533b8267c32e610a4b94203ad052dc9fdabd71
2016-02-27 05:47:56 -08:00