97 Commits

Author SHA1 Message Date
Yunqing Wang
322ea7ff5b Fix the win32 crash when GET_GOT is not defined
This patch continues to fix the win32 crash issue:
https://bugs.chromium.org/p/webm/issues/detail?id=1105

Johann's patch is here:
https://chromium-review.googlesource.com/#/c/316446/2

Change-Id: I7fe191c717e40df8602e229371321efb0d689375
2015-12-10 14:25:01 -08:00
Johann Koenig
420b9f5bd3 Merge "fix null pointer crash in Win32 because esp register is broken" 2015-12-09 19:31:12 +00:00
Jian Zhou
aa5b517a39 Re-enable SSE2 based intra 4x4 prediction
4x4 Intra predictor implemented with MMX is replaced with SSE2.
Segfault in change 315561 when decoding vp8 is taken care of.

Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76
2015-12-07 18:50:37 -08:00
Scott LaVarnway
c7e557b82c Merge "VP9: Add ssse3 version of vpx_idct32x32_135_add()" 2015-12-07 21:13:35 +00:00
Sergey Kolomenkin
5fc9688792 fix null pointer crash in Win32 because esp register is broken
https://bugs.chromium.org/p/webm/issues/detail?id=1105

Change-Id: I304ea85ea1f6474e26f074dc39dc0748b90d4d3d
2015-12-07 12:57:06 -08:00
James Zern
79a9add666 Revert "MMX in intra 4x4 prediction replaced with SSE2"
This reverts commit 89a1efa4c436c58c101c8b3de866e3014be7d77a.

This causes a segfault when decoding vp8, in both 32 and 64-bit

Change-Id: Idbb9bb28ab897e1d055340497c47b49a12231367
2015-12-05 10:20:39 -08:00
Jian Zhou
e86c7c863e Speed up h_predictor_16x16
Relocate the function from SSSE3 to SSE2, Unroll loop from 8 to 4,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.

Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
2015-12-04 12:12:55 -08:00
Jian Zhou
da3f08fac3 Speed up h_predictor_8x8
Relocate the function from SSSE3 to SSE2, Unroll loop from 4 to 2,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.

Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
2015-12-04 11:36:44 -08:00
Jian Zhou
aa2764abdd MMX in intra 8x8 prediction replaced with SSE2
8x8 Intra predictor implemented with MMX is replaced with SSE2.

Change-Id: I0c90e7c1e1e6942489ac2bfe58903b728aac7a52
2015-12-03 18:11:06 -08:00
Jian Zhou
89a1efa4c4 MMX in intra 4x4 prediction replaced with SSE2
4x4 Intra predictor implemented with MMX is replaced with SSE2.

Change-Id: Id57da2a7c38832d0356bc998790fc1989d39eafc
2015-12-03 16:40:23 -08:00
Jian Zhou
623e988add Merge "SSE2 speed up of h_predictor_4x4" 2015-12-02 18:49:00 +00:00
Scott LaVarnway
f0b0b1fe62 VP9: Add ssse3 version of vpx_idct32x32_135_add()
Change-Id: I9a780131efaad28cf1ad233ae64c5c319a329727
2015-12-02 04:50:46 -08:00
Scott LaVarnway
2669e05949 Merge "VPX: x86 asm version of vpx_idct32x32_1024_add()" 2015-11-30 23:28:27 +00:00
Jian Zhou
9d29d76280 SSE2 speed up of h_predictor_4x4
Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.

Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
2015-11-30 10:08:05 -08:00
Scott LaVarnway
0148e20c3c VPX: x86 asm version of vpx_idct32x32_1024_add()
Change-Id: I3ba4ede553e068bf116dce59d1317347988b3542
2015-11-25 10:11:29 -08:00
Jian Zhou
901d20369a Merge "Speed up tm_predictor_8x8" 2015-11-25 02:34:07 +00:00
Jian Zhou
f4621c5c8d Speed up tm_predictor_8x8
Left neighbor read from memory only once.
Speed up by ~20% in ./test_intra_pred_speed.

Change-Id: Ia1388630df6fed0dce9a6eeded6cb855bbc43505
2015-11-24 16:07:06 -08:00
Scott LaVarnway
97e6cc6198 VPX: Removed unnecessary pmulhrsw in IDCT32X32_34
and fixed macro name.

Change-Id: I306b98a2b4ec80b130ae80290b4cd9c7a5363311
2015-11-23 10:24:09 -08:00
James Zern
16eba81f69 Revert "Speed up h_predictor_4x4"
This reverts commit d76032ae87e535be5b924d9e88bbd67189380534.

breaks 32-bit builds

Change-Id: If6266ec2a405b5a21d615112f0f37e8a71193858
2015-11-20 22:25:29 -08:00
James Zern
1b10753ad7 Merge "Speed up h_predictor_4x4" 2015-11-21 01:12:42 +00:00
Scott LaVarnway
e7fc39fdf5 Merge "VPX: x86 asm version of vpx_idct32x32_34_add()" 2015-11-20 15:11:00 +00:00
Jian Zhou
d76032ae87 Speed up h_predictor_4x4
Modify h_predictor_4x4 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.

Change-Id: Id01c34c48e75b9d56dfc2e93af12cf0c0326a279
2015-11-19 11:34:22 -08:00
Jian Zhou
79b68626ae Speed up tm_predictor_4x4
tm_predictor_4x4 is implemented with SSE2 using XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.

Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0
2015-11-18 16:44:25 -08:00
Scott LaVarnway
ed833048c2 VPX: x86 asm version of vpx_idct32x32_34_add()
Change-Id: Ic81f38998fb1b8d33f5a5d7424c2c41002786cef
2015-11-17 17:42:24 -08:00
James Zern
0ccad4d649 Revert "VPX: x86 asm version of vpx_idct32x32_34_add()"
This reverts commit 9aeaa2016e7470c4e316d90da33d883098eed6f4.

This causes some test vectors to fail.

Change-Id: I3659a2068404ec5a0591fba5c88b1bec0c9059a4
2015-11-11 11:12:38 -08:00
James Zern
e3efed7f4c Merge "convolve_copy_sse2: replace SSE w/SSE2 code" 2015-11-10 22:35:12 +00:00
Scott LaVarnway
f48321974b Merge "VPX: x86 asm version of vpx_idct32x32_34_add()" 2015-11-10 21:40:11 +00:00
Scott LaVarnway
9aeaa2016e VPX: x86 asm version of vpx_idct32x32_34_add()
Change-Id: I8a933c63b7fbf3c65e2c06dbdca9646cadd0b7cb
2015-11-10 11:54:56 -08:00
James Zern
40dab58941 convolve_copy_sse2: replace SSE w/SSE2 code
this should be neutral or slightly faster on modern (P4+) architectures

Change-Id: Iec4c080275941eb8c9e05a66a2daf0405d86a69b
2015-11-09 23:45:16 -08:00
Geza Lore
9cfba09ac0 Optimize vpx_quantize_{b,b_32x32} assembler.
Added optimization of the 8 bit assembly quantizer routines. This makes
these functions up to 100% faster, depending on encoding parameters.

This patch maskes the encoder faster in both the high bitdepth and 8bit
configurations. In the high bitdepth configuration, it effects profile 0
only.

Based on my profiling using 1080p input the net gain is between 1-3% for
the 8 bit config, and around 2.5-4.5% for the high bitdepth config,
depending on target bitrate. The difference between the 8 bit and high
bitdepth configurations for the same encoder run is reduced by 1% in all
cases I have profiled.

Change-Id: I86714a6b7364da20cd468cd784247009663a5140
2015-10-20 10:11:19 +01:00
Johann
ec623a0bb7 Upstream Mozilla fix for older Apple clang builds
Also use the _mm_broadcastsi128_si256 intrisic for
Apple clang versions 4.[012]

https://bugzilla.mozilla.org/show_bug.cgi?id=1085607
https://code.google.com/p/webm/issues/detail?id=1082

Change-Id: I6bc821d8163387194ef663e94bfed91fa7281d88
2015-10-14 07:41:23 -07:00
Alex Converse
0c00af126d Add vpx_highbd_convolve_{copy,avg}_sse2
single-threaded:
swanky (silvermont): ~1% faster overall
peppy (celeron,haswell): ~1.5% faster overall

Change-Id: Ib74f014374c63c9eaf2d38191cbd8e2edcc52073
2015-10-09 11:50:25 -07:00
Geza Lore
cbada4a982 Remove 4 mova insts from quantize_ssse3_x86_64.asm
Change-Id: If3cb9345b44162e600e6c74873e0cb4c207fc7fb
2015-10-09 07:52:04 -07:00
Julia Robson
37c68efee2 SSSE3 optimisation for quantize in high bit depth
When configured with high bit detpth enabled, the 8bit quantize
function stopped using optimised code. This made 8bit content
decode slowly. This commit re-enables the SSSE3 optimisations.

Change-Id: I194b505dd3f4c494e5c5e53e020f5d94534b16b5
2015-10-06 13:32:02 +01:00
Scott LaVarnway
b212094839 Merge "VPX: refactor vpx_idct32x32_1_add_sse2()" 2015-10-06 11:35:15 +00:00
Julia Robson
5e6533e707 SSE2 optimisation for quantize in high bit depth
When configured with high bit detpth enabled, the 8bit quantize
function stopped using optimised code. This made 8bit content
decode slowly. This commit re-enables the SSE2 optimisation
(but not the SSSE3 optimisation).

Change-Id: Id015fe3c1c44580a4bff3f4bd985170f2806a9d9
2015-10-05 10:59:16 -07:00
Scott LaVarnway
23d1c06268 VPX: refactor vpx_idct32x32_1_add_sse2()
Change-Id: Ia1a2cac0e9dc05f3207b3433a6c1589fa7f2aee3
2015-10-05 06:33:42 -07:00
Julia Robson
406030d1b0 Accelerated transform in high bit depth
When configured with high bitdepth enabled, the 8bit transform
stopped using optimised code. This made 8bit content decode slowly.

Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
2015-09-28 21:09:16 -07:00
Johann
dd4f953350 Remove vpx_filter_block1d16_v8_intrin_ssse3
This was rewritten and moved to vpx_dsp/x86/vpx_subpixel_8t_ssse3.asm
in 195883023bb39b5ee5c6811a316ab96d9225034d

Change-Id: I117ce983dae12006e302679ba7f175573dd9e874
2015-09-18 16:05:43 -07:00
James Zern
683b5a3161 vpx_subpixel_8t_ssse3: fix reg counts/access
fixes build on windows x64; previously 'heightq' i.e., the 64-bit register
was accessed when only the 32-bit value was needed. given this is from a
stack variable the upper bits were undefined.

+ bump register/xmm counts; users of SETUP_LOCAL_VARS touch xmm13 in
64-bit builds and filter_block1d16_v* uses one extra temp variable

Change-Id: I9c768c0b2047481d1d3b11c2e16b2f8de6eb0d80
2015-09-17 12:27:34 -07:00
Scott LaVarnway
195883023b VPX: subpixel_8t_ssse3 asm using x86inc
This is based on the original patch optimized for 32bit
platforms by Tamar/Ilya and now uses the x86inc style asm.
The assembly was also modified to support 64bit platforms.

Change-Id: Ice12f249bbbc162a7427e3d23fbf0cbe4135aff2
2015-09-03 20:35:51 -07:00
Johann Koenig
5c245a46d8 Merge changes I53b5bdc5,Ib81168a7,Ie0113945
* changes:
  Only build ssse3 filter functions on 64 bit
  Clean up unused function warnings in vp8 encoder
  Clean up unused function warnings in vp8 onyx_if.c
2015-08-27 20:58:53 +00:00
Johann
a28b2c6ff0 Add sse2 versions of halfpix variance
These were lost in the great sub pixel variance move of
6a82f0d7fb9ee908c389e8d55444bbaed3d54e9c

Not having these functions caused a ~10% performance regression in
some realtime vp8 encodes.

Change-Id: I50658483d9198391806b27899f2c0d309233c4b5
2015-08-27 11:58:38 -07:00
Johann
f5507b514c Only build ssse3 filter functions on 64 bit
Avoid an unused function warning by only building the functions when
they will be used.

Change-Id: I53b5bdc5a180c79d63b34e4c8921d679bbc54009
2015-08-26 10:32:18 -07:00
Scott LaVarnway
6c0f6dd817 Merge "VPX: scaled convolve : fix windows build errors" 2015-08-21 12:06:34 +00:00
Scott LaVarnway
acf24cc1b8 VPX: scaled convolve : fix windows build errors
Change-Id: Ic81d435ea928183197040cdf64b6afd7dbaf57e4
2015-08-20 13:09:27 -07:00
Scott LaVarnway
6a21ca20cc Merge "VPX ssse3 scaled convolve" 2015-08-19 22:12:21 +00:00
Jingning Han
49f6ff1103 Rename inv_txfm_sse2.asm to inv_wht_sse2.asm
Change-Id: I43bcc70680503e4c18d8f021097307778cf9ea70
2015-08-19 10:29:53 -07:00
Scott LaVarnway
2030c49cf8 VPX ssse3 scaled convolve
Change-Id: I71d5994e21813554a927d35ebcc26bf7a68984fd
2015-08-18 15:13:02 -07:00
Scott LaVarnway
6cf95bd1e7 Merge "VPX: remove step == 16 and filter[3] != 128 checks" 2015-08-12 20:13:33 +00:00