Jian Zhou
789dbb3131
Code clean of sad4xNx4D_sse
...
Replace MMX with SSE2.
Change-Id: I948ca1be6ed9b8e67f16555e226f1203726b7da6
2015-12-17 17:43:46 -08:00
Jian Zhou
b158d9a649
Code clean of sad4xN(_avg)_sse
...
Replace MMX with SSE2 and reduce psadbw ops, which may help on Silvermont.
Change-Id: Ic7aec15245c9e5b2f3903dc7631f38e60be7c93d
2015-12-17 11:10:42 -08:00
James Zern
b81f04a0cc
Merge "move vp9_avg to vpx_dsp"
2015-12-15 03:41:22 +00:00
James Zern
d36659cec7
move vp9_avg to vpx_dsp
...
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
2015-12-14 14:42:12 -08:00
Jian Zhou
2404e3290e
Merge "Code clean of tm_predictor_32x32"
2015-12-14 17:56:01 +00:00
Jian Zhou
6e87880e7f
Merge "Speed up tm_predictor_16x16"
2015-12-11 18:55:46 +00:00
Jian Zhou
88120481a4
Code clean of tm_predictor_32x32
...
Reallocate the xmm register usage so that ARCH_X86_64 is no longer required.
Reduce memory access to the left neighbor by half.
Speed up by a single-digit percentage on big-core machines.
Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990
2015-12-11 10:32:08 -08:00
Jian Zhou
62f986265f
Merge "SSE2 based h_predictor_32x32"
2015-12-11 18:02:34 +00:00
James Zern
ecb8dff768
Merge "dc_left_pred[48]: fix pic builds"
2015-12-11 02:48:11 +00:00
Jian Zhou
5604924945
Merge "Code clean of dc_left/top_predictor_16x16"
2015-12-11 01:53:44 +00:00
James Zern
40ee78bc19
dc_left_pred[48]: fix pic builds
...
GET_GOT modifies the stack pointer, so the offset for left's address will
be wrong if loaded afterward.
Change-Id: Iff9433aec45f5f6fe1a59ed8080c589bad429536
2015-12-10 15:44:31 -08:00
Yunqing Wang
322ea7ff5b
Fix the win32 crash when GET_GOT is not defined
...
This patch continues to fix the win32 crash issue:
https://bugs.chromium.org/p/webm/issues/detail?id=1105
Johann's patch is here:
https://chromium-review.googlesource.com/#/c/316446/2
Change-Id: I7fe191c717e40df8602e229371321efb0d689375
2015-12-10 14:25:01 -08:00
Jian Zhou
4ec5953080
Code clean of dc_left/top_predictor_16x16
...
Remove some redundant code.
Change-Id: Ida2e8c0ce28770f7a9545ca014fe792b04295260
2015-12-10 11:59:58 -08:00
Jian Zhou
c90a8a1a43
SSE2 based h_predictor_32x32
...
Relocate the function from SSSE3 to SSE2, unroll loop from 16 to 8,
and reduce mem access to left.
Speed up by a single-digit percentage in ./test_intra_pred_speed on
big-core machines.
Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960
2015-12-10 10:09:58 -08:00
Johann Koenig
420b9f5bd3
Merge "fix null pointer crash in Win32 because esp register is broken"
2015-12-09 19:31:12 +00:00
Jian Zhou
aa5b517a39
Re-enable SSE2 based intra 4x4 prediction
...
4x4 Intra predictor implemented with MMX is replaced with SSE2.
The segfault in change 315561 when decoding vp8 is fixed.
Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76
2015-12-07 18:50:37 -08:00
Scott LaVarnway
c7e557b82c
Merge "VP9: Add ssse3 version of vpx_idct32x32_135_add()"
2015-12-07 21:13:35 +00:00
Sergey Kolomenkin
5fc9688792
fix null pointer crash in Win32 because esp register is broken
...
https://bugs.chromium.org/p/webm/issues/detail?id=1105
Change-Id: I304ea85ea1f6474e26f074dc39dc0748b90d4d3d
2015-12-07 12:57:06 -08:00
James Zern
79a9add666
Revert "MMX in intra 4x4 prediction replaced with SSE2"
...
This reverts commit 89a1efa4c436c58c101c8b3de866e3014be7d77a.
This causes a segfault when decoding vp8, in both 32- and 64-bit builds.
Change-Id: Idbb9bb28ab897e1d055340497c47b49a12231367
2015-12-05 10:20:39 -08:00
Jian Zhou
e86c7c863e
Speed up h_predictor_16x16
...
Relocate the function from SSSE3 to SSE2, unroll loop from 8 to 4,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
2015-12-04 12:12:55 -08:00
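The horizontal (H) predictor fills every row of a block with that row's left neighbor. The following scalar C sketch illustrates the idea behind "unroll" and "reduce mem access to left" for this predictor. It is not the SSE2 assembly from the commit: the function name `h_predictor_sketch`, the four-rows-per-iteration grouping, and the requirement that `bs` be a multiple of 4 are assumptions made for illustration.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar H predictor for a bs x bs block (bs a multiple of 4).
 * Four left pixels are fetched in one grouped read, then four rows are
 * written per loop iteration. */
void h_predictor_sketch(uint8_t *dst, ptrdiff_t stride, int bs,
                        const uint8_t *left) {
  for (int r = 0; r < bs; r += 4) {
    uint8_t l[4];
    memcpy(l, left + r, sizeof(l)); /* one grouped read of the left column */
    memset(dst + 0 * stride, l[0], (size_t)bs);
    memset(dst + 1 * stride, l[1], (size_t)bs);
    memset(dst + 2 * stride, l[2], (size_t)bs);
    memset(dst + 3 * stride, l[3], (size_t)bs);
    dst += 4 * stride;
  }
}
```

With this grouping a 16x16 block takes four loop iterations, each touching the left column once.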
Jian Zhou
da3f08fac3
Speed up h_predictor_8x8
...
Relocate the function from SSSE3 to SSE2, unroll loop from 4 to 2,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.
Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
2015-12-04 11:36:44 -08:00
Jian Zhou
aa2764abdd
MMX in intra 8x8 prediction replaced with SSE2
...
8x8 Intra predictor implemented with MMX is replaced with SSE2.
Change-Id: I0c90e7c1e1e6942489ac2bfe58903b728aac7a52
2015-12-03 18:11:06 -08:00
Jian Zhou
89a1efa4c4
MMX in intra 4x4 prediction replaced with SSE2
...
4x4 Intra predictor implemented with MMX is replaced with SSE2.
Change-Id: Id57da2a7c38832d0356bc998790fc1989d39eafc
2015-12-03 16:40:23 -08:00
Jian Zhou
623e988add
Merge "SSE2 speed up of h_predictor_4x4"
2015-12-02 18:49:00 +00:00
Scott LaVarnway
f0b0b1fe62
VP9: Add ssse3 version of vpx_idct32x32_135_add()
...
Change-Id: I9a780131efaad28cf1ad233ae64c5c319a329727
2015-12-02 04:50:46 -08:00
Jian Zhou
c7fae5d893
Speed up tm_predictor_16x16
...
Reduce mem access to left. Speed up by 10% in ./test_intra_pred_speed
with the same instruction size.
Change-Id: Ia33689d62476972cc82ebb06b50415aeccc95d15
2015-11-30 17:46:40 -08:00
Scott LaVarnway
2669e05949
Merge "VPX: x86 asm version of vpx_idct32x32_1024_add()"
2015-11-30 23:28:27 +00:00
Jian Zhou
9d29d76280
SSE2 speed up of h_predictor_4x4
...
Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
2015-11-30 10:08:05 -08:00
Scott LaVarnway
0148e20c3c
VPX: x86 asm version of vpx_idct32x32_1024_add()
...
Change-Id: I3ba4ede553e068bf116dce59d1317347988b3542
2015-11-25 10:11:29 -08:00
Jian Zhou
901d20369a
Merge "Speed up tm_predictor_8x8"
2015-11-25 02:34:07 +00:00
Alex Converse
022c848b4d
Change highbd variance rounding to prevent negative variance.
...
Always round sum error and sum square error toward zero in variance
calculations. This prevents variance from becoming negative.
Not rounding the variance at all might be better, but would be far
more invasive.
Change-Id: Icf24e0e75ff94952fc026ba6a4d26adf8d373f1c
2015-11-24 16:32:01 -08:00
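A small C sketch of why the rounding direction matters. It assumes 10-bit input reported on an 8-bit scale, so the sum is scaled down by 2 bits and the sum of squares by 4 bits. The helper names and the `scaled_variance` function are hypothetical and are not the library's code. Rounding each term to nearest can push the squared-sum term above the sum of squares, while rounding both toward zero cannot.

```c
#include <stdint.h>

/* Hypothetical helper: shift the magnitude of v right by n bits,
 * rounding to nearest. */
static int64_t shift_round_nearest(int64_t v, int n) {
  const int64_t mag = ((v < 0 ? -v : v) + ((int64_t)1 << (n - 1))) >> n;
  return v < 0 ? -mag : mag;
}

/* Hypothetical helper: shift the magnitude of v right by n bits,
 * rounding toward zero. */
static int64_t shift_toward_zero(int64_t v, int n) {
  const int64_t mag = (v < 0 ? -v : v) >> n;
  return v < 0 ? -mag : mag;
}

/* Variance of `count` 10-bit differences on an 8-bit scale.
 * toward_zero selects the rounding mode for both accumulated terms. */
int64_t scaled_variance(const int *diff, int count, int toward_zero) {
  int64_t sum = 0, sse = 0;
  for (int i = 0; i < count; ++i) {
    sum += diff[i];
    sse += (int64_t)diff[i] * diff[i];
  }
  if (toward_zero) {
    sum = shift_toward_zero(sum, 2);
    sse = shift_toward_zero(sse, 4);
  } else {
    sum = shift_round_nearest(sum, 2);
    sse = shift_round_nearest(sse, 4);
  }
  return sse - (sum * sum) / count;
}
```

For sixteen differences made of fourteen 25s and two 26s, the round-to-nearest path returns -6 and the toward-zero path returns 6.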
Jian Zhou
f4621c5c8d
Speed up tm_predictor_8x8
...
Left neighbor read from memory only once.
Speed up by ~20% in ./test_intra_pred_speed.
Change-Id: Ia1388630df6fed0dce9a6eeded6cb855bbc43505
2015-11-24 16:07:06 -08:00
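For reference, the TM (TrueMotion) predictor sets each pixel to left[r] + above[c] - above[-1], clipped to the pixel range. The scalar C sketch below shows that arithmetic with the left neighbor read once per row and reused across the row. It assumes 8-bit pixels; the names `clip_pixel_sketch` and `tm_predictor_sketch` are hypothetical, and how the SSE2 assembly in the commit arranges its single read of the left column is not shown here.

```c
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: clamp an int to the 8-bit pixel range. */
static uint8_t clip_pixel_sketch(int v) {
  return (uint8_t)(v < 0 ? 0 : (v > 255 ? 255 : v));
}

/* Hypothetical scalar TM predictor for a bs x bs block.
 * above[-1] is the top-left neighbor. */
void tm_predictor_sketch(uint8_t *dst, ptrdiff_t stride, int bs,
                         const uint8_t *above, const uint8_t *left) {
  const int top_left = above[-1];
  for (int r = 0; r < bs; ++r) {
    const int l = left[r]; /* left neighbor read once, reused for the row */
    for (int c = 0; c < bs; ++c) {
      dst[c] = clip_pixel_sketch(l + above[c] - top_left);
    }
    dst += stride;
  }
}
```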
Alex Converse
b84fa548fb
Merge "bitreader/writer: Change shift to signed"
2015-11-24 18:33:45 +00:00
Scott LaVarnway
97e6cc6198
VPX: Removed unnecessary pmulhrsw in IDCT32X32_34
...
and fixed macro name.
Change-Id: I306b98a2b4ec80b130ae80290b4cd9c7a5363311
2015-11-23 10:24:09 -08:00
James Zern
16eba81f69
Revert "Speed up h_predictor_4x4"
...
This reverts commit d76032ae87e535be5b924d9e88bbd67189380534.
This breaks 32-bit builds.
Change-Id: If6266ec2a405b5a21d615112f0f37e8a71193858
2015-11-20 22:25:29 -08:00
James Zern
1b10753ad7
Merge "Speed up h_predictor_4x4"
2015-11-21 01:12:42 +00:00
Alex Converse
612e3c8a0e
Merge "Fix a signed shift overflow in vpx_rb_read_inv_signed_literal."
2015-11-20 17:42:05 +00:00
Scott LaVarnway
e7fc39fdf5
Merge "VPX: x86 asm version of vpx_idct32x32_34_add()"
2015-11-20 15:11:00 +00:00
Alex Converse
6aa2163b69
bitreader/writer: Change shift to signed
...
Silences several legal but suspicious unsigned overflows found with
clang -fsanitize=integer.
Change-Id: I69399751492a183167932b0a10751c433c32ca7b
2015-11-19 15:13:39 -08:00
Alex Converse
42b7c44b2f
Fix a signed shift overflow in vpx_rb_read_inv_signed_literal.
...
Found with clang -fsanitize=integer
Change-Id: I17cb2166c06ff463abfaf9b0e6bc749d0d6fdf94
2015-11-19 15:04:20 -08:00
Jian Zhou
d76032ae87
Speed up h_predictor_4x4
...
Modify h_predictor_4x4 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: Id01c34c48e75b9d56dfc2e93af12cf0c0326a279
2015-11-19 11:34:22 -08:00
Jian Zhou
79b68626ae
Speed up tm_predictor_4x4
...
tm_predictor_4x4 is implemented with SSE2 using XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.
Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0
2015-11-18 16:44:25 -08:00
Scott LaVarnway
ed833048c2
VPX: x86 asm version of vpx_idct32x32_34_add()
...
Change-Id: Ic81f38998fb1b8d33f5a5d7424c2c41002786cef
2015-11-17 17:42:24 -08:00
James Zern
0ccad4d649
Revert "VPX: x86 asm version of vpx_idct32x32_34_add()"
...
This reverts commit 9aeaa2016e7470c4e316d90da33d883098eed6f4.
This causes some test vectors to fail.
Change-Id: I3659a2068404ec5a0591fba5c88b1bec0c9059a4
2015-11-11 11:12:38 -08:00
James Zern
e3efed7f4c
Merge "convolve_copy_sse2: replace SSE w/SSE2 code"
2015-11-10 22:35:12 +00:00
Scott LaVarnway
f48321974b
Merge "VPX: x86 asm version of vpx_idct32x32_34_add()"
2015-11-10 21:40:11 +00:00
Scott LaVarnway
9aeaa2016e
VPX: x86 asm version of vpx_idct32x32_34_add()
...
Change-Id: I8a933c63b7fbf3c65e2c06dbdca9646cadd0b7cb
2015-11-10 11:54:56 -08:00
James Zern
40dab58941
convolve_copy_sse2: replace SSE w/SSE2 code
...
This should be neutral or slightly faster on modern (P4+) architectures.
Change-Id: Iec4c080275941eb8c9e05a66a2daf0405d86a69b
2015-11-09 23:45:16 -08:00
Debargha Mukherjee
65dd056e41
Merge "Optimize vpx_quantize_{b,b_32x32} assembler."
2015-10-26 18:04:49 +00:00
Ronald S. Bultje
53dc9fd0a0
vp10: merge ext_ipred_bltr experiment into misc_fixes.
...
Change-Id: I2f2deb700748408b8278b7f5c29ee1f2e39785ec
2015-10-21 22:27:34 -04:00