Commit Graph

232 Commits

Author SHA1 Message Date
James Zern
cedb1db594 Merge "Code clean of highbd_tm_predictor_4x4" 2015-12-22 16:45:01 +00:00
James Zern
a097963f80 Merge "Code clean of highbd_dc_predictor_4x4" 2015-12-22 16:30:37 +00:00
Jian Zhou
db11307502 Code clean of highbd_tm_predictor_4x4
Replace MMX with SSE2, reduce mem access to left neighbor,
loop unrolled.

Change-Id: I941be915af809025f121ecc6c6443f73c9903e70
2015-12-18 18:43:41 -08:00
Jian Zhou
c91dd55eda Code clean of highbd_v_predictor_4x4
MMX replaced with SSE2, same performance.

Change-Id: I2ab8f30a71e5fadbbc172fb385093dec1e11a696
2015-12-18 15:25:27 -08:00
Jian Zhou
8366b414dd Code clean of highbd_dc_predictor_4x4
MMX replaced with SSE2, same performance.

Change-Id: Ic57855254e26757191933c948fac6aa047fadafc
2015-12-18 12:45:23 -08:00
Jian Zhou
789dbb3131 Code clean of sad4xNx4D_sse
Replace MMX with SSE2.

Change-Id: I948ca1be6ed9b8e67f16555e226f1203726b7da6
2015-12-17 17:43:46 -08:00
Jian Zhou
b158d9a649 Code clean of sad4xN(_avg)_sse
Replace MMX with SSE2, reduce psadbw ops which may help Silvermont.

Change-Id: Ic7aec15245c9e5b2f3903dc7631f38e60be7c93d
2015-12-17 11:10:42 -08:00
James Zern
b81f04a0cc Merge "move vp9_avg to vpx_dsp" 2015-12-15 03:41:22 +00:00
James Zern
d36659cec7 move vp9_avg to vpx_dsp
Change-Id: I7bc991abea383db1f86c1bb0f2e849837b54d90f
2015-12-14 14:42:12 -08:00
Jian Zhou
88120481a4 Code clean of tm_predictor_32x32
Reallocate the xmm register usage so that no ARCH_X86_64 required.
Reduce memory access to the left neighbor by half.
Speed up by single digit on big core machine.

Change-Id: I392515ed8e8aeb02e6a717b3966b1ba13f5be990
2015-12-11 10:32:08 -08:00
Jian Zhou
c90a8a1a43 SSE2 based h_predictor_32x32
Relocate the function from SSSE3 to SSE2, Unroll loop from 16 to 8,
and reduce mem access to left.
Speed up by single digit in ./test_intra_pred_speed on big core
machines.

Change-Id: I2b7fc95ffc0c42145be2baca4dc77116dff1c960
2015-12-10 10:09:58 -08:00
Jian Zhou
aa5b517a39 Re-enable SSE2 based intra 4x4 prediction
4x4 Intra predictor implemented with MMX is replaced with SSE2.
Segfault in change 315561 when decoding vp8 is taken care of.

Change-Id: I083a7cb4eb8982954c20865160f91ebec777ec76
2015-12-07 18:50:37 -08:00
Scott LaVarnway
c7e557b82c Merge "VP9: Add ssse3 version of vpx_idct32x32_135_add()" 2015-12-07 21:13:35 +00:00
James Zern
79a9add666 Revert "MMX in intra 4x4 prediction replaced with SSE2"
This reverts commit 89a1efa4c4.

This causes a segfault when decoding vp8, in both 32 and 64-bit

Change-Id: Idbb9bb28ab897e1d055340497c47b49a12231367
2015-12-05 10:20:39 -08:00
Jian Zhou
e86c7c863e Speed up h_predictor_16x16
Relocate the function from SSSE3 to SSE2, Unroll loop from 8 to 4,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.

Change-Id: Ie48229c2e32404706b722442942c84983bda74cc
2015-12-04 12:12:55 -08:00
Jian Zhou
da3f08fac3 Speed up h_predictor_8x8
Relocate the function from SSSE3 to SSE2, Unroll loop from 4 to 2,
and reduce mem access to left.
Speed up by >20% in ./test_intra_pred_speed.

Change-Id: Ib9f1846819783b6e05e2a310c930eb844b2b4d2e
2015-12-04 11:36:44 -08:00
Jian Zhou
aa2764abdd MMX in intra 8x8 prediction replaced with SSE2
8x8 Intra predictor implemented with MMX is replaced with SSE2.

Change-Id: I0c90e7c1e1e6942489ac2bfe58903b728aac7a52
2015-12-03 18:11:06 -08:00
Jian Zhou
89a1efa4c4 MMX in intra 4x4 prediction replaced with SSE2
4x4 Intra predictor implemented with MMX is replaced with SSE2.

Change-Id: Id57da2a7c38832d0356bc998790fc1989d39eafc
2015-12-03 16:40:23 -08:00
Jian Zhou
623e988add Merge "SSE2 speed up of h_predictor_4x4" 2015-12-02 18:49:00 +00:00
Scott LaVarnway
f0b0b1fe62 VP9: Add ssse3 version of vpx_idct32x32_135_add()
Change-Id: I9a780131efaad28cf1ad233ae64c5c319a329727
2015-12-02 04:50:46 -08:00
Jian Zhou
9d29d76280 SSE2 speed up of h_predictor_4x4
Relocate h_predictor_4x4 from SSSE3 to SSE2 with XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.

Change-Id: I64e14c13b482a471449be3559bfb0da45cf88d9d
2015-11-30 10:08:05 -08:00
Scott LaVarnway
0148e20c3c VPX: x86 asm version of vpx_idct32x32_1024_add()
Change-Id: I3ba4ede553e068bf116dce59d1317347988b3542
2015-11-25 10:11:29 -08:00
Scott LaVarnway
e7fc39fdf5 Merge "VPX: x86 asm version of vpx_idct32x32_34_add()" 2015-11-20 15:11:00 +00:00
Jian Zhou
79b68626ae Speed up tm_predictor_4x4
tm_predictor_4x4 is implemented with SSE2 using XMM registers.
Speed up by ~25% in ./test_intra_pred_speed.

Change-Id: I25074b78d476a2cb17f81cf654bdfd80df2070e0
2015-11-18 16:44:25 -08:00
Scott LaVarnway
ed833048c2 VPX: x86 asm version of vpx_idct32x32_34_add()
Change-Id: Ic81f38998fb1b8d33f5a5d7424c2c41002786cef
2015-11-17 17:42:24 -08:00
James Zern
0ccad4d649 Revert "VPX: x86 asm version of vpx_idct32x32_34_add()"
This reverts commit 9aeaa2016e.

This causes some test vectors to fail.

Change-Id: I3659a2068404ec5a0591fba5c88b1bec0c9059a4
2015-11-11 11:12:38 -08:00
Scott LaVarnway
f48321974b Merge "VPX: x86 asm version of vpx_idct32x32_34_add()" 2015-11-10 21:40:11 +00:00
Scott LaVarnway
9aeaa2016e VPX: x86 asm version of vpx_idct32x32_34_add()
Change-Id: I8a933c63b7fbf3c65e2c06dbdca9646cadd0b7cb
2015-11-10 11:54:56 -08:00
Debargha Mukherjee
65dd056e41 Merge "Optimize vpx_quantize_{b,b_32x32} assembler." 2015-10-26 18:04:49 +00:00
Geza Lore
9cfba09ac0 Optimize vpx_quantize_{b,b_32x32} assembler.
Added optimization of the 8 bit assembly quantizer routines. This makes
these functions up to 100% faster, depending on encoding parameters.

This patch maskes the encoder faster in both the high bitdepth and 8bit
configurations. In the high bitdepth configuration, it effects profile 0
only.

Based on my profiling using 1080p input the net gain is between 1-3% for
the 8 bit config, and around 2.5-4.5% for the high bitdepth config,
depending on target bitrate. The difference between the 8 bit and high
bitdepth configurations for the same encoder run is reduced by 1% in all
cases I have profiled.

Change-Id: I86714a6b7364da20cd468cd784247009663a5140
2015-10-20 10:11:19 +01:00
Ronald S. Bultje
c7dc1d78bf vp10: add extended-intra prediction edges experiment.
This experiment allows using full above/right edges for all transform
sizes whenever available (for d45/d63), and adds bottom/left edges for
d207.

See issue 1043.

Change-Id: I5cf7f345e783e8539bb6b6d2c9972fb1d6d0a78b
2015-10-16 19:30:39 -04:00
Alex Converse
0c00af126d Add vpx_highbd_convolve_{copy,avg}_sse2
single-threaded:
swanky (silvermont): ~1% faster overall
peppy (celeron,haswell): ~1.5% faster overall

Change-Id: Ib74f014374c63c9eaf2d38191cbd8e2edcc52073
2015-10-09 11:50:25 -07:00
Julia Robson
37c68efee2 SSSE3 optimisation for quantize in high bit depth
When configured with high bit detpth enabled, the 8bit quantize
function stopped using optimised code. This made 8bit content
decode slowly. This commit re-enables the SSSE3 optimisations.

Change-Id: I194b505dd3f4c494e5c5e53e020f5d94534b16b5
2015-10-06 13:32:02 +01:00
Julia Robson
5e6533e707 SSE2 optimisation for quantize in high bit depth
When configured with high bit detpth enabled, the 8bit quantize
function stopped using optimised code. This made 8bit content
decode slowly. This commit re-enables the SSE2 optimisation
(but not the SSSE3 optimisation).

Change-Id: Id015fe3c1c44580a4bff3f4bd985170f2806a9d9
2015-10-05 10:59:16 -07:00
Ronald S. Bultje
3fedf4a59b Merge "vp10: reimplement d45/4x4 to match vp8 instead of vp9." 2015-10-02 17:15:59 +00:00
Debargha Mukherjee
cb5c47f20d Merge "Accelerated transform in high bit depth" 2015-10-02 06:55:55 +00:00
Ronald S. Bultje
62a1579525 vp10: reimplement d45/4x4 to match vp8 instead of vp9.
This is more a proof of concept than anything else. The problem here
isn't so much how to code it, but rather where to place the resulting
code. All intrapred DSP code lives in vpx_dsp, so do we want the vp10
specific intra pred functions to live there, or in vp10/?

See issue 1015.

Change-Id: I675f7badcc8e18fd99a9553910ecf3ddf81f0a05
2015-10-01 10:11:54 -04:00
Ronald S. Bultje
c26a9ecaa2 vp8: change build_intra4x4_predictors() to use vpx_dsp.
I've added a few new functions (d45e, d63e, he, ve) to cover the
filtered h/v 4x4 predictors that are vp8-specific, the "correct"
d45 with the correctly filtered bottom-right pixel (as opposed to
the unfiltered version in vp9), and the "broken" d63 with weirdly
filtered bottom-right pixels (which is correctly filtered in vp9).

There may be a minor performance impact on all systems because we
have to do an extra copy of the Above pixel array to incorporate
the topleft pixel in the same array (thus fitting the vpx_dsp API).
In addition, armv6 will have a more serious performance impact b/c
I removed the armv6/vp8-specific assembly. I'm not sure anyone
cares...

Change-Id: I7f9e5ebee11d8e21aca2cd517a69eefc181b2e86
2015-09-30 18:45:49 -04:00
Ronald S. Bultje
54d48955f6 vp8: change build_intra_predictors_mby_s to use vpx_dsp.
Change-Id: I2000820e0c04de2c975d370a0cf7145330289bb2
2015-09-30 18:45:40 -04:00
Julia Robson
406030d1b0 Accelerated transform in high bit depth
When configured with high bitdepth enabled, the 8bit transform
stopped using optimised code. This made 8bit content decode slowly.

Change-Id: I67d91f9b212921d5320f949fc0a0d3f32f90c0ea
2015-09-28 21:09:16 -07:00
Johann
a28b2c6ff0 Add sse2 versions of halfpix variance
These were lost in the great sub pixel variance move of
6a82f0d7fb

Not having these functions caused a ~10% performance regression in
some realtime vp8 encodes.

Change-Id: I50658483d9198391806b27899f2c0d309233c4b5
2015-08-27 11:58:38 -07:00
Scott LaVarnway
6a21ca20cc Merge "VPX ssse3 scaled convolve" 2015-08-19 22:12:21 +00:00
Scott LaVarnway
2030c49cf8 VPX ssse3 scaled convolve
Change-Id: I71d5994e21813554a927d35ebcc26bf7a68984fd
2015-08-18 15:13:02 -07:00
Jingning Han
5de049b067 Turn on dspr2 loop filter functions in vpx_dsp
Add the dspr2 files to vpx_dsp.mk and enable these functions in
vpx_dsp_rtcd_defs.pl file.

Change-Id: I79feb5af24f174f4a0788dc6f3b6df7f4e1fa467
2015-08-17 16:15:24 -07:00
Yaowu Xu
94ba3939cd vpx_highbd_ssim_parms_8x8: make parameter types consistent
Change-Id: Ie1fe6603232adc22dbe4d51bd1008c856a6d40ca
2015-08-14 09:18:07 -07:00
Jingning Han
3ee6db6c81 Fork VP9 and VP10 codebase
This commit folks the VP9 and VP10 codebase and makes libvpx
support VP8, VP9, and VP10.

Change-Id: I81782e0b809acb3c9844bee8c8ec8f4d5e8fa356
2015-08-11 17:05:28 -07:00
Scott LaVarnway
4ef08dcec8 Merge "VPX: Add rtcd support for scaling." 2015-08-11 13:19:00 +00:00
Alex Converse
26f4f2dc8e ssim: Add missing statics and consts
Change-Id: I2aa2a545bd2f8f170c66c2e267ea9d617ff10d87
2015-08-07 12:01:19 -07:00
Alex Converse
c65e79d2e5 ssim: Replace unsigned long with uint32_t.
The assembly only writes the low 4 bytes, and the HBD version only uses
uint32_t bytes.

Change-Id: Ie3694ecda511c231e55870df814cbae30e588073
2015-08-07 11:48:31 -07:00
Alex Converse
c7b7011b9b Move VP9 SSIM metrics to vpx_dsp.
Change-Id: I20c7b42631b579fade6cf7ebf6d4c69b2fcb5e5e
2015-08-06 18:25:25 -07:00
Jingning Han
d621de7e8d Change vp9_quantize to vpx_quantize
This commit clears all the vp9_ prefix use case in vpx_dsp. It gets
the vp9 folder ready to branch out vp10.

Change-Id: I2906eec179ee792b4af8c9b4161313653050e931
2015-08-04 15:31:49 -07:00
Jingning Han
08a453b9de Replace vp9_ prefix with vpx_ prefix in vpx_dsp function names
This commit clears the function naming convention in vpx_dsp. It
replaces vp9_ prefix of global functions with vpx_ prefix. It also
removes the vp9_ prefix from static functions.

Change-Id: I6394359a63b71a51dda01342eec6a3cc08dfeedf
2015-08-04 13:46:11 -07:00
Scott LaVarnway
8f6b943100 VPX: Add rtcd support for scaling.
Change-Id: If34bfb0d918967445aea7dc30cd7b55ebfedb1f2
2015-08-03 09:43:34 -07:00
Jingning Han
e8b133c79c Factor inverse transform functions into vpx_dsp
This commit moves the module inverse transform functions from vp9
to vpx_dsp folder. The hybrid transform wrapper functions stay in
the vp9 folder, since it involves codec-specific data structures.

Change-Id: Ib066367c953d3d024c73ba65157bbd70a95c9ef8
2015-07-31 16:21:00 -07:00
Zoe Liu
7cfdc00337 Refactor mips/dspr2 on convolution.
Change-Id: If59a39d5a92c261537342726f94bb7f7f26dfff3
2015-07-31 10:27:42 -07:00
Zoe Liu
7186a2dd86 Code refactor on InterpKernel
It in essence refactors the code for both the interpolation
filtering and the convolution. This change includes the moving
of all the files as well as the changing of the code from vp9_
prefix to vpx_ prefix accordingly, for underneath architectures:
(1) x86;
(2) arm/neon; and
(3) mips/msa.
The work on mips/drsp2 will be done in a separate change list.

Change-Id: Ic3ce7fb7f81210db7628b373c73553db68793c46
2015-07-31 10:27:33 -07:00
hui su
5fddefbced Exclude vpx intra prediction functions in vp8-only build
Currently vp8 is not using the intra prediction functions in vpx_dsp.

Change-Id: I1522b5f5cb12a81999fb126cf7c62c70259e7a52
2015-07-30 13:49:47 -07:00
Hui Su
4cbf36b105 Merge "Replace prefix vp9_ with vpx_ for intra prediction functions" 2015-07-29 00:38:48 +00:00
Jingning Han
d12a4a825c Merge "Replace vp9_ prefix in 2D-DCT functions with vpx_" 2015-07-29 00:07:31 +00:00
Jingning Han
fc18cf7a11 Merge "Move DC only forward 2D-DCT functions to vpx_dsp" 2015-07-29 00:06:37 +00:00
Jingning Han
4b5109cd73 Replace vp9_ prefix in 2D-DCT functions with vpx_
Clean up the forward 2D-DCT function names in vpx_dsp.

Change-Id: I3117978596d198b690036e7eb05fe429caf3bc25
2015-07-28 16:06:44 -07:00
Jingning Han
d19033fa4e Move DC only forward 2D-DCT functions to vpx_dsp
This completes the forward transform functions layout refactoring.

Change-Id: I996fb0fb795f41e2040f7b21db985774098aedbd
2015-07-28 14:52:30 -07:00
Hui Su
fe7cabe8b6 Merge "Move intra prediction functions from vp9/common/ to vpx_dsp/" 2015-07-28 20:41:01 +00:00
Jingning Han
a6a4659bea Factor 32x32 fwd DCT to vpx_dsp folder
Move the 32x32 2D-DCT implementations from vp9/ to vpx_dsp/.

Change-Id: Id3980696f8b69906ff7a59ff9fb2b9013d60047d
2015-07-28 11:13:41 -07:00
hui su
4013645353 Replace prefix vp9_ with vpx_ for intra prediction functions
Change-Id: I8ae6fb586f8d5d018ace228df11714f82b085076
2015-07-27 13:42:06 -07:00
hui su
7971846a5e Move intra prediction functions from vp9/common/ to vpx_dsp/
Change-Id: I64edc26cf4aab050c83f2d393df6250628ad43b8
2015-07-27 13:38:16 -07:00
Jingning Han
9aaf523ace Move msa implementations of 2D-DCT to vpx_dsp
Refactor and clean up the msa transform related code layout.

Change-Id: Ic5048bd3d62a6046589817da745370ea89448e44
2015-07-24 13:24:25 -07:00
Jingning Han
b67821f37b Factor forward 2D-DCT transforms into vpx_dsp
This commit factors the 4x4, 8x8, and 16x16 2D-DCT forward
transform operations into vpx_dsp folder.

Change-Id: I084b117b79c0925edcbcabb93f62b9f4bf8dbe7d
2015-07-22 15:48:17 -07:00
Parag Salasakar
2cdd3beac9 Merge "mips msa vp9 avg subpel variance optimization rebased" 2015-07-21 06:07:01 +00:00
Yunqing Wang
f65473c036 Merge "Migrate quantization functions from vp9/ to vpx_dsp/" 2015-07-20 16:20:07 +00:00
Yunqing Wang
38f1fbbb75 Migrate quantization functions from vp9/ to vpx_dsp/
The following quantization functions were moved:
vp9_quantize_b
vp9_quantize_b_32x32
vp9_highbd_quantize_b
vp9_highbd_quantize_b_32x32

vp9_quantize_dc
vp9_quantize_dc_32x32
vp9_highbd_quantize_dc
vp9_highbd_quantize_dc_32x32

The purpose of doing that was to allow these functions to be shared
by multiple codecs.

Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
2015-07-17 16:38:14 -07:00
Jingning Han
2992739b5d Rename loop filter function from vp9_ to vpx_
Change-Id: I6f424bb8daec26bf8482b5d75dd9b0e45c11a665
2015-07-17 15:55:02 -07:00
Jingning Han
50adfdf5ba Migrate loop filter functions from vp9/ to vpx_dsp/
The various tap loop filter operations are common functions across
codec. This commit moves them along with SIMD optimizations to
vpx_dsp folder.

Change-Id: Ia5fa0b2e5289cdb98467502a549c380b9c60e92c
2015-07-16 16:40:47 -07:00
Parag Salasakar
1d7f1ca7da mips msa vp9 avg subpel variance optimization rebased
Change-Id: Ia21987010dbb688e2a8fa204ca9129d2f34c9581
2015-07-08 12:07:28 +05:30
Johann
6a82f0d7fb Move sub pixel variance to vpx_dsp
Change-Id: I66bf6720c396c89aa2d1fd26d5d52bf5d5e3dff1
2015-07-07 15:51:04 -07:00
Parag Salasakar
33a0deb928 Merge "mips msa vpx_dsp sadx3 sadx8 optimization" 2015-07-07 02:10:23 +00:00
Jingning Han
432cd4bfb7 Move subtract functions from vp9 to vpx_dsp
Factor out the subtraction operator as common function.

Change-Id: I526e703477c6a290e0e3e3c8898f8bb1ca82779b
2015-07-06 12:22:47 -07:00
Parag Salasakar
6abf1aea63 mips msa vpx_dsp sadx3 sadx8 optimization
average improvement ~3x-5x

Change-Id: Ifdb4670d31ae83c4e22a4238293d1377b16c90db
2015-07-02 08:02:19 +05:30
Parag Salasakar
bc3ec8ef07 mips msa vpx_dsp sad sad4d avgsad optimization
average improvement ~3x-5x

Change-Id: Ie30748cfbedebbd544b7ef4f286055ccb7f60306
2015-07-01 11:39:43 +05:30
Parag Salasakar
2d730a289a mips msa vpx_dsp variance optimization
average improvement ~2x-4x

Change-Id: Ia3eef3f390148c2eb5cdc580a94cb26369737f82
2015-06-30 12:22:18 +05:30
Johann
c3bdffb0a5 Move variance functions to vpx_dsp
subpel functions will be moved in another patch.

Change-Id: Idb2e049bad0b9b32ac42cc7731cd6903de2826ce
2015-05-26 12:01:52 -07:00
Johann
d5d9289800 Move shared SAD code to vpx_dsp
Create a new component, vpx_dsp, for code that can be shared
between codecs. Move the SAD code into the component.

This reduces the size of vpxenc/dec by 36k on x86_64 builds.

Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
2015-05-06 16:58:20 -07:00