Commit Graph

249 Commits

Author SHA1 Message Date
James Zern
ce88d74d34 vp9_reconintra_neon: add d45 4x4
based on webp's LD4()

~59% faster over 20M pixels

Change-Id: I371eaed9ce8f470451046997e130b0ba1a2f7a9c
2015-06-18 15:25:07 -07:00
James Zern
337b221e00 vp9_reconintra_neon: add d135 4x4
based on webp's RD4()

~50% faster over 20M pixels

Change-Id: Ifcb7bf7f7fc8eabf79d9e3b219ce1be67abc524a
2015-06-18 15:25:06 -07:00
James Zern
41d8545ab6 Merge "vp9_reconintra_neon: add DC 4x4 predictors" 2015-06-18 22:24:55 +00:00
James Zern
6e44bf20f7 vp9_reconintra_neon: add DC 4x4 predictors
~85-89% faster over 20M pixels

Change-Id: I3812e8adfffe5255034da88dfe6546e12f4d10ee
2015-06-18 15:22:43 -07:00
James Zern
e77f859d72 Merge "vp9_reconintra_neon: add DC 32x32 predictors" 2015-06-18 22:17:51 +00:00
Parag Salasakar
d9fedf7832 mips msa vp9 fdct 32x32 optimization
average improvement ~4x-6x

Change-Id: Ibcac3ef8ed5e207cf8c121e696570e6b63d3c0f4
2015-06-17 07:58:34 +05:30
Parag Salasakar
fa53008fb7 Merge "mips msa vp9 fdct 16x16 optimization" 2015-06-17 01:21:59 +00:00
Parag Salasakar
89b4b315aa mips msa vp9 fdct 16x16 optimization
average improvement ~4x-6x

Change-Id: Id3b2243e5b3c7844c90c4231a5e75fa69911362c
2015-06-16 12:49:34 +05:30
James Zern
79fb3a013e vp9_reconintra_neon: add DC 32x32 predictors
~84-85% faster over 20M pixels

Change-Id: Ia67a7f4a342bf7b0a9280e05c25d81a774d90469
2015-06-15 20:57:28 -07:00
James Zern
98f0178611 enable vp9_d153_predictor_32x32_ssse3
unused since its initial commit
~91% faster over 20M pixels

Change-Id: Ic8b5b3246bc97c8406be8bc4496601370403b70a
2015-06-12 19:48:22 -07:00
Parag Salasakar
fbac961b47 mips msa vp9 filter by weight optimization
filter by weight - average improvement ~2x-3x

Change-Id: I4832033335d339cdafdce697f07ce3e643920057
2015-06-12 12:06:42 +05:30
Parag Salasakar
a2288d274c mips msa vp9 intra-pred optimization
intra pred - average improvement ~2x-3x

Change-Id: Ie3f7d6eded5ecb7ed7ee506ba8e4d98f93803b09
2015-06-06 22:29:32 +05:30
Parag Salasakar
d43fd99822 mips msa vp9 loopfilter 4, 8 optimization
average improvement ~3x-4x

Change-Id: I59279293ce4b2a1e99bd10579ac97740e943643f
2015-06-05 09:56:08 +05:30
Parag Salasakar
914f8f9ee0 mips msa vp9 loopfilter 16 optimization
average improvement ~3x-4x

Change-Id: I8ef263da6ebcf8f20aabaefeccf25a84640ba048
2015-06-04 11:50:41 +05:30
Parag Salasakar
bdfbc3e876 mips msa vp9 convolve8 avg hv optimization
average improvement ~4x-6x

Change-Id: I7c8b4f2334491be8a859592606e568bc95d019aa
2015-06-04 08:11:01 +05:30
Parag Salasakar
b8c1cdcd12 mips msa vp9 convolve8 avg horiz optimization
average improvement ~5x-8x

Change-Id: I179a69ec620fbd69979bd128f05d18113618aab4
2015-06-03 11:33:42 +05:30
Parag Salasakar
c543d38ac7 mips msa vp9 convolve8 avg vert optimization
average improvement ~4x-6x

Change-Id: Ia2e6f770da46416ebec31fdcea5cc7878879a9d9
2015-06-03 09:55:25 +05:30
Parag Salasakar
54a6f73958 mips msa vp9 idct4x4 and iwht4x4 optimization
average improvement ~3x-4x
moved assert to respective files

Change-Id: I6c915059d456a00bdd76fab0dd2eede8b6c6ea58
2015-06-02 12:16:28 +05:30
Parag Salasakar
ebf7466cd8 mips msa vp9 updated convolve horiz, vert, hv, copy, avg module
Updated sources according to improved version of common MSA macros.
Enabled respective convolve MSA hooks and tests.
Overall, this is just upgrading the code with styling changes.

Change-Id: If5ad6ef8ea7ca47feed6d2fc9f34f0f0e8b6694d
2015-06-02 12:03:51 +05:30
Parag Salasakar
6af9d7f2e2 mips msa vp9 updated idct 8x8, 16x16 and 32x32 module
Updated sources according to improved version of common MSA macros.
Enabled idct MSA hooks and tests.
Overall, this is just upgrading the code with styling changes.

Change-Id: I1f488ab2c741f6c622b7a855388a202168082209
2015-06-01 09:24:23 +05:30
Parag Salasakar
71e88f903d Merge "mips msa vp9 updated macros and disable all MSA functions" 2015-05-30 02:52:27 +00:00
James Zern
a2a13cbe5f vp9_reconintra_neon: add DC 16x16 predictors
85-89% faster over 20M pixels

Change-Id: I9b320ed6b9e67f27df738b84c8b43b65a93c50c2
2015-05-29 15:41:44 -07:00
James Zern
e97b849219 vp9_reconintra_neon: add DC 8x8 predictors
~90% faster over 20M pixels

Change-Id: Iab791510cc57c8332c2f9a5da0ed50702e5f5763
2015-05-29 15:39:08 -07:00
Parag Salasakar
f9f078ebb6 mips msa vp9 updated macros and disable all MSA functions
Done little restructuring/styling changes to the sources like generic macro definitions, their use to reduce code lines, better code alignments etc.
Disabled all MSA hooks and tests

Change-Id: Ic6f2dce0b501f46b80c06c46c0fe2043d557b190
2015-05-29 13:34:33 +05:30
Johann
c3bdffb0a5 Move variance functions to vpx_dsp
subpel functions will be moved in another patch.

Change-Id: Idb2e049bad0b9b32ac42cc7731cd6903de2826ce
2015-05-26 12:01:52 -07:00
James Zern
330fba41e2 vp9 intrinsics: add vp9_rtcd include
silences a missing declaration warning

Change-Id: I59a34e1a1377cf3529b678d7ec0122bd43ab1bf1
2015-05-15 10:43:47 -07:00
Parag Salasakar
686616a989 Merge "mips msa vp9 idct 8x8 optimization" 2015-05-13 04:36:34 +00:00
hkuang
f5574fb44c Merge "Add more sse2 code for intra prediction." 2015-05-08 17:26:30 +00:00
Parag Salasakar
7c5f00f868 mips msa vp9 idct 8x8 optimization
average improvement ~4x-6x

Change-Id: I5edf713721b9e24c7e0ce2e69d8fc3ecab625d91
2015-05-08 12:23:27 +05:30
Parag Salasakar
a8a9c2bb45 Merge "mips msa vp9 idct 32x32 optimization" 2015-05-08 04:27:44 +00:00
Johann
76a08210b6 Merge "Move shared SAD code to vpx_dsp" 2015-05-07 18:33:06 +00:00
Parag Salasakar
1601c1385a mips msa vp9 idct 32x32 optimization
average improvement ~4x-6x

Change-Id: Idaba7e49fbd7f388caee0d73773ccf6e4807ef17
2015-05-07 12:42:23 +05:30
hkuang
7153b822ed Add more sse2 code for intra prediction.
vp9_dc_left_predictor_16x16
vp9_dc_top_predictor_32x32
vp9_dc_left_predictor_32x32
vp9_dc_128_predictor_32x32

Change-Id: Ib9861deefd01c3527235b92ff6b3d571ef6b4bc6
2015-05-06 17:17:00 -07:00
Johann
d5d9289800 Move shared SAD code to vpx_dsp
Create a new component, vpx_dsp, for code that can be shared
between codecs. Move the SAD code into the component.

This reduces the size of vpxenc/dec by 36k on x86_64 builds.

Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
2015-05-06 16:58:20 -07:00
Parag Salasakar
d1cdda88bd Merge "mips msa vp9 idct 16x16 optimization" 2015-05-06 06:40:56 +00:00
James Zern
ccae5d99d2 fix and enable vp9_dc_128_predictor_16x16
widen the loads and stores to 128-bit.

this was added, but not enabled in:
493a857 Add some sse2 code for intra prediction.

Change-Id: I277d7db608a7db7d75cc0bde86f48fa66ad487e4
2015-05-05 11:40:13 -07:00
hkuang
e47811ef8f Merge "Add some sse2 code for intra prediction." 2015-05-05 17:11:07 +00:00
Parag Salasakar
60052b618f mips msa vp9 idct 16x16 optimization
average improvement ~4x-6x

Change-Id: I55e95b7f2ba403dff11813958dc7c73a900dd022
2015-05-05 12:37:06 +05:30
Yaowu Xu
2061359fcf Merge "Remove vp9_idct16x16_10_add_ssse3()" 2015-04-30 23:13:33 +00:00
hkuang
493a8579f1 Add some sse2 code for intra prediction.
Change-Id: I16c0a62e52dab62837c547345df31e7518620ed4
2015-04-30 15:42:57 -07:00
Yaowu Xu
47767609fe Remove vp9_idct16x16_10_add_ssse3()
The rotation computation using 2X of cos(pi/16) has a potential to
overflow 32 bit, this commit disable the function to allow further
investigation and optimization.

Change-Id: I4a9803bc71303d459cb1ec5bbd7c4aaf8968e5cf
2015-04-30 09:07:30 -07:00
Parag Salasakar
95cb130f32 Merge "mips msa vp9 copy and avg convolve optimization" 2015-04-30 04:39:13 +00:00
Yaowu Xu
486a73a9ce Disable ssse3 version idct16x16_256_add()
The version is currently producing different result from c version
for some input. Disable the use of it for now to allow time for
investigation the source of mismatch.

Change-Id: Id039455494ee531db4886a9f1fa4761174ef6df3
2015-04-29 16:58:59 -07:00
Parag Salasakar
2301d10f73 mips msa vp9 copy and avg convolve optimization
average improvement ~3x-5x

Change-Id: I422e4c33ea7e6d6783ba40029438ccf21b0e76bb
2015-04-29 12:28:17 +05:30
Parag Salasakar
ca90d4fd96 mips msa vp9 convolve8 horiz optimization
average improvement ~6x-8x

Change-Id: I7c91eec41aada3b0a5231dda7869b3b968f3ad18
2015-04-21 12:31:26 +05:30
Parag Salasakar
ef51c1ab5b mips msa vp9 convolve8 hv optimization
average improvement ~5x-8x

Change-Id: I3214734cb3716e742907ce0d2d7a042d953df82b
2015-04-21 09:17:49 +05:30
Parag Salasakar
2e36149ccd Merge "mips msa vp9 convolve8 vert optimization" 2015-04-18 23:39:25 -07:00
Parag Salasakar
27d083c1b9 mips msa vp9 convolve8 vert optimization
average improvement ~6x-10x

Change-Id: Ie3f3ab3a9005be84935919701e56b404e420affa
2015-04-18 08:13:04 +05:30
Marco Paniconi
f76ccce5bc Revert "Revert "Force_split on 16x16 blocks in variance partition.""
This reverts commit 004b9d83e3

Change-Id: I2f2d0bdb9368c2c07f1d29a69cd461267a3a8743
2015-04-16 17:52:13 -07:00
Yunqing Wang
004b9d83e3 Revert "Force_split on 16x16 blocks in variance partition."
This reverts commit eb8c667570.
The patch caused mismatch while using multi-threads.

Change-Id: Icd646340af25b5d91e32f03ed3ea212e00e3e0be
2015-04-14 15:19:31 -07:00
Marco
eb8c667570 Force_split on 16x16 blocks in variance partition.
Force split on 16x16 block (to 8x8) based on the minmax over the 8x8 sub-blocks.

Also increase variance threshold for 32x32, and add exit condiiton in choose_partition
(with very safe threshold) based on sad used to select reference frame.

Some visual improvement near moving boundaries.
Average gain in psnr/ssim: ~0.6%, some clips go up ~1 or 2%.
Encoding time increase (due to more 8x8 blocks) from ~1-4%, depending on clip.

Change-Id: I4759bb181251ac41517cd45e326ce2997dadb577
2015-04-13 12:05:07 -07:00
Jingning Han
93d9c50419 Merge "SSSE3 assembly implementation of 8x8 Hadamard transform" 2015-04-09 11:16:11 -07:00
Jingning Han
7f629dfca4 SSSE3 assembly implementation of 8x8 Hadamard transform
It uses about 10% less CPU cycles than the SSE2 intrinsic
implementation.

Change-Id: I91017c0c068679a214b98cdd4cff3a6facfb7499
2015-04-04 09:59:37 -07:00
James Zern
44e3640923 Merge "vp9: enable sse4 sad functions" 2015-04-03 14:57:52 -07:00
James Zern
b644384bb5 Merge "vp9: fix high-bitdepth NEON build" 2015-04-01 23:36:17 -07:00
Jingning Han
1470529f62 Refactor block_yrd function for RTC coding mode
This commit separates Hadamard transform/quantization operations
from rate and distortion computation in block_yrd. This allows one
to skip SATD computation when all transform blocks are quantized
to zero. It also uses a new block error function that skips
repeated computation of sum of squared residuals. It reduces the
CPU cycles spent on block error calculation in block_yrd by 40%.

Change-Id: I726acb2454b44af1c3bd95385abecac209959b10
2015-04-01 12:00:43 -07:00
James Zern
14e24a1297 vp9: enable sse4 sad functions
sse4 isn't set by configure or used in rtcd, correct the sad entries to
use sse4_1 without changing the signatures for now.
this was done in vp8 post-vp9 branch.

Change-Id: Ia9f1fff9f2476fdfa53ed022778dd2f708caa271
2015-03-31 21:00:55 -07:00
James Zern
8845334097 vp9: fix high-bitdepth NEON build
remove incorrect specializations in rtcd and update a configuration
check in partial_idct_test.cc

Change-Id: I20f551f38ce502092b476fb16d3ca0969dba56f0
2015-03-31 17:45:25 -07:00
Jingning Han
26d3d3af6a Enable 16x16 Hadamard transform in SATD based mode decision
This commit replaces the 16x16 2D-DCT transform with Hadamard
transform for RTC coding mode. It reduces the CPU cycles cost
on 16x16 transform by 5X. Overall it makes the speed -6 encoding
speed 1.5% faster without compromise on compression performance.

Change-Id: If6c993831dc4c678d841edc804ff395ed37f2a1b
2015-03-30 15:43:31 -07:00
Jingning Han
8c411f74e0 Hadamard transform based coding mode decision process
This commit uses Hadamard transform based rate-distortion cost
estimate for rtc coding mode decision. It improves the compression
performance of speed -6 for many hard clips at lower bit-rates.
For example, 5.5% for jimredvga, 6.7% for mmmoving, 6.1% for
niklas720p. This will introduce extra encoding cycle costs at
this point.

Change-Id: Iaf70634fa2417a705ee29f2456175b981db3d375
2015-03-30 14:46:05 -07:00
Jingning Han
1790d45252 Use variance metric for integral projection vector match
This commit replaces the SAD with variance as metric for the
integral projection vector match. It improves the search accuracy
in the presence of slight light change. The average speed -6
compression performance for rtc set is improved by 1.7%. No speed
changes are observed for the test clips.

Change-Id: I71c1d27e42de2aa429fb3564e6549bba1c7d6d4d
2015-03-01 10:42:56 -08:00
Jingning Han
ed2dc59c1b Integral projection based motion estimation
This commit introduces a new block match motion estimation
using integral projection measurement. The 2-D block and the nearby
region is projected onto the horizontal and vertical 1-D vectors,
respectively. It then runs vector match, instead of block match,
over the two separate 1-D vectors to locate the motion compensated
reference block.

This process is run per 64x64 block to align the reference before
choosing partitioning in speed 6. The overall CPU cycle cost due
to this additional 64x64 block match (SSE2 version) takes around 2%
at low bit-rate rtc speed 6. When strong motion activities exist in
the video sequence, it substantially improves the partition
selection accuracy, thereby achieving better compression performance
and lower CPU cycles.

The experiments were tested in RTC speed -6 setting:
cloud 1080p 500 kbps
17006 b/f, 37.086 dB, 5386 ms ->
16669 b/f, 37.970 dB, 5085 ms (>0.9dB gain and 6% faster)

pedestrian_area 1080p 500 kbps
53537 b/f, 36.771 dB, 18706 ms ->
51897 b/f, 36.792 dB, 18585 ms (4% bit-rate savings)

blue_sky 1080p 500 kbps
70214 b/f, 33.600 dB, 13979 ms ->
53885 b/f, 33.645 dB, 10878 ms (30% bit-rate savings, 25% faster)

jimred 400 kbps
13380 b/f, 36.014 dB, 5723 ms ->
13377 b/f, 36.087 dB, 5831 ms  (2% bit-rate savings, 2% slower)

Change-Id: Iffdb6ea5b16b77016bfa3dd3904d284168ae649c
2015-02-19 13:47:19 -08:00
Frank Galligan
e3167f7fbf Add vp9_sad32x32x4d_neon Neon intrinsic function.
On Nexus 7 speed -6 saw ~18% increase in perf.

Tested on Nexus 7, built with ndk r10d, gcc 4.9.

BUG=https://code.google.com/p/webm/issues/detail?id=908

Change-Id: I70ccdea0326750552ed946fb004507d6efe02d5c
2015-01-27 08:54:00 -08:00
Frank Galligan
9f574d0316 Add vp9_sad16x16x4d_neon Neon intrinsic function.
On Nexus 7 speed -6 saw ~15% increase in perf.

Tested on Nexus 7, built with ndk r10d, gcc 4.9.

BUG=https://code.google.com/p/webm/issues/detail?id=908

Change-Id: I4b2006b644c488f42bf06d8a22ef0e6120a96bf9
2015-01-27 08:42:17 -08:00
Frank Galligan
54fa956715 Add vp9_sad64x64x4d_neon Neon intrinsic function.
On Nexus 7 speed -6 saw ~30% increase in perf.

Tested on Nexus 7, built with ndk r10d, gcc 4.9.

BUG=https://code.google.com/p/webm/issues/detail?id=908

Change-Id: Id12af7d1883243c23e6692e898aea82299633d58
2015-01-27 08:33:40 -08:00
Frank Galligan
9f6eba419a Add Neon intrinsic vp9_fdct8x8_quant_neon
On Nexus 7 speed -5 got ~2%, -6 got ~15%, -7 and -8 got ~30%
increase in perf.

Tested on Nexus 7, built with ndk r10d, gcc 4.9.

Change-Id: I83246d63b96674d170098a572fa4fe28a05aaf51
2015-01-24 22:49:50 -08:00
JackyChen
65f60f8e8c Merge "SSE2 code for the filter in MFQE." 2015-01-23 11:08:16 -08:00
JackyChen
09673deba9 SSE2 code for the filter in MFQE.
The SSE2 code is from VP8 MFQE, reuse it in VP9. No change on VP8
side. In our testing, we achieve 2X speed by adopting this change.

Change-Id: Ib2b14144ae57c892005c1c4b84e3379d02e56716
2015-01-18 16:07:59 -08:00
Frank Galligan
6e7e1cf32f Add Neon intrinsics for vp9_avg_8x8_neon
On Nexus 7 speed -5, -6, -7, and -8 saw about a 1% increase
in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 1.5%
increase in perf for 720p.

Tested on Nexus 7, built with ndk r10d, gcc 4.9.

Change-Id: Ibf17ebfd952a6aec941719bd8306df8ec4574bee
2015-01-15 15:32:40 -08:00
Frank Galligan
ec1d8387e1 Add 64x64 sub_pel_variance Neon function
On Nexus 7 speed -5, -6, -7, and -8 saw about a 15% increase
in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 10%
increase in perf for 720p.

Tested on Nexus 7, built with ndk r10d, gcc 4.9.

Change-Id: I2fa5315845e3021c9a6e2ea47e52e68b398d8334
2015-01-14 08:36:24 -08:00
Frank Galligan
74d40cd507 Add 64x variance Neon functions
Add optimized Neon functions of:
vp9_variance32x64
vp9_variance64x32
vp9_variance64x64

On Nexus 7 speed -5 and -6 saw about a 4% increase in perf.
Speeds -7 and -8 saw about a 6% increase in perf.
Tested on Nexus 7, built with ndk r10d, gcc 4.9.

Change-Id: I5a81f13c9897eb927fa39662530f5524a0f768fa
2015-01-13 15:08:13 -08:00
Johann
377b6682f9 Disable vp9 _8_ loopfilters
Investigating https://code.google.com/p/chromium/issues/detail?id=443839

Change-Id: Ibb7485d835c5aa5e1d40f31715596ba8d208eedb
2015-01-06 19:26:11 -08:00
Jingning Han
d0f2377027 Revert "Revert "Removal of legacy zbin_extra / zbin_oq_value.""
This reverts commit 9946ee23e0.

Fix the ssse3 asm function.

Change-Id: I07f77a63aa98087626e45c4e87aa5dcafc0b0b07
2014-12-22 10:09:25 -08:00
Paul Wilkins
9946ee23e0 Revert "Removal of legacy zbin_extra / zbin_oq_value."
This reverts commit e9b586e21b.

Change-Id: I5b36e6727da6c05278d97e2c37b80c109f79bed4
2014-12-19 15:02:58 +00:00
Paul Wilkins
e9b586e21b Removal of legacy zbin_extra / zbin_oq_value.
zbin extra / zbin_oq_value was widely passed around,
hence removal touches a lot of code.

Change-Id: Idc94359735b60c38a160e4385ae09d5ca8b6b8e5
2014-12-18 16:49:11 +00:00
James Yu
aeeaa67987 VP9 common for ARMv8 by using NEON intrinsics 15
Re-write
- vp9_lpf_horizontal_4_dual_neon
in vp9_loopfilter_16_neon.c

Change-Id: Ie14f63d352f9564ad01db3939a61d91cf6d21a31
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-16 20:00:26 -08:00
James Yu
aa8dd897c1 VP9 common for ARMv8 by using NEON intrinsics 16
Add vp9_reconintra_neon.c
- vp9_v_predictor_4x4_neon
- vp9_v_predictor_8x8_neon
- vp9_v_predictor_16x16_neon
- vp9_v_predictor_32x32_neon
- vp9_h_predictor_4x4_neon
- vp9_h_predictor_8x8_neon
- vp9_h_predictor_16x16_neon
- vp9_h_predictor_32x32_neon
- vp9_tm_predictor_4x4_neon
- vp9_tm_predictor_8x8_neon
- vp9_tm_predictor_16x16_neon
- vp9_tm_predictor_32x32_neon

Change-Id: Ib5d54a4766a1b5127169045659974f33aa98376d
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-16 12:57:52 -08:00
James Yu
4f856cd7fa VP9 common for ARMv8 by using NEON intrinsics 06
Add vp9_iht8x8_add_neon.c
- vp9_iht8x8_64_add_neon

The assembly did not previously implement tx_type 0
BUG=716

Change-Id: Icfc99dd24f3d59047f9184a7d0c761ba7e3de934
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-15 12:18:06 -08:00
James Yu
6b71013277 VP9 common for ARMv8 by using NEON intrinsics 05
Add vp9_iht4x4_add_neon.c
- vp9_iht4x4_16_add_neon

The assembly did not previously implement tx_type 0
BUG=715

Change-Id: I60034d1568de034edba45c5cdd13f3d87dbc73b6
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-15 12:16:19 -08:00
James Yu
3f7c12dab9 VP9 common for ARMv8 by using NEON intrinsics 18
Add vp9_idct32x32_add_neon.c
- vp9_idct32x32_1024_add_neon

Change-Id: Ic598b772c28bd3487a8ead7a4598a66b25f9b00f
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 18:20:04 -08:00
James Yu
3cfed4bf76 VP9 common for ARMv8 by using NEON intrinsics 14
Add vp9_idct16x16_add_neon.c
- vp9_idct16x16_256_add_neon_pass1
- vp9_idct16x16_256_add_neon_pass2
- vp9_idct16x16_10_add_neon_pass1
- vp9_idct16x16_10_add_neon_pass2

Change-Id: I54d25b54a36f4371760f54e4036693aaea40a5de
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 18:19:54 -08:00
James Yu
ce76aeb00d VP9 common for ARMv8 by using NEON intrinsics 13
Add vp9_idct8x8_add_neon.c
- vp9_idct8x8_64_add_neon
- vp9_idct8x8_10_add_neon

Change-Id: I6ee7b4496765aa36ed52990f2ef73e9f24459610
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 14:56:54 -08:00
James Yu
8c25f4af6a VP9 common for ARMv8 by using NEON intrinsics 12
Add vp9_idct4x4_add_neon.c
- vp9_idct4x4_16_add_neon

Change-Id: I011a96b10f1992dbd52246019ce05bae7ca8ea4f
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 14:49:59 -08:00
James Yu
420f58f2d2 VP9 common for ARMv8 by using NEON intrinsics 11
Add vp9_idct16x16_1_add_neon.c
- vp9_idct16x16_1_add_neon

Change-Id: I7c6524024ad4cb4e66aa38f1c887e733503c39df
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 13:06:58 -08:00
James Yu
030ca4d0e5 VP9 common for ARMv8 by using NEON intrinsics 10
Add vp9_idct32x32_1_add_neon.c
- vp9_idct32x32_1_add_neon

Change-Id: If9ffe9a857228f5c67f61dc2b428b40965816eda
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 13:04:29 -08:00
James Yu
2772b45ac0 VP9 common for ARMv8 by using NEON intrinsics 09
Add vp9_idct8x8_1_add_neon.c
- vp9_idct8x8_1_add_neon

Change-Id: I9d23e01fa96013febbf64db6c76c6c955f14e3ff
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 12:52:33 -08:00
James Yu
9114f0afdb VP9 common for ARMv8 by using NEON intrinsics 08
Add vp9_idct4x4_1_add_neon.c
- vp9_idct4x4_1_add_neon

Change-Id: Ieab9af107dbd07a4f9503bc945890c90faccb8ac
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-10 12:49:28 -08:00
James Yu
01fc6f51e0 VP9 common for ARMv8 by using NEON intrinsics 07
Add vp9_convolve8_neon.c
- vp9_convolve8_horiz_neon
- vp9_convolve8_vert_neon

Change-Id: I0bdd99ff72d275223fe211ac7243c25a5a60cf87
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:03:07 -08:00
James Yu
893534a996 VP9 common for ARMv8 by using NEON intrinsics 04
Add vp9_convolve8_avg_neon.c
- vp9_convolve8_avg_horiz_neon
- vp9_convolve8_avg_vert_neon

Change-Id: I617971e37b02186fec5aca181f4f9622050ea2df
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:03:07 -08:00
James Yu
d12757f5c6 VP9 common for ARMv8 by using NEON intrinsics 03
Add vp9_copy_neon.c
- vp9_convolve_copy_neon

Change-Id: I291fc5423d06240876411bbceab03eae5ef585be
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 20:02:46 -08:00
Scott LaVarnway
617382a2e3 VP9 common for ARMv8 by using NEON intrinsics 02
Add vp9_avg_neon.c
- vp9_convolve_avg_neon

Change-Id: Id2c9d5bcfa37cff1a16417aba1656ff07bdf10fd
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 19:00:21 -08:00
James Yu
5b098b1825 VP9 common for ARMv8 by using NEON intrinsics 01
Add vp9_loopfilter_neon.c
- vp9_lpf_horizontal_4_neon
- vp9_lpf_vertical_4_neon
- vp9_lpf_horizontal_8_neon
- vp9_lpf_vertical_8_neon

Change-Id: I97a0d7b399a431c21ee77396be3d5f5a1f7ebccb
Signed-off-by: James Yu <james.yu@linaro.org>
2014-12-09 12:26:56 -08:00
Peter de Rivaz
a306bd8274 Use the RTC optimizations when in high bitdepth mode.
Change 72193 made the encoder behave differently
when configured with and without high bitdepth.
This change means the same algorithm is used for both.

Change-Id: I707a44a94afca773a9e0c2f7ebeeea83030257c5
2014-12-04 15:48:42 -08:00
Marco
8fd3f9a2fb Enable non-rd mode coding on key frame, for speed 6.
For key frame at speed 6: enable the non-rd mode selection in speed setting
and use the (non-rd) variance_based partition.

Adjust some logic/thresholds in variance partition selection for key frame only (no change to delta frames),
mainly to bias to selecting smaller prediction blocks, and also set max tx size of 16x16.

Loss in key frame quality (~0.6-0.7dB) compared to rd coding,
but speeds up key frame encoding by at least 6x.
Average PNSR/SSIM metrics over RTC clips go down by ~1-2% for speed 6.

Change-Id: Ie4845e0127e876337b9c105aa37e93b286193405
2014-12-03 09:18:08 -08:00
Peter de Rivaz
7e40a55ef9 Added high bitdepth sse2 transform functions
Also removes some spurious changes in common/vp9_blockd.h which
was introduced by a rebase issue between nextgen and master branches.

Change-Id: If359f0e9a71bca9c2ba685a87a355873536bb282
(cherry picked from commit 005d80cd05)
(cherry picked from commit 08d2f54800)
(cherry picked from commit 4230c2306c)
2014-12-02 11:16:24 -08:00
Debargha Mukherjee
02355a4abf Merge "Added highbitdepth sse2 acceleration for quantize" 2014-11-21 16:08:47 -08:00
Peter de Rivaz
a7b2d09f36 Added highbitdepth sse2 acceleration for quantize
Also includes block error.

(This patch is mostly cherry picked from
commit db7192e0b0)

Change-Id: Idef18f90b111a0d0c9546543d3347e551908fd78
2014-11-19 23:55:19 -08:00
Jingning Han
c42715b721 Enable ssse3 version of vp9_fdct8x8_quant
It improves the speed performance of vp9_fdct8x8_quant_sse2 by
about 5%.

Change-Id: I74b093ba4d81df64caf71ac7693f3d917f673097
2014-11-19 22:14:19 -08:00
Jingning Han
bf63652d34 Merge "Combine fdct8x8 and quantization process" 2014-11-19 11:17:44 -08:00
Jingning Han
ce77a7bcb0 Merge "Add sse2 version for vp9_quantize_fp" 2014-11-19 11:17:36 -08:00