Commit Graph

34 Commits

Author SHA1 Message Date
Johann
d5d9289800 Move shared SAD code to vpx_dsp
Create a new component, vpx_dsp, for code that can be shared
between codecs. Move the SAD code into the component.

This reduces the size of vpxenc/dec by 36k on x86_64 builds.

Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a
2015-05-06 16:58:20 -07:00
Johann
6eec73a747 Remove asm offset dependencies
The obj_int_extract code is no longer worth maintaining. It creates
significant issues when adapting for different build systems and no
longer offers as significant of a performance benefit due to
improvements in intrinsics.

Source files will remain until the various third-party builds are updated.

The neon fast quantizer has been moved to intrinsics. The armv6 version
has been removed because so few remaining targets require it.

Compilers and processors have improved significantly since the
pack_tokens code was written. The assembly is no longer faster than the
C code.

pack_tokens were the only optimizations for the armv5te targets so the targets
will be removed after the test infrastructure has been updated.

BUG=710

Change-Id: Ic785b167cd9f95eeff31c7c76b7b736c07fb30eb
2014-11-06 16:00:01 -08:00
Johann
2134eb2f05 Remove pair quantization
The intrinsics version of the pair quant is slower than running it
individually.

Change-Id: I7b4ea8599d4aab04be0a5a0c59b8b29a7fc283f4
2014-10-31 13:42:55 -07:00
Johann
7ae75c3d52 vp8 quantization -> intrinsics
Use intrinsics for neon quantization. Slight loss (<5%) of performance
compared to the assembly. Roughly 10x faster on arm64 because that was
running C code before.

Change-Id: I7cf5242d8f29b7eab5bca6a1c20c89c9fc9ca66d
2014-10-31 13:42:13 -07:00
Scott LaVarnway
fe2cc873dc VP8 encoder for ARMv8 by using NEON intrinsics 1
Add vp8_mse16x16_neon.c
- vp8_mse16x16_neon
- vp8_get4x4sse_cs_neon

Change-Id: I108952f60a9ae50613f0ce3903c2c81df19d99d0
Signed-off-by: James Yu <james.yu@linaro.org>
2014-09-15 12:04:09 -07:00
Johann
23bed46ddd Move vp8_variance_halfpixvar16x16_*_neon definitions
These functions moved from 'neon_asm' to 'neon' in
9293d267d2

Change-Id: I9cb5626c3eec96876a73fb18f2bfc982a5858edb
2014-09-09 08:21:36 -07:00
Jia Jia
395f2e874b vp8 encoder: remove vp8_yv12_copy_partial_frame_neon
Use generic C implementation instead of neon-specific code

Change-Id: Ib322b4ece9cdbd4de76a9eed3d2e9fd1d8542406
2014-09-08 08:59:24 -07:00
Scott LaVarnway
ec94967ffe Revert "Revert "VP8 for ARMv8 by using NEON intrinsics 10""
This reverts commit 677fb5123e

Compiles with 4.6.

Change-Id: I7f87048911b6bc28a61741d95501fa45ee97b819
2014-09-04 08:51:20 -07:00
Scott LaVarnway
dcbfacbb98 Neon version of vp8_build_intra_predictors_mby_s() and
vp8_build_intra_predictors_mbuv_s().

This patch replaces the assembly version with an intrinsic
version.

On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~2.6%.

Change-Id: I9ef65bad929450c0215253fdae1c16c8b4a8f26f
2014-09-03 13:41:27 -07:00
Johann
5b788c0cbe Merge "Revert "Revert "VP8 for ARMv8 by using NEON intrinsics 06" This reverts commit 81ad047ee5. Revert "VP8 for ARMv8 by using NEON intrinsics 15" This reverts commit 727af7cebe3698b8493ba6c1360b0a6606c310fb."" 2014-09-03 13:27:11 -07:00
Scott LaVarnway
652ef29d09 Revert "Revert "VP8 for ARMv8 by using NEON intrinsics 08""
This reverts commit 928ff03889

Compiles with 4.6 now.

Change-Id: Ib455da1098bb0e0623248be07579882a425fcbd1
2014-08-29 13:29:36 -07:00
Johann
911e96a4eb Revert "Revert "VP8 for ARMv8 by using NEON intrinsics 06" This reverts commit 81ad047ee5. Revert "VP8 for ARMv8 by using NEON intrinsics 15" This reverts commit 727af7cebe3698b8493ba6c1360b0a6606c310fb."
This reverts commit 920f803f2e

Change-Id: I410d9036214a1b18427cca70b4bc6d8239740737
2014-08-20 09:41:50 -07:00
James Yu
eed005b076 VP8 encoder for ARMv8 by using NEON intrinsics 6
Add shortfdct_neon.c
- vp8_short_fdct4x4_neon
- vp8_short_fdct8x4_neon

Change-Id: I90152c803b484f5fab839473d632c50af0524e68
Signed-off-by: James Yu <james.yu@linaro.org>
2014-08-20 09:25:29 -07:00
James Yu
6d6fdd9c3d VP8 encoder for ARMv8 by using NEON intrinsics 3
Add subtract_neon.c
- vp8_subtract_b_neon
- vp8_subtract_mby_neon
- vp8_subtract_mbuv_neon

Change-Id: If9a17a093478552e3e3276eeaa3f098b9021d08c
Signed-off-by: James Yu <james.yu@linaro.org>
2014-08-20 09:20:55 -07:00
Scott LaVarnway
8013aaa10b VP8 encoder for ARMv8 by using NEON intrinsics 2
Add vp8_shortwalsh4x4_neon.c
- vp8_short_walsh4x4_neon

Change-Id: Ica5f584be608c9e636f62db14f563757e94be09b
Signed-off-by: James Yu <james.yu@linaro.org>
2014-08-20 09:19:23 -07:00
Johann
018fd12cf6 Disable vp8_sixtap_predict4x4_neon
Crashing on iOS encodes
BUG=817

Change-Id: I2c50f8d359563d15a8b8002b6091184c0f9818af
2014-07-18 14:08:52 -07:00
Scott LaVarnway
a4b7ae7e82 Neon version of vp8_denoiser_filter_uv()
The encoder performance improved by 5% (vs "C")
for the test clip used.

Change-Id: I866b35eb2a06092edce7b37fc409562d0dacd7e7
2014-06-27 11:03:58 -07:00
Marco Paniconi
91ccad2179 Merge "vp8: Add temporal denoising for UV-channel." 2014-06-26 13:03:50 -07:00
Scott LaVarnway
94ae0430d2 vp8: Add temporal denoising for UV-channel.
C version and sse2 version, and off by default.
For the test clip used, the sse2 performance improved by ~5.6%

Change-Id: Ic2d815968849db51b9d62085d7a490d0e01574f6
2014-06-26 11:45:42 -07:00
Johann
0d3ed089f1 sse4 regular quantize
Change-Id: Ibd95df0adf9cc9143006ee9032b4cb2ebfd5dd1b
2014-06-18 14:26:16 -07:00
Marco Paniconi
6da66e1114 vp8: Add increase_denoising parameter to denoiser.
Change-Id: I96ed73e109c4f89dd06f3583cf7ecf9277401fae
2014-05-16 15:06:59 -07:00
Johann
c511d79c08 Merge "Remove intermediate step in vp8_dequantize_b" 2014-05-16 07:33:52 -07:00
Johann
1bec51d666 Merge "Build armv7a-only code" 2014-05-15 12:26:24 -07:00
Jim Bankoski
ec82d2dfec Merge "Revert "Remove Wextra warnings from vp9_sad.c"" 2014-05-15 11:54:23 -07:00
Jim Bankoski
a16794dd31 Revert "Remove Wextra warnings from vp9_sad.c"
This reverts commit 7ab9a9587b

Nightly test http://build.webmproject.org/jenkins/view/libvpx-nightly-tests/job/libvpx%20unit%20tests%20(valgrind-2)/arch=x86_64-linux-gcc,filter=-*VP8*:*Large.*/276/console

Failed 

This patch did not address all the assembly issues 
some of the vp8 assembly counts on 5 arguments being passed in to this function:   

one example : vp8_sad8x16_wmt

Please address or split this into vp9 and vp8 patches.

Change-Id: I78afcc171649894f887bb8ee3c66de24aaddc7ca
2014-05-15 08:31:20 -07:00
Johann
2f6f955a17 Remove intermediate step in vp8_dequantize_b
With the intrinsics it is no longer necessary to have a stub/helper
function.

Change-Id: I3695961c3c94f1bb750d3b7b29716e509ebba482
2014-05-14 12:24:18 -07:00
Johann
4dcc6d9707 Build armv7a-only code
Allow disabling the more generic NEON code.
Use filtered option to disable rtcd code.

Change-Id: Icb4500c1a2bac16eed3c5e3ec0c35e92e6bbbb9f
2014-05-14 12:23:33 -07:00
Marco Paniconi
f017b0d21c Merge "Revert "Revert "Remove struct params from vp8_denoiser_filter""" 2014-05-14 11:00:56 -07:00
Marco Paniconi
96d1946e87 Revert "Revert "Remove struct params from vp8_denoiser_filter""
This reverts commit 06e6d56fa1

Change-Id: If95598385b693945d6b144d03b6da8f6a57dac98
2014-05-14 10:55:53 -07:00
Deb Mukherjee
7ab9a9587b Remove Wextra warnings from vp9_sad.c
As a side-effect, the max_sad check is removed from the
C-implementation of VP8, for consistency with VP9, and to
ensure that the SAD tests common to VP8/VP9 pass.
That will make the VP8 C implementation of sad a little slower
but given that is rarely used in practice, the impact will be
minimal.

Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca
2014-05-14 03:17:31 -07:00
Johann
ce23931a3f Only build neon assembly for armv7 targets
Allow selectively building just the intrinsics for armv8

Change-Id: I2f29b2e4508b8b8e5649c2906b3159ad1d4ec477
2014-05-12 08:52:02 -07:00
Frank Galligan
06e6d56fa1 Revert "Remove struct params from vp8_denoiser_filter"
This reverts commit e516a42527

Change-Id: I7c78712acc737ad5f580181cdab3aa76b23f3ca5
2014-05-07 16:19:20 -07:00
Scott LaVarnway
e516a42527 Remove struct params from vp8_denoiser_filter
This eliminates the asm_offsets dependency for future
all-assembly versions of this function.

Change-Id: I3227073ecfcb8ee6e593934fab941e9081abdda0
2014-05-02 10:31:52 -07:00
James Zern
805078a1bf build: convert rtcd.sh to perl
significantly speeds up file generation.

the goal of this change is to convert rtcd.sh to perl as directly as
possible to allow for simple comparison. future changes can make it more
perl-like.

---
Linux
    [CREATE] vpx_scale_rtcd.h
real    0m0.485s ->    0m0.022s
    [CREATE] vp8_rtcd.h
real    0m4.619s ->    0m0.060s
    [CREATE] vp9_rtcd.h
real    0m10.102s ->    0m0.087s

Windows
    [CREATE] vpx_scale_rtcd.h
real    0m8.360s ->    0m0.080s
    [CREATE] vp8_rtcd.h
real    1m8.083s ->    0m0.160s
    [CREATE] vp9_rtcd.h
real    2m6.489s ->    0m0.233s

Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
2014-03-03 14:47:11 -08:00