vpx/vpx_dsp/arm/idct_neon.asm
James Zern 3ae25974fd idct,NEON: add a tran_low_t->s16 load adapter
enable idct4x4* and idct8x8* which are compatible for 8-bit decodes in
high-bitdepth mode. the adapter narrows 32-bit input to 16; whether the
expansion can be avoided at all in this case remains a TODO. roughly
matches sse2.

BUG=webm:1294

Change-Id: I3ea94e5a2070dfd509b5de0c555aab4e1f4da036
2016-10-31 11:21:16 -07:00

;
; Copyright (c) 2016 The WebM project authors. All Rights Reserved.
;
; Use of this source code is governed by a BSD-style license
; that can be found in the LICENSE file in the root of the source
; tree. An additional intellectual property rights grant can be found
; in the file PATENTS. All contributing project authors may
; be found in the AUTHORS file in the root of the source tree.
;

    INCLUDE ./vpx_config.asm

    ; Helper function used to load tran_low_t into int16, narrowing if
    ; necessary.
    ; $dst0..3 are d registers with the pairs assumed to be contiguous in
    ; non-high-bitdepth builds. q0-q3 are used as temporaries in high-bitdepth.
    MACRO
    LOAD_TRAN_LOW_TO_S16 $dst0, $dst1, $dst2, $dst3, $src
    IF CONFIG_VP9_HIGHBITDEPTH
    vld1.s32        {q0,q1}, [$src]!
    vld1.s32        {q2,q3}, [$src]!
    vmovn.i32       $dst0, q0
    vmovn.i32       $dst1, q1
    vmovn.i32       $dst2, q2
    vmovn.i32       $dst3, q3
    ELSE
    vld1.s16        {$dst0-$dst1,$dst2-$dst3}, [$src]!
    ENDIF
    MEND
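
The macro's effect can be sketched as a scalar C model. This is an illustration rather than code from libvpx: the function name `load_tran_low_to_s16_model` is hypothetical, and the `tran_low_t` typedef is an assumption that follows the commit message (32-bit in high-bitdepth builds, 16-bit otherwise). Both branches of the macro consume 16 coefficients and post-increment `$src`, and `vmovn.i32` keeps the low 16 bits of each lane without saturating, so the model does the same.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Assumption: stands in for the build-time flag from vpx_config. */
#ifndef CONFIG_VP9_HIGHBITDEPTH
#define CONFIG_VP9_HIGHBITDEPTH 1
#endif

/* Assumption: 32-bit coefficients in high-bitdepth builds, 16-bit otherwise. */
#if CONFIG_VP9_HIGHBITDEPTH
typedef int32_t tran_low_t;
#else
typedef int16_t tran_low_t;
#endif

/* Hypothetical scalar model of LOAD_TRAN_LOW_TO_S16.
 * Copies 16 coefficients from src into dst, keeping only the low 16 bits of
 * each one (like vmovn.i32, no saturation), and returns the advanced source
 * pointer to mirror the post-increment of $src. */
static const tran_low_t *load_tran_low_to_s16_model(const tran_low_t *src,
                                                    int16_t dst[16]) {
  assert(src != NULL && dst != NULL);
  for (int i = 0; i < 16; ++i) {
    /* Conversion to uint16_t is well-defined (modulo 2^16). */
    const uint16_t low = (uint16_t)src[i];
    /* Reinterpret the bits as signed without relying on
     * implementation-defined narrowing conversions. */
    memcpy(&dst[i], &low, sizeof(low));
  }
  return src + 16;
}
```

In the 8-bit decode case that the commit targets, the coefficients are expected to fit in 16 bits, so dropping the upper half loses no information. The model makes it easy to see that an out-of-range value would wrap rather than clamp.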