Remove trailing whitespace
Most trailing whitespace was already removed from the C++ source files
in ff6b669176, but other files were left unchanged.
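A cleanup of this kind can be scripted. The commit does not say how the whitespace was removed, so the following is only a minimal sketch of one possible approach; the function name `strip_trailing_whitespace` is hypothetical.

```python
def strip_trailing_whitespace(text):
    """Return text with spaces and tabs removed from the end of every line.

    Line endings are preserved: a CR before the LF is kept in place.
    """
    cleaned = []
    for line in text.split("\n"):
        if line.endswith("\r"):
            # CRLF file: strip blanks in front of the CR, then restore it.
            cleaned.append(line[:-1].rstrip(" \t") + "\r")
        else:
            cleaned.append(line.rstrip(" \t"))
    return "\n".join(cleaned)
```

Applied to a whole file, this changes only lines that end in spaces or tabs, which is why the hunks below show old and new lines that look identical.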
parent 8f9a5469be
commit f9dea46712
28
README.md
@ -1,7 +1,7 @@
|
||||
OpenH264
|
||||
=======
|
||||
OpenH264 is a codec library which supports H.264 encoding and decoding. It is suitable for use in real time applications such as WebRTC. See http://www.openh264.org/ for more details.
|
||||
|
||||
|
||||
Encoder Features
|
||||
------------------------
|
||||
- Constrained Baseline Profile up to Level 5.2 (4096x2304)
|
||||
@ -17,10 +17,10 @@ Encoder Features
|
||||
- Single reference frame for inter prediction
|
||||
- Multiple reference frames when using LTR and/or 3-4 temporal layers
|
||||
- Periodic and on-demand Instantaneous Decoder Refresh (IDR) frame insertion
|
||||
- Dynamic changes to bit rate, frame rate, and resolution
|
||||
- Dynamic changes to bit rate, frame rate, and resolution
|
||||
- Annex B byte stream output
|
||||
- YUV 4:2:0 planar input
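To make the input format concrete: in 8-bit YUV 4:2:0 planar layout, a full-resolution luma (Y) plane is stored first, followed by two chroma planes that are each subsampled by two horizontally and vertically. The sketch below computes the plane and frame sizes under those assumptions. `yuv420_plane_sizes` and `yuv420_frame_size` are hypothetical helpers, not part of the OpenH264 API, and they ignore any stride padding a real buffer may have.

```python
def yuv420_plane_sizes(width, height):
    """Return (y_bytes, u_bytes, v_bytes) for an 8-bit 4:2:0 planar frame."""
    luma = width * height
    # Each chroma plane is halved in both dimensions; round up for odd sizes.
    chroma = ((width + 1) // 2) * ((height + 1) // 2)
    return luma, chroma, chroma


def yuv420_frame_size(width, height):
    """Total bytes of one tightly packed 8-bit 4:2:0 planar frame."""
    return sum(yuv420_plane_sizes(width, height))
```

For the largest picture size listed above, `yuv420_frame_size(4096, 2304)` evaluates to 14155776 bytes, which is 1.5 bytes per pixel.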
|
||||
|
||||
|
||||
Decoder Features
|
||||
------------------------
|
||||
- Constrained Baseline Profile up to Level 5.2 (4096x2304)
|
||||
@ -32,7 +32,7 @@ Decoder Features
|
||||
- Multiple reference frames when specified in Sequence Parameter Set (SPS)
|
||||
- Annex B byte stream input
|
||||
- YUV 4:2:0 planar output
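As an illustration of what "Annex B byte stream" means: NAL units are delimited by the start code `00 00 01`, optionally preceded by extra zero bytes. The sketch below splits such a stream into NAL units. It is a simplified illustration rather than the decoder's actual parser: `split_annex_b` is a hypothetical name, and the sketch does not remove emulation prevention bytes (`00 00 03`) from the payloads.

```python
START_CODE = b"\x00\x00\x01"


def split_annex_b(stream):
    """Split an Annex B byte stream into a list of NAL units (as bytes)."""
    nal_units = []
    pos = stream.find(START_CODE)
    while pos != -1:
        begin = pos + len(START_CODE)
        nxt = stream.find(START_CODE, begin)
        end = len(stream) if nxt == -1 else nxt
        # Zero bytes in front of the next start code are padding or the
        # extra leading zero of a four-byte start code; a NAL unit itself
        # never ends in 0x00, so stripping them is safe.
        nal = stream[begin:end].rstrip(b"\x00")
        if nal:
            nal_units.append(nal)
        pos = nxt
    return nal_units
```

The first byte of each returned unit is the NAL header, whose low five bits give the NAL unit type.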
|
||||
|
||||
|
||||
OS Support
|
||||
----------------
|
||||
- Windows 64-bit and 32-bit (initial release is only 32-bit, 64-bit will follow soon)
|
||||
@ -40,7 +40,7 @@ OS Support
|
||||
- Linux 64-bit and 32-bit (initial release is only 32-bit, 64-bit will follow soon)
|
||||
- Android 32-bit (initial release does not include this target, will follow soon)
|
||||
- iOS 64-bit and 32-bit (not supported yet, may be added in the future)
|
||||
|
||||
|
||||
Processor Support
|
||||
-------------------------
|
||||
- Intel x86 optionally with MMX/SSE (no AVX yet, help is welcome)
|
||||
@ -53,37 +53,37 @@ Linux makefiles for 32 bit builds are available:
|
||||
: build the decoder library and executable via codec/build/linux/dec/makefile
|
||||
: build the encoder library and executable via codec/build/linux/enc/makefile
|
||||
: build the encoder shared library via processing/build/linux/makefile
|
||||
|
||||
|
||||
Windows Visual Studio 2008/2010/2012 projects are available:
|
||||
: build the decoder via the Visual Studio projects in codec/build/win32/dec
|
||||
: build the encoder via the Visual Studio projects in codec/build/win32/enc
|
||||
: build the encoder shared library via the Visual Studio projects in processing/build/win32/
|
||||
|
||||
|
||||
NASM needs to be installed to build the assembly code: version 2.07 or above is known to work. NASM can be downloaded from http://www.nasm.us/
|
||||
|
||||
|
||||
API details to be provided later.
|
||||
|
||||
|
||||
Using the Test App
|
||||
-------------------------
|
||||
Linux shell scripts to build the test apps:
|
||||
: build via testbin/AutoBuild_Linux.sh
|
||||
: clean via testbin/AutoClean_Linux.sh
|
||||
|
||||
|
||||
Windows batch files to build the test apps:
|
||||
: Visual Studio 2008 use testbin/AutoBuild_Windows_VS2008.bat
|
||||
: Visual Studio 2010 use testbin/AutoBuild_Windows_VS2010.bat
|
||||
: Visual Studio 2012 use testbin/AutoBuild_Windows_VS2012.bat
|
||||
|
||||
|
||||
Usage information can be found in testbin/CmdLineReadMe
|
||||
Command line options and details to be provided later.
|
||||
|
||||
|
||||
Using the Source
|
||||
-----------------------
|
||||
codec - encoder, decoder, console (test app), build (makefile, vcproj)
|
||||
processing - raw pixel processing (used by encoder)
|
||||
testbin - autobuild scripts, test app config files, yuv test files
|
||||
bin - binaries for library and test app
|
||||
|
||||
|
||||
Known Issues
|
||||
-------------------
|
||||
See the issue tracker on https://github.com/cisco/openh264/issues
|
||||
@ -91,7 +91,7 @@ See the issue tracker on https://github.com/cisco/openh264/issues
|
||||
- Encoder errors when compressed frame size exceeds half uncompressed size
|
||||
- Encoder console app only supports widths/heights that are multiples of 16 for now
|
||||
- Decoder errors when compressed frame size exceeds 1MB
|
||||
|
||||
|
||||
License
|
||||
----------
|
||||
BSD, see LICENSE file for details.
|
||||
|
@ -19,7 +19,7 @@ def make_o(x):
|
||||
def write_cpp_rule(f, x):
|
||||
src = "$(%s_SRCDIR)/%s"%(PREFIX, x)
|
||||
dst = "$(%s_SRCDIR)/%s"%(PREFIX, make_o(x))
|
||||
|
||||
|
||||
f.write("%s: %s\n"%(dst, src))
|
||||
f.write('\t$(CXX) $(CFLAGS) $(CXXFLAGS) $(INCLUDES) $(' + PREFIX + '_CFLAGS) $(' + PREFIX + '_INCLUDES) -c -o ' + dst + ' ' + src + '\n');
|
||||
f.write("\n")
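To see what `write_cpp_rule` emits, the function can be run against an in-memory buffer. The sketch below copies the function body from the hunk above; the `PREFIX` value, the `make_o` body (only its signature appears in this diff) and the source path are assumptions made up for illustration.

```python
import io
import os

PREFIX = "DECODER"  # hypothetical; the real script sets this elsewhere


def make_o(x):
    # Assumed behaviour: swap the source file extension for ".o".
    return os.path.splitext(x)[0] + ".o"


def write_cpp_rule(f, x):
    src = "$(%s_SRCDIR)/%s" % (PREFIX, x)
    dst = "$(%s_SRCDIR)/%s" % (PREFIX, make_o(x))

    # Target line, then a tab-indented compile command, then a blank line.
    f.write("%s: %s\n" % (dst, src))
    f.write('\t$(CXX) $(CFLAGS) $(CXXFLAGS) $(INCLUDES) $(' + PREFIX +
            '_CFLAGS) $(' + PREFIX + '_INCLUDES) -c -o ' + dst + ' ' + src + '\n')
    f.write("\n")


buf = io.StringIO()
write_cpp_rule(buf, "core/src/utils.cpp")  # hypothetical source path
rule = buf.getvalue()
```

With these assumed values, `rule` holds a three-line makefile fragment: the `utils.o: utils.cpp` dependency line, the `$(CXX)` recipe line, and an empty separator line.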
|
||||
@ -27,7 +27,7 @@ def write_cpp_rule(f, x):
|
||||
def write_asm_rule(f, x):
|
||||
src = "$(%s_SRCDIR)/%s"%(PREFIX, x)
|
||||
dst = "$(%s_SRCDIR)/%s"%(PREFIX, make_o(x))
|
||||
|
||||
|
||||
f.write("%s: %s\n"%(dst, src))
|
||||
f.write('\t$(ASM) $(ASMFLAGS) $(ASM_INCLUDES) $(' + PREFIX + '_ASMFLAGS) $(' + PREFIX + '_ASM_INCLUDES) -o ' + dst + ' ' + src + '\n');
|
||||
f.write("\n")
|
||||
@ -70,7 +70,7 @@ f.write("%s_SRCDIR=%s\n"%(PREFIX, args.directory))
|
||||
f.write("%s_CPP_SRCS=\\\n"%(PREFIX))
|
||||
for c in cpp:
|
||||
f.write("\t$(%s_SRCDIR)/%s\\\n"%(PREFIX, c))
|
||||
f.write("\n")
|
||||
f.write("\n")
|
||||
f.write("%s_OBJS += $(%s_CPP_SRCS:.cpp=.o)\n"%(PREFIX, PREFIX))
|
||||
|
||||
f.write("ifeq ($(USE_ASM), Yes)\n");
|
||||
|
@ -25,7 +25,7 @@ GCC = gcc -m32
|
||||
ASFLAGS= -f elf -DNOPREFIX -I ../../../decoder/core/asm/
|
||||
|
||||
LIBS= -lstdc++ -ldl
|
||||
#-lm
|
||||
#-lm
|
||||
CFLAGS= $(INCLUDE) -fPIC -D__GCC__ -DLINUX -D__NO_CTYPE -DHAVE_CACHE_LINE_ALIGN
|
||||
|
||||
ifeq ($(DBG),1)
|
||||
@ -65,7 +65,7 @@ $(CORESRCDIR)/decoder_core.cpp \
|
||||
$(CORESRCDIR)/utils.cpp \
|
||||
$(PLUSSRCDIR)/welsDecoderExt.cpp \
|
||||
$(PLUSSRCDIR)/welsCodecTrace.cpp \
|
||||
$(COMMONSRCDIR)/logging.cpp
|
||||
$(COMMONSRCDIR)/logging.cpp
|
||||
|
||||
ASMSRC= $(ASMSRCDIR)/block_add.asm \
|
||||
$(ASMSRCDIR)/cpuid.asm \
|
||||
@ -78,7 +78,7 @@ $(ASMSRCDIR)/mb_copy.asm \
|
||||
$(ASMSRCDIR)/mc_luma.asm \
|
||||
$(ASMSRCDIR)/memzero.asm \
|
||||
$(ASMSRCDIR)/asm_inc.asm \
|
||||
|
||||
|
||||
MAINSRC= $(MAINSRCDIR)/d3d9_utils.cpp \
|
||||
$(MAINSRCDIR)/h264dec.cpp \
|
||||
$(MAINSRCDIR)/read_config.cpp
|
||||
@ -119,7 +119,7 @@ $(OBJDIR)/mc_chroma.o \
|
||||
$(OBJDIR)/mb_copy.o \
|
||||
$(OBJDIR)/mc_luma.o \
|
||||
$(OBJDIR)/memzero.o \
|
||||
$(OBJDIR)/asm_inc.o
|
||||
$(OBJDIR)/asm_inc.o
|
||||
endif
|
||||
|
||||
OBJBIN= $(OBJDIR)/d3d9_utils.o \
|
||||
@ -134,7 +134,7 @@ default: depend checkdir lib dylib exe release
|
||||
|
||||
dependencies:
|
||||
@echo "" >dependencies
|
||||
|
||||
|
||||
checkdir:
|
||||
@echo 'checkdir..'
|
||||
@if test ! -d $(BINDIR) ; \
|
||||
@ -154,7 +154,7 @@ checkdir:
|
||||
mkdir -p $(OBJDIR) ; \
|
||||
fi
|
||||
@echo
|
||||
|
||||
|
||||
release:
|
||||
@echo 'release..'
|
||||
@echo 'cp -f $(SHAREDLIB) $(OUTDIR)'
|
||||
@ -169,14 +169,14 @@ clean:
|
||||
@rm -f $(OBJBIN)
|
||||
@rm -f $(BINLIB)
|
||||
@rm -f $(SHAREDLIB)
|
||||
@rm -f $(BIN)
|
||||
@rm -f $(BIN)
|
||||
|
||||
tags:
|
||||
@echo update tag table
|
||||
@etags $(CORESRCDIR)/*.c $(CORESRCDIR)/*.cpp $(PLUSSRCDIR)/*.cpp $(MAINSRCDIR)/*.cpp
|
||||
|
||||
|
||||
lib: $(OBJDEC)
|
||||
|
||||
|
||||
lib: $(OBJDEC)
|
||||
@echo '$(OBJDEC)'
|
||||
@echo
|
||||
@echo 'ar cr $(BINLIB) $(OBJDEC)'
|
||||
@ -197,15 +197,15 @@ dylib: $(OBJDEC)
|
||||
@$(CXX) -shared -Wl,-Bsymbolic -o $(SHAREDLIB) $(OBJDEC) $(LIBS)
|
||||
@echo '... done'
|
||||
@echo
|
||||
|
||||
|
||||
|
||||
exe: $(OBJBIN)
|
||||
@echo
|
||||
@echo
|
||||
@echo '$(OBJBIN)'
|
||||
@echo
|
||||
@echo '$(CXX) $(LIBS) $(OBJBIN) $(BINLIB) -o $(BIN)'
|
||||
@echo 'creating binary "$(BIN)"'
|
||||
@$(CXX) $(OBJBIN) $(BINLIB) -o $(BIN) $(LIBS)
|
||||
@$(CXX) $(OBJBIN) $(BINLIB) -o $(BIN) $(LIBS)
|
||||
@echo '... done'
|
||||
@echo
|
||||
|
||||
@ -223,31 +223,31 @@ depend:
|
||||
|
||||
$(OBJDIR)/%.o$(SUFFIX): $(CORESRCDIR)/%.c
|
||||
@echo 'compiling object file "$@" ...'
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
|
||||
$(OBJDIR)/%.o$(SUFFIX): $(CORESRCDIR)/%.cpp
|
||||
@echo 'compiling object file "$@" ...'
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
|
||||
|
||||
$(OBJDIR)/%.o$(SUFFIX): $(PLUSSRCDIR)/%.cpp
|
||||
@echo 'compiling object file "$@" ...'
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
|
||||
$(OBJDIR)/%.o$(SUFFIX): $(ASMSRCDIR)/%.asm
|
||||
@echo 'compiling object file "$@" ...'
|
||||
@$(AS) $(ASFLAGS) -o $@ $<
|
||||
@$(AS) $(ASFLAGS) -o $@ $<
|
||||
|
||||
#$(OBJDIR)/%.o$(SUFFIX): $(ASMCOMDIR)/%.asm
|
||||
# @echo 'compiling object file "$@" ...'
|
||||
# @$(AS) $(ASFLAGS) -o $@ $<
|
||||
|
||||
|
||||
$(OBJDIR)/%.o$(SUFFIX): $(MAINSRCDIR)/%.cpp
|
||||
@echo 'compiling object file "$@" ...'
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
|
||||
$(OBJDIR)/%.o$(SUFFIX): $(COMMONSRCDIR)/%.cpp
|
||||
@echo 'compiling object file "$@" ...'
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
|
||||
|
||||
include $(DEPEND)
|
||||
|
||||
|
@ -26,8 +26,8 @@ GCC = gcc -m32
|
||||
ASFLAGS= -f elf -DNOPREFIX -I ../../../encoder/core/asm/
|
||||
|
||||
LIBS= -lstdc++ -ldl -lpthread -lm
|
||||
#-lm
|
||||
CFLAGS= $(INCLUDE) -m32 -fPIC -D__GCC__ -DLINUX -D__NO_CTYPE -DWELS_SVC -DENCODER_CORE -DHAVE_CACHE_LINE_ALIGN -DWELS_TESTBED -DMT_ENABLED
|
||||
#-lm
|
||||
CFLAGS= $(INCLUDE) -m32 -fPIC -D__GCC__ -DLINUX -D__NO_CTYPE -DWELS_SVC -DENCODER_CORE -DHAVE_CACHE_LINE_ALIGN -DWELS_TESTBED -DMT_ENABLED
|
||||
|
||||
ifeq ($(DBG),1)
|
||||
#SUFFIX= .dbg
|
||||
@ -150,7 +150,7 @@ $(OBJDIR)/quant.o \
|
||||
$(OBJDIR)/satd_sad.o \
|
||||
$(OBJDIR)/score.o \
|
||||
$(OBJDIR)/asm_inc.o \
|
||||
$(OBJDIR)/vaa.o
|
||||
$(OBJDIR)/vaa.o
|
||||
endif
|
||||
OBJBIN= $(OBJDIR)/read_config.o \
|
||||
$(OBJDIR)/welsenc.o
|
||||
@ -163,7 +163,7 @@ default: depend checkdir lib dylib exe release
|
||||
|
||||
dependencies:
|
||||
@echo "" >dependencies
|
||||
|
||||
|
||||
checkdir:
|
||||
@echo 'checkdir..'
|
||||
@if test ! -d $(OUTDIR) ; \
|
||||
@ -195,9 +195,9 @@ clean:
|
||||
tags:
|
||||
@echo update tag table
|
||||
@etags $(THREADLIBSRCDIR)/*.cpp $(COMMSRCDIR)/*.cpp $(CORESRCDIR)/*.cpp $(PLUSSRCDIR)/*.cpp $(MAINSRCDIR)/*.cpp
|
||||
|
||||
|
||||
lib: $(OBJENC)
|
||||
|
||||
|
||||
lib: $(OBJENC)
|
||||
@echo '$(OBJENC)'
|
||||
@echo
|
||||
@echo 'ar cr $(BINLIB) $(OBJENC)'
|
||||
@ -218,7 +218,7 @@ dylib: $(OBJDEC)
|
||||
@$(GCC) -shared -Wl,-Bsymbolic -m32 -o $(SHAREDLIB) $(OBJENC) $(LIBS)
|
||||
@echo '... done'
|
||||
@echo
|
||||
|
||||
|
||||
release:
|
||||
@echo 'release..'
|
||||
@echo 'cp -f $(SHAREDLIB) $(OUTDIR)'
|
||||
@ -228,7 +228,7 @@ release:
|
||||
@echo
|
||||
|
||||
exe: $(OBJBIN)
|
||||
@echo
|
||||
@echo
|
||||
@echo '$(OBJBIN)'
|
||||
@echo
|
||||
@echo '$(GCC) $(LIBS) $(OBJBIN) $(BINLIB) -m32 -o $(BIN)'
|
||||
@ -251,24 +251,24 @@ $(OBJDIR)/%.o$(SUFFIX): $(THREADLIBSRCDIR)/%.cpp
|
||||
|
||||
$(OBJDIR)/%.o$(SUFFIX): $(CORESRCDIR)/%.cpp
|
||||
@echo 'compiling object file "$@" ...'
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
|
||||
$(OBJDIR)/%.o$(SUFFIX): $(PLUSSRCDIR)/%.cpp
|
||||
@echo 'compiling object file "$@" ...'
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
|
||||
$(OBJDIR)/%.o$(SUFFIX): $(ASMSRCDIR)/%.asm
|
||||
@echo 'compiling object file "$@" ...'
|
||||
@$(AS) $(ASFLAGS) -o $@ $<
|
||||
|
||||
@$(AS) $(ASFLAGS) -o $@ $<
|
||||
|
||||
$(OBJDIR)/%.o$(SUFFIX): $(MAINSRCDIR)/%.cpp
|
||||
@echo 'compiling object file "$@" ...'
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
|
||||
$(OBJDIR)/%.o$(SUFFIX): $(MAINSRCDIR)/%.cpp
|
||||
@echo 'compiling object file "$@" ...'
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
|
||||
$(OBJDIR)/%.o$(SUFFIX): $(COMMONSRCDIR)/%.cpp
|
||||
@echo 'compiling object file "$@" ...'
|
||||
@$(CC) -m32 -c $(CFLAGS) -o $@ $<
|
||||
|
@ -43,7 +43,7 @@
|
||||
; Options, for DEBUG
|
||||
;***********************************************************************
|
||||
|
||||
%if 1
|
||||
%if 1
|
||||
%define MOVDQ movdqa
|
||||
%else
|
||||
%define MOVDQ movdqu
|
||||
@ -58,7 +58,7 @@
|
||||
BITS 32
|
||||
|
||||
;***********************************************************************
|
||||
; Macros
|
||||
; Macros
|
||||
;***********************************************************************
|
||||
|
||||
%macro WELS_EXTERN 1
|
||||
@ -74,7 +74,7 @@ BITS 32
|
||||
pxor %2, %2
|
||||
psubw %2, %1
|
||||
pmaxsw %1, %2
|
||||
%endmacro
|
||||
%endmacro
|
||||
|
||||
%macro MMX_XSwap 4
|
||||
movq %4, %2
|
||||
@ -105,7 +105,7 @@ BITS 32
|
||||
SSE2_XSawp qdq, %5, %2, %3
|
||||
%endmacro
|
||||
|
||||
;in: xmm0, xmm1, xmm2, xmm3 pOut: xmm0, xmm1, xmm3, xmm4
|
||||
;in: xmm0, xmm1, xmm2, xmm3 pOut: xmm0, xmm1, xmm3, xmm4
|
||||
%macro SSE2_TransTwo4x4W 5
|
||||
SSE2_XSawp wd, %1, %2, %5
|
||||
SSE2_XSawp wd, %3, %4, %2
|
||||
@ -125,26 +125,26 @@ BITS 32
|
||||
movdqa %6, %9
|
||||
movdqa %9, %4
|
||||
SSE2_XSawp bw, %7, %6, %4
|
||||
|
||||
SSE2_XSawp wd, %1, %3, %6
|
||||
|
||||
SSE2_XSawp wd, %1, %3, %6
|
||||
SSE2_XSawp wd, %8, %2, %3
|
||||
SSE2_XSawp wd, %5, %7, %2
|
||||
movdqa %7, %9
|
||||
movdqa %9, %3
|
||||
movdqa %9, %3
|
||||
SSE2_XSawp wd, %7, %4, %3
|
||||
|
||||
SSE2_XSawp dq, %1, %5, %4
|
||||
|
||||
SSE2_XSawp dq, %1, %5, %4
|
||||
SSE2_XSawp dq, %6, %2, %5
|
||||
SSE2_XSawp dq, %8, %7, %2
|
||||
movdqa %7, %9
|
||||
movdqa %9, %5
|
||||
movdqa %9, %5
|
||||
SSE2_XSawp dq, %7, %3, %5
|
||||
|
||||
|
||||
SSE2_XSawp qdq, %1, %8, %3
|
||||
SSE2_XSawp qdq, %4, %2, %8
|
||||
SSE2_XSawp qdq, %6, %7, %2
|
||||
movdqa %7, %9
|
||||
movdqa %9, %1
|
||||
movdqa %9, %1
|
||||
SSE2_XSawp qdq, %7, %5, %1
|
||||
movdqa %5, %9
|
||||
%endmacro
|
||||
@ -170,9 +170,9 @@ BITS 32
|
||||
%macro butterfly_1to16_sse 3 ; xmm? for dst, xmm? for tmp, one byte for pSrc [generic register name: a/b/c/d]
|
||||
mov %3h, %3l
|
||||
movd %1, e%3x ; i.e., %1 = eax (=b0)
|
||||
pshuflw %2, %1, 00h ; ..., b0 b0 b0 b0 b0 b0 b0 b0
|
||||
pshufd %1, %2, 00h ; b0 b0 b0 b0, b0 b0 b0 b0, b0 b0 b0 b0, b0 b0 b0 b0
|
||||
%endmacro
|
||||
pshuflw %2, %1, 00h ; ..., b0 b0 b0 b0 b0 b0 b0 b0
|
||||
pshufd %1, %2, 00h ; b0 b0 b0 b0, b0 b0 b0 b0, b0 b0 b0 b0, b0 b0 b0 b0
|
||||
%endmacro
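The byte broadcast performed by `butterfly_1to16_sse` can be followed step by step in a scalar model. The Python sketch below mirrors the four instructions of the macro under the assumption that only the low byte of the source register is meaningful; the function name `butterfly_1to16` is hypothetical and the code is an emulation for illustration, not a replacement for the macro.

```python
def butterfly_1to16(byte):
    """Emulate broadcasting one byte into all 16 lanes of an XMM register."""
    # mov %3h, %3l : duplicate the byte into the high half of the 16-bit word.
    word = ((byte & 0xFF) << 8) | (byte & 0xFF)
    # movd %1, e%3x : only word 0 matters; the upper half of eax is ignored
    # because pshuflw with selector 00h reads word 0 exclusively.
    # pshuflw %2, %1, 00h : copy word 0 into the four low words.
    low_words = [word] * 4
    # pshufd %1, %2, 00h : copy dword 0 into all four dwords.
    dword = low_words[0] | (low_words[1] << 16)
    return b"".join(dword.to_bytes(4, "little") for _ in range(4))
```

The result is the 16-byte pattern that the border expansion code later stores with `movdqa` or `movdqu`.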
|
||||
|
||||
;copy a dw into a xmm for 8 times
|
||||
%macro SSE2_Copy8Times 2
|
||||
|
@ -48,7 +48,7 @@ BITS 32
|
||||
; Macros and other preprocessor constants
|
||||
;*******************************************************************************
|
||||
|
||||
%macro BLOCK_ADD_16_SSE2 4
|
||||
%macro BLOCK_ADD_16_SSE2 4
|
||||
movdqa xmm0, [%2]
|
||||
movdqa xmm1, [%3]
|
||||
movdqa xmm2, [%3+10h]
|
||||
@ -65,7 +65,7 @@ BITS 32
|
||||
|
||||
lea %2, [%2+%4]
|
||||
lea %3, [%3+%4*2]
|
||||
lea %1, [%1+%4]
|
||||
lea %1, [%1+%4]
|
||||
%endmacro
|
||||
|
||||
%macro BLOCK_ADD_8_MMXEXT 4
|
||||
@ -106,7 +106,7 @@ BITS 32
|
||||
|
||||
lea %2, [%2+%4]
|
||||
lea %3, [%3+%5*2]
|
||||
lea %1, [%1+%4]
|
||||
lea %1, [%1+%4]
|
||||
%endmacro
|
||||
|
||||
|
||||
@ -130,24 +130,24 @@ BITS 32
|
||||
lea %1, [%1+%4]
|
||||
%endmacro
|
||||
|
||||
%macro BLOCK_ADD_8_STRIDE_2_LINES_SSE2 5
|
||||
%macro BLOCK_ADD_8_STRIDE_2_LINES_SSE2 5
|
||||
movdqa xmm1, [%3]
|
||||
movq xmm0, [%2]
|
||||
punpcklbw xmm0, xmm7
|
||||
paddw xmm0, xmm1
|
||||
packuswb xmm0, xmm7
|
||||
movq [%1], xmm0
|
||||
|
||||
movq [%1], xmm0
|
||||
|
||||
movdqa xmm3, [%3+%5*2]
|
||||
movq xmm2, [%2+%4]
|
||||
punpcklbw xmm2, xmm7
|
||||
paddw xmm2, xmm3
|
||||
packuswb xmm2, xmm7
|
||||
movq [%1+%4], xmm2
|
||||
|
||||
packuswb xmm2, xmm7
|
||||
movq [%1+%4], xmm2
|
||||
|
||||
lea %1, [%1+%4*2]
|
||||
lea %2, [%2+%4*2]
|
||||
lea %3, [%3+%5*4]
|
||||
lea %3, [%3+%5*4]
|
||||
%endmacro
|
||||
|
||||
%macro CHECK_DATA_16_ZERO_SSE4 3
|
||||
@ -159,7 +159,7 @@ BITS 32
|
||||
por xmm0, xmm1
|
||||
ptest xmm7, xmm0
|
||||
cmovae eax, %3
|
||||
|
||||
|
||||
add %1, 20h
|
||||
add ecx, 04h
|
||||
mov byte [%2+ebx], al
|
||||
@ -170,12 +170,12 @@ BITS 32
|
||||
movdqa xmm1, [%1+%3]
|
||||
movdqa xmm2, [%1+%3*2]
|
||||
movdqa xmm3, [%1+%4]
|
||||
|
||||
|
||||
mov eax, 0h
|
||||
mov ebx, 0h
|
||||
movdqa xmm4, xmm0
|
||||
movdqa xmm5, xmm2
|
||||
|
||||
|
||||
punpcklqdq xmm0, xmm1
|
||||
punpckhqdq xmm4, xmm1
|
||||
punpcklqdq xmm2, xmm3
|
||||
@ -183,12 +183,12 @@ BITS 32
|
||||
|
||||
por xmm0, xmm2
|
||||
por xmm4, xmm5
|
||||
|
||||
|
||||
ptest xmm7, xmm0
|
||||
cmovae eax, %5
|
||||
ptest xmm7, xmm4
|
||||
cmovae ebx, %5
|
||||
|
||||
cmovae ebx, %5
|
||||
|
||||
mov byte [%2], al
|
||||
mov byte [%2+1], bl
|
||||
%endmacro
|
||||
@ -230,45 +230,45 @@ BITS 32
|
||||
movdqa xmm0, [%1]
|
||||
movdqa xmm1, [%1+10h]
|
||||
mov ebx, [ecx]
|
||||
|
||||
|
||||
pcmpeqw xmm0, xmm7
|
||||
pcmpeqw xmm1, xmm7
|
||||
packsswb xmm0, xmm1
|
||||
pmovmskb edx, xmm0
|
||||
pmovmskb edx, xmm0
|
||||
sub edx, 0ffffh
|
||||
|
||||
cmovb eax, ebp
|
||||
|
||||
cmovb eax, ebp
|
||||
add ecx, 4
|
||||
add %1, 20h
|
||||
mov byte [%2+ebx], al
|
||||
%endmacro
|
||||
|
||||
|
||||
|
||||
|
||||
%macro CHECK_RS_4x4_BLOCK_2_ZERO_SSE2 5
|
||||
movdqa xmm0, [%1]
|
||||
movdqa xmm1, [%1 + %3]
|
||||
movdqa xmm2, [%1 + %3*2]
|
||||
movdqa xmm3, [%1 + %4]
|
||||
|
||||
movdqa xmm3, [%1 + %4]
|
||||
|
||||
movdqa xmm4, xmm0
|
||||
movdqa xmm5, xmm2
|
||||
|
||||
|
||||
punpcklqdq xmm0, xmm1
|
||||
punpckhqdq xmm4, xmm1
|
||||
punpcklqdq xmm2, xmm3
|
||||
punpckhqdq xmm5, xmm3
|
||||
|
||||
|
||||
pcmpeqw xmm0, xmm7
|
||||
pcmpeqw xmm2, xmm7
|
||||
pcmpeqw xmm4, xmm7
|
||||
pcmpeqw xmm5, xmm7
|
||||
|
||||
|
||||
packsswb xmm0, xmm2
|
||||
packsswb xmm4, xmm5
|
||||
pmovmskb eax, xmm0
|
||||
pmovmskb ebx, xmm4
|
||||
|
||||
|
||||
sub eax, 0ffffh
|
||||
mov eax, 0
|
||||
cmovb eax, %5
|
||||
@ -276,7 +276,7 @@ BITS 32
|
||||
mov ebx, 0
|
||||
cmovb ebx, %5
|
||||
mov byte [%2], al
|
||||
mov byte [%2+1], bl
|
||||
mov byte [%2+1], bl
|
||||
%endmacro
|
||||
|
||||
;*******************************************************************************
|
||||
@ -291,12 +291,12 @@ SECTION .rodata align=16
|
||||
|
||||
ALIGN 16
|
||||
SubMbScanIdx:
|
||||
dd 0x0, 0x1, 0x4, 0x5,
|
||||
dd 0x0, 0x1, 0x4, 0x5,
|
||||
dd 0x2, 0x3, 0x6, 0x7,
|
||||
dd 0x8, 0x9, 0xc, 0xd,
|
||||
dd 0xa, 0xb, 0xe, 0xf,
|
||||
dd 0x10, 0x11, 0x14, 0x15,
|
||||
dd 0x12, 0x13, 0x16, 0x17,
|
||||
dd 0x12, 0x13, 0x16, 0x17,
|
||||
|
||||
;*******************************************************************************
|
||||
; Code
|
||||
@ -312,10 +312,10 @@ ALIGN 16
|
||||
; void_t WelsResBlockZero16x16_sse2(int16_t* pBlock,int32_t iStride)
|
||||
;*******************************************************************************
|
||||
WelsResBlockZero16x16_sse2:
|
||||
push esi
|
||||
push esi
|
||||
|
||||
mov esi, [esp+08h]
|
||||
mov ecx, [esp+0ch]
|
||||
mov ecx, [esp+0ch]
|
||||
lea ecx, [ecx*2]
|
||||
lea eax, [ecx*3]
|
||||
|
||||
@ -375,7 +375,7 @@ WelsResBlockZero16x16_sse2:
|
||||
|
||||
movdqa [esi+eax], xmm7
|
||||
movdqa [esi+eax+10h], xmm7
|
||||
|
||||
|
||||
pop esi
|
||||
ret
|
||||
|
||||
@ -386,7 +386,7 @@ ALIGN 16
|
||||
;*******************************************************************************
|
||||
; void_t WelsResBlockZero8x8_sse2(int16_t * pBlock, int32_t iStride)
|
||||
;*******************************************************************************
|
||||
WelsResBlockZero8x8_sse2:
|
||||
WelsResBlockZero8x8_sse2:
|
||||
push esi
|
||||
|
||||
mov esi, [esp+08h]
|
||||
@ -407,7 +407,7 @@ WelsResBlockZero8x8_sse2:
|
||||
movdqa [esi+ecx*2], xmm7
|
||||
movdqa [esi+eax], xmm7
|
||||
|
||||
|
||||
|
||||
pop esi
|
||||
ret
|
||||
|
||||
|
@ -84,12 +84,12 @@ ALIGN 16
|
||||
; void WelsCPUId( int32_t index, int32_t *uiFeatureA, int32_t *uiFeatureB, int32_t *uiFeatureC, int32_t *uiFeatureD )
|
||||
;****************************************************************************************************
|
||||
WelsCPUId:
|
||||
push ebx
|
||||
push ebx
|
||||
push edi
|
||||
|
||||
|
||||
mov eax, [esp+12] ; operating index
|
||||
cpuid ; cpuid
|
||||
|
||||
|
||||
; processing various information return
|
||||
mov edi, [esp+16]
|
||||
mov [edi], eax
|
||||
@ -100,10 +100,10 @@ WelsCPUId:
|
||||
mov edi, [esp+28]
|
||||
mov [edi], edx
|
||||
|
||||
pop edi
|
||||
pop edi
|
||||
pop ebx
|
||||
ret
|
||||
|
||||
|
||||
WELS_EXTERN WelsCPUSupportAVX
|
||||
; must be called after cpuid with index 1, once the eax and ecx flags have been obtained
|
||||
ALIGN 16
|
||||
@ -139,7 +139,7 @@ ALIGN 16
|
||||
WelsCPUSupportFMA:
|
||||
mov eax, [esp+4]
|
||||
mov ecx, [esp+8]
|
||||
|
||||
|
||||
; refer to detection of FMA addressed in INTEL AVX manual document
|
||||
and ecx, 018001000H
|
||||
cmp ecx, 018001000H ; check OSXSAVE, AVX, FMA feature flags
|
||||
@ -153,7 +153,7 @@ WelsCPUSupportFMA:
|
||||
mov eax, 1
|
||||
ret
|
||||
fma_not_supported:
|
||||
mov eax, 0
|
||||
mov eax, 0
|
||||
ret
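The constant `018001000H` used by `WelsCPUSupportFMA` combines three feature bits of the ECX value returned by CPUID leaf 1: FMA (bit 12), OSXSAVE (bit 27) and AVX (bit 28). The Python sketch below reproduces only that mask comparison. The lines between the `cmp` and the final `mov eax, 1` are not shown in this diff and may perform further checks, so this should be read as an illustration of the flag test alone; `has_fma_feature_flags` is a hypothetical name.

```python
FMA_BIT = 1 << 12      # CPUID.1:ECX bit 12
OSXSAVE_BIT = 1 << 27  # CPUID.1:ECX bit 27
AVX_BIT = 1 << 28      # CPUID.1:ECX bit 28
REQUIRED_FLAGS = FMA_BIT | OSXSAVE_BIT | AVX_BIT  # equals 0x18001000


def has_fma_feature_flags(ecx):
    """Return True when OSXSAVE, AVX and FMA are all set in the ECX value."""
    # Mirrors "and ecx, mask" followed by "cmp ecx, mask".
    return (ecx & REQUIRED_FLAGS) == REQUIRED_FLAGS
```

Comparing against the full mask, rather than testing for a non-zero result, ensures that all three bits are present and not merely one of them.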
|
||||
|
||||
WELS_EXTERN WelsEmms
|
||||
|
@ -99,9 +99,9 @@ IdctResAddPred_mmx:
|
||||
%define kiStride esp+pushsize+8
|
||||
%define pRs esp+pushsize+12
|
||||
|
||||
mov eax, [pRs ]
|
||||
mov edx, [pPred ]
|
||||
mov ecx, [kiStride]
|
||||
mov eax, [pRs ]
|
||||
mov edx, [pPred ]
|
||||
mov ecx, [kiStride]
|
||||
movq mm0, [eax+ 0]
|
||||
movq mm1, [eax+ 8]
|
||||
movq mm2, [eax+16]
|
||||
@ -114,13 +114,13 @@ IdctResAddPred_mmx:
|
||||
|
||||
WELS_Zero mm7
|
||||
WELS_DW32 mm6
|
||||
|
||||
|
||||
MMX_StoreDiff4P mm3, mm0, mm6, mm7, [edx]
|
||||
MMX_StoreDiff4P mm4, mm0, mm6, mm7, [edx+ecx]
|
||||
lea edx, [edx+2*ecx]
|
||||
MMX_StoreDiff4P mm1, mm0, mm6, mm7, [edx]
|
||||
MMX_StoreDiff4P mm2, mm0, mm6, mm7, [edx+ecx]
|
||||
|
||||
|
||||
%undef pushsize
|
||||
%undef pPred
|
||||
%undef kiStride
|
||||
|
File diff suppressed because it is too large
@ -155,11 +155,11 @@ WELS_EXTERN ExpandPictureChromaUnalign_sse2 ; for chroma unalignment
|
||||
lea %1, [%1+%2]
|
||||
%endmacro
|
||||
|
||||
%macro exp_top_bottom_sse2 1 ; iPaddingSize [luma(32)/chroma(16)]
|
||||
%macro exp_top_bottom_sse2 1 ; iPaddingSize [luma(32)/chroma(16)]
|
||||
; ebx [width/16(8)]
|
||||
; esi [pSrc+0], edi [pSrc-1], ecx [-stride], 32(16) ; top
|
||||
; eax [pSrc+(h-1)*stride], ebp [pSrc+(h+31)*stride], 32(16) ; bottom
|
||||
|
||||
|
||||
%if %1 == 32 ; for luma
|
||||
sar ebx, 04h ; width / 16(8) pixels
|
||||
.top_bottom_loops:
|
||||
@ -173,7 +173,7 @@ WELS_EXTERN ExpandPictureChromaUnalign_sse2 ; for chroma unalignment
|
||||
mov_line_16x4_sse2 edi, ecx, xmm0, a
|
||||
mov_line_16x4_sse2 edi, ecx, xmm0, a
|
||||
mov_line_end16x4_sse2 edi, ecx, xmm0, a
|
||||
|
||||
|
||||
; bottom
|
||||
movdqa xmm1, [eax] ; last line of picture pData
|
||||
mov_line_16x4_sse2 ebp, ecx, xmm1, a ; dst, stride, xmm?
|
||||
@ -184,15 +184,15 @@ WELS_EXTERN ExpandPictureChromaUnalign_sse2 ; for chroma unalignment
|
||||
mov_line_16x4_sse2 ebp, ecx, xmm1, a
|
||||
mov_line_16x4_sse2 ebp, ecx, xmm1, a
|
||||
mov_line_end16x4_sse2 ebp, ecx, xmm1, a
|
||||
|
||||
|
||||
lea esi, [esi+16] ; top pSrc
|
||||
lea edi, [edi+16] ; top dst
|
||||
lea eax, [eax+16] ; bottom pSrc
|
||||
lea ebp, [ebp+16] ; bottom dst
|
||||
neg ecx ; positive/negative stride need for next loop?
|
||||
|
||||
neg ecx ; positive/negative stride need for next loop?
|
||||
|
||||
dec ebx
|
||||
jnz near .top_bottom_loops
|
||||
jnz near .top_bottom_loops
|
||||
%elif %1 == 16 ; for chroma ??
|
||||
mov edx, ebx
|
||||
sar ebx, 04h ; (width / 16) pixels
|
||||
@ -202,21 +202,21 @@ WELS_EXTERN ExpandPictureChromaUnalign_sse2 ; for chroma unalignment
|
||||
mov_line_16x4_sse2 edi, ecx, xmm0, a ; dst, stride, xmm?
|
||||
mov_line_16x4_sse2 edi, ecx, xmm0, a
|
||||
mov_line_16x4_sse2 edi, ecx, xmm0, a
|
||||
mov_line_end16x4_sse2 edi, ecx, xmm0, a
|
||||
|
||||
mov_line_end16x4_sse2 edi, ecx, xmm0, a
|
||||
|
||||
; bottom
|
||||
movdqa xmm1, [eax] ; last line of picture pData
|
||||
mov_line_16x4_sse2 ebp, ecx, xmm1, a ; dst, stride, xmm?
|
||||
mov_line_16x4_sse2 ebp, ecx, xmm1, a
|
||||
mov_line_16x4_sse2 ebp, ecx, xmm1, a
|
||||
mov_line_end16x4_sse2 ebp, ecx, xmm1, a
|
||||
|
||||
mov_line_end16x4_sse2 ebp, ecx, xmm1, a
|
||||
|
||||
lea esi, [esi+16] ; top pSrc
|
||||
lea edi, [edi+16] ; top dst
|
||||
lea eax, [eax+16] ; bottom pSrc
|
||||
lea ebp, [ebp+16] ; bottom dst
|
||||
neg ecx ; positive/negative stride need for next loop?
|
||||
|
||||
neg ecx ; positive/negative stride need for next loop?
|
||||
|
||||
dec ebx
|
||||
jnz near .top_bottom_loops
|
||||
|
||||
@ -243,50 +243,50 @@ WELS_EXTERN ExpandPictureChromaUnalign_sse2 ; for chroma unalignment
|
||||
%endif
|
||||
%endmacro
|
||||
|
||||
%macro exp_left_right_sse2 2 ; iPaddingSize [luma(32)/chroma(16)], u/a
|
||||
%macro exp_left_right_sse2 2 ; iPaddingSize [luma(32)/chroma(16)], u/a
|
||||
; ecx [height]
|
||||
; esi [pSrc+0], edi [pSrc-32], edx [stride], 32(16) ; left
|
||||
; ebx [pSrc+(w-1)], ebp [pSrc+w], 32(16) ; right
|
||||
; xor eax, eax ; for pixel pData (uint8_t) ; make sure eax=0 at least high 24 bits of eax = 0
|
||||
|
||||
%if %1 == 32 ; for luma
|
||||
|
||||
%if %1 == 32 ; for luma
|
||||
.left_right_loops:
|
||||
; left
|
||||
mov al, byte [esi] ; pixel pData for left border
|
||||
butterfly_1to16_sse xmm0, xmm1, a ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
movdqa [edi], xmm0
|
||||
movdqa [edi+16], xmm0
|
||||
|
||||
|
||||
; right
|
||||
mov al, byte [ebx]
|
||||
butterfly_1to16_sse xmm1, xmm2, a ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
movdqa [ebp], xmm1
|
||||
movdqa [ebp+16], xmm1
|
||||
|
||||
|
||||
lea esi, [esi+edx] ; left pSrc
|
||||
lea edi, [edi+edx] ; left dst
|
||||
lea ebx, [ebx+edx] ; right pSrc
|
||||
lea ebp, [ebp+edx] ; right dst
|
||||
|
||||
lea ebp, [ebp+edx] ; right dst
|
||||
|
||||
dec ecx
|
||||
jnz near .left_right_loops
|
||||
%elif %1 == 16 ; for chroma ??
|
||||
jnz near .left_right_loops
|
||||
%elif %1 == 16 ; for chroma ??
|
||||
.left_right_loops:
|
||||
; left
|
||||
mov al, byte [esi] ; pixel pData for left border
|
||||
butterfly_1to16_sse xmm0, xmm1, a ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
movdqa [edi], xmm0
|
||||
|
||||
movdqa [edi], xmm0
|
||||
|
||||
; right
|
||||
mov al, byte [ebx]
|
||||
butterfly_1to16_sse xmm1, xmm2, a ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
movdq%2 [ebp], xmm1 ; might not be aligned 16 bytes in case chroma planes
|
||||
|
||||
|
||||
lea esi, [esi+edx] ; left pSrc
|
||||
lea edi, [edi+edx] ; left dst
|
||||
lea ebx, [ebx+edx] ; right pSrc
|
||||
lea ebp, [ebp+edx] ; right dst
|
||||
|
||||
lea ebp, [ebp+edx] ; right dst
|
||||
|
||||
dec ecx
|
||||
jnz near .left_right_loops
|
||||
%endif
|
||||
@ -339,25 +339,25 @@ WELS_EXTERN ExpandPictureChromaUnalign_sse2 ; for chroma unalignment
|
||||
; TL
|
||||
mov_line_16x4_sse2 edi, ecx, xmm3, a ; dst, stride, xmm?
|
||||
mov_line_16x4_sse2 edi, ecx, xmm3, a ; dst, stride, xmm?
|
||||
mov_line_16x4_sse2 edi, ecx, xmm3, a ; dst, stride, xmm?
|
||||
mov_line_16x4_sse2 edi, ecx, xmm3, a ; dst, stride, xmm?
|
||||
mov_line_end16x4_sse2 edi, ecx, xmm3, a ; dst, stride, xmm?
|
||||
|
||||
; TR
|
||||
mov_line_16x4_sse2 ebp, ecx, xmm4, %2 ; dst, stride, xmm?
|
||||
mov_line_16x4_sse2 ebp, ecx, xmm4, %2 ; dst, stride, xmm?
|
||||
mov_line_16x4_sse2 ebp, ecx, xmm4, %2 ; dst, stride, xmm?
|
||||
mov_line_16x4_sse2 ebp, ecx, xmm4, %2 ; dst, stride, xmm?
|
||||
mov_line_end16x4_sse2 ebp, ecx, xmm4, %2 ; dst, stride, xmm?
|
||||
|
||||
; BL
|
||||
mov_line_16x4_sse2 eax, ecx, xmm5, a ; dst, stride, xmm?
|
||||
mov_line_16x4_sse2 eax, ecx, xmm5, a ; dst, stride, xmm?
|
||||
mov_line_16x4_sse2 eax, ecx, xmm5, a ; dst, stride, xmm?
|
||||
mov_line_16x4_sse2 eax, ecx, xmm5, a ; dst, stride, xmm?
|
||||
mov_line_end16x4_sse2 eax, ecx, xmm5, a ; dst, stride, xmm?
|
||||
|
||||
; BR
|
||||
mov_line_16x4_sse2 ebx, ecx, xmm6, %2 ; dst, stride, xmm?
|
||||
mov_line_16x4_sse2 ebx, ecx, xmm6, %2 ; dst, stride, xmm?
|
||||
mov_line_16x4_sse2 ebx, ecx, xmm6, %2 ; dst, stride, xmm?
|
||||
mov_line_16x4_sse2 ebx, ecx, xmm6, %2 ; dst, stride, xmm?
|
||||
mov_line_end16x4_sse2 ebx, ecx, xmm6, %2 ; dst, stride, xmm?
|
||||
%endif
|
||||
%endmacro
|
||||
@ -375,7 +375,7 @@ ExpandPictureLuma_sse2:
|
||||
push esi
|
||||
push edi
|
||||
push ebp
|
||||
|
||||
|
||||
; for both top and bottom border
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
mov esi, [esp+24] ; pDst
|
||||
@ -387,10 +387,10 @@ ExpandPictureLuma_sse2:
|
||||
mov cl, byte [esi]
|
||||
butterfly_1to16_sse xmm3, xmm4, c ; pDst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
; load top border
|
||||
mov ecx, edx ; kiStride
|
||||
mov ecx, edx ; kiStride
|
||||
neg ecx ; -kiStride
|
||||
lea edi, [esi+ecx] ; last line of top border
|
||||
; load bottom border
|
||||
; load bottom border
|
||||
dec eax ; h-1
|
||||
imul eax, edx ; (h-1)*kiStride
|
||||
lea eax, [esi+eax] ; last line of picture pData
|
||||
@ -398,16 +398,16 @@ ExpandPictureLuma_sse2:
|
||||
lea ebp, [eax+edx] ; last line of bottom border, (h-1)*stride + 32 * stride
|
||||
; also prepare for cross border pData: bottom-left with xmm5, bottom-right xmm6
|
||||
dec ebx ; kiWidth-1
|
||||
lea ebx, [eax+ebx] ; dst[w-1][h-1]
|
||||
lea ebx, [eax+ebx] ; dst[w-1][h-1]
|
||||
; xor edx, edx
|
||||
mov dl, byte [eax] ; bottom-left
|
||||
butterfly_1to16_sse xmm5, xmm6, d ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
mov dl, byte [ebx] ; bottom-right
|
||||
butterfly_1to16_sse xmm6, xmm4, d ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
; for top & bottom expanding
|
||||
; for top & bottom expanding
|
||||
mov ebx, [esp+32] ; kiWidth
|
||||
exp_top_bottom_sse2 32
|
||||
|
||||
exp_top_bottom_sse2 32
|
||||
|
||||
; for both left and right border
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
mov esi, [esp+24] ; p_dst: left border pSrc
|
||||
@ -419,14 +419,14 @@ ExpandPictureLuma_sse2:
|
||||
lea edi, [esi+eax] ; left border dst
|
||||
dec ebx
|
||||
lea ebx, [esi+ebx] ; right border pSrc, (p_dst + width - 1)
|
||||
lea ebp, [ebx+1] ; right border dst
|
||||
lea ebp, [ebx+1] ; right border dst
|
||||
; prepare for cross border pData: top-right with xmm4
|
||||
; xor eax, eax
|
||||
mov al, byte [ebx] ; top-right
|
||||
butterfly_1to16_sse xmm4, xmm0, a ; pDst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
; for left & right border expanding
|
||||
exp_left_right_sse2 32, a
|
||||
|
||||
|
||||
; for cross border [top-left, top-right, bottom-left, bottom-right]
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
mov esi, [esp+24] ; pDst
|
||||
@ -436,7 +436,7 @@ ExpandPictureLuma_sse2:
|
||||
; xmm3..xmm6 cross pData initialization was done above, perform padding as below, to be continued..
|
||||
mov eax, -32 ; luma=-32, chroma=-16
|
||||
neg ecx ; -stride
|
||||
lea edi, [esi+eax]
|
||||
lea edi, [esi+eax]
|
||||
lea edi, [edi+ecx] ; last line of top-left border
|
||||
lea ebp, [esi+ebx]
|
||||
lea ebp, [ebp+ecx] ; last line of top-right border
|
||||
@ -444,19 +444,19 @@ ExpandPictureLuma_sse2:
|
||||
mov ecx, [esp+28] ; kiStride
|
||||
imul edx, ecx ; (height+32(16)) * stride
|
||||
lea eax, [edi+edx] ; last line of bottom-left border
|
||||
lea ebx, [ebp+edx] ; last line of bottom-right border
|
||||
lea ebx, [ebp+edx] ; last line of bottom-right border
|
||||
neg ecx ; -kiStride
|
||||
; for left & right border expanding
|
||||
exp_cross_sse2 32, a
|
||||
|
||||
exp_cross_sse2 32, a
|
||||
|
||||
; sfence ; commit cache write back memory
|
||||
|
||||
|
||||
pop ebp
|
||||
pop edi
|
||||
pop esi
|
||||
pop edx
|
||||
pop ebx
|
||||
|
||||
|
||||
ret
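Conceptually, `ExpandPictureLuma_sse2` pads the picture on all four sides by replicating its edge pixels: the comments indicate 32 pixels for luma and 16 for chroma, with the four corner regions filled from the corner pixels. The Python sketch below models that result on a small plane stored as a list of rows. It is a scalar illustration under those assumptions, with a hypothetical function name, and it returns a new plane instead of writing into a pre-allocated border as the assembly does.

```python
def expand_plane(rows, pad):
    """Pad a 2-D plane (list of equal-length rows) by replicating its edges."""
    # Left and right borders: repeat the first and last pixel of each row.
    widened = [[row[0]] * pad + list(row) + [row[-1]] * pad for row in rows]
    # Top and bottom borders: repeat the widened first and last rows, which
    # also fills the corners with the corner pixel values.
    top = [list(widened[0]) for _ in range(pad)]
    bottom = [list(widened[-1]) for _ in range(pad)]
    return top + widened + bottom
```

For example, `expand_plane([[1, 2], [3, 4]], 1)` yields a 4x4 plane whose corners hold 1, 2, 3 and 4, matching the top-left, top-right, bottom-left and bottom-right values the assembly keeps in xmm3 to xmm6.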
|
||||
|
||||
ALIGN 16
|
||||
@ -472,7 +472,7 @@ ExpandPictureChromaAlign_sse2:
|
||||
push esi
|
||||
push edi
|
||||
push ebp
|
||||
|
||||
|
||||
; for both top and bottom border
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
mov esi, [esp+24] ; pDst
|
||||
@ -484,10 +484,10 @@ ExpandPictureChromaAlign_sse2:
|
||||
mov cl, byte [esi]
|
||||
butterfly_1to16_sse xmm3, xmm4, c ; pDst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
; load top border
|
||||
mov ecx, edx ; kiStride
|
||||
mov ecx, edx ; kiStride
|
||||
neg ecx ; -kiStride
|
||||
lea edi, [esi+ecx] ; last line of top border
|
||||
; load bottom border
|
||||
; load bottom border
|
||||
dec eax ; h-1
|
||||
imul eax, edx ; (h-1)*kiStride
|
||||
lea eax, [esi+eax] ; last line of picture pData
|
||||
@ -495,16 +495,16 @@ ExpandPictureChromaAlign_sse2:
|
||||
lea ebp, [eax+edx] ; last line of bottom border, (h-1)*kiStride + 16 * kiStride
|
||||
; also prepare for cross border pData: bottom-left with xmm5, bottom-right xmm6
|
||||
dec ebx ; kiWidth-1
|
||||
lea ebx, [eax+ebx] ; pDst[w-1][h-1]
|
||||
lea ebx, [eax+ebx] ; pDst[w-1][h-1]
|
||||
; xor edx, edx
|
||||
mov dl, byte [eax] ; bottom-left
|
||||
butterfly_1to16_sse xmm5, xmm6, d ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
mov dl, byte [ebx] ; bottom-right
|
||||
butterfly_1to16_sse xmm6, xmm4, d ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
; for top & bottom expanding
|
||||
; for top & bottom expanding
|
||||
mov ebx, [esp+32] ; kiWidth
|
||||
exp_top_bottom_sse2 16
|
||||
|
||||
exp_top_bottom_sse2 16
|
||||
|
||||
; for both left and right border
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
mov esi, [esp+24] ; pDst: left border pSrc
|
||||
@ -516,14 +516,14 @@ ExpandPictureChromaAlign_sse2:
|
||||
lea edi, [esi+eax] ; left border dst
|
||||
dec ebx
|
||||
lea ebx, [esi+ebx] ; right border pSrc, (p_dst + width - 1)
|
||||
lea ebp, [ebx+1] ; right border dst
|
||||
lea ebp, [ebx+1] ; right border dst
|
||||
; prepare for cross border pData: top-right with xmm4
|
||||
; xor eax, eax
|
||||
mov al, byte [ebx] ; top-right
|
||||
butterfly_1to16_sse xmm4, xmm0, a ; pDst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
; for left & right border expanding
|
||||
exp_left_right_sse2 16, a
|
||||
|
||||
|
||||
; for cross border [top-left, top-right, bottom-left, bottom-right]
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
mov esi, [esp+24] ; pDst
|
||||
@ -533,9 +533,9 @@ ExpandPictureChromaAlign_sse2:
|
||||
; have done xmm3,..,xmm6 cross pData initialization above, perform pading as below, To be continued..
|
||||
mov eax, -16 ; chroma=-16
|
||||
neg ecx ; -stride
|
||||
lea edi, [esi+eax]
|
||||
lea edi, [esi+eax]
|
||||
lea edi, [edi+ecx] ; last line of top-left border
|
||||
lea ebp, [esi+ebx]
|
||||
lea ebp, [esi+ebx]
|
||||
lea ebp, [ebp+ecx] ; last line of top-right border
|
||||
mov ecx, [esp+28] ; kiStride
|
||||
add edx, 16 ; height+16, luma=32, chroma=16
|
||||
@ -545,15 +545,15 @@ ExpandPictureChromaAlign_sse2:
|
||||
neg ecx ; -kiStride
|
||||
; for left & right border expanding
|
||||
exp_cross_sse2 16, a
|
||||
|
||||
|
||||
; sfence ; commit cache write back memory
|
||||
|
||||
|
||||
pop ebp
|
||||
pop edi
|
||||
pop esi
|
||||
pop edx
|
||||
pop ebx
|
||||
|
||||
|
||||
ret
|
||||
|
||||
ALIGN 16
|
||||
@ -569,7 +569,7 @@ ExpandPictureChromaUnalign_sse2:
|
||||
push esi
|
||||
push edi
|
||||
push ebp
|
||||
|
||||
|
||||
; for both top and bottom border
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
mov esi, [esp+24] ; pDst
|
||||
@ -581,10 +581,10 @@ ExpandPictureChromaUnalign_sse2:
|
||||
mov cl, byte [esi]
|
||||
butterfly_1to16_sse xmm3, xmm4, c ; pDst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
; load top border
|
||||
mov ecx, edx ; kiStride
|
||||
mov ecx, edx ; kiStride
|
||||
neg ecx ; -kiStride
|
||||
lea edi, [esi+ecx] ; last line of top border
|
||||
; load bottom border
|
||||
; load bottom border
|
||||
dec eax ; h-1
|
||||
imul eax, edx ; (h-1)*kiStride
|
||||
lea eax, [esi+eax] ; last line of picture pData
|
||||
@ -592,16 +592,16 @@ ExpandPictureChromaUnalign_sse2:
|
||||
lea ebp, [eax+edx] ; last line of bottom border, (h-1)*kiStride + 16 * kiStride
|
||||
; also prepare for cross border pData: bottom-left with xmm5, bottom-right xmm6
|
||||
dec ebx ; kiWidth-1
|
||||
lea ebx, [eax+ebx] ; dst[w-1][h-1]
|
||||
lea ebx, [eax+ebx] ; dst[w-1][h-1]
|
||||
; xor edx, edx
|
||||
mov dl, byte [eax] ; bottom-left
|
||||
butterfly_1to16_sse xmm5, xmm6, d ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
mov dl, byte [ebx] ; bottom-right
|
||||
butterfly_1to16_sse xmm6, xmm4, d ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
; for top & bottom expanding
|
||||
; for top & bottom expanding
|
||||
mov ebx, [esp+32] ; kiWidth
|
||||
exp_top_bottom_sse2 16
|
||||
|
||||
exp_top_bottom_sse2 16
|
||||
|
||||
; for both left and right border
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
mov esi, [esp+24] ; p_dst: left border pSrc
|
||||
@ -613,14 +613,14 @@ ExpandPictureChromaUnalign_sse2:
|
||||
lea edi, [esi+eax] ; left border dst
|
||||
dec ebx
|
||||
lea ebx, [esi+ebx] ; right border pSrc, (p_dst + width - 1)
|
||||
lea ebp, [ebx+1] ; right border dst
|
||||
lea ebp, [ebx+1] ; right border dst
|
||||
; prepare for cross border pData: top-right with xmm4
|
||||
; xor eax, eax
|
||||
mov al, byte [ebx] ; top-right
|
||||
butterfly_1to16_sse xmm4, xmm0, a ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
; for left & right border expanding
|
||||
exp_left_right_sse2 16, u
|
||||
|
||||
|
||||
; for cross border [top-left, top-right, bottom-left, bottom-right]
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
mov esi, [esp+24] ; p_dst
|
||||
@ -630,9 +630,9 @@ ExpandPictureChromaUnalign_sse2:
|
||||
; have done xmm3,..,xmm6 cross pData initialization above, perform pading as below, To be continued..
|
||||
neg ecx ; -kiStride
|
||||
mov eax, -16 ; chroma=-16
|
||||
lea edi, [esi+eax]
|
||||
lea edi, [esi+eax]
|
||||
lea edi, [edi+ecx] ; last line of top-left border
|
||||
lea ebp, [esi+ebx]
|
||||
lea ebp, [esi+ebx]
|
||||
lea ebp, [ebp+ecx] ; last line of top-right border
|
||||
mov ecx, [esp+28] ; kiStride
|
||||
add edx, 16 ; kiHeight+16, luma=32, chroma=16
|
||||
@ -642,14 +642,14 @@ ExpandPictureChromaUnalign_sse2:
|
||||
neg ecx ; -kiStride
|
||||
; for left & right border expanding
|
||||
exp_cross_sse2 16, u
|
||||
|
||||
|
||||
; sfence ; commit cache write back memory
|
||||
|
||||
|
||||
pop ebp
|
||||
pop edi
|
||||
pop esi
|
||||
pop edx
|
||||
pop ebx
|
||||
|
||||
|
||||
ret
|
||||
|
||||
|
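Review note: the three ExpandPicture*_sse2 routines above pad a decoded plane by replicating its edge pixels (32 pixels for luma, 16 for chroma, per the comments). As a reading aid, here is a minimal scalar sketch of that operation in C. The function name `expand_picture_c` and its parameter list are hypothetical, not taken from the repository, and the sketch fills the left/right borders first and then copies whole widened rows, whereas the assembly handles top/bottom, left/right, and the four corners in separate passes.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical reference (not the project's API): replicate the edge pixels
 * of a width x height plane into a border of `pad` pixels on every side.
 * `pic` points at the top-left picture pixel; the caller must have allocated
 * `pad` extra rows above and below and `pad` extra bytes left and right. */
static void expand_picture_c(uint8_t *pic, int stride, int width, int height, int pad)
{
    /* Left and right borders: repeat the first and last pixel of each row. */
    for (int y = 0; y < height; y++) {
        uint8_t *row = pic + y * stride;
        memset(row - pad, row[0], (size_t)pad);
        memset(row + width, row[width - 1], (size_t)pad);
    }

    /* Top and bottom borders, corners included: copy the already widened
     * first and last rows outwards. */
    uint8_t *top = pic - pad;
    uint8_t *bottom = pic + (height - 1) * stride - pad;
    for (int i = 1; i <= pad; i++) {
        memcpy(top - i * stride, top, (size_t)(width + 2 * pad));
        memcpy(bottom + i * stride, bottom, (size_t)(width + 2 * pad));
    }
}
```

Calling it with `pad = 32` for luma and `pad = 16` for chroma would mirror the constants passed to the `exp_*_sse2` macros above.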
@ -37,7 +37,7 @@
|
||||
;* History
|
||||
;* 15/09/2009 Created
|
||||
;* 12/28/2009 Modified with larger throughput
|
||||
;* 12/29/2011 Tuned WelsCopy16x16NotAligned_sse2, added UpdateMbMv_sse2 WelsCopy16x8NotAligned_sse2,
|
||||
;* 12/29/2011 Tuned WelsCopy16x16NotAligned_sse2, added UpdateMbMv_sse2 WelsCopy16x8NotAligned_sse2,
|
||||
;* WelsCopy16x8_mmx, WelsCopy8x16_mmx etc;
|
||||
;*
|
||||
;*
|
||||
@ -84,7 +84,7 @@ ALIGN 16
|
||||
; int iHeight );
|
||||
;*******************************************************************************
|
||||
PixelAvgWidthEq4_mmx:
|
||||
|
||||
|
||||
push esi
|
||||
push edi
|
||||
push ebp
|
||||
@ -102,7 +102,7 @@ ALIGN 4
|
||||
movd mm0, [ebp]
|
||||
pavgb mm0, [esi]
|
||||
movd [edi], mm0
|
||||
|
||||
|
||||
dec ebx
|
||||
lea edi, [edi+eax]
|
||||
lea esi, [esi+ecx]
|
||||
@ -115,7 +115,7 @@ ALIGN 4
|
||||
pop edi
|
||||
pop esi
|
||||
ret
|
||||
|
||||
|
||||
ALIGN 16
|
||||
;*******************************************************************************
|
||||
; void_t PixelAvgWidthEq8_mmx( uint8_t *pDst, int iDstStride,
|
||||
@ -124,7 +124,7 @@ ALIGN 16
|
||||
; int iHeight );
|
||||
;*******************************************************************************
|
||||
PixelAvgWidthEq8_mmx:
|
||||
|
||||
|
||||
push esi
|
||||
push edi
|
||||
push ebp
|
||||
@ -145,14 +145,14 @@ ALIGN 4
|
||||
movq mm0, [esi+ecx]
|
||||
pavgb mm0, [ebp+edx]
|
||||
movq [edi+eax], mm0
|
||||
|
||||
|
||||
lea esi, [esi+2*ecx]
|
||||
lea ebp, [ebp+2*edx]
|
||||
lea edi, [edi+2*eax]
|
||||
|
||||
|
||||
sub ebx, 2
|
||||
jnz .height_loop
|
||||
|
||||
|
||||
WELSEMMS
|
||||
pop ebx
|
||||
pop ebp
|
||||
@ -174,7 +174,7 @@ PixelAvgWidthEq16_sse2:
|
||||
push edi
|
||||
push ebp
|
||||
push ebx
|
||||
|
||||
|
||||
|
||||
mov edi, [esp+20] ; pDst
|
||||
mov eax, [esp+24] ; iDstStride
|
||||
@ -188,28 +188,28 @@ ALIGN 4
|
||||
movdqu xmm0, [esi]
|
||||
pavgb xmm0, [ebp]
|
||||
movdqu [edi], xmm0
|
||||
|
||||
|
||||
movdqu xmm0, [esi+ecx]
|
||||
pavgb xmm0, [ebp+edx]
|
||||
movdqu [edi+eax], xmm0
|
||||
|
||||
|
||||
movdqu xmm0, [esi+2*ecx]
|
||||
pavgb xmm0, [ebp+2*edx]
|
||||
movdqu [edi+2*eax], xmm0
|
||||
|
||||
|
||||
lea esi, [esi+2*ecx]
|
||||
lea ebp, [ebp+2*edx]
|
||||
lea edi, [edi+2*eax]
|
||||
|
||||
|
||||
movdqu xmm0, [esi+ecx]
|
||||
pavgb xmm0, [ebp+edx]
|
||||
movdqu [edi+eax], xmm0
|
||||
|
||||
|
||||
lea esi, [esi+2*ecx]
|
||||
lea ebp, [ebp+2*edx]
|
||||
lea edi, [edi+2*eax]
|
||||
|
||||
|
||||
|
||||
|
||||
sub ebx, 4
|
||||
jne .height_loop
|
||||
|
||||
@ -232,7 +232,7 @@ McCopyWidthEq4_mmx:
|
||||
push edi
|
||||
push ebx
|
||||
|
||||
|
||||
|
||||
mov esi, [esp+16]
|
||||
mov eax, [esp+20]
|
||||
mov edi, [esp+24]
|
||||
@ -242,12 +242,12 @@ ALIGN 4
|
||||
.height_loop:
|
||||
mov ebx, [esi]
|
||||
mov [edi], ebx
|
||||
|
||||
|
||||
add esi, eax
|
||||
add edi, ecx
|
||||
dec edx
|
||||
jnz .height_loop
|
||||
WELSEMMS
|
||||
WELSEMMS
|
||||
pop ebx
|
||||
pop edi
|
||||
pop esi
|
||||
@ -275,12 +275,12 @@ ALIGN 4
|
||||
add edi, ecx
|
||||
dec edx
|
||||
jnz .height_loop
|
||||
|
||||
WELSEMMS
|
||||
|
||||
WELSEMMS
|
||||
pop edi
|
||||
pop esi
|
||||
ret
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
@ -308,7 +308,7 @@ McCopyWidthEq16_sse2:
|
||||
push edi
|
||||
|
||||
mov esi, [esp+12] ; pSrc
|
||||
mov eax, [esp+16] ; iSrcStride
|
||||
mov eax, [esp+16] ; iSrcStride
|
||||
mov edi, [esp+20] ; pDst
|
||||
mov edx, [esp+24] ; iDstStride
|
||||
mov ecx, [esp+28] ; iHeight
|
||||
@ -324,7 +324,7 @@ ALIGN 4
|
||||
lea esi, [esi+eax*2]
|
||||
lea edi, [edi+edx*2]
|
||||
jnz .height_loop
|
||||
|
||||
|
||||
pop edi
|
||||
pop esi
|
||||
ret
|
||||
|
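Review note: the PixelAvgWidthEq*_mmx/_sse2 loops in this file are built around `pavgb`, which computes the rounded average `(a + b + 1) >> 1` of each byte pair. A scalar sketch of the same operation follows. The name `pixel_avg_c` and the generic `width` parameter are assumptions made for illustration; the assembly versions are specialised for widths 4, 8, and 16.

```c
#include <stdint.h>

/* Hypothetical scalar reference: per-pixel rounded average of two blocks,
 * i.e. the byte-wise operation that pavgb performs. */
static void pixel_avg_c(uint8_t *dst, int dst_stride,
                        const uint8_t *src_a, int a_stride,
                        const uint8_t *src_b, int b_stride,
                        int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++)
            dst[x] = (uint8_t)((src_a[x] + src_b[x] + 1) >> 1);
        dst += dst_stride;
        src_a += a_stride;
        src_b += b_stride;
    }
}
```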
@ -69,11 +69,11 @@ SECTION .text
|
||||
|
||||
ALIGN 16
|
||||
;*******************************************************************************
|
||||
; void McChromaWidthEq4_mmx( uint8_t *src,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; uint8_t *pABCD,
|
||||
; void McChromaWidthEq4_mmx( uint8_t *src,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; uint8_t *pABCD,
|
||||
; int32_t iHeigh );
|
||||
;*******************************************************************************
|
||||
WELS_EXTERN McChromaWidthEq4_mmx
|
||||
@ -81,46 +81,46 @@ McChromaWidthEq4_mmx:
|
||||
push esi
|
||||
push edi
|
||||
push ebx
|
||||
|
||||
|
||||
mov eax, [esp +12 + 20]
|
||||
movd mm3, [eax]
|
||||
WELS_Zero mm7
|
||||
punpcklbw mm3, mm3
|
||||
movq mm4, mm3
|
||||
punpcklwd mm3, mm3
|
||||
punpckhwd mm4, mm4
|
||||
|
||||
punpcklwd mm3, mm3
|
||||
punpckhwd mm4, mm4
|
||||
|
||||
movq mm5, mm3
|
||||
punpcklbw mm3, mm7
|
||||
punpckhbw mm5, mm7
|
||||
|
||||
|
||||
movq mm6, mm4
|
||||
punpcklbw mm4, mm7
|
||||
punpckhbw mm6, mm7
|
||||
|
||||
mov esi, [esp +12+ 4]
|
||||
mov eax, [esp + 12 + 8]
|
||||
mov edi, [esp + 12 + 12]
|
||||
mov edx, [esp + 12 + 16]
|
||||
mov ecx, [esp + 12 + 24]
|
||||
|
||||
|
||||
mov esi, [esp +12+ 4]
|
||||
mov eax, [esp + 12 + 8]
|
||||
mov edi, [esp + 12 + 12]
|
||||
mov edx, [esp + 12 + 16]
|
||||
mov ecx, [esp + 12 + 24]
|
||||
|
||||
lea ebx, [esi + eax]
|
||||
movd mm0, [esi]
|
||||
movd mm1, [esi+1]
|
||||
punpcklbw mm0, mm7
|
||||
punpcklbw mm1, mm7
|
||||
.xloop:
|
||||
|
||||
|
||||
pmullw mm0, mm3
|
||||
pmullw mm1, mm5
|
||||
paddw mm0, mm1
|
||||
|
||||
|
||||
movd mm1, [ebx]
|
||||
punpcklbw mm1, mm7
|
||||
movq mm2, mm1
|
||||
pmullw mm1, mm4
|
||||
paddw mm0, mm1
|
||||
|
||||
|
||||
movd mm1, [ebx+1]
|
||||
punpcklbw mm1, mm7
|
||||
movq mm7, mm1
|
||||
@ -130,13 +130,13 @@ McChromaWidthEq4_mmx:
|
||||
|
||||
paddw mm0, [h264_d0x20_mmx]
|
||||
psrlw mm0, 6
|
||||
|
||||
|
||||
WELS_Zero mm7
|
||||
packuswb mm0, mm7
|
||||
movd [edi], mm0
|
||||
movd [edi], mm0
|
||||
|
||||
movq mm0, mm2
|
||||
|
||||
|
||||
lea edi, [edi +edx ]
|
||||
lea ebx, [ebx + eax]
|
||||
|
||||
@ -151,11 +151,11 @@ McChromaWidthEq4_mmx:
|
||||
|
||||
ALIGN 16
|
||||
;*******************************************************************************
|
||||
; void McChromaWidthEq8_sse2( uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; uint8_t *pABCD,
|
||||
; void McChromaWidthEq8_sse2( uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; uint8_t *pABCD,
|
||||
; int32_t iheigh );
|
||||
;*******************************************************************************
|
||||
WELS_EXTERN McChromaWidthEq8_sse2
|
||||
@ -163,47 +163,47 @@ McChromaWidthEq8_sse2:
|
||||
push esi
|
||||
push edi
|
||||
push ebx
|
||||
|
||||
|
||||
mov eax, [esp +12 + 20]
|
||||
movd xmm3, [eax]
|
||||
WELS_Zero xmm7
|
||||
punpcklbw xmm3, xmm3
|
||||
punpcklwd xmm3, xmm3
|
||||
|
||||
|
||||
movdqa xmm4, xmm3
|
||||
punpckldq xmm3, xmm3
|
||||
punpckhdq xmm4, xmm4
|
||||
movdqa xmm5, xmm3
|
||||
movdqa xmm6, xmm4
|
||||
|
||||
|
||||
punpcklbw xmm3, xmm7
|
||||
punpckhbw xmm5, xmm7
|
||||
punpcklbw xmm4, xmm7
|
||||
punpckhbw xmm6, xmm7
|
||||
|
||||
mov esi, [esp +12+ 4]
|
||||
mov eax, [esp + 12 + 8]
|
||||
mov edi, [esp + 12 + 12]
|
||||
mov edx, [esp + 12 + 16]
|
||||
mov ecx, [esp + 12 + 24]
|
||||
|
||||
|
||||
mov esi, [esp +12+ 4]
|
||||
mov eax, [esp + 12 + 8]
|
||||
mov edi, [esp + 12 + 12]
|
||||
mov edx, [esp + 12 + 16]
|
||||
mov ecx, [esp + 12 + 24]
|
||||
|
||||
lea ebx, [esi + eax]
|
||||
movq xmm0, [esi]
|
||||
movq xmm1, [esi+1]
|
||||
punpcklbw xmm0, xmm7
|
||||
punpcklbw xmm1, xmm7
|
||||
.xloop:
|
||||
|
||||
|
||||
pmullw xmm0, xmm3
|
||||
pmullw xmm1, xmm5
|
||||
paddw xmm0, xmm1
|
||||
|
||||
|
||||
movq xmm1, [ebx]
|
||||
punpcklbw xmm1, xmm7
|
||||
movdqa xmm2, xmm1
|
||||
pmullw xmm1, xmm4
|
||||
paddw xmm0, xmm1
|
||||
|
||||
|
||||
movq xmm1, [ebx+1]
|
||||
punpcklbw xmm1, xmm7
|
||||
movdqa xmm7, xmm1
|
||||
@ -213,19 +213,19 @@ McChromaWidthEq8_sse2:
|
||||
|
||||
paddw xmm0, [h264_d0x20_sse2]
|
||||
psrlw xmm0, 6
|
||||
|
||||
|
||||
WELS_Zero xmm7
|
||||
packuswb xmm0, xmm7
|
||||
movq [edi], xmm0
|
||||
movq [edi], xmm0
|
||||
|
||||
movdqa xmm0, xmm2
|
||||
|
||||
|
||||
lea edi, [edi +edx ]
|
||||
lea ebx, [ebx + eax]
|
||||
|
||||
dec ecx
|
||||
jnz near .xloop
|
||||
|
||||
|
||||
pop ebx
|
||||
pop edi
|
||||
pop esi
|
||||
@ -237,8 +237,8 @@ McChromaWidthEq8_sse2:
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
; void McChromaWidthEq8_ssse3( uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; uint8_t *pABCD,
|
||||
; int32_t iHeigh);
|
||||
@ -248,23 +248,23 @@ McChromaWidthEq8_ssse3:
|
||||
push ebx
|
||||
push esi
|
||||
push edi
|
||||
|
||||
|
||||
mov eax, [esp + 12 + 20]
|
||||
|
||||
pxor xmm7, xmm7
|
||||
movd xmm5, [eax]
|
||||
punpcklwd xmm5, xmm5
|
||||
punpckldq xmm5, xmm5
|
||||
movd xmm5, [eax]
|
||||
punpcklwd xmm5, xmm5
|
||||
punpckldq xmm5, xmm5
|
||||
movdqa xmm6, xmm5
|
||||
punpcklqdq xmm5, xmm5
|
||||
punpckhqdq xmm6, xmm6
|
||||
|
||||
mov eax, [esp + 12 + 4]
|
||||
mov edx, [esp + 12 + 8]
|
||||
mov esi, [esp + 12 + 12]
|
||||
mov edi, [esp + 12 + 16]
|
||||
mov ecx, [esp + 12 + 24]
|
||||
|
||||
punpckhqdq xmm6, xmm6
|
||||
|
||||
mov eax, [esp + 12 + 4]
|
||||
mov edx, [esp + 12 + 8]
|
||||
mov esi, [esp + 12 + 12]
|
||||
mov edi, [esp + 12 + 16]
|
||||
mov ecx, [esp + 12 + 24]
|
||||
|
||||
sub esi, edi
|
||||
sub esi, edi
|
||||
movdqa xmm7, [h264_d0x20_sse2]
|
||||
@ -273,39 +273,39 @@ McChromaWidthEq8_ssse3:
|
||||
movdqa xmm1, xmm0
|
||||
psrldq xmm1, 1
|
||||
punpcklbw xmm0, xmm1
|
||||
|
||||
.hloop_chroma:
|
||||
|
||||
.hloop_chroma:
|
||||
lea esi, [esi+2*edi]
|
||||
|
||||
|
||||
movdqu xmm2, [eax+edx]
|
||||
movdqa xmm3, xmm2
|
||||
psrldq xmm3, 1
|
||||
punpcklbw xmm2, xmm3
|
||||
movdqa xmm4, xmm2
|
||||
|
||||
|
||||
pmaddubsw xmm0, xmm5
|
||||
pmaddubsw xmm2, xmm6
|
||||
paddw xmm0, xmm2
|
||||
paddw xmm0, xmm7
|
||||
psrlw xmm0, 6
|
||||
packuswb xmm0, xmm0
|
||||
movq [esi],xmm0
|
||||
|
||||
movq [esi],xmm0
|
||||
|
||||
lea eax, [eax+2*edx]
|
||||
movdqu xmm2, [eax]
|
||||
movdqa xmm3, xmm2
|
||||
psrldq xmm3, 1
|
||||
punpcklbw xmm2, xmm3
|
||||
movdqa xmm0, xmm2
|
||||
|
||||
|
||||
pmaddubsw xmm4, xmm5
|
||||
pmaddubsw xmm2, xmm6
|
||||
paddw xmm4, xmm2
|
||||
paddw xmm4, xmm7
|
||||
psrlw xmm4, 6
|
||||
packuswb xmm4, xmm4
|
||||
movq [esi+edi],xmm4
|
||||
|
||||
movq [esi+edi],xmm4
|
||||
|
||||
sub ecx, 2
|
||||
jnz .hloop_chroma
|
||||
pop edi
|
||||
|
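Review note: the McChromaWidthEq* routines above weight the four neighbouring samples with the bytes loaded from `pABCD`, add 0x20, and shift right by 6. A scalar sketch of that bilinear interpolation follows. The name `mc_chroma_c` and the `width` parameter are hypothetical, and the sketch assumes the four weights sum to 64 so the result always fits in a byte.

```c
#include <stdint.h>

/* Hypothetical scalar reference for the bilinear chroma interpolation:
 * dst = (A*p[y][x] + B*p[y][x+1] + C*p[y+1][x] + D*p[y+1][x+1] + 32) >> 6.
 * Assumes abcd[0] + abcd[1] + abcd[2] + abcd[3] == 64.
 * Reads (width + 1) x (height + 1) source samples. */
static void mc_chroma_c(const uint8_t *src, int src_stride,
                        uint8_t *dst, int dst_stride,
                        const uint8_t *abcd, int width, int height)
{
    const int a = abcd[0], b = abcd[1], c = abcd[2], d = abcd[3];

    for (int y = 0; y < height; y++) {
        const uint8_t *below = src + src_stride;
        for (int x = 0; x < width; x++) {
            int sum = a * src[x] + b * src[x + 1] +
                      c * below[x] + d * below[x + 1];
            dst[x] = (uint8_t)((sum + 32) >> 6);
        }
        src += src_stride;
        dst += dst_stride;
    }
}
```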
@ -69,16 +69,16 @@ WELS_EXTERN McHorVer20WidthEq4_mmx
|
||||
|
||||
ALIGN 16
|
||||
;*******************************************************************************
|
||||
; void_t McHorVer20WidthEq4_mmx( uint8_t *pSrc,
|
||||
; int iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int iDstStride,
|
||||
; void_t McHorVer20WidthEq4_mmx( uint8_t *pSrc,
|
||||
; int iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int iDstStride,
|
||||
; int iHeight)
|
||||
;*******************************************************************************
|
||||
McHorVer20WidthEq4_mmx:
|
||||
push esi
|
||||
push edi
|
||||
|
||||
|
||||
mov esi, [esp+12]
|
||||
mov eax, [esp+16]
|
||||
mov edi, [esp+20]
|
||||
@ -100,7 +100,7 @@ McHorVer20WidthEq4_mmx:
|
||||
punpcklbw mm4, mm7
|
||||
movd mm5, [esi+3]
|
||||
punpcklbw mm5, mm7
|
||||
|
||||
|
||||
paddw mm2, mm3
|
||||
paddw mm4, mm5
|
||||
psllw mm4, 2
|
||||
@ -113,12 +113,12 @@ McHorVer20WidthEq4_mmx:
|
||||
psraw mm0, 5
|
||||
packuswb mm0, mm7
|
||||
movd [edi], mm0
|
||||
|
||||
|
||||
add esi, eax
|
||||
add edi, ecx
|
||||
dec edx
|
||||
jnz .height_loop
|
||||
|
||||
|
||||
WELSEMMS
|
||||
pop edi
|
||||
pop esi
|
||||
@ -181,8 +181,8 @@ WELS_EXTERN McHorVer20WidthEq16_sse2
|
||||
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
; void_t McHorVer22Width8HorFirst_sse2(int16_t *pSrc,
|
||||
; int16_t iSrcStride,
|
||||
; void_t McHorVer22Width8HorFirst_sse2(int16_t *pSrc,
|
||||
; int16_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride
|
||||
; int32_t iHeight
|
||||
@ -197,11 +197,11 @@ McHorVer22Width8HorFirst_sse2:
|
||||
mov edi, [esp+24] ;pDst
|
||||
mov edx, [esp+28] ;iDstStride
|
||||
mov ebx, [esp+32] ;iHeight
|
||||
pxor xmm7, xmm7
|
||||
|
||||
pxor xmm7, xmm7
|
||||
|
||||
sub esi, eax ;;;;;;;;need more 5 lines.
|
||||
sub esi, eax
|
||||
|
||||
|
||||
.yloop_width_8:
|
||||
movq xmm0, [esi]
|
||||
punpcklbw xmm0, xmm7
|
||||
@ -215,7 +215,7 @@ McHorVer22Width8HorFirst_sse2:
|
||||
punpcklbw xmm4, xmm7
|
||||
movq xmm5, [esi+3]
|
||||
punpcklbw xmm5, xmm7
|
||||
|
||||
|
||||
paddw xmm2, xmm3
|
||||
paddw xmm4, xmm5
|
||||
psllw xmm4, 2
|
||||
@ -225,7 +225,7 @@ McHorVer22Width8HorFirst_sse2:
|
||||
psllw xmm4, 2
|
||||
paddw xmm0, xmm4
|
||||
movdqa [edi], xmm0
|
||||
|
||||
|
||||
add esi, eax
|
||||
add edi, edx
|
||||
dec ebx
|
||||
@ -238,8 +238,8 @@ McHorVer22Width8HorFirst_sse2:
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
;void_t McHorVer22VerLast_sse2(
|
||||
; uint8_t *pSrc,
|
||||
; int32_t pSrcStride,
|
||||
; uint8_t *pSrc,
|
||||
; int32_t pSrcStride,
|
||||
; uint8_t * pDst,
|
||||
; int32_t iDstStride,
|
||||
; int32_t iWidth,
|
||||
@ -250,17 +250,17 @@ ALIGN 16
|
||||
paddw %1, %6
|
||||
movdqa %7, %2
|
||||
movdqa %8, %3
|
||||
|
||||
|
||||
|
||||
|
||||
paddw %7, %5
|
||||
paddw %8, %4
|
||||
|
||||
psubw %1, %7
|
||||
psraw %1, 2
|
||||
paddw %1, %8
|
||||
psubw %1, %7
|
||||
psraw %1, 2
|
||||
paddw %8, %1
|
||||
|
||||
psubw %1, %7
|
||||
psraw %1, 2
|
||||
paddw %1, %8
|
||||
psubw %1, %7
|
||||
psraw %1, 2
|
||||
paddw %8, %1
|
||||
paddw %8, [h264_mc_hc_32]
|
||||
psraw %8, 6
|
||||
packuswb %8, %8
|
||||
@ -272,15 +272,15 @@ McHorVer22VerLast_sse2:
|
||||
push edi
|
||||
push ebx
|
||||
push ebp
|
||||
|
||||
|
||||
mov esi, [esp+20]
|
||||
mov eax, [esp+24]
|
||||
mov edi, [esp+28]
|
||||
mov edx, [esp+32]
|
||||
mov ebx, [esp+36]
|
||||
mov ecx, [esp+40]
|
||||
shr ebx, 3
|
||||
|
||||
mov ecx, [esp+40]
|
||||
shr ebx, 3
|
||||
|
||||
.width_loop:
|
||||
movdqa xmm0, [esi]
|
||||
movdqa xmm1, [esi+eax]
|
||||
@ -290,73 +290,73 @@ McHorVer22VerLast_sse2:
|
||||
lea esi, [esi+2*eax]
|
||||
movdqa xmm4, [esi]
|
||||
movdqa xmm5, [esi+eax]
|
||||
|
||||
|
||||
FILTER_VER xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, [edi]
|
||||
dec ecx
|
||||
lea esi, [esi+2*eax]
|
||||
movdqa xmm6, [esi]
|
||||
|
||||
|
||||
movdqa xmm0, xmm1
|
||||
movdqa xmm1, xmm2
|
||||
movdqa xmm2, xmm3
|
||||
movdqa xmm3, xmm4
|
||||
movdqa xmm4, xmm5
|
||||
movdqa xmm5, xmm6
|
||||
|
||||
|
||||
add edi, edx
|
||||
sub esi, eax
|
||||
|
||||
sub esi, eax
|
||||
|
||||
.start:
|
||||
FILTER_VER xmm0,xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, [edi]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea esi, [esi+2*eax]
|
||||
movdqa xmm6, [esi]
|
||||
FILTER_VER xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0,[edi+edx]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea edi, [edi+2*edx]
|
||||
movdqa xmm7, [esi+eax]
|
||||
FILTER_VER xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, xmm1, [edi]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea esi, [esi+2*eax]
|
||||
movdqa xmm0, [esi]
|
||||
FILTER_VER xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, xmm1, xmm2,[edi+edx]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea edi, [edi+2*edx]
|
||||
movdqa xmm1, [esi+eax]
|
||||
FILTER_VER xmm4, xmm5, xmm6, xmm7, xmm0, xmm1, xmm2, xmm3,[edi]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea esi, [esi+2*eax]
|
||||
movdqa xmm2, [esi]
|
||||
FILTER_VER xmm5, xmm6, xmm7, xmm0, xmm1, xmm2, xmm3,xmm4,[edi+edx]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea edi, [edi+2*edx]
|
||||
movdqa xmm3, [esi+eax]
|
||||
FILTER_VER xmm6, xmm7, xmm0, xmm1, xmm2, xmm3,xmm4,xmm5,[edi]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea esi, [esi+2*eax]
|
||||
movdqa xmm4, [esi]
|
||||
FILTER_VER xmm7, xmm0, xmm1, xmm2, xmm3,xmm4,xmm5,xmm6, [edi+edx]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea edi, [edi+2*edx]
|
||||
movdqa xmm5, [esi+eax]
|
||||
jmp near .start
|
||||
|
||||
|
||||
.x_loop_dec:
|
||||
dec ebx
|
||||
jz near .exit
|
||||
@ -366,9 +366,9 @@ McHorVer22VerLast_sse2:
|
||||
add esi, 16
|
||||
add edi, 8
|
||||
jmp .width_loop
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.exit:
|
||||
pop ebp
|
||||
pop ebx
|
||||
@ -379,28 +379,28 @@ McHorVer22VerLast_sse2:
|
||||
|
||||
ALIGN 16
|
||||
;*******************************************************************************
|
||||
; void_t McHorVer20WidthEq8_sse2( uint8_t *pSrc,
|
||||
; int iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int iDstStride,
|
||||
; void_t McHorVer20WidthEq8_sse2( uint8_t *pSrc,
|
||||
; int iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int iDstStride,
|
||||
; int iHeight,
|
||||
; );
|
||||
;*******************************************************************************
|
||||
McHorVer20WidthEq8_sse2:
|
||||
push esi
|
||||
push edi
|
||||
|
||||
|
||||
mov esi, [esp + 12] ;pSrc
|
||||
mov eax, [esp + 16] ;iSrcStride
|
||||
mov edi, [esp + 20] ;pDst
|
||||
mov ecx, [esp + 28] ;iHeight
|
||||
mov edx, [esp + 24] ;iDstStride
|
||||
|
||||
|
||||
lea esi, [esi-2] ;pSrc -= 2;
|
||||
|
||||
|
||||
pxor xmm7, xmm7
|
||||
movdqa xmm6, [h264_w0x10_1]
|
||||
.y_loop:
|
||||
.y_loop:
|
||||
movq xmm0, [esi]
|
||||
punpcklbw xmm0, xmm7
|
||||
movq xmm1, [esi+5]
|
||||
@ -413,7 +413,7 @@ McHorVer20WidthEq8_sse2:
|
||||
punpcklbw xmm4, xmm7
|
||||
movq xmm5, [esi+3]
|
||||
punpcklbw xmm5, xmm7
|
||||
|
||||
|
||||
paddw xmm2, xmm3
|
||||
paddw xmm4, xmm5
|
||||
psllw xmm4, 2
|
||||
@ -424,7 +424,7 @@ McHorVer20WidthEq8_sse2:
|
||||
paddw xmm0, xmm4
|
||||
paddw xmm0, xmm6
|
||||
psraw xmm0, 5
|
||||
|
||||
|
||||
packuswb xmm0, xmm7
|
||||
movq [edi], xmm0
|
||||
|
||||
@ -432,37 +432,37 @@ McHorVer20WidthEq8_sse2:
|
||||
lea esi, [esi+eax]
|
||||
dec ecx
|
||||
jnz near .y_loop
|
||||
|
||||
|
||||
pop edi
|
||||
pop esi
|
||||
ret
|
||||
|
||||
|
||||
ALIGN 16
|
||||
;*******************************************************************************
|
||||
; void_t McHorVer20WidthEq16_sse2( uint8_t *pSrc,
|
||||
; int iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int iDstStride,
|
||||
; void_t McHorVer20WidthEq16_sse2( uint8_t *pSrc,
|
||||
; int iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int iDstStride,
|
||||
; int iHeight,
|
||||
; );
|
||||
;*******************************************************************************
|
||||
McHorVer20WidthEq16_sse2:
|
||||
push esi
|
||||
push edi
|
||||
|
||||
|
||||
|
||||
mov esi, [esp + 12] ;pSrc
|
||||
mov eax, [esp + 16] ;iSrcStride
|
||||
mov edi, [esp + 20] ;pDst
|
||||
mov ecx, [esp + 28] ;iHeight
|
||||
mov edx, [esp + 24] ;iDstStride
|
||||
|
||||
|
||||
lea esi, [esi-2] ;pSrc -= 2;
|
||||
|
||||
|
||||
pxor xmm7, xmm7
|
||||
movdqa xmm6, [h264_w0x10_1]
|
||||
.y_loop:
|
||||
|
||||
|
||||
movq xmm0, [esi]
|
||||
punpcklbw xmm0, xmm7
|
||||
movq xmm1, [esi+5]
|
||||
@ -475,7 +475,7 @@ McHorVer20WidthEq16_sse2:
|
||||
punpcklbw xmm4, xmm7
|
||||
movq xmm5, [esi+3]
|
||||
punpcklbw xmm5, xmm7
|
||||
|
||||
|
||||
paddw xmm2, xmm3
|
||||
paddw xmm4, xmm5
|
||||
psllw xmm4, 2
|
||||
@ -501,7 +501,7 @@ McHorVer20WidthEq16_sse2:
|
||||
punpcklbw xmm4, xmm7
|
||||
movq xmm5, [esi+3+8]
|
||||
punpcklbw xmm5, xmm7
|
||||
|
||||
|
||||
paddw xmm2, xmm3
|
||||
paddw xmm4, xmm5
|
||||
psllw xmm4, 2
|
||||
@ -514,9 +514,9 @@ McHorVer20WidthEq16_sse2:
|
||||
psraw xmm0, 5
|
||||
packuswb xmm0, xmm7
|
||||
movq [edi+8], xmm0
|
||||
|
||||
lea edi, [edi+edx]
|
||||
lea esi, [esi+eax]
|
||||
|
||||
lea edi, [edi+edx]
|
||||
lea esi, [esi+eax]
|
||||
dec ecx
|
||||
jnz near .y_loop
|
||||
pop edi
|
||||
@ -525,17 +525,17 @@ McHorVer20WidthEq16_sse2:
|
||||
|
||||
|
||||
;*******************************************************************************
|
||||
; void_t McHorVer02WidthEq8_sse2( uint8_t *pSrc,
|
||||
; int iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int iDstStride,
|
||||
; void_t McHorVer02WidthEq8_sse2( uint8_t *pSrc,
|
||||
; int iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int iDstStride,
|
||||
; int iHeight )
|
||||
;*******************************************************************************
|
||||
ALIGN 16
|
||||
McHorVer02WidthEq8_sse2:
|
||||
push esi
|
||||
push edi
|
||||
|
||||
|
||||
mov esi, [esp + 12] ;pSrc
|
||||
mov edx, [esp + 16] ;iSrcStride
|
||||
mov edi, [esp + 20] ;pDst
|
||||
@ -546,7 +546,7 @@ McHorVer02WidthEq8_sse2:
|
||||
sub esi, edx
|
||||
|
||||
WELS_Zero xmm7
|
||||
|
||||
|
||||
SSE_LOAD_8P xmm0, xmm7, [esi]
|
||||
SSE_LOAD_8P xmm1, xmm7, [esi+edx]
|
||||
lea esi, [esi+2*edx]
|
||||
@ -555,8 +555,8 @@ McHorVer02WidthEq8_sse2:
|
||||
lea esi, [esi+2*edx]
|
||||
SSE_LOAD_8P xmm4, xmm7, [esi]
|
||||
SSE_LOAD_8P xmm5, xmm7, [esi+edx]
|
||||
|
||||
.start:
|
||||
|
||||
.start:
|
||||
FILTER_HV_W8 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, [edi]
|
||||
dec ecx
|
||||
jz near .xx_exit
|
||||
@ -566,7 +566,7 @@ McHorVer02WidthEq8_sse2:
|
||||
FILTER_HV_W8 xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, [edi+eax]
|
||||
dec ecx
|
||||
jz near .xx_exit
|
||||
|
||||
|
||||
lea edi, [edi+2*eax]
|
||||
SSE_LOAD_8P xmm7, xmm0, [esi+edx]
|
||||
FILTER_HV_W8 xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, xmm1, [edi]
|
||||
|
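Review note: the hunks above show only fragments of the McHorVer20* loops (source pointer moved back by 2, rounding constant added, arithmetic shift right by 5, `packuswb`). These are consistent with the standard H.264 half-sample luma filter with taps (1, -5, 20, 20, -5, 1). The scalar sketch below implements that standard filter, not a line-by-line translation of the assembly; the name `mc_hor_half_c` and the `width` parameter are assumptions.

```c
#include <stdint.h>

/* Add the rounding offset 16, shift right by 5, clamp to 0..255. */
static uint8_t round_clip_u8(int sum)
{
    int v = sum + 16;
    if (v < 0)
        return 0;
    v >>= 5;
    return (uint8_t)(v > 255 ? 255 : v);
}

/* Hypothetical scalar reference for horizontal half-sample interpolation.
 * Reads src[x - 2] .. src[x + 3] for every output sample. */
static void mc_hor_half_c(const uint8_t *src, int src_stride,
                          uint8_t *dst, int dst_stride,
                          int width, int height)
{
    for (int y = 0; y < height; y++) {
        for (int x = 0; x < width; x++) {
            const uint8_t *p = src + x - 2;
            int sum = (p[0] + p[5]) - 5 * (p[1] + p[4]) + 20 * (p[2] + p[3]);
            dst[x] = round_clip_u8(sum);
        }
        src += src_stride;
        dst += dst_stride;
    }
}
```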
@ -32,7 +32,7 @@
|
||||
;* memzero.asm
|
||||
;*
|
||||
;* Abstract
|
||||
;*
|
||||
;*
|
||||
;*
|
||||
;* History
|
||||
;* 9/16/2009 Created
|
||||
@ -47,8 +47,8 @@ BITS 32
|
||||
; Code
|
||||
;***********************************************************************
|
||||
|
||||
SECTION .text
|
||||
|
||||
SECTION .text
|
||||
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
;_inline void __cdecl WelsPrefetchZero_mmx(int8_t const*_A);
|
||||
@ -57,7 +57,7 @@ WELS_EXTERN WelsPrefetchZero_mmx
|
||||
WelsPrefetchZero_mmx:
|
||||
mov eax,[esp+4]
|
||||
prefetchnta [eax]
|
||||
ret
|
||||
ret
|
||||
|
||||
|
||||
ALIGN 16
|
||||
@ -69,7 +69,7 @@ WelsSetMemZeroAligned64_sse2:
|
||||
mov eax, [esp + 4] ; dst
|
||||
mov ecx, [esp + 8]
|
||||
neg ecx
|
||||
|
||||
|
||||
pxor xmm0, xmm0
|
||||
.memzeroa64_sse2_loops:
|
||||
movdqa [eax], xmm0
|
||||
@ -77,11 +77,11 @@ WelsSetMemZeroAligned64_sse2:
|
||||
movdqa [eax+32], xmm0
|
||||
movdqa [eax+48], xmm0
|
||||
add eax, 0x40
|
||||
|
||||
|
||||
add ecx, 0x40
|
||||
jnz near .memzeroa64_sse2_loops
|
||||
|
||||
ret
|
||||
|
||||
ret
|
||||
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
@ -92,7 +92,7 @@ WelsSetMemZeroSize64_mmx:
|
||||
mov eax, [esp + 4] ; dst
|
||||
mov ecx, [esp + 8]
|
||||
neg ecx
|
||||
|
||||
|
||||
pxor mm0, mm0
|
||||
.memzero64_mmx_loops:
|
||||
movq [eax], mm0
|
||||
@ -102,16 +102,16 @@ WelsSetMemZeroSize64_mmx:
|
||||
movq [eax+32], mm0
|
||||
movq [eax+40], mm0
|
||||
movq [eax+48], mm0
|
||||
movq [eax+56], mm0
|
||||
movq [eax+56], mm0
|
||||
add eax, 0x40
|
||||
|
||||
|
||||
add ecx, 0x40
|
||||
jnz near .memzero64_mmx_loops
|
||||
|
||||
WELSEMMS
|
||||
ret
|
||||
|
||||
ALIGN 16
|
||||
|
||||
WELSEMMS
|
||||
ret
|
||||
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
; void WelsSetMemZeroSize8_mmx(void *dst, int32_t size)
|
||||
;***********************************************************************
|
||||
@ -119,17 +119,17 @@ WELS_EXTERN WelsSetMemZeroSize8_mmx
|
||||
WelsSetMemZeroSize8_mmx:
|
||||
mov eax, [esp + 4] ; dst
|
||||
mov ecx, [esp + 8] ; size
|
||||
neg ecx
|
||||
neg ecx
|
||||
pxor mm0, mm0
|
||||
|
||||
|
||||
.memzero8_mmx_loops:
|
||||
movq [eax], mm0
|
||||
add eax, 0x08
|
||||
|
||||
|
||||
add ecx, 0x08
|
||||
jnz near .memzero8_mmx_loops
|
||||
|
||||
WELSEMMS
|
||||
ret
|
||||
|
||||
|
||||
WELSEMMS
|
||||
ret
|
||||
|
||||
|
||||
|
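Review note: the memzero loops above negate the byte count (`neg ecx`) and add the block size each iteration until the counter reaches zero, so a single `add` both advances the counter and sets the flag tested by `jnz`. A C sketch of the 8-byte variant follows. The name `set_mem_zero_size8_c` is hypothetical, and like the assembly it assumes `size` is a positive multiple of 8.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical scalar counterpart of WelsSetMemZeroSize8_mmx.
 * Precondition (as in the asm loop): size > 0 and size % 8 == 0. */
static void set_mem_zero_size8_c(void *dst, int32_t size)
{
    uint8_t *p = (uint8_t *)dst;
    int32_t remaining = -size;      /* neg ecx */

    do {
        memset(p, 0, 8);            /* movq [eax], mm0 */
        p += 8;                     /* add eax, 0x08 */
        remaining += 8;             /* add ecx, 0x08 */
    } while (remaining != 0);       /* jnz .memzero8_mmx_loops */
}
```

If the precondition is violated the counter never hits zero exactly, which is why callers of the assembly versions must pass suitably sized buffers.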
@ -27,18 +27,18 @@ LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED
|
||||
// TEXTINCLUDE
|
||||
//
|
||||
|
||||
1 TEXTINCLUDE
|
||||
1 TEXTINCLUDE
|
||||
BEGIN
|
||||
"resource.h\0"
|
||||
END
|
||||
|
||||
2 TEXTINCLUDE
|
||||
2 TEXTINCLUDE
|
||||
BEGIN
|
||||
"#include ""afxres.h""\r\n"
|
||||
"\0"
|
||||
END
|
||||
|
||||
3 TEXTINCLUDE
|
||||
3 TEXTINCLUDE
|
||||
BEGIN
|
||||
"\r\n"
|
||||
"\0"
|
||||
|
@ -43,7 +43,7 @@
|
||||
; Options, for DEBUG
|
||||
;***********************************************************************
|
||||
|
||||
%if 1
|
||||
%if 1
|
||||
%define MOVDQ movdqa
|
||||
%else
|
||||
%define MOVDQ movdqu
|
||||
@ -58,7 +58,7 @@
|
||||
BITS 32
|
||||
|
||||
;***********************************************************************
|
||||
; Macros
|
||||
; Macros
|
||||
;***********************************************************************
|
||||
|
||||
%macro WELS_EXTERN 1
|
||||
@ -74,7 +74,7 @@ BITS 32
|
||||
pxor %2, %2
|
||||
psubw %2, %1
|
||||
pmaxsw %1, %2
|
||||
%endmacro
|
||||
%endmacro
|
||||
|
||||
%macro MMX_XSwap 4
|
||||
movq %4, %2
|
||||
@ -105,7 +105,7 @@ BITS 32
|
||||
SSE2_XSawp qdq, %5, %2, %3
|
||||
%endmacro
|
||||
|
||||
;in: xmm0, xmm1, xmm2, xmm3 pOut: xmm0, xmm1, xmm3, xmm4
|
||||
;in: xmm0, xmm1, xmm2, xmm3 pOut: xmm0, xmm1, xmm3, xmm4
|
||||
%macro SSE2_TransTwo4x4W 5
|
||||
SSE2_XSawp wd, %1, %2, %5
|
||||
SSE2_XSawp wd, %3, %4, %2
|
||||
@ -125,26 +125,26 @@ BITS 32
movdqa %6, %9
movdqa %9, %4
SSE2_XSawp bw, %7, %6, %4

SSE2_XSawp wd, %1, %3, %6
SSE2_XSawp wd, %8, %2, %3
SSE2_XSawp wd, %5, %7, %2
movdqa %7, %9
movdqa %9, %3
SSE2_XSawp wd, %7, %4, %3

SSE2_XSawp dq, %1, %5, %4
SSE2_XSawp dq, %6, %2, %5
SSE2_XSawp dq, %8, %7, %2
movdqa %7, %9
movdqa %9, %5
SSE2_XSawp dq, %7, %3, %5

SSE2_XSawp qdq, %1, %8, %3
SSE2_XSawp qdq, %4, %2, %8
SSE2_XSawp qdq, %6, %7, %2
movdqa %7, %9
movdqa %9, %1
SSE2_XSawp qdq, %7, %5, %1
movdqa %5, %9
%endmacro
@ -170,9 +170,9 @@ BITS 32
%macro butterfly_1to16_sse 3 ; xmm? for dst, xmm? for tmp, one byte for pSrc [generic register name: a/b/c/d]
mov %3h, %3l
movd %1, e%3x ; i.e, 1% = eax (=b0)
pshuflw %2, %1, 00h ; ..., b0 b0 b0 b0 b0 b0 b0 b0
pshufd %1, %2, 00h ; b0 b0 b0 b0, b0 b0 b0 b0, b0 b0 b0 b0, b0 b0 b0 b0
%endmacro

;copy a dw into a xmm for 8 times
%macro SSE2_Copy8Times 2

@ -318,25 +318,25 @@ byte_1pos_table:
SECTION .text



;***********************************************************************
;int32_t CavlcParamCal_sse2(int16_t*coffLevel, uint8_t* run, int16_t *Level, int32_t* total_coeffs , int32_t endIdx);
;***********************************************************************
WELS_EXTERN CavlcParamCal_sse2
CavlcParamCal_sse2:
push ebx
push edi
push esi

mov eax, [esp+16] ;coffLevel
mov edi, [esp+24] ;Level
mov ebx, [esp+32] ;endIdx
cmp ebx, 3
jne .Level16
pxor xmm1, xmm1
movq xmm0, [eax] ; removed QWORD
jmp .Cal_begin
.Level16:
movdqa xmm0, [eax]
movdqa xmm1, [eax+16]
.Cal_begin:
@ -354,7 +354,7 @@ CavlcParamCal_sse2:
pcmpeqw xmm7, xmm7 ;generate -1
mov ebx, 0xff
;pinsrw xmm6, ebx, 3

mov bl, dh

lea ebx, [byte_1pos_table+8*ebx]
@ -362,7 +362,7 @@ CavlcParamCal_sse2:
pextrw ecx, xmm0, 3
shr ecx, 8
mov dh, cl

.loopHighFind0:
cmp ecx, 0
je .loopHighFind0End
@ -372,7 +372,7 @@ CavlcParamCal_sse2:
add esi, 8
mov esi, [eax+2*esi]
mov [edi], si
add edi, 2
;add ebx, 1
inc ebx
dec ecx
@ -403,8 +403,8 @@ CavlcParamCal_sse2:
;and edx, 0xff
movzx edx, byte [ebx]
mov edx, [eax+2*edx]
mov [edi], dx
add edi, 2
;add ebx, 1
inc ebx
dec esi
@ -436,8 +436,8 @@ CavlcParamCal_sse2:
psllq xmm0, xmm3
psrlq xmm0, xmm3
movdqa xmm4, xmm1
psllq xmm1, xmm2
psrlq xmm4, xmm3
punpcklqdq xmm1, xmm4
por xmm0, xmm1

@ -84,12 +84,12 @@ ALIGN 16
; void WelsCPUId( int32_t uiIndex, int32_t *pFeatureA, int32_t *pFeatureB, int32_t *pFeatureC, int32_t *pFeatureD )
;****************************************************************************************************
WelsCPUId:
push ebx
push edi

mov eax, [esp+12] ; operating index
cpuid ; cpuid

; processing various information return
mov edi, [esp+16]
mov [edi], eax
@ -100,10 +100,10 @@ WelsCPUId:
mov edi, [esp+28]
mov [edi], edx

pop edi
pop ebx
ret

WELS_EXTERN WelsCPUSupportAVX
; need call after cpuid=1 and eax, ecx flag got then
ALIGN 16
@ -139,7 +139,7 @@ ALIGN 16
WelsCPUSupportFMA:
mov eax, [esp+4]
mov ecx, [esp+8]

; refer to detection of FMA addressed in INTEL AVX manual document
and ecx, 018001000H
cmp ecx, 018001000H ; check OSXSAVE, AVX, FMA feature flags
@ -153,7 +153,7 @@ WelsCPUSupportFMA:
mov eax, 1
ret
fma_not_supported:
mov eax, 0
ret

WELS_EXTERN WelsEmms

@ -48,26 +48,26 @@ SECTION .rodata align=16

;***********************************************************************
; Constant
;***********************************************************************

align 16
SSE2_DeQuant8 dw 10, 13, 10, 13, 13, 16, 13, 16,
dw 10, 13, 10, 13, 13, 16, 13, 16,
dw 11, 14, 11, 14, 14, 18, 14, 18,
dw 11, 14, 11, 14, 14, 18, 14, 18,
dw 13, 16, 13, 16, 16, 20, 16, 20,
dw 13, 16, 13, 16, 16, 20, 16, 20,
dw 14, 18, 14, 18, 18, 23, 18, 23,
dw 14, 18, 14, 18, 18, 23, 18, 23,
dw 16, 20, 16, 20, 20, 25, 20, 25,
dw 16, 20, 16, 20, 20, 25, 20, 25,
dw 18, 23, 18, 23, 23, 29, 23, 29,
dw 18, 23, 18, 23, 23, 29, 23, 29


;***********************************************************************
; MMX functions
;***********************************************************************

%macro MMX_LoadDiff4P 5
movd %1, [%3]
@ -112,7 +112,7 @@ SSE2_DeQuant8 dw 10, 13, 10, 13, 13, 16, 13, 16,
MMX_SumSub %4, %1, %6
MMX_SumSub %3, %2, %6
MMX_SumSub %3, %4, %6
MMX_SumSubMul2 %1, %2, %5
%endmacro

%macro MMX_IDCT 6
@ -145,13 +145,13 @@ WelsDctT4_mmx:
mov edx, [esp+24] ; i_pix2

WELS_Zero mm7

MMX_LoadDiff4x4P mm1, mm2, mm3, mm4, eax, ebx, ecx, edx, mm0, mm7

MMX_DCT mm1, mm2, mm3 ,mm4, mm5, mm6
MMX_Trans4x4W mm3, mm1, mm4, mm5, mm2

MMX_DCT mm3, mm5, mm2 ,mm4, mm1, mm6
MMX_Trans4x4W mm2, mm3, mm4, mm1, mm5

mov eax, [esp+ 8] ; pDct
@ -178,15 +178,15 @@ WelsIDctT4Rec_mmx:
%define i_pred esp+pushsize+16
%define pDct esp+pushsize+20

mov eax, [pDct ]
movq mm0, [eax+ 0]
movq mm1, [eax+ 8]
movq mm2, [eax+16]
movq mm3, [eax+24]
mov edx, [p_dst ]
mov ecx, [i_dst ]
mov eax, [p_pred]
mov ebx, [i_pred]

MMX_Trans4x4W mm0, mm1, mm2, mm3, mm4
MMX_IDCT mm1, mm2, mm3, mm4, mm0, mm6
@ -195,14 +195,14 @@ WelsIDctT4Rec_mmx:

WELS_Zero mm7
WELS_DW32 mm6

MMX_StoreDiff4P mm3, mm0, mm6, mm7, [edx], [eax]
MMX_StoreDiff4P mm4, mm0, mm6, mm7, [edx+ecx], [eax+ebx]
lea edx, [edx+2*ecx]
lea eax, [eax+2*ebx]
MMX_StoreDiff4P mm1, mm0, mm6, mm7, [edx], [eax]
MMX_StoreDiff4P mm2, mm0, mm6, mm7, [edx+ecx], [eax+ebx]

WELSEMMS
%undef pushsize
%undef p_dst
@ -220,17 +220,17 @@ WelsIDctT4Rec_mmx:
%macro SSE2_Store4x8p 6
SSE2_XSawp qdq, %2, %3, %6
SSE2_XSawp qdq, %4, %5, %3
MOVDQ [%1+0x00], %2
MOVDQ [%1+0x10], %4
MOVDQ [%1+0x20], %6
MOVDQ [%1+0x30], %3
%endmacro

%macro SSE2_Load4x8p 6
MOVDQ %2, [%1+0x00]
MOVDQ %4, [%1+0x10]
MOVDQ %6, [%1+0x20]
MOVDQ %3, [%1+0x30]
SSE2_XSawp qdq, %4, %3, %5
SSE2_XSawp qdq, %2, %6, %3
%endmacro
@ -271,40 +271,40 @@ WelsIDctT4Rec_mmx:
%endmacro

%macro SSE2_Load8DC 6
movdqa %1, %6 ; %1 = dc0 dc1
paddw %1, %5
psraw %1, $6 ; (dc + 32) >> 6

movdqa %2, %1
psrldq %2, 4
punpcklwd %2, %2
punpckldq %2, %2 ; %2 = dc2 dc2 dc2 dc2 dc3 dc3 dc3 dc3

movdqa %3, %1
psrldq %3, 8
punpcklwd %3, %3
punpckldq %3, %3 ; %3 = dc4 dc4 dc4 dc4 dc5 dc5 dc5 dc5

movdqa %4, %1
psrldq %4, 12
punpcklwd %4, %4
punpckldq %4, %4 ; %4 = dc6 dc6 dc6 dc6 dc7 dc7 dc7 dc7

punpcklwd %1, %1
punpckldq %1, %1 ; %1 = dc0 dc0 dc0 dc0 dc1 dc1 dc1 dc1
%endmacro

%macro SSE2_DCT 6
SSE2_SumSub %6, %3, %5
SSE2_SumSub %1, %2, %5
SSE2_SumSub %3, %2, %5
SSE2_SumSubMul2 %6, %1, %4
%endmacro

%macro SSE2_IDCT 7
SSE2_SumSub %7, %2, %6
SSE2_SumSubDiv2 %1, %3, %5, %4
SSE2_SumSub %2, %1, %5
SSE2_SumSub %7, %4, %5
%endmacro

@ -316,12 +316,12 @@ ALIGN 16
WelsDctFourT4_sse2:
push ebx
push esi
mov esi, [esp+12]
mov eax, [esp+16] ; pix1
mov ebx, [esp+20] ; i_pix1
mov ecx, [esp+24] ; pix2
mov edx, [esp+28] ; i_pix2

pxor xmm7, xmm7

;Load 4x8
@ -331,33 +331,33 @@ WelsDctFourT4_sse2:
lea ecx, [ecx + 2 * edx]
SSE2_LoadDiff8P xmm2, xmm6, xmm7, [eax], [ecx]
SSE2_LoadDiff8P xmm3, xmm6, xmm7, [eax+ebx], [ecx+edx]

SSE2_DCT xmm1, xmm2, xmm3, xmm4, xmm5, xmm0
SSE2_TransTwo4x4W xmm2, xmm0, xmm3, xmm4, xmm1
SSE2_DCT xmm0, xmm4, xmm1, xmm3, xmm5, xmm2
SSE2_TransTwo4x4W xmm4, xmm2, xmm1, xmm3, xmm0

SSE2_Store4x8p esi, xmm4, xmm2, xmm3, xmm0, xmm5

lea eax, [eax + 2 * ebx]
lea ecx, [ecx + 2 * edx]

;Load 4x8
SSE2_LoadDiff8P xmm0, xmm6, xmm7, [eax ], [ecx ]
SSE2_LoadDiff8P xmm1, xmm6, xmm7, [eax+ebx ], [ecx+edx]
lea eax, [eax + 2 * ebx]
lea ecx, [ecx + 2 * edx]
SSE2_LoadDiff8P xmm2, xmm6, xmm7, [eax], [ecx]
SSE2_LoadDiff8P xmm3, xmm6, xmm7, [eax+ebx], [ecx+edx]

SSE2_DCT xmm1, xmm2, xmm3, xmm4, xmm5, xmm0
SSE2_TransTwo4x4W xmm2, xmm0, xmm3, xmm4, xmm1
SSE2_DCT xmm0, xmm4, xmm1, xmm3, xmm5, xmm2
SSE2_TransTwo4x4W xmm4, xmm2, xmm1, xmm3, xmm0

lea esi, [esi+64]
SSE2_Store4x8p esi, xmm4, xmm2, xmm3, xmm0, xmm5

pop esi
pop ebx
ret
@ -377,62 +377,62 @@ WelsIDctFourT4Rec_sse2:
%define pushsize 8
push ebx
push esi

mov eax, [rec]
mov ebx, [stride]
mov ecx, [pred]
mov edx, [pred_stride]
mov esi, [rs]

;Load 4x8
SSE2_Load4x8p esi, xmm0, xmm1, xmm4, xmm2, xmm5

SSE2_TransTwo4x4W xmm0, xmm1, xmm4, xmm2, xmm3
SSE2_IDCT xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm0
SSE2_TransTwo4x4W xmm1, xmm4, xmm0, xmm2, xmm3
SSE2_IDCT xmm4, xmm2, xmm3, xmm0, xmm5, xmm6, xmm1

WELS_Zero xmm7
WELS_DW32 xmm6

SSE2_StoreDiff8p xmm4, xmm5, xmm6, xmm7, [eax ], [ecx]
SSE2_StoreDiff8p xmm0, xmm5, xmm6, xmm7, [eax + ebx ], [ecx + edx]
lea eax, [eax + 2 * ebx]
lea ecx, [ecx + 2 * edx]
SSE2_StoreDiff8p xmm1, xmm5, xmm6, xmm7, [eax], [ecx]
SSE2_StoreDiff8p xmm2, xmm5, xmm6, xmm7, [eax + ebx ], [ecx + edx]

add esi, 64
lea eax, [eax + 2 * ebx]
lea ecx, [ecx + 2 * edx]
SSE2_Load4x8p esi, xmm0, xmm1, xmm4, xmm2, xmm5

SSE2_TransTwo4x4W xmm0, xmm1, xmm4, xmm2, xmm3
SSE2_IDCT xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm0
SSE2_TransTwo4x4W xmm1, xmm4, xmm0, xmm2, xmm3
SSE2_IDCT xmm4, xmm2, xmm3, xmm0, xmm5, xmm6, xmm1

WELS_Zero xmm7
WELS_DW32 xmm6

SSE2_StoreDiff8p xmm4, xmm5, xmm6, xmm7, [eax ], [ecx]
SSE2_StoreDiff8p xmm0, xmm5, xmm6, xmm7, [eax + ebx ], [ecx + edx]
lea eax, [eax + 2 * ebx]
lea ecx, [ecx + 2 * edx]
SSE2_StoreDiff8p xmm1, xmm5, xmm6, xmm7, [eax], [ecx]
SSE2_StoreDiff8p xmm2, xmm5, xmm6, xmm7, [eax + ebx], [ecx + edx]

pop esi
pop ebx
ret

%macro SSE2_StoreDiff4x8p 8
SSE2_StoreDiff8p %1, %3, %4, [%5], [%6]
SSE2_StoreDiff8p %1, %3, %4, [%5 + %7], [%6 + %8]
SSE2_StoreDiff8p %2, %3, %4, [%5 + 8], [%6 + 8]
SSE2_StoreDiff8p %2, %3, %4, [%5 + %7 + 8], [%6 + %8 + 8]
%endmacro

;***********************************************************************
; void WelsIDctRecI16x16Dc_sse2(uint8_t *rec, int32_t stride, uint8_t *pred, int32_t pred_stride, int16_t *dct_dc)
;***********************************************************************
@ -443,47 +443,47 @@ ALIGN 16
WelsIDctRecI16x16Dc_sse2:
push esi
push edi

mov ecx, [luma_dc]
mov eax, [rec]
mov edx, [stride]
mov esi, [pred]
mov edi, [pred_stride]
pxor xmm7, xmm7
WELS_DW32 xmm6

SSE2_Load8DC xmm0, xmm1, xmm2, xmm3, xmm6, [ecx]
SSE2_StoreDiff4x8p xmm0, xmm1, xmm5, xmm7, eax, esi, edx, edi

lea eax, [eax + 2 * edx]
lea esi, [esi + 2 * edi]
SSE2_StoreDiff4x8p xmm0, xmm1, xmm5, xmm7, eax, esi, edx, edi

lea eax, [eax + 2 * edx]
lea esi, [esi + 2 * edi]
SSE2_StoreDiff4x8p xmm2, xmm3, xmm5, xmm7, eax, esi, edx, edi

lea eax, [eax + 2 * edx]
lea esi, [esi + 2 * edi]
SSE2_StoreDiff4x8p xmm2, xmm3, xmm5, xmm7, eax, esi, edx, edi

SSE2_Load8DC xmm0, xmm1, xmm2, xmm3, xmm6, [ecx + 16]
lea eax, [eax + 2 * edx]
lea esi, [esi + 2 * edi]
SSE2_StoreDiff4x8p xmm0, xmm1, xmm5, xmm7, eax, esi, edx, edi

lea eax, [eax + 2 * edx]
lea esi, [esi + 2 * edi]
SSE2_StoreDiff4x8p xmm0, xmm1, xmm5, xmm7, eax, esi, edx, edi

lea eax, [eax + 2 * edx]
lea esi, [esi + 2 * edi]
SSE2_StoreDiff4x8p xmm2, xmm3, xmm5, xmm7, eax, esi, edx, edi

lea eax, [eax + 2 * edx]
lea esi, [esi + 2 * edi]
SSE2_StoreDiff4x8p xmm2, xmm3, xmm5, xmm7, eax, esi, edx, edi

pop edi
pop esi
ret
@ -517,7 +517,7 @@ WelsIDctRecI16x16Dc_sse2:
punpckldq %3, %4
punpcklqdq %1, %3
%endmacro

;***********************************************************************
;void WelsHadamardT4Dc_sse2( int16_t *luma_dc, int16_t *pDct)
;***********************************************************************
@ -525,23 +525,23 @@ WELS_EXTERN WelsHadamardT4Dc_sse2
WelsHadamardT4Dc_sse2:
mov eax, [esp + 4] ; luma_dc
mov ecx, [esp + 8] ; pDct

SSE2_Load4Col xmm1, xmm5, xmm6, xmm0, ecx
SSE2_Load4Col xmm2, xmm5, xmm6, xmm0, ecx + 0x40
SSE2_Load4Col xmm3, xmm5, xmm6, xmm0, ecx + 0x100
SSE2_Load4Col xmm4, xmm5, xmm6, xmm0, ecx + 0x140

SSE2_SumSubD xmm1, xmm2, xmm7
SSE2_SumSubD xmm3, xmm4, xmm7
SSE2_SumSubD xmm2, xmm4, xmm7
SSE2_SumSubD xmm1, xmm3, xmm7

SSE2_Trans4x4D xmm4, xmm2, xmm1, xmm3, xmm5 ; pOut: xmm4,xmm3,xmm5,xmm1

SSE2_SumSubD xmm4, xmm3, xmm7
SSE2_SumSubD xmm5, xmm1, xmm7

WELS_DD1 xmm6
SSE2_SumSubDiv2D xmm3, xmm1, xmm6, xmm0 ; pOut: xmm3 = (xmm3+xmm1+1)/2, xmm0 = (xmm3-xmm1+1)/2
SSE2_SumSubDiv2D xmm4, xmm5, xmm6, xmm1 ; pOut: xmm4 = (xmm4+xmm5+1)/2, xmm1 = (xmm4-xmm5+1)/2
SSE2_Trans4x4D xmm3, xmm0, xmm1, xmm4, xmm2 ; pOut: xmm3,xmm4,xmm2,xmm1
@ -550,7 +550,7 @@ WelsHadamardT4Dc_sse2:
packssdw xmm2, xmm1
movdqa [eax+ 0], xmm3
movdqa [eax+16], xmm2

ret

File diff suppressed because it is too large
@ -153,11 +153,11 @@ WELS_EXTERN ExpandPictureChromaUnalign_sse2 ; for chroma unalignment
lea %1, [%1+%2]
%endmacro

%macro exp_top_bottom_sse2 1 ; iPaddingSize [luma(32)/chroma(16)]
; ebx [width/16(8)]
; esi [pSrc+0], edi [pSrc-1], ecx [-stride], 32(16) ; top
; eax [pSrc+(h-1)*stride], ebp [pSrc+(h+31)*stride], 32(16) ; bottom

%if %1 == 32 ; for luma
sar ebx, 04h ; width / 16(8) pixels
.top_bottom_loops:
@ -171,7 +171,7 @@ WELS_EXTERN ExpandPictureChromaUnalign_sse2 ; for chroma unalignment
mov_line_16x4_sse2 edi, ecx, xmm0, a
mov_line_16x4_sse2 edi, ecx, xmm0, a
mov_line_end16x4_sse2 edi, ecx, xmm0, a

; bottom
movdqa xmm1, [eax] ; last line of picture pData
mov_line_16x4_sse2 ebp, ecx, xmm1, a ; dst, stride, xmm?
@ -182,15 +182,15 @@ WELS_EXTERN ExpandPictureChromaUnalign_sse2 ; for chroma unalignment
mov_line_16x4_sse2 ebp, ecx, xmm1, a
mov_line_16x4_sse2 ebp, ecx, xmm1, a
mov_line_end16x4_sse2 ebp, ecx, xmm1, a

lea esi, [esi+16] ; top pSrc
lea edi, [edi+16] ; top dst
lea eax, [eax+16] ; bottom pSrc
lea ebp, [ebp+16] ; bottom dst
neg ecx ; positive/negative stride need for next loop?

dec ebx
jnz near .top_bottom_loops
%elif %1 == 16 ; for chroma ??
mov edx, ebx
sar ebx, 04h ; (width / 16) pixels
@ -200,21 +200,21 @@ WELS_EXTERN ExpandPictureChromaUnalign_sse2 ; for chroma unalignment
mov_line_16x4_sse2 edi, ecx, xmm0, a ; dst, stride, xmm?
mov_line_16x4_sse2 edi, ecx, xmm0, a
mov_line_16x4_sse2 edi, ecx, xmm0, a
mov_line_end16x4_sse2 edi, ecx, xmm0, a

; bottom
movdqa xmm1, [eax] ; last line of picture pData
mov_line_16x4_sse2 ebp, ecx, xmm1, a ; dst, stride, xmm?
mov_line_16x4_sse2 ebp, ecx, xmm1, a
mov_line_16x4_sse2 ebp, ecx, xmm1, a
mov_line_end16x4_sse2 ebp, ecx, xmm1, a

lea esi, [esi+16] ; top pSrc
lea edi, [edi+16] ; top dst
lea eax, [eax+16] ; bottom pSrc
lea ebp, [ebp+16] ; bottom dst
neg ecx ; positive/negative stride need for next loop?

dec ebx
jnz near .top_bottom_loops

@ -241,50 +241,50 @@ WELS_EXTERN ExpandPictureChromaUnalign_sse2 ; for chroma unalignment
%endif
%endmacro

%macro exp_left_right_sse2 2 ; iPaddingSize [luma(32)/chroma(16)], u/a
; ecx [height]
; esi [pSrc+0], edi [pSrc-32], edx [stride], 32(16) ; left
; ebx [pSrc+(w-1)], ebp [pSrc+w], 32(16) ; right
; xor eax, eax ; for pixel pData (uint8_t) ; make sure eax=0 at least high 24 bits of eax = 0

%if %1 == 32 ; for luma
.left_right_loops:
; left
mov al, byte [esi] ; pixel pData for left border
butterfly_1to16_sse xmm0, xmm1, a ; dst, tmp, pSrc [generic register name: a/b/c/d]
movdqa [edi], xmm0
movdqa [edi+16], xmm0

; right
mov al, byte [ebx]
butterfly_1to16_sse xmm1, xmm2, a ; dst, tmp, pSrc [generic register name: a/b/c/d]
movdqa [ebp], xmm1
movdqa [ebp+16], xmm1

lea esi, [esi+edx] ; left pSrc
lea edi, [edi+edx] ; left dst
lea ebx, [ebx+edx] ; right pSrc
lea ebp, [ebp+edx] ; right dst

dec ecx
jnz near .left_right_loops
%elif %1 == 16 ; for chroma ??
.left_right_loops:
; left
mov al, byte [esi] ; pixel pData for left border
butterfly_1to16_sse xmm0, xmm1, a ; dst, tmp, pSrc [generic register name: a/b/c/d]
movdqa [edi], xmm0

; right
mov al, byte [ebx]
butterfly_1to16_sse xmm1, xmm2, a ; dst, tmp, pSrc [generic register name: a/b/c/d]
movdq%2 [ebp], xmm1 ; might not be aligned 16 bytes in case chroma planes

lea esi, [esi+edx] ; left pSrc
lea edi, [edi+edx] ; left dst
lea ebx, [ebx+edx] ; right pSrc
lea ebp, [ebp+edx] ; right dst

dec ecx
jnz near .left_right_loops
%endif
@ -337,25 +337,25 @@ WELS_EXTERN ExpandPictureChromaUnalign_sse2 ; for chroma unalignment
; TL
mov_line_16x4_sse2 edi, ecx, xmm3, a ; dst, stride, xmm?
mov_line_16x4_sse2 edi, ecx, xmm3, a ; dst, stride, xmm?
mov_line_16x4_sse2 edi, ecx, xmm3, a ; dst, stride, xmm?
mov_line_end16x4_sse2 edi, ecx, xmm3, a ; dst, stride, xmm?

; TR
mov_line_16x4_sse2 ebp, ecx, xmm4, %2 ; dst, stride, xmm?
mov_line_16x4_sse2 ebp, ecx, xmm4, %2 ; dst, stride, xmm?
mov_line_16x4_sse2 ebp, ecx, xmm4, %2 ; dst, stride, xmm?
mov_line_end16x4_sse2 ebp, ecx, xmm4, %2 ; dst, stride, xmm?

; BL
mov_line_16x4_sse2 eax, ecx, xmm5, a ; dst, stride, xmm?
mov_line_16x4_sse2 eax, ecx, xmm5, a ; dst, stride, xmm?
mov_line_16x4_sse2 eax, ecx, xmm5, a ; dst, stride, xmm?
mov_line_end16x4_sse2 eax, ecx, xmm5, a ; dst, stride, xmm?

; BR
mov_line_16x4_sse2 ebx, ecx, xmm6, %2 ; dst, stride, xmm?
mov_line_16x4_sse2 ebx, ecx, xmm6, %2 ; dst, stride, xmm?
mov_line_16x4_sse2 ebx, ecx, xmm6, %2 ; dst, stride, xmm?
mov_line_end16x4_sse2 ebx, ecx, xmm6, %2 ; dst, stride, xmm?
%endif
%endmacro
@ -373,7 +373,7 @@ ExpandPictureLuma_sse2:
push esi
push edi
push ebp

; for both top and bottom border
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
mov esi, [esp+24] ; p_dst
@ -385,10 +385,10 @@ ExpandPictureLuma_sse2:
mov cl, byte [esi]
butterfly_1to16_sse xmm3, xmm4, c ; dst, tmp, pSrc [generic register name: a/b/c/d]
; load top border
mov ecx, edx ; stride
neg ecx ; -stride
lea edi, [esi+ecx] ; last line of top border
; load bottom border
dec eax ; h-1
imul eax, edx ; (h-1)*stride
lea eax, [esi+eax] ; last line of picture pData
@ -396,16 +396,16 @@ ExpandPictureLuma_sse2:
lea ebp, [eax+edx] ; last line of bottom border, (h-1)*stride + 32 * stride
; also prepare for cross border pData: bottom-left with xmm5, bottom-right xmm6
dec ebx ; width-1
lea ebx, [eax+ebx] ; dst[w-1][h-1]
; xor edx, edx
mov dl, byte [eax] ; bottom-left
butterfly_1to16_sse xmm5, xmm6, d ; dst, tmp, pSrc [generic register name: a/b/c/d]
mov dl, byte [ebx] ; bottom-right
butterfly_1to16_sse xmm6, xmm4, d ; dst, tmp, pSrc [generic register name: a/b/c/d]
; for top & bottom expanding
mov ebx, [esp+32] ; width
exp_top_bottom_sse2 32

; for both left and right border
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
mov esi, [esp+24] ; p_dst: left border pSrc
@ -417,14 +417,14 @@ ExpandPictureLuma_sse2:
lea edi, [esi+eax] ; left border dst
dec ebx
lea ebx, [esi+ebx] ; right border pSrc, (p_dst + width - 1)
lea ebp, [ebx+1] ; right border dst
; prepare for cross border pData: top-right with xmm4
; xor eax, eax
mov al, byte [ebx] ; top-right
butterfly_1to16_sse xmm4, xmm0, a ; dst, tmp, pSrc [generic register name: a/b/c/d]
; for left & right border expanding
exp_left_right_sse2 32, a

; for cross border [top-left, top-right, bottom-left, bottom-right]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
mov esi, [esp+24] ; p_dst
@ -434,7 +434,7 @@ ExpandPictureLuma_sse2:
; have done xmm3,..,xmm6 cross pData initialization above, perform pading as below, To be continued..
mov eax, -32 ; luma=-32, chroma=-16
neg ecx ; -stride
lea edi, [esi+eax]
lea edi, [edi+ecx] ; last line of top-left border
lea ebp, [esi+ebx]
lea ebp, [ebp+ecx] ; last line of top-right border
@ -442,19 +442,19 @@ ExpandPictureLuma_sse2:
mov ecx, [esp+28] ; stride
imul edx, ecx ; (height+32(16)) * stride
lea eax, [edi+edx] ; last line of bottom-left border
lea ebx, [ebp+edx] ; last line of bottom-right border
neg ecx ; -stride
; for left & right border expanding
exp_cross_sse2 32, a

; sfence ; commit cache write back memory

pop ebp
pop edi
pop esi
pop edx
pop ebx

ret

ALIGN 16
@ -470,7 +470,7 @@ ExpandPictureChromaAlign_sse2:
push esi
push edi
push ebp

; for both top and bottom border
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
mov esi, [esp+24] ; p_dst
@ -482,10 +482,10 @@ ExpandPictureChromaAlign_sse2:
mov cl, byte [esi]
butterfly_1to16_sse xmm3, xmm4, c ; dst, tmp, pSrc [generic register name: a/b/c/d]
; load top border
mov ecx, edx ; stride
neg ecx ; -stride
lea edi, [esi+ecx] ; last line of top border
; load bottom border
dec eax ; h-1
imul eax, edx ; (h-1)*stride
lea eax, [esi+eax] ; last line of picture pData
@ -493,16 +493,16 @@ ExpandPictureChromaAlign_sse2:
lea ebp, [eax+edx] ; last line of bottom border, (h-1)*stride + 16 * stride
; also prepare for cross border pData: bottom-left with xmm5, bottom-right xmm6
dec ebx ; width-1
lea ebx, [eax+ebx] ; dst[w-1][h-1]
; xor edx, edx
mov dl, byte [eax] ; bottom-left
butterfly_1to16_sse xmm5, xmm6, d ; dst, tmp, pSrc [generic register name: a/b/c/d]
mov dl, byte [ebx] ; bottom-right
butterfly_1to16_sse xmm6, xmm4, d ; dst, tmp, pSrc [generic register name: a/b/c/d]
; for top & bottom expanding
mov ebx, [esp+32] ; width
exp_top_bottom_sse2 16

; for both left and right border
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
mov esi, [esp+24] ; p_dst: left border pSrc
@ -514,14 +514,14 @@ ExpandPictureChromaAlign_sse2:
lea edi, [esi+eax] ; left border dst
dec ebx
lea ebx, [esi+ebx] ; right border pSrc, (p_dst + width - 1)
lea ebp, [ebx+1] ; right border dst
; prepare for cross border pData: top-right with xmm4
; xor eax, eax
mov al, byte [ebx] ; top-right
butterfly_1to16_sse xmm4, xmm0, a ; dst, tmp, pSrc [generic register name: a/b/c/d]
; for left & right border expanding
exp_left_right_sse2 16, a

; for cross border [top-left, top-right, bottom-left, bottom-right]
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
mov esi, [esp+24] ; p_dst
@ -531,9 +531,9 @@ ExpandPictureChromaAlign_sse2:
; have done xmm3,..,xmm6 cross pData initialization above, perform pading as below, To be continued..
mov eax, -16 ; chroma=-16
neg ecx ; -stride
lea edi, [esi+eax]
lea edi, [edi+ecx] ; last line of top-left border
lea ebp, [esi+ebx]
lea ebp, [ebp+ecx] ; last line of top-right border
mov ecx, [esp+28] ; stride
add edx, 16 ; height+16, luma=32, chroma=16
@ -543,15 +543,15 @@ ExpandPictureChromaAlign_sse2:
neg ecx ; -stride
; for left & right border expanding
exp_cross_sse2 16, a

; sfence ; commit cache write back memory

pop ebp
pop edi
pop esi
pop edx
pop ebx

ret

ALIGN 16
@ -567,7 +567,7 @@ ExpandPictureChromaUnalign_sse2:
|
||||
push esi
|
||||
push edi
|
||||
push ebp
|
||||
|
||||
|
||||
; for both top and bottom border
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
mov esi, [esp+24] ; p_dst
|
||||
@ -579,10 +579,10 @@ ExpandPictureChromaUnalign_sse2:
|
||||
mov cl, byte [esi]
|
||||
butterfly_1to16_sse xmm3, xmm4, c ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
; load top border
|
||||
mov ecx, edx ; stride
|
||||
mov ecx, edx ; stride
|
||||
neg ecx ; -stride
|
||||
lea edi, [esi+ecx] ; last line of top border
|
||||
; load bottom border
|
||||
; load bottom border
|
||||
dec eax ; h-1
|
||||
imul eax, edx ; (h-1)*stride
|
||||
lea eax, [esi+eax] ; last line of picture pData
|
||||
@ -590,16 +590,16 @@ ExpandPictureChromaUnalign_sse2:
|
||||
lea ebp, [eax+edx] ; last line of bottom border, (h-1)*stride + 16 * stride
|
||||
; also prepare for cross border pData: bottom-left with xmm5, bottom-right xmm6
|
||||
dec ebx ; width-1
|
||||
lea ebx, [eax+ebx] ; dst[w-1][h-1]
|
||||
lea ebx, [eax+ebx] ; dst[w-1][h-1]
|
||||
; xor edx, edx
|
||||
mov dl, byte [eax] ; bottom-left
|
||||
butterfly_1to16_sse xmm5, xmm6, d ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
mov dl, byte [ebx] ; bottom-right
|
||||
butterfly_1to16_sse xmm6, xmm4, d ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
; for top & bottom expanding
|
||||
; for top & bottom expanding
|
||||
mov ebx, [esp+32] ; width
|
||||
exp_top_bottom_sse2 16
|
||||
|
||||
exp_top_bottom_sse2 16
|
||||
|
||||
; for both left and right border
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
mov esi, [esp+24] ; p_dst: left border pSrc
|
||||
@ -611,14 +611,14 @@ ExpandPictureChromaUnalign_sse2:
|
||||
lea edi, [esi+eax] ; left border dst
|
||||
dec ebx
|
||||
lea ebx, [esi+ebx] ; right border pSrc, (p_dst + width - 1)
|
||||
lea ebp, [ebx+1] ; right border dst
|
||||
lea ebp, [ebx+1] ; right border dst
|
||||
; prepare for cross border pData: top-right with xmm4
|
||||
; xor eax, eax
|
||||
mov al, byte [ebx] ; top-right
|
||||
butterfly_1to16_sse xmm4, xmm0, a ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
; for left & right border expanding
|
||||
exp_left_right_sse2 16, u
|
||||
|
||||
|
||||
; for cross border [top-left, top-right, bottom-left, bottom-right]
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
mov esi, [esp+24] ; p_dst
|
||||
@ -628,9 +628,9 @@ ExpandPictureChromaUnalign_sse2:
|
||||
; have done xmm3,..,xmm6 cross pData initialization above, perform padding as below, To be continued..
|
||||
neg ecx ; -stride
|
||||
mov eax, -16 ; chroma=-16
|
||||
lea edi, [esi+eax]
|
||||
lea edi, [esi+eax]
|
||||
lea edi, [edi+ecx] ; last line of top-left border
|
||||
lea ebp, [esi+ebx]
|
||||
lea ebp, [esi+ebx]
|
||||
lea ebp, [ebp+ecx] ; last line of top-right border
|
||||
mov ecx, [esp+28] ; stride
|
||||
add edx, 16 ; height+16, luma=32, chroma=16
|
||||
@ -640,14 +640,14 @@ ExpandPictureChromaUnalign_sse2:
|
||||
neg ecx ; -stride
|
||||
; for left & right border expanding
|
||||
exp_cross_sse2 16, u
|
||||
|
||||
|
||||
; sfence ; commit cache write back memory
|
||||
|
||||
|
||||
pop ebp
|
||||
pop edi
|
||||
pop esi
|
||||
pop edx
|
||||
pop ebx
|
||||
|
||||
|
||||
ret
|
||||
|
||||
|
File diff suppressed because it is too large
@ -32,7 +32,7 @@
|
||||
;* intra_pred_util.asm
|
||||
;*
|
||||
;* Abstract
|
||||
;* mmxext/sse for WelsFillingPred8to16, WelsFillingPred8x2to16 and
|
||||
;* mmxext/sse for WelsFillingPred8to16, WelsFillingPred8x2to16 and
|
||||
;* WelsFillingPred1to16 etc.
|
||||
;*
|
||||
;* History
|
||||
@ -84,7 +84,7 @@ WelsFillingPred8to16_mmx:
|
||||
movq mm0, [ecx]
|
||||
movq [eax ], mm0
|
||||
movq [eax+8], mm0
|
||||
|
||||
|
||||
WELSEMMS
|
||||
ret
|
||||
|
||||
@ -100,16 +100,16 @@ WelsFillingPred8x2to16_mmx:
|
||||
movq mm1, [ecx+8]
|
||||
movq [eax ], mm0
|
||||
movq [eax+8], mm1
|
||||
|
||||
|
||||
WELSEMMS
|
||||
|
||||
ret
|
||||
|
||||
%macro butterfly_1to8_mmx 3 ; mm? for dst, mm? for tmp, one byte for pSrc [generic register name: a/b/c/d]
|
||||
mov %3h, %3l
|
||||
movd %2, e%3x ; i.e, 1% = eax (=b0)
|
||||
pshufw %1, %2, 00h ; b0 b0 b0 b0, b0 b0 b0 b0
|
||||
%endmacro
|
||||
mov %3h, %3l
|
||||
movd %2, e%3x ; i.e, 1% = eax (=b0)
|
||||
pshufw %1, %2, 00h ; b0 b0 b0 b0, b0 b0 b0 b0
|
||||
%endmacro
|
||||
|
||||
ALIGN 16
|
||||
;***********************************************************************----------------
|
||||
@ -120,10 +120,10 @@ WelsFillingPred1to16_mmx:
|
||||
|
||||
mov cl, byte [esp+8] ; v
|
||||
butterfly_1to8_mmx mm0, mm1, c ; mm? for dst, mm? for tmp, one byte for pSrc [generic register name: a/b/c/d]
|
||||
|
||||
|
||||
movq [eax ], mm0
|
||||
movq [eax+8], mm0
|
||||
|
||||
|
||||
WELSEMMS
|
||||
|
||||
ret
|
||||
@ -136,9 +136,9 @@ WelsFillingPred8x2to16_sse2:
|
||||
mov eax, [esp+4] ; pred
|
||||
mov ecx, [esp+8] ; v
|
||||
|
||||
movdqa xmm0, [ecx]
|
||||
movdqa [eax], xmm0
|
||||
|
||||
movdqa xmm0, [ecx]
|
||||
movdqa [eax], xmm0
|
||||
|
||||
ret
|
||||
|
||||
ALIGN 16
|
||||
@ -150,7 +150,7 @@ WelsFillingPred1to16_sse2:
|
||||
|
||||
mov cl, byte [esp+8] ; v
|
||||
butterfly_1to16_sse xmm0, xmm1, c ; dst, tmp, pSrc [generic register name: a/b/c/d]
|
||||
|
||||
|
||||
movdqa [eax], xmm0
|
||||
|
||||
|
||||
ret
|
||||
|
@ -32,7 +32,7 @@
|
||||
;* mb_copy.asm
|
||||
;*
|
||||
;* Abstract
|
||||
;* mb_copy
|
||||
;* mb_copy
|
||||
;*
|
||||
;*
|
||||
;*********************************************************************************************/
|
||||
@ -52,9 +52,9 @@ SECTION .text
|
||||
WELS_EXTERN WelsCopy16x16_sse2
|
||||
WELS_EXTERN WelsCopy16x16NotAligned_sse2
|
||||
WELS_EXTERN WelsCopy8x8_mmx
|
||||
WELS_EXTERN WelsCopy16x8NotAligned_sse2 ;
|
||||
WELS_EXTERN WelsCopy8x16_mmx ;
|
||||
WELS_EXTERN UpdateMbMv_sse2 ;
|
||||
WELS_EXTERN WelsCopy16x8NotAligned_sse2 ;
|
||||
WELS_EXTERN WelsCopy8x16_mmx ;
|
||||
WELS_EXTERN UpdateMbMv_sse2 ;
|
||||
|
||||
;***********************************************************************
|
||||
; void WelsCopy16x16_sse2( uint8_t* Dst,
|
||||
@ -66,7 +66,7 @@ ALIGN 16
|
||||
WelsCopy16x16_sse2:
|
||||
push esi
|
||||
push edi
|
||||
push ebx
|
||||
push ebx
|
||||
|
||||
mov edi, [esp+16] ; Dst
|
||||
mov eax, [esp+20] ; iStrideD
|
||||
@ -107,7 +107,7 @@ WelsCopy16x16_sse2:
|
||||
movdqa xmm5, [esi+ecx]
|
||||
movdqa xmm6, [esi+2*ecx]
|
||||
movdqa xmm7, [esi+edx]
|
||||
|
||||
|
||||
movdqa [edi], xmm0
|
||||
movdqa [edi+eax], xmm1
|
||||
movdqa [edi+2*eax], xmm2
|
||||
@ -116,7 +116,7 @@ WelsCopy16x16_sse2:
|
||||
movdqa [edi], xmm4
|
||||
movdqa [edi+eax], xmm5
|
||||
movdqa [edi+2*eax], xmm6
|
||||
movdqa [edi+ebx], xmm7
|
||||
movdqa [edi+ebx], xmm7
|
||||
|
||||
pop ebx
|
||||
pop edi
|
||||
@ -134,7 +134,7 @@ ALIGN 16
|
||||
WelsCopy16x16NotAligned_sse2:
|
||||
push esi
|
||||
push edi
|
||||
push ebx
|
||||
push ebx
|
||||
|
||||
mov edi, [esp+16] ; Dst
|
||||
mov eax, [esp+20] ; iStrideD
|
||||
@ -175,7 +175,7 @@ WelsCopy16x16NotAligned_sse2:
|
||||
movdqu xmm5, [esi+ecx]
|
||||
movdqu xmm6, [esi+2*ecx]
|
||||
movdqu xmm7, [esi+edx]
|
||||
|
||||
|
||||
movdqa [edi], xmm0
|
||||
movdqa [edi+eax], xmm1
|
||||
movdqa [edi+2*eax], xmm2
|
||||
@ -184,8 +184,8 @@ WelsCopy16x16NotAligned_sse2:
|
||||
movdqa [edi], xmm4
|
||||
movdqa [edi+eax], xmm5
|
||||
movdqa [edi+2*eax], xmm6
|
||||
movdqa [edi+ebx], xmm7
|
||||
|
||||
movdqa [edi+ebx], xmm7
|
||||
|
||||
pop ebx
|
||||
pop edi
|
||||
pop esi
|
||||
@ -202,7 +202,7 @@ ALIGN 16
|
||||
WelsCopy16x8NotAligned_sse2:
|
||||
push esi
|
||||
push edi
|
||||
push ebx
|
||||
push ebx
|
||||
|
||||
mov edi, [esp+16] ; Dst
|
||||
mov eax, [esp+20] ; iStrideD
|
||||
@ -220,7 +220,7 @@ WelsCopy16x8NotAligned_sse2:
|
||||
movdqu xmm4, [esi]
|
||||
movdqu xmm5, [esi+ecx]
|
||||
movdqu xmm6, [esi+2*ecx]
|
||||
movdqu xmm7, [esi+edx]
|
||||
movdqu xmm7, [esi+edx]
|
||||
|
||||
movdqa [edi], xmm0
|
||||
movdqa [edi+eax], xmm1
|
||||
@ -231,7 +231,7 @@ WelsCopy16x8NotAligned_sse2:
|
||||
movdqa [edi+eax], xmm5
|
||||
movdqa [edi+2*eax], xmm6
|
||||
movdqa [edi+ebx], xmm7
|
||||
|
||||
|
||||
pop ebx
|
||||
pop edi
|
||||
pop esi
|
||||
@ -245,7 +245,7 @@ WelsCopy16x8NotAligned_sse2:
|
||||
; int32_t iStrideS )
|
||||
;***********************************************************************
|
||||
ALIGN 16
|
||||
WelsCopy8x16_mmx:
|
||||
WelsCopy8x16_mmx:
|
||||
push ebx
|
||||
|
||||
mov eax, [esp + 8 ] ;Dst
|
||||
@ -253,60 +253,60 @@ WelsCopy8x16_mmx:
|
||||
mov ebx, [esp + 16] ;Src
|
||||
mov edx, [esp + 20] ;iStrideS
|
||||
|
||||
movq mm0, [ebx]
|
||||
movq mm1, [ebx+edx]
|
||||
movq mm0, [ebx]
|
||||
movq mm1, [ebx+edx]
|
||||
lea ebx, [ebx+2*edx]
|
||||
movq mm2, [ebx]
|
||||
movq mm3, [ebx+edx]
|
||||
movq mm2, [ebx]
|
||||
movq mm3, [ebx+edx]
|
||||
lea ebx, [ebx+2*edx]
|
||||
movq mm4, [ebx]
|
||||
movq mm5, [ebx+edx]
|
||||
movq mm4, [ebx]
|
||||
movq mm5, [ebx+edx]
|
||||
lea ebx, [ebx+2*edx]
|
||||
movq mm6, [ebx]
|
||||
movq mm7, [ebx+edx]
|
||||
movq mm6, [ebx]
|
||||
movq mm7, [ebx+edx]
|
||||
lea ebx, [ebx+2*edx]
|
||||
|
||||
movq [eax], mm0
|
||||
movq [eax+ecx], mm1
|
||||
|
||||
movq [eax], mm0
|
||||
movq [eax+ecx], mm1
|
||||
lea eax, [eax+2*ecx]
|
||||
movq [eax], mm2
|
||||
movq [eax], mm2
|
||||
movq [eax+ecx], mm3
|
||||
lea eax, [eax+2*ecx]
|
||||
movq [eax], mm4
|
||||
movq [eax], mm4
|
||||
movq [eax+ecx], mm5
|
||||
lea eax, [eax+2*ecx]
|
||||
movq [eax], mm6
|
||||
movq [eax], mm6
|
||||
movq [eax+ecx], mm7
|
||||
lea eax, [eax+2*ecx]
|
||||
|
||||
movq mm0, [ebx]
|
||||
movq mm1, [ebx+edx]
|
||||
movq mm0, [ebx]
|
||||
movq mm1, [ebx+edx]
|
||||
lea ebx, [ebx+2*edx]
|
||||
movq mm2, [ebx]
|
||||
movq mm3, [ebx+edx]
|
||||
movq mm2, [ebx]
|
||||
movq mm3, [ebx+edx]
|
||||
lea ebx, [ebx+2*edx]
|
||||
movq mm4, [ebx]
|
||||
movq mm5, [ebx+edx]
|
||||
movq mm4, [ebx]
|
||||
movq mm5, [ebx+edx]
|
||||
lea ebx, [ebx+2*edx]
|
||||
movq mm6, [ebx]
|
||||
movq mm7, [ebx+edx]
|
||||
|
||||
movq [eax], mm0
|
||||
movq [eax+ecx], mm1
|
||||
movq mm6, [ebx]
|
||||
movq mm7, [ebx+edx]
|
||||
|
||||
movq [eax], mm0
|
||||
movq [eax+ecx], mm1
|
||||
lea eax, [eax+2*ecx]
|
||||
movq [eax], mm2
|
||||
movq [eax], mm2
|
||||
movq [eax+ecx], mm3
|
||||
lea eax, [eax+2*ecx]
|
||||
movq [eax], mm4
|
||||
movq [eax], mm4
|
||||
movq [eax+ecx], mm5
|
||||
lea eax, [eax+2*ecx]
|
||||
movq [eax], mm6
|
||||
movq [eax+ecx], mm7
|
||||
movq [eax], mm6
|
||||
movq [eax+ecx], mm7
|
||||
|
||||
WELSEMMS
|
||||
pop ebx
|
||||
pop ebx
|
||||
ret
|
||||
|
||||
|
||||
;***********************************************************************
|
||||
; void WelsCopy8x8_mmx( uint8_t* Dst,
|
||||
; int32_t iStrideD,
|
||||
@ -314,7 +314,7 @@ WelsCopy8x16_mmx:
|
||||
; int32_t iStrideS )
|
||||
;***********************************************************************
|
||||
ALIGN 16
|
||||
WelsCopy8x8_mmx:
|
||||
WelsCopy8x8_mmx:
|
||||
push ebx
|
||||
push esi
|
||||
mov eax, [esp + 12] ;Dst
|
||||
@ -343,7 +343,7 @@ WelsCopy8x8_mmx:
|
||||
lea esi, [esi+2*ebx]
|
||||
movq mm6, [esi]
|
||||
movq mm7, [esi+ebx]
|
||||
|
||||
|
||||
movq [eax], mm0
|
||||
movq [eax+ecx], mm1
|
||||
lea eax, [eax+2*ecx]
|
||||
@ -355,12 +355,12 @@ WelsCopy8x8_mmx:
|
||||
lea eax, [eax+2*ecx]
|
||||
movq [eax], mm6
|
||||
movq [eax+ecx], mm7
|
||||
|
||||
|
||||
WELSEMMS
|
||||
pop esi
|
||||
pop esi
|
||||
pop ebx
|
||||
ret
|
||||
|
||||
|
||||
; (dunhuang@cisco), 12/21/2011
|
||||
;***********************************************************************
|
||||
; void UpdateMbMv_sse2( SMVUnitXY *pMvBuffer, const SMVUnitXY sMv )
|
||||
@ -417,7 +417,7 @@ WELS_EXTERN PixelAvgWidthEq16_sse2
|
||||
WELS_EXTERN McCopyWidthEq4_mmx
|
||||
WELS_EXTERN McCopyWidthEq8_mmx
|
||||
WELS_EXTERN McCopyWidthEq16_sse2
|
||||
|
||||
|
||||
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
@ -432,38 +432,38 @@ PixelAvgWidthEq8_mmx:
|
||||
push esi
|
||||
push edi
|
||||
|
||||
mov edi, [esp+20]
|
||||
mov esi, [esp+28]
|
||||
mov edx, [esp+36]
|
||||
mov ebp, [esp+24]
|
||||
mov eax, [esp+32]
|
||||
mov ebx, [esp+40]
|
||||
mov ecx, [esp+44]
|
||||
mov edi, [esp+20]
|
||||
mov esi, [esp+28]
|
||||
mov edx, [esp+36]
|
||||
mov ebp, [esp+24]
|
||||
mov eax, [esp+32]
|
||||
mov ebx, [esp+40]
|
||||
mov ecx, [esp+44]
|
||||
sar ecx, 2
|
||||
.height_loop:
|
||||
movq mm0, [esi]
|
||||
movq mm0, [esi]
|
||||
pavgb mm0, [edx]
|
||||
movq [edi], mm0
|
||||
movq mm1, [esi+eax]
|
||||
movq mm1, [esi+eax]
|
||||
pavgb mm1, [edx+ebx]
|
||||
movq [edi+ebp], mm1
|
||||
lea edi, [edi+2*ebp]
|
||||
lea esi, [esi+2*eax]
|
||||
lea edx, [edx+2*ebx]
|
||||
|
||||
movq mm2, [esi]
|
||||
movq mm2, [esi]
|
||||
pavgb mm2, [edx]
|
||||
movq [edi], mm2
|
||||
movq mm3, [esi+eax]
|
||||
movq mm3, [esi+eax]
|
||||
pavgb mm3, [edx+ebx]
|
||||
movq [edi+ebp], mm3
|
||||
lea edi, [edi+2*ebp]
|
||||
lea esi, [esi+2*eax]
|
||||
lea edx, [edx+2*ebx]
|
||||
|
||||
|
||||
dec ecx
|
||||
jne .height_loop
|
||||
|
||||
|
||||
WELSEMMS
|
||||
pop edi
|
||||
pop esi
|
||||
@ -485,42 +485,42 @@ PixelAvgWidthEq16_sse2:
|
||||
push esi
|
||||
push edi
|
||||
|
||||
mov edi, [esp+20]
|
||||
mov esi, [esp+28]
|
||||
mov edx, [esp+36]
|
||||
mov ebp, [esp+24]
|
||||
mov eax, [esp+32]
|
||||
mov ebx, [esp+40]
|
||||
mov ecx, [esp+44]
|
||||
mov edi, [esp+20]
|
||||
mov esi, [esp+28]
|
||||
mov edx, [esp+36]
|
||||
mov ebp, [esp+24]
|
||||
mov eax, [esp+32]
|
||||
mov ebx, [esp+40]
|
||||
mov ecx, [esp+44]
|
||||
sar ecx, 2
|
||||
.height_loop:
|
||||
movdqu xmm0, [esi]
|
||||
movdqu xmm1, [edx]
|
||||
movdqu xmm2, [esi+eax]
|
||||
movdqu xmm3, [edx+ebx]
|
||||
movdqu xmm3, [edx+ebx]
|
||||
pavgb xmm0, xmm1
|
||||
pavgb xmm2, xmm3
|
||||
movdqu [edi], xmm0
|
||||
movdqu [edi+ebp], xmm2
|
||||
lea edi, [edi+2*ebp]
|
||||
lea esi, [esi+2*eax]
|
||||
lea edx, [edx+2*ebx]
|
||||
lea edx, [edx+2*ebx]
|
||||
|
||||
movdqu xmm4, [esi]
|
||||
movdqu xmm5, [edx]
|
||||
movdqu xmm6, [esi+eax]
|
||||
movdqu xmm7, [edx+ebx]
|
||||
movdqu xmm7, [edx+ebx]
|
||||
pavgb xmm4, xmm5
|
||||
pavgb xmm6, xmm7
|
||||
movdqu [edi], xmm4
|
||||
movdqu [edi+ebp], xmm6
|
||||
lea edi, [edi+2*ebp]
|
||||
lea esi, [esi+2*eax]
|
||||
lea edx, [edx+2*ebx]
|
||||
|
||||
lea edx, [edx+2*ebx]
|
||||
|
||||
dec ecx
|
||||
jne .height_loop
|
||||
|
||||
|
||||
pop edi
|
||||
pop esi
|
||||
pop ebx
|
||||
@ -540,7 +540,7 @@ avg_w16_align_0_ssse3:
|
||||
dec dword [esp+4]
|
||||
jg avg_w16_align_0_ssse3
|
||||
ret
|
||||
|
||||
|
||||
ALIGN 64
|
||||
avg_w16_align_1_ssse3:
|
||||
movdqa xmm1, [ebx+16]
|
||||
@ -555,7 +555,7 @@ avg_w16_align_1_ssse3:
|
||||
jg avg_w16_align_1_ssse3
|
||||
ret
|
||||
|
||||
|
||||
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
; void PixelAvgWidthEq16_ssse3(uint8_t *pDst, int32_t iDstStride,
|
||||
@ -574,7 +574,7 @@ PixelAvgWidthEq16_ssse3:
|
||||
mov ebx, [esp+28] ; src1
|
||||
mov ecx, [esp+36] ; src2
|
||||
mov esi, [esp+24] ; i_dst_stride
|
||||
|
||||
|
||||
%define avg_w16_offset (avg_w16_align_1_ssse3-avg_w16_align_0_ssse3)
|
||||
mov edx, ebx
|
||||
and edx, 0x01
|
||||
@ -582,11 +582,11 @@ PixelAvgWidthEq16_ssse3:
|
||||
lea ebp, [avg_w16_offset]
|
||||
imul ebp, edx
|
||||
lea edx, [ebp+eax]
|
||||
|
||||
mov eax, [esp+32]
|
||||
mov ebp, [esp+44]
|
||||
|
||||
mov eax, [esp+32]
|
||||
mov ebp, [esp+44]
|
||||
push ebp
|
||||
mov ebp, [esp+44]
|
||||
mov ebp, [esp+44]
|
||||
and ebx, 0xfffffff0
|
||||
call edx
|
||||
pop ebp
|
||||
@ -607,7 +607,7 @@ McCopyWidthEq4_mmx:
|
||||
push edi
|
||||
push ebx
|
||||
|
||||
|
||||
|
||||
mov esi, [esp+16]
|
||||
mov eax, [esp+20]
|
||||
mov edi, [esp+24]
|
||||
@ -617,12 +617,12 @@ ALIGN 4
|
||||
.height_loop:
|
||||
mov ebx, [esi]
|
||||
mov [edi], ebx
|
||||
|
||||
|
||||
add esi, eax
|
||||
add edi, ecx
|
||||
dec edx
|
||||
jnz .height_loop
|
||||
WELSEMMS
|
||||
WELSEMMS
|
||||
pop ebx
|
||||
pop edi
|
||||
pop esi
|
||||
@ -650,12 +650,12 @@ ALIGN 4
|
||||
add edi, ecx
|
||||
dec edx
|
||||
jnz .height_loop
|
||||
|
||||
WELSEMMS
|
||||
|
||||
WELSEMMS
|
||||
pop edi
|
||||
pop esi
|
||||
ret
|
||||
|
||||
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
; void McCopyWidthEq16_sse2( uint8_t *pSrc, int32_t iSrcStride, uint8_t *pDst, int32_t iDstStride, int32_t iHeight )
|
||||
@ -664,11 +664,11 @@ McCopyWidthEq16_sse2:
|
||||
push esi
|
||||
push edi
|
||||
|
||||
mov esi, [esp+12]
|
||||
mov eax, [esp+16]
|
||||
mov edi, [esp+20]
|
||||
mov edx, [esp+24]
|
||||
mov ecx, [esp+28]
|
||||
mov esi, [esp+12]
|
||||
mov eax, [esp+16]
|
||||
mov edi, [esp+20]
|
||||
mov edx, [esp+24]
|
||||
mov ecx, [esp+28]
|
||||
|
||||
ALIGN 4
|
||||
.height_loop:
|
||||
@ -681,7 +681,7 @@ ALIGN 4
|
||||
lea esi, [esi+eax*2]
|
||||
lea edi, [edi+edx*2]
|
||||
jnz .height_loop
|
||||
|
||||
|
||||
pop edi
|
||||
pop esi
|
||||
ret
|
||||
|
@ -69,11 +69,11 @@ SECTION .text
|
||||
|
||||
ALIGN 16
|
||||
;*******************************************************************************
|
||||
; void McChromaWidthEq4_mmx( uint8_t *src,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; uint8_t *pABCD,
|
||||
; void McChromaWidthEq4_mmx( uint8_t *src,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; uint8_t *pABCD,
|
||||
; int32_t iHeigh );
|
||||
;*******************************************************************************
|
||||
WELS_EXTERN McChromaWidthEq4_mmx
|
||||
@ -81,46 +81,46 @@ McChromaWidthEq4_mmx:
|
||||
push esi
|
||||
push edi
|
||||
push ebx
|
||||
|
||||
|
||||
mov eax, [esp +12 + 20]
|
||||
movd mm3, [eax]
|
||||
WELS_Zero mm7
|
||||
punpcklbw mm3, mm3
|
||||
movq mm4, mm3
|
||||
punpcklwd mm3, mm3
|
||||
punpckhwd mm4, mm4
|
||||
|
||||
punpcklwd mm3, mm3
|
||||
punpckhwd mm4, mm4
|
||||
|
||||
movq mm5, mm3
|
||||
punpcklbw mm3, mm7
|
||||
punpckhbw mm5, mm7
|
||||
|
||||
|
||||
movq mm6, mm4
|
||||
punpcklbw mm4, mm7
|
||||
punpckhbw mm6, mm7
|
||||
|
||||
mov esi, [esp +12+ 4]
|
||||
mov eax, [esp + 12 + 8]
|
||||
mov edi, [esp + 12 + 12]
|
||||
mov edx, [esp + 12 + 16]
|
||||
mov ecx, [esp + 12 + 24]
|
||||
|
||||
|
||||
mov esi, [esp +12+ 4]
|
||||
mov eax, [esp + 12 + 8]
|
||||
mov edi, [esp + 12 + 12]
|
||||
mov edx, [esp + 12 + 16]
|
||||
mov ecx, [esp + 12 + 24]
|
||||
|
||||
lea ebx, [esi + eax]
|
||||
movd mm0, [esi]
|
||||
movd mm1, [esi+1]
|
||||
punpcklbw mm0, mm7
|
||||
punpcklbw mm1, mm7
|
||||
.xloop:
|
||||
|
||||
|
||||
pmullw mm0, mm3
|
||||
pmullw mm1, mm5
|
||||
paddw mm0, mm1
|
||||
|
||||
|
||||
movd mm1, [ebx]
|
||||
punpcklbw mm1, mm7
|
||||
movq mm2, mm1
|
||||
pmullw mm1, mm4
|
||||
paddw mm0, mm1
|
||||
|
||||
|
||||
movd mm1, [ebx+1]
|
||||
punpcklbw mm1, mm7
|
||||
movq mm7, mm1
|
||||
@ -130,13 +130,13 @@ McChromaWidthEq4_mmx:
|
||||
|
||||
paddw mm0, [h264_d0x20_mmx]
|
||||
psrlw mm0, 6
|
||||
|
||||
|
||||
WELS_Zero mm7
|
||||
packuswb mm0, mm7
|
||||
movd [edi], mm0
|
||||
movd [edi], mm0
|
||||
|
||||
movq mm0, mm2
|
||||
|
||||
|
||||
lea edi, [edi +edx ]
|
||||
lea ebx, [ebx + eax]
|
||||
|
||||
@ -151,11 +151,11 @@ McChromaWidthEq4_mmx:
|
||||
|
||||
ALIGN 16
|
||||
;*******************************************************************************
|
||||
; void McChromaWidthEq8_sse2( uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; uint8_t *pABCD,
|
||||
; void McChromaWidthEq8_sse2( uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; uint8_t *pABCD,
|
||||
; int32_t iheigh );
|
||||
;*******************************************************************************
|
||||
WELS_EXTERN McChromaWidthEq8_sse2
|
||||
@ -163,47 +163,47 @@ McChromaWidthEq8_sse2:
|
||||
push esi
|
||||
push edi
|
||||
push ebx
|
||||
|
||||
|
||||
mov eax, [esp +12 + 20]
|
||||
movd xmm3, [eax]
|
||||
WELS_Zero xmm7
|
||||
punpcklbw xmm3, xmm3
|
||||
punpcklwd xmm3, xmm3
|
||||
|
||||
|
||||
movdqa xmm4, xmm3
|
||||
punpckldq xmm3, xmm3
|
||||
punpckhdq xmm4, xmm4
|
||||
movdqa xmm5, xmm3
|
||||
movdqa xmm6, xmm4
|
||||
|
||||
|
||||
punpcklbw xmm3, xmm7
|
||||
punpckhbw xmm5, xmm7
|
||||
punpcklbw xmm4, xmm7
|
||||
punpckhbw xmm6, xmm7
|
||||
|
||||
mov esi, [esp +12+ 4]
|
||||
mov eax, [esp + 12 + 8]
|
||||
mov edi, [esp + 12 + 12]
|
||||
mov edx, [esp + 12 + 16]
|
||||
mov ecx, [esp + 12 + 24]
|
||||
|
||||
|
||||
mov esi, [esp +12+ 4]
|
||||
mov eax, [esp + 12 + 8]
|
||||
mov edi, [esp + 12 + 12]
|
||||
mov edx, [esp + 12 + 16]
|
||||
mov ecx, [esp + 12 + 24]
|
||||
|
||||
lea ebx, [esi + eax]
|
||||
movq xmm0, [esi]
|
||||
movq xmm1, [esi+1]
|
||||
punpcklbw xmm0, xmm7
|
||||
punpcklbw xmm1, xmm7
|
||||
.xloop:
|
||||
|
||||
|
||||
pmullw xmm0, xmm3
|
||||
pmullw xmm1, xmm5
|
||||
paddw xmm0, xmm1
|
||||
|
||||
|
||||
movq xmm1, [ebx]
|
||||
punpcklbw xmm1, xmm7
|
||||
movdqa xmm2, xmm1
|
||||
pmullw xmm1, xmm4
|
||||
paddw xmm0, xmm1
|
||||
|
||||
|
||||
movq xmm1, [ebx+1]
|
||||
punpcklbw xmm1, xmm7
|
||||
movdqa xmm7, xmm1
|
||||
@ -213,19 +213,19 @@ McChromaWidthEq8_sse2:
|
||||
|
||||
paddw xmm0, [h264_d0x20_sse2]
|
||||
psrlw xmm0, 6
|
||||
|
||||
|
||||
WELS_Zero xmm7
|
||||
packuswb xmm0, xmm7
|
||||
movq [edi], xmm0
|
||||
movq [edi], xmm0
|
||||
|
||||
movdqa xmm0, xmm2
|
||||
|
||||
|
||||
lea edi, [edi +edx ]
|
||||
lea ebx, [ebx + eax]
|
||||
|
||||
dec ecx
|
||||
jnz near .xloop
|
||||
|
||||
|
||||
pop ebx
|
||||
pop edi
|
||||
pop esi
|
||||
@ -237,8 +237,8 @@ McChromaWidthEq8_sse2:
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
; void McChromaWidthEq8_ssse3( uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; uint8_t *pABCD,
|
||||
; int32_t iHeigh);
|
||||
@ -248,23 +248,23 @@ McChromaWidthEq8_ssse3:
|
||||
push ebx
|
||||
push esi
|
||||
push edi
|
||||
|
||||
|
||||
mov eax, [esp + 12 + 20]
|
||||
|
||||
pxor xmm7, xmm7
|
||||
movd xmm5, [eax]
|
||||
punpcklwd xmm5, xmm5
|
||||
punpckldq xmm5, xmm5
|
||||
movd xmm5, [eax]
|
||||
punpcklwd xmm5, xmm5
|
||||
punpckldq xmm5, xmm5
|
||||
movdqa xmm6, xmm5
|
||||
punpcklqdq xmm5, xmm5
|
||||
punpckhqdq xmm6, xmm6
|
||||
|
||||
mov eax, [esp + 12 + 4]
|
||||
mov edx, [esp + 12 + 8]
|
||||
mov esi, [esp + 12 + 12]
|
||||
mov edi, [esp + 12 + 16]
|
||||
mov ecx, [esp + 12 + 24]
|
||||
|
||||
punpckhqdq xmm6, xmm6
|
||||
|
||||
mov eax, [esp + 12 + 4]
|
||||
mov edx, [esp + 12 + 8]
|
||||
mov esi, [esp + 12 + 12]
|
||||
mov edi, [esp + 12 + 16]
|
||||
mov ecx, [esp + 12 + 24]
|
||||
|
||||
sub esi, edi
|
||||
sub esi, edi
|
||||
movdqa xmm7, [h264_d0x20_sse2]
|
||||
@ -273,39 +273,39 @@ McChromaWidthEq8_ssse3:
|
||||
movdqa xmm1, xmm0
|
||||
psrldq xmm1, 1
|
||||
punpcklbw xmm0, xmm1
|
||||
|
||||
.hloop_chroma:
|
||||
|
||||
.hloop_chroma:
|
||||
lea esi, [esi+2*edi]
|
||||
|
||||
|
||||
movdqu xmm2, [eax+edx]
|
||||
movdqa xmm3, xmm2
|
||||
psrldq xmm3, 1
|
||||
punpcklbw xmm2, xmm3
|
||||
movdqa xmm4, xmm2
|
||||
|
||||
|
||||
pmaddubsw xmm0, xmm5
|
||||
pmaddubsw xmm2, xmm6
|
||||
paddw xmm0, xmm2
|
||||
paddw xmm0, xmm7
|
||||
psrlw xmm0, 6
|
||||
packuswb xmm0, xmm0
|
||||
movq [esi],xmm0
|
||||
|
||||
movq [esi],xmm0
|
||||
|
||||
lea eax, [eax+2*edx]
|
||||
movdqu xmm2, [eax]
|
||||
movdqa xmm3, xmm2
|
||||
psrldq xmm3, 1
|
||||
punpcklbw xmm2, xmm3
|
||||
movdqa xmm0, xmm2
|
||||
|
||||
|
||||
pmaddubsw xmm4, xmm5
|
||||
pmaddubsw xmm2, xmm6
|
||||
paddw xmm4, xmm2
|
||||
paddw xmm4, xmm7
|
||||
psrlw xmm4, 6
|
||||
packuswb xmm4, xmm4
|
||||
movq [esi+edi],xmm4
|
||||
|
||||
movq [esi+edi],xmm4
|
||||
|
||||
sub ecx, 2
|
||||
jnz .hloop_chroma
|
||||
pop edi
|
||||
|
@ -91,29 +91,29 @@ WELS_EXTERN McHorVer20WidthEq16_sse2
|
||||
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
; void McHorVer20WidthEq16_sse2( uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; void McHorVer20WidthEq16_sse2( uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; int32_t iHeight,
|
||||
; );
|
||||
;***********************************************************************
|
||||
McHorVer20WidthEq16_sse2:
|
||||
push esi
|
||||
push edi
|
||||
|
||||
|
||||
mov esi, [esp + 12]
|
||||
mov eax, [esp + 16]
|
||||
mov edi, [esp + 20]
|
||||
mov ecx, [esp + 28]
|
||||
mov edx, [esp + 24]
|
||||
sub esi, 2
|
||||
|
||||
|
||||
mov esi, [esp + 12]
|
||||
mov eax, [esp + 16]
|
||||
mov edi, [esp + 20]
|
||||
mov ecx, [esp + 28]
|
||||
mov edx, [esp + 24]
|
||||
sub esi, 2
|
||||
|
||||
WELS_Zero xmm7
|
||||
movdqa xmm6, [h264_w0x10_1]
|
||||
.y_loop:
|
||||
|
||||
|
||||
movq xmm0, [esi]
|
||||
punpcklbw xmm0, xmm7
|
||||
movq xmm1, [esi+5]
|
||||
@ -126,7 +126,7 @@ McHorVer20WidthEq16_sse2:
|
||||
punpcklbw xmm4, xmm7
|
||||
movq xmm5, [esi+3]
|
||||
punpcklbw xmm5, xmm7
|
||||
|
||||
|
||||
paddw xmm2, xmm3
|
||||
paddw xmm4, xmm5
|
||||
psllw xmm4, 2
|
||||
@ -152,7 +152,7 @@ McHorVer20WidthEq16_sse2:
|
||||
punpcklbw xmm4, xmm7
|
||||
movq xmm5, [esi+3+8]
|
||||
punpcklbw xmm5, xmm7
|
||||
|
||||
|
||||
paddw xmm2, xmm3
|
||||
paddw xmm4, xmm5
|
||||
psllw xmm4, 2
|
||||
@ -165,8 +165,8 @@ McHorVer20WidthEq16_sse2:
|
||||
psraw xmm0, 5
|
||||
packuswb xmm0, xmm7
|
||||
movq [edi+8], xmm0
|
||||
|
||||
|
||||
|
||||
|
||||
add esi, eax
|
||||
add edi, edx
|
||||
dec ecx
|
||||
@ -178,9 +178,9 @@ McHorVer20WidthEq16_sse2:
|
||||
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
; void McHorVer22Width8HorFirst_sse2( uint8_t*pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t* pTap,
|
||||
; void McHorVer22Width8HorFirst_sse2( uint8_t*pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t* pTap,
|
||||
; int32_t iTapStride,
|
||||
; int32_t iHeight);
|
||||
;***********************************************************************
|
||||
@ -193,11 +193,11 @@ McHorVer22Width8HorFirst_sse2:
|
||||
mov edi, [esp+24] ;tap
|
||||
mov edx, [esp+28] ;tap_stride
|
||||
mov ebx, [esp+32] ;i_height
|
||||
pxor xmm7, xmm7
|
||||
|
||||
pxor xmm7, xmm7
|
||||
|
||||
sub esi, eax ;;;;;;;;need more 5 lines.
|
||||
sub esi, eax
|
||||
|
||||
|
||||
.yloop_width_8:
|
||||
movq xmm0, [esi]
|
||||
punpcklbw xmm0, xmm7
|
||||
@ -211,7 +211,7 @@ McHorVer22Width8HorFirst_sse2:
|
||||
punpcklbw xmm4, xmm7
|
||||
movq xmm5, [esi+3]
|
||||
punpcklbw xmm5, xmm7
|
||||
|
||||
|
||||
paddw xmm2, xmm3
|
||||
paddw xmm4, xmm5
|
||||
psllw xmm4, 2
|
||||
@ -221,7 +221,7 @@ McHorVer22Width8HorFirst_sse2:
|
||||
psllw xmm4, 2
|
||||
paddw xmm0, xmm4
|
||||
movdqa [edi], xmm0
|
||||
|
||||
|
||||
add esi, eax
|
||||
add edi, edx
|
||||
dec ebx
|
||||
@ -230,30 +230,30 @@ McHorVer22Width8HorFirst_sse2:
|
||||
pop edi
|
||||
pop esi
|
||||
ret
|
||||
|
||||
|
||||
;***********************************************************************
|
||||
; void McHorVer02WidthEq8_sse2( uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; void McHorVer02WidthEq8_sse2( uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; int32_t iHeight )
|
||||
;***********************************************************************
|
||||
ALIGN 16
|
||||
McHorVer02WidthEq8_sse2:
|
||||
push esi
|
||||
push edi
|
||||
|
||||
mov esi, [esp + 12]
|
||||
mov edx, [esp + 16]
|
||||
mov edi, [esp + 20]
|
||||
mov eax, [esp + 24]
|
||||
mov ecx, [esp + 28]
|
||||
|
||||
mov esi, [esp + 12]
|
||||
mov edx, [esp + 16]
|
||||
mov edi, [esp + 20]
|
||||
mov eax, [esp + 24]
|
||||
mov ecx, [esp + 28]
|
||||
|
||||
sub esi, edx
|
||||
sub esi, edx
|
||||
|
||||
WELS_Zero xmm7
|
||||
|
||||
|
||||
SSE_LOAD_8P xmm0, xmm7, [esi]
|
||||
SSE_LOAD_8P xmm1, xmm7, [esi+edx]
|
||||
lea esi, [esi+2*edx]
|
||||
@ -262,8 +262,8 @@ McHorVer02WidthEq8_sse2:
|
||||
lea esi, [esi+2*edx]
|
||||
SSE_LOAD_8P xmm4, xmm7, [esi]
|
||||
SSE_LOAD_8P xmm5, xmm7, [esi+edx]
|
||||
|
||||
.start:
|
||||
|
||||
.start:
|
||||
FILTER_HV_W8 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, [edi]
|
||||
dec ecx
|
||||
jz near .xx_exit
|
||||
@ -273,7 +273,7 @@ McHorVer02WidthEq8_sse2:
|
||||
FILTER_HV_W8 xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, [edi+eax]
|
||||
dec ecx
|
||||
jz near .xx_exit
|
||||
|
||||
|
||||
lea edi, [edi+2*eax]
|
||||
SSE_LOAD_8P xmm7, xmm0, [esi+edx]
|
||||
FILTER_HV_W8 xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, xmm1, [edi]
|
||||
@ -356,11 +356,11 @@ WELS_EXTERN McHorVer22HorFirst_sse2
|
||||
|
||||
|
||||
;***********************************************************************
|
||||
; void McHorVer02_sse2( uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; void McHorVer02_sse2( uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; int32_t iWidth,
|
||||
; int32_t iWidth,
|
||||
; int32_t iHeight )
|
||||
;***********************************************************************
|
||||
ALIGN 16
|
||||
@ -368,19 +368,19 @@ McHorVer02_sse2:
|
||||
push esi
|
||||
push edi
|
||||
push ebx
|
||||
|
||||
mov esi, [esp + 16]
|
||||
mov edx, [esp + 20]
|
||||
mov edi, [esp + 24]
|
||||
mov eax, [esp + 28]
|
||||
mov ecx, [esp + 36]
|
||||
mov ebx, [esp + 32]
|
||||
|
||||
mov esi, [esp + 16]
|
||||
mov edx, [esp + 20]
|
||||
mov edi, [esp + 24]
|
||||
mov eax, [esp + 28]
|
||||
mov ecx, [esp + 36]
|
||||
mov ebx, [esp + 32]
|
||||
shr ebx, 3
|
||||
sub esi, edx
|
||||
sub esi, edx
|
||||
|
||||
.xloop:
|
||||
WELS_Zero xmm7
|
||||
|
||||
.xloop:
|
||||
WELS_Zero xmm7
|
||||
SSE_LOAD_8P xmm0, xmm7, [esi]
|
||||
SSE_LOAD_8P xmm1, xmm7, [esi+edx]
|
||||
lea esi, [esi+2*edx]
|
||||
@ -389,7 +389,7 @@ McHorVer02_sse2:
|
||||
lea esi, [esi+2*edx]
|
||||
SSE_LOAD_8P xmm4, xmm7, [esi]
|
||||
SSE_LOAD_8P xmm5, xmm7, [esi+edx]
|
||||
|
||||
|
||||
FILTER_HV_W8 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, [edi]
|
||||
dec ecx
|
||||
lea esi, [esi+2*edx]
|
||||
@ -402,8 +402,8 @@ McHorVer02_sse2:
|
||||
movdqa xmm5,xmm6
|
||||
add edi, eax
|
||||
sub esi, edx
|
||||
|
||||
.start:
|
||||
|
||||
.start:
|
||||
FILTER_HV_W8 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, [edi]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
@ -413,7 +413,7 @@ McHorVer02_sse2:
|
||||
FILTER_HV_W8 xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, [edi+eax]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea edi, [edi+2*eax]
|
||||
SSE_LOAD_8P xmm7, xmm0, [esi+edx]
|
||||
FILTER_HV_W8 xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, xmm1, [edi]
|
||||
@ -454,16 +454,16 @@ McHorVer02_sse2:
|
||||
SSE_LOAD_8P xmm5, xmm6, [esi+edx]
|
||||
jmp near .start
|
||||
|
||||
.x_loop_dec:
|
||||
.x_loop_dec:
|
||||
dec ebx
|
||||
jz near .xx_exit
|
||||
mov esi, [esp + 16]
|
||||
mov edi, [esp + 24]
|
||||
mov esi, [esp + 16]
|
||||
mov edi, [esp + 24]
|
||||
sub esi, edx
|
||||
sub esi, edx
|
||||
add esi, 8
|
||||
add edi, 8
|
||||
mov ecx, [esp + 36]
|
||||
mov ecx, [esp + 36]
|
||||
jmp near .xloop
|
||||
|
||||
.xx_exit:
|
||||
@ -473,12 +473,12 @@ McHorVer02_sse2:
|
||||
ret
|
||||
|
||||
|
||||
ALIGN 16
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
; void McHorVer20_sse2( uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; void McHorVer20_sse2( uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t *pDst,
|
||||
; int32_t iDstStride,
|
||||
; int32_t iWidth,
|
||||
; int32_t iHeight
|
||||
; );
|
||||
@ -487,19 +487,19 @@ McHorVer20_sse2:
|
||||
push esi
|
||||
push edi
|
||||
push ebx
|
||||
mov esi, [esp+16]
|
||||
mov eax, [esp+20]
|
||||
mov edi, [esp+24]
|
||||
mov edx, [esp+28]
|
||||
mov ecx, [esp+32]
|
||||
mov ebx, [esp+36]
|
||||
mov esi, [esp+16]
|
||||
mov eax, [esp+20]
|
||||
mov edi, [esp+24]
|
||||
mov edx, [esp+28]
|
||||
mov ecx, [esp+32]
|
||||
mov ebx, [esp+36]
|
||||
sub esi, 2
|
||||
pxor xmm7, xmm7
|
||||
|
||||
pxor xmm7, xmm7
|
||||
|
||||
cmp ecx, 9
|
||||
jne near .width_17
|
||||
|
||||
.yloop_width_9:
|
||||
jne near .width_17
|
||||
|
||||
.yloop_width_9:
|
||||
movq xmm0, [esi]
|
||||
punpcklbw xmm0, xmm7
|
||||
movq xmm1, [esi+5]
|
||||
@ -512,7 +512,7 @@ McHorVer20_sse2:
|
||||
punpcklbw xmm4, xmm7
|
||||
movq xmm5, [esi+3]
|
||||
punpcklbw xmm5, xmm7
|
||||
|
||||
|
||||
movdqa xmm7, xmm2
|
||||
paddw xmm7, xmm3
|
||||
movdqa xmm6, xmm4
|
||||
@ -526,12 +526,12 @@ McHorVer20_sse2:
|
||||
paddw xmm0, [h264_w0x10_1]
|
||||
psraw xmm0, 5
|
||||
packuswb xmm0, xmm0
|
||||
movd [edi], xmm0
|
||||
|
||||
movd [edi], xmm0
|
||||
|
||||
pxor xmm7, xmm7
|
||||
movq xmm0, [esi+6]
|
||||
punpcklbw xmm0, xmm7
|
||||
|
||||
|
||||
paddw xmm4, xmm1
|
||||
paddw xmm5, xmm3
|
||||
psllw xmm5, 2
|
||||
@ -543,8 +543,8 @@ McHorVer20_sse2:
|
||||
paddw xmm2, [h264_w0x10_1]
|
||||
psraw xmm2, 5
|
||||
packuswb xmm2, xmm2
|
||||
movq [edi+1], xmm2
|
||||
|
||||
movq [edi+1], xmm2
|
||||
|
||||
add esi, eax
|
||||
add edi, edx
|
||||
dec ebx
|
||||
@ -553,8 +553,8 @@ McHorVer20_sse2:
|
||||
pop edi
|
||||
pop esi
|
||||
ret
|
||||
|
||||
|
||||
|
||||
|
||||
.width_17:
|
||||
.yloop_width_17:
|
||||
movq xmm0, [esi]
|
||||
@ -569,7 +569,7 @@ McHorVer20_sse2:
|
||||
punpcklbw xmm4, xmm7
|
||||
movq xmm5, [esi+3]
|
||||
punpcklbw xmm5, xmm7
|
||||
|
||||
|
||||
paddw xmm2, xmm3
|
||||
paddw xmm4, xmm5
|
||||
psllw xmm4, 2
|
||||
@ -582,7 +582,7 @@ McHorVer20_sse2:
|
||||
psraw xmm0, 5
|
||||
packuswb xmm0, xmm0
|
||||
movq [edi], xmm0
|
||||
|
||||
|
||||
movq xmm0, [esi+8]
|
||||
punpcklbw xmm0, xmm7
|
||||
movq xmm1, [esi+5+8]
|
||||
@ -595,7 +595,7 @@ McHorVer20_sse2:
|
||||
punpcklbw xmm4, xmm7
|
||||
movq xmm5, [esi+3+8]
|
||||
punpcklbw xmm5, xmm7
|
||||
|
||||
|
||||
movdqa xmm7, xmm2
|
||||
paddw xmm7, xmm3
|
||||
movdqa xmm6, xmm4
|
||||
@ -610,12 +610,12 @@ McHorVer20_sse2:
|
||||
psraw xmm0, 5
|
||||
packuswb xmm0, xmm0
|
||||
movd [edi+8], xmm0
|
||||
|
||||
|
||||
|
||||
|
||||
pxor xmm7, xmm7
|
||||
movq xmm0, [esi+6+8]
|
||||
punpcklbw xmm0, xmm7
|
||||
|
||||
|
||||
paddw xmm4, xmm1
|
||||
paddw xmm5, xmm3
|
||||
psllw xmm5, 2
|
||||
@ -627,7 +627,7 @@ McHorVer20_sse2:
|
||||
paddw xmm2, [h264_w0x10_1]
|
||||
psraw xmm2, 5
|
||||
packuswb xmm2, xmm2
|
||||
movq [edi+9], xmm2
|
||||
movq [edi+9], xmm2
|
||||
add esi, eax
|
||||
add edi, edx
|
||||
dec ebx
|
||||
@ -636,14 +636,14 @@ McHorVer20_sse2:
|
||||
pop edi
|
||||
pop esi
|
||||
ret
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
;void McHorVer22HorFirst_sse2
|
||||
; (uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; (uint8_t *pSrc,
|
||||
; int32_t iSrcStride,
|
||||
; uint8_t * pTap,
|
||||
; int32_t iTapStride,
|
||||
; int32_t iWidth,int32_t iHeight);
|
||||
@ -652,21 +652,21 @@ McHorVer22HorFirst_sse2:
|
||||
push esi
|
||||
push edi
|
||||
push ebx
|
||||
mov esi, [esp+16]
|
||||
mov eax, [esp+20]
|
||||
mov edi, [esp+24]
|
||||
mov edx, [esp+28]
|
||||
mov ecx, [esp+32]
|
||||
mov ebx, [esp+36]
|
||||
pxor xmm7, xmm7
|
||||
|
||||
mov esi, [esp+16]
|
||||
mov eax, [esp+20]
|
||||
mov edi, [esp+24]
|
||||
mov edx, [esp+28]
|
||||
mov ecx, [esp+32]
|
||||
mov ebx, [esp+36]
|
||||
pxor xmm7, xmm7
|
||||
|
||||
sub esi, eax ;;;;;;;;need more 5 lines.
|
||||
sub esi, eax
|
||||
|
||||
|
||||
cmp ecx, 9
|
||||
jne near .width_17
|
||||
|
||||
.yloop_width_9:
|
||||
jne near .width_17
|
||||
|
||||
.yloop_width_9:
|
||||
movq xmm0, [esi]
|
||||
punpcklbw xmm0, xmm7
|
||||
movq xmm1, [esi+5]
|
||||
@ -679,7 +679,7 @@ McHorVer22HorFirst_sse2:
|
||||
punpcklbw xmm4, xmm7
|
||||
movq xmm5, [esi+3]
|
||||
punpcklbw xmm5, xmm7
|
||||
|
||||
|
||||
movdqa xmm7, xmm2
|
||||
paddw xmm7, xmm3
|
||||
movdqa xmm6, xmm4
|
||||
@ -690,12 +690,12 @@ McHorVer22HorFirst_sse2:
|
||||
paddw xmm0, xmm6
|
||||
psllw xmm6, 2
|
||||
paddw xmm0, xmm6
|
||||
movd [edi], xmm0
|
||||
|
||||
movd [edi], xmm0
|
||||
|
||||
pxor xmm7, xmm7
|
||||
movq xmm0, [esi+6]
|
||||
punpcklbw xmm0, xmm7
|
||||
|
||||
|
||||
paddw xmm4, xmm1
|
||||
paddw xmm5, xmm3
|
||||
psllw xmm5, 2
|
||||
@ -704,9 +704,9 @@ McHorVer22HorFirst_sse2:
|
||||
paddw xmm2, xmm5
|
||||
psllw xmm5, 2
|
||||
paddw xmm2, xmm5
|
||||
movq [edi+2], xmm2
|
||||
movhps [edi+2+8], xmm2
|
||||
|
||||
movq [edi+2], xmm2
|
||||
movhps [edi+2+8], xmm2
|
||||
|
||||
add esi, eax
|
||||
add edi, edx
|
||||
dec ebx
|
||||
@ -715,8 +715,8 @@ McHorVer22HorFirst_sse2:
|
||||
pop edi
|
||||
pop esi
|
||||
ret
|
||||
|
||||
|
||||
|
||||
|
||||
.width_17:
|
||||
.yloop_width_17:
|
||||
movq xmm0, [esi]
|
||||
@ -731,7 +731,7 @@ McHorVer22HorFirst_sse2:
|
||||
punpcklbw xmm4, xmm7
|
||||
movq xmm5, [esi+3]
|
||||
punpcklbw xmm5, xmm7
|
||||
|
||||
|
||||
paddw xmm2, xmm3
|
||||
paddw xmm4, xmm5
|
||||
psllw xmm4, 2
|
||||
@ -741,7 +741,7 @@ McHorVer22HorFirst_sse2:
|
||||
psllw xmm4, 2
|
||||
paddw xmm0, xmm4
|
||||
movdqa [edi], xmm0
|
||||
|
||||
|
||||
movq xmm0, [esi+8]
|
||||
punpcklbw xmm0, xmm7
|
||||
movq xmm1, [esi+5+8]
|
||||
@ -754,7 +754,7 @@ McHorVer22HorFirst_sse2:
|
||||
punpcklbw xmm4, xmm7
|
||||
movq xmm5, [esi+3+8]
|
||||
punpcklbw xmm5, xmm7
|
||||
|
||||
|
||||
movdqa xmm7, xmm2
|
||||
paddw xmm7, xmm3
|
||||
movdqa xmm6, xmm4
|
||||
@ -766,12 +766,12 @@ McHorVer22HorFirst_sse2:
|
||||
psllw xmm6, 2
|
||||
paddw xmm0, xmm6
|
||||
movd [edi+16], xmm0
|
||||
|
||||
|
||||
|
||||
|
||||
pxor xmm7, xmm7
|
||||
movq xmm0, [esi+6+8]
|
||||
punpcklbw xmm0, xmm7
|
||||
|
||||
|
||||
paddw xmm4, xmm1
|
||||
paddw xmm5, xmm3
|
||||
psllw xmm5, 2
|
||||
@ -780,9 +780,9 @@ McHorVer22HorFirst_sse2:
|
||||
paddw xmm2, xmm5
|
||||
psllw xmm5, 2
|
||||
paddw xmm2, xmm5
|
||||
movq [edi+18], xmm2
|
||||
movhps [edi+18+8], xmm2
|
||||
|
||||
movq [edi+18], xmm2
|
||||
movhps [edi+18+8], xmm2
|
||||
|
||||
add esi, eax
|
||||
add edi, edx
|
||||
dec ebx
|
||||
@ -791,23 +791,23 @@ McHorVer22HorFirst_sse2:
|
||||
pop edi
|
||||
pop esi
|
||||
ret
|
||||
|
||||
|
||||
|
||||
|
||||
%macro FILTER_VER 9
|
||||
paddw %1, %6
|
||||
movdqa %7, %2
|
||||
movdqa %8, %3
|
||||
|
||||
|
||||
|
||||
|
||||
paddw %7, %5
|
||||
paddw %8, %4
|
||||
|
||||
psubw %1, %7
|
||||
psraw %1, 2
|
||||
paddw %1, %8
|
||||
psubw %1, %7
|
||||
psraw %1, 2
|
||||
paddw %8, %1
|
||||
|
||||
psubw %1, %7
|
||||
psraw %1, 2
|
||||
paddw %1, %8
|
||||
psubw %1, %7
|
||||
psraw %1, 2
|
||||
paddw %8, %1
|
||||
paddw %8, [h264_mc_hc_32]
|
||||
psraw %8, 6
|
||||
packuswb %8, %8
|
||||
@ -815,8 +815,8 @@ McHorVer22HorFirst_sse2:
|
||||
%endmacro
|
||||
;***********************************************************************
|
||||
;void McHorVer22VerLastAlign_sse2(
|
||||
; uint8_t *pTap,
|
||||
; int32_t iTapStride,
|
||||
; uint8_t *pTap,
|
||||
; int32_t iTapStride,
|
||||
; uint8_t * pDst,
|
||||
; int32_t iDstStride,
|
||||
; int32_t iWidth,
|
||||
@ -828,15 +828,15 @@ McHorVer22HorFirst_sse2:
|
||||
push edi
|
||||
push ebx
|
||||
push ebp
|
||||
|
||||
|
||||
mov esi, [esp+20]
|
||||
mov eax, [esp+24]
|
||||
mov edi, [esp+28]
|
||||
mov edx, [esp+32]
|
||||
mov ebx, [esp+36]
|
||||
mov ecx, [esp+40]
|
||||
shr ebx, 3
|
||||
|
||||
mov ecx, [esp+40]
|
||||
shr ebx, 3
|
||||
|
||||
.width_loop:
|
||||
movdqa xmm0, [esi]
|
||||
movdqa xmm1, [esi+eax]
|
||||
@ -846,73 +846,73 @@ McHorVer22HorFirst_sse2:
|
||||
lea esi, [esi+2*eax]
|
||||
movdqa xmm4, [esi]
|
||||
movdqa xmm5, [esi+eax]
|
||||
|
||||
|
||||
FILTER_VER xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, [edi]
|
||||
dec ecx
|
||||
lea esi, [esi+2*eax]
|
||||
movdqa xmm6, [esi]
|
||||
|
||||
|
||||
movdqa xmm0, xmm1
|
||||
movdqa xmm1, xmm2
|
||||
movdqa xmm2, xmm3
|
||||
movdqa xmm3, xmm4
|
||||
movdqa xmm4, xmm5
|
||||
movdqa xmm5, xmm6
|
||||
|
||||
|
||||
add edi, edx
|
||||
sub esi, eax
|
||||
|
||||
sub esi, eax
|
||||
|
||||
.start:
|
||||
FILTER_VER xmm0,xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, [edi]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea esi, [esi+2*eax]
|
||||
movdqa xmm6, [esi]
|
||||
FILTER_VER xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0,[edi+edx]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea edi, [edi+2*edx]
|
||||
movdqa xmm7, [esi+eax]
|
||||
FILTER_VER xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, xmm1, [edi]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea esi, [esi+2*eax]
|
||||
movdqa xmm0, [esi]
|
||||
FILTER_VER xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, xmm1, xmm2,[edi+edx]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea edi, [edi+2*edx]
|
||||
movdqa xmm1, [esi+eax]
|
||||
FILTER_VER xmm4, xmm5, xmm6, xmm7, xmm0, xmm1, xmm2, xmm3,[edi]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea esi, [esi+2*eax]
|
||||
movdqa xmm2, [esi]
|
||||
FILTER_VER xmm5, xmm6, xmm7, xmm0, xmm1, xmm2, xmm3,xmm4,[edi+edx]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea edi, [edi+2*edx]
|
||||
movdqa xmm3, [esi+eax]
|
||||
FILTER_VER xmm6, xmm7, xmm0, xmm1, xmm2, xmm3,xmm4,xmm5,[edi]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea esi, [esi+2*eax]
|
||||
movdqa xmm4, [esi]
|
||||
FILTER_VER xmm7, xmm0, xmm1, xmm2, xmm3,xmm4,xmm5,xmm6, [edi+edx]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea edi, [edi+2*edx]
|
||||
movdqa xmm5, [esi+eax]
|
||||
jmp near .start
|
||||
|
||||
|
||||
.x_loop_dec:
|
||||
dec ebx
|
||||
jz near .exit
|
||||
@ -922,9 +922,9 @@ McHorVer22HorFirst_sse2:
|
||||
add esi, 16
|
||||
add edi, 8
|
||||
jmp .width_loop
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.exit:
|
||||
pop ebp
|
||||
pop ebx
|
||||
@ -934,8 +934,8 @@ McHorVer22HorFirst_sse2:
|
||||
|
||||
;***********************************************************************
|
||||
;void McHorVer22VerLastUnAlign_sse2(
|
||||
; uint8_t *pTap,
|
||||
; int32_t iTapStride,
|
||||
; uint8_t *pTap,
|
||||
; int32_t iTapStride,
|
||||
; uint8_t * pDst,
|
||||
; int32_t iDstStride,
|
||||
; int32_t iWidth,
|
||||
@ -947,15 +947,15 @@ McHorVer22HorFirst_sse2:
|
||||
push edi
|
||||
push ebx
|
||||
push ebp
|
||||
|
||||
|
||||
mov esi, [esp+20]
|
||||
mov eax, [esp+24]
|
||||
mov edi, [esp+28]
|
||||
mov edx, [esp+32]
|
||||
mov ebx, [esp+36]
|
||||
mov ecx, [esp+40]
|
||||
shr ebx, 3
|
||||
|
||||
mov ecx, [esp+40]
|
||||
shr ebx, 3
|
||||
|
||||
.width_loop:
|
||||
movdqu xmm0, [esi]
|
||||
movdqu xmm1, [esi+eax]
|
||||
@ -965,73 +965,73 @@ McHorVer22HorFirst_sse2:
|
||||
lea esi, [esi+2*eax]
|
||||
movdqu xmm4, [esi]
|
||||
movdqu xmm5, [esi+eax]
|
||||
|
||||
|
||||
FILTER_VER xmm0, xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, [edi]
|
||||
dec ecx
|
||||
lea esi, [esi+2*eax]
|
||||
movdqu xmm6, [esi]
|
||||
|
||||
|
||||
movdqa xmm0, xmm1
|
||||
movdqa xmm1, xmm2
|
||||
movdqa xmm2, xmm3
|
||||
movdqa xmm3, xmm4
|
||||
movdqa xmm4, xmm5
|
||||
movdqa xmm5, xmm6
|
||||
|
||||
|
||||
add edi, edx
|
||||
sub esi, eax
|
||||
|
||||
sub esi, eax
|
||||
|
||||
.start:
|
||||
FILTER_VER xmm0,xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, [edi]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea esi, [esi+2*eax]
|
||||
movdqu xmm6, [esi]
|
||||
FILTER_VER xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0,[edi+edx]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea edi, [edi+2*edx]
|
||||
movdqu xmm7, [esi+eax]
|
||||
FILTER_VER xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, xmm1, [edi]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea esi, [esi+2*eax]
|
||||
movdqu xmm0, [esi]
|
||||
FILTER_VER xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, xmm1, xmm2,[edi+edx]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea edi, [edi+2*edx]
|
||||
movdqu xmm1, [esi+eax]
|
||||
FILTER_VER xmm4, xmm5, xmm6, xmm7, xmm0, xmm1, xmm2, xmm3,[edi]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea esi, [esi+2*eax]
|
||||
movdqu xmm2, [esi]
|
||||
FILTER_VER xmm5, xmm6, xmm7, xmm0, xmm1, xmm2, xmm3,xmm4,[edi+edx]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea edi, [edi+2*edx]
|
||||
movdqu xmm3, [esi+eax]
|
||||
FILTER_VER xmm6, xmm7, xmm0, xmm1, xmm2, xmm3,xmm4,xmm5,[edi]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea esi, [esi+2*eax]
|
||||
movdqu xmm4, [esi]
|
||||
FILTER_VER xmm7, xmm0, xmm1, xmm2, xmm3,xmm4,xmm5,xmm6, [edi+edx]
|
||||
dec ecx
|
||||
jz near .x_loop_dec
|
||||
|
||||
|
||||
lea edi, [edi+2*edx]
|
||||
movdqu xmm5, [esi+eax]
|
||||
jmp near .start
|
||||
|
||||
|
||||
.x_loop_dec:
|
||||
dec ebx
|
||||
jz near .exit
|
||||
@ -1041,9 +1041,9 @@ McHorVer22HorFirst_sse2:
|
||||
add esi, 16
|
||||
add edi, 8
|
||||
jmp .width_loop
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
.exit:
|
||||
pop ebp
|
||||
pop ebx
|
||||
|
@ -32,7 +32,7 @@
|
||||
;* memzero.asm
|
||||
;*
|
||||
;* Abstract
|
||||
;*
|
||||
;*
|
||||
;*
|
||||
;* History
|
||||
;* 9/16/2009 Created
|
||||
@ -47,8 +47,8 @@ BITS 32
|
||||
; Code
|
||||
;***********************************************************************
|
||||
|
||||
SECTION .text
|
||||
|
||||
SECTION .text
|
||||
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
;_inline void __cdecl WelsPrefetchZero_mmx(int8_t const*_A);
|
||||
@ -57,7 +57,7 @@ WELS_EXTERN WelsPrefetchZero_mmx
|
||||
WelsPrefetchZero_mmx:
|
||||
mov eax,[esp+4]
|
||||
prefetchnta [eax]
|
||||
ret
|
||||
ret
|
||||
|
||||
|
||||
ALIGN 16
|
||||
@ -69,7 +69,7 @@ WelsSetMemZeroAligned64_sse2:
|
||||
mov eax, [esp + 4] ; dst
|
||||
mov ecx, [esp + 8]
|
||||
neg ecx
|
||||
|
||||
|
||||
pxor xmm0, xmm0
|
||||
.memzeroa64_sse2_loops:
|
||||
movdqa [eax], xmm0
|
||||
@ -77,11 +77,11 @@ WelsSetMemZeroAligned64_sse2:
|
||||
movdqa [eax+32], xmm0
|
||||
movdqa [eax+48], xmm0
|
||||
add eax, 0x40
|
||||
|
||||
|
||||
add ecx, 0x40
|
||||
jnz near .memzeroa64_sse2_loops
|
||||
|
||||
ret
|
||||
|
||||
ret
|
||||
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
@ -92,7 +92,7 @@ WelsSetMemZeroSize64_mmx:
|
||||
mov eax, [esp + 4] ; dst
|
||||
mov ecx, [esp + 8]
|
||||
neg ecx
|
||||
|
||||
|
||||
pxor mm0, mm0
|
||||
.memzero64_mmx_loops:
|
||||
movq [eax], mm0
|
||||
@ -102,16 +102,16 @@ WelsSetMemZeroSize64_mmx:
|
||||
movq [eax+32], mm0
|
||||
movq [eax+40], mm0
|
||||
movq [eax+48], mm0
|
||||
movq [eax+56], mm0
|
||||
movq [eax+56], mm0
|
||||
add eax, 0x40
|
||||
|
||||
|
||||
add ecx, 0x40
|
||||
jnz near .memzero64_mmx_loops
|
||||
|
||||
WELSEMMS
|
||||
ret
|
||||
|
||||
ALIGN 16
|
||||
|
||||
WELSEMMS
|
||||
ret
|
||||
|
||||
ALIGN 16
|
||||
;***********************************************************************
|
||||
; void WelsSetMemZeroSize8_mmx(void *dst, int32_t size)
|
||||
;***********************************************************************
|
||||
@ -119,17 +119,17 @@ WELS_EXTERN WelsSetMemZeroSize8_mmx
|
||||
WelsSetMemZeroSize8_mmx:
|
||||
mov eax, [esp + 4] ; dst
|
||||
mov ecx, [esp + 8] ; size
|
||||
neg ecx
|
||||
neg ecx
|
||||
pxor mm0, mm0
|
||||
|
||||
|
||||
.memzero8_mmx_loops:
|
||||
movq [eax], mm0
|
||||
add eax, 0x08
|
||||
|
||||
|
||||
add ecx, 0x08
|
||||
jnz near .memzero8_mmx_loops
|
||||
|
||||
WELSEMMS
|
||||
ret
|
||||
|
||||
|
||||
WELSEMMS
|
||||
ret
|
||||
|
||||
|
||||
|
@ -44,17 +44,17 @@
|
||||
|
||||
BITS 32
|
||||
|
||||
SECTION .text
|
||||
SECTION .text
|
||||
;************************************************
|
||||
;NEW_QUANT
|
||||
;NEW_QUANT
|
||||
;************************************************
|
||||
|
||||
%macro SSE2_Quant8 5
|
||||
MOVDQ %1, %5
|
||||
pxor %2, %2
|
||||
pcmpgtw %2, %1
|
||||
pxor %1, %2
|
||||
psubw %1, %2
|
||||
pxor %2, %2
|
||||
pcmpgtw %2, %1
|
||||
pxor %1, %2
|
||||
psubw %1, %2
|
||||
paddusw %1, %3
|
||||
pmulhuw %1, %4
|
||||
pxor %1, %2
|
||||
@ -64,10 +64,10 @@ SECTION .text
|
||||
|
||||
%macro SSE2_QuantMax8 6
|
||||
MOVDQ %1, %5
|
||||
pxor %2, %2
|
||||
pcmpgtw %2, %1
|
||||
pxor %1, %2
|
||||
psubw %1, %2
|
||||
pxor %2, %2
|
||||
pcmpgtw %2, %1
|
||||
pxor %1, %2
|
||||
psubw %1, %2
|
||||
paddusw %1, %3
|
||||
pmulhuw %1, %4
|
||||
pmaxsw %6, %1
|
||||
@ -86,17 +86,17 @@ SECTION .text
|
||||
WELS_EXTERN WelsQuant4x4_sse2
|
||||
align 16
|
||||
WelsQuant4x4_sse2:
|
||||
mov eax, [ff]
|
||||
mov ecx, [mf]
|
||||
mov eax, [ff]
|
||||
mov ecx, [mf]
|
||||
MOVDQ xmm2, [eax]
|
||||
MOVDQ xmm3, [ecx]
|
||||
|
||||
|
||||
mov edx, [pDct]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx + 0x10]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx + 0x10]
|
||||
|
||||
ret
|
||||
|
||||
|
||||
;***********************************************************************
|
||||
;void WelsQuant4x4Dc_sse2(int16_t *pDct, const int16_t ff, int16_t mf);
|
||||
;***********************************************************************
|
||||
@ -104,37 +104,37 @@ WELS_EXTERN WelsQuant4x4Dc_sse2
|
||||
align 16
|
||||
WelsQuant4x4Dc_sse2:
|
||||
mov ax, [mf]
|
||||
SSE2_Copy8Times xmm3, eax
|
||||
|
||||
SSE2_Copy8Times xmm3, eax
|
||||
|
||||
mov cx, [ff]
|
||||
SSE2_Copy8Times xmm2, ecx
|
||||
SSE2_Copy8Times xmm2, ecx
|
||||
|
||||
mov edx, [pDct]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx + 0x10]
|
||||
|
||||
ret
|
||||
|
||||
|
||||
ret
|
||||
|
||||
;***********************************************************************
|
||||
; void WelsQuantFour4x4_sse2(int16_t *pDct, int16_t* ff, int16_t *mf);
|
||||
;***********************************************************************
|
||||
WELS_EXTERN WelsQuantFour4x4_sse2
|
||||
align 16
|
||||
WelsQuantFour4x4_sse2:
|
||||
mov eax, [ff]
|
||||
mov ecx, [mf]
|
||||
mov eax, [ff]
|
||||
mov ecx, [mf]
|
||||
MOVDQ xmm2, [eax]
|
||||
MOVDQ xmm3, [ecx]
|
||||
|
||||
mov edx, [pDct]
|
||||
|
||||
mov edx, [pDct]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx + 0x10]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx + 0x10]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx + 0x20]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx + 0x30]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx + 0x40]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx + 0x50]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx + 0x60]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx + 0x70]
|
||||
SSE2_Quant8 xmm0, xmm1, xmm2, xmm3, [edx + 0x70]
|
||||
|
||||
ret
|
||||
|
||||
@ -144,17 +144,17 @@ WelsQuantFour4x4_sse2:
|
||||
WELS_EXTERN WelsQuantFour4x4Max_sse2
|
||||
align 16
|
||||
WelsQuantFour4x4Max_sse2:
|
||||
mov eax, [ff]
|
||||
mov ecx, [mf]
|
||||
mov eax, [ff]
|
||||
mov ecx, [mf]
|
||||
MOVDQ xmm2, [eax]
|
||||
MOVDQ xmm3, [ecx]
|
||||
|
||||
mov edx, [pDct]
|
||||
|
||||
mov edx, [pDct]
|
||||
pxor xmm4, xmm4
|
||||
pxor xmm5, xmm5
|
||||
pxor xmm6, xmm6
|
||||
pxor xmm7, xmm7
|
||||
SSE2_QuantMax8 xmm0, xmm1, xmm2, xmm3, [edx ], xmm4
|
||||
SSE2_QuantMax8 xmm0, xmm1, xmm2, xmm3, [edx ], xmm4
|
||||
SSE2_QuantMax8 xmm0, xmm1, xmm2, xmm3, [edx + 0x10], xmm4
|
||||
SSE2_QuantMax8 xmm0, xmm1, xmm2, xmm3, [edx + 0x20], xmm5
|
||||
SSE2_QuantMax8 xmm0, xmm1, xmm2, xmm3, [edx + 0x30], xmm5
|
||||
@ -162,19 +162,19 @@ WelsQuantFour4x4Max_sse2:
|
||||
SSE2_QuantMax8 xmm0, xmm1, xmm2, xmm3, [edx + 0x50], xmm6
|
||||
SSE2_QuantMax8 xmm0, xmm1, xmm2, xmm3, [edx + 0x60], xmm7
|
||||
SSE2_QuantMax8 xmm0, xmm1, xmm2, xmm3, [edx + 0x70], xmm7
|
||||
|
||||
|
||||
SSE2_TransTwo4x4W xmm4, xmm5, xmm6, xmm7, xmm0
|
||||
pmaxsw xmm0, xmm4
|
||||
pmaxsw xmm0, xmm4
|
||||
pmaxsw xmm0, xmm5
|
||||
pmaxsw xmm0, xmm7
|
||||
pmaxsw xmm0, xmm7
|
||||
movdqa xmm1, xmm0
|
||||
punpckhqdq xmm0, xmm1
|
||||
pmaxsw xmm0, xmm1
|
||||
|
||||
mov edx, [max]
|
||||
movq [edx], xmm0
|
||||
|
||||
ret
|
||||
mov edx, [max]
|
||||
movq [edx], xmm0
|
||||
|
||||
ret
|
||||
|
||||
%macro MMX_Copy4Times 2
|
||||
movd %1, %2
|
||||
@ -185,10 +185,10 @@ WelsQuantFour4x4Max_sse2:
|
||||
SECTION .text
|
||||
|
||||
%macro MMX_Quant4 4
|
||||
pxor %2, %2
|
||||
pcmpgtw %2, %1
|
||||
pxor %1, %2
|
||||
psubw %1, %2
|
||||
pxor %2, %2
|
||||
pcmpgtw %2, %1
|
||||
pxor %1, %2
|
||||
psubw %1, %2
|
||||
paddusw %1, %3
|
||||
pmulhuw %1, %4
|
||||
pxor %1, %2
|
||||
@ -211,13 +211,13 @@ WelsHadamardQuant2x2_mmx:
|
||||
movd mm3, [eax + 0x40]
|
||||
movd mm1, [eax + 0x60]
|
||||
punpcklwd mm3, mm1
|
||||
|
||||
|
||||
mov cx, 0
|
||||
mov [eax], cx
|
||||
mov [eax + 0x20], cx
|
||||
mov [eax + 0x40], cx
|
||||
mov [eax + 0x60], cx
|
||||
|
||||
|
||||
;hdm_2x2, mm0 = dct0 dct1, mm3 = dct2 dct3
|
||||
movq mm5, mm3
|
||||
paddw mm3, mm0
|
||||
@ -229,32 +229,32 @@ WelsHadamardQuant2x2_mmx:
|
||||
paddw mm1, mm3
|
||||
psubw mm3, mm5
|
||||
punpcklwd mm1, mm3
|
||||
|
||||
|
||||
;quant_2x2_dc
|
||||
mov ax, [mf]
|
||||
MMX_Copy4Times mm3, eax
|
||||
MMX_Copy4Times mm3, eax
|
||||
mov cx, [ff]
|
||||
MMX_Copy4Times mm2, ecx
|
||||
MMX_Quant4 mm1, mm0, mm2, mm3
|
||||
|
||||
|
||||
; store dct_2x2
|
||||
mov edx, [dct2x2]
|
||||
mov edx, [dct2x2]
|
||||
movq [edx], mm1
|
||||
mov ecx, [iChromaDc]
|
||||
movq [ecx], mm1
|
||||
|
||||
|
||||
; pNonZeroCount of dct_2x2
|
||||
pcmpeqb mm2, mm2 ; mm2 = FF
|
||||
pcmpeqb mm2, mm2 ; mm2 = FF
|
||||
pxor mm3, mm3
|
||||
packsswb mm1, mm3
|
||||
pcmpeqb mm1, mm3 ; set FF if equal, 0 if not equal
|
||||
psubsb mm1, mm2 ; set 0 if equal, 1 if not equal
|
||||
psadbw mm1, mm3 ;
|
||||
movd eax, mm1
|
||||
|
||||
|
||||
WELSEMMS
|
||||
ret
|
||||
|
||||
|
||||
;***********************************************************************
|
||||
;int32_t WelsHadamardQuant2x2Skip_mmx(int16_t *pDct, int16_t ff, int16_t mf);
|
||||
;***********************************************************************
|
||||
@ -269,7 +269,7 @@ WelsHadamardQuant2x2Skip_mmx:
|
||||
movd mm3, [eax + 0x40]
|
||||
movd mm1, [eax + 0x60]
|
||||
punpcklwd mm3, mm1
|
||||
|
||||
|
||||
;hdm_2x2, mm0 = dct0 dct1, mm3 = dct2 dct3
|
||||
movq mm5, mm3
|
||||
paddw mm3, mm0
|
||||
@ -281,32 +281,32 @@ WelsHadamardQuant2x2Skip_mmx:
|
||||
paddw mm1, mm3
|
||||
psubw mm3, mm5
|
||||
punpcklwd mm1, mm3
|
||||
|
||||
|
||||
;quant_2x2_dc
|
||||
mov ax, [mf]
|
||||
MMX_Copy4Times mm3, eax
|
||||
MMX_Copy4Times mm3, eax
|
||||
mov cx, [ff]
|
||||
MMX_Copy4Times mm2, ecx
|
||||
MMX_Quant4 mm1, mm0, mm2, mm3
|
||||
|
||||
|
||||
; pNonZeroCount of dct_2x2
|
||||
pcmpeqb mm2, mm2 ; mm2 = FF
|
||||
pcmpeqb mm2, mm2 ; mm2 = FF
|
||||
pxor mm3, mm3
|
||||
packsswb mm1, mm3
|
||||
pcmpeqb mm1, mm3 ; set FF if equal, 0 if not equal
|
||||
psubsb mm1, mm2 ; set 0 if equal, 1 if not equal
|
||||
psadbw mm1, mm3 ;
|
||||
movd eax, mm1
|
||||
|
||||
WELSEMMS
|
||||
ret
|
||||
|
||||
|
||||
%macro SSE2_DeQuant8 3
|
||||
|
||||
WELSEMMS
|
||||
ret
|
||||
|
||||
|
||||
%macro SSE2_DeQuant8 3
|
||||
MOVDQ %2, %1
|
||||
pmullw %2, %3
|
||||
MOVDQ %1, %2
|
||||
%endmacro
|
||||
%endmacro
|
||||
|
||||
|
||||
ALIGN 16
|
||||
@ -329,7 +329,7 @@ WelsDequant4x4_sse2:
|
||||
;***********************************************************************====
|
||||
;void WelsDequantFour4x4_sse2(int16_t *pDct, const uint16_t* mf);
|
||||
;***********************************************************************====
|
||||
|
||||
|
||||
align 16
|
||||
|
||||
WELS_EXTERN WelsDequantFour4x4_sse2
|
||||
@ -356,39 +356,38 @@ WelsDequantFour4x4_sse2:
|
||||
WELS_EXTERN WelsDequantIHadamard4x4_sse2
|
||||
align 16
|
||||
WelsDequantIHadamard4x4_sse2:
|
||||
mov eax, [esp + 4]
|
||||
mov eax, [esp + 4]
|
||||
mov cx, [esp + 8]
|
||||
|
||||
|
||||
; WelsDequantLumaDc4x4
|
||||
SSE2_Copy8Times xmm1, ecx
|
||||
SSE2_Copy8Times xmm1, ecx
|
||||
;psrlw xmm1, 2 ; for the (>>2) in ihdm
|
||||
MOVDQ xmm0, [eax]
|
||||
MOVDQ xmm2, [eax+0x10]
|
||||
pmullw xmm0, xmm1
|
||||
pmullw xmm0, xmm1
|
||||
pmullw xmm2, xmm1
|
||||
|
||||
; ihdm_4x4
|
||||
movdqa xmm1, xmm0
|
||||
psrldq xmm1, 8
|
||||
movdqa xmm3, xmm2
|
||||
psrldq xmm3, 8
|
||||
|
||||
SSE2_SumSub xmm0, xmm3, xmm5 ; xmm0 = xmm0 - xmm3, xmm3 = xmm0 + xmm3
|
||||
SSE2_SumSub xmm1, xmm2, xmm5 ; xmm1 = xmm1 - xmm2, xmm2 = xmm1 + xmm2
|
||||
psrldq xmm3, 8
|
||||
|
||||
SSE2_SumSub xmm0, xmm3, xmm5 ; xmm0 = xmm0 - xmm3, xmm3 = xmm0 + xmm3
|
||||
SSE2_SumSub xmm1, xmm2, xmm5 ; xmm1 = xmm1 - xmm2, xmm2 = xmm1 + xmm2
|
||||
SSE2_SumSub xmm3, xmm2, xmm5 ; xmm3 = xmm3 - xmm2, xmm2 = xmm3 + xmm2
|
||||
SSE2_SumSub xmm0, xmm1, xmm5 ; xmm0 = xmm0 - xmm1, xmm1 = xmm0 + xmm1
|
||||
|
||||
SSE2_TransTwo4x4W xmm2, xmm1, xmm3, xmm0, xmm4
|
||||
SSE2_SumSub xmm2, xmm4, xmm5
|
||||
SSE2_SumSub xmm1, xmm0, xmm5
|
||||
SSE2_SumSub xmm4, xmm0, xmm5
|
||||
SSE2_SumSub xmm2, xmm1, xmm5
|
||||
SSE2_TransTwo4x4W xmm2, xmm1, xmm3, xmm0, xmm4
|
||||
SSE2_SumSub xmm2, xmm4, xmm5
|
||||
SSE2_SumSub xmm1, xmm0, xmm5
|
||||
SSE2_SumSub xmm4, xmm0, xmm5
|
||||
SSE2_SumSub xmm2, xmm1, xmm5
|
||||
SSE2_TransTwo4x4W xmm0, xmm1, xmm4, xmm2, xmm3
|
||||
|
||||
|
||||
punpcklqdq xmm0, xmm1
|
||||
MOVDQ [eax], xmm0
|
||||
|
||||
|
||||
punpcklqdq xmm2, xmm3
|
||||
MOVDQ [eax+16], xmm2
|
||||
MOVDQ [eax+16], xmm2
|
||||
ret
|
||||
|
File diff suppressed because it is too large
@ -45,7 +45,7 @@
|
||||
bits 32
|
||||
|
||||
;***********************************************************************
|
||||
; Macros
|
||||
; Macros
|
||||
;***********************************************************************
|
||||
|
||||
;***********************************************************************
|
||||
@ -59,7 +59,7 @@ align 16
|
||||
sse2_1: dw 1, 1, 1, 1, 1, 1, 1, 1
|
||||
align 16
|
||||
sse2_b1: db 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1, 1
|
||||
i_ds_table: db 3, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
|
||||
i_ds_table: db 3, 2, 2, 1, 1, 1, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
|
||||
align 16
|
||||
sse2_plane_inc_minus: dw -7, -6, -5, -4, -3, -2, -1, 0
|
||||
align 16
|
||||
@ -139,7 +139,7 @@ low_mask_table:
|
||||
db 4, 8, 5, 8, 8,12, 1, 4, 4, 8
|
||||
db 4, 7, 7,11, 4, 8, 7,11, 8,11
|
||||
db 11,15, 1, 4, 3, 7, 4, 7, 7,11
|
||||
db 3, 7, 6,10, 7,10,10,14, 4, 7
|
||||
db 3, 7, 6,10, 7,10,10,14, 4, 7
|
||||
db 7,11, 7,10,10,14, 7,11,10,14
|
||||
db 11,14,14,18, 0, 4, 3, 7, 3, 6
|
||||
db 6,10, 3, 7, 6,10, 7,10,10,14
|
||||
@ -191,7 +191,7 @@ WelsScan4x4DcAc_sse2:
|
||||
movdqa [eax],xmm0
|
||||
movdqa [eax+16], xmm1
|
||||
ret
|
||||
|
||||
|
||||
;***********************************************************************
|
||||
;void WelsScan4x4DcAc_ssse3( int16_t level[16], int16_t *pDct )
|
||||
;***********************************************************************
|
||||
@ -206,7 +206,7 @@ WelsScan4x4DcAc_ssse3:
|
||||
pinsrw xmm0, eax, 7 ; xmm0[7] = [8]
|
||||
pinsrw xmm1, ecx, 0 ; xmm1[0] = [7]
|
||||
pshufb xmm1, [pb_scanacdc_maskb]
|
||||
pshufb xmm0, [pb_scanacdc_maska]
|
||||
pshufb xmm0, [pb_scanacdc_maska]
|
||||
|
||||
mov eax, [esp+4]
|
||||
movdqa [eax],xmm0
|
||||
@ -224,7 +224,7 @@ WelsScan4x4Ac_sse2:
|
||||
movdqa xmm2, xmm0
|
||||
punpcklqdq xmm0, xmm1
|
||||
punpckhqdq xmm2, xmm1
|
||||
|
||||
|
||||
movdqa xmm3, xmm0
|
||||
punpckldq xmm0, xmm2
|
||||
punpckhdq xmm3, xmm2
|
||||
@ -236,10 +236,10 @@ WelsScan4x4Ac_sse2:
|
||||
pextrw edx, xmm3, 0
|
||||
pinsrw xmm3, eax, 0
|
||||
pinsrw xmm0, edx, 3
|
||||
|
||||
|
||||
pshufhw xmm1, xmm0, 0x93
|
||||
pshuflw xmm2, xmm3, 0x39
|
||||
|
||||
|
||||
movdqa xmm3, xmm2
|
||||
psrldq xmm1, 2
|
||||
pslldq xmm3, 14
|
||||
@ -255,13 +255,13 @@ WelsScan4x4Ac_sse2:
|
||||
;int32_t WelsCalculateSingleCtr4x4_sse2( int16_t *pDct );
|
||||
;***********************************************************************
|
||||
ALIGN 16
|
||||
WELS_EXTERN WelsCalculateSingleCtr4x4_sse2
|
||||
WELS_EXTERN WelsCalculateSingleCtr4x4_sse2
|
||||
WelsCalculateSingleCtr4x4_sse2:
|
||||
push ebx
|
||||
mov eax, [esp+8]
|
||||
movdqa xmm0, [eax]
|
||||
movdqa xmm1, [eax+16]
|
||||
|
||||
|
||||
packsswb xmm0, xmm1
|
||||
|
||||
pxor xmm3, xmm3
|
||||
@ -317,7 +317,7 @@ WelsGetNoneZeroCount_sse2:
|
||||
and edx, 0xff
|
||||
shr ecx, 8
|
||||
; and ecx, 0xff ; we do not need this due to high 16bits equal to 0 yet
|
||||
xor eax, eax
|
||||
xor eax, eax
|
||||
add al, [nozero_count_table+ecx]
|
||||
add al, [nozero_count_table+edx]
|
||||
ret
|
||||
|
@ -38,7 +38,7 @@
|
||||
;* 04/14/2010 Created
|
||||
;* 06/07/2010 Added AnalysisVaaInfoIntra_sse2(ssse3)
|
||||
;* 06/10/2010 Tune rc_sad_frame_sse2 and got about 40% improvement
|
||||
;* 08/11/2010 Added abs_difference_mbrow_sse2 & sum_sqrsum_mbrow_sse2
|
||||
;* 08/11/2010 Added abs_difference_mbrow_sse2 & sum_sqrsum_mbrow_sse2
|
||||
;*
|
||||
;*************************************************************************/
|
||||
%include "asm_inc.asm"
|
||||
@ -167,7 +167,7 @@ AnalysisVaaInfoIntra_sse2:
|
||||
mov ebp, esp
|
||||
and ebp, 0fh
|
||||
sub esp, ebp
|
||||
sub esp, 32
|
||||
sub esp, 32
|
||||
%define PUSH_SIZE 52 ; 20 + 32
|
||||
|
||||
mov esi, [esp+ebp+PUSH_SIZE+4] ; data_y
|
||||
@ -179,31 +179,31 @@ AnalysisVaaInfoIntra_sse2:
|
||||
add edx, ecx ; iLineSize x 3 [edx]
|
||||
mov eax, ebx
|
||||
sal eax, $1 ; iLineSize x 4 [eax]
|
||||
|
||||
|
||||
pxor xmm7, xmm7
|
||||
|
||||
|
||||
; loops
|
||||
VAA_AVG_BLOCK_SSE2 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5
|
||||
movq [esp], xmm0
|
||||
movq [esp], xmm0
|
||||
|
||||
lea esi, [esi+eax]
|
||||
VAA_AVG_BLOCK_SSE2 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5
|
||||
movq [esp+8], xmm0
|
||||
movq [esp+8], xmm0
|
||||
|
||||
lea esi, [esi+eax]
|
||||
VAA_AVG_BLOCK_SSE2 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5
|
||||
movq [esp+16], xmm0
|
||||
movq [esp+16], xmm0
|
||||
|
||||
lea esi, [esi+eax]
|
||||
VAA_AVG_BLOCK_SSE2 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5
|
||||
movq [esp+24], xmm0
|
||||
|
||||
|
||||
movdqa xmm0, [esp] ; block 0~7
|
||||
movdqa xmm1, [esp+16] ; block 8~15
|
||||
movdqa xmm2, xmm0
|
||||
paddw xmm0, xmm1
|
||||
SUM_WORD_8x2_SSE2 xmm0, xmm3
|
||||
|
||||
|
||||
pmullw xmm1, xmm1
|
||||
pmullw xmm2, xmm2
|
||||
movdqa xmm3, xmm1
|
||||
@ -219,7 +219,7 @@ AnalysisVaaInfoIntra_sse2:
|
||||
paddd xmm1, xmm2
|
||||
pshufd xmm2, xmm1, 0B1h
|
||||
paddd xmm1, xmm2
|
||||
|
||||
|
||||
movd ebx, xmm0
|
||||
and ebx, 0ffffh ; effective low word truncated
|
||||
mov ecx, ebx
|
||||
@ -227,7 +227,7 @@ AnalysisVaaInfoIntra_sse2:
|
||||
sar ebx, $4
|
||||
movd eax, xmm1
|
||||
sub eax, ebx
|
||||
|
||||
|
||||
%undef PUSH_SIZE
|
||||
add esp, 32
|
||||
add esp, ebp
|
||||
@ -253,7 +253,7 @@ AnalysisVaaInfoIntra_ssse3:
|
||||
mov ebp, esp
|
||||
and ebp, 0fh
|
||||
sub esp, ebp
|
||||
sub esp, 32
|
||||
sub esp, 32
|
||||
%define PUSH_SIZE 52 ; 20 + 32
|
||||
|
||||
mov esi, [esp+ebp+PUSH_SIZE+4] ; data_y
|
||||
@ -265,25 +265,25 @@ AnalysisVaaInfoIntra_ssse3:
|
||||
add edx, ecx ; iLineSize x 3 [edx]
|
||||
mov eax, ebx
|
||||
sal eax, $1 ; iLineSize x 4 [eax]
|
||||
|
||||
|
||||
pxor xmm7, xmm7
|
||||
|
||||
|
||||
; loops
|
||||
VAA_AVG_BLOCK_SSSE3 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5
|
||||
movq [esp], xmm0
|
||||
movq [esp], xmm0
|
||||
|
||||
lea esi, [esi+eax]
|
||||
VAA_AVG_BLOCK_SSSE3 xmm1, xmm2, xmm3, xmm4, xmm5, xmm6
|
||||
movq [esp+8], xmm1
|
||||
movq [esp+8], xmm1
|
||||
|
||||
lea esi, [esi+eax]
|
||||
VAA_AVG_BLOCK_SSSE3 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5
|
||||
movq [esp+16], xmm0
|
||||
movq [esp+16], xmm0
|
||||
|
||||
lea esi, [esi+eax]
|
||||
VAA_AVG_BLOCK_SSSE3 xmm1, xmm2, xmm3, xmm4, xmm5, xmm6
|
||||
movq [esp+24], xmm1
|
||||
|
||||
|
||||
movdqa xmm0, [esp] ; block 0~7
|
||||
movdqa xmm1, [esp+16] ; block 8~15
|
||||
movdqa xmm2, xmm0
|
||||
@ -305,7 +305,7 @@ AnalysisVaaInfoIntra_ssse3:
|
||||
paddd xmm1, xmm2
|
||||
pshufd xmm2, xmm1, 0B1h
|
||||
paddd xmm1, xmm2
|
||||
|
||||
|
||||
movd ebx, xmm0
|
||||
and ebx, 0ffffh ; effective low word truncated
|
||||
mov ecx, ebx
|
||||
@ -313,7 +313,7 @@ AnalysisVaaInfoIntra_ssse3:
|
||||
sar ebx, $4
|
||||
movd eax, xmm1
|
||||
sub eax, ebx
|
||||
|
||||
|
||||
%undef PUSH_SIZE
|
||||
add esp, 32
|
||||
add esp, ebp
|
||||
@ -323,7 +323,7 @@ AnalysisVaaInfoIntra_ssse3:
|
||||
pop edx
|
||||
pop ebx
|
||||
ret
|
||||
|
||||
|
||||
WELS_EXTERN MdInterAnalysisVaaInfo_sse41
|
||||
;***********************************************************************
|
||||
; uint8_t MdInterAnalysisVaaInfo_sse41( int32_t *pSad8x8 )
|
||||
@ -331,18 +331,18 @@ WELS_EXTERN MdInterAnalysisVaaInfo_sse41
|
||||
ALIGN 16
|
||||
MdInterAnalysisVaaInfo_sse41:
|
||||
mov eax, [esp+4]
|
||||
movdqa xmm0, [eax] ; load 4 sad_8x8
|
||||
movdqa xmm0, [eax] ; load 4 sad_8x8
|
||||
pshufd xmm1, xmm0, 01Bh
|
||||
paddd xmm1, xmm0
|
||||
pshufd xmm2, xmm1, 0B1h
|
||||
paddd xmm1, xmm2
|
||||
paddd xmm1, xmm2
|
||||
psrad xmm1, 02h ; iAverageSad
|
||||
movdqa xmm2, xmm1
|
||||
psrad xmm2, 06h
|
||||
movdqa xmm3, xmm0 ; iSadBlock
|
||||
psrad xmm3, 06h
|
||||
psubd xmm3, xmm2
|
||||
pmulld xmm3, xmm3 ; [comment]: pmulld from SSE4.1 instruction sets
|
||||
pmulld xmm3, xmm3 ; [comment]: pmulld from SSE4.1 instruction sets
|
||||
pshufd xmm4, xmm3, 01Bh
|
||||
paddd xmm4, xmm3
|
||||
pshufd xmm3, xmm4, 0B1h
|
||||
@ -354,7 +354,7 @@ MdInterAnalysisVaaInfo_sse41:
|
||||
pcmpgtd xmm0, xmm1 ; iSadBlock > iAverageSad
|
||||
movmskps eax, xmm0
|
||||
ret
|
||||
.threshold_exit:
|
||||
.threshold_exit:
|
||||
mov eax, 15
|
||||
ret
|
||||
|
||||
@ -365,28 +365,28 @@ WELS_EXTERN MdInterAnalysisVaaInfo_sse2
|
||||
ALIGN 16
|
||||
MdInterAnalysisVaaInfo_sse2:
|
||||
mov eax, [esp+4]
|
||||
movdqa xmm0, [eax] ; load 4 sad_8x8
|
||||
movdqa xmm0, [eax] ; load 4 sad_8x8
|
||||
pshufd xmm1, xmm0, 01Bh
|
||||
paddd xmm1, xmm0
|
||||
pshufd xmm2, xmm1, 0B1h
|
||||
paddd xmm1, xmm2
|
||||
paddd xmm1, xmm2
|
||||
psrad xmm1, 02h ; iAverageSad
|
||||
movdqa xmm2, xmm1
|
||||
psrad xmm2, 06h
|
||||
movdqa xmm3, xmm0 ; iSadBlock
|
||||
psrad xmm3, 06h
|
||||
psubd xmm3, xmm2
|
||||
|
||||
|
||||
; to replace pmulld functionality as below
|
||||
movdqa xmm2, xmm3
|
||||
movdqa xmm2, xmm3
|
||||
pmuludq xmm2, xmm3
|
||||
pshufd xmm4, xmm3, 0B1h
|
||||
pmuludq xmm4, xmm4
|
||||
movdqa xmm5, xmm2
|
||||
punpckldq xmm5, xmm4
|
||||
punpckhdq xmm2, xmm4
|
||||
punpcklqdq xmm5, xmm2
|
||||
|
||||
punpcklqdq xmm5, xmm2
|
||||
|
||||
pshufd xmm4, xmm5, 01Bh
|
||||
paddd xmm4, xmm5
|
||||
pshufd xmm5, xmm4, 0B1h
|
||||
@ -398,6 +398,6 @@ MdInterAnalysisVaaInfo_sse2:
|
||||
pcmpgtd xmm0, xmm1 ; iSadBlock > iAverageSad
|
||||
movmskps eax, xmm0
|
||||
ret
|
||||
.threshold_exit:
|
||||
.threshold_exit:
|
||||
mov eax, 15
|
||||
ret
|
||||
|
@ -27,18 +27,18 @@ LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED
|
||||
// TEXTINCLUDE
|
||||
//
|
||||
|
||||
1 TEXTINCLUDE
|
||||
1 TEXTINCLUDE
|
||||
BEGIN
|
||||
"resource.h\0"
|
||||
END
|
||||
|
||||
2 TEXTINCLUDE
|
||||
2 TEXTINCLUDE
|
||||
BEGIN
|
||||
"#include ""afxres.h""\r\n"
|
||||
"\0"
|
||||
END
|
||||
|
||||
3 TEXTINCLUDE
|
||||
3 TEXTINCLUDE
|
||||
BEGIN
|
||||
"\r\n"
|
||||
"\0"
|
||||
|
@ -3,7 +3,7 @@ NAME = libwelsvp
|
||||
|
||||
OUTDIR = ../../../bin/linux
|
||||
BINDIR = ../../bin
|
||||
OBJDIR = ../../obj
|
||||
OBJDIR = ../../obj
|
||||
SRCDIRS = ../../src/asm \
|
||||
../../src/common \
|
||||
../../src/adaptivequantization \
|
||||
@ -12,7 +12,7 @@ SRCDIRS = ../../src/asm \
|
||||
../../src/downsample \
|
||||
../../src/scenechangedetection \
|
||||
../../src/vaacalc \
|
||||
../../src/complexityanalysis
|
||||
../../src/complexityanalysis
|
||||
SRCDIRS += ../../src/imagerotate
|
||||
|
||||
|
||||
@ -28,7 +28,7 @@ CPPFLAGS += -DX86_ASM
|
||||
endif
|
||||
ASMFLAGS = -f elf -DNOPREFIX -I ../../src/asm/
|
||||
LDFLAGS = -lstdc++ -ldl
|
||||
|
||||
|
||||
SRCEXTS = .cpp
|
||||
ifeq ($(NASM), 1)
|
||||
SRCEXTS += .asm
|
||||
@ -54,11 +54,11 @@ LINK = $(GCC) $(LDFLAGS)
|
||||
.SUFFIXES:
|
||||
|
||||
all: $(TARGETLIB)
|
||||
|
||||
|
||||
%.d:%.cpp
|
||||
@echo -n $(dir $<) > $@
|
||||
@$(DEPEND_cpp.d) $< >> $@
|
||||
|
||||
|
||||
%.d:%.asm
|
||||
@echo -n $(dir $<) > $@
|
||||
@$(DEPEND_asm.d) $< >> $@
|
||||
@ -67,9 +67,9 @@ objs:$(OBJS)
|
||||
|
||||
%.o:%.cpp
|
||||
$(COMPILE.cpp) $< -o $@
|
||||
|
||||
|
||||
%.o:%.asm
|
||||
$(COMPILE.asm) $< -o $@
|
||||
$(COMPILE.asm) $< -o $@
|
||||
|
||||
tags: $(HEADERS) $(SOURCES)
|
||||
etags $(HEADERS) $(SOURCES)
|
||||
|
@ -43,7 +43,7 @@
|
||||
; Options, for DEBUG
|
||||
;***********************************************************************
|
||||
|
||||
%if 1
|
||||
%if 1
|
||||
%define MOVDQ movdqa
|
||||
%else
|
||||
%define MOVDQ movdqu
|
||||
@ -58,7 +58,7 @@
|
||||
BITS 32
|
||||
|
||||
;***********************************************************************
|
||||
; Macros
|
||||
; Macros
|
||||
;***********************************************************************
|
||||
|
||||
%macro WELS_EXTERN 1
|
||||
@ -74,7 +74,7 @@ BITS 32
|
||||
pxor %2, %2
|
||||
psubw %2, %1
|
||||
pmaxsw %1, %2
|
||||
%endmacro
|
||||
%endmacro
|
||||
|
||||
%macro MMX_XSwap 4
|
||||
movq %4, %2
|
||||
@ -105,7 +105,7 @@ BITS 32
|
||||
SSE2_XSawp qdq, %5, %2, %3
|
||||
%endmacro
|
||||
|
||||
;in: xmm0, xmm1, xmm2, xmm3 pOut: xmm0, xmm1, xmm3, xmm4
|
||||
;in: xmm0, xmm1, xmm2, xmm3 pOut: xmm0, xmm1, xmm3, xmm4
|
||||
%macro SSE2_TransTwo4x4W 5
|
||||
SSE2_XSawp wd, %1, %2, %5
|
||||
SSE2_XSawp wd, %3, %4, %2
|
||||
@ -125,26 +125,26 @@ BITS 32
|
||||
movdqa %6, %9
|
||||
movdqa %9, %4
|
||||
SSE2_XSawp bw, %7, %6, %4
|
||||
|
||||
SSE2_XSawp wd, %1, %3, %6
|
||||
|
||||
SSE2_XSawp wd, %1, %3, %6
|
||||
SSE2_XSawp wd, %8, %2, %3
|
||||
SSE2_XSawp wd, %5, %7, %2
|
||||
movdqa %7, %9
|
||||
movdqa %9, %3
|
||||
movdqa %9, %3
|
||||
SSE2_XSawp wd, %7, %4, %3
|
||||
|
||||
SSE2_XSawp dq, %1, %5, %4
|
||||
|
||||
SSE2_XSawp dq, %1, %5, %4
|
||||
SSE2_XSawp dq, %6, %2, %5
|
||||
SSE2_XSawp dq, %8, %7, %2
|
||||
movdqa %7, %9
|
||||
movdqa %9, %5
|
||||
movdqa %9, %5
|
||||
SSE2_XSawp dq, %7, %3, %5
|
||||
|
||||
|
||||
SSE2_XSawp qdq, %1, %8, %3
|
||||
SSE2_XSawp qdq, %4, %2, %8
|
||||
SSE2_XSawp qdq, %6, %7, %2
|
||||
movdqa %7, %9
|
||||
movdqa %9, %1
|
||||
movdqa %9, %1
|
||||
SSE2_XSawp qdq, %7, %5, %1
|
||||
movdqa %5, %9
|
||||
%endmacro
|
||||
@ -170,9 +170,9 @@ BITS 32
|
||||
%macro butterfly_1to16_sse 3 ; xmm? for dst, xmm? for tmp, one byte for pSrc [generic register name: a/b/c/d]
|
||||
mov %3h, %3l
|
||||
movd %1, e%3x ; i.e., %1 = eax (=b0)
|
||||
pshuflw %2, %1, 00h ; ..., b0 b0 b0 b0 b0 b0 b0 b0
|
||||
pshufd %1, %2, 00h ; b0 b0 b0 b0, b0 b0 b0 b0, b0 b0 b0 b0, b0 b0 b0 b0
|
||||
%endmacro
|
||||
pshuflw %2, %1, 00h ; ..., b0 b0 b0 b0 b0 b0 b0 b0
|
||||
pshufd %1, %2, 00h ; b0 b0 b0 b0, b0 b0 b0 b0, b0 b0 b0 b0, b0 b0 b0 b0
|
||||
%endmacro
|
||||
|
||||
;copy a dw into a xmm for 8 times
|
||||
%macro SSE2_Copy8Times 2
|
||||
|
@ -84,12 +84,12 @@ ALIGN 16
|
||||
; void WelsCPUId( int32_t uiIndex, int32_t *pFeatureA, int32_t *pFeatureB, int32_t *pFeatureC, int32_t *pFeatureD )
|
||||
;****************************************************************************************************
|
||||
WelsCPUId:
|
||||
push ebx
|
||||
push ebx
|
||||
push edi
|
||||
|
||||
|
||||
mov eax, [esp+12] ; operating index
|
||||
cpuid ; cpuid
|
||||
|
||||
|
||||
; processing various information return
|
||||
mov edi, [esp+16]
|
||||
mov [edi], eax
|
||||
@ -100,10 +100,10 @@ WelsCPUId:
|
||||
mov edi, [esp+28]
|
||||
mov [edi], edx
|
||||
|
||||
pop edi
|
||||
pop edi
|
||||
pop ebx
|
||||
ret
|
||||
|
||||
|
||||
WELS_EXTERN WelsCPUSupportAVX
|
||||
; must be called after cpuid with eax=1, once the eax and ecx feature flags have been obtained
|
||||
ALIGN 16
|
||||
@ -139,7 +139,7 @@ ALIGN 16
|
||||
WelsCPUSupportFMA:
|
||||
mov eax, [esp+4]
|
||||
mov ecx, [esp+8]
|
||||
|
||||
|
||||
; refer to detection of FMA addressed in INTEL AVX manual document
|
||||
and ecx, 018001000H
|
||||
cmp ecx, 018001000H ; check OSXSAVE, AVX, FMA feature flags
|
||||
@ -153,7 +153,7 @@ WelsCPUSupportFMA:
|
||||
mov eax, 1
|
||||
ret
|
||||
fma_not_supported:
|
||||
mov eax, 0
|
||||
mov eax, 0
|
||||
ret
|
||||
|
||||
WELS_EXTERN WelsEmms
|
||||
|
@ -55,25 +55,25 @@ BITS 32
|
||||
; Code
|
||||
;***********************************************************************
|
||||
SECTION .text
|
||||
|
||||
|
||||
%macro WEIGHT_LINE 9
|
||||
movq %2, %9
|
||||
punpcklbw %2, %7
|
||||
movdqa %8, %2
|
||||
|
||||
|
||||
movdqa %1, %6
|
||||
psubusb %1, %8
|
||||
psubusb %8, %6
|
||||
por %8, %1 ; ABS(curPixel - centerPixel);
|
||||
|
||||
|
||||
movdqa %1, %3
|
||||
psubusb %1, %8
|
||||
|
||||
pmullw %1, %1
|
||||
psrlw %1, 5
|
||||
pmullw %2, %1
|
||||
pmullw %2, %1
|
||||
paddusw %4, %1
|
||||
paddusw %5, %2
|
||||
paddusw %5, %2
|
||||
%endmacro
|
||||
|
||||
%macro WEIGHT_LINE1_UV 4
|
||||
@ -91,12 +91,12 @@ SECTION .text
|
||||
punpcklbw %2, %4
|
||||
psllw %2, 1
|
||||
paddw %3, %2
|
||||
|
||||
|
||||
movdqa %2, %1
|
||||
psrldq %2, 3
|
||||
punpcklbw %2, %4
|
||||
paddw %3, %2
|
||||
|
||||
|
||||
movdqa %2, %1
|
||||
psrldq %2, 4
|
||||
punpcklbw %2, %4
|
||||
@ -119,13 +119,13 @@ SECTION .text
|
||||
punpcklbw %2, %4
|
||||
psllw %2, 2
|
||||
paddw %3, %2
|
||||
|
||||
|
||||
movdqa %2, %1
|
||||
psrldq %2, 3
|
||||
punpcklbw %2, %4
|
||||
psllw %2, 1
|
||||
paddw %3, %2
|
||||
|
||||
|
||||
movdqa %2, %1
|
||||
psrldq %2, 4
|
||||
punpcklbw %2, %4
|
||||
@ -149,13 +149,13 @@ SECTION .text
|
||||
punpcklbw %2, %4
|
||||
pmullw %2, [sse2_20]
|
||||
paddw %3, %2
|
||||
|
||||
|
||||
movdqa %2, %1
|
||||
psrldq %2, 3
|
||||
punpcklbw %2, %4
|
||||
psllw %2, 2
|
||||
paddw %3, %2
|
||||
|
||||
|
||||
movdqa %2, %1
|
||||
psrldq %2, 4
|
||||
punpcklbw %2, %4
|
||||
@ -177,7 +177,7 @@ WELS_EXTERN BilateralLumaFilter8_sse2
|
||||
%define stride esp + pushsize + 8
|
||||
BilateralLumaFilter8_sse2:
|
||||
push ebx
|
||||
|
||||
|
||||
pxor xmm7, xmm7
|
||||
mov eax, [pixel]
|
||||
mov ebx, eax
|
||||
@ -186,23 +186,23 @@ BilateralLumaFilter8_sse2:
|
||||
movdqa xmm3, [sse2_32]
|
||||
pxor xmm4, xmm4 ; nTotWeight
|
||||
pxor xmm5, xmm5 ; nSum
|
||||
|
||||
|
||||
dec eax
|
||||
mov ecx, [stride]
|
||||
|
||||
|
||||
WEIGHT_LINE xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, [eax] ; pixel 4
|
||||
WEIGHT_LINE xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, [eax + 2] ; pixel 5
|
||||
|
||||
|
||||
sub eax, ecx
|
||||
WEIGHT_LINE xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, [eax] ; pixel 1
|
||||
WEIGHT_LINE xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, [eax + 1] ; pixel 2
|
||||
WEIGHT_LINE xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, [eax + 2] ; pixel 3
|
||||
|
||||
|
||||
lea eax, [eax + ecx * 2]
|
||||
WEIGHT_LINE xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, [eax] ; pixel 6
|
||||
WEIGHT_LINE xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, [eax + 1] ; pixel 7
|
||||
WEIGHT_LINE xmm1, xmm2, xmm3, xmm4, xmm5, xmm6, xmm7, xmm0, [eax + 2] ; pixel 8
|
||||
|
||||
|
||||
pcmpeqw xmm0, xmm0
|
||||
psrlw xmm0, 15
|
||||
psllw xmm0, 8
|
||||
@ -211,10 +211,10 @@ BilateralLumaFilter8_sse2:
|
||||
paddusw xmm5, xmm0
|
||||
psrlw xmm5, 8
|
||||
packuswb xmm5, xmm5
|
||||
movq [ebx], xmm5
|
||||
|
||||
movq [ebx], xmm5
|
||||
|
||||
pop ebx
|
||||
ret
|
||||
ret
|
||||
|
||||
WELS_EXTERN WaverageChromaFilter8_sse2
|
||||
;***********************************************************************
|
||||
@ -231,33 +231,33 @@ ALIGN 16
|
||||
WaverageChromaFilter8_sse2:
|
||||
mov edx, [esp + 4] ; pixels
|
||||
mov ecx, [esp + 8] ; stride
|
||||
|
||||
|
||||
mov eax, ecx
|
||||
add eax, eax
|
||||
sub edx, eax ; pixels - 2 * stride
|
||||
sub edx, 2
|
||||
|
||||
pxor xmm0, xmm0
|
||||
|
||||
pxor xmm0, xmm0
|
||||
pxor xmm3, xmm3
|
||||
|
||||
|
||||
movdqu xmm1, [edx]
|
||||
WEIGHT_LINE1_UV xmm1, xmm2, xmm3, xmm0
|
||||
|
||||
|
||||
movdqu xmm1, [edx + ecx]
|
||||
WEIGHT_LINE2_UV xmm1, xmm2, xmm3, xmm0
|
||||
|
||||
add edx, eax
|
||||
WEIGHT_LINE2_UV xmm1, xmm2, xmm3, xmm0
|
||||
|
||||
add edx, eax
|
||||
movdqu xmm1, [edx]
|
||||
WEIGHT_LINE3_UV xmm1, xmm2, xmm3, xmm0
|
||||
|
||||
|
||||
movdqu xmm1, [edx + ecx]
|
||||
WEIGHT_LINE2_UV xmm1, xmm2, xmm3, xmm0
|
||||
|
||||
WEIGHT_LINE2_UV xmm1, xmm2, xmm3, xmm0
|
||||
|
||||
movdqu xmm1, [edx + ecx * 2]
|
||||
WEIGHT_LINE1_UV xmm1, xmm2, xmm3, xmm0
|
||||
|
||||
WEIGHT_LINE1_UV xmm1, xmm2, xmm3, xmm0
|
||||
|
||||
psrlw xmm3, 6
|
||||
packuswb xmm3, xmm3
|
||||
movq [edx + 2], xmm3
|
||||
movq [edx + 2], xmm3
|
||||
|
||||
ret
|
||||
ret
|
@ -92,10 +92,10 @@ DyadicBilinearDownsamplerWidthx32_sse:
|
||||
mov edi, [esp+24] ; pDst
|
||||
mov edx, [esp+28] ; iDstStride
|
||||
mov esi, [esp+32] ; pSrc
|
||||
mov ecx, [esp+36] ; iSrcStride
|
||||
mov ecx, [esp+36] ; iSrcStride
|
||||
mov ebp, [esp+44] ; iSrcHeight
|
||||
|
||||
sar ebp, $1 ; iSrcHeight >> 1
|
||||
|
||||
sar ebp, $1 ; iSrcHeight >> 1
|
||||
|
||||
.yloops:
|
||||
mov eax, [esp+40] ; iSrcWidth
|
||||
@ -112,7 +112,7 @@ DyadicBilinearDownsamplerWidthx32_sse:
|
||||
;=> target:
|
||||
;: H G F E D C B A, P O N M L K J I
|
||||
;: h g f e d c b a, p o n m l k j i
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
movq mm0, [esi] ; 1st pSrc line
|
||||
movq mm1, [esi+8] ; 1st pSrc line + 8
|
||||
movq mm2, [esi+ecx] ; 2nd pSrc line
|
||||
@ -140,7 +140,7 @@ DyadicBilinearDownsamplerWidthx32_sse:
|
||||
pshufw mm7, mm7, 0d8h ; p o n m P O N M ; 11011000 B: mm7
|
||||
|
||||
; to handle mm4, mm5, mm6, mm7
|
||||
movq mm0, mm4 ;
|
||||
movq mm0, mm4 ;
|
||||
punpckldq mm0, mm5 ; H G F E D C B A
|
||||
punpckhdq mm4, mm5 ; h g f e d c b a
|
||||
|
||||
@ -152,7 +152,7 @@ DyadicBilinearDownsamplerWidthx32_sse:
|
||||
pavgb mm0, mm4 ; (A+a+1)>>1, .., (H+h+1)>>1, temp_row1
|
||||
pavgb mm1, mm6 ; (I+i+1)>>1, .., (P+p+1)>>1, temp_row2
|
||||
pavgb mm0, mm1 ; (temp_row1+temp_row2+1)>>1, pending here and wait another horizontal part done then write memory once
|
||||
|
||||
|
||||
; 2nd part horizontal loop: x16 bytes
|
||||
; mem hi<- ->lo
|
||||
;1st Line Src: mm0: d D c C b B a A mm1: h H g G f F e E
|
||||
@ -245,10 +245,10 @@ DyadicBilinearDownsamplerWidthx16_sse:
|
||||
mov edi, [esp+24] ; pDst
|
||||
mov edx, [esp+28] ; iDstStride
|
||||
mov esi, [esp+32] ; pSrc
|
||||
mov ecx, [esp+36] ; iSrcStride
|
||||
mov ecx, [esp+36] ; iSrcStride
|
||||
mov ebp, [esp+44] ; iSrcHeight
|
||||
|
||||
sar ebp, $1 ; iSrcHeight >> 1
|
||||
|
||||
sar ebp, $1 ; iSrcHeight >> 1
|
||||
|
||||
.yloops:
|
||||
mov eax, [esp+40] ; iSrcWidth
|
||||
@ -265,7 +265,7 @@ DyadicBilinearDownsamplerWidthx16_sse:
|
||||
;=> target:
|
||||
;: H G F E D C B A, P O N M L K J I
|
||||
;: h g f e d c b a, p o n m l k j i
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
movq mm0, [esi] ; 1st pSrc line
|
||||
movq mm1, [esi+8] ; 1st pSrc line + 8
|
||||
movq mm2, [esi+ecx] ; 2nd pSrc line
|
||||
@ -293,7 +293,7 @@ DyadicBilinearDownsamplerWidthx16_sse:
|
||||
pshufw mm7, mm7, 0d8h ; p o n m P O N M ; 11011000 B: mm7
|
||||
|
||||
; to handle mm4, mm5, mm6, mm7
|
||||
movq mm0, mm4 ;
|
||||
movq mm0, mm4 ;
|
||||
punpckldq mm0, mm5 ; H G F E D C B A
|
||||
punpckhdq mm4, mm5 ; h g f e d c b a
|
||||
|
||||
@ -306,7 +306,7 @@ DyadicBilinearDownsamplerWidthx16_sse:
|
||||
pavgb mm1, mm6 ; (I+i+1)>>1, .., (P+p+1)>>1, temp_row2
|
||||
pavgb mm0, mm1 ; (temp_row1+temp_row2+1)>>1, pending here and wait another horizontal part done then write memory once
|
||||
|
||||
movq [edi ], mm0
|
||||
movq [edi ], mm0
|
||||
|
||||
; next SMB
|
||||
lea esi, [esi+16]
|
||||
@ -349,10 +349,10 @@ DyadicBilinearDownsamplerWidthx8_sse:
|
||||
mov edi, [esp+24] ; pDst
|
||||
mov edx, [esp+28] ; iDstStride
|
||||
mov esi, [esp+32] ; pSrc
|
||||
mov ecx, [esp+36] ; iSrcStride
|
||||
mov ecx, [esp+36] ; iSrcStride
|
||||
mov ebp, [esp+44] ; iSrcHeight
|
||||
|
||||
sar ebp, $1 ; iSrcHeight >> 1
|
||||
|
||||
sar ebp, $1 ; iSrcHeight >> 1
|
||||
|
||||
.yloops:
|
||||
mov eax, [esp+40] ; iSrcWidth
|
||||
@ -369,9 +369,9 @@ DyadicBilinearDownsamplerWidthx8_sse:
|
||||
;=> target:
|
||||
;: H G F E D C B A
|
||||
;: h g f e d c b a
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
movq mm0, [esi] ; 1st pSrc line
|
||||
movq mm1, [esi+ecx] ; 2nd pSrc line
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
movq mm0, [esi] ; 1st pSrc line
|
||||
movq mm1, [esi+ecx] ; 2nd pSrc line
|
||||
|
||||
; to handle mm0, mm1, mm2, mm3
|
||||
pshufw mm2, mm0, 0d8h ; d D b B c C a A ; 11011000 B
|
||||
@ -382,19 +382,19 @@ DyadicBilinearDownsamplerWidthx8_sse:
|
||||
pshufw mm4, mm1, 0d8h ; h H f F g G e E ; 11011000 B
|
||||
pshufw mm5, mm4, 04eh ; g G e E h H f F ; 01001110 B
|
||||
punpcklbw mm4, mm5 ; h g H G f e F E
|
||||
pshufw mm4, mm4, 0d8h ; h g f e H G F E ; 11011000 B: mm5
|
||||
pshufw mm4, mm4, 0d8h ; h g f e H G F E ; 11011000 B: mm5
|
||||
|
||||
; to handle mm2, mm4
|
||||
movq mm0, mm2 ;
|
||||
movq mm0, mm2 ;
|
||||
punpckldq mm0, mm4 ; H G F E D C B A
|
||||
punpckhdq mm2, mm4 ; h g f e d c b a
|
||||
|
||||
; avg within MB horizon width (16 x 2 lines)
|
||||
pavgb mm0, mm2 ; (H+h+1)>>1, .., (A+a+1)>>1, temp_row1, 2
|
||||
pshufw mm1, mm0, 04eh ; 01001110 B
|
||||
pshufw mm1, mm0, 04eh ; 01001110 B
|
||||
pavgb mm0, mm1 ; (temp_row1+temp_row2+1)>>1, pending here and wait another horizontal part done then write memory once
|
||||
|
||||
movd [edi], mm0
|
||||
movd [edi], mm0
|
||||
|
||||
; next unit
|
||||
lea esi, [esi+8]
|
||||
@ -440,10 +440,10 @@ DyadicBilinearDownsamplerWidthx32_ssse3:
|
||||
mov edi, [esp+24] ; pDst
|
||||
mov edx, [esp+28] ; iDstStride
|
||||
mov esi, [esp+32] ; pSrc
|
||||
mov ecx, [esp+36] ; iSrcStride
|
||||
mov ecx, [esp+36] ; iSrcStride
|
||||
mov ebp, [esp+44] ; iSrcHeight
|
||||
|
||||
sar ebp, $1 ; iSrcHeight >> 1
|
||||
|
||||
sar ebp, $1 ; iSrcHeight >> 1
|
||||
|
||||
movdqa xmm7, [shufb_mask_low] ; mask low
|
||||
movdqa xmm6, [shufb_mask_high] ; mask high
|
||||
@ -467,13 +467,13 @@ DyadicBilinearDownsamplerWidthx32_ssse3:
|
||||
;: p o n m l k j i h g f e d c b a
|
||||
;: P .. A
|
||||
;: p .. a
|
||||
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
movdqa xmm0, [esi] ; 1st_src_line
|
||||
movdqa xmm1, [esi+16] ; 1st_src_line + 16
|
||||
movdqa xmm2, [esi+ecx] ; 2nd_src_line
|
||||
movdqa xmm3, [esi+ecx+16] ; 2nd_src_line + 16
|
||||
|
||||
movdqa xmm3, [esi+ecx+16] ; 2nd_src_line + 16
|
||||
|
||||
; packing & avg
|
||||
movdqa xmm4, xmm0 ; h H g G f F e E d D c C b B a A
|
||||
pshufb xmm0, xmm7 ; 0 H 0 G 0 F 0 E 0 D 0 C 0 B 0 A
|
||||
@ -487,26 +487,26 @@ DyadicBilinearDownsamplerWidthx32_ssse3:
|
||||
pshufb xmm1, xmm7
|
||||
pshufb xmm5, xmm6
|
||||
; psubb xmm5, xmm1
|
||||
; psrlw xmm5, 8
|
||||
; psrlw xmm5, 8
|
||||
pavgb xmm1, xmm5
|
||||
|
||||
movdqa xmm4, xmm2
|
||||
pshufb xmm2, xmm7
|
||||
pshufb xmm4, xmm6
|
||||
; psubb xmm4, xmm2
|
||||
; psrlw xmm4, 8
|
||||
; psrlw xmm4, 8
|
||||
pavgb xmm2, xmm4
|
||||
|
||||
movdqa xmm5, xmm3
|
||||
pshufb xmm3, xmm7
|
||||
pshufb xmm5, xmm6
|
||||
; psubb xmm5, xmm3
|
||||
; psrlw xmm5, 8
|
||||
; psrlw xmm5, 8
|
||||
pavgb xmm3, xmm5
|
||||
|
||||
packuswb xmm0, xmm1
|
||||
packuswb xmm2, xmm3
|
||||
pavgb xmm0, xmm2
|
||||
|
||||
packuswb xmm0, xmm1
|
||||
packuswb xmm2, xmm3
|
||||
pavgb xmm0, xmm2
|
||||
|
||||
; write pDst
|
||||
movdqa [edi], xmm0
|
||||
@ -526,7 +526,7 @@ DyadicBilinearDownsamplerWidthx32_ssse3:
|
||||
|
||||
dec ebp
|
||||
jg near .yloops
|
||||
|
||||
|
||||
pop ebp
|
||||
pop edi
|
||||
pop esi
|
||||
@ -551,11 +551,11 @@ DyadicBilinearDownsamplerWidthx16_ssse3:
|
||||
mov edi, [esp+24] ; pDst
|
||||
mov edx, [esp+28] ; iDstStride
|
||||
mov esi, [esp+32] ; pSrc
|
||||
mov ecx, [esp+36] ; iSrcStride
|
||||
mov ecx, [esp+36] ; iSrcStride
|
||||
mov ebp, [esp+44] ; iSrcHeight
|
||||
|
||||
sar ebp, $1 ; iSrcHeight >> 1
|
||||
movdqa xmm7, [shufb_mask_low] ; mask low
|
||||
|
||||
sar ebp, $1 ; iSrcHeight >> 1
|
||||
movdqa xmm7, [shufb_mask_low] ; mask low
|
||||
movdqa xmm6, [shufb_mask_high] ; mask high
|
||||
|
||||
.yloops:
|
||||
@ -574,31 +574,31 @@ DyadicBilinearDownsamplerWidthx16_ssse3:
|
||||
;: H G F E D C B A, P O N M L K J I
|
||||
;: h g f e d c b a, p o n m l k j i
|
||||
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
movdqa xmm0, [esi] ; 1st_src_line
|
||||
movdqa xmm1, [esi+ecx] ; 2nd_src_line
|
||||
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
movdqa xmm0, [esi] ; 1st_src_line
|
||||
movdqa xmm1, [esi+ecx] ; 2nd_src_line
|
||||
|
||||
; packing & avg
|
||||
movdqa xmm2, xmm0 ; h H g G f F e E d D c C b B a A
|
||||
pshufb xmm0, xmm7 ; 0 H 0 G 0 F 0 E 0 D 0 C 0 B 0 A
|
||||
pshufb xmm2, xmm6 ; 0 h 0 g 0 f 0 e 0 d 0 c 0 b 0 a
|
||||
; another implementation for xmm2 high bits
|
||||
; psubb xmm2, xmm0 ; h 0 g 0 f 0 e 0 d 0 c 0 b 0 a 0
|
||||
; psrlw xmm2, 8 ; 0 h 0 g 0 f 0 e 0 d 0 c 0 b 0 a
|
||||
; psrlw xmm2, 8 ; 0 h 0 g 0 f 0 e 0 d 0 c 0 b 0 a
|
||||
pavgb xmm0, xmm2
|
||||
|
||||
movdqa xmm3, xmm1
|
||||
pshufb xmm1, xmm7
|
||||
pshufb xmm3, xmm6
|
||||
; psubb xmm3, xmm1
|
||||
; psrlw xmm3, 8
|
||||
; psrlw xmm3, 8
|
||||
pavgb xmm1, xmm3
|
||||
|
||||
pavgb xmm0, xmm1
|
||||
packuswb xmm0, xmm1
|
||||
pavgb xmm0, xmm1
|
||||
packuswb xmm0, xmm1
|
||||
|
||||
; write pDst
|
||||
movq [edi], xmm0
|
||||
movq [edi], xmm0
|
||||
|
||||
; next SMB
|
||||
lea esi, [esi+16]
|
||||
@ -615,7 +615,7 @@ DyadicBilinearDownsamplerWidthx16_ssse3:
|
||||
|
||||
dec ebp
|
||||
jg near .yloops
|
||||
|
||||
|
||||
pop ebp
|
||||
pop edi
|
||||
pop esi
|
||||
@ -641,12 +641,12 @@ DyadicBilinearDownsamplerWidthx32_sse4:
|
||||
mov edi, [esp+24] ; pDst
|
||||
mov edx, [esp+28] ; iDstStride
|
||||
mov esi, [esp+32] ; pSrc
|
||||
mov ecx, [esp+36] ; iSrcStride
|
||||
mov ecx, [esp+36] ; iSrcStride
|
||||
mov ebp, [esp+44] ; iSrcHeight
|
||||
|
||||
sar ebp, $1 ; iSrcHeight >> 1
|
||||
|
||||
movdqa xmm7, [shufb_mask_low] ; mask low
|
||||
sar ebp, $1 ; iSrcHeight >> 1
|
||||
|
||||
movdqa xmm7, [shufb_mask_low] ; mask low
|
||||
movdqa xmm6, [shufb_mask_high] ; mask high
|
||||
|
||||
.yloops:
|
||||
@ -668,13 +668,13 @@ DyadicBilinearDownsamplerWidthx32_sse4:
|
||||
;: p o n m l k j i h g f e d c b a
|
||||
;: P .. A
|
||||
;: p .. a
|
||||
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
movntdqa xmm0, [esi] ; 1st_src_line
|
||||
movntdqa xmm1, [esi+16] ; 1st_src_line + 16
|
||||
movntdqa xmm2, [esi+ecx] ; 2nd_src_line
|
||||
movntdqa xmm3, [esi+ecx+16] ; 2nd_src_line + 16
|
||||
|
||||
movntdqa xmm3, [esi+ecx+16] ; 2nd_src_line + 16
|
||||
|
||||
; packing & avg
|
||||
movdqa xmm4, xmm0 ; h H g G f F e E d D c C b B a A
|
||||
pshufb xmm0, xmm7 ; 0 H 0 G 0 F 0 E 0 D 0 C 0 B 0 A
|
||||
@ -703,10 +703,10 @@ DyadicBilinearDownsamplerWidthx32_sse4:
|
||||
; psubb xmm5, xmm3
|
||||
; psrlw xmm5, 8
|
||||
pavgb xmm3, xmm5
|
||||
|
||||
packuswb xmm0, xmm1
|
||||
packuswb xmm2, xmm3
|
||||
pavgb xmm0, xmm2
|
||||
|
||||
packuswb xmm0, xmm1
|
||||
packuswb xmm2, xmm3
|
||||
pavgb xmm0, xmm2
|
||||
|
||||
; write pDst
|
||||
movdqa [edi], xmm0
|
||||
@ -726,7 +726,7 @@ DyadicBilinearDownsamplerWidthx32_sse4:
|
||||
|
||||
dec ebp
|
||||
jg near .yloops
|
||||
|
||||
|
||||
pop ebp
|
||||
pop edi
|
||||
pop esi
|
||||
@ -751,10 +751,10 @@ DyadicBilinearDownsamplerWidthx16_sse4:
|
||||
mov edi, [esp+24] ; pDst
|
||||
mov edx, [esp+28] ; iDstStride
|
||||
mov esi, [esp+32] ; pSrc
|
||||
mov ecx, [esp+36] ; iSrcStride
|
||||
mov ecx, [esp+36] ; iSrcStride
|
||||
mov ebp, [esp+44] ; iSrcHeight
|
||||
|
||||
sar ebp, $1 ; iSrcHeight >> 1
|
||||
|
||||
sar ebp, $1 ; iSrcHeight >> 1
|
||||
movdqa xmm7, [shufb_mask_low] ; mask low
|
||||
movdqa xmm6, [shufb_mask_high] ; mask high
|
||||
|
||||
@ -774,10 +774,10 @@ DyadicBilinearDownsamplerWidthx16_sse4:
|
||||
;: H G F E D C B A, P O N M L K J I
|
||||
;: h g f e d c b a, p o n m l k j i
|
||||
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
movntdqa xmm0, [esi] ; 1st_src_line
|
||||
movntdqa xmm1, [esi+ecx] ; 2nd_src_line
|
||||
|
||||
;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;;
|
||||
movntdqa xmm0, [esi] ; 1st_src_line
|
||||
movntdqa xmm1, [esi+ecx] ; 2nd_src_line
|
||||
|
||||
; packing & avg
|
||||
movdqa xmm2, xmm0 ; h H g G f F e E d D c C b B a A
|
||||
pshufb xmm0, xmm7 ; 0 H 0 G 0 F 0 E 0 D 0 C 0 B 0 A
|
||||
@ -793,11 +793,11 @@ DyadicBilinearDownsamplerWidthx16_sse4:
|
||||
; psrlw xmm3, 8
|
||||
pavgb xmm1, xmm3
|
||||
|
||||
pavgb xmm0, xmm1
|
||||
packuswb xmm0, xmm1
|
||||
pavgb xmm0, xmm1
|
||||
packuswb xmm0, xmm1
|
||||
|
||||
; write pDst
|
||||
movq [edi], xmm0
|
||||
movq [edi], xmm0
|
||||
|
||||
; next SMB
|
||||
lea esi, [esi+16]
|
||||
@ -814,7 +814,7 @@ DyadicBilinearDownsamplerWidthx16_sse4:
|
||||
|
||||
dec ebp
|
||||
jg near .yloops
|
||||
|
||||
|
||||
pop ebp
|
||||
pop edi
|
||||
pop esi
|
||||
@ -858,7 +858,7 @@ GeneralBilinearAccurateDownsampler_sse2:
|
||||
%define xInverse esp + 20
|
||||
%define dstStep esp + 24
|
||||
sub esp, localsize
|
||||
|
||||
|
||||
pxor xmm0, xmm0
|
||||
mov edx, 32767
|
||||
mov eax, [uiScaleX]
|
||||
@ -871,7 +871,7 @@ GeneralBilinearAccurateDownsampler_sse2:
|
||||
psllq xmm1, 32
|
||||
por xmm1, xmm2 ; 0 0 uinc -uinc (dword)
|
||||
pshufd xmm7, xmm1, 01000100b ; xmm7: uinc -uinc uinc -uinc
|
||||
|
||||
|
||||
mov eax, [uiScaleY]
|
||||
and eax, 32767
|
||||
mov ebx, eax
|
||||
@ -882,15 +882,15 @@ GeneralBilinearAccurateDownsampler_sse2:
|
||||
psllq xmm6, 32
|
||||
por xmm6, xmm2 ; 0 0 vinc -vinc (dword)
|
||||
pshufd xmm6, xmm6, 01010000b ; xmm6: vinc vinc -vinc -vinc
|
||||
|
||||
|
||||
mov edx, 40003fffh
|
||||
movd xmm5, edx
|
||||
punpcklwd xmm5, xmm0 ; 16384 16383
|
||||
pshufd xmm5, xmm5, 01000100b ; xmm5: 16384 16383 16384 16383
|
||||
|
||||
|
||||
|
||||
DOWNSAMPLE:
|
||||
|
||||
|
||||
mov eax, [dwDstHeight]
|
||||
mov edi, [pDstData]
|
||||
mov edx, [dwDstStride]
|
||||
@ -901,10 +901,10 @@ DOWNSAMPLE:
|
||||
mov [tmpHeight], eax
|
||||
mov eax, 16384
|
||||
mov [yInverse], eax
|
||||
|
||||
|
||||
pshufd xmm4, xmm5, 01010000b ; initial v to 16384 16384 16383 16383
|
||||
|
||||
HEIGHT:
|
||||
|
||||
HEIGHT:
|
||||
mov eax, [yInverse]
|
||||
mov esi, [pSrcData]
|
||||
shr eax, 15
|
||||
@ -912,25 +912,25 @@ HEIGHT:
|
||||
add esi, eax ; get current row address
|
||||
mov ebp, esi
|
||||
add ebp, [dwSrcStride]
|
||||
|
||||
|
||||
mov eax, 16384
|
||||
mov [xInverse], eax
|
||||
mov ecx, [dwDstWidth]
|
||||
dec ecx
|
||||
|
||||
|
||||
movdqa xmm3, xmm5 ; initial u to 16384 16383 16384 16383
|
||||
|
||||
|
||||
WIDTH:
|
||||
mov eax, [xInverse]
|
||||
shr eax, 15
|
||||
|
||||
|
||||
movd xmm1, [esi+eax] ; xxxxxxba
|
||||
movd xmm2, [ebp+eax] ; xxxxxxdc
|
||||
pxor xmm0, xmm0
|
||||
punpcklwd xmm1, xmm2 ; xxxxdcba
|
||||
punpcklbw xmm1, xmm0 ; 0d0c0b0a
|
||||
punpcklwd xmm1, xmm0 ; 000d000c000b000a
|
||||
|
||||
|
||||
movdqa xmm2, xmm4 ; xmm2: vv(1-v)(1-v) tmpv
|
||||
pmaddwd xmm2, xmm3 ; mul u(1-u)u(1-u) on xmm2
|
||||
movdqa xmm0, xmm2
|
||||
@ -942,20 +942,20 @@ WIDTH:
|
||||
pshufd xmm1, xmm2, 00001110b
|
||||
paddq xmm2, xmm1
|
||||
psrlq xmm2, 29
|
||||
|
||||
|
||||
movd eax, xmm2
|
||||
inc eax
|
||||
shr eax, 1
|
||||
mov [edi], al
|
||||
inc edi
|
||||
|
||||
|
||||
mov eax, [uiScaleX]
|
||||
add [xInverse], eax
|
||||
|
||||
|
||||
paddw xmm3, xmm7 ; inc u
|
||||
psllw xmm3, 1
|
||||
psrlw xmm3, 1
|
||||
|
||||
|
||||
loop WIDTH
|
||||
|
||||
WIDTH_END:
|
||||
@ -964,41 +964,41 @@ WIDTH_END:
|
||||
mov cl, [esi+eax]
|
||||
mov [edi], cl
|
||||
inc edi
|
||||
|
||||
|
||||
mov eax, [uiScaleY]
|
||||
add [yInverse], eax
|
||||
add edi, [dstStep]
|
||||
|
||||
|
||||
paddw xmm4, xmm6 ; inc v
|
||||
psllw xmm4, 1
|
||||
psrlw xmm4, 1
|
||||
|
||||
|
||||
dec dword [tmpHeight]
|
||||
jg HEIGHT
|
||||
|
||||
|
||||
LAST_ROW:
|
||||
LAST_ROW:
|
||||
mov eax, [yInverse]
|
||||
mov esi, [pSrcData]
|
||||
shr eax, 15
|
||||
mul dword [dwSrcStride]
|
||||
add esi, eax ; get current row address
|
||||
|
||||
|
||||
mov eax, 16384
|
||||
mov [xInverse], eax
|
||||
mov ecx, [dwDstWidth]
|
||||
|
||||
|
||||
LAST_ROW_WIDTH:
|
||||
mov eax, [xInverse]
|
||||
shr eax, 15
|
||||
|
||||
|
||||
mov al, [esi+eax]
|
||||
mov [edi], al
|
||||
inc edi
|
||||
|
||||
|
||||
mov eax, [uiScaleX]
|
||||
add [xInverse], eax
|
||||
|
||||
|
||||
loop LAST_ROW_WIDTH
|
||||
|
||||
LAST_ROW_END:
|
||||
@ -1026,10 +1026,10 @@ LAST_ROW_END:
|
||||
%undef xInverse
|
||||
%undef dstStep
|
||||
ret
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
WELS_EXTERN GeneralBilinearFastDownsampler_sse2
|
||||
;**************************************************************************************************************
|
||||
;int GeneralBilinearFastDownsampler_sse2( unsigned char* pDst, const int iDstStride, const int iDstWidth, const int iDstHeight,
|
||||
@ -1062,7 +1062,7 @@ GeneralBilinearFastDownsampler_sse2:
|
||||
%define xInverse esp + 20
|
||||
%define dstStep esp + 24
|
||||
sub esp, localsize
|
||||
|
||||
|
||||
pxor xmm0, xmm0
|
||||
mov edx, 65535
|
||||
mov eax, [uiScaleX]
|
||||
@ -1075,7 +1075,7 @@ GeneralBilinearFastDownsampler_sse2:
|
||||
psllq xmm1, 32
|
||||
por xmm1, xmm2 ; 0 uinc 0 -uinc
|
||||
pshuflw xmm7, xmm1, 10001000b ; xmm7: uinc -uinc uinc -uinc
|
||||
|
||||
|
||||
mov eax, [uiScaleY]
|
||||
and eax, 32767
|
||||
mov ebx, eax
|
||||
@ -1086,15 +1086,15 @@ GeneralBilinearFastDownsampler_sse2:
|
||||
psllq xmm6, 32
|
||||
por xmm6, xmm2 ; 0 vinc 0 -vinc
|
||||
pshuflw xmm6, xmm6, 10100000b ; xmm6: vinc vinc -vinc -vinc
|
||||
|
||||
|
||||
mov edx, 80007fffh ; 32768 32767
|
||||
movd xmm5, edx
|
||||
movd xmm5, edx
|
||||
pshuflw xmm5, xmm5, 01000100b ; 32768 32767 32768 32767
|
||||
mov ebx, 16384
|
||||
|
||||
|
||||
|
||||
FAST_DOWNSAMPLE:
|
||||
|
||||
|
||||
mov eax, [dwDstHeight]
|
||||
mov edi, [pDstData]
|
||||
mov edx, [dwDstStride]
|
||||
@ -1105,11 +1105,11 @@ FAST_DOWNSAMPLE:
|
||||
mov [tmpHeight], eax
|
||||
mov eax, 16384
|
||||
mov [yInverse], eax
|
||||
|
||||
|
||||
pshuflw xmm4, xmm5, 01010000b
|
||||
psrlw xmm4, 1 ; initial v to 16384 16384 16383 16383
|
||||
|
||||
FAST_HEIGHT:
|
||||
|
||||
FAST_HEIGHT:
|
||||
mov eax, [yInverse]
|
||||
mov esi, [pSrcData]
|
||||
shr eax, 15
|
||||
@ -1117,23 +1117,23 @@ FAST_HEIGHT:
|
||||
add esi, eax ; get current row address
|
||||
mov ebp, esi
|
||||
add ebp, [dwSrcStride]
|
||||
|
||||
|
||||
mov eax, 32768
|
||||
mov [xInverse], eax
|
||||
mov ecx, [dwDstWidth]
|
||||
dec ecx
|
||||
|
||||
|
||||
movdqa xmm3, xmm5 ; initial u to 32768 32767 32768 32767
|
||||
|
||||
|
||||
FAST_WIDTH:
|
||||
mov eax, [xInverse]
|
||||
shr eax, 16
|
||||
|
||||
|
||||
movd xmm1, [esi+eax] ; xxxxxxba
|
||||
movd xmm2, [ebp+eax] ; xxxxxxdc
|
||||
punpcklwd xmm1, xmm2 ; xxxxdcba
|
||||
punpcklbw xmm1, xmm0 ; 0d0c0b0a
|
||||
|
||||
|
||||
movdqa xmm2, xmm4 ; xmm2: vv(1-v)(1-v) tmpv
|
||||
pmulhuw xmm2, xmm3 ; mul u(1-u)u(1-u) on xmm2
|
||||
pmaddwd xmm2, xmm1
|
||||
@ -1142,17 +1142,17 @@ FAST_WIDTH:
|
||||
movd xmm1, ebx
|
||||
paddd xmm2, xmm1
|
||||
psrld xmm2, 15
|
||||
|
||||
|
||||
packuswb xmm2, xmm0
|
||||
movd eax, xmm2
|
||||
mov [edi], al
|
||||
inc edi
|
||||
|
||||
|
||||
mov eax, [uiScaleX]
|
||||
add [xInverse], eax
|
||||
|
||||
|
||||
paddw xmm3, xmm7 ; inc u
|
||||
|
||||
|
||||
loop FAST_WIDTH
|
||||
|
||||
FAST_WIDTH_END:
|
||||
@ -1161,41 +1161,41 @@ FAST_WIDTH_END:
|
||||
mov cl, [esi+eax]
|
||||
mov [edi], cl
|
||||
inc edi
|
||||
|
||||
|
||||
mov eax, [uiScaleY]
|
||||
add [yInverse], eax
|
||||
add edi, [dstStep]
|
||||
|
||||
|
||||
paddw xmm4, xmm6 ; inc v
|
||||
psllw xmm4, 1
|
||||
psrlw xmm4, 1
|
||||
|
||||
|
||||
dec dword [tmpHeight]
|
||||
jg FAST_HEIGHT
|
||||
|
||||
|
||||
FAST_LAST_ROW:
|
||||
FAST_LAST_ROW:
|
||||
mov eax, [yInverse]
|
||||
mov esi, [pSrcData]
|
||||
shr eax, 15
|
||||
mul dword [dwSrcStride]
|
||||
add esi, eax ; get current row address
|
||||
|
||||
|
||||
mov eax, 32768
|
||||
mov [xInverse], eax
|
||||
mov ecx, [dwDstWidth]
|
||||
|
||||
|
||||
FAST_LAST_ROW_WIDTH:
|
||||
mov eax, [xInverse]
|
||||
shr eax, 16
|
||||
|
||||
|
||||
mov al, [esi+eax]
|
||||
mov [edi], al
|
||||
inc edi
|
||||
|
||||
|
||||
mov eax, [uiScaleX]
|
||||
add [xInverse], eax
|
||||
|
||||
|
||||
loop FAST_LAST_ROW_WIDTH
|
||||
|
||||
FAST_LAST_ROW_END:
|
||||
|
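For readers following the downsampling hunks above, here is a rough scalar model in C of what the `FAST_HEIGHT` / `FAST_WIDTH` loops appear to do. It is a sketch under assumptions, not the project's implementation. The function name, the parameter list and the way the step sizes are derived are hypothetical. Only the 0.5 starting offsets (`32768` for x, `16384` for y), the 16-bit and 15-bit shifts, and the nearest-sample handling of the last column and last row are taken from the assembly. The result is not claimed to be bit-exact with the SSE2 code.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of a fixed-point bilinear downsample; not bit-exact with the SSE2 routine. */
static void downsample_bilinear(uint8_t *dst, int dst_stride, int dst_w, int dst_h,
                                const uint8_t *src, int src_stride, int src_w, int src_h)
{
    assert(dst_w > 0 && dst_h > 0 && src_w >= dst_w && src_h >= dst_h);
    assert(src_w < 32768 && src_h < 32768);

    /* Assumed step sizes: 16 fractional bits horizontally, 15 vertically. */
    const uint32_t step_x = ((uint32_t)src_w << 16) / (uint32_t)dst_w;
    const uint32_t step_y = ((uint32_t)src_h << 15) / (uint32_t)dst_h;

    uint32_t y_pos = 1u << 14;                  /* 0.5 with 15 fractional bits */
    for (int j = 0; j < dst_h; ++j, y_pos += step_y) {
        const uint8_t *row0 = src + (int)(y_pos >> 15) * src_stride;
        const uint32_t v = y_pos & 0x7fff;      /* vertical weight of the lower row */
        const int last_row = (j == dst_h - 1);

        uint32_t x_pos = 1u << 15;              /* 0.5 with 16 fractional bits */
        for (int i = 0; i < dst_w; ++i, x_pos += step_x) {
            const uint8_t *p = row0 + (x_pos >> 16);
            if (last_row || i == dst_w - 1) {
                dst[j * dst_stride + i] = p[0]; /* nearest sample on the borders */
                continue;
            }
            const uint32_t u = x_pos & 0xffff;  /* horizontal weight of the right pixel */
            const uint64_t top = p[0] * (65536u - u) + p[1] * u;
            const uint64_t bot = p[src_stride] * (65536u - u) + p[src_stride + 1] * u;
            const uint64_t sum = top * (32768u - v) + bot * v + (1ull << 30);
            dst[j * dst_stride + i] = (uint8_t)(sum >> 31);
        }
    }
}
```

Copying the nearest sample for the last column and last row keeps the `x + 1` and `y + 1` reads inside the source picture, which seems to be why the assembly has the separate `FAST_WIDTH_END` and `FAST_LAST_ROW` paths.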
@ -85,7 +85,7 @@ SECTION .text
|
||||
|
||||
%macro SSE2_PRED_H_16X16_TWO_LINE 1
|
||||
lea eax, [eax+ecx*2]
|
||||
|
||||
|
||||
COPY_16_TIMES eax, xmm0
|
||||
movdqa [edx+%1], xmm0
|
||||
COPY_16_TIMESS eax, xmm0, ecx
|
||||
@ -97,22 +97,22 @@ WelsI16x16LumaPredH_sse2:
|
||||
mov edx, [esp+4] ; pred
|
||||
mov eax, [esp+8] ; pRef
|
||||
mov ecx, [esp+12] ; stride
|
||||
|
||||
|
||||
COPY_16_TIMES eax, xmm0
|
||||
movdqa [edx], xmm0
|
||||
COPY_16_TIMESS eax, xmm0, ecx
|
||||
movdqa [edx+0x10], xmm0
|
||||
|
||||
SSE2_PRED_H_16X16_TWO_LINE 0x20
|
||||
|
||||
SSE2_PRED_H_16X16_TWO_LINE 0x20
|
||||
SSE2_PRED_H_16X16_TWO_LINE 0x40
|
||||
SSE2_PRED_H_16X16_TWO_LINE 0x60
|
||||
SSE2_PRED_H_16X16_TWO_LINE 0x80
|
||||
SSE2_PRED_H_16X16_TWO_LINE 0xa0
|
||||
SSE2_PRED_H_16X16_TWO_LINE 0xc0
|
||||
SSE2_PRED_H_16X16_TWO_LINE 0xe0
|
||||
|
||||
|
||||
ret
|
||||
|
||||
|
||||
;***********************************************************************
|
||||
; void WelsI16x16LumaPredV_sse2(uint8_t *pred, uint8_t *pRef, int32_t stride);
|
||||
;***********************************************************************
|
||||
@ -121,10 +121,10 @@ WelsI16x16LumaPredV_sse2:
|
||||
mov edx, [esp+4] ; pred
|
||||
mov eax, [esp+8] ; pRef
|
||||
mov ecx, [esp+12] ; stride
|
||||
|
||||
|
||||
sub eax, ecx
|
||||
movdqa xmm0, [eax]
|
||||
|
||||
|
||||
movdqa [edx], xmm0
|
||||
movdqa [edx+10h], xmm0
|
||||
movdqa [edx+20h], xmm0
|
||||
@ -135,11 +135,11 @@ WelsI16x16LumaPredV_sse2:
|
||||
movdqa [edx+70h], xmm0
|
||||
movdqa [edx+80h], xmm0
|
||||
movdqa [edx+90h], xmm0
|
||||
movdqa [edx+160], xmm0
|
||||
movdqa [edx+160], xmm0
|
||||
movdqa [edx+176], xmm0
|
||||
movdqa [edx+192], xmm0
|
||||
movdqa [edx+208], xmm0
|
||||
movdqa [edx+224], xmm0
|
||||
movdqa [edx+240], xmm0
|
||||
|
||||
|
||||
ret
|
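As a reading aid for `WelsI16x16LumaPredV_sse2`, the following C sketch mirrors what the assembly above shows: one 16-byte load from the row just above `pRef` (`sub eax, ecx`), then sixteen stores at offsets 0 to 240 in steps of 16. The function name is hypothetical, and the contiguous 16-byte pitch of `pred` is inferred from those store offsets rather than documented.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Vertical 16x16 luma prediction: replicate the row above the block into all 16 rows. */
static void luma_pred_v_16x16(uint8_t *pred, const uint8_t *ref, int32_t stride)
{
    assert(pred != NULL && ref != NULL && stride >= 16);
    const uint8_t *top = ref - stride;      /* row directly above the block */
    for (int row = 0; row < 16; ++row)
        memcpy(pred + row * 16, top, 16);   /* pred is assumed to be a packed 16x16 block */
}
```

Unlike the `movdqa` version, this scalar form has no 16-byte alignment requirement on either pointer.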
@ -67,7 +67,7 @@ BITS 32
|
||||
%endmacro
|
||||
|
||||
|
||||
|
||||
|
||||
%macro CACHE_SPLIT_CHECK 3 ; address, width, cacheline
|
||||
and %1, 0x1f|(%3>>1)
|
||||
cmp %1, (32-%2)|(%3>>1)
|
||||
@ -108,15 +108,15 @@ WelsSampleSad8x8_sse21:
|
||||
push edi
|
||||
mov eax, [esp+12]
|
||||
mov ebx, [esp+16]
|
||||
|
||||
|
||||
pxor xmm7, xmm7
|
||||
|
||||
|
||||
mov edi, ecx
|
||||
and edi, 0x07
|
||||
sub ecx, edi
|
||||
sub ecx, edi
|
||||
mov edx, 8
|
||||
sub edx, edi
|
||||
|
||||
|
||||
shl edi, 3
|
||||
shl edx, 3
|
||||
movd xmm5, edi
|
||||
@ -124,10 +124,10 @@ WelsSampleSad8x8_sse21:
|
||||
mov edi, 8
|
||||
add edi, ecx
|
||||
mov edx, [esp+24]
|
||||
|
||||
|
||||
movq xmm0, [eax]
|
||||
movhps xmm0, [eax+ebx]
|
||||
|
||||
|
||||
movq xmm1, [ecx]
|
||||
movq xmm2, [edi]
|
||||
movhps xmm1, [ecx+edx]
|
||||
@ -135,35 +135,17 @@ WelsSampleSad8x8_sse21:
|
||||
psrlq xmm1, xmm5
|
||||
psllq xmm2, xmm6
|
||||
por xmm1, xmm2
|
||||
|
||||
psadbw xmm0, xmm1
|
||||
paddw xmm7, xmm0
|
||||
|
||||
lea eax, [eax+2*ebx]
|
||||
lea ecx, [ecx+2*edx]
|
||||
lea edi, [edi+2*edx]
|
||||
|
||||
movq xmm0, [eax]
|
||||
movhps xmm0, [eax+ebx]
|
||||
|
||||
movq xmm1, [ecx]
|
||||
movq xmm2, [edi]
|
||||
movhps xmm1, [ecx+edx]
|
||||
movhps xmm2, [edi+edx]
|
||||
psrlq xmm1, xmm5
|
||||
psllq xmm2, xmm6
|
||||
por xmm1, xmm2
|
||||
|
||||
|
||||
psadbw xmm0, xmm1
|
||||
paddw xmm7, xmm0
|
||||
|
||||
lea eax, [eax+2*ebx]
|
||||
lea ecx, [ecx+2*edx]
|
||||
lea edi, [edi+2*edx]
|
||||
|
||||
|
||||
movq xmm0, [eax]
|
||||
movhps xmm0, [eax+ebx]
|
||||
|
||||
|
||||
movq xmm1, [ecx]
|
||||
movq xmm2, [edi]
|
||||
movhps xmm1, [ecx+edx]
|
||||
@ -171,17 +153,17 @@ WelsSampleSad8x8_sse21:
|
||||
psrlq xmm1, xmm5
|
||||
psllq xmm2, xmm6
|
||||
por xmm1, xmm2
|
||||
|
||||
|
||||
psadbw xmm0, xmm1
|
||||
paddw xmm7, xmm0
|
||||
|
||||
|
||||
lea eax, [eax+2*ebx]
|
||||
lea ecx, [ecx+2*edx]
|
||||
lea edi, [edi+2*edx]
|
||||
|
||||
|
||||
movq xmm0, [eax]
|
||||
movhps xmm0, [eax+ebx]
|
||||
|
||||
|
||||
movq xmm1, [ecx]
|
||||
movq xmm2, [edi]
|
||||
movhps xmm1, [ecx+edx]
|
||||
@ -189,10 +171,28 @@ WelsSampleSad8x8_sse21:
|
||||
psrlq xmm1, xmm5
|
||||
psllq xmm2, xmm6
|
||||
por xmm1, xmm2
|
||||
|
||||
|
||||
psadbw xmm0, xmm1
|
||||
paddw xmm7, xmm0
|
||||
|
||||
|
||||
lea eax, [eax+2*ebx]
|
||||
lea ecx, [ecx+2*edx]
|
||||
lea edi, [edi+2*edx]
|
||||
|
||||
movq xmm0, [eax]
|
||||
movhps xmm0, [eax+ebx]
|
||||
|
||||
movq xmm1, [ecx]
|
||||
movq xmm2, [edi]
|
||||
movhps xmm1, [ecx+edx]
|
||||
movhps xmm2, [edi+edx]
|
||||
psrlq xmm1, xmm5
|
||||
psllq xmm2, xmm6
|
||||
por xmm1, xmm2
|
||||
|
||||
psadbw xmm0, xmm1
|
||||
paddw xmm7, xmm0
|
||||
|
||||
movhlps xmm0, xmm7
|
||||
paddw xmm0, xmm7
|
||||
movd eax, xmm0
|
||||
@ -202,12 +202,12 @@ WelsSampleSad8x8_sse21:
|
||||
push ebx
|
||||
mov eax, [esp+8]
|
||||
mov ebx, [esp+12]
|
||||
mov edx, [esp+20]
|
||||
mov edx, [esp+20]
|
||||
pxor xmm6, xmm6
|
||||
SSE2_GetSad8x4
|
||||
lea eax, [eax+2*ebx]
|
||||
lea ecx, [ecx+2*edx]
|
||||
SSE2_GetSad8x4
|
||||
SSE2_GetSad8x4
|
||||
movhlps xmm0, xmm6
|
||||
paddw xmm0, xmm6
|
||||
movd eax, xmm0
|
||||
|
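The SAD routines in the hunks above accumulate `psadbw` results over pairs of rows. A plain C reference for an 8x8 sum of absolute differences is sketched below for comparison. The function name and parameter names are hypothetical, and the sketch does not model the cache-line split handling (`psrlq` / `psllq` / `por`) that `WelsSampleSad8x8_sse21` performs on the second operand.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Reference 8x8 sum of absolute differences between two byte blocks. */
static int32_t sad_8x8(const uint8_t *a, int32_t stride_a,
                       const uint8_t *b, int32_t stride_b)
{
    assert(a != NULL && b != NULL);
    int32_t sad = 0;
    for (int y = 0; y < 8; ++y) {
        for (int x = 0; x < 8; ++x)
            sad += abs((int)a[x] - (int)b[x]);
        a += stride_a;  /* advance each block by its own stride */
        b += stride_b;
    }
    return sad;
}
```

A scalar reference like this is mostly useful for checking the output of an optimized routine on random blocks.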
@ -163,7 +163,7 @@ BITS 32
|
||||
paddd xmm6, xmm1
|
||||
paddd xmm6, xmm3
|
||||
lea esi, [esi+ebx*2]
|
||||
lea edi, [edi+ebx*2]
|
||||
lea edi, [edi+ebx*2]
|
||||
%endmacro
|
||||
|
||||
%macro WELS_SAD_SUM_SQSUM_16x1_SSE2 0
|
||||
@ -172,11 +172,11 @@ BITS 32
|
||||
movdqa xmm3, xmm1
|
||||
psadbw xmm3, xmm2
|
||||
paddd xmm6, xmm3
|
||||
|
||||
|
||||
movdqa xmm3, xmm1
|
||||
psadbw xmm3, xmm0
|
||||
paddd xmm5, xmm3
|
||||
|
||||
|
||||
movdqa xmm2, xmm1
|
||||
punpcklbw xmm1, xmm0
|
||||
punpckhbw xmm2, xmm0
|
||||
@ -184,7 +184,7 @@ BITS 32
|
||||
pmaddwd xmm2, xmm2
|
||||
paddd xmm4, xmm1
|
||||
paddd xmm4, xmm2
|
||||
|
||||
|
||||
add esi, ebx
|
||||
add edi, ebx
|
||||
%endmacro
|
||||
@ -195,16 +195,16 @@ BITS 32
|
||||
movdqa xmm3, xmm1
|
||||
psadbw xmm3, xmm2
|
||||
paddd xmm7, xmm3 ; sad
|
||||
|
||||
|
||||
movdqa xmm3, xmm1
|
||||
pmaxub xmm3, xmm2
|
||||
pminub xmm2, xmm1
|
||||
psubb xmm3, xmm2 ; diff
|
||||
|
||||
|
||||
movdqa xmm2, xmm1
|
||||
psadbw xmm2, xmm0
|
||||
paddd xmm6, xmm2 ; sum
|
||||
|
||||
|
||||
movdqa xmm2, xmm1
|
||||
punpcklbw xmm1, xmm0
|
||||
punpckhbw xmm2, xmm0
|
||||
@ -212,7 +212,7 @@ BITS 32
|
||||
pmaddwd xmm2, xmm2
|
||||
paddd xmm5, xmm1
|
||||
paddd xmm5, xmm2 ; sqsum
|
||||
|
||||
|
||||
movdqa xmm1, xmm3
|
||||
punpcklbw xmm1, xmm0
|
||||
punpckhbw xmm3, xmm0
|
||||
@ -220,7 +220,7 @@ BITS 32
|
||||
pmaddwd xmm3, xmm3
|
||||
paddd xmm4, xmm1
|
||||
paddd xmm4, xmm3 ; sqdiff
|
||||
|
||||
|
||||
add esi, ebx
|
||||
add edi, ebx
|
||||
%endmacro
|
||||
@ -238,16 +238,16 @@ BITS 32
|
||||
movdqa xmm3, xmm2
|
||||
psadbw xmm3, xmm0
|
||||
paddd sum_ref_reg, xmm3 ; sum_ref
|
||||
|
||||
|
||||
movdqa xmm3, xmm1
|
||||
pmaxub xmm3, xmm2
|
||||
pminub xmm2, xmm1
|
||||
psubb xmm3, xmm2 ; abs diff
|
||||
pmaxub mad_reg, xmm3 ; max abs diff
|
||||
|
||||
|
||||
psadbw xmm3, xmm0
|
||||
paddd sad_reg, xmm3 ; sad
|
||||
|
||||
|
||||
add esi, ebx
|
||||
add edi, ebx
|
||||
%endmacro
|
||||
@ -285,7 +285,7 @@ BITS 32
|
||||
psllq xmm3, 32
|
||||
paddd xmm2, xmm3
|
||||
paddd sad_reg, xmm2 ; sqsum
|
||||
|
||||
|
||||
movdqa xmm2, [edi]
|
||||
movdqa xmm3, xmm1
|
||||
psadbw xmm3, xmm0
|
||||
@ -294,13 +294,13 @@ BITS 32
|
||||
psadbw xmm3, xmm0
|
||||
pslldq xmm3, 4
|
||||
paddd sum_reg, xmm3 ; sum_ref
|
||||
|
||||
|
||||
movdqa xmm3, xmm1
|
||||
pmaxub xmm3, xmm2
|
||||
pminub xmm2, xmm1
|
||||
psubb xmm3, xmm2 ; abs diff
|
||||
pmaxub mad_reg, xmm3 ; max abs diff
|
||||
|
||||
|
||||
movdqa xmm1, xmm3
|
||||
psadbw xmm3, xmm0
|
||||
paddd sad_reg, xmm3 ; sad
|
||||
@ -312,7 +312,7 @@ BITS 32
|
||||
pmaddwd xmm3, xmm3
|
||||
paddd sqdiff_reg, xmm1
|
||||
paddd sqdiff_reg, xmm3 ; sqdiff
|
||||
|
||||
|
||||
add esi, ebx
|
||||
add edi, ebx
|
||||
%endmacro
|
||||
@ -351,7 +351,7 @@ rc_sad_frame_sse2:
|
||||
mov ebx, [esp+32]
|
||||
mov ecx, [esp+36]
|
||||
mov edx, [esp+40]
|
||||
pxor xmm0, xmm0
|
||||
pxor xmm0, xmm0
|
||||
.hloop:
|
||||
mov eax, ebx
|
||||
mov ebp, $0
|
||||
@ -361,7 +361,7 @@ rc_sad_frame_sse2:
|
||||
psadbw xmm1, xmm2
|
||||
pshufd xmm2, xmm1, 0f6h ; 11110110 B ; movhlps for float
|
||||
paddd xmm1, xmm2
|
||||
paddd xmm0, xmm1
|
||||
paddd xmm0, xmm1
|
||||
add ebp, 010h
|
||||
dec eax
|
||||
jnz near .wloop
|
||||
@ -384,20 +384,20 @@ WELS_EXTERN SampleVariance16x16_sse2
|
||||
; void SampleVariance16x16_sse2( uint8_t * y_ref, int32_t y_ref_stride, uint8_t * y_src, int32_t y_src_stride,SMotionTextureUnit* pMotionTexture );
|
||||
;***********************************************************************
|
||||
ALIGN 16
|
||||
SampleVariance16x16_sse2:
|
||||
SampleVariance16x16_sse2:
|
||||
push esi
|
||||
push edi
|
||||
push ebx
|
||||
|
||||
|
||||
sub esp, 16
|
||||
%define SUM [esp]
|
||||
%define SUM_CUR [esp+4]
|
||||
%define SQR [esp+8]
|
||||
%define SQR_CUR [esp+12]
|
||||
%define PUSH_SIZE 28 ; 12 + 16
|
||||
%define PUSH_SIZE 28 ; 12 + 16
|
||||
|
||||
mov edi, [esp+PUSH_SIZE+4] ; y_ref
|
||||
mov edx, [esp+PUSH_SIZE+8] ; y_ref_stride
|
||||
mov edx, [esp+PUSH_SIZE+8] ; y_ref_stride
|
||||
mov esi, [esp+PUSH_SIZE+12] ; y_src
|
||||
mov eax, [esp+PUSH_SIZE+16] ; y_src_stride
|
||||
mov ecx, 010h ; height = 16
|
||||
@ -422,7 +422,7 @@ SampleVariance16x16_sse2:
|
||||
; sqr += diff * diff;
|
||||
pmaxub xmm0, xmm1
|
||||
pminub xmm1, xmm2
|
||||
psubb xmm0, xmm1 ; diff
|
||||
psubb xmm0, xmm1 ; diff
|
||||
SUM_SQR_SSE2 xmm1, xmm0, xmm7 ; dst, pSrc, zero
|
||||
movd ebx, xmm1
|
||||
add SQR, ebx
|
||||
@ -433,7 +433,7 @@ SampleVariance16x16_sse2:
|
||||
punpcklbw xmm0, xmm7
|
||||
punpckhbw xmm1, xmm7
|
||||
paddw xmm0, xmm1 ; 8x2
|
||||
SUM_WORD_8x2_SSE2 xmm0, xmm1
|
||||
SUM_WORD_8x2_SSE2 xmm0, xmm1
|
||||
movd ebx, xmm0
|
||||
and ebx, 0ffffh
|
||||
add SUM_CUR, ebx
|
||||
@ -442,12 +442,12 @@ SampleVariance16x16_sse2:
|
||||
SUM_SQR_SSE2 xmm0, xmm3, xmm7 ; dst, pSrc, zero
|
||||
movd ebx, xmm0
|
||||
add SQR_CUR, ebx
|
||||
|
||||
|
||||
lea edi, [edi+edx]
|
||||
lea esi, [esi+eax]
|
||||
dec ecx
|
||||
jnz near .hloops
|
||||
|
||||
|
||||
mov ebx, 0
|
||||
mov bx, word SUM
|
||||
sar ebx, 8
|
||||
@ -465,17 +465,17 @@ SampleVariance16x16_sse2:
|
||||
sar ecx, 8
|
||||
sub ecx, ebx
|
||||
mov [edi+2], cx ; to store uiTextureIndex
|
||||
|
||||
|
||||
%undef SUM
|
||||
%undef SUM_CUR
|
||||
%undef SQR
|
||||
%undef SQR_CUR
|
||||
%undef PUSH_SIZE
|
||||
|
||||
add esp, 16
|
||||
add esp, 16
|
||||
pop ebx
|
||||
pop edi
|
||||
pop esi
|
||||
pop esi
|
||||
|
||||
ret
|
||||
|
||||
@ -497,7 +497,7 @@ AnalysisVaaInfoIntra_sse2:
|
||||
mov ebp, esp
|
||||
and ebp, 0fh
|
||||
sub esp, ebp
|
||||
sub esp, 32
|
||||
sub esp, 32
|
||||
%define PUSH_SIZE 52 ; 20 + 32
|
||||
|
||||
mov esi, [esp+ebp+PUSH_SIZE+4] ; data_y
|
||||
@ -509,31 +509,31 @@ AnalysisVaaInfoIntra_sse2:
|
||||
add edx, ecx ; linesize x 3 [edx]
|
||||
mov eax, ebx
|
||||
sal eax, $1 ; linesize x 4 [eax]
|
||||
|
||||
|
||||
pxor xmm7, xmm7
|
||||
|
||||
|
||||
; loops
|
||||
VAA_AVG_BLOCK_SSE2 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5
|
||||
movq [esp], xmm0
|
||||
movq [esp], xmm0
|
||||
|
||||
lea esi, [esi+eax]
|
||||
VAA_AVG_BLOCK_SSE2 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5
|
||||
movq [esp+8], xmm0
|
||||
movq [esp+8], xmm0
|
||||
|
||||
lea esi, [esi+eax]
|
||||
VAA_AVG_BLOCK_SSE2 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5
|
||||
movq [esp+16], xmm0
|
||||
movq [esp+16], xmm0
|
||||
|
||||
lea esi, [esi+eax]
|
||||
VAA_AVG_BLOCK_SSE2 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5
|
||||
movq [esp+24], xmm0
|
||||
|
||||
|
||||
movdqa xmm0, [esp] ; block 0~7
|
||||
movdqa xmm1, [esp+16] ; block 8~15
|
||||
movdqa xmm2, xmm0
|
||||
paddw xmm0, xmm1
|
||||
SUM_WORD_8x2_SSE2 xmm0, xmm3
|
||||
|
||||
|
||||
pmullw xmm1, xmm1
|
||||
pmullw xmm2, xmm2
|
||||
movdqa xmm3, xmm1
|
||||
@ -549,7 +549,7 @@ AnalysisVaaInfoIntra_sse2:
|
||||
paddd xmm1, xmm2
|
||||
pshufd xmm2, xmm1, 0B1h
|
||||
paddd xmm1, xmm2
|
||||
|
||||
|
||||
movd ebx, xmm0
|
||||
and ebx, 0ffffh ; effective low word truncated
|
||||
mov ecx, ebx
|
||||
@ -557,7 +557,7 @@ AnalysisVaaInfoIntra_sse2:
|
||||
sar ebx, $4
|
||||
movd eax, xmm1
|
||||
sub eax, ebx
|
||||
|
||||
|
||||
%undef PUSH_SIZE
|
||||
add esp, 32
|
||||
add esp, ebp
|
||||
@ -567,7 +567,7 @@ AnalysisVaaInfoIntra_sse2:
|
||||
pop edx
|
||||
pop ebx
|
||||
ret
|
||||
|
||||
|
||||
WELS_EXTERN AnalysisVaaInfoIntra_ssse3
|
||||
;***********************************************************************
|
||||
; int32_t AnalysisVaaInfoIntra_ssse3( uint8_t *pDataY, const int32_t linesize );
|
||||
@ -583,7 +583,7 @@ AnalysisVaaInfoIntra_ssse3:
|
||||
mov ebp, esp
|
||||
and ebp, 0fh
|
||||
sub esp, ebp
|
||||
sub esp, 32
|
||||
sub esp, 32
|
||||
%define PUSH_SIZE 52 ; 20 + 32
|
||||
|
||||
mov esi, [esp+ebp+PUSH_SIZE+4] ; data_y
|
||||
@ -595,25 +595,25 @@ AnalysisVaaInfoIntra_ssse3:
|
||||
add edx, ecx ; linesize x 3 [edx]
|
||||
mov eax, ebx
|
||||
sal eax, $1 ; linesize x 4 [eax]
|
||||
|
||||
|
||||
pxor xmm7, xmm7
|
||||
|
||||
|
||||
; loops
|
||||
VAA_AVG_BLOCK_SSSE3 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5
|
||||
movq [esp], xmm0
|
||||
movq [esp], xmm0
|
||||
|
||||
lea esi, [esi+eax]
|
||||
VAA_AVG_BLOCK_SSSE3 xmm1, xmm2, xmm3, xmm4, xmm5, xmm6
|
||||
movq [esp+8], xmm1
|
||||
movq [esp+8], xmm1
|
||||
|
||||
lea esi, [esi+eax]
|
||||
VAA_AVG_BLOCK_SSSE3 xmm0, xmm1, xmm2, xmm3, xmm4, xmm5
|
||||
movq [esp+16], xmm0
|
||||
movq [esp+16], xmm0
|
||||
|
||||
lea esi, [esi+eax]
|
||||
VAA_AVG_BLOCK_SSSE3 xmm1, xmm2, xmm3, xmm4, xmm5, xmm6
|
||||
movq [esp+24], xmm1
|
||||
|
||||
|
||||
movdqa xmm0, [esp] ; block 0~7
|
||||
movdqa xmm1, [esp+16] ; block 8~15
|
||||
movdqa xmm2, xmm0
|
||||
@ -635,7 +635,7 @@ AnalysisVaaInfoIntra_ssse3:
|
||||
paddd xmm1, xmm2
|
||||
pshufd xmm2, xmm1, 0B1h
|
||||
paddd xmm1, xmm2
|
||||
|
||||
|
||||
movd ebx, xmm0
|
||||
and ebx, 0ffffh ; effective low word truncated
|
||||
mov ecx, ebx
|
||||
@ -643,7 +643,7 @@ AnalysisVaaInfoIntra_ssse3:
|
||||
sar ebx, $4
|
||||
movd eax, xmm1
|
||||
sub eax, ebx
|
||||
|
||||
|
||||
%undef PUSH_SIZE
|
||||
add esp, 32
|
||||
add esp, ebp
|
||||
@ -654,12 +654,12 @@ AnalysisVaaInfoIntra_ssse3:
|
||||
pop ebx
|
||||
ret
|
||||
%endif
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
WELS_EXTERN abs_difference_mbrow_sse2
|
||||
;*************************************************************************************************************
|
||||
;void abs_difference_mbrow_sse2( uint8_t *ref_orig, uint8_t *cur_orig, int32_t iPicStride,
|
||||
;void abs_difference_mbrow_sse2( uint8_t *ref_orig, uint8_t *cur_orig, int32_t iPicStride,
|
||||
; int32_t gom_pixel_num, int32_t *pSum)
|
||||
;*************************************************************************************************************
|
||||
ALIGN 16
|
||||
@ -691,13 +691,13 @@ gom_row_loop_p:
|
||||
add edi, 16
|
||||
cmp esi, edx
|
||||
jl gom_row_loop_p
|
||||
|
||||
|
||||
sub esi, eax
|
||||
sub edi, eax
|
||||
add esi, ebx
|
||||
add edi, ebx
|
||||
loop mb_width_loop_p
|
||||
|
||||
|
||||
movdqa xmm1, xmm0
|
||||
psrldq xmm1, 8
|
||||
paddd xmm1, xmm0
|
||||
@ -710,7 +710,7 @@ gom_row_loop_p:
|
||||
%undef iPicStride
|
||||
%undef gom_pixel_num
|
||||
%undef pSum
|
||||
%undef pushsize
|
||||
%undef pushsize
|
||||
pop ebx
|
||||
pop edi
|
||||
pop esi
|
||||
@ -721,7 +721,7 @@ gom_row_loop_p:
|
||||
|
||||
WELS_EXTERN sum_sqrsum_mbrow_sse2
|
||||
;*************************************************************************************************************
|
||||
;void sum_sqrsum_mbrow_sse2( uint8_t *cur_orig, int32_t iPicStride,
|
||||
;void sum_sqrsum_mbrow_sse2( uint8_t *cur_orig, int32_t iPicStride,
|
||||
; int32_t gom_pixel_num, int32_t *pSum, int32_t *pSqrSum)
|
||||
;*************************************************************************************************************
|
||||
ALIGN 16
|
||||
@ -759,18 +759,18 @@ gom_row_loop_i:
|
||||
add esi, 16
|
||||
cmp esi, edx
|
||||
jl gom_row_loop_i
|
||||
|
||||
|
||||
sub esi, eax
|
||||
add esi, ebx
|
||||
loop mb_width_loop_i
|
||||
|
||||
|
||||
movdqa xmm3, xmm1
|
||||
psrldq xmm3, 8
|
||||
paddd xmm1, xmm3
|
||||
movd eax, xmm1
|
||||
mov edx, [pSum]
|
||||
add [edx], eax
|
||||
|
||||
|
||||
movdqa xmm3, xmm2
|
||||
psrldq xmm3, 8
|
||||
paddd xmm2, xmm3
|
||||
@ -787,7 +787,7 @@ gom_row_loop_i:
|
||||
%undef gom_pixel_num
|
||||
%undef pSum
|
||||
%undef pSqrSum
|
||||
%undef pushsize
|
||||
%undef pushsize
|
||||
pop ebx
|
||||
pop esi
|
||||
ret
|
||||
@ -819,7 +819,7 @@ VAACalcSad_sse2:
|
||||
mov ebx, [iPicStride]
|
||||
mov edx, [psad8x8]
|
||||
mov eax, ebx
|
||||
|
||||
|
||||
shr dword [iPicWidth], 4 ; iPicWidth/16
|
||||
shr dword [iPicHeight], 4 ; iPicHeight/16
|
||||
shl eax, 4 ; iPicStride*16
|
||||
@ -839,7 +839,7 @@ width_loop:
|
||||
movd [edx], xmm6
|
||||
psrldq xmm6, 8
|
||||
movd [edx+4], xmm6
|
||||
|
||||
|
||||
pxor xmm6, xmm6
|
||||
WELS_SAD_16x2_SSE2
|
||||
WELS_SAD_16x2_SSE2
|
||||
@ -849,24 +849,24 @@ width_loop:
|
||||
movd [edx+8], xmm6
|
||||
psrldq xmm6, 8
|
||||
movd [edx+12], xmm6
|
||||
|
||||
|
||||
add edx, 16
|
||||
sub esi, eax
|
||||
sub edi, eax
|
||||
add esi, 16
|
||||
add edi, 16
|
||||
|
||||
|
||||
dec ecx
|
||||
jnz width_loop
|
||||
|
||||
|
||||
pop edi
|
||||
pop esi
|
||||
add esi, eax
|
||||
add edi, eax
|
||||
|
||||
|
||||
dec dword [iPicHeight]
|
||||
jnz height_loop
|
||||
|
||||
|
||||
mov edx, [psadframe]
|
||||
movdqa xmm5, xmm7
|
||||
psrldq xmm7, 8
|
||||
@ -880,16 +880,16 @@ width_loop:
|
||||
%undef iPicStride
|
||||
%undef psadframe
|
||||
%undef psad8x8
|
||||
%undef pushsize
|
||||
%undef pushsize
|
||||
pop ebx
|
||||
pop edi
|
||||
pop esi
|
||||
ret
|
||||
|
||||
|
||||
|
||||
|
||||
WELS_EXTERN VAACalcSadVar_sse2
|
||||
;*************************************************************************************************************
|
||||
;void VAACalcSadVar_sse2( uint8_t *cur_data, uint8_t *ref_data, int32_t iPicWidth, int32_t iPicHeight,
|
||||
;void VAACalcSadVar_sse2( uint8_t *cur_data, uint8_t *ref_data, int32_t iPicWidth, int32_t iPicHeight,
|
||||
; int32_t iPicStride, int32_t *psadframe, int32_t *psad8x8, int32_t *psum16x16, int32_t *psqsum16x16)
|
||||
;*************************************************************************************************************
|
||||
|
||||
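The `VAACalcSadVar_sse2` prototype in the comment above can be read against a scalar C model such as the one below. This is a sketch under assumptions: the function name is hypothetical, the ordering of the four 8x8 SADs per macroblock (top-left, top-right, bottom-left, bottom-right) is inferred from the `[edx]`, `[edx+4]`, `[edx+8]`, `[edx+12]` stores, and it assumes that the sum and squared sum are taken over `cur_data`, which the visible hunks do not confirm.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Per-macroblock SAD, sum and squared sum over a frame, one 16x16 block at a time. */
static void vaa_calc_sad_var_ref(const uint8_t *cur, const uint8_t *ref,
                                 int32_t width, int32_t height, int32_t stride,
                                 int32_t *sad_frame, int32_t *sad8x8,
                                 int32_t *sum16x16, int32_t *sqsum16x16)
{
    assert(width % 16 == 0 && height % 16 == 0 && stride >= width);
    const int mb_w = width / 16, mb_h = height / 16;
    int32_t frame = 0;

    for (int my = 0; my < mb_h; ++my) {
        for (int mx = 0; mx < mb_w; ++mx) {
            const int mb = my * mb_w + mx;
            int32_t *sad = sad8x8 + mb * 4;     /* four 8x8 SADs per macroblock */
            int32_t sum = 0, sqsum = 0;
            sad[0] = sad[1] = sad[2] = sad[3] = 0;

            for (int y = 0; y < 16; ++y) {
                for (int x = 0; x < 16; ++x) {
                    const int off = (my * 16 + y) * stride + mx * 16 + x;
                    const int c = cur[off], r = ref[off];
                    sad[(y / 8) * 2 + x / 8] += abs(c - r);
                    sum += c;                   /* assumed: statistics of the current picture */
                    sqsum += c * c;
                }
            }
            sum16x16[mb] = sum;
            sqsum16x16[mb] = sqsum;
            frame += sad[0] + sad[1] + sad[2] + sad[3];
        }
    }
    *sad_frame = frame;
}
```

Like the assembly, which shifts `iPicWidth` and `iPicHeight` right by 4, the model only covers whole 16x16 macroblocks.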
@ -919,7 +919,7 @@ VAACalcSadVar_sse2:
|
||||
mov ebx, [iPicStride]
|
||||
mov edx, [psad8x8]
|
||||
mov eax, ebx
|
||||
|
||||
|
||||
shr dword [iPicWidth], 4 ; iPicWidth/16
|
||||
shr dword [iPicHeight], 4 ; iPicHeight/16
|
||||
shl eax, 4 ; iPicStride*16
|
||||
@ -945,7 +945,7 @@ var_width_loop:
|
||||
movd [edx], xmm6
|
||||
psrldq xmm6, 8
|
||||
movd [edx+4], xmm6
|
||||
|
||||
|
||||
pxor xmm6, xmm6
|
||||
WELS_SAD_SUM_SQSUM_16x1_SSE2
|
||||
WELS_SAD_SUM_SQSUM_16x1_SSE2
|
||||
@ -959,49 +959,49 @@ var_width_loop:
|
||||
movd [edx+8], xmm6
|
||||
psrldq xmm6, 8
|
||||
movd [edx+12], xmm6
|
||||
|
||||
|
||||
mov ebp, [psum16x16]
|
||||
movdqa xmm1, xmm5
|
||||
psrldq xmm1, 8
|
||||
paddd xmm5, xmm1
|
||||
movd [ebp], xmm5
|
||||
add dword [psum16x16], 4
|
||||
|
||||
|
||||
movdqa xmm5, xmm4
|
||||
psrldq xmm5, 8
|
||||
paddd xmm4, xmm5
|
||||
movdqa xmm3, xmm4
|
||||
psrldq xmm3, 4
|
||||
paddd xmm4, xmm3
|
||||
|
||||
|
||||
mov ebp, [psqsum16x16]
|
||||
movd [ebp], xmm4
|
||||
add dword [psqsum16x16], 4
|
||||
|
||||
|
||||
add edx, 16
|
||||
sub esi, eax
|
||||
sub edi, eax
|
||||
add esi, 16
|
||||
add edi, 16
|
||||
|
||||
|
||||
dec ecx
|
||||
jnz var_width_loop
|
||||
|
||||
|
||||
mov esi, [tmp_esi]
|
||||
mov edi, [tmp_edi]
|
||||
add esi, eax
|
||||
add edi, eax
|
||||
|
||||
|
||||
dec dword [iPicHeight]
|
||||
jnz var_height_loop
|
||||
|
||||
|
||||
mov edx, [psadframe]
|
||||
movdqa xmm5, xmm7
|
||||
psrldq xmm7, 8
|
||||
paddd xmm7, xmm5
|
||||
movd [edx], xmm7
|
||||
|
||||
add esp, localsize
|
||||
add esp, localsize
|
||||
pop ebx
|
||||
pop edi
|
||||
pop esi
|
||||
@ -1020,12 +1020,12 @@ var_width_loop:
|
||||
%undef pushsize
|
||||
%undef localsize
|
||||
ret
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
WELS_EXTERN VAACalcSadSsd_sse2
|
||||
;*************************************************************************************************************
|
||||
;void VAACalcSadSsd_sse2(uint8_t *cur_data, uint8_t *ref_data, int32_t iPicWidth, int32_t iPicHeight,
|
||||
;void VAACalcSadSsd_sse2(uint8_t *cur_data, uint8_t *ref_data, int32_t iPicWidth, int32_t iPicHeight,
|
||||
; int32_t iPicStride,int32_t *psadframe, int32_t *psad8x8, int32_t *psum16x16, int32_t *psqsum16x16, int32_t *psqdiff16x16)
|
||||
;*************************************************************************************************************
|
||||
|
||||
@ -1059,7 +1059,7 @@ VAACalcSadSsd_sse2:
|
||||
mov ebx, [iPicStride]
|
||||
mov edx, [psad8x8]
|
||||
mov eax, ebx
|
||||
|
||||
|
||||
shr dword [iPicWidth], 4 ; iPicWidth/16
|
||||
shr dword [iPicHeight], 4 ; iPicHeight/16
|
||||
shl eax, 4 ; iPicStride*16
|
||||
@ -1091,7 +1091,7 @@ sqdiff_width_loop:
|
||||
movd [edx+4], xmm7
|
||||
movd ebp, xmm1
|
||||
add [tmp_sadframe], ebp
|
||||
|
||||
|
||||
pxor xmm7, xmm7
|
||||
WELS_SAD_SUM_SQSUM_SQDIFF_16x1_SSE2
|
||||
WELS_SAD_SUM_SQSUM_SQDIFF_16x1_SSE2
|
||||
@ -1108,14 +1108,14 @@ sqdiff_width_loop:
|
||||
movd [edx+12], xmm7
|
||||
movd ebp, xmm1
|
||||
add [tmp_sadframe], ebp
|
||||
|
||||
|
||||
mov ebp, [psum16x16]
|
||||
movdqa xmm1, xmm6
|
||||
psrldq xmm1, 8
|
||||
paddd xmm6, xmm1
|
||||
movd [ebp], xmm6
|
||||
add dword [psum16x16], 4
|
||||
|
||||
|
||||
mov ebp, [psqsum16x16]
|
||||
pshufd xmm6, xmm5, 14 ;00001110
|
||||
paddd xmm6, xmm5
|
||||
@ -1123,7 +1123,7 @@ sqdiff_width_loop:
|
||||
paddd xmm5, xmm6
|
||||
movd [ebp], xmm5
|
||||
add dword [psqsum16x16], 4
|
||||
|
||||
|
||||
mov ebp, [psqdiff16x16]
|
||||
pshufd xmm5, xmm4, 14 ; 00001110
|
||||
paddd xmm5, xmm4
|
||||
@ -1131,29 +1131,29 @@ sqdiff_width_loop:
|
||||
paddd xmm4, xmm5
|
||||
movd [ebp], xmm4
|
||||
add dword [psqdiff16x16], 4
|
||||
|
||||
|
||||
add edx, 16
|
||||
sub esi, eax
|
||||
sub edi, eax
|
||||
add esi, 16
|
||||
add edi, 16
|
||||
|
||||
|
||||
dec ecx
|
||||
jnz sqdiff_width_loop
|
||||
|
||||
|
||||
mov esi, [tmp_esi]
|
||||
mov edi, [tmp_edi]
|
||||
add esi, eax
|
||||
add edi, eax
|
||||
|
||||
|
||||
dec dword [iPicHeight]
|
||||
jnz sqdiff_height_loop
|
||||
|
||||
|
||||
mov ebx, [tmp_sadframe]
|
||||
mov eax, [psadframe]
|
||||
mov [eax], ebx
|
||||
|
||||
add esp, localsize
|
||||
add esp, localsize
|
||||
pop ebx
|
||||
pop edi
|
||||
pop esi
|
||||
@ -1174,14 +1174,14 @@ sqdiff_width_loop:
|
||||
%undef pushsize
|
||||
%undef localsize
|
||||
ret
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
|
||||
WELS_EXTERN VAACalcSadBgd_sse2
|
||||
;*************************************************************************************************************
|
||||
;void VAACalcSadBgd_sse2(uint8_t *cur_data, uint8_t *ref_data, int32_t iPicWidth, int32_t iPicHeight,
|
||||
;void VAACalcSadBgd_sse2(uint8_t *cur_data, uint8_t *ref_data, int32_t iPicWidth, int32_t iPicHeight,
|
||||
; int32_t iPicStride, int32_t *psadframe, int32_t *psad8x8, int32_t *p_sd8x8, uint8_t *p_mad8x8)
|
||||
;*************************************************************************************************************
|
||||
|
||||
@ -1211,7 +1211,7 @@ VAACalcSadBgd_sse2:
|
||||
mov edi, [ref_data]
|
||||
mov ebx, [iPicStride]
|
||||
mov eax, ebx
|
||||
|
||||
|
||||
shr dword [iPicWidth], 4 ; iPicWidth/16
|
||||
shr dword [iPicHeight], 4 ; iPicHeight/16
|
||||
shl eax, 4 ; iPicStride*16
|
||||
@ -1234,11 +1234,11 @@ bgd_width_loop:
|
||||
WELS_SAD_SD_MAD_16x1_SSE2 xmm7, xmm6, xmm5, xmm4
|
||||
WELS_SAD_SD_MAD_16x1_SSE2 xmm7, xmm6, xmm5, xmm4
|
||||
WELS_SAD_SD_MAD_16x1_SSE2 xmm7, xmm6, xmm5, xmm4
|
||||
|
||||
|
||||
|
||||
|
||||
mov edx, [p_mad8x8]
|
||||
WELS_MAX_REG_SSE2 xmm4
|
||||
|
||||
|
||||
;movdqa xmm1, xmm4
|
||||
;punpcklbw xmm1, xmm0
|
||||
;punpcklwd xmm1, xmm0
|
||||
@ -1247,7 +1247,7 @@ bgd_width_loop:
|
||||
;punpcklwd xmm4, xmm0
|
||||
;movd [edx+4], xmm4
|
||||
;add edx, 8
|
||||
;mov [p_mad8x8], edx
|
||||
;mov [p_mad8x8], edx
|
||||
mov [tmp_ecx], ecx
|
||||
movhlps xmm1, xmm4
|
||||
movd ecx, xmm4
|
||||
@ -1257,12 +1257,12 @@ bgd_width_loop:
|
||||
add edx, 2
|
||||
mov [p_mad8x8], edx
|
||||
|
||||
|
||||
|
||||
pslldq xmm7, 4
|
||||
pslldq xmm6, 4
|
||||
pslldq xmm5, 4
|
||||
|
||||
|
||||
|
||||
|
||||
pxor xmm4, xmm4 ; pMad8x8
|
||||
WELS_SAD_SD_MAD_16x1_SSE2 xmm7, xmm6, xmm5, xmm4
|
||||
WELS_SAD_SD_MAD_16x1_SSE2 xmm7, xmm6, xmm5, xmm4
|
||||
@ -1272,10 +1272,10 @@ bgd_width_loop:
|
||||
WELS_SAD_SD_MAD_16x1_SSE2 xmm7, xmm6, xmm5, xmm4
|
||||
WELS_SAD_SD_MAD_16x1_SSE2 xmm7, xmm6, xmm5, xmm4
|
||||
WELS_SAD_SD_MAD_16x1_SSE2 xmm7, xmm6, xmm5, xmm4
|
||||
|
||||
|
||||
mov edx, [p_mad8x8]
|
||||
WELS_MAX_REG_SSE2 xmm4
|
||||
|
||||
|
||||
;movdqa xmm1, xmm4
|
||||
;punpcklbw xmm1, xmm0
|
||||
;punpcklwd xmm1, xmm0
|
||||
@ -1284,7 +1284,7 @@ bgd_width_loop:
|
||||
;punpcklwd xmm4, xmm0
|
||||
;movd [edx+4], xmm4
|
||||
;add edx, 8
|
||||
;mov [p_mad8x8], edx
|
||||
;mov [p_mad8x8], edx
|
||||
movhlps xmm1, xmm4
|
||||
movd ecx, xmm4
|
||||
mov [edx], cl
|
||||
@ -1292,51 +1292,51 @@ bgd_width_loop:
|
||||
mov [edx+1],cl
|
||||
add edx, 2
|
||||
mov [p_mad8x8], edx
|
||||
|
||||
|
||||
; data in xmm7, xmm6, xmm5: D1 D3 D0 D2
|
||||
|
||||
|
||||
mov edx, [psad8x8]
|
||||
pshufd xmm1, xmm7, 10001101b ; D3 D2 D1 D0
|
||||
movdqa [edx], xmm1
|
||||
movdqa [edx], xmm1
|
||||
add edx, 16
|
||||
mov [psad8x8], edx ; sad8x8
|
||||
|
||||
|
||||
paddd xmm1, xmm7 ; D1+3 D3+2 D0+1 D2+0
|
||||
pshufd xmm2, xmm1, 00000011b
|
||||
paddd xmm1, xmm2
|
||||
movd edx, xmm1
|
||||
add ebp, edx ; sad frame
|
||||
|
||||
|
||||
mov edx, [p_sd8x8]
|
||||
psubd xmm6, xmm5
|
||||
pshufd xmm1, xmm6, 10001101b
|
||||
movdqa [edx], xmm1
|
||||
add edx, 16
|
||||
mov [p_sd8x8], edx
|
||||
|
||||
|
||||
|
||||
|
||||
add edx, 16
|
||||
sub esi, eax
|
||||
sub edi, eax
|
||||
add esi, 16
|
||||
add edi, 16
|
||||
|
||||
|
||||
mov ecx, [tmp_ecx]
|
||||
dec ecx
|
||||
jnz bgd_width_loop
|
||||
|
||||
|
||||
mov esi, [tmp_esi]
|
||||
mov edi, [tmp_edi]
|
||||
add esi, eax
|
||||
add edi, eax
|
||||
|
||||
|
||||
dec dword [iPicHeight]
|
||||
jnz bgd_height_loop
|
||||
|
||||
|
||||
mov edx, [psadframe]
|
||||
mov [edx], ebp
|
||||
|
||||
add esp, localsize
|
||||
add esp, localsize
|
||||
pop ebx
|
||||
pop edi
|
||||
pop esi
|
||||
@ -1360,8 +1360,8 @@ bgd_width_loop:
|
||||
|
||||
WELS_EXTERN VAACalcSadSsdBgd_sse2
|
||||
;*************************************************************************************************************
|
||||
;void VAACalcSadSsdBgd_sse2(uint8_t *cur_data, uint8_t *ref_data, int32_t iPicWidth, int32_t iPicHeight,
|
||||
; int32_t iPicStride, int32_t *psadframe, int32_t *psad8x8, int32_t *psum16x16, int32_t *psqsum16x16,
|
||||
;void VAACalcSadSsdBgd_sse2(uint8_t *cur_data, uint8_t *ref_data, int32_t iPicWidth, int32_t iPicHeight,
|
||||
; int32_t iPicStride, int32_t *psadframe, int32_t *psad8x8, int32_t *psum16x16, int32_t *psqsum16x16,
|
||||
; int32_t *psqdiff16x16, int32_t *p_sd8x8, uint8_t *p_mad8x8)
|
||||
;*************************************************************************************************************
|
||||
|
||||
@ -1395,7 +1395,7 @@ VAACalcSadSsdBgd_sse2:
|
||||
mov edi, [ref_data]
|
||||
mov ebx, [iPicStride]
|
||||
mov eax, ebx
|
||||
|
||||
|
||||
shr dword [iPicWidth], 4 ; iPicWidth/16
|
||||
shr dword [iPicHeight], 4 ; iPicHeight/16
|
||||
shl eax, 4 ; iPicStride*16
|
||||
@ -1418,7 +1418,7 @@ sqdiff_bgd_width_loop:
|
||||
WELS_SAD_BGD_SQDIFF_16x1_SSE2 xmm7, xmm6, xmm5, xmm4
|
||||
WELS_SAD_BGD_SQDIFF_16x1_SSE2 xmm7, xmm6, xmm5, xmm4
|
||||
WELS_SAD_BGD_SQDIFF_16x1_SSE2 xmm7, xmm6, xmm5, xmm4
|
||||
|
||||
|
||||
mov edx, [psad8x8]
|
||||
movdqa xmm2, xmm7
|
||||
pshufd xmm1, xmm2, 00001110b
|
||||
@ -1426,17 +1426,17 @@ sqdiff_bgd_width_loop:
|
||||
movd [edx+4], xmm1
|
||||
add edx, 8
|
||||
mov [psad8x8], edx ; sad8x8
|
||||
|
||||
|
||||
paddd xmm1, xmm2
|
||||
movd edx, xmm1
|
||||
add [tmp_sadframe], edx ; iFrameSad
|
||||
|
||||
|
||||
mov edx, [psum16x16]
|
||||
movdqa xmm1, xmm6
|
||||
pshufd xmm2, xmm1, 00001110b
|
||||
paddd xmm1, xmm2
|
||||
movd [edx], xmm1 ; sum
|
||||
|
||||
|
||||
mov edx, [p_sd8x8]
|
||||
pshufd xmm1, xmm6, 11110101b ; Sref1 Sref1 Sref0 Sref0
|
||||
psubd xmm6, xmm1 ; 00 diff1 00 diff0
|
||||
@ -1444,7 +1444,7 @@ sqdiff_bgd_width_loop:
|
||||
movq [edx], xmm1
|
||||
add edx, 8
|
||||
mov [p_sd8x8], edx
|
||||
|
||||
|
||||
mov edx, [p_mad8x8]
|
||||
WELS_MAX_REG_SSE2 xmm5
|
||||
;movdqa xmm1, xmm5
|
||||
@ -1464,7 +1464,7 @@ sqdiff_bgd_width_loop:
|
||||
mov [edx+1],cl
|
||||
add edx, 2
|
||||
mov [p_mad8x8], edx
|
||||
|
||||
|
||||
psrlq xmm7, 32
|
||||
psllq xmm7, 32 ; clear sad
|
||||
pxor xmm6, xmm6 ; sum_8x8 interleaves cur and pRef in Dword, Sref1 Scur1 Sref0 Scur0
|
||||
@ -1477,7 +1477,7 @@ sqdiff_bgd_width_loop:
|
||||
WELS_SAD_BGD_SQDIFF_16x1_SSE2 xmm7, xmm6, xmm5, xmm4
|
||||
WELS_SAD_BGD_SQDIFF_16x1_SSE2 xmm7, xmm6, xmm5, xmm4
|
||||
WELS_SAD_BGD_SQDIFF_16x1_SSE2 xmm7, xmm6, xmm5, xmm4
|
||||
|
||||
|
||||
mov edx, [psad8x8]
|
||||
movdqa xmm2, xmm7
|
||||
pshufd xmm1, xmm2, 00001110b
|
||||
@ -1485,11 +1485,11 @@ sqdiff_bgd_width_loop:
|
||||
movd [edx+4], xmm1
|
||||
add edx, 8
|
||||
mov [psad8x8], edx ; sad8x8
|
||||
|
||||
|
||||
paddd xmm1, xmm2
|
||||
movd edx, xmm1
|
||||
add [tmp_sadframe], edx ; iFrameSad
|
||||
|
||||
|
||||
mov edx, [psum16x16]
|
||||
movdqa xmm1, xmm6
|
||||
pshufd xmm2, xmm1, 00001110b
|
||||
@ -1498,7 +1498,7 @@ sqdiff_bgd_width_loop:
|
||||
add [edx], ebp
|
||||
add edx, 4
|
||||
mov [psum16x16], edx
|
||||
|
||||
|
||||
mov edx, [psqsum16x16]
|
||||
psrlq xmm7, 32
|
||||
pshufd xmm2, xmm7, 00001110b
|
||||
@ -1506,7 +1506,7 @@ sqdiff_bgd_width_loop:
|
||||
movd [edx], xmm2 ; sqsum
|
||||
add edx, 4
|
||||
mov [psqsum16x16], edx
|
||||
|
||||
|
||||
mov edx, [p_sd8x8]
|
||||
pshufd xmm1, xmm6, 11110101b ; Sref1 Sref1 Sref0 Sref0
|
||||
psubd xmm6, xmm1 ; 00 diff1 00 diff0
|
||||
@ -1514,7 +1514,7 @@ sqdiff_bgd_width_loop:
|
||||
movq [edx], xmm1
|
||||
add edx, 8
|
||||
mov [p_sd8x8], edx
|
||||
|
||||
|
||||
mov edx, [p_mad8x8]
|
||||
WELS_MAX_REG_SSE2 xmm5
|
||||
;movdqa xmm1, xmm5
|
||||
@ -1525,7 +1525,7 @@ sqdiff_bgd_width_loop:
|
||||
;punpcklwd xmm5, xmm0
|
||||
;movd [edx+4], xmm5
|
||||
;add edx, 8
|
||||
;mov [p_mad8x8], edx
|
||||
;mov [p_mad8x8], edx
|
||||
movhlps xmm1, xmm5
|
||||
movd ecx, xmm5
|
||||
mov [edx], cl
|
||||
@ -1533,7 +1533,7 @@ sqdiff_bgd_width_loop:
|
||||
mov [edx+1],cl
|
||||
add edx, 2
|
||||
mov [p_mad8x8], edx
|
||||
|
||||
|
||||
mov edx, [psqdiff16x16]
|
||||
pshufd xmm1, xmm4, 00001110b
|
||||
paddd xmm4, xmm1
|
||||
@ -1542,30 +1542,30 @@ sqdiff_bgd_width_loop:
|
||||
movd [edx], xmm4
|
||||
add edx, 4
|
||||
mov [psqdiff16x16], edx
|
||||
|
||||
|
||||
add edx, 16
|
||||
sub esi, eax
|
||||
sub edi, eax
|
||||
add esi, 16
|
||||
add edi, 16
|
||||
|
||||
|
||||
mov ecx, [tmp_ecx]
|
||||
dec ecx
|
||||
jnz sqdiff_bgd_width_loop
|
||||
|
||||
|
||||
mov esi, [tmp_esi]
|
||||
mov edi, [tmp_edi]
|
||||
add esi, eax
|
||||
add edi, eax
|
||||
|
||||
|
||||
dec dword [iPicHeight]
|
||||
jnz sqdiff_bgd_height_loop
|
||||
|
||||
|
||||
mov edx, [psadframe]
|
||||
mov ebp, [tmp_sadframe]
|
||||
mov [edx], ebp
|
||||
|
||||
add esp, localsize
|
||||
add esp, localsize
|
||||
pop ebx
|
||||
pop edi
|
||||
pop esi
|
||||
|
@ -33,4 +33,4 @@
|
||||
LIBRARY welsvp.dll
|
||||
EXPORTS
|
||||
CreateVpInterface PRIVATE
|
||||
DestroyVpInterface PRIVATE
|
||||
DestroyVpInterface PRIVATE
|
@ -27,18 +27,18 @@ LANGUAGE LANG_CHINESE, SUBLANG_CHINESE_SIMPLIFIED
|
||||
// TEXTINCLUDE
|
||||
//
|
||||
|
||||
1 TEXTINCLUDE
|
||||
1 TEXTINCLUDE
|
||||
BEGIN
|
||||
"resource.h\0"
|
||||
END
|
||||
|
||||
2 TEXTINCLUDE
|
||||
2 TEXTINCLUDE
|
||||
BEGIN
|
||||
"#include ""afxres.h""\r\n"
|
||||
"\0"
|
||||
END
|
||||
|
||||
3 TEXTINCLUDE
|
||||
3 TEXTINCLUDE
|
||||
BEGIN
|
||||
"\r\n"
|
||||
"\0"
|
||||
|
@ -23,7 +23,7 @@ rem ************************************************
|
||||
rem call VP build
|
||||
echo "Welsvp Building....."
|
||||
cd %VPProjectDir%
|
||||
rem vcclean
|
||||
rem vcclean
|
||||
%VCBUILDEXE% WelsVP_2008.vcproj
|
||||
|
||||
|
||||
@ -33,7 +33,7 @@ echo "WelsEncoder Building....."
|
||||
|
||||
cd %CurDir%
|
||||
cd %EncoderProjectDir%
|
||||
rem vcclean
|
||||
rem vcclean
|
||||
%VCBUILDEXE% WelsEncCore.vcproj
|
||||
%VCBUILDEXE% WelsEncPlus.vcproj
|
||||
%VCBUILDEXE% encConsole.vcproj
|
||||
@ -44,7 +44,7 @@ echo "WelsDecoder Building....."
|
||||
|
||||
cd %CurDir%
|
||||
cd %DecoderProjectDir%
|
||||
rem vcclean
|
||||
rem vcclean
|
||||
%VCBUILDEXE% WelsDecCore.vcproj
|
||||
%VCBUILDEXE% WelsDecPlus.vcproj
|
||||
%VCBUILDEXE% decConsole.vcproj
|
||||
|
@ -36,7 +36,7 @@ echo "WelsEncoder Building....."
|
||||
cd %CurDir%
|
||||
cd %EncoderProjectDir%
|
||||
echo current directory is %EncoderProjectDir%
|
||||
rem vcclean
|
||||
rem vcclean
|
||||
|
||||
echo %VCMSBUILDEXE_RELEASE% WelsEncoder_2010.sln
|
||||
%VCMSBUILDEXE_RELEASE% WelsEncoder_2010.sln
|
||||
@ -49,7 +49,7 @@ echo "WelsDecoder Building....."
|
||||
cd %CurDir%
|
||||
cd %DecoderProjectDir%
|
||||
echo current directory is %DecoderProjectDir%
|
||||
rem vcclean
|
||||
rem vcclean
|
||||
|
||||
echo %VCMSBUILDEXE_RELEASE% WelsDecoder_2010.sln
|
||||
|
||||
|
@ -36,7 +36,7 @@ echo "WelsEncoder Building....."
|
||||
cd %CurDir%
|
||||
cd %EncoderProjectDir%
|
||||
echo current directory is %EncoderProjectDir%
|
||||
rem vcclean
|
||||
rem vcclean
|
||||
|
||||
echo %VCMSBUILDEXE_RELEASE% WelsEncoder_2012.sln
|
||||
%VCMSBUILDEXE_RELEASE% WelsEncoder_2012.sln
|
||||
@ -49,7 +49,7 @@ echo "WelsDecoder Building....."
|
||||
cd %CurDir%
|
||||
cd %DecoderProjectDir%
|
||||
echo current directory is %DecoderProjectDir%
|
||||
rem vcclean
|
||||
rem vcclean
|
||||
|
||||
echo %VCMSBUILDEXE_RELEASE% WelsDecoder_2012.sln
|
||||
|
||||
|
@ -12,19 +12,19 @@ EnableSpsPpsIDAddition 1
|
||||
EnableFrameCropping 1 # enable frame cropping flag
|
||||
|
||||
#============================== LOOP FILTER ==============================
|
||||
LoopFilterDisableIDC 0 # Loop filter idc (0: on, 1: off,
|
||||
LoopFilterDisableIDC 0 # Loop filter idc (0: on, 1: off,
|
||||
# 2: on except for slice boundaries,
|
||||
# 3: two stage. slice boundries on in second stage
|
||||
# 4: Luma on but Chroma off (w.r.t. idc=0)
|
||||
# 4: Luma on but Chroma off (w.r.t. idc=0)
|
||||
# 5: Luma on except on slice boundaries, but Chroma off in enh. layer (w.r.t. idc=2)
|
||||
# 6: Luma on in two stage. slice boundries on in second stage, but Chroma off (w.r.t. idc=3)
|
||||
LoopFilterAlphaC0Offset 0 # AlphaOffset(-6..+6): valid range
|
||||
LoopFilterBetaOffset 0 # BetaOffset (-6..+6): valid range
|
||||
|
||||
InterLayerLoopFilterDisableIDC 0 # filter idc for inter-layer deblocking (0: on, 1: off,
|
||||
InterLayerLoopFilterDisableIDC 0 # filter idc for inter-layer deblocking (0: on, 1: off,
|
||||
# 2: on except for slice boundaries,
|
||||
# 3: two stage. slice boundries on in second stage
|
||||
# 4: Luma on but Chroma off in enh. layer (w.r.t. idc=0)
|
||||
# 4: Luma on but Chroma off in enh. layer (w.r.t. idc=0)
|
||||
# 5: Luma on except on slice boundaries, but Chroma off in enh. layer (w.r.t. idc=2)
|
||||
# 6: Luma on in two stage. slice boundries on in second stage, but Chroma off (w.r.t. idc=3)
|
||||
InterLayerLoopFilterAlphaC0Offset 0 # AlphaOffset for inter-layer deblocking
|
||||
@ -51,7 +51,7 @@ EnableAdaptiveQuantization 1 # Enable Adaptive Quantization (1: enable, 0: d
|
||||
|
||||
#============================== LONG TERM REFERENCE CONTROL ==============================
|
||||
EnableLongTermReference 0 # Enable Long Term Reference (1: enable, 0: disable)
|
||||
LtrMarkPeriod 30 # Long Term Reference Marking Period
|
||||
LtrMarkPeriod 30 # Long Term Reference Marking Period
|
||||
|
||||
#============================== LAYER DEFINITION ==============================
|
||||
PrefixNALAddingCtrl 0 # Control flag of adding prefix unit (0: off, 1: on)
|
||||
|
@ -12,19 +12,19 @@ EnableSpsPpsIDAddition 1
|
||||
EnableFrameCropping 1 # enable frame cropping flag
|
||||
|
||||
#============================== LOOP FILTER ==============================
|
||||
LoopFilterDisableIDC 0 # Loop filter idc (0: on, 1: off,
|
||||
LoopFilterDisableIDC 0 # Loop filter idc (0: on, 1: off,
|
||||
# 2: on except for slice boundaries,
|
||||
# 3: two stage. slice boundries on in second stage
|
||||
# 4: Luma on but Chroma off (w.r.t. idc=0)
|
||||
# 4: Luma on but Chroma off (w.r.t. idc=0)
|
||||
# 5: Luma on except on slice boundaries, but Chroma off in enh. layer (w.r.t. idc=2)
|
||||
# 6: Luma on in two stage. slice boundries on in second stage, but Chroma off (w.r.t. idc=3)
|
||||
LoopFilterAlphaC0Offset 0 # AlphaOffset(-6..+6): valid range
|
||||
LoopFilterBetaOffset 0 # BetaOffset (-6..+6): valid range
|
||||
|
||||
InterLayerLoopFilterDisableIDC 0 # filter idc for inter-layer deblocking (0: on, 1: off,
|
||||
InterLayerLoopFilterDisableIDC 0 # filter idc for inter-layer deblocking (0: on, 1: off,
|
||||
# 2: on except for slice boundaries,
|
||||
# 3: two stage. slice boundries on in second stage
|
||||
# 4: Luma on but Chroma off in enh. layer (w.r.t. idc=0)
|
||||
# 4: Luma on but Chroma off in enh. layer (w.r.t. idc=0)
|
||||
# 5: Luma on except on slice boundaries, but Chroma off in enh. layer (w.r.t. idc=2)
|
||||
# 6: Luma on in two stage. slice boundries on in second stage, but Chroma off (w.r.t. idc=3)
|
||||
InterLayerLoopFilterAlphaC0Offset 0 # AlphaOffset for inter-layer deblocking
|
||||
@ -51,7 +51,7 @@ EnableAdaptiveQuantization 0 # Enable Adaptive Quantization (1: enable, 0: d
|
||||
|
||||
#============================== LONG TERM REFERENCE CONTROL ==============================
|
||||
EnableLongTermReference 1 # Enable Long Term Reference (1: enable, 0: disable)
|
||||
LtrMarkPeriod 30 # Long Term Reference Marking Period
|
||||
LtrMarkPeriod 30 # Long Term Reference Marking Period
|
||||
|
||||
#============================== LAYER DEFINITION ==============================
|
||||
PrefixNALAddingCtrl 0 # Control flag of adding prefix unit (0: off, 1: on)
|
||||
|
@ -12,19 +12,19 @@ EnableSpsPpsIDAddition 1
|
||||
EnableFrameCropping 1 # enable frame cropping flag
|
||||
|
||||
#============================== LOOP FILTER ==============================
|
||||
LoopFilterDisableIDC 0 # Loop filter idc (0: on, 1: off,
|
||||
LoopFilterDisableIDC 0 # Loop filter idc (0: on, 1: off,
|
||||
# 2: on except for slice boundaries,
|
||||
# 3: two stage. slice boundries on in second stage
|
||||
# 4: Luma on but Chroma off (w.r.t. idc=0)
|
||||
# 4: Luma on but Chroma off (w.r.t. idc=0)
|
||||
# 5: Luma on except on slice boundaries, but Chroma off in enh. layer (w.r.t. idc=2)
|
||||
# 6: Luma on in two stage. slice boundries on in second stage, but Chroma off (w.r.t. idc=3)
|
||||
LoopFilterAlphaC0Offset 0 # AlphaOffset(-6..+6): valid range
|
||||
LoopFilterBetaOffset 0 # BetaOffset (-6..+6): valid range
|
||||
|
||||
InterLayerLoopFilterDisableIDC 0 # filter idc for inter-layer deblocking (0: on, 1: off,
|
||||
InterLayerLoopFilterDisableIDC 0 # filter idc for inter-layer deblocking (0: on, 1: off,
|
||||
# 2: on except for slice boundaries,
|
||||
# 3: two stage. slice boundries on in second stage
|
||||
# 4: Luma on but Chroma off in enh. layer (w.r.t. idc=0)
|
||||
# 4: Luma on but Chroma off in enh. layer (w.r.t. idc=0)
|
||||
# 5: Luma on except on slice boundaries, but Chroma off in enh. layer (w.r.t. idc=2)
|
||||
# 6: Luma on in two stage. slice boundries on in second stage, but Chroma off (w.r.t. idc=3)
|
||||
InterLayerLoopFilterAlphaC0Offset 0 # AlphaOffset for inter-layer deblocking
|
||||
@ -51,7 +51,7 @@ EnableAdaptiveQuantization 1 # Enable Adaptive Quantization (1: enable, 0: d
|
||||
|
||||
#============================== LONG TERM REFERENCE CONTROL ==============================
|
||||
EnableLongTermReference 1 # Enable Long Term Reference (1: enable, 0: disable)
|
||||
LtrMarkPeriod 30 # Long Term Reference Marking Period
|
||||
LtrMarkPeriod 30 # Long Term Reference Marking Period
|
||||
|
||||
#============================== LAYER DEFINITION ==============================
|
||||
PrefixNALAddingCtrl 0 # Control flag of adding prefix unit (0: off, 1: on)
|
||||
|