This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC loop filter and sao functions in new file hevc_lpf_sao_msa.c
Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
In this patch, in comparision with previous patch, duplicated c functions are removed.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC idct functions in new file hevc_idct_msa.c
Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC uni mc epel functions.
Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC mc epel functions.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC biw mc functions (qpel as well as epel) in new file hevc_mc_biw_msa.c
Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC uniw mc functions (qpel as well as epel) in new file hevc_mc_uniw_msa.c
Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch adds MSA (MIPS-SIMD-Arch) optimizations for HEVC bi mc functions (qpel as well as epel) in new file hevc_mc_bi_msa.c
Adds new generic macros (needed for this patch) in libavutil/mips/generic_macros_msa.h
Adds HEVC specific macros (needed for this patch) in libavcodec/mips/hevc_macros_msa.h
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch modifies HEVC mc MIPS-SIMD optimized code according to improved version of generic macros.
Overall, this patch is just upgrading the code with styling changes and will bring it in sync with MIPS-SIMD optimized latest codebase at our end.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch moves HEVC code of uni mc cases to new file hevc_mc_uni_msa.c.
(There are total 5 sub-modules of HEVC mc functions, if we add all these modules in one single file, its size would be huge (~750k) & difficult to maintain, so splitting it in multiple files)
This patch also adds new HEVC header file libavcodec/mips/hevc_macros_msa.h
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This patch modifies H264 loopfilter, weighted & bi-weighted prediction MIPS-SIMD optimized code according to improved version of generic macros.
Also there are minor code alignment changes.
Overall, this patch is just upgrading the code with styling changes and will bring it in sync with MIPS-SIMD optimized latest codebase at our end.
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
HAVE_LOONGSON is replaced by HAVE_LOONGSON3. Even Loongson-2E and 2F support
Loongson SIMD instructs but have low performance for decoding. We plan to focus
on optimizing Loongson-3A1000, 3B1500 and 3A1500, and modify the configure file
to support Loongson-2 series later by adding HAVE_LOONGSON2.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This mainly consists of replacing all the pointer arithmatic 'addiu'
instructions with PTR_ADDIU which will handle the differences in pointer
sizes when compiled on 64 bit mips systems.
The header asmdefs.h contains the PTR_ macros which expend to the correct mips
instructions to manipulate registers containing pointers.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Reviewed-by: Nedeljko Babic <Nedeljko.Babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
There are no independant uses of mips32r2 instructions except for the
FPU parts. Due to the heavy use of mips32r2 specifc fpu extensions, I
am guessing the original author intended MIPSFPU to imply MIPS32R2 anyway.
Since these fpu instructions are available on mips64 (non-r2), enable them
there as well.
Also remove the last occurence of HAVE_MIPS32R2 (which is coupled to
HAVE_MIPSFPU anyway).
mips32r2 is left in the list of options form compatability so that using
--disable-mips32r2 doesn't break anything.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Removing these removes the dependency of this code on mips32r2 which would
allow it to be used on processors which have FPU instructions, but not r2
instructions (like the mips64el debian port for instance).
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Reviewed-by: Nedeljko Babic <Nedeljko.Babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
On mips64, the registers t[4-7] do not exist. Instead of using a lot of #ifdef
or defines to handle differing register names, use variables and let GCC
allocate the registers automatically (like in the other mips assembly files).
In get_band_cost_ESC_mips, t4 and t5 were renamed to t6 and t7 to avoid a
variable name conflict.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Reviewed-by: Nedeljko Babic <Nedeljko.Babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
This is obviously needed for 64-bit support.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Reviewed-by: Nedeljko Babic <Nedeljko.Babic@imgtec.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Change register constraint on the v variable from = to +. This was causing GCC
to think that the v variable was never read and therefore not initialize it.
This fixes about 20 fate failures on mips64el.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The float_copy and fmul_and_reverse functions are refactored out from the
multiple copies in this file.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
The optimized C version of this code actually runs faster than this
version, so remove it.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Remove some assembly that the compiler can easily handle optimally on its own.
GCC produces almost identical assembly.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Q_fract should have be declared as 'const float*'.
Also fix the constness of some local variables affected by this.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
GCC is perfectly happy generating optimized multiplication code on its own for
64-bit arches. GCC refuses to optimize the loongson code when in 32-bit mode,
so I've left that.
Signed-off-by: James Cowgill <james410@cowgill.org.uk>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Iterative implementation of 32 bit fixed point split-radix FFT.
Max FFT that can be calculated currently is 2^12.
Signed-off-by: Nedeljko Babic <nbabic@mips.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
Floating point FFT (nips optimized) breaks when hard coded tables are
not enabled because MIPS optimization of floating point FFT uses only
ff_init_ff_cos_tabs(16) which is not enabled by default in that case.
This patch is fixing it.
Signed-off-by: Nedeljko Babic <nbabic@mips.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
List of clobbered registers fixed and added where it is lacking.
Signed-off-by: Nedeljko Babic <nbabic@mips.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
* qatar/master:
lavc: Move vector_fmul_window to AVFloatDSPContext
rtpdec_mpeg4: Check the remaining amount of data before reading
Conflicts:
libavcodec/dsputil.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>