859 Commits

Author SHA1 Message Date
Daniel Kang
4de83b7b6d H264: x86 predict init cosmetics.
Change indentation and whitespace; also move HAVE_YASM blocks.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-06-08 00:22:52 +02:00
Daniel Kang
a8d44f9dd5 Add x86 assembly for some 10-bit H.264 intra predict functions.
Parts are inspired by the 8-bit H.264 predict code in Libav.
Other parts ported from x264 with relicensing permission from author.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-06-06 01:31:02 +02:00
Loren Merritt
53be7b23e9 Cosmetic changes to h264_idct_10bit.asm.
Removes redundant dword tags and makes whitespace changes.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-06-02 07:07:15 -07:00
Loren Merritt
994c3550ff 2x faster h264_idct_add8_10.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-06-02 07:07:02 -07:00
Ronald S. Bultje
e6635a9a19 h264: remove CONFIG_GPL from x86 intra prediction code.
The authors permitted relicensing to LGPL a long time ago (Holger,
Loren and Jason).
2011-06-02 07:02:46 -07:00
Daniel Kang
f3aa65af3a h264/10bit: add HAVE_ALIGNED_STACK checks.
Fixes regression in 836f47d34b49e8ba9883e738a42f154130421caa in ICC-10.x,
since ICC<=11.0 doesn't align stack upon function calls.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-05-31 21:43:20 -07:00
Daniel Kang
348493db60 Update 8-bit H.264 IDCT function names to reflect bit-depth.
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
2011-05-31 15:02:32 -07:00
Daniel Kang
836f47d34b Add IDCT functions for 10-bit H.264.
Ports the majority of IDCT functions for 10-bit H.264.

Parts are inspired by 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author.

Signed-off-by: Ronald S. Bultje <rbultje@google.com>
2011-05-31 15:02:32 -07:00
Justin Ruggles
70bb747a57 ac3dsp: do not use the ff_* prefix when referencing ff_ac3_bap_bits.
This should fix the Windows builds.

Signed-off-by: Martin Storsjö <martin@martin.st>
2011-05-28 22:43:40 +03:00
Justin Ruggles
6ca23db9cc ac3enc: modify mantissa bit counting to keep bap counts for all values of bap
instead of just 0 to 4.

This does all the actual bit counting as a final step.
2011-05-28 12:39:28 -04:00
Diego Biurrun
5e528cffcf x86: Add appropriate ifdefs around certain AVX functions.
nasm versions prior to 2.09 have trouble assembling some of our AVX code.
Protect these sections by preprocessor macros to allow compilation to pass.
2011-05-27 21:18:12 +02:00
Dave Yeo
a10fb79070 x86 asm: Add SECTION_TEXT to dct32_sse.asm.
This fixes the following error on OS/2:
error: segment name `.text align=16' not recognized

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-05-23 12:47:53 +02:00
Loren Merritt
422b2362fc dct32_sse: eliminate some spills
125->104 cycles on Penryn (x86_64 only)
2011-05-22 19:27:18 +02:00
Vitor Sessak
165c7c420d Fix dct32() compilation with --disable-yasm
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-05-22 07:10:19 -04:00
Vitor Sessak
6204feb160 dct32: Add AVX implementation of 32-point DCT
2011-05-21 17:42:26 +02:00
Vitor Sessak
4e653b98c8 dct32: Change pass 6 permutation to allow for AVX implementation
2011-05-21 17:42:26 +02:00
Vitor Sessak
3758eb0eb9 dct32: port SSE 32-point DCT to YASM
2011-05-21 17:42:26 +02:00
Diego Biurrun
153382e1b6 multiple inclusion guard cleanup
Add missing multiple inclusion guards; clean up #endif comments;
add missing library prefixes; keep guard names consistent.
2011-05-21 13:48:10 +02:00
Dave Yeo
d69f9a4234 Add support for a.out object format to assembler macros.
This format is still used by e.g. OS/2.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-05-20 17:52:21 +02:00
Mans Rullgard
0b5e44ed29 mpegaudiodsp: fix x86 and ppc makefiles
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-05-19 16:32:24 +01:00
Mans Rullgard
c4f5c2d6f4 Move some mpegaudio functions to new mpegaudiodsp subsystem
This separation allows these functions to be used in a cleaner
fashion from other codecs (e.g. qdm2) and simplifies creating
optimised versions of them.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-05-19 12:25:34 +01:00
Justin Ruggles
e98a95e779 10l: wrap float_interleave functions in HAVE_YASM.
Fixes compilation with --disable-yasm.
2011-05-18 20:18:08 -04:00
Justin Ruggles
32f8fb8ecf Add float_interleave() to FmtConvertContext with x86-optimized versions.
Partially based on patches by clsid2 in ffdshow-tryout.
ff_float_interleave6() x86 improvements by Loren Merritt.
2011-05-18 17:27:05 -04:00
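The float_interleave() operation named in commit 32f8fb8ecf above can be pictured with a plain C reference loop. This is only a minimal sketch: the function name `float_interleave_sketch`, its signature, and the argument order are assumptions made for illustration and are not taken from FmtConvertContext.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical reference routine (not the project's code): interleave
 * `channels` planar float buffers of `len` samples each into one packed
 * buffer laid out as ch0[0], ch1[0], ..., ch0[1], ch1[1], ... */
static void float_interleave_sketch(float *dst, const float *const *src,
                                    size_t len, int channels)
{
    assert(channels > 0);
    for (size_t i = 0; i < len; i++)
        for (int c = 0; c < channels; c++)
            dst[i * (size_t)channels + (size_t)c] = src[c][i];
}
```

An optimized version would typically special-case common channel counts (the commit mentions a 6-channel variant) and fall back to a generic loop like this one otherwise.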
Daniel Kang
d0005d347d Modify x86util.asm to ease transitioning to 10-bit H.264 assembly.
Arguments for variable size instructions are added to many macros, along
with other various changes. The x86util.asm code was ported from x264.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-05-17 20:44:48 +02:00
Gil Pedersen
257de5fb25 h264dsp_mmx: Add #ifdefs around some mmxext functions on x86_64.
This fixes linking errors due to undefined symbols on x86_64 OS X.

Signed-off-by: Diego Biurrun <diego@biurrun.de>
2011-05-16 15:35:53 +02:00
Diego Biurrun
888fa31eca Fix FSF address copy paste error in some license headers.
2011-05-14 21:32:31 +02:00
Jason Garrett-Glaser
5705b02079 10-bit H.264 x86 chroma v loopfilter asm
Also delete some unused deblock asm macros.
2011-05-11 11:09:10 -07:00
Jason Garrett-Glaser
9f3d6ca4f1 Port x86 10-bit H.264 deblock asm from x264
2011-05-10 20:02:15 -07:00
Jason Garrett-Glaser
8ad77b65b5 Update x86 H.264 deblock asm
Includes AVX versions from x264.
2011-05-10 20:01:58 -07:00
Ronald S. Bultje
86b29553f8 h264dsp_mmx: place bracket outside #if/#endif block.
Should fix compile on systems missing yasm/nasm.
2011-05-10 08:39:38 -04:00
Oskar Arvidsson
19a0729b4c Adds 8-, 9- and 10-bit versions of some of the functions used by the h264 decoder.
This patch lets e.g. dsputil_init choose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (e.g. the old
clear_blocks_c is now named clear_blocks_8_c).

Note: Some of the functions for high bit depth are not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.

Preparatory patch for high bit depth h264 decoding support.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-05-10 07:24:36 -04:00
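The bit-depth dispatch described in commit 19a0729b4c above might look roughly like the sketch below. Everything here is hypothetical and chosen only to mirror the <base name>_<bit depth>_c naming scheme: the `SketchDSPContext` type, the `sketch_dsp_init` function, the 64-coefficient block size, and the assumption that 8-bit content uses 16-bit coefficients while 9- and 10-bit content uses 32-bit coefficients.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

#define SKETCH_COEFFS 64 /* one 8x8 block; illustrative size only */

/* 8-bit variant: assumes 16-bit coefficients. */
static void clear_block_8_c(void *block)
{
    memset(block, 0, SKETCH_COEFFS * sizeof(int16_t));
}

/* Shared helper: depends only on the coefficient size, not on the exact
 * bit depth, which is the binary-size opportunity the note mentions. */
static void clear_block_wide(void *block)
{
    memset(block, 0, SKETCH_COEFFS * sizeof(int32_t));
}

static void clear_block_9_c(void *block)  { clear_block_wide(block); }
static void clear_block_10_c(void *block) { clear_block_wide(block); }

typedef struct SketchDSPContext {
    void (*clear_block)(void *block);
} SketchDSPContext;

/* Pick the function pointer for the requested bit depth.
 * Returns 0 on success, -1 for an unsupported depth. */
static int sketch_dsp_init(SketchDSPContext *c, int bit_depth)
{
    assert(c != NULL);
    switch (bit_depth) {
    case 8:  c->clear_block = clear_block_8_c;  return 0;
    case 9:  c->clear_block = clear_block_9_c;  return 0;
    case 10: c->clear_block = clear_block_10_c; return 0;
    default: c->clear_block = NULL;             return -1;
    }
}
```

Callers would initialize the context once per stream and then call through `c->clear_block` without checking the bit depth again.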
Diego Biurrun
a734fa575f Remove disabled non-optimized code variants.
2011-04-29 20:01:13 +02:00
Vitor Sessak
9d35fa520e Add AVX FFT implementation.
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
2011-04-26 18:25:24 +02:00
Vitor Sessak
33cbfa6fa3 Update x86inc.asm from x264 to allow AVX emulation using SSE and MMX.
Signed-off-by: Reinhard Tartler <siretart@tauware.de>
2011-04-26 18:18:22 +02:00
Alexander Strange
1500be13f2 dsputil: allow to skip drawing of top/bottom edges.
2011-03-26 17:45:38 -04:00
Justin Ruggles
e6e9823488 Add apply_window_int16() to DSPContext with x86-optimized versions and use it
in the ac3_fixed encoder.
2011-03-22 21:08:30 -04:00
Mans Rullgard
0aded9484d Move dct and rdft definitions to separate files
This leaves fft.h with only the core FFT and MDCT definitions
thus making it more manageable.

Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-20 17:15:33 +00:00
Mans Rullgard
2912e87a6c Replace FFmpeg with Libav in licence headers
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-19 13:33:20 +00:00
Justin Ruggles
0f999cfddb ac3enc: add float_to_fixed24() with x86-optimized versions to AC3DSPContext
and use in scale_coefficients() for the floating-point AC-3 encoder.
2011-03-17 16:46:48 -04:00
Justin Ruggles
79414257e2 mathops: fix MULL() when the compiler does not inline the function.
If the function is not inlined, an immediate cannot be used for the
shift parameter, so the %cl register must be used instead in that case.

This fixes compilation for x86-32 using gcc with --disable-optimizations.
2011-03-15 20:49:37 -04:00
Justin Ruggles
aaff3b312e mathops: change "g" constraint to "rm" in x86-32 version of MUL64().
The 1-arg imul instruction cannot take an immediate argument, only a register
or memory argument.
2011-03-15 13:43:47 -04:00
Justin Ruggles
b181b8fb96 mathops: convert MULL/MULH/MUL64 to inline functions rather than macros.
This fixes unexpected name collisions that were occurring with variables
declared within the macros.
It also fixes the fate-acodec-ac3_fixed regression test on x86-32.
2011-03-15 13:43:47 -04:00
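The macro-to-inline-function change in commit b181b8fb96 above can be illustrated with portable C. This is a sketch under assumptions: the `_sketch` names are hypothetical, the semantics (64-bit product shifted right by `shift`, by 32, or returned whole) are what the MULL/MULH/MUL64 names suggest rather than something stated in the log, and the x86-32 inline assembly the commits discuss is not reproduced.

```c
#include <assert.h>
#include <stdint.h>

/* Inline functions give each temporary its own scope, so nothing declared
 * here can collide with a variable at the call site, unlike a macro that
 * declares locals inside a statement block. */

/* 32x32 -> 64-bit multiply, then shift right by `shift` bits.
 * Right-shifting a negative value is implementation-defined in C;
 * common compilers shift arithmetically. */
static inline int mull_sketch(int a, int b, unsigned shift)
{
    assert(shift < 64);
    return (int)(((int64_t)a * b) >> shift);
}

/* High 32 bits of the 64-bit product. */
static inline int mulh_sketch(int a, int b)
{
    return (int)(((int64_t)a * b) >> 32);
}

/* Full 64-bit product. */
static inline int64_t mul64_sketch(int a, int b)
{
    return (int64_t)a * (int64_t)b;
}
```

Because `shift` is an ordinary function parameter here, the code works whether or not the compiler inlines the call, which is the situation the follow-up MULL() fix had to handle in the assembly version.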
Justin Ruggles
f1efbca5e9 ac3enc: add SIMD-optimized shifting functions for use with the fixed-point AC3 encoder.
2011-03-14 08:45:31 -04:00
Mans Rullgard
a5444fee06 Add CONFIG_AC3DSP symbol to simplify makefiles
Signed-off-by: Mans Rullgard <mans@mansr.com>
2011-03-12 11:35:26 +00:00
Ronald S. Bultje
bf6fa73245 dsputil_mmx.c: remove ff_vector128.
Remove ff_vector128, it is identical to ff_pb_80.
2011-02-19 10:51:15 -05:00
Ronald S. Bultje
12802ec060 dsputil: move VC1-specific stuff into VC1DSPContext.
2011-02-17 17:35:35 -05:00
Justin Ruggles
1f004fc512 ac3dsp: Change punpckhqdq to movhlps in ac3_max_msb_abs_int16().
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-02-16 14:08:34 -05:00
Justin Ruggles
fbb6b49dab ac3enc: Add x86-optimized function to speed up log2_tab().
AC3DSPContext.ac3_max_msb_abs_int16() finds the maximum MSB of the absolute
value of each element in an array of int16_t.

Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-02-13 16:49:39 -05:00
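The behaviour described for ac3_max_msb_abs_int16() in commit fbb6b49dab above can be sketched in scalar C. One simple approach, assumed here for illustration, is to OR the absolute values together: the OR has its highest set bit at the same position as the largest magnitude in the array. The function names below are hypothetical.

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* OR together |src[i]| for all elements. The result is not the maximum
 * itself, but its most significant set bit matches that of the maximum. */
static int max_msb_abs_int16_sketch(const int16_t *src, int len)
{
    int v = 0;
    assert(len >= 0);
    for (int i = 0; i < len; i++)
        v |= abs(src[i]); /* int16_t promotes to int, so -32768 is safe */
    return v;
}

/* Index of the highest set bit, or -1 if v is zero. */
static int highest_set_bit(unsigned v)
{
    int bit = -1;
    while (v) {
        v >>= 1;
        bit++;
    }
    return bit;
}
```

A caller that needs a log2-style value would pass the OR result to `highest_set_bit()`. Since OR is a cheap element-wise operation, this form also maps naturally onto SIMD registers.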
Loren Merritt
e6b1ed693a FFT: factor a shuffle out of the inner loop and merge it into fft_permute.
6% faster SSE FFT on Conroe, 2.5% on Penryn.

Signed-off-by: Janne Grunau <janne-ffmpeg@jannau.net>
2011-02-13 15:36:39 +01:00
Justin Ruggles
dda3f0ef48 Add x86-optimized versions of exponent_min().
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
2011-02-10 15:32:47 -05:00