Inspection of compiled code shows gcc handles these fine on its own.
Benchmarking also shows no measurable speed difference.
Removing the remaining cases in get_cabac_bypass_sign_x86() does
cause more substantial changes to the compiled code with uncertain
impact.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The upcoming gcc 4.7 has more advanced constant propagation
resulting in some inline asm operands becoming constants and thus
emitted as literals, sometimes in contexts where this results
in invalid instructions.
This patch changes the constraints of the relevant operands
to "rm" thus forcing a valid type. While obviously suboptimal,
this is what older gcc versions already did, and there is no
change to the code generated with these.
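As a rough illustration of the constraint change (a sketch only, not the
code touched by this patch): the helper name highest_bit() is hypothetical,
and bsr is chosen simply because it has no immediate form. With a looser
constraint, a constant-propagated argument could be emitted as a literal;
"rm" restricts the operand to a register or memory location.

```c
#include <stdint.h>

/* Hypothetical helper: index of the highest set bit of v (v must be non-zero). */
static inline int highest_bit(uint32_t v)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    int idx;
    /* bsr cannot take an immediate source. "rm" rules out a literal
     * operand even when the compiler can prove v is a constant. */
    __asm__ ("bsrl %1, %0" : "=r"(idx) : "rm"(v) : "cc");
    return idx;
#else
    /* Portable fallback so the sketch builds on other targets. */
    int idx = 0;
    while (v >>= 1)
        idx++;
    return idx;
#endif
}
```

Calling highest_bit(16) with a literal argument is the kind of call site
where constant propagation would otherwise come into play.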
Signed-off-by: Mans Rullgard <mans@mansr.com>
This macro can cause problems in conjunction with the bitdepth
template expansion. It was presumably added to keep source
compatibility when high bitdepth support was added. However,
emulated_edge_mc is a dsputil pointer and should not be called
directly, so there is little reason to keep such a macro.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Some operands need to be accessed in byte mode, which restricts the
available registers in 32-bit mode. Using the 'q' constraint selects
a suitable register.
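A minimal sketch of the idea, assuming a hypothetical helper is_less()
that is not part of this patch: setcc writes a byte register, and in
32-bit mode only some registers have a byte-addressable low part, so the
output uses 'q' rather than 'r'.

```c
/* Hypothetical helper: returns 1 if a < b (signed), else 0. */
static inline int is_less(int a, int b)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    unsigned char r;
    /* "=q" asks for a register whose low byte can be addressed,
     * which setl requires for its destination. */
    __asm__ ("cmpl %2, %1 \n\t"
             "setl %0"
             : "=q"(r)
             : "r"(a), "r"(b)
             : "cc");
    return r;
#else
    /* Portable fallback so the sketch builds on other targets. */
    return a < b;
#endif
}
```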
Signed-off-by: Mans Rullgard <mans@mansr.com>
Parts are inspired from the 8-bit H.264 predict code in Libav.
Other parts ported from x264 with relicensing permission from author.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
Fixes regression in 836f47d34b in ICC-10.x,
since ICC<=11.0 doesn't align stack upon function calls.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Ports the majority of IDCT functions for 10-bit H.264.
Parts are inspired from 8-bit IDCT code in Libav; other parts ported from x264 with relicensing permission from author.
Signed-off-by: Ronald S. Bultje <rbultje@google.com>
This separation allows these functions to be used in a cleaner
fashion from other codecs (e.g. qdm2) and simplifies creating
optimised versions of them.
Signed-off-by: Mans Rullgard <mans@mansr.com>
Arguments for variable size instructions are added to many macros, along
with other various changes. The x86util.asm code was ported from x264.
Signed-off-by: Diego Biurrun <diego@biurrun.de>
This patch lets e.g. dsputil_init choose dsp functions with respect to
the bit depth to decode. The naming scheme of bit depth dependent
functions is <base name>_<bit depth>[_<prefix>] (i.e. the old
clear_blocks_c is now named clear_blocks_8_c).
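The naming scheme can be sketched roughly as follows. This is an
illustration under assumptions, not the template code from the patch:
SketchDSPContext, sketch_dsp_init(), DEF_CLIP_PIXEL and the clip_pixel
functions are all hypothetical names.

```c
/* Function pointer type stored in the (hypothetical) context. */
typedef int (*clip_pixel_fn)(int v);

/* Expands to clip_pixel_<depth>_c, following <base name>_<bit depth>_c. */
#define DEF_CLIP_PIXEL(depth)                           \
static int clip_pixel_ ## depth ## _c(int v)            \
{                                                       \
    const int max = (1 << (depth)) - 1;                 \
    return v < 0 ? 0 : v > max ? max : v;               \
}

DEF_CLIP_PIXEL(8)   /* defines clip_pixel_8_c  */
DEF_CLIP_PIXEL(10)  /* defines clip_pixel_10_c */

typedef struct SketchDSPContext {
    clip_pixel_fn clip_pixel;
} SketchDSPContext;

/* Selects the implementation matching the bit depth to decode. */
static void sketch_dsp_init(SketchDSPContext *c, int bit_depth)
{
    c->clip_pixel = bit_depth > 8 ? clip_pixel_10_c : clip_pixel_8_c;
}
```

Callers only go through the context pointer, so the bit depth is decided
once at init time.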
Note: Some of the functions for high bit depth are not dependent on the
bit depth, but only on the pixel size. This leaves some room for
optimizing binary size.
Preparatory patch for high bit depth h264 decoding support.
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
If the function is not inlined, an immediate cannot be used for the
shift parameter, so the %cl register must be used instead in that case.
This fixes compilation for x86-32 using gcc with --disable-optimizations.
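A rough sketch of the constraint involved, assuming a hypothetical helper
sar32() rather than the function changed here: "ic" allows an immediate
when the shift count is a compile-time constant (typically after
inlining) and falls back to %cl otherwise.

```c
#include <stdint.h>

/* Hypothetical helper: arithmetic right shift of a by s (0 <= s <= 31). */
static inline int32_t sar32(int32_t a, uint8_t s)
{
#if defined(__GNUC__) && (defined(__i386__) || defined(__x86_64__))
    /* "i" permits a literal count, "c" places the count in %cl
     * when it is only known at run time. */
    __asm__ ("sarl %1, %0" : "+r"(a) : "ic"(s) : "cc");
#else
    /* Fallback: >> on negative values is implementation-defined in C,
     * but is an arithmetic shift on common compilers. */
    a >>= s;
#endif
    return a;
}
```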
This fixes unexpected name collisions that were occurring with variables
declared within the macros.
It also fixes the fate-acodec-ac3_fixed regression test on x86-32.
AC3DSPContext.ac3_max_msb_abs_int16() finds the maximum MSB of the absolute
value of each element in an array of int16_t.
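One way to compute this in plain C is sketched below. It is an assumption
about the approach, not necessarily the committed implementation, and the
name max_msb_abs_int16_sketch() is hypothetical. OR-ing the absolute
values together yields a word whose highest set bit is the maximum MSB
over the array.

```c
#include <stdint.h>
#include <stdlib.h>

/* Hypothetical sketch: returns the OR of |src[i]| for all i.
 * The highest set bit of the result is the maximum MSB. */
static int max_msb_abs_int16_sketch(const int16_t *src, int len)
{
    int i, v = 0;
    for (i = 0; i < len; i++)
        v |= abs(src[i]);   /* src[i] is promoted to int before abs() */
    return v;
}
```

The caller only needs the position of the top bit, so the lower bits of
the return value can be ignored.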
Signed-off-by: Ronald S. Bultje <rsbultje@gmail.com>
Fix emu_edge_v_extend_15 to be <128 bytes on Win64, by being more strict
on the size of registers and which registers are being used for operations
where multiple are available. This fixes segfaults in emulated_edge()
function calls on Win64.
This will be beneficial for use with the audio conversion API without
requiring it to depend on all of dsputil.
Signed-off-by: Mans Rullgard <mans@mansr.com>
The original functions did not work correctly for edge pixels, e.g.
when CODEC_FLAG_EMU_EDGE is set, leading to corrupt output in e.g. VLC.
Based on a patch by Daniel Kang <daniel d kang gmail com>.
Signed-off-by: Ronald S. Bultje <rsbultje gmail com>
About 2.5x the speed.
NOTE: the way that the asm code handles large qmuls is a bit suboptimal.
If x264-style dequant was used (separate shift and qmul values), it might
be possible to get some extra speed.
Originally committed as revision 26336 to svn://svn.ffmpeg.org/ffmpeg/trunk
Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang
at gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26162 to svn://svn.ffmpeg.org/ffmpeg/trunk
Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang
at gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26159 to svn://svn.ffmpeg.org/ffmpeg/trunk
Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang
at gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26158 to svn://svn.ffmpeg.org/ffmpeg/trunk
(authors:Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26157 to svn://svn.ffmpeg.org/ffmpeg/trunk
Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang
at gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26156 to svn://svn.ffmpeg.org/ffmpeg/trunk
Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang
at gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26155 to svn://svn.ffmpeg.org/ffmpeg/trunk
(authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26151 to svn://svn.ffmpeg.org/ffmpeg/trunk
(authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26150 to svn://svn.ffmpeg.org/ffmpeg/trunk
(authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26149 to svn://svn.ffmpeg.org/ffmpeg/trunk
(authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26148 to svn://svn.ffmpeg.org/ffmpeg/trunk
(authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26147 to svn://svn.ffmpeg.org/ffmpeg/trunk
(authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26146 to svn://svn.ffmpeg.org/ffmpeg/trunk
(authors: Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot
d dot kang at gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26145 to svn://svn.ffmpeg.org/ffmpeg/trunk
Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang
at gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26143 to svn://svn.ffmpeg.org/ffmpeg/trunk
Jason, Loren, Holger) to FFmpeg. Patch by Daniel Kang <daniel dot d dot kang at
gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26142 to svn://svn.ffmpeg.org/ffmpeg/trunk
FFmpeg. Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-
Glaser <darkshikari gmail com> (approves LGPL relicensing for this code) and
Loren Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing
for this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as
part of Google's GCI 2010.
Originally committed as revision 26140 to svn://svn.ffmpeg.org/ffmpeg/trunk
FFmpeg. Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-
Glaser <darkshikari gmail com> (approves LGPL relicensing for this code) and
Loren Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing
for this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as
part of Google's GCI 2010.
Originally committed as revision 26139 to svn://svn.ffmpeg.org/ffmpeg/trunk
Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser
<darkshikari gmail com> (approves LGPL relicensing for this code) and Loren
Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing for
this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as
part of Google's GCI 2010.
Originally committed as revision 26138 to svn://svn.ffmpeg.org/ffmpeg/trunk
Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser
<darkshikari gmail com> (approves LGPL relicensing for this code) and Loren
Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing for
this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as
part of Google's GCI 2010.
Originally committed as revision 26137 to svn://svn.ffmpeg.org/ffmpeg/trunk
authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser <darkshikari
gmail com> (approves LGPL relicensing for this code) and Loren Merritt <lorenm
at u dot washington dot edu> (approves LGPL relicensing for this code). Patch
by Daniel Kang <daniel dot d dot kang at gmail com>, as part of Google's GCI
2010.
Originally committed as revision 26135 to svn://svn.ffmpeg.org/ffmpeg/trunk
Original authors: Holger Lubitz <holger lubitz org>, Jason Garrett-Glaser
<darkshikari gmail com> (approves LGPL relicensing for this code) and Loren
Merritt <lorenm at u dot washington dot edu> (approves LGPL relicensing for
this code). Patch by Daniel Kang <daniel dot d dot kang at gmail com>, as
part of Google's GCI 2010.
Originally committed as revision 26132 to svn://svn.ffmpeg.org/ffmpeg/trunk
initially said he'd be OK with relicensing, he also said he wanted to have
another look at the patch, and then he went on vacation, so let's play it
safe for now. We can consider removing this again later.
Originally committed as revision 26131 to svn://svn.ffmpeg.org/ffmpeg/trunk
LGPL relicensing approved by original authors: Holger Lubitz <holger lubitz
org>, Jason Garrett-Glaser <darkshikari gmail com> and Loren Merritt <lorenm
at u dot washington dot edu>. Patch by Daniel Kang <daniel dot d dot kang at
gmail com>, as part of Google's GCI 2010.
Originally committed as revision 26087 to svn://svn.ffmpeg.org/ffmpeg/trunk
This fixes compilation with the latest clang trunk version.
Patch by İsmail Dönmez, ismail at namtrac dot org
Originally committed as revision 25628 to svn://svn.ffmpeg.org/ffmpeg/trunk
These blocks depended on the compiler keeping xmm registers untouched between
them.
Originally committed as revision 25619 to svn://svn.ffmpeg.org/ffmpeg/trunk
suncc does not like the leading commas inside the macro, but it has no problem
with trailing commas.
Originally committed as revision 25615 to svn://svn.ffmpeg.org/ffmpeg/trunk
Some code was initializing some xmm registers in one asm block and using them
in the following block, assuming they wouldn't be changed in between blocks.
Originally committed as revision 25568 to svn://svn.ffmpeg.org/ffmpeg/trunk
prediction (plus some with different rounding for svq3/rv40). Speedup (for
SSSE3) about 6-fold, 3.6% faster overall with cathedral sample.
Originally committed as revision 25361 to svn://svn.ffmpeg.org/ffmpeg/trunk
Fixes compilation with clang's builtin assembler
Patch by İsmail Dönmez, ismail at namtrac dot org
Originally committed as revision 25331 to svn://svn.ffmpeg.org/ffmpeg/trunk
inline asm works for gcc-3.x also (hopefully). Should fix gcc-3.x FATE
breakage after r25254.
Originally committed as revision 25262 to svn://svn.ffmpeg.org/ffmpeg/trunk
increase to e.g. vc1, snow and mpeg decoding.
Patch by Eli Friedman <eli dot friedman gmail com>.
Originally committed as revision 25259 to svn://svn.ffmpeg.org/ffmpeg/trunk
from memory locations/offsets depending on b_idx plus constants, rather than
having gcc do this. This saves several lea instructions and together saves about
10 cycles in h264_loop_filter_strength_mmx2().
Originally committed as revision 25256 to svn://svn.ffmpeg.org/ffmpeg/trunk
a pxor, or remove the instruction altogether. Altogether, this saves 1
instruction.
Originally committed as revision 25255 to svn://svn.ffmpeg.org/ffmpeg/trunk
This has no measurable speed effect because the surrounding code doesn't
take advantage of this yet.
Originally committed as revision 25254 to svn://svn.ffmpeg.org/ffmpeg/trunk
of the d_idx variable and therefore allows for future optimizations. No speed
difference by this commit itself.
Originally committed as revision 25253 to svn://svn.ffmpeg.org/ffmpeg/trunk
inlining various constants within the loop code. 20 cycles faster on
cathedral sample.
Originally committed as revision 25252 to svn://svn.ffmpeg.org/ffmpeg/trunk