Commit Graph

26 Commits

Author SHA1 Message Date
Christophe Gisquet
2267003981 x86: hpeldsp: better factorization
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-29 21:47:40 +02:00
Christophe Gisquet
81aa0f4604 x86: hpeldsp: implement SSSE3 version of _xy2
Loading pb_1 into a register rather than pw_8192 was benchmarked to be more
efficient. Loading both constants yields no advantage; loading one saves ~11 cycles.

decicycles count:
put8:  3223(mmx)    -> 2387
avg8:  2863(mmxext) -> 2125
put16: 4356(sse2)   -> 3553
avg16: 4481(sse2)   -> 3513

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-24 15:15:56 +02:00
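
As a rough illustration of the SSSE3 _xy2 arithmetic mentioned in 81aa0f4604, the C sketch below models one output pixel in scalar code. It assumes, going only by the constant names pb_1 and pw_8192, that the routine sums neighbouring bytes with PMADDUBSW against a vector of ones and then rounds with PMULHRSW against 8192. All function names are hypothetical, and this is not the assembly from the commit.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one 16-bit lane of PMULHRSW:
 * result = (((a * b) >> 14) + 1) >> 1. */
static int16_t pmulhrsw_lane(int16_t a, int16_t b)
{
    int32_t t = ((int32_t)a * (int32_t)b) >> 14;
    return (int16_t)((t + 1) >> 1);
}

/* Scalar model of one lane of PMADDUBSW when the signed operand is pb_1
 * (all bytes equal to 1): the two unsigned bytes are simply added.
 * The sum is at most 510, so the instruction's saturation never triggers. */
static int16_t pmaddubsw_pb1_lane(uint8_t lo, uint8_t hi)
{
    return (int16_t)(lo * 1 + hi * 1);
}

/* Assumed SSSE3-style path (hypothetical name): sum a 2x2 neighbourhood,
 * then let the multiply by 8192 perform the rounded divide by four. */
static uint8_t xy2_ssse3_model(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    int16_t top = pmaddubsw_pb1_lane(a, b);
    int16_t bot = pmaddubsw_pb1_lane(c, d);
    int16_t sum = (int16_t)(top + bot);
    assert(sum >= 0 && sum <= 1020);
    return (uint8_t)pmulhrsw_lane(sum, 8192);
}

/* Plain reference: rounded average of four pixels. */
static uint8_t xy2_reference(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint8_t)((a + b + c + d + 2) >> 2);
}
```

For the non-negative sums that occur here, the multiply by 8192 works out to (sum + 2) >> 2, so the model agrees with the plain reference for every input.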
Christophe Gisquet
9722a6a3f3 x86: hpeldsp: implement SSE2 put_pixels16_xy2
This is obviously equivalent to the avg version, without the avg.

3223(mmx) -> 2006(sse2)

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-24 03:45:17 +02:00
Christophe Gisquet
f0aca50e0b x86: hpeldsp: implement SSE2 versions
These are mostly used in codecs older than H.264, e.g. MPEG-2.

put16 versions:
      mmx  mmx2  sse2
x2:  1888  1185   552
y2:  1778  1092   510

avg16 xy2: 3509(mmx2) -> 2169(sse2)

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-24 03:29:48 +02:00
Christophe Gisquet
c081ca851c x86: hpeldsp: avg_pixels_xy2 for mmx2&3dnow
This is a port of the inline assembly of the mmx version to use the
pavg(us|)b instruction.

        8    16
mmx   1498  4355
mmx2  1242  3509

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-22 20:17:49 +02:00
Christophe Gisquet
17ac998055 x86: hpeldsp: mark _xy2 versions as approximate
Currently, only the mmx version is bitexact, the others (mmxext and
3dnow) are not, in spite of their naming.

Therefore, make their name more obvious. Also restore a comment that
was removed in 71155d7b.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-22 20:17:45 +02:00
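
To make the bit-exactness remark in 17ac998055 concrete, here is a minimal C sketch of one way an averaging-based _xy2 can drift from the exact result: chaining byte averages rounds twice. This is an assumption for illustration only; the real mmxext and 3dnow routines may apply corrections that this model omits, and all names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Scalar model of one byte lane of PAVGB / PAVGUSB: rounded-up average. */
static uint8_t pavgb_lane(uint8_t a, uint8_t b)
{
    return (uint8_t)((a + b + 1) >> 1);
}

/* Exact rounded average of four pixels, rounding once. */
static uint8_t xy2_exact(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return (uint8_t)((a + b + c + d + 2) >> 2);
}

/* Approximation (hypothetical): average of two averages, rounding twice. */
static uint8_t xy2_chained(uint8_t a, uint8_t b, uint8_t c, uint8_t d)
{
    return pavgb_lane(pavgb_lane(a, b), pavgb_lane(c, d));
}

/* Count inputs on a coarse grid (0, step, 2*step, ...) where the two
 * versions disagree. */
static long xy2_count_mismatches(int step)
{
    long mismatches = 0;
    assert(step > 0);
    for (int a = 0; a < 256; a += step)
        for (int b = 0; b < 256; b += step)
            for (int c = 0; c < 256; c += step)
                for (int d = 0; d < 256; d += step)
                    if (xy2_chained((uint8_t)a, (uint8_t)b, (uint8_t)c, (uint8_t)d) !=
                        xy2_exact((uint8_t)a, (uint8_t)b, (uint8_t)c, (uint8_t)d))
                        mismatches++;
    return mismatches;
}
```

For example, the inputs (0, 1, 0, 0) give 1 from the chained version and 0 from the exact one, which is the kind of off-by-one that makes a routine approximate rather than bit-exact.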
Christophe Gisquet
f8de35ebc4 x86: hpeldsp: kill hpeldsp_mmx.c
before:
1987 decicycles in 8_x2, 262121 runs, 23 skips

after:
1902 decicycles in 8_x2, 262112 runs, 32 skips

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-05-22 20:17:40 +02:00
Michael Niedermayer
4104eb44e6 Merge commit '55519926ef855c671d084ccc151056de9e3d3a77'
* commit '55519926ef855c671d084ccc151056de9e3d3a77':
  x86: Make function prototype comments in assembly code consistent

Conflicts:
	libavcodec/x86/sbrdsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-14 00:01:30 +01:00
Michael Niedermayer
1c788eaca9 Merge commit '831a1180785a786272cdcefb71566a770bfb879e'
* commit '831a1180785a786272cdcefb71566a770bfb879e':
  Update dsputil- and SIMD-related comments to match reality more closely

Conflicts:
	libavcodec/x86/hpeldsp.asm
	libavutil/arm/float_dsp_init_arm.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-13 23:59:56 +01:00
Diego Biurrun
55519926ef x86: Make function prototype comments in assembly code consistent
This helps grepping for functions, among other things.
2014-03-13 05:50:29 -07:00
Diego Biurrun
831a118078 Update dsputil- and SIMD-related comments to match reality more closely
2014-03-13 05:50:29 -07:00
Mikulas Patocka
694d997afe x86: hpeldsp: Use PAVGB instruction macro where necessary
Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Diego Biurrun <diego@biurrun.de>
2013-11-04 01:29:23 +01:00
Mikulas Patocka
074155360d avcodec/x86/hpeldsp: fix crash on AMD K6-3+
There are two instructions, pavgb and pavgusb. Both perform the same
operation but have different encodings. Pavgb exists in the SSE (or
MMXEXT) instruction set and pavgusb exists in the 3DNow! instruction set.

libavcodec uses the macro PAVGB to select the proper instruction. However,
the function avg_pixels8_xy2 doesn't use this macro; it uses pavgb
directly.

As a consequence, the function avg_pixels8_xy2 crashes on AMD K6-2 and
K6-3 processors, because they have pavgusb, but not pavgb.

This bug seems to be introduced by commit
71155d7b41, "dsputil: x86: Convert mpeg4
qpel and dsputil avg to yasm"

Signed-off-by: Mikulas Patocka <mikulas@artax.karlin.mff.cuni.cz>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-11-03 19:49:11 +01:00
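
The failure mode described in 074155360d can be sketched in C as a small model: a PAVGB-style macro picks the mnemonic per build variant, while a hard-coded pavgb ignores the variant. The enum, struct and function names below are hypothetical and exist only for this illustration; the real selection happens in the assembler macros, not in C.

```c
#include <assert.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical build variants of one assembly function. */
enum simd_variant { VARIANT_MMXEXT, VARIANT_3DNOW };

/* Hypothetical CPU feature model. */
struct cpu_model {
    int has_mmxext;
    int has_3dnow;
};

/* Model of a PAVGB-style macro: choose the mnemonic that matches the
 * variant the function is being assembled for. */
static const char *pavgb_mnemonic(enum simd_variant v)
{
    return v == VARIANT_3DNOW ? "pavgusb" : "pavgb";
}

/* Return nonzero if the modelled CPU can execute the given mnemonic. */
static int cpu_supports(const struct cpu_model *cpu, const char *mnemonic)
{
    assert(cpu != NULL && mnemonic != NULL);
    if (strcmp(mnemonic, "pavgb") == 0)
        return cpu->has_mmxext;
    if (strcmp(mnemonic, "pavgusb") == 0)
        return cpu->has_3dnow;
    return 0;
}
```

With a K6-style model (3DNow! present, MMXEXT absent), the macro's choice for the 3dnow variant is executable, whereas a literal "pavgb" is not, which matches the crash the commit describes.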
Ronald S. Bultje
610b18e2e3 x86: qpel: Move fullpel and l2 functions to a separate file
This way, they can be shared between mpeg4qpel and h264qpel without
requiring either one to be compiled unconditionally.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-04-08 12:38:33 +03:00
Ronald S. Bultje
22cc8a103c x86/qpel: move fullpel and l2 functions to separate file.
This way, they can be shared between mpeg4qpel and h264qpel without
requiring either one to be compiled unconditionally.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-03-09 17:25:30 +01:00
Michael Niedermayer
ede45c4e1d Merge commit '25841dfe806a13de526ae09c11149ab1f83555a8'
* commit '25841dfe806a13de526ae09c11149ab1f83555a8':
  Use ptrdiff_t instead of int for {avg, put}_pixels line_size parameter.

Conflicts:
	libavcodec/alpha/dsputil_alpha.c
	libavcodec/dsputil_template.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-06 12:18:25 +01:00
Diego Biurrun
25841dfe80 Use ptrdiff_t instead of int for {avg, put}_pixels line_size parameter.
This avoids SIMD-optimized functions having to sign-extend their
line size argument manually to be able to do pointer arithmetic.
2013-02-05 12:59:12 +01:00
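
The point of 25841dfe80 can be illustrated with a small C sketch, assuming a simplified 8-pixel-wide copy routine with a hypothetical name. Because line_size is already a ptrdiff_t, it can be used directly in pointer arithmetic, including for negative strides, with no widening from int.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Hypothetical model of a put_pixels8-style function: copy h rows of
 * 8 bytes, advancing both pointers by line_size bytes per row.
 * line_size may be negative (bottom-up layout). */
static void put_pixels8_model(uint8_t *dst, const uint8_t *src,
                              ptrdiff_t line_size, int h)
{
    assert(dst != NULL && src != NULL && h >= 0);
    for (ptrdiff_t row = 0; row < h; row++) {
        /* ptrdiff_t offset: pointer-sized, so no manual sign extension. */
        ptrdiff_t offset = row * line_size;
        memcpy(dst + offset, src + offset, 8);
    }
}
```

A caller walking a buffer bottom-up would pass a pointer to the last row and a negative line_size; with an int parameter, 64-bit assembly would first have to sign-extend that value before adding it to a pointer.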
Michael Niedermayer
cb573f7fbc avcodec/x86: Add Daniel's copyright to the recent gcc->yasm conversions he did.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-02-03 13:50:44 +01:00
Michael Niedermayer
dd87d4a318 Merge remote-tracking branch 'qatar/master'
* qatar/master:
  x86: hpel: Move {avg,put}_pixels16_sse2 to hpeldsp
  configure: Add a comment indicating why uclibc is checked before glibc

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-31 20:03:36 +01:00
Diego Biurrun
52acd79165 x86: hpel: Move {avg,put}_pixels16_sse2 to hpeldsp
2013-01-31 11:19:23 +01:00
Michael Niedermayer
bb2f4ae434 Merge commit '05b0998f511ffa699407465d48c7d5805f746ad2'
* commit '05b0998f511ffa699407465d48c7d5805f746ad2':
  dsputil: Fix error by not using redzone and register name
  swscale: GBRP output support

Conflicts:
	libswscale/output.c
	libswscale/swscale.c
	libswscale/swscale_internal.h
	libswscale/utils.c
	tests/ref/lavfi/pixdesc
	tests/ref/lavfi/pixfmts_copy
	tests/ref/lavfi/pixfmts_null
	tests/ref/lavfi/pixfmts_scale
	tests/ref/lavfi/pixfmts_vflip

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-28 14:11:31 +01:00
Michael Niedermayer
834e9fb056 x86: hpeldsp: Fix a typo, use the right register
This makes the code actually work.

Signed-off-by: Martin Storsjö <martin@martin.st>
2013-01-28 12:49:37 +02:00
Daniel Kang
05b0998f51 dsputil: Fix error by not using redzone and register name
Signed-off-by: Derek Buitenhuis <derek.buitenhuis@gmail.com>
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-28 07:23:20 +01:00
Michael Niedermayer
edde562130 AVG_PIXELS8_XY2: fix typo, make code actually work
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-27 15:50:26 +01:00
Michael Niedermayer
aa3f449955 x86/hpeldsp: Fix author attribution
This also fixes the project name

Original authors Fabrice and Nick go back to the initial ffmpeg commit.
Others contributed in, for example (for a complete list, please use git blame / show / log):

commit e9c0a38ff0
Author: Zdenek Kabelac <kabi@informatics.muni.cz>
Date:   Tue May 28 16:35:58 2002 +0000

    * optimized avg_* functions (except xy2)
    * minor speedup for put_pixels_x2 & cleanup

    Originally committed as revision 619 to svn://svn.ffmpeg.org/ffmpeg/trunk

commit 607dce96c0
Author: Michael Niedermayer <michaelni@gmx.at>
Date:   Fri May 17 01:04:14 2002 +0000

    hopefully faster mmx2&3dnow MC

    Originally committed as revision 506 to svn://svn.ffmpeg.org/ffmpeg/trunk

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2013-01-27 14:47:58 +01:00
Daniel Kang
71155d7b41 dsputil: x86: Convert mpeg4 qpel and dsputil avg to yasm
Signed-off-by: Luca Barbato <lu_zero@gentoo.org>
2013-01-27 06:45:31 +01:00