Commit Graph

1570 Commits

Author SHA1 Message Date
Michael Niedermayer
74fed968d1 Merge commit '82dd1026cfc1d72b04019185bea4c1c9621ace3f'
* commit '82dd1026cfc1d72b04019185bea4c1c9621ace3f':
  x86: dsputil: Move hpeldsp-related declarations to a separate header

Conflicts:
	libavcodec/x86/dsputil_x86.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-22 23:21:54 +01:00
Michael Niedermayer
9333bba6ed Merge commit '6655c933a887a2d20707fff657b614aa1d86a25b'
* commit '6655c933a887a2d20707fff657b614aa1d86a25b':
  x86: dsputil: Move fpel declarations to a separate header

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-22 23:08:22 +01:00
Michael Niedermayer
77bc342975 Merge commit '322a1dda973e802db7b57f2007fad3efcd5bab81'
* commit '322a1dda973e802db7b57f2007fad3efcd5bab81':
  dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros

Conflicts:
	libavcodec/arm/hpeldsp_init_arm.c
	libavcodec/x86/dsputil_x86.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-22 22:53:33 +01:00
Michael Niedermayer
d6d3cfb0aa Merge commit '600b854ad8173995518bd917e7f86120b5505088'
* commit '600b854ad8173995518bd917e7f86120b5505088':
  imgconvert: Move ff_deinterlace_line_*_mmx declarations out of dsputil

Conflicts:
	libavcodec/imgconvert.c
	libavcodec/x86/dsputil_x86.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-22 22:17:54 +01:00
Michael Niedermayer
8fbc6e5911 Merge commit '1a8d0cf77ed2611e542ae98f341d4c43a04467bd'
* commit '1a8d0cf77ed2611e542ae98f341d4c43a04467bd':
  x86: dsputil: Move inline assembly macros to a separate header

Conflicts:
	libavcodec/x86/dsputil_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-22 22:11:27 +01:00
Diego Biurrun
82dd1026cf x86: dsputil: Move hpeldsp-related declarations to a separate header 2014-03-22 06:17:29 -07:00
Diego Biurrun
6655c933a8 x86: dsputil: Move fpel declarations to a separate header 2014-03-22 06:17:29 -07:00
Diego Biurrun
322a1dda97 dsputil: Refactor duplicated CALL_2X_PIXELS / PIXELS16 macros 2014-03-22 06:17:29 -07:00
Diego Biurrun
600b854ad8 imgconvert: Move ff_deinterlace_line_*_mmx declarations out of dsputil 2014-03-22 06:17:29 -07:00
Diego Biurrun
1a8d0cf77e x86: dsputil: Move inline assembly macros to a separate header 2014-03-22 06:17:29 -07:00
Matt Oliver
cd5cf395f6 Additional icl inline asm fix.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-22 14:07:03 +01:00
Michael Niedermayer
1cd107f637 avcodec/x86/snowdsp: add missing clobbers to inner_add_yblock_bw_8_obmc_16_bh_even_sse2() and inner_add_yblock_bw_16_obmc_32_sse2()
Note, these functions are currently disabled

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-21 03:38:48 +01:00
Michael Niedermayer
e98bac82e5 Merge commit '82bb3048013201c0095d2853d4623633d912252f'
* commit '82bb3048013201c0095d2853d4623633d912252f':
  dsputil: Use correct type in me_cmp_func function pointer

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-20 22:36:40 +01:00
Michael Niedermayer
011d83de48 Merge commit '0e083d7e43805db1a978cb57bfa25fda62e8ff18'
* commit '0e083d7e43805db1a978cb57bfa25fda62e8ff18':
  build: Group general components separate from de/encoders in arch Makefiles

Conflicts:
	libavcodec/arm/Makefile
	libavcodec/x86/Makefile

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-20 22:26:31 +01:00
Michael Niedermayer
ba85bfabf3 Merge commit '5169e688956be3378adb3b16a93962fe0048f1c9'
* commit '5169e688956be3378adb3b16a93962fe0048f1c9':
  dsputil: Propagate bit depth information to all (sub)init functions

Conflicts:
	libavcodec/arm/dsputil_init_arm.c
	libavcodec/arm/dsputil_init_armv5te.c
	libavcodec/arm/dsputil_init_armv6.c
	libavcodec/arm/dsputil_init_neon.c
	libavcodec/dsputil.c
	libavcodec/dsputil.h
	libavcodec/ppc/dsputil_ppc.c
	libavcodec/x86/dsputil_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-20 22:06:01 +01:00
Diego Biurrun
82bb304801 dsputil: Use correct type in me_cmp_func function pointer 2014-03-20 05:03:23 -07:00
Diego Biurrun
0e083d7e43 build: Group general components separate from de/encoders in arch Makefiles
This is in line with how the top-level libavcodec Makefile is structured.
2014-03-20 05:03:23 -07:00
Diego Biurrun
5169e68895 dsputil: Propagate bit depth information to all (sub)init functions
This avoids recalculating the value over and over again.
2014-03-20 05:03:23 -07:00
Carl Eugen Hoyos
57fdc74c34 Add one forgotten named inline asm operand in libavcodec/x86/motion_est.c. 2014-03-19 03:00:19 +01:00
Matt Oliver
8236747511 Automatically change MANGLE() into named inline asm operands when direct symbol references in inline asm are not supported.
This is part of the patch set for Intel C inline asm support on Windows

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-18 23:39:30 +01:00
Matt Oliver
b2d3a45598 avcodec/x86/mlpdsp: Only use asm when non-local inline asm labels are supported
This is part of the patch set for Intel C inline asm support on Windows

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-18 23:37:50 +01:00
James Almer
aa1f38015c x86/synth_filter: improve FMA version
Replace mulps+subps with fnmaddps, resulting in two fewer instructions inside the
inner loops.
About 1% faster FMA3 performance.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-17 21:04:15 +01:00
Matt Oliver
b73aae6fe9 avcodec/x86/idct_sse2_xvid: move offsets out of MANGLE()
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-17 04:19:59 +01:00
Matt Oliver
9eb3f11c55 Add missing external declarations.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-17 00:48:09 +01:00
Matt Oliver
590805b7c3 Fixed 64bit conformance with movzbl.
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-17 00:13:50 +01:00
Michael Niedermayer
5dd97d5809 Merge commit 'db3f61a04f1f66746660f921bb2780ddf1141f3b'
* commit 'db3f61a04f1f66746660f921bb2780ddf1141f3b':
  x86: dsputil_init: Drop some unnecessary parentheses

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-14 01:25:57 +01:00
Michael Niedermayer
27cab16ce7 Merge commit '441b093915717afa7d24be34bdab2a4911b30a57'
* commit '441b093915717afa7d24be34bdab2a4911b30a57':
  x86: dsputil_init: K&R formatting cosmetics

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-14 01:25:36 +01:00
Michael Niedermayer
236874a571 Merge commit '4cb4680c1087a2cd13d4b0c9167a2eb3147f99d8'
* commit '4cb4680c1087a2cd13d4b0c9167a2eb3147f99d8':
  x86: dsputil_x86.h: K&R formatting cosmetics

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-14 01:25:19 +01:00
Michael Niedermayer
925ce6faf4 Merge commit 'f8bbebecfd7ea3dceb7c96f931beca33f80a3490'
* commit 'f8bbebecfd7ea3dceb7c96f931beca33f80a3490':
  x86: motion_est: K&R formatting cosmetics

Conflicts:
	libavcodec/x86/motion_est.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-14 01:20:43 +01:00
Michael Niedermayer
b7a5f5dc66 Merge commit 'a36947c167d7278b891453083b57dc56b7a7f5c5'
* commit 'a36947c167d7278b891453083b57dc56b7a7f5c5':
  dsputilenc_mmx: K&R formatting cosmetics

Conflicts:
	libavcodec/x86/dsputilenc_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-14 01:09:57 +01:00
Michael Niedermayer
d926c4b240 Merge commit '38675229a879aa5258a8c71891fc8cbf74cf139f'
* commit '38675229a879aa5258a8c71891fc8cbf74cf139f':
  dsputil_mmx: K&R formatting cosmetics

Conflicts:
	libavcodec/x86/dsputil_mmx.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-14 01:01:37 +01:00
Michael Niedermayer
55f53f6c29 Merge commit '6a8b35dc88b4a1a452f192fbbf53ae7f59bc3f23'
* commit '6a8b35dc88b4a1a452f192fbbf53ae7f59bc3f23':
  dsputilenc_mmx: Merge two assignment blocks with identical conditions

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-14 00:57:25 +01:00
Michael Niedermayer
4104eb44e6 Merge commit '55519926ef855c671d084ccc151056de9e3d3a77'
* commit '55519926ef855c671d084ccc151056de9e3d3a77':
  x86: Make function prototype comments in assembly code consistent

Conflicts:
	libavcodec/x86/sbrdsp.asm

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-14 00:01:30 +01:00
Michael Niedermayer
a9b1936a4e Merge commit 'edd1f833fa145eb9c5026877c699ebe6efca00a0'
* commit 'edd1f833fa145eb9c5026877c699ebe6efca00a0':
  x86: h264_idct_10_bit: Use proper type in function prototype comments

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-14 00:00:16 +01:00
Michael Niedermayer
1c788eaca9 Merge commit '831a1180785a786272cdcefb71566a770bfb879e'
* commit '831a1180785a786272cdcefb71566a770bfb879e':
  Update dsputil- and SIMD-related comments to match reality more closely

Conflicts:
	libavcodec/x86/hpeldsp.asm
	libavutil/arm/float_dsp_init_arm.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-13 23:59:56 +01:00
Michael Niedermayer
d61e1156be Merge commit '17608f6ee3d2088cdb8d1e704276d8b34f01160d'
* commit '17608f6ee3d2088cdb8d1e704276d8b34f01160d':
  x86: Add some more missing headers

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-13 23:41:17 +01:00
Diego Biurrun
db3f61a04f x86: dsputil_init: Drop some unnecessary parentheses 2014-03-13 08:15:51 -07:00
Diego Biurrun
441b093915 x86: dsputil_init: K&R formatting cosmetics 2014-03-13 08:15:51 -07:00
Diego Biurrun
4cb4680c10 x86: dsputil_x86.h: K&R formatting cosmetics 2014-03-13 08:15:51 -07:00
Diego Biurrun
f8bbebecfd x86: motion_est: K&R formatting cosmetics 2014-03-13 08:15:51 -07:00
Diego Biurrun
a36947c167 dsputilenc_mmx: K&R formatting cosmetics 2014-03-13 08:15:51 -07:00
Diego Biurrun
38675229a8 dsputil_mmx: K&R formatting cosmetics 2014-03-13 08:15:51 -07:00
Diego Biurrun
6a8b35dc88 dsputilenc_mmx: Merge two assignment blocks with identical conditions 2014-03-13 08:15:51 -07:00
Diego Biurrun
55519926ef x86: Make function prototype comments in assembly code consistent
This helps grepping for functions, among other things.
2014-03-13 05:50:29 -07:00
Diego Biurrun
edd1f833fa x86: h264_idct_10_bit: Use proper type in function prototype comments 2014-03-13 05:50:29 -07:00
Diego Biurrun
831a118078 Update dsputil- and SIMD-related comments to match reality more closely 2014-03-13 05:50:29 -07:00
Diego Biurrun
17608f6ee3 x86: Add some more missing headers 2014-03-13 05:50:28 -07:00
Diego Biurrun
08dba0e1c3 x86: mpegvideoenc: Remove some remnants of the long-gone libmpeg2 IDCT 2014-03-13 05:50:28 -07:00
James Almer
9e0e1f9067 x86/dsputil: add emms to ff_scalarproduct_int16_mmxext()
Also undo the changes to ra144enc.c from previous commits.
Should fix ticket #3429

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-06 18:23:55 +01:00
Michael Niedermayer
2d99de66b7 Merge commit '3bfdee00cd92ff07c364d4901c4aefda32780756'
* commit '3bfdee00cd92ff07c364d4901c4aefda32780756':
  x86: dcadsp: Fix linking with yasm and optimizations disabled

Conflicts:
	libavcodec/x86/dcadsp_init.c

See: 206167a295
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-06 14:10:27 +01:00
Diego Biurrun
3bfdee00cd x86: dcadsp: Fix linking with yasm and optimizations disabled
Some optimized functions reference optimized symbols, so the functions
must be explicitly disabled when those symbols are unavailable.
2014-03-05 23:16:21 +01:00
Michael Niedermayer
146b476ba0 Merge commit '3741aa37c2a0d0717faff74a5c4cc357d16f6d1d'
* commit '3741aa37c2a0d0717faff74a5c4cc357d16f6d1d':
  x86: cabac: Use correct #includes to make header compile standalone

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-05 21:33:44 +01:00
Diego Biurrun
3741aa37c2 x86: cabac: Use correct #includes to make header compile standalone 2014-03-05 13:32:25 +01:00
James Almer
7fd64e3e36 x86/synth_filter: add synth_filter_fma3
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-05 01:58:16 +01:00
James Almer
206167a295 x86/synth_filter: add missing HAVE_YASM guard
Should fix compilation failures with --disable-yasm on some compilers

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-04 22:47:28 +01:00
James Almer
884e085d1e x86/synth_filter: Revert the switch to float ops with SSE2
This reverts the changes 6467209836
and 68c3ed936a did to the SSE2 version,
which generated a hit of about 5 cycles.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-02 11:58:10 +01:00
James Almer
68c3ed936a x86/synth_filter: add synth_filter_avx
Sandy Bridge Win64:
180 cycles on ff_synth_filter_inner_sse2
150 cycles on ff_synth_filter_inner_avx

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-02 01:00:55 +01:00
James Almer
6467209836 x86/synth_filter: add synth_filter_sse
Build only on x86_32 targets.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-03-01 15:32:40 +01:00
Michael Niedermayer
fb3c33f3cd Merge commit '4cb6964244fd6c099383d8b7e99731e72cc844b9'
* commit '4cb6964244fd6c099383d8b7e99731e72cc844b9':
  dcadec: simplify decoding of VQ high frequencies

Conflicts:
	configure
	libavcodec/dcadec.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-28 21:41:19 +01:00
Michael Niedermayer
baf3adc621 Merge commit '08e3ea60ff4059341b74be04a428a38f7c3630b0'
* commit '08e3ea60ff4059341b74be04a428a38f7c3630b0':
  x86: synth filter float: implement SSE2 version

Conflicts:
	libavcodec/x86/dcadsp.asm
	libavcodec/x86/dcadsp_init.c

See: 2cdbcc0048
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-28 20:38:39 +01:00
Christophe Gisquet
2cdbcc0048 x86: synth filter float: implement SSE2 version
Timings for Arrandale:
          C    SSE
win32:  2108   334
win64:  1152   322

Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.

Unrolling for ARCH_X86_64 is a 20 cycles gain.

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-28 20:34:40 +01:00
Michael Niedermayer
e346a59383 Merge commit 'ad507d7907457e678900bac132122ba7be4644cb'
* commit 'ad507d7907457e678900bac132122ba7be4644cb':
  x86: dcadsp: implement SSE lfe_dir

Conflicts:
	libavcodec/x86/dcadsp.asm

See: 169243112c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-28 19:22:00 +01:00
Christophe Gisquet
169243112c x86: dcadsp: implement SSE lfe_dir
Results for Arrandale/Windows:
32: 1670 -> 316
64:  728 -> 298

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-28 19:20:03 +01:00
Michael Niedermayer
5ba1648318 Merge commit 'b23650491fbd579a4365f42bd42575afb7b53f7e'
* commit 'b23650491fbd579a4365f42bd42575afb7b53f7e':
  prores: Use consistent names for DSP arch initialization functions

Conflicts:
	libavcodec/proresdsp.c
	libavcodec/proresdsp.h
	libavcodec/x86/proresdsp_init.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-28 17:13:00 +01:00
Christophe Gisquet
4cb6964244 dcadec: simplify decoding of VQ high frequencies
The vector dequantization has a test in a loop preventing effective SIMD
implementation. By moving it out of the loop, this loop can be DSPized.

Therefore, modify the current DSP implementation. In particular, the
DSP implementation no longer has to handle null loop sizes.

The decode_hf implementations have the following timings:

For x86 Arrandale:
        C  SSE SSE2 SSE4
win32: 260 162  119  104
win64: 242 N/A   89   72

The arm NEON optimizations follow in a later patch as external asm. The
now unused check for the y modifier in arm inline asm is removed from
configure.
2014-02-28 13:03:22 +01:00
Christophe Gisquet
08e3ea60ff x86: synth filter float: implement SSE2 version
Timings for Arrandale:
          C    SSE
win32:  2108   334
win64:  1152   322

Factorizing the inner loop with a call/jmp is a >15 cycles cost, even with
the jmp destination being aligned.

Unrolling for ARCH_X86_64 is a 20 cycles gain.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-02-28 13:00:48 +01:00
Christophe Gisquet
ad507d7907 x86: dcadsp: implement SSE lfe_dir
Results for Arrandale/Windows:
32: 1670 -> 316
64:  728 -> 298

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-02-28 13:00:47 +01:00
Diego Biurrun
b23650491f prores: Use consistent names for DSP arch initialization functions 2014-02-28 10:34:55 +01:00
James Almer
2163a40a46 x86/imdct36: use sse3 instructions in the last BUTTERF step when possible
Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-27 23:28:15 +01:00
James Almer
fbf98375e4 x86/imdct36: don't build imdct36_float_sse on x86_64 targets
There's an SSE2 version as well, and x86_64 guarantees that
instruction set is present.

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-27 22:54:03 +01:00
James Almer
3f3d748cab x86: Move XOP emulation to x86util
We need the emulation to support the cases where the first
argument is the same as the fourth. To achieve this a fifth
argument working as a temporary may be needed.
Emulation that doesn't obey the original instruction semantics
can't be in x86inc.

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-24 08:30:19 +01:00
Michael Niedermayer
8372aaf721 Merge commit '017a06a9ee86b047079166c2694c9c655ff03356'
* commit '017a06a9ee86b047079166c2694c9c655ff03356':
  x86: dsputil: Use correct file name as multiple inclusion guard

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-20 14:58:04 +01:00
Diego Biurrun
017a06a9ee x86: dsputil: Use correct file name as multiple inclusion guard 2014-02-20 04:16:15 -08:00
Michael Niedermayer
130c33af35 Merge commit 'b23bc95920e2f10b9621857e829c45b064f356c0'
* commit 'b23bc95920e2f10b9621857e829c45b064f356c0':
  x86: dca: Add missing multiple inclusion guards

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-19 15:44:48 +01:00
Diego Biurrun
b23bc95920 x86: dca: Add missing multiple inclusion guards 2014-02-19 10:19:15 +01:00
Hendrik Leppkes
7716eda0aa vp9/x86: set correct number of registers used in intra pred asm
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-18 17:20:14 +01:00
James Almer
07b4b0ca62 tta/x86: add ff_ttafilter_process_dec_{ssse3, sse4}
Results are from a Win64 build running on an AMD FX 6300

1121 decicycles in ttafilter_process_dec_c, 16777112 runs, 104 skips
522 decicycles in ff_ttafilter_process_dec_ssse3, 16777149 runs, 67 skips
477 decicycles in ff_ttafilter_process_dec_sse4, 16777156 runs, 60 skips

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-17 13:51:19 +01:00
Ronald S. Bultje
fdb093c4e4 vp9/x86: intra prediction SIMD.
Partially based on h264_intrapred. (I hope to eventually merge these
two intrapred implementations back together.)
2014-02-17 13:39:00 +01:00
James Almer
ec482e738d x86/flacdsp: add missing check to ff_flacdsp_init_x86()
Fixes compilation with flac decoder disabled and encoder enabled

Signed-off-by: James Almer <jamrial@gmail.com>
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-16 12:06:04 +01:00
Michael Niedermayer
d601106ab1 avcodec/x86/lossless_videodsp: fix w type
Fixes fate issues on mingw64

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-15 06:41:38 +01:00
Peter Ross
b8664c9294 avcodec/vp8dsp: add VP7 idct and loop filter
Signed-off-by: Peter Ross <pross@xvid.org>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-15 02:15:35 +01:00
James Almer
e87974bc00 flac/x86: add ff_flac_lpc_32_xop()
Tested on an AMD FX 6300

679081 decicycles in ff_flac_lpc_32_xop, 32768 runs
774425 decicycles in ff_flac_lpc_32_sse4, 32768 runs

Signed-off-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-13 22:14:59 +01:00
James Darnley
623f380a18 lavc: fix flac encoder and decoder dependencies
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-13 21:00:32 +01:00
Michael Niedermayer
df98b36aa6 Merge commit '5c1c6e82261b856214499b9fef3a08bf3ff6e0ae'
* commit '5c1c6e82261b856214499b9fef3a08bf3ff6e0ae':
  dca: include dcadsp.h in {arm,x86}/dca.h for checkheaders

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-08 17:25:31 +01:00
Michael Niedermayer
dd2b330347 Merge commit '0cffd6fff59f192120dc93aa6c3cb8180f5506e3'
* commit '0cffd6fff59f192120dc93aa6c3cb8180f5506e3':
  x86: use the inline int8x8_fmul_int32 only if inline SSE2 is available

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-08 17:06:57 +01:00
Janne Grunau
5c1c6e8226 dca: include dcadsp.h in {arm,x86}/dca.h for checkheaders 2014-02-08 13:38:36 +01:00
Janne Grunau
0cffd6fff5 x86: use the inline int8x8_fmul_int32 only if inline SSE2 is available
Fixes compilation with MSVC. Also does not rely on an earlier config.h
include but includes it directly.
2014-02-08 12:10:56 +01:00
Clément Bœsch
669d4f9053 x86/vp9lpf: simplify 2nd transpose in 44/48/88/84.
For non-avx optims, this saves 8 movs.

before:
  1785 decicycles in ff_vp9_loop_filter_h_44_16_ssse3, 524129 runs, 159 skips
  3327 decicycles in ff_vp9_loop_filter_h_48_16_ssse3, 262116 runs, 28 skips
  2712 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193729 runs, 575 skips
  3237 decicycles in ff_vp9_loop_filter_h_84_16_ssse3, 524061 runs, 227 skips

after:
  1768 decicycles in ff_vp9_loop_filter_h_44_16_ssse3, 524062 runs, 226 skips
  3310 decicycles in ff_vp9_loop_filter_h_48_16_ssse3, 262107 runs, 37 skips
  2719 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193954 runs, 350 skips
  3184 decicycles in ff_vp9_loop_filter_h_84_16_ssse3, 524236 runs, 52 skips
2014-02-08 11:10:23 +01:00
Michael Niedermayer
82ae8a44e6 Merge commit '5b59a9fc6152169599561f04b4f66370edda5c9c'
* commit '5b59a9fc6152169599561f04b4f66370edda5c9c':
  x86: dcadsp: implement int8x8_fmul_int32

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-08 01:20:33 +01:00
Christophe Gisquet
5b59a9fc61 x86: dcadsp: implement int8x8_fmul_int32
For the callable function (as opposed to the inline one):
         C  SSE  SSE2  SSE4
Win32:  47   42   29    26
Win64:  30   33   25    23
The SSE version is neither compiled nor set for ARCH_X86_64, as the
inlinable function takes over.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-02-07 22:52:40 +01:00
Loren Merritt
9c978f243a flac/x86: add ff_flac_lpc_32_sse4()
benchmarked on sandybridge x86_64:
1358232 decicycles in flac_lpc_32_c
1244575 decicycles in flac_lpc_32_sse4, James Almer's patch
 650045 decicycles in flac_lpc_32_sse4, this patch

I haven't tested the edge cases such as odd block lengths

odd block length tested-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-02-06 02:51:19 +01:00
Clément Bœsch
d92a725329 x86/vp9lpf: remove 8 SWAPs in 84/48 transpose. 2014-02-05 07:21:13 +01:00
Clément Bœsch
97dde561de x86/vp9lpf: remove braindead double pxor. 2014-02-05 07:21:11 +01:00
Clément Bœsch
9a3b05b0a9 x86/vp9lpf: save a few mov in flat8in/hev masks calc. 2014-02-05 07:21:09 +01:00
Clément Bœsch
91d85bb167 x86/vp9lpf: add ff_vp9_loop_filter_[vh]_44_16_{sse2,ssse3,avx}. 2014-02-05 07:21:06 +01:00
Michael Niedermayer
de17ccc774 Merge commit '51daafb02eaf96e0743a37ce95a7f5d02c1fa3c2'
* commit '51daafb02eaf96e0743a37ce95a7f5d02c1fa3c2':
  x86: videodsp: Properly mark sse2 instructions in emulated_edge_mc as such.

Conflicts:
	libavcodec/x86/videodsp_init.c

See: 1b3a7e1f42
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-31 14:30:30 +01:00
Clément Bœsch
c5dd73b890 x86/vp9lpf: add ff_vp9_loop_filter_h_{48,84}_16_{sse2,ssse3,avx}().
5.40s → 5.30s overall decode time with -threads 1 on ped1080p.webm
(i7 920, ssse3)
2014-01-30 19:34:13 +01:00
Ronald S. Bultje
9ee9c679a7 x86: videodsp: Fix a bug in a %if statement where we used '%%' instead of '&&'.
Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-01-30 15:33:23 +01:00
Ronald S. Bultje
51daafb02e x86: videodsp: Properly mark sse2 instructions in emulated_edge_mc as such.
Should fix crashes or corrupt output on pre-SSE2 CPUs when they were
using SSE2-code (e.g. AMD Athlon XP 2400+ or Intel Pentium III) in
hfix or hvar single-edge (left/right) extension functions.

Signed-off-by: Janne Grunau <janne-libav@jannau.net>
2014-01-30 15:30:01 +01:00
James Almer
644c32ea4b x86/vp9lpf: add ff_vp9_loop_filter_[vh]_88_16_sse2()
Similar gains as the ssse3 version once again

Signed-off-by: James Almer <jamrial@gmail.com>
2014-01-28 09:30:55 +01:00