Commit Graph

27198 Commits

Author SHA1 Message Date
James Almer
644c32ea4b x86/vp9lpf: add ff_vp9_loop_filter_[vh]_88_16_sse2()
Similar gains as the ssse3 version once again

Signed-off-by: James Almer <jamrial@gmail.com>
2014-01-28 09:30:55 +01:00
Clément Bœsch
222c46c531 x86/vp9lpf: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}.
9680 decicycles in loop_filter_v_88_16_c, 4193765 runs, 539 skips
9233 decicycles in loop_filter_h_88_16_c, 4193751 runs, 553 skips

1929 decicycles in ff_vp9_loop_filter_v_88_16_ssse3, 4194118 runs, 186 skips
2738 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193861 runs, 443 skips

5.978 → 5.417 overall decode time on ped1080p.webm (-threads 1)

Adding SSE2 support should be relatively trivial (just a matter of
changing the pshufb [mask_mix] with something else), patch welcome.
2014-01-28 07:36:38 +01:00
Michael Niedermayer
2a9c50798b avcodec/huffyuv: dont depend on bitstream_bpp having a specific value for version>2
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-28 00:27:57 +01:00
Michael Niedermayer
673ce8e46a avcodec/libfdk-aacenc: change MODE_7_1_FRONT_CENTER to map to AV_CH_LAYOUT_7POINT1_WIDE_BACK
This was suggested by Rodeo on IRC
<Rodeo> for consistency with the rest, MODE_7_1_FRONT_CENTER would be AV_CH_LAYOUT_7POINT1_WIDE_BACK (since LS+RS is mapped to back channels in other modes)

Reviewed-by: Jean First <jeanfirst@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-27 23:33:10 +01:00
Michael Niedermayer
a38842120a avcodec/libfdk-aacenc: change MODE_7_1_REAR_SURROUND to map to AV_CH_LAYOUT_7POINT1
This was suggested by Rodeo on IRC
<Rodeo> sorry, I meant MODE_7_1_REAR_SURROUND would probably be AV_CH_LAYOUT_7POINT1

Reviewed-by: Jean First <jeanfirst@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-27 23:33:10 +01:00
Clément Bœsch
822385d775 x86/vp9lpf: add a preload system in FILTER_UPDATE.
Allow some macro refactoring in filter14().
2014-01-27 22:39:26 +01:00
Clément Bœsch
315b4775ad x86/vp9lpf: refactor v/h using common macros for P7 to Q7. 2014-01-27 22:39:26 +01:00
Clément Bœsch
5d144086cc x86/vp9lpf: faster P7..Q7 accesses.
Introduce 2 additional registers for stride3 and mstride3 to allow
direct accesses (lea drops).

3931 → 3827 decicycles in ff_vp9_loop_filter_v_16_16_ssse3

Also uses defines to clarify the code.
2014-01-27 22:37:42 +01:00
Carl Eugen Hoyos
05e5bb6107 Fix decoding of some 8 < bpc < 16 signed j2k samples with libopenjpeg.
No testcase known.

Reviewed-by: Michael Bradshaw
2014-01-27 14:38:59 +01:00
Rainer Hochecker
bceeccc648 dxva2: bump maximum number of slieces for mpeg2
Suggested by heleppkes on https://trac.ffmpeg.org/ticket/3133

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-27 14:24:29 +01:00
Michael Niedermayer
6369766f01 avcodec/huffyuv: support gbrp9/10/12/14
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-27 02:11:29 +01:00
Michael Niedermayer
7cf8918b0d avcodec/huffyuv: update years in copyright
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-27 01:45:57 +01:00
Ronald S. Bultje
c2871568cf vp9: fix invalid ref frame w/h on size change.
Fixes invalid reads and crashes in vp90-2-05-resize.webm and fuzzed6.ivf.
The output is still not identical to what libvpx does (because we don't
actually scale in MC).

Reviewed-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 20:16:01 +01:00
Ronald S. Bultje
d9343c3484 vp9: disable use_last_frame_mvs on resolution change (scalable).
Prevents some invalid memory accesses after resolution change in
vp90-2-05-resize.webm, and libvpx does this too.

Reviewed-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 20:15:45 +01:00
Michael Niedermayer
e6c0da70fc avcodec/huffyuvdec: optimize >8bps VLC reading
97479 -> 54891 decicycles

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 19:59:57 +01:00
Michael Niedermayer
599e629f88 avcodec/huffyuvenc: fix end pointer for stats_out
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 16:24:36 +01:00
Michael Niedermayer
a301bb63f0 avcodec/huffyuvenc: fail if stats_out is too small instead of silently truncating
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 16:23:45 +01:00
Jean First
91489d28ba avcodec/libfdk_aacenc: enable 7.1 channel encoding
7.1(wide) and 7.1(wide-side) channel layouts are supported in fdk_aac since october 2013 (commit fa3eba1644)

Signed-off-by: Jean First <jeanfirst@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 03:31:42 +01:00
Michael Niedermayer
7667afffb8 avcodec/mpeg12dec: Revert Change to mpeg2_fast_decode_block_non_intra
This fixes the speed regression from 20626f53e9
and still checks sufficiently to prevent out of allocated memory accesses
due to the index

Before:
1823 decicycles in mpeg2_fast_decode_block_non_intra, 8388493 runs, 115 skips
After:
1808 decicycles in mpeg2_fast_decode_block_non_intra, 8388494 runs, 114 skips

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 02:57:40 +01:00
Michael Niedermayer
6a92598e14 avcodec/mpeg12dec: Redesign index checks for mpeg2_fast_decode_block_intra
This fixes the speed regression from 20626f53e9
and still checks sufficiently to prevent out of allocated memory accesses
due to the index

Before:
1681 decicycles in mpeg2_fast_decode_block_intra, 4194238 runs, 66 skips
After:
1658 decicycles in mpeg2_fast_decode_block_intra, 4194248 runs, 56 skips

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 02:57:29 +01:00
Michael Niedermayer
20626f53e9 Merge commit '6d93307f8df81808f0dcdbc064b848054a6e83b3'
* commit '6d93307f8df81808f0dcdbc064b848054a6e83b3':
  mpeg12: check scantable indices in all decode_block functions

Benchmarks

Before:
1878 decicycles in mpeg2_decode_block_non_intra, 8388487 runs, 121 skips
1700 decicycles in mpeg2_decode_block_intra, 4194239 runs, 65 skips
1808 decicycles in mpeg2_fast_decode_block_non_intra, 8388492 runs, 116 skips
1669 decicycles in mpeg2_fast_decode_block_intra, 4194248 runs, 56 skips
--
2056 decicycles in mpeg1_decode_block_inter, 65535 runs, 1 skips
2346 decicycles in mpeg1_decode_block_intra, 32768 runs, 0 skips
2011 decicycles in mpeg1_fast_decode_block_inter, 65533 runs, 3 skips
----------------
After:
1858 decicycles in mpeg2_decode_block_non_intra, 8388490 runs, 118 skips
1691 decicycles in mpeg2_decode_block_intra, 4194233 runs, 71 skips
1823 decicycles in mpeg2_fast_decode_block_non_intra, 8388493 runs, 115 skips
1681 decicycles in mpeg2_fast_decode_block_intra, 4194238 runs, 66 skips
--
2010 decicycles in mpeg1_decode_block_inter, 65535 runs, 1 skips
2322 decicycles in mpeg1_decode_block_intra, 32766 runs, 2 skips
1995 decicycles in mpeg1_fast_decode_block_inter, 65535 runs, 1 skips

All benchmarks are the best scores of several runs

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 02:52:21 +01:00
Michael Niedermayer
965fa6b0d9 Merge commit 'fb0c9d41d685abb58575c5482ca33b8cd457c5ec'
* commit 'fb0c9d41d685abb58575c5482ca33b8cd457c5ec':
  avutil: remove timer.h include from internal.h

Conflicts:
	libavcodec/ffv1dec.c
	libavutil/internal.h

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 01:54:55 +01:00
Michael Niedermayer
53167ecfdb avcodec/huffyuv: support AV_PIX_FMT_YUV(A)4XYP16 and GRAY16
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 00:23:03 +01:00
Janne Grunau
6d93307f8d mpeg12: check scantable indices in all decode_block functions
Add checks to the fast functions used with CODEC_FLAGS2_FAST and move
the check for all other functions to before the invalid memory is
accessed. Fixes https://trac.videolan.org/vlc/ticket/9713 with
CODEC_FLAGS2_FAST.

CC: libav-stable@libav.org
2014-01-25 21:50:20 +01:00
Janne Grunau
fb0c9d41d6 avutil: remove timer.h include from internal.h
Added libavutil/timer.h include to all files with {START,STOP}_TIMER.
2014-01-25 21:50:20 +01:00
Michael Niedermayer
018e2b57ca avcodec/libx264: also consider ticks per frame for fps/timebase setup
Setting fps = 1/timebase is not correct

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-25 16:31:30 +01:00
Clément Bœsch
5f4d04d084 x86/lossless_videodsp: silly one-line cosmetic. 2014-01-25 16:24:50 +01:00
Clément Bœsch
5267e85056 x86/lossless_videodsp: use common macro for add and diff int16 loop. 2014-01-25 14:27:37 +01:00
Clément Bœsch
cddbfd2a95 x86/lossless_videodsp: simplify and explicit aligned/unaligned flags 2014-01-25 11:59:43 +01:00
Michael Niedermayer
5554c6dd45 Merge remote-tracking branch 'rbultje/vp9-simd'
* rbultje/vp9-simd:
  vp9: fix memory corruption if header decoding fails after size change.
  vp9/x86: use explicit register for relative stack references.
  vp9/x86: iwht4x4 (lossless) mmx.
  vp9/x86: 4x4 iadst SIMD (ssse3) variants.
  vp9/x86: 8x8 iadst SIMD (ssse3/avx) variants.

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-25 01:43:54 +01:00
Michael Niedermayer
4b84a69ebb Merge remote-tracking branch 'qatar/master'
* qatar/master:
  dxtory: compressed RGB555/RGB565 decoding support

Conflicts:
	libavcodec/dxtory.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-25 01:39:19 +01:00
Michael Niedermayer
2d0d1f7eb3 Merge commit '0e1ad2f591b87e944550c15b54e54f8189743289'
* commit '0e1ad2f591b87e944550c15b54e54f8189743289':
  dxtory: add more compressed and uncompressed modes

Conflicts:
	libavcodec/dxtory.c

Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-25 01:33:42 +01:00
Ronald S. Bultje
4147b337c1 vp9: fix memory corruption if header decoding fails after size change. 2014-01-24 19:25:26 -05:00
Ronald S. Bultje
c9e6325ed9 vp9/x86: use explicit register for relative stack references.
Before this patch, we explicitly modify rsp, which isn't necessarily
universally acceptable, since the space under the stack pointer might
be modified in things like signal handlers. Therefore, use an explicit
register to hold the stack pointer relative to the bottom of the stack
(i.e. rsp). This will also clear out valgrind errors about the use of
uninitialized data that started occurring after the idct16x16/ssse3
optimizations were first merged.
2014-01-24 19:25:25 -05:00
Ronald S. Bultje
97474d527f vp9/x86: iwht4x4 (lossless) mmx. 2014-01-24 19:25:25 -05:00
Ronald S. Bultje
d43efa68bd vp9/x86: 4x4 iadst SIMD (ssse3) variants.
Cycle measurements for intra itxfm_4x4_add on ped1080p.webm:
idct_idct:    66 -> 67 cycles (noise measurement)
idct_iadst:  199 -> 79 cycles
iadst_idct:  165 -> 70 cycles
iadst_iadst: 183 -> 82 cycles
2014-01-24 19:25:25 -05:00
Ronald S. Bultje
baf47020cd vp9/x86: 8x8 iadst SIMD (ssse3/avx) variants.
Cycle measurements for intra itxfm_8x8_add on ped1080p.webm:
idct_idct:   133 -> 135 cycles (noise measurement)
idct_iadst:  900 -> 241 cycles
iadst_idct:  864 -> 215 cycles
iadst_iadst: 973 -> 310 cycles
2014-01-24 19:25:25 -05:00
Michael Niedermayer
cf812d8129 avcodec/dvbsubdec: Remove unused display_list_size
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-25 01:22:48 +01:00
Wim Vander Schelden
af09be4f4b Fixed a memory leak in dvbsubenc.c: sub->num_rects was reduced without freeing the associated rects.
Signed-off-by: Wim Vander Schelden <lists@fixnum.org>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-25 00:41:57 +01:00
Kostya Shishkov
28e1eed3c2 dxtory: compressed RGB555/RGB565 decoding support 2014-01-24 20:09:51 +01:00
Kostya Shishkov
0e1ad2f591 dxtory: add more compressed and uncompressed modes 2014-01-24 20:09:44 +01:00
Michael Niedermayer
934bb11ad7 avcodec/mpeg12dec: fix mis-indented line
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-24 18:09:17 +01:00
Michael Niedermayer
5f54756f7e avcodec/mpeg12dec: Disable the checked bitstream reader
Mpeg1/2 should not need it

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-24 18:09:17 +01:00
Michael Niedermayer
76b5e99ce9 avcodec/mpeg12dec: Check for overread in mpeg_decode_slice()
This is needed in case the checked bitstream reader is disabled

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-24 18:09:17 +01:00
Michael Niedermayer
d82eccea2b avcodec/mpeg12dec: check block index in mpeg2_fast_decode_block_non_intra()
Prevents some overreads at the cost of 1 cpu cycle

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-24 18:09:16 +01:00
Michael Niedermayer
0c8e5fb211 avcodec/mpeg12dec: Optimize mpeg1_decode_block_intra()
sandybridge i7 274->260 cycles

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-24 18:09:16 +01:00
Michael Niedermayer
0a59055167 avcodec/mpeg12dec: check for overread in mpeg1_fast_decode_block_inter()
No speedloss meassured

Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-24 18:09:16 +01:00
Michael Niedermayer
746350ea0f avcodec/mpeg12dec: Make mpeg2_fast_decode_block_intra() more robust by breaking out on invalid vlcs
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-24 18:09:16 +01:00
Guillaume Martres
50866c8d95 vp9: fix bugs in updating coef probabilities with parallelmode=1
- The memcpy was completely wrong because
s->prob_ctx[s->framectxid].coef is a [4][2][2][6][6][3] array, whereas
s->prob.coef is a [4][2][2][6][6][11] array.
- The additional check was committed to ffmpeg by Ronald S. Bultje.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-01-24 07:03:11 +01:00
Ronald S. Bultje
bd01412313 vp9: fix mvref finding to adhere to bug in libvpx.
Fixes a particular youtube video that I unfortunately can't share.

Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-01-24 07:02:56 +01:00