James Almer
644c32ea4b
x86/vp9lpf: add ff_vp9_loop_filter_[vh]_88_16_sse2()
Similar gains to the ssse3 version once again
Signed-off-by: James Almer <jamrial@gmail.com>
2014-01-28 09:30:55 +01:00
Clément Bœsch
222c46c531
x86/vp9lpf: add ff_vp9_loop_filter_[vh]_88_16_{ssse3,avx}.
9680 decicycles in loop_filter_v_88_16_c, 4193765 runs, 539 skips
9233 decicycles in loop_filter_h_88_16_c, 4193751 runs, 553 skips
1929 decicycles in ff_vp9_loop_filter_v_88_16_ssse3, 4194118 runs, 186 skips
2738 decicycles in ff_vp9_loop_filter_h_88_16_ssse3, 4193861 runs, 443 skips
5.978 → 5.417 overall decode time on ped1080p.webm (-threads 1)
Adding SSE2 support should be relatively trivial (just a matter of
replacing the pshufb [mask_mix] with something else); patches welcome.
2014-01-28 07:36:38 +01:00
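The speedups implied by the figures in this commit can be worked out directly. Below is a minimal C sketch that computes a speedup ratio and a relative change from the numbers quoted above. The helper names are hypothetical, and the log does not state the unit of the decode times.

```c
/* Ratio of the reference (C) cost to the optimized (SIMD) cost.
 * A value above 1.0 means the optimized version is faster. */
static double speedup(double before, double after)
{
    return before / after;
}

/* Relative change in percent. A negative value means "after" is smaller. */
static double percent_change(double before, double after)
{
    return (after - before) / before * 100.0;
}
```

With these helpers, speedup(9680, 1929) is about 5.0 for the vertical filter and speedup(9233, 2738) is about 3.4 for the horizontal one. percent_change(5.978, 5.417) is about -9.4% of overall decode time.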
Michael Niedermayer
2a9c50798b
avcodec/huffyuv: don't depend on bitstream_bpp having a specific value for version>2
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-28 00:27:57 +01:00
Michael Niedermayer
673ce8e46a
avcodec/libfdk-aacenc: change MODE_7_1_FRONT_CENTER to map to AV_CH_LAYOUT_7POINT1_WIDE_BACK
This was suggested by Rodeo on IRC
<Rodeo> for consistency with the rest, MODE_7_1_FRONT_CENTER would be AV_CH_LAYOUT_7POINT1_WIDE_BACK (since LS+RS is mapped to back channels in other modes)
Reviewed-by: Jean First <jeanfirst@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-27 23:33:10 +01:00
Michael Niedermayer
a38842120a
avcodec/libfdk-aacenc: change MODE_7_1_REAR_SURROUND to map to AV_CH_LAYOUT_7POINT1
This was suggested by Rodeo on IRC
<Rodeo> sorry, I meant MODE_7_1_REAR_SURROUND would probably be AV_CH_LAYOUT_7POINT1
Reviewed-by: Jean First <jeanfirst@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-27 23:33:10 +01:00
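Taken together, the two libfdk-aacenc commits above amount to a small mode-to-layout table. Below is a sketch of that mapping as a lookup table. It uses string names in place of the real enum and AV_CH_LAYOUT_* values so it compiles without the libfdk-aac or libavutil headers. layout_for_mode() is a hypothetical helper, not the encoder's actual code.

```c
#include <stddef.h>
#include <string.h>

/* One fdk-aac channel mode paired with the FFmpeg layout it maps to.
 * Strings stand in for the real enum/macro values (assumption). */
struct mode_layout {
    const char *fdk_mode;
    const char *av_layout;
};

/* Pairs as described in the two commits above. */
static const struct mode_layout mode_map[] = {
    { "MODE_7_1_REAR_SURROUND", "AV_CH_LAYOUT_7POINT1"           },
    { "MODE_7_1_FRONT_CENTER",  "AV_CH_LAYOUT_7POINT1_WIDE_BACK" },
};

/* Returns the layout name for a mode, or NULL if the mode is not listed. */
static const char *layout_for_mode(const char *mode)
{
    for (size_t i = 0; i < sizeof(mode_map) / sizeof(mode_map[0]); i++)
        if (strcmp(mode_map[i].fdk_mode, mode) == 0)
            return mode_map[i].av_layout;
    return NULL;
}
```

Keeping the pairs in one table makes the consistency argument from IRC easy to check: both 7.1 modes map LS+RS to back channels, matching the other modes.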
Clément Bœsch
822385d775
x86/vp9lpf: add a preload system in FILTER_UPDATE.
Allows some macro refactoring in filter14().
2014-01-27 22:39:26 +01:00
Clément Bœsch
315b4775ad
x86/vp9lpf: refactor v/h using common macros for P7 to Q7.
2014-01-27 22:39:26 +01:00
Clément Bœsch
5d144086cc
x86/vp9lpf: faster P7..Q7 accesses.
Introduce 2 additional registers for stride3 and mstride3 to allow
direct accesses (dropping the lea instructions).
3931 → 3827 decicycles in ff_vp9_loop_filter_v_16_16_ssse3
Also uses defines to clarify the code.
2014-01-27 22:37:42 +01:00
Carl Eugen Hoyos
05e5bb6107
Fix decoding of some 8 < bpc < 16 signed j2k samples with libopenjpeg.
No testcase known.
Reviewed-by: Michael Bradshaw
2014-01-27 14:38:59 +01:00
Rainer Hochecker
bceeccc648
dxva2: bump maximum number of slices for mpeg2
Suggested by heleppkes on https://trac.ffmpeg.org/ticket/3133
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-27 14:24:29 +01:00
Michael Niedermayer
6369766f01
avcodec/huffyuv: support gbrp9/10/12/14
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-27 02:11:29 +01:00
Michael Niedermayer
7cf8918b0d
avcodec/huffyuv: update years in copyright
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-27 01:45:57 +01:00
Ronald S. Bultje
c2871568cf
vp9: fix invalid ref frame w/h on size change.
Fixes invalid reads and crashes in vp90-2-05-resize.webm and fuzzed6.ivf.
The output is still not identical to what libvpx does (because we don't
actually scale in MC).
Reviewed-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 20:16:01 +01:00
Ronald S. Bultje
d9343c3484
vp9: disable use_last_frame_mvs on resolution change (scalable).
Prevents some invalid memory accesses after resolution change in
vp90-2-05-resize.webm, and libvpx does this too.
Reviewed-by: ubitux
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 20:15:45 +01:00
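The rule in this commit can be expressed as a simple predicate. Below is a sketch in which struct frame_info and can_use_last_frame_mvs() are hypothetical names. It reuses previous-frame motion vectors only when the frame dimensions are unchanged.

```c
#include <stdbool.h>

/* Hypothetical per-frame state; field names are illustrative. */
struct frame_info {
    int width;
    int height;
};

/* Reuse the previous frame's motion vectors only if the bitstream asks for
 * it AND the resolution did not change. After a resize, the stored MV data
 * presumably no longer matches the new block grid, so reading it could
 * access invalid memory. */
static bool can_use_last_frame_mvs(bool wanted, const struct frame_info *prev,
                                   const struct frame_info *cur)
{
    return wanted && prev->width == cur->width && prev->height == cur->height;
}
```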
Michael Niedermayer
e6c0da70fc
avcodec/huffyuvdec: optimize >8bps VLC reading
97479 -> 54891 decicycles
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 19:59:57 +01:00
Michael Niedermayer
599e629f88
avcodec/huffyuvenc: fix end pointer for stats_out
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 16:24:36 +01:00
Michael Niedermayer
a301bb63f0
avcodec/huffyuvenc: fail if stats_out is too small instead of silently truncating
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 16:23:45 +01:00
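Failing instead of truncating usually comes down to checking whether each write fits in the remaining space. Below is a minimal C sketch of that pattern using the return value of snprintf(). append_stat() and its space-separated output format are hypothetical and not taken from huffyuvenc.

```c
#include <stdio.h>

/* Appends one statistics value to buf. *pos is the current write offset
 * and size is the total buffer size (including room for the NUL).
 * Returns 0 on success, or -1 if the value would not fit, instead of
 * silently truncating it. */
static int append_stat(char *buf, size_t size, size_t *pos, unsigned value)
{
    size_t remaining = size - *pos;
    int n = snprintf(buf + *pos, remaining, "%u ", value);

    /* snprintf returns the length it WOULD have written, so n >= remaining
     * means the output was cut off. */
    if (n < 0 || (size_t)n >= remaining)
        return -1;
    *pos += (size_t)n;
    return 0;
}
```

Because *pos only advances on success and stays strictly below size, the end of the buffer is tracked in one place. The preceding end-pointer fix addresses the same concern.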
Jean First
91489d28ba
avcodec/libfdk_aacenc: enable 7.1 channel encoding
7.1(wide) and 7.1(wide-side) channel layouts are supported in fdk_aac since October 2013 (commit fa3eba1644)
Signed-off-by: Jean First <jeanfirst@gmail.com>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 03:31:42 +01:00
Michael Niedermayer
7667afffb8
avcodec/mpeg12dec: Revert Change to mpeg2_fast_decode_block_non_intra
This fixes the speed regression from 20626f53e9
and still checks sufficiently to prevent out-of-bounds memory accesses
caused by the index
Before:
1823 decicycles in mpeg2_fast_decode_block_non_intra, 8388493 runs, 115 skips
After:
1808 decicycles in mpeg2_fast_decode_block_non_intra, 8388494 runs, 114 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 02:57:40 +01:00
Michael Niedermayer
6a92598e14
avcodec/mpeg12dec: Redesign index checks for mpeg2_fast_decode_block_intra
This fixes the speed regression from 20626f53e9
and still checks sufficiently to prevent out-of-bounds memory accesses
caused by the index
Before:
1681 decicycles in mpeg2_fast_decode_block_intra, 4194238 runs, 66 skips
After:
1658 decicycles in mpeg2_fast_decode_block_intra, 4194248 runs, 56 skips
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 02:57:29 +01:00
Michael Niedermayer
20626f53e9
Merge commit '6d93307f8df81808f0dcdbc064b848054a6e83b3'
* commit '6d93307f8df81808f0dcdbc064b848054a6e83b3':
mpeg12: check scantable indices in all decode_block functions
Benchmarks
Before:
1878 decicycles in mpeg2_decode_block_non_intra, 8388487 runs, 121 skips
1700 decicycles in mpeg2_decode_block_intra, 4194239 runs, 65 skips
1808 decicycles in mpeg2_fast_decode_block_non_intra, 8388492 runs, 116 skips
1669 decicycles in mpeg2_fast_decode_block_intra, 4194248 runs, 56 skips
--
2056 decicycles in mpeg1_decode_block_inter, 65535 runs, 1 skips
2346 decicycles in mpeg1_decode_block_intra, 32768 runs, 0 skips
2011 decicycles in mpeg1_fast_decode_block_inter, 65533 runs, 3 skips
----------------
After:
1858 decicycles in mpeg2_decode_block_non_intra, 8388490 runs, 118 skips
1691 decicycles in mpeg2_decode_block_intra, 4194233 runs, 71 skips
1823 decicycles in mpeg2_fast_decode_block_non_intra, 8388493 runs, 115 skips
1681 decicycles in mpeg2_fast_decode_block_intra, 4194238 runs, 66 skips
--
2010 decicycles in mpeg1_decode_block_inter, 65535 runs, 1 skips
2322 decicycles in mpeg1_decode_block_intra, 32766 runs, 2 skips
1995 decicycles in mpeg1_fast_decode_block_inter, 65535 runs, 1 skips
All benchmarks are the best scores of several runs
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 02:52:21 +01:00
Michael Niedermayer
965fa6b0d9
Merge commit 'fb0c9d41d685abb58575c5482ca33b8cd457c5ec'
* commit 'fb0c9d41d685abb58575c5482ca33b8cd457c5ec':
avutil: remove timer.h include from internal.h
Conflicts:
libavcodec/ffv1dec.c
libavutil/internal.h
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 01:54:55 +01:00
Michael Niedermayer
53167ecfdb
avcodec/huffyuv: support AV_PIX_FMT_YUV(A)4XYP16 and GRAY16
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-26 00:23:03 +01:00
Janne Grunau
6d93307f8d
mpeg12: check scantable indices in all decode_block functions
Add checks to the fast functions used with CODEC_FLAG2_FAST and move
the check in all other functions so that it runs before the invalid
memory is accessed. Fixes https://trac.videolan.org/vlc/ticket/9713 with
CODEC_FLAG2_FAST.
CC: libav-stable@libav.org
2014-01-25 21:50:20 +01:00
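The key point in this fix is ordering: validate the scan index before using it to address memory. Below is a sketch of a run/level placement loop. It assumes a hypothetical struct run_level produced by an earlier VLC step and a starting index of -1, as for non-intra blocks. This is not the actual mpeg12dec.c code.

```c
#include <stdint.h>

/* Hypothetical (run, level) pair as produced by VLC decoding. */
struct run_level {
    int run;    /* number of zero coefficients before this one */
    int level;  /* coefficient value */
};

/* Places decoded coefficients into an 8x8 block through a scan table.
 * The index is checked BEFORE scantable[] and block[] are touched, so a
 * corrupt run length cannot cause an out-of-bounds access.
 * Returns 0 on success, -1 on an invalid index. */
static int place_coeffs(int16_t block[64], const uint8_t scantable[64],
                        const struct run_level *rl, int count)
{
    int i = -1;

    for (int k = 0; k < count; k++) {
        i += rl[k].run + 1;
        if (i > 63)
            return -1;                 /* check first ... */
        block[scantable[i]] = (int16_t)rl[k].level;  /* ... then access */
    }
    return 0;
}
```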
Janne Grunau
fb0c9d41d6
avutil: remove timer.h include from internal.h
Added libavutil/timer.h include to all files with {START,STOP}_TIMER.
2014-01-25 21:50:20 +01:00
Michael Niedermayer
018e2b57ca
avcodec/libx264: also consider ticks per frame for fps/timebase setup
Setting fps = 1/timebase is not correct
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-25 16:31:30 +01:00
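When a frame spans several time-base ticks, the frame rate is not simply the inverse of the time base. Below is a sketch of the corrected computation, assuming fps = time_base.den / (time_base.num * ticks_per_frame). struct rational is a stand-in for FFmpeg's AVRational, and fps_from_timebase() is hypothetical.

```c
/* Minimal stand-in for a rational number (FFmpeg uses AVRational). */
struct rational {
    int num;
    int den;
};

/* Frame rate implied by a codec time base when each frame spans
 * ticks_per_frame ticks. Inverting the time base alone is only
 * correct when ticks_per_frame == 1. */
static struct rational fps_from_timebase(struct rational tb, int ticks_per_frame)
{
    struct rational fps = { tb.den, tb.num * ticks_per_frame };
    return fps;
}
```

For example, a time base of 1/50 with ticks_per_frame = 2 gives 50/2, i.e. 25 fps. Inverting the time base alone would report 50 fps.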
Clément Bœsch
5f4d04d084
x86/lossless_videodsp: silly one-line cosmetic.
2014-01-25 16:24:50 +01:00
Clément Bœsch
5267e85056
x86/lossless_videodsp: use common macro for add and diff int16 loop.
2014-01-25 14:27:37 +01:00
Clément Bœsch
cddbfd2a95
x86/lossless_videodsp: simplify and make aligned/unaligned flags explicit
2014-01-25 11:59:43 +01:00
Michael Niedermayer
5554c6dd45
Merge remote-tracking branch 'rbultje/vp9-simd'
* rbultje/vp9-simd:
vp9: fix memory corruption if header decoding fails after size change.
vp9/x86: use explicit register for relative stack references.
vp9/x86: iwht4x4 (lossless) mmx.
vp9/x86: 4x4 iadst SIMD (ssse3) variants.
vp9/x86: 8x8 iadst SIMD (ssse3/avx) variants.
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-25 01:43:54 +01:00
Michael Niedermayer
4b84a69ebb
Merge remote-tracking branch 'qatar/master'
* qatar/master:
dxtory: compressed RGB555/RGB565 decoding support
Conflicts:
libavcodec/dxtory.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-25 01:39:19 +01:00
Michael Niedermayer
2d0d1f7eb3
Merge commit '0e1ad2f591b87e944550c15b54e54f8189743289'
* commit '0e1ad2f591b87e944550c15b54e54f8189743289':
dxtory: add more compressed and uncompressed modes
Conflicts:
libavcodec/dxtory.c
Merged-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-25 01:33:42 +01:00
Ronald S. Bultje
4147b337c1
vp9: fix memory corruption if header decoding fails after size change.
2014-01-24 19:25:26 -05:00
Ronald S. Bultje
c9e6325ed9
vp9/x86: use explicit register for relative stack references.
Before this patch, we explicitly modified rsp, which isn't necessarily
universally acceptable, since the space under the stack pointer might
be modified in things like signal handlers. Therefore, use an explicit
register to hold the stack pointer relative to the bottom of the stack
(i.e. rsp). This will also clear out valgrind errors about the use of
uninitialized data that started occurring after the idct16x16/ssse3
optimizations were first merged.
2014-01-24 19:25:25 -05:00
Ronald S. Bultje
97474d527f
vp9/x86: iwht4x4 (lossless) mmx.
2014-01-24 19:25:25 -05:00
Ronald S. Bultje
d43efa68bd
vp9/x86: 4x4 iadst SIMD (ssse3) variants.
Cycle measurements for intra itxfm_4x4_add on ped1080p.webm:
idct_idct: 66 -> 67 cycles (within measurement noise)
idct_iadst: 199 -> 79 cycles
iadst_idct: 165 -> 70 cycles
iadst_iadst: 183 -> 82 cycles
2014-01-24 19:25:25 -05:00
Ronald S. Bultje
baf47020cd
vp9/x86: 8x8 iadst SIMD (ssse3/avx) variants.
Cycle measurements for intra itxfm_8x8_add on ped1080p.webm:
idct_idct: 133 -> 135 cycles (within measurement noise)
idct_iadst: 900 -> 241 cycles
iadst_idct: 864 -> 215 cycles
iadst_iadst: 973 -> 310 cycles
2014-01-24 19:25:25 -05:00
Michael Niedermayer
cf812d8129
avcodec/dvbsubdec: Remove unused display_list_size
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-25 01:22:48 +01:00
Wim Vander Schelden
af09be4f4b
Fixed a memory leak in dvbsubenc.c: sub->num_rects was reduced without freeing the associated rects.
Signed-off-by: Wim Vander Schelden <lists@fixnum.org>
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-25 00:41:57 +01:00
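The leak pattern here is shrinking a count without releasing what lies beyond it. Below is a sketch of the safe version. It uses hypothetical struct subtitle and struct rect stand-ins rather than FFmpeg's real subtitle types, and frees each dropped rectangle before num_rects moves past it.

```c
#include <stdlib.h>

/* Hypothetical stand-ins for a subtitle and its rectangles. */
struct rect {
    unsigned char *pixels;
};

struct subtitle {
    unsigned num_rects;
    struct rect **rects;
};

/* Drops every rectangle at index >= keep. Each one is freed as the count
 * shrinks, so nothing beyond num_rects is left allocated and unreachable. */
static void truncate_rects(struct subtitle *sub, unsigned keep)
{
    while (sub->num_rects > keep) {
        struct rect *r = sub->rects[--sub->num_rects];
        free(r->pixels);
        free(r);
        sub->rects[sub->num_rects] = NULL;  /* avoid a dangling pointer */
    }
}
```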
Kostya Shishkov
28e1eed3c2
dxtory: compressed RGB555/RGB565 decoding support
2014-01-24 20:09:51 +01:00
Kostya Shishkov
0e1ad2f591
dxtory: add more compressed and uncompressed modes
2014-01-24 20:09:44 +01:00
Michael Niedermayer
934bb11ad7
avcodec/mpeg12dec: fix mis-indented line
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-24 18:09:17 +01:00
Michael Niedermayer
5f54756f7e
avcodec/mpeg12dec: Disable the checked bitstream reader
Mpeg1/2 should not need it
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-24 18:09:17 +01:00
Michael Niedermayer
76b5e99ce9
avcodec/mpeg12dec: Check for overread in mpeg_decode_slice()
This is needed in case the checked bitstream reader is disabled
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-24 18:09:17 +01:00
Michael Niedermayer
d82eccea2b
avcodec/mpeg12dec: check block index in mpeg2_fast_decode_block_non_intra()
Prevents some overreads at the cost of 1 CPU cycle
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-24 18:09:16 +01:00
Michael Niedermayer
0c8e5fb211
avcodec/mpeg12dec: Optimize mpeg1_decode_block_intra()
Sandy Bridge i7: 274 -> 260 cycles
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-24 18:09:16 +01:00
Michael Niedermayer
0a59055167
avcodec/mpeg12dec: check for overread in mpeg1_fast_decode_block_inter()
No speed loss measured
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-24 18:09:16 +01:00
Michael Niedermayer
746350ea0f
avcodec/mpeg12dec: Make mpeg2_fast_decode_block_intra() more robust by breaking out on invalid vlcs
Signed-off-by: Michael Niedermayer <michaelni@gmx.at>
2014-01-24 18:09:16 +01:00
Guillaume Martres
50866c8d95
vp9: fix bugs in updating coef probabilities with parallelmode=1
- The memcpy was completely wrong because
s->prob_ctx[s->framectxid].coef is a [4][2][2][6][6][3] array, whereas
s->prob.coef is a [4][2][2][6][6][11] array.
- The additional check was committed to ffmpeg by Ronald S. Bultje.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-01-24 07:03:11 +01:00
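The shape mismatch is easiest to see in code. Below is a sketch that uses the array dimensions quoted in this commit. It assumes the saved context should hold the first 3 of the 11 values per entry. save_coef_probs() and the typedef names are hypothetical.

```c
#include <stdint.h>
#include <string.h>

/* Shapes from the commit message: the saved context keeps 3 values per
 * entry, while the working probabilities keep 11. */
typedef uint8_t coef_ctx_t[4][2][2][6][6][3];
typedef uint8_t coef_work_t[4][2][2][6][6][11];

/* One memcpy over the whole array cannot work: the innermost rows are
 * 3 vs 11 bytes, so element offsets in the two arrays do not line up.
 * Instead, copy the first 3 bytes of every innermost row. */
static void save_coef_probs(coef_ctx_t dst, coef_work_t src)
{
    for (int i = 0; i < 4; i++)
        for (int j = 0; j < 2; j++)
            for (int k = 0; k < 2; k++)
                for (int l = 0; l < 6; l++)
                    for (int m = 0; m < 6; m++)
                        memcpy(dst[i][j][k][l][m], src[i][j][k][l][m], 3);
}
```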
Ronald S. Bultje
bd01412313
vp9: fix mvref finding to adhere to bug in libvpx.
Fixes a particular youtube video that I unfortunately can't share.
Signed-off-by: Anton Khirnov <anton@khirnov.net>
2014-01-24 07:02:56 +01:00