This patch adds MSA (MIPS-SIMD-Arch) optimizations for VP9 MC functions in new file vp9_mc_msa.c
Signed-off-by: Shivraj Patil <shivraj.patil@imgtec.com>
Reviewed-by: "Ronald S. Bultje" <rsbultje@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: b4b47bc2b3fb7ca710bfffe5aa969e37_signal_sigabrt_7ffff70eccc9_744_nc_sample2.avi with memlimit of 4194304
Found-by: Samuel Groß, Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
* commit '4512ee78e19fdb011bdec1b3a8dc0b315c82a81e':
mpegts: Mark the muxer as supporting variable fps
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
* commit 'c88c5eef53ff1619724ba47b722da64ec0593dab':
hevc: Split the struct setup from the pps parsing
Conflicts:
libavcodec/hevc_ps.c
Merged-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes use of uninitialized memory
Fixes: a96874b9466b6edc660a519c7ad47977_signal_sigsegv_7ffff713351a_744_nc_sample.avi with memlimit 2147483648
Found-by: Samuel Groß, Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
With gcc-4.9.2 loongson faild in test fate-dca, this is caused by option
-fexpensive-optimizations in -O3 optimization. We disable it temporarily
before the bug been fixed up.
Signed-off-by: ZhouXiaoyong <zhouxiaoyong@loongson.cn>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Speed of all modes increased by a factor between 7.4 and 19.8 largely depending
on whether bytes are unpacked into words. Modes 2, 3, and 4 have been sped-up
by a factor of 43 (thanks quick sort!)
All modes are available on x86_64 but only modes 1, 10, 11, 12, 13, 14, 19, 20,
21, and 22 are available on x86 due to the number of SIMD registers used.
With a contribution from James Almer <jamrial@gmail.com>
Possibly fixes Ticket4671
the removed check is wrong and insufficient
Based on patch by Maksym Veremeyenko <verem@m1.tv>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
The internal line accumulator for 16bit can overflow, so I changed that
from int to uint64_t in the C code. The matching assembly looks a little
weird but output looks correct.
(avx2 should be trivial to add later.)
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 39a25908b84604acdaa490138282d091_signal_sigsegv_7ffff713351a_331_WAWV.avi with memlimit of 262144
Found-by: Samuel Groß, Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 260813283176b57b3c9974fe284eebc3_signal_sigsegv_7ffff713351a_991_xtrem_e2_m64q15_a32sxx.3gp with memlimit of 262144
Found-by: Samuel Groß, Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 1013dbde2c360d939cc2dfc33e4f275c_signal_sigsegv_a0500f_45_320vp3.nsv with memlimit of 536870912
Found-by: Samuel Groß, Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Fixes: 18615ff56beedc63a884a8db0678b47c_signal_sigsegv_7ffff713351a_991_xtrem_e2_m64q15_a32sxx.3gp with memlimit of 524288
Found-by: Samuel Groß, Mateusz "j00ru" Jurczyk and Gynvael Coldwind
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>
Both are 2-2.5x faster than their C counterpart.
Reviewed-by: Paul B Mahol <onemda@gmail.com>
Reviewed-by: James Almer <jamrial@gmail.com>
Signed-off-by: Michael Niedermayer <michael@niedermayer.cc>