2693 Commits

Author SHA1 Message Date
sijchen
41b4ecb06b Avoid memory problem when mem alloc failed during initializing pRefList 2016-02-17 09:52:30 -08:00
sijchen
4b97dcb367 avoid memory problem when mem alloc failed during initializing pRefList 2016-02-16 10:05:49 -08:00
Karina
18728a4876 trace cleanup 2016-02-16 10:52:37 +08:00
ruil2
a26955e444 Merge pull request #2358 from sijchen/fix_free2
[Encoder]  avoid memory problem if mem alloc failed in the middle of InitDqLayer
2016-02-16 10:47:23 +08:00
sijchen
855d1cf8c2 add input parameter check as protection for an encoder interface 2016-02-15 11:54:51 -08:00
sijchen
b76a79c726 move the rc free to the correct condition to avoid access to invalid memory 2016-02-15 10:13:50 -08:00
sijchen
025500d5aa move the assigning m_uiSpatialPicNum earlier to cover the memory leak if error in allocating pic 2016-02-15 10:13:23 -08:00
sijchen
36722c553b use WelsMallocz instead of WelsMalloc to avoid non-null pointer at init 2016-02-15 10:12:44 -08:00
sijchen
71aa533038 move the printing of MEMORY_CHECK part to more reasonable 2016-02-15 10:12:34 -08:00
sijchen
6a0f0811ae use WelsUninitEncoderExt in all free process in WelsInitEncoderExt 2016-02-15 10:06:43 -08:00
sijchen
408b7cad17 use WelsUninitEncoderExt rather than FreeMemorySvc which correctly deals with release of vpp memory 2016-02-15 10:04:52 -08:00
Sindre Aamås
e96a7b5c92 [Common/x86] DeblockChromaEq4V_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

Avoid spills.

~2.07x speedup on Haswell (x86-64).
~2.12x speedup on Haswell (x86 32-bit).
2016-02-15 02:08:03 +01:00
Sindre Aamås
fc16010583 [Common/x86] DeblockChromaLt4V_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

Avoid spills.

~2.68x speedup on Haswell (x86-64).
~2.38x speedup on Haswell (x86 32-bit).
2016-02-15 02:07:25 +01:00
Sindre Aamås
62fb37d096 [Common/x86] DeblockLumaEq4_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

Minimize spills.

~2.31x speedup on Haswell (x86-64).
~2.40x speedup on Haswell (x86 32-bit).
2016-02-15 02:06:39 +01:00
Sindre Aamås
732e1c5f78 [Common/x86] DeblockLumaLt4_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

Avoid spills.

~1.97x speedup on Haswell (x86-64).
~3.09x speedup on Haswell (x86 32-bit).
2016-02-15 02:06:18 +01:00
sijchen
2b9a250fbd include the free-ing of pointer into FreeDqLayer 2016-02-12 16:23:57 -08:00
sijchen
a1a3873a62 improve the code structure 2016-02-10 22:25:41 -08:00
sijchen
43fdf74fa6 fix a miss of assigning and remove an unused line 2016-02-10 21:54:53 -08:00
sijchen
914302a462 avoid memory problem if mem alloc failed in the middle of InitDqLayer 2016-02-10 21:54:53 -08:00
sijchen
aaa25160ec Merge pull request #2353 from saamas/encoder-x86-dct-opt2
[Encoder] x86 DCT optimizations
2016-02-08 15:00:12 -08:00
sijchen
e5e7013b73 Merge pull request #2350 from sijchen/th00
[Common] Add sink to IWelsTask
2016-02-08 14:59:38 -08:00
HaiboZhu
ad9ca3824f Merge pull request #2354 from ruil2/remove_trace
fix error width and height issue
2016-02-04 12:00:20 +08:00
Karina
ae508b9724 fix error width and height issue 2016-02-04 10:25:03 +08:00
sijchen
f5fd7420a9 Merge pull request #2351 from huili2/fix_width_height_enc_constraint
fix frame size constraints for width and height
2016-02-02 16:31:05 -08:00
Sindre Aamås
c8c74903f8 [Encoder] Add single-block AVX2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~3.15x speedup over MMX for the DCT on Haswell.
~2.94x speedup over MMX for the IDCT on Haswell.

Returns diminish with increasing vector length because a larger
proportion of the time is spent on load/store/shuffling.
2016-02-02 17:22:49 +01:00
Sindre Aamås
f90960983c [Encoder] Add single-block SSE2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~2.31x speedup over MMX for the DCT on Haswell.
~1.92x speedup over MMX for the IDCT on Haswell.
2016-02-02 17:22:48 +01:00
Sindre Aamås
7486de2844 [Encoder] AVX2 DCT tweaks
Do some shuffling in load/store unpack/pack to save some
work in horizontal DCTs.

Use a few 128-bit broadcasts to compact data vectors a bit.

~1.04x speedup for the DCT case on Haswell.
~1.12x speedup for the IDCT case on Haswell.
2016-02-02 17:22:48 +01:00
Karina
2d4cbcf060 remove trace 2016-02-02 17:34:59 +08:00
unknown
3873addc3d fix frame size constraints for width and height 2016-02-01 15:55:53 +08:00
HaiboZhu
1030820ec4 Merge pull request #2342 from sijchen/enh_ut_tem
[UT] correct and enhance the ut template and trace improvement
2016-02-01 09:08:05 +08:00
sijchen
ef329e33c3 add simulcastAvc setting in setting trace 2016-01-20 14:24:16 -08:00
Sindre Aamås
e22d731f26 [Encoder] yasm-compatible vinserti128 syntax in DCT asm 2016-01-19 21:48:23 +01:00
Sindre Aamås
144ff0fd51 [Encoder] SSE2 4x4 IDCT optimizations
Use a combination of instruction types that distributes more
evenly across execution ports on common architectures.

Do the horizontal IDCT without transposing back and forth.

Minor tweaks.

~1.14x faster on Haswell. Should be faster on other architectures
as well.
2016-01-19 13:12:29 +01:00
Sindre Aamås
991e344d8c [Encoder] SSE2 4x4 DCT optimizations
Use a combination of instruction types that distributes more
evenly across execution ports on common architectures.

Do the horizontal DCT without transposing back and forth.

Minor tweaks.

~1.54x faster on Haswell. Should be faster on other architectures
as well.
2016-01-19 13:12:28 +01:00
Sindre Aamås
3088d96978 [Encoder] Add an AVX2 4x4 IDCT implementation
~2.03x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Sindre Aamås
b267163f10 [Encoder] Add an AVX2 4x4 DCT implementation
~2.52x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
HaiboZhu
8eb4de10a2 Merge pull request #2337 from HaiboZhu/Add_Protection_wrong_API_call
Add protection for wrong API call without initialize
2016-01-19 13:42:49 +08:00
HaiboZhu
5e3e975ffb Merge pull request #2331 from ruil2/return_value
add return value judgment
2016-01-19 12:25:10 +08:00
Haibo Zhu
6d7bd2daf4 Add protection for wrong API call without initialize 2016-01-19 12:00:54 +08:00
Martin Storsjö
fbe35cffca Avoid warnings in MSVC about implicitly casting floats to integers 2016-01-16 11:10:25 +02:00
Karina
559e786fa4 add return value judgment 2016-01-15 10:30:41 +08:00
HaiboZhu
d11f12db54 Merge pull request #2330 from ruil2/mt_build_1
fix build issue when some macro turn on
2016-01-15 09:28:07 +08:00
sijchen
5eb18b101e change the output way of debug trace 2016-01-13 22:13:43 -08:00
Karina
67f4dcf2e2 fix build issue when some macro turn on 2016-01-14 09:40:20 +08:00
Karina
0f0d54ef51 using independent encoder control logic for SAVC case 2016-01-14 09:16:12 +08:00
sijchen
cce1c29844 add sink to IWelsTask (for further enhancements) 2016-01-13 16:24:54 -08:00
sijchen
5cad0f9bba enhance a UT to cover more case 2016-01-11 22:01:02 -08:00
sijchen
bf35b6fee7 add a debug trace if encoder returns error 2016-01-11 22:00:24 -08:00
sijchen
19f5eb0932 complete a debug trace in load-balancing task 2016-01-11 22:00:14 -08:00
sijchen
7a8da6a468 remove unneed codes after new task-managements 2016-01-11 21:59:49 -08:00