HaiboZhu
25f53a2e3d
Merge pull request #2399 from saamas/encoder-x86-add-avx2-satd-routines
...
[Encoder/x86] Add AVX2 SATD routines
2016-03-10 09:59:33 +08:00
Haibo Zhu
31de8bb3a0
Change the level limit check behavior to improve compatibility
2016-03-09 08:34:07 +08:00
Sindre Aamås
48a520915a
[Encoder/x86] Add AVX2 SATD routines
...
WelsSampleSatd16x16_avx2 (~2.31x speedup over SSE4.1 on Haswell).
WelsSampleSatd16x8_avx2 (~2.19x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x16_avx2 (~1.68x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x8_avx2 (~1.53x speedup over SSE4.1 on Haswell).
2016-03-08 11:31:17 +01:00
volvet
d4c68527b1
Merge pull request #2389 from saamas/common-x86-deblock-chroma-horizontal-ssse3-optimizations
...
[Common/x86] Deblock chroma horizontal ssse3 optimizations
2016-03-08 17:09:08 +08:00
HaiboZhu
d9bfc9204b
Merge pull request #2394 from sijchen/th021
...
[Common] remove sink in WelsThreadPool and hide the constructor to finish the s…
2016-03-08 16:29:40 +08:00
Karina
fee9d502bb
format update and fix build issue when the STAT_OUTPUT macro is turned on
2016-03-04 13:55:14 +08:00
sijchen
316f740630
Merge pull request #2390 from sijchen/th012
...
[Common] make CWelsThreadPool a singleton for future usage
2016-03-03 09:47:20 -08:00
Martin Storsjö
7f53c29302
Fix a return value check
...
In 9cb4f4e8e21af, the error code returned from CheckIntraNxNPredMode
was changed. The return value checks that look for a specific
error code therefore need to be updated accordingly.
This fixes crashes in DecodeCrashTestAPI.DecoderCrashTest
with some seeds.
2016-03-03 10:15:34 +02:00
sijchen
4db9c32976
remove sink in WelsThreadPool and hide the constructor to finish the singleton
2016-03-02 17:08:09 -08:00
sijchen
d4f09d9048
make CWelsThreadPool a singleton for future usage (including adding a sink for IWelsTask)
2016-02-29 11:40:25 -08:00
HaiboZhu
52d25f544a
Merge pull request #2386 from huili2/return_info_change
...
modify return value check inside decoder
2016-02-29 09:21:31 +08:00
Sindre Aamås
a009153741
[Common/x86] DeblockChromaEq4H_ssse3 optimizations
...
Use packed 8-bit operations rather than unpack to 16-bit.
~5.80x speedup on Haswell (x86-64).
~1.69x speedup on Haswell (x86 32-bit).
2016-02-26 10:58:16 +01:00
Sindre Aamås
9909c306f1
[Common/x86] DeblockChromaLt4H_ssse3 optimizations
...
Use packed 8-bit operations rather than unpack to 16-bit.
~5.72x speedup on Haswell (x86-64).
~1.85x speedup on Haswell (x86 32-bit).
2016-02-26 10:58:16 +01:00
unknown
9cb4f4e8e2
modify return value check inside decoder
2016-02-26 16:29:35 +08:00
Martin Storsjö
69e3fac093
Avoid reading iCountMbNumInSlice out of bounds on slice realloc
...
Prior to 7bcb3ba4f4abf18a,
pCurLayer->sLayerInfo.pSliceInLayer[uiSliceIdx].iCountMbNumInSlice
was read after setting pCurLayer->sLayerInfo.pSliceInLayer to
the newly allocated, larger array. After this commit, it is read
before the array has been switched, and thus is read from the
old array (which only holds elements up to iMaxSliceNumOld, not
up to iMaxSliceNum).
This fixes reads out of bounds, and crashes in the test suite.
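
The ordering problem described above can be sketched in a few lines. This is a minimal illustration and not the actual patch: `Slice`, `Layer` and `GrowThenReadCount` are hypothetical stand-ins, and only the field name iCountMbNumInSlice is taken from the commit message. The point it shows is that an index valid only for the enlarged array must be read after the array has been switched.

```cpp
#include <cassert>
#include <cstddef>
#include <vector>

// Hypothetical stand-in for a per-slice record; not the real openh264 struct.
struct Slice {
  int iCountMbNumInSlice = 0;
};

// Hypothetical layer that owns a growable slice array.
struct Layer {
  std::vector<Slice> slices;
};

// Grow the slice array to newCount entries, then read the count for sliceIdx.
// Reading layer.slices[sliceIdx] before the switch would index the old,
// smaller array whenever sliceIdx >= the old size, which is the kind of
// out-of-bounds read the commit message describes.
int GrowThenReadCount(Layer& layer, std::size_t newCount, std::size_t sliceIdx) {
  assert(sliceIdx < newCount);

  // Allocate the larger array and copy the existing entries over.
  std::vector<Slice> grown(newCount);
  for (std::size_t i = 0; i < layer.slices.size() && i < newCount; ++i) {
    grown[i] = layer.slices[i];
  }

  // Switch to the new array first ...
  layer.slices.swap(grown);

  // ... and only then read, so the index is checked against the new size.
  return layer.slices[sliceIdx].iCountMbNumInSlice;
}
```

With an old array of two slices and a new size of four, reading index 3 through this helper touches only the new array.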
2016-02-25 10:31:58 +02:00
HaiboZhu
040974f735
Merge pull request #2378 from shihuade/MultiThread_V4.9_V5
...
add thread-based slice buffer and refactor reallocate process
2016-02-25 14:40:56 +08:00
HaiboZhu
321c772536
Merge pull request #2372 from ruil2/refine_trace
...
update trace for ENCODER_OPTION_TRACE_CALLBACK
2016-02-25 10:50:12 +08:00
HaiboZhu
027f027c25
Merge pull request #2371 from GregoryJWolfe/master
...
Added support for "video signal type present" information.
2016-02-25 10:49:34 +08:00
huade
5e8a716c1d
add thread-based slice buffer and refactor reallocate process for further change
2016-02-25 10:08:41 +08:00
Gregory J. Wolfe
c7fcba06c7
Added support for "video signal type present" information.
...
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:
- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix (BT709, SMPTE170M, etc.)
These modifications allow the client to specify these attributes
and, if specified, make sure they are written to the output file.
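
As a rough illustration of how such attributes end up in the bitstream, the sketch below serializes them in the field order of the H.264 VUI syntax (video_signal_type_present_flag, video_format, video_full_range_flag, colour_description_present_flag, then three 8-bit colour fields). It is not the code from this commit: `VideoSignalType`, `BitWriter` and `WriteVideoSignalType` are hypothetical names, and the bit writer stores one bit per byte purely for readability.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical container for the attributes listed above.
// Defaults follow the "unspecified" values of the H.264 VUI tables.
struct VideoSignalType {
  bool present = false;                  // video_signal_type_present_flag
  uint8_t videoFormat = 5;               // 1 = PAL, 2 = NTSC, 5 = unspecified
  bool fullRange = false;                // video_full_range_flag
  bool colourDescriptionPresent = false; // colour_description_present_flag
  uint8_t colourPrimaries = 2;           // 1 = BT.709, 2 = unspecified
  uint8_t transferCharacteristics = 2;   // 1 = BT.709, 2 = unspecified
  uint8_t matrixCoefficients = 2;        // 1 = BT.709, 2 = unspecified
};

// Toy bit writer: appends bits most-significant first, one bit per element.
class BitWriter {
 public:
  void Put(uint32_t value, int bits) {
    assert(bits > 0 && bits <= 32);
    for (int i = bits - 1; i >= 0; --i) {
      bits_.push_back(static_cast<uint8_t>((value >> i) & 1u));
    }
  }
  const std::vector<uint8_t>& Bits() const { return bits_; }

 private:
  std::vector<uint8_t> bits_;
};

// Write the attributes only when the client marked them as present.
void WriteVideoSignalType(BitWriter& bw, const VideoSignalType& v) {
  bw.Put(v.present ? 1u : 0u, 1);
  if (!v.present) return;

  bw.Put(v.videoFormat & 0x7u, 3);
  bw.Put(v.fullRange ? 1u : 0u, 1);
  bw.Put(v.colourDescriptionPresent ? 1u : 0u, 1);
  if (!v.colourDescriptionPresent) return;

  bw.Put(v.colourPrimaries, 8);
  bw.Put(v.transferCharacteristics, 8);
  bw.Put(v.matrixCoefficients, 8);
}
```

When nothing is specified only the single "not present" flag bit is written, which matches the "if specified" wording of the commit message.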
2016-02-23 13:21:06 -05:00
ruil2
3e538617cd
Merge pull request #2374 from sijchen/for_ts0
...
[Encoder] fix timestamp = 0 issue when rc mode is BITRATE mode
2016-02-23 17:26:20 +08:00
huade
7bcb3ba4f4
refactor slice level rc structure
2016-02-23 16:49:37 +08:00
sijchen
881fc11c48
finish the remaining problem of fixing ts=0
2016-02-22 10:40:35 -08:00
sijchen
9816e3302d
fix timestamp = 0 issue when rc mode is BITRATE mode
2016-02-22 10:33:55 -08:00
Karina
597b4eef73
fix timestamp = 0 issue when rc mode is BITRATE mode.
2016-02-22 10:33:55 -08:00
Karina
65218a3c35
update trace for ENCODER_OPTION_TRACE_CALLBACK
2016-02-22 14:33:10 +08:00
ruil2
2754129064
Merge pull request #2360 from saamas/common-x86-deblock-optimizations
...
[Common/x86] Deblocking optimizations
2016-02-19 09:52:39 +08:00
Gregory J. Wolfe
f35a0daccf
Added support for "video signal type present" information.
...
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:
- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix (BT709, SMPTE170M, etc.)
These modifications allow the client to specify these attributes
and, if specified, make sure they are written to the output file.
2016-02-18 11:51:51 -05:00
ruil2
13586a3dfc
Merge pull request #2366 from sijchen/fix_free6
...
[Encoder] add error handling in memory allocation failed case for multi-threading
2016-02-18 10:25:19 +08:00
ruil2
f791ac28ec
Merge pull request #2365 from sijchen/fix_free42
...
[Encoder] avoid memory problem when mem alloc failed during initializing pRefList
2016-02-18 10:25:07 +08:00
ruil2
de1a70d164
Merge pull request #2363 from sijchen/fix_free5
...
[Encoder] add input parameter check as protection for an encoder interface
2016-02-18 10:24:55 +08:00
sijchen
e07ee9c096
use WELS_DELETE_OP for deleting
2016-02-17 10:07:33 -08:00
sijchen
74955c877f
set pointers to null and call uninit
2016-02-17 10:07:33 -08:00
sijchen
cc675f9fd1
add error handling in memory allocation failed case
2016-02-17 10:07:33 -08:00
sijchen
41b4ecb06b
Avoid memory problem when mem alloc failed during initializing pRefList
2016-02-17 09:52:30 -08:00
sijchen
4b97dcb367
avoid memory problem when mem alloc failed during initializing pRefList
2016-02-16 10:05:49 -08:00
Karina
18728a4876
trace cleanup
2016-02-16 10:52:37 +08:00
ruil2
a26955e444
Merge pull request #2358 from sijchen/fix_free2
...
[Encoder] avoid memory problem if mem alloc failed in the middle of InitDqLayer
2016-02-16 10:47:23 +08:00
sijchen
855d1cf8c2
add input parameter check as protection for an encoder interface
2016-02-15 11:54:51 -08:00
sijchen
b76a79c726
move the rc free to the correct condition to avoid access to invalid memory
2016-02-15 10:13:50 -08:00
sijchen
025500d5aa
assign m_uiSpatialPicNum earlier to cover the memory leak if allocating pic fails
2016-02-15 10:13:23 -08:00
sijchen
36722c553b
use WelsMallocz instead of WelsMalloc to avoid non-null pointer at init
2016-02-15 10:12:44 -08:00
sijchen
71aa533038
move the printing of the MEMORY_CHECK part to a more reasonable place
2016-02-15 10:12:34 -08:00
sijchen
6a0f0811ae
use WelsUninitEncoderExt in all free process in WelsInitEncoderExt
2016-02-15 10:06:43 -08:00
sijchen
408b7cad17
use WelsUninitEncoderExt rather than FreeMemorySvc which correctly deals with release of vpp memory
2016-02-15 10:04:52 -08:00
Sindre Aamås
e96a7b5c92
[Common/x86] DeblockChromaEq4V_ssse3 optimizations
...
Use packed 8-bit operations rather than unpack to 16-bit.
Avoid spills.
~2.07x speedup on Haswell (x86-64).
~2.12x speedup on Haswell (x86 32-bit).
2016-02-15 02:08:03 +01:00
Sindre Aamås
fc16010583
[Common/x86] DeblockChromaLt4V_ssse3 optimizations
...
Use packed 8-bit operations rather than unpack to 16-bit.
Avoid spills.
~2.68x speedup on Haswell (x86-64).
~2.38x speedup on Haswell (x86 32-bit).
2016-02-15 02:07:25 +01:00
Sindre Aamås
62fb37d096
[Common/x86] DeblockLumaEq4_ssse3 optimizations
...
Use packed 8-bit operations rather than unpack to 16-bit.
Minimize spills.
~2.31x speedup on Haswell (x86-64).
~2.40x speedup on Haswell (x86 32-bit).
2016-02-15 02:06:39 +01:00
Sindre Aamås
732e1c5f78
[Common/x86] DeblockLumaLt4_ssse3 optimizations
...
Use packed 8-bit operations rather than unpack to 16-bit.
Avoid spills.
~1.97x speedup on Haswell (x86-64).
~3.09x speedup on Haswell (x86 32-bit).
2016-02-15 02:06:18 +01:00
sijchen
2b9a250fbd
include the freeing of the pointer in FreeDqLayer
2016-02-12 16:23:57 -08:00