4158 Commits

Author SHA1 Message Date
zhilwang
d7570bfa52 Merge pull request #2401 from saamas/decoder-use-encoder-x86-idct-routines
[Decoder] Use encoder x86 IDCT routines
2016-03-18 08:50:33 +08:00
HaiboZhu
a8ab4afe5b Merge pull request #2410 from HaiboZhu/Add_disable_assert_in_release
Disable assert in release with -DNDEBUG macro
2016-03-17 15:46:25 +08:00
HaiboZhu
c441f6f390 Merge pull request #2411 from huili2/memory_leak_fix
fix memory leak when alloc failed in decoder
2016-03-17 15:46:12 +08:00
Haibo Zhu
43f767d06e Disable assert in release with -DNDEBUG macro
Update the code to avoid the function unused warning
2016-03-17 11:24:01 +08:00
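For context, a minimal C++ sketch of the pattern this commit message describes: with -DNDEBUG the standard assert macro expands to nothing, so a helper that is referenced only from assertions can trigger an unused-function warning. The names IsValidIndex and ReadOrFallback are hypothetical and not taken from the openh264 sources, and guarding the helper with the same macro is only one possible way to silence the warning, not necessarily the one used in the commit.

```cpp
#include <cassert>

#ifndef NDEBUG
// Hypothetical helper referenced only from assert(). Compiling it only when
// assertions are enabled avoids an unused-function warning under -DNDEBUG.
static bool IsValidIndex(int idx, int count) {
  return idx >= 0 && idx < count;
}
#endif

// Returns data[idx], or fallback when idx is out of range.
int ReadOrFallback(const int* data, int count, int idx, int fallback) {
  if (idx < 0 || idx >= count) return fallback;
  // Debug builds verify the invariant; release builds (-DNDEBUG) drop the check.
  assert(IsValidIndex(idx, count));
  return data[idx];
}
```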
unknown
693fd14272 fix memory leak when alloc failed in decoder 2016-03-17 10:31:25 +08:00
Sindre Aamås
b6c4a5447c [Decoder/x86] IDCT one block at a time with SSE2
At lower bitrates, it is overall faster to conditionally do one block
at a time with SSE2 on Haswell and likely other common architectures.
At higher bitrates, it is faster to use the wider routine that IDCTs
four blocks at a time. To avoid potential performance regressions
as compared to MMX, stick with single-block IDCTs with SSE2. There
is still a performance advantage as compared to MMX because the
single-block SSE2 routine is faster than the corresponding MMX
routine.

Stick with four blocks at a time with AVX2 for which that appears
to be consistently faster on Haswell.
2016-03-16 19:55:11 +01:00
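A scalar C++ sketch of the single-block strategy described above, assuming that "conditionally" means skipping 4x4 blocks that carry no nonzero coefficients. The function names, the per-block flag array, and the 2x2 block layout within an 8x8 region are illustrative assumptions; the actual routines are SSE2/AVX2 assembly and are not reproduced here. The arithmetic is the standard H.264 4x4 inverse core transform added to the prediction.

```cpp
#include <algorithm>
#include <cstdint>

// Scalar H.264 4x4 inverse transform; the residual is added to the
// prediction already stored in dst and clipped to 8 bits.
static void IdctAdd4x4(uint8_t* dst, int stride, const int16_t* c) {
  int tmp[16];
  for (int i = 0; i < 4; ++i) {  // horizontal pass over each row
    const int16_t* r = c + 4 * i;
    const int e0 = r[0] + r[2];
    const int e1 = r[0] - r[2];
    const int e2 = (r[1] >> 1) - r[3];
    const int e3 = r[1] + (r[3] >> 1);
    tmp[4 * i + 0] = e0 + e3;
    tmp[4 * i + 1] = e1 + e2;
    tmp[4 * i + 2] = e1 - e2;
    tmp[4 * i + 3] = e0 - e3;
  }
  for (int j = 0; j < 4; ++j) {  // vertical pass over each column
    const int e0 = tmp[j] + tmp[8 + j];
    const int e1 = tmp[j] - tmp[8 + j];
    const int e2 = (tmp[4 + j] >> 1) - tmp[12 + j];
    const int e3 = tmp[4 + j] + (tmp[12 + j] >> 1);
    const int out[4] = {e0 + e3, e1 + e2, e1 - e2, e0 - e3};
    for (int i = 0; i < 4; ++i) {
      const int v = dst[i * stride + j] + ((out[i] + 32) >> 6);
      dst[i * stride + j] = static_cast<uint8_t>(std::min(255, std::max(0, v)));
    }
  }
}

// Processes four 4x4 blocks laid out 2x2 in an 8x8 region (assumed layout),
// one block at a time, skipping blocks flagged as having no coefficients.
// Returns the number of blocks actually transformed.
int IdctAddFourBlocks(uint8_t* dst, int stride, const int16_t* coeffs,
                      const bool hasCoeffs[4]) {
  int done = 0;
  for (int b = 0; b < 4; ++b) {
    if (!hasCoeffs[b]) continue;  // the conditional part: nothing to add
    uint8_t* blockDst = dst + (b >> 1) * 4 * stride + (b & 1) * 4;
    IdctAdd4x4(blockDst, stride, coeffs + 16 * b);
    ++done;
  }
  return done;
}
```

When few blocks carry coefficients, most iterations take the early-out, which is one plausible reading of why the per-block path wins at lower bitrates; a four-wide routine transforms all four blocks regardless.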
huili2
a8d9576297 Merge pull request #2405 from HaiboZhu/Fix_UT_decoder_init_fail
Fix the decoder init failed case in UT
2016-03-16 16:28:14 +08:00
HaiboZhu
7a3b3fdbe7 Merge pull request #2403 from ruil2/downsampling1
change downsampling logic
2016-03-16 09:48:08 +08:00
Haibo Zhu
46f42ec5f3 Fix the decoder init failed case in UT 2016-03-14 17:06:58 +08:00
Karina
f84f2315ab change downsampling logic so that the downsampling source is the nearest layer instead of the highest layer 2016-03-14 09:55:36 +08:00
HaiboZhu
25f53a2e3d Merge pull request #2399 from saamas/encoder-x86-add-avx2-satd-routines
[Encoder/x86] Add AVX2 SATD routines
2016-03-10 09:59:33 +08:00
Sindre Aamås
98042f1600 [Decoder] Use encoder x86 IDCT routines
Move asm routines to common. Delete obsolete decoder routines.

Use wider routines where applicable.

~1.07x overall faster decode on a quick 720p30 4Mbps test on Haswell.
2016-03-09 10:41:42 +01:00
HaiboZhu
bffda9ec02 Merge pull request #2397 from HaiboZhu/Remove_level_limit_check
Change the level limit check behavior to make the compatibility
2016-03-09 09:50:44 +08:00
Haibo Zhu
31de8bb3a0 Change the level limit check behavior to make the compatibility 2016-03-09 08:34:07 +08:00
Sindre Aamås
48a520915a [Encoder/x86] Add AVX2 SATD routines
WelsSampleSatd16x16_avx2 (~2.31x speedup over SSE4.1 on Haswell).
WelsSampleSatd16x8_avx2  (~2.19x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x16_avx2  (~1.68x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x8_avx2   (~1.53x speedup over SSE4.1 on Haswell).
2016-03-08 11:31:17 +01:00
volvet
d4c68527b1 Merge pull request #2389 from saamas/common-x86-deblock-chroma-horizontal-ssse3-optimizations
[Common/x86] Deblock chroma horizontal ssse3 optimizations
2016-03-08 17:09:08 +08:00
HaiboZhu
d9bfc9204b Merge pull request #2394 from sijchen/th021
[Common] remove sink in WelsThreadPool and hide the constructor to finish the s…
2016-03-08 16:29:40 +08:00
HaiboZhu
74b8a66140 Merge pull request #2395 from ruil2/stat_output
format update and fix build issue when turn on STAT_OUTPUT macro
2016-03-07 13:46:27 +08:00
Karina
fee9d502bb format update and fix build issue when turn on STAT_OUTPUT macro 2016-03-04 13:55:14 +08:00
sijchen
316f740630 Merge pull request #2390 from sijchen/th012
[Common] put CWelsThreadPool to singleton for future usage
2016-03-03 09:47:20 -08:00
huili2
ac6cf877d6 Merge pull request #2392 from mstorsjo/decoder-error-return
Fix a return value check
2016-03-03 16:40:55 +08:00
Martin Storsjö
7f53c29302 Fix a return value check
In 9cb4f4e8e21af, the error code returned from CheckIntraNxNPredMode
was changed - therefore, these return value checks, that look
for a specific error code, need to be updated accordingly.

This fixes crashes in DecodeCrashTestAPI.DecoderCrashTest
with some seeds.
2016-03-03 10:15:34 +02:00
sijchen
4db9c32976 remove sink in WelsThreadPool and hide the constructor to finish the singleton 2016-03-02 17:08:09 -08:00
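As an illustration of hiding the constructor to complete a singleton, here is a small C++ sketch. ThreadPoolSketch and its members are hypothetical stand-ins, not the CWelsThreadPool interface, and the function-local static used here is a substituted technique chosen for brevity; the commit may manage the instance differently.

```cpp
#include <cstddef>

class ThreadPoolSketch {
 public:
  // Single access point; the function-local static is initialized once,
  // and that initialization is thread-safe since C++11.
  static ThreadPoolSketch& GetInstance() {
    static ThreadPoolSketch instance;
    return instance;
  }

  // Copying would create a second pool, so it is disabled.
  ThreadPoolSketch(const ThreadPoolSketch&) = delete;
  ThreadPoolSketch& operator=(const ThreadPoolSketch&) = delete;

  // Placeholder for real task queuing; only counts requests.
  void QueueTask() { ++queued_; }
  std::size_t QueuedCount() const { return queued_; }

 private:
  // Hidden constructor: callers cannot create their own instance.
  ThreadPoolSketch() = default;

  std::size_t queued_ = 0;
};
```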
sijchen
d4f09d9048 put CWelsThreadPool to singleton for future usage (including adding a sink for IWelsTask) 2016-02-29 11:40:25 -08:00
HaiboZhu
52d25f544a Merge pull request #2386 from huili2/return_info_change
modify return value check inside decoder
2016-02-29 09:21:31 +08:00
sijchen
7e88b13809 Merge pull request #2380 from mstorsjo/fix-slice-realloc
Avoid reading iCountMbNumInSlice out of bounds on slice realloc
2016-02-26 09:46:13 -08:00
Sindre Aamås
a009153741 [Common/x86] DeblockChromaEq4H_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

~5.80x speedup on Haswell (x86-64).
~1.69x speedup on Haswell (x86 32-bit).
2016-02-26 10:58:16 +01:00
Sindre Aamås
9909c306f1 [Common/x86] DeblockChromaLt4H_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

~5.72x speedup on Haswell (x86-64).
~1.85x speedup on Haswell (x86 32-bit).
2016-02-26 10:58:16 +01:00
unknown
9cb4f4e8e2 modify return value check inside decoder 2016-02-26 16:29:35 +08:00
Martin Storsjö
69e3fac093 Avoid reading iCountMbNumInSlice out of bounds on slice realloc
Prior to 7bcb3ba4f4abf18a,
pCurLayer->sLayerInfo.pSliceInLayer[uiSliceIdx].iCountMbNumInSlice
was read after setting pCurLayer->sLayerInfo.pSliceInLayer to
the newly allocated, larger array. After this commit, it is read
before the array has been switched, and thus is read from the
old array (which only holds elements up to iMaxSliceNumOld, not
up to iMaxSliceNum).

This fixes reads out of bounds, and crashes in the test suite.
2016-02-25 10:31:58 +02:00
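The hazard described above can be sketched in C++ as follows. SliceSketch, CarriedMbCount, and GrowSlices are hypothetical names (only iCountMbNumInSlice appears in the commit message), and std::vector stands in for the raw arrays of the real code. The point is that any read made before the pointer is switched must be bounded by the old element count, not the new one.

```cpp
#include <algorithm>
#include <cstddef>
#include <vector>

// Hypothetical stand-in; only iCountMbNumInSlice is named in the commit message.
struct SliceSketch {
  int iCountMbNumInSlice = 0;
};

// Sums the MB counts readable from the array that is current during a
// reallocation. While `current` is still the old array, only indices below
// current.size() are valid, so the loop is bounded by the old size rather
// than by newCount.
int CarriedMbCount(const std::vector<SliceSketch>& current, std::size_t newCount) {
  const std::size_t valid = std::min(current.size(), newCount);
  int total = 0;
  for (std::size_t i = 0; i < valid; ++i) total += current[i].iCountMbNumInSlice;
  return total;
}

// Builds the larger array: old entries are copied, new entries start at zero.
std::vector<SliceSketch> GrowSlices(const std::vector<SliceSketch>& old,
                                    std::size_t newCount) {
  std::vector<SliceSketch> grown(newCount);
  std::copy_n(old.begin(), std::min(old.size(), newCount), grown.begin());
  return grown;
}
```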
HaiboZhu
040974f735 Merge pull request #2378 from shihuade/MultiThread_V4.9_V5
add thread-based slice buffer and refactor reallocate process
2016-02-25 14:40:56 +08:00
HaiboZhu
321c772536 Merge pull request #2372 from ruil2/refine_trace
update trace for ENCODER_OPTION_TRACE_CALLBACK
2016-02-25 10:50:12 +08:00
HaiboZhu
027f027c25 Merge pull request #2371 from GregoryJWolfe/master
Added support for "video signal type present" information.
2016-02-25 10:49:34 +08:00
huade
5e8a716c1d add thread-based slice buffer and refactor reallocate process for further change 2016-02-25 10:08:41 +08:00
Gregory J. Wolfe
03890fe86f Added support for "video signal type present" information.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:

- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix (BT709, SMPTE170M, etc.)

These modifications allow the client to specify these attributes
and, if specified, make sure they are written to the output file.
2016-02-24 10:33:18 -05:00
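To make the attribute list concrete, here is a C++ sketch of how these fields are laid out in the H.264 VUI syntax (Annex E): a presence flag, a 3-bit video format, a full-range flag, a colour-description flag, and then three 8-bit codes. BitWriter, VideoSignalTypeSketch, and WriteVideoSignalType are hypothetical names for illustration and are not the openh264 API. The full-range and colour-description flags are part of the standard syntax even though the commit message does not list them.

```cpp
#include <cstddef>
#include <cstdint>
#include <vector>

// Minimal MSB-first bit writer (illustrative only).
class BitWriter {
 public:
  void Put(uint32_t value, int bits) {
    for (int i = bits - 1; i >= 0; --i) {
      const int bitPos = static_cast<int>(bitCount_ & 7);
      if (bitPos == 0) bytes_.push_back(0);  // start a new byte
      if ((value >> i) & 1u) bytes_.back() |= static_cast<uint8_t>(0x80u >> bitPos);
      ++bitCount_;
    }
  }
  const std::vector<uint8_t>& Bytes() const { return bytes_; }
  std::size_t BitCount() const { return bitCount_; }

 private:
  std::vector<uint8_t> bytes_;
  std::size_t bitCount_ = 0;
};

// Hypothetical container for the attributes listed in the commit message.
struct VideoSignalTypeSketch {
  bool present = false;
  uint8_t videoFormat = 5;              // 1 = PAL, 2 = NTSC, 5 = unspecified
  bool fullRange = false;
  bool colourDescriptionPresent = false;
  uint8_t colourPrimaries = 2;          // 1 = BT.709, 6 = SMPTE 170M, 2 = unspecified
  uint8_t transferCharacteristics = 2;  // same code points as above
  uint8_t matrixCoefficients = 2;       // same code points as above
};

// Writes the fields in VUI order; nothing beyond the flag is written
// when the information is not present.
void WriteVideoSignalType(BitWriter& bw, const VideoSignalTypeSketch& v) {
  bw.Put(v.present ? 1u : 0u, 1);  // video_signal_type_present_flag
  if (!v.present) return;
  bw.Put(v.videoFormat, 3);        // video_format
  bw.Put(v.fullRange ? 1u : 0u, 1);                 // video_full_range_flag
  bw.Put(v.colourDescriptionPresent ? 1u : 0u, 1);  // colour_description_present_flag
  if (!v.colourDescriptionPresent) return;
  bw.Put(v.colourPrimaries, 8);          // colour_primaries
  bw.Put(v.transferCharacteristics, 8);  // transfer_characteristics
  bw.Put(v.matrixCoefficients, 8);       // matrix_coefficients
}
```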
Gregory J. Wolfe
c7fcba06c7 Added support for "video signal type present" information.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:

- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix (BT709, SMPTE170M, etc.)

These modifications allow the client to specify these attributes
and, if specified, make sure they are written to the output file.
2016-02-23 13:21:06 -05:00
ruil2
3e538617cd Merge pull request #2374 from sijchen/for_ts0
[Encoder] fix timestamp = 0 issue when rc mode is BITRATE mode
2016-02-23 17:26:20 +08:00
ruil2
78ae48c686 Merge pull request #2375 from shihuade/MultiThread_V4.8_v4
refactor slice level rc statistic info structure
2016-02-23 17:25:57 +08:00
huade
7bcb3ba4f4 refactor slice level rc structure 2016-02-23 16:49:37 +08:00
sijchen
881fc11c48 finish the remaining problem of fixing ts=0 2016-02-22 10:40:35 -08:00
sijchen
9816e3302d fix timestamp = 0 issue when rc mode is BITRATE mode 2016-02-22 10:33:55 -08:00
Karina
597b4eef73 fix timestamp = 0 issue when rc mode is BITRATE mode. 2016-02-22 10:33:55 -08:00
Karina
65218a3c35 update trace for ENCODER_OPTION_TRACE_CALLBACK 2016-02-22 14:33:10 +08:00
ruil2
2754129064 Merge pull request #2360 from saamas/common-x86-deblock-optimizations
[Common/x86] Deblocking optimizations
2016-02-19 09:52:39 +08:00
Gregory J. Wolfe
f35a0daccf Added support for "video signal type present" information.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:

- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix (BT709, SMPTE170M, etc.)

These modifications allow the client to specify these attributes
and, if specified, make sure they are written to the output file.
2016-02-18 11:51:51 -05:00
ruil2
13586a3dfc Merge pull request #2366 from sijchen/fix_free6
[Encoder] add error handling in memory allocation failed case for multi-threading
2016-02-18 10:25:19 +08:00
ruil2
f791ac28ec Merge pull request #2365 from sijchen/fix_free42
[Encoder] avoid memory problem when mem alloc failed during initializing pRefList
2016-02-18 10:25:07 +08:00
ruil2
de1a70d164 Merge pull request #2363 from sijchen/fix_free5
[Encoder] add input parameter check as protection for an encoder interface
2016-02-18 10:24:55 +08:00
sijchen
4537682042 Merge pull request #2362 from ruil2/trace1
trace cleanup
2016-02-17 14:52:46 -08:00
sijchen
e07ee9c096 use WELS_DELETE_OP for deleting 2016-02-17 10:07:33 -08:00