Commit Graph

2558 Commits

Author SHA1 Message Date
HaiboZhu
027f027c25 Merge pull request #2371 from GregoryJWolfe/master
Added support for "video signal type present" information.
2016-02-25 10:49:34 +08:00
Gregory J. Wolfe
c7fcba06c7 Added support for "video signal type present" information.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:

- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix ((BT709, SMPTE170M, etc.)

These modifications allow the client to specify these attributes
and, if specified, makes sure they are written to the output file.
2016-02-23 13:21:06 -05:00
ruil2
3e538617cd Merge pull request #2374 from sijchen/for_ts0
[Encoder] fix timestamp = 0 issue when rc mode is BITRATE mode
2016-02-23 17:26:20 +08:00
huade
7bcb3ba4f4 refactor slice level rc structure 2016-02-23 16:49:37 +08:00
sijchen
881fc11c48 finish the remaining prob of fixing ts=0 2016-02-22 10:40:35 -08:00
sijchen
9816e3302d fix timestamp = 0 issue when rc mode is BITRATE mode 2016-02-22 10:33:55 -08:00
Karina
597b4eef73 fix timestamp = 0 issue when rc mode is BITRATE mode. 2016-02-22 10:33:55 -08:00
ruil2
2754129064 Merge pull request #2360 from saamas/common-x86-deblock-optimizations
[Common/x86] Deblocking optimizations
2016-02-19 09:52:39 +08:00
Gregory J. Wolfe
f35a0daccf Added support for "video signal type present" information.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data.  The saved attributes
are:

- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix ((BT709, SMPTE170M, etc.)

These modifications allow the client to specify these attributes
and, if specified, makes sure they are written to the output file.
2016-02-18 11:51:51 -05:00
ruil2
13586a3dfc Merge pull request #2366 from sijchen/fix_free6
[Encoder] add error handling in memory allocation failed case for multi-threading
2016-02-18 10:25:19 +08:00
ruil2
f791ac28ec Merge pull request #2365 from sijchen/fix_free42
[Encoder] avoid memory problem when mem alloc failed during initializing pRefList
2016-02-18 10:25:07 +08:00
ruil2
de1a70d164 Merge pull request #2363 from sijchen/fix_free5
[Encoder] add input parameter check as protection for an encoder interface
2016-02-18 10:24:55 +08:00
sijchen
e07ee9c096 use WELS_DELETE_OP for deleting 2016-02-17 10:07:33 -08:00
sijchen
74955c877f set pointers to null and call uninit 2016-02-17 10:07:33 -08:00
sijchen
cc675f9fd1 add error handling in memory allocation failed case 2016-02-17 10:07:33 -08:00
sijchen
41b4ecb06b Avoid memory problem when mem alloc failed during initializing pRefList 2016-02-17 09:52:30 -08:00
sijchen
4b97dcb367 avoid memory problem when mem alloc failed during initializing pRefList 2016-02-16 10:05:49 -08:00
Karina
18728a4876 trace cleanup 2016-02-16 10:52:37 +08:00
ruil2
a26955e444 Merge pull request #2358 from sijchen/fix_free2
[Encoder]  avoid memory problem if mem alloc failed in the middle of InitDqLayer
2016-02-16 10:47:23 +08:00
sijchen
855d1cf8c2 add input parameter check as protection for an encoder interface 2016-02-15 11:54:51 -08:00
sijchen
b76a79c726 move the rc free to the correct condition to avoid access to invalid memory 2016-02-15 10:13:50 -08:00
sijchen
025500d5aa move the assigning m_uiSpatialPicNum earlier to cover the memory leak if error in allocating pic 2016-02-15 10:13:23 -08:00
sijchen
36722c553b use WelsMallocz instead of WelsMalloc to avoid non-null pointer at init 2016-02-15 10:12:44 -08:00
sijchen
71aa533038 move the printing of MEMORY_CHECK part to more reasonable 2016-02-15 10:12:34 -08:00
sijchen
6a0f0811ae use WelsUninitEncoderExt in all free process in WelsInitEncoderExt 2016-02-15 10:06:43 -08:00
sijchen
408b7cad17 use WelsUninitEncoderExt rather than FreeMemorySvc which correctly deals with release of vpp memory 2016-02-15 10:04:52 -08:00
Sindre Aamås
e96a7b5c92 [Common/x86] DeblockChromaEq4V_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

Avoid spills.

~2.07x speedup on Haswell (x86-64).
~2.12x speedup on Haswell (x86 32-bit).
2016-02-15 02:08:03 +01:00
Sindre Aamås
fc16010583 [Common/x86] DeblockChromaLt4V_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

Avoid spills.

~2.68x speedup on Haswell (x86-64).
~2.38x speedup on Haswell (x86 32-bit).
2016-02-15 02:07:25 +01:00
Sindre Aamås
62fb37d096 [Common/x86] DeblockLumaEq4_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

Minimize spills.

~2.31x speedup on Haswell (x86-64).
~2.40x speedup on Haswell (x86 32-bit).
2016-02-15 02:06:39 +01:00
Sindre Aamås
732e1c5f78 [Common/x86] DeblockLumaLt4_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

Avoid spills.

~1.97x speedup on Haswell (x86-64).
~3.09x speedup on Haswell (x86 32-bit).
2016-02-15 02:06:18 +01:00
sijchen
2b9a250fbd include the free-ing of pointer into FreeDqLayer 2016-02-12 16:23:57 -08:00
sijchen
a1a3873a62 improve the code structure 2016-02-10 22:25:41 -08:00
sijchen
43fdf74fa6 fix a miss of assigning and remove an unused line 2016-02-10 21:54:53 -08:00
sijchen
914302a462 avoid memory problem if mem alloc failed in the middle of InitDqLayer 2016-02-10 21:54:53 -08:00
sijchen
aaa25160ec Merge pull request #2353 from saamas/encoder-x86-dct-opt2
[Encoder] x86 DCT optimizations
2016-02-08 15:00:12 -08:00
sijchen
e5e7013b73 Merge pull request #2350 from sijchen/th00
[Common] Add sink to IWelsTask
2016-02-08 14:59:38 -08:00
HaiboZhu
ad9ca3824f Merge pull request #2354 from ruil2/remove_trace
fix error width and height issue
2016-02-04 12:00:20 +08:00
Karina
ae508b9724 fix error width and height issue 2016-02-04 10:25:03 +08:00
sijchen
f5fd7420a9 Merge pull request #2351 from huili2/fix_width_height_enc_constraint
fix frame size constraints for width and height
2016-02-02 16:31:05 -08:00
Sindre Aamås
c8c74903f8 [Encoder] Add single-block AVX2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~3.15x speedup over MMX for the DCT on Haswell.
~2.94x speedup over MMX for the IDCT on Haswell.

Returns diminish with increasing vector length because a larger
proportion of the time is spent on load/store/shuffling.
2016-02-02 17:22:49 +01:00
Sindre Aamås
f90960983c [Encoder] Add single-block SSE2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~2.31x speedup over MMX for the DCT on Haswell.
~1.92x speedup over MMX for the IDCT on Haswell.
2016-02-02 17:22:48 +01:00
Sindre Aamås
7486de2844 [Encoder] AVX2 DCT tweaks
Do some shuffling in load/store unpack/pack to save some
work in horizontal DCTs.

Use a few 128-bit broadcasts to compact data vectors a bit.

~1.04x speedup for the DCT case on Haswell.
~1.12x speedup for the IDCT case on Haswell.
2016-02-02 17:22:48 +01:00
Karina
2d4cbcf060 remove trace 2016-02-02 17:34:59 +08:00
unknown
3873addc3d fix frame size constraints for width and height 2016-02-01 15:55:53 +08:00
HaiboZhu
1030820ec4 Merge pull request #2342 from sijchen/enh_ut_tem
[UT] correct and enhance the ut template and trace improvement
2016-02-01 09:08:05 +08:00
sijchen
ef329e33c3 add simulcastAvc setting in setting trace 2016-01-20 14:24:16 -08:00
Sindre Aamås
e22d731f26 [Encoder] yasm-compatible vinserti128 syntax in DCT asm 2016-01-19 21:48:23 +01:00
Sindre Aamås
144ff0fd51 [Encoder] SSE2 4x4 IDCT optimizations
Use a combination of instruction types that distributes more
evenly across execution ports on common architectures.

Do the horizontal IDCT without transposing back and forth.

Minor tweaks.

~1.14x faster on Haswell. Should be faster on other architectures
as well.
2016-01-19 13:12:29 +01:00
Sindre Aamås
991e344d8c [Encoder] SSE2 4x4 DCT optimizations
Use a combination of instruction types that distributes more
evenly across execution ports on common architectures.

Do the horizontal DCT without transposing back and forth.

Minor tweaks.

~1.54x faster on Haswell. Should be faster on other architectures
as well.
2016-01-19 13:12:28 +01:00
Sindre Aamås
3088d96978 [Encoder] Add an AVX2 4x4 IDCT implementation
~2.03x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00