474 Commits

Author SHA1 Message Date
Sindre Aamås
48a520915a [Encoder/x86] Add AVX2 SATD routines
WelsSampleSatd16x16_avx2 (~2.31x speedup over SSE4.1 on Haswell).
WelsSampleSatd16x8_avx2  (~2.19x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x16_avx2  (~1.68x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x8_avx2   (~1.53x speedup over SSE4.1 on Haswell).
2016-03-08 11:31:17 +01:00
sijchen
4db9c32976 remove sink in WelsThreadPool and hide the constructor to finish the singleton 2016-03-02 17:08:09 -08:00
sijchen
d4f09d9048 make CWelsThreadPool a singleton for future usage (including adding a sink for IWelsTask) 2016-02-29 11:40:25 -08:00
HaiboZhu
040974f735 Merge pull request #2378 from shihuade/MultiThread_V4.9_V5
add thread-based slice buffer and refactor reallocate process
2016-02-25 14:40:56 +08:00
HaiboZhu
027f027c25 Merge pull request #2371 from GregoryJWolfe/master
Added support for "video signal type present" information.
2016-02-25 10:49:34 +08:00
huade
5e8a716c1d add thread-based slice buffer and refactor reallocate process for further change 2016-02-25 10:08:41 +08:00
Gregory J. Wolfe
c7fcba06c7 Added support for "video signal type present" information.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:

- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix (BT709, SMPTE170M, etc.)

These modifications allow the client to specify these attributes
and, if specified, make sure they are written to the output file.
2016-02-23 13:21:06 -05:00
ruil2
3e538617cd Merge pull request #2374 from sijchen/for_ts0
[Encoder] fix timestamp = 0 issue when rc mode is BITRATE mode
2016-02-23 17:26:20 +08:00
huade
7bcb3ba4f4 refactor slice level rc structure 2016-02-23 16:49:37 +08:00
sijchen
9816e3302d fix timestamp = 0 issue when rc mode is BITRATE mode 2016-02-22 10:33:55 -08:00
Karina
597b4eef73 fix timestamp = 0 issue when rc mode is BITRATE mode. 2016-02-22 10:33:55 -08:00
Gregory J. Wolfe
f35a0daccf Added support for "video signal type present" information.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data.  The saved attributes
are:

- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix (BT709, SMPTE170M, etc.)

These modifications allow the client to specify these attributes
and, if specified, make sure they are written to the output file.
2016-02-18 11:51:51 -05:00
sijchen
914302a462 avoid memory problem if mem alloc failed in the middle of InitDqLayer 2016-02-10 21:54:53 -08:00
sijchen
aaa25160ec Merge pull request #2353 from saamas/encoder-x86-dct-opt2
[Encoder] x86 DCT optimizations
2016-02-08 15:00:12 -08:00
sijchen
e5e7013b73 Merge pull request #2350 from sijchen/th00
[Common] Add sink to IWelsTask
2016-02-08 14:59:38 -08:00
Sindre Aamås
c8c74903f8 [Encoder] Add single-block AVX2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~3.15x speedup over MMX for the DCT on Haswell.
~2.94x speedup over MMX for the IDCT on Haswell.

Returns diminish with increasing vector length because a larger
proportion of the time is spent on load/store/shuffling.
2016-02-02 17:22:49 +01:00
Sindre Aamås
f90960983c [Encoder] Add single-block SSE2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~2.31x speedup over MMX for the DCT on Haswell.
~1.92x speedup over MMX for the IDCT on Haswell.
2016-02-02 17:22:48 +01:00
Sindre Aamås
3088d96978 [Encoder] Add an AVX2 4x4 IDCT implementation
~2.03x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Sindre Aamås
b267163f10 [Encoder] Add an AVX2 4x4 DCT implementation
~2.52x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Karina
0f0d54ef51 using independent encoder control logic for SAVC case 2016-01-14 09:16:12 +08:00
sijchen
cce1c29844 add sink to IWelsTask (for further enhancements) 2016-01-13 16:24:54 -08:00
Karina
d4f979c495 separate each layer trace output 2016-01-05 14:02:58 +08:00
huade
f161566458 remove pSliceBs from ctx 2015-12-15 17:10:52 +08:00
huade
e8536c6b73 remove iCountThreadsNum and unify with iMultipleThreadIdc 2015-12-14 12:26:02 +08:00
HaiboZhu
92637b4912 Merge pull request #2304 from sijchen/th21
[Encoder] Add tasks and thread pool call for SM_SIZELIMITED_SLICE mode
2015-12-11 16:16:16 +08:00
Karina
fde8bd2554 update temporal layer quant 2015-12-10 15:07:19 +08:00
sijchen
76ca56498a Add tasks and thread pool call for SM_SIZELIMITED_SLICE mode 2015-12-09 09:55:04 -08:00
sijchen
89752ff62f Refactor: remove CWelsTaskManageMultiD 2015-11-30 10:32:48 -08:00
HaiboZhu
f679da900f Merge pull request #2281 from sijchen/th11
[Encoder] remove duplicated operation after thread pool
2015-11-27 12:13:33 +08:00
HaiboZhu
921443ead8 Merge pull request #2272 from sijchen/rf0
[Encoder] put duplicated code into one function
2015-11-27 09:27:37 +08:00
huade
4a4ade1201 refactor WriteSliceBs() 2015-11-26 09:32:33 +08:00
sijchen
05c89b75f0 remove duplicated operation after thread pool and rename a task for clearer meaning 2015-11-25 13:46:21 -08:00
huade
d02addd90f remove pCountMbNumInSlice from SSliceCtx 2015-11-25 13:36:37 +08:00
sijchen
2fc9c08710 put duplicated code into one function 2015-11-24 11:14:58 -08:00
HaiboZhu
01016b1c83 Merge pull request #2264 from sijchen/api41
[Encoder] put bUseLoadBalancing into actual usage and add test case for it
2015-11-24 14:16:21 +08:00
huade
f263f0710a remove pSliceComplexRatio from SliceThreading 2015-11-24 10:44:23 +08:00
huade
b001785eee remove pSliceConsumeTime in SSliceCtx and pSliceThreading 2015-11-24 08:58:37 +08:00
sijchen
f3c4b878ff update the usage of flag and MD5 value 2015-11-23 11:54:43 -08:00
huade
9ef07c5b99 remove pFirstMbInSlice in SSliceCtx 2015-11-20 09:51:01 +08:00
huade
b77b68ffa0 change input parameters for UpdateMbNeighbourInfoForNextSlice etc. 2015-11-19 17:18:03 +08:00
huade
c842c5c946 change input parameters for DynamicAdjustSlicePEncCtxAll etc. (SSliceCtx refactoring) 2015-11-19 15:00:38 +08:00
huade
b60bb67b4e SSliceCtx structure refactoring: change input parameters for Init/UninitSlicePEncCtx() 2015-11-19 13:19:34 +08:00
huade
35ab32b1a3 remove (ppCtx)->pSliceCtxList and only keep DqLayer->sSliceCtx to simplify structure management 2015-11-19 11:03:50 +08:00
huade
06eb03578d SSliceCtx structure refactoring: change input parameters for UpdateMbListNeighborParallel 2015-11-17 17:54:58 +08:00
sijchen
6fe05b0996 add error handling of task returns 2015-11-13 12:05:06 -08:00
sijchen
b5d890c1ea Merge pull request #2224 from sijchen/thp73
[Encoder] put the logic related to multiple D layer into a class …
2015-11-13 11:57:07 -08:00
sijchen
e508c86dac fix the missing loadbalancing part 2015-11-12 13:15:07 -08:00
sijchen
aeb5ab4b99 [Encoder] put the logic related to multiple D layer into a class for better structure 2015-11-11 22:55:16 -08:00
HaiboZhu
beacba76e3 Merge pull request #2220 from sijchen/thp61
[Encoder] add preencodingtasklist in task management
2015-11-12 13:54:49 +08:00
sijchen
33c378f7b7 change API for slicing part for easier usage (the UseLoadBalancing flag is still a work in progress) 2015-11-10 09:50:06 -08:00