Sindre Aamås
48a520915a
[Encoder/x86] Add AVX2 SATD routines
...
WelsSampleSatd16x16_avx2 (~2.31x speedup over SSE4.1 on Haswell).
WelsSampleSatd16x8_avx2 (~2.19x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x16_avx2 (~1.68x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x8_avx2 (~1.53x speedup over SSE4.1 on Haswell).
2016-03-08 11:31:17 +01:00
sijchen
4db9c32976
remove sink in WelsThreadPool and hide the constructor to finish the singleton
2016-03-02 17:08:09 -08:00
sijchen
d4f09d9048
make CWelsThreadPool a singleton for future usage (including adding a sink for IWelsTask)
2016-02-29 11:40:25 -08:00
HaiboZhu
040974f735
Merge pull request #2378 from shihuade/MultiThread_V4.9_V5
...
add thread-based slice buffer and refactor reallocate process
2016-02-25 14:40:56 +08:00
HaiboZhu
027f027c25
Merge pull request #2371 from GregoryJWolfe/master
...
Added support for "video signal type present" information.
2016-02-25 10:49:34 +08:00
huade
5e8a716c1d
add thread-based slice buffer and refactor reallocate process for further change
2016-02-25 10:08:41 +08:00
Gregory J. Wolfe
c7fcba06c7
Added support for "video signal type present" information.
...
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:
- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix (BT709, SMPTE170M, etc.)
These modifications allow the client to specify these attributes
and, if specified, make sure they are written to the output file.
2016-02-23 13:21:06 -05:00
ruil2
3e538617cd
Merge pull request #2374 from sijchen/for_ts0
...
[Encoder] fix timestamp = 0 issue when rc mode is BITRATE mode
2016-02-23 17:26:20 +08:00
huade
7bcb3ba4f4
refactor slice level rc structure
2016-02-23 16:49:37 +08:00
sijchen
9816e3302d
fix timestamp = 0 issue when rc mode is BITRATE mode
2016-02-22 10:33:55 -08:00
Karina
597b4eef73
fix timestamp = 0 issue when rc mode is BITRATE mode.
2016-02-22 10:33:55 -08:00
Gregory J. Wolfe
f35a0daccf
Added support for "video signal type present" information.
...
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:
- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix (BT709, SMPTE170M, etc.)
These modifications allow the client to specify these attributes
and, if specified, make sure they are written to the output file.
2016-02-18 11:51:51 -05:00
sijchen
914302a462
avoid memory problem if mem alloc failed in the middle of InitDqLayer
2016-02-10 21:54:53 -08:00
sijchen
aaa25160ec
Merge pull request #2353 from saamas/encoder-x86-dct-opt2
...
[Encoder] x86 DCT optimizations
2016-02-08 15:00:12 -08:00
sijchen
e5e7013b73
Merge pull request #2350 from sijchen/th00
...
[Common] Add sink to IWelsTask
2016-02-08 14:59:38 -08:00
Sindre Aamås
c8c74903f8
[Encoder] Add single-block AVX2 4x4 DCT/IDCT routines
...
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.
~3.15x speedup over MMX for the DCT on Haswell.
~2.94x speedup over MMX for the IDCT on Haswell.
Returns diminish with increasing vector length because a larger
proportion of the time is spent on load/store/shuffling.
2016-02-02 17:22:49 +01:00
Sindre Aamås
f90960983c
[Encoder] Add single-block SSE2 4x4 DCT/IDCT routines
...
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.
~2.31x speedup over MMX for the DCT on Haswell.
~1.92x speedup over MMX for the IDCT on Haswell.
2016-02-02 17:22:48 +01:00
Sindre Aamås
3088d96978
[Encoder] Add an AVX2 4x4 IDCT implementation
...
~2.03x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Sindre Aamås
b267163f10
[Encoder] Add an AVX2 4x4 DCT implementation
...
~2.52x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Karina
0f0d54ef51
using independent encoder control logic for SAVC case
2016-01-14 09:16:12 +08:00
sijchen
cce1c29844
add sink to IWelsTask (for further enhancements)
2016-01-13 16:24:54 -08:00
Karina
d4f979c495
separate each layer trace output
2016-01-05 14:02:58 +08:00
huade
f161566458
remove pSliceBs from ctx
2015-12-15 17:10:52 +08:00
huade
e8536c6b73
remove iCountThreadsNum and unify with iMultipleThreadIdc
2015-12-14 12:26:02 +08:00
HaiboZhu
92637b4912
Merge pull request #2304 from sijchen/th21
...
[Encoder] Add tasks and thread pool call for SM_SIZELIMITED_SLICE mode
2015-12-11 16:16:16 +08:00
Karina
fde8bd2554
update temporal layer quant
2015-12-10 15:07:19 +08:00
sijchen
76ca56498a
Add tasks and thread pool call for SM_SIZELIMITED_SLICE mode
2015-12-09 09:55:04 -08:00
sijchen
89752ff62f
Refactor: remove CWelsTaskManageMultiD
2015-11-30 10:32:48 -08:00
HaiboZhu
f679da900f
Merge pull request #2281 from sijchen/th11
...
[Encoder] remove duplicated operation after thread pool
2015-11-27 12:13:33 +08:00
HaiboZhu
921443ead8
Merge pull request #2272 from sijchen/rf0
...
[Encoder] put duplicated code into one function
2015-11-27 09:27:37 +08:00
huade
4a4ade1201
refactor WriteSliceBs()
2015-11-26 09:32:33 +08:00
sijchen
05c89b75f0
remove duplicated operation after thread pool and rename a task for clearer meaning
2015-11-25 13:46:21 -08:00
huade
d02addd90f
remove pCountMbNumInSlice from SSliceCtx
2015-11-25 13:36:37 +08:00
sijchen
2fc9c08710
put duplicated code into one function
2015-11-24 11:14:58 -08:00
HaiboZhu
01016b1c83
Merge pull request #2264 from sijchen/api41
...
[Encoder] put bUseLoadBalancing into actual usage and add test case for it
2015-11-24 14:16:21 +08:00
huade
f263f0710a
remove pSliceComplexRatio from SliceThreading
2015-11-24 10:44:23 +08:00
huade
b001785eee
remove pSliceConsumeTime in SSliceCtx and pSliceThreading
2015-11-24 08:58:37 +08:00
sijchen
f3c4b878ff
update the usage of flag and MD5 value
2015-11-23 11:54:43 -08:00
huade
9ef07c5b99
remove pFirstMbInSlice in SSliceCtx
2015-11-20 09:51:01 +08:00
huade
b77b68ffa0
change input parameters for UpdateMbNeighbourInfoForNextSlice etc.
2015-11-19 17:18:03 +08:00
huade
c842c5c946
change input parameters for DynamicAdjustSlicePEncCtxAll etc, SSliceCtx refactoring
2015-11-19 15:00:38 +08:00
huade
b60bb67b4e
SSliceCtx structure refactoring: change input parameters for Init/UninitSlicePEncCtx()
2015-11-19 13:19:34 +08:00
huade
35ab32b1a3
remove (ppCtx)->pSliceCtxList and only keep DqLayer->sSliceCtx to simplify structure management
2015-11-19 11:03:50 +08:00
huade
06eb03578d
SSliceCtx structure refactoring: change input parameters for UpdateMbListNeighborParallel
2015-11-17 17:54:58 +08:00
sijchen
6fe05b0996
add error handling of task returns
2015-11-13 12:05:06 -08:00
sijchen
b5d890c1ea
Merge pull request #2224 from sijchen/thp73
...
[Encoder] put the logic related to multiple D layer into a class …
2015-11-13 11:57:07 -08:00
sijchen
e508c86dac
fix the missing loadbalancing part
2015-11-12 13:15:07 -08:00
sijchen
aeb5ab4b99
[Encoder] put the logic related to multiple D layer into a class for better structure
2015-11-11 22:55:16 -08:00
HaiboZhu
beacba76e3
Merge pull request #2220 from sijchen/thp61
...
[Encoder] add preencodingtasklist in task management
2015-11-12 13:54:49 +08:00
sijchen
33c378f7b7
change API for slicing part for easier usage (the UseLoadBalancing flag is still a work in progress)
2015-11-10 09:50:06 -08:00