503 Commits

Author SHA1 Message Date
Karina
ffe11835fc change FrameQP control and complexity calculation 2016-07-11 10:20:17 +08:00
HaiboZhu
d35647ec3b Merge pull request #2491 from ruil2/nalsize
add nalsize checking UT and fix nalsize control when cabac on
2016-06-15 10:24:18 +08:00
Karina
b5cef5d49c modify reserved nal header size and change source frame in NalSizeChecking UT 2016-06-08 10:12:27 +08:00
Karina
2171d84f1e add nalsize checking UT and fix nalsize control when cabac on 2016-06-03 17:36:14 +08:00
sijchen@cisco.com
a7ae1efc3a add back the missing part after merging and formatting 2016-06-01 21:33:33 -07:00
sijchen@cisco.com
8bacc3d4d0 Preprocess: refactor to improve code readability 2016-06-01 21:26:24 -07:00
HaiboZhu
515eeb41e4 Merge pull request #2481 from ruil2/maxbitrate1
fix iContinualSkipFrames calculation
2016-06-01 09:03:57 +08:00
ruil2
2d3fc37a07 Merge pull request #2484 from sijchen/refactor_preprocess13
[Encoder] Refactor: add class for diff preprocess strategy
2016-06-01 08:31:02 +08:00
sijchen@cisco.com
03863ae4c6 different preprocess actually used diff source picture management 2016-05-31 14:36:21 -07:00
sijchen@cisco.com
a1cae49732 add class for diff preprocess strategy 2016-05-31 13:48:45 -07:00
Karina
dd021b6ca8 fix iContinualSkipFrames calculation 2016-05-31 21:01:11 +08:00
Karina
64ad70b0ea get the correct did for savc case 2016-05-31 17:35:20 +08:00
Karina
4fc2b1f636 refine RC 2016-05-31 16:44:04 +08:00
ruil2
c96c8b05a8 Merge pull request #2468 from sijchen/refactor_pre
[Encoder] Refactor: create diff func for diff case to make logic clean
2016-05-23 13:21:40 +08:00
Karina
9b2dd55324 add GetBsPostion for cabac and cavlc 2016-05-20 14:29:48 +08:00
sijchen
27e803f6f4 refactor to make logic clean 2016-05-19 09:42:39 -07:00
sijchen
1ac02f3002 fix conflict with master 2016-05-18 10:57:39 -07:00
sijchen
00747540fb move strategy related pointer to class 2016-05-16 10:55:13 -07:00
sijchen
ffb85046b4 Refactoring: Wrap all the operations related to eSpsPpsIdStrategy to class, to improve code readability 2016-05-04 15:06:02 -07:00
HaiboZhu
c30cc41261 Merge pull request #2448 from saamas/encoder-getnonzerocount-sse42
[Encoder] Add an SSE4.2 implementation of WelsGetNonZeroCount
2016-05-04 09:49:47 +08:00
ruil2
e9dc97803d Merge pull request #2447 from saamas/encoder-cavlcparamcal-sse42
[Encoder] Add an SSE4.2 implementation of CavlcParamCal
2016-04-28 09:08:44 +08:00
Sindre Aamås
4645bd26aa [Encoder] Add an SSE4.2 implementation of WelsGetNonZeroCount
Avoid touching some cache lines by using popcnt instead of table
lookups.

Also gives a speedup of ~1.4x on Haswell as compared with SSE2.
2016-04-20 19:10:24 +02:00
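Note on the commit above: the idea of replacing a table lookup with a population count can be illustrated in scalar C++. This is only a sketch under assumptions. `CountNonZero16` is a hypothetical name, the 16-coefficient block size is assumed, and the real commit is an SSE4.2 assembly routine, not this code.

```cpp
#include <bitset>
#include <cstdint>

// Hypothetical helper (not the library's function): returns how many of
// the 16 coefficients in a block are non-zero.
int CountNonZero16(const int16_t* coeffs) {
  unsigned mask = 0;
  for (int i = 0; i < 16; ++i) {
    // Set bit i when coefficient i is non-zero.
    mask |= static_cast<unsigned>(coeffs[i] != 0) << i;
  }
  // Counting the set bits replaces a per-byte lookup table, so no table
  // memory (and no extra cache lines) is touched.
  return static_cast<int>(std::bitset<16>(mask).count());
}
```

Compilers typically lower `std::bitset::count` to a `popcnt` instruction when the target supports it, which is the effect the commit message describes.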
Sindre Aamås
3f31aff4dc [Encoder] Add an SSE4.2 implementation of CavlcParamCal
Use a combination of table lookups and pshufb to convert coefficients
to zero run/level format. Two 16-entry lookup tables are used for a
total of 192 bytes worth of tables. (The existing SSE2 version uses a
table of size 2048 bytes.)

Speedup is ~1.5x-3x as compared with the SSE2 version on Haswell (the
speedup is greater for input with many trailing zeros).

The use of popcnt makes it require SSE4.2. This can be replaced with
a small LUT and accumulation which would reduce the requirement to
SSSE3.
2016-04-20 18:37:08 +02:00
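Note on the commit above: the conversion to zero run/level format can be sketched in plain C++. This is a minimal sketch and not the SSE4.2 routine. `RunLevel` and `ToRunLevel` are hypothetical names, and the reverse scan order with a 16-coefficient limit is an assumption based on how CAVLC residual blocks are usually coded.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type: levels in reverse scan order, plus the number
// of zeros that precede each level in forward scan order.
struct RunLevel {
  std::vector<int16_t> levels;
  std::vector<uint8_t> runs;
  int totalZeros = 0;  // zeros located before the last non-zero coefficient
};

// Hypothetical scalar reference: scans the block from its last coefficient
// down to its first.
RunLevel ToRunLevel(const int16_t* coeffs, int n) {
  assert(n >= 0 && n <= 16);
  RunLevel out;
  int i = n - 1;
  while (i >= 0 && coeffs[i] == 0) --i;  // trailing zeros are not coded
  while (i >= 0) {
    out.levels.push_back(coeffs[i--]);
    int run = 0;
    while (i >= 0 && coeffs[i] == 0) {  // zeros preceding this level
      ++run;
      --i;
    }
    out.runs.push_back(static_cast<uint8_t>(run));
    out.totalZeros += run;
  }
  return out;
}
```

Input with many trailing zeros leaves the second loop with little to do, which is consistent with the larger speedup the commit message reports for that case.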
Sindre Aamås
502b16925e [UT] Add tests for CavlcParamCal_c and CavlcParamCal_sse2 2016-04-20 18:37:08 +02:00
Sindre Aamås
bb49e23719 [Encoder] Add AVX2 4x4 quantization routines
WelsQuantFour4x4Max_avx2 (~2.06x speedup over SSE2)
WelsQuantFour4x4_avx2    (~2.32x speedup over SSE2)
WelsQuant4x4Dc_avx2      (~1.49x speedup over SSE2)
WelsQuant4x4_avx2        (~1.42x speedup over SSE2)
2016-04-13 11:56:47 +02:00
Karina
7e14400b0b refine the workflow for encode one frame 2016-03-31 16:58:20 +08:00
huade
a7a5b7b0f4 refactor for slice buffer init/allocate/free 2016-03-22 13:51:20 +08:00
Karina
228cdeba1b refine reset function 2016-03-21 10:48:41 +08:00
Karina
f84f2315ab change downsampling logic so that the downsampling source is the nearest layer instead of the highest layer 2016-03-14 09:55:36 +08:00
Sindre Aamås
48a520915a [Encoder/x86] Add AVX2 SATD routines
WelsSampleSatd16x16_avx2 (~2.31x speedup over SSE4.1 on Haswell).
WelsSampleSatd16x8_avx2  (~2.19x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x16_avx2  (~1.68x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x8_avx2   (~1.53x speedup over SSE4.1 on Haswell).
2016-03-08 11:31:17 +01:00
sijchen
4db9c32976 remove sink in WelsThreadPool and hide the constructor to finish the singleton 2016-03-02 17:08:09 -08:00
sijchen
d4f09d9048 make CWelsThreadPool a singleton for future usage (including adding a sink for IWelsTask) 2016-02-29 11:40:25 -08:00
HaiboZhu
040974f735 Merge pull request #2378 from shihuade/MultiThread_V4.9_V5
add thread-based slice buffer and refactor reallocate process
2016-02-25 14:40:56 +08:00
HaiboZhu
027f027c25 Merge pull request #2371 from GregoryJWolfe/master
Added support for "video signal type present" information.
2016-02-25 10:49:34 +08:00
huade
5e8a716c1d add thread-based slice buffer and refactor reallocate process for further change 2016-02-25 10:08:41 +08:00
Gregory J. Wolfe
c7fcba06c7 Added support for "video signal type present" information.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:

- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix (BT709, SMPTE170M, etc.)

These modifications allow the client to specify these attributes
and, if specified, make sure they are written to the output file.
2016-02-23 13:21:06 -05:00
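Note on the commit above: in an H.264 stream these attributes are carried in the VUI part of the sequence parameter set, as a 1-bit present flag, a 3-bit video format, a 1-bit full-range flag, a 1-bit colour-description flag and three optional 8-bit codes. The sketch below writes those fields in that order. `BitWriter`, `VideoSignalInfo` and `WriteVideoSignalType` are hypothetical names for illustration, not the encoder's API, and the full-range field is not among the attributes the commit message lists.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical MSB-first bit writer, just enough for this example.
class BitWriter {
 public:
  void Put(uint32_t value, int bits) {
    for (int b = bits - 1; b >= 0; --b) {
      if (bitPos_ == 0) bytes_.push_back(0);  // start a new byte
      if ((value >> b) & 1u) {
        bytes_.back() |= static_cast<uint8_t>(0x80u >> bitPos_);
      }
      bitPos_ = (bitPos_ + 1) % 8;
    }
  }
  const std::vector<uint8_t>& bytes() const { return bytes_; }

 private:
  std::vector<uint8_t> bytes_;
  int bitPos_ = 0;  // next free bit in the last byte, 0 = most significant
};

// Hypothetical container for the client-supplied attributes.
struct VideoSignalInfo {
  bool present;
  uint8_t videoFormat;  // 3-bit code, e.g. 1 = PAL, 2 = NTSC
  bool fullRange;
  bool colourDescriptionPresent;
  uint8_t colourPrimaries;          // e.g. 1 = BT.709
  uint8_t transferCharacteristics;  // e.g. 1 = BT.709
  uint8_t matrixCoefficients;       // e.g. 1 = BT.709
};

// Writes the fields only when the client specified them.
void WriteVideoSignalType(BitWriter& bw, const VideoSignalInfo& v) {
  bw.Put(v.present ? 1u : 0u, 1);
  if (!v.present) return;
  bw.Put(v.videoFormat, 3);
  bw.Put(v.fullRange ? 1u : 0u, 1);
  bw.Put(v.colourDescriptionPresent ? 1u : 0u, 1);
  if (v.colourDescriptionPresent) {
    bw.Put(v.colourPrimaries, 8);
    bw.Put(v.transferCharacteristics, 8);
    bw.Put(v.matrixCoefficients, 8);
  }
}
```

When `present` is false only the single flag bit is written, so a decoder falls back to its defaults for these attributes.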
ruil2
3e538617cd Merge pull request #2374 from sijchen/for_ts0
[Encoder] fix timestamp = 0 issue when rc mode is BITRATE mode
2016-02-23 17:26:20 +08:00
huade
7bcb3ba4f4 refactor slice level rc structure 2016-02-23 16:49:37 +08:00
sijchen
9816e3302d fix timestamp = 0 issue when rc mode is BITRATE mode 2016-02-22 10:33:55 -08:00
Karina
597b4eef73 fix timestamp = 0 issue when rc mode is BITRATE mode. 2016-02-22 10:33:55 -08:00
Gregory J. Wolfe
f35a0daccf Added support for "video signal type present" information.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data.  The saved attributes
are:

- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix (BT709, SMPTE170M, etc.)

These modifications allow the client to specify these attributes
and, if specified, make sure they are written to the output file.
2016-02-18 11:51:51 -05:00
sijchen
914302a462 avoid memory problem if mem alloc failed in the middle of InitDqLayer 2016-02-10 21:54:53 -08:00
sijchen
aaa25160ec Merge pull request #2353 from saamas/encoder-x86-dct-opt2
[Encoder] x86 DCT optimizations
2016-02-08 15:00:12 -08:00
sijchen
e5e7013b73 Merge pull request #2350 from sijchen/th00
[Common] Add sink to IWelsTask
2016-02-08 14:59:38 -08:00
Sindre Aamås
c8c74903f8 [Encoder] Add single-block AVX2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~3.15x speedup over MMX for the DCT on Haswell.
~2.94x speedup over MMX for the IDCT on Haswell.

Returns diminish with increasing vector length because a larger
proportion of the time is spent on load/store/shuffling.
2016-02-02 17:22:49 +01:00
Sindre Aamås
f90960983c [Encoder] Add single-block SSE2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~2.31x speedup over MMX for the DCT on Haswell.
~1.92x speedup over MMX for the IDCT on Haswell.
2016-02-02 17:22:48 +01:00
Sindre Aamås
3088d96978 [Encoder] Add an AVX2 4x4 IDCT implementation
~2.03x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Sindre Aamås
b267163f10 [Encoder] Add an AVX2 4x4 DCT implementation
~2.52x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Karina
0f0d54ef51 using independent encoder control logic for SAVC case 2016-01-14 09:16:12 +08:00
sijchen
cce1c29844 add sink to IWelsTask (for further enhancements) 2016-01-13 16:24:54 -08:00