1332 Commits

Author SHA1 Message Date
sijchen
27e803f6f4 refactor to make logic clean 2016-05-19 09:42:39 -07:00
Karina
8a341070f2 fix overflow issue 2016-05-19 12:00:49 +08:00
sijchen
1ac02f3002 fix conflict with master 2016-05-18 10:57:39 -07:00
Karina
c298d66d48 fix temporal layer skip issue 2016-05-18 09:47:49 +08:00
Haibo Zhu
85f4beb9a8 Fix the wrong variable name which casue the build error 2016-05-17 13:46:04 +08:00
ruil2
0ec686f7ec Merge pull request #2452 from sijchen/refactor_sps2
Refactoring: Wrap all the operations related to eSpsPpsIdStrategy to class
2016-05-17 09:19:14 +08:00
sijchen
00747540fb move strategy related pointer to class 2016-05-16 10:55:13 -07:00
Karina
3b55d64902 fix crash when temporal layer is skipped, the frame should not be encoded 2016-05-16 14:43:13 +08:00
sijchen
ffb85046b4 Refactoring: Wrap all the operations related to eSpsPpsIdStrategy to class, to improve code readability 2016-05-04 15:06:02 -07:00
HaiboZhu
c30cc41261 Merge pull request #2448 from saamas/encoder-getnonzerocount-sse42
[Encoder] Add an SSE4.2 implementation of WelsGetNonZeroCount
2016-05-04 09:49:47 +08:00
ruil2
e9dc97803d Merge pull request #2447 from saamas/encoder-cavlcparamcal-sse42
[Encoder] Add an SSE4.2 implementation of CavlcParamCal
2016-04-28 09:08:44 +08:00
ruil2
7d65687284 Merge pull request #2441 from saamas/encoder-add-avx2-4x4-quantization-routines
[Encoder] Add AVX2 4x4 quantization routines
2016-04-28 09:08:31 +08:00
Sindre Aamås
fb0b2b3f41 [Encoder/x86] Drop unneeded LOAD_4_PARA in CavlcParamCal_sse42 2016-04-24 22:59:35 +02:00
Sindre Aamås
d1c7713191 [Encoder/x86] Minor CavlcParamCal_sse42 tweak
Do more elaborate register allocation to avoid a few mov instructions.
2016-04-24 22:36:23 +02:00
Sindre Aamås
f56bdc3aa4 [Encoder/x86] Minor CavlcParamCal_sse42 tweak
Avoid loading single-use parameter.
2016-04-21 16:29:02 +02:00
Sindre Aamås
2eb8800712 [Encoder/x86] Remove a leftover mov instruction in CavlcParamCal_sse42 2016-04-21 15:53:33 +02:00
Sindre Aamås
4645bd26aa [Encoder] Add an SSE4.2 implementation of WelsGetNonZeroCount
Avoid touching some cache lines by using popcnt instead of table
lookups.

Also gives a speedup of ~1.4x on Haswell as compared with SSE2.
2016-04-20 19:10:24 +02:00
Sindre Aamås
3f31aff4dc [Encoder] Add an SSE4.2 implementation of CavlcParamCal
Use a combination of table lookups and pshufb to convert coefficients
to zero run/level format. Two 16-entry lookup tables are used for a
total of 192 bytes worth of tables. (The existing SSE2 version uses a
table of size 2048 bytes.)

Speedup is ~1.5x-3x as compared with the SSE2 version on Haswell (the
speedup is greater for input with many trailing zeros).

The use of popcnt makes it require SSE4.2. This can be replaced with
a small LUT and accumulation which would reduce the requirement to
SSSE3.
2016-04-20 18:37:08 +02:00
Sindre Aamås
502b16925e [UT] Add tests for CavlcParamCal_c and CavlcParamCal_sse2 2016-04-20 18:37:08 +02:00
HaiboZhu
3b68840d5f Merge pull request #2444 from GuangweiWang/fix-assembly-arm64
Fix assembly arm64
Code review at: https://rbcommons.com/s/OpenH264/r/1594/
2016-04-20 10:03:01 +08:00
Guangwei Wang
cc407b4b21 fix code style 2016-04-17 19:47:55 +08:00
Guangwei Wang
0b8cdcaff8 extension 32-bit parameters to 64-bit on arm64 assembly function 2016-04-17 19:41:57 +08:00
Karina
dd340b7fe7 modify neon comment 2016-04-14 14:49:11 +08:00
Karina
d34e209266 fix 32-bit parameters issue on arm64 assembly function 2016-04-13 19:30:08 +08:00
Sindre Aamås
bb49e23719 [Encoder] Add AVX2 4x4 quantization routines
WelsQuantFour4x4Max_avx2 (~2.06x speedup over SSE2)
WelsQuantFour4x4_avx2    (~2.32x speedup over SSE2)
WelsQuant4x4Dc_avx2      (~1.49x speedup over SSE2)
WelsQuant4x4_avx2        (~1.42x speedup over SSE2)
2016-04-13 11:56:47 +02:00
Karina
7e14400b0b refine the workflow for encode one frame 2016-03-31 16:58:20 +08:00
Karina
3927d91b85 fix temporal layer issue when output frame rate is different from input frame rate 2016-03-29 15:48:06 +08:00
Karina
10dfb2670b fix skip frames statistics issue 2016-03-24 14:17:47 +08:00
sijchen
47d310539f Squashed commit of the following:
commit c8111942e07437034a74b33887c33b5ad78e476a
Author: Karina <ruil2@cisco.com>
Date:   Wed Mar 23 14:31:18 2016 +0800

    update SHA table

commit f36a25344c25a131581dcbcd2d103fc4b131012e
Author: Karina <ruil2@cisco.com>
Date:   Wed Mar 23 13:45:58 2016 +0800

    fix bitrate overflow issue when adaptive quality turns on
2016-03-23 10:23:33 -07:00
Forrest Shi
47ad929c25 fix bug for debug mode 2016-03-23 11:14:40 +08:00
huade
a7a5b7b0f4 refactor for slice buffer init/allocate/free 2016-03-22 13:51:20 +08:00
sijchen
33bb96f604 Merge pull request #2420 from sijchen/fix_sps
[Encoder] fix the lack of eSpsPpsIdStrategy==INCREASING_ID under simulcast avc on
2016-03-21 21:51:07 -07:00
sijchen
8103988cde Merge pull request #2418 from ruil2/refine_init
fix preprocessing initialization logic
2016-03-21 11:32:24 -07:00
Karina
228cdeba1b refine reset function 2016-03-21 10:48:41 +08:00
Karina
7c15d68e24 fix preprocessing initialization logic 2016-03-18 16:43:11 +08:00
Karina
316ab31882 fix bitrate update issue 2016-03-18 14:28:32 +08:00
zhilwang
d7570bfa52 Merge pull request #2401 from saamas/decoder-use-encoder-x86-idct-routines
[Decoder] Use encoder x86 IDCT routines
2016-03-18 08:50:33 +08:00
Haibo Zhu
43f767d06e Diable assert in release with -DNDEBUG macro
Update the code to avoid the function unused warning
2016-03-17 11:24:01 +08:00
sijchen
90deb80b50 rename the functions 2016-03-14 21:41:08 -07:00
sijchen
c009183e97 fix the lack of eSpsPpsIdStrategy==INCREASING_ID under simulcast avc on 2016-03-14 11:28:44 -07:00
Karina
f84f2315ab change downsampling logic that downsampling source is from the nearest layer instead of the highest layer 2016-03-14 09:55:36 +08:00
Sindre Aamås
98042f1600 [Decoder] Use encoder x86 IDCT routines
Move asm routines to common. Delete obsolete decoder routines.

Use wider routines where applicable.

~1.07x overall faster decode on a quick 720p30 4Mbps test on Haswell.
2016-03-09 10:41:42 +01:00
Sindre Aamås
48a520915a [Encoder/x86] Add AVX2 SATD routines
WelsSampleSatd16x16_avx2 (~2.31x speedup over SSE4.1 on Haswell).
WelsSampleSatd16x8_avx2  (~2.19x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x16_avx2  (~1.68x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x8_avx2   (~1.53x speedup over SSE4.1 on Haswell).
2016-03-08 11:31:17 +01:00
HaiboZhu
d9bfc9204b Merge pull request #2394 from sijchen/th021
[Common] remove sink in WelsThreadPool and hide the construtor to finish the s…
2016-03-08 16:29:40 +08:00
Karina
fee9d502bb format update and fix build issue when turn on STAT_OUTPUT macro 2016-03-04 13:55:14 +08:00
sijchen
4db9c32976 remove sink in WelsThreadPool and hide the construtor to finish the singleTon 2016-03-02 17:08:09 -08:00
sijchen
d4f09d9048 put CWelsThreadPool to singleTon for future usage (including add sink for IWelsTask) 2016-02-29 11:40:25 -08:00
Martin Storsjö
69e3fac093 Avoid reading iCountMbNumInSlice out of bounds on slice realloc
Prior to 7bcb3ba4f4abf18a,
pCurLayer->sLayerInfo.pSliceInLayer[uiSliceIdx].iCountMbNumInSlice
was read after setting pCurLayer->sLayerInfo.pSliceInLayer to
the newly allocated, larger array. After this commit, it is read
before the array has been switched, and thus is read from the
old array (which only holds elements up to iMaxSliceNumOld, not
up to iMaxSliceNum).

This fixes reads out of bounds, and crashes in the test suite.
2016-02-25 10:31:58 +02:00
HaiboZhu
040974f735 Merge pull request #2378 from shihuade/MultiThread_V4.9_V5
add thread-based slice buffer and  refactor reallocate process
2016-02-25 14:40:56 +08:00
HaiboZhu
321c772536 Merge pull request #2372 from ruil2/refine_trace
update trace for ENCODER_OPTION_TRACE_CALLBACK
2016-02-25 10:50:12 +08:00