Commit Graph

2604 Commits

Author SHA1 Message Date
Karina
525dbe7093 add 32-bit parameter sign-extentions for block_add_aarch64_neon.S 2016-04-14 10:06:57 +08:00
Karina
d34e209266 fix 32-bit parameters issue on arm64 assembly function 2016-04-13 19:30:08 +08:00
Karina
7943764869 add missing sign extension for arm64 2016-04-12 16:27:58 +08:00
Karina
7e14400b0b refine the workflow for encode one frame 2016-03-31 16:58:20 +08:00
Karina
3927d91b85 fix temporal layer issue when output frame rate is different from input frame rate 2016-03-29 15:48:06 +08:00
sijchen
30da4f196e Merge pull request #2426 from ruil2/fix_trace
fix skip frames statistics issue
2016-03-24 09:49:48 -07:00
Martin Storsjö
a4e71d6662 Add missing sign extension for x86_64 in mb_copy.asm
This fixes running the code built for x86_64 OS X with Xcode 7.3.
2016-03-24 10:20:42 +02:00
Karina
10dfb2670b fix skip frames statistics issue 2016-03-24 14:17:47 +08:00
sijchen
47d310539f Squashed commit of the following:
commit c8111942e07437034a74b33887c33b5ad78e476a
Author: Karina <ruil2@cisco.com>
Date:   Wed Mar 23 14:31:18 2016 +0800

    update SHA table

commit f36a25344c25a131581dcbcd2d103fc4b131012e
Author: Karina <ruil2@cisco.com>
Date:   Wed Mar 23 13:45:58 2016 +0800

    fix bitrate overflow issue when adaptive quality turns on
2016-03-23 10:23:33 -07:00
HaiboZhu
c0641f40d9 Merge pull request #2423 from shihuade/SPSUpdate
fix bug for debug mode
2016-03-23 13:27:35 +08:00
Haibo Zhu
e4a4fb6577 (1) Fix the level limit check wrong condition
(2) Fix the FMO return value overflow bug
2016-03-23 11:15:26 +08:00
Forrest Shi
47ad929c25 fix bug for debug mode 2016-03-23 11:14:40 +08:00
huade
a7a5b7b0f4 refactor for slice buffer init/allocate/free 2016-03-22 13:51:20 +08:00
sijchen
33bb96f604 Merge pull request #2420 from sijchen/fix_sps
[Encoder] fix the lack of eSpsPpsIdStrategy==INCREASING_ID under simulcast avc on
2016-03-21 21:51:07 -07:00
sijchen
8103988cde Merge pull request #2418 from ruil2/refine_init
fix preprocessing initialization logic
2016-03-21 11:32:24 -07:00
Karina
228cdeba1b refine reset function 2016-03-21 10:48:41 +08:00
Karina
7c15d68e24 fix preprocessing initialization logic 2016-03-18 16:43:11 +08:00
Karina
316ab31882 fix bitrate update issue 2016-03-18 14:28:32 +08:00
zhilwang
d7570bfa52 Merge pull request #2401 from saamas/decoder-use-encoder-x86-idct-routines
[Decoder] Use encoder x86 IDCT routines
2016-03-18 08:50:33 +08:00
HaiboZhu
a8ab4afe5b Merge pull request #2410 from HaiboZhu/Add_disable_assert_in_release
Diable assert in release with -DNDEBUG macro
2016-03-17 15:46:25 +08:00
Haibo Zhu
43f767d06e Diable assert in release with -DNDEBUG macro
Update the code to avoid the function unused warning
2016-03-17 11:24:01 +08:00
unknown
693fd14272 fix memory leak when alloc failed in decoder 2016-03-17 10:31:25 +08:00
Sindre Aamås
b6c4a5447c [Decoder/x86] IDCT one block at a time with SSE2
At lower bitrates, it is overall faster to conditionally do one block
at a time with SSE2 on Haswell and likely other common architectures.
At higher bitrates, it is faster to use the wider routine that IDCTs
four blocks at a time. To avoid potential performance regressions
as compared to MMX, stick with single-block IDCTs with SSE2. There
is still a performance advantage as compared to MMX because the
single-block SSE2 routine is faster than the corresponding MMX
routine.

Stick with four blocks at a time with AVX2 for which that appears
to be consistently faster on Haswell.
2016-03-16 19:55:11 +01:00
sijchen
90deb80b50 rename the functions 2016-03-14 21:41:08 -07:00
sijchen
c009183e97 fix the lack of eSpsPpsIdStrategy==INCREASING_ID under simulcast avc on 2016-03-14 11:28:44 -07:00
Karina
f84f2315ab change downsampling logic that downsampling source is from the nearest layer instead of the highest layer 2016-03-14 09:55:36 +08:00
HaiboZhu
25f53a2e3d Merge pull request #2399 from saamas/encoder-x86-add-avx2-satd-routines
[Encoder/x86] Add AVX2 SATD routines
2016-03-10 09:59:33 +08:00
Sindre Aamås
98042f1600 [Decoder] Use encoder x86 IDCT routines
Move asm routines to common. Delete obsolete decoder routines.

Use wider routines where applicable.

~1.07x overall faster decode on a quick 720p30 4Mbps test on Haswell.
2016-03-09 10:41:42 +01:00
Haibo Zhu
31de8bb3a0 Change the level limit check behavior to make the compatibility 2016-03-09 08:34:07 +08:00
Sindre Aamås
48a520915a [Encoder/x86] Add AVX2 SATD routines
WelsSampleSatd16x16_avx2 (~2.31x speedup over SSE4.1 on Haswell).
WelsSampleSatd16x8_avx2  (~2.19x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x16_avx2  (~1.68x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x8_avx2   (~1.53x speedup over SSE4.1 on Haswell).
2016-03-08 11:31:17 +01:00
volvet
d4c68527b1 Merge pull request #2389 from saamas/common-x86-deblock-chroma-horizontal-ssse3-optimizations
[Common/x86] Deblock chroma horizontal ssse3 optimizations
2016-03-08 17:09:08 +08:00
HaiboZhu
d9bfc9204b Merge pull request #2394 from sijchen/th021
[Common] remove sink in WelsThreadPool and hide the construtor to finish the s…
2016-03-08 16:29:40 +08:00
Karina
fee9d502bb format update and fix build issue when turn on STAT_OUTPUT macro 2016-03-04 13:55:14 +08:00
sijchen
316f740630 Merge pull request #2390 from sijchen/th012
[Common] put CWelsThreadPool to singleTon for future usage
2016-03-03 09:47:20 -08:00
Martin Storsjö
7f53c29302 Fix a return value check
In 9cb4f4e8e2, the error code returned from CheckIntraNxNPredMode
was changed - therefore, these return value checks, that look
for a specific error code, need to be updated accordingly.

This fixes crashes in DecodeCrashTestAPI.DecoderCrashTest
with some seeds.
2016-03-03 10:15:34 +02:00
sijchen
4db9c32976 remove sink in WelsThreadPool and hide the construtor to finish the singleTon 2016-03-02 17:08:09 -08:00
sijchen
d4f09d9048 put CWelsThreadPool to singleTon for future usage (including add sink for IWelsTask) 2016-02-29 11:40:25 -08:00
HaiboZhu
52d25f544a Merge pull request #2386 from huili2/return_info_change
modify return value check inside decoder
2016-02-29 09:21:31 +08:00
Sindre Aamås
a009153741 [Common/x86] DeblockChromaEq4H_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

~5.80x speedup on Haswell (x86-64).
~1.69x speedup on Haswell (x86 32-bit).
2016-02-26 10:58:16 +01:00
Sindre Aamås
9909c306f1 [Common/x86] DeblockChromaLt4H_ssse3 optimizations
Use packed 8-bit operations rather than unpack to 16-bit.

~5.72x speedup on Haswell (x86-64).
~1.85x speedup on Haswell (x86 32-bit).
2016-02-26 10:58:16 +01:00
unknown
9cb4f4e8e2 modify return value check inside decoder 2016-02-26 16:29:35 +08:00
Martin Storsjö
69e3fac093 Avoid reading iCountMbNumInSlice out of bounds on slice realloc
Prior to 7bcb3ba4f4,
pCurLayer->sLayerInfo.pSliceInLayer[uiSliceIdx].iCountMbNumInSlice
was read after setting pCurLayer->sLayerInfo.pSliceInLayer to
the newly allocated, larger array. After this commit, it is read
before the array has been switched, and thus is read from the
old array (which only holds elements up to iMaxSliceNumOld, not
up to iMaxSliceNum).

This fixes reads out of bounds, and crashes in the test suite.
2016-02-25 10:31:58 +02:00
HaiboZhu
040974f735 Merge pull request #2378 from shihuade/MultiThread_V4.9_V5
add thread-based slice buffer and  refactor reallocate process
2016-02-25 14:40:56 +08:00
HaiboZhu
321c772536 Merge pull request #2372 from ruil2/refine_trace
update trace for ENCODER_OPTION_TRACE_CALLBACK
2016-02-25 10:50:12 +08:00
HaiboZhu
027f027c25 Merge pull request #2371 from GregoryJWolfe/master
Added support for "video signal type present" information.
2016-02-25 10:49:34 +08:00
huade
5e8a716c1d add thread-based slice buffer and refact reallocate process for futher change 2016-02-25 10:08:41 +08:00
Gregory J. Wolfe
c7fcba06c7 Added support for "video signal type present" information.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:

- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix ((BT709, SMPTE170M, etc.)

These modifications allow the client to specify these attributes
and, if specified, makes sure they are written to the output file.
2016-02-23 13:21:06 -05:00
ruil2
3e538617cd Merge pull request #2374 from sijchen/for_ts0
[Encoder] fix timestamp = 0 issue when rc mode is BITRATE mode
2016-02-23 17:26:20 +08:00
huade
7bcb3ba4f4 refactor slice level rc structure 2016-02-23 16:49:37 +08:00
sijchen
881fc11c48 finish the remaining prob of fixing ts=0 2016-02-22 10:40:35 -08:00