Karina
d34e209266
fix 32-bit parameters issue on arm64 assembly function
2016-04-13 19:30:08 +08:00
Sindre Aamås
bb49e23719
[Encoder] Add AVX2 4x4 quantization routines
...
WelsQuantFour4x4Max_avx2 (~2.06x speedup over SSE2)
WelsQuantFour4x4_avx2 (~2.32x speedup over SSE2)
WelsQuant4x4Dc_avx2 (~1.49x speedup over SSE2)
WelsQuant4x4_avx2 (~1.42x speedup over SSE2)
2016-04-13 11:56:47 +02:00
Sindre Aamås
1e83bec860
[UT] Add some missing quantization tests
2016-04-13 11:56:44 +02:00
Sindre Aamås
abaf3a4104
[UT] Reduce duplication in quantization tests
2016-04-13 08:59:16 +02:00
HaiboZhu
50daa8f737
Merge pull request #2439 from ruil2/deblocking_fix
...
add missing sign extension for arm64 on deblocking_aarch64_neon.S
2016-04-12 16:48:54 +08:00
Karina
7943764869
add missing sign extension for arm64
2016-04-12 16:27:58 +08:00
Sindre Aamås
93db6511a8
[UT] Test VAA routines with a wider variety of resolutions
...
Test even and odd multiples of 32 width because some AVX2 routines
have conditional logic based on that.
2016-04-11 16:40:36 +02:00
Sindre Aamås
57fc3e9917
[Processing] Add AVX2 VAA routines
...
Process 8 lines at a time rather than 16 lines at a time because
this appears to give more reliable memory subsystem performance on
Haswell.
Speedup is > 2x as compared to SSE2 when not memory-bound on Haswell.
On my Haswell MBP, VAACalcSadSsdBgd is about 3x faster when uncached,
which appears to be related to processing 8 lines at a time as opposed
to 16 lines at a time. The other routines are also faster as compared
to the SSE2 routines in this case but to a lesser extent.
2016-04-11 16:09:56 +02:00
HaiboZhu
eb9f56584f
Merge pull request #2432 from ruil2/refine_encode1
...
refine the workflow for encode one frame
2016-04-06 08:59:48 +08:00
hzwangsiyu
6d2d031fca
Update .gitignore
2016-04-04 10:32:29 +08:00
Karina
7e14400b0b
refine the workflow for encode one frame
2016-03-31 16:58:20 +08:00
HaiboZhu
c423a80ba4
Merge pull request #2431 from ruil2/temporal_layer
...
fix frame rate issue
2016-03-30 17:02:43 +08:00
Karina
3927d91b85
fix temporal layer issue when output frame rate is different from input frame rate
2016-03-29 15:48:06 +08:00
ruil2
17d7aa13e4
Merge pull request #2427 from mstorsjo/mktargets
...
Refresh regenerating targets.mk
2016-03-25 09:36:53 +08:00
sijchen
30da4f196e
Merge pull request #2426 from ruil2/fix_trace
...
fix skip frames statistics issue
2016-03-24 09:49:48 -07:00
zhilwang
25818b0fc2
Merge pull request #2428 from mstorsjo/sign-extension
...
Add missing sign extension for x86_64 in mb_copy.asm
2016-03-24 17:07:42 +08:00
Martin Storsjö
a4e71d6662
Add missing sign extension for x86_64 in mb_copy.asm
...
This fixes running the code built for x86_64 OS X with Xcode 7.3.
2016-03-24 10:20:42 +02:00
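Note on the sign-extension fixes in this log (mb_copy.asm on x86_64, the arm64 NEON routines): on 64-bit targets a 32-bit int argument such as a stride occupies only the low half of a 64-bit register, and the callee generally cannot rely on the upper half. The following is a minimal C sketch of what the missing sign extension does, not the actual assembly change; `sign_extend_32` and `row_offset` are hypothetical names used only for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper: models the 64-bit register that carries a 32-bit
   int argument. Only the low 32 bits are meaningful; the upper 32 bits
   may hold leftover data. Returns the value the callee should use. */
int64_t sign_extend_32(uint64_t reg) {
    uint32_t low = (uint32_t)reg;            /* discard the upper half */
    int64_t value = (int64_t)low;            /* 0 .. 2^32 - 1 */
    if (low & 0x80000000u) {
        value -= (int64_t)1 << 32;           /* negative in two's complement */
    }
    return value;
}

/* Hypothetical address arithmetic, as an assembly routine would do it:
   the stride must be sign-extended before it is used in 64-bit math. */
int64_t row_offset(uint64_t stride_reg, int64_t rows) {
    return sign_extend_32(stride_reg) * rows;
}

/* Small self-check covering a negative and a positive stride. */
void sign_extend_self_check(void) {
    assert(sign_extend_32(0xDEADBEEFFFFFFFF0ULL) == -16);
    assert(sign_extend_32(0x0000000000000010ULL) == 16);
}
```

Without the extension, a stride of -16 with leftover upper bits would be read as a huge 64-bit offset, which fits the kind of failure that only shows up with a particular compiler version.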
Martin Storsjö
81493590f8
Remove a stray empty line
...
This disappears when regenerating the makefiles.
2016-03-24 10:01:48 +02:00
Martin Storsjö
d7bc4f5f03
Make sure that gtest-targets.mk gets regenerated with the right directory
2016-03-24 10:01:21 +02:00
Karina
10dfb2670b
fix skip frames statistics issue
2016-03-24 14:17:47 +08:00
sijchen
22bec09507
Merge pull request #2425 from sijchen/ruil_rc_update
...
fix bitrate overflow issue when adaptive quality turns on
2016-03-23 10:54:51 -07:00
sijchen
47d310539f
Squashed commit of the following:
...
commit c8111942e07437034a74b33887c33b5ad78e476a
Author: Karina <ruil2@cisco.com>
Date: Wed Mar 23 14:31:18 2016 +0800
update SHA table
commit f36a25344c25a131581dcbcd2d103fc4b131012e
Author: Karina <ruil2@cisco.com>
Date: Wed Mar 23 13:45:58 2016 +0800
fix bitrate overflow issue when adaptive quality turns on
2016-03-23 10:23:33 -07:00
HaiboZhu
c0641f40d9
Merge pull request #2423 from shihuade/SPSUpdate
...
fix bug for debug mode
2016-03-23 13:27:35 +08:00
HaiboZhu
e52c6eacb0
Merge pull request #2422 from HaiboZhu/Bugfix_level_check_error_fmo_return_value
...
Fix the level limit check bug and fmo return overflow bug
2016-03-23 12:14:20 +08:00
Haibo Zhu
e4a4fb6577
(1) Fix the wrong condition in the level limit check
...
(2) Fix the FMO return value overflow bug
2016-03-23 11:15:26 +08:00
Forrest Shi
47ad929c25
fix bug for debug mode
2016-03-23 11:14:40 +08:00
sijchen
22d6a94919
Merge pull request #2414 from ksb2go/master
...
Google has deprecated using SVN. Move over to GitHub
2016-03-22 16:20:58 -07:00
sijchen
40e1a69fae
Merge pull request #2421 from shihuade/MultiThread_V5.2_Pull_V2
...
refactor for slice buffer init/allocate/free
2016-03-22 16:20:37 -07:00
huade
a7a5b7b0f4
refactor for slice buffer init/allocate/free
2016-03-22 13:51:20 +08:00
sijchen
33bb96f604
Merge pull request #2420 from sijchen/fix_sps
...
[Encoder] fix the lack of eSpsPpsIdStrategy==INCREASING_ID under simulcast avc on
2016-03-21 21:51:07 -07:00
sijchen
8103988cde
Merge pull request #2418 from ruil2/refine_init
...
fix preprocessing initialization logic
2016-03-21 11:32:24 -07:00
Karina
228cdeba1b
refine reset function
2016-03-21 10:48:41 +08:00
sijchen
38313b913d
Merge pull request #2419 from ruil2/bitrate_update
...
fix bitrate update issue
2016-03-18 16:07:59 -07:00
Karina
7c15d68e24
fix preprocessing initialization logic
2016-03-18 16:43:11 +08:00
Karina
316ab31882
fix bitrate update issue
2016-03-18 14:28:32 +08:00
zhilwang
d7570bfa52
Merge pull request #2401 from saamas/decoder-use-encoder-x86-idct-routines
...
[Decoder] Use encoder x86 IDCT routines
2016-03-18 08:50:33 +08:00
David Chen
7112938a28
Google has deprecated using SVN. Move over to GitHub
2016-03-17 17:25:22 -07:00
HaiboZhu
a8ab4afe5b
Merge pull request #2410 from HaiboZhu/Add_disable_assert_in_release
...
Disable assert in release with -DNDEBUG macro
2016-03-17 15:46:25 +08:00
HaiboZhu
c441f6f390
Merge pull request #2411 from huili2/memory_leak_fix
...
fix memory leak when alloc failed in decoder
2016-03-17 15:46:12 +08:00
Haibo Zhu
43f767d06e
Disable assert in release with -DNDEBUG macro
...
Update the code to avoid the function unused warning
2016-03-17 11:24:01 +08:00
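Note on the -DNDEBUG change above: when NDEBUG is defined, the standard `assert` macro expands to a no-op, so a static helper that is referenced only inside an `assert` becomes unused and can trigger an unused-function warning. The sketch below shows one common way to avoid that. It is an assumption about the general technique, not the code from this commit, and `is_aligned` and `count_nonzero` are hypothetical names.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper used only by an assert. It is compiled only when
   asserts are active, so release builds (-DNDEBUG) never see an unused
   static function. */
#ifndef NDEBUG
static int is_aligned(const void* p, size_t alignment) {
    return ((uintptr_t)p % alignment) == 0;
}
#endif

/* Hypothetical routine: counts the non-zero coefficients in a block. */
size_t count_nonzero(const int16_t* coeffs, size_t n) {
    /* With -DNDEBUG this line expands to ((void)0), so is_aligned is
       not referenced at all. */
    assert(is_aligned(coeffs, sizeof(int16_t)));

    size_t count = 0;
    for (size_t i = 0; i < n; ++i) {
        if (coeffs[i] != 0) {
            ++count;
        }
    }
    return count;
}
```

The same source builds cleanly with and without `-DNDEBUG`; only the debug build performs the alignment check.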
unknown
693fd14272
fix memory leak when alloc failed in decoder
2016-03-17 10:31:25 +08:00
Sindre Aamås
b6c4a5447c
[Decoder/x86] IDCT one block at a time with SSE2
...
At lower bitrates, it is overall faster to conditionally do one block
at a time with SSE2 on Haswell and likely other common architectures.
At higher bitrates, it is faster to use the wider routine that IDCTs
four blocks at a time. To avoid potential performance regressions
as compared to MMX, stick with single-block IDCTs with SSE2. There
is still a performance advantage as compared to MMX because the
single-block SSE2 routine is faster than the corresponding MMX
routine.
Stick with four blocks at a time with AVX2 for which that appears
to be consistently faster on Haswell.
2016-03-16 19:55:11 +01:00
huili2
a8d9576297
Merge pull request #2405 from HaiboZhu/Fix_UT_decoder_init_fail
...
Fix the decoder init failed case in UT
2016-03-16 16:28:14 +08:00
HaiboZhu
7a3b3fdbe7
Merge pull request #2403 from ruil2/downsampling1
...
change downsampling logic
2016-03-16 09:48:08 +08:00
sijchen
90deb80b50
rename the functions
2016-03-14 21:41:08 -07:00
sijchen
c009183e97
fix the lack of eSpsPpsIdStrategy==INCREASING_ID under simulcast avc on
2016-03-14 11:28:44 -07:00
Haibo Zhu
46f42ec5f3
Fix the decoder init failed case in UT
2016-03-14 17:06:58 +08:00
Karina
f84f2315ab
change downsampling logic so that the downsampling source is the nearest layer instead of the highest layer
2016-03-14 09:55:36 +08:00
HaiboZhu
25f53a2e3d
Merge pull request #2399 from saamas/encoder-x86-add-avx2-satd-routines
...
[Encoder/x86] Add AVX2 SATD routines
2016-03-10 09:59:33 +08:00
Sindre Aamås
98042f1600
[Decoder] Use encoder x86 IDCT routines
...
Move asm routines to common. Delete obsolete decoder routines.
Use wider routines where applicable.
~1.07x overall faster decode on a quick 720p30 4Mbps test on Haswell.
2016-03-09 10:41:42 +01:00