Commit Graph

4213 Commits

Author SHA1 Message Date
ruil2
e9dc97803d Merge pull request #2447 from saamas/encoder-cavlcparamcal-sse42
[Encoder] Add an SSE4.2 implementation of CavlcParamCal
2016-04-28 09:08:44 +08:00
ruil2
7d65687284 Merge pull request #2441 from saamas/encoder-add-avx2-4x4-quantization-routines
[Encoder] Add AVX2 4x4 quantization routines
2016-04-28 09:08:31 +08:00
ruil2
56618249d7 Merge pull request #2436 from saamas/processing-add-avx2-vaa-routines
[Processing] Add AVX2 VAA routines
2016-04-28 09:08:03 +08:00
Sindre Aamås
fb0b2b3f41 [Encoder/x86] Drop unneeded LOAD_4_PARA in CavlcParamCal_sse42 2016-04-24 22:59:35 +02:00
Sindre Aamås
d1c7713191 [Encoder/x86] Minor CavlcParamCal_sse42 tweak
Do more elaborate register allocation to avoid a few mov instructions.
2016-04-24 22:36:23 +02:00
Sindre Aamås
f56bdc3aa4 [Encoder/x86] Minor CavlcParamCal_sse42 tweak
Avoid loading single-use parameter.
2016-04-21 16:29:02 +02:00
Sindre Aamås
2eb8800712 [Encoder/x86] Remove a leftover mov instruction in CavlcParamCal_sse42 2016-04-21 15:53:33 +02:00
Sindre Aamås
3f31aff4dc [Encoder] Add an SSE4.2 implementation of CavlcParamCal
Use a combination of table lookups and pshufb to convert coefficients
to zero run/level format. Two 16-entry lookup tables are used for a
total of 192 bytes worth of tables. (The existing SSE2 version uses a
table of size 2048 bytes.)

Speedup is ~1.5x-3x as compared with the SSE2 version on Haswell (the
speedup is greater for input with many trailing zeros).

The use of popcnt makes it require SSE4.2. This can be replaced with
a small LUT and accumulation which would reduce the requirement to
SSSE3.
2016-04-20 18:37:08 +02:00
Sindre Aamås
502b16925e [UT] Add tests for CavlcParamCal_c and CavlcParamCal_sse2 2016-04-20 18:37:08 +02:00
HaiboZhu
98c6c6de11 Merge pull request #2446 from HaiboZhu/Reduce_log_size_for_parse_only_mode
Add the log reduce logic into parse only mode
2016-04-20 10:48:57 +08:00
HaiboZhu
3b68840d5f Merge pull request #2444 from GuangweiWang/fix-assembly-arm64
Fix assembly arm64
Code review at: https://rbcommons.com/s/OpenH264/r/1594/
2016-04-20 10:03:01 +08:00
Haibo Zhu
3ccecfbdbe Add the log reduce logic into parse only mode 2016-04-20 09:58:12 +08:00
HaiboZhu
c9433ee73b Merge pull request #2442 from ruil2/deblocking_fix
fix 32-bit parameters issue on arm64 assembly function
2016-04-18 09:21:24 +08:00
Guangwei Wang
cc407b4b21 fix code style 2016-04-17 19:47:55 +08:00
Guangwei Wang
0b8cdcaff8 extension 32-bit parameters to 64-bit on arm64 assembly function 2016-04-17 19:41:57 +08:00
Karina
1ecb9582df update arm assembly comments 2016-04-14 14:57:21 +08:00
Karina
dd340b7fe7 modify neon comment 2016-04-14 14:49:11 +08:00
Karina
525dbe7093 add 32-bit parameter sign-extentions for block_add_aarch64_neon.S 2016-04-14 10:06:57 +08:00
Karina
d34e209266 fix 32-bit parameters issue on arm64 assembly function 2016-04-13 19:30:08 +08:00
Sindre Aamås
bb49e23719 [Encoder] Add AVX2 4x4 quantization routines
WelsQuantFour4x4Max_avx2 (~2.06x speedup over SSE2)
WelsQuantFour4x4_avx2    (~2.32x speedup over SSE2)
WelsQuant4x4Dc_avx2      (~1.49x speedup over SSE2)
WelsQuant4x4_avx2        (~1.42x speedup over SSE2)
2016-04-13 11:56:47 +02:00
Sindre Aamås
1e83bec860 [UT] Add some missing quantization tests 2016-04-13 11:56:44 +02:00
Sindre Aamås
abaf3a4104 [UT] Reduce duplication in quantization tests 2016-04-13 08:59:16 +02:00
HaiboZhu
50daa8f737 Merge pull request #2439 from ruil2/deblocking_fix
add missing sign extension for arm64 on deblocking_aarch64_neon.S
2016-04-12 16:48:54 +08:00
Karina
7943764869 add missing sign extension for arm64 2016-04-12 16:27:58 +08:00
Sindre Aamås
93db6511a8 [UT] Test VAA routines with a wider variety of resolutions
Test even and odd multiples of 32 width because some AVX2 routines
have conditional logic based on that.
2016-04-11 16:40:36 +02:00
Sindre Aamås
57fc3e9917 [Processing] Add AVX2 VAA routines
Process 8 lines at a time rather than 16 lines at a time because
this appears to give more reliable memory subsystem performance on
Haswell.

Speedup is > 2x as compared to SSE2 when not memory-bound on Haswell.
On my Haswell MBP, VAACalcSadSsdBgd is about ~3x faster when uncached,
which appears to be related to processing 8 lines at a time as opposed
to 16 lines at a time. The other routines are also faster as compared
to the SSE2 routines in this case but to a lesser extent.
2016-04-11 16:09:56 +02:00
HaiboZhu
eb9f56584f Merge pull request #2432 from ruil2/refine_encode1
refine the workflow for encode one frame
2016-04-06 08:59:48 +08:00
Karina
7e14400b0b refine the workflow for encode one frame 2016-03-31 16:58:20 +08:00
HaiboZhu
c423a80ba4 Merge pull request #2431 from ruil2/temporal_layer
fix frame rate issue
2016-03-30 17:02:43 +08:00
Karina
3927d91b85 fix temporal layer issue when output frame rate is different from input frame rate 2016-03-29 15:48:06 +08:00
ruil2
17d7aa13e4 Merge pull request #2427 from mstorsjo/mktargets
Refresh regenerating targets.mk
2016-03-25 09:36:53 +08:00
sijchen
30da4f196e Merge pull request #2426 from ruil2/fix_trace
fix skip frames statistics issue
2016-03-24 09:49:48 -07:00
zhilwang
25818b0fc2 Merge pull request #2428 from mstorsjo/sign-extension
Add missing sign extension for x86_64 in mb_copy.asm
2016-03-24 17:07:42 +08:00
Martin Storsjö
a4e71d6662 Add missing sign extension for x86_64 in mb_copy.asm
This fixes running the code built for x86_64 OS X with Xcode 7.3.
2016-03-24 10:20:42 +02:00
Martin Storsjö
81493590f8 Remove a stray empty line
This disappears when regenerating the makefiles.
2016-03-24 10:01:48 +02:00
Martin Storsjö
d7bc4f5f03 Make sure that gtest-targets.mk gets regenerated with the right directory 2016-03-24 10:01:21 +02:00
Karina
10dfb2670b fix skip frames statistics issue 2016-03-24 14:17:47 +08:00
sijchen
22bec09507 Merge pull request #2425 from sijchen/ruil_rc_update
fix bitrate overflow issue when adaptive quality turns on
2016-03-23 10:54:51 -07:00
sijchen
47d310539f Squashed commit of the following:
commit c8111942e07437034a74b33887c33b5ad78e476a
Author: Karina <ruil2@cisco.com>
Date:   Wed Mar 23 14:31:18 2016 +0800

    update SHA table

commit f36a25344c25a131581dcbcd2d103fc4b131012e
Author: Karina <ruil2@cisco.com>
Date:   Wed Mar 23 13:45:58 2016 +0800

    fix bitrate overflow issue when adaptive quality turns on
2016-03-23 10:23:33 -07:00
HaiboZhu
c0641f40d9 Merge pull request #2423 from shihuade/SPSUpdate
fix bug for debug mode
2016-03-23 13:27:35 +08:00
HaiboZhu
e52c6eacb0 Merge pull request #2422 from HaiboZhu/Bugfix_level_check_error_fmo_return_value
Fix the level limit check bug and fmo return overflow bug
2016-03-23 12:14:20 +08:00
Haibo Zhu
e4a4fb6577 (1) Fix the level limit check wrong condition
(2) Fix the FMO return value overflow bug
2016-03-23 11:15:26 +08:00
Forrest Shi
47ad929c25 fix bug for debug mode 2016-03-23 11:14:40 +08:00
sijchen
22d6a94919 Merge pull request #2414 from ksb2go/master
Google has deprecated using SVN. Move over to GitHub
2016-03-22 16:20:58 -07:00
sijchen
40e1a69fae Merge pull request #2421 from shihuade/MultiThread_V5.2_Pull_V2
refactor for slice buffer init/allocate/free
2016-03-22 16:20:37 -07:00
huade
a7a5b7b0f4 refactor for slice buffer init/allocate/free 2016-03-22 13:51:20 +08:00
sijchen
33bb96f604 Merge pull request #2420 from sijchen/fix_sps
[Encoder] fix the lack of eSpsPpsIdStrategy==INCREASING_ID under simulcast avc on
2016-03-21 21:51:07 -07:00
sijchen
8103988cde Merge pull request #2418 from ruil2/refine_init
fix preprocessing initialization logic
2016-03-21 11:32:24 -07:00
Karina
228cdeba1b refine reset function 2016-03-21 10:48:41 +08:00
sijchen
38313b913d Merge pull request #2419 from ruil2/bitrate_update
fix bitrate update issue
2016-03-18 16:07:59 -07:00