Karina
9b2dd55324
add GetBsPostion for cabac and cavlc
2016-05-20 14:29:48 +08:00
sijchen
27e803f6f4
refactor to make logic clean
2016-05-19 09:42:39 -07:00
Karina
ac37666cf1
modify the interface to use an independent subseqID for each layer
2016-05-19 17:17:17 +08:00
sijchen
a5e4cca710
Merge pull request #2467 from ruil2/overflow
...
fix overflow issue
2016-05-18 21:35:32 -07:00
Karina
8a341070f2
fix overflow issue
2016-05-19 12:00:49 +08:00
sijchen
3fd490dbed
Merge pull request #2460 from sijchen/refactor_ref2
...
[Encoder] move strategy related pointer to class
2016-05-18 11:40:08 -07:00
sijchen
1ac02f3002
fix conflict with master
2016-05-18 10:57:39 -07:00
sijchen
7188e50acf
Merge pull request #2465 from ruil2/skip_layers
...
fix temporal layer skip issue
2016-05-18 09:34:09 -07:00
Karina
c298d66d48
fix temporal layer skip issue
2016-05-18 09:47:49 +08:00
sijchen
6d79601d93
Merge pull request #2463 from HaiboZhu/Fix_build_error_windows_debug
...
Fix the wrong variable name which causes the build error
2016-05-16 22:57:32 -07:00
Haibo Zhu
85f4beb9a8
Fix the wrong variable name which causes the build error
2016-05-17 13:46:04 +08:00
HaiboZhu
46220cfb3b
Merge pull request #2461 from HaiboZhu/Bugfix_remove_undefined_behavior_warning
...
Remove the undefined behavior warning in parse_cabac
2016-05-17 10:51:18 +08:00
Haibo Zhu
86c1f0d2c6
Remove the undefined behavior warning in parse_cabac
2016-05-17 09:40:03 +08:00
ruil2
0ec686f7ec
Merge pull request #2452 from sijchen/refactor_sps2
...
Refactoring: Wrap all the operations related to eSpsPpsIdStrategy to class
2016-05-17 09:19:14 +08:00
sijchen
1eb735299a
Merge pull request #2458 from ruil2/downsampling2
...
add one new downsampling algorithm
2016-05-16 10:59:35 -07:00
sijchen
00747540fb
move strategy related pointer to class
2016-05-16 10:55:13 -07:00
HaiboZhu
f623aa318d
Merge pull request #2459 from ruil2/fix_crash
...
fix crash when temporal layer is skipped: the frame should not be encoded
2016-05-16 15:35:38 +08:00
Karina
3b55d64902
fix crash when temporal layer is skipped: the frame should not be encoded
2016-05-16 14:43:13 +08:00
Karina
96b2a87030
add one new downsampling algorithm
2016-05-16 09:28:19 +08:00
sijchen
3fa9a4840a
Merge pull request #2433 from hzwangsiyu/master
...
Update .gitignore
2016-05-05 16:27:56 -07:00
sijchen
ffb85046b4
Refactoring: Wrap all the operations related to eSpsPpsIdStrategy to class, to improve code readability
2016-05-04 15:06:02 -07:00
HaiboZhu
c30cc41261
Merge pull request #2448 from saamas/encoder-getnonzerocount-sse42
...
[Encoder] Add an SSE4.2 implementation of WelsGetNonZeroCount
2016-05-04 09:49:47 +08:00
ruil2
e9dc97803d
Merge pull request #2447 from saamas/encoder-cavlcparamcal-sse42
...
[Encoder] Add an SSE4.2 implementation of CavlcParamCal
2016-04-28 09:08:44 +08:00
ruil2
7d65687284
Merge pull request #2441 from saamas/encoder-add-avx2-4x4-quantization-routines
...
[Encoder] Add AVX2 4x4 quantization routines
2016-04-28 09:08:31 +08:00
ruil2
56618249d7
Merge pull request #2436 from saamas/processing-add-avx2-vaa-routines
...
[Processing] Add AVX2 VAA routines
2016-04-28 09:08:03 +08:00
Sindre Aamås
fb0b2b3f41
[Encoder/x86] Drop unneeded LOAD_4_PARA in CavlcParamCal_sse42
2016-04-24 22:59:35 +02:00
Sindre Aamås
d1c7713191
[Encoder/x86] Minor CavlcParamCal_sse42 tweak
...
Do more elaborate register allocation to avoid a few mov instructions.
2016-04-24 22:36:23 +02:00
Sindre Aamås
f56bdc3aa4
[Encoder/x86] Minor CavlcParamCal_sse42 tweak
...
Avoid loading single-use parameter.
2016-04-21 16:29:02 +02:00
Sindre Aamås
2eb8800712
[Encoder/x86] Remove a leftover mov instruction in CavlcParamCal_sse42
2016-04-21 15:53:33 +02:00
Sindre Aamås
4645bd26aa
[Encoder] Add an SSE4.2 implementation of WelsGetNonZeroCount
...
Avoid touching some cache lines by using popcnt instead of table
lookups.
Also gives a speedup of ~1.4x on Haswell as compared with SSE2.
2016-04-20 19:10:24 +02:00
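Note on the popcnt approach described in the commit above: a minimal scalar sketch in C, not the OpenH264 assembly. The function name `count_nonzero_popcount` and its signature are hypothetical. It assumes the idea is to set one mask bit per nonzero coefficient and then count the set bits, so no lookup table has to be read from memory. `__builtin_popcount` is a GCC/Clang builtin; whether it compiles to a single popcnt instruction depends on the target flags.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical helper (not the OpenH264 routine): count the nonzero
 * entries among n <= 32 coefficients by setting one mask bit per
 * nonzero coefficient and then counting the set bits. */
static int count_nonzero_popcount(const int16_t *level, int n) {
    assert(n >= 0 && n <= 32);   /* the mask only has 32 bits */
    uint32_t mask = 0;
    for (int i = 0; i < n; i++) {
        /* (level[i] != 0) is 0 or 1; shift it into bit position i. */
        mask |= (uint32_t)(level[i] != 0) << i;
    }
    /* GCC/Clang builtin: number of set bits in the mask. */
    return __builtin_popcount(mask);
}
```

In a SIMD version the mask would typically come from a vector compare followed by a move-mask instruction rather than from a loop; that part is an assumption about the general technique, not a description of this commit's code.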
Sindre Aamås
d906dda224
[UT] Improve GetNonZeroCount tests
...
Reduce duplication.
Test more combinations.
Always test boundary cases.
2016-04-20 19:10:24 +02:00
Sindre Aamås
3f31aff4dc
[Encoder] Add an SSE4.2 implementation of CavlcParamCal
...
Use a combination of table lookups and pshufb to convert coefficients
to zero run/level format. Two 16-entry lookup tables are used for a
total of 192 bytes worth of tables. (The existing SSE2 version uses a
table of size 2048 bytes.)
Speedup is ~1.5x-3x as compared with the SSE2 version on Haswell (the
speedup is greater for input with many trailing zeros).
The use of popcnt makes it require SSE4.2. This can be replaced with
a small LUT and accumulation which would reduce the requirement to
SSSE3.
2016-04-20 18:37:08 +02:00
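Note on the zero run/level format mentioned in the commit above: a plain scalar sketch in C of what such a conversion can look like. It is not the SSE4.2 routine and not a copy of the library's C version. The name `to_run_level`, its parameters, and the choice to scan from the last coefficient backwards are assumptions made for illustration.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical scalar sketch: walk the coefficients from the last index
 * backwards, store each nonzero value in level[] and the number of zeros
 * that precede it (in scan order) in run[].
 * Returns the total number of zeros counted between/before the nonzero
 * coefficients; trailing zeros are skipped and not counted. */
static int to_run_level(const int16_t *coeff, int n,
                        int16_t *level, uint8_t *run, int *total_coeff) {
    assert(n >= 0 && n <= 16);
    int i = n - 1;
    int count = 0;
    int total_zeros = 0;

    /* Skip trailing zeros after the last nonzero coefficient. */
    while (i >= 0 && coeff[i] == 0) {
        i--;
    }
    while (i >= 0) {
        int zeros = 0;
        level[count] = coeff[i--];          /* nonzero coefficient */
        while (i >= 0 && coeff[i] == 0) {   /* zeros preceding it */
            zeros++;
            i--;
        }
        run[count++] = (uint8_t)zeros;
        total_zeros += zeros;
    }
    *total_coeff = count;
    return total_zeros;
}
```

For the input `{5, 0, 0, -2, 1, 0, 0, 0}` this sketch produces levels `1, -2, 5` with runs `0, 2, 0`. Input with many trailing zeros exits the first loop early and leaves little work for the second, which fits the commit's remark that such input benefits most.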
Sindre Aamås
502b16925e
[UT] Add tests for CavlcParamCal_c and CavlcParamCal_sse2
2016-04-20 18:37:08 +02:00
HaiboZhu
98c6c6de11
Merge pull request #2446 from HaiboZhu/Reduce_log_size_for_parse_only_mode
...
Add the log reduction logic to parse-only mode
2016-04-20 10:48:57 +08:00
HaiboZhu
3b68840d5f
Merge pull request #2444 from GuangweiWang/fix-assembly-arm64
...
Fix assembly arm64
Code review at: https://rbcommons.com/s/OpenH264/r/1594/
2016-04-20 10:03:01 +08:00
Haibo Zhu
3ccecfbdbe
Add the log reduction logic to parse-only mode
2016-04-20 09:58:12 +08:00
HaiboZhu
c9433ee73b
Merge pull request #2442 from ruil2/deblocking_fix
...
fix 32-bit parameter issue in arm64 assembly functions
2016-04-18 09:21:24 +08:00
Guangwei Wang
cc407b4b21
fix code style
2016-04-17 19:47:55 +08:00
Guangwei Wang
0b8cdcaff8
extend 32-bit parameters to 64-bit in arm64 assembly functions
2016-04-17 19:41:57 +08:00
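Note on the arm64 fixes in this part of the log: under the AArch64 procedure call standard, the upper 32 bits of a register carrying a 32-bit argument are unspecified, so an assembly routine that uses such an argument as a 64-bit value (a stride added to a pointer, for example) has to sign-extend it first. The C sketch below only models that step on a raw 64-bit value. The name `sign_extend_low32` is hypothetical and the code is not taken from the OpenH264 sources.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model of what a sign-extending instruction does to a
 * register: ignore the upper 32 bits and widen the lower 32 bits as a
 * signed value. Written without implementation-defined conversions. */
static int64_t sign_extend_low32(uint64_t reg) {
    uint32_t low = (uint32_t)reg;   /* drop the unspecified upper half */
    /* Flip the sign bit, then subtract 2^31: maps 0..2^32-1 onto
     * -2^31..2^31-1 in two's complement order. */
    return (int64_t)(low ^ 0x80000000u) - (int64_t)0x80000000u;
}
```

With garbage in the upper half, `sign_extend_low32(0xDEADBEEFFFFFFFF0)` still yields -16, which is the value a negative 32-bit stride should contribute to address arithmetic.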
Karina
1ecb9582df
update arm assembly comments
2016-04-14 14:57:21 +08:00
Karina
dd340b7fe7
modify neon comment
2016-04-14 14:49:11 +08:00
Karina
525dbe7093
add 32-bit parameter sign-extensions for block_add_aarch64_neon.S
2016-04-14 10:06:57 +08:00
Karina
d34e209266
fix 32-bit parameter issue in arm64 assembly functions
2016-04-13 19:30:08 +08:00
Sindre Aamås
bb49e23719
[Encoder] Add AVX2 4x4 quantization routines
...
WelsQuantFour4x4Max_avx2 (~2.06x speedup over SSE2)
WelsQuantFour4x4_avx2 (~2.32x speedup over SSE2)
WelsQuant4x4Dc_avx2 (~1.49x speedup over SSE2)
WelsQuant4x4_avx2 (~1.42x speedup over SSE2)
2016-04-13 11:56:47 +02:00
Sindre Aamås
1e83bec860
[UT] Add some missing quantization tests
2016-04-13 11:56:44 +02:00
Sindre Aamås
abaf3a4104
[UT] Reduce duplication in quantization tests
2016-04-13 08:59:16 +02:00
HaiboZhu
50daa8f737
Merge pull request #2439 from ruil2/deblocking_fix
...
add missing sign extension for arm64 on deblocking_aarch64_neon.S
2016-04-12 16:48:54 +08:00
Karina
7943764869
add missing sign extension for arm64
2016-04-12 16:27:58 +08:00
Sindre Aamås
93db6511a8
[UT] Test VAA routines with a wider variety of resolutions
...
Test even and odd multiples of 32 width because some AVX2 routines
have conditional logic based on that.
2016-04-11 16:40:36 +02:00
Sindre Aamås
57fc3e9917
[Processing] Add AVX2 VAA routines
...
Process 8 lines at a time rather than 16 lines at a time because
this appears to give more reliable memory subsystem performance on
Haswell.
Speedup is > 2x as compared to SSE2 when not memory-bound on Haswell.
On my Haswell MBP, VAACalcSadSsdBgd is about 3x faster when uncached,
which appears to be related to processing 8 lines at a time as opposed
to 16 lines at a time. The other routines are also faster as compared
to the SSE2 routines in this case but to a lesser extent.
2016-04-11 16:09:56 +02:00