Sindre Aamås
f90960983c
[Encoder] Add single-block SSE2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.
~2.31x speedup over MMX for the DCT on Haswell.
~1.92x speedup over MMX for the IDCT on Haswell.
2016-02-02 17:22:48 +01:00
Sindre Aamås
7486de2844
[Encoder] AVX2 DCT tweaks
Do some shuffling in load/store unpack/pack to save some
work in horizontal DCTs.
Use a few 128-bit broadcasts to compact data vectors a bit.
~1.04x speedup for the DCT case on Haswell.
~1.12x speedup for the IDCT case on Haswell.
2016-02-02 17:22:48 +01:00
HaiboZhu
1030820ec4
Merge pull request #2342 from sijchen/enh_ut_tem
[UT] correct and enhance the UT template, and improve tracing
2016-02-01 09:08:05 +08:00
zhilwang
c420d72443
Merge pull request #2341 from saamas/encoder-x86-dct-opt
[Encoder] x86 DCT optimizations
2016-01-28 10:33:34 +08:00
HaiboZhu
51f3bbdfde
Merge pull request #2345 from shihuade/WP8ScriptUpdate
update build script for WP8 with multiple VC versions
2016-01-24 07:56:23 +08:00
Forrest Shi
21402ca419
update build script for WP8 with multiple VC versions
2016-01-23 16:56:53 +08:00
HaiboZhu
3174e2a220
Merge pull request #2344 from mstorsjo/cleanup-map
Ignore the MSVC generated map file, remove it on make clean
2016-01-22 09:45:57 +08:00
Martin Storsjö
fa52fbfc9d
Ignore the MSVC generated map file, remove it on make clean
2016-01-21 10:23:34 +02:00
HaiboZhu
77c40e09e0
Merge pull request #2343 from HaiboZhu/Add_map_file_msvc
Generate map file for MSVC build
2016-01-21 14:34:50 +08:00
sijchen
ef329e33c3
add simulcastAvc setting in setting trace
2016-01-20 14:24:16 -08:00
sijchen
47e3f4c45c
correct and enhance the UT template
2016-01-19 17:16:39 -08:00
Sindre Aamås
cc8d541432
[UT] Utilize DCT function pointer typedefs
2016-01-19 22:00:24 +01:00
Sindre Aamås
e22d731f26
[Encoder] yasm-compatible vinserti128 syntax in DCT asm
2016-01-19 21:48:23 +01:00
Sindre Aamås
a45c10cf91
[UT] Only run AVX2 tests if host supports AVX2
2016-01-19 14:27:46 +01:00
Sindre Aamås
144ff0fd51
[Encoder] SSE2 4x4 IDCT optimizations
Use a combination of instruction types that distributes more
evenly across execution ports on common architectures.
Do the horizontal IDCT without transposing back and forth.
Minor tweaks.
~1.14x faster on Haswell. Should be faster on other architectures
as well.
2016-01-19 13:12:29 +01:00
Sindre Aamås
991e344d8c
[Encoder] SSE2 4x4 DCT optimizations
Use a combination of instruction types that distributes more
evenly across execution ports on common architectures.
Do the horizontal DCT without transposing back and forth.
Minor tweaks.
~1.54x faster on Haswell. Should be faster on other architectures
as well.
2016-01-19 13:12:28 +01:00
Sindre Aamås
3088d96978
[Encoder] Add an AVX2 4x4 IDCT implementation
~2.03x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Sindre Aamås
b267163f10
[Encoder] Add an AVX2 4x4 DCT implementation
~2.52x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Sindre Aamås
b9adbcf37c
[UT] Add missing SSE2 4x4 IDCT test
IDCT input is defined in such a way that the intermediate values
cannot legally overflow an int16_t. Random input values can cause
such overflows, which makes the output depend on the type an
implementation uses to hold intermediate results. Use a template
for the test reference implementation so that implementations
with different intermediate representations can be tested.
2016-01-19 13:12:28 +01:00
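The overflow problem described in "[UT] Add missing SSE2 4x4 IDCT test" can be illustrated with a small sketch. It assumes an H.264-style 4x4 inverse-transform butterfly with (x + 32) >> 6 rounding; `RefIdct4x4` and `IntermediatesAgree` are hypothetical names, not the actual reference code in the openh264 test suite. Instantiating the template with `int16_t` and `int32_t` intermediates gives identical output for in-range input, but the outputs diverge once a butterfly sum exceeds the int16_t range.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference 4x4 inverse transform (H.264-style butterflies),
// templated on the type T that holds intermediate results. Narrowing an
// out-of-range int to int16_t wraps modulo 2^16 (guaranteed since C++20,
// implementation-defined but wrapping on mainstream compilers before that).
template <typename T>
void RefIdct4x4(const int16_t in[16], int out[16]) {
  T tmp[16];
  // Horizontal pass: one butterfly per row.
  for (int i = 0; i < 4; ++i) {
    const int16_t* d = in + 4 * i;
    T e = static_cast<T>(d[0] + d[2]);
    T f = static_cast<T>(d[0] - d[2]);
    T g = static_cast<T>((d[1] >> 1) - d[3]);
    T h = static_cast<T>(d[1] + (d[3] >> 1));
    tmp[4 * i + 0] = static_cast<T>(e + h);
    tmp[4 * i + 1] = static_cast<T>(f + g);
    tmp[4 * i + 2] = static_cast<T>(f - g);
    tmp[4 * i + 3] = static_cast<T>(e - h);
  }
  // Vertical pass: one butterfly per column, then round and divide by 64.
  for (int j = 0; j < 4; ++j) {
    T d0 = tmp[j], d1 = tmp[4 + j], d2 = tmp[8 + j], d3 = tmp[12 + j];
    T e = static_cast<T>(d0 + d2);
    T f = static_cast<T>(d0 - d2);
    T g = static_cast<T>((d1 >> 1) - d3);
    T h = static_cast<T>(d1 + (d3 >> 1));
    out[j] = (static_cast<T>(e + h) + 32) >> 6;
    out[4 + j] = (static_cast<T>(f + g) + 32) >> 6;
    out[8 + j] = (static_cast<T>(f - g) + 32) >> 6;
    out[12 + j] = (static_cast<T>(e - h) + 32) >> 6;
  }
}

// Returns true if 16-bit and 32-bit intermediates produce the same output.
bool IntermediatesAgree(const int16_t in[16]) {
  int a[16], b[16];
  RefIdct4x4<int16_t>(in, a);
  RefIdct4x4<int32_t>(in, b);
  for (int k = 0; k < 16; ++k)
    if (a[k] != b[k]) return false;
  return true;
}
```

With a single coefficient of 64 at position 0, both instantiations produce 1 for every sample. With 20000 at positions 0 and 2, the first row butterfly sum is 40000, which wraps in int16_t, so the top-left output is 625 with 32-bit intermediates but negative with 16-bit ones.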
Sindre Aamås
8764231784
[UT] Improve DCT tests
Initialize input arrays with different random values.
Otherwise, the input to the DCT routines is effectively
all zero values after taking the difference.
Reduce duplication.
2016-01-19 13:12:28 +01:00
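The issue fixed in "[UT] Improve DCT tests" can be shown with a minimal sketch. The DCT routines transform the difference of two pixel blocks, so filling both blocks from the same generator state yields an all-zero residual. `FillRandom`, `Residual4x4` and `CountNonZero` are hypothetical helpers, not openh264 functions, and the use of std::mt19937 is an assumption for illustration.

```cpp
#include <cassert>
#include <cstdint>
#include <random>

// Hypothetical helper: fill a 4x4 pixel block with random 8-bit values.
void FillRandom(uint8_t block[16], std::mt19937& rng) {
  std::uniform_int_distribution<int> dist(0, 255);
  for (int i = 0; i < 16; ++i) block[i] = static_cast<uint8_t>(dist(rng));
}

// Hypothetical helper: the residual (a - b) that a DCT routine would transform.
void Residual4x4(const uint8_t a[16], const uint8_t b[16], int16_t diff[16]) {
  for (int i = 0; i < 16; ++i) diff[i] = static_cast<int16_t>(a[i] - b[i]);
}

// Counts the nonzero values in a residual block.
int CountNonZero(const int16_t diff[16]) {
  int n = 0;
  for (int i = 0; i < 16; ++i) n += (diff[i] != 0) ? 1 : 0;
  return n;
}
```

Seeding two generators identically reproduces the original bug (identical blocks, zero residual); drawing both blocks from one continuing stream gives different values and therefore a nonzero residual to transform.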
Sindre Aamås
7739184dfd
Update nasm requirement in README.md
We need version 2.10 or above for AVX2 support.
2016-01-19 13:12:28 +01:00
Sindre Aamås
496de8bf09
Use dist: trusty with travis
Trusty has a newer nasm version with AVX2 support.
2016-01-19 12:10:39 +01:00
Haibo Zhu
3206010a89
Generate map file for MSVC build
2016-01-19 17:03:50 +08:00
HaiboZhu
21c1c02441
Merge pull request #2334 from sijchen/fix_ut
[UT] fix the problem when the task uID is too big
2016-01-19 15:39:17 +08:00
HaiboZhu
8eb4de10a2
Merge pull request #2337 from HaiboZhu/Add_Protection_wrong_API_call
Add protection against API calls made before initialization
2016-01-19 13:42:49 +08:00
HaiboZhu
5e3e975ffb
Merge pull request #2331 from ruil2/return_value
add return value checks
2016-01-19 12:25:10 +08:00
Haibo Zhu
6d7bd2daf4
Add protection against API calls made before initialization
2016-01-19 12:00:54 +08:00
huili2
91fa9fad63
Merge pull request #2335 from mstorsjo/fix-msvc-warnings
Avoid warnings in MSVC about implicitly casting floats to integers
2016-01-18 08:48:15 +08:00
Martin Storsjö
fbe35cffca
Avoid warnings in MSVC about implicitly casting floats to integers
2016-01-16 11:10:25 +02:00
sijchen
d46cd07511
fix the problem when the task uID is too big
2016-01-15 16:06:09 -08:00
Karina
559e786fa4
add return value checks
2016-01-15 10:30:41 +08:00
HaiboZhu
d11f12db54
Merge pull request #2330 from ruil2/mt_build_1
fix build issue when some macros are turned on
2016-01-15 09:28:07 +08:00
HaiboZhu
67f925674a
Merge pull request #2329 from ruil2/layer4
use independent encoder control logic for the SAVC case
2016-01-15 09:27:58 +08:00
Karina
67f4dcf2e2
fix build issue when some macros are turned on
2016-01-14 09:40:20 +08:00
Karina
0f0d54ef51
use independent encoder control logic for the SAVC case
2016-01-14 09:16:12 +08:00
ruil2
7bfb96b2b6
Merge pull request #2327 from sijchen/th41
Multiple enhancements and a bug fix
2016-01-13 13:07:58 +08:00
sijchen
5cad0f9bba
enhance a UT to cover more cases
2016-01-11 22:01:02 -08:00
sijchen
bf35b6fee7
add a debug trace if encoder returns error
2016-01-11 22:00:24 -08:00
sijchen
19f5eb0932
complete a debug trace in load-balancing task
2016-01-11 22:00:14 -08:00
sijchen
7a8da6a468
remove unneeded code after the new task management
2016-01-11 21:59:49 -08:00
sijchen
dcdd496082
fix a bug in the multi-layer case in task management
2016-01-11 21:58:10 -08:00
HaiboZhu
b940e2cdf8
Merge pull request #2325 from ruil2/trace1
separate each layer trace output
2016-01-11 14:05:55 +08:00
ruil2
737548fe06
Merge pull request #2326 from shihuade/Win10_V1.0_Push
update auto build script for Windows 10
2016-01-08 17:15:40 +08:00
ruil2
c32263e06b
Merge pull request #2322 from HaiboZhu/Fix_Encoder_Info_Output
Fix the build errors when the encoder info output is enabled
2016-01-08 17:15:15 +08:00
huade
1d9497b7f6
update auto build script for Windows 10
2016-01-08 09:38:15 +08:00
Karina
d4f979c495
separate each layer trace output
2016-01-05 14:02:58 +08:00
HaiboZhu
303fbfeb55
Merge pull request #2324 from ruil2/update_style
update format
2016-01-05 13:08:53 +08:00
Karina
57c87f1845
update format
2016-01-05 11:40:59 +08:00
HaiboZhu
cd75541c8f
Merge pull request #2323 from ruil2/rc_timestamp
resolve abnormal timestamps (rollback or jump cases)
2015-12-31 09:55:58 +08:00
Haibo Zhu
a6a504f944
Fix the build errors when the encoder info output is enabled
2015-12-31 09:06:59 +08:00