4056 Commits

Author SHA1 Message Date
Sindre Aamås
991e344d8c [Encoder] SSE2 4x4 DCT optimizations
Use a combination of instruction types that distributes more
evenly across execution ports on common architectures.

Do the horizontal DCT without transposing back and forth.

Minor tweaks.

~1.54x faster on Haswell. Should be faster on other architectures
as well.
2016-01-19 13:12:28 +01:00
Sindre Aamås
3088d96978 [Encoder] Add an AVX2 4x4 IDCT implementation
~2.03x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Sindre Aamås
b267163f10 [Encoder] Add an AVX2 4x4 DCT implementation
~2.52x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Sindre Aamås
b9adbcf37c [UT] Add missing SSE2 4x4 IDCT test
IDCT input is defined in such a way that the intermediate values
cannot legally overflow an int16_t. The use of random values
as input causes such overflows. This results in implementation-
dependent output depending on which type is used to hold
intermediate results. Use a template for the test reference
implementation to test implementations with different
intermediate representation.
2016-01-19 13:12:28 +01:00
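
The point about intermediate representation in the IDCT test commit above can be illustrated with a minimal sketch. It is not the repository's test code: the names `Butterfly` and `RefIdct4x4`, the argument layout, and the prediction-plus-residual output are assumptions made for illustration. The reference applies the standard H.264 4x4 inverse-transform butterflies and narrows every intermediate result to the template parameter `T`, so `T = int16_t` models an implementation that keeps intermediates in 16 bits and `T = int32_t` models a wider one.

```cpp
#include <algorithm>
#include <cassert>
#include <cstdint>

// One 1-D butterfly of the H.264 4x4 inverse transform (hypothetical helper).
// Every result is narrowed to T, so T = int16_t models 16-bit intermediates.
template <typename T>
void Butterfly(T a0, T a1, T a2, T a3, T* r, int stride) {
  const T e0 = static_cast<T>(a0 + a2);
  const T e1 = static_cast<T>(a0 - a2);
  const T e2 = static_cast<T>((a1 >> 1) - a3);
  const T e3 = static_cast<T>(a1 + (a3 >> 1));
  r[0 * stride] = static_cast<T>(e0 + e3);
  r[1 * stride] = static_cast<T>(e1 + e2);
  r[2 * stride] = static_cast<T>(e1 - e2);
  r[3 * stride] = static_cast<T>(e0 - e3);
}

// Reference inverse transform parameterised on the intermediate type T
// (an illustrative sketch, not the project's actual reference function).
template <typename T>
void RefIdct4x4(const int16_t coef[16], const uint8_t pred[16], uint8_t out[16]) {
  T tmp[16];
  for (int i = 0; i < 4; ++i)  // horizontal pass, one row at a time
    Butterfly<T>(coef[4 * i], coef[4 * i + 1], coef[4 * i + 2], coef[4 * i + 3],
                 &tmp[4 * i], 1);
  for (int j = 0; j < 4; ++j)  // vertical pass; inputs are passed by value
    Butterfly<T>(tmp[j], tmp[4 + j], tmp[8 + j], tmp[12 + j], &tmp[j], 4);
  for (int k = 0; k < 16; ++k) {
    // Round, scale down by 64, add the prediction and clip to 8 bits.
    const int v = pred[k] + ((tmp[k] + 32) >> 6);
    out[k] = static_cast<uint8_t>(std::min(255, std::max(0, v)));
  }
}
```

For input that stays within the legal range, `RefIdct4x4<int16_t>` and `RefIdct4x4<int32_t>` produce the same output. For arbitrary random coefficients the two instantiations can disagree, which is why a test would compare each optimized routine against the instantiation that matches its intermediate width.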
Sindre Aamås
8764231784 [UT] Improve DCT tests
Initialize input arrays with different random values.

Otherwise, the input to the DCT routines is effectively
all zero values after taking the difference.

Reduce duplication.
2016-01-19 13:12:28 +01:00
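
The input problem described in the DCT test commit above can be sketched as follows. This is an assumption-laden illustration, not the project's test code: `FillDistinct` and `ResidualIsAllZero` are hypothetical helpers. The forward DCT operates on the difference between a source block and a prediction block, so if both blocks hold the same values the residual is zero everywhere and the transform is never exercised.

```cpp
#include <cstdint>
#include <random>

// Hypothetical helper: fill the source and prediction blocks with
// separately drawn random pixels so that their difference is non-trivial.
inline void FillDistinct(uint8_t src[16], uint8_t pred[16], unsigned seed) {
  std::mt19937 rng(seed);
  std::uniform_int_distribution<int> pixel(0, 255);
  for (int i = 0; i < 16; ++i) src[i] = static_cast<uint8_t>(pixel(rng));
  for (int i = 0; i < 16; ++i) pred[i] = static_cast<uint8_t>(pixel(rng));
}

// Returns true when src - pred is zero at every position, i.e. when the
// DCT under test would only ever see an all-zero input block.
inline bool ResidualIsAllZero(const uint8_t src[16], const uint8_t pred[16]) {
  for (int i = 0; i < 16; ++i)
    if (src[i] != pred[i]) return false;
  return true;
}
```

A test can call `ResidualIsAllZero` as a sanity check on its own input before comparing the optimized routine with the reference.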
Sindre Aamås
7739184dfd Update nasm requirement in README.md
We need version 2.10 or above for AVX2 support.
2016-01-19 13:12:28 +01:00
Sindre Aamås
496de8bf09 Use dist: trusty with travis
Trusty has a newer nasm version with AVX2 support.
2016-01-19 12:10:39 +01:00
HaiboZhu
21c1c02441 Merge pull request #2334 from sijchen/fix_ut
[UT] fix the prob in case that the task uID is too big
2016-01-19 15:39:17 +08:00
HaiboZhu
8eb4de10a2 Merge pull request #2337 from HaiboZhu/Add_Protection_wrong_API_call
Add protection for wrong API call without initialize
2016-01-19 13:42:49 +08:00
HaiboZhu
5e3e975ffb Merge pull request #2331 from ruil2/return_value
add return value judgment
2016-01-19 12:25:10 +08:00
Haibo Zhu
6d7bd2daf4 Add protection for wrong API call without initialize 2016-01-19 12:00:54 +08:00
huili2
91fa9fad63 Merge pull request #2335 from mstorsjo/fix-msvc-warnings
Avoid warnings in MSVC about implicitly casting floats to integers
2016-01-18 08:48:15 +08:00
Martin Storsjö
fbe35cffca Avoid warnings in MSVC about implicitly casting floats to integers 2016-01-16 11:10:25 +02:00
sijchen
d46cd07511 fix the prob in case that the task uID is too big 2016-01-15 16:06:09 -08:00
Karina
559e786fa4 add return value judgment 2016-01-15 10:30:41 +08:00
HaiboZhu
d11f12db54 Merge pull request #2330 from ruil2/mt_build_1
fix build issue when some macro turn on
2016-01-15 09:28:07 +08:00
HaiboZhu
67f925674a Merge pull request #2329 from ruil2/layer4
using independent encoder control logic for SAVC case
2016-01-15 09:27:58 +08:00
Karina
67f4dcf2e2 fix build issue when some macro turn on 2016-01-14 09:40:20 +08:00
Karina
0f0d54ef51 using independent encoder control logic for SAVC case 2016-01-14 09:16:12 +08:00
ruil2
7bfb96b2b6 Merge pull request #2327 from sijchen/th41
Multiple enhancements and a bug fix
2016-01-13 13:07:58 +08:00
sijchen
5cad0f9bba enhance a UT to cover more case 2016-01-11 22:01:02 -08:00
sijchen
bf35b6fee7 add a debug trace if encoder returns error 2016-01-11 22:00:24 -08:00
sijchen
19f5eb0932 complete a debug trace in load-balancing task 2016-01-11 22:00:14 -08:00
sijchen
7a8da6a468 remove unneeded code after new task management 2016-01-11 21:59:49 -08:00
sijchen
dcdd496082 fix a bug in multi-layer case in task-management 2016-01-11 21:58:10 -08:00
HaiboZhu
b940e2cdf8 Merge pull request #2325 from ruil2/trace1
separate each layer trace output
2016-01-11 14:05:55 +08:00
ruil2
737548fe06 Merge pull request #2326 from shihuade/Win10_V1.0_Push
update auto build script for windows 10
2016-01-08 17:15:40 +08:00
ruil2
c32263e06b Merge pull request #2322 from HaiboZhu/Fix_Encoder_Info_Output
Fix the build errors when open the encoder info output
2016-01-08 17:15:15 +08:00
huade
1d9497b7f6 update auto build script for windows 10 2016-01-08 09:38:15 +08:00
Karina
d4f979c495 separate each layer trace output 2016-01-05 14:02:58 +08:00
HaiboZhu
303fbfeb55 Merge pull request #2324 from ruil2/update_style
update format
2016-01-05 13:08:53 +08:00
Karina
57c87f1845 update format 2016-01-05 11:40:59 +08:00
HaiboZhu
cd75541c8f Merge pull request #2323 from ruil2/rc_timestamp
resolve abnormal timestamp(rollback or jump case)
2015-12-31 09:55:58 +08:00
Haibo Zhu
a6a504f944 Fix the build errors when open the encoder info output 2015-12-31 09:06:59 +08:00
HaiboZhu
539818101f Merge pull request #2321 from huili2/modify_ec_option_comment
modify EC method comment in API
2015-12-30 14:21:58 +08:00
huili2
740968d1f6 modify EC method comment in API 2015-12-30 13:41:29 +08:00
Karina
0d5db3d986 resolve abnormal timestamp(rollback or jump case) 2015-12-29 15:05:42 +08:00
ruil2
e3c2cb00a5 Merge pull request #2317 from shihuade/Scripts_V3
update scripts
2015-12-18 14:50:18 +08:00
huade
f79361ac35 update scripts 2015-12-18 09:05:12 +08:00
sijchen
100e952231 Merge pull request #2314 from shihuade/MultiThread_V4.5_SliceBsRefact_V1
remove pSliceBs from ctx
2015-12-17 12:02:00 -08:00
sijchen
1b0735c3a9 Merge pull request #2315 from shihuade/Scripts_V2
add scripts for multi-encoder comparison
2015-12-17 12:01:49 -08:00
huade
74d73ac7ec add scripts for multi-encoder comparison 2015-12-17 16:22:55 +08:00
huade
f161566458 remove pSliceBs from ctx 2015-12-15 17:10:52 +08:00
HaiboZhu
04bfacd7e1 Merge pull request #2313 from shihuade/MultiThread_V4.4_ThreadIdcUnify
refactor threadIdc and CPU cores logic in init module
2015-12-15 13:56:49 +08:00
huade
ef38c2abf8 refactor threadIdc and CPU cores logic in init module 2015-12-15 11:27:00 +08:00
sijchen
e75c5852e8 Merge pull request #2312 from shihuade/TravisTestCase
reduce one test sequences and let travis jobs num to 4, thus reduce test time
2015-12-14 09:44:38 -08:00
sijchen
406f89ec54 Merge pull request #2309 from shihuade/MultiThread_V4.4_ThreadSliceNum_V3_Pull
remove iCountThreadsNum and unify with iMultipleThreadIdc
2015-12-14 09:44:13 -08:00
sijchen
2620f4bcfd Merge pull request #2310 from shihuade/MultiThread_V4.5_LayerSizeFixed
fixed layer size update bugs
2015-12-14 09:44:01 -08:00
huade
0f24b80af8 reduce one test sequences and let travis jobs num to 4, thus reduce test time 2015-12-14 17:18:21 +08:00
huade
549a1b9bf4 fixed layer size update bugs 2015-12-14 14:56:09 +08:00