677 Commits

Author SHA1 Message Date
Sindre Aamås
b6c4a5447c [Decoder/x86] IDCT one block at a time with SSE2
At lower bitrates, it is overall faster to conditionally do one block
at a time with SSE2 on Haswell and likely other common architectures.
At higher bitrates, it is faster to use the wider routine that IDCTs
four blocks at a time. To avoid potential performance regressions
as compared to MMX, stick with single-block IDCTs with SSE2. There
is still a performance advantage as compared to MMX because the
single-block SSE2 routine is faster than the corresponding MMX
routine.

Stick with four blocks at a time with AVX2 for which that appears
to be consistently faster on Haswell.
2016-03-16 19:55:11 +01:00
Sindre Aamås
98042f1600 [Decoder] Use encoder x86 IDCT routines
Move asm routines to common. Delete obsolete decoder routines.

Use wider routines where applicable.

~1.07x overall faster decode on a quick 720p30 4Mbps test on Haswell.
2016-03-09 10:41:42 +01:00
Sindre Aamås
48a520915a [Encoder/x86] Add AVX2 SATD routines
WelsSampleSatd16x16_avx2 (~2.31x speedup over SSE4.1 on Haswell).
WelsSampleSatd16x8_avx2  (~2.19x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x16_avx2  (~1.68x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x8_avx2   (~1.53x speedup over SSE4.1 on Haswell).
2016-03-08 11:31:17 +01:00
sijchen
4db9c32976 remove sink in WelsThreadPool and hide the construtor to finish the singleTon 2016-03-02 17:08:09 -08:00
sijchen
d4f09d9048 put CWelsThreadPool to singleTon for future usage (including add sink for IWelsTask) 2016-02-29 11:40:25 -08:00
Gregory J. Wolfe
03890fe86f Added support for "video signal type present" information.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:

- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix ((BT709, SMPTE170M, etc.)

These modifications allow the client to specify these attributes
and, if specified, makes sure they are written to the output file.
2016-02-24 10:33:18 -05:00
Gregory J. Wolfe
c7fcba06c7 Added support for "video signal type present" information.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:

- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix ((BT709, SMPTE170M, etc.)

These modifications allow the client to specify these attributes
and, if specified, makes sure they are written to the output file.
2016-02-23 13:21:06 -05:00
sijchen
aaa25160ec Merge pull request #2353 from saamas/encoder-x86-dct-opt2
[Encoder] x86 DCT optimizations
2016-02-08 15:00:12 -08:00
sijchen
e5e7013b73 Merge pull request #2350 from sijchen/th00
[Common] Add sink to IWelsTask
2016-02-08 14:59:38 -08:00
Sindre Aamås
c8c74903f8 [Encoder] Add single-block AVX2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~3.15x speedup over MMX for the DCT on Haswell.
~2.94x speedup over MMX for the IDCT on Haswell.

Returns diminish with increasing vector length because a larger
proportion of the time is spent on load/store/shuffling.
2016-02-02 17:22:49 +01:00
Sindre Aamås
f90960983c [Encoder] Add single-block SSE2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~2.31x speedup over MMX for the DCT on Haswell.
~1.92x speedup over MMX for the IDCT on Haswell.
2016-02-02 17:22:48 +01:00
unknown
3873addc3d fix frame size constraints for width and height 2016-02-01 15:55:53 +08:00
HaiboZhu
1030820ec4 Merge pull request #2342 from sijchen/enh_ut_tem
[UT] correct and enhance the ut template and trace improvement
2016-02-01 09:08:05 +08:00
sijchen
47e3f4c45c correct and enhance the ut template 2016-01-19 17:16:39 -08:00
Sindre Aamås
cc8d541432 [UT] Utilize DCT function pointer typedefs 2016-01-19 22:00:24 +01:00
Sindre Aamås
a45c10cf91 [UT] Only run AVX2 tests if host supports AVX2 2016-01-19 14:27:46 +01:00
Sindre Aamås
3088d96978 [Encoder] Add an AVX2 4x4 IDCT implementation
~2.03x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Sindre Aamås
b267163f10 [Encoder] Add an AVX2 4x4 DCT implementation
~2.52x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Sindre Aamås
b9adbcf37c [UT] Add missing SSE2 4x4 IDCT test
IDCT input is defined in such a way that the intermediate values
cannot legally overflow an int16_t. The use of random values
as input causes such overflows. This results in implementation-
dependent output depending on which type is used to hold
intermediate results. Use a template for the test reference
implementation to test implementations with different
intermediate representation.
2016-01-19 13:12:28 +01:00
Sindre Aamås
8764231784 [UT] Improve DCT tests
Initialize input arrays with different random values.

Otherwise, the input to the DCT routines is effectively
all zero values after taking the difference.

Reduce duplication.
2016-01-19 13:12:28 +01:00
sijchen
d46cd07511 fix the prob in case that the task uID is too big 2016-01-15 16:06:09 -08:00
sijchen
5eb18b101e change the output way of debug trace 2016-01-13 22:13:43 -08:00
Karina
0f0d54ef51 using independent encoder control logic for SAVC case 2016-01-14 09:16:12 +08:00
sijchen
cce1c29844 add sink to IWelsTask (for further enhancements) 2016-01-13 16:24:54 -08:00
sijchen
5cad0f9bba enhance a UT to cover more case 2016-01-11 22:01:02 -08:00
huade
0f24b80af8 reduce one test sequences and let travis jobs num to 4, thus reduce test time 2015-12-14 17:18:21 +08:00
HaiboZhu
ee01b3afaf Merge pull request #2307 from huili2/fix_decstat
fix iAvgLumaQp in decStat
2015-12-14 10:26:16 +08:00
huili2
b2d4a95537 fix iAvgLumaQp in decStat 2015-12-11 14:14:42 +08:00
sijchen
0c820f4c06 adjust encoder test case to cover multi-thread without loadbalancing 2015-12-09 09:58:03 -08:00
sijchen
76ca56498a Add tasks and thread pool call for SM_SIZELIMITED_SLICE mode 2015-12-09 09:55:04 -08:00
HaiboZhu
7e9fdc181f Merge pull request #2301 from huili2/simple_parseonly_ctx
remove parseonly in decoder ctx
2015-12-09 10:33:29 +08:00
sijchen
f38d24f036 fix the conflict with the current master 2015-11-30 23:42:26 -08:00
Guangwei Wang
c917d09263 fix bug in UT code 2015-12-01 08:55:00 +08:00
sijchen
420778f4d8 add valid adjustment in test to avoid outputing warning trace 2015-11-30 11:33:13 -08:00
sijchen
42ac53b5fc update win UT project after UT structure change 2015-11-30 11:29:47 -08:00
sijchen
46667588e3 moving test cases to specific files to avoid the too long encode_decode_api_test.cpp 2015-11-30 10:47:10 -08:00
huili2
926fc67451 remove parseonly in decoder ctx 2015-11-27 08:56:20 +08:00
sijchen
05c89b75f0 remove duplicated operation after thread pool and rename a task for clearer meaning 2015-11-25 13:46:21 -08:00
sijchen
67dab5d70e Merge pull request #2266 from sijchen/ut0
[UT] put class notification to header file
2015-11-25 09:57:43 -08:00
HaiboZhu
404315ab19 Merge pull request #2270 from huili2/parseonly_api_bugfix
disable wrongly calling for parseonly related
2015-11-25 09:00:54 +08:00
huili2
9fade10d77 disable wrongly calling for parseonly related 2015-11-24 11:11:27 +08:00
sijchen
5d03a8a692 put class notification to header file 2015-11-23 15:55:24 -08:00
sijchen
f3c4b878ff update the usage of flag and MD5 value 2015-11-23 11:54:43 -08:00
Martin Storsjö
eaf4798119 Readd a test for GetOption in TestInitUninit
In dc2cbe4, the previous test for GetOption that succeeds when the
decoder is initialized was removed. Add a GetOption call for a different
option, now that DECODER_OPTION_DATAFORMAT is removed.
2015-11-20 00:17:43 +02:00
Martin Storsjö
b3b083c883 Fully initialize m_sDecParam in TestInitUninit
Before dc2cbe4, the DecoderConfigParam function returned early
since DecoderSetCsp signaled a failure, which is why the uninitialized
parameters weren't read before.

This fixes valgrind warnings about conditional jumps depending on
uninitialized values.
2015-11-20 00:13:42 +02:00
huili2
dc2cbe4a22 remove API data format in decoder in 1.6 2015-11-17 13:58:57 +08:00
sijchen
b5d890c1ea Merge pull request #2224 from sijchen/thp73
[Encoder] put the logic related to multiple D layer into a class …
2015-11-13 11:57:07 -08:00
Haibo Zhu
628befe8be Revert "Merge pull request #2217 from huili2/simply_dec_ctx"
This reverts commit 27172bafd7ff2cc80b08768a32a23470f3d6d3fd, reversing
changes made to 24916a652ee5d3e36d931c222df20966f7c158fa.
2015-11-13 20:16:03 +08:00
Karina
7c1fbad53a fix crash 2015-11-13 17:16:26 +08:00
sijchen
e508c86dac fix the missing loadbalancing part 2015-11-12 13:15:07 -08:00