Commit Graph

203 Commits

Author SHA1 Message Date
sijchen
ffb85046b4 Refactoring: Wrap all the operations related to eSpsPpsIdStrategy to class, to improve code readability 2016-05-04 15:06:02 -07:00
HaiboZhu
c30cc41261 Merge pull request #2448 from saamas/encoder-getnonzerocount-sse42
[Encoder] Add an SSE4.2 implementation of WelsGetNonZeroCount
2016-05-04 09:49:47 +08:00
ruil2
e9dc97803d Merge pull request #2447 from saamas/encoder-cavlcparamcal-sse42
[Encoder] Add an SSE4.2 implementation of CavlcParamCal
2016-04-28 09:08:44 +08:00
Sindre Aamås
4645bd26aa [Encoder] Add an SSE4.2 implementation of WelsGetNonZeroCount
Avoid touching some cache lines by using popcnt instead of table
lookups.

Also gives a speedup of ~1.4x on Haswell as compared with SSE2.
2016-04-20 19:10:24 +02:00
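The popcnt idea described in the commit above can be illustrated with a minimal C++ sketch. It assumes the routine counts the nonzero values among 16 16-bit coefficients. `CountNonZero16` is a hypothetical name, and `std::bitset::count` stands in for the popcnt instruction; this is not the SSE4.2 assembly from the commit.

```cpp
#include <bitset>
#include <cassert>
#include <cstdint>

// Hypothetical helper, not the OpenH264 routine: counts nonzero entries among
// 16 coefficients by building a bitmask and taking its population count,
// so no lookup table (and no extra cache line) is touched.
int CountNonZero16(const int16_t* level) {
  assert(level != nullptr);
  unsigned mask = 0;
  for (int i = 0; i < 16; ++i) {
    // Bit i is set when coefficient i is nonzero.
    mask |= static_cast<unsigned>(level[i] != 0) << i;
  }
  return static_cast<int>(std::bitset<16>(mask).count());
}
```

A SIMD version would typically obtain the mask from a vector compare followed by a move-mask instruction instead of the scalar loop.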
Sindre Aamås
d906dda224 [UT] Improve GetNonZeroCount tests
Reduce duplication.
Test more combinations.
Always test boundary cases.
2016-04-20 19:10:24 +02:00
Sindre Aamås
3f31aff4dc [Encoder] Add an SSE4.2 implementation of CavlcParamCal
Use a combination of table lookups and pshufb to convert coefficients
to zero run/level format. Two 16-entry lookup tables are used for a
total of 192 bytes worth of tables. (The existing SSE2 version uses a
table of size 2048 bytes.)

Speedup is ~1.5x-3x as compared with the SSE2 version on Haswell (the
speedup is greater for input with many trailing zeros).

The use of popcnt makes it require SSE4.2. This can be replaced with
a small LUT and accumulation which would reduce the requirement to
SSSE3.
2016-04-20 18:37:08 +02:00
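For readers unfamiliar with the zero run/level format mentioned above, here is a scalar C++ sketch of the conversion. It assumes the coefficients are already in scan order and are walked from the highest index down, as CAVLC coding does. `RunLevel` and `ToRunLevel` are hypothetical names, and the sketch uses neither the lookup tables nor pshufb from the commit.

```cpp
#include <cassert>
#include <cstdint>
#include <vector>

// Hypothetical result type for illustration only.
struct RunLevel {
  std::vector<int16_t> levels;  // nonzero values, highest scan index first
  std::vector<int> runs;        // zeros immediately preceding each level
  int totalZeros = 0;           // sum of all runs
};

// Converts `count` coefficients in scan order to run/level form.
RunLevel ToRunLevel(const int16_t* coeff, int count) {
  assert(coeff != nullptr && count >= 0);
  RunLevel out;
  int i = count - 1;
  while (i >= 0 && coeff[i] == 0) --i;  // trailing zeros are not coded
  while (i >= 0) {
    out.levels.push_back(coeff[i--]);
    int run = 0;
    while (i >= 0 && coeff[i] == 0) {  // count zeros before this level
      ++run;
      --i;
    }
    out.runs.push_back(run);
    out.totalZeros += run;
  }
  return out;
}
```

Input with many trailing zeros exits the first loop early and leaves little work for the second, which is consistent with the larger speedup the commit reports for such input.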
Sindre Aamås
502b16925e [UT] Add tests for CavlcParamCal_c and CavlcParamCal_sse2 2016-04-20 18:37:08 +02:00
Sindre Aamås
bb49e23719 [Encoder] Add AVX2 4x4 quantization routines
WelsQuantFour4x4Max_avx2 (~2.06x speedup over SSE2)
WelsQuantFour4x4_avx2    (~2.32x speedup over SSE2)
WelsQuant4x4Dc_avx2      (~1.49x speedup over SSE2)
WelsQuant4x4_avx2        (~1.42x speedup over SSE2)
2016-04-13 11:56:47 +02:00
Sindre Aamås
1e83bec860 [UT] Add some missing quantization tests 2016-04-13 11:56:44 +02:00
Sindre Aamås
abaf3a4104 [UT] Reduce duplication in quantization tests 2016-04-13 08:59:16 +02:00
Sindre Aamås
48a520915a [Encoder/x86] Add AVX2 SATD routines
WelsSampleSatd16x16_avx2 (~2.31x speedup over SSE4.1 on Haswell).
WelsSampleSatd16x8_avx2  (~2.19x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x16_avx2  (~1.68x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x8_avx2   (~1.53x speedup over SSE4.1 on Haswell).
2016-03-08 11:31:17 +01:00
Gregory J. Wolfe
03890fe86f Added support for "video signal type present" information.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:

- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix (BT709, SMPTE170M, etc.)

These modifications allow the client to specify these attributes
and, if specified, make sure they are written to the output file.
2016-02-24 10:33:18 -05:00
Gregory J. Wolfe
c7fcba06c7 Added support for "video signal type present" information.
The "Video signal type present" information is written to the output
video file when it is created, and later is used by the decoder to
properly decode the compressed video data. The saved attributes
are:

- format type (PAL, NTSC, etc.)
- color primaries (BT709, SMPTE170M, etc.)
- transfer characteristics (BT709, SMPTE170M, etc.)
- color matrix (BT709, SMPTE170M, etc.)

These modifications allow the client to specify these attributes
and, if specified, make sure they are written to the output file.
2016-02-23 13:21:06 -05:00
sijchen
aaa25160ec Merge pull request #2353 from saamas/encoder-x86-dct-opt2
[Encoder] x86 DCT optimizations
2016-02-08 15:00:12 -08:00
Sindre Aamås
c8c74903f8 [Encoder] Add single-block AVX2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~3.15x speedup over MMX for the DCT on Haswell.
~2.94x speedup over MMX for the IDCT on Haswell.

Returns diminish with increasing vector length because a larger
proportion of the time is spent on load/store/shuffling.
2016-02-02 17:22:49 +01:00
Sindre Aamås
f90960983c [Encoder] Add single-block SSE2 4x4 DCT/IDCT routines
We do four blocks at a time when possible, but need to handle
single blocks at a time for intra prediction.

~2.31x speedup over MMX for the DCT on Haswell.
~1.92x speedup over MMX for the IDCT on Haswell.
2016-02-02 17:22:48 +01:00
unknown
3873addc3d fix frame size constraints for width and height 2016-02-01 15:55:53 +08:00
Sindre Aamås
cc8d541432 [UT] Utilize DCT function pointer typedefs 2016-01-19 22:00:24 +01:00
Sindre Aamås
a45c10cf91 [UT] Only run AVX2 tests if host supports AVX2 2016-01-19 14:27:46 +01:00
Sindre Aamås
3088d96978 [Encoder] Add an AVX2 4x4 IDCT implementation
~2.03x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Sindre Aamås
b267163f10 [Encoder] Add an AVX2 4x4 DCT implementation
~2.52x faster on Haswell as compared to the SSE2 version.
2016-01-19 13:12:28 +01:00
Sindre Aamås
b9adbcf37c [UT] Add missing SSE2 4x4 IDCT test
IDCT input is defined in such a way that the intermediate values
cannot legally overflow an int16_t. The use of random values
as input causes such overflows. This results in implementation-
dependent output depending on which type is used to hold
intermediate results. Use a template for the test reference
implementation to test implementations with different
intermediate representation.
2016-01-19 13:12:28 +01:00
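The effect of the intermediate type can be shown with a small C++ sketch. `AddHalf` is a hypothetical stand-in for one add-and-halve stage of a reference implementation, templated on the type that holds the intermediate sum; it is not the test code from the commit.

```cpp
#include <cassert>
#include <cstdint>

// Hypothetical reference step, not OpenH264 code. The sum a + b is computed
// as int, then narrowed to T. When T is int16_t and the sum is out of range,
// the narrowing changes the value, so the result depends on T.
template <typename T>
int16_t AddHalf(int16_t a, int16_t b) {
  T sum = static_cast<T>(a + b);
  return static_cast<int16_t>(sum >> 1);
}

// Usage: legal-range input gives the same answer for both intermediate types;
// out-of-range input does not.
inline void CheckAddHalf() {
  assert(AddHalf<int16_t>(100, 50) == AddHalf<int32_t>(100, 50));
  assert(AddHalf<int16_t>(30000, 30000) != AddHalf<int32_t>(30000, 30000));
}
```

Instantiating the reference once per intermediate type lets each optimized routine be compared against a reference that overflows the same way it does.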
Sindre Aamås
8764231784 [UT] Improve DCT tests
Initialize input arrays with different random values.

Otherwise, the input to the DCT routines is effectively
all zero values after taking the difference.

Reduce duplication.
2016-01-19 13:12:28 +01:00
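Why identical inputs make a poor test can be seen in a C++ sketch of a DCT that transforms the difference of two 4x4 blocks. It uses the H.264 4x4 forward core transform in plain matrix form, Y = C * X * C^T. `DctDiff4x4` and the contiguous 16-element layout are assumptions for illustration; the OpenH264 routines take strides and use butterflies or SIMD.

```cpp
#include <cassert>
#include <cstdint>

// H.264 4x4 forward core transform matrix.
static const int kC[4][4] = {
    {1, 1, 1, 1}, {2, 1, -1, -2}, {1, -1, -1, 1}, {1, -2, 2, -1}};

// Hypothetical reference: transforms the residual (src - pred).
void DctDiff4x4(const uint8_t* src, const uint8_t* pred, int16_t* out) {
  assert(src != nullptr && pred != nullptr && out != nullptr);
  int x[4][4];
  int t[4][4];
  for (int i = 0; i < 16; ++i) x[i / 4][i % 4] = src[i] - pred[i];
  for (int r = 0; r < 4; ++r) {  // t = C * x
    for (int c = 0; c < 4; ++c) {
      t[r][c] = 0;
      for (int k = 0; k < 4; ++k) t[r][c] += kC[r][k] * x[k][c];
    }
  }
  for (int r = 0; r < 4; ++r) {  // out = t * C^T
    for (int c = 0; c < 4; ++c) {
      int sum = 0;
      for (int k = 0; k < 4; ++k) sum += t[r][k] * kC[c][k];
      out[r * 4 + c] = static_cast<int16_t>(sum);
    }
  }
}
```

If both input arrays hold the same values, the residual is all zeros and every implementation trivially outputs zeros, so the test cannot tell a correct routine from a broken one.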
sijchen
aeb5ab4b99 [Encoder] put the logic related to multiple D layer into a class for better structure 2015-11-11 22:55:16 -08:00
sijchen
33c378f7b7 change API for slicing part for easier usage (the UseLoadBalancing flag is still a work in progress) 2015-11-10 09:50:06 -08:00
Sijia Chen
819f6f5d93 [Encoder] add encoder tasks and task-management class
https://rbcommons.com/s/OpenH264/r/1334/
2015-10-19 22:48:28 -07:00
karina li
2c830e64d7 exception case when width or height is less than 16 2015-09-08 17:21:56 +08:00
Guangwei Wang
e42ce60cc9 add UT for sub8x8 modes assembly functions 2015-07-30 10:02:32 +08:00
Martin Storsjö
78e0ec6130 Convert tabs to spaces before comments 2015-06-10 10:22:29 +03:00
Martin Storsjö
764793d74b Remove tabs in struct and class definitions 2015-06-10 10:22:01 +03:00
Martin Storsjö
ca51ee0f44 Remove tabs where a simple space is just enough 2015-06-10 10:21:52 +03:00
Martin Storsjö
51efa57a3d Convert tabs to spaces in vertically aligned code 2015-06-10 10:21:29 +03:00
Martin Storsjö
723044837a Convert tabs to spaces in defines 2015-06-10 10:21:25 +03:00
Martin Storsjö
ebbcb67fb7 Convert tabs to spaces in assignment of SIMD function pointers 2015-06-03 15:39:30 +03:00
Martin Storsjö
0298b3f580 Initialize enough samples in the new 4x8 tests
This fixes valgrind warnings about tests using uninitialized data.
2015-06-03 09:45:06 +03:00
huili2
f76325edc7 Merge pull request #1973 from huili2/sub8
modify some functions extending to sub8x8 usage, especially in ME part
2015-06-02 14:44:06 +08:00
huili2
c3cfce5223 modify some functions extending to sub8x8 usage, especially in ME part 2015-06-02 13:39:38 +08:00
sijchen
5588e82fce Merge pull request #1961 from mstorsjo/fix-warnings
Remove a redundant check of this!=NULL
2015-06-01 10:42:56 +08:00
Martin Storsjö
1239bb24ba Remove a redundant check of this!=NULL
'this' can't be NULL in well-defined C++ code. This fixes a warning
with clang 3.6 from Xcode 6.3.
2015-05-27 11:46:53 +03:00
Sijia Chen
9442a7a0b5 add parameter checking on resolution and related UT 2015-05-26 15:41:47 +08:00
Martin Storsjö
b90eca78cd Avoid endian assumptions in FillQpelLocationByFeatureValue_c
These values are read as two separate 16 bit integers from an
array in the FeatureSearchOne function, therefore we should
also store them in a well-defined order.

This fixes encoding of screen content on big endian; now the
full testsuite passes on big endian.
2015-05-15 13:11:23 +03:00
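The kind of endian assumption described above can be sketched in C++. How the original code stored the values is not shown in the commit message, so `StorePacked` is only one assumed way such a dependency can arise; both function names are hypothetical.

```cpp
#include <cassert>
#include <cstdint>
#include <cstring>

// Endian-dependent: packs both values into one 32-bit word and copies its
// bytes. Which value ends up in dst[0] depends on the host byte order.
void StorePacked(uint16_t* dst, uint16_t x, uint16_t y) {
  assert(dst != nullptr);
  const uint32_t packed = (static_cast<uint32_t>(y) << 16) | x;
  std::memcpy(dst, &packed, sizeof(packed));
}

// Well-defined: writes each 16-bit value to its own element, so a reader
// that loads dst[0] and dst[1] separately sees x and y on any host.
void StoreOrdered(uint16_t* dst, uint16_t x, uint16_t y) {
  assert(dst != nullptr);
  dst[0] = x;
  dst[1] = y;
}
```

Since the reader accesses two separate 16-bit integers, writing them the same way keeps the writer and reader in agreement regardless of byte order.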
Martin Storsjö
7a80c21526 Reformat tables without tabs 2015-05-13 22:06:58 +03:00
Haibo Zhu
61b82d28c4 Add framerate & spatialbitrate comparison for encoder UT 2015-05-05 18:53:50 -07:00
Martin Storsjö
8d34c68ad6 Add a missing newline at the end of a file
Some tools (like git) complain if a file lacks a newline at
the end of a file, and some editors will automatically re-add
it when editing such files.
2015-05-04 12:46:48 +03:00
Sijia Chen
1922b533f6 change the range of frame rate from 30 to 60 2015-04-16 12:45:43 +08:00
ruil2
cce966fbba update bGapsInFrameNumValueAllowedFlag according to parameters setting 2015-03-18 13:44:03 +08:00
ruil2
7d055cae94 Merge pull request #1786 from sijchen/fix_over
improve error logging in UT
2015-02-06 12:17:56 +08:00
sijchen
5fdd01ec0c Merge pull request #1787 from mstorsjo/remove-stray-semicolon
Remove accidental double semicolons
2015-02-02 18:15:02 +08:00
sijchen
e7a7a35611 Merge pull request #1779 from mstorsjo/share-memalign
Move the memory allocation/deallocation routines to the common library
2015-02-02 18:14:55 +08:00
Martin Storsjö
a3063531c4 Remove accidental double semicolons 2015-02-02 09:20:35 +02:00