openh264

Author	SHA1	Message	Date
Sindre Aamås	3f31aff4dc	[Encoder] Add an SSE4.2 implementation of CavlcParamCal Use a combination of table lookups and pshufb to convert coefficients to zero run/level format. Two 16-entry lookup tables are used for a total of 192 bytes worth of tables. (The existing SSE2 version uses a table of size 2048 bytes.) Speedup is ~1.5x-3x as compared with the SSE2 version on Haswell (the speedup is greater for input with many trailing zeros). The use of popcnt makes it require SSE4.2. This can be replaced with a small LUT and accumulation which would reduce the requirement to SSSE3.	2016-04-20 18:37:08 +02:00
Sindre Aamås	502b16925e	[UT] Add tests for CavlcParamCal_c and CavlcParamCal_sse2	2016-04-20 18:37:08 +02:00
Martin Storsjö	81493590f8	Remove a stray empty line This disappears when regenerating the makefiles.	2016-03-24 10:01:48 +02:00
sijchen	47d310539f	Squashed commit of the following: commit c8111942e07437034a74b33887c33b5ad78e476a Author: Karina <ruil2@cisco.com> Date: Wed Mar 23 14:31:18 2016 +0800 update SHA table commit f36a25344c25a131581dcbcd2d103fc4b131012e Author: Karina <ruil2@cisco.com> Date: Wed Mar 23 13:45:58 2016 +0800 fix bitrate overflow issue when adaptive quality turns on	2016-03-23 10:23:33 -07:00
sijchen	33bb96f604	Merge pull request #2420 from sijchen/fix_sps [Encoder] fix the lack of eSpsPpsIdStrategy==INCREASING_ID under simulcast avc on	2016-03-21 21:51:07 -07:00
zhilwang	d7570bfa52	Merge pull request #2401 from saamas/decoder-use-encoder-x86-idct-routines [Decoder] Use encoder x86 IDCT routines	2016-03-18 08:50:33 +08:00
Sindre Aamås	b6c4a5447c	[Decoder/x86] IDCT one block at a time with SSE2 At lower bitrates, it is overall faster to conditionally do one block at a time with SSE2 on Haswell and likely other common architectures. At higher bitrates, it is faster to use the wider routine that IDCTs four blocks at a time. To avoid potential performance regressions as compared to MMX, stick with single-block IDCTs with SSE2. There is still a performance advantage as compared to MMX because the single-block SSE2 routine is faster than the corresponding MMX routine. Stick with four blocks at a time with AVX2 for which that appears to be consistently faster on Haswell.	2016-03-16 19:55:11 +01:00
huili2	a8d9576297	Merge pull request #2405 from HaiboZhu/Fix_UT_decoder_init_fail Fix the decoder init failed case in UT	2016-03-16 16:28:14 +08:00
sijchen	c009183e97	fix the lack of eSpsPpsIdStrategy==INCREASING_ID under simulcast avc on	2016-03-14 11:28:44 -07:00
Haibo Zhu	46f42ec5f3	Fix the decoder init failed case in UT	2016-03-14 17:06:58 +08:00
Karina	f84f2315ab	change downsampling logic so that the downsampling source is the nearest layer instead of the highest layer	2016-03-14 09:55:36 +08:00
Sindre Aamås	98042f1600	[Decoder] Use encoder x86 IDCT routines Move asm routines to common. Delete obsolete decoder routines. Use wider routines where applicable. ~1.07x overall faster decode on a quick 720p30 4Mbps test on Haswell.	2016-03-09 10:41:42 +01:00
Sindre Aamås	48a520915a	[Encoder/x86] Add AVX2 SATD routines WelsSampleSatd16x16_avx2 (~2.31x speedup over SSE4.1 on Haswell). WelsSampleSatd16x8_avx2 (~2.19x speedup over SSE4.1 on Haswell). WelsSampleSatd8x16_avx2 (~1.68x speedup over SSE4.1 on Haswell). WelsSampleSatd8x8_avx2 (~1.53x speedup over SSE4.1 on Haswell).	2016-03-08 11:31:17 +01:00
sijchen	4db9c32976	remove sink in WelsThreadPool and hide the constructor to finish the singleton	2016-03-02 17:08:09 -08:00
sijchen	d4f09d9048	make CWelsThreadPool a singleton for future usage (including adding a sink for IWelsTask)	2016-02-29 11:40:25 -08:00
Gregory J. Wolfe	03890fe86f	Added support for "video signal type present" information. The "Video signal type present" information is written to the output video file when it is created, and later is used by the decoder to properly decode the compressed video data. The saved attributes are: - format type (PAL, NTSC, etc.) - color primaries (BT709, SMPTE170M, etc.) - transfer characteristics (BT709, SMPTE170M, etc.) - color matrix (BT709, SMPTE170M, etc.) These modifications allow the client to specify these attributes and, if specified, make sure they are written to the output file.	2016-02-24 10:33:18 -05:00
Gregory J. Wolfe	c7fcba06c7	Added support for "video signal type present" information. The "Video signal type present" information is written to the output video file when it is created, and later is used by the decoder to properly decode the compressed video data. The saved attributes are: - format type (PAL, NTSC, etc.) - color primaries (BT709, SMPTE170M, etc.) - transfer characteristics (BT709, SMPTE170M, etc.) - color matrix (BT709, SMPTE170M, etc.) These modifications allow the client to specify these attributes and, if specified, make sure they are written to the output file.	2016-02-23 13:21:06 -05:00
sijchen	aaa25160ec	Merge pull request #2353 from saamas/encoder-x86-dct-opt2 [Encoder] x86 DCT optimizations	2016-02-08 15:00:12 -08:00
sijchen	e5e7013b73	Merge pull request #2350 from sijchen/th00 [Common] Add sink to IWelsTask	2016-02-08 14:59:38 -08:00
Sindre Aamås	c8c74903f8	[Encoder] Add single-block AVX2 4x4 DCT/IDCT routines We do four blocks at a time when possible, but need to handle single blocks at a time for intra prediction. ~3.15x speedup over MMX for the DCT on Haswell. ~2.94x speedup over MMX for the IDCT on Haswell. Returns diminish with increasing vector length because a larger proportion of the time is spent on load/store/shuffling.	2016-02-02 17:22:49 +01:00
Sindre Aamås	f90960983c	[Encoder] Add single-block SSE2 4x4 DCT/IDCT routines We do four blocks at a time when possible, but need to handle single blocks at a time for intra prediction. ~2.31x speedup over MMX for the DCT on Haswell. ~1.92x speedup over MMX for the IDCT on Haswell.	2016-02-02 17:22:48 +01:00
unknown	3873addc3d	fix frame size constraints for width and height	2016-02-01 15:55:53 +08:00
HaiboZhu	1030820ec4	Merge pull request #2342 from sijchen/enh_ut_tem [UT] correct and enhance the ut template and trace improvement	2016-02-01 09:08:05 +08:00
sijchen	47e3f4c45c	correct and enhance the ut template	2016-01-19 17:16:39 -08:00
Sindre Aamås	cc8d541432	[UT] Utilize DCT function pointer typedefs	2016-01-19 22:00:24 +01:00
Sindre Aamås	a45c10cf91	[UT] Only run AVX2 tests if host supports AVX2	2016-01-19 14:27:46 +01:00
Sindre Aamås	3088d96978	[Encoder] Add an AVX2 4x4 IDCT implementation ~2.03x faster on Haswell as compared to the SSE2 version.	2016-01-19 13:12:28 +01:00
Sindre Aamås	b267163f10	[Encoder] Add an AVX2 4x4 DCT implementation ~2.52x faster on Haswell as compared to the SSE2 version.	2016-01-19 13:12:28 +01:00
Sindre Aamås	b9adbcf37c	[UT] Add missing SSE2 4x4 IDCT test IDCT input is defined in such a way that the intermediate values cannot legally overflow an int16_t. The use of random values as input causes such overflows. This results in implementation- dependent output depending on which type is used to hold intermediate results. Use a template for the test reference implementation to test implementations with different intermediate representation.	2016-01-19 13:12:28 +01:00
Sindre Aamås	8764231784	[UT] Improve DCT tests Initialize input arrays with different random values. Otherwise, the input to the DCT routines is effectively all zero values after taking the difference. Reduce duplication.	2016-01-19 13:12:28 +01:00
sijchen	d46cd07511	fix the problem in case the task uID is too big	2016-01-15 16:06:09 -08:00
sijchen	5eb18b101e	change the output way of debug trace	2016-01-13 22:13:43 -08:00
Karina	0f0d54ef51	using independent encoder control logic for SAVC case	2016-01-14 09:16:12 +08:00
sijchen	cce1c29844	add sink to IWelsTask (for further enhancements)	2016-01-13 16:24:54 -08:00
sijchen	5cad0f9bba	enhance a UT to cover more cases	2016-01-11 22:01:02 -08:00
huade	0f24b80af8	remove one test sequence and set the number of Travis jobs to 4 to reduce test time	2015-12-14 17:18:21 +08:00
HaiboZhu	ee01b3afaf	Merge pull request #2307 from huili2/fix_decstat fix iAvgLumaQp in decStat	2015-12-14 10:26:16 +08:00
huili2	b2d4a95537	fix iAvgLumaQp in decStat	2015-12-11 14:14:42 +08:00
sijchen	0c820f4c06	adjust encoder test case to cover multi-thread without loadbalancing	2015-12-09 09:58:03 -08:00
sijchen	76ca56498a	Add tasks and thread pool call for SM_SIZELIMITED_SLICE mode	2015-12-09 09:55:04 -08:00
HaiboZhu	7e9fdc181f	Merge pull request #2301 from huili2/simple_parseonly_ctx remove parseonly in decoder ctx	2015-12-09 10:33:29 +08:00
sijchen	f38d24f036	fix the conflict with the current master	2015-11-30 23:42:26 -08:00
Guangwei Wang	c917d09263	fix bug in UT code	2015-12-01 08:55:00 +08:00
sijchen	420778f4d8	add valid adjustment in test to avoid outputting warning trace	2015-11-30 11:33:13 -08:00
sijchen	42ac53b5fc	update win UT project after UT structure change	2015-11-30 11:29:47 -08:00
sijchen	46667588e3	move test cases to specific files to keep encode_decode_api_test.cpp from growing too long	2015-11-30 10:47:10 -08:00
huili2	926fc67451	remove parseonly in decoder ctx	2015-11-27 08:56:20 +08:00
sijchen	05c89b75f0	remove duplicated operation after thread pool and rename a task for clearer meaning	2015-11-25 13:46:21 -08:00
sijchen	67dab5d70e	Merge pull request #2266 from sijchen/ut0 [UT] put class notification to header file	2015-11-25 09:57:43 -08:00
HaiboZhu	404315ab19	Merge pull request #2270 from huili2/parseonly_api_bugfix disable wrongly calling for parseonly related	2015-11-25 09:00:54 +08:00
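
Commit 3f31aff4dc above describes converting coefficients to zero run/level format. As a rough illustration of what that conversion produces, here is a minimal scalar sketch in C++. It is not the openh264 routine and not its SSE4.2 implementation: the `RunLevel` type, the `ToRunLevel` name, and the output layout (highest coefficient index first) are assumptions made for this example only.

```cpp
#include <cstdint>
#include <vector>

// Hypothetical result type for illustration; not openh264's interface.
struct RunLevel {
  std::vector<int16_t> levels;  // nonzero coefficients, highest index first
  std::vector<uint8_t> runs;    // number of zeros directly below each level
  int totalZeros = 0;           // zeros below the highest nonzero coefficient
};

// Scan a coefficient block from the last index down to the first and
// emit (level, run) pairs. Trailing zeros are skipped and not counted.
RunLevel ToRunLevel(const int16_t* coeffs, int count) {
  RunLevel out;
  int i = count - 1;
  while (i >= 0 && coeffs[i] == 0) --i;  // skip trailing zeros
  while (i >= 0) {
    out.levels.push_back(coeffs[i--]);   // record the nonzero level
    uint8_t run = 0;
    while (i >= 0 && coeffs[i] == 0) {   // count the zeros below it
      ++run;
      --i;
    }
    out.runs.push_back(run);
    out.totalZeros += run;
  }
  return out;
}
```

For the input `{3, 0, 0, -1, 1, 0, 0, 0}` this sketch yields levels `{1, -1, 3}`, runs `{0, 2, 0}`, and a total of 2 zeros. A vectorized version such as the one the commit describes would reach the same kind of result with table lookups and shuffles instead of the scalar loops.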

687 Commits