openh264

Author	SHA1	Message	Date
Karina	c1255451d7	use the correct frametype in statistics info	2016-06-06 17:06:56 +08:00
Karina	02218e2dbd	modify configure file comments	2016-06-06 16:22:09 +08:00
ruil2	106d13d26c	Merge pull request #2492 from saamas/processing-x86-downsample-use-lddqu [Processing/x86] Use lddqu in case we still run on anything that benefits	2016-06-06 12:46:55 +08:00
Sindre Aamås	f183891c5b	[Processing/x86] Use lddqu in case we still run on anything that benefits	2016-06-04 00:41:35 +02:00
Sindre Aamås	5a9c6db335	[Processing] Relax downsample buffer size requirement AFAICT, it is sufficient that the sample buffer has space for half the source width/height. With the current sample buffer size, this enables its use for resolutions up to 3840x2176.	2016-06-03 15:14:09 +02:00
Sindre Aamås	68a5910f8f	[Processing] Clear LSB before rounding up dyadic downsample width	2016-06-03 12:03:01 +02:00
Karina	2171d84f1e	add nalsize checking UT and fix nalsize control when cabac on	2016-06-03 17:36:14 +08:00
ruil2	3eba80765c	Merge pull request #2487 from sijchen/refactor_ref31 [Encoder] Preprocess: refactor to improve code readability	2016-06-03 13:39:04 +08:00
sijchen	1fa02f6b07	Merge pull request #2488 from ruil2/codingIdx1 fix codingIdx update issue	2016-06-02 10:00:56 -07:00
Karina	4f41c3a5bf	fix codingIdx update issue	2016-06-02 21:17:31 +08:00
Sindre Aamås	8a0af4a3f2	[Processing/x86] DyadicBilinearDownsample optimizations Average vertically before horizontally; horizontal averaging is more worksome. Doing the vertical averaging first reduces the number of horizontal averages by half. Use pmaddubsw and pavgw to do the horizontal averaging for a slight performance improvement. Minor tweaks. Improve the SSSE3 dyadic downsample routines and drop the SSE4 routines. The non-temporal loads used in the SSE4 routines do nothing for cache-backed memory AFAIK. Adjust tests because averaging vertically first gives slightly different output. ~2.39x speedup for the widthx32 routine on Haswell when not memory-bound. ~2.20x speedup for the widthx16 routine on Haswell when not memory-bound. Note that the widthx16 routine can be unrolled for further speedup.	2016-06-02 13:44:28 +02:00
Sindre Aamås	7cbb75eac6	[Processing] Pick dyadic downsample function based on stride Assume that data can be written into the padding area following each line. This enables the use of faster routines for more cases. Align downsample buffer stride to a multiple of 32. With this all strides used should be a multiple of 16, which means that use of narrower downsample routines can be dropped altogether.	2016-06-02 13:44:28 +02:00
Sindre Aamås	770e48ac2b	[Processing] Remove unused align macros The WELS_ALIGN macro here aliases the WELS_ALIGN macro in macros.h which is inconvenient. Just remove these unused macros.	2016-06-02 13:44:28 +02:00
sijchen@cisco.com	a7ae1efc3a	add back the missing part after merging and formatting	2016-06-01 21:33:33 -07:00
sijchen@cisco.com	8bacc3d4d0	Preprocess: refactor to improve code readability	2016-06-01 21:26:24 -07:00
sijchen	f6b6a0f6aa	Merge pull request #2485 from ruil2/init remove redundant initialization	2016-06-01 09:28:02 -07:00
sijchen@cisco.com	8537a9274d	fix a prob	2016-06-01 09:21:12 -07:00
sijchen@cisco.com	a9601cdc59	refactor to avoid only use idx0 in syntax writing, for now it has no impact on bs, may benefit future usage	2016-06-01 09:21:12 -07:00
Karina	268a0eb6f4	remove redundant initialization	2016-06-01 10:52:51 +08:00
HaiboZhu	515eeb41e4	Merge pull request #2481 from ruil2/maxbitrate1 fix iContinualSkipFrames calculation	2016-06-01 09:03:57 +08:00
HaiboZhu	7ccc377d55	Merge pull request #2480 from ruil2/fix fix removing parameter setting wrongly	2016-06-01 09:03:49 +08:00
ruil2	2d3fc37a07	Merge pull request #2484 from sijchen/refactor_preprocess13 [Encoder] Refactor: add class for diff preprocess strategy	2016-06-01 08:31:02 +08:00
Karina	87e81a7a40	use the same name to avoid confusing.	2016-06-01 08:21:03 +08:00
sijchen@cisco.com	03863ae4c6	different preprocess actually used diff source picture management	2016-05-31 14:36:21 -07:00
sijchen@cisco.com	a1cae49732	add class for diff preprocess strategy	2016-05-31 13:48:45 -07:00
sijchen	c29da290b9	Merge pull request #2479 from ruil2/refine_rc1 get the correct did for savc case	2016-05-31 10:58:38 -07:00
Karina	dd021b6ca8	fix iContinualSkipFrames calculation	2016-05-31 21:01:11 +08:00
Karina	8effa45edd	fix removing parameter setting	2016-05-31 20:46:13 +08:00
Karina	64ad70b0ea	get the correct did for savc case	2016-05-31 17:35:20 +08:00
HaiboZhu	df77a5d587	Merge pull request #2478 from ruil2/refine_rc1 refine RC	2016-05-31 17:20:46 +08:00
Karina	4fc2b1f636	refine RC	2016-05-31 16:44:04 +08:00
HaiboZhu	3f199f92a9	Merge pull request #2477 from ruil2/add_param_configure add savc setting in configure file and command line	2016-05-31 16:33:40 +08:00
Karina	7f2ba4dcb6	add savc setting in configure file and command line	2016-05-31 13:53:31 +08:00
HaiboZhu	1d2b52e4cc	Merge pull request #2476 from ruil2/did1 fix dependency ID mapping issue	2016-05-31 11:08:16 +08:00
Karina	e3c306608c	fix dependency ID mapping issue	2016-05-30 15:03:39 +08:00
ruil2	39c2fb3d6b	Merge pull request #2472 from saamas/processing-x86-general-bilinear-downsample-optimizations [Processing/x86] GeneralBilinearDownsample optimizations	2016-05-27 15:17:31 +08:00
Sindre Aamås	563376df0c	[UT] Test downsampling routines with a wider variety of height ratios	2016-05-25 14:16:29 +02:00
HaiboZhu	c17a58efdf	Merge pull request #2473 from ruil2/update_interface modify the interface that use a independent subseqID for each layer	2016-05-25 10:00:13 +08:00
HaiboZhu	780101fcfd	Merge pull request #2474 from ruil2/overflow avoid overflow	2016-05-25 09:59:36 +08:00
Karina	2ef9613e55	avoid overflow	2016-05-24 13:25:05 +08:00
Sindre Aamås	4fec6d581e	[UT] Test generic downsampling routines with a wider variety of width ratios Get coverage of all code paths for routines that branch to different paths for different scaling ratios.	2016-05-23 20:23:47 +02:00
Sindre Aamås	e490215990	[Processing/x86] Add an AVX2 implementation of GeneralBilinearAccurateDownsample Keep track of relative pixel offsets and utilize pshufb to efficiently extract relevant pixels for horizontal scaling ratios <= 8. Because pshufb does not cross 128-bit lanes, the overhead of address calculations and loads is relatively greater as compared with an SSSE3/SSE4.1 implementation. Fall back to a generic approach for ratios > 8. The implementation assumes that data beyond the end of each line, before the next line begins, can be dirtied; which AFAICT is safe with the current usage of these routines. Speedup is ~8.52x/~6.89x (32-bit/64-bit) for horizontal ratios <= 2, ~7.81x/~6.13x for ratios within (2, 4], ~5.81x/~4.52x for ratios within (4, 8], and ~5.06x/~4.09x for ratios > 8 when not memory-bound on Haswell as compared with the current SSE2 implementation.	2016-05-23 20:23:47 +02:00
Sindre Aamås	b43e58a366	[Processing/x86] Add an AVX2 implementation of GeneralBilinearFastDownsample Keep track of relative pixel offsets and utilize pshufb to efficiently extract relevant pixels for horizontal scaling ratios <= 8. Because pshufb does not cross 128-bit lanes, the overhead of address calculations and loads is relatively greater as compared with an SSSE3 implementation. Fall back to a generic approach for ratios > 8. The implementation assumes that data beyond the end of each line, before the next line begins, can be dirtied; which AFAICT is safe with the current usage of these routines. Speedup is ~10.42x/~5.23x (32-bit/64-bit) for horizontal ratios <= 2, ~9.49x/~4.64x for ratios within (2, 4], ~6.43x/~3.18x for ratios within (4, 8], and ~5.42x/~2.50x for ratios > 8 when not memory-bound on Haswell as compared with the current SSE2 implementation.	2016-05-23 20:23:47 +02:00
Sindre Aamås	b1013095b1	[Processing/x86] Add an SSE4.1 implementation of GeneralBilinearAccurateDownsample Keep track of relative pixel offsets and utilize pshufb to efficiently extract relevant pixels for horizontal scaling ratios <= 4. Fall back to a generic approach for ratios > 4. The use of blendps makes this require SSE4.1. The pshufb path can be backported to SSSE3 and the generic path to SSE2 for a minor reduction in performance by replacing blendps and preceding instructions with an equivalent sequence. The implementation assumes that data beyond the end of each line, before the next line begins, can be dirtied; which AFAICT is safe with the current usage of these routines. Speedup is ~5.32x/~4.25x (32-bit/64-bit) for horizontal ratios <= 2, ~5.06x/~3.97x for ratios within (2, 4], and ~3.93x/~3.13x for ratios > 4 when not memory-bound on Haswell as compared with the current SSE2 implementation.	2016-05-23 20:23:39 +02:00
Sindre Aamås	1995e03d91	[Processing/x86] Add an SSSE3 implementation of GeneralBilinearFastDownsample Keep track of relative pixel offsets and utilize pshufb to efficiently extract relevant pixels for horizontal scaling ratios <= 4. Fall back to a generic approach for ratios > 4. Note that the generic approach can be backported to SSE2. The implementation assumes that data beyond the end of each line, before the next line begins, can be dirtied; which AFAICT is safe with the current usage of these routines. Speedup is ~6.67x/~3.26x (32-bit/64-bit) for horizontal ratios <= 2, ~6.24x/~3.00x for ratios within (2, 4], and ~4.89x/~2.17x for ratios > 4 when not memory-bound on Haswell as compared with the current SSE2 implementation.	2016-05-23 20:23:31 +02:00
Sindre Aamås	cbaf087583	[Processing] Reduce duplication in downsampling wrappers	2016-05-23 13:19:17 +02:00
ruil2	c96c8b05a8	Merge pull request #2468 from sijchen/refactor_pre [Encoder] Refactor: create diff func for diff case to make logic clean	2016-05-23 13:21:40 +08:00
HaiboZhu	685b6144a5	Merge pull request #2469 from ruil2/fix_bitrate add GetBsPostion for cabac and cavlc	2016-05-23 09:49:45 +08:00
Karina	9b2dd55324	add GetBsPostion for cabac and cavlc	2016-05-20 14:29:48 +08:00
sijchen	27e803f6f4	refactor to make logic clean	2016-05-19 09:42:39 -07:00
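
Commit 8a0af4a3f2 above notes that averaging vertically before horizontally "gives slightly different output". The scalar sketch below illustrates why the order of two rounding averages matters. It is not the openh264 code: the function names are hypothetical, and it assumes a plain `(a + b + 1) >> 1` rounding average for both steps, whereas the commit's SIMD routines use pmaddubsw and pavgw for the horizontal step and may round differently.

```c
#include <assert.h>
#include <stdint.h>

/* Rounding average of two bytes: (a + b + 1) >> 1. */
static uint8_t avg_round(uint8_t a, uint8_t b) {
    return (uint8_t)((a + b + 1) >> 1);
}

/* Hypothetical helper: 2:1 downsample that averages each column pair
 * vertically first, then averages the two results horizontally. */
static void dyadic_downsample_vfirst(uint8_t *dst, int dst_stride,
                                     const uint8_t *src, int src_stride,
                                     int src_width, int src_height) {
    assert(src_width % 2 == 0 && src_height % 2 == 0);
    for (int y = 0; y < src_height / 2; ++y) {
        const uint8_t *top = src + (2 * y) * src_stride;
        const uint8_t *bot = top + src_stride;
        uint8_t *out = dst + y * dst_stride;
        for (int x = 0; x < src_width / 2; ++x) {
            uint8_t left = avg_round(top[2 * x], bot[2 * x]);
            uint8_t right = avg_round(top[2 * x + 1], bot[2 * x + 1]);
            out[x] = avg_round(left, right);
        }
    }
}

/* Hypothetical helper: same downsample, but each row is averaged
 * horizontally first and the two row results are averaged vertically. */
static void dyadic_downsample_hfirst(uint8_t *dst, int dst_stride,
                                     const uint8_t *src, int src_stride,
                                     int src_width, int src_height) {
    assert(src_width % 2 == 0 && src_height % 2 == 0);
    for (int y = 0; y < src_height / 2; ++y) {
        const uint8_t *top = src + (2 * y) * src_stride;
        const uint8_t *bot = top + src_stride;
        uint8_t *out = dst + y * dst_stride;
        for (int x = 0; x < src_width / 2; ++x) {
            uint8_t upper = avg_round(top[2 * x], top[2 * x + 1]);
            uint8_t lower = avg_round(bot[2 * x], bot[2 * x + 1]);
            out[x] = avg_round(upper, lower);
        }
    }
}
```

For the 2x2 block with rows {0, 2} and {1, 1}, the vertical-first helper yields 2 and the horizontal-first helper yields 1, because each intermediate average rounds up independently. A one-level difference of this kind is enough to require adjusted test expectations when the order changes.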

4338 Commits