2691 Commits

Author SHA1 Message Date
HaiboZhu
60cbb77583 Merge pull request #2500 from ruil2/downsampling
use average downsampling first, then general downsampling
2016-06-21 10:11:44 +08:00
Karina
7c0ca2fc14 use average downsampling first, then general downsampling, when dst resolution > 1/4 source resolution and dst resolution < 1/2 source resolution 2016-06-17 10:30:47 +08:00
Sindre Aamås
0f7b8365b9 [Encoder] Avoid valgrind downsampling false positives
X86 SIMD downsampling routines may, for convenience, read slightly beyond
the input data and into the alignment padding area beyond each line. This
causes valgrind to warn about uninitialized values even if these values
only affect lanes of a SIMD vector that are effectively never used.

Avoid these false positives by zero-initializing the padding area beyond
each line of the source buffer used for downsampling.
2016-06-16 21:19:17 +02:00
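A minimal sketch of the zero-initialization described in the commit above, assuming a hypothetical line-padded buffer layout (the function name and parameters are illustrative, not the library's actual API):

    // Zero the padding bytes between the end of each line's pixel data and the
    // end of its stride, so SIMD over-reads only ever touch initialized bytes.
    #include <cstdint>
    #include <cstring>

    void ZeroLinePadding (uint8_t* pBuf, int32_t iWidth, int32_t iStride, int32_t iHeight) {
      for (int32_t y = 0; y < iHeight; ++y) {
        uint8_t* pLine = pBuf + y * iStride;
        memset (pLine + iWidth, 0, iStride - iWidth);  // padding area beyond the line
      }
    }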
Martin Storsjö
e945654f06 Use assert.h instead of cassert
This fixes building for Android differently than in f5e483ce.

On Android, <cassert> isn't available in the normal include path,
only when the STL headers are available.

We intentionally avoid using STL within the main libopenh264.so, to
simplify dependency chains for users of the library (which otherwise
could run into conflicts if the surrounding app would want to use
a different STL implementation).

The previous fix only provided the headers, without actually linking
against STL, so at this point it's not a real issue yet, but it's
still a very slippery slope towards accidentally starting to rely on
STL within the core library.

Instead explicitly avoid using STL within the core library, by not
even providing the include path.
2016-06-15 21:06:11 +03:00
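For illustration, the C-style header keeps assertions available without touching the C++ STL include path; a minimal sketch (the clamp helper is hypothetical):

    // Use the C header directly so the core library never depends on STL headers.
    #include <assert.h>   // not <cassert>, which needs the STL include path on Android

    static int32_t ClampValue (int32_t iVal, int32_t iLow, int32_t iHigh) {
      assert (iLow <= iHigh);
      return iVal < iLow ? iLow : (iVal > iHigh ? iHigh : iVal);
    }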
HaiboZhu
2e6c9f7cd3 Merge pull request #2496 from saamas/processing-relax-downsample-buffer-size-requirement
[Processing] Relax downsample buffer size requirement
2016-06-15 10:31:53 +08:00
HaiboZhu
d35647ec3b Merge pull request #2491 from ruil2/nalsize
add nalsize checking UT and fix nalsize control when CABAC is on
2016-06-15 10:24:18 +08:00
HaiboZhu
151a7ff643 Merge pull request #2490 from sijchen/refactor_ref4
[Encoder] refactor: avoid using only idx0 in syntax writing; for now it has no impact on the bitstream
2016-06-15 10:23:38 +08:00
HaiboZhu
84a7669b63 Merge pull request #2464 from bumblebritches57/MVC
MVC aka Stereoscopic 3D support
2016-06-15 10:05:15 +08:00
ruil2
4b6f037020 Merge pull request #2489 from saamas/processing-dyadic-bilinear-downsample-optimizations
[Processing] DyadicBilinearDownsample optimizations
2016-06-12 10:02:55 +08:00
Karina
b5cef5d49c modify reserved nal header size and change source frame in NalSizeChecking UT 2016-06-08 10:12:27 +08:00
Karina
40f4fc05bb get each spatial layer qp 2016-06-06 17:13:22 +08:00
Karina
c1255451d7 use the correct frametype in statistics info 2016-06-06 17:06:56 +08:00
ruil2
106d13d26c Merge pull request #2492 from saamas/processing-x86-downsample-use-lddqu
[Processing/x86] Use lddqu in case we still run on anything that benefits
2016-06-06 12:46:55 +08:00
Sindre Aamås
f183891c5b [Processing/x86] Use lddqu in case we still run on anything that benefits 2016-06-04 00:41:35 +02:00
Sindre Aamås
5a9c6db335 [Processing] Relax downsample buffer size requirement
AFAICT, it is sufficient that the sample buffer has space for half
the source width/height. With the current sample buffer size, this
enables its use for resolutions up to 3840x2176.
2016-06-03 15:14:09 +02:00
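As a rough worked example of the relaxed requirement (the exact buffer layout is an assumption here, not taken from the code): a half-resolution luma plane for a 3840x2176 source is 1920 x 1088 = 2,088,960 samples, so a buffer sized for half the source dimensions covers 4K-class input.

    // Hedged sketch of the implied size check: the sample buffer must hold at
    // least one half-resolution plane of the source picture.
    #include <cstdint>

    bool BufferBigEnough (int32_t iSrcWidth, int32_t iSrcHeight, int32_t iBufSizeInSamples) {
      const int32_t kiHalfW = (iSrcWidth + 1) / 2;    // round odd widths up
      const int32_t kiHalfH = (iSrcHeight + 1) / 2;
      return kiHalfW * kiHalfH <= iBufSizeInSamples;  // 1920 * 1088 for a 3840x2176 source
    }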
Sindre Aamås
68a5910f8f [Processing] Clear LSB before rounding up dyadic downsample width 2016-06-03 12:03:01 +02:00
Karina
2171d84f1e add nalsize checking UT and fix nalsize control when CABAC is on 2016-06-03 17:36:14 +08:00
ruil2
3eba80765c Merge pull request #2487 from sijchen/refactor_ref31
[Encoder] Preprocess: refactor to improve code readability
2016-06-03 13:39:04 +08:00
Karina
4f41c3a5bf fix codingIdx update issue 2016-06-02 21:17:31 +08:00
Sindre Aamås
8a0af4a3f2 [Processing/x86] DyadicBilinearDownsample optimizations
Average vertically before horizontally; horizontal averaging is more
costly. Doing the vertical averaging first reduces the number of
horizontal averages by half.

Use pmaddubsw and pavgw to do the horizontal averaging for a slight
performance improvement.

Minor tweaks.

Improve the SSSE3 dyadic downsample routines and drop the SSE4 routines.
The non-temporal loads used in the SSE4 routines do nothing for cache-
backed memory AFAIK.

Adjust tests because averaging vertically first gives slightly different
output.

~2.39x speedup for the widthx32 routine on Haswell when not memory-bound.
~2.20x speedup for the widthx16 routine on Haswell when not memory-bound.

Note that the widthx16 routine can be unrolled for further speedup.
2016-06-02 13:44:28 +02:00
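A scalar sketch of the vertical-first ordering described above; the SIMD routines follow the same idea, and the rounding here is illustrative rather than bit-exact (as the commit notes, changing the averaging order changes the output slightly):

    // Illustrative 2:1 (dyadic) bilinear downsample, vertical pass first, so the
    // costlier horizontal averaging runs on half as many samples.
    #include <cstdint>

    void DyadicDownsampleScalar (uint8_t* pDst, int32_t iDstStride,
                                 const uint8_t* pSrc, int32_t iSrcStride,
                                 int32_t iSrcWidth, int32_t iSrcHeight) {
      for (int32_t y = 0; y < iSrcHeight / 2; ++y) {
        const uint8_t* pRow0 = pSrc + (2 * y) * iSrcStride;
        const uint8_t* pRow1 = pRow0 + iSrcStride;
        for (int32_t x = 0; x < iSrcWidth / 2; ++x) {
          // Vertical averages of the two source rows at columns 2x and 2x+1 ...
          const int32_t iV0 = (pRow0[2 * x]     + pRow1[2 * x]     + 1) >> 1;
          const int32_t iV1 = (pRow0[2 * x + 1] + pRow1[2 * x + 1] + 1) >> 1;
          // ... then a single horizontal average per output pixel.
          pDst[y * iDstStride + x] = (uint8_t) ((iV0 + iV1 + 1) >> 1);
        }
      }
    }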
Sindre Aamås
7cbb75eac6 [Processing] Pick dyadic downsample function based on stride
Assume that data can be written into the padding area following each
line. This enables the use of faster routines for more cases.

Align downsample buffer stride to a multiple of 32.

With this all strides used should be a multiple of 16, which means
that use of narrower downsample routines can be dropped altogether.
2016-06-02 13:44:28 +02:00
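A small sketch of the stride rounding mentioned above (the macro name is hypothetical):

    // Round a stride up to the next multiple of 32 so every derived stride stays
    // a multiple of 16 and the wide downsample routines can always be used.
    #define ALIGN_UP(x, n) (((x) + (n) - 1) & ~((n) - 1))   // n must be a power of two

    // e.g. ALIGN_UP (1500, 32) == 1504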
Sindre Aamås
770e48ac2b [Processing] Remove unused align macros
The WELS_ALIGN macro here aliases the WELS_ALIGN macro in macros.h
which is inconvenient. Just remove these unused macros.
2016-06-02 13:44:28 +02:00
sijchen@cisco.com
a7ae1efc3a add back the missing part after merging and formatting 2016-06-01 21:33:33 -07:00
sijchen@cisco.com
8bacc3d4d0 Preprocess: refactor to improve code readability 2016-06-01 21:26:24 -07:00
sijchen@cisco.com
8537a9274d fix a problem 2016-06-01 09:21:12 -07:00
sijchen@cisco.com
a9601cdc59 refactor to avoid using only idx0 in syntax writing; for now it has no impact on the bitstream, but may benefit future usage 2016-06-01 09:21:12 -07:00
Karina
268a0eb6f4 remove redundant initialization 2016-06-01 10:52:51 +08:00
HaiboZhu
515eeb41e4 Merge pull request #2481 from ruil2/maxbitrate1
fix iContinualSkipFrames calculation
2016-06-01 09:03:57 +08:00
HaiboZhu
7ccc377d55 Merge pull request #2480 from ruil2/fix
fix wrongly removed parameter setting
2016-06-01 09:03:49 +08:00
ruil2
2d3fc37a07 Merge pull request #2484 from sijchen/refactor_preprocess13
[Encoder] Refactor: add class for different preprocess strategies
2016-06-01 08:31:02 +08:00
Karina
87e81a7a40 use the same name to avoid confusion. 2016-06-01 08:21:03 +08:00
sijchen@cisco.com
03863ae4c6 different preprocess strategies actually use different source picture management 2016-05-31 14:36:21 -07:00
sijchen@cisco.com
a1cae49732 add class for different preprocess strategies 2016-05-31 13:48:45 -07:00
sijchen
c29da290b9 Merge pull request #2479 from ruil2/refine_rc1
get the correct did for savc case
2016-05-31 10:58:38 -07:00
Karina
dd021b6ca8 fix iContinualSkipFrames calculation 2016-05-31 21:01:11 +08:00
Karina
8effa45edd fix wrongly removed parameter setting 2016-05-31 20:46:13 +08:00
Karina
64ad70b0ea get the correct did for savc case 2016-05-31 17:35:20 +08:00
HaiboZhu
df77a5d587 Merge pull request #2478 from ruil2/refine_rc1
refine RC
2016-05-31 17:20:46 +08:00
Karina
4fc2b1f636 refine RC 2016-05-31 16:44:04 +08:00
Karina
7f2ba4dcb6 add savc setting in configure file and command line 2016-05-31 13:53:31 +08:00
Karina
e3c306608c fix dependency ID mapping issue 2016-05-30 15:03:39 +08:00
ruil2
39c2fb3d6b Merge pull request #2472 from saamas/processing-x86-general-bilinear-downsample-optimizations
[Processing/x86] GeneralBilinearDownsample optimizations
2016-05-27 15:17:31 +08:00
HaiboZhu
c17a58efdf Merge pull request #2473 from ruil2/update_interface
modify the interface to use an independent subseqID for each layer
2016-05-25 10:00:13 +08:00
Karina
2ef9613e55 avoid overflow 2016-05-24 13:25:05 +08:00
Sindre Aamås
e490215990 [Processing/x86] Add an AVX2 implementation of GeneralBilinearAccurateDownsample
Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 8. Because
pshufb does not cross 128-bit lanes, the overhead of address
calculations and loads is relatively greater as compared with an
SSSE3/SSE4.1 implementation.

Fall back to a generic approach for ratios > 8.

The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied, which AFAICT is safe with
the current usage of these routines.

Speedup is ~8.52x/~6.89x (32-bit/64-bit) for horizontal ratios <= 2,
~7.81x/~6.13x for ratios within (2, 4], ~5.81x/~4.52x for ratios
within (4, 8], and ~5.06x/~4.09x for ratios > 8 when not memory-bound
on Haswell as compared with the current SSE2 implementation.
2016-05-23 20:23:47 +02:00
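For context, a scalar sketch of the fixed-point horizontal stepping that these SIMD routines accelerate; the 16.16 precision, the starting offset, and the names are assumptions, not the exact routine:

    // Illustrative horizontal bilinear scaling: the source position is tracked as
    // an integer pixel offset plus a fraction, and each output pixel blends two
    // neighbouring source pixels by that fraction.
    #include <cstdint>

    void ScaleRowBilinear (uint8_t* pDst, int32_t iDstWidth,
                           const uint8_t* pSrc, int32_t iSrcWidth) {
      const uint32_t kuiStep = ((uint32_t) iSrcWidth << 16) / (uint32_t) iDstWidth;
      uint32_t uiPos = 0;                            // 16.16 fixed-point source position
      for (int32_t x = 0; x < iDstWidth; ++x) {
        const uint32_t uiInt  = uiPos >> 16;         // integer pixel offset
        const uint32_t uiFrac = uiPos & 0xFFFF;      // fractional weight
        const uint32_t uiA = pSrc[uiInt];
        const uint32_t uiB = pSrc[uiInt + 1 < (uint32_t) iSrcWidth ? uiInt + 1 : uiInt];
        pDst[x] = (uint8_t) ((uiA * (65536 - uiFrac) + uiB * uiFrac + 32768) >> 16);
        uiPos += kuiStep;
      }
    }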
Sindre Aamås
b43e58a366 [Processing/x86] Add an AVX2 implementation of GeneralBilinearFastDownsample
Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 8. Because
pshufb does not cross 128-bit lanes, the overhead of address
calculations and loads is relatively greater as compared with an
SSSE3 implementation.

Fall back to a generic approach for ratios > 8.

The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied, which AFAICT is safe with
the current usage of these routines.

Speedup is ~10.42x/~5.23x (32-bit/64-bit) for horizontal ratios <= 2,
~9.49x/~4.64x for ratios within (2, 4], ~6.43x/~3.18x for ratios
within (4, 8], and ~5.42x/~2.50x for ratios > 8 when not memory-bound
on Haswell as compared with the current SSE2 implementation.
2016-05-23 20:23:47 +02:00
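The 128-bit lane restriction mentioned in both AVX2 commits can be illustrated with intrinsics: _mm256_shuffle_epi8 shuffles each 128-bit half independently, so gathering pixels across lanes needs extra address arithmetic or cross-lane permutes (this snippet only illustrates the instruction's behaviour):

    #include <immintrin.h>

    // vpshufb: byte i of each 128-bit lane of the result is selected from the
    // same lane of vPixels; indices cannot reach into the other half.
    __m256i ShufflePerLane (__m256i vPixels, __m256i vIndices) {
      return _mm256_shuffle_epi8 (vPixels, vIndices);
    }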
Sindre Aamås
b1013095b1 [Processing/x86] Add an SSE4.1 implementation of GeneralBilinearAccurateDownsample
Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 4.

Fall back to a generic approach for ratios > 4.

The use of blendps makes this require SSE4.1. The pshufb path can be
backported to SSSE3 and the generic path to SSE2 for a minor reduction
in performance by replacing blendps and preceding instructions with an
equivalent sequence.

The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied, which AFAICT is safe with
the current usage of these routines.

Speedup is ~5.32x/~4.25x (32-bit/64-bit) for horizontal ratios <= 2,
~5.06x/~3.97x for ratios within (2, 4], and ~3.93x/~3.13x for ratios
> 4 when not memory-bound on Haswell as compared with the current SSE2
implementation.
2016-05-23 20:23:39 +02:00
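As a hint of the backport path described above, a constant-mask blendps can be replaced by an SSE2-era and/andnot/or sequence; this standalone sketch is not the routine itself:

    #include <xmmintrin.h>

    // SSE equivalent of blending b over a under mask (mask lanes all-ones select b).
    __m128 BlendWithMask (__m128 a, __m128 b, __m128 mask) {
      return _mm_or_ps (_mm_and_ps (mask, b), _mm_andnot_ps (mask, a));
    }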
Sindre Aamås
1995e03d91 [Processing/x86] Add an SSSE3 implementation of GeneralBilinearFastDownsample
Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 4.

Fall back to a generic approach for ratios > 4. Note that the generic
approach can be backported to SSE2.

The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied, which AFAICT is safe with
the current usage of these routines.

Speedup is ~6.67x/~3.26x (32-bit/64-bit) for horizontal ratios <= 2,
~6.24x/~3.00x for ratios within (2, 4], and ~4.89x/~2.17x for ratios
> 4 when not memory-bound on Haswell as compared with the current SSE2
implementation.
2016-05-23 20:23:31 +02:00
Sindre Aamås
cbaf087583 [Processing] Reduce duplication in downsampling wrappers 2016-05-23 13:19:17 +02:00
ruil2
c96c8b05a8 Merge pull request #2468 from sijchen/refactor_pre
[Encoder] Refactor: create different functions for different cases to make the logic clean
2016-05-23 13:21:40 +08:00