This fixes warnings like the following:
codec/decoder/core/src/mv_pred.cpp: In function ‘void WelsDec::PredPSkipMvFromNeighbor(WelsDec::PDqLayer, int16_t*)’:
codec/decoder/core/src/mv_pred.cpp:158:51: warning: ‘iLeftTopXy’ may be used uninitialized in this function [-Wmaybe-uninitialized]
codec/processing/src/backgrounddetection/BackgroundDetection.cpp: In member function ‘void WelsVP::CBackgroundDetection::ForegroundDilation(WelsVP::SBackgroundOU*, WelsVP::SBackgroundOU**, WelsVP::CBackgroundDetection::vBGDParam*, int32_t)’:
codec/processing/src/backgrounddetection/BackgroundDetection.cpp:281:63: warning: suggest parentheses around operand of ‘!’ or change ‘|’ to ‘||’ or ‘!’ to ‘~’ [-Wparentheses]
For the possibly uninitialized variables, this is similar to earlier
commits 8be8fe17 and af2666fd.
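For illustration, a hedged sketch of the kinds of changes these two
warning classes call for (hypothetical variables and condition, not the
actual diffs):

  #include <cstdint>

  static int32_t DemoWarningFixes (bool bCond, bool bFlagA, bool bFlagB) {
    int32_t iLeftTopXy = 0;  // initialize at declaration so -Wmaybe-uninitialized
    if (bCond)               // stays quiet even when the assignment is conditional
      iLeftTopXy = 1;
    // Parenthesize the '!' operand so the intended precedence versus the
    // bitwise '|' is explicit, as -Wparentheses suggests.
    if ((!bFlagA) | bFlagB)
      return iLeftTopXy;
    return 0;
  }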
This fixes building for Android in a different way than in f5e483ce.
On Android, <cassert> isn't available in the normal include path,
only when the STL headers are available.
We intentionally avoid using STL within the main libopenh264.so, to
simplify dependency chains for users of the library (which otherwise
could run into conflicts if the surrounding app wants to use
a different STL implementation).
The previous fix only provided headers, without actually linking
against STL, so at this point it isn't a real issue yet, but it's
still a very slippery slope towards accidentally starting to rely on
STL within the core library.
Instead, explicitly avoid using STL within the core library by not
even providing the include path.
AFAICT, it is sufficient that the sample buffer has space for half
the source width/height. With the current sample buffer size, this
enables its use for resolutions up to 3840x2176.
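As a rough check (assuming one byte per half-resolution luma sample),
a 3840x2176 source needs a half-size plane of

  3840/2 x 2176/2 = 1920 x 1088 = 2,088,960 samples

which is what the current buffer size has to cover.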
Average vertically before horizontally; horizontal averaging is more
costly, so doing the vertical averaging first halves the number of
horizontal averages needed.
Use pmaddubsw and pavgw to do the horizontal averaging for a slight
performance improvement.
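A rough intrinsics sketch of that averaging order (illustration only,
not the actual assembly; the double rounding here is also why the
output changes slightly, as noted below):

  #include <emmintrin.h>   // SSE2
  #include <tmmintrin.h>   // SSSE3: _mm_maddubs_epi16

  // Dyadic 2:1 downsample of 16 source pixels from two rows into 8
  // destination pixels, averaging vertically first.
  static inline __m128i DyadicAvg16Px (__m128i vRow0, __m128i vRow1) {
    const __m128i kvOnes = _mm_set1_epi8 (1);
    const __m128i kvZero = _mm_setzero_si128 ();
    // Vertical average: one pavgb covers 16 pixels (with rounding).
    __m128i vVert = _mm_avg_epu8 (vRow0, vRow1);
    // Horizontal average: pmaddubsw with a multiplier of 1 sums adjacent
    // byte pairs into words; pavgw against zero halves them with rounding.
    __m128i vSums = _mm_maddubs_epi16 (vVert, kvOnes);
    __m128i vHorz = _mm_avg_epu16 (vSums, kvZero);
    // Pack the eight 16-bit results back to bytes (low 8 bytes are valid).
    return _mm_packus_epi16 (vHorz, vHorz);
  }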
Minor tweaks.
Improve the SSSE3 dyadic downsample routines and drop the SSE4 routines.
The non-temporal loads used in the SSE4 routines do nothing for cache-
backed memory AFAIK.
Adjust tests because averaging vertically first gives slightly different
output.
~2.39x speedup for the widthx32 routine on Haswell when not memory-bound.
~2.20x speedup for the widthx16 routine on Haswell when not memory-bound.
Note that the widthx16 routine can be unrolled for further speedup.
Assume that data can be written into the padding area following each
line. This enables the use of faster routines for more cases.
Align downsample buffer stride to a multiple of 32.
With this, all strides used should be a multiple of 16, which means
the narrower downsample routines can be dropped altogether.
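For illustration (hypothetical helper, not the actual code), the
alignment amounts to rounding the stride up:

  #include <stdint.h>

  // Round a stride up to the next multiple of 32 bytes.
  static inline int32_t AlignStride32 (int32_t iStride) {
    return (iStride + 31) & ~31;
  }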
Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 8. Because
pshufb does not cross 128-bit lanes, the overhead of address
calculations and loads is larger relative to an SSSE3/SSE4.1
implementation.
Fall back to a generic approach for ratios > 8.
The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied, which AFAICT is safe with
the current usage of these routines.
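A simplified, SSSE3-flavored sketch of the offset-tracking idea for the
easiest case (ratio <= 2, so eight output pixels fit in one 16-byte
load). This is a hypothetical helper, not the actual routine: it assumes
16.16 fixed-point x positions and 6-bit interpolation weights, and it
rebuilds the shuffle mask on every call where real code would reuse it:

  #include <emmintrin.h>
  #include <tmmintrin.h>   // SSSE3: _mm_shuffle_epi8, _mm_maddubs_epi16
  #include <stdint.h>

  static void HorizontalBilinear8Px (const uint8_t* pSrcLine, uint8_t* pDst,
                                     uint32_t uiX0, uint32_t uiStepX) {
    uint8_t kShuf[16];
    int8_t  kWeight[16];
    const uint32_t uiBase = uiX0 >> 16;  // integer offset of the 16-byte load
    uint32_t uiX = uiX0;
    for (int i = 0; i < 8; i++) {
      const uint32_t uiRel  = (uiX >> 16) - uiBase;  // offset relative to the load
      const int32_t  iFrac6 = (uiX >> 10) & 63;      // 6-bit fractional weight
      kShuf[2 * i]       = (uint8_t) uiRel;          // left source pixel
      kShuf[2 * i + 1]   = (uint8_t) (uiRel + 1);    // right source pixel
      kWeight[2 * i]     = (int8_t) (64 - iFrac6);
      kWeight[2 * i + 1] = (int8_t) iFrac6;
      uiX += uiStepX;
    }
    // One load + one pshufb gathers all left/right pairs into adjacent bytes.
    const __m128i vSrc = _mm_loadu_si128 ((const __m128i*) (pSrcLine + uiBase));
    const __m128i vPix = _mm_shuffle_epi8 (vSrc,
                           _mm_loadu_si128 ((const __m128i*) kShuf));
    // pmaddubsw applies the per-pixel weights: left*(64-f) + right*f.
    __m128i vSum = _mm_maddubs_epi16 (vPix,
                     _mm_loadu_si128 ((const __m128i*) kWeight));
    vSum = _mm_srli_epi16 (_mm_add_epi16 (vSum, _mm_set1_epi16 (32)), 6);
    _mm_storel_epi64 ((__m128i*) pDst, _mm_packus_epi16 (vSum, vSum));
  }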
Speedup is ~8.52x/~6.89x (32-bit/64-bit) for horizontal ratios <= 2,
~7.81x/~6.13x for ratios within (2, 4], ~5.81x/~4.52x for ratios
within (4, 8], and ~5.06x/~4.09x for ratios > 8 when not memory-bound
on Haswell as compared with the current SSE2 implementation.
Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 8. Because
pshufb does not cross 128-bit lanes, the overhead of address
calculations and loads is larger relative to an SSSE3
implementation.
Fall back to a generic approach for ratios > 8.
The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied, which AFAICT is safe with
the current usage of these routines.
Speedup is ~10.42x/~5.23x (32-bit/64-bit) for horizontal ratios <= 2,
~9.49x/~4.64x for ratios within (2, 4], ~6.43x/~3.18x for ratios
within (4, 8], and ~5.42x/~2.50x for ratios > 8 when not memory-bound
on Haswell as compared with the current SSE2 implementation.
Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 4.
Fall back to a generic approach for ratios > 4.
The use of blendps makes this require SSE4.1. By replacing blendps and
the preceding instructions with an equivalent sequence, the pshufb path
can be backported to SSSE3 and the generic path to SSE2, at a minor
cost in performance.
The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied, which AFAICT is safe with
the current usage of these routines.
Speedup is ~5.32x/~4.25x (32-bit/64-bit) for horizontal ratios <= 2,
~5.06x/~3.97x for ratios within (2, 4], and ~3.93x/~3.13x for ratios
> 4 when not memory-bound on Haswell as compared with the current SSE2
implementation.
Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 4.
Fall back to a generic approach for ratios > 4. Note that the generic
approach can be backported to SSE2.
The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied, which AFAICT is safe with
the current usage of these routines.
Speedup is ~6.67x/~3.26x (32-bit/64-bit) for horizontal ratios <= 2,
~6.24x/~3.00x for ratios within (2, 4], and ~4.89x/~2.17x for ratios
> 4 when not memory-bound on Haswell as compared with the current SSE2
implementation.
Process 8 lines at a time rather than 16 lines at a time because
this appears to give more reliable memory subsystem performance on
Haswell.
Speedup is > 2x as compared to SSE2 when not memory-bound on Haswell.
On my Haswell MBP, VAACalcSadSsdBgd is ~3x faster when uncached,
which appears to be related to processing 8 lines at a time as opposed
to 16 lines at a time. The other routines are also faster than the
SSE2 routines in this case, but to a lesser extent.
The astyle configuration makes sure normal code is indented consistently
with 2 spaces, but astyle doesn't seem to touch the indentation in
these multi-line macros.
The apple assembler for arm can handle the gnu binutils style
macros just fine these days, so there is no need to duplicate all of
these macros in two syntaxes when the gnu binutils style works fine
in all cases.
We already require a new enough assembler to support the gnu binutils
style features since we use the .rept directive in a few places.
The apple assembler for arm64 can handle the gnu binutils style
macros just fine, so there is no need to duplicate all of these
macros in two syntaxes when the gnu binutils style works fine in
all cases.
We already require a new enough assembler to support the gnu binutils
style features since we use the .rept directive in a few places.