Average vertically before horizontally; horizontal averaging is the
more expensive of the two. Doing the vertical averaging first halves
the number of horizontal averages needed.
Use pmaddubsw and pavgw to do the horizontal averaging for a slight
performance improvement.
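
A minimal sketch of the combined trick, assuming row widths that are a
multiple of 16 and using illustrative names: pavgb does the vertical
average, pmaddubsw against a vector of 1s sums adjacent byte pairs into
words, and pavgw against zero turns each sum s into (s + 1) >> 1. This
double rounding is also why the output differs slightly from averaging
all four pixels in one pass.

    #include <emmintrin.h>
    #include <tmmintrin.h>
    #include <cstdint>

    // Sketch: one output row of the 2:1 dyadic downsample.
    static void DownsampleRow2x(const uint8_t* row0, const uint8_t* row1,
                                uint8_t* dst, int src_width) {
      const __m128i ones = _mm_set1_epi8(1);
      const __m128i zero = _mm_setzero_si128();
      for (int x = 0; x < src_width; x += 16) {
        __m128i a = _mm_loadu_si128((const __m128i*)(row0 + x));
        __m128i b = _mm_loadu_si128((const __m128i*)(row1 + x));
        __m128i v = _mm_avg_epu8(a, b);          // vertical average: pavgb
        __m128i s = _mm_maddubs_epi16(v, ones);  // pmaddubsw: v0+v1, v2+v3, ...
        __m128i h = _mm_avg_epu16(s, zero);      // pavgw vs 0: (sum + 1) >> 1
        _mm_storel_epi64((__m128i*)(dst + x / 2),
                         _mm_packus_epi16(h, h));  // 8 output pixels
      }
    }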
Minor tweaks.
Improve the SSSE3 dyadic downsample routines and drop the SSE4 routines.
The non-temporal loads used in the SSE4 routines have no effect on
ordinary cache-backed (write-back) memory AFAIK.
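
For reference, a hedged illustration of why those loads were a no-op
here: MOVNTDQA's hint only takes effect on write-combining memory, e.g.
mapped video memory; on write-back memory it degenerates to a plain
aligned load.

    #include <smmintrin.h>
    #include <cstdint>

    // On write-back (ordinary cacheable) memory this is equivalent to a
    // regular _mm_load_si128; the non-temporal hint is ignored.
    static __m128i LoadPixels(uint8_t* p) {       // p assumed 16-byte aligned
      return _mm_stream_load_si128((__m128i*)p);  // SSE4.1 MOVNTDQA
    }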
Adjust tests because averaging vertically first gives slightly different
output.
~2.39x speedup for the widthx32 routine on Haswell when not memory-bound.
~2.20x speedup for the widthx16 routine on Haswell when not memory-bound.
Note that the widthx16 routine can be unrolled for further speedup.
Keep track of relative pixel offsets and utilize pshufb to efficiently
extract relevant pixels for horizontal scaling ratios <= 4.
Fall back to a generic approach for ratios > 4. Note that the generic
approach can be backported to SSE2.
The implementation assumes that data beyond the end of each line,
before the next line begins, can be dirtied, which AFAICT is safe with
the current usage of these routines.
Speedup is ~6.67x/~3.26x (32-bit/64-bit) for horizontal ratios <= 2,
~6.24x/~3.00x for ratios within (2, 4], and ~4.89x/~2.17x for ratios
> 4 when not memory-bound on Haswell as compared with the current SSE2
implementation.
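
A simplified sketch of the pshufb approach above, using nearest-neighbor
selection rather than the routines' actual filtering, with illustrative
names: a 16.16 fixed-point position tracks the source pixel, and for
each group of four outputs the offsets relative to the group's first
source byte stay below 16 whenever the ratio is <= 4, so one load plus
one pshufb extracts all four pixels. The scalar tail doubles as the
generic fallback, and the 16-byte load mirrors the note above about
touching bytes beyond the end of a line.

    #include <tmmintrin.h>
    #include <cstdint>
    #include <cstring>

    static void ScaleRowNearest(const uint8_t* src, int src_width,
                                uint8_t* dst, int dst_width) {
      const uint32_t xstep = ((uint32_t)src_width << 16) / (uint32_t)dst_width;
      uint32_t xpos = 0;
      int x = 0;
      for (; x + 4 <= dst_width; x += 4) {
        const uint32_t base = xpos >> 16;
        uint8_t ctl[16] = {0};
        for (int i = 0; i < 4; ++i) {   // real code derives this incrementally
          ctl[i] = (uint8_t)((xpos >> 16) - base);  // < 16 when ratio <= 4
          xpos += xstep;
        }
        __m128i pix = _mm_loadu_si128((const __m128i*)(src + base)); // may over-read
        pix = _mm_shuffle_epi8(pix, _mm_loadu_si128((const __m128i*)ctl));
        const uint32_t out = (uint32_t)_mm_cvtsi128_si32(pix);
        memcpy(dst + x, &out, 4);
      }
      for (; x < dst_width; ++x) {      // generic fallback, works for any ratio
        dst[x] = src[xpos >> 16];
        xpos += xstep;
      }
    }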
WelsQuantFour4x4Max_avx2 (~2.06x speedup over SSE2)
WelsQuantFour4x4_avx2 (~2.32x speedup over SSE2)
WelsQuant4x4Dc_avx2 (~1.49x speedup over SSE2)
WelsQuant4x4_avx2 (~1.42x speedup over SSE2)
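
For context, a minimal sketch of the arithmetic these routines
vectorize, assuming the usual q = sign(c) * ((|c| + ff) * mf >> 16)
quantization form; one 256-bit operation covers 16 coefficients, i.e. a
whole 4x4 block per 128-bit lane, which is where the gain over the
8-wide SSE2 versions comes from.

    #include <immintrin.h>
    #include <cstdint>

    // Quantize 16 coefficients in place (illustrative, not the actual routine).
    static void Quant16(int16_t* coef, const int16_t* ff, const int16_t* mf) {
      __m256i c = _mm256_loadu_si256((const __m256i*)coef);
      __m256i a = _mm256_abs_epi16(c);                                   // |c|
      a = _mm256_adds_epu16(a, _mm256_loadu_si256((const __m256i*)ff));  // + ff
      a = _mm256_mulhi_epu16(a, _mm256_loadu_si256((const __m256i*)mf)); // * mf >> 16
      c = _mm256_sign_epi16(a, c);                                       // restore sign
      _mm256_storeu_si256((__m256i*)coef, c);
    }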
At lower bitrates, it is overall faster to conditionally do one block
at a time with SSE2 on Haswell and likely other common architectures.
At higher bitrates, it is faster to use the wider routine that IDCTs
four blocks at a time. To avoid potential performance regressions
as compared to MMX, stick with single-block IDCTs with SSE2. There
is still a performance advantage as compared to MMX because the
single-block SSE2 routine is faster than the corresponding MMX
routine.
Stick with four blocks at a time with AVX2, which appears to be
consistently faster on Haswell.
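
A hypothetical dispatch sketch of the strategy described above; the
routine names, block layout, and the nonzero mask are illustrative, not
the actual openh264 API.

    #include <cstdint>

    void IdctResAddPred_sse2(uint8_t* pred, int32_t stride, int16_t* rs);     // assumed
    void IdctFourResAddPred_avx2(uint8_t* pred, int32_t stride, int16_t* rs); // assumed

    static void IdctDispatch(uint8_t* pred, int32_t stride, int16_t* rs,
                             uint8_t nz_mask, bool use_avx2) {
      if (use_avx2) {              // four at a time: consistently faster on Haswell
        IdctFourResAddPred_avx2(pred, stride, rs);
        return;
      }
      for (int i = 0; i < 4; ++i)  // SSE2: one block at a time, skipping
        if (nz_mask & (1 << i))    // all-zero blocks (common at low bitrates)
          IdctResAddPred_sse2(pred + 4 * i, stride, rs + 16 * i);  // layout simplified
    }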
Move asm routines to common. Delete obsolete decoder routines.
Use wider routines where applicable.
~1.07x overall faster decode on a quick 720p30 4Mbps test on Haswell.
WelsSampleSatd16x16_avx2 (~2.31x speedup over SSE4.1 on Haswell).
WelsSampleSatd16x8_avx2 (~2.19x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x16_avx2 (~1.68x speedup over SSE4.1 on Haswell).
WelsSampleSatd8x8_avx2 (~1.53x speedup over SSE4.1 on Haswell).
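
For reference, a scalar sketch of what each SATD routine computes per
4x4 block: a 2-D Hadamard transform of the source/reference difference
followed by a sum of absolute transformed values (final scaling
conventions vary between implementations). The AVX2 versions evaluate
many of these 4x4 transforms in parallel.

    #include <cstdint>
    #include <cstdlib>

    static int Satd4x4(const uint8_t* src, int src_stride,
                       const uint8_t* ref, int ref_stride) {
      int d[16], m[16], sum = 0;
      for (int i = 0; i < 4; ++i)          // residual
        for (int j = 0; j < 4; ++j)
          d[4 * i + j] = src[i * src_stride + j] - ref[i * ref_stride + j];
      for (int i = 0; i < 4; ++i) {        // horizontal Hadamard butterflies
        const int s01 = d[4*i+0] + d[4*i+1], d01 = d[4*i+0] - d[4*i+1];
        const int s23 = d[4*i+2] + d[4*i+3], d23 = d[4*i+2] - d[4*i+3];
        m[4*i+0] = s01 + s23; m[4*i+1] = s01 - s23;
        m[4*i+2] = d01 - d23; m[4*i+3] = d01 + d23;
      }
      for (int j = 0; j < 4; ++j) {        // vertical butterflies + |.| sum
        const int s01 = m[j] + m[4+j], d01 = m[j] - m[4+j];
        const int s23 = m[8+j] + m[12+j], d23 = m[8+j] - m[12+j];
        sum += abs(s01 + s23) + abs(s01 - s23) + abs(d01 - d23) + abs(d01 + d23);
      }
      return sum;
    }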
This reverts an unrelated part of e7e3b4f37f.
Since the function is still declared as taking an int32_t parameter
in the header, changing the parameter type in the implementation makes
it end up as a different, overloaded function.
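
An illustration of the pitfall with a hypothetical function name: under
C++ linkage the parameter type is part of the mangled symbol, so a
definition whose parameter type differs from the header's declaration
introduces a distinct overload, and callers compiled against the header
still reference the int32_t version.

    #include <cstdint>

    void SetOption(int32_t value);           // header: what callers link against

    void SetOption(int16_t value) { (void)value; }  // definition: a different function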