openh264

Author	SHA1	Message	Date
Sindre Aamås	b1013095b1	[Processing/x86] Add an SSE4.1 implementation of GeneralBilinearAccurateDownsample Keep track of relative pixel offsets and utilize pshufb to efficiently extract relevant pixels for horizontal scaling ratios <= 4. Fall back to a generic approach for ratios > 4. The use of blendps makes this require SSE4.1. The pshufb path can be backported to SSSE3 and the generic path to SSE2 for a minor reduction in performance by replacing blendps and preceding instructions with an equivalent sequence. The implementation assumes that data beyond the end of each line, before the next line begins, can be dirtied; which AFAICT is safe with the current usage of these routines. Speedup is ~5.32x/~4.25x (32-bit/64-bit) for horizontal ratios <= 2, ~5.06x/~3.97x for ratios within (2, 4], and ~3.93x/~3.13x for ratios > 4 when not memory-bound on Haswell as compared with the current SSE2 implementation.	2016-05-23 20:23:39 +02:00
Sindre Aamås	1995e03d91	[Processing/x86] Add an SSSE3 implementation of GeneralBilinearFastDownsample Keep track of relative pixel offsets and utilize pshufb to efficiently extract relevant pixels for horizontal scaling ratios <= 4. Fall back to a generic approach for ratios > 4. Note that the generic approach can be backported to SSE2. The implementation assumes that data beyond the end of each line, before the next line begins, can be dirtied; which AFAICT is safe with the current usage of these routines. Speedup is ~6.67x/~3.26x (32-bit/64-bit) for horizontal ratios <= 2, ~6.24x/~3.00x for ratios within (2, 4], and ~4.89x/~2.17x for ratios > 4 when not memory-bound on Haswell as compared with the current SSE2 implementation.	2016-05-23 20:23:31 +02:00
Sindre Aamås	cbaf087583	[Processing] Reduce duplication in downsampling wrappers	2016-05-23 13:19:17 +02:00
ruil2	c96c8b05a8	Merge pull request #2468 from sijchen/refactor_pre [Encoder] Refactor: create diff func for diff case to make logic clean	2016-05-23 13:21:40 +08:00
HaiboZhu	685b6144a5	Merge pull request #2469 from ruil2/fix_bitrate add GetBsPostion for cabac and cavlc	2016-05-23 09:49:45 +08:00
Karina	9b2dd55324	add GetBsPostion for cabac and cavlc	2016-05-20 14:29:48 +08:00
sijchen	27e803f6f4	refactor to make logic clean	2016-05-19 09:42:39 -07:00
sijchen	a5e4cca710	Merge pull request #2467 from ruil2/overflow fix overflow issue	2016-05-18 21:35:32 -07:00
Karina	8a341070f2	fix overflow issue	2016-05-19 12:00:49 +08:00
sijchen	3fd490dbed	Merge pull request #2460 from sijchen/refactor_ref2 [Encoder] move strategy related pointer to class	2016-05-18 11:40:08 -07:00
sijchen	1ac02f3002	fix conflict with master	2016-05-18 10:57:39 -07:00
sijchen	7188e50acf	Merge pull request #2465 from ruil2/skip_layers fix temporal layer skip issue	2016-05-18 09:34:09 -07:00
Karina	c298d66d48	fix temporal layer skip issue	2016-05-18 09:47:49 +08:00
sijchen	6d79601d93	Merge pull request #2463 from HaiboZhu/Fix_build_error_windows_debug Fix the wrong variable name which caused the build error	2016-05-16 22:57:32 -07:00
Haibo Zhu	85f4beb9a8	Fix the wrong variable name which caused the build error	2016-05-17 13:46:04 +08:00
HaiboZhu	46220cfb3b	Merge pull request #2461 from HaiboZhu/Bugfix_remove_undefined_behavior_warning Remove the undefined behavior warning in parse_cabac	2016-05-17 10:51:18 +08:00
Haibo Zhu	86c1f0d2c6	Remove the undefined behavior warning in parse_cabac	2016-05-17 09:40:03 +08:00
ruil2	0ec686f7ec	Merge pull request #2452 from sijchen/refactor_sps2 Refactoring: Wrap all the operations related to eSpsPpsIdStrategy to class	2016-05-17 09:19:14 +08:00
sijchen	1eb735299a	Merge pull request #2458 from ruil2/downsampling2 add a new downsampling algorithm	2016-05-16 10:59:35 -07:00
sijchen	00747540fb	move strategy related pointer to class	2016-05-16 10:55:13 -07:00
HaiboZhu	f623aa318d	Merge pull request #2459 from ruil2/fix_crash fix crash when a temporal layer is skipped; the frame should not be encoded	2016-05-16 15:35:38 +08:00
Karina	3b55d64902	fix crash when a temporal layer is skipped; the frame should not be encoded	2016-05-16 14:43:13 +08:00
Karina	96b2a87030	add a new downsampling algorithm	2016-05-16 09:28:19 +08:00
sijchen	3fa9a4840a	Merge pull request #2433 from hzwangsiyu/master Update .gitignore	2016-05-05 16:27:56 -07:00
sijchen	ffb85046b4	Refactoring: Wrap all the operations related to eSpsPpsIdStrategy to class, to improve code readability	2016-05-04 15:06:02 -07:00
HaiboZhu	c30cc41261	Merge pull request #2448 from saamas/encoder-getnonzerocount-sse42 [Encoder] Add an SSE4.2 implementation of WelsGetNonZeroCount	2016-05-04 09:49:47 +08:00
ruil2	e9dc97803d	Merge pull request #2447 from saamas/encoder-cavlcparamcal-sse42 [Encoder] Add an SSE4.2 implementation of CavlcParamCal	2016-04-28 09:08:44 +08:00
ruil2	7d65687284	Merge pull request #2441 from saamas/encoder-add-avx2-4x4-quantization-routines [Encoder] Add AVX2 4x4 quantization routines	2016-04-28 09:08:31 +08:00
ruil2	56618249d7	Merge pull request #2436 from saamas/processing-add-avx2-vaa-routines [Processing] Add AVX2 VAA routines	2016-04-28 09:08:03 +08:00
Sindre Aamås	fb0b2b3f41	[Encoder/x86] Drop unneeded LOAD_4_PARA in CavlcParamCal_sse42	2016-04-24 22:59:35 +02:00
Sindre Aamås	d1c7713191	[Encoder/x86] Minor CavlcParamCal_sse42 tweak Do more elaborate register allocation to avoid a few mov instructions.	2016-04-24 22:36:23 +02:00
Sindre Aamås	f56bdc3aa4	[Encoder/x86] Minor CavlcParamCal_sse42 tweak Avoid loading single-use parameter.	2016-04-21 16:29:02 +02:00
Sindre Aamås	2eb8800712	[Encoder/x86] Remove a leftover mov instruction in CavlcParamCal_sse42	2016-04-21 15:53:33 +02:00
Sindre Aamås	4645bd26aa	[Encoder] Add an SSE4.2 implementation of WelsGetNonZeroCount Avoid touching some cache lines by using popcnt instead of table lookups. Also gives a speedup of ~1.4x on Haswell as compared with SSE2.	2016-04-20 19:10:24 +02:00
Sindre Aamås	d906dda224	[UT] Improve GetNonZeroCount tests Reduce duplication. Test more combinations. Always test boundary cases.	2016-04-20 19:10:24 +02:00
Sindre Aamås	3f31aff4dc	[Encoder] Add an SSE4.2 implementation of CavlcParamCal Use a combination of table lookups and pshufb to convert coefficients to zero run/level format. Two 16-entry lookup tables are used for a total of 192 bytes worth of tables. (The existing SSE2 version uses a table of size 2048 bytes.) Speedup is ~1.5x-3x as compared with the SSE2 version on Haswell (the speedup is greater for input with many trailing zeros). The use of popcnt makes it require SSE4.2. This can be replaced with a small LUT and accumulation which would reduce the requirement to SSSE3.	2016-04-20 18:37:08 +02:00
Sindre Aamås	502b16925e	[UT] Add tests for CavlcParamCal_c and CavlcParamCal_sse2	2016-04-20 18:37:08 +02:00
HaiboZhu	98c6c6de11	Merge pull request #2446 from HaiboZhu/Reduce_log_size_for_parse_only_mode Add the log reduce logic into parse only mode	2016-04-20 10:48:57 +08:00
HaiboZhu	3b68840d5f	Merge pull request #2444 from GuangweiWang/fix-assembly-arm64 Fix assembly arm64 Code review at: https://rbcommons.com/s/OpenH264/r/1594/	2016-04-20 10:03:01 +08:00
Haibo Zhu	3ccecfbdbe	Add the log reduce logic into parse only mode	2016-04-20 09:58:12 +08:00
HaiboZhu	c9433ee73b	Merge pull request #2442 from ruil2/deblocking_fix fix 32-bit parameters issue on arm64 assembly function	2016-04-18 09:21:24 +08:00
Guangwei Wang	cc407b4b21	fix code style	2016-04-17 19:47:55 +08:00
Guangwei Wang	0b8cdcaff8	extend 32-bit parameters to 64-bit in arm64 assembly functions	2016-04-17 19:41:57 +08:00
Karina	1ecb9582df	update arm assembly comments	2016-04-14 14:57:21 +08:00
Karina	dd340b7fe7	modify neon comment	2016-04-14 14:49:11 +08:00
Karina	525dbe7093	add 32-bit parameter sign-extensions for block_add_aarch64_neon.S	2016-04-14 10:06:57 +08:00
Karina	d34e209266	fix 32-bit parameters issue on arm64 assembly function	2016-04-13 19:30:08 +08:00
Sindre Aamås	bb49e23719	[Encoder] Add AVX2 4x4 quantization routines WelsQuantFour4x4Max_avx2 (~2.06x speedup over SSE2) WelsQuantFour4x4_avx2 (~2.32x speedup over SSE2) WelsQuant4x4Dc_avx2 (~1.49x speedup over SSE2) WelsQuant4x4_avx2 (~1.42x speedup over SSE2)	2016-04-13 11:56:47 +02:00
Sindre Aamås	1e83bec860	[UT] Add some missing quantization tests	2016-04-13 11:56:44 +02:00
Sindre Aamås	abaf3a4104	[UT] Reduce duplication in quantization tests	2016-04-13 08:59:16 +02:00
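Commits b1013095b1 and 1995e03d91 describe keeping track of relative pixel offsets and using pshufb to extract the pixels needed for horizontal downscaling. As a hedged illustration only (not the openh264 code), the C sketch below models pshufb's byte-selection rule in scalar code and builds a shuffle mask of left/right neighbour pairs from a horizontal step. The 16.16 fixed-point step format and the helper names are assumptions made for this example.

```c
#include <stdint.h>

/* Scalar model of SSSE3 pshufb (_mm_shuffle_epi8) on one 16-byte lane:
 * each output byte takes src[ctrl & 0x0F], or 0 if bit 7 of ctrl is set. */
static void pshufb_model(uint8_t dst[16], const uint8_t src[16],
                         const uint8_t ctrl[16]) {
    for (int i = 0; i < 16; i++)
        dst[i] = (ctrl[i] & 0x80) ? 0 : src[ctrl[i] & 0x0F];
}

/* Hypothetical helper: build a mask that gathers the (left, right) pixel
 * pairs used by bilinear filtering for 8 output pixels, given a start
 * position and step in 16.16 fixed point, relative to a 16-byte window.
 * Returns 0 if any pair would fall outside that window. */
static int build_pair_mask(uint8_t ctrl[16], uint32_t pos_fx, uint32_t step_fx) {
    for (int i = 0; i < 8; i++) {
        uint32_t x = (pos_fx + (uint32_t)i * step_fx) >> 16; /* integer offset */
        if (x + 1 > 15)
            return 0;
        ctrl[2 * i] = (uint8_t)x;           /* left neighbour  */
        ctrl[2 * i + 1] = (uint8_t)(x + 1); /* right neighbour */
    }
    return 1;
}
```

With a step of 1.5 (0x18000), the mask selects the pairs (0,1), (1,2), (3,4), (4,5), and so on, so one shuffle gathers all inputs for 8 output pixels; the pairs would then be weighted by their fractional offsets. With a step of 4.0 the pairs no longer fit in one 16-byte window, which is consistent with the commits treating larger ratios separately, although how the real code handles that case is not reproduced here.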

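Commit 4645bd26aa replaces table lookups with popcnt when counting nonzero coefficients. A minimal sketch, assuming a 4x4 block stored as 16 int16_t values: form a bitmask with one bit per nonzero coefficient, then count the set bits. The function names are hypothetical, and the portable bit-counting loop stands in for the hardware POPCNT instruction.

```c
#include <stdint.h>

/* Portable stand-in for POPCNT: repeatedly clear the lowest set bit. */
static int popcount32(uint32_t m) {
    int n = 0;
    while (m != 0) {
        m &= m - 1;
        n++;
    }
    return n;
}

/* Hypothetical helper (not the openh264 API): number of nonzero
 * coefficients in a 4x4 block. SIMD code would typically build the
 * 16-bit mask with a compare plus a movemask; here a loop builds it. */
static int nonzero_count_4x4(const int16_t coeff[16]) {
    uint32_t mask = 0;
    for (int i = 0; i < 16; i++)
        mask |= (uint32_t)(coeff[i] != 0) << i;
    return popcount32(mask);
}
```

The appeal of the bit-count approach is that it needs no lookup table, so it touches no extra cache lines.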

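Commit 3f31aff4dc converts coefficients to zero run/level format for CAVLC using lookup tables and pshufb. The commit's SIMD technique is not reproduced here. Instead, the plain scalar sketch below shows what that conversion computes, assuming coefficients are already in zigzag scan order and that levels are listed from the highest-frequency nonzero coefficient downwards. The function name and output layout are hypothetical.

```c
#include <stdint.h>

/* Hypothetical scalar sketch (not openh264's CavlcParamCal): convert n
 * zigzag-ordered coefficients into levels and runs, listed from the
 * highest-frequency nonzero coefficient downwards.
 * run[k] counts the zeros immediately preceding level[k] in scan order.
 * Returns the number of nonzero coefficients; *total_zeros receives the
 * number of zeros located before the last nonzero coefficient. */
static int run_level_convert(const int16_t *coef, int n,
                             int16_t *level, uint8_t *run, int *total_zeros) {
    int last = n - 1;
    while (last >= 0 && coef[last] == 0)
        last--; /* skip trailing zeros; they are not coded as runs */

    int count = 0, zeros = 0;
    for (int i = last; i >= 0; i--) {
        if (coef[i] != 0) {
            level[count] = coef[i];
            run[count] = 0;
            count++;
        } else {
            run[count - 1]++; /* this zero precedes the most recent level */
            zeros++;
        }
    }
    *total_zeros = zeros;
    return count;
}
```

For the input {0, 3, 0, 0, -1, 0, 0, 0, 1, 0, ...}, this yields levels 1, -1, 3 with runs 3, 2, 1, and 6 total zeros.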
4242 Commits
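Commits d34e209266, 525dbe7093 and 0b8cdcaff8 extend 32-bit parameters to 64 bits in arm64 assembly. As we understand the AAPCS64 calling convention, when a 32-bit int is passed in a 64-bit X register, its upper 32 bits are not guaranteed to be meaningful. Assembly that uses the full register, for example in address arithmetic with a possibly negative stride, should therefore extend the value first, typically with sxtw. The C sketch below models sxtw and uxtw on a 64-bit register value. The function names are hypothetical.

```c
#include <stdint.h>

/* Model of AArch64 "sxtw xd, wn": keep the low 32 bits of a 64-bit
 * register value and sign-extend them. Written without relying on
 * implementation-defined signed conversions. */
static int64_t sxtw_model(uint64_t reg) {
    uint32_t lo = (uint32_t)reg;
    return (lo & 0x80000000u) ? (int64_t)lo - (int64_t)0x100000000LL
                              : (int64_t)lo;
}

/* Model of "uxtw" (or writing a W register): zero-extend the low 32 bits. */
static uint64_t uxtw_model(uint64_t reg) {
    return (uint64_t)(uint32_t)reg;
}
```

For a stride of -16 that arrives with junk in the upper half (0xDEADBEEFFFFFFFF0), using the raw register yields a nonsense offset, and zero-extension yields 4294967280. Only sign-extension recovers -16.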