openh264

Author	SHA1	Message	Date
HaiboZhu	515eeb41e4	Merge pull request #2481 from ruil2/maxbitrate1 fix iContinualSkipFrames calculation	2016-06-01 09:03:57 +08:00
HaiboZhu	7ccc377d55	Merge pull request #2480 from ruil2/fix fix removing parameter setting wrongly	2016-06-01 09:03:49 +08:00
ruil2	2d3fc37a07	Merge pull request #2484 from sijchen/refactor_preprocess13 [Encoder] Refactor: add class for diff preprocess strategy	2016-06-01 08:31:02 +08:00
Karina	87e81a7a40	use the same name to avoid confusing.	2016-06-01 08:21:03 +08:00
sijchen@cisco.com	03863ae4c6	different preprocess actually used diff source picture management	2016-05-31 14:36:21 -07:00
sijchen@cisco.com	a1cae49732	add class for diff preprocess strategy	2016-05-31 13:48:45 -07:00
sijchen	c29da290b9	Merge pull request #2479 from ruil2/refine_rc1 get the correct did for savc case	2016-05-31 10:58:38 -07:00
Karina	dd021b6ca8	fix iContinualSkipFrames calculation	2016-05-31 21:01:11 +08:00
Karina	8effa45edd	fix removing parameter setting	2016-05-31 20:46:13 +08:00
Karina	64ad70b0ea	get the correct did for savc case	2016-05-31 17:35:20 +08:00
HaiboZhu	df77a5d587	Merge pull request #2478 from ruil2/refine_rc1 refine RC	2016-05-31 17:20:46 +08:00
Karina	4fc2b1f636	refine RC	2016-05-31 16:44:04 +08:00
Karina	7f2ba4dcb6	add savc setting in configure file and command line	2016-05-31 13:53:31 +08:00
Karina	e3c306608c	fix dependency ID mapping issue	2016-05-30 15:03:39 +08:00
ruil2	39c2fb3d6b	Merge pull request #2472 from saamas/processing-x86-general-bilinear-downsample-optimizations [Processing/x86] GeneralBilinearDownsample optimizations	2016-05-27 15:17:31 +08:00
HaiboZhu	c17a58efdf	Merge pull request #2473 from ruil2/update_interface modify the interface that use a independent subseqID for each layer	2016-05-25 10:00:13 +08:00
Karina	2ef9613e55	avoid overflow	2016-05-24 13:25:05 +08:00
Sindre Aamås	e490215990	[Processing/x86] Add an AVX2 implementation of GeneralBilinearAccurateDownsample Keep track of relative pixel offsets and utilize pshufb to efficiently extract relevant pixels for horizontal scaling ratios <= 8. Because pshufb does not cross 128-bit lanes, the overhead of address calculations and loads is relatively greater as compared with an SSSE3/SSE4.1 implementation. Fall back to a generic approach for ratios > 8. The implementation assumes that data beyond the end of each line, before the next line begins, can be dirtied; which AFAICT is safe with the current usage of these routines. Speedup is ~8.52x/~6.89x (32-bit/64-bit) for horizontal ratios <= 2, ~7.81x/~6.13x for ratios within (2, 4], ~5.81x/~4.52x for ratios within (4, 8], and ~5.06x/~4.09x for ratios > 8 when not memory-bound on Haswell as compared with the current SSE2 implementation.	2016-05-23 20:23:47 +02:00
Sindre Aamås	b43e58a366	[Processing/x86] Add an AVX2 implementation of GeneralBilinearFastDownsample Keep track of relative pixel offsets and utilize pshufb to efficiently extract relevant pixels for horizontal scaling ratios <= 8. Because pshufb does not cross 128-bit lanes, the overhead of address calculations and loads is relatively greater as compared with an SSSE3 implementation. Fall back to a generic approach for ratios > 8. The implementation assumes that data beyond the end of each line, before the next line begins, can be dirtied; which AFAICT is safe with the current usage of these routines. Speedup is ~10.42x/~5.23x (32-bit/64-bit) for horizontal ratios <= 2, ~9.49x/~4.64x for ratios within (2, 4], ~6.43x/~3.18x for ratios within (4, 8], and ~5.42x/~2.50x for ratios > 8 when not memory-bound on Haswell as compared with the current SSE2 implementation.	2016-05-23 20:23:47 +02:00
Sindre Aamås	b1013095b1	[Processing/x86] Add an SSE4.1 implementation of GeneralBilinearAccurateDownsample Keep track of relative pixel offsets and utilize pshufb to efficiently extract relevant pixels for horizontal scaling ratios <= 4. Fall back to a generic approach for ratios > 4. The use of blendps makes this require SSE4.1. The pshufb path can be backported to SSSE3 and the generic path to SSE2 for a minor reduction in performance by replacing blendps and preceding instructions with an equivalent sequence. The implementation assumes that data beyond the end of each line, before the next line begins, can be dirtied; which AFAICT is safe with the current usage of these routines. Speedup is ~5.32x/~4.25x (32-bit/64-bit) for horizontal ratios <= 2, ~5.06x/~3.97x for ratios within (2, 4], and ~3.93x/~3.13x for ratios > 4 when not memory-bound on Haswell as compared with the current SSE2 implementation.	2016-05-23 20:23:39 +02:00
Sindre Aamås	1995e03d91	[Processing/x86] Add an SSSE3 implementation of GeneralBilinearFastDownsample Keep track of relative pixel offsets and utilize pshufb to efficiently extract relevant pixels for horizontal scaling ratios <= 4. Fall back to a generic approach for ratios > 4. Note that the generic approach can be backported to SSE2. The implementation assumes that data beyond the end of each line, before the next line begins, can be dirtied; which AFAICT is safe with the current usage of these routines. Speedup is ~6.67x/~3.26x (32-bit/64-bit) for horizontal ratios <= 2, ~6.24x/~3.00x for ratios within (2, 4], and ~4.89x/~2.17x for ratios > 4 when not memory-bound on Haswell as compared with the current SSE2 implementation.	2016-05-23 20:23:31 +02:00
Sindre Aamås	cbaf087583	[Processing] Reduce duplication in downsampling wrappers	2016-05-23 13:19:17 +02:00
ruil2	c96c8b05a8	Merge pull request #2468 from sijchen/refactor_pre [Encoder] Refactor: create diff func for diff case to make logic clean	2016-05-23 13:21:40 +08:00
Karina	9b2dd55324	add GetBsPostion for cabac and cavlc	2016-05-20 14:29:48 +08:00
sijchen	27e803f6f4	refactor to make logic clean	2016-05-19 09:42:39 -07:00
Karina	ac37666cf1	modify the interface that use a independent subseqID for each layer	2016-05-19 17:17:17 +08:00
Karina	8a341070f2	fix overflow issue	2016-05-19 12:00:49 +08:00
sijchen	1ac02f3002	fix conflict with master	2016-05-18 10:57:39 -07:00
Karina	c298d66d48	fix temporal layer skip issue	2016-05-18 09:47:49 +08:00
Haibo Zhu	85f4beb9a8	Fix the wrong variable name which causes the build error	2016-05-17 13:46:04 +08:00
HaiboZhu	46220cfb3b	Merge pull request #2461 from HaiboZhu/Bugfix_remove_undefined_behavior_warning Remove the undefined behavior warning in parse_cabac	2016-05-17 10:51:18 +08:00
Haibo Zhu	86c1f0d2c6	Remove the undefined behavior warning in parse_cabac	2016-05-17 09:40:03 +08:00
ruil2	0ec686f7ec	Merge pull request #2452 from sijchen/refactor_sps2 Refactoring: Wrap all the operations related to eSpsPpsIdStrategy to class	2016-05-17 09:19:14 +08:00
sijchen	1eb735299a	Merge pull request #2458 from ruil2/downsampling2 add one new downsampling algorithms	2016-05-16 10:59:35 -07:00
sijchen	00747540fb	move strategy related pointer to class	2016-05-16 10:55:13 -07:00
Karina	3b55d64902	fix crash when temporal layer is skipped, the frame should not be encoded	2016-05-16 14:43:13 +08:00
Karina	96b2a87030	add one new downsampling algorithms	2016-05-16 09:28:19 +08:00
sijchen	ffb85046b4	Refactoring: Wrap all the operations related to eSpsPpsIdStrategy to class, to improve code readability	2016-05-04 15:06:02 -07:00
HaiboZhu	c30cc41261	Merge pull request #2448 from saamas/encoder-getnonzerocount-sse42 [Encoder] Add an SSE4.2 implementation of WelsGetNonZeroCount	2016-05-04 09:49:47 +08:00
ruil2	e9dc97803d	Merge pull request #2447 from saamas/encoder-cavlcparamcal-sse42 [Encoder] Add an SSE4.2 implementation of CavlcParamCal	2016-04-28 09:08:44 +08:00
ruil2	7d65687284	Merge pull request #2441 from saamas/encoder-add-avx2-4x4-quantization-routines [Encoder] Add AVX2 4x4 quantization routines	2016-04-28 09:08:31 +08:00
ruil2	56618249d7	Merge pull request #2436 from saamas/processing-add-avx2-vaa-routines [Processing] Add AVX2 VAA routines	2016-04-28 09:08:03 +08:00
Sindre Aamås	fb0b2b3f41	[Encoder/x86] Drop unneeded LOAD_4_PARA in CavlcParamCal_sse42	2016-04-24 22:59:35 +02:00
Sindre Aamås	d1c7713191	[Encoder/x86] Minor CavlcParamCal_sse42 tweak Do more elaborate register allocation to avoid a few mov instructions.	2016-04-24 22:36:23 +02:00
Sindre Aamås	f56bdc3aa4	[Encoder/x86] Minor CavlcParamCal_sse42 tweak Avoid loading single-use parameter.	2016-04-21 16:29:02 +02:00
Sindre Aamås	2eb8800712	[Encoder/x86] Remove a leftover mov instruction in CavlcParamCal_sse42	2016-04-21 15:53:33 +02:00
Sindre Aamås	4645bd26aa	[Encoder] Add an SSE4.2 implementation of WelsGetNonZeroCount Avoid touching some cache lines by using popcnt instead of table lookups. Also gives a speedup of ~1.4x on Haswell as compared with SSE2.	2016-04-20 19:10:24 +02:00
Sindre Aamås	3f31aff4dc	[Encoder] Add an SSE4.2 implementation of CavlcParamCal Use a combination of table lookups and pshufb to convert coefficients to zero run/level format. Two 16-entry lookup tables are used for a total of 192 bytes worth of tables. (The existing SSE2 version uses a table of size 2048 bytes.) Speedup is ~1.5x-3x as compared with the SSE2 version on Haswell (the speedup is greater for input with many trailing zeros). The use of popcnt makes it require SSE4.2. This can be replaced with a small LUT and accumulation which would reduce the requirement to SSSE3.	2016-04-20 18:37:08 +02:00
Sindre Aamås	502b16925e	[UT] Add tests for CavlcParamCal_c and CavlcParamCal_sse2	2016-04-20 18:37:08 +02:00
HaiboZhu	98c6c6de11	Merge pull request #2446 from HaiboZhu/Reduce_log_size_for_parse_only_mode Add the log reduce logic into parse only mode	2016-04-20 10:48:57 +08:00

2662 Commits