generic-library/vpx

Author	SHA1	Message	Date
James Zern	c8b9658ecc	Merge "vp9_reconintra_neon: add d45 8x8"	2015-06-22 22:27:57 +00:00
Scott LaVarnway	86f4a3d8af	Remove tile param and added to MACROBLOCKD. Change-Id: I0e60aaa9f84bcc9f2376d71bd934f251baee38db	2015-06-22 06:09:38 -07:00
Parag Salasakar	bc94999148	mips msa vp9 fdct 4x4 optimization average improvement ~2x-3x Change-Id: Idf8be780b8b4228fc91f110a94e4ee1fd9af0163	2015-06-22 14:30:24 +05:30
Parag Salasakar	b6131a733d	Merge "mips msa vp9 fdct 8x8 optimization"	2015-06-20 02:58:10 +00:00
James Zern	12c6688e31	vp9_reconintra_neon: add d45 8x8 based on ssse3 implementation ~91% faster over 20M pixels Change-Id: I6d743a53352c2d6de0efe7899d7996e8b0f7fa29	2015-06-19 19:19:22 -07:00
Parag Salasakar	7ca84888c2	mips msa vp9 fdct 8x8 optimization average improvement ~4x-5x Change-Id: I37582efc2622bc20b2bf99617a76110ab24e9f6a	2015-06-20 07:48:35 +05:30
James Zern	714a46a63c	Merge "vp9_filter: make all filter tables static"	2015-06-19 03:32:24 +00:00
James Zern	a2c69af50e	Merge "vp9_reconintra_neon: add d45 4x4"	2015-06-19 03:27:23 +00:00
James Zern	5d1d72df16	Merge changes from topic 'vp9-intra-pred' * changes: vp9_reconintra_neon: add d135 4x4 vp9_reconintra: correct d135 4x4 signature	2015-06-19 03:24:58 +00:00
James Zern	ce88d74d34	vp9_reconintra_neon: add d45 4x4 based on webp's LD4() ~59% faster over 20M pixels Change-Id: I371eaed9ce8f470451046997e130b0ba1a2f7a9c	2015-06-18 15:25:07 -07:00
James Zern	337b221e00	vp9_reconintra_neon: add d135 4x4 based on webp's RD4() ~50% faster over 20M pixels Change-Id: Ifcb7bf7f7fc8eabf79d9e3b219ce1be67abc524a	2015-06-18 15:25:06 -07:00
James Zern	e8e3583fc7	vp9_reconintra: correct d135 4x4 signature add missing '_c' suffix Change-Id: I928d6cf8f90db0b8ca0b1f3bbf10b3d792062cec	2015-06-18 15:25:06 -07:00
James Zern	41d8545ab6	Merge "vp9_reconintra_neon: add DC 4x4 predictors"	2015-06-18 22:24:55 +00:00
James Zern	6e44bf20f7	vp9_reconintra_neon: add DC 4x4 predictors ~85-89% faster over 20M pixels Change-Id: I3812e8adfffe5255034da88dfe6546e12f4d10ee	2015-06-18 15:22:43 -07:00
James Zern	e77f859d72	Merge "vp9_reconintra_neon: add DC 32x32 predictors"	2015-06-18 22:17:51 +00:00
Parag Salasakar	d9fedf7832	mips msa vp9 fdct 32x32 optimization average improvement ~4x-6x Change-Id: Ibcac3ef8ed5e207cf8c121e696570e6b63d3c0f4	2015-06-17 07:58:34 +05:30
Parag Salasakar	fa53008fb7	Merge "mips msa vp9 fdct 16x16 optimization"	2015-06-17 01:21:59 +00:00
Scott LaVarnway	5fe0e55ca4	Merge "Eliminated frame_type check in get_partition_probs()"	2015-06-16 13:40:23 +00:00
Scott LaVarnway	b2658ec321	Eliminated frame_type check in get_partition_probs() Moved the frame_type check to the tile level and stored the prob ptr in MACROBLOCKD. Change-Id: I10b5a4abd58213dc7610e3ade1a1583c01526842	2015-06-16 05:37:54 -07:00
Scott LaVarnway	a41fe749a8	Merge "Update use_prev_frame_mvs flag in decoder."	2015-06-16 12:28:46 +00:00
Parag Salasakar	89b4b315aa	mips msa vp9 fdct 16x16 optimization average improvement ~4x-6x Change-Id: Id3b2243e5b3c7844c90c4231a5e75fa69911362c	2015-06-16 12:49:34 +05:30
James Zern	79fb3a013e	vp9_reconintra_neon: add DC 32x32 predictors ~84-85% faster over 20M pixels Change-Id: Ia67a7f4a342bf7b0a9280e05c25d81a774d90469	2015-06-15 20:57:28 -07:00
James Zern	3edd293dae	vp9_pred_common: inline vp9_get_tx_size_context + drop 'vp9_' prefix Change-Id: If3f3ec32d03026af78b8fcd82749e587a3f43059	2015-06-15 18:41:22 -07:00
James Zern	e6add6499f	vp9_pred_common: inline vp9_get_segment_id + drop 'vp9_' prefix Change-Id: Id5a3c8d416dbdf93d9f4f1bde662f7b2c2290168	2015-06-15 18:41:14 -07:00
James Zern	17c9678a3c	Merge "vp9_entropy: delete vp9_coefmodel_tree[]"	2015-06-15 23:02:42 +00:00
James Zern	e8d3491ec2	Merge "vp9_entropymode: make vp9_init_mode_probs private"	2015-06-15 23:02:36 +00:00
James Zern	98f0178611	enable vp9_d153_predictor_32x32_ssse3 unused since its initial commit ~91% faster over 20M pixels Change-Id: Ic8b5b3246bc97c8406be8bc4496601370403b70a	2015-06-12 19:48:22 -07:00
James Zern	ef75416ab7	vp9_entropy: delete vp9_coefmodel_tree[] it's been unused since: `4ac6a25` Moving vp9_tree_probs_from_distribution() to encoder. Change-Id: Ieae65864277fc3dbe993c5c08d75c6c5fcaa3a2d	2015-06-12 18:43:37 -07:00
James Zern	53b7f33f2d	vp9_entropymode: make vp9_init_mode_probs private rename to init_mode_probs Change-Id: Id451d7763b784ed37e43f2c35073a778078d3d0f	2015-06-12 18:25:23 -07:00
Parag Salasakar	ecbbef6b67	Merge "mips msa vp9 filter by weight optimization"	2015-06-12 18:30:11 +00:00
Parag Salasakar	fbac961b47	mips msa vp9 filter by weight optimization filter by weight - average improvement ~2x-3x Change-Id: I4832033335d339cdafdce697f07ce3e643920057	2015-06-12 12:06:42 +05:30
James Zern	e2b52f6f01	vp9_filter: make all filter tables static these are returned via vp9_get_interp_kernel() Change-Id: I45ed75e5b1515c4f5be9212759dcb50a456b5548	2015-06-11 15:15:52 -07:00
James Zern	33b3953c54	vp9_filter: restore vp9_bilinear_filters alignment the declaration containing the alignment in vp9_filter.h was removed in: `eb88b17` Make vp9 subpixel match vp8 fixes a crash in 32-bit builds Change-Id: I9a97e6b4e8e94698e43ff79d0d8bb85043b73c61	2015-06-11 15:15:25 -07:00
Scott LaVarnway	cca866f578	inline vp9_get_segdata() and change name. Change-Id: I706645cf9d9dc04f1b3b6ac80df80edb7f101854	2015-06-11 09:52:00 -07:00
Scott LaVarnway	a49c701529	Merge "inline vp9_segfeature_active()"	2015-06-11 12:29:45 +00:00
Scott LaVarnway	42c0b1b1f1	inline vp9_segfeature_active() and changed name. Change-Id: Ie023ca66cc2c823032f58d4faeb53fd1863c94f3	2015-06-11 04:20:55 -07:00
Parag Salasakar	c7489f4815	Merge "mips msa vp9 intra-pred optimization"	2015-06-11 03:31:49 +00:00
James Zern	44afbbb72d	Merge "vp9_reconintra/d45_predictor: remove temp storage"	2015-06-10 19:23:57 +00:00
Scott LaVarnway	97880c3324	Merge "Reducing size of MODE_INFO struct"	2015-06-10 13:15:19 +00:00
Scott LaVarnway	c9976b32b4	Update use_prev_frame_mvs flag in decoder. Added check to see if last frame was all intra. This will eliminate two checks in find_mv_refs_idx(). Also, do not update the frame mvs if the current frame is all intra. This improved performance on material with frequent intra-only frames. Change-Id: I44a4042c3670ab0d38439d565062a0e2a1ba9d1e	2015-06-08 03:38:13 -07:00
Parag Salasakar	a2288d274c	mips msa vp9 intra-pred optimization intra pred - average improvement ~2x-3x Change-Id: Ie3f7d6eded5ecb7ed7ee506ba8e4d98f93803b09	2015-06-06 22:29:32 +05:30
James Zern	9c6eea35b6	Merge "vp9_reconintra: simplify d63_predictor"	2015-06-05 21:49:13 +00:00
Frank Galligan	bfb6d48812	Add control to skip loop filter in VP9 decoder. This control allows the application to skip the loop filter in the decoder. This is an advanced control that should only be used in extreme circumstances as it may introduce and accumulate decode artifacts. Change-Id: I278c65c60826f84c9141ebe06c6eeed3c2335fa8	2015-06-05 10:07:09 -07:00
Parag Salasakar	d43fd99822	mips msa vp9 loopfilter 4, 8 optimization average improvement ~3x-4x Change-Id: I59279293ce4b2a1e99bd10579ac97740e943643f	2015-06-05 09:56:08 +05:30
James Zern	60d0b3364c	vp9_reconintra/d45_predictor: remove temp storage dst row 0 can be reused in the same way Change-Id: Id977da62545dcc4a89cebbcbad90ba84f8ff5d6b	2015-06-04 20:11:53 -07:00
James Zern	7012ba6395	vp9_reconintra: simplify d63_predictor calculate the averages needed for even and odd rows once; this removes a conditional from the inner loop the final average calculated currently relies on above[] being extended, it could be reduced to use above[block_size - 2] + 3 * above[block_size - 1] Change-Id: I70f5eac8d8a2a959c7114844a95826f445c3dd4d	2015-06-04 19:21:05 -07:00
Parag Salasakar	dc07cc6fed	Merge "mips msa vp9 loopfilter 16 optimization"	2015-06-05 02:15:26 +00:00
James Zern	c2cf347fe2	Merge "vp9_reconintra: use AVG[23] consistently"	2015-06-05 02:15:22 +00:00
James Zern	2b6d62140e	Merge "vp9_reconintra_neon_asm/tm4x4: simplify left load"	2015-06-05 01:46:39 +00:00
James Zern	6c3b691c49	Merge "vp9_reconintra: fix d45/d63 discrepancies"	2015-06-04 22:56:43 +00:00
James Zern	faea038f4f	vp9_reconintra: fix d45/d63 discrepancies the final index in rows 2, 3 differ from vp8 Change-Id: I0fcea907b4ab44e266c0f1fd77b290d2236b280a	2015-06-04 14:49:56 -07:00
Scott LaVarnway	baaaa57533	Reducing size of MODE_INFO struct Reduced size from 124 bytes to 104 bytes. For decode only builds, it is reduced to 68 bytes. Change-Id: If9e6b92285459425fa086ab5a743d0a598a69de3	2015-06-04 07:32:16 -07:00
Scott LaVarnway	8bb37dd069	Remove cm parameter from vp9_decode_block_tokens() part 2 Change-Id: Iee24b6bb095f748333223e6036fc5c9d9e7e5f1c	2015-06-04 07:13:19 -07:00
Scott LaVarnway	877fac122b	Merge "Remove counts param"	2015-06-04 13:46:42 +00:00
Parag Salasakar	914f8f9ee0	mips msa vp9 loopfilter 16 optimization average improvement ~3x-4x Change-Id: I8ef263da6ebcf8f20aabaefeccf25a84640ba048	2015-06-04 11:50:41 +05:30
Johann Koenig	c005792951	Merge "Make vp9 subpixel match vp8"	2015-06-04 06:16:13 +00:00
Parag Salasakar	fd891a9655	Merge "mips msa vp9 convolve8 avg hv optimization"	2015-06-04 05:44:24 +00:00
Johann	eb88b172fe	Make vp9 subpixel match vp8 The only difference between the two was that the vp9 function allowed for every step in the bilinear filter (16 steps) while vp8 only allowed for half of those. Since all the call sites in vp9 (<< 1) the input, it only ever used the same steps as vp8. This will allow moving the subpel variance to vpx_dsp with the rest of the variance functions. Change-Id: I6fa2509350a2dc610c46b3e15bde98a15a084b75	2015-06-03 22:10:51 -07:00
hkuang	ce5e17072d	Merge "Optimize the idct assembly code."	2015-06-04 04:32:11 +00:00
James Zern	4fcabf5169	vp9_reconintra: use AVG[23] consistently Change-Id: Iab7215f82be0c0c831cd81b6f8091afc3710dd54	2015-06-03 19:52:46 -07:00
Parag Salasakar	bdfbc3e876	mips msa vp9 convolve8 avg hv optimization average improvement ~4x-6x Change-Id: I7c8b4f2334491be8a859592606e568bc95d019aa	2015-06-04 08:11:01 +05:30
James Zern	2da8d24e8f	Merge "vp9_reconintra: simplify d45_predictor"	2015-06-04 01:59:10 +00:00
James Zern	a9f55e8324	Merge changes from topic 'vp9-intra-pred' * changes: vp9_reconintra: specialize d135 4x4 vp9_reconintra: specialize d117 4x4 vp9_reconintra: specialize d207 4x4 vp9_reconintra: specialize d153 4x4 vp9_reconintra: specialize d63 4x4 vp9_reconintra: specialize d45 4x4	2015-06-04 01:58:28 +00:00
James Zern	65d9599807	vp9_reconintra_neon_asm/tm4x4: simplify left load use vld1.8 {d0[]}, [r0] rather than ldrb+vdup; mildly faster Change-Id: Ia5ffc736bcb0f5497b7d9e55a93bf5a5f5f6928c	2015-06-03 18:51:13 -07:00
hkuang	98e88e6ad8	Optimize the idct assembly code. Change-Id: Ia0ff859ff1c813dbe100e2f27b1ef78167483f4e	2015-06-03 17:20:35 -07:00
Parag Salasakar	b8c1cdcd12	mips msa vp9 convolve8 avg horiz optimization average improvement ~5x-8x Change-Id: I179a69ec620fbd69979bd128f05d18113618aab4	2015-06-03 11:33:42 +05:30
Parag Salasakar	c543d38ac7	mips msa vp9 convolve8 avg vert optimization average improvement ~4x-6x Change-Id: Ia2e6f770da46416ebec31fdcea5cc7878879a9d9	2015-06-03 09:55:25 +05:30
Scott LaVarnway	f779dba405	Remove counts param Moved to MACROBLOCKD. Change-Id: Icce765b334f2755f4fe2a4c39fb2ae2d7660d004	2015-06-02 09:06:00 -07:00
Parag Salasakar	54a6f73958	mips msa vp9 idct4x4 and iwht4x4 optimization average improvement ~3x-4x moved assert to respective files Change-Id: I6c915059d456a00bdd76fab0dd2eede8b6c6ea58	2015-06-02 12:16:28 +05:30
Parag Salasakar	ebf7466cd8	mips msa vp9 updated convolve horiz, vert, hv, copy, avg module Updated sources according to improved version of common MSA macros. Enabled respective convolve MSA hooks and tests. Overall, this is just upgrading the code with styling changes. Change-Id: If5ad6ef8ea7ca47feed6d2fc9f34f0f0e8b6694d	2015-06-02 12:03:51 +05:30
Parag Salasakar	cf1c0ebc3a	Merge "mips msa vp9 updated idct 8x8, 16x16 and 32x32 module"	2015-06-02 04:48:02 +00:00
James Zern	71d923232c	Merge changes from topic 'vp9-intra-pred' * changes: vp9_reconintra_neon/tm: improve above_left load vp9_reconintra_neon: cosmetics: normalize fn params	2015-06-01 20:03:47 +00:00
James Zern	b601202905	Merge "vp9_reconintra_neon_asm/tm: simplify above_left load"	2015-06-01 20:01:38 +00:00
Parag Salasakar	6af9d7f2e2	mips msa vp9 updated idct 8x8, 16x16 and 32x32 module Updated sources according to improved version of common MSA macros. Enabled idct MSA hooks and tests. Overall, this is just upgrading the code with styling changes. Change-Id: I1f488ab2c741f6c622b7a855388a202168082209	2015-06-01 09:24:23 +05:30
James Zern	acc481eaae	vp9_reconintra: simplify d45_predictor only the immediate above right pixel is needed; this removes a conditional from the inner loop the final average calculated currently relies on above[] being extended, it could be reduced to use above[block_size - 2] + 3 * above_right Change-Id: Ica4f2b8d25eec3ca1d6fa52ef0d4adc228eeea3f	2015-05-30 13:30:59 -07:00
James Zern	6e068e51b5	vp9_reconintra: specialize d135 4x4 based on webp's RD4() Change-Id: I64c8f0a1325a8f201eaad39b396fae7a2d06efff	2015-05-30 13:29:40 -07:00
James Zern	b6782686f4	vp9_reconintra: specialize d117 4x4 based on webp's VR4() Change-Id: Ic8c0b8ed65a63772ca0a4321592880a5e8947db5	2015-05-30 13:29:02 -07:00
James Zern	c022dbc4d3	vp9_reconintra: specialize d207 4x4 based on webp's HU4() Change-Id: I2401ef307cd94e70cc7904f55954af04290c8af9	2015-05-30 13:28:22 -07:00
James Zern	2276eb16f3	vp9_reconintra: specialize d153 4x4 based on webp's HD4() Change-Id: Icba1e21ec4b8f5026dc92e49741a68b059c8b9b1	2015-05-30 13:27:50 -07:00
James Zern	102123821d	vp9_reconintra: specialize d63 4x4 based on webp's VL4() Change-Id: Ibab962053843eae8752b4e74b6481a53bb034ae9	2015-05-30 13:27:03 -07:00
James Zern	6051bcc3dc	vp9_reconintra: specialize d45 4x4 based on webp's LD4() Change-Id: I74855d23ce73e1c6988fe08bf7c959b7a69b4abf	2015-05-30 13:26:21 -07:00
Parag Salasakar	71e88f903d	Merge "mips msa vp9 updated macros and disable all MSA functions"	2015-05-30 02:52:27 +00:00
James Zern	7621b48a1c	vp9_reconintra_neon/tm: improve above_left load use vld1?_dup_u8 over vdup?_n_u8, reduces general register use; mildly faster Change-Id: Ie0e4e550849a207b34b378541196b553c9f12011	2015-05-29 19:18:43 -07:00
James Zern	f2d621e383	vp9_reconintra_neon: cosmetics: normalize fn params s/y_stride/stride/ Change-Id: Ie98c3fe241dc240b653849eda356a8862bdd52f4	2015-05-29 19:01:39 -07:00
James Zern	b337c54cc4	vp9_reconintra_neon_asm/tm: simplify above_left load use vld1.8 {d0[]}, [r0] rather than ldrb+vdup; mildly faster Change-Id: I5c24d49a90c2855c94395184774b289da8e9d5a7	2015-05-29 18:56:16 -07:00
James Zern	a2a13cbe5f	vp9_reconintra_neon: add DC 16x16 predictors 85-89% faster over 20M pixels Change-Id: I9b320ed6b9e67f27df738b84c8b43b65a93c50c2	2015-05-29 15:41:44 -07:00
James Zern	e97b849219	vp9_reconintra_neon: add DC 8x8 predictors ~90% faster over 20M pixels Change-Id: Iab791510cc57c8332c2f9a5da0ed50702e5f5763	2015-05-29 15:39:08 -07:00
Parag Salasakar	f9f078ebb6	mips msa vp9 updated macros and disable all MSA functions Minor restructuring/styling changes to the sources, such as generic macro definitions and their use to reduce code lines, better code alignment, etc. Disabled all MSA hooks and tests Change-Id: Ic6f2dce0b501f46b80c06c46c0fe2043d557b190	2015-05-29 13:34:33 +05:30
Scott LaVarnway	bbea7c95d8	Merge "Re-worked header files"	2015-05-28 19:56:39 +00:00
Johann	3f2a06674a	Merge "Don't #define snprintf in VS 2015 or higher."	2015-05-28 19:38:57 +00:00
hkuang	5317185eb0	Merge "Add error handling when running out of free frame buffers."	2015-05-28 17:41:01 +00:00
Johann	cad0eca25c	Don't #define snprintf in VS 2015 or higher. In VS 2015 and higher snprintf is supplied and therefore snprintf doesn't need to be defined. This also avoids problems caused by _snprintf being different from snprintf. This fixes a build break with VS 2015 and improves security. Originally submitted via chromium by brucedawson@chromium.org https://codereview.chromium.org/1055603003 Additionally, break this MSVC-specific tweak out into a new file, which will become the home of all such MSVC-specific things. This requires adding a dependency on msvc.h to every example which uses args.c and tools_common.h Change-Id: I35b5f8e7ea00f6627403aabc9ea79b0412557a99	2015-05-27 18:28:25 -07:00
hkuang	131cab7c27	Add error handling when running out of free frame buffers. Change-Id: If28b59b9521204a6e3aecedcf75932d76a752567	2015-05-27 14:20:58 -07:00
Minghai Shang	cbdfdb947c	Merge "[decoder] Optimize context buffer re-allocation"	2015-05-27 20:24:30 +00:00
Johann	dee70d355f	Merge "Move variance functions to vpx_dsp"	2015-05-26 23:02:11 +00:00
Johann	c3bdffb0a5	Move variance functions to vpx_dsp subpel functions will be moved in another patch. Change-Id: Idb2e049bad0b9b32ac42cc7731cd6903de2826ce	2015-05-26 12:01:52 -07:00
Scott LaVarnway	89ca85dacd	Move inter_predictor to vp9_reconinter.h This function was originally static. Change-Id: I1922fa86711ace884d9f394210b6bb9ea2a0bfe3	2015-05-26 04:22:11 -07:00
James Zern	02fda6582c	Merge changes Ie15e301e,Ib070c79b * changes: vp9_reconintra_neon: cosmetics: reindent vp9_reconintra_neon: cosmetics: drop unneeded returns	2015-05-23 17:47:52 +00:00
James Zern	4e11f3ca6e	vp9_reconintra_neon: cosmetics: reindent Change-Id: Ie15e301e8f55cf928f42a03e53a8bb8b66d0e5d5	2015-05-22 21:04:30 -07:00
James Zern	ff683ab1da	vp9_reconintra_neon: cosmetics: drop unneeded returns Change-Id: Ib070c79bdbb9c1f4e25af693d7056ec9f964c789	2015-05-22 20:59:36 -07:00
James Zern	8c15ced172	vp9: move ssse3 convolve fns to intrinsics file + synchronize filter function signatures this makes any intrinsics filters available for inlining and has the side-effect of making those filters static, quieting missing-prototype warnings. Change-Id: I1908875caffa585bd4fc65aaf10d17a5e20cfb46	2015-05-22 20:14:16 -07:00
James Zern	2161e44025	vp9: move avx2 convolve fns to intrinsics file + synchronize filter function signatures this makes any intrinsics filters available for inlining and has the side-effect of making those filters static, quieting missing-prototype warnings. Change-Id: I1cd55c9d52547793ad65aa90c7620f0e426edaa2	2015-05-22 20:13:06 -07:00
James Zern	ef2b3cce50	add vp9/common/x86/convolve.h collect the vp9_convolve function definition macros there; this will allow some relocation of functions from vp9_asm_stubs.c Change-Id: Idadd117fa256dd48748379856973fd985b8204e8	2015-05-22 20:12:16 -07:00
James Zern	48d8291df4	vp9_subpixel_8t_intrin_ssse3: quiet vs9 warning reorder includes to avoid: warning C4985: 'ceil': attributes not present on previous declaration. this is the same workaround used in vp9/common/vp9_systemdependent.h Change-Id: Ia10dd63de24f96fa1507a6179220e9d6ec774db6	2015-05-22 12:05:02 -07:00
Scott LaVarnway	b962646fc5	Re-worked header files Various header/test files had to be re-worked in order to build "Remove cm parameter from vp9_decode_block_tokens()". This patch reverts the "Remove cm" part and only contains the re-worked header files. Change-Id: I520958a88d1991fee988a3c784d0eac40e117a32	2015-05-22 11:19:51 -07:00
James Zern	a492bcef87	vp9_mvref_common.c: fix compile warning string literal to int within an assert Change-Id: Ifd7acc717e01ee1bb3955ef830ec0d1645942459	2015-05-20 16:45:16 -07:00
Minghai Shang	48bfee8797	[decoder] Optimize context buffer re-allocation 1. Check existing buffer sizes when re-allocating context buffers. 2. No need to set mi buffers to 0 during setup_mi. Change-Id: I6b48b0e077a4d804312b605ad0dc34aec5795a6d	2015-05-20 11:05:22 -07:00
James Zern	97db651ce0	vp9: add some missing includes mostly: <file>.c should include <file>.h silences missing prototype warnings Change-Id: Ic05ec32c6f7b2224b78825904d96d73aacad6000	2015-05-15 10:43:47 -07:00
James Zern	330fba41e2	vp9 intrinsics: add vp9_rtcd include silences a missing declaration warning Change-Id: I59a34e1a1377cf3529b678d7ec0122bd43ab1bf1	2015-05-15 10:43:47 -07:00
James Zern	18b60af27c	vp9: correct some function signatures silences missing prototype warnings Change-Id: Idaf68d83d2cb03847f3ee002c4d00c2ac79da604	2015-05-15 10:43:47 -07:00
Frank Galligan	d610ead258	Merge "Move mc_buf to cut down size of MACROBLOCKD."	2015-05-15 15:20:39 +00:00
Frank Galligan	0a80164c94	Move mc_buf to cut down size of MACROBLOCKD. Change-Id: Icea64b9e5632b41aaa7cd7018c501d6add9b7a7f	2015-05-14 19:10:02 -07:00
Johann	cafae5b544	Merge "Relocate memory operations for common code"	2015-05-13 19:47:24 +00:00
Johann	1d7ccd5325	Relocate memory operations for common code With the sad functions, and hopefully the variance functions soon, moving to the vpx_dsp location, place the defines used in the reference C code in a common location. Change-Id: I4c8ce7778eb38a0a3ee674d2f1c488eda01cfeca	2015-05-13 11:41:15 -07:00
Parag Salasakar	686616a989	Merge "mips msa vp9 idct 8x8 optimization"	2015-05-13 04:36:34 +00:00
James Zern	a5e4ca8390	build_intra_predictors*: reduce above_data size currently this needs to be 2x (NEED_ABOVERIGHT) the size of the largest block (32) + 1 (for above_left). reduce the buffer size from 128 + 16 (alignment) to 64 + 16. Change-Id: Idaca1806c7e1214e9437de24e15edc2ebf18f95d	2015-05-08 20:17:20 -07:00
James Zern	6d22713722	Merge "build_intra_predictors*: reduce left_col size"	2015-05-09 00:53:55 +00:00
hkuang	d53fb0fda5	Fix clang ioc warning due to NULL mi pointer. The warning only happens in the VP9 encoder's first pass because src_mi is not set up yet. It will not fail the encoder, as left_mi and above_mi are not used in the first pass and they will be set up again in the second pass. Change-Id: I0713b4660d71e229e196654cb0970ba6b1574f28	2015-05-08 15:42:50 -07:00
hkuang	f5574fb44c	Merge "Add more sse2 code for intra prediction."	2015-05-08 17:26:30 +00:00
Parag Salasakar	7c5f00f868	mips msa vp9 idct 8x8 optimization average improvement ~4x-6x Change-Id: I5edf713721b9e24c7e0ce2e69d8fc3ecab625d91	2015-05-08 12:23:27 +05:30
Parag Salasakar	a8a9c2bb45	Merge "mips msa vp9 idct 32x32 optimization"	2015-05-08 04:27:44 +00:00
James Zern	7e55ff1593	build_intra_predictors*: reduce left_col size this should only need to be the size of the largest block, i.e., 32, not 64. Change-Id: Ib8cb2424771fdd2a64c55379597248b2722a5ceb	2015-05-07 16:16:42 -07:00
James Zern	fd3658b0e4	replace DECLARE_ALIGNED_ARRAY w/DECLARE_ALIGNED this macro was used inconsistently and only differs in behavior from DECLARE_ALIGNED when an alignment attribute is unavailable. this macro is used with calls to assembly, while generic c-code doesn't rely on it, so in a c-only build without an alignment attribute the code will function as expected. Change-Id: Ie9d06d4028c0de17c63b3a27e6c1b0491cc4ea79	2015-05-07 11:55:08 -07:00
Johann	76a08210b6	Merge "Move shared SAD code to vpx_dsp"	2015-05-07 18:33:06 +00:00
hkuang	086934136b	Merge "Remove an unnecessary check."	2015-05-07 15:51:11 +00:00
Parag Salasakar	1601c1385a	mips msa vp9 idct 32x32 optimization average improvement ~4x-6x Change-Id: Idaba7e49fbd7f388caee0d73773ccf6e4807ef17	2015-05-07 12:42:23 +05:30
hkuang	7153b822ed	Add more sse2 code for intra prediction. vp9_dc_left_predictor_16x16 vp9_dc_top_predictor_32x32 vp9_dc_left_predictor_32x32 vp9_dc_128_predictor_32x32 Change-Id: Ib9861deefd01c3527235b92ff6b3d571ef6b4bc6	2015-05-06 17:17:00 -07:00
Johann	d5d9289800	Move shared SAD code to vpx_dsp Create a new component, vpx_dsp, for code that can be shared between codecs. Move the SAD code into the component. This reduces the size of vpxenc/dec by 36k on x86_64 builds. Change-Id: I73f837ddaecac6b350bf757af0cfe19c4ab9327a	2015-05-06 16:58:20 -07:00
hkuang	240767b29d	Remove an unnecessary check. Change-Id: Id0f224ac4667dd173363b0f05711678448291d4e	2015-05-06 14:15:00 -07:00
hkuang	623e6eed5e	Merge "Optimize the read_partition."	2015-05-06 17:29:52 +00:00
Parag Salasakar	d1cdda88bd	Merge "mips msa vp9 idct 16x16 optimization"	2015-05-06 06:40:56 +00:00
hkuang	4c1a8be29d	Optimize the read_partition. Change-Id: I5a796425ce5706824a2fc17c6f24f983c5b9e43b	2015-05-05 15:51:04 -07:00
James Zern	ccae5d99d2	fix and enable vp9_dc_128_predictor_16x16 widen the loads and stores to 128-bit. this was added, but not enabled in: `493a857` Add some sse2 code for intra prediction. Change-Id: I277d7db608a7db7d75cc0bde86f48fa66ad487e4	2015-05-05 11:40:13 -07:00
hkuang	e47811ef8f	Merge "Add some sse2 code for intra prediction."	2015-05-05 17:11:07 +00:00
Parag Salasakar	60052b618f	mips msa vp9 idct 16x16 optimization average improvement ~4x-6x Change-Id: I55e95b7f2ba403dff11813958dc7c73a900dd022	2015-05-05 12:37:06 +05:30
James Zern	670b2c09ce	vp9_idct_intrin_sse2: cosmetics: reindent + fix some whitespace Change-Id: Id61b739282014288a7e5d3c17a9d6448d9d4cda2	2015-05-01 16:07:54 -07:00
James Zern	c77b1f5acd	vp9: RECON_AND_STORE4X4: remove dest offset offsetting by a variable stride prevents instruction reordering, resulting in poor assembly Change-Id: Id62d6b3299cdd23f8c44f97b630abf4fea241446	2015-04-30 19:14:17 -07:00
James Zern	778845da05	vp9_idct_intrin_*: RECON_AND_STORE: remove dest offset offsetting by a variable stride prevents instruction reordering, resulting in poor assembly. additionally reroll 16x16/32x32 loops to reduce register spill with this new format Change-Id: I0635b8ba21ecdb88116e927dbdab53acdf256e11	2015-04-30 19:14:17 -07:00
Yaowu Xu	2061359fcf	Merge "Remove vp9_idct16x16_10_add_ssse3()"	2015-04-30 23:13:33 +00:00
hkuang	493a8579f1	Add some sse2 code for intra prediction. Change-Id: I16c0a62e52dab62837c547345df31e7518620ed4	2015-04-30 15:42:57 -07:00
Yaowu Xu	47767609fe	Remove vp9_idct16x16_10_add_ssse3() The rotation computation using 2X of cos(pi/16) has the potential to overflow 32 bits; this commit disables the function to allow further investigation and optimization. Change-Id: I4a9803bc71303d459cb1ec5bbd7c4aaf8968e5cf	2015-04-30 09:07:30 -07:00
Parag Salasakar	95cb130f32	Merge "mips msa vp9 copy and avg convolve optimization"	2015-04-30 04:39:13 +00:00
Yaowu Xu	d45870be8d	Merge "Disable ssse3 version idct16x16_256_add()"	2015-04-30 03:09:23 +00:00
Yaowu Xu	486a73a9ce	Disable ssse3 version idct16x16_256_add() The version currently produces different results from the C version for some inputs. Disable its use for now to allow time to investigate the source of the mismatch. Change-Id: Id039455494ee531db4886a9f1fa4761174ef6df3	2015-04-29 16:58:59 -07:00
Parag Salasakar	2301d10f73	mips msa vp9 copy and avg convolve optimization average improvement ~3x-5x Change-Id: I422e4c33ea7e6d6783ba40029438ccf21b0e76bb	2015-04-29 12:28:17 +05:30
James Zern	f58011ada5	vpx_mem: remove vpx_memset vestigial. replace instances with memset() which they already were being defined to. Change-Id: Ie030cfaaa3e890dd92cf1a995fcb1927ba175201	2015-04-28 20:00:59 -07:00
James Zern	f274c2199b	vpx_mem: remove vpx_memcpy vestigial. replace instances with memcpy() which they already were being defined to. Change-Id: Icfd1b0bc5d95b70efab91b9ae777ace1e81d2d7c	2015-04-28 19:59:41 -07:00
Frank Galligan	2be50a1c9c	Merge "WIP: Use LUT for y_dequant/uv_dequant"	2015-04-28 16:12:10 +00:00
Scott LaVarnway	afcb62b414	WIP: Use LUT for y_dequant/uv_dequant instead of calculating every block. Change-Id: Ib19ff2546be8441f8755ae971ba2910f29412029	2015-04-28 07:52:06 -07:00
Yunqing Wang	297b2b99de	Fix debugmodes file to print modes and MVs correctly This patch fixed the issues in debugmodes file because of the recent changes in MODE_INFO struct. Change-Id: I4df83379ecc887c1f009d4a8329c9809c5b299d6	2015-04-27 17:09:38 -07:00
Parag Salasakar	1c9af9833d	Merge "mips msa vp9 convolve8 horiz optimization"	2015-04-21 22:08:25 -07:00
Johann	931c0a954f	Merge "Rename neon convolve avg file"	2015-04-21 15:45:29 -07:00
Johann	66b9933b8d	Rename neon convolve avg file Some build systems use just the basename for object files. Change-Id: I333e1107ee866f3906cc46476ef8d04c6200a8a0	2015-04-21 14:18:17 -07:00
Scott LaVarnway	8b17f7f4eb	Revert "Remove mi_grid_* structures." (see I3a05cf1610679fed26e0b2eadd315a9ae91afdd6) For the test clip used, the decoder performance improved by ~2%. This is also an intermediate step towards adding back the mode_info streams. Change-Id: Idddc4a3f46e4180fbebddc156c4bbf177d5c2e0d	2015-04-21 11:16:45 -07:00
Parag Salasakar	ca90d4fd96	mips msa vp9 convolve8 horiz optimization average improvement ~6x-8x Change-Id: I7c91eec41aada3b0a5231dda7869b3b968f3ad18	2015-04-21 12:31:26 +05:30
Parag Salasakar	ef51c1ab5b	mips msa vp9 convolve8 hv optimization average improvement ~5x-8x Change-Id: I3214734cb3716e742907ce0d2d7a042d953df82b	2015-04-21 09:17:49 +05:30
Parag Salasakar	2e36149ccd	Merge "mips msa vp9 convolve8 vert optimization"	2015-04-18 23:39:25 -07:00
Parag Salasakar	27d083c1b9	mips msa vp9 convolve8 vert optimization average improvement ~6x-10x Change-Id: Ie3f3ab3a9005be84935919701e56b404e420affa	2015-04-18 08:13:04 +05:30
Marco Paniconi	f76ccce5bc	Revert "Revert "Force_split on 16x16 blocks in variance partition."" This reverts commit `004b9d83e3` Change-Id: I2f2d0bdb9368c2c07f1d29a69cd461267a3a8743	2015-04-16 17:52:13 -07:00
Johann	14ef4aeafb	Reorganize *_rtcd() calling conventions Change-Id: Ib1e17d8aae9b713b87f560ab5e49952ee2bfdcc2	2015-04-15 11:12:05 -04:00
Yunqing Wang	004b9d83e3	Revert "Force_split on 16x16 blocks in variance partition." This reverts commit `eb8c667570`. The patch caused mismatch while using multi-threads. Change-Id: Icd646340af25b5d91e32f03ed3ea212e00e3e0be	2015-04-14 15:19:31 -07:00
Marco	eb8c667570	Force_split on 16x16 blocks in variance partition. Force split on 16x16 block (to 8x8) based on the minmax over the 8x8 sub-blocks. Also increase variance threshold for 32x32, and add exit condition in choose_partition (with very safe threshold) based on sad used to select reference frame. Some visual improvement near moving boundaries. Average gain in psnr/ssim: ~0.6%, some clips go up ~1 or 2%. Encoding time increase (due to more 8x8 blocks) from ~1-4%, depending on clip. Change-Id: I4759bb181251ac41517cd45e326ce2997dadb577	2015-04-13 12:05:07 -07:00
Parag Salasakar	2f693be8f8	Merge "mips msa vp9 common headers added"	2015-04-09 21:50:15 -07:00
Jingning Han	93d9c50419	Merge "SSSE3 assembly implementation of 8x8 Hadamard transform"	2015-04-09 11:16:11 -07:00
Parag Salasakar	481fb7640c	mips msa vp9 common headers added Change-Id: Ia31ada59172eb1818e1eb91009f83cbb1f581223	2015-04-09 15:35:12 +05:30
Jingning Han	7f629dfca4	SSSE3 assembly implementation of 8x8 Hadamard transform It uses about 10% less CPU cycles than the SSE2 intrinsic implementation. Change-Id: I91017c0c068679a214b98cdd4cff3a6facfb7499	2015-04-04 09:59:37 -07:00
James Zern	44e3640923	Merge "vp9: enable sse4 sad functions"	2015-04-03 14:57:52 -07:00
James Zern	b644384bb5	Merge "vp9: fix high-bitdepth NEON build"	2015-04-01 23:36:17 -07:00
Yaowu Xu	54210f706c	Merge "use MAX_MB_PLANE consistently"	2015-04-01 18:24:39 -07:00
Yaowu Xu	f26b8c84f8	use MAX_MB_PLANE consistently Change-Id: Ic416a7f145001a88f5a7f70dde9b1edbc1b69381	2015-04-01 15:21:20 -07:00
Jingning Han	1470529f62	Refactor block_yrd function for RTC coding mode This commit separates Hadamard transform/quantization operations from rate and distortion computation in block_yrd. This allows one to skip SATD computation when all transform blocks are quantized to zero. It also uses a new block error function that skips repeated computation of sum of squared residuals. It reduces the CPU cycles spent on block error calculation in block_yrd by 40%. Change-Id: I726acb2454b44af1c3bd95385abecac209959b10	2015-04-01 12:00:43 -07:00
James Zern	14e24a1297	vp9: enable sse4 sad functions sse4 isn't set by configure or used in rtcd, correct the sad entries to use sse4_1 without changing the signatures for now. this was done in vp8 post-vp9 branch. Change-Id: Ia9f1fff9f2476fdfa53ed022778dd2f708caa271	2015-03-31 21:00:55 -07:00
James Zern	8845334097	vp9: fix high-bitdepth NEON build remove incorrect specializations in rtcd and update a configuration check in partial_idct_test.cc Change-Id: I20f551f38ce502092b476fb16d3ca0969dba56f0	2015-03-31 17:45:25 -07:00
hui su	d4f2f1dd5b	Merge "Move vp9_coef_con_tree to common/"	2015-03-31 10:51:10 -07:00
Jingning Han	db5ec37edc	Merge "Enable 16x16 Hadamard transform in SATD based mode decision"	2015-03-31 09:55:41 -07:00
hui su	302e24cb3e	Move vp9_coef_con_tree to common/ This tree should be defined in common/, as it is needed for both encoder and decoder. Change-Id: I4f5cbc80025cf2ced14182c98f7c82dc7d0f87db	2015-03-31 09:20:46 -07:00
Jingning Han	26d3d3af6a	Enable 16x16 Hadamard transform in SATD based mode decision This commit replaces the 16x16 2D-DCT transform with Hadamard transform for RTC coding mode. It reduces the CPU cycles cost on 16x16 transform by 5X. Overall it makes the speed -6 encoding speed 1.5% faster without compromise on compression performance. Change-Id: If6c993831dc4c678d841edc804ff395ed37f2a1b	2015-03-30 15:43:31 -07:00
Jingning Han	f0ac5aaa08	Merge "Hadamard transform based coding mode decision process"	2015-03-30 15:43:15 -07:00
Jingning Han	8c411f74e0	Hadamard transform based coding mode decision process This commit uses Hadamard transform based rate-distortion cost estimate for rtc coding mode decision. It improves the compression performance of speed -6 for many hard clips at lower bit-rates. For example, 5.5% for jimredvga, 6.7% for mmmoving, 6.1% for niklas720p. This will introduce extra encoding cycle costs at this point. Change-Id: Iaf70634fa2417a705ee29f2456175b981db3d375	2015-03-30 14:46:05 -07:00
jackychen	68610ae568	vp9_postproc.c: eliminate -Wshadow build warnings. Change-Id: I6df525a9ad1ae3cfbba8710d21db8fee76e64dbb	2015-03-27 20:27:30 -07:00
Alex Converse	a1e20ec58f	Refactor fast loop filter code to handle 444. Change-Id: I921b1ebabdf617049f8fa26fbe462c3ff115c1ce	2015-03-24 11:17:50 -07:00
hkuang	9f4f98fdbd	Merge "Optimize the intra frame decode to skip some unnecessary copy."	2015-03-23 16:50:37 -07:00
hkuang	85107641a4	Optimize the intra frame decode to skip some unnecessary copy. This speeds up a normal YT style 1080P clip decode by ~1% on nexus 7. Change-Id: Ied7fa0d8bc941b2adb4db9382f549ee4d5654f3a	2015-03-23 10:11:49 -07:00
hkuang	b88dac8938	Safely free all the frame buffers after all the workers finish the work. Issue: 978 Change-Id: Ia7aa809095008f6819a44d7ecb0329def79b1117	2015-03-19 12:21:00 -07:00
Yaowu Xu	73508be364	Fix a typo introduced in #94401aff This fixes all test vector failures Change-Id: Ie1a9fe0f023f7a0c7e89eb55df1b40ff65302adc	2015-03-12 08:01:08 -07:00
hkuang	4a691aa209	Merge "Refactor the block decode code to make it simpler."	2015-03-11 16:19:14 -07:00
hkuang	94401aff5c	Refactor the block decode code to make it simpler. Change-Id: I0f983cb821ad7ec6fbefe7895cb8124a8fa39df6	2015-03-11 11:37:16 -07:00
Yunqing Wang	f0cf9719d0	Accumulate tx_totals counters in multi-threaded encoder Tx_totals counters weren't handled correctly in multi-thread case, which caused the mismatch while encoding using threads > 1. This patch fixed that. Change-Id: Ice9b0386f57175fb92a0bdcd5042686a3106246a	2015-03-10 10:02:49 -07:00
Hangyu Kuang	a1ef75bb63	Merge "Only wait for previous frame's motion vector if needed."	2015-03-06 10:27:26 -08:00
Hangyu Kuang	d5fa786b4f	Only wait for previous frame's motion vector if needed. Change-Id: Iecce685a33b64844446c0009f21bc85566d7469f	2015-03-05 16:09:44 -08:00
Johann	42eb97eb91	Declare function used by 'once' with 'void' parameters Visual Studio is exceptionally picky about this: vp9_reconintra.c(900): warning C4113: 'void (__cdecl *)()' differs in parameter lists from 'void (__cdecl *)(void)' [.build-x86_64-win64-vs10\vpx.vcxproj] Change-Id: I564c7415f4608fd962be8c699d6133a996b545f7	2015-03-04 15:34:55 -08:00
Adrian Grange	3807dd82ab	Make encoder buffer allocation dynamic Frame buffers are now allocated dynamically on-demand. Entries in the reference frame map, cm->ref_frame_map, may now be set to -1 (INVALID_IDX) to indicate that there is not a valid reference buffer in that "slot". All slots in the reference frame map are now initialized to the empty state (-1) and each buffer is initialized to have a reference count of 0. Change-Id: Id1afe98de98db4ae8b2dfefed7889c3b28c68582	2015-03-04 07:58:32 -08:00
Yunqing Wang	55639c383b	fix a race condition caused by intra function pointer initialization This patch fixed webm issue 962. (https://code.google.com/p/webm/issues/detail?id=962) The data races occurred when an encoder and a decoder were created at the same time, and the function pointers were initialized twice. Change-Id: I8851b753c4b4ad4767d6eea781b61f0ac9abb44b	2015-03-03 09:58:37 -08:00
Jingning Han	1790d45252	Use variance metric for integral projection vector match This commit replaces the SAD with variance as metric for the integral projection vector match. It improves the search accuracy in the presence of slight light change. The average speed -6 compression performance for rtc set is improved by 1.7%. No speed changes are observed for the test clips. Change-Id: I71c1d27e42de2aa429fb3564e6549bba1c7d6d4d	2015-03-01 10:42:56 -08:00
Jingning Han	c4cb8059ff	Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 4"	2015-02-27 09:49:10 -08:00
Jingning Han	43bb97f7d0	Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 3"	2015-02-27 09:49:00 -08:00
Jingning Han	4800b0e80d	Merge "Fix high bit-depth loop-filter sse2 compiling issue - part 2"	2015-02-27 09:48:51 -08:00
Jingning Han	8ec22296b3	Fix high bit-depth loop-filter sse2 compiling issue - part 3 Change-Id: Idb14b9a285f8098126f967c5e2750221d6a58f69	2015-02-26 15:21:22 -08:00
Jingning Han	14ff1cb74a	Fix high bit-depth loop-filter sse2 compiling issue - part 2 Change-Id: I6728b69bb3dff1daa64ff7142f691e80a089f1c4	2015-02-26 12:41:19 -08:00
Jingning Han	2080e4b206	Fix high bit-depth loop-filter sse2 compiling issue - part 1 The intrinsic statement _mm_subs_epi16() should take immediate. Feeding variable as its input argument will cause compile failure in older version gcc. Change-Id: I6a71efcc8d3b16b84715e0a9bcfa818494eea3f4	2015-02-25 09:59:50 -08:00
James Zern	044bfa3949	Merge "vp9_loopfilter: quiet integer constant size warnings"	2015-02-24 19:09:32 -08:00
Jingning Han	5b87f1bb5a	Fix high bit-depth loop-filter sse2 compiling issue - part 4 Change-Id: I39f56f60425836f2e1ec07da71edd4810a4c78bb	2015-02-24 14:50:30 -08:00
James Zern	279d350f0b	vp9_loopfilter: quiet integer constant size warnings mark uint64_t constants with 'ULL' Change-Id: I7648e161b4004fba35e1fa7ab79e34cc19e39716	2015-02-24 11:13:16 -08:00
Hangyu Kuang	8724d31d12	Move dequant table from VP9_COMMON to VP9_COMP as decoder does not need it any more. This reduces VP9_COMMON size from 25776 bytes to 17584 bytes(~31%). Change-Id: Ic5daea732ccefb6d512b048af7983f0efe08589b	2015-02-20 11:12:42 -08:00
Jingning Han	216b171d63	Merge "Integral projection based motion estimation"	2015-02-19 15:08:11 -08:00
Jingning Han	ed2dc59c1b	Integral projection based motion estimation This commit introduces a new block match motion estimation using integral projection measurement. The 2-D block and the nearby region is projected onto the horizontal and vertical 1-D vectors, respectively. It then runs vector match, instead of block match, over the two separate 1-D vectors to locate the motion compensated reference block. This process is run per 64x64 block to align the reference before choosing partitioning in speed 6. The overall CPU cycle cost due to this additional 64x64 block match (SSE2 version) takes around 2% at low bit-rate rtc speed 6. When strong motion activities exist in the video sequence, it substantially improves the partition selection accuracy, thereby achieving better compression performance and lower CPU cycles. The experiments were tested in RTC speed -6 setting: cloud 1080p 500 kbps 17006 b/f, 37.086 dB, 5386 ms -> 16669 b/f, 37.970 dB, 5085 ms (>0.9dB gain and 6% faster) pedestrian_area 1080p 500 kbps 53537 b/f, 36.771 dB, 18706 ms -> 51897 b/f, 36.792 dB, 18585 ms (4% bit-rate savings) blue_sky 1080p 500 kbps 70214 b/f, 33.600 dB, 13979 ms -> 53885 b/f, 33.645 dB, 10878 ms (30% bit-rate savings, 25% faster) jimred 400 kbps 13380 b/f, 36.014 dB, 5723 ms -> 13377 b/f, 36.087 dB, 5831 ms (2% bit-rate savings, 2% slower) Change-Id: Iffdb6ea5b16b77016bfa3dd3904d284168ae649c	2015-02-19 13:47:19 -08:00
James Zern	0dd591bedd	loop_filter_rows_mt: remove dependency on 'last_height' using this to control reallocation would miss a change if the function were not called for every frame. fixes potential memory corruption by the subsequent memset Change-Id: I4c6bb6ab68803104fc824c7e27cc2f9b2cf53e33	2015-02-13 19:11:23 -08:00
Yunqing Wang	238707ab4c	Merge "Make vp9_print_modes_and_motion_vectors() work"	2015-02-11 16:58:52 -08:00
James Zern	d8ed558c99	Merge "vp9_thread: prefer pthread.h if available"	2015-02-11 16:50:07 -08:00
James Zern	923cc0bf51	vp9_highbd_tm_predictor_16x16: fix win64 by saving xmm8; cglobal's xmm reg arg is 0-based Change-Id: Ic8426ec9ac59ab4478716aa812452a6406794dcb	2015-02-10 19:34:12 -08:00
Yunqing Wang	f37788eaf6	Make vp9_print_modes_and_motion_vectors() work MODE_INFO struct was modified, and vp9_print_modes_and_motion_vectors() didn't work anymore. This patch modified vp9_debugmodes.c so that this function works again for debug usage. Change-Id: I293fae0295235deb2529a460a274caf7c045ac1a	2015-02-10 16:37:02 -08:00
James Zern	d167a1aeee	vp9_thread: prefer pthread.h if available this avoids conflicts with recent versions of mingw-w64 (tested g++ 4.8.2) and the unit tests Change-Id: Ic41ea31eebe0e3e712ed5e657f37d8cad6712088	2015-02-10 12:47:14 -08:00
Yunqing Wang	84b813aa42	Merge "Make encoder and decoder share common thread function"	2015-02-10 09:06:41 -08:00
Yunqing Wang	d3a37731c2	Merge "Rename loopfilter_thread files to thread_common files"	2015-02-10 09:06:23 -08:00
hkuang	dd88f48296	Set the maximum decode threads to be 8. This will fix the frame parallel decode hang on windows due to not enough semaphores. This will also make the frame parallel decode safer as the number of frame buffers could only support maximum 8 threads. Change-Id: Id9ef50692819dcbebbd74a0aabffbfb3f39a4309	2015-02-09 10:38:41 -08:00
Yunqing Wang	4ae092c660	Make encoder and decoder share common thread function Moved vp9_accumulate_frame_counts to vp9_thread_common.c to eliminate the duplicate code. Change-Id: I9cf506d729603c8bf1494b4c86a3b7d47af1917a	2015-02-06 11:45:51 -08:00
Yunqing Wang	41063137c3	Rename loopfilter_thread files to thread_common files Renames the files to allow more common thread code to be moved to vp9/common. Change-Id: I7386e64e221086e3cdc087e79812f993c423413b	2015-02-06 10:03:31 -08:00
Jingning Han	0c6d3a03e1	Account for chroma component costs in RTC mode decision This commit allows the encoder to account for additional chroma plane costs in the mode decision process, if the current block potentially contains significant color change. It improves the visual quality at very low bit-rates. The compression performance of dark720p is improved by 12.39% in speed 6. For jimred at 150 kbps, the PSNR of V component (red) increased by 0.2 dB, at the expense of about 5% increase in encoding time. Note that for sequences where the chroma components are fairly consistent, the encoding time increase is negligible. On average the rtc set compression performance is improved by 1.172% in PSNR and 1.920% in SSIM. Change-Id: Ia55b24ef23a25304f7ec9958fbf07fd6e658505c	2015-02-04 09:45:14 -08:00
hkuang	70554a21f1	Merge "Remove duplicate code."	2015-02-03 13:37:48 -08:00
Jim Bankoski	9f1cf2c8cf	make low bitrates a lot less blocky Remove loop filter skip at speed 7+ because of bad visual artifacts and up the postprocessing. Change-Id: Ibdd0bac71aaee232d2bb2e14462733c51517768d	2015-02-03 06:45:56 -08:00
Yaowu Xu	80e729f601	Merge "Optimize coef update"	2015-02-01 20:08:29 -08:00
hkuang	be6aeadaf4	Try again to merge branch 'frame-parallel' into master branch. In frame parallel decode, libvpx decoder decodes several frames on all cpus in parallel fashion. If not being flushed, it will only return frame when all the cpus are busy. If getting flushed, it will return all the frames in the decoder. Compared with current serial decode mode in which libvpx decoder is idle between decode calls, libvpx decoder is busy between decode calls. Current frame parallel decode will only speed up the decoding for frame parallel encoded videos. For non frame parallel encoded videos, frame parallel decode is slower than serial decode due to lack of loopfilter worker thread. There are still some known issues that need to be addressed. For example: decode frame parallel videos with segmentation enabled is not right sometimes. * frame-parallel: Add error handling for frame parallel decode and unit test for that. Fix a bug in frame parallel decode and add a unit test for that. Add two test vectors to test frame parallel decode. Add key frame seeking to webmdec and webm_video_source. Implement frame parallel decode for VP9. Increase the thread test range to cover 5, 6, 7, 8 threads. Fix a bug in adding frame parallel unit test. Add VP9 frame-parallel unit test. Manually pick "Make the api behavior conform to api spec." from master branch. Move vp9_dec_build_inter_predictors_* to decoder folder. Add segmentation map array for current and last frame segmentation. Include the right header for VP9 worker thread. Move vp9_thread.* to common. ctrl_get_reference does not need user_priv. Seperate the frame buffers from VP9 encoder/decoder structure. Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:""" Conflicts: test/codec_factory.h test/decode_test_driver.cc test/decode_test_driver.h test/invalid_file_test.cc test/test-data.sha1 test/test.mk test/test_vectors.cc vp8/vp8_dx_iface.c vp9/common/vp9_alloccommon.c vp9/common/vp9_entropymode.c vp9/common/vp9_loopfilter_thread.c vp9/common/vp9_loopfilter_thread.h vp9/common/vp9_mvref_common.c vp9/common/vp9_onyxc_int.h vp9/common/vp9_reconinter.c vp9/decoder/vp9_decodeframe.c vp9/decoder/vp9_decodeframe.h vp9/decoder/vp9_decodemv.c vp9/decoder/vp9_decoder.c vp9/decoder/vp9_decoder.h vp9/encoder/vp9_encoder.c vp9/encoder/vp9_pickmode.c vp9/encoder/vp9_rdopt.c vp9/vp9_cx_iface.c vp9/vp9_dx_iface.c This reverts commit `a18da9760a`. Change-Id: I361442ffec1586d036ea2e0ee97ce4f077585f02	2015-01-30 21:00:13 -08:00
James Zern	f6c2a6c5d6	vp9: rename 'near' parameters + nearest for consistency near is a reserved word in windows builds so using it as a parameter name may cause build failures with some configurations Change-Id: Iddf1d4ecdb39843f14e95dbfd9dca55f07f81403	2015-01-30 15:52:24 -08:00
Yaowu Xu	45971abd1d	Optimize coef update 1. move the check of search method of USE_TX_8X8 up one level to avoid operations of build_tree_distributions() 2. count tx used and avoid computation for coef update when one size is not used at all. Change-Id: Ia3e54a2588aa531c41377a1bfaa64385d04a592c	2015-01-30 10:16:40 -08:00
hkuang	e8c42fb0bd	Remove duplicate code. (issue #934). Change-Id: Ic8adaaff87aae0b33d9b508f160b48e0ccdaaf4c	2015-01-28 12:00:34 -08:00
Frank Galligan	e3167f7fbf	Add vp9_sad32x32x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~18% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: I70ccdea0326750552ed946fb004507d6efe02d5c	2015-01-27 08:54:00 -08:00
Frank Galligan	9f574d0316	Add vp9_sad16x16x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~15% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: I4b2006b644c488f42bf06d8a22ef0e6120a96bf9	2015-01-27 08:42:17 -08:00
Frank Galligan	54fa956715	Add vp9_sad64x64x4d_neon Neon intrinsic function. On Nexus 7 speed -6 saw ~30% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. BUG=https://code.google.com/p/webm/issues/detail?id=908 Change-Id: Id12af7d1883243c23e6692e898aea82299633d58	2015-01-27 08:33:40 -08:00
Frank Galligan	9f6eba419a	Add Neon intrinsic vp9_fdct8x8_quant_neon On Nexus 7 speed -5 got ~2%, -6 got ~15%, -7 and -8 got ~30% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I83246d63b96674d170098a572fa4fe28a05aaf51	2015-01-24 22:49:50 -08:00
Yaowu Xu	643c75d90b	Merge "Replace divide with look-up"	2015-01-23 21:12:18 -08:00
JackyChen	65f60f8e8c	Merge "SSE2 code for the filter in MFQE."	2015-01-23 11:08:16 -08:00
Yaowu Xu	eda179764f	Replace divide with look-up This commit replaces an integer divide with a table-lookup. It is to improve decoding speed, and at the same time, to reduce possible complications with a bug in AMD Family 12h processors: "665 Integer Divide Instruction May Cause Unpredictable Behavior" Change-Id: I678b707a538798a923850bac467e66e847e6def7	2015-01-23 09:02:07 -08:00
Johann	a18da9760a	Revert "Merge branch 'frame-parallel' to enable frame parallel decode in master branch." This reverts commit `bde04ce503` Change-Id: I053dae04c761b04a36dc239558503905a14d2470	2015-01-23 08:42:02 -08:00
hkuang	bde04ce503	Merge branch 'frame-parallel' to enable frame parallel decode in master branch. In frame parallel decode, libvpx decoder decodes several frames on all cpus in parallel fashion. If not being flushed, it will only return frame when all the cpus are busy. If getting flushed, it will return all the frames in the decoder. Compared with current serial decode mode in which libvpx decoder is idle between decode calls, libvpx decoder is busy between decode calls. VP9 frame parallel decode is >30% faster than serial decode with tile parallel threading which will make devices play 1080P VP9 videos more easily. * frame-parallel: Add error handling for frame parallel decode and unit test for that. Fix a bug in frame parallel decode and add a unit test for that. Add two test vectors to test frame parallel decode. Add key frame seeking to webmdec and webm_video_source. Implement frame parallel decode for VP9. Increase the thread test range to cover 5, 6, 7, 8 threads. Fix a bug in adding frame parallel unit test. Add VP9 frame-parallel unit test. Manually pick "Make the api behavior conform to api spec." from master branch. Move vp9_dec_build_inter_predictors_* to decoder folder. Add segmentation map array for current and last frame segmentation. Include the right header for VP9 worker thread. Move vp9_thread.* to common. ctrl_get_reference does not need user_priv. Seperate the frame buffers from VP9 encoder/decoder structure. Revert "Revert "Revert "Revert 3 patches from Hangyu to get Chrome to build:""" Conflicts: test/codec_factory.h test/decode_test_driver.cc test/decode_test_driver.h test/invalid_file_test.cc test/test-data.sha1 test/test.mk test/test_vectors.cc vp8/vp8_dx_iface.c vp9/common/vp9_alloccommon.c vp9/common/vp9_entropymode.c vp9/common/vp9_loopfilter_thread.c vp9/common/vp9_loopfilter_thread.h vp9/common/vp9_mvref_common.c vp9/common/vp9_onyxc_int.h vp9/common/vp9_reconinter.c vp9/decoder/vp9_decodeframe.c vp9/decoder/vp9_decodeframe.h vp9/decoder/vp9_decodemv.c vp9/decoder/vp9_decoder.c vp9/decoder/vp9_decoder.h vp9/encoder/vp9_encoder.c vp9/encoder/vp9_pickmode.c vp9/encoder/vp9_rdopt.c vp9/vp9_cx_iface.c vp9/vp9_dx_iface.c Change-Id: Ib92eb35851c172d0624970e312ed515054e5ca64	2015-01-22 18:18:53 -08:00
Frank Galligan	469ff48d7b	Merge "Add Neon intrinsics for vp9_avg_8x8_neon"	2015-01-20 14:38:39 -08:00
Yunqing Wang	6d7b7abf52	Add non420 code in multi-threaded loopfilter Added non420 part back to make it consistent with single thread code in vp9_loopfilter.c. Change-Id: I8ca255d73bffebae294d2627d6655eafe535cb90	2015-01-20 09:31:47 -08:00
JackyChen	09673deba9	SSE2 code for the filter in MFQE. The SSE2 code is from VP8 MFQE, reuse it in VP9. No change on VP8 side. In our testing, we achieve 2X speed by adopting this change. Change-Id: Ib2b14144ae57c892005c1c4b84e3379d02e56716	2015-01-18 16:07:59 -08:00
Yunqing Wang	e76eaf05b1	vp9_ethread: add parallel loopfilter 1. Added row-based loopfilter in encoder; 2. Moved common multi-threaded loopfilter functions from decoder to common; 3. Merged multi-threaded loopfilter code, and made encoder/ decoder call same function to reduce code duplication. Encoder tests showed that 1% - 2% speedup was seen for good-quality 2-pass mode(at speed 3); 1% - 3% speedup using 2 threads and 4% - 6% speedup using 4 threads were seen for real-time mode(at speed 7). Change-Id: I8a4ac51c2ad9bab9fa7b864e90743931c53ec1c4	2015-01-16 17:19:27 -08:00
Frank Galligan	6e7e1cf32f	Add Neon intrinsics for vp9_avg_8x8_neon On Nexus 7 speed -5, -6, -7, and -8 saw about a 1% increase in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 1.5% increase in perf for 720p. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: Ibf17ebfd952a6aec941719bd8306df8ec4574bee	2015-01-15 15:32:40 -08:00
Yaowu Xu	829a01dbb7	Merge "Add encoder control for setting color space"	2015-01-14 14:14:34 -08:00
Yaowu Xu	e94b415c34	Add encoder control for setting color space This commit adds encoder side control for vp9 to set color space info in the output compressed bitstream. It also amends the "vp9_encoder_params_get_to_decoder" test to verify the correct color space information is passed from the encoder end to decoder end. Change-Id: Ibf5fba2edcb2a8dc37557f6fae5c7816efa52650	2015-01-14 10:17:14 -08:00
Frank Galligan	ec1d8387e1	Add 64x64 sub_pel_variance Neon function On Nexus 7 speed -5, -6, -7, and -8 saw about a 15% increase in perf for 480p. Speeds -5, -6, -7, and -8 saw about a 10% increase in perf for 720p. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I2fa5315845e3021c9a6e2ea47e52e68b398d8334	2015-01-14 08:36:24 -08:00
Frank Galligan	74d40cd507	Add 64x variance Neon functions Add optimized Neon functions of: vp9_variance32x64 vp9_variance64x32 vp9_variance64x64 On Nexus 7 speed -5 and -6 saw about a 4% increase in perf. Speeds -7 and -8 saw about a 6% increase in perf. Tested on Nexus 7, built with ndk r10d, gcc 4.9. Change-Id: I5a81f13c9897eb927fa39662530f5524a0f768fa	2015-01-13 15:08:13 -08:00
James Zern	4d6838627d	Merge "vp9: add per-tile longjmp error handling"	2015-01-08 15:53:37 -08:00
Johann	00bbe342c2	Merge "Disable vp9 _8_ loopfilters"	2015-01-08 12:47:52 -08:00
Yaowu Xu	01eec75858	Merge "Refactor calculation of tile_cols"	2015-01-07 16:24:57 -08:00
JackyChen	1883c940b9	Merge "Use qdiff to adjust the threshold of sad and variance in MFQE."	2015-01-07 14:57:46 -08:00
Yaowu Xu	e9cf9b7dfe	Refactor calculation of tile_cols Change-Id: I2c38ea2bcf6d221a0b6b2fb9be4cebbee21006a3	2015-01-07 14:28:59 -08:00
JackyChen	60cf5cf7b2	Use qdiff to adjust the threshold of sad and variance in MFQE. When qdiff is larger, the sad/variance threshold should also be higher which indicates a more aggressive action on MFQE. Change-Id: I44c5c93572805458d4f87fdc7619cc9d8a522185	2015-01-07 09:07:10 -08:00
Johann	377b6682f9	Disable vp9 _8_ loopfilters Investigating https://code.google.com/p/chromium/issues/detail?id=443839 Change-Id: Ibb7485d835c5aa5e1d40f31715596ba8d208eedb	2015-01-06 19:26:11 -08:00

3103 Commits