Instead of loading the registers one lane at a time, load full
registers and then transpose them.
This is faster, reducing the runtime for the function from about
506 cycles to 434 cycles (tested on a Cortex-A8).
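
As a rough illustration of the change (a minimal sketch; the
register numbers, addressing and block shape here are illustrative,
not the actual function code):

    @ Before: interleaved lane loads, one lane per source row
    vld4.u8  {d20[0],d21[0],d22[0],d23[0]}, [r3], r1
    vld4.u8  {d20[1],d21[1],d22[1],d23[1]}, [r3], r1
    @ ... and so on for lanes 2-7

    @ After: one full d register per row, then an in-register
    @ 8x8 byte transpose built from vtrn steps
    vld1.8   {d16}, [r3], r1
    vld1.8   {d17}, [r3], r1
    @ ... d18-d23 for the remaining six rows
    vtrn.32  d16, d20       @ swap 4x4 blocks
    vtrn.32  d17, d21
    vtrn.32  d18, d22
    vtrn.32  d19, d23
    vtrn.16  d16, d18       @ swap 2x2 blocks
    vtrn.16  d17, d19
    vtrn.16  d20, d22
    vtrn.16  d21, d23
    vtrn.8   d16, d17       @ swap single elements
    vtrn.8   d18, d19
    vtrn.8   d20, d21
    vtrn.8   d22, d23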
This also avoids an issue that seems to be a CPU bug, present on
the Sony Xperia T (CPU implementer 0x51, architecture 7, variant 0x1,
part 0x04d). On that device, the "vswp q9, q10" occasionally seemed
to start executing before the preceding
vld4.u8 {d20[x],d21[x],d22[x],d23[x]}, [r3], r1
loads had finished and written back their results. Changing the
"vswp q9, q10" into "vswp q10, q9", or into the separate
"vswp d18, d20; vswp d19, d21" (or the other way around), seemed to
avoid the issue. The failure occurred only occasionally (a couple of
times per 100000 invocations or so).
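
For reference, the three forms mentioned above are architecturally
equivalent ways of exchanging q9 and q10 (q9 = d18:d19,
q10 = d20:d21); only the encodings and operand order differ:

    vswp     q9,  q10       @ original; occasionally misbehaved there
    vswp     q10, q9        @ operands reversed: seemed fine
    vswp     d18, d20       @ split into d-register swaps: seemed fine
    vswp     d19, d21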