Yunqing Wang 20bd1446c0 Preload reference area to an intermediate buffer in sub-pixel motion search
In sub-pixel motion search, the search range is small (+/- 3 pixels).
Preload the whole search area from the reference buffer into a
32-byte aligned buffer, then load reference data from this buffer
during the search. This keeps the data in cache and reduces the
cache-line crossing penalty. For the tulip clip, tests on an Intel
Core2 Quad machine (Linux) showed these encoder speed improvements:
  3.4%   at --rt --cpu-used=-4
  2.8%   at --rt --cpu-used=-3
  2.3%   at --rt --cpu-used=-2
  2.2%   at --rt --cpu-used=-1

A test on an Atom notebook showed only a 1.1% speed improvement
(speed=-4). A test on a Xeon machine also showed less improvement,
since unaligned data access latency is greatly reduced in newer cores.

Next, I will apply a similar idea to the other 2 sub-pixel search
functions for encoding speed > 4.

Make this change exclusively for x86 platforms.
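
The preload step described above could look roughly like the sketch
below. It is an illustration only, not the code in this change: the
names PreloadBuf and preload_search_area, the 16x16 block size, and the
32-byte row stride are assumptions, and the caller is assumed to
guarantee that the whole +/- 3 pixel window lies inside the reference
frame (or its border).

```c
#include <stdint.h>
#include <string.h>

#define BLOCK_SIZE   16  /* assumed block width/height in pixels */
#define SEARCH_RANGE 3   /* sub-pixel search stays within +/- 3 pixels */
#define BUF_STRIDE   32  /* one buffer row per 32 bytes */
#define WINDOW       (BLOCK_SIZE + 2 * SEARCH_RANGE)  /* 22 rows/cols */

/* Hypothetical intermediate buffer, aligned to 32 bytes (C11). */
typedef struct {
    _Alignas(32) uint8_t data[WINDOW * BUF_STRIDE];
} PreloadBuf;

/*
 * Copy the search window around the block at (row, col) of the
 * reference frame into buf. Returns a pointer to the block origin
 * inside buf; neighbours are then addressed with stride BUF_STRIDE.
 * Assumes the window is fully inside the readable reference area.
 */
const uint8_t *preload_search_area(PreloadBuf *buf, const uint8_t *ref,
                                   int ref_stride, int row, int col)
{
    const uint8_t *src = ref + (row - SEARCH_RANGE) * ref_stride
                             + (col - SEARCH_RANGE);
    int r;

    for (r = 0; r < WINDOW; r++) {
        /* Each 22-byte row starts on a 32-byte boundary in buf. */
        memcpy(buf->data + r * BUF_STRIDE, src + r * ref_stride, WINDOW);
    }
    return buf->data + SEARCH_RANGE * BUF_STRIDE + SEARCH_RANGE;
}
```

With this layout each 22-byte row sits inside one 32-byte chunk of the
buffer, so a row never straddles a cache line, whereas rows read
directly from the reference frame can start at any byte offset.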

Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
2011-07-22 09:28:06 -04:00