Changed motion search in vp8_find_best_half_pixel_step() to be the
same as in vp8_find_best_sub_pixel_step(), which checks 5 points
instead of 8 points. This only affects real-time mode with
cpu-used >=9. Tests showed it gives 2% encoding speedup with
a quality loss(psnr) of up to 0.5%.
Change-Id: I16049cad1535002346d46cfdfad345bfc3dc5146
This change implemented same idea in change "Preload reference area
to an intermediate buffer in sub-pixel motion search." The changes
were made to vp8_find_best_sub_pixel_step() and vp8_find_best_half
_pixel_step() functions which are called when speed >= 5. Test
result (using tulip clip):
1. On Core2 Quad machine(Linux)
rt mode, speed (-5 ~ -8), encoding speed gain: 2% ~ 3%
rt mode, speed (-9 ~ -11), encoding speed gain: 1% ~ 2%
rt mode, speed (-12 ~ -14), no noticeable encoding speed gain
2. On Xeon machine(Linux)
Test on speed (-5 ~ -14) didn't show noticeable speed change.
Change-Id: I21bec2d6e7fbe541fcc0f4c0366bbdf3e2076aa2
There were some situations that the start motion vectors were
out of range. This fix adjusted range checks to make sure they
are checked and clamped.
Change-Id: Ife83b7fed0882bba6d1fa559b6e63c054fd5065d
In sub-pixel motion search, the search range is small(+/- 3 pixels).
Preload whole search area from reference buffer into a 32-byte
aligned buffer. Then in search, load reference data from this buffer
instead. This keeps data in cache, and reduces the crossing cache-
line penalty. For tulip clip, tests on Intel Core2 Quad machine(linux)
showed encoder speed improvement:
3.4% at --rt --cpu-used =-4
2.8% at --rt --cpu-used =-3
2.3% at --rt --cpu-used =-2
2.2% at --rt --cpu-used =-1
Test on Atom notebook showed only 1.1% speed improvement(speed=-4).
Test on Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.
Next, I will apply similar idea to other 2 sub-pixel search functions
for encoding speed > 4.
Make this change exclusively for x86 platforms.
Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
Do mvp clamping in full-pixel precision instead of 1/8-pixel
precision to avoid error caused by right shifting operation.
Also, further fixed the motion vector limit calculation in change:
b748045470
Change-Id: Ied88a4f7ddfb0476eb9f7afc6ceeddbf209fffd7
Motion vector limits are calculated using right shifts, which
could give wrong results for negative numbers. James Berry's
test on one clip showed encoder produced some artifacts. This
change fixed that.
Change-Id: I035fc02280b10455b7f6eb388f7c2e33b796b018
The starting points are always within the limits, and bounds
checking on these points is not needed. For speed < 5, the
encoded result changes a little because different treatment
is taken while starting point equals the bounds.
Change-Id: I09a402d310f51e305a3519f1601b1d17b05c6152
In real-time mode motion search, there is no need to calculate
variance. This change improved encoding speed by 1% ~ 2%(speed=-5).
Change-Id: I65b874901eb599ac38fe8cf9cad898c14138d431
this commit makes the usage errorperbit and sadperbit consistent for
encoding modes and passes. Removed all different magic weight factors
associated with errorperbit. Now 1/2 is used for both sadperbit16 and
sadperbit4, the /2 operation is merged into initializations of the 2
variables.
Tests on cif set show .23%, 0.18% and 0.19% gain by avg psnr, overall
psnr and ssim respectively.
Change-Id: Ifa285c3e065ce0a5a77addfc9f95aabf54ee270d
sad_per_bit has been used for a number of motion vector search routines
with different magic weights: 1, 1/2 and 1/4. This commit remove these
magic numbers and use 1/2 for all motion search routines, also reformat
a number of source code lines to within 80 column limit.
Test on cif set shows overall effect is neutral on all metrics. <=0.01%
Change-Id: I8a382821fa4cffc9c0acf8e8431435a03df74885
error_per_bit and sad_per_bit were designed as estimates of a bit worth
of sum squared error and sum absolute difference respectively. Under
this assumption, error_per_bit should be used in combination with 2nd
order errors (variance or sum squared error) while sad_per_bit should
be used in combination with 1st order SADs in motion estimation. There
were a few places where sad_per_bit has been misused with variances,
this commit changes to use error_per_bit for those places, also changes
parameter names to properly indicate which constant is being used.
On cif set, the change has a universal gain by all metrics: 0.13% by
average/overall psnr and 0.1% by ssim.
Change-Id: I4850fdcc3fd6886b30f784bd843f13dd401215fb
The compiler produces better assembly when using int_mv
for assignments. The compiler shifts and ors the two 16bit
values when assigning MV.
Change-Id: I52ce4bc2bfbfaf3f1151204b2f21e1e0654f960f
Further modification and wrong implementation fix which caused
refining_search and refining_searchx4 result mismatching.
Change-Id: I80cb3a44bf5824413fd50c972e383eebb75f9b6f
In NEWMV mode, currently, full search is used as the refining search
after n-step search. By replacing it with an iterative diamond search
of radius 1 largely reduced the computation complexity, but still
maintained the same encoding quality since the refining search is
done for every macroblock instead of only a small precentage of
macroblocks while using full search.
Tests on the test set showed a 3.4% encoding speed increase with none
psnr & ssim loss.
Change-Id: Ife907d7eb9544d15c34f17dc6e4cfd97cb743d41
Changed 8-neighbor searching to 4-neighour searching, and continued
searching until the center point is the best match.
Test on test set showed 1.3% encoding speed improvement as well as
0.1% PSNR and SSIM improvement at speed=-5 (rt mode).
Will continue to improve it.
Change-Id: If4993b1907dd742b906fd3f86fee77cc5932ee9a
Passed SSE from sub-pixel search back to pick_inter_mode
function, which is compared with the encode_breakout to
see if we could skip evaluating the remaining modes.
Change-Id: I4a86442834f0d1b880a19e21ea52d17d505f941d
In vp8_pick_inter_mode(), for NEWMV mode, use the error result got
from motion search as distortion. This helps performance in real-
time mode.
Change-Id: I398c4e46cc5381f7d874e748cf78827ef0e0860c
MV sad cost error is only used in full-pixel motion search,
which only need full-pixel resolution instead of quarter-pixel
resolution. This change reduced mvsadcost table size, and
removed unneccessary pamameter passing since this table is
constant once it is generated.
Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.
These symbols were identified by:
$ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
| sort | grep '^ *1 '
Change-Id: I59609f58ab65312012c047036ae1e0634f795779
Applied better MV prediction in real-time mode, which improves
the encoding quality.
Used quarter-pixel search instead of iterative sub-pixel search
for speed >=5 to improve encoding performance.
Tests on the test set showed:
1. For speed=-5, quality improvement: 1.7% on AvgPSNR and 2.1%
on SSIM, performance improvement: 3.6% (This counts in the
performance lose caused by MV prediction calculation in "Improve
MV prediction in vp8_pick_inter_mode() for speed>3").
2. For speed=-8, quality improvement: 2.1% on AvgPSNR and 2.5%
on SSIM. but, 6.9% performance decrease because of MV prediction
calculation. This should be improved later.
Change-Id: I349a96c452bd691081d8c8e3e54419e7f477bebd
Remove allocation/deallocation of stats storage.
Remove full search functions in machine specific encoder inits.
Remove last pass validation in validate_config.
Change-Id: I7f29be69273981a4fef6e80ecdb6217c68cbad4e
This fix added MV range checks for NEWMV mode as suggested by Jim.
To reduce unnecessary MV range checks, I tried Yaowu's suggestion.
Update UMV borders in NEWMV mode to also cover MV range check.
Also, in this way, every MV that is valid gets checked in diamond
search function.
Change-Id: I95a89ce0daf6f178c454448f13d4249f19b30f3a
The MV's range is 256. Since the new motion search uses a different
starting MV than the center ref MV, a MV range checking needs to
be done to avoid corruption.
Change-Id: I8ae0721d1bd203639e13891e2e54a2e87276f306
Add vp8_mv_pred() to better predict starting MV for NEWMV
mode in vp8_rd_pick_inter_mode(). Set different search
ranges according to MV prediction accuracy, which improves
encoder performance without hurting the quality. Also,
as Yaowu suggested, using diamond search result as full
search starting point and therefore adjusting(reducing)
full search range helps the performance.
Change-Id: Ie4a3c8df87e697c1f4f6e2ddb693766bba1b77b6
Use mpsadbw, and calculate 8 sad at once. Function list:
vp8_sad16x16x8_sse4
vp8_sad16x8x8_sse4
vp8_sad8x16x8_sse4
vp8_sad8x8x8_sse4
vp8_sad4x4x8_sse4
(test clip: tulip)
For best quality mode, this gave encoder a 5% performance boost.
For good quality mode with speed=1, this gave encoder a 3%
performance boost.
Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
NEON has optimized 16x16 half-pixel variance functions, but they
were not part of the RTCD framework. Add these functions to RTCD,
so that other platforms can make use of this optimization in the
future and special-case ARM code can be removed.
A number of functions were taking two variance functions as
parameters. These functions were changed to take a single
parameter, a pointer to a struct containing all the variance
functions for that block size. This provides additional flexibility
for calling additional variance functions (the half-pixel special
case, for example) and by initializing the table for all block sizes,
we don't have to construct this function pointer table for each
macroblock.
Change-Id: I78289ff36b2715f9a7aa04d5f6fbe3d23acdc29c
The ARM version of vp8_hex_search() is a faster implementation
of the same algorithm. Since it doesn't use any ARM specific
code, it can be made the default implementation. This removes
a linking error.
Change-Id: I77d10f2c16b2515bff4522c350004e03b7659934
These functions were true duplicates of functions present in the
generic code. This fixes some of the link errors when building
with --enable-shared --enable-pic.
Change-Id: Idff26599d510d954e439207883607ad6b74df20c
In order to know if all 4/8 neighbor points are within the bounds,
4 bounds checking are enough instead of checking 4 bounds for
each points (16/32 checkings). This improvement reduces cost of
vp8_diamond_search_sadx4() by 30%, and gives encoder a 1.5%
performance gain (test options: 1 pass, good, speed=4).
Change-Id: Ie8da29d18a6ecfc9829e74ac02f6fa70e042331a
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.
Fixes issue #97.
Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
bestsad needs to be a int and set to INT_MAX because at the end
of the function it is compared to INT_MAX to determine if there
was a match in the function.
Change-Id: Ie80e88e4c4bb4a1ff9446079b794d14d5a219788
bestsad should be an int initialized to INT_MAX. The optimized
SAD function expects a signed value for bestsad to use for comparison
and early loop termination. When no match is made, which is
determined by a comparison of bestsad to INT_MAX, INT_MAX is returned.
When the license headers were updated, they accidentally contained
trailing whitespace, so unfortunately we have to touch all the files
again.
Change-Id: I236c05fade06589e417179c0444cb39b09e4200d