for SPLITMV and B_PRED modes. Modified code to use the bmi
found in mode_info_context instead of BLOCKD. On the decode
side, the uvmvs are calculated only when required, instead of
every macroblock. This is WIP. (bmi should eventually be
removed from BLOCKD)
Small performance gains noticed for RT encodes and decodes.(VGA)
Change-Id: I2ed7f0fd5ca733655df684aa82da575c77a973e7
Prepend . to local labels in assembly code. This
allows non unique labels within a file. Also
makes profiling information more informative
by keeping the function name with the loop name.
Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f
Calculations were incorrectly classified as either
SSE3 or SSSE3. Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.
Change-Id: I48ad0218af0cc51c5078070a08511dee43ecfe09
Calculations were incorrectly classified as either
SSE3 or SSSE3. Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.
Change-Id: I29f5c2ead342b2086a468029c15e2c1d948b5d97
When active map is specified and the current frame is not a key frame,
golden frame nor a altref frame then copy only those active regions.
This significantly reduces encoding time by as much as 19% on the test
system where realtime encoding is used. This is particularly useful
when the frame size is large (e.g. 2560x1600) and there's only a few
action macroblocks.
Change-Id: If394a813ec2df5a0201745d1348dbde4278f7ad4
Instead of a single mid GF boost apply a few extra bits to
every other frame. This gives a very small average metrics
improvement on both derf and YT sets.
Also use min GF interval as min KF interval.
Change-Id: Iee238b8cae0ffaed850a5a944ac825cee18da485
This reverts commit b5ea2fbc2c1554769848774c836aad262af95072. Further
testing showed noticable keyframe popping in some cases, reverting this
for now to give time for a proper fix.
Conflicts:
vp8/encoder/onyx_if.c
vp8/encoder/ratectrl.c
Change-Id: I159f53d1bf0e24c035754ab3ded8ccfd58fd04af
This patch fixes a bug in the interaction between the recode loop and
spatial resampling. If the codec was in a spatial resampling state,
and a subsequent iteration of the recode loop disables resampling,
then the source buffer must be reset to the unscaled source.
Change-Id: I4e4cd47b943f6cd26a47449dc7f4255b38e27c77
Changed motion search in vp8_find_best_half_pixel_step() to be the
same as in vp8_find_best_sub_pixel_step(), which checks 5 points
instead of 8 points. This only affects real-time mode with
cpu-used >=9. Tests showed it gives 2% encoding speedup with
a quality loss(psnr) of up to 0.5%.
Change-Id: I16049cad1535002346d46cfdfad345bfc3dc5146
This change implemented same idea in change "Preload reference area
to an intermediate buffer in sub-pixel motion search." The changes
were made to vp8_find_best_sub_pixel_step() and vp8_find_best_half
_pixel_step() functions which are called when speed >= 5. Test
result (using tulip clip):
1. On Core2 Quad machine(Linux)
rt mode, speed (-5 ~ -8), encoding speed gain: 2% ~ 3%
rt mode, speed (-9 ~ -11), encoding speed gain: 1% ~ 2%
rt mode, speed (-12 ~ -14), no noticeable encoding speed gain
2. On Xeon machine(Linux)
Test on speed (-5 ~ -14) didn't show noticeable speed change.
Change-Id: I21bec2d6e7fbe541fcc0f4c0366bbdf3e2076aa2
There were some situations that the start motion vectors were
out of range. This fix adjusted range checks to make sure they
are checked and clamped.
Change-Id: Ife83b7fed0882bba6d1fa559b6e63c054fd5065d
sharpness was not recalculated in vp8cx_pick_filter_level_fast
remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.
always use cm->sharpness_level. the extra indirection was annoying.
don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init
move function declarations to their proper header
Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db
In sub-pixel motion search, the search range is small(+/- 3 pixels).
Preload whole search area from reference buffer into a 32-byte
aligned buffer. Then in search, load reference data from this buffer
instead. This keeps data in cache, and reduces the crossing cache-
line penalty. For tulip clip, tests on Intel Core2 Quad machine(linux)
showed encoder speed improvement:
3.4% at --rt --cpu-used =-4
2.8% at --rt --cpu-used =-3
2.3% at --rt --cpu-used =-2
2.2% at --rt --cpu-used =-1
Test on Atom notebook showed only 1.1% speed improvement(speed=-4).
Test on Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.
Next, I will apply similar idea to other 2 sub-pixel search functions
for encoding speed > 4.
Make this change exclusively for x86 platforms.
Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
With this fix, the experimental branch now builds and encodes correctly
with the following two configure options respectively:
--enable-experimental --enable-t8x8
--enable-experimental
Change-Id: I3147c33c503fe713a85fd371e4f1a974805778bf
The auto merge process pull and merge commits from public git or master
branch. These automerges while worked well most time, but has created
a few problems. This commit fixed several issues existed long before
the latest 8x8 transform commit.
Change-Id: I895ca99713231b1aec521d57db5d9839f74aacfa