Commit e06c242ba introduced a change to call vp8_find_near_mvs() only
once instead of once per reference frame by observing that the only
effect that the frame had was on the bias applied to the motion
vector. By keeping track of the sign_bias value, the mv to use could
be flip-flopped by multiplying its components by -1.
This behavior was subtley wrong in the case when clamping was applied
to the motion vectors found by vp8_find_near_mvs(). A motion vector
could be in-bounds with one sign bias, but out of bounds after
inverting the sign, or vice versa. The clamping must match that done
by the decoder.
This change modifies vp8_find_near_mvs() to remove the clamping from
that function. The vp8_pick_inter_mode() and vp8_rd_pick_inter_mode()
functions instead track the correctly clamped values for both bias
values, switching between them by simple assignment. The common
clamping and inversion code is in vp8_find_near_mvs_bias()
Change-Id: I17e1a348d1643497eca0be232e2fbe2acf8478e1
This commit is incomplete, as it does not synchronize the loop filter
before returning a handle to the reconstructed frame in
vpx_codec_get_preview_frame(), which can cause (false?) failures
when running the test_reconstruct_buffer test.
This may be related to a bug that does cause visible artifacts, which
is also under investigation.
This reverts commit 380d64ecb1.
Change-Id: Iad710941e7731d44fc2bde63bc63d6763cc4629e
A processor with ARMv7 instructions does not
necessarily have NEON dsp extensions. This CL
has the added side effect of allowing the ability
to enable/disable the dsp extensions cleanly.
Change-Id: Ie1e879b8fe131885bc3d4138a0acc9ffe73a36df
When running multi-layer (ML) encodes and dynamically
changing coding parameters on the fly (e.g. frame
duration/rate, bandwidths allocated to each layer)
the encoder would not produce sensible output.
In certain cases the rate targeting would be
hideously inaccurate.
These fixes make it possible to change these coding
parameters correctly and to maintain accurate control
of the rate targeting.
I also added the specification of the input timebase
into the test program, vp8_scalable_patterns.c.
Patch 2: Moved declaration to appease MS compiler)
Change-Id: Ic8bb5a16daa924bb64974e740696e040d07ae363
Add new get_predictor_pointers() and get_reference_search_order()
functions for code shared between the two implementations.
Change-Id: I1ebe76aa8f168b1f5cfabc00d05d8f19a0d4d207
This patch removes the local copies of the dequantize
constants and implements John's idea as described
in "Make a local copy of the dequantized data" commit.
Change-Id: Ic6b7d681f00bf63263f71ff1e39ab2f80729e8b2
Except zrun_zbin_boost, 15 AC values are the same for all other
parameters. Removed unneccessary calculation.
Change-Id: I6101c0fe8080bd2b4387c3b04d7ddedbf6010409
The code had a number of constructs like (condition)?1:0,
which is redundant with C's semantics. In the cases where a boolean
operator was used in the condition, simply remove the ternary part.
Otherwise adjust the surrounding expression to remove the condition
(eg, for rounding up. See pickinter.c and rdopt.c)
Change-Id: Icb2372defa3783cf31857d90b2630d06b2c7e1be
Make bilinearfilter_arm.c compiled only when HAVE_ARMV6, as its definitions
are v6 only. This is normally not a problem for static builds as the file
is elided at link time, but this was not being done properly for the
--enable-shared --enable-pic build.
Change-Id: Ic800a7cde751f74f22555c5b247f99f9df5e550d
Merged multi-resolution motion estimation with regular motion
estimation function in order to remove duplicated part. This
caused slight changes in multi-resulotion encoder quality &
performance.
Change-Id: Ib4ecc7acfebfe5eea959b5b91febae6db7b95fd1
The total_stats, this_frame_stats, and total_left_stats structures
were previously create by a heap allocation, despite being of fixed
size. These structures were allocated and deallocated during
{de,}allocate_compressor_data, which is reinvoked whenever the frame
size changes. Unfortunately, this clobbers the total_stats and
total_left_stats data.
Historically, these were variable size at one time, due to the first
pass motion map, which necessitated their being created by a unique
heap allocation. However, this bug with the total_stats being
clobbered has probably been present since that initial implementation.
These structures are instead moved to be stored within the struct
twopass_rc directly, rather than being heap allocated separately.
Change-Id: I7f9e519e25c58b92969071f0e99fa80307e0682b
These functions are now used by the encoder.
This is WIP with the goal of creating a common idct/add for
the encoder and decoder. A boost of 1.8% was seen for
the HD rt test clip used.
[Tero] Added needed changes to ARM side.
Change-Id: Ibbb8000be09034203d7adffc457d3c3f8b06a5bf
While doing motion search on a macroblock, we usually call
vp8_find_near_mvs once per reference frame. Actually, for
different reference frames, the only difference in calculating
these near_mvs is they may have different sign_bias, which
causes a sign change in resulting near_mvs. In this change, we
only do find_near_mvs for the first reference frame. For other
reference frames, only need to adjust the near_mvs according to
that reference frame's sign_bias value.
Change-Id: I661394b49c6ad79fed7d0f2eb2be239b9c56f149
The fast-path for skipped MBs was not correctly respecting the
block type during update of the coefficient counts. Extracted
this from part of change I365cfb6ac636f19c545f682e3aeac185253abaef
Change-Id: I53d8cf0a00a98034b97b0ed3414b703bae74a739
Do the test filtering in the existing backup frame buffer instead of
the original. Copy the original data into extra buffer before doing
the filtering. This way there is no need to restore the original
unfiltered frame at the end of level picking process.
This came up in some discussions with Johann. Thanks!
Change-Id: I495f4301d983854673276c34ec0ddf9a9d622122
The example encoder down-samples the input video frames a number of
times with a down-sampling factor, and then encodes and outputs
bitstreams with different resolutions.
Support arbitrary down-sampling factor, and down-sampling factor
can be different for each encoding level.
For example, the encoder can be tested as follows.
1. Configure with multi-resolution encoding enabled:
../libvpx/configure --target=x86-linux-gcc --disable-codecs
--enable-vp8 --enable-runtime_cpu_detect --enable-debug
--disable-install-docs --enable-error-concealment
--enable-multi-res-encoding
2. Run make
3. Encode:
If input video is 1280x720, run:
./vp8_multi_resolution_encoder 1280 720 input.yuv 1.ivf 2.ivf 3.ivf 1
(output: 1.ivf(1280x720); 2.ivf(640x360); 3.ivf(320x180).
The last parameter is set to 1/0 to show/not show PSNR.)
4. Decode:
./simple_decoder 1.ivf 1.yuv
./simple_decoder 2.ivf 2.yuv
./simple_decoder 3.ivf 3.yuv
5. View video:
mplayer 1.yuv -demuxer rawvideo -rawvideo w=1280:h=720 -loop 0 -fps 30
mplayer 2.yuv -demuxer rawvideo -rawvideo w=640:h=360 -loop 0 -fps 30
mplayer 3.yuv -demuxer rawvideo -rawvideo w=320:h=180 -loop 0 -fps 30
The encoding parameters can be modified in vp8_multi_resolution_encoder.c,
for example, target bitrate, frame rate...
Modified API. John helped a lot with that. Thanks!
Change-Id: I03be9a51167eddf94399f92d269599fb3f3d54f5
This value needs to be copied to each thread's data structure.
This fixed artifact problem in multi-thread encoder.
Change-Id: Iab6d9745a1d44846aa503184705376f63a505597
API was not returning correct partition sizes on arm targets.
The armv5 token packing functions were not storing the information to the
partition size table.
As a fix, have one boolcoder instance allocated for each partition so
that partition sizes are internally available after all partitions
were encoded. This will also allow more flexibility in producing
several partitions in parallel.
Use buffer validation (overflow check) in all ARM bitpacking
functions.
Change-Id: I31c8a11d8a7613676f0ff50928cb2a2ab14fd169
Storing vp8_bilinear_filters_mmx in an mmx file and using it in an sse2
file is bad
Moving towards allowing --disable-mmx
Change-Id: I20493b35bdedcdcfc0915e6f05fdbe6c81a4a742
There was an implicit reference frame test order (typically LAST,
GOLD, ARF) in the mode selection logic, but this doesn't provide the
expected results when some reference frames are disabled. For
instance, in real-time mode, the speed selection logic often disables
the ARF modes. So if the user disables the LAST and GOLD frames, the
encoder was always choosing INTRA, when in reality searching the ARF
in this case has the same speed penalty as searching LAST would have
had.
Instead, introduce the notion of a reference frame search order. This
patch preserves the former priorities, so if a frame is disabled, the
other frames bump up a slot to take its place. This patch lays the
groundwork for doing something smarter in the frame test order, for
example considering temporal distance or looking at the frames used by
nearby blocks.
Change-Id: I1199149f8662a408537c653d2c021c7f1d29a700
Extend buffer write validation (overflow check) to single token
partition packing, both mb and row based functions.
Change-Id: I36e19b7d37fc43712d05c70e3ad223d3eb5b973d
Patch set 2: 64 bit build fix
Patch set 3: 64 bit crash fix
[Tero]
Patch set 4: Updated ARMv6 and NEON assembly.
Added also minor NEON optimizations to subtract
functions.
Patch set 5: x86 stride bug fix
Change-Id: I1fcca93e90c89b89ddc204e1c18f208682675c15
The calculated frame_rate is a state variable in the codec, and
shouldn't be maintained in the configuration struct. Move it to the
main part of cpi so that it isn't clobbered when the configuration
struct is updated. The initial framerate estimate is moved from the
vp8_cx_iface.c wrapper into the body of init_config() in onyx_if.c, so
that it is only called once and not reset on every call to
vp8_change_config().
Change-Id: I8d9a3d1283330d1ee297d07e9d78d1f2875f2465
Call the idct/add after the tokenize. This is WIP with
the goal of creating a common idct/add for the encoder and
decoder. This move is necessary because the decoder's version
of the idct clobbers qcoeff, which is used by the tokenize.
Change-Id: I6b08d8e8397cd873647fa4fb9469884e3c876756
Added ARM optimized intra 4x4 prediction
- 2x faster on Profiler compared to C-code compiled with -O3
- Function interface changed a little to improve BLOCKD structure
access
Change-Id: I9bc2b723155943fe0cf03dd9ca5f1760f7a81f54
vp8cx_mb_init_quantizer() needs to be called at least once to get
all values calculated. This change added one check to decide if
we could skip initialization or not.
Change-Id: I3f65eb548be57580a61444328336bc18c25c085b
Added code to clip the buffer level to the maximum buffer
size. Without this the buffer level would increase
unchecked.
This bug was found when encoding an essentially static
scene at 2Mb/s. The encoder is unable to generate frames
consistent with the high data-rate because Q bottoms out
at Qmin.
As frames generated are consistently undersized the buffer
level increases and does not get checked against the
maximum size specified by the user (or default).
Change-Id: Id8a3c6323d3246da50f7cb53ddbf78b5528032c6
Fix compiler warning for passing a non const array
to a function expecting a const array by using an
intermediary pointer and casting.
Change-Id: I9bdd358ebdc926223993fb8fb2098ffedd2f3fc7
Changed 'int eob' to 'char *eob' in BLOCKD so that both encoder and
decoder will use eobs[25] array from MACROBLOCKD structure. In future,
this will enable use of the decoder side IDCT in the encoder.
Change-Id: I6e1c011628cb8864fd4a0b80f0279ce16a5ca978
In some situations (f.g. error-resilient is turned on), vp8cx_mb
_init_quantizer() was called once per macroblock. Added checks
to avoid calculations when there is no change.
Change-Id: Ie4f0a5ade2202041254990a4e9d5b03bd1ac5aea
It is discovered that in rare situations the 2nd order block may
produce a few small magnitude coefficients that has no effect on
reconstruction. The situations are a combination of low quantizer
values (high quality) and low energy in residual signals (content
dependent). This commit added code to detect such cases and reset
the 2nd order block to all 0.
Patch 1 to 4 used code to do all-zero-check on idct result buffer,
and tests on derf set showed a consistent gain of .12%-.14% on all
metrics.But due to a recent change Ie31d90b, the idct result buffer
is not longer populated. So patch 5&6 use an alternative method to
detect the situations. Tests on derf set now shows a consistent
quality gain of .16%-.20%.
As suggested by Jim, Patch 7&8 removed the condition of all first
order block not having any coefficient, instead we reset 2nd order
coefficients to all 0 if sum of absolute value of the coefficients
is small. So it does slightly more than just detecting the oddity
as discussed above, but tests on derf set now show a consistent
gain of .20%-.23% on all metrics.
It is worth noting here that this change does not have any effect
on mid/high quantizer range, it only affects the quantizer value
18 or blow. Within this range, the change helps compression by up
to 2.5% on clips in the derf set.
Change-Id: I718e19cf59a4fc2462cb7070832759beb9f7e7dd
The partial frame copy function used to copy an extra 8 lines above
and below. The partial frame filtering can only modify 3 pixel rows
above the partial frame. Reduce copy to bare minimum needed, which is
4 lines, so that partial filtering on copied frame is possible.
Define the "magic" fraction number for partial filtering in
loopfilter.h .
Change-Id: I4791ffc541b6884b12759a0d0714a8faf16147ec
check to make sure that cx_data buffer has enough room before
writting to it, prior behavior did not which could result in a crash.
Change-Id: I3fab6f2bc4a96d7c675ea81acd39ece121738b28
During the _pick only the Y plane is examined. In addition, data beyond
the borders of the frame is not read.
Change-Id: Ic549adfca70fc6e0b55f8aab0efe81f0afac89f9
Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer. For blocks with eob=0,
unnecessary idcts can be eliminated. This gave a performance
boost of ~1.8% for the HD clips used.
Tero: Added needed changes to ARM side and scheduled some
assembly code to prevent interlocks.
Patch Set 6: Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.
Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6
It was crashing when number of partitions was bigger than the number
of MB rows (ex. 128x96 with 8 partitions).
Start point was not checked against mb_rows, plus extra
"empty" partitions were not written out.
Change-Id: I9c2f013b9ec022354b658fab4ef799ff8b1de93d
Added the ability to create rate-targeted, temporally
scalable, VP8 compatible bitstreams.
The application vp8_scalable_patterns.c demonstrates how
to use this capability. Users can create output bitstreams
containing upto 5 temporally separable streams encoded
as a single VP8 bitstream.
(previously abandoned as:
I92d1483e887adb274d07ce9e567e4d0314881b0a)
Change-Id: I156250a3fe930be57c069d508c41b6a7a4ea8d6a
buffer_level in VP8_COMP and starting_buffer_level, optimal_buffer_level
and maximum_buffer_size in VP8_CONFIG changed from int to int64_t
to avoid potential crash issues for larger target bit rates.
Change-Id: I0d5ab6c8a44c2fef51f30cd8df4bb4b739c5df26
Previous entropy probs need to be saved (and restored) only when
current updates are not propagated.
Change-Id: Ie6ee0543066e30874e56258be0a6b7d2dd2fdb2b
Uninitialized data could be written to the first pass file when no
motion vectors are present in the frame.
Also fix a number of compiler warnings.
Change-Id: Icc9f53b6d33da9de4563d86d9fd591910473ea90
vp8_find_near_mvs() is being called on all possible reference frames
but the data computed may be used if the loop exits early, which can
be due to x->skip beign set to 1.
Optimize this by call vp8_find_near_mvs() laziy only if it is going
to be used and not computed yet.
Change-Id: Iccdbd4c962a670c9f2c99b8aca8096042ca5dc98
Changes to the selection of Q limits for two pass
and two pass CQ mode.
Allowance made for Mode and motion vector costs.
Some refactoring of common code.
For Derf and YT sets CQ mode average improvement
circa 1% (SSIM and Global PSNR).
Some increased tendency to undershoot even when
user CQ not reached.
Patch2: Removed some test code accidentally merged.
Change-Id: Icf74d13af77437c08602571dc7a97e747cce5066
Sync with loopfilter thread just at the beginning of next frame encoding.
This returns control to application faster and allows a better multicore scaling.
When PSNR packets are generated the final filtered frame is needed imediatly
so we cannot delay the sync.
Change-Id: I288d97b5e331d41d6f5bb49d97986fa12ac6f066
- Removed fast_fdct4x4_neon and fast_fdct8x4_neon
- Uses now short_fdct4x4 and short_fdct8x4
- Gives ~1-2% speed-up on Cortex-A8/A9
Change-Id: Ib62f2cb2080ae719f8fa1d518a3a5e71278a41ec
Rd and Rm registers should be different in 'mul'. This register
combination results in unpredictable behaviour. GCC will give
a warning and RVCT an error in this case.
Restriction applies only to armv5 targets and not for armv6 and above.
Change-Id: I378d17c51e1f16a6820814fbed43e115aaabb03e
- Updated walsh transform to match C
(based on Change Id24f3392)
- Changed fast_fdct4x4 and 8x4 to short_fdct4x4 and 8x4
correspondingly
Change-Id: I704e862f40e315b0a79997633c7bd9c347166a8e
Modified original patch If2f07220885c4c3a0cae0dace34ea0e36124f001
according to comments. Scheduled code a little bit to prevent some
interlocks.
Change-Id: I338f02b881098782f82af63d97f042b85e63e902
vp8_update_zbin_extra() is called all the time even though the fast
quantizer doesn't use it. Skip this call if fast quantizer is used.
Change-Id: Ia711c38431930cc2486cf59b8466060ef0e9d9db
This change makes sure that no key frame recoding in real-time mode
even if CONFIG_REALTIME_ONLY is not configured.
Change-Id: Ifc34141f3217a6bb63cc087d78b111fadb35eec2
for SPLITMV and B_PRED modes. Modified code to use the bmi
found in mode_info_context instead of BLOCKD. On the decode
side, the uvmvs are calculated only when required, instead of
every macroblock. This is WIP. (bmi should eventually be
removed from BLOCKD)
Small performance gains noticed for RT encodes and decodes.(VGA)
Change-Id: I2ed7f0fd5ca733655df684aa82da575c77a973e7
Prepend . to local labels in assembly code. This
allows non unique labels within a file. Also
makes profiling information more informative
by keeping the function name with the loop name.
Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f
Calculations were incorrectly classified as either
SSE3 or SSSE3. Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.
Change-Id: I48ad0218af0cc51c5078070a08511dee43ecfe09
Calculations were incorrectly classified as either
SSE3 or SSSE3. Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.
Change-Id: I29f5c2ead342b2086a468029c15e2c1d948b5d97
When active map is specified and the current frame is not a key frame,
golden frame nor a altref frame then copy only those active regions.
This significantly reduces encoding time by as much as 19% on the test
system where realtime encoding is used. This is particularly useful
when the frame size is large (e.g. 2560x1600) and there's only a few
action macroblocks.
Change-Id: If394a813ec2df5a0201745d1348dbde4278f7ad4
Instead of a single mid GF boost apply a few extra bits to
every other frame. This gives a very small average metrics
improvement on both derf and YT sets.
Also use min GF interval as min KF interval.
Change-Id: Iee238b8cae0ffaed850a5a944ac825cee18da485
This reverts commit b5ea2fbc2c. Further
testing showed noticable keyframe popping in some cases, reverting this
for now to give time for a proper fix.
Conflicts:
vp8/encoder/onyx_if.c
vp8/encoder/ratectrl.c
Change-Id: I159f53d1bf0e24c035754ab3ded8ccfd58fd04af
This patch fixes a bug in the interaction between the recode loop and
spatial resampling. If the codec was in a spatial resampling state,
and a subsequent iteration of the recode loop disables resampling,
then the source buffer must be reset to the unscaled source.
Change-Id: I4e4cd47b943f6cd26a47449dc7f4255b38e27c77
Changed motion search in vp8_find_best_half_pixel_step() to be the
same as in vp8_find_best_sub_pixel_step(), which checks 5 points
instead of 8 points. This only affects real-time mode with
cpu-used >=9. Tests showed it gives 2% encoding speedup with
a quality loss(psnr) of up to 0.5%.
Change-Id: I16049cad1535002346d46cfdfad345bfc3dc5146
This change implemented same idea in change "Preload reference area
to an intermediate buffer in sub-pixel motion search." The changes
were made to vp8_find_best_sub_pixel_step() and vp8_find_best_half
_pixel_step() functions which are called when speed >= 5. Test
result (using tulip clip):
1. On Core2 Quad machine(Linux)
rt mode, speed (-5 ~ -8), encoding speed gain: 2% ~ 3%
rt mode, speed (-9 ~ -11), encoding speed gain: 1% ~ 2%
rt mode, speed (-12 ~ -14), no noticeable encoding speed gain
2. On Xeon machine(Linux)
Test on speed (-5 ~ -14) didn't show noticeable speed change.
Change-Id: I21bec2d6e7fbe541fcc0f4c0366bbdf3e2076aa2
There were some situations that the start motion vectors were
out of range. This fix adjusted range checks to make sure they
are checked and clamped.
Change-Id: Ife83b7fed0882bba6d1fa559b6e63c054fd5065d
sharpness was not recalculated in vp8cx_pick_filter_level_fast
remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.
always use cm->sharpness_level. the extra indirection was annoying.
don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init
move function declarations to their proper header
Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db