Break MFQE code into it's own file.
It is currently only valid for 16x16 and 8x8 Y blocks. It also filters
4x4 U/V blocks.
Refactor filtering and add associated assembly. Limited test cases show
--mfqe introduces a penalty of ~20% with HD content. The assembly
reduces the penalty to ~15%
Change-Id: I4b8de6b5cdff5413037de5b6c42f437033ee55bf
Coefficient costing failed to take account of the first branch
being skipped ( 0 vs eob) if the previous token is 0.
Fixed rd to account for slightly increased token cost & cleaned up
warning message
Change-Id: I56140635d9f48a28dded5a816964e973a53975ef
Produce the token partitions on-the-fly, while processing each MB.
Context is updated at the beginning of each frame based on the
previoud frame's counters. Optimally encoder outputs partitions in
separate buffers. For frame based output, partitions are concatenated
internally.
Limitations:
- enabled just in combination with realtime-only mode
- number of encoding threads has to be equal or less than the
number of token partitions. For this reason, by default the encoder
will do 8 token partitions.
- vpxenc supports partition output (-P) just in combination with
IVF output format (--ivf)
Performance:
- Realtime encoder can be up to 13% faster (ARM) depending on the number
of threads and bitrate settings. Constant gain over the 5-16 speed
range.
- Token buffer reduced from one frame to 8 MBs
Quality:
- quality is affected by the delayed context updates. This again
dependents on input material, speed and bitrate settings. For VC
style input the loss seen is up to 0.2dB. If error-resilient=2
mode is used than the effect of this change is negligible.
Example:
./configure --enable-realtime-only --enable-onthefly-bitpacking
./vpxenc --rt --end-usage=1 --fps=30000/1000 -w 640 -h 480
--target-bitrate=1000 --token-parts=3 --static-thresh=2000
--ivf -P -t 4 -o strm.ivf tanya_640x480.yuv
Change-Id: I127295cb85b835fc287e1c0201a67e378d025d76
Eliminated some mb branches along with other code cleanups.
This is part of an ongoing effort to remove cut/paste
code in the decoder.
Change-Id: Ifabb0f67cafa6922b5a0e89a0d03a9b34e9e5752
Reworked the code to use vp8_build_intra_predictors_mby_s,
vp8_intra_prediction_down_copy, and vp8_intra4x4_predict_d_c
functions instead. vp8_intra4x4_predict_d_c is a decoder-only
version of vp8_intra4x4_predict. Future commits will fix this
code duplication.
Change-Id: Ifb4507103b7c83f8b94a872345191c49240154f5
When we encode slide-show clips, for the majority of the time,
only ZEROMV mode is checked, and all other modes are skipped.
This change delayed uv intra-mode evaluation until intra mode is
actually checked. This gave big performance gain for slide-show
video encoding (2nd pass gain: 18% to 28%). But, this change
doesn't help other types of videos.
Also, zbin_mode_boost is adjusted in mode-checking loop, which
causes bitstream mismatch before/after this change when --best
or --good with --cpu-used=0 are used.
Change-Id: I582b3e69fd384039994360e870e6e059c36a64cc
use oxcf instead of common in check to Reinit the
lookahead buffer if the frame size changes
prior behavior would cause assertion fail/crash
first observed in:
support changing resolution with vpx_codec_enc_config_set
Change-Id: Ib669916ca9b4f206d4cc3caab5107e49d39a36aa
When temporal layers is used (i.e., number_of_layers > 1),
we don't use the frame rate boost for setting the key
frame target size. The factor was forcing the target size to be
always at its minimum (2* per_frame_bandwidth) for low frame rates
(i.e., base layer frame rate).
Generally we should modify or remove this frame rate factor;
for now we turn if off for number_of_layers > 1.
Change-Id: Ia5acf406c9b2f634d30ac2473adc7b9bf2e7e6c6
Reworked the code to use vp8_build_intra_predictors_mbuv_s
instead. This is WIP with the goal of eliminating all
functions in reconintra_mt.h
Change-Id: I61c4a132684544b24a38c4a90044597c6ec0dd52
In vp8_rd_pick_inter_mode(), if total of eobs is zero, rate needs
to be adjusted since there are no non-zero coefficients for
transmission. The uv intra eobs calculated in
rd_pick_intra_mbuv_mode() need to be saved before they are
overwritten by inter-mode eobs.
Change-Id: I41dd04fba912e8122ef95793d4d98a251bc60e58
mode_info_context->mbmi.mb_skip_coeff has to always reflect the
existence or not of coeffs for a certain MB. The loopfilter needs this
info.
mb_skip_coeff is either set by the vp8_tokenize_mb or has to be set to
1 when the MB is skipped by mode selection. This has to be done
regardless of the mb_no_coeff_skip value.
prob_skip_false is needed just when mb_no_coeff_skip is 1. No need to
keep count of both skip_false and skip_true as they are complementary
(skip_true+skip_false = total_mbs)
Change-Id: I3c74c9a0ee37bec10de7bb796e408f3e77006813
Depending on implementation the optimized SAD functions may return early
when the calculated SAD exceeds max_sad.
Change-Id: I05ce5b2d34e6d45fb3ec2a450aa99c4f3343bf3a
Refactoring some of the mode decoding logic introduced a bug where
the segmentation maps would not be properly reset on keyframes.
http://code.google.com/p/webm/issues/detail?id=378
The text of the bug is somewhat misleading as I initially read it to
imply the bug was present in v0.9.7-p1 (Cayuga), but note the text
"master", which indicates this was something subsequent. This issue
bisects back to v0.9.7-p1-84-ga99c20c, so unfortunately it was broken
during the Duclair release.
Thanks to Alexei Leonenko for investigating the root cause.
Change-Id: I9713c9f070eb37b31b3b029d9ef96be9b6ea2def
On Android NDK, rand() is inlined function. But, on our SSE optimization,
we need symbol for rand()
Change-Id: I42ab00e3255208ba95d7f9b9a8a3605ff58da8e1
Replace inner loops of pack_mb_row_tokens_c and
pack_tokens_into_partitions_c with a call to pack_tokens_c.
Change-Id: I0341554fb154a14a5dadb63f8fc78010724c2c33
Second shot at this...
Sync with loopfilter thread as late as possible, usually just at the
beginning of next frame encoding. This returns control to application
faster and allows a better multicore scaling.
When PSNR packets are generated the final filtered frame is needed
imediatly so we cannot delay the sync. Same has to be done when
internal frame is previewed.
Change-Id: I64e110c8b224dd967faefffd9c93dd8dbad4a5b5
As an optimization some architectures use the max_sad argument to break
out early from the SAD. Pass in INT_MAX instead of 0 to prevent this.
Change-Id: I653c476834b97771578d63f231233d445388629d
In the variance calculations the difference is summed and later squared.
When the sum exceeds sqrt(2^31) the value is treated as a negative when
it is shifted which gives incorrect results.
To fix this we cast the result of the multiplication as unsigned.
The alternative fix is to shift sum down by 4 before multiplying.
However that will reduce precision.
For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and
change).
PPC change is untested.
Change-Id: I1bad27ea0720067def6d71a6da5f789508cec265
Allow the application to change the frame size during encoding. This
is only supported when not using lagged compress.
Change-Id: I89b585d703d5fd728a9e3dedf997f1b595d0db0f
MFQE postproc crashed with stream dimensions not a multiple of 16.
The buffer was memset unconditionally, so if the buffer allocation
fails we end up trying to write to NULL.
This patch traps an allocation failure with vpx_internal_error(),
and aligns the buffer dimensions to what vp8_yv12_alloc_frame_buffer()
expects.
Change-Id: I3915d597cd66886a24f4ef39752751ebe6425066
Sometimes, a user doesn't have enough bandwidth to send high-resolution
(i.e. HD) video even though the camera catches HD video. This change
allowed users to skip highest-resolution encoding by setting that level's
target bit rate to 0.
To test it, modify the following line in vp8_multi_resolution_encoder.c.
unsigned int target_bitrate[NUM_ENCODERS]={1400, 500, 100};
To skip the highest-resolution level, change it to
unsigned int target_bitrate[NUM_ENCODERS]={0, 500, 100};
To skip the first and second highest resolution levels, change it to
unsigned int target_bitrate[NUM_ENCODERS]={0, 0, 100};
This change also fixed a small problem in mapping, which slightly helped
quality and performance.
Change-Id: I977bae9a9fbfba85c8be4bd5af01539f2b84bc81
This is the final commit in the series converting to the new RTCD
system. It removes the encoder csystemdependent files and the remaining
global function pointers that didn't conform to the old RTCD system.
Change-Id: I9649706f1bb89f0cbf431ab0e3e7552d37be4d8e
This commit continues the process of converting to the new RTCD
system. It removes the last of the VP8_ENCODER_RTCD struct references.
Change-Id: I2a44f52d7cccf5177e1ca98a028ead570d045395
This is a proof of concept RTCD implementation to replace the current
system of nested includes, prototypes, INVOKE macros, etc. Currently
only the decoder specific functions are implemented in the new system.
Additional functions will be added in subsequent commits.
Overview:
RTCD "functions" are implemented as either a global function pointer
or a macro (when only one eligible specialization available).
Functions which have RTCD specializations are listed using a simple
DSL identifying the function's base name, its prototype, and the
architecture extensions that specializations are available for.
Advantages over the old system:
- No INVOKE macros. A call to an RTCD function looks like an ordinary
function call.
- No need to pass vtables around.
- If there is only one eligible function to call, the function is
called directly, rather than indirecting through a function pointer.
- Supports the notion of "required" extensions, so in combination with
the above, on x86_64 if the best function available is sse2 or lower
it will be called directly, since all x86_64 platforms implement
sse2.
- Elides all references to functions which will never be called, which
could reduce binary size. For example if sse2 is required and there
are both mmx and sse2 implementations of a certain function, the
code will have no link time references to the mmx code.
- Significantly easier to add a new function, just one file to edit.
Disadvantages:
- Requires global writable data (though this is not a new requirement)
- 1 new generated source file.
Change-Id: Iae6edab65315f79c168485c96872641c5aa09d55
Commit 892e23a5b introduced support for the VP8D_GET_LAST_REF_USED,
but missed the mapping of the control id to the underlying function,
so it was unavailable to applications.
In addition, the underlying function vp8_references_buffer() is
moved from common/postproc.c to decoder/onyxd_if.c as postproc.c is
not built in all configurations.
Change-Id: I426dd254e7e6c4c061b70d729b69a6c384ebbe44
Commit e06c242ba introduced a change to call vp8_find_near_mvs() only
once instead of once per reference frame by observing that the only
effect that the frame had was on the bias applied to the motion
vector. By keeping track of the sign_bias value, the mv to use could
be flip-flopped by multiplying its components by -1.
This behavior was subtley wrong in the case when clamping was applied
to the motion vectors found by vp8_find_near_mvs(). A motion vector
could be in-bounds with one sign bias, but out of bounds after
inverting the sign, or vice versa. The clamping must match that done
by the decoder.
This change modifies vp8_find_near_mvs() to remove the clamping from
that function. The vp8_pick_inter_mode() and vp8_rd_pick_inter_mode()
functions instead track the correctly clamped values for both bias
values, switching between them by simple assignment. The common
clamping and inversion code is in vp8_find_near_mvs_bias()
Change-Id: I17e1a348d1643497eca0be232e2fbe2acf8478e1
This commit is incomplete, as it does not synchronize the loop filter
before returning a handle to the reconstructed frame in
vpx_codec_get_preview_frame(), which can cause (false?) failures
when running the test_reconstruct_buffer test.
This may be related to a bug that does cause visible artifacts, which
is also under investigation.
This reverts commit 380d64ecb1.
Change-Id: Iad710941e7731d44fc2bde63bc63d6763cc4629e
A processor with ARMv7 instructions does not
necessarily have NEON dsp extensions. This CL
has the added side effect of allowing the ability
to enable/disable the dsp extensions cleanly.
Change-Id: Ie1e879b8fe131885bc3d4138a0acc9ffe73a36df
Makes the thresholds for the multiframe quality enhancement module
depend on the difference between the base quantizers. Also modifies
the mixing function to weigh the current low quality frame less if
the difference in quantizer is large. With the above modifications
mfqe works well for both scalable patterns as well as low quality
key frames.
Change-Id: If24e94f63f3c292f939eea94f627e7ebfb27cb75
filter in a way such that when there is a single bad frame, the
post-processing is applied not only to just that frame but a
few subsequent frames as well.
Change-Id: Iba5d9896eed77244eb76b4a74692a93f8ecff634
When running multi-layer (ML) encodes and dynamically
changing coding parameters on the fly (e.g. frame
duration/rate, bandwidths allocated to each layer)
the encoder would not produce sensible output.
In certain cases the rate targeting would be
hideously inaccurate.
These fixes make it possible to change these coding
parameters correctly and to maintain accurate control
of the rate targeting.
I also added the specification of the input timebase
into the test program, vp8_scalable_patterns.c.
Patch 2: Moved declaration to appease MS compiler)
Change-Id: Ic8bb5a16daa924bb64974e740696e040d07ae363
Add new get_predictor_pointers() and get_reference_search_order()
functions for code shared between the two implementations.
Change-Id: I1ebe76aa8f168b1f5cfabc00d05d8f19a0d4d207
with deblock or demacroblock filters. When --mfqe is used together
with --demacroblock or --deblock, mfqe is applied first and then
demacroblock/deblock is applied to the mfqe result.
Change-Id: Id83ee01f1b4a33a116f071dcf26d59c7f3497c32