sharpness was not recalculated in vp8cx_pick_filter_level_fast
remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.
always use cm->sharpness_level. the extra indirection was annoying.
don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init
move function declarations to their proper header
Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db
In sub-pixel motion search, the search range is small (+/- 3 pixels).
Preload the whole search area from the reference buffer into a 32-byte
aligned buffer. Then in search, load reference data from this buffer
instead. This keeps data in cache, and reduces the cache-line crossing
penalty. For the tulip clip, tests on an Intel Core2 Quad machine (Linux)
showed encoder speed improvement:
3.4% at --rt --cpu-used=-4
2.8% at --rt --cpu-used=-3
2.3% at --rt --cpu-used=-2
2.2% at --rt --cpu-used=-1
Test on an Atom notebook showed only 1.1% speed improvement (speed=-4).
Test on a Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.
Next, I will apply a similar idea to the other 2 sub-pixel search
functions for encoding speed > 4.
Make this change exclusively for x86 platforms.
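A minimal sketch of the preloading idea, assuming a 16x16 block. The function name, the local row pitch, and the macro names are illustrative and are not taken from the patch:

```c
#include <stdint.h>
#include <string.h>

#define BLOCK 16                    /* assumed block size */
#define RANGE 3                     /* +/- 3 pixels, as stated above */
#define AREA (BLOCK + 2 * RANGE)    /* 22 rows and 22 columns */
#define LOCAL_STRIDE 32             /* row pitch of the local copy */

/* Copy the whole search area around (x, y) into a compact local buffer.
 * The caller must make sure the area lies inside the reference frame
 * (or its border) and should declare `local` with 32-byte alignment. */
static void preload_search_area(const uint8_t *ref, int ref_stride,
                                int x, int y, uint8_t *local)
{
    const uint8_t *src = ref + (y - RANGE) * ref_stride + (x - RANGE);
    for (int r = 0; r < AREA; r++)
        memcpy(local + r * LOCAL_STRIDE, src + r * ref_stride, AREA);
}
```

With a 32-byte row pitch and a 32-byte aligned base, each 22-byte row sits inside one aligned 32-byte chunk, so a read within the area does not straddle two chunks.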
Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
This is done by expanding luma row to 32-byte alignment, since
there is currently a bunch of code that assumes that
uv_stride == y_stride/2 (see, for example, vp8/common/postproc.c,
common/reconinter.c, common/arm/neon/recon16x16mb_neon.asm,
encoder/temporal_filter.c, and possibly others; I haven't done a
full audit).
It also replaces the hardcoded border of 16 in a number of
encoder buffers with VP8BORDERINPIXELS (currently 32), as the
chroma rows start at an offset of border/2.
Together, these two changes have the nice advantage that simply
dumping the frame memory as a contiguous blob produces a valid,
if padded, image.
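A rough sketch of the stride arithmetic, assuming the luma row is the frame width plus a border on each side. BORDER stands in for VP8BORDERINPIXELS and the helper names are hypothetical; the real allocation code may differ in detail:

```c
#define BORDER 32   /* stands in for VP8BORDERINPIXELS (currently 32) */

/* Round the padded luma row up to a multiple of 32 bytes. */
static int luma_stride(int width)
{
    return (width + 2 * BORDER + 31) & ~31;
}

/* Halving a 32-byte aligned luma stride gives a chroma stride that is
 * exactly y_stride / 2 and is itself a multiple of 16. */
static int chroma_stride(int width)
{
    return luma_stride(width) / 2;
}
```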
Change-Id: Iaf5ea722ae5c82d5daa50f6e2dade9de753f1003
This patch attempts to improve the handling of CBR streams with
respect to the short term buffering requirements. The "buffer level"
is changed to be an average over the rc buffer, rather than a long
running average. Overshoot is also tracked over the same interval
and the golden frame targets suppressed accordingly to correct for
overly aggressive boosting.
Testing shows that this is fairly consistently positive in one
metric or another -- some clips that show significant decreases
in quality have better buffering characteristics, others show
improvements in both.
Change-Id: I924c89aa9bdb210271f2e03311e63de3f1f8f920
Separate simple filter with reduced no. of parameters.
MB filter level picking based on precalculated table. Level table updated for
each frame. Inside and edge limits precalculated and updated just when
sharpness changes. HEV threshold is constant.
ARM targets use scalars and others vectors.
Change works only with --target=generic-gnu
All other targets have to be updated!
Change-Id: I6b73aca6b525075b20129a371699b2561bd4d51c
Allow the encoder to inform the application that the encoded frame will not
be used as a reference.
Change-Id: I90e41962325ef73d44da03327deb340d6f7f4860
In this commit I have added an experimental function
that tests prediction quality either side of a central position
to calculate a suggested boost number for an ARF frame.
The function is passed an offset from the current position and
a number of frames to search forwards and backwards.
It returns a forward, backward and compound boost number.
The new code can be deactivated using #define NEW_BOOST 0
In its current default state the code searches forwards and backwards
from the proposed position of the next alt ref.
The old code used a boost number calculated by scanning forward
from the previous GF up to the proposed alt ref frame position.
I have also added some code to try and prevent placement of a gf/arf
where there is a brief flash.
Change-Id: I98af789a5181148659f10dd5dd2ff2d4250cd51c
Adding support in the encoder for generating
independent residual partitions by forcing
equal probabilities over the prev coef entropy
contexts.
Change-Id: I402f5c353255f3ca20eae2620af739f6a498cd21
This reverts commit 212f618373.
Further testing shows that the overshoot accumulation/damping is too
aggressive on some clips. Allowing the accumulated overshoot to
decay and limiting to damping to golden frames shows some promise.
But some clips show significant overshoot in the buffer window, so
I think this still needs work.
Change-Id: Ic02a9ca34f55229f9cc04786f4fab54cdc1a3ef5
This patch attempts to reduce the peak bitrate hit by the encoder
when using small buffer windows.
Tested on the CIF set over 200-500kbps using these settings:
--buf-sz=500 --buf-initial-sz=250 --buf-optimal-sz=250 \
--undershoot-pct=100
Two pass encodes were tested at best quality. One pass encodes were
tested only at realtime speed 4:
--rt --cpu-used=-4
The peak datarate (over the specified 500ms window) was measured
for each encode, and averaged together to get a metric for
"average peak," computed as SUM(peak)/SUM(target). This patch
reduces the average peak datarate as follows:
One pass:
baseline: 1.29715
this patch: 1.23664
Two pass:
baseline: 1.32702
this patch: 1.37824
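For reference, the "average peak" metric as defined above (SUM(peak)/SUM(target)) can be sketched like this. The function name and the use of doubles are assumptions:

```c
#include <stddef.h>

/* Ratio of summed peak datarates to summed target datarates over a set
 * of encodes. Returns 0 if the targets sum to zero. */
static double average_peak(const double *peak, const double *target,
                           size_t n)
{
    double peak_sum = 0.0, target_sum = 0.0;
    for (size_t i = 0; i < n; i++) {
        peak_sum += peak[i];
        target_sum += target[i];
    }
    return target_sum > 0.0 ? peak_sum / target_sum : 0.0;
}
```

Summing before dividing weights each clip by its target rate, so high-bitrate encodes count for more than they would in a plain mean of per-clip ratios.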
This change had a positive effect on our quality metrics as well:
One pass CBR:
                Min   / Mean  / Max (pct)
  Average PSNR  -0.42 /  2.86 / 27.32
  Overall PSNR  -0.90 /  2.00 / 17.27
  SSIM          -0.05 /  3.95 / 37.46
Two pass CBR:
                Min   / Mean  / Max (pct)
  Average PSNR  -4.47 /  4.35 / 35.99
  Overall PSNR  -3.40 /  4.18 / 36.46
  SSIM          -4.56 /  6.98 / 53.67
One pass VBR:
                Min   / Mean  / Max (pct)
  Average PSNR  -5.21 /  0.01 /  3.30
  Overall PSNR  -8.10 / -0.38 /  1.21
  SSIM          -7.38 / -0.11 /  3.17
(note: most values here were close to the mean, there were a few
outliers on files that were very sensitive to golden frame size)
Two pass VBR:
                Min   / Mean  / Max (pct)
  Average PSNR   0.00 /  0.00 /  0.00
  Overall PSNR   0.00 /  0.00 /  0.00
  SSIM           0.00 /  0.00 /  0.00
Neither one pass nor two pass CBR mode adheres particularly strictly
to the short term buffer constraints, and two pass is less
consistent, even in the baseline commit. This should be addressed
in a later commit. This likely will hurt the quality numbers, as it
will have to reduce the burstiness of golden frames.
Aside: My work on this commit makes it clear that we need to make
rate control modes "pluggable", where you can easily write a new
one or work on one in isolation.
Change-Id: I1ea9a48f2beedd59891f1288aabf7064956b4716
vp8_fast_quantize_b_pair_neon function added to quantize
two adjacent blocks at the same time to improve performance.
- Additional 3-6% speedup compared to neon optimized fast
quantizer (Tanya VGA@30fps, 1Mbps stream, cpu-used=-5..-16)
Change-Id: I3fcbf141e5d05e9118c38ca37310458afbabaa4e
Misplaced #endif caused first_time_stamp_ever to only be initialized if
CONFIG_INTERNAL_STATS was set.
Change-Id: I2296a4ab00f7dfb767583edcc5d59b94f48c0621
in onyx_if.c update_reference_frames() make
sure that frame buffer indexes are not equal
before performing a buffer copy. If two frames
share the same buffer the flags will already be
set correctly.
Change-Id: Ida9b5516d08e3435c90f131d2dc19d842cfb536e
Tests showed that using hex search in realtime mode greatly speeds up
the encoding process while still achieving quality similar to the
diamond search we have. Therefore, removed the diamond search
option.
Change-Id: I975767d0ec0539f9f6ed7fdfc09506e39761b66c
fixed a bug where active_worst_quality could be set
below active_best_quality which could result in an
infinite loop.
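One way to enforce the ordering, shown only as an illustration. The helper name is hypothetical and the actual fix may be structured differently:

```c
/* Keep the worst-quality bound from dropping below the best-quality
 * bound, so a search over [best, worst] always has a non-empty range. */
static void clamp_quality_range(int *active_best, int *active_worst)
{
    if (*active_worst < *active_best)
        *active_worst = *active_best;
}
```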
Change-Id: I93c229c3bc5bff2a82b4c33f41f8acf4dd194039
This patch collects the twopass specific members of VP8_COMP into a
dedicated struct. This is a first step towards isolating the two pass
rate control and aids readability by decorating these variables with
the 'twopass.' namespace. This makes it clear to the reader in what
contexts the variable will be valid, and is a hint that a section of
code might be a good candidate to move to firstpass.c in later
refactoring. There likely will be other rate control modes that need
their own specific data as well.
This notation is probably overly verbose in firstpass.c, so an
alternative would be to access this struct through a pointer like
'rc->' instead of 'cpi->firstpass.' in that file. Feel free to make
a review comment to that effect if you prefer.
Change-Id: I0ab8254647cb4b493a77c16b5d236d0d4a94ca4d
This commit restructures the mb activity masking code
to better facilitate experimentation using different metrics
etc. and also allows for adjustment of the zero bin either
for encode only or both the encode and mode selection
stages
It also uses information from the current frame rather than
the previous frame and the default strength has been
reduced.
Change-Id: Id39b19eace37574dc429f25aae810c203709629b
This patch improves the accuracy of frame rate estimation by using a
larger, 1 second window. It also more quickly adapts to step changes
in the input frame rate (i.e. 30fps to 15fps).
Change-Id: I39e48a8f5ac880b4c4b2ebd81049259b81a0218e
The variable was introduced in commit 2e53e9e53 to make more use of
trellis quantization, but this is no longer necessary after RDMULT
was made adaptive in a number of later commits.
Change-Id: I7420522ec7723f38cf77033466c25afb405d52ae
In NEWMV mode, full search is currently used as the refining search
after n-step search. Replacing it with an iterative diamond search
of radius 1 greatly reduces the computational complexity while
maintaining the same encoding quality, since the refining search is
now done for every macroblock instead of only the small percentage of
macroblocks refined when using full search.
Tests on the test set showed a 3.4% encoding speed increase with no
PSNR or SSIM loss.
Change-Id: Ife907d7eb9544d15c34f17dc6e4cfd97cb743d41
The existing emulation of posix semaphores on Windows uses SetEvent()
and WaitForSingleObject(), which implements a binary semaphore, not a
counting semaphore as implemented by posix. This causes deadlock when
used with the expected posix semantics. Instead, this patch uses the
CreateSemaphore() and ReleaseSemaphore() calls (introduced in Windows
2000) which have the expected behavior.
This patch also reverts commit eb16f00, which split a semaphore that
was being used with counting semantics into two binary semaphores.
That commit is unnecessary with corrected emulation.
Change-Id: If400771536a27af4b0c3a31aa4c4e9ced89ce6a0
This patch is to fix a rare hang in multi-thread encoder that was
only seen on Windows. Thanks for John's help in debugging the
problem. More testing is needed.
Change-Id: Idb11c6d344c2082362a032b34c5a602a1eea62fc
The commit also removed the slow ssim calculation that uses a 7x7
kernel, and revised the comments to better describe how sample ssim
values are computed and averaged
Change-Id: I1d874073cddca00f3c997f4b9a9a3db0aa212276
Renamed configure option "enable-psnr" to "enable-internal-stats" to
better reflect the purpose of the option and eliminate the confusion
reported in http://code.google.com/p/webm/issues/detail?id=35
Change-Id: If72df6fdb9f1e33dab1329240ba4d8911d2f1f7a
Combine calc_iframe_target_size, previously only used for forced
keyframes, with calc_auto_iframe_target_size, which handled most
keyframes.
Change-Id: I227051361cf46727caa5cd2b155752d2c9789364
This is a first step in cleaning up the redundancies between
vp8_calc_{auto_,}iframe_target_size. The pick_frame_size() function is
moved to ratectrl.c, and made to be the primary interface. This means
that the various calc_*_target_size functions can be made private.
Change-Id: I66a9a62a5f9c23c818015e03f92f3757bf3bb5c8
the decision to run the regular or simple loopfilter is made outside the
function and managed with pointers
stop tracking the option in two places. use filter_type exclusively
Change-Id: I39d7b5d1352885efc632c0a94aaf56b72cc2fe15
Rather than using a default size of 1/2 or 3/2 seconds for the first
frame, use a fraction of the initial buffer level to give the
application some control.
This will likely undergo further refinement as size limits on key
frames are currently under discussion on codec-devel@, but this gives
much better behavior for small buffer sizes as a starting point.
Change-Id: Ieba55b86517b81e51e6f0a9fe27aabba295acab0
The arguments to these fprintfs are int not long int so
the format specifier should be "%d" and not "%ld". This
was writing garbage in the linux build.
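A small illustration of the matching specifier. The helper name is made up for this example:

```c
#include <stdio.h>

/* "%d" matches an int argument. Passing an int where "%ld" expects a
 * long is undefined behavior and can print garbage where the two types
 * differ in size. */
static int format_count(char *out, size_t out_size, int count)
{
    return snprintf(out, out_size, "%d", count);
}
```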
Change-Id: I3d2aa8a448d52e6dc08858d825bf394929b47cf3
Golden and ALT reference buffers were refreshed by copying from
the new buffer. Replaced this by index manipulation.
Also moved all the reference frame updates to one function for
easier tracking.
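A sketch of refreshing a reference by index rather than by copying, assuming a small pool of frame buffers with per-buffer reference counts. The struct, field names, and helpers are hypothetical:

```c
#define NUM_FB 4   /* hypothetical pool size */

typedef struct {
    int fb_ref_cnt[NUM_FB]; /* reference slots pointing at each buffer */
    int new_idx;            /* buffer holding the frame just coded */
    int gld_idx;            /* buffer currently used as golden */
} ref_state;

/* Point a reference slot at buffer new_idx and adjust the counts. */
static void retarget(int *ref_cnt, int *slot, int new_idx)
{
    if (ref_cnt[*slot] > 0)
        ref_cnt[*slot]--;
    *slot = new_idx;
    ref_cnt[new_idx]++;
}

/* Refresh golden from the new frame without copying any pixels. */
static void refresh_golden(ref_state *s)
{
    retarget(s->fb_ref_cnt, &s->gld_idx, s->new_idx);
}
```

A buffer whose count drops to zero is free to be reused for the next frame.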
Change-Id: Icd3e534e7e2c8c5567168d222e6a64a96aae24a1
Remove tot_key_frame_bits and prior_key_frame_size[] as they were
tracked but never used. Remove intra_frame_target, as it was only
used to initialize prior_key_frame_size.
Refactor vp8_adjust_key_frame_context() some to remove unnecessary
calculations.
Change-Id: Icbc2c83d2b90e184be03e6f9679e678f3a4bce8f
This patch cleans up the source buffer storage and copy mechanism to
allow access through a standard push/pop/peek interface. This approach
also avoids an extra copy in the case where the source is not a
multiple of 16, fixing issue #102.
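The push/pop/peek interface can be pictured as a small ring buffer. This is only a sketch: the names are hypothetical and ints stand in for frame buffers:

```c
#define MAX_LAG 4   /* hypothetical queue depth */

typedef struct {
    int frames[MAX_LAG];   /* stands in for source frame buffers */
    int read_idx, write_idx, size;
} lookahead;

/* Append a frame. Returns -1 if the queue is full. */
static int la_push(lookahead *q, int frame)
{
    if (q->size == MAX_LAG)
        return -1;
    q->frames[q->write_idx] = frame;
    q->write_idx = (q->write_idx + 1) % MAX_LAG;
    q->size++;
    return 0;
}

/* Remove the oldest frame. Returns -1 if the queue is empty. */
static int la_pop(lookahead *q, int *frame)
{
    if (q->size == 0)
        return -1;
    *frame = q->frames[q->read_idx];
    q->read_idx = (q->read_idx + 1) % MAX_LAG;
    q->size--;
    return 0;
}

/* Read the frame `index` places after the oldest without removing it. */
static int la_peek(const lookahead *q, int index, int *frame)
{
    if (index < 0 || index >= q->size)
        return -1;
    *frame = q->frames[(q->read_idx + index) % MAX_LAG];
    return 0;
}
```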
Change-Id: I05808c39f5743625cb4c7af54cc841b9b10fdbd9
MV sad cost error is only used in full-pixel motion search,
which only needs full-pixel resolution instead of quarter-pixel
resolution. This change reduces the mvsadcost table size, and
removes unnecessary parameter passing since this table is
constant once it is generated.
Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.
These symbols were identified by:
$ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
| sort | grep '^ *1 '
Change-Id: I59609f58ab65312012c047036ae1e0634f795779
Clean up vp8_init_config() a bit and remove null pointer case,
as this code can't be called any more and is not an adequate
trap anyway, as a null pointer would cause exceptions before
hitting the test.
Change-Id: I937c00167cc039b3aa3f645f29c319d58ae8d3ee
Issue 291 highlighted the fact that CQ mode was not working
as expected in 1 pass mode.
This commit fixes that specific problem but in so doing I also
uncovered an overflow issue in the VBR code for 1 pass and
some data values not being correctly initialized.
For some clips (particularly short clips), the resulting
improvement is dramatic.
Change-Id: Ieefd6c6e4776eb8f1b0550dbfdfb72f86b33c960
In multithreaded mode the loopfilter is running in its own thread (filter level
calculation and frame filtering). Filtering is mostly done in parallel with the
bitstream packing. Before starting the packing the loopfilter level has
to be calculated. Also any needed reference frame copying is done in the
filter thread.
Currently the encoder will create n+1 threads, where n > 1 is the number of
threads specified by application and 1 is the extra filter thread. With n = 1
the encoder runs in single thread mode. There will never be more than n threads
running concurrently.
Change-Id: I4fb29b559a40275d6d3babb8727245c40fba931b
The firstpass motion map consists of an 8-bit flag for
each MB indicating how strongly the firstpass code
believes it should be filtered during the second pass
ARNR filtering.
For long or large format material the motion map can
become extremely large and hamper the operation of
the encoding process.
This change removes the motion map altogether, leaving
the second pass to rely on the magnitude of the motion
compensated error to determine the filter weight to
use for the MB during ARNR filtering.
Tests on the derf set indicate that the effect of this
change is neutral, with some small wins and losses. The
motion map has therefore been removed based on
a cost/benefit evaluation.
Change-Id: I53e07d236f5ce09a6f0c54e7c4ffbb490fb870f6
Currently, when the video frame width is not a multiple of 16, the
source buffer has a stride that is not a multiple of 16, which forces
an unaligned load in the SAD function and hurts performance. To
avoid that, this change allocates source buffers with a stride that
is a multiple of 16.
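The rounding involved is the usual power-of-two round-up. The helper name is illustrative:

```c
/* Round a width up to the next multiple of 16. */
static int align_to_16(int width)
{
    return (width + 15) & ~15;
}
```

A width of 180, for example, becomes 192, while 176 is already aligned and stays unchanged.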
Change-Id: Ib7506e3eb2cea06657d56be5a899f38dfe3eeb39
checks added to make sure that cpi->tplist
is freed correctly in vp8_dealloc_compressor_data
and vp8_alloc_compressor_data.
Change-Id: I66149dbbd25c958800ad94f4379d723191d9680d
As mentioned in check-in "Improve motion search in real-time mode",
MV prediction calculation causes speed loss for speed 7 and above.
This change added a flag to turn off this calculation for speed>6
in real-time mode.
Change-Id: I9f4ae5a8bf449222d1784b54e7d315fc8347b2d1
Applied better MV prediction in real-time mode, which improves
the encoding quality.
Used quarter-pixel search instead of iterative sub-pixel search
for speed >=5 to improve encoding performance.
Tests on the test set showed:
1. For speed=-5, quality improvement: 1.7% on AvgPSNR and 2.1%
on SSIM, performance improvement: 3.6% (this includes the
performance loss caused by the MV prediction calculation in "Improve
MV prediction in vp8_pick_inter_mode() for speed>3").
2. For speed=-8, quality improvement: 2.1% on AvgPSNR and 2.5%
on SSIM, but a 6.9% performance decrease because of the MV prediction
calculation. This should be improved later.
Change-Id: I349a96c452bd691081d8c8e3e54419e7f477bebd
Created a new speed 1 which is in the middle of the old
speed 0 and speed 1. (for both quality and performance)
Change-Id: I4802133cdb43f359ca787646c090899679dd5d84
The encoder was not correctly catching transitions in the quantizer
deltas. If a delta_q was set, then the quantizer would be reinitialized
on every frame, but if they transitioned to 0, the quantizer would
not be reinitialized, leading to an encode-decode mismatch.
This bug was triggered by commit 999e155, which sets a Y2 delta Q
for very low base Q levels.
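The key point is to trigger on any change in the delta, including a change back to 0, rather than on the delta being non-zero. A minimal sketch with a hypothetical helper:

```c
/* Store a new delta_q and report whether it differs from the previous
 * one. A transition to 0 counts as a change, so the caller knows the
 * quantizer must be reinitialized. */
static int update_delta_q(int *stored, int new_delta)
{
    int changed = (*stored != new_delta);
    *stored = new_delta;
    return changed;
}
```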
Change-Id: Ia6733464a55ee4ff2edbb82c0873980d345446f5
When auto keyframe insertion is enabled and conditions are right (scene change)
the encoder can decide to insert a key frame and does a re-encoding. This can
introduce extra latency. In RT mode we do not do the re-encoding of the current
frame but force the next frame to key frame.
Change-Id: I15c175fa845ac4c1a1f18bea3676e154669522a7
Reduce the number of sync points by letting each thread
continue immediately with a new MB row.
Better multicore scaling, improves performance by 5-20% on ARM multicore.
Change-Id: Ic97e4d1c4886a842c85dd3539a93cb217188ed1b
The code previously tested cpi->common.refresh_alt_ref_frame
but there are situations where this flag may be set for viewable frames.
The correct test should be !cm->show_frame.
Change-Id: Ia1a600622992a4a68fe1d38ac23bf6b34b133688
This commit also removes artificial RDMULT cap for low quantizers.
The intention is to address some abnormal behavior of mode selections
at the low quantizer end, where many macroblocks were coded with
SPLITMV with all partitions using same motion vector including (0,0).
This change improves the compression quality substantially for high
quality encodings in both PSNR and SSIM terms. Overall effect on
mid/low rate range is also positive for all metrics, but smaller
in magnitude.
Change-Id: I864b29c4bd9ff610d2545fa94a19cc7e80c02667
Adjust checking points in motion vector prediction to better cover
possible movements, and get a better prediction. Tests on test
clips showed a 0.1% improvement in SSIM, and no change in PSNR
and performance.
Change-Id: Ifdab05d35e10faea1445c61bb73debf888c9d2f8
These changes are specifically targeted at fade transitions to
static scenes. Here we want to place a GF/ARF immediately
after the fade and prevent an ARF just before the fade.
Also some code lines and comment lines shortened to 80 chars
while I was there.
Change-Id: Iefdc09a4fa7b265048fc017246b73e138693950f
This code fixes a bug in the calculation of
the minimum Q for alt ref frames.
It also allows an extended gf/arf interval for sections
of clips that are completely static (or nearly so).
Change-Id: I1a21aaa16d4f0578e5f99b13bebd78d59403c73b
Remove allocation/deallocation of stats storage.
Remove full search functions in machine specific encoder inits.
Remove last pass validation in validate_config.
Change-Id: I7f29be69273981a4fef6e80ecdb6217c68cbad4e
The CQ level was not using the q_trans[] array to convert
to a 0-127 range as per min and maxq.
Experimental change to try and match the reconstruction
error for forced key frames approximately to that of the
previous frame by means of the recode loop. Though this
may cause extra recodes and the recode behavior has not
been optimized, it can only happen on forced key frames.
Change-Id: I1f7e42d526f1b1cb556dd461eff1a692bd1b5b2f
This change is designed to try and reduce pulsing effects when moving
from a complex transition, like a fade, into an easy or static section
in an otherwise difficult clip in CQ mode.
The active CQ level is relaxed down to the user entered level for frames that
are generating less than the passed in minimum bandwidth.
Change-Id: Id6d8b551daad4f489c087bd742bc95418a95f3f0
Fixed discrepancy cpi->ni_frames vs cm->current_video_frame > 150.
Make one pass path explicit.
There is still scope for some odd behaviour around the transition
point at cpi->ni_frames > 150.
Change-Id: Icdee130fe6e2a832206d30e45bf65963edd7a74d
Where a key frame occurs because of a minimum interval
selected by the user, then these forced key frames ideally need
to be more closely matched in quality to the surrounding frame.
Change-Id: Ia55b1f047e77dc7fbd78379c45869554f25b3df7
Add a flag to always enable block4x4 search for speed=0 (good
quality) to guarantee no quality loss for speed0.
Change-Id: Ie04bbc25f7e6a33a7bfa30e05775d33148731c81
Further experiment with restriction of the Q range.
This uses the average non KF/GF/ARF quantizer, instead
of just relying on the initial value. It is not such a strong constraint
but there may be a reduced risk of rate misses.
Change-Id: I424fe782a37a2f4e18c70805e240db55bfaa25ec
The merge includes hooks for CQ mode and other code
changes merged from the test branch.
CQ mode attempts to maintain a more stable quantizer within a clip
whilst also trying to adhere to a guideline maximum bitrate.
The existing target data rate parameter is used to specify the
guideline maximum bitrate.
A new parameter allows the user to specify a target CQ level.
For normal (non kf/gf/arf) frames, the quantizer will not drop BELOW the
user specified value (0-63). However, in some cases the encoder may
choose to impose a target CQ that is above that specified by the user,
if it estimates that consistent use of the target value is not compatible
with guideline maximum bitrate.
Change-Id: I2221f9eecae8cc3c431d36caf83503941b25e4c1
cpi->target_bits_per_mb is currently not being used,
so delete it. Also removed other unused code in rdopt.c.
Change-Id: I98449f9030bcd2f15451d9b7a3b9b93dd1409923
The following features don't make sense for the first
pass in its current form and have a significant impact on its
speed (up to 50%).
Slow quantizer, slow dct and trellis optimization.
Change-Id: Id9943f6765ffbd71fc0084ec7dfbc9d376fd6fcd
Scott pointed out that last_frame_type only gets updated while
loopfilter exists. Since last_frame_type is also needed in
motion search now, it needs to be updated every frame.
Change-Id: I9203532fd67361588d4024628d9ddb8e391ad912
Use the fast quantizer for inter mode selection and the
regular quantizer for the rest of the encode for good quality,
speed 1. Both performance and quality were improved. The
quality gains will make up for the quality loss mentioned in
I9dc089007ca08129fb6c11fe7692777ebb8647b0.
Change-Id: Ia90bc9cf326a7c65d60d31fa32f6465ab6984d21
allow for optimized versions of apply_temporal_filter
(now vp8_apply_temporal_filter_c)
the function was previously declared as static and appears to have been
inlined. with this change, that's no longer possible. performance takes
a small hit.
the declaration for vp8_cx_temp_filter_c was moved to onyx_if.c because
of a circular dependency. for rtcd, temporal_filter.h holds the
definition for the rtcd table, so it needs to be included by onyx_int.h.
however, onyx_int.h holds the definition for VP8_COMP which is needed
for the function prototype. blah.
Change-Id: I499c055fdc652ac4659c21c5a55fe10ceb7e95e3
In SPLITMV, the 8x8 segment will be checked first. If the 8x8 rd
is better than the best, we check the other segments. Otherwise
bail. Adjustments to the thresh_mult were necessary to make
up for the initial quality loss.
The performance improved by 20% (average) for good quality,
speed 0 and speed 1, while the overall quality remained the same.
Change-Id: I717aef401323c8a254fba3e9777d2a316c774cc3
The MV's range is 256. Since the new motion search uses a different
starting MV than the center ref MV, a MV range checking needs to
be done to avoid corruption.
Change-Id: I8ae0721d1bd203639e13891e2e54a2e87276f306
This code is unused, as the current preproc implementation uses the
same spatial filter that postproc uses.
Change-Id: Ia06d5664917d67283f279e2480016bebed602ea7
Corrected the initial Q range limits for the recode loop
to reflect the current allowed range for the frame.
In experimental work on constrained quality this bug was
causing unnecessary recodes.
Change-Id: I7e256fbfa681293b0223fe21ec329933d76c229f
Deallocating the buffers before re-allocating them.
The fix passed James Berry's test program for memory
leak check.
Change-Id: I18c3cf665412c0e313a523e3d435106c03ca438d
The inter_minq table controls the range of quantizers available
for a particular frame in two pass relative to a max Q value.
The change reduces the range somewhat. The effect of this
was a small increase (0.3% average) in psnr for the test set
but it should also help encode speed somewhat for higher
quality modes as it will reduce the number of iterations in the
recode loop.
The change damps the range of quantizers available locally
within a section of a clip and should therefore help keep quality
more uniform. If there is systematic overshoot or undershoot the
range can shift gradually to accommodate. However, there is
some increased risk of overshoot or undershoot against the target
bit rate in VBR mode and this risk will be more pronounced for short
clips.
Change-Id: I84465567d49ae767c6c73ff2a2aac30c895adb52
Add vp8_mv_pred() to better predict starting MV for NEWMV
mode in vp8_rd_pick_inter_mode(). Set different search
ranges according to MV prediction accuracy, which improves
encoder performance without hurting the quality. Also,
as Yaowu suggested, using diamond search result as full
search starting point and therefore adjusting(reducing)
full search range helps the performance.
Change-Id: Ie4a3c8df87e697c1f4f6e2ddb693766bba1b77b6
On a keyframe alt ref and golden are refreshed. The flag was
not being set and so on the frame after a keyframe, motion
search would occur on the alt ref frame. This is not necessary
because the alt ref frame is identical to the last frame in this
scenario.
Handle corner case where a forward alt-ref frame is put
directly after a keyframe.
Change-Id: I9be4cf290d694f8cf2f9a31852014b5ccf1504d3
Replaced existing code to decide if a frame recode is required
with a function call. This is to simplify addition of extra clauses
that may be needed for the planned constrained quality mode.
Also fixed a bug whereby the alt ref was not considered in the test.
Change-Id: I3d40bb21abe3e19f8456761e6849deb171738b60
The fast quantizer assembly code has not been updated to match the new
exact quantizer, which was made the default in commit 6adbe09.
Specifically, they are not aware of the potential for the coefficient
to be scaled, which results in the quantized result exceeding the range
of the DCT. This patch restores the previous behavior of using the
non-shifted coefficients when in the fast quantizer code path, but
unfortunately requires rebuilding the tables when switching between the
two.
Change-Id: I0a33f5b3850335011a06906f49fafed54dda9546
Debugging in postproc needs more flags to allow for specific
block types to be turned on or off in the visualizations.
Must be enabled with --enable-postproc-visualizer during
configuration time.
Change-Id: Ia74f357ddc3ad4fb8082afd3a64f62384e4fcb2d
Small changes to the default zero bin and rounding tables.
Though the tables are currently the same for the Y1 and Y2 cases
I have left them as separate tables in case we want to tune this later.
There is now some adjustment of the zbin based on the prediction mode.
Previously this was restricted to an adjustment for gf/arf 0,0 MV.
The exact quantizer now marginally outperforms and is the default.
The overall average gain is about 0.5%
Change-Id: I5e4353f3d5326dde4e86823684b236a1e9ea7f47
Change Ice204e86 identified a problem with bitrate undershoot due to
low precision in the timestamps passed to the library. This patch
takes a different approach by calculating the duration of this frame
and passing it to the library, rather than using a fixed duration
and letting the library average it out with higher precision
timestamps. This part of the fix only applies to vpxenc.
This patch also attempts to fix the problem for generic applications
that may have made the same mistake vpxenc did. Instead of
calculating this frame's duration by the difference of this frame's
and the last frame's start time, we use the end times instead. This
allows the framerate calculation to scavenge "unclaimed" time from
the last frame. For instance:
start | end   | calculated duration
======+=======+====================
0ms   | 33ms  | 33ms
33ms  | 66ms  | 33ms
66ms  | 99ms  | 33ms
100ms | 133ms | 34ms
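The end-time calculation in the table above can be sketched as follows. The tracker struct and function name are hypothetical:

```c
#include <stdint.h>

typedef struct {
    int64_t last_end;   /* end time of the previous frame */
} duration_tracker;

/* Duration is measured from the previous frame's end to this frame's
 * end, so any gap before this frame's start is absorbed into it. */
static int64_t frame_duration(duration_tracker *t, int64_t end)
{
    int64_t d = end - t->last_end;
    t->last_end = end;
    return d;
}
```

Fed the end times 33, 66, 99 and 133 with last_end starting at 0, this gives 33, 33, 33 and 34, matching the table: the 1ms gap between 99ms and 100ms is picked up by the fourth frame.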
Change-Id: I92be4b3518e0bd530e97f90e69e75330a4c413fc
Use mpsadbw, and calculate 8 sad at once. Function list:
vp8_sad16x16x8_sse4
vp8_sad16x8x8_sse4
vp8_sad8x16x8_sse4
vp8_sad8x8x8_sse4
vp8_sad4x4x8_sse4
(test clip: tulip)
For best quality mode, this gave encoder a 5% performance boost.
For good quality mode with speed=1, this gave encoder a 3%
performance boost.
Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
NEON has optimized 16x16 half-pixel variance functions, but they
were not part of the RTCD framework. Add these functions to RTCD,
so that other platforms can make use of this optimization in the
future and special-case ARM code can be removed.
A number of functions were taking two variance functions as
parameters. These functions were changed to take a single
parameter, a pointer to a struct containing all the variance
functions for that block size. This provides additional flexibility
for calling additional variance functions (the half-pixel special
case, for example) and by initializing the table for all block sizes,
we don't have to construct this function pointer table for each
macroblock.
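A sketch of the single-parameter approach, using a 4x4 block to keep it short. The struct layout, field names, and helper functions are illustrative and do not reflect the actual RTCD declarations:

```c
#include <stdlib.h>

typedef unsigned int (*sad_fn)(const unsigned char *src, int src_stride,
                               const unsigned char *ref, int ref_stride);
typedef unsigned int (*var_fn)(const unsigned char *src, int src_stride,
                               const unsigned char *ref, int ref_stride,
                               unsigned int *sse);

/* Hypothetical per-block-size table of error functions. */
typedef struct {
    sad_fn sad;
    var_fn variance;
} block_fn_table;

static unsigned int sad4x4_c(const unsigned char *src, int src_stride,
                             const unsigned char *ref, int ref_stride)
{
    unsigned int sum = 0;
    for (int r = 0; r < 4; r++)
        for (int c = 0; c < 4; c++)
            sum += abs(src[r * src_stride + c] - ref[r * ref_stride + c]);
    return sum;
}

static unsigned int var4x4_c(const unsigned char *src, int src_stride,
                             const unsigned char *ref, int ref_stride,
                             unsigned int *sse)
{
    int sum = 0;
    unsigned int sq = 0;
    for (int r = 0; r < 4; r++) {
        for (int c = 0; c < 4; c++) {
            int d = src[r * src_stride + c] - ref[r * ref_stride + c];
            sum += d;
            sq += (unsigned int)(d * d);
        }
    }
    *sse = sq;
    /* variance * N = sse - sum^2 / N, with N = 16 pixels */
    return sq - (unsigned int)((sum * sum) >> 4);
}

/* Built once per block size rather than once per macroblock. */
static const block_fn_table fn_4x4 = { sad4x4_c, var4x4_c };

/* Callers take the whole table instead of separate function pointers. */
static unsigned int block_error(const block_fn_table *fns,
                                const unsigned char *src, int src_stride,
                                const unsigned char *ref, int ref_stride)
{
    unsigned int sse;
    return fns->variance(src, src_stride, ref, ref_stride, &sse);
}
```

Adding another entry, such as a half-pixel variant, then only touches the struct and its initializers, not every caller's signature.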
Change-Id: I78289ff36b2715f9a7aa04d5f6fbe3d23acdc29c
cppcheck found a leaked file descriptor in the debugging code
enabled by defining ENTROPY_STATS. Fixes issue #60.
Change-Id: I0c1d0669cb94d44fed77860f97b82763be06b7cb
The primary goal is to allow a binary to be built which supports
NEON, but can fall back to non-NEON routines, since some Android
devices do not have NEON, even if they are otherwise ARMv7 (e.g.,
Tegra).
The configure-generated flags HAVE_ARMV7, etc., are used to decide
which versions of each function to build, and when
CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen
at run time.
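As an illustration of the intended build-time plus run-time selection,
a minimal sketch is shown here. The capability bit, the table type and
the function names are all made up for the example, and the "NEON"
routine is a plain C stand-in so that the sketch builds on any host.

```c
#include <stdlib.h>

#ifndef HAVE_ARMV7
#define HAVE_ARMV7 1 /* normally set by configure; forced on for this sketch */
#endif

#define CPU_HAS_NEON 0x1 /* hypothetical capability bit */

typedef unsigned int (*sad_fn)(const unsigned char *a,
                               const unsigned char *b, int n);

/* Portable C version, always built. */
unsigned int sad_c(const unsigned char *a, const unsigned char *b, int n)
{
    unsigned int sum = 0;
    for (int i = 0; i < n; i++)
        sum += (unsigned int)abs(a[i] - b[i]);
    return sum;
}

/* Stand-in for an assembly routine that would only be built when
 * HAVE_ARMV7 is set. */
unsigned int sad_neon(const unsigned char *a, const unsigned char *b, int n)
{
    return sad_c(a, b, n);
}

typedef struct {
    sad_fn sad;
} rtcd_table;

/* Start from the C version, then upgrade only if the routine was
 * built in AND the running CPU reports the feature. */
void rtcd_init(rtcd_table *rtcd, int cpu_flags)
{
    rtcd->sad = sad_c;
#if HAVE_ARMV7
    if (cpu_flags & CPU_HAS_NEON)
        rtcd->sad = sad_neon;
#else
    (void)cpu_flags;
#endif
}
```

On a Tegra-class ARMv7 part without NEON, cpu_flags would lack the
bit and the C fallback stays in place.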
In order for this to work, the CFLAGS must be set to something
appropriate (e.g., without -mfpu=neon for ARMv7, and with
appropriate -march and -mcpu for even earlier configurations), or
the native C code will not be able to run.
The ASFLAGS must remain set for the most advanced instruction set
required at build time, since the ARM assembler will refuse to emit
those instructions otherwise.
I have not attempted to make any changes to configure to do this
automatically.
Doing so will probably require the addition of new configure options.
Many of the hooks for RTCD on ARM were already there, but a lot of
the code had bit-rotted, and a good deal of the ARM-specific code
is not integrated into the RTCD structs at all.
I did not try to resolve the latter, merely to add the minimal amount
of protection around them to allow RTCD to work.
Those functions that were called based on an ifdef at the calling
site were expanded to check the RTCD flags at that site, but they
should be added to an RTCD struct somewhere in the future.
The functions invoked with global function pointers still are, but
these should be moved into an RTCD struct for thread safety (I
believe every platform currently supported has atomic pointer
stores, but this is not guaranteed).
The encoder's boolhuff functions did not even have _c and armv7
suffixes, and the correct version was resolved at link time.
The token packing functions did have appropriate suffixes, but the
version was selected with a define, with no associated RTCD struct.
However, for both of these, the only armv7 instruction they actually
used was rbit, and this was completely superfluous, so I reworked
them to avoid it.
The only non-ARMv4 instruction remaining in them is clz, which is
ARMv5 (not even ARMv5TE is required).
Considering that there are no ARM-specific configs which are not at
least ARMv5TE, I did not try to detect these at runtime, and simply
enable them for ARMv5 and above.
Finally, the NEON register saving code was completely non-reentrant,
since it saved the registers to a global, static variable.
I moved the storage for this onto the stack.
A single binary built with this code was tested on an ARM11 (ARMv6)
and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder,
and produced identical output, while using the correct accelerated
functions on each.
I did not test on any earlier processors.
Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa
The first implementation of the firstpass motion map for motion
compensated temporal filtering created a file, fpmotionmap.stt,
in the current working directory. This was not safe for multiple
encoder instances. This patch merges this data into the first pass
stats packet interface, so that it is handled like the other
(numerical) firstpass stats.
The new stats packet is defined as follows:
Numerical Stats (16 doubles) -- 128 bytes
Motion Map -- 1 byte / Macroblock
Padding -- to align packet to 8 bytes
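The packet size implied by this layout can be sketched as below. The
function name is hypothetical, and the code assumes an 8-byte double,
which is what makes the numerical stats 128 bytes.

```c
#include <stddef.h>

#define NUM_STATS 16 /* numerical stats, stored as doubles */

/* Size in bytes of one first pass stats packet for a frame of
 * mb_rows x mb_cols macroblocks: stats, then one motion map byte per
 * macroblock, then padding up to the next multiple of 8. */
size_t stats_packet_size(size_t mb_rows, size_t mb_cols)
{
    size_t size = NUM_STATS * sizeof(double) + mb_rows * mb_cols;
    return (size + 7) & ~(size_t)7;
}
```

For a 640x480 frame (30 rows of 40 macroblocks) this gives
128 + 1200 = 1328 bytes, which is already a multiple of 8.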
The fpmotionmap.stt file can still be generated for debugging
purposes in the same way that the textual version of the stats
are available (defining OUTPUT_FPF in firstpass.c)
Change-Id: I083ffbfd95e7d6a42bb4039ba0e81f678c8183ca
When a subsequent frame is encoded as an alt reference frame, it is
unlikely that any mb in current frame will be used as reference for
future frames, so we can enable quantization optimization even when
the RD constant is slightly rate-biased. The change yields an overall
bit saving of 0.1% to 0.2% on the test sets, based on
vpxssim scores.
Change-Id: I9aa7bc5cd573ea84e3ee655d2834c18c4460ceea
This uses MB variance to change the RDO weight for mode decision
and quantization.
Activity is normalized against the average for the frame, which is
currently tracked using feed-forward statistics.
This could also be used to adjust the quantizer for the entire
frame, but that requires more extensive rate control changes.
This does not yet attempt to adapt the quantizer within the frame,
but the signaling cost means that will likely only be useful at
very high rates.
Change-Id: I26cd7c755cac3ff33cfe0688b1da50b2b87b9c93
This is just eliminating some cruft.
Although a number of variables are declared only when INTRARDOPT
is defined, they are used elsewhere without that protection, and
no longer just for intra RDO.
The intra_rd_opt flag was hard-coded to 1 and never checked.
Change-Id: I83a81554ecee8053e7b4ccd8aa04e18fa60f8e4f
This code adjusts the impact of the amount and speed of motion
on GF and KF boost.
Sections with lots of slow motion will tend to have a
somewhat bigger boost and sections with fast motion may
have less.
There is a knock-on effect on the selection of the active
quantizer range.
This will likely require further tuning but helps with a couple
of particularly bad edge cases.
Change-Id: Ic2449cda7305672b69acf42fc0a845b77ac98d40
If temporal filtering is enabled but a filter type is not specified,
centered filter mode is used by default.
Change-Id: I87306f267c1390074c806c506a69b4ba914d92a2
This function graduated from being a test func to something that's on
by default. Rename it and remove some spurious comments that confuse
its status.
Change-Id: I689695a3ad29c35e9a72a43ec93766733ac6c20b
Loopfilter deltas are initialized to zero on keyframes in the decoder.
The values then persist from the previous frame unless an update bit
is set in the bitstream. This data is not included in the entropy
data saved by the 'refresh entropy' bit in the bitstream, so it is
effectively an additional contextual element beyond the 3 ref-frames
and the entropy data.
The encoder was treating this delta update bit as update-if-nonzero,
meaning that the value would be refreshed even if it hadn't changed,
and more significantly, if the correct value for the delta changed
to zero, the update wouldn't be sent, and the decoder would preserve
the last (presumably non-zero) value.
This patch updates the encoder to send an update only if the value
has changed from the previously transmitted value. It also forces the
value to be transmitted in error resilient mode, to account for lost
context in the event of lost frames.
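A simplified sketch of the send-on-change rule follows. The struct,
the function name and the choice of four deltas are illustrative only
and do not mirror the encoder's actual data structures.

```c
#include <string.h>

#define NUM_DELTAS 4 /* illustrative count */

typedef struct {
    signed char current[NUM_DELTAS];   /* values the encoder wants in effect */
    signed char last_sent[NUM_DELTAS]; /* values the decoder currently holds */
} lf_delta_state;

/* Sets update[i] to 1 for each delta that must be transmitted and
 * returns how many were flagged. */
int lf_deltas_to_send(lf_delta_state *s, int key_frame, int error_resilient,
                      unsigned char update[NUM_DELTAS])
{
    int count = 0;

    /* The decoder resets its deltas to zero on keyframes. */
    if (key_frame)
        memset(s->last_sent, 0, sizeof(s->last_sent));

    for (int i = 0; i < NUM_DELTAS; i++) {
        /* Compare against what was last transmitted, not against zero;
         * in error resilient mode always resend. */
        update[i] = (unsigned char)(error_resilient ||
                                    s->current[i] != s->last_sent[i]);
        if (update[i]) {
            s->last_sent[i] = s->current[i];
            count++;
        }
    }
    return count;
}
```

With this rule a delta that changes back to zero is still sent, and
an unchanged non-zero delta is not resent.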
Change-Id: I56671d5b42965d0166ac226765dbfce3e5301868
Create look up tables for controlling the active quantizer range.
Some initial tuning improves quality by circa 0.5% on the test set.
Clean up some stats output code.
Change-Id: Ia698a8525f8b8129a503cadace3ee73fe888f543
Modified AltRef temporal filter to adapt filter length based
on macroblock coding modes selected during first-pass
encode.
Also added sub-pixel motion compensation to the AltRef
filter.
This patch avoids compiling some debugging code in onyx_if.c. The most
significant fix is to avoid generating code for vp8_write_yuv_frame,
which is never called. Some other code was removed by the dead code
elimination performed by the compiler, and this patch does it with the
preprocessor instead. There are advantages both ways.
Change-Id: I044fd43179d2e947553f0d6f2cad5b40907ac458
When ARFs are enabled in non-lagged compress modes, the GF interval
was being reset to zero. Non-lagged ARF updates were enabled in commit
63ccfbd, but this incorrect GF interval caused a quality regression.
Change-Id: I615c3b493f4ce2127044f4e68d0bcb07d6b730c3
Changes 'The VP8 project' to 'The WebM project', for consistency
with other webmproject.org repositories.
Fixes issue #97.
Change-Id: I37c13ed5fbdb9d334ceef71c6350e9febed9bbba
vp8_get_compressed_data() was defeating logic in
encode_frame_to_datarate() that determined the reference buffers to
search and forcing all frames to be eligible to search. In cases
where buffers have identical contents, this is unnecessary extra
work.
Change-Id: I9e667ac39128ae32dc455a3db4c62e3efce6f114
ARFs were explicitly disabled except in lagged compress mode. New
ARF logic allows for the ARF buffer to hold an older golden frame,
which does not require lagged compress.
Change-Id: I1dff82b6f53e8311f1e0514b1794ae05919d5f79
Moved partition_bmi and partition_count out of MB_MODE_INFO and
placed into MACROBLOCK. Also reduced the size of other members
of the MB_MODE_INFO struct. For 1080p, the memory was reduced
by 1,209,516 bytes. The decoder performance appeared to improve
by 3% for the clip used.
Note: The main goal for this change is to improve the decoder
performance. The encoder will be revisited at a later date for
further structure cleanup.
Change-Id: I4733621292ee9cc3fffa4046cb3fd4d99bd14613
These changes improve the behaviour of the code with
forced key frames sent in by a calling application.
The sizing of the frames is still suboptimal for two pass in
particular but the behaviour is much better than it was.
Change-Id: I35fae610c67688ccc69d11f385e87dfc884e65a1
The external API exposes the RC initial/optimal/full buffer level in
milliseconds, but this value was truncated internally to seconds. This
patch allows the use of the full precision during the conversion from
time to bits.
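The difference can be illustrated with the two conversions below.
Both function names are hypothetical; the second reproduces the
truncating behaviour described above for comparison.

```c
#include <stdint.h>

/* Full precision: multiply by the bandwidth before dividing by 1000. */
int64_t buffer_level_bits(int64_t level_ms, int64_t bits_per_second)
{
    return level_ms * bits_per_second / 1000;
}

/* Old behaviour for comparison: truncate to whole seconds first. */
int64_t buffer_level_bits_truncated(int64_t level_ms, int64_t bits_per_second)
{
    return (level_ms / 1000) * bits_per_second;
}
```

At 256000 bits per second, a 500ms buffer level becomes 128000 bits
with the first form and 0 bits with the second.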
Change-Id: If8dd2a87614c05747f81432cbe75dd9e6ed2f04e
vp8_update_gf_useage_maps() is only used by the encoder. This patch
fixes the ability to build in decode-only or encode-only
configurations.
Change-Id: I3a5211428e539886ba998e09e8abd747ac55c9aa
At the end of the decode, frame buffers were being copied.
The frames are not updated after the copy; they are kept only
for reference by later frames. This change allows multiple
references to the same frame buffer instead of copying it.
Changes needed to be made to the encoder to handle this. The
encoder is still doing frame buffer copies in similar places
where pointer reference could be done.
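One way to share buffers between reference slots is a small
reference-counted pool, sketched below. The names and the pool size
are assumptions made for the example and are not necessarily how the
decoder tracks its buffers.

```c
#define NUM_BUFFERS 4 /* illustrative pool size */

typedef struct {
    int ref_count[NUM_BUFFERS]; /* reference slots pointing at each buffer */
    int last_idx;               /* buffer index per reference slot, -1 = unset */
    int golden_idx;
    int alt_idx;
} frame_pool;

/* Point *slot at buffer new_idx without copying any pixels. */
void ref_assign(frame_pool *p, int *slot, int new_idx)
{
    if (*slot >= 0)
        p->ref_count[*slot]--;
    *slot = new_idx;
    p->ref_count[new_idx]++;
}

/* Returns a buffer that no slot references, or -1 if none is free. */
int get_free_buffer(const frame_pool *p)
{
    for (int i = 0; i < NUM_BUFFERS; i++)
        if (p->ref_count[i] == 0)
            return i;
    return -1;
}
```

After a keyframe all three slots can point at buffer 0, and the next
frame is decoded into whichever buffer get_free_buffer() returns.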
Change-Id: I7c38be4d23979cc49b5f17241ca3a78703803e66
Change submitted for Adrian Grange. Convert threshold
calculation in ARNR filter to a lookup table.
Change-Id: I12a4bbb96b9ce6231ce2a6ecc2d295610d49e7ec
Previously we had assumed that it was necessary to give a full frame's
bit allocation to the alt ref frame if it has been created through temporal
filtering. This is not the case. The active max quantizer control
ensures that sufficient bits are allocated if needed and allocating a
full frame's worth of bits creates an excessive overhead for the ARF.
Change-Id: I83c95ed7bc7ce0e53ccae6ff32db5a97f145937a
Corrected setting of "which_buffer" for U & V cases to match that
used for Y, i.e. to refer to the temporally most recent frame of
those to be filtered.
Change-Id: Idf94b287ef47a05f060da3e61134a0b616adcb6b
The new fdct lowers the round trip sum squared error for a
4x4 block to ~0.12, or ~0.008/pixel. For reference, the old
matrix multiply version has an average round trip error of 1.46
for a 4x4 block.
Thanks to "derf" for his suggestions and references.
Change-Id: I5559d1e81d333b319404ab16b336b739f87afc79
1. Unavailability of each reference frame type should be tested
independently,
2. Also, only the VP8_GOLD_FLAG needs to be tested before setting
golden frame specific thresholds, and only VP8_ALT_FLAG needs
testing before setting thresholds relevant to the AltRef frame.
(Raised by gbvalor, in response to Issue 47)
Change-Id: I6a06fc2a6592841d85422bc1661e33349bb6c3b8
Since the intent is
to reset the appropriate bit in ref_frame_flags, not to
test a logic condition. The prior result would always have
been ref_frame_flags being set to 0.
(Issue reported by dgohman, issue 47)
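The exact expression is not quoted in this message; the sketch below
assumes the usual form of this mistake, a logical NOT where a bitwise
NOT was intended. The flag values and function names are hypothetical.

```c
/* Hypothetical flag values for illustration. */
enum { FLAG_LAST = 1, FLAG_GOLD = 2, FLAG_ALT = 4 };

/* Intended: clear one bit and keep the others. */
int clear_flag(int flags, int flag)
{
    return flags & ~flag;
}

/* Assumed form of the bug: !flag is 0 for any non-zero flag, so the
 * AND wipes every bit. */
int clear_flag_buggy(int flags, int flag)
{
    return flags & !flag;
}
```

Clearing FLAG_GOLD from 7 gives 5 with the bitwise form and 0 with
the logical form.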
Change-Id: I2c12502ed74c73cf38e98c9680e0249c29e16433
When the license headers were updated, they accidentally contained
trailing whitespace, so unfortunately we have to touch all the files
again.
Change-Id: I236c05fade06589e417179c0444cb39b09e4200d
The Visual C++ compiler uses xmm registers for floating point
operations on 64-bit architectures, so its calling
convention requires any function that uses xmm6-xmm15 to
preserve them. However, the sse2
functions, which were originally written for 32-bit Windows,
may have used xmm6 and xmm7 without preserving their contents.
In this particular case, the compiler used xmm6 to hold
the variable "two_pass_min_rate", and its value
was clobbered by our sse2 optimized loop filter functions,
hence the mismatch between release and debug results.