Since BPRED will be tested at most once, and SPLITMV is not enabled,
there's nothing to clobber the subblock modes, so there's no need to
save and restore them.
Change-Id: I7c3615b69190c10bd068a44df5488d6e8b85a364
In activity masking, the RDO constant RDMULT is adjusted on a per-MB basis,
adapting to the activity within the MB. errorperbit, which is defined as
RDMULT/RDDIV, is a constant used in motion estimation. Previously, in
activity masking, errorperbit was not changed even when RDMULT was changed.
This commit adjusts errorperbit to follow the change in RDMULT.
Tests on the cif set showed a very small but consistent gain by all quality
metrics (average, overall psnr and ssim) when activity masking is on.
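A minimal sketch of the idea, not the encoder's actual code: the struct,
the function names, the num/den activity factor and the floor of 1 on
errorperbit are all assumptions made for illustration. It only shows
errorperbit being recomputed whenever RDMULT is rescaled.

```c
#include <assert.h>

/* Hypothetical, simplified per-macroblock RD state (not the encoder's struct). */
typedef struct {
    int rdmult;      /* rate-distortion multiplier */
    int rddiv;       /* rate-distortion divisor */
    int errorperbit; /* derived constant used by motion estimation */
} rd_state;

/* Recompute errorperbit = rdmult / rddiv. The floor of 1 is an assumption. */
static void update_errorperbit(rd_state *s) {
    assert(s->rddiv > 0);
    s->errorperbit = s->rdmult / s->rddiv;
    if (s->errorperbit == 0) s->errorperbit = 1;
}

/* Scale rdmult by a hypothetical activity factor num/den (rounded),
 * then refresh errorperbit so the two stay consistent. */
static void adjust_rd_for_activity(rd_state *s, int num, int den) {
    assert(den > 0);
    s->rdmult = (int)(((long long)s->rdmult * num + den / 2) / den);
    update_errorperbit(s);
}
```

With rdmult=300 and rddiv=100, doubling the activity factor moves
errorperbit from 3 to 6 instead of leaving it stale.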
Change-Id: I07ded3e852919ab76757691939fe435328273823
This change is analogous to I0b67dae1f8a74902378da7bdf565e39ab832dda7,
which made the move for the non-RD path.
Change-Id: If63fc1b0cd1eb7f932e710f83ff24d91454f8ed1
This commit moves the intra block mode selection from encodeframe.c
to pickinter.c (in the non-RD case). This allowed pick_intra_mbuv_mode
and pick_intra4x4mby_modes to be made static, and is a step towards
refactoring intra mode selection in the main pickinter loop. Gave a
small perf increase (~0.5%).
Change-Id: I0b67dae1f8a74902378da7bdf565e39ab832dda7
Some further re-structuring of activity masking code.
Still has various experimental switches.
Supports a metric based on intra encode.
Experimental comparison against a fixed activity target rather
than a frame average, for altering rd and zbin.
Overall the SSIM performance is similar to TT's original
code but there is a much smaller PSNR hit of circa
0.5% instead of 3.2%.
Change-Id: I0fd53b2dfb60620b3f74d7415e0b81c1ac58c39a
While investigating the effect of DC values on SAD and SSE in motion
estimation, a side finding indicated that the two tables of constants need
to be adjusted. The adjustment was done by multiplying the old constants by
90% with rounding. Also absorb the 1/2 scaling constant into the two
tables. Refer to change Ifa285c3e for background on the 1/2 factor.
Tests on the cif set showed a very small gain on all metrics.
Change-Id: I04333527a823371175dd46cb04a817e5b9a8b752
The encoder defined about 4 sets of similar functions to calculate sum,
variance or sse or a combination of them. This commit removes one set
of these functions, get8x8var and get16x16var, where calls to the latter
function are replaced with var16x16 by using the following fact on a 16x16 MB:
variance == sse - sum*sum/256
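The identity above can be sketched as follows. The function names here
are hypothetical stand-ins, not the encoder's var16x16; note that
"variance" in this sense is the un-normalized quantity given by the
formula, not divided by the pixel count.

```c
#include <assert.h>

#define MB_SIZE 16

/* Sum and sum of squared differences between two 16x16 blocks. */
static void sum_sse_16x16(const unsigned char *src, int src_stride,
                          const unsigned char *ref, int ref_stride,
                          int *sum, unsigned int *sse) {
    int r, c;
    assert(src && ref && sum && sse);
    *sum = 0;
    *sse = 0;
    for (r = 0; r < MB_SIZE; r++) {
        for (c = 0; c < MB_SIZE; c++) {
            int d = src[r * src_stride + c] - ref[r * ref_stride + c];
            *sum += d;
            *sse += (unsigned int)(d * d);
        }
    }
}

/* variance == sse - sum*sum/256 for the 256 pixels of a 16x16 MB. */
static unsigned int variance_16x16(const unsigned char *src, int src_stride,
                                   const unsigned char *ref, int ref_stride) {
    int sum;
    unsigned int sse;
    sum_sse_16x16(src, src_stride, ref, ref_stride, &sum, &sse);
    /* 64-bit product: sum can reach 255*256, whose square overflows int. */
    return sse - (unsigned int)(((long long)sum * sum) / 256);
}
```

A block that differs from its reference by a constant offset has zero
variance under this definition, since the offset is absorbed by the
sum term.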
Change-Id: I803eabd1fb3ab177780a40338cbd596dffaed267
In real-time mode motion search, there is no need to calculate
variance. This change improved encoding speed by 1% ~ 2% (speed=-5).
Change-Id: I65b874901eb599ac38fe8cf9cad898c14138d431
This patch attempts to reduce the peak bitrate hit by the encoder
when using small buffer windows.
Tested on the CIF set over 200-500kbps using these settings:
--buf-sz=500 --buf-initial-sz=250 --buf-optimal-sz=250 \
--undershoot-pct=100
Two pass encodes were tested at best quality. One pass encodes were
tested only at realtime speed 4:
--rt --cpu-used=-4
The peak datarate (over the specified 500ms window) was measured
for each encode, and averaged together to get a metric for
"average peak," computed as SUM(peak)/SUM(target). This patch
reduces the average peak datarate as follows:
One pass:
baseline: 1.29715
this patch: 1.23664
Two pass:
baseline: 1.32702
this patch: 1.37824
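For reference, the "average peak" metric described above, SUM(peak)/SUM(target),
can be sketched as below. The function name and the array-based interface
are assumptions for illustration; the actual measurement scripts are not
part of this commit message.

```c
#include <assert.h>

/* Ratio of summed peak datarates to summed target datarates over n encodes. */
static double average_peak(const double *peak, const double *target, int n) {
    double sum_peak = 0.0, sum_target = 0.0;
    int i;
    assert(peak && target && n > 0);
    for (i = 0; i < n; i++) {
        sum_peak += peak[i];
        sum_target += target[i];
    }
    assert(sum_target > 0.0);
    return sum_peak / sum_target;
}
```

Summing before dividing weights each encode by its target rate, so
high-bitrate encodes contribute more than they would in a plain mean of
per-encode ratios.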
This change had a positive effect on our quality metrics as well:
One pass CBR:
                  Min  /  Mean /  Max (pct)
  Average PSNR   -0.42 /  2.86 / 27.32
  Overall PSNR   -0.90 /  2.00 / 17.27
  SSIM           -0.05 /  3.95 / 37.46
Two pass CBR:
                  Min  /  Mean /  Max (pct)
  Average PSNR   -4.47 /  4.35 / 35.99
  Overall PSNR   -3.40 /  4.18 / 36.46
  SSIM           -4.56 /  6.98 / 53.67
One pass VBR:
                  Min  /  Mean /  Max (pct)
  Average PSNR   -5.21 /  0.01 /  3.30
  Overall PSNR   -8.10 / -0.38 /  1.21
  SSIM           -7.38 / -0.11 /  3.17
(note: most values here were close to the mean, there were a few
outliers on files that were very sensitive to golden frame size)
Two pass VBR:
                  Min  /  Mean /  Max (pct)
  Average PSNR    0.00 /  0.00 /  0.00
  Overall PSNR    0.00 /  0.00 /  0.00
  SSIM            0.00 /  0.00 /  0.00
Neither one pass nor two pass CBR mode adheres particularly strictly
to the short term buffer constraints, and two pass is less
consistent, even in the baseline commit. This should be addressed
in a later commit. This likely will hurt the quality numbers, as it
will have to reduce the burstiness of golden frames.
Aside: My work on this commit makes it clear that we need to make
rate control modes "pluggable", where you can easily write a new
one or work on one in isolation.
Change-Id: I1ea9a48f2beedd59891f1288aabf7064956b4716
Currently, hex search cannot guarantee that the motion vector (MV)
found is within the maximum MV limit. Therefore, very large
motion vectors resulting from big motion in the video could cause
encoding artifacts. This change adjusts the hex search bounds
checking to make sure the resulting motion vector won't go out
of range. James Berry, thank you for finding the bug.
Change-Id: If2c55edd9019e72444ad9b4b8688969eef610c55
Declared the bmi in BLOCKD as a union instead of B_MODE_INFO.
Then removed B_MODE_INFO completely.
Change-Id: Ieb7469899e265892c66f7aeac87b7f2bf38e7a67
This is basically a slightly modified version of the previous patch,
and it has a moderately positive effect (SSIM/PSNR both +0.08% avg
on derf-set). Most clips show no change, except waterfall/coastguard,
each ~ +0.8% SSIM/PSNR. You can see similar effects in other clips
by shortening their length to terminate at a very short last group
of frames.
Change-Id: I7a70de99ca1f9fe6a8b6ca7a6e30e8a4b64383e4
This commit makes the usage of errorperbit and sadperbit consistent across
encoding modes and passes. Removed all the different magic weight factors
associated with errorperbit. Now 1/2 is used for both sadperbit16 and
sadperbit4; the /2 operation is merged into the initialization of the 2
variables.
Tests on cif set show 0.23%, 0.18% and 0.19% gain by avg psnr, overall
psnr and ssim respectively.
Change-Id: Ifa285c3e065ce0a5a77addfc9f95aabf54ee270d
The fb_idx_ref_cnt book-keeping was in error. Added an assert to
prevent future errors in the reference count vector. Also fixed a
pointer syntax error.
Change-Id: I563081090c78702d82199e407df4ecc93da6f349
sad_per_bit has been used for a number of motion vector search routines
with different magic weights: 1, 1/2 and 1/4. This commit removes these
magic numbers and uses 1/2 for all motion search routines. It also reformats
a number of source code lines to fit within the 80 column limit.
Tests on cif set show the overall effect is neutral on all metrics (<=0.01%).
Change-Id: I8a382821fa4cffc9c0acf8e8431435a03df74885
vp8_fast_quantize_b_pair_neon function added to quantize
two adjacent blocks at the same time to improve performance.
- Additional 3-6% speedup compared to neon optimized fast
quantizer (Tanya VGA@30fps, 1Mbps stream, cpu-used=-5..-16)
Change-Id: I3fcbf141e5d05e9118c38ca37310458afbabaa4e
Misplaced #endif caused first_time_stamp_ever to only be initialized if
CONFIG_INTERNAL_STATS was set.
Change-Id: I2296a4ab00f7dfb767583edcc5d59b94f48c0621
Added preload instructions to armv6 encoder optimizations.
About 5% average speed-up on Tegra2 for VGA@30fps sequence.
Change-Id: I41d74737720fb71ce7a316f07555357822f3347e
In onyx_if.c, update_reference_frames() now makes
sure that frame buffer indexes are not equal
before performing a buffer copy. If two frames
share the same buffer, the flags will already be
set correctly.
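A minimal sketch of the guard described above, with reference counts
kept in a plain array. The function names, NUM_FB and the return-value
convention are assumptions for illustration, not the code in onyx_if.c.

```c
#include <assert.h>

#define NUM_FB 4 /* hypothetical number of frame buffers */

/* Re-point *idx at new_idx, keeping the per-buffer reference counts in step. */
static void ref_cnt_fb(int *ref_cnt, int *idx, int new_idx) {
    assert(new_idx >= 0 && new_idx < NUM_FB);
    assert(*idx >= 0 && *idx < NUM_FB);
    if (ref_cnt[*idx] > 0) ref_cnt[*idx]--;
    *idx = new_idx;
    ref_cnt[new_idx]++;
}

/* Make dst share src's buffer, but only if it does not already.
 * Returns 1 if the reference was changed, 0 if nothing had to be done. */
static int share_buffer(int *ref_cnt, int *dst_idx, int src_idx) {
    if (*dst_idx == src_idx) return 0; /* already sharing: skip the copy */
    ref_cnt_fb(ref_cnt, dst_idx, src_idx);
    return 1;
}
```

Calling share_buffer() twice in a row with the same source leaves the
counts unchanged the second time, which is the property the index
comparison protects.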
Change-Id: Ida9b5516d08e3435c90f131d2dc19d842cfb536e
Tests showed that using hex search in realtime mode largely speeds up
the encoding process while still achieving quality similar to the
diamond search we have. Therefore, removed the diamond search
option.
Change-Id: I975767d0ec0539f9f6ed7fdfc09506e39761b66c
error_per_bit and sad_per_bit were designed as estimates of a bit worth
of sum squared error and sum absolute difference respectively. Under
this assumption, error_per_bit should be used in combination with 2nd
order errors (variance or sum squared error) while sad_per_bit should
be used in combination with 1st order SADs in motion estimation. There
were a few places where sad_per_bit had been misused with variances;
this commit changes those places to use error_per_bit, and also changes
parameter names to properly indicate which constant is being used.
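The pairing described above can be sketched as two cost helpers. This
is illustrative only: the function names, the mv_bits parameter and the
rounding shift by 8 are assumptions, not the encoder's actual cost
functions.

```c
#include <assert.h>

/* 1st-order distortion (SAD) is combined with sad_per_bit. */
static unsigned int sad_search_cost(unsigned int sad, int mv_bits,
                                    int sad_per_bit) {
    assert(mv_bits >= 0 && sad_per_bit >= 0);
    return sad + (unsigned int)((mv_bits * sad_per_bit + 128) >> 8);
}

/* 2nd-order distortion (variance or SSE) is combined with error_per_bit. */
static unsigned int var_search_cost(unsigned int variance, int mv_bits,
                                    int error_per_bit) {
    assert(mv_bits >= 0 && error_per_bit >= 0);
    return variance + (unsigned int)((mv_bits * error_per_bit + 128) >> 8);
}
```

Keeping the two helpers separate, with the constant named in the
parameter list, makes it harder to pass sad_per_bit where a variance
is being weighed.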
On cif set, the change has a universal gain by all metrics: 0.13% by
average/overall psnr and 0.1% by ssim.
Change-Id: I4850fdcc3fd6886b30f784bd843f13dd401215fb
When profile=3, there is no sub-pixel search. Distortion and SSE
have to be calculated using get_inter_mbpred_error().
Change-Id: Ifb36e17eef7750af93efa7d0e2870142ef540184
Declared the bmi in MODE_INFO as a union instead of B_MODE_INFO.
This reduced the memory footprint by 518,400 bytes for 1080
resolutions. The decoder performance improved by ~4% for the
clip used and the encoder showed very small improvements. (0.5%)
This reduction was first mentioned to me by John K. and in a
later discussion by Yaowu.
This is WIP.
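To illustrate where the saving comes from, here is a sketch comparing a
struct that stores both a mode and an MV per 4x4 block with a union
that overlays them. The type names are hypothetical and the mode enum
is an illustrative subset. Exact sizes are platform dependent; on a
platform where the enum and the MV are each 4 bytes, the union saves 4
bytes per block, and (1920/4) * (1080/4) * 4 = 518,400, which is
consistent with the figure quoted above.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Illustrative subset of block prediction modes. */
typedef enum { B_DC_PRED, B_TM_PRED } b_pred_mode;

typedef union {
    uint32_t as_int;
    struct { int16_t row, col; } as_mv;
} mv_u;

/* Before: mode and MV stored side by side. */
typedef struct {
    b_pred_mode mode;
    mv_u mv;
} bmi_struct;

/* After: a block is either intra (mode) or inter (MV), so they can overlap. */
typedef union {
    b_pred_mode as_mode;
    mv_u mv;
} bmi_union;

/* Bytes saved over a frame, counting one entry per 4x4 block. */
static size_t bmi_bytes_saved(size_t width, size_t height) {
    size_t blocks = (width / 4) * (height / 4);
    assert(sizeof(bmi_struct) > sizeof(bmi_union));
    return blocks * (sizeof(bmi_struct) - sizeof(bmi_union));
}
```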
Change-Id: I8e175fdbc46d28c35277302a04bee4540efc8d29
fixed a bug where active_worst_quality could be set
below active_best_quality which could result in an
infinite loop.
Change-Id: I93c229c3bc5bff2a82b4c33f41f8acf4dd194039
This patch collects the twopass specific members of VP8_COMP into a
dedicated struct. This is a first step towards isolating the two pass
rate control and aids readability by decorating these variables with
the 'twopass.' namespace. This makes it clear to the reader in what
contexts the variable will be valid, and is a hint that a section of
code might be a good candidate to move to firstpass.c in later
refactoring. There likely will be other rate control modes that need
their own specific data as well.
This notation is probably overly verbose in firstpass.c, so an
alternative would be to access this struct through a pointer like
'rc->' instead of 'cpi->firstpass.' in that file. Feel free to make
a review comment to that effect if you prefer.
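A small sketch of the two access styles being weighed. The struct
layout and the member names bits_left and frames_to_key are
placeholders chosen for illustration, not a statement of what VP8_COMP
contains.

```c
#include <assert.h>

/* Hypothetical two pass state, grouped in its own struct. */
typedef struct {
    long long bits_left;
    int frames_to_key;
} twopass_rc;

/* Hypothetical, heavily reduced stand-in for the compressor instance. */
typedef struct {
    twopass_rc twopass;
    int active_worst_quality;
} comp_sketch;

/* Verbose style: every access is decorated with 'twopass.'. */
static void spend_bits_verbose(comp_sketch *cpi, int bits) {
    assert(cpi);
    cpi->twopass.bits_left -= bits;
}

/* Alternative style: take a local 'rc' pointer once and use 'rc->'. */
static void spend_bits_alias(comp_sketch *cpi, int bits) {
    twopass_rc *rc;
    assert(cpi);
    rc = &cpi->twopass;
    rc->bits_left -= bits;
}
```

Both functions do the same thing; the second trades the explicit
namespace at each use for shorter lines.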
Change-Id: I0ab8254647cb4b493a77c16b5d236d0d4a94ca4d
The partition_info struct contains info just for SPLITMV,
so it should be used instead of BLOCKD. Eventually, I want
to reduce the size of B_MODE_INFO struct found in BLOCKD, so
this is the first step toward that goal.
Also, since SPLITMV is not supported in vp8_pick_inter_mode(),
the unnecessary mem copies and checks were removed. For rt
encodes, this gave a slight performance improvement.
Change-Id: I5585c98fa9d5acbde1c7e0f452a01d9ecc080574
The error-concealer is plugged in after any motion vectors have been
decoded. It tries to estimate any missing motion vectors from the
motion vectors of the previous frame. Intra blocks with missing
residual are replaced with inter blocks with estimated motion vectors.
This feature was developed in a separate sandbox
(sandbox/holmer/error-concealment).
Change-Id: I5c8917b031078d79dbafd90f6006680e84a23412
RVCT 4.1 was complaining about vstmia.16; store multiple expects a 64-bit
data type. Also optimized the implementation.
Change-Id: I0701052cabd685c375637bbc3796ff6d88f5972c
This commit restructures the mb activity masking code
to better facilitate experimentation using different metrics
etc. and also allows for adjustment of the zero bin either
for encode only or both the encode and mode selection
stages.
It also uses information from the current frame rather than
the previous frame and the default strength has been
reduced.
Change-Id: Id39b19eace37574dc429f25aae810c203709629b
This patch improves the accuracy of frame rate estimation by using a
larger, 1 second window. It also more quickly adapts to step changes
in the input frame rate (e.g. 30fps to 15fps).
Change-Id: I39e48a8f5ac880b4c4b2ebd81049259b81a0218e
The compiler produces better assembly when using int_mv
for assignments. The compiler shifts and ors the two 16bit
values when assigning MV.
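For illustration, the two assignment styles look roughly like this. The
MV/int_mv layout below is a sketch of a row/col pair overlaid on a
32-bit integer, and the copy helpers are hypothetical names; what code
a given compiler actually emits for each is not something this sketch
can show.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
    int16_t row;
    int16_t col;
} MV;

typedef union {
    uint32_t as_int;
    MV as_mv;
} int_mv;

/* Field-by-field copy: two 16-bit assignments. */
static void copy_mv_fields(int_mv *dst, const int_mv *src) {
    assert(dst && src);
    dst->as_mv.row = src->as_mv.row;
    dst->as_mv.col = src->as_mv.col;
}

/* Whole-word copy: one 32-bit assignment through the union. */
static void copy_mv_word(int_mv *dst, const int_mv *src) {
    assert(dst && src);
    dst->as_int = src->as_int;
}
```

Both helpers leave dst with the same row and col; the second expresses
the copy as a single integer move.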
Change-Id: I52ce4bc2bfbfaf3f1151204b2f21e1e0654f960f
Further modification, plus a fix for an incorrect implementation that caused
the results of refining_search and refining_searchx4 to mismatch.
Change-Id: I80cb3a44bf5824413fd50c972e383eebb75f9b6f
This is to reflect the RD improvement in the encoder. The change has a
small positive impact on quality (0.25% by VPXSSIM and 0.05% by PSNR)
Change-Id: Ic66ffc19b10870645088c0624c85556f009fd210
The variable was introduced in commit 2e53e9e53 to make more use of
trellis quantization, but this is no longer necessary after RDMULT
was made adaptive in a number of later commits.
Change-Id: I7420522ec7723f38cf77033466c25afb405d52ae
global values were being referenced, but the GOT was not being set up.
as the GOT is only required for PIC, this issue wasn't caught in the
default configuration.
Change-Id: I8006e53776139362a76f2c80cf9d0f8458602b2f
http://code.google.com/p/webm/issues/detail?id=328
In NEWMV mode, full search is currently used as the refining search
after n-step search. Replacing it with an iterative diamond search
of radius 1 largely reduces the computational complexity while still
maintaining the same encoding quality, since the refining search is now
done for every macroblock instead of only for a small percentage of
macroblocks as with full search.
Tests on the test set showed a 3.4% encoding speed increase with no
psnr & ssim loss.
Change-Id: Ife907d7eb9544d15c34f17dc6e4cfd97cb743d41
Paul pointed out that the pointer to the gf_active_flags is not being
properly incremented in the multithreaded encoder. This commit fixes the
issue by making sure the gf_active_ptr points to the start of the next
group of mb rows.
Change-Id: I3246e657d23beabb614dfb880733a68a5fd7e34c
Commit db5057c introduced a bug in that the active_worst_quality
selected by the 2 pass rate controller was being overridden for key
frames, causing a severe quality loss.
Change-Id: I4865a6fbe3e94e9b4fb9271c7dd68b455d7b371d
vp8_fast_quantize_b_neon function updated and further optimized.
- match current C implementation of fast quantizer
- updated to use asm_enc_offsets for structure members
- updated ads2gas scripts to handle alignment issues
Change-Id: I5cbad9c460ad8ddb35d2970a8684cc620711c56d
The existing emulation of posix semaphores on Windows uses SetEvent()
and WaitForSingleObject(), which implements a binary semaphore, not a
counting semaphore as implemented by posix. This causes deadlock when
used with the expected posix semantics. Instead, this patch uses the
CreateSemaphore() and ReleaseSemaphore() calls (introduced in Windows
2000) which have the expected behavior.
This patch also reverts commit eb16f00, which split a semaphore that
was being used with counting semantics into two binary semaphores.
That commit is unnecessary with corrected emulation.
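The difference in semantics can be shown with a single-threaded model.
This is not the Windows emulation code: the types and functions below
are hypothetical, and a trywait returning 0 stands in for a wait that
would block.

```c
#include <assert.h>

/* Models an auto-reset event: it is either signaled or not. */
typedef struct { int signaled; } binary_event;

/* Models a counting semaphore: it remembers how many posts are pending. */
typedef struct { int count; } counting_sem;

/* Signaling an already-signaled event has no further effect. */
static void event_post(binary_event *e) {
    assert(e);
    e->signaled = 1;
}

/* Returns 1 if the wait succeeds, 0 if it would block. */
static int event_trywait(binary_event *e) {
    assert(e);
    if (!e->signaled) return 0;
    e->signaled = 0;
    return 1;
}

static void sem_post_model(counting_sem *s) {
    assert(s);
    s->count++;
}

/* Returns 1 if the wait succeeds, 0 if it would block. */
static int sem_trywait_model(counting_sem *s) {
    assert(s);
    if (s->count == 0) return 0;
    s->count--;
    return 1;
}
```

After two posts, the counting model satisfies two waits, while the
binary model satisfies only one: the second waiter would block
forever, which is the deadlock described above.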
Change-Id: If400771536a27af4b0c3a31aa4c4e9ced89ce6a0
This patch fixes a rare hang in the multi-threaded encoder that was
only seen on Windows. Thanks to John for his help in debugging the
problem. More testing is needed.
Change-Id: Idb11c6d344c2082362a032b34c5a602a1eea62fc
Changed 8-neighbor searching to 4-neighbor searching, and continued
searching until the center point is the best match.
Test on test set showed 1.3% encoding speed improvement as well as
0.1% PSNR and SSIM improvement at speed=-5 (rt mode).
Will continue to improve it.
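A sketch of the search loop as described, under stated assumptions: the
function names, the cost callback and the iteration cap are
hypothetical, and toy_cost (squared distance to a fixed point) stands
in for the real block-matching error.

```c
#include <assert.h>

typedef unsigned int (*cost_fn)(int row, int col);

/* Stand-in cost with a single minimum at (3, -2). */
static unsigned int toy_cost(int row, int col) {
    return (unsigned int)((row - 3) * (row - 3) + (col + 2) * (col + 2));
}

/* Check the 4 direct neighbours, move to the best one, and repeat until
 * the center is the best match or max_iters is reached. */
static void refine_4_neighbor(int *row, int *col, cost_fn cost, int max_iters) {
    static const int nb[4][2] = {{-1, 0}, {0, -1}, {0, 1}, {1, 0}};
    unsigned int best;
    int it;
    assert(row && col && cost);
    best = cost(*row, *col);
    for (it = 0; it < max_iters; it++) {
        int best_site = -1, i;
        for (i = 0; i < 4; i++) {
            unsigned int c = cost(*row + nb[i][0], *col + nb[i][1]);
            if (c < best) {
                best = c;
                best_site = i;
            }
        }
        if (best_site < 0) break; /* center is the best match: stop */
        *row += nb[best_site][0];
        *col += nb[best_site][1];
    }
}
```

Starting from (0, 0) with the toy cost, the loop walks to (3, -2) in
five moves and then stops because no neighbour improves on the center.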
Change-Id: If4993b1907dd742b906fd3f86fee77cc5932ee9a
The commit also removed the slow ssim calculation that uses a 7x7
kernel, and revised the comments to better describe how sample ssim
values are computed and averaged.
Change-Id: I1d874073cddca00f3c997f4b9a9a3db0aa212276
Renamed configure option "enable-psnr" to "enable-internal-stats" to
better reflect the purpose of the option and eliminate the confusion
reported in http://code.google.com/p/webm/issues/detail?id=35
Change-Id: If72df6fdb9f1e33dab1329240ba4d8911d2f1f7a
Insertion sort performs better for sorting small arrays. In real-time
encoding (speed=-5), tests on the test set showed a 1.7% performance
gain with 0% PSNR change on average.
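For reference, a generic insertion sort over a small int array is
sketched below. It is a textbook version with a hypothetical function
name; the element type and what the encoder actually sorts are not
taken from this commit.

```c
#include <assert.h>

/* Sort a[0..n-1] in ascending order, in place. */
static void insertion_sort(int *a, int n) {
    int i, j;
    assert(a || n == 0);
    for (i = 1; i < n; i++) {
        int key = a[i];
        /* Shift larger elements one slot right to make room for key. */
        for (j = i - 1; j >= 0 && a[j] > key; j--) {
            a[j + 1] = a[j];
        }
        a[j + 1] = key;
    }
}
```

For a handful of elements the inner loop runs only a few times and
there is no recursion or partitioning overhead, which is why this tends
to win on small inputs.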
Change-Id: Ie02eaa6fed662866a937299194c590d41b25bc3d
Allow more reliable detection of truncated bitstreams by being more
precise with the count of "virtual" bits in the value buffer.
Specifically, the VP8_LOTS_OF_BITS value is accumulated into count,
rather than being assigned, which was losing the prior value,
increasing the required tolerance when testing for the error condition.
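A simplified model of the difference between assigning and accumulating.
The value given to VP8_LOTS_OF_BITS here is an assumption, and the
helper functions are hypothetical; the real decoder's fill routine does
more than this.

```c
#include <assert.h>

#define VP8_LOTS_OF_BITS 0x40000000 /* assumed value for this sketch */

/* Old behaviour: the prior bit count is overwritten and lost. */
static int fill_past_end_assign(int count) {
    (void)count;
    return VP8_LOTS_OF_BITS;
}

/* New behaviour: the virtual bits are added on top of the prior count. */
static int fill_past_end_accumulate(int count) {
    assert(count < VP8_LOTS_OF_BITS);
    return count + VP8_LOTS_OF_BITS;
}

/* Real bits still accounted for after the virtual fill. */
static int real_bits_after_fill(int count) {
    return count - VP8_LOTS_OF_BITS;
}
```

With accumulation, subtracting VP8_LOTS_OF_BITS recovers the prior
count exactly, so a check for reading past the end of the data needs
no extra slack for the bits that assignment would have discarded.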
Change-Id: Ib5172eaa57323b939c439fff8a8ab5fa38da9b69
Combine calc_iframe_target_size, previously only used for forced
keyframes, with calc_auto_iframe_target_size, which handled most
keyframes.
Change-Id: I227051361cf46727caa5cd2b155752d2c9789364