The rd_thresholds are adaptively changed based on best mode tested.
It was only changed for the same block size, this commit makes the
adaptation for similar block sizes too. The commit also made minor
adjustment and code cleanups.
The impact on encoding time for _ped:
118089 ms -> 111927 ms
The impact on compression:
derf: -0.339%
stdhd: -0.303%
Change-Id: I8817fed1102350497f2ec631849e43f753878e5d
Adds code to return an integer cost list for NSTEP search. Then
uses it for pruned subpel search in speed 3.
derf: -0.06%
Speed on mobcal 720p increaes from 10.28 fps to 10.65 fps.
[Subject to further testing].
Change-Id: Ib591382d25b2c11bcaba9d3a27a93a9d1ab27a96
mi_grid_* are arrays of pointer to pointer. They save the pointers that point
to the MIs in cm->mi. But they are unnecessary and complicated. The original
goal was to remove MODE_INFO_t copy. But with an extra MODE_INFO_t pointer
inside MODE_INFO_t, same goal could be achieved.
This commit totally removes the mi_grid_* structures. But there are still
many dummy MODE_INFO_t inside cm->mi which are a waste of memory. Next commit
will do on-demand MODE_INFO_t allocation in order to save these memories.
Change-Id: I3a05cf1610679fed26e0b2eadd315a9ae91afdd6
vpx_svc_parameters_t contains id, resolution and min/max qp for each spatial layer.
In this change we will use extra config to send min/max qp and scaling factors, then calculate layer resolution inside encoder.
Change-Id: Ib673303266605fe803c3b067284aae5f7a25514a
Substantial restructuring of the way we estimate
the rate of decay in prediction quality and determine
the arf interval and amount of boost used.
Also other changes to support moving to a lower first pass
Q which exposes some new features and allows us to better
distinguish genuinely static blocks from low motion or noisy
blocks.
Net gains now visible on all the test sets with std-hd PSNR up
1.87%. There are still some bad outlier cases but most of these
are low motion or slide show type content where the metrics
are already high at any given rate. The best + case is up by
more than 10%.
Change-Id: I18e25170053bdf3188f493ff8062f48a74515815
This commit adds back sse2 or ssse3 optimized versio of a couple of
functions, fixes a ~10% performance regression.
Change-Id: I049786906e5a641224dced63c6492aec9d86d183
The ARF frame should always be the same size as the
native resolution of the input frames.
It will be scaled to the required resolution at
encode time.
Change-Id: I0afe858129aa6ef65b1648f43476331715346896
This commit makes the encoder to use non-zero mode threshold for
NEARESTMV modes. The runtime for test clips of speed 3 is reduced
by about 1%.
pedestrian 1080p 2000 kbps, 143239 ms -> 141989 ms
bus CIF 1000 kbps, 7835 ms -> 7749 ms
The compression performance change is about -0.02% for both derf
and stdhd.
Change-Id: Ib71808922c41ae2997100cb7c561f68dcebfa08e
If the partition block is skippable, which means no coefficients
for Y, U, and V planes, its skip flag is set to 1. No quality
change (verified by borg tests), and no noticeable speed change.
Change-Id: I9231f720f8dd6364384cf05aa148ca24d75450f1
Libvpx was memseting every external frame buffer before decode. This
was to work around a valgrind issue in our C loop filter. Most of
the time this was not needed and we have noticed some significant
performance loss on some platforms. Now we require the application to
zero out the buffers if it is using external frame buffers.
Change-Id: I7330d00a315e65137ed30edd5f813e8929b76242
This commit enforces ARF validation check for compound inter modes.
It avoids potential access to ARF in the encoding process if it
is not allowed.
Change-Id: I055fec946b5d19d97937dc9001e1e564923e2439
The valid reference frame check in sub8x8 rate-distortion
optimization search has been included in the ref_frame_skip_mask
scheme. This commit removes the later further validation checks
that are not in effect.
Change-Id: I853b477c44037d3dc0afec6cbfce08a96c597a75
This commit replaces the best_ref_index table fetch with the use
of best_mbmode in vp9_rd_pick_inter_mode_sub8x8.
Change-Id: I882ee9ee6a8c0e61befcca1f4dba6d2ea8de8f13
The issue was discovered on bitstream with 2x vertical downscale. For
zero MVs, y_pad is set to 1 only when vertical convolution is
required. The original code assumes that for y_step_q4 == 32 we don't
perform vertical convolution. But vp9_setup_scale_factors_for_frame()
sets convolve functions so that when x_step and y_step are both not
equal to 16, convolve in both directions is performed. And convolve()
unconditionally subtracts one stride from source pointer when calls
convolve_horiz(). This leads to invalid memory access.
Change-Id: I882dfa6081a58e172b5ffa55842bfcd6727f10bf
Call to vp9_rc_get_second_pass_params() moved from
Pass2Encode() to earlier in vp9_get_compressed_data(),
to ensure that two pass stats and parameters are
available before decisions such as frame scaling.
Change-Id: If21537f0073919b04696a7d5e9aac78e23d76f39
When a reference frame type is not in the frame buffer, the mode
search threshold will be set to INT_MAX, so as to effectively
turn off the mode entries in the rate-distortion optimization loop
that involves this reference frame type. This operation is now
integrated in the ref_frame_skip_mask scheme. This commit hence
removes the redundant mode search threshold setting.
Change-Id: Ib18f45da611afda2af275201efd367df7f5101ab
This commit unifies the reference frame control in the rate-
distortion optimization search loop of sub8x8 block size to remove
the control dependency on mode search order.
Change-Id: I3a174099f71a7cc176ede9fd60e2374243ae9232
Improves function to return sad of integer pels by reusing integer
pels already visited in the smallest scale.
Turns on BIGDIA search for speed 4. Also, turns on the
first version of the pruned subpel search at this speed.
derf: -0.32% (speed 4)
Speed seems to improve by at least 5% but subject to verification.
Change-Id: Iaec8eaffd61d6237ac029e6a2a1b0a88b2a35271
Adds various high bitdepth transform functions and tests.
Much of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform cofficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map tp int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.
Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
The variable best_inter_rd is effectively not in use in the rate-
distortion mode search loops of both regular block sizes and sub8x8
block sizes.
Change-Id: I178f909f8c9629772e13adc6257908653b2adf31
The speed feature that skips compound inter prediction modes was
subsumed by other speed features and effectively was not in use.
This commit removes it.
Change-Id: I22b0c71a8ddd15d93b25d86fa63a1dce2ba6a1a9
If optimizations use more than one cpu feature, allow
specifying them so that '--disable-X' still works
https://code.google.com/p/webm/issues/detail?id=854
Change-Id: I3108ea37b397371a2be84dd5f2380b304db23f18
Integrate intra mode mask speed feature with the mode_skip_mask
scheme. Move it outside the mode search loop in the
vp9_rd_pick_inter_mode_sb function.
Change-Id: I7738fea749bfdc08ad05d7f2524feb8ff67568d9
This speed feature is used in real-time setting only. Remove the
related condition check in the rate-distortion optimization search
loop.
Change-Id: Iaacc1e268214634e6f95c5048c28a60cec6c42fc
Refactor overlay frame speed-up related function. Make it unified
with the ref_frame_skip_mask system and Move it out of the mode
search loop.
Change-Id: I0dde9baf44354f6ba00b4679cba02fa6a30c7316
This commit refactor the rate-distortion optimization search for
regular block sizes to remove the speed feature dependency on mode
search order.
Change-Id: Ied033ee484c2957e17baa7b6450b720fe7dd0e7d
This issue is found when the denoising mode is set to kDenoiserOnYUVAggressive.
Updated the C code to make it the same with SSE version.
I also changed several lines in VP9 denoiser for the code style.
Change-Id: I640d48cf946fe8c6a400e6e252107501d1e226d3
don't bother decoding any further after receiving an earlier decode
error until a key/intra-only frame is encountered.
Change-Id: I381917b70d7a9e6f8d6de42e3d181bb113a4cec4
This commit fixes a bug related to skipping intra mode checking, by
using a separate variable to store the best prediction error from
inter mode. It avoids unintentionally overwriting intra mode
rate-distortion cost, and hence affecting other speed features.
Change-Id: I99e12993339c84c8b4f597996b372012e5858fae
Assigning selected reference frame pointer is done in the
encode_superblock function. No need to do this at the end of
rate-distortion optimization search.
Change-Id: I33fcede0fd304b4a4c4deef2d126d79546a9c070
This commit refactors the vp9_rd_pick_inter_mode_sb function to
remove the intra mode early termination dependency on the mode
search order.
Change-Id: If6ac49aa7c530c7b9a5bd31b0ab84db83e192bec
This commit allows the encoder to find current best prediction mode
state using best_mbmode, instead of fetching from the static mode
search table via best_mode_index.
Change-Id: Ibefeab83aed33a49c2be03e83f09153856ca4271
The use of use_lastframe_partitioning is totally removed in good-
quality encoding. Its usage in real-time encoding needs to be
evaluated to see if it can be removed too.
The Borg tests at speed 4 showed:
stdhd set: 0.220% psnr gain, 0.166% ssim gain;
derf set: 0.329% psnr gain, 0.476% ssim gain.
Speed test on selected clips showed 1.54% speedup.(Worst case:
pedestrian_area_1080p25.y4m, speed loss: 1.5%)
Change-Id: I1c844d329b0b5678558439b887297c1be7ddab00
the code currently checks whether the allocation has been done instead
of allocating on the first frame.
since:
4f27202 vp9: fix crash in mt loopfilter w/corrupt file
this change defers the allocation until the loop filter is used.
Change-Id: I660c1b7f34e713a8dd9884483f01d23b9847366e
The speedup in rd_pick_partition() function makes it possible
to drop use_lastframe_partitioning feature. By doing that, we
achieve good PSNR gain with small speed loss. Also, this makes
encoding loop less complicated. The code cleanup patch will
follow.
Borg tests showed:
1. At speed 2,
stdhd set: 0.201% PSNR gain, 0.133% SSIM gain;
derf set: 0.262% PSNR gain, 0.276% SSIM gain.
2. At speed 3,
stdhd set: 0.139% PSNR gain, 0.109% SSIM gain;
derf set: 0.447% PSNR gain, 0.442% SSIM gain.
The average speed loss over selected test clips is within 1%
with the worst case of 4%.
Change-Id: Icfd2ded7869372b585a6972855d933b3d0280d90
The rate costs calculated for inter modes are not precise in some
cases, which causes NEWMV is chosen instead of NEARESTMV, NEARMV,
and ZEROMV. This patch added checks for these cases, and corrected
the mode decisions.
Borg tests at speed 3 showed:
1. stdhd set: 0.102% PSNR gain and 0.088% SSIM gain.
2. derf set: 0.147% PSNR gain and 0.132% SSIM gain.
No speed change.
Change-Id: I35d17684b89ad4734fb610942d707899146426db
Removed functions:
* vp9_post_proc_down_and_across_mmx
* vp9_mbpost_proc_down_mmx
* vp9_plane_add_noise_mmx
They all have sse2 equivalent.
Change-Id: I59c1fac12b7c96ca4538d455e4400c2b7875feff
vp9_variance_sse2.c contains a mix of intrinsics and references to
assembly which uses x86inc.asm; it's conditionally included as a result.
Change-Id: I254451483a65881c0b8e18e27bf0c3ddef60c4ec
allocations within vp9_alloc_context_buffers() rely on mi_rows/mi_cols
individually, use those to determine whether to realloc rather than
stride and stride * rows. this fixes a crash with some fuzzed files for
invalid accesses into last_frame_seg_map and above_context.
Change-Id: I7b9f40dcf170d443890f3bd2acd285507943c7d4
proceeding using a corrupt (incompletely decoded) frame reference may
lead to incorrect assumptions about allocation sizes leading to a crash.
Change-Id: I76e74f2e1be127c2e2c7e1174bb3307497dfd23d
This commit turns on adaptive motion search for ARF coding, in
addition to other normal inter frame coding. It improves the
average compression efficiency:
stdhd 0.1%
derf 0.04%
For the test sequences, the speed 3 runtime is reduced:
pedestrian 1080p 2000 kbps, 149932 ms -> 144580 ms, (3.3% speed-up)
bus CIF 1000 kbps, 8050 ms -> 7895 ms, (1.9%)
highway CIF 100 bkps, 45033 ms -> 44078 ms, (2.2%)
Change-Id: I5228565b609f99e8ae04f6140a2bf2b64a831d21
This is to keep the same with VP8 denoiser.
If motion magnitude is small,
make denoiser more aggressive.
Change-Id: I942a6e2f2ed9aec6f0c4c1f9e5fa47066cadcc0c
When the first try of denoising turns out to be too much,
we will use a softer filter by adopting an adjustment to
make the result closer to original pixel (as in VP8 denoiser).
The old code made the adjustment in the wrong direction.
Change-Id: I84e28fa9e01eef47c5a37d5a2e6d3d378a06786b
This commit allows the encoder to store outcomes of single reference
frame modes and compares them to decide if the inter prediction
filter, forward transform, and quantization can be skipped.
The compression performance of speed 3 is down
derf -0.364%
stdhd -0.198%
For test sequences, the speed 3 runtime is reduced
highway CIF 100 kbps, 51976 ms -> 45033 ms, 13% speed-up
stockholm 720p 1000 kbps, 71826 ms -> 67838 ms, 5.5% speed-up
pedestrian 1080p 2000 kbps, 154924 ms -> 150702 ms, 2.6% speed-up
Change-Id: I5aa26f918d2b4b5197a2c0afa2779319f1c88e44
intra_super_block_yrd() and inter_super_block_yrd() are largely same,
this commit merges them into one to reduce code duplication.
Change-Id: I64d7042a5b099345627cf55663010c185b25ec37
From 3 to 2, which seems to be slightly positive on compression for
all test sets, also reduces encoding time by 2%-5%, varying on the
test clips.
Change-Id: If045417bd27311700c919b4a335eff0dc1130ae0
This commit removes the special case for key frame, as transform size
decision is controlled by the appropriate speed feature for all lossy
coding modes: tx_size_search_method.
Change-Id: I9677171e3f2432ec23705f7c5ea8170dd4562fae
This commit allows the encoder to skip check on compound inter
modes in the rate-distortion optimization loop, if the reference
frame bias signs are the same.
Change-Id: Ib753e6bb11cbdd338aee69dbe2b649671f75a6b0
Adds config parameter vp9_highbitdepth, to support highbitdepth profiles.
Also includes most vpx level high bit-depth functions. However
encode/decode in the highbitdepth profiles will not work until
the rest of the code is in place.
Change-Id: I34c53b253c38873611057a6cbc89a1361b8985a6
It's built based on current spatial svc code.
We only support one spatial two temporal layers at this time.
Change-Id: I1fdc8584354b910331e626bfae60473b3b701ba1
This commit skips the compound inter mode prediction check in the
rate-distortion optimization loop for ARF coding. It reduces the
runtime for certain test clips at speed 3, at no compression
performance change:
bus CIF 1000 kbps, 8260 ms -> 8090 ms, 1.8% speed-up
stockholm 720p 1000 kbps, 74453 ms -> 71826 ms, 2.9% speed-up
No visible speed-up for pedestrian area 1080p at 2000 kbps.
Change-Id: Ic68aa56837159b726563b784e2e3729e846465ad
Use unsigned int type to store the sse in the pixel domain. The
precision is sufficient to handle sse of block size up to 64x64.
The transform domain version however needs int64_t, since there is
a transfer gain applied in the forward transformation that might
cause unsigned int overflow.
Change-Id: Ifef97c38597e426262290f35341fbb093cf0a079
store the number of allocated rows in VP9LfSync, the calculated values
can not be relied on when dealing with corrupt material.
Change-Id: I13b8bcec9738c299a71df726772ab7ac05511e5b
This commit allows encoder to skip intra coding mode test, when
the known inter residual is less than the source variance. It
reduces the runtime of speed 3 for test clips:
bus cif 1000 kbps: 8587 ms -> 8260 ms, 3.8% speed-up
pedestrian 1080p 2000 kbps: 161381 ms -> 155241 ms, 3.7% speed-up.
The compression performance is down by
derf -0.36%
stdhd -0.25%
Change-Id: I75ce1e035b4da2153cb1ac14111d1a07c05a735d
This commit extends the sse and forward transform computation flag
to support the case 64x64 blocks where there are 4 32x32 2D-DCT
blocks.
Change-Id: I86a3e805dfaa0f3abd812f590520c71aa0e40473
In order to understand memory layout consider the declaration of the
following structs. The first one is a part of our API:
struct vpx_codec_ctx {
// ...
struct vpx_codec_priv *priv;
};
The second one is defined in vpx_codec_internal.h:
struct vpx_codec_priv {
// ...
};
The following struct is defined 4 times for encoder/decoder VP8/VP9:
struct vpx_codec_alg_priv {
struct vpx_codec_priv base;
// ...
};
Private data allocation for the given ctx:
struct vpx_codec_ctx *ctx = <get>
struct vpx_codec_alg_priv *alg_priv = <allocate>
ctx->priv = (struct vpx_codec_priv *)alg_priv;
The cast works because vpx_codec_alg_priv has a
vpx_codec_priv instance as a first member 'base'.
Change-Id: I10d1afc8c9a7dfda50baade8c7b0296678bdb0d0
In the partition search, the encoder checks all possible
partitionings in the superblock's partition search tree.
This patch proposed a set of criteria for partition search
early termination, which effectively decided whether or
not to terminate the search in current branch based on the
"skippable" result of the quantized transform coefficients.
The "skippable" information was gathered during the
partition mode search, and no overhead calculations were
introduced.
This patch gives significant encoding speed gains without
sacrificing the quality.
Borg test results:
1. At speed 1,
stdhd set: psnr: +0.074%, ssim: +0.093%;
derf set: psnr: -0.024%, ssim: +0.011%;
2. At speed 2,
stdhd set: psnr: +0.033%, ssim: +0.100%;
derf set: psnr: -0.062%, ssim: +0.003%;
3. At speed 3,
stdhd set: psnr: +0.060%, ssim: +0.190%;
derf set: psnr: -0.064%, ssim: -0.002%;
4. At speed 4,
stdhd set: psnr: +0.070%, ssim: +0.143%;
derf set: psnr: -0.104%, ssim: +0.039%;
The speedup ranges from several percent to 60+%.
speed1 speed2 speed3 speed4
(1080p, 100f):
old_town_cross: 48.2% 23.9% 20.8% 16.5%
park_joy: 11.4% 17.8% 29.4% 18.2%
pedestrian_area: 10.7% 4.0% 4.2% 2.4%
(720p, 200f):
mobcal: 68.1% 36.3% 34.4% 17.7%
parkrun: 15.8% 24.2% 37.1% 16.8%
shields: 45.1% 32.8% 30.1% 9.6%
(cif, 300f)
bus: 3.7% 10.4% 14.0% 7.9%
deadline: 13.6% 14.8% 12.6% 10.9%
mobile: 5.3% 11.5% 14.7% 10.7%
Change-Id: I246c38fb952ad762ce5e365711235b605f470a66
Updates the vp9_pattern_search function to return integer one-away
neighbors' sad values, for subsequent use in speeding up the
sub-pel search. Also, removes code for the do_refine option
which is not being used currently.
Updates the integer and subpel functions to pass in a 5-element
sad list for output or input.
A new pruned sub-pel search algorithm is implemented that uses
the sad returned from the integer pel search. But it is not
deployed yet.
Change-Id: Ifa9f5ad024b5b660570366d2bd900343e1891520
this change is proactive: the loop filter expects valid input and may
produce undefined results / crash in other cases.
Change-Id: I6cc1e966062a91cbc6db981c87cd03d9129fc8fe
attempting to decode a frame after the previous frame failed has the
potential of interrupting an earlier loop filter task
Change-Id: I6f2b1ddcdf5b89c3e2ee8caf5289dada2a087d66
This commit re-work the operation flow related to prediction
residual generation and the rate-distortion modeling. It saves one
call for model_rd_for_sb.
Change-Id: Icaf96c0ff09c903637ed5283448afe01d798195f
The value of switchable rate has been stored in a local variable.
This change skips the second call to vp9_get_switchable_rate() by
reusing the local variable.
Change-Id: Ib7d3fef7621cc4bde94c6d6e6b3a71f1fd4559f2
Check the mode and motion vector cost. If it is already above
the existing best rate-distortion cost, skip the rest check process
on this mode.
Change-Id: Ie065cebdfda2a3be3be18b8e8b43dc29aaa8c179
This commit makes the rate distortion modeling run in the unit of
maximum transform block size. No compression/speed change observed.
It is for the use of later fast forward transform purpose.
Change-Id: Ibaaedb69c765e8d0c5d5012f0ec07f36fd9f68fd
if the first frame was corrupt and loop filter not called, the next call
would assume the necessary allocations had been done and segfault when
accessing a NULL pointer
Change-Id: Ib6ef505e5c594e6f0fe65ab0700172bcf06b92a6
This commit addes a new strategy to reduce the search for optimal
interpolation filter type. The encoder counts and store how many each
filter type is selected and used for each of the reference frames.
A filter type that is rarely used for all three reference frames is
masked out to avoid computation.
The impact on compression is neglectible:
-0.02% on derf
+0.02% on stdhd
Encoding time is seen to reduce by 2~3%.
Change-Id: Ibafa92291b51185de40da513716222db4b230383
We can use one frame context for each layer so that we don't have
to reset the probs every frame. But we can't use prev_mi since we
may drop enhancement layers. So we have to generate a non vp9
compatible bitstream and modify it in the player.
1. We need to code all frames as invisible frame to let prev_mi
not to be used. But in the bitstream we need to code the
show_frame flag to 1 so that the publisher will know it's
supposed to be a visible frame.
2. In the player we need to change the show_frame flag to 0 for
all frames. Then add an one byte frame into the super frame
to tell the decoder which layer we want to show.
Change-Id: I75b7304cf31f0ab952f043e33c034495e88f01f3
The function was called in two places. In the first case it is replaced
with vp9_set_speed_features() call. In the second case the body of set_speed_features() is inlined.
Change-Id: If3fdf1b4168eee97677c224f69c245fe46c7f606
This patch fixes slow first pass problem. Mode could only be determined
from the deadline value during frame encode call. Unfortunately, we use
mode value before any encode calls during the first pass encoding (see
set_speed_features() logic). The mode for the first pass must be different
from BEST to make first pass fast.
Change-Id: I562a7d32004ff631695d91c09a44d8a9076fd6b5
The case where frame width increases but the overall memory
size required to hold the mi arrays does not was not
handled.
Change-Id: I72e70b912a7d1766687ad682979f1c9ee124449b
My fault, that was a float (not integer) which was converted to int64_t.
This reverts commit a885e1cbf0
Change-Id: Ic50708b959e1c3cb3e37da1429d334fafc3391d6
In the full-rd transform size search, we go through all transform
sizes to choose the one with best rd score. In this patch, an
early termination is added to stop the search once we see that the
smaller size won't give better rd score than the larger size. Also,
the search starts from largest transform size, then goes down to
smallest size.
A speed feature tx_size_search_breakout is added, which is turned off
at speed 0, and on for other speeds. The transform size search is
turned on at speed 1.
Borg test results:
1. At speed 1,
derf set: psnr gain: 0.618%, ssim gain: 0.377%;
stdhd set: psnr gain: 0.594%, ssim gain: 0.162%;
No noticeable speed change.
3. At speed 2,
derf set: psnr loss: 0.157%, ssim loss: 0.175%;
stdhd set: psnr loss: 0.090%, ssim loss: 0.101%;
speed gain: ~4%.
Change-Id: I22535cd2017b5e54f2a62bb6a38231aea4268b3f
This commit enables the encoder to record the location of the
center frame to generate alter reference frame. It then allows to
skip checking prediction modes of other reference frame types when
it comes to encode this frame.
The speed 3 runtime is reduced for the test sequences:
bus at CIF 1000 kbps, 9791 ms -> 9446 ms, i.e., 3.5% speed-up,
pedestrian at 1080p 2000 kbps, 184043 ms -> 175730 ms, i.e., 4.5%
speed-up.
No compression performance change observed.
Change-Id: Iacfde3bcc1445964e7a241f239bd6ea11cb94bd1
1. Clean the code for encode frame tests
2. Add encode w/ and w/o alt reference frame test
3. Add encode SNR layers test
4. Add encode multiple layers but decode partial layers test
Change-Id: Ibd2c9bc02525db584a6f931a98405f2d851b3cd6
This reverts commit 5509b7fd8f
Observed a big drop in compression quality and speed for speed 1 for a 360p test clip, revert this now for investigation.
Change-Id: If69dc8d77a225b34dc7907a9472e1a7a0a22762d
Add a speed feature to give the tighter partition search
range. Before partition search, calculate the histogram
of the partition sizes of the left, above and previous
co-located blocks of the current block. If the variance of
observed partition sizes is small enough, adjust the search
range around the mean partition size, which will be tigher.
The feature is currently turned on at speed 2. Experiments on
sample youtube clips show on average the runtime is reduced
by 3-7%.
For hard stdhd clips:
park_joy_1080p @ 15000kbps: 509251 ms -> 491953 ms (3.3%)
pedestrian_area_1080p @ 2000kbps: 223941 ms -> 214226 ms (4.3%)
The PSNR performance is changed:
derf: -0.112%
yt: -0.099%
hd: -0.090%
stdhd:-0.102%
Change-Id: Ie205ec5325bf92ec5676c243e30ba9d0adca10f2
In the current implementation of the encoder,
frame buffers may come from the wider set of
12 such buffers, and is not restricted to the
8 allowed as reference frames. This is only
an implementation detail and does not affect
the constraint of having a total of 8 reference
buffers overall.
Change-Id: I075f777146c2df49c275d89232933f8127235175
At --good and speed 3 or above for resolution less than 720p. This
disables the tests for 64x64 intra prediction modes. Encoding time
reduction is about 1%.
Change-Id: Ib396e3d1417fece416e3f0fee929b128acbb130f
The test to determine if the mode info buffers need
to be resized when the frame size changes was
incorrect, as per bug 837.
By storing the size of the allocated data structure,
a simple test determines whether to allocate more
memory when the frame size changes.
Change-Id: I1544698f2882cf958fc672485614f2f46e9719bd
In the encoder, current_video_frame is used in a couple of places to
decide encoding strategy, this commit replaces with more appropriate
variables.
Change-Id: I3d3d8d8e2ea02c489e4639b9d4c446a63e357d29
This commit moves the simplified coefficient probability model
and costing update to speed 4, and turns on chessboard pattern
mode search for sub 720p sequences. The overall coding performance
of speed 3 is improved:
derf 0.889%
stdhd 1.744%
The speed 3 runtime for test sequences are improved:
bus cif at 1000 kbps 9823 ms -> 9642 ms
pedestrian 1080p 2000 kbps 189559 ms -> 183284 ms
Change-Id: Iecbc7496a68f31fd49fb09f8dfd97c028d675a5d
This commit enables the encoder to skip NEARMV and ZEROMV if the
above and left blocks have identical reference frame, and the
current reference is different from that. It reduces the runtime
of speed 3 for test sequences:
bus cif at 1000 kbps 10064 ms -> 9823 ms
pedestrian 1080p at 2000 kbps 193078 ms -> 189559 ms
The compression performance is changed by
derf -0.085%
stdhd -0.103%
Change-Id: If304f26d42e6412152a84c3dd7b02635c38444f4
This commit allows the encoder to check the above and left neighbor
blocks' reference frames and motion vectors. If they are all
consistent, skip checking the NEARMV and ZEROMV modes. This is
enabled in speed 3. The coding performance is improved:
pedestrian area 1080p at 2000 kbps,
from 74773 b/f, 41.101 dB, 198064 ms
to 74795 b/f, 41.099 dB, 193078 ms
park joy 1080p at 15000 kbps,
from 290727 b/f, 30.640 dB, 609113 ms
to 290558 b/f, 30.630 dB, 592815 ms
Overall compression performance of speed 3 is changed
derf -0.171%
stdhd -0.168%
Change-Id: I8d47dd543a5f90d7a1c583f74035b926b6704b95
The function is called only once, right after all stats counters are
reset to 0. Therefore all the computations have zero effect on return
values. This commmit to removed those effectless code.
Change-Id: I50d27c0802547921fa36c60aa4bd92d76247f595
'ref_frame_map' is initialized to -1. avoids using an invalid index if
VP9_GET_REFERENCE/VP8_COPY_REFERENCE controls are issued after a decode
error.
Change-Id: I4599762c4d0b07a5943a72bf4a86ccb596cc062a
if the decode of the first frame fails, frame_to_show may not be set.
fixes a crash in vpxdec with corrupt data.
Change-Id: I5ab9476d005778a13fd42a39d05876bb6c90a93c
At speed 6 the smallest partitioning was 16x16 and biggest
intra block was 8x8, essentially disallowing all intra blocks
which produces ugly artifacts when revealing new video.
Change-Id: I364042d4c64e09be0666ade64aac94d0a1b586cf
This commit enables encoder to select fast forward transform and
quantization path according to the prediction residual sse/variance,
in the rate-distortion optimization scheme.
Change-Id: Ief9fc3844fd4107166d401970e800c6e5ce2b5fe
We had a very complicated way to initialize cpi->pass from
cfg->g_pass:
switch (cfg->g_pass) {
case VPX_RC_ONE_PASS:
oxcf->mode = ONE_PASS_GOOD;
break;
case VPX_RC_FIRST_PASS:
oxcf->mode = TWO_PASS_FIRST;
break;
case VPX_RC_LAST_PASS:
oxcf->mode = TWO_PASS_SECOND_BEST;
break;
}
cpi->pass = get_pass(oxcf->mode).
Now pass is moved to VP9EncoderConfig and initialization is simple:
switch (cfg->g_pass) {
case VPX_RC_ONE_PASS:
oxcf->pass = 0;
break;
case VPX_RC_FIRST_PASS:
oxcf->pass = 1;
break;
case VPX_RC_LAST_PASS:
oxcf->pass = 2;
break;
}
Change-Id: I8f582203a4575f5e39b071598484a8ad2b72e0d9
Eliminated instructions by using better neon instructions
and rearranging the loop.
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~1.0%.
Change-Id: I6b1700e79318f647ea67ef25e954c308932950ec
Replaced encoder and decoder functions to get a pointer
to a reference frame with a common function, vp9_get_ref_frame,
and simplified it.
Change-Id: Icb206fcce8caace3bfd1db3dbfa318dde79043ee