In 8-tap filtering, the products must be accumulated in the right
order to guarantee that the intermediate results fit in 16 bits:
the largest product should be added last. This patch fixes the
problem using the method in commit "Correct ssse3
8/16-pixel wide sub-pixel filter calculation".
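The ordering issue can be sketched as follows. This is only an illustration, not the assembly from the patch: the taps and pixel values are made up for the example (the taps sum to 128), and the saturating 16-bit add stands in for what is assumed to be a paddsw-style instruction.

```python
INT16_MIN, INT16_MAX = -32768, 32767

def sat_add16(a, b):
    """Signed 16-bit saturating add (assumed to model a paddsw-style add)."""
    return max(INT16_MIN, min(INT16_MAX, a + b))

def pair_sums(pixels, taps):
    """Multiply 8 pixels by 8 taps and sum adjacent pairs into 4 partial sums.
    The partial sums are assumed to fit in 16 bits for the taps used here."""
    return [pixels[i] * taps[i] + pixels[i + 1] * taps[i + 1]
            for i in range(0, 8, 2)]

def finish(acc):
    """Round, shift right by 7 and clip to the 8-bit pixel range."""
    acc = sat_add16(acc, 64)
    return max(0, min(255, acc >> 7))

def filter_middle_first(pixels, taps):
    """Adds the two large middle partial sums first; this can saturate early."""
    p0, p1, p2, p3 = pair_sums(pixels, taps)
    acc = sat_add16(p1, p2)
    acc = sat_add16(acc, p0)
    acc = sat_add16(acc, p3)
    return finish(acc)

def filter_largest_last(pixels, taps):
    """Adds the small outer sums first and the largest partial sum last."""
    p0, p1, p2, p3 = pair_sums(pixels, taps)
    acc = sat_add16(p0, p3)
    acc = sat_add16(acc, min(p1, p2))
    acc = sat_add16(acc, max(p1, p2))
    return finish(acc)

def filter_exact(pixels, taps):
    """Reference result computed with unbounded integers."""
    total = sum(p * t for p, t in zip(pixels, taps))
    return max(0, min(255, (total + 64) >> 7))

# Illustrative inputs only (not taken from the patch).
TAPS = [-1, 6, -19, 78, 78, -19, 6, -1]
PIXELS = [255, 0, 0, 255, 255, 0, 0, 255]
```

With these inputs the early saturation in filter_middle_first loses part of the sum before the negative terms are subtracted, so it disagrees with filter_exact, while filter_largest_last only saturates at the final step, where the result would be clipped anyway.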
Change-Id: I79d0ad60c057b15011ece84cda9648eee0809423
Restructuring to allocate the bits for each frame in
a GF group at the time the group is defined.
At the moment the allocation closely mirrors what
we had before.
Also changes the default rate adjustment method to
LONG_TERM_VBR_CORRECTION.
Change-Id: Ie5793c46c6b9c888cead5d8790792efd7d60b7c1
Mismatches were found between the intrinsic version and the C-only
version. This commit temporarily reverts to the matching assembly
version to allow further investigation.
Change-Id: I08436c47d4888b562c0eac8e8856d90a831442df
Use the appropriate subblock offset mode info rather than the parent
block base, when filling mbmi in the pc tree in nonrd_use_partition.
This mimics what is done in the vertical case and what is done for
both cases in nonrd_pick_partition.
This change has little practical effect at the moment since in speed 5
rt horizontal and vertical partitions are currently only used unpaired
at edges of the picture.
Change-Id: I4632f66ca84086dac56c7d36b45ddbe38a06f42a
This applies the same correction as the one in commit "Correct ssse3
8/16-pixel wide sub-pixel filter calculation" to avoid saturation
during filtering.
Change-Id: Ife9aa3f62daf9114eb24fe38f7baa3c3f361b2d6
If we are already saving a lot in bits from the target (maximum)
bitrate in the constrained quality mode, allow the quantizer
to go lower than the cq level. This hopefully will solve issues
with getting too low a bitrate and consequently poor quality for
certain videos in cq mode.
Change-Id: I1c4e8b0171fcf58f95198b3add85eea5f3c8f19f
Renames all x86_64 specific assembly files to consistently
end in _x86_64.asm. This will be useful for build systems to
handle these files differently.
All new 64-bit specific assembly files should use the new
naming convention.
Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
This commit enables the encoder to adjust the motion detection
speed threshold based on picture size. In addition, cpu-used 1 now
does a partition search every other frame instead of every third
frame for low resolution inputs.
The change has no quality/speed impact for 720p and above. Tests
showed the change increases encoding time by 3% to 6% for
cpu-used 2 encoding of 360p sequences. It also gives a compression
gain of about 0.3%.
For cpu-used 2, the change resolved some very disturbing visual
artifacts in certain sequences when large block partitionings and
transforms are used as a result of copying the partition from a
previous frame.
Change-Id: Ic7fd22508cdb811d4ca935655adbf20109286cfa
The final goal is eventually to get rid of both itxm_add and fwd_txm4x4.
This patch does it in the decoder.
Change-Id: Ibb3db57efbcbb1ac387c6742538a9fcf2c6f24a5
The current decode_tiles decodes the frame one tile at a time
and then loop-filters the whole frame, or uses another worker thread to
do the loop filtering.
|------|------|------|------|
|Tile1-|Tile2-|Tile3-|Tile4-|
|------|------|------|------|
For example, if a tiled video has one row and four columns, decode_tiles
will decode Tile1, then Tile2, then Tile3, then Tile4.
While decoding each tile, decode_tile decodes row by row within
the tile.
For frame parallel decoding, decode_tiles will decode video in row order
across the tiles. So the order will be:
"Decode 1st row of Tile1" -> "Decode 1st row of Tile2"
-> "Decode 1st row of Tile3" -> "Decode 1st row of Tile4"
-> "Decode 2nd row of Tile1" -> "Decode 2nd row of Tile2"
-> "Decode 2nd row of Tile3" -> "Decode 2nd row of Tile4"-> "loopfilter 1st row"
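The row-interleaved order above can be sketched as a small schedule generator. This is a hypothetical illustration, not the decoder code: the function name, the one-row loop filter lag (inferred from the example above) and the final flush of the remaining rows are all assumptions.

```python
def frame_parallel_order(num_tiles, num_rows, lf_lag=1):
    """Return the decode/loopfilter steps in row order across tiles.

    lf_lag is the assumed number of rows the loop filter trails behind
    decoding; the message's example suggests a lag of one row.
    """
    steps = []
    for row in range(num_rows):
        # Decode the same row in every tile before moving to the next row.
        for tile in range(num_tiles):
            steps.append(("decode", row + 1, tile + 1))
        # Once enough rows are decoded, filter the row that is lf_lag behind.
        if row >= lf_lag:
            steps.append(("loopfilter", row - lf_lag + 1))
    # Assumed flush: filter the rows still pending after the last decode.
    for row in range(max(0, num_rows - lf_lag), num_rows):
        steps.append(("loopfilter", row + 1))
    return steps
```

For four tiles and two rows this yields the eight decode steps listed above, followed by the loop filter for the first row and then for the second.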
Change-Id: I2211f9adc6d142fbf411d491031203cb8a6dbf6b
This commit adjusts the forward 16x16 DCT computation steps to
simplify the register level operations. It fixes the corresponding
sse2 version accordingly.
Change-Id: I72a9c25b8ca9442fc5e113f47cd701ae55aa7f08
Added a skipping test in non-rd inter-mode. After interpolation
prediction step, the residuals are tested to see if they will be
quantized to 0 based on modeling between spatial domain and
frequency domain.
With static-thresh set to 800 for >=720p and 300 for <720p, rtc set
tests showed:
1. Speed 5, psnr: -0.514%; ssim: -1.748%;
speedup on related clips: 5% - 11%
2. Speed 6, psnr: -0.628%; ssim: -1.637%;
speedup on related clips: 4% - 9%
Change-Id: I62fbf26bc043ecd2b584f255f1a4ee5ab52bfcf3
Make all post-processor code conditionally
compilable based on the CONFIG_VP9_POSTPROC
macro.
Also, remove the visualization code from VP9
since it is out of date and will not compile.
Change-Id: I1e9e13a09ecd43e9a3f3704c175ae8cd258ababd
vp9_block_error_sse2 can only handle 16 bytes at a time, but
the function needs to handle a sequence of 32 bytes at a time,
so each 16 bytes is handled in a different register.
With the AVX2 optimization the 32 bytes can be handled in one register
instead of two as in SSE2.
vp9_block_error was optimized by 85%.
The user level was optimized by 1.2%.
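A rough model of the register-width difference is sketched below. It assumes, beyond what the message states, that the function accumulates squared differences between 16-bit coefficients and their dequantized values, so 16 bytes hold 8 values and 32 bytes hold 16. The function names are hypothetical.

```python
def block_error_reference(coeff, dqcoeff):
    """Sum of squared differences, one element at a time."""
    error = 0
    for c, d in zip(coeff, dqcoeff):
        error += (c - d) ** 2
    return error

def block_error_chunked(coeff, dqcoeff, lanes):
    """Same sum, processed `lanes` values per simulated register load.

    lanes=8 models a 16-byte register, lanes=16 a 32-byte register
    (assuming 16-bit elements). Returns (error, number_of_loads).
    """
    assert len(coeff) == len(dqcoeff) and len(coeff) % lanes == 0
    error = 0
    loads = 0
    for start in range(0, len(coeff), lanes):
        c = coeff[start:start + lanes]
        d = dqcoeff[start:start + lanes]
        loads += 1
        error += sum((x - y) ** 2 for x, y in zip(c, d))
    return error, loads
```

Both chunk widths give the same error; the wider register simply halves the number of loads per block in this model.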
Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd
The various motion search functions share a
common function prototype. In the case of
vp9_full_range_search() two of the parameters
are not needed.
Change-Id: I0e190af54a3b3f276409f20e8ec55912f9b0b798
Simplify the calculation of KF bitrate in a similar way
to the previous patch for GF/arf.
This has no impact on derf or std hd sets but gives a
small net gain of ~0.1% for yt and yt-hd sets.
Change-Id: Ida64ac1428d9c2a62adb67056fadbf0180eff030
The variation in boost calculation for gf and arf groups
is not significant enough to justify the extra complexity.
Also removed some other spurious code that no longer
has much material impact.
The handling of the rare case where the number of boost
bits is less than the number of bits that would
be allocated if a frame was not boosted will be dealt
with in a subsequent patch.
This change actually helps on all sets a little by
~0.1% - 0.2% with slightly bigger gains on SSIM.
Change-Id: Id42c1ac22a80a8c4993cfa0e51bc733eb9ed4f75
As a side-effect, the max_sad check is removed from the
C-implementation of VP8, for consistency with VP9, and to
ensure that the SAD tests common to VP8/VP9 pass.
That will make the VP8 C implementation of sad a little slower,
but given that it is rarely used in practice, the impact will be
minimal.
Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca
The warning messages complained that there are unused arguments
in a few prediction modes. This structure was designed on purpose,
such that a wrapper function can cover all prediction mode cases
and make them readily accessible as a pointer array.
This commit silences such warnings.
Change-Id: I7036b6bdb70747e5327d8f6fceb154f100abc4c0
Allow slightly larger minq-maxq range for P frames. This improves
the compression performance of speed -5 for rtc set by 2.7% in psnr.
Change-Id: I438653d52d0fe51111509c6092e2334bac2de0cf
When superframe index is available we completely rely on it and use frame
size values from the index.
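For reference, reading frame sizes from a superframe index can be sketched as below. The layout used here (a marker byte with the top three bits 110 at both ends of the index, a frame count in the low three bits and a byte width in the next two, little-endian sizes) is my understanding of the VP9 superframe format and should be checked against the bitstream specification; the function name is hypothetical.

```python
def parse_superframe_index(data):
    """Return the list of frame sizes from a trailing superframe index,
    or an empty list if no valid index is present."""
    if not data:
        return []
    marker = data[-1]
    # The top three bits of the marker are assumed to be 110.
    if (marker & 0xE0) != 0xC0:
        return []
    frames = (marker & 0x7) + 1          # number of frames in the superframe
    mag = ((marker >> 3) & 0x3) + 1      # bytes used per frame size
    index_size = 2 + mag * frames        # marker + sizes + marker
    if len(data) < index_size or data[-index_size] != marker:
        return []
    sizes = []
    pos = len(data) - index_size + 1
    for _ in range(frames):
        sizes.append(int.from_bytes(data[pos:pos + mag], "little"))
        pos += mag
    return sizes
```

A caller that relies on the index completely, as the message describes, would then split the payload using exactly these sizes rather than probing the data.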
Change-Id: I0011d08b223303a8b912c2bcc8a02b74d0426ee0
Inline loopfilter has been already handled in vp9_decode_frame().
Collecting all similar code in one place now.
Change-Id: I358a0280fc7c2b27cca520bc1e8c16c4eb6491dd
Re-factor duplicate code.
Add two pass check for use of section_intra_rating as
it is un-initialised in the 1 pass and rt case.
Change-Id: I93120796f07961b8a21fb26e1a9f0d3d13949994
One of a series of changes to clean up two pass
allocation as a precursor to support for multiple arf
or boosted frames per GF/ARF group.
This change pulls out the calculation of the total bits
allocated to a GF/ARF group into a function, to aid
readability and reduce the line count for define_gf_group().
This change should have no material impact on output.
Change-Id: I716fba08e26f9ddde3257e7d9b188453791883a3
This commit enables a chessboard pattern for partition search. All
the black blocks run regular partition search ranging from 8x8 to
32x32. The remaining white blocks use the nearby blocks' information
to adaptively decide the effective search range.
The compression performance for rtc set at speed -5 is down by 1.5%.
For pedestrian 1080p at speed -5, the runtime goes from 41594 ms to
39697 ms, i.e., about 5% faster.
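The chessboard split between the two kinds of blocks can be sketched as a parity test on block coordinates. Which parity counts as "black" is an assumption here, as are the function names.

```python
def runs_full_search(block_row, block_col):
    """Assumed rule: 'black' squares are those with an even coordinate sum."""
    return (block_row + block_col) % 2 == 0

def search_plan(rows, cols):
    """Label each block as running the full search or the adaptive one."""
    return [["full" if runs_full_search(r, c) else "adaptive"
             for c in range(cols)]
            for r in range(rows)]
```

With this rule every adaptive block has full-search blocks directly above, below, left and right of it (where they exist), which is what lets it borrow their information.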
Change-Id: Ia4b96e237abfaada487c743bca08fe1afd298685
Fixes the decoder in the case where:
cm->error_resilient_mode == 0, and
cm->frame_parallel_decoding_mode == 0, but
new_fb->corrupted == 1.
The assert in debug_check_frame_counts fails to
take into account the case of a corrupt frame.
Change-Id: Idf318a68458cc88d65d6f3f408a10d8ffe87e43f
* changes:
Turn on unit tests for SSSE3 8x8 forward and inverse 2D-DCT
Change eob threshold for partial inverse 8x8 2D-DCT to 12
SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero
tx_mode supersedes whatever mechanism is used to push for 16x16,
allowing for the use of the 4x4 transform.
Change-Id: I6c3f05ab9fe52050e40cc6303de9334653763289
We only used two members from that struct: max_threads and inv_tile_order.
Moving them directly to VP9Decoder struct.
Change-Id: If696a4e5b5b41868a55f3cc971e1d7c1dd9d5f69
The scanning order places the first 12 coefficients of the 8x8 2D-DCT
in the top left 4x4 block. Hence the partial inverse 8x8
2D-DCT can handle cases with eob below 12.
The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
166 cycles (using SSE2) to 150 cycles (using SSSE3).
Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
vp9_is_upper_layer_key_frame() definition does not match declaration--
it was missing the second const.
Change-Id: I71312579eb443be1924b8b06d8b3177c3dcb40f3
This commit enables ssse3 assembly implementation of the 8x8
inverse 2D-DCT with only first 10 coefficients non-zero. The
average runtime for this unit goes down from 198 cycles to 129
cycles (34.8% faster).
Change-Id: Ie7fa4386f6d3a2fe0d47a2eb26fc2a6bbc592ac7
Merged minq tables for arf and gf cases.
These tables were almost the same and for
VBR the arf table was not used at all.
Change-Id: Ie3c87e91dab613cf06f6945ac1ace0e0e4213d34
Small adjustment to the active Q range calculations.
These changes should slightly extend the available Q range
for KF/GF/ARF and narrow it for other frames.
The results for this change in isolation are broadly positive
for SSIM and average PSNR and slightly up but mixed for opsnr.
derf +0.293% opsnr, +1.286% SSIM
std-hd + 0.528% opsnr, + 1.746% SSIM
yt +0.056% opsnr, +0.457% SSIM
yt-hd -0.147% opsnr, + 0.226% SSIM
Change-Id: If065280342027ecc5d44b49fc1d440dfef041002
Includes changes that are not compatible with VS windows builds.
Amongst other things stdint.h is not supported in VS.
This reverts commit 89fbf3de50.
Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd
When the variance is far less than sse, the block is considered to
be under a light change. All the energy is compacted into the DC
coefficient and can be coded at low cost. In such a situation, switch
the rate-distortion modeling from sse+var based back to variance based.
Note that this is a temporary solution to handle the rare situations
where the scene light changes.
Change-Id: I1ee0fe2b9eda6b5fac40152e1841bf23f4d229fd
The rounding of the ARNR filter output prior to
normalization by the filter strength was incorrect
when strength = 0.
In this case 1 << (strength - 1) would not produce the
required rounding of 0; instead the shift count would be
out of range. This patch fixes this issue.
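A minimal sketch of the guarded rounding follows. It assumes the normalization is a right shift by the strength, which the message implies but does not spell out, and the function names are hypothetical.

```python
def arnr_rounding(strength):
    """Rounding term for a right shift by `strength`.

    For strength == 0 the shift count (strength - 1) would be negative,
    so the rounding is set to 0 explicitly.
    """
    return (1 << (strength - 1)) if strength > 0 else 0

def normalize(value, strength):
    """Round-to-nearest right shift (assumed form of the normalization)."""
    return (value + arnr_rounding(strength)) >> strength
```

In Python an unguarded `1 << -1` raises ValueError; in C a negative shift count is undefined behavior, which is why the special case is needed.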
Change-Id: I771809ba34d6052b17d34c870ea11ff67b418dab
This commit enables the SSSE3 version of the full inverse 8x8 2D-DCT
and reconstruction. It brings the runtime of vp9_idct8x8_64_add down
from 256 cycles (SSE2) to 246 cycles.
Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
When ARNR filtering is disabled, by setting
arnr_max_frames=0, mode_skip_mask was being set to
-1 for the ARF frame resulting in no mode being
selected for the block.
The intent is to restrict the reference frame to the
previous ARF frame and the mode to one of ZEROMV,
NEARMV or NEARESTMV.
Change-Id: Ifc3920b153142cd01d422910c94d2f20ffb6f129
On balance Deb's modified rate control for VBR seems
to be outperforming especially on some low motion YT
clips so I have switched this to be the default mode for
now.
Change-Id: I0713d430cad6425ac5c48fccdf332e12814ee44a
Assembly implementation of ssse3 8x8 forward 2D-DCT. The current
version is turned on only for x86_64. The average unit runtime
goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster.
This translates into about 1.5% speed-up for pedestrian_area 1080p
at speed 2.
Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4
This reverts commit 59e733ca81.
Hold off removing arnr_type to give users the opportunity
to change their script files to handle its deprecation. A
follow-up patch will mark the control for setting arnr_type
as deprecated and it will be removed completely in a later
revision of the code.
Change-Id: I8b817c744e144d3714234a4cd4309816d0c7e3e8
This member of VP9_COMP seemed unnecessary since it
only shadowed VP9EncoderConfig.key_freq that is
accessible through VP9_COMP.
Change-Id: Ib751bb1cf1b0b3c50a2a527d7c34f6829dd6fee3
The encoder was not handling requests to place keyframes at
fixed intervals, i.e. kf_min_dist == kf_max_dist, correctly.
In this case when looking to place the next keyframe it was
accumulating stats all the way up to the end of the firstpass
file. This patch corrects this behavior.
Change-Id: I948ad9f1d7faa0c05861df588136cce3bb61d7e7
This commit introduces a chessboard pattern search for the prediction
filter type search. It runs an extensive search in alternate blocks and
lets the remaining blocks refer to the coding decisions of their nearby
neighbors.
For pedestrian 1080p at 4000 kbps, the runtime of speed -5 goes down
from 43990 ms to 42200 ms. The overall compression performance for
RTC set is changed by -1.37%.
Change-Id: Icfe220c49451cda796f0ca91d935c9ed01e56c9d
ARNR filtering is now forced to be centered on the ARF
frame and the other two options have been removed.
The other modes of constructing the ARNR frame were
not used and there does not seem to be any good
reason to maintain them.
This is purely an encoder-side change.
Change-Id: Ic772636d23f280752973852b9740083532a49de2
For speed 3 and above, such search is only allowed at speed 3.
The change helped cif and stdhd set by 1.2% and .7% in compression,
but increased the encoding time by around 5%.
Change-Id: Ifa4832327f1c1bef3decb032ceb769cbf50e059f
Adds test code to verify that supplemental superframe information
that precedes the normal superframe information will not break
decoding.
Change-Id: Ia252b887d7ee138f51dc9a778376ff739402c455
The end_useage parameter is confusingly named since it
now actually defines the rate control method used.
Change-Id: I98912caabfe556b7af0b939a645d1336409e4d71
This commit enables a background detection approach for adaptive
quantizer control. It combines the cyclic refresh pattern and the
background information to determine the segment id for adaptive
quantizer selection, prior to the non-RD mode decision process.
It hence allows proper quantization information update for a more
precise rate-distortion modeling in the non-RD mode decision.
The compression performance of speed -5 for rtc set is improved
by 2.5%, at no speed change.
Change-Id: Ic3713e8ed9185b403b5b1679d19dabd57506d452
1. We didn't scale the source image in lower layers, so
the stats were incorrect.
2. We didn't extend borders for the reconstructed image.
Change-Id: Ia8d7bafbdb695ffa7f504e171f9449812e7bb0a3
This patch combines two VBR correction schemes to allow
more direct side-by-side testing.
(The other patch was by Debargha chg id I0cd1f7...)
Change-Id: I271c45e5c4ccf8de8305589000218b80d9dc3a25
The background detection only tracks the luma component. This commit
removes the frame buffer pointer retrieval for chroma components.
Change-Id: I098bd2950f5e5829ed5dc2b48568167248da7fad
This patch sets up a quad_tree structure (pc_tree) for holding all of
pick_mode_context data we use at any square block size during encoding
or picking modes. That includes contexts for 2 horizontal and 2 vertical
splits, one none, and pointers to 4 sub pc_tree nodes corresponding
to split. It also includes a pointer to the current chosen partitioning.
This replaces code that held an index for every level in the pick
modes array including: sb_index, mb_index,
b_index, ab_index.
These were used as stateful indexes that pointed to the current pick mode
contexts you had at each level stored in the following arrays
array ab4x4_context[][][],
sb8x4_context[][][], sb4x8_context[][][], sb8x8_context[][][],
sb8x16_context[][][], sb16x8_context[][][], mb_context[][], sb32x16[][],
sb16x32[], sb32_context[], sb32x64_context[], sb64x32_context[],
sb64_context
and the partitioning that had been stored in the following:
b_partitioning, mb_partitioning, sb_partitioning, and sb64_partitioning.
Prior to this patch, before doing an encode you had to set the appropriate
index for your block size (switch statement), update it (up to 3
lookups for the index array value) and then make your call into a recursive
function, at which point you'd have to call get_context, which then
had to do a switch statement based on the block size and then up to 3
lookups based upon the block size to find the context to use.
With the new code the context for the block size is passed around directly,
avoiding the extraneous switch statements and multi-dimensional array
lookups that were listed above. At any level in the search all of the
contexts are local to the pc_tree you are working on.
In addition, in most places, code that used to call sub functions, then
check whether the block size was 4x4 and the index was > 0, and return
no longer does so, preferring instead to call the right none function on the inside.
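The shape of the tree described above can be sketched as follows. This is an illustrative model only: the field names follow the description (none, two horizontal, two vertical, four split children, the chosen partitioning), while the context contents, the minimum block size and the builder function are assumptions.

```python
from dataclasses import dataclass, field
from typing import List

@dataclass
class PickModeContext:
    """Placeholder for per-block mode decision data (contents assumed)."""
    rate: int = 0
    dist: int = 0

@dataclass
class PCTree:
    block_size: int
    partitioning: str = "NONE"  # the currently chosen partitioning
    none: PickModeContext = field(default_factory=PickModeContext)
    horizontal: List[PickModeContext] = field(
        default_factory=lambda: [PickModeContext(), PickModeContext()])
    vertical: List[PickModeContext] = field(
        default_factory=lambda: [PickModeContext(), PickModeContext()])
    split: List["PCTree"] = field(default_factory=list)  # 4 children or empty

def build_pc_tree(block_size=64, min_size=8):
    """Recursively build the quad tree down to an assumed minimum size."""
    node = PCTree(block_size)
    if block_size > min_size:
        node.split = [build_pc_tree(block_size // 2, min_size)
                      for _ in range(4)]
    return node

def count_nodes(node):
    return 1 + sum(count_nodes(child) for child in node.split)
```

A recursive search can then pass the current node down directly, for example `root.split[2].horizontal[1]`, instead of maintaining per-level indexes and looking contexts up by block size.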
Change-Id: I06e39318269d9af2ce37961b3f95e181b57f5ed9
There is no need to initialize source/dst frame buffers at frame
level. These will be done at block coding stage. This commit hence
removes the redundant operations.
Change-Id: I11d9f2556058c6205c8e58ed53e31f78622c41b7
Add code to monitor over and under spend and
apply limited correction to the data rate of subsequent
frames. To prevent the problem of starvation or overspend
on individual frames (especially near the end of a clip) the
maximum adjustment on a single frame is limited to a %
of its un-modified allocation.
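The limited correction can be sketched as a clamp on the per-frame adjustment. The function and parameter names and the 10% default are hypothetical; only the idea of capping the adjustment at a percentage of the unmodified allocation comes from the message.

```python
def corrected_target(base_target, bits_off_target, max_adjust_pct=10):
    """Adjust one frame's bit target toward the accumulated error.

    bits_off_target > 0 means earlier frames underspent (bits are available);
    bits_off_target < 0 means they overspent. The adjustment is capped at
    max_adjust_pct percent of the frame's unmodified allocation.
    """
    max_delta = base_target * max_adjust_pct // 100
    delta = max(-max_delta, min(max_delta, bits_off_target))
    return base_target + delta
```

Because the cap scales with each frame's own allocation, a large accumulated error is paid back over several frames instead of starving or flooding a single one.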
Change-Id: I6e1ca035ab8afb0c98eac4392115d0752d9cbd7f
This commit compares the current original frame to the previous
original frame at 64x64 block level and decides if the entire
block belongs to background area. If it is in the background area,
skip non-RD partition search and copy the partition types of the
collocated block in the previous frame.
For vidyo1 in the rtc set, this makes the speed -5 coding speed
about 8% faster. The overall compression performance is down by
1.37% for rtc set.
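The decision flow can be sketched as below. The message does not state how a block is judged to be background, so the sum of absolute differences and its threshold are assumptions, as are all of the names.

```python
def block_sad(cur, prev, row0, col0, size):
    """Sum of absolute luma differences over one size x size block."""
    return sum(abs(cur[r][c] - prev[r][c])
               for r in range(row0, row0 + size)
               for c in range(col0, col0 + size))

def is_background(cur, prev, row0, col0, size=64, threshold=0):
    """Assumed criterion: the block barely changed between source frames."""
    return block_sad(cur, prev, row0, col0, size) <= threshold

def pick_partition(cur, prev, prev_partition, search_fn,
                   row0=0, col0=0, size=64, threshold=0):
    """Copy the collocated partition for background blocks, else search."""
    if is_background(cur, prev, row0, col0, size, threshold):
        return prev_partition
    return search_fn(row0, col0)
```

Here search_fn stands in for the non-RD partition search that is skipped when the block is classified as background.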
Change-Id: Iccf920562fcc88f21d377fb6a44c547c8689b7ea
Delete code relating to the old VP8_TUNE_SSIM flag
as this code does not currently work and is largely made
redundant in VP9 by the various AQ modes.
Change-Id: I71f28e1f680573d296422254489000678552b17b
Remove duplicate rd_thresh code introduced when vp9_rd_pick_inter_mode_sub8x8()
was forked from vp9_rd_pick_inter_mode_sb().
Change-Id: I3c9b7143d182e1f28b29c16518eaca81dc2ecfed
Fix rate control bug whereby the rate factor heuristics
were being updated on arf overlays causing a rate surge
for a few frames followed by a corrective drop.
This fix eliminates many of the overshoot problems that
we were seeing on hard clips (even without applying
stricter vbr rate control) and also helps quality on
almost all clips with some hard clips improving by >5%.
Overall quality results measured at speed 2.
Derf +1.78% opsnr , +2.44% SSIM
Stdhd +2.41% opsnr, +2.85% SSIM
Change-Id: I2369df6295c2705963fa6307877f6acb304bcc39
We don't use declarations from this file. The real declarations
(differently named) are in vp9_rtcd_defs.pl, e.g. vp9_full_search_sad.
Change-Id: I73cbf064305710ba20747233cfdbe67366f069a0
Added command line flags "resize-width" & "resize-height"
to allow the user to specify the frame size to encode at.
These two flags are ignored if the "resize-allowed" switch
is not set to 1.
All frames in the clip are then encoded at this size, which
must be smaller than the raw frame size.
Change-Id: I3d64bd9303d5c0bd678461a866a1ea621700d744
A previous patch improved speed 2 quality a little but
more extensive testing showed that it slowed encode
by a few %.
The change will have a similar effect for speed 3 but
should not impact speeds 4+.
This experiment should reverse that and give a speed
up at the cost of a small quality loss.
Borg results pending.
Change-Id: I4493fc1541aaf44587f1a41ff219f7088da9252c
Both values are already checked as command line arguments:
RANGE_CHECK_HI(cfg, g_lag_in_frames, MAX_LAG_BUFFERS);
RANGE_CHECK_HI(extra_cfg, sharpness, 7);
Change-Id: I584798d587152d88dfd517c210054b466f4e5f8a