If we are already saving a lot in bits from the target (maximum)
bitrate in the constrained quality mode, allow the quantizer
to go lower than the cq level. This hopefully will solve issues
with getting too low a bitrate and consequently poor quality for
certain videos in cq mode.
Change-Id: I1c4e8b0171fcf58f95198b3add85eea5f3c8f19f
Renames all x86_64 specific assembly files to consistently
end in _x86_64.asm. This will be useful for build systems to
handle these files differently.
All new 64-bit specific assembly files should use the new
naming convention.
Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
This commit enables the encoder to adjust the motion detection
speed threshold based on picture size. In addition, cpu-used 1 now
does a partition search every other frame instead of every third
frame for low resolution inputs.
The change has no quality/speed impact for 720p and above. Tests
showed that the change increases encoding time by 3% to 6% for
cpu-used 2 encoding of 360p sequences. It also gives a compression
gain of about 0.3%.
For cpu-used 2, the change resolved some very disturbing visual
artifacts in certain sequences when large block partitionings and
transforms are used as a result of copying the partition from a
previous frame.
Change-Id: Ic7fd22508cdb811d4ca935655adbf20109286cfa
The final goal is eventually to get rid of both itxm_add and fwd_txm4x4.
This patch does it in the decoder.
Change-Id: Ibb3db57efbcbb1ac387c6742538a9fcf2c6f24a5
This commit adjusts the forward 16x16 DCT computation steps to
simplify the register level operations. It fixes the corresponding
sse2 version accordingly.
Change-Id: I72a9c25b8ca9442fc5e113f47cd701ae55aa7f08
Added a skipping test in non-rd inter-mode. After interpolation
prediction step, the residuals are tested to see if they will be
quantized to 0 based on modeling between spatial domain and
frequency domain.
With static-thresh set to 800 for >=720p and 300 for <720p, rtc set
tests showed:
1. Speed 5, psnr: -0.514%; ssim: -1.748%;
speedup on related clips: 5% - 11%
2. Speed 6, psnr: -0.628%; ssim: -1.637%;
speedup on related clips: 4% - 9%
Change-Id: I62fbf26bc043ecd2b584f255f1a4ee5ab52bfcf3
vp9_block_error_sse2 can only handle 16 bytes at a time, but
the function needs to process 32 bytes per iteration,
so each 16-byte half is handled in a separate register.
With the AVX2 optimization, the 32 bytes can be handled in one register
instead of the two used by the SSE2 version.
vp9_block_error was optimized by 85%.
The user level was optimized by 1.2%.
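For reference, a scalar sketch of the kind of computation being
vectorized. It assumes that block error means the sum of squared
differences between the original and dequantized 16-bit coefficients,
plus the sum of squared original coefficients. The name
block_error_ref and its signature are illustrative only, not the
libvpx prototype.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Scalar sketch (assumed semantics): returns the sum of squared
 * differences between coeff[] and dqcoeff[], and stores the sum of
 * squared coeff[] values in *ssz. */
int64_t block_error_ref(const int16_t *coeff, const int16_t *dqcoeff,
                        size_t n, int64_t *ssz) {
  int64_t error = 0, sqcoeff = 0;
  size_t i;
  assert(coeff && dqcoeff && ssz);
  for (i = 0; i < n; ++i) {
    const int diff = coeff[i] - dqcoeff[i];
    error += (int64_t)diff * diff;
    sqcoeff += (int64_t)coeff[i] * coeff[i];
  }
  *ssz = sqcoeff;
  return error;
}
```

Under that assumption, a 16-byte register holds eight coefficients and
a 32-byte register holds sixteen, so the wider register covers in one
load what previously took two.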
Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd
The various motion search functions share a
common function prototype. In the case of
vp9_full_range_search() two of the parameters
are not needed.
Change-Id: I0e190af54a3b3f276409f20e8ec55912f9b0b798
Simplify the calculation of KF bitrate in a similar way
to the previous patch for GF/arf.
This has no impact on derf or std hd sets but gives a
small net gain of ~0.1% for yt and yt-hd sets.
Change-Id: Ida64ac1428d9c2a62adb67056fadbf0180eff030
The variation in boost calculation for gf and arf groups
is not significant enough to justify the extra complexity.
Also removed some other spurious code that no longer
has much material impact.
The handling of the rare case where the number of boost
bits is less than the number of bits that would be
allocated if the frame was not boosted will be dealt
with in a subsequent patch.
This change actually helps on all sets a little by
~0.1% - 0.2% with slightly bigger gains on SSIM.
Change-Id: Id42c1ac22a80a8c4993cfa0e51bc733eb9ed4f75
As a side-effect, the max_sad check is removed from the
C-implementation of VP8, for consistency with VP9, and to
ensure that the SAD tests common to VP8/VP9 pass.
That will make the VP8 C implementation of sad a little slower,
but given that it is rarely used in practice, the impact will be
minimal.
Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca
Allow slightly larger minq-maxq range for P frames. This improves
the compression performance of speed -5 for rtc set by 2.7% in psnr.
Change-Id: I438653d52d0fe51111509c6092e2334bac2de0cf
Inline loopfilter is already handled in vp9_decode_frame().
Collect all similar code in one place.
Change-Id: I358a0280fc7c2b27cca520bc1e8c16c4eb6491dd
Re-factor duplicate code.
Add a two-pass check for use of section_intra_rating, as
it is uninitialised in the 1 pass and rt cases.
Change-Id: I93120796f07961b8a21fb26e1a9f0d3d13949994
One of a series of changes to clean up two pass
allocation as precursor to support for multiple arf
or boosted frames per GF/ARF group.
This change pulls out the calculation of the total bits
allocated to a GF/ARF group into a function, to aid
readability and reduce the line count for define_gf_group().
This change should have no material impact on output.
Change-Id: I716fba08e26f9ddde3257e7d9b188453791883a3
This commit enables a chessboard pattern for partition search. All
the black blocks run regular partition search ranging from 8x8 to
32x32. The remaining white blocks use the nearby blocks' information
to adaptively decide the effective search range.
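A minimal sketch of how such a chessboard classification could look,
in C. The function name and the choice of which parity counts as
"black" are assumptions for illustration, not the encoder's actual
code.

```c
#include <assert.h>

/* Hypothetical predicate: returns 1 for "black" blocks, which would run
 * the regular partition search, and 0 for "white" blocks, which would
 * derive their search range from neighbouring blocks. Blocks are
 * addressed by row and column in units of the block size. */
int is_full_search_block(int block_row, int block_col) {
  assert(block_row >= 0 && block_col >= 0);
  return ((block_row + block_col) & 1) == 0;
}
```

With this parity rule every white block has black neighbours directly
above, below, left and right, so neighbouring search results are
always available to it.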
The compression performance for rtc set at speed -5 is down by 1.5%.
For pedestrian 1080p at speed -5, the runtime goes from 41594 ms to
39697 ms, i.e., about 5% faster.
Change-Id: Ia4b96e237abfaada487c743bca08fe1afd298685
tx_mode supersedes whatever mechanism is used to push for 16x16
allowing for the use of the 4x4 transform.
Change-Id: I6c3f05ab9fe52050e40cc6303de9334653763289
The vp9_is_upper_layer_key_frame() definition did not match its
declaration: it was missing the second const.
Change-Id: I71312579eb443be1924b8b06d8b3177c3dcb40f3
Merged minq tables for arf and gf cases.
These tables were almost the same and for
VBR the arf table was not used at all.
Change-Id: Ie3c87e91dab613cf06f6945ac1ace0e0e4213d34
Small adjustment to the active Q range calculations.
These changes should slightly extend the available Q range
for KF/GF/ARF and narrow it for other frames.
The results for this change in isolation are broadly positive
for SSIM and average PSNR and slightly up but mixed for opsnr.
derf +0.293% opsnr, +1.286% SSIM
std-hd +0.528% opsnr, +1.746% SSIM
yt +0.056% opsnr, +0.457% SSIM
yt-hd -0.147% opsnr, +0.226% SSIM
Change-Id: If065280342027ecc5d44b49fc1d440dfef041002
Includes changes that are not compatible with VS windows builds.
Amongst other things stdint.h is not supported in VS.
This reverts commit 89fbf3de50.
Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd
When the variance is far less than the sse, the block is considered to
be undergoing a lighting change. All the energy is compacted into the DC
coefficient and can be coded at low cost. In such a situation, switch the
rate-distortion modeling from sse+var based back to variance based.
Note that this is a temporary solution to handle the rare situations
where the scene lighting changes.
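The switch described above can be sketched roughly as follows. The
enum, the function name and the 1/4 ratio used for "far less than" are
all assumptions made for illustration; the actual threshold is not
stated here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical model selector for the rate-distortion estimate. */
typedef enum { MODEL_SSE_VAR, MODEL_VAR_ONLY } RdModelType;

/* Picks the variance-only model when var is far below sse, i.e. when
 * most of the residual energy sits in the DC term. The sse >> 2
 * threshold is an assumed value, not taken from the encoder. */
RdModelType choose_rd_model(uint32_t sse, uint32_t var) {
  assert(var <= sse); /* variance never exceeds the raw sse */
  return (var < (sse >> 2)) ? MODEL_VAR_ONLY : MODEL_SSE_VAR;
}
```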
Change-Id: I1ee0fe2b9eda6b5fac40152e1841bf23f4d229fd
The rounding of the ARNR filter output prior to
normalization by the filter strength was incorrect
when strength = 0.
In this case 1 << (strength - 1) would not create the
required rounding of 0; instead, the shift count would be
out of range. This patch fixes this issue.
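A minimal sketch of the guarded rounding, in C. The helper names are
illustrative, and it is assumed here that normalization is a right
shift by strength. The key point is that strength - 1 is negative when
strength is 0, and shifting by a negative count is undefined in C, so
that case must yield a rounding term of 0 explicitly.

```c
#include <assert.h>

/* Hypothetical helper: rounding term added before a right shift by
 * `strength`. For strength == 0 no rounding is needed, and evaluating
 * 1 << (strength - 1) would shift by -1. */
int arnr_rounding(int strength) {
  assert(strength >= 0 && strength < 31);
  return strength > 0 ? 1 << (strength - 1) : 0;
}

/* Hypothetical normalization step: round, then shift by strength. */
int arnr_normalize(int value, int strength) {
  return (value + arnr_rounding(strength)) >> strength;
}
```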
Change-Id: I771809ba34d6052b17d34c870ea11ff67b418dab
This commit enables the SSSE3 version of the full inverse 8x8 2D-DCT
and reconstruction. It reduces the runtime of vp9_idct8x8_64_add
from 256 cycles (SSE2) to 246 cycles.
Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
When ARNR filtering is disabled, by setting
arnr_max_frames=0, mode_skip_mask was being set to
-1 for the ARF frame resulting in no mode being
selected for the block.
The intent is to restrict the reference frame to the
previous ARF frame and the mode to one of ZEROMV,
NEARMV or NEARESTMV.
Change-Id: Ifc3920b153142cd01d422910c94d2f20ffb6f129
On balance, Deb's modified rate control for VBR seems
to perform better, especially on some low motion YT
clips, so I have switched this to be the default mode for
now.
Change-Id: I0713d430cad6425ac5c48fccdf332e12814ee44a
Assembly implementation of ssse3 8x8 forward 2D-DCT. The current
version is turned on only for x86_64. The average unit runtime
goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster.
This translates into about 1.5% speed-up for pedestrian_area 1080p
at speed 2.
Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4
This reverts commit 59e733ca81.
Hold off removing arnr_type to give users the opportunity
to change their script files to handle its deprecation. A
follow-up patch will mark the control for setting arnr_type
as deprecated and it will be removed completely in a later
revision of the code.
Change-Id: I8b817c744e144d3714234a4cd4309816d0c7e3e8
This member of VP9_COMP seemed unnecessary since it
only shadowed VP9EncoderConfig.key_freq that is
accessible through VP9_COMP.
Change-Id: Ib751bb1cf1b0b3c50a2a527d7c34f6829dd6fee3
The encoder was not handling requests to place keyframes at
fixed intervals, i.e. kf_min_dist == kf_max_dist, correctly.
In this case when looking to place the next keyframe it was
accumulating stats all the way up to the end of the firstpass
file. This patch corrects this behavior.
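The intended behavior can be sketched as a scan that stops at the
maximum keyframe distance. This is a simplified illustration with a
hypothetical function name; scene-cut detection and the actual stats
accumulation are omitted.

```c
#include <assert.h>

/* Hypothetical helper: number of frames until the next keyframe, given
 * how many first-pass stats entries remain. With kf_min_dist ==
 * kf_max_dist the scan stops at the fixed interval rather than running
 * to the end of the stats. */
int frames_to_next_key(int frames_left, int kf_min_dist, int kf_max_dist) {
  int frames_to_key = 0;
  assert(kf_min_dist >= 0 && kf_min_dist <= kf_max_dist);
  while (frames_to_key < frames_left) {
    ++frames_to_key; /* stats for one more frame would be accumulated here */
    if (frames_to_key >= kf_max_dist) break; /* forced keyframe */
  }
  return frames_to_key;
}
```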
Change-Id: I948ad9f1d7faa0c05861df588136cce3bb61d7e7
This commit introduces a chessboard pattern search for the prediction
filter type search. It runs an extensive search in alternate blocks and
allows the remaining blocks to refer to the coding decisions of their
nearby neighbors.
For pedestrian 1080p at 4000 kbps, the runtime of speed -5 goes down
from 43990 ms to 42200 ms. The overall compression performance for
RTC set is changed by -1.37%.
Change-Id: Icfe220c49451cda796f0ca91d935c9ed01e56c9d