With row-tile enabled, the encoder test driver needs to configure the
decoder in order to decode all tiles correctly.
Change-Id: I8b00a766cf5e41255625846f92fd71915c614ec1
Allow for 3 quant profiles from entropy context
Refactored dq_offset bands to allow for re-optimization based on number
of quantization profiles
Change-Id: Ib8d7e8854ad4e0bf8745038df28833d91efcfbea
Optical flow parameters still need to be tweaked and is much slower
so feature based should be the default for now.
Change-Id: Id6cafb5a245e329f728e9c66c89c0ed1018c347c
For each global motion parameter, search some step
size to the left and right to try and minimize warp
mse and correct for error.
Change-Id: I1f0b464b0924d56b15460e509f89736af9b7e78f
A tile copy mode is introduced, while allows a tile to use
another tile's coded data directly at bitstream level. This
largely reduces the bit rate in this use case. Our tests
showed that 10% - 20% bit rate reduction was achieved.
Change-Id: Icf5ae00320e27193b15ce95297720f8b6f5e7fd9
1. Made VP8_SET_REFERENCE vpx_codec_control API work in VP9 based
on Ryan Overbeck's patch. (Thanks.)
2. Added vp9cx_set_ref example, which demonstrated how to set
VP8_SET_REFERENCE vpx_codec_control in encoder and decoder. If
we only set_reference in the encoder, the encoder/decoder
mismatch would be observed.
3. Also updated test/cx_set_ref.sh.
Change-Id: I433a6220616947ca8c73e65a5fb3694751ea84b6
Added dq_off_index attribute to mbmi to allow for switching between
dequantization modes.
Reduced number of different dequantization modes from 5 to 3.
Changed dequant_val_nuq to be allow for 3 dequant levels instead of 1.
Fixed lint errors
Change-Id: I7aee3548011aa4eee18adb09d835051c3108d2ee
Implements wedge mask generation by pre-computing arrays.
Improves encode speed by 15-20%.
Also consolidates the mask generation code for inter-inter
and inter-intra.
Change-Id: If1d4ae2babb2c05bd50cc30affce22785fd59f95
In large-scale tile coding, when the number of tiles is large and the tile
size is small, using a fixed number of bytes in the tile header to store
tile-data size information as done in current VP9 codec would bring high
overhead for each tile. This patch implemented 2 ways to lower that overhead
and adaptively determine the number of bytes needed for tile-data size
transmission.
The test on a test clip having the tile size of 64x64 showed that the number
of bytes used for storing tile-data size was reduced from 4 to 1, which
substantially improved the compression ratio in large-scale tile coding.
Change-Id: Ia02fc43eda67fa252fbc2554321957790f53f6fd
A small gain (0.1 - 0.2%) with this experiment on derflr/hevcmr.
The DST2 can be implemened very efficiently using sign flipping
of odd indexed inputs, followed by DCT, followed by reversal of
the output. This is how it is implemented in this patch.
SIMD optimization is pending.
Change-Id: Ic2fc211ce0e6b7c6702974d76d6573f55cc4da0e
Includes tests which compare output of new SSSE3 functions with
their C equivalents, and fixes to the C code to ensure these tests
pass.
Change-Id: Iec3980cce95a8ee6bf9421fa4793130e92c162e3
Changes ensure wedge_partition, interintra and palette expts all
work alongside the ext_coding_unit_size experiment.
Change-Id: I18f17acb29071f6fc6784e815661c73cc21144d6
When the error resilient mode is on, the decoder resets mode info structure
to zero once per frame. This makes decoder about 10x slower if we decode
a single tile at a time. This patch resolves the issue by only memset mode
info of those decoded tiles. Currently, to decode a frame, tile decoding is
less than 2x slower than frame decoding.
Change-Id: Ia3fd88d91a4e74e7bbbc6547d87c24d085a1533e
This is a port of 4f5108090a
This commit also fixes a bug where FLIPADST transforms when combined
with a DST (that is FLIPADST_DST and DST_FLIPADST) did not actually did
a flipped transform but a straight ADST instead. This was due to the C
implementation that it fell back on not implementing flipping. This is
now fixed as well and FLIPADST_DST and DST_FLIPADST does what it is
supposed to do.
There are 3 functions in the SR_MODE experiment that should be updated,
but given that the build of SR_MODE is broken at the upstream tip of
nextgen, I could not test these, so I have put in assertions and FIXME
notes at the problematic places.
Change-Id: I5b8175b85f944f2369b183a26256e08d97f4bdef
Extending the SSE2 implementation of vp9_subtract_block to work
with the 128x128 coding unit experiment
Change-Id: Ib3cc16bf5801ef2c7eecc19d3cc07a8c50631580
This is a port of 01bb4a318d
This commit also fixes a bug where FLIPADST transforms when combined
with a DST (that is FLIPADST_DST and DST_FLIPADST) did not actually did
a flipped transform but a straight ADST instead. This was due to the C
implementation that it fell back on not implementing flipping. This is
now fixed as well and FLIPADST_DST and DST_FLIPADST does what it is
supposed to do.
Change-Id: I89c67ca1d5e06808a1567c51e7d6bec4998182bd
Also fixes a valgrind error when optimizations are disabled.
Done in preparation for the work on the extended coding unit size
experiment.
Change-Id: Ib074c5a02c94ebed7dd61ff0465d26fa89834545
Remove default cortex-a8 tuning.
Probably not even the dominant platform the library is being built for.
Add --cpu= option description to help. The option already exists.
Don't allow passing just --cpu as a no-op.
BUG=826
Change-Id: Iaa3f4f693ec78b18927b159b480daafeba0549c0
For tall rectangular blocks, the block_idx of the lower transform
block was being mis-calculated.
Does not affect results the way this function is being used now.
Change-Id: I470464d19be0bf0f42003d0cc29793bc42db8f52
These changes have been made in preparation for the work on the
extended coding unit size experiment.
Change-Id: I83f289812426bb9aba6c4a5fedd2b0e0a4fe17cb
Speeds up wedge search by pre-calculating single predictions
before computing the wedge combination.
About 20% speed up achieved.
Change-Id: I72f76568559d1899c8ac0afaa133d433ba388e6d
Under the experiment of CONFIG_LAST4_REF. On derflr testset, using
highbitdepth (HBD), in average PSNR,
(1) LAST2+LAST3+LAST4 obtained +0.361% against LAST2+LAST3;
(2) LAST2+LAST3+LAST4 obtained +1.567% against baesline.
Change-Id: Ic8b14272de6a569df2b54418fa72b505e1ed3aad
On derflr testset, using 12-bit HightBitDepth mode, this CL obtained a
small gain of +0.031% by turning on LAST2+LAST3.
Change-Id: Ib6c9d595e56269634bf29d684eabcd806fc08cc9
Under experiment CONFIG_LAST3_REF, which can only be turned on when
the experiment of CONFIG_MULTI_REF is on, i.e. LAST3_FRAME can only
be used when LAST2_FRAME is used. CONFIG_LAST3_REF would most likely
be combined with CONFIG_MULTI_REF once the performance improvement
is further confirmed.
On the testset of derflr, using Average PSNR metrics, with HighBitDepth
(HBD) on:
(1) LAST2 HBD obtained +0.579% against base HBD;
(2) LAST2 + LAST3 HBD obtained +0.591% against LAST2 HBD;
(3) LAST2 + LAST3 HBD obtained +1.173% against base HBD.
Change-Id: I1aa2b2e2d2c9834e5f8e61bf2d8818c7b1516669
All the numbers of MAX_MODES have been changed assuming
CONFIG_MULTI_REF. Now correct numbers have been put in for both with and
without the enabling of the experiment MULTI_REF.
Change-Id: I70ffe2f1a89fa572d612dd3d311d3af19fe3a632
Turning on all the other experiments, compared the RD performance
between with and without the use of LAST2_FRAME, on derflr testset,
on Average PSNR:
8-bit: +0.653% (All positive except one,
max: mobile_cif: 2.019%; min: paris_cif: -0.081%)
12-bit HBD: +0.735% (All positive,
max: bridge_far_cif: 2.416%; min: bowing_cif: 0.132%)
Change-Id: Ia0a375667e228c8ba3d2e223abff608206f2f545
This CL introduces a few macros plus code cleaning on the encoding of
the reference frames. Coding performance remains unchanged.
For the encoding of either the compound reference or the single reference
case, since each bit has different contexts, the tree structure may not
be applied to treat the combined bits as one symbol. It is possible we may
explore the sharing of the same context for all the bits to introduce
the use of tree structure for the next step.
Change-Id: I6916ae53c66be1a0b23e6273811c0139515484df
Under the experiment CONFIG_MULTI_REF. Current version shows
LAST2 vs base in nextgen on the testset of derflr:
(1) 8-bit: Average PSNR +0.53%
(worst: students_cif: -0.247%; best: mobile_cif: 1.902%)
(2) 12-bit HBD: Average PSNR +0.63%
(worst: pamphlet_cif: -0.213%, best: mobile_cif: 2.101%)
More tuning on the reference frame context design and default
probs are being conducted. This version does not guarantee to
work with other experiments in nextgen. A separate CL will address
the working with all other experiments.
Change-Id: I7f40d2522517afc26ca389c995bad56989587f65
CONFIG_SR_MODE=1, enable SR mode
USE_POST_F=1, enable SR post filter
SR_USE_MULTI_F=1, enable SR post filter family
Not compatible with other experiments yet
Change-Id: I116f1d898cc2ff7dd114d7379664304907afe0ec
Disables some test vector tests when Vp8/Vp9 decoders are disabled
in configuration. Also moves some macros to the vpx level in
line with recent refactoring on the master branch.
Change-Id: Iaac8008992110398ae096c36b9726f723164c207
derflr: up to 1.429% from a little less than 1.3% when
--enable-dst1 is also enabled with --enable-ext-tx.
Change-Id: I301229f2239b18acb96accc4fc44b64fa6927ace
Adds an additional transform in the ext_tx experiment that
is a 2d DST1-DST1 combination.
To enable use --enable-ext-tx --enable-dst1.
This needs to be later extended to combine DST1 with DCT
or ADST.
Change-Id: I6d29f1b778ef8294bcfb6a512a78fc5eda20723b
Limited the prediction extension to 8 pixels at each edge
Fixed a bug in the combination of wedge prediction and supertx
~10% speed up in decoder
derflr: -0.004
derflr+hbd: +0.002
hevcmr: +0.015
Change-Id: I777518896894a612c9704d3de0e7902bf498b0ea
Framework for alternate transforms for inter 32x32 and larger based
on dwt-dct hybrid is implemented.
Further experiments are to be condcuted with different
variations of hybrid dct/dwt or plain dwt, as well as super-resolution
mode.
Change-Id: I9a2bf49ba317e7668002cf1499211d7da6fa14ad
Optimization of bilateral filter:
1) Pre-calculate the bilateral filters at all the
levels at the initialization.
2) Convert 1D matrix to 2D matrix, avoid too many
multiplications in the bilateral filter loop.
3) Fix a bug in "loop_bilateral_filter_highbd".
The right-shifted range index can be larger than 255.
Change-Id: I42f6177e896706948e2403bd9edce46e3eb5cbf8
Predict displacement vector with the same logic as NEARESTMV. If no
neighbors are available fall back to the current one block left or up
prediction.
vp9+intrabc+tx_skip+palette: -0.489
vp9+intrabc: -0.771
Change-Id: If67d08b54f1a3b847cf7ab8c7b800c55baa1a86b
Changes to allow high bit depth and palette to be enabled at the
same time by using a 16 bit (instead of 8bit) palette when high
bit depth is enabled and modifying related functions accordingly.
Change-Id: I97d30b4d9338d3a51db02c94bc568eba60a8905d
This patch avoids using tx_size larger than 16x16 in lossless mode.
Big block quantization (32x32 or larger) is not lossless.
Change-Id: I69cd84d4f3fd06d641048d6096da1bfde18ad24e
There are 4 entropy tables to select for initial entropy table,
depending on the frame base q-index. The entropy tables are
trained with derf, yt, and stdhd sets. About 0.2% gain on
the following test sets:
derflr 0.227%
yt 0.277%
stdhd 0.233%
hevclr 0.221%
hevcmr 0.155%
hevchr 0.182%
Change-Id: I3fde846c47fc020e80c814897690b4cda1da569c
Change-Id: I460408372586c823974f945ed9fd8dcb0360fbaf
For the combination of this and removing NEWDV from the tree:
derflr: -0.101 screen_content: +0.053
The bulk of the decline in screen content effecincy is from the liquify
clip. These should be recoverable by further entropy tweaks.
Change-Id: I9d80152b8492e60a0367c31797fb6932fb09bba9
The wavelets implemented are 2/6, 5/3 and 9/7 each with
a lifting based scheme for even block sizes. The 9/7
one is a double implementation currently.
This is to start experiments with:
1. Replacing large transforms (32x32 and 64x64) with wavelets
or wavelet-dct hybrids that can hopefully localize errors better
spatially. (Will also need alternate entropy coder)
2. Super-resolution modes where the higher sub-bands may be
selectively skipped from being conveyed, while a smart
reconstruction recovers the lost frequencies.
The current patch includes two types of 32x32 and 64x64
transforms: one where only wavelets are used, and another
where a single level wavelet decomposition is followed
by a lower resolution dct on the low-low band.
Change-Id: I2d6755c4e6c8ec9386a04633dacbe0de3b0043ec
This was originally part of change-id:Idef18f90b111a0d0c9546543d3347e551908fd78
but the rest of that patch has previously been incorporated into nextgen
without these tests.
Change-Id: I6ac491ed1cfc153e0ed3cb999c76feac9d5e74e3
This fixes a failure to decode as VP9 with frame level signalling
patched in and removed to implictly force it off.
This is primarily a correctness fix, but there were small coding gains.
0.074% smaller on derlf
0.180% smaller on screen_content
Change-Id: I4cc264bf4d9016201924be198a0c5b8374020fd9
Skip flag is removed from the bitstream for all blocks coded in intra
mode. Very minor coding gain in derf and stdhd sets (0.048 and 0.1)
Change-Id: I79f03300f16d6fa84ce54405cafecab8a021cd7d
If the decoder is configured to decode a tile indexed above the
upper limit, the internal codec will clip the index to the upper
limits.
Change-Id: Icbc1bb7b14069ac009e0a2042dd66a46d6f76679
This commit allows the vpxdec to produce the target tile
reconstruction as its output. When the provided tile index is -1,
the entire corresponding row/column tiles will be reconstructed.
If the tile index is over the upper limit, the decoder will decode
left-most/bottom tile.
Change-Id: I4c18bdb32099f736f99b8842f7f177a32b3fee09
This commit allows the decoder to decode selective tiles according
to decoder configuration settings. To decode a single tile out of
the provided key frame bit-stream (test_kf.webm), set compiler
configuration:
--enable-experimental --enable-row-tile --enable-key-frame-tile
use the command:
vpxdec -o test_dec.y4m test_kf.webm --tile-row=1 --tile-column=2
where the tile's row and column indexes are 1 and 2, respectively.
To decode all row tiles inside a provided column index, use:
--tile-row=-1 --tile-column=2
To decode all column tiles inside a provided row index, use:
--tile-row=2 --tile-column=-1
Change-Id: Ib73c266414dcee7eaab5d741b90d0058970dae56
Fixing compile errors when intrabc and vp9_highbitdepth are both enabled.
Error arose because intrabc uses vp9_setup_scale_factors_for_frame()
which takes an additional argument (use_highbitdepth) when
vp9_highbitdepth is enabled.
Change-Id: I6a15b09dcea0d35525f4c25efb6848804ae5cfab
With this patch, the ZEROMV mode is overloaded to represent
a single global dominant motion using one of three models:
1. True zero translation motion (as before)
2. A translation motion different from 0
3. A Rotation-zoom affine model where the predictor is warped
The actual model used is indicated at the frame level for
each reference frame.
A metric that computes the ratio of the error with a global
non-zero model to the error for zero motion, is used to
determine on the encoder side whether to use one of the two
non-zero models or not.
Change-Id: I1f3d235b8860e543191237024a89041ff3aad689
This commit makes the bit-stream syntax support fast selective tile
decoding in a large scale tile array. It reduces the computational
complexity of computing the target tile offset in the bit-stream
from quadratic to linear scale, while maintaining relatively small
stack space requirement (in the order of 1024 bytes instead of 1M
bytes). The overhead cost due to tile separation remains identical.
Change-Id: Id60c6915733d33a627f49e167c57d2534e70aa96
Now this is an on-going work on the re-work on the motion vector
references for all Inter coding modes. Currently implementation on
sub8x8 and BLOCK_8X8 is done. More work will be added along the way.
Essetial ideas include:
(1) Added new nearestmv algorithm through adaptive median search, out of
the four nearest neighbors: TOP, LEFT, TOPLEFT, and TOPRIGHT;
(2) Added new sheme for sub8x8 to obtain the mv ref candidates,
specially, mv ref candidates are obtained depending on the sub8x8 mode
of the current block as well as the sub8x8 mode of the neighboring
block;
(3) Added top right corner mv ref candidate whenever it is available;
The adding of the top right mv ref has showed potential in helping such
video clips as bridge_far_cif.
Change-Id: I573c04cd346ed7010f4ad87a6eaa6bab6e2caf9c
Adding a high bit depth version of the function loop_bilateral_filter()
and ensuring the appropriate version of this function is called when
vp9_highbitdepth is enabled.
Change-Id: I7a78029c733a9a79b3b2f39af0de0824239ad5a3
If a non-skipped block has all transform blocks with only 0 data, then
decoder infers skip flag. This affects the loopfilter. No real encoder
would do this though, so it is pointless. Also, it causes headaches in
HW implmentations as the loop filter cannot proceed until all TX blocks
in the block have been checked. There could be up to 768 of them in
64x64 4:4:4 with 4x4 transform.
Change-Id: I45a021d1f27ca7feefed2242605777e70ce7cabd
This commit allows the encoder to process tile coding per 64x64
block. The supported upper limit of tile resolution is the minimum
of frame size and 4096 in each dimension. To turn on, set
--experiment --row-tile
and compile.
It overwrite the old --tile-columns and --tile-rows configurations.
These two parameters now tell the encoder the width and height of
tile in the unit of 64x64 block. For example,
--tile-columns=1 --tile-rows=1
will make a tile contains a single 64x64 block.
Change-Id: Id515749a05cfeb9e9d008291b76bdfb720de0948
This commit allows the internal codec handle arbitrary tile size
in the unit of 64x64 pixel blocks.
Change-Id: I3ad24de392064645bebab887c94e1db957794916
Move the 2D tile info arrays as global variables. This resolves
the local function stack overflow issue due to excessively large
tile info variables. This allows the internal operation to support
up to 1024 row and column tiles.
Change-Id: I6644cc929e5d3a778a5c03a712ebfc0b8729f576
This commit allows the codec to use up to row tiles (optionally
in combination with up to 64 column tiles per row tile). The
minimum tile size is set to be 256x256 pixel block.
Change-Id: I811ca93f0c5eba41e190f6c7c0f064d1083f530f
The max and min tile number reference should be used to support
both row and column tiles. This commit renames the previous col
prefix to avoid confusion.
Change-Id: I487bea43701af946b79023597a9a9a0516480380
Runborgs results on derflr show consistent results between NEW_INTER
and the previous combination of NEWMVREF and COMPOUND_MODES.
Change-Id: Ieba239c4faa7f93bc5c05ad656a7a3b818b4fbfc
Test VP9/EndToEndTestLarge.EndtoEndPSNRTest/1 (422 stream) failed when
supertx enabled. This was because 4x8 and 8x4 blocks were not being
split into 4x4s during tokenization in the encoder. This patch
uses vp9_foreach_transformed_block() to fix this.
Change-Id: I1f1cb27474eb9e04347067f5c4aff1942bbea8d9
Fix the row tile boundary detection issues. This allows to use
more resources for parallel encoding/decoding when avaiable.
Change-Id: Ifda9f66d1d7c2567dd4e0a572a99a83f179b55f9
Besides code cleaning, this patch contains 3 fixes:
(1) Fixed the COMPOUND_MODES for the NEW_NEWMV mode;
(2) Fixed the joint search when the NEAR_FORNEWMV mode (in NEWMVREF)
is being evaluated;
(3) Fixed the WEDGE_PARTITION when the NEAR_FORNEWMV mode (in NEWMVREF)
is being evaluated.
(4) Adjusted the entropy probability value for NEAR_FORNEW mode.
On derflr turning on all 14 experiments (except for global-motion), the
average gain w.r.t. PSNR is +0.07%:
Maximum on bridge_far_cif: +1.02%
Minimum on hallmonitor_cif: -0.16%
Change-Id: I4c9c6ee24a981af7e655a629580641d9f9745f91
Use separate token probabilities and counters for non-transform
blocks (pixel domain) . Initial probabilities are trained with screen_content
clips. On screen_content, it improves coding performance by about
2% (from +16.4% to +18.45%).
The initial probabilities are not optimized for natural videos. So it should
not be used for natural videos. Set FOR_SCREEN_CONTENT as 0/1 to specify
whether or not to enable this patch.
Change-Id: Ifa361c94bb62aa4b783cbfa50de08c3fecae0984
Implements a first version of global motion where the
existing ZEROMV mode is converted to a translation only
global motion mode.
A lot of the code for supporting a rotation-zoom affine
model is also incorporated.
WIP.
Change-Id: Ia1288a8dfe82f89484d4e291780288388e56d91b
The 8x8 DCT uses a fast version whenever possible.
There was a mistake in the checking code which
meant sometimes the fast version was used when it
was not safe to do so.
Change-Id: I154c84c9e2d836764768a11082947ca30f4b5ab7
In the case when there are only non-zero coefficients
in the first 4x4 block a special routine is called.
The highbitdepth optimized version of this routine
examined the wrong positions when deciding whether
to call an assembler or C inverse transform.
Change-Id: I62da663ca11775dadb66e402e42f4a1cb1927893
This is a combination of:
4a19fa6 Added sse2 acceleration for highbitdepth variance
c6f5d3b Fix high bit depth assembly function bugs
Change-Id: I446bdf3a405e4e9d2aa633d6281d66ea0cdfd79f
This change is made in preparation for a
subsequent patch which adds acceleration
for the highbitdepth transform functions.
The highbitdepth transform functions attempt
to use 16/32bit sse instructions where possible,
but fallback to using the C implementations if
potential overflow is detected. For this reason
the dct routines are made global so they can be
called from the acceleration functions in the
subsequent patch.
Change-Id: Ia921f191bf6936ccba4f13e8461624b120c1f665
It is a minor change, but the essential idea is to use the mv of the
top right block as the nearmv for the bottom left partition in the
sub8x8 block. The change is under the experiment of NEWMVREF.
When all 13 experiments are on (except for INTRABC), the gain is +0.05%:
Worse on bowing_cif: -0.17%
Best on foreman_cif: +0.42%; and bridge_far_cif: +0.40%
The total 13 experiments achieved a gain of +6.97% against base.
Change-Id: I3a51d9e28b34b0943fe16a984d62bfb38304ebca
vp9_quantize_rect did illegal shifts but didn't use the results.
The shift |a << b| is unfortunately undefined if |a < 0|, but the
more verbose |a * (1 << b)| generates the same machine code.
Change-Id: I7ceac66fa20a700630cf8ed008949146b161dab4
use \li to denote list items with \if.
fixes the following likely visible in <1.8.3:
usage.dox: warning: Invalid list item found
usage.dox: warning: End of list marker found without any preceding list items
Change-Id: I33c72799edf9f8866596ac8f79247050b8c75681
Do not treat first element (dc) differently.
on screen_content
tx-skip only: +16.4% (was +15.45%)
no significant impact on natrual videos
Change-Id: I79415a9e948ebbb4a69109311c10126d8a0b96ab
palette expt: correctly update color buffer
intrabc expt: update zcoeff_blk so that residue coding will not
be mistakenly skipped
Change-Id: I870f5b742c2ac394f4c871aa65e6591e293d8ef6
Changes include:
* Uses double for RD cost computation to guard against overflow
for large resolution frames.
* Use previous frame's filter level to code the level better.
* Change precision of the filter parameters.
* Allow spatial variance for x and y to be different
Change-Id: I1669f65eb0ab1e8519962954c92d59e04f1277b7
derflr: +0.556% (a little up from before)
Use raster scan order for non-transform blocks
+15.45% (+2.1%) on screen_content
no significant change on natural videos
Change-Id: I0e264cb69e8624540639302d131f7de9c31c3ba7
Adds an internal buffer in the encoder to store the deblocked
result to help speed up the search for the best bilateral filter.
Very small change in performance but a lot faster:
derflr: +0.518%
Change-Id: I5d37e016088e559c16317789cfb1c2f49334b2b9
+0.3% on 10-bit
+0.3% on 12-bit
With other high bit compatible experiments on 12-bit
+12.44% (+0.17) over 8-bit baseline
Change-Id: I40b4c382fa54ba4640d08d9d01950ea8c1200bc9
Adds a framework to incorporate a parameterized loop
postfilter in the coding loop after the application of the
standard deblocking loop filter.
The first version uses a straight bilateral filter
where the parameters conveyed are just spatial and
intensity gaussian variances.
Results on derflr:
+0.523% (only with this experiment)
+6.714% (with all expts other than intrabc)
Change-Id: I20d47285b4d25b8c6386ff8af2a75ff88ac2b69b
This patch allows the prediction residues of tx-skipped blocks
to use probs that are different from regular transfrom
coefficients for token entropy coding. Prediction residues are
assumed as in band 6.
The initial value of probs is obtained with stats from limited
tests. The statistic model for constrained token nodes has not
been optimized. The probs for token extra bits have not been
optimized. These can be future work.
Certain coding improvment is observed:
derflr with all experiments: +6.26% (+0.10%)
screen_content with palette: +22.48% (+1.28%)
Change-Id: I1c0d78178ee9f3655febb6f30cdaef8ee9f8e3cc
This experiment, referred as NEWMVREF, also merged with NEWMVREF_SUB8X8
and the latter one has been removed. Runborgs results show that:
(1) Turning on this experiment only, compared against the base:
derflf: Average PSNR 0.40%; Overall PSNR 0.40%; SSIM 0.35%
(2) Turning on all the experiments including this feature, compared against
that without this feature, on the highbitdepth case using 12-bit:
derflf: Average PSNR 0.33%; Overall PSNR 0.32%; SSIM 0.30%.
Now for highbitdepth using 12-bit, compared against base:
derflf: Average PSNR 11.12%; Overall PSNR 11.07%; SSIM 20.27%.
Change-Id: Ie61dbfd5a19b8652920d2c602201a25a018a87a6
The basic idea is to use a pixel’s neighboring colors as
context to predict its own color. Up to 4 neighbors are
considered here: left, left-above, above, right-above.
To reduce the number of contexts, the combination of any
4 (or less) colors are mapped to a reduced number of
patterns. For example, 1111, 2222, 3333, … , can be mapped
to the same pattern: AAAA. SImilarly, 1122, 1133, 2233, …,
can be mapped to the pattern AABB. In this way, the total
number of color contexts is reduced to 16.
This almost doubles the gain of palette coding on screen
content videos.
on screen_content
--enable-palette +14.2%
--enable-palette --enable-tx-skip +21.2%
on derflr
--enable-palette +0.12%
with all other experiments +6.16%
Change-Id: I560306dae216f2ac11a9214968c2ad2319fa1718
Also make changes to transmit palette-enabled flag using
neighbor blocks as context.
on screen_content
--enable-palette +7.35%
on derflr
with all other experiments +6.05%
Change-Id: Id6c2f726d21913d54a3f86ecfea474a4044c27f6
on screen_content
--enable-palette +6.74%
on derflr
with all other experiments +6.02%
(--enable-supertx --enable-copy-mode
--enable-ext-tx --enable-filterintra
--enable-tx64x64 --enable-tx-skip
--enable-interintra --enable-wedge-partition
--enable-compound-modes --enable-new-quant
--enable-palette)
Change-Id: Ib85049b4c3fcf52bf95efbc9d6aecf53d53ca1a3
This framework allows lower quantization bins to be shrunk down or
expanded to match closer the source distribution (assuming a generalized
gaussian-like central peaky model for the coefficients) in an
entropy-constrained sense. Specifically, the width of the bins 0-4 are
modified as a factor of the nominal quantization step size and from 5
onwards all bins become the same as the nominal quantization step size.
Further, different bin width profiles as well as reconstruction values
can be used based on the coefficient band as well as the quantization step
size divided into 5 ranges.
A small gain currently on derflr of about 0.16% is observed with the
same paraemters for all q values.
Optimizing the parameters based on qstep value is left as a TODO for now.
Results on derflr with all expts on is +6.08% (up from 5.88%).
Experiments are in progress to tune the parameters for different
coefficient bands and quantization step ranges.
Change-Id: I88429d8cb0777021bfbb689ef69b764eafb3a1de
For 444 videos, a single palette of 3-d colors is
generated for YUV. For 420 videos, there may be two
palettes, one for Y, and the other for UV.
Also fixed a bug when palette and tx-skip are both on.
on derflr
--enable-palette +0.00%
with all experiments +5.87% (was +5.93%)
on screen_content
--enable-palette +6.00%
--enable-palette --enable-tx_skip +15.3%
on screen_content 444 version
--enable-palette +6.76%
--enable-palette --enable-tx_skip +19.5%
Change-Id: I7287090aecc90eebcd4335d132a8c2c3895dfdd4
All stats look fine.
derflr: +0.912 with respect to 10-bit internal baseline
(Was +0.747% w.r.t. 8 bit)
+5.545 with respect to 8-bit baseline
Change-Id: I3c14fd17718a640ea2f6bd39534e0b5cbe04fb66
This commit adds encoder side control for vp9 to set color space info
in the output compressed bitstream.
It also amends the "vp9_encoder_params_get_to_decoder" test to verify
the correct color space information is passed from the encoder end to
decoder end.
Change-Id: Ibf5fba2edcb2a8dc37557f6fae5c7816efa52650
(cherry picked from commit e94b415c34)
This commit added a field to vpx_image_t for indicating color space,
the field is also added to YUV_BUFFER_CONFIG. This allows the color
space information pass through the decoder from input stream to the
output buffer.
The commit also updates compare_img() function with added verification
of matching color space to ensure the color space information to be
correctly passed from encode to decoder in compressed vp9 streams.
Change-Id: I412776ec83defd8a09d76759aeb057b8fa690371
(cherry picked from commit 6b223fcb58)
The calculation of required extension used in HBD case was wrong due
to rounding for UV when y dimension is odd. This commit replace the
computation with correct version.
This fixes a crash caused by writting beyond buffer boundary.
Change-Id: Ic7c9afeb7388cd1341ec4974a611dacfb74ac6b6
(cherry picked from commit 4bca73b609)
This commit changes the value of highbitdepth flag to avoid conflict
with vp8 refresh_last_frame flag.
Change-Id: Idcff2cf44f0a200bd935b326f785c0cf32d7228a
(cherry picked from commit dd27307cac)
For all sub8x8 partition, the mv ref has changed to its own nearest_mv
instead of the nearest_mv of the super 8x8 block:
--enable-newmvref-sub8x8: ~0.1% gain for derflr
Besides the above new experiment, code has been cleaned largely for the
sub8x8 motion search, such that the mv ref can be fairly easily changed to
use other options, e.g., the next step's global motion effort.
Change-Id: I8e3f4aaa8553ba8c445369692e079db5ce282593
The bugs affected only the tx64x64 experiment. This patch fixes
them.
Also fixes some compiler warnings
Change-Id: I4cf1e24a4f1fa59831bf29750cf6daa8373e8e8f
Added palette coding option for Y channel only, key frame only.
Palette colors are obtained with k-means clustering algorithm.
Run-length coding is used to compress the color indices.
On screen_content:
--enable-palette + 5.75%
--enable-palette --enable-tx_skip +15.04% (was 13.3%)
On derflr:
--enable-palette - 0.03%
with all the other expriments + 5.95% (was 5.98%)
Change-Id: I6d1cf45c889be764d14083170fdf14a424bd31b5
Added a function to compute a motion field for a pair of buffers, for use in
finding an affine transform or homography.
Change-Id: Id5169cc811a61037e877dfd57fccaca89d93936f
This code is to start experiments with global motion models.
The corner detection can be either fast_9 or Harris.
Corner matching is currently based on normalized correlation.
Three flavors of ransac are used to estimate either a
homography (8-param), or an affine model (6-param) or a
rotation-zoom only affine model (4-param).
The highest level API for the library is in vp9_global_motion.h,
where there are two functions - one for computing a single model
and another for computing multiple models up to a maximum number
provided or until a desired inlier probability is achieved.
Change-Id: I3f9788ec2dc0635cbc65f5c66c6ea8853cfcf2dd
Remove the duplicated step counting the overhead for ext_tx, because
it has been counted in super_block_yrd().
Change-Id: I50bc01d8166572bc0847305565cf33c14ee516e6
Also fixes some build issues with certain experimental combinations.
Results on derflr with all experiments on: +5.516%
Change-Id: I9b492f3d3556bd1f057005571dc9bee63167dd95
Initial patch to remove get_zbin_mode_boost() and
cpi->zbin_mode_boost.
For now sets a dummy value of 0 for zbin extra pending
a further clean up patch.
Change-Id: I64a1e1eca2d39baa8ffb0871b515a0be05c9a6af
(cherry picked from commit 60e9b731cf)
COMPOUND_MODES experiment encodes separate MV modes for each frame in a compound
reference prediction. Added modes NEAREST_NEARESTMV, ZERO_ZEROMV,
NEW_NEWMV, NEAREST_NEARMV, NEAR_NEARESTMV, NEW_NEARESTMV, NEAR_NEWMV,
NEW_NEARMV, and NEAREST_NEWMV.
Also enhances the wedge-partition expt to work better with compound
modes.
Results:
derflr +0.227
All experiments on: derflr +5.218
Change-Id: I719e8a34826bf1f1fe3988dac5733a845a89ef2b
Implements vertical, horizontal, and tm dpcm intra prediction for
blocks in tx_skip mode. Typical coding gain on screen content video
is 2%~5%.
Change-Id: Idd5bd84ac59daa586ec0cd724680cef695981651
16x16 and 8x8 transform sizes, both for the regular case and the high
bit depth (CONFIG_VP9_HIGHBITDEPTH) case.
Change-Id: I34a9d3c73c3687f967105194ce4def48c3ec435c
A smooth weighting scheme is used to put more weight
on the intra predictor samples near the left/top boundaries
and decaying it to favor the inter predictor samples more as
we move away from these boundaries in the direction of
prediction.
Results:
derflr: +0.609% with only this experiment
derflr: +3.901% with all experiments
Change-Id: Ic9dbe599ad6162fb05900059cbd6fc88b203a09c
Disable SUPERTX mode when lossless mode is enabled because the largest
lossless transform size is 4X4.
Change-Id: I4167959a282728d62354119ced99ca29febabfd1
In lossless coding, tx_size can be larger than 4x4 when transform
skipping is activated. Compared to regular vp9 lossless coding,
performance improvement for derf is about 5%; gain is larger
for screen content videos.
Change-Id: Ib20ece7e117f29fb91543612757302a2400110b4
This patch improves the non-transform coding mode. At this
point, the coding gain on screen content videos is about
12% for lossless, an 15% for lossy case.
1. Encode tx_skip flags with context. Y tx_skip flag context is
whether the prediction mode is inter or intra. UV flag context
is Y flag.
2. Transform skipping is less helpful when the Q-index is high.
So it is enabled only when the Q-index is smaller than a
threshold. Currently the threshold is set as 255 for intra blocks,
and 0 for inter blocks.
3. The shift of the prediction residue, when copying them to the
coeff buffer, is set as 3 when the Q-index is larger than a
threshold (currently set as 0), and 2 otherwise.
Change-Id: I372973c7518cf385f6e542b22d0f803016e693b0
Reimplements the supertx experiment from the playground branch.
Makes it work with other experiments.
Results:
With --enable-superttx
derflr: +0.958
With --enable-supertx --enable-ext-tx
derflr: +2.25%
With --enable-supertx --enable-ext-tx --enable-filterintra
derflr: +2.73%
Change-Id: I5012418ef2556bf2758146d90c4e2fb8a14610c7
Extends the ext-tx experiment to include all 9 DST/DCT
variants.
Results with ext_tx experiment:
derflr: +1.338 derflr (improved from 1.12)
Change-Id: I24b5564f96bce6ccaa13d88ca6cb9d0c57000597
Extends the ext-tx experiment to include regular and flipped
DST variants. A total of 9 transforms are thus possible for
each inter block with transform size <= 16x16.
In this patch currently only the four ADST_ADST variants
(flipped or non-flipped in both dimensions) are enabled
for inter blocks.
The gain with the ext-tx experiment grows to +1.12 on derflr.
Further experiments are underway.
Change-Id: Ia2ed19a334face6135b064748f727fdc9db278ec
Non-transform option is enabled in both intra and inter modes.
In lossless case, the average coding gain on screen content
clips is 11.3% in my test.
Change-Id: I2e8de515fb39e74c61bb86ce0f682d5f79e15188
Extends quantizer range and some other fixes.
Change-Id: Ia9adf7848e772783365d6501efd07585bff80c15
stdhd: +0.166%, turns positive a little
derf: -0.036% (very few blocks use the 64x64 mode)
Preliminary 64x64 transform implementation.
Includes all code changes.
All mismatches resolved.
Coding results for derf and stdhd are within noise. stdhd is slightly
higher, derf is slightly lower.
To be further refined.
Change-Id: I091c183f62b156d23ed6f648202eb96c82e69b4b
This commit refactors the rate distortion structure used in the
non-RD coding mode and saves a few RDCOST calculations.
Change-Id: I62c3416c300d2c5372f21b96d93a6b633a34ab3a
The existing speed features produce horrible encoding results, almost
30% worse than cpu-used=4, this commit adjust the speed features to
produce relatively resonable results to be within 3%-5% of cpu-used=4.
Change-Id: I0ca6ebafb33024d4a0cbcf04c78a4a00b8dd1ecf
Its functionality has been replaced with choose_partitioning and
threshold based control on split mode check.
Change-Id: Ic9bb321df06b524f5c38ea5874dc6f6a8f93c5e3
This speed feature has been deprecated in both yt and rtc coding
modes. This commit removes the related operations.
Change-Id: I079c79c6adafe45581af2ebf8b98faebcface1ce
This commit re-designs the recursive partition search scheme in
rtc speed -5. It first checks if the current block is under cyclic
refresh mode. If so, apply recursive partition search. Otherwise,
perform sub-sampled pixel based partition selection. When the
pre-selection finds the partition size should be 32x32 or above,
use the partition size directly. Otherwise, apply partition search
at nearby levels around the preset partition size.
It is enabled in speed -5. The compression performance of rtc
speed -5 is improved by 9.4%. Speed wise, the run-time goes slower
from 1% to 10%.
nik_720p, 1000 kbps
33220 b/f, 38.977 dB, 10109 ms -> 33200 b/f, 39.119 dB, 10210 ms
vidyo1_720p, 1000 kbps
16536 b/f, 40.495 dB, 10119 ms -> 16536 b/f, 40.827 dB, 11287 ms
Change-Id: I65adba352e3adc03bae50854ddaea1b421653c6c
Extend --auto-alt-ref from parameter so we can use it to
turn multi-arf on and off from the command line.
For now the range is 0-off, 1-on, 2-multi-arf on.
Rename play_alternate to enable_auto_arf
Change-Id: Id7b64407cfbe76ba0090a83b588a03e22a240386
All sad function that process above 32 consecutive elements are optimized
for AVX2:
vp9_sad64x64
vp9_sad64x32
vp9_sad32x64
vp9_sad32x32
vp9_sad32x16
vp9_sad64x64_avg
vp9_sad64x32_avg
vp9_sad32x64_avg
vp9_sad32x32_avg
vp9_sad32x16_avg
The functions that appeared as a hotspot is vp9_sad32x32 and vp9_sad64x64
vp9_sad32x32 was optimized by 68% and vp9_sad64x64 was optimized by 90%
both of them gave and overall ~2.3% user level gain
Change-Id: Iccf86b375a2b54c5fbbe685902ead0c9a561b9fd
Currently, the tokens for a tile are stored immediately after its
preceding tile, which causes a dependency. This is unnecessary
since we always allocate enough memory for tokens. Removing
the dependency allows token writing done in parallel. This patch
doesn't change encoding result.
Change-Id: I7365a6e5e2c2833eb14377c37e1503c9d0f26543
This should be set right after decoder really start to decode frame
instead setting at the end.
Even decoder does not have a displayable frame to show and return NULL
to application, this should be set too.
Change-Id: If0313a834bc64e3b0f05a84f4459d444d9eab0d8
When early termination is triggered, properly reset the rate cost
to invalid value to avoid potential ioc issue.
Change-Id: I3444390be2e49a34bb02cf8a74c33d5dbd96d88d
Delete gfboost_qadjust() and move Q based adjustment
into calc_frame_boost(). Also remove clamping. Making
the adjustment here means that it influences not just the
boost level but also the selection of the GF/ARF interval.
This change gives a small average gain in PSNR but
larger gains in SSIM, especially for harder std-hd set (1.5%)
Change-Id: I3aa81b8feccaeff93d915e19fb9cf5cd64c86327
Covers all profiles and input formats. The tests check if the
encode succeeds and if the psnr is sane.
Change-Id: I195a5330debf92562846121819b6eaf961e27c01
This commit fixes an ioc issue that will happen when the cumulative
variables are not in effective use. The fix discards these
redundant additions.
Change-Id: Idbac5bfb989c0cedc5f8a323effce938519b2457
This removes an unnecessary restriction that causes
a problem (noticed by AWG) when the forced key frame
interval is set to a very small value, such as 10. In this case
we were being forced to code minimal length GF groups.
Change-Id: I76ef5861a09638ff51f61fea02359554184ada53
We encode a empty invisible frame in front of the base layer frame to
avoid using prev_mi. Since there's a restriction for reference frame
scaling factor, we have to make it smaller and smaller gradually until
its size is 16x16.
Change remerged.
Change-Id: I9efab38bba7da86e056fbe8f663e711c5df38449
This reverts commit 452dc21500.
This change has introduced a significant quality regression on content
with forced key frames. (e.g. the YT and yt-hd set). It is most
noticeable in static content where the kf bits dominate. Here, despite
key frames being apparently coded at the same Q, there is a drop in all
metrics of ~20% (e.g clXR and BFa0).
Change-Id: Iba14cc61778c0846fa0a59c33c55a9fc49512cb4
Compare the estimated rate and distortion to the thresholds scaled
according to the operating block size and determine if further
split partition search will be run. The compression performance of
speed -5 is changed by -0.074%. The encoding speed is 10% - 15%
faster.
vidyo1 720p
16545 b/f, 40.492 dB, 11475 ms -> 16535 b/f, 40.486 dB, 10100 ms
nik720p
16624 b/f, 36.310 dB, 10071 ms -> 16617 b/f, 36.313 dB, 8346 ms
Change-Id: Ic9197ab5761279ae55d2fb7813b2af0e0db497b8
Reduce the intra_cost_penalty for non-rd mode,
and some updates to VAR_BASED_PARTITION.
Visual tests show some improvement at Speed 6, for RTC clips.
Change-Id: If9090daf7aed14906a32d931a538ab544bbca606
This commit replaces the use of copy_partitioning with
choose_partitioning based on the sse of subsamped pixels, which
provides significantly better coding performance and runs at
similar speed, as compared to copy_partitioning. It improves rtc
speed 5 coding performance by 3%.
Change-Id: I52d3682a12dce0147f5e52383a594fc242ca3228
this change checks that CONFIG_SPATIAL_SVC is defined and adds a TODO to
ensure this is changed in the future as the release headers can't
depend on vpx_config.h.
vpx/vpx_encoder.h:164:5: warning: "CONFIG_SPATIAL_SVC" is not defined
[-Wundef]
Change-Id: I797a0150e5f56caf048e7ee00b282fbc9c5ede19
We encode a empty invisible frame in front of the base layer frame to
avoid using prev_mi. Since there's a restriction for reference frame
scaling factor, we have to make it smaller and smaller gradually until
its size is 16x16.
Change-Id: I60b680314e33a60b4093cafc296465ee18169c19
Move the point at which input frames are scaled
into the recode loop. This will allow us to change
the coded frame size dynamically in response
to previous attempts to encode the frame at a
higher resolution.
A following patch will implement a scheme for
resizing the frame in the recode loop.
Change-Id: I6a59c02d6ac1626512edad6de8b60063b79433e6
This commit makes a struct that contains rate value, distortion
value, and the rate-distortion cost. The goal is to provide a
better interface for rate-distortion related operation. It is
first used in rd_pick_partition and saves a few RDCOST calculations.
Change-Id: I1a6ab7b35282d3c80195af59b6810e577544691f
Add a test vector to show the cases where segmentation map is preserved
from frome to frame as outlined in the inquiry in issue 761.
Change-Id: I630c6aba27d0d0b109cc7fd7c6fcd008222a0cf3
Add back clamp which ensures that the Q adaptation
is turned off when the over_shoot_pct and under_shoot_pct
parameters are set to 100.
Change-Id: Id0161b114d39a3029cd3eb28020caab0c3914922
Allow min and maxQ to creep when the undershoot
or overshoot exceeds thresholds controlled by the
command line under_shoot_pct and over_shoot_pct
values.
Default is 100%,100% which ~disables adaptation.
Derf results for example undershoot% / overshoot%:-
Head:- Mean abs (%rate error) = 14.4%
This check in:-
25%/25% - Mean abs (%rate error) = 6.7%
PSNR hit -1% SSIM -0.1%
5% / 5% - Mean abs (%rate error) = 2.2%
PSNR hit -3.3% SSIM - 1.1%
Most of the remaining error and most of the quality hit is
at extreme data rates. The adaptation code still has an
exception for material that is in effect static so that we
don't over adjust and over spend on YT slide show type
content.
(Rebase of If25a2449a415449c150acff23df713e9598d64c9
to resolve a auto-merge error)
Change-Id: Iec4e1613ef0d067454751d8220edb7058dfbd816
use arg_parse_enum_or_int like vpxenc. this also fixes a warning as
arg_parse_enum is not currently declared in args.h.
Change-Id: If9ce258d6adb6286eb86f529083929d5fe2b3a56
Allow min and maxQ to creep when the undershoot
or overshoot exceeds thresholds controlled by the
command line under_shoot_pct and over_shoot_pct
values.
Default is 100%,100% which ~disables adaptation.
Derf results for example undershoot% / overshoot%:-
Head:- Mean abs (%rate error) = 14.4%
This check in:-
25%/25% - Mean abs (%rate error) = 6.7%
PSNR hit -1% SSIM -0.1%
5% / 5% - Mean abs (%rate error) = 2.2%
PSNR hit -3.3% SSIM - 1.1%
Most of the remaining error and most of the quality hit is
at extreme data rates. The adaptation code still has an
exception for material that is in effect static so that we
don't over adjust and over spend on YT slide show type
content.
Change-Id: If25a2449a415449c150acff23df713e9598d64c9
Function will jump to error handler when ref buffer is corrupted.
So "xd->corrupted |= ref_buffer->buf->corrupted;" is useless.
Change-Id: I35353a0637ad0dbb682454e040ef69fa68280bfa
- Some fixes to surface fit.
- Returns variance function as cost rather than sad in the
pattern search and diamond search functions. Only
vp9_pattern_search_sad function used in bigdia search
uses sad as integer 1-away costs.
- Deploys SUBPEL_TREE_PRUNED_MORE for speed 4+.
Results:
derf [Speed 3]: About +0.036% in coding efficiency without any
discernible speed loss.
derf [Speed 4]: About 2-3% faster at -0.199% loss in coding efficiency.
derf [Speed 5]: About 3-4% faster at -0.149% loss in coding efficiency.
Change-Id: I8462f94f6adb46966ca964f2bd0400977357fd63
In model_rd_for_sb function, the spatial domain SSE and variance
are checked to see if transform coefficients are quantized to 0.
Besides that, this patch adds another set of thresholds that are
much more strict. These thresholds are used to conduct a partition
block level check to measure if all its TX blocks are skippable
for YUV planes. If it is true, x->skip is set for this partition
block, and thus its mode search is terminated.
This speeds up the encoding at very low prediction error case,
such as screen sharing application. This patch covers what
rd_encode_breakout_test() does, so that function is removed.
Borg test at speed 3 shows:
For stdhd set, psnr: +0.008%, ssim: +0.014%;
For derf set, psnr: +0.018%, ssim: +0.025%.
No noticeable speed change.
Change-Id: I4e5f15cf10016a282a68e35175ff854b28195944
If the GOLDEN or ALTREF frame was last updated > x frames in the past,
don't use them for denoising (only consider LAST). Using an old reference
frame for denoising, e.g., if it is a long-term reference or the last key frame,
can cause some visible artifacts, in particular in the aggressive denoising mode.
Change-Id: I239c9fbb092c36cba7e95328f1fa67a58d6a7ed9
For input source with size that is not multiple of 8, the size is
rounded to 8 and saved in width or height, the original source sizes
are saved in crop_width and crop_height. This commit corrects the
computation of bottom and right extension amounts to use the orignal
sizes, hence crop_width and crop_height.
In addition, this commit also adds the missed initialization for
uv_crop_width and uv_crop_height.
This addresses issue #834
Change-Id: I084543ca7645a4964b88f7cf8ff668f517d3a39b
The concept:
There's too much noise in source pixels for variance and at low bitrate
the reconstructed looks nothing like the source so we have problems
getting good partitionings with either. This skirts the issue by using
a box blur scaled down version for variance calculations. To compare
against source_var_ moved keyframe to be rd based like source_var.
Change-Id: Ie3babdbfadae324b7b5a76bea192893af27f0624
This commit breaks the overly broad header files into more
targeted and smaller ones, to help better structure the system
layout.
Change-Id: I7b24559d3ea6e582cf5d9bbe8f71459f9824d71b
Fixed an encoder crash. Set skip_txfm to 0 for cases that skip_txfm
isn't calculated. Put memcpy of skip_txfm at right place.
Change-Id: Ib3b6afc1b251a85b2a853c8138fb3393f48cfef6
The functions b_width_log2 and b_height_log2 only do direct
table fetch. This commit unifies such use cases by using the
table directly and removes these functions.
Change-Id: I3103fc6ba959c1182886a2799d21b8b77c8a7b6b
Add comments on the use case of these definitions. Further reduce
the scope of header file in vp9_context_tree.h.
Change-Id: Ic4a7638e838d0ac441b64abfc56e57354c059d75
The coefficient range checking is enabled when configured with
--enable-debug --enable-coefficient-range-checking
for vpxdec to detect ill-formed input stream. This addresses the
problem raised by issue #792.
Change-Id: I3f9ea541de4dc742dd64389d6c5f543fb1c4f052
This commit fixes a buffer pointer mis-use in store_coding_context.
The compression performance for stdhd set of speed 3 is improved by
0.097%. It fixes issue 869.
Change-Id: Idc59e22035eaf39f7133ca04174894374d647ff7
This SSE2 is based on VP8 denoiser's SSE2 code. In VP8, there are
only 16x16 blocks in denoiser, while in VP9, there are 13 different
block sizes.
By adding this SSE2 code, the improvement of encoder speed is around
20%(using C code vs using SSE2 code), vary for different clips.
The unit test for VP9 denoiser is to confirm that the SSE2 code is
bit-exact with the C code. The unit test covers all block size.
Change-Id: Ic8d8ac26db4ea40a5f146b5678a065af07eaaa3d
Adjustments to the GF interval choice and minimum boost.
Adjustment to the calculation of 2 pass worst q.
Compared to 09/29 head there is metrics hit on derf of
(-0.123%,-0.191%)
Compared to the September 29 head and a baseline on
September 18 baseline the accuracy of the VBR rate control
measured on the derf set is as follows:-
Mean error % / Mean abs(error %)
Sept 18 baseline (-7.0% / 14.76%)
Sept 29 head (-15.7%, 19.8%)
This check in (-1.5% / 14.4%)
The mean undershoot is reduced slightly but the
worst case overshoot on e.g. harbour/highway is
increased. This will be addressed in a later patch.
Change-Id: Iffd9b0ab7432a131c98fbaaa82d1e5b40be72b58
It is possible that the GOLDEN reference frame is not avaiable, in
which setting the predicted mv will be associated with a residual
value of INT_MAX. This commit checks this condition before
left shift and comparison with that of ALTREF frame, to avoid
overflow issue.
Change-Id: Ib98c3149dbdd016f2fe5beaafb13f67d469dd07c
This commit adds proper initialization of segment id for variance AQ
mode in non-rd coding path. It fixes the enc/dec mismatch issue of
rt=7 with --aq-mode=1, as reported in issue #816
Change-Id: I02fa41b96345bf2e66077d5ea553f85ba800f7bb
This commit enables the encoder to skip split partition search if
the bigger block size has all non-zero quantized coefficients in low
frequency area and the total rate cost is below a certain threshold.
It logarithmatically scales the rate threshold according to the
current block size. For speed 3, the compression performance loss:
derf -0.093%
stdhd -0.066%
Local experiments show 4% - 20% encoding speed-up for speed 3.
blue_sky_1080p, 1500 kbps
51051 b/f, 35.891 dB, 67236 ms ->
50554 b/f, 35.857 dB, 59270 ms (12% speed-up)
old_town_cross_720p, 1500 kbps
14431 b/f, 36.249 dB, 57687 ms ->
14108 b/f, 36.172 dB, 46586 ms (19% speed-up)
pedestrian_area_1080p, 1500 kbps
50812 b/f, 40.124 dB, 100439 ms ->
50755 b/f, 40.118 dB, 96549 ms (4% speed-up)
mobile_calendar_720p, 1000 kbps
10352 b/f, 35.055 dB, 51837 ms ->
10172 b/f, 35.003 dB, 44076 ms (15% speed-up)
Change-Id: I412e34db49060775b3b89ba1738522317c3239c8
Incorporates the WRAPLOW macro into the non-highbitdepth transforms
to aid hardware verification between a software C model and an
intended hardware implementation though the use of the configure
options: --enable-experimental --enable-emulate-hardware.
Note that to avoid further discrepancies between the sse/sse2
implementations of the transforms and the C implementation, when the
emulate hardware option is invoked, we also disable sse/sse2/etc.
Also incudes some minor cleanups/renaming etc.
Change-Id: Ib864d8493313927d429cce402982f1c8e45b3287
Miscellaneous bug-fixes for high bitdepth functionality.
With this patch, high bit-depth profiles become mostly functional,
except for an intermittent assert failure issue that is being
tracked.
Change-Id: I6a7fcbdcf1e5b09842e88535f8442d2e1230748c
- iphonesimulator: IOS_VERSION_MIN was declared in the wrong place.
- armv6: linking via ld instead of CXX is basically required.
Change-Id: Iad187691f633dcf2bc3e3590e88084bb926edb76
This commit removes unused header file vp9_onyxc_int.h and repeatedly
included file vpx_ports/mem.h from vp9_block.h
Change-Id: I400b210bd1da48f1880bd50a8f4a6e2c690e15a1
Block transform skipping was implemented based on DCT's energy
conservation property. Modified the thresholds using zero bin
parameters. AC and DC coefficients were checked separately to
allow better identifying of skippable blocks.
Borg test at speed 3 showed:
stdhd set: psnr gain: 0.153%, ssim gain: 0.051%;
derf set: psnr gain: 0.023%, ssim gain: 0.036%
For most test clips, the encoding speedup is 1% - 2%.
parkrun(720p): 7.5% speedup, park_joy(1080p): 3.5% speedup.
Change-Id: If28eb81113a077414f5ca7b021c14f9069b373bb
The commit cleans up the header files in vp9_entropymv.h. This
file should only depend on vp9_mv.h and vp9_prob.h. Remove the
giant vp9_blockd.h from header file list.
Change-Id: I44cd26d2cfd10a16a9325778347dd53f888a874c
For regular inter frames, if the distance from GOLDEN_FRAME is larger
than 2 and if the predicted motion vector of LAST_FRAME gives lower
sse than that of GOLDEN_FRAME, skip the GOLDE_FRAME mode checking in
the rate-distortion optimization. It provides about 5% speed-up at
expense of -0.137% and -0.230% performance down for speed 3. Local
experiment results:
pedestrian 1080p 2000 kbps
66712 b/f, 40.908 dB, 113688 ms ->
66768 b/f, 40.911 dB, 108752 ms
blue_sky 1080p 2000 kbps
51054 b/f, 35.894 dB, 70406 ms ->
51051 b/f, 35.891 dB, 67236 ms
old_town_cross 720p 1500 kbps
14412 b/f, 36.252 dB, 60690 ms ->
14431 b/f, 36.249 dB, 57346 ms
Change-Id: Idfcafe7f63da7a4896602fc60bd7093f0f0d82ca
Commit message longer than commit edition.
Simulator and devices:
Add a common minimum iOS version that can be shared by iOS and iOS
simulator targets.
Fix --enable-debug (for device targets; sim was fine):
Allow for successful configuration and build with --enable-debug when
CXX is available by:
- Using CXX as LD (when CXX is available).
- Passing the correct form of the iOS minimum version parameter based on
whether LD is CXX or really is ld.
Note: ld -g still won't work on macosx with this patch, so if CXX is not
available, configuration will still fail reporting that the toolchain
cannot link executables when attempting to pass --enable-debug (because
ld returns an error code since the one included with xcode doesn't
support the -g argument).
Change-Id: Ia488aed167cc2ca82ee9e980589fb76dddce634f
Moves transform type defines to vp9_common.h from vp9_idct.h
so that they can be included in vp9_rtcd_defs.pl safely.
Change-Id: Id5106227bee5934f7ce8b06f2eb9fa8a9a2e0ddb
fixes --enable-coefficient-range-checking --enable-debug
vp9_idct.h has references to INT16_MIN/MAX; this header is included in
c++ source so needs to request the macros
Change-Id: I2e643eb973c2d84729fa3cf2f4c4d8bf65cfdff0
This reverts commit eafc8c9c40.
tran_low_t/tran_high_t don't belong in a public header, they're private.
Similarly the public headers shouldn't rely on config defines,
vpx_config.h isn't installed.
Change-Id: I194ec273598da418df8dd727b6c0e78a556740ad
Some header file in vp9_idct.c has been included in vp9_idct.h.
This commit removes these redundant declarations.
Change-Id: I0238c27e4efff5c981eb437022c6bc6970c4e445
This commit fixes a compiling error in vp9_idct.h, where the codec
checks that the intermediate steps of transformation fit within
16-bit length. The issue was due to broken file dependency.
Change-Id: Ib22bba13a1e6df28489cb23d6774c561969f1fdc
When calculating delta in VP8 denoiser, since the block size is fixed to 16x16,
the divisor is 256, which is the number of the pixel.
But in VP9, the block size varies, the divisor should correspond to the block
size.
Change-Id: Ibdc1e5d23ba8c788b0d0dc6d406bcdfc34c1b142
One is a more aggressive version of the pruned subpel tree
search where only a single halfpel candidate is searched.
The search candidate is based on a surface fit result.
The other is a method to obtain the subpel position at one
shot based on the same surface fit.
The methods have not been deployed in any speed setting yet.
Change-Id: I34fef3f2e34f11396c9d1ba97f4be8c4ffca62d3
This commit enables the encoder to skip checking ALTREF inter modes
in ARF coding, if the predicted motion vectors suggest that the
GOLDEN_FRAME provides higher prediction accuracy than ALTREF_FRAME.
It improves the speed 3 encoding speed by about 5%, at the expense
of compression performance loss -0.041% and -0.225% for derf and
stdhd, respectively.
pedestrian_area 1080p 2000 kbps
66705 b/f, 40.909 dB, 118738 ms ->
66732 b/f, 40.908 dB, 113688 ms
old_town_cross 720p 1500 kbps
14427 b/f, 36.256 dB, 62746 ms ->
14412 b/f, 36.252 dB, 60690 ms
blue_sky 1080p 1500 kbps
51026 b/f, 35.897 dB, 73310 ms ->
50921 b/f, 35.893 dB, 70406 ms
bus CIF 1000 kbps
21301 b/f, 34.841 dB, 7326 ms ->
21248 b/f, 34.837 dB, 7196 ms
Change-Id: I76cf88b4d655e1ee3c0cb03c8a5745493040e8d2
This patch re-enabled the feature in Pengchong's patch
(commit 1286126073). Originally, it
was turned on while use_lastframe_partitioning > 0(not used anymore).
Now it was added as a feature, and turned on while speed >= 2.
As described in the original patch, this feature helps speed up the
slideshows in YouTube.
Change-Id: I1b0f18d65da1ee1c8d1e117dabba910c5207c471
iOS 5 support isn't available in the Xcode 6 install; iOS 6 covers
phones starting at the 3GS, so should be a reasonable base line
Change-Id: I15572ec0dd73f1ffc88c58120c706384a01f2478
The version of gcc4.6 included with the Android NDK through r10b
fails to compile this function. Replace it with C code.
BUG=860
Change-Id: Ifcc0476664071aec46a171cdd5ad17305930986a
A left shift of negative value causes IOC runtime warnings, this
commit converts two such left shifts to multiply to avoid IOCs.
Change-Id: I8811428768d7135e6e16af4b3094d0341589a995
The first comment is obselete given the way is now normative in VP9
bitstream. The second comment line was too long.
Change-Id: I6546585babf60d466485ddcf2daa6d2fa79e999a
As reported in issue #850, the condition for border extension was not
complete. This commit added the case when the scaling is enabled.
This fixes issue #850.
Change-Id: I67768b23f0dcc4ac9a9aa0a0825b0fe8cb85a72e
Simplified the code and removed some code that was not used anymore.
This patch didn't change encoding result.
Change-Id: I7e54a74c8f35a6726dfc8a1c55b337448b7ea124
The rd_thresholds are adaptively changed based on best mode tested.
It was only changed for the same block size, this commit makes the
adaptation for similar block sizes too. The commit also made minor
adjustment and code cleanups.
The impact on encoding time for _ped:
118089 ms -> 111927 ms
The impact on compression:
derf: -0.339%
stdhd: -0.303%
Change-Id: I8817fed1102350497f2ec631849e43f753878e5d
Adds code to return an integer cost list for NSTEP search. Then
uses it for pruned subpel search in speed 3.
derf: -0.06%
Speed on mobcal 720p increaes from 10.28 fps to 10.65 fps.
[Subject to further testing].
Change-Id: Ib591382d25b2c11bcaba9d3a27a93a9d1ab27a96
mi_grid_* are arrays of pointer to pointer. They save the pointers that point
to the MIs in cm->mi. But they are unnecessary and complicated. The original
goal was to remove MODE_INFO_t copy. But with an extra MODE_INFO_t pointer
inside MODE_INFO_t, same goal could be achieved.
This commit totally removes the mi_grid_* structures. But there are still
many dummy MODE_INFO_t inside cm->mi which are a waste of memory. Next commit
will do on-demand MODE_INFO_t allocation in order to save these memories.
Change-Id: I3a05cf1610679fed26e0b2eadd315a9ae91afdd6
Allow for option to apply spatial blur for temporal
denoising, under the aggressive denoising mode.
Change-Id: I41c5fdc0b6cf32d8f8d1d4236b25fa5aa406e89e
Place after pulling in forward declarations from the codec. This fixes
compilation of the tests under vs9.
Since
10783d4 Adds high bitdepth transform functions and tests
where vp9_idct.h was added to vp9_rtcd.h the tests are pulling in
vp9_systemdependent.h, which under visual studio include intrin.h. With
VS9 these include headers which define helper classes for intel
intrinsics. When including it in the tests (via vp9_rtcd.h) __cplusplus
is defined but vp9_rtcd.h would wrap declarations in 'extern "C" {'
causing a mismatch in linkage which resulted in compilation failure.
Change-Id: I475e50198b71320e8606bc95c9454876d8799ede
vpx_svc_parameters_t contains id, resolution and min/max qp for each spatial layer.
In this change we will use extra config to send min/max qp and scaling factors, then calculate layer resolution inside encoder.
Change-Id: Ib673303266605fe803c3b067284aae5f7a25514a
In many tests in VP8, the denoiser is disabled. By adding this
conditional comilation macro, the unit test will not be included
when denoiser is not enabled.
Change-Id: I6edec85c996acca22aacd11161c52408be2660a3
Substantial restructuring of the way we estimate
the rate of decay in prediction quality and determine
the arf interval and amount of boost used.
Also other changes to support moving to a lower first pass
Q which exposes some new features and allows us to better
distinguish genuinely static blocks from low motion or noisy
blocks.
Net gains now visible on all the test sets with std-hd PSNR up
1.87%. There are still some bad outlier cases but most of these
are low motion or slide show type content where the metrics
are already high at any given rate. The best + case is up by
more than 10%.
Change-Id: I18e25170053bdf3188f493ff8062f48a74515815
1. This is to align with the ffmpeg implementation
2. Remove APIs for setting quantizers and scale-factors
Change-Id: I6e238d71db790a9fb3254baaeb61e2a5aac58f48
This commit adds back sse2 or ssse3 optimized versio of a couple of
functions, fixes a ~10% performance regression.
Change-Id: I049786906e5a641224dced63c6492aec9d86d183
The ARF frame should always be the same size as the
native resolution of the input frames.
It will be scaled to the required resolution at
encode time.
Change-Id: I0afe858129aa6ef65b1648f43476331715346896
This commit makes the encoder to use non-zero mode threshold for
NEARESTMV modes. The runtime for test clips of speed 3 is reduced
by about 1%.
pedestrian 1080p 2000 kbps, 143239 ms -> 141989 ms
bus CIF 1000 kbps, 7835 ms -> 7749 ms
The compression performance change is about -0.02% for both derf
and stdhd.
Change-Id: Ib71808922c41ae2997100cb7c561f68dcebfa08e
If the partition block is skippable, which means no coefficients
for Y, U, and V planes, its skip flag is set to 1. No quality
change (verified by borg tests), and no noticeable speed change.
Change-Id: I9231f720f8dd6364384cf05aa148ca24d75450f1
Libvpx was memseting every external frame buffer before decode. This
was to work around a valgrind issue in our C loop filter. Most of
the time this was not needed and we have noticed some significant
performance loss on some platforms. Now we require the application to
zero out the buffers if it is using external frame buffers.
Change-Id: I7330d00a315e65137ed30edd5f813e8929b76242
This commit enforces ARF validation check for compound inter modes.
It avoids potential access to ARF in the encoding process if it
is not allowed.
Change-Id: I055fec946b5d19d97937dc9001e1e564923e2439
The valid reference frame check in sub8x8 rate-distortion
optimization search has been included in the ref_frame_skip_mask
scheme. This commit removes the later further validation checks
that are not in effect.
Change-Id: I853b477c44037d3dc0afec6cbfce08a96c597a75
This commit replaces the best_ref_index table fetch with the use
of best_mbmode in vp9_rd_pick_inter_mode_sub8x8.
Change-Id: I882ee9ee6a8c0e61befcca1f4dba6d2ea8de8f13
The issue was discovered on bitstream with 2x vertical downscale. For
zero MVs, y_pad is set to 1 only when vertical convolution is
required. The original code assumes that for y_step_q4 == 32 we don't
perform vertical convolution. But vp9_setup_scale_factors_for_frame()
sets convolve functions so that when x_step and y_step are both not
equal to 16, convolve in both directions is performed. And convolve()
unconditionally subtracts one stride from source pointer when calls
convolve_horiz(). This leads to invalid memory access.
Change-Id: I882dfa6081a58e172b5ffa55842bfcd6727f10bf
Call to vp9_rc_get_second_pass_params() moved from
Pass2Encode() to earlier in vp9_get_compressed_data(),
to ensure that two pass stats and parameters are
available before decisions such as frame scaling.
Change-Id: If21537f0073919b04696a7d5e9aac78e23d76f39
When a reference frame type is not in the frame buffer, the mode
search threshold will be set to INT_MAX, so as to effectively
turn off the mode entries in the rate-distortion optimization loop
that involves this reference frame type. This operation is now
integrated in the ref_frame_skip_mask scheme. This commit hence
removes the redundant mode search threshold setting.
Change-Id: Ib18f45da611afda2af275201efd367df7f5101ab
This commit unifies the reference frame control in the rate-
distortion optimization search loop of sub8x8 block size to remove
the control dependency on mode search order.
Change-Id: I3a174099f71a7cc176ede9fd60e2374243ae9232
Improves function to return sad of integer pels by reusing integer
pels already visited in the smallest scale.
Turns on BIGDIA search for speed 4. Also, turns on the
first version of the pruned subpel search at this speed.
derf: -0.32% (speed 4)
Speed seems to improve by at least 5% but subject to verification.
Change-Id: Iaec8eaffd61d6237ac029e6a2a1b0a88b2a35271
Adds various high bitdepth transform functions and tests.
Much of the changes are related to using typedefs tran_low_t
and tran_high_t for the final transform cofficients and intermediate
stages of the transform computation respectively rather than fixed
types int16_t/int. When vp9_highbitdepth configure flag is off,
these map tp int16_t/int32_t, but when the flag is on, they map
to int32_t/int64_t to make space for needed extra precision.
Change-Id: I3c56de79e15b904d6f655b62ffae170729befdd8
The variable best_inter_rd is effectively not in use in the rate-
distortion mode search loops of both regular block sizes and sub8x8
block sizes.
Change-Id: I178f909f8c9629772e13adc6257908653b2adf31
The speed feature that skips compound inter prediction modes was
subsumed by other speed features and effectively was not in use.
This commit removes it.
Change-Id: I22b0c71a8ddd15d93b25d86fa63a1dce2ba6a1a9
If optimizations use more than one cpu feature, allow
specifying them so that '--disable-X' still works
https://code.google.com/p/webm/issues/detail?id=854
Change-Id: I3108ea37b397371a2be84dd5f2380b304db23f18
1. svc_encodeframe.c will not handle frame or stats packets anymore.
The app will process them.
2. Remove APIs that related to these packets.
Change-Id: Id0d7f8b458dc09c6f77064c0878fd4e572db001b
Integrate intra mode mask speed feature with the mode_skip_mask
scheme. Move it outside the mode search loop in the
vp9_rd_pick_inter_mode_sb function.
Change-Id: I7738fea749bfdc08ad05d7f2524feb8ff67568d9
This speed feature is used in real-time setting only. Remove the
related condition check in the rate-distortion optimization search
loop.
Change-Id: Iaacc1e268214634e6f95c5048c28a60cec6c42fc
Refactor overlay frame speed-up related function. Make it unified
with the ref_frame_skip_mask system and Move it out of the mode
search loop.
Change-Id: I0dde9baf44354f6ba00b4679cba02fa6a30c7316
This commit refactor the rate-distortion optimization search for
regular block sizes to remove the speed feature dependency on mode
search order.
Change-Id: Ied033ee484c2957e17baa7b6450b720fe7dd0e7d
This issue is found when the denoising mode is set to kDenoiserOnYUVAggressive.
Updated the C code to make it the same with SSE version.
I also changed several lines in VP9 denoiser for the code style.
Change-Id: I640d48cf946fe8c6a400e6e252107501d1e226d3
This speeds up the encode significantly. Also added a comment about using
best quality to keep new developers informed.
Change-Id: I04e8154d4b2c4cae07fe7cc9a71e707f649e9ed4
don't bother decoding any further after receiving an earlier decode
error until a key/intra-only frame is encountered.
Change-Id: I381917b70d7a9e6f8d6de42e3d181bb113a4cec4
This commit fixes a bug related to skipping intra mode checking, by
using a separate variable to store the best prediction error from
inter mode. It avoids unintentionally overwriting intra mode
rate-distortion cost, and hence affecting other speed features.
Change-Id: I99e12993339c84c8b4f597996b372012e5858fae
Assigning selected reference frame pointer is done in the
encode_superblock function. No need to do this at the end of
rate-distortion optimization search.
Change-Id: I33fcede0fd304b4a4c4deef2d126d79546a9c070
This commit refactors the vp9_rd_pick_inter_mode_sb function to
remove the intra mode early termination dependency on the mode
search order.
Change-Id: If6ac49aa7c530c7b9a5bd31b0ab84db83e192bec
This commit allows the encoder to find current best prediction mode
state using best_mbmode, instead of fetching from the static mode
search table via best_mode_index.
Change-Id: Ibefeab83aed33a49c2be03e83f09153856ca4271
Allow building for targets which have NEON but not EDSP or MEDIA.
Set HAS_NEON flag for builds which have neon assembly but not
intrinsics.
Change-Id: Ibfa81a8444a8c55d1d3209c533d1d70d2f809669
The use of use_lastframe_partitioning is totally removed in good-
quality encoding. Its usage in real-time encoding needs to be
evaluated to see if it can be removed too.
The Borg tests at speed 4 showed:
stdhd set: 0.220% psnr gain, 0.166% ssim gain;
derf set: 0.329% psnr gain, 0.476% ssim gain.
Speed test on selected clips showed 1.54% speedup.(Worst case:
pedestrian_area_1080p25.y4m, speed loss: 1.5%)
Change-Id: I1c844d329b0b5678558439b887297c1be7ddab00
the code currently checks whether the allocation has been done instead
of allocating on the first frame.
since:
4f27202 vp9: fix crash in mt loopfilter w/corrupt file
this change defers the allocation until the loop filter is used.
Change-Id: I660c1b7f34e713a8dd9884483f01d23b9847366e
The speedup in rd_pick_partition() function makes it possible
to drop use_lastframe_partitioning feature. By doing that, we
achieve good PSNR gain with small speed loss. Also, this makes
encoding loop less complicated. The code cleanup patch will
follow.
Borg tests showed:
1. At speed 2,
stdhd set: 0.201% PSNR gain, 0.133% SSIM gain;
derf set: 0.262% PSNR gain, 0.276% SSIM gain.
2. At speed 3,
stdhd set: 0.139% PSNR gain, 0.109% SSIM gain;
derf set: 0.447% PSNR gain, 0.442% SSIM gain.
The average speed loss over selected test clips is within 1%
with the worst case of 4%.
Change-Id: Icfd2ded7869372b585a6972855d933b3d0280d90
The rate costs calculated for inter modes are not precise in some
cases, which causes NEWMV is chosen instead of NEARESTMV, NEARMV,
and ZEROMV. This patch added checks for these cases, and corrected
the mode decisions.
Borg tests at speed 3 showed:
1. stdhd set: 0.102% PSNR gain and 0.088% SSIM gain.
2. derf set: 0.147% PSNR gain and 0.132% SSIM gain.
No speed change.
Change-Id: I35d17684b89ad4734fb610942d707899146426db
Removed functions:
* vp9_post_proc_down_and_across_mmx
* vp9_mbpost_proc_down_mmx
* vp9_plane_add_noise_mmx
They all have sse2 equivalent.
Change-Id: I59c1fac12b7c96ca4538d455e4400c2b7875feff
vp9_variance_sse2.c contains a mix of intrinsics and references to
assembly which uses x86inc.asm; it's conditionally included as a result.
Change-Id: I254451483a65881c0b8e18e27bf0c3ddef60c4ec
make bifilter4_coeff[][] uint8_t, no values exceed this range and
they're loaded with vdup_n_u8().
Change-Id: I921983e9edd828d29820e40ac30a7801dbe0fb4f
allocations within vp9_alloc_context_buffers() rely on mi_rows/mi_cols
individually, use those to determine whether to realloc rather than
stride and stride * rows. this fixes a crash with some fuzzed files for
invalid accesses into last_frame_seg_map and above_context.
Change-Id: I7b9f40dcf170d443890f3bd2acd285507943c7d4
proceeding using a corrupt (incompletely decoded) frame reference may
lead to incorrect assumptions about allocation sizes leading to a crash.
Change-Id: I76e74f2e1be127c2e2c7e1174bb3307497dfd23d
This commit turns on adaptive motion search for ARF coding, in
addition to other normal inter frame coding. It improves the
average compression efficiency:
stdhd 0.1%
derf 0.04%
For the test sequences, the speed 3 runtime is reduced:
pedestrian 1080p 2000 kbps, 149932 ms -> 144580 ms, (3.3% speed-up)
bus CIF 1000 kbps, 8050 ms -> 7895 ms, (1.9%)
highway CIF 100 bkps, 45033 ms -> 44078 ms, (2.2%)
Change-Id: I5228565b609f99e8ae04f6140a2bf2b64a831d21
This is to keep the same with VP8 denoiser.
If motion magnitude is small,
make denoiser more aggressive.
Change-Id: I942a6e2f2ed9aec6f0c4c1f9e5fa47066cadcc0c
Calling Reset(int) method instead of overloaded operator()(int).
Adding underscore at the end of class member name.
Change-Id: I01934e7bc056d4b594e5d05d693328febd34ac3c
When the first try of denoising turns out to be too much,
we will use a softer filter by adopting an adjustment to
make the result closer to original pixel (as in VP8 denoiser).
The old code made the adjustment in the wrong direction.
Change-Id: I84e28fa9e01eef47c5a37d5a2e6d3d378a06786b
Use the right return values - vget_low_s64 returns int64x1_t, not
a normal int64_t.
Also make __builtin_prefetch a no-op on MSVC for this file.
Change-Id: I4d2fce01d0ba106b98d3d53b137803119c2c2c08
This commit allows the encoder to store outcomes of single reference
frame modes and compares them to decide if the inter prediction
filter, forward transform, and quantization can be skipped.
The compression performance of speed 3 is down
derf -0.364%
stdhd -0.198%
For test sequences, the speed 3 runtime is reduced
highway CIF 100 kbps, 51976 ms -> 45033 ms, 13% speed-up
stockholm 720p 1000 kbps, 71826 ms -> 67838 ms, 5.5% speed-up
pedestrian 1080p 2000 kbps, 154924 ms -> 150702 ms, 2.6% speed-up
Change-Id: I5aa26f918d2b4b5197a2c0afa2779319f1c88e44
vp8_build_intra_predictors_mbuv_s().
This patch replaces the assembly version with an intrinsic
version.
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~2.6%.
Change-Id: I9ef65bad929450c0215253fdae1c16c8b4a8f26f
intra_super_block_yrd() and inter_super_block_yrd() are largely same,
this commit merges them into one to reduce code duplication.
Change-Id: I64d7042a5b099345627cf55663010c185b25ec37
From 3 to 2, which seems to be slightly positive on compression for
all test sets, also reduces encoding time by 2%-5%, varying on the
test clips.
Change-Id: If045417bd27311700c919b4a335eff0dc1130ae0
This commit removes the special case for key frame, as transform size
decision is controlled by the appropriate speed feature for all lossy
coding modes: tx_size_search_method.
Change-Id: I9677171e3f2432ec23705f7c5ea8170dd4562fae
This commit allows the encoder to skip check on compound inter
modes in the rate-distortion optimization loop, if the reference
frame bias signs are the same.
Change-Id: Ib753e6bb11cbdd338aee69dbe2b649671f75a6b0
Adds config parameter vp9_highbitdepth, to support highbitdepth profiles.
Also includes most vpx level high bit-depth functions. However
encode/decode in the highbitdepth profiles will not work until
the rest of the code is in place.
Change-Id: I34c53b253c38873611057a6cbc89a1361b8985a6
It's built based on current spatial svc code.
We only support one spatial two temporal layers at this time.
Change-Id: I1fdc8584354b910331e626bfae60473b3b701ba1
This commit skips the compound inter mode prediction check in the
rate-distortion optimization loop for ARF coding. It reduces the
runtime for certain test clips at speed 3, at no compression
performance change:
bus CIF 1000 kbps, 8260 ms -> 8090 ms, 1.8% speed-up
stockholm 720p 1000 kbps, 74453 ms -> 71826 ms, 2.9% speed-up
No visible speed-up for pedestrian area 1080p at 2000 kbps.
Change-Id: Ic68aa56837159b726563b784e2e3729e846465ad
Use unsigned int type to store the sse in the pixel domain. The
precision is sufficient to handle sse of block size up to 64x64.
The transform domain version however needs int64_t, since there is
a transfer gain applied in the forward transformation that might
cause unsigned int overflow.
Change-Id: Ifef97c38597e426262290f35341fbb093cf0a079
store the number of allocated rows in VP9LfSync, the calculated values
can not be relied on when dealing with corrupt material.
Change-Id: I13b8bcec9738c299a71df726772ab7ac05511e5b
This commit allows encoder to skip intra coding mode test, when
the known inter residual is less than the source variance. It
reduces the runtime of speed 3 for test clips:
bus cif 1000 kbps: 8587 ms -> 8260 ms, 3.8% speed-up
pedestrian 1080p 2000 kbps: 161381 ms -> 155241 ms, 3.7% speed-up.
The compression performance is down by
derf -0.36%
stdhd -0.25%
Change-Id: I75ce1e035b4da2153cb1ac14111d1a07c05a735d
This commit extends the sse and forward transform computation flag
to support the case 64x64 blocks where there are 4 32x32 2D-DCT
blocks.
Change-Id: I86a3e805dfaa0f3abd812f590520c71aa0e40473
In order to understand memory layout consider the declaration of the
following structs. The first one is a part of our API:
struct vpx_codec_ctx {
// ...
struct vpx_codec_priv *priv;
};
The second one is defined in vpx_codec_internal.h:
struct vpx_codec_priv {
// ...
};
The following struct is defined 4 times for encoder/decoder VP8/VP9:
struct vpx_codec_alg_priv {
struct vpx_codec_priv base;
// ...
};
Private data allocation for the given ctx:
struct vpx_codec_ctx *ctx = <get>
struct vpx_codec_alg_priv *alg_priv = <allocate>
ctx->priv = (struct vpx_codec_priv *)alg_priv;
The cast works because vpx_codec_alg_priv has a
vpx_codec_priv instance as a first member 'base'.
Change-Id: I10d1afc8c9a7dfda50baade8c7b0296678bdb0d0
In the partition search, the encoder checks all possible
partitionings in the superblock's partition search tree.
This patch proposed a set of criteria for partition search
early termination, which effectively decided whether or
not to terminate the search in current branch based on the
"skippable" result of the quantized transform coefficients.
The "skippable" information was gathered during the
partition mode search, and no overhead calculations were
introduced.
This patch gives significant encoding speed gains without
sacrificing the quality.
Borg test results:
1. At speed 1,
stdhd set: psnr: +0.074%, ssim: +0.093%;
derf set: psnr: -0.024%, ssim: +0.011%;
2. At speed 2,
stdhd set: psnr: +0.033%, ssim: +0.100%;
derf set: psnr: -0.062%, ssim: +0.003%;
3. At speed 3,
stdhd set: psnr: +0.060%, ssim: +0.190%;
derf set: psnr: -0.064%, ssim: -0.002%;
4. At speed 4,
stdhd set: psnr: +0.070%, ssim: +0.143%;
derf set: psnr: -0.104%, ssim: +0.039%;
The speedup ranges from several percent to 60+%.
speed1 speed2 speed3 speed4
(1080p, 100f):
old_town_cross: 48.2% 23.9% 20.8% 16.5%
park_joy: 11.4% 17.8% 29.4% 18.2%
pedestrian_area: 10.7% 4.0% 4.2% 2.4%
(720p, 200f):
mobcal: 68.1% 36.3% 34.4% 17.7%
parkrun: 15.8% 24.2% 37.1% 16.8%
shields: 45.1% 32.8% 30.1% 9.6%
(cif, 300f)
bus: 3.7% 10.4% 14.0% 7.9%
deadline: 13.6% 14.8% 12.6% 10.9%
mobile: 5.3% 11.5% 14.7% 10.7%
Change-Id: I246c38fb952ad762ce5e365711235b605f470a66
Updates the vp9_pattern_search function to return integer one-away
neighbors' sad values, for subsequent use in speeding up the
sub-pel search. Also, removes code for the do_refine option
which is not being used currently.
Updates the integer and subpel functions to pass in a 5-element
sad list for output or input.
A new pruned sub-pel search algorithm is implemented that uses
the sad returned from the integer pel search. But it is not
deployed yet.
Change-Id: Ifa9f5ad024b5b660570366d2bd900343e1891520
this change is proactive: the loop filter expects valid input and may
produce undefined results / crash in other cases.
Change-Id: I6cc1e966062a91cbc6db981c87cd03d9129fc8fe
attempting to decode a frame after the previous frame failed has the
potential of interrupting an earlier loop filter task
Change-Id: I6f2b1ddcdf5b89c3e2ee8caf5289dada2a087d66
This commit re-work the operation flow related to prediction
residual generation and the rate-distortion modeling. It saves one
call for model_rd_for_sb.
Change-Id: Icaf96c0ff09c903637ed5283448afe01d798195f
The value of switchable rate has been stored in a local variable.
This change skips the second call to vp9_get_switchable_rate() by
reusing the local variable.
Change-Id: Ib7d3fef7621cc4bde94c6d6e6b3a71f1fd4559f2
Check the mode and motion vector cost. If it is already above
the existing best rate-distortion cost, skip the rest check process
on this mode.
Change-Id: Ie065cebdfda2a3be3be18b8e8b43dc29aaa8c179
This commit makes the rate distortion modeling run in the unit of
maximum transform block size. No compression/speed change observed.
It is for the use of later fast forward transform purpose.
Change-Id: Ibaaedb69c765e8d0c5d5012f0ec07f36fd9f68fd
if the first frame was corrupt and loop filter not called, the next call
would assume the necessary allocations had been done and segfault when
accessing a NULL pointer
Change-Id: Ib6ef505e5c594e6f0fe65ab0700172bcf06b92a6
This commit addes a new strategy to reduce the search for optimal
interpolation filter type. The encoder counts and store how many each
filter type is selected and used for each of the reference frames.
A filter type that is rarely used for all three reference frames is
masked out to avoid computation.
The impact on compression is neglectible:
-0.02% on derf
+0.02% on stdhd
Encoding time is seen to reduce by 2~3%.
Change-Id: Ibafa92291b51185de40da513716222db4b230383
We can use one frame context for each layer so that we don't have
to reset the probs every frame. But we can't use prev_mi since we
may drop enhancement layers. So we have to generate a non vp9
compatible bitstream and modify it in the player.
1. We need to code all frames as invisible frame to let prev_mi
not to be used. But in the bitstream we need to code the
show_frame flag to 1 so that the publisher will know it's
supposed to be a visible frame.
2. In the player we need to change the show_frame flag to 0 for
all frames. Then add an one byte frame into the super frame
to tell the decoder which layer we want to show.
Change-Id: I75b7304cf31f0ab952f043e33c034495e88f01f3
the parsing of this flag was mistakenly put in a CONFIG_VP8_DECODER
conditional block in:
95853db vpxdec: add --keep-going option
Change-Id: Ie83ca0399fd3f3d4b0a9d03b7ca5536b310e1f02
The function was called in two places. In the first case it is replaced
with vp9_set_speed_features() call. In the second case the body of set_speed_features() is inlined.
Change-Id: If3fdf1b4168eee97677c224f69c245fe46c7f606
This patch fixes slow first pass problem. Mode could only be determined
from the deadline value during frame encode call. Unfortunately, we use
mode value before any encode calls during the first pass encoding (see
set_speed_features() logic). The mode for the first pass must be different
from BEST to make first pass fast.
Change-Id: I562a7d32004ff631695d91c09a44d8a9076fd6b5
use win32/win64 instead of $(PlatformName) (Win32/x64) for compatibility
with yasm 1.3.0. both format types were available since at least 0.8.0
BUG=843
Change-Id: I7917620490d0663b118ff08b96d1e5dbccba3703
Change-Id: Ia55317606c78a9d984db0321ef142548d20b64bc
1: dereference of global->codec checked
2: warning fails to recognize fatal(xxx) as exit or return
3: ctrl_args_map can be null
4: streams can be null
The case where frame width increases but the overall memory
size required to hold the mi arrays does not was not
handled.
Change-Id: I72e70b912a7d1766687ad682979f1c9ee124449b
My fault, that was a float (not integer) which was converted to int64_t.
This reverts commit a885e1cbf0
Change-Id: Ic50708b959e1c3cb3e37da1429d334fafc3391d6
Scale min_consec_zero_last wrt to #temporal layers,
and use full framerate as factor in noise metric.
Change-Id: Id0842b90164ce468d1236173c51965e7620c0e12
In the full-rd transform size search, we go through all transform
sizes to choose the one with best rd score. In this patch, an
early termination is added to stop the search once we see that the
smaller size won't give better rd score than the larger size. Also,
the search starts from largest transform size, then goes down to
smallest size.
A speed feature tx_size_search_breakout is added, which is turned off
at speed 0, and on for other speeds. The transform size search is
turned on at speed 1.
Borg test results:
1. At speed 1,
derf set: psnr gain: 0.618%, ssim gain: 0.377%;
stdhd set: psnr gain: 0.594%, ssim gain: 0.162%;
No noticeable speed change.
3. At speed 2,
derf set: psnr loss: 0.157%, ssim loss: 0.175%;
stdhd set: psnr loss: 0.090%, ssim loss: 0.101%;
speed gain: ~4%.
Change-Id: I22535cd2017b5e54f2a62bb6a38231aea4268b3f
This commit enables the encoder to record the location of the
center frame to generate alter reference frame. It then allows to
skip checking prediction modes of other reference frame types when
it comes to encode this frame.
The speed 3 runtime is reduced for the test sequences:
bus at CIF 1000 kbps, 9791 ms -> 9446 ms, i.e., 3.5% speed-up,
pedestrian at 1080p 2000 kbps, 184043 ms -> 175730 ms, i.e., 4.5%
speed-up.
No compression performance change observed.
Change-Id: Iacfde3bcc1445964e7a241f239bd6ea11cb94bd1
1. Clean the code for encode frame tests
2. Add encode w/ and w/o alt reference frame test
3. Add encode SNR layers test
4. Add encode multiple layers but decode partial layers test
Change-Id: Ibd2c9bc02525db584a6f931a98405f2d851b3cd6
This reverts commit 5509b7fd8f
Observed a big drop in compression quality and speed for speed 1 for a 360p test clip, revert this now for investigation.
Change-Id: If69dc8d77a225b34dc7907a9472e1a7a0a22762d
According to the current API spec we need to call vpx_codec_encode() until
vpx_codec_get_cx_data() returns NULL.
Change-Id: I4617f8042d50480a8f47b0b7114d4759fa566b14
Add a speed feature to give the tighter partition search
range. Before partition search, calculate the histogram
of the partition sizes of the left, above and previous
co-located blocks of the current block. If the variance of
observed partition sizes is small enough, adjust the search
range around the mean partition size, which will be tigher.
The feature is currently turned on at speed 2. Experiments on
sample youtube clips show on average the runtime is reduced
by 3-7%.
For hard stdhd clips:
park_joy_1080p @ 15000kbps: 509251 ms -> 491953 ms (3.3%)
pedestrian_area_1080p @ 2000kbps: 223941 ms -> 214226 ms (4.3%)
The PSNR performance is changed:
derf: -0.112%
yt: -0.099%
hd: -0.090%
stdhd:-0.102%
Change-Id: Ie205ec5325bf92ec5676c243e30ba9d0adca10f2
- Fix nit: make test function definitions match test order.
- Fix nit: use elog instead of echo for env verification error.
Change-Id: I0eec078fc056a5bb2bd88d5833e43de48d77ec08
When configuring the buffer make sure to set all the (now) required
fields. Use the canonical variables and match the style from vpx_scale.
https://code.google.com/p/webm/issues/detail?id=841
Change-Id: I71b43d4a03756b8b2d6d60fdf8d7bf41b8041787
this test allocates >2GB currently. depending on the order of the test
runs the allocation may fail most regularly with mingw+wine.
Change-Id: Ibee1c18cfbe29a4de6c65075647ec3955d8206c0
- Remove vpxdec and vpxenc from the exclude list.
- vpx{dec,enc}.sh: Updates to support finding their executable when
LIBVPX_BIN_PATH is setup for the examples.
- tools_common.sh: New library function, vpx_tool_path(). Provides
support for finding the exectuables in vpx{dec,enc}.sh.
Change-Id: I730f11cceb44646491a7a7ff58603a4a760129ef
According to the current API spec we need to call vpx_codec_encode() until
vpx_codec_get_cx_data() returns NULL.
Change-Id: Ide0c531dc0d453df8ec1edb8acb894856d6cc22e
Explicitly makes the fileptr null when close source is called
on a temporary file. This avoids a valgrind error.
Change-Id: I9c364290eeb6842fde946dd9bf817814c7178aaa
On key frame, will always start with normal denoising mode,
but based on a computed noise metric (normalized mse on source diff)
may switch to aggressive mode (and back down again).
Change-Id: I20330b2dcf3056287be37223302b2cab5fc103eb
Modify zero_mv bias condition to include check that "closest" reference is last_frame.
This is needed for temporal layers, where the last_frame is not always the closest reference.
Also, constain zeromv_count to be for last_frame reference.
Change-Id: I7af54a809ebf01ef43b9933c9d4095b6cb189390
In the current implementation of the encoder,
frame buffers may come from the wider set of
12 such buffers, and is not restricted to the
8 allowed as reference frames. This is only
an implementation detail and does not affect
the constraint of having a total of 8 reference
buffers overall.
Change-Id: I075f777146c2df49c275d89232933f8127235175
At --good and speed 3 or above for resolution less than 720p. This
disables the tests for 64x64 intra prediction modes. Encoding time
reduction is about 1%.
Change-Id: Ib396e3d1417fece416e3f0fee929b128acbb130f
The test to determine if the mode info buffers need
to be resized when the frame size changes was
incorrect, as per bug 837.
By storing the size of the allocated data structure,
a simple test determines whether to allocate more
memory when the frame size changes.
Change-Id: I1544698f2882cf958fc672485614f2f46e9719bd
In the encoder, current_video_frame is used in a couple of places to
decide encoding strategy, this commit replaces with more appropriate
variables.
Change-Id: I3d3d8d8e2ea02c489e4639b9d4c446a63e357d29
This commit moves the simplified coefficient probability model
and costing update to speed 4, and turns on chessboard pattern
mode search for sub 720p sequences. The overall coding performance
of speed 3 is improved:
derf 0.889%
stdhd 1.744%
The speed 3 runtime for test sequences are improved:
bus cif at 1000 kbps 9823 ms -> 9642 ms
pedestrian 1080p 2000 kbps 189559 ms -> 183284 ms
Change-Id: Iecbc7496a68f31fd49fb09f8dfd97c028d675a5d
This commit enables the encoder to skip NEARMV and ZEROMV if the
above and left blocks have identical reference frame, and the
current reference is different from that. It reduces the runtime
of speed 3 for test sequences:
bus cif at 1000 kbps 10064 ms -> 9823 ms
pedestrian 1080p at 2000 kbps 193078 ms -> 189559 ms
The compression performance is changed by
derf -0.085%
stdhd -0.103%
Change-Id: If304f26d42e6412152a84c3dd7b02635c38444f4
This commit allows the encoder to check the above and left neighbor
blocks' reference frames and motion vectors. If they are all
consistent, skip checking the NEARMV and ZEROMV modes. This is
enabled in speed 3. The coding performance is improved:
pedestrian area 1080p at 2000 kbps,
from 74773 b/f, 41.101 dB, 198064 ms
to 74795 b/f, 41.099 dB, 193078 ms
park joy 1080p at 15000 kbps,
from 290727 b/f, 30.640 dB, 609113 ms
to 290558 b/f, 30.630 dB, 592815 ms
Overall compression performance of speed 3 is changed
derf -0.171%
stdhd -0.168%
Change-Id: I8d47dd543a5f90d7a1c583f74035b926b6704b95
The function is called only once, right after all stats counters are
reset to 0. Therefore all the computations have zero effect on return
values. This commmit to removed those effectless code.
Change-Id: I50d27c0802547921fa36c60aa4bd92d76247f595
check that bitrates increase with cqlevel at global test case teardown,
rather than after each individual test case. this allows the tests to be
run out of order with --gtest_shuffle.
Change-Id: I9e0d4e6a2d920a1f2fe9aee7b7876a3e7eb5d297
Reinstates an assignment to prevent an asan failurere on google3.
Not sure why the failure happens. This was removed in a recent patch
https://gerrit.chromium.org/gerrit/#/c/71068/.
Change-Id: Ifd9ccffd4c2164f4de38b21821ffb28bd779b0f3
'ref_frame_map' is initialized to -1. avoids using an invalid index if
VP9_GET_REFERENCE/VP8_COPY_REFERENCE controls are issued after a decode
error.
Change-Id: I4599762c4d0b07a5943a72bf4a86ccb596cc062a
if the decode of the first frame fails, frame_to_show may not be set.
fixes a crash in vpxdec with corrupt data.
Change-Id: I5ab9476d005778a13fd42a39d05876bb6c90a93c
At speed 6 the smallest partitioning was 16x16 and biggest
intra block was 8x8, essentially disallowing all intra blocks
which produces ugly artifacts when revealing new video.
Change-Id: I364042d4c64e09be0666ade64aac94d0a1b586cf
Reverts to using tmpfile() for non-Windows platforms. On google3
the test directory does not have write permissions, and hence the
Y4mWriteTest fails. This patch fixes the issue.
On Windows, a temporary file is created in the temp directory
that has write permissions.
The tests pass on linux, mingw, and MS visual studio.
Change-Id: Ibada1d80e25d8b8e5b6a9d3d597533674bd9024c
the bulk of the functionality was removed in:
a42b5c2 Removing legacy XMA features from libvpx.
BUG=840
Change-Id: I8ca51d6aa76028f36d0eb1a15d2f2e3161e12ea4
simple_encoder: Flush encoder. According to the current API spec we need
to call vpx_codec_encode() until vpx_codec_get_cx_data() returns NULL.
Change-Id: Ibc37706e5257a3d51e5421ca17f77ab41249d9b5
When a valid data pointer is given make sure the size is greater than
zero.
A previous check for vp9 was incorrectly removed in:
7050074 Make the api behavior conform to api spec.
No semantics for valid pointers + 0-sized frames are defined for VPx
codecs, so move the check to vpx_codec_decode(). This avoids an assert
in vp9.
+ add some basic invalid param testing for decoder init/decode/destroy
Change-Id: I99f9cef6076d15874fd72ac973f2685d8a2353c3
This commit enables encoder to select fast forward transform and
quantization path according to the prediction residual sse/variance,
in the rate-distortion optimization scheme.
Change-Id: Ief9fc3844fd4107166d401970e800c6e5ce2b5fe
- Split vpxenc() into vpxenc() and vpxenc_pipe().
- Drop all but one positional param (the input file) in favor
of passing args directly to vpxenc.
- Add an extra lossless test that explicitly sets min-q and
max-q to 0.
Change-Id: I7d5f7b495f8b9447388c5f459bc9f6de2214caf2
We had a very complicated way to initialize cpi->pass from
cfg->g_pass:
switch (cfg->g_pass) {
case VPX_RC_ONE_PASS:
oxcf->mode = ONE_PASS_GOOD;
break;
case VPX_RC_FIRST_PASS:
oxcf->mode = TWO_PASS_FIRST;
break;
case VPX_RC_LAST_PASS:
oxcf->mode = TWO_PASS_SECOND_BEST;
break;
}
cpi->pass = get_pass(oxcf->mode).
Now pass is moved to VP9EncoderConfig and initialization is simple:
switch (cfg->g_pass) {
case VPX_RC_ONE_PASS:
oxcf->pass = 0;
break;
case VPX_RC_FIRST_PASS:
oxcf->pass = 1;
break;
case VPX_RC_LAST_PASS:
oxcf->pass = 2;
break;
}
Change-Id: I8f582203a4575f5e39b071598484a8ad2b72e0d9
Eliminated instructions by using better neon instructions
and rearranging the loop.
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~1.0%.
Change-Id: I6b1700e79318f647ea67ef25e954c308932950ec
Replaced encoder and decoder functions to get a pointer
to a reference frame with a common function, vp9_get_ref_frame,
and simplified it.
Change-Id: Icb206fcce8caace3bfd1db3dbfa318dde79043ee
in the sub_pixel_*variance* function the dst is aligned to 16 bytes and not
to 32 bytes - now load unaligned data
Change-Id: I2e0b9745543697efc56fefa32857ea10117af135
Fix the interaction between active map and reuse_inter_pred_sby. The
reuse_inter_pred_sby feature expects inter predictors to already be
built, but blocks with active map on skip this step.
Change-Id: Ibb2bf0d228f678935d82a0ede9cb0919ab7c8878
A bug in Microsoft compiler was found in the function
vp9_filter_block1d16_v8_avx2 and a workaround applied.
the bug occur when there was 4 consecutive maddubs + min + adds
intrinsic instructions.
Change-Id: I83499faeb70971e650e5663fd2490360ddb1a51b
in the function sad32x32x4d and sad64x64x4d the source is aligned to 16 bytes
and not to 32 bytes - the load is now unaligned.
Change-Id: I922fdba56d0936b5cf72e4503519f185645a168c
Specifies the bit-depth, color sampling and colorspace
for intra only frames for profiles > 0
Also adds checks to ensure that profile 1 and 3 are
exclusively used for non 420 streams.
Change-Id: Icfb15fa1acccbce8f757c78fa8a2f60591360745
This commit adds a configure time option used to enable strict error
checking in decoder to make sure intermediate stage cofficients of
inverse transforms are within valid range of signed 16 bit integer.
For valid VP9 input streams, intermediate stage coefficients should
always stay within the range of a signed 16 bit integer. Coefficients
can go out of this range for invalid/corrupt VP9 streams. However,
strictly checking this range for every intermediate coefficient can
be a burden for decoder, therefore such validation is only enabled
with configure option --enable-coefficient-range-checking.
Change-Id: I47d47c8c4e48a922c3d223ca59064f51b3f0f5ed
This commit integrates the fast transform and quantization process
into skip_recode scheme in the rate-distortion optimization loop.
Previously the fast transform and quantization process was only
enabled for non-RD coding flow.
Change-Id: Ib7db4d39b7033f1495c75897271f769799198ba8
This is needed to update the width/height and stride parameters
for the reference buffers that the denoiser uses.
Change-Id: Id51b3bdcb56bbbc8187865544ccd3d872a0d51fe
When no more data is available, vpx_codec_decode should
be called with NULL as data and 0 as data_sz.
vpx_codec_get_frame iterates over a list of the frames
available for display. The iterator storage should be initialized
to NULL to start the iteration. Iteration is complete when this
function returns NULL.
Also change the unit test to conform to the api spec.
Change-Id: I4b258b309f5df3d37d10c82f01492c0394181c2a
vp9_rb_bytes_written -> vp9_wb_bytes_written
+ move limits.h from the header to the source file where it's needed
Change-Id: Ifcdc856b4d4dcc2fff555ef11f86c86a0d83dab3
This patch allows the encoder to directly split the block
in partition search, therefore skip searching NONE. It
computes a score which measures whether 16x16 motion vectors
from the first pass in the current block are consistent with
each others. If they are inconsistent and we have enough Q
to encode, split the block directly, and skip searching NONE.
This feature is under flag CONFIG_FP_MB_STATS. In speed 2,
it further gives a speedup of 3-8% on sample yt clips as
compared to the previous version under the same flag. Overall,
the features under the flag will give 7-15% on typical yt
clips at up to 6000kbps data rate. The speedup at very high
data rate is not significant.
For hard stdhd clips:
park_joy_1080p @ 15000kbps: 504541ms -> 506293ms (-0.35%)
pedestrian_area_1080p @ 2000kbps: 326610ms -> 290090ms (+11.2%)
The compression performance using the features under the flag:
derf: -0.068%
yt: -0.189%
hd: -0.318%
stdhd:-0.183%
To use the feature, set CONFIG_FP_MB_STATS and turn on
cpi->use_fp_mb_stats.
Change-Id: Iad58a2966515c8861aa9eb211565b1864048d47f
This code was being called from two places and
difficult to parse. I rationalized it in to a
function to improve readability.
Change-Id: I154b8fe0b84e6c01e69601e78e67bd47c954d8b6
Re-organize the one-byte structure for 16x16 first pass
block. Add bits to indicate motion vector directions.
Change-Id: Id10754ba343dfc712c7fed5bcc85c67fa0bbcb89
vp9_variance8x8(), and vp9_get8x8var().
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~1.2%.
Change-Id: I8a66ac2a0f550b407caa27816833bdc563395102
- vp9_spatial_svc_encoder.c no longer supports the -m parameter that
has been used in the example test. Tests using -m have been disabled.
- Added a basic test that appears to work as of commit
3249f26ff8.
- Minor style clean up.
Change-Id: Ic1402fcbbe28e33982c5ea12d1e3349f4069a5bf
- Split vpxdec wrapper function into vpxdec() and vpxdec_pipe().
- Remove hard coded --noblit and --summary arguments from
the wrappers in favor of shifting off the first argument (the
input file) and passing all remaining parameters to vpxdec.
- Add --noblit and --summary args to existing tests, and update the
pipe input test to use vpxdec_pipe().
Change-Id: Ia390a9990eace793058b3603ada733fb878eb78c
This patch allows the encoder to skip the search for partition
SPLIT, HORZ, VERT after the search for partition NONE is done
in RD optimization. It uses the first pass block-wise statistics
to make the decision. If all 16x16 blocks in the current partition
have zero motions and small residues from the frist pass statistics,
and it has small difference variance, further partition search is
skipped.
For speed 2 setting, experiments on general youtube clips show that
the speedup varies from 1% - 10%, 5% on average. On the performance
side in PSNR, derf 0.004%, yt -0.059%, hd -0.106%, stdhd 0.032%.
For hard stdhd clips:
park_joy_1080p, 502952 ms -> 503307 ms (-0.07%)
pedestrian_area_1080p, 227049 ms -> 220531 ms (+3%)
This feature is under the compilation flag CONFIG_FP_MB_STATS and
it is off in current setting.
Change-Id: I554537e9242178263b65ebe14a04f9c221b58bae
Remove the variable that indicates the relative block index. This
is explicitly covered by the use of pc_tree.
Change-Id: Ib13142582fff926c85e375bde656aa050add8350
extract only the md5 + quote the result
fixes:
test/examples.sh: 47: local: img-176x144-0029.i420: bad variable name
Change-Id: I81c6a83c8a4e792a520fd7046c8eedcbd4af9a0c
This commit enables a chessboard pattern constrained partition
search for 720p and above resolutions. The scheme applies stricter
partition search to alternative blocks based on its above/left
neighboring blocks' partition range, as well as that of the
collocated blocks in the previous frame. It is currently turned
on at 16x16 block size level. The chessboard pattern is flipped
per coding frame.
The speed 3 runtime is reduced:
park_joy_1080p, 652832 ms -> 607738 ms (7% speed-up)
pedestrian_area_1080p, 215998 ms -> 200589 ms (8% speed-up)
The compression performance is changed:
hd -0.223%
stdhd -0.295%
Change-Id: I2d4d123ae89f7171562f618febb4d81789575b19
vp9_variance16x16(), and vp9_get16x16var().
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~16.7%.
Change-Id: Ib163aa99f56e680194aabe00dacdd7f0899a4ecb
currently the only way to know if multiple alt-refs are enabled is to
inspect the encoder instance.
this reduces the size of the allocation by 75% when not using multiple
alt-refs
Change-Id: Ie4baa240c2897e64b766c6ad229674884b5a65b6
When building with runtime cpu detect assume that armv7 targets can be
relied upon to have at least armv6 support. This may allow dead code
detectors to remove some _c functions.
Change-Id: Iaec4414011fcbbdf6f4ed0d90ef4a8fe8af540b5
This commit replace the repetitive retrieve of max and min allowed
partition from speed_feature with local variables max_size and
min_size.
Change-Id: Ib06f11f16615e4876e4dd5fb6a968c6bf5f7b216
The get_chessboard_index() used to call the entire VP9_COMMON
struct pointer to retrieve the chessboard pattern index. This cl
makes it call the frame index directly.
Change-Id: I3cad9d209ea2e77a358085a04fe1ff0ddec5ba03
Remove the redundant index computation when store the first
pass block-wise statistics. Currently, a single byte is
allocated for a 16x16 blocks, and all the frame statistics
saved during the first pass will be kept in memory for use
in the second pass. For a 1920x1080 300-frame clip, it will
take about 2.3 MB memory. This feature is off in current
setting.
Change-Id: I135a95b348ec093d54c6a07e1e8237626909e3bd
Remove all the redundant dct functions (dct4x4, dct8x8)
in avx2 except dct32x32 those functions were copied originally from dct_sse2
Change-Id: I742576fbf5175f3ac09f2076976a9247b259323e
The issue was introduced by commit g9f37d14 with adding explicit
restrictions on reference-frame scale factors. The restriction
is checked against aligned-by-8 frame dimensions, not against
original ones. So, for example, frame of 35×35 actually can refer
to frame of 70×70, but the new check won't allow this. It will
compare 35 vs 72 (not 70), so 2x downscale limit will be exceeded.
Change-Id: Ic663693034440f64ac8312cbff9e1e773a921060
The partition search for 4x4 blocks takes unnecessary steps to
reconstruct pixels and an extra partition type update. This commit
removes such operations. No visible compression/speed difference.
Thanks to Yue (yuec@) for finding this issue.
Change-Id: I3f83824aa3fd3717d63be0b280fa57258939a70a
The code fails the unit test. Speed comparisons to the C are invalid
because the code frequently didn't correctly extend the right and
bottom portions of the frame.
Reduce maximum frame size on ARM devices to avoid OOM
Change-Id: Ia664c86406f0bb8120fd7ad401f32d0bd44994fb
The source buffer is an aligned buffer in VP9. Added the alignment
to make it consistent with libvpx.
Change-Id: I3ebb9d2e8555ed532951da479dd5cbbb8812e02d
This commit turns on the existing vp9_get_prob function using
64 bit in the intermediate step. It fixes the ioc issue for 4K
above frame sizes (issue 828).
Change-Id: I9f627f3beca2c522f73b38fd2a3e7eefdff01a7c
The assignment of the variable mode_excluded in
vp9_rd_pick_inter_mode_sub8x8 takes redundant conditional jump.
This commit removes it.
Change-Id: Ie195fbe6e54ec2ade7093d562c456a2e93143704
Ensure consistent border extension by rounding uv_crop_* at image
creation time. Where it was rounded problems could arise with the right
and bottom extensions.
When padding = 32, y_width = 64, and y_crop_width = 63:
(padding + width - crop_width + 1) / 2
32 + 64 - 63 + 1 should equal 32 *but*
32 + 1 + 1 equals 34 giving a right buffer of 17 instead of 16.
By calculating uv_crop_* earlier we round up at the appropriate time and
for the same values:
(y_crop_width + 1) / 2
63 + 1 / 2
64
(padding / 2) + uv_width - uv_crop_width
16 + 16 - 16
16
Change-Id: If866cd1b63444771440edb1432280ac83875969b
A previous change, https://gerrit.chromium.org/gerrit/#/c/70632,
introduced a size validation for reference frames to insuare the
input stream is a valid VP9 stream. However, the logic requiring
all reference frames have valid size turned out to be too strict.
In this commit, we modify the validation to require one of the
reference frame has valid dimension. In addition, the decoder
reports error whenever it detects the use of reference frame
with invalid scalig ratio.
Change-Id: If8efc312244087556cfe00f1fcbdff811268ebad
'auto_help' was added to Getopt::Long in 2.33
this isn't strictly necessary as an unrecognized option (--help) will
issue a warning and then print the usage
Change-Id: Ia757553a4e19d22a8eb70768a8866ab1a76a0eec
The patch:
https://gerrit.chromium.org/gerrit/#/c/70814/
changed the test that determined whether the context
frame buffers needed to be reallocated or not.
The code checked for a change in total frame area
to signal the need to reallocate context buffers.
However, the above_context buffer needs to be
resized i:xf only the width of the frame has increased.
Change-Id: Ib89d75651af252908144cf662578d84f16cf30e6
--strip-unneeded causes SIGSEGV when accessing g_executable_path. So
test_libvpx crashes due to SIGSEGV in ::testing::InitGoogleTest().
OS/2, aout, strip v2.23.2
Change-Id: I2718d082447ee0d9ad0c021b9156c50e1ac085a6
1. Remove last reference flag for first frame upper layers in one pass mode.
2. Disable refresh golden frame flag for key frames.
Change-Id: I44ac1bd2c795169e4fbfdd078ea79a1d33a204d6
The value of mode_excluded has been properly set in
vp9_rd_pick_inter_mode_sb(). It is redundant to send it in
handle_inter_mode() and re-set the value again.
Change-Id: I408d4731f2f42e0bcf3ae62e85757717bb410471
This commit extends the chessboard pattern prediction filter search.
If the above and left blocks have the same prediction filter type,
the encoder will skip the prediction filter type search and use the
reference one.
The overall chessboard pattern prediction filter type search reduces
speed 3 runtime for hard clips. Experiments on park joy at 1080p
and 15000 kbps show that the runtime goes from 723265 ms to 65832 ms,
i.e., about 10% speed-up. Compression performance wise, it affects
the coding quality by
Change-Id: I880975497c7ad166532e9eea9bf46684d77ff327
derf: -0.326%
yt: -0.257%
hd: -0.241%
stdhd: -0.417%
This commit enables a chessboard pattern prediction filter type
search scheme for rate-distortion optimization speed-up. For the
inferred motion vector modes, the encoder can re-use its above/left
neighbor blocks' prediction filter type and skip a full test on
all possible filter types. Such operation is turned on/off
alternatively in a chessboard manner.
It is turned on in speed 3. For test clip pedestrian 1080p, the
runtime is reduced from 231500 ms -> 221700 ms. The compression
performance is changed:
derf: -0.147%
yt: -0.134%
hd: -0.079%
stdhd: -0.220%
Change-Id: I1912f278e7576c2dc632688e3ad7a257410c605a
VP9FrameSizeTestsLarge exposed an integer overflow in the VP9 encoder,
for now reduce the size to allow the tests to clear and prevent further
regressions.
4096x4096 -> 4096x2160
this should be restored after the bug is fixed:
https://code.google.com/p/webm/issues/detail?id=828
Change-Id: I47fdf0648f1d9a3951f731bbf0b727f85ada4fa1
When OUTPUT_YUV_DENOISED is enabled the encoder outputs the uncompressed,
denoised video to a separate file. Moved the point at which the file is
written to in order to avoid an extra blank frame at the beginning of the video.
Change-Id: I805f6a912b18b3d9cae59b13c5b8108279439ce3
the code tied to CONFIG_MULTIPLE_ARF was deleted in:
2611022 Clean out old CONFIG_MULTIPLE_ARF code.
Change-Id: Ie70bf047cde7e88d4b3996c8ff529e409bbe99e2
This should be a local variable. Move the definition from
vp9_rd_pick_inter_mode_sb to handle_inter_mode.
Change-Id: I14f4168bb1c896ed04e8f6d4cd89fbf4c9839944
Since the UV decision to denoise is based on Y, we need to set
the default/initial denoiser decision_u/v to COPY_BLOCK,
to make sure if no uv_denoiser is applied we still update
(uv)running_avg with source.
Change-Id: I5af1c2afbd40c498cd3de208bea88c837099b24d
On a key frame, the denoised-running_avg for all references
frames should be updated with the source.
The altref denoised-running_avg was not being updated on key frame,
this fixes that.
Change-Id: Ie02cd0ba5383e013af59240e6df7e185d11703f6
For sequences of resolution below 720p, the encoder will check
intra prediction modes and inter prediction modes from LAST_FRAME.
This commit turns on adaptive prediction filter scheme for sub8x8
blocks, where inter prediction modes are enabled. For the test
sequence bus at CIF, the speed 2 runtime goes down from 17879 ms
to 16783 ms, i.e., 6% speed up. The compression performance of
derf set is down by -0.128%.
Change-Id: I01d5321a5ceab4e0666ac5be56c52d896c7a8d45
The commit moved a call to vp9_clear_system_state() to a correct
location, i.e. prior function calls using floating point numbers.
This was to fix a mismatch mmx code and sse2 version, where a
floating point number used in adjust_frame_rate(cpi) gets NAN due
to mmx registers being in wrong state.
Change-Id: I40e0a6de98812000ccee6a729badb630604fd7e6
For gcc, when libvpx config option debug is disabled, added the
flag -DNDEBUG to disable the assertions in libvpx for some speedup.
Change-Id: Ifcb7b9e8ef5cbe5d07a24407b53b9a2923f596ee
This patch adds back in code that checks that the frame
size lies within defined bounds was inadvertantly removed
by a previous patch:
https://gerrit.chromium.org/gerrit/#/c/70814/
Change-Id: If526570ba559260c4b7e98098bc75f7700ae7f97
Uses mkstmp() with directory being the same as the test data
directory to create temporary output file. For Windows
GetTempFileNameA() function is used.
Change-Id: Ie4681b2b4f44f8c22d3b3faf134c44087b484f94
This undoes a check that attempted to insure on 32 bit machines allocations
bigger than 32 bit failed, but it failed before the test could be hit,
revert that for now so we can do a roll
Change-Id: Ib607de6675c10100b716df94eb329649633509c8
If the img allocation fails the test used to crash before on
32 bit architecture. This patch uses null check on img in
FillFrame. Also, if the first frame initialization has not been
conducted VPX_CODEC_ERROR is expected to return rather than
VPX_CODEC_OK.
Change-Id: I5c4e59c156374009012d280d6ff971a89b43c11f
Separates HBD profile int two profiles (2 and 3) consistent with the
highbitdepth branch. This patch is ported from the original highbitdepth
branch patch: https://gerrit.chromium.org/gerrit/#/c/70460/
Two of the invalid file tests needed to be updated.
Change-Id: I6a4acd2f7a60b1fb4cbcc8e0dad4eab4248431e3
The bug sets the wrong pointer to the first pass mb stats
if the encoder does the re-coding in the second pass.
Change-Id: I8a11f45dd7dceb38de814adec24cecccae370d00
This patch is the first step toward simplifying the
frame buffer handling.
The final goal is to have a common frame buffer handling
framework for both encoder and decoder that incorporates
the existing ability to use externally allocated memory.
Change-Id: I2c378a4f54a39908915f46c4260e17a080db7ff1
This is a practical concern to allow us to fail in a decoder instance
if the size of a file is bigger than we can reasonably handle.
Change-Id: I0446b5502b1f8a48408107648ff2a8d187dca393
This commit changed the hard-coded DEFAULT_INTERP_FILTER to a speed
feature with the same default value: SWITCHABLE.
Change-Id: I7f54f40f1bd3f5277841d04b85db7a84e47313f1
and vp9_sad16x16_neon()
On a Nexus 7, vpxenc (in realtime mode, speed -6)
reported a performance improvement of ~17%.
Change-Id: I91e070cde2973451083d3f3d63b49b7886de9a85
Adds support for raw yuv inputs in 422/444 sampling for use
in profiles 1 and 3.
New options added to vpxenc are:
--i422 and --i444, which are to be used in conjunction with
--width, --height, and --fps for proper raw yuv handling.
A new option is added to vpxdec:
--rawvideo, which enforces raw yuv video output for the
bit-stream decoded irrespective of 420, 422 or 444 sampling.
The existing options --i420 and --yv12
are specialized for use only for 420 content.
Change-Id: I2e3028380709afa673bf2e2c25ad5e271a626055
2 pass only change to calculation of rd mult based on Q.
Make a small adjustment based on frame type and also
replace adjustment based on iifactor with an one based
on the ambient GF/ARF boost level.
Also fix multi arf bug / issue.
Overall these change give an slight improvement in ssim
but hurt psnr a little.
Change-Id: I5e1751e3ff5390a26f543d7855059e6fbcce105e
We target this speed to achieve similar encoding speed and better
compression than vp8 rt mode with cpu-used at -12.
Change-Id: Ic1bb4371c81a17ea80e83459c1cbf4c09a3498e8
The issue was introduced by commit g7c43fb6. If current frame
is repeated from existing-ref pool, frame buffer ref counter
is not decreased, so buffer isn't released. Decoder fails being
unable to allocate new frame buffer at some point.
Added a test vector to verify that the condition will not
recur later. Test vector was generated by the code in this patch:
https://gerrit.chromium.org/gerrit/#/c/70862/
Change-Id: I8af96eb5b9670176e01a281d2e18bd458712cf78
In vp8, statistics are collected about the different modes as they are searched.
This process is more complicated due to the variable block size. Fields were
added to the PICM_MODE_CONTEXT struct to hold this information for each point in
the search. The information is then taken from the appropriate part of the tree
during denoising.
Change-Id: I89261ab77ad637821287ae157dfdf694702b8e77
Sets the bit-depth field as default 8 in the image structure in vp8.
Generalizes yuv read in preparation for support for reading 422/444
for 8-bit and 10/12-bit.
Change-Id: I560c13c348b122fd028e408431156376b895058c
All changes are for spatial svc only.
1. Enable encoding hidden frames in each layer and use alt reference idex to reference the hidden frame in each layer
2. Use golden reference idx for spatial reference
3. For those layers that don't have hidden frames (caused by lack of frame buffers), reference a hidden frame in lower layers
4. Add "auto-alt-refs" in svc options
Change-Id: Idf27d1fd2fb5f3ffd9e86d2119235e3dad36c178
fixes visual studio 9 + apple clang builds where the template type is
interpreted as char[] rather than const char*:
::f1_' : cannot specify explicit initializer for arrays
error: array initializer must be an initializer list or string literal
Change-Id: I27286ce341b2f7a09b6202caffd6b72f64fd2234
This commit fixes a potential out-of-boundary memory access due to
the use of reuse_inter_pred_sby in the non-RD coding flow. It
resolves the corresponding asan error.
Change-Id: Iff605f5921230966990013541cd855d698810922
This commit fixes a mismatched use case of block size in non-RD
intra prediction check. The residual SSE and variance should be
calculated per transform block size, instead of operating block
size, which caused chrome valgrind warning on conditional jump
based on uninitialized value (webm issue 823). This commit
resolves this issue.
Change-Id: I595c06599c7e0fd0e4a08736519ba68fc14bc79a
Also fix bugs related with corrupted frame handling.
Return VPX_CODEC_CORRUPT_FRAME when getting corrupted
block.
Change-Id: I7207ccc7c68c4df2b40b561315d16e49ccf7ff41
Refactoring to remove some duplication of probability
tables between tokenization and detokenization.
Change-Id: I2fc6a6497f9c0410021a9b41f828bc58a864e466
Use a weaker filter for second level arf frames.
Average gain across all sets and metrics ~0.3%
Remove code for arnr_type which is no longer
supported in VP9 which always uses a centered blur.
Re-factor and some cleanup.
Change-Id: Ieb4b8940e99e4e02b3fcc9fca6f2d4109e6ed639
Specifying the --prefix command line arg executes all test programs within the
context of the prefix string, which is assigned to VPX_TEST_PREFIX.
All test functions updated to include VPX_TEST_PREFIX in their eval command.
Change-Id: I2e215cc8f216048edf3269db02a6b5660fe32318
used to wrap API functions to ensure full environment consistency as
opposed to the renamed ASM_REGISTER_STATE_CHECK which is used with
assembly functions.
currently checks the FPU tag word in x86/x86_64 gcc builds to ensure
emms has been called.
Change-Id: Ie241772dbf903d33d516a1add4c8c6783f2e1490
pull the latest from libwebp.
Original source:
http://git.chromium.org/webm/libwebp.git
100644 blob 264210ba2807e4da47eb5d18c04cf869d89b9784 src/utils/thread.c
commit 46fd44c1042c9903b2f1ab87e9f200a13c7e702d
Author: James Zern <jzern@google.com>
Date: Tue Jul 8 19:53:28 2014 -0700
thread: remove harmless race on status_ in End()
if a thread was still doing work when End() was called there'd be a race
on worker->status_. in these cases, however, the specific value is
meaningless as it would be >= OK and the thread would have been shut
down properly, but we'll check 'impl_' instead to avoid any potential
TSan/DRD reports.
Change-Id: Ib93cbc226a099f07761f7bad765549dffb8054b1
Change-Id: Ib0ef25737b3c6d017fa74822e21ed58508230b91
Currently, vp9_diamond_search_sadx4() is only called when sse3 is
enabled, which is improper since sse2 optimization of sdx4df
functions are available. Changed to always use
vp9_diamond_search_sadx4().
Change-Id: I4b95d6b7a3c6c645783c373f0ba8d645ece24717
- fix indent, spelling
- drop some whitespace in some comments
- add an assert in vp9_setup_mask, it shouldn't be called on decode
error
Change-Id: Ic312a815e977a6f9cb81ceb7b039eeada76c5aa0
There are sse2 optimization of sdx4df functions. Instead of calling
vp9_refining_search_sadx4 only when sse3 is enabled, call it always.
Change-Id: I24f93818f7d4209d1425039e0eb099ff9ff08fe9
Use FAST_HEX in speed 5 and 6, which covers more points than
FAST_DIAMOND and improves motion search quality.
At speed 6, RTC set borg tests showed slight quality gain (psnr
gain: 0.143%, ssim gain: 0.226%). No noticeable encoding speed
change.
Change-Id: Ifa62875d9a52ee382ec494f271382bb77d8c67bf
This commit combined the full pel and sub pel motion search into a
single function to avoid code duplication. The commit does not change
encoder outputs.
Change-Id: Ibe18342c4f64073bef20f9cf6c6ca0a20d01bf0d
This commit enables a new quantization process for 32x32 2D-DCT
transform coefficient blocks. It improves the compression
performance of speed 5 by 1.4%. The overall compression gains of
speed 5 due to the new quantization scheme is 4.7%. It also includes
the SSSE3 implementation of the 32x32 quantization process.
Change-Id: I0855b124fd6462418683f783f5bcb44255c9993b
This patch fixes bug 633:
https://code.google.com/p/webm/issues/detail?id=633
The first decoded frame does not have to be a keyframe,
it could be an inter-frame that is coded intra-only.
This patch fixes the handling of intra-only frames.
A test vector has also been added that encodes 3
intra-only frames at the start of the clip. The
test vector was generated using the code in the
following patch:
https://gerrit.chromium.org/gerrit/#/c/70680/
Change-Id: Ib40b1dbf91aae2bc047e23c626eaef09d1860147
In the previous version, only certain buffers in the macroblockd were saved and
the restored. In this version, all of the buffers are saved and restored. The
code was then rolled into a loop for readability.
Also contains a tiny fix for when the -DOUTPUT_YUV_DENOISED flag is used.
Change-Id: Id925ef8b3fa122ae88acfa1d9a1e4df45df83518
vp8/encoder/x86/denoising_sse2.c:35:10: error: taking the absolute value
of unsigned type 'unsigned int' has no effect [-Werror,-Wabsolute-value]
Change-Id: I749ba8e6f55dbd9b822bfd4260a8397554f5e524
Prepare for frame parallel decoding, the reference count buffers
need to be protected by mutex. Move vp9_thread.* to common
folder so that those buffers could use cross-platform mutex
from vp9_thread.*.
Change-Id: I541277cf15eefed6641555944f67f4a0bcdc8154
* Replace max_step_search_steps with constant MAX_MVSEARCH_STEPS
* Fold (reduce_first_step_size + speed > 5) into reduce_first_step_size
replacing uses of reduce_first_step_size that don't add the speed
check with zero.
Change-Id: Iae46395dbf3eaca138bf4d18b838a9e364b5a198
The y4m extension used is the same as the one used in ffmpeg/x264.
The patch is adapted from the highbitdepth branch.
Also adds unit tests for y4m header parsing and md5 check
of the raw frame data, as well as y4m writing.
[build fix for Mac/VS by not using tuples with strings]
Change-Id: I40897ee37d289e4b6cea6fedc67047d692b8cb46
vp9_rdopt is for making rd optimal mode decisions. vp9_rd is for all
other rd related routines. Anything used outside of making an rd optimal
decision belongs in rd.
Change-Id: I772a3073f7588bdf139f551fb9810b6864d8e64b
Moved the threshold adjustment before reference flag checking,
which could set the threshold to INT_MAX for disabled reference
frame, and cause overflow if the adjustment is done after that.
Change-Id: I85e94f8726d5e3ae93f65965aa978721dddc9957
Renamed updating_running_avg() to filter(). Extended function with the rest of
the filter procedure. Made all of the empirically-determined constants used in
VP8 into functions so they can be tweaked more easily.
Change-Id: I41730c8c92370c76885950a43742347477ca4e7e
As in VP8.
Currently, this parameter is set with the VP8E_SET_NOISE_SENSITIVITY flag.
The flag was not renamed so that we don't break the interface for webrtc. This
should probably be changed at some point in the future.
Change-Id: Ic73fcb0dde9d1d019e9d042050b617333ac65472
Add test code to turn multi-arf on and off depending
on group length and zero motion.
Changes to active max group length for mult-arf.
Fund second arf only from normal frame bits.
Change-Id: I920287fac1c886428c15a39f731a25d07c2b796c
This commit added a speed feature to control the step_param used in
full pixel motion search. The intention is to reduced the search
steps for high speed real time coding.
Change-Id: I21d2f0105c2b647783a6688615da7fcf2b6d670b
Adapt the use of segmentation in AQ mode 2 based on
the ambient kf/arf/gf Q.
Disable segmentation where the rate per SB is very
low and overheads are likely to outweigh the benefits.
This patch reduces the -ve average metrics impact
of AQ mode 2 while allowing stronger 3 segment AQ
in some cases. Average improvement ~0.5-1.0%.
Change-Id: I5892dfcc7507c5cc6444531cc7fe17554cf8d0c7
The y4m extension used is the same as the one used in ffmpeg/x264.
The patch is adapted from the highbitdepth branch.
Also adds unit tests for y4m header parsing and md5 check
of the raw frame data, as well as y4m writing.
Change-Id: Ie2794daf6dbafd2f128464f9b9da520fc54c0dd6
for debugging purposes.
continues decoding after receiving a decode error. will still exit with
an error after the current loop, ignoring remaining --loops
Change-Id: I011a71b866ff493a3f3bbb59e9bff998d19daee3
This commit re-designs the quantization process for transform
coefficient blocks of size 4x4 to 16x16. It improves compression
performance for speed 7 by 3.85%. The SSSE3 version for the
new quantization process is included.
The average runtime of the 8x8 block quantization is reduced
from 285 cycles -> 255 cycles, i.e., over 10% faster.
Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
Add a conditional compile flag for this feature. Also add a
switch to enable the encoder to use these statistics in the
second pass. Currently, the switch is turned off.
Change-Id: Ia1c858c35ec90e36f19f5cffe156b97ddaa04922
The current threshold is knid of low, and in many cases NEWMV
mode is checked but not picked as the best mode. This patch
added a speed feature to increase NEWMV threshold, so that
less partition mode checking goes to check NEWMV. This feature
is enabled for speed 6 and 7.
Rtc set borg tests showed:
1. Speed 6, overall psnr: -0.088%, ssim: -1.339%;
Average speedup on rtc set is 11.1%.
2. Speed 7, overall psnr: -0.505%, ssim: -2.320%
Average speedup on rtc set is 12.9%.
Change-Id: I953b849eeb6e0d5a1f13eacba30c14204472c5be
pull the latest from WebP, which adds a worker interface abstraction
allowing an application to override init/reset/sync/launch/execute/end
this has the side effect of removing a harmless, but annoying, TSan
warning.
Original source:
http://git.chromium.org/webm/libwebp.git
100644 blob 08ad4e1fecba302bf1247645e84a7d2779956bc3 src/utils/thread.c
100644 blob 7bd451b124ae3b81596abfbcc823e3cb129d3a38 src/utils/thread.h
Local modifications:
- s/WebP/VP9/g
- camelcase functions -> lower with _'s
- associate '*' with the variable, not the type
Change-Id: I875ac5a74ed873cbcb19a3a100b5e0ca6fcd9aed
The caller should reset the state instead of letting worker
to reset.
This reverts commit 34b2ce15f9.
Change-Id: Idb546ea6386cffc44e98dee772900d21ab79710f
Encoder still uses SWITCHABLE as default via DEFAULT_INTERP_FILTER,
but does not override the default if it is not SWITCHABLE.
Change-Id: I3c0f6653bd228381a623a026c66599b0a87d01d5
When the frame is intra coded only, the encoder takes the RD
coding flow. Hence the function set_mode_info is not practically
in use. This commit removes it and the associated conditional
branches.
Change-Id: I1e42659ceb55b771ba712d1cdecacb446aa6460d
This fixes the hang in VP9/InvalidFileTest.ReturnCode/3
due to worker->had_error has not been reset after getting
error.
Change-Id: Ia3608225094758a2bd88f6ae4dd9dfd93bbaad27
For real time speed 7, once encode breakout is on(i.e. encoding
setting --static-thresh=1), a proper encode breakout threshold
is set to speed up the encoder.
Set --static-thresh=1, RTC set borg test showed a slight overall
psnr loss of 0.162%, but ssim gain of 0.287%. The average speedup
on RTC set is 6%, and for some clips, the speedup can be 10+%.
Change-Id: Id522d9ce779ff7c699936d13d0c47083de4afb85
Before encoding a frame, calculate and store each 16x16 block's
variance of source difference between last and current frame.
Find partitioning threshold T for the frame from its variance
histogram, and then use T to make partition decisions.
Comparing with fixed 16x16 partitioning, rtc set test showed an
overall psnr gain of 3.242%, and ssim gain of 3.751%. The best
psnr gain is 8.653%.
The overall encoding speed didn't change much. It got faster for
some clips(for example, 12% speedup for vidyo1), and a little
slower for others.
Also, a minor modification was made in datarate unit test.
Change-Id: Ie290743aa3814e83607b93831b667a2a49d0932c
this file is an include and doesn't need to be built on its own.
fixes:
ranlib: file: libvpx_g.a(x86inc.asm.o) has no symbols
Change-Id: I89504e37ff0a4488489af7b9b7e09fb32acc4853
the buffer is only used in encoding and only when
CONFIG_INTERNAL_STATS or CONFIG_VP9_POSTPROC is enabled.
a future change should decouple this from the frame buffer allocation
and make it conditional based on runtime flags when the above config
options are enabled.
reduces decode heap usage by at least 12%
Change-Id: Id0b97620d4936afefa538d3aadf32106743d9caf
This reverts commit b336356198.
This causes a hang in:
VP9/InvalidFileTest.ReturnCode/3
the change to test/user_priv_test.cc remains with a minor update
Change-Id: I4a8a272ca37ea329b0f413f0b1cd827a238bd9fd
Encoding screen content exercises various fast skip paths that are
missed by natural video content.
Change-Id: Ie359884ef9be89cbe5dda6d82f1f79360604a090
This patch checks that a decoder never tries to reference frame that's
outside the range of 2x to 1/16th the size of this frame. Any attempt
to do so causes a failure.
Change-Id: I5c98fa7bb95ac4f29146f29dd92b62fe96164e4c
Set the proper number of mb_rows/cols.
Also remove warnings (unused variable) when configured with temporal-denoising disabled.
Change-Id: I8abd2372394ee55295feb87a66efd294ea6989d0
Bug introduced in I930dced169c9d53f8044d2754a04332138347409. If
svc.number_temporal_layers == 1 and svc.number_spatial_layers == 1, the system
attempt to do spatial SVC. It no longer does that.
Change-Id: Ie6b130a72b1eea40c547c9a64447e40695f811c5
For the primary arf in a group, if multiple arfs
are enabled and we were using arfs in the previous
group, then allow the second arf from the previous
group to be used as an additional reference.
Change-Id: Iaf41706a52f54ef21548026851cd77100d6aebda
This commit enables an adaptive transform size selection method
for speed -6. It uses largest transform size when the sse is more
than 4 times of variance, i.e., most energy is compacted in the
DC coefficient. Otherwise, use the default TX_8X8. It improves
the compression efficiency for rtc set of speed -6 by 0.8%, no
speed change observed.
Change-Id: Ie6ed1e728ff7bf88ebe940a60811361cdd19969c
This patch allows the encoder to skip the partition search for the
frame if it is an inter frame and only zero motion vectors have
been detected in the first pass. The partition size is directly
assigned according to the difference variance.
Borg tests show overall little performance changes in term of PSNR
(derf -0.027%, yt 0.152%, hd 0.078%, stdhd 0%). The worst case of
PSNR loss is -0.514% from yt. The best PSNR gain is 4.293% from yt.
The second pass encoding speedup for slideshow clips is 15%-40%.
Change-Id: I881f347d286553ee5594a9ea09ba1a61ac684045
C version and sse2 version, and off by default.
For the test clip used, the sse2 performance improved by ~5.6%
Change-Id: Ic2d815968849db51b9d62085d7a490d0e01574f6
This commit enables a fast reference motion vector search scheme.
It checks the nearest top and left neighboring blocks to decide the
most probable predicted motion vector. If it finds the two have
the same motion vectors, it then skip finding exterior range for
the second most probable motion vector, and correspondingly skips
the check for NEARMV.
The runtime of speed -5 goes down
pedestrian at 1080p 29377 ms -> 27783 ms
vidyo at 720p 11830 ms -> 10990 ms
i.e., 6%-8% speed-up.
For rtc set, the compression performance
goes down by about -1.3% for both speed -5 and -6.
Change-Id: I2a7794fa99734f739f8b30519ad4dfd511ab91a5
Bug introduced during multiple iterations on: I3831*
gf_group->arf_update_idx[] cannot currently be used
to select the arf buffer index if buffer flipping on overlays
is enabled (still currently the case when multi arf OFF).
Change-Id: I4ce9ea08f1dd03ac3ad8b3e27375a91ee1d964dc
This commit fixes the potential issue in the non-RD mode decision
flow that only checks part of the block to estimate the cost. It
was due to the use of fixed transform size, in replacing the
largest transform block size. This commit enables per transform
block cost estimation of the intra prediction mode in the non-RD
mode decision.
Change-Id: I14ff92065e193e3e731c2bbf7ec89db676f1e132
same reasoning as:
9f3a0db vp9_rtcd: correct avx2 references
these are all intrinsics, so don't depend on x86inc.asm
Change-Id: I915beaef318a28f64bfa5469e5efe90e4af5b827
This patch reverts the previous revert from Jim and also add a
variable user_priv in the FrameWorker to save the user_priv
passed from the application. In the decoder_get_frame function,
the user_priv will be binded with the img. This change is needed
or it will fail the unit test added here:
https://gerrit.chromium.org/gerrit/#/c/70610/
This reverts commit 9be46e4565.
Change-Id: I376d9a12ee196faffdf3c792b59e6137c56132c1
Cosmetic patch only in response to comments on
previous patches suggesting a couple of name changes
for consistency and clarity.
Change-Id: Ida3a359b0d5755345660d304a7697a3a3686b2a3
like vpx_codec_decode(), vpx_codec_peek_stream_info() takes an unsigned
int, not size_t, parameter for buffer size
Change-Id: I4ce0e1fbbde461c2e1b8fcbaac3cd203ed707460
the max is 6. there are assumptions throughout the decode regarding
this; fixes a crash with a fuzzed bitstream
$ zzuf -s 5861 -r 0.01:0.05 -b 6- \
< vp90-2-00-quantizer-00.webm.ivf \
| dd of=invalid-vp90-2-00-quantizer-00.webm.ivf.s5861_r01-05_b6-.ivf \
bs=1 count=81883
Change-Id: I6af41bb34252e88bc156a4c27c80d505d45f5642
This commit replaces a few use cases of cpi->common with preset
variable cm, to avoid unnecessary pointer fetch in the non-RD
coding mode.
Change-Id: I4038f1c1a47373b8fd7bc5d69af61346103702f6
In real-time speed 6, no partition search is done. The inter
prediction results got from picking mode can be reused in the
following encoding process. A speed feature reuse_inter_pred_sby
is added to only enable the resue in speed 6.
This patch doesn't change encoding result. RTC set tests showed
that the encoding speed gain is 2% - 5%.
Change-Id: I3884780f64ef95dd8be10562926542528713b92c
There is a normative scaling range of (x1/2, x16)
for VP9. This patch fixes the maximum downscaling
tests that are applied in the convolve function.
The code used a maximum downscaling limit of x1/5
for historic reasons related to the scalable
coding work. Since the downsampling in this
application is non-normative it will revert to
using a separate non-normative scaler.
Change-Id: Ide80ed712cee82fe5cb3c55076ac428295a6019f
Add indirection to the section of buffer indices.
This is to help simplify things in the future if we
have other codec features that switch indices.
Limit the max GF interval for static sections to fit
the gf_group structures.
Change-Id: I38310daaf23fd906004c0e8ee3e99e15570f84cb
Fix some bugs relating to the use of buffers
in the overlay frames.
Fix bug where a mid sequence overlay was
propagating large partition and transform sizes into
the subsequent frame because of :-
sf->last_partitioning_redo_frequency > 1 and
sf->tx_size_search_method == USE_LARGESTALL
Change-Id: Ibf9ef39a5a5150f8cbdd2c9275abb0316c67873a
This patch implements a mechanism for inserting a second
arf at the mid position of arf groups.
It is currently disabled by default using the flag multi_arf_enabled.
Results are currently down somewhat in initial testing if
multi-arf is enabled. Most of the loss is attributable to the
fact that code to preserve the previous golden frame
(in the arf buffer) in cases where we are coding an overlay
frame, is currently disabled in the multi-arf case.
Change-Id: I1d777318ca09f147db2e8c86d7315fe86168c865
The encoder currently allocates frame buffers before
it establishes what the chroma sub-sampling factor is,
always allocating based on the 4:4:4 format.
This patch detects the chroma format as early as
possible allowing the encoder to allocate buffers of
the correct size.
Future patches will change the encoder to allocate
frame buffers on demand to further reduce the memory
profile of the encoder and rationalize the buffer
management in the encoder and decoder.
Change-Id: Ifd41dd96e67d0011719ba40fada0bae74f3a0d57
This patch insures that the last byte of a chunk that contains a
valid superframe marker byte, actually has a proper superframe index.
If not it returns an error.
As part of doing that the file : vp90-2-15-fuzz-flicker.webm now fails
to decode properly and moves to the invalid file test from the test
vector suite.
Change-Id: I5f1da7eb37282ec0c6394df5c73251a2df9c1744
Avoids failures:
MSE_ClearKey/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ClearKey_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ExternalClearKey_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ExternalClearKey/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ExternalClearKeyDecryptOnly/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ExternalClearKeyDecryptOnly_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
SRC_ExternalClearKey/EncryptedMediaTest.Playback_VP9Video_WebM/0
SRC_ExternalClearKey_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
SRC_ClearKey_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
Patches are
This reverts commit 9bc040859b
This reverts commit 6f5aba069a
This reverts commit 9bc040859b
I1f250441 Revert "Refactor the vp9_get_frame code for frame parallel."
Ibfdddce5 Revert "Delay decreasing reference count in frame-parallel decoding."
I00ce6771 Revert "Introduce FrameWorker for decoding."
Need better testing in libvpx for these commits
Change-Id: Ifa1f279b0cabf4b47c051ec26018f9301c1e130e
Another project in ChromeOS is using these files. To make libvpx
rolls simpler, add these files back unitl the other project removes
the dependency.
crbug.com/387246 tracking bug to remove dependency.
Change-Id: If9c197081c845c4a4e5c5488d4e0190380bcb1e4
When decoding in serial mode, there will be only
one FrameWorker doing decoding. When decoding in
parallel mode, there will be several FrameWorkers
doing decoding in parallel.
Change-Id: If53fc5c49c7a0bf5e773f1ce7008b8a62fdae257
See: https://code.google.com/p/chromium/issues/detail?id=362697
The code properly catches an invalid stream but seg faults instead of
returning an error due to a buffer not having been initialized. This
code fixes that.
Change-Id: I695595e742cb08807e1dfb2f00bc097b3eae3a9b
This patch adds a mechanism for insuring error checking on invalid files
by creating a unit test that runs the decoder and tests that the error
code matches what's expected on each frame in the decoder.
Disabled for now as this unit test will segfault with existing code.
Change-Id: I896f9686d9ebcbf027426933adfbea7b8c5d956e
s/stdint.h/vpx\/vpx_int.h
Added missing 'break;'s
Also included other minor changes, mostly cosmetic.
Change-Id: I852bba3e85e794f1d4af854c45c16a23a787e6a3
The test for this is in test vector code ( show existing frames will
fail ). I can't check it in disabled as I'm changing the generic
test code to do this:
https://gerrit.chromium.org/gerrit/#/c/70569/
Change-Id: I5ab324f0cb7df06316a949af0f7fc089f4a3d466
This commit allows the key frame to search through more prediction
modes and more flexible block sizes. No speed change observed. The
coding performance for rtc set is improved by 1.7% for speed -5 and
3.0% for speed -6.
Change-Id: Ifd1bc28558017851b210b4004f2d80838938bcc5
A superframe is a bunch of frames that bundled as one frame. It is mostly
used to combine one or more non-displayable frames and one displayable frame.
For frame parallel decoding, libvpx decoder will only support decoding one
normal frame or a super frame with superframe index.
If an application pass a superframe without superframe index or a chunk
of displayable frames without superframe index to libvpx decoder, libvpx
will not decode it in frame parallel mode. But libvpx decoder still could
decode it in serial mode.
Change-Id: I04c9f2c828373d64e880a8c7bcade5307015ce35
This breaks the profile 1 bitstream.
Don't force non420 uv transform size to 1/4 y size. In the 4:2:0 case the
chroma corresponding to a luma block is 1/4 its size. In the 4:4:4 case
chroma and luma planes are the same size. Disallowing larger transforms
can result in a loss of compression efficiency and is inconsistent.
For sub-8x8 blocks only average corresponding motion vectors.
4:2:0 and profile 0 behavior remains unchanged.
Change-Id: I560ae07183012c6734dd1860ea54ed6f62f3cae8
- Rename build_targets to build_framework
- Add functions for creating the vpx_config shim and obtaining
preproc symbols.
Change-Id: Ieca6938b9779077eefa26bf4cfee64286d1840b0
Speed 6 uses small tx size, namely 8x8. max_intra_bsize needs to
be modified accordingly to ensure valid intra mode checking.
Borg test on RTC set showed an overall PSNR gain of 0.335% in speed
-6.
This also changes speed -5 encoding by allowing DC_PRED checking
for block32x32. Borg test on RTC set showed a slight PSNR gain of
0.145%, and no noticeable speed change.
Change-Id: I1502978d8fbe265b3bb235db0f9c35ba0703cd45
This is the first step to rework the rate-distortion modeling used
in rtc coding mode. The overall goal is to make the modeling
customized for the statistics encountered in the rtc coding.
This commit makes encoder to perform rate-distortion modeling for
DC and AC coefficients separately. No speed changes observed.
The coding performance for pedestrian_area_1080p is largely
improved:
speed -5, from 79558 b/f, 37.871 dB -> 79598 b/f, 38.600 dB
speed -6, from 79515 b/f, 37.822 dB -> 79544 b/f, 38.130 dB
Overall performance for rtc set at speed -6 is improved by 0.67%.
Change-Id: I9153444567e5f75ccdcaac043c2365992c005c0c
strip trailing '/' from paths, this is later converted to '\' which
causes execution errors for obj_int_extract/yasm. vs10+ wasn't affected
by this issue, but make the same change for consistency.
gen_msvs_proj:
+ add missing '"' to obj_int_extract call
unlike gen_msvs_vcproj, the block is duplicated
missed in: 1e3d9b9 build/msvs: fix builds in source dirs with spaces
Change-Id: I76208e6cdc66dc5a0a7ffa8aa1edbefe31e4b130
This patch allows the VP9 encoder to skip the un-necessary
motion search in the first pass. It computes the motion error
of 0,0 motion using the last source frame as the reference,
and skips the further motion search if this error is small.
Borg test shows overall the patch gives PSNR gain (derf -0.001%,
yt 0.341%, hd 0.282%). Individual clips may have PSNR gain or
loss. The best PSNR performance is 7.347% and the worst is -0.662%.
The first pass encoding speedup for slideshow clips is over 30%.
Change-Id: I4cac4dbd911f277ee858e161f3ca652c771344fe
Allow for an option to selectively apply the deblocking loop filter to the denoised
raw block, based on the denoised state (no-filter, filter with zero motion, or filter with non-zero motion)
of the current block and its upper and left denoised block.
This helps to reduce some blocking artifacts from the motion-compensated denoising.
Change-Id: I0ac4e70076df69a98c5391979e739a2681e24ae6
This commit fixes frame header decoding for superframe index, to
prevent out of boundary memory read triggered by fuzz test
vector. It resolves a chromium security violation issue
crbug.com/376802.
The issue was introduced in the change:
Add VPXD_SET_DECRYPTOR support to the VP9 decoder.
cl-id I88f86c8ff9af34e0b6531028b691921b54c2fc48
where the buffer was read before validation check on index offset
applied.
A test vector is added accordingly.
Change-Id: I41c988e776bbdd1033312a668e03a3dbcf44ca99
This patch appears to have introduced non-determinism and/or
mismatch from debug vs release.
This reverts commit 5daef90efc.
Change-Id: I80081e55cfeaaa821b510b58a4e6e6328003c7da
The current decoding scheme will decrease the reference count
of the output frame when finish decoding. Then the application
could copy the frame from the decoder buffer to application buffer.
In frame-parallel decoding, a decoded frame will not be outputted
until several frames later which depends on thread numbers. So
the decoded frame's reference count should be decreased only
after application finish copying the frame out. But due to the
limitation of vpx_codec_get_frame, decoder could not know when
application finish decoding. So use a index last_show_frame to
release the last output frame's reference count.
Change-Id: I403ee0d01148ac1182e5a2d87cf7dcc302b51e63
This commit enables a fast path computational flow for forward
transformation. It checks the sse and variance of prediction
residuals and decides if the quantized coefficients are all
zero, dc only, or more. It then selects the corresponding coding
path in the forward transformation and quantization stage.
It is currently enabled in rtc coding mode. Will do it for rd
coding mode next.
In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
Overall coding performance for rtc set is changed by -0.18%.
Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
This patch allows the encoder to skip the
un-neccessary motion search in the first pass. It
calculates the error of the zero motion vector using
the last source frame as reference and skips the
further motion search in the first pass if the error
is small.
The encoding speedup of the first pass for slideshow
videos is over 30%. Borg test shows the overall PSNR
performance remain approximately the same (derf -0.009,
hd 0.387, yt 0.021, stdhd 0.065). Individual clips may
have either PSNR gain or loss. The worst PSNR perfomance
is from yt set, with a PSNR loss of -1.1.
Change-Id: I08b2ab110b695e4689573b2567fa531b6457616e
* Only use ZEROMV, disalowing the intra modes that were previously
tested.
* Score rate and distortion as zero.
Change-Id: Ifcf99e272095725f11da1dcd26bd0f850683e680
Really just armv7. This is a convenience target intended to make iOS
development with libvpx easier. Xcode projects with default settings
will fail to build when a framework lacks armv7s support when targetting
iOS7.
Change-Id: I7eb80d52eec25501febc0d2c3c0b4ed964b8ed5b
In non frame-parallel decoding, this works the same way as
current decoding scheme. Every time after decoder finish
decoding a frame, it will swap the current mode info pointer
and previous mode info pointer if the decoded frame needs
to be shown. Both mode info pointer and previous mode info
pointer are from mode info arrays.
In frame-parallel decoding, this will become more complicated
as current frame's mode info pointer will be shared with next
frame as previous mode info pointer. But when one decoder
thread finishes decoding one frame and starts to work on next
available frame, it needs to retain the decoded frame's mode
info pointers until next frame finishes decoding. The mode info
index will serve this purpose. The decoder will use different
buffer in the mode info arrays and use the other buffer to save
previous decoded frame’s mode info.
Change-Id: If11d57d8eb0ee38c8876158e5482177fcb229428
tests failing under Win32/Win64
+ dct16x16_test: add missing avx2 functions (partially disabled)
exercises the forward transforms
no idct/iht implementations, so the c-code is used
Change-Id: I04f64a457fa0828a00f32b5c9fe4f55294f21f61
In non-rd real-time mode, choosing smaller transform size in
encoding gives better video quality and good speed gain than
choosing larger transform size. This patch set tx size search
method to ALLOW_8X8, which is better than using 4x4 or other
larger sizes.
Borg tests on rtc set at speed 6 showed significant gain on quality.
PSNR gain: 11.034% and SSIM gain: 15.466%.
The speed gain is 5% - 12% for <720p clips, and 2% - 7% for
720p clips.
Change-Id: If4dc74ed2df359346b059f47fb73b4a0193ec548
Use of stack frame variable "fps" beyond the lifetime of the function.
fps is sent as a paremeter to output_stats and stored in the
packet holding this encoded frame. This has scope beyond the
lifetime of the calling function.
This reverts commit 3f95a230c7
Change-Id: Icd8e14b3d7dd733590ada12e619b9dce95b6b0f5
* changes:
gen_msvs_*proj.sh: strip SRC_PATH_BARE from obj names
*.mk: pass SRC_PATH_BARE to all GEN_VCPROJ invocations
build/msvs: fix builds in source dirs with spaces
When this compiler flag is enabled, the encoder will write a denoised,
uncompressed, version of the input to denoised.yuv.
Change-Id: Ie0247f76b23219d95fe97dd70f23e097d742c249
This commit enables unit test for SSSE3 16x16 inverse 2D-DCT with
10 non-zero coefficients. It includes a new test condition to
cover the potential overflow issue due to extremely coarse quantization.
Change-Id: I945e16f05dfbe19500f0da5f15990feba8e26d99
The SSSE3 implementation might find a potential overflow issue in
its second 1-D transform, if all input residual pixels are close to
255. This commit fixes the issue and re-enables the unit test on
the SSSE3 version.
Change-Id: I0520478abdab7afd3ff2842516bec951111e9b3c
This commit reworks the unit test for 8x8 forward/inverse
transformation. It adds extreme input value test to detect overflow
issues in the intermediate steps.
It temporarily disables unit test for the SSSE3 version, which
showed overflow failure in the new test conditions.
Change-Id: I7caf10bba4b6db031add65d8c0eb99426b38aa42
Right now there is just one place to check: xd->lossless and for the first
pass there is a function is_lossless_requested().
Change-Id: I949a6834e64ce51e422e2892f097f2b871b5429a
In Aq mode 2 for kf/arf/gf the segment q delta
is calculated and then applied by re-quantization without
going through the rd loop again. If the base Q != 0
but the segment Q == 0 (lossless) this can could give rise
to a situation where we have an illegal combination of
transform size and Q. (Q == 0 requires that all blocks
are coded 4x4 WHT).
Change-Id: I241a58c6494ed442e9e4630070b0cde0fb99ae45
...when configured below the path containing spaces. configuring outside
the path containing spaces still won't work due to issues with the
makefiles, e.g.,
/path with spaces/git
/path with spaces/build1
/build2
configure/make in build1 will work, build2 will not
Change-Id: Ie4a1f313596d7457cadd67476ac1dbd3273ad46e
As a side-effect, the sad unit tests for VP8 and VP9
had to be separated.
Fixes a bug in original patch:
(https://gerrit.chromium.org/gerrit/#/c/70163/8)
that was reverted due to a nightly test failure.
Change-Id: Ia2a4e9e278fd3c89d6c3c82fcc6381320ec2a8a6
This commit applies quantization process with coarse quantization
step size to the forward transform coefficients and tests all the
inverse 16x16 DCT and ADST implementation versions with the
dequantized coefficients as input, to verify that the outcomes
match the prototype.
Change-Id: I68034a6126b45192c87d8c642155290e89bff8fa
In frame parallel decoding mode, there will be still several frames inside
the decoder when application stop calling vpx_codec_decode to decode frames.
The application will need to keep calling vpx_codec_get_frame to get all the
remaining decoded frames in the decoder.
Change-Id: I2ce8260a91282f045bb9a6093ff8a606b1990f14
This commit added a call to set speed feature before initializing
motion search, fixed the problem where unintialized search method
is used before its value being set.
Change-Id: I537e4612bf0d00fd6f51396fd222d4b3bd6fde58
By enabling the OUTPUT_YUV_SRC compiler flag, the encoder will write the raw
input to bd.yuv.
The functionality was mostly implemented, but in its previous state did not
compile.
Change-Id: Ia331ad0f4c6e6f9f51e8d42cd33ba8cc146b3dbf
Making this consistent with intra mode masks: you need to specify
allowed inter/intra modes to use.
Change-Id: Iaecd28bf79047259707d8e7a59a57bb7b856383e
SEG_LEVEL_SKIP requires the block size to be at least 8x8. Attempting to
use it on smaller partitions causes the decoder to reject the bitstream.
Change-Id: Ia7188cdf8ae5ac1df6bd29f3f80dbb0610e1f7b1
An overflow issue could potentially happen in the second round 1-D
transform of the SSSE3 full inverse 16x16 2D-DCT. This commit fixes
this issue.
Change-Id: Ia19e4888fda1cc929a28a5f89a5beec612d628dc
Now match the "C" version of "Fix to reduce block
artifacts from vp8 temporal denoiser."
(see change id Id9b56e59e33f3c22e79d2f89f763bdde246fdf3f)
Change-Id: I99e569bb6af4ae3532621127e12bf917a48ba08e
In the current logic, if the sse for zero motion is smaller
than the sse for new_mv (i.e., best_sse), we may still end up
using the non-zero mv for denoising (if the magnitude of new_mv is above threshold).
This can happen for very noisy content, and can lead to artifacts.
This change ensures that we always use zero_mv (over new_mv) for
denoisng if sse_zero_mv <= best_sse.
Change-Id: I8ef9294d837b077013b77a46c9a71d17c648b48a
This commit enables SSSE3 implementation of the inverse 2D-DCT
with only first 10 coefficients non-zero. It reduces the runtime
of SSE2 version from 745 cycles to 538 cycles, i.e., 27% speed-up.
Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
This code dates from the ancient past and
applied an error score weighting based on pixel
brightness. This not seem to be providing any
benefit metrics wise and could be making some
visual issues in dark frames worse.
The field is left in place in the FIRSTPASS_STATS data
structure in this patch, pending changes to unit tests that
use a pre-defined first pass file.
Change-Id: Id50f04205230234858e7548ce523f11acaf3567d
The subpixel SSSE3 was fixed in this patch:
https://gerrit.chromium.org/gerrit/#/c/70283/
So the equivalent AVX2 is fixed accordingly.
Change-Id: Ieebbc1949c99d34b12b8b47692df71aca5001f3a
The previous change only tunes forward transform. It doesn't affect
the neon implementation of the inverse transform. Hence turn the
unit test on.
Change-Id: I4f0f43783b98814d1eee53182209f9669d538140
This commit enables the SSSE3 implementation of full inverse 16x16
2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles,
about 7% speed-up.
Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
The intepolation filter functions can be better tested withe extreme
values, especially given the optimization functions are prone to
overflow signed 16 bit intermediate value when operation order is
wrong.
Change-Id: I712142b0bc1e5969c692c0486a57ffa37c9742b5
Further changes to first pass allocation for gf/arf groups.
Three variables removed from TWO_PASS structure as only
now used locally. Dont adjust gf_group_bits in the post
encode update as this will no longer have any effect.
Change-Id: Iff89b225db923fc856f5d2aedbc899f1d7d68b55
In 8-tap filtering, to guarantee the intermediate results fit in
16 bits, the order of accumulating the products needs to be done
correctly, and the largest product should be added last. This
patch fixed the problem using the method in commit "Correct ssse3
8/16-pixel wide sub-pixel filter calculation".
Change-Id: I79d0ad60c057b15011ece84cda9648eee0809423
Restructuring to allocate the bits for each frame in
a GF group at the time the group is defined.
At the moment the allocation closely mirrors what
we had before.
Also changes the default rate adjustment method to
LONG_TERM_VBR_CORRECTION.
Change-Id: Ie5793c46c6b9c888cead5d8790792efd7d60b7c1
Fixes a bug introduced in
https://gerrit.chromium.org/gerrit/#/c/69779/13, where
uninitialized frame buffers due to corrupt and short
buffer sizes, may cause a crash.
This patch fixes the currently failing
video/processing/static_image/vp8_convert_test
Change-Id: I1b09e21482f292c11a2bfb4e570aef1d643410a7
As mismatchs were found between the intrinsic version and c only. The
commit temporarily revert to use the matching assembly version to
allow further investigation.
Change-Id: I08436c47d4888b562c0eac8e8856d90a831442df
Use the appropriate subblock offset mode info rather than the parent
block base, when filling mbmi in the pc tree in nonrd_use_partition.
This mimics what is done in the vertical case and what is done for
both cases in nonrd_pick_partition.
This change has little practical effect at the moment since in speed 5
rt horizontal and vertical partitions are currently only used unpaired
at edges of the picture.
Change-Id: I4632f66ca84086dac56c7d36b45ddbe38a06f42a
This did the same correction as the one in commit "Correct ssse3
8/16-pixel wide sub-pixel filter calculation" to avoid saturation
during filtering.
Change-Id: Ife9aa3f62daf9114eb24fe38f7baa3c3f361b2d6
If we are already saving a lot in bits from the target (maximum)
bitrate in the constrained quality mode, allow the quantizer
to go lower than the cq level. This hopefully will solve issues
with getting too low a bitrate and consequently poor quality for
certain videos in cq mode.
Change-Id: I1c4e8b0171fcf58f95198b3add85eea5f3c8f19f
If the denoiser filter causes too big a change in the absolute pixel difference
(between source and denoised signal), the block is not denoised, which can cause
visual block artifacts. This change applies a second adjustment to the temporal filter
to effectively allow for a (weaker) denoising for such blocks (which can keep
the absolute differnence within the tolerance range in most cases).
This helps to reduce some of the block artifacts from the denoising.
The additional cost of re-applying the filter to this set of blocks is low,
as the percentage of blocks per frame (with too big a change in absolute pixel difference)
is typically small, 2-5%.
Change-Id: Id9b56e59e33f3c22e79d2f89f763bdde246fdf3f
Renames all x86_64 specific assembly files to consistently
end in _x86_64.asm. This will be useful for build systems to
handle these files differently.
All new 64-bit specific assembly files should use the new
naming convention.
Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
This commit changed to enable the encoder to adjust motion dection
speed threshold based on picture size. In addition, cpu-used 1 now
does a partition search every other frame instead of every third
frame for low resolution inputs.
The change has no quality/speed impact for 720p and above. Test
showed the change increase encoding time by between 3% to 6% for
cpu-used 2 encodiong of 360p sequences. It also has a compression
gain about .3%.
For cpu-used 2, the change resolved some very disturbing visual
artifacts in certain sequences when large block partitionings and
transforms are used as a result of copying the partition from a
previous frame.
Change-Id: Ic7fd22508cdb811d4ca935655adbf20109286cfa
The final goal is eventually to get rid of both itxm_add and fwd_txm4x4.
This patch does it in the decoder.
Change-Id: Ibb3db57efbcbb1ac387c6742538a9fcf2c6f24a5
The current decode_tiles decodes the frame one tile by one tile
and then loopfilter the whole frame or use another worker thread to
do loopfiltering.
|------|------|------|------|
|Tile1-|Tile2-|Tile3-|Tile4-|
|------|------|------|------|
For example, if a tile video has one row and four cols, decode_tiles
will decode the Tile1, then Tile2, then Tile3, then Tile4.
And during decode each tile, decode_tile will decode row by row in
each tile.
For frame parallel decoding, decode_tiles will decode video in row order
across the tiles. So the order will be:
"Decode 1st row of Tile1" -> "Decode 1st row of Tile2"
-> "Decode 1st row of Tile3" -> "Decode 1st row of Tile4"
-> "Decode 2nd row of Tile1" -> "Decode 2nd row of Tile2"
-> "Decode 2nd row of Tile3" -> "Decode 2nd row of Tile4"-> "loopfilter 1st row"
Change-Id: I2211f9adc6d142fbf411d491031203cb8a6dbf6b
This commit modifies the x86inc to allow explicit local buffer
allocation and the corresponding stack pointer adjustment.
Change-Id: I3cb2174e0242b5869a4ba0ca0cd240ee066836c3
MMX variance code in vp8 was reading out of bounds..
TODO(JBB): The best fix would involve removing duplicate library
functions between vp8 and vp9...
Change-Id: I5722853a6a58d3b55257ff385fa54c773bf98ded
This commit adjusts the forward 16x16 DCT computation steps to
simplify the register level operations. It fixes the corresponding
sse2 version accordingly.
Change-Id: I72a9c25b8ca9442fc5e113f47cd701ae55aa7f08
Added a skipping test in non-rd inter-mode. After interpolation
prediction step, the residuals are tested to see if they will be
quantized to 0 based on modeling between spatial domain and
frequency domain.
Set static-thresh to 800 for >=720p and 300 for <720p, rtc set
tests showed
1. Speed 5, psnr: -0.514%; ssim: -1.748%;
speedup on related clips: 5% -11%
2. Speed 6, psbr: -0.628%; ssim: -1.637%;
speedup on related clips: 4% - 9%
Change-Id: I62fbf26bc043ecd2b584f255f1a4ee5ab52bfcf3
Use VPX_TEST_NAME instead of the script name sans path and extension
when reporting test results when the variable is not empty.
Also: Clean up some style nits while I'm at it.
Change-Id: I0319745a3b7a90d0f307e55c5108fea2204187cd
The commit changed to use memset for initialiazation of non-trivial
strucutures, where initialization using {0} caused warnings. Also,
removed {0} initializations where appropriate initialization calls
are in place.
Change-Id: Ifd03e34aa80688e382124eb889c0fc1ec43c48e6
Make all post-processor code conditionally
compilable based on the CONFIG_VP9_POSTPROC
macro.
Also, remove the vizualization code from VP9
since it is out of date and will not compile.
Change-Id: I1e9e13a09ecd43e9a3f3704c175ae8cd258ababd
disabled by default, enable with:
--enable-experimental --enable-spatial-svc
this disables vp9_spatial_svc_encoder and svc_test, further work is
needed to remove internal lib references
Change-Id: I6a487ecbf07eb98843a99d96e17f08f960b63088
vp9_block_error_sse2 can only handle 16 bytes at a time but
the function requires to handle a sequence of 32 bytes at a time
so each 16 bytes is handled in a different register.
With AVX2 optimization the 32 bytes can be handled in one register instead
of two in the SSE2
The vp9_block_error was optimized by 85%.
The user level was optimized by 1.2%
Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd
New name: vpx_temporal_svc_encoder.c
Also, update comment to note that example supports VP8 and VP9.
Change-Id: I6fffab81296f918ebca740192a5c609593852dff
The various motion search functions share a
common function prototype. In the case of
vp9_full_range_search() two of the parameters
are not needed.
Change-Id: I0e190af54a3b3f276409f20e8ec55912f9b0b798
Simplify the calculation of KF bitrate in similar way
to previous patch for GF/arf.
This has no impact on derf or std hd sets but gives a
small net gain of ~0.1% for yt and yt-hd sets.
Change-Id: Ida64ac1428d9c2a62adb67056fadbf0180eff030
The variation in boost calculation for gf and arf groups
is not significant enough to justify the extra complexity.
Also removed some other spurious code that no longer
has much material impact.
The handling of the rare case, where the boost bits
number is less than the number of bits a that would
be allocated if a frame was not boosted, will be dealt
with in a subsequent patch.
This change actually helps on all sets a little by
~0.1% - 0.2% with slightly bigger gains on SSIM.
Change-Id: Id42c1ac22a80a8c4993cfa0e51bc733eb9ed4f75
As a side-effect, the max_sad check is removed from the
C-implementation of VP8, for consistency with VP9, and to
ensure that the SAD tests common to VP8/VP9 pass.
That will make the VP8 C implementation of sad a little slower
but given that is rarely used in practice, the impact will be
minimal.
Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca
This reverts commit 81ad047ee5.
Revert "VP8 for ARMv8 by using NEON intrinsics 15"
This reverts commit 727af7cebe.
This exposes a bug in gcc 4.9 regarding register allocation. Will reland
when 4.9 is fixed.
Change-Id: I2d8a04e4edde93719280e41550f4c0765608ec4d
The warning messages complained that there are unused arguments
in a few prediction modes. This structure was designed on purpose,
such that a wrapper function can cover all prediction mode cases
and make them readily accessible as an pointer array.
This commit silences such warnings.
Change-Id: I7036b6bdb70747e5327d8f6fceb154f100abc4c0
Allow slightly larger minq-maxq range for P frames. This improves
the compression performance of speed -5 for rtc set by 2.7% in psnr.
Change-Id: I438653d52d0fe51111509c6092e2334bac2de0cf
+ the remnants in the build system & README
the documentation that required php was removed in:
50fa585 Removing examples code generation and making them static.
Change-Id: Ibf00dca9ab2715fc21e8de358807b63d1445662c
When superframe index is available we completely rely on it and use frame
size values from the index.
Change-Id: I0011d08b223303a8b912c2bcc8a02b74d0426ee0
Inline loopfilter has been already handled in vp9_decode_frame().
Collecting all similar code in one place now.
Change-Id: I358a0280fc7c2b27cca520bc1e8c16c4eb6491dd
Re-factor duplicate code.
Add two pass check for use of section_intra_rating as
it is un-initialised in the 1 pass and rt case.
Change-Id: I93120796f07961b8a21fb26e1a9f0d3d13949994
One of a series of changes to clean up two pass
allocation as precursor to support for multiple arf
or boosted frames per GF/ARF group.
This change pulls out the calculation of the total bits
allocated to a GF/ARF group into a function, to aid
readability and reduce the line count for define_gf_group().
This change should have no material impact on output.
Change-Id: I716fba08e26f9ddde3257e7d9b188453791883a3
This commit enables a chessboard pattern for partition search. All
the black blocks run regular partition search ranging from 8x8 to
32x32. The rest white blocks take the nearby blocks' information
to adaptively decide the effective search range.
The compression performance for rtc set at speed -5 is down by 1.5%.
For pedestrian 1080p at speed -5, the runtime goes from 41594 ms to
39697 ms, i.e., about 5% faster.
Change-Id: Ia4b96e237abfaada487c743bca08fe1afd298685
The test vector has segment enabled with different quantizer used for
different segments for bot the first frame(key) frame and the rest of
non-key frames.
Change-Id: I7e21122183050ee046219caba483c18cbc34afe7
Fixes the idecoder in the case where:
cm->error_resilient_mode == 0, and
cm->frame_parallel_decoding_mode == 0, but
new_fb->corrupted == 1.
The assert in debug_check_frame_counts fails to
take into account the case of a corrupt frame.
Change-Id: Idf318a68458cc88d65d6f3f408a10d8ffe87e43f
* changes:
Turn on unit tests for SSSE3 8x8 forward and inverse 2D-DCT
Change eob threshold for partial inverse 8x8 2D-DCT to 12
SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero
tx_mode supercedes whatever mechanism is used to push for 16x16
allowing for the use of the 4x4 transform.
Change-Id: I6c3f05ab9fe52050e40cc6303de9334653763289
We only used two members from that struct: max_threads and inv_tile_order.
Moving them directly to VP9Decoder struct.
Change-Id: If696a4e5b5b41868a55f3cc971e1d7c1dd9d5f69
This reverts commit 4725ab7e51.
The constants are necessary to avoid breakage in vs9 builds:
warning C4180: qualifier applied to function type has no meaning; ignored
error C2436: 'f2_' : member function or nested class in constructor initializer list
while compiling class template member function 'std::tr1::tuple<T0,T1,T2,T3,T4,T5,T6,T7,T8,T9>::tuple(const int &,const int &,unsigned int (__cdecl &))'
..\test\variance_test.cc : see reference to class template instantiation 'std::tr1::tuple<T0,T1,T2,T3,T4,T5,T6,T7,T8,T9>' being compiled
Change-Id: Ia218b74fc473d40f02fee84cb7009adfbe82e5a7
The scanning order has the first 12 coefficients of the 8x8 2D-DCT
sitting in the top left 4x4 block. Hence the partial inverse 8x8
2D-DCT allows to handle cases with eob below 12.
The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
166 cycles (using SSE2) to 150 cycles (using SSSE3).
Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
vp9_is_upper_layer_key_frame() definition does not match declaration--
it was missing the second const.
Change-Id: I71312579eb443be1924b8b06d8b3177c3dcb40f3
This commit enables ssse3 assembly implementation of the 8x8
inverse 2D-DCT with only first 10 coefficients non-zero. The
average runtime for this unit goes down from 198 cycles to 129
cycles (34.8% faster).
Change-Id: Ie7fa4386f6d3a2fe0d47a2eb26fc2a6bbc592ac7
Merged minq tables for arf and gf cases.
These tables were almost the same and for
VBR the arf table was not used at all.
Change-Id: Ie3c87e91dab613cf06f6945ac1ace0e0e4213d34
Small adjustment to the active Q range calculations.
These changes should slightly extend the available Q range
for KF/GF/ARF and narrow it for other frames.
The results for this change in isolation are broadly positive
for SSIM and average PSNR and slightly up but mixed for opsnr.
derf +0.293% opsnr, +1.286% SSIM
std-hd + 0.528% opsnr, + 1.746% SSIM
yt +0.056% opsnr, +0.457% SSIM
yt-hd -0.147% opsnr, + 0.226% SSIM
Change-Id: If065280342027ecc5d44b49fc1d440dfef041002
The test vector is produced to have a single key frame, with segment
map enabled and transmitted. Yet no segment feature is active.
Change-Id: I365d62f00d05c07098b9a76fc8d3a991e427ec1a
Includes changes that are not compatible with VS windows builds.
Amongst other things stdint.h is not supported in VS.
This reverts commit 89fbf3de50.
Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd
When the variance is far less than sse, the block is considered to
be under light change. All the energy is compacted into DC coeff
and can be coded at low cost. In such situation, switch the rate-
distortion modeling from sse+var based back to variance based.
Note that this is a temporary solution to handle the rare situations
where the scene light changes.
Change-Id: I1ee0fe2b9eda6b5fac40152e1841bf23f4d229fd
This reverts commit c500fc22c1
There is an issue with gcc 4.6 in the Android NDK:
loopfiltersimpleverticaledge_neon.c: In function 'vp8_loop_filter_bvs_neon':
loopfiltersimpleverticaledge_neon.c:176:1: error: insn does not satisfy its constraints:
Change-Id: I95b6509d12f075890308914cc691b813d2e5cd9f
This reverts commit a5d79f43b9
There is an issue with gcc 4.6 in the Android NDK:
loopfilter_neon.c: In function 'vp8_loop_filter_vertical_edge_y_neon':
loopfilter_neon.c:394:1: error: insn does not satisfy its constraints:
Change-Id: I2b8c6ee3fa595c152ac3a5c08dd79bd9770c7b52
Pulling libwebm from upstream
Changes from upstream:
249629d make Mkv(Reader|Writer)(FILE*) explicit
7f3cda4 mkvparser: fix a bunch of windows warnings
5c06178 Merge "clang-format on mkvparser.[ch]pp"
4df111e clang-format on mkvparser.[ch]pp
7b24501 clang-format re-run.
c6767b9 Change AlignTrailingComments to false in .clang-format
9097a06 Merge "muxer: Reject file if TrackType is never specified"
eddf974 Merge "clang-format on mkvmuxertypes.hpp and webmids.hpp"
def325c muxer: Reject file if TrackType is never specified
41f869c Merge "clang-format on webvttparser.(cc|h)"
fd0be37 clang-format on webvttparser.(cc|h)
207d8a1 Merge "clang-format on mkvmuxerutil.[ch]pp"
02429eb Merge "clang-format on mkvwriter.[ch]pp"
0cf7b1b Merge "clang-format on mkvreader.[ch]pp"
2e80fed Merge "clang-format on sample.cpp"
3402e12 Merge "clang-format on sample_muxer.cpp"
1a685db Merge "clang-format on sample_muxer_metadata.(cc|h)"
6634c7f Merge "clang-format on vttreader.cc"
7566004 Merge "clang-format on vttdemux.cc"
9915b84 clang-format on mkvreader.[ch]pp
7437254 clang-format on mkvmuxertypes.hpp and webmids.hpp
0d5a98c clang-format on sample_muxer.cpp
e3485c9 clang-format on vttdemux.cc
46cc823 clang-format on dumpvtt.cc
5218bd2 clang-format on vttreader.cc
1a0130d clang-format on sample_muxer_metadata.(cc|h)
867f189 clang-format on sample.cpp
4c7bec5 clang-format on mkvwriter.[ch]pp
9ead078 clang-format on mkvmuxerutil.[ch]pp
fb6b6e6 clang-format on mkvmuxer.[ch]pp
ce77592 Update .clang-format to allow short functions in one line
0a24fe4 Merge "Add support for DateUTC and DefaultDuration in MKV Muxer."
11d5b66 Merge "Add .clang-format"
a1a3b14 Add .clang-format
0fcec38 Add support for DateUTC and DefaultDuration in MKV Muxer.
Change-Id: Ia0ed161ffc3d63c2eba8ed145707ffe543617976
In a future release we plan to remove the
option of setting the ARNR filter type.
This patch marks this control as being deprecated
as advance warning that it will be removed from
the API at some point.
Change-Id: I5dcca804b44c7c93b1a10da7d69d19ba6061869c
Added macro to conditionally compile some of the
post-processing functions only when CONFIG_POSTPROC
is defined.
This was causing the build for the generic-gnu target
to fail.
Change-Id: Ibfa447feceb7a0528135025f105be48f97e9965c
The rounding of the ARNR filter output prior to
normalization by the filter strength was incorrect
when strength = 0.
In this case 1 << (strength - 1) would not create the
required rounding of 0, rather it would outrange. This
patch fixes this issue.
Change-Id: I771809ba34d6052b17d34c870ea11ff67b418dab
This commit enables SSSE3 version full inverse 8x8 2D-DCT and
reconstruction. It makes the runtime of vp9_idct8x8_64_add down
from 256 cycles (SSE2) to 246 cycles.
Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
The microsoft build tools explicitly disallow building for arm in
the "desktop" target configuration; one has to target "Windows
Store" apps (aka WinRT/Metro) or Windows Phone. In Visual Studio
2012, one could just pick the v110_wp80 toolset which made the
vcxproj files buildable. In Visual Studio 2013, picking the v120_wp81
toolset isn't enough - one has to configure the vcxproj files
as an "AppContainerApplication". This has the implication that
you can't just build a plain .exe (such as the examples) - an .exe
project would need to have an AppxManifest file. Therefore we can
only build the library itself.
If loaded into Visual Studio for Windows (the Windows Store/Phone
version of Visual Studio, not the Desktop one), the obj_int_extract
project is omitted since it's treated as incompatible. Building
from the command line with msbuild works fine though.
The armv7-win32-vs12 target was added as part of a638bdf4 even
though actual use of it hadn't been tested.
Change-Id: Iee8088252cf790317aeb6b417d29058225f1f629
The getenv function doesn't exist there. In Visual Studio 2012,
the function still existed in the link libraries even though
it was hidden in the headers, but in the 2013 version it has been
removed from the link libraries as well.
Change-Id: Iea6289a698fa1788e906f5aabb6fddda3675815b
Disable register checks when vp9 is not configured. Soon vp8 assembly
will move to intrinsics, obviating this check.
This will still run the check when vp9 is enabled.
Change-Id: I90f50d22cb8c15e9c07f2c8e830e08de7fce0689
This does not do the full toolchain setup like the arm builds. It only
allows for ndk-builds. See the instructions in tests/android/README or
the webm jnin bindings project:
https://chromium.googlesource.com/webm/bindings/+/master/JNI/README.Android
Because this support is not quite polished, the build targets must be
forced. Please use
--force-target=x86-android-gcc --disable-ssse3 --disable-sse4_1 --disable-avx2
--force-target-mips-android-gcc
Change-Id: Ie2b6623f71ac816e3965c39bf97097e9d30b6e94
When ARNR filtering is disabled, by setting
arnr_max_frames=0, mode_skip_mask was being set to
-1 for the ARF frame resulting in no mode being
selected for the block.
The intent is to restrict the reference frame to the
previous ARF frame and the mode to one of ZEROMV,
NEARMV or NEARESTMV.
Change-Id: Ifc3920b153142cd01d422910c94d2f20ffb6f129
On balance Deb's modified rate control for VBR seems
to be outperforming especially on some low motion YT
clips so I have switched this to be the default mode for
now.
Change-Id: I0713d430cad6425ac5c48fccdf332e12814ee44a
Used horizonal add instructions instead of adding
byte lanes. The encoder performance improved by
~4% for the test clip used.
Change-Id: Iaddd10403fcffb5b3f53b1f591ab2fe0ff002c08
This patch did a cleanup following the commit "Save NEON registers
in VP8 NEON functions". The pushing/poping of callee-saved NEON
registers was moved into individual NEON functions. Therefore,
we don't need to save those registers at the beginning of codec.
The related code was removed.
Change-Id: I5648166514fc9beffb780aa138495597731f49ea
Assembly implementation of ssse3 8x8 forward 2D-DCT. The current
version is turned on only for x86_64. The average unit runtime
goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster.
This translates into about 1.5% speed-up for pedestrian_area 1080p
at speed 2.
Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4
This reverts commit 59e733ca81.
Hold off removing arnr_type to give users the opportunity
to change their script files to handle its deprecation. A
follow-up patch will mark the control for setting arnr_type
as deprecated and it will be removed completely in a later
revision of the code.
Change-Id: I8b817c744e144d3714234a4cd4309816d0c7e3e8
The recent compiler can generate optimized code that uses NEON registers
for various operations besides floating-point operations. Therefore,
only saving callee-saved registers d8 - d15 at the beginning of the
encoder/decoder is not enough anymore. This patch added register saving
code in VP8 NEON functions that use those registers.
Change-Id: Ie9e44f5188cf410990c8aaaac68faceee9dffd31
dist is broken in msvs currently due to a dependency on libs.mk which in
turn depends on the rest of the source tree, not just the examples
Change-Id: I3e313ceeae81eb29ef4bfb099d89756b43583eaa
This member of VP9_COMP seemed unnecessary since it
only shadowed VP9EncoderConfig.key_freq that is
accessible through VP9_COMP.
Change-Id: Ib751bb1cf1b0b3c50a2a527d7c34f6829dd6fee3
There are a few tests which read/write directly to/from WebM files. They should
be disabled when --disable-webm-io is passed.
Change-Id: Ibac4732e27c66da33082151ba6e6993eaa9a1efd
The global variables used in vpxdec.sh and vpxenc.sh have become useful
elsewhere: Define them in tools_common.sh instead.
Change-Id: I5b8dbd2e88c8d6b2f46c5c55d7711fa154c12b6a
The encoder was not handling requests to place keyframes at
fixed intervals, i.e. kf_min_dist == kf_max_dist, correctly.
In this case when looking to place the next keyframe it was
accumulating stats all the way up to the end of the firstpass
file. This patch corrects this behavior.
Change-Id: I948ad9f1d7faa0c05861df588136cce3bb61d7e7
This commit introduces a chessboard pattern search for the prediction
filter type search. It runs extensive search in alternate blocks and
allows the rest blocks to refer coding decisions of their nearby
neighbors.
For pedestrian 1080p at 4000 kbps, the runtime of speed -5 goes down
from 43990 ms to 42200 ms. The overall compression performance for
RTC set is changed by -1.37%.
Change-Id: Icfe220c49451cda796f0ca91d935c9ed01e56c9d
ARNR filtering is now forced to be centered on the ARF
frame and the other two options have been removed.
The other modes of constructing the ARNR frame were
not used and there does not seem to be any good
reason to maintain them.
This is purely an encoder-side change.
Change-Id: Ic772636d23f280752973852b9740083532a49de2
This patch fixed errors reported in Issue 746: "dr memory VP8
encode errors" and Issue 745: "dr memory VP8 decode errors".
The "UNINITIALIZED READ" errors were fixed in x86 assembly
code. The list of files fixed is
vp8_intra_pred_uv_tm_sse2
vp8_intra_pred_uv_tm_ssse3
vp8_intra_pred_uv_ho_mmx2
vp8_intra_pred_uv_ho_ssse3
vp8_intra_pred_y_tm_sse2
vp8_intra_pred_y_tm_ssse3
vp8_intra_pred_y_ho_sse2
Change-Id: Ib6df7bf1d442077fe534edfd90e50ad16fadacdd
For speed 3 and above, such search is only allowed at speed 3.
The change helped cif and stdhd set by 1.2% and .7% in compression,
but increased the encoding time by around 5%.
Change-Id: Ifa4832327f1c1bef3decb032ceb769cbf50e059f
Adds test code to verify that supplemental superframe information
that precedes the normal superframe information will not break
decoding.
Change-Id: Ia252b887d7ee138f51dc9a778376ff739402c455
The end_useage parameter is confusingly named since it
now actually defines the rate control method used.
Change-Id: I98912caabfe556b7af0b939a645d1336409e4d71
This commit enables a background detection approach for adaptive
quantizer control. It combines the cyclic refresh pattern and the
background information to determine the segment id for adaptive
quantizer selection, prior to the non-RD mode decision process.
It hence allows proper quantization information update for a more
precise rate-distortion modeling in the non-RD mode decision.
The compression performance of speed -5 for rtc set is improved
by 2.5%, at no speed change.
Change-Id: Ic3713e8ed9185b403b5b1679d19dabd57506d452
1. We didn't scale source image in lower layers so that
the stats are incorrect.
2. We didn't extend borders for re-constructed image.
Change-Id: Ia8d7bafbdb695ffa7f504e171f9449812e7bb0a3
Remove duplicate WebM parsing code in test/webm_video_source.h and linking it
against webmdec.c which does the exact same thing.
Change-Id: Ib7152eecde890fca58be42028cab18c9cb54221c
To make direct side by side testing this patch combines two
VBR corrections schemes to allow more direct side by side testing.
(The other patch was by Debargha chg id I0cd1f7...)
Change-Id: I271c45e5c4ccf8de8305589000218b80d9dc3a25
The background detection only tracks luma component. This commits
removes the frame buffer pointer retrieval for chroma components.
Change-Id: I098bd2950f5e5829ed5dc2b48568167248da7fad
This patch sets up a quad_tree structure (pc_tree) for holding all of
pick_mode_context data we use at any square block size during encoding
or picking modes. That includes contexts for 2 horizontal and 2 vertical
splits, one none, and pointers to 4 sub pc_tree nodes corresponding
to split. It also includes a pointer to the current chosen partitioning.
This replaces code that held an index for every level in the pick
modes array including: sb_index, mb_index,
b_index, ab_index.
These were used as stateful indexes that pointed to the current pick mode
contexts you had at each level stored in the following arrays
array ab4x4_context[][][],
sb8x4_context[][][], sb4x8_context[][][], sb8x8_context[][][],
sb8x16_context[][][], sb16x8_context[][][], mb_context[][], sb32x16[][],
sb16x32[], sb32_context[], sb32x64_context[], sb64x32_context[],
sb64_context
and the partitioning that had been stored in the following:
b_partitioning, mb_partitioning, sb_partitioning, and sb64_partitioning.
Prior to this patch before doing an encode you had to set the appropriate
index for your block size ( switch statement), update it ( up to 3
lookups for the index array value) and then make your call into a recursive
function at which point you'd have to call get_context which then
had to do a switch statement based on the blocksize, and then up to 3
lookups based upon the block size to find the context to use.
With the new code the context for the block size is passed around directly
avoiding the extraneous switch statements and multi dimensional array
look ups that were listed above. At any level in the search all of the
contexts are local to the pc_tree you are working on (in?).
In addition in most places code that used to call sub functions and
then check if the block size was 4x4 and index was > 0 and return
now don't preferring instead to call the right none function on the inside.
Change-Id: I06e39318269d9af2ce37961b3f95e181b57f5ed9
There is no need to initialize source/dst frame buffers at frame
level. These will be done at block coding stage. This commit hence
removes the redundant operations.
Change-Id: I11d9f2556058c6205c8e58ed53e31f78622c41b7
Add code to monitor over and under spend and
apply limited correction to the data rate of subsequent
frames. To prevent the problem of starvation or overspend
on individual frames (especially near the end of a clip) the
maximum adjustment on a single frame is limited to a %
of its un-modified allocation.
Change-Id: I6e1ca035ab8afb0c98eac4392115d0752d9cbd7f
This commit compares the current original frame to the previous
original frame at 64x64 block level and decides if the entire
block belongs to background area. If it is in the background area,
skip non-RD partition search and copy the partition types of the
collocated block in the previous frame.
For vidyo1 in the rtc set, this makes the speed -5 coding speed
about 8% faster. The overall compression performance is down by
1.37% for rtc set.
Change-Id: Iccf920562fcc88f21d377fb6a44c547c8689b7ea
This commit added a check of reference frame to make sure that pre
buffer pointers are initialized only when necessary and make them
to 0 if ref frame is intra, hence those buffer should never be used.
Change-Id: Ieb474fcd9feb759f02e2f9c282b7348a8fa31117
Delete code relating to the old VP8_TUNE_SSIM flag
as this code does not currently work and is largely made
redundant in VP9 by the various AQ modes.
Change-Id: I71f28e1f680573d296422254489000678552b17b
Remove duplicate rd_thresh code introduced when vp9_rd_pick_inter_mode_sub8x8()
was forked from vp9_rd_pick_inter_mode_sb().
Change-Id: I3c9b7143d182e1f28b29c16518eaca81dc2ecfed
Fix rate control bug whereby the rate factor heuristics
were being updated on arf overlays causing a rate surge
for a few frames followed by a corrective drop.
This fix eliminates many of the overshoot problems that
we were seeing on hard clips (even without applying
stricter vbr rate control) and also helps quality on
almost all clips with some hard clips improving by >5%.
Overall quality results measured at speed 2.
Derf +1.78% opsnr , +2.44% SSIM
Stdhd +2.41% opsnr, +2.85% SSIM
Change-Id: I2369df6295c2705963fa6307877f6acb304bcc39
Fix return values for webm_read_frame so that we can distinguish between
error and end of stream. 0 - Success, 1 - End of stream, -1 error.
Change-Id: Ic35d0c7d7a166e027711a3d2300ecdda25a5d0cc
We don't use declarations from this file. The real declarations
(differently named) are in vp9_rtcd_defs.pl, e.g. vp9_full_search_sad.
Change-Id: I73cbf064305710ba20747233cfdbe67366f069a0
Added command line flags "resize-width" & "resize-height"
to allow the user to specify the frame size to encode at.
These two flags are ignored if the "resize-allowed" switch
is not set to 1.
All frames in the clip are then encoded at this size, which
must be smaller than the raw frame size.
Change-Id: I3d64bd9303d5c0bd678461a866a1ea621700d744
A previous path improved speed 2 quality a little but
more extensive testing showed that it slowed encode
by a few %.
The change will have a similar effect for speed 3 but
should not impact speeds 4+;
This experiment should reverse that and give a speed
up at the cost of a small quality loss.
Borg results pending.
Change-Id: I4493fc1541aaf44587f1a41ff219f7088da9252c
Both values are already checked as command line arguments:
RANGE_CHECK_HI(cfg, g_lag_in_frames, MAX_LAG_BUFFERS);
RANGE_CHECK_HI(extra_cfg, sharpness, 7);
Change-Id: I584798d587152d88dfd517c210054b466f4e5f8a
With a more approriate one vp9_setup_src_planes() as only src buffer
pointers need to be initialized here.
Change-Id: I40fac4d8b2d39eb7d0c65b9b6afab45138a13936
This increases the range of Q values available to
normal inter frames to allow encoder a better chance
to hit the target rate.
Change-Id: I33cd96469a46577fdcea631e26d3355710909e6d
The limits applied under the flag
"LIMIT_QRANGE_FOR_ALTREF_AND_KEY"
behaved in an undesirable way if the gap between
active_worst_quality and active_best_quality was
changed.
In this patch, the adjustment is made using the
vp9_compute_qdelta_by_rate() function and fixed
rate multiplier values. Hence it is not impacted by
the relative value of active_best_quality.
Change-Id: I93b3308e04ade1e4eb5af63edf64f91cd3700249
The cause is because VP9 encoding use vp8_vpxyv12_extendframeborders_neon
on arm which only extend boarder size 32. But VP9's border size is 160
Change-Id: I1ff7e945344a658af862beb1197925e677e8ff57
Problem has been introduced recently with the cleanup patch
I0816ec12ec0a6f21d0f25f10c214b5fd327afc6c
Change-Id: Iaacb956a6039eb5826b82618dc03be32053fb892
This commit changed the initialization of best_mode_index to -1 to make
sure it is not mistakenly used for mode masking.
Change-Id: I75b05db51466070dd23c4ee57a4d4b40764dc019
In mode selection loop, once mode_index pass mode_skip_start, all
modes with a different reference frame from current best mode are
masked out using mode_skip_mask.
However, the setting of mode_skip_mask may use an invalid mode if
there is no mode tested yet. This commit fixes the issue by making
sure a mode has been tested and selected. Otherwise, no mode will be
masked out because of their reference frame.
Change-Id: Ib0009e8a96836a65cf5347440fff8a2e1a67f29f
This patch fixed the uninitialized read errors in Issue 748:
"dr memory VP9 encode errors". In vp9_convolve_avg_sse2,
when width is 4, pavgb reads 8 bytes from dst buffer that is
out of range. An error is reported although the data is not
actually used later. This issue was resolved by preventing
uninitialized reads.
Change-Id: I109a54910aa47139cb13119de86f2062cff207df
The macosx release of clang v5.0 identifies itself as:
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
This version of clang uses the older _mm_broadcastsi128_si256, like
v3.3, as given away in the LLVM svn version above.
Change-Id: I4d6d59d5454efd57d2ae9e75f5eb7486af7cbd0c
Calculate the difference variance between last source frame and
current source frame. The variance is calculated at 16x16 block
level. The variances are compared to several thresholds to decide
final partition sizes.
An adaptive strategy is implemented to decide using
SOURCE_VAR_BASED_PARTITION or FIXED_PARTITION based on motions
in the video. The switching test is done once every
search_type_check_frequency frames.
The selection of source_var_thresh needs to be investigated
further later.
RTC set Borg test showed 0.424% overall psnr gain, and 0.357%
ssim gain. For clips with large enough static area, the
encoding speedup is around 2% to 15%.
Change-Id: Id7d268f1d8cbca7fb8026aa4a53b3c77459dc156
This commit allows the non-RD mode decision flow to select
prediction filter type in NEWMV mode. It provides 8.14% compression
performance gains in both settings of AQ=0 and 3. The current speed
impact is about 5% to 10% slower.
Change-Id: Id66ecebf77abd8f90fb3f6a066c0e8dfb4bf1c42
Adds some high-level hooks for profile 2 before further
progress on the implementation.
According to the definitiion in this patch:
1. Profile 2 only supports 10 or 12 bit color but not 8
2. Profile 2 supports all color sampling modes: 444, 422 and 420,
and alpha plane.
3. Profile 3 is currently undefined.
Please consider the definition carefully and suggest modifications
to the definition as needed.
Change-Id: I5b284fc679e54ac5aee171af72fa7994cfd28995
There was a bug with the decoder that if you started the decoder
with more threads than the first frame had tile columns. Afterwards
tried to decode a frame with more tile columns than the first frame,
the decoder would hang. E.g. run vpxdec --threads=4. The first frame
had two tile columns, then the next key frame had 4 tile columns, the
decoder would hang. If you started with 4 tiles and switched to 2
tiles the decoder would be fine. The issue is that the worker the thread
loop is using is stale.
I added a test vector "vp90-2-14-resize-848x480-1280x720.webm" that
exhibited the bug.
Change-Id: I7bdd47241a52ac0fe1c693a609bc779257e94229
Copy up to a certain bsize, otherwise set to a fixed bsize.
This helsp to reduce artifact near moving boundary caused by full partition
copy without checking motion of super-block.
This artifact can occur at speeds 3,4 in real-time mode.
Issue: https://code.google.com/p/webm/issues/detail?id=738.
Change-Id: I05812521fd38816a467f72eb6a951cae4c227931
ARF overlays now use the same rate correction factor as regular inter
frames, further testing would be needed to see if it makes sense to use
a completely separate rate correction factor for ARF overlays.
$ vpxenc --cpu-used=5 --fps=50/1 --target-bitrate=2000
parkjoy.y4m -o out.webm
=> Before: 3356 kb/s
=> After: 2271 kb/s
Change-Id: I73e4defa615ba7a8a2bdb845864f4b1721cbbffe
This commit estimates the motion vector rate cost right after full
pixel motion search. It combines this and the mode cost and compares
the corresponding rate-distortion cost. If it is already above the
current best one, skip the rest sub-pixel motion search and modeling
process. For pedestrian_area 1080p at 4000 kpbs, the speed -5 runtime
goes down from 39425 ms -> 38399 ms.
Change-Id: If4cd7119fd6c266798d5cf1d19d19ab425e52a26
Tests the basics (first confirms feature is available in vpx_config.h):
- VP8 decode (in IVF file).
- VP9 decode (in WebM file).
- VP8 encode (to IVF and WebM).
- VP9 encode (to IVF and WebM).
- VP9 lossless encode (to IVF, currently disabled due to failure).
- Pipe input (to vpxdec and vpxenc).
Test data path and path to vpx{dec,enc} have been parameterized. In
addition:
- Supports disabling tests (test names prefixed with DISABLED_ are not
run by default).
- Supports filtering tests.
vpxdec.sh: Tests vpxdec.
vpxenc.sh: Tests vpxenc.
tools_common.sh: Common test functions.
Change-Id: I0612c88b8dd6049a05bbbc79a317a0cca61733a5
Reinstates this macro and truns it on in order to avoid issues
due to some frames at the end starving in harder videos.
A more acceptable solution is in the works.
Change-Id: I3c46148e86fa6114e3fed245246fb3686a9e6700
This commit slightly increases the bit allocation for key frame. This
improves speed -5 coding performance by 2.77% with aq-mode=0 and by
2.78% with aq-mode=3.
Change-Id: Iaa3e777f80b9706306606af06e89852bac146659
in some configurations MSVS will define _off_t / off_t in wchar.h; the
former is used locally while the latter is for compatibility. this
change overrides off_t as in the past and sets _OFF_T_DEFINED to prevent
a clash in types.
Change-Id: I9b0e6db586a0a2729b545d93edfc56570d2fcf97
For real-time mode under cbr, this increases the gain (5-10%)
for speed 5 (none/little change for 6), on vc-clips.
Change-Id: I9b38beeb3c820de22c43a0ba53a9456168dd24ba
Turns off the DISABLE_RC_LONG_TERM_MEM macro and makes other changes
in the way the bits are updated, to make 2-pass rate control track
target bitrates closer.
Change-Id: I5f3be4b11c2908e6a9a9a1dd4fcf4e65531c44d8
This commit reduces the frequency of frames using finer quantizer
in non-RD coding flow, and slightly tune up the quantizer resolution
when used. It provides 1.7% compression gains in speed -5 at no speed
difference.
Change-Id: I430249a51260a841a0402666e5ec1566e4f7d5a6
Temporary revert.
Problems with conflicting definitions of type off_t
in MSVC builds that need resolving.
c:\Program Files (x86)\
Microsoft Visual Studio 9.0\VC\include\wchar.h(479) :
"error C2371: 'off_t' : redefinition; different basic types
c:\on2experimental\libvpx\tools_common.h(26) :
see declaration of 'off_t'"
This reverts commit 92a4c59112.
Change-Id: I535e40a18842a92e3e6e0b29e5fba66313010803
The new tolerance is a little higher than before (especially
for kf/gf/arf) so this change gives an encode speed up
for some clips up for speeds 0-2.
Change-Id: I63f7d6c9cc11c7f58742f41e250dcd3eab1741eb
This code/setting was actually not used (since speed features were not set on first frame,
until a recent change) and should be removed.
In CBR mode, the q value for the first frame can be controlled by setting
the target size via the parameters rc_buf_initial_sz (and max_intra_size_pct).
Change-Id: I65afc64972b36c449bd5a8c25800e65da5389066
While encoding a frame, its last frame source can be used to give
acurate motion information. This patch prevents last frame to be
overwritten so that it is available during current frame encoding.
The last source is scaled when it is necessary. cpi->Last_Source
points to the last source frame.
Change-Id: I0e1ef5e9e1d2badf9d0c7a1a44a7ed5b24c09425
Use a crude correction factor to correct for
lower compression efficiency at higher encode
speeds when estimating the max Q for the
clip.
Change-Id: I5ae377647f4adf5e91d700a8791fb3b8f70efc73
This commit optimizes the bit allocation for the non-RD coding flow.
It applies slightly better quantizer to the frames, where all blocks
run a non-RD partition search. Such frames typically have better
rate-distortion trade off, thus improving the reconstruction quality
for next few frames reference at reasonably low increment in rate
cost.
The coding performance for rtc set at speed -5 with error-resilient
tuned on and rate control set as cbr is improved by 19.58%. It improved
the coding speed by about 10% for a portion of local test clips.
Change-Id: I9d56696cd4359dc8136ca10aff10fff05aaa2686
For very large size video image, the scaling calculation may need use
value beyond the range of int. This commit upgrade the value to 64bit
to make sure the calculation do not wrap around INT_MAX.
The change corrected the decoder behavior.
The bug affects only very large resolution video because the scaling
calculation was sufficient for image size smaller than 2^13.
This resolves issue:
https://code.google.com/p/webm/issues/detail?id=750
Change-Id: I2d2ed303ca6482f31f819f3c07d6d3e98ef3adc5
This commit adjusted the speed steps in rt mode to make the steps
more evenly spaced on speed and quality, specifically:
1. Merged 3 and 4 into one single step 3 and removed confilicting
features.
2. Move 8, 7, 6, 5 to be 7, 6, 5, 4 repsectively.
Change-Id: I38d56d61531f3561d772aef953c411c8fb38c063
Root cause is the different default register length between x86
and x64 platform. Change spatial_layer_id to long long.
Change-Id: If1a5972365c7a59f7e76cb4fd714610f3d48a8ff
Small speed gain for speed 1.
Quality is generally a little up for speed 2.
Speed 3 was similar to speed 4 but now positioned more
evenly between 2 and 4 speed and quality wise.
(opsnr +5.6% ssim +8.25% across all sets)
Speed 4 is a little slower than before but sizable quality gains.
(opsnr +3.7% ssim +6.8% across all sets)
The code has been cleaned up a bit so that for each incremental
speed step changes over the previous speed step are applied.
This makes it easier to see what is changing from one setting to
the next.
Change-Id: I2d98d0d6230af23486adaec01908f58942a7cdeb
Allow tx search for ARF and GF helps quality but a little slower.
Setting subpel_iters_per_step to 1 improves encode speed.
Setting sf->mode_skip_start = 10 improves speed.
Initial local results suggest overall impact on quality is neutral
but encode is up to 15% faster.
Change-Id: Ibde02cae6626a44c10a1da0cefe888afbb51f037
Removes a TODO. Changed meaning of some parameters
(target-max-percent refresh and starting index) to be
defined relative to superblock. Also, modify turn-off condition.
Change-Id: I5e55f372b7079c24f9cdac0b06fa34620dbf456b
This commit enables the non-RD mode decision coding flow to
adaptively apply partition search in non-refresh frame, when the
collocated block in previous frame suggests there might be a motion
activity. It refactors the update_state_rt() function to support
buffer swap of mode_info struct, thereby unifying the encoding
stage across various non-RD coding modes.
It provides 5% compression performance gains in speed -6 for rtc
test set, at about 12% speed slow down.
Change-Id: Iefa374aed5a11c4b7ff9a3ed36a98ea8bd184edb
Fix so that vp9_update_segment_aq() will use the correct (i..e, chosen)
encoding mode (from ctx struct) in update_state.
Change-Id: Icc4b66f3935fad5ec4516a4d57e843d12c365e64
This commit allows the recursive non-RD partition search to early
terminate sub search tree when the cumulative rate-distortion is
already above the best available.
Change-Id: Ifdbcbb4bee229f47fde3033200829577c9f1fc1d
This commit added a speed feature to make the logic of calculating
skip_recode on a block level more explicit. This also enable the
feature to be enabled at speed 5 where the previous logic is too
conservative, help gain back the lost speed for --rt(-5).
Change-Id: Ieb37ca3e85c2e7bda343486edf13d5f5395f2233
There were a few conflicts between the new non-RD partition search
and recent clean-up patches, which were not caught by git control.
This commit fixed these issues.
Change-Id: Ieebefbd6c19d81d0d13e3c568877d5cce2ab7797
This commit takes out the if statements on using adaptive motion
search flag. This feature is automatically enabled in non-RD mode
decision flow for variable partition types search.
Change-Id: I5a25cf9109d84d07aa61b3e02c8d32dda1e90cb0
This patch fixed WebRTC Issue 3020: "Uninit error at
vp8_mbpost_proc_down_xmm". The first 8 values in d were not initialized,
but was accessed. This patch fixed c code as well as mmx and sse2 code.
Change-Id: Iaa5b41a4ed3bea971b15fb826ce34b7ab4e36fb1
This commit enables a recursive partition type search for non-RD
mode decisions. It allows the encoder to choose variable block
sizes in a 64x64 block based on rate-distortion modeling.
It improves speed -6 coding efficiency for rtc set by 2.4%. Most
of the gains come from 32-40dB range, where many sequences gain
about 5% to 20%. Local tests suggest there is no speed change.
Change-Id: I06300016e500a21652812b7b3b081db39a783d66
This commit reformats non-RD coding flow layout to allow mode
decision with fixed and variable block sizes.
Change-Id: I2cdd3bb9f26c499ee4a9849004fd925cdd195d09
2 functions were optimized for avx2 by using full 256 bit register
In order to handle 32 elements in parallel instead of only 16 in parallel:
1. vp9_sad32x32x4d
2. vp9_sad64x64x4d
The function level gain is 66% and the user level gain is ~1%.
Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb
Fixed dr memory errors reported in Issue 736:
https://code.google.com/p/webm/issues/detail?id=736
All elements in left_col buffer need to be initialized to ensure
the correctness of SIMD operations in x86 optimized code.
Change-Id: I8e7f26ab45cca8099c1f9342bcf852f828bda7e4
The third pred_mv is stored in x->pred_mv[ref_frame]. This commit make
sure the correct mv is read.
Change-Id: Ibed24daf36703a63f0394c87b2381ee1d2eb7910
In non-rd pick_mode code, added mode skipping according to
thresholds. Used rd thresholds now, but we can modified them
later for real-time case.
RTC set borg test showed a 0.095% PSNR gain. For different rtc
clips, the real-time(speed 7) encoder speedup is 2% - 10%.
Change-Id: Ic72535c96b891092c662453be32d3168f7e34dcc
The flag x->skip_recode interacts badly with
the cpi->sf.use_nonrd_pick_mode and
cpi->sf.skip_encode_sb speed settings.
Restricting the use of the skip_decode flag when
these other speed choices are in use helps quality
for speeds 3 and 4 by a large amount with only a
small impact on speed.
Average improvmentes for 2 pass speed 4:
Derf +8.8%
Yt + 10.53%
Std-Hd +6.95%
yt-hd + 22.95%
Change-Id: I8010876d8012042a11077c92e69d813c3dfa58eb
This commit changed how q is validated in lossless mode. With this
commit, when --lossless=1 is specificed at commandline, --min-q and
--max-q are now ignored. This is to make the option non-ambiguious.
Change-Id: I33e85690460537509d33be75d6a3597be4affc09
One of the tests for real-time mode is failing at speed 6.
Introduced recently, will enable again when fixed.
Change-Id: I8f42de6a3eca226c9aa5c5e1fab98d629993c087
Instead of hardcoding a certain indentation, use the regexp to
provide similar indentation for the new line as well.
Change-Id: Iacb2621b35ce7e1aa3980c1603b8e3ab02d98a35
This is an initial attempt to allow variable block size partition
in non-RD coding flow. It tests 8x8, 16x16 and 32x32 block size per
64x64 block, all using non-RD mode decision and the associated rate
distortion costs from modeling, then selects the best block size to
encode the entire 64x64 block. Such operations are triggered every
other 3 frames. The blocks of intermediate frames will reuse the
collocated block's partition type.
It improves the compression performance by 13.2%. Note that the gains
are not evenly distributed. For many hard clips, the compression
performance is improved by 20% to 28%. Local speed test shows that
it will also increase runtime by 50%, as compared to speed -7. It is
now enabled in speed -6 setting.
Change-Id: Ib4fb8659d21621c9075b3c369ddaa9ecb0a4b204
This makes sure that labels for data symbols directly after
functions get properly 4-byte-aligned (when the source is assembled
in thumb mode).
Previously, if declaring a data symbol directly after a function, the
symbol could end up pointing to the unaligned address (if the total
size of the thumb function didn't end up being a multiple of 4). The
data in the symbol itself ended up aligned, but the symbol pointed to
the preceding unaligned position.
That is, a source file looking like this:
---
...
ENDP
symbol
DCD 0x12345678
---
could end up being assembled into
symbol:
xxxxx2: 0000
xxxxx4: 5678
xxxxx6: 1234
(This doesn't happen if the symbol label is on the same line as the
DCD directive.)
By adding an ALIGN 4 directly after the ENDP we make sure the symbol
itself gets aligned properly.
This isn't an issue with the original, untranslated arm source,
since it only is built in arm mode where all instructions are 4 byte,
and since the gnu assembler automatically adds the padding before the
symbol even in thumb mode.
Change-Id: Iadbeebd656b0197e423e79a12a7d3ef8859cf445
1. Save stats for each spatial layer
2. Add frame buffer management for svc first pass rc
3. Set default spatial layer to 1
4. Flush encoder at the end of stream in test app
This only supports spatial svc.
Change-Id: Ia89cfa87bb6394e6c0405b921d86c426d0a0c9ae
<=sse2 isn't strictly necessary on x86_64, but this is more consistent
with the rest of the flags and should be harmless
Change-Id: Ice0f1d1c4c7510ee90af2a62dbd3d6508db63487
inheritance should be public; also correct placement of ClearSystemState
as the base class doesn't inherit from testing
Change-Id: I0f41330fccc62a70b8dd40d66bbd829b9d98cf84
This reverts commit 89025585cd.
This check breaks BSD builds and isn't useful through the configure
process. The README describes the build environment requirements (GNU
make).
Change-Id: I25f8a9c1640909412ab405dbd09a1c4d93e5a511
The use of uninitialized skip flag will trigger inconsistency in
coding statistics, when alternate RD and non-RD coding modes are
enabled. This commit fixes this issue and removes unnecessary if
statements from update_state_rt.
Change-Id: I7d549dcb0e3ef48b999e5bbc78174ba84502cfcf
The comment made it look like the condition code was dropped from
the extra add instruction, while it actually was handled properly.
Thus, the comment was misleading while the code itself did the right
thing.
Also clarify the comment indicating that we use the full three-operand
form of the add instruction.
Change-Id: I2c1ac6ac4fedf262d104ea30a6c005febc74de9c
Adding a --(enable|disable)-webm-io flag to control WebM container input and
output support. For now, enabling WebM IO by default only when there is a C++
compiler. Doing so because eventually we will move WebM IO to libwebm and it
is built using C++.
Change-Id: I210ac36c23528e382ed41d3c4322291720481492
The block coding skip flags are assigned in the normal RD mode
decision loop. They are then used in the final encoding stage.
In the non-RD mode decision, the forward transform and quantization
stages are replaced by modeling based on SSE and variance of
prediction residues. This commit applies reset to this array in
the non-RD coding mode.
Change-Id: I66584669b035e9c8ac23e95047849ff277472742
This commit moves the position where rdmult is saved to make sure it
is the correct value. Prior, an uninitialized value may be saved and
restored.
This addresses issue:
https://code.google.com/p/webm/issues/detail?id=733
Change-Id: I436407f289169bc63da3c5a6bf609bed16cb71b5
This commit unifies the non-RD partition use cases for both fixed
and variable block sizes. Deprecate and remove the separate function
for fixed partition type only.
Change-Id: I2b6cb945e90c1566f985adcebc4d0757480a8004
This commit allows the non-RD mode decision process to return the
rate and distortion costs associated with the selected mode.
Change-Id: Ibe0f67d323f65839fd9cb0a726c1219bf7b55da9
Brings back most of Jim's previous patch for choosing
partitioning based on variance while making it compatible
with the current state of the code. Also adds a
nonrd_use_partition() function to recursively encode for any
arbitrary sb_type decisions within a 64x64 block; and
includes some refactoring.
Currently, when the VAR_BASED_PARTITIONING mode is turned on
for speed 7, there is a 10+% speed-up observed.
Experiments/improvements with this new partitioning method
will be conducted subsequently.
Change-Id: Ie6f43bfbde30583e941f450bf07c3b48828c9571
This commit adjusts the rate-distortion modeling for non-RD mode
decision. It puts more weights on energy from AC coefficients when
estimating the cost. The coding performance for rtc testset is
improved by 0.72%.
Change-Id: Ifa6ff11311a513ec2b10586589e82a9a21f6c61c
Assign interp_kernel value in MACROBLOCKD. This will be used to
select prediction filter coefficient sets and generate motion
compensated prediction.
Change-Id: I28c8dfb2dae6566f6939bb328aca5875c94bee65
Reimplementing sub8x8-reading of intra block modes in
read_intra_frame_mode_info() and read_intra_block_mode_info(). Code looks
more readable as well.
Change-Id: Ia42fc7d0dad708bc0c7a8bff1f8b37809b843f40
Clean-ups include
a. redundant code in rt -5 speed feature settings
b. code that guarantees square block availability in
rd_auto_partition_range()
Change-Id: Ic7b04d45b6dc15c461e0edbbb4e78aec20348291
The block size used for non-RD mode decision in FIXED_PARTITION
setting was uninitialized. This commit fixes it by setting block
size to be BLOCK_16X16.
Change-Id: Ief04c9f1ab668de69297d9ab3dc15e2fa0bc4e95
Neon code unit test is failing now due to save/restore neon register
operations are not done inside this function, but outside of it. Disable
it now until VP8 neon code get cleaned up.
Bug: 725
Change-Id: Id1ff1ef50a0e894b41c820a310ff8ba31ef12d18
translates to TreatWarningAsError (/WX)
setting this via the CL environment variable is not possible due to the
/WX- default which is used on the command line
Change-Id: I0b42a9d3ca9eba6af82c25b8e434baa2fcb00156
Adds a fast diamond search which is about 5% faster than FAST_HEX
with only a 0.1% drop in psnr when turned on for both speeds 5 and 7.
This search is turned on for speed 7.
Change-Id: I497630aa88a5148926086bb3038e7975e5f4eb98
This commit allows the non-RD mode decision to skip mode RD modelling
check, if the motion vector associated with the current mode is
same as that of NEARESTMV mode. This makes speed -7 about 2% faster.
Previous change that converts cost metric from SAD to model based RD
value makes the codec 6% slower at speed -7.
Change-Id: I30cfec5452f606a671b8432a2f7f0c94fbb49fc8
For some dimensions, neon code ends up in a dead loop inside.
This will fix the unit test failure in svc_test on ARM.
Change-Id: Ie6098bfaefd86bcf3616a3d0c2c3ff0b154222b5
The rate-distortion model in non-RD coding mode is only applied to
luma component. This commit removed a few redundant addition steps.
Change-Id: Id8edc0a47c2dbef8deba43debe2c95db39454de3
This commit replaces SAD cost with modeled rate-distortion cost
for non-RD mode decision. It translates the prediction residual
SSE into estimate rate and reconstruction distorion costs, hence
capturing the quantization setting effect. The compression
performance of speed -7 for rtc set is improved by 14.79%.
Change-Id: Ifda014eb0501d13109fe7f92680bf1410b463632
fixes issue #711
specifying a multiword CC, e.g., CC='gcc -m32', would cause the failure
under dash
reported in
https://bugs.gentoo.org/show_bug.cgi?id=498136
patch by floppymaster at gmail dot com
Change-Id: I2ba246f765646161538622739961ec0f6c2d8c2d
clang on macosx does not support -Wunused-but-set-variable; adding the flag
causes additional warnings about the flag. As a more generalized fix, use
-Werror when checking compiler flag support in order to avoid using
unsupported warning flags.
Change-Id: I2529862e211f880d56491eac3b9fa90fff1aa5c3
clang reports gcc-4.2.1 in e.g., 3.3, 3.4; add a specific clang version
check for _mm256_broadcastsi128_si256
fixes issue #720
Change-Id: I5c8e3c27fdea05d8a5b050e8cb74894b595f4709
prevents out of tree build failures when the source tree has already
been configured; modeled after a similar check in autoconf
Change-Id: I627eb7243576f4d753141dfcb4ed4e34544d03a7
Configuration logging is passed through pr, but nothing configure
does actually requires pr. Use cat instead.
Change-Id: I451217882a329c2bfb8942ac86ac624a7feef670
The only difference between two examples was usage of VPX_EFLAG_FORCE_KF
flag for frame encoding. Moving this functionality into simple_encoder
with additional command line option.
Change-Id: Ia3c4209be073eeb541d4ac6b41bd0f12812f6676
avoid building x86inc.asm, x86_abi_support.asm and vpx_config.asm as
they provide no symbols themselves
fixes:
warning LNK4221: This object file does not define any previously
undefined public symbols, so it will not be used by any link operation
that consumes this library
Change-Id: Iecfe03aa76efbfc07c2af5b91ba5405634e45f1d
Set speed features before running frame encoding. This avoids
redundant RD threshold calculation in key frame coding.
Change-Id: If8e3cf2c02976baa59b310c1c23af9eea0c46e36
* Remove all non-DC intra modes for BLOCK_32x32 and up
* Remove all intra modes for blocks bigger than BLOCK_32x32
* Remove ZEROMV for BLOCK_32x32 and up
* Only consider NEARESTMV for blocks bigger than BLOCK_32x32
Change-Id: Ia18351a238213e2f072f9e481d622949346a245f
+ nestegg_track_codec_data
quiets uint64_t -> size_t warnings
the sizes used are previously validated against their associated LIMIT_*
values
Change-Id: Ie574a3a7496d0143bd58b778145c27f38dd6a4da
- Change type of encrypt_buffer() offset argument to ptrdiff_t, and change the
type of the size argument to size_t.
- Update size argument encrypt_buffer() in vp8_boolcoder_test.c with
same.
Change-Id: Ie29c7c82c73318bee01b89c6fb4c4e1442eef03c
The core motion estimation fucntions all return sad now consistently.
The only exception is vp9_full_pixel_diamond(), however the core diamond
and refining search routines called from vp9_full_pixel_diamond() also
return SAD. If variance of pred error + mv cost is desired it must be
calculated explicitly outside these functions. For very fast encoding,
hopefully this will eliminate some redundant computations.
Also suggests reimplementing FAST_HEX with the vp9_pattern_search
framework. It is not exactly the same as the existing FAST_HEX, but
performance is slightly better and speed is very similar. Enables
removing a lot of duplicate code.
Change-Id: I152736393438c25bdf7e96b37cbb8ce330f4f94a
significantly speeds up file generation.
the goal of this change is to convert rtcd.sh to perl as directly as
possible to allow for simple comparison. future changes can make it more
perl-like.
---
Linux
[CREATE] vpx_scale_rtcd.h
real 0m0.485s -> 0m0.022s
[CREATE] vp8_rtcd.h
real 0m4.619s -> 0m0.060s
[CREATE] vp9_rtcd.h
real 0m10.102s -> 0m0.087s
Windows
[CREATE] vpx_scale_rtcd.h
real 0m8.360s -> 0m0.080s
[CREATE] vp8_rtcd.h
real 1m8.083s -> 0m0.160s
[CREATE] vp9_rtcd.h
real 2m6.489s -> 0m0.233s
Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
* speed improvment of 30 percent achieved
* multiplies and adds remain the same
* non-arithmetic instructions minimized by hand, by:
-expanding 2 pass loop
-removing irrelivant "shuffles"
-combining last two rounding steps
* further improvments may be possible
Change-Id: Idec2c3f52910c48e6a0e0f9aefed5cae31b0b8c0
This patch adds a new speed feature which doesn't do the rather
expensive entropy context lookup or save to the table, while
doing costing.
The speed up on desktop36p.y4m is around 10% other clips much less.
On the RTC test set this was + 1% in overall datarate.
Change-Id: Ia5144bbf45270671e7be9c8e4055369909e2f738
This gets more accurate mode hit stats. It's also the first step to
handling ZEROMV not being allowed more intelligently.
Change-Id: I5de6734507b5177bf73e9ddbad923f218c39f3e4
intra_y_mode_mask is already enforced for the sub8x8 case.
intra_uv_mode_mask is already enforced for all sizes.
Change-Id: Ia9dd14701cb49873c2e8f24eb5f8b255eaf76a1f
The function has evolved over time, now only calls vp9_rtcd(), so this
commit removes the function and changes to call vp9_rtcd() directly.
Change-Id: I8cfa6190daa4b28f6f3d1e11bb3a07f9c95322bf
Optimizing 2 functions to process 32 elements in parallel instead of 16:
1. vp9_sub_pixel_avg_variance64x64
2. vp9_sub_pixel_avg_variance32x32
both of those function were calling vp9_sub_pixel_avg_variance16xh_ssse3
instead of calling that function, it calls vp9_sub_pixel_avg_variance32xh_avx2
that is written in avx2 and process 32 elements in parallel.
This Optimization gave 80% function level gain and 2% user level gain
Change-Id: Iea694654e1b7612dc6ed11e2626208c2179502c8
ne_read_block/ne_find_cue_position_for_track/nestegg_get_cue_point
in the use of ne_map_track_number_to_index
+ add a check to ensure it doesn't exceed the type bounds
fixes:
./third_party/nestegg/src/nestegg.c|1322| warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
Change-Id: I3703d739dcf9a2d4d8e2b704e957e5e3fd80dca0
different_ref_found is always equal to one (if calculated) because
ref_frame[0] != ref_frame[1] for each mi-block.
Change-Id: Ibd7625b7b29dec2fd3c40edbc3de1169abb78585
2. Add read/write for RC stats file
The two pass RC for svc does not work yet. This is just the first
step. We need further development to make it working.
Change-Id: I8ef0e177dff0b5ed3c97a916beea5123717cc6f2
Adds a speed 8 to VP9 where only the nearestmv (0 mv) is searched.
This seems to be about the same speed as vp8 speed 5.
Adds a new speed feature to disable inter modes based on a mask for
each blocksize.
Adds code for having lower complexity motion search methods
in nonrd pick mode function, even though speed 7 still uses DIAMOND
search for now.
Also uses HEX search for speed 6 rather than FAST_HEX which improves
psnr by 0.56% without any noticeable speed drop (tested on gipsmotion).
Change-Id: Ic13176572dbd3aed5884a26786940a4b1bbd8a75
For blocks at frame boundary, the selected block size sometimes needs
to be smaller than that was first given. This commit forces such block
size change only between square blocks, so as to avoid the potential
use case containing 32x16 + 16x8 + 16x8, for 1080p sequences.
Local test suggested no visible coding speed difference. Borg test
reveals no difference in terms of compression performance.
Change-Id: Ie8de87f3c6febc3acf11b4cbfdf2077f9f6def52
This commit checks if the motion vector associated with the current
mode has been computed in previous mode tests. If possible, skip the
redundant reference block generation and SAD calculation in the
non-RD mode decision process.
For test sequence pedestrian_area 1080p, the runtime goes from
24261 ms to 23770 ms. This does not change compression performance.
The speed-up is mostly around places with consistent motion.
Change-Id: I97be63c6a2d07c57be26b3c600fbda3803adddda
previously the scale functions would always be include regardless of the
CONFIG_SPATIAL_RESAMPLING setting.
Change-Id: Ifbccf47b20689b5dd61bb3ddccd5c013297b4e05
Added fast HEX search while doing non-rd partition picking to
speed up the encoder.
Borg test(speed 7) on rtc set showed 1.8% overall PSNR loss.
Encoder speedup was 5% - 15% for different rtc clips.
Change-Id: I9c83026eabc70b69fcc747c90369ec60bfa3ca24
In handle_inter_mode, the reference frames are set in refs buffer.
One can use refs buffer directly to avoid redundant fetch.
Change-Id: I811d408cae52dcd5e053dd4bfe69550eb6a2ff56
Instead of using source variance, this patch uses variance of the
frame difference between the source and the current frame to make
fixed size partition decisions. Also disables adjusting partitioning
if variance based or fixed size partitioning is used.
The latter change improves the speed substantially for speed 6, so
that speed 7 is now less than 3x the speed of speed 6. But speed
6 is 48% better in psnr on the rtc set compared to speed 7.
As compared to speed 5,
speed 6 is -37% in psnr at about 2.5x the speed,
speed 7 is -55% in psnr at about 7x the speed.
Change-Id: If61d80431d3e04ed304ac05832e773cdb2c0a578
Begin() will be called twice with 2-pass encodes, invoking
y4m_input_open which allocates memory; close the old instance first.
Change-Id: Id252a21d286ca9ae998bd87599d43aeb8d7d77aa
FwdTrans8x8HT is disabled as the tests currently fail.
note not all functions have NEON implementations:
- fdct8x8/fht8x8
Change-Id: I028bdec9a21eaaee2c5865470ab179aac403540e
Trans4x4HT is disabled as the tests currently fail.
note not all functions have NEON implementations:
- fdct4x4/fht4x4
Change-Id: I26f8724bf2a9ea01d59205a1c57119ed25d043bc
There was a bug in the previous code that if GOLDEN was better than
LAST neither would be used. LAST would get turned off due to superior
GOLDEN quality then all GOLDEN modes would get skipped.
Change-Id: I173f3720451707dab7b2cbbe8b8e6a047089bde7
Removing all copies of identical vp8_mse2psnr/vp9_mse2psnr functions.
Using vpx_sse_to_psnr() instead in all places.
Change-Id: I15beef9834d43d8fc8a8a7a2d1fc5de3d658fed8
Usage of encode_b_args is unnecessary because encode_block_pass1() doesn't
use them. That's why optimize_init_b() call is also not required.
Change-Id: Ib6cfe4916c2ca85749c90bb0adcba6fea592f9ac
As Yunqing suggested, this commit makes non-RD mode decision always run
sub-pixel motion search in NEWMV mode. The compression performance
gains becomes fairly significant after we enabled sub-pixel accuracy
motion compensated prediction to calculate SAD cost.
For test sequences pedestrian_area at 1080p and vidyo1 at 720p, the
runtime goes slower by 5%. For rtc test set, the compression performance
is improved by 21.20%.
Change-Id: I38cbfdd5c53d79423e1fafb3154f8ddeed63bbf0
The commit change to use partitions sizes directly from last frame
for frames directly where last frame selects partitions sizes based
on coding efficiency.
On --rt --cpu-used=-5, the change hurts compression by 4% but reduces
encoding time by ~20%
Change-Id: Ia68665e5c8489b7bfcf5fac7768332fba88928e6
Modify existing test to also check the case of dropping
(i.e., skip decoding) a consecutive list of frames.
Change-Id: Ia8c1195559f952e86e6697996931d3a920c05ae3
The only difference between two examples was a setting of
g_error_resilient flag in error-resilient example. Moving this
functionality into simple_encoder with additional command line option.
Change-Id: I0245793320125926e1bf208cc1e87aef39ca478d
This commit builds the actual prediction block in sub-pixel accuracy
and uses which to calculate SAD for non-RD mode decision. In the trail
run on pedestrian_area at 1080p, rtc speed -7 runtime goes from
23495 ms -> 25107 ms (7% slower). The compression performance is
improved by 20.57% for rtc test set.
Change-Id: I438589cd103fe99f1b50c2d1939ac6ca43fa0157
Use a set of dedicated variables to buffer the current best mode
in non-RD mode decision. This allows to use mode_info for more
complicated test in the non-RD process.
Change-Id: I6024c9feb0662afd3eb29f7017f7b5a5446f303f
Adds a method for determining a fixed size partition based on
variance of a 64x64 SB. This method is added to rtc speed 6.
Also fixes a bug in rtc_use_partition() and includes some
refactoring related to partitioning search, and some cosmetics.
Currently compared to speed 5, the coding efficiency of speed 6
is -19% and that of speed 7 is -55%, in cbr mode.
Change-Id: I057e04125a8b765906bb7d4bf7a36d1e575de7c6
The optimizer did something funny with the code around
line 1412. Before the call to encode_sb split_dist was
set properly but after it was adjusted and converted to
a negative.
https://code.google.com/p/webm/issues/detail?id=714
Change-Id: I9a7631d5325ade2dc28c1030653a23eecec8721b
Change dx_time data type to int64_t to prevent
test time overflow when decoding long video.
Change-Id: I3dd5e324a246843e07e635fd25c50e71e385ed70
Signed-off-by: James Yu <james.yu@linaro.org>
If sf->disable_split_mask is DISABLE_ALL_SPLIT, disable
sf->adaptive_pred_interp_filter to avoid unnecessary operations.
Change-Id: Icb59174b2f4e9a3c3c16a696deb8018e5bd999eb
Moves the existing speed 6 to speed 7 and adds an
intermediate level 6 which is roughly in between
speeds 6 and 7 in both speed and coding efficiency.
Also includes some minor fixes/adjustments.
Change-Id: I98befc4d82d750e79fe426c457c4a2571f6b6cc7
If show_existing_frame indicates that the decoder should
display an existing (previously decoded) frame, add a
check to make sure that the signaled buffer does contain
a valid decoded frame.
Change-Id: Iac8c686b321827414d69a3f2d0467566911bcba2
for ABSDATA mode, so segment loop filter level always fall in valid
range for both Absolute and delta modes.
Change-Id: If90df3411479533dbdab63f8ae088d2f5dd174a9
The qindex for a segment was not clamped in ABSDATA mode, which may
cause invalid memory access if an ill-formed stream has a negative
value in ABSDATA mode. This commit added clamp to make sure qindex
for a segment always fall into valid range.
Change-Id: I0a74d00f4ef40aec7edaeca1d03c8645e23ab08c
Skip coefficient cost update in non-RD mode decision setting. Allow
periodical mode and motion vector cost update. Currently every other
8 frames. The increment runtime is a constant number. Hence more
visible for CIF resolution, while negligible for 1080p.
Speed -6 compression performance for rtc set is improved by 4.5%.
Change-Id: I27e0ad7c521fcc2af1d825582cbdd1a27ac4c323
This commit makes a refactoring of the rtc_use_partition. It allows
the encoder to take a preferred block size for non-RD mode decision.
The boundary blocks are handled such that smaller block sizes that
fit in the boundary size will be used instread.
In rtc mode, the coding performance of speed -6 for pedestrian_1080p
goes from
158980 b/f, 38.934 dB, 22721 ms to
159008 b/f, 40.064 dB, 23721 ms.
For rtc set, the speed -6 compression performance is improved by
26%. Still about 2dB behind speed -5 at this point.
Change-Id: If0944f0880eaf1ad340bc325d97cea8d0f9dd53f
- Update the vcxproj generator to pass the path to the batch file.
- Update the batch file the take the path to obj_int_extract.exe as arg
2.
Fixes this warning:
warning MSB8012: TargetPath does not match Linker's OutputFile property
value.
Change-Id: I5825f1d1d79f370aeb295bbd2aeb08b22c0e73ab
* Reduce the number of short cirtcuit checks by pre-computing and combining like checks.
* Postpone non-trivial initializations until after the shortcircuits are evaluated.
* Add some consts and const pointers.
No change to the actual results of the call or output of the encoder.
Change-Id: Ie44c4702aec6e08cfe0b8b0ba3cd6b57206478d1
This commit enables the use of DC, vertical, and horizontal intra
prediction mode in rtc non-RD mode decision. When the best cost value
of inter modes is above a given threshold, the encoder runs the
above three intra modes and selects the one that has minimum
prediction residual in terms of SAD.
This together with recent changes on non-RD mode decision and coding
control improves compression performance of speed -6 by
derf 91%
yt 61%
hd 46%
stdhd 52%
In terms of encoding speed, it is about 3 times faster than speed -5.
Change-Id: I6b483bfd0307e6482bb22a6676ae4e25a52b1310
When non-RD coding decision is used in rtc mode, the alt reference
is not used for inter frame prediction. This commit disabled alt ref
option whenever speed -6 is used.
Change-Id: I0b33ca03661de1db2d9bef1bcbff848cd4c9396f
Set TargetName for library builds instead of changing the value of
OutputFile.
This fixes the following warnings:
warning MSB8012: TargetPath does not match Library's OutputFile property
value.
Change-Id: I4320b6d9ea922d3a15b9823c7c6694ee33edbf45
Current setting was specific to 1 layer case.
rc_target_bitrate is total bitrate for whole stream,
so set it to ts_target_bitrate for highest/top temporal layer.
Change-Id: I83de73364956fa21c0a7c971c9f390d4840457e6
Use unsigned int instead of uint64_t for duration and deadline
arguments to functions get_frame_stats() and encode_frame().
Change-Id: I1f26a7afc38ae89916b2c67415ced26fdc9d53e7
In the first coding run of a 64x64 block, check the coding mode
for each 8x8 block. Will need a second annealing stage to decide
the partition size to be encoded.
Change-Id: Ida9417805ff3358979b0c0429d4099c023c88866
In good quality mode motion search, the best matches are normally
found after searching in a large area. In real time mode, to make
encoding fast, a center-biased fast HEX search is used, which
converges quickly most of the time. A 4-point diamond search is
also carried out as the following refining search, which gives more
precise results, and maintains good motion search quality.
At speed 5, the borg test on rtc set showed an overall PSNR loss of
0.936%. The encoding speed gain is 4% - 5%.
Change-Id: I42cd68bb56a09ca1b86293c99d5f7312225ca7ae
Run sub-pixel motion search when NEWMV gives lower rate-distortion
cost. This improves coding performance of derf set by 8%, std-hd by
2.2%.
Change-Id: Ife50f7fda8463927784fe59a41cc439c833e941a
- Rename and make static
s/vp9_compute_qdelta_by_rate/compute_qdelta_by_rate/
- Make base_q_index an integer.
- Add a cast.
Change-Id: Iea8d1397fd2717e7373b182ec51f5db960ef2cca
If MINGW_HAS_SECURE_API is defined, we don't need to declare strtok_s, but we still need strtok_r define.
Change-Id: I7cf781bb58f991a2bdce6a2ccf5082f6924579a3
these were incorrectly stripped in:
50fa585 Removing examples code generation and making them static.
Change-Id: Idb475ad5b303634311e9f616604312cb925cc6a9
Optimizing 2 functions to process 32 elements in parallel instead of 16:
1. vp9_sub_pixel_variance64x64
2. vp9_sub_pixel_variance32x32
both of those function were calling vp9_sub_pixel_variance16xh_ssse3
instead of calling that function, it calls vp9_sub_pixel_variance32xh_avx2
that is written in avx2 and process 32 elements in parallel.
This Optimization gave 70% function level gain and 2% user level gain
Change-Id: I4f5cb386b346ff6c878a094e1c3b37e418e50bde
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
my optimization include:
-processing 2x8 elements in one 128 bit register instead of processing
8 elements in one 128 bit register.
-removing unecessary loads.
This optimization gives between 2.4% user level gain for 480p input
and 1.6% user level gain for 720p.
This Optimization is done only for 64 bit
Change-Id: Ic07fce2f9360329b4f2d956efda1480ae958766b
the remainder of the documentation will not be included in the output
unless the file itself is documented
Change-Id: I5a83a6c41cdfbf2976da288e4b70bd04002725f2
Only use layered average size if number_temporal_layers > 1.
Also removed unneeded commented-out line, and change some parameter
setting in vpx_temporal_scalable_patterns.c
Change-Id: Ic86e43e7daf0313e8c5a4aba1497299158111955
Added support for external frame buffers to libvpx's VP9 decoder.
If the external frame buffer functions are set then libvpx will
call the get function whenever it needs a new frame buffer to
decode a frame into. And it will call the release function
whenever there are no more references to that buffer.
Change-Id: Id2934d005f606af6e052fb6db0d5b7c02f567522
Prior to this commit, both encoder and decoder reset mode/mv info from
previous frame in error resilient mode to ensure bitstreams are able to
decode when there is loss of frame in decoder side. However, this is
not necessary. This commit changed to remove the reset, so encoder can
continue to use mode/mv/partition information from previously encoded
frame without affecting decodeablilty under loss of frame.
Change-Id: I0279f862900dc647fb471ae3389770bb1b9f454f
Silence signed/unsigned mismatch warnings by adding casts where
ts_number_layers does not match the sign of the variable to which
it is being compared.
Change-Id: Iab25e18c877d158b2b2b417de7da94669648b2fa
In rtc coding mode, the encoder is running non-RD mode decision. It
does not need dual buffer swap as was the case in the RD mode. This
commit initializes the internal buffer pointers outside the block
coding loop for rtc mode.
Change-Id: Ie076705c60d6b7919217e3f1dfd49e7db5064ac2
The functionalities of set_offsets() are subsumed in later
set_partitioning() and rtc_use_partition() functions, hence removed.
Change-Id: Ie514b13cb66c2379f13d0be9b1da4c12ca4581e5
Flipping arf on and off too often is hurting some clips.
This change makes no difference for 50-75% of our test
clips but helps some by a big margin. (eg. std-hd crew
by 6% and one of the YT and YT-hd clips by 14%)
Average improvements for 2 pass, speed 2 (psnr,ssim)
are as follows:-
derf 0.165%, 0.210%
yt 1.210%, 1.464%
yt-hd 1.189%, 1.471%
std-hd 1.031%, 0.886%
Change-Id: I121fe66cfb4a62d384b23b484a7d648789641969
Two convolve functions were optimized for AVX2:
1. vp9_filter_block1d16_h8
2. vp9_filter_block1d16_v8
vp9_filter_block1d16_v8 was optimized for AVX2 by reducing the number of
loop strides by half, two strides were processed in parallel.
vp9_filter_block1d16_v8 was also optimized in the same way also some of the
loads were being done outside of the loop and by that preventing redundant
loads.
This Optimization gives 43% function level gain and 1.3% user level gain.
Now can be compiled in Windows
Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
Use size_t for DecodeFrame()'s size arg, and cast only
at the vpx_codec_decode() call site. This silences warnings that
appear in svc_test.cc when building with vs2013.
Change-Id: I2cf39f02a45732c752097f07b0c7ad414b1517d8
This function initializes the predictor buffer pointers and
calculates reference motion vectors. It is only needed in the settings
of inter frame coding. Hence removing it from the key frame coding
branch in rtc_use_partition.
Change-Id: Ic4e16c7467a5f32be4e0bf619ef9d57afb4a7075
Turns on AVX when the final characters of .c and .cc file names preceding the
.c and .cc file extension contain the substrings avx or avx2. This silences
many MSVC warnings issued during compilation files that use AVX.
Change-Id: I82bda394af7a688679abab2a50dd7e10b3cb0c7a
This function is deprecated after the re-design of partition search
that runs big block size, then four-way split, followed by
rectangular block sizes. This commit removes the related functions.
Change-Id: I417549c8e0fa3cf35bd29816b805dd4e7c3660c6
The function rd_pick_reference_frame can be deprecated. Its use was
subsumed by the adaptive motion search control.
Change-Id: Icb0c2fa335f0f06fa7b79a71f972d9fa54d750db
Removes certain cases of feedback of active_worst_quality,
and removes it from the RATE_CONTROL structure. Now active
worst quality is expected to be computed locally in the
q picking function during the encode.
Making temporal filter strength depend on avg_frame_qindex
rather than on active_worst_quality actually improves
performance esp. for yt.
derf: +0.038%
yt: +0.359%
Change-Id: I1fe5a343034b55af9322289165321f00ac0827b1
In real time encoding, we enable encode_breakout to make encoding
fast. A speed feature "use_encode_breakout" is defined to set
encode_breakout thresholds for different speeds.
However, currently, static_thresh is an encoder option. The encode_
breakout can be turned off if user sets static_thresh=0 specifically.
The rtc set borg test result: (need to set --static_thresh=1)
speed -5, psnr loss -3.543%;
speed -4, psnr loss -2.358%;
speed -3, psnr loss -0.771%.
Encoding speed test:
speed -5, 11% - 60% speedup;
speed -4, 5.5% - 28% speedup;
speed -3, 0.8% - 7% speedup.
Change-Id: Icde592ffbe77eac7446f872a2e9eb2051733677b
Aq 1 only updates segment map on kf and arf and
only uses 3 segments. With these settings AQ1 is
+ for most clips in SSIM but negative in psnr.
However, the penalty in PSNR is much less than
previously.
Old version aq1 average results for std hd
-20.899% psnr, -5.809% SSIM
New version aq1 for std hd
-3.57% psnr, +1.23% SSIM
Aq2 Now uses only 2 segments and rd.
This mode is still slightly negative for most clips on
psnr and SSIM but seems to have a much bigger visual
impact on several problem clips than aq mode 1.
Old results for std hd:
-2.578% psnr, -1.151% SSIM
New results for std hd:
-1.561% psnr, -0.85% SSIM
Change-Id: I94f57f8a73121629ce598fb921aad761c1450e1c
Some parameter changes and fixes on one-pass rate control.
derfraw300 is now only 10% below 2-pass speed 0 rate control.
Change-Id: I1940eef8a5a035dc18e71b880d5e00cabd1f01b9
This CL changes libvpx to call a function when a frame buffer
is needed for decode. Libvpx will call a release callback when
no other frames reference the frame buffer. This CL adds a
default implementation of the frame buffer callbacks. Currently
only VP9 is supported. A future CL will add support for
applications to supply their own frame buffer callbacks.
Change-Id: I1405a320118f1cdd95f80c670d52b085a62cb10d
Function encode_rtc_frame_internal() and encode_frame_internal() only
differed by a couple of speed features, this commit relocation those
difference into the setup of speed features and merged two functions
into one to remove duplication.
It also fixed a subtle bug super_fast_rtc was used before it was
initialized.
Change-Id: I234a5a1d11a4450930e5b4943dbab434208d5030
-Properly set the average frame size for each layer.
-Allow each layer to update its average/last Q stats after encoding.
-Initialize for some layer context variables.
Change-Id: Iaa37d144fcf4f30ff4283a4e8db8b9ca8bf4c815
A bug was reported in Issue 702: "SIGILL (Illegal instruction) when
transcoding with vp9 - using FFmpeg". It was reproduced and fixed.
Change-Id: Ie32c149a89af02856084aeaf289e848a905c7700
Bitwise OR operation doesn't guarantee any subexpression evaluation order.
Just reading one bit now and ignoring the next one. For reference look at
vp9_decode_frame() implementation.
Change-Id: I4971686929838ae5ded8f43a38a2934db5e1d462
Fixes some of the parameters for 1-pass non-cbr mode.
Also includes some cleanups, inlcuding refactoring of the
recode_loop options.
Results on derfraw300 improve by about 5-6%, so that the one-pass
mode is now 13% below the 2-pass mode in speed 0.
Change-Id: I844cc2638694c7574f3be00d41d60b23dc1016f0
this ensures both are properly initialized when calling _dealloc().
+ check the arrays before access
Change-Id: I789af39b41c271b5cb3c029526581b4d9903b895
This patch adds a buffer-based rate control for temporal layers,
under CBR mode.
Added vpx_temporal_scalable_patters.c encoder for testing temporal
layers, for both vp9 and vp8 (replaces the old vp8_scalable_patterns).
Updated datarate unittest with tests for temporal layer rate-targeting.
Change-Id: I8900a854288b9354d9c697cfeb0243a9fd6790b1
Update the local makefile to build all the files and the test
application by default to simplify build verification.
Change-Id: Ic10141ea14c85110ff7507447d16297b77d296e9
Right now only IVF format is supported which is enough for example code.
Other formats like y4m, webm, raw yuv will be supported later.
Change-Id: I34c6f20731c1851947587ca5c589d7856b675164
since
50fa585 Removing examples code generation and making them static.
the examples have been c files, not generated from text. this removes
GEN_EXAMPLES and replaces it with EXAMPLES, building the source directly
rather than copying it to the build folder
Change-Id: I5445bc49553419e3d2430963517d2c18cdba1f82
Inlcudes a number cleanups:
1. Moves the one-pass pre-encode parameter setting functions
to vp9_ratectrl.c
2. Deprecates per_frame_bandwidth in RATE_CONTROL structure
3. Removes target_bandwidth in cpi structure since it is not used.
4. Various renaming of functions
There is no bit-stream change in 2-pass, one-pass cbr and one-pass
vbr modes.
Change-Id: Ifd9916bf4d485b7d04c5f52044ffe6703254ccbd
This isn't strictly necessary, but makes the file more consistent
with the other arm assembly source files.
Change-Id: I245c9677d89e0ab3f31991e473764858af35b180
Avoid substitution of substrings by using \b to make sure the
substituted strings are at word boundaries.
This is an adaption of the corresponding changes to ads2gas.pl
from 7ebcaeb0fa.
Change-Id: I52160e8ba0373d4779d5fc3b0c384ca5c51c7b13
remove example files that have been tracked since:
50fa585 Removing examples code generation and making them static.
Change-Id: I9dd2e1588003918286d455c5e58a43393b176a84
older versions of visual studio did not include the trailing \. this
moves the objects to their intended location: the project subdirectory
Change-Id: I244479cdebf6b3f03bed6dbfca82e7fb4542f0de
CONFIG_USE_X86INC is available to every makefile, there's no need to
duplicate its value with USE_X86INC
Change-Id: Id12bd5f09cba78abba56ab5a8f56351562e5b8b6
avoid wrapping msvc includes with extern "C"; this breaks some visual
studio builds of the (c++) tests.
Change-Id: Ie8062d55d4f4c049f6cd360a36da6a67607df132
git diff adds the following line to diffs:
\ No newline at end of file
which interferes with diff.py parsing. diff.py only looks for '+', '-'
and ' ' at the beginning of the line.
Issue seen on https://gerrit.chromium.org/gerrit/68611
Change-Id: I0d7b4485c470e0b409f2c9cddde6c9aceba0152e
Fixes rate control partially in one-pass non-cbr case to achieve a
bitrate close to the one desired. Previous version was way off at
the high bitrate end.
Also includes several one-pass rate control cleanups and refactoring.
On derfraw300, one-pass encoding is now 19% off from two-pass speed
0 encoding, down from 35%.
Change-Id: I6f0dcdb7f8aa85a7e7cd3a3155d4f9d2a4d2f4f4
This patch added ssse3 optimization of bilinear sub-pixel filters.
The real time encoder was speeded up by ~1%.
Change-Id: Ie82e98976f411183cb8c61ab8d2ba0276e55a338
Moved a few features with low impact on compression form -5 to -4 and
increased adaptive_rd_thresh for -5.
Change-Id: Ib1b748168cc6ed7684ae4818499f3a536ae76253
The new implementation disagrees when the argument is equal to 2**n but
that is never called in practice and based on how it is used the new
implementation is correct in that case.
Change-Id: Ifbac4ad87d459fe6bd2fd0f400c0340f96617342
This avoids calls to get_unsigned_bits() with constants and
replaces hard to trace loops with simpler structures.
Change-Id: Ic1afc5a17d7df5bcfc85b76efda316b0bf118467
Using bilinear filters could speed up the codec in real-time mode.
This patch added sse2 optimizations of bilinear filters that
operate on different-sized blocks.
Tests showed that the real-time encoder was speeded up by 3%.
Change-Id: If99a7ee4385fcc225c3ee7445d962d5752e57c3f
This patch adds a buffer-based rate control for temporal layers,
under CBR mode.
Added vpx_temporal_scalable_patters.c encoder for testing temporal
layers, for both vp9 and vp8 (replaces the old vp8_scalable_patterns).
Updated datarate unittest with tests for temporal layer rate-targeting.
Change-Id: I9cb6cce2494390ae6096ee17774af7fb9308bde7
the public typedef already includes a const, quiets
'same type qualifier used more than once' warnings
Change-Id: Ib118b3b116fba59d4c6ead84d85b26e5d3ed363d
As pointed out by Dmitry and James, "partial" is a Microsoft-
specific c++ keyword, and it is renamed.
Change-Id: Ia0fc11ceb89e54b3195287f89f7e26edbbe9beb8
This commit added a logic to prevent the inter_filter type from being
changed if the default interp_filter mode is not switchable. Also, it
sets the default interp_filter to BILINEAR at very and super fast rtc
encoding modes
Change-Id: Ic41e6d31de29795a4ce536ec79afb01cab6daad3
--rt --cpu-used=-5 uses the progressive rtc mode
--rt --cpu-used=-6 uses the new super fast rtc mode
Change-Id: Id6469ca996100cdf794a0e42d76430161f22f976
Implemented parallel loopfiltering, which uses existing tile-
decoding threads. Each thread works on one row, and when that row
is loopfiltered, it moves to next unattended row. To ensure the
correct filtering order, threads are synchronized and one
superblock is filtered only if the superblocks it depends on are
filtered already.
To reduce synchronization overhead and speed up the decoder, we use
nsync > 1 for high resolution.
Performance tests:
1. on desktop:
8-tile 4k video using 8 threads, speedup: 70% - 80%
4-tile HD video using 4 threads, speedup: ~35%
2. on mobile device(Nexus 7):
4-tile 1080p video using 4 threads, speedup: 18% - 25%
4-tile 1080p video using 2 threads, speedup: 10% - 15%
Change-Id: If54b4a11960dd706c22d5ad145ad94156031f36a
* Avoid unnecessary type erasure
* Prune unused/duplicate fields from struct rdcost_block_args
* Make struct rdcost_block_args a local
Change-Id: I4f1fd4837ccd028bbfe727191ee8d69f0463b7e5
When showing a previously decoded frame, i.e. when
show_existing_frame=1, the update of the
last_show_frame flag must be disabled.
This is to ensure that the last_show_frame flag
reflects the state of the flag for the immediately
previously decoded frame rather then the value that
was forced to ensure that a previously decoded frame
would be displayed.
This patch also adds a test vector to verify that the
display_existing_frame flag works as expected. Code
for generating the test vector can be found in this
patch:
https://gerrit.chromium.org/gerrit/#/c/68581/
(Bug originally reported by Alexander Voronov
<ru.xalba@gmail.com>).
Change-Id: I731d288fba02088959f7fcc87707137fffc6acf5
Added a constant to represent the minimum KF boost
rather than using the magic number 2000 in the code.
Change-Id: I9428b61f47d26312caff81c6f9ae8587df004791
Some changes in 1ca1186 were mistakenly reverted by a later merge,
this commit re-instated the chanages from values to enums.
Change-Id: Ia6b01c31da584a1f612996e6432612c1295b9eaf
obj_int_extract.bat
this project and target still need some work to allow for concurrent
builds to succeed from the command line.
Change-Id: Ieb3bddc54636e77519083c48573909616257eb23
In this new mode, the size range is strictly determined by the min
and max partition size in neighborhood blocks.
Niklas720 encoding time at cpu-used -5 goes from 56250ms to 50676ms,
a 10% reduction.
Change-Id: I316b0e2ac967ff3fad57b28d69c0ec80b7d8b34e
Includes a few fixes and clean-ups that adds the ability
to use alt-ref frames in one-pass mode.
Whether alt-refs are actually used or not is controlled by a
macro USE_ALTREF_FOR_ONE_PASS in vp9_firstpass.c.
This first cut seems to improve derf by 15+% in 1-pass mode.
But further experiments with parameters are underway.
Change-Id: I78254421435478003367c788c7930d2dc4ee2816
This patch only works if the video is a width and height that are both
a multiple of 32.. It sets every partition to 16x16, and does INTRADC
only on the first frame and ZEROMV on every other frame. It always does
does the largest possible transform, and loop filter level is set to 4.
Was ~20% faster than speed -5 of vp8
Now 20% slower but adds motion search ( every block ), nearest, near
and zeromv
The SVC test was changed because - while this realtime mode produces
bad quality albeit quickly, it isn't obeying all the rules it should
about which frames are available.
Change-Id: I235c0b22573957986d41497dfb84568ec1dec8c7
Adds a stand-alone resize_util app for testing. The app
will not be built in the shared library configurations
so as not to require the APIs to be exposed.
Change-Id: I4718c8bff1abf4e57c2ab2d84be8738fc0048200
Adds multiple filters in the 0.5-1.0 range in the last stage
of the resize functions to prevent over-smoothing/aliasing
Change-Id: I1a615adb16f0df5095790945c94b28b4d6a6fc48
SSE for a 64x64 block with 3 planes can go as high as 3*2^28. So left
shift by 4 may overflow 32 bit int.
Change-Id: I63c84aa56894788bb987299badabbd7cc6fd0be6
The sum of squared mv components can go beyond int range for large
input resolution. This commit changed the type to int64 to avoid
overflow.
Change-Id: Ib21ea2817845cea1435f893064e6417c79c5bc64
New API is supposed to be used from example code. Current implementation
only supports IVF containers (will be extended to Y4M).
Change-Id: Ib7da87237690b1a28297bdf03bc41c6836a84b7e
Adds an arbitrary-size resize library for use in scaling of input
frames in a non-normative manner in the vp9 encoder. The method
used is as follows:
Downsampling - Uses a 8 tap filter for factor of 2 decimation upto
a size just higher than the desired size. Then interpolates pixels
at a precision of 1/32 pel using a set of 8-tap filters.
Upsampling - Interpolates pixels at a precision of 1/32 pel using
a set of 8-tap filters.
There is no assembly optimization yet.
Change-Id: Ib5b81e174fc139da322bb97c8214d52289d60d8a
Encoder's boarder is still 160, while decoder's boarder will be 32.
With on demand and separate boarder buffer for boarder extension.
The decoder's boarder does not need to to 160 anymore.
Change-Id: I93d5aaff15a33a2213e9761eaa37c5f2870747db
This commit explicitly enforces the effective motion vector range
in the motion search stage. The range needs to be the intersection
of UMV border, effective absolute motion vector value range, and
the target search area.
Change-Id: I1cf7c563e02b1086040dad6c1f4f6be1538635a6
When showing a previously decoded frame, we need to
explicitly set the show_frame flag.
For the current frame being decoded this flag is
explicitly set in the frame header.
This should fix WebM Issue 696:
http://code.google.com/p/webm/issues/detail?id=696
Change-Id: I5751a809813f88d2ca6f62c47c3878475ff9ba8d
The affect on quality was minimal. Less than .1%, various sets
yt ( +.15%), derf (-.1%), hd ( -.1% ), std hd(-.15%)...
The affect on speed of encode at speed -5 was substantial ( ~3% ).
Change-Id: I8903346fbae0c35f5b9ea20f81fdd239ae81247d
This commit deprecates the use of best_mv from encoding and bit-stream
writing stages. It hence removes the definition from MACROBLOCKD.
Change-Id: I8e5302775a2aa4a18900726df407bff881f2dfb1
<<"Error: Error Block Test, C output doesn't match SSE2 output. "
<<"First failed at test case "<<first_failure;
}
usingstd::tr1::make_tuple;
#if HAVE_SSE2
INSTANTIATE_TEST_CASE_P(
SSE2_C_COMPARE,ErrorBlockTest,
::testing::Values(
make_tuple(&vp9_highbd_block_error_sse2,
&vp9_highbd_block_error_c,VPX_BITS_10),
make_tuple(&vp9_highbd_block_error_sse2,
&vp9_highbd_block_error_c,VPX_BITS_12),
make_tuple(&vp9_highbd_block_error_sse2,
&vp9_highbd_block_error_c,VPX_BITS_8)));
#endif // HAVE_SSE2
#endif // CONFIG_VP9_HIGHBITDEPTH
}// namespace
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.