* changes:
Merge rd_stats only when it is valid
Let parentheses in handle_inter_mode be symmetric
Add RD_STATS into MB_MODE_INFO
Add txb_coeff_cost_map
Since PVQ's max block size equals to the max transform size,
daala's definition of OD_BSIZE_MAX was changed from 5 down to 4 to
use AV1's max trasform size 32x32. However, dering also uses
OD_BSIZE_MAX and assumes its value is 5, which caused dering
not working.
Change-Id: I9d82bb24adc7d57552a8e0a8a7e798e77d96fd4b
This commit is a manual cherry-pick from aom/master:
45592a39d3b00aee4d6bd70da669400017b7a5d8
Only part of the changes apply in nextgenv2
Change-Id: I1e22514c6fe5af556710254278f2f8a5805db999
- Add minimal compiler flag testing.
- Generate aom_config.c and aom_config.h. Note: hard coded
to generic-gnu values for now.
- Still a work in progress. This will not build anything.
BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=76
Change-Id: Id65b42ea9f4c4f744d788660e2de7234886ce039
The transform functions have been refactored in nextgenv2, this commit
resolves the calls in pvq patch to use this new scheme.
Change-Id: I1b56e75106a3357bb19bd7df2b4ba305eb9ed185
Previously, do_cubic_filter would return results with the
wrong precision if the sample point was exactly aligned to
a pixel.
Change-Id: I40139f9a6701a8e72e691f37bb352f7814a7f306
PVQ replaces the scalar quantizer and coefficient coding with a new
design originally developed in Daala. It currently depends on the
Daala entropy coder although it could be adapted to work with another
entropy coder if needed:
./configure --enable-experimental --enable-daala_ec --enable-pvq
The version of PVQ in this commit is adapted from the following
revision of Daala:
fb51c1ade6
More information about PVQ:
- https://people.xiph.org/~jm/daala/pvq_demo/
- https://jmvalin.ca/papers/spie_pvq.pdf
The following files are copied as-is from Daala with minimal
adaptations, therefore we disable clang-format on those files
to make it easier to synchronize the AV1 and Daala codebases in the future:
av1/common/generic_code.c
av1/common/generic_code.h
av1/common/laplace_tables.c
av1/common/partition.c
av1/common/partition.h
av1/common/pvq.c
av1/common/pvq.h
av1/common/state.c
av1/common/state.h
av1/common/zigzag.h
av1/common/zigzag16.c
av1/common/zigzag32.c
av1/common/zigzag4.c
av1/common/zigzag64.c
av1/common/zigzag8.c
av1/decoder/decint.h
av1/decoder/generic_decoder.c
av1/decoder/laplace_decoder.c
av1/decoder/pvq_decoder.c
av1/decoder/pvq_decoder.h
av1/encoder/daala_compat_enc.c
av1/encoder/encint.h
av1/encoder/generic_encoder.c
av1/encoder/laplace_encoder.c
av1/encoder/pvq_encoder.c
av1/encoder/pvq_encoder.h
Known issues:
- Lossless mode is not supported, '--lossless=1' will give the same result as
'--end-usage=q --cq-level=1'.
- High bit depth is not supported by PVQ.
Change-Id: I1ae0d6517b87f4c1ccea944b2e12dc906979f25e
With RD_STATS in MB_MODE_INFO, we will be able to compare the results
from rate-distortion loop and the results from bitstream packing.
Change-Id: If1dba7d87126577a6f369ac087d4517f7cebb0c5
The txb_coeff_cost_map is a 16x16 map which records each single
transform block's cost from the transform block's location in 4-pixel
unit in recursive transform experiment.
Change-Id: I6da97880c457680594bca56617084010891beaa2
When CONFIG_OS_SUPPORT is not enabled the aom_timer timer function
stubs cause unused parameter warnings. This comments out the arg
names and silences the warning.
Change-Id: I97bdbcbebdf081ac5cb2ffd86439028a1e672fa2
av1/encoder/rdopt.c:9533 ‘zeromv[1].as_int’ may be used
uninitialized in this function [-Wmaybe-uninitialized]
this was spurious given the logic in the if
Change-Id: I8ddfe7e46d1bf5593cc8624f05c9f181243a87d4
Fix the bool coder test not to use a probability of 100%.
Change-Id: I799871cb0c48580edf0ee15a6c9931d27591ec99
(cherry picked from commit 9b79f6a3d6ea398e5d51d3d1dd69cbfb1725370e)
* changes:
Add av1_ prefix on ###_rd_stats functions
Use init_rd_stats() in encodeframe.c
Add transform block coefficient cost in RD_STATS for debugging
Add helper functions to modify RD_STATS
Add mi_row and mi_col into mbmi to facilitate rd_debug process
Add token cost comparison in write_modes_b()
Those functions includes
init_rd_stats()
invalid_rd_stats()
merge_rd_stats()
This CL help simplify the code.
Change-Id: Id1704d883bd21a039b0478a940994ca14184ae1c
This is just partial implementation
Compare token cost of pack_mb_tokens/pack_txb_tokens with token cost
from rate-distortion loop. If there is any difference, dump out mode
info.
Change-Id: I46b373ee2522c5047f799f36baf7cec5fbc06f06
This commit replaces the offset based block index calculation with
incremental based one. It does not change the coding statistics.
Change-Id: I3789294eb45416bd0823e773ec30f05ed41ba0dc
rd_debug is a debug tool aim at finding discrepancy between rate-distortion
loop and bitstream packing.
Change-Id: I751c4121516c5e6368668229c77778880a9dcb9d
* changes:
Reformatting the deringing code
Introducing OD_DERING_SIZE_LOG2 constant (3)
Renaming deringing blockwise write-back functions to make code clearer
Deringing refactoring: replace last_sbc with simpler dering_left flag
Getting rid of the od_dering_in type
This commit refactors the recursive transform block partition
search process to make it support rectangular transform block size
coding.
Change-Id: I0207ae40d83c7eae3cb5d460e403f470747590d3
Allow the transform size writing, reading, and the reconstruction
process to support rectangular transform block size coding.
Change-Id: I57393c73ec60835a088d785ca838d7e3d7eb29a4
Using a struct named dlist rather than an array named bskip. Simplified some
code.
No change in output
Change-Id: Id40d40b19b5d8f2ebafe347590fa1bb8cb80e6e1
The OD_DERING_VERY_LARGE values are now explicitly copied to the buffer instead
of being read from the line buffer when we're on the edge of the frame. This
will make it possible to make the line buffer 8-bit for non-high-bitdepth.
No change in output
Change-Id: I1a4134d67ac7f8c239f08d73941405c56f01050b
Now only buffering three lines across the entire frame and four lines
over the height of one superblock.
No change in output.
Change-Id: I6b99399974e197dc02f2e4ff2e60cdd7fdaa2e43
This introduces a line buffer that hold the last three lines of each original
row so that the next row can be deringed with the original input of the upper
row.
No change in output
Change-Id: I8fad3bc48745e9ce3e440289f453477a0c5442c0
This commit makes the encoding process of the recursive transform
block partition support both rectangular and square transform block
sizes as the starting point. If the coding block size is rectangular,
it would allow the transform block size to start from the largest
rectangular transform size, and recursive parse to the selected
coding sizes.
Change-Id: I576628b9166565bada6a918f0a1e67849dfef4cd
The function module in inter_predictor() has been changed to
universally support arbitrary block size inter prediction. Hence
sub8x8mc can be a standalone experiment now.
Change-Id: Ie9d87f61fc317b1d114edb4e0bf5544f918ed08e
The rectangular transform syntax is by default supported, hence
no need to put it under the experimental flag. This does not change
the coding statistics.
Change-Id: I3a147503d973a03400f8a86e11f07c7d754e6234
* changes:
Avoid the "initial copy" in the deringing filter
Only copy the deringed blocks back into the buffer
Reducing copies in deringing filter
sb_all_skip_out() now computes a list of deringed blocks
compute bskip as we go
Revert "Fix dering filter when using 4:2:2 or 4:4:0 subsampling"
This prepares the integration of rectangular transform block size
with recursive transform block partition system.
Change-Id: Id96aa3790dace15619c665f438241938992d1730
Upsampled references currently increase the size of references by
64 times. This patch limits the memory used by the encoder to
about 3GB when encoding high bit depth content.
This should be re-evaluated in the future, if doing 8-tap
resampling in the motion search becomes reasonably fast, or if
the upsampled references are reduced in size (by omitting some
subpel positions and interpolating them instead).
Change-Id: I6d84ff0d6202ec46f4fa53e268e68aa808e5df85
Reverted commit: f8306bfdc (with some changes).
Reason: This was triggering an assert in debug build because of zero
probability values. So, using an "UNUSED_PROB" macro to replace these to
retain clarity.
Assertion failure can be reproduced as follows:
$ make clean; extra_cflags='-O0 -g -fno-inline' ../../configure
--enable-debug --enable-experimental --enable-palette && make -j 16
$ ./aomenc -D --codec=av1 ~/videos/screen_content_set/gimp.y4m -o
/tmp/foo.webm --tune-content=screen --limit=50
Pass 1/2 frame 50/51 8976B 1436b/f 86169b/s 2902620 us
(17.23 fps)
Pass 2/2 frame 25/0 0B 2933053 us 8.52 fps [ETA unknown]
aomenc: ../../av1/encoder/cost.c:46: cost: Assertion `prob != 0' failed.
Aborted (core dumped)
Change-Id: I47a76b8f415060909bc8448fae3002857eb61d8e
This commit allows the partition context model to account for the
maximum transform block size of the coding block.
Change-Id: I22b91e85fff70faa974afd362ce327d3f2eda81d
- Add unit tests to verify the bit-exact result.
- User level time reduction (EXT_TX):
encoder: 3.63%
decoder: 2.36%
- Also add tx_type=V_DCT...H_FLIPADST SSE2 for 16x16 inv txfm.
Change-Id: Idc6d9e8254aa536e5f18a87fa0d37c6bd551c083
Start adding cmake build support. This is based on the generic-gnu
target and will not build anything. It simply produces a project file
(when generating for a IDE) that can be loaded and that allows for
interaction with (most of) the aom sources used in a generic-gnu
build.
Notable missing pieces:
- flag testing
- config generation
- experiment configuration
- enable/disable encoder/decoder
- aomenc/aomdec
- all third party library build integration
- all tests
Change-Id: Iaeda0b03d58591a26a8fb54f63a2aa3b5354e3a6
* changes:
Reverse order of CLPF and dering
Refactor: read_tx_size_probs()
Fix compiling issues with --enable-ec-adapt
Fixes compilation error on Windows/Visual Studio
This commit adds simp-mv-pred experiment. The experiment is to work on
top of ref-mv experiment to save memory bandwidth and reduce the size
of line buffer needed in ref-mv experiment.
When compared to ref-mv, this experiment showed:
low-delay BDR gain: 0.03%
High-delay BDR gain: 0.01%
memory/memory bandwidth saving: 40%
local memory/gate count saving: 20%
Change-Id: Ic4006e041fc58ede411da83d0d730c464ebe1749
This commit fixes the top-right reference block location for block
sizes above 8x8. It improves the coding performance of ref-mv:
lowres 0.08%
midres 0.15%
Thanks to jiafeng@ for finding this issue.
Change-Id: I70750fc7b18bf0126d3e07abc1b63ca5a160193e
This prevents a crash if the upsample_refs speed feature is
changed as part of set_size_dependent_vars, when the recode
loop is enabled.
Change-Id: I645e389bfe961879dd2001439a34fde2993868d9
This CL will cause
0.122% PSNR drop on lowres dataset
0.059% PSNR drop on midres dataset
However, it will facilitate hardware implementation.
Change-Id: I0a0713acacbfd571509a721337711c021915dd3c
The EC_ADAPT experiment cannot work unless EC_MULTISYMBOL is also
enabled.
This patch replaces all individual checks with a centralized check in
both the bitreader.h and bitwriter.h.
Change-Id: I418852d95c5012cc074ed65cd24997e08bc2aadd
The new ec_multisymbol experiment supersedes the rans experiment and is
used for multisymbol features that can be backed by either daala_ec or
rans.
This experiment is automatically enabled by ec_adapt and will try to
enable daala_ec or ans (in that order).
Change-Id: Ie75b4002b7a9d7f5f7b4d130c1aacb3dbe97e54f
Due to the way the daala entropy coder handles raw bits, the current
test is broken because the buffer length is not known when calling
aom_reader_init() is called.
Change-Id: I76e93ec0e160e31f286c23f7c9c0094390c6c2d4
This experiment performs symbol-by-symbol statistics
adaptation for non-binary symbols. It requires DAALA_EC or
RANS and ANS to be enabled. The adaptation is currently
based on a simple recursive filter and is taken from
Daala. It has an adaptation rate dependent on alphabet size,
taken from Daala. It applies wherever non-binary symbols
are encoded using Cumulative Probability Functions rather
than trees.
Where symbols are adapted, forward updates in the compressed
header are removed.
In the case of RANS coefficient token values are adapted,
with the exception of the zero token which remains a
binary symbol. In the case of DAALA_EC other values
such as inter and intra modes are adapted as CDFs are
provided in those cases.
The experiment is configured with:
./configure --enable-experimental --enable-daala-ec --enable-ec-adapt
or
./configure --enable-experimental --enable-ans --enable-rans \
--enable-ec-adapt
EC_ADAPT is not currently compatible with tiles.
BDR results on Objective-1-fast give a small loss:
PSNR YCbCr: 0.51% 0.49% 0.48%
PSNRHVS: 0.50%
SSIM: 0.50%
MSSSIM: 0.51%
CIEDE2000: 0.50%
Change-Id: I3888718e42616f3fd87144de7f125228446ac984
Parse the recursive transform block partition to fetch the actual
transform size. Use this correct transform size to select the
corresponding loop filter kernel. This slightly improves the coding
performance of recursive transform partition for hdres to 0.14%.
Change-Id: Ibe8bc3fdd0d222a4f1fb8156c56a407bec052b9b
Also:
- For unsigned ints, don't check value >= 0 as that is always true.
- Add "-Wlogical-op" warning flag which would have warned that "logical
'or' of collectively exhaustive tests is always true" before this
patch.
Change-Id: Idf3bd312464397f2df19256fc69b22f345dc7753
Making this change in case the future implementation changes and the
compairson is no longer between single bits.
Change-Id: I94f474ce7d82febfa23cec65cbe1b9d240b42e02
This ensures TGs can be decoded even if the whole
frame has not been received and the frame length
is not known.
Change-Id: If24837fcc3b5c46554751be792e91100de73e8d6
For compound mode, it is a sure thing that one of the 2 reference frames
would be either a forward predictive reference, or a backward predictive
reference, and the other would provide a different prediction.
Change-Id: I8d7b40525bec4db0f26ba255c8eefa9f20bd52a3
This is a manual adaptation of the following commit from aom/master:
ce12003d60a1c8d6c65ed07ba165c34062fcbcbd
The original commit message:
A tile group is a set of tiles in scan order.
Each tile group has a version of uncompressed and compressed headers,
identical apart from tile group parameters.
Encoding probability updates takes account of the number of
headers to control overheads.
The decoder supports arbitrary numbers of tile groups with
arbitrary number of tiles. The number of tiles in a TG is
signalled in the uncompressed header for that TG.
The encoder currently only supports a fixed number
of TGs (3, when error resilient mode is on) of equal size
(except possibly for the last one).
The average BDR performnce with 3 tile groups versus
anchor with error resilient mode and up to 16 tiles is:
NR YCbCr: 3.02% 3.04% 3.05%
PSNRHVS: 3.09%
SSIM: 3.06%
MSSSIM: 3.05%
CIEDE2000: 3.04%
Change-Id: I9b97c5ed733103b9160a3a5d4370de5322c00c0b
For clarity, use separate variables for 'color_ctx_hash' and
'color_ctx' instead of reusing same variables for both.
BUG=webm:1324
Change-Id: I3a516ea54353e1f0737822c613a68da252e30c6e
Use unified transform block size and coding block size map. This
prepares for the integration of 2x2 transform block size and the
rectangular transform block size.
Change-Id: I99f51017d19aef337639b708ee9c7faedcc20935
Unify coefficient context used by different experiments. Make
block size and transform block size consistent with rest codebase.
Change-Id: I237336f161d6c473b88c59c48ee68d24b75ce738
It always returns true since the related misc_fix[1] was merged.
[1] 23e83574b6a5105bdc686c49f2d5909f33ea721f
Change-Id: Ie3af685572a2f0a42d2b9fb9903c1abeea225dfd
Use the transform block partition depth counts to decide if to
reset the tx_mode at frame header level. Add a comment to make this
explicit.
Change-Id: I417920b4b61eeb91cde9536336a12deea2d42f79
Previously, when ext-partition-types and either clpf or dering were
enabled, the signalling for clpf/dering would not be encoded or decoded,
as the code to do so was inside a #if !CONFIG_EXT_PARTITION_TYPES block.
This caused many tests (eg, AV1/EndToEndTestLarge.EndToEndPSNRTest/0)
to fail with encode/decode mismatches.
Change-Id: If1742deb1812877813b2c3e93a048430f9a504ba
These are in response to post-commit suggestions made on
If429c93bb90b66fdff0edc07ecd9fc078077d303.
Change-Id: Id29afa158471bd6259bd07ac00812a50bfd0a709
This ensures that the parameter refinement never
results in a motion parameter value that exceeds the number
of alloted bits in the bitstream. It accounts for all of
the necessary precision shifts required to make global motion compatible
with the warped motion library. It also accounts for the
zero-centering that is applied to global motion parameters that are
naturally centered around one.
Change-Id: If429c93bb90b66fdff0edc07ecd9fc078077d303
- Change FDCT32x32_2D_AVX2 output parameter to tran_low_t.
- Add unit tests for CONFIG_AOM_HIGHBITDEPTH=1.
- Update TODO notes.
BUG=webm:1323
Change-Id: If4766c919a24231fce886de74658b6dd7a011246
This avoids explicitly casting them to 'int' later.
These methods were already called with signed int arguments for pitch,
so this also avoids int -> unsigned int -> int conversion.
Change-Id: I2129f5ceff8f2525a188ee3ae52f9fe7067bd2e3
This prepares for the integration of rectangular transform size
into recursive transform block partition.
Change-Id: I164eb43d10afa9bb2f4722de7a48faa770ba4ced
Purpose:
-Reduce dynamic range of interpolation filter coefficents from 8
bits to 7 bits.
-Inner product for 8-bit input data can be stored in a 16-bit signed
integer.
Impact on compression efficiency:
-Marginal improvement, typically less than 0.5% BDR.
Change-Id: I58d1408307ae7d2a6f9de8965c5877b258703199
Use table to replace the arithmetic computation for mapping between
transform block and pixel number. Support automatic scale of block
size and transform block size.
Change-Id: I84766850172265d4295f418383dbc5e6e5838ec8
- Eliminate the awkward _av1 suffix/infix in local variable names.
- Lift bitdepth selection out of the token loop.
Change-Id: I26d3397464f7808e0481a804033a93ca4f01f5d5
This change is similar to the one done for choose_tx_size_from_rd in
daf841b4a10ece1b6831300d79f271d00f9d027b
It gives a 4% speed-up on bus_cif.y4m with the following settings:
--cpu-used=4 -p 1 --end-usage=q --cq-level=40 --tile-columns=0 --tile-rows=0
Change-Id: Ic54fe4a066a2c0b5f6349d80cd13de8bb8ddcabc
last_frame_type is not well defined for intra_only frames
if we are decoding them out of order.
This change removes a dependency on last_frame_type for these frames.
Change-Id: I440cac68792714de222e192a0b3e75f6e1aa5e4b
It's clearer on inspection that the zero probabilities are unused.
Cherry-picked from aomedia/master: 8134db1
Change-Id: I56cddcb41ba256b7bb921d6a8538405165566dfb
Ran some manual sanity checks:
- Verified that the automatically generated encodings match the
hand-written encodings before the patch.
- Verified that the encoded bitstream before/after this patch is
identical.
Change-Id: I2153c57e463cff09c1d03d619b432fb1015199c3
For a 16x16 pixel block, one needs to allocate 16x16 coefficient
tokens, plus up to 16 eob tokens, per plane. This commit increases
the token allocation size to cover the case where all the transform
blocks are of size 4x4 in format 444.
Change-Id: I5755e6a53771053d51163d01ec1d62e670c5009e
Given the largest transform size is 32x32, this commmit changes size
estiiation based on the size rounding up to 32 multiples to avoid
insufficient buffer allocations.
BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=36
Change-Id: I6eab09dc6acdc0f5a6bcadb918d62c4852aae21f
This commit increases the minimum size for allocated buffer for
compressed data. The old size underestimated the size needed for
small images with width or height less than 64 pixels.
BUG=https://bugs.chromium.org/p/aomedia/issues/detail?id=31
Change-Id: Ia12507edc2be1e737ec49c32f64fd2ebf1eab41f
Previously, we assumed that av1_init_plane_quantizers is always called with
segment_id == xd->mi[0]->mbmi.segment_id (and use the latter to derive the value
of 'qindex' to use in the quantizer). But this is no longer true when supertx
is enabled. This patch instead remembers the value of 'qindex' derived from
the latest call to av1_init_plane_quantizers and uses that directly.
Change-Id: Ifa1c5bf74cad29942ff79b88ca92c231bc07f336
This commit resets the transform size to be the maximum possible
value. It avoids out-of-boundary writing when the ActiveMap is
turned on.
Change-Id: I8302dd9a5c9fffaea3edf9ad33f72aa111999737
This avoids an encoding segmentation fault in speed 5, due to the
use of uninitialized dummy inter prediction filter buffer in the
dynamic motion vector referencing scheme.
Change-Id: Icd888d46623e8abf34267838135eed8656d552e4
The difference between src and dst will be signed, the error will be
unsigned. The change quiets -fsanitize=integer:
unsigned integer overflow: 4294967295 * 4294967295
Change-Id: I131cefcc9583ee8a5b98eb5182fd30e9c7237ea0
It seems that when bit accounting was introduced in
https://chromium-review.googlesource.com/#/c/400658/
there was one place which was accidentally skipped, leading to build failures
with --enable-dering. This patch adds the missing information.
Change-Id: I59e1bd6f7e1d4fa58506ee7af307b845c78a7cbe
By default, when building with --enable-accounting the bit accounting
code will collect statistics for every frame while decoding.
Collecting statistics can slow down decode time and we would eventually
like to enable the CONFIG_ACCOUNTING flag by default.
This patch adds a runtime flag so that bit accounting statistics are
only collected when actually needed.
Change-Id: I25d9eaf26ea132d61ace95b952872158c9ac29e7
This decoder control requires AV1 to be compiled with --enable-accounting.
Note that bit accounting data is only available after a frame has been
decoded.
Change-Id: I8a15213d9f2587638e0edb62932738e985160e03
ISO C90 forbids mixed declarations and code and the function
aom_accounting_set_context() was being called before the MB_MODE_INFO
declaration.
Change-Id: I8619525b1b2fd37753891bd310d9d59c881b8807
Move computing the class0_fp_cdf and fp_cdf tables per coded mv
symbol to computing it only when the probabilities are updated.
Change-Id: Ib4957c8ab21e6189bcc3817a07b7681dfb343223
Move computing the class_cdf table per coded mv class symbol to
computing it only when the probabilities are updated.
Change-Id: I6c4a9075817e8ba2e251f0e82436995f08f2ec5c
Move computing the joint_cdf table per coded mv joint symbol to
computing it only when the probabilities are updated.
Change-Id: If5d195f70e6fad7b60f69606c8386ad5e69657d2
Move computing the inter_mode_cdf tables per coded inter mode symbol to
computing them only when the probabilities are updated.
Change-Id: I7a7b059ee75723cb6f278ed82a20cf34c27915d8
Limit the recursive transform block partition depth to 2. For a
32x32 transform block unit, one can maximally go down to 8x8 transform
block size.
Change-Id: I2caa92bb2eee64762b7ecca8920259f7c50fb0aa
Check the encoding statistics. If all the coding blocks use the
max transform size, skip transform size coding in the frame header.
Change-Id: I31cb16314e87f945d7e95a34a90a5536b3ed82d5
Skip counting the inter transform block size distribution for
the intra transform block size coding.
Change-Id: Ifad9d843f57d069d0619a54d66ca18101e1b69f1
Move computing the uv_mode_cdf tables per coded intra mode symbol to
computing them only when the probabilities are updated.
Change-Id: I627b59d30726c913f5d7ba7753cb0446a12655bb
Move computing the y_mode_cdf tables per coded intra mode symbol to
computing them only when the probabilities are updated.
Change-Id: I8c43d09b8ef5febe2a3ec64bd51d28bd78ea73ed
Move computing the kf_y_mode_cdf tables per coded intra mode symbol to
computing them only when the probabilities are updated.
Change-Id: I5999447050c2f7d5dbccde80bee05ecd1c5440ab
Cherry-pick Daala b5020bee:
Remove redundant test in od_ec_decode_bool_q15().
Using a test that decodes 100M random binary symbols, making this change
produced a speed up of 8.81% with gcc-4.9.3 and 3.71% with clang-3.7.1,
both compiled with -O2.
Change-Id: If6d0077a56121a575ae53bcd4d1d9b7d800a317d
This CL will facilitate adapt_scan experiment.
In adapt_scan experiment, dynamic scan order will be stored in
AV1_COMMON
Change-Id: I4763ea931b5e1af54d4f173971befeb01a4db335
* changes:
Palette: Use inverse_color_order to find color index faster.
Rewrite some loops to avoid -Wunsafe-loop-optimizations warnings.
Remove some useless casts
Add compiler warning flag -Wextra and fix related warnings.
Declare some array sizes to be constants (known at compile time).
Handle the sub8x8 chroma component at the unit of 2x2/4x2/2x4 level
and use the motion vector inherited from the luma component. This
improves the coding performance:
lowres 0.4%
midres 0.25%
hdres 0.15%
Change-Id: I34dff4218cfa3e5d55e7ed0341f36f4719389f7e
The av1_mv_class0_tree is a balanced tree with two leafs and can
simply be coded as a boolean with probability class0[0].
If CLASS0_SIZE is ever changed from 1, this change will need to be
reverted.
Change-Id: If294dac825a5f945371092c74aa8e3f84cd962b6
When building with --enable-daala_ec, the tx_type for intra blocks can be
coded using the CDFs that are updated once per frame.
This patch converts a tx_type symbol to be coded with aom_write_symbol()
and aom_read_symbol() that was missed in f3e8e267.
Change-Id: I34f8fef7525f88e156bbcb78dfc48994367610ce
In a previous commit: 5db9743fbb, two
changes that appeared to be typos are breaking build when experiments
are enabled:
../../libvpx/configure --enable-experimental --enable-ref-mv
--enable-ext-intra --enable-ext-refs --enable-ext-interp
--enable-supertx --enable-var-tx --enable-entropy --enable-ext-inter
--enable-ext-tx --enable-motion-var --enable-dual-filter
--enable-ext-partition --enable-ext-partition-types
--enable-loop-restoration --enable-rect-tx --enable-palette
--enable-aom-highbitdepth --enable-filter-intra --enable-internal-stats
&& make clean && make -j16
This commit fixes the issue.
Change-Id: I9ce5bbc96df326214202868cb0669bd334c86851
For example, loops of the form:
"for (i = 0; i < 1 + max_value; ++i) ..." or
"for (i = 0; i <= max_value; ++i) ..." are possibly infinite loops,
theoretically speaking (even if practically, they aren't).
So, compiler cannot optimize those loops.
When possible, I rewrote such loops to be finite even theoretically.
Cherry-picked from aomedia/master: 4e69284
Change-Id: Ied47a24833b689c0ec011f8645cf1c01856f7c59
Note: some of these warnings are enabled by a combination of -Wunused
(added earlier) and -Wextra.
Cherry-picked from aomedia/master: 4790a69
Change-Id: I322a1366bd4fd6c0dec9e758c2d5e88e003b1cbf
The av1_token encodings must match the contents of the aom_tree_index
structures so generate all encodings from the symbol trees.
Change-Id: I37be9f12c86a02693ae3c3c1d24b00f2abb29bfb
* changes:
Define SIMD_INLINE using AOM_FORCE_INLINE
AOM_FORCE_INLINE: fix always_inline attribute
Free memory allocated by daala_ec encoder.
Move clpf_sse4_1.c to clpf_sse4.c in agreement with convention
sync avg_test.cc with aom/master
Cherry-picked from aomedia/master: 953f086c
Note: related fixes were already part of webm/nextgenv2.
Change-Id: I027a4f2a540af5a304b358ddbf293965b4211b9e
The bug was introduced here:
https://chromium-review.googlesource.com/#/c/399975/4/av1/encoder/bitstream.c
In that patch, I had removed 2nd declaration of a variable of the same
name. But it turns out that the two variables actually had a different
type (even though the name was same).
Now, we keep both variables, but rename one of them -- that fixes the
mismatch. While we are at it, made both variables local as well.
The fix can be verified as follows:
../../libvpx/configure --enable-experimental --enable-supertx
--enable-var-tx --enable-entropy --enable-internal-stats && make clean
&& make -j16
aomenc -o soccer_cif_1000_av1_b8.webm ../soccer_cif.y4m --codec=av1
--limit=50 --skip=0 -p 2 --pass=1 --fpf=soccer_cif_av1.fpf --good
--cpu-used=0 --target-bitrate=1000 --lag-in-frames=25 --min-q=0
--max-q=63 --auto-alt-ref=1 --kf-max-dist=150 --kf-min-dist=0
--drop-frame=0 --static-thresh=0 --bias-pct=50 --minsection-pct=0
--maxsection-pct=2000 --arnr-maxframes=7 --arnr-strength=5 --sharpness=0
--undershoot-pct=100 --overshoot-pct=100 --frame-parallel=0
--tile-columns=0 --profile=0 --test-decode=warn
aomenc -o soccer_cif_1000_av1_b8.webm ../soccer_cif.y4m --codec=av1
--limit=50 --skip=0 -p 2 --pass=2 --fpf=soccer_cif_av1.fpf --good
--cpu-used=0 --target-bitrate=1000 --lag-in-frames=25 --min-q=0
--max-q=63 --auto-alt-ref=1 --kf-max-dist=150 --kf-min-dist=0
--drop-frame=0 --static-thresh=0 --bias-pct=50 --minsection-pct=0
--maxsection-pct=2000 --arnr-maxframes=7 --arnr-strength=5 --sharpness=0
--undershoot-pct=100 --overshoot-pct=100 --frame-parallel=0
--tile-columns=0 --profile=0 --test-decode=warn -v --psnr
Change-Id: Ibd72dbe1f620e6de231513220ee4e190606613ae
av1_init_scan_order
initialize data structures related to adaptive scan order
av1_update_scan_prob
update nonzero probabilities from nonzero counts
av1_augment_prob
embed r + c and coeff_idx info with nonzero probabilities.
When sorting the nonzero probabilities, if there is a tie,
the coefficient with smaller r + c will be scanned first
av1_update_sort_order
apply quick sort on nonzero probabilities to obtain a sort order
av1_update_scan_order
apply topological sort on the nonzero probabilities sorting order to
guarantee each to-be-scanned coefficient's upper and left coefficient
will be scanned before the to-be-scanned coefficient.
av1_update_neighbors
For each coeff_idx in scan[], update its above and left neighbors in
neighbors[] accordingly.
Change-Id: I64c4938057daf8e30e48609a00ecc08d2e3062f4
Plus a small code clean up. The experiment of EXT_REFS, compared against
the baseline, using Overall PSNR, now obtains a gain on lowres as:
Avg: -5.818; BDRate: -5.653
Compared against the previous EXT_REFS results on lowres, a tiny gain is
obtained as:
Avg: -0.047, BDRate: -0.063
(1) 780952 Add encoder first pass support to bi-prediction in EXT_REFS
(2) f91498 Add pred prob handling for new references in EXT_REFS
(3) e91472 Add decoder support for bi-direct prediction in EXT_REFS
(4) 0dbac9 Add encoder support to new references in EXT_REFS
(5) ad70cc Remove hard-coded number for EXT_REFS
(6) 9c1e2f Add the use of new reference frames at encoder in EXT_REFS
(7) 6d4fde Add the experiment flag of EXT_REFS
Change-Id: I26f7ca45b9ede7579fdb9d0d6a1a91f4334599bd
- Use range check function to avoid DCT_DCT overflow.
We need to re-develop the column txfm side scaling/rounding. Now,
we prefer to maintain the current BDRate level.
- Encoder user level time reduction <1% owing to av1_fht32x32_avx2.
- Add MemCheck unit test and fdct32() unit test.
Change-Id: I1e67030f67bc637859798ebe2f6698afffb8531c
The tell functions return an unsigned integer.
This causes the AV1.TestTell test case to fail because
-1 is greater than 20 when treated as an unsigned integer.
Change-Id: I9dd1d7eb61260d30d1713a4917159fc6fe8eee42
Free the two memory buffers allocated by the daala_ec encoder when
calling od_ec_enc_clear() from aom_daala_stop_encode().
Change-Id: If20e86374ea29e51ee59111012905e56039dd4cc
This commit changes to send frame size explicitly when
error_resilient_mode=1. Purpose is to allow parsing of bitstream
after a packet loss.
Change-Id: I7d1c010a465aa18914762cc1a3e61db377304c08
The (new) ans experiment replaces the bool coder with uABS bools. The
'rans' experiment adds multisymbol coding.
This matches the setup in aom/master.
Change-Id: Ida8372ccabf1e1e9afc45fe66362cda35a491222
* changes:
Fix warnings reported by -Wshadow: Part4: main directory
Fix warnings reported by -Wshadow: Part3: test/ directory
Fix warnings reported by -Wshadow: Part2b: more from av1 directory
Fix warnings reported by -Wshadow: Part2: av1 directory
Fix warnings reported by -Wshadow: Part1b: scan_order struct and variable
Fix warnings reported by -Wshadow: Part1: aom_dsp directory
Move STAT_TYPE enum to source file.
Code cleanup: mainly rd_pick_partition and methods called from there.
The bit accounting functions aom_reader_tell() and aom_reader_tell_frac()
return the number of bits and 1/8th bits respectively.
This patch changes the return type from ptrdiff_t which is signed to
uint32_t which is unsigned.
The size_t type is not used since we only care about the number of bits
or 1/8 bits per entropy coder context and we don't expect to code more
than 512 megabits per tile.
Change-Id: I84a119d1f52829dcbdb66a92656eacca06e42b11
Now that all warnings are taken care of, add warning flag -Wshadow to
configure.
Note: Enabling this flag for C++ generates some useless warnings about
some function parameters shadowing class member function names. So, only
enabling this warning for C code.
Cherry-picked from aomedia/master: b96cbc4
Change-Id: I3922dea2e6976b16519c4aa4d1bd395c198134f1
The tx_partition_set_contexts function changes tx_size even
for blocks coded with a rectangular transform.
This causes an internal rd inconsistency when using all of
CONFIG_VAR_TX, CONFIG_RECT_TX, CONFIG_EXT_TX.
Change-Id: Ia45d4a8893b0961534219bb96d9652719038c7a1
This patch adds bit account infrastructure to the bit reader API.
When configured with --enable-accounting, every bit reader API
function records the number of bits necessary to decoding a symbol.
Accounting symbol entries are collected in global accounting data
structure, that can be used to understand exactly where bits are
spent (http://aomanalyzer.org). The data structure is cleared and
reused each frame to reduce memory usage. When configured without
--enable-accounting, bit accounting does not incur any runtime
overhead.
All aom_read_xxx functions now have an additional string parameter
that specifies the symbol name. By default, the ACCT_STR macro is
used (which expands to __func__). For more precise accounting,
these should be replaced with more descriptive names.
Change-Id: Ia2e1343cb842c9391b12b77272587dfbe307a56d
Currently the RD loop traverses 4X8 blocks in inverted N order while
the bitstream stores blocks smaller than 8x8 in Z order. This causes a
discrepancy where the RD loop reads uninitialized data while
performing intra prediction. As a temporary fix simply disable the
use of the extended right edge for 4X8 blocks, until the bitstream can
be changed to match the logical structure of the blocks.
Change-Id: I44a9e4fc1a15cd551a7b38c3c1227bc5dac77e9a
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.
Cherry-picked from aomedia/master: 863b0499
Change-Id: Ida5de713156dc0126a27f90fdd36d29a398a3c88
- Change struct name to all caps SCAN_ORDER to be locally consistent.
- Rename struct pointers to 'scan_order' instead of hard to read short
names 'so' and 'sc'.
Cherry-picked from aomedia/master: 30abc082
Change-Id: Ib9f0eefe28fa97d23d642b77d7dc8e5f8613177d
While we are at it:
- Rename some variables to more meaningful names
- Reuse some common consts from a header instead of redefining them.
Cherry-picked from aomedia/master: 09eea2193
Change-Id: I61030e773137ae107d3bd43556c0d5bb26f9dbf8
In the header, all we need is number of stat types, not the names for actual
types.
Removing it avoids names like 'Y', 'U', 'V' and 'ALL' being visible
in all files that include the encoder.h header.
Change-Id: I874a73a3cfe6bcb29aedea102077a52addc49af6
- Const correctness
- Refactoring
- Make variables local when possible etc
- Remove -Wcast-qual to allow explicitly casting away const.
Cherry-picked from aomedia/master: c27fcccc
And then a number of more const correctness changes to make sure other
experiments build OK.
Change-Id: I77c18d99d21218fbdc9b186d7ed3792dc401a0a0
Move computing the segmentation_probs.tree_cdf table per symbol to
computing it only when the probabilities are updated.
Change-Id: I3826418094bbaca4ded87de5ff04d4b27c85e35a
Ransac's get_rand_indices originally used rand_r seeded with the
same value every time, producing the same random sequence at every
iteration. This causes the global motion parameters to be slightly
less accurate because ransac cannot improve the model fit after
the first attempt.
Change-Id: Idca2f88468ea21d19ba41ab66e5a2744ee33aade
This function is called after `super_block_yrd` and assumes that the dst
buffer is correct but that is no longer always the case after
daf841b4a10ece1b6831300d79f271d00f9d027b since we don't call
`txfm_rd_in_plane` after the RDO loop in `choose_tx_size_from_rd`.
We could fix this by always saving and restoring the dst buffer but
removing `rd_variance_adjustment` is a better solution:
- Getting the dst buffer always right is tricky as demonstrated by the
fact that it is wrong now, even if we fix it now we could break it later
and not notice
- Perceptual weighting is a good idea but `rd_variance_adjustment` is the
wrong approach as it weights both the rate and the distortion:
to get meaningful units you should only weight the distortion,
weighting rate means that we pretend some bits cost less than other
bits, this is not the case. The distortion weighting approach is
implemented by Daala in `od_compute_dist` and we plan to experiment
with this in AV1 too.
- Removing `rd_variance_adjustment` improves coding efficiency on all
metrics, here are the results for objective-1-fast using the Low
Latency settings:
PSNR Y: -0.14%
PSNRHVS: -0.17%
SSIM: -0.12%
MSSSIM: -0.12%
CIEDE2000: -0.07%
Change-Id: I74b26b568ee65f56521646b8f30dd53bcd29fce3
This array was allocated and used to save and restore segmentation map,
however the original segmentation map was never modified between the
calls to save and restore.
Change-Id: Iaf0fbfed733c097e84cf44d2aa6b8f35d2fb456b
This is not used since the commint 00cd5de536fd5545d8fb663b2db81c014e3e6a41,
"Remove skip_recode speed feature".
Change-Id: Ic03da6c0095f6285a3889d5d22e8aaa2e6cbfd79
On average no compression performance changes. Encoding speed is
increased by 10~20% on some test clips in the derf set.
Change-Id: I9856caaa260303f6f6259686671bed7d51012277
Drop some speed features used in speed 2 and above, during the
algorithm development process. This helps simplify the codebase.
Change-Id: I3b2f5560d90b00d2d8fd57c2cb36f6ddd3f228e4
This commit ports the following from aom/master:
4c46278 Add aom_reader_tell() support.
b9c9935 Remove an erroneous declaration.
56c9c3b Fix ANS build.
Change-Id: I59bd910f58c218c649a1de2a7b5fae0397e13cb1
This computation should match the code in encode_block
to increase the accuracy of the rd optimization.
Change-Id: Ibc9d9ab6d88d0c0f3af62e9cc233216aba48a57e
When built with var_tx and ext_tx, select_tx_size_fix_type is used
to compute the cost for using a particular tx_type.
The code indexes the array inter_tx_type_costs at the wrong location
resulting in a zero cost for signalling tx_type for rect_tx blocks.
Change-Id: Iba38be3a0d822109f778f0600b242dfb40359766
To get ready for pulling AV1 to nextgenv2. Refactoring is done to
make the code structures similar, especially for the motion search
part.
Change-Id: I5d7636394408d97de55394d668540f5627827983
Cherry-pick Daala 211c2a41: Clean up EC tell() and tell_frac() functions.
Add a const qualifier to the od_ec_enc and od_ec_dec parameters of
the od_ec_enc_tell(), od_ec_enc_tell_frac(), od_ec_dec_tell(), and
od_ec_dec_tell_frac() functions.
Add an OD_WARN_UNUSED_RESULT to od_ec_enc_tell_frac().
Change-Id: Ia50e2fd75e98d8a03d993449d658b695cf56e6fb
The formatting of OD_UNIFORM_CDFS_Q15[] in entcode.c is helpful for
for understanding what is contained in the array (e.g., the uniform
probability distributions of small sizes 2 through 16).
This patch reverts the change made in f4b2926d and adds linter hints to
ignore the formatting.
Change-Id: I2ad9fe6673b86e6067cb97b40f0f0e69a119cdf5
In super_block_uvrd(),if is_cost_valid == 0, all return parameters,
i.e. rate, distortion, skippable, and sse, are reset.
So, should not call txfm_rd_in_plane() if is_cost_valid == 0.
Also, the bug causes av1_xform_quant() to see invalid diff signal
since av1_subtract_plane() is not called in super_block_uvrd().
Change-Id: Iaa06061e2e9aa8876b4611a54f4ae6b8d499332b
Display the -b --bit-depth command line parameter on of aomenc when
--config-aom-highbitdepth is enabled.
Change-Id: I76147e38b9985e68b1e642e21be8fd4d8ec4d966
Move computing the partition_cdf tables per symbol to
computing them only when the probabilities are updated.
Change-Id: I442f9230ba00be7f5d0558d7c38d7324ad009ee8
Move computing the inter_ext_tx_cdf tables per symbol to
computing them only when the probabilities are updated.
Change-Id: I5e1e62f8eae8f6b2edbbd378beeb786649502c10
Move computing the intra_ext_tx_cdf tables per symbol to
computing them only when the probabilities are updated.
Change-Id: I26d5e419e103093e98a7d896c196176305b50fc9
Move from computing the switchable_interp_cdf per symbol to
computing once per frame when the probabilities are adapted.
Change-Id: I6571126239f0327e22bb09ee8bad94114291683e
* changes:
Add missing CONFIG_DAALA_EC declaration.
Add API for writing trees using a CDF.
Add macro to build a simple cdf table.
Use Daala entropy coder to code trees.
Silence clang-format code review warning.
Use Daala entropy coder to code bits.
Clear existing format issue in the codebase
Add Daala entropy coder.
Move the av1_indices_from_tree() function from av1/encoder/treewriter.c
to aom_dsp/prob.c so that it can be used by both the encoder and
the decoder.
Change-Id: Ie43c599f425c3503b1ff93f0c77b5033a05b1bb4
Without first including ./aom_config.h in aom_dsp/prob.c the memmove
function is implicitly defined and causes a compiler warning.
Change-Id: I339d0389f10324a1085aba7d6492b2159a14da92
Add av1_indices_from_tree() function that computes a forward and inverse
mapping of the tree leaf-node symbols to their in-order traversal.
This is necessary because many of the aom_tree binary trees have their
leaf nodes out of order (e.g., an in-order traversal of a tree with n
nodes does not start at symbol 0 and go to symbol n - 1), but the CDFs
created by tree_to_cdf() are indexed in-order.
Change-Id: Icd0dbed4c171a67c9e84a634106c4fdb5b1b3488
Added aom_write_tree_cdf() and aom_read_tree_cdf() function calls to
bitwriter.h and bitreader.h respectively.
These calls take a multisymbol CDF and an index and directly encode the
symbol using the enabled entropy coder.
Currently only the daala entropy encoder supports this (enabled with
--enable-daala_ec) and a compile error is thrown otherwise.
Change-Id: I2fa1e87af4352c94384e0cfdbfd170ac99cf3705
Add the av1_tree_to_cdf() macro which takes a aom_tree_index tree and
associated aom_prob probabilities and constructs a daala uint16_t cdf.
The av1_tree_to_cdf_1D() and av1_tree_to_cdf_2D() apply av1_tree_to_cdf()
across 1D and 2D arrays respectively.
Change-Id: If79fa5ae034263f279d7d0842493570885272fb2
When building with --enable-daala_ec, calls to aom_write_tree() and
aom_read_tree() will convert a aom_tree_index structure with associated
aom_prob probabilities into a CDF on the fly for use with the
od_ec_encode_cdf_q15().
The number of symbols in the CDF is capped at 16, and trees that contain
more than 16 leaf nodes are handled by splitting the most likely, e.g.,
highest probability symbols, first and coding multiple symbols if
necessary.
ntt-short-1:
MEDIUM (%) HIGH (%)
PSNR 0.000227 0.000213
PSNRHVS 0.000215 0.000205
SSIM 0.000229 0.000209
FASTSSIM 0.000229 0.000214
subset1:
RATE (%) DSNR (dB)
PSNR -0.00026 0.00002
PSNRHVS -0.00026 0.00002
SSIM -0.00026 0.00001
FASTSSIM -0.00026 0.00001
Change-Id: Icb1a8cb854fd81fdd88fbe4bc6761c7eb4757dfe
When building with --enable-daala_ec, calls to aom_write() and aom_read()
use the daala entropy coder to write and read bits.
When the probability is exactly 0.5 (128), then raw bits are used.
ntt-short-1:
MEDIUM (%) HIGH (%)
PSNR -0.027556 -0.020114
PSNRHVS -0.027401 -0.020169
SSIM -0.027587 -0.020151
FASTSSIM -0.027592 -0.020102
subset1:
RATE (%) DSNR (dB)
PSNR 0.03296 -0.00210
PSNRHVS 0.03537 -0.00281
SSIM 0.03299 -0.00161
FASTSSIM 0.03458 -0.00111
Change-Id: I48ad8eb40fc895d62d6e241ea8abc02820d573f7
This flag was already added to aomedia/master, so bringing it back to
webm/nextgenv2, as part of an effort to get the two codebases in sync.
Change-Id: I2b933a6a160e4210d1411a9e7978149eb8553205
This reverts commit 9b25f30674 to
reinstate the reverted commit with fixes that solved the build issues
when --enalbe-clpf is used in configure.
Change-Id: I15447cae7fa9b3deb27976345dc3db230a4a7a60
These signals were in the uncompressed frame header (as a temporary
hack), which caused two problems:
* We don't want that header to be duplicated in the slice header
* It was necessary to signal the number of bits to transmit up front
However, the filter size can be 128x128 which is greater than the SB
size, and a decoder wouldn't be able to know whether to read a bit or
not until the final SB of that 128x128 block has been decoded
(depending on whether the 128x128 is all skip or not). Therefore the
signalling was changed for 128x128 blocks so that every top left SB of
a 128x128 filter block contains a signal regardless of whether the
block is all skip or not. Also, all the MB's of 128x128 block are
filtered even if they are skip MB's. This gives the signal a purpose
even when the 128x128 block is all skip, and it also gives a slight
coding gain as it leaves a way to filter skip blocks, which was
previously forbidden.
Low latency:
PSNR YCbCr: -0.19% -0.14% -0.06%
PSNRHVS: -0.15%
SSIM: -0.13%
MSSSIM: -0.15%
CIEDE2000: -0.19%
High latency:
PSNR YCbCr: -0.03% -0.01% -0.09%
PSNRHVS: 0.04%
SSIM: 0.00%
MSSSIM: 0.02%
CIEDE2000: -0.02%
Change-Id: I69ba7144d07d388b4f0968f6a53558f480979171
To get ready for pulling AV1 to nextgenv2
Replace the experimental flag by MOTION_VAR. Rename major variables.
Change-Id: If6cf4f37b9319c46d8f90df551cc7295d66ca205
* changes:
Deringing cleanup: remove DERING_REFINEMENT (always on now)
Don't run the deringing filter on skipped blocks within a superblock
Don't dering skipped superblocks
On x86 use _mm_set_epi32 when _mm_cvtsi64_si128 isn't available
Currently the patch does not have any impact on the RD performance. The
fix could however potentially help on the next step of work, especially
when the extra altref frames allow non-zero temporal filtering strength
and their corresponding OVERLAY frames, i.e. the INTNL_OVERLAY frames
are being added.
Change-Id: I2e07fb3d0aa547a0b5dd05bb4ba865cd46309076
* changes:
Remove custom rans types
Remove add_token_no_extra.
Remove unused aom_rans_build_cdf_from_pdf
Add the tool used to generate the constrained tokenset.
Remove the starting zero from ANS CDFs.
Import the aom_read/write_symbol abstractions from aom/master
(cherry picked from aom/master commit 11206c60d930be9d29100567aa67f2a65463852a)
Includes renames in a bunch of places not handled by the original
due to differing tree states.
Change-Id: Ic74d9d8850b8c80a51e55e425bbf472a67e2653f
It was a fairly small production optimization for VP9.
Change-Id: Ie93b474ea5b7e63384a7c0b3a56b135462d1471b
(cherry picked from aom/master commit df9bb76b1330de42fe13827df4c72010adb51429)
The code that generates the raw distribution is based on a MATLAB
program by Debargha Mukherjee, and the algorithm used to quantize the
distribution comes from the ANS Toolkit by Jarek Duda.
Change-Id: Ic273f7d9e43e3ecd999e9e7e04cde57e8559375a
(cherry picked from aom/master commit ef446026aeafa318f9bee182b8c80eb4f1ef5a0a)
This brings it in line with the Daala CDFs and will make it easier to
share code.
Change-Id: Idfd2d2b33c3b9b2c4e72ce72fb3d8039013448b9
(cherry picked from aom/master commit af98507ca928afe33e9f88fdd2ca168379528d6a)
* changes:
Unfork ANS decode_coefs
Remove ZERO_TOKEN from the ANS tokenset
Drop costing ANS tokens from derived probabilities
Unfork ANS pack_mb_tokens
Fix a bug in the C implementation of the ihalfright32
transform, in the case that its input and output buffers are the same.
This occurs when it is called by av1_iht32x16_512_add_c.
Change-Id: I61c652e2662178520c0639a2879ae128a9c7ec3f
- av1_fht32x32 AVX2 function level time reduction ~89% compared to C.
- av1_fht32x32_avx2() on DCT_DCT improves 42.62% over aom_fdct32x32_avx2()
But function replacement must go with the corresponding inverse txfm.
- No obvious user level time reduction due to 32x32 TX_TYPE selection.
- Zero high 128b YMM to avoid AVX-SSE transition penalties
(fix 16x16 case).
- Added 32x32 AVX2 unit tests to verify bitexact.
- AVX2 optimization summary:
On CPU i7-6700, based on 16x16/32x32 fwd txfm optimization results:
C to AVX2: function level time reduction, ~86-89%.
SSE2 to AVX2: function level time reduction, ~51%.
Change-Id: Idd0cd8bf066a61c7117140ef15ab6c1f8eb4b036
This can be re-added after aligning AOM's ANS with nextgenv2's ANS.
This partially reverts commit 3829cd2f2f.
Change-Id: I78afc587f1abfe33ffcd53b3262910cfae135534
This mimics what's currently done in aom/master. This can be re-added
after aligning AOM's ANS with nextgenv2's ANS.
Change-Id: I3ae62181dd4803694204a234c717a86a15ca8a40
This commit renames LIBVPX_TEST_DATA_PATH to LIBAOM_TEST_DATA_PATH,
with a work around for working with jenkins environmnet variables.
Change-Id: If664ce57e25ad2af8121d1b578bf64043f0baa2a
Before this change, gm parameters were being written to the
bitstream for all frames, but only read for inter only frames,
causing a bitstream error.
Change-Id: I63b8e2fdf6358e07cc00718de04cc399809bde37
Adds the functionality to return the rate cost due to
coefficients without doing full search of all modes.
This will be subsequently used in various experiments,
including in new_quant experiment to search quantization
profiles at the superblock level without repeating the
full mode/partition search.
Change-Id: I4aad3f3f0c8b8dfdea38f8f4f094a98283f47f08
* Move clipping tests from inside to outside loops
* Let sizex and sizey to clpf_block() be the clipped block size rather
than both just bs
* Make fallback tests to C more accurate
Change-Id: Icdc57540ce21b41a95403fdcc37988a4ebf546c7
When CLPF was extended to chroma, the chroma RDO accidentally
discarded the optimal block size found in the luma RDO.
PSNR YCbCr: -0.25% 0.05% 0.06%
PSNRHVS: -0.19%
SSIM: -0.36%
MSSSIM: -0.23%
Conflicts:
av1/common/clpf.c
Change-Id: Ie49cd30f9276a311ada88cb2f13d14757617f030
Previously, only the motion vectors were being stored. This caused
a mismatch in the global motion experiment, which needs this
mode information to decide whether or not to use the gm parameters
in reconstruction.
Change-Id: I58cde750ec06587dbfb8d65b07c15a67b7d6b1f6
Rename av1_write_tree() to aom_write_tree() and move it into bitwriter.h
to match aom_read_tree() in bitreader.h.
Manually cherry-picked from aom/master:
33a143fa7ac42d62080bfc20468cb76ad26045db
Change-Id: I6c686cdd3e0f179d7e95c5bc6984558b62d46d67
av1_clpf_frame() was always called with the same src and dst,
so we only need one argument and the code supporting different
src and dst was removed.
Change-Id: I70919f50e5cfb19c22eb4dff9ee7c0fa2697fad3
Instead of having CLPF write to an entire new frame and
copy the result back into the original frame, make the
filter able to work in-place by keeping a buffer of size
frame_width*filter_block_size and delay the write-back
by one filter_block_size row.
This reduces the cycles spent in the filter to ~75%.
Change-Id: I78ca74380c45492daa8935d08d766851edb5fbc1
The referenced bug was fixed by saving neon registers. That this had any
effect was coincidental.
Both chromium and Android build with clang and neither uses this flag.
Change-Id: I470247d6fd9226fc207b42a187105581a94badc3
(cherry picked from commit fad70a358b)
- Unit tests are added for AVX2 SIMD.
- Encoder speed improvement:
AV1 baseline and EXT_TX, three 1080p sequences at bitrate:
800 Kbps, 2 Mbps, 6 Mbps, on i7-6700 CPU, average
user level time reduction: 3.86%.
Change-Id: Ibbd7837ee3a831c6b1e4e471bf6c8d3fa3a19ff4
This commit ports a CLPF change from aom/master by manually
cherry-picking:
7560123c066854aa40c4685625454aea03410b18
Change-Id: I61eb08862a101df74a6b65ece459833401e81117
This allows the hardware decoder to start decoding ref_mv_idx
syntax prior to the sorting stage and hide the latency of entropy
decoding. The compression performance change is about 0.01% level.
Change-Id: I86b34f31f6c99a36ae2780416175cc0bd90ff492
This is a similar change to following aom CL
https://aomedia-review.googlesource.com/#/c/1961/
Move SIMD related functions from filter.c/h to following files
av1_convolve_ssse3.c
av1_highbd_convolve_filters_sse4.c
Change following c files to header files.
av1_highbd_convolve_filters_sse4.c
av1_convolve_filters_ssse3.c
Change-Id: I41a3cc6b0789e632451aeda82f5eb97a4d78e370
This test checks if there's any basic change to the bitstream or default
encoder by running an encode and checking that the md5 from the decode
doesn't change.
Any change to the default encoder or bitstream should be accompanied by
a change to the md5 in this file.
Change-Id: Ibdd5a1442296fd3e946823ec1f43e8ac4e66dd34
Refactor to streamline the number of profiles needed, in
preparation for the next steps.
NO change in performance.
Change-Id: I753b89299897857f3c250c316b4cdc4fedcb90e8
When the block has width/height above or equal to 64, use 16x16
block search step for reference motion vector search in the non-
immediate rows and columns.
Change-Id: If11ce97a9328b879f30ef87115086aa0cd985a2f
Fix the logical OR computation in .mk file. Otherwise, when both
experiments are on, the output of $(filter... will be two 'yes',
which will cause missing library issue.
Change-Id: I53c44e925dc9ea77c7467217c20e4f1bc7e20fc3
Disable rect_tx because we only support 4x4 Walsh-Hadamard transform
in lossless mode.
Fixes failure in ./test_libaom --gtest_filter=*Large*ScreencastQ0/1
Configuration: --enable-experimental --enable-var-tx --enable-rect-tx
--enable-ref-mv --enable-ext_intra --enable-ext_tx --enable-debug
--disable-optimizations
Change-Id: Ib6b3494c7dcf7182f1cab9b138388d054851a23d
Reconciles the following commits from aom/master to nextgenv2:
- 25aaf40bbc24beeb52de9af7d7624b7d7c6ce9de
- 87073de5693df70eba1c9b9be2b2732ed3b08fb3
Change-Id: Ideda50a6ec75485cb4fa7437c69f4e58d6a2ca73
Move code for reading and writing literals and reading trees to use
just the aom_read_bit() and aom_write_bit() function calls.
Change-Id: Id2bced5f0125a5558030a813c51c3d79e5701873
(cherry picked from aom/master commit bc1ac15846a200272551699d45457039535e56b2)
This helps in porting entropy coder changes that happened in aom/master.
Change-Id: If423eeb3da552066cceb88227138ea61d6a20f07
(cherry picked from aom/master commit d311d02da55433d20aad6dd88e0bbb992919988d)
The codec name is defined in av1_dx_iface.c
This name needs to match kAV1Name in decode_test_driver.cc.
Otherwise the EndtoEndPSNRTest fails when built with --enable-ext-tile,
(because we need the IsAV1 function to return true.)
Change-Id: I05d5ea5b6fd4bbd49e8bcacd047fb81c27efb3b3
Also adds hooks to choose different profiles for UV and intra.
Results
lowres: -0.15%
midres: -0.24%
Change-Id: I4af8bc3e9b82b6f8a061dce9f52c89afa6239ae1
* changes:
Remove VP10 style bitreader and bitwriter wrappers
Rename av1_ans_test to match aom/master.
Migrate bitreader to the interface from aom/master
- Here function, aom_fdct16x16_1_sse2 is mistakely tested. It can pass
AOM_BITS_8, AOM_BITS_10, but not AOM_BITS_12. We should fix this test
when aom_highbd_fdct16x16_1_sse2 is available.
Change-Id: I5cac6ee5404ff6d833940e1ecc34663b29d7a41c
This makes room for typedefing some other struct to aom_writer.
Change-Id: I1e82de1320da00b3e41c90b14f2df45e7628aa89
(cherry picked from commit d69161f8f1eed602e0e5d21f4e6157b674e30cf6)
This should make room for compile time pluggable replacements.
Change-Id: Ib7afcffa93bf664b89a49da21a20138127443292
(cherry picked from commit 9dd0b8982445515d6dddb6342e655b56062a8f7f)
Includes a major refactoring/enhancement to support
tile-adaptive switchable restoration. The framework can be
readily extended to add more restoration schemes in the
future. Also includes various cleanups and fixes.
Specifically the framework allows restoration to be conducted
on tiles such that each tile can be either left unrestored, or
use bilateral or wiener filtering.
There is a modest improvemnt in coding efficiency (0.1 - 0.2%).
Further enhancements will be added subsequently to improve coding
efficiency and complexity.
Change-Id: I5ebedb04785ce1ef6f324abe209e925c2d6cbe8a
If test files don't already exist it calls aomenc to create them.
cherry-picked #ee9ac321 from aom/master
Change-Id: I0e0f33cb60b3492e9106d6c9e2c51f64f71ebb63
Added a limit, resolving a todo and added a limit parameter so that we
can do a very simple fast encode in 1 pass.
Change-Id: I265cd912d970d560a0b00b86e6c7ec7b6fef1e7b
Adding AV1 input files to the test set is not feasible because the
bitstream is in constant flux. Add test input encoding and hook
it up in simple_decoder.sh to start.
cherry-picked #b591df89 from aom/master
Change-Id: Ie4c06a7c458cdc2ab003d27fb92418c77c87fc88
Added a limit function and removed a todo and fixed script so that
it can actually be run on av1.
cherry-picked #1801d35d from aom/master
Change-Id: Ib8d1d1b5c7dbe0169e4e6c7d89d28801d7699c37
This commit adds asserts to clarify value ranges in sum computations,
also corrects type conversion used in related calculations.
cherry-picked #738d5b19 from aom/master
Change-Id: Ib6d574ec23e5c28ccd994dac26f973eb3920430d
This commit changes to use uint32_t for cost (always non-negative),
and promote to int64_t before calculation of the savings.
This fixes an integer overflow.
cherry-picked #a3028ddf from aom/master
Change-Id: I71c2580d188cc79d2d8069241d0353cf331b5c83
The app this script called was removed in this patch.
50cbe24 remove more vp8 and vp9 only code
cherry-picked #1c17dd6f from aom/master
Change-Id: Ib622eff6a3a35c5dab26908b094ace969f128c11
this matches style guidelines and stabilizes successive runs of
clang-format across the tree. remaining types should be address in
successive commits.
Change-Id: I6ad3f69cf0a22cb9a9b895b272195f891f71170f
This makes the code in select_tx_size_fix_type match the
corresponding code in pack_inter_mode_mvs.
Change-Id: I69bcc0dc6fdd733091fafe9188a3f7397e1e613f
removes the need for an intermediate cast to int, which was missing in
the call added in:
73a3fd4 aom_mem: Refactor code
quiets a visual studio warning:
C4146: unary minus operator applied to unsigned type, result still
unsigned
Change-Id: I76c4003416759c6c76b78f74de7c0d2ba5071216
Some minor adjustments to tile size and bilateral filters.
About 0.1% improvement for midres and hdres, very small change for
lowres.
Change-Id: Ia94f68a926867dfd67da1a8795fd8de0ddd8e2d6
This allows for a clean subtraction of 1 along the transform
matrix diagonal and also makes the order of the parameter list
a little more intuitive.
Change-Id: I6a5d754af41b8d1292f241f9b21473160517d24f
these aren't overly speed critical, best to leave it to the compiler. as
a side-effect this fixes Visual Studio compilation (should have been
INLINE)
Change-Id: Ic81fb5ac76bc19c61efb2f1a965c0f79e9e45ebd
aom_realloc was allocating 1 byte more than needed every time.
Fixed this, and took this opportunity to do a small refactoring.
Change-Id: I38fcb62b698894acbbab43466c1decd12f906789
Bitstream syntax:
For a rectangular inter block, 'rect_tx' flag is sent to indicate if
the biggest rect tx is used. If no, continue to decode regular
recursive tx partition.
Change-Id: I127e35cc619b65acb5e9a0717f399cdcdb73fbf0
Uses an array to map block sizes, y tx sizes, and subsampling
factors to various transform sizes for UV.
Results improve by 0.1-0.2%
Change-Id: Icb58fd96bc7c01a72cbf1332fe2be4d55a0feedc
This corrects a formatting error introduced in:
I1e9d548ce445d29002f0c59ebfd3957a6f15e702
where spaces were used as delimiters instead of tabs.
The corresponding fixes for vp9 and vp8 are in
Ibc4eb8fd82e6b926ba259a679dc98557cadba9b1.
Change-Id: Ica3d625d6672b3c47e0e208b45eede29b9004030
- Localize static lookup tables in the sole functions that use them.
- Remove dead high bit-depth IDST functions.
- Apply clang-format
Change-Id: Ibbd7db4259f9ea64d695b2f13f5c118aac8f1cf9
This patch completes the global motion experiment
implementation. It modifies the format of the motion
parameters to use the mv union to facilitate faster
copying and checks for parameters equal to 0 that occur
frequently in rdopt. The rd decisions for the global motion experiment
have also been added to rdopt.
Change-Id: Idfb9f0c6d23e538221763881099c5a2a3891f5a9
03394bd Remove dead code from av1_dering_search.
337b23a Changing the weights of the first CRF filter in deringing
Change-Id: I1216c146dc3f72f24ceec3d3c65c4dd6cd73623e
The values after right shifts should fit into 32bit int. The commit
fixes MSVC build warning when new-quant is enabled.
Change-Id: Ic89dd86fb981a1206653943658af2b6b2925a676
When the experiment is ON, we use Paeth predictor instead of TM
predictor.
For derf set, this gives about 0.09% improvement overall, and 0.55%
improvement if all frames are forced to be intra-only.
Also, if the EXT_INTRA experiment is also on, the improvement overall
is 0.056%, and improvement if all frames are forced to be intra-only is
0.465%.
Change-Id: Id74e107ede70a8d2107fa14fcb3f44b23a437274
Function ans_read_int() takes int as parameter, this commit uses an
explicit conversion to avoid MSVC building warning.
Change-Id: Ia405e1d5a86c0f42932fa1da29417ccbf2dd58e7
Cherry-Picked the following commits:
0defd8f Changed "WebM" to "AOMedia" & "webm" to "aomedia"
54e6676 Replace "VPx" by "AVx"
5082a36 Change "Vpx" to "Avx"
7df44f1 Replace "Vp9" w/ "Av1"
967f722 Remove kVp9CodecId
828f30c Change "Vp8" to "AOM"
030b5ff AUTHORS regenerated
2524cae Add ref-mv experimental flag
016762b Change copyright notice to AOMedia form
81e5526 Replace vp9 w/ av1
9b94565 Add missing files
fa8ca9f Change "vp9" to "av1"
ec838b7 Convert "vp8" to "aom"
80edfa0 Change "VP9" to "AV1"
d1a11fb Change "vp8" to "aom"
7b58251 Point to WebM test data
dd1a5c8 Replace "VP8" with "AOM"
ff00fc0 Change "VPX" to "AOM"
01dee0b Change "vp10" to "av1" in source code
cebe6f0 Convert "vpx" to "aom"
17b0567 rename vp10*.mk to av1_*.mk
fe5f8a8 rename files vp10_* to av1_*
Change-Id: I6fc3d18eb11fc171e46140c836ad5339cf6c9419
1. Changed buffer_alloc_sz and frame_size type to size_t.
2. Added a TODO for video resolution limits. On 32 bit systems, the maximum
resolution supported in the encoder is 4k(3840x2160). The malloc() would
fail if encoding >4k video on a 32 bit system.
Change-Id: Ibd91b28fd63d1b04e8ac9a5270a17629f239188a
The previous ext-refs experiment did not consider the cost of the 2nd
reference frame on mode decision in the compound mode. With the fix,
using Overall PSNR, compared to the previous ext-refs RD performance
before the bug fix, all against the baseline, the improvements are:
"ext-refs" before fix: lowres -5.665% midres: -4.833%
"ext-refs" after fix: lowres -5.776% midres: -5.000%
Improvement by the fix: lowres -0.111% midres: -0.167%
Change-Id: I2eceedf2d4046b169514e049fd01baaf0bbb50c6
Fixed a list of VS warnings. Warning message:
..\test\vp10_convolve_test.cc(34): warning C4244: 'initializing' : conversion
from 'ptrdiff_t' to 'int', possible loss of data
Change-Id: I9a1d3978a79fbb7b1ac028c5713ac72b6ff99172
This computes global motion parameters between 2 frames by
matching corresponding points using FAST feature and then
fitting a model using RANSAC.
Change-Id: Ib6664df44090e8cfa4db9f2f9e0556931ccfe5c8
Frame can be split into rectangular tiles for application of separate
bilateral or Wiener filters per tile. Some variable names changed for
better readability.
Change-Id: I13ebc4d0b0baf368e524db5ce276f03ed76af9c8
For rectangular blocks between 8x8 and 32x32, we can now code the
transform size as one bigger than the largest square that fits in
the block (eg, for 16x8, we can code a transform size of 16x16
rather than the previous maximum of 8x8), when this oversized
transform is coded in the bitstream, the codec will use the full
size rectangular transform for that block (eg 16x8 transform in
the above example).
Also fixes a scaling bug in 16x8/8x16 transforms.
Change-Id: I62ce75f1b01c46fe2fbc727ce4abef695f4fcd43
This commit separate the frame index of EXT_ARFs' from other frame
types in the ext-refs setting.
It improves the average RD performance by
0.206% in the lowres, and
0.173% in the midres.
The overall gains for the ext-refs compared to the baseline are
5.665% in the lowres, and
4.883% in the midres.
Change-Id: I6591ad29120880c1aef0bd0b7cf15238c3f3b8f3
rd_pick_palette_intra_sby() method is called only when,
cpi->common.allow_screen_content_tools is on. So, no need to check that
again. We just use an assert() instead to still be safe.
Change-Id: I19785c2aac016798c8d331bbe91971b3806b73a8
Originally, only bi-pred type of frames can use BWDREF. When
extra alt-refs are inserted in a gf group, the closest alt-ref
serves as ALTREF for the frames within the corresponding
subgroup. Therefore, the original alt-ref can be used as BWDREF
for the LF_UPDATE type of frames.
This patch further swaps the virtual indices of BWDREF and ALTREF
for those frames whose BWDREF is farther than ALTREF. As a result,
the BWDREF is always the closet backward reference frame, and the
ALTREF is the farther one.
It improves the average RD performance by
0.132% in lowres, and
0.030% in midres.
The overall gains for the ext-refs compared to the baseline are
5.486% in lowres, and
4.666% in midres.
Change-Id: I22e4e5f378f19c4c89196a0a5e9214adb46c3428
This function was previously unused and removed in
I6bc740e778658d6f81ca54888fc6fa822d3b5ee0. I am adding it back in
with previously suggested fixes.
Change-Id: Iee0afb39170d25895b11d07e71843eae6913efd1
Mostly refactoring, but a very tiny functional change:
Do all rounding in calc_centroids() itself, instead of rounding in two
places inside palette.c
This gives a slight performance improvement for screen content:
0.078% on average.
Change-Id: I7a0e007d30ebf4e59839483a167123f31a222dd4
The previous code was giving:
unused variable ‘tmp_rate’ [-Wunused-variable]
unused variable ‘tmp_dist’ [-Wunused-variable]
‘rate2_nocoeff’ may be used uninitialized in this function [-Wmaybe-uninitialized]
Change-Id: I26326d0e5ffc141ad548654356a877cd3627cea6
Insert multiple arfs in a gf group to emulate multi-layer backward
reference frames structure. At maximum, two extra ARF's are inserted
in a gf group.
It improves the RD performance by 0.317% in Avg in lowres dataset.
Change-Id: I62c32e1b0f25b978484dd113b319bebcd959bf60
This is temporary until the global motion experiment is made to work
with ext_inter and dual_filter.
Change-Id: I73624ca6f536fd98218d7e07bcd7a2c1e6f5aebd
vp10_fht{16x16,8x8,4x4}_msa and the iht were disabled with this config
in:
4ab19ea Fix assertion failures in mips+msa setting
Change-Id: Ic675258b89ca490e8021c887b705c68428925129
Originally we can have a BRF right before an overlay frame (in
display order), which might be unnecessary since we already has a
quality backward reference frame (ARF).
This patch avoids such a coding structure and improves the RD
performance by 0.086% in Avg in the lowres dataset, and 0.153 in
Avg in the midres dataset.
In the lowres dataset, significant gains are obtained for the
following sequences:
mobisode2_240p: 0.563%
keiba_240p: 0.440%
bus_cif: 0.336%
soccer_cif: 0.333%
And the performance drops only in the following four video sequences:
motherdaughter_cif: 0.028%
bqsquare_240p: 0.017%
basketballpass_240p: 0.015%
bowing_cif: 0.006%
Change-Id: Ic94f648ba8e52eb0014933d484fb247610a9ae05
Mannually cherry-picked:
1579133 Use OD_DIVU for small divisions in temporal_filter.
0312229 Replace divides by small values with multiplies.
9c48eec Removing divisions from od_dir_find8()
0950ed8 Merge "Port active map / cyclic refresh fixes to vp10."
efefdad Port active map / cyclic refresh fixes to vp10.
1eaf748 Port switch to 9-bit rate cost to aom.
0b1606e Only build deringing code when --enable-dering.
e2511e1 Deringing cleanup: don't hardcode the number of levels
8fe5c5d Rename dering_in to od_dering_in to sync with Daala
4eb1380 Makes second filters for 45-degree directions horizontal
7f4c3f5 Removes the superblock variance contribution to the threshold
3dc56f9 Simplifying arithmetic by using multiply+shift
cf2aaba Return 0 explicitly for OD_ILOG(0).
49ca22a Use the Daala implementation of OD_ILOG().
8518724 Fix compiler warning in od_dering.c.
485d6a6 Prevent multiple inclusion of odintrin.h.
51b7a99 Adds the Daala deringing filter as experimental
Note that a few of the changes were already in libvpx codebse.
Change-Id: I1c32ee7694e5ad22c98b06ff97737cd792cd88ae
Manually cherry-picked following commits from AOMedia git repository:
bb2727c Sort includess for "clpf.h"
c297fd0 Add quantisation matrix range parameters.
0527894 Add encoder option and signaling for quant matrix control.
4106232 Turn off trellis coding for quantization matrices.
4017fca Modify tests to allow quantization matrices.
1c122c2 Add quant and dequant functions for new quant matrices.
95a8999 Enable CLPF
f72782b Fix a build issue
73bae50 Add quantisation matrices and selection functions
33208d2 Added support for constrained low pass filter (CLPF)
Change-Id: I60fc1ee1ac40e6b9d1d00affd97547ee5d5dd6be
The projected coordiantes in projectPointsTranslation were
being shifted by the incorrect precision.
Change-Id: If6040bea9e5187020d85c6095d85c7ff5786b7f9
- On E5-2680, park_joy_1080p, 5 frames, baseline encoding time
reduces about 0.8~1.0%.
- Credit goes to Erik Niemeyer (erik.a.niemeyer@intel.com).
Change-Id: I69f191d5a4e4b96a5f9ffd8286e484b69d565c01
- Add unit tests to verify the bit-exact.
- Speed unit test, function improvement: about 8%-23%.
- On E5-2680, park_joy_1080p_12, 25 frames, --kf-max-dist=1
encoding time improves from <1% to 3.5%
Change-Id: Ic16368885bb253db0200c3a6db143ab1a0b7fc26
This is a debug tool used to detect bitstream error. On encoder side, it pushes
each bit and probability into a queue before the bit is written into the
Arithmetic coder. On decoder side, whenever a bit is read out from the
Arithmetic coder, it pops up the reference bit and probability from the queue as
well. If the two results do not match, this debug tool will report an error.
This tool can be used to pin down the bitstream error precisely. By combining
gdb's backtrace method, we can detect which module causes the bitstream error.
Change-Id: I133a4371fafdd48c488f2ca47f9e395676c401f2
There was a bug in the original set up for RATE_FACTOR_LEVELS, which
results that rate_factor_deltas for GF_ARF_STD is 2.00, instead of the
intentional value of 1.75, whereas for KF_STD is 0.00, instead of the
intentional value of 2.00.
Nevertheless, if simply fixing the bug as in the first patch, the RD
performance unexpectedly dropped by 0.143% in Avg bitrate using
Overall PSNR, especially for following sequences in lowres:
bridge_close_cif: dropped by 1.468%
container_cif: dropped by 2.140%
husky_cif: dropped by 0.826%
motherdaughter_cif: dropped by 0.798%
rasehorses_240p: dropped by 0.805%
students_cif: dropped by 1.411%
This indicates that we should boost up the value for GF_ARF_STD from
1.75 to at least to 2.00. After doing so, while still keeps 2.00 for
KF_STD, the new patch achieves a small gain of 0.15% for the baseline,
and a smaller gain of 0.06% for the experiment of ext-refs. Most
sequences keep the similar RD performance in lowres, except for the
following ones that obtain a bigger gain:
(1) Baseline:
container_cif: 1.628%
students_cif: 1.015%
(2) ext-refs
tennis_sif: 1.248%
Change-Id: I992f8f6a3e20f1b71ec52a1ddc969af4968b78d5
block: from int64_t to int as it is a block index.
sse: from unsigned int to int64_t to reduce type conversion.
Change-Id: Iec8104ff8a3fd3a77d4e451c12918bd869966c2f
spatial/temporal scalability are not supported in VP10 currently.
+ remove the unused vp10/encoder/skin_detection.[hc]
this also enables DatarateTestLarge for VP10 which passes with no
experiments enabled. these were removed previously when only the SVC
tests should have been:
134710a Disable tests not applicable to VP10
Change-Id: I9ee7a0dd5ad3d8cc1e8fd5f0a90260fa43da387c
This patch just creates the interface for global motion computation
and calls it from encodeframe. Currently, the function
compute_global_motion_feature_based is empty and the work to do
the actual parameter calculation will be added in a future patch.
Change-Id: Ife142742140079e1c1743b66f180aeb2ecea29ae
- Avoid some memcpy()s
- Remove indices array
- Make pre_indices array local
- Avoid rounding twice
- Other small simplifications
Change-Id: Iac3236daaad04f21f54054cdd9504de13b942a07
This patch only includes inter frame reconstruction using gm
parameters when GLOBAL_MOTION and/or VP9_HIGHBITDEPTH are enabled.
GM is not currently used when EXT_INTER or DUAL_FILTER is enabled.
This will be added in a followup patch. For now, these experiments
will take precedence over GLOBAL_MOTION when they are all enabled.
Change-Id: I930ddda529c44d7245dbb56db3c9c5524cf45473
- Add unit tests to verify the bit-exact result.
- In speed test, function speed (for each mode/tx_size)
improves about 23%~35%.
- On E5-2680, park_joy_1080p, 10 frames, --kf-max-dist=1,
encoding time improves about 1%~2%.
Change-Id: Id89f313d44eea562c02e775a6253dc4df7e046a9
(1) Key frame: skip filter intra modes whose directional pred
version is relatively bad (rd >= 1.125 * best_rd)
(2) Inter frame: do not check filter intra modes if best_intra_rd
>= 1.25 * best_rd
Encoding time overhead is reduced by:
4.9% (9.2%->4.3%, soccer_cif)
Coding gains drop by 0.021% on lowres and by 0.076% on midres
Change-Id: I29b6f7d3d3dc4b362c6d63bc447e6a429ba5dc66
The ARF Index was wrong when updating the upsampled reference
frame buffer.
Compared to the baseline in which multi_arf_allowed is disabled, the
RD performance drops 2.250% in Avg using Overall PSNR in the derf
dataset. The performance decrease is especially in the following
video sequences:
foreman_cif: drops 7.489%
husky_cif: drops 6.421%
soccer_cif: drops 4.850%
However, it has a significant gain in the following video sequences:
container_cif: increases 8.043%
harbour_cif: increases 1.332%
Change-Id: I02472909eb34bd070d7544f57383e72559fa42b3
We have renamed following Macros to avoid name confusion:
REFS_PER_FRAME --> INTER_REFS_PER_FRAME
(= ALTREF_FRAME - LAST_FRAME + 1)
MAX_REF_FRAMES --> TOTAL_REFS_PER_FRAME
(= ALTREF_FRAME - INTRA_FRAME + 1)
INTER_REFS_PER_FRAME specifies the maximum number of reference frames
that each Inter frame may use.
TOTAL_REFS_PER_FRAME is equal to INTER_REFS_PER_FRAME + 1, which
counts the INTRA_FRAME.
Further, at the encoder side, since REF_FRAMES specifies the maximum
number of the reference frames that the encoder may store, REF_FRAMES
is usually larger than INTER_REFS_PER_FRAME. For example, in the
ext-refs experiment, REF_FRAMES == 8, which allows the encoder to
store maximum 8 reference frames in the buffer, but
INTER_REFS_PER_FRAME equals to 6, which allows each Inter frame may
use up to 6 frames out of the 8 buffered frames as its references.
Hence, in order to explore the possibility to store more reference
frames in future patches, we modified a couple of array sizes to
accomodate the case that the number of buffered reference frames is
not always equal to the number of the references that are being used
by each Inter frame.
Change-Id: I19e42ef608946cc76ebfd3e965a05f4b9b93a0b3
this function produces implicit double -> int conversion warnings and
has additional style issues.
Change-Id: I6bc740e778658d6f81ca54888fc6fa822d3b5ee0
The gm parameters need to have WARPED_PRECISION_BITS precision
until they are written to the bitstream because functions in
reconinter use these parameters before they are written to
the bitstream. Previously, the parameters weren't being converted
to WARPED_PRECISION_BITS until they were read from the bitstream
which causes an encode/decode mismatch.
Change-Id: I31e76e9d6f7d24df21af287a72f8c01f1997304d
(1) Apply ALLOW_FILTER_INTRA_MODES flag to the correct place, otherwise
there are bitstream mismatchs when it is 0.
(2) Rename pick_ext_intra_iframe() to pick_ext_intra_interframe().
Change-Id: Ic88c930de1d3f819750f0892df52bde55ae32a91
Manually cherry-picked the following changes:
8c8d16de vp9 -> vpx in names
75b57d39 VP9_ -> VPX_ in function names
761a7088 VP9_INTERP_EXTEND -> VPX_INTERP_EXTEND
4273a52c VP9->VPX in border pixel macros
03568c31 VP9_FRAME_MARKER -> VPX_FRAME_MARKER
2334f51d VP9->VPX in fdct function names
Change-Id: Icc18dbf4b416dd0fa21033b3e19ab8a47c893508
This commit makes the encoder to use different frame context index
for different frame types. In the baseline setting, it sets the
frame context index of the overlay frame to be different from other
regular inter frames. In the ext-refs setting, it further allows
the backward reference frame to use a different index.
It improves the compression performance for both settings.
Baseline
lowres 0.12%
ext-refs
lowres 0.50%
midres 0.56%
Change-Id: I7c63ddec9fc296c56a86353cf2c661a740b97a97
everything outside of third_party should follow 'PointerAlignment:
right' i.e., associate the '*' with the variable
+ add a note about the clang-format that generated this file
Change-Id: I13e3f4f5fb6e22a8fa7fc3d06879c995b7c41a39
(cherry picked from commit e4290800b2)
The LLVM trunk has reached 4.0 and now __clang_major__ is not enough
to distinguish between old XCode Clang and the new 'real' Clang.
Using __apple_build_version__ allows to make this distinction.
BUG=chromium:631144
Change-Id: I0b6e46fddfe4f409c7b7e558bda34872e60ee2d9
allows 'make test_libvpx', etc. some reworking of the makefiles would be
needed to avoid hard coding targets here.
Change-Id: I18982dbf691e7d36ab8bcf5934bab9340687b061
(cherry picked from commit 25085a6ac2)
remove some (but not all yet!) tuple mis-use, and revamp the code a lot.
Factorize some common chores into MainTestClass.
Change-Id: Id37b7330eebe80d19b9d12a454f24ff9be6b1116
Set adaptive_rd_thresh to 0 at speed 0. This allows a thorough mode
search, and eliminates a blocking artifact seen in an encoder test.
Borg test:
1. lowres
Overall PSNR: -0.135%; SSIM: -0.293%;
2. hdres
Overall PSNR: -0.122%; SSIM: -0.208%;
Encoder speed tests: 2% - 6% slower.
Change-Id: Ie7601cb8824df8f6f2ae0b2942bd938600f76990
Added a new expt rect-tx to be used in conjunction with ext-tx.
[rect-tx is a temporary config flag and will eventually be
merged into ext-tx once it works correctly with all other
experiments].
Added 4x8 and 8x4 tranforms for use initially with rectangular
sub8x8 y blocks as part of this experiment.
There is about a -0.2% BDRATE improvement on lowres, others pending.
When var-tx is on rectangular transforms are currently not used.
That will be enabled in a subsequent patch.
Change-Id: Iaf3f88ede2740ffe6a0ffb1ef5fc01a16cd0283a
- HBD encoder speed improvement (SSE4.1):
Enable CONFIG_VP9_HIGHBITDEPTH, on Xeon E5-2680,
50 frames, park_joy_1080p, 12-bit,
Encoding time reduces from 4846481 to 4177471 (ms)
- Add unit test to verify bit-exact and EOB calculation
Change-Id: I08e8ef3549ddad5ab36d86e78557df3b288537ea
Currently nothing is implemented to compute GM parameters, this
just adds the capability to send them in the bitstream if they
were computed. Still need to implement the reconstruction
based on the parameters in reconinter.
Change-Id: I72aea3c6a9de9f5a40f96da76c82b54a52781fe2
ARF with zero strength temporal filter can be reused by setting the
show_existing_frame = 1, and in this case, there is no need to
refresh the reference frame buffer. However, we used the flag
"refresh_golden_frame" as the identifier for the starting point of a gf
group.
A new flags "is_arf_filter_off" is used to record if the filter with
strengrh zero is used.
Change-Id: I25971a760f6e1638d5147fe30488c48125512b1a
Keep track of the best and second best full pixel motion vector
candidates, and do subpel search around both of them.
Compression improvement:
lowres 0.22% midres 0.23% hdres 0.18%
No noticeable encoding speed changes observed on lowres test clips.
Change-Id: I5f4df2a03d1db061cfdfdba6138b27e9ea91f089
This commit bring all up-to-date changes from master that are
applicable to nextgenv2. Due to the remove VP10 code in master,
we had to cherry pick the following commits to get those changes:
Add default flags for arm64/armv8 builds
Allows building simple targets with sane default flags.
For example, using the Android arm64 toolchain from the NDK:
https://developer.android.com/ndk/guides/standalone_toolchain.html
./build/tools/make-standalone-toolchain.sh --arch=arm64 \
--platform=android-24 --install-dir=/tmp/arm64
CROSS=/tmp/arm64/bin/aarch64-linux-android- \
~/libvpx/configure --target=arm64-linux-gcc --disable-multithread
BUG=webm:1143
vpx_lpf_horizontal_4_sse2: Remove dead load.
Change-Id: I51026c52baa1f0881fcd5b68e1fdf08a2dc0916e
Fail early when android target does not include --sdk-path
Change-Id: I07e7e63476a2e32e3aae123abdee8b7bbbdc6a8c
configure: clean up var style and set_all usage
Use quotes whenever possible and {} always for variables.
Replace multiple set_all calls with *able_feature().
Conflicts:
build/make/configure.sh
vp9-svc: Remove some unneeded code/comment.
datarate_test,DatarateTestLarge: normalize bits type
quiets a msvc warning:
conversion from 'const int64_t' to 'size_t', possible loss of data
mips added p6600 cpu support
Removed -funroll-loops
psnr.c: use int64_t for sum of differences
Since the values can be negative.
*.asm: normalize label format
add a trailing ':', though it's optional with the tools we support, it's
more common to use it to mark a label. this also quiets the
orphan-labels warning with nasm/yasm.
BUG=b/29583530
Prevent negative variance
Due to rounding, hbd variance may become negative. This commit put in
check and clamp of negative values to 0.
configure: remove old visual studio support (<2010)
BUG=b/29583530
Conflicts:
configure
configure: restore vs_version variable
inadvertently lost in the final patchset of:
078dff7 configure: remove old visual studio support (<2010)
this prevents an empty CONFIG_VS_VERSION and avoids make failure
Require x86inc.asm
Force enable x86inc.asm when building for x86. Previously there were
compatibility issues so a flag was added to simplify disabling this
code.
The known issues have been resolved and x86inc.asm is the preferred
abstraction layer (over x86_abi_support.asm).
BUG=b:29583530
convolve_test: fix byte offsets in hbd build
CONVERT_TO_BYTEPTR(x) was corrected in:
003a9d2 Port metric computation changes from nextgenv2
to use the more common (x) within the expansion. offsets should occur
after converting the pointer to the desired type.
+ factorized some common expressions
Conflicts:
test/convolve_test.cc
vpx_dsp: remove x86inc.asm distinction
BUG=b:29583530
Conflicts:
vpx_dsp/vpx_dsp.mk
vpx_dsp/vpx_dsp_rtcd_defs.pl
vpx_dsp/x86/highbd_variance_sse2.c
vpx_dsp/x86/variance_sse2.c
test: remove x86inc.asm distinction
BUG=b:29583530
Conflicts:
test/vp9_subtract_test.cc
configure: remove x86inc.asm distinction
BUG=b:29583530
Change-Id: I59a1192142e89a6a36b906f65a491a734e603617
Update vpx subpixel 1d filter ssse3 asm
Speed test shows the new vertical filters have degradation on Celeron
Chromebook. Added "X86_SUBPIX_VFILTER_PREFER_SLOW_CELERON" to control
the vertical filters activated code. Now just simply active the code
without degradation on Celeron. Later there should be 2 set of vertical
filters ssse3 functions, and let jump table to choose based on CPU type.
improve vpx_filter_block1d* based on replace paddsw+psrlw to pmulhrsw
Make set_reference control API work in VP9
Moved the API patch from NextGenv2. An example was included.
To try it, for example, run the following command:
$ examples/vpx_cx_set_ref vp9 352 288 in.yuv out.ivf 4 30
Conflicts:
examples.mk
examples/vpx_cx_set_ref.c
test/cx_set_ref.sh
vp9/decoder/vp9_decoder.c
deblock filter : moved from vp8 code branch
The deblocking filters used in vp8 have been moved to vpx_dsp for
use by both vp8 and vp9.
vpx_thread.[hc]: update webp source reference
+ drop the blob hash, the updated reference will be updated in the
commit message
BUG=b/29583578
vpx_thread: use native windows cond var if available
BUG=b/29583578
original webp change:
commit 110ad5835ecd66995d0e7f66dca1b90dea595f5a
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 19:49:58 2015 -0800
thread: use native windows cond var if available
Vista / Server 2008 and up. no speed difference observed.
100644 blob 4fc372b7bc6980a9ed3618c8cce5b67ed7b0f412 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
vpx_thread: use InitializeCriticalSectionEx if available
BUG=b/29583578
original webp change:
commit 63fadc9ffacc77d4617526a50c696d21d558a70b
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:38:46 2015 -0800
thread: use InitializeCriticalSectionEx if available
Windows Vista / Server 2008 and up
100644 blob f84207d89b3a6bb98bfe8f3fa55cad72dfd061ff src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
vpx_thread: use WaitForSingleObjectEx if available
BUG=b/29583578
original webp change:
commit 0fd0e12bfe83f16ce4f1c038b251ccbc13c62ac2
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:40:26 2015 -0800
thread: use WaitForSingleObjectEx if available
Windows XP and up
100644 blob d58f74e5523dbc985fc531cf5f0833f1e9157cf0 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
vpx_thread: use CreateThread for windows phone
BUG=b/29583578
original webp change:
commit d2afe974f9d751de144ef09d31255aea13b442c0
Author: James Zern <jzern@google.com>
Date: Mon Nov 23 20:41:26 2015 -0800
thread: use CreateThread for windows phone
_beginthreadex is unavailable for winrt/uwp
Change-Id: Ie7412a568278ac67f0047f1764e2521193d74d4d
100644 blob 93f7622797f05f6acc1126e8296c481d276e4047 src/utils/thread.c
100644 blob 840831185502d42a3246e4b7ff870121c8064791 src/utils/thread.h
vp9_postproc.c missing extern.
BUG=webm:1256
deblock: missing const on extern const.
postproc - move filling of noise buffer to vpx_dsp.
Fix encoder crashes for odd size input
clean-up vp9_intrapred_test
remove tuple and overkill VP9IntraPredBase class.
postproc: noise style fixes.
gtest-all.cc: quiet an unused variable warning
under windows / mingw builds
vp9_intrapred_test: follow-up cleanup
address few comments from ce050afaf3
Change-Id: I3eece7efa9335f4210303993ef6c1857ad5c29c8
Use regular extended zero bin quantizer for both inter and intra
modes in the first pass. This doesn't affect lowres and midres
significantly, but would bring back 0.9% coding gains for hdres.
Change-Id: Ifa5977fa7b141fc5be595c0f3a4fc81a93f6606f
Originally the uniform quantization function was not being
replaced with the new_quant version in rdopt when new_quant
is turned on. This fixes the bug.
Change-Id: I593793bb909e1e1a6f89544eeca6783fe0576f25
1. Add "best_mv" in MACROBLOCK to store the best motion vector
during motion search, so that we don't need to pass its pointer
to various motion search functions.
2. Declare some functions as static when possible.
3. Fix some indents.
Change-Id: I0778146c0866cbc55e245988c59222577ea8260e
Use vpx_blend_a64_hmask and vpx_blend_a64_vmask to speed up
computing the obmc predictor. Clean up calc_target_weighted_pred.
Encoder speedup: 1.3%
Decoder speedup: 6.5%
Change-Id: I0c774fe53d22399e92a10d1daf3af0010d88d2c5
Fixed best error reported by loop filter selection, this value is used
during loop restoration to pick best mode. Baseline remains unchanged,
change in BDRate for loop restoration experiment:
-0.628 -> -0.625 for lowres,
-1.262 -> -1.283 for highres.
Change-Id: I69ef1608bc232b250ac46f59e31fdbed1a999dcd
- For experiment EXT_INTERP under high bit depth.
- Add unit test to verify bit-exact.
- Speed performance improvement:
On Xeon E5-2680, park_joy_1080p_12.y4m, 50 frames, encoding time
drops from 6682503 ms to 5390270 ms.
Change-Id: Iea4debf5414f3accf1eb5672abeab56a0539ac77
* changes:
vp10/encoder/rdopt.c: make a function static
vp10/encoder/rd.c: make a function static
vp10_convolve_ssse3.c: make some functions static
vp10/encoder/bitstream.[hc]: correct a prototype
vp10/common/idct.h: add some missing prototypes
highbd_quantize_intrin_sse2.c: add missing rtcd include
vp10: add some missing includes
Use vpx_blend_a64_hmask and vpx_blend_a64_vmask to speed up
computing the supertx predictor.
Decoder speedup of up to 4% has been observed.
Change-Id: I255a5ba4cc24f78dc905d25b6e2f7fbafac13253
- Made source buffers pointers to const.
- Renamed vpx_blend_mask6b to vpx_blend_a64_mask. This is more
indicative that the function does alpha blending. The 6, or 6b
suffix was misleading, as the max mask value (64) does not fit into
6 bits.
- Added VPX_BLEND_* macros to use when needing to blend scalars.
- Use VPX_BLEND_A256 in combine_interintra to be more explicit about
the operation being done.
- Added versions of vpx_blend_a64_* which take 1D horizontal/vertical
masks directly and apply them to all rows/columns
(vpx_blend_a64_hmask and vpx_blend_a64_vmask). The SSE4.1 optimzied
horizontal version now falls back on the 2D version. This can be
improved upon if it show up high enough in a profile.
- All vpx_blend_a64_* functions now support block sizes down to 1x1
(ie: a single pixel). This is for usage convenience. The SSE4.1
optimized versions fall back on the C implementation if
w <= 2 or h <= 2. This can again be improved if it becomes hot code.
Change-Id: I13ab3835146ffafe3e1d74d8e9cf64a5abe4144d
Directly call c functions, otherwise when EXT_TX is enabled, hybrid
transform other than combination of DCT/ADST has not been implemented, thus
will cause assertion failures in the switch loops in vp10_fhtnxn_msa() and
vp10_ihtnxn_nxn_add_msa().
BUG=webm:1239
Change-Id: I2379a07e5406f9489edcd2f3205682f679c9b091
vp10_optimize_b now takes between 40% to 60% of the TOTAL runtime
of the encoder, depending on bit-rate. It also contains 2/3 to 3/4
of the mispredicted branch instructions in the whole program.
Adding a few branch hints makes vp10_optimize_b around 2-5% faster
(dependig on bit-rate) when compiled with gcc/clang.
Change-Id: I1572733e18b4166bc10591b958c5018a9561fa2b
The combination of the two experiments improves the compression
performance gains:
lowres 2.5%
midres 2.1%
Change-Id: Id26c0a9474ce08893aa1d946365c7ff850fab57a
When the prediction residuals are all zero, reset the coeff rate
cost and the distortion value to be zero. This change doesn't affect
lowres set significantly, but improves several clips in the midres
set, like sintel_480p and mobisode2_480p, by a few percents. The
average performance for midres set is improved by 0.2%.
Change-Id: Idd5ebf2652e556a1b1c569fe3c48dacef3f11c32
Use int64_t type for distortion. This avoids integer overflow
issues in the trellis optimization function in high bit-depth
settings.
Change-Id: I550c3ca9f11a3191ef8638a152887018cd476141
test/assertion_helpers.h
test/randomise.{cc,h}
test/snapshot.h
Modfiy blend_mask6_test.cc not to rely on these.
Change-Id: I88b8933fe0a729a606797e5cd421795a544c612d
This reinstates the tests from commit
efda2831e5 with the appropriate
fixes for 32 bit x86 builds.
Change-Id: Ib331906c5b448ca964895ee9cbfd4266f67d1089
- Use int32_t instead of int in vpx_obmc{variance,sad} functions
- Remove weigthed_src and obmc mask strides and assume contiguous
buffers. These inputs can always be packed as contiguous arrays.
Change-Id: I74c09b3fb3337f13d39e13a9cb61e140536f345d
Originally we need to send the refresh flag and the virtual indices
mapping for the reference frame buffer update for show_existing_frame to
have the BWDREF_FRAME replace the LAST_FRAME.
To remove sending this information, we update the the virtual indices
of the reference frame buffer after the last_bipred_frame is encoded,
and therefore the decoder will receive the updated reference mapping
at the next non-show-existing frame.
As a result, we can save 4 bytes per show-existing frame, and get 0.12,
0.2, and 0.07 BDRATE improvement in lowres, derf, and midref test set
respectively.
Change-Id: I63d41ee6ea99884798f0778b789d2701e2f2d3e0
Reject ext-inter compound modes before doing full rate distortion
evaluation, if the corresponding single reference modes had a lower
modelled RD.
ext-inter speedup up to TBD.
Coding performance: TBD
Change-Id: I358bfb879c5ebe5e7afbf6f540cc784f8de14857
This offset value related to the bit depth has been taken care of
inside the function vp10_highbd_block_error.
Change-Id: I58dd8a53380ba4529d59837e56a951bc81a2962e
Commit 0d6980d7a1 removed some use
of the skip_txfm optimization, and the rest are not productive.
The current use of this optimization is only used with --good
and --cpu-used >= 3, however the overhead of this is higher than the
speedup it yields.
Removing this, and subsequently simplifying model_rd_for_sb yields
a net encoder speedup:
--cpu-used=0 ~1.5% faster
--cpu-used=3 ~2.0% faster
The code simplification is also significant.
Change-Id: I1dd668c32de15a2e912c59c42379d0f9e1032ff8
- Fix the over-writing bug in horizontal filtering as width = 2.
- Fix 10-tap vertical filtering which no longer reads one row of
pixel above the block.
- Fix 10-tap filter zero padding.
- Encoder speed slow down ~4.0%, compared to,
81ad953 Convolution vertical filter SSSE3 optimization
Change-Id: I9bb294a4529300081c29bf284e6bc6eb081cc536
Add the ability to pick between 3 quantization profiles.
The profile is chosen based on the entropy context at the
block level.
Change-Id: Iaea0485798441b7d635962c2563f3a477f582dac
This commit refactors the transform and quantization process for
sub8x8 blocks and unifies the related functions.
Change-Id: I005f61f3eb49eec44f947b906c4e308cab9935a2
Replace the expanded zero-bin quantizer with uniform quantizer in
the recursive transform block partitioning scheme. This improves
the compression performance by 0.4% for lowres.
Change-Id: I1c32ce9ebba0f0760e36a2c5bd20f2f5887ea5b4
- Apply 8-pixel vertical filtering direction parallelism.
- Add unit tests to verify bit exact.
- Encoder speed improves ~29% (enable EXT_INTERP) on Xeon E5-2680.
- Combinational cycle count of vp10_convolve() drops from 26.06%
to 6.73%.
Change-Id: Ic1ae48f8fb1909991577947a8c00d07832737e57
This fixes the unit test failure in the 1-pass settings of
EndToEndTestLarge.EndtoEndPSNRTest
bug=webm:1243
Change-Id: I7667c341f7c063f7ffb83786446bbbd1e498c1aa
Move the reference frame type definitions to common/enums.h file.
Replace hard coded numbers.
Combine repeated definitions.
Change-Id: I288e079a03e448014cc181bcdb3f88ee8ec8d139
This commit fixes the use of uninitialized context values in the
combination of supertx and var-tx.
Change-Id: I2d36badf5c9806ea402ce3e19515cc299e6b79e8
This commit refactors the reference frame structure used in the
dynamic motion vector referencing system, and makes it support
the bi-directional reference frames. This resolves unit test
failure (enc/dec mismatch) when both are turned on.
The compression performance (ref-mv + ext-refs) is improved by
0.2% for lowres.
Change-Id: I233624d8fccc1f69e82295f94de984ff056365dc
The tile boundaries should now be respected even between tile rows.
regardless of whether ext-tile is used or not.
Change-Id: I5a39fd274451114a4264215f97f12be2c908016d
When the next two states are identical, skip repeated cost table
fetch and multiplication operations. This makes the trellis unit
about 5% faster.
Change-Id: I0dbf7ad0a5732044e4e45dd59e9431a251c678f2
- Apply signal direction/4-pixel vertical/8-pixel vertical
parallelism.
- Add unit test to verify the bit exact result.
- Overall encoding time improves ~24% on Xeon E5-2680 CPU.
Change-Id: I104dcbfd43451476fee1f94cd16ca5f965878e59
Properly restore the rate cost in the inner search loop of obmc
prediction. This avoids unexpected encoding behavior. It fixes
the unit test failure in obmc experiment:
AltRefForcedKeyTestLarge.Frame1IsKey/2
Change-Id: I667b219dfcf2f2c63d9d984900ed3cfd10c354bd
This patch removed the experiment of BIDIR_PRED and merged the feature
into the experiment of EXT_REFS:
(1) Each frame now has up to 6 reference frames, namely
LAST_FRAME, LAST2_FRAME, LAST3_FRAME, GOLDEN_FRAME, (forward) and
BWDREF_FRAME, ALTREF_FRAME (backward);
LAST4_FRAME has been removed;
(2) First pass still keeps the 8 updates:
KF_UPDATE, LF_UPDATE, GF_UPDATE, ARF_UPDATE, OVERLAY_UPDATE, and
BRF_UPDATE, LAST_BIPRED_UPDATE, BI_PRED_UPDATE;
(3) show_existing_frame==1 is supported in the experiment of EXT_REFS;
(4) New encoding modes are added for both single-ref and compound cases,
through the use of the 2 extra forward references (LAST2 & LAST3)
and the 1 extra backward reference (BWDREF).
RD performance wise, using Overall PSNR: Avg/BDRate
Bipred only Prev EXT_REFS Current EXT_REFS with bipred
lowres: -3.474/-3.324 -1.748/-1.586 -4.613/-4.387
derflr: -2.097/-1.353 -1.439/-1.215 -3.120/-2.252
midres: -2.129/-1.901 -1.345/-1.185 -2.898/-2.636
If in vp10/encoder/firstpass.h, change BFG_INTERVAL from 2 to 3, i.e. to
use 2 bi-predictive frames than 1, a further improvement may be
obtained:
Current EXT_REFS with bipred
1 bi-predictive frame 2 bi-predictive frames
lowres: -4.613/-4.387 -4.675/-4.465
derflr: -3.120/-2.252 -3.333/-2.516
midres: -2.898/-2.636 -3.406/-3.095
Change-Id: Ib06fe9ea0a5cfd7418a1d79b978ee9d80bf191cb
Inter blocks that have SEG_LVL_SKIP active must be at least 8x8 in
size for bitstream conformance (see read_inter_block_mode_info in
decodemv.c).
This patch makes the variance based partitioning scheme stop at 8x8
blocks in inter frames. This satisfies the SEG_LVL_SKIP constraint
and is more in line with the original implementation of this function
(before it got extended for 128x128 superblocks).
BUG=webm:1234
Change-Id: I1fdd894569a9c0817713a77daabe4c8b8e1d00c0
This commit takes the precise rate estimate for zero_token rate
cost update. It improves the compression performance:
lowres 0.15%
midres 0.23%
Change-Id: I36761079f75ce43c814f8c663667e359d4ac2cd4
The trellis optimization is going backward. Hence there is no need
to restore the token_cache values that is behind the current node
in the scan order.
Change-Id: I4da8a2e3f78bf9630e6667c85d8c387c5d94de9a
The test in arf_freq assumes any no-show frame as ALTREF_FRAME and
then calculate the minimum run between two consecutive ALTREF_FRAME's
based on this assumption. As BWDREF_FRAME is also a no-show frame and
the minimum run between two consecutive BWDREF_FRAME's may vary
between 1 and any arbitrary positive number as long as it does not
exceed the golden frame group interval, this test does not apply to
the experiment of BIDIR_PRED.
Change-Id: I70efb2c691fdc18601dbb8a7735ac2f27817e75a
Move the supertx skip bit and transform type past the recursive
prediction blocks. This is in preparation for using the segment level
skip feature for supertx blocks.
Change-Id: I8319414b0734144a9264e8a4a60940b6716b12a8
The old version used 64 bit loads, and then ignored the top half
of the result. This can cause asan failures if we read past the end
of a buffer. Switched to using 32 bit loads instead.
Change-Id: I57da127a26f869fb4b4f700b55408f6dc2fbbc1a
this file is shared between vp9 & vp10; this makes it available in the
presence of --disable-vp9
BUG=webm:1235
Change-Id: Iaf060c3c09afd2c7df69995b0c01589f78d4945e
The current test case is only run for vp9 and vp10 when HBD
is enabled. This was mistakenly removed in:
d53f9a3 Enable VP10 HBD PSNR checking unit test
Change-Id: I88b8168ad1efd805d759238a037653a2901bf50d
This commit refactors the trellis coefficient optimization process.
It saves multiplications used to generate the final dequantized
coefficients. It also removes two memset operations on quantized
and dequantized coefficient sets.
The trellis coefficient optimization is on average running over
10% faster.
Change-Id: If3aa26d2a706c3012bf2b7ac059bf1825250e81f
This fixes some crashes in
VP10/EndToEndTestLarge.EndtoEndPSNRTest/ with high bit-depth and
ext-inter.
Change-Id: I10f0f08e1be4bd5c388616074d4aa3f91a2fda7a
This commit reworks the transform and quantization unit. It enables
the use of adaptive quantization for intra modes. This further
improves the compression performance:
lowres 0.36%
midres 0.79%
hdres 0.73%
The key frame coding performance is improved:
lowres 1.7%
midres 1.9%
hdres 3.3%
The overall coding gains are:
lowres 1.1%
midres 1.8%
hdres 2.3%
Change-Id: Iaec1a3a4c1d5eac883ab526ed076d957060479dd
Move filter intra modes search to the end, after regular
mode search.
On average no performance changes.
Change-Id: I9293c8fdf706ebf831fbd61c6bb81959790f4848
The speed feature sf->lpf_picl == LPF_PICK_MINIMAL_LPF is used
to disable loop filtering. This did not work with the loop-restoration
experiment, but now it is respected.
Note that this speed feature is only used in real-time cpu-used >= 8
settings.
Change-Id: I193723c9ac5f802ec31d8c8b4d37650796e065fd
mi->stride now depends on the maximum superblock size, and hence
the constant 8 padding is no longer appropriate. Traverse the array
using mi->stride instead.
Change-Id: I8e84b9fe1728f6663f8c10765fe32206375f1e71
Segment based loopfilter strength for supertx coded blocks is now
selected based on the minimum of all segment IDs within a supertx
coded block (same as the quantiser settings).
Change-Id: Ib056bd0d05f6a1d3b512a76deb4e2ad4db0f7dc4
Segment level quantizer settings for supertx coded blocks are now
selected based on the minimum of all segment IDs within a supertx
coded block.
This also fixes the 3 adaptive quantization modes with supertx.
Change-Id: Ib5db099539d4f82f240e1d745d6e5264f8b34cde
Exit early from function when supertx is used, rather than putting
the bulk of the function body in a single conditional.
Change-Id: I41f388a45bd46e4a6ee1c51f26782ed9bddff4e5
When using VARIANCE_AQ, we can change the segment assignment after
initialising the quantiser in set_offsets, so re-initialise it when
we do so.
Change-Id: I1f168553aaf0ade419f0d4bf05820cd591b87659
Explicitly signal when the segment map is being refreshed when
using VARIANE_AQ. This simplifies decisions about when the segment id
needs to be set from the previous segment map vs based on the current
variance.
Change-Id: Ieb12c950e9cfbc3f53f4d184880071dea805563c
This commit makes the dual filter experiment work with non-420
settings. It fixes unit test failure in EndToEndTestLarge.
Change-Id: I04f7afdee78f91389d9ff72947efa152098af930
Revisit the compression performance and complexity trade-off after
making the SIMD version of trellis optimizations. Before that,
reduce the transform-quantization function calls temporarily. This
would cause about 0.3% performance drop for lowres set.
Change-Id: I16917a6bd5c44ec6cd8cd0b59f3c336c4fd96dd2
This commit combines uniform quantizer with trellis based coefficient
level optimization. It improves the codebase compression performance:
lowres 0.8%
midres 1.0%
hdres 1.6%
Note that the current trellis optimization unit is using C code. This
will make the cost of the overall quantization process slower. A number
of optimizations will come up next.
Change-Id: Id441dd238e4844409d0f08f82604be777f3f5282
This experiment implements non-uniform quantization where
the width of the bins increases gradually to more closely
match a laplacian distribution of the coeficcients.
Performance Gain:
derflr: 0.15%
hevcmr: 0.675%
Change-Id: I25234244e3bcd94b87c1f77cf682190b61c8ef94
This reverts commit f19700fe52.
This crashes in SSE2/SumSquares2DTest.RandomValues/0 under x86 due to
alignment issues
Change-Id: I135d83ba6a7894c09d7c7a139b7eaf876416b40c
This reverts commit efda2831e5.
This commit causes segmentation fault at SSE2/SumSquares2DTest.RandomValues/0
Change-Id: I171937e4daf6f15323e8206418773deb03bd8c53
This test is failing when no experiments are turned on. PSNR is
31.96 when the threshold is 32.
broken since:
0d6980d Remove swap buffer speed feature
Change-Id: I3c29815b40d5282c37f52f4345b56992f8558b2e
The assumption doesn't hold true in the current codebase. Remove
this speed feature to simplify the codebase.
Change-Id: I9b69f484c9b7cd612b825047cc5b2fce63ee0af7
The inter prediction residual can undergo different transform types
during the rate-distortion optimization search. The assumption used
in this speed feature no longer holds true. This commit removes the
related code to clean up the codebase and clear out unit test
failure in higher speed setting.
Change-Id: I7f7cd4df2345ed3e607c9fae75b38cd2dbde0cac
If it's creating problems with some experiments, disable it under the
actual conditions where it doesn't work and file a bug.
Change-Id: Iab9f4bfe42ea926d49d371918da25f9a8938a20f
This commit re-works the transform type speed feature. It moves
the transform type selection outside of the coding mode loop. This
avoids repeated motion search if the best prediction mode is
chosen as NEWMV. It improves the speed performance for clips that
contain more motion activities.
For mobile_cif at 1000 kbps, this makes the baseline encoding 7%
faster and makes the encoding with dynamic motion vector referencing
scheme enabled 10% faster.
Change-Id: I93e2714b3e461303372c4b66a4134ee212faffd1
This patch will make sure the use of the BWDREF_FRAME for the
encoding of both the two types of bipredictive frames, namely
LAST_BIPRED_UPDATE and BIPRED_UPDATE. To realize it, the
updates on the cpi->ref_frame_flags have been moved to before
the encoding of one frame, instread of originally handled after
the encoding of one frame.
RD performance has been improved slightly, approximately by 0.17%
compared to before the applying of this patch:
lowres: Avg -3.474; BDRate -3.324
derflr: Avg -2.097; BDRate -1.353
Change-Id: I0aa19afd752293e345489fbff104c4351ca5498c
The segment counts are computed as part of packing the bitstream,
so they might have been computed already in the recode loop. Zero
the accumulator to avoid double counting.
This fixes some encoder/decoder mismatches.
Change-Id: Ib7816034cbbb1db41101116b706302b02fad3a2c
1. Wiener restoration filter now has normalization and evaluation of
quantization procedure.
2. Corrected scaling of bits in RD cost computation.
3. Changed dynamic range and number of bits for Wiener filter.
Observed gains: Overall 0.58% for low_res, 0.7% for mid_res sequences.
Change-Id: I8928b3ea493bfe1790926b00388d6c4bafc08e19
We can optimize wedge partition selection by pre-computing the
residuals of the 2 underlying predictors, and then blend these
to compute the sse of the compound predictor, without actually
having to compute and subtract the compound predictor.
Similarly we can pre-compute a proxy array which we can use to
cheaply check which mask sign would have lower sse.
Details are in wedge_utils.c.
Mathematically these are equivalence transformations, but due to the
finite precision the encoder output will be perturbed, though on
average this should make 0% difference.
ext-inter gains about ~4.5% speedup.
Change-Id: Ib2657c3209ae161b4090b58b4b6c392641bf2792
xd->plane[0].n4_h and xd->plane[0].n4_w are not set at that point
when using supertx.
While this fixes the immediate crash described in the referenced
bug report, there are still issues in the ref-mv experiment that
causes these tests to fail, so they are kept disabled.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1230
Change-Id: Ibf8ef02847a903f8d10e6be28e16694db10c75af
The transform can only be skipped if both Y and U/V can be skipped, so
we always include the cost of tx size in the rate for Y. This will
get later subtracted if the transform is actually skipped.
Change-Id: I136a223e5596f18b69bb9f743e7e08438183a215
We used to cache the cost of the UV mode from the search with a
different previously tried Y mode, but the UV mode is contexted
on the Y mode, so caching the cost is inaccurate.
Change-Id: Ib003510afb6fc9befb7808b67b0be64f1c0a0804
This patch factors in the different partition coding syntax used for
right and bottom edge blocks when doing RD search.
Change-Id: I2f31650512b6a4a7a2c03352414693aff6fbf87b
This is purely a refactoring patch and has no functional effect.
Uses of these masks can be arranged such that all input blocks are
contiguous in memory (stride == block width). In this case 1D versions
of operations can be used. 1D vector operations have superior performance
over 2D block equivalents as they are more processor cache friendly and
they can do away with a second loop overhead.
Change-Id: I2b76c9888aea2c857cc497e8a4b2841fd3dad54e
Use the same rounding method that is used throughout the codebase,
where the halfway value is rounded up rather than down.
Change-Id: I04e92850bc69a7d7a07b06e3d2ce97f6f2ada321
Seperate prediction mode and tx type search for inter
modes. Enabled for speed >=1.
baseline:
speed increase 40%
compression drop 0.30%/0.29% on lowres/midres
ext-tx:
speed increase 160%
compression drop 1.08%/0.95% on lowres/midres
Change-Id: Ieb34b1ee80df6980d16e26a5783e08cc0deae55b
Add a speed feature to seperate prediction mode and tx type search
for intra modes: search for best intra prediction mode with fixed
default tx type first, then choose the best tx type for the
selected mode.
Coding performance drop:
baseline
lowres 0.10% midres 0.08% hdres 0.14%
with ext-tx
lowres 0.14% midres 0.25% hdres 0.20%
Speed improvement is 20% for baseline and 17% for ext-tx.
It is turned on for speed >= 1.
Change-Id: Ia5e8d39e8a4e2e42c521bfde938f8b6a98ab24f9
This is for the bidir-pred experiment. Previously the length of the
bi-predictive frame group interval is fixed at 2, i.e. one
bi-predictive frame may be inserted every other frame. This patch
makes the length adjustable, i.e. any positive number may be
specified, but the use of the backward ref will be turned off if the
bi-predictive frame group interval is larger than the golden frame
group.
Further, an additional rate factor level has been added:
INTER_LOW
, which applies to LAST_BIPRED_UPDATE frames that are not used as
references.
Change-Id: I5514d34a64dd486bbb5756c2d0612946f598a789
input_, ref_input_ and output_ were being allocated with new[] followed
by vpx_memalign, remove the former
Change-Id: Ia16d0f9b9317042a24445095ad3c284f4e7bb481
Major parts have been implemented as follows:
(1) Added BRF_UPDATE, LASTNRF_UPDATE, and NRF_UPDATE in firstpass.c;
(2) Added the handling for the scenario of
"cpi->common.show_existing_frame == 1" at the encoder;
(3) Added a new reference frame of BWDREF_FRAME;
(4) Have bwd-ref work with upsampled references.
Note that when the experiment of "ext_refs" turned on, this experiment
will be turned off automatically currently.
RD performance in Overall PSNR has been improved, compared against the
VP10 baseline:
lowres: Avg -3.312; BDRate -3.154
derflr: Avg -1.927; BDRate -1.176
midres: Avg -2.149; BDRate -2.001
hdres : Avg -0.567; BDRate -0.588
Change-Id: I4c06ff51cc20194bffbd4d2346e57ba3dcf6b62c
Removing redundant calls to memcpy from
build_wedge_inter_predictor_from_buf yields a net 4% encoder speedup
with ext-inter only. The output is identical.
Change-Id: If97d4e323a5c8aca90c84a25a72085e006b05446
This patch always compares the most recent show frames between
the encoder and the decoder to test the mismatch.
Change-Id: I68a91ad0996a598231450debfd616e24992419b5
This is to replace vp10/common/reconinter.c:build_masked_compound.
Functionality is equivalent, but the interface is slightly more
generic.
Total encoder speedup with ext-inter: ~7.5%
Change-Id: Iee18b83ae324ffc9c7f7dc16d4b2b06adb4d4305
This commit makes the filter extension in highbd aware of the
dual filter and ext-interp experiments to prevent enc/dec mismatch
when both experiments are turned on.
Change-Id: I11ac1f041bd5f73d61e839d6386d9c5d008da3f7
This commit makes the sub8x8 chroma component inter predictor
operate at 2x2 block level. This allows one to use the actual motion
vector associated with each individal pixel block. It improves the
compression performance
lowres 0.40%
midres 0.25%
hdres 0.15%
Change-Id: Ia40e07cc7fde463dbf660018850e024932136c4f
Use the same rounding method that is used throughout the codebase,
where the halfway value is rounded up rather than down.
Change-Id: Ie969ed7eb9fcc88a93a90c7e274fd82f336c7e4d
With ext-interp, a switchable interpolation filter is coded iff the
motion vector uses fractional pixel movement (ie, true subpixel
movement). With ext-interp and obmc enabled at the same time, the RD
search proceeds as:
1. Do motion search
2. Do interpolation filter search iff subpixel motion, otherwise use
EIGHTTAP_REGULAR
3. Evaluate obmc=0
4. Evaluete obmc=1 - This involves another motion search
If the motion search in step 4 yields an integer motion vector, while
the search in step 1 did not, then an interp_filter value other than
EIGHTTAP_REGULAR is invalid, and will cause an assertion failure
at output time, or a mismatch if not using --enable-debug.
The fix sets the interp_filter to EIGHTTAP_REGULAR if obmc=1 is picked
with an integer motion vector.
Change-Id: I4685d1ad537f41d833dc9eb64845956b67886cca
- Confirm input coeff buffer is 16-byte aligned.
- sizeof() prefer variable name instead of type.
- Fix function name (Capital first letter then Pascal case).
- Long base class name uses a newline (with colon and 4 space indent).
- Remove a unnecessary reference function variable.
- Method declaration precedes variable declaration in class definition.
Change-Id: I317f7e679926b5219f58c5f7d14512e94985e7fe
If a reference block is coded with sub8x8 block size, and if it
has sub-pixel level motion vectors, its prediction filter type
should be used as context information.
The coding performance gains of dual filter type coding scheme are
lowres 0.57%
hdres 0.88%
Change-Id: I68b98f2518d02f11c29d0256aeb45b2580fe5cac
This commit reworks the probability model contexts used in the
prediction filter type entropy coding.
Change-Id: I7abc68cb469248d0d7ca1046da3c086ecb7b066a
- Integrate 5 flip transform types for each 4x4, 8x8, and 16x16
block, for experiment, EXT_TX.
- Encoder speed improves about 12%-15%.
- Update the unit tests for bit-exact result against C.
Change-Id: Idf27c87f1e516ca5b66c7b70142477a115404ccb
Use model for interintra mode search.
Speed-up about 5-10% with about 0.04 drop in efficiency.
lowres: -2.60%
Change-Id: I825bf0ba8a46eb7f19fc528c25b8df066fb8ea95
minus the non-existent nonrd portion. original change:
commit d642294b1c
Author: Jingning Han <jingning@google.com>
Date: Thu Feb 11 12:36:49 2016 -0800
Fix tsan error in VP9 sub8x8 intra mode search
This commit fixes issue 1141. The issue was triggered in multi-tile
encoding. The change properly saves and restores the block context
information in the real-time mode selection process. It removes
several redundant memcpy operations in sub8x8 intra block mode
search.
Change-Id: I35c9ad197f4bd500ec39b5fc833f052f19eee010
Change-Id: Idfa38c54c9e645479f6870d46f71fb1e91c071da
For the current stage, we assume a single prediction filter type
per direction in the settings of compound inter prediction modes.
Change-Id: I12a1afdd364b93fcee870bd11ad01fc40ab48cff
Increases number of wedges for smaller block and removes
wedge coding mode for blocks larger than 32x32.
Also adds various other enhancements for subsequent experimentation,
including adding provision for multiple smoothing functions
(though one is used currently), adds a speed feature that decides
the sign for interinter wedges using a fast mechanism, and refactors
wedge representations.
lowres: -2.651% BDRATE
Most of the gain is due to increase in codebook size for 8x8 - 16x16.
Change-Id: I50669f558c8d0d45e5a6f70aca4385a185b58b5b
The amount of border extension needed in the first stage inter
filtering is decided by the length of the second stage filter
kernel.
Change-Id: Icddbc58c02234d5df09ff0eeebcf166ffe689203
* changes:
Refactor and add flip unit test to vp10_inv_txfm2d_test.cc
Add flip feature to vp10_inv_txfm2d.c
add unit test for highbd flip transform
Refactor vp10_fwd_txfm2d_test.cc
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Encoder overall instruction count drops 2.91%.
- Decoder overall instruction count drops 1.01%.
- Add unit test to test bit-exact result against C.
Change-Id: I908c9e0e5106c58f67dd72d28760e6c9ce54278e
Avoid accessing transform type that is not 2D-DCT if the transform
size is 64x64. This fixes an assert failure in this unit test.
Change-Id: I0dee865ea0925f5743b8a25c2f90eb6522b4d272
This commit unifies the inter predictor used in supertx at both
encoder and decoder sides. It removes the redundant decoder
implementations related to border extension.
Change-Id: I03985cee52604a518394232fa9258ce057af9c00
This commit unifies the inter predictor used in the encoder and
decoder side for super-tx experiment. This resolves an enc/dec
mismatch found in nextgenv2 nightly-run unit test.
Change-Id: I16ab8d6063edf9d2fba79473f470f1a592cc10a0
In the tile-coding experiment,
1. In tile decoder, added 2 set control APIs:
VP10_SET_DECODE_TILE_ROW and VP10_SET_DECODE_TILE_COL. It allowed
users to set the range of decoding at frame level.
2. Added a unit test while tile-coding experiment is on. It tested
both tile encoder and decoder to make sure the encoded frame
can be decoded as a whole frame or as independent tiles.
Change-Id: I73fd0632b685047cb9376008127cde72efa3fb2b
Fix the variable type mismatch between highbd_calc_masked_var_t and
the actual function definiton. This clears the related compiler
warnings in highbd with ext-inter experiment.
Change-Id: I0423318b16c867ed84700084ad21ca6e42edb321
Improves performance by 0.2%
lowres: -2.052% BDRATE
Also increases precision of the shift parameters (for further
investigation into different wedge shifts).
Change-Id: I59fcab9baa002e52a6487ed8d617185840a678ed
Weighted single motion search is implemented for obmc predictor.
When NEWMV mode is used, to determine the MV for the current block,
we run weighted motion search to compare the weighted prediction
with (source - weighted prediction using neighbors' MVs), in which
the distortion is the actual prediction error of obmc prediction.
Coding gain: 0.404/0.425/0.366 for lowres/midres/hdres
Speed impact: +14% encoding time
(obmc w/o mv search 13%-> obmc w/ mv search 27%)
Change-Id: Id7ad3fc6ba295b23d9c53c8a16a4ac1677ad835c
Functions vp10_fwd_txfm2d_#x#_sse4_1 tested in this file
will be tested in vp10_fhts#x#_test.cc
Remove this to avoid duplication
Change-Id: Iaf21ab85b9a164fcf2a4574b3e13217e43b6255e
NEW_QUANT allows bin widths to be modified as a factor of the nominal
quantization step size. This adds functions to get dequantization
values based on the dequantization offset and 3 knots for a single
quantization profile.
Change-Id: I41f10599997e943cb3391c7a0847d8485b9d8b43
Improves speed by about 10-15% by combining y-only rd with
modeling function in a better way.
Also, coding efficiency improves by about 0.1%
lowres: -1.805% BDRATE with ext-inter
Change-Id: I6ef1f8942ec6806252f3fcf749ae4f30dffe42b1
When not using ext-tile, there were still dependencies between tile
rows due to various tools (eg intra predictors) relying on the above
row or above mode info, which can be in the above tile. This is now
broken (the same way as it was when ext-tile is enabled) by fixing
the appropriate predicates.
Change-Id: I107dd0d8481775a792f14e05cfbbd761f16cdc1e
When constructing the intra predictor for rectangular interintra blocks,
the last row/column of the first square is copied back into the source
image (which is the current reconstructed image buffer) before
predicting the second square. The code used to use the height instead
of width for vertical rectangles, and vice versa for horizontal
rectangles, leading to overwriting the block on the right/below. This
leads to an encode/decode mismatch if the right/below block is in a
different tile and is encoded before the current block, which did happen
with multi-threaded encoding tests. This is now fixed.
Change-Id: I073a2a447a98b842b1394d72cc774a78cb296921
This change has no performance impact. It prepares the proper
function interface for better performance optimization.
Change-Id: I12e2f2deaf7f3adc603de0a74852116468c762f6
In VP10, REFRESH_FRAME_CONTEXT_OFF mode is only set when the error
resillient mode is on. Instead of being used to decide how to update
the frame contexts, it is used to decide if or not to reset the
frame contexts.
To verify, ran borg test on lowres set. The result is neutral.
Overall PSNR: -0.006%; SSIM: -0.006%.
Change-Id: Ic48265cf7488e80c6f5aab3eef7ba1c273506419
The original pruning function was not taking into account
that certain tx sizes/block sizes use a reduced tx set.
Prune 1: -0.3% performance drop, 20% speedup on foreman video
Prune 2: -0.48% perfomance drop, 30% speedup on foreman video
Change-Id: I557e919d97a89f787b47b3c8579a080db57f91d0
Without this patch, the experiment of ext-refs showed almost no coding
gains compared to the baseline. This is because when ext-refs is on, the
use of upsampled reference is off.
With this patch, the ext-refs experiment works with the upsampled
references and shows coding gains in Overall PSNR as follows, with ~5%
slow down for encoding time:
lowres: Avg - -0.965; BDRate - -0.844
derflr: Avg - -0.847; BDRate - -0.669
Note that the previous patch a912c6ec31
that "Make LAST_FRAME always point to the newly coded frame in ext-refs"
made ext-refs work with the upsampled refereces.
Change-Id: Id79248d71760109fb9198af4f45718b17455555f
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Update vp10_fht16x16_test.cc to do bit-exact test against
latest C version.
- HBD encoder speed improves ~1.8%.
Change-Id: Icfc799a212e5289bcf6cedcae3722032133a2bc6
This commit fixes the compiler error in high bit-depth inter
predictor when dual filter type experiment is turned on.
Change-Id: I404a76a246477f2fcffc38a3275007d5dfe229cd
This patch changes the encoder only for the ext-refs experiment. For
each newly coded frame to refresh the LAST_FRAME, the decoder is
notified that the LAST4_FRAME is to be refreshed, and read out the
updated reference frame buffer vitural indexes for the next coded
frame in a way that:
LAST4_FRAME => LAST_FRAME,
LAST_FRAME => LAST2_FRAME,
LAST2_FRAME => LAST3_FRAME, and
LAST3_FRAME => LAST4_FRAME.
Compared against the original ext-refs experiment in TOT, a small gain
is achieved in overall PSNR:
lowres Avg: -0.154
lowres BDRate: -0.044
Change-Id: I648810c146a3cd915b408274a9373b7d38324864
Make the bit-stream level support per direction filter type coding
for motion compensated reference.
Change-Id: I61a2360b301075f6734cfd9711b7ae68f214174d
Use clear and correct type/function names.
Add ASM_REGISTER_STATE_CHECK wrapper for SSE4.1 function.
Conform macro EXPECT_EQ(expected, actual) convention.
Change-Id: I26c6430bea98a4fcb9727eb411b86a3b7abce933
The word 'pick' is usually used in functions that make decisions where
the bitstream allows multiple legal choices, and not to limit the
bitstream format itself.
Change-Id: Ia60709c29e004475e1aa8861aefded27ebaf4712
This experiment will implement the use of a backward prediction
reference without temporal filtering. No overlay frame will be
transmitted, instead, the flag of show_existing_frame will be turned
on.
Change-Id: I361a3004344e2ca6b63723f660635c0d790ee036
The encoder signals the interp filter type in the frame header if all
blocks use the same filter (see bitstream.c:fix_interp_filter). This
decision is made based on the counts, but with ext-interp, the counts
are actually only incremented for blocks that fail vp10_is_interp_needed
(see for example encodeframe.c:update_state), otherwise a default value
is used (EIGHTTAP_REGULAR). The decoder however first checks if the
interp filter is signaled at the frame level, and uses that filter type
for all blocks, even if the default value should have been used.
This patch makes the decoder first check with vp10_is_interp_needed
to see if the default value should be used and then checks the frame
level signaling, which reconciles the difference between encoder and
decoder.
Change-Id: I87857ade42dea06b0d5ec2a029e9219268334dbb
The test used to test that multi-threaded encode/decode resulted in
the same reconstructed image as single-threaded encode/decode. This
however did not mean that the multi-threaded encoder produced the same
bitstream as the single-threaded encoder, as the multi-threaded encoder
could use different forward probability updates and still produce a
bitstream that is sub optimal but yields the same reconstructed image.
The test now asserts that the bitstream is the same as well as the
reconstructed image. Also added more cpu-use values for testing VP10.
Change-Id: I324ed33a702c488b39e077f750d81a1ad1d7ea87
General code cleanup, but also use the same supertx condition for
ext-partition-types as for conventional partitions.
Change-Id: If86eb18b3c07b9c60434eec2c98b97ce93665b67
Factor out common codes from vp10_get_pred_context_intra_interp().
This prevents a potential invalid access of pointers xd->left_mbmi
and xd->above_mbmi.
The coding statistics are identical.
Change-Id: I72dbf9380da7359b997bbe925010faab8e9e7f8d
"qc" in vp10_token_state is used to save quantized coefficients, this
commit changes the type from short to tran_low_t to properly reflect
the value range for highbitdepth build.
This fixes an out-of-range bug when optimize_b is used in highbitdepth
build.
Change-Id: I914c6fd3d3f4b9d061f9ed7cc5f08a883ab59dcd
This is the set of 1D transforms that are used in each
ext_tx_used_inter set. The 1D sets will help speed up
the ext tx pruning functions.
Change-Id: Ib46ad26be2df60b3bfcd2f22d96e7f38ae286df5
This ensures the multi-threaded and single-threaded encoder/decoder
always uses the same probability contexts.
Change-Id: I6f1e7c6bd8808c390c1dc0a628ae97db3acedf6d
Decoding superframes correctly requires computing the end of the
frame contents in the bitstream precisely. This patch enables
ext-tile to do so.
Also extended superframe_test to test with multiple tiles if using
ext-tile.
Change-Id: I04bb8cde8755a3d764ee3c36aa8b7a6c5c9db742
Tile rows should now be independent, so make pbi->inv_tile_order
invert the decoding order of tile rows as well as tile columns.
This should improve test coverage. Also added more tile configurations
to the tile_independence_tests.
Change-Id: I14b0f2fa9241c1acaf9e2a07071952cb33feca77
With ext-tile enabled, the encoder test driver needs to configure the
tile sizes wit different values to encode using a single tile, and to
decode all tiles. This should fix most unit test failures.
Change-Id: I0a0d26737414669791f3bd8d80c537db09f06072
We enable this unit test to protect the high bit depth picture
quality from degrading under VP10 optimization and new experimental
tools.
Change-Id: I6297a44cf01954773e06549ab2a68c319fc848a8
Fix 1: in ext-inter + obmc config, properly identify if the left
predictor used for obmc is a compound one in the case that the
neighbor uses wedgeinterinter pred and we will dump the ALTREF part.
This will fix the seg fault in unit test:
VP10/AltRefForcedKeyTestLarge.Frame1IsKey/0
Fix 2: in ext-tile + obmc experiment, handle the case that the
above block does not fit in the same row tile with the current one,
so as to prevent potential crashes.
Change-Id: I1c177d4f4ad15e10d11d8756e146496437753eea
This commit fixes an encoder segment fault in the codebase, when
the segmentation feature is turned on. The issue was introduced in
5cce322 Porting ext_partition experiment from nextgen
Change-Id: Ifb4c06c5a6976114a8bd061d40d0338a136abaaf
- Tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Update bit-exact unit test against current C version.
- HBD encoder speed improves ~3.8%.
Change-Id: Ie13925ba11214eef2b5326814940638507bf68ec
Removes integer divides from backward updates for VP10.
Currently this is put in as part of the entropy experiment.
Coding efficiency change is in the noise level.
Change-Id: I5b3c0ab6169ee6d82d0ca1778e264fd4577cdd32
With ext interp, write_switchable_interp_filter calls
vp10_is_interp_needed, which needs access to the reference frame
buffers to check if they are scaled, the ref frame buffer pointer
at this point used to be uninitialized in the encoder resulting in
bitstream syntax mismatch when the encoder/decoder did not read/write
the interp filter element consistently.
Change-Id: Ie0be2a19cbfcb5639a751aa857458e91c23b8fe3
Valgrind flags these up as needed by handle_inter_mode.
Initializing fixes some assertion failures in the unit tests with
only ref-mv enabled.
Change-Id: I4d56c356692745dbecd9f790cdbb8dbfbaf72d55
Remove the restriction that the neighboring predictor cannot be
used in obmc prediction if it is an interintra or wedgeinterinter
block. The inter predictor of the interintra block, or the first
inter predictor(using LAST or GOLDEN frame) of the wedgeinterinter
block will be exploited in obmc prediction.
Coding gain: 0.248% (2.833%->3.081%) lowres
Change-Id: I4ac0368b9d2f2956f266b30c1ac97db8bafa0742
The bug is introduced by commit 1a0352d, in which mv costs are
counted twice in joint_motion_search() in ext_inter experiment.
Change-Id: Ibace453df999d3c2e781d73f1f0912038fee2d4e
This commit enables 1/8 luma component motion vector precision
for all motion vector cases. It improves the compression performance
of lowres by 0.13% and hdres by 0.49%.
Change-Id: Iccfc85e8ee1c0154dfbd18f060344f1e3db5dc18
Reduce transform set for intra for 8x8 and smalller to 7 from 12.
Also fixes an issue with prob updates.
Enocder Speed-up about 8-10%
Coding efficiency very little change.
lowres: -2.996 (from -3.055 before)
midres: -2.482 (from -2.552 before)
Change-Id: I4ba50ff967521b33c748fe423bd92f7cf4105ebc
MMX and X87 floating point instructions cannot be mixed freely on
the 32 bit x86 architecture.
This fixes a lot of unit tests in the 32bit build with
--enable-ext-intra.
BUG=https://bugs.chromium.org/p/webm/issues/detail?id=1196
Change-Id: I0e1c3565f4b9cb4fc2d716e94d9c40e68b36fac8
- Optimization on tx_type: DCT_DCT, DCT_ADST, ADST_DCT, ADST_ADST.
- Overall encoder speed improves ~4.5%-6%.
- Update bit-exact unit test against current C version.
Change-Id: If751c030612245b1c2470200c9570cf40d655504
This commit fixes an out-of-bound memory access case in the
loop filter mask setting. This issue was introduced in
10232ed Refactor loopfilter level arrays to 2D.
https://chromium-review.googlesource.com/#/c/336645/
Change-Id: I7101a4a79b9ecfdd8ec5ef13a0b314cc95f48d12
This commit fixes an encoding decision process issue that could
trigger enc/dec mismatch in the ext-inter experiment.
Change-Id: I6f10d1fd2fd1aa04e51df04c39a65cf72ac66c42
Unit test shows manually developed SSE4.1 code would performs ~30%
better if TXFM_2D_CFG configuration is set in lower level. This
change only updates function signature. There is no performance
impact.
Change-Id: I62692bd50a21ffc8a944bbd6c155c0a2020ad77b
This will facilitate bringing the zero node into the token set while
allowing its probability to vary independently.
Change-Id: I57b44c0fce44debb8e612021e44713b229d1b3cf
x->blk_skip used to be uninitialized (leftover from encoding the
previous block), if cm->tx_mode != TX_MODE_SELECT (which is used with
higher --cpu-used or --rt options). This resulted in degraded coding
performance when using cm->tx_mode != TX_MODE_SELECT.
This fixes the VP10/EndToEndTestLarge.EndtoEndPSNRTest/40 unit test.
Also fixed an edge effect where encode_block in encodemb.c used the
formal width of the block (without cropping at the right edge), to
look up blk_skip, while select_tx_block in rdopt.c used the cropped
width to set blk_skip.
Change-Id: I76d0f49ac5ab3ab54203573e0d7fcfcc1c6aa10d
x->blk_skip used to be uninitialzied (leftover from encoding the
previous block), if cm->tx_mode != TX_MODE_SELECT (which is used with
higher --cpu-used or --rt options). This resulted in degraded coding
performance when uning cm->tx_mode != TX_MODE_SELECT.
This fixes the VP10/EndToEndTestLarge.EndtoEndPSNRTest/40 unit test.
Change-Id: If39062927446798c626fc93694b4e6a4f35fa5da
This commit handles the zero motion vector residuals for single
and compound reference modes, respectively. It improves the coding
performance by 0.13% with no additional encoding complexity.
Change-Id: I16075a836025bd2746da2ff4698fb9261e4b08c1
We removed this adaption, which intended to reduce the size of
overlapped region if the neighboring block is a non-skip one. Thus,
now the width/height of the overlapping region is fixed as a half of
the current block.
Performance improvement (lowres/midres): 0.111%/0.102%
Change-Id: Ife75dad9d4eb355c78a05178b50cc015c442884f
This commit re-arranges the transform type and size selectio
process. It removes an unnecessary rate-distortion cost computation
step. Local experiments show that this speeds up the encoding
process by 6% for both the baseline and the ext-intra experiment.
Change-Id: Iab3b86a63a1e9e55548466791ed5d29a0575c1e7
This CL fix the bug
rdopt.c:1687: choose_tx_size_from_rd: Assertion
`mbmi->tx_type == DCT_DCT' failed
It is caused by
1) mms register access before double operation
2) different compiler behaviors
code:
int64_t a = INT64_MAX;
double b = 1. * INT64_MAX;
printf("a < b: %d\n", a < b);
result:
a < b: 0
code:
--target=x86-linux-gcc
int64_t a = INT64_MAX;
double b = 1. * INT64_MAX;
printf("a < b: %d\n", a < b);
result:
a < b: 1
I remove the double operation and test it with EXT_TX experiment.
The psnr change is around 0.05%, which is considered as noise level.
Change-Id: If8935c70c8603617fcfa8571accd30ccdda786a0
In the rd loop, check the perf of obmc, whose mv is copied from regular
inter predictor, when wedge interinter is better than regular inter
(previously it will force allow_obmc = 0). The condition of the early
termination before this step is relaxed to avoid skipping too many obmc
predictions. The rates of the overhead are properly calculated for these tools.
The logic of the bitstream syntax:
(a single ref) the interintra flag is sent first, only if it is 0, we
send the obmc flag;
(compound refs) the obmc flag is sent first, only if it is 0, we send
the wedge interinter flag
Coding gain
lowres: 0.428% (2.287%->2.715%)
Change-Id: I5f3a34640b398e313cbf84235c9fe2073eb2173f
- Implemented Angie's new fwd txfm algorithm.
- Improve ~100% than last 64-bit version; 3 times faster than
original C code.
- Passed bit-exact unit test.
Change-Id: Ica30b9768706604a6d69fe42da778441f0f5f02e
With ext-ref enabled, it is possible that when trying to encode the
first true ALTREF frame after a keyframe, the previous ALTREF frame
(alias for the keyframe) is the same as one of the new LAST{2,3,4}
reference frames, and hence cpi->ref_frame_flags will have the ALTREF
bit clear, as computed by get_ref_frame_flags in encoder.c.
sf->alt_ref_search_fp forces the previous ALTREF frame to
be used as the only possible reference when encoding a new ALTREF
frame, but due to cpi->ref_frame_flags, some buffers will not be
initialized (see rdopt.c:7689 yv12_mb), leading to a segfault.
get_ref_frame_flags in encoder.c has been changed to prefer to keep
the LAST frame, then the ALTREF frame, then any of the LAST{2,3,4}
frames and then the GOLDEN frame in that order of preference in case
any of them are the same. This avoids the segfault and behaves the
same for the baseline.
Change-Id: I4da1991667614009da5d3061a6316c0d5dbc6c0c
This avoids repeatedly checking the candidate motion vector
precision level at the decoder end. The compression performance
varies at 0.01% level.
Change-Id: I4a88e95decd900d0cac9a0c2e70ba43ef7ecac38
* changes:
Branch dct to new implementation for bd12
Change dct32x32's range
Fit dct's stage range into 32-bit when bitdepth is 12
Pass tx_type into get_tx_scale
Skip transform type search in modes with ref_mv_idx > 0. This
brings down the additional encoding time cost due to the DMR system
from 32% to 17%, at minimal coding performance regression.
Change-Id: Ie82e1d2831a313c6f1e47f7da221b51345023eb3
Fix several use cases where MAX_MV_REF_CANDIDATES is mixed up with
is_compound flag to avoid potential coding interruption.
Change-Id: Ifdee1ef8a81ef6d1c155315c6c6a3074aa7a8b5e
Do not consider 4x4 transform when the maximum possible transform
size is 32x32.
Overall encoding speed is increased by more than 10%. Compression
performance is neutral on lowres, midres, and hdres.
Change-Id: Ifac61c3c9f4b0ab392bffd4d1faa373d91014cf1
The VP8_EFLAG_NO_UPD_LAST and VP8_EFLAG_NO_REF_LAST flags can be
passed to the encoder to signal that it should not update/reference
the LAST ref frame when encoding the current frame. With
--enable-ext-refs turned on, the new LAST2 LAST3 and LAST4 ref frames
could still be used or updated, which causes the
VP10/ErrorResilienceTestLarge.DropFramesWithoutRecovery/{0,1,2}
tests to fail.
With this patch, if --enable-ext-refs is used, then
VP8_EFLAG_NO_UPD_LAST and VP8_EFLAG_NO_REF_LAST also applies to the
new LAST2 LAST3 and LAST4 ref frames, as well as the LAST ref frame.
Change-Id: If482b1c09bbaf914eca8e0348a2367bff261661d
These caused the following warning with GCC 5:
warning: logical not is only applied to the left hand side of
comparison [-Wlogical-not-parentheses]
assert(!is_compound == (cm->reference_mode == SINGLE_REFERENCE));
Change-Id: If296aabb2311ceb7d903b395c1549ef81c2cbf9b
The minimal ans partition size is now one byte. This is checked in
ans_read_init().
The read_is_valid() condition is handled by setup_token_decoder().
Change-Id: I7b202b896630bc4285532208bf7cf84567afe158
- Interface function takes a local MxN function to call based on the
block size.
- Repetition call (w/o cache line miss) shows improvement:
~63% - ~340%.
- Overall encoder speed improvement: ~0.9%.
Change-Id: Ieff8f3d192415c61d6d58d8b99bb2a722004823f
Speed increase for ext-tx by 20% for a BDRATE drop of 0.26%.
The ext-tx expt becomes -2.66% BDRATE (reduced from -2.92%) for
the lowres set.
It turns out that reducing the set of transforms for intra from
12 to 5 makes very little difference in coding performance (~0.04%).
Most of the performance drop comes from the reduction is transform
set for inter. Currently there is a provision to control that with
a macro.
Change-Id: I7de05527bf72f96acc1e0ab8a74a849da0a141e5
Update svm parameters with training data using new transforms
and remove DST from pruning functions.
Change-Id: I7bd1c4744455d571c1ecfb4cea14c25ac291f002
-Fix some bugs in row_scan and col_scan. In some cases, the above
or left neighbor was not considered even though it is available.
-When above or left neighbor is not available, try using the
top-left, top-right or bottom-left neighbor.
Compression improvement:
lowres 0.20%
midres 0.16%
hdres 0.20%
Change-Id: If521665589c7f29277b8e9223f21f4a8bf3fef39
The uncompressed frame header contains a bit to signal whether the
frame is encoded using 64x64 or 128x128 superblocks. This can vary
between any 2 frames.
vpxenc gained the --sb-size={64,128,dynamic} option, which allows the
configuration of the superblock size used (default is dynamic). 64/128
will force the encoder to always use the specified superblock size.
Dynamic would enable the encoder to choose the sb size for each
frame, but this is not implemented yet (dynamic does the same as 128
for now).
Constraints on tile sizes depend on the superblock size, the following
is a summary of the current bitstream syntax and semantics:
If both --enable-ext-tile is OFF and --enable-ext-partition is OFF:
The tile coding in this case is the same as VP9. In particular,
tiles have a minimum width of 256 pixels and a maximum width of
4096 pixels. The tile width must be multiples of 64 pixels
(except for the rightmost tile column). There can be a maximum
of 64 tile columns and 4 tile rows.
If --enable-ext-tile is OFF and --enable-ext-partition is ON:
Same constraints as above, except that tile width must be
multiples of 128 pixels (except for the rightmost tile column).
There is no change in the bitstream syntax used for coding the tile
configuration if --enable-ext-tile is OFF.
If --enable-ext-tile is ON and --enable-ext-partition is ON:
This is the new large scale tile coding configuration. The
minimum/maximum tile width and height are 64/4096 pixels. Tile
width and height must be multiples of 64 pixels. The uncompressed
header contains two 6 bit fields that hold the tile width/heigh
in units of 64 pixels. The maximum number of tile rows/columns
is only limited by the maximum frame size of 65536x65536 pixels
that can be coded in the bitstream. This yields a maximum of
1024x1024 tile rows and columns (of 64x64 tiles in a 65536x65536
frame).
If both --enable-ext-tile is ON and --enable-ext-partition is ON:
Same applies as above, except that in the bitstream the 2 fields
containing the tile width/height are in units of the superblock
size, and the superblock size itself is also coded in the bitstream.
If the uncompressed header signals the use of 64x64 superblocks,
then the tile width/height fields are 6 bits wide and are in units
of 64 pixels. If the uncompressed header signals the use of 128x128
superblocks, then the tile width/height fields are 5 bits wide and
are in units of 128 pixels.
The above is a summary of the bitstream. The user interface to vpxenc
(and the equivalent encoder API) behaves a follows:
If --enable-ext-tile is OFF:
No change in the user interface. --tile-columns and --tile-rows
specify the base 2 logarithm of the desired number of tile columns
and tile rows. The actual number of tile rows and tile columns,
and the particular tile width and tile height are computed by the
codec ensuring all of the above constraints are respected.
If --enable-ext-tile is ON, but --enable-ext-partition is OFF:
No change in the user interface. --tile-columns and --tile-rows
specify the WIDTH and HEIGHT of the tiles in unit of 64 pixels.
The valid values are in the range [1, 64] (which corresponds to
[64, 4096] pixels in increments of 64.
If both --enable-ext-tile is ON and --enable-ext-partition is ON:
If --sb-size=64 (default):
The user interface is the same as in the previous point.
--tile-columns and --tile-rows specify tile WIDTH and HEIGHT,
in units of 64 pixels, in the range [1, 64] (which corresponds
to [64, 4096] pixels in increments of 64).
If --sb-size=128 or --sb-size=dynamic:
--tile-columns and --tile-rows specify tile WIDTH and HEIGHT,
in units of 128 pixels in the range [1, 32] (which corresponds
to [128, 4096] pixels in increments of 128).
Change-Id: Idc9beee1ad12ff1634e83671985d14c680f9179a
In certain cases the code was subtracting the obmc cost
despite it not having been added previously.
For example with ref_mv, supertx, ext_inter, obmc & ext_refs
enabled the following test was failing but now passes:
"VP10/ArfFreqTestLarge.MinArfFreqTest/33"
Change-Id: I966853f34c18d5a1d4c7a56fa201c1b02973fc88
- Use arithmetic AND (&) instead of logical AND (&&) to
generate correct testing input.
- Fix variance reference function to be consistent with
our codebase implementation.
- Refer to the following issue:
https://bugs.chromium.org/p/webm/issues/detail?id=1166
Change-Id: I8c1ebb03e22dc9e1dcd96bdf935fc126cee71307
Bitdepth 10/12:
Fit coefficient range into 32 bits
Fit codfficient * const range into 32 bits
Bitdepth 8:
Fit coefficient range into 16 bits
Fit codfficient * constant range into 32 bits
Change-Id: I50b5a3132e8a9f5155c971ab0f6eb52876d2b5ca
Use ANS for all entropy coded data in VP10 including the compressed header and
modes and motion vectors. ANS tokens continue to be used for DCT tokens.
Change-Id: Idf709a747150601e4d95d81ecfb3dc7253d349df
Decouples interintra modes and probability models from regular
intra modes, to enable creating/optimizing new interintra modes.
Also, fixes interpolation values for 128x128 interintra and obmc.
Change-Id: I5c2016db49b8f029164e5fe84c6274d4e02ff90e
Rename MI_BLOCK_SIZE.* -> MAX_MIB_SIZE.* (MIB is for MI Block).
Rename MI_MASK.* -> MAX_MIB_MASK.*
There are no functional changes.
This is in preparation for coding the superblock size at the frame
level, which will require some of these constants to become variables.
The new names better reflect future semantics, and hence make the code
clearer.
Change-Id: Iee08d97554cf4cc16a5dc166a3ffd1ab91529992
Fixes an issue with rectangular inter-intra blocks.
Includes various other refactoring and cleanups to enable fast mixing
of inter and intra predictors.
Uses only the best single inter reference so far for the inter-intra
search.
About 30% speed-up with a 0.1% hit in performance.
This is part one of overhauling on the ext-inter experiment. To be
continued in subsequent patches.
Change-Id: Id10ee100c78c6e00009a3a4f930a4435ef403a95
If --enable-ext-partition is used at build time, the superblock size
(sometimes also referred to as coding unit (CU) size) is extended to
128x128 pixels.
Change-Id: Ie09cec6b7e8d765b7555ff5d80974aab60803f3a
Amends previous commit to also handle subsampling correctly.
Change ID of prev commit: I6b07e6cf9b287ba4b5bd6599af4a7412e50b3bdc
Was causing occassional failures for 422 streams due to accessing
elements beyond the extent of the bmi array.
Change-Id: I37ebabf4c01ca84bcd1851428172bdf753805d98
Improve the readability in the related rate-distortion optimization
search control function of sub8x8 blocks.
Change-Id: I7f7456bf40a98aa5146abfe0488cda745b84d899
This commit makes the sub8x8 block to use its nearest neighbor's
motion vector as predicted motion vector for NEWMV mode. It improves
the coding performance by 0.12%.
Change-Id: I99e56715b327573ce7e8a26e3515a4984dadfd98
Moved the API patch from NextGen to NextGenv2 and also added this
API to VP10. An example was included. To try it, for example, run
the following command:
$ examples/vpx_cx_set_ref vp10 352 288 in.yuv out.ivf 4 30
Change-Id: Ib56bc3d365e530cfc8d859a13ddbf4c007907b81
This patch fixes 2 issues in Palette mode:
1. More memory is needed in PALETTE_BUFFER for 444 video format.
2. A merge issue caused by
https://chromium-review.googlesource.com/#/c/333940/7
Change-Id: I2aedc7dfdfb6b66fbd600189ec6e1e2cc6120d40
No need to do avoid shortcuts when all we are testing is the superframe
syntax. Decreases the run time up the VP10 version of the test from 22
seconds to 3 seconds on my machine.
Change-Id: If0c3551cbb8af8b803e02629e803e5f09da76cd1
- Wrote function: fidtx8_sse2() and fidtx16_sse2().
- Turned on vp10_fht8x8_sse2()/vp10_fht16x16_sse2() for new types.
- Updated 8x8/16x16 unit tests for accuracy/speed.
- Running 20K times with random numbers and getting through
tx type from V_DCT to H_FLIPADST, SSE2 speed improvement:
8x8: ~131%
16x16: ~66%
Change-Id: Ibbb707e932a08fec3b1f423a7dab280a1d696c9a
Skip checking obmc when regular inter predictor is not so good (the
rd-cost for Y residual is greater than the total rd of the best mode
so far.)
Performance change compared to full rd search:
+0.006% lowres, -0.056% midres
Encoding time :
1.14X baseline (was 1.42X)
Change-Id: I11350f955a20e1a2331be458537a915e09fbedf3
After porting tile coding from VP9 to VP10, some performance
degradation was seen because of the difference between VP9 and
Vp10 baseline. This patch disabled some features in VP10 while
tile coding is turned on. Also, an encoder control API was added
back for this use case.
Change-Id: I8f736db8388408a8cc35320a2f80abb02906571c
Skip filtered intra modes search in inter frame when DC mode is
worse than the best mode so far.
With ext-intra enabled, the overall speed is increased by 20~40%;
performance drop is 0.03% on lowres and 0.05% on midres.
Change-Id: I75d2503b067cf5e46e3533b97fb01497e125baa7
- Added function fidtx4_sse2().
- Turned on vp10_fht4x4_sse2() for these tx types.
- Updated 4x4 unit test for speed/accuracy.
- 4x4 Unit test passed.
- Running 20K times with random numbers for tx type from
V_DCT to H_FLIPADST, SSE2 against C, speed improves ~46%.
Change-Id: I828088b7f98dc0f5939a72e3fcd6cb0b8d8dd8bf
If configured with --enable-ext-tile, the codec uses an alternative
tile coding syntax in the bitstream. Changes include::
- The maximum number of tile rows and columns is extended to 1024
each.
- The minimum tile width/height is 64 pixels (1 superblock).
- A tile copy mode is added where a tile directly reuse the coded
data of a previous tile
- The meaning of the tile-columns and tile-rows codec parameters are
overloaded to mean tile-width and tile-height in units of 64
pixels.
- All tiles should now be independent, including rows within the
same columns, so large scale parallel, or independent decoding is
possible.
- vpxdec also gained the options to decode only a particular tile,
tile row, or tile column.
Changes without --enable-ext-tile:
- All tiles should now be independent, including rows within the
same columns, so large scale parallel, or independent decoding is
possible.
- vpxenc default tile configuration changed to use 1 tile column.
Change-Id: I0cd08ad550967ac18622dae5e98ad23d581cb33e
- Use Makefile to control the build for highbd_fwd_txfm_sse4.c.
- Fixed hybrid transform (HT) types due to recent update.
- Added new unit test cases for highbd HT.
Change-Id: Ifd768a9b429a8c21ed40c1de8152fb5ac71e2f90
This commit separates the predicted motion vector from the nearestmv
motion vector in the coding process for both regular and sub8x8
block sizes.
Change-Id: I703490513b0194e6669ebf719352db015facb3e1
- Setup function vp10_highbd_fht4x4_sse4_1 for highbd SSE4.1
intrinsics optimization.
- Wrote SSE4.1 functions: load_buffer_4x4(), write_buffer_4x4(),
and fdct4x4_sse4_1().
- Used logic right shift to avoid coeff memory write/read.
- Turned on vp10_highbd_fht4x4_sse4_1 for DCT_DCT mode only.
- Improved overall encoding performance >2.3% for 50 frames
sequence, park_joy_1080p_12.y4m, in which, --input-bit-depth=12,
--bit-depth=12, 50 frames.
- Unit test passed.
Change-Id: Idd6dc6e472cbbf235f0ade4f66fbe859a860a004
This has been ported under ext_partition_types because it is due
to be combined with the coding_unit_size experiment which is
already being ported under ext_partition
Change-Id: I47af869ae123ddf0aa99160dac644059d14266ee
Merge the functions that generate prediction by above/left predictors
for the encoder and the decoder.
Change-Id: I57e53a8f2eb8d3028c4ed0c9abdcbf00503f95a0
Decompose choose_tx_size_from_rd into three functions that determine
the transform coding rd at different stages. Besides the original
function, txfm_yrd() calculates the rd for fixed size and type.
choose_tx_size_fix_type() fixes the type and searches for the size.
It can enable other experiments to do restricted tx searches so as to
reduce the impact on speed.
Similar refactoring is done for select_tx_type_yrd() in VAR_TX.
Performance change in baseline is trivial:
0.014/0.001/-0.020 for lowres/midres/hdres.
Change-Id: I2ecbf6066329be088ec1bfb69013b657b14b8afe
This allows sharing more code paths with the rest of the code an allows
for easier compatibility with the other experiments.
Change-Id: Id288b533805a4d0657ec2f17542f2e6ad23ebdb4
Makes a set of 16 transforms total, adding all 1D
combinations of ADST and FlipADST, and removng all DST
transforms.
lowres, midres both improve by about 0.1% and hdres by
-0.378% in BDRATE but with fewer transforms that are also
simpler.
Further experiments to continue later.
Change-Id: I7348a4c0e12078fdea5ae3a2d36a89a319ffcc6e
Rework the interface to allow codec store the reference motion
vector list information for coding process.
Change-Id: I47e26587f6c0808655e4626f316ec7614a7ad8ed
This commit re-designs the probability model for the syntax elements
of the dynamic motion vector referencing system.
Change-Id: Icfb8203c7e8f64e10e99f5890e25e6f6b15fe5d1
This buffered ANS coder supports coding the symbols in forward (decode)
order. Rather than windowing or growing the buffer, right now this
coder merely asserts that the buffer will never overflow.
This approach should allow ANS to be used as a drop in replacement for
other entropy coders rather than requiring complicated reversal logic
throughout the codebase.
Change-Id: I6689271233d0e22fea94c51950415dad5af96598
This commit enables the dynamic motion vector predictor for NEWMV
mode. It allows the codec to select the best motion vector predictor
in a rate-distortion optimization framework for motion vector
residual coding. The compression performance is improved:
lowres 0.14%
midres 0.27%
hdres 0.24%
Change-Id: I6a601c74eb6cb0b71a613336d40363359f2edecd
Eearly termination if U plane RD cost is large enough.
No notable compression performance changes.
Change-Id: Ieeefc5859cb55d94391b502b4bd840bc8bcb2578
This is a cosmetic patch that removes a great deal of conditional
compilation around CONFIG_VAR_TX from the partition search function.
Change-Id: I9dcef9d4fe6847b793c77bdf565a5cacbdfacd59
This patch added two features to improve entropy coding efficiency
for coefficient tokens.
1. Choose 1 of 4 default probability tables based on q-index for
key-frames.
It is ported from nextgen branch:
https://chromium-review.googlesource.com/#/c/280586/
2. Do backward update after each superblock (64X64) row using
subframe token counts.
Coding gain: 0.1% on lowres; 0.42% on midres; 0.36% on hdres.
Much larger gain for key-frames: 2.6%, 2.3%, 1.7%.
Design doc: go/huisu-entropy
Change-Id: Ia3b6a615636be09247d70e4c520405637561532b
This commit fixes the computation of rate_nocoef for situation when
rate_y is uninitialized at INT_MAX for x->skip is true.
Change-Id: If3dde4e4ee16667f4408067d3bb3084f916272f1
In preparation for adding more 1D variants with ADST/FlipADST/etc.
BDRATE actually improves by 0.21% on lowres.
Change-Id: I2fa4720c69fe001fa666119a284dfc6b17fffab2
Optimized 2 up-sampled reference prediction functions in high-bit
depth case. This reduced the HBD encoding time by 3%.
Change-Id: I8663ffb5234f5e70168c0fc9ca676309fe8e98f2
Instead of testing all interpfilter-BMC/OBMC combinations, we choose
the best interpolation filter based on regular inter prediction.
Reduction in encoding time: ~10%
Drop in performance gain: 0.08% lowres, 0.04% midres
Change-Id: Ifc19097a918ac76b529db9af4c60e2c70e93f7ad
Temporarily disable transform type selection for 32x32 transform
block size. This speeds up the encoding process. For bus at CIF
150 frames, the encoding time goes from 896s -> 762s (11% faster).
The compression performance for lowres set is improved by 0.15%,
and -0.029% for hdres.
Change-Id: If239b272970eb302150bec13b8cf192fbe045332
Using the up-sampled reference frames in sub-pixel motion search is
enabled as a speed feature for good-quality mode speed 0 and speed 1.
Change-Id: Ieb454bf8c646ddb99e87bd64c8e74dbd78d84a50
Coding gain on screen_content is 12.2% (was 6.6%).
Some features such as frame-level color buffer, adaptive
entropy coding, are coming in future patches.
Change-Id: I2658cf5ec0cbb02cff685475759f3b68c9807697
This commit enables the hybrid 1-D/2-D transform coding scheme for
high bit-depth setting. It improves the compression performance of
ext-tx experiment by 0.98% for lowres_all set.
Change-Id: Ic27f5037f2c36b095a93b9f15dbae34bdcdf00aa
This commit enables the 1-D transform to use Manhattan grid vertical
and horizontal scan order for transform coefficient entropy coding.
Enabled in inter prediction mode, the hybrid 1D/2D transform coding
scheme outperforms the 2D-DCT based coding system used in VP9 by
lowres_all 1.7%
hdres_all 1.4%
As one coding option, in addition to the existing 17 other transform
types in ext-tx experiment, the 1D/2D hybrid transform improves
the coding gains:
lowres_all 2.2% -> 3.0%
Change-Id: I9cefa9d9e38224546d0afd67feecd9f8d4a16ab0
- Implemented fdst16_sse2(), fdst16_8col() against C version: fdst16().
- Turned on 7 DST related hybrid txfm types in vp10_fht16x16_sse2().
- Replaced vp10_fht10x10_c() with vp10_fht16x16_sse2() in
fwd_txfm_16x16().
- Added vp10_fht16x16_sse2() unit test against C version:
vp10_fht16x16_c() (--gtest_filter=*VP10Trans16x16*).
- Unit test passed.
- Speed improvement: 2.4%, 3.2%, 3.2%, for city_cif.y4m, garden_sif.y4m,
and mobile_cif.y4m.
Change-Id: Ib30a67ce5d5964bef143d588d0f8fa438be8901f
Pixel domain distortion calculation is enabled for the rd loop of
inter sub8x8 and intra 4x4 cases.
Coding gain: 0.124% derflr, 0.122% derfhd
Change-Id: I43b47fe81b4f5ccc1c66bc626bd310c413a1ed87
- Inherited base class TransformTestBase to derived class VP10Trans8x8HT.
- Employed RunCoeffCheck() to test vp10_fht8x8_sse2() against C reference
function vp10_fht8x8_c().
- fdst8_sse2() related seven hybrid transform cases are covered in this
test.
- Test passed (4 test cases w/o EXT_TX; 16 test cases with EXT_TX).
Change-Id: Id9a9b308c707164a120d9ceb2c30e572026fb1d0
This commit enables a hybrid 1-D/2-D transform coding scheme and
the accompany entropy coding system. It currently uses hybrid
1-D/2-D DCT transform coding. It provides coding performance gains:
lowres_all 0.55%
hdres_all 0.43%
Change-Id: I2b30dcafd21eb2bb3371f6e854cbab440a4dfa78
For left side obmc, the input of the mask function is corrected as
the column coordinate.
Also, minor fixes for a compiler warning.
Change-Id: Ia981ef443d5b0285a93d73e5c7ab83f8c3a23464
Inherited class TransformTestBase to derived class VP10Trans4x4HT.
Employed RunCoeffCheck() to test vp10_fht4x4_sse2() against
C reference vp10_fht4x4_c().
fdst4_sse2() related seven hybrid transform cases are covered
in this test.
Wrote a header file for test base class. Some modification to
make sure the base class can be used for 8x8, 16x16, 32x32 cases.
All related tests passed.
Change-Id: I6b19a39d3ea30b657847781e78e73b829998a57a
This sets up the interface for 3 speed features that progressively
eliminate a greater number of transforms in ext tx using
pre-trained support vector machines.
Each speed feature still needs to be implemented.
Change-Id: Ia508aeadc0cffdc080fb227f357a5d1dfbca08e2
This commit fixes an encoding issue related to var-tx and ref-mv
experiments that causes the codec to use random values for transform
block skip flag.
Change-Id: I8daa6d6b88ea45b5bbeb81b43dd0eeff545c8e5a
Make the RANS implementation operate on cumulative distribution
functions rather than individual probability distribution functions.
CDFs have shown themselves more flexible to work with.
Reduces decoding memory usage from scaling O(num_distributions *
symbol_resolution) to O(num_distributions).
No bitstream change. This is an purely implementation change.
Change-Id: I4e18d3a0a3d37a36a61487c3d778f9d088b0b374
This allows the codec to use effective motion vector as the candidate
to produce the reference motion vector list.
Change-Id: Ib90be705fe28200c13376d6d7741800a61f13043
fdct16_sse2() was not bit-exact with C reference, fdct16().
The inconsistency was found by writing a unit test for
vp10_fht16x16_sse2(). Since the unit test needs a pending
change on the inherited base class. I will commit this unit
test after making a header file for this base class.
Passed the uncommitted unit test: vp10_fht16x16_test.cc.
Change-Id: If2b617883c633a3ea90c19e1d018240c8007102b
The above-right and left-bottom pixels were sometimes not used even
though they are available. Results on lowres_all and hdres_all are
mostly neutral.
Change-Id: Ic13533dd498442ad5592b83bb5fabf053cc8e8f0
The sum of squared value of a block can overflow 32bit, this commit
changes to use int64_t to avoid the overflow issue.
Change-Id: I78fcd6999634f186f86d649cfce85d97a993d040
Up-sampled the reference frames to 8 times in each dimension using
the 8-tap interpolation filter. In sub-pixel motion search, use the
up-sampled reference frames to find the best matching blocks. This
largely improved the motion search precision, and thus, improved
the compression quality. There was no change in decoder side.
Borg test and speed test results:
1. On derflr set,
Overall PSNR gain: 1.306%, and SSIM gain: 1.512%.
Average speed loss on derf set was 6.0%.
2. On stdhd set,
Overall PSNR gain: 0.754%, and SSIM gain: 0.814%.
On hevchd set,
Overall PSNR gain: 0.465%, and SSIM gain: 0.527%.
Speed loss on HD clips was 3.5%.
Change-Id: I300ebaafff57e88914f3dedc8784cb21d316b04f
Fixes some issues introduced by a merge of two patches.
Also decouples the temporal interpolation filter from the switchable
filters for now for ease of experimentation with both separately.
Change-Id: If1c7c08adf00e0cf818fe8d0d3656c26ea65eb32
Includes various cosmetic changes and refactoring including
naming the sharp filters differently (since they are no longer
8-tap).
Change-Id: Ida5a19ca0daa9f6a64a6734394c685b2a4a2564a
This commit unifies the encoder and decoder border extension and
motion compensated prediction process. Remove the decoder specific
flow to simplify the development flow.
Change-Id: I9c43bbe6d7c017e6da2db6a62c5bf3d0af7ccfce
The interintra experiment, which combines an inter prediction and an
inter prediction have been ported from the nextgen branch. The
experiment is merged into ext_inter, so there is no separate configure
option to enable it.
Change-Id: I0cc20cefd29e9b77ab7bbbb709abc11512320325
This commit uses 12-tap sharp filter to generate alter reference
frame. It improves the compression performance by
derf 0.45%
hevcmr 0.35%
stdhd 0.79%
No encoding time change is observed.
Change-Id: Ia5dc26d5aae6b9b0cb782e5a28dc5066eeeb2ec8
Implemented fdst8_sse2() function against C version: fdst8().
Added seven DST related hybrid transform types in vp10_fht8x8_sse2().
Replaced vp10_fht8x8_c() with vp10_fht8x8_sse2() in fwd_txfm_8x8().
Speedup: 18.1%, 11.5%, 22.0% based on speed test from
city_cif.y4m, garden_sif.y4m, mobile_cif.y4m.
Change-Id: Ia4aa1ea44c7a33e494f64ce843037f8703f975e3
Adds hooks to use 32x32 ext-tx. Also adds scan orders for the masked
transforms for 32x32.
Make macro USE_MSKTX_FOR_32X32 1 in blockd.h to support 32x32 masked
transforms for ext-tx.
Change-Id: Ie6564830266651fcafae2d536c274dafd664ce17
These variable names were legacy from a previous version of this
function and in the current version they were confusingly backwards.
Change-Id: I4f6c1628f296fd5b650fd9c5e2d56d7daf66a3f6
This commit enables a context based motion vector entropy coding
conditioned on dynamic reference motion vector list. This (along with
the previous CL) imporves the coding gains due to dynamic motion
vector referencing based entropy coding:
derf 0.1%
hevcmr 0.2%
stdhd 0.7%
hevchr 0.4%
No encoding time change was observed.
Change-Id: I179c723844079195f6952a12582996a3ca9e9914
Instead of using model_rd_for_sb() to estimate the cost and make the
decision on bmc/obmc, we use super_block_yrd/uvrd() to calculate and
compare the real rd costs of bmc and obmc.
Average bit-rate reduction(%) of obmc experiment:
derflr/derfhd/hevcmr/hevchd
2.353/TBD/TBD/TBD
Before the optimization, the coding gain was:
1.582/1.109/1.600/1.164
Note: there is still some mysterious bug because that compared to
the previous version, the performance at low bit rate drops a lot.
Change-Id: I8dbee04a272190f10516a3953c1ae690f8136766
This commits adds a shift stage for FASTSSIM computaton when source
bit depth is different from working bit depth, to make sure metric
results are calculated in bit_depth consistent with source.
Change-Id: I997799634076ef7b00fd051710544681ed536185
This commit adds the ability to shift down the working buffer when
source bit_depth is different than working bit_depth. It does so
by shift down to be consistent with source bit_depth.
Change-Id: Idfdbfc614d73fe445d62e35e642cc7d75e9dc4ff
Don't initialize first pass costs for a number of symbols where first
pass probabilities aren't initialized.
As a side effect, an illegal read in the ANS experiment is fixed.
https://bugs.chromium.org/p/webm/issues/detail?id=1089
Change-Id: I97438c357bd88f52f5a15c697031cf0c3cc8f510
This commit unifies the motion vector cost buffers for full pixel
and sub-pixel motion search. The new motion vector coding system
provides 0.5% coding gains for 720p and above sequences and 0.2%
for lower resolution sets.
Change-Id: I927ec81eadc39d11a3c12b375221a1ddd2e8bf24
This commit extends the HBDMetricTests to handle testing for metric
computation where input source depth is different from working bit
depth.
Change-Id: I5d11101cc9603a3fd09e8439816bb982a0f1b654
Priviously, we do 12-tap interpolation even there is no sub pixel,
This could cause a bug becuase decoder doesn't extend border when there
is no sub pixel. In this situation, if we still do interpolation, we
will access the border extension which doesn't exist and cause a
memory error
Change-Id: I55b879722f0a10c5d13261bd9617a75c826a2418
This commit accounts for the context based probability model for
motion vector cost estimate in rate-distortion optimization.
Change-Id: Ia068a9395dcb4ecc348f128b17b8d24734660b83
This commit converts the scalar motion vector probability model
into vector format for later precise estimate.
Change-Id: I7008d047ecc1b9577aa8442b4db2df312be869dc
This fixes a bug in HBD sum of squared error computation introduced
in #abd00505d1c658cc106bad51369197270a299f92.
Change-Id: I9d4e8627eb8ea491bac44794c40c7f1e6ba135dc
-Avoid unnecessary calculations
-Use SIMD when possible
Encoder is about 5% faster with the extra intra prediction angles
enabled.
Change-Id: I131056befe327cedab217ad4a40d5f2a11318acc
Preliminary tests indicated that these changes make cost_coeffs
approximately 20% faster which is a 2% improvement overall
Change-Id: Iaf013ba75884415cd824e98349f654ffb1c3ef33
This makes all metric computation to locate at some place, also gets
rid of duplicate code between vp9 and vp10.
Change-Id: I24a2707d183a2419cd18a8343010adae185ffcd4
Adds new 32x32 masked 1-d transforms that combine 1-D length-16
DCT with length-16 identity transforms.
To be continued in subsequent patches.
Change-Id: I0b4f66492d44c079b3c3b531ba48a97201de1484
This commit fixes an enc/dec mismatch in the dynamic motion vector
referencing experiment introduced in 837ef00.
Change-Id: I9fbe116fce118a80ef0f96bf41ce1f802547c2ee
This bug made the rd loop use one-side obmc (compound of the current
predictor and the predictors of the left mi's, while the above ones
are ignored by mistake) to determine whether to use obmc. This fix
improved the compression performance by ~0.6% on different test sets.
Coding gain (%) of obmc experiment on derflr/derfhd/hevcmr/hevchd:
1.568/TBD/1.628/TBD
Change-Id: I43b239bedf9a8eebfd02315b1b036e140a998140
Removes the USE_DST2 flag that was on by default. DST2 performs
slightly better that DST1 and is faster to compute.
Change-Id: Ifb788f3f0a0e1995d7625230cec144b876f01206
Using this we can eliminate large numbers of calls to predict intra,
and is also faster than most of the variance functions it replaces.
This is an equivalence transform so coding performance is unaffected.
Encoder speedup is approx 7% when var_tx, super_tx and ext_tx are all
enabled.
Change-Id: I0d4c83afc4a97a1826f3abd864bd68e41bb504fb
In this experiment, an obmc inter prediction mode is enabled for
>= 8X8 inter blocks. When the obmc flag is on, the regular block-
based motion compensation will be refined by using predictors of
the above and left blocks.
Fixed some compatibility issues with vp9_highbitdepth, supertx,
ref_mv, and ext_interp.
Coding gain (%) on derflr/hevcmr/hevchd
OBMC:
1.047/1.022/0.708
OBMC + SUPERTX:
1.652/1.616/1.137
SUPERTX:
0.862/0.779/0.630
Change-Id: I5d8d3c4729c6d3ccb03ec7034563107893103b7f
Adds a wiener filter based restoration scheme in loop which can
be optionally selected instead of the bilateral filter.
The LMMSE filter generated per frame is a separable symmetric 7
tap filter. Three parameters for each of horizontal and vertical
filters are transmitted in the bitstream. The fourth parameter
is obtained assuming the sum is normalized to 1.
Also integerizes the bilateral filters, along with other
refactoring necessary in order to support the new switchable
restoration type framework.
derflr: -0.75% BDRATE
[A lot of videos still prefer bilateral, however since many frames
now use the simpler separable filter, the decoding speed is
much better].
Further experiments to follow, related to replacing the bilateral.
Change-Id: I6b1879983d50aab7ec5647340b6aef6b22299636
This commit adds computation of PSNRHVS for highbitdepth build, it
also adds tests to make sure the calculation of psnrhvs metric for
10 and 12 bit correct.
Change-Id: Iac8a8073d2b3e3ba5d368829d770793212fa63b6
This commit aligns the rate-distortion metrics for both luma and
chroma components in super transform rate-distortion optimization.
It improves the coding gains due to var-tx and supertx experiments
by 0.2% for high resolution test sets.
Change-Id: Ib89d99e29cb5ee27b1f867e301954d4164d8b364
Brings the following commits to vp10:
269428e Tie the bit cost scale to a define.
d13385c Switch to 9-bit rate cost constants built on a 256 probability denominator.
ad43a73 Fix a signed overflow in vp9 motion cost.
1c9b091 Fix some interger overflow errors
fac947d Restore previous motion search bit-error scale.
Change-Id: I598ba7ee7efcde18439c31dfa96b86cbf297a580
This commit adds the computation of fastSSIM for highbitdepth build,
it also modifies the hbdmetric test to be more generic and applicable
for fastSSIM.
The 255 used for calculating ssim constants c1 and c2 is not exactly
scaled by 4x and 16x to 1023 and 4095, therefore requries the metric
test to have a thresold more tolerant than 0, currently at 0.03dB.
Change-Id: I631829da7773de400e77fc36004156e5e126c7e0
obmc: We add an obmc prediction mode at superblock level.
When it is enabled, predictors of the above and left blocks
are used to refine the regular block-based motion compensation.
Change-Id: I6310104ea3dfece16d736351e368861471dd1c9b
Setting FIXED_TX_TYPE as 1 makes the encoder skip tx_type search,
about twice as fast.
This speed feature is off by defualt; we can turn it on when we
want to quickly test new ideas.
Change-Id: Ieab5807d17fcd54fce3e8ae2f59a18b42eb79408
This commit aligns the rate-distortion metric for the recursive
transform block partitioning and the super transform. It resolves
the conflicts between these two experiments. The coding performance
gains of the combined experiments (var-tx + super-tx) has been
improved:
derf 0.89% -> 1.9%
hevcmr 1.06% -> 1.8%
stdhd 0.29% -> 1.4%
hevchr 0.80% -> 2.3%
Change-Id: I7e33994ad70c1b2751435620815f867d82172f41
These costs are added in separately just before the computed
ref_costs_* are added in the calling functions, so they were
effectively double counted.
Change-Id: Ic941d0243460cc2e750791cfc508e97d8b90e8fd
This patch makes rd optimization use the same context for computing
the rate cost of coding the partitioning as the packer actually uses
when emitting it in write_modes_sb.
Change-Id: Idb1427bb2f9c37ab80c6aa182f7ff754ef0595cb
The value of use_highbitdepth flag is used for compute the size for
high bit depth buffer allocation, which should take value 0 or 1
depending on if the buffer is used for high bit depth or not.
Previously, the values is set to 8 or 0, this commit fixes the issue
and properly set the value for this flag to 1 or 0.
This cuts the size of highbitdepth buffer memory allocation to 2/9 of
the size prior to the fix.
Change-Id: I401518b5a6147e5d8a973e54f7ca6bc1892065e0
This commit enables entropy coding for dynamic reference motion
vector modes. The probability model is contexted on the ranking
categories of the reference motion vector candidates.
Change-Id: I09b58d98a409d63ec1a407331e29f8945b7ef17d
Fixes an issue where the tx_type was not set correctly for
sub8x8 inter and intra blocks. In the current syntax, for
sub8x8 blocks, there is still a single tx_type that is
transmitted. Ideally, this should be searched for the best
rd performance, albeit at the expense of encode speed.
For now, we just set it to DCT_DCT. Previously it was left
incorrectly as what was used for the previous non sub8x8
block.
derflr: BDRATE -0.277%
Change-Id: If76ba903bfbfd4d374cf1ac7d1daee50e92f0edd
This commit enables the dynamic reference motion vector coding mode
for the compound inter blocks.
Change-Id: Ibe78fd8de6989db392cd67a9d81a69d680345ba1
There were a number of compiler warnings:
1. int16_t to uint8_t in recon_intra.c;
2. double to float conversions in psnrhvs.c
3. intptr_t to int in quantize.c
4. size_t to int32_t in decoder.c
Change-Id: Id95423b17779dcfa6cf39d9a90fe8cb8b910f5df
Set USE_12_SHARP_FILTER to 1 to turn on the experiment
The psnr percentages increase
derf stdhd
lowbd +0.332 +0.318
highbd +0.476 +0.507
Change-Id: I783c0fc764ee8541645e100453c9b2073924e209
Temporaly disable warning for unused function for vp10, needs clean
out the warnings before re-enable the flag for vp10.
Change-Id: I5636f8cd607423f6ea6963db9c2cbd688e30b495
Seperate the prediction angle search and fitler search.
It can reduce the computation overhead of filter search by as much
as 85%, while keeping more than 50% of the coding gain.
Change-Id: Id152f71e20ebcaca8b429bdd4ca1fbeb646fc6bf
BD-rate performance improvement (on top of ext-intra):
derflr 0.22%
hevclr 0.36%
hevcmr 0.48%
hevchr 0.37%
stdhd 0.19%
Average speed impact on some derf clips is about 40% slower (on
top of ext-intra). Speed improvment is a to-do.
Change-Id: I8fe3fe8c5e4f60d0462778adbcc15c84dfbe7a25
This commit generalizes dynamic reference motion vector coding mode
to support multiple candidate modes in the rate-distortion
optimization scheme and to support the selection in the bit-stream
syntax. The maximum number of modes allowed is currently limited to
4. The syntax elements for the dynamic reference motion vector
modes are using binary codes. The scheme supports single reference
frame.
It improves the compression performance
derf 0.135%
hevcmr 0.098%
Change-Id: Id053d6ce76e8365e52727bd0d12d28ce3de2e0e8
It makes the encoder accounts for the block zero-forcing operation
when optimizing the mode decisions.
Change-Id: I2c8e243756080b446b8a53a9679f75c4c47148cf
Fixes assertion for football_422_4sif.y4m when supertx, var_tx and
ext_tx are all enabled. Problem was after subsampling, the u and v
blocks being encoded were no longer square.
Change-Id: Ie626f30a2e64538d33343a26d5124a79a6f2b985
This commit allows an adaptive motion vector referencing mode
approach. It checks the available reference motion vector candidate
list and decides the amount of motion vector referencing modes. The
current implementation assumes simple binary coding for the syntax.
The compression performance is improved by
derf 0.11%
hevcmr 0.38%
stdhd 0.09%
hevchr 0.23%
The coding gains due to the new reference motion vector system are
derf 1.0%
hevcmr 1.7%
stdhd 1.4%
hevchr 1.3%
Change-Id: Idf932fc373546fe59c8741f1b933ff656e8dbc3f
Update blk_skip in update_state_supertx() and rd_supertx_sb().
Performance (SUPERTX+VAR_TX): TBD
(Eventually will merge update_state() and update_state_supertx())
Change-Id: I34ef982b80151ba2dfba745859cb2ca7b90dc888
Fixes integer pel MV usage for the sub8x8 case, which fixes a
rare mismatch issue.
Also adds some other minor missing code related to filter threshes.
Change-Id: I6b07e6cf9b287ba4b5bd6599af4a7412e50b3bdc
Reintroduce part of
Iaf2b717e6b8626b2b6a03226127221b776b49884
Which was later reverted in
I4c5b40ec63a6f19521191d3c730af87db3c4bc00
Change-Id: If3e5610ba3985ae7b4d952d8e616982465ac667a
There is still an assertion failure when tokenizing transform
coefficients for a supertx block, due to the eob not being set
consistently with the coefficients, so we always recode supertx blocks
for now. Also added further PICK_MODE_CONTEXT instances to avoid
potential clash between horizontal/vertical/split partition SUPERTX
trials.
Change-Id: I5f3da1fa0d8d20fc21face170487e1a285fd1cc6
The encoder did not update left_txfm_context and above_txfm_context in
MACROBLOCKD (used for choosing the probability context for the vartx
split bits) when the supertx bit was set for a block. The deoder on the
other hand did update these for supertx blocks. The encoder used these
to compute the context counts, which the packer then uses to adapt it's
probabilities. This results in the packer and the decoder using
different probabilities.
This patch harmonizes the encoder and the decoder by making the encoder
update the mentioned context for supertx coded blocks.
Change-Id: I3a22132124b1bce2ee501d640ceab374b19e3ca1
This is necessary when using SUPERTX, as the bitstream packer relies on
tx_size being set correctly to decide whether to output the block using
supertx or not.
Change-Id: I79e776b3b810f4a15b9dbc6afdd6fc90c73c8934
The loop filter relies on inter_tx_size in MB_MODE_INFO being set
properly when VAR_TX is enabled. Supertx coded blocks did not set this
previously at all, and the differing garbage values eventually resulted
in in a YUV mismatch between encoder and decoder after loop filtering.
This patch fixes this by setting inter_tx_size to the proper supertx
size in both the encoder and the decoder. This should also mean that
loop filtering is done at the proper transform boundaries, even when
supertx or vartx is being used.
Change-Id: I41a564cd6d34ce4a8313ad4efa89d905f5ead731
Fixes some of the issues introduced by a merge from master.
derflr: -0.893% BDRATE
hevcmr: -1.667% BDRATE
Change-Id: I4c5b40ec63a6f19521191d3c730af87db3c4bc00
Combinations of different mv modes for two reference frames
are allowed in compound inter modes. 9 options are enabled,
including NEAREST_NEARESTMV, NEAREST_NEARMV, NEAR_NEARESTMV,
NEAREST_NEWMV, NEW_NEARESTMV, NEAR_NEWMV, NEW_NEARMV, ZERO_ZEROMV,
and NEW_NEWMV.
This experiment is mostly deported from the nextgen branch.
It is made compatible with other experiments
Coding gain of EXT_INTER(derflr/hevcmr/hevchd): 0.533%/0.728%/0.639%
Change-Id: Id47e97284e6481b186870afbad33204b7a33dbb0
This patch fixes a couple of issues caused by change-id:
I15d20ce5292b70f0c2b4ba55c1f1318181481596
Changes to the code for when the ext_tx experiment is not enabled
were merged from master but as var_tx does not exist on master
the changes had not been applied to the case when var_tx experiment
is enabled
Change-Id: Iaf2b717e6b8626b2b6a03226127221b776b49884
Move it from vp10_adapt_intra_frame_probs() to
vp10_adapt_inter_frame_probs() because intra frames do not use
supertx.
Change-Id: I28c7391944848666054d4b990ac17a8ae08aaaee
Current implementation is a bilateral filter whose
parameters are transmitted in the bitstream.
derflr: -0.647% BDRATE
hevcmr: -0.794% BDRATE
This is a prelimary patch. Various other variations are to
be investigated next, that will hopefully be less expensive
on the decoder side.
Change-Id: I50634ae8f5014ad0bf7432306348908a349d81e1
This patch changes the code for 16bit buffers to use the same
optimisation as is used for 8bit buffers. (See change-Id:
I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1 for more information
about the optimisation)
Change-Id: I5f327a13a7b01fc356114a2aa9d1261bf76d8d69
NEW2MV is enabled, representing a new motion vector predicted from
NEARMV. It is mostly ported from nextgen, where it was named
NEW_INTER.
A few fixes are done for sub8x8 RDO to correct some misused
mv references in the original patch.
A 'bug-fix' for encoding complexity is done, reducing the additional
encoding time from 50% to 20%. In sub8x8 case, the old patch
did motion search for every interpolation filter (vp9 only
searches once). This fix also slightly improves the coding gain.
This experiment has been made compatible with REF_MV and EXT_REFS.
Coding gain (derflr/hevcmr/hevchd): 0.267%/0.542%/0.257%
Change-Id: I9a94c5f292e7454492a877f65072e8aedba087d4
ext-partition: to hold partition extensions (ex. ext-partition,
ext-coding-unit-size from nextgen)
loop-restore: to hold in-loop restoration filter (ex. loop-postfilter
from nextgen and other Wiener restoration filters)
Change-Id: I71c7f1588f05fb0f2b00f7004a78e90c9cceae3f
Fixes a breakage introduced with the latest merge from master and
cleans up a couple of compiler warnings.
Change-Id: Ia55b39ba78e43f6fe52c54d7f34faa4dd6bbbf26
This commit considers the case where a single reference motion
vector pair is found in the candidate list. It treats this pair
as the effective motion vector for nearestmv mode. This improves
the coding performance by 0.06% for stdhd sets.
Change-Id: I9ce12f456b52912933e05c18c3841a78c26155d2
This commit allows the codec to add motion vector pairs into
the candidate list. It further improves the compression performance
by 0.1% across derf, hevcmr, stdhd, and hevchr sets without adding
encode/decode time.
Change-Id: I88d36da25a2a89bb506d411844af667081eba98b
The '110' prefix on a final byte indicates a superframe marker. Coded
data is not allowed to use this pattern on a final byte.
Code |state - l_base| little endian with the following prefix scheme:
Prefix '00': Single byte coded state.
Prefix '01': Two bytes le coded state.
Prefix '10': Three bytes le coded state.
Change-Id: Ibc953b67675b567394b93de39b7cb22cadc47435
This commit re-works the reference motion vector stack process
and make it support extended context set. It unifies reference
motion vector checking process for row and column scan, as well as
for single block scan.
Change-Id: I68c05cde93cf8b0ca2ef4d1523399f405bd0a337
It allows the codec to account for certain corner cases when
processing inter prediction mode entropy coding.
Change-Id: Ied451f4fff26ba579f6556554b8381ff2ccd0003
Various additional changes were made to make the experiment
compatible with misc_fixes.
derflr: +0.979%
hevcmr: +0.865%
Speed-wise with --enable-supertx the encoder is only about 10%
slower than without. Decoding impact is about 30% slowdown.
Note this does not work with ext-tx or var-tx yet. That is
a TODO.
Change-Id: If25af4241a7a9efbd28f58eda3c4f044c7a7ef4b
For the experiment of EXT_REFS, removed the previous special handling
on the new last 3 references, i.e. LAST2_FRAME, LAST3_FRAME, and
LAST4_FRAME, at the decoder, so that these new last references are
treated the same way as the other 3 references (LAST_FRAME,
GOLDEN_FRAME, and ALTREF_FRAME). Encoder changes have been made
accordingly to realize this flexibility.
Change-Id: Ic6546f9443b4377bb7e7b101bfa3e70a8b8d1c65
The DST2 is implemented by input alternate sign-flip, followed
by DCT, followed by output reversal.
Results are roughly the same, but it should be easier to optimize
the DST2.
[Interestingly a mtrix multuiply implementation is about 0.1%
better].
Change-Id: If9ae5fdba87767fb0e6c163a62b77ee66a8d3afc
1) Add VP10_XFORM_QUANT_SKIP_QUANT mode for vp10_xform_quant
2) Let encode_block call vp10_xform_quant so that its code flow
is clear
Change-Id: I122d5cf6a089f444ae018f3e4bf844be847e17ee
This commit allows the codec to analyze the reference motion vector
candidate list and adaptively reduce the size of inter prediction
mode set.
Change-Id: Ied6a403843b860d66f26ed485c1825c05c71bdfc
This commit re-works the entropy coding scheme of the motion
compensated prediction modes. It allows more flexible hyperplane
partition for precise classification.
Change-Id: Iba5035c76691946cf1386b6c495e399c3d9c8fc5
The 8bit temporal filter has changed to use non-local means.
This fix adds the same change into the high bitdepth code.
Change-Id: I3375b13a7d914fc8aa9eb4aac1d2e4d9b74b782f
Fix a bug in vp10_has_right;
Some cosmetic changes.
Tiny performance improvement (0.02%~0.04%) on derflr and hevcmr.
Change-Id: Iee829003a20f32d6185a08bab2bd4201806be2b3
1) Add facade to quantize b/fp/dc version so that their interface
are the same.
2) Merge vp10_xform_quant b/fp/dc version to one function so that
the code flow in encodemb.c is clear
Change-Id: Ib62d6215438fc2d07f4e7e72393f964832d6746f
This commit adds top-right corner and collocated block into the
reference motion vector candidate check list.
Change-Id: I892a4c7fb04ddda44e0f9dfe769471252d40c42b
Estimate angle histogram using gradient analysis, then skip those
angles that are unlikely to be chosen.
On ext-intra experiment, turning off filter-intra modes:
for all-key-frame setting, computation overhead is reduced
by about 40%, coding gain dropped from +2.08% to +1.96% (derflr);
with kf-max-dist=150, computation overhead is reduced
by about 60%, coding gain dropped from +0.58% to +0.49% (derflr).
Change-Id: I36687410fb10561b8e1a8eebb1528cf17755bd5b
Add inv_txfm and highbd_inv_txfm as facades of inverse transform
such that the code flow in encodemb.c can be simpler
Change-Id: Iea45fd22dd8b173f8eb3919ca6502636f7bcfcf7
Check all motion vectors in the immediate above and left blocks if
the reference conditions matched.
Change-Id: I8bf33bfcee99e8150232c7681fdeade307024272
This commit ports the motion vector stack from motion field
analyzer to the encoding and decoding pipeline.
Change-Id: Ie283c1e1a15b4c17a1c7c175ce322bf053bb7840
This commit allows the codec to analyze the motion field in the
avaiable above and left neighboring area to produce a set of
reference motion vectors for each reference frame. These reference
motion vectors are ranked according to the likelihood that it will
be picked.
Change-Id: I82e6cd990a7716848bb7b6f5f2b1829966ff2483
Reduces the transform optons for INTRA as well as INTER when
transform size is 16x16 to not use any of the DSTs.
Thus, a total of 10 options are used for 16x16, while 4x4
and 8x8 still uses 17 options.
derflr/hevchd actually improves a little, while hevcmr drops
a little.
About 10% speed improvement.
Change-Id: I920a182231e052cdd622f8bb67085c16c572cb1e
These primitive variables are commonly required by many other
experiments as well. The use of n4_w and n4_h was originally
introduced in the vp9 decoder implementation.
Change-Id: I93d701d891e3860f31150031e3b9a2b29a3993d2
Under the experiment of EXT_REFS: LAST2_FRAME, LAST3_FRAME, and
LAST4_FRAME.
Coding efficiency: derflr +1.601%; hevchr +1.895%
Speed: Encoder slowed down by ~75%
Change-Id: Ifeee5f049c2c1f7cb29bc897622ef88897082ecf
Remove MISC_FIXES flags except for the changes on MV precision, which
has a 0.1% performance drop.
On derflr, the impact is -0.012%.
Change-Id: I0a74e5a212dd0cb827192a318c92a714c9681e45
Reset the effective range of inter_tx_size, instead of the entire
array in the rate-distortion optimization loop.
Change-Id: Id453fbd6dddfe69f4e451ba8518c083326d5dd53
This commit re-designs the alternate reference frame generation
process. It employs non-local mean approach to produce more stable
pixel estimation for alternate reference frame. It improves the
compression performance gains:
derf 0.5%
hevcmr 0.8%
stdhd 1.3%
hevchr 1.0%
The encoding time at speed 0 is not affected.
Change-Id: Iaa757f0da189ce93812d69617a81bf630d449848
This commit fixes an encoding failure case triggered when early
termination feature is turned on for transform block size search.
It resolves the corresponding enc/dec mismatch issue.
Change-Id: I2c5b7d8b1efe25fe3810e6ed307f4b1865dede49
Adds a new interpolation experiment.
Improves entropy coding to send the filter type only if
the motion vectors have subpel components.
Adds one new 8-tap smooth filter, and tweaks the others.
derflr: +0.695%
hevcmr: +0.305%
About 5% encode slowdown. No visible impact for decoding.
Also makes the interpolation framework flexible to support both
strictly interpolating filters as well as non-interpolating
filters that filter integer offsets. This is mainly for
further experimentation and if not found useful the code will
be removed.
Change-Id: I8db9cde56ca916be771fe54a130d608bf10786e6
This commit refactors the rate-distortion optimization scheme for
transform block coding. When both ext-tx and var-tx experiments
are turned on, the encoding time for bus_cif at 1000 kbps goes down
from 706377 ms to 666503 ms (5.6% speed-up). The coding statics
remain unchanged.
Change-Id: I20835db573725580aad79c16220f799ce01f2093
When using FLIPADST, the vp10_inv_txfm_add functions used to flip
the destination array, add the result of the inverse transform, to it
and then flip the destination back. This has been replaced by
flipping the result of the inverse transform before adding it to the
destination. Up-Down flipping is done by negating the destination
stride, and staring from the bottom, so it should now be free.
Left-right flipping is done with the usual SSE2 instructions in the
optimized code.
The C functions match the SSE2 functions as expected, so the C functions
now do the flipping as well when required. Adding this cleanly required
some refactoring of the C functions, but there is no measurable
performance impact when ext-tx is not enabled.
Encode speedup with ext-tx enabled is about 3%.
Change-Id: I5b04e5d720f0b9f0d54fd8607a8764f2314c7234
Currently there are two parts in this experiment: extra directional intra
prediction modes and the filter intra modes migrated from the nextgen branch.
Several macros are defined in "blockd.h" to provide controls of the experiment
settings. Setting "DR_ONLY" as 1 (default is 0) means we only use directional
modes, and skip the filter-intra modes; "EXT_INTRA_ANGLES" (default is 128)
defines the number of different angles we want to support; setting
"ANGLE_FAST_SEARCH" as 1 (default is 1) means we use fast sub-optimal search
for the best prediction angle, instead of exhaustive search. The fast search
is about 6 times faster than the exhaustive search, while preserving about
60% of the coding gains.
With extra directional prediction modes (fast search), we observe the following
code gains (number in parentheses is for all-key-frame setting):
derflr +0.42% (+1.79%)
hevclr +0.78% (+2.19%)
hevcmr +1.20% (+3.49%)
stdhd +0.56%
Speed-wise, about 110% slower for key frames, and 30% slower overall.
The gains of filter intra modes mostly add up with the gains of directional
modes. The overall coding gain of this experiment:
derflr +0.94%
hevclr +1.46%
hevcmr +1.94%
stdhd +1.58%
Change-Id: Ida9ad00cdb33aff422d06eb42b4f4e5f25df8a2a
Jenkins per-commit test need to be expedited as more experiments are
added into the nextgenv2 branch. This patch does the following:
thread test: change the length of test clip from 5 frames to 3 frames;
only test speed 1.
ArfFreq test: marked as "large".
The tests marked as "large" will be removed from per-commit test
(to nightly test).
Change-Id: I62b373c52b481dcd281e741ebf5098408a97ff4d
This patch eliminates the copying of data when using FLIPADST forward
transforms, by incorporating the necessary data flipping into the
load_buffer_* functions of the SSE2 optimized forward transforms. The
load_buffer_* functions are normally inlined, so the overhead of copying
the data is removed and the overhead of flipping is minimized. Left to
right flipping is still not free, as the columns need to be shuffled in
registers.
To preserve identity between the C and SSE2 implementations, the
appropriate C implementations now also do the data flipping as part of
the transform, rather than relying on the caller for flipping the input.
Overall speedup is about 1.5-2% in encode on my tests. Note that these
are only the forward transforms. Inverse transforms to come in a later
patch.
There are also a few code hygiene changes:
- Fixed some indents of switch statements.
- DCT_DCT transform now always use vp10_fht* functions, which dispatch
to vpx_fdct* for DCT_DCT (some of them used to call vpx_fdct*
directly, some of them used to call vp10_fht*).
Change-Id: I93439257dc5cd104ac6129cfed45af142fb64574
These tables were out of sync with the indexing enum since the
refactoring in commit 4f16f119 (change 303389), due to the removal
of the ext_tx_to_txtype lookup table. This patch just puts them
back in order.
Change-Id: Ieb7d57654f61b99b511d54c9ba09abbd5e8d0d14
This commit re-works the rate-distortion optimization scheme for
transform coding. It improves the overall compression performance.
For derf set, the ext-tx experiment provides 2.27% coding gains,
and the new scheme that integrates multiple transform type selection
and recursive transform block partitioning provides a total of 3.24%
coding gains.
Change-Id: Ia1887c4c44b73dfb915d091d96660a99f09d5cc3
This commit hooks up the rate-distortion optimization system to
fully exploit recursive transform block partition and multiple
transform type. The compression performance of the two experiments
largely adds up. For derf set, ext-tx provides additional 2.1%
coding gains on top of the gains due to recursive transform block
partition (0.69%).
Change-Id: I1091fb9545f74e489a6a2489dc3c12f5abd05043
Correctly compute the block size in bit-stream coefficient token
packing. This fixes an enc/dec mismatch at very high bit-rates.
Change-Id: I37bf084731dc660df0c695cad406ddcd0f9eb904
Change-Id: I38952cd55b91f35e5db45bc8e6a20ef25069c464
--ext-refs: extended references - for multi-ref in nextgen
--ext-inter: extended inter - for new_inter/copy_mode in nextgen
--ext-interp: for new interpolation
This commit allows the loop filter to account for the recursive
transform block partition when selecting the filter and mask.
Change-Id: I62b6c2dcc0497cbe1f264b03c46163f55d2c9752
This commit refactors the loop filter selection process to support
variable transform block sizes based filter mask. It disables the
multi-thread loop filter implementation to simplify the experiments.
The speed impact on speed 0 encoding is negligible.
Change-Id: Ia470b6da9ad833fe6eb72d2cbeda9296b21910ec
If a block has all coefficients quantized to zero, the codec will
assume that it uses largest transform block size.
Change-Id: Icd4e8e7cdc4b6af6974f87169e50b040ebfe9020
Allows inter and intra tx_types to have different sets of
transforms for different tx_size/sb_type combinations.
Change-Id: Ic0ac1daef7a9fb15c4210271e4d04cd36e5cec8e
Rework the rate distortion optimization pipeline. Use precise
distortion metric that accounts for the forward and inverse
transform rounding effect.
Change-Id: Ibe19ce9791ec3547739294cc3012dd9e11f4ea49
This commit makes the coefficient token packtization process account
for variable transform block sizes supported in a single processing
block. It fixes an enc/dec mismatch issue when var-tx, ext-tx, and
misc-fixes experiments are all turned on.
Change-Id: I2e8946e6f72de567603a568debbadad11196430c
Properly update the transform type counts in key frame coding at
decoder. It fixes an enc/dec mismatch issue when both ext-tx and
misc-fixes are turned on.
Change-Id: I1e40a77c8d8157d5ff254b072ce474d8dfbaa3ae
EXT_TX introduces some new symbols to be decoded.
The encoder counts how many times these are used.
In multithreaded mode, the counts from the worker threads
need to be accumulated into the main thread.
This change means that VP10/VPxEncoderThreadTest now works
with more choices of cpu-used and number of passes.
Change-Id: Ibe7e6a3c58145265f4ead155ff98fb4cb37c3513
Properly reset the early termination flag in the recursive transform
block partitioning rate-distortion optimization scheme.
Change-Id: Ibfe918f21f11dcb1ec267c09f954c635305cc95a
Use inter_block_yrd as rate-distortion optimization for lossless
coding. This fixes transform coefficient buffer swap use case and
resolves the unit test failure related to lossless coding.
Change-Id: I1512dab5ed5760c31f7de21a06e8d9ed1eb081fa
This commit properly resets the recursive transform block partition
array in the settings of using largest transform block size at frame
header level. It fixes one of the unit test failure related to the use
of frame level fixed transform block size with 440 color format.
Change-Id: I6750f323e2c2510c080ffc3af82ce2041f4f60b8
This commit makes the recursive transform block partitioning properly
handle the non-420 color format. It resolves an enc/dec mismatch
issue in that setting when var-tx experiment is turned on.
Change-Id: I48a91de02c11b3153f897d1cca0ae948eec15605
This commit fixes an enc/dec mismatch issue in recursive transform
partitioning experiment due to merge conflict.
Change-Id: I66146ef806c008902c91d54f4f8c7ccf47996b78
This commit fixes the merge conflicts between master and nextgenv2 and
disable early termination in choose_tx_size() to avoid failure in test.
The test failures are pre-existing, some of the issue were fixed in
masterbase already, so will have another merge to introduce the fixes.
Change-Id: Ib71889661955e73aedbb4db49d8be70425281dcb
Temporarily reset the transform type in the inter modes when
recursive transform block partitioning is used. This resolves an
enc/dec mismatch issue in nextgenv2 codebase when both var-tx and
ext-tx experiments are turned on.
Change-Id: I2543f0a567243da95b237752d46964b07b669ad9
Clear the compiler errors when both high bit-depth and recursive
transform block partition experiments are enabled.
Change-Id: If0b6396851f10c28b4f26350322ccd1ba2fc9aff
This commit re-designs the recursive transform block partition
rate-distortion optimization framework. It allows the encoder to
improve speed by 10%.
Change-Id: I6dd3a7dd428a530d8012e5c6ddc40e650c8b392b
This commit makes the rate-distortion optimization for chroma
component support the recursive transform block coding scheme.
Change-Id: I1bfed6d05b0ebb3905cb625222401e2ccbae10f3
This commit makes the transform, quantization, tokenization and
their corresponding inverse operations support recursive transform
block coding process.
Change-Id: I71f2ef3a7c2d3db7cfc63c1fd3f1337e8e0360b5
This commit re-designs the bitstream syntax to support recursive
transform block partition. The initial stage targets the inter
prediction residuals.
Change-Id: I556ab3c68c198387a2fd2d02e2b475e83cd417c3
Add the row and column index to the argument list of unit functions
called by foreach_transformed_block wrapper. This avoids the
repeated internal parsing according to the block index.
Change-Id: I42b3578eac258ebaba7a7c74f684de9abab521a6
Adds an early termination to the ext_tx search, and also
implements the DST transforms more efficiently.
About 4 times faster with the ext-tx experiment.
There is a 0.09% drop in performance on derflr from 1.735% to
1.648%, but worth it with the speedup achieved.
Change-Id: I2ede9d69c557f25e0a76cd5d701cc0e36e825c7c
Resolved Conflicts in the following files:
configure
vp10/common/idct.c
vp10/encoder/dct.c
vp10/encoder/encodemb.c
vp10/encoder/rdopt.c
Change-Id: I4cb3986b0b80de65c722ca29d53a0a57f5a94316
Consider tha case in which skipping transform coefficients is more
efficient.
derflr +0.13%
hevclr +0.11%
hevcmr +0.14%
hevchr +0.22%
with ext-tx, the impact is -0.02%.
Change-Id: I0aa2965cf9e152396623c2fee62545bd3a3a7f07
Extend the ext_tx experiment to make the UV inter blocks use
the same transform type as the extended transform type used
for Y.
derflr: +1.792% (about +0.06)
Change-Id: I4a77e1f7764b2e8b523e28f42ba13559dde4f0ca
Creates new hybrid transforms combining symmetric DST with
ADST and DCT. Thus a total of 16 transforms are supported.
derfl: +1.659% (up about 0.2%)
Change-Id: Idde1cecdb59527890bf05da740099c3f6a5b9764
Does not include DST1 yet.
derflr: +1.437 (8-bit internal), +7.243 (12-bit internal)
with --enable-ext-tx
Change-Id: I91f1759fd2de794755eb6384cda52e80e979cb7d
derflr +0.202%
hevclf +0.207%
hevcmr +0.095%
hevchr +0.077%
Tested locally on several derf sequences, speed (encoder + decoder)
is slower by less than 1%.
It is part of the EXT_TX experiment, which is to be continued to
explore different transform variants.
Change-Id: I05d44994a62106538a9a241ed8d89bd7c5d14761
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.