In some cases intra modes in inter frames give
an over smoothed appearance. Especially with
noisy but flat content.
Also in some cases there were problems with key
frame sizing again with very flat but noisy content.
These are temporary changes to help alleviate the
visual problems but will almost certainly hurt metric
results especially at the very low data rate end.
Change-Id: I11549179a19277ffc283d9788bc70168f2a8bdc9
This is to fix a decoder crash when decoder skips a number of frame to
continue decoding from a later key frame.
Change-Id: I3ba116eba6c3440e0528a21f53745f694302e4ad
MSVC 2012 (_MSC_VER=1600) introduced the definition, this commit
prevents the redefinition of the macro
Change-Id: I7de92e7e9e865a342f2bcc4b071f8d3c9b2a508c
As suggested by Yaowu, we can use eob to reduce the complexity
of the vp9_ihtllm_c function. For the 1080p test clip used, the decoder
performance improved by 17%.
Change-Id: I32486f2f06f9b8f60467d2a574209aa3a3daa435
First attempt at avoiding all the compile-time environment detection for
cases where you can generate the environments statically, as when the
real build is being performed by another build system.
Change-Id: Ie3cf95d71d6c5169900f31e263b84bc123cdf73f
In addition to allowing tests to use the RTCD-enabled functions (perhaps transitively)
without having run a full encode/decode test yet, this fixes a linking issue with
Apple's G++ whereby the Common symbols (the function pointers themselves) wouldn't
be resolved. Fixing this linking issue is the primary impetus for this patch, as none
of the tests exercise the RTCD functionality except through the main API.
Change-Id: I12aed91ca37a707e5309aa6cb9c38a649c06bc6a
Only declare the functions in vpx_scale RTCD and include the relevant
header.
Remove unused files and functions in vpx_scale to avoid wasting time
renaming. vpx_scale/win32/scaleopt.c contains functions which have not
been called in a long time but are potentially optimized.
The 'vp8' functions have not been renamed yet. That is for after the
cleanup.
Change-Id: I2c325a101d60fa9d27e7dfcd5b52a864b4a1e09c
1. remove the dependency on non existing "vp9_temporal_filter_x86.h"
2. prefix filenames with vp9_ in obj_int_extract.bat to reflect the
change of the actual filenames.
Change-Id: Ib1b4d96ac41788f76917764a6722d8461c857302
- vpx_calloc is called on arf_not_zz above.
- Note The removed vpx_memset call had an issue with sizeof.
Change-Id: I86fd7a167d0a042e581e613e2a6c0b5e63073fc6
Adds support for compound inter-intra prediction with superblocks.
Also, fixes a bug that disabled intra modes for superblocks.
Change-Id: I4d711317e1bc19df8c2f32dc645429f7fff31036
Allows switchbale filters to be used without mismatch when the
superblock experiment is on.
Also removes a spurious clamping code in decodemv.c which causes
rare encode/decode mismatches.
Change-Id: I809d9ee0b2859552b613500b539a615515b863ae
Previously, the "!=" check is logically incorrect when eob is at 0 and
effective coefficient starting position is 1. This commit should have
no effect on bitstream.
Change-Id: I6ce3a847c7e72bfbe4f7c74f88e3310c6b9b6d30
This patch allows use of 8x8 and 4x4 ADST correctly for Intra
16x16 modes and Intra 8x8 modes when the block size selected
is smaller than the prediction mode. Also includes some cleanups
and refactoring.
Rebase.
Change-Id: Ie3257bdf07bdb9c6e9476915e3a80183c8fa005a
This commit makes sure Y2 entropy coding context is always updated on
every macroblock even there is no Y2 block.
Change-Id: Ie307cfc46526efe55613be39f9f178d2531b56ba
This change included:
1. Aligned reads in vp9_mbloop_filter_vertical_edge function.
Since we actually read 16 bytes, we can align the reads to read
starting at (s - 8) instead of (s - 5).
2. Combined u, v loop filters.
3. Added 8x16 transpose.
This gave 2% decoder performance gain (tulip clip).
Change-Id: Ib14c2f1645c4a3436df17fe2f24789506bf0bb58
This commit removed a couple of redundant data structures in frame
coding contextsm, mode_context and mode_context_a, and changed to
use vp9_mode_contexts only. The switch of the context for different
frame type now relies on the switch of frame coding context between
lfc and lfc_a. This commit also removed a number of memcpy among
these redundant data structure.
Change-Id: I42e8174bd60f466b0860afc44c1263896471b0f3
Not all segment feature data elements are full-range powers of two, so
there are values that can be encoded that are invalid. Add a new function
to clamp values to the maximum allowed.
Change-Id: Ie47cb80ef2d54292e6b8db9f699c57214a915bc4
Support for gyp which doesn't support multiple objects in the same
static library having the same basename.
Change-Id: Ib947eefbaf68f8b177a796d23f875ccdfa6bc9dc
Modified the mv_pred() fuunction that chooses a centre
point from which to start step searches to use the top
candidate vectors chosen previously.
Some gains (mainly on HD and tested with SB off).
Std_hd 0.874%, YT-hd 0.174%, YT 0.05%, Derf 0.036%
Change-Id: Ie232284f561838b8ecee0e28dcbb07a9cd46cf56
Vp9_sad3x16_sse2() is heavily called in decoder, in which the
unaligned reads consume lots of cpu cycles. When CONFIG_SUBPELREFMV
is off, the unaligned offset is 1. In this situation,
we can adjust the src_ptr to be 4-byte aligned, and then do the
aligned reads. This reduced the reading time significantly. Tests
on 1080p clip showed over 2% decoder performance gain with
CONFIG_SUBPELREFM off.
Change-Id: I953afe3ac5406107933ef49d0b695eafba9a6507
Add a new function vp9_decode_mb_tokens() that handles the switch
between different per-tx-size detokenize functions. Make actual
implementations (vp9_decode_mb_tokens_NxN()) static.
Change-Id: I9e0c4ef410bfa90128a02b472c079a955776816d
Don't declare variables if they only ever have a single value and are
used only as argument to another function call; instead, just hardcode
the value in the function call directly. Split out UV and Y coefficient
loops for clarity. Use xd->block[].qcoeff instead of xd->qcoeff + magic
to remove use of magic offset variables.
Change-Id: I5b17eda1bb666c69c2b7ea957d5525cd78192e33
Don't declare variables if they only ever have a single value and are
used only as argument to another function call; instead, just hardcode
the value in the function call directly. Also remove unneeded brackets
around a code block, and remove the magic offsets 64 and 256 for chroma
values in the coefficient memory block.
Change-Id: I14fc14120a81ea1d6fb862674e8bf8cf6ba3d114
Update the fmt_deps function to use a new sed expression to convert the
object file name generated by the compiler into the path-transformed
name of the .o and .d files.
Prior to this patch, changing a header file would not trigger an
incremental build.
Change-Id: I07f498a1d134577b89a72e3f1143c737b31a0636
This prevents the relatively expensive token-from-coefficient lookup
function get_token(), plus a duplicate loop..
Change-Id: Ibecd407b2a91d3593d439ec4646e43fa26d2ff91
Just like for all other block modes, b_pred tokens can be read together
before starting macroblock reconstruction. This removes special cases
for b_pred in decode_macroblock() and allows to make decode_coefs_4x4()
static in detokenize.c.
While at it, remove the redundant handling and checking of plane_type
and block_index (i) in decode_coefs_4x4(). Since the function is static,
and is called only from decode_mb_tokens_4x4(), we don't need to worry
that the arguments ever go out of sync.
Change-Id: I2d415da0b51b89d0490a6b9e24cc86363c2090f7
Experiments with a larger set of contexts and some
clean up to replace magic numbers regarding the
number of contexts.
The starting values and rate of backwards adaption
are still suspect and based on a small set of tests.
Added forwards adjustment of probabilities.
The net result of adding the new context and forward
update is small compared to the old context from the
legacy find_near function. (down a little on derf but
up by a similar amount for HD)
HOWEVER.... with the new context and forward update
the impact of disabling the reverse update (which may be
necessary in some use cases to facilitate parallel decoding)
is hugely reduced.
For the old context without forward update, the impact of
turning off reverse update (Experiment was with SB off) was
Derf - 0.9, Yt -1.89, ythd -2.75 and sthd -8.35. The impact was
mainly at low data rates.
With the new context and forward update enabled the impact
for all the test sets was no more than 0.5-1% (again most at
the low end).
Change-Id: Ic751b414c8ce7f7f3ebc6f19a741d774d2b4b556
added additional motion vectors at close neighborhood of a superblock
to the list of candiate motion vectors, and removed a couple that are
further away.
The change helped std-hd set about .8% (all metrics) and smaller gain
for derf set.
Change-Id: Iaa69b98614db43420ed3fd4738d0ca5587b90045
A patch on compound inter-intra prediction.
In compound inter-intra prediction, a new predictor for
16x16 inter coded MBs are obtained by combining a single
inter predictor with a 16x16 intra predictor, in a manner
that the weight varies with distance from the top/left
boundary. The current search strategy is to combine the best
inter mode with the best intra mode obtained independently.
Results so far:
derf +0.31%
yt +0.32%
std-hd +0.35%
hd +0.42%
It is conceivable that the results would improve somewhat
with a more thorough search strategy where all intra modes
are searched given the best mv, or even a joint search for
the best mv and the best intra mode.
Change-Id: I7951f1ed0d6eb31ca32ac24d120f1585bcd8d79b
Modify the decoder to return the ending position of the bool decoder and
use that as the starting position for the next frame.
The constant-space algorithm for parsing the appended frame lengths is
O(n^2), which is a potential DoS concern if n is unbounded. Revisit
the appended lengths for use as partition lengths when multipartition
support is added.
In addition, this allows decoding of raw streams outside of a container
without additional framing information, though it's insufficient to
be able to remux said stream into a container.
Change-Id: I71e801a9c3e37abe559a56a597635b0cbae1934b
Tags VP9 tracks with the V_VP9 video type when writing to .webm files,
and supports decoding both from vpxdec without specifying --codec.
Change-Id: I0ef61dee06f4db2a74032b142a4b4976c51faf6e
this commit changed the asm file compiling in MSVC to use individually
customized build command line with object filename specified for each
input file. This allows object filenames prefixed with path name, and
avoid name collision in link time
Change-Id: I996098643dcadc393af57035a04bef3877f45424
Rather than building an object file directory heirarchy matching the
source tree's layout, rename the object files so that the object
file name contains the path in the source file tree. The intent here
is to allow two files in different parts of the source tree to have
the same name and still not collide when put into an ar archive.
Change-Id: Id627737dc95ffc65b738501215f34a995148c5a2
Update decode_coefs() to break when c >= eob, since it's possible that
c starts the loop from 1 and eob is 0. The loop won't terminate in that
case.
Add new get_eob() function to consistently clamp the eob based on the
segment level EOB and the block size. It's possible to code a segment
level EOB that's greater than the block size, and that leads to an
out of bounds access.
Change-Id: I859563b30414615cf1b30dcc2aef8a1de358c42d
This is in line with other cases where we disable ADST if prediction
size and transform size don't match. Before this patch, the RD loop
will use ADST for superblocks, but frame encoding/decoding won't.
Change-Id: I700368c632eb72b5e089c22ef25649d99d7697d0
There are now more than 16 possible modes so 5
bits required for segment mode feature.
Note that it is likely that the mode feature and how it is
coded will change but for now the 4 bits was a bug.
Change-Id: I63348ae3a9cc31566a656c2dc78f09f5e1a9dcc9
CONFIG_DEBUG was turning on some code to dump the reconstructed frame
to a buffer from within the decoder. Move this code to a more specific
debugging define.
Change-Id: I3ca9ea634bdbd186f2470bd644d3695ee0ab3037
Similar to 16x16 dequant and idct, based on the value of eobs, the
8x8 dequant and idct calculation was simplified to improve decorder
performance.
Combined vp9_dequant_idct_add_8x8 and vp9_dequant_dc_idct_add_8x8
to eliminate duplicate code.
Change-Id: Ia58e50ab27f7012b7379c495837c9c0b5ba9cf7f
This change is a fix / extension of the newbestrefmv
experiment. As such it is presented without IFDEF.
The change creates a new context for coding inter modes
in vp9_find_mv_refs(). This replaces the context that
was previously calculated in vp9_find_near_mvs().
The new context is unoptimized and not necessarily
any better at this stage (results pending), but eliminates
the need for a legacy call to vp9_find_near_mvs().
Based on numbers from Scott, this could help decode
speed by several %.
In a later patch I will add support for forward update of
context (assuming this helps) and refine the context as
necessary.
Change-Id: I1cd991b82c8df86cc02237a34185e6d67510698a
Experiment to test speed trade off of reducing the
extent of the ref mv search.
Reducing the maximum number of tested candidates to 9 had
minimal net effect on quality in any of the tests sets.
Reduction to 7 has a small negative impact (worst was STD-HD
at about -0.2%).
This change is in response to the apparently high number of
decode cycles reported in regard to mv-ref selection.
Change-Id: I0e92e92e324337689358495a1ec9ccdeb23dc774
This fixes encoder/decoder mismatches with the superblock experiment
turned on whenever a superblock is encoded using the 4x4 transform.
Change-Id: Iefec7055e8d25f8efdbba66c4261bbd322d335a3
This should prevent inconsistent results between identical encodes with
the superblock experiment turned on.
Change-Id: I41a005fae53f2eb59736cc70041185fb7d63cfca
Preliminary patch on a new 4x4 intra mode B_CONTEXT_PRED where the
dominant direction from the context is used to encode. Various decoder
changes are needed to support decoding of B_CONTEXT_PRED in conjunction
with hybrid transforms since the scan order and tokenization depends on
the actual direction of prediction obtained from the context. Currently
the traditional directional modes are used in conjunction with the
B_CONTEXT_PRED, which also seems to provide the best results.
The gains are small - in the 0.1% range.
Change-Id: I5a7ea80b5218f42a9c0dfb42d3f79a68c7f0cdc2
The altref frame is packed along with the next P frame. So that
outside of the codec there are now only two types of frames P and I.
Also, now it is one frame in and one frame out with respect to the
codec. Apart from that, all the frames are length encoded with the
length of each frame appended to the frame itself. There are
two categories of frames and each of them will look as follows:
- Packed frames (an altref along with the succeeding p frame)
- altref_frame_data | altref_lenngth | frame_data | length
- Unpacked frames (all frames other than the above)
- frame_data | length
Change-Id: If1eabf5c473f7d46b3f2d026bd30c803588c5330
Also split superblock handling code out of decode_macroblock() into
a new function decode_superblock(), for easier readability.
Derf +0.05%, HD +0.2%, STDHD +0.1%. We can likely get further gains
by allowing to select mb_skip_coeff for a subset of the complete SB
or something along those lines, because although this change allows
coding smaller transforms for bigger predictors, it increases the
overhead of coding EOBs to skip the parts where the residual is
near-zero, and thus the overall gain is not as high as we'd expect.
Change-Id: I552ce1286487267f504e3090b683e15515791efa
As suggested by Yaowu, simplified 16x16 dequant and idct. In decoder,
after detoken step, we know the number of non-zero dct coefficients
(eobs) in a macroblock. Idct calculation can be skipped or simplified
based on eobs, which improves the decoder performance.
Change-Id: I9ffa1cb134bcb5a7d64fcf90c81871a96d1b4018
Creates a merge between the master and experimental branches. Fixes a
number of conflicts in the build system to allow *either* VP8 or VP9
to be built. Specifically either:
$ configure --disable-vp9 $ configure --disable-vp8
--disable-unit-tests
VP9 still exports its symbols and files as VP8, so that will be
resolved in the next commit.
Unit tests are broken in VP9, but this isn't a new issue. They are
fixed upstream on origin/experimental as of this writing, but rebasing
this merge proved difficult, so will tackle that in a second merge
commit.
Change-Id: I2b7d852c18efd58d1ebc621b8041fe0260442c21
In the variance calculations the difference is summed and later squared.
When the sum exceeds sqrt(2^31) the value is treated as a negative when
it is shifted which gives incorrect results.
To fix this we force the multiplication to be unsigned.
The alternative fix is to shift sum down by 4 before multiplying.
However that will reduce precision.
For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and
change).
This change is based on:
1698234 Missed some variance casts
fea3556 Fix variance overflow
Change-Id: I2c61856cca9db54b9b81de83b4505ea81a050a0f
s/([vV][pP])8/$19/
additionally dct.h was removed; declare the _c functions that are used
in the tests. the TODO for conversion to parameterized tests still
remains.
Change-Id: I73db9425a57075bbb78a92693ba6b320578981cd
there are still a couple type of warning left, which are related to
double constants assigned to float type. As those would be addressed
by the conversion of transforms into integer version. This commit
has left those un-dealt with.
Change-Id: I48fd9b489c0c27ad6b543f4177423419f929f2bb
Documentation is typically auto-detected by checking for php and
doxygen. Add an option to explicitly disable it.
Remove toggle keywords from libraries, examples, documentation and
unit tests. They were not consistent with the default status.
Change-Id: I21049675ccfd8e58ac612cd058641b197db5c0eb
calculate the txfm_cache difference first as both values may be
INT64_MAX with the intent that they cancel each other out.
Change-Id: I214d072458e1b24f60289974e6302af1aff7b66c
rd_pick_intra4x4mby_modes / rd_pick_intra8x8mby_modes would both use the
input value of 'rate_y' in the return calculation. In many places this
value is uninitialized. Remove the unneeded sum.
Change-Id: Icbd3df685303000301e69291c0ebc06f74bd548d
The block sizes for decoding tokens are up to 16x16, which means
eobs is within [0, 256]. Using (signed) char is not enough. Changed
eobs data type to unsigned short to fix the problem.
Change-Id: I88a7d3098e1f1604c336d6adb88ffec971fb03a6
Most of these were picked up by jenkins in the commit that changed
the vp8 namespace to vp9 in common/.
Change-Id: I5cbd56ffc753b92ef805133cda6acc1713a13878
For non-static functions, change the prefix to vp9_. For static functions,
remove the prefix. Also fix some comments, remove unused code or unused
function prototypes.
Change-Id: I1f8be05362f66060fe421c3d4c9a906fdf835de5
Most of these were picked up in the previous commit (prefix change from
vp8_ to vp9_), but I'm pushing this separately so that it's easier to
review.
Change-Id: Ifce2cdd6f008f4b1fbc2d89b5196d75e35fe115d
Most of these were picked up in the previous commit (prefix change from
vp8_ to vp9_), but I'm pushing this separately so that it's easier to
review.
Change-Id: I91e959895778b8632d7d33375523df8a7568a490
Converted the forward and inverse transforms to integer forms.
Modify #define TEST_INT 1/0
in the code to call integer/float version of transforms.
The tests showed that average OVERALL PSNR loss was less than 0.1%.
Change-Id: I1dfa4eeab6412597e3b970ce299cf0e116a917e6
As suggested by Paul, this commit separate the subpel refmv selection
into a separate experiment. It also changed a couple variable names
to better reflect the nature of the variables.
Change-Id: Id951c3cadc61a982dd15afe641000f60213b8995
This is the condition under which it is called in onyx_if.c. Also remove
the unused function print_mv_ref_cts().
Change-Id: I51ea3720d46f86d136e2215e01cf9d6c7dfc41ea
Two head files dct.h and dct_x86.h were removed in a previous commit,
this commit removed the build's dependency on the two files.
Change-Id: Idd33712470912d39d42f133dc30b710cab6fa832
Preparation for project restructuring.
Added vp9_ prefix on some function names that have global scope.
Added static declaration on some that dont.
Change-Id: If072f78b4300e8c17cfeed82c5d17b59946dcc5e
Previously, in evaluating reference motion vectors, MVs are always
rounded to integer pixel position and SADs are calculated. This
commit takes into account the subpixel portion of the mvs, and uses
bilinear interpolation to produce reference pixel values in subpixel
postions. In addition, SSE is used in place of SAD. Pixels used are
16x2 above and 2x16 to the left.
This commmit intends to test the potential of this line of work in
term of compression improvement, obviously, the change would increase
decoder complexity significantly.
Test results
std-hd: 1.738%(avg) 1.779%(glb), 1.663%(ssim)
derf: 0.472%(avg) 0.477%(glb), 0.418%(ssim)
Change-Id: I3ae1b098f6289df78891134d9a5e4bb2fde87a0b
Cleaned up some inconsistent references using both xd-> and
x->e_mbd. to access the same data structure in the same function.
Change-Id: Ieb496fa22bf1feec6aa7bc70b941ea4f16e0f8b5
Coding and costing of mv reference signal.
Issues in updating MV ref with COMPANDED_MVREF_THRESH
to be resolved. Ideally the MV precision should be defined based
on absolute MV magnitude not as now the MV ref magnitude.
Update to mv counts moved into bitstream.c because otherwise
if the motion reference is changed at the last minute the encoder
and decoder get out of step in terms of the counts used to update
entropy probs.
Code working on a few test clips but no results yet re benefit vs
signaling cost and no tuning of red loop to test lower cost alternatives
based on the available reference values.
Patch 3. Added check to make sure we don't pick a reference
that would give rise to an uncodeable / out of range residual.
Patch 6-7: Attempt to rebase. OK to submit but best to leave flag off for now.
Patch 9. Remove print no longer needed.
Change-Id: I1938c2ffe41afe6d3cf6ccc0cb2c5d404809a712
Quickly modified the ssse3 sixtap filters to support eight taps. For the test
clip used, a 23+% boost in decoder performance was seen. We can
revisit later and improve further.
Change-Id: I5f59860459e80d6fa23e6cc0fd91296a969f5240
Refactor per-transform copy & paste into a common function
update_coef_probs_common() and read_coef_probs_common(). The dry-run and
bit-writing loops in the encoder are still obvious candidates to be made
common, but they start to diverge a bit in the next commit, so are left
as-is for now.
Change-Id: I896bd3f4a073a6296eab7e92463fef79d8c6c08c
On Ubuntu 12.04, we got the following warning message:
<command-line>:0:0: warning: "_FORTIFY_SOURCE" redefined
[enabled by default]
<built-in>:0:0: note: this is the location of the previous definition
This was already fixed in VP8 configure file. Did the same change in
experimental branch to stop this warning.
Change-Id: Id162e5fd8841585ae806df6560b2f7536ea307c0
There is a macro DEFAULT_INTERP_FILTER defined in encoder/onyx_if.c that
is set as EIGHTTAP for now - so SWITCHABLE is not really used. Ideally,
this should be SWITCHABLE but that would make the encoder quite a bit slower.
We will change the default filter to SWITCHABLE once we find a faster way to
search for switchable filters.
Change-Id: Iee91832cdc07e6e14108d9b543130fdd12fc9874
The vp8_post_proc_down_and_across_mb_row_sse2() needs space for an
even number of macroblocks, as they are read two at a time for the
chroma planes. Round up the width during the allocation of
pp_limits_buffer to support this.
Change-Id: Ibfc10c7be290d961ab23ac3dde12a7bb96c12af0
Got 61 test vectors from vp8-test-vectors.git
(http://git.chromium.org/gitweb/?p=webm/vp8-test-vectors.git)
Added decoder test vectors downloading in unit tests. Uploaded
the test vectors and their md5 files to WebM website.
$ gsutil cp *.* gs://downloads.webmproject.org/test_data/libvpx
Added their sha1sum to the test/test-data.sha1 file.
In unit tests, download the test vectors to LIBVPX_TEST_DATA_PATH.
Test_vector_test goes through the test vectors, decodes them, and
compute the md5 checksums. The checksums are compared with the
expected md5 checksums to tell if the decoder decodes correctly.
Change-Id: Ia1e84f9347ddf1d4a02e056c0fee7d28dccfae15
* changes:
Fix another typo in 4x4-transform-for-i8x8-intra-pred coeff contexts.
8x8 transform support in splitmv.
Use SPLITMV_PARTITIONING instead of a plain integer type.
For splitmv, where partitioning is 8x16, 16x8 or 8x8, this patch
uses the 8x8 transform (instead of the 4x4) if txfm_mode is
ALLOW_8X8 or ALLOW_16X16. For TX_MODE_SELECT, splitmv can indicate
which of the 2 transform sizes (4x4 or 8x8) it wants to use.
Gains (with hybridtx4x4/8x8/16x16 and tx_select experiments
enabled) on derf: +0.9%, HD: +0.4%, STD/HD: +0.8% (SSIM or overall
PSNR, both metrics show similar improvements).
Change-Id: Ide954b000b415548ed92a7ac78e24f36e60fcf06
This can be used to distinguish between 16x8, 8x16, 8x8 and 4x4
partitioning modes when choosing splitmv as a MB mode.
Change-Id: Idc8b59772e1a80ccc4ad44d63c5c2ec3fc061a3c
It currently counts the probability that the branch is true, but it
should count the probability that the branch is false.
Change-Id: I963825da2e7a7ed3a613eb23ffd085e427dc36e5
Allows B_VL_PRED & B_LD_PRED modes to be used for all blocks
within a MB in B_PRED mode. These modes were temporarily
disabled with super-block coding.
Change-Id: I973b9bdb82c3da5f12d7cc963162a28805f25303
First sse2 version of vp8_mbloop_filter_vertical_edge(). For now,
intrinsics are being used until the bitstream is finalized. This function
will be revisited later for further performance improvements.
For the test clip used, a 34+% decoder performance improvement
was seen. This will vary depending on material.
Change-Id: I455b438bc8d8af76cf7533ac42eda5f689b21f7c
There were several different methods for calculating bitstream
probabilities in use. Consolodate these into a pair of functions,
get_prob() and get_binary_prob().
Change-Id: I5534f517f74027fee16d89c9baefaafea8156b2f
When run with no arguments, report warnings in the diff between the
working tree and HEAD. With arguments, report warnings in the diff
between the named commit and its parents.
Change-Id: Ie10dcdecb303edf8af51bad645cc11206a1fc26b
Pass the bool coder to be used explicitly. This avoids cases where two
different bool coders can be addressed from the same function. Also be
more consistent with bool coder variable naming, start to standardize
on 'bc'.
Change-Id: I1c95e2fdbe24ebe8c0f84924daa1728e3b054a31
Separates the logic on transform type selection previously spread out
over a number of files into a separate function. Currently the tx_type
field in b_mode_info is not used, but still left in there to eventually
use for signaling the transform type in the bitstream.
Also, now for tx_type = DCT_DCT, the regular integer DCT is used, as
opposed to the floating point DCT used in conjuction with hybrid
transform.
Results change somewhat due to the transform change, but are within
reasonable limits. The hd/std-hd sets are slightly up, while derf/yt
are slightly down.
Change-Id: I5776840c2239ca2da31ca6cfd7fd1148dc5f9e0f
First sse2 version of vp8_mbloop_filter_horizontal_edge(). For now,
intrinsics are being used until the bitstream is finalized. This function
will be revisited later for further performance improvements.
For the test clip used, a 31+% decoder performance improvement
was seen. This will vary depending on material.
Change-Id: I03ed3a7182478bdd1f094644ff3e0442625600e7
Prior to this patch, if there were any lint errors, this script would
exit with an error, even if those errors were not in the hunks being
tested by this script. This change makes it so that if any lint lines
are printed, an error is returned.
Change-Id: I69c8bef4367ccf25d287508f29e587b1f4426143
This commit moves a bit of data that ended up packed with the
modes/mv/residual partition during the change to interleaved encoding
into partition 0 where it belongs.
Change-Id: Ic711a378c58d9d6a17254384f492c213a15bad92
Packs the bitstream with each mb's residual following its mode/mv
information.
TODO: There are still a few fields that should be packed into partition
0 but are included in partition 1, due to them being serialized from
write_kfmodes/pack_inter_mode_mvs, which execute after the first
partition is finalized. These need to be separated out into a separate
function, similar to mb_mode_mv_init() in decodemv.c.
Change-Id: I43a46c363601ab36954d07ebe498760e1e2e3af4
Rather than decoding all modes/mvs separately, decode them per MB. This
forces the mode which was already used form the CONFIG_NEWBESTREFMV and
CONFIG_SUPERBLOCKS experiments, and is a precursor to changing to
interleaved encoding.
Change-Id: If19ee74ac8a987846d1cd0cf2b2e02a82f1a43ad
The commit changed to avoid using pixels from extended border in
in evaluating and select best reference motion vector.
Change-Id: I39b758889373e42ed2889d59744388e5b9c1a20a
It is essentially a duplicate of mode for RD-only purposes. Removing it
saves us 4 bytes per B_MODE_INFO, or ~0.5MB for a 1080p video encode.
Change-Id: I0a54db5f51658b3946d7efb1ca6e8cfbda0cdf88
The variable is essentially a duplicate of mode for RD-only purposes.
Removing it gives identical results, and saves 4 bytes per macroblock
(i.e. 32.5kB for a 1080p HD video encode).
Change-Id: I22d5058fdb80ab0b69862caee825e9d86bb148b3
This way a caller doesn't need to implement the logic for which (and how
many) tokens to write out to stuff one macroblock worth of EOBs. Make
the actual function implementations static, since they are now only used
in tokenize.c; also do some minor stylistic changes so it follows the
style guide a little more closely; use PLANE_TYPE where appropriate,
remove old (stale) frame_type function arguments; hardcode plane type
where only a single one is possible (2nd order DC or U/V EOB stuffing);
support stuffing 8x8/4x4 transform EOBs with no 2nd order DC.
Change-Id: Ia448e251d19a4e3182eddeb9edd034bd7dc16fa3
Change the macros PLANE_TYPE_{Y_NO_DC,Y2,UV,Y_WITH_DC} to a typed enum,
and use this typed enum consistently across all places where relevant.
In places where the type is implied (e.g. in functions that only handle
second order planes or chroma planes), remove it as a function argument
and instead hardcode the proper enum in the code directly.
Change-Id: I93652b4a36aa43163d49c732b0bf5c4442738c47
Also merge the three occurrences of 4x4 chroma block writing into a
single function, and call that function instead of duplicating the
4x4 chroma tokenization code in 3 places.
Change-Id: I7913538d1029f709b0e3ae49fff1148d3be9eeb9
Merge code blocks for different transform sizes; use MACROBLOCKD as a
temp variable where that leads to smaller overall source code; remove
duplicate code under #if CONFIG_HYBRIDTRANSFORM/#else blocks. Some style
changes to make it follow the style guide a little better.
Change-Id: I1870a06dae298243db46e14c6729c96c66196525
Also make some minor stylistic changes to bring the code closer to
the style guide. Remove distinction between inter and intra transform
functions, since both do exactly the same thing except for the check
against SPLITMV for the second-order transform. Remove some commented
out debug code. Remove 8x8/16x16 transform code in encode_inter16x16y(),
since the first-pass only uses 4x4 anyway.
Change-Id: Ife54816ff759825b9141f95dc2ba43c253c14dba
Also make some minor stylistic changes to bring the code closer to
the style guide. Remove checks against i8x8/bpred in the mb-codepath,
since these do individual block reconstruction and thus don't go through
this codepath.
Change-Id: I4dfcf8f78746f4647a206475acf731837aa4fd47
This includes trellis optimization, forward/inverse transform,
quantization, tokenization and stuffing functions.
Change-Id: Ibd34132e1bf0cd667671a57b3f25b3d361b9bf8a
Entropy coding takes care of this anyway, and this causes changes to
the txfm size assigned to skip blocks, which can affect the loopfilter
output, thus causing encoder/decoding mismatches.
Change-Id: I591a8d8a4758a507986b751a9f83e6d76e406998
The update_mb_segmentation_map flag was being signalled earlier than
other data dependent on that flag. Consolidate this data so it's
parsed within the same if-scope as the flag is originally parsed in.
Change-Id: I10e90b4f511856445ef75a85a44ff441e1e5e672
If the threshold(limits) <= 0, skipped filtering and copied the
frame directly. Also, fixed memory allocation checking.
Change-Id: If3d79d5b2bcb71b9777e6eb5cba1384585131e22
Documentation is typically auto-detected by checking for php and
doxygen. Add an option to explicitly disable it.
Remove toggle keywords from libraries, examples, documentation and
unit tests. They were not consistent with the default status.
Change-Id: I21049675ccfd8e58ac612cd058641b197db5c0eb
Use the common update_skip_probs() function rather than duplicating its
logic in write_kf_modes().
Change-Id: I2890a28f6907cb79ffe0fb21d20f0ef98b85cdd9
If a reference frame is forced because of low dissimilarity, then
shut off the search of intra modes. This change has mixed results. On
one clip (QVGA), it hurt quality by ~1.5% with negligible speed impact.
On another (VGA) it had negligible affect on quality, but a ~0.2% speed
impact.
Change-Id: Ic8b07648979d732f489de5f094957e140f84d2eb
Rather than overloading the parent_ref_frame value to shut off the
search in some cases, add a new validity flag. This cleans up some
of the duplicated mr_encoder_id && mr_low_res_mv_avail checks as
well, for readability.
Change-Id: Iddad93a27066c3d85ff2f25a361ac113b288ab7b
Results: derf (vanilla or +hybridtx) +0.2% and (+hybrid16x16
or +tx16x16) +0.7%-0.8%; HD (vanilla or +hybridtx) +0.1-0.2%
and (+hybrid16x16 or +tx16x16) +1.4%, STD/HD (vanilla or +hybridtx)
about even, and (+hybrid16x16 or +tx16x16) +0.8-1.0%.
Change-Id: I03899e2f7a64e725a863f32e55366035ba77aa62
1. Algorithm modification:
Instead of having same filter threshold for a whole frame, now we
allow the thresholds to be adjusted for each macroblock. In current
implementation, to avoid excessive blur on background as reported
in issue480(http://code.google.com/p/webm/issues/detail?id=480), we
reduce the thresholds for skipped macroblocks.
2. SSE2 optimization:
As started in issue479(http://code.google.com/p/webm/issues/detail?id=479),
the filter calculation was adjusted for better performance. The c
code was also modified accordingly. This made the deblock filter
2x faster, and the decoder was 1.2x faster overall.
Next, the demacroblock filter will be modified similarly.
Change-Id: I05e54c3f580ccd427487d085096b3174f2ab7e86
In some situations, believed to be an interaction between temporal
scalability and dropped frames, the references available to an
encoder may not be the same references available to its parent.
Previously, the code tried to force the reference frame chosen by
the parent to be used on this frame, even if it was disabled. This
was preventing the pick mode loop from running even once, which led
to a crash.
Attempts to reproduce this bug locally were unsuccessful, so it is
still undetermined what the underlying cause of this issue is. In
the specific case that was failing, the application did not set
any flags which influenced the reference selection on that frame.
ref_frame_flags indicated that the golden frame was disabled,
believed to be because the last frame updated the last and golden
frames, so golden was shut off by default. It's not clear why this
wouldn't have also been true in the lower res encoder, ie, why the
lower res encoder decided to use and/or was allowed to use the
golden frame. We weren't able to debug into the non-crashing
lower res encoder as the crash couldn't be reproduced locally.
Change-Id: Ifb265253d26963ac2afde0e20cf6792788be6af7
This commit fixes unsafe simd / floating point interactions arising
from the current hybrid and 16x16 transform implementation.
These led to a raft of bugs and issues when the project was
built using VS2008 for Win32 though they did not show up with
the unix builds.
Gerrit makes a meal out of presenting the fix but all I have actually
done is indent the body of each function that uses floating point by
one level and bracket with emms instructions using the function
vp8_clear_system_state(). See below.
function () {
vp8_clear_system_state();
{
... function body
}
vp8_clear_system_state();
}
This is almost certainly over the top in terms of number of emms
instructions but is a temporary measure pending implementation of
integer variants of each function to replace the floating point.
Limited testing suggests that this fixes the problems that arose for
Win32 VS2008 when the hybrid or 16x16 transforms were enabled.
Change-Id: I7c9a72bd79315246ed880578dec51e2b7c178442
This unit test compares the difference in quality with
error resilience enabled and disabled. The test runs
for all of the one-pass encoding modes.
The test ensures that the effect of turning on error
resilience makes less than a 10% difference in PSNR.
Further cases should be added to do a more comprehensive
test.
Change-Id: I1fc747fc78c9459bc6c74494f4b38308dbed0c32
If a parent mb is available but is intra coded, then parent_ref_mv is
invalid. Check that the parent is inter coded before trying to access
the parent_ref_mv. Previously the parent_ref_mv was being read from
an uninitialized stack allocation, causing potential OOB reads and
other undefined behavior.
Change-Id: I0c93cd412a19c3a184bcf6decaa145b3a036a6c0
Modified EncoderTest class to have separate member variables
for initialization time and per-frame.
Change-Id: I08a1901f8f3ec16e45f96297e08e7f6df0f4aa0b
The stats buffer needs to be reset between runs of the
encoder. I added a Reset() function to TwopassStatsStore
and called it at the beginning of each encode.
This enables us to run multiple encodes which was
previously not possible since there was no way to reset
the stats between runs.
Change-Id: Iebb18dab83ba9331f009f764cc858609738a27f9
Protect the call to {Initialize,Delete}CriticalSection() with an
Interlocked{Inc,Dec}rement() pair, rather than the previous static
initialization. This should play better with AppVerifier, and fix issue
http://code.google.com/p/webm/issues/detail?id=467
Change-Id: I06eadbfac1b3b4414adb8eac862ef9bd14bbe4ad
The codec as it stood placed a keyframe one frame after a
real cut scene - and ignored datarate and other considerations.
TODO: Its possible that we should detect a keyframe and recode
the frame ( in certain circumstances) to improve quality.
Change-Id: Ia1fd6d90103f4da4d21ca5ab62897d22e0b888a8
Reset the cyclie refresh mode index in alloc_compressor_data().
This is needed to handle both cases of internal and
external spatial resizing.
Change-Id: I2697e12d45135eae2e8f0d45161811f24722312a
Separates the entropy coding context models for 4x4, 8x8 and 16x16
ADST variants.
There is a small improvement for HD (hd/std-hd) by about 0.1-0.2%.
Results on derf/yt are about the same, probably because there is not
enough statistics.
Results may improve somewhat once the initial probability tables are
updated for the hybrid transforms which is coming soon.
Change-Id: Ic7c0c62dacc68ef551054fdb575be8b8507d32a8
On an internal spatial resize, this mode index was not reset to 0,
and therefore could exceed dimensions of seg_map or cyclic_refresh_map.
Change-Id: I6fe85dbd2765eb0207a9d9f71fda8d8b8c34f075
Patch Set 1: gain familiarity with unit tests... added simple
4x4 subtract test
Patch Set 2: fixed mistakes, parameterized as suggested
Patch Set 3: randomized the source/predictor data
Change-Id: I33432bdf7c9f2a9b8c2533a37106382c2a8209ee
Signed-off-by: Scott LaVarnway <slavarnway@google.com>
Fixes some build issues for people building for win32 who have a
pthreads emulation layer installed.
Change-Id: I0e0003fa01f65020f6ced35d961dcb1130db37a8
This should avoid problems with blocks gettings high quality
improvement despite having recently moved:
Change-Id: Ic0af0de2d6577807fa3c553f47b55d547ef36359
Set the seg map to 0 for key frame.
In previous commit on cyclic refresh, the seg map for key frame
was not reset, and instead used the seg map from last frame.
Change-Id: I848eb2face420dfcd2f7daca6f070b9127ca938b
-Increase the amount of mbs to be refreshed.
-Replace the delta qp with a fixed and reduced delta.
-Change to the mb update loop to try to always update same amount of mbs.
Change-Id: I93ac88002fd8dc677d2337f77998ff93f64e4ff9
With this change, even if hybridtransform8x8 experiment is off,
8x8 dct is used for the I8x8 mode. However note that the gains
observed with the hybridtransform8x8 experiment will now be less,
since part of the gain is now merged in.
Change-Id: I9afb3880906fd0a1368a374041fc08efcf060c54
This change is necessary for the frame-based multithreading
implementation.
Since the postproc occurs in this call, vpxdec was modified to time around
vpx_codec_get_frame()
Change-Id: I389acf78b6003cd35e41becc16c893f7d3028523
The commit changed to use 3 rows above and 3 cols from left for SAD
scoring for selecting the best reference motion vector. The change
helped std-hd set by >.2% on psnr/ssim metrics.
Change-Id: Ifad3b528d0b4b6e3c22518af789d76eff23c1520
Initialize the top line at the beginning of each frame and
the left column at the beginning of each row.
Change-Id: I5412f7ea49ffc490215cf65a62715a6c5e3a5a29
The high-precision (1/8) pel bit is turned off if the reference
MV is larger than a threshold. The motivation for this patch is
the intuition that if motion is likely large (as indicated by
the reference), there is likley to be more motion blur, and as
a result 1/8 pel precision would be wasteful both in rd sense
as well as computationally.
The feature is incorporated as part of the newmventropy experiment.
There is a modest RD improvement with the patch. Overall the
results with the newmventropy experiment with the threshold being
16 integer pels are:
derf: +0.279%
std-hd: +0.617%
hd: +1.299%
yt: +0.822%
With threshold 8 integer pels are:
derf: +0.295%
std-hd: +0.623%
hd: +1.365%
yt: +0.847%
Patch: rebased
Patch: rebase fixes
Change-Id: I4ed14600df3c457944e6541ed407cb6e91fe428b
Multiple decoders were getting allocated per frame.
If the decoder crashed we exitted with out freeing
them and the next time in we'd allocate over.
This fix removes the allocation and just has 8
boolcoders in the pbi structure
Change-Id: I638b5bda23b622b43b7992aec21dd7cf6f6278da
Some cleanups that will make it easier to maintain the code
and incorporate upcoming changes on entropy coding for the
hybrid transforms.
Change-Id: I44bdba368f7b8bf203161d7a6d3b1fc2c9e21a8f
If the decoder crashes and returned an error before it set up
block offsets but after it set up frame buffers. We had a
problem decoding the next keyframe because the block offsets
were never set.
Change-Id: Ied2866e9770d80fc66241d5e0d978d4f5f9cdd89
This commit merges those parts of the CONFIG_NEW_MVREF
that specifically relate to choosing a better set of candidate
MV references into the NEWBESTREFMV experiment.
CONFIG_NEW_MVREF will then be used for changes relating
to the explicit coding of a cost optimized MV reference in the
bitstream as part of MV coding.
Change-Id: Ied982c0ad72093eab29e38b8cd74d5c3d7458b10
Extend experiment to use both vectors from MBs
coded using compound prediction as candidates.
In final sort only consider best 4 candidates
for now but make sure 0,0 is always one of them.
Other minor changes to new MV reference code.
Pass in Mv list to vp8_find_best_ref_mvs().
Change-Id: Ib96220c33c6b80bd1d5e0fbe8b68121be7997095
Adds a new experiment with redesigned/refactored motion vector entropy
coding. The patch also takes a first step towards separating the
integer and fractional pel components of a MV. However the fractional
pel encoding still depends on the integer pel part and so they are
not fully independent. Further experiments are in progress to see
how much they can be decoupled without affecting performance.
All components including entropy coding/decoding, costing for MV
search, forward updates and backward updates to probability tables,
have been implemented.
Results so far:
derf: +0.19%
std-hd: +0.28%
yt: +0.80%
hd: +1.15%
Patch: Simplifies the fractional pel models:
derf: +0.284%
std-hd: +0.289%
yt: +0.849%
hd: +1.254%
Patch: Some changes in the models, rebased.
derf: +0.330%
std-hd: +0.306%
yt: +0.816%
hd: +1.225%
Change-Id: I646b3c48f3587f4cc909639b78c3798da6402678
Adjusts some of the qualification thresholds in mfqe to eliminate
artifacts due to wrong decisions. Besides, a new qualification
criteria is used to disable mfqe if the quality of the previous
frame is itself not too good.
Change-Id: I4097c20b7fd4fcc60cc3003c1e33e8faae2ff066
The denoiser function was modified to reduce the computational
complexity.
1. The denoiser c function modification:
The original implementation calculated pixel's filter_coefficient
based on the pixel value difference between current raw frame and last
denoised raw frame, and stored them in lookup tables. For each pixel c,
find its coefficient using
filter_coefficient[c] = LUT[abs_diff[c]];
and then apply filtering operation for the pixel.
The denoising filter costed about 12% of encoding time when it was
turned on, and half of the time was spent on finding coefficients in
lookup tables. In order to simplify the process, a short cut was taken.
The pixel adjustments vs. pixel diff value were calculated ahead of time.
adjustment = filtered_value - current_raw
= (filter_coefficient * diff + 128) >> 8
The adjustment vs. diff curve becomes flat very quick when diff increases.
This allowed us to use only several levels to get a close approximation
of the curve. Following the denoiser algorithm, the adjustments are
further modified according to how big the motion magnitude is.
2. The sse2 function was rewritten.
This change made denoiser filter function 3x faster, and improved the
encoder performance by 7% ~ 10% with the denoiser on.
Change-Id: I93a4308963b8e80c7307f96ffa8b8c667425bf50
Replace DECLARE_ALIGNED_ with vpx_memalign()
DECLARE_ALIGNED (__declspec(align())) does not work as intended when
used on class data members:
Data in classes or structures is aligned within the class or structure
at the minimum of its natural alignment and the current packing setting
(from #pragma pack or the /Zp compiler option)
Change-Id: I304aaa6c3716fbfae24675ecf192f4b40787e83e
Enable ADST/DCT of dimension 16x16 for I16X16 modes. This change provides
benefits mostly for hd sequences.
Set up the framework for selectable transform dimension.
Also allowing quantization parameter threshold to control the use
of hybrid transform (This is currently disabled by setting threshold
always above the quantization parameter. Adaptive thresholding can
be built upon this, which will further improve the coding performance.)
The coding performance gains (with respect to the codec that has all
other configuration settings turned on) are
derf: 0.013
yt: 0.086
hd: 0.198
std-hd: 0.501
Change-Id: Ibb4263a61fc74e0b3c345f54d73e8c73552bf926
The transform functions in experimental branch absorbed a scaling
factor of 4 to allow quantization steps closer to unit quantizer.
This commit added scaling code in between forward and inverse
transform to properly account for the scaling factor.
Change-Id: I9a573ddc1ffa74973b34800a5da1a56dbabe0949
Alternative strategy for finding a list of candidate motion
vectors to use as reference values in mv coding and as
nearest and near.
Sort by sad in vp8_find_best_ref_mvs() rather than just
pick the best. Allow 0,0 as a best ref option but not a
nearest or near unless there are no alternatives.
Encode/Decode verified on at least some clips.
Some commented out experimental and stats code still in place.
Gain over existing code averages about 1% on derf (alll metrics)
with improvement on all clips. Other test results pending.
The entropy coding of the mode (nearest/near etc) still
depends upon and requires the old "findnear" code so
this needs looking at and may provide room for further gains.
Change-Id: I871d7cba1d1c379c4bad9bcccce1fb19c46b8247
For videos with big static background(such as video conferencing
clips), the mode decision was biased to ZEROMV in order to
obtain a stable background. The percentage of ZEROMV on last
frame was used to predict if there is static area in current frame,
and checking already-encoded neighboring macroblocks' motion
vectors to make sure the local area has low motion.
Change-Id: I05b3241d3a56a0bda88b6681e5646c1c8baf2e57
The current way of counting inter_zz_count doesn't work correctly
in multi-threaded encoding. Calculating it after the frame is
encoded fixed the problem.
Change-Id: Ifcb1972cde950b8cc194f75c6d7b6af09e8b0e65
Added missing parameters to calls to:
vp8_build_intra_predictors_internal
vp8_build_intra_predictors_mbuv_internal
Change-Id: If8eeb8ff23eff4572397b404fe61be5d0c950bbe
This commit adds a pick_sb_mode() function which selects the best 32x32
superblock coding mode. Then it selects the best per-MB modes, compares
the two and encodes that in the bitstream.
The bitstream coding is rather simplistic right now. At the SB level,
we code a bit to indicate whether this block uses SB-coding (32x32
prediction) or MB-coding (anything else), and then we follow with the
actual modes. This could and should be modified in the future, but is
omitted from this commit because it will likely involve reorganizing
much more code rather than just adding SB coding, so it's better to let
that be judged on its own merits.
Gains on derf: about even, YT/HD: +0.75%, STD/HD: +1.5%.
Change-Id: Iae313a7cbd8f75b3c66d04a68b991cb096eaaba6
Loop filter producing wierd artifacts when
repeatedly applied in noisy video. This
mitigates the effect.
Change-Id: If4b1a8543912d186a486f84e11d8b01f7436fa5f
Resolved the decoder mismatch issue due to quantization parameter
threshold for hybrid transform coding. The macroblock dequantizer
initialization is moved to be performed before coefficient
detokenization, since the (de)tokenization is now dependent on the
macroblock level quantization parameter.
Change-Id: I443da4992ebb70ae4114750b2f1363c0c628580e
This doesn't affect the result, since there are no MVs coded using this
entropy. It does, however, silence valgrind warnings about uninitialized
variables.
Change-Id: I6e21ba92df6ce5381bf58b8c349ef4373294a0b6
About 3.5x faster, 30% overall encoder speedup. Rest of optimizations
will come soon (see TODO section in filter_sse4.c).
Change-Id: If18108048bfd5345fc942e8574e4c7f58e0e86e0
visual studio targets do not depend on executables, only the projects
produced.
tested with --target=x86-win32-vs9
fixes:
...
make[1]: *** No rule to make target `test_libvpx', needed by `.bins'.
Stop.
Makefile:17: recipe for target `.DEFAULT' failed
Change-Id: I606ab32d5e26fee352f25c822e0f496eff165382
The current parsing logic of the dumpmachine tuple lacks any arm
cases which means tgt_isa never gets set, so for all arm targets,
we get detected as generic-gnu. Add some basic arm checks here
so the automatic detection logic works.
Change-Id: Ie5e98142876025c6708604236bc519c0bdb09319
If you build with --enabled-shared on a Linux arch not explicitly
listed, the configure script will abort because it didn't detect
"linux" in the fallback generic-gnu tuple.
Since this is the fallback tuple and people are passing
--enable-shared, assume the user knows what they're in for.
Change-Id: Ia35b657e7247c8855e3a94fca424c9884d4241e3
The reference motion vector selected by surrounding pixels that has
the best matching score is used as nearest motion vector.
The change has shown consistent gain on all test sets, compression
gains range from .2% to .6%. The variation is largely dependent on
various other experiments on or off.
Change-Id: I5552e1c2f6fc57c3e8818a5ee41ffda89af05e75
References to MACROBLOCKD that use "x" changed to "xd"
to comply with convention elsewhere that x = MACROBLOCK
and xd = MACROBLOCKD.
Simplify some repeat references using local variables.
Change-Id: I0ba2e79536add08140a6c8b19698fcf5077246bc
Add local variable in several places to reference the MB mode
info structure. Currently this is usually accessed in the code as
x->e_mbd.mode_info_context->mbmi.* or in some places
xd->mode_info_context->mbmi.*
Resolved some uses of x-> for the MACROBLOCKD structure.
Rebased without dependency on motion reference experiment.
Change-Id: If6718276ee4f2ef131825d1524dfdb02a3793aed
Merges this experiment in to make it easier to run tests on
filter precision, vectorized implementation etc.
Also removes an experimental filter.
Change-Id: I1e8706bb6d4fc469815123939e9c6e0b5ae945cd
Latest version of all scripts/makefile but rtcd_defs.sh is empty, all
existing functions are still selected using the old/current way.
Change-Id: Ib92946a48a31d6c8d1d7359eca524bc1d3e66174
using large values for the timebase, e.g., {33333, 1000000} could
rollover the timestamp calculation in vp8e_encode as it was not using
64-bit math.
originally reported on ffmpeg's trac:
https://ffmpeg.org/trac/ffmpeg/ticket/1014
BUG=468
Change-Id: Iedb4e11de086a3dda75097bfaf08f2488e2088d8
The commit replaces run-time initialization of cosine constants with
static constant values, which provides ~30% relief on slow speed. The
real solution, however will be to implement integer versions of those
functions that current use float/double.
Change-Id: Ie3ff1793509653d78dd1aeaf88cc6737da1bc55f
Using surrounding reconstructed pixels from left and above to select
best matching mv to use as reference motion vector for mv encoding.
Test results:
AVGPSNR GLBPSNR VPXSSIM
Derf: 1.107% 1.062% 0.992%
Std-hd:1.209% 1.176% 1.029%
Change-Id: I8f10e09ee6538c05df2fb9f069abcaf1edb3fca6
The forward and inverse hybrid transforms are now performed using
single function modules, where the dimension is sent as argument.
Added an inline function clip8b to clip the reconstruction pixels
into range of 0-255.
Change-Id: Id7d870b3e1aefc092721c80c0af6f641eb5f3747
This allows building on MountainLion as the 10.6 SDK has been
removed from the latest Xcode version (4.4 4F250). Also fix
all warnings for that build.
Change-Id: Ib70bca4a25295f13595f0d10ea9f0229631de5a4
Merged in the high_precision_mv experiment to make it easier
to work on new mv encoding strategies. Also removed
coef_update_probs3().
Change-Id: I82d3b0bb642419fe05dba82528bc9ba010e90924
Previouly, the decoding of mode and motion vector are done a per frame
basis followed by residue decoding and reconstuction. The commit added
the option to allow decoder to interleave the decoding of mode and mvs
with the residue decoding on a per MB basis.
Change-Id: Ia5316f4a7af9ba7f155c92b5a6fc97201b653571
Fixed the code review comments.
Under the htrans8x8 experiment the 8X8 DCT in the
I8X8 mode is replaced with a combination of 8X8 ADST and
DCT.
Overall coding gains with the htrans8x8 experiment are:
derf: 0.486
std-hd: 1.040
hd: 1.063
yt: 0.506
Note that part of the gain comes from bigger transforms
(8x8 instead of 4x4) and part comes from replacing the DCT
wth the ADST.
Change-Id: I92ca6bbfce11b4165d612b81d9adfad4d010c775
Set on all 16x16 intra/inter modes
Features:
- Butterfly fDCT/iDCT
- Loop filter does not filter internal edges with 16x16
- Optimize coefficient function
- Update coefficient probability function
- RD
- Entropy stats
- 16x16 is a config option
Have not tested with experiments.
hd: 2.60%
std-hd: 2.43%
yt: 1.32%
derf: 0.60%
Change-Id: I96fb090517c30c5da84bad4fae602c3ec0c58b1c
Interleaved loopfiltering with decode. For 1080p clips, up to 1%
performance gain. For 4k clips, up to 10% seen. This patch is required
for better "frame-based" multithreading.
Change-Id: Ic834cf32297cc04f27e8205652fb9f70cbe290db
Apply 2D-DCT transform of dimension 8x8 to encode prediction
residuals of I8X8 mode.
Brought back block type 3 probability context model for 8x8 tokens,
which is used for the coefficients of Y blocks in I8x8 modes. The
coefficient costs estimate of I8X8 mode in rate-distortion is also
changed appropriately.
Performance results:
derf: 0.246
yt: 0.114
std-hd: 0.730
hd: 0.670
Change-Id: If1d970eeb4e1827c9f0d2c5b27d33089b347ea27
predict_d has become canonical. Remove previous helper function.
Disable ARM assembly pending update.
Change-Id: Idd84ac8a28f9b0221ea97904a77de1e705d06a7d
The sync interval for the multithreaded encoder was considered as not changing
during the encoding. This is not true if picture size is changed.
The encoder could dead-lock because the main thread and the other threads were
using different sync interval.
Change-Id: I75232bbdbc6c02d77f830d870fd8b4e96697c64e
After the picture size was changed to a bigger one, the internal memory was
corrupted and multithreaded encoder was deadlocking.
Memory for last frame's MVs, segmentation map and active map were allocated when
the compressor was created (vp8_create_compressor). Buffers need to be
reallocated when picture size is changed, so, the allocation was moved to
vp8_alloc_compressor_data, which is called every time the picture is resized.
Change-Id: I7ce16b8e69bbf0386d7997df57add155aada2240
Merged the enhanced_interp experiment.
Found and fixed a bug in the include files framework, whereby
certain encoder files were still using the old INTERP_EXTEND
value of 3 instead of 4. The thresholds for mv range mcomp.c
need a small adjustment to prevent crashes.
The results are more or less unchanged.
Change-Id: Iac5008390f1efc97ce1102fbb5f8989c847fb579
Allows for swtiching/setting interpolation filters at the MB
level. A frame level flag indicates whether to use a specifc
filter for the entire frame or to signal the interpolation
filter for each MB. When switchable filters are used, the
encoder chooses between 8-tap and 8-tap sharp filters. The
code currently has options to explore other variations as well,
which will be cleaned up subsequently.
One issue with the framework is that encoding is slow. I
tried to do some tricks to speed things up but it is still slow.
Decoding speed should not be affected since the number of
filter taps remain unchanged.
With the current version, we are up 0.5% on derf on average but
some videos city/mobile improve by close to 4 and 2% respectively.
If we did a full-search by turning the SEARCH_BEST_FILTER flag
on, the results are somewhat better.
The framework can be combined with filtered prediction, and I
seek feedback regarding that.
Rebased.
Change-Id: I8f632cb2c111e76284140a2bd480945d6d42b77a
The ambient qp and active worse/best qp were reset for every frame
when temporal layers is on. This change removes this reset.
As this affects the target size for forced key frames
(it will actually lower the size somewhat), we increased the
inital boost factor to compensate.
Change-Id: Ie38d95f5c99ab3d447469c49e2177bc3fcc4ad28
SAD returns unsigned values. Make all the declarations the same.
Remove bestsad initialization and check. It is always set to the
result of a SAD call so it will never remain UINT_MAX
Use ja instead of jg to test unsigned comparison instead of signed.
Update test.
Change-Id: I46336ab45f4e60fc37caf20bd36bc5782079c7a5
The following five experiments are merged:
newentropy
newupdate
adaptive_entropy (also includes a couple of parameter changes
that improves results a little
in common/entropymode.c and encoder/modecosts.c
that were not merged from the internal branch)
newintramodes
expanded_coef_context
Change-Id: I8a142a831786ee9dc936f22be1d42a8bced7d270
Precalculated block ptrs do not need updates during encoding.
Set these at init stage.
Moved the allocation of 'mt_current_mb_col' (last encoded MB on each
row) to vp8_alloc_compressor_data(), so that it is correctly
reallocated when frame size is changing.
Change-Id: Idcdaa2d0cf3a7f782b7d888626b7cf22a4ffb5c1
Added drop_frame support in multi-resolution encoder.
If one frame is dropped at a lower-resolution level, the next
upper-resolution level encoder needs to encode that frame
independently without any lower-resolution level motion
information.
Another issue is that if one frame is dropped at some but not all
resolution levels, a frame after that one may use different set
of reference frames at different resolution levels. This reference
frame asynchronization could degrade motion search precision in
upper-resolution level encoding, which uses lower-resolution level
motion result. This change compares the lower-resolution and upper-
resolution level's reference frames. If they are not the same, the
upper-resolution level encoder can not use lower-resolution level
motion result.
Change-Id: I61afa4f313630e75b7cbdd5742e230e8724a988a
Replaced local definitions of the extension required
by the filters with the globally defined value.
Change-Id: If9e590a1f2e5b0bdc2d3e3c3f04aacbd3b09bfee
Adds ADST/DCT hybrid transform coding for Intra4x4 mode.
The ADST is applied to directions in which the boundary
pixels are used for prediction, while DCT applied to
directions without corresponding boundary prediction.
Adds enum TX_TYPE in b_mode_infor to indicate the transform
type used.
Make coding style consistent with google style.
Fixed the commented issues.
Experimental results in terms of bit-rate reduction:
derf: 0.731%
yt: 0.982%
std-hd: 0.459%
hd: 0.725%
Will be looking at 8x8 transforms next.
Change-Id: I46dbd7b80dbb3e8856e9c34fbc58cb3764a12fcf
the integer version has very good precision, the float version is no
longer useful. this commit also removes the experiment option from
configure script.
Change-Id: Ibb92e63c9f5083357cdf89c559d584a7deb3353f
this commit removes a number of experiment options from configure
script. the associated experiments are already fully merged, the
options in configure script have no effect at all.
Change-Id: I8054ccaee0a04610162ed76ac9e59c4538217113
vp8_encode_inter_macroblock() is called in both pick_mb_modes() as
well as encode_sb(), thus the number of macroblocks in the counter
were twice as big as actual numbers. This doesn't affect output.
Change-Id: I6de8a996ee44d2f7f2080d8d2177dd7bc6207c93
This allows CONFIG_SUPERBLOCKS experiment to almost compile succesfully,
except for the missing pick_sb_modes() function.
Change-Id: Ib2322f2aacdc371e8066f2eb4a8d761c40490b4d
Added validity checking in multi-res encoder. Disable spatial
resampling and frame dropping before we have those supports.
Also, deallocate the memory for all resolution levels once error
occurs.
Change-Id: Ia5d65a645381cad1a49940ab3a19bb5696c39c09
Create a merge between the experimental development history and the
previously published snapshot.
Change-Id: If320df72a6bbefec53833626d08dbc9678be2d2d
The lower complexity modes may not generate a keyframe automatically.
This behavior was found when running under Valgrind, as the slow
performance caused the speed selection to pick lower complexities than
when running natively. Instead, use a fixed complexity for the
realtime auto keyframe test.
Affected tests:
AllModes/KeyframeTest.TestAutoKeyframe/0
Change-Id: I44e3f44e125ad587c293ab5ece29511d7023be9b
This unit test tests vp8_sixtap_predict function against preset
data and random generated data. The test against preset data
checks the correctness of the functions, and the test against
random data checks if the optimized six-tap predictor functions
generate matching result as the c functions. It tests the
following functions:
vp8_sixtap_predict16x16_c
vp8_sixtap_predict16x16_mmx
vp8_sixtap_predict16x16_sse2
vp8_sixtap_predict16x16_ssse3
vp8_sixtap_predict8x8_c
vp8_sixtap_predict8x8_mmx
vp8_sixtap_predict8x8_sse2
vp8_sixtap_predict8x8_ssse3
vp8_sixtap_predict8x4_c
vp8_sixtap_predict8x4_mmx
vp8_sixtap_predict8x4_sse2
vp8_sixtap_predict8x4_ssse3
vp8_sixtap_predict4x4_c
vp8_sixtap_predict4x4_mmx
vp8_sixtap_predict4x4_ssse3
Change-Id: I6de097898ebca34a4c8020aed1e8dde5cd3e493b
This is to avoid a rare encoder/decoder mismatch for MB using SPLITMV
mode. In decoder, the UV mv can be determined to need clamp, but the
flag is never set in encoder motion vector selection process, and the
clamp is not done in encoding in encoder.
Change-Id: I60520d3f790354c7855dadf03f0978ea9b77e2c0
This patch fixed issue 458 by calling copy function when both
offsets are 0, which guarantees the SSSE3 functions output
same result as the c function for all possible offsets.
Change-Id: I209aec7a4c6b3362db2646a8887c1038493b6496
xd->subpixel_predict16x16 is called in first pass, but isn't
initialized in first pass, which causes segfault. This patch
fixed that problem.
Change-Id: Ibd2cad4e2d32ea589fc3e0876d60d3079ae836e7
We need an easy way to build the unit test driver without running the
tests. This enables passing options like --gtest_filter to the
executable, which can't be done very cleanly when running under
`make test`.
Fixed a number of compiler errors/warnings when building the tests
in various configurations by Jenkins.
Change-Id: I9198122600bcf02520688e5f052ab379f963b77b
This commit adds lossless compression capability to the experimental
branch. The lossless experiment can be enabled using --enable-lossless
in configure. When the experiment is enabled, the encoder will use
lossless compression mode by command line option --lossless, and the
decoder automatically recognizes a losslessly encoded clip and decodes
accordingly.
To achieve the lossless coding, this commit has changed the following:
1. To encode at lossless mode, encoder forces the use of unit
quantizer, i.e, Q 0, where effective quantization is 1. Encoder also
disables the usage of 8x8 transform and allows only 4x4 transform;
2. At Q 0, the first order 4x4 DCT/IDCT have been switched over
to a pair of forward and inverse Walsh-Hadamard Transform
(http://goo.gl/EIsfy), with proper scaling applied to match the range
of the original 4x4 DCT/IDCT pair;
3. At Q 0, the second order remains to use the previous
walsh-hadamard transform pair. However, to maintain the reversibility
in second order transform at Q 0, scaling down is applied to first
order DC coefficients prior to forward transform, and scaling up is
applied to the second order output prior to quantization. Symmetric
upscaling and downscaling are added around inverse second order
transform;
4. At lossless mode, encoder also disables a number of minor
features to ensure no loss is introduced, these features includes:
a. Trellis quantization optimization
b. Loop filtering
c. Aggressive zero-binning, rounding and zero-bin boosting
d. Mode based zero-bin boosting
Lossless coding test was performed on all clips within the derf set,
to verify that the commit has achieved lossless compression for all
clips. The average compression ratio is around 2.57 to 1.
(http://goo.gl/dEShs)
Change-Id: Ia3aba7dd09df40dd590f93b9aba134defbc64e34
Added the ability to optionally filter the prediction data
when inter modes are selected (excludes SPLITMV, for now).
The mode selection loop considers both the filtered and
non-filtered prediction data when choosing mode. The filter
can be turned on/off at the frame-level, or signaled for
each MB.
Change-Id: I1b783c71d95a361ab36c761b07e8a6b06bc36822
Incorporates mv_ref, mbsplit and second_mv into the adaptive
entropy framework. The mv_ref framework has been modified from
before.
Adds some clean-ups and fixes.
Results with the adaptive entropy experiment are currently up by
+1.93% on derf; +2.33% std-hd and +1.87% yt-hd.
Fixed a nasty intermittent bug.
Change-Id: I4b1ac9f9483b48432597595195bfec05f31d1e39
Fixed the quantifier that optionally matches a quote before the
filename. This was originally reported in the homebrew project[1].
Note that this fix is different than patch posted there, as there are
some platforms that don't have the quote, so it needs to be included
in the expression optionally.
[1]: https://github.com/mxcl/homebrew/issues/12567#issuecomment-6434000
Change-Id: Ibf2ed93ce169d80932e877f942dc4eeb03867f8b
Update the comment that defines the allowed ranges for
delta_q and delta_lf that can be used with segmentation.
Change-Id: Ie56ad6f946704259e03ffd49921a4cfb7e1e2f1f
The commit introduces a make target 'testdata' that downloads the
required test data from the WebM project website. The data will also
be downloaded if invoking `make test` but is not a strict requirement
for only building the test executable.
The download directory is taken from the LIBVPX_TEST_DATA_PATH
environment variable, or may be specified as part of the make command.
If unset, it defaults to the current directory. It's expected that
most developers will want to set this environment variable to a place
outside their source/build trees, to avoid having to download the data
more than once.
To add test data file:
1) add a line to test/test.mk:
LIBVPX_TEST_DATA-yes += foo-bar-file.y4m
2) add its sha1sum to the test/test-data.sha1 file in the following
format:
528cc88c821e5f5b133c2b40f9c8e3f22eaacc4c foo-bar-file.y4m
3) upload the file to the website
$ gsutil cp foo-bar-file.y4m gs://downloads.webmproject.org/test_data/libvpx
This implementation will check the integrity of the test data
automatically if the `sha1sum` executable is available.
Change-Id: If6910fe304bb3f5cdcc5cb9e5f9afa5be74720d2
This is a unit test for the post-processing functions:
- vp8_post_proc_down_and_across_c
- vp8_post_proc_down_and_across_mmx
- vp8_post_proc_down_and_across_xmm
Change-Id: Iec3e690327b17470209c00417835473f6d9a35d6
Disable unit-tests. The logging in GTest would need to be adjusted.
Restructure ARM cpu detection. Flatten if-else logic.
Change #if defined(HAVE_*) to #if HAVE_* because we only need to check
for features that the library was actually built with. This should have
been harmless, as disabled feature sets wouldn't have any features to
call.
Change-Id: Iea21aa42ce5f049c53ca0376d25bcd0f36f38284
Changes relating to Issue 411
Removed code that was clearing down the segmentation data each
frame.
Added range/parameter checking in vp8_set_roimap(); Return error
if called when cyclic_refresh is enabled.
Correct setup_features() so that it sets or clears the segment update
flags as appropriate.
Change-Id: Ib31ac53006640ddf1ba7b9ec8f8b952e3eff860a
Soft enable runtime cpu detect for armv7-android target, so that it
can be disabled and remove dependency on 'cpufeatures' lib.
Change the arm_cpu_caps implementation selection such that 'no rtcd' takes
precedence over system type.
Switch to use -mtune instead of -mcpu. NDK was complaining about
-mcpu=cortex-a8 conflicting with -march=armv7-a, not sure why.
Add a linker flag to fix some cortex-a8 bug, as suggested by NDK Dev
Guide.
Examples:
Configure for armv7+neon:
./configure --target=armv7-android-gcc \
--sdk-path=/path/to/android/ndk \
--disable-runtime-cpu-detect \
--enable-realtime-only \
--disable-unit-tests
...armv7 w/o neon:
./configure --target=armv7-android-gcc \
--sdk-path=/path/to/android/ndk \
--disable-runtime-cpu-detect \
--enable-realtime-only \
--disable-neon \
--cpu=cortex-a9 \
--disable-unit-tests
Change-Id: I37e2c0592745208979deec38f7658378d4bd6cfa
The function vp8_post_proc_down_and_across_c takes the
stride of both the src and dst images as parameters, but
assumes that they are the same.
I modified the code to use the correct strides, as the
assembler versions of these functions do.
Change-Id: I222715b774cd071b21c15a4b0d2f4aef64a520de
vpx uses symbols in libm and thus we need to provide an indication to
the user of libvpx that if they want to link against libvpx they must
also link against libm.
Change-Id: I31d4068bf7f6f5b1fd222bcdf9e6a1a92fb6696f
Avoid a pthreads dependency via pthread_once() when compiled with
--disable-multithread.
In addition, this synchronization is disabled for Win32 as well, even
though we can be sure that the required primatives exist, so that the
requirements on the application when built with --disable-multithread
are consistent across platforms.
Users using libvpx built with --disable-multithread in a multithreaded
context should provide their own synchronization. Updated the
documentation to vpx_codec_enc_init_ver() and vpx_codec_dec_init_ver()
to note this requirement. Moved the RTCD initialization call to match
this description, as previously it didn't happen until the first
frame.
Change-Id: Id576f6bce2758362188278d3085051c218a56d4a
This patch incorporates adaptive entropy coding of coefficient tokens,
and mode/mv information based on distributions encountered in a frame.
Specifically, there is an initial forward update to the probabilities
in the bitstream as before for coding the symbols in the frame, however
at the end of decoding each frame, the forward update to the
probabilities is reverted and instead the probabilities are updated
towards the actual distributions encountered within the frame.
The amount of update is weighted by the number of hits within each
context.
Results on derf/hd/std-hd are all up by 1.6%.
On derf, the most of the gains come from coefficients, however for the
hd and std-hd sets, the most of the gains come from the mode/mv
information updates.
Change-Id: I708c0e11fdacafee04940fe7ae159ba6844005fd
This commit is to remove two arrays, which contain the probabilities
of how likely each probability in coef_probs table is updated. The
commit changed to use a fixed number "252".
Surprisedly, the overall impact on quality is close to zero, which
basically says the two big static arrays are not helpful at all.
derf: -0.016%, -0.020%
std-hd: 0.000%, -0.013%
yt: -0.022%, +0.007%
yt-hd: -0.038%, +0.034%
Change-Id: Ifee94d28a37dcab4f1d2b994bd5b07575be42b72
This commit added the ability to accumulate the coef stats across
different encodings using an intermediate binary stats files. The
accumulation happens only the binary stats file exists in current
directory. The encoder needs to be built with "ENTROPY_STATS" to
allow the output. The commit also fixed a few formating issues in
output stats file.
Change-Id: Ib1a41180aa554845cf51e4421a230b128a3a82b4
Changes to calculation of sr_coded_error to include 0,0 case.
Experimental use of sr_coded_error in calculating correction factor
for estimating the allowable Q range.
Reinstated some code needed for calculating section_intra_rating.
Add flash detection in calculation of KF boost
Increased tolerance in testing candidate key frames (needed with
longer motion search as this tends to slightly increase inter %.
Zbin changes for 8x8.
Other minor adjustments, refactoring and bug fixes.
Reinstated some motion break out clauses in boost loop
as their removal hurt a few 50fps clips badly in the std set.
It may be possible to remove them again later if a better way
can be found of preventing overly long gf intervals.
Change-Id: Iee686d0c31072828bb1ccd2bc63f5f1c7c548ea2
Allows building the library with the gcc -pedantic option, for improved
portabilty. In particular, this commit removes usage of C99/C++ style
single-line comments and dynamic struct initializers. This is a
continuation of the work done in commit 97b766a46, which removed most
of these warnings for decode only builds.
Change-Id: Id453d9c1d9f44cc0381b10c3869fabb0184d5966
Frame dropping decision is made by evaluating both current frame
and next frame's buffer_level. If both buffer_levels are less
than drop_mark, next frame is dropped. When frame dropping is
over, namely, buffer_level becomes normal again, we need to
reset decimation_count to 0.
Change-Id: Iae182612e61e0da367fbd43afdc90738d975d1a3
The logic for spatial resizing is done after the Q is selected for the
frame. This causes a problem that the Q we select for the (resized)
key frame may be based on a different resolution than the frame we
will encode.
This fix is to ensure that, when resize is on, the selected Q is still
based on the resolution of the frame to be encoded.
Change-Id: Ia49a9eac5f64e48d1c00dfc7ed4ce26fe84d3fa1
The change in assembly offset files to define values as const int broke
Windows build, because the variables are stored in .rdata section instead
of .data section.
This CL changes the integer peeking from .data to .rdata.
Change-Id: I87e465ddcc78d39ec29f3720ea7df0ab807d5512
This change is to allow obj_int_extract to extract all integers
in the data segment. With the const keyword these variables are
forced into the .rodata segment even for zero variable value.
We had a problem before that zero valueed variables would get
assigned to BSS segment that fooled obj_int_extract to give
incorrect values.
Change-Id: Icd94f80a8ab356879894ca508bf132d20b865299
Update the Visual Studio builds to support the new monolithic unit
test binary.
Includes minor semi-cosmetic refactoring of solution.mk, as the
%vpx.vcproj match is no longer appropriate given the test_libvpx
target.
Change-Id: I29e6e07c39e72b54a4b3eaca5b9b7877ef3fb134
Variables m & mi were being dereferenced when they might
hold invalid values.
The fix is simply to move these dereferences to after the
point at which mb_row and mb_col are tested for validity.
Change-Id: Ib16561efa9792dc469759936189ea379d374ad20
Compares the sum of differences between the input block and the averaged
block. If they differ too much the block will not be filtered. Negligible
perfomance hit.
Change-Id: Ib1c31a265efd4d100b3abc4a1ea6675038c8ddde
Add PRIVATE macro for adding private_extern directive for yasm
to hide global symbols. This is only enabled if -DCHROMIUM is used
with YASM.
Also fixed a small problem with rtcd_defs.sh to guard TEMPORAL_DENOISING.
Change-Id: I9027fce3ebddcf20078293e4b86b396f21da7857
This extends the denoiser to work for temporally scalable
coding.
I believe this also fixes a very rare but really bad bug in the original
implementation.
Change-Id: I8b3593a8c54b86eb76f785af1970935f7d56262a
The change in assembly offset files to define values as const int broke
Windows build, because the variables are stored in .rdata section instead
of .data section.
This CL changes the integer peeking from .data to .rdata.
Change-Id: I87e465ddcc78d39ec29f3720ea7df0ab807d5512
This fix addresses some problems with very complex clips like
handling of flashes on clips like crew (which was made worse
by an earlier patch (derf and std-hd)).
Most clips a small effect but some between 1 & 2%
Derf +0.039, +0.211%
YT +0.042, +0.083%
Change-Id: I65fc7c13afc31482040068544dd65b8808f5cb4a
Compares the sum of differences between the input block and the averaged
block. If they differ too much the block will not be filtered. Negligible
perfomance hit.
Change-Id: Ib1c31a265efd4d100b3abc4a1ea6675038c8ddde
Removed the local scaling factor est_max_qcorrection_factor
and related code to simplify estimateq calculation (little effect
anyway)
Cap range of total correction factor.
Slight change to break out case to turn off arf.
Change-Id: I748187737ba93cfadf016f3dfdf8d2741934067f
the commit fixed a number of compiling issues when some epxeriments
are turned on at the same time.
Change-Id: Idb15b215e2d2a7d25f2707f99ef55a34e7301ce7
Adds a test that ensures the application is able to trigger frame size
changes via vpx_codec_enc_config_set()
Change-Id: I231c062e533d75c8d63c5f8a5544650117429a63
This extends the denoiser to work for temporally scalable
coding.
I believe this also fixes a very rare but really bad bug in the original
implementation.
Change-Id: I8b3593a8c54b86eb76f785af1970935f7d56262a
This change is to allow obj_int_extract to extract all integers
in the data segment. With the const keyword these variables are
forced into the .rodata segment even for zero variable value.
We had a problem before that zero valueed variables would get
assigned to BSS segment that fooled obj_int_extract to give
incorrect values.
Change-Id: Icd94f80a8ab356879894ca508bf132d20b865299
Add PRIVATE macro for adding private_extern directive for yasm
to hide global symbols. This is only enabled if -DCHROMIUM is used
with YASM.
Also fixed a small problem with rtcd_defs.sh to guard TEMPORAL_DENOISING.
Change-Id: I9027fce3ebddcf20078293e4b86b396f21da7857
After a key frame encoding, the frame type could change while
filtering is still going on. Pass the frame type as parameter to the
loopfilter function and don't read it from common storage.
vp8cx_set_alt_lf_level has to be done before packing the stream.
Currently alt_lf_level is not used so there hasn't been any visible
problem here.
Change-Id: Ia114162158cd833c2b16e3b89303cc9c91f19165
The two-pass code does not support the case where the application
changes the frame size dynamically. Add this case to the validation
checks in the vpx_codec_enc_config_set() path.
Change-Id: Idadc42c7c3bd566ecdbce30d8dd720add097f992
* changes:
Add initial keyframe tests
Move all tests to test/ directory
Enable unit tests by default
Build unit tests monolithically
configure: initial support for CXX, CXXFLAGS variables
Resolution changes in calls to vpx_codec_enc_config_set() would cause
a memory leak due to failing to release the lookahead and alt ref
buffers.
Change-Id: I48392ea25e71fe2760d60cfde3fb3874598cc85f
Reverted part of change in memory alllocation code, which ensures
that the function returns 0 and encoder works correctly when
CONFIG_MULTI_RES_ENCODING isn't turned on.
Change-Id: Id5d5e7f2c8bd9e961a6dca79d257e8185f0d592a
The commit changed how baseline 8x8 coefficient probabilities are
initialized, to be consistent with the initialization of baseline
4x4 coefficient probabilities.
The commit does not have any effect on compression.
Change-Id: Ifb3902b5dc0b0c2e6dc3aa5d4a6589d528e58355
After a key frame encoding, the frame type could change while
filtering is still going on. Pass the frame type as parameter to the
loopfilter function and don't read it from common storage.
vp8cx_set_alt_lf_level has to be done before packing the stream.
Currently alt_lf_level is not used so there hasn't been any visible
problem here.
Change-Id: Ia114162158cd833c2b16e3b89303cc9c91f19165
Implements a couple simple tests of the encoder API using the gtest
framework:
TestDisableKeyframes
TestForceKeyframe
TestKeyframeMaxDistance
Change-Id: I38e93fe242fbeb30bb11b23ac12de8ddc291a28d
Rework unit tests to have a single executable rather than many, which
should avoid pollution of the visual studio project namespace, improve
build times, and make it easier to use the gtest test sharding system
when we get these going on the continuous build cluster.
Change-Id: If4c3e5d4b3515522869de6c89455c2a64697cca6
Use CXX rather than assuming g++ to invoke the compiler. Also introduce
separate CXXFLAGS, as certain CFLAGS we enable by default cause warnings
with g++.
Change-Id: Ia2f40ef27c93e45c971d070cc58bdcde9da2ac7c
aligned buffers improve performace. this change brings vpxenc &
vp8_scalable_patterns in line with the other examples.
Change-Id: I4cf9f3e4728b901161905dd7ccb092e774ffb15f
Remove dependency on amount and speed of motion as this
may not behave well across different image sizes.
Tweak impact of % inter.
Add in experimental adjustment based on relative quality of an
older second reference frame.
Cap range of decay values allowed.
Some small + effect on derf but -ve on yt & hd at this stage.
Change-Id: I390d6f6ebe67a2eb0b834980d0d4650124980d3e
In multi-resolution encoding, frame_type decision for each frame
is made by the lowest-resolution encoder. For all other higher-
resolution encoders, kf_mode is always set to VPX_KF_DISABLED,
and they are forced to use the same frame_type picked by the
lowest-resolution encoder.
Change-Id: Ic4d52ec65bbc012ca9c2d236210e28a295591eaf
I now see I didn't write a very long description, so let's do it
here then. We took a pretty big quality hit (0.1-0.2%) from my
recent fix of the inversion of arguments to vp8_cost_bit() in the
RD reference frame costing. I looked into it and basically the
costing prevented us from switching reference frames. This is of
course silly, since each frame codes its own prob_intra_coded, so
using last frame cost indications as a limiting factor can never
be right.
Here, I've rewritten that code to estimate costings based partially
on statistics from progress on current frame encoding. Overall,
this gives us a ~0.2%-0.3% improvement over what we had previously
before my argument-inversion-fix, and thus about ~0.4% over current
git (on derf-set), and a little more (0.5-1.0%) on HD/STD-HD/YT.
Change-Id: I79ebd4ccec4d6edbf0e152d9590d103ba2747775
base the static image test off a measure of 0,0 motion
instead of the decay accumulator value.
Change "transition to still detection" to compare the
decay rate from successive frames.
Minor tweak to the arf extra boost given based on the
number of frames affected.
Removed unused variable mod_err_per_mb_accumulator.
Change-Id: Idd8360083ad409e45f133ce97dd2488259003e64
The commit added an integer version of 8x8 forward DCT, based on the
orginal forward DCT from VP6. The constants, roundings, and shifts
were adjusted to improve the accuracy. The latest patch has a very
similar accuracy in term of round trip error against the floating
point version.
It should be noted here that the purpose of the patch is to help
encoding speed and facilitate all other experiments. There will be
futher review in combination with inverse DCT before finalization.
configure with "--enable--int_8x8fdct" to use the integer version
Change-Id: I5a4f80507429f0e07cf02a13768ec81cbfddc5bc
Some marginal impact due to the fact that it makes use of
arf more likely / stable even in hard sections.
Change-Id: Ic72fda0f63eefc9433914b5d9cd374d515810129
Removed unused function.
Added tentative code to take error score of an older frame
into account when calculating Q range. However, for now
it is disabled pending merging other changes and testing.
Change-Id: Ie89955e70319dac31b79e3b833e3352712a061ec
Allow CHOST to override the gcc -dumpmachine output. This allows to
use the target autodetection code when cross compiling by setting the
CHOST variable.
On Gentoo, we would like to support easy cross-compilation, and for
libvpx this would basically mean copying the code in
build/make/configure.sh to setup the right --target option. It seems a
lot easier to let it guess by itself.
Another option I considered was using CROSS-gcc instead but this would
not work for our multilib setups: They use gcc -m32 to build 32bits
binaries and gcc -m32 -dumpmachine will output the 64bits version,
which would then make libvpx wrongly believe it is building for a
64bits architecture.
Change-Id: I05a19be402228f749e23be7473ca53ae74fd2186
Remove testing of whether we estimate that it will be possible
to code an arf at a lower Q than the ambient Q. This adds quite
a bit of extra code and complexity for marginal gain.
Factored out some code relating to ARNR selection to a separate
function as this is likely to be changed / simplified soon.
Change-Id: Ia1cf060405637ef5bbf7018355437be21d12375f
Removed odd *100 >> 4 factor from boost calculations. Not all the
calculations exactly match what was there before so there may be
some minor impact on results.
Some other minor tidying up in regard to coding conventions.
The specific values of factors and thresholds will likely change as
part of subsequent patches.
Change-Id: Id976321484ac02ba50294cf54fafbc17dda85686
These frames can force reference frame (arf), mode (zeromv) and skip,
which means that if we use compound prediction (i.e. arf+last), we
might use a blend of a perfect (arf) and an imperfect (last) predictor,
leading to semi-garbage display and thus a huge drop in SSIM/PSNR (up
to 10dB for some frames I analyzed).
Gives a +0.2% gain on YT.
Change-Id: If1f2b7899ad165684af3808fd379295e82558cbb
This is the first patch in a series of changes to the first
pass code. (Broken down for ease of testing/merging/review).
This patch introduces a new stats element "sr_coded_error".
This is the coded error recorded vs the second reference
frame (which is updated such that it lags by at least one frame).
No use is made of the new structure in this change so this patch
should have no material effect.
Removed some ifdefs and deprecated code (#if NEW_BOOST).
Removed twopass.gf_decay_rate (not used any more)
Change-Id: I1be672a73017f7c13fd50fb4f99236aa2ed30916
This commit changed the forward and the inverse 4x4 Walsh Hadamard
transform to a new pair, where the inverse transform can pefectly
reconstuct the input to forward transform. It also does so without
changing the input and output value range. Even more, it does not
change the complexity of the transforms.
While it was not expected to improve the results of our current test,
it does improve std-hd set by 0.2% on all metrics. No change on derf.
Change-Id: Ie4f23ddd3a0f3c5fbe97fb58399f860031f99337
Accept the same range of inputs for the VP8E_SET_NOISE_SENSITIVITY
control, regardless of whether temporal denoising is enabled or not.
This is important for maintaining compatibility with existing
applications.
Change-Id: I94cd4bb09bf7c803516701a394cf1a63bfec0097
1. block types
There are only three types of blocks for 8x8 transformed MBs, i.e. Y
block with DC does not exist for 8x8 transformed MBs as all MB using
8x8 transform have 2nd order haar transform. This commit introduced
a new macro BLOCK_TYPES_8X8 to reflect such fact.
2. context counters
This commit also fixed the mixed of context_counters between 4x4 and
8x8 transformed MBs. The mixed use of the counters leads me to think
the existing the context probabilities were not properly generated
from 8x8 transformed MBs.
3. redundant collecting in recoding
The commit also corrected the code that accumulates entropy stats by
making sure stats only collected for final packing, not during the
recode loop
Change-Id: I029f09f8f60bd0c3240cc392ff5c6d05435e322c
Adds a unit test to the boolcoder that tests encoding
and decoding thousands of different bits, with different
probabilities in different patterns.
Code borrowed from the webp project - and its committers.
Change-Id: Icabbb884d57e666496490c961dd29b246144ab3e
Make functions only referenced from one translation unit static. Other
symbols with extern linkage get a vp8/vpx prefix.
Change-Id: I928c7e0d0d36e89ac78cb54ff8bb28748727834f
Change If4321cc5 fixed a bug caused by forward declarations not being
kept in sync across C files, resulting in a function call with the
wrong arguments. The commit moves the affected function declarations
into a header file, along with the other symbols from encodeframe.c
that were being sloppily shared.
Change-Id: I76a7b4c66d4fe175f9cbef7e52148655e4bb9ba1
Removes all runtime initialization of global data. This commit is a
squashed version of the following series cherry-picked from master.
This is necessary because of a change that was merged to the tester
that depends on the scaler being moved to the RTCD framework, which
is a worthwhile thing to include in Eider anyway.
- a91b42f02 Makes all global data in entropy.c const
- b35a0db0e Makes all global data in tokenize.c const
- 441cac8ea Makes all mode token tables const
- 5948a0210 Ports vpx_xcaler to new RTCD method
- 317d4244c Makes all mode token tables const part 2
Change-Id: Ifeaea24df2b731e7c509fa6c6ef6891a374afc26
Move the notion of 0 bitrate implying skip deeper into the codec,
rather than doing it at the multi-encoder API level. This preserves
v1.0.0 ABI compatibility, rather than forcing a bump to v2.0.0 over a
minor change. Also, this allows the case where the application can
selectively enable and disable the larger resolution(s) without having
to reinitialize the codec instace (for instance, if no target is
receiving the full resolution stream).
It's not clear how deep to push this check. It may be valuable to
allow the framerate adaptation code to run, for example. Currently put
the check as early as possible for simplicity, should reevaluate this
as this feature gains real use.
Change-Id: I371709b8c6b52185a1c71a166a131ecc244582f0
Original patch by Ginn Chen <ginn.chen@oracle.com> against libvpx
v0.9.0.
I've forward-ported it to the current version (which mostly
involved removing hunks that were no longer relevant), since I've
given up on getting Ginn to submit this upstream himself.
Change-Id: I403c757c831c78d820ebcfe417e717b470a1d022
Besides imposing a performance penalty at startup in most
configurations, these relocations break the dynamic linker for
native Fennec, since it does not support them at all.
Change-Id: Id5dc768609354ebb4379966eb61a7313e6fd18de
These are warnings in most builds, but show up as compile errors on
some platforms when these headers are included from C++ code.
Change-Id: I6c523b4dbbc699075fe73830442b51922e5a61d5
These are warnings in most builds, but show up as compile errors on
some platforms when these headers are included from C++ code.
Change-Id: I6c523b4dbbc699075fe73830442b51922e5a61d5
This commit adjusted slightly the 4x4 coefficents band definition to
better classify coefficients with similar distributions and usages.
It helps derf set about .1%, it is alos slightly positive for std-hd
set, where 4x4 blocks are used less frequently.
The commit also removed a const array not in use.
Change-Id: I78d16905d4036641ec905b0c32c190c1def5b249
These values can be overridden with some poorly documented and
overloaded options: --libc and --sdk-path
../libvpx/configure --target=armv7-darwin-gcc --sdk-path=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer --libc=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS5.1.sdk/
So for someone who still wants to build with the iOS 5 SDK, the last
part of the path should be iPhoneOS5.0.sdk
Change-Id: Ibe93d96ae828c619700dc3222983aa4c30456b88
The ARNR filter uses a motion compensated temporal filter,
but the motion estimation implementation accounts for the
cost of the mv in its decision making process. The ARNR
filter uses a dummy cost table initialized to 0 as a way
to ignore the mv costs (which are irrelevant to the filter).
This CL modifies the ARNR filter implementation to so that
the mv costing is ignored without the requirement for
dummy tables.
Change-Id: I0dd9620c3b70682f938b2a70912c11d4d7c9284c
The ARNR filter uses MC to find the best match between the
ARF and other nearby frames in the filter-set. Since the
ARF is a member of the filter-set, MC in that case is
unnecesssary. This patch modifies the filter so it does
not apply MC in this case.
Change-Id: Ic0321199c08db2189a57f28d1700b745bc7ff66d
The ARNR filter uses a motion compensated temporal filter,
but the motion estimation implementation accounts for the
cost of the mv in its decision making process. The ARNR
filter uses a dummy cost table initialized to 0 as a way
to ignore the mv costs (which are irrelevant to the filter).
This CL modifies the ARNR filter implementation so that
the mv costing is ignored without the requirement for
dummy tables.
Change-Id: I4196aa5c24da63f858ff54fbaa5fc85ae1f1957f
The backup MODE_INFO buffer used in the error concealment was
allocated in the codec common parts allocation even though this is a
decoder only resource. Moved the allocation to the decoder side.
No need to update_mode_info_border as mode_info buffers are zero
allocated.
This fixes also a potential memory leak as the EC overlaps buffer was not
properly released before reallocation after a frame size change.
Change-Id: I12803d3e012308d95669069980b1c95973fb775f
Adds a speed feature to conduct a brute force search among a set of
available interpolation filters for the best one in an RD sense.
There is a gain of 0.4% on derf, 1.0% on Std-HD.
Patch 2: A macro added to determine if the encoder state is reset
for each new filter tried.
Patch 3: rebase, also fixes a bug (decodframe.c) introduced by a
couple of missing function pointer assignements.
Patch 4: rebase.
Change-Id: Ic9ccca9d8c35c6af557449ae867391a2f996cc29
Rather than using the static default maximum keyframe spacing
provided by vpx_codec_enc_config_default() set the default
value to 5 times the frame rate. Five seconds is too long for
live streaming applications, but is a compromise between seek
efficiency and giving the encoder freedom to choose keyframe
locations.
The five second value is from James Zern's suggestion in
http://article.gmane.org/gmane.comp.multimedia.webm.user/2945
Change-Id: Ib7274dc248589c433c06e68ca07232e97f7ce17f
This commit merge the QI mode experiment. As the experiment affects
the encoding of intra coding modes on key frame only, the overall
effect of the experiment on encoding tests is insignificant.
Change-Id: I9e4e3933adface88867ad429cee3986e529c511d
The commit merges the UVINTRA experiment and removed the related
macros. The overall effect of the experiment is a small gain (.1%
on derf)
Change-Id: Ia34b3312fb9b5b34c9ba111bf0fa78c6f78ac80b
Race was introduced by https://gerrit.chromium.org/gerrit/15563.
If loopfilter related config params were changed between frames, or
after a KEY frame, there could be a mismatch between encoder's and
decoder's recontructed image. In worst case, when frame buffers are
reallocated because of a size change, the loopfilter could
do an invalid data access (segmentation fault).
Fixes:
Sync with the loopfilter before applying any encoder changes in
vp8_change_config().
Moved the loopfilter synching to the top of
encode_frame_to_data_rate() so that it's done before any alteration of
the encoder.
Change-Id: Ide5245d2a2aeed78752de750c0110bc4b46f5b7b
Increment the last_row_mb_col counter by nsync after last MB of row is
ready. This way we dont need to check for last MB of row when
synching.
Set last MB of row ready just after row extension is done, This avoids
o potential race condition whit the processing of last MB of next row.
Change-Id: I19c44fd6041116ee5483be2143b4f4bfcd149eac
Adds differential encoding of prob updates using a subexponential
code centered around the previous probability value.
Also searches for the most cost-effective update, and breaks
up the coefficient updates into smaller groups.
Small gain on Derf: 0.2%
Change-Id: Ie0071e3dc113e3d0d7ab95b6442bb07a89970030
RD costs were local to MACROBLOCK data and had to be copied all the
time to each thread's MACROBLOCK data. Tables moved to a common place
and only pointers are setup for each encoding thread.
vp8_cost_tokens() generates 'int' costs so changed all types to be
int (i.e. removed unsigned).
NOTE: Could do some more cleaning in vp8cx_init_mbrthread_data().
Change-Id: Ifa4de4c6286dffaca7ed3082041fe5af1345ddc0
The block pointers and offset do not need to be calculated for every
frame. Block internal predictors can be update once when decoder is
allocated. Destination and previous buffer offsets have to be updated
just when frame size is changing.
Change-Id: I92ca8df0e6aaac4cc35ab890751d446760bf82e2
Key frame macrobock and block mode probabilities are constant.
Remove the allocation of tables for each codec instance and use
instead the default const prob tables.
Change-Id: I8361798ac491f9b3889e86925a494c58647c753f
Move the notion of 0 bitrate implying skip deeper into the codec,
rather than doing it at the multi-encoder API level. This preserves
v1.0.0 ABI compatibility, rather than forcing a bump to v2.0.0 over a
minor change. Also, this allows the case where the application can
selectively enable and disable the larger resolution(s) without having
to reinitialize the codec instace (for instance, if no target is
receiving the full resolution stream).
It's not clear how deep to push this check. It may be valuable to
allow the framerate adaptation code to run, for example. Currently put
the check as early as possible for simplicity, should reevaluate this
as this feature gains real use.
Change-Id: I371709b8c6b52185a1c71a166a131ecc244582f0
Look for changes in the codec's configured w/h instead of its active
w/h when forcing keyframes. Otherwise calls to vp8_change_config()
will force a keyframe when spatial resampling is active.
Change-Id: Ie0d20e70507004e714ad40b640aa5b434251eb32
This commit changed to enable the usage 8x8 transform for all frame
type, all resolution and all quantizer range. This has an overall
benefit .2% to .3% in term of compression, but more importantly,
the difficult clips benefits much more, up to 2% to 3% on clips
like football, harbour and so on.
We observed some weird humps on very high end on a couple of youtube
clips, but have determined the underly cause was the aggressive zbin
having an effect of lowering rate with lower quality, which have
an impact on slide show clips around 60DB.
The commit does not change the association between prediction mode
and transform size.
Change-Id: I33043bdce6207528ae00b4a4b26d8ff63cfea1f4
This is to prevent the evaluation of a mode from using values left
over from a mode evaluated prior in the loop.
Change-Id: Ife2c6ceb76d2f7365fd262515d3ae48229033c2d
(see Change I9b2ccc88: Makes all mode token tables const)
Further remove runtime table initialization and use
precalculated const data. Data footprint reduced
by 4112 bytes.
Change-Id: Ia3ae9fc19f77316b045cabff01f6e5f0876a86ab
Ensure that RTCD function pointers are set at most once, to silence
some data race warnings. Implementation provided for POSIX threads and
Win32, with the prior unsynchronized behavior left in place for other
platforms.
Change-Id: I65c5856df43ef67043b3d5f26ddafddd8fcb2f7e
We can get rid of all remaining global initializers now:
vp8_scale_machine_specific_config()
vp8_initialize()
vp8dx_initialize()
Change-Id: I2825cea5d1c01ad9f6c45df49a0f86d803bfeb69
Mode token tabels precalculated in entropymode.c.
Removes vp8_initialize_common()as all common global data
is precalculated const now.
Change-Id: I9b2ccc883e4f618069e1bc180dad3a823394eb73
With the NEWENTROPY experiment enabled encoding certain clips
produced invlid bitstreams, or files that had a high degree
of artefacts.
This was the results of pointers in MACROBLOCKD not being
setup correctly (mode_info_context and prev_mode_info_context).
Change-Id: Ice13e1efa8bd122997d2f8f3f1e761c6c16e0403
These contexts need to be saved and restored for recode, otherwise
encoder/decoder mismatch happens for some clips (eg._mobcal 720p)
Change-Id: Ic65cfa0bf56ed0472ecab962ce31394d59d344bf
Removes all runtime initialization of global data in tokenize.c.
DCT token and cost tabels are pre-generated.
Second patch in a series to make sure code is reentrant.
Change-Id: Iab48b5fe290129823947b669413101f22a1bcac0
Removes all runtime initialization of global data in entropy.c.
Precalculated values are used for initializing all entropy related
tabels.
First patch in a series to make sure code is reentrant.
Change-Id: I9aac91a2a26f96d73c6470d772a343df63bfe633
When producing an invisible ARF, the time stamp counters aren't
updated since the last time stamp is seen by the codec twice. The
prior code was trapping this case with refresh_alt_ref, but this isn't
correct for other uses of the ARF. Instead, use the show_frame flag.
Change-Id: If67fff7c6c66a3606698e34e2fb5731f56b4a223
Added code to save the coding context in vp8_rd_pick_inter_mode
when the coding mode is forced to ARF(0,0).
Also, modified the MV bounds computation to comply with the
change in MV border from 32 to 64 pixels.
Change-Id: I96963a6f5f4d04ce84c807ae11e0635177c3ad6c
Local variable offsets are now consistent for the functions,
removed unused parameters, reworked the assembly to eliminate
stalls/instructions.
Change-Id: Iaa37668f8a9bb8754df435f6a51c3a08d547f879
Turning off the interpolation filter selection based on edge
proportion. This heuristics has not been working as well as
expected and I have started a more rigorous investigation into
this. We can turn this off for now since it is unnecessarily
slowing things down.
Rebase.
Change-Id: Ic5958b2b3a35ec2d8eb73b6d81617ca8fbe07e74
This commit tries to address an issue related to the oddity shown on
HD _mobcal clip, where some rather ugly blocks shown in the second
frame at low-mid bit rates if the third frame is not made a key frame
by he encoder. The fixes include: 1) made calls to sad_16x16 to be
consistent with function prototype. 2) remove the error bias to intra
and golden in mbgraph search. 3) changed the error accumulation on
inter_segment encoding to avoid potential out-of-range. 1) has no
effect on encoding results.
Encoding test show that the overall effect of the commit helps about
.2%(HD) to .3%(cif)
Change-Id: I930975a2d0c06252f01c39e0a02351529774e30b
The commit removed a limit on key frame detection, which caused a big
drop in all metric measurements for standard HD clip such as _mobcal.
This single change helps two standard HD clips by a huge amount, which
help the overall std-hd set by 2.4% (glb psnr), 0.9% (avg_psnr), 2.1%
(vpxssim).
In the result page:
http://pafr9.prod.google.com:26163/?/cns/rc-d/home/on2-prod/sunkaras/borg-test/yaowu
2012_04_02_1649_yaowu_bugfix_std-hd
2012_04_03_1452_yaowu_hump_std-hd
represent the encoding test results and std-hd set prior and after this
commit respectively.
Change-Id: Ie4313e317c737ea0e699c3a7919c1376744baa1a
This commit has made macro_block_yrd_8x8 and macro_block_yrd_8x8 to
take same parameters. It also removed a few unnecessary shifts that
has the potential to create out-of-range distortion values.
Change-Id: I4ec5afb307c3685c2a67a07c2850f0927d214455
Failed to build on Linux (as described in Android.mk) with NDK r7b.
Set vpx_rtcd.h dependency after libvpx sources are added to
LOCAL_SRC_FILES so that vpx_rtcd.h is generated before any libvpx file
is touched.
Change-Id: Ibe19d485ca9f679dc084044df0e3fb14587c4d3e
Some code re-factored / moved to allow the main
pack operation inside the recode loop so that the
size estimate is accurate.
Deletion of some redundant code relating to one pass.
Aproximate improvement over March 27 code base:
Derf 0.0%, YT 0.5%, YThd 0.3% Std_hd 0.25%
Change-Id: Id2d071794ab44f0b52935f6fcdb5733d09a6bb86
Some adjustments to zbin for t8x8.
Changes to rules for sizing forced key frames.
Some extra stats output in tmp.stt.
Approximate gain on YT-hd set 0.5%
There are still issues in sizing key frames and gf/arf frames
when the image is largely static. These in part relate to
problems with cost estimates in the recode loop.
Change-Id: I6f0159dc8a8faeab4115a19c668d442491619a68
This is the first patch to add superblock (32x32) coding
order capabilities. It does not yet do any mode selection
at the SB level, that will follow in a further patch.
This patch encodes rows of SBs rather than
MBs, each SB contains 2x2 MBs.
Two intra prediction modes have been disabled since they
require reconstructed data for the above-right MB which
may not have been encoded yet (e.g. for the bottom right
MB in each SB).
Results on the one test clip I have tried (720p GIPS clip)
suggest that it is somewhere around 0.2dB worse than the
baseline version, so there may be bugs.
It has been tested with no experiments enabled and with
the following 3 experiments enabled:
--enable-enhanced_interp
--enable-high_precision_mv
--enable-sixteenth_subpel_uv
in each case the decode buffer matches the recon buffer
(using "cmp" to compare the dumped/decoded frames).
Note: Testing these experiments individually created
errors.
Some problems were found with other experiments but it
is unclear what state these experiments are in:
--enable-comp_intra_pred
--enable-newentropy
--enable-uvintra
This code has not been extensively tested yet, so there
is every likelihood that further bugs remain. I also
intend to do some code cleanup & refactoring in tandem
with the next patch that adds the 32x32 modes.
Change-Id: I1eba7f740a70b3510df58db53464535ef881b4d9
This patch includes:
1. fixes to disable block based termporal mixing when motion
is detected (because this version of mfqe only handles zero motion).
2. The criterion used for determining whether to mix or
not are changed to use squared differences rather than
absolute differences.
3. Additional checks on color mismatch and excessive block
flatness added. If the block as decoded has very low activity
it is unlikely to yield benefits for mixing.
Change-Id: I07331e5ab5ba64844c56e84b1a4b7de823eac6cb
In cases where you have a flat background occluded by a moving object
of similar luminosity in the foreground, it was likely that the
foreground blocks would persist for a few frames after the background
is uncovered. This is particularly noticable when the object has a
different color than the background, so add the chroma planes in as an
additional check.
In addition, for block sizes of 8 and 16, the luma threshold is
applied on four subblocks independently, which helps when only part of
the background in the block has been uncovered.
This fixes issue #392, which includes a test clip to reproduce the
issue.
BUG=392
Change-Id: I2bd7b2b0e25e912dcac342e5ad6e8914f5afd302
When using 'make dist' after --disable-vp8[encoder|decoder] it would
fail to recognize the option. This would only occur when also specifying
--enable-install-docs and --enable-install-srcs but not
--enable-codec-srcs
Including vpx/ fixes builds with --enable-codec-srcs
vpx_timer.h is also required for vpxenc.c
Change-Id: Ie3e28b2f7ec7ee6d5961d3843f9eab869f79c35b
truncate() operates from the current file pointer position. On at least
Linux specifying 0 without resetting the pointer will pad the file with
zeros to the current offset.
Change-Id: Ide704a1097f46c0c530f27212bb12e923f93e2d6
It's common for commit messages to be wrapped at odd places. git-gui
is often to blame. Adds support for automatically fixing up these
messages if running ftfy --amend, and adds a new option --msg-only for
fixing only the commit message.
Change-Id: Ia7ea529f8cb7395d34d9b39f1192598e9a1e315b
Reduced the size of the struct by 8 bytes, which would be
a memory savings of 64800 bytes for 1080 resolutions. Had
an extra byte, so created an is_4x4 for B_PRED or SPLITMV
modes. This simplified the mode checks in
vp8_reset_mb_tokens_context and vp8_decode_mb_tokens.
Change-Id: Ibec27784139abdc34d4d01f73c09f43e9e10e0f5
Found this bug while tracking down some anomalies in my experiments.
Since vp8_cost_one and vp8_cost_zero return unsigned int, the
bit shift by 8 will be incorrect if the value is negative.
I am cautiously optimistic that this fix will make the prob
updates more correct and somewhat improve results across the board.
But the update probabilities will need to be retuned I think.
Patch 2: Adding more of the same fixes using a macro.
Change-Id: I1a168f040e74e8c67e7225103b1c2af9a611da49
This is a utility for applying a limited amount of style correction on
a change-by-change basis. Rather than a big-bang reformatting, this
tool attempts to only correct the style in diff hunks that you touch.
This should make the cosmetic changes small enough that we can mix them
with functional changes without destroying the diffs, and there's an
escape hatch for separating the reformatting to a second commit for
purists and cases where it hurts readability.
At this time, the script requires a clean working tree, so run it after
you've commited your changes. Run without arguments, the style
corrections will be applied and left unstaged in your working copy. It
also supports the --amend option, which will automatically amend your
HEAD with the corrected style, and --commit, which will create a new
change dependent on your HEAD that contains only the whitespace changes.
There are a number of ways this could be applied in an automated manner
if this proves to be useful, either on a project-wide or per-user
basis. This doesn't buy anything in terms of real code quality, the
intent here would be to keep formatting nits out of review comments in
favor of more meaningful ones and help people whose habitual style
doesn't match the baseline.
Requires astyle[1] 1.24 or newer.
[1]: http://astyle.sourceforge.net/
Change-Id: I2fb3434de8479655e9811f094029bb90e5d757e1
This new vp8_decode_mb_tokens() uses a modified version of
WebP's GetCoeffs function. For now, the dequant does not
occur in GetCoeffs.
Tests showed performance improvements up to 2.5% depending
on material.
Change-Id: Ia24d78627e16ffee5eb4d777ee8379a9270f07c5
Adds logic to disable mfqe for the first frame after a configuration
change such as change in resolution. Also adds some missing
if CONFIG_POSTPROC macro checks.
Change-Id: If29053dad50b676bd29189ab7f9fe250eb5d30b3
When ac_yquant>171, a key frame is enabled to use 8x8 transform. In
such case, MBs with DC_PRED or TM_PRED are selected to use T8x8. This
change helped the full STD-HD set by ~.1% or so, which is reasonable
considering how often key frame occurs in these encodings.
Change-Id: Id17009ef6327252177b19e6bf0d6628827febaf1
Deprecate fast quant and strict_quant code.
Small effect on quality as fast was used in first pass but the
effect is basically neutral across the derf set.
The rationale here is to reduce the number of code paths for
now to make experimentation easier. Optimized and fast code
options can be re-introduced later along with other encode
speed options.
Change-Id: Ia30c5daf3dbc52e72c83b277a1d281e3c934cdad
Various refactoring to make the subpel motion compensation
filters switchable by a frame level field.
Two types of 8-tap filters are supported in addition to the existing
bilinar and sixtap filters. One is the default 8-tap and the
other has a sharper cut-off for use with frames with substantial
edge content.
Patch 2: Added a preliminary strategy for filter selection based on
edginess detecton. Also includes some filter changes.
Change-Id: I866085bda5ae143cfdf2ec88157feaabdf7bd63a
This change added a motion search skipping mechanism similar
to what we did in second pass. For a macroblock that is very
similar to the macroblock at same location on last frame,
we can set its mv to be zero, and skip motion search. This
improves first-pass performance for slide shows and video
conferencing clips with a slight PSNR loss.
Change-Id: Ic73f9ef5604270ddd6d433170091d20361dfe229
Universal builds create subdirectories for each target. Without
BUILD_PFX we only generated one vpx_rtcd.h instead of one for each.
Change-Id: I1caed4e018c8865ffc8da15e434cae2b96154fb4
Newer XCodes have moved the SDK path from /Developer/SDKs
Use a suggestion from jorgenisaksson@gmail.com to locate it
osx_sdk_dir is not required to be set. Apple now offers a set
command line tools which do not require this. isysroot is also
not required in newer versions of XCode so only set it when we
are confident in the location.
There remain issues with the iOS configure steps which will be
addressed later
Change-Id: I4f5d7e35175d0dea84faaa6bfb52a0153c72f84b
doxygen < 1.7.? seems to have been more tolerant of single line
\if/\endif
This change fixes warnings such as:
mainpage.dox:13: warning: unable to resolve reference to `vp8_encoder-'
for \ref command
vpx_decoder.h:193: warning: explicit link request to 'n' could not be
resolved
Change-Id: If3d04af5ede1b0d1e2c63021d0e4ac8f98db20b2
The commit added a clamp to the 2nd motion vector used in compound
prediction to insure mv within UMV borders. The clamp is similar to
that of the first motion vector except that No SPLITMV is ever used
for the 2nd motion vector.
Change-Id: I26dd63c304bd66b2e03a083749cc98c641667116
The commit added a new command line option --test-decode to vpxenc.
The option enables encoder to decode compressed frames and test recon
buffers from the decode against those from encode for mismatch.
There are a few limitations on this option currently, one of them
being the match test is not done on a number of lagged frames at
the end of an encoding.
Change-Id: I80c29b46dcdcea9f48107a506b235743068862fe
The recoding loop save and restore frame coding context for recodes.
However in recoding of key frames, some of the coding context saved
was stale from last encoded inter frame. The save/restore sometimes
overwrites the re-inintialized coding context with saved context
from last frame, resulting in encoder/decoder mismatch
Change-Id: I354ae2f71074d142602d51d06544c05a2462caaf
this commit added a command line option to skip first n frames from
input file to facilitate debugging and testing.
Change-Id: I4ffc5f85fa7e193ea4bdee08cb236717de8beef1
This is a code snapshot of experimental work currently ongoing for a
next-generation codec.
The codebase has been cut down considerably from the libvpx baseline.
For example, we are currently only supporting VBR 2-pass rate control
and have removed most of the code relating to coding speed, threading,
error resilience, partitions and various other features. This is in
part to make the codebase easier to work on and experiment with, but
also because we want to have an open discussion about how the bitstream
will be structured and partitioned and not have that conversation
constrained by past work.
Our basic working pattern has been to initially encapsulate experiments
using configure options linked to #IF CONFIG_XXX statements in the
code. Once experiments have matured and we are reasonably happy that
they give benefit and can be merged without breaking other experiments,
we remove the conditional compile statements and merge them in.
Current changes include:
* Temporal coding experiment for segments (though still only 4 max, it
will likely be increased).
* Segment feature experiment - to allow various bits of information to
be coded at the segment level. Features tested so far include mode
and reference frame information, limiting end of block offset and
transform size, alongside Q and loop filter parameters, but this set
is very fluid.
* Support for 8x8 transform - 8x8 dct with 2nd order 2x2 haar is used
in MBs using 16x16 prediction modes within inter frames.
* Compound prediction (combination of signals from existing predictors
to create a new predictor).
* 8 tap interpolation filters and 1/8th pel motion vectors.
* Loop filter modifications.
* Various entropy modifications and changes to how entropy contexts and
updates are handled.
* Extended quantizer range matched to transform precision improvements.
There are also ongoing further experiments that we hope to merge in the
near future: For example, coding of motion and other aspects of the
prediction signal to better support larger image formats, use of larger
block sizes (e.g. 32x32 and up) and lossless non-transform based coding
options (especially for key frames). It is our hope that we will be
able to make regular updates and we will warmly welcome community
contributions.
Please be warned that, at this stage, the codebase is currently slower
than VP8 stable branch as most new code has not been optimized, and
even the 'C' has been deliberately written to be simple and obvious,
not fast.
The following graphs have the initial test results, numbers in the
tables measure the compression improvement in terms of percentage. The
build has the following optional experiments configured:
--enable-experimental --enable-enhanced_interp --enable-uvintra
--enable-high_precision_mv --enable-sixteenth_subpel_uv
CIF Size clips:
http://getwebm.org/tmp/cif/
HD size clips:
http://getwebm.org/tmp/hd/
(stable_20120309 represents encoding results of WebM master branch
build as of commit#7a15907)
They were encoded using the following encode parameters:
--good --cpu-used=0 -t 0 --lag-in-frames=25 --min-q=0 --max-q=63
--end-usage=0 --auto-alt-ref=1 -p 2 --pass=2 --kf-max-dist=9999
--kf-min-dist=0 --drop-frame=0 --static-thresh=0 --bias-pct=50
--minsection-pct=0 --maxsection-pct=800 --sharpness=0
--arnr-maxframes=7 --arnr-strength=3(for HD,6 for CIF)
--arnr-type=3
Change-Id: I5c62ed09cfff5815a2bb34e7820d6a810c23183c
The commit added a clamp to the 2nd motion vector used in compound
prediction to insure mv within UMV borders. The clamp is similar to
that of the first motion vector except that No SPLITMV is ever used
for the 2nd motion vector.
Change-Id: I26dd63c304bd66b2e03a083749cc98c641667116
The commit added a new command line option --test-decode to vpxenc.
The option enables encoder to decode compressed frames and test recon
buffers from the decode against those from encode for mismatch.
There are a few limitations on this option currently, one of them
being the match test is not done on a number of lagged frames at
the end of an encoding.
Change-Id: I80c29b46dcdcea9f48107a506b235743068862fe
The recoding loop save and restore frame coding context for recodes.
However in recoding of key frames, some of the coding context saved
was stale from last encoded inter frame. The save/restore sometimes
overwrites the re-inintialized coding context with saved context
from last frame, resulting in encoder/decoder mismatch
Change-Id: I354ae2f71074d142602d51d06544c05a2462caaf
This issue likely doesn't appear in the unmodified encoder, but
sufficient hacking on the mode selection loop can expose it.
Change-Id: I8a35831e8f08b549806d0c2c6900d42af883f78f
this commit added a command line option to skip first n frames from
input file to facilitate debugging and testing.
Change-Id: I4ffc5f85fa7e193ea4bdee08cb236717de8beef1
In a previous commit, the duplicate of headerfile defaultcoefcounts.h
was identified. This commit updates the .mk file to ensure configure
and make works properly for all platforms.
Change-Id: I31a39c809a734ba438ee53db700f252e9a03eddd
The maximum psnr has a marginal impact on the overall output in high
quality encodings, the change will make sure the psnr output to be
consistent with encoder internal stats.
Change-Id: I35cf2f85008ec127a7d91c9eb69fa7811798ae32
Pulled out super block code for the snapshot as this
is not quite ready and will need an extensive re-merge.
Change-Id: I436369b511257447a7b0ea064016cb63f5011849
Break MFQE code into it's own file.
It is currently only valid for 16x16 and 8x8 Y blocks. It also filters
4x4 U/V blocks.
Refactor filtering and add associated assembly. Limited test cases show
--mfqe introduces a penalty of ~20% with HD content. The assembly
reduces the penalty to ~15%
Change-Id: I4b8de6b5cdff5413037de5b6c42f437033ee55bf
https://gerrit.chromium.org/gerrit/#change,17319 fixes cost estimating
to take skip_eob into account. No quality difference seen on derf set
tests, but about .4% gain on STD_HD set.
Change-Id: Ic5fe6d35ee021e664a6fcd28037b8432a0e470ca
build/make/version.sh requires CHANGELOG to generate vpx_version.h
The file is already included when building the documentation. However,
documentation is not build if doxygen/php are not present.
This is necessary when using '--enable-install-srcs --enable-codec-srcs'
and 'make dist'
Change-Id: Icada883a056a4713d24934ea44e0f6969b68f9c2
Coefficient costing failed to take account of the first branch
being skipped ( 0 vs eob) if the previous token is 0.
Fixed rd to account for slightly increased token cost & cleaned up
warning message
Change-Id: I56140635d9f48a28dded5a816964e973a53975ef
This gives a modest gain on derf overall, although at low bitrates the
cost is still too high, so this can be improved further.
Patch 2. Re-base and fix 80 column issues
Change-Id: Ida2f9fa3fe75370669f6a27b37108dc602231c63
The commit changed to compute UV intra RD estimates for 4x4 and 8x8
separately to be used in mode decision for MB modes associated with
the appropriate transform size respectively. Now finally after many
other changes related 8x8 quantizer zbin boost and zbin_mode_boost,
this change overall helps the HD(with 8x8) by around ~.13%.
(avg .13% glb .13% ssim .17%)
The commit also has a few changes for eliminating compiler warnings.
Change-Id: Ibab35dad44820c87e6b44799c66f8d519cc37344
The commit added the correct Zbin_mode_boost initialization based on
Intra Mode before using rate distortion to pick UV intra mode.
Change-Id: I8e57878ff356a06672f6fa2431be860bf9b9a5c7
Propagate debug setting to the EBML struct. When writing the application
name, this allows us to strip the version code and keep the output
metadata static.
Change-Id: I8e06c6abd743bedbff5af6242bbdae5d55754538
When we do 2-pass encoding, elapsed time is accumulated through
whole 2-pass process, which gives incorrect time and fps results
for second pass. This change fixed that by resetting the time
accumulator for second pass.
Change-Id: Ie6cbf0d0e66e6874e7071305e253c6267529cf20
Produce the token partitions on-the-fly, while processing each MB.
Context is updated at the beginning of each frame based on the
previoud frame's counters. Optimally encoder outputs partitions in
separate buffers. For frame based output, partitions are concatenated
internally.
Limitations:
- enabled just in combination with realtime-only mode
- number of encoding threads has to be equal or less than the
number of token partitions. For this reason, by default the encoder
will do 8 token partitions.
- vpxenc supports partition output (-P) just in combination with
IVF output format (--ivf)
Performance:
- Realtime encoder can be up to 13% faster (ARM) depending on the number
of threads and bitrate settings. Constant gain over the 5-16 speed
range.
- Token buffer reduced from one frame to 8 MBs
Quality:
- quality is affected by the delayed context updates. This again
dependents on input material, speed and bitrate settings. For VC
style input the loss seen is up to 0.2dB. If error-resilient=2
mode is used than the effect of this change is negligible.
Example:
./configure --enable-realtime-only --enable-onthefly-bitpacking
./vpxenc --rt --end-usage=1 --fps=30000/1000 -w 640 -h 480
--target-bitrate=1000 --token-parts=3 --static-thresh=2000
--ivf -P -t 4 -o strm.ivf tanya_640x480.yuv
Change-Id: I127295cb85b835fc287e1c0201a67e378d025d76
Eliminated some mb branches along with other code cleanups.
This is part of an ongoing effort to remove cut/paste
code in the decoder.
Change-Id: Ifabb0f67cafa6922b5a0e89a0d03a9b34e9e5752
The commit fixed a problem where 8x8 regular quantizer was using the
4x4 zbinboost lookup table that only has 16 entries at each Q. The
commit assigned a uniform zbin boost value for all cases that there
are more than 16 consective zeros. The change only affects MBs using
8x8 transform. The fix has a slightly positive impact on quality.
Test results:
http://www.corp.google.com/~yaowu/no_crawl/hd_fixzbinb.html
(avg psnr: .26% glb psnr: .21% ssim: .28%)
Results on cif size clip are also positive even though gain is smaller
http://www.corp.google.com/~yaowu/no_crawl/derf_fixzbinb.html
Change-Id: Ibe8f6da181d1fb377fbd0d3b5feb15be0cfa2017
This is the first patch for refactoring of the code related to
high-precision mv, so that 1/4 and 1/8 pel motion vectors can
co-exist in the same bit-stream by use of a frame level flag.
The current patch works fine for only use of 1/4th and
only use of 1/8th pel mv, but there are some issues with the
mode switching in between. Subsequent patches on this change Id
will fix the remaining issues.
Patch 2: Adds fixes to make sure that multiple mv precisions can
co-exist in the bit-stream. Frame level switching has been tested
to work correctly.
Patch 3: Fixes lines exceeding 80 char
Patch 4:
http://www.corp.google.com/~debargha/vp8_results/enhinterp.html
Results on derf after ssse3 bugfix, compared to everything
enabled but the 8-tap, 1/8-subpel and 1/16-subpel uv. Overall the
gains are about 3% now. Hopefully there are no more bugs lingering.
Apparently the sse3 bug affected the quartel subpel results more than
the eighth pel ones (which is understandabale because one bad predictor
due to the bug, matters less if there are a lot more subpel options
available as in the 1/8 subpel case).
The results in the 4th column correspond to the current settings.
The first two columns correspond to two settings of adaptive switching
of the 1/4 or 1/8 subpel mode based on initial Q estimate. These
do not work as good as just using 1/8 all the time yet.
Change-Id: I3ef392ad338329f4d68a85257a49f2b14f3af472
The commit overall on derf test is break even to very slightly positive
comparing to all 4x4 transform.
Change-Id: I2a7c19599aa54c2d3a5b35db0dc891ba8a6a2b26
Reworked the code to use vp8_build_intra_predictors_mby_s,
vp8_intra_prediction_down_copy, and vp8_intra4x4_predict_d_c
functions instead. vp8_intra4x4_predict_d_c is a decoder-only
version of vp8_intra4x4_predict. Future commits will fix this
code duplication.
Change-Id: Ifb4507103b7c83f8b94a872345191c49240154f5
When we encode slide-show clips, for the majority of the time,
only ZEROMV mode is checked, and all other modes are skipped.
This change delayed uv intra-mode evaluation until intra mode is
actually checked. This gave big performance gain for slide-show
video encoding (2nd pass gain: 18% to 28%). But, this change
doesn't help other types of videos.
Also, zbin_mode_boost is adjusted in mode-checking loop, which
causes bitstream mismatch before/after this change when --best
or --good with --cpu-used=0 are used.
Change-Id: I582b3e69fd384039994360e870e6e059c36a64cc
The "update" variable was used as a flag in coef_prob update dry run
that tests if a frame should encodes update at all. The wrong init
value forced the update happening always. fixing this has a minor
improvement in low bit rate situation when 8x8 transform is allowed.
Change-Id: Icb498e8d6a62fd074dcbc2065b797cba9237cb51
use oxcf instead of common in check to Reinit the
lookahead buffer if the frame size changes
prior behavior would cause assertion fail/crash
first observed in:
support changing resolution with vpx_codec_enc_config_set
Change-Id: Ib669916ca9b4f206d4cc3caab5107e49d39a36aa
For now the interface elements have been left in place
to make sure existing parameter files work but parameters
relating to drop frame wont do anything.
Change-Id: I579ee614726387381c546845dac4bc03c74c6a07
The Lagrangian interpolation filter is maximally flat in the
passband. There is non-trivial improvement with the hd set, while
for derf the results are virtually unchanged.
See:
http://www.corp.google.com/~debargha/vp8_results/enhinterpn.html (derf)
http://www.corp.google.com/~debargha/vp8_results/enhinterpn_hd.html (HD)
Patch 2: Updated the results for derf in the html above to use the
new baseline. There is still about 4% improvement. Will update the
hd baseline later (since it takes 9 hours to run on my machine)
Patch 3: By mistake the default filter was left at 60 - should be 0
to use the new interpolation filter.
Change-Id: If5f64444976562415d68a2aeabb94fdfa0d47890
* Removes EDGE_PIXEL_FILTER for external sanpshot
* changes the default 8-tap filter based on high precision results
in http://www.corp.google.com/~debargha/vp8_results/enhinterpn.html
* changes the default prob tables for high-precision mv encoding to
favor zeros in the last bit (i.e. quarter pel). This is only important
for short clips.
Change-Id: I02bb0de8679d9eec06cdbcc8160dbf073cd847a4
This is the initial patch for supporting 1/8th pel
motion. Currently if we configure with enable-high-precision-mv,
all motion vectors would default to 1/8 pel. Encode and
decode syncs fine with the current code. In the next phase
the code will be refactored so that we can choose the 1/8
pel mode adaptively at a frame/segment/mb level.
Derf results:
http://www.corp.google.com/~debargha/vp8_results/enhinterp_hpmv.html
(about 0.83% better than 8-tap interpoaltion)
Patch 3: Rebased. Also adding 1/16th pel interpolation for U and V
Patch 4: HD results.
http://www.corp.google.com/~debargha/vp8_results/enhinterp_hd_hpmv.html
Seems impressive (unless I am doing something wrong).
Patch 5: Added mmx/sse for bilateral filtering, as well as enforced
use of c-versions of subpel filters with 8-taps and 1/16th pel;
Also redesigned the 8-tap filters to reduce the cut-off in order to
introduce a denoising effect. There is a new configure option
sixteenth-subpel-uv which will use 1/16 th pel interpolation for
uv, if the motion vectors have 1/8 pel accuracy.
With the fixes the results are promising on the derf set. The enhanced
interpolation option with 8-taps alone gives 3% improvement over thei
derf set:
http://www.corp.google.com/~debargha/vp8_results/enhinterpn.html
Results on high precision mv and on the hd set are to follow.
Patch 6: Adding a missing condition for CONFIG_SIXTEENTH_SUBPEL_UV in
vp8/common/x86/x86_systemdependent.c
Patch 7: Cleaning up various debug messages.
Patch 8: Merge conflict
Change-Id: I5b1d844457aefd7414a9e4e0e06c6ed38fd8cc04
When temporal layers is used (i.e., number_of_layers > 1),
we don't use the frame rate boost for setting the key
frame target size. The factor was forcing the target size to be
always at its minimum (2* per_frame_bandwidth) for low frame rates
(i.e., base layer frame rate).
Generally we should modify or remove this frame rate factor;
for now we turn if off for number_of_layers > 1.
Change-Id: Ia5acf406c9b2f634d30ac2473adc7b9bf2e7e6c6
Yunqing fixed an oddity in UVIntra skippable evaluation for stable
branch, which brought up the fact that the evaluation is broken.
The issue was that for MBs with 2nd order block, the eob for 1st
order blocks is set at 1. The previous evaluation did not take that
into account. This commit intend to fix the problem. The commit also
absorbed Yunqing's fix for UVIntra skippable evalution.
Test on hd showed some good gains in combination with LPF bias fix:
http://www.corp.google.com/~yaowu/no_crawl/LPFBias_FixSkip.html
(avg psnr: .34%, glb psnr: .32%, ssim: .22%)
Change-Id: I36af11c8ef7f643e8ff46da7bf3a167b437039d4
Reworked the code to use vp8_build_intra_predictors_mbuv_s
instead. This is WIP with the goal of eliminating all
functions in reconintra_mt.h
Change-Id: I61c4a132684544b24a38c4a90044597c6ec0dd52
The bias in picklpf intended to bias toward less greedy in getting
best frame level psnr while maximize overall quality for a clip.
This commit reduced the bias for frames using 8x8 transform to
achieve better compression overall.
The change improve compression by ~.15% consistently on most of the
HD clips tested.
http://www.corp.google.com/~yaowu/no_crawl/LPFBias_FixSkip.html
Change-Id: Ic30932d2b8eaebd52339b0195f569edc48eed7bc
When compiling with -ggdb3 the output includes an extraneous EQU from
vpx_ports/asm_offsets.h
https://trac.macports.org/ticket/33285
Change-Id: Iba93ddafec414c152b87001a7542e7a894781231
In vp8_rd_pick_inter_mode(), if total of eobs is zero, rate needs
to be adjusted since there are no non-zero coefficients for
transmission. The uv intra eobs calculated in
rd_pick_intra_mbuv_mode() need to be saved before they are
overwritten by inter-mode eobs.
Change-Id: I41dd04fba912e8122ef95793d4d98a251bc60e58
mode_info_context->mbmi.mb_skip_coeff has to always reflect the
existence or not of coeffs for a certain MB. The loopfilter needs this
info.
mb_skip_coeff is either set by the vp8_tokenize_mb or has to be set to
1 when the MB is skipped by mode selection. This has to be done
regardless of the mb_no_coeff_skip value.
prob_skip_false is needed just when mb_no_coeff_skip is 1. No need to
keep count of both skip_false and skip_true as they are complementary
(skip_true+skip_false = total_mbs)
Change-Id: I3c74c9a0ee37bec10de7bb796e408f3e77006813
Depending on implementation the optimized SAD functions may return early
when the calculated SAD exceeds max_sad.
Change-Id: I05ce5b2d34e6d45fb3ec2a450aa99c4f3343bf3a
The commit rationized and simplified the entropy context conversion
betwen MB using 8x8 transform and MB using 4x4 transform. The old version
had a number of weirdness in how 4x4 transform MB's context is used for
8x8 blocks other than the first 8x8 within a MB.
Test showed the change has a gain ~.1% for avg psnr, glb psnr and ssim on
the limited HD set.
Change-Id: I774536c416baa6845aa741f956d8a69fa40e5d47
Built in echo in 'sh' on OS X does not support -n (exclude trailing
newline). It's not necessary so just leave it off. Fixes issue 390.
Build include guard using 'symbol' so that it is more likely to be
unique.
Change-Id: I4bc6aa1fc5e02228f71c200214b5ee4a16d56b83
Add the ability to specify multiple output streams on the command line.
Streams are delimited by --, and most parameters inherit from previous
streams.
In this implementation, resizing streams is still not supported. It
does not make use of the new multistream support in the encoder either.
Two pass support runs all streams independently, though it's
theoretically possible that we could combine firstpass runs in the
future. The logic required for this is too tricky to do as part of this
initial implementation. This is mostly an effort to get the parameter
passing and independent streams working from the application's
perspective, and a later commit will add the rescaling and
multiresolution support.
Change-Id: Ibf18c2355f54189fc91952c734c899e5c072b3e0
Refactoring some of the mode decoding logic introduced a bug where
the segmentation maps would not be properly reset on keyframes.
http://code.google.com/p/webm/issues/detail?id=378
The text of the bug is somewhat misleading as I initially read it to
imply the bug was present in v0.9.7-p1 (Cayuga), but note the text
"master", which indicates this was something subsequent. This issue
bisects back to v0.9.7-p1-84-ga99c20c, so unfortunately it was broken
during the Duclair release.
Thanks to Alexei Leonenko for investigating the root cause.
Change-Id: I9713c9f070eb37b31b3b029d9ef96be9b6ea2def
On Android NDK, rand() is inlined function. But, on our SSE optimization,
we need symbol for rand()
Change-Id: I42ab00e3255208ba95d7f9b9a8a3605ff58da8e1
Removal of the pickinter.c and .h files and calls to this
code.
Removal of some code relating to real time and one pass
settings though there is more to be done in this regard.
However, vp8_set_speed_features() now
only supports modes 0 and 1 and speeds up to 3
so rd should always be set.
Change-Id: I62c0c1b6154ab499785baef310536080e87bc4d8
this commit changed the UV r/d calculation in the mode decision process to
properly account for the rate of 8x8 transform coefficients.
Change-Id: I485f8f35f2b61db0b6539beb32e83481b1cf083b
the changes are still temporary, the final transforms, especially
inverse ones should take in account both accuracy, complexity, and
sign-bias, which should be decided at a later time.
Change-Id: I116b0c70b25f5ee324ae5713d4564f5d0aa27151
During the work of extend_qrange, we have rolled a factor of 2 from
quantization/dequatnization into 2nd order walsh-hadamard transform.
This commit does the same for the 2nd order haar transform. so they
can share the same quantizaiton process as the 2nd order WHT.
Change-Id: I734af4a20ea8149a01b5b1971a065092977dfe33
Previously, the scaling related to extended quantize range happens in
dequantization stage, which implies the coefficients form forward
transform are in different scale(4x) from dequantization coefficients
This worked fine when there was not distortion computation done based
on 8x8 transform, but it completely wracked the distortion estimation
based on transform coefficients and dequantized transform coefficients
introduced in commit f64725a00 for macroblocks using 8x8 transform.
This commit fixed the issue by moving the scaling into the stage of
inverse 8x8 transform.
TODO: Test&Verify the transform/quantization pipeline accuracy.
Change-Id: Iff77b36a965c2a6b247e59b9c59df93eba5d60e2
Replace inner loops of pack_mb_row_tokens_c and
pack_tokens_into_partitions_c with a call to pack_tokens_c.
Change-Id: I0341554fb154a14a5dadb63f8fc78010724c2c33
Second shot at this...
Sync with loopfilter thread as late as possible, usually just at the
beginning of next frame encoding. This returns control to application
faster and allows a better multicore scaling.
When PSNR packets are generated the final filtered frame is needed
imediatly so we cannot delay the sync. Same has to be done when
internal frame is previewed.
Change-Id: I64e110c8b224dd967faefffd9c93dd8dbad4a5b5
I'm basically not convinced that the concept works at all, let alone
that this is the right place to do it. I think if we want something
like this at all, I should integrate it with the main encoding loop
and re-encode checks in onyx_if.c, and show that it has a significant
benefit (which right now, it doesn't; removing this re-encode check
actually increases all metrics by ~0.15%).
Change-Id: I1b597385dc17f468384a994484fb24813389411f
This commit moved segment based loop filter level selection into
the experiment of CONFIG_FEATUREUPDATES. As previous commit noted,
the segment based loop filter selection helps the compression by
~0.1% on cif set, the ongoing experiment CONFIG_FEATUREUPDATES
made encoding updates of the segment based LPF level more efficient,
hence, another .04% gain on cif set. The commit also fixed an issue
previously where encoder/decoder may use different loop filter level
for one of the segments.
Change-Id: Ia978b14aae95bb107d561ba53a7a2bb6ff01faf3
This commmit added logic for MB using dual-pred to compute rate
estimation based on correct transform size. The section of code
was previously located under #if CONFIG_DUALPRED, that was made
to be working with T8x8 experiment at the same time.
Change-Id: Iebc2518c03f11378b9c2e72905520f088b54d5c0
Added a bit to signify that the feature changed since
the last time we sent it, or not so that we don't need
to send all the databits for every feature change.
added config
Change-Id: I8d3064ce90d4500bf0d5c6b87c664e46138dfcac
Added a frame level flag to indicate if coef probabilities are updated
at all for the frame.
During the experimental work with 8x8 transform, it is discovered that
even in the case of no probability is ever update, cost of transmitting
"no update" for each of probabilities can run up to become a significant
overhead cost. A single bit to indicate no-update for all coef probs
is therefore helpful, which is also demonstrated by the test results:
1. On Cif set:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/cif_t8x8_updprob.html
(avg psnr: .14%, glb psnr: .14% SSIM: .13%)
2. On HD set:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/HD_t8x8_updprob.html
(avg psnr: .02% glb psnr: .01% SSIM: .02%)
It should be noted that the gain on HD is smaller because the average bit
rate is much higher in contrast to the overhead bit cost.
Change-Id: I46db270e693ee8799fef34a14d8260868ce4cd16
For 8x8 transformed macroblock, the 2nd order transform is a 2x2 haar
transform, here there is only 4 coefficients total. A previous merge
changed these to 64, causing crashes when encoding with 8x8 transform
enabled. (i.e. when input video image size > 640x360 ) This commit
reverts them back to 4 and fixes the crashes.
Change-Id: I3290b81f8c0d32c7efec03093a61ea57736c0550
For the experimental branch we are trying to slim the codebase
down removing features such as threading for now which complicate
the process of development and testing.
Change-Id: I657c0246aef4d1fa8c8ffc6a1adfeee45bce8e24
In summary, this commit encompasses a series of changes in attempt to
improve the 8x8 transform based coding to help overall compression
quality, please refer to the detailed commit history below for what
are the rationale underly the series of changes:
a. A frame level flag to indicate if 8x8 transform is used at all.
b. 8x8 transform is not used for key frames and small image size.
c. On inter coded frame, macroblocks using modes B_PRED, SPLIT_MV
and I8X8_PRED are forced to using 4x4 transform based coding, the
rest uses 8x8 transform based coding.
d. Encoder and decoder has the same assumption on the relationship
between prediction modes and transform size, therefore no signaling
is encoded in bitstream.
e. Mode decision process now calculate the rate and distortion scores
using their respective transforms.
Overall test results:
1. HD set
http://www.corp.google.com/~yaowu/no_crawl/t8x8/HD_t8x8_20120206.html
(avg psnr: 3.09% glb psnr: 3.22%, ssim: 3.90%)
2. Cif set:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/cif_t8x8_20120206.html
(avg psnr: -0.03%, glb psnr: -0.02%, ssim: -0.04%)
It should be noted here, as 8x8 transform coding itself is disabled
for cif size clips, the 0.03% loss is purely from the 1 bit/frame
flag overhead on if 8x8 transform is used or not for the frame.
---patch history for future reference---
Patch 1:
this commit tries to select transform size based on macroblock
prediction mode. If the size of a prediction mode is 16x16, then
the macroblock is forced to use 8x8 transform. If the prediction
mode is B_PRED, SPLITMV or I8X8_PRED, then the macroblock is forced
to use 4x4 transform. Tests on the following HD clips showed mixed
results: (all hd clips only used first 100 frames in the test)
http://www.corp.google.com/~yaowu/no_crawl/t8x8/hdmodebased8x8.htmlhttp://www.corp.google.com/~yaowu/no_crawl/t8x8/hdmodebased8x8_log.html
while the results are mixed and overall negative, it is interesting to
see 8x8 helped a few of the clips.
Patch 2:
this patch tries to hard-wire selection of transform size based on
prediction modes without using segmentation to signal the transform size.
encoder and decoder both takes the same assumption that all macroblocks
use 8x8 transform except when prediciton mode is B_PRED, I8X8_PRED or
SPLITMV. Test results are as follows:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/cifmodebase8x8_0125.htmlhttp://www.corp.google.com/~yaowu/no_crawl/t8x8/hdmodebased8x8_0125log.html
Interestingly, by removing the overhead or coding the segmentation, the
results on this limited HD set have turn positive on average.
Patch 3:
this patch disabled the usage of 8x8 transform on key frames, and kept the
logic from patch 2 for inter frames only. test results on HD set turned
decidedly positive with 8x8 transform enabled on inter frame with 16x16
prediction modes: (avg psnr: .81% glb psnr: .82 ssim: .55%)
http://www.corp.google.com/~yaowu/no_crawl/t8x8/hdintermode8x8_0125.html
results on cif set still negative overall
Patch 4:
continued from last patch, but now in mode decision process, the rate and
distortion estimates are computed based on 8x8 transform results for MBs
with modes associated with 8x8 transform. This patch also fixed a problem
related to segment based eob coding when 8x8 transform is used. The patch
significantly improved the results on HD clips:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/hd8x8RDintermode.html
(avg psnr: 2.70% glb psnr: 2.76% ssim: 3.34%)
results on cif also improved, though they are still negative compared to
baseline that uses 4x4 transform only:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/cif8x8RDintermode.html
(avg psnr: -.78% glb psnr: -.86% ssim: -.19%)
Patch 5:
This patch does 3 things:
a. a bunch of decoder bug fixes, encodings and decodings were verified
to have matched recon buffer on a number of encodes on cif size mobile and
hd version of _pedestrian.
b. the patch further improved the rate distortion calculation of MBS that
use 8x8 transform. This provided some further gain on compression.
c. the patch also got the experimental work SEG_LVL_EOB to work with 8x8
transformed macroblock, test results indicates it improves the cif set
but hurt the HD set slightly.
Tests results on HD clips:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/HD_t8x8_20120201.html
(avg psnr: 3.19% glb psnr: 3.30% ssim: 3.93%)
Test results on cif clips:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/cif_t8x8_20120201.html
(avg psnr: -.47% glb psnr: -.51% ssim: +.28%)
Patch 6:
Added a frame level flag to indicate if 8x8 transform is allowed at all.
temporarily the decision is based on frame size, can be optimized later
one. This get the cif results to basically unchanged, with one bit per
frame overhead on both cif and hd clips.
Patch 8:
Rebase and Merge to head by PGW.
Fixed some suspect 4s that look like hey should be 64s in regard
to segmented EOB. Perhaps #defines would be bette.
Bulit and tested without T8x8 enabled and produces unchanged
output.
Patch 9:
Corrected misalligned code/decode of "txfm_mode" bit.
Limited testing for correct encode and decode with
T8x8 configured on derf clips.
Change-Id: I156e1405d25f81579d579dff8ab9af53944ec49c
As an optimization some architectures use the max_sad argument to break
out early from the SAD. Pass in INT_MAX instead of 0 to prevent this.
Change-Id: I653c476834b97771578d63f231233d445388629d
In the variance calculations the difference is summed and later squared.
When the sum exceeds sqrt(2^31) the value is treated as a negative when
it is shifted which gives incorrect results.
To fix this we cast the result of the multiplication as unsigned.
The alternative fix is to shift sum down by 4 before multiplying.
However that will reduce precision.
For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and
change).
PPC change is untested.
Change-Id: I1bad27ea0720067def6d71a6da5f789508cec265
Merged in most of the current common prediction changes
that were under the #if CONFIG_COMPRED option.
Change-Id: If4e6f61dbe7b86dd449f6effbe93b5eb7e893885
Further changes to make experiments with the context
used for coding the dual pred flag easier.
Current best performing method tested on derf is a two
element context based on reference frame. I also tried
various combinations of mode and reference frame as
shown in commented out case using up to 6 contexts.
Derf +0.26 overall psnr +0.15% ssim vs original method.
Change-Id: I64c21ddec0abbb27feaaeaa1da2e9f164ebaca03
Further use of common prediction functions and experiments
with alternate contexts based on mode and reference frame.
For the Derf set using reference frame as basis of context
gives +0.18% Overall Psnr and +0.08 SSIM
Change-Id: Ie7eb76f329f74c9c698614f01ece31de0b6bfc9e
We should only change the dual prediction mode if we actually entered
the recode branch. Else, it may potentially undo beneficial changes
to the dual prediction mode in the first encode iteration.
Change-Id: I79fc53e5fd0bb551092ed422c797619f1566f002
Some conditions were conditional under a threshold, whereas they should
always execute. Also, some conditions were testing an array instead of
the values within it.
Change-Id: Ia6892945cfbbe07322e6af6be42cd864bf9479c1
Allow the application to change the frame size during encoding. This
is only supported when not using lagged compress.
Change-Id: I89b585d703d5fd728a9e3dedf997f1b595d0db0f
MFQE postproc crashed with stream dimensions not a multiple of 16.
The buffer was memset unconditionally, so if the buffer allocation
fails we end up trying to write to NULL.
This patch traps an allocation failure with vpx_internal_error(),
and aligns the buffer dimensions to what vp8_yv12_alloc_frame_buffer()
expects.
Change-Id: I3915d597cd66886a24f4ef39752751ebe6425066
The 5-layer encode must have a keyframe every 16 frames.
The KF flag was being reset after the encode of the first
frame, which it should not do for the 5-layer case
(mode=6).
Change-Id: I207d6e689d347fe3fd1075b97a817e82f7ad53b9
The existing code updated the reference frame probabilities before
the test to evaluate the impact of using updated probabilities
in vp8_estimate_entropy_savings().
The estimate of cost and savings is still basic and does not reflect
the new prediction code but this would require per MB costings
and the benefit is probably marginal, as this is really just used for
rate estimation in the loop.
Change-Id: Id6ba88ae6e11c273b3159deff70980363ccd8ea1
This commit merges the NEWNEAR experiment such that it
is effectively always on.
The fact that there were changes in the threading code again
highlights the need to strip out such features during the
bitstream development phase as trying to maintain this code
(especially as it is not being tested) slows the development cycle.
Change-Id: I8b34950a1333231ced9928aa11cd6d6459984b65
Initial modifications to make limited use of common prediction
functions.
The only functional change thus far is that updates to the probabilities are
no longer "damped". This was a testing convenience but in fact seems to
help by a little over 0.1% over the derf set.
Change-Id: I8b82907d9d6b6a4a075728b60b31ce93392a5f2e
Trial of a modified prediction function that ranks each possible
reference frame based on a combination of local usage and
frame level probability. The code is a bit cleaner and simpler.
In direct comparison with old unpredicted method with segment level
coding turned off for mode,ref & EOB the prediction gives a gain on derf
of around 0.4%. There is some further gain from bug fixes over earlier code.
With segment coding on the prediction method is slightly -ve on some very
easy clips (at low rates) due to slightly higher overheads, but better on harder
clips. Overall neutral on derf in direct comparison on latest code base, but
compared to earlier code without bug fixes about +0.7% overall psnr
+0.3% SSIM.
Change-Id: I5b8474658b208134d352d24f6517f25795490789
Sometimes, a user doesn't have enough bandwidth to send high-resolution
(i.e. HD) video even though the camera catches HD video. This change
allowed users to skip highest-resolution encoding by setting that level's
target bit rate to 0.
To test it, modify the following line in vp8_multi_resolution_encoder.c.
unsigned int target_bitrate[NUM_ENCODERS]={1400, 500, 100};
To skip the highest-resolution level, change it to
unsigned int target_bitrate[NUM_ENCODERS]={0, 500, 100};
To skip the first and second highest resolution levels, change it to
unsigned int target_bitrate[NUM_ENCODERS]={0, 0, 100};
This change also fixed a small problem in mapping, which slightly helped
quality and performance.
Change-Id: I977bae9a9fbfba85c8be4bd5af01539f2b84bc81
Extended prediction and coding of reference frame where
a subset of options are flagged as available at the segment level.
Updated copyright notices.
Switch to SAD in mbgraph code as SATD problematic for the
foreground and background separation as it can ignore large DC shifts.
Change-Id: I661dbbb2f94f3ec0f96bb928c1655e5e415a7de1
As a precursor to encoding 32x32 blocks this cl adds the
ability to encode the frame superblock (=32x32 block) at
a time. Within a SB the 4 indiviual MBs are encoded in
raster-order (NW,NE,SW,SE).
This functionality is added as an experiment which can be
enabled by ispecifying --enable-superblocks in the
command line specified to configure (CONFIG_SUPERBLOCKS
macro in the code).
To make this work I had to disable the two intra
prediction modes that use data from the top-right of the
MB.
On the tests that I have run the results produce
almost exactly the same PSNRs & SSIMs with a very
slightly higher average data rate (and slightly higher
data rate than just disabling the two intra modes in
the original code).
NOTE: This will also break the multi-threaded code.
This replaces the abandoned change:
Iebebe0d1a50ce8c15c79862c537b765a2f67e162
Change-Id: I1bc1a00f236abc1a373c7210d756e25f970fcad8
Commented out changes from earlier checking:
"Change Iab7f1eff: vpnext use segref segmentation filter"
Which in its current state breaks the decoder.
Change-Id: I9185098aeda8ce65310f338c4c9375f4a39005d3
The bug was introduced by the commit that added I8X8 intra prediction
mode for inter frames, the decoder was not update to accept the additional
probability update from encoder. This causes the decoder typicall to crash
when encoder sends intra mode probability update.
Change-Id: Ib7dc42dc77a51178aa9ece41e081829818a25016
This check in uses the common prediction interface functions
to code reference frame.
Some updates made regarding the impact of the new code in rd loop
but there remain TODOs in this regard.
Change-Id: I9da3ed5dfdaa489e0903ab33258b0767a585567f
This does not change any functionality just modifies the code to
use the common prediction module interface for coding
the segment data.
Change-Id: Ifd43e9153573365619774a4f5572215e44fb5aa3
This function adds the common prediction modules, some data structures
and a config option but does not use them.
It also corrects a bug in clearing down the MODE_INFO border and introduces
a new element that indicates if an entry corresponds to an "in image" macro block
or is part of the border.
Change-Id: Ib69eec0876173ebe9d1de9df9537d0b2447702e0
This is the final commit in the series converting to the new RTCD
system. It removes the encoder csystemdependent files and the remaining
global function pointers that didn't conform to the old RTCD system.
Change-Id: I9649706f1bb89f0cbf431ab0e3e7552d37be4d8e
This commit continues the process of converting to the new RTCD
system. It removes the last of the VP8_ENCODER_RTCD struct references.
Change-Id: I2a44f52d7cccf5177e1ca98a028ead570d045395
This is a proof of concept RTCD implementation to replace the current
system of nested includes, prototypes, INVOKE macros, etc. Currently
only the decoder specific functions are implemented in the new system.
Additional functions will be added in subsequent commits.
Overview:
RTCD "functions" are implemented as either a global function pointer
or a macro (when only one eligible specialization available).
Functions which have RTCD specializations are listed using a simple
DSL identifying the function's base name, its prototype, and the
architecture extensions that specializations are available for.
Advantages over the old system:
- No INVOKE macros. A call to an RTCD function looks like an ordinary
function call.
- No need to pass vtables around.
- If there is only one eligible function to call, the function is
called directly, rather than indirecting through a function pointer.
- Supports the notion of "required" extensions, so in combination with
the above, on x86_64 if the best function available is sse2 or lower
it will be called directly, since all x86_64 platforms implement
sse2.
- Elides all references to functions which will never be called, which
could reduce binary size. For example if sse2 is required and there
are both mmx and sse2 implementations of a certain function, the
code will have no link time references to the mmx code.
- Significantly easier to add a new function, just one file to edit.
Disadvantages:
- Requires global writable data (though this is not a new requirement)
- 1 new generated source file.
Change-Id: Iae6edab65315f79c168485c96872641c5aa09d55
Commit 892e23a5b introduced support for the VP8D_GET_LAST_REF_USED,
but missed the mapping of the control id to the underlying function,
so it was unavailable to applications.
In addition, the underlying function vp8_references_buffer() is
moved from common/postproc.c to decoder/onyxd_if.c as postproc.c is
not built in all configurations.
Change-Id: I426dd254e7e6c4c061b70d729b69a6c384ebbe44
Picks a per segment loopfilter. Adapts the algorithm to search for
a loopfilter value for each separate segment. Further todo fix the
bias
Improvements .06 % ov psnr, .11% ssim
http://www.corp.google.com/~jimbankoski/no_crawl/segmentedpicklpf.html
Change-Id: Ic6a571c16fcd6ec0139f4de1f8061f87c6515a10
Added new 2 and 3 layer prediction frame patterns to
vp8_scalable_patterns.c and modified the coding
parameters.
Change-Id: I18798fd7326a79d2ad1e1d5b6c26f5516b6d247f
changed the token cost for 8x8 transformed macroblock used in trellisquant
from those derived from 4x4 transform coefficient distribution to those
derived from 8x8 transform coefficient distribution. Test results show
this fix help 8x8 transform based compression consistently on cif and hd
sets:
http://www.corp.google.com/~yaowu/no_crawl/t8x8/cif_cost8x8only.html
(avg psnr:.14% glb psnr: .17% ssim: .20%)
http://www.corp.google.com/~yaowu/no_crawl/t8x8/hd_cost8x8only.html
(avg psnr:.17% glb psnr: .18% ssim: .58%)
Note: To test the effect of this change, 8x8 transform was forced to be used
only on 16x16 predicted macroblocks on inter frames, the effect would be
bigger had all macroblocks been forcd to use 8x8 transform.
Change-Id: If9b7868b75357c66541f511e5ee78e4d2d4929a4
using an 8-tap filter.
The results with 3 different 8-tap filters on the derf set are in:
http://www.corp.google.com/~debargha/vp8_results/enhinterp.html
The one that gives the most gain achieves an overall gain of about
0.6%. The results for a set of 12 hd (720p) videos are in:
http://www.corp.google.com/~debargha/vp8_results/enhinterp_hd.html
with max gain of 0.55% with the same filter. The best filter apparently
achieves the best trade-off between pass band ripple and stop band
attenuation.
Change-Id: I919e28ae245c0493147fa0864f8c9d048a9dd530
Commit e06c242ba introduced a change to call vp8_find_near_mvs() only
once instead of once per reference frame by observing that the only
effect that the frame had was on the bias applied to the motion
vector. By keeping track of the sign_bias value, the mv to use could
be flip-flopped by multiplying its components by -1.
This behavior was subtley wrong in the case when clamping was applied
to the motion vectors found by vp8_find_near_mvs(). A motion vector
could be in-bounds with one sign bias, but out of bounds after
inverting the sign, or vice versa. The clamping must match that done
by the decoder.
This change modifies vp8_find_near_mvs() to remove the clamping from
that function. The vp8_pick_inter_mode() and vp8_rd_pick_inter_mode()
functions instead track the correctly clamped values for both bias
values, switching between them by simple assignment. The common
clamping and inversion code is in vp8_find_near_mvs_bias()
Change-Id: I17e1a348d1643497eca0be232e2fbe2acf8478e1
This commit is incomplete, as it does not synchronize the loop filter
before returning a handle to the reconstructed frame in
vpx_codec_get_preview_frame(), which can cause (false?) failures
when running the test_reconstruct_buffer test.
This may be related to a bug that does cause visible artifacts, which
is also under investigation.
This reverts commit 380d64ecb1.
Change-Id: Iad710941e7731d44fc2bde63bc63d6763cc4629e
This introduces base functions for introducing implicit segmentation.
The code that actually stores the results to the segment map isn't
here yet. This just prints out the segmentation map results
if you call it.
Uses connected component labeling technique on mbmi info so that only
if 2 mbs are horizontally or vertically touching do they get the same
segment.
vp8next - plumbing for rotation
code to produce taps for rotation ( tapify. py ), code
for predicting using rotation ( predict_rotated.c ) , code
for finding the best rotation find_rotation.c.
didn't checkin code that uses this in the codec. still work
in progress.
Fixed copyright notice
Change-Id: I450c13cfa41ab2fcb699f3897760370b4935fdf8
A processor with ARMv7 instructions does not
necessarily have NEON dsp extensions. This CL
has the added side effect of allowing the ability
to enable/disable the dsp extensions cleanly.
Change-Id: Ie1e879b8fe131885bc3d4138a0acc9ffe73a36df
This commit added a set of loop filter functions for macroblocks
using 8x8 transform. First we turned off the regular loop filtering
on 4x4 block boundaries that do not exist in macroblocks using 8x8
transform. Second, we change to use the same loop filter(mask and
7 tap filter) that used for macroblock edge filtering.
Change-Id: I3a00460b7674ced116917d86812ffc32578c1d3a
Makes the thresholds for the multiframe quality enhancement module
depend on the difference between the base quantizers. Also modifies
the mixing function to weigh the current low quality frame less if
the difference in quantizer is large. With the above modifications
mfqe works well for both scalable patterns as well as low quality
key frames.
Change-Id: If24e94f63f3c292f939eea94f627e7ebfb27cb75
This fixed a conflict introduced by the change of adding 8x8 intra
prediction modes. The 8x8 intra prediction mode code assumed the
use of 4x4 transform, and causes encoder crashes when the codec is
configured with --enable-t8x8.
Change-Id: I00cc94df63e9725377ffba9eb51be6b77fe3fcf9
commit cf561bad accidentally deleted a line of code that sets the
base_qindex for each frame, which leads to every frame is encoded
at Q of 0.
Change-Id: Ib5f8022e856bf3b3bd0d4147405e46241e3dcf2d
The commit adds a new set of loop filter for macroblock edge filtering.
The new loop filter has a mask to detect so-called "flat" regions. The
detection checks 5 pixels of each side of an edge. If the all pixels
have value with +/-1 from the edge pixel on the same side, the region
is treated as a "flat" region. For such case, a 7 tap filter is used
to change 3 pixel values on each side. The 7 taps are:
[1, 1, 1, 2, 1, 1, 1]/8
The furthest away pixels used as input are +/-5 away from edge. For
non-flat region, we fall back to old filtering. It should be noted
here that the thresholds and filter taps may require more optimization
for best possible results.
Tests on a set of hd clips showed consistent gains:
http://www.corp.google.com/~yaowu/no_crawl/mblpf_hd.html
(avg psnr: .83% glb psnr: .77% ssim: .82%)
Tests on derf set also showed consistent gains:
http://www.corp.google.com/~yaowu/no_crawl/mblpf_derf.html
(avg psnr: .24% glb psnr: .22% ssim: .48%)
Change-Id: I0855b1ff48e79e1175c20b81967137e18b2af352
Android.mk file for using the Android NDK build
system to compile. Adds option for SDK path to
use the compiler that comes with android for testing
compiler compliance.
Change-Id: I5fd17cb76e3ed631758d3f392e62ae1a050d0d10
A problem can arise on static clips with force key frames where
attempts to avoid popping lead to a progressive reduction in key
frame Q that ultimately may lead to unexpected overspend against
the rate target.
The changes in this patch help to insure that in such clips the
quality of the key frames across the clip is more uniform (rather
than starting bad and getting better - especially at low target rates).
This patch also includes a fix that removes a delta on the Y2DC
when the baseline q index < 4 as this is no longer needed.
There is also a fix to try and prevent repeat single step Q adjustment in
the recode loop leading to lots of recodes, especially where the use
of forced skips as part of segmentation has made the impact of Q on
the number of bits generated much smaller.
Patch 2: Amend "last_boosted_qindex" calculation for arf overlay frames.
Change-Id: Ia1feeb79ed8ed014e4239994fcf5e58e68fd9459
filter in a way such that when there is a single bad frame, the
post-processing is applied not only to just that frame but a
few subsequent frames as well.
Change-Id: Iba5d9896eed77244eb76b4a74692a93f8ecff634
When running multi-layer (ML) encodes and dynamically
changing coding parameters on the fly (e.g. frame
duration/rate, bandwidths allocated to each layer)
the encoder would not produce sensible output.
In certain cases the rate targeting would be
hideously inaccurate.
These fixes make it possible to change these coding
parameters correctly and to maintain accurate control
of the rate targeting.
I also added the specification of the input timebase
into the test program, vp8_scalable_patterns.c.
Patch 2: Moved declaration to appease MS compiler)
Change-Id: Ic8bb5a16daa924bb64974e740696e040d07ae363
Add new get_predictor_pointers() and get_reference_search_order()
functions for code shared between the two implementations.
Change-Id: I1ebe76aa8f168b1f5cfabc00d05d8f19a0d4d207
with deblock or demacroblock filters. When --mfqe is used together
with --demacroblock or --deblock, mfqe is applied first and then
demacroblock/deblock is applied to the mfqe result.
Change-Id: Id83ee01f1b4a33a116f071dcf26d59c7f3497c32
precision used in initialization of roundoff is not a constant
updated to use #define MFQE_PRECISION 4
Change-Id: If2fc3d3d633d58a7f4ab34d258c232ec1e5f0a79
Code was from a time when the compiler was
not good at optimizing. Compilers are better
and instruction sets have increased to make
this code obsolete.
Change-Id: I8d261371685247465eb4aa00bdcecc9fe1784219
The default maximum keyframe interval is 9999, or over five minutes
at normal video rates. When the encoder produces streams with such
a long interval seeking (with correct output) is more expensive,
and live streaming is impossible.
Of course the encoding application should set this parameter
based on its knowledge of the intended use of the stream, but
reducing the default gives better results for applications
which do not.
Change-Id: I900b15d74ce72ecc3ade4d43f758c5cf97a2098a
This patch removes the local copies of the dequantize
constants and implements John's idea as described
in "Make a local copy of the dequantized data" commit.
Change-Id: Ic6b7d681f00bf63263f71ff1e39ab2f80729e8b2
In certain hardware configuration, where mmx code is enabled and
other simd (sse2/sse3) disabled, lacking of this emms caused invalid
internal stats outputs.
Change-Id: I77c61cf6e0448d3f3b8c11781aa9e42f31d231c9
Adds a multiframe postprocessing module to enhance the quality of
certain frames that are coded at lower quality than preceding frames.
The module can be invoked from the commandline by use of the --mfqe
option, and will be most beneficial for enhancing the quality of
frames decoded using scalable patterns.
Uses the vp8_variance_var16x16 and vp8_variance_sad16x16 function
pointers to compute SAD and Variance of blocks.
Change-Id: Id73d2a6e3572d07f9f8e36bbce00a4fc5ffd8961
Current android ndk compiler does not recognize
strings for attributes. Numerical equivalents
can be found in the "ARM IHI 0045C" document.
Change-Id: I72de85b8949dc0ae5212af604fff1d5a91a828ea
Except zrun_zbin_boost, 15 AC values are the same for all other
parameters. Removed unneccessary calculation.
Change-Id: I6101c0fe8080bd2b4387c3b04d7ddedbf6010409
Makes the distribution tree (built with 'make dist') buildable with
--enable-install-srcs --enable-multi-res-encoding
Change-Id: If2ea7632f7b26615196e9abcfaa34618cc50112a
The code had a number of constructs like (condition)?1:0,
which is redundant with C's semantics. In the cases where a boolean
operator was used in the condition, simply remove the ternary part.
Otherwise adjust the surrounding expression to remove the condition
(eg, for rounding up. See pickinter.c and rdopt.c)
Change-Id: Icb2372defa3783cf31857d90b2630d06b2c7e1be
Multithreaded encoding was breaking at low bitrates
Please review/comment. Not sure if this is the best fix.
Change-Id: I87468c765372593fd865bc82e25121ebb8ca6af2
Previously, the mode context is always udpated based on stats of current
frame, when there is no count, 50% is used for both left and right branch.
However, it is observed that with such strategy, a small count or no count
at all can skew the probability distribution significantly. This commmit
changed the mode_context update strategy to prevent small counts from
skewing the probability distributions.
Tests on derf set showed a small gain: .06% in psnr and .09% in ssim
Change-Id: Ic812e64ae5f70251c170b0717f7b7fa587055488
Make bilinearfilter_arm.c compiled only when HAVE_ARMV6, as its definitions
are v6 only. This is normally not a problem for static builds as the file
is elided at link time, but this was not being done properly for the
--enable-shared --enable-pic build.
Change-Id: Ic800a7cde751f74f22555c5b247f99f9df5e550d
This commit extends the number of Q steps to 256 from 128.
The q_trans[] array has been altered to distribute available Q index values
(using the current 64 steps available as input parameters) evenly across the
available range. This is coupled with the fact that each Q step where possible
now equates to a fixed % change in the quantizer. This may want refinement
later especially in terms of the granularity at the high quality end but is a
reasonable starting point.
Change-Id: I2aaa6874fa10ce05c958dd182947ce39f6f1eecb
High Q end extended a little.
Some clean up.
Slightly better on SSIM, Slightly worse on PSNR over derf set.
Change-Id: I3dceea8a39e11c26e1a389a40e40b86efc76d28c
Added code to support 256 index steps instead of 128 but disabled for now.
Replace hard wired table vp8cx_base_skip_false_prob[128]
Observed Qindex problem with setting minimum loop filter value.
(Experiment code using real Q in place but for now just returning 0. This has a big
beneficial effect on some clips, particularly waterfall which shows 5% ssim gain)
Change-Id: I2f7117de8adc1797164c106aa13effc900a1467e
Merged multi-resolution motion estimation with regular motion
estimation function in order to remove duplicated part. This
caused slight changes in multi-resulotion encoder quality &
performance.
Change-Id: Ib4ecc7acfebfe5eea959b5b91febae6db7b95fd1
Whilst the encoder explicitly set the segment_id to 0
when segmentation is diabled, the decoder would allow
the segment_id to persist from the previous frame.
This fix attempts to make the decoder behave the same
as the encoder by explicitly setting the segment_id to
0 in this case.
Change-Id: I65c3a05247550edb10706eb5d54d306dfb792309
The total_stats, this_frame_stats, and total_left_stats structures
were previously create by a heap allocation, despite being of fixed
size. These structures were allocated and deallocated during
{de,}allocate_compressor_data, which is reinvoked whenever the frame
size changes. Unfortunately, this clobbers the total_stats and
total_left_stats data.
Historically, these were variable size at one time, due to the first
pass motion map, which necessitated their being created by a unique
heap allocation. However, this bug with the total_stats being
clobbered has probably been present since that initial implementation.
These structures are instead moved to be stored within the struct
twopass_rc directly, rather than being heap allocated separately.
Change-Id: I7f9e519e25c58b92969071f0e99fa80307e0682b
When mb_skip_coeff is set, the idct is not necessary. Prior
to this patch, the code would call idcts based on leftover
eob information. This patch will now skip the idct for
SPLIT_MV and clear out the eobs for B_PRED, forcing the idct
to be skipped.
Change-Id: If5b0d2ed3ebd07789d30ec5160df927485fcaa17
mode_info_context is padded with an additional column of data, so
mode_info_stride should be used to move between rows rather than
mb_cols.
Change-Id: I598559a2cd9df1c486d64aaeccf76b76a7ecf21c
These functions are now used by the encoder.
This is WIP with the goal of creating a common idct/add for
the encoder and decoder. A boost of 1.8% was seen for
the HD rt test clip used.
[Tero] Added needed changes to ARM side.
Change-Id: Ibbb8000be09034203d7adffc457d3c3f8b06a5bf
Both encoder & decoder were using mb_cols to
offset from one row of MODE_INFO structures to the next
when they should have been using mode_info_stride.
Fixing this in both encoder and decoder gives around
a 3KB size saving and 0.025dB PSNR improvement on the one
720P clip I tried.
(Also removed "index" which was being updated but not used)
Change-Id: I413bea802b142886bfcf8d8aa7f5a2f0c524fd4b
While doing motion search on a macroblock, we usually call
vp8_find_near_mvs once per reference frame. Actually, for
different reference frames, the only difference in calculating
these near_mvs is they may have different sign_bias, which
causes a sign change in resulting near_mvs. In this change, we
only do find_near_mvs for the first reference frame. For other
reference frames, only need to adjust the near_mvs according to
that reference frame's sign_bias value.
Change-Id: I661394b49c6ad79fed7d0f2eb2be239b9c56f149
Add the notion of deferred validation of parameters. We don't want to
validate the cq_level at initialization time, because it won't have
been set via set_param() yet.
Change-Id: Ia1308395e8c10e0b1dc4e9af3a09b2bd6744cc30
The fast-path for skipped MBs was not correctly respecting the
block type during update of the coefficient counts. Extracted
this from part of change I365cfb6ac636f19c545f682e3aeac185253abaef
Change-Id: I53d8cf0a00a98034b97b0ed3414b703bae74a739
The conversion script was incorrectly matching
CONFIG_POSTPROC[_VISUALIZER] and generating an
incorrect vpx_config.asm
Match both PROC and ENDP on word boundaries
Change-Id: Ic2788c3b522d4ee0afc5223b72e1b09fb52645be
Aligned the image buffer and stride to 32 bytes. This enables
calling of optimized scaler function in libyuv, and improves
the performance.
Tested libyuv scaler(x86 optimization) on Linux and Windows,
including: Linux 32/64bit, visual studio 32/64bit, Cygwin, and
MinGW32.
Also, fixed a wrong pointer in vpx_codec_encode().
Change-Id: Ibe97d7a0a745f82c43852fa4ed719be5a4db6abc
Removed a couple more fixed tables for the extended quantizer experiment
that depend on QINDEX_RANGE.
Change-Id: I2c15ffc7488c2a2b8d6504e2c4b6b2339799d117
Previously, Y-adaptive UV intra coding only enabled on key frames in
UVINTRA experiment. This commit enabled the same coding for inter
frames, so the encoding of UV intra modes are consistent cross all
frame types. Tests on derf set showed a very small overall gain around
.04%:
http://www.corp.google.com/~yaowu/no_crawl/interUVintra.html
The gain looks to be reasonable given inta coded MBs is only a
small portion of MBs in inter frames.
Change-Id: Ic6fc261923f2c253f4a0c9f8bccf4797557b9e16
update_mbgraph_frame_stats used xd->mode_info_context
before it had been setup, resulting in potentially
random accesses of uninitialized memory.
This fix allocates a local MODE_INFO structure to hold
the data generated in the function.
Change-Id: Ic9e75610008ce0e2d690e8e583c21582fee6fc45
A previous commit 76feb965 made the vp8_mode_context adaptive on a frame
frame basis, this commit further made the coding context adaptive to two
frame types separately. Tests on derf set showed a further small gain on
all metrics: avg psnr 0.10%, glb psnr: 0.11%, ssim: 0.08%
http://www.corp.google.com/~yaowu/no_crawl/newNearMode_1209.html
Change-Id: I7b3e32ec8729de1903d14a3f1213f1624b78cdee
The previous definition of uintptr_t was incorrect on x64 with MSVC.
Also, the ARMV6 version of int_fast16_t was defined as unsigned for
some reason. Since the fast types are currently unused, just remove
them.
Change-Id: Idd73f77a989c77feedcb4a6802ae6bd37324ed40
The commit fixed a problem by capping cpi->active_best_quality to be
smaller than cpi->worst_quality. Also fixed a few line of code that
was misplaced.
Change-Id: Ie908264b72140c669122a0afde5d886619c33474
This commit removed the macro CONFIG_MULCONTEXT, which was used to
indicate the experiment code for using separate context for altref
and normal frames. This commit made the change fully merged in.
Change-Id: I525f927f68e2365d37b340ef23b836a136a4f70b
This commit removed the macro CONFIG_I8X8, which was used to indicate
the 8x8 intra prediction experiment, made the change fully merged in.
Change-Id: Iafa4443781ce6e83f5591c12ba615a0e92ce0ea0
vp8_mode_contexts[] is an entropy table used to code inter mode
choices. It was a fixed constant table. This commit made the entropy
context adaptive. Tests on derf set showed very good consistent gains
on all metrics: avg psnr .47%, overall psnr .46% and ssim .40%.
http://www.corp.google.com/~yaowu/no_crawl/newModeContext.html
Change-Id: Ia62b14485c948e2b74586118619c5eb2068b43b2
The MODE_STATS macro was used to #ifdef around code for mode entropy
stats collection, this commit fixed a crash when MODE_STATS is on.
The commit also changed a number of array definitions to use defined
macros instead of hard-coded numbers.
Change-Id: I114592f53a1e44e31e455f5725f036ae6168735a
Because the variable doesn't distinguish between DC and non-DC
prediction, but rather between 16x16 or 4x4 prediction.
Change-ID: Ia4e7dda2bd6230c91515072e3277be2d64e42629
Do the test filtering in the existing backup frame buffer instead of
the original. Copy the original data into extra buffer before doing
the filtering. This way there is no need to restore the original
unfiltered frame at the end of level picking process.
This came up in some discussions with Johann. Thanks!
Change-Id: I495f4301d983854673276c34ec0ddf9a9d622122
This patch introduces the concept of dual inter16x16 prediction. A
16x16 inter-predicted macroblock can use 2 references instead of 1,
where both references use the same mvmode (new, near/est, zero). In the
case of newmv, this means that two MVs are coded instead of one. The
frame can be encoded in 3 ways: all MBs single-prediction, all MBs dual
prediction, or per-MB single/dual prediction selection ("hybrid"), in
which case a single bit is coded per-MB to indicate whether the MB uses
single or dual inter prediction.
In the future, we can (maybe?) get further gains by mixing this with
Adrian's 32x32 work, per-segment dual prediction settings, or adding
support for dual splitmv/8x8mv inter prediction.
Gain (on derf-set, CQ mode) is ~2.8% (SSIM) or ~3.6% (glb PSNR). Most
gain is at medium/high bitrates, but there's minor gains at low bitrates
also. Output was confirmed to match between encoder and decoder.
Note for optimization people: this patch introduces a 2nd version of
16x16/8x8 sixtap/bilin functions, which does an avg instead of a
store. They may want to look and make sure this is implemented to
their satisfaction so we can optimize it best in the future.
Change-ID: I59dc84b07cbb3ccf073ac0f756d03d294cb19281
Added code to allocate aligned image buffer in vpx_img_alloc(). The
alignment of the buffer and stride is determined by the parameter
align.
Change-Id: Idc866978484be3558551c56df39130ab7f74ddd4
Resolved or factored out some further issues with Q index.
Put in a 3rd order polynomial instead of less accurate power function
as the best fit on gf and kf boost adjustment.
Added avg_q value to use instead of ni_av_qi.
Compute segment delta Q values based on avg_q.
Fixed bug in adjust_maxq_qrange().
The extended range Q on the derf set, using standard data rates
(which do not extend high enough to get big benefits) still show
a shortfall of between 0.5 and 1% though so there would appear to
be further issues that need to be tracked down.
Change-Id: Icfd49b9f401906ba487ef1bef7d397048295d959
The example encoder down-samples the input video frames a number of
times with a down-sampling factor, and then encodes and outputs
bitstreams with different resolutions.
Support arbitrary down-sampling factor, and down-sampling factor
can be different for each encoding level.
For example, the encoder can be tested as follows.
1. Configure with multi-resolution encoding enabled:
../libvpx/configure --target=x86-linux-gcc --disable-codecs
--enable-vp8 --enable-runtime_cpu_detect --enable-debug
--disable-install-docs --enable-error-concealment
--enable-multi-res-encoding
2. Run make
3. Encode:
If input video is 1280x720, run:
./vp8_multi_resolution_encoder 1280 720 input.yuv 1.ivf 2.ivf 3.ivf 1
(output: 1.ivf(1280x720); 2.ivf(640x360); 3.ivf(320x180).
The last parameter is set to 1/0 to show/not show PSNR.)
4. Decode:
./simple_decoder 1.ivf 1.yuv
./simple_decoder 2.ivf 2.yuv
./simple_decoder 3.ivf 3.yuv
5. View video:
mplayer 1.yuv -demuxer rawvideo -rawvideo w=1280:h=720 -loop 0 -fps 30
mplayer 2.yuv -demuxer rawvideo -rawvideo w=640:h=360 -loop 0 -fps 30
mplayer 3.yuv -demuxer rawvideo -rawvideo w=320:h=180 -loop 0 -fps 30
The encoding parameters can be modified in vp8_multi_resolution_encoder.c,
for example, target bitrate, frame rate...
Modified API. John helped a lot with that. Thanks!
Change-Id: I03be9a51167eddf94399f92d269599fb3f3d54f5
Added two experimental options to the configure script:
1. newnear:
new scheme of doing mv encoding that include a motion vector from
last frame in nearest and near mv search
2. mulcontext:
tracks entropy context separately for regular frames and alt ref
frames.
Change-Id: If6e0d5d593351707b497a26eb6a763e080f77e6f
This commit added code to keep track of separate entropy contexts for
normal frames and alt ref frames. The underly assumption was that the
two type of frames have different entropy characteristics given they
typically have quite different quantization levels. By keeping entropy
contexts separate, it helps the entropy context distribution to be more
closely adapted to each frame type.
Tests on derf set showed a good and very consistent gain on all clips
on all metrics, avg psnr: 0.89%, overall psnr: 0.84% and ssim 0.93%.
http://www.corp.google.com/~yaowu/no_crawl/mulcontext.html
Change-Id: I15bc9697f6ff7829042911fe0c62930585d7e65d
This commit enabled the usage of 8x8 intra prediction modes on inter
frames. There are a few TODO items related to this: 1)baseline entropy
need be calibrated; 2)cost of UV need to be done more properly rather
than using decision only relying on Y; 3)Threshold for allowing picking
8x8 intra prediction should be lowered to lower than the B_PRED.
Even with all the TODOs, tests showed consistent gain on derf set ~0.1%
(PSNR:0.08% and SSIM:0.14%). It is assumed that 8x8 intra prediction
will help more on large resolution clips, especially with above TODOs
addressed.
Change-Id: I398ada49dfc32575cfab962a569c2885111ae3ba
Fixed some further QIndex related issues and replaced some tables
(eg zbin and rounding)
Also Added function (currently disabled by default) to populate the
main AC and DC quantizer tables. Using the original AC range the
resulting computed DC values give behavior broadly comparable
on the DERF set. That is not to say that the equations will hold good
over a more extended range. The purpose of this code is to make it
easier to experiment with further alterations to the Q range and distribution
of Q values plus the relative weights given to AC and DC.
The function find_fp_qindex() ensures that changes to the Q tables
are reflected in the value passed in to the first pass code.
Slight experimental adjustment to static segment Q offset.
Change-Id: I36186267d55dfc2a3d565d0cff7218ef300d1cd5
this commit is to add an variable in the macroblock level mode
info structure to track the transform size used in each MB, so
the information can be used later in the loop filter to change
how loop filter works on MBs with different transform sizes.
Change-Id: Id0eeaba6cc854c6d1be00ed8d237b3d9e250e447
Slight tweaks to the new minq equations to bring results more into line with
original lookup tables.
Change-Id: I969fc87d95912df549b6775e83ee2345e84d4da0
Fixed bug in firspass.c call to vp8_initialize_rd_consts()
This was passing in vp8_dc_quant(cm->base_qindex, cm->y1dc_delta_q)
instead of (cm->base_qindex + cm->y1dc_delta_q).
It just so happens that for the value 26 used for cm->base_qindex in the
unextended Q case, the two give similar results. However, when using
the extended Q range the two are very different.
Also added more stats output and partly disabled another broken feature.
Change-Id: Iddf6cf5ea8467c44b7c133f38e629f6ba6f2581e
dynamicly assign ARG_CTRL_CNT_MAX and
add check to make sure argument instance
doesnt already exist before creating a duplicate
Change-Id: I4f78a9c5346cda8e812cd89c077afe8996493508
This value needs to be copied to each thread's data structure.
This fixed artifact problem in multi-thread encoder.
Change-Id: Iab6d9745a1d44846aa503184705376f63a505597
This is an experiment to include a mv contribution from last frame to
nearest and near mv definition. Initial test showed some small though
consistent gain.
latest patch slightly better result ~.13%-~.18%.
TODO: the entropy used to encode the mode choice, i.e. the mv counts
based conditional distribution of modes should be re-collected to
reflect this change, it is expected that there is some further gain
from that.
Change-Id: Ief1e284a36d8aa56b49ae5b360c91419ec494fa4
to the dqcoeff or qcoeff buffer. The encoder would
populate the dc coeffs of the y blocks as a separate
stage (recon_dcblock) and the decoder would use a special
version of the idct. This change eliminates the extra copy
and reduces the code footprint.
[Tero] Added needed changes to armv6 and NEON assembly.
Change-Id: I83202ffdbaf83f6e5dd69f4ba2519fcf0b13b3ba
This comitt brings accross changes from the public branch
commit number Icf74d13af77437c08602571dc7a97e747cce5066.
The main puurpose of this comit relates to CQ mode but it
also includes some refactoring of the two pass code which
I hope will make tuning the experimental branch for the new
quantizer range a little less painfull.
Change-Id: I278e989436a928fc1fe7761068960048f9d7a376
This commit resolves further QIndex look up tables to facilitate
experimentation with the quantizer range.
In some cases rather than remove the look up tables completely
I have created functions that are called once to populate them
using a formulaic approach base on the actual quantizer.
The use of these functions based on best fit of data from the original
tables does affect the results on some clips but across the derf test
set the effect was broadly neutral.
Change-Id: I8baa61c97ce87dc09a6340d56fdeb681b9345793
API was not returning correct partition sizes on arm targets.
The armv5 token packing functions were not storing the information to the
partition size table.
As a fix, have one boolcoder instance allocated for each partition so
that partition sizes are internally available after all partitions
were encoded. This will also allow more flexibility in producing
several partitions in parallel.
Use buffer validation (overflow check) in all ARM bitpacking
functions.
Change-Id: I31c8a11d8a7613676f0ff50928cb2a2ab14fd169
One of the problems arising when tweaking or adjusting the quantizer
tables is that there are a lot of look up tables that depend on the QINDEX.
Any adjustment to the link between QINDEX and real quantizer therefore tends
to break aspects of for example the rate control.
In this check in I have replaced several of the look up tables with functions that
approximate the same results as the old Q luts but use a formulaic approach
based on real Q values rather than QIndex. This should hopefully make it easier
to experiment with changes to the Q tables without always having to go through
and hand optimize a set of look up tables. Once things stabilize we may choose
to re-instate luts for the sake of performance.
Patch 2:
Addressed Ronald's comments.
vp8_init_me_luts() Added so luts only initialized once.
Change-Id: Ic80db2212d2fd01e08e8cb5c7dca1fda1102be57
Corrected dc lookup table to maintain ac/dc balance
close to what it was previously.
Firstpass not being passed the adjusted Q index for
the extended range.
Change-Id: Ic0200dabda445fea03bf81067999cb2670e99b77
Fix decoder segmentation bug for temporal coding where the segment map
was first initialized on a key frame.
in vp8_kfread_modes() after reading the segment id it must be written to
the pbi->segmentation_map[] for use in temporal coding on subsequent frames.
Change-Id: I1489305efc376564e734a216f69c2844646ee3d3
Storing vp8_bilinear_filters_mmx in an mmx file and using it in an sse2
file is bad
Moving towards allowing --disable-mmx
Change-Id: I20493b35bdedcdcfc0915e6f05fdbe6c81a4a742
There was an implicit reference frame test order (typically LAST,
GOLD, ARF) in the mode selection logic, but this doesn't provide the
expected results when some reference frames are disabled. For
instance, in real-time mode, the speed selection logic often disables
the ARF modes. So if the user disables the LAST and GOLD frames, the
encoder was always choosing INTRA, when in reality searching the ARF
in this case has the same speed penalty as searching LAST would have
had.
Instead, introduce the notion of a reference frame search order. This
patch preserves the former priorities, so if a frame is disabled, the
other frames bump up a slot to take its place. This patch lays the
groundwork for doing something smarter in the frame test order, for
example considering temporal distance or looking at the frames used by
nearby blocks.
Change-Id: I1199149f8662a408537c653d2c021c7f1d29a700
Extend buffer write validation (overflow check) to single token
partition packing, both mb and row based functions.
Change-Id: I36e19b7d37fc43712d05c70e3ad223d3eb5b973d
The buffer level was able to increase indefinitely rather than
being clipped to the maximum buffer size specified by the user.
This change checks the buffrer level and prevents it from
going beyond the upper limit of the buffer.
Change-Id: Ifff55f79d3c018e4d3d77e554b11ada543cc1654
This commit has a few minor fixes to the 8x8 trellis quant, so to
make it work regardless if extend_qrange is enabled or not. It also
borrowed adaptive RDMULT constants from 4x4 trellis that was missed
in the 8x8 trellis quant.
Change-Id: I60d7769071f102c699b5084597e62bca87a1f759
Explicit inclusion of limits.h to satisfy unix build for definition of INT_MAX.
Some commented out code removed.
Change-Id: I5b5980dfaa9b4d2d12bfd729cfd35bd982106908
The buffer level was not being capped at the maximum decode
buffer size specified by the user.
In CBR mode this resulted in large files being generated
immediately after a dramatic reduction in the data rate
target. This fix should prevent this from happening.
Change-Id: Iaf01c9fb037cf51515217a5834d6ee4fbb0cb853
Patch set 2: 64 bit build fix
Patch set 3: 64 bit crash fix
[Tero]
Patch set 4: Updated ARMv6 and NEON assembly.
Added also minor NEON optimizations to subtract
functions.
Patch set 5: x86 stride bug fix
Change-Id: I1fcca93e90c89b89ddc204e1c18f208682675c15
Removal of CONFIGURE_SEGMENTATION ifdefs.
Removal of legacy support code fo the old coding mechanism.
Use local reference "xd" for MACROBLOCKD structure in
encode_frame_to_data_rate()
Moved call to choose_segmap_coding_method() out of encode
loop as the cost of segmentation is not properly accounted
in the loop anyway. If this is desirable in the future it
can be moved back. The use of this function to do all the
analysis and set the probabilities also removes the need
to track segment useage in threading code.
Change-Id: I85bc8fd63440e7176c73d26cb742698f9b70cade
Changed name and sense of segment_flag to "seg_id_predicted"
Added some additional comments and retested.
I also did some experimentation with a spatial prediction option
using a similar strategy to the temporal mode implemented.
This helps in some cases where temporal prediction is bad but
I suspect there is more overlap here with work on a larger scale
block structure and spatial correlation will likely be better
handled through that mechanism.
Next check in will remove #ifdefs and legacy mode code.
Change-Id: I3b382b65ed2a57bd7775ac0f3a01a9508a209cbc
This check in includes quite a lot of clean up and refactoring.
Most of the analysis and set up for the different coding options for the
segment map (currently simple distribution based coding or temporaly
predicted coding), has been moved to one location (the function
choose_segmap_coding_method() in segmenation.c). This code was previously
scattered around in various locations making integration with other
experiments and modification / debug more difficult.
Currently the functionality is as it was with the exception that the
prediction probabilities are now only transmitted when the temporal
prediction mode is selected.
There is still quite a bit more clean up work that will be possible
when the #ifdef is removed. Also at that time I may rename and alter
the sense of macroblock based variable "segment_flag" which indicates
(1 that the segmnet id is not predicted vs 0 that it is predicted).
I also intend to experiment with a spatial prediction mode that can be
used when coding a key frame segment map or in cases where temporal
prediction does not work well but there is spatial correlation.
In a later check in when the ifdefs have gone I may also move the call
to choose_segmap_coding_method() to just before where the bitsream is
packed (currently it is in vp8_encode_frame()) to further reduce the
possibility of clashes with other experiments and prevent it being called
on each itteration of the recode loop.
Change-Id: I3d4aba2a2826ec21f367678d5b07c1d1c36db168
The calculated frame_rate is a state variable in the codec, and
shouldn't be maintained in the configuration struct. Move it to the
main part of cpi so that it isn't clobbered when the configuration
struct is updated. The initial framerate estimate is moved from the
vp8_cx_iface.c wrapper into the body of init_config() in onyx_if.c, so
that it is only called once and not reset on every call to
vp8_change_config().
Change-Id: I8d9a3d1283330d1ee297d07e9d78d1f2875f2465
Added last_segmentation_map[] structure
to keep track of what we had before when
doing temporal prediction. With this change
the existing code does once again appear to
be giving a decodable bitstream for both
temporal and standard prediction modes.
However, it is still somewhat messy and
confused and there is no option to take
advantage of spatial prediction so it could
do with further work.
Some housekeeping / clean out.
Change-Id: I368258243f82127b81d8dffa7ada615208513b47
Some initial cleanup to aid testing and debug.
Pull code to choose temporal or spatial encoding
out of encodeframe.c into a dedicated function
in segmentation.c.
For now disable broken temporal mode.
Move the coding of "temporal_update" flag and
only transmit if segment map update is indicated.
Rename the functions read_mb_features() and
write_mb_features() to read_mb_segid() and
read_mb_segid() as they only read and write
the macroblock segment id not any of the
features.
Change-Id: Ib75118520b1144c24d35fdfc6ce46106803cabcf
The dequantizer functions for 2nd order haar block had confusing 8x8
in their names. this commit fixed their name to avoid confusion.
Change-Id: I6ae4e7888330865f831436313637d4395b1fc273
This commit added scaling factors to 8x8 transform, quant, dequant and
inverse transform pipeline to make 8x8 transform to work when configed
with enable-extend_qrange. This commit also disabled the trellis-quant
when extend_qrange is configured.
Change-Id: Icfb3192e4746f70a4bb35ad18b7b47705b657e52
updated the decode_macroblock logic to reflect that 8x8 transform is
not used for "SPLITMV". Also fixed an issue where 2nd order haar block
has wrong dequant/idct process.
Change-Id: I1e373f6535c009dfec503b6362c8a5cfc196e1da
extend_qrange introduces a different scaling factor, this commit takes
the scaling difference into account for reset 2nd order coefficients.
Change-Id: Ie58bca9f52698fa759e3f88da2aa4d82630fa91a
the bug caused the encoder to produce invalid bitstream when configured
with enable_extend_qrange.
Change-Id: I1e81c48b13359d0043cbbd480e679380a2da117c
Ronald recently sent me this patch that he did in April.
> From: Ronald S. Bultje <rbultje@google.com>
> Date: Thu, 28 Apr 2011 17:30:15 -0700
> Subject: [PATCH] SSE2 optimizations for
> vp8_build_intra_predictors_mby{,_s}().
HD decode tests have shown a performance boost up to 1.5%,
depending on material.
Patch set 3: Fixed encoder crash.
Change-Id: Ie1fd1fa3dc750eec1a7a20bfa2decc079dcf48c8
For ease of testing and merging experiments I have
removed in line code in encode_frame() that assigns
MBs to be t8x8 or t4x4 coded segments and have
moved the decision point and segment setup to
the init_seg_features0 test function.
Keeping everything in one place helps make sure
for now that experiments using segmentation are
not fighting each other.
Also made sure mode selection code can't choose 4x4
modes if t8x8 is selected.
Patch2: In init_seg_features() add checks
for SEG_LVL_TRANSFORM active.
Change-Id: Ia1767edd99b78510011d4251539f9bc325842e3a
Call the idct/add after the tokenize. This is WIP with
the goal of creating a common idct/add for the encoder and
decoder. This move is necessary because the decoder's version
of the idct clobbers qcoeff, which is used by the tokenize.
Change-Id: I6b08d8e8397cd873647fa4fb9469884e3c876756
Removed code in #if CONFIG_SEGMENTATION that
enables segmentation and creates a test segmentation
map, to avoid conflicts with the other segmentation test
code,
Change-Id: I7a21a44ed188b814cd80b30dd628c62474eba730
Bug fix to logic in vp8_pick_inter_mode() and
vp8_rd_pick_inter_mode().
The block on the use of segment features
for the cm->refresh_alt_ref_frame case
was just for testing and is not correct.
The special case code for alt ref can
be re-enabled as an else clause.
Change-Id: Ic9b57cdb5f04ea7737032b8fb953d84d7717b3ce
Added ARM optimized intra 4x4 prediction
- 2x faster on Profiler compared to C-code compiled with -O3
- Function interface changed a little to improve BLOCKD structure
access
Change-Id: I9bc2b723155943fe0cf03dd9ca5f1760f7a81f54
The 8x8 forward transform makes use of floating operations, therefore
requires emms call to reset mmx registers to correct state. Without
the resets, the 8x8 forward transform results are indefinite on win32
platform.
Change-Id: Ib5b71c3213e10b8a04fe776adf885f3714e7deb1
The referenced function (SignalObjectAndWait) isn't used. Reduces the
warnings with mingw32-w64 which defines this.
Change-Id: I4ce592879ec9372bf196dac640204c4d370bd210
vp8cx_mb_init_quantizer() needs to be called at least once to get
all values calculated. This change added one check to decide if
we could skip initialization or not.
Change-Id: I3f65eb548be57580a61444328336bc18c25c085b
logically this commit should NOT change anything, but seems to help
revert the 3DB loss on bowing in the following commit:
https://on2-git.corp.google.com/g/#change,6193
This is still debugging in progress. Need further investigation to
understand the root cause of the issue.
Change-Id: I0b49d1ef3a311dfff58c6acd3eaebdb3bda6257c
Initial attempt at using new segment feature signaling
to indicate 4x4 or 8x8 transform.
needs --enable-experimental --enable-t8x8
Note this is work in progress.
Change-Id: Ib160d46a5d810307bfcbc79853ce1a65b5b870b7
Added code to clip the buffer level to the maximum buffer
size. Without this the buffer level would increase
unchecked.
This bug was found when encoding an essentially static
scene at 2Mb/s. The encoder is unable to generate frames
consistent with the high data-rate because Q bottoms out
at Qmin.
As frames generated are consistently undersized the buffer
level increases and does not get checked against the
maximum size specified by the user (or default).
Change-Id: Id8a3c6323d3246da50f7cb53ddbf78b5528032c6
Fix compiler warning for passing a non const array
to a function expecting a const array by using an
intermediary pointer and casting.
Change-Id: I9bdd358ebdc926223993fb8fb2098ffedd2f3fc7
Temporary check in to turn off other segment features
tests when #if CONFIG_T8X8 is set as the assignment of
MBs to differnt segments in each case will conflict.
The 8x8 code will be modified to use the new segment
feature method properly in a later check in.
Increase bits allowed for EOB end stop marker to 6 ready
for 8x8.
Change-Id: I4835bc8d3bf98e1775c3d247d778639c90b01f7f
No change to functionality or output.
Updates to the segment feature data structure now all done
through functions such as set_segdata() and get_segdata()
in seg_common.c.
The reason for this is to make changing the structures (if needed)
and debug easier.
In addition it provides a single location for subsequent addition
of range and validity checks. For example valid combination of
mode and reference frame.
Change-Id: I2e866505562db4e4cb6f17a472b25b4465f01add
This commit tries to do UV intra mode coding adaptive to Y intra mode.
Entropy context is defined as conditional PDF of uv intra mode given
the Y mode. All constants are normalized with 256 to be fit in 8 bits.
This provides further coding efficiency beyond the quantizer adaptive
y intra mode coding. Consistent gains were observed on all clips and
all bit rates for HD all key encoding tests.
To test, configure with
--enable-experimental --enable-uvintra
Change-Id: I2d78d73f143127f063e19bd0bac3b68c418d756a
As discovered in path 10 of Change Ia12acd2f, reset 2nd order coeffs
without reset of above and left coding context may have introduced
problem that causes encoder/decoder mismatching. This commit added
update to coding context when the 2nd coefficients are cleared.
In addition, this commit also introduced early breakout in the checks
to speed up when coefficients are too significant to be cleared.
Change-Id: I85322a432b11e8af85001525d1e9dc218f9a0bd6
Removal of configure #ifdefs so that segment features
always available. Removal of code supporting old
segment feature method.
Still a good deal of tidying up to do.
Change-Id: I397855f086f8c09ab1fae0a5f65d9e06d2e3e39f
Changed 'int eob' to 'char *eob' in BLOCKD so that both encoder and
decoder will use eobs[25] array from MACROBLOCKD structure. In future,
this will enable use of the decoder side IDCT in the encoder.
Change-Id: I6e1c011628cb8864fd4a0b80f0279ce16a5ca978
Modify reference frame segmentation so that ONE or MORE
reference frames may be marked as a available for a given
segment.
Fixed bugs relating to segment coding of INTRA and some
INTER modes at the segment level.
Modified Q boost for static areas based on ambient average Q.
Strong results now on clips with significant static areas.
(some data points in derf set as high as 9% and some static &
slide show type content in YT set > 20%)
Change-Id: Ia79f912efa84b977f35a23683ae3643251e24f0c
Adding support for several partitions within one input fragment.
This is necessary to fully support all possible packetization
combinations in the VP8 RTP profile. Several partitions can
be transmitted in the same packet, and they can only be split
by reading the partition lengths from the bitstream.
Change-Id: If7d7ea331cc78cb7efd74c4a976b720c9a655463
In some situations (f.g. error-resilient is turned on), vp8cx_mb
_init_quantizer() was called once per macroblock. Added checks
to avoid calculations when there is no change.
Change-Id: Ie4f0a5ade2202041254990a4e9d5b03bd1ac5aea
The block of code skipped testing the current mode if the
reference frame is AltRef, the mv is not (0,0) and
ARNR filtering is disabled.
This block of code has already been tested above if the
macro CONFIG_SEGFEATURES is set to 0.
Change-Id: I3f5710bb8270caad06c9a0eee59fa0daf1f70776
The variable this_mode was being used before it had been
initialized.
Moved the line that sets-up this_mode toward the top of the
enclosing loop, prior to its first use. The bug would result in
tests in the loop lagging the mode that was expected to be
tested.
Change-Id: If4e51600449ce6b4285f112da17a44c24b4a19fb
Some correction for entropy impact of segment signaled (EOB and ref frame)
Other slight tweaks.
Derf VBR average gain now over 1% (best over 7%)
One YT test clip has gains of circa 30% (VBR)
There is still an issue with noisy clips where making the background static
and coded with 0,0 can have a negative effect, especially at low Q.
This is probably because of the loss of smoothing by fractional pixel filters.
Change-Id: I7a225613c98067b96f8fc7a7e36f95d465b2b834
It is discovered that in rare situations the 2nd order block may
produce a few small magnitude coefficients that has no effect on
reconstruction. The situations are a combination of low quantizer
values (high quality) and low energy in residual signals (content
dependent). This commit added code to detect such cases and reset
the 2nd order block to all 0.
Patch 1 to 4 used code to do all-zero-check on idct result buffer,
and tests on derf set showed a consistent gain of .12%-.14% on all
metrics.But due to a recent change Ie31d90b, the idct result buffer
is not longer populated. So patch 5&6 use an alternative method to
detect the situations. Tests on derf set now shows a consistent
quality gain of .16%-.20%.
As suggested by Jim, Patch 7&8 removed the condition of all first
order block not having any coefficient, instead we reset 2nd order
coefficients to all 0 if sum of absolute value of the coefficients
is small. So it does slightly more than just detecting the oddity
as discussed above, but tests on derf set now show a consistent
gain of .20%-.23% on all metrics.
It is worth noting here that this change does not have any effect
on mid/high quantizer range, it only affects the quantizer value
18 or blow. Within this range, the change helps compression by up
to 2.5% on clips in the derf set.
Change-Id: I718e19cf59a4fc2462cb7070832759beb9f7e7dd
Tests showed ~1.2% performance boost on the HD clip used.
Performance will vary based on material.
Change-Id: Icbcf1a828750d5b4ae5252bf596b3ef594042e8a
Interleaved vp8_find_near_mvs and vp8_mv_ref_probs.
2.5% to 4% performance improvement for the HD clips used.
Change-Id: Id888b667cf5ae2f0e19da18743140f055ff7de8d
The partial frame copy function used to copy an extra 8 lines above
and below. The partial frame filtering can only modify 3 pixel rows
above the partial frame. Reduce copy to bare minimum needed, which is
4 lines, so that partial filtering on copied frame is possible.
Define the "magic" fraction number for partial filtering in
loopfilter.h .
Change-Id: I4791ffc541b6884b12759a0d0714a8faf16147ec
Restructure if statement to clarify the error condition. Trigger the
error before clobbering pc-> variables.
Change-Id: Id01cab798a341ce9899078fdcec265a0e942a0b7
Global function pointers can not be defined in header files. Restructure
vpx_scale pointer configuration.
Change-Id: I6f568a263ad770d32f530abad6007f990fd1003a
Decode the mv mode with if-then-elses instead of traversing
the vp8_mv_ref_tree data structure. This will make it
easier to interleave vp8_find_near_mvs and vp8_mv_ref_probs.
Change-Id: I1e798d6ec40fcaeeff06ccc82f81201978d12f74
Prior to the added rounding, tests on randomly generated data showed
that forward-inverse transform round trip errors are about 3.02/block
for input range [-10,10] and 2.68/block for input range [-256, 255].
The added rounding reduced the errors to 0.031/block for input range
[-10,10] and 0.037/block for input range [-256, 255].
Maximum round trip error on for any pixel position is 1.
The average errors are calculated based on 100,000 blocks of randomly
with the specified ranges.
Paul mentioned in discussion that the change was not clear on why we
need change the rounding, so Patch 2 intends to make the rationale
obvious in code, it merged the two separate shifts into one, and the
two separate rounding factors into one. Patch 1 and 2 have same
numerical test results.
Change-Id: Ic5e2f5463de17253084d8b2398c4a210194b20de
Only encode sign bit for feature data that can have a sign.
Tweaks to the test segmentation rules so that it now actually gives
a net benefit on the derf set of about 0.4% though much higher
on some clips at the low end.
Change-Id: I8e61f1aebf41c9037db7e67e2f8975aa18a0c986
This quite large check in includes the following:
Merge in some code from Ronald (mbgraph.c) that scans a Gf/arf group.
This is used as a basis for a simple segmentation for the normal frames
in a gf/arf group. This code also uses satd functions from Yaowu.
Adds functionality for coding the latest possible position of an EOB for
blocks in the segment. (Currently 0-15 only, hence just for 4x4 dct).
Where the EOB position is 0 this acts like "skip" and the normal coding
of skip at the per mb level is disabled.
Added functions (seg_common.c) for setting and reading segment feature
elements. These may want to be optimized away at some point but while the
mecahnism is in a state of flux they provide a single location for making
changes and keep things a bit cleaner.
This is still proof of concept code. Currently the tested feature set:-
Quantizer,
Loop Filter level,
Reference frame,
Prediction Mode,
EOB end stop.
TBD:-
Add functions for setting and reading the feature data with range
and validity checking.
Handling of signed and unsigned feature data. At the moment all is assumed
to be signed and a sign bit is coded but many cannot be negative.
Correct handling of EOB feature with intra coded blocks.
Testing/trapping of legal/illegal ref frame and mode combinations.
Transform size switch plus merge and test with 8c8 DCT work
Merge and test with Sumans Segmenation coding optimizations
Change-Id: Iee12e83661c7abbd1e0ce6810915eb4ec35e2d8e
check to make sure that cx_data buffer has enough room before
writting to it, prior behavior did not which could result in a crash.
Change-Id: I3fab6f2bc4a96d7c675ea81acd39ece121738b28
During the _pick only the Y plane is examined. In addition, data beyond
the borders of the frame is not read.
Change-Id: Ic549adfca70fc6e0b55f8aab0efe81f0afac89f9
Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer. For blocks with eob=0,
unnecessary idcts can be eliminated. This gave a performance
boost of ~1.8% for the HD clips used.
Tero: Added needed changes to ARM side and scheduled some
assembly code to prevent interlocks.
Patch Set 6: Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.
Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6
NEON version of copyframeyonly, extendframeborders, copy_frame_func were
not working for plane stride < 128 and/or y_width < 128.
Change-Id: Id6c2e6c795274da0c90134b15c0d5f62d1b17a6c
It was crashing when number of partitions was bigger than the number
of MB rows (ex. 128x96 with 8 partitions).
Start point was not checked against mb_rows, plus extra
"empty" partitions were not written out.
Change-Id: I9c2f013b9ec022354b658fab4ef799ff8b1de93d
Added the ability to create rate-targeted, temporally
scalable, VP8 compatible bitstreams.
The application vp8_scalable_patterns.c demonstrates how
to use this capability. Users can create output bitstreams
containing upto 5 temporally separable streams encoded
as a single VP8 bitstream.
(previously abandoned as:
I92d1483e887adb274d07ce9e567e4d0314881b0a)
Change-Id: I156250a3fe930be57c069d508c41b6a7a4ea8d6a
buffer_level in VP8_COMP and starting_buffer_level, optimal_buffer_level
and maximum_buffer_size in VP8_CONFIG changed from int to int64_t
to avoid potential crash issues for larger target bit rates.
Change-Id: I0d5ab6c8a44c2fef51f30cd8df4bb4b739c5df26
Previous entropy probs need to be saved (and restored) only when
current updates are not propagated.
Change-Id: Ie6ee0543066e30874e56258be0a6b7d2dd2fdb2b
When 8x8 transform is enabled, the decoder does an extra reconstruct
on MBs that are coded using 8x8. This commit fixed the logic around
the decoding of mb encoded with 8x8 transform.
Change-Id: I6926557c9ef00eecb375f62946f7e140c660bf6f
Uninitialized data could be written to the first pass file when no
motion vectors are present in the frame.
Also fix a number of compiler warnings.
Change-Id: Icc9f53b6d33da9de4563d86d9fd591910473ea90
Proof of concept test code that encodes mode and reference
frame data at the segment level.
Decode-able bit stream but some issues not yet resolved.
As it this helps a little on a couple of clips but hurts on most as
the basis for segmentation is unsound.
To build and test, configure with
--enable-experimental --enable-segfeatures
Change-Id: I22a60774f69273523fb152db8c31f4b10b07c7f4
The data processed by the loopfilter overlaps. At the block level, this
results in some redundant transforms. Grouping the filtering allows for
a single 16x16 transpose (and inversion) instead of three 16x8 transposes
(and three more inversions).
This implementation is x86_64 only. We retain the previous
implementation for x86.
Improvements are obviously material dependant, but it seems to be ~%1 in
tests here.
Change-Id: I467b7ec3655be98fb5f1a94b5d145e5e5a660007
vp8_find_near_mvs() is being called on all possible reference frames
but the data computed may be used if the loop exits early, which can
be due to x->skip beign set to 1.
Optimize this by call vp8_find_near_mvs() laziy only if it is going
to be used and not computed yet.
Change-Id: Iccdbd4c962a670c9f2c99b8aca8096042ca5dc98
Changes to the selection of Q limits for two pass
and two pass CQ mode.
Allowance made for Mode and motion vector costs.
Some refactoring of common code.
For Derf and YT sets CQ mode average improvement
circa 1% (SSIM and Global PSNR).
Some increased tendency to undershoot even when
user CQ not reached.
Patch2: Removed some test code accidentally merged.
Change-Id: Icf74d13af77437c08602571dc7a97e747cce5066
'all' is the conventional target for building everything in the
makefile, but the child make was expecting all-$(target), for debugging
reasons that I don't recall exactly. Restore the expected behavior.
Change-Id: Ifbb03610b55be679ce7c5e210b7a69a156bb76b9
Sync with loopfilter thread just at the beginning of next frame encoding.
This returns control to application faster and allows a better multicore scaling.
When PSNR packets are generated the final filtered frame is needed imediatly
so we cannot delay the sync.
Change-Id: I288d97b5e331d41d6f5bb49d97986fa12ac6f066
To avoid a dependency on vpx_version.h, call the vpx_codec_version_str()
function and build up the string manually.
Change-Id: Ie650e9b8f2aaaffaa31da5e9ef3b566b972321b4
Make sure that this header is listed as one of the sources, so that it
will be installed if necessary.
Change-Id: I2427e494488126b179151dc21043c1e2c8ba5991
- Removed fast_fdct4x4_neon and fast_fdct8x4_neon
- Uses now short_fdct4x4 and short_fdct8x4
- Gives ~1-2% speed-up on Cortex-A8/A9
Change-Id: Ib62f2cb2080ae719f8fa1d518a3a5e71278a41ec
Rd and Rm registers should be different in 'mul'. This register
combination results in unpredictable behaviour. GCC will give
a warning and RVCT an error in this case.
Restriction applies only to armv5 targets and not for armv6 and above.
Change-Id: I378d17c51e1f16a6820814fbed43e115aaabb03e
These changes fixes a glitch between the RTP profile and the input
partitions interface. Since there's no way for the user to know the
actual number of partitions, the decoder have to read the
multi_token_paritition bits also when input partitions mode is
enabled.
Included are also a couple of fixes for issues with independent
partitions and uninitialized memory reads.
Change-Id: I6f93b15287d291169ed681898ed3fbcc5dc81837
- Updated walsh transform to match C
(based on Change Id24f3392)
- Changed fast_fdct4x4 and 8x4 to short_fdct4x4 and 8x4
correspondingly
Change-Id: I704e862f40e315b0a79997633c7bd9c347166a8e
Modified original patch If2f07220885c4c3a0cae0dace34ea0e36124f001
according to comments. Scheduled code a little bit to prevent some
interlocks.
Change-Id: I338f02b881098782f82af63d97f042b85e63e902
This commit added a 3 bit index to the bitstream, the index is used to
look into the intra mode coding entropy context table. The commit uses
the mode stats to calculate the cost of transmitting modes using 8
possible entropy distributions, and selects the distribution that
provides the lowest cost to do the actual mode coding.
Initial test show this provides additional .2%~.3% gain over quantizer
adaptive intra mode coding. So the adaptive intra mode coding provides
a total of .5%(psnr) to .6% gain(ssim) combined for all-key-encoding
To build and test, configure with
--enable-experimental --enable-qimode
Change-Id: I7c41cd8bfb352bc1fe7c5da1848a58faea5ed74a
make intra mode coding entropy distribution adaptive to baseQindex, an
encoding test on hd clips with all key frame shows universal gain on
all clips in both .2%(psnr) and (ssim).3%.
To build and test, configure with
--enable-experimental --enable-qimode
Change-Id: Iaa69241b984d4fdd8baa6d77ee78c0140f5ac00a
Patch 1 to Patch 3 is an initial implementation of 8x8 intra prediction
modes, here are with the following assumptions:
a. 8x8 has 4 prediction modes DC, H, V and TM
b. UV 4x4 block use the same mode as corresponding 8x8 area
c. i8x8 modes are enabled for key frame only for now
Patch 4:
d. removed debug code from previous patches
Patch 5:
e. added stats code to collect entropy stats and further cleaned up
Patch 6:
f. changed mode stats code to collect finer stats of modes
Patch 7:
g. normalized i8x8 modes distribution to total at 256 (8bits).
Patch 8:
h. fixed a bug in decoder and removed debug printf output.
Patch 9:
i. more cleanups to address paul's comment
Patch 10:
j. messy rebase/merges to bring the commit up to date.
Tests on HD clips encoded with all key frame showing consistent gain
on all clips and all metrics:~0.5%(psnr) and 0.6%(ssim):
http://www.corp.google.com/~yaowu/no_crawl/i8x8hd_allkey_fixedq.html
To build and test, configure with:
--enable-experimental --enable-i8x8
Change-Id: I9813fe07ae48cab5fdb5d904bca022514ad01e7f
In the "Removed bmi copy to/from BLOCKD" commit, the copy
to the bmi in BLOCKD was eliminated. The clamp_mvs() used
the bmi in BLOCKD, which now contains incorrect values. This
patch fixes this problem.
Change-Id: I8eca1eaf4015052b0b63e90876f7ad321aba7cff
Code all the features for one segment (grouped together)
then all for the next etc. etc. rather than grouping the
data by feature.
Change-Id: I2a65193b3a70aca78f92e855e35d8969d857b6dd
This data structure is now [Segment ID][Features]
rather than [Features][Segment_ID]
I propose as a separate modification to make the experimental
bit stream reflect this such that all the features for a segment
are coded together.
Change-Id: I581e4e3ca2033bdbdef3d9300977a8202f55b4fb
Some basic plumbing added for a range of segment level features.
MB_LVL_* changed to SEG_LVL_* to better reflect meaning.
Change-Id: Iac96da36990aa0e40afc0d86e990df337fd0c50b
vp8_update_zbin_extra() is called all the time even though the fast
quantizer doesn't use it. Skip this call if fast quantizer is used.
Change-Id: Ia711c38431930cc2486cf59b8466060ef0e9d9db
This change makes sure that no key frame recoding in real-time mode
even if CONFIG_REALTIME_ONLY is not configured.
Change-Id: Ifc34141f3217a6bb63cc087d78b111fadb35eec2
for SPLITMV and B_PRED modes. Modified code to use the bmi
found in mode_info_context instead of BLOCKD. On the decode
side, the uvmvs are calculated only when required, instead of
every macroblock. This is WIP. (bmi should eventually be
removed from BLOCKD)
Small performance gains noticed for RT encodes and decodes.(VGA)
Change-Id: I2ed7f0fd5ca733655df684aa82da575c77a973e7
Prepend idct function names with vp8_
so that under profiling they show up
associated with libvpx.
Change-Id: I4fe357b50236cb7730a4cc00164c0a3487a1d8b4
The data that the simple horizontal loopfilter reads is aligned, treat
it accordingly.
For the vertical, we only use the bottom 4 bytes, so don't read in 16
(and incur the penalty for unaligned access).
This shows a small improvement on older processors which have a
significant penalty for unaligned reads.
postproc_mmx.c is unused
Change-Id: I87b29bbc0c3b19ee1ca1de3c4f47332a53087b3d
Prepend . to local labels in assembly code. This
allows non unique labels within a file. Also
makes profiling information more informative
by keeping the function name with the loop name.
Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f
Calculations were incorrectly classified as either
SSE3 or SSSE3. Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.
Change-Id: I48ad0218af0cc51c5078070a08511dee43ecfe09
Calculations were incorrectly classified as either
SSE3 or SSSE3. Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.
Change-Id: I29f5c2ead342b2086a468029c15e2c1d948b5d97
When active map is specified and the current frame is not a key frame,
golden frame nor a altref frame then copy only those active regions.
This significantly reduces encoding time by as much as 19% on the test
system where realtime encoding is used. This is particularly useful
when the frame size is large (e.g. 2560x1600) and there's only a few
action macroblocks.
Change-Id: If394a813ec2df5a0201745d1348dbde4278f7ad4
Instead of a single mid GF boost apply a few extra bits to
every other frame. This gives a very small average metrics
improvement on both derf and YT sets.
Also use min GF interval as min KF interval.
Change-Id: Iee238b8cae0ffaed850a5a944ac825cee18da485
Since the block will be interpreted as an inter block, the mode will
be interpreted as a motion vector, resulting in bad concealment.
Change-Id: Ifcc685ae1cc883492bce6dbd61e418d91a89b053
Since the block will be interpreted as an inter block, the mode will
be interpreted as a motion vector, resulting in bad concealment.
Change-Id: Ifcc685ae1cc883492bce6dbd61e418d91a89b053
To get a list of files that the libvpx library depends on in the current
configuration, run:
$ make target=libs libvpx_srcs.txt
Change-Id: I68a69648ecf212f0fe29c325297728ac2a9393d9
This reverts commit b5ea2fbc2c. Further
testing showed noticable keyframe popping in some cases, reverting this
for now to give time for a proper fix.
Conflicts:
vp8/encoder/onyx_if.c
vp8/encoder/ratectrl.c
Change-Id: I159f53d1bf0e24c035754ab3ded8ccfd58fd04af
EC expects the subblock MVs to be populated, but
f1d6cc79e4 removed this code. This
commit restores it, protected by CONFIG_ERROR_CONCEALMENT. May move this
to the EC code more directly in the future.
Change-Id: I44f8f985720cb9a1bf222e59143f9e69abf56ad2
When error concealment is enabled the first key frame must
be successfully received before error concealment is activated.
Error concealment will be activated when the delta following
delta frame is received.
Also fixed a couple of bugs related to error tracking in
multi-threading. And avoiding decoding corrupt residual
when we have multiple non-resilient partitions.
Change-Id: I45c4bb296e2f05f57624aef500a874faf431a60d
This patch fixes an OOB read when error concealment is enabled and the
partition sizes are corrupt. The partition size read from the bitstream
was not being validated in EC mode.
Change-Id: Ia81dfd4bce1ab29ee78e42320abe52cee8318974
EC expects the subblock MVs to be populated, but
f1d6cc79e4 removed this code. This
commit restores it, protected by CONFIG_ERROR_CONCEALMENT. May move this
to the EC code more directly in the future.
Change-Id: I44f8f985720cb9a1bf222e59143f9e69abf56ad2
When error concealment is enabled the first key frame must
be successfully received before error concealment is activated.
Error concealment will be activated when the delta following
delta frame is received.
Also fixed a couple of bugs related to error tracking in
multi-threading. And avoiding decoding corrupt residual
when we have multiple non-resilient partitions.
Change-Id: I45c4bb296e2f05f57624aef500a874faf431a60d
This patch fixes an OOB read when error concealment is enabled and the
partition sizes are corrupt. The partition size read from the bitstream
was not being validated in EC mode.
Change-Id: Ia81dfd4bce1ab29ee78e42320abe52cee8318974
Corrected the merge direction this time, so that running
`git describe` on the master branch finds v0.9.7 as the most recent
tag.
Change-Id: I9e7b5d473c26e670c6d9a76f5c03fa617690651d
This patch fixes a bug in the interaction between the recode loop and
spatial resampling. If the codec was in a spatial resampling state,
and a subsequent iteration of the recode loop disables resampling,
then the source buffer must be reset to the unscaled source.
Change-Id: I4e4cd47b943f6cd26a47449dc7f4255b38e27c77
Changed motion search in vp8_find_best_half_pixel_step() to be the
same as in vp8_find_best_sub_pixel_step(), which checks 5 points
instead of 8 points. This only affects real-time mode with
cpu-used >=9. Tests showed it gives 2% encoding speedup with
a quality loss(psnr) of up to 0.5%.
Change-Id: I16049cad1535002346d46cfdfad345bfc3dc5146
The static libs should not be built from sources during the top level
of a universal build. This regression was introduced in commit
495b241fa6, which made the static
libs selectable under CONFIG_STATIC.
Change-Id: I585167e17459877e0fa7fa19e1046c3703d91c97
the neon code made several assumptions which were broken by a recent
change: https://review.webmproject.org/2676
update the code with new assumptions and guard them with a compile time
assert
Change-Id: I32a8378030759966068f34618d7b4b1b02e101a0
Since this is the only ABI incompatible change since the last release,
convert it to use the control interface instead. The member of the
configuration struct is replaced with the VP8E_SET_MAX_INTRA_BITRATE_PCT
control.
More significant API changes were expected to be forthcoming when this
control was first introduced, and while they continue to be expected,
it's not worth breaking compatibility for only this change.
Change-Id: I799d8dbe24c8bc9c241e0b7743b2b64f81327d59
This change implemented same idea in change "Preload reference area
to an intermediate buffer in sub-pixel motion search." The changes
were made to vp8_find_best_sub_pixel_step() and vp8_find_best_half
_pixel_step() functions which are called when speed >= 5. Test
result (using tulip clip):
1. On Core2 Quad machine(Linux)
rt mode, speed (-5 ~ -8), encoding speed gain: 2% ~ 3%
rt mode, speed (-9 ~ -11), encoding speed gain: 1% ~ 2%
rt mode, speed (-12 ~ -14), no noticeable encoding speed gain
2. On Xeon machine(Linux)
Test on speed (-5 ~ -14) didn't show noticeable speed change.
Change-Id: I21bec2d6e7fbe541fcc0f4c0366bbdf3e2076aa2
There were some situations that the start motion vectors were
out of range. This fix adjusted range checks to make sure they
are checked and clamped.
Change-Id: Ife83b7fed0882bba6d1fa559b6e63c054fd5065d
sharpness was not recalculated in vp8cx_pick_filter_level_fast
remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.
always use cm->sharpness_level. the extra indirection was annoying.
don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init
move function declarations to their proper header
Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db
In sub-pixel motion search, the search range is small(+/- 3 pixels).
Preload whole search area from reference buffer into a 32-byte
aligned buffer. Then in search, load reference data from this buffer
instead. This keeps data in cache, and reduces the crossing cache-
line penalty. For tulip clip, tests on Intel Core2 Quad machine(linux)
showed encoder speed improvement:
3.4% at --rt --cpu-used =-4
2.8% at --rt --cpu-used =-3
2.3% at --rt --cpu-used =-2
2.2% at --rt --cpu-used =-1
Test on Atom notebook showed only 1.1% speed improvement(speed=-4).
Test on Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.
Next, I will apply similar idea to other 2 sub-pixel search functions
for encoding speed > 4.
Make this change exclusively for x86 platforms.
Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
This adds the magic .note.GNU-stack section at the end of each ARM
asm file (when built with gas), indicating that a non-executable
stack is allowed.
Without this section, the linker will assume the object requires an
executable stack by default, forcing an executable stack for the
entire program.
Change-Id: Ie86de6a449b52d392b9e5e0479833ed8c508ee65
With this fix, the experimental branch now builds and encodes correctly
with the following two configure options respectively:
--enable-experimental --enable-t8x8
--enable-experimental
Change-Id: I3147c33c503fe713a85fd371e4f1a974805778bf
The auto merge process pull and merge commits from public git or master
branch. These automerges while worked well most time, but has created
a few problems. This commit fixed several issues existed long before
the latest 8x8 transform commit.
Change-Id: I895ca99713231b1aec521d57db5d9839f74aacfa
This is done by expanding luma row to 32-byte alignment, since
there is currently a bunch of code that assumes that
uv_stride == y_stride/2 (see, for example, vp8/common/postproc.c,
common/reconinter.c, common/arm/neon/recon16x16mb_neon.asm,
encoder/temporal_filter.c, and possibly others; I haven't done a
full audit).
It also uses replaces the hardcoded border of 16 in a number of
encoder buffers with VP8BORDERINPIXELS (currently 32), as the
chroma rows start at an offset of border/2.
Together, these two changes have the nice advantage that simply
dumping the frame memory as a contiguous blob produces a valid,
if padded, image.
Change-Id: Iaf5ea722ae5c82d5daa50f6e2dade9de753f1003
This reverts commit b73a3693e5.
This version of the check doesn't work with generic-gnu, and figuring
out the correct symbol version at configure time is probably more work
than this is worth. May revisit in the future.
Change-Id: I6c75e88bd3bd82a4b21e09a25780fe53aacb7d70
allowing the compiler to inline this function. For real-time
encodes, this gave a boost of 1% to 2.5%, depending on the
speed setting.
Change-Id: I3929d176cca086b4261267b848419d5bcff21c02
This patch attempts to improve the handling of CBR streams with
respect to the short term buffering requirements. The "buffer level"
is changed to be an average over the rc buffer, rather than a long
running average. Overshoot is also tracked over the same interval
and the golden frame targets suppressed accordingly to correct for
overly aggressive boosting.
Testing shows that this is fairly consistently positive in one
metric or another -- some clips that show significant decreases
in quality have better buffering characteristics, others show
improvenents in both.
Change-Id: I924c89aa9bdb210271f2e03311e63de3f1f8f920
Using small values for --buf-sz= in command line causes
floating point exception due to division by zero.
Change-Id: Ibfe2d44db922993a78ebc9a4a1087d9625de48ae
Optimized C-code of the following functions:
- vp8_tokenize_mb
- tokenize1st_order_b
- tokenize2nd_order_b
Gives ~1-5% speed-up for RT encoding on Cortex-A8/A9
depending on encoding parameters.
Change-Id: I6be86104a589a06dcbc9ed3318e8bf264ef4176c
glibc implements some checking on longjmp() calls by replacing it with
an internal function __longjmp_chk(), when FORTIFY_SOURCE is defined.
This can be problematic when compiling the library under one version of
glibc and running it under another. Work around this issue for the one
symbol affected for now, before taking out the undef hammer.
Fixes http://code.google.com/p/webm/issues/detail?id=166
Change-Id: Ifc5e25cdec17915e394711f2185b3e9214572d10
Previously allocated more memory than necessary for yuv buffers.
This makes it harder to track bugs with reading uninitialized
data.
Change-Id: I510f7b298d3c647c869be6e5d51608becc63cce9
As reported in issue 331, vpxenc encoded incorrect webm file header
on big endian machines. This change fixed that.
Change-Id: I31924ebd476a87f3e88b9b5424540bf781d2b86f
Do mvp clamping in full-pixel precision instead of 1/8-pixel
precision to avoid error caused by right shifting operation.
Also, further fixed the motion vector limit calculation in change:
b748045470
Change-Id: Ied88a4f7ddfb0476eb9f7afc6ceeddbf209fffd7
Separate simple filter with reduced no. of parameters.
MB filter level picking based on precalculated table. Level table updated for
each frame. Inside and edge limits precalculated and updated just when
sharpness changes. HEV threshhold is constant.
ARM targets use scalars and others vectors.
Change works only with --target=generic-gnu
All other targets have to be updated!
Change-Id: I6b73aca6b525075b20129a371699b2561bd4d51c
Allow the encoder to inform the application that the encoded frame will not
be used as a reference.
Change-Id: I90e41962325ef73d44da03327deb340d6f7f4860
Motion vector limits are calculated using right shifts, which
could give wrong results for negative numbers. James Berry's
test on one clip showed encoder produced some artifacts. This
change fixed that.
Change-Id: I035fc02280b10455b7f6eb388f7c2e33b796b018
In this commit I have added an experimental function
that tests prediction quality either side of a central position
to calculate a suggested boost number for an ARF frame.
The function is passed an offset from the current position and
a number of frames to search forwards and backwards.
It returns a forward, backward and compound boost number.
The new code can be deactivated using #define NEW_BOOST 0
In its current default state the code searches forwards and backwards
from the proposed position of the next alt ref.
The the old code used a boost number calculated by scanning forward
from the previous GF up to the proposed alt ref frame position.
I have also added some code to try and prevent placement of a gf/arf
where there is a brief flash.
Change-Id: I98af789a5181148659f10dd5dd2ff2d4250cd51c
For clips that are near 60fps and have a lot of alt refs, it's possible
that the ring buffer holding the previous frames sizes/pts could wrap
around, leading to a division by zero.
In addition to checking for this condition in the ring buffer loop,
the buffer size is made dependent on the actual frame rate in use,
rather than defaulting to 60, which should improve accuracy at frame
rates >= ~60.
Change-Id: If5a04d6e847316dc5f7504f25c01164cf9332be8
There were many instances in the code of vp8_coef_tokens and
vp8_coef_tokens-1, which was a preprocessor macro despite the naming
convention. Replace these with MAX_ENTROPY_TOKENS and ENTROPY_NODES,
respectively.
Change-Id: I72c4f6c7634c94e1fa066cd511471e5592c748da
With this commit frames can be received partition-by-partition
from the encoder and passed partition-by-partition to the
decoder.
At the encoder-side this makes it easier to split encoded
frames at partition boundaries, useful when packetizing
frames. When VPX_CODEC_USE_OUTPUT_PARTITION is enabled,
several VPX_CODEC_CX_FRAME_PKT packets will be returned
from vpx_codec_get_cx_data(), containing one partition
each. The partition_id (starting at 0) specifies the decoding
order of the partitions. All partitions but the last has
the VPX_FRAME_IS_FRAGMENT flag set.
At the decoder this opens up the possibility of decoding partition
N even though partition N-1 was lost (given that independent
partitioning has been enabled in the encoder) if more info
about the missing parts of the stream is available through
external signaling.
Each partition is passed to the decoder through the
vpx_codec_decode() function, with the data pointer pointing
to the start of the partition, and with data_sz equal to the
size of the partition. Missing partitions can be signaled to
the decoder by setting data != NULL and data_sz = 0. When
all partitions have been given to the decoder "end of data"
should be signaled by calling vpx_codec_decode() with
data = NULL and data_sz = 0.
The first partition is the first partition according to the
VP8 bitstream + the uncompressed data chunk + DCT address
offsets if multiple residual partitions are used.
Change-Id: I5bc0682b9e4112e0db77904755c694c3c7ac6e74
Adding capabilities with which the encoder can output frames
partition by partition, and the decoder can get input data
partition by partition.
Change-Id: Ieae0801480b8de8cd43c3c57dd3bab2e4c346fe0
Adding support in the encoder for generating
independent residual partitions by forcing
equal probabilities over the prev coef entropy
contexts.
Change-Id: I402f5c353255f3ca20eae2620af739f6a498cd21
The current code stores pointers to coefficient tables and loads them to
access the tables contents. As these pointers are stored in the code
sections, it means we end up with text relocations. eu-findtextrel will
thus complain about code not compiled with -fpic/-fPIC.
Since the pointers are stored in the code sections, we can actually cheat
and let the assembler generate relative addressing when accessing the
coefficient tables, and just load their location with adr.
Change-Id: Ib74ae2d3f2bab80b29991355f2dbe6955f38f6ae
Also includes a couple of error concealment bug fixes:
- the segment_id wasn't properly initialized when missing
- when interpolating and no neighbors are found, set to zero
- clear the qcoef buffer when concealing an MB
Change-Id: Id79c876b41d78b559a2241e9cd0fd2cae6198f49
Only the first frame buffer ref counter was being initialized
because the index was fixed at 0 rather than using i.
Change-Id: Ib842298be4a5e3607f9e21c2cd4bfbee4054ffc4
I got this idea from Pascal (Thanks). Before encoding a macroblock,
copy it to a 16x16 buffer, and then read source data from there
instead. This will help keep the source data in cache, and help
with the performance.
Change-Id: Id05f4cb601299150511d59dcba0ae62c49b5b757
experimental branch build was broken from some merge artifacts, this
commit fixes those issues to enable the experimental branch to build.
Change-Id: Ic52b2d2f1d1b80abb7ecaa4c0927bcf887ac0c2a
This reverts commit 212f618373.
Further testing shows that the overshoot accumulation/damping is too
aggressive on some clips. Allowing the accumulated overshoot to
decay and limiting to damping to golden frames shows some promise.
But some clips show significant overshoot in the buffer window, so
I think this still needs work.
Change-Id: Ic02a9ca34f55229f9cc04786f4fab54cdc1a3ef5
Add the --rate-hist=n option, which displays a histogram with n
buckets for the rate over the --buf-sz window.
Change-Id: I2807b5a1525c7972e9ba40839b37e92b23ceffaf
Add the --q-hist=n option, which displays a histogram with n buckets for
the quantizer selected on each frame.
Change-Id: I59b020c26b0acae0b938685081d9932bd98df5c9
vp8_yv12_copy_frame_ptr() expects same size
buffers which was not previously gaurenteed.
Using an improperly allocated buffer would
cause a crash before.
Change-Id: I904982313ce9352474f80de842013dcd89f48685
RDMULT/RDDIV defines a bit worth of distortion in term of sum squared
difference. This has also been used as errorperbit in subpixel motion
search, where the distortions computed as variance of the difference.
The variance of differences is different from sum squared differences
by amount of DC squared. Typically, for inter predicted MBs, this
difference averages around 10% between the two distortion, so this patch
introduces a 110% constant in deriving errorperbit from RDMULT/RDDIV.
Test on CIF set shows small but positive gain on overall PSNR (.03%)
and SSIM (.07%), overall impact on average PSNR is 0.
Change-Id: I95425f922d037b4d96083064a10c7cdd4948ee62
Relocated the vp8dx_bool_decoder_fill() call, allowing
the compiler to produce better assembly code. Tests
showed a 1 - 2 % performance boost (x86 using gcc)
for the 720p clip used.
Change-Id: Ic5a4eefed8777e6eefa007d4f12dfc7e64482732
The starting points are always within the limits, and bounds
checking on these points is not needed. For speed < 5, the
encoded result changes a little because different treatment
is taken while starting point equals the bounds.
Change-Id: I09a402d310f51e305a3519f1601b1d17b05c6152
Modify the second-pass code to provide a full golden-frame (GF) bit
allocation boost if the past GF group (GFG) had no alt-ref frame (ARF),
even if the current GFG does contain and ARF.
This mostly has no effect on clips, since switching ARFs on/off between
GFGs is not very common. Has a positive effect on e.g. cheer (+0.45 SSIM
at 600kbps) and football (+0.25 SSIM at 600kbps), particularly at high
bitrates. Has a negative effect (-0.04 SSIM at 300kbps) at pamphlet,
which appears only marginally related to this patch, and crew (-0.1 SSIM
at 700kbps).
Change-Id: I2e32899638b59f857e26efeac18a82e0c0b77089
Replace =1 with =true for yasm tool element. This aids in upgrading
e.g., vs9 project files to vs10.
build/x86-msvs/yasm.xml generated during conversion will require the
Separator attribute to be removed for the build to complete
successfully.
Change-Id: If75c4f9a925529740048882003e9d766c5ac4f0c
The BPRED mode selection uses SSE as a distortion metric, but the early
breakout threshold being used was a variance value.
Change-Id: I42d4602fb9b548bf681a36445701fada5e73aff1
firstpass.c contains some rate adjustment code that assures that the
last few frames in a sequence abide by rate limits. If the second-to-
last group of frames contains an alt-ref frame (ARF), the last golden
frame (GF) is zero bytes, and we will thus spend a ridiculously high
number of bits on regular P-frames trying to hit the target rate. This
does slightly enhance the quality of these last few frames, but has
no perceptual value (other than hitting the target rate).
Disabling this code means we consistently (slightly) undershoot the
target rate and consequently do worse on the last few frames of a
clip, which is particularly noticeable for small clips. The quality-
per-bitrate is generally better, ~0.2% better overall on derf-set,
especially on clips such as garden, tennis, foreman at low bitrates.
Has a negative effect on hallmonitor at high bitrates.
Change-Id: I1d63452fef5fee4a0ad2fb2e9af4c9f2e0d86d23
Moved encode_intra function from firstpass.c to encodeintra.c to
prevent linking problem in real-time only build. Also changed name
of the function to vp8_encode_intra because it is not a static.
Change-Id: Ibf3c6c1de3152567347e5fbef47d1d39564620a5
- Updated -linux-rvct targets to support RVDS 4.0 and later.
- Changed optimization flag to -Otime because -O3 ruined performance
for RVCT linux targets.
- Added support for --enable-small for RVCT
- RVCT created library should be able to link with GCC
- Supports building shared linux libraries
Change-Id: Ic62589950d86c3420fd4d908b8efb870806d1233
If setup_token_decoder reported an internal error the memory allocated
there would not be freed in the resulting call to _remove_decompressor.
Change-Id: Ib459de222d76b1910d6f449cdcd01663447dbdf6
Small decode performance gain (~1%) on keyframes. No
noticeable gains on encode. Also changed pick_intra4x4mby_modes()
to read the above and left block modes for keyframes only.
Change-Id: I1f4885252f5b3e9caf04d4e01e643960f910aba5
Automatically created assembly offset files added to CLEAN-OBJS list
for proper cleanup. This will fix following build error:
1) Build for the workstation
./conigure
make
make clean
2) Build for ARM platform
./configure --target=armv7-linux-gcc
make ==> this will fail because it uses old asm_*_offset.asm files
Change-Id: Id5275c470390ca81b8db086a15ad75af39b80703
Since BPRED will be tested at most once, and SPLITMV is not enabled,
there's nothing to clobber the subblock modes, so there's no need to
save and restore them.
Change-Id: I7c3615b69190c10bd068a44df5488d6e8b85a364
In activity masking, RDO constant RDMULT is adjusted on a per MB basis
adaptive to activity with the MB. errorperbit, which is defined as
RDMULT/RDDIV, is a constant used in motion estimation. Previously, in
activity masking, errorperbit is not changed even when RDMULT is changed.
This commit changed to adjust errorperbit according to the change in
RDMULT.
Test in cif set showed a very small but consistent gain by all quality
metrics (average, overall psnr and ssim) when activity masking is on.
Change-Id: I07ded3e852919ab76757691939fe435328273823
This change is analogous to I0b67dae1f8a74902378da7bdf565e39ab832dda7,
which made the move for the non-RD path.
Change-Id: If63fc1b0cd1eb7f932e710f83ff24d91454f8ed1
This commit moves the intra block mode selection from encodeframe.c
to pickinter.c (in the non-RD case). This allowed pick_intra_mbuv_mode
and pick_intra4x4mby_modes to be made static, and is a step towards
refactoring intra mode selection in the main pickinter loop. Gave a
small perf increase (~0.5%).
Change-Id: I0b67dae1f8a74902378da7bdf565e39ab832dda7
Some further re-structuring of activity masking code.
Still has various experimental switches.
Supports a metric based on intra encode.
Experimental comparison against a fixed activity target rather
than a frame average, for altering rd and zbin.
Overall the SSIM performance is similar to TT's original
code but there is a much smaller PSNR hit of circa
0.5% instead of 3.2%
Change-Id: I0fd53b2dfb60620b3f74d7415e0b81c1ac58c39a
While investigating the effect of DC values on SAD and SSE in motion
estimation, a side finding indicates the two table of constants need
be adjusted. The adjustment was done by multiplying old constants by
90% with rounding. Also absorb the 1/2 scaling constant into the two
tables. Refer to change Ifa285c3e for background of the 1/2 factor.
Cif set test showed a very small gain on all metric.
Change-Id: I04333527a823371175dd46cb04a817e5b9a8b752
The encoder defined about 4 set of similar functions to calculate sum,
variance or sse or a combination of them. This commit removed one set
of these functions, get8x8var and get16x16var, where calls to the later
function are replaced with var16x16 by using the fact on a 16x16 MB:
variance == sse - sum*sum/256
Change-Id: I803eabd1fb3ab177780a40338cbd596dffaed267
In real-time mode motion search, there is no need to calculate
variance. This change improved encoding speed by 1% ~ 2%(speed=-5).
Change-Id: I65b874901eb599ac38fe8cf9cad898c14138d431
This patch attempts to reduce the peak bitrate hit by the encoder
when using small buffer windows.
Tested on the CIF set over 200-500kbps using these settings:
--buf-sz=500 --buf-initial-sz=250 --buf-optimal-sz=250 \
--undershoot-pct=100
Two pass encodes were tested at best quality. One pass encodes were
tested only at realtime speed 4:
--rt --cpu-used=-4
The peak datarate (over the specified 500ms window) was measured
for each encode, and averaged together to get metric for
"average peak," computed as SUM(peak)/SUM(target). This patch
reduces the average peak datarate as follows:
One pass:
baseline: 1.29715
this patch: 1.23664
Two pass:
baseline: 1.32702
this patch: 1.37824
This change had a positive effect on our quality metrics as well:
One pass CBR:
Min / Mean / Max (pct)
Average PSNR -0.42 / 2.86 / 27.32
Overall PSNR -0.90 / 2.00 / 17.27
SSIM -0.05 / 3.95 / 37.46
Two pass CBR:
Min / Mean / Max (pct)
Average PSNR -4.47 / 4.35 / 35.99
Overall PSNR -3.40 / 4.18 / 36.46
SSIM -4.56 / 6.98 / 53.67
One pass VBR:
Min / Mean / Max (pct)
Average PSNR -5.21 / 0.01 / 3.30
Overall PSNR -8.10 / -0.38 / 1.21
SSIM -7.38 / -0.11 / 3.17
(note: most values here were close to the mean, there were a few
outliers on files that were very sensitive to golden frame size)
Two pass VBR:
Min / Mean / Max (pct)
Average PSNR 0.00 / 0.00 / 0.00
Overall PSNR 0.00 / 0.00 / 0.00
SSIM 0.00 / 0.00 / 0.00
Neither one pass or two pass CBR mode adheres particularly strictly
to the short term buffer constraints, and two pass is less
consistent, even in the baseline commit. This should be addressed
in a later commit. This likely will hurt the quality numbers, as it
will have to reduce the burstiness of golden frames.
Aside: My work on this commit makes it clear that we need to make
rate control modes "pluggable", where you can easily write a new
one or work on one in isolation.
Change-Id: I1ea9a48f2beedd59891f1288aabf7064956b4716
Currently, hex search couldn't guarantee the motion vector(MV)
found is within the limit of maximum MV. Therefore, very large
motion vectors resulted from big motion in the video could cause
encoding artifacts. This change adjusted hex search bounds
checking to make sure the resulted motion vector won't go out
of the range. James Berry, thank you for finding the bug.
Change-Id: If2c55edd9019e72444ad9b4b8688969eef610c55
Declared the bmi in BLOCKD as a union instead of B_MODE_INFO.
Then removed B_MODE_INFO completely.
Change-Id: Ieb7469899e265892c66f7aeac87b7f2bf38e7a67
This is basically a slightly modified version of the previous patch,
and it has a moderately positive effect (SSIM/PSNR both +0.08% avg
on derf-set). Most clips show no change, except waterfall/coastguard,
each ~ +0.8% SSIM/PSNR. You can see similar effects in other clips
by shortening their length to terminate at a very short last group
of frames.
Change-Id: I7a70de99ca1f9fe6a8b6ca7a6e30e8a4b64383e4
this commit makes the usage errorperbit and sadperbit consistent for
encoding modes and passes. Removed all different magic weight factors
associated with errorperbit. Now 1/2 is used for both sadperbit16 and
sadperbit4, the /2 operation is merged into initializations of the 2
variables.
Tests on cif set show .23%, 0.18% and 0.19% gain by avg psnr, overall
psnr and ssim respectively.
Change-Id: Ifa285c3e065ce0a5a77addfc9f95aabf54ee270d
The fb_idx_ref_cnt book-keeping was in error. Added an assert to
prevent future errors in the reference count vector. Also fixed a
pointer syntax error.
Change-Id: I563081090c78702d82199e407df4ecc93da6f349
sad_per_bit has been used for a number of motion vector search routines
with different magic weights: 1, 1/2 and 1/4. This commit remove these
magic numbers and use 1/2 for all motion search routines, also reformat
a number of source code lines to within 80 column limit.
Test on cif set shows overall effect is neutral on all metrics. <=0.01%
Change-Id: I8a382821fa4cffc9c0acf8e8431435a03df74885
vp8_fast_quantize_b_pair_neon function added to quantize
two adjacent blocks at the same time to improve performance.
- Additional 3-6% speedup compared to neon optimized fast
quantizer (Tanya VGA@30fps, 1Mbps stream, cpu-used=-5..-16)
Change-Id: I3fcbf141e5d05e9118c38ca37310458afbabaa4e
Misplaced #endif caused first_time_stamp_ever to only be initialized if
CONFIG_INTERNAL_STATS was set.
Change-Id: I2296a4ab00f7dfb767583edcc5d59b94f48c0621
Added preload instructions to armv6 encoder optimizations.
About 5% average speed-up on Tegra2 for VGA@30fps sequence.
Change-Id: I41d74737720fb71ce7a316f07555357822f3347e
in onyx_if.c update_reference_frames() make
sure that frame buffer indexes are not equal
before preforming a buffer copy. If two frames
share the same buffer the flags will already be
set correctly.
Change-Id: Ida9b5516d08e3435c90f131d2dc19d842cfb536e
Test showed using hex search in realtime mode largely speed up
encoding process, and still achieves similar quality like the
diamond search we have. Therefore, removed the diamond search
option.
Change-Id: I975767d0ec0539f9f6ed7fdfc09506e39761b66c
error_per_bit and sad_per_bit were designed as estimates of a bit worth
of sum squared error and sum absolute difference respectively. Under
this assumption, error_per_bit should be used in combination with 2nd
order errors (variance or sum squared error) while sad_per_bit should
be used in combination with 1st order SADs in motion estimation. There
were a few places where sad_per_bit has been misused with variances,
this commit changes to use error_per_bit for those places, also changes
parameter names to properly indicate which constant is being used.
On cif set, the change has a universal gain by all metrics: 0.13% by
average/overall psnr and 0.1% by ssim.
Change-Id: I4850fdcc3fd6886b30f784bd843f13dd401215fb
While profile=3, there is no sub-pixel search. Distortion and SSE
have to calculated using get_inter_mbpred_error().
Change-Id: Ifb36e17eef7750af93efa7d0e2870142ef540184
Declared the bmi in MODE_INFO as a union instead of B_MODE_INFO.
This reduced the memory footprint by 518,400 bytes for 1080
resolutions. The decoder performance improved by ~4% for the
clip used and the encoder showed very small improvements. (0.5%)
This reduction was first mentioned to me by John K. and in a
later discussion by Yaowu.
This is WIP.
Change-Id: I8e175fdbc46d28c35277302a04bee4540efc8d29
fixed a bug where active_worst_quality could be set
below active_best_quality which could result in an
infinite loop.
Change-Id: I93c229c3bc5bff2a82b4c33f41f8acf4dd194039
This patch collects the twopass specific memebers of VP8_COMP into a
dedicated struct. This is a first step towards isolating the two pass
rate control and aids readability by decorating these variables with
the 'twopass.' namespace. This makes it clear to the reader in what
contexts the variable will be valid, and is a hint that a section of
code might be a good candidate to move to firstpass.c in later
refactoring. There likely will be other rate control modes that need
their own specific data as well.
This notation is probably overly verbose in firstpass.c, so an
alternative would be to access this struct through a pointer like
'rc->' instead of 'cpi->firstpass.' in that file. Feel free to make
a review comment to that effect if you prefer.
Change-Id: I0ab8254647cb4b493a77c16b5d236d0d4a94ca4d
The partition_info struct contains info just for SPLITMV,
so it should be used instead of BLOCKD. Eventually, I want
to reduce the size of B_MODE_INFO struct found in BLOCKD, so
this is the first step toward that goal.
Also, since SPLITMV is not supported in vp8_pick_inter_mode(),
the unnecessary mem copies and checks were removed. For rt
encodes, this gave a slight performance improvement.
Change-Id: I5585c98fa9d5acbde1c7e0f452a01d9ecc080574
The error-concealer is plugged in after any motion vectors have been
decoded. It tries to estimate any missing motion vectors from the
motion vectors of the previous frame. Intra blocks with missing
residual are replaced with inter blocks with estimated motion vectors.
This feature was developed in a separate sandbox
(sandbox/holmer/error-concealment).
Change-Id: I5c8917b031078d79dbafd90f6006680e84a23412
rvct 4.1 was complaining about vstmia.16, store multiple expects 64 data type.
optimized the implementation.
Change-Id: I0701052cabd685c375637bbc3796ff6d88f5972c
This commit restructures the mb activity masking code
to better facilitate experimentation using different metrics
etc. and also allows for adjustment of the zero bin either
for encode only or both the encode and mode selection
stages
It also uses information from the current frame rather than
the previous frame and the default strength has been
reduced.
Change-Id: Id39b19eace37574dc429f25aae810c203709629b
This patch improves the accuracy of frame rate estimation by using a
larger, 1 second window. It also more quickly adapts to step changes
in the input frame rate (ie 30fps to 15fps)
Change-Id: I39e48a8f5ac880b4c4b2ebd81049259b81a0218e
The compiler produces better assembly when using int_mv
for assignments. The compiler shifts and ors the two 16bit
values when assigning MV.
Change-Id: I52ce4bc2bfbfaf3f1151204b2f21e1e0654f960f
Further modification and wrong implementation fix which caused
refining_search and refining_searchx4 result mismatching.
Change-Id: I80cb3a44bf5824413fd50c972e383eebb75f9b6f
This is to reflect the RD improvement in the encoder. The change has a
small positive impact on quality (0.25% by VPXSSIM and 0.05% by PSNR)
Change-Id: Ic66ffc19b10870645088c0624c85556f009fd210
The variable is introduced in commit 2e53e9e53 to make more use of
trellis quantization, but this is no longer necessary after RDMULT
was made adaptive in a number of later commits.
Change-Id: I7420522ec7723f38cf77033466c25afb405d52ae
global values were being referenced, but the GOT was not being set up.
as the GOT is only required for PIC, this issue wasn't caught in the
default configuration.
Change-Id: I8006e53776139362a76f2c80cf9d0f8458602b2f
http://code.google.com/p/webm/issues/detail?id=328
In NEWMV mode, currently, full search is used as the refining search
after n-step search. By replacing it with an iterative diamond search
of radius 1 largely reduced the computation complexity, but still
maintained the same encoding quality since the refining search is
done for every macroblock instead of only a small precentage of
macroblocks while using full search.
Tests on the test set showed a 3.4% encoding speed increase with none
psnr & ssim loss.
Change-Id: Ife907d7eb9544d15c34f17dc6e4cfd97cb743d41
Paul pointed out that the pointer to the gf_active_flags is not being
properly incremented in multithreaded encoder. This commit fixes the
issue by making sure the gf_active_ptr points to the starting of next
group of mb rows.
Change-Id: I3246e657d23beabb614dfb880733a68a5fd7e34c
Commit db5057c introduced a bug in that the active_worst_quality
selected by the 2 pass rate controller was being overridden for key
frames, causing a severe quality loss.
Change-Id: I4865a6fbe3e94e9b4fb9271c7dd68b455d7b371d
VS2010 has included stdint.h, but not inttypes.h. Prefer the compiler's
version of these types. Fixes issue 327.
Change-Id: Ica71600e06b8e94e3bbb4f12988b4a9817d5e5e4
vp8_fast_quantize_b_neon function updated and further optimized.
- match current C implementation of fast quantizer
- updated to use asm_enc_offsets for structure members
- updated ads2gas scripts to handle alignment issues
Change-Id: I5cbad9c460ad8ddb35d2970a8684cc620711c56d
The existing emulation of posix semaphores on Windows uses SetEvent()
and WaitForSingleObject(), which implements a binary semaphore, not a
counting semaphore as implemented by posix. This causes deadlock when
used with the expected posix semantics. Instead, this patch uses the
CreateSemaphore() and ReleaseSemaphore() calls (introduced in Windows
2000) which have the expected behavior.
This patch also reverts commit eb16f00, which split a semaphore that
was being used with counting semantics into two binary semaphores.
That commit is unnecessary with corrected emulation.
Change-Id: If400771536a27af4b0c3a31aa4c4e9ced89ce6a0
This patch is to fix a rare hang in multi-thread encoder that was
only seen on Windows. Thanks for John's help in debugging the
problem. More test is needed.
Change-Id: Idb11c6d344c2082362a032b34c5a602a1eea62fc
Changed 8-neighbor searching to 4-neighour searching, and continued
searching until the center point is the best match.
Test on test set showed 1.3% encoding speed improvement as well as
0.1% PSNR and SSIM improvement at speed=-5 (rt mode).
Will continue to improve it.
Change-Id: If4993b1907dd742b906fd3f86fee77cc5932ee9a
The commit also removed the slow ssim calculation that uses a 7x7
kernel, and revised the comments to better describe how sample ssim
values are computed and averaged
Change-Id: I1d874073cddca00f3c997f4b9a9a3db0aa212276
Always use CFLAGS/LDFLAGS that point to headers and libvpx.a inside our
build tree before ones from the environment, which could reference
headers or libs outside the build tree.
This fixes issue 307.
Change-Id: I34d176b8c21098f6da5ea71f0147d3c49283cc45
Renamed configure option "enable-psnr" to "enable-internal-stats" to
better reflect the purpose of the option and eliminate the confusion
reported in http://code.google.com/p/webm/issues/detail?id=35
Change-Id: If72df6fdb9f1e33dab1329240ba4d8911d2f1f7a
Insertion sort performs better for sorting small arrays. In real-
time encoding (speed=-5), test on test set showed 1.7% performance
gain with 0% PSNR change in average.
Change-Id: Ie02eaa6fed662866a937299194c590d41b25bc3d
This change simply exercises the VP8D_GET_FRAME_CORRUPTED control,
outputting a warning message at the end if the bit was set for any
frames. Should never produce any output for good input.
Change-Id: Idaf6ba8f53660f47763cd563fa1485938580a37d
Allow more reliable detection of truncated bitstreams by being more
precise with the count of "virtual" bits in the value buffer.
Specifically, the VP8_LOTS_OF_BITS value is accumulated into count,
rather than being assigned, which was losing the prior value,
increasing the required tolerance when testing for the error condition.
Change-Id: Ib5172eaa57323b939c439fff8a8ab5fa38da9b69
Combine calc_iframe_target_size, previously only used for forced
keyframes, with calc_auto_iframe_target_size, which handled most
keyframes.
Change-Id: I227051361cf46727caa5cd2b155752d2c9789364
This is a first step in cleaning up the redundancies between
vp8_calc_{auto_,}iframe_target_size. The pick_frame_size() function is
moved to ratectrl.c, and made to be the primary interface. This means
that the various calc_*_target_size functions can be made private.
Change-Id: I66a9a62a5f9c23c818015e03f92f3757bf3bb5c8
Fixed test vector mismatch that was introduced
in the "Removed dc_diff from MB_MODE_INFO"
(Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931)
Change-Id: I98fa509b418e757b5cdc4baa71202f4168dc14ec
the decision to run the regular or simple loopfilter is made outside the
function and managed with pointers
stop tracking the option in two places. use filter_type exclusively
Change-Id: I39d7b5d1352885efc632c0a94aaf56b72cc2fe15
Rather than using a default size of 1/2 or 3/2 seconds for the first
frame, use a fraction of the initial buffer level to give the
application some control.
This will likely undergo further refinement as size limits on key
frames are currently under discussion on codec-devel@, but this gives
much better behavior for small buffer sizes as a starting point.
Change-Id: Ieba55b86517b81e51e6f0a9fe27aabba295acab0
vp8_adjust_key_frame_context() divides by
estimate_keyframe_frequency() which can
return 0 in the case where --kf-max-dist=0.
Change-Id: Idfc59653478a0073187cd2aa420e98a321103daa
Create a new input parameter to allow specifying
the packed frame stereo 3d format. A default value
of mono will be written in the absence of user
specified input
Change-Id: I576d9952ab5d7e2076fbf1b282016a9a1baaa103
The accumulator array is an integer array, so use paddd instead of paddw
to add values to it. Fixes overflows when using large --arnr-maxframes
(>8) values.
Change-Id: Iad83794caa02400a65f3ab5760f2517e082d66ae
The arguments to these fprintfs are int not long int so
the format specifier should be "%d" and not "%ld". This
was writing garbage in the linux build.
Change-Id: I3d2aa8a448d52e6dc08858d825bf394929b47cf3
This patch changes the release configuration of MS VS projects to
explicitly use two compiler options "Maximize Speed (/O2)" and
"Favor fast code(/Ot)".
Change-Id: I0bf8343d9ca195851332b91ec69c69ee4e31ce2a
add an sse4 quantizer so we can use pinsrw/pextrw and keep values in xmm
registers instead of proxying through the stack. and as long as we're
bumping up, use some ssse3 instructions in the EOB detection (see ssse3
fast quantizer)
pick up about a percent on 32bit and about two on 64bit.
Change-Id: If15abba0e8b037a1d231c0edf33501545c9d9363
The dc_diff flag is used to skip loopfiltering. Instead
of setting this flag in the decoder/encoder, we now check
for this condition in the loopfilter.
Change-Id: Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931
Code cleanup. The build inter predictor functions are
redundantly checking the mode_info_context for either
INTRA_FRAME or SPLITMV.
Change-Id: I4d58c3a5192a4c2cec5c24ab1caf608bf13aebfb
Golden and ALT reference buffers were refreshed by copying from
the new buffer. Replaced this by index manipulation.
Also moved all the reference frame updates to one function for
easier tracking.
Change-Id: Icd3e534e7e2c8c5567168d222e6a64a96aae24a1
Remove tot_key_frame_bits and prior_key_frame_size[] as they were
tracked but never used. Remove intra_frame_target, as it was only
used to initialize prior_key_frame_size.
Refactor vp8_adjust_key_frame_context() some to remove unnecessary
calculations.
Change-Id: Icbc2c83d2b90e184be03e6f9679e678f3a4bce8f
the win64 abi requires saving and restoring xmm6:xmm15. currently
SAVE_XMM and RESTORE XMM only allow for saving xmm6:xmm7. allow
specifying the highest register used and if the stack is unaligned.
Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
Went through the code and fixed it. Verified on Windows.
Where possible, remove dependencies on xmm[67]
Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.
Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
Passed SSE from sub-pixel search back to pick_inter_mode
function, which is compared with the encode_breakout to
see if we could skip evaluating the remaining modes.
Change-Id: I4a86442834f0d1b880a19e21ea52d17d505f941d
This is reported by m...@hesotech.de (see issue 312):
"The decoder causes an access violation
when you decode the first frame, then make a pause of about
60 seconds and then decode further frames. But only if
vpx_codec_dec_cfg_t.threads> 1.
This is caused by a timeout of WaitForSingleObject.
When I change the definition of VPXINFINITE to INFINITE(0xFFFFFFFF),
the problem is solved."
Reproduced the crash and verified the changes on Windows platform.
This brings the behavior inline with the other platforms using sem_wait().
Change-Id: I27b32f90bce05846ef2684b50f7a88f292299da1
According to the docs, this should have been enabled, but
the disassembled output shows otherwise. This improved
the encode/decode performance.
Change-Id: I45ad7e6d299b89ac3166d7ef7da75b74994344c6
vp8_filter_block1d16_h4_ssse3 was never called
because UNSHADOW_ARGS moves the stack by 'mov rsp, rbp', the issue was
masked. however, if/when win64 used those registers for persistant data,
issues could/will arise.
Change-Id: I56d6effca0aeba1f86082689771cb10145d39651
In vp8_pick_inter_mode(), for NEWMV mode, use the error result got
from motion search as distortion. This helps performance in real-
time mode.
Change-Id: I398c4e46cc5381f7d874e748cf78827ef0e0860c
The value of distortion2 returned by vp8_pick_intra4x4mby_modes
was being overwritten by the value returned by get16x16prederror
before it was tested.
Change-Id: If00e80332b272c5545c3a7e381c8041e8319b41a
update for the latest version of the ios sdk. adding
usr/lib/system fixes a missing libcache.dylib issue
make isysroot path more DRY
Change-Id: Ib748ef3dac3cac2e4848fbffa1e9a0112eac826b
Index i is used to detect early breakout from the first loop, but
its value is lost due to reuse in the second for loop. I moved
the position of the second loop and did some format cleanup.
Change-Id: I02780eae1bd89df4b6c000fb8a018b0837aac2e5
This patch cleans up the source buffer storage and copy mechanism to
allow access through a standard push/pop/peek interface. This approach
also avoids an extra copy in the case where the source is not a
multiple of 16, fixing issue #102.
Change-Id: I05808c39f5743625cb4c7af54cc841b9b10fdbd9
in encodframe.c, quant_shift is set to 0 or 1 in vp8cx_invert_quant
only use 8 bits to store this, instead of 16. will allow saving an
xmm register in an updated version of the regular quantize
Change-Id: Ie88c47fe2aff5af0283dab1147fb2791e4b12f90
This patch changes the rc_undershoot_pct and rc_overshoot_pct controls
to set the "aggressiveness" of rate adaptation, by limiting the
amount of difference between the target buffer level and the actual
buffer level which is applied to the target frame rate for this frame.
This patch was initially provided by arosenberg at logitech.com as
an attachment to issue #270. It was modified to separate these controls
from the other unrelated modifications in that patch, as well as to
use the pre-existing variables rather than introducing new ones.
Change-Id: Id542e3f5667dd92d857d5eabf29878f2fd730a62
Previous to commit de4e9e3, there was an early return in the alt-ref
case that was inadvertantly removed when the function was refactored
to return void. This patch restores the prior behavior.
Change-Id: I783ffd594a4690297e2742f99526fd7ad67698b2
The error accumulator stats values cpi->prediction_error and
cpi->intra_error were being populated with rd values not
distortion values.
These are only "currently" used in a limited way for RT compress
key frame detection.
Change-Id: I2702ba1cab6e49ab8dc096ba75b6b34ab3573021
This commit fixed an overflow in ssim calculation, added register
save and restore to make sure assembly code working for x64 platform.
It also changed the sampling points to every 4x4 instead of 8x8 and
adjusted the constants in SSIM calculation to match the scale of
previous VPXSSIM.
Change-Id: Ia4dbb8c69eac55812f4662c88ab4653b6720537b
on the same order as the sse2 fast quantize change: ~2%
except for 32bit. only a slight improvment there.
Change-Id: Iff80e5f1ce7e646eebfdc8871405458ff911986b
MV sad cost error is only used in full-pixel motion search,
which only need full-pixel resolution instead of quarter-pixel
resolution. This change reduced mvsadcost table size, and
removed unneccessary pamameter passing since this table is
constant once it is generated.
Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0
cygwin doesn't support _sopen. drop down to the lowest common
denominator and merge main for all platforms. this also opens the door
for supporting multiple object formats with a single binary.
Change-Id: I7cd45091639d447434e6d5db2e19cfc9988f8630
rather than look up rc in the zig zag table, embed it in the macro. this
also allows us to shuffle some values in the macro and keep *d in rsi
gains of about the same order as the obj_int_extract implementation: ~2%
Change-Id: Ib7252dd10eee66e0af8b0e567426122781dc053d
Address calculations moved from encodemb_arm.c file to neon
optimized assembly function to save cycles in function calls.
- vp8_subtract_b_neon_func replaced with vp8_subtract_b_neon
that contains all needed address calculations
- unnecessary file encodemb_arm.c removed
- consistent with ARMv6 optimized version
Change-Id: I6cbc1a2670b56c2077f59995fcf8f70786b4990b
Detect the number of available cores and limit the thread allocation
accordingly. On decoder side limit the number of threads to the max
number of token partition.
Core detetction works on Windows and
Posix platforms, which define _SC_NPROCESSORS_ONLN or _SC_NPROC_ONLN.
Change-Id: I76cbe37c18d3b8035e508b7a1795577674efc078
Rules are added to libs.mk to generate a vpx.pc, which is
installed as pkgconfig/vpx.pc under the target library directory.
This also requires the install path prefix be exported directly
in config.mk.
Some systems use a tool called pkg-config to query information
about intalled libraries or other resources, based on database
files provided by the packages themselves at install time.
Providing such a file for libvpx simplifies integration with
other build systems, and provides an easy avenue for developers
to test against their own builds of the library.
Change-Id: I4e32a8fbb53fc331aa95eb207c63dd70a76d18ed
The configure script exports the major/minor/patch version
numbers, but didn't make the full version string available
to Makefile recipes and rules, the way it is available to
C code from vpx_version.h.
Change-Id: Ic6a9d4c574a6ea66a50c928f4eedeb91d7668eb5
Identified as a possible cause of issue #308, the code was silently
ignoring realloc failures, which would lead to corruption, memory
leaks, and likely a crash. The best we can do in this case is die
gracefully.
Change-Id: Ie5f6a853d367015be5b9712bd742778f3baeefd9
Adds following ARMv6 optimized functions to encoder:
- vp8_subtract_b_armv6
- vp8_subtract_mby_armv6
- vp8_subtract_mbuv_armv6
Gives 1-5% speed-up depending on input sequence and encoding
parameters. Functions have one stall cycle inside the loop body
on Cortex pipeline.
Change-Id: I19cca5408b9861b96f378e818eefeb3855238639
now that we need asm_enc_offsets.c for x86 and arm and it is
harmless to build it for other targets, add it unconditionally
Change-Id: I320c5220afd94fee2b98bda9ff4e5e34c67062f3
Half pixel interpolations optimized in variance calculations. Separate
function calls to vp8_filter_block2d_bil_x_pass_armv6 are avoided.On
average, performance improvement is 6-7% for VGA@30fps sequences.
Change-Id: Idb5f118a9d51548e824719d2cfe5be0fa6996628
remove helper function and avoid shadowing all the arguments to the
stack on 64bit systems
when running with --good --cpu-used=0:
~2% on linux x86 and x86_64
~2% on win32 x86 msys and visual studio
more on darwin10 x86_64
significantly more on
x86_64-win64-vs9
Change-Id: Ib7be12edf511fbf2922f191afd5b33b19a0c4ae6
This declaration did not match the prototype_sad() prototype, but was
unused in this translation unit, so it is removed instead. Fixes
issue 290.
Change-Id: I168854f88a85f73ca9aaf61d1e5dc0f43fc3fdb3
Map an enum to the --end-usage values, so you can specify
--end-usage=cq instead of --end-usage=2. The numerical values still
work for historical scripts, etc, but this is more user friendly.
Change-Id: I445ecd9638f801f5924a71eabf449bee293cdd34
Optimized fdct4x4 (8x4) for ARMv6 instruction set.
- No interlocks in Cortex-A8 pipeline
- One interlock cycle in ARM11 pipeline
- About 2.16 times faster than current C-code compiled with -O3
Change-Id: I60484ecd144365da45bb68a960d30196b59952b8
Thread synchronization was not correct when frame width was 1 MB.
Number of allocated encoding threads is limited by the sync_range.
There is no point having more because each thread lags sync_range MBs
behind the thread processing the row above.
http://code.google.com/p/webm/issues/detail?id=302
Change-Id: Icaf67a883beecc5ebf2f11e9be47b6997fdf6f26
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.
These symbols were identified by:
$ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
| sort | grep '^ *1 '
Change-Id: I59609f58ab65312012c047036ae1e0634f795779
The mmap allocation code in vp8_dx_iface.c was inconsistent.
The static array vp8_mem_req_segs defines two descriptors,
but only the first is real. The second is a sentinel and
isn't actually allocated, so vpx_codec_alg_priv is declared
with mmaps[NELEMENTS(vp8_mem_req_segs)-1]. Some functions
use this reduced upper bound when iterating though the mmap
array, but these two functions did not.
Instead, this commit calls NELEMENTS(...->mmaps) to directly
query the bounds of the dereferenced array.
This fixes an array-bounds warning from gcc 4.6 on
vp8_xma_set_mmap.
Change-Id: I918e2721b401d134c1a9764c978912bdb3188be1
Fixes implicit declaration warning for 'mach_task_self'. This change
is an update to Change I9991dedd1ccfddc092eca86705ecbc3b764b799d,
which fixed this issue for the decoder but not the encoder.
Change-Id: I9df033e81f9520c4f975b7a7cf6c643d12e87c96
obj_int_extract was unconditionally skipping the first character in the
symbol. make sure it's actually an '_' first
Change-Id: Icfe527eb8a0028faeabaa1dcedf8cd8f51c92754
Without this change I get link errors in firefox's libxul. It looks
like the linker expect a particular pattern for getting the GOT. This
patch changes webm to use the same pattern used by the compiler.
Change-Id: Iea8c2e134ad45c1dc7d221ff885a8429bfa4e057
The vp8_build_intra_predictors_mby and vp8_build_intra_predictors_mby_s
functions had global function pointers rather than using the RTCD
framework. This can show up as a potential data race with tools such as
helgrind. See https://bugzilla.mozilla.org/show_bug.cgi?id=640935
for an example.
Change-Id: I29c407f828ac2bddfc039f852f138de5de888534
Clean up vp8_init_config() a bit and remove null pointer case,
as this code can't be called any more and is not an adequate
trap anyway, as a null pointer would cause exceptions before
hitting the test.
Change-Id: I937c00167cc039b3aa3f645f29c319d58ae8d3ee
Issue 291 highlighted the fact that CQ mode was not working
as expected in 1 pass mode,
This commit fixes that specific problem but in so doing I also
uncovered an overflow issue in the VBR code for 1 pass and
some data values not being correctly initialized.
For some clips (particularly short clips), the resulting
improvement is dramatic.
Change-Id: Ieefd6c6e4776eb8f1b0550dbfdfb72f86b33c960
In multithreaded mode the loopfilter is running in its own thread (filter level
calculation and frame filtering). Filtering is mostly done in parallel with the
bitstream packing. Before starting the packing the loopfilter level has
to be calculated. Also any needed reference frame copying is done in the
filter thread.
Currently the encoder will create n+1 threads, where n > 1 is the number of
threads specified by application and 1 is the extra filter thread. With n = 1
the encoder runs in single thread mode. There will never be more than n threads
running concurrently.
Change-Id: I4fb29b559a40275d6d3babb8727245c40fba931b
Enable extraction of assembly offsets from compiled examples in MSVS.
This will allow us to remove some stub functions from x86 assembly since
we will be able to reliably determine structure offsets at compile time.
see ARM code for examples:
vp8/encoder/arm/armv5te/
vpx_scale/arm/neon/
Change-Id: I1852dc6b56ede0bf1dddb5552196222a7c6a902f
The firstpass motion map consists of an 8-bit flag for
each MB indicating how strongly the firstpass code
believes it should be filtered during the second pass
ARNR filtering.
For long or large format material the motion map can
become extremely large and hamper the operation of
the encoding process.
This change removes the motion map altogether, leaving
the second pass to rely on the magnitude of the motion
compensated error to determine the filter weight to
use for the MB during ARNR filtering.
Tests on the derf set indicate that the effect of this
change is neutral, with some small wins and losses. The
motion map has therefore been removed based on
a cost/benefit evaluation.
Change-Id: I53e07d236f5ce09a6f0c54e7c4ffbb490fb870f6
The previous calculation of macroblock count (w*h)/256
is not correct when the width/height are not multiples of
16. Use the precalculated macroblock count from
cpi->common instead. This manifested itself as a divide
by zero when the number of pixels was less than 256.
num_mbs updated in estimate_max_q, estimate_q,
estimate_kf_group_q, and estimate_cq
Change-Id: I92ff98587864c801b1ee5485cfead964673a9973
Move the update of the loopfilter info to the same block where it
is used. GCC 4.5 is not able trace the initialization of the local
filter_info across the other calls between the two conditionals on
pbi->common and issues an uninitialized variable warning.
Change-Id: Ie4487b3714a096b3fb21608f6b0c74e745e3c6fc
failed to find headers in the source directory
output to stdout instead of a hardcoded file
MinGW doesn't support _sopen_s
_fstat catches non-existant files
Change-Id: I24e0aacc6f6f26e6bcfc25f9ee7821aa3c8cc7e7
1. Process 16 pixels at one time instead of 8.
2. Add check for both xoffset =0 and yoffset=0, which happens
during motion search.
This change gave encoder 1%~3% performance gain.
Change-Id: Idaa39506b48f4f8b2fbbeb45aae8226fa32afb3e
GCC 4.5 and 4.6 both issue a warning about the multi-line format
string introduced in bc9c30a0, which also changed the whitespace
in the associated stt file by line-wrapping the long format string.
Instead, use multiple string constants, which the compiler will
concatenate. This maintains the original formatting, but remains
legible within the standard line length.
Change-Id: I27c9f92d46be82d408105a3a4091f145f677e00e
Disable zbin boost in SPLITMV mode as intended. Was incorrectly looking
at vp8_ref_frame_order instead of vp8_mode_order when comparing against
SPLITMV. This condition should have always been false, as SPLITMV is
not in the range of valid reference frames.
Change-Id: I0408cc7595eff68f00efef6d008e79f5b60d14bf
Cast size_t to (unsigned long) and print it with the %lu format
string, which is more portable than C99's explict %zu for size_t.
This truncates on Windows x64 but otherwise works on 32 and 64 bit
platforms. In practice the stats file is unlikely to be so large.
Change-Id: I0432b3acf85fc6ba4ad50640942e1ca4614b21cb
In some cases where clips have been encoded with
borders (eg. some wide-screen content where there is a
border top and bottom and slide shows containing portrait
format photographs (border left and right)) key frames were
not being correctly detected.
The new code looks to measure cases where a portion of
the image can be coded equally easily using intra or inter
modes and where the resulting error score is also very low.
These "neutral" areas are then discounted in the key frame
detection code.
Change-Id: I00c3e8230772b8213cdc08020e1990cf83b780d8
This code extends what was previously done for GFs, to pick
cases where insertion of a key frame after a fade (or other
transition or complex motion) followed by a still section, will
be beneficial and will reduce the number of forced key frames.
Change-Id: If8bca00457f0d5f83dc3318a587f61c17d90f135
Rename the common control id enum vp8_{dec,com}_control_id,
move VP8_DECODER_CTRL_ID_START to common, wrap long lines.
Change-Id: I659abc62f10aa389d496f7f43950775db0ef2f9f
When the modified_error_left accumulator exceeds INT_MAX, an incorrect
cast to int resulted in a negative value, causing the rate control to
allocate no bits to that keyframe group, leading to severe undershoot
and subsequent poor quality.
This error was exposed by the recent change to the rolling target and
actual spend accumulators in commit 305be4e4 which fixed them to
actually calculate the average value rather than be re-initialized
on every frame to the average per-frame bitrate. When this bug was
triggered, the target bitrate could be 0, so the rolling target
becomes small, which causes the undershoot. The code prior to 305be4e4
did not exhibit this behavior because the rolling target was always
set to a reasonable value and was independent of the actual target
bitrate. With this patch, the actual target bitrate is calculated
correctly, and the rate control tracks as expected.
This cast was likely added to silence a compiler warning on a comparison
between a double (modified_error_left) and an int (0). Instead, this
patch removes the cast and changes the comparison to be against 0.0,
which should prevent the warning from reoccuring.
This fixes issue #289. Special thanks to gnafu for his efforts in
reporting and debugging this fix.
Change-Id: Ie5cc1a7b516c578a76c3a50c892a6f04a11621fe
add visual studio 9 to --help
remove cpp, cxx, hpp, hxx files from filter
add the ability to target project names. this will be necessary to
enable obj_int_extract
Change-Id: I407583320d8b67a0df40c07221838c42678792f7
AMD64 only implies SSE2, not SSE3. There aren't any known cases where
icc was generating SSE3 instructions since all the vectorizable code
is already in handwritten asm, so this fix is included mostly for
correctness. Fixes issue #259.
Change-Id: I993335a4740b68b559035305fb52ca725a6beaff
MSVC can't pass the address of global variables in a DLL correctly
across DLL boundaries. This patch allows linking the examples to
a libvpx dll build. Fixes issue #268.
Change-Id: I1c52d076cfc68efb3efdfba019f12d53c5019f58
This allows the base documentation to be built without the need for php
which is required to produce the example documentation
Change-Id: Id1861723c672fa8da132a074a4657e2cb94c1e79
Group algorithm interfaces to avoid undocumented warning from doxygen
and provide basic documentation for CQ level & cpuused.
Change-Id: I11095061be962cbc998741de9c8c3019d415e137
This fixes an overflow problem in the frame error accumulators.
The overflow condition is extreme but did trigger when Frank B.
coded some high motion interlaced HD content.
The observed effect was a catastrophic breakdown of the rate
control leading to massive undershoot and poor bit allocation.
All the error values should really be unsigned but I will look at this
separately.
Change-Id: I9745f5c5ca2783620426b66b568b2088b579151f
- correct spelling
- remove explicit file name w/\file (unnecessary when contained in the
same file and prone to desync)
Change-Id: I68a3960ac5ab84d0f2e5c9b2e29799f26dfccf23
Currently, when the video frame width is not multiples of 16, the
source buffer has a stride of non-multiples of 16, which forces
an unaligned load in SAD function and hurts the performance. To
avoid that, this change allocates source buffers to be multiples
of 16.
Change-Id: Ib7506e3eb2cea06657d56be5a899f38dfe3eeb39
In real-time mode, vp8_sad16x16 function is called heavily in
motion search part. Improvement of this function gives 1.2%
encoding performance gain (real-time mode, tulip clip).
Change-Id: I23c401fc40c061f732a9767e8d383737a179bd58
checks added to make sure that cpi->tplist
is freed correctly in vp8_dealloc_compressor_data
and vp8_alloc_compressor_data.
Change-Id: I66149dbbd25c958800ad94f4379d723191d9680d
Eliminated unnecessary calculations. Improved performance
by 10% on keyframes and 1.6% overall for the test clip used.
Change-Id: I87671b26af5e2cc439e81d0fee3b15c7cd2a3309
As mentioned in check-in "Improve motion search in real-time mode",
MV prediction calculation causes speed loss for speed 7 and above.
This change added a flag to turn off this calculation for speed>6
in real-time mode.
Change-Id: I9f4ae5a8bf449222d1784b54e7d315fc8347b2d1
Applied better MV prediction in real-time mode, which improves
the encoding quality.
Used quarter-pixel search instead of iterative sub-pixel search
for speed >=5 to improve encoding performance.
Tests on the test set showed:
1. For speed=-5, quality improvement: 1.7% on AvgPSNR and 2.1%
on SSIM, performance improvement: 3.6% (This counts in the
performance lose caused by MV prediction calculation in "Improve
MV prediction in vp8_pick_inter_mode() for speed>3").
2. For speed=-8, quality improvement: 2.1% on AvgPSNR and 2.5%
on SSIM. but, 6.9% performance decrease because of MV prediction
calculation. This should be improved later.
Change-Id: I349a96c452bd691081d8c8e3e54419e7f477bebd
Created a new speed 1 which is in the middle of the old
speed 0 and speed 1. (for both quality and performance)
Change-Id: I4802133cdb43f359ca787646c090899679dd5d84
Use 0xFFF0 vice 240 (0xF0) for determining whether the sometimes
implicit bit 3 will be transmitted. This is consistent with the decoder
and encode_mvcomponent().
Change-Id: Ic1304d0ab56844bed8236edd1c5243a6767fc6b1
make reference version of bilinear_filters short.
use reference versions of bilinear_filters and sub_pel_filters when
possible.
recognize that Width was being passed into
filter_block2d_bil_first_pass multiple times. ARM version had already
fixed this. propegate to C.
change references to src_pixels_per_line to src_pitch and standardize on
src/dst (instead of input/output).
recognize that first_pass is only run in the verticle and second_pass
only horizontal. ARM version had already fixed this. propegate to C
Change-Id: I292d376d239a9a7ca37ec2bf03cc0720606983e2
it's difficult to mux the *_offsets.c files because of header conflicts.
make three instead, name them consistently and partititon the contents
to allow building them as required.
Change-Id: I8f9768c09279f934f44b6c5b0ec363f7943bb796
preview_img d_w and d_h along with w and h
would not be updated for resized frames.
now uses sd.y_width and sd.y_height
Change-Id: I52241de4cc1de5e73f865e668bd70a7cbd954390
When the keyframe distance is fixed the first interval has the right
distance but, the next ones have kf_distance + 1.
Change-Id: I44f1190fe7146124bd07660a5e0ef08829e3ae07
common/arm/vpx_asm_offsets moves up a level. prepare for muxing with
encoder/arm/vpx_vp8_enc_asm_offsets
Change-Id: I89a04a5235447e66571995c9d9b4b6edcb038e24
Only suppress unused function warnings, rather than supprressing all
unused-* warnings. Unused functions can still be seen with
--enable-extra-warnings.
Change-Id: Ibca20d859dbffedd76bd082ffe0fa685c3ac198e
The encoder was not correctly catching transitions in the quantizer
deltas. If a delta_q was set, then the quantizer would be reinitialized
on every frame, but if they transitioned to 0, the quantizer would
not be reinitialized, leading to a encode-decode mismatch.
This bug was triggered by commit 999e155, which sets a Y2 delta Q
for very low base Q levels.
Change-Id: Ia6733464a55ee4ff2edbb82c0873980d345446f5
Whe auto keyframe insertion is enabled and conditions are right (scene change)
the encoder can decide to insert a key frame and does a re-encoding. This can
introduce extra latency. In RT mode we do not do the re-encoding of the current
frame but force the next frame to key frame.
Change-Id: I15c175fa845ac4c1a1f18bea3676e154669522a7
Reduce the number of sync points by letting each thread
continue imediatly with a new MB row.
Better multicore scaling, improves performance by 5-20% on ARM multicore.
Change-Id: Ic97e4d1c4886a842c85dd3539a93cb217188ed1b
from vp8cx_encode_intra_macro_block. prediction_error is used when
deciding if a frame should be a keyframe. After reviewing this with
Yaowu, it was pointed out that vp8cx_encode_intra_macro_block
is only called for keyframes, so the accumulation is unnecessary.
Change-Id: Id79dc81b80d4f5d124f3a0dba1b923887e2e1ec8
vp8_pick_intra4x4mby_modes uses the passed in distortion
for an early breakout. The best distortion was never saved
and the distortion for TM_PRED was always used.
Change-Id: Idbaf73027408a4bba26601713725191a5d7b325e
Previously, the DC check is to make sure there is no code-able
DC shift for quantizer Q0, which has been verified rather
conservative. This commit changes the criteria to have two
components, DC and AC, to address the conservativeness. First,
it checks if all AC energy is enough to contribute a single
non-zero quantized AC coefficient. Second, for DC, the decision
to skip further considers two possible scenarios: 1. There is
no code-able 2nd order DC coefficient at all; 2 The residue is
relatively flat, but the uniform DC change is very small, i.e.
less than 1/2 gray level per pixel.
Comparing to previous criteria, the new criteria is about 10%
to 15% faster in encoding time with a very small quality loss.
(threshold ~1000 and quality range 33db-45db)
It should be noted that this commit enables "automatic" static
threshold for encodebreakout if a non-zero small value is passed
in to encoder.
Change-Id: I0f77719a1ac2c2dfddbd950d84920df374515ce3
The condition for using RD when selecting the intra coding mode
for a MB is that the RD flag is set AND we're not in real-time
mode.
Previously the code used RD if either the RD flag was set OR
we were not using real-time mode.
Change-Id: Ic711151298468a3f99babad39ba8375f66d55a08
This function was using a variance metric compared to and SSE metric in
other places (eg. vp8_rd_inter_uv)
Change-Id: I9109fcc5a13bca9db1d7ead500fe14999ab233eb
Adds following targets to configure script to support RVCT compilation
without operating system support (for Profiler or bare metal images).
- armv5te-none-rvct
- armv6-none-rvct
- armv7-none-rvct
To strip OS specific parts from the code "os_support"-config was added
to script and CONFIG_OS_SUPPORT flag is used in the code to exclude OS
specific parts such as OS specific includes and function calls for
timers and threads etc. This was done to enable RVCT compilation for
profiling purposes or running the image on bare metal target with
Lauterbach.
Removed separate AREA directives for READONLY data in armv6 and neon
assembly files to fix the RVCT compilation. Otherwise
"ldr <reg>, =label" syntax would have been needed to prevent linker
errors. This syntax is not supported by older gnu assemblers.
Change-Id: I14f4c68529e8c27397502fbc3010a54e505ddb43
vp8/encoder/rdopt.c:728: warning: pointer targets in passing argument 3
of 'macro_block_yrd' differ in signedness
vp8/encoder/rdopt.c:541: note: expected 'int *' but argument is of type
'unsigned int *'
distortion is signed when calling macro_block_yrd is both other cases,
as well as for RDCOST
Change-Id: I5e22358b7da76a116f498793253aac8099cb3461
Change-Id: I6ca2d89f355839c4c770773c09fc69dcea7c1406
warning: implicit declaration of function
'vp8_variance_halfpixvar16x16_[h|v|hv]_neon'
'vp8_sub_pixel_variance16x16_neon_func'
Improved the performance of the first pass only
(~6% on 720p test clip) by making use of LUT instead of the
float calculations. Might try a SIMD version later.
Also started to make use of int_mv instead of
MV.
Change-Id: If2a217c7d6b59cd2c25c5553e0ca7e0502403af8
Use the function macro_block_yrd() to calculate error and distortion
in keeping with what is done for inter frames.
The old code was using a variance metric for once case and an
SSE function for measuring distortion in the other case.
The function vp8_encode_intra16x16mbyrd() is no longer used.
Change-Id: Ic228cb00a78ff637f4365b43f58fbe5a9273d36f
The code previously tested cpi->common.refresh_alt_ref_frame
but there are situations where this flag may be set for viewable frames.
The correct test should be !cm->show_frame.
Change-Id: Ia1a600622992a4a68fe1d38ac23bf6b34b133688
This commit also removes artificial RDMULT cap for low quantizers.
The intention is to address some abnormal behavior of mode selections
at the low quantizer end, where many macroblocks were coded with
SPLITMV with all partitions using same motion vector including (0,0).
This change improves the compression quality substantially for high
quality encodings in both PSNR and SSIM terms. Overall effect on
mid/low rate range is also positive for all metrics, but smaller
in magnitude.
Change-Id: I864b29c4bd9ff610d2545fa94a19cc7e80c02667
Commit 336aa0b7da incorrectly
declared current_pos as and int, when it should have been
a FIRSTPASS_STATS pointer.
Change-Id: I0a51c7a86ebba8546c95dd5d9d1c1143d4613e40
Adjust checking points in motion vector prediction to better cover
possible movements, and get a better prediction. Tests on test
clips showed a 0.1% improvement in SSIM, and no change in PSNR
and performance.
Change-Id: Ifdab05d35e10faea1445c61bb73debf888c9d2f8
The old 2 pass code estimated error distribution when coding a
forced (by interval) key frame. The result of this was that in some
cases, when allocating bits at the GF group level within a KF
group there was either a glut of bits or starvation of bits at the end
of the KF group.
Added code to rescan and get the correct data once the position of
a forced key frame has been determined.
Change-Id: I0c811675ef3f9e4109d14bd049d7641682ffcf11
-For targets with external build systems like visual
studio CC is not set so check_add_cflags will fail.
Only call this function if extra_cflags is set.
Change-Id: I3531bad69e9b6a59c5be1b0e8b6053ccccbc332c
vp8cx_mb_init_quantizer was being called for every mode checked
in vp8_rd_pick_inter_mode. zbin_extra is the only value that
really needs to be recalculated. This calculation is disabled
when using the fast quantizer for mode selection.
This gave a small performance boost (~.5% to 1%).
Note: This needs to be verified with segmentation_enabled.
Change-Id: I62716a870b3c82b4a998bdf95130ff0b02106f1e
In sub-pixel calculation, xoffset and yoffset mostly take some
specific values. Modified sub-pixel filter functions according to
these possible values to improve performance.
Change-Id: I83083570af8b00ff65093467914fbb97a4e9ea21
Added code to scan ahead a few frames when we see what
we think is a static scene in the two pass GF loop to see if the
conditions persist.
Moved calculation of decay rate out into a fuunction.
Change-Id: I6e9c67e01ec9f555144deafc8ae67ef25bffb449
These changes are specifically targeted at fade transitions to
static scenes. Here we want to place a GF/ARF immediately
after the fade and prevent an ARF just before the fade.
Also some code lines and comment lines shortened to 80 chars
while I was there.
Change-Id: Iefdc09a4fa7b265048fc017246b73e138693950f
Add --extra-cflags as config parameter for user defined extra CFLAGS.
Add -g to asflags when debug enabled for arm targets.
Change-Id: Ibdde7cfdda6736c1c1db45e6466bd08504a51f15
In both vp8_find_next_key_frame and define_gf_group,
motion_pct was initialised at the top of the loop before
next_frame stats had been read in.
This fix sets motion_pct after next_frame stats have
been read.
Change-Id: I8c0bebf372ef8aa97b97fd35b42973d1d831ee73
Prior to this change, VP8 min quantizer is 4, which caps the
highest quality around 51DB. This experimental change extends
the min quantizer to 1, removes the cap and allows the highest
quality to be around ~73DB, consistent with the fdct/idct round trip
error. To test this change, at configure time use options:
--enable-experimental --enable-extend_qrange
The following is a brief log of changes in each of the patch sets
patch set 1:
In this commit, the quantization/dequantization constants are kept
unchanged, instead scaling factor 4 is rolled into fdct/idct.
Fixed Q0 encoding tests on mobile:
Before: 9560.567kbps Overall PSNR:50.255DB VPXSSIM:98.288
Now: 18035.774kbps Overall PSNR:73.022DB VPXSSIM:99.991
patch set 2:
regenerated dc/ac quantizer lookup tables based on the scaling
factor rolled in the fdct/idct. Also slightly extended the range
towards the high quantizer end.
patch set 3:
slightly tweaked the quantizer tables and generated bits_per_mb
table based on Paul's suggestions.
patch set 4:
fix a typo in idct, re-calculated tables relating active max Q
to active min Q
patch set 5:
added rdmult lookup table based on Q
patch set 6:
fix rdmult scale: dct coefficient has scaled up by 4
patch set 7:
make transform coefficients to be within 16bits
patch set 8:
normalize 2nd order quantizers
patch set 9:
fix mis-spellings
patch set 10:
change the configure script and macros to allow experimental code
to be enabled at configure time with --enable-extend_qrange
patch set 11:
rebase for merge
Change-Id: Ib50641ddd44aba2a52ed890222c309faa31cc59c
Incorrect value loop_decay_rate used in GF loop.
The intent was to test the cumulative value decay_accumulator.
Change-Id: I62928c63eb09f4f6936a45ebd1c23784d1c9681b
In vp8_find_best_sub_pixel_step_iteratively(), many times xoffset
and yoffset are specific values - (4,0) (0,4) and (4,4). Modified
code to call simplified NEON version at these specific offsets to
help with the performance.
Change-Id: Iaf896a0f7aae4697bd36a49e182525dd1ef1ab4d
In the process of developing new intra prediction modes, tests have
shown removal of the low pass filtering from B_HE_PRED and B_VE_PRED
has an overall minor positive impact in both PSNR and SSIM metric.
Overall difference is about 0.1%. The change shall also have a small
positive impact on speed. Intuitively, this change should also reduce
some of the tendency of "flattening"
Change-Id: I3c43b0daca833c6eff77d00f19c811f9ef9368a3
Changing the MAX_PSNR to 100 to allow testing of further experiments
on extending quantizer range to near lossless. With an effective
quantizer of 1, encoder achieves ~68DB, which is consistent with
fdct/idct round trip error.
Change-Id: I7b6d0e94a8936968ef42e82e63ebb13999c36832
Extending the value range of tokens allows further experiments on
extending quantizer range. Encoder and decoder were verified to
produce matching reconstructed buffers by tests with forced
quantized value of 1.
Change-Id: I12faf92832867870b6f71ddeafbf643f1040086d
- Used three probability approach for temporal context as follows:
P0 - probability of no change if both above and left not changed
P1 - probability of no change if one of above and left has changed
P2 - probability of no change if both above and left have changed
In addition, a 1 bit/frame has been used to decide whether to use temporal context or to encode directly. The cost of using both the schemes is calculated ahead and the temporal_update flag is set if the cost of using temporal context is lower than encoding the segment ids directly.
This approach has given around 20% reduction in cost of bits needed to encode segmentation ids.
Change-Id: I44a5509599eded215ae5be9554314280d3d35405
Modified code so that:
-When above and left contexts are same and not equal to current segment id, it needs to read a maximum of 2 segment_tree_probabilities.
- When above and left contexts are different and not equal to current segment id, it needs to read only a single segment_tree_probability.
Change-Id: Idc2cf2c4afcc6179b8162ac5a32c948ff5a9a2ba
-Updates by making use of spatial correlation.
-Checks if the segment_id is same as above or left context and encodes only the update to the map instead of updating individual segment_ids.
Change-Id: Ib861df97e8aa2b37516219eeddcdbaf552b6a249
This patch creates some basic infrastructure for doing bitstream-
incompatible changes to the VP8 encoder. The key parts are:
- --enable-experimental configure switch, to enable support for this
incompatible bitstream. This switch is required to be set to enable
any "experiments"
- A list for "experiments" which translate into --enable-<experiment>
options and CONFIG_<experiment> macros.
- The high bit of the "Version" field is used to indicate that the
bitstream was produced by an experimental encoder. The decoder will
fail to decode an experimental bitstream without
--enable-experimental.
- A new "vp8x" encoder interface is created to set the experimental
bit.
- The vp8x encoder interface is made the default for ivfenc in
experimental mode.
Change-Id: Idbdd5eae4cec5becf75bb4770837dcd256b2abef
2010-06-01 11:14:33 -04:00
950 changed files with 223328 additions and 67658 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.