ndk-build is intended to be launched from the application directory,
not the jni directory. Clarify the path in the comments.
Change-Id: Ie0faa89a13d967088a4fd2fd1c647962a9c04217
This reverts commit 8bb82fded5.
This is an incorrect workaround. It has been fixed in the GYP files
upstream.
Change-Id: If42f997747ce878b874508fdf7ae5a73a6fa1b2b
WIP: Fixing unsafe threading in VP8 encoder.
Use the passed in macroblock instead of the macroblock located in
cpi.
Change-Id: I1bfa07de6ea463f2baeaae1bae5d950691bc4afc
The loopfilter thread from the previous frame can be running while
starting the current frame. cpi->Source will change during this time causing
the wrong data to be copied. The refresh_x_frame flags also change, which
will cause incorrect updates of the denoised buffers.
Change-Id: I7d982b4fcb40a0610801332aa85f3b792c64e4c3
The denoiser was writing to LAST_FRAME buffer. If LAST_FRAME isn't being
updated, the reference frame buffers were out of sync between the encoder and the
denoised raw buffers. This patch resolves the discrepancy by always writing to a work
buffer (INTRA_FRAME) and then copying from that buffer to any buffers that needs to
be updated.
Change-Id: I6dd855b9749978b542bc3d515914d5f16faf25df
Multi-threaded code was not updated to disable background
refresh for non base-layer frames at the time it was
disabled in the main C-code.
Change-Id: Id6cc376130b7def046942121cfd0526b4f0a71d4
This is enabled by default in the main configure.sh, but apparently
is supposed to be disabled if the hardware doesn't support it.
Unaligned reads is only supported on armv6 and newer.
Change-Id: Ie1412e36a14036bbb4fe7b89aa36a178f35b2228
fixes, e.g.:
In file included from ../vpx/internal/../vpx_decoder.h:33:0,
from ../vpx/internal/vpx_codec_internal.h:46,
from ../vp8/common/onyx.h:21,
from ../vp8/encoder/block.h:15,
from ../test/subtract_test.cc:18:
../vpx/internal/../vpx_codec.h:52:0: warning: "DECLSPEC_DEPRECATED"
redefined
/usr/x86_64-w64-mingw32/sys-root/mingw/include/winnt.h:164:0: note: this
is the location of the previous definition
Change-Id: Iddc9318451d3e4e4a78b4d706518083fffff5c61
Don't use the switch to gf_rate_correction factor when
temporal layers is used (i.e., cpi->oxcf.number_of_layers > 1).
In temporal layers, we prefer to avoid this as any frame
(e.g., base layer frame at anchor of pattern) may update
both last and golden (and possibly alt-ref), and so we would get
different rate correction factors within the same layer.
This change will make sure one rate correction factor exists for each layer.
Also, made some other code in qp-regulate that depends on
alt/golden update specific to the 1 layer case.
Change-Id: I41a6d085bd477f9307ef3b3c311695214273892c
xmm[6-11] should be saved and restored for Windows x64; prevents an
encoder mismatch and some datarate issues.
Change-Id: I03c38eb18ec20c6c441cae19416393058baad1ee
xmm6/xmm7 should be saved and restored for Windows x64; prevents an
encoder mismatch and some datarate issues.
Change-Id: Ifa1a82ab25fbdc5112d66f5332e14b16e69ac164
https://codereview.chromium.org/11413061/
The Android NDK automatically manages the include directories. Trying
to do so manually for the Android GYP files can cause the wrong setjmp.h
to be included.
Change-Id: I5c3769f983fcbad1ed602feda781690c6e4e97b3
Exlcude key frame from buffer underrun check, and increase
lowest bitrate in BasicBufferModel.
Both changes are needed because of a known issue (#495).
Change-Id: If5e994f813d7d5ae870c1a72be404c8f7dbbdf27
This change converts push to stmdb and pop to ldmia. In addition word boundaries
are obeyed using \b avoiding substituion where not appropriate.
Patch provided by ihf@chromium.org.
TEST=Used on many Daisy assembly files.
BUG=None.
Change-Id: Ie5b197b158edd0467294551d0b640c8db6530d95
So that, in case of error, the arrays are not filled with trash
pointers that are attempted a free() during vp8mt_de_alloc_temp_buffers()
Change-Id: Ic074549c2903a43316510eb42e4f393e7d3ee528
The vp8_post_proc_down_and_across_mb_row_sse2() needs space for an
even number of macroblocks, as they are read two at a time for the
chroma planes. Round up the width during the allocation of
pp_limits_buffer to support this.
Change-Id: Ibfc10c7be290d961ab23ac3dde12a7bb96c12af0
Got 61 test vectors from vp8-test-vectors.git
(http://git.chromium.org/gitweb/?p=webm/vp8-test-vectors.git)
Added decoder test vectors downloading in unit tests. Uploaded
the test vectors and their md5 files to WebM website.
$ gsutil cp *.* gs://downloads.webmproject.org/test_data/libvpx
Added their sha1sum to the test/test-data.sha1 file.
In unit tests, download the test vectors to LIBVPX_TEST_DATA_PATH.
Test_vector_test goes through the test vectors, decodes them, and
compute the md5 checksums. The checksums are compared with the
expected md5 checksums to tell if the decoder decodes correctly.
Change-Id: Ia1e84f9347ddf1d4a02e056c0fee7d28dccfae15
If the threshold(limits) <= 0, skipped filtering and copied the
frame directly. Also, fixed memory allocation checking.
Change-Id: If3d79d5b2bcb71b9777e6eb5cba1384585131e22
Documentation is typically auto-detected by checking for php and
doxygen. Add an option to explicitly disable it.
Remove toggle keywords from libraries, examples, documentation and
unit tests. They were not consistent with the default status.
Change-Id: I21049675ccfd8e58ac612cd058641b197db5c0eb
If a reference frame is forced because of low dissimilarity, then
shut off the search of intra modes. This change has mixed results. On
one clip (QVGA), it hurt quality by ~1.5% with negligible speed impact.
On another (VGA) it had negligible affect on quality, but a ~0.2% speed
impact.
Change-Id: Ic8b07648979d732f489de5f094957e140f84d2eb
Rather than overloading the parent_ref_frame value to shut off the
search in some cases, add a new validity flag. This cleans up some
of the duplicated mr_encoder_id && mr_low_res_mv_avail checks as
well, for readability.
Change-Id: Iddad93a27066c3d85ff2f25a361ac113b288ab7b
1. Algorithm modification:
Instead of having same filter threshold for a whole frame, now we
allow the thresholds to be adjusted for each macroblock. In current
implementation, to avoid excessive blur on background as reported
in issue480(http://code.google.com/p/webm/issues/detail?id=480), we
reduce the thresholds for skipped macroblocks.
2. SSE2 optimization:
As started in issue479(http://code.google.com/p/webm/issues/detail?id=479),
the filter calculation was adjusted for better performance. The c
code was also modified accordingly. This made the deblock filter
2x faster, and the decoder was 1.2x faster overall.
Next, the demacroblock filter will be modified similarly.
Change-Id: I05e54c3f580ccd427487d085096b3174f2ab7e86
In some situations, believed to be an interaction between temporal
scalability and dropped frames, the references available to an
encoder may not be the same references available to its parent.
Previously, the code tried to force the reference frame chosen by
the parent to be used on this frame, even if it was disabled. This
was preventing the pick mode loop from running even once, which led
to a crash.
Attempts to reproduce this bug locally were unsuccessful, so it is
still undetermined what the underlying cause of this issue is. In
the specific case that was failing, the application did not set
any flags which influenced the reference selection on that frame.
ref_frame_flags indicated that the golden frame was disabled,
believed to be because the last frame updated the last and golden
frames, so golden was shut off by default. It's not clear why this
wouldn't have also been true in the lower res encoder, ie, why the
lower res encoder decided to use and/or was allowed to use the
golden frame. We weren't able to debug into the non-crashing
lower res encoder as the crash couldn't be reproduced locally.
Change-Id: Ifb265253d26963ac2afde0e20cf6792788be6af7
This unit test compares the difference in quality with
error resilience enabled and disabled. The test runs
for all of the one-pass encoding modes.
The test ensures that the effect of turning on error
resilience makes less than a 10% difference in PSNR.
Further cases should be added to do a more comprehensive
test.
Change-Id: I1fc747fc78c9459bc6c74494f4b38308dbed0c32
If a parent mb is available but is intra coded, then parent_ref_mv is
invalid. Check that the parent is inter coded before trying to access
the parent_ref_mv. Previously the parent_ref_mv was being read from
an uninitialized stack allocation, causing potential OOB reads and
other undefined behavior.
Change-Id: I0c93cd412a19c3a184bcf6decaa145b3a036a6c0
Modified EncoderTest class to have separate member variables
for initialization time and per-frame.
Change-Id: I08a1901f8f3ec16e45f96297e08e7f6df0f4aa0b
The stats buffer needs to be reset between runs of the
encoder. I added a Reset() function to TwopassStatsStore
and called it at the beginning of each encode.
This enables us to run multiple encodes which was
previously not possible since there was no way to reset
the stats between runs.
Change-Id: Iebb18dab83ba9331f009f764cc858609738a27f9
Protect the call to {Initialize,Delete}CriticalSection() with an
Interlocked{Inc,Dec}rement() pair, rather than the previous static
initialization. This should play better with AppVerifier, and fix issue
http://code.google.com/p/webm/issues/detail?id=467
Change-Id: I06eadbfac1b3b4414adb8eac862ef9bd14bbe4ad
The codec as it stood placed a keyframe one frame after a
real cut scene - and ignored datarate and other considerations.
TODO: Its possible that we should detect a keyframe and recode
the frame ( in certain circumstances) to improve quality.
Change-Id: Ia1fd6d90103f4da4d21ca5ab62897d22e0b888a8
Reset the cyclie refresh mode index in alloc_compressor_data().
This is needed to handle both cases of internal and
external spatial resizing.
Change-Id: I2697e12d45135eae2e8f0d45161811f24722312a
On an internal spatial resize, this mode index was not reset to 0,
and therefore could exceed dimensions of seg_map or cyclic_refresh_map.
Change-Id: I6fe85dbd2765eb0207a9d9f71fda8d8b8c34f075
Patch Set 1: gain familiarity with unit tests... added simple
4x4 subtract test
Patch Set 2: fixed mistakes, parameterized as suggested
Patch Set 3: randomized the source/predictor data
Change-Id: I33432bdf7c9f2a9b8c2533a37106382c2a8209ee
Signed-off-by: Scott LaVarnway <slavarnway@google.com>
Fixes some build issues for people building for win32 who have a
pthreads emulation layer installed.
Change-Id: I0e0003fa01f65020f6ced35d961dcb1130db37a8
This should avoid problems with blocks gettings high quality
improvement despite having recently moved:
Change-Id: Ic0af0de2d6577807fa3c553f47b55d547ef36359
Set the seg map to 0 for key frame.
In previous commit on cyclic refresh, the seg map for key frame
was not reset, and instead used the seg map from last frame.
Change-Id: I848eb2face420dfcd2f7daca6f070b9127ca938b
-Increase the amount of mbs to be refreshed.
-Replace the delta qp with a fixed and reduced delta.
-Change to the mb update loop to try to always update same amount of mbs.
Change-Id: I93ac88002fd8dc677d2337f77998ff93f64e4ff9
This change is necessary for the frame-based multithreading
implementation.
Since the postproc occurs in this call, vpxdec was modified to time around
vpx_codec_get_frame()
Change-Id: I389acf78b6003cd35e41becc16c893f7d3028523
Initialize the top line at the beginning of each frame and
the left column at the beginning of each row.
Change-Id: I5412f7ea49ffc490215cf65a62715a6c5e3a5a29
Multiple decoders were getting allocated per frame.
If the decoder crashed we exitted with out freeing
them and the next time in we'd allocate over.
This fix removes the allocation and just has 8
boolcoders in the pbi structure
Change-Id: I638b5bda23b622b43b7992aec21dd7cf6f6278da
If the decoder crashes and returned an error before it set up
block offsets but after it set up frame buffers. We had a
problem decoding the next keyframe because the block offsets
were never set.
Change-Id: Ied2866e9770d80fc66241d5e0d978d4f5f9cdd89
Adjusts some of the qualification thresholds in mfqe to eliminate
artifacts due to wrong decisions. Besides, a new qualification
criteria is used to disable mfqe if the quality of the previous
frame is itself not too good.
Change-Id: I4097c20b7fd4fcc60cc3003c1e33e8faae2ff066
The denoiser function was modified to reduce the computational
complexity.
1. The denoiser c function modification:
The original implementation calculated pixel's filter_coefficient
based on the pixel value difference between current raw frame and last
denoised raw frame, and stored them in lookup tables. For each pixel c,
find its coefficient using
filter_coefficient[c] = LUT[abs_diff[c]];
and then apply filtering operation for the pixel.
The denoising filter costed about 12% of encoding time when it was
turned on, and half of the time was spent on finding coefficients in
lookup tables. In order to simplify the process, a short cut was taken.
The pixel adjustments vs. pixel diff value were calculated ahead of time.
adjustment = filtered_value - current_raw
= (filter_coefficient * diff + 128) >> 8
The adjustment vs. diff curve becomes flat very quick when diff increases.
This allowed us to use only several levels to get a close approximation
of the curve. Following the denoiser algorithm, the adjustments are
further modified according to how big the motion magnitude is.
2. The sse2 function was rewritten.
This change made denoiser filter function 3x faster, and improved the
encoder performance by 7% ~ 10% with the denoiser on.
Change-Id: I93a4308963b8e80c7307f96ffa8b8c667425bf50
Replace DECLARE_ALIGNED_ with vpx_memalign()
DECLARE_ALIGNED (__declspec(align())) does not work as intended when
used on class data members:
Data in classes or structures is aligned within the class or structure
at the minimum of its natural alignment and the current packing setting
(from #pragma pack or the /Zp compiler option)
Change-Id: I304aaa6c3716fbfae24675ecf192f4b40787e83e
For videos with big static background(such as video conferencing
clips), the mode decision was biased to ZEROMV in order to
obtain a stable background. The percentage of ZEROMV on last
frame was used to predict if there is static area in current frame,
and checking already-encoded neighboring macroblocks' motion
vectors to make sure the local area has low motion.
Change-Id: I05b3241d3a56a0bda88b6681e5646c1c8baf2e57
The current way of counting inter_zz_count doesn't work correctly
in multi-threaded encoding. Calculating it after the frame is
encoded fixed the problem.
Change-Id: Ifcb1972cde950b8cc194f75c6d7b6af09e8b0e65
Loop filter producing wierd artifacts when
repeatedly applied in noisy video. This
mitigates the effect.
Change-Id: If4b1a8543912d186a486f84e11d8b01f7436fa5f
visual studio targets do not depend on executables, only the projects
produced.
tested with --target=x86-win32-vs9
fixes:
...
make[1]: *** No rule to make target `test_libvpx', needed by `.bins'.
Stop.
Makefile:17: recipe for target `.DEFAULT' failed
Change-Id: I606ab32d5e26fee352f25c822e0f496eff165382
The current parsing logic of the dumpmachine tuple lacks any arm
cases which means tgt_isa never gets set, so for all arm targets,
we get detected as generic-gnu. Add some basic arm checks here
so the automatic detection logic works.
Change-Id: Ie5e98142876025c6708604236bc519c0bdb09319
If you build with --enabled-shared on a Linux arch not explicitly
listed, the configure script will abort because it didn't detect
"linux" in the fallback generic-gnu tuple.
Since this is the fallback tuple and people are passing
--enable-shared, assume the user knows what they're in for.
Change-Id: Ia35b657e7247c8855e3a94fca424c9884d4241e3
using large values for the timebase, e.g., {33333, 1000000} could
rollover the timestamp calculation in vp8e_encode as it was not using
64-bit math.
originally reported on ffmpeg's trac:
https://ffmpeg.org/trac/ffmpeg/ticket/1014
BUG=468
Change-Id: Iedb4e11de086a3dda75097bfaf08f2488e2088d8
Interleaved loopfiltering with decode. For 1080p clips, up to 1%
performance gain. For 4k clips, up to 10% seen. This patch is required
for better "frame-based" multithreading.
Change-Id: Ic834cf32297cc04f27e8205652fb9f70cbe290db
predict_d has become canonical. Remove previous helper function.
Disable ARM assembly pending update.
Change-Id: Idd84ac8a28f9b0221ea97904a77de1e705d06a7d
The sync interval for the multithreaded encoder was considered as not changing
during the encoding. This is not true if picture size is changed.
The encoder could dead-lock because the main thread and the other threads were
using different sync interval.
Change-Id: I75232bbdbc6c02d77f830d870fd8b4e96697c64e
After the picture size was changed to a bigger one, the internal memory was
corrupted and multithreaded encoder was deadlocking.
Memory for last frame's MVs, segmentation map and active map were allocated when
the compressor was created (vp8_create_compressor). Buffers need to be
reallocated when picture size is changed, so, the allocation was moved to
vp8_alloc_compressor_data, which is called every time the picture is resized.
Change-Id: I7ce16b8e69bbf0386d7997df57add155aada2240
The ambient qp and active worse/best qp were reset for every frame
when temporal layers is on. This change removes this reset.
As this affects the target size for forced key frames
(it will actually lower the size somewhat), we increased the
inital boost factor to compensate.
Change-Id: Ie38d95f5c99ab3d447469c49e2177bc3fcc4ad28
SAD returns unsigned values. Make all the declarations the same.
Remove bestsad initialization and check. It is always set to the
result of a SAD call so it will never remain UINT_MAX
Use ja instead of jg to test unsigned comparison instead of signed.
Update test.
Change-Id: I46336ab45f4e60fc37caf20bd36bc5782079c7a5
Precalculated block ptrs do not need updates during encoding.
Set these at init stage.
Moved the allocation of 'mt_current_mb_col' (last encoded MB on each
row) to vp8_alloc_compressor_data(), so that it is correctly
reallocated when frame size is changing.
Change-Id: Idcdaa2d0cf3a7f782b7d888626b7cf22a4ffb5c1
Added drop_frame support in multi-resolution encoder.
If one frame is dropped at a lower-resolution level, the next
upper-resolution level encoder needs to encode that frame
independently without any lower-resolution level motion
information.
Another issue is that if one frame is dropped at some but not all
resolution levels, a frame after that one may use different set
of reference frames at different resolution levels. This reference
frame asynchronization could degrade motion search precision in
upper-resolution level encoding, which uses lower-resolution level
motion result. This change compares the lower-resolution and upper-
resolution level's reference frames. If they are not the same, the
upper-resolution level encoder can not use lower-resolution level
motion result.
Change-Id: I61afa4f313630e75b7cbdd5742e230e8724a988a
Added validity checking in multi-res encoder. Disable spatial
resampling and frame dropping before we have those supports.
Also, deallocate the memory for all resolution levels once error
occurs.
Change-Id: Ia5d65a645381cad1a49940ab3a19bb5696c39c09
The lower complexity modes may not generate a keyframe automatically.
This behavior was found when running under Valgrind, as the slow
performance caused the speed selection to pick lower complexities than
when running natively. Instead, use a fixed complexity for the
realtime auto keyframe test.
Affected tests:
AllModes/KeyframeTest.TestAutoKeyframe/0
Change-Id: I44e3f44e125ad587c293ab5ece29511d7023be9b
This unit test tests vp8_sixtap_predict function against preset
data and random generated data. The test against preset data
checks the correctness of the functions, and the test against
random data checks if the optimized six-tap predictor functions
generate matching result as the c functions. It tests the
following functions:
vp8_sixtap_predict16x16_c
vp8_sixtap_predict16x16_mmx
vp8_sixtap_predict16x16_sse2
vp8_sixtap_predict16x16_ssse3
vp8_sixtap_predict8x8_c
vp8_sixtap_predict8x8_mmx
vp8_sixtap_predict8x8_sse2
vp8_sixtap_predict8x8_ssse3
vp8_sixtap_predict8x4_c
vp8_sixtap_predict8x4_mmx
vp8_sixtap_predict8x4_sse2
vp8_sixtap_predict8x4_ssse3
vp8_sixtap_predict4x4_c
vp8_sixtap_predict4x4_mmx
vp8_sixtap_predict4x4_ssse3
Change-Id: I6de097898ebca34a4c8020aed1e8dde5cd3e493b
This patch fixed issue 458 by calling copy function when both
offsets are 0, which guarantees the SSSE3 functions output
same result as the c function for all possible offsets.
Change-Id: I209aec7a4c6b3362db2646a8887c1038493b6496
xd->subpixel_predict16x16 is called in first pass, but isn't
initialized in first pass, which causes segfault. This patch
fixed that problem.
Change-Id: Ibd2cad4e2d32ea589fc3e0876d60d3079ae836e7
We need an easy way to build the unit test driver without running the
tests. This enables passing options like --gtest_filter to the
executable, which can't be done very cleanly when running under
`make test`.
Fixed a number of compiler errors/warnings when building the tests
in various configurations by Jenkins.
Change-Id: I9198122600bcf02520688e5f052ab379f963b77b
Fixed the quantifier that optionally matches a quote before the
filename. This was originally reported in the homebrew project[1].
Note that this fix is different than patch posted there, as there are
some platforms that don't have the quote, so it needs to be included
in the expression optionally.
[1]: https://github.com/mxcl/homebrew/issues/12567#issuecomment-6434000
Change-Id: Ibf2ed93ce169d80932e877f942dc4eeb03867f8b
Update the comment that defines the allowed ranges for
delta_q and delta_lf that can be used with segmentation.
Change-Id: Ie56ad6f946704259e03ffd49921a4cfb7e1e2f1f
The commit introduces a make target 'testdata' that downloads the
required test data from the WebM project website. The data will also
be downloaded if invoking `make test` but is not a strict requirement
for only building the test executable.
The download directory is taken from the LIBVPX_TEST_DATA_PATH
environment variable, or may be specified as part of the make command.
If unset, it defaults to the current directory. It's expected that
most developers will want to set this environment variable to a place
outside their source/build trees, to avoid having to download the data
more than once.
To add test data file:
1) add a line to test/test.mk:
LIBVPX_TEST_DATA-yes += foo-bar-file.y4m
2) add its sha1sum to the test/test-data.sha1 file in the following
format:
528cc88c821e5f5b133c2b40f9c8e3f22eaacc4c foo-bar-file.y4m
3) upload the file to the website
$ gsutil cp foo-bar-file.y4m gs://downloads.webmproject.org/test_data/libvpx
This implementation will check the integrity of the test data
automatically if the `sha1sum` executable is available.
Change-Id: If6910fe304bb3f5cdcc5cb9e5f9afa5be74720d2
This is a unit test for the post-processing functions:
- vp8_post_proc_down_and_across_c
- vp8_post_proc_down_and_across_mmx
- vp8_post_proc_down_and_across_xmm
Change-Id: Iec3e690327b17470209c00417835473f6d9a35d6
Disable unit-tests. The logging in GTest would need to be adjusted.
Restructure ARM cpu detection. Flatten if-else logic.
Change #if defined(HAVE_*) to #if HAVE_* because we only need to check
for features that the library was actually built with. This should have
been harmless, as disabled feature sets wouldn't have any features to
call.
Change-Id: Iea21aa42ce5f049c53ca0376d25bcd0f36f38284
Changes relating to Issue 411
Removed code that was clearing down the segmentation data each
frame.
Added range/parameter checking in vp8_set_roimap(); Return error
if called when cyclic_refresh is enabled.
Correct setup_features() so that it sets or clears the segment update
flags as appropriate.
Change-Id: Ib31ac53006640ddf1ba7b9ec8f8b952e3eff860a
Soft enable runtime cpu detect for armv7-android target, so that it
can be disabled and remove dependency on 'cpufeatures' lib.
Change the arm_cpu_caps implementation selection such that 'no rtcd' takes
precedence over system type.
Switch to use -mtune instead of -mcpu. NDK was complaining about
-mcpu=cortex-a8 conflicting with -march=armv7-a, not sure why.
Add a linker flag to fix some cortex-a8 bug, as suggested by NDK Dev
Guide.
Examples:
Configure for armv7+neon:
./configure --target=armv7-android-gcc \
--sdk-path=/path/to/android/ndk \
--disable-runtime-cpu-detect \
--enable-realtime-only \
--disable-unit-tests
...armv7 w/o neon:
./configure --target=armv7-android-gcc \
--sdk-path=/path/to/android/ndk \
--disable-runtime-cpu-detect \
--enable-realtime-only \
--disable-neon \
--cpu=cortex-a9 \
--disable-unit-tests
Change-Id: I37e2c0592745208979deec38f7658378d4bd6cfa
The function vp8_post_proc_down_and_across_c takes the
stride of both the src and dst images as parameters, but
assumes that they are the same.
I modified the code to use the correct strides, as the
assembler versions of these functions do.
Change-Id: I222715b774cd071b21c15a4b0d2f4aef64a520de
vpx uses symbols in libm and thus we need to provide an indication to
the user of libvpx that if they want to link against libvpx they must
also link against libm.
Change-Id: I31d4068bf7f6f5b1fd222bcdf9e6a1a92fb6696f
Avoid a pthreads dependency via pthread_once() when compiled with
--disable-multithread.
In addition, this synchronization is disabled for Win32 as well, even
though we can be sure that the required primatives exist, so that the
requirements on the application when built with --disable-multithread
are consistent across platforms.
Users using libvpx built with --disable-multithread in a multithreaded
context should provide their own synchronization. Updated the
documentation to vpx_codec_enc_init_ver() and vpx_codec_dec_init_ver()
to note this requirement. Moved the RTCD initialization call to match
this description, as previously it didn't happen until the first
frame.
Change-Id: Id576f6bce2758362188278d3085051c218a56d4a
Allows building the library with the gcc -pedantic option, for improved
portabilty. In particular, this commit removes usage of C99/C++ style
single-line comments and dynamic struct initializers. This is a
continuation of the work done in commit 97b766a46, which removed most
of these warnings for decode only builds.
Change-Id: Id453d9c1d9f44cc0381b10c3869fabb0184d5966
Frame dropping decision is made by evaluating both current frame
and next frame's buffer_level. If both buffer_levels are less
than drop_mark, next frame is dropped. When frame dropping is
over, namely, buffer_level becomes normal again, we need to
reset decimation_count to 0.
Change-Id: Iae182612e61e0da367fbd43afdc90738d975d1a3
The logic for spatial resizing is done after the Q is selected for the
frame. This causes a problem that the Q we select for the (resized)
key frame may be based on a different resolution than the frame we
will encode.
This fix is to ensure that, when resize is on, the selected Q is still
based on the resolution of the frame to be encoded.
Change-Id: Ia49a9eac5f64e48d1c00dfc7ed4ce26fe84d3fa1
The change in assembly offset files to define values as const int broke
Windows build, because the variables are stored in .rdata section instead
of .data section.
This CL changes the integer peeking from .data to .rdata.
Change-Id: I87e465ddcc78d39ec29f3720ea7df0ab807d5512
This change is to allow obj_int_extract to extract all integers
in the data segment. With the const keyword these variables are
forced into the .rodata segment even for zero variable value.
We had a problem before that zero valueed variables would get
assigned to BSS segment that fooled obj_int_extract to give
incorrect values.
Change-Id: Icd94f80a8ab356879894ca508bf132d20b865299
Update the Visual Studio builds to support the new monolithic unit
test binary.
Includes minor semi-cosmetic refactoring of solution.mk, as the
%vpx.vcproj match is no longer appropriate given the test_libvpx
target.
Change-Id: I29e6e07c39e72b54a4b3eaca5b9b7877ef3fb134
Compares the sum of differences between the input block and the averaged
block. If they differ too much the block will not be filtered. Negligible
perfomance hit.
Change-Id: Ib1c31a265efd4d100b3abc4a1ea6675038c8ddde
Add PRIVATE macro for adding private_extern directive for yasm
to hide global symbols. This is only enabled if -DCHROMIUM is used
with YASM.
Also fixed a small problem with rtcd_defs.sh to guard TEMPORAL_DENOISING.
Change-Id: I9027fce3ebddcf20078293e4b86b396f21da7857
This extends the denoiser to work for temporally scalable
coding.
I believe this also fixes a very rare but really bad bug in the original
implementation.
Change-Id: I8b3593a8c54b86eb76f785af1970935f7d56262a
The change in assembly offset files to define values as const int broke
Windows build, because the variables are stored in .rdata section instead
of .data section.
This CL changes the integer peeking from .data to .rdata.
Change-Id: I87e465ddcc78d39ec29f3720ea7df0ab807d5512
Compares the sum of differences between the input block and the averaged
block. If they differ too much the block will not be filtered. Negligible
perfomance hit.
Change-Id: Ib1c31a265efd4d100b3abc4a1ea6675038c8ddde
Adds a test that ensures the application is able to trigger frame size
changes via vpx_codec_enc_config_set()
Change-Id: I231c062e533d75c8d63c5f8a5544650117429a63
This extends the denoiser to work for temporally scalable
coding.
I believe this also fixes a very rare but really bad bug in the original
implementation.
Change-Id: I8b3593a8c54b86eb76f785af1970935f7d56262a
This change is to allow obj_int_extract to extract all integers
in the data segment. With the const keyword these variables are
forced into the .rodata segment even for zero variable value.
We had a problem before that zero valueed variables would get
assigned to BSS segment that fooled obj_int_extract to give
incorrect values.
Change-Id: Icd94f80a8ab356879894ca508bf132d20b865299
Add PRIVATE macro for adding private_extern directive for yasm
to hide global symbols. This is only enabled if -DCHROMIUM is used
with YASM.
Also fixed a small problem with rtcd_defs.sh to guard TEMPORAL_DENOISING.
Change-Id: I9027fce3ebddcf20078293e4b86b396f21da7857
After a key frame encoding, the frame type could change while
filtering is still going on. Pass the frame type as parameter to the
loopfilter function and don't read it from common storage.
vp8cx_set_alt_lf_level has to be done before packing the stream.
Currently alt_lf_level is not used so there hasn't been any visible
problem here.
Change-Id: Ia114162158cd833c2b16e3b89303cc9c91f19165
The two-pass code does not support the case where the application
changes the frame size dynamically. Add this case to the validation
checks in the vpx_codec_enc_config_set() path.
Change-Id: Idadc42c7c3bd566ecdbce30d8dd720add097f992
* changes:
Add initial keyframe tests
Move all tests to test/ directory
Enable unit tests by default
Build unit tests monolithically
configure: initial support for CXX, CXXFLAGS variables
Resolution changes in calls to vpx_codec_enc_config_set() would cause
a memory leak due to failing to release the lookahead and alt ref
buffers.
Change-Id: I48392ea25e71fe2760d60cfde3fb3874598cc85f
Reverted part of change in memory alllocation code, which ensures
that the function returns 0 and encoder works correctly when
CONFIG_MULTI_RES_ENCODING isn't turned on.
Change-Id: Id5d5e7f2c8bd9e961a6dca79d257e8185f0d592a
After a key frame encoding, the frame type could change while
filtering is still going on. Pass the frame type as parameter to the
loopfilter function and don't read it from common storage.
vp8cx_set_alt_lf_level has to be done before packing the stream.
Currently alt_lf_level is not used so there hasn't been any visible
problem here.
Change-Id: Ia114162158cd833c2b16e3b89303cc9c91f19165
Implements a couple simple tests of the encoder API using the gtest
framework:
TestDisableKeyframes
TestForceKeyframe
TestKeyframeMaxDistance
Change-Id: I38e93fe242fbeb30bb11b23ac12de8ddc291a28d
Rework unit tests to have a single executable rather than many, which
should avoid pollution of the visual studio project namespace, improve
build times, and make it easier to use the gtest test sharding system
when we get these going on the continuous build cluster.
Change-Id: If4c3e5d4b3515522869de6c89455c2a64697cca6
Use CXX rather than assuming g++ to invoke the compiler. Also introduce
separate CXXFLAGS, as certain CFLAGS we enable by default cause warnings
with g++.
Change-Id: Ia2f40ef27c93e45c971d070cc58bdcde9da2ac7c
aligned buffers improve performace. this change brings vpxenc &
vp8_scalable_patterns in line with the other examples.
Change-Id: I4cf9f3e4728b901161905dd7ccb092e774ffb15f
In multi-resolution encoding, frame_type decision for each frame
is made by the lowest-resolution encoder. For all other higher-
resolution encoders, kf_mode is always set to VPX_KF_DISABLED,
and they are forced to use the same frame_type picked by the
lowest-resolution encoder.
Change-Id: Ic4d52ec65bbc012ca9c2d236210e28a295591eaf
Allow CHOST to override the gcc -dumpmachine output. This allows to
use the target autodetection code when cross compiling by setting the
CHOST variable.
On Gentoo, we would like to support easy cross-compilation, and for
libvpx this would basically mean copying the code in
build/make/configure.sh to setup the right --target option. It seems a
lot easier to let it guess by itself.
Another option I considered was using CROSS-gcc instead but this would
not work for our multilib setups: They use gcc -m32 to build 32bits
binaries and gcc -m32 -dumpmachine will output the 64bits version,
which would then make libvpx wrongly believe it is building for a
64bits architecture.
Change-Id: I05a19be402228f749e23be7473ca53ae74fd2186
Accept the same range of inputs for the VP8E_SET_NOISE_SENSITIVITY
control, regardless of whether temporal denoising is enabled or not.
This is important for maintaining compatibility with existing
applications.
Change-Id: I94cd4bb09bf7c803516701a394cf1a63bfec0097
Adds a unit test to the boolcoder that tests encoding
and decoding thousands of different bits, with different
probabilities in different patterns.
Code borrowed from the webp project - and its committers.
Change-Id: Icabbb884d57e666496490c961dd29b246144ab3e
Make functions only referenced from one translation unit static. Other
symbols with extern linkage get a vp8/vpx prefix.
Change-Id: I928c7e0d0d36e89ac78cb54ff8bb28748727834f
Change If4321cc5 fixed a bug caused by forward declarations not being
kept in sync across C files, resulting in a function call with the
wrong arguments. The commit moves the affected function declarations
into a header file, along with the other symbols from encodeframe.c
that were being sloppily shared.
Change-Id: I76a7b4c66d4fe175f9cbef7e52148655e4bb9ba1
Removes all runtime initialization of global data. This commit is a
squashed version of the following series cherry-picked from master.
This is necessary because of a change that was merged to the tester
that depends on the scaler being moved to the RTCD framework, which
is a worthwhile thing to include in Eider anyway.
- a91b42f02 Makes all global data in entropy.c const
- b35a0db0e Makes all global data in tokenize.c const
- 441cac8ea Makes all mode token tables const
- 5948a0210 Ports vpx_xcaler to new RTCD method
- 317d4244c Makes all mode token tables const part 2
Change-Id: Ifeaea24df2b731e7c509fa6c6ef6891a374afc26
Move the notion of 0 bitrate implying skip deeper into the codec,
rather than doing it at the multi-encoder API level. This preserves
v1.0.0 ABI compatibility, rather than forcing a bump to v2.0.0 over a
minor change. Also, this allows the case where the application can
selectively enable and disable the larger resolution(s) without having
to reinitialize the codec instace (for instance, if no target is
receiving the full resolution stream).
It's not clear how deep to push this check. It may be valuable to
allow the framerate adaptation code to run, for example. Currently put
the check as early as possible for simplicity, should reevaluate this
as this feature gains real use.
Change-Id: I371709b8c6b52185a1c71a166a131ecc244582f0
Original patch by Ginn Chen <ginn.chen@oracle.com> against libvpx
v0.9.0.
I've forward-ported it to the current version (which mostly
involved removing hunks that were no longer relevant), since I've
given up on getting Ginn to submit this upstream himself.
Change-Id: I403c757c831c78d820ebcfe417e717b470a1d022
Besides imposing a performance penalty at startup in most
configurations, these relocations break the dynamic linker for
native Fennec, since it does not support them at all.
Change-Id: Id5dc768609354ebb4379966eb61a7313e6fd18de
These are warnings in most builds, but show up as compile errors on
some platforms when these headers are included from C++ code.
Change-Id: I6c523b4dbbc699075fe73830442b51922e5a61d5
These are warnings in most builds, but show up as compile errors on
some platforms when these headers are included from C++ code.
Change-Id: I6c523b4dbbc699075fe73830442b51922e5a61d5
These values can be overridden with some poorly documented and
overloaded options: --libc and --sdk-path
../libvpx/configure --target=armv7-darwin-gcc --sdk-path=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer --libc=/Applications/Xcode.app/Contents/Developer/Platforms/iPhoneOS.platform/Developer/SDKs/iPhoneOS5.1.sdk/
So for someone who still wants to build with the iOS 5 SDK, the last
part of the path should be iPhoneOS5.0.sdk
Change-Id: Ibe93d96ae828c619700dc3222983aa4c30456b88
The ARNR filter uses MC to find the best match between the
ARF and other nearby frames in the filter-set. Since the
ARF is a member of the filter-set, MC in that case is
unnecesssary. This patch modifies the filter so it does
not apply MC in this case.
Change-Id: Ic0321199c08db2189a57f28d1700b745bc7ff66d
The ARNR filter uses a motion compensated temporal filter,
but the motion estimation implementation accounts for the
cost of the mv in its decision making process. The ARNR
filter uses a dummy cost table initialized to 0 as a way
to ignore the mv costs (which are irrelevant to the filter).
This CL modifies the ARNR filter implementation so that
the mv costing is ignored without the requirement for
dummy tables.
Change-Id: I4196aa5c24da63f858ff54fbaa5fc85ae1f1957f
The backup MODE_INFO buffer used in the error concealment was
allocated in the codec common parts allocation even though this is a
decoder only resource. Moved the allocation to the decoder side.
No need to update_mode_info_border as mode_info buffers are zero
allocated.
This fixes also a potential memory leak as the EC overlaps buffer was not
properly released before reallocation after a frame size change.
Change-Id: I12803d3e012308d95669069980b1c95973fb775f
Rather than using the static default maximum keyframe spacing
provided by vpx_codec_enc_config_default() set the default
value to 5 times the frame rate. Five seconds is too long for
live streaming applications, but is a compromise between seek
efficiency and giving the encoder freedom to choose keyframe
locations.
The five second value is from James Zern's suggestion in
http://article.gmane.org/gmane.comp.multimedia.webm.user/2945
Change-Id: Ib7274dc248589c433c06e68ca07232e97f7ce17f
Race was introduced by https://gerrit.chromium.org/gerrit/15563.
If loopfilter related config params were changed between frames, or
after a KEY frame, there could be a mismatch between encoder's and
decoder's recontructed image. In worst case, when frame buffers are
reallocated because of a size change, the loopfilter could
do an invalid data access (segmentation fault).
Fixes:
Sync with the loopfilter before applying any encoder changes in
vp8_change_config().
Moved the loopfilter synching to the top of
encode_frame_to_data_rate() so that it's done before any alteration of
the encoder.
Change-Id: Ide5245d2a2aeed78752de750c0110bc4b46f5b7b
Increment the last_row_mb_col counter by nsync after last MB of row is
ready. This way we dont need to check for last MB of row when
synching.
Set last MB of row ready just after row extension is done, This avoids
o potential race condition whit the processing of last MB of next row.
Change-Id: I19c44fd6041116ee5483be2143b4f4bfcd149eac
RD costs were local to MACROBLOCK data and had to be copied all the
time to each thread's MACROBLOCK data. Tables moved to a common place
and only pointers are setup for each encoding thread.
vp8_cost_tokens() generates 'int' costs so changed all types to be
int (i.e. removed unsigned).
NOTE: Could do some more cleaning in vp8cx_init_mbrthread_data().
Change-Id: Ifa4de4c6286dffaca7ed3082041fe5af1345ddc0
The block pointers and offset do not need to be calculated for every
frame. Block internal predictors can be update once when decoder is
allocated. Destination and previous buffer offsets have to be updated
just when frame size is changing.
Change-Id: I92ca8df0e6aaac4cc35ab890751d446760bf82e2
Key frame macrobock and block mode probabilities are constant.
Remove the allocation of tables for each codec instance and use
instead the default const prob tables.
Change-Id: I8361798ac491f9b3889e86925a494c58647c753f
Move the notion of 0 bitrate implying skip deeper into the codec,
rather than doing it at the multi-encoder API level. This preserves
v1.0.0 ABI compatibility, rather than forcing a bump to v2.0.0 over a
minor change. Also, this allows the case where the application can
selectively enable and disable the larger resolution(s) without having
to reinitialize the codec instace (for instance, if no target is
receiving the full resolution stream).
It's not clear how deep to push this check. It may be valuable to
allow the framerate adaptation code to run, for example. Currently put
the check as early as possible for simplicity, should reevaluate this
as this feature gains real use.
Change-Id: I371709b8c6b52185a1c71a166a131ecc244582f0
Look for changes in the codec's configured w/h instead of its active
w/h when forcing keyframes. Otherwise calls to vp8_change_config()
will force a keyframe when spatial resampling is active.
Change-Id: Ie0d20e70507004e714ad40b640aa5b434251eb32
(see Change I9b2ccc88: Makes all mode token tables const)
Further remove runtime table initialization and use
precalculated const data. Data footprint reduced
by 4112 bytes.
Change-Id: Ia3ae9fc19f77316b045cabff01f6e5f0876a86ab
Ensure that RTCD function pointers are set at most once, to silence
some data race warnings. Implementation provided for POSIX threads and
Win32, with the prior unsynchronized behavior left in place for other
platforms.
Change-Id: I65c5856df43ef67043b3d5f26ddafddd8fcb2f7e
We can get rid of all remaining global initializers now:
vp8_scale_machine_specific_config()
vp8_initialize()
vp8dx_initialize()
Change-Id: I2825cea5d1c01ad9f6c45df49a0f86d803bfeb69
Mode token tabels precalculated in entropymode.c.
Removes vp8_initialize_common()as all common global data
is precalculated const now.
Change-Id: I9b2ccc883e4f618069e1bc180dad3a823394eb73
Removes all runtime initialization of global data in tokenize.c.
DCT token and cost tabels are pre-generated.
Second patch in a series to make sure code is reentrant.
Change-Id: Iab48b5fe290129823947b669413101f22a1bcac0
Removes all runtime initialization of global data in entropy.c.
Precalculated values are used for initializing all entropy related
tabels.
First patch in a series to make sure code is reentrant.
Change-Id: I9aac91a2a26f96d73c6470d772a343df63bfe633
When producing an invisible ARF, the time stamp counters aren't
updated since the last time stamp is seen by the codec twice. The
prior code was trapping this case with refresh_alt_ref, but this isn't
correct for other uses of the ARF. Instead, use the show_frame flag.
Change-Id: If67fff7c6c66a3606698e34e2fb5731f56b4a223
Local variable offsets are now consistent for the functions,
removed unused parameters, reworked the assembly to eliminate
stalls/instructions.
Change-Id: Iaa37668f8a9bb8754df435f6a51c3a08d547f879
Failed to build on Linux (as described in Android.mk) with NDK r7b.
Set vpx_rtcd.h dependency after libvpx sources are added to
LOCAL_SRC_FILES so that vpx_rtcd.h is generated before any libvpx file
is touched.
Change-Id: Ibe19d485ca9f679dc084044df0e3fb14587c4d3e
This patch includes:
1. fixes to disable block based termporal mixing when motion
is detected (because this version of mfqe only handles zero motion).
2. The criterion used for determining whether to mix or
not are changed to use squared differences rather than
absolute differences.
3. Additional checks on color mismatch and excessive block
flatness added. If the block as decoded has very low activity
it is unlikely to yield benefits for mixing.
Change-Id: I07331e5ab5ba64844c56e84b1a4b7de823eac6cb
In cases where you have a flat background occluded by a moving object
of similar luminosity in the foreground, it was likely that the
foreground blocks would persist for a few frames after the background
is uncovered. This is particularly noticable when the object has a
different color than the background, so add the chroma planes in as an
additional check.
In addition, for block sizes of 8 and 16, the luma threshold is
applied on four subblocks independently, which helps when only part of
the background in the block has been uncovered.
This fixes issue #392, which includes a test clip to reproduce the
issue.
BUG=392
Change-Id: I2bd7b2b0e25e912dcac342e5ad6e8914f5afd302
When using 'make dist' after --disable-vp8[encoder|decoder] it would
fail to recognize the option. This would only occur when also specifying
--enable-install-docs and --enable-install-srcs but not
--enable-codec-srcs
Including vpx/ fixes builds with --enable-codec-srcs
vpx_timer.h is also required for vpxenc.c
Change-Id: Ie3e28b2f7ec7ee6d5961d3843f9eab869f79c35b
truncate() operates from the current file pointer position. On at least
Linux specifying 0 without resetting the pointer will pad the file with
zeros to the current offset.
Change-Id: Ide704a1097f46c0c530f27212bb12e923f93e2d6
It's common for commit messages to be wrapped at odd places. git-gui
is often to blame. Adds support for automatically fixing up these
messages if running ftfy --amend, and adds a new option --msg-only for
fixing only the commit message.
Change-Id: Ia7ea529f8cb7395d34d9b39f1192598e9a1e315b
Reduced the size of the struct by 8 bytes, which would be
a memory savings of 64800 bytes for 1080 resolutions. Had
an extra byte, so created an is_4x4 for B_PRED or SPLITMV
modes. This simplified the mode checks in
vp8_reset_mb_tokens_context and vp8_decode_mb_tokens.
Change-Id: Ibec27784139abdc34d4d01f73c09f43e9e10e0f5
This is a utility for applying a limited amount of style correction on
a change-by-change basis. Rather than a big-bang reformatting, this
tool attempts to only correct the style in diff hunks that you touch.
This should make the cosmetic changes small enough that we can mix them
with functional changes without destroying the diffs, and there's an
escape hatch for separating the reformatting to a second commit for
purists and cases where it hurts readability.
At this time, the script requires a clean working tree, so run it after
you've commited your changes. Run without arguments, the style
corrections will be applied and left unstaged in your working copy. It
also supports the --amend option, which will automatically amend your
HEAD with the corrected style, and --commit, which will create a new
change dependent on your HEAD that contains only the whitespace changes.
There are a number of ways this could be applied in an automated manner
if this proves to be useful, either on a project-wide or per-user
basis. This doesn't buy anything in terms of real code quality, the
intent here would be to keep formatting nits out of review comments in
favor of more meaningful ones and help people whose habitual style
doesn't match the baseline.
Requires astyle[1] 1.24 or newer.
[1]: http://astyle.sourceforge.net/
Change-Id: I2fb3434de8479655e9811f094029bb90e5d757e1
This new vp8_decode_mb_tokens() uses a modified version of
WebP's GetCoeffs function. For now, the dequant does not
occur in GetCoeffs.
Tests showed performance improvements up to 2.5% depending
on material.
Change-Id: Ia24d78627e16ffee5eb4d777ee8379a9270f07c5
Adds logic to disable mfqe for the first frame after a configuration
change such as change in resolution. Also adds some missing
if CONFIG_POSTPROC macro checks.
Change-Id: If29053dad50b676bd29189ab7f9fe250eb5d30b3
This change added a motion search skipping mechanism similar
to what we did in second pass. For a macroblock that is very
similar to the macroblock at same location on last frame,
we can set its mv to be zero, and skip motion search. This
improves first-pass performance for slide shows and video
conferencing clips with a slight PSNR loss.
Change-Id: Ic73f9ef5604270ddd6d433170091d20361dfe229
Universal builds create subdirectories for each target. Without
BUILD_PFX we only generated one vpx_rtcd.h instead of one for each.
Change-Id: I1caed4e018c8865ffc8da15e434cae2b96154fb4
Newer XCodes have moved the SDK path from /Developer/SDKs
Use a suggestion from jorgenisaksson@gmail.com to locate it
osx_sdk_dir is not required to be set. Apple now offers a set
command line tools which do not require this. isysroot is also
not required in newer versions of XCode so only set it when we
are confident in the location.
There remain issues with the iOS configure steps which will be
addressed later
Change-Id: I4f5d7e35175d0dea84faaa6bfb52a0153c72f84b
doxygen < 1.7.? seems to have been more tolerant of single line
\if/\endif
This change fixes warnings such as:
mainpage.dox:13: warning: unable to resolve reference to `vp8_encoder-'
for \ref command
vpx_decoder.h:193: warning: explicit link request to 'n' could not be
resolved
Change-Id: If3d04af5ede1b0d1e2c63021d0e4ac8f98db20b2
This issue likely doesn't appear in the unmodified encoder, but
sufficient hacking on the mode selection loop can expose it.
Change-Id: I8a35831e8f08b549806d0c2c6900d42af883f78f
Break MFQE code into it's own file.
It is currently only valid for 16x16 and 8x8 Y blocks. It also filters
4x4 U/V blocks.
Refactor filtering and add associated assembly. Limited test cases show
--mfqe introduces a penalty of ~20% with HD content. The assembly
reduces the penalty to ~15%
Change-Id: I4b8de6b5cdff5413037de5b6c42f437033ee55bf
build/make/version.sh requires CHANGELOG to generate vpx_version.h
The file is already included when building the documentation. However,
documentation is not build if doxygen/php are not present.
This is necessary when using '--enable-install-srcs --enable-codec-srcs'
and 'make dist'
Change-Id: Icada883a056a4713d24934ea44e0f6969b68f9c2
Coefficient costing failed to take account of the first branch
being skipped ( 0 vs eob) if the previous token is 0.
Fixed rd to account for slightly increased token cost & cleaned up
warning message
Change-Id: I56140635d9f48a28dded5a816964e973a53975ef
Propagate debug setting to the EBML struct. When writing the application
name, this allows us to strip the version code and keep the output
metadata static.
Change-Id: I8e06c6abd743bedbff5af6242bbdae5d55754538
When we do 2-pass encoding, elapsed time is accumulated through
whole 2-pass process, which gives incorrect time and fps results
for second pass. This change fixed that by resetting the time
accumulator for second pass.
Change-Id: Ie6cbf0d0e66e6874e7071305e253c6267529cf20
Produce the token partitions on-the-fly, while processing each MB.
Context is updated at the beginning of each frame based on the
previoud frame's counters. Optimally encoder outputs partitions in
separate buffers. For frame based output, partitions are concatenated
internally.
Limitations:
- enabled just in combination with realtime-only mode
- number of encoding threads has to be equal or less than the
number of token partitions. For this reason, by default the encoder
will do 8 token partitions.
- vpxenc supports partition output (-P) just in combination with
IVF output format (--ivf)
Performance:
- Realtime encoder can be up to 13% faster (ARM) depending on the number
of threads and bitrate settings. Constant gain over the 5-16 speed
range.
- Token buffer reduced from one frame to 8 MBs
Quality:
- quality is affected by the delayed context updates. This again
dependents on input material, speed and bitrate settings. For VC
style input the loss seen is up to 0.2dB. If error-resilient=2
mode is used than the effect of this change is negligible.
Example:
./configure --enable-realtime-only --enable-onthefly-bitpacking
./vpxenc --rt --end-usage=1 --fps=30000/1000 -w 640 -h 480
--target-bitrate=1000 --token-parts=3 --static-thresh=2000
--ivf -P -t 4 -o strm.ivf tanya_640x480.yuv
Change-Id: I127295cb85b835fc287e1c0201a67e378d025d76
Eliminated some mb branches along with other code cleanups.
This is part of an ongoing effort to remove cut/paste
code in the decoder.
Change-Id: Ifabb0f67cafa6922b5a0e89a0d03a9b34e9e5752
Reworked the code to use vp8_build_intra_predictors_mby_s,
vp8_intra_prediction_down_copy, and vp8_intra4x4_predict_d_c
functions instead. vp8_intra4x4_predict_d_c is a decoder-only
version of vp8_intra4x4_predict. Future commits will fix this
code duplication.
Change-Id: Ifb4507103b7c83f8b94a872345191c49240154f5
When we encode slide-show clips, for the majority of the time,
only ZEROMV mode is checked, and all other modes are skipped.
This change delayed uv intra-mode evaluation until intra mode is
actually checked. This gave big performance gain for slide-show
video encoding (2nd pass gain: 18% to 28%). But, this change
doesn't help other types of videos.
Also, zbin_mode_boost is adjusted in mode-checking loop, which
causes bitstream mismatch before/after this change when --best
or --good with --cpu-used=0 are used.
Change-Id: I582b3e69fd384039994360e870e6e059c36a64cc
use oxcf instead of common in check to Reinit the
lookahead buffer if the frame size changes
prior behavior would cause assertion fail/crash
first observed in:
support changing resolution with vpx_codec_enc_config_set
Change-Id: Ib669916ca9b4f206d4cc3caab5107e49d39a36aa
When temporal layers is used (i.e., number_of_layers > 1),
we don't use the frame rate boost for setting the key
frame target size. The factor was forcing the target size to be
always at its minimum (2* per_frame_bandwidth) for low frame rates
(i.e., base layer frame rate).
Generally we should modify or remove this frame rate factor;
for now we turn if off for number_of_layers > 1.
Change-Id: Ia5acf406c9b2f634d30ac2473adc7b9bf2e7e6c6
Reworked the code to use vp8_build_intra_predictors_mbuv_s
instead. This is WIP with the goal of eliminating all
functions in reconintra_mt.h
Change-Id: I61c4a132684544b24a38c4a90044597c6ec0dd52
When compiling with -ggdb3 the output includes an extraneous EQU from
vpx_ports/asm_offsets.h
https://trac.macports.org/ticket/33285
Change-Id: Iba93ddafec414c152b87001a7542e7a894781231
In vp8_rd_pick_inter_mode(), if total of eobs is zero, rate needs
to be adjusted since there are no non-zero coefficients for
transmission. The uv intra eobs calculated in
rd_pick_intra_mbuv_mode() need to be saved before they are
overwritten by inter-mode eobs.
Change-Id: I41dd04fba912e8122ef95793d4d98a251bc60e58
mode_info_context->mbmi.mb_skip_coeff has to always reflect the
existence or not of coeffs for a certain MB. The loopfilter needs this
info.
mb_skip_coeff is either set by the vp8_tokenize_mb or has to be set to
1 when the MB is skipped by mode selection. This has to be done
regardless of the mb_no_coeff_skip value.
prob_skip_false is needed just when mb_no_coeff_skip is 1. No need to
keep count of both skip_false and skip_true as they are complementary
(skip_true+skip_false = total_mbs)
Change-Id: I3c74c9a0ee37bec10de7bb796e408f3e77006813
Depending on implementation the optimized SAD functions may return early
when the calculated SAD exceeds max_sad.
Change-Id: I05ce5b2d34e6d45fb3ec2a450aa99c4f3343bf3a
Built in echo in 'sh' on OS X does not support -n (exclude trailing
newline). It's not necessary so just leave it off. Fixes issue 390.
Build include guard using 'symbol' so that it is more likely to be
unique.
Change-Id: I4bc6aa1fc5e02228f71c200214b5ee4a16d56b83
Add the ability to specify multiple output streams on the command line.
Streams are delimited by --, and most parameters inherit from previous
streams.
In this implementation, resizing streams is still not supported. It
does not make use of the new multistream support in the encoder either.
Two pass support runs all streams independently, though it's
theoretically possible that we could combine firstpass runs in the
future. The logic required for this is too tricky to do as part of this
initial implementation. This is mostly an effort to get the parameter
passing and independent streams working from the application's
perspective, and a later commit will add the rescaling and
multiresolution support.
Change-Id: Ibf18c2355f54189fc91952c734c899e5c072b3e0
Refactoring some of the mode decoding logic introduced a bug where
the segmentation maps would not be properly reset on keyframes.
http://code.google.com/p/webm/issues/detail?id=378
The text of the bug is somewhat misleading as I initially read it to
imply the bug was present in v0.9.7-p1 (Cayuga), but note the text
"master", which indicates this was something subsequent. This issue
bisects back to v0.9.7-p1-84-ga99c20c, so unfortunately it was broken
during the Duclair release.
Thanks to Alexei Leonenko for investigating the root cause.
Change-Id: I9713c9f070eb37b31b3b029d9ef96be9b6ea2def
On Android NDK, rand() is inlined function. But, on our SSE optimization,
we need symbol for rand()
Change-Id: I42ab00e3255208ba95d7f9b9a8a3605ff58da8e1
Replace inner loops of pack_mb_row_tokens_c and
pack_tokens_into_partitions_c with a call to pack_tokens_c.
Change-Id: I0341554fb154a14a5dadb63f8fc78010724c2c33
Second shot at this...
Sync with loopfilter thread as late as possible, usually just at the
beginning of next frame encoding. This returns control to application
faster and allows a better multicore scaling.
When PSNR packets are generated the final filtered frame is needed
imediatly so we cannot delay the sync. Same has to be done when
internal frame is previewed.
Change-Id: I64e110c8b224dd967faefffd9c93dd8dbad4a5b5
As an optimization some architectures use the max_sad argument to break
out early from the SAD. Pass in INT_MAX instead of 0 to prevent this.
Change-Id: I653c476834b97771578d63f231233d445388629d
In the variance calculations the difference is summed and later squared.
When the sum exceeds sqrt(2^31) the value is treated as a negative when
it is shifted which gives incorrect results.
To fix this we cast the result of the multiplication as unsigned.
The alternative fix is to shift sum down by 4 before multiplying.
However that will reduce precision.
For 16x16 blocks the maximum sum is 65280 and sqrt(2^31) is 46340 (and
change).
PPC change is untested.
Change-Id: I1bad27ea0720067def6d71a6da5f789508cec265
Allow the application to change the frame size during encoding. This
is only supported when not using lagged compress.
Change-Id: I89b585d703d5fd728a9e3dedf997f1b595d0db0f
MFQE postproc crashed with stream dimensions not a multiple of 16.
The buffer was memset unconditionally, so if the buffer allocation
fails we end up trying to write to NULL.
This patch traps an allocation failure with vpx_internal_error(),
and aligns the buffer dimensions to what vp8_yv12_alloc_frame_buffer()
expects.
Change-Id: I3915d597cd66886a24f4ef39752751ebe6425066
The 5-layer encode must have a keyframe every 16 frames.
The KF flag was being reset after the encode of the first
frame, which it should not do for the 5-layer case
(mode=6).
Change-Id: I207d6e689d347fe3fd1075b97a817e82f7ad53b9
Sometimes, a user doesn't have enough bandwidth to send high-resolution
(i.e. HD) video even though the camera catches HD video. This change
allowed users to skip highest-resolution encoding by setting that level's
target bit rate to 0.
To test it, modify the following line in vp8_multi_resolution_encoder.c.
unsigned int target_bitrate[NUM_ENCODERS]={1400, 500, 100};
To skip the highest-resolution level, change it to
unsigned int target_bitrate[NUM_ENCODERS]={0, 500, 100};
To skip the first and second highest resolution levels, change it to
unsigned int target_bitrate[NUM_ENCODERS]={0, 0, 100};
This change also fixed a small problem in mapping, which slightly helped
quality and performance.
Change-Id: I977bae9a9fbfba85c8be4bd5af01539f2b84bc81
This is the final commit in the series converting to the new RTCD
system. It removes the encoder csystemdependent files and the remaining
global function pointers that didn't conform to the old RTCD system.
Change-Id: I9649706f1bb89f0cbf431ab0e3e7552d37be4d8e
This commit continues the process of converting to the new RTCD
system. It removes the last of the VP8_ENCODER_RTCD struct references.
Change-Id: I2a44f52d7cccf5177e1ca98a028ead570d045395
This is a proof of concept RTCD implementation to replace the current
system of nested includes, prototypes, INVOKE macros, etc. Currently
only the decoder specific functions are implemented in the new system.
Additional functions will be added in subsequent commits.
Overview:
RTCD "functions" are implemented as either a global function pointer
or a macro (when only one eligible specialization available).
Functions which have RTCD specializations are listed using a simple
DSL identifying the function's base name, its prototype, and the
architecture extensions that specializations are available for.
Advantages over the old system:
- No INVOKE macros. A call to an RTCD function looks like an ordinary
function call.
- No need to pass vtables around.
- If there is only one eligible function to call, the function is
called directly, rather than indirecting through a function pointer.
- Supports the notion of "required" extensions, so in combination with
the above, on x86_64 if the best function available is sse2 or lower
it will be called directly, since all x86_64 platforms implement
sse2.
- Elides all references to functions which will never be called, which
could reduce binary size. For example if sse2 is required and there
are both mmx and sse2 implementations of a certain function, the
code will have no link time references to the mmx code.
- Significantly easier to add a new function, just one file to edit.
Disadvantages:
- Requires global writable data (though this is not a new requirement)
- 1 new generated source file.
Change-Id: Iae6edab65315f79c168485c96872641c5aa09d55
Commit 892e23a5b introduced support for the VP8D_GET_LAST_REF_USED,
but missed the mapping of the control id to the underlying function,
so it was unavailable to applications.
In addition, the underlying function vp8_references_buffer() is
moved from common/postproc.c to decoder/onyxd_if.c as postproc.c is
not built in all configurations.
Change-Id: I426dd254e7e6c4c061b70d729b69a6c384ebbe44
Added new 2 and 3 layer prediction frame patterns to
vp8_scalable_patterns.c and modified the coding
parameters.
Change-Id: I18798fd7326a79d2ad1e1d5b6c26f5516b6d247f
Commit e06c242ba introduced a change to call vp8_find_near_mvs() only
once instead of once per reference frame by observing that the only
effect that the frame had was on the bias applied to the motion
vector. By keeping track of the sign_bias value, the mv to use could
be flip-flopped by multiplying its components by -1.
This behavior was subtley wrong in the case when clamping was applied
to the motion vectors found by vp8_find_near_mvs(). A motion vector
could be in-bounds with one sign bias, but out of bounds after
inverting the sign, or vice versa. The clamping must match that done
by the decoder.
This change modifies vp8_find_near_mvs() to remove the clamping from
that function. The vp8_pick_inter_mode() and vp8_rd_pick_inter_mode()
functions instead track the correctly clamped values for both bias
values, switching between them by simple assignment. The common
clamping and inversion code is in vp8_find_near_mvs_bias()
Change-Id: I17e1a348d1643497eca0be232e2fbe2acf8478e1
This commit is incomplete, as it does not synchronize the loop filter
before returning a handle to the reconstructed frame in
vpx_codec_get_preview_frame(), which can cause (false?) failures
when running the test_reconstruct_buffer test.
This may be related to a bug that does cause visible artifacts, which
is also under investigation.
This reverts commit 380d64ecb1.
Change-Id: Iad710941e7731d44fc2bde63bc63d6763cc4629e
A processor with ARMv7 instructions does not
necessarily have NEON dsp extensions. This CL
has the added side effect of allowing the ability
to enable/disable the dsp extensions cleanly.
Change-Id: Ie1e879b8fe131885bc3d4138a0acc9ffe73a36df
Makes the thresholds for the multiframe quality enhancement module
depend on the difference between the base quantizers. Also modifies
the mixing function to weigh the current low quality frame less if
the difference in quantizer is large. With the above modifications
mfqe works well for both scalable patterns as well as low quality
key frames.
Change-Id: If24e94f63f3c292f939eea94f627e7ebfb27cb75
Android.mk file for using the Android NDK build
system to compile. Adds option for SDK path to
use the compiler that comes with android for testing
compiler compliance.
Change-Id: I5fd17cb76e3ed631758d3f392e62ae1a050d0d10
filter in a way such that when there is a single bad frame, the
post-processing is applied not only to just that frame but a
few subsequent frames as well.
Change-Id: Iba5d9896eed77244eb76b4a74692a93f8ecff634
When running multi-layer (ML) encodes and dynamically
changing coding parameters on the fly (e.g. frame
duration/rate, bandwidths allocated to each layer)
the encoder would not produce sensible output.
In certain cases the rate targeting would be
hideously inaccurate.
These fixes make it possible to change these coding
parameters correctly and to maintain accurate control
of the rate targeting.
I also added the specification of the input timebase
into the test program, vp8_scalable_patterns.c.
Patch 2: Moved declaration to appease MS compiler)
Change-Id: Ic8bb5a16daa924bb64974e740696e040d07ae363
Add new get_predictor_pointers() and get_reference_search_order()
functions for code shared between the two implementations.
Change-Id: I1ebe76aa8f168b1f5cfabc00d05d8f19a0d4d207
with deblock or demacroblock filters. When --mfqe is used together
with --demacroblock or --deblock, mfqe is applied first and then
demacroblock/deblock is applied to the mfqe result.
Change-Id: Id83ee01f1b4a33a116f071dcf26d59c7f3497c32
precision used in initialization of roundoff is not a constant
updated to use #define MFQE_PRECISION 4
Change-Id: If2fc3d3d633d58a7f4ab34d258c232ec1e5f0a79
Code was from a time when the compiler was
not good at optimizing. Compilers are better
and instruction sets have increased to make
this code obsolete.
Change-Id: I8d261371685247465eb4aa00bdcecc9fe1784219
The default maximum keyframe interval is 9999, or over five minutes
at normal video rates. When the encoder produces streams with such
a long interval seeking (with correct output) is more expensive,
and live streaming is impossible.
Of course the encoding application should set this parameter
based on its knowledge of the intended use of the stream, but
reducing the default gives better results for applications
which do not.
Change-Id: I900b15d74ce72ecc3ade4d43f758c5cf97a2098a
This patch removes the local copies of the dequantize
constants and implements John's idea as described
in "Make a local copy of the dequantized data" commit.
Change-Id: Ic6b7d681f00bf63263f71ff1e39ab2f80729e8b2
Adds a multiframe postprocessing module to enhance the quality of
certain frames that are coded at lower quality than preceding frames.
The module can be invoked from the commandline by use of the --mfqe
option, and will be most beneficial for enhancing the quality of
frames decoded using scalable patterns.
Uses the vp8_variance_var16x16 and vp8_variance_sad16x16 function
pointers to compute SAD and Variance of blocks.
Change-Id: Id73d2a6e3572d07f9f8e36bbce00a4fc5ffd8961
Current android ndk compiler does not recognize
strings for attributes. Numerical equivalents
can be found in the "ARM IHI 0045C" document.
Change-Id: I72de85b8949dc0ae5212af604fff1d5a91a828ea
Except zrun_zbin_boost, 15 AC values are the same for all other
parameters. Removed unneccessary calculation.
Change-Id: I6101c0fe8080bd2b4387c3b04d7ddedbf6010409
Makes the distribution tree (built with 'make dist') buildable with
--enable-install-srcs --enable-multi-res-encoding
Change-Id: If2ea7632f7b26615196e9abcfaa34618cc50112a
The code had a number of constructs like (condition)?1:0,
which is redundant with C's semantics. In the cases where a boolean
operator was used in the condition, simply remove the ternary part.
Otherwise adjust the surrounding expression to remove the condition
(eg, for rounding up. See pickinter.c and rdopt.c)
Change-Id: Icb2372defa3783cf31857d90b2630d06b2c7e1be
Multithreaded encoding was breaking at low bitrates
Please review/comment. Not sure if this is the best fix.
Change-Id: I87468c765372593fd865bc82e25121ebb8ca6af2
Make bilinearfilter_arm.c compiled only when HAVE_ARMV6, as its definitions
are v6 only. This is normally not a problem for static builds as the file
is elided at link time, but this was not being done properly for the
--enable-shared --enable-pic build.
Change-Id: Ic800a7cde751f74f22555c5b247f99f9df5e550d
Merged multi-resolution motion estimation with regular motion
estimation function in order to remove duplicated part. This
caused slight changes in multi-resulotion encoder quality &
performance.
Change-Id: Ib4ecc7acfebfe5eea959b5b91febae6db7b95fd1
The total_stats, this_frame_stats, and total_left_stats structures
were previously create by a heap allocation, despite being of fixed
size. These structures were allocated and deallocated during
{de,}allocate_compressor_data, which is reinvoked whenever the frame
size changes. Unfortunately, this clobbers the total_stats and
total_left_stats data.
Historically, these were variable size at one time, due to the first
pass motion map, which necessitated their being created by a unique
heap allocation. However, this bug with the total_stats being
clobbered has probably been present since that initial implementation.
These structures are instead moved to be stored within the struct
twopass_rc directly, rather than being heap allocated separately.
Change-Id: I7f9e519e25c58b92969071f0e99fa80307e0682b
When mb_skip_coeff is set, the idct is not necessary. Prior
to this patch, the code would call idcts based on leftover
eob information. This patch will now skip the idct for
SPLIT_MV and clear out the eobs for B_PRED, forcing the idct
to be skipped.
Change-Id: If5b0d2ed3ebd07789d30ec5160df927485fcaa17
These functions are now used by the encoder.
This is WIP with the goal of creating a common idct/add for
the encoder and decoder. A boost of 1.8% was seen for
the HD rt test clip used.
[Tero] Added needed changes to ARM side.
Change-Id: Ibbb8000be09034203d7adffc457d3c3f8b06a5bf
While doing motion search on a macroblock, we usually call
vp8_find_near_mvs once per reference frame. Actually, for
different reference frames, the only difference in calculating
these near_mvs is they may have different sign_bias, which
causes a sign change in resulting near_mvs. In this change, we
only do find_near_mvs for the first reference frame. For other
reference frames, only need to adjust the near_mvs according to
that reference frame's sign_bias value.
Change-Id: I661394b49c6ad79fed7d0f2eb2be239b9c56f149
Add the notion of deferred validation of parameters. We don't want to
validate the cq_level at initialization time, because it won't have
been set via set_param() yet.
Change-Id: Ia1308395e8c10e0b1dc4e9af3a09b2bd6744cc30
The fast-path for skipped MBs was not correctly respecting the
block type during update of the coefficient counts. Extracted
this from part of change I365cfb6ac636f19c545f682e3aeac185253abaef
Change-Id: I53d8cf0a00a98034b97b0ed3414b703bae74a739
The conversion script was incorrectly matching
CONFIG_POSTPROC[_VISUALIZER] and generating an
incorrect vpx_config.asm
Match both PROC and ENDP on word boundaries
Change-Id: Ic2788c3b522d4ee0afc5223b72e1b09fb52645be
Aligned the image buffer and stride to 32 bytes. This enables
calling of optimized scaler function in libyuv, and improves
the performance.
Tested libyuv scaler(x86 optimization) on Linux and Windows,
including: Linux 32/64bit, visual studio 32/64bit, Cygwin, and
MinGW32.
Also, fixed a wrong pointer in vpx_codec_encode().
Change-Id: Ibe97d7a0a745f82c43852fa4ed719be5a4db6abc
The previous definition of uintptr_t was incorrect on x64 with MSVC.
Also, the ARMV6 version of int_fast16_t was defined as unsigned for
some reason. Since the fast types are currently unused, just remove
them.
Change-Id: Idd73f77a989c77feedcb4a6802ae6bd37324ed40
Do the test filtering in the existing backup frame buffer instead of
the original. Copy the original data into extra buffer before doing
the filtering. This way there is no need to restore the original
unfiltered frame at the end of level picking process.
This came up in some discussions with Johann. Thanks!
Change-Id: I495f4301d983854673276c34ec0ddf9a9d622122
Added code to allocate aligned image buffer in vpx_img_alloc(). The
alignment of the buffer and stride is determined by the parameter
align.
Change-Id: Idc866978484be3558551c56df39130ab7f74ddd4
The example encoder down-samples the input video frames a number of
times with a down-sampling factor, and then encodes and outputs
bitstreams with different resolutions.
Support arbitrary down-sampling factor, and down-sampling factor
can be different for each encoding level.
For example, the encoder can be tested as follows.
1. Configure with multi-resolution encoding enabled:
../libvpx/configure --target=x86-linux-gcc --disable-codecs
--enable-vp8 --enable-runtime_cpu_detect --enable-debug
--disable-install-docs --enable-error-concealment
--enable-multi-res-encoding
2. Run make
3. Encode:
If input video is 1280x720, run:
./vp8_multi_resolution_encoder 1280 720 input.yuv 1.ivf 2.ivf 3.ivf 1
(output: 1.ivf(1280x720); 2.ivf(640x360); 3.ivf(320x180).
The last parameter is set to 1/0 to show/not show PSNR.)
4. Decode:
./simple_decoder 1.ivf 1.yuv
./simple_decoder 2.ivf 2.yuv
./simple_decoder 3.ivf 3.yuv
5. View video:
mplayer 1.yuv -demuxer rawvideo -rawvideo w=1280:h=720 -loop 0 -fps 30
mplayer 2.yuv -demuxer rawvideo -rawvideo w=640:h=360 -loop 0 -fps 30
mplayer 3.yuv -demuxer rawvideo -rawvideo w=320:h=180 -loop 0 -fps 30
The encoding parameters can be modified in vp8_multi_resolution_encoder.c,
for example, target bitrate, frame rate...
Modified API. John helped a lot with that. Thanks!
Change-Id: I03be9a51167eddf94399f92d269599fb3f3d54f5
dynamicly assign ARG_CTRL_CNT_MAX and
add check to make sure argument instance
doesnt already exist before creating a duplicate
Change-Id: I4f78a9c5346cda8e812cd89c077afe8996493508
This value needs to be copied to each thread's data structure.
This fixed artifact problem in multi-thread encoder.
Change-Id: Iab6d9745a1d44846aa503184705376f63a505597
to the dqcoeff or qcoeff buffer. The encoder would
populate the dc coeffs of the y blocks as a separate
stage (recon_dcblock) and the decoder would use a special
version of the idct. This change eliminates the extra copy
and reduces the code footprint.
[Tero] Added needed changes to armv6 and NEON assembly.
Change-Id: I83202ffdbaf83f6e5dd69f4ba2519fcf0b13b3ba
API was not returning correct partition sizes on arm targets.
The armv5 token packing functions were not storing the information to the
partition size table.
As a fix, have one boolcoder instance allocated for each partition so
that partition sizes are internally available after all partitions
were encoded. This will also allow more flexibility in producing
several partitions in parallel.
Use buffer validation (overflow check) in all ARM bitpacking
functions.
Change-Id: I31c8a11d8a7613676f0ff50928cb2a2ab14fd169
Storing vp8_bilinear_filters_mmx in an mmx file and using it in an sse2
file is bad
Moving towards allowing --disable-mmx
Change-Id: I20493b35bdedcdcfc0915e6f05fdbe6c81a4a742
There was an implicit reference frame test order (typically LAST,
GOLD, ARF) in the mode selection logic, but this doesn't provide the
expected results when some reference frames are disabled. For
instance, in real-time mode, the speed selection logic often disables
the ARF modes. So if the user disables the LAST and GOLD frames, the
encoder was always choosing INTRA, when in reality searching the ARF
in this case has the same speed penalty as searching LAST would have
had.
Instead, introduce the notion of a reference frame search order. This
patch preserves the former priorities, so if a frame is disabled, the
other frames bump up a slot to take its place. This patch lays the
groundwork for doing something smarter in the frame test order, for
example considering temporal distance or looking at the frames used by
nearby blocks.
Change-Id: I1199149f8662a408537c653d2c021c7f1d29a700
Extend buffer write validation (overflow check) to single token
partition packing, both mb and row based functions.
Change-Id: I36e19b7d37fc43712d05c70e3ad223d3eb5b973d
Patch set 2: 64 bit build fix
Patch set 3: 64 bit crash fix
[Tero]
Patch set 4: Updated ARMv6 and NEON assembly.
Added also minor NEON optimizations to subtract
functions.
Patch set 5: x86 stride bug fix
Change-Id: I1fcca93e90c89b89ddc204e1c18f208682675c15
The calculated frame_rate is a state variable in the codec, and
shouldn't be maintained in the configuration struct. Move it to the
main part of cpi so that it isn't clobbered when the configuration
struct is updated. The initial framerate estimate is moved from the
vp8_cx_iface.c wrapper into the body of init_config() in onyx_if.c, so
that it is only called once and not reset on every call to
vp8_change_config().
Change-Id: I8d9a3d1283330d1ee297d07e9d78d1f2875f2465
Ronald recently sent me this patch that he did in April.
> From: Ronald S. Bultje <rbultje@google.com>
> Date: Thu, 28 Apr 2011 17:30:15 -0700
> Subject: [PATCH] SSE2 optimizations for
> vp8_build_intra_predictors_mby{,_s}().
HD decode tests have shown a performance boost up to 1.5%,
depending on material.
Patch set 3: Fixed encoder crash.
Change-Id: Ie1fd1fa3dc750eec1a7a20bfa2decc079dcf48c8
Call the idct/add after the tokenize. This is WIP with
the goal of creating a common idct/add for the encoder and
decoder. This move is necessary because the decoder's version
of the idct clobbers qcoeff, which is used by the tokenize.
Change-Id: I6b08d8e8397cd873647fa4fb9469884e3c876756
Added ARM optimized intra 4x4 prediction
- 2x faster on Profiler compared to C-code compiled with -O3
- Function interface changed a little to improve BLOCKD structure
access
Change-Id: I9bc2b723155943fe0cf03dd9ca5f1760f7a81f54
The referenced function (SignalObjectAndWait) isn't used. Reduces the
warnings with mingw32-w64 which defines this.
Change-Id: I4ce592879ec9372bf196dac640204c4d370bd210
vp8cx_mb_init_quantizer() needs to be called at least once to get
all values calculated. This change added one check to decide if
we could skip initialization or not.
Change-Id: I3f65eb548be57580a61444328336bc18c25c085b
Added code to clip the buffer level to the maximum buffer
size. Without this the buffer level would increase
unchecked.
This bug was found when encoding an essentially static
scene at 2Mb/s. The encoder is unable to generate frames
consistent with the high data-rate because Q bottoms out
at Qmin.
As frames generated are consistently undersized the buffer
level increases and does not get checked against the
maximum size specified by the user (or default).
Change-Id: Id8a3c6323d3246da50f7cb53ddbf78b5528032c6
Fix compiler warning for passing a non const array
to a function expecting a const array by using an
intermediary pointer and casting.
Change-Id: I9bdd358ebdc926223993fb8fb2098ffedd2f3fc7
Changed 'int eob' to 'char *eob' in BLOCKD so that both encoder and
decoder will use eobs[25] array from MACROBLOCKD structure. In future,
this will enable use of the decoder side IDCT in the encoder.
Change-Id: I6e1c011628cb8864fd4a0b80f0279ce16a5ca978
Adding support for several partitions within one input fragment.
This is necessary to fully support all possible packetization
combinations in the VP8 RTP profile. Several partitions can
be transmitted in the same packet, and they can only be split
by reading the partition lengths from the bitstream.
Change-Id: If7d7ea331cc78cb7efd74c4a976b720c9a655463
In some situations (f.g. error-resilient is turned on), vp8cx_mb
_init_quantizer() was called once per macroblock. Added checks
to avoid calculations when there is no change.
Change-Id: Ie4f0a5ade2202041254990a4e9d5b03bd1ac5aea
It is discovered that in rare situations the 2nd order block may
produce a few small magnitude coefficients that has no effect on
reconstruction. The situations are a combination of low quantizer
values (high quality) and low energy in residual signals (content
dependent). This commit added code to detect such cases and reset
the 2nd order block to all 0.
Patch 1 to 4 used code to do all-zero-check on idct result buffer,
and tests on derf set showed a consistent gain of .12%-.14% on all
metrics.But due to a recent change Ie31d90b, the idct result buffer
is not longer populated. So patch 5&6 use an alternative method to
detect the situations. Tests on derf set now shows a consistent
quality gain of .16%-.20%.
As suggested by Jim, Patch 7&8 removed the condition of all first
order block not having any coefficient, instead we reset 2nd order
coefficients to all 0 if sum of absolute value of the coefficients
is small. So it does slightly more than just detecting the oddity
as discussed above, but tests on derf set now show a consistent
gain of .20%-.23% on all metrics.
It is worth noting here that this change does not have any effect
on mid/high quantizer range, it only affects the quantizer value
18 or blow. Within this range, the change helps compression by up
to 2.5% on clips in the derf set.
Change-Id: I718e19cf59a4fc2462cb7070832759beb9f7e7dd
Tests showed ~1.2% performance boost on the HD clip used.
Performance will vary based on material.
Change-Id: Icbcf1a828750d5b4ae5252bf596b3ef594042e8a
Interleaved vp8_find_near_mvs and vp8_mv_ref_probs.
2.5% to 4% performance improvement for the HD clips used.
Change-Id: Id888b667cf5ae2f0e19da18743140f055ff7de8d
The partial frame copy function used to copy an extra 8 lines above
and below. The partial frame filtering can only modify 3 pixel rows
above the partial frame. Reduce copy to bare minimum needed, which is
4 lines, so that partial filtering on copied frame is possible.
Define the "magic" fraction number for partial filtering in
loopfilter.h .
Change-Id: I4791ffc541b6884b12759a0d0714a8faf16147ec
Restructure if statement to clarify the error condition. Trigger the
error before clobbering pc-> variables.
Change-Id: Id01cab798a341ce9899078fdcec265a0e942a0b7
Global function pointers can not be defined in header files. Restructure
vpx_scale pointer configuration.
Change-Id: I6f568a263ad770d32f530abad6007f990fd1003a
Decode the mv mode with if-then-elses instead of traversing
the vp8_mv_ref_tree data structure. This will make it
easier to interleave vp8_find_near_mvs and vp8_mv_ref_probs.
Change-Id: I1e798d6ec40fcaeeff06ccc82f81201978d12f74
check to make sure that cx_data buffer has enough room before
writting to it, prior behavior did not which could result in a crash.
Change-Id: I3fab6f2bc4a96d7c675ea81acd39ece121738b28
During the _pick only the Y plane is examined. In addition, data beyond
the borders of the frame is not read.
Change-Id: Ic549adfca70fc6e0b55f8aab0efe81f0afac89f9
Instead of using the predict buffer, the decoder now writes
the predictor into the recon buffer. For blocks with eob=0,
unnecessary idcts can be eliminated. This gave a performance
boost of ~1.8% for the HD clips used.
Tero: Added needed changes to ARM side and scheduled some
assembly code to prevent interlocks.
Patch Set 6: Merged (I1bcdca7a95aacc3a181b9faa6b10e3a71ee24df3)
into this commit because of similarities in the idct
functions.
Patch Set 7: EC bug fix.
Change-Id: Ie31d90b5d3522e1108163f2ac491e455e3f955e6
NEON version of copyframeyonly, extendframeborders, copy_frame_func were
not working for plane stride < 128 and/or y_width < 128.
Change-Id: Id6c2e6c795274da0c90134b15c0d5f62d1b17a6c
It was crashing when number of partitions was bigger than the number
of MB rows (ex. 128x96 with 8 partitions).
Start point was not checked against mb_rows, plus extra
"empty" partitions were not written out.
Change-Id: I9c2f013b9ec022354b658fab4ef799ff8b1de93d
Added the ability to create rate-targeted, temporally
scalable, VP8 compatible bitstreams.
The application vp8_scalable_patterns.c demonstrates how
to use this capability. Users can create output bitstreams
containing upto 5 temporally separable streams encoded
as a single VP8 bitstream.
(previously abandoned as:
I92d1483e887adb274d07ce9e567e4d0314881b0a)
Change-Id: I156250a3fe930be57c069d508c41b6a7a4ea8d6a
buffer_level in VP8_COMP and starting_buffer_level, optimal_buffer_level
and maximum_buffer_size in VP8_CONFIG changed from int to int64_t
to avoid potential crash issues for larger target bit rates.
Change-Id: I0d5ab6c8a44c2fef51f30cd8df4bb4b739c5df26
Previous entropy probs need to be saved (and restored) only when
current updates are not propagated.
Change-Id: Ie6ee0543066e30874e56258be0a6b7d2dd2fdb2b
Uninitialized data could be written to the first pass file when no
motion vectors are present in the frame.
Also fix a number of compiler warnings.
Change-Id: Icc9f53b6d33da9de4563d86d9fd591910473ea90
The data processed by the loopfilter overlaps. At the block level, this
results in some redundant transforms. Grouping the filtering allows for
a single 16x16 transpose (and inversion) instead of three 16x8 transposes
(and three more inversions).
This implementation is x86_64 only. We retain the previous
implementation for x86.
Improvements are obviously material dependant, but it seems to be ~%1 in
tests here.
Change-Id: I467b7ec3655be98fb5f1a94b5d145e5e5a660007
vp8_find_near_mvs() is being called on all possible reference frames
but the data computed may be used if the loop exits early, which can
be due to x->skip beign set to 1.
Optimize this by call vp8_find_near_mvs() laziy only if it is going
to be used and not computed yet.
Change-Id: Iccdbd4c962a670c9f2c99b8aca8096042ca5dc98
Changes to the selection of Q limits for two pass
and two pass CQ mode.
Allowance made for Mode and motion vector costs.
Some refactoring of common code.
For Derf and YT sets CQ mode average improvement
circa 1% (SSIM and Global PSNR).
Some increased tendency to undershoot even when
user CQ not reached.
Patch2: Removed some test code accidentally merged.
Change-Id: Icf74d13af77437c08602571dc7a97e747cce5066
'all' is the conventional target for building everything in the
makefile, but the child make was expecting all-$(target), for debugging
reasons that I don't recall exactly. Restore the expected behavior.
Change-Id: Ifbb03610b55be679ce7c5e210b7a69a156bb76b9
Sync with loopfilter thread just at the beginning of next frame encoding.
This returns control to application faster and allows a better multicore scaling.
When PSNR packets are generated the final filtered frame is needed imediatly
so we cannot delay the sync.
Change-Id: I288d97b5e331d41d6f5bb49d97986fa12ac6f066
To avoid a dependency on vpx_version.h, call the vpx_codec_version_str()
function and build up the string manually.
Change-Id: Ie650e9b8f2aaaffaa31da5e9ef3b566b972321b4
Make sure that this header is listed as one of the sources, so that it
will be installed if necessary.
Change-Id: I2427e494488126b179151dc21043c1e2c8ba5991
- Removed fast_fdct4x4_neon and fast_fdct8x4_neon
- Uses now short_fdct4x4 and short_fdct8x4
- Gives ~1-2% speed-up on Cortex-A8/A9
Change-Id: Ib62f2cb2080ae719f8fa1d518a3a5e71278a41ec
Rd and Rm registers should be different in 'mul'. This register
combination results in unpredictable behaviour. GCC will give
a warning and RVCT an error in this case.
Restriction applies only to armv5 targets and not for armv6 and above.
Change-Id: I378d17c51e1f16a6820814fbed43e115aaabb03e
These changes fixes a glitch between the RTP profile and the input
partitions interface. Since there's no way for the user to know the
actual number of partitions, the decoder have to read the
multi_token_paritition bits also when input partitions mode is
enabled.
Included are also a couple of fixes for issues with independent
partitions and uninitialized memory reads.
Change-Id: I6f93b15287d291169ed681898ed3fbcc5dc81837
- Updated walsh transform to match C
(based on Change Id24f3392)
- Changed fast_fdct4x4 and 8x4 to short_fdct4x4 and 8x4
correspondingly
Change-Id: I704e862f40e315b0a79997633c7bd9c347166a8e
Modified original patch If2f07220885c4c3a0cae0dace34ea0e36124f001
according to comments. Scheduled code a little bit to prevent some
interlocks.
Change-Id: I338f02b881098782f82af63d97f042b85e63e902
In the "Removed bmi copy to/from BLOCKD" commit, the copy
to the bmi in BLOCKD was eliminated. The clamp_mvs() used
the bmi in BLOCKD, which now contains incorrect values. This
patch fixes this problem.
Change-Id: I8eca1eaf4015052b0b63e90876f7ad321aba7cff
vp8_update_zbin_extra() is called all the time even though the fast
quantizer doesn't use it. Skip this call if fast quantizer is used.
Change-Id: Ia711c38431930cc2486cf59b8466060ef0e9d9db
This change makes sure that no key frame recoding in real-time mode
even if CONFIG_REALTIME_ONLY is not configured.
Change-Id: Ifc34141f3217a6bb63cc087d78b111fadb35eec2
for SPLITMV and B_PRED modes. Modified code to use the bmi
found in mode_info_context instead of BLOCKD. On the decode
side, the uvmvs are calculated only when required, instead of
every macroblock. This is WIP. (bmi should eventually be
removed from BLOCKD)
Small performance gains noticed for RT encodes and decodes.(VGA)
Change-Id: I2ed7f0fd5ca733655df684aa82da575c77a973e7
Prepend idct function names with vp8_
so that under profiling they show up
associated with libvpx.
Change-Id: I4fe357b50236cb7730a4cc00164c0a3487a1d8b4
The data that the simple horizontal loopfilter reads is aligned, treat
it accordingly.
For the vertical, we only use the bottom 4 bytes, so don't read in 16
(and incur the penalty for unaligned access).
This shows a small improvement on older processors which have a
significant penalty for unaligned reads.
postproc_mmx.c is unused
Change-Id: I87b29bbc0c3b19ee1ca1de3c4f47332a53087b3d
Prepend . to local labels in assembly code. This
allows non unique labels within a file. Also
makes profiling information more informative
by keeping the function name with the loop name.
Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f
Calculations were incorrectly classified as either
SSE3 or SSSE3. Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.
Change-Id: I48ad0218af0cc51c5078070a08511dee43ecfe09
Calculations were incorrectly classified as either
SSE3 or SSSE3. Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.
Change-Id: I29f5c2ead342b2086a468029c15e2c1d948b5d97
When active map is specified and the current frame is not a key frame,
golden frame nor a altref frame then copy only those active regions.
This significantly reduces encoding time by as much as 19% on the test
system where realtime encoding is used. This is particularly useful
when the frame size is large (e.g. 2560x1600) and there's only a few
action macroblocks.
Change-Id: If394a813ec2df5a0201745d1348dbde4278f7ad4
Instead of a single mid GF boost apply a few extra bits to
every other frame. This gives a very small average metrics
improvement on both derf and YT sets.
Also use min GF interval as min KF interval.
Change-Id: Iee238b8cae0ffaed850a5a944ac825cee18da485
Since the block will be interpreted as an inter block, the mode will
be interpreted as a motion vector, resulting in bad concealment.
Change-Id: Ifcc685ae1cc883492bce6dbd61e418d91a89b053
Since the block will be interpreted as an inter block, the mode will
be interpreted as a motion vector, resulting in bad concealment.
Change-Id: Ifcc685ae1cc883492bce6dbd61e418d91a89b053
To get a list of files that the libvpx library depends on in the current
configuration, run:
$ make target=libs libvpx_srcs.txt
Change-Id: I68a69648ecf212f0fe29c325297728ac2a9393d9
This reverts commit b5ea2fbc2c. Further
testing showed noticable keyframe popping in some cases, reverting this
for now to give time for a proper fix.
Conflicts:
vp8/encoder/onyx_if.c
vp8/encoder/ratectrl.c
Change-Id: I159f53d1bf0e24c035754ab3ded8ccfd58fd04af
EC expects the subblock MVs to be populated, but
f1d6cc79e4 removed this code. This
commit restores it, protected by CONFIG_ERROR_CONCEALMENT. May move this
to the EC code more directly in the future.
Change-Id: I44f8f985720cb9a1bf222e59143f9e69abf56ad2
When error concealment is enabled the first key frame must
be successfully received before error concealment is activated.
Error concealment will be activated when the delta following
delta frame is received.
Also fixed a couple of bugs related to error tracking in
multi-threading. And avoiding decoding corrupt residual
when we have multiple non-resilient partitions.
Change-Id: I45c4bb296e2f05f57624aef500a874faf431a60d
This patch fixes an OOB read when error concealment is enabled and the
partition sizes are corrupt. The partition size read from the bitstream
was not being validated in EC mode.
Change-Id: Ia81dfd4bce1ab29ee78e42320abe52cee8318974
EC expects the subblock MVs to be populated, but
f1d6cc79e4 removed this code. This
commit restores it, protected by CONFIG_ERROR_CONCEALMENT. May move this
to the EC code more directly in the future.
Change-Id: I44f8f985720cb9a1bf222e59143f9e69abf56ad2
When error concealment is enabled the first key frame must
be successfully received before error concealment is activated.
Error concealment will be activated when the delta following
delta frame is received.
Also fixed a couple of bugs related to error tracking in
multi-threading. And avoiding decoding corrupt residual
when we have multiple non-resilient partitions.
Change-Id: I45c4bb296e2f05f57624aef500a874faf431a60d
This patch fixes an OOB read when error concealment is enabled and the
partition sizes are corrupt. The partition size read from the bitstream
was not being validated in EC mode.
Change-Id: Ia81dfd4bce1ab29ee78e42320abe52cee8318974
Corrected the merge direction this time, so that running
`git describe` on the master branch finds v0.9.7 as the most recent
tag.
Change-Id: I9e7b5d473c26e670c6d9a76f5c03fa617690651d
This patch fixes a bug in the interaction between the recode loop and
spatial resampling. If the codec was in a spatial resampling state,
and a subsequent iteration of the recode loop disables resampling,
then the source buffer must be reset to the unscaled source.
Change-Id: I4e4cd47b943f6cd26a47449dc7f4255b38e27c77
Changed motion search in vp8_find_best_half_pixel_step() to be the
same as in vp8_find_best_sub_pixel_step(), which checks 5 points
instead of 8 points. This only affects real-time mode with
cpu-used >=9. Tests showed it gives 2% encoding speedup with
a quality loss(psnr) of up to 0.5%.
Change-Id: I16049cad1535002346d46cfdfad345bfc3dc5146
The static libs should not be built from sources during the top level
of a universal build. This regression was introduced in commit
495b241fa6, which made the static
libs selectable under CONFIG_STATIC.
Change-Id: I585167e17459877e0fa7fa19e1046c3703d91c97
the neon code made several assumptions which were broken by a recent
change: https://review.webmproject.org/2676
update the code with new assumptions and guard them with a compile time
assert
Change-Id: I32a8378030759966068f34618d7b4b1b02e101a0
Since this is the only ABI incompatible change since the last release,
convert it to use the control interface instead. The member of the
configuration struct is replaced with the VP8E_SET_MAX_INTRA_BITRATE_PCT
control.
More significant API changes were expected to be forthcoming when this
control was first introduced, and while they continue to be expected,
it's not worth breaking compatibility for only this change.
Change-Id: I799d8dbe24c8bc9c241e0b7743b2b64f81327d59
This change implemented same idea in change "Preload reference area
to an intermediate buffer in sub-pixel motion search." The changes
were made to vp8_find_best_sub_pixel_step() and vp8_find_best_half
_pixel_step() functions which are called when speed >= 5. Test
result (using tulip clip):
1. On Core2 Quad machine(Linux)
rt mode, speed (-5 ~ -8), encoding speed gain: 2% ~ 3%
rt mode, speed (-9 ~ -11), encoding speed gain: 1% ~ 2%
rt mode, speed (-12 ~ -14), no noticeable encoding speed gain
2. On Xeon machine(Linux)
Test on speed (-5 ~ -14) didn't show noticeable speed change.
Change-Id: I21bec2d6e7fbe541fcc0f4c0366bbdf3e2076aa2
There were some situations that the start motion vectors were
out of range. This fix adjusted range checks to make sure they
are checked and clamped.
Change-Id: Ife83b7fed0882bba6d1fa559b6e63c054fd5065d
sharpness was not recalculated in vp8cx_pick_filter_level_fast
remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.
always use cm->sharpness_level. the extra indirection was annoying.
don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init
move function declarations to their proper header
Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db
In sub-pixel motion search, the search range is small(+/- 3 pixels).
Preload whole search area from reference buffer into a 32-byte
aligned buffer. Then in search, load reference data from this buffer
instead. This keeps data in cache, and reduces the crossing cache-
line penalty. For tulip clip, tests on Intel Core2 Quad machine(linux)
showed encoder speed improvement:
3.4% at --rt --cpu-used =-4
2.8% at --rt --cpu-used =-3
2.3% at --rt --cpu-used =-2
2.2% at --rt --cpu-used =-1
Test on Atom notebook showed only 1.1% speed improvement(speed=-4).
Test on Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.
Next, I will apply similar idea to other 2 sub-pixel search functions
for encoding speed > 4.
Make this change exclusively for x86 platforms.
Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
This adds the magic .note.GNU-stack section at the end of each ARM
asm file (when built with gas), indicating that a non-executable
stack is allowed.
Without this section, the linker will assume the object requires an
executable stack by default, forcing an executable stack for the
entire program.
Change-Id: Ie86de6a449b52d392b9e5e0479833ed8c508ee65
This is done by expanding luma row to 32-byte alignment, since
there is currently a bunch of code that assumes that
uv_stride == y_stride/2 (see, for example, vp8/common/postproc.c,
common/reconinter.c, common/arm/neon/recon16x16mb_neon.asm,
encoder/temporal_filter.c, and possibly others; I haven't done a
full audit).
It also uses replaces the hardcoded border of 16 in a number of
encoder buffers with VP8BORDERINPIXELS (currently 32), as the
chroma rows start at an offset of border/2.
Together, these two changes have the nice advantage that simply
dumping the frame memory as a contiguous blob produces a valid,
if padded, image.
Change-Id: Iaf5ea722ae5c82d5daa50f6e2dade9de753f1003
This reverts commit b73a3693e5.
This version of the check doesn't work with generic-gnu, and figuring
out the correct symbol version at configure time is probably more work
than this is worth. May revisit in the future.
Change-Id: I6c75e88bd3bd82a4b21e09a25780fe53aacb7d70
allowing the compiler to inline this function. For real-time
encodes, this gave a boost of 1% to 2.5%, depending on the
speed setting.
Change-Id: I3929d176cca086b4261267b848419d5bcff21c02
This patch attempts to improve the handling of CBR streams with
respect to the short term buffering requirements. The "buffer level"
is changed to be an average over the rc buffer, rather than a long
running average. Overshoot is also tracked over the same interval
and the golden frame targets suppressed accordingly to correct for
overly aggressive boosting.
Testing shows that this is fairly consistently positive in one
metric or another -- some clips that show significant decreases
in quality have better buffering characteristics, others show
improvenents in both.
Change-Id: I924c89aa9bdb210271f2e03311e63de3f1f8f920
Using small values for --buf-sz= in command line causes
floating point exception due to division by zero.
Change-Id: Ibfe2d44db922993a78ebc9a4a1087d9625de48ae
Optimized C-code of the following functions:
- vp8_tokenize_mb
- tokenize1st_order_b
- tokenize2nd_order_b
Gives ~1-5% speed-up for RT encoding on Cortex-A8/A9
depending on encoding parameters.
Change-Id: I6be86104a589a06dcbc9ed3318e8bf264ef4176c
glibc implements some checking on longjmp() calls by replacing it with
an internal function __longjmp_chk(), when FORTIFY_SOURCE is defined.
This can be problematic when compiling the library under one version of
glibc and running it under another. Work around this issue for the one
symbol affected for now, before taking out the undef hammer.
Fixes http://code.google.com/p/webm/issues/detail?id=166
Change-Id: Ifc5e25cdec17915e394711f2185b3e9214572d10
Previously allocated more memory than necessary for yuv buffers.
This makes it harder to track bugs with reading uninitialized
data.
Change-Id: I510f7b298d3c647c869be6e5d51608becc63cce9
As reported in issue 331, vpxenc encoded incorrect webm file header
on big endian machines. This change fixed that.
Change-Id: I31924ebd476a87f3e88b9b5424540bf781d2b86f
Do mvp clamping in full-pixel precision instead of 1/8-pixel
precision to avoid error caused by right shifting operation.
Also, further fixed the motion vector limit calculation in change:
b748045470
Change-Id: Ied88a4f7ddfb0476eb9f7afc6ceeddbf209fffd7
Separate simple filter with reduced no. of parameters.
MB filter level picking based on precalculated table. Level table updated for
each frame. Inside and edge limits precalculated and updated just when
sharpness changes. HEV threshhold is constant.
ARM targets use scalars and others vectors.
Change works only with --target=generic-gnu
All other targets have to be updated!
Change-Id: I6b73aca6b525075b20129a371699b2561bd4d51c
Allow the encoder to inform the application that the encoded frame will not
be used as a reference.
Change-Id: I90e41962325ef73d44da03327deb340d6f7f4860
Motion vector limits are calculated using right shifts, which
could give wrong results for negative numbers. James Berry's
test on one clip showed encoder produced some artifacts. This
change fixed that.
Change-Id: I035fc02280b10455b7f6eb388f7c2e33b796b018
In this commit I have added an experimental function
that tests prediction quality either side of a central position
to calculate a suggested boost number for an ARF frame.
The function is passed an offset from the current position and
a number of frames to search forwards and backwards.
It returns a forward, backward and compound boost number.
The new code can be deactivated using #define NEW_BOOST 0
In its current default state the code searches forwards and backwards
from the proposed position of the next alt ref.
The the old code used a boost number calculated by scanning forward
from the previous GF up to the proposed alt ref frame position.
I have also added some code to try and prevent placement of a gf/arf
where there is a brief flash.
Change-Id: I98af789a5181148659f10dd5dd2ff2d4250cd51c
For clips that are near 60fps and have a lot of alt refs, it's possible
that the ring buffer holding the previous frames sizes/pts could wrap
around, leading to a division by zero.
In addition to checking for this condition in the ring buffer loop,
the buffer size is made dependent on the actual frame rate in use,
rather than defaulting to 60, which should improve accuracy at frame
rates >= ~60.
Change-Id: If5a04d6e847316dc5f7504f25c01164cf9332be8
There were many instances in the code of vp8_coef_tokens and
vp8_coef_tokens-1, which was a preprocessor macro despite the naming
convention. Replace these with MAX_ENTROPY_TOKENS and ENTROPY_NODES,
respectively.
Change-Id: I72c4f6c7634c94e1fa066cd511471e5592c748da
With this commit frames can be received partition-by-partition
from the encoder and passed partition-by-partition to the
decoder.
At the encoder-side this makes it easier to split encoded
frames at partition boundaries, useful when packetizing
frames. When VPX_CODEC_USE_OUTPUT_PARTITION is enabled,
several VPX_CODEC_CX_FRAME_PKT packets will be returned
from vpx_codec_get_cx_data(), containing one partition
each. The partition_id (starting at 0) specifies the decoding
order of the partitions. All partitions but the last has
the VPX_FRAME_IS_FRAGMENT flag set.
At the decoder this opens up the possibility of decoding partition
N even though partition N-1 was lost (given that independent
partitioning has been enabled in the encoder) if more info
about the missing parts of the stream is available through
external signaling.
Each partition is passed to the decoder through the
vpx_codec_decode() function, with the data pointer pointing
to the start of the partition, and with data_sz equal to the
size of the partition. Missing partitions can be signaled to
the decoder by setting data != NULL and data_sz = 0. When
all partitions have been given to the decoder "end of data"
should be signaled by calling vpx_codec_decode() with
data = NULL and data_sz = 0.
The first partition is the first partition according to the
VP8 bitstream + the uncompressed data chunk + DCT address
offsets if multiple residual partitions are used.
Change-Id: I5bc0682b9e4112e0db77904755c694c3c7ac6e74
Adding capabilities with which the encoder can output frames
partition by partition, and the decoder can get input data
partition by partition.
Change-Id: Ieae0801480b8de8cd43c3c57dd3bab2e4c346fe0
Adding support in the encoder for generating
independent residual partitions by forcing
equal probabilities over the prev coef entropy
contexts.
Change-Id: I402f5c353255f3ca20eae2620af739f6a498cd21
The current code stores pointers to coefficient tables and loads them to
access the tables contents. As these pointers are stored in the code
sections, it means we end up with text relocations. eu-findtextrel will
thus complain about code not compiled with -fpic/-fPIC.
Since the pointers are stored in the code sections, we can actually cheat
and let the assembler generate relative addressing when accessing the
coefficient tables, and just load their location with adr.
Change-Id: Ib74ae2d3f2bab80b29991355f2dbe6955f38f6ae
Also includes a couple of error concealment bug fixes:
- the segment_id wasn't properly initialized when missing
- when interpolating and no neighbors are found, set to zero
- clear the qcoef buffer when concealing an MB
Change-Id: Id79c876b41d78b559a2241e9cd0fd2cae6198f49
Only the first frame buffer ref counter was being initialized
because the index was fixed at 0 rather than using i.
Change-Id: Ib842298be4a5e3607f9e21c2cd4bfbee4054ffc4
I got this idea from Pascal (Thanks). Before encoding a macroblock,
copy it to a 16x16 buffer, and then read source data from there
instead. This will help keep the source data in cache, and help
with the performance.
Change-Id: Id05f4cb601299150511d59dcba0ae62c49b5b757
This reverts commit 212f618373.
Further testing shows that the overshoot accumulation/damping is too
aggressive on some clips. Allowing the accumulated overshoot to
decay and limiting to damping to golden frames shows some promise.
But some clips show significant overshoot in the buffer window, so
I think this still needs work.
Change-Id: Ic02a9ca34f55229f9cc04786f4fab54cdc1a3ef5
Add the --rate-hist=n option, which displays a histogram with n
buckets for the rate over the --buf-sz window.
Change-Id: I2807b5a1525c7972e9ba40839b37e92b23ceffaf
Add the --q-hist=n option, which displays a histogram with n buckets for
the quantizer selected on each frame.
Change-Id: I59b020c26b0acae0b938685081d9932bd98df5c9
vp8_yv12_copy_frame_ptr() expects same size
buffers which was not previously gaurenteed.
Using an improperly allocated buffer would
cause a crash before.
Change-Id: I904982313ce9352474f80de842013dcd89f48685
RDMULT/RDDIV defines a bit worth of distortion in term of sum squared
difference. This has also been used as errorperbit in subpixel motion
search, where the distortions computed as variance of the difference.
The variance of differences is different from sum squared differences
by amount of DC squared. Typically, for inter predicted MBs, this
difference averages around 10% between the two distortion, so this patch
introduces a 110% constant in deriving errorperbit from RDMULT/RDDIV.
Test on CIF set shows small but positive gain on overall PSNR (.03%)
and SSIM (.07%), overall impact on average PSNR is 0.
Change-Id: I95425f922d037b4d96083064a10c7cdd4948ee62
Relocated the vp8dx_bool_decoder_fill() call, allowing
the compiler to produce better assembly code. Tests
showed a 1 - 2 % performance boost (x86 using gcc)
for the 720p clip used.
Change-Id: Ic5a4eefed8777e6eefa007d4f12dfc7e64482732
The starting points are always within the limits, and bounds
checking on these points is not needed. For speed < 5, the
encoded result changes a little because different treatment
is taken while starting point equals the bounds.
Change-Id: I09a402d310f51e305a3519f1601b1d17b05c6152
Modify the second-pass code to provide a full golden-frame (GF) bit
allocation boost if the past GF group (GFG) had no alt-ref frame (ARF),
even if the current GFG does contain and ARF.
This mostly has no effect on clips, since switching ARFs on/off between
GFGs is not very common. Has a positive effect on e.g. cheer (+0.45 SSIM
at 600kbps) and football (+0.25 SSIM at 600kbps), particularly at high
bitrates. Has a negative effect (-0.04 SSIM at 300kbps) at pamphlet,
which appears only marginally related to this patch, and crew (-0.1 SSIM
at 700kbps).
Change-Id: I2e32899638b59f857e26efeac18a82e0c0b77089
Replace =1 with =true for yasm tool element. This aids in upgrading
e.g., vs9 project files to vs10.
build/x86-msvs/yasm.xml generated during conversion will require the
Separator attribute to be removed for the build to complete
successfully.
Change-Id: If75c4f9a925529740048882003e9d766c5ac4f0c
The BPRED mode selection uses SSE as a distortion metric, but the early
breakout threshold being used was a variance value.
Change-Id: I42d4602fb9b548bf681a36445701fada5e73aff1
firstpass.c contains some rate adjustment code that assures that the
last few frames in a sequence abide by rate limits. If the second-to-
last group of frames contains an alt-ref frame (ARF), the last golden
frame (GF) is zero bytes, and we will thus spend a ridiculously high
number of bits on regular P-frames trying to hit the target rate. This
does slightly enhance the quality of these last few frames, but has
no perceptual value (other than hitting the target rate).
Disabling this code means we consistently (slightly) undershoot the
target rate and consequently do worse on the last few frames of a
clip, which is particularly noticeable for small clips. The quality-
per-bitrate is generally better, ~0.2% better overall on derf-set,
especially on clips such as garden, tennis, foreman at low bitrates.
Has a negative effect on hallmonitor at high bitrates.
Change-Id: I1d63452fef5fee4a0ad2fb2e9af4c9f2e0d86d23
Moved encode_intra function from firstpass.c to encodeintra.c to
prevent linking problem in real-time only build. Also changed name
of the function to vp8_encode_intra because it is not a static.
Change-Id: Ibf3c6c1de3152567347e5fbef47d1d39564620a5
- Updated -linux-rvct targets to support RVDS 4.0 and later.
- Changed optimization flag to -Otime because -O3 ruined performance
for RVCT linux targets.
- Added support for --enable-small for RVCT
- RVCT created library should be able to link with GCC
- Supports building shared linux libraries
Change-Id: Ic62589950d86c3420fd4d908b8efb870806d1233
If setup_token_decoder reported an internal error the memory allocated
there would not be freed in the resulting call to _remove_decompressor.
Change-Id: Ib459de222d76b1910d6f449cdcd01663447dbdf6
Small decode performance gain (~1%) on keyframes. No
noticeable gains on encode. Also changed pick_intra4x4mby_modes()
to read the above and left block modes for keyframes only.
Change-Id: I1f4885252f5b3e9caf04d4e01e643960f910aba5
Automatically created assembly offset files added to CLEAN-OBJS list
for proper cleanup. This will fix following build error:
1) Build for the workstation
./conigure
make
make clean
2) Build for ARM platform
./configure --target=armv7-linux-gcc
make ==> this will fail because it uses old asm_*_offset.asm files
Change-Id: Id5275c470390ca81b8db086a15ad75af39b80703
Since BPRED will be tested at most once, and SPLITMV is not enabled,
there's nothing to clobber the subblock modes, so there's no need to
save and restore them.
Change-Id: I7c3615b69190c10bd068a44df5488d6e8b85a364
In activity masking, RDO constant RDMULT is adjusted on a per MB basis
adaptive to activity with the MB. errorperbit, which is defined as
RDMULT/RDDIV, is a constant used in motion estimation. Previously, in
activity masking, errorperbit is not changed even when RDMULT is changed.
This commit changed to adjust errorperbit according to the change in
RDMULT.
Test in cif set showed a very small but consistent gain by all quality
metrics (average, overall psnr and ssim) when activity masking is on.
Change-Id: I07ded3e852919ab76757691939fe435328273823
This change is analogous to I0b67dae1f8a74902378da7bdf565e39ab832dda7,
which made the move for the non-RD path.
Change-Id: If63fc1b0cd1eb7f932e710f83ff24d91454f8ed1
This commit moves the intra block mode selection from encodeframe.c
to pickinter.c (in the non-RD case). This allowed pick_intra_mbuv_mode
and pick_intra4x4mby_modes to be made static, and is a step towards
refactoring intra mode selection in the main pickinter loop. Gave a
small perf increase (~0.5%).
Change-Id: I0b67dae1f8a74902378da7bdf565e39ab832dda7
Some further re-structuring of activity masking code.
Still has various experimental switches.
Supports a metric based on intra encode.
Experimental comparison against a fixed activity target rather
than a frame average, for altering rd and zbin.
Overall the SSIM performance is similar to TT's original
code but there is a much smaller PSNR hit of circa
0.5% instead of 3.2%
Change-Id: I0fd53b2dfb60620b3f74d7415e0b81c1ac58c39a
While investigating the effect of DC values on SAD and SSE in motion
estimation, a side finding indicates the two table of constants need
be adjusted. The adjustment was done by multiplying old constants by
90% with rounding. Also absorb the 1/2 scaling constant into the two
tables. Refer to change Ifa285c3e for background of the 1/2 factor.
Cif set test showed a very small gain on all metric.
Change-Id: I04333527a823371175dd46cb04a817e5b9a8b752
The encoder defined about 4 set of similar functions to calculate sum,
variance or sse or a combination of them. This commit removed one set
of these functions, get8x8var and get16x16var, where calls to the later
function are replaced with var16x16 by using the fact on a 16x16 MB:
variance == sse - sum*sum/256
Change-Id: I803eabd1fb3ab177780a40338cbd596dffaed267
In real-time mode motion search, there is no need to calculate
variance. This change improved encoding speed by 1% ~ 2%(speed=-5).
Change-Id: I65b874901eb599ac38fe8cf9cad898c14138d431
This patch attempts to reduce the peak bitrate hit by the encoder
when using small buffer windows.
Tested on the CIF set over 200-500kbps using these settings:
--buf-sz=500 --buf-initial-sz=250 --buf-optimal-sz=250 \
--undershoot-pct=100
Two pass encodes were tested at best quality. One pass encodes were
tested only at realtime speed 4:
--rt --cpu-used=-4
The peak datarate (over the specified 500ms window) was measured
for each encode, and averaged together to get metric for
"average peak," computed as SUM(peak)/SUM(target). This patch
reduces the average peak datarate as follows:
One pass:
baseline: 1.29715
this patch: 1.23664
Two pass:
baseline: 1.32702
this patch: 1.37824
This change had a positive effect on our quality metrics as well:
One pass CBR:
Min / Mean / Max (pct)
Average PSNR -0.42 / 2.86 / 27.32
Overall PSNR -0.90 / 2.00 / 17.27
SSIM -0.05 / 3.95 / 37.46
Two pass CBR:
Min / Mean / Max (pct)
Average PSNR -4.47 / 4.35 / 35.99
Overall PSNR -3.40 / 4.18 / 36.46
SSIM -4.56 / 6.98 / 53.67
One pass VBR:
Min / Mean / Max (pct)
Average PSNR -5.21 / 0.01 / 3.30
Overall PSNR -8.10 / -0.38 / 1.21
SSIM -7.38 / -0.11 / 3.17
(note: most values here were close to the mean, there were a few
outliers on files that were very sensitive to golden frame size)
Two pass VBR:
Min / Mean / Max (pct)
Average PSNR 0.00 / 0.00 / 0.00
Overall PSNR 0.00 / 0.00 / 0.00
SSIM 0.00 / 0.00 / 0.00
Neither one pass or two pass CBR mode adheres particularly strictly
to the short term buffer constraints, and two pass is less
consistent, even in the baseline commit. This should be addressed
in a later commit. This likely will hurt the quality numbers, as it
will have to reduce the burstiness of golden frames.
Aside: My work on this commit makes it clear that we need to make
rate control modes "pluggable", where you can easily write a new
one or work on one in isolation.
Change-Id: I1ea9a48f2beedd59891f1288aabf7064956b4716
Currently, hex search couldn't guarantee the motion vector(MV)
found is within the limit of maximum MV. Therefore, very large
motion vectors resulted from big motion in the video could cause
encoding artifacts. This change adjusted hex search bounds
checking to make sure the resulted motion vector won't go out
of the range. James Berry, thank you for finding the bug.
Change-Id: If2c55edd9019e72444ad9b4b8688969eef610c55
Declared the bmi in BLOCKD as a union instead of B_MODE_INFO.
Then removed B_MODE_INFO completely.
Change-Id: Ieb7469899e265892c66f7aeac87b7f2bf38e7a67
This is basically a slightly modified version of the previous patch,
and it has a moderately positive effect (SSIM/PSNR both +0.08% avg
on derf-set). Most clips show no change, except waterfall/coastguard,
each ~ +0.8% SSIM/PSNR. You can see similar effects in other clips
by shortening their length to terminate at a very short last group
of frames.
Change-Id: I7a70de99ca1f9fe6a8b6ca7a6e30e8a4b64383e4
this commit makes the usage errorperbit and sadperbit consistent for
encoding modes and passes. Removed all different magic weight factors
associated with errorperbit. Now 1/2 is used for both sadperbit16 and
sadperbit4, the /2 operation is merged into initializations of the 2
variables.
Tests on cif set show .23%, 0.18% and 0.19% gain by avg psnr, overall
psnr and ssim respectively.
Change-Id: Ifa285c3e065ce0a5a77addfc9f95aabf54ee270d
The fb_idx_ref_cnt book-keeping was in error. Added an assert to
prevent future errors in the reference count vector. Also fixed a
pointer syntax error.
Change-Id: I563081090c78702d82199e407df4ecc93da6f349
sad_per_bit has been used for a number of motion vector search routines
with different magic weights: 1, 1/2 and 1/4. This commit remove these
magic numbers and use 1/2 for all motion search routines, also reformat
a number of source code lines to within 80 column limit.
Test on cif set shows overall effect is neutral on all metrics. <=0.01%
Change-Id: I8a382821fa4cffc9c0acf8e8431435a03df74885
vp8_fast_quantize_b_pair_neon function added to quantize
two adjacent blocks at the same time to improve performance.
- Additional 3-6% speedup compared to neon optimized fast
quantizer (Tanya VGA@30fps, 1Mbps stream, cpu-used=-5..-16)
Change-Id: I3fcbf141e5d05e9118c38ca37310458afbabaa4e
Misplaced #endif caused first_time_stamp_ever to only be initialized if
CONFIG_INTERNAL_STATS was set.
Change-Id: I2296a4ab00f7dfb767583edcc5d59b94f48c0621
Added preload instructions to armv6 encoder optimizations.
About 5% average speed-up on Tegra2 for VGA@30fps sequence.
Change-Id: I41d74737720fb71ce7a316f07555357822f3347e
in onyx_if.c update_reference_frames() make
sure that frame buffer indexes are not equal
before preforming a buffer copy. If two frames
share the same buffer the flags will already be
set correctly.
Change-Id: Ida9b5516d08e3435c90f131d2dc19d842cfb536e
Test showed using hex search in realtime mode largely speed up
encoding process, and still achieves similar quality like the
diamond search we have. Therefore, removed the diamond search
option.
Change-Id: I975767d0ec0539f9f6ed7fdfc09506e39761b66c
error_per_bit and sad_per_bit were designed as estimates of a bit worth
of sum squared error and sum absolute difference respectively. Under
this assumption, error_per_bit should be used in combination with 2nd
order errors (variance or sum squared error) while sad_per_bit should
be used in combination with 1st order SADs in motion estimation. There
were a few places where sad_per_bit has been misused with variances,
this commit changes to use error_per_bit for those places, also changes
parameter names to properly indicate which constant is being used.
On cif set, the change has a universal gain by all metrics: 0.13% by
average/overall psnr and 0.1% by ssim.
Change-Id: I4850fdcc3fd6886b30f784bd843f13dd401215fb
While profile=3, there is no sub-pixel search. Distortion and SSE
have to calculated using get_inter_mbpred_error().
Change-Id: Ifb36e17eef7750af93efa7d0e2870142ef540184
Declared the bmi in MODE_INFO as a union instead of B_MODE_INFO.
This reduced the memory footprint by 518,400 bytes for 1080
resolutions. The decoder performance improved by ~4% for the
clip used and the encoder showed very small improvements. (0.5%)
This reduction was first mentioned to me by John K. and in a
later discussion by Yaowu.
This is WIP.
Change-Id: I8e175fdbc46d28c35277302a04bee4540efc8d29
fixed a bug where active_worst_quality could be set
below active_best_quality which could result in an
infinite loop.
Change-Id: I93c229c3bc5bff2a82b4c33f41f8acf4dd194039
This patch collects the twopass specific memebers of VP8_COMP into a
dedicated struct. This is a first step towards isolating the two pass
rate control and aids readability by decorating these variables with
the 'twopass.' namespace. This makes it clear to the reader in what
contexts the variable will be valid, and is a hint that a section of
code might be a good candidate to move to firstpass.c in later
refactoring. There likely will be other rate control modes that need
their own specific data as well.
This notation is probably overly verbose in firstpass.c, so an
alternative would be to access this struct through a pointer like
'rc->' instead of 'cpi->firstpass.' in that file. Feel free to make
a review comment to that effect if you prefer.
Change-Id: I0ab8254647cb4b493a77c16b5d236d0d4a94ca4d
The partition_info struct contains info just for SPLITMV,
so it should be used instead of BLOCKD. Eventually, I want
to reduce the size of B_MODE_INFO struct found in BLOCKD, so
this is the first step toward that goal.
Also, since SPLITMV is not supported in vp8_pick_inter_mode(),
the unnecessary mem copies and checks were removed. For rt
encodes, this gave a slight performance improvement.
Change-Id: I5585c98fa9d5acbde1c7e0f452a01d9ecc080574
The error-concealer is plugged in after any motion vectors have been
decoded. It tries to estimate any missing motion vectors from the
motion vectors of the previous frame. Intra blocks with missing
residual are replaced with inter blocks with estimated motion vectors.
This feature was developed in a separate sandbox
(sandbox/holmer/error-concealment).
Change-Id: I5c8917b031078d79dbafd90f6006680e84a23412
rvct 4.1 was complaining about vstmia.16, store multiple expects 64 data type.
optimized the implementation.
Change-Id: I0701052cabd685c375637bbc3796ff6d88f5972c
This commit restructures the mb activity masking code
to better facilitate experimentation using different metrics
etc. and also allows for adjustment of the zero bin either
for encode only or both the encode and mode selection
stages
It also uses information from the current frame rather than
the previous frame and the default strength has been
reduced.
Change-Id: Id39b19eace37574dc429f25aae810c203709629b
This patch improves the accuracy of frame rate estimation by using a
larger, 1 second window. It also more quickly adapts to step changes
in the input frame rate (ie 30fps to 15fps)
Change-Id: I39e48a8f5ac880b4c4b2ebd81049259b81a0218e
The compiler produces better assembly when using int_mv
for assignments. The compiler shifts and ors the two 16bit
values when assigning MV.
Change-Id: I52ce4bc2bfbfaf3f1151204b2f21e1e0654f960f
Further modification and wrong implementation fix which caused
refining_search and refining_searchx4 result mismatching.
Change-Id: I80cb3a44bf5824413fd50c972e383eebb75f9b6f
This is to reflect the RD improvement in the encoder. The change has a
small positive impact on quality (0.25% by VPXSSIM and 0.05% by PSNR)
Change-Id: Ic66ffc19b10870645088c0624c85556f009fd210
The variable is introduced in commit 2e53e9e53 to make more use of
trellis quantization, but this is no longer necessary after RDMULT
was made adaptive in a number of later commits.
Change-Id: I7420522ec7723f38cf77033466c25afb405d52ae
global values were being referenced, but the GOT was not being set up.
as the GOT is only required for PIC, this issue wasn't caught in the
default configuration.
Change-Id: I8006e53776139362a76f2c80cf9d0f8458602b2f
http://code.google.com/p/webm/issues/detail?id=328
In NEWMV mode, currently, full search is used as the refining search
after n-step search. By replacing it with an iterative diamond search
of radius 1 largely reduced the computation complexity, but still
maintained the same encoding quality since the refining search is
done for every macroblock instead of only a small precentage of
macroblocks while using full search.
Tests on the test set showed a 3.4% encoding speed increase with none
psnr & ssim loss.
Change-Id: Ife907d7eb9544d15c34f17dc6e4cfd97cb743d41
Paul pointed out that the pointer to the gf_active_flags is not being
properly incremented in multithreaded encoder. This commit fixes the
issue by making sure the gf_active_ptr points to the starting of next
group of mb rows.
Change-Id: I3246e657d23beabb614dfb880733a68a5fd7e34c
Commit db5057c introduced a bug in that the active_worst_quality
selected by the 2 pass rate controller was being overridden for key
frames, causing a severe quality loss.
Change-Id: I4865a6fbe3e94e9b4fb9271c7dd68b455d7b371d
VS2010 has included stdint.h, but not inttypes.h. Prefer the compiler's
version of these types. Fixes issue 327.
Change-Id: Ica71600e06b8e94e3bbb4f12988b4a9817d5e5e4
vp8_fast_quantize_b_neon function updated and further optimized.
- match current C implementation of fast quantizer
- updated to use asm_enc_offsets for structure members
- updated ads2gas scripts to handle alignment issues
Change-Id: I5cbad9c460ad8ddb35d2970a8684cc620711c56d
The existing emulation of posix semaphores on Windows uses SetEvent()
and WaitForSingleObject(), which implements a binary semaphore, not a
counting semaphore as implemented by posix. This causes deadlock when
used with the expected posix semantics. Instead, this patch uses the
CreateSemaphore() and ReleaseSemaphore() calls (introduced in Windows
2000) which have the expected behavior.
This patch also reverts commit eb16f00, which split a semaphore that
was being used with counting semantics into two binary semaphores.
That commit is unnecessary with corrected emulation.
Change-Id: If400771536a27af4b0c3a31aa4c4e9ced89ce6a0
This patch is to fix a rare hang in multi-thread encoder that was
only seen on Windows. Thanks for John's help in debugging the
problem. More test is needed.
Change-Id: Idb11c6d344c2082362a032b34c5a602a1eea62fc
Changed 8-neighbor searching to 4-neighour searching, and continued
searching until the center point is the best match.
Test on test set showed 1.3% encoding speed improvement as well as
0.1% PSNR and SSIM improvement at speed=-5 (rt mode).
Will continue to improve it.
Change-Id: If4993b1907dd742b906fd3f86fee77cc5932ee9a
The commit also removed the slow ssim calculation that uses a 7x7
kernel, and revised the comments to better describe how sample ssim
values are computed and averaged
Change-Id: I1d874073cddca00f3c997f4b9a9a3db0aa212276
Always use CFLAGS/LDFLAGS that point to headers and libvpx.a inside our
build tree before ones from the environment, which could reference
headers or libs outside the build tree.
This fixes issue 307.
Change-Id: I34d176b8c21098f6da5ea71f0147d3c49283cc45
Renamed configure option "enable-psnr" to "enable-internal-stats" to
better reflect the purpose of the option and eliminate the confusion
reported in http://code.google.com/p/webm/issues/detail?id=35
Change-Id: If72df6fdb9f1e33dab1329240ba4d8911d2f1f7a
Insertion sort performs better for sorting small arrays. In real-
time encoding (speed=-5), test on test set showed 1.7% performance
gain with 0% PSNR change in average.
Change-Id: Ie02eaa6fed662866a937299194c590d41b25bc3d
This change simply exercises the VP8D_GET_FRAME_CORRUPTED control,
outputting a warning message at the end if the bit was set for any
frames. Should never produce any output for good input.
Change-Id: Idaf6ba8f53660f47763cd563fa1485938580a37d
Allow more reliable detection of truncated bitstreams by being more
precise with the count of "virtual" bits in the value buffer.
Specifically, the VP8_LOTS_OF_BITS value is accumulated into count,
rather than being assigned, which was losing the prior value,
increasing the required tolerance when testing for the error condition.
Change-Id: Ib5172eaa57323b939c439fff8a8ab5fa38da9b69
Combine calc_iframe_target_size, previously only used for forced
keyframes, with calc_auto_iframe_target_size, which handled most
keyframes.
Change-Id: I227051361cf46727caa5cd2b155752d2c9789364
This is a first step in cleaning up the redundancies between
vp8_calc_{auto_,}iframe_target_size. The pick_frame_size() function is
moved to ratectrl.c, and made to be the primary interface. This means
that the various calc_*_target_size functions can be made private.
Change-Id: I66a9a62a5f9c23c818015e03f92f3757bf3bb5c8
Fixed test vector mismatch that was introduced
in the "Removed dc_diff from MB_MODE_INFO"
(Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931)
Change-Id: I98fa509b418e757b5cdc4baa71202f4168dc14ec
the decision to run the regular or simple loopfilter is made outside the
function and managed with pointers
stop tracking the option in two places. use filter_type exclusively
Change-Id: I39d7b5d1352885efc632c0a94aaf56b72cc2fe15
Rather than using a default size of 1/2 or 3/2 seconds for the first
frame, use a fraction of the initial buffer level to give the
application some control.
This will likely undergo further refinement as size limits on key
frames are currently under discussion on codec-devel@, but this gives
much better behavior for small buffer sizes as a starting point.
Change-Id: Ieba55b86517b81e51e6f0a9fe27aabba295acab0
vp8_adjust_key_frame_context() divides by
estimate_keyframe_frequency() which can
return 0 in the case where --kf-max-dist=0.
Change-Id: Idfc59653478a0073187cd2aa420e98a321103daa
Create a new input parameter to allow specifying
the packed frame stereo 3d format. A default value
of mono will be written in the absence of user
specified input
Change-Id: I576d9952ab5d7e2076fbf1b282016a9a1baaa103
The accumulator array is an integer array, so use paddd instead of paddw
to add values to it. Fixes overflows when using large --arnr-maxframes
(>8) values.
Change-Id: Iad83794caa02400a65f3ab5760f2517e082d66ae
The arguments to these fprintfs are int not long int so
the format specifier should be "%d" and not "%ld". This
was writing garbage in the linux build.
Change-Id: I3d2aa8a448d52e6dc08858d825bf394929b47cf3
This patch changes the release configuration of MS VS projects to
explicitly use two compiler options "Maximize Speed (/O2)" and
"Favor fast code(/Ot)".
Change-Id: I0bf8343d9ca195851332b91ec69c69ee4e31ce2a
add an sse4 quantizer so we can use pinsrw/pextrw and keep values in xmm
registers instead of proxying through the stack. and as long as we're
bumping up, use some ssse3 instructions in the EOB detection (see ssse3
fast quantizer)
pick up about a percent on 32bit and about two on 64bit.
Change-Id: If15abba0e8b037a1d231c0edf33501545c9d9363
The dc_diff flag is used to skip loopfiltering. Instead
of setting this flag in the decoder/encoder, we now check
for this condition in the loopfilter.
Change-Id: Ie2b9cdf9e0f4e8b932bbd36e0878c05bffd28931
Code cleanup. The build inter predictor functions are
redundantly checking the mode_info_context for either
INTRA_FRAME or SPLITMV.
Change-Id: I4d58c3a5192a4c2cec5c24ab1caf608bf13aebfb
Golden and ALT reference buffers were refreshed by copying from
the new buffer. Replaced this by index manipulation.
Also moved all the reference frame updates to one function for
easier tracking.
Change-Id: Icd3e534e7e2c8c5567168d222e6a64a96aae24a1
Remove tot_key_frame_bits and prior_key_frame_size[] as they were
tracked but never used. Remove intra_frame_target, as it was only
used to initialize prior_key_frame_size.
Refactor vp8_adjust_key_frame_context() some to remove unnecessary
calculations.
Change-Id: Icbc2c83d2b90e184be03e6f9679e678f3a4bce8f
the win64 abi requires saving and restoring xmm6:xmm15. currently
SAVE_XMM and RESTORE XMM only allow for saving xmm6:xmm7. allow
specifying the highest register used and if the stack is unaligned.
Change-Id: Ica5699622ffe3346d3a486f48eef0206c51cf867
Went through the code and fixed it. Verified on Windows.
Where possible, remove dependencies on xmm[67]
Current code relies on pushing rbp to the stack to get 16 byte
alignment. This broke when rbp wasn't pushed
(vp8/encoder/x86/sad_sse3.asm). Work around this by using unaligned
memory accesses. Revisit this and the offsets in
vp8/encoder/x86/sad_sse3.asm in another change to SAVE_XMM.
Change-Id: I5f940994d3ebfd977c3d68446cef20fd78b07877
Passed SSE from sub-pixel search back to pick_inter_mode
function, which is compared with the encode_breakout to
see if we could skip evaluating the remaining modes.
Change-Id: I4a86442834f0d1b880a19e21ea52d17d505f941d
This is reported by m...@hesotech.de (see issue 312):
"The decoder causes an access violation
when you decode the first frame, then make a pause of about
60 seconds and then decode further frames. But only if
vpx_codec_dec_cfg_t.threads> 1.
This is caused by a timeout of WaitForSingleObject.
When I change the definition of VPXINFINITE to INFINITE(0xFFFFFFFF),
the problem is solved."
Reproduced the crash and verified the changes on Windows platform.
This brings the behavior inline with the other platforms using sem_wait().
Change-Id: I27b32f90bce05846ef2684b50f7a88f292299da1
According to the docs, this should have been enabled, but
the disassembled output shows otherwise. This improved
the encode/decode performance.
Change-Id: I45ad7e6d299b89ac3166d7ef7da75b74994344c6
vp8_filter_block1d16_h4_ssse3 was never called
because UNSHADOW_ARGS moves the stack by 'mov rsp, rbp', the issue was
masked. however, if/when win64 used those registers for persistant data,
issues could/will arise.
Change-Id: I56d6effca0aeba1f86082689771cb10145d39651
In vp8_pick_inter_mode(), for NEWMV mode, use the error result got
from motion search as distortion. This helps performance in real-
time mode.
Change-Id: I398c4e46cc5381f7d874e748cf78827ef0e0860c
The value of distortion2 returned by vp8_pick_intra4x4mby_modes
was being overwritten by the value returned by get16x16prederror
before it was tested.
Change-Id: If00e80332b272c5545c3a7e381c8041e8319b41a
update for the latest version of the ios sdk. adding
usr/lib/system fixes a missing libcache.dylib issue
make isysroot path more DRY
Change-Id: Ib748ef3dac3cac2e4848fbffa1e9a0112eac826b
Index i is used to detect early breakout from the first loop, but
its value is lost due to reuse in the second for loop. I moved
the position of the second loop and did some format cleanup.
Change-Id: I02780eae1bd89df4b6c000fb8a018b0837aac2e5
This patch cleans up the source buffer storage and copy mechanism to
allow access through a standard push/pop/peek interface. This approach
also avoids an extra copy in the case where the source is not a
multiple of 16, fixing issue #102.
Change-Id: I05808c39f5743625cb4c7af54cc841b9b10fdbd9
in encodframe.c, quant_shift is set to 0 or 1 in vp8cx_invert_quant
only use 8 bits to store this, instead of 16. will allow saving an
xmm register in an updated version of the regular quantize
Change-Id: Ie88c47fe2aff5af0283dab1147fb2791e4b12f90
This patch changes the rc_undershoot_pct and rc_overshoot_pct controls
to set the "aggressiveness" of rate adaptation, by limiting the
amount of difference between the target buffer level and the actual
buffer level which is applied to the target frame rate for this frame.
This patch was initially provided by arosenberg at logitech.com as
an attachment to issue #270. It was modified to separate these controls
from the other unrelated modifications in that patch, as well as to
use the pre-existing variables rather than introducing new ones.
Change-Id: Id542e3f5667dd92d857d5eabf29878f2fd730a62
Previous to commit de4e9e3, there was an early return in the alt-ref
case that was inadvertantly removed when the function was refactored
to return void. This patch restores the prior behavior.
Change-Id: I783ffd594a4690297e2742f99526fd7ad67698b2
The error accumulator stats values cpi->prediction_error and
cpi->intra_error were being populated with rd values not
distortion values.
These are only "currently" used in a limited way for RT compress
key frame detection.
Change-Id: I2702ba1cab6e49ab8dc096ba75b6b34ab3573021
This commit fixed an overflow in ssim calculation, added register
save and restore to make sure assembly code working for x64 platform.
It also changed the sampling points to every 4x4 instead of 8x8 and
adjusted the constants in SSIM calculation to match the scale of
previous VPXSSIM.
Change-Id: Ia4dbb8c69eac55812f4662c88ab4653b6720537b
on the same order as the sse2 fast quantize change: ~2%
except for 32bit. only a slight improvment there.
Change-Id: Iff80e5f1ce7e646eebfdc8871405458ff911986b
MV sad cost error is only used in full-pixel motion search,
which only need full-pixel resolution instead of quarter-pixel
resolution. This change reduced mvsadcost table size, and
removed unneccessary pamameter passing since this table is
constant once it is generated.
Change-Id: I9f931e55f6abc3c99011321f1dfb2f3562e6f6b0
cygwin doesn't support _sopen. drop down to the lowest common
denominator and merge main for all platforms. this also opens the door
for supporting multiple object formats with a single binary.
Change-Id: I7cd45091639d447434e6d5db2e19cfc9988f8630
rather than look up rc in the zig zag table, embed it in the macro. this
also allows us to shuffle some values in the macro and keep *d in rsi
gains of about the same order as the obj_int_extract implementation: ~2%
Change-Id: Ib7252dd10eee66e0af8b0e567426122781dc053d
Address calculations moved from encodemb_arm.c file to neon
optimized assembly function to save cycles in function calls.
- vp8_subtract_b_neon_func replaced with vp8_subtract_b_neon
that contains all needed address calculations
- unnecessary file encodemb_arm.c removed
- consistent with ARMv6 optimized version
Change-Id: I6cbc1a2670b56c2077f59995fcf8f70786b4990b
Detect the number of available cores and limit the thread allocation
accordingly. On decoder side limit the number of threads to the max
number of token partition.
Core detetction works on Windows and
Posix platforms, which define _SC_NPROCESSORS_ONLN or _SC_NPROC_ONLN.
Change-Id: I76cbe37c18d3b8035e508b7a1795577674efc078
Rules are added to libs.mk to generate a vpx.pc, which is
installed as pkgconfig/vpx.pc under the target library directory.
This also requires the install path prefix be exported directly
in config.mk.
Some systems use a tool called pkg-config to query information
about intalled libraries or other resources, based on database
files provided by the packages themselves at install time.
Providing such a file for libvpx simplifies integration with
other build systems, and provides an easy avenue for developers
to test against their own builds of the library.
Change-Id: I4e32a8fbb53fc331aa95eb207c63dd70a76d18ed
The configure script exports the major/minor/patch version
numbers, but didn't make the full version string available
to Makefile recipes and rules, the way it is available to
C code from vpx_version.h.
Change-Id: Ic6a9d4c574a6ea66a50c928f4eedeb91d7668eb5
Identified as a possible cause of issue #308, the code was silently
ignoring realloc failures, which would lead to corruption, memory
leaks, and likely a crash. The best we can do in this case is die
gracefully.
Change-Id: Ie5f6a853d367015be5b9712bd742778f3baeefd9
Adds following ARMv6 optimized functions to encoder:
- vp8_subtract_b_armv6
- vp8_subtract_mby_armv6
- vp8_subtract_mbuv_armv6
Gives 1-5% speed-up depending on input sequence and encoding
parameters. Functions have one stall cycle inside the loop body
on Cortex pipeline.
Change-Id: I19cca5408b9861b96f378e818eefeb3855238639
now that we need asm_enc_offsets.c for x86 and arm and it is
harmless to build it for other targets, add it unconditionally
Change-Id: I320c5220afd94fee2b98bda9ff4e5e34c67062f3
Half pixel interpolations optimized in variance calculations. Separate
function calls to vp8_filter_block2d_bil_x_pass_armv6 are avoided.On
average, performance improvement is 6-7% for VGA@30fps sequences.
Change-Id: Idb5f118a9d51548e824719d2cfe5be0fa6996628
remove helper function and avoid shadowing all the arguments to the
stack on 64bit systems
when running with --good --cpu-used=0:
~2% on linux x86 and x86_64
~2% on win32 x86 msys and visual studio
more on darwin10 x86_64
significantly more on
x86_64-win64-vs9
Change-Id: Ib7be12edf511fbf2922f191afd5b33b19a0c4ae6
This declaration did not match the prototype_sad() prototype, but was
unused in this translation unit, so it is removed instead. Fixes
issue 290.
Change-Id: I168854f88a85f73ca9aaf61d1e5dc0f43fc3fdb3
Map an enum to the --end-usage values, so you can specify
--end-usage=cq instead of --end-usage=2. The numerical values still
work for historical scripts, etc, but this is more user friendly.
Change-Id: I445ecd9638f801f5924a71eabf449bee293cdd34
Optimized fdct4x4 (8x4) for ARMv6 instruction set.
- No interlocks in Cortex-A8 pipeline
- One interlock cycle in ARM11 pipeline
- About 2.16 times faster than current C-code compiled with -O3
Change-Id: I60484ecd144365da45bb68a960d30196b59952b8
Thread synchronization was not correct when frame width was 1 MB.
Number of allocated encoding threads is limited by the sync_range.
There is no point having more because each thread lags sync_range MBs
behind the thread processing the row above.
http://code.google.com/p/webm/issues/detail?id=302
Change-Id: Icaf67a883beecc5ebf2f11e9be47b6997fdf6f26
A large number of functions were defined with external linkage, even
though they were only used from within one file. This patch changes
their linkage to static and removes the vp8_ prefix from their names,
which should make it more obvious to the reader that the function is
contained within the current translation unit. Functions that were
not referenced were removed.
These symbols were identified by:
$ nm -A libvpx.a | sort -k3 | uniq -c -f2 | grep ' [A-Z] ' \
| sort | grep '^ *1 '
Change-Id: I59609f58ab65312012c047036ae1e0634f795779
The mmap allocation code in vp8_dx_iface.c was inconsistent.
The static array vp8_mem_req_segs defines two descriptors,
but only the first is real. The second is a sentinel and
isn't actually allocated, so vpx_codec_alg_priv is declared
with mmaps[NELEMENTS(vp8_mem_req_segs)-1]. Some functions
use this reduced upper bound when iterating though the mmap
array, but these two functions did not.
Instead, this commit calls NELEMENTS(...->mmaps) to directly
query the bounds of the dereferenced array.
This fixes an array-bounds warning from gcc 4.6 on
vp8_xma_set_mmap.
Change-Id: I918e2721b401d134c1a9764c978912bdb3188be1
Fixes implicit declaration warning for 'mach_task_self'. This change
is an update to Change I9991dedd1ccfddc092eca86705ecbc3b764b799d,
which fixed this issue for the decoder but not the encoder.
Change-Id: I9df033e81f9520c4f975b7a7cf6c643d12e87c96
obj_int_extract was unconditionally skipping the first character in the
symbol. make sure it's actually an '_' first
Change-Id: Icfe527eb8a0028faeabaa1dcedf8cd8f51c92754
Without this change I get link errors in firefox's libxul. It looks
like the linker expect a particular pattern for getting the GOT. This
patch changes webm to use the same pattern used by the compiler.
Change-Id: Iea8c2e134ad45c1dc7d221ff885a8429bfa4e057
The vp8_build_intra_predictors_mby and vp8_build_intra_predictors_mby_s
functions had global function pointers rather than using the RTCD
framework. This can show up as a potential data race with tools such as
helgrind. See https://bugzilla.mozilla.org/show_bug.cgi?id=640935
for an example.
Change-Id: I29c407f828ac2bddfc039f852f138de5de888534
Clean up vp8_init_config() a bit and remove null pointer case,
as this code can't be called any more and is not an adequate
trap anyway, as a null pointer would cause exceptions before
hitting the test.
Change-Id: I937c00167cc039b3aa3f645f29c319d58ae8d3ee
Issue 291 highlighted the fact that CQ mode was not working
as expected in 1 pass mode,
This commit fixes that specific problem but in so doing I also
uncovered an overflow issue in the VBR code for 1 pass and
some data values not being correctly initialized.
For some clips (particularly short clips), the resulting
improvement is dramatic.
Change-Id: Ieefd6c6e4776eb8f1b0550dbfdfb72f86b33c960
In multithreaded mode the loopfilter is running in its own thread (filter level
calculation and frame filtering). Filtering is mostly done in parallel with the
bitstream packing. Before starting the packing the loopfilter level has
to be calculated. Also any needed reference frame copying is done in the
filter thread.
Currently the encoder will create n+1 threads, where n > 1 is the number of
threads specified by application and 1 is the extra filter thread. With n = 1
the encoder runs in single thread mode. There will never be more than n threads
running concurrently.
Change-Id: I4fb29b559a40275d6d3babb8727245c40fba931b
Enable extraction of assembly offsets from compiled examples in MSVS.
This will allow us to remove some stub functions from x86 assembly since
we will be able to reliably determine structure offsets at compile time.
see ARM code for examples:
vp8/encoder/arm/armv5te/
vpx_scale/arm/neon/
Change-Id: I1852dc6b56ede0bf1dddb5552196222a7c6a902f
The firstpass motion map consists of an 8-bit flag for
each MB indicating how strongly the firstpass code
believes it should be filtered during the second pass
ARNR filtering.
For long or large format material the motion map can
become extremely large and hamper the operation of
the encoding process.
This change removes the motion map altogether, leaving
the second pass to rely on the magnitude of the motion
compensated error to determine the filter weight to
use for the MB during ARNR filtering.
Tests on the derf set indicate that the effect of this
change is neutral, with some small wins and losses. The
motion map has therefore been removed based on
a cost/benefit evaluation.
Change-Id: I53e07d236f5ce09a6f0c54e7c4ffbb490fb870f6
The previous calculation of macroblock count (w*h)/256
is not correct when the width/height are not multiples of
16. Use the precalculated macroblock count from
cpi->common instead. This manifested itself as a divide
by zero when the number of pixels was less than 256.
num_mbs updated in estimate_max_q, estimate_q,
estimate_kf_group_q, and estimate_cq
Change-Id: I92ff98587864c801b1ee5485cfead964673a9973
Move the update of the loopfilter info to the same block where it
is used. GCC 4.5 is not able trace the initialization of the local
filter_info across the other calls between the two conditionals on
pbi->common and issues an uninitialized variable warning.
Change-Id: Ie4487b3714a096b3fb21608f6b0c74e745e3c6fc
failed to find headers in the source directory
output to stdout instead of a hardcoded file
MinGW doesn't support _sopen_s
_fstat catches non-existant files
Change-Id: I24e0aacc6f6f26e6bcfc25f9ee7821aa3c8cc7e7
1. Process 16 pixels at one time instead of 8.
2. Add check for both xoffset =0 and yoffset=0, which happens
during motion search.
This change gave encoder 1%~3% performance gain.
Change-Id: Idaa39506b48f4f8b2fbbeb45aae8226fa32afb3e
GCC 4.5 and 4.6 both issue a warning about the multi-line format
string introduced in bc9c30a0, which also changed the whitespace
in the associated stt file by line-wrapping the long format string.
Instead, use multiple string constants, which the compiler will
concatenate. This maintains the original formatting, but remains
legible within the standard line length.
Change-Id: I27c9f92d46be82d408105a3a4091f145f677e00e
Disable zbin boost in SPLITMV mode as intended. Was incorrectly looking
at vp8_ref_frame_order instead of vp8_mode_order when comparing against
SPLITMV. This condition should have always been false, as SPLITMV is
not in the range of valid reference frames.
Change-Id: I0408cc7595eff68f00efef6d008e79f5b60d14bf
Cast size_t to (unsigned long) and print it with the %lu format
string, which is more portable than C99's explict %zu for size_t.
This truncates on Windows x64 but otherwise works on 32 and 64 bit
platforms. In practice the stats file is unlikely to be so large.
Change-Id: I0432b3acf85fc6ba4ad50640942e1ca4614b21cb
In some cases where clips have been encoded with
borders (eg. some wide-screen content where there is a
border top and bottom and slide shows containing portrait
format photographs (border left and right)) key frames were
not being correctly detected.
The new code looks to measure cases where a portion of
the image can be coded equally easily using intra or inter
modes and where the resulting error score is also very low.
These "neutral" areas are then discounted in the key frame
detection code.
Change-Id: I00c3e8230772b8213cdc08020e1990cf83b780d8
This code extends what was previously done for GFs, to pick
cases where insertion of a key frame after a fade (or other
transition or complex motion) followed by a still section, will
be beneficial and will reduce the number of forced key frames.
Change-Id: If8bca00457f0d5f83dc3318a587f61c17d90f135
Rename the common control id enum vp8_{dec,com}_control_id,
move VP8_DECODER_CTRL_ID_START to common, wrap long lines.
Change-Id: I659abc62f10aa389d496f7f43950775db0ef2f9f
When the modified_error_left accumulator exceeds INT_MAX, an incorrect
cast to int resulted in a negative value, causing the rate control to
allocate no bits to that keyframe group, leading to severe undershoot
and subsequent poor quality.
This error was exposed by the recent change to the rolling target and
actual spend accumulators in commit 305be4e4 which fixed them to
actually calculate the average value rather than be re-initialized
on every frame to the average per-frame bitrate. When this bug was
triggered, the target bitrate could be 0, so the rolling target
becomes small, which causes the undershoot. The code prior to 305be4e4
did not exhibit this behavior because the rolling target was always
set to a reasonable value and was independent of the actual target
bitrate. With this patch, the actual target bitrate is calculated
correctly, and the rate control tracks as expected.
This cast was likely added to silence a compiler warning on a comparison
between a double (modified_error_left) and an int (0). Instead, this
patch removes the cast and changes the comparison to be against 0.0,
which should prevent the warning from reoccuring.
This fixes issue #289. Special thanks to gnafu for his efforts in
reporting and debugging this fix.
Change-Id: Ie5cc1a7b516c578a76c3a50c892a6f04a11621fe
add visual studio 9 to --help
remove cpp, cxx, hpp, hxx files from filter
add the ability to target project names. this will be necessary to
enable obj_int_extract
Change-Id: I407583320d8b67a0df40c07221838c42678792f7
AMD64 only implies SSE2, not SSE3. There aren't any known cases where
icc was generating SSE3 instructions since all the vectorizable code
is already in handwritten asm, so this fix is included mostly for
correctness. Fixes issue #259.
Change-Id: I993335a4740b68b559035305fb52ca725a6beaff
MSVC can't pass the address of global variables in a DLL correctly
across DLL boundaries. This patch allows linking the examples to
a libvpx dll build. Fixes issue #268.
Change-Id: I1c52d076cfc68efb3efdfba019f12d53c5019f58
This allows the base documentation to be built without the need for php
which is required to produce the example documentation
Change-Id: Id1861723c672fa8da132a074a4657e2cb94c1e79
Group algorithm interfaces to avoid undocumented warning from doxygen
and provide basic documentation for CQ level & cpuused.
Change-Id: I11095061be962cbc998741de9c8c3019d415e137
This fixes an overflow problem in the frame error accumulators.
The overflow condition is extreme but did trigger when Frank B.
coded some high motion interlaced HD content.
The observed effect was a catastrophic breakdown of the rate
control leading to massive undershoot and poor bit allocation.
All the error values should really be unsigned but I will look at this
separately.
Change-Id: I9745f5c5ca2783620426b66b568b2088b579151f
- correct spelling
- remove explicit file name w/\file (unnecessary when contained in the
same file and prone to desync)
Change-Id: I68a3960ac5ab84d0f2e5c9b2e29799f26dfccf23
Currently, when the video frame width is not multiples of 16, the
source buffer has a stride of non-multiples of 16, which forces
an unaligned load in SAD function and hurts the performance. To
avoid that, this change allocates source buffers to be multiples
of 16.
Change-Id: Ib7506e3eb2cea06657d56be5a899f38dfe3eeb39
In real-time mode, vp8_sad16x16 function is called heavily in
motion search part. Improvement of this function gives 1.2%
encoding performance gain (real-time mode, tulip clip).
Change-Id: I23c401fc40c061f732a9767e8d383737a179bd58
checks added to make sure that cpi->tplist
is freed correctly in vp8_dealloc_compressor_data
and vp8_alloc_compressor_data.
Change-Id: I66149dbbd25c958800ad94f4379d723191d9680d
Eliminated unnecessary calculations. Improved performance
by 10% on keyframes and 1.6% overall for the test clip used.
Change-Id: I87671b26af5e2cc439e81d0fee3b15c7cd2a3309
As mentioned in check-in "Improve motion search in real-time mode",
MV prediction calculation causes speed loss for speed 7 and above.
This change added a flag to turn off this calculation for speed>6
in real-time mode.
Change-Id: I9f4ae5a8bf449222d1784b54e7d315fc8347b2d1
Applied better MV prediction in real-time mode, which improves
the encoding quality.
Used quarter-pixel search instead of iterative sub-pixel search
for speed >=5 to improve encoding performance.
Tests on the test set showed:
1. For speed=-5, quality improvement: 1.7% on AvgPSNR and 2.1%
on SSIM, performance improvement: 3.6% (This counts in the
performance lose caused by MV prediction calculation in "Improve
MV prediction in vp8_pick_inter_mode() for speed>3").
2. For speed=-8, quality improvement: 2.1% on AvgPSNR and 2.5%
on SSIM. but, 6.9% performance decrease because of MV prediction
calculation. This should be improved later.
Change-Id: I349a96c452bd691081d8c8e3e54419e7f477bebd
Created a new speed 1 which is in the middle of the old
speed 0 and speed 1. (for both quality and performance)
Change-Id: I4802133cdb43f359ca787646c090899679dd5d84
Use 0xFFF0 vice 240 (0xF0) for determining whether the sometimes
implicit bit 3 will be transmitted. This is consistent with the decoder
and encode_mvcomponent().
Change-Id: Ic1304d0ab56844bed8236edd1c5243a6767fc6b1
make reference version of bilinear_filters short.
use reference versions of bilinear_filters and sub_pel_filters when
possible.
recognize that Width was being passed into
filter_block2d_bil_first_pass multiple times. ARM version had already
fixed this. propegate to C.
change references to src_pixels_per_line to src_pitch and standardize on
src/dst (instead of input/output).
recognize that first_pass is only run in the verticle and second_pass
only horizontal. ARM version had already fixed this. propegate to C
Change-Id: I292d376d239a9a7ca37ec2bf03cc0720606983e2
it's difficult to mux the *_offsets.c files because of header conflicts.
make three instead, name them consistently and partititon the contents
to allow building them as required.
Change-Id: I8f9768c09279f934f44b6c5b0ec363f7943bb796
preview_img d_w and d_h along with w and h
would not be updated for resized frames.
now uses sd.y_width and sd.y_height
Change-Id: I52241de4cc1de5e73f865e668bd70a7cbd954390
When the keyframe distance is fixed the first interval has the right
distance but, the next ones have kf_distance + 1.
Change-Id: I44f1190fe7146124bd07660a5e0ef08829e3ae07
common/arm/vpx_asm_offsets moves up a level. prepare for muxing with
encoder/arm/vpx_vp8_enc_asm_offsets
Change-Id: I89a04a5235447e66571995c9d9b4b6edcb038e24
Only suppress unused function warnings, rather than supprressing all
unused-* warnings. Unused functions can still be seen with
--enable-extra-warnings.
Change-Id: Ibca20d859dbffedd76bd082ffe0fa685c3ac198e
The encoder was not correctly catching transitions in the quantizer
deltas. If a delta_q was set, then the quantizer would be reinitialized
on every frame, but if they transitioned to 0, the quantizer would
not be reinitialized, leading to a encode-decode mismatch.
This bug was triggered by commit 999e155, which sets a Y2 delta Q
for very low base Q levels.
Change-Id: Ia6733464a55ee4ff2edbb82c0873980d345446f5
Whe auto keyframe insertion is enabled and conditions are right (scene change)
the encoder can decide to insert a key frame and does a re-encoding. This can
introduce extra latency. In RT mode we do not do the re-encoding of the current
frame but force the next frame to key frame.
Change-Id: I15c175fa845ac4c1a1f18bea3676e154669522a7
Reduce the number of sync points by letting each thread
continue imediatly with a new MB row.
Better multicore scaling, improves performance by 5-20% on ARM multicore.
Change-Id: Ic97e4d1c4886a842c85dd3539a93cb217188ed1b
from vp8cx_encode_intra_macro_block. prediction_error is used when
deciding if a frame should be a keyframe. After reviewing this with
Yaowu, it was pointed out that vp8cx_encode_intra_macro_block
is only called for keyframes, so the accumulation is unnecessary.
Change-Id: Id79dc81b80d4f5d124f3a0dba1b923887e2e1ec8
vp8_pick_intra4x4mby_modes uses the passed in distortion
for an early breakout. The best distortion was never saved
and the distortion for TM_PRED was always used.
Change-Id: Idbaf73027408a4bba26601713725191a5d7b325e
Previously, the DC check is to make sure there is no code-able
DC shift for quantizer Q0, which has been verified rather
conservative. This commit changes the criteria to have two
components, DC and AC, to address the conservativeness. First,
it checks if all AC energy is enough to contribute a single
non-zero quantized AC coefficient. Second, for DC, the decision
to skip further considers two possible scenarios: 1. There is
no code-able 2nd order DC coefficient at all; 2 The residue is
relatively flat, but the uniform DC change is very small, i.e.
less than 1/2 gray level per pixel.
Comparing to previous criteria, the new criteria is about 10%
to 15% faster in encoding time with a very small quality loss.
(threshold ~1000 and quality range 33db-45db)
It should be noted that this commit enables "automatic" static
threshold for encodebreakout if a non-zero small value is passed
in to encoder.
Change-Id: I0f77719a1ac2c2dfddbd950d84920df374515ce3
The condition for using RD when selecting the intra coding mode
for a MB is that the RD flag is set AND we're not in real-time
mode.
Previously the code used RD if either the RD flag was set OR
we were not using real-time mode.
Change-Id: Ic711151298468a3f99babad39ba8375f66d55a08
This function was using a variance metric compared to and SSE metric in
other places (eg. vp8_rd_inter_uv)
Change-Id: I9109fcc5a13bca9db1d7ead500fe14999ab233eb
Adds following targets to configure script to support RVCT compilation
without operating system support (for Profiler or bare metal images).
- armv5te-none-rvct
- armv6-none-rvct
- armv7-none-rvct
To strip OS specific parts from the code "os_support"-config was added
to script and CONFIG_OS_SUPPORT flag is used in the code to exclude OS
specific parts such as OS specific includes and function calls for
timers and threads etc. This was done to enable RVCT compilation for
profiling purposes or running the image on bare metal target with
Lauterbach.
Removed separate AREA directives for READONLY data in armv6 and neon
assembly files to fix the RVCT compilation. Otherwise
"ldr <reg>, =label" syntax would have been needed to prevent linker
errors. This syntax is not supported by older gnu assemblers.
Change-Id: I14f4c68529e8c27397502fbc3010a54e505ddb43
vp8/encoder/rdopt.c:728: warning: pointer targets in passing argument 3
of 'macro_block_yrd' differ in signedness
vp8/encoder/rdopt.c:541: note: expected 'int *' but argument is of type
'unsigned int *'
distortion is signed when calling macro_block_yrd is both other cases,
as well as for RDCOST
Change-Id: I5e22358b7da76a116f498793253aac8099cb3461
Change-Id: I6ca2d89f355839c4c770773c09fc69dcea7c1406
warning: implicit declaration of function
'vp8_variance_halfpixvar16x16_[h|v|hv]_neon'
'vp8_sub_pixel_variance16x16_neon_func'
Improved the performance of the first pass only
(~6% on 720p test clip) by making use of LUT instead of the
float calculations. Might try a SIMD version later.
Also started to make use of int_mv instead of
MV.
Change-Id: If2a217c7d6b59cd2c25c5553e0ca7e0502403af8
Use the function macro_block_yrd() to calculate error and distortion
in keeping with what is done for inter frames.
The old code was using a variance metric for once case and an
SSE function for measuring distortion in the other case.
The function vp8_encode_intra16x16mbyrd() is no longer used.
Change-Id: Ic228cb00a78ff637f4365b43f58fbe5a9273d36f
The code previously tested cpi->common.refresh_alt_ref_frame
but there are situations where this flag may be set for viewable frames.
The correct test should be !cm->show_frame.
Change-Id: Ia1a600622992a4a68fe1d38ac23bf6b34b133688
This commit also removes artificial RDMULT cap for low quantizers.
The intention is to address some abnormal behavior of mode selections
at the low quantizer end, where many macroblocks were coded with
SPLITMV with all partitions using same motion vector including (0,0).
This change improves the compression quality substantially for high
quality encodings in both PSNR and SSIM terms. Overall effect on
mid/low rate range is also positive for all metrics, but smaller
in magnitude.
Change-Id: I864b29c4bd9ff610d2545fa94a19cc7e80c02667
Commit 336aa0b7da incorrectly
declared current_pos as and int, when it should have been
a FIRSTPASS_STATS pointer.
Change-Id: I0a51c7a86ebba8546c95dd5d9d1c1143d4613e40
Adjust checking points in motion vector prediction to better cover
possible movements, and get a better prediction. Tests on test
clips showed a 0.1% improvement in SSIM, and no change in PSNR
and performance.
Change-Id: Ifdab05d35e10faea1445c61bb73debf888c9d2f8
The old 2 pass code estimated error distribution when coding a
forced (by interval) key frame. The result of this was that in some
cases, when allocating bits at the GF group level within a KF
group there was either a glut of bits or starvation of bits at the end
of the KF group.
Added code to rescan and get the correct data once the position of
a forced key frame has been determined.
Change-Id: I0c811675ef3f9e4109d14bd049d7641682ffcf11
-For targets with external build systems like visual
studio CC is not set so check_add_cflags will fail.
Only call this function if extra_cflags is set.
Change-Id: I3531bad69e9b6a59c5be1b0e8b6053ccccbc332c
vp8cx_mb_init_quantizer was being called for every mode checked
in vp8_rd_pick_inter_mode. zbin_extra is the only value that
really needs to be recalculated. This calculation is disabled
when using the fast quantizer for mode selection.
This gave a small performance boost (~.5% to 1%).
Note: This needs to be verified with segmentation_enabled.
Change-Id: I62716a870b3c82b4a998bdf95130ff0b02106f1e
In sub-pixel calculation, xoffset and yoffset mostly take some
specific values. Modified sub-pixel filter functions according to
these possible values to improve performance.
Change-Id: I83083570af8b00ff65093467914fbb97a4e9ea21
Added code to scan ahead a few frames when we see what
we think is a static scene in the two pass GF loop to see if the
conditions persist.
Moved calculation of decay rate out into a fuunction.
Change-Id: I6e9c67e01ec9f555144deafc8ae67ef25bffb449
These changes are specifically targeted at fade transitions to
static scenes. Here we want to place a GF/ARF immediately
after the fade and prevent an ARF just before the fade.
Also some code lines and comment lines shortened to 80 chars
while I was there.
Change-Id: Iefdc09a4fa7b265048fc017246b73e138693950f
Add --extra-cflags as config parameter for user defined extra CFLAGS.
Add -g to asflags when debug enabled for arm targets.
Change-Id: Ibdde7cfdda6736c1c1db45e6466bd08504a51f15
In both vp8_find_next_key_frame and define_gf_group,
motion_pct was initialised at the top of the loop before
next_frame stats had been read in.
This fix sets motion_pct after next_frame stats have
been read.
Change-Id: I8c0bebf372ef8aa97b97fd35b42973d1d831ee73
Incorrect value loop_decay_rate used in GF loop.
The intent was to test the cumulative value decay_accumulator.
Change-Id: I62928c63eb09f4f6936a45ebd1c23784d1c9681b
A new vpx_codec_control called VP8D_GET_FRAME_CORRUPTED. The output
from the function is non-zero if the last decoded frame contains
corruption due to packet losses.
The decoder is also modified to accept encoded frames of zero length.
A zero length frame indicates to the decoder that one or more frames
have been completely lost. This will mark the last decoded reference
buffer as corrupted. The data pointer can be NULL if the length is
zero.
Change-Id: Ic5902c785a281c6e05329deea958554b7a6c75ce
In vp8_find_best_sub_pixel_step_iteratively(), many times xoffset
and yoffset are specific values - (4,0) (0,4) and (4,4). Modified
code to call simplified NEON version at these specific offsets to
help with the performance.
Change-Id: Iaf896a0f7aae4697bd36a49e182525dd1ef1ab4d
This code fixes a bug in the calculation of
the minimum Q for alt ref frames.
It also allows an extended gf/arf interval for sections
of clips that completely static (or nearly so).
Change-Id: I1a21aaa16d4f0578e5f99b13bebd78d59403c73b
Remove allocation/deallocation of stats storage.
Remove full search functions in machine specific encoder inits.
Remove last pass validation in validate_config.
Change-Id: I7f29be69273981a4fef6e80ecdb6217c68cbad4e
The CQ level was not using the q_trans[] array to convert
to a 0-127 range as per min and maxq
Experimental change to try and match the reconstruction
error for forced key frames approximately to that of the
previous frame by means of the recode loop. Though this
may cause extra recodes and the recode behavior has not
been optimized, it can only happen on forced key frames.
Change-Id: I1f7e42d526f1b1cb556dd461eff1a692bd1b5b2f
Previously when a frame was being overlaid on a previously coded
alt ref frame we only checked the alt ref 0,0 mode. Where there is
a possibility that the alt ref buffer is a filtered frame we should allow
the other prediction modes as normal or at the least allow use of
the last frame buffer.
Change-Id: I4d6227223d125c96b4f3066ec6ec9484fee7768c
In cases where the frame width is not a multiple of 16 the
ARNR filter would go wrong.
In vp8_temporal_filter_iterate_c when updating pointers
at the end of a row of MBs, the image size was
incorrectly used rather than using Num_MBs_In_Row
times 16 (Y) or 8 (U,V).
This worked when width is multiple of 16 but failed
otherwise.
Change-Id: I008919062715bd3d17c7aa2562ab58d1cb37053a
This change is designed to try and reduce pulsing effects when moving
with a complex transition like a fade, into an easy or static section in
an otherwise difficult clip in CQ mode.
The active CQ level is relaxed down to the user entered level for frames that
are generating less than the passed in minimum bandwidth.
Change-Id: Id6d8b551daad4f489c087bd742bc95418a95f3f0
Fixed discrepancy cpi->ni_frames vs cm->current_video_frame > 150.
Make one pass path explicit.
There is still scope for some odd behaviour around the transition
point at cpi->ni_frames > 150.
Change-Id: Icdee130fe6e2a832206d30e45bf65963edd7a74d
Where a key frame occurs because of a minimum interval
selected by the user, then these forced key frames ideally need
to be more closely matched in quality to the surrounding frame.
Change-Id: Ia55b1f047e77dc7fbd78379c45869554f25b3df7
Add a flag to always enable block4x4 search for speed=0 (good
quality) to guarantee no quality loss for speed0.
Change-Id: Ie04bbc25f7e6a33a7bfa30e05775d33148731c81
The maximum possible MV in 1/8 pel units is (1<<11), which could
cause mvcost out of its range that is 1023. Change maximum
possible MV in 1/8 pel units to (1<<11)-8 will fix this problem.
Change-Id: I5788ed1de773f66658c14f225fb4ab5b1679b74b
Further experiment with restriction of the Q range.
This uses the average non KF/GF/ARF quantizer, instead
of just relying on the initial value. It is not such a strong constraint
but there may be a reduced risk of rate misses.
Change-Id: I424fe782a37a2f4e18c70805e240db55bfaa25ec
The merge includes hooks to for CQ mode and other code
changes merged from the test branch.
CQ mode attempts to maintain a more stable quantizer within a clip
whilst also trying to adhere to a guidline maximum bitrate.
The existing target data rate parameter is used to specify the
guideline maximum bitrate.
A new parameter allows the user to specify a target CQ level.
For normal (non kf/gf/arf) frames, the quantizer will not drop BELOW the
user specified value (0-63). However, in some cases the encoder may
choose to impose a target CQ that is above that specified by the user,
if it estimates that consistent use of the target value is not compatible
with guideline maximum bitrate.
Change-Id: I2221f9eecae8cc3c431d36caf83503941b25e4c1
In two pass encoding each frame is given an active
Q range to work with. This change limits how much this
Q range can be altered over time from the initial estimate
made for the clip as a whole.
There is some danger this could lead to overshoot or undershoot
in some corner cases but it helps considerably in regard to
clips where either there is a glut or famine of bits in some sections,
particularly near the end of a clip.
Change-Id: I34fcd1af31d2ee3d5444f93e334645254043026e
cpi->target_bits_per_mb is currently not being used,
so delete it. Also removed other unused code in rdopt.c.
Change-Id: I98449f9030bcd2f15451d9b7a3b9b93dd1409923
count can be reduced to short because the max number of filtered frames
is set to 15. the max value for any frame is 32 (modifier = 16,
filter_weight = 2). 15*32 = 480 which requires 9 bits
this function goes from about 7000 us / 1000 iterations for the C code
to < 275 us / 1000 iterations for sse2 for block_size = 16 and from
about 1800 us / 1000 iters to < 100 us / 1000 iters for block_size = 8
Change-Id: I64a32607f58a2d33c39286f468b04ccd457d9e6e
Commit 0ce3901 introduced a change in the frame buffer copy logic where
the NEW frame could be copied to the ARF or GF buffer through the
copy_buffer_to_{arf,gf}==1 flags, if the LAST frame was not being
refreshed. This is not correct. The intent of the
copy_buffer_to_{arf,gf}==1 flag is to copy the LAST buffer. To copy the
NEW buffer, the refresh_{alt_ref,golden}_frame flag should be used.
The original buffer copy logic is fairly convoluted. For example:
if (cm->refresh_last_frame)
{
vp8_swap_yv12_buffer(&cm->last_frame, &cm->new_frame);
cm->frame_to_show = &cm->last_frame;
}
else
{
cm->frame_to_show = &cm->new_frame;
}
...
if (cm->copy_buffer_to_arf)
{
if (cm->copy_buffer_to_arf == 1)
{
if (cm->refresh_last_frame)
vp8_yv12_copy_frame_ptr(&cm->new_frame, &cm->alt_ref_frame);
else
vp8_yv12_copy_frame_ptr(&cm->last_frame, &cm->alt_ref_frame);
}
else if (cm->copy_buffer_to_arf == 2)
vp8_yv12_copy_frame_ptr(&cm->golden_frame, &cm->alt_ref_frame);
}
Effectively, if refresh_last_frame, then new and last are swapped, so
when "new" is copied to ARF, it's equivalent to copying LAST to ARF. If
not refresh_last_frame, then LAST is copied to ARF. So LAST is copied to
ARF in both cases.
Commit 0ce3901 removed the first buffer swap but kept the
refresh_last_frame?new:last behavior, changing the sense since the first
swap wasn't done to the more readable refresh_last_frame?last:new, but
this logic is not correct when !refresh_last_frame.
This commit restores the correct behavior from v0.9.1 and prior. This
case is missing from the test vector set.
Change-Id: I8369fc13a37ae882e31a8a104da808a08bc8428f
The following features don't make sense for the first
pass in its current form and have a significant impact on its
speed (up to 50%).
Slow quantizer, slow dct and trellis optimization.
Change-Id: Id9943f6765ffbd71fc0084ec7dfbc9d376fd6fcd
2011-01-06 17:10:07 +00:00
692 changed files with 128097 additions and 58324 deletions
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.