A flush bug is discovered during putting frame parallel decoder
into Android. This test will expose that bug.
Change-Id: Ia047f27972f4da0471649f79f1f91e7695297473
Using 4 threads, frame parallel decode is ~3x faster than single thread
decode and around 30% faster than tile parallel decode for frame parallel
encoded video on both Android and desktop with 4 threads. Decode speed is
scalable to threads too which means decode could be even faster with more threads.
Change-Id: Ia0a549aaa3e83b5a17b31d8299aa496ea4f21e3e
There are two CreateDecoder functions and decode_test_driver is not
calling the right function now. This bug is discovered during really
enable the frame parallel flag inside libvpx. This bug does not affect
any existing unit test though.
Change-Id: Icd9633c4b66d50e422a09c4310ff791082878936
Make sure VP9 frame-parallel decode passes all the standard
test vectors. Only test running with 2,3,4 threads now.
Also refactor the video decode test driver to support passing
in decode flags which is used to enable frame-parallel decode.
Change-Id: I6a712464232c2e13681634951c7e176312522e1e
The original implementation only allocates one segmentation map and this
works fine for serial decode. But for frame parallel decode, each thread
need to have its own segmentation map and the last frame segmentation map
should be provided from last frame decoding thread.
After finishing decoding a frame, thread need to serve the old segmentation
map that associate with the previous decoded frame. The thread also need to
use another segmentation map for decoding the current frame.
Change-Id: I442ddff36b5de9cb8a7eb59e225744c78f4492d8
pthread.h is not supported in windows. vp9_thread.h includes
the emulation layer for pthread in windows.
Change-Id: I2b1c8ec299928472faca7ebeea998170c9f4d744
Prepare for frame parallel decoding, the reference count buffers
need to be protected by mutex. Move vp9_thread.* to common
folder so that those buffers could use cross-platform mutex
from vp9_thread.*.
(cherry picked from commit 337e8015c9)
Change-Id: I0587a08447925f4554d7788686a31483c2ae3f37
The relationship of the user private data at runtime
is not preserved from decode() to this call which may
occur at an unknown point in the future
Change-Id: Ia7eb25365c805147614574c3af87aedbe0305fc6
Prepare for frame parallel decoding, the frame buffers must be
separated from the encoder and decoder structure, while the encoder
and decoder will hold the pointer of the BufferPool.
Change-Id: I172c78f876e41fb5aea11be5f632adadf2a6f466
Adapt the use of segmentation in AQ mode 2 based on
the ambient kf/arf/gf Q.
Disable segmentation where the rate per SB is very
low and overheads are likely to outweigh the benefits.
This patch reduces the -ve average metrics impact
of AQ mode 2 while allowing stronger 3 segment AQ
in some cases. Average improvement ~0.5-1.0%.
Change-Id: I5892dfcc7507c5cc6444531cc7fe17554cf8d0c7
The y4m extension used is the same as the one used in ffmpeg/x264.
The patch is adapted from the highbitdepth branch.
Also adds unit tests for y4m header parsing and md5 check
of the raw frame data, as well as y4m writing.
Change-Id: Ie2794daf6dbafd2f128464f9b9da520fc54c0dd6
This commit re-designs the quantization process for transform
coefficient blocks of size 4x4 to 16x16. It improves compression
performance for speed 7 by 3.85%. The SSSE3 version for the
new quantization process is included.
The average runtime of the 8x8 block quantization is reduced
from 285 cycles -> 255 cycles, i.e., over 10% faster.
Change-Id: I61278aa02efc70599b962d3314671db5b0446a50
Add a conditional compile flag for this feature. Also add a
switch to enable the encoder to use these statistics in the
second pass. Currently, the switch is turned off.
Change-Id: Ia1c858c35ec90e36f19f5cffe156b97ddaa04922
The current threshold is knid of low, and in many cases NEWMV
mode is checked but not picked as the best mode. This patch
added a speed feature to increase NEWMV threshold, so that
less partition mode checking goes to check NEWMV. This feature
is enabled for speed 6 and 7.
Rtc set borg tests showed:
1. Speed 6, overall psnr: -0.088%, ssim: -1.339%;
Average speedup on rtc set is 11.1%.
2. Speed 7, overall psnr: -0.505%, ssim: -2.320%
Average speedup on rtc set is 12.9%.
Change-Id: I953b849eeb6e0d5a1f13eacba30c14204472c5be
pull the latest from WebP, which adds a worker interface abstraction
allowing an application to override init/reset/sync/launch/execute/end
this has the side effect of removing a harmless, but annoying, TSan
warning.
Original source:
http://git.chromium.org/webm/libwebp.git
100644 blob 08ad4e1fecba302bf1247645e84a7d2779956bc3 src/utils/thread.c
100644 blob 7bd451b124ae3b81596abfbcc823e3cb129d3a38 src/utils/thread.h
Local modifications:
- s/WebP/VP9/g
- camelcase functions -> lower with _'s
- associate '*' with the variable, not the type
Change-Id: I875ac5a74ed873cbcb19a3a100b5e0ca6fcd9aed
The caller should reset the state instead of letting worker
to reset.
This reverts commit 34b2ce15f9.
Change-Id: Idb546ea6386cffc44e98dee772900d21ab79710f
Encoder still uses SWITCHABLE as default via DEFAULT_INTERP_FILTER,
but does not override the default if it is not SWITCHABLE.
Change-Id: I3c0f6653bd228381a623a026c66599b0a87d01d5
When the frame is intra coded only, the encoder takes the RD
coding flow. Hence the function set_mode_info is not practically
in use. This commit removes it and the associated conditional
branches.
Change-Id: I1e42659ceb55b771ba712d1cdecacb446aa6460d
This fixes the hang in VP9/InvalidFileTest.ReturnCode/3
due to worker->had_error has not been reset after getting
error.
Change-Id: Ia3608225094758a2bd88f6ae4dd9dfd93bbaad27
For real time speed 7, once encode breakout is on(i.e. encoding
setting --static-thresh=1), a proper encode breakout threshold
is set to speed up the encoder.
Set --static-thresh=1, RTC set borg test showed a slight overall
psnr loss of 0.162%, but ssim gain of 0.287%. The average speedup
on RTC set is 6%, and for some clips, the speedup can be 10+%.
Change-Id: Id522d9ce779ff7c699936d13d0c47083de4afb85
Before encoding a frame, calculate and store each 16x16 block's
variance of source difference between last and current frame.
Find partitioning threshold T for the frame from its variance
histogram, and then use T to make partition decisions.
Comparing with fixed 16x16 partitioning, rtc set test showed an
overall psnr gain of 3.242%, and ssim gain of 3.751%. The best
psnr gain is 8.653%.
The overall encoding speed didn't change much. It got faster for
some clips(for example, 12% speedup for vidyo1), and a little
slower for others.
Also, a minor modification was made in datarate unit test.
Change-Id: Ie290743aa3814e83607b93831b667a2a49d0932c
this file is an include and doesn't need to be built on its own.
fixes:
ranlib: file: libvpx_g.a(x86inc.asm.o) has no symbols
Change-Id: I89504e37ff0a4488489af7b9b7e09fb32acc4853
the buffer is only used in encoding and only when
CONFIG_INTERNAL_STATS or CONFIG_VP9_POSTPROC is enabled.
a future change should decouple this from the frame buffer allocation
and make it conditional based on runtime flags when the above config
options are enabled.
reduces decode heap usage by at least 12%
Change-Id: Id0b97620d4936afefa538d3aadf32106743d9caf
This reverts commit b336356198.
This causes a hang in:
VP9/InvalidFileTest.ReturnCode/3
the change to test/user_priv_test.cc remains with a minor update
Change-Id: I4a8a272ca37ea329b0f413f0b1cd827a238bd9fd
Encoding screen content exercises various fast skip paths that are
missed by natural video content.
Change-Id: Ie359884ef9be89cbe5dda6d82f1f79360604a090
This patch checks that a decoder never tries to reference frame that's
outside the range of 2x to 1/16th the size of this frame. Any attempt
to do so causes a failure.
Change-Id: I5c98fa7bb95ac4f29146f29dd92b62fe96164e4c
Set the proper number of mb_rows/cols.
Also remove warnings (unused variable) when configured with temporal-denoising disabled.
Change-Id: I8abd2372394ee55295feb87a66efd294ea6989d0
Bug introduced in I930dced169c9d53f8044d2754a04332138347409. If
svc.number_temporal_layers == 1 and svc.number_spatial_layers == 1, the system
attempt to do spatial SVC. It no longer does that.
Change-Id: Ie6b130a72b1eea40c547c9a64447e40695f811c5
For the primary arf in a group, if multiple arfs
are enabled and we were using arfs in the previous
group, then allow the second arf from the previous
group to be used as an additional reference.
Change-Id: Iaf41706a52f54ef21548026851cd77100d6aebda
This commit enables an adaptive transform size selection method
for speed -6. It uses largest transform size when the sse is more
than 4 times of variance, i.e., most energy is compacted in the
DC coefficient. Otherwise, use the default TX_8X8. It improves
the compression efficiency for rtc set of speed -6 by 0.8%, no
speed change observed.
Change-Id: Ie6ed1e728ff7bf88ebe940a60811361cdd19969c
This patch allows the encoder to skip the partition search for the
frame if it is an inter frame and only zero motion vectors have
been detected in the first pass. The partition size is directly
assigned according to the difference variance.
Borg tests show overall little performance changes in term of PSNR
(derf -0.027%, yt 0.152%, hd 0.078%, stdhd 0%). The worst case of
PSNR loss is -0.514% from yt. The best PSNR gain is 4.293% from yt.
The second pass encoding speedup for slideshow clips is 15%-40%.
Change-Id: I881f347d286553ee5594a9ea09ba1a61ac684045
C version and sse2 version, and off by default.
For the test clip used, the sse2 performance improved by ~5.6%
Change-Id: Ic2d815968849db51b9d62085d7a490d0e01574f6
This commit enables a fast reference motion vector search scheme.
It checks the nearest top and left neighboring blocks to decide the
most probable predicted motion vector. If it finds the two have
the same motion vectors, it then skip finding exterior range for
the second most probable motion vector, and correspondingly skips
the check for NEARMV.
The runtime of speed -5 goes down
pedestrian at 1080p 29377 ms -> 27783 ms
vidyo at 720p 11830 ms -> 10990 ms
i.e., 6%-8% speed-up.
For rtc set, the compression performance
goes down by about -1.3% for both speed -5 and -6.
Change-Id: I2a7794fa99734f739f8b30519ad4dfd511ab91a5
Bug introduced during multiple iterations on: I3831*
gf_group->arf_update_idx[] cannot currently be used
to select the arf buffer index if buffer flipping on overlays
is enabled (still currently the case when multi arf OFF).
Change-Id: I4ce9ea08f1dd03ac3ad8b3e27375a91ee1d964dc
This commit fixes the potential issue in the non-RD mode decision
flow that only checks part of the block to estimate the cost. It
was due to the use of fixed transform size, in replacing the
largest transform block size. This commit enables per transform
block cost estimation of the intra prediction mode in the non-RD
mode decision.
Change-Id: I14ff92065e193e3e731c2bbf7ec89db676f1e132
same reasoning as:
9f3a0db vp9_rtcd: correct avx2 references
these are all intrinsics, so don't depend on x86inc.asm
Change-Id: I915beaef318a28f64bfa5469e5efe90e4af5b827
This patch reverts the previous revert from Jim and also add a
variable user_priv in the FrameWorker to save the user_priv
passed from the application. In the decoder_get_frame function,
the user_priv will be binded with the img. This change is needed
or it will fail the unit test added here:
https://gerrit.chromium.org/gerrit/#/c/70610/
This reverts commit 9be46e4565.
Change-Id: I376d9a12ee196faffdf3c792b59e6137c56132c1
Cosmetic patch only in response to comments on
previous patches suggesting a couple of name changes
for consistency and clarity.
Change-Id: Ida3a359b0d5755345660d304a7697a3a3686b2a3
like vpx_codec_decode(), vpx_codec_peek_stream_info() takes an unsigned
int, not size_t, parameter for buffer size
Change-Id: I4ce0e1fbbde461c2e1b8fcbaac3cd203ed707460
the max is 6. there are assumptions throughout the decode regarding
this; fixes a crash with a fuzzed bitstream
$ zzuf -s 5861 -r 0.01:0.05 -b 6- \
< vp90-2-00-quantizer-00.webm.ivf \
| dd of=invalid-vp90-2-00-quantizer-00.webm.ivf.s5861_r01-05_b6-.ivf \
bs=1 count=81883
Change-Id: I6af41bb34252e88bc156a4c27c80d505d45f5642
This commit replaces a few use cases of cpi->common with preset
variable cm, to avoid unnecessary pointer fetch in the non-RD
coding mode.
Change-Id: I4038f1c1a47373b8fd7bc5d69af61346103702f6
In real-time speed 6, no partition search is done. The inter
prediction results got from picking mode can be reused in the
following encoding process. A speed feature reuse_inter_pred_sby
is added to only enable the resue in speed 6.
This patch doesn't change encoding result. RTC set tests showed
that the encoding speed gain is 2% - 5%.
Change-Id: I3884780f64ef95dd8be10562926542528713b92c
There is a normative scaling range of (x1/2, x16)
for VP9. This patch fixes the maximum downscaling
tests that are applied in the convolve function.
The code used a maximum downscaling limit of x1/5
for historic reasons related to the scalable
coding work. Since the downsampling in this
application is non-normative it will revert to
using a separate non-normative scaler.
Change-Id: Ide80ed712cee82fe5cb3c55076ac428295a6019f
Add indirection to the section of buffer indices.
This is to help simplify things in the future if we
have other codec features that switch indices.
Limit the max GF interval for static sections to fit
the gf_group structures.
Change-Id: I38310daaf23fd906004c0e8ee3e99e15570f84cb
Fix some bugs relating to the use of buffers
in the overlay frames.
Fix bug where a mid sequence overlay was
propagating large partition and transform sizes into
the subsequent frame because of :-
sf->last_partitioning_redo_frequency > 1 and
sf->tx_size_search_method == USE_LARGESTALL
Change-Id: Ibf9ef39a5a5150f8cbdd2c9275abb0316c67873a
This patch implements a mechanism for inserting a second
arf at the mid position of arf groups.
It is currently disabled by default using the flag multi_arf_enabled.
Results are currently down somewhat in initial testing if
multi-arf is enabled. Most of the loss is attributable to the
fact that code to preserve the previous golden frame
(in the arf buffer) in cases where we are coding an overlay
frame, is currently disabled in the multi-arf case.
Change-Id: I1d777318ca09f147db2e8c86d7315fe86168c865
The encoder currently allocates frame buffers before
it establishes what the chroma sub-sampling factor is,
always allocating based on the 4:4:4 format.
This patch detects the chroma format as early as
possible allowing the encoder to allocate buffers of
the correct size.
Future patches will change the encoder to allocate
frame buffers on demand to further reduce the memory
profile of the encoder and rationalize the buffer
management in the encoder and decoder.
Change-Id: Ifd41dd96e67d0011719ba40fada0bae74f3a0d57
This patch insures that the last byte of a chunk that contains a
valid superframe marker byte, actually has a proper superframe index.
If not it returns an error.
As part of doing that the file : vp90-2-15-fuzz-flicker.webm now fails
to decode properly and moves to the invalid file test from the test
vector suite.
Change-Id: I5f1da7eb37282ec0c6394df5c73251a2df9c1744
Avoids failures:
MSE_ClearKey/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ClearKey_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ExternalClearKey_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ExternalClearKey/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ExternalClearKeyDecryptOnly/EncryptedMediaTest.Playback_VP9Video_WebM/0
MSE_ExternalClearKeyDecryptOnly_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
SRC_ExternalClearKey/EncryptedMediaTest.Playback_VP9Video_WebM/0
SRC_ExternalClearKey_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
SRC_ClearKey_Prefixed/EncryptedMediaTest.Playback_VP9Video_WebM/0
Patches are
This reverts commit 9bc040859b
This reverts commit 6f5aba069a
This reverts commit 9bc040859b
I1f250441 Revert "Refactor the vp9_get_frame code for frame parallel."
Ibfdddce5 Revert "Delay decreasing reference count in frame-parallel decoding."
I00ce6771 Revert "Introduce FrameWorker for decoding."
Need better testing in libvpx for these commits
Change-Id: Ifa1f279b0cabf4b47c051ec26018f9301c1e130e
Another project in ChromeOS is using these files. To make libvpx
rolls simpler, add these files back unitl the other project removes
the dependency.
crbug.com/387246 tracking bug to remove dependency.
Change-Id: If9c197081c845c4a4e5c5488d4e0190380bcb1e4
When decoding in serial mode, there will be only
one FrameWorker doing decoding. When decoding in
parallel mode, there will be several FrameWorkers
doing decoding in parallel.
Change-Id: If53fc5c49c7a0bf5e773f1ce7008b8a62fdae257
See: https://code.google.com/p/chromium/issues/detail?id=362697
The code properly catches an invalid stream but seg faults instead of
returning an error due to a buffer not having been initialized. This
code fixes that.
Change-Id: I695595e742cb08807e1dfb2f00bc097b3eae3a9b
This patch adds a mechanism for insuring error checking on invalid files
by creating a unit test that runs the decoder and tests that the error
code matches what's expected on each frame in the decoder.
Disabled for now as this unit test will segfault with existing code.
Change-Id: I896f9686d9ebcbf027426933adfbea7b8c5d956e
s/stdint.h/vpx\/vpx_int.h
Added missing 'break;'s
Also included other minor changes, mostly cosmetic.
Change-Id: I852bba3e85e794f1d4af854c45c16a23a787e6a3
The test for this is in test vector code ( show existing frames will
fail ). I can't check it in disabled as I'm changing the generic
test code to do this:
https://gerrit.chromium.org/gerrit/#/c/70569/
Change-Id: I5ab324f0cb7df06316a949af0f7fc089f4a3d466
This commit allows the key frame to search through more prediction
modes and more flexible block sizes. No speed change observed. The
coding performance for rtc set is improved by 1.7% for speed -5 and
3.0% for speed -6.
Change-Id: Ifd1bc28558017851b210b4004f2d80838938bcc5
A superframe is a bunch of frames that bundled as one frame. It is mostly
used to combine one or more non-displayable frames and one displayable frame.
For frame parallel decoding, libvpx decoder will only support decoding one
normal frame or a super frame with superframe index.
If an application pass a superframe without superframe index or a chunk
of displayable frames without superframe index to libvpx decoder, libvpx
will not decode it in frame parallel mode. But libvpx decoder still could
decode it in serial mode.
Change-Id: I04c9f2c828373d64e880a8c7bcade5307015ce35
This breaks the profile 1 bitstream.
Don't force non420 uv transform size to 1/4 y size. In the 4:2:0 case the
chroma corresponding to a luma block is 1/4 its size. In the 4:4:4 case
chroma and luma planes are the same size. Disallowing larger transforms
can result in a loss of compression efficiency and is inconsistent.
For sub-8x8 blocks only average corresponding motion vectors.
4:2:0 and profile 0 behavior remains unchanged.
Change-Id: I560ae07183012c6734dd1860ea54ed6f62f3cae8
- Rename build_targets to build_framework
- Add functions for creating the vpx_config shim and obtaining
preproc symbols.
Change-Id: Ieca6938b9779077eefa26bf4cfee64286d1840b0
Speed 6 uses small tx size, namely 8x8. max_intra_bsize needs to
be modified accordingly to ensure valid intra mode checking.
Borg test on RTC set showed an overall PSNR gain of 0.335% in speed
-6.
This also changes speed -5 encoding by allowing DC_PRED checking
for block32x32. Borg test on RTC set showed a slight PSNR gain of
0.145%, and no noticeable speed change.
Change-Id: I1502978d8fbe265b3bb235db0f9c35ba0703cd45
This is the first step to rework the rate-distortion modeling used
in rtc coding mode. The overall goal is to make the modeling
customized for the statistics encountered in the rtc coding.
This commit makes encoder to perform rate-distortion modeling for
DC and AC coefficients separately. No speed changes observed.
The coding performance for pedestrian_area_1080p is largely
improved:
speed -5, from 79558 b/f, 37.871 dB -> 79598 b/f, 38.600 dB
speed -6, from 79515 b/f, 37.822 dB -> 79544 b/f, 38.130 dB
Overall performance for rtc set at speed -6 is improved by 0.67%.
Change-Id: I9153444567e5f75ccdcaac043c2365992c005c0c
strip trailing '/' from paths, this is later converted to '\' which
causes execution errors for obj_int_extract/yasm. vs10+ wasn't affected
by this issue, but make the same change for consistency.
gen_msvs_proj:
+ add missing '"' to obj_int_extract call
unlike gen_msvs_vcproj, the block is duplicated
missed in: 1e3d9b9 build/msvs: fix builds in source dirs with spaces
Change-Id: I76208e6cdc66dc5a0a7ffa8aa1edbefe31e4b130
This patch allows the VP9 encoder to skip the un-necessary
motion search in the first pass. It computes the motion error
of 0,0 motion using the last source frame as the reference,
and skips the further motion search if this error is small.
Borg test shows overall the patch gives PSNR gain (derf -0.001%,
yt 0.341%, hd 0.282%). Individual clips may have PSNR gain or
loss. The best PSNR performance is 7.347% and the worst is -0.662%.
The first pass encoding speedup for slideshow clips is over 30%.
Change-Id: I4cac4dbd911f277ee858e161f3ca652c771344fe
Allow for an option to selectively apply the deblocking loop filter to the denoised
raw block, based on the denoised state (no-filter, filter with zero motion, or filter with non-zero motion)
of the current block and its upper and left denoised block.
This helps to reduce some blocking artifacts from the motion-compensated denoising.
Change-Id: I0ac4e70076df69a98c5391979e739a2681e24ae6
This commit fixes frame header decoding for superframe index, to
prevent out of boundary memory read triggered by fuzz test
vector. It resolves a chromium security violation issue
crbug.com/376802.
The issue was introduced in the change:
Add VPXD_SET_DECRYPTOR support to the VP9 decoder.
cl-id I88f86c8ff9af34e0b6531028b691921b54c2fc48
where the buffer was read before validation check on index offset
applied.
A test vector is added accordingly.
Change-Id: I41c988e776bbdd1033312a668e03a3dbcf44ca99
This patch appears to have introduced non-determinism and/or
mismatch from debug vs release.
This reverts commit 5daef90efc.
Change-Id: I80081e55cfeaaa821b510b58a4e6e6328003c7da
The current decoding scheme will decrease the reference count
of the output frame when finish decoding. Then the application
could copy the frame from the decoder buffer to application buffer.
In frame-parallel decoding, a decoded frame will not be outputted
until several frames later which depends on thread numbers. So
the decoded frame's reference count should be decreased only
after application finish copying the frame out. But due to the
limitation of vpx_codec_get_frame, decoder could not know when
application finish decoding. So use a index last_show_frame to
release the last output frame's reference count.
Change-Id: I403ee0d01148ac1182e5a2d87cf7dcc302b51e63
This commit enables a fast path computational flow for forward
transformation. It checks the sse and variance of prediction
residuals and decides if the quantized coefficients are all
zero, dc only, or more. It then selects the corresponding coding
path in the forward transformation and quantization stage.
It is currently enabled in rtc coding mode. Will do it for rd
coding mode next.
In speed -6, the runtime for pedestrian_area 1080p at 1000 kbps
goes down from 14234 ms to 13704 ms, i.e., about 4% speed-up.
Overall coding performance for rtc set is changed by -0.18%.
Change-Id: I0452da1786d59bc8bcbe0a35fdae9f623d1d44e1
This patch allows the encoder to skip the
un-neccessary motion search in the first pass. It
calculates the error of the zero motion vector using
the last source frame as reference and skips the
further motion search in the first pass if the error
is small.
The encoding speedup of the first pass for slideshow
videos is over 30%. Borg test shows the overall PSNR
performance remain approximately the same (derf -0.009,
hd 0.387, yt 0.021, stdhd 0.065). Individual clips may
have either PSNR gain or loss. The worst PSNR perfomance
is from yt set, with a PSNR loss of -1.1.
Change-Id: I08b2ab110b695e4689573b2567fa531b6457616e
* Only use ZEROMV, disalowing the intra modes that were previously
tested.
* Score rate and distortion as zero.
Change-Id: Ifcf99e272095725f11da1dcd26bd0f850683e680
Really just armv7. This is a convenience target intended to make iOS
development with libvpx easier. Xcode projects with default settings
will fail to build when a framework lacks armv7s support when targetting
iOS7.
Change-Id: I7eb80d52eec25501febc0d2c3c0b4ed964b8ed5b
In non frame-parallel decoding, this works the same way as
current decoding scheme. Every time after decoder finish
decoding a frame, it will swap the current mode info pointer
and previous mode info pointer if the decoded frame needs
to be shown. Both mode info pointer and previous mode info
pointer are from mode info arrays.
In frame-parallel decoding, this will become more complicated
as current frame's mode info pointer will be shared with next
frame as previous mode info pointer. But when one decoder
thread finishes decoding one frame and starts to work on next
available frame, it needs to retain the decoded frame's mode
info pointers until next frame finishes decoding. The mode info
index will serve this purpose. The decoder will use different
buffer in the mode info arrays and use the other buffer to save
previous decoded frame’s mode info.
Change-Id: If11d57d8eb0ee38c8876158e5482177fcb229428
tests failing under Win32/Win64
+ dct16x16_test: add missing avx2 functions (partially disabled)
exercises the forward transforms
no idct/iht implementations, so the c-code is used
Change-Id: I04f64a457fa0828a00f32b5c9fe4f55294f21f61
In non-rd real-time mode, choosing smaller transform size in
encoding gives better video quality and good speed gain than
choosing larger transform size. This patch set tx size search
method to ALLOW_8X8, which is better than using 4x4 or other
larger sizes.
Borg tests on rtc set at speed 6 showed significant gain on quality.
PSNR gain: 11.034% and SSIM gain: 15.466%.
The speed gain is 5% - 12% for <720p clips, and 2% - 7% for
720p clips.
Change-Id: If4dc74ed2df359346b059f47fb73b4a0193ec548
Use of stack frame variable "fps" beyond the lifetime of the function.
fps is sent as a paremeter to output_stats and stored in the
packet holding this encoded frame. This has scope beyond the
lifetime of the calling function.
This reverts commit 3f95a230c7
Change-Id: Icd8e14b3d7dd733590ada12e619b9dce95b6b0f5
* changes:
gen_msvs_*proj.sh: strip SRC_PATH_BARE from obj names
*.mk: pass SRC_PATH_BARE to all GEN_VCPROJ invocations
build/msvs: fix builds in source dirs with spaces
When this compiler flag is enabled, the encoder will write a denoised,
uncompressed, version of the input to denoised.yuv.
Change-Id: Ie0247f76b23219d95fe97dd70f23e097d742c249
This commit enables unit test for SSSE3 16x16 inverse 2D-DCT with
10 non-zero coefficients. It includes a new test condition to
cover the potential overflow issue due to extremely coarse quantization.
Change-Id: I945e16f05dfbe19500f0da5f15990feba8e26d99
The SSSE3 implementation might find a potential overflow issue in
its second 1-D transform, if all input residual pixels are close to
255. This commit fixes the issue and re-enables the unit test on
the SSSE3 version.
Change-Id: I0520478abdab7afd3ff2842516bec951111e9b3c
This commit reworks the unit test for 8x8 forward/inverse
transformation. It adds extreme input value test to detect overflow
issues in the intermediate steps.
It temporarily disables unit test for the SSSE3 version, which
showed overflow failure in the new test conditions.
Change-Id: I7caf10bba4b6db031add65d8c0eb99426b38aa42
Right now there is just one place to check: xd->lossless and for the first
pass there is a function is_lossless_requested().
Change-Id: I949a6834e64ce51e422e2892f097f2b871b5429a
In Aq mode 2 for kf/arf/gf the segment q delta
is calculated and then applied by re-quantization without
going through the rd loop again. If the base Q != 0
but the segment Q == 0 (lossless) this can could give rise
to a situation where we have an illegal combination of
transform size and Q. (Q == 0 requires that all blocks
are coded 4x4 WHT).
Change-Id: I241a58c6494ed442e9e4630070b0cde0fb99ae45
...when configured below the path containing spaces. configuring outside
the path containing spaces still won't work due to issues with the
makefiles, e.g.,
/path with spaces/git
/path with spaces/build1
/build2
configure/make in build1 will work, build2 will not
Change-Id: Ie4a1f313596d7457cadd67476ac1dbd3273ad46e
As a side-effect, the sad unit tests for VP8 and VP9
had to be separated.
Fixes a bug in original patch:
(https://gerrit.chromium.org/gerrit/#/c/70163/8)
that was reverted due to a nightly test failure.
Change-Id: Ia2a4e9e278fd3c89d6c3c82fcc6381320ec2a8a6
This commit applies quantization process with coarse quantization
step size to the forward transform coefficients and tests all the
inverse 16x16 DCT and ADST implementation versions with the
dequantized coefficients as input, to verify that the outcomes
match the prototype.
Change-Id: I68034a6126b45192c87d8c642155290e89bff8fa
In frame parallel decoding mode, there will be still several frames inside
the decoder when application stop calling vpx_codec_decode to decode frames.
The application will need to keep calling vpx_codec_get_frame to get all the
remaining decoded frames in the decoder.
Change-Id: I2ce8260a91282f045bb9a6093ff8a606b1990f14
This commit added a call to set speed feature before initializing
motion search, fixed the problem where unintialized search method
is used before its value being set.
Change-Id: I537e4612bf0d00fd6f51396fd222d4b3bd6fde58
By enabling the OUTPUT_YUV_SRC compiler flag, the encoder will write the raw
input to bd.yuv.
The functionality was mostly implemented, but in its previous state did not
compile.
Change-Id: Ia331ad0f4c6e6f9f51e8d42cd33ba8cc146b3dbf
Making this consistent with intra mode masks: you need to specify
allowed inter/intra modes to use.
Change-Id: Iaecd28bf79047259707d8e7a59a57bb7b856383e
SEG_LEVEL_SKIP requires the block size to be at least 8x8. Attempting to
use it on smaller partitions causes the decoder to reject the bitstream.
Change-Id: Ia7188cdf8ae5ac1df6bd29f3f80dbb0610e1f7b1
An overflow issue could potentially happen in the second round 1-D
transform of the SSSE3 full inverse 16x16 2D-DCT. This commit fixes
this issue.
Change-Id: Ia19e4888fda1cc929a28a5f89a5beec612d628dc
Now match the "C" version of "Fix to reduce block
artifacts from vp8 temporal denoiser."
(see change id Id9b56e59e33f3c22e79d2f89f763bdde246fdf3f)
Change-Id: I99e569bb6af4ae3532621127e12bf917a48ba08e
In the current logic, if the sse for zero motion is smaller
than the sse for new_mv (i.e., best_sse), we may still end up
using the non-zero mv for denoising (if the magnitude of new_mv is above threshold).
This can happen for very noisy content, and can lead to artifacts.
This change ensures that we always use zero_mv (over new_mv) for
denoisng if sse_zero_mv <= best_sse.
Change-Id: I8ef9294d837b077013b77a46c9a71d17c648b48a
This commit enables SSSE3 implementation of the inverse 2D-DCT
with only first 10 coefficients non-zero. It reduces the runtime
of SSE2 version from 745 cycles to 538 cycles, i.e., 27% speed-up.
Change-Id: I18ba4128859b09c704a6ee361d69a86c09fe8dfe
This code dates from the ancient past and
applied an error score weighting based on pixel
brightness. This not seem to be providing any
benefit metrics wise and could be making some
visual issues in dark frames worse.
The field is left in place in the FIRSTPASS_STATS data
structure in this patch, pending changes to unit tests that
use a pre-defined first pass file.
Change-Id: Id50f04205230234858e7548ce523f11acaf3567d
The subpixel SSSE3 was fixed in this patch:
https://gerrit.chromium.org/gerrit/#/c/70283/
So the equivalent AVX2 is fixed accordingly.
Change-Id: Ieebbc1949c99d34b12b8b47692df71aca5001f3a
The previous change only tunes forward transform. It doesn't affect
the neon implementation of the inverse transform. Hence turn the
unit test on.
Change-Id: I4f0f43783b98814d1eee53182209f9669d538140
This commit enables the SSSE3 implementation of full inverse 16x16
2D-DCT. The unit runtime goes down from 1642 cycles to 1519 cycles,
about 7% speed-up.
Change-Id: I14d2fdf9da1fb4ed1e5db7ce24f77a1bfc8ea90d
The intepolation filter functions can be better tested withe extreme
values, especially given the optimization functions are prone to
overflow signed 16 bit intermediate value when operation order is
wrong.
Change-Id: I712142b0bc1e5969c692c0486a57ffa37c9742b5
Further changes to first pass allocation for gf/arf groups.
Three variables removed from TWO_PASS structure as only
now used locally. Dont adjust gf_group_bits in the post
encode update as this will no longer have any effect.
Change-Id: Iff89b225db923fc856f5d2aedbc899f1d7d68b55
In 8-tap filtering, to guarantee the intermediate results fit in
16 bits, the order of accumulating the products needs to be done
correctly, and the largest product should be added last. This
patch fixed the problem using the method in commit "Correct ssse3
8/16-pixel wide sub-pixel filter calculation".
Change-Id: I79d0ad60c057b15011ece84cda9648eee0809423
Restructuring to allocate the bits for each frame in
a GF group at the time the group is defined.
At the moment the allocation closely mirrors what
we had before.
Also changes the default rate adjustment method to
LONG_TERM_VBR_CORRECTION.
Change-Id: Ie5793c46c6b9c888cead5d8790792efd7d60b7c1
Fixes a bug introduced in
https://gerrit.chromium.org/gerrit/#/c/69779/13, where
uninitialized frame buffers due to corrupt and short
buffer sizes, may cause a crash.
This patch fixes the currently failing
video/processing/static_image/vp8_convert_test
Change-Id: I1b09e21482f292c11a2bfb4e570aef1d643410a7
As mismatchs were found between the intrinsic version and c only. The
commit temporarily revert to use the matching assembly version to
allow further investigation.
Change-Id: I08436c47d4888b562c0eac8e8856d90a831442df
Use the appropriate subblock offset mode info rather than the parent
block base, when filling mbmi in the pc tree in nonrd_use_partition.
This mimics what is done in the vertical case and what is done for
both cases in nonrd_pick_partition.
This change has little practical effect at the moment since in speed 5
rt horizontal and vertical partitions are currently only used unpaired
at edges of the picture.
Change-Id: I4632f66ca84086dac56c7d36b45ddbe38a06f42a
This did the same correction as the one in commit "Correct ssse3
8/16-pixel wide sub-pixel filter calculation" to avoid saturation
during filtering.
Change-Id: Ife9aa3f62daf9114eb24fe38f7baa3c3f361b2d6
If we are already saving a lot in bits from the target (maximum)
bitrate in the constrained quality mode, allow the quantizer
to go lower than the cq level. This hopefully will solve issues
with getting too low a bitrate and consequently poor quality for
certain videos in cq mode.
Change-Id: I1c4e8b0171fcf58f95198b3add85eea5f3c8f19f
If the denoiser filter causes too big a change in the absolute pixel difference
(between source and denoised signal), the block is not denoised, which can cause
visual block artifacts. This change applies a second adjustment to the temporal filter
to effectively allow for a (weaker) denoising for such blocks (which can keep
the absolute differnence within the tolerance range in most cases).
This helps to reduce some of the block artifacts from the denoising.
The additional cost of re-applying the filter to this set of blocks is low,
as the percentage of blocks per frame (with too big a change in absolute pixel difference)
is typically small, 2-5%.
Change-Id: Id9b56e59e33f3c22e79d2f89f763bdde246fdf3f
Renames all x86_64 specific assembly files to consistently
end in _x86_64.asm. This will be useful for build systems to
handle these files differently.
All new 64-bit specific assembly files should use the new
naming convention.
Change-Id: I36c89584967c82ffc4088b1b5044ac15d2bb7536
This commit changed to enable the encoder to adjust motion dection
speed threshold based on picture size. In addition, cpu-used 1 now
does a partition search every other frame instead of every third
frame for low resolution inputs.
The change has no quality/speed impact for 720p and above. Test
showed the change increase encoding time by between 3% to 6% for
cpu-used 2 encodiong of 360p sequences. It also has a compression
gain about .3%.
For cpu-used 2, the change resolved some very disturbing visual
artifacts in certain sequences when large block partitionings and
transforms are used as a result of copying the partition from a
previous frame.
Change-Id: Ic7fd22508cdb811d4ca935655adbf20109286cfa
The final goal is eventually to get rid of both itxm_add and fwd_txm4x4.
This patch does it in the decoder.
Change-Id: Ibb3db57efbcbb1ac387c6742538a9fcf2c6f24a5
The current decode_tiles decodes the frame one tile by one tile
and then loopfilter the whole frame or use another worker thread to
do loopfiltering.
|------|------|------|------|
|Tile1-|Tile2-|Tile3-|Tile4-|
|------|------|------|------|
For example, if a tile video has one row and four cols, decode_tiles
will decode the Tile1, then Tile2, then Tile3, then Tile4.
And during decode each tile, decode_tile will decode row by row in
each tile.
For frame parallel decoding, decode_tiles will decode video in row order
across the tiles. So the order will be:
"Decode 1st row of Tile1" -> "Decode 1st row of Tile2"
-> "Decode 1st row of Tile3" -> "Decode 1st row of Tile4"
-> "Decode 2nd row of Tile1" -> "Decode 2nd row of Tile2"
-> "Decode 2nd row of Tile3" -> "Decode 2nd row of Tile4"-> "loopfilter 1st row"
Change-Id: I2211f9adc6d142fbf411d491031203cb8a6dbf6b
This commit modifies the x86inc to allow explicit local buffer
allocation and the corresponding stack pointer adjustment.
Change-Id: I3cb2174e0242b5869a4ba0ca0cd240ee066836c3
MMX variance code in vp8 was reading out of bounds..
TODO(JBB): The best fix would involve removing duplicate library
functions between vp8 and vp9...
Change-Id: I5722853a6a58d3b55257ff385fa54c773bf98ded
This commit adjusts the forward 16x16 DCT computation steps to
simplify the register level operations. It fixes the corresponding
sse2 version accordingly.
Change-Id: I72a9c25b8ca9442fc5e113f47cd701ae55aa7f08
Added a skipping test in non-rd inter-mode. After interpolation
prediction step, the residuals are tested to see if they will be
quantized to 0 based on modeling between spatial domain and
frequency domain.
Set static-thresh to 800 for >=720p and 300 for <720p, rtc set
tests showed
1. Speed 5, psnr: -0.514%; ssim: -1.748%;
speedup on related clips: 5% -11%
2. Speed 6, psbr: -0.628%; ssim: -1.637%;
speedup on related clips: 4% - 9%
Change-Id: I62fbf26bc043ecd2b584f255f1a4ee5ab52bfcf3
Use VPX_TEST_NAME instead of the script name sans path and extension
when reporting test results when the variable is not empty.
Also: Clean up some style nits while I'm at it.
Change-Id: I0319745a3b7a90d0f307e55c5108fea2204187cd
The commit changed to use memset for initialiazation of non-trivial
strucutures, where initialization using {0} caused warnings. Also,
removed {0} initializations where appropriate initialization calls
are in place.
Change-Id: Ifd03e34aa80688e382124eb889c0fc1ec43c48e6
Make all post-processor code conditionally
compilable based on the CONFIG_VP9_POSTPROC
macro.
Also, remove the vizualization code from VP9
since it is out of date and will not compile.
Change-Id: I1e9e13a09ecd43e9a3f3704c175ae8cd258ababd
disabled by default, enable with:
--enable-experimental --enable-spatial-svc
this disables vp9_spatial_svc_encoder and svc_test, further work is
needed to remove internal lib references
Change-Id: I6a487ecbf07eb98843a99d96e17f08f960b63088
vp9_block_error_sse2 can only handle 16 bytes at a time but
the function requires to handle a sequence of 32 bytes at a time
so each 16 bytes is handled in a different register.
With AVX2 optimization the 32 bytes can be handled in one register instead
of two in the SSE2
The vp9_block_error was optimized by 85%.
The user level was optimized by 1.2%
Change-Id: Ia8fffe60e61eff7432a5fbd538757894f6c319fd
New name: vpx_temporal_svc_encoder.c
Also, update comment to note that example supports VP8 and VP9.
Change-Id: I6fffab81296f918ebca740192a5c609593852dff
The various motion search functions share a
common function prototype. In the case of
vp9_full_range_search() two of the parameters
are not needed.
Change-Id: I0e190af54a3b3f276409f20e8ec55912f9b0b798
Simplify the calculation of KF bitrate in similar way
to previous patch for GF/arf.
This has no impact on derf or std hd sets but gives a
small net gain of ~0.1% for yt and yt-hd sets.
Change-Id: Ida64ac1428d9c2a62adb67056fadbf0180eff030
The variation in boost calculation for gf and arf groups
is not significant enough to justify the extra complexity.
Also removed some other spurious code that no longer
has much material impact.
The handling of the rare case, where the boost bits
number is less than the number of bits a that would
be allocated if a frame was not boosted, will be dealt
with in a subsequent patch.
This change actually helps on all sets a little by
~0.1% - 0.2% with slightly bigger gains on SSIM.
Change-Id: Id42c1ac22a80a8c4993cfa0e51bc733eb9ed4f75
As a side-effect, the max_sad check is removed from the
C-implementation of VP8, for consistency with VP9, and to
ensure that the SAD tests common to VP8/VP9 pass.
That will make the VP8 C implementation of sad a little slower
but given that is rarely used in practice, the impact will be
minimal.
Change-Id: I7f43089fdea047fbf1862e40c21e4715c30f07ca
This reverts commit 81ad047ee5.
Revert "VP8 for ARMv8 by using NEON intrinsics 15"
This reverts commit 727af7cebe.
This exposes a bug in gcc 4.9 regarding register allocation. Will reland
when 4.9 is fixed.
Change-Id: I2d8a04e4edde93719280e41550f4c0765608ec4d
The warning messages complained that there are unused arguments
in a few prediction modes. This structure was designed on purpose,
such that a wrapper function can cover all prediction mode cases
and make them readily accessible as an pointer array.
This commit silences such warnings.
Change-Id: I7036b6bdb70747e5327d8f6fceb154f100abc4c0
Allow slightly larger minq-maxq range for P frames. This improves
the compression performance of speed -5 for rtc set by 2.7% in psnr.
Change-Id: I438653d52d0fe51111509c6092e2334bac2de0cf
+ the remnants in the build system & README
the documentation that required php was removed in:
50fa585 Removing examples code generation and making them static.
Change-Id: Ibf00dca9ab2715fc21e8de358807b63d1445662c
When superframe index is available we completely rely on it and use frame
size values from the index.
Change-Id: I0011d08b223303a8b912c2bcc8a02b74d0426ee0
Inline loopfilter has been already handled in vp9_decode_frame().
Collecting all similar code in one place now.
Change-Id: I358a0280fc7c2b27cca520bc1e8c16c4eb6491dd
Re-factor duplicate code.
Add two pass check for use of section_intra_rating as
it is un-initialised in the 1 pass and rt case.
Change-Id: I93120796f07961b8a21fb26e1a9f0d3d13949994
One of a series of changes to clean up two pass
allocation as precursor to support for multiple arf
or boosted frames per GF/ARF group.
This change pulls out the calculation of the total bits
allocated to a GF/ARF group into a function, to aid
readability and reduce the line count for define_gf_group().
This change should have no material impact on output.
Change-Id: I716fba08e26f9ddde3257e7d9b188453791883a3
This commit enables a chessboard pattern for partition search. All
the black blocks run regular partition search ranging from 8x8 to
32x32. The rest white blocks take the nearby blocks' information
to adaptively decide the effective search range.
The compression performance for rtc set at speed -5 is down by 1.5%.
For pedestrian 1080p at speed -5, the runtime goes from 41594 ms to
39697 ms, i.e., about 5% faster.
Change-Id: Ia4b96e237abfaada487c743bca08fe1afd298685
The test vector has segment enabled with different quantizer used for
different segments for bot the first frame(key) frame and the rest of
non-key frames.
Change-Id: I7e21122183050ee046219caba483c18cbc34afe7
Fixes the idecoder in the case where:
cm->error_resilient_mode == 0, and
cm->frame_parallel_decoding_mode == 0, but
new_fb->corrupted == 1.
The assert in debug_check_frame_counts fails to
take into account the case of a corrupt frame.
Change-Id: Idf318a68458cc88d65d6f3f408a10d8ffe87e43f
* changes:
Turn on unit tests for SSSE3 8x8 forward and inverse 2D-DCT
Change eob threshold for partial inverse 8x8 2D-DCT to 12
SSSE3 8x8 inverse 2D-DCT with first 10 coeffs non-zero
tx_mode supercedes whatever mechanism is used to push for 16x16
allowing for the use of the 4x4 transform.
Change-Id: I6c3f05ab9fe52050e40cc6303de9334653763289
We only used two members from that struct: max_threads and inv_tile_order.
Moving them directly to VP9Decoder struct.
Change-Id: If696a4e5b5b41868a55f3cc971e1d7c1dd9d5f69
This reverts commit 4725ab7e51.
The constants are necessary to avoid breakage in vs9 builds:
warning C4180: qualifier applied to function type has no meaning; ignored
error C2436: 'f2_' : member function or nested class in constructor initializer list
while compiling class template member function 'std::tr1::tuple<T0,T1,T2,T3,T4,T5,T6,T7,T8,T9>::tuple(const int &,const int &,unsigned int (__cdecl &))'
..\test\variance_test.cc : see reference to class template instantiation 'std::tr1::tuple<T0,T1,T2,T3,T4,T5,T6,T7,T8,T9>' being compiled
Change-Id: Ia218b74fc473d40f02fee84cb7009adfbe82e5a7
The scanning order has the first 12 coefficients of the 8x8 2D-DCT
sitting in the top left 4x4 block. Hence the partial inverse 8x8
2D-DCT allows to handle cases with eob below 12.
The overall runtime of the inverse 8x8 2D-DCT unit is reduced from
166 cycles (using SSE2) to 150 cycles (using SSSE3).
Change-Id: I4514f9748042809ac84df4c14382c00f313f1cd2
vp9_is_upper_layer_key_frame() definition does not match declaration--
it was missing the second const.
Change-Id: I71312579eb443be1924b8b06d8b3177c3dcb40f3
This commit enables ssse3 assembly implementation of the 8x8
inverse 2D-DCT with only first 10 coefficients non-zero. The
average runtime for this unit goes down from 198 cycles to 129
cycles (34.8% faster).
Change-Id: Ie7fa4386f6d3a2fe0d47a2eb26fc2a6bbc592ac7
Merged minq tables for arf and gf cases.
These tables were almost the same and for
VBR the arf table was not used at all.
Change-Id: Ie3c87e91dab613cf06f6945ac1ace0e0e4213d34
Small adjustment to the active Q range calculations.
These changes should slightly extend the available Q range
for KF/GF/ARF and narrow it for other frames.
The results for this change in isolation are broadly positive
for SSIM and average PSNR and slightly up but mixed for opsnr.
derf +0.293% opsnr, +1.286% SSIM
std-hd + 0.528% opsnr, + 1.746% SSIM
yt +0.056% opsnr, +0.457% SSIM
yt-hd -0.147% opsnr, + 0.226% SSIM
Change-Id: If065280342027ecc5d44b49fc1d440dfef041002
The test vector is produced to have a single key frame, with segment
map enabled and transmitted. Yet no segment feature is active.
Change-Id: I365d62f00d05c07098b9a76fc8d3a991e427ec1a
Includes changes that are not compatible with VS windows builds.
Amongst other things stdint.h is not supported in VS.
This reverts commit 89fbf3de50.
Change-Id: Ifa86d7df250578d1ada9b539c9ff12ed0c523cdd
When the variance is far less than sse, the block is considered to
be under light change. All the energy is compacted into DC coeff
and can be coded at low cost. In such situation, switch the rate-
distortion modeling from sse+var based back to variance based.
Note that this is a temporary solution to handle the rare situations
where the scene light changes.
Change-Id: I1ee0fe2b9eda6b5fac40152e1841bf23f4d229fd
This reverts commit c500fc22c1
There is an issue with gcc 4.6 in the Android NDK:
loopfiltersimpleverticaledge_neon.c: In function 'vp8_loop_filter_bvs_neon':
loopfiltersimpleverticaledge_neon.c:176:1: error: insn does not satisfy its constraints:
Change-Id: I95b6509d12f075890308914cc691b813d2e5cd9f
This reverts commit a5d79f43b9
There is an issue with gcc 4.6 in the Android NDK:
loopfilter_neon.c: In function 'vp8_loop_filter_vertical_edge_y_neon':
loopfilter_neon.c:394:1: error: insn does not satisfy its constraints:
Change-Id: I2b8c6ee3fa595c152ac3a5c08dd79bd9770c7b52
Pulling libwebm from upstream
Changes from upstream:
249629d make Mkv(Reader|Writer)(FILE*) explicit
7f3cda4 mkvparser: fix a bunch of windows warnings
5c06178 Merge "clang-format on mkvparser.[ch]pp"
4df111e clang-format on mkvparser.[ch]pp
7b24501 clang-format re-run.
c6767b9 Change AlignTrailingComments to false in .clang-format
9097a06 Merge "muxer: Reject file if TrackType is never specified"
eddf974 Merge "clang-format on mkvmuxertypes.hpp and webmids.hpp"
def325c muxer: Reject file if TrackType is never specified
41f869c Merge "clang-format on webvttparser.(cc|h)"
fd0be37 clang-format on webvttparser.(cc|h)
207d8a1 Merge "clang-format on mkvmuxerutil.[ch]pp"
02429eb Merge "clang-format on mkvwriter.[ch]pp"
0cf7b1b Merge "clang-format on mkvreader.[ch]pp"
2e80fed Merge "clang-format on sample.cpp"
3402e12 Merge "clang-format on sample_muxer.cpp"
1a685db Merge "clang-format on sample_muxer_metadata.(cc|h)"
6634c7f Merge "clang-format on vttreader.cc"
7566004 Merge "clang-format on vttdemux.cc"
9915b84 clang-format on mkvreader.[ch]pp
7437254 clang-format on mkvmuxertypes.hpp and webmids.hpp
0d5a98c clang-format on sample_muxer.cpp
e3485c9 clang-format on vttdemux.cc
46cc823 clang-format on dumpvtt.cc
5218bd2 clang-format on vttreader.cc
1a0130d clang-format on sample_muxer_metadata.(cc|h)
867f189 clang-format on sample.cpp
4c7bec5 clang-format on mkvwriter.[ch]pp
9ead078 clang-format on mkvmuxerutil.[ch]pp
fb6b6e6 clang-format on mkvmuxer.[ch]pp
ce77592 Update .clang-format to allow short functions in one line
0a24fe4 Merge "Add support for DateUTC and DefaultDuration in MKV Muxer."
11d5b66 Merge "Add .clang-format"
a1a3b14 Add .clang-format
0fcec38 Add support for DateUTC and DefaultDuration in MKV Muxer.
Change-Id: Ia0ed161ffc3d63c2eba8ed145707ffe543617976
In a future release we plan to remove the
option of setting the ARNR filter type.
This patch marks this control as being deprecated
as advance warning that it will be removed from
the API at some point.
Change-Id: I5dcca804b44c7c93b1a10da7d69d19ba6061869c
Added macro to conditionally compile some of the
post-processing functions only when CONFIG_POSTPROC
is defined.
This was causing the build for the generic-gnu target
to fail.
Change-Id: Ibfa447feceb7a0528135025f105be48f97e9965c
The rounding of the ARNR filter output prior to
normalization by the filter strength was incorrect
when strength = 0.
In this case 1 << (strength - 1) would not create the
required rounding of 0, rather it would outrange. This
patch fixes this issue.
Change-Id: I771809ba34d6052b17d34c870ea11ff67b418dab
This commit enables SSSE3 version full inverse 8x8 2D-DCT and
reconstruction. It makes the runtime of vp9_idct8x8_64_add down
from 256 cycles (SSE2) to 246 cycles.
Change-Id: I0600feac894d6a443a3c9d18daf34156d4e225c3
The microsoft build tools explicitly disallow building for arm in
the "desktop" target configuration; one has to target "Windows
Store" apps (aka WinRT/Metro) or Windows Phone. In Visual Studio
2012, one could just pick the v110_wp80 toolset which made the
vcxproj files buildable. In Visual Studio 2013, picking the v120_wp81
toolset isn't enough - one has to configure the vcxproj files
as an "AppContainerApplication". This has the implication that
you can't just build a plain .exe (such as the examples) - an .exe
project would need to have an AppxManifest file. Therefore we can
only build the library itself.
If loaded into Visual Studio for Windows (the Windows Store/Phone
version of Visual Studio, not the Desktop one), the obj_int_extract
project is omitted since it's treated as incompatible. Building
from the command line with msbuild works fine though.
The armv7-win32-vs12 target was added as part of a638bdf4 even
though actual use of it hadn't been tested.
Change-Id: Iee8088252cf790317aeb6b417d29058225f1f629
The getenv function doesn't exist there. In Visual Studio 2012,
the function still existed in the link libraries even though
it was hidden in the headers, but in the 2013 version it has been
removed from the link libraries as well.
Change-Id: Iea6289a698fa1788e906f5aabb6fddda3675815b
Disable register checks when vp9 is not configured. Soon vp8 assembly
will move to intrinsics, obviating this check.
This will still run the check when vp9 is enabled.
Change-Id: I90f50d22cb8c15e9c07f2c8e830e08de7fce0689
This does not do the full toolchain setup like the arm builds. It only
allows for ndk-builds. See the instructions in tests/android/README or
the webm jnin bindings project:
https://chromium.googlesource.com/webm/bindings/+/master/JNI/README.Android
Because this support is not quite polished, the build targets must be
forced. Please use
--force-target=x86-android-gcc --disable-ssse3 --disable-sse4_1 --disable-avx2
--force-target-mips-android-gcc
Change-Id: Ie2b6623f71ac816e3965c39bf97097e9d30b6e94
When ARNR filtering is disabled, by setting
arnr_max_frames=0, mode_skip_mask was being set to
-1 for the ARF frame resulting in no mode being
selected for the block.
The intent is to restrict the reference frame to the
previous ARF frame and the mode to one of ZEROMV,
NEARMV or NEARESTMV.
Change-Id: Ifc3920b153142cd01d422910c94d2f20ffb6f129
On balance Deb's modified rate control for VBR seems
to be outperforming especially on some low motion YT
clips so I have switched this to be the default mode for
now.
Change-Id: I0713d430cad6425ac5c48fccdf332e12814ee44a
Used horizonal add instructions instead of adding
byte lanes. The encoder performance improved by
~4% for the test clip used.
Change-Id: Iaddd10403fcffb5b3f53b1f591ab2fe0ff002c08
This patch did a cleanup following the commit "Save NEON registers
in VP8 NEON functions". The pushing/poping of callee-saved NEON
registers was moved into individual NEON functions. Therefore,
we don't need to save those registers at the beginning of codec.
The related code was removed.
Change-Id: I5648166514fc9beffb780aa138495597731f49ea
Assembly implementation of ssse3 8x8 forward 2D-DCT. The current
version is turned on only for x86_64. The average unit runtime
goes from 157 cycles down to 136 cycles, i.e., about 12.8% faster.
This translates into about 1.5% speed-up for pedestrian_area 1080p
at speed 2.
Change-Id: I0f12435857e9425ed7ce12541344dfa16837f4f4
This reverts commit 59e733ca81.
Hold off removing arnr_type to give users the opportunity
to change their script files to handle its deprecation. A
follow-up patch will mark the control for setting arnr_type
as deprecated and it will be removed completely in a later
revision of the code.
Change-Id: I8b817c744e144d3714234a4cd4309816d0c7e3e8
The recent compiler can generate optimized code that uses NEON registers
for various operations besides floating-point operations. Therefore,
only saving callee-saved registers d8 - d15 at the beginning of the
encoder/decoder is not enough anymore. This patch added register saving
code in VP8 NEON functions that use those registers.
Change-Id: Ie9e44f5188cf410990c8aaaac68faceee9dffd31
dist is broken in msvs currently due to a dependency on libs.mk which in
turn depends on the rest of the source tree, not just the examples
Change-Id: I3e313ceeae81eb29ef4bfb099d89756b43583eaa
This member of VP9_COMP seemed unnecessary since it
only shadowed VP9EncoderConfig.key_freq that is
accessible through VP9_COMP.
Change-Id: Ib751bb1cf1b0b3c50a2a527d7c34f6829dd6fee3
There are a few tests which read/write directly to/from WebM files. They should
be disabled when --disable-webm-io is passed.
Change-Id: Ibac4732e27c66da33082151ba6e6993eaa9a1efd
The global variables used in vpxdec.sh and vpxenc.sh have become useful
elsewhere: Define them in tools_common.sh instead.
Change-Id: I5b8dbd2e88c8d6b2f46c5c55d7711fa154c12b6a
The encoder was not handling requests to place keyframes at
fixed intervals, i.e. kf_min_dist == kf_max_dist, correctly.
In this case when looking to place the next keyframe it was
accumulating stats all the way up to the end of the firstpass
file. This patch corrects this behavior.
Change-Id: I948ad9f1d7faa0c05861df588136cce3bb61d7e7
This commit introduces a chessboard pattern search for the prediction
filter type search. It runs extensive search in alternate blocks and
allows the rest blocks to refer coding decisions of their nearby
neighbors.
For pedestrian 1080p at 4000 kbps, the runtime of speed -5 goes down
from 43990 ms to 42200 ms. The overall compression performance for
RTC set is changed by -1.37%.
Change-Id: Icfe220c49451cda796f0ca91d935c9ed01e56c9d
ARNR filtering is now forced to be centered on the ARF
frame and the other two options have been removed.
The other modes of constructing the ARNR frame were
not used and there does not seem to be any good
reason to maintain them.
This is purely an encoder-side change.
Change-Id: Ic772636d23f280752973852b9740083532a49de2
This patch fixed errors reported in Issue 746: "dr memory VP8
encode errors" and Issue 745: "dr memory VP8 decode errors".
The "UNINITIALIZED READ" errors were fixed in x86 assembly
code. The list of files fixed is
vp8_intra_pred_uv_tm_sse2
vp8_intra_pred_uv_tm_ssse3
vp8_intra_pred_uv_ho_mmx2
vp8_intra_pred_uv_ho_ssse3
vp8_intra_pred_y_tm_sse2
vp8_intra_pred_y_tm_ssse3
vp8_intra_pred_y_ho_sse2
Change-Id: Ib6df7bf1d442077fe534edfd90e50ad16fadacdd
For speed 3 and above, such search is only allowed at speed 3.
The change helped cif and stdhd set by 1.2% and .7% in compression,
but increased the encoding time by around 5%.
Change-Id: Ifa4832327f1c1bef3decb032ceb769cbf50e059f
Adds test code to verify that supplemental superframe information
that precedes the normal superframe information will not break
decoding.
Change-Id: Ia252b887d7ee138f51dc9a778376ff739402c455
The end_useage parameter is confusingly named since it
now actually defines the rate control method used.
Change-Id: I98912caabfe556b7af0b939a645d1336409e4d71
This commit enables a background detection approach for adaptive
quantizer control. It combines the cyclic refresh pattern and the
background information to determine the segment id for adaptive
quantizer selection, prior to the non-RD mode decision process.
It hence allows proper quantization information update for a more
precise rate-distortion modeling in the non-RD mode decision.
The compression performance of speed -5 for rtc set is improved
by 2.5%, at no speed change.
Change-Id: Ic3713e8ed9185b403b5b1679d19dabd57506d452
1. We didn't scale source image in lower layers so that
the stats are incorrect.
2. We didn't extend borders for re-constructed image.
Change-Id: Ia8d7bafbdb695ffa7f504e171f9449812e7bb0a3
Remove duplicate WebM parsing code in test/webm_video_source.h and linking it
against webmdec.c which does the exact same thing.
Change-Id: Ib7152eecde890fca58be42028cab18c9cb54221c
To make direct side by side testing this patch combines two
VBR corrections schemes to allow more direct side by side testing.
(The other patch was by Debargha chg id I0cd1f7...)
Change-Id: I271c45e5c4ccf8de8305589000218b80d9dc3a25
The background detection only tracks luma component. This commits
removes the frame buffer pointer retrieval for chroma components.
Change-Id: I098bd2950f5e5829ed5dc2b48568167248da7fad
This patch sets up a quad_tree structure (pc_tree) for holding all of
pick_mode_context data we use at any square block size during encoding
or picking modes. That includes contexts for 2 horizontal and 2 vertical
splits, one none, and pointers to 4 sub pc_tree nodes corresponding
to split. It also includes a pointer to the current chosen partitioning.
This replaces code that held an index for every level in the pick
modes array including: sb_index, mb_index,
b_index, ab_index.
These were used as stateful indexes that pointed to the current pick mode
contexts you had at each level stored in the following arrays
array ab4x4_context[][][],
sb8x4_context[][][], sb4x8_context[][][], sb8x8_context[][][],
sb8x16_context[][][], sb16x8_context[][][], mb_context[][], sb32x16[][],
sb16x32[], sb32_context[], sb32x64_context[], sb64x32_context[],
sb64_context
and the partitioning that had been stored in the following:
b_partitioning, mb_partitioning, sb_partitioning, and sb64_partitioning.
Prior to this patch before doing an encode you had to set the appropriate
index for your block size ( switch statement), update it ( up to 3
lookups for the index array value) and then make your call into a recursive
function at which point you'd have to call get_context which then
had to do a switch statement based on the blocksize, and then up to 3
lookups based upon the block size to find the context to use.
With the new code the context for the block size is passed around directly
avoiding the extraneous switch statements and multi dimensional array
look ups that were listed above. At any level in the search all of the
contexts are local to the pc_tree you are working on (in?).
In addition in most places code that used to call sub functions and
then check if the block size was 4x4 and index was > 0 and return
now don't preferring instead to call the right none function on the inside.
Change-Id: I06e39318269d9af2ce37961b3f95e181b57f5ed9
There is no need to initialize source/dst frame buffers at frame
level. These will be done at block coding stage. This commit hence
removes the redundant operations.
Change-Id: I11d9f2556058c6205c8e58ed53e31f78622c41b7
Add code to monitor over and under spend and
apply limited correction to the data rate of subsequent
frames. To prevent the problem of starvation or overspend
on individual frames (especially near the end of a clip) the
maximum adjustment on a single frame is limited to a %
of its un-modified allocation.
Change-Id: I6e1ca035ab8afb0c98eac4392115d0752d9cbd7f
This commit compares the current original frame to the previous
original frame at 64x64 block level and decides if the entire
block belongs to background area. If it is in the background area,
skip non-RD partition search and copy the partition types of the
collocated block in the previous frame.
For vidyo1 in the rtc set, this makes the speed -5 coding speed
about 8% faster. The overall compression performance is down by
1.37% for rtc set.
Change-Id: Iccf920562fcc88f21d377fb6a44c547c8689b7ea
This commit added a check of reference frame to make sure that pre
buffer pointers are initialized only when necessary and make them
to 0 if ref frame is intra, hence those buffer should never be used.
Change-Id: Ieb474fcd9feb759f02e2f9c282b7348a8fa31117
Delete code relating to the old VP8_TUNE_SSIM flag
as this code does not currently work and is largely made
redundant in VP9 by the various AQ modes.
Change-Id: I71f28e1f680573d296422254489000678552b17b
Remove duplicate rd_thresh code introduced when vp9_rd_pick_inter_mode_sub8x8()
was forked from vp9_rd_pick_inter_mode_sb().
Change-Id: I3c9b7143d182e1f28b29c16518eaca81dc2ecfed
Fix rate control bug whereby the rate factor heuristics
were being updated on arf overlays causing a rate surge
for a few frames followed by a corrective drop.
This fix eliminates many of the overshoot problems that
we were seeing on hard clips (even without applying
stricter vbr rate control) and also helps quality on
almost all clips with some hard clips improving by >5%.
Overall quality results measured at speed 2.
Derf +1.78% opsnr , +2.44% SSIM
Stdhd +2.41% opsnr, +2.85% SSIM
Change-Id: I2369df6295c2705963fa6307877f6acb304bcc39
Fix return values for webm_read_frame so that we can distinguish between
error and end of stream. 0 - Success, 1 - End of stream, -1 error.
Change-Id: Ic35d0c7d7a166e027711a3d2300ecdda25a5d0cc
We don't use declarations from this file. The real declarations
(differently named) are in vp9_rtcd_defs.pl, e.g. vp9_full_search_sad.
Change-Id: I73cbf064305710ba20747233cfdbe67366f069a0
Added command line flags "resize-width" & "resize-height"
to allow the user to specify the frame size to encode at.
These two flags are ignored if the "resize-allowed" switch
is not set to 1.
All frames in the clip are then encoded at this size, which
must be smaller than the raw frame size.
Change-Id: I3d64bd9303d5c0bd678461a866a1ea621700d744
A previous path improved speed 2 quality a little but
more extensive testing showed that it slowed encode
by a few %.
The change will have a similar effect for speed 3 but
should not impact speeds 4+;
This experiment should reverse that and give a speed
up at the cost of a small quality loss.
Borg results pending.
Change-Id: I4493fc1541aaf44587f1a41ff219f7088da9252c
Both values are already checked as command line arguments:
RANGE_CHECK_HI(cfg, g_lag_in_frames, MAX_LAG_BUFFERS);
RANGE_CHECK_HI(extra_cfg, sharpness, 7);
Change-Id: I584798d587152d88dfd517c210054b466f4e5f8a
With a more approriate one vp9_setup_src_planes() as only src buffer
pointers need to be initialized here.
Change-Id: I40fac4d8b2d39eb7d0c65b9b6afab45138a13936
This increases the range of Q values available to
normal inter frames to allow encoder a better chance
to hit the target rate.
Change-Id: I33cd96469a46577fdcea631e26d3355710909e6d
The limits applied under the flag
"LIMIT_QRANGE_FOR_ALTREF_AND_KEY"
behaved in an undesirable way if the gap between
active_worst_quality and active_best_quality was
changed.
In this patch, the adjustment is made using the
vp9_compute_qdelta_by_rate() function and fixed
rate multiplier values. Hence it is not impacted by
the relative value of active_best_quality.
Change-Id: I93b3308e04ade1e4eb5af63edf64f91cd3700249
The cause is because VP9 encoding use vp8_vpxyv12_extendframeborders_neon
on arm which only extend boarder size 32. But VP9's border size is 160
Change-Id: I1ff7e945344a658af862beb1197925e677e8ff57
Problem has been introduced recently with the cleanup patch
I0816ec12ec0a6f21d0f25f10c214b5fd327afc6c
Change-Id: Iaacb956a6039eb5826b82618dc03be32053fb892
This commit changed the initialization of best_mode_index to -1 to make
sure it is not mistakenly used for mode masking.
Change-Id: I75b05db51466070dd23c4ee57a4d4b40764dc019
In mode selection loop, once mode_index pass mode_skip_start, all
modes with a different reference frame from current best mode are
masked out using mode_skip_mask.
However, the setting of mode_skip_mask may use an invalid mode if
there is no mode tested yet. This commit fixes the issue by making
sure a mode has been tested and selected. Otherwise, no mode will be
masked out because of their reference frame.
Change-Id: Ib0009e8a96836a65cf5347440fff8a2e1a67f29f
This patch fixed the uninitialized read errors in Issue 748:
"dr memory VP9 encode errors". In vp9_convolve_avg_sse2,
when width is 4, pavgb reads 8 bytes from dst buffer that is
out of range. An error is reported although the data is not
actually used later. This issue was resolved by preventing
uninitialized reads.
Change-Id: I109a54910aa47139cb13119de86f2062cff207df
The macosx release of clang v5.0 identifies itself as:
Apple LLVM version 5.0 (clang-500.2.79) (based on LLVM 3.3svn)
This version of clang uses the older _mm_broadcastsi128_si256, like
v3.3, as given away in the LLVM svn version above.
Change-Id: I4d6d59d5454efd57d2ae9e75f5eb7486af7cbd0c
Calculate the difference variance between last source frame and
current source frame. The variance is calculated at 16x16 block
level. The variances are compared to several thresholds to decide
final partition sizes.
An adaptive strategy is implemented to decide using
SOURCE_VAR_BASED_PARTITION or FIXED_PARTITION based on motions
in the video. The switching test is done once every
search_type_check_frequency frames.
The selection of source_var_thresh needs to be investigated
further later.
RTC set Borg test showed 0.424% overall psnr gain, and 0.357%
ssim gain. For clips with large enough static area, the
encoding speedup is around 2% to 15%.
Change-Id: Id7d268f1d8cbca7fb8026aa4a53b3c77459dc156
This commit allows the non-RD mode decision flow to select
prediction filter type in NEWMV mode. It provides 8.14% compression
performance gains in both settings of AQ=0 and 3. The current speed
impact is about 5% to 10% slower.
Change-Id: Id66ecebf77abd8f90fb3f6a066c0e8dfb4bf1c42
Adds some high-level hooks for profile 2 before further
progress on the implementation.
According to the definitiion in this patch:
1. Profile 2 only supports 10 or 12 bit color but not 8
2. Profile 2 supports all color sampling modes: 444, 422 and 420,
and alpha plane.
3. Profile 3 is currently undefined.
Please consider the definition carefully and suggest modifications
to the definition as needed.
Change-Id: I5b284fc679e54ac5aee171af72fa7994cfd28995
There was a bug with the decoder that if you started the decoder
with more threads than the first frame had tile columns. Afterwards
tried to decode a frame with more tile columns than the first frame,
the decoder would hang. E.g. run vpxdec --threads=4. The first frame
had two tile columns, then the next key frame had 4 tile columns, the
decoder would hang. If you started with 4 tiles and switched to 2
tiles the decoder would be fine. The issue is that the worker the thread
loop is using is stale.
I added a test vector "vp90-2-14-resize-848x480-1280x720.webm" that
exhibited the bug.
Change-Id: I7bdd47241a52ac0fe1c693a609bc779257e94229
Copy up to a certain bsize, otherwise set to a fixed bsize.
This helsp to reduce artifact near moving boundary caused by full partition
copy without checking motion of super-block.
This artifact can occur at speeds 3,4 in real-time mode.
Issue: https://code.google.com/p/webm/issues/detail?id=738.
Change-Id: I05812521fd38816a467f72eb6a951cae4c227931
ARF overlays now use the same rate correction factor as regular inter
frames, further testing would be needed to see if it makes sense to use
a completely separate rate correction factor for ARF overlays.
$ vpxenc --cpu-used=5 --fps=50/1 --target-bitrate=2000
parkjoy.y4m -o out.webm
=> Before: 3356 kb/s
=> After: 2271 kb/s
Change-Id: I73e4defa615ba7a8a2bdb845864f4b1721cbbffe
This commit estimates the motion vector rate cost right after full
pixel motion search. It combines this and the mode cost and compares
the corresponding rate-distortion cost. If it is already above the
current best one, skip the rest sub-pixel motion search and modeling
process. For pedestrian_area 1080p at 4000 kpbs, the speed -5 runtime
goes down from 39425 ms -> 38399 ms.
Change-Id: If4cd7119fd6c266798d5cf1d19d19ab425e52a26
Tests the basics (first confirms feature is available in vpx_config.h):
- VP8 decode (in IVF file).
- VP9 decode (in WebM file).
- VP8 encode (to IVF and WebM).
- VP9 encode (to IVF and WebM).
- VP9 lossless encode (to IVF, currently disabled due to failure).
- Pipe input (to vpxdec and vpxenc).
Test data path and path to vpx{dec,enc} have been parameterized. In
addition:
- Supports disabling tests (test names prefixed with DISABLED_ are not
run by default).
- Supports filtering tests.
vpxdec.sh: Tests vpxdec.
vpxenc.sh: Tests vpxenc.
tools_common.sh: Common test functions.
Change-Id: I0612c88b8dd6049a05bbbc79a317a0cca61733a5
Reinstates this macro and truns it on in order to avoid issues
due to some frames at the end starving in harder videos.
A more acceptable solution is in the works.
Change-Id: I3c46148e86fa6114e3fed245246fb3686a9e6700
This commit slightly increases the bit allocation for key frame. This
improves speed -5 coding performance by 2.77% with aq-mode=0 and by
2.78% with aq-mode=3.
Change-Id: Iaa3e777f80b9706306606af06e89852bac146659
in some configurations MSVS will define _off_t / off_t in wchar.h; the
former is used locally while the latter is for compatibility. this
change overrides off_t as in the past and sets _OFF_T_DEFINED to prevent
a clash in types.
Change-Id: I9b0e6db586a0a2729b545d93edfc56570d2fcf97
For real-time mode under cbr, this increases the gain (5-10%)
for speed 5 (none/little change for 6), on vc-clips.
Change-Id: I9b38beeb3c820de22c43a0ba53a9456168dd24ba
Turns off the DISABLE_RC_LONG_TERM_MEM macro and makes other changes
in the way the bits are updated, to make 2-pass rate control track
target bitrates closer.
Change-Id: I5f3be4b11c2908e6a9a9a1dd4fcf4e65531c44d8
This commit reduces the frequency of frames using finer quantizer
in non-RD coding flow, and slightly tune up the quantizer resolution
when used. It provides 1.7% compression gains in speed -5 at no speed
difference.
Change-Id: I430249a51260a841a0402666e5ec1566e4f7d5a6
Temporary revert.
Problems with conflicting definitions of type off_t
in MSVC builds that need resolving.
c:\Program Files (x86)\
Microsoft Visual Studio 9.0\VC\include\wchar.h(479) :
"error C2371: 'off_t' : redefinition; different basic types
c:\on2experimental\libvpx\tools_common.h(26) :
see declaration of 'off_t'"
This reverts commit 92a4c59112.
Change-Id: I535e40a18842a92e3e6e0b29e5fba66313010803
The new tolerance is a little higher than before (especially
for kf/gf/arf) so this change gives an encode speed up
for some clips up for speeds 0-2.
Change-Id: I63f7d6c9cc11c7f58742f41e250dcd3eab1741eb
This code/setting was actually not used (since speed features were not set on first frame,
until a recent change) and should be removed.
In CBR mode, the q value for the first frame can be controlled by setting
the target size via the parameters rc_buf_initial_sz (and max_intra_size_pct).
Change-Id: I65afc64972b36c449bd5a8c25800e65da5389066
While encoding a frame, its last frame source can be used to give
acurate motion information. This patch prevents last frame to be
overwritten so that it is available during current frame encoding.
The last source is scaled when it is necessary. cpi->Last_Source
points to the last source frame.
Change-Id: I0e1ef5e9e1d2badf9d0c7a1a44a7ed5b24c09425
Use a crude correction factor to correct for
lower compression efficiency at higher encode
speeds when estimating the max Q for the
clip.
Change-Id: I5ae377647f4adf5e91d700a8791fb3b8f70efc73
This commit optimizes the bit allocation for the non-RD coding flow.
It applies slightly better quantizer to the frames, where all blocks
run a non-RD partition search. Such frames typically have better
rate-distortion trade off, thus improving the reconstruction quality
for next few frames reference at reasonably low increment in rate
cost.
The coding performance for rtc set at speed -5 with error-resilient
tuned on and rate control set as cbr is improved by 19.58%. It improved
the coding speed by about 10% for a portion of local test clips.
Change-Id: I9d56696cd4359dc8136ca10aff10fff05aaa2686
For very large size video image, the scaling calculation may need use
value beyond the range of int. This commit upgrade the value to 64bit
to make sure the calculation do not wrap around INT_MAX.
The change corrected the decoder behavior.
The bug affects only very large resolution video because the scaling
calculation was sufficient for image size smaller than 2^13.
This resolves issue:
https://code.google.com/p/webm/issues/detail?id=750
Change-Id: I2d2ed303ca6482f31f819f3c07d6d3e98ef3adc5
This commit adjusted the speed steps in rt mode to make the steps
more evenly spaced on speed and quality, specifically:
1. Merged 3 and 4 into one single step 3 and removed confilicting
features.
2. Move 8, 7, 6, 5 to be 7, 6, 5, 4 repsectively.
Change-Id: I38d56d61531f3561d772aef953c411c8fb38c063
Root cause is the different default register length between x86
and x64 platform. Change spatial_layer_id to long long.
Change-Id: If1a5972365c7a59f7e76cb4fd714610f3d48a8ff
Small speed gain for speed 1.
Quality is generally a little up for speed 2.
Speed 3 was similar to speed 4 but now positioned more
evenly between 2 and 4 speed and quality wise.
(opsnr +5.6% ssim +8.25% across all sets)
Speed 4 is a little slower than before but sizable quality gains.
(opsnr +3.7% ssim +6.8% across all sets)
The code has been cleaned up a bit so that for each incremental
speed step changes over the previous speed step are applied.
This makes it easier to see what is changing from one setting to
the next.
Change-Id: I2d98d0d6230af23486adaec01908f58942a7cdeb
Allow tx search for ARF and GF helps quality but a little slower.
Setting subpel_iters_per_step to 1 improves encode speed.
Setting sf->mode_skip_start = 10 improves speed.
Initial local results suggest overall impact on quality is neutral
but encode is up to 15% faster.
Change-Id: Ibde02cae6626a44c10a1da0cefe888afbb51f037
Removes a TODO. Changed meaning of some parameters
(target-max-percent refresh and starting index) to be
defined relative to superblock. Also, modify turn-off condition.
Change-Id: I5e55f372b7079c24f9cdac0b06fa34620dbf456b
This commit enables the non-RD mode decision coding flow to
adaptively apply partition search in non-refresh frame, when the
collocated block in previous frame suggests there might be a motion
activity. It refactors the update_state_rt() function to support
buffer swap of mode_info struct, thereby unifying the encoding
stage across various non-RD coding modes.
It provides 5% compression performance gains in speed -6 for rtc
test set, at about 12% speed slow down.
Change-Id: Iefa374aed5a11c4b7ff9a3ed36a98ea8bd184edb
Fix so that vp9_update_segment_aq() will use the correct (i..e, chosen)
encoding mode (from ctx struct) in update_state.
Change-Id: Icc4b66f3935fad5ec4516a4d57e843d12c365e64
This commit allows the recursive non-RD partition search to early
terminate sub search tree when the cumulative rate-distortion is
already above the best available.
Change-Id: Ifdbcbb4bee229f47fde3033200829577c9f1fc1d
This commit added a speed feature to make the logic of calculating
skip_recode on a block level more explicit. This also enable the
feature to be enabled at speed 5 where the previous logic is too
conservative, help gain back the lost speed for --rt(-5).
Change-Id: Ieb37ca3e85c2e7bda343486edf13d5f5395f2233
There were a few conflicts between the new non-RD partition search
and recent clean-up patches, which were not caught by git control.
This commit fixed these issues.
Change-Id: Ieebefbd6c19d81d0d13e3c568877d5cce2ab7797
This commit takes out the if statements on using adaptive motion
search flag. This feature is automatically enabled in non-RD mode
decision flow for variable partition types search.
Change-Id: I5a25cf9109d84d07aa61b3e02c8d32dda1e90cb0
This patch fixed WebRTC Issue 3020: "Uninit error at
vp8_mbpost_proc_down_xmm". The first 8 values in d were not initialized,
but was accessed. This patch fixed c code as well as mmx and sse2 code.
Change-Id: Iaa5b41a4ed3bea971b15fb826ce34b7ab4e36fb1
This commit enables a recursive partition type search for non-RD
mode decisions. It allows the encoder to choose variable block
sizes in a 64x64 block based on rate-distortion modeling.
It improves speed -6 coding efficiency for rtc set by 2.4%. Most
of the gains come from 32-40dB range, where many sequences gain
about 5% to 20%. Local tests suggest there is no speed change.
Change-Id: I06300016e500a21652812b7b3b081db39a783d66
This commit reformats non-RD coding flow layout to allow mode
decision with fixed and variable block sizes.
Change-Id: I2cdd3bb9f26c499ee4a9849004fd925cdd195d09
2 functions were optimized for avx2 by using full 256 bit register
In order to handle 32 elements in parallel instead of only 16 in parallel:
1. vp9_sad32x32x4d
2. vp9_sad64x64x4d
The function level gain is 66% and the user level gain is ~1%.
Change-Id: I4efbb3bc7d8bc03b64b6c98f5cd5c4a9dd3212cb
Fixed dr memory errors reported in Issue 736:
https://code.google.com/p/webm/issues/detail?id=736
All elements in left_col buffer need to be initialized to ensure
the correctness of SIMD operations in x86 optimized code.
Change-Id: I8e7f26ab45cca8099c1f9342bcf852f828bda7e4
The third pred_mv is stored in x->pred_mv[ref_frame]. This commit make
sure the correct mv is read.
Change-Id: Ibed24daf36703a63f0394c87b2381ee1d2eb7910
In non-rd pick_mode code, added mode skipping according to
thresholds. Used rd thresholds now, but we can modified them
later for real-time case.
RTC set borg test showed a 0.095% PSNR gain. For different rtc
clips, the real-time(speed 7) encoder speedup is 2% - 10%.
Change-Id: Ic72535c96b891092c662453be32d3168f7e34dcc
The flag x->skip_recode interacts badly with
the cpi->sf.use_nonrd_pick_mode and
cpi->sf.skip_encode_sb speed settings.
Restricting the use of the skip_decode flag when
these other speed choices are in use helps quality
for speeds 3 and 4 by a large amount with only a
small impact on speed.
Average improvmentes for 2 pass speed 4:
Derf +8.8%
Yt + 10.53%
Std-Hd +6.95%
yt-hd + 22.95%
Change-Id: I8010876d8012042a11077c92e69d813c3dfa58eb
This commit changed how q is validated in lossless mode. With this
commit, when --lossless=1 is specificed at commandline, --min-q and
--max-q are now ignored. This is to make the option non-ambiguious.
Change-Id: I33e85690460537509d33be75d6a3597be4affc09
One of the tests for real-time mode is failing at speed 6.
Introduced recently, will enable again when fixed.
Change-Id: I8f42de6a3eca226c9aa5c5e1fab98d629993c087
Instead of hardcoding a certain indentation, use the regexp to
provide similar indentation for the new line as well.
Change-Id: Iacb2621b35ce7e1aa3980c1603b8e3ab02d98a35
This is an initial attempt to allow variable block size partition
in non-RD coding flow. It tests 8x8, 16x16 and 32x32 block size per
64x64 block, all using non-RD mode decision and the associated rate
distortion costs from modeling, then selects the best block size to
encode the entire 64x64 block. Such operations are triggered every
other 3 frames. The blocks of intermediate frames will reuse the
collocated block's partition type.
It improves the compression performance by 13.2%. Note that the gains
are not evenly distributed. For many hard clips, the compression
performance is improved by 20% to 28%. Local speed test shows that
it will also increase runtime by 50%, as compared to speed -7. It is
now enabled in speed -6 setting.
Change-Id: Ib4fb8659d21621c9075b3c369ddaa9ecb0a4b204
This makes sure that labels for data symbols directly after
functions get properly 4-byte-aligned (when the source is assembled
in thumb mode).
Previously, if declaring a data symbol directly after a function, the
symbol could end up pointing to the unaligned address (if the total
size of the thumb function didn't end up being a multiple of 4). The
data in the symbol itself ended up aligned, but the symbol pointed to
the preceding unaligned position.
That is, a source file looking like this:
---
...
ENDP
symbol
DCD 0x12345678
---
could end up being assembled into
symbol:
xxxxx2: 0000
xxxxx4: 5678
xxxxx6: 1234
(This doesn't happen if the symbol label is on the same line as the
DCD directive.)
By adding an ALIGN 4 directly after the ENDP we make sure the symbol
itself gets aligned properly.
This isn't an issue with the original, untranslated arm source,
since it only is built in arm mode where all instructions are 4 byte,
and since the gnu assembler automatically adds the padding before the
symbol even in thumb mode.
Change-Id: Iadbeebd656b0197e423e79a12a7d3ef8859cf445
1. Save stats for each spatial layer
2. Add frame buffer management for svc first pass rc
3. Set default spatial layer to 1
4. Flush encoder at the end of stream in test app
This only supports spatial svc.
Change-Id: Ia89cfa87bb6394e6c0405b921d86c426d0a0c9ae
<=sse2 isn't strictly necessary on x86_64, but this is more consistent
with the rest of the flags and should be harmless
Change-Id: Ice0f1d1c4c7510ee90af2a62dbd3d6508db63487
inheritance should be public; also correct placement of ClearSystemState
as the base class doesn't inherit from testing
Change-Id: I0f41330fccc62a70b8dd40d66bbd829b9d98cf84
This reverts commit 89025585cd.
This check breaks BSD builds and isn't useful through the configure
process. The README describes the build environment requirements (GNU
make).
Change-Id: I25f8a9c1640909412ab405dbd09a1c4d93e5a511
The use of uninitialized skip flag will trigger inconsistency in
coding statistics, when alternate RD and non-RD coding modes are
enabled. This commit fixes this issue and removes unnecessary if
statements from update_state_rt.
Change-Id: I7d549dcb0e3ef48b999e5bbc78174ba84502cfcf
The comment made it look like the condition code was dropped from
the extra add instruction, while it actually was handled properly.
Thus, the comment was misleading while the code itself did the right
thing.
Also clarify the comment indicating that we use the full three-operand
form of the add instruction.
Change-Id: I2c1ac6ac4fedf262d104ea30a6c005febc74de9c
Adding a --(enable|disable)-webm-io flag to control WebM container input and
output support. For now, enabling WebM IO by default only when there is a C++
compiler. Doing so because eventually we will move WebM IO to libwebm and it
is built using C++.
Change-Id: I210ac36c23528e382ed41d3c4322291720481492
The block coding skip flags are assigned in the normal RD mode
decision loop. They are then used in the final encoding stage.
In the non-RD mode decision, the forward transform and quantization
stages are replaced by modeling based on SSE and variance of
prediction residues. This commit applies reset to this array in
the non-RD coding mode.
Change-Id: I66584669b035e9c8ac23e95047849ff277472742
This commit moves the position where rdmult is saved to make sure it
is the correct value. Prior, an uninitialized value may be saved and
restored.
This addresses issue:
https://code.google.com/p/webm/issues/detail?id=733
Change-Id: I436407f289169bc63da3c5a6bf609bed16cb71b5
This commit unifies the non-RD partition use cases for both fixed
and variable block sizes. Deprecate and remove the separate function
for fixed partition type only.
Change-Id: I2b6cb945e90c1566f985adcebc4d0757480a8004
This commit allows the non-RD mode decision process to return the
rate and distortion costs associated with the selected mode.
Change-Id: Ibe0f67d323f65839fd9cb0a726c1219bf7b55da9
Brings back most of Jim's previous patch for choosing
partitioning based on variance while making it compatible
with the current state of the code. Also adds a
nonrd_use_partition() function to recursively encode for any
arbitrary sb_type decisions within a 64x64 block; and
includes some refactoring.
Currently, when the VAR_BASED_PARTITIONING mode is turned on
for speed 7, there is a 10+% speed-up observed.
Experiments/improvements with this new partitioning method
will be conducted subsequently.
Change-Id: Ie6f43bfbde30583e941f450bf07c3b48828c9571
This commit adjusts the rate-distortion modeling for non-RD mode
decision. It puts more weights on energy from AC coefficients when
estimating the cost. The coding performance for rtc testset is
improved by 0.72%.
Change-Id: Ifa6ff11311a513ec2b10586589e82a9a21f6c61c
Assign interp_kernel value in MACROBLOCKD. This will be used to
select prediction filter coefficient sets and generate motion
compensated prediction.
Change-Id: I28c8dfb2dae6566f6939bb328aca5875c94bee65
Reimplementing sub8x8-reading of intra block modes in
read_intra_frame_mode_info() and read_intra_block_mode_info(). Code looks
more readable as well.
Change-Id: Ia42fc7d0dad708bc0c7a8bff1f8b37809b843f40
Clean-ups include
a. redundant code in rt -5 speed feature settings
b. code that guarantees square block availability in
rd_auto_partition_range()
Change-Id: Ic7b04d45b6dc15c461e0edbbb4e78aec20348291
The block size used for non-RD mode decision in FIXED_PARTITION
setting was uninitialized. This commit fixes it by setting block
size to be BLOCK_16X16.
Change-Id: Ief04c9f1ab668de69297d9ab3dc15e2fa0bc4e95
Neon code unit test is failing now due to save/restore neon register
operations are not done inside this function, but outside of it. Disable
it now until VP8 neon code get cleaned up.
Bug: 725
Change-Id: Id1ff1ef50a0e894b41c820a310ff8ba31ef12d18
translates to TreatWarningAsError (/WX)
setting this via the CL environment variable is not possible due to the
/WX- default which is used on the command line
Change-Id: I0b42a9d3ca9eba6af82c25b8e434baa2fcb00156
Adds a fast diamond search which is about 5% faster than FAST_HEX
with only a 0.1% drop in psnr when turned on for both speeds 5 and 7.
This search is turned on for speed 7.
Change-Id: I497630aa88a5148926086bb3038e7975e5f4eb98
This commit allows the non-RD mode decision to skip mode RD modelling
check, if the motion vector associated with the current mode is
same as that of NEARESTMV mode. This makes speed -7 about 2% faster.
Previous change that converts cost metric from SAD to model based RD
value makes the codec 6% slower at speed -7.
Change-Id: I30cfec5452f606a671b8432a2f7f0c94fbb49fc8
For some dimensions, neon code ends up in a dead loop inside.
This will fix the unit test failure in svc_test on ARM.
Change-Id: Ie6098bfaefd86bcf3616a3d0c2c3ff0b154222b5
The rate-distortion model in non-RD coding mode is only applied to
luma component. This commit removed a few redundant addition steps.
Change-Id: Id8edc0a47c2dbef8deba43debe2c95db39454de3
This commit replaces SAD cost with modeled rate-distortion cost
for non-RD mode decision. It translates the prediction residual
SSE into estimate rate and reconstruction distorion costs, hence
capturing the quantization setting effect. The compression
performance of speed -7 for rtc set is improved by 14.79%.
Change-Id: Ifda014eb0501d13109fe7f92680bf1410b463632
fixes issue #711
specifying a multiword CC, e.g., CC='gcc -m32', would cause the failure
under dash
reported in
https://bugs.gentoo.org/show_bug.cgi?id=498136
patch by floppymaster at gmail dot com
Change-Id: I2ba246f765646161538622739961ec0f6c2d8c2d
clang on macosx does not support -Wunused-but-set-variable; adding the flag
causes additional warnings about the flag. As a more generalized fix, use
-Werror when checking compiler flag support in order to avoid using
unsupported warning flags.
Change-Id: I2529862e211f880d56491eac3b9fa90fff1aa5c3
clang reports gcc-4.2.1 in e.g., 3.3, 3.4; add a specific clang version
check for _mm256_broadcastsi128_si256
fixes issue #720
Change-Id: I5c8e3c27fdea05d8a5b050e8cb74894b595f4709
prevents out of tree build failures when the source tree has already
been configured; modeled after a similar check in autoconf
Change-Id: I627eb7243576f4d753141dfcb4ed4e34544d03a7
Configuration logging is passed through pr, but nothing configure
does actually requires pr. Use cat instead.
Change-Id: I451217882a329c2bfb8942ac86ac624a7feef670
The only difference between two examples was usage of VPX_EFLAG_FORCE_KF
flag for frame encoding. Moving this functionality into simple_encoder
with additional command line option.
Change-Id: Ia3c4209be073eeb541d4ac6b41bd0f12812f6676
avoid building x86inc.asm, x86_abi_support.asm and vpx_config.asm as
they provide no symbols themselves
fixes:
warning LNK4221: This object file does not define any previously
undefined public symbols, so it will not be used by any link operation
that consumes this library
Change-Id: Iecfe03aa76efbfc07c2af5b91ba5405634e45f1d
Set speed features before running frame encoding. This avoids
redundant RD threshold calculation in key frame coding.
Change-Id: If8e3cf2c02976baa59b310c1c23af9eea0c46e36
* Remove all non-DC intra modes for BLOCK_32x32 and up
* Remove all intra modes for blocks bigger than BLOCK_32x32
* Remove ZEROMV for BLOCK_32x32 and up
* Only consider NEARESTMV for blocks bigger than BLOCK_32x32
Change-Id: Ia18351a238213e2f072f9e481d622949346a245f
+ nestegg_track_codec_data
quiets uint64_t -> size_t warnings
the sizes used are previously validated against their associated LIMIT_*
values
Change-Id: Ie574a3a7496d0143bd58b778145c27f38dd6a4da
- Change type of encrypt_buffer() offset argument to ptrdiff_t, and change the
type of the size argument to size_t.
- Update size argument encrypt_buffer() in vp8_boolcoder_test.c with
same.
Change-Id: Ie29c7c82c73318bee01b89c6fb4c4e1442eef03c
The core motion estimation fucntions all return sad now consistently.
The only exception is vp9_full_pixel_diamond(), however the core diamond
and refining search routines called from vp9_full_pixel_diamond() also
return SAD. If variance of pred error + mv cost is desired it must be
calculated explicitly outside these functions. For very fast encoding,
hopefully this will eliminate some redundant computations.
Also suggests reimplementing FAST_HEX with the vp9_pattern_search
framework. It is not exactly the same as the existing FAST_HEX, but
performance is slightly better and speed is very similar. Enables
removing a lot of duplicate code.
Change-Id: I152736393438c25bdf7e96b37cbb8ce330f4f94a
significantly speeds up file generation.
the goal of this change is to convert rtcd.sh to perl as directly as
possible to allow for simple comparison. future changes can make it more
perl-like.
---
Linux
[CREATE] vpx_scale_rtcd.h
real 0m0.485s -> 0m0.022s
[CREATE] vp8_rtcd.h
real 0m4.619s -> 0m0.060s
[CREATE] vp9_rtcd.h
real 0m10.102s -> 0m0.087s
Windows
[CREATE] vpx_scale_rtcd.h
real 0m8.360s -> 0m0.080s
[CREATE] vp8_rtcd.h
real 1m8.083s -> 0m0.160s
[CREATE] vp9_rtcd.h
real 2m6.489s -> 0m0.233s
Change-Id: Idfb71188206c91237d6a3c3a81dfe00d103f11ee
* speed improvment of 30 percent achieved
* multiplies and adds remain the same
* non-arithmetic instructions minimized by hand, by:
-expanding 2 pass loop
-removing irrelivant "shuffles"
-combining last two rounding steps
* further improvments may be possible
Change-Id: Idec2c3f52910c48e6a0e0f9aefed5cae31b0b8c0
This patch adds a new speed feature which doesn't do the rather
expensive entropy context lookup or save to the table, while
doing costing.
The speed up on desktop36p.y4m is around 10% other clips much less.
On the RTC test set this was + 1% in overall datarate.
Change-Id: Ia5144bbf45270671e7be9c8e4055369909e2f738
This gets more accurate mode hit stats. It's also the first step to
handling ZEROMV not being allowed more intelligently.
Change-Id: I5de6734507b5177bf73e9ddbad923f218c39f3e4
intra_y_mode_mask is already enforced for the sub8x8 case.
intra_uv_mode_mask is already enforced for all sizes.
Change-Id: Ia9dd14701cb49873c2e8f24eb5f8b255eaf76a1f
The function has evolved over time, now only calls vp9_rtcd(), so this
commit removes the function and changes to call vp9_rtcd() directly.
Change-Id: I8cfa6190daa4b28f6f3d1e11bb3a07f9c95322bf
Optimizing 2 functions to process 32 elements in parallel instead of 16:
1. vp9_sub_pixel_avg_variance64x64
2. vp9_sub_pixel_avg_variance32x32
both of those function were calling vp9_sub_pixel_avg_variance16xh_ssse3
instead of calling that function, it calls vp9_sub_pixel_avg_variance32xh_avx2
that is written in avx2 and process 32 elements in parallel.
This Optimization gave 80% function level gain and 2% user level gain
Change-Id: Iea694654e1b7612dc6ed11e2626208c2179502c8
ne_read_block/ne_find_cue_position_for_track/nestegg_get_cue_point
in the use of ne_map_track_number_to_index
+ add a check to ensure it doesn't exceed the type bounds
fixes:
./third_party/nestegg/src/nestegg.c|1322| warning C4244: 'function' :
conversion from 'uint64_t' to 'unsigned int', possible loss of data
Change-Id: I3703d739dcf9a2d4d8e2b704e957e5e3fd80dca0
different_ref_found is always equal to one (if calculated) because
ref_frame[0] != ref_frame[1] for each mi-block.
Change-Id: Ibd7625b7b29dec2fd3c40edbc3de1169abb78585
2. Add read/write for RC stats file
The two pass RC for svc does not work yet. This is just the first
step. We need further development to make it working.
Change-Id: I8ef0e177dff0b5ed3c97a916beea5123717cc6f2
Adds a speed 8 to VP9 where only the nearestmv (0 mv) is searched.
This seems to be about the same speed as vp8 speed 5.
Adds a new speed feature to disable inter modes based on a mask for
each blocksize.
Adds code for having lower complexity motion search methods
in nonrd pick mode function, even though speed 7 still uses DIAMOND
search for now.
Also uses HEX search for speed 6 rather than FAST_HEX which improves
psnr by 0.56% without any noticeable speed drop (tested on gipsmotion).
Change-Id: Ic13176572dbd3aed5884a26786940a4b1bbd8a75
For blocks at frame boundary, the selected block size sometimes needs
to be smaller than that was first given. This commit forces such block
size change only between square blocks, so as to avoid the potential
use case containing 32x16 + 16x8 + 16x8, for 1080p sequences.
Local test suggested no visible coding speed difference. Borg test
reveals no difference in terms of compression performance.
Change-Id: Ie8de87f3c6febc3acf11b4cbfdf2077f9f6def52
This commit checks if the motion vector associated with the current
mode has been computed in previous mode tests. If possible, skip the
redundant reference block generation and SAD calculation in the
non-RD mode decision process.
For test sequence pedestrian_area 1080p, the runtime goes from
24261 ms to 23770 ms. This does not change compression performance.
The speed-up is mostly around places with consistent motion.
Change-Id: I97be63c6a2d07c57be26b3c600fbda3803adddda
previously the scale functions would always be include regardless of the
CONFIG_SPATIAL_RESAMPLING setting.
Change-Id: Ifbccf47b20689b5dd61bb3ddccd5c013297b4e05
Added fast HEX search while doing non-rd partition picking to
speed up the encoder.
Borg test(speed 7) on rtc set showed 1.8% overall PSNR loss.
Encoder speedup was 5% - 15% for different rtc clips.
Change-Id: I9c83026eabc70b69fcc747c90369ec60bfa3ca24
In handle_inter_mode, the reference frames are set in refs buffer.
One can use refs buffer directly to avoid redundant fetch.
Change-Id: I811d408cae52dcd5e053dd4bfe69550eb6a2ff56
Instead of using source variance, this patch uses variance of the
frame difference between the source and the current frame to make
fixed size partition decisions. Also disables adjusting partitioning
if variance based or fixed size partitioning is used.
The latter change improves the speed substantially for speed 6, so
that speed 7 is now less than 3x the speed of speed 6. But speed
6 is 48% better in psnr on the rtc set compared to speed 7.
As compared to speed 5,
speed 6 is -37% in psnr at about 2.5x the speed,
speed 7 is -55% in psnr at about 7x the speed.
Change-Id: If61d80431d3e04ed304ac05832e773cdb2c0a578
Begin() will be called twice with 2-pass encodes, invoking
y4m_input_open which allocates memory; close the old instance first.
Change-Id: Id252a21d286ca9ae998bd87599d43aeb8d7d77aa
FwdTrans8x8HT is disabled as the tests currently fail.
note not all functions have NEON implementations:
- fdct8x8/fht8x8
Change-Id: I028bdec9a21eaaee2c5865470ab179aac403540e
Trans4x4HT is disabled as the tests currently fail.
note not all functions have NEON implementations:
- fdct4x4/fht4x4
Change-Id: I26f8724bf2a9ea01d59205a1c57119ed25d043bc
There was a bug in the previous code that if GOLDEN was better than
LAST neither would be used. LAST would get turned off due to superior
GOLDEN quality then all GOLDEN modes would get skipped.
Change-Id: I173f3720451707dab7b2cbbe8b8e6a047089bde7
Removing all copies of identical vp8_mse2psnr/vp9_mse2psnr functions.
Using vpx_sse_to_psnr() instead in all places.
Change-Id: I15beef9834d43d8fc8a8a7a2d1fc5de3d658fed8
Usage of encode_b_args is unnecessary because encode_block_pass1() doesn't
use them. That's why optimize_init_b() call is also not required.
Change-Id: Ib6cfe4916c2ca85749c90bb0adcba6fea592f9ac
As Yunqing suggested, this commit makes non-RD mode decision always run
sub-pixel motion search in NEWMV mode. The compression performance
gains becomes fairly significant after we enabled sub-pixel accuracy
motion compensated prediction to calculate SAD cost.
For test sequences pedestrian_area at 1080p and vidyo1 at 720p, the
runtime goes slower by 5%. For rtc test set, the compression performance
is improved by 21.20%.
Change-Id: I38cbfdd5c53d79423e1fafb3154f8ddeed63bbf0
The commit change to use partitions sizes directly from last frame
for frames directly where last frame selects partitions sizes based
on coding efficiency.
On --rt --cpu-used=-5, the change hurts compression by 4% but reduces
encoding time by ~20%
Change-Id: Ia68665e5c8489b7bfcf5fac7768332fba88928e6
Modify existing test to also check the case of dropping
(i.e., skip decoding) a consecutive list of frames.
Change-Id: Ia8c1195559f952e86e6697996931d3a920c05ae3
The only difference between two examples was a setting of
g_error_resilient flag in error-resilient example. Moving this
functionality into simple_encoder with additional command line option.
Change-Id: I0245793320125926e1bf208cc1e87aef39ca478d
This commit builds the actual prediction block in sub-pixel accuracy
and uses which to calculate SAD for non-RD mode decision. In the trail
run on pedestrian_area at 1080p, rtc speed -7 runtime goes from
23495 ms -> 25107 ms (7% slower). The compression performance is
improved by 20.57% for rtc test set.
Change-Id: I438589cd103fe99f1b50c2d1939ac6ca43fa0157
Use a set of dedicated variables to buffer the current best mode
in non-RD mode decision. This allows to use mode_info for more
complicated test in the non-RD process.
Change-Id: I6024c9feb0662afd3eb29f7017f7b5a5446f303f
Adds a method for determining a fixed size partition based on
variance of a 64x64 SB. This method is added to rtc speed 6.
Also fixes a bug in rtc_use_partition() and includes some
refactoring related to partitioning search, and some cosmetics.
Currently compared to speed 5, the coding efficiency of speed 6
is -19% and that of speed 7 is -55%, in cbr mode.
Change-Id: I057e04125a8b765906bb7d4bf7a36d1e575de7c6
The optimizer did something funny with the code around
line 1412. Before the call to encode_sb split_dist was
set properly but after it was adjusted and converted to
a negative.
https://code.google.com/p/webm/issues/detail?id=714
Change-Id: I9a7631d5325ade2dc28c1030653a23eecec8721b
Change dx_time data type to int64_t to prevent
test time overflow when decoding long video.
Change-Id: I3dd5e324a246843e07e635fd25c50e71e385ed70
Signed-off-by: James Yu <james.yu@linaro.org>
If sf->disable_split_mask is DISABLE_ALL_SPLIT, disable
sf->adaptive_pred_interp_filter to avoid unnecessary operations.
Change-Id: Icb59174b2f4e9a3c3c16a696deb8018e5bd999eb
Moves the existing speed 6 to speed 7 and adds an
intermediate level 6 which is roughly in between
speeds 6 and 7 in both speed and coding efficiency.
Also includes some minor fixes/adjustments.
Change-Id: I98befc4d82d750e79fe426c457c4a2571f6b6cc7
If show_existing_frame indicates that the decoder should
display an existing (previously decoded) frame, add a
check to make sure that the signaled buffer does contain
a valid decoded frame.
Change-Id: Iac8c686b321827414d69a3f2d0467566911bcba2
for ABSDATA mode, so segment loop filter level always fall in valid
range for both Absolute and delta modes.
Change-Id: If90df3411479533dbdab63f8ae088d2f5dd174a9
The qindex for a segment was not clamped in ABSDATA mode, which may
cause invalid memory access if an ill-formed stream has a negative
value in ABSDATA mode. This commit added clamp to make sure qindex
for a segment always fall into valid range.
Change-Id: I0a74d00f4ef40aec7edaeca1d03c8645e23ab08c
Skip coefficient cost update in non-RD mode decision setting. Allow
periodical mode and motion vector cost update. Currently every other
8 frames. The increment runtime is a constant number. Hence more
visible for CIF resolution, while negligible for 1080p.
Speed -6 compression performance for rtc set is improved by 4.5%.
Change-Id: I27e0ad7c521fcc2af1d825582cbdd1a27ac4c323
This commit makes a refactoring of the rtc_use_partition. It allows
the encoder to take a preferred block size for non-RD mode decision.
The boundary blocks are handled such that smaller block sizes that
fit in the boundary size will be used instread.
In rtc mode, the coding performance of speed -6 for pedestrian_1080p
goes from
158980 b/f, 38.934 dB, 22721 ms to
159008 b/f, 40.064 dB, 23721 ms.
For rtc set, the speed -6 compression performance is improved by
26%. Still about 2dB behind speed -5 at this point.
Change-Id: If0944f0880eaf1ad340bc325d97cea8d0f9dd53f
- Update the vcxproj generator to pass the path to the batch file.
- Update the batch file the take the path to obj_int_extract.exe as arg
2.
Fixes this warning:
warning MSB8012: TargetPath does not match Linker's OutputFile property
value.
Change-Id: I5825f1d1d79f370aeb295bbd2aeb08b22c0e73ab
* Reduce the number of short cirtcuit checks by pre-computing and combining like checks.
* Postpone non-trivial initializations until after the shortcircuits are evaluated.
* Add some consts and const pointers.
No change to the actual results of the call or output of the encoder.
Change-Id: Ie44c4702aec6e08cfe0b8b0ba3cd6b57206478d1
This commit enables the use of DC, vertical, and horizontal intra
prediction mode in rtc non-RD mode decision. When the best cost value
of inter modes is above a given threshold, the encoder runs the
above three intra modes and selects the one that has minimum
prediction residual in terms of SAD.
This together with recent changes on non-RD mode decision and coding
control improves compression performance of speed -6 by
derf 91%
yt 61%
hd 46%
stdhd 52%
In terms of encoding speed, it is about 3 times faster than speed -5.
Change-Id: I6b483bfd0307e6482bb22a6676ae4e25a52b1310
When non-RD coding decision is used in rtc mode, the alt reference
is not used for inter frame prediction. This commit disabled alt ref
option whenever speed -6 is used.
Change-Id: I0b33ca03661de1db2d9bef1bcbff848cd4c9396f
Set TargetName for library builds instead of changing the value of
OutputFile.
This fixes the following warnings:
warning MSB8012: TargetPath does not match Library's OutputFile property
value.
Change-Id: I4320b6d9ea922d3a15b9823c7c6694ee33edbf45
Current setting was specific to 1 layer case.
rc_target_bitrate is total bitrate for whole stream,
so set it to ts_target_bitrate for highest/top temporal layer.
Change-Id: I83de73364956fa21c0a7c971c9f390d4840457e6
Use unsigned int instead of uint64_t for duration and deadline
arguments to functions get_frame_stats() and encode_frame().
Change-Id: I1f26a7afc38ae89916b2c67415ced26fdc9d53e7
In the first coding run of a 64x64 block, check the coding mode
for each 8x8 block. Will need a second annealing stage to decide
the partition size to be encoded.
Change-Id: Ida9417805ff3358979b0c0429d4099c023c88866
In good quality mode motion search, the best matches are normally
found after searching in a large area. In real time mode, to make
encoding fast, a center-biased fast HEX search is used, which
converges quickly most of the time. A 4-point diamond search is
also carried out as the following refining search, which gives more
precise results, and maintains good motion search quality.
At speed 5, the borg test on rtc set showed an overall PSNR loss of
0.936%. The encoding speed gain is 4% - 5%.
Change-Id: I42cd68bb56a09ca1b86293c99d5f7312225ca7ae
Run sub-pixel motion search when NEWMV gives lower rate-distortion
cost. This improves coding performance of derf set by 8%, std-hd by
2.2%.
Change-Id: Ife50f7fda8463927784fe59a41cc439c833e941a
- Rename and make static
s/vp9_compute_qdelta_by_rate/compute_qdelta_by_rate/
- Make base_q_index an integer.
- Add a cast.
Change-Id: Iea8d1397fd2717e7373b182ec51f5db960ef2cca
If MINGW_HAS_SECURE_API is defined, we don't need to declare strtok_s, but we still need strtok_r define.
Change-Id: I7cf781bb58f991a2bdce6a2ccf5082f6924579a3
these were incorrectly stripped in:
50fa585 Removing examples code generation and making them static.
Change-Id: Idb475ad5b303634311e9f616604312cb925cc6a9
Optimizing 2 functions to process 32 elements in parallel instead of 16:
1. vp9_sub_pixel_variance64x64
2. vp9_sub_pixel_variance32x32
both of those function were calling vp9_sub_pixel_variance16xh_ssse3
instead of calling that function, it calls vp9_sub_pixel_variance32xh_avx2
that is written in avx2 and process 32 elements in parallel.
This Optimization gave 70% function level gain and 2% user level gain
Change-Id: I4f5cb386b346ff6c878a094e1c3b37e418e50bde
Optimizing all SSSE3 assembly for convolution:
1. vp9_filter_block1d4_h8_sse2
2. vp9_filter_block1d8_h8_sse2
3. vp9_filter_block1d16_h8_sse2
4. vp9_filter_block1d4_v8_sse2
5. vp9_filter_block1d8_v8_sse2
6. vp9_filter_block1d16_v8_sse2
my optimization include:
-processing 2x8 elements in one 128 bit register instead of processing
8 elements in one 128 bit register.
-removing unecessary loads.
This optimization gives between 2.4% user level gain for 480p input
and 1.6% user level gain for 720p.
This Optimization is done only for 64 bit
Change-Id: Ic07fce2f9360329b4f2d956efda1480ae958766b
the remainder of the documentation will not be included in the output
unless the file itself is documented
Change-Id: I5a83a6c41cdfbf2976da288e4b70bd04002725f2
Only use layered average size if number_temporal_layers > 1.
Also removed unneeded commented-out line, and change some parameter
setting in vpx_temporal_scalable_patterns.c
Change-Id: Ic86e43e7daf0313e8c5a4aba1497299158111955
Added support for external frame buffers to libvpx's VP9 decoder.
If the external frame buffer functions are set then libvpx will
call the get function whenever it needs a new frame buffer to
decode a frame into. And it will call the release function
whenever there are no more references to that buffer.
Change-Id: Id2934d005f606af6e052fb6db0d5b7c02f567522
Prior to this commit, both encoder and decoder reset mode/mv info from
previous frame in error resilient mode to ensure bitstreams are able to
decode when there is loss of frame in decoder side. However, this is
not necessary. This commit changed to remove the reset, so encoder can
continue to use mode/mv/partition information from previously encoded
frame without affecting decodeablilty under loss of frame.
Change-Id: I0279f862900dc647fb471ae3389770bb1b9f454f
Silence signed/unsigned mismatch warnings by adding casts where
ts_number_layers does not match the sign of the variable to which
it is being compared.
Change-Id: Iab25e18c877d158b2b2b417de7da94669648b2fa
In rtc coding mode, the encoder is running non-RD mode decision. It
does not need dual buffer swap as was the case in the RD mode. This
commit initializes the internal buffer pointers outside the block
coding loop for rtc mode.
Change-Id: Ie076705c60d6b7919217e3f1dfd49e7db5064ac2
The functionalities of set_offsets() are subsumed in later
set_partitioning() and rtc_use_partition() functions, hence removed.
Change-Id: Ie514b13cb66c2379f13d0be9b1da4c12ca4581e5
Flipping arf on and off too often is hurting some clips.
This change makes no difference for 50-75% of our test
clips but helps some by a big margin. (eg. std-hd crew
by 6% and one of the YT and YT-hd clips by 14%)
Average improvements for 2 pass, speed 2 (psnr,ssim)
are as follows:-
derf 0.165%, 0.210%
yt 1.210%, 1.464%
yt-hd 1.189%, 1.471%
std-hd 1.031%, 0.886%
Change-Id: I121fe66cfb4a62d384b23b484a7d648789641969
Two convolve functions were optimized for AVX2:
1. vp9_filter_block1d16_h8
2. vp9_filter_block1d16_v8
vp9_filter_block1d16_v8 was optimized for AVX2 by reducing the number of
loop strides by half, two strides were processed in parallel.
vp9_filter_block1d16_v8 was also optimized in the same way also some of the
loads were being done outside of the loop and by that preventing redundant
loads.
This Optimization gives 43% function level gain and 1.3% user level gain.
Now can be compiled in Windows
Change-Id: I2714124cfb0c14a77d7a0ce126a20db92ffbf92c
Use size_t for DecodeFrame()'s size arg, and cast only
at the vpx_codec_decode() call site. This silences warnings that
appear in svc_test.cc when building with vs2013.
Change-Id: I2cf39f02a45732c752097f07b0c7ad414b1517d8
This function initializes the predictor buffer pointers and
calculates reference motion vectors. It is only needed in the settings
of inter frame coding. Hence removing it from the key frame coding
branch in rtc_use_partition.
Change-Id: Ic4e16c7467a5f32be4e0bf619ef9d57afb4a7075
Turns on AVX when the final characters of .c and .cc file names preceding the
.c and .cc file extension contain the substrings avx or avx2. This silences
many MSVC warnings issued during compilation files that use AVX.
Change-Id: I82bda394af7a688679abab2a50dd7e10b3cb0c7a
This function is deprecated after the re-design of partition search
that runs big block size, then four-way split, followed by
rectangular block sizes. This commit removes the related functions.
Change-Id: I417549c8e0fa3cf35bd29816b805dd4e7c3660c6
The function rd_pick_reference_frame can be deprecated. Its use was
subsumed by the adaptive motion search control.
Change-Id: Icb0c2fa335f0f06fa7b79a71f972d9fa54d750db
Removes certain cases of feedback of active_worst_quality,
and removes it from the RATE_CONTROL structure. Now active
worst quality is expected to be computed locally in the
q picking function during the encode.
Making temporal filter strength depend on avg_frame_qindex
rather than on active_worst_quality actually improves
performance esp. for yt.
derf: +0.038%
yt: +0.359%
Change-Id: I1fe5a343034b55af9322289165321f00ac0827b1
In real time encoding, we enable encode_breakout to make encoding
fast. A speed feature "use_encode_breakout" is defined to set
encode_breakout thresholds for different speeds.
However, currently, static_thresh is an encoder option. The encode_
breakout can be turned off if user sets static_thresh=0 specifically.
The rtc set borg test result: (need to set --static_thresh=1)
speed -5, psnr loss -3.543%;
speed -4, psnr loss -2.358%;
speed -3, psnr loss -0.771%.
Encoding speed test:
speed -5, 11% - 60% speedup;
speed -4, 5.5% - 28% speedup;
speed -3, 0.8% - 7% speedup.
Change-Id: Icde592ffbe77eac7446f872a2e9eb2051733677b
Aq 1 only updates segment map on kf and arf and
only uses 3 segments. With these settings AQ1 is
+ for most clips in SSIM but negative in psnr.
However, the penalty in PSNR is much less than
previously.
Old version aq1 average results for std hd
-20.899% psnr, -5.809% SSIM
New version aq1 for std hd
-3.57% psnr, +1.23% SSIM
Aq2 Now uses only 2 segments and rd.
This mode is still slightly negative for most clips on
psnr and SSIM but seems to have a much bigger visual
impact on several problem clips than aq mode 1.
Old results for std hd:
-2.578% psnr, -1.151% SSIM
New results for std hd:
-1.561% psnr, -0.85% SSIM
Change-Id: I94f57f8a73121629ce598fb921aad761c1450e1c
Some parameter changes and fixes on one-pass rate control.
derfraw300 is now only 10% below 2-pass speed 0 rate control.
Change-Id: I1940eef8a5a035dc18e71b880d5e00cabd1f01b9
This CL changes libvpx to call a function when a frame buffer
is needed for decode. Libvpx will call a release callback when
no other frames reference the frame buffer. This CL adds a
default implementation of the frame buffer callbacks. Currently
only VP9 is supported. A future CL will add support for
applications to supply their own frame buffer callbacks.
Change-Id: I1405a320118f1cdd95f80c670d52b085a62cb10d
Function encode_rtc_frame_internal() and encode_frame_internal() only
differed by a couple of speed features, this commit relocation those
difference into the setup of speed features and merged two functions
into one to remove duplication.
It also fixed a subtle bug super_fast_rtc was used before it was
initialized.
Change-Id: I234a5a1d11a4450930e5b4943dbab434208d5030
-Properly set the average frame size for each layer.
-Allow each layer to update its average/last Q stats after encoding.
-Initialize for some layer context variables.
Change-Id: Iaa37d144fcf4f30ff4283a4e8db8b9ca8bf4c815
A bug was reported in Issue 702: "SIGILL (Illegal instruction) when
transcoding with vp9 - using FFmpeg". It was reproduced and fixed.
Change-Id: Ie32c149a89af02856084aeaf289e848a905c7700
Bitwise OR operation doesn't guarantee any subexpression evaluation order.
Just reading one bit now and ignoring the next one. For reference look at
vp9_decode_frame() implementation.
Change-Id: I4971686929838ae5ded8f43a38a2934db5e1d462
Fixes some of the parameters for 1-pass non-cbr mode.
Also includes some cleanups, inlcuding refactoring of the
recode_loop options.
Results on derfraw300 improve by about 5-6%, so that the one-pass
mode is now 13% below the 2-pass mode in speed 0.
Change-Id: I844cc2638694c7574f3be00d41d60b23dc1016f0
this ensures both are properly initialized when calling _dealloc().
+ check the arrays before access
Change-Id: I789af39b41c271b5cb3c029526581b4d9903b895
This patch adds a buffer-based rate control for temporal layers,
under CBR mode.
Added vpx_temporal_scalable_patters.c encoder for testing temporal
layers, for both vp9 and vp8 (replaces the old vp8_scalable_patterns).
Updated datarate unittest with tests for temporal layer rate-targeting.
Change-Id: I8900a854288b9354d9c697cfeb0243a9fd6790b1
Update the local makefile to build all the files and the test
application by default to simplify build verification.
Change-Id: Ic10141ea14c85110ff7507447d16297b77d296e9
Right now only IVF format is supported which is enough for example code.
Other formats like y4m, webm, raw yuv will be supported later.
Change-Id: I34c6f20731c1851947587ca5c589d7856b675164
since
50fa585 Removing examples code generation and making them static.
the examples have been c files, not generated from text. this removes
GEN_EXAMPLES and replaces it with EXAMPLES, building the source directly
rather than copying it to the build folder
Change-Id: I5445bc49553419e3d2430963517d2c18cdba1f82
Inlcudes a number cleanups:
1. Moves the one-pass pre-encode parameter setting functions
to vp9_ratectrl.c
2. Deprecates per_frame_bandwidth in RATE_CONTROL structure
3. Removes target_bandwidth in cpi structure since it is not used.
4. Various renaming of functions
There is no bit-stream change in 2-pass, one-pass cbr and one-pass
vbr modes.
Change-Id: Ifd9916bf4d485b7d04c5f52044ffe6703254ccbd
This isn't strictly necessary, but makes the file more consistent
with the other arm assembly source files.
Change-Id: I245c9677d89e0ab3f31991e473764858af35b180
Avoid substitution of substrings by using \b to make sure the
substituted strings are at word boundaries.
This is an adaption of the corresponding changes to ads2gas.pl
from 7ebcaeb0fa.
Change-Id: I52160e8ba0373d4779d5fc3b0c384ca5c51c7b13
remove example files that have been tracked since:
50fa585 Removing examples code generation and making them static.
Change-Id: I9dd2e1588003918286d455c5e58a43393b176a84
older versions of visual studio did not include the trailing \. this
moves the objects to their intended location: the project subdirectory
Change-Id: I244479cdebf6b3f03bed6dbfca82e7fb4542f0de
CONFIG_USE_X86INC is available to every makefile, there's no need to
duplicate its value with USE_X86INC
Change-Id: Id12bd5f09cba78abba56ab5a8f56351562e5b8b6
avoid wrapping msvc includes with extern "C"; this breaks some visual
studio builds of the (c++) tests.
Change-Id: Ie8062d55d4f4c049f6cd360a36da6a67607df132
git diff adds the following line to diffs:
\ No newline at end of file
which interferes with diff.py parsing. diff.py only looks for '+', '-'
and ' ' at the beginning of the line.
Issue seen on https://gerrit.chromium.org/gerrit/68611
Change-Id: I0d7b4485c470e0b409f2c9cddde6c9aceba0152e
Fixes rate control partially in one-pass non-cbr case to achieve a
bitrate close to the one desired. Previous version was way off at
the high bitrate end.
Also includes several one-pass rate control cleanups and refactoring.
On derfraw300, one-pass encoding is now 19% off from two-pass speed
0 encoding, down from 35%.
Change-Id: I6f0dcdb7f8aa85a7e7cd3a3155d4f9d2a4d2f4f4
This patch added ssse3 optimization of bilinear sub-pixel filters.
The real time encoder was speeded up by ~1%.
Change-Id: Ie82e98976f411183cb8c61ab8d2ba0276e55a338
Moved a few features with low impact on compression form -5 to -4 and
increased adaptive_rd_thresh for -5.
Change-Id: Ib1b748168cc6ed7684ae4818499f3a536ae76253
The new implementation disagrees when the argument is equal to 2**n but
that is never called in practice and based on how it is used the new
implementation is correct in that case.
Change-Id: Ifbac4ad87d459fe6bd2fd0f400c0340f96617342
This avoids calls to get_unsigned_bits() with constants and
replaces hard to trace loops with simpler structures.
Change-Id: Ic1afc5a17d7df5bcfc85b76efda316b0bf118467
Using bilinear filters could speed up the codec in real-time mode.
This patch added sse2 optimizations of bilinear filters that
operate on different-sized blocks.
Tests showed that the real-time encoder was speeded up by 3%.
Change-Id: If99a7ee4385fcc225c3ee7445d962d5752e57c3f
This patch adds a buffer-based rate control for temporal layers,
under CBR mode.
Added vpx_temporal_scalable_patters.c encoder for testing temporal
layers, for both vp9 and vp8 (replaces the old vp8_scalable_patterns).
Updated datarate unittest with tests for temporal layer rate-targeting.
Change-Id: I9cb6cce2494390ae6096ee17774af7fb9308bde7
the public typedef already includes a const, quiets
'same type qualifier used more than once' warnings
Change-Id: Ib118b3b116fba59d4c6ead84d85b26e5d3ed363d
As pointed out by Dmitry and James, "partial" is a Microsoft-
specific c++ keyword, and it is renamed.
Change-Id: Ia0fc11ceb89e54b3195287f89f7e26edbbe9beb8
This commit added a logic to prevent the inter_filter type from being
changed if the default interp_filter mode is not switchable. Also, it
sets the default interp_filter to BILINEAR at very and super fast rtc
encoding modes
Change-Id: Ic41e6d31de29795a4ce536ec79afb01cab6daad3
--rt --cpu-used=-5 uses the progressive rtc mode
--rt --cpu-used=-6 uses the new super fast rtc mode
Change-Id: Id6469ca996100cdf794a0e42d76430161f22f976
Implemented parallel loopfiltering, which uses existing tile-
decoding threads. Each thread works on one row, and when that row
is loopfiltered, it moves to next unattended row. To ensure the
correct filtering order, threads are synchronized and one
superblock is filtered only if the superblocks it depends on are
filtered already.
To reduce synchronization overhead and speed up the decoder, we use
nsync > 1 for high resolution.
Performance tests:
1. on desktop:
8-tile 4k video using 8 threads, speedup: 70% - 80%
4-tile HD video using 4 threads, speedup: ~35%
2. on mobile device(Nexus 7):
4-tile 1080p video using 4 threads, speedup: 18% - 25%
4-tile 1080p video using 2 threads, speedup: 10% - 15%
Change-Id: If54b4a11960dd706c22d5ad145ad94156031f36a
* Avoid unnecessary type erasure
* Prune unused/duplicate fields from struct rdcost_block_args
* Make struct rdcost_block_args a local
Change-Id: I4f1fd4837ccd028bbfe727191ee8d69f0463b7e5
When showing a previously decoded frame, i.e. when
show_existing_frame=1, the update of the
last_show_frame flag must be disabled.
This is to ensure that the last_show_frame flag
reflects the state of the flag for the immediately
previously decoded frame rather then the value that
was forced to ensure that a previously decoded frame
would be displayed.
This patch also adds a test vector to verify that the
display_existing_frame flag works as expected. Code
for generating the test vector can be found in this
patch:
https://gerrit.chromium.org/gerrit/#/c/68581/
(Bug originally reported by Alexander Voronov
<ru.xalba@gmail.com>).
Change-Id: I731d288fba02088959f7fcc87707137fffc6acf5
Added a constant to represent the minimum KF boost
rather than using the magic number 2000 in the code.
Change-Id: I9428b61f47d26312caff81c6f9ae8587df004791
Some changes in 1ca1186 were mistakenly reverted by a later merge,
this commit re-instated the chanages from values to enums.
Change-Id: Ia6b01c31da584a1f612996e6432612c1295b9eaf
obj_int_extract.bat
this project and target still need some work to allow for concurrent
builds to succeed from the command line.
Change-Id: Ieb3bddc54636e77519083c48573909616257eb23
In this new mode, the size range is strictly determined by the min
and max partition size in neighborhood blocks.
Niklas720 encoding time at cpu-used -5 goes from 56250ms to 50676ms,
a 10% reduction.
Change-Id: I316b0e2ac967ff3fad57b28d69c0ec80b7d8b34e
Includes a few fixes and clean-ups that adds the ability
to use alt-ref frames in one-pass mode.
Whether alt-refs are actually used or not is controlled by a
macro USE_ALTREF_FOR_ONE_PASS in vp9_firstpass.c.
This first cut seems to improve derf by 15+% in 1-pass mode.
But further experiments with parameters are underway.
Change-Id: I78254421435478003367c788c7930d2dc4ee2816
This patch only works if the video is a width and height that are both
a multiple of 32.. It sets every partition to 16x16, and does INTRADC
only on the first frame and ZEROMV on every other frame. It always does
does the largest possible transform, and loop filter level is set to 4.
Was ~20% faster than speed -5 of vp8
Now 20% slower but adds motion search ( every block ), nearest, near
and zeromv
The SVC test was changed because - while this realtime mode produces
bad quality albeit quickly, it isn't obeying all the rules it should
about which frames are available.
Change-Id: I235c0b22573957986d41497dfb84568ec1dec8c7
Adds a stand-alone resize_util app for testing. The app
will not be built in the shared library configurations
so as not to require the APIs to be exposed.
Change-Id: I4718c8bff1abf4e57c2ab2d84be8738fc0048200
Adds multiple filters in the 0.5-1.0 range in the last stage
of the resize functions to prevent over-smoothing/aliasing
Change-Id: I1a615adb16f0df5095790945c94b28b4d6a6fc48
SSE for a 64x64 block with 3 planes can go as high as 3*2^28. So left
shift by 4 may overflow 32 bit int.
Change-Id: I63c84aa56894788bb987299badabbd7cc6fd0be6
The sum of squared mv components can go beyond int range for large
input resolution. This commit changed the type to int64 to avoid
overflow.
Change-Id: Ib21ea2817845cea1435f893064e6417c79c5bc64
New API is supposed to be used from example code. Current implementation
only supports IVF containers (will be extended to Y4M).
Change-Id: Ib7da87237690b1a28297bdf03bc41c6836a84b7e
Adds an arbitrary-size resize library for use in scaling of input
frames in a non-normative manner in the vp9 encoder. The method
used is as follows:
Downsampling - Uses a 8 tap filter for factor of 2 decimation upto
a size just higher than the desired size. Then interpolates pixels
at a precision of 1/32 pel using a set of 8-tap filters.
Upsampling - Interpolates pixels at a precision of 1/32 pel using
a set of 8-tap filters.
There is no assembly optimization yet.
Change-Id: Ib5b81e174fc139da322bb97c8214d52289d60d8a
Encoder's boarder is still 160, while decoder's boarder will be 32.
With on demand and separate boarder buffer for boarder extension.
The decoder's boarder does not need to to 160 anymore.
Change-Id: I93d5aaff15a33a2213e9761eaa37c5f2870747db
This commit explicitly enforces the effective motion vector range
in the motion search stage. The range needs to be the intersection
of UMV border, effective absolute motion vector value range, and
the target search area.
Change-Id: I1cf7c563e02b1086040dad6c1f4f6be1538635a6
When showing a previously decoded frame, we need to
explicitly set the show_frame flag.
For the current frame being decoded this flag is
explicitly set in the frame header.
This should fix WebM Issue 696:
http://code.google.com/p/webm/issues/detail?id=696
Change-Id: I5751a809813f88d2ca6f62c47c3878475ff9ba8d
The affect on quality was minimal. Less than .1%, various sets
yt ( +.15%), derf (-.1%), hd ( -.1% ), std hd(-.15%)...
The affect on speed of encode at speed -5 was substantial ( ~3% ).
Change-Id: I8903346fbae0c35f5b9ea20f81fdd239ae81247d
This commit deprecates the use of best_mv from encoding and bit-stream
writing stages. It hence removes the definition from MACROBLOCKD.
Change-Id: I8e5302775a2aa4a18900726df407bff881f2dfb1
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.