Migrates experiments from experimental to playground.
Included experiments are FILTERINTRA, MASKED_INTERINTER, INTERINTRA
and MASKED_INTERINTRA. Bugs in masked sub-pixel variance calculation
and masked sub-pixel motion search are fixed. Recursive filters for
intra prediction are upgraded to 4-tap filters.
Change-Id: I9964a7ebfefc1efa70bb66be7b35d975c3f66e23
This commit makes the prediction filter rate-distortion cost update
process consistent for all block sizes.
Change-Id: I5874349ab38df380240f96c2d4ef924072bab68d
From frame 2, the lpf deltas are all cleared for even frames, and
a set of values is set and used for odd frames. The intention is to
exercise decoding code around lpf delta/update decoding.
Change-Id: Ic9ff1bc2c2a023f4805852f8573398f2ec2249d7
Guard against incorrect size values moving *data past data_end.
Check read length against the difference of the buffers.
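A minimal sketch of this kind of guard, for illustration only. The
helper names read_is_valid() and advance_checked() are hypothetical and
the actual decoder code may differ:

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical helper: returns 1 when `len` bytes can be read starting at
 * `start` without running past `end`. Comparing against the remaining
 * distance avoids computing `start + len`, which could itself overflow. */
static int read_is_valid(const uint8_t *start, size_t len,
                         const uint8_t *end) {
  assert(start <= end);
  return len != 0 && len <= (size_t)(end - start);
}

/* Advances *data by `len` only when the read stays inside the buffer. */
static int advance_checked(const uint8_t **data, size_t len,
                           const uint8_t *data_end) {
  if (!read_is_valid(*data, len, data_end)) return 0; /* corrupt size */
  *data += len;
  return 1;
}
```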
Change-Id: Ie0b54e2db517fd41a0f3ceb23402ee44839a4739
lf deltas are later set up in the function vp9_setup_past_independence(),
so this commit removed the redundant copy. Also renamed a function
to better reflect its behavior.
Change-Id: I5d28c2f5b12b3d31817e14296ed4605c1fd5c98c
The MV struct was used to indicate the position of an MI_BLOCK with row
and col components. That usage was confusing, so this commit added a
new structure "POSITION" with row and col components to better describe
the position of a mi_block.
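As a rough sketch of the idea (the field names follow the description
above; the helpers position_add() and position_is_inside() are
hypothetical and only illustrate how such a type reads in use):

```c
#include <assert.h>

/* Dedicated position type, so that a motion vector struct is no longer
 * reused to carry block coordinates. */
typedef struct {
  int row;
  int col;
} POSITION;

/* Hypothetical helper: offsets a block position by a neighbour delta. */
static POSITION position_add(POSITION base, POSITION delta) {
  POSITION out;
  out.row = base.row + delta.row;
  out.col = base.col + delta.col;
  return out;
}

/* Hypothetical helper: checks that a position lies inside a rows x cols
 * grid of mode-info units. */
static int position_is_inside(POSITION p, int rows, int cols) {
  assert(rows > 0 && cols > 0);
  return p.row >= 0 && p.row < rows && p.col >= 0 && p.col < cols;
}
```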
Change-Id: I59fdd4b45010fe7d85a8db22a55503265c4f5b2b
Various cleanups and refactoring.
Removes feedback of active worst quality and uses last_q
instead to make the interface cleaner. Active worst quality
is now decided only once, at the beginning of coding a frame,
based on last_q and other stats. Also adds other
cleanups on last_q so that it is also stored for altref frames,
and reduces the altref interval a little.
The output does change a little.
derfraw300: +0.224% (global psnr)
stdhdraw250: +0.442% (global psnr)
Change-Id: Ie634cdc032697044c472dd0fe79c109b3e7f9767
The added vector was encoded with aq mode on, with the intent to
exercise the decode code around segment feature.
Change-Id: Iedcb7261e87d3e11b25ecf031d3a69385271148e
Properly handle the rd_filter_cache update, when early termination
or skip prediction filter type check is triggered.
Change-Id: Ie7b9a75fed3358f45ffd15817f2b36670c14eb2d
Increased threshold(t) for interp filter search. This sped up the
encoder with some PSNR loss.
Borg tests were run at speed 2.
t = 100, PSNR loss:
-0.710%(derf); -0.561%(stdhd); -0.647%(youtube)
speedup:
9%(derf); 3%(stdhd); 5.7%(youtube)
t = 500, PSNR loss:
-1.687%(derf); -1.665%(stdhd); -1.664%(youtube)
speedup:
18%(derf); 10%(stdhd); 8%(youtube)
Change-Id: I180e3657c1e156aaa88dc7c437f8bcbd19f5caba
Corrected a typo that set rc_2pass_vbr_minsection_pct to
two different values on consecutive lines. Second line
should have set rc_2pass_vbr_maxsection_pct.
Change-Id: Ie07ac67cd5455afe556bef34da8127304db9c97c
This commit enables an adaptive prediction filter type selection
for sub8x8 block sizes. In speed 1, it re-uses the filter type of
collocated 8x8 block if it is tested in the rate-distortion optimization
loop, for the sub8x8 blocks. Otherwise, it runs the normal test
over all the three filter types. In speed 2, it re-uses the 8x8
block's prediction filter type, if available. Otherwise, force it
to be EIGHTTAP.
Compression and speed performance wise:
speed 1
derf -0.266%
yt -0.138%
bus at 2000 kbps: 33766ms -> 30451ms (10% speed-up)
football at 600 kbps: 48173ms -> 43786ms (9% speed-up)
speed 2
derf -0.026%
yt +0.134%
bus at 2000 kbps: 18973ms -> 17698ms (6% speed-up)
football at 600 kbps: 26748ms -> 25096ms (6% speed-up)
Change-Id: I77e097533b969fd3472147225fa79fc98095d342
Making overall logic more clear, moving "hacked" calculation of base filter
array pointer to get_filter_base() function.
Change-Id: Ibbd38a9f937e48d35bbbfef3ad933ab36664cccb
Trying to make encode_sb() more similar to write_modes_sb() and
decode_mode_sb() because essentially all branching logic should be the
same.
Change-Id: Ib7dec7b48fce29418142abad4d1dcfdb1c770735
This commit constrains the maximal motion search range for sub8x8
blocks to be [-1023, 1023], in the unit of full pixel.
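A minimal sketch of such a constraint, assuming a hypothetical
SearchLimits struct and a hypothetical SEARCH_RANGE_LIMIT constant set
to the value quoted above; the encoder's own names may differ:

```c
#include <assert.h>

#define SEARCH_RANGE_LIMIT 1023 /* full-pixel units, from the text above */

/* Hypothetical container for the motion search window. */
typedef struct {
  int row_min;
  int row_max;
  int col_min;
  int col_max;
} SearchLimits;

static int clamp_int(int v, int lo, int hi) {
  assert(lo <= hi);
  return v < lo ? lo : (v > hi ? hi : v);
}

/* Clamps every bound of the window to [-1023, 1023]. */
static void constrain_search_range(SearchLimits *l) {
  l->row_min = clamp_int(l->row_min, -SEARCH_RANGE_LIMIT, SEARCH_RANGE_LIMIT);
  l->row_max = clamp_int(l->row_max, -SEARCH_RANGE_LIMIT, SEARCH_RANGE_LIMIT);
  l->col_min = clamp_int(l->col_min, -SEARCH_RANGE_LIMIT, SEARCH_RANGE_LIMIT);
  l->col_max = clamp_int(l->col_max, -SEARCH_RANGE_LIMIT, SEARCH_RANGE_LIMIT);
}
```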
Change-Id: I955b60649364ab410f2453cafd46a496f2fcb43e
There were two problems with the format string in
the conditionally compiled print statement. It referred
to a variable that is no longer available and it used
incorrect format specifiers.
Change-Id: I315e22bea2691bb535a2e33f5ca206fc55287a37
Adds a hook that derived test classes can implement to be notified
before every call to decode a frame.
Change-Id: Iefa836459cf3e5d7df9ee27f8198daf82b1be088
Reorder the tiles based on size and their presumed complexity. This
minimizes the cases where the main thread is waiting on a worker to
complete.
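One way to picture the reordering, as a sketch only: it assumes the
compressed tile size is used as the proxy for complexity, and the
TileJob type and order_tiles() function are hypothetical.

```c
#include <assert.h>
#include <stddef.h>
#include <stdlib.h>

/* Hypothetical work item: tile index plus its compressed size in bytes. */
typedef struct {
  int index;
  size_t size;
} TileJob;

/* Larger tiles first; ties keep ascending tile index for determinism. */
static int compare_by_size_desc(const void *a, const void *b) {
  const TileJob *ta = (const TileJob *)a;
  const TileJob *tb = (const TileJob *)b;
  if (ta->size != tb->size) return ta->size < tb->size ? 1 : -1;
  return ta->index - tb->index;
}

/* Sorts the jobs so the presumably slowest tiles are dispatched first. */
static void order_tiles(TileJob *jobs, size_t count) {
  assert(jobs != NULL || count == 0);
  if (count > 1) qsort(jobs, count, sizeof(*jobs), compare_by_size_desc);
}
```

Dispatching the largest tiles first leaves the small ones for the end,
which tends to shorten the time any one thread spends idle.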
Change-Id: Ie80642c6a1d64ece884f41683d23a3708ab38e0c
When evaluating the partition split case, the wrong partition size was
used in calling partition_plane_context(). This commit changes it to use
the correct sub-partition size. The incorrect partition size was
causing an ASAN error in a unit test.
Change-Id: Iab695b764bc51cc61580075f2ae4001421132362
Clean up and simplification of both estimate_max_q
variants and only call once per clip/section.
This leads to a more constrained range of Q values
across a clip / section.
Average gains across all 4 test sets:-
PSNR ~0.5% SSIM ~0.3%
Change-Id: If77d5f7bb50939a464e117724f4da5b001c62d70
For VP9, lossless coding is enabled by passing 0 for both min_q and
max_q. This is a valid configuration, and should not trigger a warning.
Change-Id: Idd117579cd89cd14c0723b1d7e482067ac12b401
In lossless coding, distortion is always 0. Early exit based on this
metric was incorrect.
This CL also changed to use best_rd instead of distortion as the metric
for early exit, as requested by Jim.
Change-Id: I8ef3e407ac03b4abc3283b273f936a68fad5c2ab
Add a full range motion search for regular block sizes. This runs
exhaustive search within the given reference area. This commit further
optimizes the search process by combining the tests of 4 points into one
pipeline, which gives a 30% speed-up compared to testing each point
individually.
This full range search serves as a best possible motion search reference.
When replacing the diamond search with full range search, the speed 0
runtime of bus CIF at 2000 kbps goes from 153872ms to 623051ms. The
compression performance compared to speed 0 setting gains 0.585% for
derf set.
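For reference, a plain exhaustive search can be sketched as below. This
is an illustration under assumptions, not the encoder's code: it tests
one point at a time (it does not implement the 4-point pipeline), uses
SAD as the error metric, and assumes the reference frame is padded by at
least `range` pixels around the co-located block. All names are
hypothetical.

```c
#include <assert.h>
#include <limits.h>
#include <stdint.h>
#include <stdlib.h>

typedef struct {
  int row;      /* best vertical displacement, full pixels */
  int col;      /* best horizontal displacement, full pixels */
  unsigned sad; /* error at the best displacement */
} BestMatch;

/* Sum of absolute differences between a w x h source and reference block. */
static unsigned block_sad(const uint8_t *src, int src_stride,
                          const uint8_t *ref, int ref_stride, int w, int h) {
  unsigned sad = 0;
  int r, c;
  for (r = 0; r < h; ++r)
    for (c = 0; c < w; ++c)
      sad += (unsigned)abs(src[r * src_stride + c] - ref[r * ref_stride + c]);
  return sad;
}

/* Visits every displacement in [-range, range] x [-range, range].
 * `ref` points at the co-located block in the reference frame. */
static BestMatch full_range_search(const uint8_t *src, int src_stride,
                                   const uint8_t *ref, int ref_stride,
                                   int w, int h, int range) {
  BestMatch best = {0, 0, UINT_MAX};
  int r, c;
  assert(range >= 0 && w > 0 && h > 0);
  for (r = -range; r <= range; ++r) {
    for (c = -range; c <= range; ++c) {
      const unsigned sad = block_sad(src, src_stride,
                                     ref + r * ref_stride + c, ref_stride,
                                     w, h);
      if (sad < best.sad) {
        best.row = r;
        best.col = c;
        best.sad = sad;
      }
    }
  }
  return best;
}
```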
Change-Id: Ieef1225216b0b86b4ac4872fa7fb9e18bf2eabb3
Removed an adaptive rate correction factor that was having
a negative impact on quality in many clips. This factor
was influencing the Q range available to each frame
independently of the bits allocated to each.
Average results with DISABLE_RC_LONG_TERM_MEM.
derf +0.199, -0.059.
yt +3.957, +3.798
std hd +1.577, +2.140
yt hd +4.127, +4.513
Average results without DISABLE_RC_LONG_TERM_MEM
derf -0.628, -0.665
yt +3.432, +3.015
std hd -0.105, +0.153
yt hd +3.432, +3.015
Change-Id: I45bab6b606f49a442e7b27a6d631f3ffd843bbce
Includes various cleanups.
Streamlines the interfaces so that all rate control state
updates happen in the vp9_rc_postencode_update() function.
This will hopefully make it easier to support multiple
rate control schemes.
Removes some unnecessary code, which in rare cases can cause
a difference in the constrained quality mode output, but
other than that there is no bitstream change yet.
Change-Id: I3198cc37249932feea1e3691c0b2650e7b0c22fc
Removed calls to vp9_update_mode_info_border since
they immediately followed code that initialized the
entire buffer to 0.
Change-Id: Ife06794daa20439a0b607a83a87f88df59afac40
Both single frame and compound inter motion search run with luma
component only. Hence removing the block size mapping therein.
Change-Id: I217488e702432ae9fa0e95bf6f516ebb36b5c79b
The old code would start in a mixed state, where all the reference
frames were pointing to frame buffer 0, but the reference counts
were 0. This is why we needed special code for the first frame.
Change-Id: I734961012917654ff8c0c8b317aac00ab75ded1a
Using get_plane_block_size() instead of manipulation with subsampling
values, calculating all required values only once without redundant calls
to b_width_log2().
Change-Id: I00303f2a0926f9c4cb17f34591adda60615f8919
Modifications to the spatial scalable encoder to match
changes made to the scaling code in the decoder.
In particular, the use of a dummy first frame was removed
now that the decoder is able to handle a smaller first
frame.
SvcTest.FirstFrameHasLayers unit test re-enabled.
Change-Id: Ic2e91fbe4eadf95895569947670d36d68abaf458
Jingning saw bitstream change with this patch. It could be true
that (mask_16x16_0 & 1) is 1, but (mask_16x16_1 & 1) is 0 in some
edge cases.
This reverts commit 8f05e70340.
Change-Id: I0a529435ce816a1e14653eb510d5090de276070a
In the decoder we don't need to save eobs, we can pass eob as an argument.
That's why removing eob arrays from VP9Decompressor and TileWorkerData,
and moving eob pointer from macroblockd_plane to macroblock_plane.
Change-Id: I8eb919acc837acfb3abdd8319af63d1bbca8217a
This commit makes the coefficient tree initialized prior to token
initialization, where the coefficient costs are filled out according
to the probabilities associated with coefficient value categories.
Change-Id: If4e89c3923058376f8382c683fe4a225a4a38af3
This commit fixes the intra prediction reference source selection
in the settings of skip_encode. Use original boundary pixels as
prediction reference, when the inverse transform and reconstruction
are skipped in the per block size rate-distortion optimization loop.
Change-Id: I36081aa30aa46e203e0e6f4e8a420fd08269469a
Its last remaining caller can be passed its results directly without any
additional work. Also, it's not non-4:2:0 safe.
Change-Id: Ia5089ba5f7f66c7617270483c619c9271aefd868
The performance gain of idct16x16_10_add_sse2 function is not
noticeable. However since both functions use the IDCT16_1D,
idct16x16_10_add_sse2 should be modified as well.
Tested with: park_joy_420_720p50.y4m
Change-Id: I02b957e36fcf997c677d15baf496533895271bff
This commit fixes the use of uv_intra_estimate by properly restoring
the mode_info struct required by rd_pick_intra_sbuv_mode.
Change-Id: I6a156d79533c4e2e60dfd3b8c5bb0a42a8eca280
The difference with the old code is that originally the whole token_cache
was initialized with zeros at the beginning of decode_coefs() function.
Now we set several zero values explicitly with "token_cache[scan[c]] = 0".
Change-Id: I88cc5031f01d13012d1a4491739c36cb44f9401e
Removing goto and using while loop instead, renaming seg_eob to max_eob,
moving eob token counter increment.
Change-Id: Idcc4b3a45e4f313596a71776aef56691a6647e5f
E.g. disable vertical partitioning for 4:2:2. Until we come up with something
better to do with the chroma block size, this prevents an assert error.
Change-Id: I9394fb3f14ec1343abc3ad4769de208e6278f285
This fixes issue 667.
In the case where the frame was an odd number of pixels
wide or high, the border was being extended by one col
or row too far.
The calculation of color plane dimensions was modified
to use those already computed at the time the frame
buffer was allocated.
Also freed the temporary scaling buffer in vpxdec to
prevent a memory leak.
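To illustrate why odd dimensions are easy to get wrong: with subsampled
chroma, the plane size has to round up rather than truncate. The helper
below is a hypothetical sketch of that arithmetic only; the actual fix
reuses the dimensions stored when the frame buffer was allocated.

```c
#include <assert.h>

/* Hypothetical helper: size of a plane along one axis, given the luma
 * size and a subsampling shift of 0 (none) or 1 (halved, rounded up). */
static int plane_dimension(int luma_dim, int subsampling) {
  assert(luma_dim > 0 && (subsampling == 0 || subsampling == 1));
  return (luma_dim + subsampling) >> subsampling;
}
```

For a 353-pixel-wide frame with halved chroma this gives 177 columns,
one more than plain truncation would.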
Change-Id: I195bc81d84c0fc5d8260c1232200d62399e4b51f
Considering a horizontal edge, if mask_16x16 is 1 for an even-indexed
8x8 block, then mask_16x16 is 1 for the next 8x8 block in the same row.
Similarly, for a vertical edge, if mask_16x16 is 1 for an even-rowed
8x8 block, then mask_16x16 is 1 for the 8x8 block right below it in the
next row. Based on that, the mask_16x16 checking can be simplified to
save cycles. The corresponding 8-pixel vp9_mb_lpf_horizontal_edge code
can also be removed.
Change-Id: Ic3fe7a5674322239208cbe2731dc3216ce2084f3
We only need qcoeff buffers in the encoder. Reducing TileWorkerData struct
and VP9Decompressor struct sizes by 24K.
Change-Id: Id148868461f7ffa3d3dd634b371503ae9c57e207
Renaming treed_read() to consistent vp9_read_tree() and moving it from
deleted vp9_treereader.h to vp9_dboolhuff.h file.
Change-Id: Iedd8655acbe25e4fcf62b79e5a13bdea69b6b004
Added the test vector provided by Attila, which caught the bug in
Issue 661 "Decoder produces mismatched outputs with ssse3 enabled
and disabled"
vp90-hantro-stream-001.ivf
size: 320x180; 20 frames
Change-Id: Ic0d2b57ac7596ecb938dd55abc8c706fc2dd6d8f
vp9_idct32x32_34_add_sse2:
speedup: 1.472
IDCT32_1D_34 and MULTIPLICATION_AND_ADD_2 are optimized
based on the fact that only the upper-left 8x8 has
non-zero values.
vp9_idct32x32_1024_add_sse2:
speedup: 1.032
Tested with: park_joy_420_720p50.y4m
Change-Id: I8670ce547552b48695049de298e2fc46ce28dfbc
- Add command line args that allow display of warnings without prompting
for user input.
- Extend warning code to make it somewhat scalable.
Change-Id: I2bad8f9315f6eed120c2e1bbe0a2a5ede15fbf35
The idea here is to allow "in frame" adjustment of the final Q
value used to encode each SB64, using segmentation.
There is also adjustment of the rd mult in regions of overspend.
Activated using aq_mode=2
Change-Id: I2f140cd898c9f877c32cd6d2e667f5e11ada4b1c
The decoder will construct inter predictor using lazy border extension,
while the encoder, going with multiple runs of motion search in the rate-
distortion optimization loop for each block, does border extension at
frame level. This commit separates the inter predictors for the encoder
and decoder.
Change-Id: Ieca2fecba3a7201a6d64ef9f219e5d91e50559c3
When calling check_initial_width through vp9_set_size_literal
the function was defaulting to using non-subsampled chroma.
This patch changes the default to assume sampled chroma as
an interim solution until complete support for other
color formats is added.
Change-Id: Id8e7e919b350e3473dfdf7551af6fd0716478b04
This commit takes out vp9_extend_frame_borders from
vp9_setup_scale_factors.
The refactoring is for the preparation of the use of lazy border
extension at decoder. This makes it necessary to handle border
extension separately at encoder/decoder. The use of
vp9_extend_frame_borders will be removed, when lazy border extension
is ready.
Change-Id: Ia3baba3d179d5f11eee1634f19b3b319d2a59186
The decoder ignored the display width & height
specified in the frame header.
This patch adds a control, VP9D_GET_DISPLAY_SIZE, to
allow the application to obtain the display width and
height from the frame header.
vpxdec has been modified to scale the output frame to
this size.
Should the request for the display size fail vpxdec will
use the native width and height of the raw decoded
frame instead.
Change-Id: I25db04407426dac730263720c75a7dd6400af68a
This patch followed "Add filter_selectively_vert_row2 to enable
parallel loopfiltering" commit, and added x86 SSE2 optimization
to do 16-pixel filtering in parallel. For other optimizations
(neon and dspr2), current 16-pixel functions were done by calling
8-pixel functions twice, and real 16-pixel functions could be added
later.
Decoder speedup:
tulip clip: 2% speed gain;
old_town_cross: 1.2% speed gain;
bus: 2% speed gain.
Change-Id: I4818a0c72f84b34f5fe678e496cf4a10238574b7
This fixes issue 667.
In the case where the frame was an odd number of pixels
wide or high, the border was being extended by one col
or row too far.
The calculation of color plane dimensions was modified
to use those already computed at the time the frame
buffer was allocated.
Also freed the temporary scaling buffer in vpxdec to
prevent a memory leak.
Change-Id: Ied04bdcdfd77469731408c05da205db1a6f89bf5
Moves all rate control variables to a separate structure,
removes some currently unused variables,
moves some rate control functions to vp9_ratectrl.c,
and splits the encode_frame_to_data_rate function.
Change-Id: I4ed54c24764b3b6de2dd676484f01473724ab52b
- Rename the struct to VpxEncoderConfig.
- The idea behind this is to enable checking the global settings against
stream specific settings in source files other than vpxenc.c.
Change-Id: Ic736cbb714845b9466acb34671780d65b83ad1a8
Although no mismatch was indicated for 8/16 wide sub-pixel filters
in issue 661, they had similar problems that could cause mismatch
potentially. This patch fixed calculations in HORIZx8/16
and VERTx8/16.
Change-Id: Ib85412d690bea5609a51f0e50e7c858406b8ff9e
Separate the rounding and right shift operations of forward transform
from those of inverse transform. Take out the assertion check from
inverse transforms. If the transform coefficients were constructed to
cause intermediate steps of inverse transform overflow, the codec will
just let it overflow without breaking the decoding flow.
Change-Id: Ia7ce15dfd1a73b4abbaa78cbc74ec718523c5b1b
In commit "3d50da5397d20abc932d81453b26cde758293a40", the stack
pointer was modified while aligning the stack, and it needed to
be popped at the end.
Change-Id: I39e4adc6b8aa3379854dd264d41aa6f0f15c7953
This patch fixed issue 661: "Decoder produces mismatched outputs
with ssse3 enabled and disabled." In sub-pixel filters, a pixel
value was multiplied by a filter coefficient, and the results
were added up. The order of adding up these multiplications had to
be arranged carefully to prevent incorrect overflowing.
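The effect can be reproduced in plain C with a saturating 16-bit add,
which is what SIMD instructions such as paddsw perform. The sketch below
is illustrative only; the function names are hypothetical and the values
are not taken from the actual filter.

```c
#include <assert.h>
#include <stdint.h>

/* Saturating signed 16-bit addition. */
static int16_t add_sat16(int16_t a, int16_t b) {
  const int32_t s = (int32_t)a + (int32_t)b;
  if (s > INT16_MAX) return INT16_MAX;
  if (s < INT16_MIN) return INT16_MIN;
  return (int16_t)s;
}

/* Accumulates the terms strictly in the order given. */
static int16_t sum_sat16(const int16_t *terms, int n) {
  int16_t acc;
  int i;
  assert(n > 0);
  acc = terms[0];
  for (i = 1; i < n; ++i) acc = add_sat16(acc, terms[i]);
  return acc;
}
```

Summing 30000, 10000, -20000 in that order saturates at the first step
and ends at 12767, while 30000, -20000, 10000 gives the exact result
20000. The same products therefore need a fixed accumulation order for
the C and SIMD paths to agree.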
Change-Id: Ia78663dfe74a2d46900f1c6fb07c21fac273892f
VS2010 only supports avx. There is currently no avx code
in libvpx so don't create a special case for it.
Change-Id: I39a11410367712b98bc6122c5a42fabffcdb94cf
Using for loop based on max_tx_size instead of separate checks. Combining
build_coeff_contexts() with update_coef_probs().
Change-Id: Ie335a7db29830677fbc14478a9c190d3c1068665
Modifications are done to reduce the total clock cycle.
Speedup: 1.2
Tested with: park_joy_420_720p50.y4m
Change-Id: Ia36b87e62e2f80a5fadaf5628729aedc80f38f3f
Both functions have no relation to motion vectors, so moving them from
vp9_findnearmv.h to vp9_blockd.h.
Change-Id: I74f524267886ab0fff4a2da793a10c906ed0f43a
Added filter_selectively_vert_row2 to be ready for parallel
loopfiltering in vertical direction. This change did 2-row
filtering at a time. If 2 vertically adjacent 8x8 blocks do same
type of filtering, we can do 16-pixel filtering in parallel.
Next, we need to provide 16-pixel loopfiltering functions in C
and optimized versions for codec speedup.
Change-Id: Idf97bbdd70566e55bd30e1fd25cb8544e33291be
Add support to do 16 pixel horizontal filtering in Neon.
Nexus devices saw about 0.5% decode speed increase.
Change-Id: I2993f6c2d49f31fa74976879eeaa289fd3f4e15d
This function is called from vp9_setup_past_independence() which is called
before the modified piece of code. Moving reset of inter_mode_probs into
vp9_init_mbmode_probs() for consistency.
Change-Id: Ib188e8798e1fbe15407fd501406761b746fdda95
Although no mismatch was indicated for 8/16 wide sub-pixel filters
in issue 661, they had similar problems that could cause mismatch
potentially. This patch fixed calculations in HORIZx8/16
and VERTx8/16.
Change-Id: I169961c9d40a20340995b7d22aafc89ccf30bfca
This CL fixes an oversight in the AVX2 support CL previously
merged (Change-Id: Idc03f3fca4bf2d0afd33631ea1d3caf8fc34ec29) that
prevented runtime execution of AVX2 code in WebM.
Background:
Starting with the Sandybridge processor, the CPUID instruction was
enhanced to add various extended feature flag enumeration leaves.
Reading these leaves requires an additional input value for the CPUID
instruction which is stored in ECX. This change adds this second input
value for all ARCH_X86 and ARCH_x86_64 targets to the CPUID macros,
allowing checks of EBX bit 5 for AVX2 support. This capability will be
required moving forward to check for future processor features.
Change-Id: Ie9d872bc9ff68dad4b6578e4544e4dfd0ae26c36
In commit "3d50da5397d20abc932d81453b26cde758293a40", the stack
pointer was modified while aligning the stack, and it needed to
be popped at the end.
Change-Id: I062971e195f1f2ab9d0ab5fb84dcf215a0fcaa67
There are many places in handle_inter_mode that need to restore the
dst buffer pointers, due to buffer pointer swap and early rd search
breakout. This commit wraps these operations into an inline function
for clean-up.
Change-Id: I0462e8c41c8bc3cd8db07395489cac03d8e5be54
This patch fixed issue 661: "Decoder produces mismatched outputs
with ssse3 enabled and disabled." In sub-pixel filters, a pixel
value was multiplied by a filter coefficient, and the results
were added up. The order of adding up these multiplications had to
be arranged carefully to prevent incorrect overflowing.
Change-Id: Id08af4200fea9e1b896fc40157b8651c2c7e80f2
Reversing bit order of partition_context_lookup, and modifying accordingly
update_partition_context() and partition_plane_context().
Change-Id: I64a11f1a94962a3bf217de2f50698cb781db71a5
- Move it to webmdec.c and webmdec.h.
- Also, tidy up obvious style nits in the vicinity of code I was
already touching.
Change-Id: Ie2898d06e73c1e9030d9c8d465b73ee7edc3c02a
This rebase is a better implementation of the previous ones.
Modifications are done to reduce the total clock cycle.
Speedup: 1.341
Compiled with -O3
Tested with: park_joy_420_720p50.y4m
Change-Id: I940eaf283f60597ca0d9d2e13d518878d55ff02d
Commit a4a5a210 enabled lossless coding, but the commit incorrectly
disabled the usage of skip in encoder even when skip should be used.
This commit makes sure that skip is enabled even in lossless mode.
Change-Id: I276954f952c6ac68f17a316ebc72f09001228a08
VS2010 only supports avx. There is currently no avx code
in libvpx so don't create a special case for it.
Change-Id: Iacb10ea4762155412e04f23904b4324d01451fbd
Since they are used in the encoder only. This commit also re-orders the
includes for the files that include vp9_extend.h
Change-Id: I929fc113f2135d3198cd1fc6a17434e5a2f8a459
Explicitly constrain the upper limit of motion search range (in the
unit of full pixel) to be [-1023, +1023]. It is intended to control
the effective motion search range for 4K sequences.
Change-Id: I645539c70885eec0f155781f439d97d333336e88
Refactored IVF frame reading code out into ivf_read_frame(). Forgot
to actually make the function call in read_frame().
Change-Id: Ie9f6917e70bd26d0352a761932465c60a29a1f81
This patch followed "Rewrite filter_selectively_horiz for parallel
loopfiltering" commit, and added x86 SSE2 optimization to do
16-pixel filtering in parallel. Also, corrected the declaration
of aligned arrays. For 8-pixel-in-parallel case, improved the
calculation of the masks and filters. Updated the threshold loading
since the thresholds were already duplicated. Updated neon C functions
to call neon loopfilters twice.
Using tulip clip, tests showed it gave a ~1.5% decoder speed gain.
Change-Id: Id02638626ac27a4b0e0b09d71792a24c0499bd35
Separate the rounding and right shift operations of forward transform
from those of inverse transform. Take out the assertion check from
inverse transforms. If the transform coefficients were constructed to
cause intermediate steps of inverse transform overflow, the codec will
just let it overflow without breaking the decoding flow.
Change-Id: I73cfc3706c4e840fc543a77cbc4cdb0b05d07730
on arm until we implement a real vp9_idct32x32_34_add_neon.
This issue is due to commit 47665452f0
Merge "Add 32x32 idct function for eob<=34 case".
Change-Id: I56b5f0abc20e7dd1bba521f78a995e85d65ea296
Upstream changes to account for differences in clang
syntax for Chromium iOS builds.
Since most of these are incompatible with XCode clang,
hide them behind a flag.
Change-Id: Idafcbcd4eb01b1ada6277da2d2edfd6c04b579fd
- Move IVF reading support into ivfdec.c and ivfdec.h
- Move IVF writing support into ivfenc.c and ivfenc.h
- Removed IVF writing code from the SVC example in favor of ivfenc.
Change-Id: I70adf6240d0320fdd232d8546ed573f0f68dd793
* Change from thumb mode to arm mode improves test time significantly
* Direct inclusion of test.mk allows for unit test configuration via
configure script
Change-Id: Id58d3ba8289374528756a672459d8334afe20e2a
Simplifies the code by implementing band mapping with static arrays.
A lot of the code complexity introduced in a previous patch
disappears.
Change-Id: Ia3fac36e594fb5ad2d55ae141c58bba4c55c2d28
This commit enables the unit tests for 4x4 DCT and ADST transforms.
It covers tests of round-trip error check, coefficient match check,
coefficient overflow check, and inverse accuracy check.
Change-Id: Ibfea928ee48f0ebc088b7fdb0bf2d89a14161299
The switch to the rate-correction damping factor
in https://gerrit.chromium.org/gerrit/#/c/67536/ was not conditioned on CBR mode.
Change-Id: I2326704e8ac030a4f7b592dd3fedb94c7dd0644d
The step that sums three input samples could potentially cause the
intermediate result to go beyond the 16-bit limit when operating as the
second 1-D transform. This commit fixes the issue.
Change-Id: Iaf512449ac2d25ddd8a806d760afab362c62a516
Overall change (using dual buffer scheme for superblocks of both inter
and intra modes) reduces speed 2 runtime:
bluesky_1080p at 6000kbps: 263553ms -> 257441ms
riverbed_1080p at 8000kbps: 233230ms -> 225308ms.
Change-Id: Idf8d70f768a4b0d97b2a8506372c57b7b4022119
Match any whitespace instead of individual spaces. The macro
definitions in vp9/common/arm/neon/vp9_short_idct32x32_1_add_neon.asm
triggered this and treated spaces as arguments leading to lines like:
$8vld1$8.$88$8 {$8q8$8}, [$$89$8], $$8stride$8
Change-Id: I2d5718aba4614e4fd7b702e15c2a1bd80e656bd2
These changes are to support automated regressions of vpx on android
new file: test/android/Android.mk
new file: test/android/README
new file: test/android/get_files.py
Change-Id: I52c8e9daf3676a3561badbe710ec3a16fed72abd
As Jim suggested, 1D array was used to store filter levels instead
of 2D array. This used shift_y in setup_mask directly, and saved
a few cycles.
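A sketch of the flattened layout, assuming an 8x8 grid of 8x8 blocks per
64x64 superblock. The FilterLevels type and helper names are
hypothetical; only the idea of a single shifted offset comes from the
description above.

```c
#include <assert.h>
#include <stdint.h>

#define BLOCKS_PER_SIDE 8 /* 8x8 blocks across a 64x64 superblock */

/* Filter levels stored in one flat array instead of level[8][8]. */
typedef struct {
  uint8_t level[BLOCKS_PER_SIDE * BLOCKS_PER_SIDE];
} FilterLevels;

/* One shift and one add give the offset, which can be computed once and
 * reused for every access to the same block. */
static int block_offset(int row, int col) {
  assert(row >= 0 && row < BLOCKS_PER_SIDE);
  assert(col >= 0 && col < BLOCKS_PER_SIDE);
  return (row << 3) + col;
}

static void store_level(FilterLevels *f, int row, int col, uint8_t level) {
  f->level[block_offset(row, col)] = level;
}
```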
Change-Id: If61ab298784861f1806b1cd396d4e4e2e0f097b9
Implements scan order to band map with arrays in both the encoder
and decoder to remove conditional statements.
Encoding seems to be about 1% faster at speed 0, tested on football.
Decoding seems to be about 0.5-1% faster on a set of 25 videos.
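The general technique, replacing a chain of comparisons with a static
lookup array, can be sketched as follows. The band boundaries below are
illustrative values for a 16-coefficient block, and the function and
table names are hypothetical.

```c
#include <assert.h>
#include <stdint.h>

/* Conditional version: the band is derived from the scan position with a
 * chain of branches. */
static int band_by_branches(int pos) {
  if (pos == 0) return 0;
  if (pos <= 2) return 1;
  if (pos <= 5) return 2;
  if (pos <= 9) return 3;
  if (pos <= 12) return 4;
  return 5;
}

/* Array version: the same mapping precomputed once. */
static const uint8_t band_table_4x4[16] = {0, 1, 1, 2, 2, 2, 3, 3,
                                           3, 3, 4, 4, 4, 5, 5, 5};

static int band_by_table(int pos) {
  assert(pos >= 0 && pos < 16);
  return band_table_4x4[pos];
}
```

The table costs 16 bytes and turns each lookup in the coefficient loop
into a single indexed load.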
Change-Id: Idb233ca0b9e0efd790e30880642e8717e1c5c8dd
This commit enables the dual buffer rate-distortion optimization
and encoding scheme. It stacks the original transform coefficients,
quantized levels, and reconstructed coefficients, in the rate-
distortion optimization search process, hence eliminates the need
to re-run residual generation, forward transform, and quantization
in the encoding stage.
Change-Id: I011bfad3a59a380a869ee552e91dae0394ec492e
Added loop filter mask checking, and made the caller function
ready for implementation of parallel loopfiltering in horizontal
direction.
Next, we need to go through the loopfilter functions (both C and
optimized versions), and provide 16-byte wide loopfiltering for
each filter type.
Change-Id: Ifef47e7ef9086ebc2fd6ca7ede8f27c9bbf79e66
Allocate memory space of dual buffer sets that store the coeff, qcoeff,
dqcoeff, and eobs. Connect the pointers of macroblock_plane and
macroblockd_plane to the actual buffer in use accordingly.
Change-Id: I2f0b5f482ca879fae39095013eaf8901db20a5a4
Make the macroblockd_plane contain dynamic buffer pointers instead
of static pointers to the memory space allocated therein. The decoder
uses the buffer allocated in pbi, while encoder will use a dual
buffer approach for rate-distortion optimization search.
Change-Id: Ie6f24be2dcda35df7c15b4014e5ccf236fb3f76c
We only used "ib" to call get_scan() function, which in turn calls
get_tx_type_4x4() function. The latter one only needs block index if
bsize < BLOCK_8X8 -- under that condition raster_block == block.
Change-Id: I697306a0c3cf937acdd4f5e623d4367c5acc0b2f
This commit fixes the assignment of the mode_info pointer per tile. It
recognizes tiles in both row and column formats and properly
arranges the use of mode_info.
The bug was first introduced in
I6226456dd11f275fa991e4a7a930549da6675915
https://gerrit.chromium.org/gerrit/#/c/67492/
Change-Id: Ie12cd209f53241513728c461ee3d7b9599ddb860
Inlining set_contexts_on_border() into set_contexts(). The only difference
is the extra check that "has_eob != 0" in addition to
"xd->mb_to_right_edge < 0" and "xd->mb_to_bottom_edge < 0". If has_eob == 0
then memset does the right thing and works faster.
Change-Id: I5206f767d729f758b14c667592b7034df4837d0e
This patch continued the work done in "Rewrite loop_filter_info_n
struct"(commit:00dbd369c70270428d56da6d15ea5486fc821c52) to further
improve loopfilter function.
1. Instead of storing pointers to thresholds, store loopfilter
levels within 64x64 SB;
2. Since loopfilter levels are already calculated in setup_mask,
we don't need to call build_lfi to look them up again. Just save
loopfilter levels in setup_mask.
3. Reorganized and simplified filter_block_plane().
Tests showed a ~0.8% decoder speedup.
Change-Id: I723c7779738bbc2afcb9afa2c6f78580ee6c3af7
This is to make sure that the prediction residue always gets coded in
lossless mode.
This commit also fixed the lossless unit test.
Change-Id: I537726ee55328d4e4cf0a0196393a67e12bfcde1
The new expression is much more logical than previous one. Surprisingly
both expressions give exactly the same set of dependent values
-- have_top, have_left, have_right -- in vp9_predict_intra_block.
Change-Id: I63eb1b592b8c37883b3a0dbb1f3daa271e446109
This patch fixed the issue reported in "Issue 655: remove textrel's
from 32-bit vp9 encoder". The set of vp9_subpel_variance functions
that used x86inc.asm ABI didn't build correctly for 32bit PIC. The
fix was done carefully, given that there were not
enough registers.
After the change, we got
$ eu-findtextrel libvpx.so
eu-findtextrel: no text relocations reported in 'libvpx.so'
Change-Id: I1b176311dedaf48eaee0a1e777588043c97cea82
The term x represents macroblock pointer across encode_block. Change
the two local variable names to avoid confusion.
Change-Id: Ic732e73023525d673c0a678ed2708ac1edf5a3f9
Now tile decoding consists of two stages:
1. Find tile buffer start and its size, put this info into tile_buffers.
2. Decode each tile based on information from tile_buffers.
It seems that stage 1 can also be reused by multithreaded tile decoder.
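The two stages can be sketched as below. This is a simplified
illustration, not the decoder's code: it assumes every tile is preceded
by a 4-byte big-endian size, the TileBuffer fields and function names
are hypothetical, and "decoding" is replaced by a stand-in that sums
the payload bytes.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical record of where one tile's payload lives. */
typedef struct {
  const uint8_t *data;
  size_t size;
} TileBuffer;

/* Stage 1: walk the size-prefixed payloads and record each start and
 * size. Returns the number of tiles found, or -1 on corrupt input. */
static int find_tile_buffers(const uint8_t *data, const uint8_t *data_end,
                             TileBuffer *tiles, int max_tiles) {
  int n = 0;
  assert(data <= data_end && max_tiles > 0);
  while (data < data_end && n < max_tiles) {
    size_t size;
    if ((size_t)(data_end - data) < 4) return -1; /* truncated prefix */
    size = ((size_t)data[0] << 24) | ((size_t)data[1] << 16) |
           ((size_t)data[2] << 8) | (size_t)data[3];
    data += 4;
    if (size > (size_t)(data_end - data)) return -1; /* size past end */
    tiles[n].data = data;
    tiles[n].size = size;
    ++n;
    data += size;
  }
  return n;
}

/* Stage 2 stand-in: consumes one tile using only its TileBuffer, so the
 * tiles can be processed in any order or on any thread. */
static unsigned decode_tile(const TileBuffer *t) {
  unsigned sum = 0;
  size_t i;
  for (i = 0; i < t->size; ++i) sum += t->data[i];
  return sum;
}
```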
Change-Id: If0cdaefdd6d10bb41c63561346c9ae4cfac081dd
It is more logical to use the dqcoeff buffer to hold *dequantized*
transform coefficients (inside inverse_transform_block and
decode_coefs functions). Dequantization happens inside WRITE_COEF_CONTINUE
macro.
qcoeff buffer should be only used in the encoder for *quantized*
transform coefficients.
Change-Id: Ifd54bef272bbf5311ced6669c4f1079f998af5d7
SVC multiple layer per frame encoding is invoked with vpx_svc_init and
vpx_svc_encode. These interfaces are designed to be invoked from ffmpeg.
Additional improvements:
- make dummy frame handling a bit more explicit
- fixed bug with single layer encodes
- track individual frame sizes and psnrs instead of averages
- parameterized quantizer, 16th scalefactors, more logging,
- enabled single layer encodes to generate baseline
- include new mode for 3 layer I frame with 5 total layers
Change-Id: I46cfa600d102e208c6af8acd6132e0cc25cda8d4
I'm sure I could do more, but I don't know how long this code has to
live. I think this at least makes the code a little easier to read and
understand.
Change-Id: I6ca76357f89468d4851a3d1826e7aefa498e51d1
This is mainly a clean up patchset. It moves the WebM writing support
out of vpxenc and into its own source file. Changes to tools_common and
vpxdec result from relocation of shared bits of code.
Change-Id: Iee55d3285f56e0a548f791094fb14c5ac5346a26
Removing special case handling from vp9_tree_probs_from_distribution(),
tree_merge_probs(), and vp9_tokens_from_tree_offset() functions. Replacing
inter_mode_offset() function with macro INTER_OFFSET which is used now for
vp9_inter_mode_tree definition.
Change-Id: Iff75a1499d460beb949ece543389c8754deaf178
Removes stack allocation of token_cache in the decode_coefs function.
Seems to achieve about 1% decode speed improvement as tested on
25 480p videos.
Change-Id: I8e7eb3361fa09d9654dfad0677a6d606701fdc6e
The compound inter prediction could potentially run with initial
motion vectors of invalid value and check the mv_cost, which triggers
overheap read. This commit resolves this issue by forcing a motion
vector value check for compound inter modes of both superblock and
sub8x8 block sizes.
Change-Id: I4f4fc19ce83c8272782bc382f12c82a3f03212fc
We only update partition_probs for inter frames but they are constant
for key frames. It is not necessary to have constants inside frame
context and copy them every time. This change reduces FRAME_CONTEXT size
by at least 48 bytes.
Change-Id: If70a53be51043f37fe7d113853217937710932a7
Removed:
goldfreq, avg_encode_time, avg_pick_mode_time,
cpu_freq, interquantizer
member variables from VP9_COMP since they are no longer
used in the code.
Change-Id: I010a82c217d0da03c3f53d1858d3462190c12dcf
Removed three members from the VP9_COMP data structure:
inter_zz_count, gf_bad_count, gf_update_recommended.
These were part of the VP8 real-time mode implementation
that was removed from the initial VP9 codebase.
Change-Id: I866b083b88ef02c74837277d50ce532ca88492f3
This commit fixes the use case of plane_block_idx, which determines
the plane (Y/U/V) index based on block index. When block idx >= 4 in
sub8x8 block loop, it should be of chroma components.
Change-Id: I072705aa7b35445524ac607089ca8ce54b7ba478
We don't have to calculate 'new' probability in convert_distribution()
because it is enough to calculate only 'new' counters which could be used
to calculate probability if necessary. That's why removing a lot of unused
temporary probability arrays and reducing number of get_binary_prob()
calls.
Change-Id: I4e14eb7203d1ace61bbddefd6b9b6326be83ba63
When a frame is dropped due to |buffer_level| < 0 for a given temporal layer,
the buffer level for the upper temporal layers was not updated (in calc_pframe_target_size()).
This change fixes that.
Also, use the layer per-frame-bandwidth for updating the buffer level
of the higher layers when a frame is dropped.
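The update can be sketched roughly as follows. The LayerContext fields, the
function name and the cap at a maximum buffer level are assumptions made for
illustration, not the encoder's real structures.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-layer rate control state. */
typedef struct {
  int64_t buffer_level;        /* bits currently in the layer's buffer */
  int64_t max_buffer_level;    /* assumed upper bound for buffer_level */
  int64_t per_frame_bandwidth; /* bits credited per frame for this layer */
} LayerContext;

/* When a frame of current_layer is dropped, credit every higher temporal
 * layer with its own per-frame bandwidth so its buffer level keeps up. */
static void credit_upper_layers_on_drop(LayerContext *layers, int num_layers,
                                        int current_layer) {
  int i;
  assert(current_layer >= 0 && current_layer < num_layers);
  for (i = current_layer + 1; i < num_layers; ++i) {
    LayerContext *lc = &layers[i];
    lc->buffer_level += lc->per_frame_bandwidth;
    if (lc->buffer_level > lc->max_buffer_level)
      lc->buffer_level = lc->max_buffer_level;
  }
}
```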
Change-Id: I660c23f3229b47e9d124a950b480314b4307c5a8
1. Reduced the memset size based on eob for the 32x32 transform. The reset
of non-zero coefficients should probably move to where they are read in
the inverse transform functions. (TODO)
2. Removed a redundant level of indirection.
vp9_iht4x4_add() checks the transform type and calls vp9_iht4x4_16_add()
for transforms other than DCT_DCT. In this case, the DCT_DCT case
has already been handled here.
Change-Id: Iacbc77da761f0b308df5acea0f20c9add9f33d20
The change doesn't affect the bitstream. It changes the order of function
calls and affects how we reconstruct intra- and inter-blocks. The speed-up is
about 1...1.5%.
For intra-blocks:
Before:
  for each transform block read tokens
  for each transform block do prediction
  for each transform block do inverse transform
Now:
  for each transform block
    read tokens
    do prediction
    do inverse transform
For inter-blocks:
Before:
  for each transform block read tokens
  for each transform block do inverse transform
Now:
  for each transform block
    read tokens
    do inverse transform
Change-Id: I12a79bf1aa5a18c351b8010369bd3ff1deae1570
This CL contains two AVX2 optimized loop filter functions,
mb_lpf_horizontal_edge_w_avx2_8 and mb_lpf_horizontal_edge_w_avx2_16.
Change-Id: I604e4fe6e99752b7800c2ea98721d97f7e0b931b
-Don't reduce maxQ for gold/alt in CBR mode.
-Fix to min/maxQ for first/initial key frame.
-Add more speeds to datarate test and reduce the starting bitrate for test.
Change-Id: Id2a333d76dd3f6a51b322ca984588e2a22159c58
This commit makes zcoeff_blk cache the case where the entire block
is quantized to be zero (without applying zero-forcing) in the rate-
distortion optimization loop, and skip the forward DCT, quantization,
inverse DCT, and reconstruction process in the encode_block stage.
It now works for all the block sizes, including sub8x8 blocks.
Change-Id: I5ae60a9c436ba3637d11666733554bec4580ef98
Both decode_modes_sb and decode_modes_b had conditions to immediately
return at the beginning. Eliminating these conditions and calling
these functions only when there is real work to do. Also unrolling the loop
for PARTITION_SPLIT.
Change-Id: I2fc41cb74ac491f045a2f04fe68d30ff4aaa555d
"<< SUBPEL_BITS" needs to be added in the calculation. Call
set_scaled_offsets() to calculate x_offset_q4 and y_offset_q4.
Change-Id: Ied130ea771510e918f51cd1dc3abe57f4c0962b5
factorizes the code in decode_tiles(). reading the offsets backwards
wasn't doing anything to prove tile independence
Change-Id: I0395d3c77205852ebdc55efedc68291e93cef85c
Warning was: "implicit conversion from enumeration type 'VPX_SCALING_MODE'
(aka 'enum vpx_scaling_mode_1d') to different enumeration type
'VPX_SCALING'".
Change-Id: I45689e439a8775bc1e7534d0ea1ff7c729f2c7f5
"keyframe" variable in the current code actually means that previous
frame is a keyframe because cm->frame_type has not been initialized
in read_uncompressed_header.
Change-Id: I5645b0816c70abdef5dfc70113018d06276dac77
The clamp operation may not affect the values of the final assigned mv
where compiler may make use of strict aliasing rule to optimize out the
clamp operation. This change made the code segments comply better with
the strict aliasing rule.
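The message does not show the exact change, so the following is only a sketch
of one pattern that stays within the strict aliasing rule: the motion vector
is read and written through a union member instead of through a pointer cast
from an integer. The types and the clamp_int_mv() name are illustrative.

```c
#include <assert.h>
#include <stdint.h>

typedef struct {
  int16_t row;
  int16_t col;
} MV;

/* The same 32 bits viewed either as one integer or as a row/col pair. */
typedef union {
  uint32_t as_int;
  MV as_mv;
} int_mv;

static int clamp_int(int value, int low, int high) {
  return value < low ? low : (value > high ? high : value);
}

/* Clamp through the union member, so the compiler can see that the
 * stores modify the value that is returned. */
static int_mv clamp_int_mv(int_mv in, int min_row, int max_row,
                           int min_col, int max_col) {
  int_mv out = in;
  assert(min_row <= max_row && min_col <= max_col);
  out.as_mv.row = (int16_t)clamp_int(out.as_mv.row, min_row, max_row);
  out.as_mv.col = (int16_t)clamp_int(out.as_mv.col, min_col, max_col);
  return out;
}
```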
Change-Id: I24502ff18bd4f9e62507a879cc8760a91a0fd07e
It is enough to check just block type: intra or inter. Intra block implies
intra prediction mode, and inter block implies inter mode.
Change-Id: I3cf98731a3935f670a3cd8e2b2443483eb944be4
When building with new versions of Clang we encounter some issues. Work
around them by adding -fno-strict-aliasing when we detect Clang.
https://code.google.com/p/webm/issues/detail?id=603
Change-Id: I8e945a18a7215bcc627e7a1ee110078413259cc7
replaces use of cur_tile_mi_(row|col)_(start|end) by VP9_COMMON, making
it less stateful and more reusable for parallel tile decoding
Change-Id: I1df09382b4567a0e5f4434825d47c79afe2399be
Adding these functions to encapsulate tx_type check. Changing TX_TYPE to
int to match the declaration in vp9_rtcd.h.
Change-Id: I6f3a2df6e35595ca73b6aaa9e3909ee7bc3fd16f
Restructured the storing of loopfilter information. Deleted
loop_filter_info struct and reduced copying happened in every
superblock.
Tests showed a 0.5% ~ 0.8% decoder speed gain.
Change-Id: Ie6a8e46bae71dc3a3cd8c6054f5de540b8e0ef5e
update_partition_context / partition_plane_context: this will allow for
separate storage to be used in tile decoding
Change-Id: Ie0bc393531ab7e9d2ce35c95111849b294aad4ed
This is required in order to build libvpx on OS X Mavericks where gcc
compiler is deleted, clang (3.3) is the default now.
Using unmodified source files from gtest-1.7.0/fused-src folder.
Change-Id: I3d5f7278149c904e48737327daf7097a8bb0b390
When only upper-left 8x8 area has non-zero dct coefficients, we
could skip 1D IDCT for 9th to 32nd rows to save operations. This
function is called when eob <= 34.
Change-Id: I9684b75947bdde346cfe3720f08a953aa7a13fb5
If the webm file did not have a Cues then vpxdec would fail
when creating a y4m file. If there is no Cues element print
out a warning and set fps to 30.
Change-Id: Ieea7040265dfdac7dff4ccf917c6f756160a96bc
set_active_map()
set_roi_map()
The APIs need to be implemented and tested later, to ensure consistency
with VP9 codec internals.
Change-Id: I198124ee318f0883b58d1d36cea3c7ccd742a57e
For consistency with idct function names. Renames:
vp9_short_fdct4x4 -> vp9_fdct4x4
vp9_short_walsh4x4 -> vp9_fwht4x4
Change-Id: Id15497cc1270acca626447d846f0ce9199770f58
Splitting setup_inter_inter function into is_compound_prediction_allowed
and setup_compound_prediction. Moving setup_compound_prediction call
into read_comp_pred from read_uncompressed_header.
We should do the same in the encoder as well.
Change-Id: I40d75fdc4a221b2f7705df00d23a4b3fe79987c3
The encode_block for pass 1 takes simpler functionalities and can
save a few branches. The main reason is to make encode_block only
used after running rate-distortion optimization search in pass 2,
hence allowing dual buffer stack approach later.
Change-Id: I9e549ffb758e554fe185e48a07d6e0e01e475bcf
Use a flag variable to determine if coded in inter mode, thus avoiding
multiple inter mode checks in super_block_yrd.
Change-Id: I0ef998b2811c38e185a2e0583f0f636cee45d2cf
Assign the pointer to mode_info stream per tile. Remove the use of
tile_col in the decoding modules.
Change-Id: I7df87086708a3d92c5e20e86bcfb04e458ff47a6
This move is done to have all compressed header reading functions in one
place. Moved functions:
read_switchable_interp_probs
read_inter_mode_probs
read_comp_pred_mode
read_comp_pred
update_mv
read_mv_probs
Change-Id: I2aebb57d2826d03d11bf2f8fbbfc3a9978c4f9fb
The ref's scale_factors are set at frame level, and then copied for
each partition block. Since the struct members are mostly constant,
this patch separated the constant and non-constant members, and
reduced struct copying. This gave 0.5% ~ 1.4% decoder speed gain.
Change-Id: I94043bf5a6995c8042da52e5c661818dfa6f6d4c
The pointer was assigned only once with vp9_regular_quantize_b_4x4, calling
this function directly now. Also removing unused declarations:
prototype_quantize_block
prototype_quantize_block_pair
prototype_quantize_mb
vp9_regular_quantize_b_4x4_pair
vp9_regular_quantize_b_8x8
Change-Id: I14325bc2f082336820671eafbc06126651b79f73
This commit uses left_available flag to decide if the left mode_info
struct is available for left_block_mode. As discussed with James
Zern (jzern@), this prevents the codec from fetching mode_info from
blocks in the left tile, which although effectively not used might
present concerns for multi-threaded tile decoding.
This is NOT a bit-stream change.
Change-Id: I1dc8cf1bcbf056688eee27c7bc5706ac4b4e0125
Simple modification to reduce number of cycles in the
function.
Original function number of cycles: 973
Modified function number of cycles: 835
Improvement factor: 1.165
Tested with: park_joy_420_720p50.y4m
Change-Id: Ic5857272ea3aafe21d5ef9a69258d78c688f69bd
This reverts commit a82001b1cf, reversing
changes made to f6d870f7ae.
This commit breaks windows builds and needs some work to fix those and
some additional comments.
Change-Id: Ic0b0228e36704b127e5e399ce59db26182cfffe7
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: I0ba3c52513a5fdd194f1e7e2901092671398985b
We used set_partition_seg_context() only before calls to:
1. update_partition_context()
2. partition_plane_context()
Moving these functions from vp9_blockd.h to vp9_onyxc_int.h and
inlining set_partition_seg_context into them. After that it is not
necessary to have {above, left}_seg_context fields in MACROBLOCKD structure,
so removing them also.
Change-Id: I4723f59e1c8f3788432b7f51185d8d747b3a97f9
missed one in vp9_detokenize.c in the last
+ add some asserts in vp9_decode_frame() to catch regressions
Change-Id: Ide67505114ee17efdafb13694aed0c09039e5a16
replace VP9D_COMP usage with the (slightly) more targeted
VP9_COMMON/MACROBLOCKD/struct segmentation structures.
Change-Id: Iabb3616e231417b0e17b7e4b384ea63167a81745
This 2-pass rate control setting allocates bits based
on first pass stats to each kf group, gf group and individual
frame but does not correct the bits left and allocation after
each frame.
In other words it recommends a bit allocation for each frame
but does not try and correct any over or under spend on a
frame over the remainder of the clip. This reduces the accuracy
of rate control in terms of hitting an average bitrate but prevents
problems that may arise because early frames either use too many
or too few bits. This mode is currently more inclined to undershoot
than overshoot (particularly at higher data rates).
Also minor changes to rate of adaption when recode loop is not
enabled.
This mode is currently enabled by default for VBR.
It gives the following % performance gains.
derf +0.467, +1.072
yt 2.962, 2.645
stdhd 1.682, 1.595
yt-hd 2.3, 2.174
Change-Id: I3c84a9bf8884e5b345698ff0e19187f792c2f3a0
Delta reduced because of concern about popping on some
very hard clips.
Also allow some frame recode at speed 2 for kf/gf/arf.
Change-Id: Ib47dff42da41aa6eec83b7285fcaaca24abb851e
Renames for consistency with other constants:
NUM_FRAME_TYPES -> FRAME_TYPES
NUM_PARTITION_CONTEXTS -> PARTITION_CONTEXTS
Change-Id: I3db30acb2868eb0a424237c831087b2e264ec47f
This patch fixed a bug that caused 32bit PIC build mismatch. The
stack pointer was modified after "GET_GOT". Loading left pointer
from a hard-coded position gave wrong result.
Change-Id: Iea0aec6f917b12a6b3393ffc986bad74510248cc
Commit "d207 intra prediction ssse3 using bytes" caused mismatch
while building 32bit PIC code. Disabled these SSSE3 functions
until we fix the bug.
Change-Id: Ic444e531d3d4058092fe6eab09006b44fcb18e4c
This commit makes the buffer allocation of zcoeff_blk array in
pick_mode_context block size aware. It calculates the number of
4x4 blocks in the partition and assigns the memory space accordingly.
This process (and the uninitialization) is done once for each encoding
pass. It allows memory copy of smaller buffer when possible.
For football at 600kbps, the runtimes improve by about 1%:
speed 1, 45961ms -> 45472ms
speed 2, 23863ms -> 23598ms
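As a rough illustration of the sizing, assuming one byte of zcoeff_blk per 4x4
block and block dimensions that are multiples of 4 (both helper names are
hypothetical):

```c
#include <assert.h>
#include <stdint.h>
#include <stdlib.h>

/* Number of 4x4 blocks covered by a width x height partition. */
static int num_4x4_blocks(int width, int height) {
  assert(width > 0 && height > 0 && width % 4 == 0 && height % 4 == 0);
  return (width >> 2) * (height >> 2);
}

/* Allocate a zero-initialized flag array with one entry per 4x4 block.
 * The caller owns the buffer and releases it with free(). */
static uint8_t *alloc_zcoeff_blk(int width, int height) {
  return (uint8_t *)calloc((size_t)num_4x4_blocks(width, height),
                           sizeof(uint8_t));
}
```

A 64x64 partition needs 256 entries while an 8x8 partition needs only 4, which
is why copying a size-aware buffer is cheaper for small partitions.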
Change-Id: Id2ca24906fa89f46fa5fe742ec4b8efc2a61f877
in most cases at least the left column was a harmless race as it was
left unused later in the code.
Change-Id: I43211df66fb157c6feecf08c681add4fcf18b644
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: Ibc944952a192e6c7b2b6a869ec2894c01da82ed1
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: I2d95fdcbba96aaa0ed24a80870cb38f53487a97d
That makes decoder and encoder (only bitstream writing part) a little bit
simpler and faster. Moving get_sb_index() function to the encoder.
Change-Id: Ie91aaeefd69c84b085948267b33556a7666c6278
coef_counts is now in cpi->mb instead of cpi. This commit corrects the
misuse and enables a successful build.
Change-Id: I0e77909d34571cfd2560c66b46b1f8fa0cd1a6b4
Making this change in order to move allow_high_precision_mv field
from MACROBLOCKD structure to VP9_COMMON (because it is a frame level
flag).
Change-Id: I1d006ba36d938e0caf4d40fa051e2e38df9c1108
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: Id623c5113262655fa50f7c9d6cec9a91fcb20bb4
cherry-picked from:
commit 988b70844e03efcfcc075a9bc25d846670494f36
Author: Pascal Massimino <pascal.massimino@gmail.com>
Date: Fri Aug 2 11:15:16 2013 -0700
add WebPWorkerExecute() for convenient bypass
This is mainly for re-using the worker structs without using the
thread.
Change-Id: I8e1be29e53874ef425b15c192fb68036b4c0a359
Original source:
http://git.chromium.org/webm/libwebp.git
100644 blob c0d318aee628fdf9ba4876451a28aa978f1066b8 src/utils/thread.c
100644 blob c2b92c9fe353f8e514f78922f3d237204a9cbc66 src/utils/thread.h
Change-Id: I13fe92b1e94062bb99fdeeb7cb0b4b0575d27793
* changes:
Use a separate MODE_INFO stream for each tile column
Get rid of "this_mi", use "mi_8x8[0]" everywhere instead
Make the static_segmentation feature work again
First pass does not produce compressed data, therefore encode/decode
match check is not initialized.
Change-Id: I1971a6747337872a850987cc70ba267bd0f1d564
The only case where they were intentionally pointing to different
structures was in mbgraph, and this didn't have the expected behavior
because both of these pointers are used interchangeably through the code
Change-Id: I979251782f90885fe962305bcc845bc05907f80c
Moving code that gets band_translate array from get_scan_and_band()
function to get_band_translate() function. Renaming get_scan_and_band() to
get_scan().
Change-Id: I43047c205a1ca2a6e24be44db39dc04b7a385008
This should be similar to what x264 does with --aq-mode 1.
It works well with clips like parkjoy and touhou
(http://x264.nl/developers/Dark_Shikari/LosslessTouhou.mkv).
At low bitrates, the segmentation signaling overhead may negate the
benefits of this feature.
(PGW) Default changed to feature OFF to allow provisional merge.
Change-Id: I938abf9bb487e1d4ad3b0264ea03d9826275c70b
Updated the encoder to handle frames that are coded
intra-only. Intra-only frames must be non-showable,
that is, the "show frame" flag must be set to 0 in
the frame header.
Tested by forcing the ARF frames to be coded intra-
only.
Note: The rate control code will need to be modified
to account for intra-only frames better than they
are currently handled.
Change-Id: I6a9dd5337deddcecc599d3a44a7431909ed21079
Remove the semicolon in the definition of vp9_zero macro. Make all
the use cases of vp9_zero of consistent format.
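A small sketch of why the trailing semicolon matters. The macro body below
uses plain memset and is an assumption, as are the Stats type and reset_one()
helper; with a semicolon inside the macro, the if/else below would not
compile because the else would be separated from its if by an empty statement.

```c
#include <assert.h>
#include <string.h>

/* No trailing semicolon: the caller supplies it, as with a function call. */
#define vp9_zero(dest) memset(&(dest), 0, sizeof(dest))

/* Hypothetical structure used only to exercise the macro. */
typedef struct {
  int counts[4];
} Stats;

/* Zero either a or b. Unbraced if/else only works with the macro above. */
static void reset_one(Stats *a, Stats *b, int reset_a) {
  if (reset_a)
    vp9_zero(*a);
  else
    vp9_zero(*b);
}
```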
Change-Id: Ibaf9751e8595872b12766381a93d185a4d90df8f
The commit added check to make sure no invalid memory access even when
the decoder instance is never initialized.
Change-Id: I4da343d0b3c78c27777ac7f5ce7688562c69f0c5
For bad input data, the decoder may access the array out of bounds. The
commit added clamp to prevent such out of bound access
Change-Id: I0a1cfd9b8786ea7113a998053c76605c963b077a
Use the zcoeff_blk buffer of PICK_MODE_CONTEXT to store the indexes
of all-zero-coeff block of the current best mode. Remove the temporary
buffer best_zcoeff_blk defined in the rate-distortion optimization
loop. This improves the speed performance by about 0.5% in all speed
settings.
Change-Id: Ie3e15988ddfa581eafa2e19a8228d3fe4a46095c
This commit moves token_cache buffer into macroblock struct, instead
of defining as a local variable in cost_coeffs. This avoids repeatedly
re-allocating memory space in the rate-distortion optimization loop.
The runtime at speed 0 reduces:
bus 2000kbps, 161692ms to 159951ms
football 600kbps, 229505ms to 225821ms
Change-Id: If7da6b0b6d8c5138a16271a33c4548fba33d8840
"-no-prec-div" option helps codec performance, so it was added back.
"-no-intel-extensions" was added to suppress link warning #10237.
option '-use-asm' is deprecated and removed.
Tested icc 32bit build and 64bit build.
Change-Id: I736ec2619857efd425ef76338dc52f8fbc0bcc7e
Using TREE_SIZE for the following trees:
vp9_intra_mode_tree
vp9_inter_mode_tree
vp9_partition_tree
vp9_switchable_interp_tree
vp9_mv_joint_tree
vp9_mv_class_tree
vp9_mv_class0_tree
vp9_mv_fp_tree
Change-Id: I0212bb4c1ee6648249f68517e28a67a56591ee1b
Values of MODE_UPDATE_PROB and VP9_COEF_UPDATE_PROB are equal, so replacing
them with one constant. Inlining appropriate arguments for functions:
vp9_cond_prob_diff_update (encoder)
vp9_diff_update_prob (decoder)
Change-Id: I1255a1cb477743b799b3bfbbcd8de6b32b067338
Converts the constant rddiv parameter to 128 (from 100) and
implements RDCOST with bit-shift rather than multiplication.
Other parameters are also adjusted to roughly keep the same
balance between Rate and Distortion.
There is a slight speed-up of about 0.5-1% (at speed 0) as
tested on football_cif.
There is a slight change in performance due to small change
in the parameters.
derfraw300: +0.033%
stdhdraw250: +0.102%
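The switch from a multiplicative rddiv of 128 to a bit shift can be sketched
as below. The form of the rate term and both function names are assumptions
for illustration; the point is only that multiplying the distortion by 128
and shifting it left by 7 give the same cost.

```c
#include <assert.h>
#include <stdint.h>

/* Cost with a multiplicative distortion weight (rddiv). */
static int64_t rdcost_mul(int rdmult, int rddiv, int rate, int64_t dist) {
  return ((128 + (int64_t)rate * rdmult) >> 8) + rddiv * dist;
}

/* Cost with the distortion weight expressed as a shift (rddiv = 1 << bits). */
static int64_t rdcost_shift(int rdmult, int rddiv_bits, int rate,
                            int64_t dist) {
  assert(dist >= 0); /* left-shifting a negative value is undefined in C */
  return ((128 + (int64_t)rate * rdmult) >> 8) + (dist << rddiv_bits);
}
```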
Change-Id: I70ac69f58fa71c83108f68fe41796cd19d1fc760
The commit changes to mask available intra prediction modes for test
based on prediction block size.
With this patch, the encoding time of CpuUsed 2 is reduced by 10% to 20% for
HD clips, with a compression drop of 0.2%.
Change-Id: I65f320f1237c0f5ae3a355bf7caf447f55625455
When the codec in VBR (or cq) mode hits its max q limits and is
struggling to hit a target bandwidth, the bit target per frame collapses.
In the first instance normal frames cap out at the maximum allowed
Q and then the ARF and GFs do the same. This latter behavior is not
generally desirable as GFs and ARFs are only effective from a quality
and data rate perspective if they have at least some level of -Q delta
compared to the surrounding frames.
In this patch I define a separate max Q for GFs and ARFs that is
derived from but somewhat lower than that defined for normal frames.
In effect there is a minimum Q delta that will always be available for
GFs and ARFs regardless of the target rate and MAXQ setting.
This may of course mean that the absolute lowest rate obtainable for
a given clip is somewhat higher.
Change-Id: I268868b28401900d0cd87e51e609cd3b784ab54a
We have two SSE2-optimized functions for idct4_1d:
vp9_idct4_1d_sse2 <-- removing this one
idct4_1d_sse2
vp9_idct4_1d_sse2 was used only by the following functions which already
have SSE2 optimized variants:
vp9_idct4x4_16_add_c -> vp9_idct4x4_16_add_sse2
idct8_1d -> vp9_idct8x8_{16, 10, 1}_sse2
vp9_short_iht4x4_add_c -> vp9_short_iht4x4_add_sse2
Change-Id: Ib0a7f6d1373dbaf7a4a41208cd9d0671fdf15edb
To ensure fast encoding/decoding on devices without ssse3 support,
SSE2 optimization of sub-pixel filters was done. Test using 1080p
clip showed the decoder speeds were ~70fps with ssse3 filters, ~60fps
with sse2 filters, and ~15fps with c filters.
Change-Id: Ie2088f87d83a889fba80a613e4d0e287aadd785c
Renames:
fdct4_1d -> fdct4
fadst4_1d -> fadst4
fdct8_1d -> fdct8
fadst8_1d -> fadst8
fdct16_1d -> fdct16
fadst16_1d -> fadst16
"_1d" suffix is redundant, so removing it. The same will happen with idct
in the next change sets.
Change-Id: Ibf421cd2f569146c6079269df7a31819c098265e
This commit re-designs the per transformed block rate-distortion
costs tracking buffers. It removes redundant buffer usage, makes
the needed context memory allocation per VP9_COMP instance and
reuses the same buffer sets inside the rate-distortion optimization
search loop, thereby avoiding repeatedly requiring memory space.
It reduces speed 0 runtime:
bus at 2000 kbps from 166763ms to 158967ms,
football at 600 kbps from 246614ms to 234257ms.
Both about 5% speed-up. Local tests suggest about 2% to 5% speed-up
for speed 1 and 2 settings. This does not change compression
performance.
Change-Id: I363514c5276b5cf9a38c7251088ffc6ab7f9a4c3
Increases these parameters.
There is a small efficiency gain.
Change-Id: Ie5f0ddb39c907d335e0dafa5eb112365a81f4542
derfraw300: +0.091%
stdhdraw250: +0.238%
The intra mode distortion adjustment for skip_encode feature was
broken in the refactoring cc91851. This commit fixes it and tunes
the distortion models used therein.
Change-Id: I0d676e82f8e855536a90cf9b3e3fdefafcd886c6
snprintf is not supported by MSVC, so this commit replaces it with the MSVC
variant _snprintf to enable the build.
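A sketch of this kind of workaround, not the actual patch. The version guard
is an assumption (newer MSVC releases provide snprintf), and
format_frame_name() is a hypothetical caller. Since _snprintf does not write
a terminating NUL when the output is truncated, the sketch terminates the
buffer explicitly.

```c
#include <assert.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Map snprintf to the MSVC variant on compilers that lack it. */
#if defined(_MSC_VER) && _MSC_VER < 1900
#define snprintf _snprintf
#endif

/* Format "<prefix>-<frame>.yuv" into buf, always NUL-terminated. */
static void format_frame_name(char *buf, size_t size, const char *prefix,
                              int frame) {
  assert(buf != NULL && size > 0);
  snprintf(buf, size, "%s-%04d.yuv", prefix, frame);
  buf[size - 1] = '\0'; /* needed for _snprintf, harmless for snprintf */
}
```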
Change-Id: I686943a78c289bae6b486a5e75effad5f86c24de
Use b_mode_info to store the inter prediction mode of sub8x8 blocks,
replacing the use of partition_info. Remove the redundant buffer
update for partition_info. For bus_cif at 2000 kbps, this seems to make
speed 0 about 1% faster.
Change-Id: Id1b3be45e75a24fb4b42335ac480c23e440978f6
When all coefficients are zeros, skip the corresponding 1-D inverse
transform. This practice has been used in the SSE2 implementation of
inverse 32x32 DCT. This commit imports this algorithm into the C code.
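The idea can be sketched on a small block. The 1-D transform here is a
stand-in (a running sum), not a real inverse DCT; the shortcut is valid
because a linear transform maps an all-zero input to an all-zero output. All
names are illustrative.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

enum { BLOCK_DIM = 4 };

/* Stand-in 1-D transform: running sum. NOT an IDCT, just linear. */
static void toy_transform_1d(const int32_t *in, int32_t *out) {
  int32_t acc = 0;
  int i;
  for (i = 0; i < BLOCK_DIM; ++i) {
    acc += in[i];
    out[i] = acc;
  }
}

/* Row pass that skips the 1-D transform for all-zero rows.
 * Returns how many 1-D transforms were actually executed. */
static int row_pass_skip_zero(const int32_t *in, int32_t *out) {
  int r, c, calls = 0;
  for (r = 0; r < BLOCK_DIM; ++r) {
    const int32_t *row = in + r * BLOCK_DIM;
    int nonzero = 0;
    for (c = 0; c < BLOCK_DIM; ++c) nonzero |= (row[c] != 0);
    if (nonzero) {
      toy_transform_1d(row, out + r * BLOCK_DIM);
      ++calls;
    } else {
      /* The transform of a zero row is zero, so just clear the output. */
      memset(out + r * BLOCK_DIM, 0, BLOCK_DIM * sizeof(*out));
    }
  }
  return calls;
}
```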
Change-Id: I0f58bfcb183a569fab85d524d5d9cf8ae8653f86
We already have itxm_add member in MACROBLOCKD structure. Both
inv_txm4x4_1_add and inv_txm4x4_add are just its special cases for
different eob values. But eob logic is already implemented in
vp9_iwht4x4_add and vp9_idct4x4_add (that's why also removing
inverse_transform_b_4x4_add).
Change-Id: I80bec9b6f7d40c5e5033c613faca5c819c3e6326
For CpuUsed 1 and 2, this commit allows skipping the rectangular partition
check when NONE is better than SPLIT. It also applies this logic to
alt ref frame coding rather than using square partitions for all of them.
The change improves compression by about 0.3% on yt and ythd, and by about
0.6% on cif and stdhd, for both CpuUsed 1 and 2.
Change-Id: I814b653baf89f59acd20e042629a12938a1bd4e5
Now we have entropy code separate from scan/iscan code. The next step
in future is to move iscan code from common part to the encoder.
Change-Id: Id9732f7d80aec00af35c1d58d1137c4c96c91451
A new set of MSVC warnings were introduced by change
I3f36d3f7cd8d15195a6e2fafd1777cdaf9ecb847
In particular MSVC does not like:-
  typedef const int16_t subpel_kernel[SUBPEL_TAPS];
  struct subpix_fn_table {
    const subpel_kernel *filter_x;
    const subpel_kernel *filter_y;
  };
causes a new warning in MSVC.
warning C4114: same type qualifier used more than once
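One way to avoid the duplicated qualifier, shown only as a sketch since the
message does not include the final code, is to drop const from the typedef
and keep it at the point of use. SUBPEL_TAPS is assumed to be 8 here and the
kernel values are illustrative.

```c
#include <assert.h>
#include <stdint.h>

#define SUBPEL_TAPS 8 /* assumed tap count for this sketch */

/* const is no longer part of the typedef ... */
typedef int16_t subpel_kernel[SUBPEL_TAPS];

/* ... so it appears exactly once in each member declaration. */
struct subpix_fn_table {
  const subpel_kernel *filter_x;
  const subpel_kernel *filter_y;
};

/* Illustrative kernels; each row sums to 128. */
static const subpel_kernel example_kernels[2] = {
  { 0, 0, 0, 128, 0, 0, 0, 0 },
  { 0, 1, -5, 126, 8, -3, 1, 0 }
};

/* Sum the taps of the horizontal kernel at the given index. */
static int kernel_sum_x(const struct subpix_fn_table *table, int index) {
  int i, sum = 0;
  for (i = 0; i < SUBPEL_TAPS; ++i) sum += table->filter_x[index][i];
  return sum;
}
```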
Change-Id: Iae596fd13aadf36169faf00c68eabe9a32a9b156
This commit allows sub8x8 intra modes test in the rate-distortion
loop for hd sequences in speed 1 and 2.
For sequence y90n of hd set at 8000 kbps, speed 2 runtime goes
from 207s to 210s. For ped_1080p at 3000 kbps, speed 2 runtime goes
from 336s to 337s. Both are running with 300 frames.
This improves compression performance by 0.24% for stdhd and 0.32%
for hd.
Change-Id: I173ca38a6411565ae6cfadd184c42b2070c5de1f
The idea is to have the following names for each transform size:
vp9_idct4x4_add
vp9_idct4x4_1_add
vp9_idct4x4_10_add
vp9_idct4x4_16_add
vp9_idct8x8_add
vp9_idct8x8_1_add
vp9_idct8x8_10_add
vp9_idct8x8_64_add
etc for 16x16, 32x32
The actual list of renames in this patch:
vp9_idct_add_lossless -> vp9_iwht4x4_add
vp9_short_iwalsh4x4_add -> vp9_iwht4x4_16_add
vp9_short_iwalsh4x4_1_add -> vp9_iwht4x4_1_add
vp9_idct_add -> vp9_idct4x4_add
vp9_short_idct4x4_add -> vp9_idct4x4_16_add
vp9_short_idct4x4_1_add -> vp9_idct4x4_1_add
Change-Id: I6f43f7437c68dd30cdd05d72e213765578ed30b1
Speed 4 still does not give a big gain over speed 3.
This just cleans it up a little from the last patch and comments
out features that do not seem to be giving much benefit.
Change-Id: I5f366e6160e1dbe5dc45cf5eb90cc02712baa1b6
Allow selective masking of individual split modes rather than
just a single on / off flag.
For speed 2, this recovers the large speed loss seen for some derf
clips in change Ie6bdfa0a370148dd60bd800961077f7e97e67dd4
and gives a small quality gain.
For speed 1, a 10% speed increase was observed locally on some derf clips
for minimal quality change.
Change-Id: If86191087b93cbc05351c26c60c7933e2149e485
Moving INTERPOLATIONFILTERTYPE enum and subpix_fn_table struct to
vp9_filter.h. Adding convenient typedef for subpel kernels.
Function vp9_setup_interp_filters() besides setting xd->subpix.filter_x &
xd->subpix.filter_y has a side effect of also setting scale factors. This
is not required inside decode_modes_b() because scale factors have been
already set by set_ref() calls. That's why replacing
vp9_setup_interp_filters() call with newly created vp9_get_filter_kernel()
call. The behavior of vp9_setup_interp_filters() is unchanged (it
is used from the encoder).
Change-Id: I3f36d3f7cd8d15195a6e2fafd1777cdaf9ecb847
This commit removes the redundant second reference frame check in
the rate-distortion optimization loop for sub8x8 blocks.
Change-Id: I13a57a6f624c4a9bcef02ff2a867fa30d8b44a93
This commit defines b_mode_info as a struct type. This will allow
us to further remove the use of PARTITION_INFO in the encoding process.
Change-Id: I975b0f7d557b5e0f66545a61b472def76b671cce
This commit separates the rate-distortion optimization loop of
superblocks from that of sub8x8 blocks. This allows a better-designed
rate-distortion optimization search loop for each setting. It also
removes the use of SPLITMV and I4X4_PRED therein.
No performance change in speed 0 settings. For bus@CIF at 2000kbps,
the speed 1 runtime goes from 48009ms to 43894ms (about 10% faster).
The overall compression performance on derf changed by -0.021%.
Speed 2 runtime goes from 27114ms to 28700ms (6% slower), while the
overall coding efficiency goes up by 1.629% for derf, 1.236% for yt.
Change-Id: Ie6bdfa0a370148dd60bd800961077f7e97e67dd4
In subpixel filters, prefetched source data, unrolled loops,
and interleaved instructions.
In HORIZx4, integrated the idea in Scott's CL (commit:
d22a504d11), which was suggested by
Erik/Tamar from Intel. Further tweaking was done to combine row 0,
2, and row 1, 3 in registers to do more 2-row-in-1 operations until
the last add.
Test showed a ~2% decoder speedup.
Change-Id: Ib53d04ede8166c38c3dc744da8c6f737ce26a0e3
Substantial reworking of the speed vs quality trade offs for
speed 1 and 2.
In this patch I am attempting to freeze the "quality" meaning of
speeds 1 and 2 relative to speed 0 so that in future we can
better evaluate progress.
I am targeting :
Speed 1 quality ~-5% vs speed 0.
Speed 2 quality ~-10% vs speed 0
It is inevitable that quality will still fluctuate a little as we adjust
settings and add new features, but we will attempt to keep as
close as possible to these values. Above speed 2 things will remain
a bit more fluid for now.
In this patch speed 1 is approximately 4-5x as fast as speed 0. This
is similar to before but the quality hit is a lot less. Likewise speed 2
is approximately 2x as fast as speed 1 but is similar in quality to the
previous speed 1 configuration.
Also slight change to behavior of FLAG_EARLY_TERMINATE to ensure
all reference frames get at least one rd test. Important for very low
variance regions.
WIP :- Added a new speed level with old speed 4 becoming speed 5.
Speed 3 and 4 tradeoffs still WIP
Change-Id: Ic7a38dd7b5b63ab1501f9352411972f480ac6264
This commit makes the use-last-partition feature consider whether a 64x64
block has motion that might make a new partitioning worthwhile.
Change-Id: I3a57bedef4f3cd961fadbfa96651c206fa36da4a
Simplify the k_cvtlo_epi16 and k_cvthi_epi16 to only two
instructions. Then inlined them.
quoting from intel MMX_App_Compute_16bit_Vector.pdf
"The PMADDWD instruction multiplies four
pairs of 16-bit numbers and produces partial sums of the results
and can do so once per clock (with a three-clock latency)."
so I am assuming that there will be three clock overhead after the
last _mm_madd_pi16 command.
Even with the overhead the number of clocks in general should be
smaller. I am not sure though because I could not find information
about number of clocks required for instructions in k_cvtlo_epi16
and k_cvthi_epi16. I will run a test and compare the execution time.
Change-Id: Ieda4aa338f69ad3dd196ac6e7892da3cf1b47ea7
Moving functions from vp9_idct_blk to vp9_idct because these functions are
used from both encoder and decoder. Removing duplicated code from
vp9_encodemb.c and reusing existing functions.
Change-Id: Ia0a6782f8c4c409efb891651b871dd4bf22d5fe8
The codec should effectively run with motion vector of range (-2048, 2047)
in full pixels, for sequences of 1080p and below. Add assertions to clarify
this behavior.
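Such an assertion might look like the following sketch. Treating both bounds
as inclusive is an assumption, and the constant and function names are
hypothetical.

```c
#include <assert.h>

/* Assumed inclusive full-pixel bounds for each motion vector component. */
enum { MV_FULLPEL_MIN = -2048, MV_FULLPEL_MAX = 2047 };

/* Returns 1 when both components lie inside the supported range. */
static int mv_in_fullpel_range(int row, int col) {
  return row >= MV_FULLPEL_MIN && row <= MV_FULLPEL_MAX &&
         col >= MV_FULLPEL_MIN && col <= MV_FULLPEL_MAX;
}

/* Documents the expected range at the point where a vector is used. */
static void check_mv(int row, int col) {
  assert(mv_in_fullpel_range(row, col));
  (void)row; /* silence unused warnings when assertions are disabled */
  (void)col;
}
```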
Change-Id: Ia0cac28249f587d8f8882205228fa480263ab313
Moving out decode_tokens function calls and adding decode_blocks boolean
variable. We only have to decode if eobtotal > 0, i.e. we have at least one
non-zero coefficient. Also inlining and remove vp9_set_pred_flag_mbskip
function.
Change-Id: I7be38b12ee8206faf0beea2bbf4d52be42575b03
The declaration of the bilinear filters specified an alignment clause
in the implementation file but not in the header. This turned out
to be harmless, but it did cause linker warnings to be emitted when
building on Windows.
The (extern) declaration in the header was changed, to match the
declaration in the implementation.
Change-Id: I44be89b1572fe9a50fa47a42e4db9128c4897b04
Interleaved the instructions, reduced register dependency, and
prefetched the source data. This improved the decoder speed
by 0.6% - 2%.
Change-Id: I568067aa0c629b2e58219326899c82aedf7eccca
byte version of Ronald's d153 ssse3 optimizations for
4x4 and 8x8
(commit: fc91a2a112238a1aee568f3b840585de4e928fca)
Change-Id: Iec4426032311483f615fd9e0dceba3ee85ddebd7
Make encoder skip rectangular partition check in speed 1 and above,
when early termination was triggered in partition split.
Thanks Guillaume (gmartres@) for catching this issue.
This change reduces the bus_cif at 2000kbps speed 1 runtime from
25612ms to 23438ms (about 9% speed-up), at the expense of a 0.235%
drop in compression performance.
Change-Id: I98613fad081a261d30d5fa206f934ca70601c180
We don't need these functions anymore. The only one which was actually
used is vp9_add_constant_residual_32x32. Addition of
vp9_short_idct32x32_1_add eliminates this single usage. SSE2 optimized
version of vp9_short_idct32x32_1_add will be added in the next patch set;
right now there is only a C implementation. Now we have all idct functions
implemented in a consistent manner.
Change-Id: I63df79a13cf62aa2c9360a7a26933c100f9ebda3
The code now takes into account temporal and spatial
information to determine the partition size range, but the
frequency counts have been removed.
The net effect is similar in quality but about 10% faster.
Change-Id: I39a513fb79cec9177b73b2a7218f0da70963ae95
This patch deletes the variance based speed three partitioning.
Speed 3 now uses the same partitioning method as speed 2
but with some stricter conditions.
The speed and quality are now somewhere between speeds 2 and 4
whereas before it was worse in both than speed 4.
Change-Id: Ia142e7007299d79db3ceee6ca8670540db6f7a41
Apparently we are going to have trouble completely removing the lint
issues in this file. It needs a bit more work. We need to include
vpx_config.h to know whether we need multi-threading, and that means
vpx_config.h has to come before the system headers (a violation).
Change-Id: I023feeab1bf5643b79dccc3b80a4a9ad42689e7b
Signed-off-by: Jim Bankoski <jimbankoski@google.com>
Include the whitespace after the first argument's comma in the
optional first argument group.
This fixes a minor style regression in the converted output
since 2a233dd31.
Change-Id: I254f4aaff175e2d728d9b6f3c12ede03846adcf1
Both vp9_init_mbmode_probs() and vp9_zero(cm->ref_frame_sign_bias) are
called inside vp9_setup_past_independence(), which is called in any case
for the encoder/decoder after VP9_COMMON struct creation.
Change-Id: I3724d1a4fb8060101ff0290dd6a158f0b5c57bb4
Replace the current code, which corrupts the stack, with a
duplicate of the vp8 code to save and restore neon
registers.
Change-Id: Ibb0220b9aa985d10533befa0a455ebce57a2891a
Some small changes to the quantizer mapping functions.
Also includes some cleanups.
Change-Id: I9dea29b24015f6e6697012a0e4d8983049d8e5c7
Results:
derfraw300: +0.106%
stdhdraw250: +0.139%
Don't divide RDMULT and RDDIV by 100 when RDMULT > 1000. This was
probably done to avoid overflow when the rd cost was stored in a 32-bit
integer, but this is no longer the case. This change will make it easier
to support multiple quantizers per frame.
derf compression gain at speed 0: 0.037%
Change-Id: Ibeeb9b7cfa1a132a7af41bc90fc07a3bba0857f6
Jenkins warns on left shift of negative numbers and non-aligned read
of int. This commit fixed the two issues.
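One common way to avoid both warnings is sketched below. The helper
names are hypothetical and this is not necessarily how the commit
fixed them: a multiplication replaces the shift of a negative value,
and memcpy replaces the cast-and-dereference of an unaligned pointer.

```c
#include <assert.h>
#include <stdint.h>
#include <string.h>

/* Scales a possibly negative value by 2^bits without shifting a negative
 * operand, which is undefined behavior in C. */
static int scale_pow2(int value, int bits) {
  assert(bits >= 0 && bits < 31);
  return value * (1 << bits);
}

/* Reads a 32-bit value from an address with no alignment guarantee.
 * memcpy is well defined for any alignment. */
static uint32_t read_u32_unaligned(const void *src) {
  uint32_t v;
  memcpy(&v, src, sizeof(v));
  return v;
}
```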
Change-Id: I389a7fb6a572c643902e40a4c10fefef94500d2c
- full ASM version, no more C gateway file.
- integrate combine-add with last step of 2nd pass.
- remove a few push/pop pairs.
- some instruction reordering to hide latency.
Change-Id: Ic9d9933c908b65d1bf7ba8fd47b524cda808c9c6
Both first pass and mbgraph search use block size 16x16 for motion
estimation. This commit puts a limit on the motion vector range. The
effective range allows the entire 16x16 block, with the required subpel
interpolation input, to be completely outside the image border, but
not any further away from the image border.
Change-Id: Id70a5ed08be49e70959f064859d72adc7d775d08
INT64_MAX may be assigned as RDCOST when RDCOST computation is skipped
for speed. This commit prevents INT64_MAX from being used as a real
RDCOST in the transform size decision.
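The idea can be sketched as follows, with a hypothetical helper that
treats INT64_MAX as a "not computed" marker and never lets it win the
comparison. The function name and the -1 return convention are
assumptions for illustration only.

```c
#include <assert.h>
#include <stdint.h>

/* Picks the index of the smallest valid cost. Entries equal to INT64_MAX
 * mark candidates whose cost was never computed and are ignored.
 * Returns -1 when no candidate has a valid cost. */
static int pick_best_valid(const int64_t *rd, int n) {
  int best = -1;
  assert(n >= 0);
  for (int i = 0; i < n; ++i) {
    if (rd[i] == INT64_MAX) continue; /* skipped computation, not a cost */
    if (best < 0 || rd[i] < rd[best]) best = i;
  }
  return best;
}
```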
Change-Id: I89a945134191bbdea1f1431ade70424ac079eaac
After the change of MI context storage, the mi_8x8[] pointer may be null
for a block outside of the image border. This commit changes the code to
access the data only after validation of mi_row and mi_col.
Change-Id: I039c4eb486a228ea9d8e5f35ab9ae6717d718bf3
The probability model used to code prediction mode is conditioned
on the immediate above and left 8x8 blocks' prediction modes. When
the above/left block is coded in sub8x8 mode, we use the prediction
mode of the bottom-right sub8x8 block as the reference to generate
the context.
This commit moves the update of mbmi.mode out of the sub8x8 decoding
loop, hence removing redundant update steps and keeping the bottom-
right block's mode for the decoding process of next blocks.
Change-Id: I1e8d749684d201c1a1151697621efa5d569218b6
39c7b01d accidentally reverted the row/col initialization, which broke
mv clamps, which are dependent on the sites for valid motion vector
range. This commit fixed the issue.
Change-Id: Ibcce0226e0360b1ef483fe760b2e33f1af4bf494
Added hiding of global symbols for macho32 and macho64 in x86inc.asm.
This was done to fix an exported symbol issue in the Chrome build.
Change-Id: I08d5c559b985b82f655b537469fee125615e78c0
This commit enables forcing all coefficients zero per transformed
block, when its rate-distortion cost is lower than regular coeff
quantization.
The overall performance improvement (including its parent patch on
calculating rd cost per transformed block) at speed 1:
derf: 0.298%
yt: 0.452%
hd: 0.741%
stdhd: 0.006%
Change-Id: I66005fe0fd7af192c3eba32e02fd6d77952accb5
Adds modeled functions to decide the qp for altref frames in constant q
mode similar to other functions in use in bitrate mode.
Also turns on the constrained quality mode (end-usage=2) option which
was turned off before. Basic testing shows the mode works in principle,
to cap bitrate to the target-bitrate specified, while allowing lower
bitrate depending on the cq-level specified. The mode will need to be
improved over time.
Results for constant quality vs bitrate control mode:
derfraw300/fullderfraw: +3.0% at constant quality over bitrate control.
fullstdhdraw: +4.341%
stdhdraw250: +5.361%
Change-Id: If5027c9ec66c8e88d33e47062c6cb84a07b1cda9
This commit makes the rate-distortion optimization loop evaluate
the rd costs of regular quantization and all zero coeffs, per
transformed block. It improves speed 1 compression performance:
derf: 0.245%
yt: 0.515%
For a large partition that consists of multiple transformed blocks,
this allows more flexibility to selectively force a portion of
them to be coded as all zero coeffs, as will be continued in the next
patches.
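A rough sketch of the per-block decision is given below. The struct,
its field names and the simplified Lagrangian cost are assumptions for
illustration; the encoder's own RDCOST computation is not reproduced
here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical per-transform-block statistics; names are illustrative. */
typedef struct {
  int rate_coded;     /* bits to code the quantized coefficients */
  int64_t dist_coded; /* distortion after regular quantization */
  int rate_zero;      /* bits to signal an all-zero block */
  int64_t dist_zero;  /* distortion when every coefficient is zero */
} TxBlockStats;

/* Simplified Lagrangian cost, not the codec's RDCOST macro. */
static int64_t rd_cost(int64_t lambda, int rate, int64_t dist) {
  return lambda * rate + dist;
}

/* Sums the cheaper option per block and records in force_zero[i]
 * whether block i is coded as all zero coefficients. */
static int64_t partition_rd(const TxBlockStats *blk, int n, int64_t lambda,
                            int *force_zero) {
  int64_t total = 0;
  assert(lambda >= 0);
  for (int i = 0; i < n; ++i) {
    const int64_t coded = rd_cost(lambda, blk[i].rate_coded, blk[i].dist_coded);
    const int64_t zero = rd_cost(lambda, blk[i].rate_zero, blk[i].dist_zero);
    force_zero[i] = zero < coded;
    total += force_zero[i] ? zero : coded;
  }
  return total;
}
```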
Change-Id: I211518be4179747b57375696f017d1160cc91851
The sub8x8 blocks have their own motion vector reference scheme. The
mv_pred is only used for blocks of sizes 8x8 and above, to find the
starting point for motion search.
This change does not change any coding behavior. It makes the
encoding process slightly faster. (0.5% speed-up for local test on
speed 1.)
Change-Id: I746ee6ef0eac19aa3621be014afa12be8d82cbb9
The fake token EOSB may cause an invalid memory read in pack token; this
commit reworked the loop to avoid such an invalid read.
Change-Id: I37fdfce869b44a7f90003f82a02f84c45472a457
Now the same regexp that previously handled cases such as
"ldr r1, [r2, -r3]" also can handle the first operand being omitted
as in "pld [r2, -r3]".
This fixes building vp9_convolve8*neon.asm in thumb mode (and thus,
for Windows Phone as well).
Change-Id: I20c1c3f2bfb2587fb5fa523b863972a7fe30d8ff
Current x86inc.asm didn't handle 32bit PIC build properly.
TEXTRELs were seen in the built library. The PIC macros from
libvpx's x86_abi_support.asm were used to fix this problem.
The assembly code was modified to use the macros.
Notes: We need this fix in for decoder building. Functions in
encoder will be fixed later.
Change-Id: Ifa548d37b1d0bc7d0528db75009cc18cd5eb1838
This commit cleans up the second reference check in the
rate-distortion optimization loop of sub8x8 blocks.
Change-Id: Ife68feaa4cddbfad2878c9b44d3012788d634f97
Modified the resize unit test so that it optionally
writes the encoded bitstream to file. The macro
WRITE_COMPRESSED_STREAM should be set to 1 to enable
output of the test bitstream; it is set to 0 by default.
Change-Id: I7d436b1942f935da97db6d84574a98d379f57fb1
The sub8x8 check can be directly inferred from block_idx, hence
it is removed from the arguments of get_sub_block_mv.
Change-Id: Ib766d57e81248fb92df0f6d9b163e6c77b933ccd
This commit reworked the unit test for 8x8 forward transform. It
allows scalability to cover various implemented versions.
Change-Id: I5594bd3e2307bb5bec764eaffd8860caa260e432
Jenkins was failing to detect the case where an existing
file is recreated with new content. In this case, thinking
that the file already existed, Jenkins did not re-copy the
file as it should have.
By adding the file test-data.sha1 as a dependency to
the LIBVPX_TEST_DATA build target the files will be
recopied if the SHA1 checksum of an existing file changes.
This could be further improved to only copy files that
have changed rather than copying the whole set as done in
this patch.
(Thanks to jzern@ who diagnosed the problem and suggested
this fix).
Change-Id: Icea7c61a95189bc639fec83020c28c70da5b2b41
The commit added reset of pred_mv at the beginning of each SB64x64
partition mv search, also limited the usage of pred_mv only when
search on the largest partition is already done. This is to fix
a crash at speed 1/2 encoder where an invalid mv is used in mv
search.
Change-Id: I39010177da76d054e3c90b7899a44feb2e3a5b1b
This is incompatible with most toolchains other than gcc.
Revert "Deleted #include <inttypes.h>"
This reverts commit 4d018be950.
This reverts commit d22a504d11.
Change-Id: I1751dc6831f4395ee064e6748281418e967e1dcf
This commit enables adaptive constraint on motion search range for
smaller partitions, given the motion vectors of collocated larger
partition as a candidate initial search point.
It makes the speed 0 runtime of bus at CIF and 2000 kbps go from
167s down to 162s (3% speed-up), at 0.01dB performance gains. In
the settings of speed 1, this makes the runtime go from 33687 ms
to 32142 ms (4.5% speed-up), at 0.03dB performance gains.
Compression performance wise, it gains at speed 1:
derf 0.118%
yt 0.237%
hd 0.203%
stdhd 0.438%
Change-Id: Ic8b34c67810d9504a9579bef2825d3fa54b69454
The CpuSpeedTest is extended to cover 2pass good quality with CpuUsed
from 0 to 4. The BordersTest is changed to use CpuUsed 1 for faster
turn around.
Change-Id: I005e89adee7fe63af4b1f2a76a3a13ea826feadf
Mis-merge of the following change managed to break mode order
and delete two mode options (new alt ref and near alt ref)
It also created a situation where we could test two undefined
modes off the end of the VP9_mode_order[] data structure.
"clang warnings : remove split and i4x4_pred fake modes"
"Change Id: I8ef3c*"
Initial testing on Akiyo at speed 2.
101.35 44.567 44.447 improves to
96.82 44.915 44.815
Approx 0.3-0.4 dB gain and 2.5% size reduction
Change-Id: Icff813e7c0778d140ad4f0eea18cf1ed203c4e34
Removes this speed feature since it is very slow and unlikely
to be used in practice. This cleanup removes a bunch of unnecessary
complications in the outer encode loop.
Change-Id: I3c66ef1ca924fbfad7dadff297c9e7f652d308a1
Reformatted version of a patch submitted by Erik/Tamar
from Intel. For the test clips used, the decoder
performance improved by ~2%.
Change-Id: Ifbc37ac6311bca9ff1cfefe3f2e9b7f13a4a511b
Propose some changes to the speed 2 settings to improve quality.
In particular, turns off the adjust_thresholds_by_speed feature
which improves results by 6%. Also removes the code for
adjust_thresholds_by_speed since it conflicts with the adaptive
rd thresh feature.
Overall, with this change speed 2 is -15.2% from speed 0 settings,
on derf, which is significantly better than -21.6% down before.
Change-Id: I6e90a563470979eb0c258ec32d6183ed7ce9a505
mode_info_context was stored as a grid of MODE_INFO structs.
The grid now consists of pointers to MODE_INFO structs. The
MODE_INFO structs are now stored as a stream (decoder only),
which eliminates unnecessary copies and is a little more cache
friendly.
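The layout can be pictured with the sketch below. ModeInfo and
place_block are hypothetical stand-ins, not the decoder's types or
functions: one entry is appended to the stream per decoded block, and
every grid cell the block covers points at that single entry instead
of holding its own copy.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical stand-in for the real per-block mode information. */
typedef struct {
  int mode;
} ModeInfo;

/* Takes the next stream entry for a block at (row, col) of size bh x bw
 * grid cells and points each covered cell at it. Cells that would fall
 * outside the grid are skipped. The caller must ensure the stream has
 * room for one more entry. */
static ModeInfo *place_block(ModeInfo **grid, int grid_rows, int grid_cols,
                             ModeInfo *stream, int *stream_len,
                             int row, int col, int bh, int bw) {
  ModeInfo *mi;
  assert(row >= 0 && col >= 0);
  mi = &stream[(*stream_len)++];
  for (int r = row; r < row + bh && r < grid_rows; ++r)
    for (int c = col; c < col + bw && c < grid_cols; ++c)
      grid[r * grid_cols + c] = mi;
  return mi;
}
```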
Change-Id: I031d376284c6eb98a38ad5595b797f048a6cfc0d
The c code implementation of 32x32 quantization does the zbin check
of all coefficients prior to the quant/dequant loop, hence removing
the redundant zbin check inside the loop. This only affects the
c code version. SSSE3 version does not separate the zbin check out.
Change-Id: Ic197a7d61d0b25fcac3cc092987651378cb56e4e
Added the resize_test unit test to the VP9 set.
Set g_in_frames = 0 to avoid a problem when the total
number of frames being encoded is smaller than
g_in_frames. In this case the test will not have
access to the encoded frames and will not be able to
compare them for testing for encoder/decoder mismatch.
Change-Id: I0d2ff8ef058de7002c5faa894ed6ea794d5f900b
If the current obtained distortion is very small, which happens
for static image case, we pick the current partition type without
further split checking.
This won't affect regular videos. For static videos, we got 10%~12%
encoding speed gain. PSNR was better for some clips, and worse for
others. Overall it was even.
Change-Id: If787a57bedf46fc595ca4f5ded2b0c0a69e9fdef
Thanks to Paul for the suggestions. While turning on static-thresh
for static-image videos, a big jump in bitrate was seen. In this
patch, we detect static frames in the video using first-pass
stats. For different cases, we disable encode breakout or reduce
the encode breakout threshold to limit the skipping.
More modifications need to be done to break the incorrect partition
picking pattern for static frames while skipping happens.
Change-Id: Ia25f47041af0f04e229c70a0185e12b0ffa6047f
A previous speed feature skipped modes not used in earlier
partitions but this no longer worked as intended following
changes to the partition coding order and in conjunction
with some other speed features (Especially speed 2 and above).
This modified mode skip feature sets a mask after the first X
modes have been tested in each partition depending on the
reference frame of the current best case.
This patch also makes some changes to the order modes are
tested to fit better with this skip functionality.
Initial testing suggests speed and rd hit count improvements
of up to 20% at speed 1. Quality results. (derf -1.9%, std hd +0.23%).
Change-Id: Idd8efa656cbc0c28f06d09690984c1f18b1115e1
This commit completes the per coefficient accuracy check and memory
overflow check for SSE2 and other implemented versions of 16x16
transform.
Change-Id: If26a3e4f6ba82ccecc13f0b73cb8f7bb6ac14584
This commit refactors the 16x16 transform unit test. It enables the
test on all implemented versions of forward and inverse 16x16 transform
modules.
Change-Id: I0c7d5f3c5fdd5d789a25f73e287aeeaf463b9d69
Sample app: vp9_spatial_scalable_encoder
vpx_codec_control extensions:
VP9E_SET_SVC
VP9E_SET_WIDTH, VP9E_SET_HEIGHT, VP9E_SET_LAYER
VP9E_SET_MIN_Q, VP9E_SET_MAX_Q
expanded buffer size for vp9_convolve
modified setting of initial width in vp9_onyx_if.c so that layer size
can be set prior to initial encode
Default number of layers set to 3 (VPX_SS_DEFAULT_LAYERS)
Number of layers set explicitly in vpx_codec_enc_cfg.ss_number_layers
Change-Id: I2c7a6fe6d665113671337032f7ad032430ac4197
In configure when internal-stats is enabled, because postprocessing
code is needed for computing stats for enabling internal-stats
Change-Id: I3601dc5a4aa65feb99465452486a21e75eb62c1f
The commit changes the border pixel extension from 160 pixels on each
side to what is necessary in the arnr filter or motion estimation
portion, i.e. 16 pixels on the top and left sides. For the right or
bottom side, the extension is changed to either round up the image size
to a multiple of 64 or at least 16 pixels.
Change-Id: Ic05e19b94368c1ab4df568723aae5734e6c3d2c5
The 16x16 transform unit test suggested that the peak coefficient
value can reach 32639. This could cause a potential overflow issue
in the SSSE3 implementation of 16x16 block quantization. This commit
fixes this issue by replacing addition with saturated addition.
Change-Id: I6d5bb7c5faad4a927be53292324bd2728690717e
this prevents returning an address smaller than the natural heap
alignment from vpx_malloc on e.g., x86_64
Change-Id: I283e858664a8529f28b22060c3815116a7798c0d
Adds a new end-usage option for constant quality encoding in vpx. This
first version, implemented for VP9, encodes all regular inter frames
using the quality specified in the --cq-level= option, while encoding
all key frames and golden/altref frames at a quality better than that.
The current performance on derfraw300 is +0.910% up from bitrate control,
but achieved without multiple recode loops per frame.
The decision for qp for each altref/golden/key frame will be improved
in subsequent patches based on better use of stats from the first pass.
Further, the qp for regular inter frames may also be varied around the
provided cq-level.
Change-Id: I6c4a2a68563679d60e0616ebcb11698578615fb3
mode_info_context was stored as a grid of MODE_INFO structs.
The grid now consists of a pointer to a MODE_INFO struct and
an "in the image" flag. The MODE_INFO structs are now stored
as a stream, which eliminates unnecessary copies and is a little
more cache friendly.
For the test clips used, the decoder performance improved
by ~4.3% (1080p) and ~9.7% (720p).
Patch Set 2: Re-encoded clips with latest. Now ~1.7% (1080p)
and 5.9% (720p).
Change-Id: I846f29e88610fce2523ca697a9a9ef2a182e9256
This commit enabled a full functional test on 32x32 forward/inverse
transform, including round-trip error and memory overflow check. It
tests the prototype functions in C and all other implementations if
applicable.
Change-Id: I9cc50b05abdb4863e7abbcb29209a19b1fe90da7
The 32x32 forward transform can potentially reach peak coefficient
value close to 32700, while the rounding factor can go up to 610.
This could cause an overflow issue in the SSSE3 implementation of the
32x32 quantization process.
This commit resolves this issue by replacing the addition operations
with saturated addition operations in 32x32 block quantization.
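With the numbers above, 32700 + 610 = 33310, which does not fit in a
signed 16-bit lane (maximum 32767). A scalar sketch of what a
saturating add does per lane is shown below; add_sat_i16 is a
hypothetical helper written for illustration, not code from the SSSE3
implementation.

```c
#include <stdint.h>

/* Adds two 16-bit values and clamps the result to [INT16_MIN, INT16_MAX],
 * mirroring per lane what a saturating SIMD add (for example the SSE2
 * instruction paddsw) does, instead of wrapping around. */
static int16_t add_sat_i16(int16_t a, int16_t b) {
  const int32_t sum = (int32_t)a + (int32_t)b;
  if (sum > INT16_MAX) return INT16_MAX;
  if (sum < INT16_MIN) return INT16_MIN;
  return (int16_t)sum;
}
```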
Change-Id: Id6b98996458e16c5b6241338ca113c332bef6e70
The segment feature SEG_LVL_SKIP requires the prediction unit size
to be at least BLOCK_8X8. This commit makes the requirement
explicit. This is to prevent future encoder implementations from
making wrong choices.
Change-Id: I0127f0bd4c66e130b81f0cb0a8d3dbfe3b2da5c2
There is another unit test that has been failing randomly on win32
build. Investigation has shown that the failure was caused by the SIMD
register state not being reset appropriately in the fdct8x8 test. This
commit added ClearSystemState() in the teardown of this test; tests
showed it resolved the random failure issue for the win32 build.
Related issue: https://code.google.com/p/webm/issues/detail?id=614
Change-Id: I9381d0c1a6f4b855ccaeef1aca8c417ac8c71ee2
Moves counting of mv branches to where we have a new mv, instead of after
the whole frame is summed.
Change-Id: I945d9f6d9199ba2443fe816c92d5849340d17bbd
Speed 4 fixed partition size. Use fixed size unless it does not
fit inside image, in which case use the largest size that does.
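A simplified sketch of the fallback follows, under the assumptions
that only square sizes are considered, that sizes halve at each step,
and that 8 is the smallest size. fit_block_size is a hypothetical
helper, not the encoder's function.

```c
#include <assert.h>

/* Returns the largest square block size in pixels, not exceeding `fixed`,
 * that fits at (x, y) inside a width x height image. Sizes halve down to
 * 8, used here as an assumed minimum. */
static int fit_block_size(int fixed, int x, int y, int width, int height) {
  int size = fixed;
  assert(fixed >= 8 && x >= 0 && y >= 0);
  while (size > 8 && (x + size > width || y + size > height)) size >>= 1;
  return size;
}
```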
Change-Id: I250f7a80506750dd82ab355721624a1344247223
This commit fixed the potential overflow issue in the SSE2
implementation of 32x32 forward DCT. It resolved the corrupted
coded frames in the border of scenes.
Change-Id: If87eef2d46209269f74ef27e7295b6707fbf56f9
While static-thresh is on, we only need to transmit skip
flag if skip = 1. The cost of skip bit is added to the
total rate cost.
Change-Id: I64e73e482bc297eba22907026298a15fa8cc3920
- Intermediate height was not correct i.e. when block size is 4 and
y_step_q4 is 6. In this case intermediate height was
(4*6) >> 4 = 1 and vertical interpolation needs two source pixels
plus 7 extra pixels for taps.
- Also if the current output block is 16x16 and we are using 4x upscaling
we need only 12 rows after horizontal filtering instead of 16.
Patch Set 2: Intermediate_height updated after CL 66723
"Fix bug in convolution functions (filter selection)"
Change-Id: I5a1a1bc2ac9d5edb3a6e0818de618bf318fdd589
Added some code to output normalized rd hit count stats.
In effect this approximates to the average number of rd
operations/tests per pixel for the sequence.
The results are not quite accurate and I have not bothered
to account for partial SB64s at frame edges and for key frames.
However they do give some idea of the number of modes /
prediction methods being tested for each pixel across the
different partition sizes. This indicates how much scope there
is for further gains either by reducing the number of partitions
examined or the modes per partition through heuristics.
Patch 3 moved place where count incremented so partial rd
tests that are aborted with INT_MAX return are also counted.
Example numbers for first 50 frames of Akiyo.
Speed 0 ~84.4 rd operations / pixel
Speed 1 ~28.8
Speed 2 ~11.9
Change-Id: Ib956e787e12f7fa8b12d3a1a2f6cda19a65a6cb8
The 32x32 quantization process can potentially have intermediate
values over the 16-bit range, thereby causing enc/dec mismatch. This commit
fixes this overflow issue in the SSSE3 implementation, as well as the
prototype, of 32x32 quantization.
This fixes issue 607 from webm@googlecode.
Change-Id: I85635e6ca236b90c3dcfc40d449215c7b9caa806
The two arrays are typically initialized to INT64_MAX, if they are not
filled with valid values before the addition, the values can overflow
and lead to wrong results.
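One way to make such an addition safe is sketched below. add_cost is
a hypothetical helper and this is not necessarily the fix applied
here: the INT64_MAX marker is propagated instead of being added to.

```c
#include <assert.h>
#include <stdint.h>

/* Adds two non-negative costs. If either operand is the INT64_MAX
 * "not filled" marker, or the sum would overflow, INT64_MAX is returned
 * so the result still reads as invalid. */
static int64_t add_cost(int64_t a, int64_t b) {
  assert(a >= 0 && b >= 0);
  if (a == INT64_MAX || b == INT64_MAX) return INT64_MAX;
  if (a > INT64_MAX - b) return INT64_MAX; /* would overflow */
  return a + b;
}
```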
Change-Id: I515de22cf3e8f55af4b74bdb2c8eb821a02d3059
This patch is a reformatted version of optimizations done by
engineers at Intel (Erik/Tamar) who have been providing
performance feedback for VP9. For the test clips used (720p, 1080p),
up to 1.2% performance improvement was seen.
Change-Id: Ic1a7149098740079d5453b564da6fbfdd0b2f3d2
Switching from mi_{width, height}_log2 and b_{width, height}_log2 to
num_8x8_blocks_{wide, high} and num_4x4_blocks_{wide, high}. Removing
redundant code, adding const.
Change-Id: Iaab2207590fd24d0b76999071778d1395dc5cd5d
Incorporates a speed feature for fast forward updates of
coefficients. This feature takes 3 values:
0 - use standard 2-loop version
1 - use a 1-loop version
2 - use a 1-loop version with reduced updates
Results: derfraw300 +0.007% (on speed 0) at feature value = 1
-0.160% (on speed 0) at feature value = 2
There is substantial speed up at speeds 2 and above for low
resolution sequences where the entropy updates are a big part
of the overall computations.
Change-Id: Ie96fc50777088a5bd441288bca6111e43d03bcae
This commit resolved a mis-alignment issue in compound inter-inter
prediction of sub8x8. This patch follows solution from dkovalev@.
Change-Id: I3cc0cf7e55b84110e0c42ef4b2e6ca7ac3f8f932
In subpel_avg_variance functions, code similar to the following:
punpckldq m2, [addr]
actually reads 8 bytes. For functions that are supposed to work on
buffers that have fewer than 8 bytes per line, this caused a valgrind
error of reading uninitialized memory.
Change-Id: I2a4c079dbdbc747829bd9e2ed85f0018ad2a3a34
vp9_setup_interp_filters before each inter block decoding, it is not
necessary to call it just before the whole frame decoding.
Change-Id: Id1b0ee62f987474e27eafba0013a4896b492c400
Removing references to plane_block_width and plane_block_height (we are
going to delete the latter ones).
Change-Id: I7982da4d373aebb54d2209dc8886f6192df4d287
Make the current head working properly, while working on fixing an
issue in the SSSE3 implementation of 32x32 quantization.
Change-Id: Ic029da3fd7f1f5e58bc641341cbd226ec49a16bc
- s|source -> src
- dest -> dst
- use verbose names in extend_plane dropping the redundant comments
+ light cosmetics:
- join a few lines / assignments
- drop some unnecessary comments & includes
Change-Id: I6d979a85a0223a0a79a22f79a6d9c7512fd04532
Previous change c4048dbd limits the mv search range assuming a max block
size of 64x64; this commit changes the search range to use the actual
block size instead.
Change-Id: Ibe07ab02b62bf64bd9f8675d2b997af20a2c7e11
We could avoid calling clamp_mv2 because it has been already called
inside vp9_find_best_ref_mvs function.
Change-Id: I08edeaf3e11e98c19e67b9711b2523ca5fb1416e
Fix of https://code.google.com/p/webm/issues/detail?id=608. We could have
used invalid display size equal to the previous frame size (not to the
current frame size).
Change-Id: I91b576be5032e47084214052a1990dc51213e2f0
Making code more compact, adding consts, removing redundant arguments,
adding do/while(0) for macros.
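The reason for the do/while(0) wrapper can be seen in the sketch
below. SWAP_INT and sort2 are hypothetical examples, not macros from
this change.

```c
#include <assert.h>

/* A multi-statement macro wrapped in do/while(0) behaves like a single
 * statement: it can follow an unbraced `if` and still be terminated with
 * `;` without breaking a following `else`. */
#define SWAP_INT(a, b) \
  do {                 \
    int tmp_ = (a);    \
    (a) = (b);         \
    (b) = tmp_;        \
  } while (0)

/* Orders two ints, using the macro inside an unbraced if/else. */
static void sort2(int *lo, int *hi) {
  assert(lo && hi);
  if (*lo > *hi)
    SWAP_INT(*lo, *hi);
  else
    (void)0; /* already ordered */
}
```

With a bare `{ ... }` body instead, the `;` after the macro call would
end the `if` statement and the `else` above would fail to compile.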
Change-Id: Ic9ec0bc58cee0910a5450b7fb8cfbf35fa9d0d16
To the source buffer to be encoded as an alt ref frame. This is to fix
the problem of using uninitialized memory in encoder.
See https://code.google.com/p/webm/issues/detail?id=605
Change-Id: I97618a2fc207e08abcf5301b734aa9e3ad695e2c
(In response to Issue 604:
https://code.google.com/p/webm/issues/detail?id=604)
There were bugs in the convolution code for two cases:
1. Where the filter table was assumed to be aligned to a
256 byte boundary. The offset of the pixel in the
source buffer was computed incorrectly.
2. Where no such alignment assumption was made. An
incorrect address for the filter table base was used.
To fix both problems, I now assume that the filter table is
256-byte aligned and modify the pixel offset calculation to
match.
A later patch should remove the restriction that the filter
table is aligned to a 256-byte boundary.
There was also a bug in the ConvolveTest unit test
(convolve_test.cc).
(Bug & initial fix suggestion submitted by Tero Rintaluoma
and Sami Pietilä).
Change-Id: I71985551e62846e55e40de9e7e3959d4805baa82
Values now carried over frame to frame.
Change to algorithm for decreasing threshold after
a hit and to max threshold (now based on speed)
Removed some old commented out code relating to
VP8 adaptive thresholds.
The impact of these changes tested on Akiyo (50 frames)
and measured in terms of unit rd hits is as follows:
Speed 0 84.36 -> 84.67
Speed 1 29.48 -> 22.22
Speed 2 11.76 -> 8.21
Speed 3 12.32 -> 7.21
Encode speed impact is broadly in line with these.
Change-Id: I5b886efee3077a11553fa950d796fd6d00c8cb19
Most of the focus so far has been on inter frames.
At high speed settings the key frame is now taking a high %
of the cycles.
This patch puts in some masking to reduce the number
of INTRA modes searched during key frame coding (as already
happens for inter frames) at higher speed settings
TODO: Develop this further with either adaptive rd thresholds
when choosing which intra modes to consider or some other
heuristic.
Impact.
At high speed settings on some clips the key frame was starting
to dominate. In a coding of the first 50 frames of AKIYO at speed
2 limiting the key frame intra modes to DC or TM_PRED resulted in
~30% overall speedup. For Bus the number was lower at ~4-5%.
Change-Id: I7bde68aee04995f9d9beb13a1902143112e341e2
Put rectangular partition check flag change according to the rd
costs of NONE and SPLIT partition types under the speed feature.
Change-Id: If681e1e078a8d43d86961ea4b748da5cd1b6c331
vp9_short_idct10_16x16_add is used to handle blocks that only have valid
data in the top-left 4x4 block. All the other data are 0. So we could cut
many unnecessary calculations in order to save instructions.
Change-Id: I6e30a3fee1ece5af7f258532416d0bfddd1143f0
It is possible to have invalid scale factors and not access them
during decoding. Error is reported if we really try to use invalid scale
factors.
Change-Id: Ie532d3ea7325ee0c7a6ada08269f804350c80fdf
Comment is wrong, we don't initialize any xd pointers. We only initialize
xd->planes[i]->dst and xd->planes[i]->pre[], which are actually initialized
for every block during the decoding.
Change-Id: If152ea872ebef1f83ca70712fa6f8df1b6855f56
remove duplicate allocation from vp9_create_compressor, it was added to
vp9_alloc_frame_buffers in:
d5bec52 Added resizing & initialization of last frame segment map
Change-Id: I996723226a16a62aff8f9a52ac74e0b73cc98fdf
This commit changes the partition search order of superblocks from
{SPLIT, NONE, HORZ, VERT} to {NONE, SPLIT, HORZ, VERT} for
consistency with that of sub8x8 partition search. It enables the use
of early termination in partition search for all block sizes.
For ped_area_1080p 50 frames coded at 4000 kbps, it makes the runtime
go down from 844305ms -> 818003ms (3% speed-up) at speed 0.
This will further move towards making the in-search partition types
configurable, hence unifying various speed-up approaches.
Some speed 1 and 2 features are turned off during the refactoring
process, including:
disable_split_var_thresh
using_small_partition_info
Stricter constraints are applied to use_square_partition_only for
right/bottom boundary blocks. Will bring back/refine these features
subsequently. At this point, it makes derf set at speed 1 about
0.45% higher in compression performance, and 9% down in run-time.
Change-Id: I3db9f9d1d1a0d6cbe2e50e49bd9eda1cf705f37c
Adds a couple of minor fixes, which may be absorbed in Jingning's
patch. Thanks to Guillaume for pointing these out.
Also adjusts the thresholds for speed 1 and 2 to 16 and 32
respectively, to keep quality drops small.
Results:
--------
derfraw300: threshold = 16, psnr -0.082%, speedup 2-3%
threshold = 32, psnr -0.218%, speedup 5-6%
stdhdraw250: threshold = 16, psnr -0.031%, speedup 2-3%
threshold = 32, psnr -0.273%, speedup 5-6%
Change-Id: I4b11ae8296cca6c2a9f644be7e40de7c423b8330
It appears that the above/left mb_skip_coeff used during
the pick modes is left over from the previously
encoded frame. This patch initializes the flag to the default
value of zero.
Change-Id: Ida4684cc99611d6e3e82628db35ed717e28ce550
+ disable() -> disable_feature() for balance
this avoids shadowing the bash builtin 'enable' allowing the scripts to
be linted with checkbashisms
Change-Id: Ia11cf86c92ec25bd14e69427b0ac0a9a61a5f7a5
the final macroblock rows are scheduled in the main thread. prior to
this change one additional macroblock row would be scheduled in the
worker forcing the main thread to wait before finishing.
Change-Id: I05f3168e5c629b898fcebb0d77eb6d6a90d6105e
Currently, the best quality mode in VP9 is not very well developed,
and unnecessarily makes the encode too slow. Hence the command line
default is changed to "good" quality. Also, the number of passes
default is changed to 2 passes as well, since 1-pass encoding is
not very efficient in VP9.
Besides, a number of VP9 defaults are set to the currently
recommended settings. With these changes, vpxenc
run with --codec=vp9 --kf-max-dist=9999 --cpu-used=0 should
work about the same as our borg results.
Note when the --cpu-used=0 option is dropped there will be a slight
difference in the output, because of a difference in the cpu-used
value for the first pass. Specifically, the default when unspecified
is to use cpu_used=1 for the first pass and cpu_used=0 for the
second pass. But when specified, both passes will use the cpu-used
value specified.
Note that this also changes the default for VP8 as being "good"
but other options stay unchanged.
Change-Id: Ib23c1a05ae2f36ee076c0e34403efbda518c5066
Adding set_contexts function and calling it instead of
set_contexts_on_border. Calling txfrm_block_to_raster_xy to get aoff and
loff.
Change-Id: I41897e344afd2cae1f923f4fdbe63daccf6fe80e
Moving foreach_predicted_block_in_plane function to vp9_reconinter.c
because there is only one usage.
Change-Id: I9852feae43fc3cf809b817fc541d043bc5496209
Updating implementation of vp9_get_pred_context_single_ref_p2 using
has_second_ref function to make code easier to read.
Change-Id: I5ba642712f59861a48aab974e73aa01640d086fe
vp9_short_idct10_8x8_add is used to handle blocks that only have valid
data in the top-left 4x4 block. All the other data are 0. So we could cut
several unnecessary calculations in order to save instructions.
Change-Id: I34fda95e29082b789aded97c2df193991c2d9195
Check the minimum rate-distortion cost of regular quantization and
all zero coeffs cases in the sub8x8 inter prediction rd loop for
luma components. Use this as the cumulative rdcost sent to UV rd
estimation.
Change-Id: Ia4bc7700437d5e13d7cdad4cf9ae57ab036d3e97
Cleans up the switchable filter search logic. Also adds a
speed feature - a variance threshold - to disable filter search
if source variance is lower than this value.
Results: derfraw300
threshold = 16, psnr -0.238%, 4-5% speedup (tested on football)
threshold = 32, psnr -0.381%, 8-9% speedup (tested on football)
threshold = 64, psnr -0.611%, 12-13% speedup (tested on football)
threshold = 96, psnr -0.804%, 16-17% speedup (tested on football)
Based on these results, the threshold is chosen as 16 for speed 1,
32 for speed 2, 64 for speed 3 and 96 for speed 4.
Change-Id: Ib630d39192773b1983d3d349b97973768e170c04
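A minimal sketch of the threshold check described above. The per-speed values for speeds 1 to 4 come from the commit message; the function names, the threshold of 0 for speed 0, and the clamping of speeds above 4 are assumptions for illustration.

```c
#include <assert.h>

/* Variance thresholds per speed setting. Speeds 1..4 use the values quoted
 * in the commit message; speed 0 is assumed to never skip the search. */
static unsigned int filter_search_var_thresh(int speed) {
  static const unsigned int thresh[] = {0, 16, 32, 64, 96};
  assert(speed >= 0);
  /* Assumption: speeds above 4 reuse the speed 4 threshold. */
  return thresh[speed > 4 ? 4 : speed];
}

/* Returns 1 if the switchable filter search should run, 0 if it should be
 * skipped because the source variance is below the threshold. */
static int should_search_filters(int speed, unsigned int source_variance) {
  return source_variance >= filter_search_var_thresh(speed);
}
```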
Changes to code to auto select a partition size range
based on data from spatial neighbors.
Now looks at the sb_type in each 8x8 block of above
and left SB64.
The effect on speed 1 is now weaker giving better
quality but less speed gain. Now also used in speed 2.
Change-Id: Iace33a97d5c3498dd2a9a8a4067351941abcbabc
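One way to picture the neighbor scan described above is a min/max pass over the block sizes recorded for each 8x8 block of the above and left SB64. The sketch below is an assumption-heavy illustration: the block-size codes, the function names, and the fallback to the full range when no neighbor is available are all hypothetical.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical block-size codes; a larger value means a larger block. */
enum { BS_8X8 = 0, BS_16X16 = 1, BS_32X32 = 2, BS_64X64 = 3 };

#define BLOCKS_PER_SB64 64 /* 8x8 grid of 8x8 blocks in a 64x64 superblock */

/* Widen [*min_bs, *max_bs] to cover every size found in one neighbor. */
static void scan_neighbor(const int *sb_types, int *min_bs, int *max_bs) {
  int i;
  for (i = 0; i < BLOCKS_PER_SB64; ++i) {
    if (sb_types[i] < *min_bs) *min_bs = sb_types[i];
    if (sb_types[i] > *max_bs) *max_bs = sb_types[i];
  }
}

/* Derive the partition size range to search from the above and left
 * superblocks. Either pointer may be NULL when that neighbor is missing. */
static void neighbor_partition_range(const int *above, const int *left,
                                     int *min_bs, int *max_bs) {
  assert(min_bs != NULL && max_bs != NULL);
  *min_bs = BS_64X64;
  *max_bs = BS_8X8;
  if (above != NULL) scan_neighbor(above, min_bs, max_bs);
  if (left != NULL) scan_neighbor(left, min_bs, max_bs);
  if (*min_bs > *max_bs) { /* No neighbors: allow the full range. */
    *min_bs = BS_8X8;
    *max_bs = BS_64X64;
  }
}
```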
Updating implementation of vp9_get_pred_context_single_ref_p1 using
has_second_ref function to make code easier to read.
Change-Id: Ie8f60403a7195117ceb2c6c43176ca9a9e70b909
As the pixel values beyond the image border are duplicates of the pixels
on the edge, this change limits the mv search range. Any mv beyond
the limits no longer produces new or different prediction values,
because the entire block, including the pixels used for subpel
interpolation, is outside the image border.
Change-Id: I4c6fdf06e33c1cef1489f5470ce0fb4e5e01fb79
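The reasoning above can be made concrete in one dimension. The sketch below works in whole pixels and assumes a symmetric interpolation margin; MvRange, useful_mv_range and clamp_mv are hypothetical names, and the encoder's actual border constants and units are not reproduced here.

```c
#include <assert.h>

/* Hypothetical 1-D motion vector range, in whole pixels. */
typedef struct {
  int min;
  int max;
} MvRange;

/* For a block starting at `pos` with `block_size` pixels in a frame of
 * `frame_size` pixels, a prediction reads [pos + mv - ext, pos + mv +
 * block_size - 1 + ext]. Once that whole interval lies at or beyond an
 * edge pixel, every read returns the replicated edge value, so moving
 * further out cannot change the prediction. */
static MvRange useful_mv_range(int pos, int block_size, int frame_size,
                               int interp_extend) {
  MvRange r;
  assert(pos >= 0 && block_size > 0 && interp_extend >= 0);
  assert(pos + block_size <= frame_size);
  r.min = -(pos + block_size - 1 + interp_extend);
  r.max = frame_size - 1 - pos + interp_extend;
  return r;
}

/* Clamp a candidate mv component into the useful range. */
static int clamp_mv(int mv, MvRange r) {
  if (mv < r.min) return r.min;
  if (mv > r.max) return r.max;
  return mv;
}
```

For example, with pos = 16, block_size = 8, frame_size = 64 and a margin of 4, the useful range is [-27, 51]; at mv = -27 the rightmost pixel read is column 0, so any smaller mv yields the same prediction.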
For certain partition sizes, the function pointer may not be initialized
at all. The patch prevents the call if the pointer is not set.
Change-Id: I78b8c3992b639e8799a16b3c74f0973d07b8b9ac
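A guard of the kind described above might look like the sketch below. The struct, the function pointer type and the helper names are hypothetical and only illustrate the pattern of checking a per-block-size function pointer before calling it.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical per-block-size function table entry. */
typedef unsigned int (*block_metric_fn)(const unsigned char *src, int n);

typedef struct {
  block_metric_fn fn; /* May be NULL for sizes that have no implementation. */
} BlockFns;

/* Example metric used for illustration: sum of squared samples. */
static unsigned int sum_squares(const unsigned char *src, int n) {
  unsigned int acc = 0;
  int i;
  for (i = 0; i < n; ++i) acc += (unsigned int)src[i] * src[i];
  return acc;
}

/* Calls the metric only when the pointer is set. Returns 1 and stores the
 * result in *out on success, or 0 (leaving *out untouched) otherwise. */
static int try_block_metric(const BlockFns *fns, const unsigned char *src,
                            int n, unsigned int *out) {
  assert(fns != NULL && out != NULL);
  if (fns->fn == NULL) return 0;
  *out = fns->fn(src, n);
  return 1;
}
```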
This commit enables early termination in the rate-distortion
optimization search loop for chroma components. When the cumulative
rd cost is above the current best value, skip the remaining per-block
transform/quantization/coeff_cost steps and continue to the next
prediction mode.
For bus_cif at 2000 kbps, the average run-time goes down from
168546ms -> 164678ms, (2% speed-up) at speed 0
36197ms -> 34465ms, (4% speed-up) at speed 1
Change-Id: I9d3043864126e62bd0166250d66b3170d520b3c0
Updating all foreach_transformed_block_visitor functions to work with
plane block size instead of general block. Removing a lot of duplicated
code.
Change-Id: I6a9069e27528c611f5a648e1da0c5a5fd17f1bb4
This change set is intermediate. The next one will remove all repetitive
plane_bsize calculations, because it will be passed as argument to
foreach_transformed_block_visitor.
Change-Id: Ifc12e0b330e017c6851a28746b3a5460b9bf7f0b
The intent was to initialize the deltas for the
segment to the computed value, irrespective of mode
and reference frame if (mode_ref_delta_enabled == 0).
(In response to bug posted by Manjit Hota to codec-devel
and webm-discuss lists)
Change-Id: I10435cb63d0f88359bb4c14f22181878a1988e72
Return the distortion value in vp9_rd_pick_intra_mode_sb as sum of
dist_y and dist_uv. Remove the right shift operation on dist_uv,
and make it consistent with that of vp9_rd_pick_inter_mode_sb.
Change-Id: I9d564e242d9add38e32595d33b0e0dddb1d55e5b
When the frame size changes the last frame segment map must
be resized to match and initialized to 0.
Change-Id: Idc10de109f55dbe9af3a6caae355a2974712243d
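A sketch of the resize-and-clear behaviour described above, using a hypothetical context struct with one byte per mode-info unit. The field and function names are assumptions; only the idea of reallocating to the new size and zeroing the map comes from the commit message.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical holder for the previous frame's segment map. */
typedef struct {
  unsigned char *last_frame_seg_map;
  int mi_rows;
  int mi_cols;
} SegMapCtx;

/* Ensures the map matches the new frame dimensions. On a size change the
 * old map is discarded and the new one starts out as all zeros.
 * Returns 0 on success and -1 if allocation fails. */
static int resize_seg_map(SegMapCtx *ctx, int mi_rows, int mi_cols) {
  unsigned char *map;
  assert(ctx != NULL && mi_rows > 0 && mi_cols > 0);
  if (ctx->last_frame_seg_map != NULL && ctx->mi_rows == mi_rows &&
      ctx->mi_cols == mi_cols)
    return 0; /* Size unchanged: keep the existing map. */
  map = (unsigned char *)calloc((size_t)mi_rows * (size_t)mi_cols, 1);
  if (map == NULL) return -1;
  free(ctx->last_frame_seg_map);
  ctx->last_frame_seg_map = map;
  ctx->mi_rows = mi_rows;
  ctx->mi_cols = mi_cols;
  return 0;
}
```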
This commit makes the rate-distortion optimization search of chroma
components consistent across all block sizes. It removes redundant
code.
Change-Id: I7e76f54d045e8efdd41d84a164c71f55b484471b
VP9_COMMON is the right place for the segmentation struct because it has
global segmentation parameters, not something specific to macroblock
processing.
Change-Id: Ib9ada0c06c253996eb3b5f6cccf6a323fbbba708
Adds a speed feature to disable split partition search based on a
given threshold on the source variance. A tighter threshold derived
from the threshold provided is used to also disable horizontal and
vertical partitions.
Results on derfraw300:
threshold = 16, psnr = -0.057%, speedup ~1% (football)
threshold = 32, psnr = -0.150%, speedup ~4-5% (football)
threshold = 64, psnr = -0.570%, speedup ~10-12% (football)
Results on stdhdraw250:
threshold = 32, psnr = -0.18%, speedup is somewhat more than derf
because of a larger number of smoother blocks at higher resolution.
Based on these results, a threshold of 32 is chosen for speed 1,
and a threshold of 64 is chosen for speeds 2 and above.
Change-Id: If08912fb6c67fd4242d12a0d094783a99f52f6c6
This commit unifies the rate-distortion cost calculation process of
luma and chroma components. It allows early termination to be enabled
later in the rd search loop of chroma components, consistent with
luma pixels.
Change-Id: I2e52a7c6496176bf2a5e3ef338d34ceb8aad9b3d
write_ivf_file_header would incorrectly skip writing the file header in
the 2nd pass, causing the initial frame header to be overwritten on
close, potentially causing an overly large frame header to be read and a
crash.
most likely broken since:
9e50ed7 vpxenc: initial implementation of multistream support
fixes issue #585
Change-Id: I7e863e295dd6344c33b3e9c07f9f0394ec496e7b
Making foreach_transformed_block_in_plane more clear (it's not finished
yet). Using explicit tx_size variable consistently instead of
(ss_txfrm_size / 2) or (ss_txfrm_size >> 1) expression.
Change-Id: I1b9bba2c0a9f817fca72c88324bbe6004766fb7d
The macro block mode info context originally contained an
entry for each 16x16 macroblock. In VP9 each entry refers
to an 8x8 region not a macro block, so the naming is misleading.
This first stage clean up changes the names of 3 entries in the
structure to remove the mb_ prefix.
TODO clean up the nomenclature more widely in respect of
mbmi and bmi.
Change-Id: Ia7305c6d0cb805dfe8cdc98dad21338f502e49c6
The conversion was done with the help of the checkbashisms script
and https://wiki.ubuntu.com/DashAsBinSh .
Change-Id: Id64ecefb35c8d72302f343cd2ec442e7ef989d47
Don't do vertical or horizontal splits if subsize < min_partition_size,
except for edge blocks where it makes sense.
Change-Id: I479aa66ba1838d227b5de8312d46be184a8d6401
Enable SSE2 implementation of high precision 32x32 forward DCT. The
intermediate stacks are 32 bits wide. The run-time goes down from
32126 cycles to 13442 cycles.
Change-Id: Ib5ccafe3176c65bd6f2dbdef790bd47bbc880e56
Adding function build_inter_predictors_for_planes to build inter
predictors for specified planes. This function makes it possible to remove
the "#if CONFIG_ALPHA" condition and use MAX_MB_PLANE for the general case.
Renaming 'which_mv' local var to 'ref', and 'weight' argument to 'ref'.
Change-Id: I1a97160c9263006929d38953f266bc68e9c56c7d
Invert loops to operate vertically in the inner loop. This allows
removing redundant loads.
Also add preloading of data.
Change-Id: I4fa85c0ab1735bcb1dd6ea58937efac949172bdc
Each iteration of the horizontal loop reuses 7 of the 11 source
values. Loading only the 4 new values saves some time.
Also add preload for source data.
Overall 4% faster on Chromebook.
Change-Id: I8f69e749f2b7f79e9734620dcee51dbfcd716b44
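The 7-of-11 reuse mentioned above follows from producing 4 outputs per iteration with an 8-tap filter: 4 + 8 - 1 = 11 inputs are needed, and the next group of 4 outputs shares 7 of them. The scalar C sketch below only illustrates that sliding window; it is not the optimized code the commit changes, it omits rounding and clipping, and all names are assumptions.

```c
#include <assert.h>
#include <string.h>

#define TAPS 8
#define OUT_PER_ITER 4
#define WINDOW (TAPS + OUT_PER_ITER - 1) /* 11 source values per iteration */

/* Computes dst[x] = sum over k of src[x + k] * filter[k] for x in
 * [0, width). width must be a multiple of 4 and src must provide
 * width + 7 samples. Each iteration keeps 7 values and loads 4 new ones. */
static void convolve_horiz_sliding(const int *src, int *dst, int width,
                                   const int filter[TAPS]) {
  int window[WINDOW];
  int x, i, k;
  assert(width > 0 && width % OUT_PER_ITER == 0);
  /* Prime the tail of the window with the first 7 samples. */
  memcpy(window + OUT_PER_ITER, src, (TAPS - 1) * sizeof(int));
  for (x = 0; x < width; x += OUT_PER_ITER) {
    /* Reuse: move the 7 still-needed samples to the front. */
    memmove(window, window + OUT_PER_ITER, (TAPS - 1) * sizeof(int));
    /* Load only the 4 new samples. */
    memcpy(window + TAPS - 1, src + x + TAPS - 1, OUT_PER_ITER * sizeof(int));
    for (i = 0; i < OUT_PER_ITER; ++i) {
      int sum = 0;
      for (k = 0; k < TAPS; ++k) sum += window[i + k] * filter[k];
      dst[x + i] = sum;
    }
  }
}
```

In SIMD code the window would live in registers, so the saving shows up as fewer load instructions rather than fewer memcpy calls.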
'skippable' can remain unset and negatively affect later decisions.
Addresses one aspect of issue #599.
Change-Id: Iffdf0ac2e49ac481c27dc27c87fa546d4167bb28
Loop filter configuration doesn't belong to macroblock, so moving it from
MACROBLOCKD to VP9_COMMON. Also moving the declaration of loopfilter struct
from vp9_blockd.h to vp9_loopfilter.h.
Change-Id: I4b3e34be9623b47cda35f9b1f9951f8c5b1d5d28
The memset sets 16 bytes rather than the correct size of the
final array dimension (MAX_MODE_LF_DELTAS).
(In response to bug posted by Manjit Hota to codec-devel
and webm-discuss lists)
Change-Id: I8980f5aa71ddc9d7ef57c5b4700bc28ddf8651c7
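The class of bug described above is usually avoided by letting sizeof track the array dimension instead of hard-coding a byte count. In the sketch below only the name MAX_MODE_LF_DELTAS comes from the commit message; its value, the other dimensions, the struct and the function are assumptions for illustration.

```c
#include <assert.h>
#include <string.h>

#define MAX_SEGMENTS 8       /* Assumed value for illustration. */
#define MAX_REF_FRAMES 4     /* Assumed value for illustration. */
#define MAX_MODE_LF_DELTAS 2 /* Name from the commit; value assumed. */

/* Hypothetical table of filter levels per segment, reference and mode. */
typedef struct {
  unsigned char lvl[MAX_SEGMENTS][MAX_REF_FRAMES][MAX_MODE_LF_DELTAS];
} LfInfo;

/* Sets every mode entry for one (segment, reference) pair. Using sizeof on
 * the innermost array writes exactly MAX_MODE_LF_DELTAS bytes, whereas a
 * hard-coded 16 would spill into the neighbouring entries. */
static void fill_ref_levels(LfInfo *lfi, int seg, int ref,
                            unsigned char level) {
  assert(lfi != NULL);
  assert(seg >= 0 && seg < MAX_SEGMENTS);
  assert(ref >= 0 && ref < MAX_REF_FRAMES);
  memset(lfi->lvl[seg][ref], level, sizeof(lfi->lvl[seg][ref]));
}
```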
The mixed use of the double type and SIMD code caused invalid values to be
stored in double variables, which in turn caused unit tests to fail. The
failures were only observed on the x86-win32-vs9 build with vs2008.
Change-Id: If0131754a3bf217a5ace303b7963e8f5162c34b5
Adds a new subpel motion estimation function that uses a 2-level
decision tree to eliminate redundant computations.
It searches fewer points than iterative search (which can search
the same point multiple times) but has roughly the same quality.
This is made the default setting at speeds 0 and 1, while at
speed 2 and above only a 1-level search is used.
Also includes various cleanups for consistency and redundancy removal.
Results:
derf: +0.012% psnr
stdhd: +0.09% psnr
Speedup of about 2-3%
Change-Id: Iedde4866f5475586dea0f0ba4cb7428fba24eee9
Different partitionings were not being evaluated against
best_rd and there were unnecessary calls to RDCOST. This
could have resulted in a non-optimal partitioning being
selected.
I simplified the variables used to track the rate,
distortion and RD values throughout the function.
Change-Id: Ifa7085ee80d824e86791432a5bc6d8fea5a3e313
Using block width and block height instead of their logarithms. Using
SUBPEL_BITS and SUBPEL_SHIFTS constants instead of magic numbers.
Change-Id: I4e10e93c907c8a5e1cb27dfe74d1fcdcc4995448
The low precision 32x32 fdct has all the intermediate steps within
16-bit depth, hence allowing faster SSE2 implementation, at the
expense of larger round-trip error. It was used in the rate-distortion
optimization search loop only.
Using the low precision version in place of the high precision one
affects the compression performance by about 0.7% (derf, stdhd) at
speed 0. For speed 1, it lowers the derf set by only 0.017%.
Change-Id: I4e7d18fac5bea5317b91c8e7dabae143bc6b5c8b
Removing the old one bsize_from_dim_lookup. Now we have a way to determine
block size for plane using its subsampling values (ss_size_lookup). And
then we can find the number of pixels in the block (num_pels_log2_lookup).
Change-Id: I6fc981da2ae093de81741d3d78eaefed11015db9
Functions scale_mv_q4 and scale_mv_q3_to_q4 were almost identical except
q3->q4 conversion in scale_mv_q3_to_q4. Now q3->q4 conversion happens
directly in vp9_build_inter_predictor.
Also adding useful constants: SUBPEL_BITS and SUBPEL_MASK.
Change-Id: Ia0a6ad2ac07c45fdf95a5139ece6286c035e9639
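The q3 to q4 conversion and the constants named above can be illustrated as follows. The constant names come from the commit messages; the value 4 for SUBPEL_BITS (1/16-pel precision) is an assumption consistent with a q4 representation, and the Mv struct and helper functions are hypothetical.

```c
#include <assert.h>
#include <stddef.h>

#define SUBPEL_BITS 4 /* Assumed: q4 means 4 fractional bits. */
#define SUBPEL_SHIFTS (1 << SUBPEL_BITS)
#define SUBPEL_MASK (SUBPEL_SHIFTS - 1)

/* Hypothetical stand-in for a motion vector. */
typedef struct {
  int row;
  int col;
} Mv;

/* q3 (1/8 pel) to q4 (1/16 pel): one extra fractional bit, so the value
 * doubles. Multiplication is used so negative components stay defined. */
static Mv mv_q3_to_q4(Mv q3) {
  Mv q4;
  q4.row = q3.row * 2;
  q4.col = q3.col * 2;
  return q4;
}

/* Splits a q4 component into whole-pixel and fractional parts. Note that
 * right-shifting a negative int is implementation-defined in C; this
 * sketch assumes the usual arithmetic shift. */
static void split_q4(int v, int *whole, int *frac) {
  assert(whole != NULL && frac != NULL);
  *whole = v >> SUBPEL_BITS;
  *frac = v & SUBPEL_MASK;
}
```

Named constants make the intended precision visible at each use, which is the readability gain the commits describe over literal shifts and masks.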
Reduce the delta loop filter for blocks that are cyclically refreshed.
This helps to reduce the dot artifacts that may happen
when zero_mv blocks are repeatedly loop-filtered.
This change, along with the fix in:
https://gerrit.chromium.org/gerrit/#/c/40409/
helps to reduce this artifact, but cannot remove the dot artifacts completely.
Change-Id: I44675e7a0f59295b648a3b7d4956fb301231a97f
2013-01-11 16:46:09 -08:00
609 changed files with 91622 additions and 88321 deletions