In the decoder we don't need to save eobs, we can pass eob as an argument.
That's why removing eob arrays from VP9Decompressor and TileWorkerData,
and moving eob pointer from macroblockd_plane to macroblock_plane.
Change-Id: I8eb919acc837acfb3abdd8319af63d1bbca8217a
This commit fixes the intra prediction reference source selection
in the settings of skip_encode. Use original boundary pixels as
prediction reference, when the inverse transform and reconstruction
are skipped in the per block size rate-distortion optimization loop.
Change-Id: I36081aa30aa46e203e0e6f4e8a420fd08269469a
We only need qcoeff buffers in the encoder. Reducing TileWorkerData struct
and VP9Decompressor struct sizes by 24K.
Change-Id: Id148868461f7ffa3d3dd634b371503ae9c57e207
Simplifies the code by implementing band mapping with static arrays.
A lot of the code complexity introduced in a previous patch
disappears.
Change-Id: Ia3fac36e594fb5ad2d55ae141c58bba4c55c2d28
Overall change (using dual buffer scheme for superblocks of both inter
and intra modes) reduces speed 2 runtime:
bluesky_1080p at 6000kbps: 263553ms -> 257441ms
riverbed_1080p at 8000kbps: 233230ms -> 225308ms.
Change-Id: Idf8d70f768a4b0d97b2a8506372c57b7b4022119
Implements scan order to band map with arrays in both the encoder
and decoder to remove conditional statements.
Encoding seems to be about 1% faster at speed 0, tested on football.
Decoding seems to be about 0.5-1% faster on a set of 25 videos.
Change-Id: Idb233ca0b9e0efd790e30880642e8717e1c5c8dd
This commit enables the dual buffer rate-distortion optimization
and encoding scheme. It stacks the original transform coefficients,
quantized levels, and reconstructed coefficients, in the rate-
distortion optimization search process, hence eliminates the need
to re-run residual generation, forward transform, and quantization
in the encoding stage.
Change-Id: I011bfad3a59a380a869ee552e91dae0394ec492e
We only used "ib" to call get_scan() function, which in turn calls
get_tx_type_4x4() function. The latter one only needs block index if
bsize < BLOCK_8X8 -- under that condition raster_block == block.
Change-Id: I697306a0c3cf937acdd4f5e623d4367c5acc0b2f
The term x represents macroblock pointer across encode_block. Change
the two local variable names to avoid confusion.
Change-Id: Ic732e73023525d673c0a678ed2708ac1edf5a3f9
This commit makes zcoeff_blk cache the case where the entire block
is quantized to be zero (without applying zero-forcing) in the rate-
distortion optimization loop, and skip the forward DCT, quantization,
inverse DCT, and reconstruction process in the encode_block stage.
It now works for all the block sizes, including sub8x8 blocks.
Change-Id: I5ae60a9c436ba3637d11666733554bec4580ef98
Adding these functions to encapsulate tx_type check. Changing TX_TYPE to
int to match the declaration in vo9_rtch.h.
Change-Id: I6f3a2df6e35595ca73b6aaa9e3909ee7bc3fd16f
The encode_block for pass 1 takes simpler functionalities and can
save a few branches. The main reason is to make encode_block only
used after running rate-distortion optimization search in pass 2,
hence allowing dual buffer stack approach later.
Change-Id: I9e549ffb758e554fe185e48a07d6e0e01e475bcf
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: I0ba3c52513a5fdd194f1e7e2901092671398985b
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: Ibc944952a192e6c7b2b6a869ec2894c01da82ed1
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: I2d95fdcbba96aaa0ed24a80870cb38f53487a97d
Just making fdct consistent with iht/idct/fht functions which all use
stride (# of elements) as input argument.
Change-Id: Id623c5113262655fa50f7c9d6cec9a91fcb20bb4
* changes:
Use a separate MODE_INFO stream for each tile column
Get rid of "this_mi", use "mi_8x8[0]" everywhere instead
Make the static_segmentation feature work again
The only case where they were intentionally pointing to different
structures was in mbgraph, and this didn't have the expected behavior
because both of these pointers are used interchangeably through the code
Change-Id: I979251782f90885fe962305bcc845bc05907f80c
Moving code that gets band_translate array from get_scan_and_band()
function to get_band_translate() function. Renaming get_scan_and_band() to
get_scan().
Change-Id: I43047c205a1ca2a6e24be44db39dc04b7a385008
We already have itxm_add member in MACROBLOCKD structure. Both
inv_txm4x4_1_add and inv_txm4x4_add are just its special cases for
different eob values. But eob logic is already implemented in
vp9_iwht4x4_add and vp9_idct4x4_add (that's why also removing
inverse_transform_b_4x4_add).
Change-Id: I80bec9b6f7d40c5e5033c613faca5c819c3e6326
Moving functions from vp9_idct_blk to vp9_idct because these functions are
used from both encoder and decoder. Removing duplicated code from
vp9_encodemb.c and reusing existing functions.
Change-Id: Ia0a6782f8c4c409efb891651b871dd4bf22d5fe8
This commit enables forcing all coefficients zero per transformed
block, when its rate-distortion cost is lower than regular coeff
quantization.
The overall performance improvement (including its parent patch on
calculating rd cost per transformed block) at speed 1:
derf: 0.298%
yt: 0.452%
hd: 0.741%
stdhd: 0.006%
Change-Id: I66005fe0fd7af192c3eba32e02fd6d77952accb5
mode_info_context was stored as a grid of MODE_INFO structs.
The grid now constists of pointers to MODE_INFO structs. The
MODE_INFO structs are now stored as a stream (decoder only),
eliminating unnecessary copies and is a little more cache
friendly.
Change-Id: I031d376284c6eb98a38ad5595b797f048a6cfc0d
The 16x16 transform unit test suggested that the peak coefficient
value can reach 32639. This could cause potential overflow issue
in the SSSE3 implmentation of 16x16 block quantization. This commit
fixes this issue by replacing addition with saturated addition.
Change-Id: I6d5bb7c5faad4a927be53292324bd2728690717e
mode_info_context was stored as a grid of MODE_INFO structs.
The grid now constists of a pointer to a MODE_INFO struct and
a "in the image" flag. The MODE_INFO structs are now stored
as a stream, eliminating unnecessary copies and is a little
more cache friendly.
For the test clips used, the decoder performance improved
by ~4.3% (1080p) and ~9.7% (720p).
Patch Set 2: Re-encoded clips with latest. Now ~1.7% (1080p)
and 5.9% (720p).
Change-Id: I846f29e88610fce2523ca697a9a9ef2a182e9256
Updating all foreach_transformed_block_visitor functions to work with
plane block size instead of general block. Removing a lot of duplicated
code.
Change-Id: I6a9069e27528c611f5a648e1da0c5a5fd17f1bb4
This change set is intermediate. The next one will remove all repetitive
plane_bsize calculations, because it will be passed as argument to
foreach_transformed_block_visitor.
Change-Id: Ifc12e0b330e017c6851a28746b3a5460b9bf7f0b
Making foreach_transformed_block_in_plane more clear (it's not finished
yet). Using explicit tx_size variable consistently instead of
(ss_txfrm_size / 2) or (ss_txfrm_size >> 1) expression.
Change-Id: I1b9bba2c0a9f817fca72c88324bbe6004766fb7d
The low precision 32x32 fdct has all the intermediate steps within
16-bit depth, hence allowing faster SSE2 implementation, at the
expense of larger round-trip error. It was used in the rate-distortion
optimization search loop only.
Using the low precision version, in replace of the high precision one,
affects the compression performance by about 0.7% (derf, stdhd) at
speed 0. For speed 1, it makes derf set down by only 0.017%.
Change-Id: I4e7d18fac5bea5317b91c8e7dabae143bc6b5c8b
This commit provides special handle on 16x16 inverse 2D-DCT, where
only DC coefficient is quantized to be non-zero value.
Change-Id: I7bf71be7fa13384fab453dc8742b5b50e77a277c
This allows us to increment the position at the band-level only as
we go from one band to the next; more importantly, that allows us to
use an add instead of multiply instruction, and omit the instruction
altogether if the band doesn't change from one coef to the next, thus
being slightly faster (probably more noticeable on systems where a
multiply is expensive, like arm).
Change-Id: I4343fe35b9f9a47fa00b217bdcbf5f91ff96c381
This commit brought back the shortcut implementation of 8x8/16x16
inverse 2D-DCT. When the eob <= 10, it skips the inverse transform
operations on row 4:7/4:15 in the first round. For bus_cif at 1000
kbps, this provides about 2% speed-up at speed 0.
Change-Id: I453e2d72956467d75be4ad8c04b4482ab889d572
This commit enables a special handle for the 8x8 inverse 2D-DCT,
where only DC coefficient is quantized to be non-zero. For bus_cif
at 2000 kbps, it provides about 1% speed-up at speed 0.
Change-Id: I2523222359eec26b144cf8fd4c63a4ad63b1b011