Avoid scaling the references if they have already been scaled.
Change only affects 1 pass non-svc mode for now.
Change-Id: I204f4079c026cba7adce7a7f855d072f6139ccec
The RD and load save/code grabs it as groups of four. In practice there
is no change to physical allocations becaquse this is backed by a 16-byte
memalign.
Change-Id: I01e89769872300e23227e03dd24a6e229f482025
Add vpx_dsp_rtcd.h to the header file list. The od_bin_fdct8x8()
here depends on forward 8x8 2D-DCT.
Change-Id: I1d71edc71f07069808823d2445c1cafd285e1b94
This commit factors out common macro definitions from the forward
and inverse transform implementations into vpx_dsp. It removes
the duplicate macro definitions from encoder and decoder folders.
Change-Id: I92301acbd3317075e9c5f03328a25abb123bca78
This commit factors the 4x4, 8x8, and 16x16 2D-DCT forward
transform operations into vpx_dsp folder.
Change-Id: I084b117b79c0925edcbcabb93f62b9f4bf8dbe7d
vp9_itrans*_dspr2.c aren't necessary for high bitdepth builds and
notably vp9_itrans8_dspr2.c fails in various configurations using a
codesourcery toolchain:
vp9_itrans8_dspr2.c:31:5: can't find a register in class 'GR_REGS' while reloading 'asm'
Change-Id: I2ac76203e65cc643cb835ab50e95701896d92a1a
This commit limits the scope of 1-D DCT and ADST functions within
vp9_dct.c and makes them static. This largely clears out the cross
referencing issue between vp9_dct.c and the SIMD optimizations.
Change-Id: If7cac478b11bb32328ccf70a9f60b709dad43d7f
The SSE2 version high bit-depth forward hybrid transforms are
essentially using the C functions via cross referencing to 1-D
functions in vp9_dct.c. This commit unifies the two versions and
removes the unnecessary dependency.
Change-Id: Ib4d0702a138f8daf7d0bd97c141ee7088f293765
Separate the hybrid transform case from 2D-DCT case. This will
allow us to clear up cross dependency between c and SIMD
implementations later.
Change-Id: Iaa499e8b096850a1c5a0c50a3b6e63e15d0184bf
The following quantization functions were moved:
vp9_quantize_b
vp9_quantize_b_32x32
vp9_highbd_quantize_b
vp9_highbd_quantize_b_32x32
vp9_quantize_dc
vp9_quantize_dc_32x32
vp9_highbd_quantize_dc
vp9_highbd_quantize_dc_32x32
The purpose of doing that was to allow these functions to be shared
by multiple codecs.
Change-Id: Id8ab939f283353cdd07bd930d47db3d932a5d87f
This commit moves the loop filter dspr2 implementation from vp9 to
vpx_dsp directory. It also fixes header file format issues.
Change-Id: I09203ed4bd267d7fd76bb79a6ee84a37646206b2
Remove the use of drop_frames_water_mark, as this is used for
frame dropping control. Use fixed threshold for now on buffer underflow.
Change-Id: If0ddda9f7f6fa96067cdcb0eccb42e17bda37c32
For vp9 decoder build without profile 2 and profile 3 support, this
commit changes to report error "Unsupported bitstream profile" for
input streams in profile 2 or 3, rather than other misleading error
information.
In addition, one of the invalid files in unit tests is actually coded
profile 2, this commit makes it tested only when the decoder is built
with vp9-highbitdepth.
This fixes issue #1028.
Change-Id: I8b6c1210787c8f89c703a546687dcf973ac20fc0
The various tap loop filter operations are common functions across
codec. This commit moves them along with SIMD optimizations to
vpx_dsp folder.
Change-Id: Ia5fa0b2e5289cdb98467502a549c380b9c60e92c
In aq-mode=3 under a resizing action (i.e., resize_pending != 0),
force an update of the golden reference frame.
Change-Id: I14806f6db71b5f8c827678cc5e1fc913c138a9a4
Fix bug in setting this flag for animated content.
The bug did cause quality to increase because far
more frames are not boosted than boosted.
However, the speed trade off to gain is a lot less
favorable and the behavior was not as intended.
Change-Id: I89fb70419c88b26f40b3534de0481730a1b3fcfa
Move the clamp functions to vpx_dsp_common.h file. Clear out the
dependency of vp9_loopfilter_filters.c on vp9_common.h file.
Change-Id: I9c4b928bcd7f597106b5aa96354356d3775a3431
Use drop_frames_water_mark for threshold on buffer underflow,
and change threshold for resize down.
Change-Id: I2de19adce50abe9bcdc0b107528cec8cc1857fcc
The fast scaling for 1 pass mode was being used only on the
first frame after resizing event (because resize_scale_num/den
is set to 1 and only changed for first frame following resize event).
Change-Id: I723b63e21823eb858f25f5662d2bbe4f1842e61f
Proper use/update of resize_state and resize_pending to constrain
the total amount of downsizing to be at most one scale down, for now.
Change-Id: Id18fc32499f2fbdbec16728dcdc9e4eac09098f0
From Change Ibf0c30b72074b3f71918ab278ccccc02a95a70a0
There is still an issue relating to one animated test clip with repeat
patterns where this change effectively increase the default maximum
arf interval by +1. This can be examined seperately.
Change-Id: Idd01d5480fc45202d8a059a0c3afc0997cc5bdd1
Flaten the intra block decoding process. It removes the legacy
foreach_transformed_block use in the decoder. This saves cycles
spent on retrieving the transform block position.
Change-Id: I21969afa50bb0a8ca292ef72f3569f33f663ef00
This commit simplifies the intra block boundary condition logic.
It removes the block index from the argument set.
Change-Id: If00142512eb88992613d6609356dfd73ba390138
Eliminates the byte by byte read from bool decoder, by reading
in a size_t and then shifting it into place.
Change-Id: Id89241977103fc3b973e4ed172a5cbf246998e5d
The clamp calls with INT32_MIN and INT32_MAX have no effect at all on
int values passed in, therefore this commit removes those effectless
clamps and also adds more const intermediate results to make the code
more readable.
Change-Id: I66d8811f58bb74ec31cbec9a6c441983a662352e
Rework the inter mode transform block decoding loop. Replace the
block index with the row and col index as the input argument. It
saves function call to compute the row and col index according to
the block index and overall block size, and many if statements
associated with the transform block position relative to the coding
block. For the test bit-stream pedestrian_area 1080p at 5 Mbps,
the decoding speed goes up from 81.13 fps to 81.92 fps.
Note that the intra coded block decoding needs more refactoring
work than the inter ones. So keep it using foreach_transforme_block
as for now.
Change-Id: I5622bdae7be28ed5af96693274057f55ba9b4fb4
The encoder gets its dqcoeff from the context tree. In the decoder move
it to directly after MACROBLOCKD.
Change-Id: I46c9b76f26956a360d17de0b26ecb994dae34ecb
Even if the recode loop is not enabled for the current frame type
trap the case where the projected size of a a frame is above the
maximum allowed in recode_loop_test()
Change-Id: I453004694b8f8699e3c2a83252e9f83adccdda4e
Changes to allow more use of rectangular partitions at
speeds 1 and 2 for content classed by the first pass as
animation and for blocks near the active image edge.
This has quite a big impact in quality for the animated
test sequence but also hurts encode speed for speed 2.
For other content types the impact on both speed and
quality is small.
Added some plumbing for detection of internal vertical
image edges.
Change-Id: I3fc48de2349f8cb87946caaf0b06dbb0ea261a9a
Change speed features / behavior for split mode when there
is an internal active edge (e.g. formatting bars).
Remove some threshold constraints in rd code near the active
edge of the image.
Add some plumbing for left and right active edge detection.
Patch set 5. Limit rd pass through for sub 8x8 to internal active edges.
This takes away any speed penalty for most clips but keeps the enhanced
edge coding for the more critical case of internal image edges
Change-Id: If644e4762874de4fe9cbb0a66211953fa74c13a5
Replace block index with transform type in the argument list. This
allows to save an extra fetch to the prediction mode. For pedestrian
area 1080p coded at 5 Mbps with single tile, the average decoding
speed goes up from 80.55 fps (before the refactoring series) to
81.13 fps.
Change-Id: Icbebf84ce63c19c0c92f3690ed201f6c3eab7881
If the pre-selected partition size (from variance partition) is
32x32, also apply nonrd partition search for 32x32 and 16x16 size.
Overall small positive gain in metrics, average ~1%.
Some visual improvement, for lower resolutions.
Change-Id: I69cb425bda94f7d13d34c451ab30e9276335a30e
The decoding process handles detokenization and reconstruction per
transform block sequentially. There is no need to offset the dqcoeff
buffer according to the transform block index. This allows to
reduce the memory spill and improve cache performance.
Change-Id: Ibb8bfe532a7a08fcabaf6d42cbec1e986901d32d
inline the code directly in read_mv_component(), the only place where it
was being used; this removes a function call in a hot function
Change-Id: I66f99c0c9ce3bc310101dbca4a470f023cc6fb55
Adds two new vp9 parameters --min-gf-interval and --max-gf-interval
to enable testing based on frequency of alt-ref frames.
Also adds a unit-test to test enforcement of min-gf-interval.
For both these parameters the default value is 0, which indicates
they are picked by the encoder, based on resolution and framerate
considerations. If they are greater than zero, the specified
parameter is honored.
(Additional note by paulwilkins)
Note that there is a slight oddity in that key frames are also GFs and
considered part of GF only group. However they are treated as not
being part of an arf group because for arf groups the previous GF is
assumed to be the terminal or overlay frame for the previous group.
(end note)
Change-Id: Ibf0c30b72074b3f71918ab278ccccc02a95a70a0
This reverts commit a42df86c03.
this change causes MSA/VP9SubpelVarianceTest.Ref and
MSA/VP9SubpelVarianceTest.ExtremeRef failures under
mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu
Change-Id: I40b71a0b774eaeb31f66f795733f95cf360909f7
This reverts commit 61774ad1c4.
this change causes MSA/VP9SubpelAvgVarianceTest.Ref failures under
mips32r5el-msa-linux-gnu and mips64r6el-msa-linux-gnu
Change-Id: I7fb520c12b2a3b212d5e84b7619a380a48e49bb0
The vp9_lpf_vertical_16_dual function optimized for x86 32bit target. The hot code in that function was caused by the call to the transpose8x16.
The gcc generated assembly created uneeded fills and spills to the stack. By interleaving 2 loads and unpack instructions, in addition to hoisting the consumer
instruction closer to the producer instructions, we eliminated most of the fills and spills and improve the function-level performance by 17%.
credit for writing the function as well as finding the root cause goes to Erik Niemeyer (erik.a.niemeyer@intel.com)
Change-Id: I6173cf53956d52918a047d1c53d9a673f952ec46
Added code to reduce the minimum partition size searched
for super blocks at or straddling the edge of the image.
If the first pass has detected formatting bars the "active" edge
may not be the real edge.
Change-Id: I9c4bdd1477e60f162a75fac95ba6be7c3521e05c
Correct the ARF boost calculations to partly discount
inactive or very low energy regions of the image.
Examples (formatting bars and 0 energy areas of animated clips).
Change-Id: I241af058d10aba8c67a4deca36deb913047d4561
This commit moves the primitive multi-threading files from vp9
folder to vpx_thread, which will be accessible by all vpx codec.
Change-Id: Ib51e66e9c69801c10631fab56d35a0c0aaed5883
the max value of the lookup in expanded form is:
(((1 << 7) - 1) << 1) - 65 + 1 + 64 = 254
remove the clamp [0, 253] and add one table entry
Change-Id: I0b5d0c66702fdb0b8f1cc9ab9b0dac66326e85a6
to MB_MODE_INFO_EXT. This saves 36 bytes per 8x8 area for
both the decoder and encoder. (encoder has two MODE_INFO
buffers)
Change-Id: If006abb2224acaf326df3c2be09e77e967662107
Only do the check for resizing if the feature is selected
(i.e., resize_mode = RESIZE_DYNAMIC).
And modify condition for checking to be resize_count >= window,
(since framerate can change).
Change-Id: Idceb4e50956bb965a1492b4993b0dcb393c9be4d
Reduce boost for segment#2 for low bitrates and low-res.
This change is to reduce the rate overshoot at low bitrates.
No change in behavior, except at the very low bitrates.
Change-Id: I0dbd9d3b6356da5804de94adf10fca6a7a8f8948