vp9_variance16x16(), and vp9_get16x16var().
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~16.7%.
Change-Id: Ib163aa99f56e680194aabe00dacdd7f0899a4ecb
currently the only way to know if multiple alt-refs are enabled is to
inspect the encoder instance.
this reduces the size of the allocation by 75% when not using multiple
alt-refs
Change-Id: Ie4baa240c2897e64b766c6ad229674884b5a65b6
When building with runtime cpu detect assume that armv7 targets can be
relied upon to have at least armv6 support. This may allow dead code
detectors to remove some _c functions.
Change-Id: Iaec4414011fcbbdf6f4ed0d90ef4a8fe8af540b5
This commit replace the repetitive retrieve of max and min allowed
partition from speed_feature with local variables max_size and
min_size.
Change-Id: Ib06f11f16615e4876e4dd5fb6a968c6bf5f7b216
The get_chessboard_index() used to call the entire VP9_COMMON
struct pointer to retrieve the chessboard pattern index. This cl
makes it call the frame index directly.
Change-Id: I3cad9d209ea2e77a358085a04fe1ff0ddec5ba03
Remove the redundant index computation when store the first
pass block-wise statistics. Currently, a single byte is
allocated for a 16x16 blocks, and all the frame statistics
saved during the first pass will be kept in memory for use
in the second pass. For a 1920x1080 300-frame clip, it will
take about 2.3 MB memory. This feature is off in current
setting.
Change-Id: I135a95b348ec093d54c6a07e1e8237626909e3bd
Remove all the redundant dct functions (dct4x4, dct8x8)
in avx2 except dct32x32 those functions were copied originally from dct_sse2
Change-Id: I742576fbf5175f3ac09f2076976a9247b259323e
The issue was introduced by commit g9f37d14 with adding explicit
restrictions on reference-frame scale factors. The restriction
is checked against aligned-by-8 frame dimensions, not against
original ones. So, for example, frame of 35×35 actually can refer
to frame of 70×70, but the new check won't allow this. It will
compare 35 vs 72 (not 70), so 2x downscale limit will be exceeded.
Change-Id: Ic663693034440f64ac8312cbff9e1e773a921060
The partition search for 4x4 blocks takes unnecessary steps to
reconstruct pixels and an extra partition type update. This commit
removes such operations. No visible compression/speed difference.
Thanks to Yue (yuec@) for finding this issue.
Change-Id: I3f83824aa3fd3717d63be0b280fa57258939a70a
The code fails the unit test. Speed comparisons to the C are invalid
because the code frequently didn't correctly extend the right and
bottom portions of the frame.
Reduce maximum frame size on ARM devices to avoid OOM
Change-Id: Ia664c86406f0bb8120fd7ad401f32d0bd44994fb
The source buffer is an aligned buffer in VP9. Added the alignment
to make it consistent with libvpx.
Change-Id: I3ebb9d2e8555ed532951da479dd5cbbb8812e02d
This commit turns on the existing vp9_get_prob function using
64 bit in the intermediate step. It fixes the ioc issue for 4K
above frame sizes (issue 828).
Change-Id: I9f627f3beca2c522f73b38fd2a3e7eefdff01a7c
The assignment of the variable mode_excluded in
vp9_rd_pick_inter_mode_sub8x8 takes redundant conditional jump.
This commit removes it.
Change-Id: Ie195fbe6e54ec2ade7093d562c456a2e93143704
Ensure consistent border extension by rounding uv_crop_* at image
creation time. Where it was rounded problems could arise with the right
and bottom extensions.
When padding = 32, y_width = 64, and y_crop_width = 63:
(padding + width - crop_width + 1) / 2
32 + 64 - 63 + 1 should equal 32 *but*
32 + 1 + 1 equals 34 giving a right buffer of 17 instead of 16.
By calculating uv_crop_* earlier we round up at the appropriate time and
for the same values:
(y_crop_width + 1) / 2
63 + 1 / 2
64
(padding / 2) + uv_width - uv_crop_width
16 + 16 - 16
16
Change-Id: If866cd1b63444771440edb1432280ac83875969b