Also tweaks to other features and experiments with
what is on and off at different speed settings.
Change-Id: I3e1d0be0d195216bf17c2ac5df67f34ce0b306b2
The aligned array in the parameter list caused the win32 build to
report error C2719. This commit fixes the issue by making the parameter
type a pointer instead of an array.
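
A minimal sketch of the pattern, not the code touched by this commit: the macro ALIGNED16 and the functions sum_coeffs and demo_sum are hypothetical names chosen for illustration. The alignment request stays on the buffer definition, and the function takes a plain pointer.

```c
#include <assert.h>
#include <stddef.h>
#include <stdint.h>

/* Hypothetical alignment macro, similar in spirit to the codec's own. */
#if defined(_MSC_VER)
#define ALIGNED16(decl) __declspec(align(16)) decl
#else
#define ALIGNED16(decl) decl __attribute__((aligned(16)))
#endif

/* Before (the shape that 32-bit MSVC rejects with C2719):
 *   int sum_coeffs(ALIGNED16(const int16_t coeffs[4]), int count);
 * After: the parameter is an ordinary pointer with no alignment request. */
static int sum_coeffs(const int16_t *coeffs, int count) {
  int i, sum = 0;
  assert(coeffs != NULL);
  for (i = 0; i < count; ++i) sum += coeffs[i];
  return sum;
}

/* The caller still owns an aligned buffer; only the signature changed. */
static int demo_sum(void) {
  ALIGNED16(int16_t coeffs[4]) = { 1, 2, 3, 4 };
  assert(((uintptr_t)coeffs & 15) == 0);
  return sum_coeffs(coeffs, 4);
}
```

Callers are unaffected because an array argument decays to a pointer anyway; only the declaration changes.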
Change-Id: I4ed654ce4eba2db4995d9cdc136c68e9a6acc992
Each frame we reset all adaptive thresholds to MAX
rather than base. As modes are picked their thresholds
drop down.
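
A rough sketch of the idea, under stated assumptions: the struct, the function names, and in particular the halving rule used when a mode is picked are hypothetical. The commit only says that thresholds start each frame at MAX and drop as modes are picked.

```c
#include <assert.h>
#include <limits.h>
#include <stddef.h>

#define NUM_MODES 4 /* hypothetical mode count */

typedef struct {
  int base[NUM_MODES];   /* per-mode floor */
  int thresh[NUM_MODES]; /* adaptive threshold used during mode search */
} ModeThresholds;

/* At the start of each frame every threshold goes to MAX, not to base. */
static void reset_for_new_frame(ModeThresholds *t) {
  int i;
  assert(t != NULL);
  for (i = 0; i < NUM_MODES; ++i) t->thresh[i] = INT_MAX;
}

/* When a mode is picked its threshold drops. Halving is an assumed rule;
 * the threshold never goes below the mode's base value. */
static void on_mode_picked(ModeThresholds *t, int mode) {
  int lowered;
  assert(t != NULL && mode >= 0 && mode < NUM_MODES);
  lowered = t->thresh[mode] / 2;
  t->thresh[mode] = lowered > t->base[mode] ? lowered : t->base[mode];
}
```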
Change-Id: Ia37f03a73003c2d9bfcda57edea07205e9a0e5e8
Renamed cpi->sf.first_step to cpi->sf.reduce_first_step_size
and changed its meaning such that it is a delta applied to
reduce the default first step size (>> x) in the motion search
rather than an absolute value.
The default first step size is already changed according to the image
dimensions (smaller for smaller images). cpi->sf.reduce_first_step_size
now applies a further correction from the default.
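
The relationship can be sketched as follows. This is an illustration only: default_first_step and its dimension rule are hypothetical stand-ins for the encoder's real default, and only the right shift by the delta reflects what the message describes.

```c
#include <assert.h>

/* Hypothetical default: smaller images get a smaller first step. */
static int default_first_step(int width, int height) {
  int min_dim = width < height ? width : height;
  int step = 128; /* assumed upper bound */
  while (step > 1 && step > min_dim / 4) step >>= 1;
  return step;
}

/* reduce_first_step_size is a delta: it shifts the default down further
 * instead of replacing it with an absolute value. */
static int first_step_size(int width, int height, int reduce_first_step_size) {
  int step;
  assert(reduce_first_step_size >= 0 && reduce_first_step_size < 31);
  step = default_first_step(width, height) >> reduce_first_step_size;
  return step > 0 ? step : 1;
}
```

With a delta of 0 the search uses the default first step unchanged; each increment halves it.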
Change-Id: Ia94e08bc24c67b604831f980909af7e982fcd16d
The part where we align it by 8 or 16 is an implementation detail that
shouldn't matter to the outside world.
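
One way to picture this, as a sketch with hypothetical names (FrameBufferInfo, frame_buffer_init, align_up): the caller passes the real width, and the rounding to a multiple of 8 happens inside the allocator.

```c
#include <assert.h>
#include <stddef.h>

typedef struct {
  int width;         /* what the caller asked for */
  int aligned_width; /* internal detail, rounded up */
} FrameBufferInfo;

/* Round value up to the next multiple of a power-of-two alignment. */
static int align_up(int value, int alignment) {
  assert(alignment > 0 && (alignment & (alignment - 1)) == 0);
  return (value + alignment - 1) & ~(alignment - 1);
}

/* Callers never compute the aligned size themselves. */
static void frame_buffer_init(FrameBufferInfo *fb, int width) {
  assert(fb != NULL && width > 0);
  fb->width = width;
  fb->aligned_width = align_up(width, 8);
}
```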
Change-Id: I9edd6f08b51b31c839c0ea91f767640bccb08d53
Speeds up encoding the first 50 frames of bus @ 1500kbps from 3min22.7 to
3min18.2, i.e. 2.3% faster. In addition, use the sub_pixel_avg functions to calc
the variance of the averaging predictor. This is slightly suboptimal
because the function is subpixel-position-aware, but it will (at least
for the SSE2 version) not actually use a bilinear filter for a full-pixel
position, thus leading to approximately the same performance compared to
if we implemented an actual average-aware full-pixel variance function.
That gains another 0.3 seconds (i.e. encode time goes to 3min17.4), thus
leading to a total gain of 2.7%.
Change-Id: I3f059d2b04243921868cfed2568d4fa65d7b5acd
This commit enables 8x8 DCT and hybrid transform unit tests. It
also tunes the forward hybrid transform rounding operations for
more precise round-trip performance.
Change-Id: If05c1ce59d75d641b9c6c91527d02d3a6ef498c3
This commit uses the butterfly structure to enable an SSE2
implementation of the 8x8 ADST/DCT hybrid transform.
The runtime of the hybrid transform module goes down from 1170 cycles
to 245 cycles. The overall speed-up is around 1.5%.
Change-Id: Ic808ffd21ece8a9d0410d8c0243d7b6c28ac3b3f
This reduced the size of the MODE_INFO array (mip and prev_mip)
by 425,568 bytes each for 1080p resolutions.
Change-Id: Ifa513ec2d0a49e8ec0867ec90620762fb7f1261d
For cases where there's no transform set in bit 0 (the left edge of
the SB) but bit 0 of mask_4x4_int is set (the edge 4 pixels from the
left edge needs filtering), it was incorrectly being skipped before.
This situation only happens on the leftmost edge of the image, as
the edge at column 0 is intentionally skipped since there aren't
pixels to the left to read.
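
The failure mode can be illustrated with a simplified column loop. This is a sketch, not the loop filter itself: count_interior_edges and check_interior_only_column are hypothetical, and the function only counts interior 4x4 edges instead of filtering pixels. The point is that the loop condition must include mask_4x4_int, otherwise a column whose only set bit is the interior edge is never visited.

```c
#include <assert.h>

/* Walk the columns from left to right, one mask bit per column.
 * A loop driven only by the transform-edge masks (16x16 | 8x8 | 4x4)
 * would stop before a column that has just an interior edge set. */
static int count_interior_edges(unsigned int mask_16x16, unsigned int mask_8x8,
                                unsigned int mask_4x4,
                                unsigned int mask_4x4_int) {
  unsigned int mask;
  int filtered = 0;
  for (mask = mask_16x16 | mask_8x8 | mask_4x4 | mask_4x4_int; mask;
       mask >>= 1) {
    if (mask_4x4_int & 1) ++filtered; /* edge 4 pixels into this column */
    mask_16x16 >>= 1;
    mask_8x8 >>= 1;
    mask_4x4 >>= 1;
    mask_4x4_int >>= 1;
  }
  return filtered;
}

/* Column 0 has no transform edge bit set, only the interior edge. */
static void check_interior_only_column(void) {
  assert(count_interior_edges(0, 0, 0, 0x1) == 1);
}
```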
Change-Id: Ib2fbbcb40166e90af31b1a0e13b85b68c226cbd3