My fault, that was a float (not integer) which was converted to int64_t.
This reverts commit a885e1cbf0
Change-Id: Ic50708b959e1c3cb3e37da1429d334fafc3391d6
In the full-rd transform size search, we go through all transform
sizes to choose the one with best rd score. In this patch, an
early termination is added to stop the search once we see that the
smaller size won't give better rd score than the larger size. Also,
the search starts from largest transform size, then goes down to
smallest size.
A speed feature tx_size_search_breakout is added, which is turned off
at speed 0, and on for other speeds. The transform size search is
turned on at speed 1.
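The early termination described above can be sketched roughly as follows. This is a minimal illustration, not the encoder's actual code: the function name choose_tx_size, the TX_*_SKETCH constants, and the precomputed rd[] array are all assumptions made for the example, and the real breakout condition may be more involved.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical transform size indices, ordered smallest to largest. */
enum {
  TX_4X4_SKETCH = 0,
  TX_8X8_SKETCH,
  TX_16X16_SKETCH,
  TX_32X32_SKETCH,
  TX_SIZES_SKETCH
};

/* rd[n] is an assumed, precomputed RD cost for size n (lower is better).
 * The search starts at the largest allowed size and walks down. With
 * breakout enabled, it stops as soon as a smaller size fails to beat
 * the best cost seen so far. */
static int choose_tx_size(const int64_t rd[TX_SIZES_SKETCH], int max_tx_size,
                          int breakout) {
  assert(max_tx_size >= 0 && max_tx_size < TX_SIZES_SKETCH);
  int best_tx = max_tx_size;
  int64_t best_rd = rd[max_tx_size];
  for (int n = max_tx_size - 1; n >= 0; --n) {
    if (rd[n] < best_rd) {
      best_rd = rd[n];
      best_tx = n;
    } else if (breakout) {
      break; /* smaller size is no better: skip the remaining sizes */
    }
  }
  return best_tx;
}
```

With breakout on, the result can differ from the exhaustive search whenever the costs are not monotonic across sizes, which is the trade-off the speed feature accepts.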
Borg test results:
1. At speed 1,
derf set: psnr gain: 0.618%, ssim gain: 0.377%;
stdhd set: psnr gain: 0.594%, ssim gain: 0.162%;
No noticeable speed change.
2. At speed 2,
derf set: psnr loss: 0.157%, ssim loss: 0.175%;
stdhd set: psnr loss: 0.090%, ssim loss: 0.101%;
speed gain: ~4%.
Change-Id: I22535cd2017b5e54f2a62bb6a38231aea4268b3f
This commit enables the encoder to record the location of the
center frame used to generate the alternate reference frame. This
allows it to skip checking the prediction modes of other reference
frame types when encoding this frame.
The speed 3 runtime is reduced for the test sequences:
bus at CIF 1000 kbps, 9791 ms -> 9446 ms, i.e., 3.5% speed-up,
pedestrian at 1080p 2000 kbps, 184043 ms -> 175730 ms, i.e., 4.5%
speed-up.
No compression performance change observed.
Change-Id: Iacfde3bcc1445964e7a241f239bd6ea11cb94bd1
This reverts commit 5509b7fd8f
Observed a big drop in compression quality and speed at speed 1 for a 360p test clip. Reverting for now pending investigation.
Change-Id: If69dc8d77a225b34dc7907a9472e1a7a0a22762d
Add a speed feature that tightens the partition search
range. Before the partition search, calculate the histogram
of the partition sizes of the left, above and previous
co-located blocks of the current block. If the variance of
the observed partition sizes is small enough, narrow the search
range to a tighter window around the mean partition size.
The feature is currently turned on at speed 2. Experiments on
sample youtube clips show on average the runtime is reduced
by 3-7%.
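The range adjustment described above could look something like the sketch below. It is only an illustration under stated assumptions: block sizes are represented as small integer indices, and the names SizeRange, tighten_range, var_thresh and margin are hypothetical, not taken from the encoder.

```c
#include <assert.h>
#include <stddef.h>

/* Hypothetical inclusive range of block-size indices. */
typedef struct {
  int min_size;
  int max_size;
} SizeRange;

/* hist[s] counts how many neighboring blocks (left, above, co-located)
 * used size index s. If the variance of those sizes is at most
 * var_thresh, return [mean - margin, mean + margin] clamped to the wide
 * range; otherwise keep the wide range. */
static SizeRange tighten_range(const int *hist, int n_sizes, SizeRange wide,
                               long var_thresh, int margin) {
  assert(hist != NULL && n_sizes > 0 && margin >= 0);
  long count = 0, sum = 0, sum_sq = 0;
  for (int s = 0; s < n_sizes; ++s) {
    count += hist[s];
    sum += (long)hist[s] * s;
    sum_sq += (long)hist[s] * s * s;
  }
  if (count == 0) return wide; /* no neighbors observed */
  /* variance = sum_sq/count - (sum/count)^2, compared without dividing */
  if (count * sum_sq - sum * sum > var_thresh * count * count) return wide;
  const int mean = (int)((sum + count / 2) / count); /* rounded mean */
  SizeRange r;
  r.min_size = mean - margin < wide.min_size ? wide.min_size : mean - margin;
  r.max_size = mean + margin > wide.max_size ? wide.max_size : mean + margin;
  return r;
}
```

Comparing the scaled quantities avoids floating point; the threshold and margin values would need tuning against the speed/quality numbers reported for the feature.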
For hard stdhd clips:
park_joy_1080p @ 15000kbps: 509251 ms -> 491953 ms (3.3%)
pedestrian_area_1080p @ 2000kbps: 223941 ms -> 214226 ms (4.3%)
The PSNR performance is changed:
derf: -0.112%
yt: -0.099%
hd: -0.090%
stdhd:-0.102%
Change-Id: Ie205ec5325bf92ec5676c243e30ba9d0adca10f2
At --good and speed 3 or above, for resolutions below 720p, this
disables the tests for 64x64 intra prediction modes. Encoding time
reduction is about 1%.
Change-Id: Ib396e3d1417fece416e3f0fee929b128acbb130f
The test to determine if the mode info buffers need
to be resized when the frame size changes was
incorrect, as per bug 837.
By storing the size of the allocated data structure,
a simple test determines whether to allocate more
memory when the frame size changes.
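A minimal sketch of the idea, assuming a simplified buffer: the struct MiBuffersSketch, its field names, and ensure_mi_capacity are hypothetical stand-ins, and the element type is reduced to a byte for illustration.

```c
#include <assert.h>
#include <stdlib.h>

/* Hypothetical stand-in for the mode info storage. */
typedef struct {
  unsigned char *mi;    /* mode info entries (simplified to bytes) */
  size_t mi_alloc_size; /* number of entries currently allocated */
} MiBuffersSketch;

/* Grow the buffer only when the new frame needs more entries than were
 * allocated. Returns 0 on success, -1 on allocation failure. Old
 * contents are discarded on regrowth in this sketch. */
static int ensure_mi_capacity(MiBuffersSketch *b, int mi_rows, int mi_cols) {
  assert(b != NULL && mi_rows > 0 && mi_cols > 0);
  const size_t needed = (size_t)mi_rows * (size_t)mi_cols;
  if (needed <= b->mi_alloc_size) return 0; /* allocation is big enough */
  unsigned char *p = calloc(needed, sizeof(*p));
  if (p == NULL) return -1;
  free(b->mi);
  b->mi = p;
  b->mi_alloc_size = needed;
  return 0;
}
```

Because the test compares against the recorded allocation size rather than the previous frame dimensions, a frame that changes shape but needs no more entries reuses the existing memory.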
Change-Id: I1544698f2882cf958fc672485614f2f46e9719bd
In the encoder, current_video_frame is used in a couple of places to
decide the encoding strategy. This commit replaces those uses with more
appropriate variables.
Change-Id: I3d3d8d8e2ea02c489e4639b9d4c446a63e357d29
This commit moves the simplified coefficient probability model
and costing update to speed 4, and turns on chessboard pattern
mode search for sub 720p sequences. The overall coding performance
of speed 3 is improved:
derf 0.889%
stdhd 1.744%
The speed 3 runtime for the test sequences is improved:
bus cif at 1000 kbps 9823 ms -> 9642 ms
pedestrian 1080p 2000 kbps 189559 ms -> 183284 ms
Change-Id: Iecbc7496a68f31fd49fb09f8dfd97c028d675a5d
This commit enables the encoder to skip NEARMV and ZEROMV if the
above and left blocks have the same reference frame, and the
current reference frame differs from it. It reduces the runtime
of speed 3 for test sequences:
bus cif at 1000 kbps 10064 ms -> 9823 ms
pedestrian 1080p at 2000 kbps 193078 ms -> 189559 ms
The compression performance is changed by
derf -0.085%
stdhd -0.103%
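The skip condition described in this commit can be sketched as a small predicate. The enum RefFrameSketch and the function name are hypothetical, and treating an unavailable neighbor as "do not skip" is an assumption of this example rather than something the commit states.

```c
#include <assert.h>
#include <stdbool.h>

/* Hypothetical reference frame labels. */
typedef enum {
  REF_INTRA_SKETCH = 0,
  REF_LAST_SKETCH,
  REF_GOLDEN_SKETCH,
  REF_ALTREF_SKETCH
} RefFrameSketch;

/* Returns true when NEARMV and ZEROMV can be skipped for cur_ref:
 * both neighbors exist, they agree on one reference frame, and the
 * reference being tested is a different one. */
static bool skip_near_and_zero_mv(bool above_available,
                                  RefFrameSketch above_ref,
                                  bool left_available,
                                  RefFrameSketch left_ref,
                                  RefFrameSketch cur_ref) {
  /* NEARMV and ZEROMV are inter modes, so cur_ref must be inter. */
  assert(cur_ref != REF_INTRA_SKETCH);
  if (!above_available || !left_available) return false;
  if (above_ref != left_ref) return false;
  return cur_ref != above_ref;
}
```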
Change-Id: If304f26d42e6412152a84c3dd7b02635c38444f4
This commit allows the encoder to check the above and left neighbor
blocks' reference frames and motion vectors. If they are all
consistent, skip checking the NEARMV and ZEROMV modes. This is
enabled in speed 3. The coding performance is improved:
pedestrian area 1080p at 2000 kbps,
from 74773 b/f, 41.101 dB, 198064 ms
to 74795 b/f, 41.099 dB, 193078 ms
park joy 1080p at 15000 kbps,
from 290727 b/f, 30.640 dB, 609113 ms
to 290558 b/f, 30.630 dB, 592815 ms
Overall compression performance of speed 3 is changed
derf -0.171%
stdhd -0.168%
Change-Id: I8d47dd543a5f90d7a1c583f74035b926b6704b95
The function is called only once, right after all stats counters are
reset to 0. Therefore all the computations have zero effect on return
values. This commit removes that ineffective code.
Change-Id: I50d27c0802547921fa36c60aa4bd92d76247f595
At speed 6 the smallest partitioning was 16x16 and the biggest
intra block was 8x8, essentially disallowing all intra blocks,
which produces ugly artifacts when revealing new video.
Change-Id: I364042d4c64e09be0666ade64aac94d0a1b586cf
This commit enables the encoder to select the fast forward transform
and quantization path according to the prediction residual sse/variance,
in the rate-distortion optimization scheme.
Change-Id: Ief9fc3844fd4107166d401970e800c6e5ce2b5fe
We had a very complicated way to initialize cpi->pass from
cfg->g_pass:
    switch (cfg->g_pass) {
      case VPX_RC_ONE_PASS:
        oxcf->mode = ONE_PASS_GOOD;
        break;
      case VPX_RC_FIRST_PASS:
        oxcf->mode = TWO_PASS_FIRST;
        break;
      case VPX_RC_LAST_PASS:
        oxcf->mode = TWO_PASS_SECOND_BEST;
        break;
    }
    cpi->pass = get_pass(oxcf->mode).
Now pass is moved to VP9EncoderConfig and initialization is simple:
    switch (cfg->g_pass) {
      case VPX_RC_ONE_PASS:
        oxcf->pass = 0;
        break;
      case VPX_RC_FIRST_PASS:
        oxcf->pass = 1;
        break;
      case VPX_RC_LAST_PASS:
        oxcf->pass = 2;
        break;
    }
Change-Id: I8f582203a4575f5e39b071598484a8ad2b72e0d9
Eliminated instructions by using better NEON instructions
and rearranging the loop.
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~1.0%.
Change-Id: I6b1700e79318f647ea67ef25e954c308932950ec
Replaced encoder and decoder functions to get a pointer
to a reference frame with a common function, vp9_get_ref_frame,
and simplified it.
Change-Id: Icb206fcce8caace3bfd1db3dbfa318dde79043ee
In the sub_pixel_*variance* functions, dst is aligned to 16 bytes, not
to 32 bytes, so the data is now loaded with unaligned loads.
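As a portable illustration of why a 16-byte guarantee is not enough for a 32-byte aligned load, the sketch below checks pointer alignment directly. The helper is_aligned is hypothetical and is not part of the codec; it only shows that an address can satisfy the 16-byte requirement while failing the 32-byte one.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdint.h>

/* Returns true if p is a multiple of alignment, which must be a
 * nonzero power of two. */
static bool is_aligned(const void *p, uintptr_t alignment) {
  assert(alignment != 0 && (alignment & (alignment - 1)) == 0);
  return ((uintptr_t)p & (alignment - 1)) == 0;
}
```

For example, an address 16 bytes past a 32-byte boundary passes is_aligned(p, 16) and fails is_aligned(p, 32), so any load that requires 32-byte alignment has to be replaced with its unaligned counterpart.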
Change-Id: I2e0b9745543697efc56fefa32857ea10117af135
Fix the interaction between active map and reuse_inter_pred_sby. The
reuse_inter_pred_sby feature expects inter predictors to already be
built, but blocks with active map on skip this step.
Change-Id: Ibb2bf0d228f678935d82a0ede9cb0919ab7c8878
In the functions sad32x32x4d and sad64x64x4d, the source is aligned to
16 bytes, not to 32 bytes, so the load is now unaligned.
Change-Id: I922fdba56d0936b5cf72e4503519f185645a168c
Specifies the bit-depth, color sampling and colorspace
for intra-only frames for profiles > 0.
Also adds checks to ensure that profiles 1 and 3 are
exclusively used for non-420 streams.
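The profile check can be sketched as below. This is an illustration only: the function names are hypothetical, and it reads "exclusively" as applying in both directions (profiles 1 and 3 require non-420, profiles 0 and 2 require 4:2:0), which is an interpretation rather than a quote from the patch.

```c
#include <assert.h>
#include <stdbool.h>

/* 4:2:0 means chroma is subsampled by 2 both horizontally and
 * vertically. */
static bool is_420(int subsampling_x, int subsampling_y) {
  return subsampling_x == 1 && subsampling_y == 1;
}

/* Returns true when the profile and the chroma sampling agree:
 * profiles 1 and 3 carry non-420 content, profiles 0 and 2 carry
 * 4:2:0 content. */
static bool profile_matches_sampling(int profile, int subsampling_x,
                                     int subsampling_y) {
  assert(profile >= 0 && profile <= 3);
  const bool non420_profile = (profile == 1 || profile == 3);
  return non420_profile != is_420(subsampling_x, subsampling_y);
}
```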
Change-Id: Icfb15fa1acccbce8f757c78fa8a2f60591360745
This commit integrates the fast transform and quantization process
into skip_recode scheme in the rate-distortion optimization loop.
Previously the fast transform and quantization process was only
enabled for non-RD coding flow.
Change-Id: Ib7db4d39b7033f1495c75897271f769799198ba8
vp9_rb_bytes_written -> vp9_wb_bytes_written
+ move limits.h from the header to the source file where it's needed
Change-Id: Ifcdc856b4d4dcc2fff555ef11f86c86a0d83dab3
This patch allows the encoder to split the block directly
in the partition search, skipping the search of NONE. It
computes a score that measures whether the 16x16 motion vectors
from the first pass in the current block are consistent with
each other. If they are inconsistent and we have enough Q
to encode, split the block directly, and skip searching NONE.
This feature is under flag CONFIG_FP_MB_STATS. In speed 2,
it further gives a speedup of 3-8% on sample yt clips as
compared to the previous version under the same flag. Overall,
the features under the flag give a 7-15% speedup on typical yt
clips at up to 6000kbps data rate. The speedup at very high
data rate is not significant.
For hard stdhd clips:
park_joy_1080p @ 15000kbps: 504541ms -> 506293ms (-0.35%)
pedestrian_area_1080p @ 2000kbps: 326610ms -> 290090ms (+11.2%)
The compression performance using the features under the flag:
derf: -0.068%
yt: -0.189%
hd: -0.318%
stdhd:-0.183%
To use the feature, set CONFIG_FP_MB_STATS and turn on
cpi->use_fp_mb_stats.
Change-Id: Iad58a2966515c8861aa9eb211565b1864048d47f
This code was being called from two places and was
difficult to parse. I factored it into a
function to improve readability.
Change-Id: I154b8fe0b84e6c01e69601e78e67bd47c954d8b6
Re-organize the one-byte structure for 16x16 first pass
block. Add bits to indicate motion vector directions.
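One way such direction bits could be packed into a single byte is sketched below. The bit positions, the FP_SKETCH_* names, the thresh parameter and the function pack_mv_direction are all hypothetical; the actual layout of the first pass stats byte is not given here.

```c
#include <assert.h>
#include <stdint.h>

/* Hypothetical bit layout; real flag names and positions may differ. */
#define FP_SKETCH_MV_LEFT 0x01u
#define FP_SKETCH_MV_RIGHT 0x02u
#define FP_SKETCH_MV_UP 0x04u
#define FP_SKETCH_MV_DOWN 0x08u

/* Sets a direction bit for each motion vector component whose
 * magnitude exceeds thresh. Negative rows point up and negative
 * columns point left, following image coordinates. */
static uint8_t pack_mv_direction(int mv_row, int mv_col, int thresh) {
  assert(thresh >= 0);
  uint8_t stats = 0;
  if (mv_col < -thresh) stats |= FP_SKETCH_MV_LEFT;
  if (mv_col > thresh) stats |= FP_SKETCH_MV_RIGHT;
  if (mv_row < -thresh) stats |= FP_SKETCH_MV_UP;
  if (mv_row > thresh) stats |= FP_SKETCH_MV_DOWN;
  return stats;
}
```

Four bits cover the directions and leave the remaining bits of the byte free for other per-block flags.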
Change-Id: Id10754ba343dfc712c7fed5bcc85c67fa0bbcb89
vp9_variance8x8(), and vp9_get8x8var().
On a Nexus 7, vpxenc (in realtime mode, speed -12)
reported a performance improvement of ~1.2%.
Change-Id: I8a66ac2a0f550b407caa27816833bdc563395102