This patch introduces the concept of dual inter16x16 prediction. A
16x16 inter-predicted macroblock can use 2 references instead of 1,
where both references use the same mvmode (new, near/est, zero). In the
case of newmv, this means that two MVs are coded instead of one. The
frame can be encoded in 3 ways: all MBs single-prediction, all MBs dual
prediction, or per-MB single/dual prediction selection ("hybrid"), in
which case a single bit is coded per-MB to indicate whether the MB uses
single or dual inter prediction.
In the future, we can (maybe?) get further gains by mixing this with
Adrian's 32x32 work, per-segment dual prediction settings, or adding
support for dual splitmv/8x8mv inter prediction.
Gain (on derf-set, CQ mode) is ~2.8% (SSIM) or ~3.6% (glb PSNR). Most
gain is at medium/high bitrates, but there's minor gains at low bitrates
also. Output was confirmed to match between encoder and decoder.
Note for optimization people: this patch introduces a 2nd version of
16x16/8x8 sixtap/bilin functions, which does an avg instead of a
store. They may want to look and make sure this is implemented to
their satisfaction so we can optimize it best in the future.
Change-ID: I59dc84b07cbb3ccf073ac0f756d03d294cb19281
Resolved or factored out some further issues with Q index.
Put in a 3rd order polynomial instead of less accurate power function
as the best fit on gf and kf boost adjustment.
Added avg_q value to use instead of ni_av_qi.
Compute segment delta Q values based on avg_q.
Fixed bug in adjust_maxq_qrange().
The extended range Q on the derf set, using standard data rates
(which do not extend high enough to get big benefits) still show
a shortfall of between 0.5 and 1% though so there would appear to
be further issues that need to be tracked down.
Change-Id: Icfd49b9f401906ba487ef1bef7d397048295d959
This commit added code to keep track of separate entropy contexts for
normal frames and alt ref frames. The underly assumption was that the
two type of frames have different entropy characteristics given they
typically have quite different quantization levels. By keeping entropy
contexts separate, it helps the entropy context distribution to be more
closely adapted to each frame type.
Tests on derf set showed a good and very consistent gain on all clips
on all metrics, avg psnr: 0.89%, overall psnr: 0.84% and ssim 0.93%.
http://www.corp.google.com/~yaowu/no_crawl/mulcontext.html
Change-Id: I15bc9697f6ff7829042911fe0c62930585d7e65d
This commit enabled the usage of 8x8 intra prediction modes on inter
frames. There are a few TODO items related to this: 1)baseline entropy
need be calibrated; 2)cost of UV need to be done more properly rather
than using decision only relying on Y; 3)Threshold for allowing picking
8x8 intra prediction should be lowered to lower than the B_PRED.
Even with all the TODOs, tests showed consistent gain on derf set ~0.1%
(PSNR:0.08% and SSIM:0.14%). It is assumed that 8x8 intra prediction
will help more on large resolution clips, especially with above TODOs
addressed.
Change-Id: I398ada49dfc32575cfab962a569c2885111ae3ba
Fixed some further QIndex related issues and replaced some tables
(eg zbin and rounding)
Also Added function (currently disabled by default) to populate the
main AC and DC quantizer tables. Using the original AC range the
resulting computed DC values give behavior broadly comparable
on the DERF set. That is not to say that the equations will hold good
over a more extended range. The purpose of this code is to make it
easier to experiment with further alterations to the Q range and distribution
of Q values plus the relative weights given to AC and DC.
The function find_fp_qindex() ensures that changes to the Q tables
are reflected in the value passed in to the first pass code.
Slight experimental adjustment to static segment Q offset.
Change-Id: I36186267d55dfc2a3d565d0cff7218ef300d1cd5
this commit is to add an variable in the macroblock level mode
info structure to track the transform size used in each MB, so
the information can be used later in the loop filter to change
how loop filter works on MBs with different transform sizes.
Change-Id: Id0eeaba6cc854c6d1be00ed8d237b3d9e250e447
Slight tweaks to the new minq equations to bring results more into line with
original lookup tables.
Change-Id: I969fc87d95912df549b6775e83ee2345e84d4da0
Fixed bug in firspass.c call to vp8_initialize_rd_consts()
This was passing in vp8_dc_quant(cm->base_qindex, cm->y1dc_delta_q)
instead of (cm->base_qindex + cm->y1dc_delta_q).
It just so happens that for the value 26 used for cm->base_qindex in the
unextended Q case, the two give similar results. However, when using
the extended Q range the two are very different.
Also added more stats output and partly disabled another broken feature.
Change-Id: Iddf6cf5ea8467c44b7c133f38e629f6ba6f2581e
This is an experiment to include a mv contribution from last frame to
nearest and near mv definition. Initial test showed some small though
consistent gain.
latest patch slightly better result ~.13%-~.18%.
TODO: the entropy used to encode the mode choice, i.e. the mv counts
based conditional distribution of modes should be re-collected to
reflect this change, it is expected that there is some further gain
from that.
Change-Id: Ief1e284a36d8aa56b49ae5b360c91419ec494fa4
This comitt brings accross changes from the public branch
commit number Icf74d13af77437c08602571dc7a97e747cce5066.
The main puurpose of this comit relates to CQ mode but it
also includes some refactoring of the two pass code which
I hope will make tuning the experimental branch for the new
quantizer range a little less painfull.
Change-Id: I278e989436a928fc1fe7761068960048f9d7a376
This commit resolves further QIndex look up tables to facilitate
experimentation with the quantizer range.
In some cases rather than remove the look up tables completely
I have created functions that are called once to populate them
using a formulaic approach base on the actual quantizer.
The use of these functions based on best fit of data from the original
tables does affect the results on some clips but across the derf test
set the effect was broadly neutral.
Change-Id: I8baa61c97ce87dc09a6340d56fdeb681b9345793
One of the problems arising when tweaking or adjusting the quantizer
tables is that there are a lot of look up tables that depend on the QINDEX.
Any adjustment to the link between QINDEX and real quantizer therefore tends
to break aspects of for example the rate control.
In this check in I have replaced several of the look up tables with functions that
approximate the same results as the old Q luts but use a formulaic approach
based on real Q values rather than QIndex. This should hopefully make it easier
to experiment with changes to the Q tables without always having to go through
and hand optimize a set of look up tables. Once things stabilize we may choose
to re-instate luts for the sake of performance.
Patch 2:
Addressed Ronald's comments.
vp8_init_me_luts() Added so luts only initialized once.
Change-Id: Ic80db2212d2fd01e08e8cb5c7dca1fda1102be57
Corrected dc lookup table to maintain ac/dc balance
close to what it was previously.
Firstpass not being passed the adjusted Q index for
the extended range.
Change-Id: Ic0200dabda445fea03bf81067999cb2670e99b77
Fix decoder segmentation bug for temporal coding where the segment map
was first initialized on a key frame.
in vp8_kfread_modes() after reading the segment id it must be written to
the pbi->segmentation_map[] for use in temporal coding on subsequent frames.
Change-Id: I1489305efc376564e734a216f69c2844646ee3d3
The buffer level was able to increase indefinitely rather than
being clipped to the maximum buffer size specified by the user.
This change checks the buffrer level and prevents it from
going beyond the upper limit of the buffer.
Change-Id: Ifff55f79d3c018e4d3d77e554b11ada543cc1654
This commit has a few minor fixes to the 8x8 trellis quant, so to
make it work regardless if extend_qrange is enabled or not. It also
borrowed adaptive RDMULT constants from 4x4 trellis that was missed
in the 8x8 trellis quant.
Change-Id: I60d7769071f102c699b5084597e62bca87a1f759
Explicit inclusion of limits.h to satisfy unix build for definition of INT_MAX.
Some commented out code removed.
Change-Id: I5b5980dfaa9b4d2d12bfd729cfd35bd982106908
Removal of CONFIGURE_SEGMENTATION ifdefs.
Removal of legacy support code fo the old coding mechanism.
Use local reference "xd" for MACROBLOCKD structure in
encode_frame_to_data_rate()
Moved call to choose_segmap_coding_method() out of encode
loop as the cost of segmentation is not properly accounted
in the loop anyway. If this is desirable in the future it
can be moved back. The use of this function to do all the
analysis and set the probabilities also removes the need
to track segment useage in threading code.
Change-Id: I85bc8fd63440e7176c73d26cb742698f9b70cade
Changed name and sense of segment_flag to "seg_id_predicted"
Added some additional comments and retested.
I also did some experimentation with a spatial prediction option
using a similar strategy to the temporal mode implemented.
This helps in some cases where temporal prediction is bad but
I suspect there is more overlap here with work on a larger scale
block structure and spatial correlation will likely be better
handled through that mechanism.
Next check in will remove #ifdefs and legacy mode code.
Change-Id: I3b382b65ed2a57bd7775ac0f3a01a9508a209cbc
This check in includes quite a lot of clean up and refactoring.
Most of the analysis and set up for the different coding options for the
segment map (currently simple distribution based coding or temporaly
predicted coding), has been moved to one location (the function
choose_segmap_coding_method() in segmenation.c). This code was previously
scattered around in various locations making integration with other
experiments and modification / debug more difficult.
Currently the functionality is as it was with the exception that the
prediction probabilities are now only transmitted when the temporal
prediction mode is selected.
There is still quite a bit more clean up work that will be possible
when the #ifdef is removed. Also at that time I may rename and alter
the sense of macroblock based variable "segment_flag" which indicates
(1 that the segmnet id is not predicted vs 0 that it is predicted).
I also intend to experiment with a spatial prediction mode that can be
used when coding a key frame segment map or in cases where temporal
prediction does not work well but there is spatial correlation.
In a later check in when the ifdefs have gone I may also move the call
to choose_segmap_coding_method() to just before where the bitsream is
packed (currently it is in vp8_encode_frame()) to further reduce the
possibility of clashes with other experiments and prevent it being called
on each itteration of the recode loop.
Change-Id: I3d4aba2a2826ec21f367678d5b07c1d1c36db168
Added last_segmentation_map[] structure
to keep track of what we had before when
doing temporal prediction. With this change
the existing code does once again appear to
be giving a decodable bitstream for both
temporal and standard prediction modes.
However, it is still somewhat messy and
confused and there is no option to take
advantage of spatial prediction so it could
do with further work.
Some housekeeping / clean out.
Change-Id: I368258243f82127b81d8dffa7ada615208513b47
Some initial cleanup to aid testing and debug.
Pull code to choose temporal or spatial encoding
out of encodeframe.c into a dedicated function
in segmentation.c.
For now disable broken temporal mode.
Move the coding of "temporal_update" flag and
only transmit if segment map update is indicated.
Rename the functions read_mb_features() and
write_mb_features() to read_mb_segid() and
read_mb_segid() as they only read and write
the macroblock segment id not any of the
features.
Change-Id: Ib75118520b1144c24d35fdfc6ce46106803cabcf
The dequantizer functions for 2nd order haar block had confusing 8x8
in their names. this commit fixed their name to avoid confusion.
Change-Id: I6ae4e7888330865f831436313637d4395b1fc273
This commit added scaling factors to 8x8 transform, quant, dequant and
inverse transform pipeline to make 8x8 transform to work when configed
with enable-extend_qrange. This commit also disabled the trellis-quant
when extend_qrange is configured.
Change-Id: Icfb3192e4746f70a4bb35ad18b7b47705b657e52
updated the decode_macroblock logic to reflect that 8x8 transform is
not used for "SPLITMV". Also fixed an issue where 2nd order haar block
has wrong dequant/idct process.
Change-Id: I1e373f6535c009dfec503b6362c8a5cfc196e1da
extend_qrange introduces a different scaling factor, this commit takes
the scaling difference into account for reset 2nd order coefficients.
Change-Id: Ie58bca9f52698fa759e3f88da2aa4d82630fa91a
the bug caused the encoder to produce invalid bitstream when configured
with enable_extend_qrange.
Change-Id: I1e81c48b13359d0043cbbd480e679380a2da117c
For ease of testing and merging experiments I have
removed in line code in encode_frame() that assigns
MBs to be t8x8 or t4x4 coded segments and have
moved the decision point and segment setup to
the init_seg_features0 test function.
Keeping everything in one place helps make sure
for now that experiments using segmentation are
not fighting each other.
Also made sure mode selection code can't choose 4x4
modes if t8x8 is selected.
Patch2: In init_seg_features() add checks
for SEG_LVL_TRANSFORM active.
Change-Id: Ia1767edd99b78510011d4251539f9bc325842e3a
Removed code in #if CONFIG_SEGMENTATION that
enables segmentation and creates a test segmentation
map, to avoid conflicts with the other segmentation test
code,
Change-Id: I7a21a44ed188b814cd80b30dd628c62474eba730
Bug fix to logic in vp8_pick_inter_mode() and
vp8_rd_pick_inter_mode().
The block on the use of segment features
for the cm->refresh_alt_ref_frame case
was just for testing and is not correct.
The special case code for alt ref can
be re-enabled as an else clause.
Change-Id: Ic9b57cdb5f04ea7737032b8fb953d84d7717b3ce
The 8x8 forward transform makes use of floating operations, therefore
requires emms call to reset mmx registers to correct state. Without
the resets, the 8x8 forward transform results are indefinite on win32
platform.
Change-Id: Ib5b71c3213e10b8a04fe776adf885f3714e7deb1
logically this commit should NOT change anything, but seems to help
revert the 3DB loss on bowing in the following commit:
https://on2-git.corp.google.com/g/#change,6193
This is still debugging in progress. Need further investigation to
understand the root cause of the issue.
Change-Id: I0b49d1ef3a311dfff58c6acd3eaebdb3bda6257c
Initial attempt at using new segment feature signaling
to indicate 4x4 or 8x8 transform.
needs --enable-experimental --enable-t8x8
Note this is work in progress.
Change-Id: Ib160d46a5d810307bfcbc79853ce1a65b5b870b7
Temporary check in to turn off other segment features
tests when #if CONFIG_T8X8 is set as the assignment of
MBs to differnt segments in each case will conflict.
The 8x8 code will be modified to use the new segment
feature method properly in a later check in.
Increase bits allowed for EOB end stop marker to 6 ready
for 8x8.
Change-Id: I4835bc8d3bf98e1775c3d247d778639c90b01f7f
No change to functionality or output.
Updates to the segment feature data structure now all done
through functions such as set_segdata() and get_segdata()
in seg_common.c.
The reason for this is to make changing the structures (if needed)
and debug easier.
In addition it provides a single location for subsequent addition
of range and validity checks. For example valid combination of
mode and reference frame.
Change-Id: I2e866505562db4e4cb6f17a472b25b4465f01add
This commit tries to do UV intra mode coding adaptive to Y intra mode.
Entropy context is defined as conditional PDF of uv intra mode given
the Y mode. All constants are normalized with 256 to be fit in 8 bits.
This provides further coding efficiency beyond the quantizer adaptive
y intra mode coding. Consistent gains were observed on all clips and
all bit rates for HD all key encoding tests.
To test, configure with
--enable-experimental --enable-uvintra
Change-Id: I2d78d73f143127f063e19bd0bac3b68c418d756a
As discovered in path 10 of Change Ia12acd2f, reset 2nd order coeffs
without reset of above and left coding context may have introduced
problem that causes encoder/decoder mismatching. This commit added
update to coding context when the 2nd coefficients are cleared.
In addition, this commit also introduced early breakout in the checks
to speed up when coefficients are too significant to be cleared.
Change-Id: I85322a432b11e8af85001525d1e9dc218f9a0bd6
Removal of configure #ifdefs so that segment features
always available. Removal of code supporting old
segment feature method.
Still a good deal of tidying up to do.
Change-Id: I397855f086f8c09ab1fae0a5f65d9e06d2e3e39f
Modify reference frame segmentation so that ONE or MORE
reference frames may be marked as a available for a given
segment.
Fixed bugs relating to segment coding of INTRA and some
INTER modes at the segment level.
Modified Q boost for static areas based on ambient average Q.
Strong results now on clips with significant static areas.
(some data points in derf set as high as 9% and some static &
slide show type content in YT set > 20%)
Change-Id: Ia79f912efa84b977f35a23683ae3643251e24f0c
The block of code skipped testing the current mode if the
reference frame is AltRef, the mv is not (0,0) and
ARNR filtering is disabled.
This block of code has already been tested above if the
macro CONFIG_SEGFEATURES is set to 0.
Change-Id: I3f5710bb8270caad06c9a0eee59fa0daf1f70776
The variable this_mode was being used before it had been
initialized.
Moved the line that sets-up this_mode toward the top of the
enclosing loop, prior to its first use. The bug would result in
tests in the loop lagging the mode that was expected to be
tested.
Change-Id: If4e51600449ce6b4285f112da17a44c24b4a19fb
Some correction for entropy impact of segment signaled (EOB and ref frame)
Other slight tweaks.
Derf VBR average gain now over 1% (best over 7%)
One YT test clip has gains of circa 30% (VBR)
There is still an issue with noisy clips where making the background static
and coded with 0,0 can have a negative effect, especially at low Q.
This is probably because of the loss of smoothing by fractional pixel filters.
Change-Id: I7a225613c98067b96f8fc7a7e36f95d465b2b834
Prior to the added rounding, tests on randomly generated data showed
that forward-inverse transform round trip errors are about 3.02/block
for input range [-10,10] and 2.68/block for input range [-256, 255].
The added rounding reduced the errors to 0.031/block for input range
[-10,10] and 0.037/block for input range [-256, 255].
Maximum round trip error on for any pixel position is 1.
The average errors are calculated based on 100,000 blocks of randomly
with the specified ranges.
Paul mentioned in discussion that the change was not clear on why we
need change the rounding, so Patch 2 intends to make the rationale
obvious in code, it merged the two separate shifts into one, and the
two separate rounding factors into one. Patch 1 and 2 have same
numerical test results.
Change-Id: Ic5e2f5463de17253084d8b2398c4a210194b20de
Only encode sign bit for feature data that can have a sign.
Tweaks to the test segmentation rules so that it now actually gives
a net benefit on the derf set of about 0.4% though much higher
on some clips at the low end.
Change-Id: I8e61f1aebf41c9037db7e67e2f8975aa18a0c986
This quite large check in includes the following:
Merge in some code from Ronald (mbgraph.c) that scans a Gf/arf group.
This is used as a basis for a simple segmentation for the normal frames
in a gf/arf group. This code also uses satd functions from Yaowu.
Adds functionality for coding the latest possible position of an EOB for
blocks in the segment. (Currently 0-15 only, hence just for 4x4 dct).
Where the EOB position is 0 this acts like "skip" and the normal coding
of skip at the per mb level is disabled.
Added functions (seg_common.c) for setting and reading segment feature
elements. These may want to be optimized away at some point but while the
mecahnism is in a state of flux they provide a single location for making
changes and keep things a bit cleaner.
This is still proof of concept code. Currently the tested feature set:-
Quantizer,
Loop Filter level,
Reference frame,
Prediction Mode,
EOB end stop.
TBD:-
Add functions for setting and reading the feature data with range
and validity checking.
Handling of signed and unsigned feature data. At the moment all is assumed
to be signed and a sign bit is coded but many cannot be negative.
Correct handling of EOB feature with intra coded blocks.
Testing/trapping of legal/illegal ref frame and mode combinations.
Transform size switch plus merge and test with 8c8 DCT work
Merge and test with Sumans Segmenation coding optimizations
Change-Id: Iee12e83661c7abbd1e0ce6810915eb4ec35e2d8e
When 8x8 transform is enabled, the decoder does an extra reconstruct
on MBs that are coded using 8x8. This commit fixed the logic around
the decoding of mb encoded with 8x8 transform.
Change-Id: I6926557c9ef00eecb375f62946f7e140c660bf6f
Proof of concept test code that encodes mode and reference
frame data at the segment level.
Decode-able bit stream but some issues not yet resolved.
As it this helps a little on a couple of clips but hurts on most as
the basis for segmentation is unsound.
To build and test, configure with
--enable-experimental --enable-segfeatures
Change-Id: I22a60774f69273523fb152db8c31f4b10b07c7f4
- Removed fast_fdct4x4_neon and fast_fdct8x4_neon
- Uses now short_fdct4x4 and short_fdct8x4
- Gives ~1-2% speed-up on Cortex-A8/A9
Change-Id: Ib62f2cb2080ae719f8fa1d518a3a5e71278a41ec
Rd and Rm registers should be different in 'mul'. This register
combination results in unpredictable behaviour. GCC will give
a warning and RVCT an error in this case.
Restriction applies only to armv5 targets and not for armv6 and above.
Change-Id: I378d17c51e1f16a6820814fbed43e115aaabb03e
These changes fixes a glitch between the RTP profile and the input
partitions interface. Since there's no way for the user to know the
actual number of partitions, the decoder have to read the
multi_token_paritition bits also when input partitions mode is
enabled.
Included are also a couple of fixes for issues with independent
partitions and uninitialized memory reads.
Change-Id: I6f93b15287d291169ed681898ed3fbcc5dc81837
- Updated walsh transform to match C
(based on Change Id24f3392)
- Changed fast_fdct4x4 and 8x4 to short_fdct4x4 and 8x4
correspondingly
Change-Id: I704e862f40e315b0a79997633c7bd9c347166a8e
Modified original patch If2f07220885c4c3a0cae0dace34ea0e36124f001
according to comments. Scheduled code a little bit to prevent some
interlocks.
Change-Id: I338f02b881098782f82af63d97f042b85e63e902
This commit added a 3 bit index to the bitstream, the index is used to
look into the intra mode coding entropy context table. The commit uses
the mode stats to calculate the cost of transmitting modes using 8
possible entropy distributions, and selects the distribution that
provides the lowest cost to do the actual mode coding.
Initial test show this provides additional .2%~.3% gain over quantizer
adaptive intra mode coding. So the adaptive intra mode coding provides
a total of .5%(psnr) to .6% gain(ssim) combined for all-key-encoding
To build and test, configure with
--enable-experimental --enable-qimode
Change-Id: I7c41cd8bfb352bc1fe7c5da1848a58faea5ed74a
make intra mode coding entropy distribution adaptive to baseQindex, an
encoding test on hd clips with all key frame shows universal gain on
all clips in both .2%(psnr) and (ssim).3%.
To build and test, configure with
--enable-experimental --enable-qimode
Change-Id: Iaa69241b984d4fdd8baa6d77ee78c0140f5ac00a
Patch 1 to Patch 3 is an initial implementation of 8x8 intra prediction
modes, here are with the following assumptions:
a. 8x8 has 4 prediction modes DC, H, V and TM
b. UV 4x4 block use the same mode as corresponding 8x8 area
c. i8x8 modes are enabled for key frame only for now
Patch 4:
d. removed debug code from previous patches
Patch 5:
e. added stats code to collect entropy stats and further cleaned up
Patch 6:
f. changed mode stats code to collect finer stats of modes
Patch 7:
g. normalized i8x8 modes distribution to total at 256 (8bits).
Patch 8:
h. fixed a bug in decoder and removed debug printf output.
Patch 9:
i. more cleanups to address paul's comment
Patch 10:
j. messy rebase/merges to bring the commit up to date.
Tests on HD clips encoded with all key frame showing consistent gain
on all clips and all metrics:~0.5%(psnr) and 0.6%(ssim):
http://www.corp.google.com/~yaowu/no_crawl/i8x8hd_allkey_fixedq.html
To build and test, configure with:
--enable-experimental --enable-i8x8
Change-Id: I9813fe07ae48cab5fdb5d904bca022514ad01e7f
Code all the features for one segment (grouped together)
then all for the next etc. etc. rather than grouping the
data by feature.
Change-Id: I2a65193b3a70aca78f92e855e35d8969d857b6dd
This data structure is now [Segment ID][Features]
rather than [Features][Segment_ID]
I propose as a separate modification to make the experimental
bit stream reflect this such that all the features for a segment
are coded together.
Change-Id: I581e4e3ca2033bdbdef3d9300977a8202f55b4fb
Some basic plumbing added for a range of segment level features.
MB_LVL_* changed to SEG_LVL_* to better reflect meaning.
Change-Id: Iac96da36990aa0e40afc0d86e990df337fd0c50b
vp8_update_zbin_extra() is called all the time even though the fast
quantizer doesn't use it. Skip this call if fast quantizer is used.
Change-Id: Ia711c38431930cc2486cf59b8466060ef0e9d9db
This change makes sure that no key frame recoding in real-time mode
even if CONFIG_REALTIME_ONLY is not configured.
Change-Id: Ifc34141f3217a6bb63cc087d78b111fadb35eec2
for SPLITMV and B_PRED modes. Modified code to use the bmi
found in mode_info_context instead of BLOCKD. On the decode
side, the uvmvs are calculated only when required, instead of
every macroblock. This is WIP. (bmi should eventually be
removed from BLOCKD)
Small performance gains noticed for RT encodes and decodes.(VGA)
Change-Id: I2ed7f0fd5ca733655df684aa82da575c77a973e7
Prepend idct function names with vp8_
so that under profiling they show up
associated with libvpx.
Change-Id: I4fe357b50236cb7730a4cc00164c0a3487a1d8b4
The data that the simple horizontal loopfilter reads is aligned, treat
it accordingly.
For the vertical, we only use the bottom 4 bytes, so don't read in 16
(and incur the penalty for unaligned access).
This shows a small improvement on older processors which have a
significant penalty for unaligned reads.
postproc_mmx.c is unused
Change-Id: I87b29bbc0c3b19ee1ca1de3c4f47332a53087b3d
Prepend . to local labels in assembly code. This
allows non unique labels within a file. Also
makes profiling information more informative
by keeping the function name with the loop name.
Change-Id: I7a983cb3a5ba2413d5dafd0a37936b268fb9e37f
Calculations were incorrectly classified as either
SSE3 or SSSE3. Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.
Change-Id: I48ad0218af0cc51c5078070a08511dee43ecfe09
Calculations were incorrectly classified as either
SSE3 or SSSE3. Only using SSE2 instructions.
Cleanup function names and make non-RTCD code work
as well.
Change-Id: I29f5c2ead342b2086a468029c15e2c1d948b5d97
When active map is specified and the current frame is not a key frame,
golden frame nor a altref frame then copy only those active regions.
This significantly reduces encoding time by as much as 19% on the test
system where realtime encoding is used. This is particularly useful
when the frame size is large (e.g. 2560x1600) and there's only a few
action macroblocks.
Change-Id: If394a813ec2df5a0201745d1348dbde4278f7ad4
Instead of a single mid GF boost apply a few extra bits to
every other frame. This gives a very small average metrics
improvement on both derf and YT sets.
Also use min GF interval as min KF interval.
Change-Id: Iee238b8cae0ffaed850a5a944ac825cee18da485
Since the block will be interpreted as an inter block, the mode will
be interpreted as a motion vector, resulting in bad concealment.
Change-Id: Ifcc685ae1cc883492bce6dbd61e418d91a89b053
Since the block will be interpreted as an inter block, the mode will
be interpreted as a motion vector, resulting in bad concealment.
Change-Id: Ifcc685ae1cc883492bce6dbd61e418d91a89b053
This reverts commit b5ea2fbc2c. Further
testing showed noticable keyframe popping in some cases, reverting this
for now to give time for a proper fix.
Conflicts:
vp8/encoder/onyx_if.c
vp8/encoder/ratectrl.c
Change-Id: I159f53d1bf0e24c035754ab3ded8ccfd58fd04af
EC expects the subblock MVs to be populated, but
f1d6cc79e4 removed this code. This
commit restores it, protected by CONFIG_ERROR_CONCEALMENT. May move this
to the EC code more directly in the future.
Change-Id: I44f8f985720cb9a1bf222e59143f9e69abf56ad2
When error concealment is enabled the first key frame must
be successfully received before error concealment is activated.
Error concealment will be activated when the delta following
delta frame is received.
Also fixed a couple of bugs related to error tracking in
multi-threading. And avoiding decoding corrupt residual
when we have multiple non-resilient partitions.
Change-Id: I45c4bb296e2f05f57624aef500a874faf431a60d
This patch fixes an OOB read when error concealment is enabled and the
partition sizes are corrupt. The partition size read from the bitstream
was not being validated in EC mode.
Change-Id: Ia81dfd4bce1ab29ee78e42320abe52cee8318974
EC expects the subblock MVs to be populated, but
f1d6cc79e4 removed this code. This
commit restores it, protected by CONFIG_ERROR_CONCEALMENT. May move this
to the EC code more directly in the future.
Change-Id: I44f8f985720cb9a1bf222e59143f9e69abf56ad2
When error concealment is enabled the first key frame must
be successfully received before error concealment is activated.
Error concealment will be activated when the delta following
delta frame is received.
Also fixed a couple of bugs related to error tracking in
multi-threading. And avoiding decoding corrupt residual
when we have multiple non-resilient partitions.
Change-Id: I45c4bb296e2f05f57624aef500a874faf431a60d
This patch fixes an OOB read when error concealment is enabled and the
partition sizes are corrupt. The partition size read from the bitstream
was not being validated in EC mode.
Change-Id: Ia81dfd4bce1ab29ee78e42320abe52cee8318974
This patch fixes a bug in the interaction between the recode loop and
spatial resampling. If the codec was in a spatial resampling state,
and a subsequent iteration of the recode loop disables resampling,
then the source buffer must be reset to the unscaled source.
Change-Id: I4e4cd47b943f6cd26a47449dc7f4255b38e27c77
Changed motion search in vp8_find_best_half_pixel_step() to be the
same as in vp8_find_best_sub_pixel_step(), which checks 5 points
instead of 8 points. This only affects real-time mode with
cpu-used >=9. Tests showed it gives 2% encoding speedup with
a quality loss(psnr) of up to 0.5%.
Change-Id: I16049cad1535002346d46cfdfad345bfc3dc5146
the neon code made several assumptions which were broken by a recent
change: https://review.webmproject.org/2676
update the code with new assumptions and guard them with a compile time
assert
Change-Id: I32a8378030759966068f34618d7b4b1b02e101a0
Since this is the only ABI incompatible change since the last release,
convert it to use the control interface instead. The member of the
configuration struct is replaced with the VP8E_SET_MAX_INTRA_BITRATE_PCT
control.
More significant API changes were expected to be forthcoming when this
control was first introduced, and while they continue to be expected,
it's not worth breaking compatibility for only this change.
Change-Id: I799d8dbe24c8bc9c241e0b7743b2b64f81327d59
This change implemented same idea in change "Preload reference area
to an intermediate buffer in sub-pixel motion search." The changes
were made to vp8_find_best_sub_pixel_step() and vp8_find_best_half
_pixel_step() functions which are called when speed >= 5. Test
result (using tulip clip):
1. On Core2 Quad machine(Linux)
rt mode, speed (-5 ~ -8), encoding speed gain: 2% ~ 3%
rt mode, speed (-9 ~ -11), encoding speed gain: 1% ~ 2%
rt mode, speed (-12 ~ -14), no noticeable encoding speed gain
2. On Xeon machine(Linux)
Test on speed (-5 ~ -14) didn't show noticeable speed change.
Change-Id: I21bec2d6e7fbe541fcc0f4c0366bbdf3e2076aa2
There were some situations that the start motion vectors were
out of range. This fix adjusted range checks to make sure they
are checked and clamped.
Change-Id: Ife83b7fed0882bba6d1fa559b6e63c054fd5065d
sharpness was not recalculated in vp8cx_pick_filter_level_fast
remove last_filter_type. all values are calculated, don't need to update
the lfi data when it changes.
always use cm->sharpness_level. the extra indirection was annoying.
don't track last frame_type or sharpness_level manually. frame type
only matters for motion search and sharpness_level is taken care of in
frame_init
move function declarations to their proper header
Change-Id: I7ef037bd4bf8cf5e37d2d36bd03b5e22a2ad91db
In sub-pixel motion search, the search range is small(+/- 3 pixels).
Preload whole search area from reference buffer into a 32-byte
aligned buffer. Then in search, load reference data from this buffer
instead. This keeps data in cache, and reduces the crossing cache-
line penalty. For tulip clip, tests on Intel Core2 Quad machine(linux)
showed encoder speed improvement:
3.4% at --rt --cpu-used =-4
2.8% at --rt --cpu-used =-3
2.3% at --rt --cpu-used =-2
2.2% at --rt --cpu-used =-1
Test on Atom notebook showed only 1.1% speed improvement(speed=-4).
Test on Xeon machine also showed less improvement, since unaligned
data access latency is greatly reduced in newer cores.
Next, I will apply similar idea to other 2 sub-pixel search functions
for encoding speed > 4.
Make this change exclusively for x86 platforms.
Change-Id: Ia7bb9f56169eac0f01009fe2b2f2ab5b61d2eb2f
With this fix, the experimental branch now builds and encodes correctly
with the following two configure options respectively:
--enable-experimental --enable-t8x8
--enable-experimental
Change-Id: I3147c33c503fe713a85fd371e4f1a974805778bf
The auto merge process pull and merge commits from public git or master
branch. These automerges while worked well most time, but has created
a few problems. This commit fixed several issues existed long before
the latest 8x8 transform commit.
Change-Id: I895ca99713231b1aec521d57db5d9839f74aacfa
This is done by expanding luma row to 32-byte alignment, since
there is currently a bunch of code that assumes that
uv_stride == y_stride/2 (see, for example, vp8/common/postproc.c,
common/reconinter.c, common/arm/neon/recon16x16mb_neon.asm,
encoder/temporal_filter.c, and possibly others; I haven't done a
full audit).
It also uses replaces the hardcoded border of 16 in a number of
encoder buffers with VP8BORDERINPIXELS (currently 32), as the
chroma rows start at an offset of border/2.
Together, these two changes have the nice advantage that simply
dumping the frame memory as a contiguous blob produces a valid,
if padded, image.
Change-Id: Iaf5ea722ae5c82d5daa50f6e2dade9de753f1003
allowing the compiler to inline this function. For real-time
encodes, this gave a boost of 1% to 2.5%, depending on the
speed setting.
Change-Id: I3929d176cca086b4261267b848419d5bcff21c02
This patch attempts to improve the handling of CBR streams with
respect to the short term buffering requirements. The "buffer level"
is changed to be an average over the rc buffer, rather than a long
running average. Overshoot is also tracked over the same interval
and the golden frame targets suppressed accordingly to correct for
overly aggressive boosting.
Testing shows that this is fairly consistently positive in one
metric or another -- some clips that show significant decreases
in quality have better buffering characteristics, others show
improvenents in both.
Change-Id: I924c89aa9bdb210271f2e03311e63de3f1f8f920
Optimized C-code of the following functions:
- vp8_tokenize_mb
- tokenize1st_order_b
- tokenize2nd_order_b
Gives ~1-5% speed-up for RT encoding on Cortex-A8/A9
depending on encoding parameters.
Change-Id: I6be86104a589a06dcbc9ed3318e8bf264ef4176c