Scott pointed out that last_frame_type only gets updated while
loopfilter exists. Since last_frame_type is also needed in
motion search now, it needs to be updated every frame.
Change-Id: I9203532fd67361588d4024628d9ddb8e391ad912
Use the fast quantizer for inter mode selection and the
regular quantizer for the rest of the encode for good quality,
speed 1. Both performance and quality were improved. The
quality gains will make up for the quality loss mentioned in
I9dc089007ca08129fb6c11fe7692777ebb8647b0.
Change-Id: Ia90bc9cf326a7c65d60d31fa32f6465ab6984d21
1. Search for block8x16/block16x8 uses block8x8's search results.
2. Check block4x4 only if block8x8 is chosen. (This hurts quality,
which will be improved in another check-in.)
3. In block4x4 search, the previous block's result is used as
MV predictor for next block.
This change improves performance.
Change-Id: I9dc089007ca08129fb6c11fe7692777ebb8647b0
Re-calibrated sad_per_bit16 and sad_per_bit4 tables to linearly
correlated to quantizer values, these two variables are used in
motion search for costing motion vectors. This change has an small
positive effect on compression.
Change-Id: Ic9b5ea6fb8d5078ef663ba4899db019cc51f4166
the lookup table is based on floating point calculations (see source)
by moving the *3 before the downshift and adding the rounding bit, the
delta (LUT - integer) goes from:
______________________________________
__ 1__ 1______________________________
__ 1__ 1______________________________
____ 1______ 1________________________
____ 1 2__ 2 1________________________
______ 1 1 2__ 2__ 2__ 2 1 1__________
________ 1 1 2 2__ 1 2 3 1 2__ 2__ 2__
to:
__-1__-1______________________________
______________________________________
____-1______-1________________________
______________________________________
________-1______________-1____________
______________________________________
it's important to be able to use the integer version because the LUT
more or less precludes SIMD optimizations
Change-Id: I45a81127dc7b72a06fba951649135d9d918386c0
allow for optimized versions of apply_temporal_filter
(now vp8_apply_temporal_filter_c)
the function was previously declared as static and appears to have been
inlined. with this change, that's no longer possible. performance takes
a small hit.
the declaration for vp8_cx_temp_filter_c was moved to onyx_if.c because
of a circular dependency. for rtcd, temporal_filter.h holds the
definition for the rtcd table, so it needs to be included by onyx_int.h.
however, onyx_int.h holds the definition for VP8_COMP which is needed
for the function prototype. blah.
Change-Id: I499c055fdc652ac4659c21c5a55fe10ceb7e95e3
Add a new encoder control, VP8E_SET_TUNING, to allow the application
to inform the encoder that the material will benefit from certain
tuning. Expose this control as the --tune option to vpxenc. The args
helper is expanded to support enumerated arguments by name or value.
Two tunings are provided by this patch, PSNR (default) and SSIM.
Activity masking is made dependent on setting --tune=ssim, as the
current implementation hurts speed (10%) and PSNR (2.7% avg,
10% peak) too much for it to be a default yet.
Change-Id: I110d969381c4805347ff5a0ffaf1a14ca1965257
In SPLITMV, the 8x8 segment will be checked first. If the 8x8 rd
is better than the best, we check the other segments. Otherwise
bail. Adjustments to the thresh_mult were necessary to make
up for the initial quality loss.
The performance improved by 20% (average) for good quality,
speed 0 and speed 1, while the overall quality remained the same.
Change-Id: I717aef401323c8a254fba3e9777d2a316c774cc3
vp8_rd_pick_best_mbsegmentation looks at y only. The new
breakout does not include the frame cost, the prob_skip_false
cost, or the uv rate. Performance improved by a few percent
and the quality remained the same.
Change-Id: I94ff013998ac51e8ecce7130870f7b6600758e15
This fix added MV range checks for NEWMV mode as suggested by Jim.
To reduce unnecessary MV range checks, I tried Yaowu's suggestion.
Update UMV borders in NEWMV mode to also cover MV range check.
Also, in this way, every MV that is valid gets checked in diamond
search function.
Change-Id: I95a89ce0daf6f178c454448f13d4249f19b30f3a
The MV's range is 256. Since the new motion search uses a different
starting MV than the center ref MV, a MV range checking needs to
be done to avoid corruption.
Change-Id: I8ae0721d1bd203639e13891e2e54a2e87276f306
Per John's previous change, shrink TOKENEXTRA from 20 to 8 bytes
original: b7b1e6fb
reverted: 41f4458a
Also drop unused field from vp8_extra_bit_struct
Update ARM ASM to deal with this change. In particular, Extra is signed
and needs to be sign-extended when loaded.
Change-Id: Ibd0ddc058432bc7bb09222d6ce4ef77e93a30b41
This code is unused, as the current preproc implementation uses the
same spatial filter that postproc uses.
Change-Id: Ia06d5664917d67283f279e2480016bebed602ea7
Change the size of structure elements to reduce memory utilization.
Removed the 'section' member entirely, as it is set but never read.
Change-Id: Iad043830392fb4168cb3cd6075fb0eb70c7f691c
Added the initialization of the pointer to active map. Also added the
same logic for cyclic refresh in mbrow encoding threads.
Change-Id: Ic48d0849dc706b27fba72d07dcc498075725663d
Changed the end of block computation to use pmaxw. Removed
additional pushing and popping of registers that was not needed.
Change-Id: I08cb9b424513cd8a2c7ad8cea53b4e2adc66ef98
Change I3430820 performed an uninitialized read when
encode_breakout == 0, since the sum and sse wouldn't be set:
if(x->encode_breakout)
VARIANCE_INVOKE(..., get16x16var)(..., &sum, &sse);
if (cpi->active_map_enabled && x->active_ptr[0] == 0) {
...
} else if (sse < x->encode_breakout)
Change-Id: I915eb76d1227b4b6d1137a0dedf2c143860098a2
Corrected the initial Q range limits for the recode loop
to reflect the current allowed range for the frame.
In experimental work on constrained quality this bug was
causing unnecessary recodes.
Change-Id: I7e256fbfa681293b0223fe21ec329933d76c229f
This patch adds a weighting factor on RDMULT for UV blocks. The change
has an overall gain about 0.5% based on ssim, between 0.1 and 0.2% by
psnr numbers.
Change-Id: I97781b077ce3bb7e34241b03268491917e8d1d72
Deallocating the buffers before re-allocating them.
The fix passed James Berry's test program for memory
leak check.
Change-Id: I18c3cf665412c0e313a523e3d435106c03ca438d
Moved the code from the segmentation loop into a function
which is now called for each segment. This will allow us
to change the segment order checking more easily.
Change-Id: I9510d26f0acae5a73043fcca8f1984b121d3e052
When auto_golden wasn't set it forced all frames to be a golden
frame. Now the manual configured frequency is adhered to.
Change-Id: I360acac9bc487db0d9c4d4da6ee41f70c227c539
The inter_minq table controls the range of quantizers available
for a particular frame in two pass relative to a max Q value.
The changes reduces the range somewhat. The effect of this
was a small increase (0.3% average) in psnr for the test set
but it should also help encode speed somewhat for higher
quality modes as it will reduce the number of iterations in the
recode loop.
The change damps the range of quantizers available locally
within a section of a clip and should therefore help keep quality
more uniform. If there is systematic overshoot or undershoot the
range can shift gradually to accommodate. However, there is
some increased risk of overshoot or undershoot against the target
bit rate in VBR mode and this risk will be more pronounced for short
clips.
The change damps the range of quantizers available locally
within a section of a clip and should therefore help keep quality
more uniform. If there is systematic overshoot or undershoot the
range can shift gradually to accommodate. However, there is
some increased risk of overshoot or undershoot against the
target bit rate in VBR mode and this risk will be more
pronounced for short clips.
Change-Id: I84465567d49ae767c6c73ff2a2aac30c895adb52
Add vp8_mv_pred() to better predict starting MV for NEWMV
mode in vp8_rd_pick_inter_mode(). Set different search
ranges according to MV prediction accuracy, which improves
encoder performance without hurting the quality. Also,
as Yaowu suggested, using diamond search result as full
search starting point and therefore adjusting(reducing)
full search range helps the performance.
Change-Id: Ie4a3c8df87e697c1f4f6e2ddb693766bba1b77b6
On a keyframe alt ref and golden are refreshed. The flag was
not being set and so on the frame after a keyframe, motion
search would occur on the alt ref frame. This is not necessary
because the alt ref frame identical to the last frame in this
scenario.
Handle corner case where a forward alt-ref frame is put
directly after a keyframe.
Change-Id: I9be4cf290d694f8cf2f9a31852014b5ccf1504d3
The baseline bits per MB prediction tables have been
re calibrated based on the assumption that bits per mb
is inversely proportional to the quantizer level.
Change-Id: Ibd355c7acac4b8053dda1baf1032fe35f11da7f7
Changing the MAX_PSNR to 100 to allow testing of further experiments
on extending quantizer range to near lossless. With an effective
quantizer of 1, encoder achieves ~68DB, which is consistent with
fdct/idct round trip error.
Change-Id: I7b6d0e94a8936968ef42e82e63ebb13999c36832
Replaced existing code to decide if a frame recode is required
with a function call. This is to simplify addition of extra clauses
that may be needed for the planned constrained quality mode.
Also fixed a bug where by alt ref not considered in the test.
Change-Id: I3d40bb21abe3e19f8456761e6849deb171738b60
x86-64 passes arguments in registers. There is no need to push
them to the stack before using them.
This fixes 15acc84f10 where ebx
was not getting preserved on x86.
Change-Id: I1214b5f818a0201f75ab6ad7d5c6f448e09b16c2
The use of incorrect mv costing tables in the ARNR sub-pel
filtering code led to corruption of the altref buffer in some cases,
particularly at low data rates.
The average gain from this fix is about 0.3% but there are a few
extreme cases where nasty and visible artifacts manifested and
for these few data points the improvement is > 10%.
PGW and AWG
Change-Id: I95cc02b196a433e71d0d2bd2b933fe68ed31e796
This intends to correct the tendency that VP8 aggressively favors rate
on intra coded frames. Experiments tested different numbers in [0, 1]
and found 9/16 overall provided about 2-4% gains for all-intra coded
clips based on vpx-ssim metric. The impact on regular encoded clips
is much smaller but positive overall. Overall impact on psnr is also
positive even though very small.
Change-Id: If808553aaaa87fdd44691f9787820ac9856d9f8a
The fast quantizer assembly code has not been updated to match the new
exact quantizer, which was made the default in commit 6adbe09.
Specifically, they are not aware of the potential for the coefficient
to be scaled, which results in the quantized result exceeding the range
of the DCT. This patch restores the previous behavior of using the
non-shifted coefficients when in the fast quantizer code path, but
unfortunately requires rebuilding the tables when switching between the
two.
Change-Id: I0a33f5b3850335011a06906f49fafed54dda9546
- Used three probability approach for temporal context as follows:
P0 - probability of no change if both above and left not changed
P1 - probability of no change if one of above and left has changed
P2 - probability of no change if both above and left have changed
In addition, a 1 bit/frame has been used to decide whether to use temporal context or to encode directly. The cost of using both the schemes is calculated ahead and the temporal_update flag is set if the cost of using temporal context is lower than encoding the segment ids directly.
This approach has given around 20% reduction in cost of bits needed to encode segmentation ids.
Change-Id: I44a5509599eded215ae5be9554314280d3d35405
Debugging in postproc needs more flags to allow for specific
block types to be turned on or off in the visualizations.
Must be enabled with --enable-postproc-visualizer during
configuration time.
Change-Id: Ia74f357ddc3ad4fb8082afd3a64f62384e4fcb2d
VBR rate control can become very noisy for the last few frames.
If there are a few bits to spare or a small overshoot then the
target rate and hence quantizer may start to fluctuate wildly.
This patch prevents further adjustment of the active Q limits for
the last few frames.
Patch also removes some redundant variables and makes one small bug fix.
Change-Id: Ic167831bec79acc9f0d7e4698bcc4bb188840c45
Small changes to the default zero bin and rounding tables.
Though the tables are currently the same for the Y1 and Y2 cases
I have left them as separate tables in case we want to tune this later.
There is now some adjustment of the zbin based on the prediction mode.
Previously this was restricted to an adjustment for gf/arf 0,0 MV.
The exact quantizer now marginal outperforms and is the default.
The overall average gain is about 0.5%
Change-Id: I5e4353f3d5326dde4e86823684b236a1e9ea7f47
Change Ice204e86 identified a problem with bitrate undershoot due to
low precision in the timestamps passed to the library. This patch
takes a different approach by calculating the duration of this frame
and passing it to the library, rather than using a fixed duration
and letting the library average it out with higher precision
timestamps. This part of the fix only applies to vpxenc.
This patch also attempts to fix the problem for generic applications
that may have made the same mistake vpxenc did. Instead of
calculating this frame's duration by the difference of this frame's
and the last frame's start time, we use the end times instead. This
allows the framerate calculation to scavenge "unclaimed" time from
the last frame. For instance:
start | end | calculated duration
======+=======+====================
0ms 33ms 33ms
33ms 66ms 33ms
66ms 99ms 33ms
100ms 133ms 34ms
Change-Id: I92be4b3518e0bd530e97f90e69e75330a4c413fc
(test clip: tulip)
For good quality mode with speed=1, this gave the encoder
a small (2 - 3%) performance boost.
Change-Id: I8a1d4269465944ac0819986c2f0be4b0a2ee0b35
Unlike GCC, Visual Studio compiler doesn't allocate SAD output
array 16-byte aligned, which causes crash in visual studio.
Change-Id: Ia755cf5a807f12929bda8db94032bb3c9d0c2362
This eliminates a large set of warnings exposed by the Mozilla build
system (Use of C++ comments in ISO C90 source, commas at the end of
enum lists, a couple incomplete initializers, and signed/unsigned
comparisons).
It also eliminates many (but not all) of the warnings expose by newer
GCC versions and _FORTIFY_SOURCE (e.g., calling fread and fwrite
without checking the return values).
There are a few spurious warnings left on my system:
../vp8/encoder/encodemb.c:274:9: warning: 'sz' may be used
uninitialized in this function
gcc seems to be unable to figure out that the value shortcut doesn't
change between the two if blocks that test it here.
../vp8/encoder/onyx_if.c:5314:5: warning: comparison of unsigned
expression >= 0 is always true
../vp8/encoder/onyx_if.c:5319:5: warning: comparison of unsigned
expression >= 0 is always true
This is true, so far as it goes, but it's comparing against an enum, and the C
standard does not mandate that enums be unsigned, so the checks can't be
removed.
Change-Id: Iaf689ae3e3d0ddc5ade00faa474debe73b8d3395
Use mpsadbw, and calculate 8 sad at once. Function list:
vp8_sad16x16x8_sse4
vp8_sad16x8x8_sse4
vp8_sad8x16x8_sse4
vp8_sad8x8x8_sse4
vp8_sad4x4x8_sse4
(test clip: tulip)
For best quality mode, this gave encoder a 5% performance boost.
For good quality mode with speed=1, this gave encoder a 3%
performance boost.
Change-Id: I083b5a39d39144f88dcbccbef95da6498e490134
This patch fixes the system dependent entries for the half-pixel
variance functions in both the RTCD and non-RTCD cases:
- The generic C versions of these functions are now correct.
Before all three cases called the hv code.
- Wire up the ARM functions in RTCD mode
- Created stubs for x86 to call the optimized subpixel functions
with the correct parameters, rather than falling back to C
code.
Change-Id: I1d937d074d929e0eb93aacb1232cc5e0ad1c6184
NEON has optimized 16x16 half-pixel variance functions, but they
were not part of the RTCD framework. Add these functions to RTCD,
so that other platforms can make use of this optimization in the
future and special-case ARM code can be removed.
A number of functions were taking two variance functions as
parameters. These functions were changed to take a single
parameter, a pointer to a struct containing all the variance
functions for that block size. This provides additional flexibility
for calling additional variance functions (the half-pixel special
case, for example) and by initializing the table for all block sizes,
we don't have to construct this function pointer table for each
macroblock.
Change-Id: I78289ff36b2715f9a7aa04d5f6fbe3d23acdc29c
ARM NEON has a platform specific version of vp8_recon16x16mb, though
it's just a stub to extract the various parameters from the
MACROBLOCKD struct and pass them to vp8_recon16x16mb_neon(). Using
that function's prototype directly will be a better long term solution,
but it's quite an invasive change.
Change-Id: I04273149e2ade34749e2d09e7edb0c396e1dd620
The ARM version of vp8_hex_search() is a faster implementation
of the same algorithm. Since it doesn't use any ARM specific
code, it can be made the default implementation. This removes
a linking error.
Change-Id: I77d10f2c16b2515bff4522c350004e03b7659934
These functions were true duplicates of functions present in the
generic code. This fixes some of the link errors when building
with --enable-shared --enable-pic.
Change-Id: Idff26599d510d954e439207883607ad6b74df20c
These functions made global references but did not set up the GOT,
causing compilation failures in PIC mode.
Change-Id: Iac473bf46733f87eb2e001cd736af4acf73fa51d
cppcheck found a leaked file descriptor in the debugging code
enabled by defining ENTROPY_STATS. Fixes issue #60.
Change-Id: I0c1d0669cb94d44fed77860f97b82763be06b7cb
The primary goal is to allow a binary to be built which supports
NEON, but can fall back to non-NEON routines, since some Android
devices do not have NEON, even if they are otherwise ARMv7 (e.g.,
Tegra).
The configure-generated flags HAVE_ARMV7, etc., are used to decide
which versions of each function to build, and when
CONFIG_RUNTIME_CPU_DETECT is enabled, the correct version is chosen
at run time.
In order for this to work, the CFLAGS must be set to something
appropriate (e.g., without -mfpu=neon for ARMv7, and with
appropriate -march and -mcpu for even earlier configurations), or
the native C code will not be able to run.
The ASFLAGS must remain set for the most advanced instruction set
required at build time, since the ARM assembler will refuse to emit
them otherwise.
I have not attempted to make any changes to configure to do this
automatically.
Doing so will probably require the addition of new configure options.
Many of the hooks for RTCD on ARM were already there, but a lot of
the code had bit-rotted, and a good deal of the ARM-specific code
is not integrated into the RTCD structs at all.
I did not try to resolve the latter, merely to add the minimal amount
of protection around them to allow RTCD to work.
Those functions that were called based on an ifdef at the calling
site were expanded to check the RTCD flags at that site, but they
should be added to an RTCD struct somewhere in the future.
The functions invoked with global function pointers still are, but
these should be moved into an RTCD struct for thread safety (I
believe every platform currently supported has atomic pointer
stores, but this is not guaranteed).
The encoder's boolhuff functions did not even have _c and armv7
suffixes, and the correct version was resolved at link time.
The token packing functions did have appropriate suffixes, but the
version was selected with a define, with no associated RTCD struct.
However, for both of these, the only armv7 instruction they actually
used was rbit, and this was completely superfluous, so I reworked
them to avoid it.
The only non-ARMv4 instruction remaining in them is clz, which is
ARMv5 (not even ARMv5TE is required).
Considering that there are no ARM-specific configs which are not at
least ARMv5TE, I did not try to detect these at runtime, and simply
enable them for ARMv5 and above.
Finally, the NEON register saving code was completely non-reentrant,
since it saved the registers to a global, static variable.
I moved the storage for this onto the stack.
A single binary built with this code was tested on an ARM11 (ARMv6)
and a Cortex A8 (ARMv7 w/NEON), for both the encoder and decoder,
and produced identical output, while using the correct accelerated
functions on each.
I did not test on any earlier processors.
Change-Id: I45cbd63a614f4554c3b325c45d46c0806f009eaa
Most of the code that actually uses these matrices indexes them as
if they were a single contiguous array, and coverity produces
reports about the resulting accesses that overflow the static
bounds of the first row.
This is perfectly legal in C, but converting them to actual [16]
arrays should eliminate the report, and removes a good deal of
extraneous indexing and address operators from the code.
Change-Id: Ibda479e2232b3e51f9edf3b355b8640520fdbf23
The first implementation of the firstpass motion map for motion
compensated temporal filtering created a file, fpmotionmap.stt,
in the current working directory. This was not safe for multiple
encoder instances. This patch merges this data into the first pass
stats packet interface, so that it is handled like the other
(numerical) firstpass stats.
The new stats packet is defined as follows:
Numerical Stats (16 doubles) -- 128 bytes
Motion Map -- 1 byte / Macroblock
Padding -- to align packet to 8 bytes
The fpmotionmap.stt file can still be generated for debugging
purposes in the same way that the textual version of the stats
are available (defining OUTPUT_FPF in firstpass.c)
Change-Id: I083ffbfd95e7d6a42bb4039ba0e81f678c8183ca
x86-64 passes most arguments in registers. There is no need to
push them to the stack before using them.
Change-Id: I13c683f1358782682ecafaf1df3fb0af23b978ea
This rewriting reflects changes made in commit "Improve the
accuracy of forward walsh-hadamard transform". Since this function
is not called much, only a small encoder performance gain (~0.5% )
is seen.
Change-Id: Ie9df58a43028a11fd5b115c4bbe3141f7596578b
Instead of doing 8-bit data unpack and 16-bit subtraction, use
psubb to do 16 8-bit subtractions and pcmpgtb to preserve the
sign information. This does not bring noticable gain since
these functions are not called frequently.
Change-Id: I90a0dfaa3db9d422e4ada324076596ffb178548e
generic version got fixed, but not the arm version. fixes:
vp8/encoder/arm/mcomp_arm.c: In function 'vp8_full_search_sadx3':
vp8/encoder/arm/mcomp_arm.c:1208: warning: pointer targets in passing
argument 5 of 'fn_ptr->sdx3f' differ in signedness
vp8/encoder/arm/mcomp_arm.c:1208: note: expected 'unsigned int *' but
argument is of type 'int *'
and another unsigned change to keep the files similar
Change-Id: I1b6255dc3a03b90394a791ee0d15d8167d9454db
vp8_diamond_search_sadx4 isn't used in arm because there is no
corrosponding sdx4df as in x86. rather than keep it in sync with
../mcomp.c, delete it
vp8_hex_search had the original, more readable/understandable code if`d
out. it's also available in ../mcomp.c, so remove the dead copy
Change-Id: Ia42aa6e23b3a2e88040f467280befec091ec080e