The obj_int_extract code is no longer worth maintaining. It creates
significant issues when adapting for different build systems and no
longer offers as significant of a performance benefit due to
improvements in intrinsics.
Source files will remain until the various third-party builds are updated.
The neon fast quantizer has been moved to intrinsics. The armv6 version
has been removed because so few remaining targets require it.
Compilers and processors have improved significantly since the
pack_tokens code was written. The assembly is no longer faster than the
C code.
pack_tokens were the only optimizations for the armv5te targets so the targets
will be removed after the test infrastructure has been updated.
BUG=710
Change-Id: Ic785b167cd9f95eeff31c7c76b7b736c07fb30eb
Multi-threaded code was not updated to disable background
refresh for non base-layer frames at the time it was
disabled in the main C-code.
Change-Id: Id6cc376130b7def046942121cfd0526b4f0a71d4
The current way of counting inter_zz_count doesn't work correctly
in multi-threaded encoding. Calculating it after the frame is
encoded fixed the problem.
Change-Id: Ifcb1972cde950b8cc194f75c6d7b6af09e8b0e65
The sync interval for the multithreaded encoder was considered as not changing
during the encoding. This is not true if picture size is changed.
The encoder could dead-lock because the main thread and the other threads were
using different sync interval.
Change-Id: I75232bbdbc6c02d77f830d870fd8b4e96697c64e
Precalculated block ptrs do not need updates during encoding.
Set these at init stage.
Moved the allocation of 'mt_current_mb_col' (last encoded MB on each
row) to vp8_alloc_compressor_data(), so that it is correctly
reallocated when frame size is changing.
Change-Id: Idcdaa2d0cf3a7f782b7d888626b7cf22a4ffb5c1
Allows building the library with the gcc -pedantic option, for improved
portabilty. In particular, this commit removes usage of C99/C++ style
single-line comments and dynamic struct initializers. This is a
continuation of the work done in commit 97b766a46, which removed most
of these warnings for decode only builds.
Change-Id: Id453d9c1d9f44cc0381b10c3869fabb0184d5966
Change If4321cc5 fixed a bug caused by forward declarations not being
kept in sync across C files, resulting in a function call with the
wrong arguments. The commit moves the affected function declarations
into a header file, along with the other symbols from encodeframe.c
that were being sloppily shared.
Change-Id: I76a7b4c66d4fe175f9cbef7e52148655e4bb9ba1
RD costs were local to MACROBLOCK data and had to be copied all the
time to each thread's MACROBLOCK data. Tables moved to a common place
and only pointers are setup for each encoding thread.
vp8_cost_tokens() generates 'int' costs so changed all types to be
int (i.e. removed unsigned).
NOTE: Could do some more cleaning in vp8cx_init_mbrthread_data().
Change-Id: Ifa4de4c6286dffaca7ed3082041fe5af1345ddc0
Produce the token partitions on-the-fly, while processing each MB.
Context is updated at the beginning of each frame based on the
previoud frame's counters. Optimally encoder outputs partitions in
separate buffers. For frame based output, partitions are concatenated
internally.
Limitations:
- enabled just in combination with realtime-only mode
- number of encoding threads has to be equal or less than the
number of token partitions. For this reason, by default the encoder
will do 8 token partitions.
- vpxenc supports partition output (-P) just in combination with
IVF output format (--ivf)
Performance:
- Realtime encoder can be up to 13% faster (ARM) depending on the number
of threads and bitrate settings. Constant gain over the 5-16 speed
range.
- Token buffer reduced from one frame to 8 MBs
Quality:
- quality is affected by the delayed context updates. This again
dependents on input material, speed and bitrate settings. For VC
style input the loss seen is up to 0.2dB. If error-resilient=2
mode is used than the effect of this change is negligible.
Example:
./configure --enable-realtime-only --enable-onthefly-bitpacking
./vpxenc --rt --end-usage=1 --fps=30000/1000 -w 640 -h 480
--target-bitrate=1000 --token-parts=3 --static-thresh=2000
--ivf -P -t 4 -o strm.ivf tanya_640x480.yuv
Change-Id: I127295cb85b835fc287e1c0201a67e378d025d76
Second shot at this...
Sync with loopfilter thread as late as possible, usually just at the
beginning of next frame encoding. This returns control to application
faster and allows a better multicore scaling.
When PSNR packets are generated the final filtered frame is needed
imediatly so we cannot delay the sync. Same has to be done when
internal frame is previewed.
Change-Id: I64e110c8b224dd967faefffd9c93dd8dbad4a5b5
This commit continues the process of converting to the new RTCD
system. It removes the last of the VP8_ENCODER_RTCD struct references.
Change-Id: I2a44f52d7cccf5177e1ca98a028ead570d045395
This patch removes the local copies of the dequantize
constants and implements John's idea as described
in "Make a local copy of the dequantized data" commit.
Change-Id: Ic6b7d681f00bf63263f71ff1e39ab2f80729e8b2
This value needs to be copied to each thread's data structure.
This fixed artifact problem in multi-thread encoder.
Change-Id: Iab6d9745a1d44846aa503184705376f63a505597
vp8cx_mb_init_quantizer() needs to be called at least once to get
all values calculated. This change added one check to decide if
we could skip initialization or not.
Change-Id: I3f65eb548be57580a61444328336bc18c25c085b
I got this idea from Pascal (Thanks). Before encoding a macroblock,
copy it to a 16x16 buffer, and then read source data from there
instead. This will help keep the source data in cache, and help
with the performance.
Change-Id: Id05f4cb601299150511d59dcba0ae62c49b5b757
Some further re-structuring of activity masking code.
Still has various experimental switches.
Supports a metric based on intra encode.
Experimental comparison against a fixed activity target rather
than a frame average, for altering rd and zbin.
Overall the SSIM performance is similar to TT's original
code but there is a much smaller PSNR hit of circa
0.5% instead of 3.2%
Change-Id: I0fd53b2dfb60620b3f74d7415e0b81c1ac58c39a
Declared the bmi in BLOCKD as a union instead of B_MODE_INFO.
Then removed B_MODE_INFO completely.
Change-Id: Ieb7469899e265892c66f7aeac87b7f2bf38e7a67
vp8_fast_quantize_b_pair_neon function added to quantize
two adjacent blocks at the same time to improve performance.
- Additional 3-6% speedup compared to neon optimized fast
quantizer (Tanya VGA@30fps, 1Mbps stream, cpu-used=-5..-16)
Change-Id: I3fcbf141e5d05e9118c38ca37310458afbabaa4e