ea8d436f30
Change-Id: If6f20553159105c05f9a684cb7c8f3778c7894a1
336 lines
14 KiB
Plaintext
336 lines
14 KiB
Plaintext
2011-08-02 v0.9.7 "Cayuga"
|
|
Our third named release, focused on a faster, higher quality, encoder.
|
|
|
|
- Upgrading:
|
|
This release is backwards compatible with Aylesbury (v0.9.5) and
|
|
Bali (v0.9.6). Users of older releases should refer to the Upgrading
|
|
notes in this document for that release.
|
|
|
|
- Enhancements:
|
|
Stereo 3D format support for vpxenc
|
|
Runtime detection of available processor cores.
|
|
Allow specifying --end-usage by enum name
|
|
vpxdec: test for frame corruption
|
|
vpxenc: add quantizer histogram display
|
|
vpxenc: add rate histogram display
|
|
Set VPX_FRAME_IS_DROPPABLE
|
|
update configure for ios sdk 4.3
|
|
Avoid text relocations in ARM vp8 decoder
|
|
Generate a vpx.pc file for pkg-config.
|
|
New ways of passing encoded data between encoder and decoder.
|
|
|
|
- Speed:
|
|
This release includes across-the-board speed improvements to the
|
|
encoder. On x86, these measure at approximately 11.5% in Best mode,
|
|
21.5% in Good mode (speed 0), and 22.5% in Realtime mode (speed 6).
|
|
On ARM Cortex A9 with Neon extensions, real-time encoding of video
|
|
telephony content is 35% faster than Bali on single core and 48%
|
|
faster on multi-core. On the NVidia Tegra2 platform, real time
|
|
encoding is 40% faster than Bali.
|
|
|
|
Decoder speed was not a priority for this release, but improved
|
|
approximately 8.4% on x86.
|
|
|
|
Reduce motion vector search on alt-ref frame.
|
|
Encoder loopfilter running in its own thread
|
|
Reworked loopfilter to precalculate more parameters
|
|
SSE2/SSSE3 optimizations for build_predictors_mbuv{,_s}().
|
|
Make hor UV predict ~2x faster (73 vs 132 cycles) using SSSE3.
|
|
Removed redundant checks
|
|
Reduced structure sizes
|
|
utilize preload in ARMv6 MC/LPF/Copy routines
|
|
ARM optimized quantization, dfct, variance, subtract
|
|
Increase chrow row alignment to 16 bytes.
|
|
disable trellis optimization for first pass
|
|
Write SSSE3 sub-pixel filter function
|
|
Improve SSE2 half-pixel filter funtions
|
|
Add vp8_sub_pixel_variance16x8_ssse3 function
|
|
Reduce unnecessary distortion computation
|
|
Use diamond search to replace full search
|
|
Preload reference area in sub-pixel motion search (real-time mode)
|
|
|
|
- Quality:
|
|
This release focused primarily on one-pass use cases, including
|
|
video conferencing. Low latency data rate control was significantly
|
|
improved, improving streamability over bandwidth constrained links.
|
|
Added support for error concealment, allowing frames to maintain
|
|
visual quality in the presence of substantial packet loss.
|
|
|
|
Add rc_max_intra_bitrate_pct control
|
|
Limit size of initial keyframe in one-pass.
|
|
Improve framerate adaptation
|
|
Improved 1-pass CBR rate control
|
|
Improved KF insertion after fades to still.
|
|
Improved key frame detection.
|
|
Improved activity masking (lower PSNR impact for same SSIM boost)
|
|
Improved interaction between GF and ARFs
|
|
Adding error-concealment to the decoder.
|
|
Adding support for independent partitions
|
|
Adjusted rate-distortion constants
|
|
|
|
|
|
- Bug Fixes:
|
|
Removed firstpass motion map
|
|
Fix parallel make install
|
|
Fix multithreaded encoding for 1 MB wide frame
|
|
Fixed iwalsh_neon build problems with RVDS4.1
|
|
Fix semaphore emulation, spin-wait intrinsics on Windows
|
|
Fix build with xcode4 and simplify GLOBAL.
|
|
Mark ARM asm objects as allowing a non-executable stack.
|
|
Fix vpxenc encoding incorrect webm file header on big endian
|
|
|
|
|
|
2011-03-07 v0.9.6 "Bali"
|
|
Our second named release, focused on a faster, higher quality, encoder.
|
|
|
|
- Upgrading:
|
|
This release is backwards compatible with Aylesbury (v0.9.5). Users
|
|
of older releases should refer to the Upgrading notes in this
|
|
document for that release.
|
|
|
|
- Enhancements:
|
|
vpxenc --psnr shows a summary when encode completes
|
|
--tune=ssim option to enable activity masking
|
|
improved postproc visualizations for development
|
|
updated support for Apple iOS to SDK 4.2
|
|
query decoder to determine which reference frames were updated
|
|
implemented error tracking in the decoder
|
|
fix pipe support on windows
|
|
|
|
- Speed:
|
|
Primary focus was on good quality mode, speed 0. Average improvement
|
|
on x86 about 40%, up to 100% on user-generated content at that speed.
|
|
Best quality mode speed improved 35%, and realtime speed 10-20%. This
|
|
release also saw significant improvement in realtime encoding speed
|
|
on ARM platforms.
|
|
|
|
Improved encoder threading
|
|
Dont pick encoder filter level when loopfilter is disabled.
|
|
Avoid double copying of key frames into alt and golden buffer
|
|
FDCT optimizations.
|
|
x86 sse2 temporal filter
|
|
SSSE3 version of fast quantizer
|
|
vp8_rd_pick_best_mbsegmentation code restructure
|
|
Adjusted breakout RD for SPLITMV
|
|
Changed segmentation check order
|
|
Improved rd_pick_intra4x4block
|
|
Adds armv6 optimized variance calculation
|
|
ARMv6 optimized sad16x16
|
|
ARMv6 optimized half pixel variance calculations
|
|
Full search SAD function optimization in SSE4.1
|
|
Improve MV prediction accuracy to achieve performance gain
|
|
Improve MV prediction in vp8_pick_inter_mode() for speed>3
|
|
|
|
- Quality:
|
|
Best quality mode improved PSNR 6.3%, and SSIM 6.1%. This release
|
|
also includes support for "activity masking," which greatly improves
|
|
SSIM at the expense of PSNR. For now, this feature is available with
|
|
the --tune=ssim option. Further experimentation in this area
|
|
is ongoing. This release also introduces a new rate control mode
|
|
called "CQ," which changes the allocation of bits within a clip to
|
|
the sections where they will have the most visual impact.
|
|
|
|
Tuning for the more exact quantizer.
|
|
Relax rate control for last few frames
|
|
CQ Mode
|
|
Limit key frame quantizer for forced key frames.
|
|
KF/GF Pulsing
|
|
Add simple version of activity masking.
|
|
make rdmult adaptive for intra in quantizer RDO
|
|
cap the best quantizer for 2nd order DC
|
|
change the threshold of DC check for encode breakout
|
|
|
|
- Bug Fixes:
|
|
Fix crash on Sparc Solaris.
|
|
Fix counter of fixed keyframe distance
|
|
ARNR filter pointer update bug fix
|
|
Fixed use of motion percentage in KF/GF group calc
|
|
Changed condition for using RD in Intra Mode
|
|
Fix encoder real-time only configuration.
|
|
Fix ARM encoder crash with multiple token partitions
|
|
Fixed bug first cluster timecode of webm file is wrong.
|
|
Fixed various encoder bugs with odd-sized images
|
|
vp8e_get_preview fixed when spatial resampling enabled
|
|
quantizer: fix assertion in fast quantizer path
|
|
Allocate source buffers to be multiples of 16
|
|
Fix for manual Golden frame frequency
|
|
Fix drastic undershoot in long form content
|
|
|
|
|
|
2010-10-28 v0.9.5 "Aylesbury"
|
|
Our first named release, focused on a faster decoder, and a better encoder.
|
|
|
|
- Upgrading:
|
|
This release incorporates backwards-incompatible changes to the
|
|
ivfenc and ivfdec tools. These tools are now called vpxenc and vpxdec.
|
|
|
|
vpxdec
|
|
* the -q (quiet) option has been removed, and replaced with
|
|
-v (verbose). the output is quiet by default. Use -v to see
|
|
the version number of the binary.
|
|
|
|
* The default behavior is now to write output to a single file
|
|
instead of individual frames. The -y option has been removed.
|
|
Y4M output is the default.
|
|
|
|
* For raw I420/YV12 output instead of Y4M, the --i420 or --yv12
|
|
options must be specified.
|
|
|
|
$ ivfdec -o OUTPUT INPUT
|
|
$ vpxdec --i420 -o OUTPUT INPUT
|
|
|
|
* If an output file is not specified, the default is to write
|
|
Y4M to stdout. This makes piping more natural.
|
|
|
|
$ ivfdec -y -o - INPUT | ...
|
|
$ vpxdec INPUT | ...
|
|
|
|
* The output file has additional flexibility for formatting the
|
|
filename. It supports escape characters for constructing a
|
|
filename from the width, height, and sequence number. This
|
|
replaces the -p option. To get the equivalent:
|
|
|
|
$ ivfdec -p frame INPUT
|
|
$ vpxdec --i420 -o frame-%wx%h-%4.i420 INPUT
|
|
|
|
vpxenc
|
|
* The output file must be specified with -o, rather than as the
|
|
last argument.
|
|
|
|
$ ivfenc <options> INPUT OUTPUT
|
|
$ vpxenc <options> -o OUTPUT INPUT
|
|
|
|
* The output defaults to webm. To get IVF output, use the --ivf
|
|
option.
|
|
|
|
$ ivfenc <options> INPUT OUTPUT.ivf
|
|
$ vpxenc <options> -o OUTPUT.ivf --ivf INPUT
|
|
|
|
|
|
- Enhancements:
|
|
ivfenc and ivfdec have been renamed to vpxenc, vpxdec.
|
|
vpxdec supports .webm input
|
|
vpxdec writes .y4m by default
|
|
vpxenc writes .webm output by default
|
|
vpxenc --psnr now shows the average/overall PSNR at the end
|
|
ARM platforms now support runtime cpu detection
|
|
vpxdec visualizations added for motion vectors, block modes, references
|
|
vpxdec now silent by default
|
|
vpxdec --progress shows frame-by-frame timing information
|
|
vpxenc supports the distinction between --fps and --timebase
|
|
NASM is now a supported assembler
|
|
configure: enable PIC for shared libs by default
|
|
configure: add --enable-small
|
|
configure: support for ppc32-linux-gcc
|
|
configure: support for sparc-solaris-gcc
|
|
|
|
- Bugs:
|
|
Improve handling of invalid frames
|
|
Fix valgrind errors in the NEON loop filters.
|
|
Fix loopfilter delta zero transitions
|
|
Fix valgrind errors in vp8_sixtap_predict8x4_armv6().
|
|
Build fixes for darwin-icc
|
|
|
|
- Speed:
|
|
20-40% (average 28%) improvement in libvpx decoder speed,
|
|
including:
|
|
Rewrite vp8_short_walsh4x4_sse2()
|
|
Optimizations on the loopfilters.
|
|
Miscellaneous improvements for Atom
|
|
Add 4-tap version of 2nd-pass ARMv6 MC filter.
|
|
Improved multithread utilization
|
|
Better instruction choices on x86
|
|
reorder data to use wider instructions
|
|
Update NEON wide idcts
|
|
Make block access to frame buffer sequential
|
|
Improved subset block search
|
|
Bilinear subpixel optimizations for ssse3.
|
|
Decrease memory footprint
|
|
|
|
Encoder speed improvements (percentage gain not measured):
|
|
Skip unnecessary search of identical frames
|
|
Add SSE2 subtract functions
|
|
Improve bounds checking in vp8_diamond_search_sadx4()
|
|
Added vp8_fast_quantize_b_sse2
|
|
|
|
- Quality:
|
|
Over 7% overall PSNR improvement (6.3% SSIM) in "best" quality
|
|
encoding mode, and up to 60% improvement on very noisy, still
|
|
or slow moving source video
|
|
|
|
Motion compensated temporal filter for Alt-Ref Noise Reduction
|
|
Improved use of trellis quantization on 2nd order Y blocks
|
|
Tune effect of motion on KF/GF boost in two pass
|
|
Allow coefficient optimization for good quality speed 0.
|
|
Improved control of active min quantizer for two pass.
|
|
Enable ARFs for non-lagged compress
|
|
|
|
2010-09-02 v0.9.2
|
|
- Enhancements:
|
|
Disable frame dropping by default
|
|
Improved multithreaded performance
|
|
Improved Force Key Frame Behaviour
|
|
Increased rate control buffer level precision
|
|
Fix bug in 1st pass motion compensation
|
|
ivfenc: correct fixed kf interval, --disable-kf
|
|
- Speed:
|
|
Changed above and left context data layout
|
|
Rework idct calling structure.
|
|
Removed unnecessary MB_MODE_INFO copies
|
|
x86: SSSE3 sixtap prediction
|
|
Reworked IDCT to include reconstruction (add) step
|
|
Swap alt/gold/new/last frame buffer ptrs instead of copying.
|
|
Improve SSE2 loopfilter functions
|
|
Change bitreader to use a larger window.
|
|
Avoid loopfilter reinitialization when possible
|
|
- Quality:
|
|
Normalize quantizer's zero bin and rounding factors
|
|
Add trellis quantization.
|
|
Make the quantizer exact.
|
|
Updates to ARNR filtering algorithm
|
|
Fix breakout thresh computation for golden & AltRef frames
|
|
Redo the forward 4x4 dct
|
|
Improve the accuracy of forward walsh-hadamard transform
|
|
Further adjustment of RD behaviour with Q and Zbin.
|
|
- Build System:
|
|
Allow linking of libs built with MinGW to MSVC
|
|
Fix target auto-detection on mingw32
|
|
Allow --cpu= to work for x86.
|
|
configure: pass original arguments through to make dist
|
|
Fix builds without runtime CPU detection
|
|
msvs: fix install of codec sources
|
|
msvs: Change devenv.com command line for better msys support
|
|
msvs: Add vs9 targets.
|
|
Add x86_64-linux-icc target
|
|
- Bugs:
|
|
Potential crashes on older MinGW builds
|
|
Fix two-pass framrate for Y4M input.
|
|
Fixed simple loop filter, other crashes on ARM v6
|
|
arm: fix missing dependency with --enable-shared
|
|
configure: support directories containing .o
|
|
Replace pinsrw (SSE) with MMX instructions
|
|
apple: include proper mach primatives
|
|
Fixed rate control bug with long key frame interval.
|
|
Fix DSO link errors on x86-64 when not using a version script
|
|
Fixed buffer selection for UV in AltRef filtering
|
|
|
|
|
|
2010-06-17 v0.9.1
|
|
- Enhancements:
|
|
* ivfenc/ivfdec now support YUV4MPEG2 input and pipe I/O
|
|
* Speed optimizations
|
|
- Bugfixes:
|
|
* Rate control
|
|
* Prevent out-of-bounds accesses on invalid data
|
|
- Build system updates:
|
|
* Detect toolchain to be used automatically for native builds
|
|
* Support building shared libraries
|
|
* Better autotools emulation (--prefix, --libdir, DESTDIR)
|
|
- Updated LICENSE
|
|
* http://webmproject.blogspot.com/2010/06/changes-to-webm-open-source-license.html
|
|
|
|
|
|
2010-05-18 v0.9.0
|
|
- Initial open source release. Welcome to WebM and VP8!
|
|
|