this function is failing the 'accum == 0' assert on skia bots for
rescaling to 13x13
BUG=skia:6682
Change-Id: I9f9f3adf28cec63ad6e38ed3128f18825d5b70cc
drop './' from the reference in webp_quality_LDADD.. this form is used
in the other makefiles. this fixes a parallel build failure seen under
freebsd:
make[1]: don't know how to make ./libwebpextras.la. Stop
Change-Id: I59635a0c747d402cd990f6379eb1948c3e40f278
Documentation says: "if kmin == 0, then key-frame insertion is disabled;
and if kmax == 0, then all frames will be key-frames."
Reading this, you'd expect that if kmax == 0, then with any kmin <= 0
all frames will be key-frames. But actually the kmin <= 0 test is caught
first and you get the opposite (no keyframes but the first). You'd have
instead to set kmax == 0 and any value kmin > 0, which is absolutely
counter-intuitive (reversing order).
Moreover kmax == 1 has no valid kmin (kmin == 1 conflicts with the
`kmax > kmin` rule and kmin == 0 conflicts with `kmin >= kmax / 2 + 1`).
So it should be considered an exception too.
Instead I propose this new logic:
- kmax == 1 means that all frames are keyframes (you are explicitly
requesting a keyframe every 1 frame at most, i.e. all frames).
- kmax == 0 means no keyframes (you ask for a keyframe every 0 frames,
i.e. never).
This is more "logical" language-wise, and also does not involve any
conflicts about what if both kmax and kmin are 0, since now a single
property value is meaningful for the 2 exceptional cases.
Change-Id: Ia90fb963bc26904ff078d2e4ef9f74b22b13a0fd
(cherry picked from commit 2dc0bdcaee)
Compile with XCode, it appears quite slower than the C-version,
especially for arm64.
Change-Id: Ic46dba184a36be454fef674129d2f909003788fc
(cherry picked from commit 4f3e3bbd44)
the default is format is roman, fixes:
`R' is a string (producing the registered sign), not a macro.
Change-Id: If2bce714eff1237cd1702ae1143323249d85b93b
strip '_debug' from the library basename to form the resource target.
since:
919f9e2f Merge "add .rc files for windows dll versioning"
BUG=webp:323
Change-Id: I97cfa48afa846211385720034a40aa452c68134c
this avoids duplicates between these trees and dsp/, e.g., enc/tree.c,
dec/tree.c, making pulling the whole library source tree into one target
possible
BUG=webp:279
Change-Id: I060a614833c7c24ddd37bf641702ae6a5eef1775
remove libimagedec from webp_quality, only the webp headers are
inspected. libwebpextras isn't needed by get_disto.
Change-Id: Ib85f97e2c4a9edd97392fd20ef294d1ccc76dda5
We can switch at run-time between the standard GetCoeffs() critical
function, that uses a fast variant of VP8GetBit().
However, some platforms have slow instructions that make standard
VP8GetBit() slow. GetCoeffs() is the right level of branching to
switch to GetCoeffsAlt() that avoids these slow instructions in some
not-frequent cases.
Next patch will upgrade VP8GetBit() to use clz, after this one
is proved to be neutral speed-wise.
Change-Id: Ia6cef5de9de6131574d2202bbc0bea8559c9b693
vmlal_u8() is prone to overflow during the accumulation.
There was a mismatch happening at low q mostly. Because in this
case the distortion is important and the accumulated sum was
later than 16bit-unsigned.
Change-Id: I1a08a2f744bcdf0b26647e61b9ee92a0c2e28fe8
This makes the structure more generic, without the hard-coded
internal structure.
This is a borderline incompatible ABI change, even if WebPIDecoder structure
is opaque.
Change-Id: I518765c3f76fc17a136cef045a5a8aa70ed70e85
30% faster on x86, 5% faster on N5.
New generic function: WebPLog2FloorC()
This function is called as fallback for BitsLog2Floor() when there's
no clz() available.
Change-Id: Ica15c6092112e514c0e200fab89c434de48d4b19
This is meant to be used for run-time detection of slow platforms
regarding instructions like pshufb and bsr.
Adapted from libvpx patch: https://chromium-review.googlesource.com/#/c/367731
Change-Id: I2c22fbb9aae699d87a041393ba1ad5f1f21ff640
and 15% faster MultARGBRow()
by switching to formulae:
X / 255 = (X + 1 + (X >> 8)) >> 8 for any 16bit value X.
(X / 255 + .5) = (XX + (XX >> 8)) >> 8, with XX = X + 128
Change-Id: Ia4a7408aee74d7f61b58f5dff304d05546c04e81
The previous optimization was performing dichotomy on a function that
is anything in practice, hence a bit of randomness.
Also, two magic constants were used, one for an extra constant cost,
one for an extra linear cost. Both values/models were empirical.
A brute force search for the best cache size is now performed.
To have less CPU impact, a speed optimization is also made by not
inserting a value again and again.
This makes sense but it's also the most common case of when LZ77 is
useful hence an overall improvement sometimes.
Change-Id: I57de5750ad2313b2feecbcd15cd6e4feeb98e5c8
libwebp-0.5.2
- 12/13/2016: version 0.5.2
This is a binary compatible release.
This release covers CVE-2016-8888 and CVE-2016-9085.
* further security related hardening in the tools; fixes to
gif2webp/AnimEncoder (issues #310, #314, #316, #322), cwebp/libwebp (issue
#312)
* full libwebp (encoder & decoder) iOS framework; libwebpdecoder
WebP.framework renamed to WebPDecoder.framework (issue #307)
* CMake support for Android Studio (2.2)
* miscellaneous build related fixes (issue #306, #313)
* miscellaneous documentation improvements (issue #225)
* minor lossy encoder fixes and improvements
* tag 'v0.5.2': (54 commits)
update ChangeLog
anim_util: quiet implicit conv warnings in 32-bit
jpegdec: correct ContextFill signature
Remove some errors when compiling the code as C++.
vwebp: clear canvas during resize w/o animation
tiffdec: restore libtiff 3.9.x compatibility
update NEWS
AnimEncoder: avoid freeing uninitialized memory pointer.
WebPAnimEncoder: If 'minimize_size' and 'allow_mixed' on, try lossy + lossless.
fix a potential overflow with MALLOC_LIMIT
bump version to 0.5.2
update AUTHORS & .mailmap
iosbuild.sh: add WebPDecoder.framework + encoder
AnimEncoder: Correctly skip a frame when sub-rectangle is empty.
Fix assertions in WebPRescalerExportRow()
fix a typo in WebPPictureYUVAToARGB's doc
systematically call WebPDemuxReleaseIterator() on dec->prev_iter_
doc: use two's complement explicitly for uint8->int8 conversion
Anim_encoder: correctly handle enc->prev_candidate_undecided_
WebPPictureDistortion(): free() -> WebPSafeFree()
...
Change-Id: I16bcf54af41ce8fad98d4fbc8aa1df58f338fc23
This is duplicating code from compiler flag checking that was once
added (then reverted to CMake) as the problem seem to be on the
clang side as detailed in:
http://public.kitware.com/Bug/view.php?id=13194
I also tried to remove a similar warning with pthreads but there
is also an issue on the clang side:
https://llvm.org/bugs/show_bug.cgi?id=7798
Change-Id: I5b0061f0f71e49b493c5ee0c98f70533c28164bd
the sizes are already validated by CheckSizeForOverflow(), add casts to
size_t to avoid -Wshorten-64-to-32
Change-Id: Ida9102c2104f4a334a0ad16d6e01a12bedfd4eec
(cherry picked from commit 1e2e25b0d4)
this corrects the checkboard pattern displayed with transparent images
Change-Id: I5f46dbc9fa3893d61f5f1d4fda643ac030238f94
(cherry picked from commit 67c25ad5b4)
In GenerateCandidates(), when candidate_ll->evaluate_ and
candidate_lossy->evaluate_ are both true, if lossless encoding
exits on error, candidate_ll->evaluate_ would not be correctly
reset. This will cause freeing uninitialized memory pointer in
SetFrame().
BUG=webp:322
Change-Id: I481b49a186e4fa3607ce71b4543a481083edf444
(cherry picked from commit 3ebe1c0003)
This improves compression by ~5% at default quality.
If only 'allow_mixed' is on (but 'minimize_size' isn't), we continue to
use a heuristic to try one of the two or both.
Change-Id: Ia573a73ea26ad25f9debff759eed69d2b0449e82
(cherry picked from commit 3f4042b52a)
In GenerateCandidates(), when candidate_ll->evaluate_ and
candidate_lossy->evaluate_ are both true, if lossless encoding
exits on error, candidate_ll->evaluate_ would not be correctly
reset. This will cause freeing uninitialized memory pointer in
SetFrame().
BUG=webp:322
Change-Id: I481b49a186e4fa3607ce71b4543a481083edf444
the sizes are already validated by CheckSizeForOverflow(), add casts to
size_t to avoid -Wshorten-64-to-32
Change-Id: Ida9102c2104f4a334a0ad16d6e01a12bedfd4eec
Passing the 'verbose' flag to DecodeWebP() wasn't mandated,
and was creating a forced dependency between imageio/ and examples/
Change-Id: Ib3d3f381a7b699df369a97cfb44360580422df11
after:
fbba5bc optimize predictor #1 in plain-C For some reason, gcc has hard
time inlining this one...
Change-Id: I2e2416593acd4c9d14958d8757bfd284d999100b
WebPDecoder.framework replaces WebP.framework as the decode-only
framework.
WebP.framework now includes the full library allowing for use of the
encoder.
BUG=webp:307
Change-Id: Ic8139f201576bf94b0d4a31cb7cad0655cd8ba97
(cherry picked from commit 1d5046d1f9)
For some reason, gcc has hard time inlining this one...
Also optimize predictor #0 and #1 for encoding, so we don't have to
call the generic pointers VP8LPredictors[...]
Change-Id: I1ff31e3b83874b53f84fe23487f644619fd61db9
WebPDecoder.framework replaces WebP.framework as the decode-only
framework.
WebP.framework now includes the full library allowing for use of the
encoder.
BUG=webp:307
Change-Id: Ic8139f201576bf94b0d4a31cb7cad0655cd8ba97
Set enc->prev_candidate_undecided_ as 0 when a frame is not chosen
as a possible keyframe, so that the dispose method can be
dispose-to-background.
Change-Id: If2899f5dbc06fb53705fb8240072ab6440a6de12
(cherry picked from commit 29fedbf58b)
When try_both_modes=0 (that is: -m 0 or -m 1), and the mode is i4,
we were still sometimes falling back to (unexplored, uninitialized) i16 mode,
which resulted in a enc/dec mismatch.
This was mainly occurring for large images (when bit_limit is low enough)
We disable the fall-back by disabling bit_limit using a large MAX_COST threshold.
Change-Id: I0c60257595812bd813b239ff4c86703ddf63cbf8
(cherry picked from commit 0a3838ca77)
the min-distortion was quite too low. And we were also
considering the fully skipped macroblocks (nz=0) in the stats.
We need to have at least *some* non-zero dc coeffs (nz=0x100XXXX).
Fix also two typos in StoreMaxDelta: the v0/v1 comparison was wrong,
and the DCs[] coeffs are actually already in ZigZag order.
Change-Id: I602aaa74b36f7ce80017e506212c7d6fd9deba1f
(cherry picked from commit e4cd4daf74)
some multiplies here and there needed some extra checks
and error reporting. Even if width * height is guaranteed
to be < 2**32, we were multiplying by num_channels and
triggering a 32b overflow.
Some multiplies were not using size_t or uint64_t, additionally.
Change-Id: If2a35b94c8af204135f4b88a7fd63850aa381bbf
(cherry picked from commit 1c36440094)
Change it from transparent white to transparent black, which matches
the transparent color assumed in Webp dispose-to-background method.
Also pre-multiply background colors before comparison in anim_diff,
just as what is done with regular pixel values.
Change-Id: I5a790522df21619c666ce499f73e42294ed276f2
(cherry picked from commit 43bd895879)
WEBP_HAVE_FLAG_LOCAL -> WEBP_HAVE_FLAG_${WEBP_SIMD_FLAG} this will
include the flag being tested in the output:
-- Performing Test WEBP_HAVE_FLAG_NEON
Change-Id: I1c0a143a857b16e4eb1fcf8b23c176380a5fef29
(cherry picked from commit ee1057e3b1)
This will well isolate contributions for original code,
generated code and SIMD (especially for Android).
Change-Id: Ie47664decc7f43c2f57260a72cab951c347281a7
(cherry picked from commit 6c62841076)
brings the final libwebp.so size down 16/20K with arm64/armv7 builds
using ndk-r13
Change-Id: I20d8aba61d6b692b0fc32f4b271e2f9872f03c28
(cherry picked from commit de568abfdb)
quiets a -Wimplicit-function-declaration with some configurations of gcc
(-std=c99).
_POSIX_C_SOURCE is preferred over _BSD_SOURCE with newer versions of
glibc
Change-Id: I378bffb13ba52ff5c4bad1433090dcc387e5d507
(cherry picked from commit bfab894739)
The image is scaled to fit the whole viewport.
Avoid some oddities with offsets, etc.
removes some TODO.
Change-Id: I52fae9ca80a2feed234f32261c7f6358d7594e21
(cherry picked from commit 9310d19258)
max_i4_header_bits_ could drop to zero for difficult image and trigger
a loop. Surprisingly, StatLoop() didn't have this bug.
Change-Id: Idc0f9eadef30a2b2f02041b994f25def30901e36
(cherry picked from commit 21e7537abe)
Pick the mode with the smallest alpha.
It only affects m0, in which case the mode decision is not re-examined
later in VP8Decimate(). Tests on some natural content png images show
PSNR increase as well as visual quality improvement.
Change-Id: Iea997e718cd7477160fa05eb7cfb35f4cec2fa9a
(cherry picked from commit 1377ac2ec1)
use a compile check on a separate file to avoid assuming using
arm_neon.h is safe to use without flags when just the file itself is
self-contained with GCC target pragmas.
BUG=webp:313
Change-Id: I48f92ae3e6e4a9468ea5b937c80a89ee40b2dcfd
(cherry picked from commit 0104d730bf)
Average3 created a slowdown of 1-2% in lossless decoding.
Average4 created a slowdown of 2-3% in lossless decoding.
Change-Id: Ic2e62cdd83fc897887ec2bf41ea7cadbada84fe5
...instead of the pointers stored in the array.
Should be faster (inlined) and safer.
Also: suffix explicitly the functions with _SSE2
Change-Id: Ie7de4b8876caea15067fdbe44abfedd72b299a90
Before, a first thread could enter VP8LDspInitSSE2, set
VP8LPredictorsAdd to an SSE2 version BEFORE another thread
would do the memcpy from VP8LPredictorsAdd to VP8LPredictorsAdd_C
thus leading to a C version actually being the SSE2 one (which
would then create an infinite recursion in the SSE2 predictors
at execution).
Change-Id: I224f4ceab31d38f77a1375a7e2636a6014080e3a
it actually disables the disposal / blending method
and just displays the raw delta values.
Useful for debugging.
TODO: Outline the refreshed area with a drawn rectangle?
Change-Id: I6f8cddd0aad8b953cff78a693ae7e8c31def010c
Benchmarks from vrabaud@:
8BIT/GRAY corpus speed: faster: -4.3 % , corpus size: unchanged
skal/sources_png_skal corpus speed: faster: -5.2 % , corpus size: unchanged
images/png_rgb corpus speed: faster: -5.1 % , corpus size: unchanged
images/lpcb corpus speed: unchanged, corpus size: unchanged
images/png_big corpus speed: faster: -1.7 % , corpus size: unchanged
images/png_doc corpus speed: unchanged, corpus size: unchanged
images/png_1bit corpus speed: faster: -1.2 % , corpus size: unchanged
images/jpeg_small corpus speed: unchanged, corpus size: unchanged
images/icip_core1 corpus speed: unchanged, corpus size: unchanged
images/png_gray corpus speed: faster: -2.5 % , corpus size: unchanged
images/jpeg_high_quality corpus speed: faster: -4.0 % , corpus size: unchanged
images/jpeg corpus speed: faster: -2.3 % , corpus size: unchanged
images/png_translucent corpus speed: faster: -2.8 % , corpus size: unchanged
images/gif corpus speed: faster: -1.4 % , corpus size: unchanged
images/png_opaque corpus speed: faster: -2.8 % , corpus size: unchanged
images/png_rgb_opaque corpus speed: unchanged, corpus size: unchanged
images/png_indexed corpus speed: faster: -2.0 % , corpus size: unchanged
images/all corpus speed: faster: -1.5 % , corpus size: unchanged
images/png_small corpus speed: unchanged, corpus size: unchanged
images/png corpus speed: unchanged, corpus size: unchanged
images/gif_still corpus speed: faster: -1.6 % , corpus size: unchanged
Change-Id: I69fe11baa188c5d32cbc77a84b8c0deae13d792b
avoiding triplets of data should make it easier to write SSE2 versions.
FilterRow() can now filter all input in one single pass
-> conversion is 15-20% faster (but still overall slow compared to -pre 0)
Change-Id: I14c3215e672fdecde7ec80394e814bdc7445019f
When try_both_modes=0 (that is: -m 0 or -m 1), and the mode is i4,
we were still sometimes falling back to (unexplored, uninitialized) i16 mode,
which resulted in a enc/dec mismatch.
This was mainly occurring for large images (when bit_limit is low enough)
We disable the fall-back by disabling bit_limit using a large MAX_COST threshold.
Change-Id: I0c60257595812bd813b239ff4c86703ddf63cbf8
The options are now:
-duration d -> set the whole animation to duration 'd'
-duration d,s -> set only frame 's' to duration 'd'
-duration d,s,e -> set only interval [s,d] to duration 'd'
+ style fix
Change-Id: I72e95282d520146f76696666f44280ad9506affa
avoids int rollover when working with large input
BUG=webp:312
Change-Id: I6ad9f93b6c4b665c559bff87716a7b847f66a20d
(cherry picked from commit 342e15f0ce)
avoids int rollover when working with large input
BUG=webp:312
Change-Id: I2881bec2884b550c966108beeff1bf0d8ef9f76b
(cherry picked from commit 1147ab4ee7)
avoids int rollover when working with large input
BUG=webp:312
Change-Id: I693cbb295df9cf94aa89294b19c0496bdbe84d18
(cherry picked from commit de9fa5074e)
avoids int rollover when working with large input
BUG=webp:312
Change-Id: I3d7b689be8d5751248a82d1021243d80d3f67203
(cherry picked from commit deb1b83199)
this will force a constant duration for an interval of frames
in an animation.
Notes:
a) '-duration [...]' can be repeated as many times as needed.
b) intervals are taken into account in option order. If they overlap, values will be overwritten.
c) 'start' and 'end' can be omitted, but not the duration value.
d) 'end' can be equal to '0', in which case it means 'last frame'
e) single-image files are untouched (ie. not turned into an animation file).
Some example usage:
webpmux -duration 150 in.webp -o out.webp
webpmux -duration 33,10,0 in.webp -o out.webp
webpmux -duration 200,2 -duration 150,0,50 in.webp -o out.webp
Change-Id: I9b595dafa77f9221bacd080be7858b1457f54636
the min-distortion was quite too low. And we were also
considering the fully skipped macroblocks (nz=0) in the stats.
We need to have at least *some* non-zero dc coeffs (nz=0x100XXXX).
Fix also two typos in StoreMaxDelta: the v0/v1 comparison was wrong,
and the DCs[] coeffs are actually already in ZigZag order.
Change-Id: I602aaa74b36f7ce80017e506212c7d6fd9deba1f
and use it to validate decoder allocations. fixes a crash in jpegdec at
least.
BUG=webp:312
Change-Id: Ia940590098f29510add6aad10a8dfe9e9ea46bf4
(cherry picked from commit bc86b7a8a1)
make this function return success/failure.
an empty map or out of bounds read is treated as an error.
BUG=webp:316
Change-Id: Ic8651836915ea4dd8e0dc81ca8d5d3f247be1ff8
(cherry picked from commit cac9a36a23)
- remove the inclusion of format_constants.h
- use incremental update of pointer, instead of arithmetic
(follow-up to e2affacc35)
Change-Id: I48420c8defc8d47339f54bc00e9da9617f08ab32
(cherry picked from commit 9f5c8eca79)
Mostly: avoid doing calculation like: ptr + j * stride
when stride is 'int'. Rather use size_t, or pointer increments (ptr += stride)
when possible.
BUG=webp:314
Change-Id: I81c684b515dd1ec4f601f32d50a6e821c4e46e20
(cherry picked from commit e2affacc35)
DGifGetExtension() may successfully return, but the data pointer should
still be validated
BUG=webp:310
Change-Id: I6cfe617871fef2fe07887e5f48bb20f7ab7cfb35
(cherry picked from commit 806f6279ae)
make this function return success/failure.
an empty map or out of bounds read is treated as an error.
BUG=webp:316
Change-Id: Ic8651836915ea4dd8e0dc81ca8d5d3f247be1ff8
This reverts commit f048d38d38.
the 'len' in Remap refers to the src[] not the colormap; this change
breaks valid files
BUG=webp:316
Change-Id: I1ed40075c2194df91d345cb6f29619b1f5cc96fc
Roughly, if both the source and the reference areas are
darker too dark (R/G/B <= ~6), they are ignored.
One caveat: SSIM calculation won't work for U/V planes,
which are 128-centered and not related to luminance.
But WebPPlaneDistortion() enforces the conversion to RGB,
if needed.
Change-Id: I586c2579c475583b8c90c5baefd766b1d5aea591
Make WebPPictureDistortion() only compute distortion on A/R/G/B planes, not Y/U/V(A).
(not just for SSIM, but PSNR too).
This is to avoid problems with using SSIM on U/V channels.
If Y/U/V distortion is needed, one can always use WebPPlaneDistortion() individually.
Change-Id: If8bc9c3ac12a8d2220f03224694fc389b16b7da9
use a compile check on a separate file to avoid assuming using
arm_neon.h is safe to use without flags when just the file itself is
self-contained with GCC target pragmas.
BUG=webp:313
Change-Id: I48f92ae3e6e4a9468ea5b937c80a89ee40b2dcfd
Also introduce an always-failing 'reader' for unknown formats.
So we don't have to check reader==NULL, code is more regular.
-> We can get read of specific ReadPNG(), ReadJPEG(), ... declaration and use.
Change-Id: I290759705420878f00c7223c726d4ad404afd9c4
- remove the inclusion of format_constants.h
- use incremental update of pointer, instead of arithmetic
(follow-up to e2affacc35)
Change-Id: I48420c8defc8d47339f54bc00e9da9617f08ab32
Mostly: avoid doing calculation like: ptr + j * stride
when stride is 'int'. Rather use size_t, or pointer increments (ptr += stride)
when possible.
BUG=webp:314
Change-Id: I81c684b515dd1ec4f601f32d50a6e821c4e46e20
When compiling as experimental, WEBP_EXPERIMENTAL_FEATURES
would not be defined because the header defining it would
not be included.
Hence runtime errors in debug mode when running:
./cwebp -lossles whatever
...
Error! Cannot encode picture as WebP
Error code: 4 (INVALID_CONFIGURATION: configuration is invalid)
(detail: WebPConfig would have a random value set for
delta_palettization as config.c does not consider
it to exist.)
Change-Id: I41761cffe81a971130ed514b195a73d1c6dac1b7
DGifGetExtension() may successfully return, but the data pointer should
still be validated
BUG=webp:310
Change-Id: I6cfe617871fef2fe07887e5f48bb20f7ab7cfb35
* prevent 64bit overflow by controlling the 32b->64b conversions
and preventively descaling by 8bit before the final multiply
* adjust the threshold constants C1 and C2 to de-emphasis the dark
areas
* use a hat-like filter instead of box-filtering to avoid blockiness
during averaging
SSIM distortion calc is actually *faster* now in SSE2, because of the
unrolling during the function rewrite.
The C-version is quite slower because still un-optimized.
Change-Id: I96e2715827f79d26faae354cc28c7406c6800c90
quiets a -Wimplicit-function-declaration with some configurations of gcc
(-std=c99).
_POSIX_C_SOURCE is preferred over _BSD_SOURCE with newer versions of
glibc
Change-Id: I378bffb13ba52ff5c4bad1433090dcc387e5d507
The image is scaled to fit the whole viewport.
Avoid some oddities with offsets, etc.
removes some TODO.
Change-Id: I52fae9ca80a2feed234f32261c7f6358d7594e21
If a small hash map can be used, use it to avoid binary search.
This fist hash function that is tried works with the previous
use case of having indexed data in green.
Change-Id: I2f91cec5f3ca7e9c393fd829e69e09bab74f4e7c
using -ssim -o will trigger SSIM map calculation instead of add-diff map.
-gray converts the error map to intensity instead of having each channels' error separated.
Change-Id: I4bdb88880a252e5562aa4e0e3c2353ad93aef20e
some multiplies here and there needed some extra checks
and error reporting. Even if width * height is guaranteed
to be < 2**32, we were multiplying by num_channels and
triggering a 32b overflow.
Some multiplies were not using size_t or uint64_t, additionally.
Change-Id: If2a35b94c8af204135f4b88a7fd63850aa381bbf
The most common conditions are re-ordered and cached.
iter_min was recently introduced to make sure enough iterations
are made in cases where there are many matches (mostly uniform regions).
Now that those are properly analyzed, it becomes useless.
Change-Id: Id3010ee4ec66b84d602fcb926f91eb9155ad27f4
-Skip examining quantized levels that are too high.
-Calculate last_pos_cost only when needed.
Encoding speed for m6 is increased by about 3%;
Compression performance is neutral.
Change-Id: I8af70b049587cca0375d9b3eb00479ec7c0c842a
max_i4_header_bits_ could drop to zero for difficult image and trigger
a loop. Surprisingly, StatLoop() didn't have this bug.
Change-Id: Idc0f9eadef30a2b2f02041b994f25def30901e36
Pick the mode with the smallest alpha.
It only affects m0, in which case the mode decision is not re-examined
later in VP8Decimate(). Tests on some natural content png images show
PSNR increase as well as visual quality improvement.
Change-Id: Iea997e718cd7477160fa05eb7cfb35f4cec2fa9a
SSIM results are incompatible with previous version!
We're now averaging the SSIM value for each pixels instead of
printing a frame-level global SSIM value.
* Got rid of some old code
* switched to uint32_t for accumulation
* refactoring
SSIM calculation is ~4x faster now.
Change-Id: I48d838e66aef5199b9b5cd5cddef6a98411f5673
Having it architecture dependent resulted in an extra
function call of an extern function, hence no inlining and
a 5-10% impact on performance.
Change-Id: I0ff40d2d881edc76d3594213a64ee53097d42450
-print_psnr is now much faster because it doesn't use the SSIM code.
The SSIM speed-up and re-write will come later.
Change-Id: Iabf565e0a8b41651d8164df1266cfeded4ab4823
we don't need to centralize best_uv[] since target_uv[] and best_rgb_uv[]
are already centralized. The diff 'W' was just in the ~[-2,2] range, so
we can ignore the correction.
Overall speed-impact is not large, though. Around ~4% faster conversion.
Output with -pre 4 is expected to be slightly different
Change-Id: Ib59f033955577c49b084d0560108020f42d84102
also: remove the useless clipping in StoreGray()
For speed reason, the 'gray' plane was initialized with the same
value for 2x2 block. But in some cases (underlying camera noise, e.g.),
it could lead to instability during iteration, noise amplification,
and visible banding.
Using a precise (but slower) initialization solves the issue, and
since the convergence is faster, we might actually gain some speed.
Change-Id: I81c42101497e7096a8f60289d710f5a3bcb0ddea
We usually need at least 2 iterations to converge
(and usually not much more after that). Only 1 was not enough.
Change-Id: Iaf802ea81afa2596f4ba045c92f5eaff61623b7b
Change it from transparent white to transparent black, which matches
the transparent color assumed in Webp dispose-to-background method.
Also pre-multiply background colors before comparison in anim_diff,
just as what is done with regular pixel values.
Change-Id: I5a790522df21619c666ce499f73e42294ed276f2
Set enc->prev_candidate_undecided_ as 0 when a frame is not chosen
as a possible keyframe, so that the dispose method can be
dispose-to-background.
Change-Id: If2899f5dbc06fb53705fb8240072ab6440a6de12
No need to find backward references for pixels in uniform regions
by looking at all pixels.
Only pixels at the same distance from the end need to be compared to.
Change-Id: I4f187e965f0667d3a929775726a412f7e69f6473
if src_{r,g,b} = 0, any value of src_a or dst_a such that
abs(src_a - dst_a) <= max_allowed_diff
was making the test pass, despite being very different-looking pixels.
The fix is to require same values for src_a and dst_a before attempting
an r/g/b comparison.
Change-Id: If3a55a229eab3904ed454f20065e49e35c39f25c
Constants are such that brute force is sometimes faster for some
data (mostly big images it seems).
Change-Id: I90aef536408683535e3b09ddfa2e77a9834038f6
gcc-4.8.3 was giving the undefined reference to ImgIoUtilReadFile and
ImgIoUtilCopyPlane when linking libimagedec.a
Change-Id: I8b85ab3a860602535629d0bb541a39754413728e
Return key/index if the query is found, and -1 otherwise.
The benefit of this is to save a hashing computation.
Change-Id: Iff056be330f5fb8204011259ac814f7677dd40fe
WEBP_HAVE_FLAG_LOCAL -> WEBP_HAVE_FLAG_${WEBP_SIMD_FLAG} this will
include the flag being tested in the output:
-- Performing Test WEBP_HAVE_FLAG_NEON
Change-Id: I1c0a143a857b16e4eb1fcf8b23c176380a5fef29
decoding and file i/o have been split to imageio, all that remains is
some string routines used for parameter parsing in the examples
Change-Id: I77386cd8aa39124b9e14c95fdbaa17ea4ab5bb24
This function returns a rough estimation of the quality factor used
for compressing the WebP bitstream.
Simple command line: 'webp_quality [-quiet] in.webp'
should print the estimated quality factor.
Change-Id: Ifba3e489461f5a587003ac9f08cc7556e9b24ac2
reserve src/ for the code of the main libraries: libwebp, libwebpdemux
and libwebpmux
+ demote this library to internal only; i.e., don't install the header +
lib with make install
Change-Id: I8c9844db8f494be0fa0a2549a5b75b5cebcf666d
We add the following MSA optimized rescaling functions:
- RescalerExportRowExpand
- RescalerExportRowShrink
Change-Id: Ic1c76065423b02617db94cf0c22bb564219b36e6
We add the following MSA optimized color transform functions:
- TransformColor
- SubtractGreenFromBlueAndRed
Change-Id: Ib182d2b5faa7191f503ce70f0dfde0ac89402fd3
This improves compression by ~5% at default quality.
If only 'allow_mixed' is on (but 'minimize_size' isn't), we continue to
use a heuristic to try one of the two or both.
Change-Id: Ia573a73ea26ad25f9debff759eed69d2b0449e82
The decision is based on the variance between DC values of each
sub-4x4 block. This heuristic is rather ok for predicting whether
the 2nd transform (intra-16) is going to help or not.
The decision threshold varies with quality (=quantization).
It's only used for -m 0 and -m 1, where no full RD-opt is performed.
It actually makes these modes quite faster, with RD curve much
closer to the -m 2 mode.
Change-Id: I15f972db97ba4082cbd1dfd16bee3eb2eca701a8
We add the following MSA optimized encoder quantization functions:
- QuantizeBlock
- Quantize2Blocks
Change-Id: Ie32b442afa99eee62d2ef48942b41116a4e157d3
pre-allocating a sorted[] array for most common cases of small
alphabet size cuts a lot of traffic.
Change-Id: I73ff2f6e507f81b0b0bb7d9801a344aa4bcb038a
This will well isolate contributions for original code,
generated code and SIMD (especially for Android).
Change-Id: Ie47664decc7f43c2f57260a72cab951c347281a7
+ s/src_a/dst_a/
+ remove unnecessary (void) as expected_num_lines_out is used within the
function
Change-Id: Ic45f798ef22bd19eaabf1a0512d1cf8a201bb4b5
libwebp-0.5.1
- 6/14/2016: version 0.5.1
This is a binary compatible release.
* miscellaneous bug fixes (issues #280, #289)
* reverted alpha plane encoding with color cache for compatibility with
libwebp 0.4.0->0.4.3 (issues #291, #298)
* lossless encoding performance improvements
* memory reduction in both lossless encoding and decoding
* force mux output to be in the extended format (VP8X) when undefined chunks
are present (issue #294)
* gradle, cmake build support
* workaround for compiler bug causing 64-bit decode failures on android
devices using clang-3.8 in the r11c NDK
* various WebPAnimEncoder improvements
* tag 'v0.5.1': (30 commits)
update ChangeLog
Clarify the expected 'config' lifespan in WebPIDecode()
update ChangeLog
Fix corner case in CostManagerInit.
gif2webp: normalize the number of .'s in the help message
vwebp: normalize the number of .'s in the help message
cwebp: normalize the number of .'s in the help message
fix rescaling bug: alpha plane wasn't filled with 0xff
Improve lossless compression.
'our bug tracker' -> 'the bug tracker'
normalize the number of .'s in the help message
pngdec,ReadFunc: throw an error on invalid read
decode.h,WebPGetInfo: normalize function comment
Inline GetResidual for speed.
Speed-up uniform-region processing.
free -> WebPSafeFree()
DecodeImageData(): change the incorrect assert
Fix a boundary case in BackwardReferencesHashChainDistanceOnly.
Make sure to consider small distances in LZ77.
add some asserts to delimit the perimeter of CostManager's operation
...
Change-Id: I44cee79fddd43527062ea9d83be67da42484ebfc
We add the following MSA optimized color transform functions:
- AddGreenToBlueAndRed
- TransformColorInverse
Change-Id: Iceab3813905955aa8b811253df9188512fc7de3f
(in case no alpha was present in the source .webp,
but user requested some)
Change-Id: I9011d38237907c60d6796a86bd2c72166aa80f27
(cherry picked from commit 06a38c7b1c)
This is essentially a revert of a3611513d2
and cfbcc5ece0.
Here is what happened: there was a corruption bug that eventually
got fixed by 0174d18d8b.
But before finding the root, a3611513d2
and cfbcc5ece0 hid the bug
by not imposing length of 1 when it was actually 2 or 3 (which does help
compression as a litteral is more efficient than an offset and a length
of size 2 or 3).
Change-Id: I6f18fc1f583a51ac9d8aab2508458264047cd493
convert the assert() to an error check to avoid crashing when reading
malformed files.
BUG=webp:302
Change-Id: I25eed9cab5c0a439bd3411beacc83f3a27af2bbf
We only perform a single pass, and swap the final histograms
into the beginning of the array as we go. Therefore, they are
already at the correct place at the end of the pass.
-> HistogramCompactBins() is removed, we just truncate the array.
output is bitwise the same.
Change-Id: I9508c96dda0f8903c927a71b06af4e6490c3249c
output should bit-write the same as before, in both
low_effort and non low_effort modes.
if anything, speed is a tad faster, probably because of the
reduced memory traffic.
Change-Id: Iaa2ddcfda2aaffefe7e5b7bc89216373d1ddb194
MAX_COLOR_COUNT was just a synonym and its use in the header was a bit
strange given the visibility of that define.
Change-Id: I536964ddc14a0c48263191b6afb80695b5a038e6
this function can be called not to decode pixels, but simply
to finish processing (through process_func()) the already decoded
pixels.
Change-Id: I80485e92e3c47f0aa3389476dcb82745a243fc4a
The optimization for (len != MIN_LENGTH) actually only holds for
(len > MIN_LENGTH) but (len < MIN_LENGTH) can now happen as len can
be changed in the loop before.
Change-Id: I3f9f91a540206c80385c5fba96c3d64ab9536752
On the non-fast path (use_8b_decode_=0) for decoding the alpha-mask,
we could end up requesting ApplyInverseTransform() with more rows
to process than NUM_ARGB_CACHE_ROWS. This could only happen on the
very last bottom rows of the image.
* ProcessRows() doesn't need to be fixed, since we never request more
than NUM_ARGB_CACHE_ROWS rows. Added an assert for that.
* the use_8b_decode_=1 case doesn't use argb_cache_, but rather does
the palette-decoding call directly. So, no problem here too.
Only the generic (and rather rare) case of calling ExtractAlphaRows()
was affected.
Change-Id: I58e28d590dcc08c24d237429b79614abcef1db7c
This is getting back to the old behavior which is actually better for
compression and speed with the latest patches.
Change-Id: I35884bab02589297c25d6e1e66dc5f13e05f7aa7
This was defined (slightly differently) at two places. Created a common
method and moved to utils/utils.[hc].
Change-Id: I66c3ac6dea24e0cd2c0eaa5440f3142b4dbbe23b
we don't need to store the resulting histogram, so no need to
call HistogramAddEval().
Allows some signature simplifications...
Change-Id: I3fff6c45f4a7c6179499c6078ff159df4ca0ac53
In case where the same offset is found in consecutive pixels,
the cost computation from one pixel can be re-used for the next.
Change-Id: Ic03c7d4ab95f3612eafc703349cfefd75273c3d7
and also recycle the malloc'd intervals
This avoids quite some malloc/free cycles during interval managment.
Change-Id: Ic2892e7c0260d0fca0e455d4728f261fb4c3800e
In a lot of cases, only one interval is used. This can cause
a lot of malloc/free cycles for only 56 bytes. By caching this
single interval and re-using it, we remove this cycle in most
frequent cases.
Change-Id: Ia22d583f60ae438c216612062316b20ecb34f029
As per the spec
(https://developers.google.com/speed/webp/docs/riff_container), only the
extended file format can contain an unknown chunk. So, when assembling a
WebP file with muxer, whenever there is an unknown chunk present, we
should create a VP8X chunk (even though none of the features are
present).
BUG=webp:294
Change-Id: I5da52d311e1853d40063d0f5026100d4325effaa
In some cases, the hash chain for a function is filled several
times:
- GetBackwardReferences -> CalculateBestCacheSize ->
BackwardReferencesLz77 that computes the hash chain
- GetBackwardReferences ->
(not always) BackwardReferencesTraceBackwards ->
BackwardReferencesHashChainDistanceOnly that computes the hash
chain in a slightly different way
Speed and compression performance are slightly changed (+ or -)
but will be homogneized in a later patch.
Change-Id: I43f0ecc7a9312c2ed6cdba1c0fabc6c5ad91c953
-> WebPImageReader
Introduce a variant of image-guessing function that returns a reader
directly: WebPGuessImageReader()
Change-Id: I5ddc53024fcf941e33d997b2be6aa1a963d939ab
adds a generic examples/image_dec.[ch] entry point too.
WebPGuessImageType() can be used to infer image type.
Change-Id: I8337e7b6ad91863c9cf118e4791668d2d175079b
This reverts commit 169004b1d5.
this changes the ABI, so should bump versions and add a note to NEWS
when we're ready to expose it
Change-Id: Ic5bbd0aee2b6fd0f9d438a9effedf22fe0cec4bf
tl;dr
We do the following:
- Start with transparent value of 0x00000000 instead of 0x00ffffff, so that
WebPCleanupTransparentAreaLossless() is a no-op.
- Restore the original canvas after lossy encoding, to discard changes made by
WebPCleanupTransparentArea() before the next encode.
Explanation of why:
In the mixed mode, anim_encoder tries to encode using both lossless and lossy
compression. In fact, when "min_size" option is enabled, there are at most 4
encodes that can happen in this order:
- lossless with dispose none
- lossy with dispose none
- lossless with dispose background
- lossy with dispose background
But both lossless and lossy both potentially modify the canvas during encode
(for better compression):
- Lossless: WebPCleanupTransparentAreaLossless() turns all transparent pixels
to 0x00000000
- Lossy: WebPCleanupTransparentArea() flattens some transparent pixels
So, the result is that, sometimes we feed the modified canvas to the encoder
instead of the original one, which isn't the right thing to do.
This also applies to just lossless or just lossy encoding, as multiple encodes
happen (with the two dispose methods) in those cases too.
Change-Id: Idfa8ce831a1627014785ba7d0316c42f72594455
This was defined (slightly differently) at two places. Created a common
method and moved to utils/utils.[hc].
Change-Id: I19adc9c48f2a4e2ec9d995e78add6f25172774c2
no longer gate this on WEBP_FORCE_ALIGNED as WebPMemToUint32() provides
this service. replace that check with WORDS_BIGENDIAN as the block is
currently little-endian specific.
Change-Id: Ie04ec0179022d20dab53da878008ae049837782f
the read size may be fixed, but the offsets into buf_ are not. forcing
an aligned read then shifting or using a temporary would be costly. this
is less important now that WebPMemToUint32() is being used.
Change-Id: I357fec8f750969cce91987abebed2f95e27a835f
Instead of comparing all the following pixels over len (which can
frequently reach the maximum MAX_LENGTH=4096 for some images),
intervals are stored and compared.
Change-Id: I0dafef6cc988dde3c1c03ae07305ac48901d60ee
The old implementation in enc/near_lossless.c performing a separate
preprocessing step is used only when a prediction filter is not used,
otherwise a new implementation integrated into lossless_enc.c is used.
It retains the same logic for converting near lossless quality into max
number of bits dropped, and for adjusting the number of bits based on
the smoothness of the image at a given pixel. As before, borders are not
changed.
Then, instead of quantizing raw component values, the residual after
subtract green and after prediction is quantized according to the
resulting number of bits, taking care to not cross the boundary between
255 and 0 after decoding. Ties are resolved by moving closer to the
prediction instead of by bankers’ rounding.
This results in about 15% size decrease for the same quality.
Change-Id: If3e9c388158c2e3e75ef88876703f40b932f671f
copy and paste error in the previous commit, change
no_sanitize("unsigned-integer-overflow") from WEBP_UBSAN_IGNORE_UNDEF ->
WEBP_UBSAN_IGNORE_UNSIGNED_OVERFLOW
Change-Id: Id178ee14df1f2c4923a91ce423241e26b60b5d32
add WEBP_UBSAN_IGNORE_UNDEF to WebPMemToUint32() / WebPUint32ToMem()
when WEBP_FORCE_ALIGNED is unset
Change-Id: I726b2e708ce29681584eb10c8874d5cf1e798756
include utils.h directly where needed to allow utils.h to rely on
defines from dsp.h in a follow-up.
Change-Id: I32e26aaeb0b04ba60b3332f685f9a2be5a0a8d3d
configure gets 2 new options:
--enable-neon / --enable-neon-rtcd
the NEON modules are split to their own convenience lib and built with
auto-detected flags if none are given via CFLAGS.
the /proc/cpuinfo check will only be used for armv7 targets whose
toolchain does not enable NEON by default or didn't have NEON forced by
the CFLAGS from the environment.
Change-Id: I2755bc1d065d5d6ee6143b44978c2082f8bef1c5
This fixes decoders built against clang-3.8 (r11c). Without this change
bad conditional code would be generated causing all calls to
WebPParseHeaders() to return 4 (UNSUPPORTED_FEATURE).
Original fix:
https://android-review.googlesource.com/#/c/196123
Change-Id: Id4b4d84048d347cea110b6cf297ef9ef4fbed323
TEST_AND_ADD_CFLAGS tests flags independently from AM_CFLAGS. The lack
of -Wformat may cause -Wformat-* checks to fail (gcc 4.9), though in the
end they would have succeeded with -Wall present.
Change-Id: Ic8f0d5b8a3f75a0b105bdc4ddd16690878a30222
the number of segments are previously validated, but an explicit check
is needed to avoid a warning under gcc-4.9
this is similar to the changes made in:
c8a87bb AssignSegments: quiet -Warray-bounds warning
3e7f34a AssignSegments: quiet array-bounds warning
Change-Id: Iec7d470be424390c66f769a19576021d0cd9a2fd
This will allow to work in-place on cropped area later.
Also sped up the inverse gradient filtering in SSE2 (~4%)
Change-Id: I463149eee95d36984328f163a1e17f8cabd87441
This is only possible if the filtering is not VERTICAL or GRADIENT.
Otherwise, we need the spatial predictors and hence need the un-visible
part above crop_top row.
COLOR_INDEX transform is the only transform that is not predicted
from previous row. Applying the same for other transform (spatial
predict, ...) is going to be more involve and use an extra temporary row.
+ remove ApplyInverseTransformsAlpha()
(work is done directly within ExtractPalettedAlphaRows())
+ change back to using filter_ instead of unfilter_func_
Change-Id: I09e57efae4a4af00bde35f21ca6e3d73b35d7d43
After the introduction of lossy frame rectangles
we need equivalent option in anim_diff for merging similar frames.
Change-Id: I1d03acace396ec4cb0212586c6e8b8ec5b0b0bfc
not exactly same.
Based on lossy WebP quality setting, ignore minor differences when
flattening
similar blocks.
For 6k set, at default quality with '-min_size' option, improves
compression by 0.3%
Change-Id: Ifcb64219f941e869eb2643e231220b278aad4cd4
there's some subtle changes:
- DecodeAlphaData() may be called with pos==end because we don't want
to decode more data (there's none left), but because we want to apply
process_func() to all the unprocessed pixels already decoded
- last_row is exclusive and should be understood as 'up to last_row'. Can be misleading.
- VP8LDecodeAlphaImageStream() was testing dec->last_pixel_ for completion,
which was wrong because last_pixel_ is the last *decoded* pixel, not the
last *processed* one. -> test now uses last_row_, as expected
Change-Id: I1fb04ba25cd7a4775db9e3deee3e2ae80f9c0a75
this might change some crc slightly, since WebPDequantizeLevels()
performs an analysis pass, counting levels, which impacts the smoothing.
Now, the cropping area is not the same, so minor diffs are expected here
and there.
Change-Id: I3cce1e40c6f11c25b7c841044d637685c5740352
* make ALPHNew/Delete static
* properly init ALPHDec::io_
* introduce AllocateAlphaPlane() and WebPDeallocateAlphaMemory()
* reorganize VP8DecompressAlphaRows()
but we're still allocate the full alpha-plane. Optim will come
in another patch since it's tricky
Change-Id: Ib6f190a40abb7926a71535b0ed67c39d0974e06a
this change will be superseded by patch #335160 eventually, but until then
let's fix the problem temporarily.
Change-Id: Iafd979c2ff6801e3f1de4614870ca854a4747b04
When FlattenSimilarBlocks() was making some blocks transparent with
averaged RGB values, filtering in lossy compression was causing
blockiness just outside the edge of these blocks.
Disabling filtering for that particular case avoid these block
artifacts.
The total encoded size of the 6k GIF set remains roughly the same (in
fact, reduces a bit).
Change-Id: Ida71cbabd59d851e16d871f53d19473312b3cc77
We pick a mapping with quality 0 mapping to max diff 32, to quality 100
mapping to max_diff 1.
For 6k GIF image set, this improves compression by:
4% at quality 0
0.05% at quality 75
Benefits the MovingThumbnailer test videos too.
Change-Id: I6838ce864d41e1e65311d26b9b8115a12390a253
This way we can ignore some noisy pixels and get tighter frame
rectangles.
Some results:
- Correctness:
Tested that anim_diff reports all images are identical for lossless, and
similar min_psnr value for lossy and mixed modes.
Also checked output images visually to make sure there weren't any
obvious kinks.
- Compression:
A very tiny improvement for 6000 image GIF set we have (0.03%) for lossy
and mixed mode. For some of these images, frames get dropped
automatically as they have a very small diff from previous frame.
10 images from test_video_frames_png show a clear improvement in
compression though. This CL leads to 7 out of 9 lossy WebPs getting
smaller -- for one of them, this leads to a higher quality being picked
(as that’s still < 150 KB).
Change-Id: If539b9e77e1375aa15edc8f926933593a9865f1c
and also pass 'VP8Io* io' extra param to VP8DecompressAlphaRows()
This is somehow in preparation for some memory optimizations in
the 'cropping' case. For now, only the easy crop_bottom case is
optimized.
Change-Id: Ib54531ba057bf62b98422dbb6c181dda626c72c2
This avoids generating file that would trigger a decoding bug
found in 0.4.0 -> 0.4.3 libwebp versions.
This reverts commit 6ecd72f845.
Change-Id: I4667cc8f7b851ba44479e3fe2b9d844b2c56fcf4
The mode's bits were not taken into account, which is ok for most of cases.
But in case of super large image, with 'easy' content, their overhead starts
mattering a lot and we were omitting to optimize for these.
Now, these mode bits have their own lambda values associated, limiting
the jerkiness. We also limit (for -m 2 only) the individual number of bits
to something that will prevent the partition 0 overflow.
removed the I4_PENALTY constant, which was a rather crude approximation.
Replaced by some q-dependent expression.
fixes issue #289
Change-Id: I956ae2d2308c339adc4706d52722f0bb61ccf18c
If value is '2', it means the buffer is a 'slow' one, like GPU-mapped memory.
This change is backward compatible (setting is_external_memory to 2
will be a no-op in previous libraries)
dwebp: add flags to force a particular colorspace format
new flags is:
-pixel_format {RGB,RGBA,BGR,BGRA,ARGB,RGBA_4444,RGB_565,
rgbA,bgrA,Argb,rgbA_4444,YUV,YUVA}
and also,external_memory {0,1,2}
These flags are mostly for debuggging purpose, and hence are not documented.
Change-Id: Iac88ce1e10b35163dd7af57f9660f062f5d8ed5e
This is in preparation for some SSE2 code.
And generally speaking, the whole SSIM code needs some
revamp: we're not averaging the SSIM value at each pixels
but just computing the overall SSIM value once, for the whole
plane. The former might be better than the latter.
Change-Id: I935784a917f84a18ef08dc5ec9a7b528abea46a5
based on the sse2 change in:
9960c31 Remove an unnecessary transposition in TTransform.
~9-10.5% faster at the function-level, < 1% overall
Change-Id: I44413369b230b250fb0dbc51ff2f17cfeda609b7
- The result is now indeed closest among possible results for all inputs, which
was not the case for bits>4, where the mapping was not even monotonic because
GetValAndDistance was correct only if the significant part of initial fit in
a byte at most twice.
- The set of results for a larger number of bits dropped is a subset of values
for a smaller number of bits dropped. This implies that subsequent
discretizations for a smaller number of bits dropped do not change already
discretized pixels, which improves the quality (changes do not accumulate)
and compression density (values tend to repeat more often).
- Errors are more fairly distributed between upwards and downwards thanks to
bankers’ rounding, which avoids images getting darker or lighter in overall.
- Deltas between discretized values are more repetitive. This improves
compression density if delta encoding is used.
Also, the implementation is much shorter now.
Change-Id: I0a98e7d5255e91a7b9c193a156cf5405d9701f16
Pass them along to internal 'pic' object, so that progress can be reported back
and user data can also be inspected.
Change-Id: Idb5d0d4a76d07283d704a86c5892e1ad7bda09fa
SSE4.1 is slower than the SSE2 implementation and this seems to
be due to a slow _mm_loadl_epi64 implementation by gcc
(hence a bug with my gcc 4.8) and a very slow _mm_hadd_epi32. Both
got confirmed by IACA and experiments.
Change-Id: I05607f66b7ccd8f4f42e000693aea583ffd5768f
We were not updating the current_width_, which is usually
not a problem, unless we use Delta Palette with small number
of colors
-> Addressed this re-entrancy problem by checking we have
enough capacity for transform buffer.
The problem is not currently visible, until we restrict
the number of gradient used in delta-palette to less than 16.
Then the buffers have different current_width_ and the problem
surfaces.
Change-Id: Icd84b919905d7789014bb6668bfb6813c93fb36e
The transpose refactoring will help removing a transpose in a
later CL.
The horizontal add function helps removing a _mm_sad_epu8 in DC8uv
=> the latency/throughput went from 29/25 to 23/19
Change-Id: I5f3dfd4aad614eb079b1e83631e6a7cef49a3766
This is to prevent users shooting in the foot using -psnr or
-size alone and not getting the expected result.
Change-Id: I67a3289e4ec0a2a813c98807f2ec5e600f52dc63
'implicit conversion from 'int' to 'short' changes value from 33050 to
-32486'
original patch:
https://codereview.chromium.org/1657313003/
Make libwebp build with -Wconstant-conversion from newer clangs.
After http://llvm.org/viewvc/llvm-project?rev=259271&view=rev, clang
points out that _mm_set1_epi16(33050) causes an overflow in the short
argument to _mm_set1_epi16(). Since there's no version that takes an
unsigned short, add an explicit cast to tell the compiler that this is
intentional.
No behavior change.
Change-Id: I6b4e3401b15cfbcc895f9e81b5c2dc59d43ffb9b
The goal is not to replace our autotools configuration, but to
provide a minimal CMake file that can build the lib, cwebp and
dwebp.
Change-Id: I3be343bd698d118c5f00172449d232d87e868f23
libwebp-0.5.0
- 12/17/2015: version 0.5.0
* miscellaneous bug & build fixes (issues #234, #258, #274, #275, #278)
* encoder & decoder speed-ups on x86/ARM/MIPS for lossy & lossless
- note! YUV->RGB conversion was sped-up, but the results will be slightly
different from previous releases
* various lossless encoder improvements
* gif2webp improvements, -min_size option added
* tools fully support input from stdin and output to stdout (issue #168)
* New WebPAnimEncoder API for creating animations
* New WebPAnimDecoder API for decoding animations
* other API changes:
- libwebp:
WebPPictureSmartARGBToYUVA() (-pre 4 in cwebp)
WebPConfig::exact (-exact in cwebp; -alpha_cleanup is now the default)
WebPConfig::near_lossless (-near_lossless in cwebp)
WebPFree() (free'ing webp allocated memory in other languages)
WebPConfigLosslessPreset()
WebPMemoryWriterClear()
- libwebpdemux: removed experimental fragment related fields and functions
- libwebpmux: WebPMuxSetCanvasSize()
* new libwebpextras library with some uncommon import functions:
WebPImportGray/WebPImportRGB565/WebPImportRGB4444
* tag 'v0.5.0':
update ChangeLog
faster rgb565/rgb4444/argb output
update NEWS
update AUTHORS
update mailmap
bump version to 0.5.0
README: update help text, repo link
Change-Id: I21dc611cfd2a3cb6ed6ba5c455a5049fe615f7a1
The code and logic is unified when computing bit entropy + Huffman cost.
Speed-wise, we gain 8% for lossless encoding.
Logic-wise, the beginning/end of the distributions are handled properly
and the compression ratio does not change much.
Change-Id: Ifa91d7d3e667c9a9a421faec4e845ecb6479a633
setting all transparent pixels to black rather than the "flatten" method.
0.3% smaller filesize on the 1000 PNGs if alpha cleanup is used (before: 18685774, after: 18622472)
Change-Id: Ib0db9e7ccde55b36e82de07855f2dbb630fe62b1
The functions containing magic constants are moved out of ./dsp .
VP8LPopulationCost got put back in ./enc
VP8LGetCombinedEntropy is now unrefined (refinement happening in ./enc)
VP8LBitsEntropy is now unrefined (refinement happening in ./enc)
VP8LHistogramEstimateBits got put back in ./enc
VP8LHistogramEstimateBitsBulk got deleted.
Change-Id: I09c4101eebbc6f174403157026fe4a23a5316beb
This implementation brings:
- an SSE implementation of packing / unpacking
- bigger buffers processed at the same time
The speedup is of 4% on lossy decoding (YUV to RGB), 0.5% on
lossy encoding (RGB to YUV was already optimized).
Change-Id: Iec677ee17f91c08614d1adab67c6df551925767f
* changes:
demux: remove GetFragment()
demux: remove dead fragment related TODO
demux, Frame: remove is_fragment_ field
demux,WebPIterator: remove fragment_num/num_fragments
demux: remove WebPDemuxSelectFragment
this hasn't been set since parsing of the experimental chunk was
removed.
+ cleanup IsValidExtendedFormat(). is_fragmented has caused immediate
failure since:
4e2589f demux: restore strict fragment flag check
Change-Id: If9ecfc19556297100a6d5de1ba2cffdcbdc6c8fd
and make it the default if available. this limits symbol visibility to
those marked with WEBP_EXTERN.
Change-Id: I529bf6d143a3008d9385044c87ad7dce72dce9a0
though visible WebPCopyPlane & WebPCopyPixels are not part of the public
api; avoid using a private header within this public module.
Change-Id: I5c8615fcc07090ffaa8933b00af418d8431936eb
INFO: From Compiling src/dsp/cpu.c:
src/dsp/cpu.c: In function 'x86CPUInfo':
src/dsp/cpu.c:36:3: inconsistent operand constraints in an 'asm'
With PIC and mcmodel=medium, the %rbx register must be saved and
restored which causes this problem. This was also solved in GCC-4.9 with
this patch:
https://gcc.gnu.org/ml/gcc-patches/2012-12/msg01484.html
Tested:
Builds fine with this change.
Change-Id: Icca8eea7bf5af3ef9f17f6ae2886e3430143febf
CombinedShannonEntropy takes 30% for lossless compression.
This implementation speeds up the overall process by 2 to 3 %.
Change-Id: I04a71743284c38814fd0726034d51a02b1b6ba8f
The previous priority system used a heap which was too heavy to
maintain (what was gained from insertions / deletions was lost
due to a linear that still happened on the heap for invalidation).
The new structure is a priority queue where only the head is
ordered.
Change-Id: Id13f8694885a934fe2b2f115f8f84ada061b9016
SimpleQuantize()
it's now a single function, that reconstructs the intra4x4 block during the scan
The I4_PENALTY had to be adjusted.
Overall, result is better quality-wise (esp. at q < 50), and a tad faster too.
method #0, #1 and #3+ are unchanged
Change-Id: If262aeb552397860b3dd532df8df6b1357779222
* Precision is slightly different
* also implemented in SSE2 the missing WebPUpsamplers for MODE_ARGB, MODE_Argb, MODE_RGB565, etc.
* removing yuv_tables_sse2.h saved ~8k of binary size
* the mips32/mips_dsp_r2 code is disabled for now, since it has drifted away
* the NEON code is somewhat tricky
Change-Id: Icf205faa62cf46c2825d79f3af6725dc1ec7f052
we map the input file into memory, even in the non-stdin case.
This is less efficient than letting the png/jpeg/... decoding libraries
use fread()'s, but more general.
Change-Id: I4501cb9a1daf69593eb8e3326c115cd8cbdf92fd
-> read is a bit slower (memory allocation and such) than reading directly from disk.
-> we're not yet ready to accept stdin as input (-- -) because we still need to guess
the file type with GetImageType(). And since we can't rewind on stdin, this will need
a bit more work before being able to read from stdin.
Change-Id: I6491fac4b07d28d1854518325ead88669656ddbf
Gives 0.9% smaller (2.4% compared to before alpha cleanup) size on the 1000 PNGs dataset:
Alpha cleanup before: 18856614
Alpha cleanup after: 18685802
For reference, with no alpha cleanup: 19159992
Note: WebPCleanupTransparentArea is still also called in WebPEncode. This cleanup still helps
preprocessing in the encoder, and the cases when the prediction transform is not used.
Change-Id: I63e69f48af6ddeb9804e2e603c59dde2718c6c28
The 32-bit buffers are actually rarely 64-bit aligned.
The new solution uses memcmp and is alignment agnostic.
It is also slightly faster.
Change-Id: I863003e9ee4ee8a3eed25b7b2478cb82a0ddbb20
for more speed.
This gives a roughly a 1% speedup for low_effort. But actually this is a
preparation for the upcoming CL that changes RGB values of transparent pixels
based on prediction, which should not be done for low_effort because that would
slightly hurt its performance.
On 1000 PNGs, with quality 0, method 0:
Before:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.034 MP/s
After:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.428 MP/s
Change-Id: I5ed9f599bbf908a917723f3c780551ceb7fd724d
-Wshorten-64-to-32 and -Wunreachable-code were lost in:
96201e5 migrate anim_diff tool from C++ to C89
Change-Id: Ic7161908b1aa88094563f0c7bcad1d508101d015
add -flax-vector-conversions as a workaround when building the
intrinsics files. this was made available in 4.3.x, but may have been
backported. issue observed with:
cc (GCC) 4.2.1 20070831 patched [FreeBSD] (9.3).
src/dsp/alpha_processing_sse2.c:237 incompatible type for argument 1
of '__builtin_ia32_psrlqi128'
BUG=274
Change-Id: I996076643158ecc581facd22f9fb87bec257ccea
Arrays were compared 32 bits at a time, it is now done 64 bits at a time.
Overall encoding speed-up is only of 0.2% on @skal's small PNG corpus.
It is of 3% on my initial 1.3 Mp desktop screenshot image.
Change-Id: I1acb32b437397a7bf3dcffbecbcd4b06d29c05e1
The same computation was done for both values: go over two buffers,
sum them up, and take a decision on the sum at each iteration.
MIPS32 code has been disabled for now, pending a code update.
Change-Id: I997984326f7092b3dbb8cfa1e524bd8132b2ab9d
instead of per block. This prepares for a next CL that can make the
predictors alter RGB value behind transparent pixels for denser
encoding. Some predictors depend on the top-right pixel, and it must
have been already processed to know its new RGB value, so requires per
scanline instead of per block.
Running the encode speed test on 1000 PNGs 10 times with default
settings:
Before:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.497 MP/s
After:
Compression (output/input): 2.3745/3.2667 bpp, Encode rate (raw data): 1.501 MP/s
Same but with quality 0, method 0 and 30 iterations:
Before:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.379 MP/s
After:
Compression (output/input): 2.9120/3.2667 bpp, Encode rate (raw data): 36.462 MP/s
No effect on compressed size, this produces exactly same files. No
significant measured effect on speed. Expected faster speed from better
memory layout with scanline processing but slower speed due to needing
to get predictor mode per pixel, may compensate each other.
Change-Id: I40f766f1c1c19f87b62c1e2a1c4cd7627a2c3334
the problem was the incorporation of the extra constant 1<<16 in the kC1
constant, to emulate the addition. It's now removed and the addition is
performed explicitly.
No real speed difference observed.
cf. issue #278
Change-Id: I2c6499031571d98afff392fb5ebe21a5fa60722d
* changes:
Makefile.vc: enable WEBP_USE_THREAD for windows phone
thread: use CreateThread for windows phone
thread: use WaitForSingleObjectEx if available
thread: use InitializeCriticalSectionEx if available
thread: use native windows cond var if available
if FANCY_UPSAMPLING was not defined but io->fancy_upsampling was set,
then the call to WebPInitSamplers() was skipped -> boom.
Change-Id: Id63e2ecc09f532fbe2ec9936d9ce4b502ba8fac5
Rename the flag to exact instead of the opposite cleanup_alpha. Add the flag to
WebPConfig. Do the cleanup in the webp encoder library rather than the cwebp
binary, this will be needed for the next stage: smarter alpha cleanup for
better compression which cannot be done as a preprocessing due to depending on
predictor choices in the encoder.
Change-Id: I2fbf57f918a35f2da6186ef0b5d85e5fd0020eef
the original change triggered several internal API modifs.
This is to ensure that we're never computing pointer that can
possibly wrap around, or differences between pointers that can
overflow.
no observed speed difference
Change-Id: I9c94dda38d94fecc010305e4ad12f13b8fda5380
We now consider 3 special cases:
* htree-group has only 1 code (no bit is read from bitstream)
* htree-group has few enough literal symbols, so that all the bit
codes can fit into a look-up table of less than 64 entries
* htree-group has a trivial arb literal (not GREEN!), like before
No overall speed change.
Change-Id: I6077fa0b7e5c31a6c67aa8aca859c22cc50ee254
We now get error string instead of printing it.
The verbose option is now only used to print info and warnings.
Change-Id: I985c5acd427a9d1973068e7b7a8af5dd0d6d2585
+ jenkins fixes for native config (library order)
+ add a missing -lm
+ replace log10 by log, just in case
+ partially reverted configure.ac to remove the C++ part
Change-Id: Iee099c544451b23c6cfaca53d5a95d2d332e066e
It was needed earlier for WebPAnimEncoder API when it was using structs
like WebPConfig, but it only uses pointers to those now.
Change-Id: Ic0c144966421c678e8ef54b3fa81574bb2c9cd08
global effect is ~2% faster encoding from JPG source
and ~8% faster lossless-webp source decoding to PGM (e.g.)
Also revamped the YUVA case to first accumulate R/G/B value into 16b
temporary buffer, and then doing the UV conversion.
-> New function: WebPConvertRGBA32ToUV
Change-Id: I1d7d0c4003aa02966ad33490ce0fcdc7925cf9f5
Just for RGB24/BGR24 for now, which are the hard-to-optimize ones.
SSE2 implementation coming next.
ConvertRowToY() should go into dsp/ too, at some point.
Change-Id: Ibc705ede5cbf674deefd0d9332cd82f618bc2425
also switch to using ExtractAlpha() instead of hard-coding the loop.
The ARGBToY/UV functions are rather easy to port to SSE2 / NEON.
Change-Id: I8f1346a9ca427a36ce2d6c848369ca7964d8b3c7
This is to infer the needed conversion to YUV(A) or RGB(A).
This is useful to avoid some conversion steps between ARGB and YUVA.
For instance, if the input file is a JPEG, we decode to RGB and
convert to YUV right away, without the intermediate step to ARGB.
The only caveat is that cropping/scaling might give slightly different result,
because of YUV420 downsampling. Therefore, we omit this feature
at cwebp level, when -crop or -rescale is used.
Change-Id: I5a3abe5108982f2a4570e841e3d9baffc73f5bee
when the previous frame does not specify dispose to background only the
current frame's rectangle should be cleared
related to bug #245
Change-Id: I2fc4f5be99057e0bf87d8fedec57b06859b070bd
use 'u' rather than the unnecessary 'l' as a suffix. this prevents a
conversion warning with some toolchains
Change-Id: I21c33ce08819b3c839c75e03a8f7f3a6041d0695
Note that ALIGN_CST is still kept different in dec/frame.c for now,
because the values is 31 there, not 15. We might re-unite these two
later.
Change-Id: Ibbee607fac4eef02f175b56f0bb0ba359fda3b87
same functionality, but better code layout.
What changed:
* don't trash the palette_[] in EncodePalette(), so it can be re-used
* split generation of image from bit-stream coding
* move all the delta-palette code to delta_palettization.c, and only have 1 entry point there WebPSearchOptimalDeltaPalette()
* minimize the number of "#ifdef WEBP_EXPERIMENTAL_FEATURES" in vp8l.c
* clarify the TransformBuffer stuff. more clean-up to come here...
This should make experimenting with delta-palettization easier and more compartimentalized.
Change-Id: Iadaa90e6c5b9dabc7791aec2530e18c973a94610
some limitations: only for RGBA output,
and if reduction factor is not too small (dst_width > src_width / 128)
20-25% faster, ~4-6% global improvement total decoding.
Change-Id: I95366ddaa4a38e0a96bed754dfe790126f7bb84a
It's better to stay with a 32b fixed-point precision overall, otherwise
the C-version on ARM gets *slower*.
Actually, gcc ARM compiler optimizes some instructions pretty
well when WEBP_RESCALER_FIX is exactly 32, even in C.
Change-Id: I0eea97f7db5947470f5af355dee098eca81e178d
-fembed-bitcode is the default, a framework built without this flag will
fail to link against an application using it.
BUG=267
Change-Id: I83461cb058b1866ac99b3f0bdfa890933e88ed26
New palette compresses more than 20% better with minimum quality loss.
Tested on set of wikipedia images with command line:
cwebp -delta_palettization
Change-Id: I82ec7d513136599cd70386f607f634502eb9095d
Earlier, we stored a 1x1 frame for such frames. Now, we drop every such
frame and increase the duration of its previous frame instead.
Also, modify the anim_diff tool to handle animated images that are
equivalent, but have different number of frames.
Change-Id: I2688b1771e1f5f9f6a78e48ec81b01c3cd495403
The rounding and arithmetic is not the same as previously, to prevent overflow cases for large upscale factors.
We still rely on 32b x 32b -> 64b multiplies. Raised the fixed-point precision to 32b
so that we have some nice shifts from epi64 to epi32.
Changed rescaler_t type to 'uint32_t' in order to squeeze in all the precision required.
The MIPS code has been disabled because it's now out-of-sync. Will be fixed in
a subsequent CL when the dust settles.
~30-35% faster
Change-Id: I32e4ddc00933f1b1aa3463403086199fd5dad07b
* vertical expansion now uses bilinear interpolation
* heavily assumes that the alpha plane is decoded in full, not row-by-row
* split the RescalerExportRow and RescalerImportRow methods into Shrink
and Expand variants.
* MIPS implementation of ExportRowExpand is missing.
There's room for extra speed optim and code re-org, but let's keep that for later patches.
addresses https://code.google.com/p/webp/issues/detail?id=254
Change-Id: I8f12b855342bf07dd467fe85e4fde5fd814effdb
This is designed for the simple use-case where one wants to decode all
frames one-by-one in order.
Also, use this API in anim_util library, which is in turn used by
anim_diff tool.
Change-Id: Ie8b653c04e867d40fd23321b3dd41b87689656c7
This makes the chains more efficient and a larger variety of data is tested.
0.02 % compression gain at q 100, 0.05 % at default quality. 0.8 % speedup by
callgrind.
0.16 % compression gain for lossy alpha ?!
Change-Id: I888120133352799eb14f5f602c7f40ab404bd665
this allows scaling to a particular width/height while preserving the
source aspect ratio using WebPRescalerGetScaledDimensions().
Change-Id: I77b11528753290c1e9bb942ac761c215ccfb8701
using a *tmp_plane buffer to split a/r/g/b planes up appeared to
be the easiest route, compared to copy-pasting the whole code and
making it x_stride aware...
Change-Id: I0898ef1df62bd3e1713b77187b31b5eeef3832fe
allows the values to be used in preproc checks, fixing a
-Wunreachable-code warning in 64-bit builds where VP8L_WRITER_BITS != 16
Change-Id: Ie98dff4e8ef896436557c64d5da2c5d70228a730
Slightly faster on -m 0 -q 0, particularly for small images (50 x 75
image was 0.1 % faster on callgrind measurement).
Increases compression density by 0.005 % for the 1000 images, but small
images can improve even 0.5 % (about 4 bytes, depending on the
characteristics of the palette).
Change-Id: I94f568d396ac62a054a829abeeef3eb0af6b3f94
If this flag is not used, RGB is premultiplied before comparison.
Otherwise, the raw R/G/B values are compared, which can be a problem
in transparent area (alpha=0 R/G/B=anything)
Change-Id: I131cc10ec92414ad508b81f599a60d0097cac470
the x_add/x_sub increments were wrong for u/v in the upscaling case.
They shouldn't be left to the caller's discretion, but set up by
WebPRescalerInit to their exact necessary values.
-> Cleaned-up WebPRescalerInit() param list.
-> added safety asserts
-> removed the mips32/mips_r2 variant of "ImportRow" which were buggy prior
Change-Id: I347c75804d835811e7025de92a0758d7929dfc09
this moves the function outside the WEBP_USE_INTRINSICS check.
there's no alternative version and it's ~54% faster at the
function level and mildly faster overall
Change-Id: Ibc648e9ee35021d48901e05aa596aa01067796a2
a total impact of 1 % on encoding speed
This allows for performance neutral removal of the binary search
in cache bits selection. This will give a small improvement in
compression density.
Change-Id: If5d4d59460fa1924ce71af977320834a47c2054a
0.21 % compression density improvement for 1000 png corpus in
lossless mode
0.50 % compression density improvement for 1000 png corpus in
lossy mode
Change-Id: I14ee8c427ae5d3e116b0ee6695fcdea3321a319d
valgrind --tool=callgrind shows a 9 % speedup: 1021201984 ticks before vs.
927917709 after
-q 0 -m 0 -lossless ~/alpi/1.png
22.040 MP/s before
24.796 MP/s after
Change-Id: Iaab928167b3e20fb0d9401c6f8317a26c5a610b4
* changes:
lossless: combine the Huffman code with extra bits
lossless: Inlining add literal
lossless: simplify HashChainFindCopy heuristics
lossless: 0.5 % compression density improvement
lossless: Add zeroes into the predicted histograms.
lossless: encoding, don't compute unnecessary histo
lossless: Remove about 25 % of the speed degradation
Faster alpha coding for webp
lossless: rle mode not to accept lengths smaller than 4.
lossless: Less code for the entropy selection
lossless: 0.37 % compression density improvement
_BitScanReverse() takes an unsigned long*
http://msdn.microsoft.com/en-us/library/fbxyd7zd.aspx
fixes:
C4057: 'function': 'unsigned long *' differs in indirection to slightly
different base types from 'uint32_t *'
fixes issue #253
Change-Id: I0101ef7be18c7ed188b35e9b17e7f71290953786
do not do length 2 matches far away
speedup for non compressible data by inserting two literals at a time
when no matches are found
Change-Id: Ia8e033071f4186bb8148bb2bf13ca37586734aa3
Increases compression density by 0.03 % for lossy.
Speeds up at least one of the lossy alpha images by 20 %.
Palette entropy 'kludge' seems to save 1-2 % on alpha images.
Change-Id: I2116b8d81593ac8173bfba54a7c833997fca0804
share the computation between different modes
3-5 % speedup for lossless alpha
1 % for lossy alpha
no change in compression density
Change-Id: I5e31413b3efcd4319121587da8320ac4f14550b2
introduced in:
"lossless: 0.37 % compression density improvement"
Uses the statistics of red and blue histograms to decide if to run
cross color correction at all.
Improves compression density by 0.02 % or so.
Change-Id: I47429557e9cdbd9fa90c584696f241b17427d73f
No significant size degradation (+0.001 %) for 1000 image corpus
Fixes the 8 ms vs 2 ms degradation from:
"lossless: 0.37 % compression density improvement"
Change-Id: Id540169a305d9d5c6213a82b46c879761b3ca608
counting the entropy expectation for five different configurations:
palette
non-predicted
non-predicted with subtract green
predicted
predicted with subtract green
and choose the strategy with the smallest expected entropy
Change-Id: Iaaf209c0d565660a54a4f9b3959067afb9951960
this should be used in preference to free() for releasing memory
returned from WebPDecode*() / WebPEncode*(). this simplifies memory
management when working through language bindings
Change-Id: I15eb538a45390efc552fda8e5c251a3fbdc13c29
After several trials at re-organizing the main loop and accumulation scheme,
this is apparently the faster variant.
removed the SSE41 version, which is no longer faster now.
For some reason, the AVX variant seems to benefit most for the change.
Change-Id: Ib11ee18dbb69596cee1a3a289af8e2b4253de7b5
this moves the function outside the WEBP_USE_INTRINSICS check.
there's no alternative version and it's ~70% faster at the
function level and 1-2% faster overall
Change-Id: I59fb4918ec86b1ac3a47cbd5d05ce62f007461cb
Changed the code (again) to process 4 pixels at a time. Loop is more
involved, but overall it's faster.
Removed the SSE4.1 implementation which is now slower than SSE2.
Change-Id: I7734e371033ad8929ace7f7e1373ba930d9bb5f1
New implementations: SubtractGreenFromBlueAndRed and TransformColor
around 1-2% faster lossless encoding.
Change-Id: I1668e36fdc316ba55b3b798b91b4a3e36ce62861
DispatchAlpha* functions are hard to speed up, compared to SSE2.
ExtractAlpha sees a ~15% speed-up though.
Change-Id: I8715c2defecbc832f469eed7e6ffd012146b52de
over a 1000 image corpus
Single photograph benchmark:
Before:
Q=20: 2.560 MP/s
Q=40: 2.593 MP/s
Q=60: 1.795 MP/s
Q=80: 1.603 MP/s
Q=99: 1.122 MP/s
After:
Q=20: 3.334 MP/s
Q=40: 2.464 MP/s
Q=60: 2.009 MP/s
Q=80: 1.871 MP/s
Q=99: 1.163 MP/s
This CL allows for some further improvements that would not be possible
otherwise.
Change-Id: I61ba154beca2266cb96469281cf96e84a4412586
use vld1_dup_u8() rather than a separate ld+dup after the values were
zero extended; mildly faster at the function level
Change-Id: I1b3666a6aeb465722a1214dbc6d71c27689a7f89
It can be used to test if given pair of animated images (GIF and/or
WebP) are identical in terms of pixel match and other animation
properties.
Change-Id: I84adea145e9d062be6ad06a0d4fcdc9658cf52d4
VP8EncPredChroma8 improvements over ~20M pixels
left/top: ~67%
left-only: ~52%
top-only: ~57%
none: ~61%
based on dec_sse2 versions with minor changes to benefit from the linear
storage of the left boundary
Change-Id: Iee7e387fb2570b4eb5af5bfd123e9c2e9ea49c76
VP8EncPredLuma16 improvements over ~20M pixels
left/top: ~75%
left-only: ~47%
top-only: ~59%
none: ~63%
based on dec_sse2 versions with minor changes to benefit from the linear
storage of the left boundary
Change-Id: I7548be7214fa85c38fd11d30f5b8b271f437657d
this target is out of date and there are better ways to make a clean
tree (the first 2 versioned, the last one not):
make distclean (when using autoconf)
git clean -fdx
git archive
Change-Id: I766b75e0adf566c6f7db1a087ff486020b031b3a
structured extended feature flags require eax = 7; avoids incorrectly
detecting avx2 on some older processors that support avx.
for completeness also check for value=1 support used by the other
checks.
from [1]:
INPUT EAX = 0: Returns CPUID’s Highest Value for Basic Processor
Information and the Vendor Identification String
[1]
http://www.intel.com/content/www/us/en/processors/processor-identification-cpuid-instruction-note.html
Change-Id: I60b20d661a978d551614dbf7acdc25db19cb6046
* make VP8LPrefetchBits() safe wrt past-EOS reads
* set 'BitReader::bits_" to a safe shifting value upon EOS
no visible performance difference on x86
Change-Id: I0a4177928cfa81d5dfc9054b36a686eaa1bf8c65
Speed-wise equivalent on x86 and ARM (maybe a tad faster, hard to tell).
Note that the two 32-bit multiples are not strictly equivalent
to the 64-bit one, since we're missing one carry propagation.
In practice, no observable difference was seen because of this
slightly different hashing result.
Change-Id: I8f2381175eae1cb20dabf149e6b27e1768fba6ab
update makefile.unix to provide 'make extras' building instructions.
note: input ordering depends on WEBP_SWAP_16BIT_CSP for rgb565 and rgb4444
Change-Id: I6f22d32189d9ba2619146a9714cedabfe28e2ad0
When converting from video sources, the duration of current frame
is often unavailable until the next frame. So, we internally convert
timestamps to durations.
Change-Id: I20ad86361c22e014be7eb91f00d5d40108281351
use psadbw to perform top row summation; left remains in C as repacking
it into a vector to apply the same operation is too costly.
DC8uv: ~19% faster
DC8uvNoLeft: ~12% faster
Change-Id: I707c4f6177a65b5d1f2d3deeca87d2bb740185e2
use psadbw to perform top row summation; left remains in C as repacking
it into a vector to apply the same operation is too costly.
DC16: ~20% faster
DC16NoLeft: ~14% faster
Change-Id: I7ec3f8a6e5923f88a530f79fceb88d5001bef691
generates a stub function when the specific architecture is not enabled,
exposing a symbol in the module, avoiding a compiler warning
Change-Id: Ia9336e57466a9b5241b85c1c95838e91c9283147
had to rename few structs.
-> we can now include both vp8i.h and vp8enci.h without naming
conflicts.
Change-Id: Ib41b498f1b57aab3d6b796361afc45210ec75174
Visible speed-up, thanks to pshufb and pabsw and psignw use.
had to tweak configure.ac to make "smmintri.h" presence correctly
detected (we need to set the CPPFLAGS instead of the CFLAGS!)
Change-Id: I2ab99e16a27a64fdf1f09b2b4e30a5e74ccca080
allows the former to be inlined; negligible speed-up in most cases,
however this is structure is consistent with the rest of the optimized
modules
Change-Id: Ib080240b06f7a995b47f1906627850c355b82901
we look at average global improvement and stop when things are
moving slow, or when we had a quite good first iteration already
(means: the picture is "not difficult")
Change-Id: I8ab7d100353039b5b32bb5fac3fe03c8440c78d5
Speedup method StoreImageToBitMask by replacing the code to find histogram
index and Huffman tree codes at every iteration to a more optimal code that
updates these only when the current pixel (to write) crosses the histogram
tile-row boundary.
This change speeds up the StoreImageToBitMask method by 5%.
Change-Id: If01a1ccd7820f9a3a3e5bc449d070defa51be14b
the standard vtbl functions are available there [1][2].
based on a patch from: aaroncrespo
fixes issue #243.
[1]
http://adcdownload.apple.com//Developer_Tools/Xcode_6.3_beta/Xcode_6.3_beta_Release_Notes.pdf
[2] Apple LLVM Compiler Version 6.1
- Xcode 6.3 updates the Apple LLVM compiler to version 6.1.0.
[...]
Support for the arm64 architecture has been significantly revised to
align with ARM's implementation, where the most visible impact is that a
few of the vector intrinsics have changed to match ARM's specifications.
Change-Id: I79a0016f44b9dbe36d0373f7f00a50ab3c2ca447
The MIPS code for cost is not updated yet, that's why i keep Residual::*cost
around for now. Should be removed in favor of *costs later.
Change-Id: Id1d09a8c37ea8c5b34ad5eb8811d6a3ec6c4d89f
affected functions: SimpleVFilter16, SimpleHFilter16,
SimpleVFilter16i and SimpleHFilter16i
noticed bug in FilterLoop26 (fix included in this patch)
Change-Id: I72d9c1e45cbac6393eba52bb549b04924d463e30
removes circular dependency between dsp and enc.
since:
a987fae MIPS: dspr2: added optimization for function GetResidualCost
Change-Id: Ifeb8fc02de89e2ba982ed7ffacd925d649bfec3c
kGammaFix is now only defined with USE_GAMMA_COMPRESSION;
fixes:
use of undeclared identifier 'kGammaFix'
Change-Id: Ib1e2f410eff9b83be065894f88181f91dd2776e1
set/get residual C functions moved to new file in src/dsp
mips32 version of GetResidualCost moved to new file
Change-Id: I7cebb7933a89820ff28c187249a9181f281081d2
partially normalize indent, vertical whitespace and capitalization with
the copy used on developers.google.com/speed/webp
Change-Id: I8044418eeb9eaf5bd5c799675c74f6f845d503d6
the input to the function is non-const and the pointer being operated is
being free'd; removes an unnecessary cast in the process
Change-Id: Ic515ed672ddf7f8e4e36eeac696ff7aa8a3652f7
this is in line with the recommendation in the spec, cf.,
5603947 webp-container-spec: clarify background clear on loop
Change-Id: Id3910395b05a1a1f2804be841b61f97bd4bac593
Updated the near-lossless level mapping and make it correlated to lossy
quality i.e 100 => minimum loss (in-fact no-loss) and the visual-quality loss
increases with decrease in near-lossless level (quality) till value 0.
The new mapping implies following (PSNR) loss-metric:
-near_lossless 100: No-loss (bit-stream same as -lossless).
-near_lossless 80: Very very high PSNR (around 54dB).
-near_lossless 60: Very high PSNR (around 48dB).
-near_lossless 40: High PSNR (around 42dB).
-near_lossless 20: Moderate PSNR (around 36dB).
-near_lossless 0: Low PSNR (around 30dB).
Change-Id: I930de4b18950faf2868c97d42e9e49ba0b642960
at the beginning of the loop there's an implicit clear of the entire
canvas to the background (or application defined) color. this avoids
adding the final composited frame to the first.
Change-Id: Ia3a52cf4482c6176334a5c9c99a0ddd07d1776e7
previously the first frame would be redisplayed, which might be
unexpected if the final frame was meant to be a composite, for example.
Change-Id: I4da795623c71501e2fa426e8fba8fb2ffcbab58a
DitherRow() only checks this value, not 'skip_' so previously it was
uninitialized for these blocks.
Change-Id: I0f698b81854ee9d91edacb51c1e3bdab9cba96f2
similar to:
1ba61b0 enable NEON intrinsics in aarch64 builds
vtbl1_u8 is available everywhere but Xcode-based iOS arm64 builds, use
vtbl1q_u8 there.
performance varies based on the input, 1-3% on encode was observed
Change-Id: Ifec35b37eb856acfcf69ed7f16fa078cd40b7034
AnalyzeSubtractGreen constitutes about 8-10% of the comression CPU cycles.
Statistically, subtract-green is proved to be useful for most of the
non-palette compression. So instead of evaluating the entropy (by calling
AnalyzeSubtractGreen) apply subtract-green transform for the low-effort
compression.
This changes speeds up the compression at m=0 by 8-10% (with very slight loss
of 0.07% in the compression density).
Change-Id: I9797dc39437ae089716acb14631bbc77d367acf4
Speed up AnalyzeSubtractGreen by looping through the image pixel once to
compute the two histograms.
AnalyzeEntropy code cleanup.
Removed some 'if' conditions and pointer indirections inside pixel iterate loop.
Change-Id: Ia65e3033988ff67df8e3ecce19d6e34cfc76358e
Enable the WebP near-lossless feature by pre-processing the image to smoothen
the pixels.
On a 1000 PNG image corpus, for which WebP lossless (default settings) gets
25% compression gains, following is the performance of near-lossless feature
at various '-near_lossless' levels:
-near_lossless 90: 30% (very very high PSNR 54-60dB)
-near_lossless 75: 38% (very high PSNR 48-54dB)
-near_lossless 50: 45% (high PSNR 42-48dB)
-near_lossless 25: 48% (moderate PSNR 36-42dB)
-near_lossless 10: 50% (PSNR 30-36dB)
WebP near-lossless is specifically useful for discrete-tone images like
line-art, icons etc.
Change-Id: I7d12a2c9362ccd076d09710ea05c85fa64664c38
Some frames that were previously selected as key-frames were incorrectly
being reset to sub-frames.
Change-Id: Iee342dbb9a9aec144b8185c3b54ca56aa7038bfb
The 'inverse' variants are harder to parallelize, since
the result of filtering is used for prediction.
The 'direct' way is relatively easier.
The heavy bottleneck left for optimization is still GradientUnfilter()
Change-Id: I358008f492a887e8fff6600cb27857b18dee86e9
Simplify and speedup backward references for low-effort settings by evaluating
LZ77 references only. This change speeds up compression by 10-25% at lower
(q <= 25) quality range with a slight drop (0.2%) in the compression density.
Change-Id: Ibd6f03b1a062d8ab9191786c2a425e9132e4779f
Cleaup Near-lossless code
- Simplified and refactored the code.
- Removed the requirement (TODO) to allocate the buffer of size WxH and work
with buffer of size 3xW.
- Disabled the Near-lossless prr-processing for small icon images (W < 64 and H < 64).
Change-Id: Id7ee90c90622368d5528de4dd14fd5ead593bb1b
Reported here: https://code.google.com/p/webp/issues/detail?id=239
At the beginning of method 'DecodeImageData', pixels up to
'dec->last_pixel_' are assumed to be already cached. So, at the end of
previous call to that method also, that assumption should hold true.
Hence, we should cache all pixels up to 'src' regardless of 'src_last'.
This affects lossless incremental decoding only, as that is when
src_last and src_end differ.
Note: alpha decoding is implicitly incremental, as alpha decoding of
only the rows 'y_end - y_start' happens during FinishRow() call. So, this bug
affects alpha decoding in non-incremental decoding flow as well.
This bug was introduced in: https://gerrit.chromium.org/gerrit/#/c/59716.
Change-Id: Ide6edfeb2609b02aff701e1bd9fd776da0a16be0
SanitizeEncoderOptions() was changing kmin to 2 too, which resulted in a
bad state with kmin == kmax.
Change-Id: Ie7273f1949bac469e7e6c8efbc98b154caf6de0f
* use the same TFIX == YFIX precision (2bits)
* use int instead of float in LinearToGammaF()
output is visually equivalent. Code is a little faster.
Change-Id: Ie3cfebca351dbcbd924b3d00801d6523dca6981f
add additional return checks and asserts to avoid:
C6102: Using 'XXX' from failed function call ...
Change-Id: I51f5fa630324e0cd7b2d9fceefecb4f4021474b1
width / height are unsigned; fixes a warning with msvs /analyze:
C6340: Mismatch on sign: 'const unsigned int' passed as _Param_(4) when
some signed type is required in call to 'fprintf'.
Change-Id: I5f1fad4c93745baf17d70178a5e66579ccd2b155
check enc->argb_ to quiet an msvs /analyze warning:
C6387: 'enc->argb_+y*width' could be '0': this does not adhere to the
specification for the function 'memcpy'.
Change-Id: I87544e92ee0d3ea38942a475c30c6d552f9877b7
quiets an msvs /analyze warning, the cast is due to an earlier msvs
build warning.
C6297: Arithmetic overflow: 32-bit value is shifted, then cast to
64-bit value. Results might not be an expected value.
Change-Id: I0bb6cda57879f2fbd1e3515f6753a11bc08d14ac
and only use it on x86 / x64 where it's available.
has the side-effect of quieting a msvs /analyze warning:
C6001: Using uninitialized memory 'cpu_info'.
Change-Id: Iae51be3b22b2ee949cfc473eeea9fd9fb6b3c2cb
Disable costly TraceBackwards heuristic for computing the backward references
for low_effort (method=0) compression.
The TraceBackwards heuristic is already disabled for lower (q < 25) quality
range. Following is the compression data for 1000 image corpus for q >= 25.
This speeds up compression (q >= 25) by a factor of 2.5-3X with slight loss of
compression density (0.7% for lower quality range and 1.2% for higher qualities).
Change-Id: I256c9e2137c7de4083f423ea32ee12d3b0f46253
added new function CollectColorRedTransforms to C, which calls
TransformColorRed and it is realized via pointer to function
Change-Id: Ia68d73bfcf1ca2cb443dc2825910946221f87835
explicitly add immintrin.h instead of transitively picking it up via
windows.h presumably. makes the code easier to move around.
Change-Id: If70d5143ac94fc331da763ce034358858e460e06
added new function CollectColorBlueTransforms to C, which calls
TransformColorBlue and it is realized via pointer to function
Change-Id: Ia488b7a7a689223b5d33aae9724afab89b97fced
reported in https://code.google.com/p/webp/issues/detail?id=237
An empty partition #0 should be indicative of a bitstream error.
The previous code was correct, only an assert was triggered in debug mode.
But we might as well handle the case properly right away...
Change-Id: I4dc31a46191fa9e65659c9a5bf5de9605e93f2f5
gif2webp.exe is excluded from 'all' as libgif requires C99 support which
is only available from VS2013 onward. additional include paths/libraries
can be added with CL=/I... + LINK=... for this target.
Change-Id: If9f6be1c4029f486a9d6cdc0e862376cbd1e1be0
untested
builds only libwebp (targeting windows phone), with threading disabled
for now due to the use of desktop specific threading API
Change-Id: Idf3057a73c88e1a672ffe0a7c72d2626d701b706
- Lower the threshold parameters for HashChainFindCopy.
For 1000 image PNG corpus (m=0), this change yields speedup of 15-20% at
lower quality range (0.25% drop in compression density) and about 10%
for higher quality range without any drop in the compression density.
Following is the compression stats (before/after) for method = 0:
Before After
bpp/MPs bpp/MPs
q=0 2.8615/18.000 2.8651/18.631
q=5 2.8615/18.216 2.8650/20.517
q=10 2.8572/18.070 2.8650/21.992
q=15 2.8519/18.371 2.8584/21.747
q=20 2.8454/18.975 2.8515/20.448
q=25 2.8230/8.531 2.8253/9.585
// Compression density remains same for q-range [30-100]
q=30 2.7310/7.706 2.7310/8.028
q=35 2.7253/6.855 2.7253/7.184
q=40 2.7231/6.364 2.7231/6.604
q=45 2.7216/5.844 2.7216/6.223
q=50 2.7196/5.210 2.7196/5.731
q=55 2.7208/4.766 2.7208/4.970
q=60 2.7195/4.495 2.7195/4.602
q=65 2.7185/4.024 2.7185/4.236
q=70 2.7174/3.699 2.7174/3.861
q=75 2.7164/3.449 2.7164/3.605
q=80 2.7161/3.222 2.7161/3.038
q=85 2.7153/2.919 2.7153/2.946
q=90 2.7145/2.766 2.7145/2.771
q=95 2.7124/2.548 2.7124/2.575
q=100 2.6873/2.253 2.6873/2.335
Change-Id: I0e17581fb71f6094032ad06c6203350bd502f9a1
- Do light weight entropy based histogram combine and leave out CPU
intensive stochastic and greedy heuristics for combining the
histograms.
For 1000 image PNG corpus (m=0), this change yields speedup of 10% at
lower quality range (1% drop in compression density) and about 5% for
higher quality range (1% drop in compression density). Following is the
compression stats (before/after) for method = 0:
Before After
bpp/MPs bpp/MPs
q=0 2.8336/16.577 2.8615/18.000
q=5 2.8336/16.504 2.8615/18.216
q=10 2.8293/16.419 2.8572/18.070
q=15 2.8242/17.582 2.8519/18.371
q=20 2.8182/16.131 2.8454/18.975
q=25 2.7924/7.670 2.8230/8.531
q=30 2.7078/6.635 2.7310/7.706
q=35 2.7028/6.203 2.7253/6.855
q=40 2.7005/6.198 2.7231/6.364
q=45 2.6989/5.570 2.7216/5.844
q=50 2.6970/5.087 2.7196/5.210
q=55 2.6963/4.589 2.7208/4.766
q=60 2.6949/4.292 2.7195/4.495
q=65 2.6940/3.970 2.7185/4.024
q=70 2.6929/3.698 2.7174/3.699
q=75 2.6919/3.427 2.7164/3.449
q=80 2.6918/3.106 2.7161/3.222
q=85 2.6909/2.856 2.7153/2.919
q=90 2.6902/2.695 2.7145/2.766
q=95 2.6881/2.499 2.7124/2.548
q=100 2.6873/2.253 2.6873/2.285
Change-Id: I0567945068f8dc7888041e93d872f9def91f50ba
As 'curr_canvas_mod' is being modified during calls to IncreaseTransparency()
and FlattenSimilarBlocks(), GetSubRect() should get the sub-frame from
'curr_canvas_mod' as well.
Earlier, GetSubRect() was computed from 'curr_canvas', so modifying
'curr_canvas_mod' had no effect on encoding.
Change-Id: Ia847503007b66364817fe57def5a9e3c37d1b3cc
We set the parameters even if there is just one frame. This is to make sure
assembly is correct even if single frame animation is NOT converted to a full
frame later.
Change-Id: If79e6aa5e2575cb0f3cd229f16c655b3663c35b0
Given that we decided to only handle frame disposal for output WebP
internally,
only current and previous canvas need to be maintained.
Change-Id: I625293bed5aeb5aabf4eca779f6ec3ee84c9ff2a
A separate API to generate animated WebP images.
It will eventually replace the internal gif2webp_util methods.
Also: update makefiles.
Change-Id: Idf61dfc1016c10b24fea70425d1a2323cffba515
we compare the current VP8GetCPUInfo pointer to the last used.
This is less code overall and each implementation is still
testable separately (by just changing VP8GetCPUInfo, but not
a separate threads!)
Change-Id: Ia13fa8ffc4561a884508f6ab71ed0d1b9f1ce59b
inline function MakeARGB32 calls changed to call
via pointers to functions which make (a)rgb for
entire row
Change-Id: Ia4bd4be171a46c1e1821e408b073ff5791c587a9
references to fragments remain, along with some superfluous checks; these
will be removed in a future commit.
Change-Id: I39fe9314900ecbc5d60e5065b65fa1b4c668af63
and apply Paeth predictor (predictor#11) for the low effort (m=0) mode.
For 1000 image PNG corpus (m=0), this change yields speedup of 25% at lower quality
range and about 10% for higher quality range.
Change-Id: I0f036b8ffe45c241e63a067cbf01527b13d8de93
most of the time, we don't need to actually move the
data.
Compression is randomly slightly different, because HistogramCompactBins() changed.
Timing is about the same.
Change-Id: Ia6af8e9780581014d6860f2b546189ac817cfad1
check for __apple_build_version__ to distinguish the two; a version
check could work as Apple bumped Xcode's to 5.x/6.x, but it's unclear
how upstream will deal with their versioning as they go 3.6+, so avoid
it for now.
Change-Id: I67cda67c4f68e262a92d805a63cc1496374be063
we don't need to store the whole distribution in order to compute the alpha
Later, we can incorporate the max_value / last_non_zero bookkeeping
in SSE2 directly.
Change-Id: I748ccea4ac17965d7afcab91845ef01be3aa3e15
this is a first step to unifying encoding/decoding cache stride
and possibly sharing the prediction functions in dsp/
With this layout, there's a little (~7%) space lost with unused samples.
But no speed change was observed.
Change-Id: I016df8cad41bde5088df3579e6ad65d884ee711e
~68% faster
reuses TM4() adding support for the additional rows, the columns were
already being done.
Change-Id: I6eac17e58cd1c636082bf7281f70f884ec399a6b
Move all the Entropy evaluation methods to lossless.c (from histogram.c).
There's slight difference in the way entropy is computed for evaluating
entropy in prediction methods and histogram (literal) for huffman trees.
Plan (later) to merge few (static) methods and reduce the code size.
This change has no impact on the compression speed/density.
Change-Id: Ife3d96a3c4a8d78a91723d9e0a8d1b78c0256a15
Remove handling for WEBP_HINT_GRAPH w.r.t use_palette flag.
The WEBP_HINT_GRAPH is now used at one place, to set the initial size of the
Bit Writer as bpp for photo images are generally larger than the graphical
images.
Change-Id: I1b9c4436c85a8f69da74c0dbcd292397323f2696
Update BackwardRefsWithLocalCache to do in-place update of backward
references w.r.t local color cache index.
No impact on the compression density or compression speed.
Change-Id: Ie066251464c3928c044e037b43df3af28b48ca30
histogram.c:
- Verified (earlier) that there's low correlation between Red & Blue colors
(particularly after applying Cross-color transform). The Bin based histogram
merge, bins on three entropies viz literal, red & blue symbols. Removing
either of blue or red increases the compression density. So keeping the bins
for red & blue sybmols.
- Keeping the compact bins method as-is. This way it's simpler to read.
huffman_encode.h: Added field comments for struct HuffmanTree and removed the TODO.
Change-Id: Ia76f7bc730079d1b3b644038c5d9931db3797f0e
Use the refs_lz77 computed (with cache_bits=0) in the method 'CalculateBestCacheSize'
to regenerate the LZ77 references corresponding to the optimum cache_bits and avoid
calling costly 'BackwardReferencesLz77' one extra time.
This change leaves the compression density unchanged and speeds up compression
by 10-15%.
Change-Id: I5a92e11788d3c3f656aa7e1fba54fb5d96ee0027
This wasn't working for this specific scenario:
- Encode an RGBA 'pic' (with trivial alpha) using lossy encoding.
(so that pic->a == NULL after import happens).
- Modify the 'pic->argb' so that it has non-trivial alpha.
- Encode the same 'pic' again.
This used to fail to encode alpha data as pic->a == NULL.
Change-Id: Ieaaa7bd09825c42f54fbd99e6781d98f0b19cc0c
- The optimal cache bits is evaluated inside the method 'VP8LGetBackwardReferences'.
- The input cache_bits to 'VP8LGetBackwardReferences' sets the maximum cache
bits to use (passing 0 implies disabling the local color cache).
- The local color cache is disabled for lowerf (<= 25) quality levels (as before).
- Enabled local color cache for palette images as well. This saves additional
0.017% bytes with a slight (2-3%) improvement in the compression speed.
- Removed 'use_2d_locality' parameter from method VP8LGetBackwardReferences, as
this option is not an option now (after we freeze the lossless bit-stream).
Change-Id: I33430401e465474fa1be899f330387cd2b466280
This is because, FlattenSimilarBlocks() replaces some opaque pixels by
transparent ones. This results in an equivalent output only if blending
is turned on for the current frame.
Change-Id: I05612c952fdbd4b3a6e0ac9f3a7d49822f0cfb9b
Updated BackwardReferencesRle method by utilizing the local color cache.
Also changed the name of method BackwardReferencesHashChain to
BackwardReferencesLz77 to reflect the LZ77 coding.
For the 1000 image corpus, this change saves 0.2% bytes
(at default settings) and is 2-5% faster to encode.
Change-Id: Ic3f288253b3bbb101a69945a80994c3fd0917f8b
Optimize backwardreferences (about 0.1% byte savings) with almost same
compression speed (3% faster on defaut compression settings).
1.) Simplified iteration logic for HashChainFindCopy.
- Remapped the iter_max constant.
2.) Simplified main for loop for BackwardReferencesHashChain
- Removed 'if' conditions for corner cases in the main loop.
- Refactored the method(AddSingleLiteral) for adding one pixel.
Change-Id: I1bc44832fd81f11e714868a13e606c8f83157e64
Speed up BackwardReferencesHashChainDistanceOnly method by:
1.) Remove for loop for shortmax code path.
2.) Execute the shortmax code path after regular call to
HashChainFindCopy, only if HashChainFindCopy() returns length > 2 (MIN_LENGTH).
3.) Also for shortmax, call method HashChainFindOffset (for length = 2),
instead of expensive method HashChainFindCopy().
4.) Handling first pixel (i==0) outside main loop and removing one if
condition (i > 0) per pixel.
5.) Handle the last pixel outside the main 'for' loop.
Overall compression speedup observed is around 5% (+/- noise).
Change-Id: Ifa30c4035f8d26e6e43e3c4881244d777961c22b
at least clang 3.[45] in c++ mode with -std=c++11 define __STRICT_ANSI__
this change set WEBP_INLINE to inline for c++/non-strict-ansi/> c99
fixes crbug.com/428383
Change-Id: Ief2b934353c336a75865c73c90cc3dc5e4f83913
Enable bin-partition entropy based heuristic for merging histograms
for higher (q >= 90) qualities as well. Keep the old behavior at the
maximum quality level (q==100).
This speeds up the compression between Q=90-99 (method=4) by factor 5-7X
and with loss of 0.5-0.8% in the compression density.
Change-Id: I011182cb8ae5403c565a150362bc302630b3f330
The Maximum allowed limit is 11.
The Q=25 and below is not impacted as cache bits are forced to 0.
This saves 0.05% - 0.1% bytes for other quality with almost same compression
speed (+/- 2-3%, that's more of a noise).
Change-Id: Icf972a98f298c89e140e37a627baf709134be9a0
Updated the logic to limit the Histogram size to a constant, instead of
computing the same based on the Histogram size (that's variable size based on
the cache bits) for the maximum possible cache bits. The actual cache bits may
be lower than the maximum.
Note: The constant 2600 is 16MB/Sizeof(HistogramSize(MAX_COLOR_CACHE_BITS)).
The compression density remains the same with this change, with little faster
compression speed.
Change-Id: I3149894962852e9dad2501b9aa16bb847a20fd86
The method VP8LCalculateEstimateForCacheSize is not evaluating the all possible
range for cache_bits.
Also added a small penality for choosing the larger cache-size. This is done to
strike a balance between additional memory/CPU cost (with larger cache-size) and
byte savings from smaller WebP lossless files.
This change saves about 0.07% bytes and speeds up compression by 8% (default
settings). There's small speedup at Q=50 along with byte savings as well.
Compression at Quality=25 is not effected by this change.
Change-Id: Id8f87dee6b5bccb2baa6dbdee479ee9cda8f4f77
Snapping odd offsets in GIF to even offsets in WebP was causing extra row/column
being disposed in such cases.
Code is rewritten to maintain previous and current canvas (it used to maintain
previous canvas and current frame earlier). And we recompute change rectangles
as those from GIF may no longer apply.
Also, this renders methods like ReduceTransparency() and ConvertToKeyFrame()
redundant, as internally maintained current canvas is always independent of
previous canvases.
Disposal method choice: we pick the disposal method that results in the smallest
change rectangle.
Change-Id: Ic31186d98fe1a2a790a89d1571b17e3abd127e79
Instead of calling HashChainFindMethod, call a new (subset) method
HashChainFindOffset to get the offset/distance for a given length.
The encoding is tad faster at default compression
Before After
bpp/rate bpp/rate
442 Palette 0.2720/5.270 MP/s 0.2720/5.790 MP/s
558 non-palette 3.7607/0.797 MP/s 3.7607/0.816 MP/s
Change-Id: If4041a9c18f7e972f49fcbab8c3e2f013d8bf1cf
Updated VP8LGetBackwardReferences and HashChainFindCopy method with following:
- Remove the recursive CostModelBuild.
- Reuse the lz77 backward refs in CostModelBuild, instead of evaluating it
again (as it was done for recursion_level=0).
- Consolidated the Match-length logic inside FindMatchLength method.
- Removed the logic for altering best_length/val based on the 2D distance.
The additional 162 value (+= 9 * 9 + 9 * 9 - y * y - x * x) can't change the
best_val eval computation to choose a different curr_length, as best_val was
set to 'curr_length << 16'.
Following is the impact on the compression speed/density at default & max
quality, overall this speeds up compression by 5-15% (q=100 -> 75) with a tad
drop (0.02-0.03%) in compression density for the non-palette images.
Before After
bpp/Rate(MP/s) bpp/Rate(MP/s)
q=75 (def)
All 1000 2.4492/1.049 MP/s 2.4498/1.230 MP/s
Palette 0.2719/5.060 MP/s 0.2719/6.110 MP/s
non-Palette 3.7597/0.732 MP/s 3.7607/0.840 MP/s
q=100
All 1000 2.4134/0.125 MP/s 2.4142/0.131 MP/s
Palette 0.2692/2.585 MP/s 0.2692/2.885 MP/s
non-Palette 3.7040/0.079 MP/s 3.7053/0.083 MP/s
Change-Id: I27a5eff3356d876c3e949fd32262244b25678b7a
set WEBP_EXTERN to visibility=default
+ explicitly mark VP8GetCPUInfo as it's referenced within the examples
Change-Id: Ie3d2b15088e888f0b55203b205993eba75899d99
move the attribute to the front of the function to quiet clang warning:
GCC does not allow no_sanitize_thread attribute in this position on a
function definition
Change-Id: Ie4cc6e35a07bd00eab67d9cd6801bd2be9cfe676
Compared to previous mode it gives another 10-30% improvement in compression keeping comparable PSNR on corresponding quality settings.
Still protected by the WEBP_EXPERIMENTAL_FEATURES flag.
Change-Id: I4821815b9a508f4f38c98821acaddb74c73c60ac
avoids an ICE with NDK r10b + NDK_TOOLCHAIN_VERSION=4.9
In function 'SSE16x16':
enc_mips32.c (684) internal compiler error: Segmentation fault
Change-Id: I1a3d33c0a9534c97633ab93bcdf9bf59d3a7e473
Evaluate if for Palette images (num_colors <= 256), non-palette
compression path (Subtract green, predictor transform etc) yield an
optimal compression density.
This change reduces the WebP file (for palette images) size by 0.4% with
drop of 3-5% in compression speed.
Change-Id: I1ad66fa94db4fd7ba7bc215763791ef662cd4f42
the number of segments are previously validated, but an explicit check
is needed to avoid a warning under gcc-4.9
Change-Id: Ifa7c0dd7f3f075b3860fa8ec176d2c98ff54fcea
got rid of the |a-b|^|b-a| method and went back
to just (a-b)^2 instead.
quality | size(bytes) after/before | time (ms) after/before
Change-Id: Ia3e0e6507b3f903deb1e182f78dad6df07380fd0
We compact the palette by weighted distance, favoring the green channel.
Average gain on paletted file is ~0.5%, with gain up to 6-7% on some favorable cases.
Encoding speed is unaffected.
Disabled for alpha (or any single-channel input)
Also: always use quality=20 for EncodePalette() since it
doesn't make any real difference.
Change-Id: I19fb14316a366f139a941b45aef5663a33c905e1
SSE2 version is 2.1x faster
This is used to transfer the alpha plane to green channel before lossless compression.
Change-Id: I01d9df0051c183b1ff5d6eb69961d4f43e33141a
Don't combine the Histograms that have trivial (single valued A, R & B)
symbols.
Following is the compression savings data along with compression time (before
& after) per image.
Before After
bpp, rate(MP/s) bpp, rate(MP/s)
Q=25, method = 4 2.508, 1.807 2.499, 1.916
Q=50, method = 4 2.460, 1.488 2.456, 1.512
Q=75, method = 4 2.452, 1.078 2.450, 1.092
Q=25, method = 5 2.505, 1.398 2.496, 1.383
Q=50, method = 5 2.458, 1.170 2.453, 1.143
Q=75, method = 5 2.453, 0.886 2.450, 0.855
This change provides 0.1-0.4% compression gains and speeds up the lossless
compression for the default method=4 (the drop in compression speed is between 1-3.5% for method=5).
Change-Id: Idfd88c2092f37afacd26a97097b3053f8183953a
Tested on 1000 pngs corpus with quality 90-100 it gives ~0.15% improvement
in compression density and ~7% speed up.
Change-Id: I460f56c96707edb3c1f0b51a024e5122e10458df
iOS 5 support isn't available in the Xcode 6 install; iOS 6 covers
phones starting at the 3GS, so should be a reasonable base line
Change-Id: Ie5603c9e30cb52114b372509e183febbf679a69a
* We don't need to change DecodeAlpha, since incremental
decoding is not useful for Alpha (we already decode
progressively along the RGB)
* Similarly, we don't do incremental decoding for level>0 planes:
the metadata don't turn into visible pixel (only the ones in level0), so...
(No visible speed change)
Change-Id: I2fd4b9ba227561a7dbede647686584b752be7baa
- don't call VP8LClear() when there's no error (and let the caller do it)
- only initialize output once if state_ is not READ_DATA
- don't over-set dec->status_ = READ_DATA
- don't re-set dec->status_ if DecodeImageStream() fails
- remove unneeded dec->action_ field
- make ReadImageInfo() check br->eos_
- use ErrorStatusLossless() more consistently
Change-Id: Ica6e4b1c82e3fce8b1ce0274def551a886b73b0b
For some GIF images, the first frame is missing the corresponding
graphic control extension. For such cases, we were never calling
GetBackgroundColor(), and default background color value (white) was being used
incorrectly.
So, we call GetBackgroundColor() when we encounter the first image
descriptor instead, to make sure that it is always called.
Change-Id: I00fc8e943d8a0c1578dcd718f3e74dec7de4ed61
Optimize the decoding for region that have trivial literal codes.
The trivial literal is defined as huffman image with Red, Blue and Alpha
huffman trees with only single code values.
This speeds up lossless decoding by 3%
Change-Id: I0204949917836f74c0eb4ba5a7f4052a4797833b
put WebPMuxConfig on the stack in main() rather than allocating it in
InitializeConfig(); removes a level of indirection there.
Change-Id: I81d386f7472ebbd322dd3fdbfda9d78dbeb62a66
* We were re-doing most of the work in plain-C as 'left-over'.
* we were always returning has_alpha = true because of a bad mask all_0xff
These bugs were conservative and silent, in the sense that we were 'just' doing
more work than necessary.
Now, the SSE2 version is really 2x faster than the C version.
Change-Id: I6c8132a267fe3c7a3d1fa70e7a5fcd10719543fa
Handle the corner case when VP8LDecodeImage() method is called with an invalid
header data. The lossless decoding doesn't support incremental mode yet.
Return the error status as BITSTREAM error in case not all pixels are decoded
with the provided bit-stream. Also added asserts in the VP8LDecodeImage() method
to validate the decoder header with appropriate/valid data for huffman trees
(htree_groups_ etc).
Change-Id: Ibac9fcfc4bd0a2c5f624bb9d4a2b9f6459aa19ea
-> introduce special case 64b pattern-copy, similar to the 8b one for alpha.
-> use mempcy() for non-overlapping areas
+ cosmetics and homogenezation of the code
Change-Id: I0e65e04b96fec94c009a4614137dfba2a0f98561
ReadSymbol() finishes with a VP8LSetBitPos() call only and could miss an eos_ during the decode loop.
Things are faster because of inlining too.
Change-Id: I2d2a275f38834ba005bc767d45c5de72d032103e
eos_ needs to be set only when superfluous bits have actually
been requested.
Earlier, we were assuming pre-mature end-of-stream to be an error.
Now, more precisely, we mark error when we have encountered end-of-stream *and*
we attempt to read more bits after that.
This handles cases where image data requires no bits to be read
Change-Id: I628e2c39c64f10c443fb51f86b1f5919cc9fd299
if ALPHA_LOSSLESS_COMPRESSION produces a too big file (very rare!),
we fall-back to no-compression automatically.
Change-Id: I5f3f509c635ce43a5e7c23f5d0f0c8329a5f24b7
speed-up is ~1.6% for photographic image to 10% for graphical image
(1000 images corpus was sped up by 5.8 %)
Code by akramarz@google.com and jyrki@google.com
Change-Id: Iceb2e50e6cc761b9315a3865d22ec9d19b8011c6
Split initialization of YUV444Converters[] out of Upsamplers init.
update test for NULL function pointers
Change-Id: I9603f54250f90c85a12ffbecfd6c59e9b06c47e0
vtbl4_u8 is available everywhere except iOS arm64: use vtbl2q_u8 there
with a corresponding change in the load.
Change-Id: Ib84212dda3c7875348282726c29e3b79b78b0eac
rightmost pixel was missing a copy, which could lead to invalid read.
Also added a lower dimension of 4, below which we use the regular conversion.
This is to prevent corner cases, in addition to not being overkill.
Change-Id: Iac12e7a3d74590f12fe8eeb1830b9891e61439f6
_M_IX86 will be defined in mingw builds after including windows.h. as
the gcc inline asm is first, this missing check would only have caused
an error if the code was reorganized.
Change-Id: I395679bcfc43e94d308d1ceb0c0fbf932b2c378c
with a special case for dithering==0., it gets somewhat faster on x86
thanks to inlining.
Also, less macros.
Change-Id: Ic2f2bf6718310743bb40cef2104fa759a073e6d5
New function: WebPPictureSmartARGBToYUVA()
This implement smart RGB->YUV conversion.
This is rather undocumented for now, and is triggered using '-pre 4'
preprocessing option.
This is slow-ish and use quite some memory, but should be improvable.
This is somehow a usable beta version.
Change-Id: Ia50a8c30134e4cab8a7d3eb70aef13ce1f6187a1
This compresses the uimage using lossless compression and controlable
decimating pre-process.
Code is under WEBP_EXPERIMENTAL_FEATURE while it's being experimented with.
Change-Id: I8b7f4cfcc3c6afc52a556102842bdbb045ed5ee8
libwebp 0.4.1
- 7/24/14: version 0.4.1
This is a binary compatible release.
* AArch64 (arm64) & MIPS support/optimizations
* NEON assembly additions:
- ~25% faster lossy decode / encode (-m 4)
- ~10% faster lossless decode
- ~5-10% faster lossless encode (-m 3/4)
* dwebp/vwebp can read from stdin
* cwebp/gif2webp can write to stdout
* cwebp can read webp files; useful if storing sources as webp lossless
* tag 'v0.4.1':
update ChangeLog
iosbuild.sh: specify optimization flags
update ChangeLog
makefile.unix: add vwebp.1 to the dist target
update ChangeLog
gif2webp: dust up the help message
remove -noalphadither option from README/vwebp.1
update NEWS for the next release
update AUTHORS
bump version to 0.4.1
restore mux API compatibility
remove the !WEBP_REFERENCE_IMPLEMENTATION tweak in Put8x8uv
restore encode API compatibility
restore decode API compatibility
gif2webp: fix compile with giflib 5.1.0
gif2webp: simplify giflib version checking
Change-Id: Icf599f29bc6c0db757bc133aaddb3dbbbc316e08
explicitly set '-O3 -DNDEBUG'. setting CFLAGS on the command line
overrides the default, resulting in -O0.
Change-Id: I213979f646b1444b1d8e0eb0bb58e9b2c3cc4dd3
* try to avoid trailing '.'
* rationalize capitalization
missed in:
0a8b886 dust up the help message
Change-Id: I6f80736cc8a2ff4f185f63d463a57d5bbf88a0db
+ vwebp's -help output
this is a future option; missed in:
793368e restore decode API compatibility
Change-Id: If920df2cf8de57ebad93a6b98830562149396d8d
We store the raw RGB samples decoded from JPEG, and avoid precision loss.
Note that this may increase the encoding time reported by
cwebp -v, since RGB->YUV now occur during WebPEncode call
(in case of lossy), instead of ReadJPEG().
This also increases the memory use, since we're carying the
source ARGB samples around.
Change-Id: Ic2180206cfc9f5574f391e91c3b89b9d81695d01
fixes thread support detection in e.g., cross-compiles/mingw installs
without libpthread, the behavior is unchanged in mingw installs with
libpthread. in the latter case although libpthread is detected the
windows api (_beginthreadex) path is being used.
Change-Id: Ie5f39f15f380731a1006a2497496d627f08fa103
Some single-frame GIF images have a canvas larger than the frame rectangle. For
such images, we retain the ANMF, ANIM and VP8X chunks in the output WebP file.
This ensures that the full canvas width/height and frame offsets are retained.
Change-Id: I3ebae4893f953984de4072fda0938411de787a29
if a thread was still doing work when End() was called there'd be a race
on worker->status_. in these cases, however, the specific value is
meaningless as it would be >= OK and the thread would have been shut
down properly, but we'll check 'impl_' instead to avoid any potential
TSan/DRD reports.
Change-Id: Ib93cbc226a099f07761f7bad765549dffb8054b1
defines HAVE_BUILTIN_BSWAP16/32/64
updated endian_inl.h to have a non-configure fallback for gcc and clang
BSwap16() now uses __builtin_bswap16 if available
Change-Id: Ia04ee07b39303c4b247df96d84f298fb8a81f389
possibly other cross-compiles; avoids misusing -mavx2/-msse2 in these
cases by checking the usability of the associated intrinsics header
files.
iosbuild.sh has been broken since at least:
6e61a3a configure: test for -msse2
Change-Id: Ic650d9eec70f6a5de7d89997b3b425a4aa50ccd9
also reduce the load size from 64 to 32 bits as the top 32 bits are
being shifted away in the operation.
the change is neutral speed-wise on x86_64 as is the change in load size
on x86, but it gives a slight improvement on 32-bit arm.
x86 is improved ~13%, 32-bit arm ~3.7%
aarch64 is untested but will likely benefit as well.
Change-Id: Ibcb02a70f46f2651105d7ab571afe352673bef48
forces aligned memory reads (via memcpy) in the VP8 bit reader, useful
for platforms that don't support unaligned loads.
Change-Id: Ifa44a9a1677fbdc6a929520f9340b7e3fcbd6692
this defines WORDS_BIGENDIAN, replacing uses of
__BIG_ENDIAN__/__BYTE_ORDER__ with it
+ fixes lossless BGRA output with big-endian toolchains
that do not define __BIG_ENDIAN__ (codesourcery mips gcc)
Change-Id: Ieaccd623292d235343b5e34b7a720fc251c432d7
moves the following to this header:
- htole*() definitions from bit_writer.c
- __BIG_ENDIAN__ fallback define from bit_reader_inl.h
Change-Id: I7fff59543f08a70bf8f9ddac849b72ed290471b1
(typecast uint32 pointer to uint64).
The proposed change is little (0.05%) slower but avoids uint32 to uint64
pointer conversion.
Change-Id: I6b8828077ea1324fabd04bfa7e7439e324776250
previously, the final canvas size was adjusted tightly from the
animation frames. Now, it can be specified separately (to be larger, in particular).
calling WebPMuxSetCanvasSize(mux, 0, 0) triggers the 'adjust tightly' behaviour.
This can be useful after calling WebPMuxCreate() if further image addition
is expected.
-> Fixed gif2webp accordingly.
also: made WebPMuxAssemble() more robust by systematically zero-ing WebPData.
Change-Id: Ib4f7eac372cf9dbf6e25cd686a77960e386a0b7f
this will remove a warning about the shift amount not being
an immediate (=constant).
Change-Id: Ie9a00fefdb9a07ec8994fb113f24234518bc878a
Also: fix the NULL sharpen argument mismatch.
mostly to balance the use of bswap64, some gcc platforms are already
interpreting the default case the same
Change-Id: Icf860f55b3f16bea349a7d721e6d6abeeb4e5cf3
Similarly to Chrome, we then use the first sub-rectangle
to set the canvas size.
Also: add check for too-large GIF dimensions (>MAX_CANVAS_SIZE)
Change-Id: Idce55f1e6f6982a8f0e082aac540e16b530e023e
We align with Blink/Chromium code by:
- checking for ANIMEXTS1.0 signature too
- using ByteCount >= 3 instead of requiring ByteCount==3
Change-Id: Idc484ca62878517df3dccb1fdb3bb45104a5e066
see: http://odur.let.rug.nl/kleiweg/gif/netscape.html
Another store to load forward block was detected coming from the function
FTransform.
FTransform save the output data 4 times 8 bytes each. when this data is
later being loaded by the QuantizeBlock function in one chunk of 16 bytes
that caused a store to load forward block.
The fix was done in the FTransform function where each two consecutive 8 bytes
were merged into one 16 bytes register and saved into the memory.
This fix gives ~21% function level gain and 1.6% user level gain.
Change-Id: Idc27c307d5083f3ebe206d3ca19059e5bd465992
ChunkVerifyAndAssign() expects to have at least 8 bytes to work with,
but was only checking for the presence of 4.
Change-Id: I8456b15d872de24a90c1e8fbfba463391ced5c7f
new options:
dwebp -alpha_dither
vwebp -noalphadither
When the source was marked as quantized, we use a threshold-averaging
filter to smooth the decoded alpha plane.
Note: this option forces the decoding of alpha data in one pass, and
might slow the decoding a bit.
The new field in WebPDecoderOptions struct is 'alpha_dithering_strength'
(0 by default, means: off). Max strength value is '100'.
Change-Id: I218e21af96360d4781587fede95f8ea4e2b7287a
Sometimes, the error-code was not set correctly.
We now return OUT_OF_MEMORY everytimes it's appropriate
(tested using MALLOC_FAIL_AT mechanism)
Took the opportunity to clean-up the code and dust the error
code returned (some were erroneously set to INVALID_CONFIGURATION)
Change-Id: I56f7331e2447557b3dd038e245daace4fc82214c
only 1 of <lib>_CPPFLAGS and AM_CPPFLAGS is used, with the former
getting precedence when it's defined. configure's DEFAULT_INCLUDES is
covering what's necessary given the include paths are all source
relative.
Change-Id: I7d14076acd266b28a88a3d92bcc3d7165284d5f3
User-hook can fail but error was not propagated back.
Change-Id: Ic79f9543bf767634a127eccfef90af855ff15c34
Also: some ad-hoc clean-up and API dusting. More to come later...
this change has the side-effect of using directory names in the
include, silencing a lint warning.
Change-Id: Ib91cf63a90534e32fadfa5c2372bfdb29f854d02
if res->first = 1, coeffs[0]=0 because of quant.c:749 and line
added at quant.c:744
So, no need for the extra case.
Going forward, TrellisQuantizeBlock() should also be calling
a variant of VP8SetResidualCoeffs() to set the 'last' field.
also: fixes a warning for win64
+ slight speed-up
Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0
+ add a WEBP_HAVE_SSE2 to dsp.h
not all 32-bit toolchain configurations will have sse2 enabled by
default
Change-Id: I7c675e511581f93cf55c79f960fa7efa2df4987e
this is used to set WEBP_USE_AVX2 in files where the build flag won't be
used, i.e., dsp/enc.c, which enables VP8EncDspInitAVX2() to be called
Change-Id: I362f4ba39ca40d3e07a081292d5f743c649d9d7f
* remove LEFT/RIGHT_JUSTIFY distinction. It's all RIGHT_JUSTIFY now.
* simplify VP8GetSigned(), and add some masking branch-less code. Much
faster on ARM (~13% speed-up). 8% on x86-64, 5% on MacBook.
* split critical implementation into separate bit_reader_inl.h file that
is only included where needed (vp8.c / tree.c / bit_reader.c)
* bumped BITS value from 16 to 24 for x86-32b too, since it's a bit faster.
Change-Id: If41ca1da3e5c3dadacf2379d1ba419b151e7fce8
Extract loop invariant and avoid storing/loading samples
if they can be re-used. This is particularly interesting when
a transpose is involved (HFilter16i).
Change-Id: I93274620f6da220a35025ff8708ff0c9ee8c4139
Needed to add 'volatile' and some casts.
Relevant excerpt from the 'man longjmp':
===============
The values of automatic variables are unspecified after a call to longjmp() if they meet all the following criteria:
· they are local to the function that made the corresponding setjmp(3) call;
· their values are changed between the calls to setjmp(3) and longjmp(); and
· they are not declared as volatile.
===============
Change-Id: Ic72dc92669513a820369ca52a038afa9ec88091f
The luminance needs to be pre- and post- multiplied by
the alpha value in case of rescaling, for proper averaging.
Also:
- removed util/alpha_processing and moved it to dsp/
- removed WebPInitPremultiply() which was mostly useless
and merged it with the new function WebPInitAlphaProcessing()
Change-Id: If089cefd4ec53f6880a791c476fb1c7f7c5a8e60
similar to makefile.unix:
> nmake /f Makefile.vc CFG=release-static HAVE_AVX2=1
from the msdn:
The /arch:AVX2 option and __AVX2__ macro were introduced in Visual
Studio 2013 Update 2, version 12.0.34567.1
(Update 2, version 12.0.30501.00 seems to work)
Change-Id: I649ee47c9fdc399fc71a8ac8464728608d9b6412
gcc was generating very complex code, one for each case of br->len_ values!
also, pretty-fy the mask constants
Change-Id: If62b1e8266f3fe5334517305113038d2ea8a6b42
VP8EncDspInitAVX2 is included in sse2 builds for now, later a configure
flag should be added to avoid the stub when avx2 is unavailable/disabled
Change-Id: I6127b687c273f46f41652aaf8e3b86ae3cfb8108
Sometimes, we can write 18bit or more at time, and it would
overflow the 32bit accumulator.
Also clarified the num-bits limitations (and exposed
VP8L_MAX_NUM_BIT_READ in bit_reader.h)
fixes http://code.google.com/p/webp/issues/detail?id=200
Seems a bit faster (use of local fields for bits_ / used_)
also: added the __QNX__ bswap while at it.
Change-Id: I876db93a931db15b083cf1d838c70105effa7167
* remove some sign-bit flipping
* turn some macro into inline functions
* fix some 'const' in signatures
* clarify the int8/uint8 usage
Change-Id: Ib04459ac34cb280c57579c5d79a5efd2f8d5e99d
the inclusion of the files is harmless when NEON is not enabled and will
allow them to be built with NEON for APP_ABI=arm64-v8a which currently
does not use the '.neon' suffix
Change-Id: I39377876b1b68822c38f4e2396da93c56145fc0f
also changed the token-page layout a little bit to remove
a not-needed field.
This reduces the number of malloc()/free() calls substantially
with minimal increase in memory consumption (~2%).
For the tail of large sources, the number of malloc calls goes
typically from ~10000 to ~100 (e.g.: bryce_big.jpg: 22711 -> 105)
Change-Id: Ib847f41e618ed8c303d26b76da982fbc48de45b9
MALLOC_FAIL_AT flag can be used to set-up a pre-determined failure
point during malloc calls. The counter value is retrieved using
getenv().
Example usage: export MALLOC_FAIL_AT=37 && cwebp input.png
will make 'cwebp' report a memory allocation error the 37th time
malloc() or calloc() is called.
MALLOC_MEM_LIMIT can be used similarly to prevent allocating more
than a given amount of memory. This is usually less convenient to
use than MALLOC_FAIL_AT since one has to know in advance the typical
memory size allocated.
Both these flags are meant to be used for debugging only!
Also: added a 'total_mem_allocated' to record the overall memory allocated
Change-Id: I9d408095ee7d76acba0f3a31b1276fc36478720a
Non-photo source produce far less literal reference and their
buffer is usually much smaller than the picture size if its compresses
well. Hence, use a block-base allocation (and recycling) to avoid
pre-allocating a buffer with maximal size.
This can reduce memory consumption up to 50% for non-photographic
content. Encode speed is also a little better (1-2%)
Change-Id: Icbc229e1e5a08976348e600c8906beaa26954a11
This change reduces the number of calls to WebPSafeMalloc from 200 to
100. The overall memory consumption is down 3% for Lenna image.
Change-Id: I1b351a1f61abf2634c035ef1ccb34050b7876bdd
Some tracing code is activated by PRINT_MEM_INFO flag.
For debugging only! (not thread-safe, and slow).
Change-Id: I282c623c960f97d474a35b600981b761ef89ace9
the unique instance of VP8LHashChain (1MB size corresponding to hash_to_first_index_)
is now wholy part of VP8LEncoder, instead of maintaining the pointer to VP8LHashChain
in the encoder.
Change-Id: Ib6fe52019fdd211fbbc78dc0ba731a4af0728677
We use automatic int->uint64_t promotion where applicable.
(uint64_t should be kept only for overflow checking and memory alloc).
Change-Id: I1f41b0f73e2e6380e7d65cc15c1f730696862125
on ExUtilLoadWebP() failure no allocated memory will be returned, so
it's safe to exit immediately. additionally, any webp specific problems
will already have been reported as part of the call to
WebPGetFeatures().
broken since:
4a0e739 dwebp: move webp decoding to example_util
Change-Id: Ibc632015a1f52bae7f96d063252624123fa7c2da
doing so is not part of ISO C; removes some pedantic warnings.
use webp/decode.h to pickup VP8StatusCode instead.
Change-Id: I19b35e0f8a36fb7c45944ae9ca86838e08b90548
The predictors based on Average2 are tad slower.
Following is the performance data for these predictors normalized to
number of instruction cycles (as per valgrind) per operation:
- Predictor6 & Predictor7 now takes 15 instruction cycles compared to 11
instruction cycles for the C version.
- Predictor8 & Predictor9 now takes 15 instruction cycles compared to 12
instruction cycles for the C version.
The predictors based on Average4 is faster and Average3 is tad slower:
- Predictor10 (Average4) now takes 23 instruction cycles compared to 25
instruction cycles for the C version.
- Predictor5 (Average3) now takes 20 instruction cycles compared to 18
instruction cycles for the C version.
Maybe SSE2 version of Average2 can be improved further. Otherwise, we can
remove the SSE2 version and always fallback to the C version.
Change-Id: I388b2871919985bc28faaad37c1d4beeb20ba029
* merged the two HistogramAdd/AddEval() into a single call
(with detection of special case when b==out)
* added a SSE2 variant
* harmonize the histogram type to 'uint32_t' instead
of just 'int'. This has a lot of ripples on signatures.
* 1-2% faster
Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306
move simple loop filter defines closer to their use and LOAD* to a
location common with the intrinsics
Change-Id: Iaec506d27bbc9a01be20936e30b68a4b0e690ee3
the complex loop filter has no inline equivalent; the simple loop filter
remains conditional on USE_INTRINSICS: it's left undefined for now.
Change-Id: I4f258e10458df53a7a1819707c8f46b450e9d9d2
CollectHistogram / SSE* / QuantizeBlock have no inline equivalents,
enable them where possible and use USE_INTRINSICS to control borderline
cases: it's left undefined for now.
Change-Id: I62235bc4ddb8aa0769d1ce18a90e0d7da1e18155
using this in Load4x16 was slightly slower and didn't help mitigate any
of the remaining build issues with 4.6.x.
Change-Id: Idabfe1b528842a514d14a85f4cefeb90abe08e51
Reduce calls to Malloc (WebPSafeMalloc/WebPSafeCalloc) for:
- Building HashChain data-structure used in creating the backward references.
- Creating Backward references for LZ77 or RLE coding.
- Creating Huffman tree for encoding the image.
For the above mentioned code-paths, allocate memory once and re-use it
subsequently.
Reduce the foorprint of VP8LHistogram struct by changing the Struct
field 'literal_' from an array of constant size to dynamically allocated
buffer based on the input parameter cache_bits.
Initialize BitWriter buffer corresponding to 16bpp (2*W*H).
There are some hard-files that are compressed at 12 bpp or more. The
realloc is costly and can be avoided for most of the WebP lossless
images by allocating some extra memory at the encoder initializaiton.
Change-Id: I1ea8cf60df727b8eb41547901f376c9a585e6095
HuffmanCost and HuffmanCostCombined optimized and added
'const' to some variables from ExtraCost functions.
Change-Id: I28b2b357a06766bee78bdab294b5fc8c05ac120d
When remapping buffer, br->eos_ was wrongly being set to true for
certain
images.
Also, refactored the end-of-stream detection as a function.
Reported in http://crbug.com/364830
Change-Id: I716ce082ef2b505fe24246b9c14912d8e97b5d84
Some versions of compiler in debug build can't find
a register in class 'GR_REGS' while reloading 'asm'
Number of used registers is decreased in this fix.
Change-Id: I7d7b8172b8f37f1de4db3d8534a346d7a72c5065
This is to help further optimizations.
(like in https://gerrit.chromium.org/gerrit/#/c/69787/)
There's a small slowdown (~0.5% at -z 9 quality) due to
function pointer usage. Note that, for speed, it's important
to return VP8LStreaks by value, and not pass a pointer.
Change-Id: Id4167366765fb7fc5dff89c1fd75dee456737000
.set at - Indicates that macro expansions may clobber
the assembler temporary ($at or $28) register.
Some macros may not be expanded without this
and will generate an error message if noat
is in effect.
"at" also added to the clobber list.
Change-Id: I67feebbd9f2944fc7f26c28496e49e1e2348529d
avoids:
src/dsp/enc_mips32.c: In function 'ITransformOne':
src/dsp/enc_mips32.c:123:3: can't find a register in class 'GR_REGS' while reloading 'asm'
src/dsp/enc_mips32.c:123:3: 'asm' operand has impossible constraints
Change-Id: Ic469667ee572f25e502c9873c913643cf7bbe89d
apparently faster, but we might save some load/store to/from memory
once we settle for the intrinsics-based FTransform()
(also: fixed some #ifdef USE_INTRINSICS problems)
Change-Id: I426dea299cea0c64eb21c4d81a04a960e0c263c7
Functions VP8LFastLog2Slow and VP8LFastSLog2Slow
also: replaced some "% y" by "& (y-1)" in the C-version
(since y is a power-of-two)
Change-Id: I875170384e3c333812ca42d6ce7278aecabd60f0
Verified OK, but right now they don't seem faster.
So they are disabled behind a USE_INTRINSICS flag (off for now)
Change-Id: I72a1c4fa3798f98c1e034f7ca781914c36d3392c
+ reorganize the cost-evaluation code by moving some functions
to cost.h/cost.c and exposing VP8Residual
Change-Id: Id976299b5d4484e65da8bed31b3d2eb9cb4c1f7d
slightly faster than the inline asm
in practice not much faster than the C-code in a full NEON build, but
still better overall in an Android-like one that only enables NEON for
certain files.
Change-Id: I69534016186064fd92476d5eabc0f53462d53146
* inverse transform is actually slower with intrinsics + gcc-4.6,
so is left disabled for now.
With gcc-4.8, it's a bit faster than inlined assembly.
* Sum of Square error function provide a 2-3% speed up
There's enabled by default (since there's no inlined-asm equivalent)
Change-Id: I361b3f0497bc935da4cf5b35e330e379e71f498a
+ misc cosmetics
* seems 4% slower than inlined-asm with gcc-4.6
* is a tad faster (<1%) with gcc-4.8
(disabled for now)
Change-Id: Iea6cd00053a2e9c1b1ccfdad1378be26584f1095
The nice trick is to pack 8 u + 8 v samples into a single uint8x16x_t
register, and re-use the previous (luma) functions
Change-Id: Idf50ed2d6b7137ea080d603062bc9e0c66d79f38
This change gains back 1% in compression density for method=3 and 0.5% for
method=4, at the expense of 10% slower compression speed.
Change-Id: I491aa1c726def934161d4a4377e009737fbeff82
+ added some work-around gcc-4.6 to make it compile (except one function).
+ lots of revamping
All variants tested ok.
Speed-up is ~5-7%
Change-Id: I5ceda2ee5debfada090907fe3696889eb66269c3
vertical only currently, 2.5-3% faster
placed under USE_INTRINSICS as this change depends on the simple
loopfilter
improves the simple loopfilter slightly thanks to some reorganization
Change-Id: I6611441fa54228549b21ea74c013cb78d53c7155
When 4 pixels are left, they should be processed with SSE2.
Decoding is marginally faster (~0.4%).
Encoding speed: No observable difference.
Change-Id: I3cf21c07145a560ff795451e65e64faf148d5c3e
new file: lossless_neon.c
speedup is ~5%
gcc 4.6.3 seems to be doing some sub-optimal things here,
storing register on stack using 'vstmia' and such.
Looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509
I've tried adding -fno-split-wide-types and it does help
the generated assembly. But the overall speed gets worse with
this flag. We should only compile lossless_neon.c with it -> urk.
Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0
It's disable for now, because it crashes gcc-4.6.3 during compilation
with -O2 or -O3. It's been tested OK with -O1.
Code is still globally disabled with USE_INTRINSICS, though.
Change-Id: I3ca6cf83f3b9545ad8909556f700758b3cefa61c
disabled for now (but tested OK), thanks to the USE_INTRINSICS #define
We'll activate the code when we're on par with non-intrinsics
Change-Id: Idbfb9cb01f4c7c9f5131b270f8c11b70d0d485ff
Tune HistogramCombineBin for hard images that are larger than 1-2 Mega
pixel and represent photographic images.
This speeds up lossless encoding on 1000 image corpus by 10-12% and compression
penalty of 0.1-0.2%.
Change-Id: Ifd03b75c503b9e886098e5fe6f86be0391ca8e81
there's still some malloc/free in the external example
This is an encoder API change because of the introduction
of WebPMemoryWriterClear() for symmetry reasons.
The MemoryWriter object should probably go in examples/ instead
of being in the main lib, though.
mux_types.h stil contain some inlined free()/malloc() that are
harder to remove (we need to put them in the libwebputils lib
and make sure link is ok). Left as a TODO for now.
Also: WebPDecodeRGB*() function are still returning a pointer
that needs to be free()'d. We should call WebPSafeFree() on
these, but it means exposing the whole mechanism. TODO(later).
Change-Id: Iad2c9060f7fa6040e3ba489c8b07f4caadfab77b
expose the predictor array as function pointers instead
of each individual sub-function
+ merged Average2() into ClampedAddSubtractHalf directly
+ unified the signature as "VP8LProcessBlueAndRedFunc"
no speed diff observed
Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044
With -bypass_filter switched on, the lossless-compressed
data is decoded ahead of time (before being transformed and
display). Hence, the last row was called twice.
http://code.google.com/p/webp/issues/detail?id=193
Change-Id: I9e13f495f6bd6f75fa84c4a21911f14c402d4b10
(and ~2-3% on ARM)
We don't need to store cost/score for each node, but only for
the current and previous one -> simplify code and save some memory.
Also made the 'Node' structure tighter.
Change-Id: Ie3ad7d3b678992b396242f56e2ac387fe43852e6
all the functions involved return double and later these locals are used
in double calculations. fixes a vs build warning
Change-Id: Idb547104ef00b48c71c124a774ef6f2ec5f30f14
Get back some of the compression gains by extending the search space for
GetBestGreenRedToBlue. Also removed the SkipRepeatedPixels call, as it was not
helping much in yielding better compression density.
Before:
1000 files, 63530337 pixels, 1 loops => 45.0s (45.0 ms/file/iterations)
Compression (output/input): 2.463/3.268 bpp, Encode rate (raw data): 1.347 MP/s
After:
1000 files, 63530337 pixels, 1 loops => 45.9s (45.9 ms/file/iterations)
Compression (output/input): 2.461/3.268 bpp, Encode rate (raw data): 1.321 MP/s
Change-Id: I044ba9d3f5bec088305e94a7c40c053ca237fd9d
Optimize and re-structured VP8LGetHistoImageSymbols method, by using the bin-hash
for merging the Histograms more efficiently, instead of the randomized
heuristic of existing method HistogramCombine.
This change speeds up the Lossless encoding by 40-50% (for method=4 and Q > 50)
with 0.8% penalty in compression density. For lower method, the speed up is 25-30%,
with 0.4% penalty in the compression density.
Change-Id: If61adadb1a041b95def6405aa1fe3b83c3cb25ce
Restructure PredictorInverseTransform & ColorSpaceInverseTransform to remove
one if condition inside the main/critial loop. Also separated TransformColor &
TransformColorInverse into separate functions and avoid one 'if condition'
inside this critical method.
This change speeds up lossless decoding for Lenna image about 5% and 1000 image
corpus by 3-4%.
Change-Id: I4bd390ffa4d3bcf70ca37ef2ff2e81bedbba197d
* allow reading from stdin for dwebp / vwebp
* allow writing to stdout for gif2webp
by introducing a new function ExUtilReadFromStdin()
Example use: cat in.webp | dwebp -o - -- - > out.png
Note that the '-- -' option must appear *last*
(as per general fashion for '--' option parsing)
Change-Id: I8df0f3a246cc325925d6b6f668ba060f7dd81d68
These are presets for lossless coding, similar to zlib.
The shortcut for lossless coding is now, e.g.:
cwebp -z 5 in.png -o out_lossless.webp
There are 10 possible values for -z parameter:
0 (fastest, lowest compression)
to 9 (slowest, best compression)
A reasonable tradeoff is -z 6, e.g.
-z 9 can be quite slow, so use with care.
This -z option is just a shortcut for some pre-defined
'-lossless -m xx -q yy' combinations.
Change-Id: I6ae716456456aea065469c916c2d5ca4d6c6cf04
(We didn't need the exact value of the max_error properly.
We can work with relative values instead of absolute)
Output is bitwise the same as before.
Change-Id: I67aeaaea5f81bfd9ca8e1158387a5083a2b6c649
Refactor code for HistogramCombine and optimize the code by calculating
the combined entropy and avoid un-necessary Histogram merges.
This speeds up lossless encoding by 1-2% and almost no impact on compression
density.
Change-Id: Iedfcf4c1f3e88077bc77fc7b8c780c4cd5d6362b
mostly by:
- storing a single rd-score instead of cost / distortion separately
- evaluating terminal cost only once
- getting some invariants out of the loops
- more consts behind fewer variables
Change-Id: I79451f3fd1143d6537200fb8b90d0ba252809f8c
incorporate non-last cost in per-level cost table
also: correct trellis-quant cost evaluation at nodes
(output a little bit different now). Method 6 is ~4% faster.
Change-Id: Ic48bd6d33f9193838216e7dc3a9f9c5508a1fbe8
Speedup lossless encoder by 20-25% by optimizing:
- GetBestColorTransformForTile: Use techniques like binary search and
local minima search to reduce the search space.
- VP8LFastSLog2Slow & VP8LFastLog2Slow: Adding the correction factor for
log(1 + x) and increase the threshold for calling the approximate
version of log_2 (compared to costly call to log()).
Change-Id: Ia2444c914521ac298492aafa458e617028fc2f9d
converts 2 s16 vectors to 2 u8 and store to uint8_t destination;
TransformAC3 can reuse this after a rework
Change-Id: Ia9370283ee3d9bfbc8c008fa883412100ff483d0
Separate the C version from the MIPS32 version and have run-time
initialization during RescalerInit()
Change-Id: I93cfa5691c073a099fe62eda1333ad2bb749915b
Increase the initial buffer size for VP8L Bit Writer from 4bpp to 8bpp.
The resize buffer is expensive (requires realloc and copy) and this additional
memory (0.5 * W * H) doesn't add much overhead on the lossless encoder.
Change-Id: Ic1fe55cd7bc3d1afadc799e4c2c8786ec848ee66
Optimize 'VP8LCalculateEstimateForCacheSize' for lower quality ranges (Q < 50).
The entropy is generally lower for higher cache_bits, so start searching from
higher cache_bits and settle for a local minima, instead of evaluating all
values.
This speeds up the lossless encoding at lower qualities by 10-15%.
Change-Id: I33c1e958515a2549f2e6f64b1aab3f128660dcec
* simplify the endian logic
* remove the need for memset()
* write 16 or 32 at a time (likely aligned)
Makes the code a bit faster on ARM (~1%)
Change-Id: I650bc5654e8d0b0454318b7a78206b301c5f6c2c
-> remove the 'color_transform' multiplier, use more constants, etc.
This function is particularly critical, mostly because of
GetBestColorTransformForTile().
Loop is a bit faster (maybe ~1%)
Change-Id: I90c96a3437cafb184773acef55c77e40c224388f
The WEBP_SWAP_16BIT_CSP flag needs to be honored while filling the Alpha (4 bits)
data in the destination buffer and while pre-multiplying the alpha to RGB colors.
Change-Id: I3b07307d60963db8d09c3b078888a839cefb35ba
(instead of per-macroblock)
speed unchanged.
simplified the context-saving for incremental decoding
Change-Id: I301be581bab581ff68de14c4ffe5bc0ec63f34be
VP8GetThreadMethod() may be called with a NULL headers param; correct an
assert.
broken since:
8a2fa09 Add a second multi-thread method
Change-Id: If7b6d1b8f4ec874d343a806cee5f5e6bb6438620
This makes the segmentation overall less prone to
local-optimum or boundary effect.
(and overall, encoding is a little faster)
Change-Id: I35688098b0f43c28b5cb81c4a92e1575bb0eddb9
The registers and instructions are quite different to 32bit
and the assembly code needs a rewrite.
more info: http://people.linaro.org/~rikuvoipio/aarch64-talk/
Change-Id: Id75dbc1b7bf47f43a426ba2831f25bb8fa252c4f
the -alpha_cleanup flag was ineffective since we switched cwebp
to using ARGB input always.
Original idea by David Eckel (dvdckl at gmail dot com)
Change-Id: I0917a8b91ce15a43199728ff4ee2a163be443bab
New API options: WebPDecoderOptions.flip and 'dwebp -flip ...'
it uses negative stride trick.
Also changed the decoder code to support user-supplied
buffers with negative stride, independently of the
WebPDecoderOptions.flip value.
Change-Id: I4dc0d06f0c87e51a3f3428be4fee2d6b5ad76053
libwebp 0.4.0
- 12/19/13: version 0.4.0
* improved gif2webp tool
* numerous fixes, compression improvement and speed-up
* dither option added to decoder (dwebp -dither 50 ...)
* improved multi-threaded modes (-mt option)
* improved filtering strength determination
* New function: WebPMuxGetCanvasSize
* BMP and TIFF format output added to 'dwebp'
* Significant memory reduction for decoding lossy images with alpha.
* Intertwined decoding of RGB and alpha for a shorter
time-to-first-decoded-pixel.
* WebPIterator has a new member 'has_alpha' denoting whether the frame
contains transparency.
* Container spec amended with new 'blending method' for animation.
* tag 'v0.4.0':
update ChangeLog
update NEWS description with new general features
gif2webp: don't use C99 %zu
cwebp: fix metadata output w/lossy+alpha
makefile.unix: clean up libgif2webp_util.a
update Changelog
bump version to 0.4.0
update AUTHORS & .mailmap
update NEWS for 0.4.0
Change-Id: I6df1b512fe0b697192f1f9b29431cd59aa6063d1
from git://git.savannah.gnu.org/autoconf-archive.git
100644 blob d383ad5c6d m4/ax_pthread.m4
dd94691 AX_PTHREAD: add support for Clang
386b290 AX_PTHREAD: fix support for AIX
e40cfd6 AX_PTHREAD: improve support for AIX
Change-Id: Ic1f1edc656e4e917b88f151493147f650b94242f
this would require a PRIuS or similar macro for proper platform
compatibility (Visual Studio for instance would be variants of %lu)
Change-Id: I2b9fcb1639db024775fb47dbcf79a2240f3d98f2
this partially reverts
f626fe2 Detect canvas and image size mismatch in decoder.
the original change would cause calls to e.g., WebPGetInfo to fail until
a portion of the image chunk was available. With lossy+alpha this meant
waiting for the entire ALPH chunk to be received.
this change restores the original behavior -- reporting the values from
VP8X if available -- while retaining some of the added canvas/image size
checks if the image data is available
Change-Id: I6295b00a2e2d0d4d8847371756af347e4a80bc0e
add TransformDC special case, and make the switch function inlined.
Recovers a few of the CPU lost during the addition of TransformAC3
(only on ARM)
Change-Id: I21c1f0c6a9cb9d1dfc1e307b4f473a2791273bd6
from man-pages (7):
provides a comma-separated list of related man pages, ordered by section
number and then alphabetically by name, possibly followed by other
related pages or documents. Do not terminate this with a period.
Change-Id: I6763b3b3a89f3c2e9ed5c1a4e9d05fb9ce85dd40
this enables webpmux to accept input files starting with '-' when using
-get/-set/-strip; -info & -frame expect an input file as one of their
parameters so no changes are necessary there.
Change-Id: I154eb6dc388258a7fb743ec76ba869bf9589be1b
the *quantized* level should be clipped to 2047, not the
original coeff.
(similar problem was fixed in the regular quantize function
quite some time ago)
Change-Id: I2fd2f8d94561ff0204e60535321ab41a565e8f85
This is to discourage generation of animated WebP images with ICC
profile, as
the real-world use-case for such images is rare at best.
Output of ICC/XMP metadata is now controlled by the '-metadata' option.
Change-Id: I8e3e29878c32bf46cbc661f50661bac602603c43
WHT is somewhat a special case: no sharpen[] bias, etc.
Will be useful in a later CL when precision of input is changed.
Change-Id: I851b06deb94abdfc1ef00acafb8aa731801b4299
This is in preparation for a future change where input will
be 16bit instead of 12bit
No speed diff observed.
Note that the NEON implementation was using 32bit calc already.
Change-Id: If06935db5c56a77fc9cefcb2dec617483f5f62b4
* remove the sharpening for non luma-AC coeffs
* adjust the bias a little bit to compensate for this
Using the multiply-by-reciprocal doesn't always give the same result
as the exact divide, given the QFIX fixed-point precision we use.
-> removed few now-unneeded SSE2 instructions (and checked for
bit-exactness using -noasm)
Change-Id: Ib68057cbdd69c4e589af56a01a8e7085db762c24
original/compressed pictures were not converted to an adequate YUVA
colorspace before computing the distortion.
Change-Id: I37775e9b7dbd6eca16c38e235e1df325858d36a1
-ffunction-sections / -fdata-sections
can improve final binary size when used with --gc-sections, speed impact
untested
Change-Id: I37f4b5da2f34acede7965c2da2e1b97125473adc
Even at high quality setting, the U/V quantizer step is limited
to 4 which can lead to banding on gradient.
This option allows to selectively apply some randomness to
potentially flattened-out U/V blocks and attenuate the banding.
This option is off by default in 'dwebp', but set to -dither 50
by default in 'vwebp'.
Note: depending on the number of blocks selectively dithered,
we can have up to a 10% slow-down in decoding speed it seems.
Change-Id: Icc2446007f33ddacb60b3a80a9e63f2d5ad162de
RGBToU/V calls expects two extra precision bits, they were only
given one by SUM2H and SUM2H macros.
For rounding coherency, also changed SUM1 macro.
Change-Id: I05f96a46f5d4f17b830d0420eaf79b066cdf78d4
When '-mixed' option is given, each frame would be heuristically chosen
to be
encoded using lossy or lossless compression.
The heuristic is based on the number of colors in the image:
- If num_colors <= 31, pick lossless compression
- If num_colors >= 194, pick lossy compression
- Otherwise, try both and pick the one that compresses better.
Change-Id: I908c73493ddc38e8db35b7b1959300569e6d3a97
otherwise make sure that all frames are marked as a fragment. there's
still some work to do with validation if fragments are expected to cover
the entire canvas.
Change-Id: Id59e95ac01b9340ba8c6039b0c3b65484b91c42f
This is to conform to man/gif2webp.1
Earlier, one needed to give both '-kmin 0' and '-kmax 0' for this to
work.
Also, suppress further warnings for kmin = 0 and/or kmax = 0 case.
Change-Id: I6f5eeb609aeffc159d0252a40a5734162f7e4e7d
this avoids local-minima that look bad, even if the distortion
looks low (e.g. gradients, sky,...). Mostly visible in the q=50-80 range.
Output size is mostly unchanged.
Change-Id: I425b600ec45420db409911367cda375870bc2c63
Earlier we were only testing for bit_pos == LBITS. But this is not
sufficient,
as bit_pos can jump from < LBITS to > LBITS.
This was resulting in some bit-stream truncation errors not being
caught.
Note: Not a security bug though, as br->pos wasn't incremented in such
cases
and so we weren't reading beyond the buffer.
Change-Id: Idadcdcbc6a5713f8fac3470f907fa37a63074836
* raise U/V quantization bias to more neutral values
* also raise the non-zero AC bias for Y1/Y2 matrices
(we need all the precision we can for U/V leves, which are often empty)
This will increase quality in the higher range (q >= 90) mostly.
Files size is exacted to raise a little (5-7%). and SSIM accordingly of course.
Change-Id: I8a9ffdb6d8fb6dadb959e3fd392e66dc5aaed64e
kLevelsFromDelta[sharpness][delta] is an inverse look-up table
that tells the minimum filtering strength needed to trigger the
filtering of a step with amplitude 'delta'. We use this table
in various situations:
a) when computing the initial (/global) filtering
strength for each segment. We look at the quantization
step and deduce the proper filtering strength needed
to result this quantization noise (talking the -f option
into account).
b) during intra16 calculation, when a block ends up
very empty (only DC coeffs are non-zero, all ACs have
vanished). We'll rely on the in-loop filtering to
restore the smoothness (if the source was gradient-like
smooth. That's why we look at the distortion too before
triggering the filtering).
Step b) goes _in addition_ to a), potentially raising
the filtering strength if blockiness is likely.
Change-Id: Icaeca93ef21da195b079e6587a44d9edfc8e9efa
Earlier "f = f->next_" was executing for both inner and outer loop, thus
skipping validation of some frames.
Change-Id: Ice5cdb4ff5da78384aa0573addd3a5e5efa0b10c
-> helps debanding (sky, gradients, etc.)
This dithering can only be triggered when using -preset photo
or -pre 2 (as a preprocessing). Everything is unchanged otherwise.
Note that this change is likely to make the perceived PSNR/SSIM drop
since we're altering the input internally.
Change-Id: Id8d4326245d9b828141de162c94ba381b1fa5813
method 1 grouping: [parse + reconstruction] // [filtering + output]
method 2 grouping: [parse] // [reconstruction+filtering + output]
Depending on some heuristics (see VP8ThreadMethod()), we
can pick one of the other when -mt flag (or option.use_threads)
is selected.
Conservatively, we always use method #2 for now until the heuristic
is refined (so, timing should be the same the before this patch)
+ replace 'use_threads' by 'mt_method'
+ define MIN_WIDTH_FOR_THREADS constant
+ fix comment alignment
Change-Id: I11a756dea9070d6e21b1a9481d357a1e8aa0663e
Mostly visible for large images.
Reconstruction+filtering is now done in parallel to bitstream-parsing.
Change-Id: I4cc4483d803b255f4d97a2fcd9158b1c291dd900
Specifically:
- Merge OptimizeAndEncodeFrame with WebPFrameCacheAddFrame: they use the same
if-else structure.
- Move maintenance of 'prev_canvas' and 'curr_canvas' to util.
- Move ReduceTransparency() and FlattenPixels() calls to SetFrame(): This is in
preparation for the next patch: which will try try lossless encoding for
each frame, even when '-lossy' option is given.
- Make most methods static inside util.
No changes to output expected.
Change-Id: I1f65af25246665508cb20f0f6e338f9aaba9367b
broken since:
a5c297c swig/java: reduce wrapper function code duplication
this was a part of v0.3.1, but not v0.3.0.
Change-Id: I001d4bd0a7a1aa1b2d267bc63bc1d8226bff00c1
Needs more memory but allows for future parallelization.
Noticeably faster on ARM, slightly faster on x86
also: remove dec->filter_row_ unnecessary field
Change-Id: I044a808839b4e000c838a477e3e8688820436d9a
quite rarely, it gives a different output
then without the HINT, but that's often for
a smaller size (tested with default -m and -m 6)
Change-Id: I51d221ab61f8e007983325031345728e8d80b241
helps during lossless compression.
10% average saving, but that's mostly on what was previously
'difficult' cases, where the gain is ~30-50% actually.
Non-difficult cases are mostly unchanged.
Tested over ~7k random web gifs.
Change-Id: I09db4560e4ab09105d1cad28e6dbf83842eda8e9
happens surprisingly often at low quality, so we might
as well hard-code a simplified TransformWHT() directly.
Change-Id: Ib7a858ef74e8f334bd59d6512bf5bd3e455c5459
We reduce transparency by turning some transparent pixels into
corresponding RGB values from previous canvas.
This improves compression by about 23%.
Change-Id: I02d70a43a1d0906ac09a7e2dc510be3b2d38f593
useful for testing. Not listed in the man or README, since
it's only useful for testing the incremental decoding
(e.g. measuring the timing difference compared to non-incremental)
Change-Id: I8df8046e031d21006242babb5bcac09f8ff9f710
happens when decoding is partial (past Partition0), without error and
interrupted by calling WebPIDelete()
WebPIDelete() needs to call VP8ExitCritical() to free in-flight resources
Change-Id: Id4faef1b92f7edd8c17d642c58860e70dd570506
...and make gif2webp and vwebp compile without them.
Makefile.vc still to be updated...
Meanwhile the CL environment variable can be supplemented with
set CL=/DWEBP_HAVE_GL /IC:\opt\freeglut\include
Change-Id: I37a60b8c32aafd125bffa98b6cc9f57c022ebbd0
"src\enc\frame.c(88) : warning C4244: '=' : conversion from 'const double' to 'float', possible loss of data"
Change-Id: I143cb0bb6b69e1b8befe9b4f24b71adbc28095c2
The convergence algo is noticeably faster and more accurate.
Try it with: 'cwebp -size xxxxx -pass 8 ...' or 'cwebp -psnr 39 -pass 8 ...'
for instance
Allow full-looping with TokenBuffer case, and make the non-TokenBuffer
case match too.
In case Partition0 is likely to overflow, retry encoding with harder
limits on max_i4_header_bits_.
This CL should make -partition_limit option somewhat useless,
since the fix made automatically (albeit in a non-optimal way yet).
Change-Id: I46fde3564188b13b89d4cb69f847a5f24b8c735b
* fix VP8FixedCostsI4ÆÅ table
(the constant cost '211' was erronenously included)
* use the rd-score for '211' correctly (calling SetRDScore() for good)
* count partition0 bits separately during rd-opt
No meaningful difference in rd-curve.
Change-Id: I6c49a150cf28928d9a92c32fff097600d7145ca4
use of uint8_t type was causing error like:
src/dsp/upsampling.c:223:1: internal compiler error: in vect_determine_vectorization_factor, at tree-vect-loop.c:349
with gcc 4.6.3
Change-Id: Ieb6189a1375c47fc4ff992e6c09b34a7f1f605da
When -mt is used, the analysis pass will be split in two
and each halves performed in parallel. This gives a 5%-9% speed-up.
This was a good occasion to revamp the iterator and analysis-loop
code. As a result, the default (non-mt) behaviour is a tad (~1%) faster.
Change-Id: Id0828c2ebe2e968db8ca227da80af591d6a4055f
-pass 2 can be useful sometimes. More passes usually don't help more.
This change is a step toward being able to re-code the whole picture
with varying parameter (when token buffer is used).
Change-Id: Ia2538e2069a53c080e2ad248c18a1e04623a9304
* move yuv_in_/out_* scratch buffers to iterator
* add y_top_/uv_top_ shortcuts in iterator
That's ~3k of stack size instead of heap.
But it allows having several iterators work in parallel.
Change-Id: I6a437c0f2ef1e5d398c1d6a2fd4974fa0869f0c1
in_bits is const. Trying to apply bswap on it, one gets the error message:
error: read-only variable 'in_bits' used as 'asm' output
Change-Id: I0bef494b822c83d8ea87b1938b0e486d94de4742
We use the 'do not blend' option for creating independent frames.
We also mark the already independent frames as 'do not blend'.
This bounds the maximum number of frames that need to be decoded to
decode a given frame, thus leading to a much better decoding performance.
Change-Id: I7cef98af2b53751ec36993fd2bd54f7f4c4aad2b
The C-version gets ~7-8% slower in order to match the SSE2
output exactly. The old (now off-by-1) code is kept under
the WEBP_YUV_USE_TABLE flag for reference.
(note that calc rounding precision is slightly better ~= +0.02dB)
on ARM-neon, we somehow recover the ~4% speed that was lost by mimicking
the initial C-version (see https://gerrit.chromium.org/gerrit/#/c/41610)
Change-Id: Ia4363c5ed9b4c9edff5d932b002e57bb7814bf6f
If 'top' was meant to be NULL, then bottom and top can be
swapped. Logic is simpler.
+ fix compilation in non-FANCY_UPSAMPLING mode
Change-Id: I7c62bbb59454017f072c0945d1ff2d24d89286ff
Doesn't work with WIC
+ redirect some info messages from stdout to stderr
+ fix the error reporting upon output-writing error
Change-Id: I92b8bd7a15e656a3f3cdfbf56299f024e39453f8
Marking certain frames as "do not blend" helps avoiding alpha-blending
at decode/render time.
It also helps inserting I-frames (frames which can be independently
decoded) into the animation.
Change-Id: Iaa222805db88d2f1c81104ce9d882e7c7ff8cfdb
Also created variant VP8LPrefixEncodeBits that returns the
code & extra_bits only.
There's no impact on compression density and compression speed.
Change-Id: I2cafdd3438ac9270cd72ad9d57b383cdddfdfa4c
WebPDemuxPartial() returns NULL for both of the following cases:
- There was a parsing error.
- It doesn't have enough data to start parsing.
Now, one can differentiate between these two cases by checking the value
of 'state' returned by WebPDemuxPartial().
Change-Id: Ia2377f0c516b3fcfae475c0662c4932d2eddcd0b
Earlier, all lossless images were assumed to contain alpha.
Now, we use the 'alpha_is_used' bit from the VP8L bitstream to determine
the
same.
Detecting an absence of alpha can sometimes lead to much more efficient
rendering, especially for animated images.
Related: refine mux code to read width/height/has_alpha information only
once
per frame/fragment. This avoid frequent calls to VP8(L)GetInfo().
Change-Id: I4e0eef4db7d94425396c7dff6ca5599d5bca8297
Speed up HashChainFindCopy by optimizing on number of calls to
FindMatchLength method.
This change speeds up the lossless & lossy (Alpha) encoding by 20%
without loss of compression density.
At method=3, lossy (Alpha) compression speed (and density) remains
unchanged, as at that settings, the costly Backward Refs method is not
called
Change-Id: Ia1797148e9e4ee2787011837fa248afbae2242cb
Disable costly 'BackwardReferencesTraceBackwards' for encoding Alpha plane.
Increase the threshold for triggering 'BackwardReferencesTraceBackwards' to
quality 25 and above. Also lower the Alpha quality (at method 3) to be
lesser than this threshold (25).
Change-Id: Ic29fb2e6943472c564223df9fe099b19ccda0f31
This speeds up WebP lossless decoding by 20%. In particular, the
photographic images get 35% speedup.
Change-Id: Idb94750342a140ec05df52c07e12be4bba335adc
speeds up those codes that are not part of the main lookup.
This gives a 10 % speedup for a photographic image.
Change-Id: Ief54b0ad77db790a01314402ad351b40ac9a7be4
This is to ensure that the output WebP image is consistent with original
GIF when displayed against any background color.
Change-Id: I14218848153eb40358aa4ce331b2543d2fc2e86c
+ some revamp and cleanup of the alpha-filter trial loop
+ EncodeAlphaInternal() now just takes a FilterTrial param
Change-Id: Ief84385083b1cba02678bbcd3dbf707245ee962f
Specialize and simplify the alpha-decoding case, which is used when:
- no color-cache is use
- all red/blue/alpha values are the same (and hence their Huffman tree has
only 1 symbol. We don't need to consume any bits for reading these).
+ revamped the loop to use size_t and offsets instead of pointers.
~2-3% faster on Unix (gcc) but up to 25% faster lossy+alpha decoding
on Mac (llvm) and ARM.
Change-Id: I43c9688d1e4811cab0ecf0108a5b8f45781083e6
* 0.3.0: (57 commits)
update ChangeLog
Regression fix for alpha channels using color cache:
wicdec: silence a format warning
muxedit: silence some uninitialized warnings
update ChangeLog
update NEWS
bump version to 0.3.1
Revert "add WebPBlendAlpha() function to blend colors against background"
Simplify forward-WHT + SSE2 version
probe input file and quick-check for WebP format.
configure: improve gl/glut library test
update copyright text
configure: remove use of AS_VAR_APPEND
fix EXIF parsing in PNG
add doc precision for WebPPictureCopy() and WebPPictureView()
remove datatype qualifier for vmnv
fix a memory leak in gif2webp
fix two minor memory leaks in webpmux
remove some cruft from swig/libwebp.jar
README: update swig notes
...
Conflicts:
NEWS
examples/gif2webp.c
src/dec/alpha.c
src/dec/idec.c
src/dec/vp8l.c
src/enc/alpha.c
src/enc/vp8l.c
Change-Id: Ib202fad7825a090c3b3a5169acd171369cface47
+ split AllocateInternalBuffers() into two 32b/8b variants instead of
trying to do everything in one function.
Change-Id: I35cac9fcd990a2194c95da4b2a4046ca3a514343
Considering the fact that insert to/lookup from the color cache is always 32
bit, use DecodeImageData() variant in that case.
Conflicts:
src/dec/vp8l.c
Change-Id: I6c665a6cfbd9bd10651c1e82fa54e687cbd54a2b
(cherry picked from commit a37eff47d6)
from x86_64-w64-mingw32-gcc
examples/wicdec.c: In function ‘ExtractICCP’:
examples/wicdec.c:131:21: warning: format ‘%u’ expects argument of type
‘unsigned int’, but argument 4 has type ‘size_t’ [-Wformat]
Change-Id: I6642dae62265a2276ae9ac96dd8ce6f1e2d37ca5
(cherry picked from commit ffae9f31e8)
src/mux/muxedit.c:490: warning: 'x_offset' may be used uninitialized in this function
src/mux/muxedit.c:490: warning: 'y_offset' may be used uninitialized in this function
Change-Id: I4fd27f717e59a556354d0560b633d0edafe7a4d8
(cherry picked from commit 14cd5c6c40)
Considering the fact that insert to/lookup from the color cache is always 32
bit, use DecodeImageData() variant in that case.
Change-Id: I6c665a6cfbd9bd10651c1e82fa54e687cbd54a2b
src/mux/muxedit.c:490: warning: 'x_offset' may be used uninitialized in this function
src/mux/muxedit.c:490: warning: 'y_offset' may be used uninitialized in this function
Change-Id: I4fd27f717e59a556354d0560b633d0edafe7a4d8
src\dec\vp8l.c(816) : warning C4244: '=' : conversion from '__int64' to
'int', possible loss of data
src\dec\vp8l.c(817) : warning C4244: '=' : conversion from '__int64' to
'int', possible loss of data
Change-Id: I1d376d5dea909395bff8741aba16e8eed83a6e8f
from x86_64-w64-mingw32-gcc
examples/wicdec.c: In function ‘ExtractICCP’:
examples/wicdec.c:131:21: warning: format ‘%u’ expects argument of type
‘unsigned int’, but argument 4 has type ‘size_t’ [-Wformat]
Change-Id: I6642dae62265a2276ae9ac96dd8ce6f1e2d37ca5
no precision loss observed
speed is not really faster (0.5% at max), as forward-WHT isn't called often.
also: replaced a "int << 3" (undefined by C-spec) by a "int * 8"
( supersedes https://gerrit.chromium.org/gerrit/#/c/48739/ )
Change-Id: I2d980ec2f20f4ff6be5636105ff4f1c70ffde401
(cherry picked from commit 9c4ce971a8)
add a check for a libGL function (glOrtho) in addition to glutMainLoop
when establishing the need for libGL at link time.
fixes vwebp link failure on ubuntu 13.04+
Change-Id: I537e9a5cab5cf4cd8875e06268d2107f377e625e
(cherry picked from commit 2ccf58d648)
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files
based on the discussion in:
https://codereview.chromium.org/12771026/
Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
(cherry picked from commit d640614d54)
This wasn't used often and benefits were likely minimal. Dropping it
outright is a bit simpler than adding a compatibility ifdef.
provides some compatibility with older versions of autoconf.
tested with autoconf 2.59/automake 1.7/aclocal 1.7
Change-Id: Ifed892346cf2329597985704830a96fc58d65607
(cherry picked from commit 9326a56f8d)
output picture object is overwritten, not free'd or destroyed.
Change-Id: Ibb47ab444063e7ad90ff3d296260807ffe7ddbf9
(cherry picked from commit 23d28e216d)
(rgba->yuv allocates memory)
Also fixed few warning and cleaned the code up.
Change-Id: Id904ad3ad8802ea9fc3d34247d27193dfa7b0b99
(cherry picked from commit 764fdffaac)
picked up a few unnecessary classes from a dirty tree in the last commit
Change-Id: I98be16a0bc8716476ce440da542d113f254aee78
(cherry picked from commit 325d15ff30)
uses autodoc to display the function arguments rather than the
inscrutable va_args (*args).
Change-Id: Iec2ff8276c1533b14c3032836d822fbdae632521
(cherry picked from commit 825b64db53)
reuse the declarations from arrays_java.i for signed char to make an
explicit uint8_t mapping. this avoids sign conversion build warnings.
Change-Id: Icfb5b865cf1fd404e89f2cd889111f0a94e3c604
(cherry picked from commit ad4a367dba)
The auto-infer logic of detecting the 'Alpha' use case
(via check '(palette[i] & 0x00ff00ffu) != 0' is failing
for this corner case image with all black pixels (rgb = 0)
and different Alpha values.
-> switch generic use-LUT detection
Change-Id: I982a8b28c8bcc43e3dc68ac358f978a4bcc14c36
(cherry picked from commit afa3450c11)
Added 1 pixel cache for palette colors for faster lookup.
This will speedup images that require ApplyPalette by 6.5% for lossless
compression.
Change-Id: Id0c5174d797ffabdb09905c2ba76e60601b686f8
(cherry picked from commit 742110ccce)
* "declaration of ‘index’ shadows a global declaration [-Wshadow]"
* "signed and unsigned type in conditional expression [-Wsign-compare]"
Change-Id: I891182d919b18b6c84048486e0385027bd93b57d
(cherry picked from commit 87a4fca25f)
Earlier such images were using roughly 9 * width * height bytes for
decoding. Now, they take 6 * width * height memory.
Change-Id: Ie4a681ca5074d96d64f30b2597fafdca648dd8f7
(cherry picked from commit 64c844863a)
normalize formatting
- update decode prototypes
- match project function name style
Change-Id: Ib481b5602171b72dbb1a5d462e6d5166e9b8566e
(cherry picked from commit 7f5f42bb36)
Simply get rid of an intermediate buffer of size width x height, by
using the fact that stride == width in this case.
Change-Id: I92376a2561a3beb6e723e8bcf7340c7f348e02c2
(cherry picked from commit edccd19436)
They went out of sync some time ago, and are
no longer really required since we have them
buildable from makefile.unix
Change-Id: Ica2dcf5c55f44365598f832f55204d123d7aa601
(cherry picked from commit a4e1cdbbe8)
doing so is not part of ISO C; removes some pedantic warnings
Change-Id: I739ad8c5cacc133e2546e9f45c0db9d92fb93d7e
(cherry picked from commit 96e948d7b0)
The output surface CAN be changed inbetween calls to
WebPIUpdate() or WebPIAppend(), but with precautions.
Change-Id: I899afbd95738a6a8e0e7000f8daef3e74c99ddd8
(cherry picked from commit ff885bfe1f)
This applies to images with optional chunks (e.g. images with ALPH
chunk,
ICCP chunk etc). Before this, the incremental decoding used to work like
non-incremental decoding for such files, that is, no rows were decoded
until
all data was available.
The change is in 2 parts:
- During optional chunk parsing, don't wait for the full VP8/VP8L chunk.
- Remap 'alpha_data' pointer whenever a new buffer is allocated/used in
WebPIAppend() and WebPIUpdate().
Change-Id: I6cfd6ca1f334b9c6610fcbf662cd85fa494f2a91
(cherry picked from commit ead4d47859)
- drop some unnecessary link flags
- use lib.exe directly for creating libraries
- factorize /nologo and use it consistently
Change-Id: Ie76119bc051e9bc53e4d6bba1a0a3f124f9062fc
(cherry picked from commit 52967498b3)
breaks under wine; from MSDN:
/FD is only used by the development environment, and it should not be
used from the command line or a build script.
Change-Id: I180c9813e721b163cc645b9b7f14fe36556019d3
(cherry picked from commit c61baf0c10)
Start VP8EncLoop/VP8EncTokenLoop only if VP8EncStartAlpha succeeded.
Change-Id: Id1faca3e6def88102329ae2b4974bd4d6d4c4a7a
(cherry picked from commit 67708d6701)
Earlier, at line#275, if ok == 0, it would have triggered a double free
of 'rgb'.
Change-Id: Iaee1f35824a66f6e4b488e523416f73b87c5ec30
(cherry picked from commit b68912af2c)
new option: -blend_alpha 0xrrggbb
also: don't force picture.use_argb value for lossless. Instead,
delay the YUVA<->ARGB conversion till WebPEncode() is called.
This make the blending more accurate when source is ARGB
and lossy compression is used (YUVA).
This has an effect on cropping/rescaling. E.g. for PNG, these
are now done in ARGB colorspace instead of YUV when lossy compression
is used.
Change-Id: I18571f1b1179881737a8dbd23ad0aa8cddae3c6b
(cherry picked from commit e7d9548c9b)
Tuned the cross_color transform parameter (step) for lower quality
levels. This change gives speedup of 20% at lower qualities (25) and 10% at
moderate quality level (50) with a loss of 0.25% in compression density.
Also removed TODO for cross_color transform. Observed good correlation of
this with the predict transform.
Change-Id: I8a1044e9f24e6a5f84295c030fd444d0eec7d154
add a check for a libGL function (glOrtho) in addition to glutMainLoop
when establishing the need for libGL at link time.
fixes vwebp link failure on ubuntu 13.04+
Change-Id: I537e9a5cab5cf4cd8875e06268d2107f377e625e
rather than symlink the webm/vpx terms, use the same header as libvpx to
reference in-tree files
based on the discussion in:
https://codereview.chromium.org/12771026/
Change-Id: Ia3067ecddefaa7ee01550136e00f7b3f086d4af4
This wasn't used often and benefits were likely minimal. Dropping it
outright is a bit simpler than adding a compatibility ifdef.
provides some compatibility with older versions of autoconf.
tested with autoconf 2.59/automake 1.7/aclocal 1.7
Change-Id: Ifed892346cf2329597985704830a96fc58d65607
reuse the declarations from arrays_java.i for signed char to make an
explicit uint8_t mapping. this avoids sign conversion build warnings.
Change-Id: Icfb5b865cf1fd404e89f2cd889111f0a94e3c604
The auto-infer logic of detecting the 'Alpha' use case
(via check '(palette[i] & 0x00ff00ffu) != 0' is failing
for this corner case image with all black pixels (rgb = 0)
and different Alpha values.
-> switch generic use-LUT detection
Change-Id: I982a8b28c8bcc43e3dc68ac358f978a4bcc14c36
Added 1 pixel cache for palette colors for faster lookup.
This will speedup images that require ApplyPalette by 6.5% for lossless
compression.
Change-Id: Id0c5174d797ffabdb09905c2ba76e60601b686f8
'mem' was being offset once by DO_ALIGN() then shifted 'nz_size' which
would end up accounting for more than ALIGN_CST and exceed the allocation.
broken since:
9bf3129 align VP8Encoder::nz_ allocation
Change-Id: I04a4e0bbf80d909253ce057f8550ed98e0cf1054
* "declaration of ‘index’ shadows a global declaration [-Wshadow]"
* "signed and unsigned type in conditional expression [-Wsign-compare]"
Change-Id: I891182d919b18b6c84048486e0385027bd93b57d
Earlier such images were using roughly 9 * width * height bytes for
decoding. Now, they take 6 * width * height memory.
Change-Id: Ie4a681ca5074d96d64f30b2597fafdca648dd8f7
Simply get rid of an intermediate buffer of size width x height, by
using the fact that stride == width in this case.
Change-Id: I92376a2561a3beb6e723e8bcf7340c7f348e02c2
They went out of sync some time ago, and are
no longer really required since we have them
buildable from makefile.unix
Change-Id: Ica2dcf5c55f44365598f832f55204d123d7aa601
no precision loss observed
speed is not really faster (0.5% at max), as forward-WHT isn't called often.
also: replaced a "int << 3" (undefined by C-spec) by a "int * 8"
( supersedes https://gerrit.chromium.org/gerrit/#/c/48739/ )
Change-Id: I2d980ec2f20f4ff6be5636105ff4f1c70ffde401
it's not often the case, but could happen, that chroma has non-zero
coeff but luma hasn't. In such case, we should skip luma right away
Change-Id: I9515573ffaec8aad8b069d2c02ffbda4a6eff97c
Removed a call to WebPParseHeaders() inside VP8GetHeaders(). This was not needed
anyway, as all call flows already call WebPParseHeaders() before calling
VP8GetHeaders().
This avoids duplicate calls to WebPParseHeaders().
Change-Id: Icb2d618bd26c44220d956c17a69c9c45a62d5237
The output surface CAN be changed inbetween calls to
WebPIUpdate() or WebPIAppend(), but with precautions.
Change-Id: I899afbd95738a6a8e0e7000f8daef3e74c99ddd8
This applies to images with optional chunks (e.g. images with ALPH
chunk,
ICCP chunk etc). Before this, the incremental decoding used to work like
non-incremental decoding for such files, that is, no rows were decoded
until
all data was available.
The change is in 2 parts:
- During optional chunk parsing, don't wait for the full VP8/VP8L chunk.
- Remap 'alpha_data' pointer whenever a new buffer is allocated/used in
WebPIAppend() and WebPIUpdate().
Change-Id: I6cfd6ca1f334b9c6610fcbf662cd85fa494f2a91
- drop some unnecessary link flags
- use lib.exe directly for creating libraries
- factorize /nologo and use it consistently
Change-Id: Ie76119bc051e9bc53e4d6bba1a0a3f124f9062fc
breaks under wine; from MSDN:
/FD is only used by the development environment, and it should not be
used from the command line or a build script.
Change-Id: I180c9813e721b163cc645b9b7f14fe36556019d3
new option: -blend_alpha 0xrrggbb
also: don't force picture.use_argb value for lossless. Instead,
delay the YUVA<->ARGB conversion till WebPEncode() is called.
This make the blending more accurate when source is ARGB
and lossy compression is used (YUVA).
This has an effect on cropping/rescaling. E.g. for PNG, these
are now done in ARGB colorspace instead of YUV when lossy compression
is used.
Change-Id: I18571f1b1179881737a8dbd23ad0aa8cddae3c6b
* 0.3.0: (65 commits)
Update ChangeLog
Cosmetic fixes
misc style fix
add missing YUVA->ARGB automatic conversion in WebPEncode()
Container spec: Clarify frame disposal
container doc: add a note about the 'ANMF' payload
Container spec: clarify the background color field
container doc: move RIFF description to own section
libwebp/mux: fix double free
use WebPDataCopy() instead of re-coding it.
demux: keep a frame tail pointer; used in AddFrame
add doc precision about WebPParseHeaders() return codes
gif2webp: Bgcolor fix for a special case
fix bad saturation order in QuantizeBlock
vwebp/animation: fix background dispose
Makefile.vc: fix dynamic builds
update ChangeLog
examples: don't use C99 %zu
update ChangeLog
update NEWS
...
Conflicts:
src/webp/format_constants.h
Change-Id: Ie659644d3ea5592cde64ec3af90a00cd17838247
user can now call WebPEncode() with any YUVA or ARGB format, for
lossy or lossless compression
also: simplified error reporting, which is done in WebPPictureARGBToYUVA()
and WebPPictureYUVAToARGB()
Change-Id: Ifb68909217175bcf5a050e5c68d06de9849468f7
(cherry picked from commit 07d87bda1b)
- Add a note that disposal only applies to the frame rectangle
- Add the formula for alpha-blending.
- Note that alpha-blending would occur for the common rectangle
- Also note the case when both color profile and disposal are
present.
Change-Id: I214787dd64453edf3b0cdaff3951015281a32ee4
user can now call WebPEncode() with any YUVA or ARGB format, for
lossy or lossless compression
also: simplified error reporting, which is done in WebPPictureARGBToYUVA()
and WebPPictureYUVAToARGB()
Change-Id: Ifb68909217175bcf5a050e5c68d06de9849468f7
transfer ownership of chunk passed in ChunkSetNth(). prevents freeing
the chunk in e.g., MuxImageParse() when a partial assignment (alpha but
no image) has occurred.
Change-Id: Ia69656b04fdf50f098f3816b54abd4e191248de3
When transparent color index and background color index are same,
we should set background color to 0x00ffffff (transparent).
For this, we delay setting the background color until we have read the
first frame.
Change-Id: I443609b9c7697a2b94a66992460cff8465b3c127
Saturation was done on input coeff, not quantized one.
This saturation is not absolutely needed: output of FTransformWHT
is in range [-16320, 16321]. At quality 100, max quantization steps is 8,
so the maximal range used by QuantizeBlock() is [-2040, 2040].
But there's some extra bias (mtx->bias_[] and mtx->sharpen_[]) so
it's better to leave this saturation check for now.
addresses issue #145
Change-Id: I4b14f71cdc80c46f9eaadb2a4e8e03d396879d28
buffer the last frame's details to perform DISPOSE_BACKGROUND on the
image's area, rather than the entire canvas.
also fixes transparent backgrounds with animated images
Change-Id: I53e4d70c441e1eeb136f1d01e7c88de4f9ecff53
broken since:
e2feefa Makefile.vc: split mux into separate lib
broken even more since:
b65c4b7 Makefile.vc: add libwebpdecoder target
Change-Id: Ibb7a7f2c71ae63434a60d988ce15f0a4ed8fcaee
this would require a PRIuS or similar macro for proper platform
compatibility (Visual Studio for instance would be variants of %lu)
Change-Id: I1af530c7c358c91b845acde1d8c12ef46c2ef746
* merge cost calculation functions (BitsEntropy() and HuffmanCost())
* have HistogramAdd() specialized into separate functions
* use threshold to bail-out early
* revamp code a bit
* also: save memory by freeing free(histogram_image)
Change-Id: I8ee5d2cfa1462d5d6ea6361f5c89925a3720ef55
* make the 'all' target really build everything (default is still the
core examples).
* add demux/mux.h to HDRS_INSTALLED, install the corresponding libs
too
* install vwebp, webpmux, gif2webp and related manpages
Change-Id: Ib6036f2a1a05e40f106914c4bdbe9e3ad7336464
* VP8L shouldn't have an alpha chunk
* expect an animation to only contain frames, not a mix of image chunks
* enforce ANIM/ANMF order
* expect a full frame in a complete file
Change-Id: I953a8b6058f9bc00f1d042635548f158abdf6fce
subdirectories with more than one target can have the install targets
run in parallel with make -jN. group the shared headers in one place to
produce a common install target.
Change-Id: I1f3aa338a8ee6d681de1e5d0b2c6244d2c3d5451
--enable-libwebpdemux
--enable-libwebpmux
These are going stable with any remaining experimental features under
--enable-experimental
Change-Id: I8da0736438b2a58a2ea58b37b2630911ce300632
METADATA_ICCP was renamed to METADATA_ICC in
d8dc72a examples: normalize icc related program arguments
but was merged without rebasing after
0bc4268 cwebp: output metadata statistics
Change-Id: Ie317208488cc851d5d21300591c91cebf5abd4a7
Reported to eventually be 4% on ARM
(see https://code.google.com/p/webp/issues/detail?id=134 for details)
We might activate it selectively later...
Output values is not bitwise the same as the LUT-based
version, but difference is only +/-1 at max.
Change-Id: I1cc790ff4459885ed2ae2e72f31c5f3740095f07
older versions of automake (1.9) it seems would install the headers
regardless of the fact that the library was marked noinst_
this change follows some of the header guidance found here:
http://www.gnu.org/software/automake/manual/automake.html#Headers
Change-Id: I80acc00935097ebf36004e9871574fb9ef09aabf
EXPERIMENTAL -> WEBP_EXPERIMENTAL_FEATURES
the former is not used in the source; this adds
WEBP_EXPERIMENTAL_FEATURES to config.h
Change-Id: I4822bd3be1ea631e96629ae249dc778cdb5d8bb6
this allows DLLs to be built under mingw and sets up a more obvious
dependency in the shared objects
Change-Id: I33e8a6132a16ca49563492438a1b3b74be9ed6a1
expands to e.g., -lpthread/-pthread, etc. if --enable-threading is used
and additional libs/flags are required
Change-Id: I8e8a7b44450bee32ddc58097e1e309d972b1092a
This is required for WebP lossy+Alpha images, where Alpha channel is taking
60-70% of the compression (CPU) cycles.
Also evaluated on 1000 PNG corpus and overall compression speed
is 15-40% better for lossy (PNG+Alpha) compression.
The pure lossless compression numbers are almost same (or little
better) with this change.
Change-Id: I9e5ae7372ed6227a9a5b64cd9cff84c747195a57
using token-buffer (that is: slightly more memory. O(output_size))
This change is ON by default. To return to previous behaviour, use
'cwebp -low_memory' or set config.low_memory to true.
Side-effect of this new mode: it forces 1 partition only (which was
default anyway), and makes some statistics about the bitstream
no longer available. cwebp will no longer report 'intra4-coeffs', etc.
This mode also doesn't work (yet) with multi-pass, and -low_memory
is currently forced for multi-pass.
also: reversed the flag: USE_TOKEN_BUFFER -> DISABLE_TOKEN_BUFFER
also: fixed the kAverageBytesPerMB estimate
Change-Id: I4ea80382038d6df4309663e0cb7bd88d9bca9cf1
When a ANMF/FRGM chunk size (read from file) is smaller than ANMF/FRGM header
size (which is constant and implicit), the parser should report an error.
Change-Id: I91d71889937f5133a97f1e83d5254cb2d7f37028
broken since:
ad25032 Merge "multi-threaded alpha encoding for lossy"
this produced an error due to an empty VP8TBuffer struct.
Change-Id: I640809d07d20092c1d660e2b59b58a62a12e4371
Also tweak the code for single image and fragmented image cases, so that
dmux->num_frames_ is always correct.
Change-Id: I31e6904222e4d96a54a0d8c8aa73d43b7a9094e7
new option: 'cwebp -mt ...'
new config flag: config.thread_level
(allowed thread_level are 0 or 1 for now. Maybe more later...)
If -mt is activated (and WEBP_USE_THREAD is used for compile), the alpha-compression
will be done in parallel to RGB coding for lossy. Can save quite a bit of latency...
Has no effect for lossless encoding.
Change-Id: I769d0bf90e7380cf99344ad62cd77277f4df5a46
It now returns ALPHA_FLAG for lossless images with alpha, that don't
have a VP8X chunk.
This is consistent with similar methods WebPMuxGetFeatures() and
WebPGetFeatures().
Change-Id: Ia3a4ca8f3e0f102f478bd33e0727ca5be98593df
larger values are still dealt with in the .cc
~5% faster encoding
Output size is slightly different (variably), because of
different floating-point calculation ordering.
Change-Id: I6ede18b09c753997cf78aa1199a807d9ddb5d4b4
-> split libraries further into decoder / encoder
-> add libwebpdecoder.a in Makefile.unix
-> make dwebp link against libwebpdecoder.a in Makefile.unix
also: in makefile.unix, pass EXTRA_FLAGS to LDFLAGS too
(otherwise, -m32 wouldn't work, e.g.)
Change-Id: Ief3da02a729dd86bbaf949ed048836716941657f
Also use a naming 'extended format' rather than 'container file' to be
consistent with the container specification.
Change-Id: I3b07f95e0244d3534fe17b03f60db22f61e17836
signed integer overflow behavior is undefined, split PrefixEncode() to
two branches to avoid this.
Change-Id: I6e2761d0d77f0aaceafdc4e07232e089c22beb64
* add SSE2 variant for lossless
* speed-up TransformColor calls using specialized TransformColorBlue/Red
* Fuse the Shannon Entropy calls to compute it for X and X+Y simultaneously.
This latter changes the output size a little bit.
Change-Id: Ie5df94da78bf51a58da859c9099b56340da9ec89
Simplify and re-organize the VP8L bit-reader functions
(e.g.: the 40-bit look-ahead code was helping much)
Speed-up with LBITS=64, on arm7-a:
=> before:
./dwebp_justify_24_neon -v bryce_ll.webp
Time to decode picture: 11.393s
File bryce_ll.webp can be decoded (dimensions: 11158 x 2156).
...
=> after (LBITS=64): Time to decode picture: 9.953s
making the VP8L bit-reader in 32 bit mode is going to be
harder (because we need to be able to read two symbols
at a time, each with max length 15 bits)
Change-Id: I89746fb103b87b5e2fd40a3208a6fbc584b88297
This flag will make the code use no uint64, no asm, and no fancy
trick, but instead aim at being as simple and straightforward as
possible.
Main use is to help emscripten generate proper JS code.
More code needs to be simplified later.
Also: tune the BITS values to be 24 and make use of WEBP_RIGHT_JUSTIFY
Here are the typical timing for decoding a large image:
ARM7-a:
dwebp_justify_32_neon Time to decode picture: 3.280s
dwebp_justify_24_neon Time to decode picture: 2.640s
dwebp_justify_16_neon Time to decode picture: 2.723s
dwebp_justify_8_neon Time to decode picture: 2.802s
dwebp_justify_32 Time to decode picture: 4.264s
dwebp_justify_24 Time to decode picture: 3.696s
dwebp_justify_16 Time to decode picture: 3.779s
dwebp_justify_8 Time to decode picture: 3.834s
dwebp_32_neon Time to decode picture: 4.010s
dwebp_24_neon Time to decode picture: 2.725s
dwebp_16_neon Time to decode picture: 2.852s
dwebp_8_neon Time to decode picture: 2.778s
dwebp_32 Time to decode picture: 4.587s
dwebp_24 Time to decode picture: 3.800s
dwebp_16 Time to decode picture: 3.902s
dwebp_8 Time to decode picture: 3.815s
REFERENCE (HEAD) Time to decode picture: 3.818s
x86_64:
dwebp_justify_32 Time to decode picture: 0.473s
dwebp_justify_24 Time to decode picture: 0.434s
dwebp_justify_16 Time to decode picture: 0.450s
dwebp_justify_8 Time to decode picture: 0.467s
dwebp_32 Time to decode picture: 0.474s
dwebp_24 Time to decode picture: 0.468s
dwebp_16 Time to decode picture: 0.468s
dwebp_8 Time to decode picture: 0.481s
REFERENCE (HEAD) Time to decode picture: 0.436s
i386:
dwebp_justify_32 Time to decode picture: 0.723s
dwebp_justify_24 Time to decode picture: 0.618s
dwebp_justify_16 Time to decode picture: 0.626s
dwebp_justify_8 Time to decode picture: 0.651s
dwebp_32 Time to decode picture: 0.744s
dwebp_24 Time to decode picture: 0.627s
dwebp_16 Time to decode picture: 0.642s
dwebp_8 Time to decode picture: 0.642s
Change-Id: Ie56c7235733a24f94fbfc2e4351aae36ec39c225
This option remaps internal parameters to better match
the expected compression curve of JPEG and produce output files
of similar size, but with better quality.
Change-Id: I96a1cbb480b1f6a0c6845a23c33dfd63f197b689
If it's not a partial file and parser returns PARSE_NEED_MORE_DATA, then
consider it to be PARSE_ERROR.
Change-Id: Id652a345bd2a9f574970272dd0a00517de113215
It will decode to raw (flat) YUV format, similar to what
cwebp can take as input. Makes the PSNR/SSIM calculation easier.
Change-Id: Iebfaedfc0bedc70c169b24ae4aabc701488d0644
If a NULL pre-allocated buffer is passed, a buffer will be automatically
allocated.
+ add some parameter checks.
reported in http://code.google.com/p/webp/issues/detail?id=139
Change-Id: I9e14ed97db30ee12e46b5e92aac7eeaaeb99bfd5
store values to a temporary variable before calling functions that take
vector types.
removes non-standard constructs such as:
(uint8x8x2_t){{ a, b }}
fixing:
src/dsp/upsampling_neon.c:69:32: error: macro "vst2_u8" passed 3
arguments, but takes just 2
Change-Id: Ib4368e16e3a3efac18024f02be94e76243ade2dc
Fixes: https://code.google.com/p/webp/issues/detail?id=140
- along the lines of the SSE chroma upsampling.
Total speedup is ~30%.
4% speed loss on YuvToRgbXX conversion using tables instead
of 14-bit fixed precision. TODO(later): investigate, and compare
to x86.
see http://code.google.com/p/webp/issues/detail?id=134
Change-Id: Idc2261037cd13b4553ca20ecc4c4007099c37009
This build script (iosbuild.sh) will build for following platforms:
iPhoneSimulator, iPhoneOS-V7 & iPhoneOS-V7s
Change-Id: Icbb69e00277a4164f848b8766089302e299506e0
This copies metadata selected by -metadata from the input to the output
if present.
Currently there is no WIC support for Windows builds.
Change-Id: I34fb2443729d80ffe3a6da0979d9f6fa9b3fe536
When the config option '--enable-libwebpdecoder' is specified, the
lean decoder library 'libwebpdecoder' will be created in addition to
libwebp. Also dwebp binary will be linked to libwebpdecoder, if this
config option is specified.
Change-Id: I9de3e149b59c9a8390fae2ba660941749640e54a
Actually, it turns out we now should never call these functions
with a zero size, otherwise something is wrong in the logic.
Change-Id: Ie414fcbec95486c169190470a71f2cff0843782a
also change lossless encoder logic, which was relying on explicit
NULL return from WebPSafeMalloc(0)
renamed function to CheckSizeArgumentsOverflow() explicitly
addresses issue #138
Change-Id: Ibbd51cc0281e60e86dfd4c5496274399e4c0f7f3
- Escape brackets for which kramdown was generating a warning.
Note: This only changes this source file; output HTML would look exactly
the same.
- Also write '5' in words ('five').
Change-Id: I472a03c090a12eb7520719ea463469b36a2736b9
- Clarify the BNF using 'Huffman code groups' and 'Huffman code group'.
- Introduce same terminology in 'Interpretation of meta Huffman codes'.
- Make explicit mention of what is the number of Huffman code groups,
number of Huffman codes and the relation between the two.
Change-Id: I07aa9b62c1d464cd25dc02ac1a68d338b575bdc2
currently has no effect except to disable metadata extraction from the
input when the value is 'none'.
Change-Id: Ic50d4c9d634cc1f6b72ae4e130e99736c85a6477
versions < 4.0.0 used uint32 interchangeably, but with 4.0.0 toff_t
became 64-bit and TIFFTAG_EXIFIFD began returning it rather than
TIFF_LONG/uint32.
Change-Id: I42492bd24613a884c7496e7bfc0c5d892758bce9
the expression using dircount will be promoted to int when tdir_t is an
unsigned short.
quiets:
warning: format specifies type 'unsigned short' but the argument has
type 'int' [-Wformat]
Change-Id: If2e8c27454826556178b0a972aaed272d5fbfa07
- Rectify a few BNF descriptions
- Corrections in "Decoding Flow of Image Data" section:
1. The sequence in case of "S < 256" should be green, red, blue, alpha.
2. In case of "S >= 256 + 24", the index should be "S - (256 + 24)".
- Provide more description to clarify "Decoding Flow of Image Data" section.
- Some cosmetics: use '1's instead of '1, 2, 3...' sequence, as kramdown takes
care of sequencing.
Change-Id: I2b76caf72f67aae813522dc1a4115f8ec8ea6db7
tiffdec does not support and warns about multi-directory images.
previously, the code would read all the directories and thus attempt to
use the last rather than the first as the message suggests.
Change-Id: I3a10c778e6e924a3df75b41c26a9c03afb761044
- precompute filtering strength once for all at the beginning
instead of per-macroblock
- reduce size of VP8MB struct from 8 bytes to 4.
- removed VP8StoreBlock() accordingly
Change-Id: Icf3d329473e21c464770be3d72a04c9ee4c321f2
* change some (*func_ptr) construct to func_ptr simply.
* remove one memcpy
* group #include related to decoding together
Change-Id: If751cfbd9e78be75c57fb60fc9c937900c2c8fe0
unused currently, but the intent is to allow each format to populate
exif/xmp/icc with cwebp then transferring it to the webp file.
Change-Id: I0514f62de52fa7f89c595ee7ef2ad7dced910a41
The color transform block size is stored as 3 bits, not 4.
Fixed the description. The code snippet is already correct.
Change-Id: I830d848b54c121cb5426ca06853a3f1184fd9a31
The prediction block size is stored as 3 bits, not 4.
Fixed the description. The code snippet is already correct.
Change-Id: Iaa66a7e9817b58a2557c9a71c2231cc400b6ae4d
GetCoeffs is (by far) the most consuming function of the decoder.
No speed change (unfortunately), but the main loop is somehow clearer.
Change-Id: I78f1c10cadc2c8696c041f5cbda86cab92cc6598
* treat the last coeff as a special case
* re-arrange the inner code to be shorter
* replace some VP8EncBands[n] by n, for n = 0 or 1
Change-Id: I71e17b014cffad7b073e787fde06260905a6953f
The main advantage is that you can avoid the use of uint64_t
some times, sticking to 32bit only.
Default still is BITS=32, this is mainly "in case".
Change-Id: Id694028793117ba822c37d46ef6c52fa0afed4ac
The binaries should have dependency on related libraries only.
For example, 'vwebp' sould not depend on libpng, libjpeg etc.
Change-Id: I517b7ae6a9092d5de891b46d0e75d07d91548e93
- Separate out mux.h and demux.h
- muxtypes.h: new header for data types common to mux/demux
- Move some misc read/write utilities to utils/utils.h
- Remove some duplicate methods.
- Separate out mux/demux libraries
Change-Id: If9b9569b10d55d922ad9317ef51710544315d6de
* fix gif2webp to handle disposal method and odd offset correctly
+ remove -scale and -crop option from vwebp, since it'll be broken
with offsets and fragments. Needs a revisit.
+ remove a warning in gif2web
Change-Id: If04c6d085806e32540f2f15a37244c4407b719b3
We don't need to use the exact forward transform,
since it's only a rough evaluation.
-> Removed some shifts and rounding constants.
Change-Id: I3fdf8b4fe9720473894155e1ad0345f4d1fd9a33
Now, its: +duration+xoffset+yoffset+disposal
+disposal can be omitted and will default to +0 (NONE)
additionally, +xoffset+yoffset can be omitted and will default to +0+0
Change-Id: I62138c9f675d4fc4408305babbcd485cb32b73d3
In particular, this removes any unnecessary FRGM/ANMF/ANIM chunks, and
indirectly leads to removal of unnecessary VP8X chunks as well.
This is especially useful for GIF to WebP conversion - it saves 56 bytes
(ANMF: 16+8 bytes, ANIM: 6+8 bytes, VP8X: 10+8 bytes) for non-animated GIFs.
Change-Id: I3b50a96ca585844c421b0fa4cd8593e52c3f95c5
This is a correction to the following change:
a00a3daf5b Use 'frgm' instead of 'tile' in
webpmux parameters
Change-Id: I8fa0bce98efdde38827fd25712017a98a6ea7388
- Allow a duration of 0
- Rename LOOP chunk to ANIM and add the background color field to it.
- Add a disposal method field for each animation frame.
- Modify webpmux.c binary interface to allow the input of background color
and disposal methods. Also make '-loop' and '-bgcolor' arguments optional
with some default values.
Change-Id: I807372a61cdb8a0d3080ae3552caf2848070bf4d
10-15% faster encoding.
Almost same output, binary wise. The main difference is
that we can't compute uv_alpha susceptibility, means there
can be subtle differences with different -sns values.
Change-Id: Id1b1a50929bf125b6372212fee1ed75a3bed975f
- Also, use the term 'fragments' instead of 'tiling' in code
- This makes code consistent with the spec.
Change-Id: Ibeccffc35db23bbedb88cc5e18e29e51621931f8
- Make ANMF and FRGM chunks hierarchical so that they encompass all chunks of
that frame.
- Use this in demuxer: stop parsing a frame if all image data for it isn't
available yet. Thus, we have a frame-level incremental support; that is,
all frames that are fully available can be parsed.
- Note: We still keep incremental support for single images - so that they can
be decoded with incremental decoding.
Change-Id: Id1585b16b06caee1d84009c42a25d2de29fa6135
Use separate fourCCs "XMP " and "EXIF" instead of a common "META"
Also, some refactorization in webpmux.c
Change-Id: Iad3337e5c1b81e785c60670ce28b1f536dd7ee31
Number of pairs selected are limited between 25% of histogram
images (at start) and number of histogram images left at any iteration.
Increase the range of iter_mult.
Removed min_cluster_size as parameter for tuning HistogramCombine.
Change-Id: Ia4068cd7af4d0f63c5af9001aceda8a40b9de740
* commit 'v0.2.1':
Update ChangeLog
update NEWS
bump version to 0.2.1
libwebp: validate chunk size in ParseOptionalChunks
cwebp (windows): fix alpha image import on XP
autoconf/libwebp: enable dll builds for mingw
[cd]webp: always output windows errors
fix double to float conversion warning
cwebp: fix jpg encodes on XP
VP8LAllocateHistogramSet: fix overflow in size calculation
GetHistoBits: fix integer overflow
EncodeImageInternal: fix uninitialized free
fix the -g/O3 discrepancy for 32bit compile
fix the BITS=8 case
Make *InitSSE2() functions be empty on non-SSE2 platform
make *InitSSE2() functions be empty on non-SSE2 platform
make VP8DspInitNEON() public
Conflicts:
src/Makefile.am
src/dsp/dec_neon.c
Change-Id: Iddc5152e4a6892db96c12d7c3f74adbc85fe6178
Contributed by Wayne Chen (datoudatou at gmail dot com)
+ some header cleanup
+ remove the NEON suffix in static functions
Change-Id: I75bf5e9b54cf5e1acc53764c6f081d61690f8e3d
(implements the backward and forward transforms in the encoder)
original patch by Wayne Chen (datoudatou at gmail dot com)
Change-Id: Ic00f3bffcdf7a924f043006728735c810ee47a57
- Changed the dynamic range where more aggressive
(BackwardReferencesTraceBackward) heuristic is run from quality > 10
(instead of quality > 25).
- Limit the backward-ref Window size to 16*width & 256*width for lower
qualities ([0, 25[ & [25, 50[) respectively, instead of 1M window.
- Evaluate the params for HashChainFindCopy outside this function call
and pass it, instead of recomputing them for every call.
Change-Id: If9eedfc14b978e7632d7cf69c96186e2910b0554
the max wasn't checked leading to a rollover case, possibly exploitable.
additionally check the RIFF size early, to avoid similar issues.
pulled from chromium:
http://codereview.chromium.org/11229048/
Change-Id: Ifebc712bf3d3de0129b76ca4c57c68e062abc429
Query the converter to ensure the format is supported; add BGR formats
as RGBA was failing for PNG on XP
Fixes issue 129
Change-Id: I02e0d74b3b21337bc5fffd6a5dc158b7809b9aa9
This is mostly for experimentation!
Need to define USE_YUVj flag in the code for that.
suggested by benwreder at hotmail dot com
Change-Id: If0b8e2c1863efc08ce097de6de20f4c7efc3f7e8
LSIM stands for "local similarity": before matching
a compressed pixel to the source, we search around in the source
and minimise the squared error. So, this is close to PSNR calculation,
but mitigates some of its limitations (pure translation and noise for instance).
There's a new -print_lsim option to cwebp too.
Change-Id: Ia38561034c7a90e71d2ea0f55bb1de527eda245b
Make the heuristic for combining Histograms a function of compression
quality. This change will speed-up compression time for compression
quality less than 75. The compression time/density remains unchanged
for compression quality 75 and higher.
Change-Id: I94513d51078340fbc0737d459fab2cebdd2d6082
correct has_alpha check; previously it was controlled by keep_alpha,
which overrode the source format check.
fixes issue #127
Change-Id: I949be90419b03610c64900be0fd37f83b70cbe73
the multiplications done for total_size would be done with integers,
possibly overflowing, before being promoted to 64-bit for the addition
Change-Id: Id5c127c8a497ce5de89a276c17f36b59eeb67c21
huff_image_size was a size_t (=32 bits with 32-bit builds) which could
rollover causing an incorrectly sized allocation and a crash in lossless
encoding.
fixes issue #128
Change-Id: I175c8c6132ba9792034807c5c1028dfddfeb4ea5
in debug mode, some float operations see their intermediate
values stored in memory rather than staying in the FPU (which
is 80bit precision).
Several fixes are possible (breaking long calculations into
atomic steps for instance), but simpler of all is just about
turning the cost[] array into float* instead of double*.
The code is a tad faster, and i didn't see any major output
size difference.
Change-Id: Icf1f833e15f8ee4ecc7f9a521d07fdc96ef711aa
Change-Id: I36d3765e94d2b5529b321c186ccee1744785c5b3
fixes:
error: ISO C++ forbids forward references to 'enum' types
since:
28d25c8 replace 'typedef struct {} X;" by "typedef struct X X; struct X {};"
Returning 0 (equal) can lead to undefined behaviour.
And, in our cases we'll never have equal keys (added asserts for that)
Change-Id: Ifaf202df321d3f877ad2a03de42e0d6cdd1b2388
SBITS=8 is reported 20-30% faster on ARM (where 64bit ops
are expensive).
Also use 32bits for i32.
Change-Id: Id6a7197d805061aeb8832f20432512d0d930ebfa
fixes the 'blocky sky problem' (saturation problem: when luma was flat,
chroma noise was taking over, resulting in random segment id assigned.
When just using a common uniform segment was better).
+ side clean-up and readibility/experimentability MACRO'ization
+ added '-map 7' option
Change-Id: I35982a9e43c0fecbfdd7b05e4813e8ba8c121d71
this will avoid the "dec_neon.o has no symbol" warning
no change in binary size observed on linux.
Change-Id: Ia27ae2bc5a03d714afa7e46671fdcf4cb630784d
* 0.2.0: (42 commits)
Update ChangeLog
dec/io.c: cosmetics
RGBA4444: harmonize lossless/lossy alpha values
fix RGBA4444 output w/fancy upsampling
Alignment fix
avoid rgb-premultiply if there's only trivial alpha values
fix the ARGB4444 premultiply arithmetic
Lossless decoder fix for a special transform order
Update encoding heuristic w.r.t palette colors.
remove unused ApplyInverseTransform()
Update ChangeLog
update AUTHORS
update NEWS
add support for ARGB -> YUVA conversion for lossless decoder
bump version to 0.2.0
fix alpha-plane check + add extra checks
MODE_YUVA: set alpha to opaque if the image has none
silence one more warning
move some RGB->YUV functions to yuv.h
README: sync [cd]webp help output
...
Change-Id: I4c4e3be2e88655d25d4dd2eae46c580d960b12ac
2012-08-17 14:33:56 -07:00
287 changed files with 75659 additions and 20582 deletions
The distances codes that map into these tuples are changes into
scan-line order distances using the following formula:
_dist = x + y * xsize_, where _xsize_ is the width of the image in
pixels. If a decoder detects a computed _dist_ value smaller than 1,
the value of 1 is used instead.
For example, distance code `1` indicates offset of `(0, 1)` for the neighboring
pixel, that is, the pixel above the current pixel (0-pixel difference in
X-direction and 1 pixel difference in Y-direction). Similarly, distance code
`3` indicates left-top pixel.
The decoder can convert a distances code 'i' to a scan-line order distance
'dist' as follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
(xi, yi) = distance_map[i]
dist = x + y * xsize
if (dist < 1) {
dist = 1
}
~~~~~~~~~~~~~~~~~~~~~~~~~~~~
where 'distance_map' is the mapping noted above and `xsize` is the width of the
image in pixels.
### Color Cache Code
#### 4.2.3 Color Cache Coding
Color cache stores a set of colors that have been recently used in the
image. Using the color cache code, the color cache colors can be
referred to more efficiently than emitting the respective ARGB values
independently or sending them as backward references with a length of
one pixel.
Color cache stores a set of colors that have been recently used in the image.
Color cache codes are coded as follows. First, there is a bit that
indicates if the color cache is used or not. If this bit is 0, no color
cache codes exist, and they are not transmitted in the Huffman code that
decodes the green symbols and the length prefix codes. However, if this
bit is 1, the color cache size is read:
**Rationale:** This way, the recently used colors can sometimes be referred to
more efficiently than emitting them using other two methods (described in
[4.2.1](#huffman-coded-literals) and [4.2.2](#lz77-backward-reference)).
Color cache codes are stored as follows. First, there is a 1-bit value that
indicates if the color cache is used. If this bit is 0, no color cache codes
exist, and they are not transmitted in the Huffman code that decodes the green
symbols and the length prefix codes. However, if this bit is 1, the color cache
size is read next:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int color_cache_code_bits = ReadBits(br, 4);
int color_cache_code_bits = ReadBits(4);
int color_cache_size = 1 << color_cache_code_bits;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
`color_cache_code_bits` defines the size of the color_cache by (1 <<
`color_cache_code_bits`). The range of allowed values for
`color_cache_code_bits` is [1..11]. Compliant decoders must indicate a
`color_cache_code_bits` is \[1..11\]. Compliant decoders must indicate a
corrupted bitstream for other values.
A color cache is an array of the size `color_cache_size`. Each entry
A color cache is an array of size `color_cache_size`. Each entry
stores one ARGB color. Colors are looked up by indexing them by
(0x1e35a7bd * `color`) >> (32 - `color_cache_code_bits`). Only one
lookup is done in a color cache; there is no conflict resolution.
@@ -761,91 +812,188 @@ literals, into the cache in the order they appear in the stream.
5 Entropy Code
--------------
### Huffman Coding
### 5.1 Overview
Most of the data is coded using a canonical Huffman code. This includes
the following:
Most of the data is coded using [canonical Huffman code][canonical_huff]. Hence,
the codes are transmitted by sending the _Huffman code lengths_, as opposed to
the actual _Huffman codes_.
* a combined code that defines either the value of the green
component, a color cache code, or a prefix of the length codes;
* the data for alpha, red and blue components; and
* prefixes of the distance codes.
In particular, the format uses **spatially-variant Huffman coding**. In other
words, different blocks of the image can potentially use different entropy
codes.
The Huffman codes are transmitted by sending the code lengths; the
actual symbols are implicit and done in order for each length. The
Huffman code lengths are run-length-encoded using three different
prefixes, and the result of this coding is further Huffman coded.
**Rationale**: Different areas of the image may have different characteristics. So, allowing them to usedifferent entropy codes provides more flexibility and
potentially a better compression.
### 5.2 Details
### Spatially-variant Huffman Coding
The encoded image data consists of two parts:
For every pixel (x, y) in the image, there is a definition of which
entropycode to use. First, there is an integer called 'meta Huffman
code' that can be obtained from the entropy image. This
meta Huffman code identifies a set of five Huffman codes, one for green
(along with length codes and color cache codes), one for each of red,
blue and alpha, and one for distance. The Huffman codes are identified
by their position in a table by an integer.
1. Meta Huffman codes
1. Entropy-coded image data
#### 5.2.1 Decoding of Meta Huffman Codes
### Decoding Flow of Image Data
As noted earlier, the format allows the use of different Huffman codes for
different blocks of the image. _Meta Huffman codes_ are indexes identifying
which Huffman codes to use in different parts of the image.
Read next symbol S
Meta Huffman codes may be used _only_ when the image is being used in the
[role](#roles-of-image-data) of an _ARGB image_.
1. S < 256
1. Use S as green component
2. read alpha
3. read red
4. read blue
2. S < 256 + 24
There are two possibilities for the meta Huffman codes, indicated by a 1-bit
value:
* If this bit is zero, there is only one meta Huffman code used everywhere in
the image. No more data is stored.
* If this bit is one, the image uses multiple meta Huffman codes. These meta
Huffman codes are stored as an _entropy image_ (described below).
**Entropy image:**
The entropy image defines which Huffman codes are used in different parts of the
image, as described below.
The first 3-bits contain the `huffman_bits` value. The dimensions of the entropy
image are derived from 'huffman_bits'.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int huffman_bits = ReadBits(3) + 2;
int huffman_xsize = DIV_ROUND_UP(xsize, 1 << huffman_bits);
int huffman_ysize = DIV_ROUND_UP(ysize, 1 << huffman_bits);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
where `DIV_ROUND_UP` is as defined [earlier](#predictor-transform).
Next bits contain an entropy image of width `huffman_xsize` and height
`huffman_ysize`.
**Interpretation of Meta Huffman Codes:**
For any given pixel (x, y), there is a set of five Huffman codes associated with
it. These codes are (in bitstream order):
* **Huffman code #1**: used for green channel, backward-reference length and
color cache
* **Huffman code #2, #3 and #4**: used for red, blue and alpha channels
respectively.
* **Huffman code #5**: used for backward-reference distance.
From here on, we refer to this set as a **Huffman code group**.
The number of Huffman code groups in the ARGB image can be obtained by finding
the _largest meta Huffman code_ from the entropy image:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_huff_groups = max(entropy image) + 1;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
where `max(entropy image)` indicates the largest Huffman code stored in the
entropy image.
As each Huffman code groups contains five Huffman codes, the total number of
Huffman codes is:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int num_huff_codes = 5 * num_huff_groups;
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Given a pixel (x, y) in the ARGB image, we can obtain the corresponding Huffman
codes to be used as follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int position = (y >> huffman_bits) * huffman_xsize + (x >> huffman_bits);
int meta_huff_code = (entropy_image[pos] >> 8) & 0xffff;
Some files were not shown because too many files have changed in this diff
Show More
Reference in New Issue
Block a user
Blocking a user prevents them from interacting with repositories, such as opening or commenting on pull requests or issues. Learn more about blocking a user.