The luminance needs to be pre- and post- multiplied by
the alpha value in case of rescaling, for proper averaging.
Also:
- removed util/alpha_processing and moved it to dsp/
- removed WebPInitPremultiply() which was mostly useless
and merged it with the new function WebPInitAlphaProcessing()
Change-Id: If089cefd4ec53f6880a791c476fb1c7f7c5a8e60
VP8EncDspInitAVX2 is included in sse2 builds for now, later a configure
flag should be added to avoid the stub when avx2 is unavailable/disabled
Change-Id: I6127b687c273f46f41652aaf8e3b86ae3cfb8108
the inclusion of the files is harmless when NEON is not enabled and will
allow them to be built with NEON for APP_ABI=arm64-v8a which currently
does not use the '.neon' suffix
Change-Id: I39377876b1b68822c38f4e2396da93c56145fc0f
Functions VP8LFastLog2Slow and VP8LFastSLog2Slow
also: replaced some "% y" by "& (y-1)" in the C-version
(since y is a power-of-two)
Change-Id: I875170384e3c333812ca42d6ce7278aecabd60f0
new file: lossless_neon.c
speedup is ~5%
gcc 4.6.3 seems to be doing some sub-optimal things here,
storing register on stack using 'vstmia' and such.
Looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509
I've tried adding -fno-split-wide-types and it does help
the generated assembly. But the overall speed gets worse with
this flag. We should only compile lossless_neon.c with it -> urk.
Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0
expose the predictor array as function pointers instead
of each individual sub-function
+ merged Average2() into ClampedAddSubtractHalf directly
+ unified the signature as "VP8LProcessBlueAndRedFunc"
no speed diff observed
Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044
-ffunction-sections / -fdata-sections
can improve final binary size when used with --gc-sections, speed impact
untested
Change-Id: I37f4b5da2f34acede7965c2da2e1b97125473adc
-> split libraries further into decoder / encoder
-> add libwebpdecoder.a in Makefile.unix
-> make dwebp link against libwebpdecoder.a in Makefile.unix
also: in makefile.unix, pass EXTRA_FLAGS to LDFLAGS too
(otherwise, -m32 wouldn't work, e.g.)
Change-Id: Ief3da02a729dd86bbaf949ed048836716941657f
- along the lines of the SSE chroma upsampling.
Total speedup is ~30%.
4% speed loss on YuvToRgbXX conversion using tables instead
of 14-bit fixed precision. TODO(later): investigate, and compare
to x86.
see http://code.google.com/p/webp/issues/detail?id=134
Change-Id: Idc2261037cd13b4553ca20ecc4c4007099c37009
(implements the backward and forward transforms in the encoder)
original patch by Wayne Chen (datoudatou at gmail dot com)
Change-Id: Ic00f3bffcdf7a924f043006728735c810ee47a57
the extended file format is still under development and related
libs/binaries are not fit for release
configure:
--enable/disable-libwebpmux; default is disabled.
makefile.unix:
src/mux/libwebpmux.a and examples/webpmux must be explicitly specified
Makefile.vc:
$(DIRLIB)\libwebpmux.lib and $(DIRBIN)\webpmux.exe must be explicitly
specified
Change-Id: I8246746b256010dd2a2e4de58291222d7eaf0457
Defining LOCAL_ARM_NEON = true can result in neon instructions being
used in portions unprotected by the cpu check.
This changes defines a WEBP_USE_NEON/WEBP_ANDROID_NEON pair similar to
the SSE2 code and MSVC.
Change-Id: Ifac010b06e42c73d5aca529baa2198c6796674bd
* Method #1 is now calling the lossless encoder on the alpha plane.
Format is not final, it's just a first draft. We need ad-hoc functions.
* removed now useless utils/alpha.*
* added utils/quant_levels.h instead
* removed the TCoder code altogether
Change-Id: I636840b6129a43171b74860e0a0fc5bb1bcffc6a
add proper cpu-detection for Android targets
Fixes issue #118 (and is a better solution for #117).
based on patch by pepijn vaneeckhoudt
Change-Id: I6b00ea6d51ca658ccf6a3d55b87b99c01c6805be
patch by pepijn vaneeckhoudt
- Android.mk should include dec/enc/upsampling sse2 variants. This
provides sse2 optimizations when compiling for Android/x86
- LOCAL_ARM_NEON should be set to true when compiling for armeabi-v7a.
Otherwise __ARM_NEON__ is not defined and all neon code is removed by
the preprocessor.
Change-Id: I54f3505757fc5d2d63cca4b64d61be34a0b34eb8
Add predictive filtering option for Alpha plane.
Valid range for filter option is [0, 5] corresponding to prediction
methods none, horizontal, vertical, gradient & paeth filter.
The prediction method 5 will try all the prediction methods (0 to 4)
and pick the prediction method that gives best compression.
Change-Id: I9244d4a9c5017501a9696c7cec5045f04c16d49b
- add check for native log2 to configure
- use a common define (NOT_HAVE_LOG2) to enable use of local library
version for non-autoconf platforms without their own version,
currently msvc and android
This uses a negative (NOT_HAVE_) to simplify the ifdef
Change-Id: Id0610eed507f8bb9c5da338918112853d5c8127a
Gathers all DSP-related function (and SSE2 implementations).
Clean-up some unwanted symbolic dependencies so that webp_encode,
webp_decode and webp_dsp are truly independent libraries.
+ opportunistic clean-up:
* remove unneeded VP8DspInitTables(), now integrated in VP8DspInit()
* make consistent use of VP8GetCPUInfo() in the various DspInit() funcs
* change OUT macro to DST
To be enabled with the flag WEBP_USE_THREAD.
For now it's only available on unix (pthread), when using Makefile.unix
Will be switched on more generally later.
In-loop filtering and output (=rescaling/yuv->rgb conversion)
is done in parallel to bitstream decoding, lagging 1 row behind.
Example:
examples/dwebp bryce.webp -v
Time to decode picture: 0.680s
examples/dwebp bryce.webp -v -mt
Time to decode picture: 0.515s
Change-Id: Ic30a897423137a3bdace9c4e30465ef758fe53f2
You can now use WebPDecBuffer, WebPBitstreamFeatures and WebPDecoderOptions
to have better control over the decoding process (and the speed/quality tradeoff).
WebPDecoderOptions allow to:
- turn fancy upsampler on/off
- turn in-loop filter on/off
- perform on-the-fly cropping
- perform on the-fly rescale
(and more to come. Not all features are implemented yet).
On-the-fly cropping and scaling allow to save quite some memory
(as the decoding operation will now scale with the output's size, not
the input's one). It saves some CPU too (since for instance,
in-loop filtering is partially turned off where it doesn't matter,
and some YUV->RGB conversion operations are ommitted too).
The scaler uses summed area, so is mainly meant to be used for
downscaling (like: for generating thumbnails or previews).
Incremental decoding works with these new options.
More doc to come soon.
dwebp is now using the new decoding interface, with the new flags:
-nofancy
-nofilter
-crop top left width height
-scale width height
Change-Id: I08baf2fa291941686f4ef70a9cc2e4137874e85e
+ add a simple rescaling function: WebPPictureRescale() for encoding
+ clean-up the memory managment around the alpha plane
+ fix some includes path by using "../webp/xxx.h" instead of "webp/xxx.h"
New flags for 'cwebp':
-resize <width> <height>
-444 (no effect)
-422 (no effect)
-400
Change-Id: I25a95f901493f939c2dd789e658493b83bd1abfa
converts PNG & JPEG to WebP
This is an experimental early version, with lot of room
of later optimizations in both speed and quality.
Compile with the usual `./configure && make`
Command line example is examples/cwebp
Usage:
cwebp [options] -q quality input.png -o output.webp
where 'quality' is between 0 (poor) to 100 (very good).
Typical value is around 80.
More encoding options with 'cwebp -longhelp'
Change-Id: I577a94f6f622a0c44bdfa9daf1086ace89d45539