Compare commits

...

483 Commits

Author SHA1 Message Date
James Zern
fcd94e925a update ChangeLog
Change-Id: I60c273c650a305fe36564ccc5fb1c8d7ea18118f
2015-03-04 11:30:23 -08:00
James Zern
569fe5789e update NEWS
Change-Id: Iade773fc3b9961fcd9b4112a3972cfc68e3670f2
2015-03-03 19:19:50 -08:00
James Zern
bd852f5d81 bump version to 0.4.3
libwebp{,decoder} - 0.4.3
libwebp libtool - 5.3.0
libwebpdecoder libtool - 1.3.0

mux/demux - 0.2.2 (unchanged)
libtool - 1.2.0 (unchanged)

Change-Id: Ie8c35ffc20c1bfd782bdafd99da6c6b1373022c1
2015-03-03 19:05:40 -08:00
James Zern
2d58b64f51 WebPPictureRescale: add a note about 0 width/height
(cherry picked from commit 0f773693bf)

Change-Id: I3890bb3fd32a148d7dd24c714546160c6c59d4ea
2015-03-03 17:53:49 -08:00
James Zern
a0d8ca576f examples/Android.mk: add webpmux_example target
renamed from 'webpmux' to avoid name clash with the library module name

(cherry picked from commit 6cef0e4fa4)

Change-Id: I33bbdbdcb25a6f35bd85c9a0dbbb93b2428b05f3
2015-03-03 17:53:49 -08:00
James Zern
34b1d29e3c Android.mk: add webpmux target
(cherry picked from commit 53c16ff047)

Change-Id: I60fc898fd804e23f08d760694192c5d04adcae91
2015-03-03 17:53:49 -08:00
James Zern
75619881e6 Android.mk: add webpdemux target
(cherry picked from commit 21852a00a1)

Change-Id: I2fbbefbee59a96c52f5addcfc5bfe1216caad5cc
2015-03-03 17:53:48 -08:00
James Zern
a98757650a Android.mk: add webpdecoder{,_static} targets
webpdecoder_static is reused to create libwebpdecoder.so and
libwebp.{a,so}

(cherry picked from commit 8697a3bcc8)

Change-Id: I940293cb755040c0ea45dc13f22624de8f355867
2015-03-03 17:53:48 -08:00
James Zern
a6d4859725 Android.mk: split source lists per-directory
will allow reuse in future targets

(cherry picked from commit 4a67049113)

Conflicts:
	Android.mk

Change-Id: Iededc19d954226e62f2d2383a2b80f268d613647
2015-03-03 17:53:48 -08:00
James Zern
77544d5f5b fix iOS arm64 build with Xcode 6.3
the standard vtbl functions are available there [1][2].
based on a patch from: aaroncrespo
fixes issue #243.

[1]
http://adcdownload.apple.com//Developer_Tools/Xcode_6.3_beta/Xcode_6.3_beta_Release_Notes.pdf
[2] Apple LLVM Compiler Version 6.1
- Xcode 6.3 updates the Apple LLVM compiler to version 6.1.0.
[...]
Support for the arm64 architecture has been significantly revised to
align with ARM's implementation, where the most visible impact is that a
few of the vector intrinsics have changed to match ARM's specifications.

(cherry picked from commit 602a00f93f)

Change-Id: I79a0016f44b9dbe36d0373f7f00a50ab3c2ca447
2015-03-03 17:53:47 -08:00
James Zern
6dea15784d doc/webp-container-spec: note MSB order for chunk diagrams
addresses question in issue #241

(cherry picked from commit b510fbfe3b)

Change-Id: Iff6a172d5822f6ec8b9bc0951a1c9cd3f98c9251
2015-03-03 17:53:47 -08:00
James Zern
f7cd57b23d doc/webp-container-spec: cosmetics
partially normalize indent, vertical whitespace and capitalization with
the copy used on developers.google.com/speed/webp

(cherry picked from commit e7d3df2314)

Change-Id: I8044418eeb9eaf5bd5c799675c74f6f845d503d6
2015-03-03 17:53:47 -08:00
James Zern
1d6b250b07 vwebp: clear canvas at the beginning of each loop
this is in line with the recommendation in the spec, cf.,
5603947 webp-container-spec: clarify background clear on loop

(cherry picked from commit 1579de3cae)

Change-Id: Id3910395b05a1a1f2804be841b61f97bd4bac593
2015-03-03 17:53:46 -08:00
James Zern
f97b3f86bf webp-container-spec: clarify background clear on loop
at the beginning of the loop there's an implicit clear of the entire
canvas to the background (or application defined) color. this avoids
adding the final composited frame to the first.

(cherry picked from commit 560394798f)

Change-Id: Ia3a52cf4482c6176334a5c9c99a0ddd07d1776e7
2015-03-03 17:53:46 -08:00
James Zern
4ba83c1759 vwebp: remove unnecessary static Help() prototype
all uses occur after its declaration

(cherry picked from commit 2c906c407c)

Change-Id: I775642ce6d1dec3bc6da2fa0d5d87490992c7e6c
2015-03-03 17:53:46 -08:00
James Zern
d34e8e3d18 vwebp/animation: display last frame on end-of-loop
previously the first frame would be redisplayed, which might be
unexpected if the final frame was meant to be a composite, for example.

(cherry picked from commit 0f017b56f3)

Change-Id: I4da795623c71501e2fa426e8fba8fb2ffcbab58a
2015-03-03 17:53:45 -08:00
James Zern
bbbc524fb4 dec/vp8: clear 'dither_' on skipped blocks
DitherRow() only checks this value, not 'skip_' so previously it was
uninitialized for these blocks.

(cherry picked from commit 66935fb9ee)

Change-Id: I0f698b81854ee9d91edacb51c1e3bdab9cba96f2
2015-03-03 17:53:45 -08:00
James Zern
0339fa26eb lossless_neon: enable subtract green for aarch64
similar to:
1ba61b0 enable NEON intrinsics in aarch64 builds

vtbl1_u8 is available everywhere but Xcode-based iOS arm64 builds, use
vtbl1q_u8 there.

performance varies based on the input, 1-3% on encode was observed

(cherry picked from commit 416e1cea9b)

Change-Id: Ifec35b37eb856acfcf69ed7f16fa078cd40b7034
2015-03-03 17:53:45 -08:00
Urvang Joshi
5a0c2207f4 Regression fix for lossless decoding
Reported here: https://code.google.com/p/webp/issues/detail?id=239

At the beginning of method 'DecodeImageData', pixels up to
'dec->last_pixel_' are assumed to be already cached. So, at the end of
previous call to that method also, that assumption should hold true.

Hence, we should cache all pixels up to 'src' regardless of 'src_last'.

This affects lossless incremental decoding only, as that is when
src_last and src_end differ.
Note: alpha decoding is implicitly incremental, as alpha decoding of
only the rows 'y_end - y_start' happens during FinishRow() call. So, this bug
affects alpha decoding in non-incremental decoding flow as well.

This bug was introduced in: https://gerrit.chromium.org/gerrit/#/c/59716.

(cherry picked from commit 783a8cda24)

Change-Id: Ide6edfeb2609b02aff701e1bd9fd776da0a16be0
2015-03-03 17:53:44 -08:00
James Zern
6e3a31d595 wicdec: (msvs) quiet some /analyze warnings
add additional return checks and asserts to avoid:
C6102: Using 'XXX' from failed function call ...

(cherry picked from commit 9b228b5416)

Change-Id: I51f5fa630324e0cd7b2d9fceefecb4f4021474b1
2015-03-03 17:53:44 -08:00
James Zern
b49a578135 dwebp/WritePNG: mark png variables volatile
these are used on both sides of the setjmp().

(cherry picked from commit 7a191398ca)

Change-Id: I4a789bfb3a5d56946a22286c5a140008d90e1ba2
2015-03-03 17:53:43 -08:00
James Zern
0a4391a196 dwebp: include setjmp.h w/WEBP_HAVE_PNG
setjmp() is used in WritePNG().

(cherry picked from commit 775dfad297)

Change-Id: Iadd836272fc7d368d635c891507ce2a08c4d3dec
2015-03-03 17:53:43 -08:00
James Zern
90f1ec58a9 dwebp: correct sign in format strings
width / height are unsigned; fixes a warning with msvs /analyze:
C6340: Mismatch on sign: 'const unsigned int' passed as _Param_(4) when
some signed type is required in call to 'fprintf'.

(cherry picked from commit 47d26be760)

Change-Id: I5f1fad4c93745baf17d70178a5e66579ccd2b155
2015-03-03 17:53:43 -08:00
James Zern
b61ce861f3 VP8LEncodeStream: add an assert
check enc->argb_ to quiet an msvs /analyze warning:
C6387: 'enc->argb_+y*width' could be '0':  this does not adhere to the
specification for the function 'memcpy'.

(cherry picked from commit f0e0677b87)

Change-Id: I87544e92ee0d3ea38942a475c30c6d552f9877b7
2015-03-03 17:53:42 -08:00
James Zern
df1081bb82 dsp/cpu: (msvs) add include for __cpuidex
and only use it on x86 / x64 where it's available.
has the side-effect of quieting a msvs /analyze warning:
C6001: Using uninitialized memory 'cpu_info'.

(cherry picked from commit 0de5f33e31)

Change-Id: Iae51be3b22b2ee949cfc473eeea9fd9fb6b3c2cb
2015-03-03 17:53:42 -08:00
James Zern
39aa055529 dsp/cpu: (msvs) avoid immintrin.h on _M_ARM
_xgetgv() isn't relevant there anyway

broken since:
279e661 Merge "dsp/cpu: add include for _xgetbv() w/MSVS"

(cherry picked from commit 4fbe9cf202)

Change-Id: Iaa7bc0c5be9c06bfffab39e194c64c09bf5b5a27
2015-03-03 17:53:42 -08:00
James Zern
f814f429ca dsp/cpu: add include for _xgetbv() w/MSVS
explicitly add immintrin.h instead of transitively picking it up via
windows.h presumably. makes the code easier to move around.

(cherry picked from commit b6c0428e8c)

Change-Id: If70d5143ac94fc331da763ce034358858e460e06
2015-03-03 17:53:41 -08:00
James Zern
8508ab99a7 cpu: fix AVX2 detection for gcc/clang targets
ecx needs to be set to 0; the visual studio builds were already doing
this.

https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family

(cherry picked from commit d7c4b02a57)

Change-Id: I95efb115b4d50bbdb6b14fca2aa63d0a24974e55
2015-03-03 17:53:41 -08:00
Pascal Massimino
5769623b6f fix handling of zero-sized partition #0 corner case
reported in https://code.google.com/p/webp/issues/detail?id=237

An empty partition #0 should be indicative of a bitstream error.
The previous code was correct, only an assert was triggered in debug mode.
But we might as well handle the case properly right away...

(cherry picked from commit 205c7f26af)

Change-Id: I4dc31a46191fa9e65659c9a5bf5de9605e93f2f5
2015-03-03 17:53:40 -08:00
James Zern
b2e71a9080 make the 'last_cpuinfo_used' variable names unique
allows the sources to be #include'd in some hackish builds (don't do
that!)

(cherry picked from commit 67f601cd46)

Conflicts:
	src/dsp/alpha_processing.c
	src/dsp/argb.c
	src/dsp/dec.c
	src/dsp/enc.c
	src/dsp/lossless.c
	src/dsp/upsampling.c
	src/dsp/yuv.c

Change-Id: I0c7a43acbebd0e2d5068845e6daa8ce47361cd91
2015-03-02 18:43:41 -08:00
Pascal Massimino
1273e84517 add -Wformat-nonliteral and -Wformat-security
can be useful, not sure they are a subset of the flags we
use already...

(cherry picked from commit 80d950d94e)

Change-Id: Iec742a99427a791d9527368302a1136df2ff96cd
2015-03-02 18:43:35 -08:00
Pascal Massimino
3ae78eb757 multi-thread fix: lock each entry points with a static var
we compare the current VP8GetCPUInfo pointer to the last used.
This is less code overall and each implementation is still
testable separately (by just changing VP8GetCPUInfo, but not
a separate threads!)

(cherry picked from commit a437694a17)

Conflicts:
	src/dsp/alpha_processing.c
	src/dsp/argb.c
	src/dsp/dec.c
	src/dsp/enc.c
	src/dsp/lossless.c
	src/dsp/upsampling.c
	src/dsp/yuv.c

Change-Id: Ia13fa8ffc4561a884508f6ab71ed0d1b9f1ce59b
2015-03-02 18:43:31 -08:00
James Zern
5c1eeda922 webp-container-spec: remove references to fragments
this portion of the format was never finalized

(cherry picked from commit a66e66c79d)

Change-Id: I80aa1b27457a0e52b047c7284df2f58b181ca5d8
2015-03-02 18:43:26 -08:00
James Zern
c5ceea4899 enc_neon: fix building with non-Xcode clang (iOS)
check for __apple_build_version__ to distinguish the two; a version
check could work as Apple bumped Xcode's to 5.x/6.x, but it's unclear
how upstream will deal with their versioning as they go 3.6+, so avoid
it for now.

(cherry picked from commit a3946b8956)

Change-Id: I67cda67c4f68e262a92d805a63cc1496374be063
2015-03-02 18:43:20 -08:00
James Zern
d0859d69de iosbuild: add x64_64 simulator support
based on the patch here:
https://github.com/pixelkind/webp-ios-build

(cherry picked from commit a96ccf8fde)

Change-Id: Iaa346b751e5f18e8cf13a8e5c4064b0c2a3f5f6c
2015-03-02 18:43:14 -08:00
Urvang Joshi
046732ca65 WebPEncode: Support encoding same pic twice (even if modified)
This wasn't working for this specific scenario:
- Encode an RGBA 'pic' (with trivial alpha) using lossy encoding.
(so that pic->a == NULL after import happens).
- Modify the 'pic->argb' so that it has non-trivial alpha.
- Encode the same 'pic' again.
This used to fail to encode alpha data as pic->a == NULL.

(cherry picked from commit e4f4dddba3)

Change-Id: Ieaaa7bd09825c42f54fbd99e6781d98f0b19cc0c
2015-03-02 18:43:08 -08:00
James Zern
4426f50179 webp/types.h: use inline for clang++/-std=c++11
at least clang 3.[45] in c++ mode with -std=c++11 define __STRICT_ANSI__
this change set WEBP_INLINE to inline for c++/non-strict-ansi/> c99

fixes crbug.com/428383

(cherry picked from commit 6638710b9e)

Change-Id: Ief2b934353c336a75865c73c90cc3dc5e4f83913
2015-03-02 18:43:02 -08:00
Urvang Joshi
e297fc7171 gif2webp: Use the default hint instead of WEBP_HINT_GRAPH.
This is much faster and the compression is slightly better too.

(cherry picked from commit c94ed49efd)

Change-Id: Ibf0d10eea83bfabfcc44ee497074767462ff41b1
2015-03-02 18:42:54 -08:00
James Zern
855fe4354b Makefile.vc: add a 'legacy' RTLIBCFG option
disables buffer security checks (/GS-) and any machine optimizations
(e.g., sse2)

fixes issue #228

(cherry picked from commit 34c20c06c8)

Change-Id: I81fa483dc1654199b2017626320383d2d63317dc
2015-03-02 18:42:48 -08:00
Urvang Joshi
b7eb6d55c7 gif2webp: Support GIF_DISPOSE_RESTORE_PREVIOUS
Tweaked the gif2webp_util API to support this.

Requested in: https://code.google.com/p/webp/issues/detail?id=144

(cherry picked from commit 65e5eb8a62)

Change-Id: I0e8c4edc39227355cd8d3acc55795186e25d0c3a
2015-03-02 18:42:40 -08:00
Urvang Joshi
5691bdd9da gif2webp: Handle frames with odd offsets + disposal to background.
Snapping odd offsets in GIF to even offsets in WebP was causing extra row/column
being disposed in such cases.

Code is rewritten to maintain previous and current canvas (it used to maintain
previous canvas and current frame earlier). And we recompute change rectangles
as those from GIF may no longer apply.
Also, this renders methods like ReduceTransparency() and ConvertToKeyFrame()
redundant, as internally maintained current canvas is always independent of
previous canvases.

Disposal method choice: we pick the disposal method that results in the smallest
change rectangle.

(cherry picked from commit e4c829efe9)

Conflicts:
	examples/gif2webp_util.c

Change-Id: Ic31186d98fe1a2a790a89d1571b17e3abd127e79
2015-03-02 18:42:36 -08:00
James Zern
8301da1380 stopwatch.h: fix includes
WEBP_INLINE -> webp/types.h
memcpy -> string.h

(cherry picked from commit 54edbf65ff)

Change-Id: Iab2ea8b553dc98be75eede751de62ab0292d1f97
2015-03-02 18:42:28 -08:00
James Zern
6a2209aa36 update ChangeLog
Change-Id: I32a22786f99d0c239761a362485a4a91c783851b
2014-10-17 16:15:30 +02:00
James Zern
36cad6abe8 bit_reader.h: cosmetics: fix a typo
Change-Id: I1ba09124700b3120f18eb3705eb5ba805feb2ca0
(cherry picked from commit 79b5bdbfde)
2014-10-17 16:02:03 +02:00
James Zern
e2ecae62f0 enc_mips32: workaround gcc-4.9 bug
avoids an ICE with NDK r10b + NDK_TOOLCHAIN_VERSION=4.9

In function 'SSE16x16':
enc_mips32.c (684) internal compiler error: Segmentation fault

Change-Id: I1a3d33c0a9534c97633ab93bcdf9bf59d3a7e473
(cherry picked from commit 0ce27e715e)
2014-10-15 19:58:03 +02:00
James Zern
243e68df60 update ChangeLog
Change-Id: Ib9504997d00c9f72cff30aa699fe1998b6be940d
2014-10-14 20:43:43 +02:00
James Zern
eec5f5f121 enc/vp8enci.h: update version number
0.4.1 -> 0.4.2
missed in: 857578a bump version to 0.4.2

Change-Id: Iaa62ce5af5935243748e51105008f252443e7d27
2014-10-14 20:40:02 +02:00
James Zern
0c1b98d28c update NEWS
Change-Id: Ib9b48e3611b77214bc875524249366ff62451c1b
2014-10-14 12:02:17 +02:00
James Zern
69b0fc9492 update AUTHORS
Change-Id: I750dc65205705b94f6a3fd467e6746c482c983cb
2014-10-14 12:02:17 +02:00
James Zern
857578a811 bump version to 0.4.2
libwebp{,decoder} - 0.4.2
libwebp libtool - 5.2.0
libwebpdecoder libtool - 1.2.0

mux/demux - 0.2.2
libtool - 1.2.0

Change-Id: If593a198f802fd68c7dbbdbe0fc2612dbc44e2df
2014-10-14 12:02:17 +02:00
James Zern
9129deb5b1 restore encode API compatibility
protect WebPPictureSmartARGBToYUVA with an ABI check

Change-Id: Iaec4a9f8f590f27c4c72129b90068690efc84eb7
2014-10-14 12:02:17 +02:00
James Zern
f17b95e992 AssignSegments: quiet -Warray-bounds warning
the number of segments are previously validated, but an explicit check
is needed to avoid a warning under gcc-4.9

(cherry picked from commit c8a87bb62d)

Change-Id: Ifa7c0dd7f3f075b3860fa8ec176d2c98ff54fcea
2014-10-14 12:02:16 +02:00
James Zern
9c56c8a12e enc_neon: initialize vectors w/vdup_n_u32
replaces {} initialization gnu-ism

(cherry picked from commit 7534d71640)

Change-Id: I5a7b2d4246f0205e4bfb7f4b77d720c47d8674ec
2014-10-14 12:02:16 +02:00
James Zern
a008902320 iosbuild: cleanup
- s/declare -r/readonly/
- remove unnecessary build variables
- echo configure line

(cherry picked from commit 041956f6a3)

Change-Id: I4489d8413934dcec57d60abf136a72d08fa0fd99
2014-10-14 12:02:16 +02:00
James Zern
cc6de53b3b iosbuild: output autoconf req. on failure
(cherry picked from commit 767eb40238)

Change-Id: I2a6d80cf22f5b58e80345a411c48f047fecdbb47
2014-10-14 12:02:16 +02:00
James Zern
740d765aea iosbuild: make iOS 6 the minimum requirement
iOS 5 support isn't available in the Xcode 6 install; iOS 6 covers
phones starting at the 3GS, so should be a reasonable base line

(cherry picked from commit 9f7d9e6dda)

Change-Id: Ie5603c9e30cb52114b372509e183febbf679a69a
2014-10-14 12:02:16 +02:00
James Zern
403023f500 iobuild.sh: only install .h files in Headers
cp * -> cp *.h; avoids picking up autoconf files

(cherry picked from commit 38128cb9df)

Change-Id: I57c04562d554431ddf4605af65077f32d90ac58e
2014-10-14 12:02:16 +02:00
skal
b65727b55d Premultiply with alpha during U/V downsampling
This prevents the 'alpha-leak' reported in issue #220

Speed-diff is kept minimal.

(cherry picked from commit c792d4129a)

Change-Id: I1976de5e6de7cfcec89a54df9233c1a6586a5846
2014-10-14 12:02:16 +02:00
Urvang Joshi
8de0debcec gif2webp: Background color correction
For some GIF images, the first frame is missing the corresponding
graphic control extension. For such cases, we were never calling
GetBackgroundColor(), and default background color value (white) was being used
incorrectly.
So, we call GetBackgroundColor() when we encounter the first image
descriptor instead, to make sure that it is always called.

(cherry picked from commit 0cc811d7d6)

Change-Id: I00fc8e943d8a0c1578dcd718f3e74dec7de4ed61
2014-10-14 12:02:16 +02:00
Pascal Massimino
f8b7d94daa Amend the lossless spec according to issue #205, #206 and #224
http://code.google.com/p/webp/issues/detail?id=205 <- Select()
http://code.google.com/p/webp/issues/detail?id=206 <- out-of-bound colormap index
http://code.google.com/p/webp/issues/detail?id=224 <- version number MUST be 0

(cherry picked from commit d7167ff7ce)

Change-Id: I56a575529862dfc8ad189ddcfc47ef59a58f273d
2014-10-14 12:02:16 +02:00
Pascal Massimino
9102a7b63d Add a WebPExtractAlpha function to dsp
This is the opposite of WebPDispatchAlpha

+ Implement the SSE2 version

(cherry picked from commit cddd334050)

Conflicts:
	src/dsp/alpha_processing_sse2.c

Change-Id: I0c297309255f508c5261da8aad01f7e57f924d6c
2014-10-14 12:02:16 +02:00
James Zern
e407b5d516 webpmux: simplify InitializeConfig()
put WebPMuxConfig on the stack in main() rather than allocating it in
InitializeConfig(); removes a level of indirection there.

(cherry picked from commit c0a462cac2)

Change-Id: I81d386f7472ebbd322dd3fdbfda9d78dbeb62a66
2014-10-14 12:02:16 +02:00
James Zern
3e70e64153 webpmux: fix indent
+ remove unnecessary cast

(cherry picked from commit 6986bb5e12)

Change-Id: I2070fbe6aeda49f5790c69390e5b539a2c1a5616
2014-10-14 12:02:16 +02:00
James Zern
be38f1af47 webpmux: fix exit status on numeric value parse error
in most cases 'ok' is set via a goto macro

(cherry picked from commit f89e1690df)

Change-Id: I17c832446bf3e716d3bcd323dbcc72bec544029c
2014-10-14 12:02:16 +02:00
James Zern
94dadcb1a1 webpmux: fix loop_count range check
explicitly check [0, 65535], the use of 'long' was removed in a prior
commit

(cherry picked from commit 0e23c487da)

Change-Id: I70d5bf286908459b5d4d619c657853f0e833b6ea
2014-10-14 12:02:16 +02:00
James Zern
40b3a618ec examples: warn on invalid numeric parameters
add ExUtilGet[U]Int / ExUtilGetFloat which print an error message on
parse failure.
fixes issue #219.

(cherry picked from commit 96d43a873a)

Conflicts:
	examples/cwebp.c
	examples/dwebp.c

Change-Id: Ie537f5aebd138925bf1a48289b6b5e261b3af2ca
2014-10-14 12:02:16 +02:00
Urvang Joshi
b7d209a448 gif2webp: Handle frames with missing graphic control extension
According to the GIF spec (http://www.w3.org/Graphics/GIF/spec-gif89a.txt),
this block is optional, and its scope is only the first graphic rendering block
that follows.

The spec doesn't mention what default values of frame dimensions, offsets,
duration and transparent index to use in such a case, though. So, we use the
defaults used by GIF reader in Chromium:
https://code.google.com/p/chromium/codesearch#chromium/src/third_party/WebKit/Source/platform/image-decoders/gif/GIFImageReader.h&l=186

(cherry picked from commit d51f3e4069)

Change-Id: Iecc6967847192483770e85ac15fe2835cd01ce7b
2014-10-14 12:02:16 +02:00
James Zern
bf0eb74829 configure: simplify libpng-config invocation
use --ldflags over --prefix + --libs combination

based on comment in issue #180.

(cherry picked from commit be70b86c57)

Change-Id: If2ca06053d5237b6722ddf4117917e5f3c06ab59
2014-10-14 12:02:15 +02:00
Vikas Arora
3740f7d4c6 Rectify bug in lossless incremental decoding.
Handle the corner case when VP8LDecodeImage() method is called with an invalid
header data. The lossless decoding doesn't support incremental mode yet.
Return the error status as BITSTREAM error in case not all pixels are decoded
with the provided bit-stream. Also added asserts in the VP8LDecodeImage() method
to validate the decoder header with appropriate/valid data for huffman trees
(htree_groups_ etc).

(cherry picked from commit e0a9932161)

Change-Id: Ibac9fcfc4bd0a2c5f624bb9d4a2b9f6459aa19ea
2014-10-14 12:02:06 +02:00
Pascal Massimino
3ab0a377d2 make VP8LSetBitPos() set br->eos_ flag
ReadSymbol() finishes with a VP8LSetBitPos() call only and could miss an eos_ during the decode loop.

Things are faster because of inlining too.

(cherry picked from commit d3242aee16)

Change-Id: I2d2a275f38834ba005bc767d45c5de72d032103e
2014-10-14 10:03:15 +02:00
Pascal Massimino
2e4312b14f Lossless decoding: fix eos_ flag condition
eos_ needs to be set only when superfluous bits have actually
been requested.
Earlier, we were assuming pre-mature end-of-stream to be an error.
Now, more precisely, we mark error when we have encountered end-of-stream *and*
we attempt to read more bits after that.

This handles cases where image data requires no bits to be read

(cherry picked from commit a9decb5584)

Change-Id: I628e2c39c64f10c443fb51f86b1f5919cc9fd299
2014-10-14 10:03:14 +02:00
Pascal Massimino
e6609ac6b9 fix erroneous dec->status_ setting
We only need to set BITSTREAM_ERROR if !ok.

(cherry picked from commit 3fea6a28da)

Conflicts:
    src/dec/vp8l.c

Change-Id: I5bd13e64797e8bc509477edb29158abb39cb0ee1
2014-10-14 10:02:30 +02:00
skal
5692eae1f3 add a fallback to ALPHA_NO_COMPRESSION
if ALPHA_LOSSLESS_COMPRESSION produces a too big file (very rare!),
we fall-back to no-compression automatically.

(cherry picked from commit 187d379db6)

Change-Id: I5f3f509c635ce43a5e7c23f5d0f0c8329a5f24b7
2014-10-14 09:57:41 +02:00
James Zern
6ecd5bf682 ExUtilReadFromStdin: (windows) open stdin in bin mode
fixes input/decode from stdin in the examples

(cherry picked from commit a6140194ff)

Change-Id: Ie8052da758a9ef64477501b709408236d258da82
2014-10-14 09:57:41 +02:00
James Zern
4206ac6bbf webpmux: (windows) open stdout in binary mode
prevents corrupt output. related to issue #217

(cherry picked from commit e80eab1fbc)

Change-Id: I6f0dac8131127717ba72b0709fb35d421ab41acb
2014-10-14 09:57:41 +02:00
James Zern
d40e885931 cwebp: (windows) open stdout in binary mode
prevents corrupt output. fixes issue #217

(cherry picked from commit e9bfb1166d)

Change-Id: If90afb441636144300da66d64f0e7f78505b4060
2014-10-14 09:57:41 +02:00
James Zern
4aaf463449 example_util: add ExUtilSetBinaryMode
use it in dwebp when dealing with 'stdout'

(cherry picked from commit 5927e15bc7)

Change-Id: I8b8a0b0de9e73731e913ac3c83b5e2b33c693175
2014-10-14 09:57:41 +02:00
Urvang Joshi
4c82ff76c1 webpmux man page: Clarify some title, descriptions and examples
Based on the feedback here:
https://groups.google.com/a/webmproject.org/d/msg/webp-discuss/qY6rWJLqRTY/pF8oSj8DOGYJ

(cherry picked from commit 30f3b75b33)

Change-Id: I9119ea8e211ffb269026010e3c590385bc6a9f17
2014-10-14 09:57:41 +02:00
James Zern
23d4fb3362 dsp/lossless: workaround gcc-4.9 bug on arm
force Sub3() to not be inlined, otherwise the code in Select() will be
incorrect.
https://android-review.googlesource.com/#/c/102511

(cherry picked from commit 637b388809)

Change-Id: I90ae58bf3e6cc92ca9897f69974733d562e29aaf
2014-10-13 19:03:49 +02:00
James Zern
5af7719047 dsp.h: collect gcc/clang version test macros
endian_inl.h already relies on dsp.h, grab the definitions from there.

(cherry picked from commit 8323a9038d)

Change-Id: I445f7d0631723043c55da1070498f89965bec7b1
2014-10-13 19:03:49 +02:00
James Zern
90d112466b enc_neon: enable QuantizeBlock for aarch64
vtbl4_u8 is available everywhere except iOS arm64: use vtbl2q_u8 there
with a corresponding change in the load.

(cherry picked from commit 953acd56a4)

Change-Id: Ib84212dda3c7875348282726c29e3b79b78b0eac
2014-10-13 19:03:48 +02:00
skal
ee78e7801d SmartRGBYUV: fix odd-width problem with pixel replication
rightmost pixel was missing a copy, which could lead to invalid read.

Also added a lower dimension of 4, below which we use the regular conversion.
This is to prevent corner cases, in addition to not being overkill.

(cherry picked from commit 2523aa73cb)

Change-Id: Iac12e7a3d74590f12fe8eeb1830b9891e61439f6
2014-10-13 19:03:48 +02:00
Pascal Massimino
c9ac2041e9 fix some MSVC64 warning about float conversion
(cherry picked from commit ee52dc4e54)

Change-Id: I27ab27fc15033d27d0505729f6275fb542c8d473
2014-10-13 19:03:48 +02:00
James Zern
f4497a1ef5 cpu: check for _MSC_VER before using msvc inline asm
_M_IX86 will be defined in mingw builds after including windows.h. as
the gcc inline asm is first, this missing check would only have caused
an error if the code was reorganized.

(cherry picked from commit 3fca851a20)

Change-Id: I395679bcfc43e94d308d1ceb0c0fbf932b2c378c
2014-10-13 19:03:48 +02:00
skal
e2159fdff7 faster RGB->YUV conversion function (~7% speedup)
with a special case for dithering==0., it gets somewhat faster on x86
thanks to inlining.

Also, less macros.

(cherry picked from commit e2a83d7109)

Change-Id: Ic2f2bf6718310743bb40cef2104fa759a073e6d5
2014-10-13 19:03:48 +02:00
skal
21abaa05e3 Add smart RGB->YUV conversion option -pre 4
New function: WebPPictureSmartARGBToYUVA()
This implement smart RGB->YUV conversion.

This is rather undocumented for now, and is triggered using '-pre 4'
preprocessing option.

This is slow-ish and use quite some memory, but should be improvable.
This is somehow a usable beta version.

(cherry picked from commit 3fc4c539aa)

Change-Id: Ia50a8c30134e4cab8a7d3eb70aef13ce1f6187a1
2014-10-13 19:03:48 +02:00
James Zern
1a161e20a4 configure: add work around for gcc-4.9 aarch64 bug
add -frename-registers to avoid:
src/dsp/dec_neon.c🔢1: internal
compiler error: in simplify_const_unary_operation, at
simplify-rtx.c:1539

https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62040
https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61622
https://bugs.debian.org/cgi-bin/bugreport.cgi?bug=757738

(cherry picked from commit e2b8cec05b)

Change-Id: I52fb3a29ac30b82b27da05378bdb495ddebb97d7
2014-10-13 19:03:48 +02:00
Djordje Pesut
55b10de73f MIPS: mips32r2: added optimization for BSwap32
gcc < 4.8.3 doesn't translate bswap optimally.
use optimized version always

(cherry picked from commit 98c54107df)

Change-Id: I979ea26ad6dc0166d3d2f39c4148eb8adfb7ddec
2014-10-13 19:03:48 +02:00
Lou Quillio
76d2192206 Update PATENTS to reflect s/VP8/WebM/g
Sync with http://www.webmproject.org/license/additional/

modified:   PATENTS

(cherry picked from commit dab702b357)

Change-Id: I9f7af36fdcf57a82311363a043707b181832fc18
2014-10-13 18:19:08 +02:00
Djordje Pesut
29a9db1f7c MIPS: detect mips32r6 and disable mips32r1 code
(cherry picked from commit b7e5a5c451)

Change-Id: Id1325c789a990c9a8704e84e99a22d580303eb8a
2014-10-13 18:19:04 +02:00
Timothy Gu
245c4a6737 Correctly use the AC_CANONICAL_* macros
http://www.gnu.org/software/autoconf/manual/autoconf.html#Using-System-Type

Signed-off-by: Timothy Gu <timothygu99@gmail.com>
(cherry picked from commit 63c2fc02ce)

Change-Id: I40a13e84f5266ed20bc4db098502b1610ab71206
2014-10-13 18:18:58 +02:00
James Zern
40aa8b69b0 cosmetics
fix some indent/whitespace, remove a few duplicate includes, extra
semi-colons

(cherry picked from commit e300c9d819)

Change-Id: If937182b40a21e0f2028496e7b4b06c6e8a41352
2014-10-13 18:18:50 +02:00
James Zern
2ddcca5efe cosmetics: remove some extraneous 'extern's
(cherry picked from commit f7b4c48bba)

Change-Id: Ib3f0cff37120c51633387dd1c46592c53ab0ba6d
2014-10-13 18:18:44 +02:00
James Zern
f40dd7c6de vp8enci.h: cosmetics: fix '*' placement
associate with the type

(cherry picked from commit b47fb00ac0)

Change-Id: Icf94f11bf79f6ccee3150e27b228755f8f3f0f37
2014-10-13 18:18:35 +02:00
James Zern
4610c9c55a bit_writer: cosmetics: rename kFlush() -> Flush()
(cherry picked from commit 4c6dde37b9)

Change-Id: I8907927974188bee85ffade1d75d2e50817aa115
2014-10-13 18:18:31 +02:00
James Zern
fc3c1750dc dsp: detect mips64 & disable mips32 code
(cherry picked from commit 0524d9e5e8)

Change-Id: Icf68dafd5cf0614ca25b36a0252caa1784ac8059
2014-10-13 18:18:25 +02:00
James Zern
c1a7955d59 cwebp.1: restore quality description
based on:
d3485d9 cwebp.1: fix quality description placement

Change-Id: I8bd16db7f39cc9ff816cc02c04a455493e550c26
2014-10-13 18:18:24 +02:00
James Zern
57a7e73d27 correct alpha_dithering_strength ABI check
the ABI wasn't bumped with this addition, but it's more correct to say
it was added with 0x0205 rather than 0x0204

Change-Id: I2ba12a33b612fac16bdfeb8272e76b0ea84f3938
2014-10-13 18:18:22 +02:00
James Zern
6c83157524 correct WebPMemoryWriterClear ABI check
this function was introduced in 0x0204; fix checks related to this to be
> 0x0203 instead of 0x0202, pointed out on ffmpeg-devel.

Change-Id: I52cd2b98304baf1eb9a83094e2374f2120a1546b
2014-10-13 18:18:19 +02:00
James Zern
8af2771813 update ChangeLog
Change-Id: I3b930aa6cb72d17f41e52d645de1d9b2f3a0238b
2014-07-28 17:22:32 -07:00
James Zern
f59c0b4bde iosbuild.sh: specify optimization flags
explicitly set '-O3 -DNDEBUG'. setting CFLAGS on the command line
overrides the default, resulting in -O0.

Change-Id: I213979f646b1444b1d8e0eb0bb58e9b2c3cc4dd3
2014-07-26 20:40:51 -07:00
James Zern
8d34ea3e36 update ChangeLog
Change-Id: I5346984d2adff27b64304c154d720456549a9f24
2014-07-24 11:54:02 -07:00
James Zern
dbc3da66d3 makefile.unix: add vwebp.1 to the dist target
Change-Id: Icf8b3853a9b175688c3b92d6f498ed44c58ca462
2014-07-23 23:41:47 -07:00
James Zern
89a7c83cd4 update ChangeLog
Change-Id: Ie9c2c7fe53321aefa17905c4322ad3373869ebad
2014-07-23 23:05:49 -07:00
James Zern
ffe67ee92e Merge "update NEWS for the next release" into 0.4.1 2014-07-23 23:02:19 -07:00
James Zern
2def1fe635 gif2webp: dust up the help message
* try to avoid trailing '.'
* rationalize capitalization

missed in:
0a8b886 dust up the help message

Change-Id: I6f80736cc8a2ff4f185f63d463a57d5bbf88a0db
2014-07-23 20:03:49 -07:00
James Zern
fb668d78b3 remove -noalphadither option from README/vwebp.1
+ vwebp's -help output

this is a future option; missed in:
793368e restore decode API compatibility

Change-Id: If920df2cf8de57ebad93a6b98830562149396d8d
2014-07-23 19:54:34 -07:00
James Zern
e49f693b1f update NEWS for the next release
Change-Id: If708c6b442816f43522b7e5b292f3cba266d614a
2014-07-23 19:25:36 -07:00
James Zern
cd01358057 Merge "update AUTHORS" into 0.4.1 2014-07-23 17:33:15 -07:00
James Zern
268d01eb24 update AUTHORS
Change-Id: I1eae9342df7bf4e8e98d5328b2e3eab7cba9fee8
2014-07-23 17:21:04 -07:00
James Zern
85213b9bbe bump version to 0.4.1
libwebp{,decoder} - 0.4.1
libwebp libtool - 5.1.0
libwebpdecoder libtool - 1.1.0

mux/demux - 0.2.1
libtool - 1.1.0

Change-Id: If593a198f802fd68c7dbbdbe0fc2612dbc44e2df
2014-07-23 17:17:25 -07:00
James Zern
695f80ae25 Merge "restore mux API compatibility" into 0.4.1 2014-07-23 17:11:33 -07:00
James Zern
862d296cf9 restore mux API compatibility
protect WebPMuxSetCanvasSize w/a WEBP_MUX_ABI_VERSION check

Change-Id: I6b01af55ebb4cc4c860d3cbf43be722077896748
2014-07-23 16:13:56 -07:00
skal
8f6f8c5dde remove the !WEBP_REFERENCE_IMPLEMENTATION tweak in Put8x8uv
There's no speed diff, so better remove it altogether

Reported in https://code.google.com/p/webp/issues/detail?id=215

Change-Id: I991330de18bec340029d6df5fed0dfb4337e4662
2014-07-23 14:15:40 -07:00
James Zern
d713a69644 Merge changes If4debc15,I437a5d5f into 0.4.1
* changes:
  restore encode API compatibility
  restore decode API compatibility
2014-07-23 13:56:27 -07:00
James Zern
c2fc52e4ec restore encode API compatibility
protect WebPConfigLosslessPreset/WebPMemoryWriterClear w/a
WEBP_ENCODER_ABI_VERSION check

Change-Id: If4debc15fee172a3f18079bc2bd29eb8447bc14b
2014-07-22 22:19:55 -07:00
James Zern
793368e8c6 restore decode API compatibility
protect flip/alpha_dither w/a WEBP_DECODER_ABI_VERSION check

Change-Id: I437a5d5f78800f71b7e7e323faa321f946bf9515
2014-07-22 20:03:52 -07:00
James Zern
b8984f3151 gif2webp: fix compile with giflib 5.1.0
DGifCloseFile() added an error code output parameter
http://giflib.sourceforge.net/gif_lib.html#compatibility

fixes issue #209

original patch by grizzly dot nyo at gmail

Change-Id: I5554de2bd70dbfd95fd356424ad5fb800ac94592
2014-07-22 17:50:30 -07:00
James Zern
222f9b1a9d gif2webp: simplify giflib version checking
introduce LOCAL_GIF_PREREQ/VERSION similar to the GCC variants in dsp.

Change-Id: I00ba5d523047b3b1c14ade992172e75e69043eb3
2014-07-22 17:41:55 -07:00
Vikas Arora
d2cc61b7dd Extend MakeARGB32() to accept Alpha channel.
Change-Id: I31b8e2d085000e2e3687a373401e4f655f11fc42
2014-07-21 14:49:38 -07:00
skal
4595b62b7c Merge "use explicit size of kErrorMessages[] arrays" 2014-07-21 13:37:49 -07:00
skal
157de01507 Merge "Actuate memory stats for PRINT_MEMORY_INFO" 2014-07-21 13:31:15 -07:00
skal
fbda2f499c JPEG decoder: delay conversion to YUV to WebPEncode() call
We store the raw RGB samples decoded from JPEG, and avoid precision loss.

Note that this may increase the encoding time reported by
cwebp -v, since RGB->YUV now occur during WebPEncode call
(in case of lossy), instead of ReadJPEG().
This also increases the memory use, since we're carying the
source ARGB samples around.

Change-Id: Ic2180206cfc9f5574f391e91c3b89b9d81695d01
2014-07-21 22:26:22 +02:00
skal
0b747b1b39 use explicit size of kErrorMessages[] arrays
Change-Id: If02864e3a07ae37814bf379bf347862cd2871bf4
2014-07-21 13:21:56 -07:00
skal
3398d81ac3 Actuate memory stats for PRINT_MEMORY_INFO
Change-Id: If7eac591b5205990ca452ca02b084a908482850a
2014-07-21 13:16:18 -07:00
James Zern
6f3202be98 Merge "move WebPPictureInit to picture.c" 2014-07-21 11:47:17 -07:00
skal
6c347bbb0c move WebPPictureInit to picture.c
Change-Id: I4b8c352cfd47256d0c3827334a6942c1caf742f6
2014-07-21 14:16:19 +02:00
Pascal Massimino
fb3acf19d7 fix configure message for multi-thread
it's not just decoding, but general multi-threading

Change-Id: I9d1b7e14f05b93a0eb92955526b7019c2b0df20c
2014-07-21 00:21:50 -07:00
James Zern
40b086f78a configure: check for _beginthreadex
fixes thread support detection in e.g., cross-compiles/mingw installs
without libpthread, the behavior is unchanged in mingw installs with
libpthread. in the latter case although libpthread is detected the
windows api (_beginthreadex) path is being used.

Change-Id: Ie5f39f15f380731a1006a2497496d627f08fa103
2014-07-20 14:56:37 -07:00
Pascal Massimino
1549d62067 reorder the YUVA->ARGB and ARGB->YUVA functions correctly
+ rework few loops
+ consolidate few error-checks / error-reporting
+ don't modify picture->colorspace in Import() for ARGB output

Change-Id: Iae6da9b50acc738c59b85c3ee64efbaf6af8bffc
2014-07-18 07:15:54 -07:00
James Zern
c6461bfda2 Merge "extract colorspace code from picture.c into picture_csp.c" 2014-07-16 17:01:28 -07:00
Pascal Massimino
736f2a175e extract colorspace code from picture.c into picture_csp.c
had to refactor few functions here and there.

Change-Id: I86fde6fec7c2fc7eb48f0ecf327dbbd2bd40b9d4
2014-07-16 16:37:26 -07:00
pascal massimino
645daa033b Merge "configure: check for -Wformat-security" 2014-07-16 03:23:29 -07:00
James Zern
abafed861c configure: check for -Wformat-security
from debian hardening:
https://wiki.debian.org/Hardening#DEB_BUILD_HARDENING_FORMAT_.28gcc.2Fg.2B-.2B-_-Wformat_-Wformat-security_-Werror.3Dformat-security.29

Change-Id: I6729fb92e7286532151c2dde93b876476ed2af96
2014-07-14 22:52:15 -07:00
Pascal Massimino
fbadb48026 split monolithic picture.c into picture_{tools,psnr,rescale}.c
Change-Id: Ia5eb5496e4337e5bac8203872c5b014cad21c4f9
2014-07-12 09:13:33 -07:00
James Zern
c76f07ecc2 dec_neon/TransformAC3: initialize vector w/vcreate
replaces {} initialization gnu-ism

Change-Id: I5bedcba1a9c21883207301f07456cc6a843199a0
2014-07-11 15:56:53 -07:00
Urvang Joshi
bb4fc051bf gif2webp: Allow single-frame animations
Some single-frame GIF images have a canvas larger than the frame rectangle. For
such images, we retain the ANMF, ANIM and VP8X chunks in the output WebP file.
This ensures that the full canvas width/height and frame offsets are retained.

Change-Id: I3ebae4893f953984de4072fda0938411de787a29
2014-07-10 15:21:05 -07:00
James Zern
46fd44c104 thread: remove harmless race on status_ in End()
if a thread was still doing work when End() was called there'd be a race
on worker->status_. in these cases, however, the specific value is
meaningless as it would be >= OK and the thread would have been shut
down properly, but we'll check 'impl_' instead to avoid any potential
TSan/DRD reports.

Change-Id: Ib93cbc226a099f07761f7bad765549dffb8054b1
2014-07-08 20:32:29 -07:00
James Zern
5a1a7264fc Merge "configure: check for __builtin_bswapXX()" 2014-07-05 13:43:44 -07:00
James Zern
6781423b7d configure: check for __builtin_bswapXX()
defines HAVE_BUILTIN_BSWAP16/32/64
updated endian_inl.h to have a non-configure fallback for gcc and clang
BSwap16() now uses __builtin_bswap16 if available

Change-Id: Ia04ee07b39303c4b247df96d84f298fb8a81f389
2014-07-05 12:35:13 -07:00
James Zern
6450c48d4c configure: fix iOS builds
possibly other cross-compiles; avoids misusing -mavx2/-msse2 in these
cases by checking the usability of the associated intrinsics header
files.

iosbuild.sh has been broken since at least:
6e61a3a configure: test for -msse2

Change-Id: Ic650d9eec70f6a5de7d89997b3b425a4aa50ccd9
2014-07-04 21:48:36 -07:00
James Zern
6422e683af VP8LFillBitWindow: enable fast path for 32-bit builds
also reduce the load size from 64 to 32 bits as the top 32 bits are
being shifted away in the operation.

the change is neutral speed-wise on x86_64 as is the change in load size
on x86, but it gives a slight improvement on 32-bit arm.
x86 is improved ~13%, 32-bit arm ~3.7%
aarch64 is untested but will likely benefit as well.

Change-Id: Ibcb02a70f46f2651105d7ab571afe352673bef48
2014-07-04 14:42:47 -07:00
James Zern
4f7f52b2a1 VP8LFillBitWindow: respect WEBP_FORCE_ALIGNED
Change-Id: I23eddf01590de002efc21d8c7acc545a08fc3e48
2014-07-04 13:53:29 -07:00
James Zern
e458badcc3 endian_inl.h: implement htoleXX with BSwapXX
+ s/htole(16|32)/HToLE$1/ to avoid any name conflicts

Change-Id: Ic1c84711557e50f73d83ca5aa2b3992ac6738216
2014-07-04 12:16:36 -07:00
James Zern
f2664d1aab endian_inl.h: add BSwap16
+ use it in VP8LoadNewBytes()

Change-Id: I701d3652dc0cbd553852978702ef68c2657bca1c
2014-07-04 12:16:28 -07:00
James Zern
6fbf5345e3 Merge "configure: add --enable-aligned" 2014-07-04 00:12:28 -07:00
James Zern
dc0f479d6a configure: add --enable-aligned
forces aligned memory reads (via memcpy) in the VP8 bit reader, useful
for platforms that don't support unaligned loads.

Change-Id: Ifa44a9a1677fbdc6a929520f9340b7e3fcbd6692
2014-07-03 23:30:09 -07:00
James Zern
9cc69e2b53 Merge "configure: support WIC + OpenGL under mingw64" 2014-07-03 23:24:16 -07:00
Pascal Massimino
257adfb0be remove experimental YUV444 YUV422 and YUV400 code
(never used)

Change-Id: I12a886703592133939607df05132e9b498f916c1
2014-07-03 22:26:25 -07:00
James Zern
10f4257c3b configure: support WIC + OpenGL under mingw64
fixes issue #211

original patch by grizzly dot nyo at gmail

Change-Id: Ie7bf63d24c112e7e351979da4bebcb690a831c5d
2014-07-03 18:57:07 -07:00
James Zern
380cca4f2c configure.ac: add AC_C_BIGENDIAN
this defines WORDS_BIGENDIAN, replacing uses of
__BIG_ENDIAN__/__BYTE_ORDER__ with it

+ fixes lossless BGRA output with big-endian toolchains
  that do not define __BIG_ENDIAN__ (codesourcery mips gcc)

Change-Id: Ieaccd623292d235343b5e34b7a720fc251c432d7
2014-07-03 18:15:50 -07:00
James Zern
ee70a90187 endian_inl.h: add BSwap64
Change-Id: I66672b770500294b8f4ee8fa4bf1dfff1119dbe6
2014-07-03 13:35:34 -07:00
James Zern
47779d46c8 endian_inl.h: add BSwap32
Change-Id: I96e3ae49659307024415d64587e6312888a0070f
2014-07-03 13:28:13 -07:00
James Zern
d5104b1ff6 utils: add endian_inl.h
moves the following to this header:
- htole*() definitions from bit_writer.c
- __BIG_ENDIAN__ fallback define from bit_reader_inl.h

Change-Id: I7fff59543f08a70bf8f9ddac849b72ed290471b1
2014-07-03 13:07:14 -07:00
Pascal Massimino
58ab622437 Merge "make alpha-detection loop in IsKeyFrame() in good x/y order" 2014-07-02 22:38:41 -07:00
Pascal Massimino
9d5629025c make alpha-detection loop in IsKeyFrame() in good x/y order
Change-Id: Ifeeb855e66c7b6b849e8584787dc24e7371b1e67
2014-07-02 21:03:35 -07:00
Vikas Arora
516971b136 lossless: Remove unaligned read warning
(typecast uint32 pointer to uint64).
The proposed change is little (0.05%) slower but avoids uint32 to uint64
pointer conversion.

Change-Id: I6b8828077ea1324fabd04bfa7e7439e324776250
2014-07-02 20:55:27 -07:00
James Zern
b8b596f6c3 Merge "configure.ac: add an autoconf version prerequisite" 2014-07-02 19:26:39 -07:00
James Zern
34b02f8cb0 configure.ac: add an autoconf version prerequisite
ax_pthread.m4 uses AS_CASE() which was added in 2.60

Change-Id: Ifafe2271a8814c734a20d24048e9b24925e13077
2014-07-01 20:02:58 -07:00
James Zern
e59f53600f neon: normalize vdup_n_* usage
with constants, prefer this over vmov_n_* or vcreate_*

Change-Id: Ia84b2a82faea58e2626211a7e2257e0ba4af358a
2014-07-01 00:55:05 -07:00
James Zern
6ee7160dd2 Merge changes I0da7b3d3,Idad2f278,I4accc305
* changes:
  neon: add INIT_VECTOR4
  neon: add INIT_VECTOR3
  neon: add INIT_VECTOR2
2014-07-01 00:48:38 -07:00
skal
abc02f2492 Merge "fix (uncompiled) typo" 2014-07-01 00:29:19 -07:00
James Zern
bc03670f01 neon: add INIT_VECTOR4
used to initialize NxMx4 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: I0da7b3d321f3d48579b7863fb2e4d3f449ae7f5e
2014-07-01 00:18:23 -07:00
James Zern
6c1c632b03 neon: add INIT_VECTOR3
used to initialize NxMx3 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: Idad2f278ab104cf2cc650517194258ce3cfb37b4
2014-06-30 23:53:23 -07:00
James Zern
dc7687e51b neon: add INIT_VECTOR2
used to initialize NxMx2 vector types
replaces initialization via '{{ }}' gnu-ism.

Change-Id: I4accc305c7dd4c886b63c22e38890b629bffb139
2014-06-30 23:52:42 -07:00
skal
4536e7c49c add WebPMuxSetCanvasSize() to the mux API
previously, the final canvas size was adjusted tightly from the
animation frames. Now, it can be specified separately (to be larger, in particular).

calling WebPMuxSetCanvasSize(mux, 0, 0) triggers the 'adjust tightly' behaviour.
This can be useful after calling WebPMuxCreate() if further image addition
is expected.

-> Fixed gif2webp accordingly.

also: made WebPMuxAssemble() more robust by systematically zero-ing WebPData.

Change-Id: Ib4f7eac372cf9dbf6e25cd686a77960e386a0b7f
2014-06-30 07:00:49 +02:00
skal
824eab1086 fix (uncompiled) typo
output was not properly descaled when compiled
without USE_GAMMA_COMPRESSION

Change-Id: I5282c9573cfcfbef4be22b6b3cdec7e16cad9c1c
2014-06-27 17:10:25 +02:00
Pascal Massimino
1f3e5f1e60 remove unused 'shift' argument and QFIX2 define
this will remove a warning about the shift amount not being
an immediate (=constant).

Change-Id: Ie9a00fefdb9a07ec8994fb113f24234518bc878a
Also: fix the NULL sharpen argument mismatch.
2014-06-26 00:44:12 -07:00
pascal massimino
8e867051fd Merge "VP8LoadNewBytes: use __builtin_bswap32 if available" 2014-06-25 02:15:17 -07:00
skal
1b6a263566 Merge "Fix handling of weird GIF with canvas dimension 0x0" 2014-06-25 02:14:54 -07:00
James Zern
1da3d46138 VP8LoadNewBytes: use __builtin_bswap32 if available
mostly to balance the use of bswap64, some gcc platforms are already
interpreting the default case the same

Change-Id: Icf860f55b3f16bea349a7d721e6d6abeeb4e5cf3
2014-06-24 23:24:36 -07:00
skal
1582e402fd Fix handling of weird GIF with canvas dimension 0x0
Similarly to Chrome, we then use the first sub-rectangle
to set the canvas size.

Also: add check for too-large GIF dimensions (>MAX_CANVAS_SIZE)

Change-Id: Idce55f1e6f6982a8f0e082aac540e16b530e023e
2014-06-24 17:40:40 +02:00
Pascal Massimino
b8811dac12 Merge "rename interface -> winterface" 2014-06-24 00:44:11 -07:00
skal
db8b8b5fc2 Fix logic in the GIF LOOP-detection parsing
We align with Blink/Chromium code by:
  - checking for ANIMEXTS1.0 signature too
  - using ByteCount >= 3 instead of requiring ByteCount==3

Change-Id: Idc484ca62878517df3dccb1fdb3bb45104a5e066
see: http://odur.let.rug.nl/kleiweg/gif/netscape.html
2014-06-23 23:26:36 +02:00
Pascal Massimino
25aaddc84c rename interface -> winterface
to avoid name clash on win32

Change-Id: Ia4ad3d4528df4652bab803f9a9544e21e6c4b177
2014-06-23 14:21:22 -07:00
skal
5584d9d2fc make WebPSetWorkerInterface() check its arguments
Change-Id: I522c58cfe05e864a50cacb58bdfa14d5369c6d60
2014-06-23 07:39:56 +02:00
James Zern
a9ef7ef991 Merge "cosmetics: update thread.h comments" 2014-06-19 23:23:06 -07:00
James Zern
c6af999168 Merge "dust up the help message" 2014-06-19 23:12:02 -07:00
skal
0a8b8863cc dust up the help message
* try to avoid trailing '.'
* rationalize capitalization

Change-Id: I50939baf01b1ab44d3031eee916ba51f2338af8a
2014-06-20 07:13:11 +02:00
James Zern
a9cf31913c cosmetics: update thread.h comments
WebPWorker*() are now part of WebPWorkerInterface; refer to them with
unadorned names.

Change-Id: Iae1dd59f1e545cba6dd8c18f26ba60eb9a84419b
2014-06-19 19:31:10 -07:00
levytamar82
27bfeee43a QuantizeBlock SSE2 Optimization:
Another store to load forward block was detected coming from the function
FTransform.
FTransform save the output data 4 times 8 bytes each. when this data is
later being loaded by the QuantizeBlock function in one chunk of 16 bytes
that caused a store to load forward block.
The fix was done in the FTransform function where each two consecutive 8 bytes
were merged into one 16 bytes register and saved into the memory.
This fix gives ~21% function level gain and 1.6% user level gain.

Change-Id: Idc27c307d5083f3ebe206d3ca19059e5bd465992
2014-06-18 16:22:00 -07:00
James Zern
2bc0dc3edc Merge "webpmux: warn when odd frame offsets are used" 2014-06-18 12:18:09 -07:00
James Zern
3114ebe46c Merge changes Id8edd3c1,Id418eb96,Ide05e3be
* changes:
  examples/Android.mk: add cwebp
  Android.mk: move dwebp to examples/Android.mk
  Android.mk: add ENABLE_SHARED flag
2014-06-18 12:17:15 -07:00
James Zern
c072663493 webpmux: warn when odd frame offsets are used
offsets are stored as (x/2, y/2)

Change-Id: Ic8f727ab7996a84c1f8c57f4f6dbaf8701bf8eae
2014-06-18 10:47:23 -07:00
skal
c5c6b408b1 Merge "add alpha dithering for lossy" 2014-06-18 07:32:24 -07:00
James Zern
d51467841e examples/Android.mk: add cwebp
Change-Id: Id8edd3c17d82e4ab0087c315cd3ead1cb285e714
2014-06-17 23:41:12 -07:00
James Zern
ca0fa7c7a5 Android.mk: move dwebp to examples/Android.mk
this depends on the top-level Android.mk for shared flags

Change-Id: Id418eb9639e839518a921ffcb6a376ce10aafbd2
2014-06-17 23:41:01 -07:00
James Zern
73d8fca01e Android.mk: add ENABLE_SHARED flag
builds libwebp.so instead of libwebp.a
$ ndk-build ENABLE_SHARED=1

Change-Id: Ide05e3be4f9848852e6d7e9d99abe11344419241
2014-06-17 23:30:20 -07:00
James Zern
6e93317f5b muxread: fix out of bounds read
ChunkVerifyAndAssign() expects to have at least 8 bytes to work with,
but was only checking for the presence of 4.

Change-Id: I8456b15d872de24a90c1e8fbfba463391ced5c7f
2014-06-17 12:53:28 -07:00
James Zern
8b0f6a4802 Makefile.vc: fix CFLAGS assignment w/HAVE_AVX2=1
there's no '+=' operator

Change-Id: Icd6fe684bd468827339f0d73643b325c8477d5c5
2014-06-16 20:23:20 -07:00
skal
bbe32df1e3 add alpha dithering for lossy
new options:
 dwebp -alpha_dither
 vwebp -noalphadither

When the source was marked as quantized, we use a threshold-averaging
filter to smooth the decoded alpha plane.
Note: this option forces the decoding of alpha data in one pass, and
might slow the decoding a bit.

The new field in WebPDecoderOptions struct is 'alpha_dithering_strength'
(0 by default, means: off). Max strength value is '100'.

Change-Id: I218e21af96360d4781587fede95f8ea4e2b7287a
2014-06-14 00:06:16 +02:00
skal
790207679d Merge "make error-code reporting consistent upon malloc failure" 2014-06-13 00:25:30 -07:00
skal
77bf4410f7 make error-code reporting consistent upon malloc failure
Sometimes, the error-code was not set correctly.
We now return OUT_OF_MEMORY everytimes it's appropriate
(tested using MALLOC_FAIL_AT mechanism)

Took the opportunity to clean-up the code and dust the error
code returned (some were erroneously set to INVALID_CONFIGURATION)

Change-Id: I56f7331e2447557b3dd038e245daace4fc82214c
2014-06-13 08:45:12 +02:00
James Zern
7a93c000ee **/Makefile.am: remove unused AM_CPPFLAGS
only 1 of <lib>_CPPFLAGS and AM_CPPFLAGS is used, with the former
getting precedence when it's defined. configure's DEFAULT_INCLUDES is
covering what's necessary given the include paths are all source
relative.

Change-Id: I7d14076acd266b28a88a3d92bcc3d7165284d5f3
2014-06-12 11:59:05 -07:00
skal
24e3080571 Add an interface abstraction to the WebP worker thread implementation
This allows custom implementations of threading mecanism.

Patch by Leonhard Gruenschloss.

Change-Id: Id8ea5917acd2f24fa8bce79748d1747de2751614
2014-06-12 11:35:44 +02:00
Pascal Massimino
d6cd6358ff Merge "fix orig_rect==NULL case" 2014-06-12 00:48:30 -07:00
Pascal Massimino
2bfd1ffaba fix orig_rect==NULL case
Change-Id: I3bb4fbebf59cba2a67681e74530fc0fe51f1958f
2014-06-12 00:26:33 -07:00
James Zern
059e21c195 Merge "configure: move config.h to src/webp/config.h" 2014-06-11 22:52:07 -07:00
skal
f05fe006c2 properly report back encoding error code in WebPFrameCacheAddFrame()
User-hook can fail but error was not propagated back.

Change-Id: Ic79f9543bf767634a127eccfef90af855ff15c34
Also: some ad-hoc clean-up and API dusting. More to come later...
2014-06-11 23:26:47 +02:00
James Zern
32b3137936 configure: move config.h to src/webp/config.h
this change has the side-effect of using directory names in the
include, silencing a lint warning.

Change-Id: Ib91cf63a90534e32fadfa5c2372bfdb29f854d02
2014-06-10 23:42:00 -07:00
James Zern
90090d99b5 Merge changes I7c675e51,I84f7d785
* changes:
  configure: test for -msse2
  rename upsampling_mips32.c to yuv_mips32.c
2014-06-10 16:15:21 -07:00
James Zern
ae7661b36e makefiles: define WEBP_HAVE_AVX2 when appropriate
more precisely: in makefile.unix and Makefile.vc when HAVE_AVX2=1

Change-Id: Ia93f373eef6595c016d650728f4f64f61764cd48
2014-06-09 16:35:36 -07:00
skal
69fce2ea78 remove the special casing for res->first in VP8SetResidualCoeffs
if res->first = 1, coeffs[0]=0 because of quant.c:749 and line
added at quant.c:744
So, no need for the extra case.
Going forward, TrellisQuantizeBlock() should also be calling
a variant of VP8SetResidualCoeffs() to set the 'last' field.

also: fixes a warning for win64
    + slight speed-up

Change-Id: Ib24b611f7396d24aeb5b56dc74d5c39160f048f0
2014-06-08 06:40:22 +02:00
James Zern
6e61a3a905 configure: test for -msse2
+ add a WEBP_HAVE_SSE2 to dsp.h

not all 32-bit toolchain configurations will have sse2 enabled by
default

Change-Id: I7c675e511581f93cf55c79f960fa7efa2df4987e
2014-06-07 19:44:08 -07:00
James Zern
b9d2efc629 rename upsampling_mips32.c to yuv_mips32.c
matches yuv_sse2 added in;
bdfeeba dsp/yuv: move sse2 functions to yuv_sse2.c

Change-Id: I84f7d7858ca6851c956e8366a7c76b45070dcbc3
2014-06-07 12:35:47 -07:00
James Zern
bdfeebaa01 dsp/yuv: move sse2 functions to yuv_sse2.c
Change-Id: I2f037ff18e7cf07e8801f49b3a89c1e36ef73000
2014-06-05 23:52:54 -07:00
pascal massimino
46b32e861a Merge "configure: set WEBP_HAVE_AVX2 when available" 2014-06-05 02:57:42 -07:00
pascal massimino
88305db4fc Merge "VP8RandomBits2: prevent signed int overflow" 2014-06-05 01:46:42 -07:00
James Zern
73fee88c4a VP8RandomBits2: prevent signed int overflow
'diff' at its largest may be INT_MAX; << 1 of anything at or above
1 << 30 will overflow.

Change-Id: Idb2b5a9b55acc2f6d5e32be8baaebee3f89919ad
2014-06-04 23:19:03 -07:00
James Zern
db4860b355 enc_sse2: prevent signed int overflow
_mm_movemask_epi8 returns a 16-bit mask; << 16 can overflow a signed
int.

Change-Id: Ia0bb0804fe548fb9b0edb3695e82727506066cda
2014-06-04 23:18:22 -07:00
skal
3fdaf4d28c Merge "real fix for longjmp warning" 2014-06-04 03:01:40 -07:00
skal
385e334019 real fix for longjmp warning
the 'volatile' qualifier was at the wrong place

Patch by Paul Pluzhnikov

Change-Id: I26e6f311a0ccd145de640b3505fe92965389c1d9
2014-06-04 11:02:42 +02:00
James Zern
230a055501 configure: set WEBP_HAVE_AVX2 when available
this is used to set WEBP_USE_AVX2 in files where the build flag won't be
used, i.e., dsp/enc.c, which enables VP8EncDspInitAVX2() to be called

Change-Id: I362f4ba39ca40d3e07a081292d5f743c649d9d7f
2014-06-03 23:29:23 -07:00
skal
a2ac8a420e restore original value_/range_ field order
no speed change, just for coherency

Change-Id: Iaa395bca24f33a14b68ba6920b838ef87d0d0db6
2014-06-03 09:36:56 +02:00
James Zern
5e2ee56fdd Merge "remove libwebpdspdecode dep on libwebpdsp_avx2" 2014-06-03 00:28:54 -07:00
James Zern
61362db57c remove libwebpdspdecode dep on libwebpdsp_avx2
it's encode only, libwebpdecoder doesn't need the symbols

Change-Id: I5633dd2017a96e60068ae5384f1ba27898d29f83
2014-06-03 00:05:56 -07:00
Pascal Massimino
42c447aeb0 Merge "lossy bit-reader clean-up:" 2014-06-02 23:53:00 -07:00
James Zern
479ffd8b5d Merge "remove unused #include's" 2014-06-02 23:07:20 -07:00
James Zern
9754d39a4e Merge "strong filtering speed-up (~2-3% x86, ~1-2% for NEON)" 2014-06-02 23:06:18 -07:00
skal
158aff9bb9 remove unused #include's
Change-Id: Icd91a4b6a0bde49145f57e3e74a997822c45792c
2014-06-03 08:01:05 +02:00
Pascal Massimino
09545eeadc lossy bit-reader clean-up:
* remove LEFT/RIGHT_JUSTIFY distinction. It's all RIGHT_JUSTIFY now.
* simplify VP8GetSigned(), and add some masking branch-less code. Much
  faster on ARM (~13% speed-up). 8% on x86-64, 5% on MacBook.
* split critical implementation into separate bit_reader_inl.h file that
  is only included where needed (vp8.c / tree.c / bit_reader.c)
* bumped BITS value from 16 to 24 for x86-32b too, since it's a bit faster.

Change-Id: If41ca1da3e5c3dadacf2379d1ba419b151e7fce8
2014-06-03 07:46:55 +02:00
skal
ea8b0a171d strong filtering speed-up (~2-3% x86, ~1-2% for NEON)
Extract loop invariant and avoid storing/loading samples
if they can be re-used. This is particularly interesting when
a transpose is involved (HFilter16i).

Change-Id: I93274620f6da220a35025ff8708ff0c9ee8c4139
2014-06-03 07:14:23 +02:00
skal
6679f8996f Optimize VP8SetResidualCoeffs.
Brings down WebP lossy encoding timings by 5%

Change-Id: Ia4a2fab0a887aaaf7841ce6d9ee16270d3e15489
2014-06-03 06:44:04 +02:00
skal
ac591cf22e fix for gcc-4.9 warnings about longjmp + local variables
Needed to add 'volatile' and some casts.
Relevant excerpt from the 'man longjmp':

===============
The values of automatic variables are unspecified after a call to longjmp() if they meet all the following criteria:
  ·  they are local to the function that made the corresponding setjmp(3) call;
  ·  their values are changed between the calls to setjmp(3) and longjmp(); and
  ·  they are not declared as volatile.
===============

Change-Id: Ic72dc92669513a820369ca52a038afa9ec88091f
2014-05-30 10:19:10 -07:00
James Zern
4dfa86b29c dsp/cpu: NaCl has no support for xgetbv
or the raw opcode; fixes:
  934ed4: unrecognized instruction

Change-Id: I981870baf0e8b03bf40144ea8ec25eff140d5bc3
2014-05-29 23:02:23 -07:00
James Zern
4c398699ef Merge "cwebp: fallback to native webp decode in WIC builds" 2014-05-28 15:03:34 -07:00
James Zern
33aa497e1a Merge "cwebp: add some missing newlines in longhelp output" 2014-05-28 12:52:22 -07:00
skal
c9b340a279 fix missing WebPInitAlphaProcessing call for premultiplied colorspace output
(lossless only)

Change-Id: Ic2d01c8cf9bc1082f07f348733461eb2ee30288a
2014-05-28 10:44:05 +02:00
pascal massimino
57897bae09 Merge "lossless_neon: use vcreate_*() where appropriate" 2014-05-28 01:36:13 -07:00
pascal massimino
6aa4777b39 Merge "(enc|dec)_neon: use vcreate_*() where appropriate" 2014-05-28 01:34:56 -07:00
skal
0d346e418d Always reinit VP8TransformWHT instead of hard-coding
Change-Id: I2012749ed29bd166d2a96555372f0d9baa784385
2014-05-28 10:21:07 +02:00
James Zern
7d039fc32d cwebp: fallback to native webp decode in WIC builds
this gives precedence to WIC, but attempts to decode the file as WebP if
it fails

Change-Id: I3d894f39a26aea88897a8ebd345139b82f74f312
2014-05-27 16:28:37 -07:00
James Zern
d471f424da cwebp: add some missing newlines in longhelp output
+ update README

Change-Id: Ia84d8857d575bc29ab3ce9c0f10264c042067e78
2014-05-27 16:28:02 -07:00
James Zern
bf0e003067 lossless_neon: use vcreate_*() where appropriate
this is more portable than {} initialization.
more involved cases are left for a follow-up.

Change-Id: If7e111864f287ea0a5de6311454aeda37afbb52a
2014-05-27 16:27:46 -07:00
James Zern
9251c2f6d2 (enc|dec)_neon: use vcreate_*() where appropriate
this is more portable than {} initialization.
more involved cases are left for a follow-up.

Change-Id: If8783423d17e90694b168a64ba313ed62ce2cc17
2014-05-27 16:26:56 -07:00
skal
399b916d27 lossy decoding: correct alpha-rescaling for YUVA format
The luminance needs to be pre- and post- multiplied by
the alpha value in case of rescaling, for proper averaging.

Also:
- removed util/alpha_processing and moved it to dsp/
- removed WebPInitPremultiply() which was mostly useless
and merged it with the new function WebPInitAlphaProcessing()

Change-Id: If089cefd4ec53f6880a791c476fb1c7f7c5a8e60
2014-05-27 15:27:13 -07:00
James Zern
78c12ed8e6 Merge "Makefile.vc: add rudimentary avx2 support" 2014-05-27 11:13:40 -07:00
skal
dc5b122f23 try to remove the spurious warning for static analysis
Change-Id: Ib81f16c70a0bfad05021401c1cf6788c974b63bd
2014-05-26 18:31:00 +02:00
James Zern
ddfefd624c Makefile.vc: add rudimentary avx2 support
similar to makefile.unix:
> nmake /f Makefile.vc CFG=release-static HAVE_AVX2=1

from the msdn:
The /arch:AVX2 option and __AVX2__ macro were introduced in Visual
Studio 2013 Update 2, version 12.0.34567.1
(Update 2, version 12.0.30501.00 seems to work)

Change-Id: I649ee47c9fdc399fc71a8ac8464728608d9b6412
2014-05-23 20:52:02 -07:00
Pascal Massimino
a891164398 Merge "simplify VP8LInitBitReader()" 2014-05-22 22:36:41 -07:00
Pascal Massimino
fdbcd44dd3 simplify VP8LInitBitReader()
gcc was generating very complex code, one for each case of br->len_ values!

also, pretty-fy the mask constants

Change-Id: If62b1e8266f3fe5334517305113038d2ea8a6b42
2014-05-22 21:44:16 -07:00
James Zern
7c004287af makefile.unix: add rudimentary avx2 support
$ make -f makefile.unix HAVE_AVX2=1

will define -mavx2 for src/dsp/*_dsp.c

Change-Id: Id9651bda54da057cb051dc70f7dcd008a3f803f4
2014-05-22 18:38:40 -07:00
James Zern
515e35cfb1 Merge "add stub dsp/enc_avx2.c" 2014-05-22 18:28:38 -07:00
skal
a05dc1402c SSE2: yuv->rgb speed-up for point-sampling
- use statically initialized tables (if WEBP_YUV_USE_SSE2_TABLES is defined)
 - use SSE2 row conversion for yuv->ARGB / RGBA / ABGR / RGB / BGR
 - clean-up and harmonize the WebpUpsamplers[] usage.

Change-Id: Ic5f3659a995927bd7363defac99c1fc03a85a47d
2014-05-22 09:56:47 +02:00
James Zern
178e9a69ae add stub dsp/enc_avx2.c
VP8EncDspInitAVX2 is included in sse2 builds for now, later a configure
flag should be added to avoid the stub when avx2 is unavailable/disabled

Change-Id: I6127b687c273f46f41652aaf8e3b86ae3cfb8108
2014-05-22 00:31:46 -07:00
James Zern
1b99c09cdc Merge "configure: add a test for -mavx2" 2014-05-22 00:30:10 -07:00
James Zern
fe72807112 configure: add a test for -mavx2
sets AVX2_FLAGS; currently unused

Change-Id: Ie07ee6c2fa7c1f0748430010a9f207b1723b6def
2014-05-21 23:17:21 -07:00
James Zern
e46a247c87 cpu: fix check for __cpuidex availability
__cpuidex was added in VS2008 /SP1/

Change-Id: Ie49b00b0246bd6537c0ed583412f17d6fd135baa
2014-05-21 22:59:47 -07:00
skal
176fda2650 fix the bit-writer for lossless in 32bit mode
Sometimes, we can write 18bit or more at time, and it would
overflow the 32bit accumulator.

Also clarified the num-bits limitations (and exposed
VP8L_MAX_NUM_BIT_READ in bit_reader.h)

fixes http://code.google.com/p/webp/issues/detail?id=200

Seems a bit faster (use of local fields for bits_ / used_)

also: added the __QNX__ bswap while at it.

Change-Id: I876db93a931db15b083cf1d838c70105effa7167
2014-05-22 07:19:22 +02:00
James Zern
541784c710 dsp.h: add a check for AVX2 / define WEBP_USE_AVX2
Change-Id: I90cc870f0bb4426af701779c367587dc2ae79c8b
2014-05-21 20:46:28 -07:00
James Zern
bdb151ee80 dsp/cpu: add AVX2 detection
currently unused.
https://software.intel.com/en-us/articles/how-to-detect-new-instruction-support-in-the-4th-generation-intel-core-processor-family
http://www.intel.com/content/dam/www/public/us/en/documents/manuals/64-ia-32-architectures-optimization-manual.pdf

Change-Id: I314200f890c58b9a587b902b214f90deb95f0579
2014-05-20 22:48:54 -07:00
Pascal Massimino
ab9f2f8685 Merge "revamp the point-sampling functions by processing a full plane" 2014-05-20 15:21:31 -07:00
Pascal Massimino
a2f8b28905 revamp the point-sampling functions by processing a full plane
-nofancy is slower than fancy upsampler, because the latter has SSE2 optim.

Change-Id: Ibf22e5a8ea1de86a54248d4a4ecc63d514c01b88
2014-05-20 15:13:44 -07:00
Pascal Massimino
ef076026af use decoder's DSP functions for autofilter
-af is now faster (6-7%), since we're using the SSE2 variant
Output is binary the same as before.

Change-Id: If75694594c9501cd486b8f237a810ddcc145cadd
2014-05-20 14:55:05 -07:00
pascal massimino
2b5cb32612 Merge "dsp/cpu: add AVX detection" 2014-05-20 01:10:18 -07:00
James Zern
df08e67e06 dsp/cpu: add AVX detection
currently unused.
https://software.intel.com/en-us/articles/introduction-to-intel-advanced-vector-extensions

similar checks exist in ffmpeg, libyuv. the visual studio inline asm is
based off of libyuv.

Change-Id: I3e233de3492172434e482607a94b99c617f11aad
2014-05-20 00:25:12 -07:00
Pascal Massimino
e2f405c969 Merge "clean-up and slight speed-up in-loop filtering SSE2" 2014-05-20 00:08:40 -07:00
Pascal Massimino
f60957bfd2 clean-up and slight speed-up in-loop filtering SSE2
* remove some sign-bit flipping
* turn some macro into inline functions
* fix some 'const' in signatures
* clarify the int8/uint8 usage

Change-Id: Ib04459ac34cb280c57579c5d79a5efd2f8d5e99d
2014-05-19 23:23:47 -07:00
James Zern
9fc3ae469f .gitattributes: treat .ppm as binary
Change-Id: I4da7b846f6255078f0ce97fc7e8df9f29271f52a
2014-05-15 23:18:35 -07:00
James Zern
3da924b5b4 Merge "dsp/WEBP_USE_NEON: test for __aarch64__" 2014-05-14 20:16:18 -07:00
James Zern
c7164490da Android.mk: always include *_neon.c in the build
the inclusion of the files is harmless when NEON is not enabled and will
allow them to be built with NEON for APP_ABI=arm64-v8a which currently
does not use the '.neon' suffix

Change-Id: I39377876b1b68822c38f4e2396da93c56145fc0f
2014-05-14 00:11:46 -07:00
James Zern
a577b23a0a dsp/WEBP_USE_NEON: test for __aarch64__
__ARM_NEON__ is unset by current linux gcc/clang + android toolchains
for aarch64/arm64 builds.

Change-Id: Ib2ca172ea6fcf046e4ced19a431088674c99b7f6
2014-05-14 00:07:13 -07:00
skal
54bfffcabc move RemapBitReader() from idec.c to bit_reader code
mostly for coherency and later patch.

Change-Id: Ica8352d67845b6c5b3153435edfb4646c6f24341
2014-05-14 07:07:08 +02:00
James Zern
34168ecbe4 Merge "remove all unused layer code" 2014-05-08 22:51:13 -07:00
Pascal Massimino
f1e771735a remove all unused layer code
Change-Id: I220590162b24c70f404fe3087f19dd3e6cac3608
2014-05-08 22:37:38 -07:00
Vikas Arora
b0757db7c6 Code cleanup for VP8LGetHistoImageSymbols.
Fix comments and few nits.
Change-Id: I8fa25ed523f12c6a7bfe125f0e4d638466ba4304
2014-05-08 14:13:47 -07:00
skal
5fe628d35d make the token page size be variable instead of fixed 8192
also changed the token-page layout a little bit to remove
a not-needed field.

This reduces the number of malloc()/free() calls substantially
with minimal increase in memory consumption (~2%).
For the tail of large sources, the number of malloc calls goes
typically from ~10000 to ~100 (e.g.: bryce_big.jpg: 22711 -> 105)

Change-Id: Ib847f41e618ed8c303d26b76da982fbc48de45b9
2014-05-05 14:26:14 -07:00
skal
f948d08c81 memory debug: allow setting pre-defined malloc failure points
MALLOC_FAIL_AT flag can be used to set-up a pre-determined failure
point during malloc calls. The counter value is retrieved using
getenv().
Example usage: export MALLOC_FAIL_AT=37 && cwebp input.png
will make 'cwebp' report a memory allocation error the 37th time
malloc() or calloc() is called.

MALLOC_MEM_LIMIT can be used similarly to prevent allocating more
than a given amount of memory. This is usually less convenient to
use than MALLOC_FAIL_AT since one has to know in advance the typical
memory size allocated.

Both these flags are meant to be used for debugging only!

Also: added a 'total_mem_allocated' to record the overall memory allocated

Change-Id: I9d408095ee7d76acba0f3a31b1276fc36478720a
2014-05-05 14:01:33 -07:00
skal
ca3d746e39 use block-based allocation for backward refs storage, and free-lists
Non-photo source produce far less literal reference and their
buffer is usually much smaller than the picture size if its compresses
well. Hence, use a block-base allocation (and recycling) to avoid
pre-allocating a buffer with maximal size.

This can reduce memory consumption up to 50% for non-photographic
content. Encode speed is also a little better (1-2%)

Change-Id: Icbc229e1e5a08976348e600c8906beaa26954a11
2014-05-05 11:11:55 -07:00
James Zern
1ba61b09f9 enable NEON intrinsics in aarch64 builds
avoids functions that use vtbl? as in iOS builds these are marked
unavailable

Change-Id: I17aedc3c7dc8f1d5be0941205de0b22c3772ef1b
2014-05-03 12:37:42 -07:00
James Zern
b9d2bb67d6 dsp/neon.h: coalesce intrinsics-related defines
Change-Id: Ifadd41a5bbf7f99eeb6d75d2b67daa25e0544946
2014-05-03 11:34:07 -07:00
James Zern
b5c7525897 iosbuild: add support for iOSv7/aarch64
Change-Id: I3a51c77276e245cd871acb18d9d70d109aac000b
2014-05-03 11:14:37 -07:00
Vikas Arora
9383afd5c7 Reduce number of memory allocations while decoding lossless.
This change reduces the number of calls to WebPSafeMalloc from 200 to
100. The overall memory consumption is down 3% for Lenna image.

Change-Id: I1b351a1f61abf2634c035ef1ccb34050b7876bdd
2014-05-02 01:01:43 -07:00
James Zern
888e63edc9 Merge "dsp/lossless: prevent signed int overflow in left shift ops" 2014-05-02 00:29:54 -07:00
Pascal Massimino
8137f3edbd Merge "instrument memory allocation routines for debugging" 2014-05-02 00:23:48 -07:00
Pascal Massimino
2aa187360d instrument memory allocation routines for debugging
Some tracing code is activated by PRINT_MEM_INFO flag.
For debugging only! (not thread-safe, and slow).

Change-Id: I282c623c960f97d474a35b600981b761ef89ace9
2014-05-02 00:19:55 -07:00
skal
d3bcf72bf5 Don't allocate VP8LHashChain, but treat like automatic object
the unique instance of VP8LHashChain (1MB size corresponding to hash_to_first_index_)
is now wholy part of VP8LEncoder, instead of maintaining the pointer to VP8LHashChain
in the encoder.

Change-Id: Ib6fe52019fdd211fbbc78dc0ba731a4af0728677
2014-04-30 14:10:48 -07:00
James Zern
bd6b8619dd dsp/lossless: prevent signed int overflow in left shift ops
force unsigned when shifting by 24.

Change-Id: I453601f33fdf01c516ef66ad23399ae6cbe032b3
2014-04-30 00:10:49 -07:00
James Zern
b7f19b8311 Merge "dec/vp8l: prevent signed int overflow in left shift ops" 2014-04-29 15:56:54 -07:00
Pascal Massimino
29059d5178 Merge "remove some uint64_t casts and use." 2014-04-29 14:15:40 -07:00
James Zern
e69a1df4b7 dec/vp8l: prevent signed int overflow in left shift ops
force unsigned when shifting by 24.

Change-Id: I6f9ca5fa2109e59b1d46a909136384fc6dc8ca0b
2014-04-29 14:12:38 -07:00
Pascal Massimino
cf5eb8ad19 remove some uint64_t casts and use.
We use automatic int->uint64_t promotion where applicable.

(uint64_t should be kept only for overflow checking and memory alloc).

Change-Id: I1f41b0f73e2e6380e7d65cc15c1f730696862125
2014-04-29 09:08:25 -07:00
Djordje Pesut
38e2db3e16 MIPS: MIPS32r1: Added optimization for HistogramAdd.
Change-Id: I39622a9c340c4090f64dd10e515c4ef2aa21d10a
2014-04-29 08:36:51 -07:00
James Zern
e0609ade15 dwebp: fix exit code on webp load failure
on ExUtilLoadWebP() failure no allocated memory will be returned, so
it's safe to exit immediately. additionally, any webp specific problems
will already have been reported as part of the call to
WebPGetFeatures().

broken since:
4a0e739 dwebp: move webp decoding to example_util

Change-Id: Ibc632015a1f52bae7f96d063252624123fa7c2da
2014-04-28 17:22:29 -07:00
James Zern
bbd358a8e7 Merge "example_util.h: avoid forward declaring enums" 2014-04-28 15:10:12 -07:00
James Zern
8955da2149 example_util.h: avoid forward declaring enums
doing so is not part of ISO C; removes some pedantic warnings.
use webp/decode.h to pickup VP8StatusCode instead.

Change-Id: I19b35e0f8a36fb7c45944ae9ca86838e08b90548
2014-04-28 14:56:19 -07:00
Vikas Arora
6d6865f0db Added SSE2 variants for Average2/3/4
The predictors based on Average2 are tad slower.

Following is the performance data for these predictors normalized to
number of instruction cycles (as per valgrind) per operation:
- Predictor6 & Predictor7 now takes 15 instruction cycles compared to 11
  instruction cycles for the C version.
- Predictor8 & Predictor9 now takes 15 instruction cycles compared to 12
  instruction cycles for the C version.

The predictors based on Average4 is faster and Average3 is tad slower:
- Predictor10 (Average4) now takes 23 instruction cycles compared to 25
  instruction cycles for the C version.
- Predictor5 (Average3) now takes 20 instruction cycles compared to 18
  instruction cycles for the C version.

Maybe SSE2 version of Average2 can be improved further. Otherwise, we can
remove the SSE2 version and always fallback to the C version.

Change-Id: I388b2871919985bc28faaad37c1d4beeb20ba029
2014-04-28 14:47:30 -07:00
Pascal Massimino
b3a616b356 make HistogramAdd() a pointer in dsp
* merged the two HistogramAdd/AddEval() into a single call
  (with detection of special case when b==out)
* added a SSE2 variant
* harmonize the histogram type to 'uint32_t' instead
  of just 'int'. This has a lot of ripples on signatures.
* 1-2% faster

Change-Id: I10299ff300f36cdbca5a560df1ae4d4df149d306
2014-04-28 10:09:34 -07:00
James Zern
c8bbb636ea dec_neon: relocate some inline-asm defines
move simple loop filter defines closer to their use and LOAD* to a
location common with the intrinsics

Change-Id: Iaec506d27bbc9a01be20936e30b68a4b0e690ee3
2014-04-28 00:41:42 -07:00
James Zern
4e393bb9f1 dec_neon: enable intrinsics-only functions
the complex loop filter has no inline equivalent; the simple loop filter
remains conditional on USE_INTRINSICS: it's left undefined for now.

Change-Id: I4f258e10458df53a7a1819707c8f46b450e9d9d2
2014-04-28 00:39:46 -07:00
James Zern
ba99a922ab dec_neon: use positive tests for USE_INTRINSICS
makes Simple* layout consistent with the rest of the file

Change-Id: Ib3108b0f2c694c634210e22027c253ea6236a9c6
2014-04-28 00:38:47 -07:00
pascal massimino
69058ff813 Merge "example_util: add ExUtilDecodeWebPIncremental" 2014-04-28 00:38:24 -07:00
James Zern
a7828e8bdb dec_neon: make WORK_AROUND_GCC conditional on version
Change-Id: Ic1b95f8749988de90df7c1ff6c537a21981329db
2014-04-28 00:01:19 -07:00
pascal massimino
3f3d717a6c Merge "enc_neon: enable intrinsics-only functions" 2014-04-27 02:05:53 -07:00
pascal massimino
de3cb6c820 Merge "move LOCAL_GCC_VERSION def to dsp.h" 2014-04-27 02:04:08 -07:00
James Zern
1b2fe14de5 example_util: add ExUtilDecodeWebPIncremental
simplifies ExUtilDecodeWebP() call

Change-Id: I33b42ec1c398b9fc0a895ad9deb89ab01a33b3da
2014-04-27 01:58:25 -07:00
pascal massimino
ca49e7ad97 Merge "enc_neon: move Transpose4x4 to dsp/neon.h" 2014-04-27 01:11:05 -07:00
Pascal Massimino
ad900abddd Merge "fix warning about size_t -> int conversion" 2014-04-27 01:07:03 -07:00
Pascal Massimino
4825b4360d fix warning about size_t -> int conversion
+ re-order and add some const

Change-Id: I3746520b75699e56e20835d10d1dd9cd9fd6d85d
2014-04-27 00:50:07 -07:00
James Zern
42b35e086b enc_neon: enable intrinsics-only functions
CollectHistogram / SSE* / QuantizeBlock have no inline equivalents,
enable them where possible and use USE_INTRINSICS to control borderline
cases: it's left undefined for now.

Change-Id: I62235bc4ddb8aa0769d1ce18a90e0d7da1e18155
2014-04-26 19:09:04 -07:00
James Zern
f937e01261 move LOCAL_GCC_VERSION def to dsp.h
+ add LOCAL_GCC_PREREQ and use it in lossless_neon.c

Change-Id: Ic9fd99540bc3e19c027d1598e4530dfdc9b9de00
2014-04-26 19:09:04 -07:00
James Zern
5e1a17ef4b enc_neon: move Transpose4x4 to dsp/neon.h
+ reuse it in TransformWHT()

Change-Id: Idfbd0f9b58d6253ac3d65ba55b58989c427ee989
2014-04-26 14:06:04 -07:00
James Zern
c7b92a5a29 dec_neon: (WORK_AROUND_GCC) delete unused Load4x8
using this in Load4x16 was slightly slower and didn't help mitigate any
of the remaining build issues with 4.6.x.

Change-Id: Idabfe1b528842a514d14a85f4cefeb90abe08e51
2014-04-26 12:36:14 -07:00
James Zern
8e5f90b086 Merge "make ExUtilLoadWebP() accept NULL bitstream param." 2014-04-26 12:22:53 -07:00
pascal massimino
05d4c1b7b3 Merge "cwebp: add webpdec" 2014-04-26 02:25:02 -07:00
James Zern
ddeb6ac802 cwebp: add webpdec
simple webp decoding, no metadata support

Change-Id: I2752fd1442c87513922878cbf72f001d45b631bc
2014-04-26 02:20:41 -07:00
pascal massimino
35d7d095dd Merge "Reduce memory footprint for encoding WebP lossless." 2014-04-26 01:28:43 -07:00
Vikas Arora
0b896101b4 Reduce memory footprint for encoding WebP lossless.
Reduce calls to Malloc (WebPSafeMalloc/WebPSafeCalloc) for:
- Building HashChain data-structure used in creating the backward references.
- Creating Backward references for LZ77 or RLE coding.
- Creating Huffman tree for encoding the image.
For the above mentioned code-paths, allocate memory once and re-use it
subsequently.

Reduce the foorprint of VP8LHistogram struct by changing the Struct
field 'literal_' from an array of constant size to dynamically allocated
buffer based on the input parameter cache_bits.

Initialize BitWriter buffer corresponding to 16bpp (2*W*H).
There are some hard-files that are compressed at 12 bpp or more. The
realloc is costly and can be avoided for most of the WebP lossless
images by allocating some extra memory at the encoder initializaiton.

Change-Id: I1ea8cf60df727b8eb41547901f376c9a585e6095
2014-04-26 01:14:33 -07:00
Pascal Massimino
f0b65c9a1e make ExUtilLoadWebP() accept NULL bitstream param.
Change-Id: Ifce379873ca39e46d011a1cb829beab89de5f15d
2014-04-26 01:11:44 -07:00
pascal massimino
9c0a60ccb3 Merge "dwebp: move webp decoding to example_util" 2014-04-26 01:05:48 -07:00
Djordje Pesut
1d62acf6af MIPS: MIPS32r1: Added optimization for HuffmanCost functions.
HuffmanCost and HuffmanCostCombined optimized and added
'const' to some variables from ExtraCost functions.

Change-Id: I28b2b357a06766bee78bdab294b5fc8c05ac120d
2014-04-24 11:14:57 +02:00
James Zern
4a0e73904d dwebp: move webp decoding to example_util
this will allow reuse by cwebp

Change-Id: I667252fdacfc5436112d21b040ca299273ec1515
2014-04-22 20:31:41 -07:00
James Zern
c0220460e9 Merge "Bugfix: Incremental decode of lossy-alpha" 2014-04-22 16:33:12 -07:00
Urvang Joshi
8c7cd722f6 Bugfix: Incremental decode of lossy-alpha
When remapping buffer, br->eos_ was wrongly being set to true for
certain
images.

Also, refactored the end-of-stream detection as a function.

Reported in http://crbug.com/364830

Change-Id: I716ce082ef2b505fe24246b9c14912d8e97b5d84
2014-04-22 16:06:32 -07:00
Djordje Pesut
7955152d58 MIPS: fix error with number of registers.
Some versions of compiler in debug build can't find
a register in class 'GR_REGS' while reloading 'asm'

Number of used registers is decreased in this fix.

Change-Id: I7d7b8172b8f37f1de4db3d8534a346d7a72c5065
2014-04-22 12:06:45 +02:00
skal
b1dabe3767 Merge "Move the HuffmanCost() function to dsp lib" 2014-04-18 12:08:22 -07:00
skal
75b12006e3 Move the HuffmanCost() function to dsp lib
This is to help further optimizations.
(like in https://gerrit.chromium.org/gerrit/#/c/69787/)

There's a small slowdown (~0.5% at -z 9 quality) due to
function pointer usage. Note that, for speed, it's important
to return VP8LStreaks by value, and not pass a pointer.

Change-Id: Id4167366765fb7fc5dff89c1fd75dee456737000
2014-04-18 11:59:48 -07:00
Djordje Pesut
2772b8bd98 MIPS: fix assembler error revealed by clang's debug build
.set at -  Indicates that macro expansions may clobber
           the assembler temporary ($at or $28) register.
           Some macros may not be expanded without this
           and will generate an error message if noat
           is in effect.

"at" also added to the clobber list.

Change-Id: I67feebbd9f2944fc7f26c28496e49e1e2348529d
2014-04-18 18:10:52 +02:00
James Zern
6653b601ef enc_mips32: fix unused symbol warning in debug
move kC1 / kC2 under __OPTIMIZE__
missed in:
8dec120 enc_mips32: disable ITransform(One) in debug builds

Change-Id: Ic9a12e6d73090c8c06b0e7a4bc56dd9c76b8e596
2014-04-17 23:35:36 -07:00
James Zern
8dec120975 enc_mips32: disable ITransform(One) in debug builds
avoids:
src/dsp/enc_mips32.c: In function 'ITransformOne':
src/dsp/enc_mips32.c:123:3: can't find a register in class 'GR_REGS' while reloading 'asm'
src/dsp/enc_mips32.c:123:3: 'asm' operand has impossible constraints

Change-Id: Ic469667ee572f25e502c9873c913643cf7bbe89d
2014-04-17 20:10:31 -07:00
James Zern
98519dd5c1 enc_neon: convert Disto4x4 to intrinsics
Change-Id: I0f00d5af2de2301e8237c2a38a9612d3645abad6
2014-04-17 18:29:31 -07:00
Pascal Massimino
fe9317c9bf cosmetics:
* remove MIPS32 suffix from static function names
* fix a long line in enc_neon.c

Change-Id: Ia1294ae46f471b3eb1e9ba43c6aa1b29a7aeb447
2014-04-16 00:36:19 -07:00
James Zern
953b074677 enc_neon: cosmetics
fix/remove incorrect comments
+ whitespace

Change-Id: Id1b86beb23e5bf946e73c34ab7066b6ca177f33b
2014-04-15 23:57:03 -07:00
skal
a9fc697cb6 Merge "WIP: extract the float-calculation of HuffmanCost from loop" 2014-04-15 11:33:11 -07:00
skal
3f84b5219d Merge "replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8)" 2014-04-15 07:09:12 -07:00
Djordje Pesut
4ae0533f39 MIPS: MIPS32r1: Added optimizations for ExtraCost functions.
ExtraCost and ExtraCostCombined

Change-Id: I7eceb9ce2807296c6b43b974e4216879ddcd79f2
2014-04-15 15:37:06 +02:00
skal
b30a04cf11 WIP: extract the float-calculation of HuffmanCost from loop
new function: VP8FinalHuffmanCost()

Change-Id: I42102f8e5ef6d7a7af66490af77b7dc2048a9cb9
2014-04-15 14:52:52 +02:00
skal
a8fe8ce231 Merge "NEON intrinsics version of CollectHistogram" 2014-04-15 03:00:45 -07:00
skal
95203d2d1b NEON intrinsics version of CollectHistogram
apparently faster, but we might save some load/store to/from memory
once we settle for the intrinsics-based FTransform()

(also: fixed some #ifdef USE_INTRINSICS problems)

Change-Id: I426dea299cea0c64eb21c4d81a04a960e0c263c7
2014-04-14 16:47:20 +02:00
skal
7ca2e74bb4 replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8)
saves few instructions

Change-Id: If8f464bb2894a209bba94825a4db9267df126d47
2014-04-14 15:14:45 +02:00
skal
41c6efbdc5 fix lossless_neon.c
* some extra {xx , 0 } in initializers
* replaced by vget_lane_u32() where appropriate

Change-Id: Iabcd8ec34d7c853920491fb147a10d4472280a36
2014-04-14 14:27:11 +02:00
skal
8ff96a027a NEON intrinsics version of FTransform
as little bit slower than inlined asm it seems.
So disabled for now.

Change-Id: I8c942846f9bedaed57275675ea9dbbcb8dfd9ccd
2014-04-14 09:58:35 +02:00
Jovan Zelincevic
0214f4a908 Merge "MIPS: MIPS32r1: Added optimizations for FastLog2" 2014-04-10 08:54:12 -07:00
Jovan Zelincevic
baabf1ea3a MIPS: MIPS32r1: Added optimizations for FastLog2
Functions VP8LFastLog2Slow and VP8LFastSLog2Slow

also: replaced some "% y" by "& (y-1)" in the C-version
(since y is a power-of-two)

Change-Id: I875170384e3c333812ca42d6ce7278aecabd60f0
2014-04-10 08:32:51 -07:00
skal
3d49871dbe NEON functions for lossless coding
Verified OK, but right now they don't seem faster.
So they are disabled behind a USE_INTRINSICS flag (off for now)

Change-Id: I72a1c4fa3798f98c1e034f7ca781914c36d3392c
2014-04-10 15:32:08 +02:00
Slobodan Prijic
3fe0291530 MIPS: MIPS32r1: Added optimizations for SSE functions.
Change-Id: I1287fa65064192cc2edc5c4be2b1974be665b9b4
2014-04-09 11:02:13 +02:00
skal
c503b485b6 Merge "fix the gcc-4.6.0 bug by implementing alternative method" 2014-04-08 23:25:59 -07:00
skal
abe6f48709 fix the gcc-4.6.0 bug by implementing alternative method
previous functions are a bit faster with gcc-4.8, so we keep them
for now.

Change-Id: I4081e5af66fbf606295d8a83875c1b889729b4dc
2014-04-09 07:53:55 +02:00
James Zern
5598bdecd8 enc_mips32.c: fix file mode
Change-Id: I5a43320e2ea2eebc88c65398acb9ea59b63af1fd
2014-04-08 15:12:54 -07:00
Slobodan Prijic
2b1b4d5ae9 MIPS: MIPS32r1: Add optimization for GetResidualCost
+ reorganize the cost-evaluation code by moving some functions
to cost.h/cost.c and exposing VP8Residual

Change-Id: Id976299b5d4484e65da8bed31b3d2eb9cb4c1f7d
2014-04-08 15:28:49 +02:00
pascal massimino
f0a1f3cd51 Merge "MIPS: MIPS32r1: Added optimization for FTransform" 2014-04-08 04:17:27 -07:00
Djordje Pesut
7231f610aa MIPS: MIPS32r1: Added optimization for FTransform
Change-Id: I9384dac483e8f98bcfdd277a0a3d6ec7c7a7b297
2014-04-08 04:16:44 -07:00
skal
869eaf6c60 ~30% encoding speedup: use NEON for QuantizeBlock()
also revamped the signature to avoid having to pass the 'first' parameter

Change-Id: Ief9af1747dcfb5db0700b595d0073cebd57542a5
2014-04-08 03:08:22 -07:00
James Zern
f758af6b73 enc_neon: convert FTransformWHT to intrinsics
slightly faster than the inline asm
in practice not much faster than the C-code in a full NEON build, but
still better overall in an Android-like one that only enables NEON for
certain files.

Change-Id: I69534016186064fd92476d5eabc0f53462d53146
2014-04-08 00:20:19 -07:00
Djordje Pesut
7dad095bb4 MIPS: MIPS32r1: Added optimization for Disto4x4 (TTransform)
Change-Id: Ieb20c5c52b964247cfe46f45f9a7415725bf7c02
2014-04-07 15:04:23 +02:00
Jovan Zelincevic
2298d5f301 MIPS: MIPS32r1: Added optimization for QuantizeBlock
Change-Id: I6047ab107e4d474e35b5af1dac391d5b3d8c049b
2014-04-07 09:22:35 +02:00
Djordje Pesut
e88150c9b6 Merge "MIPS: MIPS32r1: Add optimization for ITransform" 2014-04-05 10:36:05 -07:00
James Zern
de693f2502 lossless_neon: disable VP8LConvert* functions
due to breakage with NDK/gcc-4.6 builds

Change-Id: Id96258e710ee33e08a023354b3227f27da986620
2014-04-04 20:38:29 -07:00
skal
4143332b22 NEON intrinsics for encoding
* inverse transform is actually slower with intrinsics + gcc-4.6,
  so is left disabled for now.
  With gcc-4.8, it's a bit faster than inlined assembly.

* Sum of Square error function provide a 2-3% speed up
  There's enabled by default (since there's no inlined-asm equivalent)

Change-Id: I361b3f0497bc935da4cf5b35e330e379e71f498a
2014-04-04 15:02:56 -07:00
Djordje Pesut
0ca2914b23 MIPS: MIPS32r1: Add optimization for ITransform
Change-Id: Ie4c8b9bc3a7826bd443cdebf05386786fafe8c56
2014-04-04 10:50:35 +02:00
James Zern
71bca5ecf3 dec_neon: use vst_lane instead of vget_lane
results in fewer instructions, small speed improvement

Change-Id: I98de632d09ff09f295368c0d744cb4397b585084
2014-04-03 14:56:26 -07:00
skal
bf06105293 Intrinsics NEON version of TransformOne
+ misc cosmetics

* seems 4% slower than inlined-asm with gcc-4.6
* is a tad faster (<1%) with gcc-4.8
(disabled for now)

Change-Id: Iea6cd00053a2e9c1b1ccfdad1378be26584f1095
2014-04-03 14:41:56 -07:00
pascal massimino
19c6f1ba74 Merge "dec_neon: use vld?_lane instead of vset?_lane" 2014-04-03 01:16:29 -07:00
James Zern
7a94c0cf75 upsampling_neon: drop NEON suffix from local functions
Change-Id: I6583ad74aacf78dcbeb5a0ff0218a39bc3460e5a
2014-04-02 23:24:39 -07:00
James Zern
d14669c83c upsampling_sse2: drop SSE2 suffix from local functions
Change-Id: I2349c1a8e5e15e1d204642096f84f3202721c297
2014-04-02 23:24:39 -07:00
James Zern
2ca42a4fb7 enc_sse2: drop SSE2 suffix from local functions
Change-Id: I5d61605a9d410761d50b689b046114f0ab3ba24e
2014-04-02 23:24:36 -07:00
James Zern
d038e6193b dec_sse2: drop SSE2 suffix from local functions
Change-Id: Ie171778b84038d5b04c5dc6972f6015caf555882
2014-04-02 23:10:39 -07:00
James Zern
fa52d7525f dec_neon: use vld?_lane instead of vset?_lane
results in fewer instructions, small speed improvement

Change-Id: I61ab48d09a5ce7c5158eac8244d28287457edc7a
2014-04-02 23:03:18 -07:00
Pascal Massimino
c520e77d94 cosmetic: fix long line
Change-Id: Id04b368aea5784a98c705f323b32d35b362742ea
2014-04-02 23:00:50 -07:00
James Zern
4b0f2dae6f Merge "add intrinsics NEON code for chroma strong-filtering" 2014-04-02 22:57:44 -07:00
skal
e351ec0759 add intrinsics NEON code for chroma strong-filtering
The nice trick is to pack 8 u + 8 v samples into a single uint8x16x_t
register, and re-use the previous (luma) functions

Change-Id: Idf50ed2d6b7137ea080d603062bc9e0c66d79f38
2014-04-03 06:58:21 +02:00
pascal massimino
aaf734b8b0 Merge "Add SSE2 version of forward cross-color transform" 2014-04-02 14:18:59 -07:00
Urvang Joshi
c90a902eff Add SSE2 version of forward cross-color transform
Encoding speed is roughly the same.

Change-Id: I6b976d0eb24e1847714e719762cb8403768da66c
2014-04-02 12:21:20 -07:00
Vikas Arora
bc374ff39e Use histogram_bits to initalize transform_bits.
This change gains back 1% in compression density for method=3 and 0.5% for
method=4, at the expense of 10% slower compression speed.

Change-Id: I491aa1c726def934161d4a4377e009737fbeff82
2014-04-02 11:46:40 -07:00
James Zern
2132992d47 Merge "Add strong filtering intrinsics (inner and outer edges)" 2014-04-02 00:10:01 -07:00
skal
5fbff3a646 Add strong filtering intrinsics (inner and outer edges)
+ added some work-around gcc-4.6 to make it compile (except one function).
+ lots of revamping

All variants tested ok.
Speed-up is ~5-7%

Change-Id: I5ceda2ee5debfada090907fe3696889eb66269c3
2014-04-02 08:28:55 +02:00
Urvang Joshi
d4813f0cb2 Add SSE2 function for Inverse Cross-color Transform
Lossless decoding is now ~3% faster.

Change-Id: Idafb5c73e5cfb272cc3661d841f79971f9da0743
2014-04-01 15:52:25 -07:00
James Zern
26029568b7 dec_neon: add strong loopfilter intrinsics
vertical only currently, 2.5-3% faster
placed under USE_INTRINSICS as this change depends on the simple
loopfilter
improves the simple loopfilter slightly thanks to some reorganization

Change-Id: I6611441fa54228549b21ea74c013cb78d53c7155
2014-04-01 01:13:50 -07:00
James Zern
cca7d7ef0f Merge "add intrinsics version of SimpleHFilter16NEON()" 2014-04-01 00:57:11 -07:00
James Zern
1a05dfa7f5 windows: fix dll builds
WebPSafe* need to be marked external to allow mux/demux to access them
through libwebp.dll

Change-Id: Ib6620e00d376f7aa5a0550e1e244f759977f97a0
2014-03-31 17:46:12 -07:00
skal
d6c50d8ac2 Merge "add some colorspace conversion functions in NEON" 2014-03-31 13:15:18 -07:00
Urvang Joshi
4fd7c82e6a SSE2 variants of Subtract-Green: Rectify loop condition
When 4 pixels are left, they should be processed with SSE2.

Decoding is marginally faster (~0.4%).
Encoding speed: No observable difference.

Change-Id: I3cf21c07145a560ff795451e65e64faf148d5c3e
2014-03-31 10:51:45 -07:00
skal
97e5fac389 add some colorspace conversion functions in NEON
new file: lossless_neon.c
speedup is ~5%

gcc 4.6.3 seems to be doing some sub-optimal things here,
storing register on stack using 'vstmia' and such.
Looks similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=51509

I've tried adding  -fno-split-wide-types and it does help
the generated assembly. But the overall speed gets worse with
this flag. We should only compile lossless_neon.c with it -> urk.

Change-Id: I2ccc0929f5ef9dfb0105960e65c0b79b5f18d3b0
2014-03-31 17:47:46 +02:00
skal
b9a7a45f1f add intrinsics version of SimpleHFilter16NEON()
It's disable for now, because it crashes gcc-4.6.3 during compilation
with -O2 or -O3. It's been tested OK with -O1.

Code is still globally disabled with USE_INTRINSICS, though.

Change-Id: I3ca6cf83f3b9545ad8909556f700758b3cefa61c
2014-03-31 16:31:31 +02:00
Pascal Massimino
daccbf400d add light filtering NEON intrinsics
disabled for now (but tested OK), thanks to the USE_INTRINSICS #define
We'll activate the code when we're on par with non-intrinsics

Change-Id: Idbfb9cb01f4c7c9f5131b270f8c11b70d0d485ff
2014-03-30 22:15:55 -07:00
Pascal Massimino
af44460880 fix typo in STORE_WHT
was working ok because dst == out

Change-Id: I27095129a11f468422250dd2b8fad8b3bd4e5bbd
2014-03-28 10:34:44 -07:00
Vikas Arora
6af6b8e1b6 Tune HistogramCombineBin for large images.
Tune HistogramCombineBin for hard images that are larger than 1-2 Mega
pixel and represent photographic images.

This speeds up lossless encoding on 1000 image corpus by 10-12% and compression
penalty of 0.1-0.2%.

Change-Id: Ifd03b75c503b9e886098e5fe6f86be0391ca8e81
2014-03-28 07:09:59 -07:00
skal
af93bdd6bc use WebPSafe[CM]alloc/WebPSafeFree instead of [cm]alloc/free
there's still some malloc/free in the external example
This is an encoder API change because of the introduction
of WebPMemoryWriterClear() for symmetry reasons.

The MemoryWriter object should probably go in examples/ instead
of being in the main lib, though.
mux_types.h stil contain some inlined free()/malloc() that are
harder to remove (we need to put them in the libwebputils lib
and make sure link is ok). Left as a TODO for now.

Also: WebPDecodeRGB*() function are still returning a pointer
that needs to be free()'d. We should call WebPSafeFree() on
these, but it means exposing the whole mechanism. TODO(later).

Change-Id: Iad2c9060f7fa6040e3ba489c8b07f4caadfab77b
2014-03-27 15:50:59 -07:00
James Zern
51f406a5d7 lossless_sse2: relocate VP8LDspInitSSE2 proto
this is in line with the other dsp files and silences a build warning.

Change-Id: I03ee3032c11d4c731cc10bfa0a2dcb6866756ba2
2014-03-27 15:07:43 -07:00
skal
0f4f721b12 separate SSE2 lossless functions into its own file
expose the predictor array as function pointers instead
of each individual sub-function

+ merged Average2() into ClampedAddSubtractHalf directly
+ unified the signature as "VP8LProcessBlueAndRedFunc"

no speed diff observed

Change-Id: Ic3c45dff11884a8330a9ad38c2c8e82491c6e044
2014-03-27 21:43:55 +01:00
skal
514fc251df VP8LConvertFromBGRA: use conversion function pointers
Change-Id: I863b97119d7487e4eef337e5df69e1ae2a911d4c
2014-03-27 09:00:35 +01:00
James Zern
6d2f35273d dsp/dec: TransformDCUV: use VP8TransformDC
rather than forcing the C version; this is similar to TransformUV

Change-Id: I2778194f05fca33e9b2b71323e92947c0b395e9a
2014-03-26 16:43:47 -07:00
skal
defc8e1b01 Merge "fix out-of-bound read during alpha-plane decoding" 2014-03-26 15:22:42 -07:00
James Zern
fbed36433d Merge "dsp: reuse wht transform from dec in encoder" 2014-03-26 15:13:07 -07:00
skal
d846708400 Merge "Add SSE2 version of ARGB -> BGR/RGB/... conversion functions" 2014-03-26 15:01:46 -07:00
skal
207d03b484 fix out-of-bound read during alpha-plane decoding
With -bypass_filter switched on, the lossless-compressed
data is decoded ahead of time (before being transformed and
display). Hence, the last row was called twice.

http://code.google.com/p/webp/issues/detail?id=193

Change-Id: I9e13f495f6bd6f75fa84c4a21911f14c402d4b10
2014-03-26 22:45:03 +01:00
skal
d1b33ad58b 2-5% faster trellis with clang/MacOS
(and ~2-3% on ARM)

We don't need to store cost/score for each node, but only for
the current and previous one -> simplify code and save some memory.

Also made the 'Node' structure tighter.

Change-Id: Ie3ad7d3b678992b396242f56e2ac387fe43852e6
2014-03-26 22:33:01 +01:00
skal
369c26dd3f Add SSE2 version of ARGB -> BGR/RGB/... conversion functions
~4-6% faster lossless decoding

Change-Id: I3ed1131ff2b2a0217da315fac143cd0d58293361
2014-03-26 22:19:00 +01:00
James Zern
df230f2723 dsp: reuse wht transform from dec in encoder
Change-Id: Ide663db9eaecb7a37fe0e6ad4cd5f37de190c717
2014-03-22 13:25:08 -07:00
James Zern
80e218d43a Android.mk: fix build with APP_ABI=armeabi-v7a-hard
added in r9d; relax the check to build neon code

Change-Id: Ic52b3fbd3bf53617ee52b07a55b0ed05f6f9b26f
2014-03-20 23:24:39 -07:00
Pascal Massimino
59daf08362 Merge "cosmetics:" 2014-03-18 04:02:33 -07:00
Pascal Massimino
536220084c cosmetics:
- use VP8ScanUV, separate from VP8Scan[] (for luma)
 - fix indentation
 - few missing consts
 - change TrellisQuantizeBlock() signature

Change-Id: I94b437d791cbf887015772b5923feb83dd145530
2014-03-18 03:34:56 -07:00
James Zern
3e7f34a3fb AssignSegments: quiet array-bounds warning
nb (enc->segment_hdr_.num_segments_) will be in the range
[1, NUM_MB_SEGMENTS].

Change-Id: I5c2bd0bb82b17c99aff39c98b6b1747fc040dc16
2014-03-14 18:47:52 -07:00
James Zern
3c2ebf58a4 Merge "UpdateHistogramCost: avoid implicit double->float" 2014-03-14 15:50:57 -07:00
James Zern
cf821c821f UpdateHistogramCost: avoid implicit double->float
all the functions involved return double and later these locals are used
in double calculations. fixes a vs build warning

Change-Id: Idb547104ef00b48c71c124a774ef6f2ec5f30f14
2014-03-14 11:18:52 -07:00
Vikas Arora
312e638f30 Extend the search space for GetBestGreenRedToBlue
Get back some of the compression gains by extending the search space for
GetBestGreenRedToBlue. Also removed the SkipRepeatedPixels call, as it was not
helping much in yielding better compression density.

Before:
 1000 files, 63530337 pixels, 1 loops => 45.0s (45.0 ms/file/iterations)
 Compression (output/input): 2.463/3.268 bpp, Encode rate (raw data): 1.347 MP/s

After:
1000 files, 63530337 pixels, 1 loops => 45.9s (45.9 ms/file/iterations)
 Compression (output/input): 2.461/3.268 bpp, Encode rate (raw data): 1.321 MP/s

Change-Id: I044ba9d3f5bec088305e94a7c40c053ca237fd9d
2014-03-14 09:56:00 -07:00
Vikas Arora
1c58526fe1 Fix few nits
Add/remove few casts, fixed indentation.

Change-Id: Icd141694201843c04e476f09142ce4be6e502dff
2014-03-13 13:57:39 -07:00
Vikas Arora
fef22704ec Optimize and re-structure VP8LGetHistoImageSymbols
Optimize and re-structured VP8LGetHistoImageSymbols method, by using the bin-hash
for merging the Histograms more efficiently, instead of the randomized
heuristic of existing method HistogramCombine.

This change speeds up the Lossless encoding by 40-50% (for method=4 and Q > 50)
with 0.8% penalty in compression density. For lower method, the speed up is 25-30%,
with 0.4% penalty in the compression density.

Change-Id: If61adadb1a041b95def6405aa1fe3b83c3cb25ce
2014-03-13 11:48:37 -07:00
Vikas Arora
068b14ac57 Optimize lossless decoding.
Restructure PredictorInverseTransform & ColorSpaceInverseTransform to remove
one if condition inside the main/critial loop. Also separated TransformColor &
TransformColorInverse into separate functions and avoid one 'if condition'
inside this critical method.

This change speeds up lossless decoding for Lenna image about 5% and 1000 image
corpus by 3-4%.

Change-Id: I4bd390ffa4d3bcf70ca37ef2ff2e81bedbba197d
2014-03-13 11:27:12 -07:00
Vikas Arora
5f0cfa80ff Do a binary search to get the optimum cache bits.
This speeds up the lossless encoder by a bit (1-2%), without impacting the
compression density.

Change-Id: Ied6fb38fab58eef9ded078697e0463fe7c560b26
2014-03-13 10:30:32 -07:00
James Zern
24ca3678f9 Merge "allow 'cwebp -o -' to emit output to stdout" 2014-03-12 14:01:15 -07:00
skal
e12f874eea allow 'cwebp -o -' to emit output to stdout
Change-Id: I423d25898e1ba317ccbf456bb28ce45663a3b3d2
2014-03-12 13:56:15 -07:00
skal
2bcad89b4c allow some more stdin/stout I/O
* allow reading from stdin for dwebp / vwebp
 * allow writing to stdout for gif2webp

by introducing a new function ExUtilReadFromStdin()

Example use: cat in.webp | dwebp -o - -- - > out.png

Note that the '-- -' option must appear *last*
(as per general fashion for '--' option parsing)

Change-Id: I8df0f3a246cc325925d6b6f668ba060f7dd81d68
2014-03-12 19:32:16 +01:00
skal
84ed4b3aa5 fix cwebp.1 typos after patch #69199
Change-Id: I046a54dbb4210319ddb156f49dd9d13d47b0d035
2014-03-11 23:46:50 +01:00
skal
65b99f1c92 add a -z option to cwebp, and WebPConfigLosslessPreset() function
These are presets for lossless coding, similar to zlib.
The shortcut for lossless coding is now, e.g.:
   cwebp -z 5 in.png -o out_lossless.webp

There are 10 possible values for -z parameter:
   0 (fastest, lowest compression)
to 9 (slowest, best compression)

A reasonable tradeoff is -z 6, e.g.
-z 9 can be quite slow, so use with care.

This -z option is just a shortcut for some pre-defined
'-lossless -m xx -q yy' combinations.

Change-Id: I6ae716456456aea065469c916c2d5ca4d6c6cf04
2014-03-11 23:25:35 +01:00
skal
30176619c6 4-5% faster trellis by removing some unneeded calculations.
(We didn't need the exact value of the max_error properly.
We can work with relative values instead of absolute)

Output is bitwise the same as before.

Change-Id: I67aeaaea5f81bfd9ca8e1158387a5083a2b6c649
2014-03-06 15:57:25 +01:00
James Zern
687a58ecc3 histogram.c: reindent after b33e8a0
b33e8a0 Refactor code for HistogramCombine.

Change-Id: Ia1b4b545c5f4e29cc897339df2b58f18f83c15b3
2014-03-04 00:38:14 -08:00
skal
06d456f685 Merge "~3-4% faster lossless encoding" 2014-03-04 00:17:52 -08:00
skal
c60de26099 ~3-4% faster lossless encoding
by re-arranging some code from SkipRepeatedPixel()

Change-Id: I6c1fd7cd9af22cd9be4234217ff67d7b94f44137
2014-03-04 08:12:59 +01:00
James Zern
42eb06fc0e Merge "few cosmetics after patch #69079" 2014-03-03 15:13:25 -08:00
skal
82af82644b few cosmetics after patch #69079
Change-Id: Ifa758420421b5a05825a593f6b43504887603ee7
2014-03-03 23:53:08 +01:00
Vikas Arora
b33e8a05ee Refactor code for HistogramCombine.
Refactor code for HistogramCombine and optimize the code by calculating
the combined entropy and avoid un-necessary Histogram merges.

This speeds up lossless encoding by 1-2% and almost no impact on compression
density.

Change-Id: Iedfcf4c1f3e88077bc77fc7b8c780c4cd5d6362b
2014-03-03 13:50:42 -08:00
skal
ca1bfff53f Merge "5-10% encoding speedup with faster trellis (-m 6)" 2014-03-03 13:09:17 -08:00
skal
5aeeb087d6 5-10% encoding speedup with faster trellis (-m 6)
mostly by:
- storing a single rd-score instead of cost / distortion separately
- evaluating terminal cost only once
- getting some invariants out of the loops
- more consts behind fewer variables

Change-Id: I79451f3fd1143d6537200fb8b90d0ba252809f8c
2014-03-03 22:07:06 +01:00
James Zern
82ae1bf299 cosmetics: normalize VP8GetCPUInfo checks
- use '!= NULL'
+ dec_neon/STORE_WHT: align '\'s

Change-Id: I0f0ce49bd9c58e771bafb24c51c070d5ebd77e53
2014-02-28 18:47:41 -08:00
James Zern
e3dd9243cb Merge "Refactor GetBestPredictorForTile for future tuning." 2014-02-28 18:39:27 -08:00
Vikas Arora
206cc1be5a Refactor GetBestPredictorForTile for future tuning.
This change doesn't impact compression gain or compression speed.

Change-Id: Ia87d8a46c6f1ce0f8974178d75a6b0ba0a6e3696
2014-02-28 11:30:23 -08:00
James Zern
3cb8406262 Merge "speed-up trellis quant (~5-10% overall speed-up)" 2014-02-27 14:34:01 -08:00
Pascal Massimino
b66f2227c1 Merge "lossy encoding: ~3% speed-up" 2014-02-27 11:42:16 -08:00
Pascal Massimino
4287d0d49b speed-up trellis quant (~5-10% overall speed-up)
store costs[] in node instead of context

Change-Id: I6aeb0fd94af9e48580106c41408900fe3467cc54
also: various cosmetics
2014-02-27 00:06:00 -08:00
Pascal Massimino
390c8b316d lossy encoding: ~3% speed-up
incorporate non-last cost in per-level cost table

also: correct trellis-quant cost evaluation at nodes
(output a little bit different now). Method 6 is ~4% faster.

Change-Id: Ic48bd6d33f9193838216e7dc3a9f9c5508a1fbe8
2014-02-26 05:52:24 -08:00
James Zern
9a463c4a51 Merge "dec_neon: convert TransformWHT to intrinsics" 2014-02-25 14:36:44 -08:00
pascal massimino
e8605e9625 Merge "dec_neon: add ConvertU8ToS16" 2014-02-25 08:56:17 -08:00
Djordje Pesut
4aa3e4122b MIPS: MIPS32r1: rescaler bugfix
Change-Id: I6de6e2488bd5bd58c1f705739e4467feb211f8b4
2014-02-25 14:36:48 +01:00
Vikas Arora
c16cd99aba Speed up lossless encoder.
Speedup lossless encoder by 20-25% by optimizing:
- GetBestColorTransformForTile: Use techniques like binary search and
  local minima search to reduce the search space.
- VP8LFastSLog2Slow & VP8LFastLog2Slow: Adding the correction factor for
  log(1 + x) and increase the threshold for calling the approximate
  version of log_2 (compared to costly call to log()).

Change-Id: Ia2444c914521ac298492aafa458e617028fc2f9d
2014-02-21 22:13:50 -08:00
James Zern
9d6b5ff1e6 dec_neon: convert TransformWHT to intrinsics
Change-Id: I34dc1d75ddebab131cfed031764117e3f7b75c6b
2014-02-21 11:23:46 -08:00
James Zern
2ff0aae2fe dec_neon: add ConvertU8ToS16
Change-Id: Ifc4fb8e7f862e72154d2f2739811b1022d2b9416
2014-02-20 15:35:33 -08:00
skal
77a8f91981 fix compilation with USE_YUVj flag
(not that we'll ever need it, but...)

Change-Id: I9af993c62372097846c5ca6bae8362b59c3502dc
2014-02-20 13:23:18 +01:00
James Zern
4acbec1bef Merge changes I3b240ffb,Ia9370283,Ia2d28728
* changes:
  dec_neon: TransformAC3: work on packed vectors
  dec_neon: add SaturateAndStore4x4
  dec_neon.c: convert TransformDC to intrinsics
2014-02-19 14:47:33 -08:00
James Zern
2719bb7e98 dec_neon: TransformAC3: work on packed vectors
pack 2 rows in 1 vector similar to TransformDC

Change-Id: I3b240ffb4f51a632b5c8c2daf54d938333ed4b0d
2014-02-18 19:47:20 -08:00
James Zern
b7b60ca16c dec_neon: add SaturateAndStore4x4
converts 2 s16 vectors to 2 u8 and store to uint8_t destination;
TransformAC3 can reuse this after a rework

Change-Id: Ia9370283ee3d9bfbc8c008fa883412100ff483d0
2014-02-18 19:42:35 -08:00
Pascal Massimino
b7685d73fe Rescale: let ImportRow / ExportRow be pointer-to-function
Separate the C version from the MIPS32 version and have run-time
initialization during RescalerInit()

Change-Id: I93cfa5691c073a099fe62eda1333ad2bb749915b
2014-02-17 00:58:17 -08:00
James Zern
e02f16ef45 dec_neon.c: convert TransformDC to intrinsics
no noticeable difference in performance

Change-Id: Ia2d287289c3865ddd0fc99edaf7a030778aa7025
2014-02-14 12:11:58 -08:00
skal
9cba963f9a add missing file
Change-Id: I17eab2fedc64ee3bba941a592ecef765fcd2b402
2014-02-13 21:56:19 -08:00
skal
8992ddb756 use static clipping tables
(shared with mips32)
removed abs1[] table along the way

sub-1% speed-up, but still...

Change-Id: I8c29a8a0285076cb3423b01ffae9fcc465da6a81
2014-02-13 19:32:59 -08:00
skal
0235d5e44b 1-2% faster quantization in SSE2
C-version is a bit faster too (sub-1% faster on ARM)

Change-Id: I077262042f1d0937aba1ecf15174f2c51bf6cd97
2014-02-13 15:55:30 -08:00
Pascal Massimino
b2fbc36c26 fix VC12-x64 warning
"conversion from 'vp8l_atype_t' to 'uint8_t', possible loss of data"

Change-Id: I7607a688d16aca8fae8ce472450f8423c48f3a26
2014-02-12 12:19:32 -08:00
James Zern
6e37cb942f Merge "cosmetics: backward_references.c: reindent after a7d2ee3" 2014-02-11 16:28:55 -08:00
James Zern
a42ea9742a cosmetics: backward_references.c: reindent after a7d2ee3
a7d2ee3 Optimize cache estimate logic.

Change-Id: I81dd1eea49f603465dc5f3afae8a101e5205e963
2014-02-11 15:52:22 -08:00
skal
6c32744214 Merge "fix missing __BIG_ENDIAN__ definition on some platform" 2014-02-11 14:51:11 -08:00
skal
a8b6aad155 fix missing __BIG_ENDIAN__ definition on some platform
e.g: mips-gcc doesn't define __BIG_ENDIAN__

Change-Id: Ic06bf453164ddddc69a523e7845a4993e14a1af2
2014-02-11 14:43:44 -08:00
Vikas Arora
fde2904b8a Increase initial buffer size for VP8L Bit Writer.
Increase the initial buffer size for VP8L Bit Writer from 4bpp to 8bpp.
The resize buffer is expensive (requires realloc and copy) and this additional
memory (0.5 * W * H) doesn't add much overhead on the lossless encoder.

Change-Id: Ic1fe55cd7bc3d1afadc799e4c2c8786ec848ee66
2014-02-11 11:13:21 -08:00
Vikas Arora
a7d2ee39be Optimize cache estimate logic.
Optimize 'VP8LCalculateEstimateForCacheSize' for lower quality ranges (Q < 50).
The entropy is generally lower for higher cache_bits, so start searching from
higher cache_bits and settle for a local minima, instead of evaluating all
values.

This speeds up the lossless encoding at lower qualities by 10-15%.

Change-Id: I33c1e958515a2549f2e6f64b1aab3f128660dcec
2014-02-11 10:59:01 -08:00
pascal massimino
7fb6095b03 Merge "dec_neon.c: add TransformAC3" 2014-02-11 09:17:34 -08:00
skal
bf182e837e VP8LBitWriter: use a bit-accumulator
* simplify the endian logic
* remove the need for memset()
* write 16 or 32 at a time (likely aligned)

Makes the code a bit faster on ARM (~1%)

Change-Id: I650bc5654e8d0b0454318b7a78206b301c5f6c2c
2014-02-11 09:12:45 -08:00
Djordje Pesut
3f40b4a581 Merge "MIPS: MIPS32r1: clang macro warning resolved" 2014-02-11 00:06:36 -08:00
Urvang Joshi
1684f4ee37 WebP Decoder: Mark some truncated bitstreams as invalid
Specifically, check for truncated RIFF and/or VP8(L) chunks.

For more context, see:
https://code.google.com/p/webp/issues/detail?id=185

Change-Id: I91ca2dbf05080660fbc513244fc53adc57fc04b5
2014-02-10 16:35:27 -08:00
Jovan Zelincevic
acbedac475 MIPS: MIPS32r1: clang macro warning resolved
.set macro - Enables the expansion of macro instructions.

Change-Id: I1e44fe056798aeff803cc97171724d21da1fc2bf
2014-02-10 06:50:11 -08:00
James Zern
228e4877ab dec_neon.c: add TransformAC3
based on SSE2 version

Change-Id: Icc6782955253c98e83d5984153b596ef5f1c0d34
2014-02-08 12:47:54 -08:00
James Zern
393f89b763 Android.mk: avoid gcc-specific flags with clang
Change-Id: Idb1ed2bb1dd5d9f65ca07185ef9838e587dc4e64
2014-02-07 20:31:44 -08:00
skal
32aeaf115a revamp VP8LColorSpaceTransform() a bit
-> remove the 'color_transform' multiplier, use more constants, etc.

This function is particularly critical, mostly because of
GetBestColorTransformForTile().
Loop is a bit faster (maybe ~1%)

Change-Id: I90c96a3437cafb184773acef55c77e40c224388f
2014-02-05 10:37:06 +01:00
James Zern
0c7cc4ca20 Merge "Don't dereference NULL, ensure HashChain fully initialized" 2014-02-03 22:58:37 -08:00
Scott Talbot
391316fee2 Don't dereference NULL, ensure HashChain fully initialized
Found by clang's static analyzer, they look validly uninitialized
to me.

Change-Id: I650250f516cdf6081b35cdfe92288c20a3036ac8
2014-02-03 21:16:59 -08:00
skal
926ff40229 WEBP_SWAP_16BIT_CSP: remove code dup
and prepare for potentially supporting both RGBA4444 and BARG4444

Change-Id: If5200289bc6338757a2ceb2df1a19de732595052
2014-02-03 13:24:33 -08:00
Vikas Arora
1d1cd3bbd6 Fix decode bug for rgbA_4444/RGBA_4444 color-modes.
The WEBP_SWAP_16BIT_CSP flag needs to be honored while filling the Alpha (4 bits)
data in the destination buffer and while pre-multiplying the alpha to RGB colors.

Change-Id: I3b07307d60963db8d09c3b078888a839cefb35ba
2014-02-03 09:20:54 -08:00
Pascal Massimino
939e70e7d3 update AUTHORS file
Change-Id: I50e8f20016097cf63eaeb46a8588203a2165b161
2014-01-31 12:08:05 -08:00
James Zern
8934a622ac cosmetics: *_mips32.c
indent, comments, unused includes

Change-Id: Id0aabc52d05bb633f62aec022155ec27699cf5a0
2014-01-30 18:03:48 -08:00
Djordje Pesut
dd438c9a7d MIPS: MIPS32r1: Optimization of some simple point-sampling functions. PATCH [6/6]
Change-Id: I2020e71e9be5d17d4bf67cabf6c470ca43d5d838
2014-01-29 15:37:31 +01:00
Djordje Pesut
53520911c3 Added support for calling sampling functions via pointers.
Change-Id: Ic4d72e6b175a6b27bcdcc8cd97828e44ea93e743
2014-01-29 15:32:35 +01:00
Jovan Zelincevic
d16c69749b MIPS: MIPS32r1: Optimization of filter functions. PATCH [5/6]
Change-Id: Ifbd305e0514f09a587db02c3970f22190808503a
2014-01-29 15:03:45 +01:00
Djordje Pesut
04336fc7f8 MIPS: MIPS32r1: Optimization of function TransformOne. PATCH [4/6]
Change-Id: I5b98e2de940977538cf91bfa2128f4d1daa5c170
2014-01-28 20:10:43 -08:00
Djordje Pesut
92d8fc7dd4 MIPS: MIPS32r1: Optimization of function WebPRescalerImportRow. PATCH [3/6]
Change-Id: I32339a8d2d03f1a8d8638563d2b2c9e3a13a4909
2014-01-29 00:12:41 +01:00
skal
bbc23ff34c parse one row of intra modes altogether
(instead of per-macroblock)

speed unchanged.
simplified the context-saving for incremental decoding

Change-Id: I301be581bab581ff68de14c4ffe5bc0ec63f34be
2014-01-28 21:40:40 +01:00
James Zern
a2f608f9e0 Merge "MIPS: MIPS32r1: Optimization of function WebPRescalerExportRow. [2/6]" 2014-01-28 11:19:25 -08:00
Djordje Pesut
88230854e3 MIPS: MIPS32r1: Optimization of function WebPRescalerExportRow. [2/6]
Change-Id: I420e646deefac5df9a8dcec3b02da8dff02b1167
2014-01-28 11:02:01 -08:00
James Zern
c5a5b0286f decode mt+incremental: fix segfault in debug builds
VP8GetThreadMethod() may be called with a NULL headers param; correct an
assert.

broken since:
8a2fa09 Add a second multi-thread method

Change-Id: If7b6d1b8f4ec874d343a806cee5f5e6bb6438620
2014-01-27 20:33:05 -08:00
skal
9882b2f953 always use fast-analysis for all methods.
This makes the segmentation overall less prone to
local-optimum  or boundary effect.

(and overall, encoding is a little faster)

Change-Id: I35688098b0f43c28b5cb81c4a92e1575bb0eddb9
2014-01-24 18:17:04 +01:00
James Zern
000adac011 Merge "autoconf: update ax_pthread.m4" 2014-01-24 01:13:11 -08:00
James Zern
2d2fc37d98 update .gitignore
- generalize *.pc for libwebp(mux|demux)
- add auto-generated mkinstalldirs

Change-Id: I7543209722d6b64830fa61f88eccf54165920cc4
2014-01-23 19:35:28 -08:00
James Zern
5bf4255af4 Merge "Make it possible to avoid automagic dependencies" 2014-01-23 12:57:04 -08:00
Pascal Massimino
c1cb1933d5 disable NEON for arm64 platform
The registers and instructions are quite different to 32bit
and the assembly code needs a rewrite.

more info: http://people.linaro.org/~rikuvoipio/aarch64-talk/

Change-Id: Id75dbc1b7bf47f43a426ba2831f25bb8fa252c4f
2014-01-23 12:35:01 -08:00
Paweł Hajdan, Jr
73a304e953 Make it possible to avoid automagic dependencies
See
 https://wiki.gentoo.org/wiki/Project:Quality_Assurance/Automagic_dependencies
for more info.
This patch makes it possible to request several dependencies not to be
used even if they are present on the system.

Change-Id: Ib32f60c001ade01eda3e89463279725ee4243ca2
2014-01-23 12:29:09 -08:00
Jovan Zelincevic
4d493f8db2 MIPS: MIPS32r1: Decoder bit reader function optimized. PATCH [1/6]
Solved unaligned load and added MIPS32r1 support for VP8LoadNewBytes.

Change-Id: I7782e668a8f2b9dace0ab6afeaff3449bbc7d90b
2014-01-17 22:58:40 +01:00
skal
c741183c10 make WebPCleanupTransparentArea work with argb picture
the -alpha_cleanup flag was ineffective since we switched cwebp
to using ARGB input always.

Original idea by David Eckel (dvdckl at gmail dot com)

Change-Id: I0917a8b91ce15a43199728ff4ee2a163be443bab
2014-01-17 20:19:16 +01:00
skal
5da185522b add a decoding option to flip image vertically
New API options: WebPDecoderOptions.flip and 'dwebp -flip ...'

it uses negative stride trick.

Also changed the decoder code to support user-supplied
buffers with negative stride, independently of the
WebPDecoderOptions.flip value.

Change-Id: I4dc0d06f0c87e51a3f3428be4fee2d6b5ad76053
2014-01-16 15:48:43 +01:00
James Zern
00c3c4e114 Merge "add man/vwebp.1" 2014-01-10 17:47:57 -08:00
James Zern
2c6bb42806 add man/vwebp.1
Change-Id: I3257fdeb7d48af99946f751e3e2cd07660117255
2014-01-10 17:43:03 -08:00
James Zern
ea59a8e980 Merge "Merge tag 'v0.4.0'" 2014-01-10 17:09:19 -08:00
skal
7574bed43d fix comments related to array sizes
thanks to foivos_g4 at hotmail dot com for spotting this.

Change-Id: I8cd301bae58a6edbc3ef47607654bc21321721ca
2014-01-09 17:30:25 +01:00
James Zern
0b5a90fdb5 dwebp.1: fix option formatting
Change-Id: I427e003cacb3e71ae6f6adc632dbb93be2eea0e7
2014-01-08 15:16:37 -08:00
James Zern
effcb0fdfa Merge tag 'v0.4.0'
libwebp 0.4.0
- 12/19/13: version 0.4.0
  * improved gif2webp tool
  * numerous fixes, compression improvement and speed-up
  * dither option added to decoder (dwebp -dither 50 ...)
  * improved multi-threaded modes (-mt option)
  * improved filtering strength determination
  * New function: WebPMuxGetCanvasSize
  * BMP and TIFF format output added to 'dwebp'
  * Significant memory reduction for decoding lossy images with alpha.
  * Intertwined decoding of RGB and alpha for a shorter
    time-to-first-decoded-pixel.
  * WebPIterator has a new member 'has_alpha' denoting whether the frame
    contains transparency.
  * Container spec amended with new 'blending method' for animation.

* tag 'v0.4.0':
  update ChangeLog
  update NEWS description with new general features
  gif2webp: don't use C99 %zu
  cwebp: fix metadata output w/lossy+alpha
  makefile.unix: clean up libgif2webp_util.a
  update Changelog
  bump version to 0.4.0
  update AUTHORS & .mailmap
  update NEWS for 0.4.0

Change-Id: I6df1b512fe0b697192f1f9b29431cd59aa6063d1
2013-12-30 09:57:59 -08:00
James Zern
7c76255dd4 autoconf: update ax_pthread.m4
from git://git.savannah.gnu.org/autoconf-archive.git
100644 blob d383ad5c6d	m4/ax_pthread.m4

dd94691 AX_PTHREAD: add support for Clang
386b290 AX_PTHREAD: fix support for AIX
e40cfd6 AX_PTHREAD: improve support for AIX

Change-Id: Ic1f1edc656e4e917b88f151493147f650b94242f
2013-12-29 11:56:21 -05:00
skal
fff2a11b03 make -short work with -print_ssim, -print_psnr, etc.
PSNR was always printed when using -short

Change-Id: I826785917961d355a980e4ac27dd1cf1295cb6de
2013-12-23 12:03:00 +01:00
143 changed files with 16199 additions and 6531 deletions

1
.gitattributes vendored
View File

@@ -2,3 +2,4 @@
.gitignore export-ignore
.mailmap export-ignore
*.pdf -text -diff
*.ppm -text -diff

4
.gitignore vendored
View File

@@ -1,5 +1,6 @@
*.l[ao]
*.[ao]
*.pc
.deps
.libs
/aclocal.m4
@@ -14,12 +15,15 @@
/libtool
/ltmain.sh
/missing
/mkinstalldirs
/stamp-h1
Makefile
Makefile.in
examples/[cdv]webp
examples/gif2webp
examples/webpmux
src/webp/config.h*
src/webp/stamp-h1
/output
/doc/output
*.idb

View File

@@ -5,3 +5,4 @@ Pascal Massimino <pascal.massimino@gmail.com>
Vikas Arora <vikasa@google.com>
<vikasa@google.com> <vikasa@gmail.com>
<vikasa@google.com> <vikaas.arora@gmail.com>
<slobodan.prijic@imgtec.com> <Slobodan.Prijic@imgtec.com>

View File

@@ -1,18 +1,25 @@
Contributors:
- Charles Munger (clm at google dot com)
- Christian Duvivier (cduvivier at google dot com)
- Djordje Pesut (djordje dot pesut at imgtec dot com)
- James Zern (jzern at google dot com)
- Jan Engelhardt (jengelh at medozas dot de)
- Johann (johann dot koenig at duck dot com)
- Jovan Zelincevic (jovan dot zelincevic at imgtec dot com)
- Jyrki Alakuijala (jyrki at google dot com)
- levytamar82 (tamar dot levy at intel dot com)
- Lou Quillio (louquillio at google dot com)
- Mans Rullgard (mans at mansr dot com)
- Martin Olsson (mnemo at minimum dot se)
- Mikołaj Zalewski (mikolajz at google dot com)
- Noel Chromium (noel at chromium dot org)
- Pascal Massimino (pascal dot massimino at gmail dot com)
- Paweł Hajdan, Jr (phajdan dot jr at chromium dot org)
- Pierre Joye (pierre dot php at gmail dot com)
- Scott LaVarnway (slavarnway at google dot com)
- Scott Talbot (s at chikachow dot org)
- Slobodan Prijic (slobodan dot prijic at imgtec dot com)
- Somnath Banerjee (somnath dot banerjee at gmail dot com)
- Timothy Gu (timothygu99 at gmail dot com)
- Urvang Joshi (urvang at google dot com)
- Vikas Arora (vikasa at google dot com)

View File

@@ -3,33 +3,65 @@ LOCAL_PATH := $(call my-dir)
WEBP_CFLAGS := -Wall -DANDROID -DHAVE_MALLOC_H -DHAVE_PTHREAD -DWEBP_USE_THREAD
ifeq ($(APP_OPTIM),release)
WEBP_CFLAGS += -finline-functions -frename-registers -ffast-math -s \
WEBP_CFLAGS += -finline-functions -ffast-math \
-ffunction-sections -fdata-sections
ifeq ($(findstring clang,$(NDK_TOOLCHAIN_VERSION)),)
WEBP_CFLAGS += -frename-registers -s
endif
endif
include $(CLEAR_VARS)
ifneq ($(findstring armeabi-v7a, $(TARGET_ARCH_ABI)),)
# Setting LOCAL_ARM_NEON will enable -mfpu=neon which may cause illegal
# instructions to be generated for armv7a code. Instead target the neon code
# specifically.
NEON := c.neon
else
NEON := c
endif
LOCAL_SRC_FILES := \
dec_srcs := \
src/dec/alpha.c \
src/dec/buffer.c \
src/dec/frame.c \
src/dec/idec.c \
src/dec/io.c \
src/dec/layer.c \
src/dec/quant.c \
src/dec/tree.c \
src/dec/vp8.c \
src/dec/vp8l.c \
src/dec/webp.c \
demux_srcs := \
src/demux/demux.c \
dsp_dec_srcs := \
src/dsp/alpha_processing.c \
src/dsp/alpha_processing_sse2.c \
src/dsp/cpu.c \
src/dsp/dec.c \
src/dsp/dec_clip_tables.c \
src/dsp/dec_mips32.c \
src/dsp/dec_neon.$(NEON) \
src/dsp/dec_sse2.c \
src/dsp/enc.c \
src/dsp/enc_sse2.c \
src/dsp/lossless.c \
src/dsp/lossless_mips32.c \
src/dsp/lossless_neon.$(NEON) \
src/dsp/lossless_sse2.c \
src/dsp/upsampling.c \
src/dsp/upsampling_neon.$(NEON) \
src/dsp/upsampling_sse2.c \
src/dsp/yuv.c \
src/dsp/yuv_mips32.c \
src/dsp/yuv_sse2.c \
dsp_enc_srcs := \
src/dsp/enc.c \
src/dsp/enc_avx2.c \
src/dsp/enc_mips32.c \
src/dsp/enc_neon.$(NEON) \
src/dsp/enc_sse2.c \
enc_srcs := \
src/enc/alpha.c \
src/enc/analysis.c \
src/enc/backward_references.c \
@@ -39,60 +71,145 @@ LOCAL_SRC_FILES := \
src/enc/frame.c \
src/enc/histogram.c \
src/enc/iterator.c \
src/enc/layer.c \
src/enc/picture.c \
src/enc/picture_csp.c \
src/enc/picture_psnr.c \
src/enc/picture_rescale.c \
src/enc/picture_tools.c \
src/enc/quant.c \
src/enc/syntax.c \
src/enc/token.c \
src/enc/tree.c \
src/enc/vp8l.c \
src/enc/webpenc.c \
src/utils/alpha_processing.c \
mux_srcs := \
src/mux/muxedit.c \
src/mux/muxinternal.c \
src/mux/muxread.c \
utils_dec_srcs := \
src/utils/bit_reader.c \
src/utils/bit_writer.c \
src/utils/color_cache.c \
src/utils/filters.c \
src/utils/huffman.c \
src/utils/huffman_encode.c \
src/utils/quant_levels.c \
src/utils/quant_levels_dec.c \
src/utils/random.c \
src/utils/rescaler.c \
src/utils/thread.c \
src/utils/utils.c \
utils_enc_srcs := \
src/utils/bit_writer.c \
src/utils/huffman_encode.c \
src/utils/quant_levels.c \
################################################################################
# libwebpdecoder
include $(CLEAR_VARS)
LOCAL_SRC_FILES := \
$(dec_srcs) \
$(dsp_dec_srcs) \
$(utils_dec_srcs) \
LOCAL_CFLAGS := $(WEBP_CFLAGS)
LOCAL_C_INCLUDES += $(LOCAL_PATH)/src
# prefer arm over thumb mode for performance gains
LOCAL_ARM_MODE := arm
ifeq ($(TARGET_ARCH_ABI),armeabi-v7a)
# Setting LOCAL_ARM_NEON will enable -mfpu=neon which may cause illegal
# instructions to be generated for armv7a code. Instead target the neon code
# specifically.
LOCAL_SRC_FILES += src/dsp/dec_neon.c.neon
LOCAL_SRC_FILES += src/dsp/upsampling_neon.c.neon
LOCAL_SRC_FILES += src/dsp/enc_neon.c.neon
endif
LOCAL_STATIC_LIBRARIES := cpufeatures
LOCAL_MODULE := webp
LOCAL_MODULE := webpdecoder_static
include $(BUILD_STATIC_LIBRARY)
ifeq ($(ENABLE_SHARED),1)
include $(CLEAR_VARS)
LOCAL_WHOLE_STATIC_LIBRARIES := webpdecoder_static
LOCAL_MODULE := webpdecoder
include $(BUILD_SHARED_LIBRARY)
endif # ENABLE_SHARED=1
################################################################################
# libwebp
include $(CLEAR_VARS)
LOCAL_SRC_FILES := \
examples/dwebp.c \
examples/example_util.c \
$(dsp_enc_srcs) \
$(enc_srcs) \
$(utils_enc_srcs) \
LOCAL_CFLAGS := $(WEBP_CFLAGS)
LOCAL_C_INCLUDES := $(LOCAL_PATH)/src
LOCAL_STATIC_LIBRARIES := webp
LOCAL_C_INCLUDES += $(LOCAL_PATH)/src
LOCAL_MODULE := dwebp
# prefer arm over thumb mode for performance gains
LOCAL_ARM_MODE := arm
include $(BUILD_EXECUTABLE)
LOCAL_WHOLE_STATIC_LIBRARIES := webpdecoder_static
LOCAL_MODULE := webp
ifeq ($(ENABLE_SHARED),1)
include $(BUILD_SHARED_LIBRARY)
else
include $(BUILD_STATIC_LIBRARY)
endif
################################################################################
# libwebpdemux
include $(CLEAR_VARS)
LOCAL_SRC_FILES := $(demux_srcs)
LOCAL_CFLAGS := $(WEBP_CFLAGS)
LOCAL_C_INCLUDES += $(LOCAL_PATH)/src
# prefer arm over thumb mode for performance gains
LOCAL_ARM_MODE := arm
LOCAL_MODULE := webpdemux
ifeq ($(ENABLE_SHARED),1)
LOCAL_SHARED_LIBRARIES := webp
include $(BUILD_SHARED_LIBRARY)
else
LOCAL_STATIC_LIBRARIES := webp
include $(BUILD_STATIC_LIBRARY)
endif
################################################################################
# libwebpmux
include $(CLEAR_VARS)
LOCAL_SRC_FILES := $(mux_srcs)
LOCAL_CFLAGS := $(WEBP_CFLAGS)
LOCAL_C_INCLUDES += $(LOCAL_PATH)/src
# prefer arm over thumb mode for performance gains
LOCAL_ARM_MODE := arm
LOCAL_MODULE := webpmux
ifeq ($(ENABLE_SHARED),1)
LOCAL_SHARED_LIBRARIES := webp
include $(BUILD_SHARED_LIBRARY)
else
LOCAL_STATIC_LIBRARIES := webp
include $(BUILD_STATIC_LIBRARY)
endif
################################################################################
include $(LOCAL_PATH)/examples/Android.mk
$(call import-module,android/cpufeatures)

483
ChangeLog
View File

@@ -1,3 +1,486 @@
569fe57 update NEWS
bd852f5 bump version to 0.4.3
2d58b64 WebPPictureRescale: add a note about 0 width/height
a0d8ca5 examples/Android.mk: add webpmux_example target
34b1d29 Android.mk: add webpmux target
7561988 Android.mk: add webpdemux target
a987576 Android.mk: add webpdecoder{,_static} targets
a6d4859 Android.mk: split source lists per-directory
77544d5 fix iOS arm64 build with Xcode 6.3
6dea157 doc/webp-container-spec: note MSB order for chunk diagrams
f7cd57b doc/webp-container-spec: cosmetics
1d6b250 vwebp: clear canvas at the beginning of each loop
f97b3f8 webp-container-spec: clarify background clear on loop
4ba83c1 vwebp: remove unnecessary static Help() prototype
d34e8e3 vwebp/animation: display last frame on end-of-loop
bbbc524 dec/vp8: clear 'dither_' on skipped blocks
0339fa2 lossless_neon: enable subtract green for aarch64
5a0c220 Regression fix for lossless decoding
6e3a31d wicdec: (msvs) quiet some /analyze warnings
b49a578 dwebp/WritePNG: mark png variables volatile
0a4391a dwebp: include setjmp.h w/WEBP_HAVE_PNG
90f1ec5 dwebp: correct sign in format strings
b61ce86 VP8LEncodeStream: add an assert
df1081b dsp/cpu: (msvs) add include for __cpuidex
39aa055 dsp/cpu: (msvs) avoid immintrin.h on _M_ARM
f814f42 dsp/cpu: add include for _xgetbv() w/MSVS
8508ab9 cpu: fix AVX2 detection for gcc/clang targets
5769623 fix handling of zero-sized partition #0 corner case
b2e71a9 make the 'last_cpuinfo_used' variable names unique
1273e84 add -Wformat-nonliteral and -Wformat-security
3ae78eb multi-thread fix: lock each entry points with a static var
5c1eeda webp-container-spec: remove references to fragments
c5ceea4 enc_neon: fix building with non-Xcode clang (iOS)
d0859d6 iosbuild: add x64_64 simulator support
046732c WebPEncode: Support encoding same pic twice (even if modified)
4426f50 webp/types.h: use inline for clang++/-std=c++11
e297fc7 gif2webp: Use the default hint instead of WEBP_HINT_GRAPH.
855fe43 Makefile.vc: add a 'legacy' RTLIBCFG option
b7eb6d5 gif2webp: Support GIF_DISPOSE_RESTORE_PREVIOUS
5691bdd gif2webp: Handle frames with odd offsets + disposal to background.
8301da1 stopwatch.h: fix includes
6a2209a update ChangeLog (tag: v0.4.2, origin/0.4.2, 0.4.2)
36cad6a bit_reader.h: cosmetics: fix a typo
e2ecae6 enc_mips32: workaround gcc-4.9 bug
243e68d update ChangeLog (tag: v0.4.2-rc2)
eec5f5f enc/vp8enci.h: update version number
0c1b98d update NEWS
69b0fc9 update AUTHORS
857578a bump version to 0.4.2
9129deb restore encode API compatibility
f17b95e AssignSegments: quiet -Warray-bounds warning
9c56c8a enc_neon: initialize vectors w/vdup_n_u32
a008902 iosbuild: cleanup
cc6de53 iosbuild: output autoconf req. on failure
740d765 iosbuild: make iOS 6 the minimum requirement
403023f iobuild.sh: only install .h files in Headers
b65727b Premultiply with alpha during U/V downsampling
8de0deb gif2webp: Background color correction
f8b7d94 Amend the lossless spec according to issue #205, #206 and #224
9102a7b Add a WebPExtractAlpha function to dsp
e407b5d webpmux: simplify InitializeConfig()
3e70e64 webpmux: fix indent
be38f1a webpmux: fix exit status on numeric value parse error
94dadcb webpmux: fix loop_count range check
40b3a61 examples: warn on invalid numeric parameters
b7d209a gif2webp: Handle frames with missing graphic control extension
bf0eb74 configure: simplify libpng-config invocation
3740f7d Rectify bug in lossless incremental decoding.
3ab0a37 make VP8LSetBitPos() set br->eos_ flag
2e4312b Lossless decoding: fix eos_ flag condition
e6609ac fix erroneous dec->status_ setting
5692eae add a fallback to ALPHA_NO_COMPRESSION
6ecd5bf ExUtilReadFromStdin: (windows) open stdin in bin mode
4206ac6 webpmux: (windows) open stdout in binary mode
d40e885 cwebp: (windows) open stdout in binary mode
4aaf463 example_util: add ExUtilSetBinaryMode
4c82ff7 webpmux man page: Clarify some title, descriptions and examples
23d4fb3 dsp/lossless: workaround gcc-4.9 bug on arm
5af7719 dsp.h: collect gcc/clang version test macros
90d1124 enc_neon: enable QuantizeBlock for aarch64
ee78e78 SmartRGBYUV: fix odd-width problem with pixel replication
c9ac204 fix some MSVC64 warning about float conversion
f4497a1 cpu: check for _MSC_VER before using msvc inline asm
e2159fd faster RGB->YUV conversion function (~7% speedup)
21abaa0 Add smart RGB->YUV conversion option -pre 4
1a161e2 configure: add work around for gcc-4.9 aarch64 bug
55b10de MIPS: mips32r2: added optimization for BSwap32
76d2192 Update PATENTS to reflect s/VP8/WebM/g
29a9db1 MIPS: detect mips32r6 and disable mips32r1 code
245c4a6 Correctly use the AC_CANONICAL_* macros
40aa8b6 cosmetics
2ddcca5 cosmetics: remove some extraneous 'extern's
f40dd7c vp8enci.h: cosmetics: fix '*' placement
4610c9c bit_writer: cosmetics: rename kFlush() -> Flush()
fc3c175 dsp: detect mips64 & disable mips32 code
c1a7955 cwebp.1: restore quality description
57a7e73 correct alpha_dithering_strength ABI check
6c83157 correct WebPMemoryWriterClear ABI check
8af2771 update ChangeLog (tag: v0.4.1, origin/0.4.1, 0.4.1)
f59c0b4 iosbuild.sh: specify optimization flags
8d34ea3 update ChangeLog (tag: v0.4.1-rc1)
dbc3da6 makefile.unix: add vwebp.1 to the dist target
89a7c83 update ChangeLog
ffe67ee Merge "update NEWS for the next release" into 0.4.1
2def1fe gif2webp: dust up the help message
fb668d7 remove -noalphadither option from README/vwebp.1
e49f693 update NEWS for the next release
cd01358 Merge "update AUTHORS" into 0.4.1
268d01e update AUTHORS
85213b9 bump version to 0.4.1
695f80a Merge "restore mux API compatibility" into 0.4.1
862d296 restore mux API compatibility
8f6f8c5 remove the !WEBP_REFERENCE_IMPLEMENTATION tweak in Put8x8uv
d713a69 Merge changes If4debc15,I437a5d5f into 0.4.1
c2fc52e restore encode API compatibility
793368e restore decode API compatibility
b8984f3 gif2webp: fix compile with giflib 5.1.0
222f9b1 gif2webp: simplify giflib version checking
d2cc61b Extend MakeARGB32() to accept Alpha channel.
4595b62 Merge "use explicit size of kErrorMessages[] arrays"
157de01 Merge "Actuate memory stats for PRINT_MEMORY_INFO"
fbda2f4 JPEG decoder: delay conversion to YUV to WebPEncode() call
0b747b1 use explicit size of kErrorMessages[] arrays
3398d81 Actuate memory stats for PRINT_MEMORY_INFO
6f3202b Merge "move WebPPictureInit to picture.c"
6c347bb move WebPPictureInit to picture.c
fb3acf1 fix configure message for multi-thread
40b086f configure: check for _beginthreadex
1549d62 reorder the YUVA->ARGB and ARGB->YUVA functions correctly
c6461bf Merge "extract colorspace code from picture.c into picture_csp.c"
736f2a1 extract colorspace code from picture.c into picture_csp.c
645daa0 Merge "configure: check for -Wformat-security"
abafed8 configure: check for -Wformat-security
fbadb48 split monolithic picture.c into picture_{tools,psnr,rescale}.c
c76f07e dec_neon/TransformAC3: initialize vector w/vcreate
bb4fc05 gif2webp: Allow single-frame animations
46fd44c thread: remove harmless race on status_ in End()
5a1a726 Merge "configure: check for __builtin_bswapXX()"
6781423 configure: check for __builtin_bswapXX()
6450c48 configure: fix iOS builds
6422e68 VP8LFillBitWindow: enable fast path for 32-bit builds
4f7f52b VP8LFillBitWindow: respect WEBP_FORCE_ALIGNED
e458bad endian_inl.h: implement htoleXX with BSwapXX
f2664d1 endian_inl.h: add BSwap16
6fbf534 Merge "configure: add --enable-aligned"
dc0f479 configure: add --enable-aligned
9cc69e2 Merge "configure: support WIC + OpenGL under mingw64"
257adfb remove experimental YUV444 YUV422 and YUV400 code
10f4257 configure: support WIC + OpenGL under mingw64
380cca4 configure.ac: add AC_C_BIGENDIAN
ee70a90 endian_inl.h: add BSwap64
47779d4 endian_inl.h: add BSwap32
d5104b1 utils: add endian_inl.h
58ab622 Merge "make alpha-detection loop in IsKeyFrame() in good x/y order"
9d56290 make alpha-detection loop in IsKeyFrame() in good x/y order
516971b lossless: Remove unaligned read warning
b8b596f Merge "configure.ac: add an autoconf version prerequisite"
34b02f8 configure.ac: add an autoconf version prerequisite
e59f536 neon: normalize vdup_n_* usage
6ee7160 Merge changes I0da7b3d3,Idad2f278,I4accc305
abc02f2 Merge "fix (uncompiled) typo"
bc03670 neon: add INIT_VECTOR4
6c1c632 neon: add INIT_VECTOR3
dc7687e neon: add INIT_VECTOR2
4536e7c add WebPMuxSetCanvasSize() to the mux API
824eab1 fix (uncompiled) typo
1f3e5f1 remove unused 'shift' argument and QFIX2 define
8e86705 Merge "VP8LoadNewBytes: use __builtin_bswap32 if available"
1b6a263 Merge "Fix handling of weird GIF with canvas dimension 0x0"
1da3d46 VP8LoadNewBytes: use __builtin_bswap32 if available
1582e40 Fix handling of weird GIF with canvas dimension 0x0
b8811da Merge "rename interface -> winterface"
db8b8b5 Fix logic in the GIF LOOP-detection parsing
25aaddc rename interface -> winterface
5584d9d make WebPSetWorkerInterface() check its arguments
a9ef7ef Merge "cosmetics: update thread.h comments"
c6af999 Merge "dust up the help message"
0a8b886 dust up the help message
a9cf319 cosmetics: update thread.h comments
27bfeee QuantizeBlock SSE2 Optimization:
2bc0dc3 Merge "webpmux: warn when odd frame offsets are used"
3114ebe Merge changes Id8edd3c1,Id418eb96,Ide05e3be
c072663 webpmux: warn when odd frame offsets are used
c5c6b40 Merge "add alpha dithering for lossy"
d514678 examples/Android.mk: add cwebp
ca0fa7c Android.mk: move dwebp to examples/Android.mk
73d8fca Android.mk: add ENABLE_SHARED flag
6e93317 muxread: fix out of bounds read
8b0f6a4 Makefile.vc: fix CFLAGS assignment w/HAVE_AVX2=1
bbe32df add alpha dithering for lossy
7902076 Merge "make error-code reporting consistent upon malloc failure"
77bf441 make error-code reporting consistent upon malloc failure
7a93c00 **/Makefile.am: remove unused AM_CPPFLAGS
24e3080 Add an interface abstraction to the WebP worker thread implementation
d6cd635 Merge "fix orig_rect==NULL case"
2bfd1ff fix orig_rect==NULL case
059e21c Merge "configure: move config.h to src/webp/config.h"
f05fe00 properly report back encoding error code in WebPFrameCacheAddFrame()
32b3137 configure: move config.h to src/webp/config.h
90090d9 Merge changes I7c675e51,I84f7d785
ae7661b makefiles: define WEBP_HAVE_AVX2 when appropriate
69fce2e remove the special casing for res->first in VP8SetResidualCoeffs
6e61a3a configure: test for -msse2
b9d2efc rename upsampling_mips32.c to yuv_mips32.c
bdfeeba dsp/yuv: move sse2 functions to yuv_sse2.c
46b32e8 Merge "configure: set WEBP_HAVE_AVX2 when available"
88305db Merge "VP8RandomBits2: prevent signed int overflow"
73fee88 VP8RandomBits2: prevent signed int overflow
db4860b enc_sse2: prevent signed int overflow
3fdaf4d Merge "real fix for longjmp warning"
385e334 real fix for longjmp warning
230a055 configure: set WEBP_HAVE_AVX2 when available
a2ac8a4 restore original value_/range_ field order
5e2ee56 Merge "remove libwebpdspdecode dep on libwebpdsp_avx2"
61362db remove libwebpdspdecode dep on libwebpdsp_avx2
42c447a Merge "lossy bit-reader clean-up:"
479ffd8 Merge "remove unused #include's"
9754d39 Merge "strong filtering speed-up (~2-3% x86, ~1-2% for NEON)"
158aff9 remove unused #include's
09545ee lossy bit-reader clean-up:
ea8b0a1 strong filtering speed-up (~2-3% x86, ~1-2% for NEON)
6679f89 Optimize VP8SetResidualCoeffs.
ac591cf fix for gcc-4.9 warnings about longjmp + local variables
4dfa86b dsp/cpu: NaCl has no support for xgetbv
4c39869 Merge "cwebp: fallback to native webp decode in WIC builds"
33aa497 Merge "cwebp: add some missing newlines in longhelp output"
c9b340a fix missing WebPInitAlphaProcessing call for premultiplied colorspace output
57897ba Merge "lossless_neon: use vcreate_*() where appropriate"
6aa4777 Merge "(enc|dec)_neon: use vcreate_*() where appropriate"
0d346e4 Always reinit VP8TransformWHT instead of hard-coding
7d039fc cwebp: fallback to native webp decode in WIC builds
d471f42 cwebp: add some missing newlines in longhelp output
bf0e003 lossless_neon: use vcreate_*() where appropriate
9251c2f (enc|dec)_neon: use vcreate_*() where appropriate
399b916 lossy decoding: correct alpha-rescaling for YUVA format
78c12ed Merge "Makefile.vc: add rudimentary avx2 support"
dc5b122 try to remove the spurious warning for static analysis
ddfefd6 Makefile.vc: add rudimentary avx2 support
a891164 Merge "simplify VP8LInitBitReader()"
fdbcd44 simplify VP8LInitBitReader()
7c00428 makefile.unix: add rudimentary avx2 support
515e35c Merge "add stub dsp/enc_avx2.c"
a05dc14 SSE2: yuv->rgb speed-up for point-sampling
178e9a6 add stub dsp/enc_avx2.c
1b99c09 Merge "configure: add a test for -mavx2"
fe72807 configure: add a test for -mavx2
e46a247 cpu: fix check for __cpuidex availability
176fda2 fix the bit-writer for lossless in 32bit mode
541784c dsp.h: add a check for AVX2 / define WEBP_USE_AVX2
bdb151e dsp/cpu: add AVX2 detection
ab9f2f8 Merge "revamp the point-sampling functions by processing a full plane"
a2f8b28 revamp the point-sampling functions by processing a full plane
ef07602 use decoder's DSP functions for autofilter
2b5cb32 Merge "dsp/cpu: add AVX detection"
df08e67 dsp/cpu: add AVX detection
e2f405c Merge "clean-up and slight speed-up in-loop filtering SSE2"
f60957b clean-up and slight speed-up in-loop filtering SSE2
9fc3ae4 .gitattributes: treat .ppm as binary
3da924b Merge "dsp/WEBP_USE_NEON: test for __aarch64__"
c716449 Android.mk: always include *_neon.c in the build
a577b23 dsp/WEBP_USE_NEON: test for __aarch64__
54bfffc move RemapBitReader() from idec.c to bit_reader code
34168ec Merge "remove all unused layer code"
f1e7717 remove all unused layer code
b0757db Code cleanup for VP8LGetHistoImageSymbols.
5fe628d make the token page size be variable instead of fixed 8192
f948d08 memory debug: allow setting pre-defined malloc failure points
ca3d746 use block-based allocation for backward refs storage, and free-lists
1ba61b0 enable NEON intrinsics in aarch64 builds
b9d2bb6 dsp/neon.h: coalesce intrinsics-related defines
b5c7525 iosbuild: add support for iOSv7/aarch64
9383afd Reduce number of memory allocations while decoding lossless.
888e63e Merge "dsp/lossless: prevent signed int overflow in left shift ops"
8137f3e Merge "instrument memory allocation routines for debugging"
2aa1873 instrument memory allocation routines for debugging
d3bcf72 Don't allocate VP8LHashChain, but treat like automatic object
bd6b861 dsp/lossless: prevent signed int overflow in left shift ops
b7f19b8 Merge "dec/vp8l: prevent signed int overflow in left shift ops"
29059d5 Merge "remove some uint64_t casts and use."
e69a1df dec/vp8l: prevent signed int overflow in left shift ops
cf5eb8a remove some uint64_t casts and use.
38e2db3 MIPS: MIPS32r1: Added optimization for HistogramAdd.
e0609ad dwebp: fix exit code on webp load failure
bbd358a Merge "example_util.h: avoid forward declaring enums"
8955da2 example_util.h: avoid forward declaring enums
6d6865f Added SSE2 variants for Average2/3/4
b3a616b make HistogramAdd() a pointer in dsp
c8bbb63 dec_neon: relocate some inline-asm defines
4e393bb dec_neon: enable intrinsics-only functions
ba99a92 dec_neon: use positive tests for USE_INTRINSICS
69058ff Merge "example_util: add ExUtilDecodeWebPIncremental"
a7828e8 dec_neon: make WORK_AROUND_GCC conditional on version
3f3d717 Merge "enc_neon: enable intrinsics-only functions"
de3cb6c Merge "move LOCAL_GCC_VERSION def to dsp.h"
1b2fe14 example_util: add ExUtilDecodeWebPIncremental
ca49e7a Merge "enc_neon: move Transpose4x4 to dsp/neon.h"
ad900ab Merge "fix warning about size_t -> int conversion"
4825b43 fix warning about size_t -> int conversion
42b35e0 enc_neon: enable intrinsics-only functions
f937e01 move LOCAL_GCC_VERSION def to dsp.h
5e1a17e enc_neon: move Transpose4x4 to dsp/neon.h
c7b92a5 dec_neon: (WORK_AROUND_GCC) delete unused Load4x8
8e5f90b Merge "make ExUtilLoadWebP() accept NULL bitstream param."
05d4c1b Merge "cwebp: add webpdec"
ddeb6ac cwebp: add webpdec
35d7d09 Merge "Reduce memory footprint for encoding WebP lossless."
0b89610 Reduce memory footprint for encoding WebP lossless.
f0b65c9 make ExUtilLoadWebP() accept NULL bitstream param.
9c0a60c Merge "dwebp: move webp decoding to example_util"
1d62acf MIPS: MIPS32r1: Added optimization for HuffmanCost functions.
4a0e739 dwebp: move webp decoding to example_util
c022046 Merge "Bugfix: Incremental decode of lossy-alpha"
8c7cd72 Bugfix: Incremental decode of lossy-alpha
7955152 MIPS: fix error with number of registers.
b1dabe3 Merge "Move the HuffmanCost() function to dsp lib"
75b1200 Move the HuffmanCost() function to dsp lib
2772b8b MIPS: fix assembler error revealed by clang's debug build
6653b60 enc_mips32: fix unused symbol warning in debug
8dec120 enc_mips32: disable ITransform(One) in debug builds
98519dd enc_neon: convert Disto4x4 to intrinsics
fe9317c cosmetics:
953b074 enc_neon: cosmetics
a9fc697 Merge "WIP: extract the float-calculation of HuffmanCost from loop"
3f84b52 Merge "replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8)"
4ae0533 MIPS: MIPS32r1: Added optimizations for ExtraCost functions.
b30a04c WIP: extract the float-calculation of HuffmanCost from loop
a8fe8ce Merge "NEON intrinsics version of CollectHistogram"
95203d2 NEON intrinsics version of CollectHistogram
7ca2e74 replace some mult-long (vmull_u8) with mult-long-accumulate (vmlal_u8)
41c6efb fix lossless_neon.c
8ff96a0 NEON intrinsics version of FTransform
0214f4a Merge "MIPS: MIPS32r1: Added optimizations for FastLog2"
baabf1e MIPS: MIPS32r1: Added optimizations for FastLog2
3d49871 NEON functions for lossless coding
3fe0291 MIPS: MIPS32r1: Added optimizations for SSE functions.
c503b48 Merge "fix the gcc-4.6.0 bug by implementing alternative method"
abe6f48 fix the gcc-4.6.0 bug by implementing alternative method
5598bde enc_mips32.c: fix file mode
2b1b4d5 MIPS: MIPS32r1: Add optimization for GetResidualCost
f0a1f3c Merge "MIPS: MIPS32r1: Added optimization for FTransform"
7231f61 MIPS: MIPS32r1: Added optimization for FTransform
869eaf6 ~30% encoding speedup: use NEON for QuantizeBlock()
f758af6 enc_neon: convert FTransformWHT to intrinsics
7dad095 MIPS: MIPS32r1: Added optimization for Disto4x4 (TTransform)
2298d5f MIPS: MIPS32r1: Added optimization for QuantizeBlock
e88150c Merge "MIPS: MIPS32r1: Add optimization for ITransform"
de693f2 lossless_neon: disable VP8LConvert* functions
4143332 NEON intrinsics for encoding
0ca2914 MIPS: MIPS32r1: Add optimization for ITransform
71bca5e dec_neon: use vst_lane instead of vget_lane
bf06105 Intrinsics NEON version of TransformOne
19c6f1b Merge "dec_neon: use vld?_lane instead of vset?_lane"
7a94c0c upsampling_neon: drop NEON suffix from local functions
d14669c upsampling_sse2: drop SSE2 suffix from local functions
2ca42a4 enc_sse2: drop SSE2 suffix from local functions
d038e61 dec_sse2: drop SSE2 suffix from local functions
fa52d75 dec_neon: use vld?_lane instead of vset?_lane
c520e77 cosmetic: fix long line
4b0f2da Merge "add intrinsics NEON code for chroma strong-filtering"
e351ec0 add intrinsics NEON code for chroma strong-filtering
aaf734b Merge "Add SSE2 version of forward cross-color transform"
c90a902 Add SSE2 version of forward cross-color transform
bc374ff Use histogram_bits to initalize transform_bits.
2132992 Merge "Add strong filtering intrinsics (inner and outer edges)"
5fbff3a Add strong filtering intrinsics (inner and outer edges)
d4813f0 Add SSE2 function for Inverse Cross-color Transform
2602956 dec_neon: add strong loopfilter intrinsics
cca7d7e Merge "add intrinsics version of SimpleHFilter16NEON()"
1a05dfa windows: fix dll builds
d6c50d8 Merge "add some colorspace conversion functions in NEON"
4fd7c82 SSE2 variants of Subtract-Green: Rectify loop condition
97e5fac add some colorspace conversion functions in NEON
b9a7a45 add intrinsics version of SimpleHFilter16NEON()
daccbf4 add light filtering NEON intrinsics
af44460 fix typo in STORE_WHT
6af6b8e Tune HistogramCombineBin for large images.
af93bdd use WebPSafe[CM]alloc/WebPSafeFree instead of [cm]alloc/free
51f406a lossless_sse2: relocate VP8LDspInitSSE2 proto
0f4f721 separate SSE2 lossless functions into its own file
514fc25 VP8LConvertFromBGRA: use conversion function pointers
6d2f352 dsp/dec: TransformDCUV: use VP8TransformDC
defc8e1 Merge "fix out-of-bound read during alpha-plane decoding"
fbed364 Merge "dsp: reuse wht transform from dec in encoder"
d846708 Merge "Add SSE2 version of ARGB -> BGR/RGB/... conversion functions"
207d03b fix out-of-bound read during alpha-plane decoding
d1b33ad 2-5% faster trellis with clang/MacOS (and ~2-3% on ARM)
369c26d Add SSE2 version of ARGB -> BGR/RGB/... conversion functions
df230f2 dsp: reuse wht transform from dec in encoder
80e218d Android.mk: fix build with APP_ABI=armeabi-v7a-hard
59daf08 Merge "cosmetics:"
5362200 cosmetics:
3e7f34a AssignSegments: quiet array-bounds warning
3c2ebf5 Merge "UpdateHistogramCost: avoid implicit double->float"
cf821c8 UpdateHistogramCost: avoid implicit double->float
312e638 Extend the search space for GetBestGreenRedToBlue
1c58526 Fix few nits
fef2270 Optimize and re-structure VP8LGetHistoImageSymbols
068b14a Optimize lossless decoding.
5f0cfa8 Do a binary search to get the optimum cache bits.
24ca367 Merge "allow 'cwebp -o -' to emit output to stdout"
e12f874 allow 'cwebp -o -' to emit output to stdout
2bcad89 allow some more stdin/stout I/O
84ed4b3 fix cwebp.1 typos after patch #69199
65b99f1 add a -z option to cwebp, and WebPConfigLosslessPreset() function
3017661 4-5% faster trellis by removing some unneeded calculations.
687a58e histogram.c: reindent after b33e8a0
06d456f Merge "~3-4% faster lossless encoding"
c60de26 ~3-4% faster lossless encoding
42eb06f Merge "few cosmetics after patch #69079"
82af826 few cosmetics after patch #69079
b33e8a0 Refactor code for HistogramCombine.
ca1bfff Merge "5-10% encoding speedup with faster trellis (-m 6)"
5aeeb08 5-10% encoding speedup with faster trellis (-m 6)
82ae1bf cosmetics: normalize VP8GetCPUInfo checks
e3dd924 Merge "Refactor GetBestPredictorForTile for future tuning."
206cc1b Refactor GetBestPredictorForTile for future tuning.
3cb8406 Merge "speed-up trellis quant (~5-10% overall speed-up)"
b66f222 Merge "lossy encoding: ~3% speed-up"
4287d0d speed-up trellis quant (~5-10% overall speed-up)
390c8b3 lossy encoding: ~3% speed-up
9a463c4 Merge "dec_neon: convert TransformWHT to intrinsics"
e8605e9 Merge "dec_neon: add ConvertU8ToS16"
4aa3e41 MIPS: MIPS32r1: rescaler bugfix
c16cd99 Speed up lossless encoder.
9d6b5ff dec_neon: convert TransformWHT to intrinsics
2ff0aae dec_neon: add ConvertU8ToS16
77a8f91 fix compilation with USE_YUVj flag
4acbec1 Merge changes I3b240ffb,Ia9370283,Ia2d28728
2719bb7 dec_neon: TransformAC3: work on packed vectors
b7b60ca dec_neon: add SaturateAndStore4x4
b7685d7 Rescale: let ImportRow / ExportRow be pointer-to-function
e02f16e dec_neon.c: convert TransformDC to intrinsics
9cba963 add missing file
8992ddb use static clipping tables
0235d5e 1-2% faster quantization in SSE2
b2fbc36 fix VC12-x64 warning
6e37cb9 Merge "cosmetics: backward_references.c: reindent after a7d2ee3"
a42ea97 cosmetics: backward_references.c: reindent after a7d2ee3
6c32744 Merge "fix missing __BIG_ENDIAN__ definition on some platform"
a8b6aad fix missing __BIG_ENDIAN__ definition on some platform
fde2904 Increase initial buffer size for VP8L Bit Writer.
a7d2ee3 Optimize cache estimate logic.
7fb6095 Merge "dec_neon.c: add TransformAC3"
bf182e8 VP8LBitWriter: use a bit-accumulator
3f40b4a Merge "MIPS: MIPS32r1: clang macro warning resolved"
1684f4e WebP Decoder: Mark some truncated bitstreams as invalid
acbedac MIPS: MIPS32r1: clang macro warning resolved
228e487 dec_neon.c: add TransformAC3
393f89b Android.mk: avoid gcc-specific flags with clang
32aeaf1 revamp VP8LColorSpaceTransform() a bit
0c7cc4c Merge "Don't dereference NULL, ensure HashChain fully initialized"
391316f Don't dereference NULL, ensure HashChain fully initialized
926ff40 WEBP_SWAP_16BIT_CSP: remove code dup
1d1cd3b Fix decode bug for rgbA_4444/RGBA_4444 color-modes.
939e70e update AUTHORS file
8934a62 cosmetics: *_mips32.c
dd438c9 MIPS: MIPS32r1: Optimization of some simple point-sampling functions. PATCH [6/6]
5352091 Added support for calling sampling functions via pointers.
d16c697 MIPS: MIPS32r1: Optimization of filter functions. PATCH [5/6]
04336fc MIPS: MIPS32r1: Optimization of function TransformOne. PATCH [4/6]
92d8fc7 MIPS: MIPS32r1: Optimization of function WebPRescalerImportRow. PATCH [3/6]
bbc23ff parse one row of intra modes altogether
a2f608f Merge "MIPS: MIPS32r1: Optimization of function WebPRescalerExportRow. [2/6]"
8823085 MIPS: MIPS32r1: Optimization of function WebPRescalerExportRow. [2/6]
c5a5b02 decode mt+incremental: fix segfault in debug builds
9882b2f always use fast-analysis for all methods.
000adac Merge "autoconf: update ax_pthread.m4"
2d2fc37 update .gitignore
5bf4255 Merge "Make it possible to avoid automagic dependencies"
c1cb193 disable NEON for arm64 platform
73a304e Make it possible to avoid automagic dependencies
4d493f8 MIPS: MIPS32r1: Decoder bit reader function optimized. PATCH [1/6]
c741183 make WebPCleanupTransparentArea work with argb picture
5da1855 add a decoding option to flip image vertically
00c3c4e Merge "add man/vwebp.1"
2c6bb42 add man/vwebp.1
ea59a8e Merge "Merge tag 'v0.4.0'"
7574bed fix comments related to array sizes
0b5a90f dwebp.1: fix option formatting
effcb0f Merge tag 'v0.4.0'
7c76255 autoconf: update ax_pthread.m4
fff2a11 make -short work with -print_ssim, -print_psnr, etc.
68e7901 update ChangeLog (tag: v0.4.0-rc1, tag: v0.4.0, origin/0.4.0, 0.4.0)
256e433 update NEWS description with new general features
2962534 Merge "gif2webp: don't use C99 %zu" into 0.4.0
3b9f9dd gif2webp: don't use C99 %zu

View File

@@ -27,7 +27,7 @@ PLATFORM_LDFLAGS = /SAFESEH
NOLOGO = /nologo
CCNODBG = cl.exe $(NOLOGO) /O2 /DNDEBUG
CCDEBUG = cl.exe $(NOLOGO) /Od /Gm /Zi /D_DEBUG /RTC1
CFLAGS = /Isrc $(NOLOGO) /W3 /EHsc /c /GS
CFLAGS = /Isrc $(NOLOGO) /W3 /EHsc /c
CFLAGS = $(CFLAGS) /DWIN32 /D_CRT_SECURE_NO_WARNINGS /DWIN32_LEAN_AND_MEAN
CFLAGS = $(CFLAGS) /DHAVE_WINCODEC_H /DWEBP_USE_THREAD
LDFLAGS = /LARGEADDRESSAWARE /MANIFEST /NXCOMPAT /DYNAMICBASE
@@ -44,11 +44,21 @@ OUTDIR = ..\obj\
OUTDIR = $(OBJDIR)
!ENDIF
!IF "$(HAVE_AVX2)" == "1"
CFLAGS = $(CFLAGS) /DWEBP_HAVE_AVX2
AVX2_FLAGS = /arch:AVX2
!ENDIF
##############################################################
# Runtime library configuration
!IF "$(RTLIBCFG)" == "static"
RTLIB = /MT
RTLIBD = /MTd
!ELSE IF "$(RTLIBCFG)" == "legacy"
RTLIBCFG = static
RTLIB = /MT
RTLIBD = /MTd
CFLAGS = $(CFLAGS) /GS- /arch:IA32
!ELSE
RTLIB = /MD
RTLIBD = /MDd
@@ -134,6 +144,7 @@ CFGSET = TRUE
!MESSAGE - all - build (de)mux-based targets for CFG
!MESSAGE
!MESSAGE RTLIBCFG controls the runtime library linkage - 'static' or 'dynamic'.
!MESSAGE 'legacy' will produce a Windows 2000 compatible library.
!MESSAGE OBJDIR is the path where you like to build (obj, bins, etc.),
!MESSAGE defaults to ..\obj
@@ -156,7 +167,6 @@ DEC_OBJS = \
$(DIROBJ)\dec\frame.obj \
$(DIROBJ)\dec\idec.obj \
$(DIROBJ)\dec\io.obj \
$(DIROBJ)\dec\layer.obj \
$(DIROBJ)\dec\quant.obj \
$(DIROBJ)\dec\tree.obj \
$(DIROBJ)\dec\vp8.obj \
@@ -167,18 +177,29 @@ DEMUX_OBJS = \
$(DIROBJ)\demux\demux.obj \
DSP_DEC_OBJS = \
$(DIROBJ)\dsp\alpha_processing.obj \
$(DIROBJ)\dsp\alpha_processing_sse2.obj \
$(DIROBJ)\dsp\cpu.obj \
$(DIROBJ)\dsp\dec.obj \
$(DIROBJ)\dsp\dec_clip_tables.obj \
$(DIROBJ)\dsp\dec_mips32.obj \
$(DIROBJ)\dsp\dec_neon.obj \
$(DIROBJ)\dsp\dec_sse2.obj \
$(DIROBJ)\dsp\lossless.obj \
$(DIROBJ)\dsp\lossless_mips32.obj \
$(DIROBJ)\dsp\lossless_neon.obj \
$(DIROBJ)\dsp\lossless_sse2.obj \
$(DIROBJ)\dsp\upsampling.obj \
$(DIROBJ)\dsp\upsampling_neon.obj \
$(DIROBJ)\dsp\upsampling_sse2.obj \
$(DIROBJ)\dsp\yuv.obj \
$(DIROBJ)\dsp\yuv_mips32.obj \
$(DIROBJ)\dsp\yuv_sse2.obj \
DSP_ENC_OBJS = \
$(DIROBJ)\dsp\enc.obj \
$(DIROBJ)\dsp\enc_avx2.obj \
$(DIROBJ)\dsp\enc_mips32.obj \
$(DIROBJ)\dsp\enc_neon.obj \
$(DIROBJ)\dsp\enc_sse2.obj \
@@ -187,6 +208,7 @@ EX_FORMAT_DEC_OBJS = \
$(DIROBJ)\examples\metadata.obj \
$(DIROBJ)\examples\pngdec.obj \
$(DIROBJ)\examples\tiffdec.obj \
$(DIROBJ)\examples\webpdec.obj \
$(DIROBJ)\examples\wicdec.obj \
EX_UTIL_OBJS = \
@@ -202,8 +224,11 @@ ENC_OBJS = \
$(DIROBJ)\enc\frame.obj \
$(DIROBJ)\enc\histogram.obj \
$(DIROBJ)\enc\iterator.obj \
$(DIROBJ)\enc\layer.obj \
$(DIROBJ)\enc\picture.obj \
$(DIROBJ)\enc\picture_csp.obj \
$(DIROBJ)\enc\picture_psnr.obj \
$(DIROBJ)\enc\picture_rescale.obj \
$(DIROBJ)\enc\picture_tools.obj \
$(DIROBJ)\enc\quant.obj \
$(DIROBJ)\enc\syntax.obj \
$(DIROBJ)\enc\token.obj \
@@ -217,7 +242,6 @@ MUX_OBJS = \
$(DIROBJ)\mux\muxread.obj \
UTILS_DEC_OBJS = \
$(DIROBJ)\utils\alpha_processing.obj \
$(DIROBJ)\utils\bit_reader.obj \
$(DIROBJ)\utils\color_cache.obj \
$(DIROBJ)\utils\filters.obj \
@@ -309,8 +333,14 @@ $(DIROBJ)\$(DLLC): $(DIROBJ)\$(DLLINC)
@echo } >> $@
.SUFFIXES: .c .obj .res .exe
# File-specific flag builds. Note batch rules take precedence over wildcards,
# so for now name each file individually.
$(DIROBJ)\dsp\enc_avx2.obj: src\dsp\enc_avx2.c
$(CC) $(CFLAGS) $(AVX2_FLAGS) /Fd$(LIBWEBP_PDBNAME) /Fo$(DIROBJ)\dsp\ \
src\dsp\$(@B).c
# Batch rules
{examples}.c{$(DIROBJ)\examples}.obj::
$(CC) $(CFLAGS) /Fd$(DIROBJ)\examples\ /Fo$(DIROBJ)\examples\ $<
$(CC) $(CFLAGS) /Fd$(DIROBJ)\examples\ /Fo$(DIROBJ)\examples\ $<
{src\dec}.c{$(DIROBJ)\dec}.obj::
$(CC) $(CFLAGS) /Fd$(LIBWEBP_PDBNAME) /Fo$(DIROBJ)\dec\ $<
{src\demux}.c{$(DIROBJ)\demux}.obj::

26
NEWS
View File

@@ -1,3 +1,29 @@
- 3/3/15: version 0.4.3
This is a binary compatible release.
* Android / gcc / iOS / MSVS build fixes and improvements
* lossless decode fix (issue #239 -- since 0.4.0)
* documentation / vwebp updates for animation
* multi-threading fix (issue #234)
- 10/13/14: version 0.4.2
This is a binary compatible release.
* Android / gcc build fixes
* (Windows) fix reading from stdin and writing to stdout
* gif2webp: miscellaneous fixes
* fix 'alpha-leak' with lossy compression (issue #220)
* the lossless bitstream spec has been amended to reflect the current code
- 7/24/14: version 0.4.1
This is a binary compatible release.
* AArch64 (arm64) & MIPS support/optimizations
* NEON assembly additions:
- ~25% faster lossy decode / encode (-m 4)
- ~10% faster lossless decode
- ~5-10% faster lossless encode (-m 3/4)
* dwebp/vwebp can read from stdin
* cwebp/gif2webp can write to stdout
* cwebp can read webp files; useful if storing sources as webp lossless
- 12/19/13: version 0.4.0
* improved gif2webp tool
* numerous fixes, compression improvement and speed-up

39
PATENTS
View File

@@ -1,22 +1,23 @@
Additional IP Rights Grant (Patents)
------------------------------------
"This implementation" means the copyrightable works distributed by
Google as part of the WebM Project.
"These implementations" means the copyrightable works that implement the WebM
codecs distributed by Google as part of the WebM Project.
Google hereby grants to you a perpetual, worldwide, non-exclusive,
no-charge, royalty-free, irrevocable (except as stated in this section)
patent license to make, have made, use, offer to sell, sell, import,
transfer, and otherwise run, modify and propagate the contents of this
implementation of VP8, where such license applies only to those patent
claims, both currently owned by Google and acquired in the future,
licensable by Google that are necessarily infringed by this
implementation of VP8. This grant does not include claims that would be
infringed only as a consequence of further modification of this
implementation. If you or your agent or exclusive licensee institute or
order or agree to the institution of patent litigation against any
entity (including a cross-claim or counterclaim in a lawsuit) alleging
that this implementation of VP8 or any code incorporated within this
implementation of VP8 constitutes direct or contributory patent
infringement, or inducement of patent infringement, then any patent
rights granted to you under this License for this implementation of VP8
shall terminate as of the date such litigation is filed.
Google hereby grants to you a perpetual, worldwide, non-exclusive, no-charge,
royalty-free, irrevocable (except as stated in this section) patent license to
make, have made, use, offer to sell, sell, import, transfer, and otherwise
run, modify and propagate the contents of these implementations of WebM, where
such license applies only to those patent claims, both currently owned by
Google and acquired in the future, licensable by Google that are necessarily
infringed by these implementations of WebM. This grant does not include claims
that would be infringed only as a consequence of further modification of these
implementations. If you or your agent or exclusive licensee institute or order
or agree to the institution of patent litigation or any other patent
enforcement activity against any entity (including a cross-claim or
counterclaim in a lawsuit) alleging that any of these implementations of WebM
or any code incorporated within any of these implementations of WebM
constitutes direct or contributory patent infringement, or inducement of
patent infringement, then any patent rights granted to you under this License
for these implementations of WebM shall terminate as of the date such
litigation is filed.

132
README
View File

@@ -4,7 +4,7 @@
\__\__/\____/\_____/__/ ____ ___
/ _/ / \ \ / _ \/ _/
/ \_/ / / \ \ __/ \__
\____/____/\_____/_____/____/v0.4.0
\____/____/\_____/_____/____/v0.4.3
Description:
============
@@ -140,28 +140,30 @@ A longer list of options is available using the -longhelp command line flag:
Usage:
cwebp [-preset <...>] [options] in_file [-o out_file]
If input size (-s) for an image is not specified, it is assumed to be a PNG,
JPEG or TIFF file.
options:
If input size (-s) for an image is not specified, it is
assumed to be a PNG, JPEG, TIFF or WebP file.
Options:
-h / -help ............ short help
-H / -longhelp ........ long help
-q <float> ............. quality factor (0:small..100:big)
-alpha_q <int> ......... Transparency-compression quality (0..100).
-preset <string> ....... Preset setting, one of:
-alpha_q <int> ......... transparency-compression quality (0..100)
-preset <string> ....... preset setting, one of:
default, photo, picture,
drawing, icon, text
-preset must come first, as it overwrites other parameters.
-preset must come first, as it overwrites other parameters
-m <int> ............... compression method (0=fast, 6=slowest)
-segments <int> ........ number of segments to use (1..4)
-size <int> ............ Target size (in bytes)
-psnr <float> .......... Target PSNR (in dB. typically: 42)
-size <int> ............ target size (in bytes)
-psnr <float> .......... target PSNR (in dB. typically: 42)
-s <int> <int> ......... Input size (width x height) for YUV
-sns <int> ............. Spatial Noise Shaping (0:off, 100:max)
-s <int> <int> ......... input size (width x height) for YUV
-sns <int> ............. spatial noise shaping (0:off, 100:max)
-f <int> ............... filter strength (0=off..100)
-sharpness <int> ....... filter sharpness (0:most .. 7:least sharp)
-strong ................ use strong filter instead of simple (default).
-nostrong .............. use simple filter instead of strong.
-strong ................ use strong filter instead of simple (default)
-nostrong .............. use simple filter instead of strong
-partition_limit <int> . limit quality to fit the 512k limit on
the first partition (0=no degradation ... 100=full)
-pass <int> ............ analysis pass number (1..10)
@@ -169,41 +171,40 @@ options:
-resize <w> <h> ........ resize picture (after any cropping)
-mt .................... use multi-threading if available
-low_memory ............ reduce memory usage (slower encoding)
-map <int> ............. print map of extra info.
-print_psnr ............ prints averaged PSNR distortion.
-print_ssim ............ prints averaged SSIM distortion.
-print_lsim ............ prints local-similarity distortion.
-d <file.pgm> .......... dump the compressed output (PGM file).
-alpha_method <int> .... Transparency-compression method (0..1)
-alpha_filter <string> . predictive filtering for alpha plane.
One of: none, fast (default) or best.
-alpha_cleanup ......... Clean RGB values in transparent area.
-blend_alpha <hex> ..... Blend colors against background color
-map <int> ............. print map of extra info
-print_psnr ............ prints averaged PSNR distortion
-print_ssim ............ prints averaged SSIM distortion
-print_lsim ............ prints local-similarity distortion
-d <file.pgm> .......... dump the compressed output (PGM file)
-alpha_method <int> .... transparency-compression method (0..1)
-alpha_filter <string> . predictive filtering for alpha plane,
one of: none, fast (default) or best
-alpha_cleanup ......... clean RGB values in transparent area
-blend_alpha <hex> ..... blend colors against background color
expressed as RGB values written in
hexadecimal, e.g. 0xc0e0d0 for red=0xc0
green=0xe0 and blue=0xd0.
-noalpha ............... discard any transparency information.
-lossless .............. Encode image losslessly.
-hint <string> ......... Specify image characteristics hint.
One of: photo, picture or graph
green=0xe0 and blue=0xd0
-noalpha ............... discard any transparency information
-lossless .............. encode image losslessly
-hint <string> ......... specify image characteristics hint,
one of: photo, picture or graph
-metadata <string> ..... comma separated list of metadata to
copy from the input to the output if present.
Valid values: all, none (default), exif, icc, xmp
-short ................. condense printed message
-quiet ................. don't print anything.
-version ............... print version number and exit.
-noasm ................. disable all assembly optimizations.
-quiet ................. don't print anything
-version ............... print version number and exit
-noasm ................. disable all assembly optimizations
-v ..................... verbose, e.g. print encoding/decoding times
-progress .............. report encoding progress
Experimental Options:
-jpeg_like ............. Roughly match expected JPEG size.
-af .................... auto-adjust filter strength.
-jpeg_like ............. roughly match expected JPEG size
-af .................... auto-adjust filter strength
-pre <int> ............. pre-processing filter
The main options you might want to try in order to further tune the
visual quality are:
-preset
@@ -262,19 +263,19 @@ Use following options to convert into alternate image formats:
-yuv ......... save the raw YUV samples in flat layout
Other options are:
-version .... print version number and exit.
-nofancy ..... don't use the fancy YUV420 upscaler.
-nofilter .... disable in-loop filtering.
-nodither .... disable dithering.
-version .... print version number and exit
-nofancy ..... don't use the fancy YUV420 upscaler
-nofilter .... disable in-loop filtering
-nodither .... disable dithering
-dither <d> .. dithering strength (in 0..100)
-mt .......... use multi-threading
-crop <x> <y> <w> <h> ... crop output with the given rectangle
-scale <w> <h> .......... scale the output (*after* any cropping)
-alpha ....... only save the alpha plane.
-alpha ....... only save the alpha plane
-incremental . use incremental decoding (useful for tests)
-h ....... this help message.
-h ....... this help message
-v ....... verbose (e.g. print encoding/decoding times)
-noasm ....... disable all assembly optimizations.
-noasm ....... disable all assembly optimizations
Visualization tool:
===================
@@ -288,19 +289,19 @@ Usage: vwebp in_file [options]
Decodes the WebP image file and visualize it using OpenGL
Options are:
-version .... print version number and exit.
-noicc ....... don't use the icc profile if present.
-nofancy ..... don't use the fancy YUV420 upscaler.
-nofilter .... disable in-loop filtering.
-dither <int> dithering strength (0..100). Default=50.
-mt .......... use multi-threading.
-info ........ print info.
-h ....... this help message.
-version .... print version number and exit
-noicc ....... don't use the icc profile if present
-nofancy ..... don't use the fancy YUV420 upscaler
-nofilter .... disable in-loop filtering
-dither <int> dithering strength (0..100), default=50
-mt .......... use multi-threading
-info ........ print info
-h ....... this help message
Keyboard shortcuts:
'c' ................ toggle use of color profile.
'i' ................ overlay file information.
'q' / 'Q' / ESC .... quit.
'c' ................ toggle use of color profile
'i' ................ overlay file information
'q' / 'Q' / ESC .... quit
Building:
---------
@@ -336,24 +337,24 @@ vwebp.
Usage:
gif2webp [options] gif_file -o webp_file
options:
Options:
-h / -help ............ this help
-lossy ................. Encode image using lossy compression.
-mixed ................. For each frame in the image, pick lossy
or lossless compression heuristically.
-lossy ................. encode image using lossy compression
-mixed ................. for each frame in the image, pick lossy
or lossless compression heuristically
-q <float> ............. quality factor (0:small..100:big)
-m <int> ............... compression method (0=fast, 6=slowest)
-kmin <int> ............ Min distance between key frames
-kmax <int> ............ Max distance between key frames
-kmin <int> ............ min distance between key frames
-kmax <int> ............ max distance between key frames
-f <int> ............... filter strength (0=off..100)
-metadata <string> ..... comma separated list of metadata to
copy from the input to the output if present.
copy from the input to the output if present
Valid values: all, none, icc, xmp (default)
-mt .................... use multi-threading if available
-version ............... print version number and exit.
-v ..................... verbose.
-quiet ................. don't print anything.
-version ............... print version number and exit
-v ..................... verbose
-quiet ................. don't print anything
Building:
---------
@@ -442,15 +443,20 @@ The encoding flow looks like:
// Set up a byte-output write method. WebPMemoryWriter, for instance.
WebPMemoryWriter wrt;
WebPMemoryWriterInit(&wrt); // initialize 'wrt'
pic.writer = MyFileWriter;
pic.custom_ptr = my_opaque_structure_to_make_MyFileWriter_work;
// initialize 'wrt' here...
// Compress!
int ok = WebPEncode(&config, &pic); // ok = 0 => error occurred!
WebPPictureFree(&pic); // must be called independently of the 'ok' result.
// output data should have been handled by the writer at that point.
// -> compressed data is the memory buffer described by wrt.mem / wrt.size
// deallocate the memory used by compressed data
WebPMemoryWriterClear(&wrt);
-------------------------------------- END PSEUDO EXAMPLE

View File

@@ -1,7 +1,7 @@
 __ __ ____ ____ ____ __ __ _ __ __
/ \\/ \/ _ \/ _ \/ _ \/ \ \/ \___/_ / _\
\ / __/ _ \ __/ / / (_/ /__
\__\__/\_____/_____/__/ \__//_/\_____/__/___/v0.2.0
\__\__/\_____/_____/__/ \__//_/\_____/__/___/v0.2.2
Description:
@@ -33,35 +33,35 @@ Usage: webpmux -get GET_OPTIONS INPUT -o OUTPUT
webpmux -version
GET_OPTIONS:
Extract relevant data.
icc Get ICC profile.
exif Get EXIF metadata.
xmp Get XMP metadata.
frame n Get nth frame.
Extract relevant data:
icc get ICC profile
exif get EXIF metadata
xmp get XMP metadata
frame n get nth frame
SET_OPTIONS:
Set color profile/metadata.
icc file.icc Set ICC profile.
exif file.exif Set EXIF metadata.
xmp file.xmp Set XMP metadata.
Set color profile/metadata:
icc file.icc set ICC profile
exif file.exif set EXIF metadata
xmp file.xmp set XMP metadata
where: 'file.icc' contains the ICC profile to be set,
'file.exif' contains the EXIF metadata to be set
'file.xmp' contains the XMP metadata to be set
STRIP_OPTIONS:
Strip color profile/metadata.
icc Strip ICC profile.
exif Strip EXIF metadata.
xmp Strip XMP metadata.
Strip color profile/metadata:
icc strip ICC profile
exif strip EXIF metadata
xmp strip XMP metadata
FRAME_OPTIONS(i):
Create animation.
Create animation:
file_i +di+[xi+yi[+mi[bi]]]
where: 'file_i' is the i'th animation frame (WebP format),
'di' is the pause duration before next frame.
'xi','yi' specify the image offset for this frame.
'mi' is the dispose method for this frame (0 or 1).
'bi' is the blending method for this frame (+b or -b).
'di' is the pause duration before next frame,
'xi','yi' specify the image offset for this frame,
'mi' is the dispose method for this frame (0 or 1),
'bi' is the blending method for this frame (+b or -b)
LOOP_COUNT:
Number of times to repeat the animation.
@@ -72,7 +72,7 @@ BACKGROUND_COLOR:
A,R,G,B
where: 'A', 'R', 'G' and 'B' are integers in the range 0 to 255 specifying
the Alpha, Red, Green and Blue component values respectively
[Default: 255,255,255,255].
[Default: 255,255,255,255]
INPUT & OUTPUT are in WebP format.

View File

@@ -1,7 +1,8 @@
AC_INIT([libwebp], [0.4.0],
AC_INIT([libwebp], [0.4.3],
[http://code.google.com/p/webp/issues],,
[http://developers.google.com/speed/webp])
AC_CANONICAL_TARGET
AC_CANONICAL_HOST
AC_PREREQ([2.60])
AM_INIT_AUTOMAKE([-Wall foreign subdir-objects])
dnl === automake >= 1.12 requires this for 'unusual archivers' support.
@@ -14,6 +15,9 @@ AM_PROG_CC_C_O
dnl === Enable less verbose output when building.
m4_ifdef([AM_SILENT_RULES], [AM_SILENT_RULES([yes])])
dnl == test endianness
AC_C_BIGENDIAN
dnl === SET_IF_UNSET(shell_var, value)
dnl === Set the shell variable 'shell_var' to 'value' if it is unset.
AC_DEFUN([SET_IF_UNSET], [test "${$1+set}" = "set" || $1=$2])
@@ -31,33 +35,78 @@ AC_ARG_WITH([pkgconfigdir], AS_HELP_STRING([--with-pkgconfigdir=DIR],
[pkgconfigdir="$withval"], [pkgconfigdir='${libdir}/pkgconfig'])
AC_SUBST([pkgconfigdir])
dnl === TEST_AND_ADD_CFLAGS(flag)
dnl === Checks whether $CC supports 'flag' and adds it to AM_CFLAGS on success.
dnl === TEST_AND_ADD_CFLAGS(var, flag)
dnl === Checks whether $CC supports 'flag' and adds it to 'var'
dnl === on success.
AC_DEFUN([TEST_AND_ADD_CFLAGS],
[SAVED_CFLAGS="$CFLAGS"
CFLAGS="-Werror $1"
AC_MSG_CHECKING([whether $CC supports $1])
CFLAGS="-Werror $2"
AC_MSG_CHECKING([whether $CC supports $2])
dnl Note AC_LANG_PROGRAM([]) uses an old-style main definition.
AC_COMPILE_IFELSE([AC_LANG_SOURCE([int main(void) { return 0; }])],
[AC_MSG_RESULT([yes])]
dnl Simply append the variable avoiding a
dnl compatibility ifdef for AS_VAR_APPEND as this
dnl variable shouldn't grow all that large.
[AM_CFLAGS="$AM_CFLAGS $1"],
[$1="${$1} $2"],
[AC_MSG_RESULT([no])])
CFLAGS="$SAVED_CFLAGS"])
TEST_AND_ADD_CFLAGS([-Wall])
TEST_AND_ADD_CFLAGS([-Wdeclaration-after-statement])
TEST_AND_ADD_CFLAGS([-Wextra])
TEST_AND_ADD_CFLAGS([-Wmissing-declarations])
TEST_AND_ADD_CFLAGS([-Wmissing-prototypes])
TEST_AND_ADD_CFLAGS([-Wold-style-definition])
TEST_AND_ADD_CFLAGS([-Wshadow])
TEST_AND_ADD_CFLAGS([-Wunused-but-set-variable])
TEST_AND_ADD_CFLAGS([-Wunused])
TEST_AND_ADD_CFLAGS([-Wvla])
TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wall])
TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wdeclaration-after-statement])
TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wextra])
TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wformat-nonliteral])
TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wformat-security])
TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wmissing-declarations])
TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wmissing-prototypes])
TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wold-style-definition])
TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wshadow])
TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wunused-but-set-variable])
TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wunused])
TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-Wvla])
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=62040
# https://gcc.gnu.org/bugzilla/show_bug.cgi?id=61622
AS_IF([test "$GCC" = "yes" ], [
gcc_version=`$CC -dumpversion`
gcc_wht_bug=""
case "$host_cpu" in
aarch64|arm64)
case "$gcc_version" in
4.9|4.9.0|4.9.1) gcc_wht_bug=yes ;;
esac
esac
AS_IF([test "$gcc_wht_bug" = "yes"], [
TEST_AND_ADD_CFLAGS([AM_CFLAGS], [-frename-registers])])])
AC_SUBST([AM_CFLAGS])
dnl === Check for machine specific flags
TEST_AND_ADD_CFLAGS([AVX2_FLAGS], [-mavx2])
AS_IF([test -n "$AVX2_FLAGS"], [
SAVED_CFLAGS=$CFLAGS
CFLAGS="$CFLAGS $AVX2_FLAGS"
AC_CHECK_HEADER([immintrin.h],
[AC_DEFINE(WEBP_HAVE_AVX2, [1],
[Set to 1 if AVX2 is supported])],
[AVX2_FLAGS=""],
dnl it's illegal to directly include avx2intrin.h, but it's
dnl included conditionally in immintrin.h, tricky!
[#ifndef __AVX2__
#error avx2 is not enabled
#endif
])
CFLAGS=$SAVED_CFLAGS])
AC_SUBST([AVX2_FLAGS])
TEST_AND_ADD_CFLAGS([SSE2_FLAGS], [-msse2])
AS_IF([test -n "$SSE2_FLAGS"], [
SAVED_CFLAGS=$CFLAGS
CFLAGS="$CFLAGS $SSE2_FLAGS"
AC_CHECK_HEADER([emmintrin.h],
[AC_DEFINE(WEBP_HAVE_SSE2, [1],
[Set to 1 if SSE2 is supported])],
[SSE2_FLAGS=""])
CFLAGS=$SAVED_CFLAGS])
AC_SUBST([SSE2_FLAGS])
dnl === CLEAR_LIBVARS([var_pfx])
dnl === Clears <var_pfx>_{INCLUDES,LIBS}.
AC_DEFUN([CLEAR_LIBVARS], [$1_INCLUDES=""; $1_LIBS=""])
@@ -93,6 +142,26 @@ AC_DEFUN([LIBCHECK_EPILOGUE],
CPPFLAGS=$SAVED_CPPFLAGS
LIBS=$SAVED_LIBS])
dnl === Check for gcc builtins
dnl === CHECK_FOR_BUILTIN([builtin], [param], [define])
dnl === links a C AC_LANG_PROGRAM, with <builtin>(<param>)
dnl === AC_DEFINE'ing <define> if successful.
AC_DEFUN([CHECK_FOR_BUILTIN],
[AC_LANG_PUSH([C])
AC_MSG_CHECKING([for $1])
AC_LINK_IFELSE([AC_LANG_PROGRAM([], [$1($2)])],
[AC_MSG_RESULT([yes])
AC_DEFINE([$3], [1],
[Set to 1 if $1 is available])],
[AC_MSG_RESULT([no])]),
AC_LANG_POP])
dnl AC_CHECK_FUNC doesn't work with builtin's.
CHECK_FOR_BUILTIN([__builtin_bswap16], [1u << 15], [HAVE_BUILTIN_BSWAP16])
CHECK_FOR_BUILTIN([__builtin_bswap32], [1u << 31], [HAVE_BUILTIN_BSWAP32])
CHECK_FOR_BUILTIN([__builtin_bswap64], [1ull << 63], [HAVE_BUILTIN_BSWAP64])
dnl === Check for pthread support
AC_ARG_ENABLE([threading],
AS_HELP_STRING([--disable-threading],
@@ -106,212 +175,241 @@ if test "$enable_threading" = "yes"; then
CFLAGS="$CFLAGS $PTHREAD_CFLAGS"
CC="$PTHREAD_CC"
],
[enable_threading=no])
[AC_CHECK_FUNC([_beginthreadex],
[AC_DEFINE([WEBP_USE_THREAD], [1],
[Undefine this to disable thread
support.])],
[enable_threading=no])])
fi
AC_MSG_NOTICE([checking if threading is enabled... ${enable_threading-no}])
dnl === check for OpenGL/GLUT support ===
CLEAR_LIBVARS([GL])
WITHLIB_OPTION([gl], [GL])
LIBCHECK_PROLOGUE([GL])
AC_ARG_ENABLE([gl], AS_HELP_STRING([--disable-gl],
[Disable detection of OpenGL support
@<:@default=auto@:>@]))
AS_IF([test "x$enable_gl" != "xno"], [
CLEAR_LIBVARS([GL])
WITHLIB_OPTION([gl], [GL])
glut_cflags="none"
glut_ldflags="none"
case $host_os in
darwin*)
# Special case for OSX builds. Append these to give the user a chance to
# override with --with-gl*
glut_cflags="$glut_cflags|-framework GLUT -framework OpenGL"
glut_ldflags="$glut_ldflags|-framework GLUT -framework OpenGL"
;;
esac
LIBCHECK_PROLOGUE([GL])
GLUT_SAVED_CPPFLAGS="$CPPFLAGS"
SAVED_IFS="$IFS"
IFS="|"
for flag in $glut_cflags; do
# restore IFS immediately as the autoconf macros may need the default.
IFS="$SAVED_IFS"
unset ac_cv_header_GL_glut_h
unset ac_cv_header_OpenGL_glut_h
case $flag in
none) ;;
*) CPPFLAGS="$flag $CPPFLAGS";;
glut_cflags="none"
glut_ldflags="none"
case $host_os in
darwin*)
# Special case for OSX builds. Append these to give the user a chance to
# override with --with-gl*
glut_cflags="$glut_cflags|-framework GLUT -framework OpenGL"
glut_ldflags="$glut_ldflags|-framework GLUT -framework OpenGL"
;;
esac
AC_CHECK_HEADERS([GL/glut.h GLUT/glut.h OpenGL/glut.h],
[glut_headers=yes;
test "$flag" = "none" || GL_INCLUDES="$CPPFLAGS";
break])
CPPFLAGS="$GLUT_SAVED_CPPFLAGS"
test "$glut_headers" = "yes" && break
done
IFS="$SAVED_IFS"
if test "$glut_headers" = "yes"; then
AC_LANG_PUSH([C])
GLUT_SAVED_LDFLAGS="$LDFLAGS"
GLUT_SAVED_CPPFLAGS="$CPPFLAGS"
SAVED_IFS="$IFS"
IFS="|"
for flag in $glut_ldflags; do
for flag in $glut_cflags; do
# restore IFS immediately as the autoconf macros may need the default.
IFS="$SAVED_IFS"
unset ac_cv_search_glBegin
unset ac_cv_header_GL_glut_h
unset ac_cv_header_OpenGL_glut_h
case $flag in
none) ;;
*) LDFLAGS="$flag $LDFLAGS";;
*) CPPFLAGS="$flag $CPPFLAGS";;
esac
# find libGL
GL_SAVED_LIBS="$LIBS"
AC_SEARCH_LIBS([glBegin], [GL OpenGL])
LIBS="$GL_SAVED_LIBS"
# A direct link to libGL may not be necessary on e.g., linux.
GLUT_SAVED_LIBS="$LIBS"
for lib in "" "-lglut" "-lglut $ac_cv_search_glBegin"; do
LIBS="$lib"
AC_LINK_IFELSE(
[AC_LANG_PROGRAM([
#ifdef __cplusplus
# define EXTERN_C extern "C"
#else
# define EXTERN_C
#endif
EXTERN_C char glOrtho();
EXTERN_C char glutMainLoop();
],[
glOrtho();
glutMainLoop();
])
],
AC_DEFINE(WEBP_HAVE_GL, [1],
[Set to 1 if OpenGL is supported])
[glut_support=yes], []
)
if test "$glut_support" = "yes"; then
GL_LIBS="$LDFLAGS $lib"
break
fi
done
LIBS="$GLUT_SAVED_LIBS"
LDFLAGS="$GLUT_SAVED_LDFLAGS"
test "$glut_support" = "yes" && break
AC_CHECK_HEADERS([GL/glut.h GLUT/glut.h OpenGL/glut.h],
[glut_headers=yes;
test "$flag" = "none" || GL_INCLUDES="$CPPFLAGS";
break])
CPPFLAGS="$GLUT_SAVED_CPPFLAGS"
test "$glut_headers" = "yes" && break
done
IFS="$SAVED_IFS"
AC_LANG_POP
fi
LIBCHECK_EPILOGUE([GL])
if test "$glut_headers" = "yes"; then
AC_LANG_PUSH([C])
GLUT_SAVED_LDFLAGS="$LDFLAGS"
SAVED_IFS="$IFS"
IFS="|"
for flag in $glut_ldflags; do
# restore IFS immediately as the autoconf macros may need the default.
IFS="$SAVED_IFS"
unset ac_cv_search_glBegin
if test "$glut_support" = "yes" -a "$enable_libwebpdemux" = "yes"; then
build_vwebp=yes
fi
case $flag in
none) ;;
*) LDFLAGS="$flag $LDFLAGS";;
esac
# find libGL
GL_SAVED_LIBS="$LIBS"
AC_SEARCH_LIBS([glBegin], [GL OpenGL opengl32])
LIBS="$GL_SAVED_LIBS"
# A direct link to libGL may not be necessary on e.g., linux.
GLUT_SAVED_LIBS="$LIBS"
for lib in "" "-lglut" "-lglut $ac_cv_search_glBegin"; do
LIBS="$lib"
AC_LINK_IFELSE(
[AC_LANG_PROGRAM([
#ifdef __cplusplus
# define EXTERN_C extern "C"
#else
# define EXTERN_C
#endif
EXTERN_C char glOrtho();
EXTERN_C char glutMainLoop();
],[
glOrtho();
glutMainLoop();
])
],
AC_DEFINE(WEBP_HAVE_GL, [1],
[Set to 1 if OpenGL is supported])
[glut_support=yes], []
)
if test "$glut_support" = "yes"; then
GL_LIBS="$LDFLAGS $lib"
break
fi
done
LIBS="$GLUT_SAVED_LIBS"
LDFLAGS="$GLUT_SAVED_LDFLAGS"
test "$glut_support" = "yes" && break
done
IFS="$SAVED_IFS"
AC_LANG_POP
fi
LIBCHECK_EPILOGUE([GL])
if test "$glut_support" = "yes" -a "$enable_libwebpdemux" = "yes"; then
build_vwebp=yes
fi
])
AM_CONDITIONAL([BUILD_VWEBP], [test "$build_vwebp" = "yes"])
dnl === check for PNG support ===
CLEAR_LIBVARS([PNG])
AC_PATH_PROGS(LIBPNG_CONFIG,
[libpng-config libpng15-config libpng14-config libpng12-config])
if test -n "$LIBPNG_CONFIG"; then
PNG_INCLUDES=`$LIBPNG_CONFIG --cflags`
PNG_PREFIX=`$LIBPNG_CONFIG --prefix`
if test "${PNG_PREFIX}/lib" != "/usr/lib" ; then
PNG_LIBS="-L${PNG_PREFIX}/lib"
AC_ARG_ENABLE([png], AS_HELP_STRING([--disable-png],
[Disable detection of PNG format support
@<:@default=auto@:>@]))
AS_IF([test "x$enable_png" != "xno"], [
CLEAR_LIBVARS([PNG])
AC_PATH_PROGS([LIBPNG_CONFIG],
[libpng-config libpng16-config libpng15-config libpng14-config \
libpng12-config])
if test -n "$LIBPNG_CONFIG"; then
PNG_INCLUDES=`$LIBPNG_CONFIG --cflags`
PNG_LIBS="`$LIBPNG_CONFIG --ldflags`"
fi
PNG_LIBS="$PNG_LIBS `$LIBPNG_CONFIG --libs`"
fi
WITHLIB_OPTION([png], [PNG])
WITHLIB_OPTION([png], [PNG])
LIBCHECK_PROLOGUE([PNG])
AC_CHECK_HEADER(png.h,
AC_SEARCH_LIBS(png_get_libpng_ver, [png],
[test "$ac_cv_search_png_get_libpng_ver" = "none required" \
|| PNG_LIBS="$PNG_LIBS $ac_cv_search_png_get_libpng_ver"
PNG_INCLUDES="$PNG_INCLUDES -DWEBP_HAVE_PNG"
AC_DEFINE(WEBP_HAVE_PNG, [1],
[Set to 1 if PNG library is installed])
png_support=yes
],
[AC_MSG_WARN(Optional png library not found)
PNG_LIBS=""
PNG_INCLUDES=""
],
[$MATH_LIBS]),
[AC_MSG_WARN(png library not available - no png.h)
PNG_LIBS=""
PNG_INCLUDES=""
],
)
LIBCHECK_EPILOGUE([PNG])
LIBCHECK_PROLOGUE([PNG])
AC_CHECK_HEADER(png.h,
AC_SEARCH_LIBS(png_get_libpng_ver, [png],
[test "$ac_cv_search_png_get_libpng_ver" = "none required" \
|| PNG_LIBS="$PNG_LIBS $ac_cv_search_png_get_libpng_ver"
PNG_INCLUDES="$PNG_INCLUDES -DWEBP_HAVE_PNG"
AC_DEFINE(WEBP_HAVE_PNG, [1],
[Set to 1 if PNG library is installed])
png_support=yes
],
[AC_MSG_WARN(Optional png library not found)
PNG_LIBS=""
PNG_INCLUDES=""
],
[$MATH_LIBS]),
[AC_MSG_WARN(png library not available - no png.h)
PNG_LIBS=""
PNG_INCLUDES=""
],
)
LIBCHECK_EPILOGUE([PNG])
])
dnl === check for JPEG support ===
CLEAR_LIBVARS([JPEG])
WITHLIB_OPTION([jpeg], [JPEG])
AC_ARG_ENABLE([jpeg],
AS_HELP_STRING([--disable-jpeg],
[Disable detection of JPEG format support
@<:@default=auto@:>@]))
AS_IF([test "x$enable_jpeg" != "xno"], [
CLEAR_LIBVARS([JPEG])
WITHLIB_OPTION([jpeg], [JPEG])
LIBCHECK_PROLOGUE([JPEG])
AC_CHECK_HEADER(jpeglib.h,
AC_CHECK_LIB(jpeg, jpeg_set_defaults,
[JPEG_LIBS="$JPEG_LIBS -ljpeg"
JPEG_INCLUDES="$JPEG_INCLUDES -DWEBP_HAVE_JPEG"
AC_DEFINE(WEBP_HAVE_JPEG, [1],
[Set to 1 if JPEG library is installed])
jpeg_support=yes
],
AC_MSG_WARN(Optional jpeg library not found),
[$MATH_LIBS]),
AC_MSG_WARN(jpeg library not available - no jpeglib.h)
)
LIBCHECK_EPILOGUE([JPEG])
LIBCHECK_PROLOGUE([JPEG])
AC_CHECK_HEADER(jpeglib.h,
AC_CHECK_LIB(jpeg, jpeg_set_defaults,
[JPEG_LIBS="$JPEG_LIBS -ljpeg"
JPEG_INCLUDES="$JPEG_INCLUDES -DWEBP_HAVE_JPEG"
AC_DEFINE(WEBP_HAVE_JPEG, [1],
[Set to 1 if JPEG library is installed])
jpeg_support=yes
],
AC_MSG_WARN(Optional jpeg library not found),
[$MATH_LIBS]),
AC_MSG_WARN(jpeg library not available - no jpeglib.h)
)
LIBCHECK_EPILOGUE([JPEG])
])
dnl === check for TIFF support ===
CLEAR_LIBVARS([TIFF])
WITHLIB_OPTION([tiff], [TIFF])
AC_ARG_ENABLE([tiff],
AS_HELP_STRING([--disable-tiff],
[Disable detection of TIFF format support
@<:@default=auto@:>@]))
AS_IF([test "x$enable_tiff" != "xno"], [
CLEAR_LIBVARS([TIFF])
WITHLIB_OPTION([tiff], [TIFF])
LIBCHECK_PROLOGUE([TIFF])
AC_CHECK_HEADER(tiffio.h,
AC_CHECK_LIB(tiff, TIFFGetVersion,
[TIFF_LIBS="$TIFF_LIBS -ltiff"
TIFF_INCLUDES="$TIFF_INCLUDES -DWEBP_HAVE_TIFF"
AC_DEFINE(WEBP_HAVE_TIFF, [1],
[Set to 1 if TIFF library is installed])
tiff_support=yes
],
AC_MSG_WARN(Optional tiff library not found),
[$MATH_LIBS]),
AC_MSG_WARN(tiff library not available - no tiffio.h)
)
LIBCHECK_EPILOGUE([TIFF])
LIBCHECK_PROLOGUE([TIFF])
AC_CHECK_HEADER(tiffio.h,
AC_CHECK_LIB(tiff, TIFFGetVersion,
[TIFF_LIBS="$TIFF_LIBS -ltiff"
TIFF_INCLUDES="$TIFF_INCLUDES -DWEBP_HAVE_TIFF"
AC_DEFINE(WEBP_HAVE_TIFF, [1],
[Set to 1 if TIFF library is installed])
tiff_support=yes
],
AC_MSG_WARN(Optional tiff library not found),
[$MATH_LIBS]),
AC_MSG_WARN(tiff library not available - no tiffio.h)
)
LIBCHECK_EPILOGUE([TIFF])
])
dnl === check for GIF support ===
CLEAR_LIBVARS([GIF])
WITHLIB_OPTION([gif], [GIF])
AC_ARG_ENABLE([gif], AS_HELP_STRING([--disable-gif],
[Disable detection of GIF format support
@<:@default=auto@:>@]))
AS_IF([test "x$enable_gif" != "xno"], [
CLEAR_LIBVARS([GIF])
WITHLIB_OPTION([gif], [GIF])
LIBCHECK_PROLOGUE([GIF])
AC_CHECK_HEADER(gif_lib.h,
AC_CHECK_LIB([gif], [DGifOpenFileHandle],
[GIF_LIBS="$GIF_LIBS -lgif"
AC_DEFINE(WEBP_HAVE_GIF, [1],
[Set to 1 if GIF library is installed])
gif_support=yes
],
AC_MSG_WARN(Optional gif library not found),
[$MATH_LIBS]),
AC_MSG_WARN(gif library not available - no gif_lib.h)
)
LIBCHECK_EPILOGUE([GIF])
LIBCHECK_PROLOGUE([GIF])
AC_CHECK_HEADER(gif_lib.h,
AC_CHECK_LIB([gif], [DGifOpenFileHandle],
[GIF_LIBS="$GIF_LIBS -lgif"
AC_DEFINE(WEBP_HAVE_GIF, [1],
[Set to 1 if GIF library is installed])
gif_support=yes
],
AC_MSG_WARN(Optional gif library not found),
[$MATH_LIBS]),
AC_MSG_WARN(gif library not available - no gif_lib.h)
)
LIBCHECK_EPILOGUE([GIF])
if test "$gif_support" = "yes" -a \
"$enable_libwebpmux" = "yes"; then
build_gif2webp=yes
fi
if test "$gif_support" = "yes" -a \
"$enable_libwebpmux" = "yes"; then
build_gif2webp=yes
fi
])
AM_CONDITIONAL([BUILD_GIF2WEBP], [test "${build_gif2webp}" = "yes"])
dnl === check for WIC support ===
@@ -322,7 +420,9 @@ AC_ARG_ENABLE([wic],
@<:@default=auto@:>@]),,
[enable_wic=yes])
if test "$target_os" = "mingw32" -a "$enable_wic" = "yes"; then
case $host_os in
mingw*)
if test "$enable_wic" = "yes"; then
AC_CHECK_HEADERS([wincodec.h shlwapi.h windows.h])
if test "$ac_cv_header_wincodec_h" = "yes"; then
AC_MSG_CHECKING(for Windows Imaging Component support)
@@ -362,6 +462,20 @@ if test "$target_os" = "mingw32" -a "$enable_wic" = "yes"; then
AC_MSG_RESULT(${wic_support-no})
fi
fi
esac
dnl === If --enable-aligned is defined, define WEBP_FORCE_ALIGNED
AC_MSG_CHECKING(if --enable-aligned option is specified)
AC_ARG_ENABLE([aligned],
AS_HELP_STRING([--enable-aligned],
[Force aligned memory operations in non-dsp code
(may be slower)]))
if test "$enable_aligned" = "yes"; then
AC_DEFINE(WEBP_FORCE_ALIGNED, [1],
[Define to 1 to force aligned memory operations])
fi
AC_MSG_RESULT(${enable_aligned-no})
dnl === If --enable-swap-16bit-csp is defined, add -DWEBP_SWAP_16BIT_CSP
@@ -416,7 +530,7 @@ AM_CONDITIONAL([BUILD_LIBWEBPDECODER], [test "$enable_libwebpdecoder" = "yes"])
dnl =========================
AC_CONFIG_MACRO_DIR([m4])
AC_CONFIG_HEADERS([config.h])
AC_CONFIG_HEADERS([src/webp/config.h])
AC_CONFIG_FILES([Makefile src/Makefile man/Makefile \
examples/Makefile src/dec/Makefile \
src/enc/Makefile src/dsp/Makefile \
@@ -434,7 +548,7 @@ WebP Configuration Summary
Shared libraries: ${enable_shared}
Static libraries: ${enable_static}
Threaded decode: ${enable_threading-no}
Threading support: ${enable_threading-no}
libwebp: yes
libwebpdecoder: ${enable_libwebpdecoder-no}
libwebpdemux: ${enable_libwebpdemux-no}

View File

@@ -46,25 +46,16 @@ for:
* **Animation.** An image may have multiple frames with pauses between them,
making it an animation.
* **Image Fragmentation.** A single bitstream in WebP has an inherent
limitation for width or height of 2^14 pixels, and, when using VP8, a 512
KiB limit on the size of the first compressed partition. To support larger
images, the format supports images that are composed of multiple fragments,
each encoded as a separate bitstream. All fragments logically form a single
image: they have common metadata, color profile, etc. Image fragmentation
may also improve efficiency for larger images, e.g., grass can be encoded
differently than sky.
The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC 2119][].
Bit numbering in chunk diagrams starts at `0` for the most significant bit
('MSB 0') as described in [RFC 1166][].
**Note:** Out of the features mentioned above, lossy compression, lossless
compression, transparency, metadata, color profile and animation are finalized
and are to be considered stable. On the other hand, image fragmentation is
experimental as of now, and is open to discussion, feedback and comments.
The same is indicated using annotation "_status: experimental_" in the relevant
sections of this document.
and are to be considered stable.
Terminology &amp; Basics
------------------------
@@ -79,7 +70,7 @@ Below are additional terms used throughout this document:
_Reader/Writer_
: Code that reads WebP files is referred to as a _reader_, while code that
writes them is referred to as a _writer_.
writes them is referred to as a _writer_.
_uint16_
@@ -101,10 +92,12 @@ _FourCC_
_1-based_
: An unsigned integer field storing values offset by `-1`. e.g., Such a field
would store value _25_ as _24_.
would store value _25_ as _24_.
RIFF file format
RIFF File Format
----------------
The WebP file format is based on the RIFF (resource interchange file format)
document format.
@@ -144,7 +137,8 @@ _ChunkHeader('ABCD')_
chunks that apply to any RIFF file format, while FourCCs specific to a file
format are all lowercase. WebP does not follow this convention.
WebP file header
WebP File Header
----------------
0 1 2 3
@@ -164,8 +158,8 @@ WebP file header
File Size: 32 bits (_uint32_)
: The size of the file in bytes starting at offset 8. The maximum value of
this field is 2^32 minus 10 bytes and thus the size of the whole file is at
most 4GiB minus 2 bytes.
this field is 2^32 minus 10 bytes and thus the size of the whole file is at
most 4GiB minus 2 bytes.
'WEBP': 32 bits
@@ -177,7 +171,8 @@ the 'WEBP' FourCC. The file SHOULD NOT contain anything after it. As the size
of any chunk is even, the size given by the RIFF header is also even. The
contents of individual chunks will be described in the following sections.
Simple file format (lossy)
Simple File Format (Lossy)
--------------------------
This layout SHOULD be used if the image requires _lossy_ encoding and does not
@@ -215,7 +210,8 @@ width and height. That is assumed to be the width and height of the canvas.
The VP8 specification describes how to decode the image into Y'CbCr
format. To convert to RGB, Rec. 601 SHOULD be used.
Simple file format (lossless)
Simple File Format (Lossless)
-----------------------------
**Note:** Older readers may not support files using the lossless format.
@@ -253,7 +249,8 @@ The current specification of the VP8L bitstream can be found at
contains the VP8L image width and height. That is assumed to be the width
and height of the canvas.
Extended file format
Extended File Format
--------------------
**Note:** Older readers may not support files using the extended format.
@@ -278,10 +275,6 @@ For a _still image_, the _image data_ consists of a single frame, whereas for
an _animated image_, it consists of multiple frames. More details about frames
can be found in the [Animation](#animation) section.
Moreover, each frame can be fragmented or non-fragmented, as will be described
in the [Extended WebP file header](#extended_header) section. More details about
fragments can be found in the [Fragments](#fragments) section.
All chunks SHOULD be placed in the same order as listed above. If a chunk
appears in the wrong place, the file is invalid, but readers MAY parse the
file, ignoring the chunks that come too late.
@@ -302,7 +295,7 @@ Extended WebP file header:
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ChunkHeader('VP8X') |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
|Rsv|I|L|E|X|A|F| Reserved |
|Rsv|I|L|E|X|A|R| Reserved |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Canvas Width Minus One | ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
@@ -320,7 +313,7 @@ ICC profile (I): 1 bit
Alpha (L): 1 bit
: Set if any of the frames of the image contain transparency information
("alpha").
("alpha").
EXIF metadata (E): 1 bit
@@ -333,11 +326,11 @@ XMP metadata (X): 1 bit
Animation (A): 1 bit
: Set if this is an animated image. Data in 'ANIM' and 'ANMF' chunks should be
used to control the animation.
used to control the animation.
Image Fragmentation (F): 1 bit _\[status: experimental\]_
Reserved (R): 1 bit
: Set if any of the frames in the image are represented by fragments.
: SHOULD be `0`.
Reserved: 24 bits
@@ -382,9 +375,9 @@ animation.
Background Color: 32 bits (_uint32_)
: The default background color of the canvas in \[Blue, Green, Red, Alpha\]
byte order. This color MAY be used to fill the unused space on the canvas around
the frames, as well as the transparent pixels of the first frame. Background
color is also used when disposal method is `1`.
byte order. This color MAY be used to fill the unused space on the canvas
around the frames, as well as the transparent pixels of the first frame.
Background color is also used when disposal method is `1`.
**Note**:
@@ -394,6 +387,9 @@ color is also used when disposal method is `1`.
* Viewer applications SHOULD treat the background color value as a hint, and
are not required to use it.
* The canvas is cleared at the start of each loop. The background color MAY be
used to achieve this.
Loop Count: 16 bits (_uint16_)
: The number of times to loop the animation. `0` means infinitely.
@@ -402,7 +398,6 @@ This chunk MUST appear if the _Animation_ flag in the VP8X chunk is set.
If the _Animation_ flag is not set and this chunk is present, it
SHOULD be ignored.
ANMF chunk:
For animated images, this chunk contains information about a _single_ frame.
@@ -445,8 +440,8 @@ Frame Height Minus One: 24 bits (_uint24_)
Frame Duration: 24 bits (_uint24_)
: The time to wait before displaying the next frame, in 1 millisecond units.
In particular, frame duration of 0 is useful when one wants to update multiple
areas of the canvas at once during the animation.
In particular, frame duration of 0 is useful when one wants to update
multiple areas of the canvas at once during the animation.
Reserved: 6 bits
@@ -454,28 +449,28 @@ Reserved: 6 bits
Blending method (B): 1 bit
: Indicates how transparent pixels of _the current frame_ are to be blended with
corresponding pixels of the previous canvas:
: Indicates how transparent pixels of _the current frame_ are to be blended
with corresponding pixels of the previous canvas:
* `0`: Use alpha blending. After disposing of the previous frame, render the
current frame on the canvas using [alpha-blending](#alpha-blending). If the
current frame does not have an alpha channel, assume alpha value of 255,
effectively replacing the rectangle.
* `0`: Use alpha blending. After disposing of the previous frame, render the
current frame on the canvas using [alpha-blending](#alpha-blending). If
the current frame does not have an alpha channel, assume alpha value of
255, effectively replacing the rectangle.
* `1`: Do not blend. After disposing of the previous frame, render the
current frame on the canvas by overwriting the rectangle covered by the
current frame.
* `1`: Do not blend. After disposing of the previous frame, render the
current frame on the canvas by overwriting the rectangle covered by the
current frame.
Disposal method (D): 1 bit
: Indicates how _the current frame_ is to be treated after it has been displayed
(before rendering the next frame) on the canvas:
: Indicates how _the current frame_ is to be treated after it has been
displayed (before rendering the next frame) on the canvas:
* `0`: Do not dispose. Leave the canvas as is.
* `0`: Do not dispose. Leave the canvas as is.
* `1`: Dispose to background color. Fill the _rectangle_ on the canvas covered
by the _current frame_ with background color specified in the
[ANIM chunk](#anim_chunk).
* `1`: Dispose to background color. Fill the _rectangle_ on the canvas
covered by the _current frame_ with background color specified in the
[ANIM chunk](#anim_chunk).
**Notes**:
@@ -506,9 +501,7 @@ Disposal method (D): 1 bit
Frame Data: _Chunk Size_ - `16` bytes
: For a fragmented frame, it consists of multiple [fragment chunks](#fragments).
: For a non-fragmented frame, it consists of:
: Consists of:
* An optional [alpha subchunk](#alpha) for the frame.
@@ -519,49 +512,6 @@ Frame Data: _Chunk Size_ - `16` bytes
**Note**: The 'ANMF' payload, _Frame Data_ above, consists of individual
_padded_ chunks as described by the [RIFF file format](#riff-file-format).
#### Fragments _\[status: experimental\]_
For images that are represented by fragments, this chunk contains data for
a single fragment. If the _Image Fragmentation Flag_ is not set, then this chunk
SHOULD NOT be present.
0 1 2 3
0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| ChunkHeader('FRGM') |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
| Fragment X | ...
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
... Fragment Y | Fragment Data |
+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
Fragment X: 24 bits (_uint24_)
: The X coordinate of the upper left corner of the fragment is `Fragment X * 2`
Fragment Y: 24 bits (_uint24_)
: The Y coordinate of the upper left corner of the fragment is `Fragment Y * 2`
Fragment Data: _Chunk Size_ - `6` bytes
: It contains:
* An optional [alpha subchunk](#alpha) for the fragment.
* The [bitstream subchunk](#bitstream-vp8vp8l) for the fragment.
* An optional list of [unknown chunks](#unknown-chunks).
Note: The width and height of the fragment is obtained from the bitstream
subchunk.
The fragments of a frame SHOULD have the following properties:
* They collectively cover the whole frame.
* No pair of fragments have any overlapping region on the frame.
* No portion of any fragment should be located outside of the canvas.
#### Alpha
0 1 2 3
@@ -579,20 +529,20 @@ Reserved (Rsv): 2 bits
Pre-processing (P): 2 bits
: These INFORMATIVE bits are used to signal the pre-processing that has
been performed during compression. The decoder can use this information to
e.g. dither the values or smooth the gradients prior to display.
been performed during compression. The decoder can use this information to
e.g. dither the values or smooth the gradients prior to display.
* `0`: no pre-processing
* `1`: level reduction
* `0`: no pre-processing
* `1`: level reduction
Filtering method (F): 2 bits
: The filtering method used:
* `0`: None.
* `1`: Horizontal filter.
* `2`: Vertical filter.
* `3`: Gradient filter.
* `0`: None.
* `1`: Horizontal filter.
* `2`: Vertical filter.
* `3`: Gradient filter.
For each pixel, filtering is performed using the following calculations.
Assume the alpha values surrounding the current `X` position are labeled as:
@@ -636,15 +586,15 @@ Compression method (C): 2 bits
: The compression method used:
* `0`: No compression.
* `1`: Compressed using the WebP lossless format.
* `0`: No compression.
* `1`: Compressed using the WebP lossless format.
Alpha bitstream: _Chunk Size_ - `1` bytes
: Encoded alpha bitstream.
This optional chunk contains encoded alpha data for this frame/fragment. A
frame/fragment containing a 'VP8L' chunk SHOULD NOT contain this chunk.
This optional chunk contains encoded alpha data for this frame. A frame
containing a 'VP8L' chunk SHOULD NOT contain this chunk.
**Rationale**: The transparency information is already part of the 'VP8L'
chunk.
@@ -675,15 +625,15 @@ compression method is '0') or compressed using the lossless format
#### Bitstream (VP8/VP8L)
This chunk contains compressed bitstream data for a single frame/fragment.
This chunk contains compressed bitstream data for a single frame.
A bitstream chunk may be either (i) a VP8 chunk, using "VP8 " (note the
significant fourth-character space) as its tag _or_ (ii) a VP8L chunk, using
"VP8L" as its tag.
The formats of VP8 and VP8L chunks are as described in sections
[Simple file format (lossy)](#simple-file-format-lossy)
and [Simple file format (lossless)](#simple-file-format-lossless) respectively.
[Simple File Format (Lossy)](#simple-file-format-lossy)
and [Simple File Format (Lossless)](#simple-file-format-lossless) respectively.
#### Color profile
@@ -731,7 +681,6 @@ EXIF Metadata: _Chunk Size_ bytes
: image metadata in EXIF format.
XMP chunk:
0 1 2 3
@@ -762,47 +711,17 @@ A file MAY contain unknown chunks:
* At the end of the file as described in [Extended WebP file
header](#extended_header) section.
* At the end of FRGM and ANMF chunks as described in [Fragments](#fragments)
and [Animation](#animation) sections.
* At the end of ANMF chunks as described in the
[Animation](#animation) section.
Readers SHOULD ignore these chunks. Writers SHOULD preserve them in their
original order (unless they specifically intend to modify these chunks).
### Assembling the Canvas from fragments/frames
### Assembling the Canvas from frames
Here we provide an overview of how a reader should assemble a canvas in case
of a fragmented-image and in case of an animated image. The notation
_VP8X.field_ means the field in the 'VP8X' chunk with the same description.
Displaying a _fragmented image_ canvas MUST be equivalent to the following
pseudocode: _\[status: experimental\]_
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
assert VP8X.flags.hasFragments
canvas ← new black image of size VP8X.canvasWidth x VP8X.canvasHeight.
frgm_params ← nil
for chunk in image_data:
assert chunk.tag is "FRGM"
frgm_params.fragmentX = Fragment X
frgm_params.fragmentY = Fragment Y
for subchunk in 'Fragment Data':
if subchunk.tag == "ALPH":
assert alpha subchunks not found in 'Fragment Data' earlier
frgm_params.alpha = alpha_data
else if subchunk.tag == "VP8 " OR subchunk.tag == "VP8L":
assert bitstream subchunks not found in 'Fragment Data' earlier
frgm_params.bitstream = bitstream_data
frgm_params.fragmentWidth = Width extracted from bitstream subchunk
frgm_params.fragmentHeight = Height extracted from bitstream subchunk
assert VP8X.canvasWidth >=
frgm_params.fragmentX + frgm_params.fragmentWidth
assert VP8X.canvasHeight >=
frgm_params.fragmentY + frgm_params.fragmentHeight
assert fragment has the properties mentioned in "Image Fragments" section.
render fragment with frame_params.alpha and frame_params.bitstream on canvas
with top-left corner in (frgm_params.fragmentX, frgm_params.fragmentY).
canvas contains the decoded canvas.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Here we provide an overview of how a reader should assemble a canvas in the
case of an animated image. The notation _VP8X.field_ means the field in the
'VP8X' chunk with the same description.
Displaying an _animated image_ canvas MUST be equivalent to the following
pseudocode:
@@ -810,28 +729,25 @@ pseudocode:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
assert VP8X.flags.hasAnimation
canvas ← new image of size VP8X.canvasWidth x VP8X.canvasHeight with
background color ANIM.background_color.
background color ANIM.background_color.
loop_count ← ANIM.loopCount
dispose_method ← ANIM.disposeMethod
if loop_count == 0:
loop_count = ∞
frame_params ← nil
for loop = 0, ..., loop_count - 1
assert next chunk in image_data is ANMF
frame_params.frameX = Frame X
frame_params.frameY = Frame Y
frame_params.frameWidth = Frame Width Minus One + 1
frame_params.frameHeight = Frame Height Minus One + 1
frame_params.frameDuration = Frame Duration
assert VP8X.canvasWidth >= frame_params.frameX + frame_params.frameWidth
assert VP8X.canvasHeight >= frame_params.frameY + frame_params.frameHeight
if VP8X.flags.hasFragments and first subchunk in 'Frame Data' is FRGM
// Fragmented frame.
frame_params.{bitstream,alpha} = canvas decoded from subchunks in
'Frame Data' as per the pseudocode for
_fragmented image_ above.
else
// Non-fragmented frame.
assert next chunk in image_data is ANMF
for loop = 0..loop_count - 1
clear canvas to ANIM.background_color or application defined color
until eof or non-ANMF chunk
frame_params.frameX = Frame X
frame_params.frameY = Frame Y
frame_params.frameWidth = Frame Width Minus One + 1
frame_params.frameHeight = Frame Height Minus One + 1
frame_params.frameDuration = Frame Duration
frame_right = frame_params.frameX + frame_params.frameWidth
frame_bottom = frame_params.frameY + frame_params.frameHeight
assert VP8X.canvasWidth >= frame_right
assert VP8X.canvasHeight >= frame_bottom
for subchunk in 'Frame Data':
if subchunk.tag == "ALPH":
assert alpha subchunks not found in 'Frame Data' earlier
@@ -839,14 +755,15 @@ for loop = 0, ..., loop_count - 1
else if subchunk.tag == "VP8 " OR subchunk.tag == "VP8L":
assert bitstream subchunks not found in 'Frame Data' earlier
frame_params.bitstream = bitstream_data
render frame with frame_params.alpha and frame_params.bitstream on canvas
with top-left corner in (frame_params.frameX, frame_params.frameY), using
dispose method dispose_method.
Show the contents of the image for frame_params.frameDuration * 1ms.
canvas contains the decoded canvas.
render frame with frame_params.alpha and frame_params.bitstream on
canvas with top-left corner at (frame_params.frameX,
frame_params.frameY), using dispose method dispose_method.
canvas contains the decoded image.
Show the contents of the canvas for frame_params.frameDuration * 1ms.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
Example file layouts
Example File Layouts
--------------------
A lossy encoded image with alpha may look as follows:
@@ -878,17 +795,6 @@ RIFF/WEBP
+- XMP (metadata)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
A fragmented image may look as follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RIFF/WEBP
+- VP8X (descriptions of features used)
+- FRGM (fragment1 parameters + data)
+- FRGM (fragment2 parameters + data)
+- FRGM (fragment3 parameters + data)
+- FRGM (fragment4 parameters + data)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
An animated image with EXIF metadata may look as follows:
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
@@ -906,4 +812,5 @@ RIFF/WEBP
[webpllspec]: https://gerrit.chromium.org/gerrit/gitweb?p=webm/libwebp.git;a=blob;f=doc/webp-lossless-bitstream-spec.txt;hb=master
[iccspec]: http://www.color.org/icc_specs2.xalter
[metadata]: http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf
[rfc 1166]: http://tools.ietf.org/html/rfc1166
[rfc 2119]: http://tools.ietf.org/html/rfc2119

View File

@@ -14,6 +14,7 @@ Specification for WebP Lossless Bitstream
_Jyrki Alakuijala, Ph.D., Google, Inc., 2012-06-19_
Paragraphs marked as \[AMENDED\] were amended on 2014-09-16.
Abstract
--------
@@ -172,8 +173,8 @@ It should be set to 0 when all alpha values are 255 in the picture, and
int alpha_is_used = ReadBits(1);
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
The version_number is a 3 bit code that must be discarded by the decoder
at this time. Complying encoders write a 3-bit value 0.
The version_number is a 3 bit code that must be set to 0. Any other value
should be treated as an error. \[AMENDED\]
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
int version_number = ReadBits(3);
@@ -330,7 +331,7 @@ uint32 Select(uint32 L, uint32 T, uint32 TL) {
abs(pGreen - GREEN(T)) + abs(pBlue - BLUE(T));
// Return either left or top, the one closer to the prediction.
if (pL <= pT) {
if (pL < pT) { // \[AMENDED\]
return L;
} else {
return T;
@@ -542,6 +543,9 @@ color.
argb = color_table[GREEN(argb)];
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
If the index is equal or larger than color_table_size, the argb color value
should be set to 0x00000000 (transparent black). \[AMENDED\]
When the color table is small (equal to or less than 16 colors), several
pixels are bundled into a single pixel. The pixel bundling packs several
(2, 4, or 8) pixels into a single pixel, reducing the image width

71
examples/Android.mk Normal file
View File

@@ -0,0 +1,71 @@
LOCAL_PATH := $(call my-dir)
################################################################################
# libexample_util
include $(CLEAR_VARS)
LOCAL_SRC_FILES := \
example_util.c \
LOCAL_CFLAGS := $(WEBP_CFLAGS)
LOCAL_C_INCLUDES := $(LOCAL_PATH)/../src
LOCAL_MODULE := example_util
include $(BUILD_STATIC_LIBRARY)
################################################################################
# cwebp
include $(CLEAR_VARS)
# Note: to enable jpeg/png encoding the sources from AOSP can be used with
# minor modification to their Android.mk files.
LOCAL_SRC_FILES := \
cwebp.c \
jpegdec.c \
metadata.c \
pngdec.c \
tiffdec.c \
webpdec.c \
LOCAL_CFLAGS := $(WEBP_CFLAGS)
LOCAL_C_INCLUDES := $(LOCAL_PATH)/../src
LOCAL_STATIC_LIBRARIES := example_util webp
LOCAL_MODULE := cwebp
include $(BUILD_EXECUTABLE)
################################################################################
# dwebp
include $(CLEAR_VARS)
LOCAL_SRC_FILES := \
dwebp.c \
LOCAL_CFLAGS := $(WEBP_CFLAGS)
LOCAL_C_INCLUDES := $(LOCAL_PATH)/../src
LOCAL_STATIC_LIBRARIES := example_util webp
LOCAL_MODULE := dwebp
include $(BUILD_EXECUTABLE)
################################################################################
# webpmux
include $(CLEAR_VARS)
LOCAL_SRC_FILES := \
webpmux.c \
LOCAL_CFLAGS := $(WEBP_CFLAGS)
LOCAL_C_INCLUDES := $(LOCAL_PATH)/../src
LOCAL_STATIC_LIBRARIES := example_util webpmux webp
LOCAL_MODULE := webpmux_example
include $(BUILD_EXECUTABLE)

View File

@@ -1,4 +1,4 @@
AM_CPPFLAGS = -I$(top_srcdir)/src
AM_CPPFLAGS = -I$(top_builddir)/src -I$(top_srcdir)/src
bin_PROGRAMS = dwebp cwebp
if BUILD_VWEBP
@@ -14,7 +14,7 @@ endif
noinst_LTLIBRARIES = libexampleutil.la
libexampleutil_la_SOURCES = example_util.c example_util.h
libexampleutil_la_SOURCES = example_util.c example_util.h stopwatch.h
dwebp_SOURCES = dwebp.c stopwatch.h
dwebp_CPPFLAGS = $(AM_CPPFLAGS) $(USE_EXPERIMENTAL_CODE)
@@ -25,10 +25,12 @@ cwebp_SOURCES = cwebp.c metadata.c metadata.h stopwatch.h
cwebp_SOURCES += jpegdec.c jpegdec.h
cwebp_SOURCES += pngdec.c pngdec.h
cwebp_SOURCES += tiffdec.c tiffdec.h
cwebp_SOURCES += webpdec.c webpdec.h
cwebp_SOURCES += wicdec.c wicdec.h
cwebp_CPPFLAGS = $(AM_CPPFLAGS) $(USE_EXPERIMENTAL_CODE)
cwebp_CPPFLAGS += $(JPEG_INCLUDES) $(PNG_INCLUDES) $(TIFF_INCLUDES)
cwebp_LDADD = ../src/libwebp.la $(JPEG_LIBS) $(PNG_LIBS) $(TIFF_LIBS)
cwebp_LDADD = libexampleutil.la ../src/libwebp.la
cwebp_LDADD += $(JPEG_LIBS) $(PNG_LIBS) $(TIFF_LIBS)
gif2webp_SOURCES = gif2webp.c gif2webp_util.c
gif2webp_CPPFLAGS = $(AM_CPPFLAGS) $(USE_EXPERIMENTAL_CODE) $(GIF_INCLUDES)

View File

@@ -17,17 +17,19 @@
#include <string.h>
#ifdef HAVE_CONFIG_H
#include "config.h"
#include "webp/config.h"
#endif
#include "webp/encode.h"
#include "./example_util.h"
#include "./metadata.h"
#include "./stopwatch.h"
#include "./jpegdec.h"
#include "./pngdec.h"
#include "./tiffdec.h"
#include "./webpdec.h"
#include "./wicdec.h"
#ifndef WEBP_DLL
@@ -93,6 +95,9 @@ static int ReadPicture(const char* const filename, WebPPicture* const pic,
} else {
// If no size specified, try to decode it using WIC.
ok = ReadPictureWithWIC(filename, pic, keep_alpha, metadata);
if (!ok) {
ok = ReadWebP(filename, pic, keep_alpha, metadata);
}
}
if (!ok) {
fprintf(stderr, "Error! Could not process file %s\n", filename);
@@ -106,26 +111,30 @@ typedef enum {
PNG_ = 0,
JPEG_,
TIFF_, // 'TIFF' clashes with libtiff
WEBP_,
UNSUPPORTED
} InputFileFormat;
static InputFileFormat GetImageType(FILE* in_file) {
InputFileFormat format = UNSUPPORTED;
uint32_t magic;
uint8_t buf[4];
uint32_t magic1, magic2;
uint8_t buf[12];
if ((fread(&buf[0], 4, 1, in_file) != 1) ||
if ((fread(&buf[0], 12, 1, in_file) != 1) ||
(fseek(in_file, 0, SEEK_SET) != 0)) {
return format;
}
magic = ((uint32_t)buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | buf[3];
if (magic == 0x89504E47U) {
magic1 = ((uint32_t)buf[0] << 24) | (buf[1] << 16) | (buf[2] << 8) | buf[3];
magic2 = ((uint32_t)buf[8] << 24) | (buf[9] << 16) | (buf[10] << 8) | buf[11];
if (magic1 == 0x89504E47U) {
format = PNG_;
} else if (magic >= 0xFFD8FF00U && magic <= 0xFFD8FFFFU) {
} else if (magic1 >= 0xFFD8FF00U && magic1 <= 0xFFD8FFFFU) {
format = JPEG_;
} else if (magic == 0x49492A00 || magic == 0x4D4D002A) {
} else if (magic1 == 0x49492A00 || magic1 == 0x4D4D002A) {
format = TIFF_;
} else if (magic1 == 0x52494646 && magic2 == 0x57454250) {
format = WEBP_;
}
return format;
}
@@ -148,6 +157,8 @@ static int ReadPicture(const char* const filename, WebPPicture* const pic,
ok = ReadJPEG(in_file, pic, metadata);
} else if (format == TIFF_) {
ok = ReadTIFF(filename, pic, keep_alpha, metadata);
} else if (format == WEBP_) {
ok = ReadWebP(filename, pic, keep_alpha, metadata);
}
} else {
// If image size is specified, infer it as YUV format.
@@ -266,10 +277,6 @@ static void PrintExtraInfoLossy(const WebPPicture* const pic, int short_output,
fprintf(stderr, " transparency: %6d (%.1f dB)\n",
stats->alpha_data_size, stats->PSNR[4]);
}
if (stats->layer_data_size) {
fprintf(stderr, " enhancement: %6d\n",
stats->layer_data_size);
}
fprintf(stderr, " Residuals bytes "
"|segment 1|segment 2|segment 3"
"|segment 4| total\n");
@@ -298,6 +305,9 @@ static void PrintExtraInfoLossy(const WebPPicture* const pic, int short_output,
PrintFullLosslessInfo(stats, "alpha");
}
}
}
static void PrintMapInfo(const WebPPicture* const pic) {
if (pic->extra_info != NULL) {
const int mb_w = (pic->width + 15) / 16;
const int mb_h = (pic->height + 15) / 16;
@@ -307,18 +317,18 @@ static void PrintExtraInfoLossy(const WebPPicture* const pic, int short_output,
for (x = 0; x < mb_w; ++x) {
const int c = pic->extra_info[x + y * mb_w];
if (type == 1) { // intra4/intra16
printf("%c", "+."[c]);
fprintf(stderr, "%c", "+."[c]);
} else if (type == 2) { // segments
printf("%c", ".-*X"[c]);
fprintf(stderr, "%c", ".-*X"[c]);
} else if (type == 3) { // quantizers
printf("%.2d ", c);
fprintf(stderr, "%.2d ", c);
} else if (type == 6 || type == 7) {
printf("%3d ", c);
fprintf(stderr, "%3d ", c);
} else {
printf("0x%.2x ", c);
fprintf(stderr, "0x%.2x ", c);
}
}
printf("\n");
fprintf(stderr, "\n");
}
}
}
@@ -531,9 +541,8 @@ static int WriteWebPWithMetadata(FILE* const out,
//------------------------------------------------------------------------------
static int ProgressReport(int percent, const WebPPicture* const picture) {
printf("[%s]: %3d %% \r",
(char*)picture->user_data, percent);
fflush(stdout);
fprintf(stderr, "[%s]: %3d %% \r",
(char*)picture->user_data, percent);
return 1; // all ok
}
@@ -550,35 +559,39 @@ static void HelpShort(void) {
static void HelpLong(void) {
printf("Usage:\n");
printf(" cwebp [-preset <...>] [options] in_file [-o out_file]\n\n");
printf("If input size (-s) for an image is not specified, "
"it is assumed to be a PNG, JPEG or TIFF file.\n");
printf("If input size (-s) for an image is not specified, it is\n"
"assumed to be a PNG, JPEG, TIFF or WebP file.\n");
#ifdef HAVE_WINCODEC_H
printf("Windows builds can take as input any of the files handled by WIC\n");
printf("Windows builds can take as input any of the files handled by WIC.\n");
#endif
printf("options:\n");
printf("\nOptions:\n");
printf(" -h / -help ............ short help\n");
printf(" -H / -longhelp ........ long help\n");
printf(" -q <float> ............. quality factor (0:small..100:big)\n");
printf(" -alpha_q <int> ......... Transparency-compression quality "
"(0..100).\n");
printf(" -preset <string> ....... Preset setting, one of:\n");
printf(" -alpha_q <int> ......... transparency-compression quality "
"(0..100)\n");
printf(" -preset <string> ....... preset setting, one of:\n");
printf(" default, photo, picture,\n");
printf(" drawing, icon, text\n");
printf(" -preset must come first, as it overwrites other parameters.");
printf(" -preset must come first, as it overwrites other parameters\n");
#if WEBP_ENCODER_ABI_VERSION > 0x0202
printf(" -z <int> ............... activates lossless preset with given\n"
" level in [0:fast, ..., 9:slowest]\n");
#endif
printf("\n");
printf(" -m <int> ............... compression method (0=fast, 6=slowest)\n");
printf(" -segments <int> ........ number of segments to use (1..4)\n");
printf(" -size <int> ............ Target size (in bytes)\n");
printf(" -psnr <float> .......... Target PSNR (in dB. typically: 42)\n");
printf(" -size <int> ............ target size (in bytes)\n");
printf(" -psnr <float> .......... target PSNR (in dB. typically: 42)\n");
printf("\n");
printf(" -s <int> <int> ......... Input size (width x height) for YUV\n");
printf(" -sns <int> ............. Spatial Noise Shaping (0:off, 100:max)\n");
printf(" -s <int> <int> ......... input size (width x height) for YUV\n");
printf(" -sns <int> ............. spatial noise shaping (0:off, 100:max)\n");
printf(" -f <int> ............... filter strength (0=off..100)\n");
printf(" -sharpness <int> ....... "
"filter sharpness (0:most .. 7:least sharp)\n");
printf(" -strong ................ use strong filter instead "
"of simple (default).\n");
printf(" -nostrong .............. use simple filter instead of strong.\n");
"of simple (default)\n");
printf(" -nostrong .............. use simple filter instead of strong\n");
printf(" -partition_limit <int> . limit quality to fit the 512k limit on\n");
printf(" "
"the first partition (0=no degradation ... 100=full)\n");
@@ -587,26 +600,23 @@ static void HelpLong(void) {
printf(" -resize <w> <h> ........ resize picture (after any cropping)\n");
printf(" -mt .................... use multi-threading if available\n");
printf(" -low_memory ............ reduce memory usage (slower encoding)\n");
#ifdef WEBP_EXPERIMENTAL_FEATURES
printf(" -444 / -422 / -gray ..... Change colorspace\n");
#endif
printf(" -map <int> ............. print map of extra info.\n");
printf(" -print_psnr ............ prints averaged PSNR distortion.\n");
printf(" -print_ssim ............ prints averaged SSIM distortion.\n");
printf(" -print_lsim ............ prints local-similarity distortion.\n");
printf(" -d <file.pgm> .......... dump the compressed output (PGM file).\n");
printf(" -alpha_method <int> .... Transparency-compression method (0..1)\n");
printf(" -alpha_filter <string> . predictive filtering for alpha plane.\n");
printf(" One of: none, fast (default) or best.\n");
printf(" -alpha_cleanup ......... Clean RGB values in transparent area.\n");
printf(" -blend_alpha <hex> ..... Blend colors against background color\n"
printf(" -map <int> ............. print map of extra info\n");
printf(" -print_psnr ............ prints averaged PSNR distortion\n");
printf(" -print_ssim ............ prints averaged SSIM distortion\n");
printf(" -print_lsim ............ prints local-similarity distortion\n");
printf(" -d <file.pgm> .......... dump the compressed output (PGM file)\n");
printf(" -alpha_method <int> .... transparency-compression method (0..1)\n");
printf(" -alpha_filter <string> . predictive filtering for alpha plane,\n");
printf(" one of: none, fast (default) or best\n");
printf(" -alpha_cleanup ......... clean RGB values in transparent area\n");
printf(" -blend_alpha <hex> ..... blend colors against background color\n"
" expressed as RGB values written in\n"
" hexadecimal, e.g. 0xc0e0d0 for red=0xc0\n"
" green=0xe0 and blue=0xd0.\n");
printf(" -noalpha ............... discard any transparency information.\n");
printf(" -lossless .............. Encode image losslessly.\n");
printf(" -hint <string> ......... Specify image characteristics hint.\n");
printf(" One of: photo, picture or graph\n");
" green=0xe0 and blue=0xd0\n");
printf(" -noalpha ............... discard any transparency information\n");
printf(" -lossless .............. encode image losslessly\n");
printf(" -hint <string> ......... specify image characteristics hint,\n");
printf(" one of: photo, picture or graph\n");
printf("\n");
printf(" -metadata <string> ..... comma separated list of metadata to\n");
@@ -617,18 +627,18 @@ static void HelpLong(void) {
printf("\n");
printf(" -short ................. condense printed message\n");
printf(" -quiet ................. don't print anything.\n");
printf(" -version ............... print version number and exit.\n");
printf(" -quiet ................. don't print anything\n");
printf(" -version ............... print version number and exit\n");
#ifndef WEBP_DLL
printf(" -noasm ................. disable all assembly optimizations.\n");
printf(" -noasm ................. disable all assembly optimizations\n");
#endif
printf(" -v ..................... verbose, e.g. print encoding/decoding "
"times\n");
printf(" -progress .............. report encoding progress\n");
printf("\n");
printf("Experimental Options:\n");
printf(" -jpeg_like ............. Roughly match expected JPEG size.\n");
printf(" -af .................... auto-adjust filter strength.\n");
printf(" -jpeg_like ............. roughly match expected JPEG size\n");
printf(" -af .................... auto-adjust filter strength\n");
printf(" -pre <int> ............. pre-processing filter\n");
printf("\n");
}
@@ -636,7 +646,7 @@ static void HelpLong(void) {
//------------------------------------------------------------------------------
// Error messages
static const char* const kErrorMessages[] = {
static const char* const kErrorMessages[VP8_ENC_ERROR_LAST] = {
"OK",
"OUT_OF_MEMORY: Out of memory allocating objects",
"BITSTREAM_OUT_OF_MEMORY: Out of memory re-allocating byte buffer",
@@ -669,6 +679,10 @@ int main(int argc, const char *argv[]) {
uint32_t background_color = 0xffffffu;
int crop = 0, crop_x = 0, crop_y = 0, crop_w = 0, crop_h = 0;
int resize_w = 0, resize_h = 0;
#if WEBP_ENCODER_ABI_VERSION > 0x0202
int lossless_preset = 6;
int use_lossless_preset = -1; // -1=unset, 0=don't use, 1=use it
#endif
int show_progress = 0;
int keep_metadata = 0;
int metadata_written = 0;
@@ -696,6 +710,7 @@ int main(int argc, const char *argv[]) {
}
for (c = 1; c < argc; ++c) {
int parse_error = 0;
if (!strcmp(argv[c], "-h") || !strcmp(argv[c], "-help")) {
HelpShort();
return 0;
@@ -717,23 +732,33 @@ int main(int argc, const char *argv[]) {
config.show_compressed = 1;
print_distortion = 2;
} else if (!strcmp(argv[c], "-short")) {
short_output++;
++short_output;
} else if (!strcmp(argv[c], "-s") && c < argc - 2) {
picture.width = strtol(argv[++c], NULL, 0);
picture.height = strtol(argv[++c], NULL, 0);
picture.width = ExUtilGetInt(argv[++c], 0, &parse_error);
picture.height = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-m") && c < argc - 1) {
config.method = strtol(argv[++c], NULL, 0);
config.method = ExUtilGetInt(argv[++c], 0, &parse_error);
#if WEBP_ENCODER_ABI_VERSION > 0x0202
use_lossless_preset = 0; // disable -z option
#endif
} else if (!strcmp(argv[c], "-q") && c < argc - 1) {
config.quality = (float)strtod(argv[++c], NULL);
config.quality = ExUtilGetFloat(argv[++c], &parse_error);
#if WEBP_ENCODER_ABI_VERSION > 0x0202
use_lossless_preset = 0; // disable -z option
} else if (!strcmp(argv[c], "-z") && c < argc - 1) {
lossless_preset = ExUtilGetInt(argv[++c], 0, &parse_error);
if (use_lossless_preset != 0) use_lossless_preset = 1;
#endif
} else if (!strcmp(argv[c], "-alpha_q") && c < argc - 1) {
config.alpha_quality = strtol(argv[++c], NULL, 0);
config.alpha_quality = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-alpha_method") && c < argc - 1) {
config.alpha_compression = strtol(argv[++c], NULL, 0);
config.alpha_compression = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-alpha_cleanup")) {
keep_alpha = keep_alpha ? 2 : 0;
} else if (!strcmp(argv[c], "-blend_alpha") && c < argc - 1) {
blend_alpha = 1;
background_color = strtol(argv[++c], NULL, 16); // <- parses '0x' prefix
// background color is given in hex with an optional '0x' prefix
background_color = ExUtilGetInt(argv[++c], 16, &parse_error);
background_color = background_color & 0x00ffffffu;
} else if (!strcmp(argv[c], "-alpha_filter") && c < argc - 1) {
++c;
@@ -764,13 +789,13 @@ int main(int argc, const char *argv[]) {
goto Error;
}
} else if (!strcmp(argv[c], "-size") && c < argc - 1) {
config.target_size = strtol(argv[++c], NULL, 0);
config.target_size = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-psnr") && c < argc - 1) {
config.target_PSNR = (float)strtod(argv[++c], NULL);
config.target_PSNR = ExUtilGetFloat(argv[++c], &parse_error);
} else if (!strcmp(argv[c], "-sns") && c < argc - 1) {
config.sns_strength = strtol(argv[++c], NULL, 0);
config.sns_strength = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-f") && c < argc - 1) {
config.filter_strength = strtol(argv[++c], NULL, 0);
config.filter_strength = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-af")) {
config.autofilter = 1;
} else if (!strcmp(argv[c], "-jpeg_like")) {
@@ -784,34 +809,26 @@ int main(int argc, const char *argv[]) {
} else if (!strcmp(argv[c], "-nostrong")) {
config.filter_type = 0;
} else if (!strcmp(argv[c], "-sharpness") && c < argc - 1) {
config.filter_sharpness = strtol(argv[++c], NULL, 0);
config.filter_sharpness = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-pass") && c < argc - 1) {
config.pass = strtol(argv[++c], NULL, 0);
config.pass = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-pre") && c < argc - 1) {
config.preprocessing = strtol(argv[++c], NULL, 0);
config.preprocessing = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-segments") && c < argc - 1) {
config.segments = strtol(argv[++c], NULL, 0);
config.segments = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-partition_limit") && c < argc - 1) {
config.partition_limit = strtol(argv[++c], NULL, 0);
config.partition_limit = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-map") && c < argc - 1) {
picture.extra_info_type = strtol(argv[++c], NULL, 0);
#ifdef WEBP_EXPERIMENTAL_FEATURES
} else if (!strcmp(argv[c], "-444")) {
picture.colorspace = WEBP_YUV444;
} else if (!strcmp(argv[c], "-422")) {
picture.colorspace = WEBP_YUV422;
} else if (!strcmp(argv[c], "-gray")) {
picture.colorspace = WEBP_YUV400;
#endif
picture.extra_info_type = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-crop") && c < argc - 4) {
crop = 1;
crop_x = strtol(argv[++c], NULL, 0);
crop_y = strtol(argv[++c], NULL, 0);
crop_w = strtol(argv[++c], NULL, 0);
crop_h = strtol(argv[++c], NULL, 0);
crop_x = ExUtilGetInt(argv[++c], 0, &parse_error);
crop_y = ExUtilGetInt(argv[++c], 0, &parse_error);
crop_w = ExUtilGetInt(argv[++c], 0, &parse_error);
crop_h = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-resize") && c < argc - 2) {
resize_w = strtol(argv[++c], NULL, 0);
resize_h = strtol(argv[++c], NULL, 0);
resize_w = ExUtilGetInt(argv[++c], 0, &parse_error);
resize_h = ExUtilGetInt(argv[++c], 0, &parse_error);
#ifndef WEBP_DLL
} else if (!strcmp(argv[c], "-noasm")) {
VP8GetCPUInfo = NULL;
@@ -906,6 +923,11 @@ int main(int argc, const char *argv[]) {
} else {
in_file = argv[c];
}
if (parse_error) {
HelpLong();
return -1;
}
}
if (in_file == NULL) {
fprintf(stderr, "No input file specified!\n");
@@ -913,6 +935,15 @@ int main(int argc, const char *argv[]) {
goto Error;
}
#if WEBP_ENCODER_ABI_VERSION > 0x0202
if (use_lossless_preset == 1) {
if (!WebPConfigLosslessPreset(&config, lossless_preset)) {
fprintf(stderr, "Invalid lossless preset (-z %d)\n", lossless_preset);
goto Error;
}
}
#endif
// Check for unsupported command line options for lossless mode and log
// warning for such options.
if (!quiet && config.lossless == 1) {
@@ -956,8 +987,9 @@ int main(int argc, const char *argv[]) {
}
// Open the output
if (out_file) {
out = fopen(out_file, "wb");
if (out_file != NULL) {
const int use_stdout = !strcmp(out_file, "-");
out = use_stdout ? ExUtilSetBinaryMode(stdout) : fopen(out_file, "wb");
if (out == NULL) {
fprintf(stderr, "Error! Cannot open output file '%s'\n", out_file);
goto Error;
@@ -1058,47 +1090,60 @@ int main(int argc, const char *argv[]) {
}
if (!quiet) {
if (config.lossless) {
PrintExtraInfoLossless(&picture, short_output, in_file);
} else {
PrintExtraInfoLossy(&picture, short_output, config.low_memory, in_file);
if (!short_output || print_distortion < 0) {
if (config.lossless) {
PrintExtraInfoLossless(&picture, short_output, in_file);
} else {
PrintExtraInfoLossy(&picture, short_output, config.low_memory, in_file);
}
}
if (!short_output && picture.extra_info_type > 0) {
PrintMapInfo(&picture);
}
if (print_distortion >= 0) { // print distortion
static const char* distortion_names[] = { "PSNR", "SSIM", "LSIM" };
float values[5];
// Comparison is performed in YUVA colorspace.
if (original_picture.use_argb &&
!WebPPictureARGBToYUVA(&original_picture, WEBP_YUV420A)) {
fprintf(stderr, "Error while converting original picture to YUVA.\n");
goto Error;
}
if (picture.use_argb &&
!WebPPictureARGBToYUVA(&picture, WEBP_YUV420A)) {
fprintf(stderr, "Error while converting compressed picture to YUVA.\n");
goto Error;
}
if (!WebPPictureDistortion(&picture, &original_picture,
print_distortion, values)) {
fprintf(stderr, "Error while computing the distortion.\n");
goto Error;
}
if (!short_output) {
fprintf(stderr, "%s: Y:%.2f U:%.2f V:%.2f A:%.2f Total:%.2f\n",
distortion_names[print_distortion],
values[0], values[1], values[2], values[3], values[4]);
} else {
fprintf(stderr, "%7d %.4f\n", picture.stats->coded_size, values[4]);
}
}
if (!short_output) {
PrintMetadataInfo(&metadata, metadata_written);
}
}
if (!quiet && !short_output && print_distortion >= 0) { // print distortion
static const char* distortion_names[] = { "PSNR", "SSIM", "LSIM" };
float values[5];
// Comparison is performed in YUVA colorspace.
if (original_picture.use_argb &&
!WebPPictureARGBToYUVA(&original_picture, WEBP_YUV420A)) {
fprintf(stderr, "Error while converting original picture to YUVA.\n");
goto Error;
}
if (picture.use_argb &&
!WebPPictureARGBToYUVA(&picture, WEBP_YUV420A)) {
fprintf(stderr, "Error while converting compressed picture to YUVA.\n");
goto Error;
}
if (!WebPPictureDistortion(&picture, &original_picture,
print_distortion, values)) {
fprintf(stderr, "Error while computing the distortion.\n");
goto Error;
}
fprintf(stderr, "%s: Y:%.2f U:%.2f V:%.2f A:%.2f Total:%.2f\n",
distortion_names[print_distortion],
values[0], values[1], values[2], values[3], values[4]);
}
return_value = 0;
Error:
#if WEBP_ENCODER_ABI_VERSION > 0x0203
WebPMemoryWriterClear(&memory_writer);
#else
free(memory_writer.mem);
#endif
free(picture.extra_info);
MetadataFree(&metadata);
WebPPictureFree(&picture);
WebPPictureFree(&original_picture);
if (out != NULL) {
if (out != NULL && out != stdout) {
fclose(out);
}

View File

@@ -17,11 +17,12 @@
#include <string.h>
#ifdef HAVE_CONFIG_H
#include "config.h"
#include "webp/config.h"
#endif
#ifdef WEBP_HAVE_PNG
#include <png.h>
#include <setjmp.h> // note: this must be included *after* png.h
#endif
#ifdef HAVE_WINCODEC_H
@@ -38,11 +39,6 @@
#include <wincodec.h>
#endif
#if defined(_WIN32)
#include <fcntl.h> // for _O_BINARY
#include <io.h> // for _setmode()
#endif
#include "webp/decode.h"
#include "./example_util.h"
#include "./stopwatch.h"
@@ -197,8 +193,8 @@ static int WritePNG(FILE* out_file, const WebPDecBuffer* const buffer) {
uint8_t* const rgb = buffer->u.RGBA.rgba;
const int stride = buffer->u.RGBA.stride;
const int has_alpha = (buffer->colorspace == MODE_RGBA);
png_structp png;
png_infop info;
volatile png_structp png;
volatile png_infop info;
png_uint_32 y;
png = png_create_write_struct(PNG_LIBPNG_VER_STRING,
@@ -208,11 +204,11 @@ static int WritePNG(FILE* out_file, const WebPDecBuffer* const buffer) {
}
info = png_create_info_struct(png);
if (info == NULL) {
png_destroy_write_struct(&png, NULL);
png_destroy_write_struct((png_structpp)&png, NULL);
return 0;
}
if (setjmp(png_jmpbuf(png))) {
png_destroy_write_struct(&png, &info);
png_destroy_write_struct((png_structpp)&png, (png_infopp)&info);
return 0;
}
png_init_io(png, out_file);
@@ -226,7 +222,7 @@ static int WritePNG(FILE* out_file, const WebPDecBuffer* const buffer) {
png_write_rows(png, &row, 1);
}
png_write_end(png, info);
png_destroy_write_struct(&png, &info);
png_destroy_write_struct((png_structpp)&png, (png_infopp)&info);
return 1;
}
#else // !HAVE_WINCODEC_H && !WEBP_HAVE_PNG
@@ -249,10 +245,10 @@ static int WritePPM(FILE* fout, const WebPDecBuffer* const buffer, int alpha) {
uint32_t y;
if (alpha) {
fprintf(fout, "P7\nWIDTH %d\nHEIGHT %d\nDEPTH 4\nMAXVAL 255\n"
fprintf(fout, "P7\nWIDTH %u\nHEIGHT %u\nDEPTH 4\nMAXVAL 255\n"
"TUPLTYPE RGB_ALPHA\nENDHDR\n", width, height);
} else {
fprintf(fout, "P6\n%d %d\n255\n", width, height);
fprintf(fout, "P6\n%u %u\n255\n", width, height);
}
for (y = 0; y < height; ++y) {
if (fwrite(rgb + y * stride, width, bytes_per_px, fout) != bytes_per_px) {
@@ -409,7 +405,7 @@ static int WriteAlphaPlane(FILE* fout, const WebPDecBuffer* const buffer) {
const int a_stride = buffer->u.YUVA.a_stride;
uint32_t y;
assert(a != NULL);
fprintf(fout, "P5\n%d %d\n255\n", width, height);
fprintf(fout, "P5\n%u %u\n255\n", width, height);
for (y = 0; y < height; ++y) {
if (fwrite(a + y * a_stride, width, 1, fout) != 1) {
return 0;
@@ -482,15 +478,8 @@ static int SaveOutput(const WebPDecBuffer* const buffer,
needs_open_file = (format != PNG);
#endif
#if defined(_WIN32)
if (use_stdout && _setmode(_fileno(stdout), _O_BINARY) == -1) {
fprintf(stderr, "Failed to reopen stdout in O_BINARY mode.\n");
return -1;
}
#endif
if (needs_open_file) {
fout = use_stdout ? stdout : fopen(out_file, "wb");
fout = use_stdout ? ExUtilSetBinaryMode(stdout) : fopen(out_file, "wb");
if (fout == NULL) {
fprintf(stderr, "Error opening output file %s\n", out_file);
return 0;
@@ -552,29 +541,30 @@ static void Help(void) {
" -yuv ......... save the raw YUV samples in flat layout\n"
"\n"
" Other options are:\n"
" -version .... print version number and exit.\n"
" -nofancy ..... don't use the fancy YUV420 upscaler.\n"
" -nofilter .... disable in-loop filtering.\n"
" -nodither .... disable dithering.\n"
" -version .... print version number and exit\n"
" -nofancy ..... don't use the fancy YUV420 upscaler\n"
" -nofilter .... disable in-loop filtering\n"
" -nodither .... disable dithering\n"
" -dither <d> .. dithering strength (in 0..100)\n"
#if WEBP_DECODER_ABI_VERSION > 0x0204
" -alpha_dither use alpha-plane dithering if needed\n"
#endif
" -mt .......... use multi-threading\n"
" -crop <x> <y> <w> <h> ... crop output with the given rectangle\n"
" -scale <w> <h> .......... scale the output (*after* any cropping)\n"
" -alpha ....... only save the alpha plane.\n"
#if WEBP_DECODER_ABI_VERSION > 0x0203
" -flip ........ flip the output vertically\n"
#endif
" -alpha ....... only save the alpha plane\n"
" -incremental . use incremental decoding (useful for tests)\n"
" -h ....... this help message.\n"
" -h ....... this help message\n"
" -v ....... verbose (e.g. print encoding/decoding times)\n"
#ifndef WEBP_DLL
" -noasm ....... disable all assembly optimizations.\n"
" -noasm ....... disable all assembly optimizations\n"
#endif
);
}
static const char* const kStatusMessages[] = {
"OK", "OUT_OF_MEMORY", "INVALID_PARAM", "BITSTREAM_ERROR",
"UNSUPPORTED_FEATURE", "SUSPENDED", "USER_ABORT", "NOT_ENOUGH_DATA"
};
static const char* const kFormatType[] = {
"unspecified", "lossy", "lossless"
};
@@ -597,6 +587,7 @@ int main(int argc, const char *argv[]) {
}
for (c = 1; c < argc; ++c) {
int parse_error = 0;
if (!strcmp(argv[c], "-h") || !strcmp(argv[c], "-help")) {
Help();
return 0;
@@ -627,20 +618,29 @@ int main(int argc, const char *argv[]) {
format = YUV;
} else if (!strcmp(argv[c], "-mt")) {
config.options.use_threads = 1;
#if WEBP_DECODER_ABI_VERSION > 0x0204
} else if (!strcmp(argv[c], "-alpha_dither")) {
config.options.alpha_dithering_strength = 100;
#endif
} else if (!strcmp(argv[c], "-nodither")) {
config.options.dithering_strength = 0;
} else if (!strcmp(argv[c], "-dither") && c < argc - 1) {
config.options.dithering_strength = strtol(argv[++c], NULL, 0);
config.options.dithering_strength =
ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-crop") && c < argc - 4) {
config.options.use_cropping = 1;
config.options.crop_left = strtol(argv[++c], NULL, 0);
config.options.crop_top = strtol(argv[++c], NULL, 0);
config.options.crop_width = strtol(argv[++c], NULL, 0);
config.options.crop_height = strtol(argv[++c], NULL, 0);
config.options.crop_left = ExUtilGetInt(argv[++c], 0, &parse_error);
config.options.crop_top = ExUtilGetInt(argv[++c], 0, &parse_error);
config.options.crop_width = ExUtilGetInt(argv[++c], 0, &parse_error);
config.options.crop_height = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-scale") && c < argc - 2) {
config.options.use_scaling = 1;
config.options.scaled_width = strtol(argv[++c], NULL, 0);
config.options.scaled_height = strtol(argv[++c], NULL, 0);
config.options.scaled_width = ExUtilGetInt(argv[++c], 0, &parse_error);
config.options.scaled_height = ExUtilGetInt(argv[++c], 0, &parse_error);
#if WEBP_DECODER_ABI_VERSION > 0x0203
} else if (!strcmp(argv[c], "-flip")) {
config.options.flip = 1;
#endif
} else if (!strcmp(argv[c], "-v")) {
verbose = 1;
#ifndef WEBP_DLL
@@ -659,6 +659,11 @@ int main(int argc, const char *argv[]) {
} else {
in_file = argv[c];
}
if (parse_error) {
Help();
return -1;
}
}
if (in_file == NULL) {
@@ -668,27 +673,11 @@ int main(int argc, const char *argv[]) {
}
{
Stopwatch stop_watch;
VP8StatusCode status = VP8_STATUS_OK;
size_t data_size = 0;
const uint8_t* data = NULL;
if (!ExUtilReadFile(in_file, &data, &data_size)) return -1;
if (verbose) {
StopwatchReset(&stop_watch);
}
status = WebPGetFeatures(data, data_size, bitstream);
if (status != VP8_STATUS_OK) {
goto end;
}
if (bitstream->has_animation) {
fprintf(stderr,
"Error! Decoding of an animated WebP file is not supported.\n"
" Use webpmux to extract the individual frames or\n"
" vwebp to view this image.\n");
if (!ExUtilLoadWebP(in_file, &data, &data_size, bitstream)) {
return -1;
}
switch (format) {
@@ -724,31 +713,16 @@ int main(int argc, const char *argv[]) {
return -1;
}
// Decoding call.
if (!incremental) {
status = WebPDecode(data, data_size, &config);
if (incremental) {
status = ExUtilDecodeWebPIncremental(data, data_size, verbose, &config);
} else {
WebPIDecoder* const idec = WebPIDecode(data, data_size, &config);
if (idec == NULL) {
fprintf(stderr, "Failed during WebPINewDecoder().\n");
status = VP8_STATUS_OUT_OF_MEMORY;
goto end;
} else {
status = WebPIUpdate(idec, data, data_size);
WebPIDelete(idec);
}
status = ExUtilDecodeWebP(data, data_size, verbose, &config);
}
if (verbose) {
const double decode_time = StopwatchReadAndReset(&stop_watch);
fprintf(stderr, "Time to decode picture: %.3fs\n", decode_time);
}
end:
free((void*)data);
ok = (status == VP8_STATUS_OK);
if (!ok) {
fprintf(stderr, "Decoding of %s failed.\n", in_file);
fprintf(stderr, "Status: %d (%s)\n", status, kStatusMessages[status]);
ExUtilPrintWebPError(in_file, status);
goto Exit;
}
}

View File

@@ -11,20 +11,104 @@
//
#include "./example_util.h"
#if defined(_WIN32)
#include <fcntl.h> // for _O_BINARY
#include <io.h> // for _setmode()
#endif
#include <stdio.h>
#include <stdlib.h>
#include <string.h>
#include "webp/decode.h"
#include "./stopwatch.h"
//------------------------------------------------------------------------------
// String parsing
uint32_t ExUtilGetUInt(const char* const v, int base, int* const error) {
char* end = NULL;
const uint32_t n = (v != NULL) ? (uint32_t)strtoul(v, &end, base) : 0u;
if (end == v && error != NULL && !*error) {
*error = 1;
fprintf(stderr, "Error! '%s' is not an integer.\n",
(v != NULL) ? v : "(null)");
}
return n;
}
int ExUtilGetInt(const char* const v, int base, int* const error) {
return (int)ExUtilGetUInt(v, base, error);
}
float ExUtilGetFloat(const char* const v, int* const error) {
char* end = NULL;
const float f = (v != NULL) ? (float)strtod(v, &end) : 0.f;
if (end == v && error != NULL && !*error) {
*error = 1;
fprintf(stderr, "Error! '%s' is not a floating point number.\n",
(v != NULL) ? v : "(null)");
}
return f;
}
// -----------------------------------------------------------------------------
// File I/O
FILE* ExUtilSetBinaryMode(FILE* file) {
#if defined(_WIN32)
if (_setmode(_fileno(file), _O_BINARY) == -1) {
fprintf(stderr, "Failed to reopen file in O_BINARY mode.\n");
return NULL;
}
#endif
return file;
}
int ExUtilReadFromStdin(const uint8_t** data, size_t* data_size) {
static const size_t kBlockSize = 16384; // default initial size
size_t max_size = 0;
size_t size = 0;
uint8_t* input = NULL;
if (data == NULL || data_size == NULL) return 0;
*data = NULL;
*data_size = 0;
if (!ExUtilSetBinaryMode(stdin)) return 0;
while (!feof(stdin)) {
// We double the buffer size each time and read as much as possible.
const size_t extra_size = (max_size == 0) ? kBlockSize : max_size;
void* const new_data = realloc(input, max_size + extra_size);
if (new_data == NULL) goto Error;
input = (uint8_t*)new_data;
max_size += extra_size;
size += fread(input + size, 1, extra_size, stdin);
if (size < max_size) break;
}
if (ferror(stdin)) goto Error;
*data = input;
*data_size = size;
return 1;
Error:
free(input);
fprintf(stderr, "Could not read from stdin\n");
return 0;
}
int ExUtilReadFile(const char* const file_name,
const uint8_t** data, size_t* data_size) {
int ok;
void* file_data;
size_t file_size;
FILE* in;
const int from_stdin = (file_name == NULL) || !strcmp(file_name, "-");
if (file_name == NULL || data == NULL || data_size == NULL) return 0;
if (from_stdin) return ExUtilReadFromStdin(data, data_size);
if (data == NULL || data_size == NULL) return 0;
*data = NULL;
*data_size = 0;
@@ -56,17 +140,119 @@ int ExUtilWriteFile(const char* const file_name,
const uint8_t* data, size_t data_size) {
int ok;
FILE* out;
const int to_stdout = (file_name == NULL) || !strcmp(file_name, "-");
if (file_name == NULL || data == NULL) {
if (data == NULL) {
return 0;
}
out = fopen(file_name, "wb");
out = to_stdout ? stdout : fopen(file_name, "wb");
if (out == NULL) {
fprintf(stderr, "Error! Cannot open output file '%s'\n", file_name);
return 0;
}
ok = (fwrite(data, data_size, 1, out) == 1);
fclose(out);
if (out != stdout) fclose(out);
return ok;
}
//------------------------------------------------------------------------------
// WebP decoding
static const char* const kStatusMessages[VP8_STATUS_NOT_ENOUGH_DATA + 1] = {
"OK", "OUT_OF_MEMORY", "INVALID_PARAM", "BITSTREAM_ERROR",
"UNSUPPORTED_FEATURE", "SUSPENDED", "USER_ABORT", "NOT_ENOUGH_DATA"
};
static void PrintAnimationWarning(const WebPDecoderConfig* const config) {
if (config->input.has_animation) {
fprintf(stderr,
"Error! Decoding of an animated WebP file is not supported.\n"
" Use webpmux to extract the individual frames or\n"
" vwebp to view this image.\n");
}
}
void ExUtilPrintWebPError(const char* const in_file, int status) {
fprintf(stderr, "Decoding of %s failed.\n", in_file);
fprintf(stderr, "Status: %d", status);
if (status >= VP8_STATUS_OK && status <= VP8_STATUS_NOT_ENOUGH_DATA) {
fprintf(stderr, "(%s)", kStatusMessages[status]);
}
fprintf(stderr, "\n");
}
int ExUtilLoadWebP(const char* const in_file,
const uint8_t** data, size_t* data_size,
WebPBitstreamFeatures* bitstream) {
VP8StatusCode status;
WebPBitstreamFeatures local_features;
if (!ExUtilReadFile(in_file, data, data_size)) return 0;
if (bitstream == NULL) {
bitstream = &local_features;
}
status = WebPGetFeatures(*data, *data_size, bitstream);
if (status != VP8_STATUS_OK) {
free((void*)*data);
*data = NULL;
*data_size = 0;
ExUtilPrintWebPError(in_file, status);
return 0;
}
return 1;
}
//------------------------------------------------------------------------------
VP8StatusCode ExUtilDecodeWebP(const uint8_t* const data, size_t data_size,
int verbose, WebPDecoderConfig* const config) {
Stopwatch stop_watch;
VP8StatusCode status = VP8_STATUS_OK;
if (config == NULL) return VP8_STATUS_INVALID_PARAM;
PrintAnimationWarning(config);
StopwatchReset(&stop_watch);
// Decoding call.
status = WebPDecode(data, data_size, config);
if (verbose) {
const double decode_time = StopwatchReadAndReset(&stop_watch);
fprintf(stderr, "Time to decode picture: %.3fs\n", decode_time);
}
return status;
}
VP8StatusCode ExUtilDecodeWebPIncremental(
const uint8_t* const data, size_t data_size,
int verbose, WebPDecoderConfig* const config) {
Stopwatch stop_watch;
VP8StatusCode status = VP8_STATUS_OK;
if (config == NULL) return VP8_STATUS_INVALID_PARAM;
PrintAnimationWarning(config);
StopwatchReset(&stop_watch);
// Decoding call.
{
WebPIDecoder* const idec = WebPIDecode(data, data_size, config);
if (idec == NULL) {
fprintf(stderr, "Failed during WebPINewDecoder().\n");
return VP8_STATUS_OUT_OF_MEMORY;
} else {
status = WebPIUpdate(idec, data, data_size);
WebPIDelete(idec);
}
}
if (verbose) {
const double decode_time = StopwatchReadAndReset(&stop_watch);
fprintf(stderr, "Time to decode picture: %.3fs\n", decode_time);
}
return status;
}
// -----------------------------------------------------------------------------

View File

@@ -13,22 +13,75 @@
#ifndef WEBP_EXAMPLES_EXAMPLE_UTIL_H_
#define WEBP_EXAMPLES_EXAMPLE_UTIL_H_
#include "webp/types.h"
#include <stdio.h>
#include "webp/decode.h"
#ifdef __cplusplus
extern "C" {
#endif
//------------------------------------------------------------------------------
// String parsing
// Parses 'v' using strto(ul|l|d)(). If error is non-NULL, '*error' is set to
// true on failure while on success it is left unmodified to allow chaining of
// calls. An error is only printed on the first occurrence.
uint32_t ExUtilGetUInt(const char* const v, int base, int* const error);
int ExUtilGetInt(const char* const v, int base, int* const error);
float ExUtilGetFloat(const char* const v, int* const error);
//------------------------------------------------------------------------------
// File I/O
// Reopen file in binary (O_BINARY) mode.
// Returns 'file' on success, NULL otherwise.
FILE* ExUtilSetBinaryMode(FILE* file);
// Allocates storage for entire file 'file_name' and returns contents and size
// in 'data' and 'data_size'. Returns 1 on success, 0 otherwise. '*data' should
// be deleted using free().
// If 'file_name' is NULL or equal to "-", input is read from stdin by calling
// the function ExUtilReadFromStdin().
int ExUtilReadFile(const char* const file_name,
const uint8_t** data, size_t* data_size);
// Same as ExUtilReadFile(), but reads until EOF from stdin instead.
int ExUtilReadFromStdin(const uint8_t** data, size_t* data_size);
// Write a data segment into a file named 'file_name'. Returns true if ok.
// If 'file_name' is NULL or equal to "-", output is written to stdout.
int ExUtilWriteFile(const char* const file_name,
const uint8_t* data, size_t data_size);
//------------------------------------------------------------------------------
// WebP decoding
// Prints an informative error message regarding decode failure of 'in_file'.
// 'status' is treated as a VP8StatusCode and if valid will be printed as a
// text string.
void ExUtilPrintWebPError(const char* const in_file, int status);
// Reads a WebP from 'in_file', returning the contents and size in 'data' and
// 'data_size'. If not NULL, 'bitstream' is populated using WebPGetFeatures().
// Returns true on success.
int ExUtilLoadWebP(const char* const in_file,
const uint8_t** data, size_t* data_size,
WebPBitstreamFeatures* bitstream);
// Decodes the WebP contained in 'data'.
// 'config' is a structure previously initialized by WebPInitDecoderConfig().
// 'config->output' should have the desired colorspace selected. 'verbose' will
// cause decode timing to be reported.
// Returns the decoder status. On success 'config->output' will contain the
// decoded picture.
VP8StatusCode ExUtilDecodeWebP(const uint8_t* const data, size_t data_size,
int verbose, WebPDecoderConfig* const config);
// Same as ExUtilDecodeWebP(), but using the incremental decoder.
VP8StatusCode ExUtilDecodeWebPIncremental(
const uint8_t* const data, size_t data_size,
int verbose, WebPDecoderConfig* const config);
#ifdef __cplusplus
} // extern "C"
#endif

View File

@@ -17,7 +17,7 @@
#include <string.h>
#ifdef HAVE_CONFIG_H
#include "config.h"
#include "webp/config.h"
#endif
#ifdef WEBP_HAVE_GIF
@@ -28,6 +28,16 @@
#include "./example_util.h"
#include "./gif2webp_util.h"
// GIFLIB_MAJOR is only defined in libgif >= 4.2.0.
#if defined(GIFLIB_MAJOR) && defined(GIFLIB_MINOR)
# define LOCAL_GIF_VERSION ((GIFLIB_MAJOR << 8) | GIFLIB_MINOR)
# define LOCAL_GIF_PREREQ(maj, min) \
(LOCAL_GIF_VERSION >= (((maj) << 8) | (min)))
#else
# define LOCAL_GIF_VERSION 0
# define LOCAL_GIF_PREREQ(maj, min) 0
#endif
#define GIF_TRANSPARENT_MASK 0x01
#define GIF_DISPOSE_MASK 0x07
#define GIF_DISPOSE_SHIFT 2
@@ -36,7 +46,7 @@
//------------------------------------------------------------------------------
static int transparent_index = -1; // Index of transparent color in the map.
static int transparent_index = -1; // Opaque frame by default.
static void SanitizeKeyFrameIntervals(size_t* const kmin_ptr,
size_t* const kmax_ptr) {
@@ -103,12 +113,12 @@ static void Remap(const uint8_t* const src, const GifFileType* const gif,
static int ReadFrame(GifFileType* const gif, WebPFrameRect* const gif_rect,
WebPPicture* const webp_frame) {
WebPPicture sub_image;
const GifImageDesc image_desc = gif->Image;
const GifImageDesc* const image_desc = &gif->Image;
uint32_t* dst = NULL;
uint8_t* tmp = NULL;
int ok = 0;
WebPFrameRect rect = {
image_desc.Left, image_desc.Top, image_desc.Width, image_desc.Height
image_desc->Left, image_desc->Top, image_desc->Width, image_desc->Height
};
*gif_rect = rect;
@@ -124,7 +134,7 @@ static int ReadFrame(GifFileType* const gif, WebPFrameRect* const gif_rect,
tmp = (uint8_t*)malloc(rect.width * sizeof(*tmp));
if (tmp == NULL) goto End;
if (image_desc.Interlace) { // Interlaced image.
if (image_desc->Interlace) { // Interlaced image.
// We need 4 passes, with the following offsets and jumps.
const int interlace_offsets[] = { 0, 4, 2, 1 };
const int interlace_jumps[] = { 8, 8, 4, 2 };
@@ -153,30 +163,29 @@ static int ReadFrame(GifFileType* const gif, WebPFrameRect* const gif_rect,
return ok;
}
static int GetBackgroundColor(const ColorMapObject* const color_map,
int bgcolor_idx, uint32_t* const bgcolor) {
static void GetBackgroundColor(const ColorMapObject* const color_map,
int bgcolor_idx, uint32_t* const bgcolor) {
if (transparent_index != -1 && bgcolor_idx == transparent_index) {
*bgcolor = WEBP_UTIL_TRANSPARENT_COLOR; // Special case.
return 1;
} else if (color_map == NULL || color_map->Colors == NULL
|| bgcolor_idx >= color_map->ColorCount) {
return 0; // Invalid color map or index.
*bgcolor = WHITE_COLOR;
fprintf(stderr,
"GIF decode warning: invalid background color index. Assuming "
"white background.\n");
} else {
const GifColorType color = color_map->Colors[bgcolor_idx];
*bgcolor = (0xff << 24)
| (color.Red << 16)
| (color.Green << 8)
| (color.Blue << 0);
return 1;
}
}
static void DisplayGifError(const GifFileType* const gif, int gif_error) {
// GIFLIB_MAJOR is only defined in libgif >= 4.2.0.
// libgif 4.2.0 has retired PrintGifError() and added GifErrorString().
#if defined(GIFLIB_MAJOR) && defined(GIFLIB_MINOR) && \
((GIFLIB_MAJOR == 4 && GIFLIB_MINOR >= 2) || GIFLIB_MAJOR > 4)
#if GIFLIB_MAJOR >= 5
#if LOCAL_GIF_PREREQ(4,2)
#if LOCAL_GIF_PREREQ(5,0)
// Static string actually, hence the const char* cast.
const char* error_str = (const char*)GifErrorString(
(gif == NULL) ? gif_error : gif->Error);
@@ -194,7 +203,7 @@ static void DisplayGifError(const GifFileType* const gif, int gif_error) {
#endif
}
static const char* const kErrorMessages[] = {
static const char* const kErrorMessages[-WEBP_MUX_NOT_ENOUGH_DATA + 1] = {
"WEBP_MUX_NOT_FOUND", "WEBP_MUX_INVALID_ARGUMENT", "WEBP_MUX_BAD_DATA",
"WEBP_MUX_MEMORY_ERROR", "WEBP_MUX_NOT_ENOUGH_DATA"
};
@@ -215,26 +224,26 @@ enum {
static void Help(void) {
printf("Usage:\n");
printf(" gif2webp [options] gif_file -o webp_file\n");
printf("options:\n");
printf("Options:\n");
printf(" -h / -help ............ this help\n");
printf(" -lossy ................. Encode image using lossy compression.\n");
printf(" -mixed ................. For each frame in the image, pick lossy\n"
" or lossless compression heuristically.\n");
printf(" -lossy ................. encode image using lossy compression\n");
printf(" -mixed ................. for each frame in the image, pick lossy\n"
" or lossless compression heuristically\n");
printf(" -q <float> ............. quality factor (0:small..100:big)\n");
printf(" -m <int> ............... compression method (0=fast, 6=slowest)\n");
printf(" -kmin <int> ............ Min distance between key frames\n");
printf(" -kmax <int> ............ Max distance between key frames\n");
printf(" -kmin <int> ............ min distance between key frames\n");
printf(" -kmax <int> ............ max distance between key frames\n");
printf(" -f <int> ............... filter strength (0=off..100)\n");
printf(" -metadata <string> ..... comma separated list of metadata to\n");
printf(" ");
printf("copy from the input to the output if present.\n");
printf("copy from the input to the output if present\n");
printf(" "
"Valid values: all, none, icc, xmp (default)\n");
printf(" -mt .................... use multi-threading if available\n");
printf("\n");
printf(" -version ............... print version number and exit.\n");
printf(" -v ..................... verbose.\n");
printf(" -quiet ................. don't print anything.\n");
printf(" -version ............... print version number and exit\n");
printf(" -v ..................... verbose\n");
printf(" -quiet ................. don't print anything\n");
printf("\n");
}
@@ -250,7 +259,8 @@ int main(int argc, const char *argv[]) {
GifFileType* gif = NULL;
WebPConfig config;
WebPPicture frame;
WebPMuxFrameInfo info;
int duration = 0;
FrameDisposeMethod orig_dispose = FRAME_DISPOSE_NONE;
WebPMuxAnimParams anim = { WHITE_COLOR, 0 };
WebPFrameCache* cache = NULL;
@@ -270,17 +280,11 @@ int main(int argc, const char *argv[]) {
size_t kmax = 0;
int allow_mixed = 0; // If true, each frame can be lossy or lossless.
memset(&info, 0, sizeof(info));
info.id = WEBP_CHUNK_ANMF;
info.dispose_method = WEBP_MUX_DISPOSE_BACKGROUND;
info.blend_method = WEBP_MUX_BLEND;
if (!WebPConfigInit(&config) || !WebPPictureInit(&frame)) {
fprintf(stderr, "Error! Version mismatch!\n");
return -1;
}
config.lossless = 1; // Use lossless compression by default.
config.image_hint = WEBP_HINT_GRAPH; // always low-color
if (argc == 1) {
Help();
@@ -288,6 +292,7 @@ int main(int argc, const char *argv[]) {
}
for (c = 1; c < argc; ++c) {
int parse_error = 0;
if (!strcmp(argv[c], "-h") || !strcmp(argv[c], "-help")) {
Help();
return 0;
@@ -299,17 +304,17 @@ int main(int argc, const char *argv[]) {
allow_mixed = 1;
config.lossless = 0;
} else if (!strcmp(argv[c], "-q") && c < argc - 1) {
config.quality = (float)strtod(argv[++c], NULL);
config.quality = ExUtilGetFloat(argv[++c], &parse_error);
} else if (!strcmp(argv[c], "-m") && c < argc - 1) {
config.method = strtol(argv[++c], NULL, 0);
config.method = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-kmax") && c < argc - 1) {
kmax = strtoul(argv[++c], NULL, 0);
kmax = ExUtilGetUInt(argv[++c], 0, &parse_error);
default_kmax = 0;
} else if (!strcmp(argv[c], "-kmin") && c < argc - 1) {
kmin = strtoul(argv[++c], NULL, 0);
kmin = ExUtilGetUInt(argv[++c], 0, &parse_error);
default_kmin = 0;
} else if (!strcmp(argv[c], "-f") && c < argc - 1) {
config.filter_strength = strtol(argv[++c], NULL, 0);
config.filter_strength = ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-metadata") && c < argc - 1) {
static const struct {
const char* option;
@@ -373,6 +378,11 @@ int main(int argc, const char *argv[]) {
} else {
in_file = argv[c];
}
if (parse_error) {
Help();
return -1;
}
}
// Appropriate default kmin, kmax values for lossy and lossless.
@@ -396,24 +406,13 @@ int main(int argc, const char *argv[]) {
}
// Start the decoder object
#if defined(GIFLIB_MAJOR) && (GIFLIB_MAJOR >= 5)
// There was an API change in version 5.0.0.
#if LOCAL_GIF_PREREQ(5,0)
gif = DGifOpenFileName(in_file, &gif_error);
#else
gif = DGifOpenFileName(in_file);
#endif
if (gif == NULL) goto End;
// Allocate current buffer
frame.width = gif->SWidth;
frame.height = gif->SHeight;
frame.use_argb = 1;
if (!WebPPictureAlloc(&frame)) goto End;
// Initialize cache
cache = WebPFrameCacheNew(frame.width, frame.height, kmin, kmax, allow_mixed);
if (cache == NULL) goto End;
mux = WebPMuxNew();
if (mux == NULL) {
fprintf(stderr, "ERROR: could not create a mux object.\n");
@@ -429,13 +428,66 @@ int main(int argc, const char *argv[]) {
switch (type) {
case IMAGE_DESC_RECORD_TYPE: {
WebPFrameRect gif_rect;
GifImageDesc* const image_desc = &gif->Image;
if (!DGifGetImageDesc(gif)) goto End;
// Fix some broken GIF global headers that report
// 0 x 0 screen dimension.
if (is_first_frame) {
if (verbose) {
printf("Canvas screen: %d x %d\n", gif->SWidth, gif->SHeight);
}
if (gif->SWidth == 0 || gif->SHeight == 0) {
image_desc->Left = 0;
image_desc->Top = 0;
gif->SWidth = image_desc->Width;
gif->SHeight = image_desc->Height;
if (gif->SWidth <= 0 || gif->SHeight <= 0) {
goto End;
}
if (verbose) {
printf("Fixed canvas screen dimension to: %d x %d\n",
gif->SWidth, gif->SHeight);
}
}
#if WEBP_MUX_ABI_VERSION > 0x0101
// Set definitive canvas size.
err = WebPMuxSetCanvasSize(mux, gif->SWidth, gif->SHeight);
if (err != WEBP_MUX_OK) {
fprintf(stderr, "Invalid canvas size %d x %d\n",
gif->SWidth, gif->SHeight);
goto End;
}
#endif
// Allocate current buffer.
frame.width = gif->SWidth;
frame.height = gif->SHeight;
frame.use_argb = 1;
if (!WebPPictureAlloc(&frame)) goto End;
WebPUtilClearPic(&frame, NULL);
// Initialize cache.
cache = WebPFrameCacheNew(frame.width, frame.height,
kmin, kmax, allow_mixed);
if (cache == NULL) goto End;
// Background color.
GetBackgroundColor(gif->SColorMap, gif->SBackGroundColor,
&anim.bgcolor);
}
// Some even more broken GIF can have sub-rect with zero width/height.
if (image_desc->Width == 0 || image_desc->Height == 0) {
image_desc->Width = gif->SWidth;
image_desc->Height = gif->SHeight;
}
if (!ReadFrame(gif, &gif_rect, &frame)) {
goto End;
}
if (!WebPFrameCacheAddFrame(cache, &config, &gif_rect, &frame, &info)) {
if (!WebPFrameCacheAddFrame(cache, &config, &gif_rect, orig_dispose,
duration, &frame)) {
fprintf(stderr, "Error! Cannot encode frame as WebP\n");
fprintf(stderr, "Error code: %d\n", frame.error_code);
}
@@ -447,6 +499,13 @@ int main(int argc, const char *argv[]) {
goto End;
}
is_first_frame = 0;
// In GIF, graphic control extensions are optional for a frame, so we
// may not get one before reading the next frame. To handle this case,
// we reset frame properties to reasonable defaults for the next frame.
orig_dispose = FRAME_DISPOSE_NONE;
duration = 0;
transparent_index = -1; // Opaque frame by default.
break;
}
case EXTENSION_RECORD_TYPE: {
@@ -464,30 +523,21 @@ int main(int argc, const char *argv[]) {
const int dispose = (flags >> GIF_DISPOSE_SHIFT) & GIF_DISPOSE_MASK;
const int delay = data[2] | (data[3] << 8); // In 10 ms units.
if (data[0] != 4) goto End;
info.duration = delay * 10; // Duration is in 1 ms units for WebP.
if (dispose == 3) {
static int warning_printed = 0;
if (!warning_printed) {
fprintf(stderr, "WARNING: GIF_DISPOSE_RESTORE unsupported.\n");
warning_printed = 1;
}
// failsafe. TODO(urvang): emulate the correct behaviour by
// recoding the whole frame.
info.dispose_method = WEBP_MUX_DISPOSE_BACKGROUND;
} else {
info.dispose_method =
(dispose == 2) ? WEBP_MUX_DISPOSE_BACKGROUND
: WEBP_MUX_DISPOSE_NONE;
duration = delay * 10; // Duration is in 1 ms units for WebP.
switch (dispose) {
case 3:
orig_dispose = FRAME_DISPOSE_RESTORE_PREVIOUS;
break;
case 2:
orig_dispose = FRAME_DISPOSE_BACKGROUND;
break;
case 1:
case 0:
default:
orig_dispose = FRAME_DISPOSE_NONE;
break;
}
transparent_index = (flags & GIF_TRANSPARENT_MASK) ? data[4] : -1;
if (is_first_frame) {
if (!GetBackgroundColor(gif->SColorMap, gif->SBackGroundColor,
&anim.bgcolor)) {
fprintf(stderr, "GIF decode warning: invalid background color "
"index. Assuming white background.\n");
}
WebPUtilClearPic(&frame, NULL);
}
break;
}
case PLAINTEXT_EXT_FUNC_CODE: {
@@ -495,13 +545,16 @@ int main(int argc, const char *argv[]) {
}
case APPLICATION_EXT_FUNC_CODE: {
if (data[0] != 11) break; // Chunk is too short
if (!memcmp(data + 1, "NETSCAPE2.0", 11)) {
if (!memcmp(data + 1, "NETSCAPE2.0", 11) ||
!memcmp(data + 1, "ANIMEXTS1.0", 11)) {
// Recognize and parse Netscape2.0 NAB extension for loop count.
if (DGifGetExtensionNext(gif, &data) == GIF_ERROR) goto End;
if (data == NULL) goto End; // Loop count sub-block missing.
if (data[0] != 3 && data[1] != 1) break; // wrong size/marker
if (data[0] < 3 || data[1] != 1) break; // wrong size/marker
anim.loop_count = data[2] | (data[3] << 8);
if (verbose) printf("Loop count: %d\n", anim.loop_count);
if (verbose) {
fprintf(stderr, "Loop count: %d\n", anim.loop_count);
}
} else { // An extension containing metadata.
// We only store the first encountered chunk of each type, and
// only if requested by the user.
@@ -555,7 +608,8 @@ int main(int argc, const char *argv[]) {
// Add metadata chunk.
err = WebPMuxSetChunk(mux, fourccs[is_icc], &metadata, 1);
if (verbose) {
printf("%s size: %d\n", features[is_icc], (int)metadata.size);
fprintf(stderr, "%s size: %d\n",
features[is_icc], (int)metadata.size);
}
WebPDataClear(&metadata);
if (err != WEBP_MUX_OK) {
@@ -621,11 +675,11 @@ int main(int argc, const char *argv[]) {
goto End;
}
if (!quiet) {
printf("Saved output file: %s\n", out_file);
fprintf(stderr, "Saved output file: %s\n", out_file);
}
} else {
if (!quiet) {
printf("Nothing written; use -o flag to save the result.\n");
fprintf(stderr, "Nothing written; use -o flag to save the result.\n");
}
}
@@ -644,7 +698,11 @@ int main(int argc, const char *argv[]) {
DisplayGifError(gif, gif_error);
}
if (gif != NULL) {
#if LOCAL_GIF_PREREQ(5,1)
DGifCloseFile(gif, &gif_error);
#else
DGifCloseFile(gif);
#endif
}
return !ok;

File diff suppressed because it is too large Load Diff

View File

@@ -29,6 +29,13 @@ extern "C" {
struct WebPPicture;
// Includes all disposal methods, even the ones not supported by WebP bitstream.
typedef enum FrameDisposeMethod {
FRAME_DISPOSE_NONE,
FRAME_DISPOSE_BACKGROUND,
FRAME_DISPOSE_RESTORE_PREVIOUS
} FrameDisposeMethod;
typedef struct {
int x_offset, y_offset, width, height;
} WebPFrameRect;
@@ -53,14 +60,15 @@ WebPFrameCache* WebPFrameCacheNew(int width, int height,
// Release all the frame data from 'cache' and free 'cache'.
void WebPFrameCacheDelete(WebPFrameCache* const cache);
// Given an image described by 'frame', 'info' and 'orig_rect', optimize it for
// WebP, encode it and add it to 'cache'.
// This takes care of frame disposal too, according to 'info->dispose_method'.
// Given an image described by 'frame', 'rect', 'dispose_method' and 'duration',
// optimize it for WebP, encode it and add it to 'cache'. 'rect' can be NULL.
// This takes care of frame disposal too, according to 'dispose_method'.
// Returns false in case of error (and sets frame->error_code accordingly).
int WebPFrameCacheAddFrame(WebPFrameCache* const cache,
const WebPConfig* const config,
const WebPFrameRect* const orig_rect,
WebPPicture* const frame,
WebPMuxFrameInfo* const info);
const WebPFrameRect* const rect,
FrameDisposeMethod dispose_method, int duration,
WebPPicture* const frame);
// Flush the *ready* frames from cache and add them to 'mux'. If 'verbose' is
// true, prints the information about these frames.

View File

@@ -12,7 +12,7 @@
#include "./jpegdec.h"
#ifdef HAVE_CONFIG_H
#include "config.h"
#include "webp/config.h"
#endif
#include <stdio.h>
@@ -211,30 +211,31 @@ static void my_error_exit(j_common_ptr dinfo) {
int ReadJPEG(FILE* in_file, WebPPicture* const pic, Metadata* const metadata) {
int ok = 0;
int stride, width, height;
struct jpeg_decompress_struct dinfo;
volatile struct jpeg_decompress_struct dinfo;
struct my_error_mgr jerr;
uint8_t* rgb = NULL;
uint8_t* volatile rgb = NULL;
JSAMPROW buffer[1];
memset((j_decompress_ptr)&dinfo, 0, sizeof(dinfo)); // for setjmp sanity
dinfo.err = jpeg_std_error(&jerr.pub);
jerr.pub.error_exit = my_error_exit;
if (setjmp(jerr.setjmp_buffer)) {
Error:
MetadataFree(metadata);
jpeg_destroy_decompress(&dinfo);
jpeg_destroy_decompress((j_decompress_ptr)&dinfo);
goto End;
}
jpeg_create_decompress(&dinfo);
jpeg_stdio_src(&dinfo, in_file);
if (metadata != NULL) SaveMetadataMarkers(&dinfo);
jpeg_read_header(&dinfo, TRUE);
jpeg_create_decompress((j_decompress_ptr)&dinfo);
jpeg_stdio_src((j_decompress_ptr)&dinfo, in_file);
if (metadata != NULL) SaveMetadataMarkers((j_decompress_ptr)&dinfo);
jpeg_read_header((j_decompress_ptr)&dinfo, TRUE);
dinfo.out_color_space = JCS_RGB;
dinfo.do_fancy_upsampling = TRUE;
jpeg_start_decompress(&dinfo);
jpeg_start_decompress((j_decompress_ptr)&dinfo);
if (dinfo.output_components != 3) {
goto Error;
@@ -251,26 +252,27 @@ int ReadJPEG(FILE* in_file, WebPPicture* const pic, Metadata* const metadata) {
buffer[0] = (JSAMPLE*)rgb;
while (dinfo.output_scanline < dinfo.output_height) {
if (jpeg_read_scanlines(&dinfo, buffer, 1) != 1) {
if (jpeg_read_scanlines((j_decompress_ptr)&dinfo, buffer, 1) != 1) {
goto End;
}
buffer[0] += stride;
}
if (metadata != NULL) {
ok = ExtractMetadataFromJPEG(&dinfo, metadata);
ok = ExtractMetadataFromJPEG((j_decompress_ptr)&dinfo, metadata);
if (!ok) {
fprintf(stderr, "Error extracting JPEG metadata!\n");
goto Error;
}
}
jpeg_finish_decompress(&dinfo);
jpeg_destroy_decompress(&dinfo);
jpeg_finish_decompress((j_decompress_ptr)&dinfo);
jpeg_destroy_decompress((j_decompress_ptr)&dinfo);
// WebP conversion.
pic->width = width;
pic->height = height;
pic->use_argb = 1; // store raw RGB samples
ok = WebPPictureImportRGB(pic, rgb, stride);
if (!ok) goto Error;

View File

@@ -12,7 +12,7 @@
#include "./pngdec.h"
#ifdef HAVE_CONFIG_H
#include "config.h"
#include "webp/config.h"
#endif
#include <stdio.h>
@@ -190,9 +190,9 @@ static int ExtractMetadataFromPNG(png_structp png,
int ReadPNG(FILE* in_file, WebPPicture* const pic, int keep_alpha,
Metadata* const metadata) {
png_structp png;
png_infop info = NULL;
png_infop end_info = NULL;
volatile png_structp png;
volatile png_infop info = NULL;
volatile png_infop end_info = NULL;
int color_type, bit_depth, interlaced;
int has_alpha;
int num_passes;
@@ -200,7 +200,7 @@ int ReadPNG(FILE* in_file, WebPPicture* const pic, int keep_alpha,
int ok = 0;
png_uint_32 width, height, y;
int stride;
uint8_t* rgb = NULL;
uint8_t* volatile rgb = NULL;
png = png_create_read_struct(PNG_LIBPNG_VER_STRING, 0, 0, 0);
if (png == NULL) {
@@ -211,7 +211,6 @@ int ReadPNG(FILE* in_file, WebPPicture* const pic, int keep_alpha,
if (setjmp(png_jmpbuf(png))) {
Error:
MetadataFree(metadata);
png_destroy_read_struct(&png, &info, &end_info);
goto End;
}
@@ -228,7 +227,9 @@ int ReadPNG(FILE* in_file, WebPPicture* const pic, int keep_alpha,
png_set_strip_16(png);
png_set_packing(png);
if (color_type == PNG_COLOR_TYPE_PALETTE) png_set_palette_to_rgb(png);
if (color_type == PNG_COLOR_TYPE_PALETTE) {
png_set_palette_to_rgb(png);
}
if (color_type == PNG_COLOR_TYPE_GRAY ||
color_type == PNG_COLOR_TYPE_GRAY_ALPHA) {
if (bit_depth < 8) {
@@ -255,7 +256,7 @@ int ReadPNG(FILE* in_file, WebPPicture* const pic, int keep_alpha,
if (rgb == NULL) goto Error;
for (p = 0; p < num_passes; ++p) {
for (y = 0; y < height; ++y) {
png_bytep row = rgb + y * stride;
png_bytep row = (png_bytep)(rgb + y * stride);
png_read_rows(png, &row, NULL, 1);
}
}
@@ -267,8 +268,6 @@ int ReadPNG(FILE* in_file, WebPPicture* const pic, int keep_alpha,
goto Error;
}
png_destroy_read_struct(&png, &info, &end_info);
pic->width = width;
pic->height = height;
pic->use_argb = 1;
@@ -280,6 +279,10 @@ int ReadPNG(FILE* in_file, WebPPicture* const pic, int keep_alpha,
}
End:
if (png != NULL) {
png_destroy_read_struct((png_structpp)&png,
(png_infopp)&info, (png_infopp)&end_info);
}
free(rgb);
return ok;
}

View File

@@ -14,6 +14,8 @@
#ifndef WEBP_EXAMPLES_STOPWATCH_H_
#define WEBP_EXAMPLES_STOPWATCH_H_
#include "webp/types.h"
#if defined _WIN32 && !defined __GNUC__
#include <windows.h>
@@ -37,6 +39,7 @@ static WEBP_INLINE double StopwatchReadAndReset(Stopwatch* watch) {
#else /* !_WIN32 */
#include <string.h> // memcpy
#include <sys/time.h>
typedef struct timeval Stopwatch;
@@ -46,10 +49,13 @@ static WEBP_INLINE void StopwatchReset(Stopwatch* watch) {
}
static WEBP_INLINE double StopwatchReadAndReset(Stopwatch* watch) {
const struct timeval old_value = *watch;
struct timeval old_value;
double delta_sec, delta_usec;
memcpy(&old_value, watch, sizeof(old_value));
gettimeofday(watch, NULL);
return watch->tv_sec - old_value.tv_sec +
(watch->tv_usec - old_value.tv_usec) / 1000000.0;
delta_sec = (double)watch->tv_sec - old_value.tv_sec;
delta_usec = (double)watch->tv_usec - old_value.tv_usec;
return delta_sec + delta_usec / 1000000.0;
}
#endif /* _WIN32 */

View File

@@ -12,7 +12,7 @@
#include "./tiffdec.h"
#ifdef HAVE_CONFIG_H
#include "config.h"
#include "webp/config.h"
#endif
#include <stdio.h>
@@ -97,7 +97,7 @@ int ReadTIFF(const char* const filename,
pic->width = width;
pic->height = height;
// TIFF data is ABGR
#ifdef __BIG_ENDIAN__
#ifdef WORDS_BIGENDIAN
TIFFSwabArrayOfLong(raster, width * height);
#endif
pic->use_argb = 1;

View File

@@ -11,7 +11,7 @@
//
// Author: Skal (pascal.massimino@gmail.com)
#ifdef HAVE_CONFIG_H
#include "config.h"
#include "webp/config.h"
#endif
#include <stdio.h>
@@ -42,8 +42,6 @@
#define snprintf _snprintf
#endif
static void Help(void);
// Unfortunate global variables. Gathered into a struct for comfort.
static struct {
int has_animation;
@@ -82,6 +80,16 @@ static void ClearParams(void) {
kParams.dmux = NULL;
}
// Sets the previous frame to the dimensions of the canvas and has it dispose
// to background to cause the canvas to be cleared.
static void ClearPreviousFrame(void) {
WebPIterator* const prev = &kParams.prev_frame;
prev->width = kParams.canvas_width;
prev->height = kParams.canvas_height;
prev->x_offset = prev->y_offset = 0;
prev->dispose_method = WEBP_MUX_DISPOSE_BACKGROUND;
}
// -----------------------------------------------------------------------------
// Color profile handling
static int ApplyColorProfile(const WebPData* const profile,
@@ -181,6 +189,8 @@ static void decode_callback(int what) {
if (WebPDemuxGetFrame(kParams.dmux, 1, curr)) {
--kParams.loop_count;
kParams.done = (kParams.loop_count == 0);
if (kParams.done) return;
ClearPreviousFrame();
} else {
kParams.decoding_error = 1;
kParams.done = 1;
@@ -372,19 +382,22 @@ static void Help(void) {
printf("Usage: vwebp in_file [options]\n\n"
"Decodes the WebP image file and visualize it using OpenGL\n"
"Options are:\n"
" -version .... print version number and exit.\n"
" -noicc ....... don't use the icc profile if present.\n"
" -nofancy ..... don't use the fancy YUV420 upscaler.\n"
" -nofilter .... disable in-loop filtering.\n"
" -dither <int> dithering strength (0..100). Default=50.\n"
" -mt .......... use multi-threading.\n"
" -info ........ print info.\n"
" -h ....... this help message.\n"
" -version .... print version number and exit\n"
" -noicc ....... don't use the icc profile if present\n"
" -nofancy ..... don't use the fancy YUV420 upscaler\n"
" -nofilter .... disable in-loop filtering\n"
" -dither <int> dithering strength (0..100), default=50\n"
#if WEBP_DECODER_ABI_VERSION > 0x0204
" -noalphadither disable alpha plane dithering\n"
#endif
" -mt .......... use multi-threading\n"
" -info ........ print info\n"
" -h ....... this help message\n"
"\n"
"Keyboard shortcuts:\n"
" 'c' ................ toggle use of color profile.\n"
" 'i' ................ overlay file information.\n"
" 'q' / 'Q' / ESC .... quit.\n"
" 'c' ................ toggle use of color profile\n"
" 'i' ................ overlay file information\n"
" 'q' / 'Q' / ESC .... quit\n"
);
}
@@ -392,16 +405,19 @@ int main(int argc, char *argv[]) {
int c;
WebPDecoderConfig* const config = &kParams.config;
WebPIterator* const curr = &kParams.curr_frame;
WebPIterator* const prev = &kParams.prev_frame;
if (!WebPInitDecoderConfig(config)) {
fprintf(stderr, "Library version mismatch!\n");
return -1;
}
config->options.dithering_strength = 50;
#if WEBP_DECODER_ABI_VERSION > 0x0204
config->options.alpha_dithering_strength = 100;
#endif
kParams.use_color_profile = 1;
for (c = 1; c < argc; ++c) {
int parse_error = 0;
if (!strcmp(argv[c], "-h") || !strcmp(argv[c], "-help")) {
Help();
return 0;
@@ -411,8 +427,13 @@ int main(int argc, char *argv[]) {
config->options.no_fancy_upsampling = 1;
} else if (!strcmp(argv[c], "-nofilter")) {
config->options.bypass_filtering = 1;
#if WEBP_DECODER_ABI_VERSION > 0x0204
} else if (!strcmp(argv[c], "-noalphadither")) {
config->options.alpha_dithering_strength = 0;
#endif
} else if (!strcmp(argv[c], "-dither") && c + 1 < argc) {
config->options.dithering_strength = strtol(argv[++c], NULL, 0);
config->options.dithering_strength =
ExUtilGetInt(argv[++c], 0, &parse_error);
} else if (!strcmp(argv[c], "-info")) {
kParams.print_info = 1;
} else if (!strcmp(argv[c], "-version")) {
@@ -435,6 +456,11 @@ int main(int argc, char *argv[]) {
} else {
kParams.file_name = argv[c];
}
if (parse_error) {
Help();
return -1;
}
}
if (kParams.file_name == NULL) {
@@ -469,10 +495,7 @@ int main(int argc, char *argv[]) {
printf("Canvas: %d x %d\n", kParams.canvas_width, kParams.canvas_height);
}
prev->width = kParams.canvas_width;
prev->height = kParams.canvas_height;
prev->x_offset = prev->y_offset = 0;
prev->dispose_method = WEBP_MUX_DISPOSE_BACKGROUND;
ClearPreviousFrame();
memset(&kParams.iccp, 0, sizeof(kParams.iccp));
kParams.has_color_profile =

67
examples/webpdec.c Normal file
View File

@@ -0,0 +1,67 @@
// Copyright 2014 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// WebP decode.
#include "./webpdec.h"
#include <stdio.h>
#include <stdlib.h>
#include "webp/decode.h"
#include "webp/encode.h"
#include "./example_util.h"
#include "./metadata.h"
int ReadWebP(const char* const in_file, WebPPicture* const pic,
int keep_alpha, Metadata* const metadata) {
int ok = 0;
size_t data_size = 0;
const uint8_t* data = NULL;
VP8StatusCode status = VP8_STATUS_OK;
WebPDecoderConfig config;
WebPDecBuffer* const output_buffer = &config.output;
WebPBitstreamFeatures* const bitstream = &config.input;
// TODO(jzern): add Exif/XMP/ICC extraction.
if (metadata != NULL) {
fprintf(stderr, "Warning: metadata extraction from WebP is unsupported.\n");
}
if (!WebPInitDecoderConfig(&config)) {
fprintf(stderr, "Library version mismatch!\n");
return 0;
}
if (ExUtilLoadWebP(in_file, &data, &data_size, bitstream)) {
const int has_alpha = keep_alpha && bitstream->has_alpha;
output_buffer->colorspace = has_alpha ? MODE_RGBA : MODE_RGB;
status = ExUtilDecodeWebP(data, data_size, 0, &config);
if (status == VP8_STATUS_OK) {
const uint8_t* const rgba = output_buffer->u.RGBA.rgba;
const int stride = output_buffer->u.RGBA.stride;
pic->width = output_buffer->width;
pic->height = output_buffer->height;
pic->use_argb = 1;
ok = has_alpha ? WebPPictureImportRGBA(pic, rgba, stride)
: WebPPictureImportRGB(pic, rgba, stride);
}
}
if (status != VP8_STATUS_OK) {
ExUtilPrintWebPError(in_file, status);
}
free((void*)data);
WebPFreeDecBuffer(output_buffer);
return ok;
}
// -----------------------------------------------------------------------------

33
examples/webpdec.h Normal file
View File

@@ -0,0 +1,33 @@
// Copyright 2014 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// WebP decode.
#ifndef WEBP_EXAMPLES_WEBPDEC_H_
#define WEBP_EXAMPLES_WEBPDEC_H_
#ifdef __cplusplus
extern "C" {
#endif
struct Metadata;
struct WebPPicture;
// Reads a WebP from 'in_file', returning the decoded output in 'pic'.
// If 'keep_alpha' is true and the WebP has an alpha channel, the output is
// RGBA otherwise it will be RGB.
// Returns true on success.
int ReadWebP(const char* const in_file, struct WebPPicture* const pic,
int keep_alpha, struct Metadata* const metadata);
#ifdef __cplusplus
} // extern "C"
#endif
#endif // WEBP_EXAMPLES_WEBPDEC_H_

View File

@@ -46,7 +46,7 @@
*/
#ifdef HAVE_CONFIG_H
#include "config.h"
#include "webp/config.h"
#endif
#include <assert.h>
@@ -130,7 +130,7 @@ static int CountOccurrences(const char* arglist[], int list_length,
return num_occurences;
}
static const char* const kErrorMessages[] = {
static const char* const kErrorMessages[-WEBP_MUX_NOT_ENOUGH_DATA + 1] = {
"WEBP_MUX_NOT_FOUND", "WEBP_MUX_INVALID_ARGUMENT", "WEBP_MUX_BAD_DATA",
"WEBP_MUX_MEMORY_ERROR", "WEBP_MUX_NOT_ENOUGH_DATA"
};
@@ -303,51 +303,51 @@ static void PrintHelp(void) {
printf("\n");
printf("GET_OPTIONS:\n");
printf(" Extract relevant data.\n");
printf(" icc Get ICC profile.\n");
printf(" exif Get EXIF metadata.\n");
printf(" xmp Get XMP metadata.\n");
printf(" Extract relevant data:\n");
printf(" icc get ICC profile\n");
printf(" exif get EXIF metadata\n");
printf(" xmp get XMP metadata\n");
#ifdef WEBP_EXPERIMENTAL_FEATURES
printf(" frgm n Get nth fragment.\n");
printf(" frgm n get nth fragment\n");
#endif
printf(" frame n Get nth frame.\n");
printf(" frame n get nth frame\n");
printf("\n");
printf("SET_OPTIONS:\n");
printf(" Set color profile/metadata.\n");
printf(" icc file.icc Set ICC profile.\n");
printf(" exif file.exif Set EXIF metadata.\n");
printf(" xmp file.xmp Set XMP metadata.\n");
printf(" Set color profile/metadata:\n");
printf(" icc file.icc set ICC profile\n");
printf(" exif file.exif set EXIF metadata\n");
printf(" xmp file.xmp set XMP metadata\n");
printf(" where: 'file.icc' contains the ICC profile to be set,\n");
printf(" 'file.exif' contains the EXIF metadata to be set\n");
printf(" 'file.xmp' contains the XMP metadata to be set\n");
printf("\n");
printf("STRIP_OPTIONS:\n");
printf(" Strip color profile/metadata.\n");
printf(" icc Strip ICC profile.\n");
printf(" exif Strip EXIF metadata.\n");
printf(" xmp Strip XMP metadata.\n");
printf(" Strip color profile/metadata:\n");
printf(" icc strip ICC profile\n");
printf(" exif strip EXIF metadata\n");
printf(" xmp strip XMP metadata\n");
#ifdef WEBP_EXPERIMENTAL_FEATURES
printf("\n");
printf("FRAGMENT_OPTIONS(i):\n");
printf(" Create fragmented image.\n");
printf(" Create fragmented image:\n");
printf(" file_i +xi+yi\n");
printf(" where: 'file_i' is the i'th fragment (WebP format),\n");
printf(" 'xi','yi' specify the image offset for this fragment."
printf(" 'xi','yi' specify the image offset for this fragment"
"\n");
#endif
printf("\n");
printf("FRAME_OPTIONS(i):\n");
printf(" Create animation.\n");
printf(" Create animation:\n");
printf(" file_i +di+[xi+yi[+mi[bi]]]\n");
printf(" where: 'file_i' is the i'th animation frame (WebP format),\n");
printf(" 'di' is the pause duration before next frame.\n");
printf(" 'xi','yi' specify the image offset for this frame.\n");
printf(" 'mi' is the dispose method for this frame (0 or 1).\n");
printf(" 'bi' is the blending method for this frame (+b or -b)."
printf(" 'di' is the pause duration before next frame,\n");
printf(" 'xi','yi' specify the image offset for this frame,\n");
printf(" 'mi' is the dispose method for this frame (0 or 1),\n");
printf(" 'bi' is the blending method for this frame (+b or -b)"
"\n");
printf("\n");
@@ -363,7 +363,7 @@ static void PrintHelp(void) {
"specifying\n");
printf(" the Alpha, Red, Green and Blue component values "
"respectively\n");
printf(" [Default: 255,255,255,255].\n");
printf(" [Default: 255,255,255,255]\n");
printf("\nINPUT & OUTPUT are in WebP format.\n");
@@ -371,6 +371,14 @@ static void PrintHelp(void) {
printf(" and is assumed to be\nvalid.\n");
}
static void WarnAboutOddOffset(const WebPMuxFrameInfo* const info) {
if ((info->x_offset | info->y_offset) & 1) {
fprintf(stderr, "Warning: odd offsets will be snapped to even values"
" (%d, %d) -> (%d, %d)\n", info->x_offset, info->y_offset,
info->x_offset & ~1, info->y_offset & ~1);
}
}
static int ReadFileToWebPData(const char* const filename,
WebPData* const webp_data) {
const uint8_t* data;
@@ -394,8 +402,9 @@ static int CreateMux(const char* const filename, WebPMux** mux) {
static int WriteData(const char* filename, const WebPData* const webpdata) {
int ok = 0;
FILE* fout = strcmp(filename, "-") ? fopen(filename, "wb") : stdout;
if (!fout) {
FILE* fout = strcmp(filename, "-") ? fopen(filename, "wb")
: ExUtilSetBinaryMode(stdout);
if (fout == NULL) {
fprintf(stderr, "Error opening output WebP file %s!\n", filename);
return 0;
}
@@ -444,6 +453,9 @@ static int ParseFrameArgs(const char* args, WebPMuxFrameInfo* const info) {
default:
return 0;
}
WarnAboutOddOffset(info);
// Note: The sanity of the following conversion is checked by
// WebPMuxPushFrame().
info->dispose_method = (WebPMuxAnimDispose)dispose_method;
@@ -456,7 +468,10 @@ static int ParseFrameArgs(const char* args, WebPMuxFrameInfo* const info) {
}
static int ParseFragmentArgs(const char* args, WebPMuxFrameInfo* const info) {
return (sscanf(args, "+%d+%d", &info->x_offset, &info->y_offset) == 2);
const int ok =
(sscanf(args, "+%d+%d", &info->x_offset, &info->y_offset) == 2);
if (ok) WarnAboutOddOffset(info);
return ok;
}
static int ParseBgcolorArgs(const char* args, uint32_t* const bgcolor) {
@@ -473,7 +488,7 @@ static int ParseBgcolorArgs(const char* args, uint32_t* const bgcolor) {
static void DeleteConfig(WebPMuxConfig* config) {
if (config != NULL) {
free(config->feature_.args_);
free(config);
memset(config, 0, sizeof(*config));
}
}
@@ -776,33 +791,27 @@ static int ValidateConfig(WebPMuxConfig* config) {
// Create config object from command-line arguments.
static int InitializeConfig(int argc, const char* argv[],
WebPMuxConfig** config) {
WebPMuxConfig* config) {
int num_feature_args = 0;
int ok = 1;
assert(config != NULL);
*config = NULL;
memset(config, 0, sizeof(*config));
// Validate command-line arguments.
if (!ValidateCommandLine(argc, argv, &num_feature_args)) {
ERROR_GOTO1("Exiting due to command-line parsing error.\n", Err1);
}
// Allocate memory.
*config = (WebPMuxConfig*)calloc(1, sizeof(**config));
if (*config == NULL) {
ERROR_GOTO1("ERROR: Memory allocation error.\n", Err1);
}
(*config)->feature_.arg_count_ = num_feature_args;
(*config)->feature_.args_ =
(FeatureArg*)calloc(num_feature_args, sizeof(FeatureArg));
if ((*config)->feature_.args_ == NULL) {
config->feature_.arg_count_ = num_feature_args;
config->feature_.args_ =
(FeatureArg*)calloc(num_feature_args, sizeof(*config->feature_.args_));
if (config->feature_.args_ == NULL) {
ERROR_GOTO1("ERROR: Memory allocation error.\n", Err1);
}
// Parse command-line.
if (!ParseCommandLine(argc, argv, *config) ||
!ValidateConfig(*config)) {
if (!ParseCommandLine(argc, argv, config) || !ValidateConfig(config)) {
ERROR_GOTO1("Exiting due to command-line parsing error.\n", Err1);
}
@@ -824,14 +833,16 @@ static int GetFrameFragment(const WebPMux* mux,
WebPMux* mux_single = NULL;
long num = 0;
int ok = 1;
int parse_error = 0;
const WebPChunkId id = is_frame ? WEBP_CHUNK_ANMF : WEBP_CHUNK_FRGM;
WebPMuxFrameInfo info;
WebPDataInit(&info.bitstream);
num = strtol(config->feature_.args_[0].params_, NULL, 10);
num = ExUtilGetInt(config->feature_.args_[0].params_, 10, &parse_error);
if (num < 0) {
ERROR_GOTO1("ERROR: Frame/Fragment index must be non-negative.\n", ErrGet);
}
if (parse_error) goto ErrGet;
err = WebPMuxGetFrame(mux, num, &info);
if (err == WEBP_MUX_OK && info.id != id) err = WEBP_MUX_NOT_FOUND;
@@ -857,7 +868,7 @@ static int GetFrameFragment(const WebPMux* mux,
ErrGet:
WebPDataClear(&info.bitstream);
WebPMuxDelete(mux_single);
return ok;
return ok && !parse_error;
}
// Read and process config.
@@ -919,16 +930,19 @@ static int Process(const WebPMuxConfig* config) {
break;
}
case SUBTYPE_LOOP: {
const long loop_count =
strtol(feature->args_[i].params_, NULL, 10);
if (loop_count != (int)loop_count) {
int parse_error = 0;
const int loop_count =
ExUtilGetInt(feature->args_[i].params_, 10, &parse_error);
if (loop_count < 0 || loop_count > 65535) {
// Note: This is only a 'necessary' condition for loop_count
// to be valid. The 'sufficient' conditioned in checked in
// WebPMuxSetAnimationParams() method called later.
ERROR_GOTO1("ERROR: Loop count must be in the range 0 to "
"65535.\n", Err2);
}
params.loop_count = (int)loop_count;
ok = !parse_error;
if (!ok) goto Err2;
params.loop_count = loop_count;
break;
}
case SUBTYPE_ANMF: {
@@ -1028,8 +1042,8 @@ static int Process(const WebPMuxConfig* config) {
ErrorString(err), kDescriptions[feature->type_], Err2);
}
} else {
ERROR_GOTO1("ERROR: Invalid feature for action 'strip'.\n", Err2);
break;
ERROR_GOTO1("ERROR: Invalid feature for action 'strip'.\n", Err2);
break;
}
ok = WriteWebP(mux, config->output_);
break;
@@ -1055,14 +1069,14 @@ static int Process(const WebPMuxConfig* config) {
// Main.
int main(int argc, const char* argv[]) {
WebPMuxConfig* config;
WebPMuxConfig config;
int ok = InitializeConfig(argc - 1, argv + 1, &config);
if (ok) {
ok = Process(config);
ok = Process(&config);
} else {
PrintHelp();
}
DeleteConfig(config);
DeleteConfig(&config);
return !ok;
}

View File

@@ -12,9 +12,10 @@
#include "./wicdec.h"
#ifdef HAVE_CONFIG_H
#include "config.h"
#include "webp/config.h"
#endif
#include <assert.h>
#include <stdio.h>
#ifdef HAVE_WINCODEC_H
@@ -109,6 +110,7 @@ static HRESULT ExtractICCP(IWICImagingFactory* const factory,
IFS(IWICBitmapFrameDecode_GetColorContexts(frame,
count, color_contexts,
&num_color_contexts));
assert(FAILED(hr) || num_color_contexts <= count);
for (i = 0; SUCCEEDED(hr) && i < num_color_contexts; ++i) {
WICColorContextType type;
IFS(IWICColorContext_GetType(color_contexts[i], &type));
@@ -116,7 +118,7 @@ static HRESULT ExtractICCP(IWICImagingFactory* const factory,
UINT size;
IFS(IWICColorContext_GetProfileBytes(color_contexts[i],
0, NULL, &size));
if (size > 0) {
if (SUCCEEDED(hr) && size > 0) {
iccp->bytes = (uint8_t*)malloc(size);
if (iccp->bytes == NULL) {
hr = E_OUTOFMEMORY;
@@ -261,7 +263,7 @@ int ReadPictureWithWIC(const char* const filename,
IFS(IWICBitmapFrameDecode_GetPixelFormat(frame, &src_pixel_format));
IFS(IWICBitmapDecoder_GetContainerFormat(decoder, &src_container_format));
if (keep_alpha) {
if (SUCCEEDED(hr) && keep_alpha) {
const GUID** guid;
for (guid = kAlphaContainers; *guid != NULL; ++guid) {
if (IsEqualGUID(MAKE_REFGUID(src_container_format),

View File

@@ -12,31 +12,37 @@
set -e
# Extract the latest SDK version from the final field of the form: iphoneosX.Y
declare -r SDK=$(xcodebuild -showsdks \
readonly SDK=$(xcodebuild -showsdks \
| grep iphoneos | sort | tail -n 1 | awk '{print substr($NF, 9)}'
)
# Extract Xcode version.
declare -r XCODE=$(xcodebuild -version | grep Xcode | cut -d " " -f2)
readonly XCODE=$(xcodebuild -version | grep Xcode | cut -d " " -f2)
if [[ -z "${XCODE}" ]]; then
echo "Xcode not available"
exit 1
fi
declare -r OLDPATH=${PATH}
readonly OLDPATH=${PATH}
# Add iPhoneOS-V6 to the list of platforms below if you need armv6 support.
# Note that iPhoneOS-V6 support is not available with the iOS6 SDK.
declare -r PLATFORMS="iPhoneSimulator iPhoneOS-V7 iPhoneOS-V7s"
declare -r SRCDIR=$(dirname $0)
declare -r TOPDIR=$(pwd)
declare -r BUILDDIR="${TOPDIR}/iosbuild"
declare -r TARGETDIR="${TOPDIR}/WebP.framework"
declare -r DEVELOPER=$(xcode-select --print-path)
declare -r PLATFORMSROOT="${DEVELOPER}/Platforms"
declare -r LIPO=$(xcrun -sdk iphoneos${SDK} -find lipo)
PLATFORMS="iPhoneSimulator iPhoneSimulator64"
PLATFORMS+=" iPhoneOS-V7 iPhoneOS-V7s iPhoneOS-V7-arm64"
readonly PLATFORMS
readonly SRCDIR=$(dirname $0)
readonly TOPDIR=$(pwd)
readonly BUILDDIR="${TOPDIR}/iosbuild"
readonly TARGETDIR="${TOPDIR}/WebP.framework"
readonly DEVELOPER=$(xcode-select --print-path)
readonly PLATFORMSROOT="${DEVELOPER}/Platforms"
readonly LIPO=$(xcrun -sdk iphoneos${SDK} -find lipo)
LIBLIST=''
if [[ -z "${SDK}" ]]; then
echo "iOS SDK not available"
exit 1
elif [[ ${SDK} < 4.0 ]]; then
echo "You need iOS SDK version 4.0 or above"
elif [[ ${SDK} < 6.0 ]]; then
echo "You need iOS SDK version 6.0 or above"
exit 1
else
echo "iOS SDK Version ${SDK}"
@@ -47,10 +53,25 @@ rm -rf ${TARGETDIR}
mkdir -p ${BUILDDIR}
mkdir -p ${TARGETDIR}/Headers/
[[ -e ${SRCDIR}/configure ]] || (cd ${SRCDIR} && sh autogen.sh)
if [[ ! -e ${SRCDIR}/configure ]]; then
if ! (cd ${SRCDIR} && sh autogen.sh); then
cat <<EOT
Error creating configure script!
This script requires the autoconf/automake and libtool to build. MacPorts can
be used to obtain these:
http://www.macports.org/install.php
EOT
exit 1
fi
fi
for PLATFORM in ${PLATFORMS}; do
if [[ "${PLATFORM}" == "iPhoneOS-V7s" ]]; then
ARCH2=""
if [[ "${PLATFORM}" == "iPhoneOS-V7-arm64" ]]; then
PLATFORM="iPhoneOS"
ARCH="aarch64"
ARCH2="arm64"
elif [[ "${PLATFORM}" == "iPhoneOS-V7s" ]]; then
PLATFORM="iPhoneOS"
ARCH="armv7s"
elif [[ "${PLATFORM}" == "iPhoneOS-V7" ]]; then
@@ -59,6 +80,9 @@ for PLATFORM in ${PLATFORMS}; do
elif [[ "${PLATFORM}" == "iPhoneOS-V6" ]]; then
PLATFORM="iPhoneOS"
ARCH="armv6"
elif [[ "${PLATFORM}" == "iPhoneSimulator64" ]]; then
PLATFORM="iPhoneSimulator"
ARCH="x86_64"
else
ARCH="i386"
fi
@@ -66,30 +90,20 @@ for PLATFORM in ${PLATFORMS}; do
ROOTDIR="${BUILDDIR}/${PLATFORM}-${SDK}-${ARCH}"
mkdir -p "${ROOTDIR}"
SDKROOT="${PLATFORMSROOT}/${PLATFORM}.platform/Developer/SDKs/${PLATFORM}${SDK}.sdk/"
CFLAGS="-arch ${ARCH} -pipe -isysroot ${SDKROOT}"
LDFLAGS="-arch ${ARCH} -pipe -isysroot ${SDKROOT}"
DEVROOT="${DEVELOPER}/Toolchains/XcodeDefault.xctoolchain"
SDKROOT="${PLATFORMSROOT}/"
SDKROOT+="${PLATFORM}.platform/Developer/SDKs/${PLATFORM}${SDK}.sdk/"
CFLAGS="-arch ${ARCH2:-${ARCH}} -pipe -isysroot ${SDKROOT} -O3 -DNDEBUG"
CFLAGS+=" -miphoneos-version-min=6.0"
if [[ -z "${XCODE}" ]]; then
echo "XCODE not available"
exit 1
elif [[ ${SDK} < 5.0.0 ]]; then
DEVROOT="${PLATFORMSROOT}/${PLATFORM}.platform/Developer/"
else
DEVROOT="${DEVELOPER}/Toolchains/XcodeDefault.xctoolchain"
CFLAGS+=" -miphoneos-version-min=5.0"
LDFLAGS+=" -miphoneos-version-min=5.0"
fi
export CFLAGS
export LDFLAGS
export CXXFLAGS=${CFLAGS}
set -x
export PATH="${DEVROOT}/usr/bin:${OLDPATH}"
${SRCDIR}/configure --host=${ARCH}-apple-darwin --prefix=${ROOTDIR} \
--build=$(${SRCDIR}/config.guess) \
--disable-shared --enable-static \
--enable-libwebpdecoder --enable-swap-16bit-csp
--enable-libwebpdecoder --enable-swap-16bit-csp \
CFLAGS="${CFLAGS}"
set +x
# run make only in the src/ directory to create libwebpdecoder.a
cd src/
@@ -104,5 +118,5 @@ for PLATFORM in ${PLATFORMS}; do
export PATH=${OLDPATH}
done
cp -a ${SRCDIR}/src/webp/* ${TARGETDIR}/Headers/
cp -a ${SRCDIR}/src/webp/*.h ${TARGETDIR}/Headers/
${LIPO} -create ${LIBLIST} -output ${TARGETDIR}/WebP

View File

@@ -82,7 +82,7 @@
# modified version of the Autoconf Macro, you may extend this special
# exception to the GPL to apply to your modified version as well.
#serial 18
#serial 21
AU_ALIAS([ACX_PTHREAD], [AX_PTHREAD])
AC_DEFUN([AX_PTHREAD], [
@@ -103,8 +103,8 @@ if test x"$PTHREAD_LIBS$PTHREAD_CFLAGS" != x; then
save_LIBS="$LIBS"
LIBS="$PTHREAD_LIBS $LIBS"
AC_MSG_CHECKING([for pthread_join in LIBS=$PTHREAD_LIBS with CFLAGS=$PTHREAD_CFLAGS])
AC_TRY_LINK_FUNC(pthread_join, ax_pthread_ok=yes)
AC_MSG_RESULT($ax_pthread_ok)
AC_TRY_LINK_FUNC([pthread_join], [ax_pthread_ok=yes])
AC_MSG_RESULT([$ax_pthread_ok])
if test x"$ax_pthread_ok" = xno; then
PTHREAD_LIBS=""
PTHREAD_CFLAGS=""
@@ -164,6 +164,20 @@ case ${host_os} in
;;
esac
# Clang doesn't consider unrecognized options an error unless we specify
# -Werror. We throw in some extra Clang-specific options to ensure that
# this doesn't happen for GCC, which also accepts -Werror.
AC_MSG_CHECKING([if compiler needs -Werror to reject unknown flags])
save_CFLAGS="$CFLAGS"
ax_pthread_extra_flags="-Werror"
CFLAGS="$CFLAGS $ax_pthread_extra_flags -Wunknown-warning-option -Wsizeof-array-argument"
AC_COMPILE_IFELSE([AC_LANG_PROGRAM([int foo(void);],[foo()])],
[AC_MSG_RESULT([yes])],
[ax_pthread_extra_flags=
AC_MSG_RESULT([no])])
CFLAGS="$save_CFLAGS"
if test x"$ax_pthread_ok" = xno; then
for flag in $ax_pthread_flags; do
@@ -178,7 +192,7 @@ for flag in $ax_pthread_flags; do
;;
pthread-config)
AC_CHECK_PROG(ax_pthread_config, pthread-config, yes, no)
AC_CHECK_PROG([ax_pthread_config], [pthread-config], [yes], [no])
if test x"$ax_pthread_config" = xno; then continue; fi
PTHREAD_CFLAGS="`pthread-config --cflags`"
PTHREAD_LIBS="`pthread-config --ldflags` `pthread-config --libs`"
@@ -193,7 +207,7 @@ for flag in $ax_pthread_flags; do
save_LIBS="$LIBS"
save_CFLAGS="$CFLAGS"
LIBS="$PTHREAD_LIBS $LIBS"
CFLAGS="$CFLAGS $PTHREAD_CFLAGS"
CFLAGS="$CFLAGS $PTHREAD_CFLAGS $ax_pthread_extra_flags"
# Check for various functions. We must include pthread.h,
# since some functions may be macros. (On the Sequent, we
@@ -219,7 +233,7 @@ for flag in $ax_pthread_flags; do
LIBS="$save_LIBS"
CFLAGS="$save_CFLAGS"
AC_MSG_RESULT($ax_pthread_ok)
AC_MSG_RESULT([$ax_pthread_ok])
if test "x$ax_pthread_ok" = xyes; then
break;
fi
@@ -245,9 +259,9 @@ if test "x$ax_pthread_ok" = xyes; then
[attr_name=$attr; break],
[])
done
AC_MSG_RESULT($attr_name)
AC_MSG_RESULT([$attr_name])
if test "$attr_name" != PTHREAD_CREATE_JOINABLE; then
AC_DEFINE_UNQUOTED(PTHREAD_CREATE_JOINABLE, $attr_name,
AC_DEFINE_UNQUOTED([PTHREAD_CREATE_JOINABLE], [$attr_name],
[Define to necessary symbol if this constant
uses a non-standard name on your system.])
fi
@@ -261,45 +275,54 @@ if test "x$ax_pthread_ok" = xyes; then
if test "$GCC" = "yes"; then
flag="-D_REENTRANT"
else
# TODO: What about Clang on Solaris?
flag="-mt -D_REENTRANT"
fi
;;
esac
AC_MSG_RESULT(${flag})
AC_MSG_RESULT([$flag])
if test "x$flag" != xno; then
PTHREAD_CFLAGS="$flag $PTHREAD_CFLAGS"
fi
AC_CACHE_CHECK([for PTHREAD_PRIO_INHERIT],
ax_cv_PTHREAD_PRIO_INHERIT, [
AC_LINK_IFELSE([
AC_LANG_PROGRAM([[#include <pthread.h>]], [[int i = PTHREAD_PRIO_INHERIT;]])],
[ax_cv_PTHREAD_PRIO_INHERIT], [
AC_LINK_IFELSE([AC_LANG_PROGRAM([[#include <pthread.h>]],
[[int i = PTHREAD_PRIO_INHERIT;]])],
[ax_cv_PTHREAD_PRIO_INHERIT=yes],
[ax_cv_PTHREAD_PRIO_INHERIT=no])
])
AS_IF([test "x$ax_cv_PTHREAD_PRIO_INHERIT" = "xyes"],
AC_DEFINE([HAVE_PTHREAD_PRIO_INHERIT], 1, [Have PTHREAD_PRIO_INHERIT.]))
[AC_DEFINE([HAVE_PTHREAD_PRIO_INHERIT], [1], [Have PTHREAD_PRIO_INHERIT.])])
LIBS="$save_LIBS"
CFLAGS="$save_CFLAGS"
# More AIX lossage: must compile with xlc_r or cc_r
if test x"$GCC" != xyes; then
AC_CHECK_PROGS(PTHREAD_CC, xlc_r cc_r, ${CC})
else
PTHREAD_CC=$CC
# More AIX lossage: compile with *_r variant
if test "x$GCC" != xyes; then
case $host_os in
aix*)
AS_CASE(["x/$CC"],
[x*/c89|x*/c89_128|x*/c99|x*/c99_128|x*/cc|x*/cc128|x*/xlc|x*/xlc_v6|x*/xlc128|x*/xlc128_v6],
[#handle absolute path differently from PATH based program lookup
AS_CASE(["x$CC"],
[x/*],
[AS_IF([AS_EXECUTABLE_P([${CC}_r])],[PTHREAD_CC="${CC}_r"])],
[AC_CHECK_PROGS([PTHREAD_CC],[${CC}_r],[$CC])])])
;;
esac
fi
else
PTHREAD_CC="$CC"
fi
AC_SUBST(PTHREAD_LIBS)
AC_SUBST(PTHREAD_CFLAGS)
AC_SUBST(PTHREAD_CC)
test -n "$PTHREAD_CC" || PTHREAD_CC="$CC"
AC_SUBST([PTHREAD_LIBS])
AC_SUBST([PTHREAD_CFLAGS])
AC_SUBST([PTHREAD_CC])
# Finally, execute ACTION-IF-FOUND/ACTION-IF-NOT-FOUND:
if test x"$ax_pthread_ok" = xyes; then
ifelse([$1],,AC_DEFINE(HAVE_PTHREAD,1,[Define if you have POSIX threads libraries and header files.]),[$1])
ifelse([$1],,[AC_DEFINE([HAVE_PTHREAD],[1],[Define if you have POSIX threads libraries and header files.])],[$1])
:
else
ax_pthread_ok=no

View File

@@ -67,8 +67,20 @@ EXTRA_FLAGS += -Wmissing-prototypes
EXTRA_FLAGS += -Wmissing-declarations
EXTRA_FLAGS += -Wdeclaration-after-statement
EXTRA_FLAGS += -Wshadow
EXTRA_FLAGS += -Wformat-security -Wformat-nonliteral
# EXTRA_FLAGS += -Wvla
# AVX2-specific flags:
ifeq ($(HAVE_AVX2), 1)
EXTRA_FLAGS += -DWEBP_HAVE_AVX2
src/dsp/%_avx2.o: EXTRA_FLAGS += -mavx2
endif
# NEON-specific flags:
# EXTRA_FLAGS += -march=armv7-a -mfloat-abi=hard -mfpu=neon -mtune=cortex-a8
# -> seems to make the overall lib slower: -fno-split-wide-types
#### Nothing should normally be changed below this line ####
AR = ar
@@ -87,7 +99,6 @@ DEC_OBJS = \
src/dec/frame.o \
src/dec/idec.o \
src/dec/io.o \
src/dec/layer.o \
src/dec/quant.o \
src/dec/tree.o \
src/dec/vp8.o \
@@ -98,18 +109,29 @@ DEMUX_OBJS = \
src/demux/demux.o \
DSP_DEC_OBJS = \
src/dsp/alpha_processing.o \
src/dsp/alpha_processing_sse2.o \
src/dsp/cpu.o \
src/dsp/dec.o \
src/dsp/dec_clip_tables.o \
src/dsp/dec_mips32.o \
src/dsp/dec_neon.o \
src/dsp/dec_sse2.o \
src/dsp/lossless.o \
src/dsp/lossless_mips32.o \
src/dsp/lossless_neon.o \
src/dsp/lossless_sse2.o \
src/dsp/upsampling.o \
src/dsp/upsampling_neon.o \
src/dsp/upsampling_sse2.o \
src/dsp/yuv.o \
src/dsp/yuv_mips32.o \
src/dsp/yuv_sse2.o \
DSP_ENC_OBJS = \
src/dsp/enc.o \
src/dsp/enc_avx2.o \
src/dsp/enc_mips32.o \
src/dsp/enc_neon.o \
src/dsp/enc_sse2.o \
@@ -123,8 +145,11 @@ ENC_OBJS = \
src/enc/frame.o \
src/enc/histogram.o \
src/enc/iterator.o \
src/enc/layer.o \
src/enc/picture.o \
src/enc/picture_csp.o \
src/enc/picture_psnr.o \
src/enc/picture_rescale.o \
src/enc/picture_tools.o \
src/enc/quant.o \
src/enc/syntax.o \
src/enc/token.o \
@@ -137,6 +162,7 @@ EX_FORMAT_DEC_OBJS = \
examples/metadata.o \
examples/pngdec.o \
examples/tiffdec.o \
examples/webpdec.o \
EX_UTIL_OBJS = \
examples/example_util.o \
@@ -150,7 +176,6 @@ MUX_OBJS = \
src/mux/muxread.o \
UTILS_DEC_OBJS = \
src/utils/alpha_processing.o \
src/utils/bit_reader.o \
src/utils/color_cache.o \
src/utils/filters.o \
@@ -188,10 +213,15 @@ HDRS = \
src/dec/webpi.h \
src/dsp/dsp.h \
src/dsp/lossless.h \
src/dsp/neon.h \
src/dsp/yuv.h \
src/dsp/yuv_tables_sse2.h \
src/enc/backward_references.h \
src/enc/cost.h \
src/enc/histogram.h \
src/enc/vp8enci.h \
src/utils/alpha_processing.h \
src/enc/vp8li.h \
src/mux/muxi.h \
src/utils/bit_reader.h \
src/utils/bit_writer.h \
src/utils/color_cache.h \
@@ -203,6 +233,7 @@ HDRS = \
src/utils/random.h \
src/utils/rescaler.h \
src/utils/thread.h \
src/utils/utils.h \
src/webp/format_constants.h \
$(HDRS_INSTALLED) \
@@ -222,6 +253,14 @@ all: ex $(EXTRA_EXAMPLES)
$(EX_FORMAT_DEC_OBJS): %.o: %.h
# special dependencies:
# tree.c/vp8.c/bit_reader.c <-> bit_reader_inl.h, endian_inl.h
# bit_writer.c <-> endian_inl.h
src/dec/tree.o: src/utils/bit_reader_inl.h src/utils/endian_inl.h
src/dec/vp8.o: src/utils/bit_reader_inl.h src/utils/endian_inl.h
src/utils/bit_reader.o: src/utils/bit_reader_inl.h src/utils/endian_inl.h
src/utils/bit_writer.o: src/utils/endian_inl.h
%.o: %.c $(HDRS)
$(CC) $(CFLAGS) $(CPPFLAGS) -c $< -o $@
@@ -241,7 +280,7 @@ examples/gif2webp: examples/gif2webp.o
examples/vwebp: examples/vwebp.o
examples/webpmux: examples/webpmux.o
examples/cwebp: src/libwebp.a
examples/cwebp: examples/libexample_util.a src/libwebp.a
examples/cwebp: EXTRA_LIBS += $(CWEBP_LIBS)
examples/dwebp: examples/libexample_util.a src/libwebpdecoder.a
examples/dwebp: EXTRA_LIBS += $(DWEBP_LIBS)
@@ -270,7 +309,7 @@ dist: all
$(INSTALL) -m644 src/demux/libwebpdemux.a $(DESTDIR)/lib
$(INSTALL) -m644 src/mux/libwebpmux.a $(DESTDIR)/lib
umask 022; \
for m in man/[cd]webp.1 man/gif2webp.1 man/webpmux.1; do \
for m in man/[cdv]webp.1 man/gif2webp.1 man/webpmux.1; do \
basenam=$$(basename $$m .1); \
$(GROFF) -t -e -man -T utf8 $$m \
| $(COL) -bx >$(DESTDIR)/doc/$${basenam}.txt; \

View File

@@ -5,4 +5,7 @@ endif
if BUILD_GIF2WEBP
man_MANS += gif2webp.1
endif
if BUILD_VWEBP
man_MANS += vwebp.1
endif
EXTRA_DIST = $(man_MANS)

View File

@@ -1,5 +1,5 @@
.\" Hey, EMACS: -*- nroff -*-
.TH CWEBP 1 "December 12, 2013"
.TH CWEBP 1 "Oct 13, 2014"
.SH NAME
cwebp \- compress an image file to a WebP file
.SH SYNOPSIS
@@ -12,13 +12,19 @@ This manual page documents the
command.
.PP
\fBcwebp\fP compresses an image using the WebP format.
Input format can be either PNG, JPEG, TIFF or raw Y'CbCr samples.
Input format can be either PNG, JPEG, TIFF, WebP or raw Y'CbCr samples.
.SH OPTIONS
The basic options are:
.TP
.BI \-o " string
Specify the name of the output WebP file. If omitted, \fBcwebp\fP will
perform compression but only report statistics.
Using "\-" as output name will direct output to 'stdout'.
.TP
.BI \-\- " string
Explicitly specify the input file. This option is useful if the input
file starts with an '\-' for instance. This option must appear \fBlast\fP.
Any other options afterward will be ignored.
.TP
.B \-h, \-help
A short usage summary.
@@ -39,6 +45,15 @@ with lower quality. Best quality is achieved by using a value of 100.
In case of lossless compression (specified by the \-lossless option), a small
factor enables faster compression speed, but produces a larger file. Maximum
compression is achieved by using a value of 100.
.\" TODO(jzern): restore post-v0.4.1
.\" .TP
.\" .BI \-z " int
.\" Switch on \fBlossless\fP compression mode with the specified level between 0
.\" and 9, with level 0 being the fastest, 9 being the slowest. Fast mode
.\" produces larger file size than slower ones. A good default is \-z 6.
.\" This option is actually a shortcut for some predefined settings for quality
.\" and method. If options \-q or \-m are subsequently used, they will invalidate
.\" the effect of this \-z option.
.TP
.BI \-alpha_q " int
Specify the compression factor for alpha compression between 0 and 100.

View File

@@ -1,5 +1,5 @@
.\" Hey, EMACS: -*- nroff -*-
.TH DWEBP 1 "December 12, 2013"
.TH DWEBP 1 "July 22, 2014"
.SH NAME
dwebp \- decompress a WebP file to an image file
.SH SYNOPSIS
@@ -25,6 +25,12 @@ Print the version number (as major.minor.revision) and exit.
Specify the name of the output file (as PNG format by default).
Using "-" as output name will direct output to 'stdout'.
.TP
.BI \-\- " string
Explicitly specify the input file. This option is useful if the input
file starts with an '\-' for instance. This option must appear \fBlast\fP.
Any other options afterward will be ignored. If the input file is "\-",
the data will be read from \fIstdin\fP instead of a file.
.TP
.B \-bmp
Change the output format to uncompressed BMP.
.TP
@@ -57,10 +63,16 @@ Don't use the in-loop filtering process even if it is required by
the bitstream. This may produce visible blocks on the non-compliant output,
but it will make the decoding faster.
.TP
.B \-dither " strength
.BI \-dither " strength
Specify a dithering \fBstrength\fP between 0 and 100. Dithering is a
post-processing effect applied to chroma components in lossy compression.
It helps by smoothing gradients and avoiding banding artifacts.
.\" TODO(jzern): restore post-v0.4.1
.\" .TP
.\" .BI \-alpha_dither
.\" If the compressed file contains a transparency plane that was quantized
.\" during compression, this flag will allow dithering the reconstructed plane
.\" in order to generate smoother transparency gradients.
.TP
.B \-nodither
Disable all dithering (default).
@@ -75,6 +87,10 @@ This cropping area must be fully contained within the source rectangle.
The top-left corner will be snapped to even coordinates if needed.
This option is meant to reduce the memory needed for cropping large images.
Note: the cropping is applied \fIbefore\fP any scaling.
.\" TODO(jzern): restore post-v0.4.1
.\" .TP
.\" .B \-flip
.\" Flip decoded image vertically (can be useful for OpenGL textures for instance).
.TP
.BI \-scale " width height
Rescale the decoded picture to dimension \fBwidth\fP x \fBheight\fP. This
@@ -101,6 +117,8 @@ dwebp picture.webp \-o output.png
dwebp picture.webp \-ppm \-o output.ppm
.br
dwebp \-o output.ppm \-\- \-\-\-picture.webp
.br
cat picture.webp | dwebp \-o \- \-\- \- > output.ppm
.SH AUTHORS
\fBdwebp\fP was written by the WebP team.

View File

@@ -1,5 +1,5 @@
.\" Hey, EMACS: -*- nroff -*-
.TH GIF2WEBP 1 "December 17, 2013"
.TH GIF2WEBP 1 "March 7, 2014"
.SH NAME
gif2webp \- Convert a GIF image to WebP
.SH SYNOPSIS
@@ -18,6 +18,7 @@ The basic options are:
.BI \-o " string
Specify the name of the output WebP file. If omitted, \fBgif2webp\fP will
perform conversion but only report statistics.
Using "\-" as output name will direct output to 'stdout'.
.TP
.B \-h, \-help
Usage information.

91
man/vwebp.1 Normal file
View File

@@ -0,0 +1,91 @@
.\" Hey, EMACS: -*- nroff -*-
.TH VWEBP 1 "July 23, 2014"
.SH NAME
vwebp \- decompress a WebP file and display it in a window
.SH SYNOPSIS
.B vwebp
.RI [ options ] " input_file.webp
.br
.SH DESCRIPTION
This manual page documents the
.B vwebp
command.
.PP
\fBvwebp\fP decompresses a WebP file and displays it in a window using OpenGL.
.SH OPTIONS
.TP
.B \-h
Print usage summary.
.TP
.B \-version
Print version number and exit.
.TP
.B \-noicc
Don't use the ICC profile if present.
.TP
.B \-nofancy
Don't use the fancy YUV420 upscaler.
.TP
.B \-nofilter
Disable in-loop filtering.
.TP
.BI \-dither " strength
Specify a dithering \fBstrength\fP between 0 and 100. Dithering is a
post-processing effect applied to chroma components in lossy compression.
It helps by smoothing gradients and avoiding banding artifacts. Default: 50.
.\" TODO(jzern): restore post-v0.4.1
.\" .TP
.\" .BI \-noalphadither
.\" By default, quantized transparency planes are dithered during decompression,
.\" to smooth the gradients. This flag will prevent this dithering.
.TP
.B \-mt
Use multi-threading for decoding, if possible.
.TP
.B \-info
Display image information on top of the decoded image.
.TP
.BI \-\- " string
Explicitly specify the input file. This option is useful if the input
file starts with an '\-' for instance. This option must appear \fBlast\fP.
Any other options afterward will be ignored. If the input file is "\-",
the data will be read from \fIstdin\fP instead of a file.
.TP
.SH KEYBOARD SHORTCUTS
.TP
.B 'c'
Toggle use of color profile.
.TP
.B 'i'
Overlay file information.
.TP
.B 'q' / 'Q' / ESC
Quit.
.SH BUGS
Please report all bugs to our issue tracker:
http://code.google.com/p/webp/issues
.br
Patches welcome! See this page to get started:
http://www.webmproject.org/code/contribute/submitting-patches/
.SH EXAMPLES
vwebp picture.webp
.br
vwebp picture.webp -mt -dither 0
.br
vwebp \-\- \-\-\-picture.webp
.SH AUTHORS
\fBvwebp\fP was written by the WebP team.
.br
The latest source tree is available at http://www.webmproject.org/code
.PP
This manual page was written for the Debian project (and may be used by others).
.SH SEE ALSO
.BR dwebp (1)
.br
Please refer to http://developers.google.com/speed/webp/ for additional
information.

View File

@@ -1,7 +1,8 @@
.\" Hey, EMACS: -*- nroff -*-
.TH WEBPMUX 1 "December 17, 2013"
.TH WEBPMUX 1 "August 28, 2014"
.SH NAME
webpmux \- command line tool to create WebP Mux/container file.
webpmux \- create animated WebP files from non\-animated WebP images, extract
frames from animated WebP images, and manage XMP/EXIF metadata and ICC profile.
.SH SYNOPSIS
.B webpmux \-get
.I GET_OPTIONS
@@ -45,8 +46,8 @@ This manual page documents the
.B webpmux
command.
.PP
\fBwebpmux\fP can be used to create a WebP container file
and extract/strip relevant data from the container file.
\fBwebpmux\fP can be used to create/extract from animated WebP files, as well as
to add/extract/strip XMP/EXIF metadata and ICC profile.
.SH OPTIONS
.SS GET_OPTIONS (\-get):
.TP
@@ -60,7 +61,7 @@ Get EXIF metadata.
Get XMP metadata.
.TP
.BI frame " n
Get nth frame.
Get nth frame from an animated image. (n = 0 has a special meaning: last frame).
.SS SET_OPTIONS (\-set)
.TP
@@ -91,12 +92,13 @@ Strip EXIF metadata.
Strip XMP metadata.
.SS FRAME_OPTIONS (\-frame)
Create an animated WebP file from multiple (non\-animated) WebP images.
.TP
.I file_i +di[+xi+yi[+mi[bi]]]
Where: 'file_i' is the i'th frame (WebP format), 'xi','yi' specify the image
offset for this frame, 'di' is the pause duration before next frame, 'mi' is
the dispose method for this frame (0 for NONE or 1 for BACKGROUND) and 'bi' is
the blending method for this frame (+b for BLEND or -b for NO_BLEND).
the blending method for this frame (+b for BLEND or \-b for NO_BLEND).
Argument 'bi' can be omitted and will default to +b (BLEND).
Also, 'mi' can be omitted if 'bi' is omitted and will default to 0 (NONE).
Finally, if 'mi' and 'bi' are omitted then 'xi' and 'yi' can be omitted and will
@@ -130,37 +132,61 @@ Please report all bugs to our issue tracker:
http://code.google.com/p/webp/issues
.br
Patches welcome! See this page to get started:
http://www.webmproject.org/code/contribute/submitting-patches/
http://www.webmproject.org/code/contribute/submitting\-patches/
.SH EXAMPLES
.P
Add ICC profile:
.br
webpmux \-set icc image_profile.icc in.webp \-o icc_container.webp
.P
Extract ICC profile:
.br
webpmux \-get icc icc_container.webp \-o image_profile.icc
.P
Strip ICC profile:
.br
webpmux \-strip icc icc_container.webp \-o without_icc.webp
.P
Add XMP metadata:
.br
webpmux \-set xmp image_metadata.xmp in.webp \-o xmp_container.webp
.P
Extract XMP metadata:
.br
webpmux \-get xmp xmp_container.webp \-o image_metadata.xmp
.P
Strip XMP metadata:
.br
webpmux \-strip xmp xmp_container.webp \-o without_xmp.webp
.P
Add EXIF metadata:
.br
webpmux \-set exif image_metadata.exif in.webp \-o exif_container.webp
.P
Extract EXIF metadata:
.br
webpmux \-get exif exif_container.webp \-o image_metadata.exif
.P
Strip EXIF metadata:
.br
webpmux \-strip exif exif_container.webp \-o without_exif.webp
.P
Create an animated WebP file from 3 (non\-animated) WebP images:
.br
webpmux \-frame anim_1.webp +100 \-frame anim_2.webp +100+50+50
webpmux \-frame 1.webp +100 \-frame 2.webp +100+50+50
.br
.RS 8
\-frame anim_2.webp +100+50+50+1+b \-loop 10 \-bgcolor 255,255,255,255
\-frame 3.webp +100+50+50+1+b \-loop 10 \-bgcolor 255,255,255,255
.br
.RS 8
\-o anim_container.webp
.RE
.P
Get the 2nd frame from an animated WebP file:
.br
webpmux \-get frame 2 anim_container.webp \-o frame_2.webp
.P
Using \-get/\-set/\-strip with input file name starting with '\-':
.br
webpmux \-set icc image_profile.icc \-o icc_container.webp \-\- \-\-\-in.webp
.br

1
src/.gitignore vendored
View File

@@ -1 +0,0 @@
/*.pc

View File

@@ -8,7 +8,6 @@ if WANT_DEMUX
SUBDIRS += demux
endif
AM_CPPFLAGS = -I$(top_srcdir)/src
lib_LTLIBRARIES = libwebp.la
if BUILD_LIBWEBPDECODER
@@ -36,7 +35,7 @@ libwebp_la_LIBADD += utils/libwebputils.la
# other than the ones listed on the command line, i.e., after linking, it will
# not have unresolved symbols. Some platforms (Windows among them) require all
# symbols in shared libraries to be resolved at library creation.
libwebp_la_LDFLAGS = -no-undefined -version-info 5:0:0
libwebp_la_LDFLAGS = -no-undefined -version-info 5:3:0
libwebpincludedir = $(includedir)/webp
pkgconfig_DATA = libwebp.pc
@@ -48,7 +47,7 @@ if BUILD_LIBWEBPDECODER
libwebpdecoder_la_LIBADD += dsp/libwebpdspdecode.la
libwebpdecoder_la_LIBADD += utils/libwebputilsdecode.la
libwebpdecoder_la_LDFLAGS = -no-undefined -version-info 1:0:0
libwebpdecoder_la_LDFLAGS = -no-undefined -version-info 1:3:0
pkgconfig_DATA += libwebpdecoder.pc
endif

View File

@@ -1,4 +1,3 @@
AM_CPPFLAGS = -I$(top_srcdir)/src
noinst_LTLIBRARIES = libwebpdecode.la
libwebpdecode_la_SOURCES =
@@ -9,7 +8,6 @@ libwebpdecode_la_SOURCES += decode_vp8.h
libwebpdecode_la_SOURCES += frame.c
libwebpdecode_la_SOURCES += idec.c
libwebpdecode_la_SOURCES += io.c
libwebpdecode_la_SOURCES += layer.c
libwebpdecode_la_SOURCES += quant.c
libwebpdecode_la_SOURCES += tree.c
libwebpdecode_la_SOURCES += vp8.c

View File

@@ -16,13 +16,14 @@
#include "./vp8i.h"
#include "./vp8li.h"
#include "../utils/quant_levels_dec.h"
#include "../utils/utils.h"
#include "../webp/format_constants.h"
//------------------------------------------------------------------------------
// ALPHDecoder object.
ALPHDecoder* ALPHNew(void) {
ALPHDecoder* const dec = (ALPHDecoder*)calloc(1, sizeof(*dec));
ALPHDecoder* const dec = (ALPHDecoder*)WebPSafeCalloc(1ULL, sizeof(*dec));
return dec;
}
@@ -30,7 +31,7 @@ void ALPHDelete(ALPHDecoder* const dec) {
if (dec != NULL) {
VP8LDelete(dec->vp8l_dec_);
dec->vp8l_dec_ = NULL;
free(dec);
WebPSafeFree(dec);
}
}
@@ -107,12 +108,6 @@ static int ALPHDecode(VP8Decoder* const dec, int row, int num_rows) {
unfilter_func(width, height, width, row, num_rows, output);
}
if (alph_dec->pre_processing_ == ALPHA_PREPROCESSED_LEVELS) {
if (!DequantizeLevels(output, width, height, row, num_rows)) {
return 0;
}
}
if (row + num_rows == dec->pic_hdr_.height_) {
dec->is_alpha_decoded_ = 1;
}
@@ -142,12 +137,22 @@ const uint8_t* VP8DecompressAlphaRows(VP8Decoder* const dec,
dec->alph_dec_ = NULL;
return NULL;
}
// if we allowed use of alpha dithering, check whether it's needed at all
if (dec->alph_dec_->pre_processing_ != ALPHA_PREPROCESSED_LEVELS) {
dec->alpha_dithering_ = 0; // disable dithering
} else {
num_rows = height; // decode everything in one pass
}
}
if (!dec->is_alpha_decoded_) {
int ok = 0;
assert(dec->alph_dec_ != NULL);
ok = ALPHDecode(dec, row, num_rows);
if (ok && dec->alpha_dithering_ > 0) {
ok = WebPDequantizeLevels(dec->alpha_plane_, width, height,
dec->alpha_dithering_);
}
if (!ok || dec->is_alpha_decoded_) {
ALPHDelete(dec->alph_dec_);
dec->alph_dec_ = NULL;
@@ -158,4 +163,3 @@ const uint8_t* VP8DecompressAlphaRows(VP8Decoder* const dec,
// Return a pointer to the current decoded row.
return dec->alpha_plane_ + row * width;
}

View File

@@ -42,29 +42,34 @@ static VP8StatusCode CheckDecBuffer(const WebPDecBuffer* const buffer) {
ok = 0;
} else if (!WebPIsRGBMode(mode)) { // YUV checks
const WebPYUVABuffer* const buf = &buffer->u.YUVA;
const uint64_t y_size = (uint64_t)buf->y_stride * height;
const uint64_t u_size = (uint64_t)buf->u_stride * ((height + 1) / 2);
const uint64_t v_size = (uint64_t)buf->v_stride * ((height + 1) / 2);
const uint64_t a_size = (uint64_t)buf->a_stride * height;
const int y_stride = abs(buf->y_stride);
const int u_stride = abs(buf->u_stride);
const int v_stride = abs(buf->v_stride);
const int a_stride = abs(buf->a_stride);
const uint64_t y_size = (uint64_t)y_stride * height;
const uint64_t u_size = (uint64_t)u_stride * ((height + 1) / 2);
const uint64_t v_size = (uint64_t)v_stride * ((height + 1) / 2);
const uint64_t a_size = (uint64_t)a_stride * height;
ok &= (y_size <= buf->y_size);
ok &= (u_size <= buf->u_size);
ok &= (v_size <= buf->v_size);
ok &= (buf->y_stride >= width);
ok &= (buf->u_stride >= (width + 1) / 2);
ok &= (buf->v_stride >= (width + 1) / 2);
ok &= (y_stride >= width);
ok &= (u_stride >= (width + 1) / 2);
ok &= (v_stride >= (width + 1) / 2);
ok &= (buf->y != NULL);
ok &= (buf->u != NULL);
ok &= (buf->v != NULL);
if (mode == MODE_YUVA) {
ok &= (buf->a_stride >= width);
ok &= (a_stride >= width);
ok &= (a_size <= buf->a_size);
ok &= (buf->a != NULL);
}
} else { // RGB checks
const WebPRGBABuffer* const buf = &buffer->u.RGBA;
const uint64_t size = (uint64_t)buf->stride * height;
const int stride = abs(buf->stride);
const uint64_t size = (uint64_t)stride * height;
ok &= (size <= buf->size);
ok &= (buf->stride >= width * kModeBpp[mode]);
ok &= (stride >= width * kModeBpp[mode]);
ok &= (buf->rgba != NULL);
}
return ok ? VP8_STATUS_OK : VP8_STATUS_INVALID_PARAM;
@@ -131,9 +136,35 @@ static VP8StatusCode AllocateBuffer(WebPDecBuffer* const buffer) {
return CheckDecBuffer(buffer);
}
VP8StatusCode WebPFlipBuffer(WebPDecBuffer* const buffer) {
if (buffer == NULL) {
return VP8_STATUS_INVALID_PARAM;
}
if (WebPIsRGBMode(buffer->colorspace)) {
WebPRGBABuffer* const buf = &buffer->u.RGBA;
buf->rgba += (buffer->height - 1) * buf->stride;
buf->stride = -buf->stride;
} else {
WebPYUVABuffer* const buf = &buffer->u.YUVA;
const int H = buffer->height;
buf->y += (H - 1) * buf->y_stride;
buf->y_stride = -buf->y_stride;
buf->u += ((H - 1) >> 1) * buf->u_stride;
buf->u_stride = -buf->u_stride;
buf->v += ((H - 1) >> 1) * buf->v_stride;
buf->v_stride = -buf->v_stride;
if (buf->a != NULL) {
buf->a += (H - 1) * buf->a_stride;
buf->a_stride = -buf->a_stride;
}
}
return VP8_STATUS_OK;
}
VP8StatusCode WebPAllocateDecBuffer(int w, int h,
const WebPDecoderOptions* const options,
WebPDecBuffer* const out) {
VP8StatusCode status;
if (out == NULL || w <= 0 || h <= 0) {
return VP8_STATUS_INVALID_PARAM;
}
@@ -160,8 +191,17 @@ VP8StatusCode WebPAllocateDecBuffer(int w, int h,
out->width = w;
out->height = h;
// Then, allocate buffer for real
return AllocateBuffer(out);
// Then, allocate buffer for real.
status = AllocateBuffer(out);
if (status != VP8_STATUS_OK) return status;
#if WEBP_DECODER_ABI_VERSION > 0x0203
// Use the stride trick if vertical flip is needed.
if (options != NULL && options->flip) {
status = WebPFlipBuffer(out);
}
#endif
return status;
}
//------------------------------------------------------------------------------
@@ -178,8 +218,9 @@ int WebPInitDecBufferInternal(WebPDecBuffer* buffer, int version) {
void WebPFreeDecBuffer(WebPDecBuffer* buffer) {
if (buffer != NULL) {
if (!buffer->is_external_memory)
free(buffer->private_memory);
if (!buffer->is_external_memory) {
WebPSafeFree(buffer->private_memory);
}
buffer->private_memory = NULL;
}
}

View File

@@ -177,6 +177,15 @@ void VP8InitDithering(const WebPDecoderOptions* const options,
dec->dither_ = 1;
}
}
#if WEBP_DECODER_ABI_VERSION > 0x0204
// potentially allow alpha dithering
dec->alpha_dithering_ = options->alpha_dithering_strength;
if (dec->alpha_dithering_ > 100) {
dec->alpha_dithering_ = 100;
} else if (dec->alpha_dithering_ < 0) {
dec->alpha_dithering_ = 0;
}
#endif
}
}
@@ -347,7 +356,7 @@ int VP8ProcessRow(VP8Decoder* const dec, VP8Io* const io) {
} else {
WebPWorker* const worker = &dec->worker_;
// Finish previous job *before* updating context
ok &= WebPWorkerSync(worker);
ok &= WebPGetWorkerInterface()->Sync(worker);
assert(worker->status_ == OK);
if (ok) { // spawn a new deblocking/output job
ctx->io_ = *io;
@@ -367,7 +376,8 @@ int VP8ProcessRow(VP8Decoder* const dec, VP8Io* const io) {
ctx->f_info_ = dec->f_info_;
dec->f_info_ = tmp;
}
WebPWorkerLaunch(worker); // (reconstruct)+filter in parallel
// (reconstruct)+filter in parallel
WebPGetWorkerInterface()->Launch(worker);
if (++dec->cache_id_ == dec->num_caches_) {
dec->cache_id_ = 0;
}
@@ -437,7 +447,7 @@ VP8StatusCode VP8EnterCritical(VP8Decoder* const dec, VP8Io* const io) {
int VP8ExitCritical(VP8Decoder* const dec, VP8Io* const io) {
int ok = 1;
if (dec->mt_method_ > 0) {
ok = WebPWorkerSync(&dec->worker_);
ok = WebPGetWorkerInterface()->Sync(&dec->worker_);
}
if (io->teardown != NULL) {
@@ -478,7 +488,7 @@ static int InitThreadContext(VP8Decoder* const dec) {
dec->cache_id_ = 0;
if (dec->mt_method_ > 0) {
WebPWorker* const worker = &dec->worker_;
if (!WebPWorkerReset(worker)) {
if (!WebPGetWorkerInterface()->Reset(worker)) {
return VP8SetError(dec, VP8_STATUS_OUT_OF_MEMORY,
"thread initialization failed.");
}
@@ -502,7 +512,7 @@ int VP8GetThreadMethod(const WebPDecoderOptions* const options,
(void)headers;
(void)width;
(void)height;
assert(!headers->is_lossless);
assert(headers == NULL || !headers->is_lossless);
#if defined(WEBP_USE_THREAD)
if (width < MIN_WIDTH_FOR_THREADS) return 0;
// TODO(skal): tune the heuristic further
@@ -549,7 +559,7 @@ static int AllocateMemory(VP8Decoder* const dec) {
if (needed != (size_t)needed) return 0; // check for overflow
if (needed > dec->mem_size_) {
free(dec->mem_);
WebPSafeFree(dec->mem_);
dec->mem_size_ = 0;
dec->mem_ = WebPSafeMalloc(needed, sizeof(uint8_t));
if (dec->mem_ == NULL) {

View File

@@ -72,28 +72,20 @@ struct WebPIDecoder {
MemBuffer mem_; // input memory buffer.
WebPDecBuffer output_; // output buffer (when no external one is supplied)
size_t chunk_size_; // Compressed VP8/VP8L size extracted from Header.
int last_mb_y_; // last row reached for intra-mode decoding
};
// MB context to restore in case VP8DecodeMB() fails
typedef struct {
VP8MB left_;
VP8MB info_;
uint8_t intra_t_[4];
uint8_t intra_l_[4];
VP8BitReader br_;
VP8BitReader token_br_;
} MBContext;
//------------------------------------------------------------------------------
// MemBuffer: incoming data handling
static void RemapBitReader(VP8BitReader* const br, ptrdiff_t offset) {
if (br->buf_ != NULL) {
br->buf_ += offset;
br->buf_end_ += offset;
}
}
static WEBP_INLINE size_t MemDataSize(const MemBuffer* mem) {
return (mem->end_ - mem->start_);
}
@@ -130,12 +122,12 @@ static void DoRemap(WebPIDecoder* const idec, ptrdiff_t offset) {
if (offset != 0) {
int p;
for (p = 0; p <= last_part; ++p) {
RemapBitReader(dec->parts_ + p, offset);
VP8RemapBitReader(dec->parts_ + p, offset);
}
// Remap partition #0 data pointer to new offset, but only in MAP
// mode (in APPEND mode, partition #0 is copied into a fixed memory).
if (mem->mode_ == MEM_MODE_MAP) {
RemapBitReader(&dec->br_, offset);
VP8RemapBitReader(&dec->br_, offset);
}
}
assert(last_part >= 0);
@@ -189,7 +181,7 @@ static int AppendToMemBuffer(WebPIDecoder* const idec,
(uint8_t*)WebPSafeMalloc(extra_size, sizeof(*new_buf));
if (new_buf == NULL) return 0;
memcpy(new_buf, old_base, current_size);
free(mem->buf_);
WebPSafeFree(mem->buf_);
mem->buf_ = new_buf;
mem->buf_size_ = (size_t)extra_size;
mem->start_ = new_mem_start;
@@ -231,8 +223,8 @@ static void InitMemBuffer(MemBuffer* const mem) {
static void ClearMemBuffer(MemBuffer* const mem) {
assert(mem);
if (mem->mode_ == MEM_MODE_APPEND) {
free(mem->buf_);
free((void*)mem->part0_buf_);
WebPSafeFree(mem->buf_);
WebPSafeFree((void*)mem->part0_buf_);
}
}
@@ -246,35 +238,36 @@ static int CheckMemBufferMode(MemBuffer* const mem, MemBufferMode expected) {
return 1;
}
// To be called last.
static VP8StatusCode FinishDecoding(WebPIDecoder* const idec) {
#if WEBP_DECODER_ABI_VERSION > 0x0203
const WebPDecoderOptions* const options = idec->params_.options;
WebPDecBuffer* const output = idec->params_.output;
idec->state_ = STATE_DONE;
if (options != NULL && options->flip) {
return WebPFlipBuffer(output);
}
#endif
idec->state_ = STATE_DONE;
return VP8_STATUS_OK;
}
//------------------------------------------------------------------------------
// Macroblock-decoding contexts
static void SaveContext(const VP8Decoder* dec, const VP8BitReader* token_br,
MBContext* const context) {
const VP8BitReader* const br = &dec->br_;
const VP8MB* const left = dec->mb_info_ - 1;
const VP8MB* const info = dec->mb_info_ + dec->mb_x_;
context->left_ = *left;
context->info_ = *info;
context->br_ = *br;
context->left_ = dec->mb_info_[-1];
context->info_ = dec->mb_info_[dec->mb_x_];
context->token_br_ = *token_br;
memcpy(context->intra_t_, dec->intra_t_ + 4 * dec->mb_x_, 4);
memcpy(context->intra_l_, dec->intra_l_, 4);
}
static void RestoreContext(const MBContext* context, VP8Decoder* const dec,
VP8BitReader* const token_br) {
VP8BitReader* const br = &dec->br_;
VP8MB* const left = dec->mb_info_ - 1;
VP8MB* const info = dec->mb_info_ + dec->mb_x_;
*left = context->left_;
*info = context->info_;
*br = context->br_;
dec->mb_info_[-1] = context->left_;
dec->mb_info_[dec->mb_x_] = context->info_;
*token_br = context->token_br_;
memcpy(dec->intra_t_ + 4 * dec->mb_x_, context->intra_t_, 4);
memcpy(dec->intra_l_, context->intra_l_, 4);
}
//------------------------------------------------------------------------------
@@ -310,6 +303,7 @@ static VP8StatusCode DecodeWebPHeaders(WebPIDecoder* const idec) {
headers.data = data;
headers.data_size = curr_size;
headers.have_all_data = 0;
status = WebPParseHeaders(&headers);
if (status == VP8_STATUS_NOT_ENOUGH_DATA) {
return VP8_STATUS_SUSPENDED; // We haven't found a VP8 chunk yet.
@@ -363,30 +357,33 @@ static VP8StatusCode DecodeVP8FrameHeader(WebPIDecoder* const idec) {
}
// Partition #0
static int CopyParts0Data(WebPIDecoder* const idec) {
static VP8StatusCode CopyParts0Data(WebPIDecoder* const idec) {
VP8Decoder* const dec = (VP8Decoder*)idec->dec_;
VP8BitReader* const br = &dec->br_;
const size_t psize = br->buf_end_ - br->buf_;
const size_t part_size = br->buf_end_ - br->buf_;
MemBuffer* const mem = &idec->mem_;
assert(!idec->is_lossless_);
assert(mem->part0_buf_ == NULL);
assert(psize > 0);
assert(psize <= mem->part0_size_); // Format limit: no need for runtime check
// the following is a format limitation, no need for runtime check:
assert(part_size <= mem->part0_size_);
if (part_size == 0) { // can't have zero-size partition #0
return VP8_STATUS_BITSTREAM_ERROR;
}
if (mem->mode_ == MEM_MODE_APPEND) {
// We copy and grab ownership of the partition #0 data.
uint8_t* const part0_buf = (uint8_t*)malloc(psize);
uint8_t* const part0_buf = (uint8_t*)WebPSafeMalloc(1ULL, part_size);
if (part0_buf == NULL) {
return 0;
return VP8_STATUS_OUT_OF_MEMORY;
}
memcpy(part0_buf, br->buf_, psize);
memcpy(part0_buf, br->buf_, part_size);
mem->part0_buf_ = part0_buf;
br->buf_ = part0_buf;
br->buf_end_ = part0_buf + psize;
br->buf_end_ = part0_buf + part_size;
} else {
// Else: just keep pointers to the partition #0's data in dec_->br_.
}
mem->start_ += psize;
return 1;
mem->start_ += part_size;
return VP8_STATUS_OK;
}
static VP8StatusCode DecodePartition0(WebPIDecoder* const idec) {
@@ -420,8 +417,10 @@ static VP8StatusCode DecodePartition0(WebPIDecoder* const idec) {
dec->mt_method_ = VP8GetThreadMethod(params->options, NULL,
io->width, io->height);
VP8InitDithering(params->options, dec);
if (!CopyParts0Data(idec)) {
return IDecError(idec, VP8_STATUS_OUT_OF_MEMORY);
dec->status_ = CopyParts0Data(idec);
if (dec->status_ != VP8_STATUS_OK) {
return IDecError(idec, dec->status_);
}
// Finish setting up the decoding parameters. Will call io->setup().
@@ -446,16 +445,26 @@ static VP8StatusCode DecodeRemaining(WebPIDecoder* const idec) {
assert(dec->ready_);
for (; dec->mb_y_ < dec->mb_h_; ++dec->mb_y_) {
VP8BitReader* token_br = &dec->parts_[dec->mb_y_ & (dec->num_parts_ - 1)];
if (idec->last_mb_y_ != dec->mb_y_) {
if (!VP8ParseIntraModeRow(&dec->br_, dec)) {
// note: normally, error shouldn't occur since we already have the whole
// partition0 available here in DecodeRemaining(). Reaching EOF while
// reading intra modes really means a BITSTREAM_ERROR.
return IDecError(idec, VP8_STATUS_BITSTREAM_ERROR);
}
idec->last_mb_y_ = dec->mb_y_;
}
for (; dec->mb_x_ < dec->mb_w_; ++dec->mb_x_) {
VP8BitReader* const token_br =
&dec->parts_[dec->mb_y_ & (dec->num_parts_ - 1)];
MBContext context;
SaveContext(dec, token_br, &context);
if (!VP8DecodeMB(dec, token_br)) {
RestoreContext(&context, dec, token_br);
// We shouldn't fail when MAX_MB data was available
if (dec->num_parts_ == 1 && MemDataSize(&idec->mem_) > MAX_MB_SIZE) {
return IDecError(idec, VP8_STATUS_BITSTREAM_ERROR);
}
RestoreContext(&context, dec, token_br);
return VP8_STATUS_SUSPENDED;
}
// Release buffer only if there is only one partition
@@ -476,9 +485,7 @@ static VP8StatusCode DecodeRemaining(WebPIDecoder* const idec) {
return IDecError(idec, VP8_STATUS_USER_ABORT);
}
dec->ready_ = 0;
idec->state_ = STATE_DONE;
return VP8_STATUS_OK;
return FinishDecoding(idec);
}
static VP8StatusCode ErrorStatusLossless(WebPIDecoder* const idec,
@@ -527,12 +534,16 @@ static VP8StatusCode DecodeVP8LData(WebPIDecoder* const idec) {
}
if (!VP8LDecodeImage(dec)) {
// The decoding is called after all the data-bytes are aggregated. Change
// the error to VP8_BITSTREAM_ERROR in case lossless decoder fails to decode
// all the pixels (VP8_STATUS_SUSPENDED).
if (dec->status_ == VP8_STATUS_SUSPENDED) {
dec->status_ = VP8_STATUS_BITSTREAM_ERROR;
}
return ErrorStatusLossless(idec, dec->status_);
}
idec->state_ = STATE_DONE;
return VP8_STATUS_OK;
return FinishDecoding(idec);
}
// Main decoding loop
@@ -568,7 +579,7 @@ static VP8StatusCode IDecode(WebPIDecoder* idec) {
// Public functions
WebPIDecoder* WebPINewDecoder(WebPDecBuffer* output_buffer) {
WebPIDecoder* idec = (WebPIDecoder*)calloc(1, sizeof(*idec));
WebPIDecoder* idec = (WebPIDecoder*)WebPSafeCalloc(1ULL, sizeof(*idec));
if (idec == NULL) {
return NULL;
}
@@ -576,6 +587,8 @@ WebPIDecoder* WebPINewDecoder(WebPDecBuffer* output_buffer) {
idec->state_ = STATE_WEBP_HEADER;
idec->chunk_size_ = 0;
idec->last_mb_y_ = -1;
InitMemBuffer(&idec->mem_);
WebPInitDecBuffer(&idec->output_);
VP8InitIo(&idec->io_);
@@ -625,7 +638,7 @@ void WebPIDelete(WebPIDecoder* idec) {
}
ClearMemBuffer(&idec->mem_);
WebPFreeDecBuffer(&idec->output_);
free(idec);
WebPSafeFree(idec);
}
//------------------------------------------------------------------------------

View File

@@ -17,6 +17,7 @@
#include "./webpi.h"
#include "../dsp/dsp.h"
#include "../dsp/yuv.h"
#include "../utils/utils.h"
//------------------------------------------------------------------------------
// Main YUV<->RGB conversion functions
@@ -44,27 +45,13 @@ static int EmitYUV(const VP8Io* const io, WebPDecParams* const p) {
// Point-sampling U/V sampler.
static int EmitSampledRGB(const VP8Io* const io, WebPDecParams* const p) {
WebPDecBuffer* output = p->output;
const WebPRGBABuffer* const buf = &output->u.RGBA;
uint8_t* dst = buf->rgba + io->mb_y * buf->stride;
const uint8_t* y_src = io->y;
const uint8_t* u_src = io->u;
const uint8_t* v_src = io->v;
const WebPSampleLinePairFunc sample = WebPSamplers[output->colorspace];
const int mb_w = io->mb_w;
const int last = io->mb_h - 1;
int j;
for (j = 0; j < last; j += 2) {
sample(y_src, y_src + io->y_stride, u_src, v_src,
dst, dst + buf->stride, mb_w);
y_src += 2 * io->y_stride;
u_src += io->uv_stride;
v_src += io->uv_stride;
dst += 2 * buf->stride;
}
if (j == last) { // Just do the last line twice
sample(y_src, y_src, u_src, v_src, dst, dst, mb_w);
}
WebPDecBuffer* const output = p->output;
WebPRGBABuffer* const buf = &output->u.RGBA;
uint8_t* const dst = buf->rgba + io->mb_y * buf->stride;
WebPSamplerProcessPlane(io->y, io->y_stride,
io->u, io->v, io->uv_stride,
dst, buf->stride, io->mb_w, io->mb_h,
WebPSamplers[output->colorspace]);
return io->mb_h;
}
@@ -250,7 +237,11 @@ static int EmitAlphaRGBA4444(const VP8Io* const io, WebPDecParams* const p) {
int num_rows;
const int start_y = GetAlphaSourceRow(io, &alpha, &num_rows);
uint8_t* const base_rgba = buf->rgba + start_y * buf->stride;
#ifdef WEBP_SWAP_16BIT_CSP
uint8_t* alpha_dst = base_rgba;
#else
uint8_t* alpha_dst = base_rgba + 1;
#endif
uint32_t alpha_mask = 0x0f;
int i, j;
@@ -289,7 +280,17 @@ static int Rescale(const uint8_t* src, int src_stride,
static int EmitRescaledYUV(const VP8Io* const io, WebPDecParams* const p) {
const int mb_h = io->mb_h;
const int uv_mb_h = (mb_h + 1) >> 1;
const int num_lines_out = Rescale(io->y, io->y_stride, mb_h, &p->scaler_y);
WebPRescaler* const scaler = &p->scaler_y;
int num_lines_out = 0;
if (WebPIsAlphaMode(p->output->colorspace) && io->a != NULL) {
// Before rescaling, we premultiply the luma directly into the io->y
// internal buffer. This is OK since these samples are not used for
// intra-prediction (the top samples are saved in cache_y_/u_/v_).
// But we need to cast the const away, though.
WebPMultRows((uint8_t*)io->y, io->y_stride,
io->a, io->width, io->mb_w, mb_h, 0);
}
num_lines_out = Rescale(io->y, io->y_stride, mb_h, scaler);
Rescale(io->u, io->uv_stride, uv_mb_h, &p->scaler_u);
Rescale(io->v, io->uv_stride, uv_mb_h, &p->scaler_v);
return num_lines_out;
@@ -297,7 +298,14 @@ static int EmitRescaledYUV(const VP8Io* const io, WebPDecParams* const p) {
static int EmitRescaledAlphaYUV(const VP8Io* const io, WebPDecParams* const p) {
if (io->a != NULL) {
Rescale(io->a, io->width, io->mb_h, &p->scaler_a);
const WebPYUVABuffer* const buf = &p->output->u.YUVA;
uint8_t* dst_y = buf->y + p->last_y * buf->y_stride;
const uint8_t* src_a = buf->a + p->last_y * buf->a_stride;
const int num_lines_out = Rescale(io->a, io->width, io->mb_h, &p->scaler_a);
if (num_lines_out > 0) { // unmultiply the Y
WebPMultRows(dst_y, buf->y_stride, src_a, buf->a_stride,
p->scaler_a.dst_width, num_lines_out, 1);
}
}
return 0;
}
@@ -316,11 +324,11 @@ static int InitYUVRescaler(const VP8Io* const io, WebPDecParams* const p) {
size_t tmp_size;
int32_t* work;
tmp_size = work_size + 2 * uv_work_size;
tmp_size = (work_size + 2 * uv_work_size) * sizeof(*work);
if (has_alpha) {
tmp_size += work_size;
tmp_size += work_size * sizeof(*work);
}
p->memory = calloc(1, tmp_size * sizeof(*work));
p->memory = WebPSafeCalloc(1ULL, tmp_size);
if (p->memory == NULL) {
return 0; // memory error
}
@@ -347,6 +355,7 @@ static int InitYUVRescaler(const VP8Io* const io, WebPDecParams* const p) {
io->mb_w, out_width, io->mb_h, out_height,
work + work_size + 2 * uv_work_size);
p->emit_alpha = EmitRescaledAlphaYUV;
WebPInitAlphaProcessing();
}
return 1;
}
@@ -366,9 +375,9 @@ static int ExportRGB(WebPDecParams* const p, int y_pos) {
WebPRescalerHasPendingOutput(&p->scaler_u)) {
assert(p->last_y + y_pos + num_lines_out < p->output->height);
assert(p->scaler_u.y_accum == p->scaler_v.y_accum);
WebPRescalerExportRow(&p->scaler_y);
WebPRescalerExportRow(&p->scaler_u);
WebPRescalerExportRow(&p->scaler_v);
WebPRescalerExportRow(&p->scaler_y, 0);
WebPRescalerExportRow(&p->scaler_u, 0);
WebPRescalerExportRow(&p->scaler_v, 0);
convert(p->scaler_y.dst, p->scaler_u.dst, p->scaler_v.dst,
dst, p->scaler_y.dst_width);
dst += buf->stride;
@@ -416,7 +425,7 @@ static int ExportAlpha(WebPDecParams* const p, int y_pos) {
while (WebPRescalerHasPendingOutput(&p->scaler_a)) {
int i;
assert(p->last_y + y_pos + num_lines_out < p->output->height);
WebPRescalerExportRow(&p->scaler_a);
WebPRescalerExportRow(&p->scaler_a, 0);
for (i = 0; i < width; ++i) {
const uint32_t alpha_value = p->scaler_a.dst[i];
dst[4 * i] = alpha_value;
@@ -435,7 +444,11 @@ static int ExportAlpha(WebPDecParams* const p, int y_pos) {
static int ExportAlphaRGBA4444(WebPDecParams* const p, int y_pos) {
const WebPRGBABuffer* const buf = &p->output->u.RGBA;
uint8_t* const base_rgba = buf->rgba + (p->last_y + y_pos) * buf->stride;
#ifdef WEBP_SWAP_16BIT_CSP
uint8_t* alpha_dst = base_rgba;
#else
uint8_t* alpha_dst = base_rgba + 1;
#endif
int num_lines_out = 0;
const WEBP_CSP_MODE colorspace = p->output->colorspace;
const int width = p->scaler_a.dst_width;
@@ -445,7 +458,7 @@ static int ExportAlphaRGBA4444(WebPDecParams* const p, int y_pos) {
while (WebPRescalerHasPendingOutput(&p->scaler_a)) {
int i;
assert(p->last_y + y_pos + num_lines_out < p->output->height);
WebPRescalerExportRow(&p->scaler_a);
WebPRescalerExportRow(&p->scaler_a, 0);
for (i = 0; i < width; ++i) {
// Fill in the alpha value (converted to 4 bits).
const uint32_t alpha_value = p->scaler_a.dst[i] >> 4;
@@ -484,7 +497,7 @@ static int InitRGBRescaler(const VP8Io* const io, WebPDecParams* const p) {
const size_t work_size = 2 * out_width; // scratch memory for one rescaler
int32_t* work; // rescalers work area
uint8_t* tmp; // tmp storage for scaled YUV444 samples before RGB conversion
size_t tmp_size1, tmp_size2;
size_t tmp_size1, tmp_size2, total_size;
tmp_size1 = 3 * work_size;
tmp_size2 = 3 * out_width;
@@ -492,7 +505,8 @@ static int InitRGBRescaler(const VP8Io* const io, WebPDecParams* const p) {
tmp_size1 += work_size;
tmp_size2 += out_width;
}
p->memory = calloc(1, tmp_size1 * sizeof(*work) + tmp_size2 * sizeof(*tmp));
total_size = tmp_size1 * sizeof(*work) + tmp_size2 * sizeof(*tmp);
p->memory = WebPSafeCalloc(1ULL, total_size);
if (p->memory == NULL) {
return 0; // memory error
}
@@ -524,6 +538,7 @@ static int InitRGBRescaler(const VP8Io* const io, WebPDecParams* const p) {
} else {
p->emit_alpha_row = ExportAlpha;
}
WebPInitAlphaProcessing();
}
return 1;
}
@@ -544,7 +559,9 @@ static int CustomSetup(VP8Io* io) {
if (!WebPIoInitFromOptions(p->options, io, is_alpha ? MODE_YUV : MODE_YUVA)) {
return 0;
}
if (is_alpha && WebPIsPremultipliedMode(colorspace)) {
WebPInitUpsamplers();
}
if (io->use_scaling) {
const int ok = is_rgb ? InitRGBRescaler(io, p) : InitYUVRescaler(io, p);
if (!ok) {
@@ -553,10 +570,10 @@ static int CustomSetup(VP8Io* io) {
} else {
if (is_rgb) {
p->emit = EmitSampledRGB; // default
#ifdef FANCY_UPSAMPLING
if (io->fancy_upsampling) {
#ifdef FANCY_UPSAMPLING
const int uv_width = (io->mb_w + 1) >> 1;
p->memory = malloc(io->mb_w + 2 * uv_width);
p->memory = WebPSafeMalloc(1ULL, (size_t)(io->mb_w + 2 * uv_width));
if (p->memory == NULL) {
return 0; // memory error.
}
@@ -565,18 +582,22 @@ static int CustomSetup(VP8Io* io) {
p->tmp_v = p->tmp_u + uv_width;
p->emit = EmitFancyRGB;
WebPInitUpsamplers();
}
#endif
} else {
WebPInitSamplers();
}
} else {
p->emit = EmitYUV;
}
if (is_alpha) { // need transparency output
if (WebPIsPremultipliedMode(colorspace)) WebPInitPremultiply();
p->emit_alpha =
(colorspace == MODE_RGBA_4444 || colorspace == MODE_rgbA_4444) ?
EmitAlphaRGBA4444
: is_rgb ? EmitAlphaRGB
: EmitAlphaYUV;
if (is_rgb) {
WebPInitAlphaProcessing();
}
}
}
@@ -610,7 +631,7 @@ static int CustomPut(const VP8Io* io) {
static void CustomTeardown(const VP8Io* io) {
WebPDecParams* const p = (WebPDecParams*)io->opaque;
free(p->memory);
WebPSafeFree(p->memory);
p->memory = NULL;
}
@@ -625,4 +646,3 @@ void WebPInitCustomIo(WebPDecParams* const params, VP8Io* const io) {
}
//------------------------------------------------------------------------------

View File

@@ -11,7 +11,8 @@
//
// Author: Skal (pascal.massimino@gmail.com)
#include "vp8i.h"
#include "./vp8i.h"
#include "../utils/bit_reader_inl.h"
#define USE_GENERIC_TREE
@@ -278,10 +279,23 @@ void VP8ResetProba(VP8Proba* const proba) {
// proba->bands_[][] is initialized later
}
void VP8ParseIntraMode(VP8BitReader* const br, VP8Decoder* const dec) {
uint8_t* const top = dec->intra_t_ + 4 * dec->mb_x_;
static void ParseIntraMode(VP8BitReader* const br,
VP8Decoder* const dec, int mb_x) {
uint8_t* const top = dec->intra_t_ + 4 * mb_x;
uint8_t* const left = dec->intra_l_;
VP8MBData* const block = dec->mb_data_ + dec->mb_x_;
VP8MBData* const block = dec->mb_data_ + mb_x;
// Note: we don't save segment map (yet), as we don't expect
// to decode more than 1 keyframe.
if (dec->segment_hdr_.update_map_) {
// Hardcoded tree parsing
block->segment_ = !VP8GetBit(br, dec->proba_.segments_[0])
? VP8GetBit(br, dec->proba_.segments_[1])
: 2 + VP8GetBit(br, dec->proba_.segments_[2]);
} else {
block->segment_ = 0; // default for intra
}
if (dec->use_skip_proba_) block->skip_ = VP8GetBit(br, dec->skip_p_);
block->is_i4x4_ = !VP8GetBit(br, 145); // decide for B_PRED first
if (!block->is_i4x4_) {
@@ -332,6 +346,14 @@ void VP8ParseIntraMode(VP8BitReader* const br, VP8Decoder* const dec) {
: VP8GetBit(br, 183) ? TM_PRED : H_PRED;
}
int VP8ParseIntraModeRow(VP8BitReader* const br, VP8Decoder* const dec) {
int mb_x;
for (mb_x = 0; mb_x < dec->mb_w_; ++mb_x) {
ParseIntraMode(br, dec, mb_x);
}
return !dec->br_.eof_;
}
//------------------------------------------------------------------------------
// Paragraph 13

View File

@@ -17,7 +17,8 @@
#include "./vp8i.h"
#include "./vp8li.h"
#include "./webpi.h"
#include "../utils/bit_reader.h"
#include "../utils/bit_reader_inl.h"
#include "../utils/utils.h"
//------------------------------------------------------------------------------
@@ -44,10 +45,10 @@ int VP8InitIoInternal(VP8Io* const io, int version) {
}
VP8Decoder* VP8New(void) {
VP8Decoder* const dec = (VP8Decoder*)calloc(1, sizeof(*dec));
VP8Decoder* const dec = (VP8Decoder*)WebPSafeCalloc(1ULL, sizeof(*dec));
if (dec != NULL) {
SetOk(dec);
WebPWorkerInit(&dec->worker_);
WebPGetWorkerInterface()->Init(&dec->worker_);
dec->ready_ = 0;
dec->num_parts_ = 1;
}
@@ -68,7 +69,7 @@ const char* VP8StatusMessage(VP8Decoder* const dec) {
void VP8Delete(VP8Decoder* const dec) {
if (dec != NULL) {
VP8Clear(dec);
free(dec);
WebPSafeFree(dec);
}
}
@@ -317,7 +318,6 @@ int VP8GetHeaders(VP8Decoder* const dec, VP8Io* const io) {
VP8ResetProba(&dec->proba_);
ResetSegmentHeader(&dec->segment_hdr_);
dec->segment_ = 0; // default for intra
}
// Check if we have all the partition #0 available, and initialize dec->br_
@@ -363,28 +363,6 @@ int VP8GetHeaders(VP8Decoder* const dec, VP8Io* const io) {
VP8ParseProba(br, dec);
#ifdef WEBP_EXPERIMENTAL_FEATURES
// Extensions
if (dec->pic_hdr_.colorspace_) {
const size_t kTrailerSize = 8;
const uint8_t kTrailerMarker = 0x01;
const uint8_t* ext_buf = buf - kTrailerSize;
size_t size;
if (frm_hdr->partition_length_ < kTrailerSize ||
ext_buf[kTrailerSize - 1] != kTrailerMarker) {
return VP8SetError(dec, VP8_STATUS_BITSTREAM_ERROR,
"RIFF: Inconsistent extra information.");
}
// Layer
size = (ext_buf[0] << 0) | (ext_buf[1] << 8) | (ext_buf[2] << 16);
dec->layer_data_size_ = size;
dec->layer_data_ = NULL; // will be set later
dec->layer_colorspace_ = ext_buf[3];
}
#endif
// sanitized state
dec->ready_ = 1;
return 1;
@@ -479,8 +457,8 @@ static int ParseResiduals(VP8Decoder* const dec,
VP8MB* const mb, VP8BitReader* const token_br) {
VP8BandProbas (* const bands)[NUM_BANDS] = dec->proba_.bands_;
const VP8BandProbas* ac_proba;
const VP8QuantMatrix* const q = &dec->dqm_[dec->segment_];
VP8MBData* const block = dec->mb_data_ + dec->mb_x_;
const VP8QuantMatrix* const q = &dec->dqm_[block->segment_];
int16_t* dst = block->coeffs_;
VP8MB* const left_mb = dec->mb_info_ - 1;
uint8_t tnz, lnz;
@@ -570,26 +548,10 @@ static int ParseResiduals(VP8Decoder* const dec,
// Main loop
int VP8DecodeMB(VP8Decoder* const dec, VP8BitReader* const token_br) {
VP8BitReader* const br = &dec->br_;
VP8MB* const left = dec->mb_info_ - 1;
VP8MB* const mb = dec->mb_info_ + dec->mb_x_;
VP8MBData* const block = dec->mb_data_ + dec->mb_x_;
int skip;
// Note: we don't save segment map (yet), as we don't expect
// to decode more than 1 keyframe.
if (dec->segment_hdr_.update_map_) {
// Hardcoded tree parsing
dec->segment_ = !VP8GetBit(br, dec->proba_.segments_[0]) ?
VP8GetBit(br, dec->proba_.segments_[1]) :
2 + VP8GetBit(br, dec->proba_.segments_[2]);
}
skip = dec->use_skip_proba_ ? VP8GetBit(br, dec->skip_p_) : 0;
VP8ParseIntraMode(br, dec);
if (br->eof_) {
return 0;
}
int skip = dec->use_skip_proba_ ? block->skip_ : 0;
if (!skip) {
skip = ParseResiduals(dec, mb, token_br);
@@ -600,11 +562,12 @@ int VP8DecodeMB(VP8Decoder* const dec, VP8BitReader* const token_br) {
}
block->non_zero_y_ = 0;
block->non_zero_uv_ = 0;
block->dither_ = 0;
}
if (dec->filter_type_ > 0) { // store filter info
VP8FInfo* const finfo = dec->f_info_ + dec->mb_x_;
*finfo = dec->fstrengths_[dec->segment_][block->is_i4x4_];
*finfo = dec->fstrengths_[block->segment_][block->is_i4x4_];
finfo->f_inner_ |= !skip;
}
@@ -624,6 +587,10 @@ static int ParseFrame(VP8Decoder* const dec, VP8Io* io) {
// Parse bitstream for this row.
VP8BitReader* const token_br =
&dec->parts_[dec->mb_y_ & (dec->num_parts_ - 1)];
if (!VP8ParseIntraModeRow(&dec->br_, dec)) {
return VP8SetError(dec, VP8_STATUS_NOT_ENOUGH_DATA,
"Premature end-of-partition0 encountered.");
}
for (; dec->mb_x_ < dec->mb_w_; ++dec->mb_x_) {
if (!VP8DecodeMB(dec, token_br)) {
return VP8SetError(dec, VP8_STATUS_NOT_ENOUGH_DATA,
@@ -638,18 +605,9 @@ static int ParseFrame(VP8Decoder* const dec, VP8Io* io) {
}
}
if (dec->mt_method_ > 0) {
if (!WebPWorkerSync(&dec->worker_)) return 0;
if (!WebPGetWorkerInterface()->Sync(&dec->worker_)) return 0;
}
// Finish
#ifdef WEBP_EXPERIMENTAL_FEATURES
if (dec->layer_data_size_ > 0) {
if (!VP8DecodeLayer(dec)) {
return 0;
}
}
#endif
return 1;
}
@@ -697,12 +655,10 @@ void VP8Clear(VP8Decoder* const dec) {
if (dec == NULL) {
return;
}
if (dec->mt_method_ > 0) {
WebPWorkerEnd(&dec->worker_);
}
WebPGetWorkerInterface()->End(&dec->worker_);
ALPHDelete(dec->alph_dec_);
dec->alph_dec_ = NULL;
free(dec->mem_);
WebPSafeFree(dec->mem_);
dec->mem_ = NULL;
dec->mem_size_ = 0;
memset(&dec->br_, 0, sizeof(dec->br_));

View File

@@ -31,7 +31,7 @@ extern "C" {
// version numbers
#define DEC_MAJ_VERSION 0
#define DEC_MIN_VERSION 4
#define DEC_REV_VERSION 0
#define DEC_REV_VERSION 3
// intra prediction modes
enum { B_DC_PRED = 0, // 4x4 modes
@@ -195,6 +195,8 @@ typedef struct {
uint32_t non_zero_y_;
uint32_t non_zero_uv_;
uint8_t dither_; // local dithering strength (deduced from non_zero_*)
uint8_t skip_;
uint8_t segment_;
} VP8MBData;
// Persistent information needed by the parallel processing
@@ -265,7 +267,6 @@ struct VP8Decoder {
uint8_t* intra_t_; // top intra modes values: 4 * mb_w_
uint8_t intra_l_[4]; // left intra modes values
uint8_t segment_; // segment of the currently parsed block
VP8TopSamples* yuv_t_; // top y/u/v samples
VP8MB* mb_info_; // contextual macroblock info (mb_w_ + 1)
@@ -295,12 +296,8 @@ struct VP8Decoder {
const uint8_t* alpha_data_; // compressed alpha data (if present)
size_t alpha_data_size_;
int is_alpha_decoded_; // true if alpha_data_ is decoded in alpha_plane_
uint8_t* alpha_plane_; // output. Persistent, contains the whole data.
// extensions
int layer_colorspace_;
const uint8_t* layer_data_; // compressed layer data (if present)
size_t layer_data_size_;
uint8_t* alpha_plane_; // output. Persistent, contains the whole data.
int alpha_dithering_; // derived from decoding options (0=off, 100=full).
};
//------------------------------------------------------------------------------
@@ -313,7 +310,8 @@ int VP8SetError(VP8Decoder* const dec,
// in tree.c
void VP8ResetProba(VP8Proba* const proba);
void VP8ParseProba(VP8BitReader* const br, VP8Decoder* const dec);
void VP8ParseIntraMode(VP8BitReader* const br, VP8Decoder* const dec);
// parses one row of intra mode data in partition 0, returns !eof
int VP8ParseIntraModeRow(VP8BitReader* const br, VP8Decoder* const dec);
// in quant.c
void VP8ParseQuant(VP8Decoder* const dec);
@@ -347,9 +345,6 @@ int VP8DecodeMB(VP8Decoder* const dec, VP8BitReader* const token_br);
const uint8_t* VP8DecompressAlphaRows(VP8Decoder* const dec,
int row, int num_rows);
// in layer.c
int VP8DecodeLayer(VP8Decoder* const dec);
//------------------------------------------------------------------------------
#ifdef __cplusplus

View File

@@ -12,13 +12,13 @@
// Authors: Vikas Arora (vikaas.arora@gmail.com)
// Jyrki Alakuijala (jyrki@google.com)
#include <stdio.h>
#include <stdlib.h>
#include "./alphai.h"
#include "./vp8li.h"
#include "../dsp/dsp.h"
#include "../dsp/lossless.h"
#include "../dsp/yuv.h"
#include "../utils/alpha_processing.h"
#include "../utils/huffman.h"
#include "../utils/utils.h"
@@ -187,9 +187,10 @@ static int ReadHuffmanCodeLengths(
int max_symbol;
int prev_code_len = DEFAULT_CODE_LENGTH;
HuffmanTree tree;
int huff_codes[NUM_CODE_LENGTH_CODES] = { 0 };
if (!HuffmanTreeBuildImplicit(&tree, code_length_code_lengths,
NUM_CODE_LENGTH_CODES)) {
if (!VP8LHuffmanTreeBuildImplicit(&tree, code_length_code_lengths,
huff_codes, NUM_CODE_LENGTH_CODES)) {
dec->status_ = VP8_STATUS_BITSTREAM_ERROR;
return 0;
}
@@ -232,11 +233,15 @@ static int ReadHuffmanCodeLengths(
ok = 1;
End:
HuffmanTreeRelease(&tree);
VP8LHuffmanTreeFree(&tree);
if (!ok) dec->status_ = VP8_STATUS_BITSTREAM_ERROR;
return ok;
}
// 'code_lengths' is pre-allocated temporary buffer, used for creating Huffman
// tree.
static int ReadHuffmanCode(int alphabet_size, VP8LDecoder* const dec,
int* const code_lengths, int* const huff_codes,
HuffmanTree* const tree) {
int ok = 0;
VP8LBitReader* const br = &dec->br_;
@@ -245,7 +250,6 @@ static int ReadHuffmanCode(int alphabet_size, VP8LDecoder* const dec,
if (simple_code) { // Read symbols, codes & code lengths directly.
int symbols[2];
int codes[2];
int code_lengths[2];
const int num_symbols = VP8LReadBits(br, 1) + 1;
const int first_symbol_len_code = VP8LReadBits(br, 1);
// The first code is either 1 bit or 8 bit code.
@@ -258,10 +262,9 @@ static int ReadHuffmanCode(int alphabet_size, VP8LDecoder* const dec,
codes[1] = 1;
code_lengths[1] = num_symbols - 1;
}
ok = HuffmanTreeBuildExplicit(tree, code_lengths, codes, symbols,
alphabet_size, num_symbols);
ok = VP8LHuffmanTreeBuildExplicit(tree, code_lengths, codes, symbols,
alphabet_size, num_symbols);
} else { // Decode Huffman-coded code lengths.
int* code_lengths = NULL;
int i;
int code_length_code_lengths[NUM_CODE_LENGTH_CODES] = { 0 };
const int num_codes = VP8LReadBits(br, 4) + 4;
@@ -270,22 +273,15 @@ static int ReadHuffmanCode(int alphabet_size, VP8LDecoder* const dec,
return 0;
}
code_lengths =
(int*)WebPSafeCalloc((uint64_t)alphabet_size, sizeof(*code_lengths));
if (code_lengths == NULL) {
dec->status_ = VP8_STATUS_OUT_OF_MEMORY;
return 0;
}
memset(code_lengths, 0, alphabet_size * sizeof(*code_lengths));
for (i = 0; i < num_codes; ++i) {
code_length_code_lengths[kCodeLengthCodeOrder[i]] = VP8LReadBits(br, 3);
}
ok = ReadHuffmanCodeLengths(dec, code_length_code_lengths, alphabet_size,
code_lengths);
if (ok) {
ok = HuffmanTreeBuildImplicit(tree, code_lengths, alphabet_size);
}
free(code_lengths);
ok = ok && VP8LHuffmanTreeBuildImplicit(tree, code_lengths, huff_codes,
alphabet_size);
}
ok = ok && !br->error_;
if (!ok) {
@@ -295,19 +291,6 @@ static int ReadHuffmanCode(int alphabet_size, VP8LDecoder* const dec,
return 1;
}
static void DeleteHtreeGroups(HTreeGroup* htree_groups, int num_htree_groups) {
if (htree_groups != NULL) {
int i, j;
for (i = 0; i < num_htree_groups; ++i) {
HuffmanTree* const htrees = htree_groups[i].htrees_;
for (j = 0; j < HUFFMAN_CODES_PER_META_CODE; ++j) {
HuffmanTreeRelease(&htrees[j]);
}
}
free(htree_groups);
}
}
static int ReadHuffmanCodes(VP8LDecoder* const dec, int xsize, int ysize,
int color_cache_bits, int allow_recursion) {
int i, j;
@@ -316,6 +299,9 @@ static int ReadHuffmanCodes(VP8LDecoder* const dec, int xsize, int ysize,
uint32_t* huffman_image = NULL;
HTreeGroup* htree_groups = NULL;
int num_htree_groups = 1;
int max_alphabet_size = 0;
int* code_lengths = NULL;
int* huff_codes = NULL;
if (allow_recursion && VP8LReadBits(br, 1)) {
// use meta Huffman codes.
@@ -341,11 +327,24 @@ static int ReadHuffmanCodes(VP8LDecoder* const dec, int xsize, int ysize,
if (br->error_) goto Error;
assert(num_htree_groups <= 0x10000);
htree_groups =
(HTreeGroup*)WebPSafeCalloc((uint64_t)num_htree_groups,
sizeof(*htree_groups));
if (htree_groups == NULL) {
// Find maximum alphabet size for the htree group.
for (j = 0; j < HUFFMAN_CODES_PER_META_CODE; ++j) {
int alphabet_size = kAlphabetSize[j];
if (j == 0 && color_cache_bits > 0) {
alphabet_size += 1 << color_cache_bits;
}
if (max_alphabet_size < alphabet_size) {
max_alphabet_size = alphabet_size;
}
}
htree_groups = VP8LHtreeGroupsNew(num_htree_groups);
code_lengths =
(int*)WebPSafeCalloc((uint64_t)max_alphabet_size, sizeof(*code_lengths));
huff_codes =
(int*)WebPSafeMalloc((uint64_t)max_alphabet_size, sizeof(*huff_codes));
if (htree_groups == NULL || code_lengths == NULL || huff_codes == NULL) {
dec->status_ = VP8_STATUS_OUT_OF_MEMORY;
goto Error;
}
@@ -354,12 +353,18 @@ static int ReadHuffmanCodes(VP8LDecoder* const dec, int xsize, int ysize,
HuffmanTree* const htrees = htree_groups[i].htrees_;
for (j = 0; j < HUFFMAN_CODES_PER_META_CODE; ++j) {
int alphabet_size = kAlphabetSize[j];
HuffmanTree* const htree = htrees + j;
if (j == 0 && color_cache_bits > 0) {
alphabet_size += 1 << color_cache_bits;
}
if (!ReadHuffmanCode(alphabet_size, dec, htrees + j)) goto Error;
if (!ReadHuffmanCode(alphabet_size, dec, code_lengths, huff_codes,
htree)) {
goto Error;
}
}
}
WebPSafeFree(huff_codes);
WebPSafeFree(code_lengths);
// All OK. Finalize pointers and return.
hdr->huffman_image_ = huffman_image;
@@ -368,8 +373,10 @@ static int ReadHuffmanCodes(VP8LDecoder* const dec, int xsize, int ysize,
return 1;
Error:
free(huffman_image);
DeleteHtreeGroups(htree_groups, num_htree_groups);
WebPSafeFree(huff_codes);
WebPSafeFree(code_lengths);
WebPSafeFree(huffman_image);
VP8LHtreeGroupsFree(htree_groups, num_htree_groups);
return 0;
}
@@ -420,7 +427,7 @@ static int Export(WebPRescaler* const rescaler, WEBP_CSP_MODE colorspace,
int num_lines_out = 0;
while (WebPRescalerHasPendingOutput(rescaler)) {
uint8_t* const dst = rgba + num_lines_out * rgba_stride;
WebPRescalerExportRow(rescaler);
WebPRescalerExportRow(rescaler, 0);
WebPMultARGBRow(src, dst_width, 1);
VP8LConvertFromBGRA(src, dst_width, colorspace, dst);
++num_lines_out;
@@ -468,6 +475,7 @@ static int EmitRows(WEBP_CSP_MODE colorspace,
//------------------------------------------------------------------------------
// Export to YUVA
// TODO(skal): should be in yuv.c
static void ConvertToYUVA(const uint32_t* const src, int width, int y_pos,
const WebPDecBuffer* const output) {
const WebPYUVABuffer* const buf = &output->u.YUVA;
@@ -537,7 +545,7 @@ static int ExportYUVA(const VP8LDecoder* const dec, int y_pos) {
const int dst_width = rescaler->dst_width;
int num_lines_out = 0;
while (WebPRescalerHasPendingOutput(rescaler)) {
WebPRescalerExportRow(rescaler);
WebPRescalerExportRow(rescaler, 0);
WebPMultARGBRow(src, dst_width, 1);
ConvertToYUVA(src, dst_width, y_pos, dec->output_);
++y_pos;
@@ -740,6 +748,7 @@ static int DecodeAlphaData(VP8LDecoder* const dec, uint8_t* const data,
const int len_code_limit = NUM_LITERAL_CODES + NUM_LENGTH_CODES;
const int mask = hdr->huffman_mask_;
assert(htree_group != NULL);
assert(pos < end);
assert(last_row <= height);
assert(Is8bOptimizable(hdr));
@@ -793,6 +802,7 @@ static int DecodeAlphaData(VP8LDecoder* const dec, uint8_t* const data,
ok = 0;
goto End;
}
assert(br->eos_ == VP8LIsEndOfStream(br));
ok = !br->error_;
if (!ok) goto End;
}
@@ -830,6 +840,7 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data,
(hdr->color_cache_size_ > 0) ? &hdr->color_cache_ : NULL;
const int mask = hdr->huffman_mask_;
assert(htree_group != NULL);
assert(src < src_end);
assert(src_last <= src_end);
while (!br->eos_ && src < src_last) {
@@ -849,7 +860,7 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data,
VP8LFillBitWindow(br);
blue = ReadSymbol(&htree_group->htrees_[BLUE], br);
alpha = ReadSymbol(&htree_group->htrees_[ALPHA], br);
*src = (alpha << 24) | (red << 16) | (green << 8) | blue;
*src = ((uint32_t)alpha << 24) | (red << 16) | (green << 8) | blue;
AdvanceByOne:
++src;
++col;
@@ -889,7 +900,7 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data,
process_func(dec, row);
}
}
if (src < src_last) {
if (src < src_end) {
if (col & mask) htree_group = GetHtreeGroupForPos(hdr, col, row);
if (color_cache != NULL) {
while (last_cached < src) {
@@ -909,6 +920,7 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data,
ok = 0;
goto End;
}
assert(br->eos_ == VP8LIsEndOfStream(br));
ok = !br->error_;
if (!ok) goto End;
}
@@ -931,7 +943,7 @@ static int DecodeImageData(VP8LDecoder* const dec, uint32_t* const data,
// VP8LTransform
static void ClearTransform(VP8LTransform* const transform) {
free(transform->data_);
WebPSafeFree(transform->data_);
transform->data_ = NULL;
}
@@ -955,7 +967,7 @@ static int ExpandColorMap(int num_colors, VP8LTransform* const transform) {
}
for (; i < 4 * final_num_colors; ++i)
new_data[i] = 0; // black tail.
free(transform->data_);
WebPSafeFree(transform->data_);
transform->data_ = new_color_map;
}
return 1;
@@ -1025,8 +1037,8 @@ static void InitMetadata(VP8LMetadata* const hdr) {
static void ClearMetadata(VP8LMetadata* const hdr) {
assert(hdr);
free(hdr->huffman_image_);
DeleteHtreeGroups(hdr->htree_groups_, hdr->num_htree_groups_);
WebPSafeFree(hdr->huffman_image_);
VP8LHtreeGroupsFree(hdr->htree_groups_, hdr->num_htree_groups_);
VP8LColorCacheClear(&hdr->color_cache_);
InitMetadata(hdr);
}
@@ -1035,7 +1047,7 @@ static void ClearMetadata(VP8LMetadata* const hdr) {
// VP8LDecoder
VP8LDecoder* VP8LNew(void) {
VP8LDecoder* const dec = (VP8LDecoder*)calloc(1, sizeof(*dec));
VP8LDecoder* const dec = (VP8LDecoder*)WebPSafeCalloc(1ULL, sizeof(*dec));
if (dec == NULL) return NULL;
dec->status_ = VP8_STATUS_OK;
dec->action_ = READ_DIM;
@@ -1051,7 +1063,7 @@ void VP8LClear(VP8LDecoder* const dec) {
if (dec == NULL) return;
ClearMetadata(&dec->hdr_);
free(dec->pixels_);
WebPSafeFree(dec->pixels_);
dec->pixels_ = NULL;
for (i = 0; i < dec->next_transform_; ++i) {
ClearTransform(&dec->transforms_[i]);
@@ -1059,7 +1071,7 @@ void VP8LClear(VP8LDecoder* const dec) {
dec->next_transform_ = 0;
dec->transforms_seen_ = 0;
free(dec->rescaler_memory);
WebPSafeFree(dec->rescaler_memory);
dec->rescaler_memory = NULL;
dec->output_ = NULL; // leave no trace behind
@@ -1068,7 +1080,7 @@ void VP8LClear(VP8LDecoder* const dec) {
void VP8LDelete(VP8LDecoder* const dec) {
if (dec != NULL) {
VP8LClear(dec);
free(dec);
WebPSafeFree(dec);
}
}
@@ -1155,7 +1167,7 @@ static int DecodeImageStream(int xsize, int ysize,
End:
if (!ok) {
free(data);
WebPSafeFree(data);
ClearMetadata(hdr);
// If not enough data (br.eos_) resulted in BIT_STREAM_ERROR, update the
// status appropriately.
@@ -1294,6 +1306,10 @@ int VP8LDecodeAlphaImageStream(ALPHDecoder* const alph_dec, int last_row) {
assert(dec->action_ == READ_DATA);
assert(last_row <= dec->height_);
if (dec->last_pixel_ == dec->width_ * dec->height_) {
return 1; // done
}
// Decode (with special row processing).
return alph_dec->use_8b_decode ?
DecodeAlphaData(dec, (uint8_t*)dec->pixels_, dec->width_, dec->height_,
@@ -1341,6 +1357,10 @@ int VP8LDecodeImage(VP8LDecoder* const dec) {
// Sanity checks.
if (dec == NULL) return 0;
dec->status_ = VP8_STATUS_BITSTREAM_ERROR;
assert(dec->hdr_.htree_groups_ != NULL);
assert(dec->hdr_.num_htree_groups_ > 0);
io = dec->io_;
assert(io != NULL);
params = (WebPDecParams*)io->opaque;
@@ -1358,6 +1378,11 @@ int VP8LDecodeImage(VP8LDecoder* const dec) {
if (io->use_scaling && !AllocateAndInitRescaler(dec, io)) goto Err;
if (io->use_scaling || WebPIsPremultipliedMode(dec->output_->colorspace)) {
// need the alpha-multiply functions for premultiplied output or rescaling
WebPInitAlphaProcessing();
}
// Decode.
dec->action_ = READ_DATA;
if (!DecodeImageData(dec, dec->pixels_, dec->width_, dec->height_,
@@ -1377,4 +1402,3 @@ int VP8LDecodeImage(VP8LDecoder* const dec) {
}
//------------------------------------------------------------------------------

View File

@@ -20,7 +20,6 @@
#include "../utils/bit_reader.h"
#include "../utils/color_cache.h"
#include "../utils/huffman.h"
#include "../webp/format_constants.h"
#ifdef __cplusplus
extern "C" {
@@ -41,10 +40,6 @@ struct VP8LTransform {
uint32_t *data_; // transform data.
};
typedef struct {
HuffmanTree htrees_[HUFFMAN_CODES_PER_META_CODE];
} HTreeGroup;
typedef struct {
int color_cache_size_;
VP8LColorCache color_cache_;

View File

@@ -52,13 +52,14 @@ static WEBP_INLINE uint32_t get_le32(const uint8_t* const data) {
}
// Validates the RIFF container (if detected) and skips over it.
// If a RIFF container is detected,
// Returns VP8_STATUS_BITSTREAM_ERROR for invalid header, and
// VP8_STATUS_OK otherwise.
// If a RIFF container is detected, returns:
// VP8_STATUS_BITSTREAM_ERROR for invalid header,
// VP8_STATUS_NOT_ENOUGH_DATA for truncated data if have_all_data is true,
// and VP8_STATUS_OK otherwise.
// In case there are not enough bytes (partial RIFF container), return 0 for
// *riff_size. Else return the RIFF size extracted from the header.
static VP8StatusCode ParseRIFF(const uint8_t** const data,
size_t* const data_size,
size_t* const data_size, int have_all_data,
size_t* const riff_size) {
assert(data != NULL);
assert(data_size != NULL);
@@ -77,6 +78,9 @@ static VP8StatusCode ParseRIFF(const uint8_t** const data,
if (size > MAX_CHUNK_PAYLOAD) {
return VP8_STATUS_BITSTREAM_ERROR;
}
if (have_all_data && (size > *data_size - CHUNK_HEADER_SIZE)) {
return VP8_STATUS_NOT_ENOUGH_DATA; // Truncated bitstream.
}
// We have a RIFF container. Skip it.
*riff_size = size;
*data += RIFF_HEADER_SIZE;
@@ -223,9 +227,8 @@ static VP8StatusCode ParseOptionalChunks(const uint8_t** const data,
// extracted from the VP8/VP8L chunk header.
// The flag '*is_lossless' is set to 1 in case of VP8L chunk / raw VP8L data.
static VP8StatusCode ParseVP8Header(const uint8_t** const data_ptr,
size_t* const data_size,
size_t riff_size,
size_t* const chunk_size,
size_t* const data_size, int have_all_data,
size_t riff_size, size_t* const chunk_size,
int* const is_lossless) {
const uint8_t* const data = *data_ptr;
const int is_vp8 = !memcmp(data, "VP8 ", TAG_SIZE);
@@ -248,6 +251,9 @@ static VP8StatusCode ParseVP8Header(const uint8_t** const data_ptr,
if ((riff_size >= minimal_size) && (size > riff_size - minimal_size)) {
return VP8_STATUS_BITSTREAM_ERROR; // Inconsistent size information.
}
if (have_all_data && (size > *data_size - CHUNK_HEADER_SIZE)) {
return VP8_STATUS_NOT_ENOUGH_DATA; // Truncated bitstream.
}
// Skip over CHUNK_HEADER_SIZE bytes from VP8/VP8L Header.
*chunk_size = size;
*data_ptr += CHUNK_HEADER_SIZE;
@@ -291,6 +297,7 @@ static VP8StatusCode ParseHeadersInternal(const uint8_t* data,
int found_vp8x = 0;
int animation_present = 0;
int fragments_present = 0;
const int have_all_data = (headers != NULL) ? headers->have_all_data : 0;
VP8StatusCode status;
WebPHeaderStructure hdrs;
@@ -303,7 +310,7 @@ static VP8StatusCode ParseHeadersInternal(const uint8_t* data,
hdrs.data_size = data_size;
// Skip over RIFF header.
status = ParseRIFF(&data, &data_size, &hdrs.riff_size);
status = ParseRIFF(&data, &data_size, have_all_data, &hdrs.riff_size);
if (status != VP8_STATUS_OK) {
return status; // Wrong RIFF header / insufficient data.
}
@@ -353,7 +360,7 @@ static VP8StatusCode ParseHeadersInternal(const uint8_t* data,
}
// Skip over VP8/VP8L header.
status = ParseVP8Header(&data, &data_size, hdrs.riff_size,
status = ParseVP8Header(&data, &data_size, have_all_data, hdrs.riff_size,
&hdrs.compressed_size, &hdrs.is_lossless);
if (status != VP8_STATUS_OK) {
goto ReturnWidthHeight; // Wrong VP8/VP8L chunk-header / insufficient data.
@@ -452,6 +459,7 @@ static VP8StatusCode DecodeInto(const uint8_t* const data, size_t data_size,
headers.data = data;
headers.data_size = data_size;
headers.have_all_data = 1;
status = WebPParseHeaders(&headers); // Process Pre-VP8 chunks.
if (status != VP8_STATUS_OK) {
return status;
@@ -512,6 +520,12 @@ static VP8StatusCode DecodeInto(const uint8_t* const data, size_t data_size,
if (status != VP8_STATUS_OK) {
WebPFreeDecBuffer(params->output);
}
#if WEBP_DECODER_ABI_VERSION > 0x0203
if (params->options != NULL && params->options->flip) {
status = WebPFlipBuffer(params->output);
}
#endif
return status;
}
@@ -776,9 +790,9 @@ int WebPIoInitFromOptions(const WebPDecoderOptions* const options,
h = options->crop_height;
x = options->crop_left;
y = options->crop_top;
if (!WebPIsRGBMode(src_colorspace)) { // only snap for YUV420 or YUV422
if (!WebPIsRGBMode(src_colorspace)) { // only snap for YUV420
x &= ~1;
y &= ~1; // TODO(later): only for YUV420, not YUV422.
y &= ~1;
}
if (x < 0 || y < 0 || w <= 0 || h <= 0 || x + w > W || y + h > H) {
return 0; // out of frame boundary error

View File

@@ -54,6 +54,7 @@ void WebPResetDecParams(WebPDecParams* const params);
typedef struct {
const uint8_t* data; // input buffer
size_t data_size; // input buffer size
int have_all_data; // true if all data is known to be available
size_t offset; // offset to main data chunk (VP8 or VP8L)
const uint8_t* alpha_data; // points to alpha chunk (if present)
size_t alpha_data_size; // alpha chunk size
@@ -93,10 +94,15 @@ int WebPIoInitFromOptions(const WebPDecoderOptions* const options,
// dimension / etc.). If *options is not NULL, also verify that the options'
// parameters are valid and apply them to the width/height dimensions of the
// output buffer. This takes cropping / scaling / rotation into account.
// Also incorporates the options->flip flag to flip the buffer parameters if
// needed.
VP8StatusCode WebPAllocateDecBuffer(int width, int height,
const WebPDecoderOptions* const options,
WebPDecBuffer* const buffer);
// Flip buffer vertically by negating the various strides.
VP8StatusCode WebPFlipBuffer(WebPDecBuffer* const buffer);
// Copy 'src' into 'dst' buffer, making sure 'dst' is not marked as owner of the
// memory (still held by 'src').
void WebPCopyDecBuffer(const WebPDecBuffer* const src,
@@ -105,8 +111,6 @@ void WebPCopyDecBuffer(const WebPDecBuffer* const src,
// Copy and transfer ownership from src to dst (beware of parameter order!)
void WebPGrabDecBuffer(WebPDecBuffer* const src, WebPDecBuffer* const dst);
//------------------------------------------------------------------------------
#ifdef __cplusplus

View File

@@ -1,4 +1,3 @@
AM_CPPFLAGS = -I$(top_srcdir)/src
lib_LTLIBRARIES = libwebpdemux.la
libwebpdemux_la_SOURCES =
@@ -10,6 +9,6 @@ libwebpdemuxinclude_HEADERS += ../webp/mux_types.h
libwebpdemuxinclude_HEADERS += ../webp/types.h
libwebpdemux_la_LIBADD = ../libwebp.la
libwebpdemux_la_LDFLAGS = -no-undefined -version-info 1:0:0
libwebpdemux_la_LDFLAGS = -no-undefined -version-info 1:2:0
libwebpdemuxincludedir = $(includedir)/webp
pkgconfig_DATA = libwebpdemux.pc

View File

@@ -11,7 +11,7 @@
//
#ifdef HAVE_CONFIG_H
#include "config.h"
#include "../webp/config.h"
#endif
#include <assert.h>
@@ -25,7 +25,7 @@
#define DMUX_MAJ_VERSION 0
#define DMUX_MIN_VERSION 2
#define DMUX_REV_VERSION 0
#define DMUX_REV_VERSION 2
typedef struct {
size_t start_; // start location of the data
@@ -289,7 +289,7 @@ static ParseStatus NewFrame(const MemBuffer* const mem,
if (actual_size < min_size) return PARSE_ERROR;
if (MemDataSize(mem) < min_size) return PARSE_NEED_MORE_DATA;
*frame = (Frame*)calloc(1, sizeof(**frame));
*frame = (Frame*)WebPSafeCalloc(1ULL, sizeof(**frame));
return (*frame == NULL) ? PARSE_ERROR : PARSE_OK;
}
@@ -317,7 +317,7 @@ static ParseStatus ParseAnimationFrame(
(bits & 1) ? WEBP_MUX_DISPOSE_BACKGROUND : WEBP_MUX_DISPOSE_NONE;
frame->blend_method_ = (bits & 2) ? WEBP_MUX_NO_BLEND : WEBP_MUX_BLEND;
if (frame->width_ * (uint64_t)frame->height_ >= MAX_IMAGE_AREA) {
free(frame);
WebPSafeFree(frame);
return PARSE_ERROR;
}
@@ -333,7 +333,7 @@ static ParseStatus ParseAnimationFrame(
}
}
if (!added_frame) free(frame);
if (!added_frame) WebPSafeFree(frame);
return status;
}
@@ -368,7 +368,7 @@ static ParseStatus ParseFragment(WebPDemuxer* const dmux,
}
}
if (!added_fragment) free(frame);
if (!added_fragment) WebPSafeFree(frame);
return status;
}
#endif // WEBP_EXPERIMENTAL_FEATURES
@@ -379,7 +379,7 @@ static ParseStatus ParseFragment(WebPDemuxer* const dmux,
// Returns true on success, false otherwise.
static int StoreChunk(WebPDemuxer* const dmux,
size_t start_offset, uint32_t size) {
Chunk* const chunk = (Chunk*)calloc(1, sizeof(*chunk));
Chunk* const chunk = (Chunk*)WebPSafeCalloc(1ULL, sizeof(*chunk));
if (chunk == NULL) return 0;
chunk->data_.offset_ = start_offset;
@@ -427,7 +427,7 @@ static ParseStatus ParseSingleImage(WebPDemuxer* const dmux) {
if (SizeIsInvalid(mem, min_size)) return PARSE_ERROR;
if (MemDataSize(mem) < min_size) return PARSE_NEED_MORE_DATA;
frame = (Frame*)calloc(1, sizeof(*frame));
frame = (Frame*)WebPSafeCalloc(1ULL, sizeof(*frame));
if (frame == NULL) return PARSE_ERROR;
// For the single image case we allow parsing of a partial frame, but we need
@@ -458,7 +458,7 @@ static ParseStatus ParseSingleImage(WebPDemuxer* const dmux) {
}
}
if (!image_added) free(frame);
if (!image_added) WebPSafeFree(frame);
return status;
}
@@ -729,7 +729,7 @@ WebPDemuxer* WebPDemuxInternal(const WebPData* data, int allow_partial,
partial = (mem.buf_size_ < mem.riff_end_);
if (!allow_partial && partial) return NULL;
dmux = (WebPDemuxer*)calloc(1, sizeof(*dmux));
dmux = (WebPDemuxer*)WebPSafeCalloc(1ULL, sizeof(*dmux));
if (dmux == NULL) return NULL;
InitDemux(dmux, &mem);
@@ -761,14 +761,14 @@ void WebPDemuxDelete(WebPDemuxer* dmux) {
for (f = dmux->frames_; f != NULL;) {
Frame* const cur_frame = f;
f = f->next_;
free(cur_frame);
WebPSafeFree(cur_frame);
}
for (c = dmux->chunks_; c != NULL;) {
Chunk* const cur_chunk = c;
c = c->next_;
free(cur_chunk);
WebPSafeFree(cur_chunk);
}
free(dmux);
WebPSafeFree(dmux);
}
// -----------------------------------------------------------------------------

View File

@@ -1,5 +1,5 @@
AM_CPPFLAGS = -I$(top_srcdir)/src
noinst_LTLIBRARIES = libwebpdsp.la
noinst_LTLIBRARIES = libwebpdsp.la libwebpdsp_avx2.la
noinst_LTLIBRARIES += libwebpdsp_sse2.la libwebpdspdecode_sse2.la
if BUILD_LIBWEBPDECODER
noinst_LTLIBRARIES += libwebpdspdecode.la
@@ -9,23 +9,49 @@ common_HEADERS = ../webp/types.h
commondir = $(includedir)/webp
COMMON_SOURCES =
COMMON_SOURCES += alpha_processing.c
COMMON_SOURCES += cpu.c
COMMON_SOURCES += dec.c
COMMON_SOURCES += dec_clip_tables.c
COMMON_SOURCES += dec_mips32.c
COMMON_SOURCES += dec_neon.c
COMMON_SOURCES += dec_sse2.c
COMMON_SOURCES += dsp.h
COMMON_SOURCES += lossless.c
COMMON_SOURCES += lossless.h
COMMON_SOURCES += lossless_mips32.c
COMMON_SOURCES += lossless_neon.c
COMMON_SOURCES += neon.h
COMMON_SOURCES += upsampling.c
COMMON_SOURCES += upsampling_neon.c
COMMON_SOURCES += upsampling_sse2.c
COMMON_SOURCES += yuv.c
COMMON_SOURCES += yuv.h
COMMON_SOURCES += yuv_mips32.c
ENC_SOURCES =
ENC_SOURCES += enc.c
ENC_SOURCES += enc_mips32.c
ENC_SOURCES += enc_neon.c
ENC_SOURCES += enc_sse2.c
libwebpdsp_avx2_la_SOURCES =
libwebpdsp_avx2_la_SOURCES += enc_avx2.c
libwebpdsp_avx2_la_CPPFLAGS = $(libwebpdsp_la_CPPFLAGS)
libwebpdsp_avx2_la_CFLAGS = $(AM_CFLAGS) $(AVX2_FLAGS)
libwebpdspdecode_sse2_la_SOURCES =
libwebpdspdecode_sse2_la_SOURCES += alpha_processing_sse2.c
libwebpdspdecode_sse2_la_SOURCES += dec_sse2.c
libwebpdspdecode_sse2_la_SOURCES += lossless_sse2.c
libwebpdspdecode_sse2_la_SOURCES += upsampling_sse2.c
libwebpdspdecode_sse2_la_SOURCES += yuv_sse2.c
libwebpdspdecode_sse2_la_SOURCES += yuv_tables_sse2.h
libwebpdspdecode_sse2_la_CPPFLAGS = $(libwebpdsp_sse2_la_CPPFLAGS)
libwebpdspdecode_sse2_la_CFLAGS = $(libwebpdsp_sse2_la_CFLAGS)
libwebpdsp_sse2_la_SOURCES =
libwebpdsp_sse2_la_SOURCES += enc_sse2.c
libwebpdsp_sse2_la_CPPFLAGS = $(libwebpdsp_la_CPPFLAGS)
libwebpdsp_sse2_la_CFLAGS = $(AM_CFLAGS) $(SSE2_FLAGS)
libwebpdsp_sse2_la_LIBADD = libwebpdspdecode_sse2.la
libwebpdsp_la_SOURCES = $(COMMON_SOURCES) $(ENC_SOURCES)
@@ -33,12 +59,14 @@ noinst_HEADERS =
noinst_HEADERS += ../dec/decode_vp8.h
noinst_HEADERS += ../webp/decode.h
libwebpdsp_la_LDFLAGS = -lm
libwebpdsp_la_CPPFLAGS = $(USE_EXPERIMENTAL_CODE) $(USE_SWAP_16BIT_CSP)
libwebpdsp_la_LDFLAGS = -lm
libwebpdsp_la_LIBADD = libwebpdsp_avx2.la libwebpdsp_sse2.la
if BUILD_LIBWEBPDECODER
libwebpdspdecode_la_SOURCES = $(COMMON_SOURCES)
libwebpdspdecode_la_LDFLAGS = $(libwebpdsp_la_LDFLAGS)
libwebpdspdecode_la_CPPFLAGS = $(libwebpdsp_la_CPPFLAGS)
libwebpdspdecode_la_LDFLAGS = $(libwebpdsp_la_LDFLAGS)
libwebpdspdecode_la_LIBADD = libwebpdspdecode_sse2.la
endif

View File

@@ -12,7 +12,7 @@
// Author: Skal (pascal.massimino@gmail.com)
#include <assert.h>
#include "./alpha_processing.h"
#include "./dsp.h"
// Tables can be faster on some platform but incur some extra binary size (~2k).
// #define USE_TABLES_FOR_ALPHA_MULT
@@ -134,7 +134,7 @@ static WEBP_INLINE uint32_t GetScale(uint32_t a, int inverse) {
#endif // USE_TABLES_FOR_ALPHA_MULT
void WebPMultARGBRow(uint32_t* const ptr, int width, int inverse) {
static void MultARGBRow(uint32_t* const ptr, int width, int inverse) {
int x;
for (x = 0; x < width; ++x) {
const uint32_t argb = ptr[x];
@@ -154,17 +154,8 @@ void WebPMultARGBRow(uint32_t* const ptr, int width, int inverse) {
}
}
void WebPMultARGBRows(uint8_t* ptr, int stride, int width, int num_rows,
int inverse) {
int n;
for (n = 0; n < num_rows; ++n) {
WebPMultARGBRow((uint32_t*)ptr, width, inverse);
ptr += stride;
}
}
void WebPMultRow(uint8_t* const ptr, const uint8_t* const alpha,
int width, int inverse) {
static void MultRow(uint8_t* const ptr, const uint8_t* const alpha,
int width, int inverse) {
int x;
for (x = 0; x < width; ++x) {
const uint32_t a = alpha[x];
@@ -179,6 +170,26 @@ void WebPMultRow(uint8_t* const ptr, const uint8_t* const alpha,
}
}
#undef KINV_255
#undef HALF
#undef MFIX
void (*WebPMultARGBRow)(uint32_t* const ptr, int width, int inverse);
void (*WebPMultRow)(uint8_t* const ptr, const uint8_t* const alpha,
int width, int inverse);
//------------------------------------------------------------------------------
// Generic per-plane calls
void WebPMultARGBRows(uint8_t* ptr, int stride, int width, int num_rows,
int inverse) {
int n;
for (n = 0; n < num_rows; ++n) {
WebPMultARGBRow((uint32_t*)ptr, width, inverse);
ptr += stride;
}
}
void WebPMultRows(uint8_t* ptr, int stride,
const uint8_t* alpha, int alpha_stride,
int width, int num_rows, int inverse) {
@@ -190,7 +201,135 @@ void WebPMultRows(uint8_t* ptr, int stride,
}
}
#undef KINV_255
#undef HALF
#undef MFIX
//------------------------------------------------------------------------------
// Premultiplied modes
// non dithered-modes
// (x * a * 32897) >> 23 is bit-wise equivalent to (int)(x * a / 255.)
// for all 8bit x or a. For bit-wise equivalence to (int)(x * a / 255. + .5),
// one can use instead: (x * a * 65793 + (1 << 23)) >> 24
#if 1 // (int)(x * a / 255.)
#define MULTIPLIER(a) ((a) * 32897U)
#define PREMULTIPLY(x, m) (((x) * (m)) >> 23)
#else // (int)(x * a / 255. + .5)
#define MULTIPLIER(a) ((a) * 65793U)
#define PREMULTIPLY(x, m) (((x) * (m) + (1U << 23)) >> 24)
#endif
static void ApplyAlphaMultiply(uint8_t* rgba, int alpha_first,
int w, int h, int stride) {
while (h-- > 0) {
uint8_t* const rgb = rgba + (alpha_first ? 1 : 0);
const uint8_t* const alpha = rgba + (alpha_first ? 0 : 3);
int i;
for (i = 0; i < w; ++i) {
const uint32_t a = alpha[4 * i];
if (a != 0xff) {
const uint32_t mult = MULTIPLIER(a);
rgb[4 * i + 0] = PREMULTIPLY(rgb[4 * i + 0], mult);
rgb[4 * i + 1] = PREMULTIPLY(rgb[4 * i + 1], mult);
rgb[4 * i + 2] = PREMULTIPLY(rgb[4 * i + 2], mult);
}
}
rgba += stride;
}
}
#undef MULTIPLIER
#undef PREMULTIPLY
// rgbA4444
#define MULTIPLIER(a) ((a) * 0x1111) // 0x1111 ~= (1 << 16) / 15
static WEBP_INLINE uint8_t dither_hi(uint8_t x) {
return (x & 0xf0) | (x >> 4);
}
static WEBP_INLINE uint8_t dither_lo(uint8_t x) {
return (x & 0x0f) | (x << 4);
}
static WEBP_INLINE uint8_t multiply(uint8_t x, uint32_t m) {
return (x * m) >> 16;
}
static WEBP_INLINE void ApplyAlphaMultiply4444(uint8_t* rgba4444,
int w, int h, int stride,
int rg_byte_pos /* 0 or 1 */) {
while (h-- > 0) {
int i;
for (i = 0; i < w; ++i) {
const uint32_t rg = rgba4444[2 * i + rg_byte_pos];
const uint32_t ba = rgba4444[2 * i + (rg_byte_pos ^ 1)];
const uint8_t a = ba & 0x0f;
const uint32_t mult = MULTIPLIER(a);
const uint8_t r = multiply(dither_hi(rg), mult);
const uint8_t g = multiply(dither_lo(rg), mult);
const uint8_t b = multiply(dither_hi(ba), mult);
rgba4444[2 * i + rg_byte_pos] = (r & 0xf0) | ((g >> 4) & 0x0f);
rgba4444[2 * i + (rg_byte_pos ^ 1)] = (b & 0xf0) | a;
}
rgba4444 += stride;
}
}
#undef MULTIPLIER
static void ApplyAlphaMultiply_16b(uint8_t* rgba4444,
int w, int h, int stride) {
#ifdef WEBP_SWAP_16BIT_CSP
ApplyAlphaMultiply4444(rgba4444, w, h, stride, 1);
#else
ApplyAlphaMultiply4444(rgba4444, w, h, stride, 0);
#endif
}
static int ExtractAlpha(const uint8_t* argb, int argb_stride,
int width, int height,
uint8_t* alpha, int alpha_stride) {
uint8_t alpha_mask = 0xff;
int i, j;
for (j = 0; j < height; ++j) {
for (i = 0; i < width; ++i) {
const uint8_t alpha_value = argb[4 * i];
alpha[i] = alpha_value;
alpha_mask &= alpha_value;
}
argb += argb_stride;
alpha += alpha_stride;
}
return (alpha_mask == 0xff);
}
void (*WebPApplyAlphaMultiply)(uint8_t*, int, int, int, int);
void (*WebPApplyAlphaMultiply4444)(uint8_t*, int, int, int);
int (*WebPExtractAlpha)(const uint8_t*, int, int, int, uint8_t*, int);
//------------------------------------------------------------------------------
// Init function
extern void WebPInitAlphaProcessingSSE2(void);
static volatile VP8CPUInfo alpha_processing_last_cpuinfo_used =
(VP8CPUInfo)&alpha_processing_last_cpuinfo_used;
void WebPInitAlphaProcessing(void) {
if (alpha_processing_last_cpuinfo_used == VP8GetCPUInfo) return;
WebPMultARGBRow = MultARGBRow;
WebPMultRow = MultRow;
WebPApplyAlphaMultiply = ApplyAlphaMultiply;
WebPApplyAlphaMultiply4444 = ApplyAlphaMultiply_16b;
WebPExtractAlpha = ExtractAlpha;
// If defined, use CPUInfo() to overwrite some pointers with faster versions.
if (VP8GetCPUInfo != NULL) {
#if defined(WEBP_USE_SSE2)
if (VP8GetCPUInfo(kSSE2)) {
WebPInitAlphaProcessingSSE2();
}
#endif
}
alpha_processing_last_cpuinfo_used = VP8GetCPUInfo;
}

View File

@@ -0,0 +1,77 @@
// Copyright 2014 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// Utilities for processing transparent channel.
//
// Author: Skal (pascal.massimino@gmail.com)
#include "./dsp.h"
#if defined(WEBP_USE_SSE2)
#include <emmintrin.h>
//------------------------------------------------------------------------------
static int ExtractAlpha(const uint8_t* argb, int argb_stride,
int width, int height,
uint8_t* alpha, int alpha_stride) {
// alpha_and stores an 'and' operation of all the alpha[] values. The final
// value is not 0xff if any of the alpha[] is not equal to 0xff.
uint32_t alpha_and = 0xff;
int i, j;
const __m128i a_mask = _mm_set1_epi32(0xffu); // to preserve alpha
const __m128i all_0xff = _mm_set_epi32(0, 0, ~0u, ~0u);
__m128i all_alphas = all_0xff;
// We must be able to access 3 extra bytes after the last written byte
// 'src[4 * width - 4]', because we don't know if alpha is the first or the
// last byte of the quadruplet.
const int limit = (width - 1) & ~7;
for (j = 0; j < height; ++j) {
const __m128i* src = (const __m128i*)argb;
for (i = 0; i < limit; i += 8) {
// load 32 argb bytes
const __m128i a0 = _mm_loadu_si128(src + 0);
const __m128i a1 = _mm_loadu_si128(src + 1);
const __m128i b0 = _mm_and_si128(a0, a_mask);
const __m128i b1 = _mm_and_si128(a1, a_mask);
const __m128i c0 = _mm_packs_epi32(b0, b1);
const __m128i d0 = _mm_packus_epi16(c0, c0);
// store
_mm_storel_epi64((__m128i*)&alpha[i], d0);
// accumulate eight alpha 'and' in parallel
all_alphas = _mm_and_si128(all_alphas, d0);
src += 2;
}
for (; i < width; ++i) {
const uint32_t alpha_value = argb[4 * i];
alpha[i] = alpha_value;
alpha_and &= alpha_value;
}
argb += argb_stride;
alpha += alpha_stride;
}
// Combine the eight alpha 'and' into a 8-bit mask.
alpha_and &= _mm_movemask_epi8(_mm_cmpeq_epi8(all_alphas, all_0xff));
return (alpha_and == 0xff);
}
#endif // WEBP_USE_SSE2
//------------------------------------------------------------------------------
// Init function
extern void WebPInitAlphaProcessingSSE2(void);
void WebPInitAlphaProcessingSSE2(void) {
#if defined(WEBP_USE_SSE2)
WebPExtractAlpha = ExtractAlpha;
#endif
}

View File

@@ -29,19 +29,54 @@ static WEBP_INLINE void GetCPUInfo(int cpu_info[4], int info_type) {
"cpuid\n"
"xchg %%edi, %%ebx\n"
: "=a"(cpu_info[0]), "=D"(cpu_info[1]), "=c"(cpu_info[2]), "=d"(cpu_info[3])
: "a"(info_type));
: "a"(info_type), "c"(0));
}
#elif defined(__i386__) || defined(__x86_64__)
static WEBP_INLINE void GetCPUInfo(int cpu_info[4], int info_type) {
__asm__ volatile (
"cpuid\n"
: "=a"(cpu_info[0]), "=b"(cpu_info[1]), "=c"(cpu_info[2]), "=d"(cpu_info[3])
: "a"(info_type));
: "a"(info_type), "c"(0));
}
#elif (defined(_M_X64) || defined(_M_IX86)) && \
defined(_MSC_FULL_VER) && _MSC_FULL_VER >= 150030729 // >= VS2008 SP1
#include <intrin.h>
#define GetCPUInfo(info, type) __cpuidex(info, type, 0) // set ecx=0
#elif defined(WEBP_MSC_SSE2)
#define GetCPUInfo __cpuid
#endif
// NaCl has no support for xgetbv or the raw opcode.
#if !defined(__native_client__) && (defined(__i386__) || defined(__x86_64__))
static WEBP_INLINE uint64_t xgetbv(void) {
const uint32_t ecx = 0;
uint32_t eax, edx;
// Use the raw opcode for xgetbv for compatibility with older toolchains.
__asm__ volatile (
".byte 0x0f, 0x01, 0xd0\n"
: "=a"(eax), "=d"(edx) : "c" (ecx));
return ((uint64_t)edx << 32) | eax;
}
#elif (defined(_M_X64) || defined(_M_IX86)) && \
defined(_MSC_FULL_VER) && _MSC_FULL_VER >= 160040219 // >= VS2010 SP1
#include <immintrin.h>
#define xgetbv() _xgetbv(0)
#elif defined(_MSC_VER) && defined(_M_IX86)
static WEBP_INLINE uint64_t xgetbv(void) {
uint32_t eax_, edx_;
__asm {
xor ecx, ecx // ecx = 0
// Use the raw opcode for xgetbv for compatibility with older toolchains.
__asm _emit 0x0f __asm _emit 0x01 __asm _emit 0xd0
mov eax_, eax
mov edx_, edx
}
return ((uint64_t)edx_ << 32) | eax_;
}
#else
#define xgetbv() 0U // no AVX for older x64 or unrecognized toolchains.
#endif
#if defined(__i386__) || defined(__x86_64__) || defined(WEBP_MSC_SSE2)
static int x86CPUInfo(CPUFeature feature) {
int cpu_info[4];
@@ -52,10 +87,23 @@ static int x86CPUInfo(CPUFeature feature) {
if (feature == kSSE3) {
return 0 != (cpu_info[2] & 0x00000001);
}
if (feature == kAVX) {
// bits 27 (OSXSAVE) & 28 (256-bit AVX)
if ((cpu_info[2] & 0x18000000) == 0x18000000) {
// XMM state and YMM state enabled by the OS.
return (xgetbv() & 0x6) == 0x6;
}
}
if (feature == kAVX2) {
if (x86CPUInfo(kAVX)) {
GetCPUInfo(cpu_info, 7);
return ((cpu_info[1] & 0x00000020) == 0x00000020);
}
}
return 0;
}
VP8CPUInfo VP8GetCPUInfo = x86CPUInfo;
#elif defined(WEBP_ANDROID_NEON)
#elif defined(WEBP_ANDROID_NEON) // NB: needs to be before generic NEON test.
static int AndroidCPUInfo(CPUFeature feature) {
const AndroidCpuFamily cpu_family = android_getCpuFamily();
const uint64_t cpu_features = android_getCpuFeatures();
@@ -66,7 +114,7 @@ static int AndroidCPUInfo(CPUFeature feature) {
return 0;
}
VP8CPUInfo VP8GetCPUInfo = AndroidCPUInfo;
#elif defined(__ARM_NEON__)
#elif defined(WEBP_USE_NEON)
// define a dummy function to enable turning off NEON at runtime by setting
// VP8DecGetCPUInfo = NULL
static int armCPUInfo(CPUFeature feature) {
@@ -74,6 +122,12 @@ static int armCPUInfo(CPUFeature feature) {
return 1;
}
VP8CPUInfo VP8GetCPUInfo = armCPUInfo;
#elif defined(WEBP_USE_MIPS32)
static int mipsCPUInfo(CPUFeature feature) {
(void)feature;
return 1;
}
VP8CPUInfo VP8GetCPUInfo = mipsCPUInfo;
#else
VP8CPUInfo VP8GetCPUInfo = NULL;
#endif

View File

@@ -15,37 +15,6 @@
#include "../dec/vp8i.h"
//------------------------------------------------------------------------------
// run-time tables (~4k)
static uint8_t abs0[255 + 255 + 1]; // abs(i)
static uint8_t abs1[255 + 255 + 1]; // abs(i)>>1
static int8_t sclip1[1020 + 1020 + 1]; // clips [-1020, 1020] to [-128, 127]
static int8_t sclip2[112 + 112 + 1]; // clips [-112, 112] to [-16, 15]
static uint8_t clip1[255 + 510 + 1]; // clips [-255,510] to [0,255]
// We declare this variable 'volatile' to prevent instruction reordering
// and make sure it's set to true _last_ (so as to be thread-safe)
static volatile int tables_ok = 0;
static void DspInitTables(void) {
if (!tables_ok) {
int i;
for (i = -255; i <= 255; ++i) {
abs0[255 + i] = (i < 0) ? -i : i;
abs1[255 + i] = abs0[255 + i] >> 1;
}
for (i = -1020; i <= 1020; ++i) {
sclip1[1020 + i] = (i < -128) ? -128 : (i > 127) ? 127 : i;
}
for (i = -112; i <= 112; ++i) {
sclip2[112 + i] = (i < -16) ? -16 : (i > 15) ? 15 : i;
}
for (i = -255; i <= 255 + 255; ++i) {
clip1[255 + i] = (i < 0) ? 0 : (i > 255) ? 255 : i;
}
tables_ok = 1;
}
}
static WEBP_INLINE uint8_t clip_8b(int v) {
return (!(v & ~0xff)) ? v : (v < 0) ? 0 : 255;
@@ -146,10 +115,10 @@ static void TransformDC(const int16_t *in, uint8_t* dst) {
}
static void TransformDCUV(const int16_t* in, uint8_t* dst) {
if (in[0 * 16]) TransformDC(in + 0 * 16, dst);
if (in[1 * 16]) TransformDC(in + 1 * 16, dst + 4);
if (in[2 * 16]) TransformDC(in + 2 * 16, dst + 4 * BPS);
if (in[3 * 16]) TransformDC(in + 3 * 16, dst + 4 * BPS + 4);
if (in[0 * 16]) VP8TransformDC(in + 0 * 16, dst);
if (in[1 * 16]) VP8TransformDC(in + 1 * 16, dst + 4);
if (in[2 * 16]) VP8TransformDC(in + 2 * 16, dst + 4 * BPS);
if (in[3 * 16]) VP8TransformDC(in + 3 * 16, dst + 4 * BPS + 4);
}
#undef STORE
@@ -184,7 +153,7 @@ static void TransformWHT(const int16_t* in, int16_t* out) {
}
}
void (*VP8TransformWHT)(const int16_t* in, int16_t* out) = TransformWHT;
void (*VP8TransformWHT)(const int16_t* in, int16_t* out);
//------------------------------------------------------------------------------
// Intra predictions
@@ -193,7 +162,7 @@ void (*VP8TransformWHT)(const int16_t* in, int16_t* out) = TransformWHT;
static WEBP_INLINE void TrueMotion(uint8_t *dst, int size) {
const uint8_t* top = dst - BPS;
const uint8_t* const clip0 = clip1 + 255 - top[-1];
const uint8_t* const clip0 = VP8kclip1 - top[-1];
int y;
for (y = 0; y < size; ++y) {
const uint8_t* const clip = clip0 + dst[-1];
@@ -448,14 +417,9 @@ static void HE8uv(uint8_t *dst) { // horizontal
// helper for chroma-DC predictions
static WEBP_INLINE void Put8x8uv(uint8_t value, uint8_t* dst) {
int j;
#ifndef WEBP_REFERENCE_IMPLEMENTATION
const uint64_t v = (uint64_t)value * 0x0101010101010101ULL;
for (j = 0; j < 8; ++j) {
*(uint64_t*)(dst + j * BPS) = v;
memset(dst + j * BPS, value, 8);
}
#else
for (j = 0; j < 8; ++j) memset(dst + j * BPS, value, 8);
#endif
}
static void DC8uv(uint8_t *dst) { // DC
@@ -512,61 +476,62 @@ const VP8PredFunc VP8PredChroma8[NUM_B_DC_MODES] = {
// 4 pixels in, 2 pixels out
static WEBP_INLINE void do_filter2(uint8_t* p, int step) {
const int p1 = p[-2*step], p0 = p[-step], q0 = p[0], q1 = p[step];
const int a = 3 * (q0 - p0) + sclip1[1020 + p1 - q1];
const int a1 = sclip2[112 + ((a + 4) >> 3)];
const int a2 = sclip2[112 + ((a + 3) >> 3)];
p[-step] = clip1[255 + p0 + a2];
p[ 0] = clip1[255 + q0 - a1];
const int a = 3 * (q0 - p0) + VP8ksclip1[p1 - q1]; // in [-893,892]
const int a1 = VP8ksclip2[(a + 4) >> 3]; // in [-16,15]
const int a2 = VP8ksclip2[(a + 3) >> 3];
p[-step] = VP8kclip1[p0 + a2];
p[ 0] = VP8kclip1[q0 - a1];
}
// 4 pixels in, 4 pixels out
static WEBP_INLINE void do_filter4(uint8_t* p, int step) {
const int p1 = p[-2*step], p0 = p[-step], q0 = p[0], q1 = p[step];
const int a = 3 * (q0 - p0);
const int a1 = sclip2[112 + ((a + 4) >> 3)];
const int a2 = sclip2[112 + ((a + 3) >> 3)];
const int a1 = VP8ksclip2[(a + 4) >> 3];
const int a2 = VP8ksclip2[(a + 3) >> 3];
const int a3 = (a1 + 1) >> 1;
p[-2*step] = clip1[255 + p1 + a3];
p[- step] = clip1[255 + p0 + a2];
p[ 0] = clip1[255 + q0 - a1];
p[ step] = clip1[255 + q1 - a3];
p[-2*step] = VP8kclip1[p1 + a3];
p[- step] = VP8kclip1[p0 + a2];
p[ 0] = VP8kclip1[q0 - a1];
p[ step] = VP8kclip1[q1 - a3];
}
// 6 pixels in, 6 pixels out
static WEBP_INLINE void do_filter6(uint8_t* p, int step) {
const int p2 = p[-3*step], p1 = p[-2*step], p0 = p[-step];
const int q0 = p[0], q1 = p[step], q2 = p[2*step];
const int a = sclip1[1020 + 3 * (q0 - p0) + sclip1[1020 + p1 - q1]];
const int a = VP8ksclip1[3 * (q0 - p0) + VP8ksclip1[p1 - q1]];
// a is in [-128,127], a1 in [-27,27], a2 in [-18,18] and a3 in [-9,9]
const int a1 = (27 * a + 63) >> 7; // eq. to ((3 * a + 7) * 9) >> 7
const int a2 = (18 * a + 63) >> 7; // eq. to ((2 * a + 7) * 9) >> 7
const int a3 = (9 * a + 63) >> 7; // eq. to ((1 * a + 7) * 9) >> 7
p[-3*step] = clip1[255 + p2 + a3];
p[-2*step] = clip1[255 + p1 + a2];
p[- step] = clip1[255 + p0 + a1];
p[ 0] = clip1[255 + q0 - a1];
p[ step] = clip1[255 + q1 - a2];
p[ 2*step] = clip1[255 + q2 - a3];
p[-3*step] = VP8kclip1[p2 + a3];
p[-2*step] = VP8kclip1[p1 + a2];
p[- step] = VP8kclip1[p0 + a1];
p[ 0] = VP8kclip1[q0 - a1];
p[ step] = VP8kclip1[q1 - a2];
p[ 2*step] = VP8kclip1[q2 - a3];
}
static WEBP_INLINE int hev(const uint8_t* p, int step, int thresh) {
const int p1 = p[-2*step], p0 = p[-step], q0 = p[0], q1 = p[step];
return (abs0[255 + p1 - p0] > thresh) || (abs0[255 + q1 - q0] > thresh);
return (VP8kabs0[p1 - p0] > thresh) || (VP8kabs0[q1 - q0] > thresh);
}
static WEBP_INLINE int needs_filter(const uint8_t* p, int step, int thresh) {
const int p1 = p[-2*step], p0 = p[-step], q0 = p[0], q1 = p[step];
return (2 * abs0[255 + p0 - q0] + abs1[255 + p1 - q1]) <= thresh;
static WEBP_INLINE int needs_filter(const uint8_t* p, int step, int t) {
const int p1 = p[-2 * step], p0 = p[-step], q0 = p[0], q1 = p[step];
return ((4 * VP8kabs0[p0 - q0] + VP8kabs0[p1 - q1]) <= t);
}
static WEBP_INLINE int needs_filter2(const uint8_t* p,
int step, int t, int it) {
const int p3 = p[-4*step], p2 = p[-3*step], p1 = p[-2*step], p0 = p[-step];
const int q0 = p[0], q1 = p[step], q2 = p[2*step], q3 = p[3*step];
if ((2 * abs0[255 + p0 - q0] + abs1[255 + p1 - q1]) > t)
return 0;
return abs0[255 + p3 - p2] <= it && abs0[255 + p2 - p1] <= it &&
abs0[255 + p1 - p0] <= it && abs0[255 + q3 - q2] <= it &&
abs0[255 + q2 - q1] <= it && abs0[255 + q1 - q0] <= it;
const int p3 = p[-4 * step], p2 = p[-3 * step], p1 = p[-2 * step];
const int p0 = p[-step], q0 = p[0];
const int q1 = p[step], q2 = p[2 * step], q3 = p[3 * step];
if ((4 * VP8kabs0[p0 - q0] + VP8kabs0[p1 - q1]) > t) return 0;
return VP8kabs0[p3 - p2] <= it && VP8kabs0[p2 - p1] <= it &&
VP8kabs0[p1 - p0] <= it && VP8kabs0[q3 - q2] <= it &&
VP8kabs0[q2 - q1] <= it && VP8kabs0[q1 - q0] <= it;
}
//------------------------------------------------------------------------------
@@ -574,8 +539,9 @@ static WEBP_INLINE int needs_filter2(const uint8_t* p,
static void SimpleVFilter16(uint8_t* p, int stride, int thresh) {
int i;
const int thresh2 = 2 * thresh + 1;
for (i = 0; i < 16; ++i) {
if (needs_filter(p + i, stride, thresh)) {
if (needs_filter(p + i, stride, thresh2)) {
do_filter2(p + i, stride);
}
}
@@ -583,8 +549,9 @@ static void SimpleVFilter16(uint8_t* p, int stride, int thresh) {
static void SimpleHFilter16(uint8_t* p, int stride, int thresh) {
int i;
const int thresh2 = 2 * thresh + 1;
for (i = 0; i < 16; ++i) {
if (needs_filter(p + i * stride, 1, thresh)) {
if (needs_filter(p + i * stride, 1, thresh2)) {
do_filter2(p + i * stride, 1);
}
}
@@ -612,8 +579,9 @@ static void SimpleHFilter16i(uint8_t* p, int stride, int thresh) {
static WEBP_INLINE void FilterLoop26(uint8_t* p,
int hstride, int vstride, int size,
int thresh, int ithresh, int hev_thresh) {
const int thresh2 = 2 * thresh + 1;
while (size-- > 0) {
if (needs_filter2(p, hstride, thresh, ithresh)) {
if (needs_filter2(p, hstride, thresh2, ithresh)) {
if (hev(p, hstride, hev_thresh)) {
do_filter2(p, hstride);
} else {
@@ -627,8 +595,9 @@ static WEBP_INLINE void FilterLoop26(uint8_t* p,
static WEBP_INLINE void FilterLoop24(uint8_t* p,
int hstride, int vstride, int size,
int thresh, int ithresh, int hev_thresh) {
const int thresh2 = 2 * thresh + 1;
while (size-- > 0) {
if (needs_filter2(p, hstride, thresh, ithresh)) {
if (needs_filter2(p, hstride, thresh2, ithresh)) {
if (hev(p, hstride, hev_thresh)) {
do_filter2(p, hstride);
} else {
@@ -717,10 +686,17 @@ VP8SimpleFilterFunc VP8SimpleHFilter16i;
extern void VP8DspInitSSE2(void);
extern void VP8DspInitNEON(void);
extern void VP8DspInitMIPS32(void);
static volatile VP8CPUInfo dec_last_cpuinfo_used =
(VP8CPUInfo)&dec_last_cpuinfo_used;
void VP8DspInit(void) {
DspInitTables();
if (dec_last_cpuinfo_used == VP8GetCPUInfo) return;
VP8InitClipTables();
VP8TransformWHT = TransformWHT;
VP8Transform = TransformTwo;
VP8TransformUV = TransformUV;
VP8TransformDC = TransformDC;
@@ -741,7 +717,7 @@ void VP8DspInit(void) {
VP8SimpleHFilter16i = SimpleHFilter16i;
// If defined, use CPUInfo() to overwrite some pointers with faster versions.
if (VP8GetCPUInfo) {
if (VP8GetCPUInfo != NULL) {
#if defined(WEBP_USE_SSE2)
if (VP8GetCPUInfo(kSSE2)) {
VP8DspInitSSE2();
@@ -750,7 +726,11 @@ void VP8DspInit(void) {
if (VP8GetCPUInfo(kNEON)) {
VP8DspInitNEON();
}
#elif defined(WEBP_USE_MIPS32)
if (VP8GetCPUInfo(kMIPS32)) {
VP8DspInitMIPS32();
}
#endif
}
dec_last_cpuinfo_used = VP8GetCPUInfo;
}

366
src/dsp/dec_clip_tables.c Normal file
View File

@@ -0,0 +1,366 @@
// Copyright 2014 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// Clipping tables for filtering
//
// Author: Skal (pascal.massimino@gmail.com)
#include "./dsp.h"
#define USE_STATIC_TABLES // undefine to have run-time table initialization
#ifdef USE_STATIC_TABLES
static const uint8_t abs0[255 + 255 + 1] = {
0xff, 0xfe, 0xfd, 0xfc, 0xfb, 0xfa, 0xf9, 0xf8, 0xf7, 0xf6, 0xf5, 0xf4,
0xf3, 0xf2, 0xf1, 0xf0, 0xef, 0xee, 0xed, 0xec, 0xeb, 0xea, 0xe9, 0xe8,
0xe7, 0xe6, 0xe5, 0xe4, 0xe3, 0xe2, 0xe1, 0xe0, 0xdf, 0xde, 0xdd, 0xdc,
0xdb, 0xda, 0xd9, 0xd8, 0xd7, 0xd6, 0xd5, 0xd4, 0xd3, 0xd2, 0xd1, 0xd0,
0xcf, 0xce, 0xcd, 0xcc, 0xcb, 0xca, 0xc9, 0xc8, 0xc7, 0xc6, 0xc5, 0xc4,
0xc3, 0xc2, 0xc1, 0xc0, 0xbf, 0xbe, 0xbd, 0xbc, 0xbb, 0xba, 0xb9, 0xb8,
0xb7, 0xb6, 0xb5, 0xb4, 0xb3, 0xb2, 0xb1, 0xb0, 0xaf, 0xae, 0xad, 0xac,
0xab, 0xaa, 0xa9, 0xa8, 0xa7, 0xa6, 0xa5, 0xa4, 0xa3, 0xa2, 0xa1, 0xa0,
0x9f, 0x9e, 0x9d, 0x9c, 0x9b, 0x9a, 0x99, 0x98, 0x97, 0x96, 0x95, 0x94,
0x93, 0x92, 0x91, 0x90, 0x8f, 0x8e, 0x8d, 0x8c, 0x8b, 0x8a, 0x89, 0x88,
0x87, 0x86, 0x85, 0x84, 0x83, 0x82, 0x81, 0x80, 0x7f, 0x7e, 0x7d, 0x7c,
0x7b, 0x7a, 0x79, 0x78, 0x77, 0x76, 0x75, 0x74, 0x73, 0x72, 0x71, 0x70,
0x6f, 0x6e, 0x6d, 0x6c, 0x6b, 0x6a, 0x69, 0x68, 0x67, 0x66, 0x65, 0x64,
0x63, 0x62, 0x61, 0x60, 0x5f, 0x5e, 0x5d, 0x5c, 0x5b, 0x5a, 0x59, 0x58,
0x57, 0x56, 0x55, 0x54, 0x53, 0x52, 0x51, 0x50, 0x4f, 0x4e, 0x4d, 0x4c,
0x4b, 0x4a, 0x49, 0x48, 0x47, 0x46, 0x45, 0x44, 0x43, 0x42, 0x41, 0x40,
0x3f, 0x3e, 0x3d, 0x3c, 0x3b, 0x3a, 0x39, 0x38, 0x37, 0x36, 0x35, 0x34,
0x33, 0x32, 0x31, 0x30, 0x2f, 0x2e, 0x2d, 0x2c, 0x2b, 0x2a, 0x29, 0x28,
0x27, 0x26, 0x25, 0x24, 0x23, 0x22, 0x21, 0x20, 0x1f, 0x1e, 0x1d, 0x1c,
0x1b, 0x1a, 0x19, 0x18, 0x17, 0x16, 0x15, 0x14, 0x13, 0x12, 0x11, 0x10,
0x0f, 0x0e, 0x0d, 0x0c, 0x0b, 0x0a, 0x09, 0x08, 0x07, 0x06, 0x05, 0x04,
0x03, 0x02, 0x01, 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10, 0x11, 0x12, 0x13, 0x14,
0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20,
0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c,
0x2d, 0x2e, 0x2f, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38,
0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, 0x40, 0x41, 0x42, 0x43, 0x44,
0x45, 0x46, 0x47, 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50,
0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59, 0x5a, 0x5b, 0x5c,
0x5d, 0x5e, 0x5f, 0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68,
0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, 0x70, 0x71, 0x72, 0x73, 0x74,
0x75, 0x76, 0x77, 0x78, 0x79, 0x7a, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, 0x80,
0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x8b, 0x8c,
0x8d, 0x8e, 0x8f, 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98,
0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f, 0xa0, 0xa1, 0xa2, 0xa3, 0xa4,
0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb0,
0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xbb, 0xbc,
0xbd, 0xbe, 0xbf, 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8,
0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, 0xd0, 0xd1, 0xd2, 0xd3, 0xd4,
0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf, 0xe0,
0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xeb, 0xec,
0xed, 0xee, 0xef, 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8,
0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff
};
static const int8_t sclip1[1020 + 1020 + 1] = {
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80, 0x80,
0x80, 0x80, 0x80, 0x80, 0x80, 0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87,
0x88, 0x89, 0x8a, 0x8b, 0x8c, 0x8d, 0x8e, 0x8f, 0x90, 0x91, 0x92, 0x93,
0x94, 0x95, 0x96, 0x97, 0x98, 0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f,
0xa0, 0xa1, 0xa2, 0xa3, 0xa4, 0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xab,
0xac, 0xad, 0xae, 0xaf, 0xb0, 0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7,
0xb8, 0xb9, 0xba, 0xbb, 0xbc, 0xbd, 0xbe, 0xbf, 0xc0, 0xc1, 0xc2, 0xc3,
0xc4, 0xc5, 0xc6, 0xc7, 0xc8, 0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf,
0xd0, 0xd1, 0xd2, 0xd3, 0xd4, 0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xdb,
0xdc, 0xdd, 0xde, 0xdf, 0xe0, 0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7,
0xe8, 0xe9, 0xea, 0xeb, 0xec, 0xed, 0xee, 0xef, 0xf0, 0xf1, 0xf2, 0xf3,
0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff,
0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08, 0x09, 0x0a, 0x0b,
0x0c, 0x0d, 0x0e, 0x0f, 0x10, 0x11, 0x12, 0x13, 0x14, 0x15, 0x16, 0x17,
0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20, 0x21, 0x22, 0x23,
0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c, 0x2d, 0x2e, 0x2f,
0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38, 0x39, 0x3a, 0x3b,
0x3c, 0x3d, 0x3e, 0x3f, 0x40, 0x41, 0x42, 0x43, 0x44, 0x45, 0x46, 0x47,
0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50, 0x51, 0x52, 0x53,
0x54, 0x55, 0x56, 0x57, 0x58, 0x59, 0x5a, 0x5b, 0x5c, 0x5d, 0x5e, 0x5f,
0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68, 0x69, 0x6a, 0x6b,
0x6c, 0x6d, 0x6e, 0x6f, 0x70, 0x71, 0x72, 0x73, 0x74, 0x75, 0x76, 0x77,
0x78, 0x79, 0x7a, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f,
0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f, 0x7f
};
static const int8_t sclip2[112 + 112 + 1] = {
0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0,
0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0,
0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0,
0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0,
0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0,
0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0,
0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0,
0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0, 0xf0,
0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8, 0xf9, 0xfa, 0xfb,
0xfc, 0xfd, 0xfe, 0xff, 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07,
0x08, 0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f,
0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f,
0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f,
0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f,
0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f,
0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f,
0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f,
0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f,
0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f, 0x0f
};
static const uint8_t clip1[255 + 511 + 1] = {
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00, 0x00,
0x00, 0x00, 0x00, 0x00, 0x01, 0x02, 0x03, 0x04, 0x05, 0x06, 0x07, 0x08,
0x09, 0x0a, 0x0b, 0x0c, 0x0d, 0x0e, 0x0f, 0x10, 0x11, 0x12, 0x13, 0x14,
0x15, 0x16, 0x17, 0x18, 0x19, 0x1a, 0x1b, 0x1c, 0x1d, 0x1e, 0x1f, 0x20,
0x21, 0x22, 0x23, 0x24, 0x25, 0x26, 0x27, 0x28, 0x29, 0x2a, 0x2b, 0x2c,
0x2d, 0x2e, 0x2f, 0x30, 0x31, 0x32, 0x33, 0x34, 0x35, 0x36, 0x37, 0x38,
0x39, 0x3a, 0x3b, 0x3c, 0x3d, 0x3e, 0x3f, 0x40, 0x41, 0x42, 0x43, 0x44,
0x45, 0x46, 0x47, 0x48, 0x49, 0x4a, 0x4b, 0x4c, 0x4d, 0x4e, 0x4f, 0x50,
0x51, 0x52, 0x53, 0x54, 0x55, 0x56, 0x57, 0x58, 0x59, 0x5a, 0x5b, 0x5c,
0x5d, 0x5e, 0x5f, 0x60, 0x61, 0x62, 0x63, 0x64, 0x65, 0x66, 0x67, 0x68,
0x69, 0x6a, 0x6b, 0x6c, 0x6d, 0x6e, 0x6f, 0x70, 0x71, 0x72, 0x73, 0x74,
0x75, 0x76, 0x77, 0x78, 0x79, 0x7a, 0x7b, 0x7c, 0x7d, 0x7e, 0x7f, 0x80,
0x81, 0x82, 0x83, 0x84, 0x85, 0x86, 0x87, 0x88, 0x89, 0x8a, 0x8b, 0x8c,
0x8d, 0x8e, 0x8f, 0x90, 0x91, 0x92, 0x93, 0x94, 0x95, 0x96, 0x97, 0x98,
0x99, 0x9a, 0x9b, 0x9c, 0x9d, 0x9e, 0x9f, 0xa0, 0xa1, 0xa2, 0xa3, 0xa4,
0xa5, 0xa6, 0xa7, 0xa8, 0xa9, 0xaa, 0xab, 0xac, 0xad, 0xae, 0xaf, 0xb0,
0xb1, 0xb2, 0xb3, 0xb4, 0xb5, 0xb6, 0xb7, 0xb8, 0xb9, 0xba, 0xbb, 0xbc,
0xbd, 0xbe, 0xbf, 0xc0, 0xc1, 0xc2, 0xc3, 0xc4, 0xc5, 0xc6, 0xc7, 0xc8,
0xc9, 0xca, 0xcb, 0xcc, 0xcd, 0xce, 0xcf, 0xd0, 0xd1, 0xd2, 0xd3, 0xd4,
0xd5, 0xd6, 0xd7, 0xd8, 0xd9, 0xda, 0xdb, 0xdc, 0xdd, 0xde, 0xdf, 0xe0,
0xe1, 0xe2, 0xe3, 0xe4, 0xe5, 0xe6, 0xe7, 0xe8, 0xe9, 0xea, 0xeb, 0xec,
0xed, 0xee, 0xef, 0xf0, 0xf1, 0xf2, 0xf3, 0xf4, 0xf5, 0xf6, 0xf7, 0xf8,
0xf9, 0xfa, 0xfb, 0xfc, 0xfd, 0xfe, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff,
0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff, 0xff
};
#else
// uninitialized tables
static uint8_t abs0[255 + 255 + 1];
static int8_t sclip1[1020 + 1020 + 1];
static int8_t sclip2[112 + 112 + 1];
static uint8_t clip1[255 + 511 + 1];
// We declare this variable 'volatile' to prevent instruction reordering
// and make sure it's set to true _last_ (so as to be thread-safe)
static volatile int tables_ok = 0;
#endif
const int8_t* const VP8ksclip1 = &sclip1[1020];
const int8_t* const VP8ksclip2 = &sclip2[112];
const uint8_t* const VP8kclip1 = &clip1[255];
const uint8_t* const VP8kabs0 = &abs0[255];
void VP8InitClipTables(void) {
#if !defined(USE_STATIC_TABLES)
int i;
if (!tables_ok) {
for (i = -255; i <= 255; ++i) {
abs0[255 + i] = (i < 0) ? -i : i;
}
for (i = -1020; i <= 1020; ++i) {
sclip1[1020 + i] = (i < -128) ? -128 : (i > 127) ? 127 : i;
}
for (i = -112; i <= 112; ++i) {
sclip2[112 + i] = (i < -16) ? -16 : (i > 15) ? 15 : i;
}
for (i = -255; i <= 255 + 255; ++i) {
clip1[255 + i] = (i < 0) ? 0 : (i > 255) ? 255 : i;
}
tables_ok = 1;
}
#endif // USE_STATIC_TABLES
}

578
src/dsp/dec_mips32.c Normal file
View File

@@ -0,0 +1,578 @@
// Copyright 2014 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// MIPS version of dsp functions
//
// Author(s): Djordje Pesut (djordje.pesut@imgtec.com)
// Jovan Zelincevic (jovan.zelincevic@imgtec.com)
#include "./dsp.h"
#if defined(WEBP_USE_MIPS32)
static const int kC1 = 20091 + (1 << 16);
static const int kC2 = 35468;
static WEBP_INLINE int abs_mips32(int x) {
const int sign = x >> 31;
return (x ^ sign) - sign;
}
// 4 pixels in, 2 pixels out
static WEBP_INLINE void do_filter2(uint8_t* p, int step) {
const int p1 = p[-2 * step], p0 = p[-step], q0 = p[0], q1 = p[step];
const int a = 3 * (q0 - p0) + VP8ksclip1[p1 - q1];
const int a1 = VP8ksclip2[(a + 4) >> 3];
const int a2 = VP8ksclip2[(a + 3) >> 3];
p[-step] = VP8kclip1[p0 + a2];
p[ 0] = VP8kclip1[q0 - a1];
}
// 4 pixels in, 4 pixels out
static WEBP_INLINE void do_filter4(uint8_t* p, int step) {
const int p1 = p[-2 * step], p0 = p[-step], q0 = p[0], q1 = p[step];
const int a = 3 * (q0 - p0);
const int a1 = VP8ksclip2[(a + 4) >> 3];
const int a2 = VP8ksclip2[(a + 3) >> 3];
const int a3 = (a1 + 1) >> 1;
p[-2 * step] = VP8kclip1[p1 + a3];
p[- step] = VP8kclip1[p0 + a2];
p[ 0] = VP8kclip1[q0 - a1];
p[ step] = VP8kclip1[q1 - a3];
}
// 6 pixels in, 6 pixels out
static WEBP_INLINE void do_filter6(uint8_t* p, int step) {
const int p2 = p[-3 * step], p1 = p[-2 * step], p0 = p[-step];
const int q0 = p[0], q1 = p[step], q2 = p[2 * step];
const int a = VP8ksclip1[3 * (q0 - p0) + VP8ksclip1[p1 - q1]];
const int a1 = (27 * a + 63) >> 7; // eq. to ((3 * a + 7) * 9) >> 7
const int a2 = (18 * a + 63) >> 7; // eq. to ((2 * a + 7) * 9) >> 7
const int a3 = (9 * a + 63) >> 7; // eq. to ((1 * a + 7) * 9) >> 7
p[-3 * step] = VP8kclip1[p2 + a3];
p[-2 * step] = VP8kclip1[p1 + a2];
p[- step] = VP8kclip1[p0 + a1];
p[ 0] = VP8kclip1[q0 - a1];
p[ step] = VP8kclip1[q1 - a2];
p[ 2 * step] = VP8kclip1[q2 - a3];
}
static WEBP_INLINE int hev(const uint8_t* p, int step, int thresh) {
const int p1 = p[-2 * step], p0 = p[-step], q0 = p[0], q1 = p[step];
return (abs_mips32(p1 - p0) > thresh) || (abs_mips32(q1 - q0) > thresh);
}
static WEBP_INLINE int needs_filter(const uint8_t* p, int step, int thresh) {
const int p1 = p[-2 * step], p0 = p[-step], q0 = p[0], q1 = p[step];
return ((2 * abs_mips32(p0 - q0) + (abs_mips32(p1 - q1) >> 1)) <= thresh);
}
static WEBP_INLINE int needs_filter2(const uint8_t* p,
int step, int t, int it) {
const int p3 = p[-4 * step], p2 = p[-3 * step];
const int p1 = p[-2 * step], p0 = p[-step];
const int q0 = p[0], q1 = p[step], q2 = p[2 * step], q3 = p[3 * step];
if ((2 * abs_mips32(p0 - q0) + (abs_mips32(p1 - q1) >> 1)) > t) {
return 0;
}
return abs_mips32(p3 - p2) <= it && abs_mips32(p2 - p1) <= it &&
abs_mips32(p1 - p0) <= it && abs_mips32(q3 - q2) <= it &&
abs_mips32(q2 - q1) <= it && abs_mips32(q1 - q0) <= it;
}
static WEBP_INLINE void FilterLoop26(uint8_t* p,
int hstride, int vstride, int size,
int thresh, int ithresh, int hev_thresh) {
while (size-- > 0) {
if (needs_filter2(p, hstride, thresh, ithresh)) {
if (hev(p, hstride, hev_thresh)) {
do_filter2(p, hstride);
} else {
do_filter6(p, hstride);
}
}
p += vstride;
}
}
static WEBP_INLINE void FilterLoop24(uint8_t* p,
int hstride, int vstride, int size,
int thresh, int ithresh, int hev_thresh) {
while (size-- > 0) {
if (needs_filter2(p, hstride, thresh, ithresh)) {
if (hev(p, hstride, hev_thresh)) {
do_filter2(p, hstride);
} else {
do_filter4(p, hstride);
}
}
p += vstride;
}
}
// on macroblock edges
static void VFilter16(uint8_t* p, int stride,
int thresh, int ithresh, int hev_thresh) {
FilterLoop26(p, stride, 1, 16, thresh, ithresh, hev_thresh);
}
static void HFilter16(uint8_t* p, int stride,
int thresh, int ithresh, int hev_thresh) {
FilterLoop26(p, 1, stride, 16, thresh, ithresh, hev_thresh);
}
// 8-pixels wide variant, for chroma filtering
static void VFilter8(uint8_t* u, uint8_t* v, int stride,
int thresh, int ithresh, int hev_thresh) {
FilterLoop26(u, stride, 1, 8, thresh, ithresh, hev_thresh);
FilterLoop26(v, stride, 1, 8, thresh, ithresh, hev_thresh);
}
static void HFilter8(uint8_t* u, uint8_t* v, int stride,
int thresh, int ithresh, int hev_thresh) {
FilterLoop26(u, 1, stride, 8, thresh, ithresh, hev_thresh);
FilterLoop26(v, 1, stride, 8, thresh, ithresh, hev_thresh);
}
static void VFilter8i(uint8_t* u, uint8_t* v, int stride,
int thresh, int ithresh, int hev_thresh) {
FilterLoop24(u + 4 * stride, stride, 1, 8, thresh, ithresh, hev_thresh);
FilterLoop24(v + 4 * stride, stride, 1, 8, thresh, ithresh, hev_thresh);
}
static void HFilter8i(uint8_t* u, uint8_t* v, int stride,
int thresh, int ithresh, int hev_thresh) {
FilterLoop24(u + 4, 1, stride, 8, thresh, ithresh, hev_thresh);
FilterLoop24(v + 4, 1, stride, 8, thresh, ithresh, hev_thresh);
}
// on three inner edges
static void VFilter16i(uint8_t* p, int stride,
int thresh, int ithresh, int hev_thresh) {
int k;
for (k = 3; k > 0; --k) {
p += 4 * stride;
FilterLoop24(p, stride, 1, 16, thresh, ithresh, hev_thresh);
}
}
static void HFilter16i(uint8_t* p, int stride,
int thresh, int ithresh, int hev_thresh) {
int k;
for (k = 3; k > 0; --k) {
p += 4;
FilterLoop24(p, 1, stride, 16, thresh, ithresh, hev_thresh);
}
}
//------------------------------------------------------------------------------
// Simple In-loop filtering (Paragraph 15.2)
static void SimpleVFilter16(uint8_t* p, int stride, int thresh) {
int i;
for (i = 0; i < 16; ++i) {
if (needs_filter(p + i, stride, thresh)) {
do_filter2(p + i, stride);
}
}
}
static void SimpleHFilter16(uint8_t* p, int stride, int thresh) {
int i;
for (i = 0; i < 16; ++i) {
if (needs_filter(p + i * stride, 1, thresh)) {
do_filter2(p + i * stride, 1);
}
}
}
static void SimpleVFilter16i(uint8_t* p, int stride, int thresh) {
int k;
for (k = 3; k > 0; --k) {
p += 4 * stride;
SimpleVFilter16(p, stride, thresh);
}
}
static void SimpleHFilter16i(uint8_t* p, int stride, int thresh) {
int k;
for (k = 3; k > 0; --k) {
p += 4;
SimpleHFilter16(p, stride, thresh);
}
}
static void TransformOne(const int16_t* in, uint8_t* dst) {
int temp0, temp1, temp2, temp3, temp4;
int temp5, temp6, temp7, temp8, temp9;
int temp10, temp11, temp12, temp13, temp14;
int temp15, temp16, temp17, temp18;
int16_t* p_in = (int16_t*)in;
// loops unrolled and merged to avoid usage of tmp buffer
// and to reduce number of stalls. MUL macro is written
// in assembler and inlined
__asm__ volatile(
"lh %[temp0], 0(%[in]) \n\t"
"lh %[temp8], 16(%[in]) \n\t"
"lh %[temp4], 8(%[in]) \n\t"
"lh %[temp12], 24(%[in]) \n\t"
"addu %[temp16], %[temp0], %[temp8] \n\t"
"subu %[temp0], %[temp0], %[temp8] \n\t"
"mul %[temp8], %[temp4], %[kC2] \n\t"
"mul %[temp17], %[temp12], %[kC1] \n\t"
"mul %[temp4], %[temp4], %[kC1] \n\t"
"mul %[temp12], %[temp12], %[kC2] \n\t"
"lh %[temp1], 2(%[in]) \n\t"
"lh %[temp5], 10(%[in]) \n\t"
"lh %[temp9], 18(%[in]) \n\t"
"lh %[temp13], 26(%[in]) \n\t"
"sra %[temp8], %[temp8], 16 \n\t"
"sra %[temp17], %[temp17], 16 \n\t"
"sra %[temp4], %[temp4], 16 \n\t"
"sra %[temp12], %[temp12], 16 \n\t"
"lh %[temp2], 4(%[in]) \n\t"
"lh %[temp6], 12(%[in]) \n\t"
"lh %[temp10], 20(%[in]) \n\t"
"lh %[temp14], 28(%[in]) \n\t"
"subu %[temp17], %[temp8], %[temp17] \n\t"
"addu %[temp4], %[temp4], %[temp12] \n\t"
"addu %[temp8], %[temp16], %[temp4] \n\t"
"subu %[temp4], %[temp16], %[temp4] \n\t"
"addu %[temp16], %[temp1], %[temp9] \n\t"
"subu %[temp1], %[temp1], %[temp9] \n\t"
"lh %[temp3], 6(%[in]) \n\t"
"lh %[temp7], 14(%[in]) \n\t"
"lh %[temp11], 22(%[in]) \n\t"
"lh %[temp15], 30(%[in]) \n\t"
"addu %[temp12], %[temp0], %[temp17] \n\t"
"subu %[temp0], %[temp0], %[temp17] \n\t"
"mul %[temp9], %[temp5], %[kC2] \n\t"
"mul %[temp17], %[temp13], %[kC1] \n\t"
"mul %[temp5], %[temp5], %[kC1] \n\t"
"mul %[temp13], %[temp13], %[kC2] \n\t"
"sra %[temp9], %[temp9], 16 \n\t"
"sra %[temp17], %[temp17], 16 \n\t"
"subu %[temp17], %[temp9], %[temp17] \n\t"
"sra %[temp5], %[temp5], 16 \n\t"
"sra %[temp13], %[temp13], 16 \n\t"
"addu %[temp5], %[temp5], %[temp13] \n\t"
"addu %[temp13], %[temp1], %[temp17] \n\t"
"subu %[temp1], %[temp1], %[temp17] \n\t"
"mul %[temp17], %[temp14], %[kC1] \n\t"
"mul %[temp14], %[temp14], %[kC2] \n\t"
"addu %[temp9], %[temp16], %[temp5] \n\t"
"subu %[temp5], %[temp16], %[temp5] \n\t"
"addu %[temp16], %[temp2], %[temp10] \n\t"
"subu %[temp2], %[temp2], %[temp10] \n\t"
"mul %[temp10], %[temp6], %[kC2] \n\t"
"mul %[temp6], %[temp6], %[kC1] \n\t"
"sra %[temp17], %[temp17], 16 \n\t"
"sra %[temp14], %[temp14], 16 \n\t"
"sra %[temp10], %[temp10], 16 \n\t"
"sra %[temp6], %[temp6], 16 \n\t"
"subu %[temp17], %[temp10], %[temp17] \n\t"
"addu %[temp6], %[temp6], %[temp14] \n\t"
"addu %[temp10], %[temp16], %[temp6] \n\t"
"subu %[temp6], %[temp16], %[temp6] \n\t"
"addu %[temp14], %[temp2], %[temp17] \n\t"
"subu %[temp2], %[temp2], %[temp17] \n\t"
"mul %[temp17], %[temp15], %[kC1] \n\t"
"mul %[temp15], %[temp15], %[kC2] \n\t"
"addu %[temp16], %[temp3], %[temp11] \n\t"
"subu %[temp3], %[temp3], %[temp11] \n\t"
"mul %[temp11], %[temp7], %[kC2] \n\t"
"mul %[temp7], %[temp7], %[kC1] \n\t"
"addiu %[temp8], %[temp8], 4 \n\t"
"addiu %[temp12], %[temp12], 4 \n\t"
"addiu %[temp0], %[temp0], 4 \n\t"
"addiu %[temp4], %[temp4], 4 \n\t"
"sra %[temp17], %[temp17], 16 \n\t"
"sra %[temp15], %[temp15], 16 \n\t"
"sra %[temp11], %[temp11], 16 \n\t"
"sra %[temp7], %[temp7], 16 \n\t"
"subu %[temp17], %[temp11], %[temp17] \n\t"
"addu %[temp7], %[temp7], %[temp15] \n\t"
"addu %[temp15], %[temp3], %[temp17] \n\t"
"subu %[temp3], %[temp3], %[temp17] \n\t"
"addu %[temp11], %[temp16], %[temp7] \n\t"
"subu %[temp7], %[temp16], %[temp7] \n\t"
"addu %[temp16], %[temp8], %[temp10] \n\t"
"subu %[temp8], %[temp8], %[temp10] \n\t"
"mul %[temp10], %[temp9], %[kC2] \n\t"
"mul %[temp17], %[temp11], %[kC1] \n\t"
"mul %[temp9], %[temp9], %[kC1] \n\t"
"mul %[temp11], %[temp11], %[kC2] \n\t"
"sra %[temp10], %[temp10], 16 \n\t"
"sra %[temp17], %[temp17], 16 \n\t"
"sra %[temp9], %[temp9], 16 \n\t"
"sra %[temp11], %[temp11], 16 \n\t"
"subu %[temp17], %[temp10], %[temp17] \n\t"
"addu %[temp11], %[temp9], %[temp11] \n\t"
"addu %[temp10], %[temp12], %[temp14] \n\t"
"subu %[temp12], %[temp12], %[temp14] \n\t"
"mul %[temp14], %[temp13], %[kC2] \n\t"
"mul %[temp9], %[temp15], %[kC1] \n\t"
"mul %[temp13], %[temp13], %[kC1] \n\t"
"mul %[temp15], %[temp15], %[kC2] \n\t"
"sra %[temp14], %[temp14], 16 \n\t"
"sra %[temp9], %[temp9], 16 \n\t"
"sra %[temp13], %[temp13], 16 \n\t"
"sra %[temp15], %[temp15], 16 \n\t"
"subu %[temp9], %[temp14], %[temp9] \n\t"
"addu %[temp15], %[temp13], %[temp15] \n\t"
"addu %[temp14], %[temp0], %[temp2] \n\t"
"subu %[temp0], %[temp0], %[temp2] \n\t"
"mul %[temp2], %[temp1], %[kC2] \n\t"
"mul %[temp13], %[temp3], %[kC1] \n\t"
"mul %[temp1], %[temp1], %[kC1] \n\t"
"mul %[temp3], %[temp3], %[kC2] \n\t"
"sra %[temp2], %[temp2], 16 \n\t"
"sra %[temp13], %[temp13], 16 \n\t"
"sra %[temp1], %[temp1], 16 \n\t"
"sra %[temp3], %[temp3], 16 \n\t"
"subu %[temp13], %[temp2], %[temp13] \n\t"
"addu %[temp3], %[temp1], %[temp3] \n\t"
"addu %[temp2], %[temp4], %[temp6] \n\t"
"subu %[temp4], %[temp4], %[temp6] \n\t"
"mul %[temp6], %[temp5], %[kC2] \n\t"
"mul %[temp1], %[temp7], %[kC1] \n\t"
"mul %[temp5], %[temp5], %[kC1] \n\t"
"mul %[temp7], %[temp7], %[kC2] \n\t"
"sra %[temp6], %[temp6], 16 \n\t"
"sra %[temp1], %[temp1], 16 \n\t"
"sra %[temp5], %[temp5], 16 \n\t"
"sra %[temp7], %[temp7], 16 \n\t"
"subu %[temp1], %[temp6], %[temp1] \n\t"
"addu %[temp7], %[temp5], %[temp7] \n\t"
"addu %[temp5], %[temp16], %[temp11] \n\t"
"subu %[temp16], %[temp16], %[temp11] \n\t"
"addu %[temp11], %[temp8], %[temp17] \n\t"
"subu %[temp8], %[temp8], %[temp17] \n\t"
"sra %[temp5], %[temp5], 3 \n\t"
"sra %[temp16], %[temp16], 3 \n\t"
"sra %[temp11], %[temp11], 3 \n\t"
"sra %[temp8], %[temp8], 3 \n\t"
"addu %[temp17], %[temp10], %[temp15] \n\t"
"subu %[temp10], %[temp10], %[temp15] \n\t"
"addu %[temp15], %[temp12], %[temp9] \n\t"
"subu %[temp12], %[temp12], %[temp9] \n\t"
"sra %[temp17], %[temp17], 3 \n\t"
"sra %[temp10], %[temp10], 3 \n\t"
"sra %[temp15], %[temp15], 3 \n\t"
"sra %[temp12], %[temp12], 3 \n\t"
"addu %[temp9], %[temp14], %[temp3] \n\t"
"subu %[temp14], %[temp14], %[temp3] \n\t"
"addu %[temp3], %[temp0], %[temp13] \n\t"
"subu %[temp0], %[temp0], %[temp13] \n\t"
"sra %[temp9], %[temp9], 3 \n\t"
"sra %[temp14], %[temp14], 3 \n\t"
"sra %[temp3], %[temp3], 3 \n\t"
"sra %[temp0], %[temp0], 3 \n\t"
"addu %[temp13], %[temp2], %[temp7] \n\t"
"subu %[temp2], %[temp2], %[temp7] \n\t"
"addu %[temp7], %[temp4], %[temp1] \n\t"
"subu %[temp4], %[temp4], %[temp1] \n\t"
"sra %[temp13], %[temp13], 3 \n\t"
"sra %[temp2], %[temp2], 3 \n\t"
"sra %[temp7], %[temp7], 3 \n\t"
"sra %[temp4], %[temp4], 3 \n\t"
"addiu %[temp6], $zero, 255 \n\t"
"lbu %[temp1], 0(%[dst]) \n\t"
"addu %[temp1], %[temp1], %[temp5] \n\t"
"sra %[temp5], %[temp1], 8 \n\t"
"sra %[temp18], %[temp1], 31 \n\t"
"beqz %[temp5], 1f \n\t"
"xor %[temp1], %[temp1], %[temp1] \n\t"
"movz %[temp1], %[temp6], %[temp18] \n\t"
"1: \n\t"
"lbu %[temp18], 1(%[dst]) \n\t"
"sb %[temp1], 0(%[dst]) \n\t"
"addu %[temp18], %[temp18], %[temp11] \n\t"
"sra %[temp11], %[temp18], 8 \n\t"
"sra %[temp1], %[temp18], 31 \n\t"
"beqz %[temp11], 2f \n\t"
"xor %[temp18], %[temp18], %[temp18] \n\t"
"movz %[temp18], %[temp6], %[temp1] \n\t"
"2: \n\t"
"lbu %[temp1], 2(%[dst]) \n\t"
"sb %[temp18], 1(%[dst]) \n\t"
"addu %[temp1], %[temp1], %[temp8] \n\t"
"sra %[temp8], %[temp1], 8 \n\t"
"sra %[temp18], %[temp1], 31 \n\t"
"beqz %[temp8], 3f \n\t"
"xor %[temp1], %[temp1], %[temp1] \n\t"
"movz %[temp1], %[temp6], %[temp18] \n\t"
"3: \n\t"
"lbu %[temp18], 3(%[dst]) \n\t"
"sb %[temp1], 2(%[dst]) \n\t"
"addu %[temp18], %[temp18], %[temp16] \n\t"
"sra %[temp16], %[temp18], 8 \n\t"
"sra %[temp1], %[temp18], 31 \n\t"
"beqz %[temp16], 4f \n\t"
"xor %[temp18], %[temp18], %[temp18] \n\t"
"movz %[temp18], %[temp6], %[temp1] \n\t"
"4: \n\t"
"sb %[temp18], 3(%[dst]) \n\t"
"lbu %[temp5], 32(%[dst]) \n\t"
"lbu %[temp8], 33(%[dst]) \n\t"
"lbu %[temp11], 34(%[dst]) \n\t"
"lbu %[temp16], 35(%[dst]) \n\t"
"addu %[temp5], %[temp5], %[temp17] \n\t"
"addu %[temp8], %[temp8], %[temp15] \n\t"
"addu %[temp11], %[temp11], %[temp12] \n\t"
"addu %[temp16], %[temp16], %[temp10] \n\t"
"sra %[temp18], %[temp5], 8 \n\t"
"sra %[temp1], %[temp5], 31 \n\t"
"beqz %[temp18], 5f \n\t"
"xor %[temp5], %[temp5], %[temp5] \n\t"
"movz %[temp5], %[temp6], %[temp1] \n\t"
"5: \n\t"
"sra %[temp18], %[temp8], 8 \n\t"
"sra %[temp1], %[temp8], 31 \n\t"
"beqz %[temp18], 6f \n\t"
"xor %[temp8], %[temp8], %[temp8] \n\t"
"movz %[temp8], %[temp6], %[temp1] \n\t"
"6: \n\t"
"sra %[temp18], %[temp11], 8 \n\t"
"sra %[temp1], %[temp11], 31 \n\t"
"sra %[temp17], %[temp16], 8 \n\t"
"sra %[temp15], %[temp16], 31 \n\t"
"beqz %[temp18], 7f \n\t"
"xor %[temp11], %[temp11], %[temp11] \n\t"
"movz %[temp11], %[temp6], %[temp1] \n\t"
"7: \n\t"
"beqz %[temp17], 8f \n\t"
"xor %[temp16], %[temp16], %[temp16] \n\t"
"movz %[temp16], %[temp6], %[temp15] \n\t"
"8: \n\t"
"sb %[temp5], 32(%[dst]) \n\t"
"sb %[temp8], 33(%[dst]) \n\t"
"sb %[temp11], 34(%[dst]) \n\t"
"sb %[temp16], 35(%[dst]) \n\t"
"lbu %[temp5], 64(%[dst]) \n\t"
"lbu %[temp8], 65(%[dst]) \n\t"
"lbu %[temp11], 66(%[dst]) \n\t"
"lbu %[temp16], 67(%[dst]) \n\t"
"addu %[temp5], %[temp5], %[temp9] \n\t"
"addu %[temp8], %[temp8], %[temp3] \n\t"
"addu %[temp11], %[temp11], %[temp0] \n\t"
"addu %[temp16], %[temp16], %[temp14] \n\t"
"sra %[temp18], %[temp5], 8 \n\t"
"sra %[temp1], %[temp5], 31 \n\t"
"sra %[temp17], %[temp8], 8 \n\t"
"sra %[temp15], %[temp8], 31 \n\t"
"sra %[temp12], %[temp11], 8 \n\t"
"sra %[temp10], %[temp11], 31 \n\t"
"sra %[temp9], %[temp16], 8 \n\t"
"sra %[temp3], %[temp16], 31 \n\t"
"beqz %[temp18], 9f \n\t"
"xor %[temp5], %[temp5], %[temp5] \n\t"
"movz %[temp5], %[temp6], %[temp1] \n\t"
"9: \n\t"
"beqz %[temp17], 10f \n\t"
"xor %[temp8], %[temp8], %[temp8] \n\t"
"movz %[temp8], %[temp6], %[temp15] \n\t"
"10: \n\t"
"beqz %[temp12], 11f \n\t"
"xor %[temp11], %[temp11], %[temp11] \n\t"
"movz %[temp11], %[temp6], %[temp10] \n\t"
"11: \n\t"
"beqz %[temp9], 12f \n\t"
"xor %[temp16], %[temp16], %[temp16] \n\t"
"movz %[temp16], %[temp6], %[temp3] \n\t"
"12: \n\t"
"sb %[temp5], 64(%[dst]) \n\t"
"sb %[temp8], 65(%[dst]) \n\t"
"sb %[temp11], 66(%[dst]) \n\t"
"sb %[temp16], 67(%[dst]) \n\t"
"lbu %[temp5], 96(%[dst]) \n\t"
"lbu %[temp8], 97(%[dst]) \n\t"
"lbu %[temp11], 98(%[dst]) \n\t"
"lbu %[temp16], 99(%[dst]) \n\t"
"addu %[temp5], %[temp5], %[temp13] \n\t"
"addu %[temp8], %[temp8], %[temp7] \n\t"
"addu %[temp11], %[temp11], %[temp4] \n\t"
"addu %[temp16], %[temp16], %[temp2] \n\t"
"sra %[temp18], %[temp5], 8 \n\t"
"sra %[temp1], %[temp5], 31 \n\t"
"sra %[temp17], %[temp8], 8 \n\t"
"sra %[temp15], %[temp8], 31 \n\t"
"sra %[temp12], %[temp11], 8 \n\t"
"sra %[temp10], %[temp11], 31 \n\t"
"sra %[temp9], %[temp16], 8 \n\t"
"sra %[temp3], %[temp16], 31 \n\t"
"beqz %[temp18], 13f \n\t"
"xor %[temp5], %[temp5], %[temp5] \n\t"
"movz %[temp5], %[temp6], %[temp1] \n\t"
"13: \n\t"
"beqz %[temp17], 14f \n\t"
"xor %[temp8], %[temp8], %[temp8] \n\t"
"movz %[temp8], %[temp6], %[temp15] \n\t"
"14: \n\t"
"beqz %[temp12], 15f \n\t"
"xor %[temp11], %[temp11], %[temp11] \n\t"
"movz %[temp11], %[temp6], %[temp10] \n\t"
"15: \n\t"
"beqz %[temp9], 16f \n\t"
"xor %[temp16], %[temp16], %[temp16] \n\t"
"movz %[temp16], %[temp6], %[temp3] \n\t"
"16: \n\t"
"sb %[temp5], 96(%[dst]) \n\t"
"sb %[temp8], 97(%[dst]) \n\t"
"sb %[temp11], 98(%[dst]) \n\t"
"sb %[temp16], 99(%[dst]) \n\t"
: [temp0]"=&r"(temp0), [temp1]"=&r"(temp1), [temp2]"=&r"(temp2),
[temp3]"=&r"(temp3), [temp4]"=&r"(temp4), [temp5]"=&r"(temp5),
[temp6]"=&r"(temp6), [temp7]"=&r"(temp7), [temp8]"=&r"(temp8),
[temp9]"=&r"(temp9), [temp10]"=&r"(temp10), [temp11]"=&r"(temp11),
[temp12]"=&r"(temp12), [temp13]"=&r"(temp13), [temp14]"=&r"(temp14),
[temp15]"=&r"(temp15), [temp16]"=&r"(temp16), [temp17]"=&r"(temp17),
[temp18]"=&r"(temp18)
: [in]"r"(p_in), [kC1]"r"(kC1), [kC2]"r"(kC2), [dst]"r"(dst)
: "memory", "hi", "lo"
);
}
static void TransformTwo(const int16_t* in, uint8_t* dst, int do_two) {
TransformOne(in, dst);
if (do_two) {
TransformOne(in + 16, dst + 4);
}
}
#endif // WEBP_USE_MIPS32
//------------------------------------------------------------------------------
// Entry point
extern void VP8DspInitMIPS32(void);
void VP8DspInitMIPS32(void) {
#if defined(WEBP_USE_MIPS32)
VP8InitClipTables();
VP8Transform = TransformTwo;
VP8VFilter16 = VFilter16;
VP8HFilter16 = HFilter16;
VP8VFilter8 = VFilter8;
VP8HFilter8 = HFilter8;
VP8VFilter16i = VFilter16i;
VP8HFilter16i = HFilter16i;
VP8VFilter8i = VFilter8i;
VP8HFilter8i = HFilter8i;
VP8SimpleVFilter16 = SimpleVFilter16;
VP8SimpleHFilter16 = SimpleHFilter16;
VP8SimpleVFilter16i = SimpleVFilter16i;
VP8SimpleHFilter16i = SimpleHFilter16i;
#endif // WEBP_USE_MIPS32
}

File diff suppressed because it is too large Load Diff

View File

@@ -26,7 +26,7 @@
//------------------------------------------------------------------------------
// Transforms (Paragraph 14.4)
static void TransformSSE2(const int16_t* in, uint8_t* dst, int do_two) {
static void Transform(const int16_t* in, uint8_t* dst, int do_two) {
// This implementation makes use of 16-bit fixed point versions of two
// multiply constants:
// K1 = sqrt(2) * cos (pi/8) ~= 85627 / 2^16
@@ -246,7 +246,7 @@ static void TransformSSE2(const int16_t* in, uint8_t* dst, int do_two) {
#if defined(USE_TRANSFORM_AC3)
#define MUL(a, b) (((a) * (b)) >> 16)
static void TransformAC3SSE2(const int16_t* in, uint8_t* dst) {
static void TransformAC3(const int16_t* in, uint8_t* dst) {
static const int kC1 = 20091 + (1 << 16);
static const int kC2 = 35468;
const __m128i A = _mm_set1_epi16(in[0] + 4);
@@ -298,20 +298,15 @@ static void TransformAC3SSE2(const int16_t* in, uint8_t* dst) {
_mm_subs_epu8((q), (p)), \
_mm_subs_epu8((p), (q)))
// Shift each byte of "a" by N bits while preserving by the sign bit.
//
// It first shifts the lower bytes of the words and then the upper bytes and
// then merges the results together.
#define SIGNED_SHIFT_N(a, N) { \
__m128i t = a; \
t = _mm_slli_epi16(t, 8); \
t = _mm_srai_epi16(t, N); \
t = _mm_srli_epi16(t, 8); \
\
a = _mm_srai_epi16(a, N + 8); \
a = _mm_slli_epi16(a, 8); \
\
a = _mm_or_si128(t, a); \
// Shift each byte of "x" by 3 bits while preserving by the sign bit.
static WEBP_INLINE void SignedShift8b(__m128i* const x) {
const __m128i zero = _mm_setzero_si128();
const __m128i signs = _mm_cmpgt_epi8(zero, *x);
const __m128i lo_0 = _mm_unpacklo_epi8(*x, signs); // s8 -> s16 sign extend
const __m128i hi_0 = _mm_unpackhi_epi8(*x, signs);
const __m128i lo_1 = _mm_srai_epi16(lo_0, 3);
const __m128i hi_1 = _mm_srai_epi16(hi_0, 3);
*x = _mm_packs_epi16(lo_1, hi_1);
}
#define FLIP_SIGN_BIT2(a, b) { \
@@ -324,103 +319,123 @@ static void TransformAC3SSE2(const int16_t* in, uint8_t* dst) {
FLIP_SIGN_BIT2(c, d); \
}
#define GET_NOTHEV(p1, p0, q0, q1, hev_thresh, not_hev) { \
const __m128i zero = _mm_setzero_si128(); \
const __m128i t_1 = MM_ABS(p1, p0); \
const __m128i t_2 = MM_ABS(q1, q0); \
\
const __m128i h = _mm_set1_epi8(hev_thresh); \
const __m128i t_3 = _mm_subs_epu8(t_1, h); /* abs(p1 - p0) - hev_tresh */ \
const __m128i t_4 = _mm_subs_epu8(t_2, h); /* abs(q1 - q0) - hev_tresh */ \
\
not_hev = _mm_or_si128(t_3, t_4); \
not_hev = _mm_cmpeq_epi8(not_hev, zero); /* not_hev <= t1 && not_hev <= t2 */\
// input/output is uint8_t
static WEBP_INLINE void GetNotHEV(const __m128i* const p1,
const __m128i* const p0,
const __m128i* const q0,
const __m128i* const q1,
int hev_thresh, __m128i* const not_hev) {
const __m128i zero = _mm_setzero_si128();
const __m128i t_1 = MM_ABS(*p1, *p0);
const __m128i t_2 = MM_ABS(*q1, *q0);
const __m128i h = _mm_set1_epi8(hev_thresh);
const __m128i t_3 = _mm_subs_epu8(t_1, h); // abs(p1 - p0) - hev_tresh
const __m128i t_4 = _mm_subs_epu8(t_2, h); // abs(q1 - q0) - hev_tresh
*not_hev = _mm_or_si128(t_3, t_4);
*not_hev = _mm_cmpeq_epi8(*not_hev, zero); // not_hev <= t1 && not_hev <= t2
}
#define GET_BASE_DELTA(p1, p0, q0, q1, o) { \
const __m128i qp0 = _mm_subs_epi8(q0, p0); /* q0 - p0 */ \
o = _mm_subs_epi8(p1, q1); /* p1 - q1 */ \
o = _mm_adds_epi8(o, qp0); /* p1 - q1 + 1 * (q0 - p0) */ \
o = _mm_adds_epi8(o, qp0); /* p1 - q1 + 2 * (q0 - p0) */ \
o = _mm_adds_epi8(o, qp0); /* p1 - q1 + 3 * (q0 - p0) */ \
// input pixels are int8_t
static WEBP_INLINE void GetBaseDelta(const __m128i* const p1,
const __m128i* const p0,
const __m128i* const q0,
const __m128i* const q1,
__m128i* const delta) {
// beware of addition order, for saturation!
const __m128i p1_q1 = _mm_subs_epi8(*p1, *q1); // p1 - q1
const __m128i q0_p0 = _mm_subs_epi8(*q0, *p0); // q0 - p0
const __m128i s1 = _mm_adds_epi8(p1_q1, q0_p0); // p1 - q1 + 1 * (q0 - p0)
const __m128i s2 = _mm_adds_epi8(q0_p0, s1); // p1 - q1 + 2 * (q0 - p0)
const __m128i s3 = _mm_adds_epi8(q0_p0, s2); // p1 - q1 + 3 * (q0 - p0)
*delta = s3;
}
#define DO_SIMPLE_FILTER(p0, q0, fl) { \
const __m128i three = _mm_set1_epi8(3); \
const __m128i four = _mm_set1_epi8(4); \
__m128i v3 = _mm_adds_epi8(fl, three); \
__m128i v4 = _mm_adds_epi8(fl, four); \
\
/* Do +4 side */ \
SIGNED_SHIFT_N(v4, 3); /* v4 >> 3 */ \
q0 = _mm_subs_epi8(q0, v4); /* q0 -= v4 */ \
\
/* Now do +3 side */ \
SIGNED_SHIFT_N(v3, 3); /* v3 >> 3 */ \
p0 = _mm_adds_epi8(p0, v3); /* p0 += v3 */ \
// input and output are int8_t
static WEBP_INLINE void DoSimpleFilter(__m128i* const p0, __m128i* const q0,
const __m128i* const fl) {
const __m128i k3 = _mm_set1_epi8(3);
const __m128i k4 = _mm_set1_epi8(4);
__m128i v3 = _mm_adds_epi8(*fl, k3);
__m128i v4 = _mm_adds_epi8(*fl, k4);
SignedShift8b(&v4); // v4 >> 3
SignedShift8b(&v3); // v3 >> 3
*q0 = _mm_subs_epi8(*q0, v4); // q0 -= v4
*p0 = _mm_adds_epi8(*p0, v3); // p0 += v3
}
// Updates values of 2 pixels at MB edge during complex filtering.
// Update operations:
// q = q - delta and p = p + delta; where delta = [(a_hi >> 7), (a_lo >> 7)]
#define UPDATE_2PIXELS(pi, qi, a_lo, a_hi) { \
const __m128i a_lo7 = _mm_srai_epi16(a_lo, 7); \
const __m128i a_hi7 = _mm_srai_epi16(a_hi, 7); \
const __m128i delta = _mm_packs_epi16(a_lo7, a_hi7); \
pi = _mm_adds_epi8(pi, delta); \
qi = _mm_subs_epi8(qi, delta); \
// Pixels 'pi' and 'qi' are int8_t on input, uint8_t on output (sign flip).
static WEBP_INLINE void Update2Pixels(__m128i* const pi, __m128i* const qi,
const __m128i* const a0_lo,
const __m128i* const a0_hi) {
const __m128i a1_lo = _mm_srai_epi16(*a0_lo, 7);
const __m128i a1_hi = _mm_srai_epi16(*a0_hi, 7);
const __m128i delta = _mm_packs_epi16(a1_lo, a1_hi);
const __m128i sign_bit = _mm_set1_epi8(0x80);
*pi = _mm_adds_epi8(*pi, delta);
*qi = _mm_subs_epi8(*qi, delta);
FLIP_SIGN_BIT2(*pi, *qi);
}
static void NeedsFilter(const __m128i* p1, const __m128i* p0, const __m128i* q0,
const __m128i* q1, int thresh, __m128i *mask) {
__m128i t1 = MM_ABS(*p1, *q1); // abs(p1 - q1)
*mask = _mm_set1_epi8(0xFE);
t1 = _mm_and_si128(t1, *mask); // set lsb of each byte to zero
t1 = _mm_srli_epi16(t1, 1); // abs(p1 - q1) / 2
// input pixels are uint8_t
static WEBP_INLINE void NeedsFilter(const __m128i* const p1,
const __m128i* const p0,
const __m128i* const q0,
const __m128i* const q1,
int thresh, __m128i* const mask) {
const __m128i m_thresh = _mm_set1_epi8(thresh);
const __m128i t1 = MM_ABS(*p1, *q1); // abs(p1 - q1)
const __m128i kFE = _mm_set1_epi8(0xFE);
const __m128i t2 = _mm_and_si128(t1, kFE); // set lsb of each byte to zero
const __m128i t3 = _mm_srli_epi16(t2, 1); // abs(p1 - q1) / 2
*mask = MM_ABS(*p0, *q0); // abs(p0 - q0)
*mask = _mm_adds_epu8(*mask, *mask); // abs(p0 - q0) * 2
*mask = _mm_adds_epu8(*mask, t1); // abs(p0 - q0) * 2 + abs(p1 - q1) / 2
const __m128i t4 = MM_ABS(*p0, *q0); // abs(p0 - q0)
const __m128i t5 = _mm_adds_epu8(t4, t4); // abs(p0 - q0) * 2
const __m128i t6 = _mm_adds_epu8(t5, t3); // abs(p0-q0)*2 + abs(p1-q1)/2
t1 = _mm_set1_epi8(thresh);
*mask = _mm_subs_epu8(*mask, t1); // mask <= thresh
*mask = _mm_cmpeq_epi8(*mask, _mm_setzero_si128());
const __m128i t7 = _mm_subs_epu8(t6, m_thresh); // mask <= m_thresh
*mask = _mm_cmpeq_epi8(t7, _mm_setzero_si128());
}
//------------------------------------------------------------------------------
// Edge filtering functions
// Applies filter on 2 pixels (p0 and q0)
static WEBP_INLINE void DoFilter2(const __m128i* p1, __m128i* p0, __m128i* q0,
const __m128i* q1, int thresh) {
static WEBP_INLINE void DoFilter2(__m128i* const p1, __m128i* const p0,
__m128i* const q0, __m128i* const q1,
int thresh) {
__m128i a, mask;
const __m128i sign_bit = _mm_set1_epi8(0x80);
// convert p1/q1 to int8_t (for GetBaseDelta)
const __m128i p1s = _mm_xor_si128(*p1, sign_bit);
const __m128i q1s = _mm_xor_si128(*q1, sign_bit);
NeedsFilter(p1, p0, q0, q1, thresh, &mask);
// convert to signed values
FLIP_SIGN_BIT2(*p0, *q0);
GET_BASE_DELTA(p1s, *p0, *q0, q1s, a);
GetBaseDelta(&p1s, p0, q0, &q1s, &a);
a = _mm_and_si128(a, mask); // mask filter values we don't care about
DO_SIMPLE_FILTER(*p0, *q0, a);
// unoffset
DoSimpleFilter(p0, q0, &a);
FLIP_SIGN_BIT2(*p0, *q0);
}
// Applies filter on 4 pixels (p1, p0, q0 and q1)
static WEBP_INLINE void DoFilter4(__m128i* p1, __m128i *p0,
__m128i* q0, __m128i* q1,
const __m128i* mask, int hev_thresh) {
static WEBP_INLINE void DoFilter4(__m128i* const p1, __m128i* const p0,
__m128i* const q0, __m128i* const q1,
const __m128i* const mask, int hev_thresh) {
const __m128i sign_bit = _mm_set1_epi8(0x80);
const __m128i k64 = _mm_set1_epi8(0x40);
const __m128i zero = _mm_setzero_si128();
__m128i not_hev;
__m128i t1, t2, t3;
const __m128i sign_bit = _mm_set1_epi8(0x80);
// compute hev mask
GET_NOTHEV(*p1, *p0, *q0, *q1, hev_thresh, not_hev);
GetNotHEV(p1, p0, q0, q1, hev_thresh, &not_hev);
// convert to signed values
FLIP_SIGN_BIT4(*p1, *p0, *q0, *q1);
@@ -433,92 +448,83 @@ static WEBP_INLINE void DoFilter4(__m128i* p1, __m128i *p0,
t1 = _mm_adds_epi8(t1, t2); // hev(p1 - q1) + 3 * (q0 - p0)
t1 = _mm_and_si128(t1, *mask); // mask filter values we don't care about
// Do +4 side
t2 = _mm_set1_epi8(4);
t2 = _mm_adds_epi8(t1, t2); // 3 * (q0 - p0) + (p1 - q1) + 4
SIGNED_SHIFT_N(t2, 3); // (3 * (q0 - p0) + hev(p1 - q1) + 4) >> 3
t3 = t2; // save t2
*q0 = _mm_subs_epi8(*q0, t2); // q0 -= t2
// Now do +3 side
t2 = _mm_set1_epi8(3);
t2 = _mm_adds_epi8(t1, t2); // +3 instead of +4
SIGNED_SHIFT_N(t2, 3); // (3 * (q0 - p0) + hev(p1 - q1) + 3) >> 3
t3 = _mm_set1_epi8(4);
t2 = _mm_adds_epi8(t1, t2); // 3 * (q0 - p0) + (p1 - q1) + 3
t3 = _mm_adds_epi8(t1, t3); // 3 * (q0 - p0) + (p1 - q1) + 4
SignedShift8b(&t2); // (3 * (q0 - p0) + hev(p1 - q1) + 3) >> 3
SignedShift8b(&t3); // (3 * (q0 - p0) + hev(p1 - q1) + 4) >> 3
*p0 = _mm_adds_epi8(*p0, t2); // p0 += t2
*q0 = _mm_subs_epi8(*q0, t3); // q0 -= t3
FLIP_SIGN_BIT2(*p0, *q0);
t2 = _mm_set1_epi8(1);
t3 = _mm_adds_epi8(t3, t2);
SIGNED_SHIFT_N(t3, 1); // (3 * (q0 - p0) + hev(p1 - q1) + 4) >> 4
// this is equivalent to signed (a + 1) >> 1 calculation
t2 = _mm_add_epi8(t3, sign_bit);
t3 = _mm_avg_epu8(t2, zero);
t3 = _mm_sub_epi8(t3, k64);
t3 = _mm_and_si128(not_hev, t3); // if !hev
*q1 = _mm_subs_epi8(*q1, t3); // q1 -= t3
*p1 = _mm_adds_epi8(*p1, t3); // p1 += t3
// unoffset
FLIP_SIGN_BIT4(*p1, *p0, *q0, *q1);
FLIP_SIGN_BIT2(*p1, *q1);
}
// Applies filter on 6 pixels (p2, p1, p0, q0, q1 and q2)
static WEBP_INLINE void DoFilter6(__m128i *p2, __m128i* p1, __m128i *p0,
__m128i* q0, __m128i* q1, __m128i *q2,
const __m128i* mask, int hev_thresh) {
__m128i a, not_hev;
static WEBP_INLINE void DoFilter6(__m128i* const p2, __m128i* const p1,
__m128i* const p0, __m128i* const q0,
__m128i* const q1, __m128i* const q2,
const __m128i* const mask, int hev_thresh) {
const __m128i zero = _mm_setzero_si128();
const __m128i sign_bit = _mm_set1_epi8(0x80);
__m128i a, not_hev;
// compute hev mask
GET_NOTHEV(*p1, *p0, *q0, *q1, hev_thresh, not_hev);
GetNotHEV(p1, p0, q0, q1, hev_thresh, &not_hev);
// convert to signed values
FLIP_SIGN_BIT4(*p1, *p0, *q0, *q1);
FLIP_SIGN_BIT2(*p2, *q2);
GET_BASE_DELTA(*p1, *p0, *q0, *q1, a);
GetBaseDelta(p1, p0, q0, q1, &a);
{ // do simple filter on pixels with hev
const __m128i m = _mm_andnot_si128(not_hev, *mask);
const __m128i f = _mm_and_si128(a, m);
DO_SIMPLE_FILTER(*p0, *q0, f);
DoSimpleFilter(p0, q0, &f);
}
{ // do strong filter on pixels with not hev
const __m128i zero = _mm_setzero_si128();
const __m128i nine = _mm_set1_epi16(0x0900);
const __m128i sixty_three = _mm_set1_epi16(63);
const __m128i k9 = _mm_set1_epi16(0x0900);
const __m128i k63 = _mm_set1_epi16(63);
const __m128i m = _mm_and_si128(not_hev, *mask);
const __m128i f = _mm_and_si128(a, m);
const __m128i f_lo = _mm_unpacklo_epi8(zero, f);
const __m128i f_hi = _mm_unpackhi_epi8(zero, f);
const __m128i f9_lo = _mm_mulhi_epi16(f_lo, nine); // Filter (lo) * 9
const __m128i f9_hi = _mm_mulhi_epi16(f_hi, nine); // Filter (hi) * 9
const __m128i f18_lo = _mm_add_epi16(f9_lo, f9_lo); // Filter (lo) * 18
const __m128i f18_hi = _mm_add_epi16(f9_hi, f9_hi); // Filter (hi) * 18
const __m128i f9_lo = _mm_mulhi_epi16(f_lo, k9); // Filter (lo) * 9
const __m128i f9_hi = _mm_mulhi_epi16(f_hi, k9); // Filter (hi) * 9
const __m128i a2_lo = _mm_add_epi16(f9_lo, sixty_three); // Filter * 9 + 63
const __m128i a2_hi = _mm_add_epi16(f9_hi, sixty_three); // Filter * 9 + 63
const __m128i a2_lo = _mm_add_epi16(f9_lo, k63); // Filter * 9 + 63
const __m128i a2_hi = _mm_add_epi16(f9_hi, k63); // Filter * 9 + 63
const __m128i a1_lo = _mm_add_epi16(f18_lo, sixty_three); // F... * 18 + 63
const __m128i a1_hi = _mm_add_epi16(f18_hi, sixty_three); // F... * 18 + 63
const __m128i a1_lo = _mm_add_epi16(a2_lo, f9_lo); // Filter * 18 + 63
const __m128i a1_hi = _mm_add_epi16(a2_hi, f9_hi); // Filter * 18 + 63
const __m128i a0_lo = _mm_add_epi16(f18_lo, a2_lo); // Filter * 27 + 63
const __m128i a0_hi = _mm_add_epi16(f18_hi, a2_hi); // Filter * 27 + 63
const __m128i a0_lo = _mm_add_epi16(a1_lo, f9_lo); // Filter * 27 + 63
const __m128i a0_hi = _mm_add_epi16(a1_hi, f9_hi); // Filter * 27 + 63
UPDATE_2PIXELS(*p2, *q2, a2_lo, a2_hi);
UPDATE_2PIXELS(*p1, *q1, a1_lo, a1_hi);
UPDATE_2PIXELS(*p0, *q0, a0_lo, a0_hi);
Update2Pixels(p2, q2, &a2_lo, &a2_hi);
Update2Pixels(p1, q1, &a1_lo, &a1_hi);
Update2Pixels(p0, q0, &a0_lo, &a0_hi);
}
// unoffset
FLIP_SIGN_BIT4(*p1, *p0, *q0, *q1);
FLIP_SIGN_BIT2(*p2, *q2);
}
// reads 8 rows across a vertical edge.
//
// TODO(somnath): Investigate _mm_shuffle* also see if it can be broken into
// two Load4x4() to avoid code duplication.
static WEBP_INLINE void Load8x4(const uint8_t* b, int stride,
__m128i* p, __m128i* q) {
static WEBP_INLINE void Load8x4(const uint8_t* const b, int stride,
__m128i* const p, __m128i* const q) {
__m128i t1, t2;
// Load 0th, 1st, 4th and 5th rows
@@ -557,10 +563,11 @@ static WEBP_INLINE void Load8x4(const uint8_t* b, int stride,
*q = _mm_unpackhi_epi32(t1, t2);
}
static WEBP_INLINE void Load16x4(const uint8_t* r0, const uint8_t* r8,
static WEBP_INLINE void Load16x4(const uint8_t* const r0,
const uint8_t* const r8,
int stride,
__m128i* p1, __m128i* p0,
__m128i* q0, __m128i* q1) {
__m128i* const p1, __m128i* const p0,
__m128i* const q0, __m128i* const q1) {
__m128i t1, t2;
// Assume the pixels around the edge (|) are numbered as follows
// 00 01 | 02 03
@@ -592,7 +599,7 @@ static WEBP_INLINE void Load16x4(const uint8_t* r0, const uint8_t* r8,
*q1 = _mm_unpackhi_epi64(t2, *q1);
}
static WEBP_INLINE void Store4x4(__m128i* x, uint8_t* dst, int stride) {
static WEBP_INLINE void Store4x4(__m128i* const x, uint8_t* dst, int stride) {
int i;
for (i = 0; i < 4; ++i, dst += stride) {
*((int32_t*)dst) = _mm_cvtsi128_si32(*x);
@@ -601,48 +608,51 @@ static WEBP_INLINE void Store4x4(__m128i* x, uint8_t* dst, int stride) {
}
// Transpose back and store
static WEBP_INLINE void Store16x4(uint8_t* r0, uint8_t* r8, int stride,
__m128i* p1, __m128i* p0,
__m128i* q0, __m128i* q1) {
__m128i t1;
static WEBP_INLINE void Store16x4(const __m128i* const p1,
const __m128i* const p0,
const __m128i* const q0,
const __m128i* const q1,
uint8_t* r0, uint8_t* r8,
int stride) {
__m128i t1, p1_s, p0_s, q0_s, q1_s;
// p0 = 71 70 61 60 51 50 41 40 31 30 21 20 11 10 01 00
// p1 = f1 f0 e1 e0 d1 d0 c1 c0 b1 b0 a1 a0 91 90 81 80
t1 = *p0;
*p0 = _mm_unpacklo_epi8(*p1, t1);
*p1 = _mm_unpackhi_epi8(*p1, t1);
p0_s = _mm_unpacklo_epi8(*p1, t1);
p1_s = _mm_unpackhi_epi8(*p1, t1);
// q0 = 73 72 63 62 53 52 43 42 33 32 23 22 13 12 03 02
// q1 = f3 f2 e3 e2 d3 d2 c3 c2 b3 b2 a3 a2 93 92 83 82
t1 = *q0;
*q0 = _mm_unpacklo_epi8(t1, *q1);
*q1 = _mm_unpackhi_epi8(t1, *q1);
q0_s = _mm_unpacklo_epi8(t1, *q1);
q1_s = _mm_unpackhi_epi8(t1, *q1);
// p0 = 33 32 31 30 23 22 21 20 13 12 11 10 03 02 01 00
// q0 = 73 72 71 70 63 62 61 60 53 52 51 50 43 42 41 40
t1 = *p0;
*p0 = _mm_unpacklo_epi16(t1, *q0);
*q0 = _mm_unpackhi_epi16(t1, *q0);
t1 = p0_s;
p0_s = _mm_unpacklo_epi16(t1, q0_s);
q0_s = _mm_unpackhi_epi16(t1, q0_s);
// p1 = b3 b2 b1 b0 a3 a2 a1 a0 93 92 91 90 83 82 81 80
// q1 = f3 f2 f1 f0 e3 e2 e1 e0 d3 d2 d1 d0 c3 c2 c1 c0
t1 = *p1;
*p1 = _mm_unpacklo_epi16(t1, *q1);
*q1 = _mm_unpackhi_epi16(t1, *q1);
t1 = p1_s;
p1_s = _mm_unpacklo_epi16(t1, q1_s);
q1_s = _mm_unpackhi_epi16(t1, q1_s);
Store4x4(p0, r0, stride);
Store4x4(&p0_s, r0, stride);
r0 += 4 * stride;
Store4x4(q0, r0, stride);
Store4x4(&q0_s, r0, stride);
Store4x4(p1, r8, stride);
Store4x4(&p1_s, r8, stride);
r8 += 4 * stride;
Store4x4(q1, r8, stride);
Store4x4(&q1_s, r8, stride);
}
//------------------------------------------------------------------------------
// Simple In-loop filtering (Paragraph 15.2)
static void SimpleVFilter16SSE2(uint8_t* p, int stride, int thresh) {
static void SimpleVFilter16(uint8_t* p, int stride, int thresh) {
// Load
__m128i p1 = _mm_loadu_si128((__m128i*)&p[-2 * stride]);
__m128i p0 = _mm_loadu_si128((__m128i*)&p[-stride]);
@@ -653,49 +663,49 @@ static void SimpleVFilter16SSE2(uint8_t* p, int stride, int thresh) {
// Store
_mm_storeu_si128((__m128i*)&p[-stride], p0);
_mm_storeu_si128((__m128i*)p, q0);
_mm_storeu_si128((__m128i*)&p[0], q0);
}
static void SimpleHFilter16SSE2(uint8_t* p, int stride, int thresh) {
static void SimpleHFilter16(uint8_t* p, int stride, int thresh) {
__m128i p1, p0, q0, q1;
p -= 2; // beginning of p1
Load16x4(p, p + 8 * stride, stride, &p1, &p0, &q0, &q1);
Load16x4(p, p + 8 * stride, stride, &p1, &p0, &q0, &q1);
DoFilter2(&p1, &p0, &q0, &q1, thresh);
Store16x4(p, p + 8 * stride, stride, &p1, &p0, &q0, &q1);
Store16x4(&p1, &p0, &q0, &q1, p, p + 8 * stride, stride);
}
static void SimpleVFilter16iSSE2(uint8_t* p, int stride, int thresh) {
static void SimpleVFilter16i(uint8_t* p, int stride, int thresh) {
int k;
for (k = 3; k > 0; --k) {
p += 4 * stride;
SimpleVFilter16SSE2(p, stride, thresh);
SimpleVFilter16(p, stride, thresh);
}
}
static void SimpleHFilter16iSSE2(uint8_t* p, int stride, int thresh) {
static void SimpleHFilter16i(uint8_t* p, int stride, int thresh) {
int k;
for (k = 3; k > 0; --k) {
p += 4;
SimpleHFilter16SSE2(p, stride, thresh);
SimpleHFilter16(p, stride, thresh);
}
}
//------------------------------------------------------------------------------
// Complex In-loop filtering (Paragraph 15.3)
#define MAX_DIFF1(p3, p2, p1, p0, m) { \
m = MM_ABS(p3, p2); \
m = _mm_max_epu8(m, MM_ABS(p2, p1)); \
m = _mm_max_epu8(m, MM_ABS(p1, p0)); \
}
#define MAX_DIFF2(p3, p2, p1, p0, m) { \
#define MAX_DIFF1(p3, p2, p1, p0, m) do { \
m = MM_ABS(p1, p0); \
m = _mm_max_epu8(m, MM_ABS(p3, p2)); \
m = _mm_max_epu8(m, MM_ABS(p2, p1)); \
} while (0)
#define MAX_DIFF2(p3, p2, p1, p0, m) do { \
m = _mm_max_epu8(m, MM_ABS(p1, p0)); \
}
m = _mm_max_epu8(m, MM_ABS(p3, p2)); \
m = _mm_max_epu8(m, MM_ABS(p2, p1)); \
} while (0)
#define LOAD_H_EDGES4(p, stride, e1, e2, e3, e4) { \
e1 = _mm_loadu_si128((__m128i*)&(p)[0 * stride]); \
@@ -704,10 +714,11 @@ static void SimpleHFilter16iSSE2(uint8_t* p, int stride, int thresh) {
e4 = _mm_loadu_si128((__m128i*)&(p)[3 * stride]); \
}
#define LOADUV_H_EDGE(p, u, v, stride) { \
p = _mm_loadl_epi64((__m128i*)&(u)[(stride)]); \
p = _mm_unpacklo_epi64(p, _mm_loadl_epi64((__m128i*)&(v)[(stride)])); \
}
#define LOADUV_H_EDGE(p, u, v, stride) do { \
const __m128i U = _mm_loadl_epi64((__m128i*)&(u)[(stride)]); \
const __m128i V = _mm_loadl_epi64((__m128i*)&(v)[(stride)]); \
p = _mm_unpacklo_epi64(U, V); \
} while (0)
#define LOADUV_H_EDGES4(u, v, stride, e1, e2, e3, e4) { \
LOADUV_H_EDGE(e1, u, v, 0 * stride); \
@@ -722,18 +733,23 @@ static void SimpleHFilter16iSSE2(uint8_t* p, int stride, int thresh) {
_mm_storel_epi64((__m128i*)&v[(stride)], p); \
}
#define COMPLEX_FL_MASK(p1, p0, q0, q1, thresh, ithresh, mask) { \
__m128i fl_yes; \
const __m128i it = _mm_set1_epi8(ithresh); \
mask = _mm_subs_epu8(mask, it); \
mask = _mm_cmpeq_epi8(mask, _mm_setzero_si128()); \
NeedsFilter(&p1, &p0, &q0, &q1, thresh, &fl_yes); \
mask = _mm_and_si128(mask, fl_yes); \
static WEBP_INLINE void ComplexMask(const __m128i* const p1,
const __m128i* const p0,
const __m128i* const q0,
const __m128i* const q1,
int thresh, int ithresh,
__m128i* const mask) {
const __m128i it = _mm_set1_epi8(ithresh);
const __m128i diff = _mm_subs_epu8(*mask, it);
const __m128i thresh_mask = _mm_cmpeq_epi8(diff, _mm_setzero_si128());
__m128i filter_mask;
NeedsFilter(p1, p0, q0, q1, thresh, &filter_mask);
*mask = _mm_and_si128(thresh_mask, filter_mask);
}
// on macroblock edges
static void VFilter16SSE2(uint8_t* p, int stride,
int thresh, int ithresh, int hev_thresh) {
static void VFilter16(uint8_t* p, int stride,
int thresh, int ithresh, int hev_thresh) {
__m128i t1;
__m128i mask;
__m128i p2, p1, p0, q0, q1, q2;
@@ -746,20 +762,20 @@ static void VFilter16SSE2(uint8_t* p, int stride,
LOAD_H_EDGES4(p, stride, q0, q1, q2, t1);
MAX_DIFF2(t1, q2, q1, q0, mask);
COMPLEX_FL_MASK(p1, p0, q0, q1, thresh, ithresh, mask);
ComplexMask(&p1, &p0, &q0, &q1, thresh, ithresh, &mask);
DoFilter6(&p2, &p1, &p0, &q0, &q1, &q2, &mask, hev_thresh);
// Store
_mm_storeu_si128((__m128i*)&p[-3 * stride], p2);
_mm_storeu_si128((__m128i*)&p[-2 * stride], p1);
_mm_storeu_si128((__m128i*)&p[-1 * stride], p0);
_mm_storeu_si128((__m128i*)&p[0 * stride], q0);
_mm_storeu_si128((__m128i*)&p[1 * stride], q1);
_mm_storeu_si128((__m128i*)&p[2 * stride], q2);
_mm_storeu_si128((__m128i*)&p[+0 * stride], q0);
_mm_storeu_si128((__m128i*)&p[+1 * stride], q1);
_mm_storeu_si128((__m128i*)&p[+2 * stride], q2);
}
static void HFilter16SSE2(uint8_t* p, int stride,
int thresh, int ithresh, int hev_thresh) {
static void HFilter16(uint8_t* p, int stride,
int thresh, int ithresh, int hev_thresh) {
__m128i mask;
__m128i p3, p2, p1, p0, q0, q1, q2, q3;
@@ -770,71 +786,78 @@ static void HFilter16SSE2(uint8_t* p, int stride,
Load16x4(p, p + 8 * stride, stride, &q0, &q1, &q2, &q3); // q0, q1, q2, q3
MAX_DIFF2(q3, q2, q1, q0, mask);
COMPLEX_FL_MASK(p1, p0, q0, q1, thresh, ithresh, mask);
ComplexMask(&p1, &p0, &q0, &q1, thresh, ithresh, &mask);
DoFilter6(&p2, &p1, &p0, &q0, &q1, &q2, &mask, hev_thresh);
Store16x4(b, b + 8 * stride, stride, &p3, &p2, &p1, &p0);
Store16x4(p, p + 8 * stride, stride, &q0, &q1, &q2, &q3);
Store16x4(&p3, &p2, &p1, &p0, b, b + 8 * stride, stride);
Store16x4(&q0, &q1, &q2, &q3, p, p + 8 * stride, stride);
}
// on three inner edges
static void VFilter16iSSE2(uint8_t* p, int stride,
int thresh, int ithresh, int hev_thresh) {
static void VFilter16i(uint8_t* p, int stride,
int thresh, int ithresh, int hev_thresh) {
int k;
__m128i mask;
__m128i t1, t2, p1, p0, q0, q1;
__m128i p3, p2, p1, p0; // loop invariants
LOAD_H_EDGES4(p, stride, p3, p2, p1, p0); // prologue
for (k = 3; k > 0; --k) {
// Load p3, p2, p1, p0
LOAD_H_EDGES4(p, stride, t2, t1, p1, p0);
MAX_DIFF1(t2, t1, p1, p0, mask);
__m128i mask, tmp1, tmp2;
uint8_t* const b = p + 2 * stride; // beginning of p1
p += 4 * stride;
// Load q0, q1, q2, q3
LOAD_H_EDGES4(p, stride, q0, q1, t1, t2);
MAX_DIFF2(t2, t1, q1, q0, mask);
MAX_DIFF1(p3, p2, p1, p0, mask); // compute partial mask
LOAD_H_EDGES4(p, stride, p3, p2, tmp1, tmp2);
MAX_DIFF2(p3, p2, tmp1, tmp2, mask);
COMPLEX_FL_MASK(p1, p0, q0, q1, thresh, ithresh, mask);
DoFilter4(&p1, &p0, &q0, &q1, &mask, hev_thresh);
// p3 and p2 are not just temporary variables here: they will be
// re-used for next span. And q2/q3 will become p1/p0 accordingly.
ComplexMask(&p1, &p0, &p3, &p2, thresh, ithresh, &mask);
DoFilter4(&p1, &p0, &p3, &p2, &mask, hev_thresh);
// Store
_mm_storeu_si128((__m128i*)&p[-2 * stride], p1);
_mm_storeu_si128((__m128i*)&p[-1 * stride], p0);
_mm_storeu_si128((__m128i*)&p[0 * stride], q0);
_mm_storeu_si128((__m128i*)&p[1 * stride], q1);
_mm_storeu_si128((__m128i*)&b[0 * stride], p1);
_mm_storeu_si128((__m128i*)&b[1 * stride], p0);
_mm_storeu_si128((__m128i*)&b[2 * stride], p3);
_mm_storeu_si128((__m128i*)&b[3 * stride], p2);
// rotate samples
p1 = tmp1;
p0 = tmp2;
}
}
static void HFilter16iSSE2(uint8_t* p, int stride,
int thresh, int ithresh, int hev_thresh) {
static void HFilter16i(uint8_t* p, int stride,
int thresh, int ithresh, int hev_thresh) {
int k;
uint8_t* b;
__m128i mask;
__m128i t1, t2, p1, p0, q0, q1;
__m128i p3, p2, p1, p0; // loop invariants
Load16x4(p, p + 8 * stride, stride, &p3, &p2, &p1, &p0); // prologue
for (k = 3; k > 0; --k) {
b = p;
Load16x4(b, b + 8 * stride, stride, &t2, &t1, &p1, &p0); // p3, p2, p1, p0
MAX_DIFF1(t2, t1, p1, p0, mask);
__m128i mask, tmp1, tmp2;
uint8_t* const b = p + 2; // beginning of p1
b += 4; // beginning of q0
Load16x4(b, b + 8 * stride, stride, &q0, &q1, &t1, &t2); // q0, q1, q2, q3
MAX_DIFF2(t2, t1, q1, q0, mask);
p += 4; // beginning of q0 (and next span)
COMPLEX_FL_MASK(p1, p0, q0, q1, thresh, ithresh, mask);
DoFilter4(&p1, &p0, &q0, &q1, &mask, hev_thresh);
MAX_DIFF1(p3, p2, p1, p0, mask); // compute partial mask
Load16x4(p, p + 8 * stride, stride, &p3, &p2, &tmp1, &tmp2);
MAX_DIFF2(p3, p2, tmp1, tmp2, mask);
b -= 2; // beginning of p1
Store16x4(b, b + 8 * stride, stride, &p1, &p0, &q0, &q1);
ComplexMask(&p1, &p0, &p3, &p2, thresh, ithresh, &mask);
DoFilter4(&p1, &p0, &p3, &p2, &mask, hev_thresh);
p += 4;
Store16x4(&p1, &p0, &p3, &p2, b, b + 8 * stride, stride);
// rotate samples
p1 = tmp1;
p0 = tmp2;
}
}
// 8-pixels wide variant, for chroma filtering
static void VFilter8SSE2(uint8_t* u, uint8_t* v, int stride,
int thresh, int ithresh, int hev_thresh) {
static void VFilter8(uint8_t* u, uint8_t* v, int stride,
int thresh, int ithresh, int hev_thresh) {
__m128i mask;
__m128i t1, p2, p1, p0, q0, q1, q2;
@@ -846,7 +869,7 @@ static void VFilter8SSE2(uint8_t* u, uint8_t* v, int stride,
LOADUV_H_EDGES4(u, v, stride, q0, q1, q2, t1);
MAX_DIFF2(t1, q2, q1, q0, mask);
COMPLEX_FL_MASK(p1, p0, q0, q1, thresh, ithresh, mask);
ComplexMask(&p1, &p0, &q0, &q1, thresh, ithresh, &mask);
DoFilter6(&p2, &p1, &p0, &q0, &q1, &q2, &mask, hev_thresh);
// Store
@@ -858,8 +881,8 @@ static void VFilter8SSE2(uint8_t* u, uint8_t* v, int stride,
STOREUV(q2, u, v, 2 * stride);
}
static void HFilter8SSE2(uint8_t* u, uint8_t* v, int stride,
int thresh, int ithresh, int hev_thresh) {
static void HFilter8(uint8_t* u, uint8_t* v, int stride,
int thresh, int ithresh, int hev_thresh) {
__m128i mask;
__m128i p3, p2, p1, p0, q0, q1, q2, q3;
@@ -871,15 +894,15 @@ static void HFilter8SSE2(uint8_t* u, uint8_t* v, int stride,
Load16x4(u, v, stride, &q0, &q1, &q2, &q3); // q0, q1, q2, q3
MAX_DIFF2(q3, q2, q1, q0, mask);
COMPLEX_FL_MASK(p1, p0, q0, q1, thresh, ithresh, mask);
ComplexMask(&p1, &p0, &q0, &q1, thresh, ithresh, &mask);
DoFilter6(&p2, &p1, &p0, &q0, &q1, &q2, &mask, hev_thresh);
Store16x4(tu, tv, stride, &p3, &p2, &p1, &p0);
Store16x4(u, v, stride, &q0, &q1, &q2, &q3);
Store16x4(&p3, &p2, &p1, &p0, tu, tv, stride);
Store16x4(&q0, &q1, &q2, &q3, u, v, stride);
}
static void VFilter8iSSE2(uint8_t* u, uint8_t* v, int stride,
int thresh, int ithresh, int hev_thresh) {
static void VFilter8i(uint8_t* u, uint8_t* v, int stride,
int thresh, int ithresh, int hev_thresh) {
__m128i mask;
__m128i t1, t2, p1, p0, q0, q1;
@@ -894,7 +917,7 @@ static void VFilter8iSSE2(uint8_t* u, uint8_t* v, int stride,
LOADUV_H_EDGES4(u, v, stride, q0, q1, t1, t2);
MAX_DIFF2(t2, t1, q1, q0, mask);
COMPLEX_FL_MASK(p1, p0, q0, q1, thresh, ithresh, mask);
ComplexMask(&p1, &p0, &q0, &q1, thresh, ithresh, &mask);
DoFilter4(&p1, &p0, &q0, &q1, &mask, hev_thresh);
// Store
@@ -904,8 +927,8 @@ static void VFilter8iSSE2(uint8_t* u, uint8_t* v, int stride,
STOREUV(q1, u, v, 1 * stride);
}
static void HFilter8iSSE2(uint8_t* u, uint8_t* v, int stride,
int thresh, int ithresh, int hev_thresh) {
static void HFilter8i(uint8_t* u, uint8_t* v, int stride,
int thresh, int ithresh, int hev_thresh) {
__m128i mask;
__m128i t1, t2, p1, p0, q0, q1;
Load16x4(u, v, stride, &t2, &t1, &p1, &p0); // p3, p2, p1, p0
@@ -916,12 +939,12 @@ static void HFilter8iSSE2(uint8_t* u, uint8_t* v, int stride,
Load16x4(u, v, stride, &q0, &q1, &t1, &t2); // q0, q1, q2, q3
MAX_DIFF2(t2, t1, q1, q0, mask);
COMPLEX_FL_MASK(p1, p0, q0, q1, thresh, ithresh, mask);
ComplexMask(&p1, &p0, &q0, &q1, thresh, ithresh, &mask);
DoFilter4(&p1, &p0, &q0, &q1, &mask, hev_thresh);
u -= 2; // beginning of p1
v -= 2;
Store16x4(u, v, stride, &p1, &p0, &q0, &q1);
Store16x4(&p1, &p0, &q0, &q1, u, v, stride);
}
#endif // WEBP_USE_SSE2
@@ -933,24 +956,23 @@ extern void VP8DspInitSSE2(void);
void VP8DspInitSSE2(void) {
#if defined(WEBP_USE_SSE2)
VP8Transform = TransformSSE2;
VP8Transform = Transform;
#if defined(USE_TRANSFORM_AC3)
VP8TransformAC3 = TransformAC3SSE2;
VP8TransformAC3 = TransformAC3;
#endif
VP8VFilter16 = VFilter16SSE2;
VP8HFilter16 = HFilter16SSE2;
VP8VFilter8 = VFilter8SSE2;
VP8HFilter8 = HFilter8SSE2;
VP8VFilter16i = VFilter16iSSE2;
VP8HFilter16i = HFilter16iSSE2;
VP8VFilter8i = VFilter8iSSE2;
VP8HFilter8i = HFilter8iSSE2;
VP8VFilter16 = VFilter16;
VP8HFilter16 = HFilter16;
VP8VFilter8 = VFilter8;
VP8HFilter8 = HFilter8;
VP8VFilter16i = VFilter16i;
VP8HFilter16i = HFilter16i;
VP8VFilter8i = VFilter8i;
VP8HFilter8i = HFilter8i;
VP8SimpleVFilter16 = SimpleVFilter16SSE2;
VP8SimpleHFilter16 = SimpleHFilter16SSE2;
VP8SimpleVFilter16i = SimpleVFilter16iSSE2;
VP8SimpleHFilter16i = SimpleHFilter16iSSE2;
VP8SimpleVFilter16 = SimpleVFilter16;
VP8SimpleHFilter16 = SimpleHFilter16;
VP8SimpleVFilter16i = SimpleVFilter16i;
VP8SimpleHFilter16i = SimpleHFilter16i;
#endif // WEBP_USE_SSE2
}

View File

@@ -14,6 +14,10 @@
#ifndef WEBP_DSP_DSP_H_
#define WEBP_DSP_DSP_H_
#ifdef HAVE_CONFIG_H
#include "../webp/config.h"
#endif
#include "../webp/types.h"
#ifdef __cplusplus
@@ -23,27 +27,63 @@ extern "C" {
//------------------------------------------------------------------------------
// CPU detection
#if defined(__GNUC__)
# define LOCAL_GCC_VERSION ((__GNUC__ << 8) | __GNUC_MINOR__)
# define LOCAL_GCC_PREREQ(maj, min) \
(LOCAL_GCC_VERSION >= (((maj) << 8) | (min)))
#else
# define LOCAL_GCC_VERSION 0
# define LOCAL_GCC_PREREQ(maj, min) 0
#endif
#ifdef __clang__
# define LOCAL_CLANG_VERSION ((__clang_major__ << 8) | __clang_minor__)
# define LOCAL_CLANG_PREREQ(maj, min) \
(LOCAL_CLANG_VERSION >= (((maj) << 8) | (min)))
#else
# define LOCAL_CLANG_VERSION 0
# define LOCAL_CLANG_PREREQ(maj, min) 0
#endif // __clang__
#if defined(_MSC_VER) && _MSC_VER > 1310 && \
(defined(_M_X64) || defined(_M_IX86))
#define WEBP_MSC_SSE2 // Visual C++ SSE2 targets
#endif
#if defined(__SSE2__) || defined(WEBP_MSC_SSE2)
// WEBP_HAVE_* are used to indicate the presence of the instruction set in dsp
// files without intrinsics, allowing the corresponding Init() to be called.
// Files containing intrinsics will need to be built targeting the instruction
// set so should succeed on one of the earlier tests.
#if defined(__SSE2__) || defined(WEBP_MSC_SSE2) || defined(WEBP_HAVE_SSE2)
#define WEBP_USE_SSE2
#endif
#if defined(__AVX2__) || defined(WEBP_HAVE_AVX2)
#define WEBP_USE_AVX2
#endif
#if defined(__ANDROID__) && defined(__ARM_ARCH_7A__)
#define WEBP_ANDROID_NEON // Android targets that might support NEON
#endif
#if defined(__ARM_NEON__) || defined(WEBP_ANDROID_NEON)
#if defined(__ARM_NEON__) || defined(WEBP_ANDROID_NEON) || defined(__aarch64__)
#define WEBP_USE_NEON
#endif
#if defined(__mips__) && !defined(__mips64) && (__mips_isa_rev < 6)
#define WEBP_USE_MIPS32
#if (__mips_isa_rev >= 2)
#define WEBP_USE_MIPS32_R2
#endif
#endif
typedef enum {
kSSE2,
kSSE3,
kNEON
kAVX,
kAVX2,
kNEON,
kMIPS32
} CPUFeature;
// returns true if the CPU supports the feature.
typedef int (*VP8CPUInfo)(CPUFeature feature);
@@ -61,7 +101,6 @@ typedef void (*VP8Fdct)(const uint8_t* src, const uint8_t* ref, int16_t* out);
typedef void (*VP8WHT)(const int16_t* in, int16_t* out);
extern VP8Idct VP8ITransform;
extern VP8Fdct VP8FTransform;
extern VP8WHT VP8ITransformWHT;
extern VP8WHT VP8FTransformWHT;
// Predictions
// *dst is the destination block. *top and *left can be NULL.
@@ -83,7 +122,7 @@ extern VP8BlockCopy VP8Copy4x4;
// Quantization
struct VP8Matrix; // forward declaration
typedef int (*VP8QuantizeBlock)(int16_t in[16], int16_t out[16],
int n, const struct VP8Matrix* const mtx);
const struct VP8Matrix* const mtx);
extern VP8QuantizeBlock VP8EncQuantizeBlock;
// specific to 2nd transform:
@@ -121,6 +160,13 @@ extern const VP8PredFunc VP8PredLuma16[/* NUM_B_DC_MODES */];
extern const VP8PredFunc VP8PredChroma8[/* NUM_B_DC_MODES */];
extern const VP8PredFunc VP8PredLuma4[/* NUM_BMODES */];
// clipping tables (for filtering)
extern const int8_t* const VP8ksclip1; // clips [-1020, 1020] to [-128, 127]
extern const int8_t* const VP8ksclip2; // clips [-112, 112] to [-16, 15]
extern const uint8_t* const VP8kclip1; // clips [-255,511] to [0,255]
extern const uint8_t* const VP8kabs0; // abs(x) for x in [-255,255]
void VP8InitClipTables(void); // must be called first
// simple filter (only for luma)
typedef void (*VP8SimpleFilterFunc)(uint8_t* p, int stride, int thresh);
extern VP8SimpleFilterFunc VP8SimpleVFilter16;
@@ -166,21 +212,20 @@ typedef void (*WebPUpsampleLinePairFunc)(
// Fancy upsampling functions to convert YUV to RGB(A) modes
extern WebPUpsampleLinePairFunc WebPUpsamplers[/* MODE_LAST */];
// Initializes SSE2 version of the fancy upsamplers.
void WebPInitUpsamplersSSE2(void);
// NEON version
void WebPInitUpsamplersNEON(void);
#endif // FANCY_UPSAMPLING
// Point-sampling methods.
typedef void (*WebPSampleLinePairFunc)(
const uint8_t* top_y, const uint8_t* bottom_y,
const uint8_t* u, const uint8_t* v,
uint8_t* top_dst, uint8_t* bottom_dst, int len);
// Per-row point-sampling methods.
typedef void (*WebPSamplerRowFunc)(const uint8_t* y,
const uint8_t* u, const uint8_t* v,
uint8_t* dst, int len);
// Generic function to apply 'WebPSamplerRowFunc' to the whole plane:
void WebPSamplerProcessPlane(const uint8_t* y, int y_stride,
const uint8_t* u, const uint8_t* v, int uv_stride,
uint8_t* dst, int dst_stride,
int width, int height, WebPSamplerRowFunc func);
extern const WebPSampleLinePairFunc WebPSamplers[/* MODE_LAST */];
// Sampling functions to convert rows of YUV to RGB(A)
extern WebPSamplerRowFunc WebPSamplers[/* MODE_LAST */];
// General function for converting two lines of ARGB or RGBA.
// 'alpha_is_last' should be true if 0xff000000 is stored in memory as
@@ -194,11 +239,14 @@ typedef void (*WebPYUV444Converter)(const uint8_t* y,
extern const WebPYUV444Converter WebPYUV444Converters[/* MODE_LAST */];
// Main function to be called
// Must be called before using the WebPUpsamplers[] (and for premultiplied
// colorspaces like rgbA, rgbA4444, etc)
void WebPInitUpsamplers(void);
// Must be called before using WebPSamplers[]
void WebPInitSamplers(void);
//------------------------------------------------------------------------------
// Pre-multiply planes with alpha values
// Utilities for processing transparent channel.
// Apply alpha pre-multiply on an rgba, bgra or argb plane of size w * h.
// alpha_first should be 0 for argb, 1 for rgba or bgra (where alpha is last).
@@ -209,13 +257,34 @@ extern void (*WebPApplyAlphaMultiply)(
extern void (*WebPApplyAlphaMultiply4444)(
uint8_t* rgba4444, int w, int h, int stride);
// Extract the alpha values from 32b values in argb[] and pack them into alpha[]
// (this is the opposite of WebPDispatchAlpha).
// Returns true if there's only trivial 0xff alpha values.
extern int (*WebPExtractAlpha)(const uint8_t* argb, int argb_stride,
int width, int height,
uint8_t* alpha, int alpha_stride);
// Pre-Multiply operation transforms x into x * A / 255 (where x=Y,R,G or B).
// Un-Multiply operation transforms x into x * 255 / A.
// Pre-Multiply or Un-Multiply (if 'inverse' is true) argb values in a row.
extern void (*WebPMultARGBRow)(uint32_t* const ptr, int width, int inverse);
// Same a WebPMultARGBRow(), but for several rows.
void WebPMultARGBRows(uint8_t* ptr, int stride, int width, int num_rows,
int inverse);
// Same for a row of single values, with side alpha values.
extern void (*WebPMultRow)(uint8_t* const ptr, const uint8_t* const alpha,
int width, int inverse);
// Same a WebPMultRow(), but for several 'num_rows' rows.
void WebPMultRows(uint8_t* ptr, int stride,
const uint8_t* alpha, int alpha_stride,
int width, int num_rows, int inverse);
// To be called first before using the above.
void WebPInitPremultiply(void);
void WebPInitPremultiplySSE2(void); // should not be called directly.
void WebPInitPremultiplyNEON(void);
//------------------------------------------------------------------------------
void WebPInitAlphaProcessing(void);
#ifdef __cplusplus
} // extern "C"

View File

@@ -159,33 +159,6 @@ static void FTransform(const uint8_t* src, const uint8_t* ref, int16_t* out) {
}
}
static void ITransformWHT(const int16_t* in, int16_t* out) {
int tmp[16];
int i;
for (i = 0; i < 4; ++i) {
const int a0 = in[0 + i] + in[12 + i];
const int a1 = in[4 + i] + in[ 8 + i];
const int a2 = in[4 + i] - in[ 8 + i];
const int a3 = in[0 + i] - in[12 + i];
tmp[0 + i] = a0 + a1;
tmp[8 + i] = a0 - a1;
tmp[4 + i] = a3 + a2;
tmp[12 + i] = a3 - a2;
}
for (i = 0; i < 4; ++i) {
const int dc = tmp[0 + i * 4] + 3; // w/ rounder
const int a0 = dc + tmp[3 + i * 4];
const int a1 = tmp[1 + i * 4] + tmp[2 + i * 4];
const int a2 = tmp[1 + i * 4] - tmp[2 + i * 4];
const int a3 = dc - tmp[3 + i * 4];
out[ 0] = (a0 + a1) >> 3;
out[16] = (a3 + a2) >> 3;
out[32] = (a0 - a1) >> 3;
out[48] = (a3 - a2) >> 3;
out += 64;
}
}
static void FTransformWHT(const int16_t* in, int16_t* out) {
// input is 12b signed
int32_t tmp[16];
@@ -627,21 +600,23 @@ static const uint8_t kZigzag[16] = {
// Simple quantization
static int QuantizeBlock(int16_t in[16], int16_t out[16],
int n, const VP8Matrix* const mtx) {
const VP8Matrix* const mtx) {
int last = -1;
for (; n < 16; ++n) {
int n;
for (n = 0; n < 16; ++n) {
const int j = kZigzag[n];
const int sign = (in[j] < 0);
const int coeff = (sign ? -in[j] : in[j]) + mtx->sharpen_[j];
const uint32_t coeff = (sign ? -in[j] : in[j]) + mtx->sharpen_[j];
if (coeff > mtx->zthresh_[j]) {
const int Q = mtx->q_[j];
const int iQ = mtx->iq_[j];
const int B = mtx->bias_[j];
out[n] = QUANTDIV(coeff, iQ, B);
if (out[n] > MAX_LEVEL) out[n] = MAX_LEVEL;
if (sign) out[n] = -out[n];
in[j] = out[n] * Q;
if (out[n]) last = n;
const uint32_t Q = mtx->q_[j];
const uint32_t iQ = mtx->iq_[j];
const uint32_t B = mtx->bias_[j];
int level = QUANTDIV(coeff, iQ, B);
if (level > MAX_LEVEL) level = MAX_LEVEL;
if (sign) level = -level;
in[j] = level * Q;
out[n] = level;
if (level) last = n;
} else {
out[n] = 0;
in[j] = 0;
@@ -656,17 +631,18 @@ static int QuantizeBlockWHT(int16_t in[16], int16_t out[16],
for (n = 0; n < 16; ++n) {
const int j = kZigzag[n];
const int sign = (in[j] < 0);
const int coeff = sign ? -in[j] : in[j];
const uint32_t coeff = sign ? -in[j] : in[j];
assert(mtx->sharpen_[j] == 0);
if (coeff > mtx->zthresh_[j]) {
const int Q = mtx->q_[j];
const int iQ = mtx->iq_[j];
const int B = mtx->bias_[j];
out[n] = QUANTDIV(coeff, iQ, B);
if (out[n] > MAX_LEVEL) out[n] = MAX_LEVEL;
if (sign) out[n] = -out[n];
in[j] = out[n] * Q;
if (out[n]) last = n;
const uint32_t Q = mtx->q_[j];
const uint32_t iQ = mtx->iq_[j];
const uint32_t B = mtx->bias_[j];
int level = QUANTDIV(coeff, iQ, B);
if (level > MAX_LEVEL) level = MAX_LEVEL;
if (sign) level = -level;
in[j] = level * Q;
out[n] = level;
if (level) last = n;
} else {
out[n] = 0;
in[j] = 0;
@@ -697,7 +673,6 @@ static void Copy4x4(const uint8_t* src, uint8_t* dst) { Copy(src, dst, 4); }
VP8CHisto VP8CollectHistogram;
VP8Idct VP8ITransform;
VP8Fdct VP8FTransform;
VP8WHT VP8ITransformWHT;
VP8WHT VP8FTransformWHT;
VP8Intra4Preds VP8EncPredLuma4;
VP8IntraPreds VP8EncPredLuma16;
@@ -713,16 +688,23 @@ VP8QuantizeBlockWHT VP8EncQuantizeBlockWHT;
VP8BlockCopy VP8Copy4x4;
extern void VP8EncDspInitSSE2(void);
extern void VP8EncDspInitAVX2(void);
extern void VP8EncDspInitNEON(void);
extern void VP8EncDspInitMIPS32(void);
static volatile VP8CPUInfo enc_last_cpuinfo_used =
(VP8CPUInfo)&enc_last_cpuinfo_used;
void VP8EncDspInit(void) {
if (enc_last_cpuinfo_used == VP8GetCPUInfo) return;
VP8DspInit(); // common inverse transforms
InitTables();
// default C implementations
VP8CollectHistogram = CollectHistogram;
VP8ITransform = ITransform;
VP8FTransform = FTransform;
VP8ITransformWHT = ITransformWHT;
VP8FTransformWHT = FTransformWHT;
VP8EncPredLuma4 = Intra4Preds;
VP8EncPredLuma16 = Intra16Preds;
@@ -738,16 +720,28 @@ void VP8EncDspInit(void) {
VP8Copy4x4 = Copy4x4;
// If defined, use CPUInfo() to overwrite some pointers with faster versions.
if (VP8GetCPUInfo) {
if (VP8GetCPUInfo != NULL) {
#if defined(WEBP_USE_SSE2)
if (VP8GetCPUInfo(kSSE2)) {
VP8EncDspInitSSE2();
}
#elif defined(WEBP_USE_NEON)
#endif
#if defined(WEBP_USE_AVX2)
if (VP8GetCPUInfo(kAVX2)) {
VP8EncDspInitAVX2();
}
#endif
#if defined(WEBP_USE_NEON)
if (VP8GetCPUInfo(kNEON)) {
VP8EncDspInitNEON();
}
#endif
#if defined(WEBP_USE_MIPS32)
if (VP8GetCPUInfo(kMIPS32)) {
VP8EncDspInitMIPS32();
}
#endif
}
enc_last_cpuinfo_used = VP8GetCPUInfo;
}

View File

@@ -1,4 +1,4 @@
// Copyright 2011 Google Inc. All Rights Reserved.
// Copyright 2014 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
@@ -7,24 +7,18 @@
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// Enhancement layer (for YUV444/422)
//
// Author: Skal (pascal.massimino@gmail.com)
// AVX2 version of speed-critical encoding functions.
#include <assert.h>
#include <stdlib.h>
#include "./dsp.h"
#include "./vp8i.h"
#if defined(WEBP_USE_AVX2)
#endif // WEBP_USE_AVX2
//------------------------------------------------------------------------------
// Entry point
int VP8DecodeLayer(VP8Decoder* const dec) {
assert(dec);
assert(dec->layer_data_size_ > 0);
(void)dec;
extern void VP8EncDspInitAVX2(void);
// TODO: handle enhancement layer here.
return 1;
void VP8EncDspInitAVX2(void) {
}

776
src/dsp/enc_mips32.c Normal file
View File

@@ -0,0 +1,776 @@
// Copyright 2014 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// MIPS version of speed-critical encoding functions.
//
// Author(s): Djordje Pesut (djordje.pesut@imgtec.com)
// Jovan Zelincevic (jovan.zelincevic@imgtec.com)
// Slobodan Prijic (slobodan.prijic@imgtec.com)
#include "./dsp.h"
#if defined(WEBP_USE_MIPS32)
#include "../enc/vp8enci.h"
#include "../enc/cost.h"
#if defined(__GNUC__) && defined(__ANDROID__) && LOCAL_GCC_VERSION == 0x409
#define WORK_AROUND_GCC
#endif
static const int kC1 = 20091 + (1 << 16);
static const int kC2 = 35468;
// macro for one vertical pass in ITransformOne
// MUL macro inlined
// temp0..temp15 holds tmp[0]..tmp[15]
// A..D - offsets in bytes to load from in buffer
// TEMP0..TEMP3 - registers for corresponding tmp elements
// TEMP4..TEMP5 - temporary registers
#define VERTICAL_PASS(A, B, C, D, TEMP4, TEMP0, TEMP1, TEMP2, TEMP3) \
"lh %[temp16], "#A"(%[temp20]) \n\t" \
"lh %[temp18], "#B"(%[temp20]) \n\t" \
"lh %[temp17], "#C"(%[temp20]) \n\t" \
"lh %[temp19], "#D"(%[temp20]) \n\t" \
"addu %["#TEMP4"], %[temp16], %[temp18] \n\t" \
"subu %[temp16], %[temp16], %[temp18] \n\t" \
"mul %["#TEMP0"], %[temp17], %[kC2] \n\t" \
"mul %[temp18], %[temp19], %[kC1] \n\t" \
"mul %[temp17], %[temp17], %[kC1] \n\t" \
"mul %[temp19], %[temp19], %[kC2] \n\t" \
"sra %["#TEMP0"], %["#TEMP0"], 16 \n\n" \
"sra %[temp18], %[temp18], 16 \n\n" \
"sra %[temp17], %[temp17], 16 \n\n" \
"sra %[temp19], %[temp19], 16 \n\n" \
"subu %["#TEMP2"], %["#TEMP0"], %[temp18] \n\t" \
"addu %["#TEMP3"], %[temp17], %[temp19] \n\t" \
"addu %["#TEMP0"], %["#TEMP4"], %["#TEMP3"] \n\t" \
"addu %["#TEMP1"], %[temp16], %["#TEMP2"] \n\t" \
"subu %["#TEMP2"], %[temp16], %["#TEMP2"] \n\t" \
"subu %["#TEMP3"], %["#TEMP4"], %["#TEMP3"] \n\t"
// macro for one horizontal pass in ITransformOne
// MUL and STORE macros inlined
// a = clip_8b(a) is replaced with: a = max(a, 0); a = min(a, 255)
// temp0..temp15 holds tmp[0]..tmp[15]
// A..D - offsets in bytes to load from ref and store to dst buffer
// TEMP0, TEMP4, TEMP8 and TEMP12 - registers for corresponding tmp elements
#define HORIZONTAL_PASS(A, B, C, D, TEMP0, TEMP4, TEMP8, TEMP12) \
"addiu %["#TEMP0"], %["#TEMP0"], 4 \n\t" \
"addu %[temp16], %["#TEMP0"], %["#TEMP8"] \n\t" \
"subu %[temp17], %["#TEMP0"], %["#TEMP8"] \n\t" \
"mul %["#TEMP0"], %["#TEMP4"], %[kC2] \n\t" \
"mul %["#TEMP8"], %["#TEMP12"], %[kC1] \n\t" \
"mul %["#TEMP4"], %["#TEMP4"], %[kC1] \n\t" \
"mul %["#TEMP12"], %["#TEMP12"], %[kC2] \n\t" \
"sra %["#TEMP0"], %["#TEMP0"], 16 \n\t" \
"sra %["#TEMP8"], %["#TEMP8"], 16 \n\t" \
"sra %["#TEMP4"], %["#TEMP4"], 16 \n\t" \
"sra %["#TEMP12"], %["#TEMP12"], 16 \n\t" \
"subu %[temp18], %["#TEMP0"], %["#TEMP8"] \n\t" \
"addu %[temp19], %["#TEMP4"], %["#TEMP12"] \n\t" \
"addu %["#TEMP0"], %[temp16], %[temp19] \n\t" \
"addu %["#TEMP4"], %[temp17], %[temp18] \n\t" \
"subu %["#TEMP8"], %[temp17], %[temp18] \n\t" \
"subu %["#TEMP12"], %[temp16], %[temp19] \n\t" \
"lw %[temp20], 0(%[args]) \n\t" \
"sra %["#TEMP0"], %["#TEMP0"], 3 \n\t" \
"sra %["#TEMP4"], %["#TEMP4"], 3 \n\t" \
"sra %["#TEMP8"], %["#TEMP8"], 3 \n\t" \
"sra %["#TEMP12"], %["#TEMP12"], 3 \n\t" \
"lbu %[temp16], "#A"(%[temp20]) \n\t" \
"lbu %[temp17], "#B"(%[temp20]) \n\t" \
"lbu %[temp18], "#C"(%[temp20]) \n\t" \
"lbu %[temp19], "#D"(%[temp20]) \n\t" \
"addu %["#TEMP0"], %[temp16], %["#TEMP0"] \n\t" \
"addu %["#TEMP4"], %[temp17], %["#TEMP4"] \n\t" \
"addu %["#TEMP8"], %[temp18], %["#TEMP8"] \n\t" \
"addu %["#TEMP12"], %[temp19], %["#TEMP12"] \n\t" \
"slt %[temp16], %["#TEMP0"], $zero \n\t" \
"slt %[temp17], %["#TEMP4"], $zero \n\t" \
"slt %[temp18], %["#TEMP8"], $zero \n\t" \
"slt %[temp19], %["#TEMP12"], $zero \n\t" \
"movn %["#TEMP0"], $zero, %[temp16] \n\t" \
"movn %["#TEMP4"], $zero, %[temp17] \n\t" \
"movn %["#TEMP8"], $zero, %[temp18] \n\t" \
"movn %["#TEMP12"], $zero, %[temp19] \n\t" \
"addiu %[temp20], $zero, 255 \n\t" \
"slt %[temp16], %["#TEMP0"], %[temp20] \n\t" \
"slt %[temp17], %["#TEMP4"], %[temp20] \n\t" \
"slt %[temp18], %["#TEMP8"], %[temp20] \n\t" \
"slt %[temp19], %["#TEMP12"], %[temp20] \n\t" \
"movz %["#TEMP0"], %[temp20], %[temp16] \n\t" \
"movz %["#TEMP4"], %[temp20], %[temp17] \n\t" \
"lw %[temp16], 8(%[args]) \n\t" \
"movz %["#TEMP8"], %[temp20], %[temp18] \n\t" \
"movz %["#TEMP12"], %[temp20], %[temp19] \n\t" \
"sb %["#TEMP0"], "#A"(%[temp16]) \n\t" \
"sb %["#TEMP4"], "#B"(%[temp16]) \n\t" \
"sb %["#TEMP8"], "#C"(%[temp16]) \n\t" \
"sb %["#TEMP12"], "#D"(%[temp16]) \n\t"
// Does one or two inverse transforms.
static WEBP_INLINE void ITransformOne(const uint8_t* ref, const int16_t* in,
uint8_t* dst) {
int temp0, temp1, temp2, temp3, temp4, temp5, temp6;
int temp7, temp8, temp9, temp10, temp11, temp12, temp13;
int temp14, temp15, temp16, temp17, temp18, temp19, temp20;
const int* args[3] = {(const int*)ref, (const int*)in, (const int*)dst};
__asm__ volatile(
"lw %[temp20], 4(%[args]) \n\t"
VERTICAL_PASS(0, 16, 8, 24, temp4, temp0, temp1, temp2, temp3)
VERTICAL_PASS(2, 18, 10, 26, temp8, temp4, temp5, temp6, temp7)
VERTICAL_PASS(4, 20, 12, 28, temp12, temp8, temp9, temp10, temp11)
VERTICAL_PASS(6, 22, 14, 30, temp20, temp12, temp13, temp14, temp15)
HORIZONTAL_PASS( 0, 1, 2, 3, temp0, temp4, temp8, temp12)
HORIZONTAL_PASS(16, 17, 18, 19, temp1, temp5, temp9, temp13)
HORIZONTAL_PASS(32, 33, 34, 35, temp2, temp6, temp10, temp14)
HORIZONTAL_PASS(48, 49, 50, 51, temp3, temp7, temp11, temp15)
: [temp0]"=&r"(temp0), [temp1]"=&r"(temp1), [temp2]"=&r"(temp2),
[temp3]"=&r"(temp3), [temp4]"=&r"(temp4), [temp5]"=&r"(temp5),
[temp6]"=&r"(temp6), [temp7]"=&r"(temp7), [temp8]"=&r"(temp8),
[temp9]"=&r"(temp9), [temp10]"=&r"(temp10), [temp11]"=&r"(temp11),
[temp12]"=&r"(temp12), [temp13]"=&r"(temp13), [temp14]"=&r"(temp14),
[temp15]"=&r"(temp15), [temp16]"=&r"(temp16), [temp17]"=&r"(temp17),
[temp18]"=&r"(temp18), [temp19]"=&r"(temp19), [temp20]"=&r"(temp20)
: [args]"r"(args), [kC1]"r"(kC1), [kC2]"r"(kC2)
: "memory", "hi", "lo"
);
}
static void ITransform(const uint8_t* ref, const int16_t* in,
uint8_t* dst, int do_two) {
ITransformOne(ref, in, dst);
if (do_two) {
ITransformOne(ref + 4, in + 16, dst + 4);
}
}
#undef VERTICAL_PASS
#undef HORIZONTAL_PASS
// macro for one pass through for loop in QuantizeBlock
// QUANTDIV macro inlined
// J - offset in bytes (kZigzag[n] * 2)
// K - offset in bytes (kZigzag[n] * 4)
// N - offset in bytes (n * 2)
#define QUANTIZE_ONE(J, K, N) \
"lh %[temp0], "#J"(%[ppin]) \n\t" \
"lhu %[temp1], "#J"(%[ppsharpen]) \n\t" \
"lw %[temp2], "#K"(%[ppzthresh]) \n\t" \
"sra %[sign], %[temp0], 15 \n\t" \
"xor %[coeff], %[temp0], %[sign] \n\t" \
"subu %[coeff], %[coeff], %[sign] \n\t" \
"addu %[coeff], %[coeff], %[temp1] \n\t" \
"slt %[temp4], %[temp2], %[coeff] \n\t" \
"addiu %[temp5], $zero, 0 \n\t" \
"addiu %[level], $zero, 0 \n\t" \
"beqz %[temp4], 2f \n\t" \
"lhu %[temp1], "#J"(%[ppiq]) \n\t" \
"lw %[temp2], "#K"(%[ppbias]) \n\t" \
"lhu %[temp3], "#J"(%[ppq]) \n\t" \
"mul %[level], %[coeff], %[temp1] \n\t" \
"addu %[level], %[level], %[temp2] \n\t" \
"sra %[level], %[level], 17 \n\t" \
"slt %[temp4], %[max_level], %[level] \n\t" \
"movn %[level], %[max_level], %[temp4] \n\t" \
"xor %[level], %[level], %[sign] \n\t" \
"subu %[level], %[level], %[sign] \n\t" \
"mul %[temp5], %[level], %[temp3] \n\t" \
"2: \n\t" \
"sh %[temp5], "#J"(%[ppin]) \n\t" \
"sh %[level], "#N"(%[pout]) \n\t"
static int QuantizeBlock(int16_t in[16], int16_t out[16],
const VP8Matrix* const mtx) {
int temp0, temp1, temp2, temp3, temp4, temp5;
int sign, coeff, level, i;
int max_level = MAX_LEVEL;
int16_t* ppin = &in[0];
int16_t* pout = &out[0];
const uint16_t* ppsharpen = &mtx->sharpen_[0];
const uint32_t* ppzthresh = &mtx->zthresh_[0];
const uint16_t* ppq = &mtx->q_[0];
const uint16_t* ppiq = &mtx->iq_[0];
const uint32_t* ppbias = &mtx->bias_[0];
__asm__ volatile(
QUANTIZE_ONE( 0, 0, 0)
QUANTIZE_ONE( 2, 4, 2)
QUANTIZE_ONE( 8, 16, 4)
QUANTIZE_ONE(16, 32, 6)
QUANTIZE_ONE(10, 20, 8)
QUANTIZE_ONE( 4, 8, 10)
QUANTIZE_ONE( 6, 12, 12)
QUANTIZE_ONE(12, 24, 14)
QUANTIZE_ONE(18, 36, 16)
QUANTIZE_ONE(24, 48, 18)
QUANTIZE_ONE(26, 52, 20)
QUANTIZE_ONE(20, 40, 22)
QUANTIZE_ONE(14, 28, 24)
QUANTIZE_ONE(22, 44, 26)
QUANTIZE_ONE(28, 56, 28)
QUANTIZE_ONE(30, 60, 30)
: [temp0]"=&r"(temp0), [temp1]"=&r"(temp1),
[temp2]"=&r"(temp2), [temp3]"=&r"(temp3),
[temp4]"=&r"(temp4), [temp5]"=&r"(temp5),
[sign]"=&r"(sign), [coeff]"=&r"(coeff),
[level]"=&r"(level)
: [pout]"r"(pout), [ppin]"r"(ppin),
[ppiq]"r"(ppiq), [max_level]"r"(max_level),
[ppbias]"r"(ppbias), [ppzthresh]"r"(ppzthresh),
[ppsharpen]"r"(ppsharpen), [ppq]"r"(ppq)
: "memory", "hi", "lo"
);
// moved out from macro to increase possibility for earlier breaking
for (i = 15; i >= 0; i--) {
if (out[i]) return 1;
}
return 0;
}
#undef QUANTIZE_ONE
// macro for one horizontal pass in Disto4x4 (TTransform)
// two calls of function TTransform are merged into single one
// A..D - offsets in bytes to load from a and b buffers
// E..H - offsets in bytes to store first results to tmp buffer
// E1..H1 - offsets in bytes to store second results to tmp buffer
#define HORIZONTAL_PASS(A, B, C, D, E, F, G, H, E1, F1, G1, H1) \
"lbu %[temp0], "#A"(%[a]) \n\t" \
"lbu %[temp1], "#B"(%[a]) \n\t" \
"lbu %[temp2], "#C"(%[a]) \n\t" \
"lbu %[temp3], "#D"(%[a]) \n\t" \
"lbu %[temp4], "#A"(%[b]) \n\t" \
"lbu %[temp5], "#B"(%[b]) \n\t" \
"lbu %[temp6], "#C"(%[b]) \n\t" \
"lbu %[temp7], "#D"(%[b]) \n\t" \
"addu %[temp8], %[temp0], %[temp2] \n\t" \
"subu %[temp0], %[temp0], %[temp2] \n\t" \
"addu %[temp2], %[temp1], %[temp3] \n\t" \
"subu %[temp1], %[temp1], %[temp3] \n\t" \
"addu %[temp3], %[temp4], %[temp6] \n\t" \
"subu %[temp4], %[temp4], %[temp6] \n\t" \
"addu %[temp6], %[temp5], %[temp7] \n\t" \
"subu %[temp5], %[temp5], %[temp7] \n\t" \
"addu %[temp7], %[temp8], %[temp2] \n\t" \
"subu %[temp2], %[temp8], %[temp2] \n\t" \
"addu %[temp8], %[temp0], %[temp1] \n\t" \
"subu %[temp0], %[temp0], %[temp1] \n\t" \
"addu %[temp1], %[temp3], %[temp6] \n\t" \
"subu %[temp3], %[temp3], %[temp6] \n\t" \
"addu %[temp6], %[temp4], %[temp5] \n\t" \
"subu %[temp4], %[temp4], %[temp5] \n\t" \
"sw %[temp7], "#E"(%[tmp]) \n\t" \
"sw %[temp2], "#H"(%[tmp]) \n\t" \
"sw %[temp8], "#F"(%[tmp]) \n\t" \
"sw %[temp0], "#G"(%[tmp]) \n\t" \
"sw %[temp1], "#E1"(%[tmp]) \n\t" \
"sw %[temp3], "#H1"(%[tmp]) \n\t" \
"sw %[temp6], "#F1"(%[tmp]) \n\t" \
"sw %[temp4], "#G1"(%[tmp]) \n\t"
// macro for one vertical pass in Disto4x4 (TTransform)
// two calls of function TTransform are merged into single one
// since only one accu is available in mips32r1 instruction set
// first is done second call of function TTransform and after
// that first one.
// const int sum1 = TTransform(a, w);
// const int sum2 = TTransform(b, w);
// return abs(sum2 - sum1) >> 5;
// (sum2 - sum1) is calculated with madds (sub2) and msubs (sub1)
// A..D - offsets in bytes to load first results from tmp buffer
// A1..D1 - offsets in bytes to load second results from tmp buffer
// E..H - offsets in bytes to load from w buffer
#define VERTICAL_PASS(A, B, C, D, A1, B1, C1, D1, E, F, G, H) \
"lw %[temp0], "#A1"(%[tmp]) \n\t" \
"lw %[temp1], "#C1"(%[tmp]) \n\t" \
"lw %[temp2], "#B1"(%[tmp]) \n\t" \
"lw %[temp3], "#D1"(%[tmp]) \n\t" \
"addu %[temp8], %[temp0], %[temp1] \n\t" \
"subu %[temp0], %[temp0], %[temp1] \n\t" \
"addu %[temp1], %[temp2], %[temp3] \n\t" \
"subu %[temp2], %[temp2], %[temp3] \n\t" \
"addu %[temp3], %[temp8], %[temp1] \n\t" \
"subu %[temp8], %[temp8], %[temp1] \n\t" \
"addu %[temp1], %[temp0], %[temp2] \n\t" \
"subu %[temp0], %[temp0], %[temp2] \n\t" \
"sra %[temp4], %[temp3], 31 \n\t" \
"sra %[temp5], %[temp1], 31 \n\t" \
"sra %[temp6], %[temp0], 31 \n\t" \
"sra %[temp7], %[temp8], 31 \n\t" \
"xor %[temp3], %[temp3], %[temp4] \n\t" \
"xor %[temp1], %[temp1], %[temp5] \n\t" \
"xor %[temp0], %[temp0], %[temp6] \n\t" \
"xor %[temp8], %[temp8], %[temp7] \n\t" \
"subu %[temp3], %[temp3], %[temp4] \n\t" \
"subu %[temp1], %[temp1], %[temp5] \n\t" \
"subu %[temp0], %[temp0], %[temp6] \n\t" \
"subu %[temp8], %[temp8], %[temp7] \n\t" \
"lhu %[temp4], "#E"(%[w]) \n\t" \
"lhu %[temp5], "#F"(%[w]) \n\t" \
"lhu %[temp6], "#G"(%[w]) \n\t" \
"lhu %[temp7], "#H"(%[w]) \n\t" \
"madd %[temp4], %[temp3] \n\t" \
"madd %[temp5], %[temp1] \n\t" \
"madd %[temp6], %[temp0] \n\t" \
"madd %[temp7], %[temp8] \n\t" \
"lw %[temp0], "#A"(%[tmp]) \n\t" \
"lw %[temp1], "#C"(%[tmp]) \n\t" \
"lw %[temp2], "#B"(%[tmp]) \n\t" \
"lw %[temp3], "#D"(%[tmp]) \n\t" \
"addu %[temp8], %[temp0], %[temp1] \n\t" \
"subu %[temp0], %[temp0], %[temp1] \n\t" \
"addu %[temp1], %[temp2], %[temp3] \n\t" \
"subu %[temp2], %[temp2], %[temp3] \n\t" \
"addu %[temp3], %[temp8], %[temp1] \n\t" \
"subu %[temp1], %[temp8], %[temp1] \n\t" \
"addu %[temp8], %[temp0], %[temp2] \n\t" \
"subu %[temp0], %[temp0], %[temp2] \n\t" \
"sra %[temp2], %[temp3], 31 \n\t" \
"xor %[temp3], %[temp3], %[temp2] \n\t" \
"subu %[temp3], %[temp3], %[temp2] \n\t" \
"msub %[temp4], %[temp3] \n\t" \
"sra %[temp2], %[temp8], 31 \n\t" \
"sra %[temp3], %[temp0], 31 \n\t" \
"sra %[temp4], %[temp1], 31 \n\t" \
"xor %[temp8], %[temp8], %[temp2] \n\t" \
"xor %[temp0], %[temp0], %[temp3] \n\t" \
"xor %[temp1], %[temp1], %[temp4] \n\t" \
"subu %[temp8], %[temp8], %[temp2] \n\t" \
"subu %[temp0], %[temp0], %[temp3] \n\t" \
"subu %[temp1], %[temp1], %[temp4] \n\t" \
"msub %[temp5], %[temp8] \n\t" \
"msub %[temp6], %[temp0] \n\t" \
"msub %[temp7], %[temp1] \n\t"
static int Disto4x4(const uint8_t* const a, const uint8_t* const b,
const uint16_t* const w) {
int tmp[32];
int temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7, temp8;
__asm__ volatile(
HORIZONTAL_PASS( 0, 1, 2, 3, 0, 4, 8, 12, 64, 68, 72, 76)
HORIZONTAL_PASS(16, 17, 18, 19, 16, 20, 24, 28, 80, 84, 88, 92)
HORIZONTAL_PASS(32, 33, 34, 35, 32, 36, 40, 44, 96, 100, 104, 108)
HORIZONTAL_PASS(48, 49, 50, 51, 48, 52, 56, 60, 112, 116, 120, 124)
"mthi $zero \n\t"
"mtlo $zero \n\t"
VERTICAL_PASS( 0, 16, 32, 48, 64, 80, 96, 112, 0, 8, 16, 24)
VERTICAL_PASS( 4, 20, 36, 52, 68, 84, 100, 116, 2, 10, 18, 26)
VERTICAL_PASS( 8, 24, 40, 56, 72, 88, 104, 120, 4, 12, 20, 28)
VERTICAL_PASS(12, 28, 44, 60, 76, 92, 108, 124, 6, 14, 22, 30)
"mflo %[temp0] \n\t"
"sra %[temp1], %[temp0], 31 \n\t"
"xor %[temp0], %[temp0], %[temp1] \n\t"
"subu %[temp0], %[temp0], %[temp1] \n\t"
"sra %[temp0], %[temp0], 5 \n\t"
: [temp0]"=&r"(temp0), [temp1]"=&r"(temp1), [temp2]"=&r"(temp2),
[temp3]"=&r"(temp3), [temp4]"=&r"(temp4), [temp5]"=&r"(temp5),
[temp6]"=&r"(temp6), [temp7]"=&r"(temp7), [temp8]"=&r"(temp8)
: [a]"r"(a), [b]"r"(b), [w]"r"(w), [tmp]"r"(tmp)
: "memory", "hi", "lo"
);
return temp0;
}
#undef VERTICAL_PASS
#undef HORIZONTAL_PASS
static int Disto16x16(const uint8_t* const a, const uint8_t* const b,
const uint16_t* const w) {
int D = 0;
int x, y;
for (y = 0; y < 16 * BPS; y += 4 * BPS) {
for (x = 0; x < 16; x += 4) {
D += Disto4x4(a + x + y, b + x + y, w);
}
}
return D;
}
// macro for one horizontal pass in FTransform
// temp0..temp15 holds tmp[0]..tmp[15]
// A..D - offsets in bytes to load from src and ref buffers
// TEMP0..TEMP3 - registers for corresponding tmp elements
#define HORIZONTAL_PASS(A, B, C, D, TEMP0, TEMP1, TEMP2, TEMP3) \
"lw %["#TEMP1"], 0(%[args]) \n\t" \
"lw %["#TEMP2"], 4(%[args]) \n\t" \
"lbu %[temp16], "#A"(%["#TEMP1"]) \n\t" \
"lbu %[temp17], "#A"(%["#TEMP2"]) \n\t" \
"lbu %[temp18], "#B"(%["#TEMP1"]) \n\t" \
"lbu %[temp19], "#B"(%["#TEMP2"]) \n\t" \
"subu %[temp20], %[temp16], %[temp17] \n\t" \
"lbu %[temp16], "#C"(%["#TEMP1"]) \n\t" \
"lbu %[temp17], "#C"(%["#TEMP2"]) \n\t" \
"subu %["#TEMP0"], %[temp18], %[temp19] \n\t" \
"lbu %[temp18], "#D"(%["#TEMP1"]) \n\t" \
"lbu %[temp19], "#D"(%["#TEMP2"]) \n\t" \
"subu %["#TEMP1"], %[temp16], %[temp17] \n\t" \
"subu %["#TEMP2"], %[temp18], %[temp19] \n\t" \
"addu %["#TEMP3"], %[temp20], %["#TEMP2"] \n\t" \
"subu %["#TEMP2"], %[temp20], %["#TEMP2"] \n\t" \
"addu %[temp20], %["#TEMP0"], %["#TEMP1"] \n\t" \
"subu %["#TEMP0"], %["#TEMP0"], %["#TEMP1"] \n\t" \
"mul %[temp16], %["#TEMP2"], %[c5352] \n\t" \
"mul %[temp17], %["#TEMP2"], %[c2217] \n\t" \
"mul %[temp18], %["#TEMP0"], %[c5352] \n\t" \
"mul %[temp19], %["#TEMP0"], %[c2217] \n\t" \
"addu %["#TEMP1"], %["#TEMP3"], %[temp20] \n\t" \
"subu %[temp20], %["#TEMP3"], %[temp20] \n\t" \
"sll %["#TEMP0"], %["#TEMP1"], 3 \n\t" \
"sll %["#TEMP2"], %[temp20], 3 \n\t" \
"addiu %[temp16], %[temp16], 1812 \n\t" \
"addiu %[temp17], %[temp17], 937 \n\t" \
"addu %[temp16], %[temp16], %[temp19] \n\t" \
"subu %[temp17], %[temp17], %[temp18] \n\t" \
"sra %["#TEMP1"], %[temp16], 9 \n\t" \
"sra %["#TEMP3"], %[temp17], 9 \n\t"
// macro for one vertical pass in FTransform
// temp0..temp15 holds tmp[0]..tmp[15]
// A..D - offsets in bytes to store to out buffer
// TEMP0, TEMP4, TEMP8 and TEMP12 - registers for corresponding tmp elements
#define VERTICAL_PASS(A, B, C, D, TEMP0, TEMP4, TEMP8, TEMP12) \
"addu %[temp16], %["#TEMP0"], %["#TEMP12"] \n\t" \
"subu %[temp19], %["#TEMP0"], %["#TEMP12"] \n\t" \
"addu %[temp17], %["#TEMP4"], %["#TEMP8"] \n\t" \
"subu %[temp18], %["#TEMP4"], %["#TEMP8"] \n\t" \
"mul %["#TEMP8"], %[temp19], %[c2217] \n\t" \
"mul %["#TEMP12"], %[temp18], %[c2217] \n\t" \
"mul %["#TEMP4"], %[temp19], %[c5352] \n\t" \
"mul %[temp18], %[temp18], %[c5352] \n\t" \
"addiu %[temp16], %[temp16], 7 \n\t" \
"addu %["#TEMP0"], %[temp16], %[temp17] \n\t" \
"sra %["#TEMP0"], %["#TEMP0"], 4 \n\t" \
"addu %["#TEMP12"], %["#TEMP12"], %["#TEMP4"] \n\t" \
"subu %["#TEMP4"], %[temp16], %[temp17] \n\t" \
"sra %["#TEMP4"], %["#TEMP4"], 4 \n\t" \
"addiu %["#TEMP8"], %["#TEMP8"], 30000 \n\t" \
"addiu %["#TEMP12"], %["#TEMP12"], 12000 \n\t" \
"addiu %["#TEMP8"], %["#TEMP8"], 21000 \n\t" \
"subu %["#TEMP8"], %["#TEMP8"], %[temp18] \n\t" \
"sra %["#TEMP12"], %["#TEMP12"], 16 \n\t" \
"sra %["#TEMP8"], %["#TEMP8"], 16 \n\t" \
"addiu %[temp16], %["#TEMP12"], 1 \n\t" \
"movn %["#TEMP12"], %[temp16], %[temp19] \n\t" \
"sh %["#TEMP0"], "#A"(%[temp20]) \n\t" \
"sh %["#TEMP4"], "#C"(%[temp20]) \n\t" \
"sh %["#TEMP8"], "#D"(%[temp20]) \n\t" \
"sh %["#TEMP12"], "#B"(%[temp20]) \n\t"
static void FTransform(const uint8_t* src, const uint8_t* ref, int16_t* out) {
int temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7, temp8;
int temp9, temp10, temp11, temp12, temp13, temp14, temp15, temp16;
int temp17, temp18, temp19, temp20;
const int c2217 = 2217;
const int c5352 = 5352;
const int* const args[3] =
{ (const int*)src, (const int*)ref, (const int*)out };
__asm__ volatile(
HORIZONTAL_PASS( 0, 1, 2, 3, temp0, temp1, temp2, temp3)
HORIZONTAL_PASS(16, 17, 18, 19, temp4, temp5, temp6, temp7)
HORIZONTAL_PASS(32, 33, 34, 35, temp8, temp9, temp10, temp11)
HORIZONTAL_PASS(48, 49, 50, 51, temp12, temp13, temp14, temp15)
"lw %[temp20], 8(%[args]) \n\t"
VERTICAL_PASS(0, 8, 16, 24, temp0, temp4, temp8, temp12)
VERTICAL_PASS(2, 10, 18, 26, temp1, temp5, temp9, temp13)
VERTICAL_PASS(4, 12, 20, 28, temp2, temp6, temp10, temp14)
VERTICAL_PASS(6, 14, 22, 30, temp3, temp7, temp11, temp15)
: [temp0]"=&r"(temp0), [temp1]"=&r"(temp1), [temp2]"=&r"(temp2),
[temp3]"=&r"(temp3), [temp4]"=&r"(temp4), [temp5]"=&r"(temp5),
[temp6]"=&r"(temp6), [temp7]"=&r"(temp7), [temp8]"=&r"(temp8),
[temp9]"=&r"(temp9), [temp10]"=&r"(temp10), [temp11]"=&r"(temp11),
[temp12]"=&r"(temp12), [temp13]"=&r"(temp13), [temp14]"=&r"(temp14),
[temp15]"=&r"(temp15), [temp16]"=&r"(temp16), [temp17]"=&r"(temp17),
[temp18]"=&r"(temp18), [temp19]"=&r"(temp19), [temp20]"=&r"(temp20)
: [args]"r"(args), [c2217]"r"(c2217), [c5352]"r"(c5352)
: "memory", "hi", "lo"
);
}
#undef VERTICAL_PASS
#undef HORIZONTAL_PASS
// Forward declaration.
extern int VP8GetResidualCostMIPS32(int ctx0, const VP8Residual* const res);
int VP8GetResidualCostMIPS32(int ctx0, const VP8Residual* const res) {
int n = res->first;
// should be prob[VP8EncBands[n]], but it's equivalent for n=0 or 1
int p0 = res->prob[n][ctx0][0];
const uint16_t* t = res->cost[n][ctx0];
int cost;
const int const_2 = 2;
const int const_255 = 255;
const int const_max_level = MAX_VARIABLE_LEVEL;
int res_cost;
int res_prob;
int res_coeffs;
int res_last;
int v_reg;
int b_reg;
int ctx_reg;
int cost_add, temp_1, temp_2, temp_3;
if (res->last < 0) {
return VP8BitCost(0, p0);
}
cost = (ctx0 == 0) ? VP8BitCost(1, p0) : 0;
res_cost = (int)res->cost;
res_prob = (int)res->prob;
res_coeffs = (int)res->coeffs;
res_last = (int)res->last;
__asm__ volatile(
".set push \n\t"
".set noreorder \n\t"
"sll %[temp_1], %[n], 1 \n\t"
"addu %[res_coeffs], %[res_coeffs], %[temp_1] \n\t"
"slt %[temp_2], %[n], %[res_last] \n\t"
"bnez %[temp_2], 1f \n\t"
" li %[cost_add], 0 \n\t"
"b 2f \n\t"
" nop \n\t"
"1: \n\t"
"lh %[v_reg], 0(%[res_coeffs]) \n\t"
"addu %[b_reg], %[n], %[VP8EncBands] \n\t"
"move %[temp_1], %[const_max_level] \n\t"
"addu %[cost], %[cost], %[cost_add] \n\t"
"negu %[temp_2], %[v_reg] \n\t"
"slti %[temp_3], %[v_reg], 0 \n\t"
"movn %[v_reg], %[temp_2], %[temp_3] \n\t"
"lbu %[b_reg], 1(%[b_reg]) \n\t"
"li %[cost_add], 0 \n\t"
"sltiu %[temp_3], %[v_reg], 2 \n\t"
"move %[ctx_reg], %[v_reg] \n\t"
"movz %[ctx_reg], %[const_2], %[temp_3] \n\t"
// cost += VP8LevelCost(t, v);
"slt %[temp_3], %[v_reg], %[const_max_level] \n\t"
"movn %[temp_1], %[v_reg], %[temp_3] \n\t"
"sll %[temp_2], %[v_reg], 1 \n\t"
"addu %[temp_2], %[temp_2], %[VP8LevelFixedCosts] \n\t"
"lhu %[temp_2], 0(%[temp_2]) \n\t"
"sll %[temp_1], %[temp_1], 1 \n\t"
"addu %[temp_1], %[temp_1], %[t] \n\t"
"lhu %[temp_3], 0(%[temp_1]) \n\t"
"addu %[cost], %[cost], %[temp_2] \n\t"
// t = res->cost[b][ctx];
"sll %[temp_1], %[ctx_reg], 7 \n\t"
"sll %[temp_2], %[ctx_reg], 3 \n\t"
"addu %[cost], %[cost], %[temp_3] \n\t"
"addu %[temp_1], %[temp_1], %[temp_2] \n\t"
"sll %[temp_2], %[b_reg], 3 \n\t"
"sll %[temp_3], %[b_reg], 5 \n\t"
"sub %[temp_2], %[temp_3], %[temp_2] \n\t"
"sll %[temp_3], %[temp_2], 4 \n\t"
"addu %[temp_1], %[temp_1], %[temp_3] \n\t"
"addu %[temp_2], %[temp_2], %[res_cost] \n\t"
"addiu %[n], %[n], 1 \n\t"
"addu %[t], %[temp_1], %[temp_2] \n\t"
"slt %[temp_1], %[n], %[res_last] \n\t"
"bnez %[temp_1], 1b \n\t"
" addiu %[res_coeffs], %[res_coeffs], 2 \n\t"
"2: \n\t"
".set pop \n\t"
: [cost]"+r"(cost), [t]"+r"(t), [n]"+r"(n), [v_reg]"=&r"(v_reg),
[ctx_reg]"=&r"(ctx_reg), [b_reg]"=&r"(b_reg), [cost_add]"=&r"(cost_add),
[temp_1]"=&r"(temp_1), [temp_2]"=&r"(temp_2), [temp_3]"=&r"(temp_3)
: [const_2]"r"(const_2), [const_255]"r"(const_255), [res_last]"r"(res_last),
[VP8EntropyCost]"r"(VP8EntropyCost), [VP8EncBands]"r"(VP8EncBands),
[const_max_level]"r"(const_max_level), [res_prob]"r"(res_prob),
[VP8LevelFixedCosts]"r"(VP8LevelFixedCosts), [res_coeffs]"r"(res_coeffs),
[res_cost]"r"(res_cost)
: "memory"
);
// Last coefficient is always non-zero
{
const int v = abs(res->coeffs[n]);
assert(v != 0);
cost += VP8LevelCost(t, v);
if (n < 15) {
const int b = VP8EncBands[n + 1];
const int ctx = (v == 1) ? 1 : 2;
const int last_p0 = res->prob[b][ctx][0];
cost += VP8BitCost(0, last_p0);
}
}
return cost;
}
#define GET_SSE_INNER(A, B, C, D) \
"lbu %[temp0], "#A"(%[a]) \n\t" \
"lbu %[temp1], "#A"(%[b]) \n\t" \
"lbu %[temp2], "#B"(%[a]) \n\t" \
"lbu %[temp3], "#B"(%[b]) \n\t" \
"lbu %[temp4], "#C"(%[a]) \n\t" \
"lbu %[temp5], "#C"(%[b]) \n\t" \
"lbu %[temp6], "#D"(%[a]) \n\t" \
"lbu %[temp7], "#D"(%[b]) \n\t" \
"subu %[temp0], %[temp0], %[temp1] \n\t" \
"subu %[temp2], %[temp2], %[temp3] \n\t" \
"subu %[temp4], %[temp4], %[temp5] \n\t" \
"subu %[temp6], %[temp6], %[temp7] \n\t" \
"madd %[temp0], %[temp0] \n\t" \
"madd %[temp2], %[temp2] \n\t" \
"madd %[temp4], %[temp4] \n\t" \
"madd %[temp6], %[temp6] \n\t"
#define GET_SSE(A, B, C, D) \
GET_SSE_INNER(A, A + 1, A + 2, A + 3) \
GET_SSE_INNER(B, B + 1, B + 2, B + 3) \
GET_SSE_INNER(C, C + 1, C + 2, C + 3) \
GET_SSE_INNER(D, D + 1, D + 2, D + 3)
#if !defined(WORK_AROUND_GCC)
static int SSE16x16(const uint8_t* a, const uint8_t* b) {
int count;
int temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7;
__asm__ volatile(
"mult $zero, $zero \n\t"
GET_SSE( 0, 4, 8, 12)
GET_SSE( 16, 20, 24, 28)
GET_SSE( 32, 36, 40, 44)
GET_SSE( 48, 52, 56, 60)
GET_SSE( 64, 68, 72, 76)
GET_SSE( 80, 84, 88, 92)
GET_SSE( 96, 100, 104, 108)
GET_SSE(112, 116, 120, 124)
GET_SSE(128, 132, 136, 140)
GET_SSE(144, 148, 152, 156)
GET_SSE(160, 164, 168, 172)
GET_SSE(176, 180, 184, 188)
GET_SSE(192, 196, 200, 204)
GET_SSE(208, 212, 216, 220)
GET_SSE(224, 228, 232, 236)
GET_SSE(240, 244, 248, 252)
"mflo %[count] \n\t"
: [temp0]"=&r"(temp0), [temp1]"=&r"(temp1), [temp2]"=&r"(temp2),
[temp3]"=&r"(temp3), [temp4]"=&r"(temp4), [temp5]"=&r"(temp5),
[temp6]"=&r"(temp6), [temp7]"=&r"(temp7), [count]"=&r"(count)
: [a]"r"(a), [b]"r"(b)
: "memory", "hi" , "lo"
);
return count;
}
static int SSE16x8(const uint8_t* a, const uint8_t* b) {
int count;
int temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7;
__asm__ volatile(
"mult $zero, $zero \n\t"
GET_SSE( 0, 4, 8, 12)
GET_SSE( 16, 20, 24, 28)
GET_SSE( 32, 36, 40, 44)
GET_SSE( 48, 52, 56, 60)
GET_SSE( 64, 68, 72, 76)
GET_SSE( 80, 84, 88, 92)
GET_SSE( 96, 100, 104, 108)
GET_SSE(112, 116, 120, 124)
"mflo %[count] \n\t"
: [temp0]"=&r"(temp0), [temp1]"=&r"(temp1), [temp2]"=&r"(temp2),
[temp3]"=&r"(temp3), [temp4]"=&r"(temp4), [temp5]"=&r"(temp5),
[temp6]"=&r"(temp6), [temp7]"=&r"(temp7), [count]"=&r"(count)
: [a]"r"(a), [b]"r"(b)
: "memory", "hi" , "lo"
);
return count;
}
static int SSE8x8(const uint8_t* a, const uint8_t* b) {
int count;
int temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7;
__asm__ volatile(
"mult $zero, $zero \n\t"
GET_SSE( 0, 4, 16, 20)
GET_SSE(32, 36, 48, 52)
GET_SSE(64, 68, 80, 84)
GET_SSE(96, 100, 112, 116)
"mflo %[count] \n\t"
: [temp0]"=&r"(temp0), [temp1]"=&r"(temp1), [temp2]"=&r"(temp2),
[temp3]"=&r"(temp3), [temp4]"=&r"(temp4), [temp5]"=&r"(temp5),
[temp6]"=&r"(temp6), [temp7]"=&r"(temp7), [count]"=&r"(count)
: [a]"r"(a), [b]"r"(b)
: "memory", "hi" , "lo"
);
return count;
}
static int SSE4x4(const uint8_t* a, const uint8_t* b) {
int count;
int temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7;
__asm__ volatile(
"mult $zero, $zero \n\t"
GET_SSE(0, 16, 32, 48)
"mflo %[count] \n\t"
: [temp0]"=&r"(temp0), [temp1]"=&r"(temp1), [temp2]"=&r"(temp2),
[temp3]"=&r"(temp3), [temp4]"=&r"(temp4), [temp5]"=&r"(temp5),
[temp6]"=&r"(temp6), [temp7]"=&r"(temp7), [count]"=&r"(count)
: [a]"r"(a), [b]"r"(b)
: "memory", "hi" , "lo"
);
return count;
}
#endif // WORK_AROUND_GCC
#undef GET_SSE_MIPS32
#undef GET_SSE_MIPS32_INNER
#endif // WEBP_USE_MIPS32
//------------------------------------------------------------------------------
// Entry point
extern void VP8EncDspInitMIPS32(void);
void VP8EncDspInitMIPS32(void) {
#if defined(WEBP_USE_MIPS32)
VP8ITransform = ITransform;
VP8EncQuantizeBlock = QuantizeBlock;
VP8TDisto4x4 = Disto4x4;
VP8TDisto16x16 = Disto16x16;
VP8FTransform = FTransform;
#if !defined(WORK_AROUND_GCC)
VP8SSE16x16 = SSE16x16;
VP8SSE8x8 = SSE8x8;
VP8SSE16x8 = SSE16x8;
VP8SSE4x4 = SSE4x4;
#endif
#endif // WEBP_USE_MIPS32
}

View File

@@ -15,18 +15,122 @@
#if defined(WEBP_USE_NEON)
#include <assert.h>
#include "./neon.h"
#include "../enc/vp8enci.h"
//------------------------------------------------------------------------------
// Transforms (Paragraph 14.4)
// Inverse transform.
// This code is pretty much the same as TransformOneNEON in the decoder, except
// This code is pretty much the same as TransformOne in the dec_neon.c, except
// for subtraction to *ref. See the comments there for algorithmic explanations.
static const int16_t kC1 = 20091;
static const int16_t kC2 = 17734; // half of kC2, actually. See comment above.
// This code works but is *slower* than the inlined-asm version below
// (with gcc-4.6). So we disable it for now. Later, it'll be conditional to
// USE_INTRINSICS define.
// With gcc-4.8, it's a little faster speed than inlined-assembly.
#if defined(USE_INTRINSICS)
// Treats 'v' as an uint8x8_t and zero extends to an int16x8_t.
static WEBP_INLINE int16x8_t ConvertU8ToS16(uint32x2_t v) {
return vreinterpretq_s16_u16(vmovl_u8(vreinterpret_u8_u32(v)));
}
// Performs unsigned 8b saturation on 'dst01' and 'dst23' storing the result
// to the corresponding rows of 'dst'.
static WEBP_INLINE void SaturateAndStore4x4(uint8_t* const dst,
const int16x8_t dst01,
const int16x8_t dst23) {
// Unsigned saturate to 8b.
const uint8x8_t dst01_u8 = vqmovun_s16(dst01);
const uint8x8_t dst23_u8 = vqmovun_s16(dst23);
// Store the results.
vst1_lane_u32((uint32_t*)(dst + 0 * BPS), vreinterpret_u32_u8(dst01_u8), 0);
vst1_lane_u32((uint32_t*)(dst + 1 * BPS), vreinterpret_u32_u8(dst01_u8), 1);
vst1_lane_u32((uint32_t*)(dst + 2 * BPS), vreinterpret_u32_u8(dst23_u8), 0);
vst1_lane_u32((uint32_t*)(dst + 3 * BPS), vreinterpret_u32_u8(dst23_u8), 1);
}
static WEBP_INLINE void Add4x4(const int16x8_t row01, const int16x8_t row23,
const uint8_t* const ref, uint8_t* const dst) {
uint32x2_t dst01 = vdup_n_u32(0);
uint32x2_t dst23 = vdup_n_u32(0);
// Load the source pixels.
dst01 = vld1_lane_u32((uint32_t*)(ref + 0 * BPS), dst01, 0);
dst23 = vld1_lane_u32((uint32_t*)(ref + 2 * BPS), dst23, 0);
dst01 = vld1_lane_u32((uint32_t*)(ref + 1 * BPS), dst01, 1);
dst23 = vld1_lane_u32((uint32_t*)(ref + 3 * BPS), dst23, 1);
{
// Convert to 16b.
const int16x8_t dst01_s16 = ConvertU8ToS16(dst01);
const int16x8_t dst23_s16 = ConvertU8ToS16(dst23);
// Descale with rounding.
const int16x8_t out01 = vrsraq_n_s16(dst01_s16, row01, 3);
const int16x8_t out23 = vrsraq_n_s16(dst23_s16, row23, 3);
// Add the inverse transform.
SaturateAndStore4x4(dst, out01, out23);
}
}
static WEBP_INLINE void Transpose8x2(const int16x8_t in0, const int16x8_t in1,
int16x8x2_t* const out) {
// a0 a1 a2 a3 | b0 b1 b2 b3 => a0 b0 c0 d0 | a1 b1 c1 d1
// c0 c1 c2 c3 | d0 d1 d2 d3 a2 b2 c2 d2 | a3 b3 c3 d3
const int16x8x2_t tmp0 = vzipq_s16(in0, in1); // a0 c0 a1 c1 a2 c2 ...
// b0 d0 b1 d1 b2 d2 ...
*out = vzipq_s16(tmp0.val[0], tmp0.val[1]);
}
static WEBP_INLINE void TransformPass(int16x8x2_t* const rows) {
// {rows} = in0 | in4
// in8 | in12
// B1 = in4 | in12
const int16x8_t B1 =
vcombine_s16(vget_high_s16(rows->val[0]), vget_high_s16(rows->val[1]));
// C0 = kC1 * in4 | kC1 * in12
// C1 = kC2 * in4 | kC2 * in12
const int16x8_t C0 = vsraq_n_s16(B1, vqdmulhq_n_s16(B1, kC1), 1);
const int16x8_t C1 = vqdmulhq_n_s16(B1, kC2);
const int16x4_t a = vqadd_s16(vget_low_s16(rows->val[0]),
vget_low_s16(rows->val[1])); // in0 + in8
const int16x4_t b = vqsub_s16(vget_low_s16(rows->val[0]),
vget_low_s16(rows->val[1])); // in0 - in8
// c = kC2 * in4 - kC1 * in12
// d = kC1 * in4 + kC2 * in12
const int16x4_t c = vqsub_s16(vget_low_s16(C1), vget_high_s16(C0));
const int16x4_t d = vqadd_s16(vget_low_s16(C0), vget_high_s16(C1));
const int16x8_t D0 = vcombine_s16(a, b); // D0 = a | b
const int16x8_t D1 = vcombine_s16(d, c); // D1 = d | c
const int16x8_t E0 = vqaddq_s16(D0, D1); // a+d | b+c
const int16x8_t E_tmp = vqsubq_s16(D0, D1); // a-d | b-c
const int16x8_t E1 = vcombine_s16(vget_high_s16(E_tmp), vget_low_s16(E_tmp));
Transpose8x2(E0, E1, rows);
}
static void ITransformOne(const uint8_t* ref,
const int16_t* in, uint8_t* dst) {
int16x8x2_t rows;
INIT_VECTOR2(rows, vld1q_s16(in + 0), vld1q_s16(in + 8));
TransformPass(&rows);
TransformPass(&rows);
Add4x4(rows.val[0], rows.val[1], ref, dst);
}
#else
static void ITransformOne(const uint8_t* ref,
const int16_t* in, uint8_t* dst) {
const int kBPS = BPS;
const int16_t kC1C2[] = { 20091, 17734, 0, 0 }; // kC1 / (kC2 >> 1) / 0 / 0
const int16_t kC1C2[] = { kC1, kC2, 0, 0 };
__asm__ volatile (
"vld1.16 {q1, q2}, [%[in]] \n"
@@ -137,6 +241,8 @@ static void ITransformOne(const uint8_t* ref,
);
}
#endif // USE_INTRINSICS
static void ITransform(const uint8_t* ref,
const int16_t* in, uint8_t* dst, int do_two) {
ITransformOne(ref, in, dst);
@@ -145,76 +251,102 @@ static void ITransform(const uint8_t* ref,
}
}
// Same code as dec_neon.c
static void ITransformWHT(const int16_t* in, int16_t* out) {
const int kStep = 32; // The store is only incrementing the pointer as if we
// had stored a single byte.
__asm__ volatile (
// part 1
// load data into q0, q1
"vld1.16 {q0, q1}, [%[in]] \n"
"vaddl.s16 q2, d0, d3 \n" // a0 = in[0] + in[12]
"vaddl.s16 q3, d1, d2 \n" // a1 = in[4] + in[8]
"vsubl.s16 q4, d1, d2 \n" // a2 = in[4] - in[8]
"vsubl.s16 q5, d0, d3 \n" // a3 = in[0] - in[12]
"vadd.s32 q0, q2, q3 \n" // tmp[0] = a0 + a1
"vsub.s32 q2, q2, q3 \n" // tmp[8] = a0 - a1
"vadd.s32 q1, q5, q4 \n" // tmp[4] = a3 + a2
"vsub.s32 q3, q5, q4 \n" // tmp[12] = a3 - a2
// Transpose
// q0 = tmp[0, 4, 8, 12], q1 = tmp[2, 6, 10, 14]
// q2 = tmp[1, 5, 9, 13], q3 = tmp[3, 7, 11, 15]
"vswp d1, d4 \n" // vtrn.64 q0, q2
"vswp d3, d6 \n" // vtrn.64 q1, q3
"vtrn.32 q0, q1 \n"
"vtrn.32 q2, q3 \n"
"vmov.s32 q4, #3 \n" // dc = 3
"vadd.s32 q0, q0, q4 \n" // dc = tmp[0] + 3
"vadd.s32 q6, q0, q3 \n" // a0 = dc + tmp[3]
"vadd.s32 q7, q1, q2 \n" // a1 = tmp[1] + tmp[2]
"vsub.s32 q8, q1, q2 \n" // a2 = tmp[1] - tmp[2]
"vsub.s32 q9, q0, q3 \n" // a3 = dc - tmp[3]
"vadd.s32 q0, q6, q7 \n"
"vshrn.s32 d0, q0, #3 \n" // (a0 + a1) >> 3
"vadd.s32 q1, q9, q8 \n"
"vshrn.s32 d1, q1, #3 \n" // (a3 + a2) >> 3
"vsub.s32 q2, q6, q7 \n"
"vshrn.s32 d2, q2, #3 \n" // (a0 - a1) >> 3
"vsub.s32 q3, q9, q8 \n"
"vshrn.s32 d3, q3, #3 \n" // (a3 - a2) >> 3
// set the results to output
"vst1.16 d0[0], [%[out]], %[kStep] \n"
"vst1.16 d1[0], [%[out]], %[kStep] \n"
"vst1.16 d2[0], [%[out]], %[kStep] \n"
"vst1.16 d3[0], [%[out]], %[kStep] \n"
"vst1.16 d0[1], [%[out]], %[kStep] \n"
"vst1.16 d1[1], [%[out]], %[kStep] \n"
"vst1.16 d2[1], [%[out]], %[kStep] \n"
"vst1.16 d3[1], [%[out]], %[kStep] \n"
"vst1.16 d0[2], [%[out]], %[kStep] \n"
"vst1.16 d1[2], [%[out]], %[kStep] \n"
"vst1.16 d2[2], [%[out]], %[kStep] \n"
"vst1.16 d3[2], [%[out]], %[kStep] \n"
"vst1.16 d0[3], [%[out]], %[kStep] \n"
"vst1.16 d1[3], [%[out]], %[kStep] \n"
"vst1.16 d2[3], [%[out]], %[kStep] \n"
"vst1.16 d3[3], [%[out]], %[kStep] \n"
: [out] "+r"(out) // modified registers
: [in] "r"(in), [kStep] "r"(kStep) // constants
: "memory", "q0", "q1", "q2", "q3", "q4",
"q5", "q6", "q7", "q8", "q9" // clobbered
);
// Load all 4x4 pixels into a single uint8x16_t variable.
static uint8x16_t Load4x4(const uint8_t* src) {
uint32x4_t out = vdupq_n_u32(0);
out = vld1q_lane_u32((const uint32_t*)(src + 0 * BPS), out, 0);
out = vld1q_lane_u32((const uint32_t*)(src + 1 * BPS), out, 1);
out = vld1q_lane_u32((const uint32_t*)(src + 2 * BPS), out, 2);
out = vld1q_lane_u32((const uint32_t*)(src + 3 * BPS), out, 3);
return vreinterpretq_u8_u32(out);
}
// Forward transform.
#if defined(USE_INTRINSICS)
static WEBP_INLINE void Transpose4x4_S16(const int16x4_t A, const int16x4_t B,
const int16x4_t C, const int16x4_t D,
int16x8_t* const out01,
int16x8_t* const out32) {
const int16x4x2_t AB = vtrn_s16(A, B);
const int16x4x2_t CD = vtrn_s16(C, D);
const int32x2x2_t tmp02 = vtrn_s32(vreinterpret_s32_s16(AB.val[0]),
vreinterpret_s32_s16(CD.val[0]));
const int32x2x2_t tmp13 = vtrn_s32(vreinterpret_s32_s16(AB.val[1]),
vreinterpret_s32_s16(CD.val[1]));
*out01 = vreinterpretq_s16_s64(
vcombine_s64(vreinterpret_s64_s32(tmp02.val[0]),
vreinterpret_s64_s32(tmp13.val[0])));
*out32 = vreinterpretq_s16_s64(
vcombine_s64(vreinterpret_s64_s32(tmp13.val[1]),
vreinterpret_s64_s32(tmp02.val[1])));
}
static WEBP_INLINE int16x8_t DiffU8ToS16(const uint8x8_t a,
const uint8x8_t b) {
return vreinterpretq_s16_u16(vsubl_u8(a, b));
}
static void FTransform(const uint8_t* src, const uint8_t* ref,
int16_t* out) {
int16x8_t d0d1, d3d2; // working 4x4 int16 variables
{
const uint8x16_t S0 = Load4x4(src);
const uint8x16_t R0 = Load4x4(ref);
const int16x8_t D0D1 = DiffU8ToS16(vget_low_u8(S0), vget_low_u8(R0));
const int16x8_t D2D3 = DiffU8ToS16(vget_high_u8(S0), vget_high_u8(R0));
const int16x4_t D0 = vget_low_s16(D0D1);
const int16x4_t D1 = vget_high_s16(D0D1);
const int16x4_t D2 = vget_low_s16(D2D3);
const int16x4_t D3 = vget_high_s16(D2D3);
Transpose4x4_S16(D0, D1, D2, D3, &d0d1, &d3d2);
}
{ // 1rst pass
const int32x4_t kCst937 = vdupq_n_s32(937);
const int32x4_t kCst1812 = vdupq_n_s32(1812);
const int16x8_t a0a1 = vaddq_s16(d0d1, d3d2); // d0+d3 | d1+d2 (=a0|a1)
const int16x8_t a3a2 = vsubq_s16(d0d1, d3d2); // d0-d3 | d1-d2 (=a3|a2)
const int16x8_t a0a1_2 = vshlq_n_s16(a0a1, 3);
const int16x4_t tmp0 = vadd_s16(vget_low_s16(a0a1_2),
vget_high_s16(a0a1_2));
const int16x4_t tmp2 = vsub_s16(vget_low_s16(a0a1_2),
vget_high_s16(a0a1_2));
const int32x4_t a3_2217 = vmull_n_s16(vget_low_s16(a3a2), 2217);
const int32x4_t a2_2217 = vmull_n_s16(vget_high_s16(a3a2), 2217);
const int32x4_t a2_p_a3 = vmlal_n_s16(a2_2217, vget_low_s16(a3a2), 5352);
const int32x4_t a3_m_a2 = vmlsl_n_s16(a3_2217, vget_high_s16(a3a2), 5352);
const int16x4_t tmp1 = vshrn_n_s32(vaddq_s32(a2_p_a3, kCst1812), 9);
const int16x4_t tmp3 = vshrn_n_s32(vaddq_s32(a3_m_a2, kCst937), 9);
Transpose4x4_S16(tmp0, tmp1, tmp2, tmp3, &d0d1, &d3d2);
}
{ // 2nd pass
// the (1<<16) addition is for the replacement: a3!=0 <-> 1-(a3==0)
const int32x4_t kCst12000 = vdupq_n_s32(12000 + (1 << 16));
const int32x4_t kCst51000 = vdupq_n_s32(51000);
const int16x8_t a0a1 = vaddq_s16(d0d1, d3d2); // d0+d3 | d1+d2 (=a0|a1)
const int16x8_t a3a2 = vsubq_s16(d0d1, d3d2); // d0-d3 | d1-d2 (=a3|a2)
const int16x4_t a0_k7 = vadd_s16(vget_low_s16(a0a1), vdup_n_s16(7));
const int16x4_t out0 = vshr_n_s16(vadd_s16(a0_k7, vget_high_s16(a0a1)), 4);
const int16x4_t out2 = vshr_n_s16(vsub_s16(a0_k7, vget_high_s16(a0a1)), 4);
const int32x4_t a3_2217 = vmull_n_s16(vget_low_s16(a3a2), 2217);
const int32x4_t a2_2217 = vmull_n_s16(vget_high_s16(a3a2), 2217);
const int32x4_t a2_p_a3 = vmlal_n_s16(a2_2217, vget_low_s16(a3a2), 5352);
const int32x4_t a3_m_a2 = vmlsl_n_s16(a3_2217, vget_high_s16(a3a2), 5352);
const int16x4_t tmp1 = vaddhn_s32(a2_p_a3, kCst12000);
const int16x4_t out3 = vaddhn_s32(a3_m_a2, kCst51000);
const int16x4_t a3_eq_0 =
vreinterpret_s16_u16(vceq_s16(vget_low_s16(a3a2), vdup_n_s16(0)));
const int16x4_t out1 = vadd_s16(tmp1, a3_eq_0);
vst1_s16(out + 0, out0);
vst1_s16(out + 4, out1);
vst1_s16(out + 8, out2);
vst1_s16(out + 12, out3);
}
}
#else
// adapted from vp8/encoder/arm/neon/shortfdct_neon.asm
static const int16_t kCoeff16[] = {
5352, 5352, 5352, 5352, 2217, 2217, 2217, 2217
@@ -339,69 +471,76 @@ static void FTransform(const uint8_t* src, const uint8_t* ref,
);
}
static void FTransformWHT(const int16_t* in, int16_t* out) {
const int kStep = 32;
__asm__ volatile (
// d0 = in[0 * 16] , d1 = in[1 * 16]
// d2 = in[2 * 16] , d3 = in[3 * 16]
"vld1.16 d0[0], [%[in]], %[kStep] \n"
"vld1.16 d1[0], [%[in]], %[kStep] \n"
"vld1.16 d2[0], [%[in]], %[kStep] \n"
"vld1.16 d3[0], [%[in]], %[kStep] \n"
"vld1.16 d0[1], [%[in]], %[kStep] \n"
"vld1.16 d1[1], [%[in]], %[kStep] \n"
"vld1.16 d2[1], [%[in]], %[kStep] \n"
"vld1.16 d3[1], [%[in]], %[kStep] \n"
"vld1.16 d0[2], [%[in]], %[kStep] \n"
"vld1.16 d1[2], [%[in]], %[kStep] \n"
"vld1.16 d2[2], [%[in]], %[kStep] \n"
"vld1.16 d3[2], [%[in]], %[kStep] \n"
"vld1.16 d0[3], [%[in]], %[kStep] \n"
"vld1.16 d1[3], [%[in]], %[kStep] \n"
"vld1.16 d2[3], [%[in]], %[kStep] \n"
"vld1.16 d3[3], [%[in]], %[kStep] \n"
#endif
"vaddl.s16 q2, d0, d2 \n" // a0=(in[0*16]+in[2*16])
"vaddl.s16 q3, d1, d3 \n" // a1=(in[1*16]+in[3*16])
"vsubl.s16 q4, d1, d3 \n" // a2=(in[1*16]-in[3*16])
"vsubl.s16 q5, d0, d2 \n" // a3=(in[0*16]-in[2*16])
#define LOAD_LANE_16b(VALUE, LANE) do { \
(VALUE) = vld1_lane_s16(src, (VALUE), (LANE)); \
src += stride; \
} while (0)
"vqadd.s32 q6, q2, q3 \n" // a0 + a1
"vqadd.s32 q7, q5, q4 \n" // a3 + a2
"vqsub.s32 q8, q5, q4 \n" // a3 - a2
"vqsub.s32 q9, q2, q3 \n" // a0 - a1
static void FTransformWHT(const int16_t* src, int16_t* out) {
const int stride = 16;
const int16x4_t zero = vdup_n_s16(0);
int32x4x4_t tmp0;
int16x4x4_t in;
INIT_VECTOR4(in, zero, zero, zero, zero);
LOAD_LANE_16b(in.val[0], 0);
LOAD_LANE_16b(in.val[1], 0);
LOAD_LANE_16b(in.val[2], 0);
LOAD_LANE_16b(in.val[3], 0);
LOAD_LANE_16b(in.val[0], 1);
LOAD_LANE_16b(in.val[1], 1);
LOAD_LANE_16b(in.val[2], 1);
LOAD_LANE_16b(in.val[3], 1);
LOAD_LANE_16b(in.val[0], 2);
LOAD_LANE_16b(in.val[1], 2);
LOAD_LANE_16b(in.val[2], 2);
LOAD_LANE_16b(in.val[3], 2);
LOAD_LANE_16b(in.val[0], 3);
LOAD_LANE_16b(in.val[1], 3);
LOAD_LANE_16b(in.val[2], 3);
LOAD_LANE_16b(in.val[3], 3);
// Transpose
// q6 = tmp[0, 1, 2, 3] ; q7 = tmp[ 4, 5, 6, 7]
// q8 = tmp[8, 9, 10, 11] ; q9 = tmp[12, 13, 14, 15]
"vswp d13, d16 \n" // vtrn.64 q0, q2
"vswp d15, d18 \n" // vtrn.64 q1, q3
"vtrn.32 q6, q7 \n"
"vtrn.32 q8, q9 \n"
{
// a0 = in[0 * 16] + in[2 * 16]
// a1 = in[1 * 16] + in[3 * 16]
// a2 = in[1 * 16] - in[3 * 16]
// a3 = in[0 * 16] - in[2 * 16]
const int32x4_t a0 = vaddl_s16(in.val[0], in.val[2]);
const int32x4_t a1 = vaddl_s16(in.val[1], in.val[3]);
const int32x4_t a2 = vsubl_s16(in.val[1], in.val[3]);
const int32x4_t a3 = vsubl_s16(in.val[0], in.val[2]);
tmp0.val[0] = vaddq_s32(a0, a1);
tmp0.val[1] = vaddq_s32(a3, a2);
tmp0.val[2] = vsubq_s32(a3, a2);
tmp0.val[3] = vsubq_s32(a0, a1);
}
{
const int32x4x4_t tmp1 = Transpose4x4(tmp0);
// a0 = tmp[0 + i] + tmp[ 8 + i]
// a1 = tmp[4 + i] + tmp[12 + i]
// a2 = tmp[4 + i] - tmp[12 + i]
// a3 = tmp[0 + i] - tmp[ 8 + i]
const int32x4_t a0 = vaddq_s32(tmp1.val[0], tmp1.val[2]);
const int32x4_t a1 = vaddq_s32(tmp1.val[1], tmp1.val[3]);
const int32x4_t a2 = vsubq_s32(tmp1.val[1], tmp1.val[3]);
const int32x4_t a3 = vsubq_s32(tmp1.val[0], tmp1.val[2]);
const int32x4_t b0 = vhaddq_s32(a0, a1); // (a0 + a1) >> 1
const int32x4_t b1 = vhaddq_s32(a3, a2); // (a3 + a2) >> 1
const int32x4_t b2 = vhsubq_s32(a3, a2); // (a3 - a2) >> 1
const int32x4_t b3 = vhsubq_s32(a0, a1); // (a0 - a1) >> 1
const int16x4_t out0 = vmovn_s32(b0);
const int16x4_t out1 = vmovn_s32(b1);
const int16x4_t out2 = vmovn_s32(b2);
const int16x4_t out3 = vmovn_s32(b3);
"vqadd.s32 q0, q6, q8 \n" // a0 = tmp[0] + tmp[8]
"vqadd.s32 q1, q7, q9 \n" // a1 = tmp[4] + tmp[12]
"vqsub.s32 q2, q7, q9 \n" // a2 = tmp[4] - tmp[12]
"vqsub.s32 q3, q6, q8 \n" // a3 = tmp[0] - tmp[8]
"vqadd.s32 q4, q0, q1 \n" // b0 = a0 + a1
"vqadd.s32 q5, q3, q2 \n" // b1 = a3 + a2
"vqsub.s32 q6, q3, q2 \n" // b2 = a3 - a2
"vqsub.s32 q7, q0, q1 \n" // b3 = a0 - a1
"vshrn.s32 d18, q4, #1 \n" // b0 >> 1
"vshrn.s32 d19, q5, #1 \n" // b1 >> 1
"vshrn.s32 d20, q6, #1 \n" // b2 >> 1
"vshrn.s32 d21, q7, #1 \n" // b3 >> 1
"vst1.16 {q9, q10}, [%[out]] \n"
: [in] "+r"(in)
: [kStep] "r"(kStep), [out] "r"(out)
: "memory", "q0", "q1", "q2", "q3", "q4", "q5",
"q6", "q7", "q8", "q9", "q10" // clobbered
) ;
vst1_s16(out + 0, out0);
vst1_s16(out + 4, out1);
vst1_s16(out + 8, out2);
vst1_s16(out + 12, out3);
}
}
#undef LOAD_LANE_16b
//------------------------------------------------------------------------------
// Texture distortion
@@ -409,9 +548,136 @@ static void FTransformWHT(const int16_t* in, int16_t* out) {
// We try to match the spectral content (weighted) between source and
// reconstructed samples.
// This code works but is *slower* than the inlined-asm version below
// (with gcc-4.6). So we disable it for now. Later, it'll be conditional to
// USE_INTRINSICS define.
// With gcc-4.8, it's only slightly slower than the inlined.
#if defined(USE_INTRINSICS)
// Zero extend an uint16x4_t 'v' to an int32x4_t.
static WEBP_INLINE int32x4_t ConvertU16ToS32(uint16x4_t v) {
return vreinterpretq_s32_u32(vmovl_u16(v));
}
// Does a regular 4x4 transpose followed by an adjustment of the upper columns
// in the inner rows to restore the source order of differences,
// i.e., a0 - a1 | a3 - a2.
static WEBP_INLINE int32x4x4_t DistoTranspose4x4(const int32x4x4_t rows) {
int32x4x4_t out = Transpose4x4(rows);
// restore source order in the columns containing differences.
const int32x2_t r1h = vget_high_s32(out.val[1]);
const int32x2_t r2h = vget_high_s32(out.val[2]);
out.val[1] = vcombine_s32(vget_low_s32(out.val[1]), r2h);
out.val[2] = vcombine_s32(vget_low_s32(out.val[2]), r1h);
return out;
}
static WEBP_INLINE int32x4x4_t DistoHorizontalPass(const uint8x8_t r0r1,
const uint8x8_t r2r3) {
// a0 = in[0] + in[2] | a1 = in[1] + in[3]
const uint16x8_t a0a1 = vaddl_u8(r0r1, r2r3);
// a3 = in[0] - in[2] | a2 = in[1] - in[3]
const uint16x8_t a3a2 = vsubl_u8(r0r1, r2r3);
const int32x4_t tmp0 = vpaddlq_s16(vreinterpretq_s16_u16(a0a1)); // a0 + a1
const int32x4_t tmp1 = vpaddlq_s16(vreinterpretq_s16_u16(a3a2)); // a3 + a2
// no pairwise subtraction; reorder to perform tmp[2]/tmp[3] calculations.
// a0a0 a3a3 a0a0 a3a3 a0a0 a3a3 a0a0 a3a3
// a1a1 a2a2 a1a1 a2a2 a1a1 a2a2 a1a1 a2a2
const int16x8x2_t transpose =
vtrnq_s16(vreinterpretq_s16_u16(a0a1), vreinterpretq_s16_u16(a3a2));
// tmp[3] = a0 - a1 | tmp[2] = a3 - a2
const int32x4_t tmp32_1 = vsubl_s16(vget_low_s16(transpose.val[0]),
vget_low_s16(transpose.val[1]));
const int32x4_t tmp32_2 = vsubl_s16(vget_high_s16(transpose.val[0]),
vget_high_s16(transpose.val[1]));
// [0]: tmp[3] [1]: tmp[2]
const int32x4x2_t split = vtrnq_s32(tmp32_1, tmp32_2);
const int32x4x4_t res = { { tmp0, tmp1, split.val[1], split.val[0] } };
return res;
}
static WEBP_INLINE int32x4x4_t DistoVerticalPass(const int32x4x4_t rows) {
// a0 = tmp[0 + i] + tmp[8 + i];
const int32x4_t a0 = vaddq_s32(rows.val[0], rows.val[1]);
// a1 = tmp[4 + i] + tmp[12+ i];
const int32x4_t a1 = vaddq_s32(rows.val[2], rows.val[3]);
// a2 = tmp[4 + i] - tmp[12+ i];
const int32x4_t a2 = vsubq_s32(rows.val[2], rows.val[3]);
// a3 = tmp[0 + i] - tmp[8 + i];
const int32x4_t a3 = vsubq_s32(rows.val[0], rows.val[1]);
const int32x4_t b0 = vqabsq_s32(vaddq_s32(a0, a1)); // abs(a0 + a1)
const int32x4_t b1 = vqabsq_s32(vaddq_s32(a3, a2)); // abs(a3 + a2)
const int32x4_t b2 = vabdq_s32(a3, a2); // abs(a3 - a2)
const int32x4_t b3 = vabdq_s32(a0, a1); // abs(a0 - a1)
const int32x4x4_t res = { { b0, b1, b2, b3 } };
return res;
}
// Calculate the weighted sum of the rows in 'b'.
static WEBP_INLINE int64x1_t DistoSum(const int32x4x4_t b,
const int32x4_t w0, const int32x4_t w1,
const int32x4_t w2, const int32x4_t w3) {
const int32x4_t s0 = vmulq_s32(w0, b.val[0]);
const int32x4_t s1 = vmlaq_s32(s0, w1, b.val[1]);
const int32x4_t s2 = vmlaq_s32(s1, w2, b.val[2]);
const int32x4_t s3 = vmlaq_s32(s2, w3, b.val[3]);
const int64x2_t sum1 = vpaddlq_s32(s3);
const int64x1_t sum2 = vadd_s64(vget_low_s64(sum1), vget_high_s64(sum1));
return sum2;
}
#define LOAD_LANE_32b(src, VALUE, LANE) \
(VALUE) = vld1q_lane_u32((const uint32_t*)(src), (VALUE), (LANE))
// Hadamard transform
// Returns the weighted sum of the absolute value of transformed coefficients.
static int Disto4x4(const uint8_t* const a, const uint8_t* const b,
const uint16_t* const w) {
uint32x4_t d0d1 = { 0, 0, 0, 0 };
uint32x4_t d2d3 = { 0, 0, 0, 0 };
LOAD_LANE_32b(a + 0 * BPS, d0d1, 0); // a00 a01 a02 a03
LOAD_LANE_32b(a + 1 * BPS, d0d1, 1); // a10 a11 a12 a13
LOAD_LANE_32b(b + 0 * BPS, d0d1, 2); // b00 b01 b02 b03
LOAD_LANE_32b(b + 1 * BPS, d0d1, 3); // b10 b11 b12 b13
LOAD_LANE_32b(a + 2 * BPS, d2d3, 0); // a20 a21 a22 a23
LOAD_LANE_32b(a + 3 * BPS, d2d3, 1); // a30 a31 a32 a33
LOAD_LANE_32b(b + 2 * BPS, d2d3, 2); // b20 b21 b22 b23
LOAD_LANE_32b(b + 3 * BPS, d2d3, 3); // b30 b31 b32 b33
{
// a00 a01 a20 a21 a10 a11 a30 a31 b00 b01 b20 b21 b10 b11 b30 b31
// a02 a03 a22 a23 a12 a13 a32 a33 b02 b03 b22 b23 b12 b13 b32 b33
const uint16x8x2_t tmp =
vtrnq_u16(vreinterpretq_u16_u32(d0d1), vreinterpretq_u16_u32(d2d3));
const uint8x16_t d0d1u8 = vreinterpretq_u8_u16(tmp.val[0]);
const uint8x16_t d2d3u8 = vreinterpretq_u8_u16(tmp.val[1]);
const int32x4x4_t hpass_a = DistoHorizontalPass(vget_low_u8(d0d1u8),
vget_low_u8(d2d3u8));
const int32x4x4_t hpass_b = DistoHorizontalPass(vget_high_u8(d0d1u8),
vget_high_u8(d2d3u8));
const int32x4x4_t tmp_a = DistoTranspose4x4(hpass_a);
const int32x4x4_t tmp_b = DistoTranspose4x4(hpass_b);
const int32x4x4_t vpass_a = DistoVerticalPass(tmp_a);
const int32x4x4_t vpass_b = DistoVerticalPass(tmp_b);
const int32x4_t w0 = ConvertU16ToS32(vld1_u16(w + 0));
const int32x4_t w1 = ConvertU16ToS32(vld1_u16(w + 4));
const int32x4_t w2 = ConvertU16ToS32(vld1_u16(w + 8));
const int32x4_t w3 = ConvertU16ToS32(vld1_u16(w + 12));
const int64x1_t sum1 = DistoSum(vpass_a, w0, w1, w2, w3);
const int64x1_t sum2 = DistoSum(vpass_b, w0, w1, w2, w3);
const int32x2_t diff = vabd_s32(vreinterpret_s32_s64(sum1),
vreinterpret_s32_s64(sum2));
const int32x2_t res = vshr_n_s32(diff, 5);
return vget_lane_s32(res, 0);
}
}
#undef LOAD_LANE_32b
#else
// Hadamard transform
// Returns the weighted sum of the absolute value of transformed coefficients.
// This uses a TTransform helper function in C
static int Disto4x4(const uint8_t* const a, const uint8_t* const b,
const uint16_t* const w) {
const int kBPS = BPS;
@@ -598,6 +864,8 @@ static int Disto4x4(const uint8_t* const a, const uint8_t* const b,
return sum;
}
#endif // USE_INTRINSICS
static int Disto16x16(const uint8_t* const a, const uint8_t* const b,
const uint16_t* const w) {
int D = 0;
@@ -610,6 +878,179 @@ static int Disto16x16(const uint8_t* const a, const uint8_t* const b,
return D;
}
//------------------------------------------------------------------------------
static void CollectHistogram(const uint8_t* ref, const uint8_t* pred,
int start_block, int end_block,
VP8Histogram* const histo) {
const uint16x8_t max_coeff_thresh = vdupq_n_u16(MAX_COEFF_THRESH);
int j;
for (j = start_block; j < end_block; ++j) {
int16_t out[16];
FTransform(ref + VP8DspScan[j], pred + VP8DspScan[j], out);
{
int k;
const int16x8_t a0 = vld1q_s16(out + 0);
const int16x8_t b0 = vld1q_s16(out + 8);
const uint16x8_t a1 = vreinterpretq_u16_s16(vabsq_s16(a0));
const uint16x8_t b1 = vreinterpretq_u16_s16(vabsq_s16(b0));
const uint16x8_t a2 = vshrq_n_u16(a1, 3);
const uint16x8_t b2 = vshrq_n_u16(b1, 3);
const uint16x8_t a3 = vminq_u16(a2, max_coeff_thresh);
const uint16x8_t b3 = vminq_u16(b2, max_coeff_thresh);
vst1q_s16(out + 0, vreinterpretq_s16_u16(a3));
vst1q_s16(out + 8, vreinterpretq_s16_u16(b3));
// Convert coefficients to bin.
for (k = 0; k < 16; ++k) {
histo->distribution[out[k]]++;
}
}
}
}
//------------------------------------------------------------------------------
static WEBP_INLINE void AccumulateSSE16(const uint8_t* const a,
const uint8_t* const b,
uint32x4_t* const sum) {
const uint8x16_t a0 = vld1q_u8(a);
const uint8x16_t b0 = vld1q_u8(b);
const uint8x16_t abs_diff = vabdq_u8(a0, b0);
uint16x8_t prod = vmull_u8(vget_low_u8(abs_diff), vget_low_u8(abs_diff));
prod = vmlal_u8(prod, vget_high_u8(abs_diff), vget_high_u8(abs_diff));
*sum = vpadalq_u16(*sum, prod); // pair-wise add and accumulate
}
// Horizontal sum of all four uint32_t values in 'sum'.
static int SumToInt(uint32x4_t sum) {
const uint64x2_t sum2 = vpaddlq_u32(sum);
const uint64_t sum3 = vgetq_lane_u64(sum2, 0) + vgetq_lane_u64(sum2, 1);
return (int)sum3;
}
static int SSE16x16(const uint8_t* a, const uint8_t* b) {
uint32x4_t sum = vdupq_n_u32(0);
int y;
for (y = 0; y < 16; ++y) {
AccumulateSSE16(a + y * BPS, b + y * BPS, &sum);
}
return SumToInt(sum);
}
static int SSE16x8(const uint8_t* a, const uint8_t* b) {
uint32x4_t sum = vdupq_n_u32(0);
int y;
for (y = 0; y < 8; ++y) {
AccumulateSSE16(a + y * BPS, b + y * BPS, &sum);
}
return SumToInt(sum);
}
static int SSE8x8(const uint8_t* a, const uint8_t* b) {
uint32x4_t sum = vdupq_n_u32(0);
int y;
for (y = 0; y < 8; ++y) {
const uint8x8_t a0 = vld1_u8(a + y * BPS);
const uint8x8_t b0 = vld1_u8(b + y * BPS);
const uint8x8_t abs_diff = vabd_u8(a0, b0);
const uint16x8_t prod = vmull_u8(abs_diff, abs_diff);
sum = vpadalq_u16(sum, prod);
}
return SumToInt(sum);
}
static int SSE4x4(const uint8_t* a, const uint8_t* b) {
const uint8x16_t a0 = Load4x4(a);
const uint8x16_t b0 = Load4x4(b);
const uint8x16_t abs_diff = vabdq_u8(a0, b0);
uint16x8_t prod = vmull_u8(vget_low_u8(abs_diff), vget_low_u8(abs_diff));
prod = vmlal_u8(prod, vget_high_u8(abs_diff), vget_high_u8(abs_diff));
return SumToInt(vpaddlq_u16(prod));
}
//------------------------------------------------------------------------------
// Compilation with gcc-4.6.x is problematic for now.
#if !defined(WORK_AROUND_GCC)
static int16x8_t Quantize(int16_t* const in,
const VP8Matrix* const mtx, int offset) {
const uint16x8_t sharp = vld1q_u16(&mtx->sharpen_[offset]);
const uint16x8_t q = vld1q_u16(&mtx->q_[offset]);
const uint16x8_t iq = vld1q_u16(&mtx->iq_[offset]);
const uint32x4_t bias0 = vld1q_u32(&mtx->bias_[offset + 0]);
const uint32x4_t bias1 = vld1q_u32(&mtx->bias_[offset + 4]);
const int16x8_t a = vld1q_s16(in + offset); // in
const uint16x8_t b = vreinterpretq_u16_s16(vabsq_s16(a)); // coeff = abs(in)
const int16x8_t sign = vshrq_n_s16(a, 15); // sign
const uint16x8_t c = vaddq_u16(b, sharp); // + sharpen
const uint32x4_t m0 = vmull_u16(vget_low_u16(c), vget_low_u16(iq));
const uint32x4_t m1 = vmull_u16(vget_high_u16(c), vget_high_u16(iq));
const uint32x4_t m2 = vhaddq_u32(m0, bias0);
const uint32x4_t m3 = vhaddq_u32(m1, bias1); // (coeff * iQ + bias) >> 1
const uint16x8_t c0 = vcombine_u16(vshrn_n_u32(m2, 16),
vshrn_n_u32(m3, 16)); // QFIX=17 = 16+1
const uint16x8_t c1 = vminq_u16(c0, vdupq_n_u16(MAX_LEVEL));
const int16x8_t c2 = veorq_s16(vreinterpretq_s16_u16(c1), sign);
const int16x8_t c3 = vsubq_s16(c2, sign); // restore sign
const int16x8_t c4 = vmulq_s16(c3, vreinterpretq_s16_u16(q));
vst1q_s16(in + offset, c4);
assert(QFIX == 17); // this function can't work as is if QFIX != 16+1
return c3;
}
static const uint8_t kShuffles[4][8] = {
{ 0, 1, 2, 3, 8, 9, 16, 17 },
{ 10, 11, 4, 5, 6, 7, 12, 13 },
{ 18, 19, 24, 25, 26, 27, 20, 21 },
{ 14, 15, 22, 23, 28, 29, 30, 31 }
};
static int QuantizeBlock(int16_t in[16], int16_t out[16],
const VP8Matrix* const mtx) {
const int16x8_t out0 = Quantize(in, mtx, 0);
const int16x8_t out1 = Quantize(in, mtx, 8);
uint8x8x4_t shuffles;
// vtbl?_u8 are marked unavailable for iOS arm64 with Xcode < 6.3, use
// non-standard versions there.
#if defined(__APPLE__) && defined(__aarch64__) && \
defined(__apple_build_version__) && (__apple_build_version__< 6020037)
uint8x16x2_t all_out;
INIT_VECTOR2(all_out, vreinterpretq_u8_s16(out0), vreinterpretq_u8_s16(out1));
INIT_VECTOR4(shuffles,
vtbl2q_u8(all_out, vld1_u8(kShuffles[0])),
vtbl2q_u8(all_out, vld1_u8(kShuffles[1])),
vtbl2q_u8(all_out, vld1_u8(kShuffles[2])),
vtbl2q_u8(all_out, vld1_u8(kShuffles[3])));
#else
uint8x8x4_t all_out;
INIT_VECTOR4(all_out,
vreinterpret_u8_s16(vget_low_s16(out0)),
vreinterpret_u8_s16(vget_high_s16(out0)),
vreinterpret_u8_s16(vget_low_s16(out1)),
vreinterpret_u8_s16(vget_high_s16(out1)));
INIT_VECTOR4(shuffles,
vtbl4_u8(all_out, vld1_u8(kShuffles[0])),
vtbl4_u8(all_out, vld1_u8(kShuffles[1])),
vtbl4_u8(all_out, vld1_u8(kShuffles[2])),
vtbl4_u8(all_out, vld1_u8(kShuffles[3])));
#endif
// Zigzag reordering
vst1_u8((uint8_t*)(out + 0), shuffles.val[0]);
vst1_u8((uint8_t*)(out + 4), shuffles.val[1]);
vst1_u8((uint8_t*)(out + 8), shuffles.val[2]);
vst1_u8((uint8_t*)(out + 12), shuffles.val[3]);
// test zeros
if (*(uint64_t*)(out + 0) != 0) return 1;
if (*(uint64_t*)(out + 4) != 0) return 1;
if (*(uint64_t*)(out + 8) != 0) return 1;
if (*(uint64_t*)(out + 12) != 0) return 1;
return 0;
}
#endif // !WORK_AROUND_GCC
#endif // WEBP_USE_NEON
//------------------------------------------------------------------------------
@@ -622,11 +1063,17 @@ void VP8EncDspInitNEON(void) {
VP8ITransform = ITransform;
VP8FTransform = FTransform;
VP8ITransformWHT = ITransformWHT;
VP8FTransformWHT = FTransformWHT;
VP8TDisto4x4 = Disto4x4;
VP8TDisto16x16 = Disto16x16;
VP8CollectHistogram = CollectHistogram;
VP8SSE16x16 = SSE16x16;
VP8SSE16x8 = SSE16x8;
VP8SSE8x8 = SSE8x8;
VP8SSE4x4 = SSE4x4;
#if !defined(WORK_AROUND_GCC)
VP8EncQuantizeBlock = QuantizeBlock;
#endif
#endif // WEBP_USE_NEON
}

View File

@@ -17,7 +17,9 @@
#include <stdlib.h> // for abs()
#include <emmintrin.h>
#include "../enc/cost.h"
#include "../enc/vp8enci.h"
#include "../utils/utils.h"
//------------------------------------------------------------------------------
// Quite useful macro for debugging. Left here for convenience.
@@ -52,9 +54,9 @@ static void PrintReg(const __m128i r, const char* const name, int size) {
// Compute susceptibility based on DCT-coeff histograms:
// the higher, the "easier" the macroblock is to compress.
static void CollectHistogramSSE2(const uint8_t* ref, const uint8_t* pred,
int start_block, int end_block,
VP8Histogram* const histo) {
static void CollectHistogram(const uint8_t* ref, const uint8_t* pred,
int start_block, int end_block,
VP8Histogram* const histo) {
const __m128i max_coeff_thresh = _mm_set1_epi16(MAX_COEFF_THRESH);
int j;
for (j = start_block; j < end_block; ++j) {
@@ -98,8 +100,8 @@ static void CollectHistogramSSE2(const uint8_t* ref, const uint8_t* pred,
// Transforms (Paragraph 14.4)
// Does one or two inverse transforms.
static void ITransformSSE2(const uint8_t* ref, const int16_t* in, uint8_t* dst,
int do_two) {
static void ITransform(const uint8_t* ref, const int16_t* in, uint8_t* dst,
int do_two) {
// This implementation makes use of 16-bit fixed point versions of two
// multiply constants:
// K1 = sqrt(2) * cos (pi/8) ~= 85627 / 2^16
@@ -318,8 +320,7 @@ static void ITransformSSE2(const uint8_t* ref, const int16_t* in, uint8_t* dst,
}
}
static void FTransformSSE2(const uint8_t* src, const uint8_t* ref,
int16_t* out) {
static void FTransform(const uint8_t* src, const uint8_t* ref, int16_t* out) {
const __m128i zero = _mm_setzero_si128();
const __m128i seven = _mm_set1_epi16(7);
const __m128i k937 = _mm_set1_epi32(937);
@@ -444,14 +445,14 @@ static void FTransformSSE2(const uint8_t* src, const uint8_t* ref,
// -> f1 = f1 + 1 - (a3 == 0)
const __m128i g1 = _mm_add_epi16(f1, _mm_cmpeq_epi16(a32, zero));
_mm_storel_epi64((__m128i*)&out[ 0], d0);
_mm_storel_epi64((__m128i*)&out[ 4], g1);
_mm_storel_epi64((__m128i*)&out[ 8], d2);
_mm_storel_epi64((__m128i*)&out[12], f3);
const __m128i d0_g1 = _mm_unpacklo_epi64(d0, g1);
const __m128i d2_f3 = _mm_unpacklo_epi64(d2, f3);
_mm_storeu_si128((__m128i*)&out[0], d0_g1);
_mm_storeu_si128((__m128i*)&out[8], d2_f3);
}
}
static void FTransformWHTSSE2(const int16_t* in, int16_t* out) {
static void FTransformWHT(const int16_t* in, int16_t* out) {
int32_t tmp[16];
int i;
for (i = 0; i < 4; ++i, in += 64) {
@@ -487,8 +488,8 @@ static void FTransformWHTSSE2(const int16_t* in, int16_t* out) {
//------------------------------------------------------------------------------
// Metric
static int SSE_Nx4SSE2(const uint8_t* a, const uint8_t* b,
int num_quads, int do_16) {
static int SSE_Nx4(const uint8_t* a, const uint8_t* b,
int num_quads, int do_16) {
const __m128i zero = _mm_setzero_si128();
__m128i sum1 = zero;
__m128i sum2 = zero;
@@ -565,19 +566,19 @@ static int SSE_Nx4SSE2(const uint8_t* a, const uint8_t* b,
}
}
static int SSE16x16SSE2(const uint8_t* a, const uint8_t* b) {
return SSE_Nx4SSE2(a, b, 4, 1);
static int SSE16x16(const uint8_t* a, const uint8_t* b) {
return SSE_Nx4(a, b, 4, 1);
}
static int SSE16x8SSE2(const uint8_t* a, const uint8_t* b) {
return SSE_Nx4SSE2(a, b, 2, 1);
static int SSE16x8(const uint8_t* a, const uint8_t* b) {
return SSE_Nx4(a, b, 2, 1);
}
static int SSE8x8SSE2(const uint8_t* a, const uint8_t* b) {
return SSE_Nx4SSE2(a, b, 2, 0);
static int SSE8x8(const uint8_t* a, const uint8_t* b) {
return SSE_Nx4(a, b, 2, 0);
}
static int SSE4x4SSE2(const uint8_t* a, const uint8_t* b) {
static int SSE4x4(const uint8_t* a, const uint8_t* b) {
const __m128i zero = _mm_setzero_si128();
// Load values. Note that we read 8 pixels instead of 4,
@@ -634,8 +635,8 @@ static int SSE4x4SSE2(const uint8_t* a, const uint8_t* b) {
// Hadamard transform
// Returns the difference between the weighted sum of the absolute value of
// transformed coefficients.
static int TTransformSSE2(const uint8_t* inA, const uint8_t* inB,
const uint16_t* const w) {
static int TTransform(const uint8_t* inA, const uint8_t* inB,
const uint16_t* const w) {
int32_t sum[4];
__m128i tmp_0, tmp_1, tmp_2, tmp_3;
const __m128i zero = _mm_setzero_si128();
@@ -782,19 +783,19 @@ static int TTransformSSE2(const uint8_t* inA, const uint8_t* inB,
return sum[0] + sum[1] + sum[2] + sum[3];
}
static int Disto4x4SSE2(const uint8_t* const a, const uint8_t* const b,
const uint16_t* const w) {
const int diff_sum = TTransformSSE2(a, b, w);
static int Disto4x4(const uint8_t* const a, const uint8_t* const b,
const uint16_t* const w) {
const int diff_sum = TTransform(a, b, w);
return abs(diff_sum) >> 5;
}
static int Disto16x16SSE2(const uint8_t* const a, const uint8_t* const b,
const uint16_t* const w) {
static int Disto16x16(const uint8_t* const a, const uint8_t* const b,
const uint16_t* const w) {
int D = 0;
int x, y;
for (y = 0; y < 16 * BPS; y += 4 * BPS) {
for (x = 0; x < 16; x += 4) {
D += Disto4x4SSE2(a + x + y, b + x + y, w);
D += Disto4x4(a + x + y, b + x + y, w);
}
}
return D;
@@ -804,9 +805,9 @@ static int Disto16x16SSE2(const uint8_t* const a, const uint8_t* const b,
// Quantization
//
// Simple quantization
static int QuantizeBlockSSE2(int16_t in[16], int16_t out[16],
int n, const VP8Matrix* const mtx) {
static WEBP_INLINE int DoQuantizeBlock(int16_t in[16], int16_t out[16],
const uint16_t* const sharpen,
const VP8Matrix* const mtx) {
const __m128i max_coeff_2047 = _mm_set1_epi16(MAX_LEVEL);
const __m128i zero = _mm_setzero_si128();
__m128i coeff0, coeff8;
@@ -818,18 +819,14 @@ static int QuantizeBlockSSE2(int16_t in[16], int16_t out[16],
// we can use _mm_load_si128 instead of _mm_loadu_si128.
__m128i in0 = _mm_loadu_si128((__m128i*)&in[0]);
__m128i in8 = _mm_loadu_si128((__m128i*)&in[8]);
const __m128i sharpen0 = _mm_loadu_si128((__m128i*)&mtx->sharpen_[0]);
const __m128i sharpen8 = _mm_loadu_si128((__m128i*)&mtx->sharpen_[8]);
const __m128i iq0 = _mm_loadu_si128((__m128i*)&mtx->iq_[0]);
const __m128i iq8 = _mm_loadu_si128((__m128i*)&mtx->iq_[8]);
const __m128i bias0 = _mm_loadu_si128((__m128i*)&mtx->bias_[0]);
const __m128i bias8 = _mm_loadu_si128((__m128i*)&mtx->bias_[8]);
const __m128i q0 = _mm_loadu_si128((__m128i*)&mtx->q_[0]);
const __m128i q8 = _mm_loadu_si128((__m128i*)&mtx->q_[8]);
// sign(in) = in >> 15 (0x0000 if positive, 0xffff if negative)
const __m128i sign0 = _mm_srai_epi16(in0, 15);
const __m128i sign8 = _mm_srai_epi16(in8, 15);
// extract sign(in) (0x0000 if positive, 0xffff if negative)
const __m128i sign0 = _mm_cmpgt_epi16(zero, in0);
const __m128i sign8 = _mm_cmpgt_epi16(zero, in8);
// coeff = abs(in) = (in ^ sign) - sign
coeff0 = _mm_xor_si128(in0, sign0);
@@ -838,32 +835,35 @@ static int QuantizeBlockSSE2(int16_t in[16], int16_t out[16],
coeff8 = _mm_sub_epi16(coeff8, sign8);
// coeff = abs(in) + sharpen
coeff0 = _mm_add_epi16(coeff0, sharpen0);
coeff8 = _mm_add_epi16(coeff8, sharpen8);
if (sharpen != NULL) {
const __m128i sharpen0 = _mm_loadu_si128((__m128i*)&sharpen[0]);
const __m128i sharpen8 = _mm_loadu_si128((__m128i*)&sharpen[8]);
coeff0 = _mm_add_epi16(coeff0, sharpen0);
coeff8 = _mm_add_epi16(coeff8, sharpen8);
}
// out = (coeff * iQ + B) >> QFIX;
// out = (coeff * iQ + B) >> QFIX
{
// doing calculations with 32b precision (QFIX=17)
// out = (coeff * iQ)
__m128i coeff_iQ0H = _mm_mulhi_epu16(coeff0, iq0);
__m128i coeff_iQ0L = _mm_mullo_epi16(coeff0, iq0);
__m128i coeff_iQ8H = _mm_mulhi_epu16(coeff8, iq8);
__m128i coeff_iQ8L = _mm_mullo_epi16(coeff8, iq8);
const __m128i coeff_iQ0H = _mm_mulhi_epu16(coeff0, iq0);
const __m128i coeff_iQ0L = _mm_mullo_epi16(coeff0, iq0);
const __m128i coeff_iQ8H = _mm_mulhi_epu16(coeff8, iq8);
const __m128i coeff_iQ8L = _mm_mullo_epi16(coeff8, iq8);
__m128i out_00 = _mm_unpacklo_epi16(coeff_iQ0L, coeff_iQ0H);
__m128i out_04 = _mm_unpackhi_epi16(coeff_iQ0L, coeff_iQ0H);
__m128i out_08 = _mm_unpacklo_epi16(coeff_iQ8L, coeff_iQ8H);
__m128i out_12 = _mm_unpackhi_epi16(coeff_iQ8L, coeff_iQ8H);
// expand bias from 16b to 32b
__m128i bias_00 = _mm_unpacklo_epi16(bias0, zero);
__m128i bias_04 = _mm_unpackhi_epi16(bias0, zero);
__m128i bias_08 = _mm_unpacklo_epi16(bias8, zero);
__m128i bias_12 = _mm_unpackhi_epi16(bias8, zero);
// out = (coeff * iQ + B)
const __m128i bias_00 = _mm_loadu_si128((__m128i*)&mtx->bias_[0]);
const __m128i bias_04 = _mm_loadu_si128((__m128i*)&mtx->bias_[4]);
const __m128i bias_08 = _mm_loadu_si128((__m128i*)&mtx->bias_[8]);
const __m128i bias_12 = _mm_loadu_si128((__m128i*)&mtx->bias_[12]);
out_00 = _mm_add_epi32(out_00, bias_00);
out_04 = _mm_add_epi32(out_04, bias_04);
out_08 = _mm_add_epi32(out_08, bias_08);
out_12 = _mm_add_epi32(out_12, bias_12);
// out = (coeff * iQ + B) >> QFIX;
// out = QUANTDIV(coeff, iQ, B, QFIX)
out_00 = _mm_srai_epi32(out_00, QFIX);
out_04 = _mm_srai_epi32(out_04, QFIX);
out_08 = _mm_srai_epi32(out_08, QFIX);
@@ -916,19 +916,44 @@ static int QuantizeBlockSSE2(int16_t in[16], int16_t out[16],
}
// detect if all 'out' values are zeroes or not
{
int32_t tmp[4];
_mm_storeu_si128((__m128i*)tmp, packed_out);
if (n) {
tmp[0] &= ~0xff;
}
return (tmp[3] || tmp[2] || tmp[1] || tmp[0]);
}
return (_mm_movemask_epi8(_mm_cmpeq_epi8(packed_out, zero)) != 0xffff);
}
static int QuantizeBlockWHTSSE2(int16_t in[16], int16_t out[16],
const VP8Matrix* const mtx) {
return QuantizeBlockSSE2(in, out, 0, mtx);
static int QuantizeBlock(int16_t in[16], int16_t out[16],
const VP8Matrix* const mtx) {
return DoQuantizeBlock(in, out, &mtx->sharpen_[0], mtx);
}
static int QuantizeBlockWHT(int16_t in[16], int16_t out[16],
const VP8Matrix* const mtx) {
return DoQuantizeBlock(in, out, NULL, mtx);
}
// Forward declaration.
void VP8SetResidualCoeffsSSE2(const int16_t* const coeffs,
VP8Residual* const res);
void VP8SetResidualCoeffsSSE2(const int16_t* const coeffs,
VP8Residual* const res) {
const __m128i c0 = _mm_loadu_si128((const __m128i*)coeffs);
const __m128i c1 = _mm_loadu_si128((const __m128i*)(coeffs + 8));
// Use SSE to compare 8 values with a single instruction.
const __m128i zero = _mm_setzero_si128();
const __m128i m0 = _mm_cmpeq_epi16(c0, zero);
const __m128i m1 = _mm_cmpeq_epi16(c1, zero);
// Get the comparison results as a bitmask, consisting of two times 16 bits:
// two identical bits for each result. Concatenate both bitmasks to get a
// single 32 bit value. Negate the mask to get the position of entries that
// are not equal to zero. We don't need to mask out least significant bits
// according to res->first, since coeffs[0] is 0 if res->first > 0
const uint32_t mask =
~(((uint32_t)_mm_movemask_epi8(m1) << 16) | _mm_movemask_epi8(m0));
// The position of the most significant non-zero bit indicates the position of
// the last non-zero value. Divide the result by two because __movemask_epi8
// operates on 8 bit values instead of 16 bit values.
assert(res->first == 0 || coeffs[0] == 0);
res->last = mask ? (BitsLog2Floor(mask) >> 1) : -1;
res->coeffs = coeffs;
}
#endif // WEBP_USE_SSE2
@@ -940,18 +965,18 @@ extern void VP8EncDspInitSSE2(void);
void VP8EncDspInitSSE2(void) {
#if defined(WEBP_USE_SSE2)
VP8CollectHistogram = CollectHistogramSSE2;
VP8EncQuantizeBlock = QuantizeBlockSSE2;
VP8EncQuantizeBlockWHT = QuantizeBlockWHTSSE2;
VP8ITransform = ITransformSSE2;
VP8FTransform = FTransformSSE2;
VP8FTransformWHT = FTransformWHTSSE2;
VP8SSE16x16 = SSE16x16SSE2;
VP8SSE16x8 = SSE16x8SSE2;
VP8SSE8x8 = SSE8x8SSE2;
VP8SSE4x4 = SSE4x4SSE2;
VP8TDisto4x4 = Disto4x4SSE2;
VP8TDisto16x16 = Disto16x16SSE2;
VP8CollectHistogram = CollectHistogram;
VP8EncQuantizeBlock = QuantizeBlock;
VP8EncQuantizeBlockWHT = QuantizeBlockWHT;
VP8ITransform = ITransform;
VP8FTransform = FTransform;
VP8FTransformWHT = FTransformWHT;
VP8SSE16x16 = SSE16x16;
VP8SSE16x8 = SSE16x8;
VP8SSE8x8 = SSE8x8;
VP8SSE4x4 = SSE4x4;
VP8TDisto4x4 = Disto4x4;
VP8TDisto16x16 = Disto16x16;
#endif // WEBP_USE_SSE2
}

File diff suppressed because it is too large Load Diff

View File

@@ -18,26 +18,58 @@
#include "../webp/types.h"
#include "../webp/decode.h"
#include "../enc/histogram.h"
#include "../utils/utils.h"
#ifdef __cplusplus
extern "C" {
#endif
//------------------------------------------------------------------------------
//
// Signatures and generic function-pointers
typedef uint32_t (*VP8LPredClampedAddSubFunc)(uint32_t c0, uint32_t c1,
uint32_t c2);
typedef uint32_t (*VP8LPredSelectFunc)(uint32_t c0, uint32_t c1, uint32_t c2);
typedef void (*VP8LSubtractGreenFromBlueAndRedFunc)(uint32_t* argb_data,
int num_pixs);
typedef void (*VP8LAddGreenToBlueAndRedFunc)(uint32_t* data_start,
const uint32_t* data_end);
typedef uint32_t (*VP8LPredictorFunc)(uint32_t left, const uint32_t* const top);
extern VP8LPredictorFunc VP8LPredictors[16];
extern VP8LPredClampedAddSubFunc VP8LClampedAddSubtractFull;
extern VP8LPredClampedAddSubFunc VP8LClampedAddSubtractHalf;
extern VP8LPredSelectFunc VP8LSelect;
extern VP8LSubtractGreenFromBlueAndRedFunc VP8LSubtractGreenFromBlueAndRed;
extern VP8LAddGreenToBlueAndRedFunc VP8LAddGreenToBlueAndRed;
typedef void (*VP8LProcessBlueAndRedFunc)(uint32_t* argb_data, int num_pixels);
extern VP8LProcessBlueAndRedFunc VP8LSubtractGreenFromBlueAndRed;
extern VP8LProcessBlueAndRedFunc VP8LAddGreenToBlueAndRed;
typedef struct {
// Note: the members are uint8_t, so that any negative values are
// automatically converted to "mod 256" values.
uint8_t green_to_red_;
uint8_t green_to_blue_;
uint8_t red_to_blue_;
} VP8LMultipliers;
typedef void (*VP8LTransformColorFunc)(const VP8LMultipliers* const m,
uint32_t* argb_data, int num_pixels);
extern VP8LTransformColorFunc VP8LTransformColor;
extern VP8LTransformColorFunc VP8LTransformColorInverse;
typedef void (*VP8LConvertFunc)(const uint32_t* src, int num_pixels,
uint8_t* dst);
extern VP8LConvertFunc VP8LConvertBGRAToRGB;
extern VP8LConvertFunc VP8LConvertBGRAToRGBA;
extern VP8LConvertFunc VP8LConvertBGRAToRGBA4444;
extern VP8LConvertFunc VP8LConvertBGRAToRGB565;
extern VP8LConvertFunc VP8LConvertBGRAToBGR;
// Expose some C-only fallback functions
void VP8LTransformColor_C(const VP8LMultipliers* const m,
uint32_t* data, int num_pixels);
void VP8LTransformColorInverse_C(const VP8LMultipliers* const m,
uint32_t* data, int num_pixels);
void VP8LConvertBGRAToRGB_C(const uint32_t* src, int num_pixels, uint8_t* dst);
void VP8LConvertBGRAToRGBA_C(const uint32_t* src, int num_pixels, uint8_t* dst);
void VP8LConvertBGRAToRGBA4444_C(const uint32_t* src,
int num_pixels, uint8_t* dst);
void VP8LConvertBGRAToRGB565_C(const uint32_t* src,
int num_pixels, uint8_t* dst);
void VP8LConvertBGRAToBGR_C(const uint32_t* src, int num_pixels, uint8_t* dst);
void VP8LSubtractGreenFromBlueAndRed_C(uint32_t* argb_data, int num_pixels);
void VP8LAddGreenToBlueAndRed_C(uint32_t* data, int num_pixels);
// Must be called before calling any of the above methods.
void VP8LDspInit(void);
@@ -66,7 +98,7 @@ void VP8LResidualImage(int width, int height, int bits,
uint32_t* const argb, uint32_t* const argb_scratch,
uint32_t* const image);
void VP8LColorSpaceTransform(int width, int height, int bits, int step,
void VP8LColorSpaceTransform(int width, int height, int bits, int quality,
uint32_t* const argb, uint32_t* image);
//------------------------------------------------------------------------------
@@ -85,58 +117,55 @@ static WEBP_INLINE uint32_t VP8LSubSampleSize(uint32_t size,
return (size + (1 << sampling_bits) - 1) >> sampling_bits;
}
// -----------------------------------------------------------------------------
// Faster logarithm for integers. Small values use a look-up table.
#define LOG_LOOKUP_IDX_MAX 256
extern const float kLog2Table[LOG_LOOKUP_IDX_MAX];
extern const float kSLog2Table[LOG_LOOKUP_IDX_MAX];
float VP8LFastLog2Slow(int v);
float VP8LFastSLog2Slow(int v);
static WEBP_INLINE float VP8LFastLog2(int v) {
typedef float (*VP8LFastLog2SlowFunc)(uint32_t v);
extern VP8LFastLog2SlowFunc VP8LFastLog2Slow;
extern VP8LFastLog2SlowFunc VP8LFastSLog2Slow;
static WEBP_INLINE float VP8LFastLog2(uint32_t v) {
return (v < LOG_LOOKUP_IDX_MAX) ? kLog2Table[v] : VP8LFastLog2Slow(v);
}
// Fast calculation of v * log2(v) for integer input.
static WEBP_INLINE float VP8LFastSLog2(int v) {
static WEBP_INLINE float VP8LFastSLog2(uint32_t v) {
return (v < LOG_LOOKUP_IDX_MAX) ? kSLog2Table[v] : VP8LFastSLog2Slow(v);
}
// -----------------------------------------------------------------------------
// Huffman-cost related functions.
typedef double (*VP8LCostFunc)(const uint32_t* population, int length);
typedef double (*VP8LCostCombinedFunc)(const uint32_t* X, const uint32_t* Y,
int length);
extern VP8LCostFunc VP8LExtraCost;
extern VP8LCostCombinedFunc VP8LExtraCostCombined;
typedef struct { // small struct to hold counters
int counts[2]; // index: 0=zero steak, 1=non-zero streak
int streaks[2][2]; // [zero/non-zero][streak<3 / streak>=3]
} VP8LStreaks;
typedef VP8LStreaks (*VP8LCostCountFunc)(const uint32_t* population,
int length);
typedef VP8LStreaks (*VP8LCostCombinedCountFunc)(const uint32_t* X,
const uint32_t* Y, int length);
extern VP8LCostCountFunc VP8LHuffmanCostCount;
extern VP8LCostCombinedCountFunc VP8LHuffmanCostCombinedCount;
typedef void (*VP8LHistogramAddFunc)(const VP8LHistogram* const a,
const VP8LHistogram* const b,
VP8LHistogram* const out);
extern VP8LHistogramAddFunc VP8LHistogramAdd;
// -----------------------------------------------------------------------------
// PrefixEncode()
// use GNU builtins where available.
#if defined(__GNUC__) && \
((__GNUC__ == 3 && __GNUC_MINOR__ >= 4) || __GNUC__ >= 4)
static WEBP_INLINE int BitsLog2Floor(uint32_t n) {
return 31 ^ __builtin_clz(n);
}
#elif defined(_MSC_VER) && _MSC_VER > 1310 && \
(defined(_M_X64) || defined(_M_IX86))
#include <intrin.h>
#pragma intrinsic(_BitScanReverse)
static WEBP_INLINE int BitsLog2Floor(uint32_t n) {
unsigned long first_set_bit;
_BitScanReverse(&first_set_bit, n);
return first_set_bit;
}
#else
// Returns (int)floor(log2(n)). n must be > 0.
static WEBP_INLINE int BitsLog2Floor(uint32_t n) {
int log = 0;
uint32_t value = n;
int i;
for (i = 4; i >= 0; --i) {
const int shift = (1 << i);
const uint32_t x = value >> shift;
if (x != 0) {
value = x;
log += shift;
}
}
return log;
}
#endif
static WEBP_INLINE int VP8LBitsLog2Ceiling(uint32_t n) {
const int log_floor = BitsLog2Floor(n);
if (n == (n & ~(n - 1))) // zero or a power of two.

416
src/dsp/lossless_mips32.c Normal file
View File

@@ -0,0 +1,416 @@
// Copyright 2014 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// MIPS version of lossless functions
//
// Author(s): Djordje Pesut (djordje.pesut@imgtec.com)
// Jovan Zelincevic (jovan.zelincevic@imgtec.com)
#include "./dsp.h"
#include "./lossless.h"
#if defined(WEBP_USE_MIPS32)
#include <assert.h>
#include <math.h>
#include <stdlib.h>
#include <string.h>
#define APPROX_LOG_WITH_CORRECTION_MAX 65536
#define APPROX_LOG_MAX 4096
#define LOG_2_RECIPROCAL 1.44269504088896338700465094007086
static float FastSLog2Slow(uint32_t v) {
assert(v >= LOG_LOOKUP_IDX_MAX);
if (v < APPROX_LOG_WITH_CORRECTION_MAX) {
uint32_t log_cnt, y, correction;
const int c24 = 24;
const float v_f = (float)v;
uint32_t temp;
// Xf = 256 = 2^8
// log_cnt is index of leading one in upper 24 bits
__asm__ volatile(
"clz %[log_cnt], %[v] \n\t"
"addiu %[y], $zero, 1 \n\t"
"subu %[log_cnt], %[c24], %[log_cnt] \n\t"
"sllv %[y], %[y], %[log_cnt] \n\t"
"srlv %[temp], %[v], %[log_cnt] \n\t"
: [log_cnt]"=&r"(log_cnt), [y]"=&r"(y),
[temp]"=r"(temp)
: [c24]"r"(c24), [v]"r"(v)
);
// vf = (2^log_cnt) * Xf; where y = 2^log_cnt and Xf < 256
// Xf = floor(Xf) * (1 + (v % y) / v)
// log2(Xf) = log2(floor(Xf)) + log2(1 + (v % y) / v)
// The correction factor: log(1 + d) ~ d; for very small d values, so
// log2(1 + (v % y) / v) ~ LOG_2_RECIPROCAL * (v % y)/v
// LOG_2_RECIPROCAL ~ 23/16
// (v % y) = (v % 2^log_cnt) = v & (2^log_cnt - 1)
correction = (23 * (v & (y - 1))) >> 4;
return v_f * (kLog2Table[temp] + log_cnt) + correction;
} else {
return (float)(LOG_2_RECIPROCAL * v * log((double)v));
}
}
static float FastLog2Slow(uint32_t v) {
assert(v >= LOG_LOOKUP_IDX_MAX);
if (v < APPROX_LOG_WITH_CORRECTION_MAX) {
uint32_t log_cnt, y;
const int c24 = 24;
double log_2;
uint32_t temp;
__asm__ volatile(
"clz %[log_cnt], %[v] \n\t"
"addiu %[y], $zero, 1 \n\t"
"subu %[log_cnt], %[c24], %[log_cnt] \n\t"
"sllv %[y], %[y], %[log_cnt] \n\t"
"srlv %[temp], %[v], %[log_cnt] \n\t"
: [log_cnt]"=&r"(log_cnt), [y]"=&r"(y),
[temp]"=r"(temp)
: [c24]"r"(c24), [v]"r"(v)
);
log_2 = kLog2Table[temp] + log_cnt;
if (v >= APPROX_LOG_MAX) {
// Since the division is still expensive, add this correction factor only
// for large values of 'v'.
const uint32_t correction = (23 * (v & (y - 1))) >> 4;
log_2 += (double)correction / v;
}
return (float)log_2;
} else {
return (float)(LOG_2_RECIPROCAL * log((double)v));
}
}
// C version of this function:
// int i = 0;
// int64_t cost = 0;
// const uint32_t* pop = &population[4];
// const uint32_t* LoopEnd = &population[length];
// while (pop != LoopEnd) {
// ++i;
// cost += i * *pop;
// cost += i * *(pop + 1);
// pop += 2;
// }
// return (double)cost;
static double ExtraCost(const uint32_t* const population, int length) {
int i, temp0, temp1;
const uint32_t* pop = &population[4];
const uint32_t* const LoopEnd = &population[length];
__asm__ volatile(
"mult $zero, $zero \n\t"
"xor %[i], %[i], %[i] \n\t"
"beq %[pop], %[LoopEnd], 2f \n\t"
"1: \n\t"
"lw %[temp0], 0(%[pop]) \n\t"
"lw %[temp1], 4(%[pop]) \n\t"
"addiu %[i], %[i], 1 \n\t"
"addiu %[pop], %[pop], 8 \n\t"
"madd %[i], %[temp0] \n\t"
"madd %[i], %[temp1] \n\t"
"bne %[pop], %[LoopEnd], 1b \n\t"
"2: \n\t"
"mfhi %[temp0] \n\t"
"mflo %[temp1] \n\t"
: [temp0]"=&r"(temp0), [temp1]"=&r"(temp1),
[i]"=&r"(i), [pop]"+r"(pop)
: [LoopEnd]"r"(LoopEnd)
: "memory", "hi", "lo"
);
return (double)((int64_t)temp0 << 32 | temp1);
}
// C version of this function:
// int i = 0;
// int64_t cost = 0;
// const uint32_t* pX = &X[4];
// const uint32_t* pY = &Y[4];
// const uint32_t* LoopEnd = &X[length];
// while (pX != LoopEnd) {
// const uint32_t xy0 = *pX + *pY;
// const uint32_t xy1 = *(pX + 1) + *(pY + 1);
// ++i;
// cost += i * xy0;
// cost += i * xy1;
// pX += 2;
// pY += 2;
// }
// return (double)cost;
static double ExtraCostCombined(const uint32_t* const X,
const uint32_t* const Y, int length) {
int i, temp0, temp1, temp2, temp3;
const uint32_t* pX = &X[4];
const uint32_t* pY = &Y[4];
const uint32_t* const LoopEnd = &X[length];
__asm__ volatile(
"mult $zero, $zero \n\t"
"xor %[i], %[i], %[i] \n\t"
"beq %[pX], %[LoopEnd], 2f \n\t"
"1: \n\t"
"lw %[temp0], 0(%[pX]) \n\t"
"lw %[temp1], 0(%[pY]) \n\t"
"lw %[temp2], 4(%[pX]) \n\t"
"lw %[temp3], 4(%[pY]) \n\t"
"addiu %[i], %[i], 1 \n\t"
"addu %[temp0], %[temp0], %[temp1] \n\t"
"addu %[temp2], %[temp2], %[temp3] \n\t"
"addiu %[pX], %[pX], 8 \n\t"
"addiu %[pY], %[pY], 8 \n\t"
"madd %[i], %[temp0] \n\t"
"madd %[i], %[temp2] \n\t"
"bne %[pX], %[LoopEnd], 1b \n\t"
"2: \n\t"
"mfhi %[temp0] \n\t"
"mflo %[temp1] \n\t"
: [temp0]"=&r"(temp0), [temp1]"=&r"(temp1),
[temp2]"=&r"(temp2), [temp3]"=&r"(temp3),
[i]"=&r"(i), [pX]"+r"(pX), [pY]"+r"(pY)
: [LoopEnd]"r"(LoopEnd)
: "memory", "hi", "lo"
);
return (double)((int64_t)temp0 << 32 | temp1);
}
#define HUFFMAN_COST_PASS \
__asm__ volatile( \
"sll %[temp1], %[temp0], 3 \n\t" \
"addiu %[temp3], %[streak], -3 \n\t" \
"addu %[temp2], %[pstreaks], %[temp1] \n\t" \
"blez %[temp3], 1f \n\t" \
"srl %[temp1], %[temp1], 1 \n\t" \
"addu %[temp3], %[pcnts], %[temp1] \n\t" \
"lw %[temp0], 4(%[temp2]) \n\t" \
"lw %[temp1], 0(%[temp3]) \n\t" \
"addu %[temp0], %[temp0], %[streak] \n\t" \
"addiu %[temp1], %[temp1], 1 \n\t" \
"sw %[temp0], 4(%[temp2]) \n\t" \
"sw %[temp1], 0(%[temp3]) \n\t" \
"b 2f \n\t" \
"1: \n\t" \
"lw %[temp0], 0(%[temp2]) \n\t" \
"addu %[temp0], %[temp0], %[streak] \n\t" \
"sw %[temp0], 0(%[temp2]) \n\t" \
"2: \n\t" \
: [temp1]"=&r"(temp1), [temp2]"=&r"(temp2), \
[temp3]"=&r"(temp3), [temp0]"+r"(temp0) \
: [pstreaks]"r"(pstreaks), [pcnts]"r"(pcnts), \
[streak]"r"(streak) \
: "memory" \
);
// Returns the various RLE counts
static VP8LStreaks HuffmanCostCount(const uint32_t* population, int length) {
int i;
int streak = 0;
VP8LStreaks stats;
int* const pstreaks = &stats.streaks[0][0];
int* const pcnts = &stats.counts[0];
int temp0, temp1, temp2, temp3;
memset(&stats, 0, sizeof(stats));
for (i = 0; i < length - 1; ++i) {
++streak;
if (population[i] == population[i + 1]) {
continue;
}
temp0 = (population[i] != 0);
HUFFMAN_COST_PASS
streak = 0;
}
++streak;
temp0 = (population[i] != 0);
HUFFMAN_COST_PASS
return stats;
}
static VP8LStreaks HuffmanCostCombinedCount(const uint32_t* X,
const uint32_t* Y, int length) {
int i;
int streak = 0;
VP8LStreaks stats;
int* const pstreaks = &stats.streaks[0][0];
int* const pcnts = &stats.counts[0];
int temp0, temp1, temp2, temp3;
memset(&stats, 0, sizeof(stats));
for (i = 0; i < length - 1; ++i) {
const uint32_t xy = X[i] + Y[i];
const uint32_t xy_next = X[i + 1] + Y[i + 1];
++streak;
if (xy == xy_next) {
continue;
}
temp0 = (xy != 0);
HUFFMAN_COST_PASS
streak = 0;
}
{
const uint32_t xy = X[i] + Y[i];
++streak;
temp0 = (xy != 0);
HUFFMAN_COST_PASS
}
return stats;
}
#define ASM_START \
__asm__ volatile( \
".set push \n\t" \
".set at \n\t" \
".set macro \n\t" \
"1: \n\t"
// P2 = P0 + P1
// A..D - offsets
// E - temp variable to tell macro
// if pointer should be incremented
// literal_ and successive histograms could be unaligned
// so we must use ulw and usw
#define ADD_TO_OUT(A, B, C, D, E, P0, P1, P2) \
"ulw %[temp0], "#A"(%["#P0"]) \n\t" \
"ulw %[temp1], "#B"(%["#P0"]) \n\t" \
"ulw %[temp2], "#C"(%["#P0"]) \n\t" \
"ulw %[temp3], "#D"(%["#P0"]) \n\t" \
"ulw %[temp4], "#A"(%["#P1"]) \n\t" \
"ulw %[temp5], "#B"(%["#P1"]) \n\t" \
"ulw %[temp6], "#C"(%["#P1"]) \n\t" \
"ulw %[temp7], "#D"(%["#P1"]) \n\t" \
"addu %[temp4], %[temp4], %[temp0] \n\t" \
"addu %[temp5], %[temp5], %[temp1] \n\t" \
"addu %[temp6], %[temp6], %[temp2] \n\t" \
"addu %[temp7], %[temp7], %[temp3] \n\t" \
"addiu %["#P0"], %["#P0"], 16 \n\t" \
".if "#E" == 1 \n\t" \
"addiu %["#P1"], %["#P1"], 16 \n\t" \
".endif \n\t" \
"usw %[temp4], "#A"(%["#P2"]) \n\t" \
"usw %[temp5], "#B"(%["#P2"]) \n\t" \
"usw %[temp6], "#C"(%["#P2"]) \n\t" \
"usw %[temp7], "#D"(%["#P2"]) \n\t" \
"addiu %["#P2"], %["#P2"], 16 \n\t" \
"bne %["#P0"], %[LoopEnd], 1b \n\t" \
".set pop \n\t" \
#define ASM_END_COMMON_0 \
: [temp0]"=&r"(temp0), [temp1]"=&r"(temp1), \
[temp2]"=&r"(temp2), [temp3]"=&r"(temp3), \
[temp4]"=&r"(temp4), [temp5]"=&r"(temp5), \
[temp6]"=&r"(temp6), [temp7]"=&r"(temp7), \
[pa]"+r"(pa), [pout]"+r"(pout)
#define ASM_END_COMMON_1 \
: [LoopEnd]"r"(LoopEnd) \
: "memory", "at" \
);
#define ASM_END_0 \
ASM_END_COMMON_0 \
, [pb]"+r"(pb) \
ASM_END_COMMON_1
#define ASM_END_1 \
ASM_END_COMMON_0 \
ASM_END_COMMON_1
#define ADD_VECTOR(A, B, OUT, SIZE, EXTRA_SIZE) do { \
const uint32_t* pa = (const uint32_t*)(A); \
const uint32_t* pb = (const uint32_t*)(B); \
uint32_t* pout = (uint32_t*)(OUT); \
const uint32_t* const LoopEnd = pa + (SIZE); \
assert((SIZE) % 4 == 0); \
ASM_START \
ADD_TO_OUT(0, 4, 8, 12, 1, pa, pb, pout) \
ASM_END_0 \
if ((EXTRA_SIZE) > 0) { \
const int last = (EXTRA_SIZE); \
int i; \
for (i = 0; i < last; ++i) pout[i] = pa[i] + pb[i]; \
} \
} while (0)
#define ADD_VECTOR_EQ(A, OUT, SIZE, EXTRA_SIZE) do { \
const uint32_t* pa = (const uint32_t*)(A); \
uint32_t* pout = (uint32_t*)(OUT); \
const uint32_t* const LoopEnd = pa + (SIZE); \
assert((SIZE) % 4 == 0); \
ASM_START \
ADD_TO_OUT(0, 4, 8, 12, 0, pa, pout, pout) \
ASM_END_1 \
if ((EXTRA_SIZE) > 0) { \
const int last = (EXTRA_SIZE); \
int i; \
for (i = 0; i < last; ++i) pout[i] += pa[i]; \
} \
} while (0)
static void HistogramAdd(const VP8LHistogram* const a,
const VP8LHistogram* const b,
VP8LHistogram* const out) {
uint32_t temp0, temp1, temp2, temp3, temp4, temp5, temp6, temp7;
const int extra_cache_size = VP8LHistogramNumCodes(a->palette_code_bits_)
- (NUM_LITERAL_CODES + NUM_LENGTH_CODES);
assert(a->palette_code_bits_ == b->palette_code_bits_);
if (b != out) {
ADD_VECTOR(a->literal_, b->literal_, out->literal_,
NUM_LITERAL_CODES + NUM_LENGTH_CODES, extra_cache_size);
ADD_VECTOR(a->distance_, b->distance_, out->distance_,
NUM_DISTANCE_CODES, 0);
ADD_VECTOR(a->red_, b->red_, out->red_, NUM_LITERAL_CODES, 0);
ADD_VECTOR(a->blue_, b->blue_, out->blue_, NUM_LITERAL_CODES, 0);
ADD_VECTOR(a->alpha_, b->alpha_, out->alpha_, NUM_LITERAL_CODES, 0);
} else {
ADD_VECTOR_EQ(a->literal_, out->literal_,
NUM_LITERAL_CODES + NUM_LENGTH_CODES, extra_cache_size);
ADD_VECTOR_EQ(a->distance_, out->distance_, NUM_DISTANCE_CODES, 0);
ADD_VECTOR_EQ(a->red_, out->red_, NUM_LITERAL_CODES, 0);
ADD_VECTOR_EQ(a->blue_, out->blue_, NUM_LITERAL_CODES, 0);
ADD_VECTOR_EQ(a->alpha_, out->alpha_, NUM_LITERAL_CODES, 0);
}
}
#undef ADD_VECTOR_EQ
#undef ADD_VECTOR
#undef ASM_END_1
#undef ASM_END_0
#undef ASM_END_COMMON_1
#undef ASM_END_COMMON_0
#undef ADD_TO_OUT
#undef ASM_START
#endif // WEBP_USE_MIPS32
//------------------------------------------------------------------------------
// Entry point
extern void VP8LDspInitMIPS32(void);
void VP8LDspInitMIPS32(void) {
#if defined(WEBP_USE_MIPS32)
VP8LFastSLog2Slow = FastSLog2Slow;
VP8LFastLog2Slow = FastLog2Slow;
VP8LExtraCost = ExtraCost;
VP8LExtraCostCombined = ExtraCostCombined;
VP8LHuffmanCostCount = HuffmanCostCount;
VP8LHuffmanCostCombinedCount = HuffmanCostCombinedCount;
VP8LHistogramAdd = HistogramAdd;
#endif // WEBP_USE_MIPS32
}

357
src/dsp/lossless_neon.c Normal file
View File

@@ -0,0 +1,357 @@
// Copyright 2014 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// NEON variant of methods for lossless decoder
//
// Author: Skal (pascal.massimino@gmail.com)
#include "./dsp.h"
#if defined(WEBP_USE_NEON)
#include <arm_neon.h>
#include "./lossless.h"
#include "./neon.h"
//------------------------------------------------------------------------------
// Colorspace conversion functions
#if !defined(WORK_AROUND_GCC)
// gcc 4.6.0 had some trouble (NDK-r9) with this code. We only use it for
// gcc-4.8.x at least.
static void ConvertBGRAToRGBA(const uint32_t* src,
int num_pixels, uint8_t* dst) {
const uint32_t* const end = src + (num_pixels & ~15);
for (; src < end; src += 16) {
uint8x16x4_t pixel = vld4q_u8((uint8_t*)src);
// swap B and R. (VSWP d0,d2 has no intrinsics equivalent!)
const uint8x16_t tmp = pixel.val[0];
pixel.val[0] = pixel.val[2];
pixel.val[2] = tmp;
vst4q_u8(dst, pixel);
dst += 64;
}
VP8LConvertBGRAToRGBA_C(src, num_pixels & 15, dst); // left-overs
}
static void ConvertBGRAToBGR(const uint32_t* src,
int num_pixels, uint8_t* dst) {
const uint32_t* const end = src + (num_pixels & ~15);
for (; src < end; src += 16) {
const uint8x16x4_t pixel = vld4q_u8((uint8_t*)src);
const uint8x16x3_t tmp = { { pixel.val[0], pixel.val[1], pixel.val[2] } };
vst3q_u8(dst, tmp);
dst += 48;
}
VP8LConvertBGRAToBGR_C(src, num_pixels & 15, dst); // left-overs
}
static void ConvertBGRAToRGB(const uint32_t* src,
int num_pixels, uint8_t* dst) {
const uint32_t* const end = src + (num_pixels & ~15);
for (; src < end; src += 16) {
const uint8x16x4_t pixel = vld4q_u8((uint8_t*)src);
const uint8x16x3_t tmp = { { pixel.val[2], pixel.val[1], pixel.val[0] } };
vst3q_u8(dst, tmp);
dst += 48;
}
VP8LConvertBGRAToRGB_C(src, num_pixels & 15, dst); // left-overs
}
#else // WORK_AROUND_GCC
// gcc-4.6.0 fallback
static const uint8_t kRGBAShuffle[8] = { 2, 1, 0, 3, 6, 5, 4, 7 };
static void ConvertBGRAToRGBA(const uint32_t* src,
int num_pixels, uint8_t* dst) {
const uint32_t* const end = src + (num_pixels & ~1);
const uint8x8_t shuffle = vld1_u8(kRGBAShuffle);
for (; src < end; src += 2) {
const uint8x8_t pixels = vld1_u8((uint8_t*)src);
vst1_u8(dst, vtbl1_u8(pixels, shuffle));
dst += 8;
}
VP8LConvertBGRAToRGBA_C(src, num_pixels & 1, dst); // left-overs
}
static const uint8_t kBGRShuffle[3][8] = {
{ 0, 1, 2, 4, 5, 6, 8, 9 },
{ 10, 12, 13, 14, 16, 17, 18, 20 },
{ 21, 22, 24, 25, 26, 28, 29, 30 }
};
static void ConvertBGRAToBGR(const uint32_t* src,
int num_pixels, uint8_t* dst) {
const uint32_t* const end = src + (num_pixels & ~7);
const uint8x8_t shuffle0 = vld1_u8(kBGRShuffle[0]);
const uint8x8_t shuffle1 = vld1_u8(kBGRShuffle[1]);
const uint8x8_t shuffle2 = vld1_u8(kBGRShuffle[2]);
for (; src < end; src += 8) {
uint8x8x4_t pixels;
INIT_VECTOR4(pixels,
vld1_u8((const uint8_t*)(src + 0)),
vld1_u8((const uint8_t*)(src + 2)),
vld1_u8((const uint8_t*)(src + 4)),
vld1_u8((const uint8_t*)(src + 6)));
vst1_u8(dst + 0, vtbl4_u8(pixels, shuffle0));
vst1_u8(dst + 8, vtbl4_u8(pixels, shuffle1));
vst1_u8(dst + 16, vtbl4_u8(pixels, shuffle2));
dst += 8 * 3;
}
VP8LConvertBGRAToBGR_C(src, num_pixels & 7, dst); // left-overs
}
static const uint8_t kRGBShuffle[3][8] = {
{ 2, 1, 0, 6, 5, 4, 10, 9 },
{ 8, 14, 13, 12, 18, 17, 16, 22 },
{ 21, 20, 26, 25, 24, 30, 29, 28 }
};
static void ConvertBGRAToRGB(const uint32_t* src,
int num_pixels, uint8_t* dst) {
const uint32_t* const end = src + (num_pixels & ~7);
const uint8x8_t shuffle0 = vld1_u8(kRGBShuffle[0]);
const uint8x8_t shuffle1 = vld1_u8(kRGBShuffle[1]);
const uint8x8_t shuffle2 = vld1_u8(kRGBShuffle[2]);
for (; src < end; src += 8) {
uint8x8x4_t pixels;
INIT_VECTOR4(pixels,
vld1_u8((const uint8_t*)(src + 0)),
vld1_u8((const uint8_t*)(src + 2)),
vld1_u8((const uint8_t*)(src + 4)),
vld1_u8((const uint8_t*)(src + 6)));
vst1_u8(dst + 0, vtbl4_u8(pixels, shuffle0));
vst1_u8(dst + 8, vtbl4_u8(pixels, shuffle1));
vst1_u8(dst + 16, vtbl4_u8(pixels, shuffle2));
dst += 8 * 3;
}
VP8LConvertBGRAToRGB_C(src, num_pixels & 7, dst); // left-overs
}
#endif // !WORK_AROUND_GCC
//------------------------------------------------------------------------------
#ifdef USE_INTRINSICS
static WEBP_INLINE uint32_t Average2(const uint32_t* const a,
const uint32_t* const b) {
const uint8x8_t a0 = vreinterpret_u8_u64(vcreate_u64(*a));
const uint8x8_t b0 = vreinterpret_u8_u64(vcreate_u64(*b));
const uint8x8_t avg = vhadd_u8(a0, b0);
return vget_lane_u32(vreinterpret_u32_u8(avg), 0);
}
static WEBP_INLINE uint32_t Average3(const uint32_t* const a,
const uint32_t* const b,
const uint32_t* const c) {
const uint8x8_t a0 = vreinterpret_u8_u64(vcreate_u64(*a));
const uint8x8_t b0 = vreinterpret_u8_u64(vcreate_u64(*b));
const uint8x8_t c0 = vreinterpret_u8_u64(vcreate_u64(*c));
const uint8x8_t avg1 = vhadd_u8(a0, c0);
const uint8x8_t avg2 = vhadd_u8(avg1, b0);
return vget_lane_u32(vreinterpret_u32_u8(avg2), 0);
}
static WEBP_INLINE uint32_t Average4(const uint32_t* const a,
const uint32_t* const b,
const uint32_t* const c,
const uint32_t* const d) {
const uint8x8_t a0 = vreinterpret_u8_u64(vcreate_u64(*a));
const uint8x8_t b0 = vreinterpret_u8_u64(vcreate_u64(*b));
const uint8x8_t c0 = vreinterpret_u8_u64(vcreate_u64(*c));
const uint8x8_t d0 = vreinterpret_u8_u64(vcreate_u64(*d));
const uint8x8_t avg1 = vhadd_u8(a0, b0);
const uint8x8_t avg2 = vhadd_u8(c0, d0);
const uint8x8_t avg3 = vhadd_u8(avg1, avg2);
return vget_lane_u32(vreinterpret_u32_u8(avg3), 0);
}
static uint32_t Predictor5(uint32_t left, const uint32_t* const top) {
return Average3(&left, top + 0, top + 1);
}
static uint32_t Predictor6(uint32_t left, const uint32_t* const top) {
return Average2(&left, top - 1);
}
static uint32_t Predictor7(uint32_t left, const uint32_t* const top) {
return Average2(&left, top + 0);
}
static uint32_t Predictor8(uint32_t left, const uint32_t* const top) {
(void)left;
return Average2(top - 1, top + 0);
}
static uint32_t Predictor9(uint32_t left, const uint32_t* const top) {
(void)left;
return Average2(top + 0, top + 1);
}
static uint32_t Predictor10(uint32_t left, const uint32_t* const top) {
return Average4(&left, top - 1, top + 0, top + 1);
}
//------------------------------------------------------------------------------
static WEBP_INLINE uint32_t Select(const uint32_t* const c0,
const uint32_t* const c1,
const uint32_t* const c2) {
const uint8x8_t p0 = vreinterpret_u8_u64(vcreate_u64(*c0));
const uint8x8_t p1 = vreinterpret_u8_u64(vcreate_u64(*c1));
const uint8x8_t p2 = vreinterpret_u8_u64(vcreate_u64(*c2));
const uint8x8_t bc = vabd_u8(p1, p2); // |b-c|
const uint8x8_t ac = vabd_u8(p0, p2); // |a-c|
const int16x4_t sum_bc = vreinterpret_s16_u16(vpaddl_u8(bc));
const int16x4_t sum_ac = vreinterpret_s16_u16(vpaddl_u8(ac));
const int32x2_t diff = vpaddl_s16(vsub_s16(sum_bc, sum_ac));
const int32_t pa_minus_pb = vget_lane_s32(diff, 0);
return (pa_minus_pb <= 0) ? *c0 : *c1;
}
static uint32_t Predictor11(uint32_t left, const uint32_t* const top) {
return Select(top + 0, &left, top - 1);
}
static WEBP_INLINE uint32_t ClampedAddSubtractFull(const uint32_t* const c0,
const uint32_t* const c1,
const uint32_t* const c2) {
const uint8x8_t p0 = vreinterpret_u8_u64(vcreate_u64(*c0));
const uint8x8_t p1 = vreinterpret_u8_u64(vcreate_u64(*c1));
const uint8x8_t p2 = vreinterpret_u8_u64(vcreate_u64(*c2));
const uint16x8_t sum0 = vaddl_u8(p0, p1); // add and widen
const uint16x8_t sum1 = vqsubq_u16(sum0, vmovl_u8(p2)); // widen and subtract
const uint8x8_t out = vqmovn_u16(sum1); // narrow and clamp
return vget_lane_u32(vreinterpret_u32_u8(out), 0);
}
static uint32_t Predictor12(uint32_t left, const uint32_t* const top) {
return ClampedAddSubtractFull(&left, top + 0, top - 1);
}
static WEBP_INLINE uint32_t ClampedAddSubtractHalf(const uint32_t* const c0,
const uint32_t* const c1,
const uint32_t* const c2) {
const uint8x8_t p0 = vreinterpret_u8_u64(vcreate_u64(*c0));
const uint8x8_t p1 = vreinterpret_u8_u64(vcreate_u64(*c1));
const uint8x8_t p2 = vreinterpret_u8_u64(vcreate_u64(*c2));
const uint8x8_t avg = vhadd_u8(p0, p1); // Average(c0,c1)
const uint8x8_t ab = vshr_n_u8(vqsub_u8(avg, p2), 1); // (a-b)>>1 saturated
const uint8x8_t ba = vshr_n_u8(vqsub_u8(p2, avg), 1); // (b-a)>>1 saturated
const uint8x8_t out = vqsub_u8(vqadd_u8(avg, ab), ba);
return vget_lane_u32(vreinterpret_u32_u8(out), 0);
}
static uint32_t Predictor13(uint32_t left, const uint32_t* const top) {
return ClampedAddSubtractHalf(&left, top + 0, top - 1);
}
//------------------------------------------------------------------------------
// Subtract-Green Transform
// vtbl?_u8 are marked unavailable for iOS arm64 with Xcode < 6.3, use
// non-standard versions there.
#if defined(__APPLE__) && defined(__aarch64__) && \
defined(__apple_build_version__) && (__apple_build_version__< 6020037)
#define USE_VTBLQ
#endif
#ifdef USE_VTBLQ
// 255 = byte will be zeroed
static const uint8_t kGreenShuffle[16] = {
1, 255, 1, 255, 5, 255, 5, 255, 9, 255, 9, 255, 13, 255, 13, 255
};
static WEBP_INLINE uint8x16_t DoGreenShuffle(const uint8x16_t argb,
const uint8x16_t shuffle) {
return vcombine_u8(vtbl1q_u8(argb, vget_low_u8(shuffle)),
vtbl1q_u8(argb, vget_high_u8(shuffle)));
}
#else // !USE_VTBLQ
// 255 = byte will be zeroed
static const uint8_t kGreenShuffle[8] = { 1, 255, 1, 255, 5, 255, 5, 255 };
static WEBP_INLINE uint8x16_t DoGreenShuffle(const uint8x16_t argb,
const uint8x8_t shuffle) {
return vcombine_u8(vtbl1_u8(vget_low_u8(argb), shuffle),
vtbl1_u8(vget_high_u8(argb), shuffle));
}
#endif // USE_VTBLQ
static void SubtractGreenFromBlueAndRed(uint32_t* argb_data, int num_pixels) {
const uint32_t* const end = argb_data + (num_pixels & ~3);
#ifdef USE_VTBLQ
const uint8x16_t shuffle = vld1q_u8(kGreenShuffle);
#else
const uint8x8_t shuffle = vld1_u8(kGreenShuffle);
#endif
for (; argb_data < end; argb_data += 4) {
const uint8x16_t argb = vld1q_u8((uint8_t*)argb_data);
const uint8x16_t greens = DoGreenShuffle(argb, shuffle);
vst1q_u8((uint8_t*)argb_data, vsubq_u8(argb, greens));
}
// fallthrough and finish off with plain-C
VP8LSubtractGreenFromBlueAndRed_C(argb_data, num_pixels & 3);
}
static void AddGreenToBlueAndRed(uint32_t* argb_data, int num_pixels) {
const uint32_t* const end = argb_data + (num_pixels & ~3);
#ifdef USE_VTBLQ
const uint8x16_t shuffle = vld1q_u8(kGreenShuffle);
#else
const uint8x8_t shuffle = vld1_u8(kGreenShuffle);
#endif
for (; argb_data < end; argb_data += 4) {
const uint8x16_t argb = vld1q_u8((uint8_t*)argb_data);
const uint8x16_t greens = DoGreenShuffle(argb, shuffle);
vst1q_u8((uint8_t*)argb_data, vaddq_u8(argb, greens));
}
// fallthrough and finish off with plain-C
VP8LAddGreenToBlueAndRed_C(argb_data, num_pixels & 3);
}
#undef USE_VTBLQ
#endif // USE_INTRINSICS
#endif // WEBP_USE_NEON
//------------------------------------------------------------------------------
extern void VP8LDspInitNEON(void);
void VP8LDspInitNEON(void) {
#if defined(WEBP_USE_NEON)
VP8LConvertBGRAToRGBA = ConvertBGRAToRGBA;
VP8LConvertBGRAToBGR = ConvertBGRAToBGR;
VP8LConvertBGRAToRGB = ConvertBGRAToRGB;
#ifdef USE_INTRINSICS
VP8LPredictors[5] = Predictor5;
VP8LPredictors[6] = Predictor6;
VP8LPredictors[7] = Predictor7;
VP8LPredictors[8] = Predictor8;
VP8LPredictors[9] = Predictor9;
VP8LPredictors[10] = Predictor10;
VP8LPredictors[11] = Predictor11;
VP8LPredictors[12] = Predictor12;
VP8LPredictors[13] = Predictor13;
VP8LSubtractGreenFromBlueAndRed = SubtractGreenFromBlueAndRed;
VP8LAddGreenToBlueAndRed = AddGreenToBlueAndRed;
#endif
#endif // WEBP_USE_NEON
}
//------------------------------------------------------------------------------

535
src/dsp/lossless_sse2.c Normal file
View File

@@ -0,0 +1,535 @@
// Copyright 2014 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// SSE2 variant of methods for lossless decoder
//
// Author: Skal (pascal.massimino@gmail.com)
#include "./dsp.h"
#include <assert.h>
#if defined(WEBP_USE_SSE2)
#include <emmintrin.h>
#include "./lossless.h"
//------------------------------------------------------------------------------
// Predictor Transform
static WEBP_INLINE uint32_t ClampedAddSubtractFull(uint32_t c0, uint32_t c1,
uint32_t c2) {
const __m128i zero = _mm_setzero_si128();
const __m128i C0 = _mm_unpacklo_epi8(_mm_cvtsi32_si128(c0), zero);
const __m128i C1 = _mm_unpacklo_epi8(_mm_cvtsi32_si128(c1), zero);
const __m128i C2 = _mm_unpacklo_epi8(_mm_cvtsi32_si128(c2), zero);
const __m128i V1 = _mm_add_epi16(C0, C1);
const __m128i V2 = _mm_sub_epi16(V1, C2);
const __m128i b = _mm_packus_epi16(V2, V2);
const uint32_t output = _mm_cvtsi128_si32(b);
return output;
}
static WEBP_INLINE uint32_t ClampedAddSubtractHalf(uint32_t c0, uint32_t c1,
uint32_t c2) {
const __m128i zero = _mm_setzero_si128();
const __m128i C0 = _mm_unpacklo_epi8(_mm_cvtsi32_si128(c0), zero);
const __m128i C1 = _mm_unpacklo_epi8(_mm_cvtsi32_si128(c1), zero);
const __m128i B0 = _mm_unpacklo_epi8(_mm_cvtsi32_si128(c2), zero);
const __m128i avg = _mm_add_epi16(C1, C0);
const __m128i A0 = _mm_srli_epi16(avg, 1);
const __m128i A1 = _mm_sub_epi16(A0, B0);
const __m128i BgtA = _mm_cmpgt_epi16(B0, A0);
const __m128i A2 = _mm_sub_epi16(A1, BgtA);
const __m128i A3 = _mm_srai_epi16(A2, 1);
const __m128i A4 = _mm_add_epi16(A0, A3);
const __m128i A5 = _mm_packus_epi16(A4, A4);
const uint32_t output = _mm_cvtsi128_si32(A5);
return output;
}
static WEBP_INLINE uint32_t Select(uint32_t a, uint32_t b, uint32_t c) {
int pa_minus_pb;
const __m128i zero = _mm_setzero_si128();
const __m128i A0 = _mm_cvtsi32_si128(a);
const __m128i B0 = _mm_cvtsi32_si128(b);
const __m128i C0 = _mm_cvtsi32_si128(c);
const __m128i AC0 = _mm_subs_epu8(A0, C0);
const __m128i CA0 = _mm_subs_epu8(C0, A0);
const __m128i BC0 = _mm_subs_epu8(B0, C0);
const __m128i CB0 = _mm_subs_epu8(C0, B0);
const __m128i AC = _mm_or_si128(AC0, CA0);
const __m128i BC = _mm_or_si128(BC0, CB0);
const __m128i pa = _mm_unpacklo_epi8(AC, zero); // |a - c|
const __m128i pb = _mm_unpacklo_epi8(BC, zero); // |b - c|
const __m128i diff = _mm_sub_epi16(pb, pa);
{
int16_t out[8];
_mm_storeu_si128((__m128i*)out, diff);
pa_minus_pb = out[0] + out[1] + out[2] + out[3];
}
return (pa_minus_pb <= 0) ? a : b;
}
static WEBP_INLINE __m128i Average2_128i(uint32_t a0, uint32_t a1) {
const __m128i zero = _mm_setzero_si128();
const __m128i A0 = _mm_unpacklo_epi8(_mm_cvtsi32_si128(a0), zero);
const __m128i A1 = _mm_unpacklo_epi8(_mm_cvtsi32_si128(a1), zero);
const __m128i sum = _mm_add_epi16(A1, A0);
const __m128i avg = _mm_srli_epi16(sum, 1);
return avg;
}
static WEBP_INLINE uint32_t Average2(uint32_t a0, uint32_t a1) {
const __m128i avg = Average2_128i(a0, a1);
const __m128i A2 = _mm_packus_epi16(avg, avg);
const uint32_t output = _mm_cvtsi128_si32(A2);
return output;
}
static WEBP_INLINE uint32_t Average3(uint32_t a0, uint32_t a1, uint32_t a2) {
const __m128i zero = _mm_setzero_si128();
const __m128i avg1 = Average2_128i(a0, a2);
const __m128i A1 = _mm_unpacklo_epi8(_mm_cvtsi32_si128(a1), zero);
const __m128i sum = _mm_add_epi16(avg1, A1);
const __m128i avg2 = _mm_srli_epi16(sum, 1);
const __m128i A2 = _mm_packus_epi16(avg2, avg2);
const uint32_t output = _mm_cvtsi128_si32(A2);
return output;
}
static WEBP_INLINE uint32_t Average4(uint32_t a0, uint32_t a1,
uint32_t a2, uint32_t a3) {
const __m128i avg1 = Average2_128i(a0, a1);
const __m128i avg2 = Average2_128i(a2, a3);
const __m128i sum = _mm_add_epi16(avg2, avg1);
const __m128i avg3 = _mm_srli_epi16(sum, 1);
const __m128i A0 = _mm_packus_epi16(avg3, avg3);
const uint32_t output = _mm_cvtsi128_si32(A0);
return output;
}
static uint32_t Predictor5(uint32_t left, const uint32_t* const top) {
const uint32_t pred = Average3(left, top[0], top[1]);
return pred;
}
static uint32_t Predictor6(uint32_t left, const uint32_t* const top) {
const uint32_t pred = Average2(left, top[-1]);
return pred;
}
static uint32_t Predictor7(uint32_t left, const uint32_t* const top) {
const uint32_t pred = Average2(left, top[0]);
return pred;
}
static uint32_t Predictor8(uint32_t left, const uint32_t* const top) {
const uint32_t pred = Average2(top[-1], top[0]);
(void)left;
return pred;
}
static uint32_t Predictor9(uint32_t left, const uint32_t* const top) {
const uint32_t pred = Average2(top[0], top[1]);
(void)left;
return pred;
}
static uint32_t Predictor10(uint32_t left, const uint32_t* const top) {
const uint32_t pred = Average4(left, top[-1], top[0], top[1]);
return pred;
}
static uint32_t Predictor11(uint32_t left, const uint32_t* const top) {
const uint32_t pred = Select(top[0], left, top[-1]);
return pred;
}
static uint32_t Predictor12(uint32_t left, const uint32_t* const top) {
const uint32_t pred = ClampedAddSubtractFull(left, top[0], top[-1]);
return pred;
}
static uint32_t Predictor13(uint32_t left, const uint32_t* const top) {
const uint32_t pred = ClampedAddSubtractHalf(left, top[0], top[-1]);
return pred;
}
//------------------------------------------------------------------------------
// Subtract-Green Transform
static void SubtractGreenFromBlueAndRed(uint32_t* argb_data, int num_pixels) {
const __m128i mask = _mm_set1_epi32(0x0000ff00);
int i;
for (i = 0; i + 4 <= num_pixels; i += 4) {
const __m128i in = _mm_loadu_si128((__m128i*)&argb_data[i]);
const __m128i in_00g0 = _mm_and_si128(in, mask); // 00g0|00g0|...
const __m128i in_0g00 = _mm_slli_epi32(in_00g0, 8); // 0g00|0g00|...
const __m128i in_000g = _mm_srli_epi32(in_00g0, 8); // 000g|000g|...
const __m128i in_0g0g = _mm_or_si128(in_0g00, in_000g);
const __m128i out = _mm_sub_epi8(in, in_0g0g);
_mm_storeu_si128((__m128i*)&argb_data[i], out);
}
// fallthrough and finish off with plain-C
VP8LSubtractGreenFromBlueAndRed_C(argb_data + i, num_pixels - i);
}
static void AddGreenToBlueAndRed(uint32_t* argb_data, int num_pixels) {
const __m128i mask = _mm_set1_epi32(0x0000ff00);
int i;
for (i = 0; i + 4 <= num_pixels; i += 4) {
const __m128i in = _mm_loadu_si128((__m128i*)&argb_data[i]);
const __m128i in_00g0 = _mm_and_si128(in, mask); // 00g0|00g0|...
const __m128i in_0g00 = _mm_slli_epi32(in_00g0, 8); // 0g00|0g00|...
const __m128i in_000g = _mm_srli_epi32(in_00g0, 8); // 000g|000g|...
const __m128i in_0g0g = _mm_or_si128(in_0g00, in_000g);
const __m128i out = _mm_add_epi8(in, in_0g0g);
_mm_storeu_si128((__m128i*)&argb_data[i], out);
}
// fallthrough and finish off with plain-C
VP8LAddGreenToBlueAndRed_C(argb_data + i, num_pixels - i);
}
//------------------------------------------------------------------------------
// Color Transform
static WEBP_INLINE __m128i ColorTransformDelta(__m128i color_pred,
__m128i color) {
// We simulate signed 8-bit multiplication as:
// * Left shift the two (8-bit) numbers by 8 bits,
// * Perform a 16-bit signed multiplication and retain the higher 16-bits.
const __m128i color_pred_shifted = _mm_slli_epi32(color_pred, 8);
const __m128i color_shifted = _mm_slli_epi32(color, 8);
// Note: This performs multiplication on 8 packed 16-bit numbers, 4 of which
// happen to be zeroes.
const __m128i signed_mult =
_mm_mulhi_epi16(color_pred_shifted, color_shifted);
return _mm_srli_epi32(signed_mult, 5);
}
static WEBP_INLINE void TransformColor(const VP8LMultipliers* const m,
uint32_t* argb_data,
int num_pixels) {
const __m128i g_to_r = _mm_set1_epi32(m->green_to_red_); // multipliers
const __m128i g_to_b = _mm_set1_epi32(m->green_to_blue_);
const __m128i r_to_b = _mm_set1_epi32(m->red_to_blue_);
int i;
for (i = 0; i + 4 <= num_pixels; i += 4) {
const __m128i in = _mm_loadu_si128((__m128i*)&argb_data[i]);
const __m128i alpha_green_mask = _mm_set1_epi32(0xff00ff00); // masks
const __m128i red_mask = _mm_set1_epi32(0x00ff0000);
const __m128i green_mask = _mm_set1_epi32(0x0000ff00);
const __m128i lower_8bit_mask = _mm_set1_epi32(0x000000ff);
const __m128i ag = _mm_and_si128(in, alpha_green_mask); // alpha, green
const __m128i r = _mm_srli_epi32(_mm_and_si128(in, red_mask), 16);
const __m128i g = _mm_srli_epi32(_mm_and_si128(in, green_mask), 8);
const __m128i b = in;
const __m128i r_delta = ColorTransformDelta(g_to_r, g); // red
const __m128i r_new =
_mm_and_si128(_mm_sub_epi32(r, r_delta), lower_8bit_mask);
const __m128i r_new_shifted = _mm_slli_epi32(r_new, 16);
const __m128i b_delta_1 = ColorTransformDelta(g_to_b, g); // blue
const __m128i b_delta_2 = ColorTransformDelta(r_to_b, r);
const __m128i b_delta = _mm_add_epi32(b_delta_1, b_delta_2);
const __m128i b_new =
_mm_and_si128(_mm_sub_epi32(b, b_delta), lower_8bit_mask);
const __m128i out = _mm_or_si128(_mm_or_si128(ag, r_new_shifted), b_new);
_mm_storeu_si128((__m128i*)&argb_data[i], out);
}
// Fall-back to C-version for left-overs.
VP8LTransformColor_C(m, argb_data + i, num_pixels - i);
}
static WEBP_INLINE void TransformColorInverse(const VP8LMultipliers* const m,
uint32_t* argb_data,
int num_pixels) {
const __m128i g_to_r = _mm_set1_epi32(m->green_to_red_); // multipliers
const __m128i g_to_b = _mm_set1_epi32(m->green_to_blue_);
const __m128i r_to_b = _mm_set1_epi32(m->red_to_blue_);
int i;
for (i = 0; i + 4 <= num_pixels; i += 4) {
const __m128i in = _mm_loadu_si128((__m128i*)&argb_data[i]);
const __m128i alpha_green_mask = _mm_set1_epi32(0xff00ff00); // masks
const __m128i red_mask = _mm_set1_epi32(0x00ff0000);
const __m128i green_mask = _mm_set1_epi32(0x0000ff00);
const __m128i lower_8bit_mask = _mm_set1_epi32(0x000000ff);
const __m128i ag = _mm_and_si128(in, alpha_green_mask); // alpha, green
const __m128i r = _mm_srli_epi32(_mm_and_si128(in, red_mask), 16);
const __m128i g = _mm_srli_epi32(_mm_and_si128(in, green_mask), 8);
const __m128i b = in;
const __m128i r_delta = ColorTransformDelta(g_to_r, g); // red
const __m128i r_new =
_mm_and_si128(_mm_add_epi32(r, r_delta), lower_8bit_mask);
const __m128i r_new_shifted = _mm_slli_epi32(r_new, 16);
const __m128i b_delta_1 = ColorTransformDelta(g_to_b, g); // blue
const __m128i b_delta_2 = ColorTransformDelta(r_to_b, r_new);
const __m128i b_delta = _mm_add_epi32(b_delta_1, b_delta_2);
const __m128i b_new =
_mm_and_si128(_mm_add_epi32(b, b_delta), lower_8bit_mask);
const __m128i out = _mm_or_si128(_mm_or_si128(ag, r_new_shifted), b_new);
_mm_storeu_si128((__m128i*)&argb_data[i], out);
}
// Fall-back to C-version for left-overs.
VP8LTransformColorInverse_C(m, argb_data + i, num_pixels - i);
}
//------------------------------------------------------------------------------
// Color-space conversion functions
static void ConvertBGRAToRGBA(const uint32_t* src,
int num_pixels, uint8_t* dst) {
const __m128i* in = (const __m128i*)src;
__m128i* out = (__m128i*)dst;
while (num_pixels >= 8) {
const __m128i bgra0 = _mm_loadu_si128(in++); // bgra0|bgra1|bgra2|bgra3
const __m128i bgra4 = _mm_loadu_si128(in++); // bgra4|bgra5|bgra6|bgra7
const __m128i v0l = _mm_unpacklo_epi8(bgra0, bgra4); // b0b4g0g4r0r4a0a4...
const __m128i v0h = _mm_unpackhi_epi8(bgra0, bgra4); // b2b6g2g6r2r6a2a6...
const __m128i v1l = _mm_unpacklo_epi8(v0l, v0h); // b0b2b4b6g0g2g4g6...
const __m128i v1h = _mm_unpackhi_epi8(v0l, v0h); // b1b3b5b7g1g3g5g7...
const __m128i v2l = _mm_unpacklo_epi8(v1l, v1h); // b0...b7 | g0...g7
const __m128i v2h = _mm_unpackhi_epi8(v1l, v1h); // r0...r7 | a0...a7
const __m128i ga0 = _mm_unpackhi_epi64(v2l, v2h); // g0...g7 | a0...a7
const __m128i rb0 = _mm_unpacklo_epi64(v2h, v2l); // r0...r7 | b0...b7
const __m128i rg0 = _mm_unpacklo_epi8(rb0, ga0); // r0g0r1g1 ... r6g6r7g7
const __m128i ba0 = _mm_unpackhi_epi8(rb0, ga0); // b0a0b1a1 ... b6a6b7a7
const __m128i rgba0 = _mm_unpacklo_epi16(rg0, ba0); // rgba0|rgba1...
const __m128i rgba4 = _mm_unpackhi_epi16(rg0, ba0); // rgba4|rgba5...
_mm_storeu_si128(out++, rgba0);
_mm_storeu_si128(out++, rgba4);
num_pixels -= 8;
}
// left-overs
VP8LConvertBGRAToRGBA_C((const uint32_t*)in, num_pixels, (uint8_t*)out);
}
static void ConvertBGRAToRGBA4444(const uint32_t* src,
int num_pixels, uint8_t* dst) {
const __m128i mask_0x0f = _mm_set1_epi8(0x0f);
const __m128i mask_0xf0 = _mm_set1_epi8(0xf0);
const __m128i* in = (const __m128i*)src;
__m128i* out = (__m128i*)dst;
while (num_pixels >= 8) {
const __m128i bgra0 = _mm_loadu_si128(in++); // bgra0|bgra1|bgra2|bgra3
const __m128i bgra4 = _mm_loadu_si128(in++); // bgra4|bgra5|bgra6|bgra7
const __m128i v0l = _mm_unpacklo_epi8(bgra0, bgra4); // b0b4g0g4r0r4a0a4...
const __m128i v0h = _mm_unpackhi_epi8(bgra0, bgra4); // b2b6g2g6r2r6a2a6...
const __m128i v1l = _mm_unpacklo_epi8(v0l, v0h); // b0b2b4b6g0g2g4g6...
const __m128i v1h = _mm_unpackhi_epi8(v0l, v0h); // b1b3b5b7g1g3g5g7...
const __m128i v2l = _mm_unpacklo_epi8(v1l, v1h); // b0...b7 | g0...g7
const __m128i v2h = _mm_unpackhi_epi8(v1l, v1h); // r0...r7 | a0...a7
const __m128i ga0 = _mm_unpackhi_epi64(v2l, v2h); // g0...g7 | a0...a7
const __m128i rb0 = _mm_unpacklo_epi64(v2h, v2l); // r0...r7 | b0...b7
const __m128i ga1 = _mm_srli_epi16(ga0, 4); // g0-|g1-|...|a6-|a7-
const __m128i rb1 = _mm_and_si128(rb0, mask_0xf0); // -r0|-r1|...|-b6|-a7
const __m128i ga2 = _mm_and_si128(ga1, mask_0x0f); // g0-|g1-|...|a6-|a7-
const __m128i rgba0 = _mm_or_si128(ga2, rb1); // rg0..rg7 | ba0..ba7
const __m128i rgba1 = _mm_srli_si128(rgba0, 8); // ba0..ba7 | 0
#ifdef WEBP_SWAP_16BIT_CSP
const __m128i rgba = _mm_unpacklo_epi8(rgba1, rgba0); // barg0...barg7
#else
const __m128i rgba = _mm_unpacklo_epi8(rgba0, rgba1); // rgba0...rgba7
#endif
_mm_storeu_si128(out++, rgba);
num_pixels -= 8;
}
// left-overs
VP8LConvertBGRAToRGBA4444_C((const uint32_t*)in, num_pixels, (uint8_t*)out);
}
static void ConvertBGRAToRGB565(const uint32_t* src,
int num_pixels, uint8_t* dst) {
const __m128i mask_0xe0 = _mm_set1_epi8(0xe0);
const __m128i mask_0xf8 = _mm_set1_epi8(0xf8);
const __m128i mask_0x07 = _mm_set1_epi8(0x07);
const __m128i* in = (const __m128i*)src;
__m128i* out = (__m128i*)dst;
while (num_pixels >= 8) {
const __m128i bgra0 = _mm_loadu_si128(in++); // bgra0|bgra1|bgra2|bgra3
const __m128i bgra4 = _mm_loadu_si128(in++); // bgra4|bgra5|bgra6|bgra7
const __m128i v0l = _mm_unpacklo_epi8(bgra0, bgra4); // b0b4g0g4r0r4a0a4...
const __m128i v0h = _mm_unpackhi_epi8(bgra0, bgra4); // b2b6g2g6r2r6a2a6...
const __m128i v1l = _mm_unpacklo_epi8(v0l, v0h); // b0b2b4b6g0g2g4g6...
const __m128i v1h = _mm_unpackhi_epi8(v0l, v0h); // b1b3b5b7g1g3g5g7...
const __m128i v2l = _mm_unpacklo_epi8(v1l, v1h); // b0...b7 | g0...g7
const __m128i v2h = _mm_unpackhi_epi8(v1l, v1h); // r0...r7 | a0...a7
const __m128i ga0 = _mm_unpackhi_epi64(v2l, v2h); // g0...g7 | a0...a7
const __m128i rb0 = _mm_unpacklo_epi64(v2h, v2l); // r0...r7 | b0...b7
const __m128i rb1 = _mm_and_si128(rb0, mask_0xf8); // -r0..-r7|-b0..-b7
const __m128i g_lo1 = _mm_srli_epi16(ga0, 5);
const __m128i g_lo2 = _mm_and_si128(g_lo1, mask_0x07); // g0-...g7-|xx (3b)
const __m128i g_hi1 = _mm_slli_epi16(ga0, 3);
const __m128i g_hi2 = _mm_and_si128(g_hi1, mask_0xe0); // -g0...-g7|xx (3b)
const __m128i b0 = _mm_srli_si128(rb1, 8); // -b0...-b7|0
const __m128i rg1 = _mm_or_si128(rb1, g_lo2); // gr0...gr7|xx
const __m128i b1 = _mm_srli_epi16(b0, 3);
const __m128i gb1 = _mm_or_si128(b1, g_hi2); // bg0...bg7|xx
#ifdef WEBP_SWAP_16BIT_CSP
const __m128i rgba = _mm_unpacklo_epi8(gb1, rg1); // rggb0...rggb7
#else
const __m128i rgba = _mm_unpacklo_epi8(rg1, gb1); // bgrb0...bgrb7
#endif
_mm_storeu_si128(out++, rgba);
num_pixels -= 8;
}
// left-overs
VP8LConvertBGRAToRGB565_C((const uint32_t*)in, num_pixels, (uint8_t*)out);
}
static void ConvertBGRAToBGR(const uint32_t* src,
int num_pixels, uint8_t* dst) {
const __m128i mask_l = _mm_set_epi32(0, 0x00ffffff, 0, 0x00ffffff);
const __m128i mask_h = _mm_set_epi32(0x00ffffff, 0, 0x00ffffff, 0);
const __m128i* in = (const __m128i*)src;
const uint8_t* const end = dst + num_pixels * 3;
// the last storel_epi64 below writes 8 bytes starting at offset 18
while (dst + 26 <= end) {
const __m128i bgra0 = _mm_loadu_si128(in++); // bgra0|bgra1|bgra2|bgra3
const __m128i bgra4 = _mm_loadu_si128(in++); // bgra4|bgra5|bgra6|bgra7
const __m128i a0l = _mm_and_si128(bgra0, mask_l); // bgr0|0|bgr0|0
const __m128i a4l = _mm_and_si128(bgra4, mask_l); // bgr0|0|bgr0|0
const __m128i a0h = _mm_and_si128(bgra0, mask_h); // 0|bgr0|0|bgr0
const __m128i a4h = _mm_and_si128(bgra4, mask_h); // 0|bgr0|0|bgr0
const __m128i b0h = _mm_srli_epi64(a0h, 8); // 000b|gr00|000b|gr00
const __m128i b4h = _mm_srli_epi64(a4h, 8); // 000b|gr00|000b|gr00
const __m128i c0 = _mm_or_si128(a0l, b0h); // rgbrgb00|rgbrgb00
const __m128i c4 = _mm_or_si128(a4l, b4h); // rgbrgb00|rgbrgb00
const __m128i c2 = _mm_srli_si128(c0, 8);
const __m128i c6 = _mm_srli_si128(c4, 8);
_mm_storel_epi64((__m128i*)(dst + 0), c0);
_mm_storel_epi64((__m128i*)(dst + 6), c2);
_mm_storel_epi64((__m128i*)(dst + 12), c4);
_mm_storel_epi64((__m128i*)(dst + 18), c6);
dst += 24;
num_pixels -= 8;
}
// left-overs
VP8LConvertBGRAToBGR_C((const uint32_t*)in, num_pixels, dst);
}
//------------------------------------------------------------------------------
#define LINE_SIZE 16 // 8 or 16
static void AddVector(const uint32_t* a, const uint32_t* b, uint32_t* out,
int size) {
int i;
assert(size % LINE_SIZE == 0);
for (i = 0; i < size; i += LINE_SIZE) {
const __m128i a0 = _mm_loadu_si128((__m128i*)&a[i + 0]);
const __m128i a1 = _mm_loadu_si128((__m128i*)&a[i + 4]);
#if (LINE_SIZE == 16)
const __m128i a2 = _mm_loadu_si128((__m128i*)&a[i + 8]);
const __m128i a3 = _mm_loadu_si128((__m128i*)&a[i + 12]);
#endif
const __m128i b0 = _mm_loadu_si128((__m128i*)&b[i + 0]);
const __m128i b1 = _mm_loadu_si128((__m128i*)&b[i + 4]);
#if (LINE_SIZE == 16)
const __m128i b2 = _mm_loadu_si128((__m128i*)&b[i + 8]);
const __m128i b3 = _mm_loadu_si128((__m128i*)&b[i + 12]);
#endif
_mm_storeu_si128((__m128i*)&out[i + 0], _mm_add_epi32(a0, b0));
_mm_storeu_si128((__m128i*)&out[i + 4], _mm_add_epi32(a1, b1));
#if (LINE_SIZE == 16)
_mm_storeu_si128((__m128i*)&out[i + 8], _mm_add_epi32(a2, b2));
_mm_storeu_si128((__m128i*)&out[i + 12], _mm_add_epi32(a3, b3));
#endif
}
}
static void AddVectorEq(const uint32_t* a, uint32_t* out, int size) {
int i;
assert(size % LINE_SIZE == 0);
for (i = 0; i < size; i += LINE_SIZE) {
const __m128i a0 = _mm_loadu_si128((__m128i*)&a[i + 0]);
const __m128i a1 = _mm_loadu_si128((__m128i*)&a[i + 4]);
#if (LINE_SIZE == 16)
const __m128i a2 = _mm_loadu_si128((__m128i*)&a[i + 8]);
const __m128i a3 = _mm_loadu_si128((__m128i*)&a[i + 12]);
#endif
const __m128i b0 = _mm_loadu_si128((__m128i*)&out[i + 0]);
const __m128i b1 = _mm_loadu_si128((__m128i*)&out[i + 4]);
#if (LINE_SIZE == 16)
const __m128i b2 = _mm_loadu_si128((__m128i*)&out[i + 8]);
const __m128i b3 = _mm_loadu_si128((__m128i*)&out[i + 12]);
#endif
_mm_storeu_si128((__m128i*)&out[i + 0], _mm_add_epi32(a0, b0));
_mm_storeu_si128((__m128i*)&out[i + 4], _mm_add_epi32(a1, b1));
#if (LINE_SIZE == 16)
_mm_storeu_si128((__m128i*)&out[i + 8], _mm_add_epi32(a2, b2));
_mm_storeu_si128((__m128i*)&out[i + 12], _mm_add_epi32(a3, b3));
#endif
}
}
#undef LINE_SIZE
// Note we are adding uint32_t's as *signed* int32's (using _mm_add_epi32). But
// that's ok since the histogram values are less than 1<<28 (max picture size).
static void HistogramAdd(const VP8LHistogram* const a,
const VP8LHistogram* const b,
VP8LHistogram* const out) {
int i;
const int literal_size = VP8LHistogramNumCodes(a->palette_code_bits_);
assert(a->palette_code_bits_ == b->palette_code_bits_);
if (b != out) {
AddVector(a->literal_, b->literal_, out->literal_, NUM_LITERAL_CODES);
AddVector(a->red_, b->red_, out->red_, NUM_LITERAL_CODES);
AddVector(a->blue_, b->blue_, out->blue_, NUM_LITERAL_CODES);
AddVector(a->alpha_, b->alpha_, out->alpha_, NUM_LITERAL_CODES);
} else {
AddVectorEq(a->literal_, out->literal_, NUM_LITERAL_CODES);
AddVectorEq(a->red_, out->red_, NUM_LITERAL_CODES);
AddVectorEq(a->blue_, out->blue_, NUM_LITERAL_CODES);
AddVectorEq(a->alpha_, out->alpha_, NUM_LITERAL_CODES);
}
for (i = NUM_LITERAL_CODES; i < literal_size; ++i) {
out->literal_[i] = a->literal_[i] + b->literal_[i];
}
for (i = 0; i < NUM_DISTANCE_CODES; ++i) {
out->distance_[i] = a->distance_[i] + b->distance_[i];
}
}
#endif // WEBP_USE_SSE2
//------------------------------------------------------------------------------
extern void VP8LDspInitSSE2(void);
void VP8LDspInitSSE2(void) {
#if defined(WEBP_USE_SSE2)
VP8LPredictors[5] = Predictor5;
VP8LPredictors[6] = Predictor6;
VP8LPredictors[7] = Predictor7;
VP8LPredictors[8] = Predictor8;
VP8LPredictors[9] = Predictor9;
VP8LPredictors[10] = Predictor10;
VP8LPredictors[11] = Predictor11;
VP8LPredictors[12] = Predictor12;
VP8LPredictors[13] = Predictor13;
VP8LSubtractGreenFromBlueAndRed = SubtractGreenFromBlueAndRed;
VP8LAddGreenToBlueAndRed = AddGreenToBlueAndRed;
VP8LTransformColor = TransformColor;
VP8LTransformColorInverse = TransformColorInverse;
VP8LConvertBGRAToRGBA = ConvertBGRAToRGBA;
VP8LConvertBGRAToRGBA4444 = ConvertBGRAToRGBA4444;
VP8LConvertBGRAToRGB565 = ConvertBGRAToRGB565;
VP8LConvertBGRAToBGR = ConvertBGRAToBGR;
VP8LHistogramAdd = HistogramAdd;
#endif // WEBP_USE_SSE2
}
//------------------------------------------------------------------------------

82
src/dsp/neon.h Normal file
View File

@@ -0,0 +1,82 @@
// Copyright 2014 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// NEON common code.
#ifndef WEBP_DSP_NEON_H_
#define WEBP_DSP_NEON_H_
#include <arm_neon.h>
#include "./dsp.h"
// Right now, some intrinsics functions seem slower, so we disable them
// everywhere except aarch64 where the inline assembly is incompatible.
#if defined(__aarch64__)
#define USE_INTRINSICS // use intrinsics when possible
#endif
#define INIT_VECTOR2(v, a, b) do { \
v.val[0] = a; \
v.val[1] = b; \
} while (0)
#define INIT_VECTOR3(v, a, b, c) do { \
v.val[0] = a; \
v.val[1] = b; \
v.val[2] = c; \
} while (0)
#define INIT_VECTOR4(v, a, b, c, d) do { \
v.val[0] = a; \
v.val[1] = b; \
v.val[2] = c; \
v.val[3] = d; \
} while (0)
// if using intrinsics, this flag avoids some functions that make gcc-4.6.3
// crash ("internal compiler error: in immed_double_const, at emit-rtl.").
// (probably similar to gcc.gnu.org/bugzilla/show_bug.cgi?id=48183)
#if !(LOCAL_GCC_PREREQ(4,8) || defined(__aarch64__))
#define WORK_AROUND_GCC
#endif
static WEBP_INLINE int32x4x4_t Transpose4x4(const int32x4x4_t rows) {
uint64x2x2_t row01, row23;
row01.val[0] = vreinterpretq_u64_s32(rows.val[0]);
row01.val[1] = vreinterpretq_u64_s32(rows.val[1]);
row23.val[0] = vreinterpretq_u64_s32(rows.val[2]);
row23.val[1] = vreinterpretq_u64_s32(rows.val[3]);
// Transpose 64-bit values (there's no vswp equivalent)
{
const uint64x1_t row0h = vget_high_u64(row01.val[0]);
const uint64x1_t row2l = vget_low_u64(row23.val[0]);
const uint64x1_t row1h = vget_high_u64(row01.val[1]);
const uint64x1_t row3l = vget_low_u64(row23.val[1]);
row01.val[0] = vcombine_u64(vget_low_u64(row01.val[0]), row2l);
row23.val[0] = vcombine_u64(row0h, vget_high_u64(row23.val[0]));
row01.val[1] = vcombine_u64(vget_low_u64(row01.val[1]), row3l);
row23.val[1] = vcombine_u64(row1h, vget_high_u64(row23.val[1]));
}
{
const int32x4x2_t out01 = vtrnq_s32(vreinterpretq_s32_u64(row01.val[0]),
vreinterpretq_s32_u64(row01.val[1]));
const int32x4x2_t out23 = vtrnq_s32(vreinterpretq_s32_u64(row23.val[0]),
vreinterpretq_s32_u64(row23.val[1]));
int32x4x4_t out;
out.val[0] = out01.val[0];
out.val[1] = out01.val[1];
out.val[2] = out23.val[0];
out.val[3] = out23.val[1];
return out;
}
}
#endif // WEBP_DSP_NEON_H_

View File

@@ -106,57 +106,6 @@ UPSAMPLE_FUNC(UpsampleRgb565LinePair, VP8YuvToRgb565, 2)
#endif // FANCY_UPSAMPLING
//------------------------------------------------------------------------------
// simple point-sampling
#define SAMPLE_FUNC(FUNC_NAME, FUNC, XSTEP) \
static void FUNC_NAME(const uint8_t* top_y, const uint8_t* bottom_y, \
const uint8_t* u, const uint8_t* v, \
uint8_t* top_dst, uint8_t* bottom_dst, int len) { \
int i; \
for (i = 0; i < len - 1; i += 2) { \
FUNC(top_y[0], u[0], v[0], top_dst); \
FUNC(top_y[1], u[0], v[0], top_dst + XSTEP); \
FUNC(bottom_y[0], u[0], v[0], bottom_dst); \
FUNC(bottom_y[1], u[0], v[0], bottom_dst + XSTEP); \
top_y += 2; \
bottom_y += 2; \
u++; \
v++; \
top_dst += 2 * XSTEP; \
bottom_dst += 2 * XSTEP; \
} \
if (i == len - 1) { /* last one */ \
FUNC(top_y[0], u[0], v[0], top_dst); \
FUNC(bottom_y[0], u[0], v[0], bottom_dst); \
} \
}
// All variants implemented.
SAMPLE_FUNC(SampleRgbLinePair, VP8YuvToRgb, 3)
SAMPLE_FUNC(SampleBgrLinePair, VP8YuvToBgr, 3)
SAMPLE_FUNC(SampleRgbaLinePair, VP8YuvToRgba, 4)
SAMPLE_FUNC(SampleBgraLinePair, VP8YuvToBgra, 4)
SAMPLE_FUNC(SampleArgbLinePair, VP8YuvToArgb, 4)
SAMPLE_FUNC(SampleRgba4444LinePair, VP8YuvToRgba4444, 2)
SAMPLE_FUNC(SampleRgb565LinePair, VP8YuvToRgb565, 2)
#undef SAMPLE_FUNC
const WebPSampleLinePairFunc WebPSamplers[MODE_LAST] = {
SampleRgbLinePair, // MODE_RGB
SampleRgbaLinePair, // MODE_RGBA
SampleBgrLinePair, // MODE_BGR
SampleBgraLinePair, // MODE_BGRA
SampleArgbLinePair, // MODE_ARGB
SampleRgba4444LinePair, // MODE_RGBA_4444
SampleRgb565LinePair, // MODE_RGB_565
SampleRgbaLinePair, // MODE_rgbA
SampleBgraLinePair, // MODE_bgrA
SampleArgbLinePair, // MODE_Argb
SampleRgba4444LinePair // MODE_rgbA_4444
};
//------------------------------------------------------------------------------
#if !defined(FANCY_UPSAMPLING)
@@ -235,85 +184,17 @@ const WebPYUV444Converter WebPYUV444Converters[MODE_LAST] = {
};
//------------------------------------------------------------------------------
// Premultiplied modes
// Main calls
// non dithered-modes
extern void WebPInitUpsamplersSSE2(void);
extern void WebPInitUpsamplersNEON(void);
// (x * a * 32897) >> 23 is bit-wise equivalent to (int)(x * a / 255.)
// for all 8bit x or a. For bit-wise equivalence to (int)(x * a / 255. + .5),
// one can use instead: (x * a * 65793 + (1 << 23)) >> 24
#if 1 // (int)(x * a / 255.)
#define MULTIPLIER(a) ((a) * 32897UL)
#define PREMULTIPLY(x, m) (((x) * (m)) >> 23)
#else // (int)(x * a / 255. + .5)
#define MULTIPLIER(a) ((a) * 65793UL)
#define PREMULTIPLY(x, m) (((x) * (m) + (1UL << 23)) >> 24)
#endif
static void ApplyAlphaMultiply(uint8_t* rgba, int alpha_first,
int w, int h, int stride) {
while (h-- > 0) {
uint8_t* const rgb = rgba + (alpha_first ? 1 : 0);
const uint8_t* const alpha = rgba + (alpha_first ? 0 : 3);
int i;
for (i = 0; i < w; ++i) {
const uint32_t a = alpha[4 * i];
if (a != 0xff) {
const uint32_t mult = MULTIPLIER(a);
rgb[4 * i + 0] = PREMULTIPLY(rgb[4 * i + 0], mult);
rgb[4 * i + 1] = PREMULTIPLY(rgb[4 * i + 1], mult);
rgb[4 * i + 2] = PREMULTIPLY(rgb[4 * i + 2], mult);
}
}
rgba += stride;
}
}
#undef MULTIPLIER
#undef PREMULTIPLY
// rgbA4444
#define MULTIPLIER(a) ((a) * 0x1111) // 0x1111 ~= (1 << 16) / 15
static WEBP_INLINE uint8_t dither_hi(uint8_t x) {
return (x & 0xf0) | (x >> 4);
}
static WEBP_INLINE uint8_t dither_lo(uint8_t x) {
return (x & 0x0f) | (x << 4);
}
static WEBP_INLINE uint8_t multiply(uint8_t x, uint32_t m) {
return (x * m) >> 16;
}
static void ApplyAlphaMultiply4444(uint8_t* rgba4444,
int w, int h, int stride) {
while (h-- > 0) {
int i;
for (i = 0; i < w; ++i) {
const uint8_t a = (rgba4444[2 * i + 1] & 0x0f);
const uint32_t mult = MULTIPLIER(a);
const uint8_t r = multiply(dither_hi(rgba4444[2 * i + 0]), mult);
const uint8_t g = multiply(dither_lo(rgba4444[2 * i + 0]), mult);
const uint8_t b = multiply(dither_hi(rgba4444[2 * i + 1]), mult);
rgba4444[2 * i + 0] = (r & 0xf0) | ((g >> 4) & 0x0f);
rgba4444[2 * i + 1] = (b & 0xf0) | a;
}
rgba4444 += stride;
}
}
#undef MULTIPLIER
void (*WebPApplyAlphaMultiply)(uint8_t*, int, int, int, int)
= ApplyAlphaMultiply;
void (*WebPApplyAlphaMultiply4444)(uint8_t*, int, int, int)
= ApplyAlphaMultiply4444;
//------------------------------------------------------------------------------
// Main call
static volatile VP8CPUInfo upsampling_last_cpuinfo_used2 =
(VP8CPUInfo)&upsampling_last_cpuinfo_used2;
void WebPInitUpsamplers(void) {
if (upsampling_last_cpuinfo_used2 == VP8GetCPUInfo) return;
#ifdef FANCY_UPSAMPLING
WebPUpsamplers[MODE_RGB] = UpsampleRgbLinePair;
WebPUpsamplers[MODE_RGBA] = UpsampleRgbaLinePair;
@@ -322,6 +203,10 @@ void WebPInitUpsamplers(void) {
WebPUpsamplers[MODE_ARGB] = UpsampleArgbLinePair;
WebPUpsamplers[MODE_RGBA_4444] = UpsampleRgba4444LinePair;
WebPUpsamplers[MODE_RGB_565] = UpsampleRgb565LinePair;
WebPUpsamplers[MODE_rgbA] = UpsampleRgbaLinePair;
WebPUpsamplers[MODE_bgrA] = UpsampleBgraLinePair;
WebPUpsamplers[MODE_Argb] = UpsampleArgbLinePair;
WebPUpsamplers[MODE_rgbA_4444] = UpsampleRgba4444LinePair;
// If defined, use CPUInfo() to overwrite some pointers with faster versions.
if (VP8GetCPUInfo != NULL) {
@@ -337,30 +222,7 @@ void WebPInitUpsamplers(void) {
#endif
}
#endif // FANCY_UPSAMPLING
upsampling_last_cpuinfo_used2 = VP8GetCPUInfo;
}
void WebPInitPremultiply(void) {
WebPApplyAlphaMultiply = ApplyAlphaMultiply;
WebPApplyAlphaMultiply4444 = ApplyAlphaMultiply4444;
#ifdef FANCY_UPSAMPLING
WebPUpsamplers[MODE_rgbA] = UpsampleRgbaLinePair;
WebPUpsamplers[MODE_bgrA] = UpsampleBgraLinePair;
WebPUpsamplers[MODE_Argb] = UpsampleArgbLinePair;
WebPUpsamplers[MODE_rgbA_4444] = UpsampleRgba4444LinePair;
if (VP8GetCPUInfo != NULL) {
#if defined(WEBP_USE_SSE2)
if (VP8GetCPUInfo(kSSE2)) {
WebPInitPremultiplySSE2();
}
#endif
#if defined(WEBP_USE_NEON)
if (VP8GetCPUInfo(kNEON)) {
WebPInitPremultiplyNEON();
}
#endif
}
#endif // FANCY_UPSAMPLING
}
//------------------------------------------------------------------------------

View File

@@ -19,6 +19,7 @@
#include <assert.h>
#include <arm_neon.h>
#include <string.h>
#include "./neon.h"
#include "./yuv.h"
#ifdef FANCY_UPSAMPLING
@@ -61,8 +62,9 @@
d = vrhadd_u8(d, diag1); \
\
{ \
const uint8x8x2_t a_b = {{ a, b }}; \
const uint8x8x2_t c_d = {{ c, d }}; \
uint8x8x2_t a_b, c_d; \
INIT_VECTOR2(a_b, a, b); \
INIT_VECTOR2(c_d, c, d); \
vst2_u8(out, a_b); \
vst2_u8(out + 32, c_d); \
} \
@@ -89,25 +91,29 @@ static void Upsample16Pixels(const uint8_t *r1, const uint8_t *r2,
static const int16_t kCoeffs[4] = { kYScale, kVToR, kUToG, kVToG };
#define v255 vmov_n_u8(255)
#define v255 vdup_n_u8(255)
#define STORE_Rgb(out, r, g, b) do { \
const uint8x8x3_t r_g_b = {{ r, g, b }}; \
uint8x8x3_t r_g_b; \
INIT_VECTOR3(r_g_b, r, g, b); \
vst3_u8(out, r_g_b); \
} while (0)
#define STORE_Bgr(out, r, g, b) do { \
const uint8x8x3_t b_g_r = {{ b, g, r }}; \
uint8x8x3_t b_g_r; \
INIT_VECTOR3(b_g_r, b, g, r); \
vst3_u8(out, b_g_r); \
} while (0)
#define STORE_Rgba(out, r, g, b) do { \
const uint8x8x4_t r_g_b_v255 = {{ r, g, b, v255 }}; \
uint8x8x4_t r_g_b_v255; \
INIT_VECTOR4(r_g_b_v255, r, g, b, v255); \
vst4_u8(out, r_g_b_v255); \
} while (0)
#define STORE_Bgra(out, r, g, b) do { \
const uint8x8x4_t b_g_r_v255 = {{ b, g, r, v255 }}; \
uint8x8x4_t b_g_r_v255; \
INIT_VECTOR4(b_g_r_v255, b, g, r, v255); \
vst4_u8(out, b_g_r_v255); \
} while (0)
@@ -190,9 +196,9 @@ static void FUNC_NAME(const uint8_t *top_y, const uint8_t *bottom_y, \
const int v_diag = ((top_v[0] + cur_v[0]) >> 1) + 1; \
\
const int16x4_t cf16 = vld1_s16(kCoeffs); \
const int32x2_t cf32 = vmov_n_s32(kUToB); \
const uint8x8_t u16 = vmov_n_u8(16); \
const uint8x8_t u128 = vmov_n_u8(128); \
const int32x2_t cf32 = vdup_n_s32(kUToB); \
const uint8x8_t u16 = vdup_n_u8(16); \
const uint8x8_t u128 = vdup_n_u8(128); \
\
/* Treat the first pixel in regular way */ \
assert(top_y != NULL); \
@@ -225,10 +231,10 @@ static void FUNC_NAME(const uint8_t *top_y, const uint8_t *bottom_y, \
}
// NEON variants of the fancy upsampler.
NEON_UPSAMPLE_FUNC(UpsampleRgbLinePairNEON, Rgb, 3)
NEON_UPSAMPLE_FUNC(UpsampleBgrLinePairNEON, Bgr, 3)
NEON_UPSAMPLE_FUNC(UpsampleRgbaLinePairNEON, Rgba, 4)
NEON_UPSAMPLE_FUNC(UpsampleBgraLinePairNEON, Bgra, 4)
NEON_UPSAMPLE_FUNC(UpsampleRgbLinePair, Rgb, 3)
NEON_UPSAMPLE_FUNC(UpsampleBgrLinePair, Bgr, 3)
NEON_UPSAMPLE_FUNC(UpsampleRgbaLinePair, Rgba, 4)
NEON_UPSAMPLE_FUNC(UpsampleBgraLinePair, Bgra, 4)
#endif // FANCY_UPSAMPLING
@@ -236,30 +242,26 @@ NEON_UPSAMPLE_FUNC(UpsampleBgraLinePairNEON, Bgra, 4)
//------------------------------------------------------------------------------
extern void WebPInitUpsamplersNEON(void);
#ifdef FANCY_UPSAMPLING
extern WebPUpsampleLinePairFunc WebPUpsamplers[/* MODE_LAST */];
void WebPInitUpsamplersNEON(void) {
#if defined(WEBP_USE_NEON)
WebPUpsamplers[MODE_RGB] = UpsampleRgbLinePairNEON;
WebPUpsamplers[MODE_RGBA] = UpsampleRgbaLinePairNEON;
WebPUpsamplers[MODE_BGR] = UpsampleBgrLinePairNEON;
WebPUpsamplers[MODE_BGRA] = UpsampleBgraLinePairNEON;
#endif // WEBP_USE_NEON
}
void WebPInitPremultiplyNEON(void) {
#if defined(WEBP_USE_NEON)
WebPUpsamplers[MODE_rgbA] = UpsampleRgbaLinePairNEON;
WebPUpsamplers[MODE_bgrA] = UpsampleBgraLinePairNEON;
WebPUpsamplers[MODE_RGB] = UpsampleRgbLinePair;
WebPUpsamplers[MODE_RGBA] = UpsampleRgbaLinePair;
WebPUpsamplers[MODE_BGR] = UpsampleBgrLinePair;
WebPUpsamplers[MODE_BGRA] = UpsampleBgraLinePair;
WebPUpsamplers[MODE_rgbA] = UpsampleRgbaLinePair;
WebPUpsamplers[MODE_bgrA] = UpsampleBgraLinePair;
#endif // WEBP_USE_NEON
}
#else
// this empty function is to avoid an empty .o
void WebPInitPremultiplyNEON(void) {}
void WebPInitUpsamplersNEON(void) {}
#endif // FANCY_UPSAMPLING

View File

@@ -169,10 +169,10 @@ static void FUNC_NAME(const uint8_t* top_y, const uint8_t* bottom_y, \
}
// SSE2 variants of the fancy upsampler.
SSE2_UPSAMPLE_FUNC(UpsampleRgbLinePairSSE2, VP8YuvToRgb, 3)
SSE2_UPSAMPLE_FUNC(UpsampleBgrLinePairSSE2, VP8YuvToBgr, 3)
SSE2_UPSAMPLE_FUNC(UpsampleRgbaLinePairSSE2, VP8YuvToRgba, 4)
SSE2_UPSAMPLE_FUNC(UpsampleBgraLinePairSSE2, VP8YuvToBgra, 4)
SSE2_UPSAMPLE_FUNC(UpsampleRgbLinePair, VP8YuvToRgb, 3)
SSE2_UPSAMPLE_FUNC(UpsampleBgrLinePair, VP8YuvToBgr, 3)
SSE2_UPSAMPLE_FUNC(UpsampleRgbaLinePair, VP8YuvToRgba, 4)
SSE2_UPSAMPLE_FUNC(UpsampleBgraLinePair, VP8YuvToBgra, 4)
#undef GET_M
#undef PACK_AND_STORE
@@ -188,6 +188,8 @@ SSE2_UPSAMPLE_FUNC(UpsampleBgraLinePairSSE2, VP8YuvToBgra, 4)
//------------------------------------------------------------------------------
extern void WebPInitUpsamplersSSE2(void);
#ifdef FANCY_UPSAMPLING
extern WebPUpsampleLinePairFunc WebPUpsamplers[/* MODE_LAST */];
@@ -195,24 +197,18 @@ extern WebPUpsampleLinePairFunc WebPUpsamplers[/* MODE_LAST */];
void WebPInitUpsamplersSSE2(void) {
#if defined(WEBP_USE_SSE2)
VP8YUVInitSSE2();
WebPUpsamplers[MODE_RGB] = UpsampleRgbLinePairSSE2;
WebPUpsamplers[MODE_RGBA] = UpsampleRgbaLinePairSSE2;
WebPUpsamplers[MODE_BGR] = UpsampleBgrLinePairSSE2;
WebPUpsamplers[MODE_BGRA] = UpsampleBgraLinePairSSE2;
#endif // WEBP_USE_SSE2
}
void WebPInitPremultiplySSE2(void) {
#if defined(WEBP_USE_SSE2)
WebPUpsamplers[MODE_rgbA] = UpsampleRgbaLinePairSSE2;
WebPUpsamplers[MODE_bgrA] = UpsampleBgraLinePairSSE2;
WebPUpsamplers[MODE_RGB] = UpsampleRgbLinePair;
WebPUpsamplers[MODE_RGBA] = UpsampleRgbaLinePair;
WebPUpsamplers[MODE_BGR] = UpsampleBgrLinePair;
WebPUpsamplers[MODE_BGRA] = UpsampleBgraLinePair;
WebPUpsamplers[MODE_rgbA] = UpsampleRgbaLinePair;
WebPUpsamplers[MODE_bgrA] = UpsampleBgraLinePair;
#endif // WEBP_USE_SSE2
}
#else
// this empty function is to avoid an empty .o
void WebPInitPremultiplySSE2(void) {}
void WebPInitUpsamplersSSE2(void) {}
#endif // FANCY_UPSAMPLING

View File

@@ -7,13 +7,12 @@
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// YUV->RGB conversion function
// YUV->RGB conversion functions
//
// Author: Skal (pascal.massimino@gmail.com)
#include "./yuv.h"
#if defined(WEBP_YUV_USE_TABLE)
static int done = 0;
@@ -68,140 +67,94 @@ void VP8YUVInit(void) {}
#endif // WEBP_YUV_USE_TABLE
//-----------------------------------------------------------------------------
// SSE2 extras
// Plain-C version
#if defined(WEBP_USE_SSE2)
#define ROW_FUNC(FUNC_NAME, FUNC, XSTEP) \
static void FUNC_NAME(const uint8_t* y, \
const uint8_t* u, const uint8_t* v, \
uint8_t* dst, int len) { \
const uint8_t* const end = dst + (len & ~1) * XSTEP; \
while (dst != end) { \
FUNC(y[0], u[0], v[0], dst); \
FUNC(y[1], u[0], v[0], dst + XSTEP); \
y += 2; \
++u; \
++v; \
dst += 2 * XSTEP; \
} \
if (len & 1) { \
FUNC(y[0], u[0], v[0], dst); \
} \
} \
#ifdef FANCY_UPSAMPLING
// All variants implemented.
ROW_FUNC(YuvToRgbRow, VP8YuvToRgb, 3)
ROW_FUNC(YuvToBgrRow, VP8YuvToBgr, 3)
ROW_FUNC(YuvToRgbaRow, VP8YuvToRgba, 4)
ROW_FUNC(YuvToBgraRow, VP8YuvToBgra, 4)
ROW_FUNC(YuvToArgbRow, VP8YuvToArgb, 4)
ROW_FUNC(YuvToRgba4444Row, VP8YuvToRgba4444, 2)
ROW_FUNC(YuvToRgb565Row, VP8YuvToRgb565, 2)
#include <emmintrin.h>
#include <string.h> // for memcpy
#undef ROW_FUNC
typedef union { // handy struct for converting SSE2 registers
int32_t i32[4];
uint8_t u8[16];
__m128i m;
} VP8kCstSSE2;
static int done_sse2 = 0;
static VP8kCstSSE2 VP8kUtoRGBA[256], VP8kVtoRGBA[256], VP8kYtoRGBA[256];
void VP8YUVInitSSE2(void) {
if (!done_sse2) {
int i;
for (i = 0; i < 256; ++i) {
VP8kYtoRGBA[i].i32[0] =
VP8kYtoRGBA[i].i32[1] =
VP8kYtoRGBA[i].i32[2] = (i - 16) * kYScale + YUV_HALF2;
VP8kYtoRGBA[i].i32[3] = 0xff << YUV_FIX2;
VP8kUtoRGBA[i].i32[0] = 0;
VP8kUtoRGBA[i].i32[1] = -kUToG * (i - 128);
VP8kUtoRGBA[i].i32[2] = kUToB * (i - 128);
VP8kUtoRGBA[i].i32[3] = 0;
VP8kVtoRGBA[i].i32[0] = kVToR * (i - 128);
VP8kVtoRGBA[i].i32[1] = -kVToG * (i - 128);
VP8kVtoRGBA[i].i32[2] = 0;
VP8kVtoRGBA[i].i32[3] = 0;
// Main call for processing a plane with a WebPSamplerRowFunc function:
void WebPSamplerProcessPlane(const uint8_t* y, int y_stride,
const uint8_t* u, const uint8_t* v, int uv_stride,
uint8_t* dst, int dst_stride,
int width, int height, WebPSamplerRowFunc func) {
int j;
for (j = 0; j < height; ++j) {
func(y, u, v, dst, width);
y += y_stride;
if (j & 1) {
u += uv_stride;
v += uv_stride;
}
done_sse2 = 1;
dst += dst_stride;
}
}
static WEBP_INLINE __m128i VP8GetRGBA32b(int y, int u, int v) {
const __m128i u_part = _mm_loadu_si128(&VP8kUtoRGBA[u].m);
const __m128i v_part = _mm_loadu_si128(&VP8kVtoRGBA[v].m);
const __m128i y_part = _mm_loadu_si128(&VP8kYtoRGBA[y].m);
const __m128i uv_part = _mm_add_epi32(u_part, v_part);
const __m128i rgba1 = _mm_add_epi32(y_part, uv_part);
const __m128i rgba2 = _mm_srai_epi32(rgba1, YUV_FIX2);
return rgba2;
}
//-----------------------------------------------------------------------------
// Main call
static WEBP_INLINE void VP8YuvToRgbSSE2(uint8_t y, uint8_t u, uint8_t v,
uint8_t* const rgb) {
const __m128i tmp0 = VP8GetRGBA32b(y, u, v);
const __m128i tmp1 = _mm_packs_epi32(tmp0, tmp0);
const __m128i tmp2 = _mm_packus_epi16(tmp1, tmp1);
// Note: we store 8 bytes at a time, not 3 bytes! -> memory stomp
_mm_storel_epi64((__m128i*)rgb, tmp2);
}
WebPSamplerRowFunc WebPSamplers[MODE_LAST];
static WEBP_INLINE void VP8YuvToBgrSSE2(uint8_t y, uint8_t u, uint8_t v,
uint8_t* const bgr) {
const __m128i tmp0 = VP8GetRGBA32b(y, u, v);
const __m128i tmp1 = _mm_shuffle_epi32(tmp0, _MM_SHUFFLE(3, 0, 1, 2));
const __m128i tmp2 = _mm_packs_epi32(tmp1, tmp1);
const __m128i tmp3 = _mm_packus_epi16(tmp2, tmp2);
// Note: we store 8 bytes at a time, not 3 bytes! -> memory stomp
_mm_storel_epi64((__m128i*)bgr, tmp3);
}
extern void WebPInitSamplersSSE2(void);
extern void WebPInitSamplersMIPS32(void);
void VP8YuvToRgba32(const uint8_t* y, const uint8_t* u, const uint8_t* v,
uint8_t* dst) {
int n;
for (n = 0; n < 32; n += 4) {
const __m128i tmp0_1 = VP8GetRGBA32b(y[n + 0], u[n + 0], v[n + 0]);
const __m128i tmp0_2 = VP8GetRGBA32b(y[n + 1], u[n + 1], v[n + 1]);
const __m128i tmp0_3 = VP8GetRGBA32b(y[n + 2], u[n + 2], v[n + 2]);
const __m128i tmp0_4 = VP8GetRGBA32b(y[n + 3], u[n + 3], v[n + 3]);
const __m128i tmp1_1 = _mm_packs_epi32(tmp0_1, tmp0_2);
const __m128i tmp1_2 = _mm_packs_epi32(tmp0_3, tmp0_4);
const __m128i tmp2 = _mm_packus_epi16(tmp1_1, tmp1_2);
_mm_storeu_si128((__m128i*)dst, tmp2);
dst += 4 * 4;
}
}
static volatile VP8CPUInfo yuv_last_cpuinfo_used =
(VP8CPUInfo)&yuv_last_cpuinfo_used;
void VP8YuvToBgra32(const uint8_t* y, const uint8_t* u, const uint8_t* v,
uint8_t* dst) {
int n;
for (n = 0; n < 32; n += 2) {
const __m128i tmp0_1 = VP8GetRGBA32b(y[n + 0], u[n + 0], v[n + 0]);
const __m128i tmp0_2 = VP8GetRGBA32b(y[n + 1], u[n + 1], v[n + 1]);
const __m128i tmp1_1 = _mm_shuffle_epi32(tmp0_1, _MM_SHUFFLE(3, 0, 1, 2));
const __m128i tmp1_2 = _mm_shuffle_epi32(tmp0_2, _MM_SHUFFLE(3, 0, 1, 2));
const __m128i tmp2_1 = _mm_packs_epi32(tmp1_1, tmp1_2);
const __m128i tmp3 = _mm_packus_epi16(tmp2_1, tmp2_1);
_mm_storel_epi64((__m128i*)dst, tmp3);
dst += 4 * 2;
}
}
void WebPInitSamplers(void) {
if (yuv_last_cpuinfo_used == VP8GetCPUInfo) return;
void VP8YuvToRgb32(const uint8_t* y, const uint8_t* u, const uint8_t* v,
uint8_t* dst) {
int n;
uint8_t tmp0[2 * 3 + 5 + 15];
uint8_t* const tmp = (uint8_t*)((uintptr_t)(tmp0 + 15) & ~15); // align
for (n = 0; n < 30; ++n) { // we directly stomp the *dst memory
VP8YuvToRgbSSE2(y[n], u[n], v[n], dst + n * 3);
}
// Last two pixels are special: we write in a tmp buffer before sending
// to dst.
VP8YuvToRgbSSE2(y[n + 0], u[n + 0], v[n + 0], tmp + 0);
VP8YuvToRgbSSE2(y[n + 1], u[n + 1], v[n + 1], tmp + 3);
memcpy(dst + n * 3, tmp, 2 * 3);
}
void VP8YuvToBgr32(const uint8_t* y, const uint8_t* u, const uint8_t* v,
uint8_t* dst) {
int n;
uint8_t tmp0[2 * 3 + 5 + 15];
uint8_t* const tmp = (uint8_t*)((uintptr_t)(tmp0 + 15) & ~15); // align
for (n = 0; n < 30; ++n) {
VP8YuvToBgrSSE2(y[n], u[n], v[n], dst + n * 3);
}
VP8YuvToBgrSSE2(y[n + 0], u[n + 0], v[n + 0], tmp + 0);
VP8YuvToBgrSSE2(y[n + 1], u[n + 1], v[n + 1], tmp + 3);
memcpy(dst + n * 3, tmp, 2 * 3);
}
#else
void VP8YUVInitSSE2(void) {}
#endif // FANCY_UPSAMPLING
WebPSamplers[MODE_RGB] = YuvToRgbRow;
WebPSamplers[MODE_RGBA] = YuvToRgbaRow;
WebPSamplers[MODE_BGR] = YuvToBgrRow;
WebPSamplers[MODE_BGRA] = YuvToBgraRow;
WebPSamplers[MODE_ARGB] = YuvToArgbRow;
WebPSamplers[MODE_RGBA_4444] = YuvToRgba4444Row;
WebPSamplers[MODE_RGB_565] = YuvToRgb565Row;
WebPSamplers[MODE_rgbA] = YuvToRgbaRow;
WebPSamplers[MODE_bgrA] = YuvToBgraRow;
WebPSamplers[MODE_Argb] = YuvToArgbRow;
WebPSamplers[MODE_rgbA_4444] = YuvToRgba4444Row;
// If defined, use CPUInfo() to overwrite some pointers with faster versions.
if (VP8GetCPUInfo != NULL) {
#if defined(WEBP_USE_SSE2)
if (VP8GetCPUInfo(kSSE2)) {
WebPInitSamplersSSE2();
}
#endif // WEBP_USE_SSE2
#if defined(WEBP_USE_MIPS32)
if (VP8GetCPUInfo(kMIPS32)) {
WebPInitSamplersMIPS32();
}
#endif // WEBP_USE_MIPS32
}
yuv_last_cpuinfo_used = VP8GetCPUInfo;
}
//-----------------------------------------------------------------------------

View File

@@ -245,6 +245,10 @@ void VP8YUVInit(void);
#if defined(WEBP_USE_SSE2)
// When the following is defined, tables are initialized statically, adding ~12k
// to the binary size. Otherwise, they are initialized at run-time (small cost).
#define WEBP_YUV_USE_SSE2_TABLES
#if defined(FANCY_UPSAMPLING)
// Process 32 pixels and store the result (24b or 32b per pixel) in *dst.
void VP8YuvToRgba32(const uint8_t* y, const uint8_t* u, const uint8_t* v,
@@ -298,12 +302,12 @@ static WEBP_INLINE int VP8RGBToY(int r, int g, int b, int rounding) {
return (luma + rounding) >> YUV_FIX; // no need to clip
}
static WEBP_INLINE int VP8_RGB_TO_U(int r, int g, int b, int rounding) {
static WEBP_INLINE int VP8RGBToU(int r, int g, int b, int rounding) {
const int u = -11058 * r - 21710 * g + 32768 * b;
return VP8ClipUV(u, rounding);
}
static WEBP_INLINE int VP8_RGB_TO_V(int r, int g, int b, int rounding) {
static WEBP_INLINE int VP8RGBToV(int r, int g, int b, int rounding) {
const int v = 32768 * r - 27439 * g - 5329 * b;
return VP8ClipUV(v, rounding);
}

100
src/dsp/yuv_mips32.c Normal file
View File

@@ -0,0 +1,100 @@
// Copyright 2014 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// MIPS version of YUV to RGB upsampling functions.
//
// Author(s): Djordje Pesut (djordje.pesut@imgtec.com)
// Jovan Zelincevic (jovan.zelincevic@imgtec.com)
#include "./dsp.h"
#if defined(WEBP_USE_MIPS32)
#include "./yuv.h"
//------------------------------------------------------------------------------
// simple point-sampling
#define ROW_FUNC(FUNC_NAME, XSTEP, R, G, B, A) \
static void FUNC_NAME(const uint8_t* y, \
const uint8_t* u, const uint8_t* v, \
uint8_t* dst, int len) { \
int i, r, g, b; \
int temp0, temp1, temp2, temp3, temp4; \
for (i = 0; i < (len >> 1); i++) { \
temp1 = kVToR * v[0]; \
temp3 = kVToG * v[0]; \
temp2 = kUToG * u[0]; \
temp4 = kUToB * u[0]; \
temp0 = kYScale * y[0]; \
temp1 += kRCst; \
temp3 -= kGCst; \
temp2 += temp3; \
temp4 += kBCst; \
r = VP8Clip8(temp0 + temp1); \
g = VP8Clip8(temp0 - temp2); \
b = VP8Clip8(temp0 + temp4); \
temp0 = kYScale * y[1]; \
dst[R] = r; \
dst[G] = g; \
dst[B] = b; \
if (A) dst[A] = 0xff; \
r = VP8Clip8(temp0 + temp1); \
g = VP8Clip8(temp0 - temp2); \
b = VP8Clip8(temp0 + temp4); \
dst[R + XSTEP] = r; \
dst[G + XSTEP] = g; \
dst[B + XSTEP] = b; \
if (A) dst[A + XSTEP] = 0xff; \
y += 2; \
++u; \
++v; \
dst += 2 * XSTEP; \
} \
if (len & 1) { \
temp1 = kVToR * v[0]; \
temp3 = kVToG * v[0]; \
temp2 = kUToG * u[0]; \
temp4 = kUToB * u[0]; \
temp0 = kYScale * y[0]; \
temp1 += kRCst; \
temp3 -= kGCst; \
temp2 += temp3; \
temp4 += kBCst; \
r = VP8Clip8(temp0 + temp1); \
g = VP8Clip8(temp0 - temp2); \
b = VP8Clip8(temp0 + temp4); \
dst[R] = r; \
dst[G] = g; \
dst[B] = b; \
if (A) dst[A] = 0xff; \
} \
}
ROW_FUNC(YuvToRgbRow, 3, 0, 1, 2, 0)
ROW_FUNC(YuvToRgbaRow, 4, 0, 1, 2, 3)
ROW_FUNC(YuvToBgrRow, 3, 2, 1, 0, 0)
ROW_FUNC(YuvToBgraRow, 4, 2, 1, 0, 3)
#undef ROW_FUNC
#endif // WEBP_USE_MIPS32
//------------------------------------------------------------------------------
extern void WebPInitSamplersMIPS32(void);
void WebPInitSamplersMIPS32(void) {
#if defined(WEBP_USE_MIPS32)
WebPSamplers[MODE_RGB] = YuvToRgbRow;
WebPSamplers[MODE_RGBA] = YuvToRgbaRow;
WebPSamplers[MODE_BGR] = YuvToBgrRow;
WebPSamplers[MODE_BGRA] = YuvToBgraRow;
#endif // WEBP_USE_MIPS32
}

322
src/dsp/yuv_sse2.c Normal file
View File

@@ -0,0 +1,322 @@
// Copyright 2014 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// YUV->RGB conversion functions
//
// Author: Skal (pascal.massimino@gmail.com)
#include "./yuv.h"
#if defined(WEBP_USE_SSE2)
#include <emmintrin.h>
#include <string.h> // for memcpy
typedef union { // handy struct for converting SSE2 registers
int32_t i32[4];
uint8_t u8[16];
__m128i m;
} VP8kCstSSE2;
#if defined(WEBP_YUV_USE_SSE2_TABLES)
#include "./yuv_tables_sse2.h"
void VP8YUVInitSSE2(void) {}
#else
static int done_sse2 = 0;
static VP8kCstSSE2 VP8kUtoRGBA[256], VP8kVtoRGBA[256], VP8kYtoRGBA[256];
void VP8YUVInitSSE2(void) {
if (!done_sse2) {
int i;
for (i = 0; i < 256; ++i) {
VP8kYtoRGBA[i].i32[0] =
VP8kYtoRGBA[i].i32[1] =
VP8kYtoRGBA[i].i32[2] = (i - 16) * kYScale + YUV_HALF2;
VP8kYtoRGBA[i].i32[3] = 0xff << YUV_FIX2;
VP8kUtoRGBA[i].i32[0] = 0;
VP8kUtoRGBA[i].i32[1] = -kUToG * (i - 128);
VP8kUtoRGBA[i].i32[2] = kUToB * (i - 128);
VP8kUtoRGBA[i].i32[3] = 0;
VP8kVtoRGBA[i].i32[0] = kVToR * (i - 128);
VP8kVtoRGBA[i].i32[1] = -kVToG * (i - 128);
VP8kVtoRGBA[i].i32[2] = 0;
VP8kVtoRGBA[i].i32[3] = 0;
}
done_sse2 = 1;
#if 0 // code used to generate 'yuv_tables_sse2.h'
printf("static const VP8kCstSSE2 VP8kYtoRGBA[256] = {\n");
for (i = 0; i < 256; ++i) {
printf(" {{0x%.8x, 0x%.8x, 0x%.8x, 0x%.8x}},\n",
VP8kYtoRGBA[i].i32[0], VP8kYtoRGBA[i].i32[1],
VP8kYtoRGBA[i].i32[2], VP8kYtoRGBA[i].i32[3]);
}
printf("};\n\n");
printf("static const VP8kCstSSE2 VP8kUtoRGBA[256] = {\n");
for (i = 0; i < 256; ++i) {
printf(" {{0, 0x%.8x, 0x%.8x, 0}},\n",
VP8kUtoRGBA[i].i32[1], VP8kUtoRGBA[i].i32[2]);
}
printf("};\n\n");
printf("static VP8kCstSSE2 VP8kVtoRGBA[256] = {\n");
for (i = 0; i < 256; ++i) {
printf(" {{0x%.8x, 0x%.8x, 0, 0}},\n",
VP8kVtoRGBA[i].i32[0], VP8kVtoRGBA[i].i32[1]);
}
printf("};\n\n");
#endif
}
}
#endif // WEBP_YUV_USE_SSE2_TABLES
//-----------------------------------------------------------------------------
static WEBP_INLINE __m128i LoadUVPart(int u, int v) {
const __m128i u_part = _mm_loadu_si128(&VP8kUtoRGBA[u].m);
const __m128i v_part = _mm_loadu_si128(&VP8kVtoRGBA[v].m);
const __m128i uv_part = _mm_add_epi32(u_part, v_part);
return uv_part;
}
static WEBP_INLINE __m128i GetRGBA32bWithUV(int y, const __m128i uv_part) {
const __m128i y_part = _mm_loadu_si128(&VP8kYtoRGBA[y].m);
const __m128i rgba1 = _mm_add_epi32(y_part, uv_part);
const __m128i rgba2 = _mm_srai_epi32(rgba1, YUV_FIX2);
return rgba2;
}
static WEBP_INLINE __m128i GetRGBA32b(int y, int u, int v) {
const __m128i uv_part = LoadUVPart(u, v);
return GetRGBA32bWithUV(y, uv_part);
}
static WEBP_INLINE void YuvToRgbSSE2(uint8_t y, uint8_t u, uint8_t v,
uint8_t* const rgb) {
const __m128i tmp0 = GetRGBA32b(y, u, v);
const __m128i tmp1 = _mm_packs_epi32(tmp0, tmp0);
const __m128i tmp2 = _mm_packus_epi16(tmp1, tmp1);
// Note: we store 8 bytes at a time, not 3 bytes! -> memory stomp
_mm_storel_epi64((__m128i*)rgb, tmp2);
}
static WEBP_INLINE void YuvToBgrSSE2(uint8_t y, uint8_t u, uint8_t v,
uint8_t* const bgr) {
const __m128i tmp0 = GetRGBA32b(y, u, v);
const __m128i tmp1 = _mm_shuffle_epi32(tmp0, _MM_SHUFFLE(3, 0, 1, 2));
const __m128i tmp2 = _mm_packs_epi32(tmp1, tmp1);
const __m128i tmp3 = _mm_packus_epi16(tmp2, tmp2);
// Note: we store 8 bytes at a time, not 3 bytes! -> memory stomp
_mm_storel_epi64((__m128i*)bgr, tmp3);
}
//-----------------------------------------------------------------------------
// Convert spans of 32 pixels to various RGB formats for the fancy upsampler.
#ifdef FANCY_UPSAMPLING
void VP8YuvToRgba32(const uint8_t* y, const uint8_t* u, const uint8_t* v,
uint8_t* dst) {
int n;
for (n = 0; n < 32; n += 4) {
const __m128i tmp0_1 = GetRGBA32b(y[n + 0], u[n + 0], v[n + 0]);
const __m128i tmp0_2 = GetRGBA32b(y[n + 1], u[n + 1], v[n + 1]);
const __m128i tmp0_3 = GetRGBA32b(y[n + 2], u[n + 2], v[n + 2]);
const __m128i tmp0_4 = GetRGBA32b(y[n + 3], u[n + 3], v[n + 3]);
const __m128i tmp1_1 = _mm_packs_epi32(tmp0_1, tmp0_2);
const __m128i tmp1_2 = _mm_packs_epi32(tmp0_3, tmp0_4);
const __m128i tmp2 = _mm_packus_epi16(tmp1_1, tmp1_2);
_mm_storeu_si128((__m128i*)dst, tmp2);
dst += 4 * 4;
}
}
void VP8YuvToBgra32(const uint8_t* y, const uint8_t* u, const uint8_t* v,
uint8_t* dst) {
int n;
for (n = 0; n < 32; n += 2) {
const __m128i tmp0_1 = GetRGBA32b(y[n + 0], u[n + 0], v[n + 0]);
const __m128i tmp0_2 = GetRGBA32b(y[n + 1], u[n + 1], v[n + 1]);
const __m128i tmp1_1 = _mm_shuffle_epi32(tmp0_1, _MM_SHUFFLE(3, 0, 1, 2));
const __m128i tmp1_2 = _mm_shuffle_epi32(tmp0_2, _MM_SHUFFLE(3, 0, 1, 2));
const __m128i tmp2_1 = _mm_packs_epi32(tmp1_1, tmp1_2);
const __m128i tmp3 = _mm_packus_epi16(tmp2_1, tmp2_1);
_mm_storel_epi64((__m128i*)dst, tmp3);
dst += 4 * 2;
}
}
void VP8YuvToRgb32(const uint8_t* y, const uint8_t* u, const uint8_t* v,
uint8_t* dst) {
int n;
uint8_t tmp0[2 * 3 + 5 + 15];
uint8_t* const tmp = (uint8_t*)((uintptr_t)(tmp0 + 15) & ~15); // align
for (n = 0; n < 30; ++n) { // we directly stomp the *dst memory
YuvToRgbSSE2(y[n], u[n], v[n], dst + n * 3);
}
// Last two pixels are special: we write in a tmp buffer before sending
// to dst.
YuvToRgbSSE2(y[n + 0], u[n + 0], v[n + 0], tmp + 0);
YuvToRgbSSE2(y[n + 1], u[n + 1], v[n + 1], tmp + 3);
memcpy(dst + n * 3, tmp, 2 * 3);
}
void VP8YuvToBgr32(const uint8_t* y, const uint8_t* u, const uint8_t* v,
uint8_t* dst) {
int n;
uint8_t tmp0[2 * 3 + 5 + 15];
uint8_t* const tmp = (uint8_t*)((uintptr_t)(tmp0 + 15) & ~15); // align
for (n = 0; n < 30; ++n) {
YuvToBgrSSE2(y[n], u[n], v[n], dst + n * 3);
}
YuvToBgrSSE2(y[n + 0], u[n + 0], v[n + 0], tmp + 0);
YuvToBgrSSE2(y[n + 1], u[n + 1], v[n + 1], tmp + 3);
memcpy(dst + n * 3, tmp, 2 * 3);
}
#endif // FANCY_UPSAMPLING
//-----------------------------------------------------------------------------
// Arbitrary-length row conversion functions
static void YuvToRgbaRowSSE2(const uint8_t* y,
const uint8_t* u, const uint8_t* v,
uint8_t* dst, int len) {
int n;
for (n = 0; n + 4 <= len; n += 4) {
const __m128i uv_0 = LoadUVPart(u[0], v[0]);
const __m128i uv_1 = LoadUVPart(u[1], v[1]);
const __m128i tmp0_1 = GetRGBA32bWithUV(y[0], uv_0);
const __m128i tmp0_2 = GetRGBA32bWithUV(y[1], uv_0);
const __m128i tmp0_3 = GetRGBA32bWithUV(y[2], uv_1);
const __m128i tmp0_4 = GetRGBA32bWithUV(y[3], uv_1);
const __m128i tmp1_1 = _mm_packs_epi32(tmp0_1, tmp0_2);
const __m128i tmp1_2 = _mm_packs_epi32(tmp0_3, tmp0_4);
const __m128i tmp2 = _mm_packus_epi16(tmp1_1, tmp1_2);
_mm_storeu_si128((__m128i*)dst, tmp2);
dst += 4 * 4;
y += 4;
u += 2;
v += 2;
}
// Finish off
while (n < len) {
VP8YuvToRgba(y[0], u[0], v[0], dst);
dst += 4;
++y;
u += (n & 1);
v += (n & 1);
++n;
}
}
static void YuvToBgraRowSSE2(const uint8_t* y,
const uint8_t* u, const uint8_t* v,
uint8_t* dst, int len) {
int n;
for (n = 0; n + 2 <= len; n += 2) {
const __m128i uv_0 = LoadUVPart(u[0], v[0]);
const __m128i tmp0_1 = GetRGBA32bWithUV(y[0], uv_0);
const __m128i tmp0_2 = GetRGBA32bWithUV(y[1], uv_0);
const __m128i tmp1_1 = _mm_shuffle_epi32(tmp0_1, _MM_SHUFFLE(3, 0, 1, 2));
const __m128i tmp1_2 = _mm_shuffle_epi32(tmp0_2, _MM_SHUFFLE(3, 0, 1, 2));
const __m128i tmp2_1 = _mm_packs_epi32(tmp1_1, tmp1_2);
const __m128i tmp3 = _mm_packus_epi16(tmp2_1, tmp2_1);
_mm_storel_epi64((__m128i*)dst, tmp3);
dst += 4 * 2;
y += 2;
++u;
++v;
}
// Finish off
if (len & 1) {
VP8YuvToBgra(y[0], u[0], v[0], dst);
}
}
static void YuvToArgbRowSSE2(const uint8_t* y,
const uint8_t* u, const uint8_t* v,
uint8_t* dst, int len) {
int n;
for (n = 0; n + 2 <= len; n += 2) {
const __m128i uv_0 = LoadUVPart(u[0], v[0]);
const __m128i tmp0_1 = GetRGBA32bWithUV(y[0], uv_0);
const __m128i tmp0_2 = GetRGBA32bWithUV(y[1], uv_0);
const __m128i tmp1_1 = _mm_shuffle_epi32(tmp0_1, _MM_SHUFFLE(2, 1, 0, 3));
const __m128i tmp1_2 = _mm_shuffle_epi32(tmp0_2, _MM_SHUFFLE(2, 1, 0, 3));
const __m128i tmp2_1 = _mm_packs_epi32(tmp1_1, tmp1_2);
const __m128i tmp3 = _mm_packus_epi16(tmp2_1, tmp2_1);
_mm_storel_epi64((__m128i*)dst, tmp3);
dst += 4 * 2;
y += 2;
++u;
++v;
}
// Finish off
if (len & 1) {
VP8YuvToArgb(y[0], u[0], v[0], dst);
}
}
static void YuvToRgbRowSSE2(const uint8_t* y,
const uint8_t* u, const uint8_t* v,
uint8_t* dst, int len) {
int n;
for (n = 0; n + 2 < len; ++n) { // we directly stomp the *dst memory
YuvToRgbSSE2(y[0], u[0], v[0], dst); // stomps 8 bytes
dst += 3;
++y;
u += (n & 1);
v += (n & 1);
}
VP8YuvToRgb(y[0], u[0], v[0], dst);
if (len > 1) {
VP8YuvToRgb(y[1], u[n & 1], v[n & 1], dst + 3);
}
}
static void YuvToBgrRowSSE2(const uint8_t* y,
const uint8_t* u, const uint8_t* v,
uint8_t* dst, int len) {
int n;
for (n = 0; n + 2 < len; ++n) { // we directly stomp the *dst memory
YuvToBgrSSE2(y[0], u[0], v[0], dst); // stomps 8 bytes
dst += 3;
++y;
u += (n & 1);
v += (n & 1);
}
VP8YuvToBgr(y[0], u[0], v[0], dst + 0);
if (len > 1) {
VP8YuvToBgr(y[1], u[n & 1], v[n & 1], dst + 3);
}
}
#endif // WEBP_USE_SSE2
//------------------------------------------------------------------------------
// Entry point
extern void WebPInitSamplersSSE2(void);
void WebPInitSamplersSSE2(void) {
#if defined(WEBP_USE_SSE2)
WebPSamplers[MODE_RGB] = YuvToRgbRowSSE2;
WebPSamplers[MODE_RGBA] = YuvToRgbaRowSSE2;
WebPSamplers[MODE_BGR] = YuvToBgrRowSSE2;
WebPSamplers[MODE_BGRA] = YuvToBgraRowSSE2;
WebPSamplers[MODE_ARGB] = YuvToArgbRowSSE2;
#endif // WEBP_USE_SSE2
}

536
src/dsp/yuv_tables_sse2.h Normal file
View File

@@ -0,0 +1,536 @@
// Copyright 2014 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// SSE2 tables for YUV->RGB conversion (12kB overall)
//
// Author: Skal (pascal.massimino@gmail.com)
// This file is not compiled, but #include'd directly from yuv.c
// Only used if WEBP_YUV_USE_SSE2_TABLES is defined.
static const VP8kCstSSE2 VP8kYtoRGBA[256] = {
{{0xfffb77b0, 0xfffb77b0, 0xfffb77b0, 0x003fc000}},
{{0xfffbc235, 0xfffbc235, 0xfffbc235, 0x003fc000}},
{{0xfffc0cba, 0xfffc0cba, 0xfffc0cba, 0x003fc000}},
{{0xfffc573f, 0xfffc573f, 0xfffc573f, 0x003fc000}},
{{0xfffca1c4, 0xfffca1c4, 0xfffca1c4, 0x003fc000}},
{{0xfffcec49, 0xfffcec49, 0xfffcec49, 0x003fc000}},
{{0xfffd36ce, 0xfffd36ce, 0xfffd36ce, 0x003fc000}},
{{0xfffd8153, 0xfffd8153, 0xfffd8153, 0x003fc000}},
{{0xfffdcbd8, 0xfffdcbd8, 0xfffdcbd8, 0x003fc000}},
{{0xfffe165d, 0xfffe165d, 0xfffe165d, 0x003fc000}},
{{0xfffe60e2, 0xfffe60e2, 0xfffe60e2, 0x003fc000}},
{{0xfffeab67, 0xfffeab67, 0xfffeab67, 0x003fc000}},
{{0xfffef5ec, 0xfffef5ec, 0xfffef5ec, 0x003fc000}},
{{0xffff4071, 0xffff4071, 0xffff4071, 0x003fc000}},
{{0xffff8af6, 0xffff8af6, 0xffff8af6, 0x003fc000}},
{{0xffffd57b, 0xffffd57b, 0xffffd57b, 0x003fc000}},
{{0x00002000, 0x00002000, 0x00002000, 0x003fc000}},
{{0x00006a85, 0x00006a85, 0x00006a85, 0x003fc000}},
{{0x0000b50a, 0x0000b50a, 0x0000b50a, 0x003fc000}},
{{0x0000ff8f, 0x0000ff8f, 0x0000ff8f, 0x003fc000}},
{{0x00014a14, 0x00014a14, 0x00014a14, 0x003fc000}},
{{0x00019499, 0x00019499, 0x00019499, 0x003fc000}},
{{0x0001df1e, 0x0001df1e, 0x0001df1e, 0x003fc000}},
{{0x000229a3, 0x000229a3, 0x000229a3, 0x003fc000}},
{{0x00027428, 0x00027428, 0x00027428, 0x003fc000}},
{{0x0002bead, 0x0002bead, 0x0002bead, 0x003fc000}},
{{0x00030932, 0x00030932, 0x00030932, 0x003fc000}},
{{0x000353b7, 0x000353b7, 0x000353b7, 0x003fc000}},
{{0x00039e3c, 0x00039e3c, 0x00039e3c, 0x003fc000}},
{{0x0003e8c1, 0x0003e8c1, 0x0003e8c1, 0x003fc000}},
{{0x00043346, 0x00043346, 0x00043346, 0x003fc000}},
{{0x00047dcb, 0x00047dcb, 0x00047dcb, 0x003fc000}},
{{0x0004c850, 0x0004c850, 0x0004c850, 0x003fc000}},
{{0x000512d5, 0x000512d5, 0x000512d5, 0x003fc000}},
{{0x00055d5a, 0x00055d5a, 0x00055d5a, 0x003fc000}},
{{0x0005a7df, 0x0005a7df, 0x0005a7df, 0x003fc000}},
{{0x0005f264, 0x0005f264, 0x0005f264, 0x003fc000}},
{{0x00063ce9, 0x00063ce9, 0x00063ce9, 0x003fc000}},
{{0x0006876e, 0x0006876e, 0x0006876e, 0x003fc000}},
{{0x0006d1f3, 0x0006d1f3, 0x0006d1f3, 0x003fc000}},
{{0x00071c78, 0x00071c78, 0x00071c78, 0x003fc000}},
{{0x000766fd, 0x000766fd, 0x000766fd, 0x003fc000}},
{{0x0007b182, 0x0007b182, 0x0007b182, 0x003fc000}},
{{0x0007fc07, 0x0007fc07, 0x0007fc07, 0x003fc000}},
{{0x0008468c, 0x0008468c, 0x0008468c, 0x003fc000}},
{{0x00089111, 0x00089111, 0x00089111, 0x003fc000}},
{{0x0008db96, 0x0008db96, 0x0008db96, 0x003fc000}},
{{0x0009261b, 0x0009261b, 0x0009261b, 0x003fc000}},
{{0x000970a0, 0x000970a0, 0x000970a0, 0x003fc000}},
{{0x0009bb25, 0x0009bb25, 0x0009bb25, 0x003fc000}},
{{0x000a05aa, 0x000a05aa, 0x000a05aa, 0x003fc000}},
{{0x000a502f, 0x000a502f, 0x000a502f, 0x003fc000}},
{{0x000a9ab4, 0x000a9ab4, 0x000a9ab4, 0x003fc000}},
{{0x000ae539, 0x000ae539, 0x000ae539, 0x003fc000}},
{{0x000b2fbe, 0x000b2fbe, 0x000b2fbe, 0x003fc000}},
{{0x000b7a43, 0x000b7a43, 0x000b7a43, 0x003fc000}},
{{0x000bc4c8, 0x000bc4c8, 0x000bc4c8, 0x003fc000}},
{{0x000c0f4d, 0x000c0f4d, 0x000c0f4d, 0x003fc000}},
{{0x000c59d2, 0x000c59d2, 0x000c59d2, 0x003fc000}},
{{0x000ca457, 0x000ca457, 0x000ca457, 0x003fc000}},
{{0x000ceedc, 0x000ceedc, 0x000ceedc, 0x003fc000}},
{{0x000d3961, 0x000d3961, 0x000d3961, 0x003fc000}},
{{0x000d83e6, 0x000d83e6, 0x000d83e6, 0x003fc000}},
{{0x000dce6b, 0x000dce6b, 0x000dce6b, 0x003fc000}},
{{0x000e18f0, 0x000e18f0, 0x000e18f0, 0x003fc000}},
{{0x000e6375, 0x000e6375, 0x000e6375, 0x003fc000}},
{{0x000eadfa, 0x000eadfa, 0x000eadfa, 0x003fc000}},
{{0x000ef87f, 0x000ef87f, 0x000ef87f, 0x003fc000}},
{{0x000f4304, 0x000f4304, 0x000f4304, 0x003fc000}},
{{0x000f8d89, 0x000f8d89, 0x000f8d89, 0x003fc000}},
{{0x000fd80e, 0x000fd80e, 0x000fd80e, 0x003fc000}},
{{0x00102293, 0x00102293, 0x00102293, 0x003fc000}},
{{0x00106d18, 0x00106d18, 0x00106d18, 0x003fc000}},
{{0x0010b79d, 0x0010b79d, 0x0010b79d, 0x003fc000}},
{{0x00110222, 0x00110222, 0x00110222, 0x003fc000}},
{{0x00114ca7, 0x00114ca7, 0x00114ca7, 0x003fc000}},
{{0x0011972c, 0x0011972c, 0x0011972c, 0x003fc000}},
{{0x0011e1b1, 0x0011e1b1, 0x0011e1b1, 0x003fc000}},
{{0x00122c36, 0x00122c36, 0x00122c36, 0x003fc000}},
{{0x001276bb, 0x001276bb, 0x001276bb, 0x003fc000}},
{{0x0012c140, 0x0012c140, 0x0012c140, 0x003fc000}},
{{0x00130bc5, 0x00130bc5, 0x00130bc5, 0x003fc000}},
{{0x0013564a, 0x0013564a, 0x0013564a, 0x003fc000}},
{{0x0013a0cf, 0x0013a0cf, 0x0013a0cf, 0x003fc000}},
{{0x0013eb54, 0x0013eb54, 0x0013eb54, 0x003fc000}},
{{0x001435d9, 0x001435d9, 0x001435d9, 0x003fc000}},
{{0x0014805e, 0x0014805e, 0x0014805e, 0x003fc000}},
{{0x0014cae3, 0x0014cae3, 0x0014cae3, 0x003fc000}},
{{0x00151568, 0x00151568, 0x00151568, 0x003fc000}},
{{0x00155fed, 0x00155fed, 0x00155fed, 0x003fc000}},
{{0x0015aa72, 0x0015aa72, 0x0015aa72, 0x003fc000}},
{{0x0015f4f7, 0x0015f4f7, 0x0015f4f7, 0x003fc000}},
{{0x00163f7c, 0x00163f7c, 0x00163f7c, 0x003fc000}},
{{0x00168a01, 0x00168a01, 0x00168a01, 0x003fc000}},
{{0x0016d486, 0x0016d486, 0x0016d486, 0x003fc000}},
{{0x00171f0b, 0x00171f0b, 0x00171f0b, 0x003fc000}},
{{0x00176990, 0x00176990, 0x00176990, 0x003fc000}},
{{0x0017b415, 0x0017b415, 0x0017b415, 0x003fc000}},
{{0x0017fe9a, 0x0017fe9a, 0x0017fe9a, 0x003fc000}},
{{0x0018491f, 0x0018491f, 0x0018491f, 0x003fc000}},
{{0x001893a4, 0x001893a4, 0x001893a4, 0x003fc000}},
{{0x0018de29, 0x0018de29, 0x0018de29, 0x003fc000}},
{{0x001928ae, 0x001928ae, 0x001928ae, 0x003fc000}},
{{0x00197333, 0x00197333, 0x00197333, 0x003fc000}},
{{0x0019bdb8, 0x0019bdb8, 0x0019bdb8, 0x003fc000}},
{{0x001a083d, 0x001a083d, 0x001a083d, 0x003fc000}},
{{0x001a52c2, 0x001a52c2, 0x001a52c2, 0x003fc000}},
{{0x001a9d47, 0x001a9d47, 0x001a9d47, 0x003fc000}},
{{0x001ae7cc, 0x001ae7cc, 0x001ae7cc, 0x003fc000}},
{{0x001b3251, 0x001b3251, 0x001b3251, 0x003fc000}},
{{0x001b7cd6, 0x001b7cd6, 0x001b7cd6, 0x003fc000}},
{{0x001bc75b, 0x001bc75b, 0x001bc75b, 0x003fc000}},
{{0x001c11e0, 0x001c11e0, 0x001c11e0, 0x003fc000}},
{{0x001c5c65, 0x001c5c65, 0x001c5c65, 0x003fc000}},
{{0x001ca6ea, 0x001ca6ea, 0x001ca6ea, 0x003fc000}},
{{0x001cf16f, 0x001cf16f, 0x001cf16f, 0x003fc000}},
{{0x001d3bf4, 0x001d3bf4, 0x001d3bf4, 0x003fc000}},
{{0x001d8679, 0x001d8679, 0x001d8679, 0x003fc000}},
{{0x001dd0fe, 0x001dd0fe, 0x001dd0fe, 0x003fc000}},
{{0x001e1b83, 0x001e1b83, 0x001e1b83, 0x003fc000}},
{{0x001e6608, 0x001e6608, 0x001e6608, 0x003fc000}},
{{0x001eb08d, 0x001eb08d, 0x001eb08d, 0x003fc000}},
{{0x001efb12, 0x001efb12, 0x001efb12, 0x003fc000}},
{{0x001f4597, 0x001f4597, 0x001f4597, 0x003fc000}},
{{0x001f901c, 0x001f901c, 0x001f901c, 0x003fc000}},
{{0x001fdaa1, 0x001fdaa1, 0x001fdaa1, 0x003fc000}},
{{0x00202526, 0x00202526, 0x00202526, 0x003fc000}},
{{0x00206fab, 0x00206fab, 0x00206fab, 0x003fc000}},
{{0x0020ba30, 0x0020ba30, 0x0020ba30, 0x003fc000}},
{{0x002104b5, 0x002104b5, 0x002104b5, 0x003fc000}},
{{0x00214f3a, 0x00214f3a, 0x00214f3a, 0x003fc000}},
{{0x002199bf, 0x002199bf, 0x002199bf, 0x003fc000}},
{{0x0021e444, 0x0021e444, 0x0021e444, 0x003fc000}},
{{0x00222ec9, 0x00222ec9, 0x00222ec9, 0x003fc000}},
{{0x0022794e, 0x0022794e, 0x0022794e, 0x003fc000}},
{{0x0022c3d3, 0x0022c3d3, 0x0022c3d3, 0x003fc000}},
{{0x00230e58, 0x00230e58, 0x00230e58, 0x003fc000}},
{{0x002358dd, 0x002358dd, 0x002358dd, 0x003fc000}},
{{0x0023a362, 0x0023a362, 0x0023a362, 0x003fc000}},
{{0x0023ede7, 0x0023ede7, 0x0023ede7, 0x003fc000}},
{{0x0024386c, 0x0024386c, 0x0024386c, 0x003fc000}},
{{0x002482f1, 0x002482f1, 0x002482f1, 0x003fc000}},
{{0x0024cd76, 0x0024cd76, 0x0024cd76, 0x003fc000}},
{{0x002517fb, 0x002517fb, 0x002517fb, 0x003fc000}},
{{0x00256280, 0x00256280, 0x00256280, 0x003fc000}},
{{0x0025ad05, 0x0025ad05, 0x0025ad05, 0x003fc000}},
{{0x0025f78a, 0x0025f78a, 0x0025f78a, 0x003fc000}},
{{0x0026420f, 0x0026420f, 0x0026420f, 0x003fc000}},
{{0x00268c94, 0x00268c94, 0x00268c94, 0x003fc000}},
{{0x0026d719, 0x0026d719, 0x0026d719, 0x003fc000}},
{{0x0027219e, 0x0027219e, 0x0027219e, 0x003fc000}},
{{0x00276c23, 0x00276c23, 0x00276c23, 0x003fc000}},
{{0x0027b6a8, 0x0027b6a8, 0x0027b6a8, 0x003fc000}},
{{0x0028012d, 0x0028012d, 0x0028012d, 0x003fc000}},
{{0x00284bb2, 0x00284bb2, 0x00284bb2, 0x003fc000}},
{{0x00289637, 0x00289637, 0x00289637, 0x003fc000}},
{{0x0028e0bc, 0x0028e0bc, 0x0028e0bc, 0x003fc000}},
{{0x00292b41, 0x00292b41, 0x00292b41, 0x003fc000}},
{{0x002975c6, 0x002975c6, 0x002975c6, 0x003fc000}},
{{0x0029c04b, 0x0029c04b, 0x0029c04b, 0x003fc000}},
{{0x002a0ad0, 0x002a0ad0, 0x002a0ad0, 0x003fc000}},
{{0x002a5555, 0x002a5555, 0x002a5555, 0x003fc000}},
{{0x002a9fda, 0x002a9fda, 0x002a9fda, 0x003fc000}},
{{0x002aea5f, 0x002aea5f, 0x002aea5f, 0x003fc000}},
{{0x002b34e4, 0x002b34e4, 0x002b34e4, 0x003fc000}},
{{0x002b7f69, 0x002b7f69, 0x002b7f69, 0x003fc000}},
{{0x002bc9ee, 0x002bc9ee, 0x002bc9ee, 0x003fc000}},
{{0x002c1473, 0x002c1473, 0x002c1473, 0x003fc000}},
{{0x002c5ef8, 0x002c5ef8, 0x002c5ef8, 0x003fc000}},
{{0x002ca97d, 0x002ca97d, 0x002ca97d, 0x003fc000}},
{{0x002cf402, 0x002cf402, 0x002cf402, 0x003fc000}},
{{0x002d3e87, 0x002d3e87, 0x002d3e87, 0x003fc000}},
{{0x002d890c, 0x002d890c, 0x002d890c, 0x003fc000}},
{{0x002dd391, 0x002dd391, 0x002dd391, 0x003fc000}},
{{0x002e1e16, 0x002e1e16, 0x002e1e16, 0x003fc000}},
{{0x002e689b, 0x002e689b, 0x002e689b, 0x003fc000}},
{{0x002eb320, 0x002eb320, 0x002eb320, 0x003fc000}},
{{0x002efda5, 0x002efda5, 0x002efda5, 0x003fc000}},
{{0x002f482a, 0x002f482a, 0x002f482a, 0x003fc000}},
{{0x002f92af, 0x002f92af, 0x002f92af, 0x003fc000}},
{{0x002fdd34, 0x002fdd34, 0x002fdd34, 0x003fc000}},
{{0x003027b9, 0x003027b9, 0x003027b9, 0x003fc000}},
{{0x0030723e, 0x0030723e, 0x0030723e, 0x003fc000}},
{{0x0030bcc3, 0x0030bcc3, 0x0030bcc3, 0x003fc000}},
{{0x00310748, 0x00310748, 0x00310748, 0x003fc000}},
{{0x003151cd, 0x003151cd, 0x003151cd, 0x003fc000}},
{{0x00319c52, 0x00319c52, 0x00319c52, 0x003fc000}},
{{0x0031e6d7, 0x0031e6d7, 0x0031e6d7, 0x003fc000}},
{{0x0032315c, 0x0032315c, 0x0032315c, 0x003fc000}},
{{0x00327be1, 0x00327be1, 0x00327be1, 0x003fc000}},
{{0x0032c666, 0x0032c666, 0x0032c666, 0x003fc000}},
{{0x003310eb, 0x003310eb, 0x003310eb, 0x003fc000}},
{{0x00335b70, 0x00335b70, 0x00335b70, 0x003fc000}},
{{0x0033a5f5, 0x0033a5f5, 0x0033a5f5, 0x003fc000}},
{{0x0033f07a, 0x0033f07a, 0x0033f07a, 0x003fc000}},
{{0x00343aff, 0x00343aff, 0x00343aff, 0x003fc000}},
{{0x00348584, 0x00348584, 0x00348584, 0x003fc000}},
{{0x0034d009, 0x0034d009, 0x0034d009, 0x003fc000}},
{{0x00351a8e, 0x00351a8e, 0x00351a8e, 0x003fc000}},
{{0x00356513, 0x00356513, 0x00356513, 0x003fc000}},
{{0x0035af98, 0x0035af98, 0x0035af98, 0x003fc000}},
{{0x0035fa1d, 0x0035fa1d, 0x0035fa1d, 0x003fc000}},
{{0x003644a2, 0x003644a2, 0x003644a2, 0x003fc000}},
{{0x00368f27, 0x00368f27, 0x00368f27, 0x003fc000}},
{{0x0036d9ac, 0x0036d9ac, 0x0036d9ac, 0x003fc000}},
{{0x00372431, 0x00372431, 0x00372431, 0x003fc000}},
{{0x00376eb6, 0x00376eb6, 0x00376eb6, 0x003fc000}},
{{0x0037b93b, 0x0037b93b, 0x0037b93b, 0x003fc000}},
{{0x003803c0, 0x003803c0, 0x003803c0, 0x003fc000}},
{{0x00384e45, 0x00384e45, 0x00384e45, 0x003fc000}},
{{0x003898ca, 0x003898ca, 0x003898ca, 0x003fc000}},
{{0x0038e34f, 0x0038e34f, 0x0038e34f, 0x003fc000}},
{{0x00392dd4, 0x00392dd4, 0x00392dd4, 0x003fc000}},
{{0x00397859, 0x00397859, 0x00397859, 0x003fc000}},
{{0x0039c2de, 0x0039c2de, 0x0039c2de, 0x003fc000}},
{{0x003a0d63, 0x003a0d63, 0x003a0d63, 0x003fc000}},
{{0x003a57e8, 0x003a57e8, 0x003a57e8, 0x003fc000}},
{{0x003aa26d, 0x003aa26d, 0x003aa26d, 0x003fc000}},
{{0x003aecf2, 0x003aecf2, 0x003aecf2, 0x003fc000}},
{{0x003b3777, 0x003b3777, 0x003b3777, 0x003fc000}},
{{0x003b81fc, 0x003b81fc, 0x003b81fc, 0x003fc000}},
{{0x003bcc81, 0x003bcc81, 0x003bcc81, 0x003fc000}},
{{0x003c1706, 0x003c1706, 0x003c1706, 0x003fc000}},
{{0x003c618b, 0x003c618b, 0x003c618b, 0x003fc000}},
{{0x003cac10, 0x003cac10, 0x003cac10, 0x003fc000}},
{{0x003cf695, 0x003cf695, 0x003cf695, 0x003fc000}},
{{0x003d411a, 0x003d411a, 0x003d411a, 0x003fc000}},
{{0x003d8b9f, 0x003d8b9f, 0x003d8b9f, 0x003fc000}},
{{0x003dd624, 0x003dd624, 0x003dd624, 0x003fc000}},
{{0x003e20a9, 0x003e20a9, 0x003e20a9, 0x003fc000}},
{{0x003e6b2e, 0x003e6b2e, 0x003e6b2e, 0x003fc000}},
{{0x003eb5b3, 0x003eb5b3, 0x003eb5b3, 0x003fc000}},
{{0x003f0038, 0x003f0038, 0x003f0038, 0x003fc000}},
{{0x003f4abd, 0x003f4abd, 0x003f4abd, 0x003fc000}},
{{0x003f9542, 0x003f9542, 0x003f9542, 0x003fc000}},
{{0x003fdfc7, 0x003fdfc7, 0x003fdfc7, 0x003fc000}},
{{0x00402a4c, 0x00402a4c, 0x00402a4c, 0x003fc000}},
{{0x004074d1, 0x004074d1, 0x004074d1, 0x003fc000}},
{{0x0040bf56, 0x0040bf56, 0x0040bf56, 0x003fc000}},
{{0x004109db, 0x004109db, 0x004109db, 0x003fc000}},
{{0x00415460, 0x00415460, 0x00415460, 0x003fc000}},
{{0x00419ee5, 0x00419ee5, 0x00419ee5, 0x003fc000}},
{{0x0041e96a, 0x0041e96a, 0x0041e96a, 0x003fc000}},
{{0x004233ef, 0x004233ef, 0x004233ef, 0x003fc000}},
{{0x00427e74, 0x00427e74, 0x00427e74, 0x003fc000}},
{{0x0042c8f9, 0x0042c8f9, 0x0042c8f9, 0x003fc000}},
{{0x0043137e, 0x0043137e, 0x0043137e, 0x003fc000}},
{{0x00435e03, 0x00435e03, 0x00435e03, 0x003fc000}},
{{0x0043a888, 0x0043a888, 0x0043a888, 0x003fc000}},
{{0x0043f30d, 0x0043f30d, 0x0043f30d, 0x003fc000}},
{{0x00443d92, 0x00443d92, 0x00443d92, 0x003fc000}},
{{0x00448817, 0x00448817, 0x00448817, 0x003fc000}},
{{0x0044d29c, 0x0044d29c, 0x0044d29c, 0x003fc000}},
{{0x00451d21, 0x00451d21, 0x00451d21, 0x003fc000}},
{{0x004567a6, 0x004567a6, 0x004567a6, 0x003fc000}},
{{0x0045b22b, 0x0045b22b, 0x0045b22b, 0x003fc000}}
};
static const VP8kCstSSE2 VP8kUtoRGBA[256] = {
{{0, 0x000c8980, 0xffbf7300, 0}}, {{0, 0x000c706d, 0xffbff41a, 0}},
{{0, 0x000c575a, 0xffc07534, 0}}, {{0, 0x000c3e47, 0xffc0f64e, 0}},
{{0, 0x000c2534, 0xffc17768, 0}}, {{0, 0x000c0c21, 0xffc1f882, 0}},
{{0, 0x000bf30e, 0xffc2799c, 0}}, {{0, 0x000bd9fb, 0xffc2fab6, 0}},
{{0, 0x000bc0e8, 0xffc37bd0, 0}}, {{0, 0x000ba7d5, 0xffc3fcea, 0}},
{{0, 0x000b8ec2, 0xffc47e04, 0}}, {{0, 0x000b75af, 0xffc4ff1e, 0}},
{{0, 0x000b5c9c, 0xffc58038, 0}}, {{0, 0x000b4389, 0xffc60152, 0}},
{{0, 0x000b2a76, 0xffc6826c, 0}}, {{0, 0x000b1163, 0xffc70386, 0}},
{{0, 0x000af850, 0xffc784a0, 0}}, {{0, 0x000adf3d, 0xffc805ba, 0}},
{{0, 0x000ac62a, 0xffc886d4, 0}}, {{0, 0x000aad17, 0xffc907ee, 0}},
{{0, 0x000a9404, 0xffc98908, 0}}, {{0, 0x000a7af1, 0xffca0a22, 0}},
{{0, 0x000a61de, 0xffca8b3c, 0}}, {{0, 0x000a48cb, 0xffcb0c56, 0}},
{{0, 0x000a2fb8, 0xffcb8d70, 0}}, {{0, 0x000a16a5, 0xffcc0e8a, 0}},
{{0, 0x0009fd92, 0xffcc8fa4, 0}}, {{0, 0x0009e47f, 0xffcd10be, 0}},
{{0, 0x0009cb6c, 0xffcd91d8, 0}}, {{0, 0x0009b259, 0xffce12f2, 0}},
{{0, 0x00099946, 0xffce940c, 0}}, {{0, 0x00098033, 0xffcf1526, 0}},
{{0, 0x00096720, 0xffcf9640, 0}}, {{0, 0x00094e0d, 0xffd0175a, 0}},
{{0, 0x000934fa, 0xffd09874, 0}}, {{0, 0x00091be7, 0xffd1198e, 0}},
{{0, 0x000902d4, 0xffd19aa8, 0}}, {{0, 0x0008e9c1, 0xffd21bc2, 0}},
{{0, 0x0008d0ae, 0xffd29cdc, 0}}, {{0, 0x0008b79b, 0xffd31df6, 0}},
{{0, 0x00089e88, 0xffd39f10, 0}}, {{0, 0x00088575, 0xffd4202a, 0}},
{{0, 0x00086c62, 0xffd4a144, 0}}, {{0, 0x0008534f, 0xffd5225e, 0}},
{{0, 0x00083a3c, 0xffd5a378, 0}}, {{0, 0x00082129, 0xffd62492, 0}},
{{0, 0x00080816, 0xffd6a5ac, 0}}, {{0, 0x0007ef03, 0xffd726c6, 0}},
{{0, 0x0007d5f0, 0xffd7a7e0, 0}}, {{0, 0x0007bcdd, 0xffd828fa, 0}},
{{0, 0x0007a3ca, 0xffd8aa14, 0}}, {{0, 0x00078ab7, 0xffd92b2e, 0}},
{{0, 0x000771a4, 0xffd9ac48, 0}}, {{0, 0x00075891, 0xffda2d62, 0}},
{{0, 0x00073f7e, 0xffdaae7c, 0}}, {{0, 0x0007266b, 0xffdb2f96, 0}},
{{0, 0x00070d58, 0xffdbb0b0, 0}}, {{0, 0x0006f445, 0xffdc31ca, 0}},
{{0, 0x0006db32, 0xffdcb2e4, 0}}, {{0, 0x0006c21f, 0xffdd33fe, 0}},
{{0, 0x0006a90c, 0xffddb518, 0}}, {{0, 0x00068ff9, 0xffde3632, 0}},
{{0, 0x000676e6, 0xffdeb74c, 0}}, {{0, 0x00065dd3, 0xffdf3866, 0}},
{{0, 0x000644c0, 0xffdfb980, 0}}, {{0, 0x00062bad, 0xffe03a9a, 0}},
{{0, 0x0006129a, 0xffe0bbb4, 0}}, {{0, 0x0005f987, 0xffe13cce, 0}},
{{0, 0x0005e074, 0xffe1bde8, 0}}, {{0, 0x0005c761, 0xffe23f02, 0}},
{{0, 0x0005ae4e, 0xffe2c01c, 0}}, {{0, 0x0005953b, 0xffe34136, 0}},
{{0, 0x00057c28, 0xffe3c250, 0}}, {{0, 0x00056315, 0xffe4436a, 0}},
{{0, 0x00054a02, 0xffe4c484, 0}}, {{0, 0x000530ef, 0xffe5459e, 0}},
{{0, 0x000517dc, 0xffe5c6b8, 0}}, {{0, 0x0004fec9, 0xffe647d2, 0}},
{{0, 0x0004e5b6, 0xffe6c8ec, 0}}, {{0, 0x0004cca3, 0xffe74a06, 0}},
{{0, 0x0004b390, 0xffe7cb20, 0}}, {{0, 0x00049a7d, 0xffe84c3a, 0}},
{{0, 0x0004816a, 0xffe8cd54, 0}}, {{0, 0x00046857, 0xffe94e6e, 0}},
{{0, 0x00044f44, 0xffe9cf88, 0}}, {{0, 0x00043631, 0xffea50a2, 0}},
{{0, 0x00041d1e, 0xffead1bc, 0}}, {{0, 0x0004040b, 0xffeb52d6, 0}},
{{0, 0x0003eaf8, 0xffebd3f0, 0}}, {{0, 0x0003d1e5, 0xffec550a, 0}},
{{0, 0x0003b8d2, 0xffecd624, 0}}, {{0, 0x00039fbf, 0xffed573e, 0}},
{{0, 0x000386ac, 0xffedd858, 0}}, {{0, 0x00036d99, 0xffee5972, 0}},
{{0, 0x00035486, 0xffeeda8c, 0}}, {{0, 0x00033b73, 0xffef5ba6, 0}},
{{0, 0x00032260, 0xffefdcc0, 0}}, {{0, 0x0003094d, 0xfff05dda, 0}},
{{0, 0x0002f03a, 0xfff0def4, 0}}, {{0, 0x0002d727, 0xfff1600e, 0}},
{{0, 0x0002be14, 0xfff1e128, 0}}, {{0, 0x0002a501, 0xfff26242, 0}},
{{0, 0x00028bee, 0xfff2e35c, 0}}, {{0, 0x000272db, 0xfff36476, 0}},
{{0, 0x000259c8, 0xfff3e590, 0}}, {{0, 0x000240b5, 0xfff466aa, 0}},
{{0, 0x000227a2, 0xfff4e7c4, 0}}, {{0, 0x00020e8f, 0xfff568de, 0}},
{{0, 0x0001f57c, 0xfff5e9f8, 0}}, {{0, 0x0001dc69, 0xfff66b12, 0}},
{{0, 0x0001c356, 0xfff6ec2c, 0}}, {{0, 0x0001aa43, 0xfff76d46, 0}},
{{0, 0x00019130, 0xfff7ee60, 0}}, {{0, 0x0001781d, 0xfff86f7a, 0}},
{{0, 0x00015f0a, 0xfff8f094, 0}}, {{0, 0x000145f7, 0xfff971ae, 0}},
{{0, 0x00012ce4, 0xfff9f2c8, 0}}, {{0, 0x000113d1, 0xfffa73e2, 0}},
{{0, 0x0000fabe, 0xfffaf4fc, 0}}, {{0, 0x0000e1ab, 0xfffb7616, 0}},
{{0, 0x0000c898, 0xfffbf730, 0}}, {{0, 0x0000af85, 0xfffc784a, 0}},
{{0, 0x00009672, 0xfffcf964, 0}}, {{0, 0x00007d5f, 0xfffd7a7e, 0}},
{{0, 0x0000644c, 0xfffdfb98, 0}}, {{0, 0x00004b39, 0xfffe7cb2, 0}},
{{0, 0x00003226, 0xfffefdcc, 0}}, {{0, 0x00001913, 0xffff7ee6, 0}},
{{0, 0x00000000, 0x00000000, 0}}, {{0, 0xffffe6ed, 0x0000811a, 0}},
{{0, 0xffffcdda, 0x00010234, 0}}, {{0, 0xffffb4c7, 0x0001834e, 0}},
{{0, 0xffff9bb4, 0x00020468, 0}}, {{0, 0xffff82a1, 0x00028582, 0}},
{{0, 0xffff698e, 0x0003069c, 0}}, {{0, 0xffff507b, 0x000387b6, 0}},
{{0, 0xffff3768, 0x000408d0, 0}}, {{0, 0xffff1e55, 0x000489ea, 0}},
{{0, 0xffff0542, 0x00050b04, 0}}, {{0, 0xfffeec2f, 0x00058c1e, 0}},
{{0, 0xfffed31c, 0x00060d38, 0}}, {{0, 0xfffeba09, 0x00068e52, 0}},
{{0, 0xfffea0f6, 0x00070f6c, 0}}, {{0, 0xfffe87e3, 0x00079086, 0}},
{{0, 0xfffe6ed0, 0x000811a0, 0}}, {{0, 0xfffe55bd, 0x000892ba, 0}},
{{0, 0xfffe3caa, 0x000913d4, 0}}, {{0, 0xfffe2397, 0x000994ee, 0}},
{{0, 0xfffe0a84, 0x000a1608, 0}}, {{0, 0xfffdf171, 0x000a9722, 0}},
{{0, 0xfffdd85e, 0x000b183c, 0}}, {{0, 0xfffdbf4b, 0x000b9956, 0}},
{{0, 0xfffda638, 0x000c1a70, 0}}, {{0, 0xfffd8d25, 0x000c9b8a, 0}},
{{0, 0xfffd7412, 0x000d1ca4, 0}}, {{0, 0xfffd5aff, 0x000d9dbe, 0}},
{{0, 0xfffd41ec, 0x000e1ed8, 0}}, {{0, 0xfffd28d9, 0x000e9ff2, 0}},
{{0, 0xfffd0fc6, 0x000f210c, 0}}, {{0, 0xfffcf6b3, 0x000fa226, 0}},
{{0, 0xfffcdda0, 0x00102340, 0}}, {{0, 0xfffcc48d, 0x0010a45a, 0}},
{{0, 0xfffcab7a, 0x00112574, 0}}, {{0, 0xfffc9267, 0x0011a68e, 0}},
{{0, 0xfffc7954, 0x001227a8, 0}}, {{0, 0xfffc6041, 0x0012a8c2, 0}},
{{0, 0xfffc472e, 0x001329dc, 0}}, {{0, 0xfffc2e1b, 0x0013aaf6, 0}},
{{0, 0xfffc1508, 0x00142c10, 0}}, {{0, 0xfffbfbf5, 0x0014ad2a, 0}},
{{0, 0xfffbe2e2, 0x00152e44, 0}}, {{0, 0xfffbc9cf, 0x0015af5e, 0}},
{{0, 0xfffbb0bc, 0x00163078, 0}}, {{0, 0xfffb97a9, 0x0016b192, 0}},
{{0, 0xfffb7e96, 0x001732ac, 0}}, {{0, 0xfffb6583, 0x0017b3c6, 0}},
{{0, 0xfffb4c70, 0x001834e0, 0}}, {{0, 0xfffb335d, 0x0018b5fa, 0}},
{{0, 0xfffb1a4a, 0x00193714, 0}}, {{0, 0xfffb0137, 0x0019b82e, 0}},
{{0, 0xfffae824, 0x001a3948, 0}}, {{0, 0xfffacf11, 0x001aba62, 0}},
{{0, 0xfffab5fe, 0x001b3b7c, 0}}, {{0, 0xfffa9ceb, 0x001bbc96, 0}},
{{0, 0xfffa83d8, 0x001c3db0, 0}}, {{0, 0xfffa6ac5, 0x001cbeca, 0}},
{{0, 0xfffa51b2, 0x001d3fe4, 0}}, {{0, 0xfffa389f, 0x001dc0fe, 0}},
{{0, 0xfffa1f8c, 0x001e4218, 0}}, {{0, 0xfffa0679, 0x001ec332, 0}},
{{0, 0xfff9ed66, 0x001f444c, 0}}, {{0, 0xfff9d453, 0x001fc566, 0}},
{{0, 0xfff9bb40, 0x00204680, 0}}, {{0, 0xfff9a22d, 0x0020c79a, 0}},
{{0, 0xfff9891a, 0x002148b4, 0}}, {{0, 0xfff97007, 0x0021c9ce, 0}},
{{0, 0xfff956f4, 0x00224ae8, 0}}, {{0, 0xfff93de1, 0x0022cc02, 0}},
{{0, 0xfff924ce, 0x00234d1c, 0}}, {{0, 0xfff90bbb, 0x0023ce36, 0}},
{{0, 0xfff8f2a8, 0x00244f50, 0}}, {{0, 0xfff8d995, 0x0024d06a, 0}},
{{0, 0xfff8c082, 0x00255184, 0}}, {{0, 0xfff8a76f, 0x0025d29e, 0}},
{{0, 0xfff88e5c, 0x002653b8, 0}}, {{0, 0xfff87549, 0x0026d4d2, 0}},
{{0, 0xfff85c36, 0x002755ec, 0}}, {{0, 0xfff84323, 0x0027d706, 0}},
{{0, 0xfff82a10, 0x00285820, 0}}, {{0, 0xfff810fd, 0x0028d93a, 0}},
{{0, 0xfff7f7ea, 0x00295a54, 0}}, {{0, 0xfff7ded7, 0x0029db6e, 0}},
{{0, 0xfff7c5c4, 0x002a5c88, 0}}, {{0, 0xfff7acb1, 0x002adda2, 0}},
{{0, 0xfff7939e, 0x002b5ebc, 0}}, {{0, 0xfff77a8b, 0x002bdfd6, 0}},
{{0, 0xfff76178, 0x002c60f0, 0}}, {{0, 0xfff74865, 0x002ce20a, 0}},
{{0, 0xfff72f52, 0x002d6324, 0}}, {{0, 0xfff7163f, 0x002de43e, 0}},
{{0, 0xfff6fd2c, 0x002e6558, 0}}, {{0, 0xfff6e419, 0x002ee672, 0}},
{{0, 0xfff6cb06, 0x002f678c, 0}}, {{0, 0xfff6b1f3, 0x002fe8a6, 0}},
{{0, 0xfff698e0, 0x003069c0, 0}}, {{0, 0xfff67fcd, 0x0030eada, 0}},
{{0, 0xfff666ba, 0x00316bf4, 0}}, {{0, 0xfff64da7, 0x0031ed0e, 0}},
{{0, 0xfff63494, 0x00326e28, 0}}, {{0, 0xfff61b81, 0x0032ef42, 0}},
{{0, 0xfff6026e, 0x0033705c, 0}}, {{0, 0xfff5e95b, 0x0033f176, 0}},
{{0, 0xfff5d048, 0x00347290, 0}}, {{0, 0xfff5b735, 0x0034f3aa, 0}},
{{0, 0xfff59e22, 0x003574c4, 0}}, {{0, 0xfff5850f, 0x0035f5de, 0}},
{{0, 0xfff56bfc, 0x003676f8, 0}}, {{0, 0xfff552e9, 0x0036f812, 0}},
{{0, 0xfff539d6, 0x0037792c, 0}}, {{0, 0xfff520c3, 0x0037fa46, 0}},
{{0, 0xfff507b0, 0x00387b60, 0}}, {{0, 0xfff4ee9d, 0x0038fc7a, 0}},
{{0, 0xfff4d58a, 0x00397d94, 0}}, {{0, 0xfff4bc77, 0x0039feae, 0}},
{{0, 0xfff4a364, 0x003a7fc8, 0}}, {{0, 0xfff48a51, 0x003b00e2, 0}},
{{0, 0xfff4713e, 0x003b81fc, 0}}, {{0, 0xfff4582b, 0x003c0316, 0}},
{{0, 0xfff43f18, 0x003c8430, 0}}, {{0, 0xfff42605, 0x003d054a, 0}},
{{0, 0xfff40cf2, 0x003d8664, 0}}, {{0, 0xfff3f3df, 0x003e077e, 0}},
{{0, 0xfff3dacc, 0x003e8898, 0}}, {{0, 0xfff3c1b9, 0x003f09b2, 0}},
{{0, 0xfff3a8a6, 0x003f8acc, 0}}, {{0, 0xfff38f93, 0x00400be6, 0}}
};
static VP8kCstSSE2 VP8kVtoRGBA[256] = {
{{0xffcced80, 0x001a0400, 0, 0}}, {{0xffcd53a5, 0x0019cff8, 0, 0}},
{{0xffcdb9ca, 0x00199bf0, 0, 0}}, {{0xffce1fef, 0x001967e8, 0, 0}},
{{0xffce8614, 0x001933e0, 0, 0}}, {{0xffceec39, 0x0018ffd8, 0, 0}},
{{0xffcf525e, 0x0018cbd0, 0, 0}}, {{0xffcfb883, 0x001897c8, 0, 0}},
{{0xffd01ea8, 0x001863c0, 0, 0}}, {{0xffd084cd, 0x00182fb8, 0, 0}},
{{0xffd0eaf2, 0x0017fbb0, 0, 0}}, {{0xffd15117, 0x0017c7a8, 0, 0}},
{{0xffd1b73c, 0x001793a0, 0, 0}}, {{0xffd21d61, 0x00175f98, 0, 0}},
{{0xffd28386, 0x00172b90, 0, 0}}, {{0xffd2e9ab, 0x0016f788, 0, 0}},
{{0xffd34fd0, 0x0016c380, 0, 0}}, {{0xffd3b5f5, 0x00168f78, 0, 0}},
{{0xffd41c1a, 0x00165b70, 0, 0}}, {{0xffd4823f, 0x00162768, 0, 0}},
{{0xffd4e864, 0x0015f360, 0, 0}}, {{0xffd54e89, 0x0015bf58, 0, 0}},
{{0xffd5b4ae, 0x00158b50, 0, 0}}, {{0xffd61ad3, 0x00155748, 0, 0}},
{{0xffd680f8, 0x00152340, 0, 0}}, {{0xffd6e71d, 0x0014ef38, 0, 0}},
{{0xffd74d42, 0x0014bb30, 0, 0}}, {{0xffd7b367, 0x00148728, 0, 0}},
{{0xffd8198c, 0x00145320, 0, 0}}, {{0xffd87fb1, 0x00141f18, 0, 0}},
{{0xffd8e5d6, 0x0013eb10, 0, 0}}, {{0xffd94bfb, 0x0013b708, 0, 0}},
{{0xffd9b220, 0x00138300, 0, 0}}, {{0xffda1845, 0x00134ef8, 0, 0}},
{{0xffda7e6a, 0x00131af0, 0, 0}}, {{0xffdae48f, 0x0012e6e8, 0, 0}},
{{0xffdb4ab4, 0x0012b2e0, 0, 0}}, {{0xffdbb0d9, 0x00127ed8, 0, 0}},
{{0xffdc16fe, 0x00124ad0, 0, 0}}, {{0xffdc7d23, 0x001216c8, 0, 0}},
{{0xffdce348, 0x0011e2c0, 0, 0}}, {{0xffdd496d, 0x0011aeb8, 0, 0}},
{{0xffddaf92, 0x00117ab0, 0, 0}}, {{0xffde15b7, 0x001146a8, 0, 0}},
{{0xffde7bdc, 0x001112a0, 0, 0}}, {{0xffdee201, 0x0010de98, 0, 0}},
{{0xffdf4826, 0x0010aa90, 0, 0}}, {{0xffdfae4b, 0x00107688, 0, 0}},
{{0xffe01470, 0x00104280, 0, 0}}, {{0xffe07a95, 0x00100e78, 0, 0}},
{{0xffe0e0ba, 0x000fda70, 0, 0}}, {{0xffe146df, 0x000fa668, 0, 0}},
{{0xffe1ad04, 0x000f7260, 0, 0}}, {{0xffe21329, 0x000f3e58, 0, 0}},
{{0xffe2794e, 0x000f0a50, 0, 0}}, {{0xffe2df73, 0x000ed648, 0, 0}},
{{0xffe34598, 0x000ea240, 0, 0}}, {{0xffe3abbd, 0x000e6e38, 0, 0}},
{{0xffe411e2, 0x000e3a30, 0, 0}}, {{0xffe47807, 0x000e0628, 0, 0}},
{{0xffe4de2c, 0x000dd220, 0, 0}}, {{0xffe54451, 0x000d9e18, 0, 0}},
{{0xffe5aa76, 0x000d6a10, 0, 0}}, {{0xffe6109b, 0x000d3608, 0, 0}},
{{0xffe676c0, 0x000d0200, 0, 0}}, {{0xffe6dce5, 0x000ccdf8, 0, 0}},
{{0xffe7430a, 0x000c99f0, 0, 0}}, {{0xffe7a92f, 0x000c65e8, 0, 0}},
{{0xffe80f54, 0x000c31e0, 0, 0}}, {{0xffe87579, 0x000bfdd8, 0, 0}},
{{0xffe8db9e, 0x000bc9d0, 0, 0}}, {{0xffe941c3, 0x000b95c8, 0, 0}},
{{0xffe9a7e8, 0x000b61c0, 0, 0}}, {{0xffea0e0d, 0x000b2db8, 0, 0}},
{{0xffea7432, 0x000af9b0, 0, 0}}, {{0xffeada57, 0x000ac5a8, 0, 0}},
{{0xffeb407c, 0x000a91a0, 0, 0}}, {{0xffeba6a1, 0x000a5d98, 0, 0}},
{{0xffec0cc6, 0x000a2990, 0, 0}}, {{0xffec72eb, 0x0009f588, 0, 0}},
{{0xffecd910, 0x0009c180, 0, 0}}, {{0xffed3f35, 0x00098d78, 0, 0}},
{{0xffeda55a, 0x00095970, 0, 0}}, {{0xffee0b7f, 0x00092568, 0, 0}},
{{0xffee71a4, 0x0008f160, 0, 0}}, {{0xffeed7c9, 0x0008bd58, 0, 0}},
{{0xffef3dee, 0x00088950, 0, 0}}, {{0xffefa413, 0x00085548, 0, 0}},
{{0xfff00a38, 0x00082140, 0, 0}}, {{0xfff0705d, 0x0007ed38, 0, 0}},
{{0xfff0d682, 0x0007b930, 0, 0}}, {{0xfff13ca7, 0x00078528, 0, 0}},
{{0xfff1a2cc, 0x00075120, 0, 0}}, {{0xfff208f1, 0x00071d18, 0, 0}},
{{0xfff26f16, 0x0006e910, 0, 0}}, {{0xfff2d53b, 0x0006b508, 0, 0}},
{{0xfff33b60, 0x00068100, 0, 0}}, {{0xfff3a185, 0x00064cf8, 0, 0}},
{{0xfff407aa, 0x000618f0, 0, 0}}, {{0xfff46dcf, 0x0005e4e8, 0, 0}},
{{0xfff4d3f4, 0x0005b0e0, 0, 0}}, {{0xfff53a19, 0x00057cd8, 0, 0}},
{{0xfff5a03e, 0x000548d0, 0, 0}}, {{0xfff60663, 0x000514c8, 0, 0}},
{{0xfff66c88, 0x0004e0c0, 0, 0}}, {{0xfff6d2ad, 0x0004acb8, 0, 0}},
{{0xfff738d2, 0x000478b0, 0, 0}}, {{0xfff79ef7, 0x000444a8, 0, 0}},
{{0xfff8051c, 0x000410a0, 0, 0}}, {{0xfff86b41, 0x0003dc98, 0, 0}},
{{0xfff8d166, 0x0003a890, 0, 0}}, {{0xfff9378b, 0x00037488, 0, 0}},
{{0xfff99db0, 0x00034080, 0, 0}}, {{0xfffa03d5, 0x00030c78, 0, 0}},
{{0xfffa69fa, 0x0002d870, 0, 0}}, {{0xfffad01f, 0x0002a468, 0, 0}},
{{0xfffb3644, 0x00027060, 0, 0}}, {{0xfffb9c69, 0x00023c58, 0, 0}},
{{0xfffc028e, 0x00020850, 0, 0}}, {{0xfffc68b3, 0x0001d448, 0, 0}},
{{0xfffcced8, 0x0001a040, 0, 0}}, {{0xfffd34fd, 0x00016c38, 0, 0}},
{{0xfffd9b22, 0x00013830, 0, 0}}, {{0xfffe0147, 0x00010428, 0, 0}},
{{0xfffe676c, 0x0000d020, 0, 0}}, {{0xfffecd91, 0x00009c18, 0, 0}},
{{0xffff33b6, 0x00006810, 0, 0}}, {{0xffff99db, 0x00003408, 0, 0}},
{{0x00000000, 0x00000000, 0, 0}}, {{0x00006625, 0xffffcbf8, 0, 0}},
{{0x0000cc4a, 0xffff97f0, 0, 0}}, {{0x0001326f, 0xffff63e8, 0, 0}},
{{0x00019894, 0xffff2fe0, 0, 0}}, {{0x0001feb9, 0xfffefbd8, 0, 0}},
{{0x000264de, 0xfffec7d0, 0, 0}}, {{0x0002cb03, 0xfffe93c8, 0, 0}},
{{0x00033128, 0xfffe5fc0, 0, 0}}, {{0x0003974d, 0xfffe2bb8, 0, 0}},
{{0x0003fd72, 0xfffdf7b0, 0, 0}}, {{0x00046397, 0xfffdc3a8, 0, 0}},
{{0x0004c9bc, 0xfffd8fa0, 0, 0}}, {{0x00052fe1, 0xfffd5b98, 0, 0}},
{{0x00059606, 0xfffd2790, 0, 0}}, {{0x0005fc2b, 0xfffcf388, 0, 0}},
{{0x00066250, 0xfffcbf80, 0, 0}}, {{0x0006c875, 0xfffc8b78, 0, 0}},
{{0x00072e9a, 0xfffc5770, 0, 0}}, {{0x000794bf, 0xfffc2368, 0, 0}},
{{0x0007fae4, 0xfffbef60, 0, 0}}, {{0x00086109, 0xfffbbb58, 0, 0}},
{{0x0008c72e, 0xfffb8750, 0, 0}}, {{0x00092d53, 0xfffb5348, 0, 0}},
{{0x00099378, 0xfffb1f40, 0, 0}}, {{0x0009f99d, 0xfffaeb38, 0, 0}},
{{0x000a5fc2, 0xfffab730, 0, 0}}, {{0x000ac5e7, 0xfffa8328, 0, 0}},
{{0x000b2c0c, 0xfffa4f20, 0, 0}}, {{0x000b9231, 0xfffa1b18, 0, 0}},
{{0x000bf856, 0xfff9e710, 0, 0}}, {{0x000c5e7b, 0xfff9b308, 0, 0}},
{{0x000cc4a0, 0xfff97f00, 0, 0}}, {{0x000d2ac5, 0xfff94af8, 0, 0}},
{{0x000d90ea, 0xfff916f0, 0, 0}}, {{0x000df70f, 0xfff8e2e8, 0, 0}},
{{0x000e5d34, 0xfff8aee0, 0, 0}}, {{0x000ec359, 0xfff87ad8, 0, 0}},
{{0x000f297e, 0xfff846d0, 0, 0}}, {{0x000f8fa3, 0xfff812c8, 0, 0}},
{{0x000ff5c8, 0xfff7dec0, 0, 0}}, {{0x00105bed, 0xfff7aab8, 0, 0}},
{{0x0010c212, 0xfff776b0, 0, 0}}, {{0x00112837, 0xfff742a8, 0, 0}},
{{0x00118e5c, 0xfff70ea0, 0, 0}}, {{0x0011f481, 0xfff6da98, 0, 0}},
{{0x00125aa6, 0xfff6a690, 0, 0}}, {{0x0012c0cb, 0xfff67288, 0, 0}},
{{0x001326f0, 0xfff63e80, 0, 0}}, {{0x00138d15, 0xfff60a78, 0, 0}},
{{0x0013f33a, 0xfff5d670, 0, 0}}, {{0x0014595f, 0xfff5a268, 0, 0}},
{{0x0014bf84, 0xfff56e60, 0, 0}}, {{0x001525a9, 0xfff53a58, 0, 0}},
{{0x00158bce, 0xfff50650, 0, 0}}, {{0x0015f1f3, 0xfff4d248, 0, 0}},
{{0x00165818, 0xfff49e40, 0, 0}}, {{0x0016be3d, 0xfff46a38, 0, 0}},
{{0x00172462, 0xfff43630, 0, 0}}, {{0x00178a87, 0xfff40228, 0, 0}},
{{0x0017f0ac, 0xfff3ce20, 0, 0}}, {{0x001856d1, 0xfff39a18, 0, 0}},
{{0x0018bcf6, 0xfff36610, 0, 0}}, {{0x0019231b, 0xfff33208, 0, 0}},
{{0x00198940, 0xfff2fe00, 0, 0}}, {{0x0019ef65, 0xfff2c9f8, 0, 0}},
{{0x001a558a, 0xfff295f0, 0, 0}}, {{0x001abbaf, 0xfff261e8, 0, 0}},
{{0x001b21d4, 0xfff22de0, 0, 0}}, {{0x001b87f9, 0xfff1f9d8, 0, 0}},
{{0x001bee1e, 0xfff1c5d0, 0, 0}}, {{0x001c5443, 0xfff191c8, 0, 0}},
{{0x001cba68, 0xfff15dc0, 0, 0}}, {{0x001d208d, 0xfff129b8, 0, 0}},
{{0x001d86b2, 0xfff0f5b0, 0, 0}}, {{0x001decd7, 0xfff0c1a8, 0, 0}},
{{0x001e52fc, 0xfff08da0, 0, 0}}, {{0x001eb921, 0xfff05998, 0, 0}},
{{0x001f1f46, 0xfff02590, 0, 0}}, {{0x001f856b, 0xffeff188, 0, 0}},
{{0x001feb90, 0xffefbd80, 0, 0}}, {{0x002051b5, 0xffef8978, 0, 0}},
{{0x0020b7da, 0xffef5570, 0, 0}}, {{0x00211dff, 0xffef2168, 0, 0}},
{{0x00218424, 0xffeeed60, 0, 0}}, {{0x0021ea49, 0xffeeb958, 0, 0}},
{{0x0022506e, 0xffee8550, 0, 0}}, {{0x0022b693, 0xffee5148, 0, 0}},
{{0x00231cb8, 0xffee1d40, 0, 0}}, {{0x002382dd, 0xffede938, 0, 0}},
{{0x0023e902, 0xffedb530, 0, 0}}, {{0x00244f27, 0xffed8128, 0, 0}},
{{0x0024b54c, 0xffed4d20, 0, 0}}, {{0x00251b71, 0xffed1918, 0, 0}},
{{0x00258196, 0xffece510, 0, 0}}, {{0x0025e7bb, 0xffecb108, 0, 0}},
{{0x00264de0, 0xffec7d00, 0, 0}}, {{0x0026b405, 0xffec48f8, 0, 0}},
{{0x00271a2a, 0xffec14f0, 0, 0}}, {{0x0027804f, 0xffebe0e8, 0, 0}},
{{0x0027e674, 0xffebace0, 0, 0}}, {{0x00284c99, 0xffeb78d8, 0, 0}},
{{0x0028b2be, 0xffeb44d0, 0, 0}}, {{0x002918e3, 0xffeb10c8, 0, 0}},
{{0x00297f08, 0xffeadcc0, 0, 0}}, {{0x0029e52d, 0xffeaa8b8, 0, 0}},
{{0x002a4b52, 0xffea74b0, 0, 0}}, {{0x002ab177, 0xffea40a8, 0, 0}},
{{0x002b179c, 0xffea0ca0, 0, 0}}, {{0x002b7dc1, 0xffe9d898, 0, 0}},
{{0x002be3e6, 0xffe9a490, 0, 0}}, {{0x002c4a0b, 0xffe97088, 0, 0}},
{{0x002cb030, 0xffe93c80, 0, 0}}, {{0x002d1655, 0xffe90878, 0, 0}},
{{0x002d7c7a, 0xffe8d470, 0, 0}}, {{0x002de29f, 0xffe8a068, 0, 0}},
{{0x002e48c4, 0xffe86c60, 0, 0}}, {{0x002eaee9, 0xffe83858, 0, 0}},
{{0x002f150e, 0xffe80450, 0, 0}}, {{0x002f7b33, 0xffe7d048, 0, 0}},
{{0x002fe158, 0xffe79c40, 0, 0}}, {{0x0030477d, 0xffe76838, 0, 0}},
{{0x0030ada2, 0xffe73430, 0, 0}}, {{0x003113c7, 0xffe70028, 0, 0}},
{{0x003179ec, 0xffe6cc20, 0, 0}}, {{0x0031e011, 0xffe69818, 0, 0}},
{{0x00324636, 0xffe66410, 0, 0}}, {{0x0032ac5b, 0xffe63008, 0, 0}}
};

View File

@@ -1,4 +1,3 @@
AM_CPPFLAGS = -I$(top_srcdir)/src
noinst_LTLIBRARIES = libwebpencode.la
libwebpencode_la_SOURCES =
@@ -12,8 +11,11 @@ libwebpencode_la_SOURCES += filter.c
libwebpencode_la_SOURCES += frame.c
libwebpencode_la_SOURCES += histogram.c
libwebpencode_la_SOURCES += iterator.c
libwebpencode_la_SOURCES += layer.c
libwebpencode_la_SOURCES += picture.c
libwebpencode_la_SOURCES += picture_csp.c
libwebpencode_la_SOURCES += picture_psnr.c
libwebpencode_la_SOURCES += picture_rescale.c
libwebpencode_la_SOURCES += picture_tools.c
libwebpencode_la_SOURCES += quant.c
libwebpencode_la_SOURCES += syntax.c
libwebpencode_la_SOURCES += token.c

View File

@@ -17,6 +17,7 @@
#include "./vp8enci.h"
#include "../utils/filters.h"
#include "../utils/quant_levels.h"
#include "../utils/utils.h"
#include "../webp/format_constants.h"
// -----------------------------------------------------------------------------
@@ -34,7 +35,7 @@
//
// 'output' corresponds to the buffer containing compressed alpha data.
// This buffer is allocated by this method and caller should call
// free(*output) when done.
// WebPSafeFree(*output) when done.
// 'output_size' corresponds to size of this compressed alpha buffer.
//
// Returns 1 on successfully encoding the alpha and
@@ -46,12 +47,11 @@
static int EncodeLossless(const uint8_t* const data, int width, int height,
int effort_level, // in [0..6] range
VP8BitWriter* const bw,
VP8LBitWriter* const bw,
WebPAuxStats* const stats) {
int ok = 0;
WebPConfig config;
WebPPicture picture;
VP8LBitWriter tmp_bw;
WebPPictureInit(&picture);
picture.width = width;
@@ -83,16 +83,15 @@ static int EncodeLossless(const uint8_t* const data, int width, int height,
config.quality = 8.f * effort_level;
assert(config.quality >= 0 && config.quality <= 100.f);
ok = VP8LBitWriterInit(&tmp_bw, (width * height) >> 3);
ok = ok && (VP8LEncodeStream(&config, &picture, &tmp_bw) == VP8_ENC_OK);
ok = (VP8LEncodeStream(&config, &picture, bw) == VP8_ENC_OK);
WebPPictureFree(&picture);
if (ok) {
const uint8_t* const buffer = VP8LBitWriterFinish(&tmp_bw);
const size_t buffer_size = VP8LBitWriterNumBytes(&tmp_bw);
VP8BitWriterAppend(bw, buffer, buffer_size);
ok = ok && !bw->error_;
if (!ok) {
VP8LBitWriterDestroy(bw);
return 0;
}
VP8LBitWriterDestroy(&tmp_bw);
return ok && !bw->error_;
return 1;
}
// -----------------------------------------------------------------------------
@@ -114,8 +113,10 @@ static int EncodeAlphaInternal(const uint8_t* const data, int width, int height,
const uint8_t* alpha_src;
WebPFilterFunc filter_func;
uint8_t header;
size_t expected_size;
const size_t data_size = width * height;
const uint8_t* output = NULL;
size_t output_size = 0;
VP8LBitWriter tmp_bw;
assert((uint64_t)data_size == (uint64_t)width * height); // as per spec
assert(filter >= 0 && filter < WEBP_FILTER_LAST);
@@ -124,15 +125,6 @@ static int EncodeAlphaInternal(const uint8_t* const data, int width, int height,
assert(sizeof(header) == ALPHA_HEADER_LEN);
// TODO(skal): have a common function and #define's to validate alpha params.
expected_size =
(method == ALPHA_NO_COMPRESSION) ? (ALPHA_HEADER_LEN + data_size)
: (data_size >> 5);
header = method | (filter << 2);
if (reduce_levels) header |= ALPHA_PREPROCESSED_LEVELS << 4;
VP8BitWriterInit(&result->bw, expected_size);
VP8BitWriterAppend(&result->bw, &header, ALPHA_HEADER_LEN);
filter_func = WebPFilters[filter];
if (filter_func != NULL) {
filter_func(data, width, height, width, tmp_alpha);
@@ -141,14 +133,42 @@ static int EncodeAlphaInternal(const uint8_t* const data, int width, int height,
alpha_src = data;
}
if (method == ALPHA_NO_COMPRESSION) {
ok = VP8BitWriterAppend(&result->bw, alpha_src, width * height);
ok = ok && !result->bw.error_;
} else {
ok = EncodeLossless(alpha_src, width, height, effort_level,
&result->bw, &result->stats);
VP8BitWriterFinish(&result->bw);
if (method != ALPHA_NO_COMPRESSION) {
ok = VP8LBitWriterInit(&tmp_bw, data_size >> 3);
ok = ok && EncodeLossless(alpha_src, width, height, effort_level,
&tmp_bw, &result->stats);
if (ok) {
output = VP8LBitWriterFinish(&tmp_bw);
output_size = VP8LBitWriterNumBytes(&tmp_bw);
if (output_size > data_size) {
// compressed size is larger than source! Revert to uncompressed mode.
method = ALPHA_NO_COMPRESSION;
VP8LBitWriterDestroy(&tmp_bw);
}
} else {
VP8LBitWriterDestroy(&tmp_bw);
return 0;
}
}
if (method == ALPHA_NO_COMPRESSION) {
output = alpha_src;
output_size = data_size;
ok = 1;
}
// Emit final result.
header = method | (filter << 2);
if (reduce_levels) header |= ALPHA_PREPROCESSED_LEVELS << 4;
VP8BitWriterInit(&result->bw, ALPHA_HEADER_LEN + output_size);
ok = ok && VP8BitWriterAppend(&result->bw, &header, ALPHA_HEADER_LEN);
ok = ok && VP8BitWriterAppend(&result->bw, output, output_size);
if (method != ALPHA_NO_COMPRESSION) {
VP8LBitWriterDestroy(&tmp_bw);
}
ok = ok && !result->bw.error_;
result->score = VP8BitWriterSize(&result->bw);
return ok;
}
@@ -231,7 +251,7 @@ static int ApplyFiltersAndEncode(const uint8_t* alpha, int width, int height,
GetFilterMap(alpha, width, height, filter, effort_level);
InitFilterTrial(&best);
if (try_map != FILTER_TRY_NONE) {
uint8_t* filtered_alpha = (uint8_t*)malloc(data_size);
uint8_t* filtered_alpha = (uint8_t*)WebPSafeMalloc(1ULL, data_size);
if (filtered_alpha == NULL) return 0;
for (filter = WEBP_FILTER_NONE; ok && try_map; ++filter, try_map >>= 1) {
@@ -248,7 +268,7 @@ static int ApplyFiltersAndEncode(const uint8_t* alpha, int width, int height,
}
}
}
free(filtered_alpha);
WebPSafeFree(filtered_alpha);
} else {
ok = EncodeAlphaInternal(alpha, width, height, method, WEBP_FILTER_NONE,
reduce_levels, effort_level, NULL, &best);
@@ -298,7 +318,7 @@ static int EncodeAlpha(VP8Encoder* const enc,
filter = WEBP_FILTER_NONE;
}
quant_alpha = (uint8_t*)malloc(data_size);
quant_alpha = (uint8_t*)WebPSafeMalloc(1ULL, data_size);
if (quant_alpha == NULL) {
return 0;
}
@@ -325,7 +345,7 @@ static int EncodeAlpha(VP8Encoder* const enc,
}
}
free(quant_alpha);
WebPSafeFree(quant_alpha);
return ok;
}
@@ -346,7 +366,7 @@ static int CompressAlphaJob(VP8Encoder* const enc, void* dummy) {
return 0;
}
if (alpha_size != (uint32_t)alpha_size) { // Sanity check.
free(alpha_data);
WebPSafeFree(alpha_data);
return 0;
}
enc->alpha_data_size_ = (uint32_t)alpha_size;
@@ -361,7 +381,7 @@ void VP8EncInitAlpha(VP8Encoder* const enc) {
enc->alpha_data_size_ = 0;
if (enc->thread_level_ > 0) {
WebPWorker* const worker = &enc->alpha_worker_;
WebPWorkerInit(worker);
WebPGetWorkerInterface()->Init(worker);
worker->data1 = enc;
worker->data2 = NULL;
worker->hook = (WebPWorkerHook)CompressAlphaJob;
@@ -372,10 +392,11 @@ int VP8EncStartAlpha(VP8Encoder* const enc) {
if (enc->has_alpha_) {
if (enc->thread_level_ > 0) {
WebPWorker* const worker = &enc->alpha_worker_;
if (!WebPWorkerReset(worker)) { // Makes sure worker is good to go.
// Makes sure worker is good to go.
if (!WebPGetWorkerInterface()->Reset(worker)) {
return 0;
}
WebPWorkerLaunch(worker);
WebPGetWorkerInterface()->Launch(worker);
return 1;
} else {
return CompressAlphaJob(enc, NULL); // just do the job right away
@@ -388,7 +409,7 @@ int VP8EncFinishAlpha(VP8Encoder* const enc) {
if (enc->has_alpha_) {
if (enc->thread_level_ > 0) {
WebPWorker* const worker = &enc->alpha_worker_;
if (!WebPWorkerSync(worker)) return 0; // error
if (!WebPGetWorkerInterface()->Sync(worker)) return 0; // error
}
}
return WebPReportProgress(enc->pic_, enc->percent_ + 20, &enc->percent_);
@@ -398,10 +419,12 @@ int VP8EncDeleteAlpha(VP8Encoder* const enc) {
int ok = 1;
if (enc->thread_level_ > 0) {
WebPWorker* const worker = &enc->alpha_worker_;
ok = WebPWorkerSync(worker); // finish anything left in flight
WebPWorkerEnd(worker); // still need to end the worker, even if !ok
// finish anything left in flight
ok = WebPGetWorkerInterface()->Sync(worker);
// still need to end the worker, even if !ok
WebPGetWorkerInterface()->End(worker);
}
free(enc->alpha_data_);
WebPSafeFree(enc->alpha_data_);
enc->alpha_data_ = NULL;
enc->alpha_data_size_ = 0;
enc->has_alpha_ = 0;

View File

@@ -30,7 +30,7 @@ static void SmoothSegmentMap(VP8Encoder* const enc) {
const int w = enc->mb_w_;
const int h = enc->mb_h_;
const int majority_cnt_3_x_3_grid = 5;
uint8_t* const tmp = (uint8_t*)WebPSafeMalloc((uint64_t)w * h, sizeof(*tmp));
uint8_t* const tmp = (uint8_t*)WebPSafeMalloc(w * h, sizeof(*tmp));
assert((uint64_t)(w * h) == (uint64_t)w * h); // no overflow, as per spec
if (tmp == NULL) return;
@@ -63,7 +63,7 @@ static void SmoothSegmentMap(VP8Encoder* const enc) {
mb->segment_ = tmp[x + y * w];
}
}
free(tmp);
WebPSafeFree(tmp);
}
//------------------------------------------------------------------------------
@@ -141,7 +141,11 @@ static void MergeHistograms(const VP8Histogram* const in,
static void AssignSegments(VP8Encoder* const enc,
const int alphas[MAX_ALPHA + 1]) {
const int nb = enc->segment_hdr_.num_segments_;
// 'num_segments_' is previously validated and <= NUM_MB_SEGMENTS, but an
// explicit check is needed to avoid spurious warning about 'n + 1' exceeding
// array bounds of 'centers' with some compilers (noticed with gcc-4.9).
const int nb = (enc->segment_hdr_.num_segments_ < NUM_MB_SEGMENTS) ?
enc->segment_hdr_.num_segments_ : NUM_MB_SEGMENTS;
int centers[NUM_MB_SEGMENTS];
int weighted_average = 0;
int map[MAX_ALPHA + 1];
@@ -151,6 +155,7 @@ static void AssignSegments(VP8Encoder* const enc,
int accum[NUM_MB_SEGMENTS], dist_accum[NUM_MB_SEGMENTS];
assert(nb >= 1);
assert(nb <= NUM_MB_SEGMENTS);
// bracket the input
for (n = 0; n <= MAX_ALPHA && alphas[n] == 0; ++n) {}
@@ -225,18 +230,15 @@ static void AssignSegments(VP8Encoder* const enc,
// susceptibility and set best modes for this macroblock.
// Segment assignment is done later.
// Number of modes to inspect for alpha_ evaluation. For high-quality settings
// (method >= FAST_ANALYSIS_METHOD) we don't need to test all the possible modes
// during the analysis phase.
#define FAST_ANALYSIS_METHOD 4 // method above which we do partial analysis
// Number of modes to inspect for alpha_ evaluation. We don't need to test all
// the possible modes during the analysis phase: we risk falling into a local
// optimum, or be subject to boundary effect
#define MAX_INTRA16_MODE 2
#define MAX_INTRA4_MODE 2
#define MAX_UV_MODE 2
static int MBAnalyzeBestIntra16Mode(VP8EncIterator* const it) {
const int max_mode =
(it->enc_->method_ >= FAST_ANALYSIS_METHOD) ? MAX_INTRA16_MODE
: NUM_PRED_MODES;
const int max_mode = MAX_INTRA16_MODE;
int mode;
int best_alpha = DEFAULT_ALPHA;
int best_mode = 0;
@@ -262,9 +264,7 @@ static int MBAnalyzeBestIntra16Mode(VP8EncIterator* const it) {
static int MBAnalyzeBestIntra4Mode(VP8EncIterator* const it,
int best_alpha) {
uint8_t modes[16];
const int max_mode =
(it->enc_->method_ >= FAST_ANALYSIS_METHOD) ? MAX_INTRA4_MODE
: NUM_BMODES;
const int max_mode = MAX_INTRA4_MODE;
int i4_alpha;
VP8Histogram total_histo = { { 0 } };
int cur_histo = 0;
@@ -306,10 +306,9 @@ static int MBAnalyzeBestIntra4Mode(VP8EncIterator* const it,
static int MBAnalyzeBestUVMode(VP8EncIterator* const it) {
int best_alpha = DEFAULT_ALPHA;
int best_mode = 0;
const int max_mode =
(it->enc_->method_ >= FAST_ANALYSIS_METHOD) ? MAX_UV_MODE
: NUM_PRED_MODES;
const int max_mode = MAX_UV_MODE;
int mode;
VP8MakeChroma8Preds(it);
for (mode = 0; mode < max_mode; ++mode) {
VP8Histogram histo = { { 0 } };
@@ -425,7 +424,7 @@ static void MergeJobs(const SegmentJob* const src, SegmentJob* const dst) {
// initialize the job struct with some TODOs
static void InitSegmentJob(VP8Encoder* const enc, SegmentJob* const job,
int start_row, int end_row) {
WebPWorkerInit(&job->worker);
WebPGetWorkerInterface()->Init(&job->worker);
job->worker.data1 = job;
job->worker.data2 = &job->it;
job->worker.hook = (WebPWorkerHook)DoSegmentsJob;
@@ -458,6 +457,8 @@ int VP8EncAnalyze(VP8Encoder* const enc) {
#else
const int do_mt = 0;
#endif
const WebPWorkerInterface* const worker_interface =
WebPGetWorkerInterface();
SegmentJob main_job;
if (do_mt) {
SegmentJob side_job;
@@ -467,23 +468,23 @@ int VP8EncAnalyze(VP8Encoder* const enc) {
InitSegmentJob(enc, &side_job, split_row, last_row);
// we don't need to call Reset() on main_job.worker, since we're calling
// WebPWorkerExecute() on it
ok &= WebPWorkerReset(&side_job.worker);
ok &= worker_interface->Reset(&side_job.worker);
// launch the two jobs in parallel
if (ok) {
WebPWorkerLaunch(&side_job.worker);
WebPWorkerExecute(&main_job.worker);
ok &= WebPWorkerSync(&side_job.worker);
ok &= WebPWorkerSync(&main_job.worker);
worker_interface->Launch(&side_job.worker);
worker_interface->Execute(&main_job.worker);
ok &= worker_interface->Sync(&side_job.worker);
ok &= worker_interface->Sync(&main_job.worker);
}
WebPWorkerEnd(&side_job.worker);
worker_interface->End(&side_job.worker);
if (ok) MergeJobs(&side_job, &main_job); // merge results together
} else {
// Even for single-thread case, we use the generic Worker tools.
InitSegmentJob(enc, &main_job, 0, last_row);
WebPWorkerExecute(&main_job.worker);
ok &= WebPWorkerSync(&main_job.worker);
worker_interface->Execute(&main_job.worker);
ok &= worker_interface->Sync(&main_job.worker);
}
WebPWorkerEnd(&main_job.worker);
worker_interface->End(&main_job.worker);
if (ok) {
enc->alpha_ = main_job.alpha / total_mb;
enc->uv_alpha_ = main_job.uv_alpha / total_mb;

View File

@@ -12,7 +12,6 @@
#include <assert.h>
#include <math.h>
#include <stdio.h>
#include "./backward_references.h"
#include "./histogram.h"
@@ -22,10 +21,12 @@
#define VALUES_IN_BYTE 256
#define HASH_BITS 18
#define HASH_SIZE (1 << HASH_BITS)
#define HASH_MULTIPLIER (0xc6a4a7935bd1e995ULL)
#define MIN_BLOCK_SIZE 256 // minimum block size for backward references
#define MAX_ENTROPY (1e30f)
// 1M window (4M bytes) minus 120 special codes for short distances.
#define WINDOW_SIZE ((1 << 20) - 120)
@@ -33,14 +34,6 @@
#define MIN_LENGTH 2
#define MAX_LENGTH 4096
typedef struct {
// Stores the most recently added position with the given hash value.
int32_t hash_to_first_index_[HASH_SIZE];
// chain_[pos] stores the previous position with the same hash value
// for every pixel in the image.
int32_t* chain_;
} HashChain;
// -----------------------------------------------------------------------------
static const uint8_t plane_to_code_lut[128] = {
@@ -78,65 +71,152 @@ static WEBP_INLINE int FindMatchLength(const uint32_t* const array1,
// -----------------------------------------------------------------------------
// VP8LBackwardRefs
void VP8LInitBackwardRefs(VP8LBackwardRefs* const refs) {
if (refs != NULL) {
refs->refs = NULL;
refs->size = 0;
refs->max_size = 0;
}
}
struct PixOrCopyBlock {
PixOrCopyBlock* next_; // next block (or NULL)
PixOrCopy* start_; // data start
int size_; // currently used size
};
void VP8LClearBackwardRefs(VP8LBackwardRefs* const refs) {
if (refs != NULL) {
free(refs->refs);
VP8LInitBackwardRefs(refs);
}
}
int VP8LBackwardRefsAlloc(VP8LBackwardRefs* const refs, int max_size) {
static void ClearBackwardRefs(VP8LBackwardRefs* const refs) {
assert(refs != NULL);
refs->size = 0;
refs->max_size = 0;
refs->refs = (PixOrCopy*)WebPSafeMalloc((uint64_t)max_size,
sizeof(*refs->refs));
if (refs->refs == NULL) return 0;
refs->max_size = max_size;
if (refs->tail_ != NULL) {
*refs->tail_ = refs->free_blocks_; // recycle all blocks at once
}
refs->free_blocks_ = refs->refs_;
refs->tail_ = &refs->refs_;
refs->last_block_ = NULL;
refs->refs_ = NULL;
}
void VP8LBackwardRefsClear(VP8LBackwardRefs* const refs) {
assert(refs != NULL);
ClearBackwardRefs(refs);
while (refs->free_blocks_ != NULL) {
PixOrCopyBlock* const next = refs->free_blocks_->next_;
WebPSafeFree(refs->free_blocks_);
refs->free_blocks_ = next;
}
}
void VP8LBackwardRefsInit(VP8LBackwardRefs* const refs, int block_size) {
assert(refs != NULL);
memset(refs, 0, sizeof(*refs));
refs->tail_ = &refs->refs_;
refs->block_size_ =
(block_size < MIN_BLOCK_SIZE) ? MIN_BLOCK_SIZE : block_size;
}
VP8LRefsCursor VP8LRefsCursorInit(const VP8LBackwardRefs* const refs) {
VP8LRefsCursor c;
c.cur_block_ = refs->refs_;
if (refs->refs_ != NULL) {
c.cur_pos = c.cur_block_->start_;
c.last_pos_ = c.cur_pos + c.cur_block_->size_;
} else {
c.cur_pos = NULL;
c.last_pos_ = NULL;
}
return c;
}
void VP8LRefsCursorNextBlock(VP8LRefsCursor* const c) {
PixOrCopyBlock* const b = c->cur_block_->next_;
c->cur_pos = (b == NULL) ? NULL : b->start_;
c->last_pos_ = (b == NULL) ? NULL : b->start_ + b->size_;
c->cur_block_ = b;
}
// Create a new block, either from the free list or allocated
static PixOrCopyBlock* BackwardRefsNewBlock(VP8LBackwardRefs* const refs) {
PixOrCopyBlock* b = refs->free_blocks_;
if (b == NULL) { // allocate new memory chunk
const size_t total_size =
sizeof(*b) + refs->block_size_ * sizeof(*b->start_);
b = (PixOrCopyBlock*)WebPSafeMalloc(1ULL, total_size);
if (b == NULL) {
refs->error_ |= 1;
return NULL;
}
b->start_ = (PixOrCopy*)((uint8_t*)b + sizeof(*b)); // not always aligned
} else { // recycle from free-list
refs->free_blocks_ = b->next_;
}
*refs->tail_ = b;
refs->tail_ = &b->next_;
refs->last_block_ = b;
b->next_ = NULL;
b->size_ = 0;
return b;
}
static WEBP_INLINE void BackwardRefsCursorAdd(VP8LBackwardRefs* const refs,
const PixOrCopy v) {
PixOrCopyBlock* b = refs->last_block_;
if (b == NULL || b->size_ == refs->block_size_) {
b = BackwardRefsNewBlock(refs);
if (b == NULL) return; // refs->error_ is set
}
b->start_[b->size_++] = v;
}
int VP8LBackwardRefsCopy(const VP8LBackwardRefs* const src,
VP8LBackwardRefs* const dst) {
const PixOrCopyBlock* b = src->refs_;
ClearBackwardRefs(dst);
assert(src->block_size_ == dst->block_size_);
while (b != NULL) {
PixOrCopyBlock* const new_b = BackwardRefsNewBlock(dst);
if (new_b == NULL) return 0; // dst->error_ is set
memcpy(new_b->start_, b->start_, b->size_ * sizeof(*b->start_));
new_b->size_ = b->size_;
b = b->next_;
}
return 1;
}
// -----------------------------------------------------------------------------
// Hash chains
static WEBP_INLINE uint64_t GetPixPairHash64(const uint32_t* const argb) {
uint64_t key = ((uint64_t)(argb[1]) << 32) | argb[0];
key = (key * HASH_MULTIPLIER) >> (64 - HASH_BITS);
return key;
}
static int HashChainInit(HashChain* const p, int size) {
// initialize as empty
static void HashChainInit(VP8LHashChain* const p) {
int i;
p->chain_ = (int*)WebPSafeMalloc((uint64_t)size, sizeof(*p->chain_));
if (p->chain_ == NULL) {
return 0;
}
for (i = 0; i < size; ++i) {
assert(p != NULL);
for (i = 0; i < p->size_; ++i) {
p->chain_[i] = -1;
}
for (i = 0; i < HASH_SIZE; ++i) {
p->hash_to_first_index_[i] = -1;
}
}
int VP8LHashChainInit(VP8LHashChain* const p, int size) {
assert(p->size_ == 0);
assert(p->chain_ == NULL);
assert(size > 0);
p->chain_ = (int*)WebPSafeMalloc(size, sizeof(*p->chain_));
if (p->chain_ == NULL) return 0;
p->size_ = size;
HashChainInit(p);
return 1;
}
static void HashChainDelete(HashChain* const p) {
if (p != NULL) {
free(p->chain_);
free(p);
}
void VP8LHashChainClear(VP8LHashChain* const p) {
assert(p != NULL);
WebPSafeFree(p->chain_);
p->size_ = 0;
p->chain_ = NULL;
}
// -----------------------------------------------------------------------------
static WEBP_INLINE uint64_t GetPixPairHash64(const uint32_t* const argb) {
uint64_t key = ((uint64_t)argb[1] << 32) | argb[0];
key = (key * HASH_MULTIPLIER) >> (64 - HASH_BITS);
return key;
}
// Insertion of two pixels at a time.
static void HashChainInsert(HashChain* const p,
static void HashChainInsert(VP8LHashChain* const p,
const uint32_t* const argb, int pos) {
const uint64_t hash_code = GetPixPairHash64(argb);
p->chain_[pos] = p->hash_to_first_index_[hash_code];
@@ -161,7 +241,7 @@ static void GetParamsForHashChainFindCopy(int quality, int xsize,
*iter_limit = (cache_bits > 0) ? iter_neg : iter_neg / 2;
}
static int HashChainFindCopy(const HashChain* const p,
static int HashChainFindCopy(const VP8LHashChain* const p,
int base_position, int xsize_signed,
const uint32_t* const argb, int max_len,
int window_size, int iter_pos, int iter_limit,
@@ -185,10 +265,8 @@ static int HashChainFindCopy(const HashChain* const p,
uint64_t val;
uint32_t curr_length;
uint32_t distance;
const uint64_t* const ptr1 =
(const uint64_t*)(argb + pos + best_length - 1);
const uint64_t* const ptr2 =
(const uint64_t*)(argb_start + best_length - 1);
const uint32_t* const ptr1 = (argb + pos + best_length - 1);
const uint32_t* const ptr2 = (argb_start + best_length - 1);
if (iter_pos < 0) {
if (iter_pos < iter_limit || best_val >= 0xff0000) {
@@ -199,7 +277,7 @@ static int HashChainFindCopy(const HashChain* const p,
// Before 'expensive' linear match, check if the two arrays match at the
// current best length index and also for the succeeding elements.
if (*ptr1 != *ptr2) continue;
if (ptr1[0] != ptr2[0] || ptr1[1] != ptr2[1]) continue;
curr_length = FindMatchLength(argb + pos, argb_start, max_len);
if (curr_length < best_length) continue;
@@ -237,64 +315,61 @@ static int HashChainFindCopy(const HashChain* const p,
}
static WEBP_INLINE void PushBackCopy(VP8LBackwardRefs* const refs, int length) {
int size = refs->size;
while (length >= MAX_LENGTH) {
refs->refs[size++] = PixOrCopyCreateCopy(1, MAX_LENGTH);
BackwardRefsCursorAdd(refs, PixOrCopyCreateCopy(1, MAX_LENGTH));
length -= MAX_LENGTH;
}
if (length > 0) {
refs->refs[size++] = PixOrCopyCreateCopy(1, length);
BackwardRefsCursorAdd(refs, PixOrCopyCreateCopy(1, length));
}
refs->size = size;
}
static void BackwardReferencesRle(int xsize, int ysize,
const uint32_t* const argb,
VP8LBackwardRefs* const refs) {
static int BackwardReferencesRle(int xsize, int ysize,
const uint32_t* const argb,
VP8LBackwardRefs* const refs) {
const int pix_count = xsize * ysize;
int match_len = 0;
int i;
refs->size = 0;
ClearBackwardRefs(refs);
PushBackCopy(refs, match_len); // i=0 case
refs->refs[refs->size++] = PixOrCopyCreateLiteral(argb[0]);
BackwardRefsCursorAdd(refs, PixOrCopyCreateLiteral(argb[0]));
for (i = 1; i < pix_count; ++i) {
if (argb[i] == argb[i - 1]) {
++match_len;
} else {
PushBackCopy(refs, match_len);
match_len = 0;
refs->refs[refs->size++] = PixOrCopyCreateLiteral(argb[i]);
BackwardRefsCursorAdd(refs, PixOrCopyCreateLiteral(argb[i]));
}
}
PushBackCopy(refs, match_len);
return !refs->error_;
}
static int BackwardReferencesHashChain(int xsize, int ysize,
const uint32_t* const argb,
int cache_bits, int quality,
VP8LHashChain* const hash_chain,
VP8LBackwardRefs* const refs) {
int i;
int ok = 0;
int cc_init = 0;
const int use_color_cache = (cache_bits > 0);
const int pix_count = xsize * ysize;
HashChain* const hash_chain = (HashChain*)malloc(sizeof(*hash_chain));
VP8LColorCache hashers;
int window_size = WINDOW_SIZE;
int iter_pos = 1;
int iter_limit = -1;
if (hash_chain == NULL) return 0;
if (use_color_cache) {
cc_init = VP8LColorCacheInit(&hashers, cache_bits);
if (!cc_init) goto Error;
}
if (!HashChainInit(hash_chain, pix_count)) goto Error;
refs->size = 0;
ClearBackwardRefs(refs);
GetParamsForHashChainFindCopy(quality, xsize, cache_bits,
&window_size, &iter_pos, &iter_limit);
HashChainInit(hash_chain);
for (i = 0; i < pix_count; ) {
// Alternative#1: Code the pixels starting at 'i' using backward reference.
int offset = 0;
@@ -320,14 +395,15 @@ static int BackwardReferencesHashChain(int xsize, int ysize,
if (len2 > len + 1) {
const uint32_t pixel = argb[i];
// Alternative#2 is a better match. So push pixel at 'i' as literal.
PixOrCopy v;
if (use_color_cache && VP8LColorCacheContains(&hashers, pixel)) {
const int ix = VP8LColorCacheGetIndex(&hashers, pixel);
refs->refs[refs->size] = PixOrCopyCreateCacheIdx(ix);
v = PixOrCopyCreateCacheIdx(ix);
} else {
if (use_color_cache) VP8LColorCacheInsert(&hashers, pixel);
refs->refs[refs->size] = PixOrCopyCreateLiteral(pixel);
v = PixOrCopyCreateLiteral(pixel);
}
++refs->size;
BackwardRefsCursorAdd(refs, v);
i++; // Backward reference to be done for next pixel.
len = len2;
offset = offset2;
@@ -336,7 +412,7 @@ static int BackwardReferencesHashChain(int xsize, int ysize,
if (len >= MAX_LENGTH) {
len = MAX_LENGTH - 1;
}
refs->refs[refs->size++] = PixOrCopyCreateCopy(offset, len);
BackwardRefsCursorAdd(refs, PixOrCopyCreateCopy(offset, len));
if (use_color_cache) {
for (k = 0; k < len; ++k) {
VP8LColorCacheInsert(&hashers, argb[i + k]);
@@ -352,25 +428,25 @@ static int BackwardReferencesHashChain(int xsize, int ysize,
i += len;
} else {
const uint32_t pixel = argb[i];
PixOrCopy v;
if (use_color_cache && VP8LColorCacheContains(&hashers, pixel)) {
// push pixel as a PixOrCopyCreateCacheIdx pixel
const int ix = VP8LColorCacheGetIndex(&hashers, pixel);
refs->refs[refs->size] = PixOrCopyCreateCacheIdx(ix);
v = PixOrCopyCreateCacheIdx(ix);
} else {
if (use_color_cache) VP8LColorCacheInsert(&hashers, pixel);
refs->refs[refs->size] = PixOrCopyCreateLiteral(pixel);
v = PixOrCopyCreateLiteral(pixel);
}
++refs->size;
BackwardRefsCursorAdd(refs, v);
if (i + 1 < pix_count) {
HashChainInsert(hash_chain, &argb[i], i);
}
++i;
}
}
ok = 1;
ok = !refs->error_;
Error:
if (cc_init) VP8LColorCacheClear(&hashers);
HashChainDelete(hash_chain);
return ok;
}
@@ -387,11 +463,12 @@ typedef struct {
static int BackwardReferencesTraceBackwards(
int xsize, int ysize, int recursive_cost_model,
const uint32_t* const argb, int quality, int cache_bits,
VP8LHashChain* const hash_chain,
VP8LBackwardRefs* const refs);
static void ConvertPopulationCountTableToBitEstimates(
int num_symbols, const int population_counts[], double output[]) {
int sum = 0;
int num_symbols, const uint32_t population_counts[], double output[]) {
uint32_t sum = 0;
int nonzeros = 0;
int i;
for (i = 0; i < num_symbols; ++i) {
@@ -412,39 +489,45 @@ static void ConvertPopulationCountTableToBitEstimates(
static int CostModelBuild(CostModel* const m, int xsize, int ysize,
int recursion_level, const uint32_t* const argb,
int quality, int cache_bits) {
int quality, int cache_bits,
VP8LHashChain* const hash_chain,
VP8LBackwardRefs* const refs) {
int ok = 0;
VP8LHistogram histo;
VP8LBackwardRefs refs;
if (!VP8LBackwardRefsAlloc(&refs, xsize * ysize)) goto Error;
VP8LHistogram* histo = NULL;
ClearBackwardRefs(refs);
if (recursion_level > 0) {
if (!BackwardReferencesTraceBackwards(xsize, ysize, recursion_level - 1,
argb, quality, cache_bits, &refs)) {
argb, quality, cache_bits, hash_chain,
refs)) {
goto Error;
}
} else {
if (!BackwardReferencesHashChain(xsize, ysize, argb, cache_bits, quality,
&refs)) {
hash_chain, refs)) {
goto Error;
}
}
VP8LHistogramCreate(&histo, &refs, cache_bits);
histo = VP8LAllocateHistogram(cache_bits);
if (histo == NULL) goto Error;
VP8LHistogramCreate(histo, refs, cache_bits);
ConvertPopulationCountTableToBitEstimates(
VP8LHistogramNumCodes(&histo), histo.literal_, m->literal_);
VP8LHistogramNumCodes(histo->palette_code_bits_),
histo->literal_, m->literal_);
ConvertPopulationCountTableToBitEstimates(
VALUES_IN_BYTE, histo.red_, m->red_);
VALUES_IN_BYTE, histo->red_, m->red_);
ConvertPopulationCountTableToBitEstimates(
VALUES_IN_BYTE, histo.blue_, m->blue_);
VALUES_IN_BYTE, histo->blue_, m->blue_);
ConvertPopulationCountTableToBitEstimates(
VALUES_IN_BYTE, histo.alpha_, m->alpha_);
VALUES_IN_BYTE, histo->alpha_, m->alpha_);
ConvertPopulationCountTableToBitEstimates(
NUM_DISTANCE_CODES, histo.distance_, m->distance_);
NUM_DISTANCE_CODES, histo->distance_, m->distance_);
ok = 1;
Error:
VP8LClearBackwardRefs(&refs);
VP8LFreeHistogram(histo);
return ok;
}
@@ -476,16 +559,16 @@ static WEBP_INLINE double GetDistanceCost(const CostModel* const m,
static int BackwardReferencesHashChainDistanceOnly(
int xsize, int ysize, int recursive_cost_model, const uint32_t* const argb,
int quality, int cache_bits, uint32_t* const dist_array) {
int quality, int cache_bits, VP8LHashChain* const hash_chain,
VP8LBackwardRefs* const refs, uint32_t* const dist_array) {
int i;
int ok = 0;
int cc_init = 0;
const int pix_count = xsize * ysize;
const int use_color_cache = (cache_bits > 0);
float* const cost =
(float*)WebPSafeMalloc((uint64_t)pix_count, sizeof(*cost));
CostModel* cost_model = (CostModel*)malloc(sizeof(*cost_model));
HashChain* hash_chain = (HashChain*)malloc(sizeof(*hash_chain));
(float*)WebPSafeMalloc(pix_count, sizeof(*cost));
CostModel* cost_model = (CostModel*)WebPSafeMalloc(1ULL, sizeof(*cost_model));
VP8LColorCache hashers;
const double mul0 = (recursive_cost_model != 0) ? 1.0 : 0.68;
const double mul1 = (recursive_cost_model != 0) ? 1.0 : 0.82;
@@ -494,9 +577,7 @@ static int BackwardReferencesHashChainDistanceOnly(
int iter_pos = 1;
int iter_limit = -1;
if (cost == NULL || cost_model == NULL || hash_chain == NULL) goto Error;
if (!HashChainInit(hash_chain, pix_count)) goto Error;
if (cost == NULL || cost_model == NULL) goto Error;
if (use_color_cache) {
cc_init = VP8LColorCacheInit(&hashers, cache_bits);
@@ -504,7 +585,7 @@ static int BackwardReferencesHashChainDistanceOnly(
}
if (!CostModelBuild(cost_model, xsize, ysize, recursive_cost_model, argb,
quality, cache_bits)) {
quality, cache_bits, hash_chain, refs)) {
goto Error;
}
@@ -515,6 +596,7 @@ static int BackwardReferencesHashChainDistanceOnly(
dist_array[0] = 0;
GetParamsForHashChainFindCopy(quality, xsize, cache_bits,
&window_size, &iter_pos, &iter_limit);
HashChainInit(hash_chain);
for (i = 0; i < pix_count; ++i) {
double prev_cost = 0.0;
int shortmax;
@@ -589,12 +671,11 @@ static int BackwardReferencesHashChainDistanceOnly(
}
// Last pixel still to do, it can only be a single step if not reached
// through cheaper means already.
ok = 1;
ok = !refs->error_;
Error:
if (cc_init) VP8LColorCacheClear(&hashers);
HashChainDelete(hash_chain);
free(cost_model);
free(cost);
WebPSafeFree(cost_model);
WebPSafeFree(cost);
return ok;
}
@@ -621,6 +702,7 @@ static int BackwardReferencesHashChainFollowChosenPath(
int xsize, int ysize, const uint32_t* const argb,
int quality, int cache_bits,
const uint32_t* const chosen_path, int chosen_path_size,
VP8LHashChain* const hash_chain,
VP8LBackwardRefs* const refs) {
const int pix_count = xsize * ysize;
const int use_color_cache = (cache_bits > 0);
@@ -633,20 +715,17 @@ static int BackwardReferencesHashChainFollowChosenPath(
int window_size = WINDOW_SIZE;
int iter_pos = 1;
int iter_limit = -1;
HashChain* hash_chain = (HashChain*)malloc(sizeof(*hash_chain));
VP8LColorCache hashers;
if (hash_chain == NULL || !HashChainInit(hash_chain, pix_count)) {
goto Error;
}
if (use_color_cache) {
cc_init = VP8LColorCacheInit(&hashers, cache_bits);
if (!cc_init) goto Error;
}
refs->size = 0;
ClearBackwardRefs(refs);
GetParamsForHashChainFindCopy(quality, xsize, cache_bits,
&window_size, &iter_pos, &iter_limit);
HashChainInit(hash_chain);
for (ix = 0; ix < chosen_path_size; ++ix, ++size) {
int offset = 0;
int len = 0;
@@ -656,7 +735,7 @@ static int BackwardReferencesHashChainFollowChosenPath(
window_size, iter_pos, iter_limit,
&offset, &len);
assert(len == max_len);
refs->refs[size] = PixOrCopyCreateCopy(offset, len);
BackwardRefsCursorAdd(refs, PixOrCopyCreateCopy(offset, len));
if (use_color_cache) {
for (k = 0; k < len; ++k) {
VP8LColorCacheInsert(&hashers, argb[i + k]);
@@ -670,26 +749,25 @@ static int BackwardReferencesHashChainFollowChosenPath(
}
i += len;
} else {
PixOrCopy v;
if (use_color_cache && VP8LColorCacheContains(&hashers, argb[i])) {
// push pixel as a color cache index
const int idx = VP8LColorCacheGetIndex(&hashers, argb[i]);
refs->refs[size] = PixOrCopyCreateCacheIdx(idx);
v = PixOrCopyCreateCacheIdx(idx);
} else {
if (use_color_cache) VP8LColorCacheInsert(&hashers, argb[i]);
refs->refs[size] = PixOrCopyCreateLiteral(argb[i]);
v = PixOrCopyCreateLiteral(argb[i]);
}
BackwardRefsCursorAdd(refs, v);
if (i + 1 < pix_count) {
HashChainInsert(hash_chain, &argb[i], i);
}
++i;
}
}
assert(size <= refs->max_size);
refs->size = size;
ok = 1;
ok = !refs->error_;
Error:
if (cc_init) VP8LColorCacheClear(&hashers);
HashChainDelete(hash_chain);
return ok;
}
@@ -698,142 +776,129 @@ static int BackwardReferencesTraceBackwards(int xsize, int ysize,
int recursive_cost_model,
const uint32_t* const argb,
int quality, int cache_bits,
VP8LHashChain* const hash_chain,
VP8LBackwardRefs* const refs) {
int ok = 0;
const int dist_array_size = xsize * ysize;
uint32_t* chosen_path = NULL;
int chosen_path_size = 0;
uint32_t* dist_array =
(uint32_t*)WebPSafeMalloc((uint64_t)dist_array_size, sizeof(*dist_array));
(uint32_t*)WebPSafeMalloc(dist_array_size, sizeof(*dist_array));
if (dist_array == NULL) goto Error;
if (!BackwardReferencesHashChainDistanceOnly(
xsize, ysize, recursive_cost_model, argb, quality, cache_bits,
dist_array)) {
xsize, ysize, recursive_cost_model, argb, quality, cache_bits, hash_chain,
refs, dist_array)) {
goto Error;
}
TraceBackwards(dist_array, dist_array_size, &chosen_path, &chosen_path_size);
if (!BackwardReferencesHashChainFollowChosenPath(
xsize, ysize, argb, quality, cache_bits, chosen_path, chosen_path_size,
refs)) {
hash_chain, refs)) {
goto Error;
}
ok = 1;
Error:
free(dist_array);
WebPSafeFree(dist_array);
return ok;
}
static void BackwardReferences2DLocality(int xsize,
VP8LBackwardRefs* const refs) {
int i;
for (i = 0; i < refs->size; ++i) {
if (PixOrCopyIsCopy(&refs->refs[i])) {
const int dist = refs->refs[i].argb_or_distance;
const VP8LBackwardRefs* const refs) {
VP8LRefsCursor c = VP8LRefsCursorInit(refs);
while (VP8LRefsCursorOk(&c)) {
if (PixOrCopyIsCopy(c.cur_pos)) {
const int dist = c.cur_pos->argb_or_distance;
const int transformed_dist = DistanceToPlaneCode(xsize, dist);
refs->refs[i].argb_or_distance = transformed_dist;
c.cur_pos->argb_or_distance = transformed_dist;
}
VP8LRefsCursorNext(&c);
}
}
int VP8LGetBackwardReferences(int width, int height,
const uint32_t* const argb,
int quality, int cache_bits, int use_2d_locality,
VP8LBackwardRefs* const best) {
int ok = 0;
VP8LBackwardRefs* VP8LGetBackwardReferences(
int width, int height, const uint32_t* const argb, int quality,
int cache_bits, int use_2d_locality, VP8LHashChain* const hash_chain,
VP8LBackwardRefs refs_array[2]) {
int lz77_is_useful;
VP8LBackwardRefs refs_rle, refs_lz77;
const int num_pix = width * height;
VP8LBackwardRefsAlloc(&refs_rle, num_pix);
VP8LBackwardRefsAlloc(&refs_lz77, num_pix);
VP8LInitBackwardRefs(best);
if (refs_rle.refs == NULL || refs_lz77.refs == NULL) {
Error1:
VP8LClearBackwardRefs(&refs_rle);
VP8LClearBackwardRefs(&refs_lz77);
goto End;
}
VP8LBackwardRefs* best = NULL;
VP8LBackwardRefs* const refs_lz77 = &refs_array[0];
VP8LBackwardRefs* const refs_rle = &refs_array[1];
if (!BackwardReferencesHashChain(width, height, argb, cache_bits, quality,
&refs_lz77)) {
goto End;
hash_chain, refs_lz77)) {
return NULL;
}
if (!BackwardReferencesRle(width, height, argb, refs_rle)) {
return NULL;
}
// Backward Reference using RLE only.
BackwardReferencesRle(width, height, argb, &refs_rle);
{
double bit_cost_lz77, bit_cost_rle;
VP8LHistogram* const histo = (VP8LHistogram*)malloc(sizeof(*histo));
if (histo == NULL) goto Error1;
// Evaluate lz77 coding
VP8LHistogramCreate(histo, &refs_lz77, cache_bits);
VP8LHistogram* const histo = VP8LAllocateHistogram(cache_bits);
if (histo == NULL) return NULL;
// Evaluate LZ77 coding.
VP8LHistogramCreate(histo, refs_lz77, cache_bits);
bit_cost_lz77 = VP8LHistogramEstimateBits(histo);
// Evaluate RLE coding
VP8LHistogramCreate(histo, &refs_rle, cache_bits);
// Evaluate RLE coding.
VP8LHistogramCreate(histo, refs_rle, cache_bits);
bit_cost_rle = VP8LHistogramEstimateBits(histo);
// Decide if LZ77 is useful.
lz77_is_useful = (bit_cost_lz77 < bit_cost_rle);
free(histo);
VP8LFreeHistogram(histo);
}
// Choose appropriate backward reference.
if (lz77_is_useful) {
// TraceBackwards is costly. Don't execute it at lower quality.
const int try_lz77_trace_backwards = (quality >= 25);
*best = refs_lz77; // default guess: lz77 is better
VP8LClearBackwardRefs(&refs_rle);
best = refs_lz77; // default guess: lz77 is better
if (try_lz77_trace_backwards) {
// Set recursion level for large images using a color cache.
const int recursion_level =
(num_pix < 320 * 200) && (cache_bits > 0) ? 1 : 0;
VP8LBackwardRefs refs_trace;
if (!VP8LBackwardRefsAlloc(&refs_trace, num_pix)) {
goto End;
}
VP8LBackwardRefs* const refs_trace = &refs_array[1];
ClearBackwardRefs(refs_trace);
if (BackwardReferencesTraceBackwards(width, height, recursion_level, argb,
quality, cache_bits, &refs_trace)) {
VP8LClearBackwardRefs(&refs_lz77);
*best = refs_trace;
quality, cache_bits, hash_chain,
refs_trace)) {
best = refs_trace;
}
}
} else {
VP8LClearBackwardRefs(&refs_lz77);
*best = refs_rle;
best = refs_rle;
}
if (use_2d_locality) BackwardReferences2DLocality(width, best);
ok = 1;
End:
if (!ok) {
VP8LClearBackwardRefs(best);
}
return ok;
return best;
}
// Returns 1 on success.
static int ComputeCacheHistogram(const uint32_t* const argb,
int xsize, int ysize,
const VP8LBackwardRefs* const refs,
int cache_bits,
VP8LHistogram* const histo) {
// Returns entropy for the given cache bits.
static double ComputeCacheEntropy(const uint32_t* const argb,
int xsize, int ysize,
const VP8LBackwardRefs* const refs,
int cache_bits) {
int pixel_index = 0;
int i;
uint32_t k;
VP8LColorCache hashers;
const int use_color_cache = (cache_bits > 0);
int cc_init = 0;
double entropy = MAX_ENTROPY;
const double kSmallPenaltyForLargeCache = 4.0;
VP8LColorCache hashers;
VP8LRefsCursor c = VP8LRefsCursorInit(refs);
VP8LHistogram* histo = VP8LAllocateHistogram(cache_bits);
if (histo == NULL) goto Error;
if (use_color_cache) {
cc_init = VP8LColorCacheInit(&hashers, cache_bits);
if (!cc_init) return 0;
if (!cc_init) goto Error;
}
for (i = 0; i < refs->size; ++i) {
const PixOrCopy* const v = &refs->refs[i];
while (VP8LRefsCursorOk(&c)) {
const PixOrCopy* const v = c.cur_pos;
if (PixOrCopyIsLiteral(v)) {
if (use_color_cache &&
VP8LColorCacheContains(&hashers, argb[pixel_index])) {
@@ -853,42 +918,58 @@ static int ComputeCacheHistogram(const uint32_t* const argb,
}
}
pixel_index += PixOrCopyLength(v);
VP8LRefsCursorNext(&c);
}
assert(pixel_index == xsize * ysize);
(void)xsize; // xsize is not used in non-debug compilations otherwise.
(void)ysize; // ysize is not used in non-debug compilations otherwise.
entropy = VP8LHistogramEstimateBits(histo) +
kSmallPenaltyForLargeCache * cache_bits;
Error:
if (cc_init) VP8LColorCacheClear(&hashers);
return 1;
VP8LFreeHistogram(histo);
return entropy;
}
// Returns how many bits are to be used for a color cache.
// *best_cache_bits will contain how many bits are to be used for a color cache.
// Returns 0 in case of memory error.
int VP8LCalculateEstimateForCacheSize(const uint32_t* const argb,
int xsize, int ysize,
int xsize, int ysize, int quality,
VP8LHashChain* const hash_chain,
VP8LBackwardRefs* const refs,
int* const best_cache_bits) {
int ok = 0;
int cache_bits;
double lowest_entropy = 1e99;
VP8LBackwardRefs refs;
static const double kSmallPenaltyForLargeCache = 4.0;
static const int quality = 30;
if (!VP8LBackwardRefsAlloc(&refs, xsize * ysize) ||
!BackwardReferencesHashChain(xsize, ysize, argb, 0, quality, &refs)) {
goto Error;
int eval_low = 1;
int eval_high = 1;
double entropy_low = MAX_ENTROPY;
double entropy_high = MAX_ENTROPY;
int cache_bits_low = 0;
int cache_bits_high = MAX_COLOR_CACHE_BITS;
if (!BackwardReferencesHashChain(xsize, ysize, argb, 0, quality, hash_chain,
refs)) {
return 0;
}
for (cache_bits = 0; cache_bits <= MAX_COLOR_CACHE_BITS; ++cache_bits) {
double cur_entropy;
VP8LHistogram histo;
VP8LHistogramInit(&histo, cache_bits);
ComputeCacheHistogram(argb, xsize, ysize, &refs, cache_bits, &histo);
cur_entropy = VP8LHistogramEstimateBits(&histo) +
kSmallPenaltyForLargeCache * cache_bits;
if (cache_bits == 0 || cur_entropy < lowest_entropy) {
*best_cache_bits = cache_bits;
lowest_entropy = cur_entropy;
// Do a binary search to find the optimal entropy for cache_bits.
while (cache_bits_high - cache_bits_low > 1) {
if (eval_low) {
entropy_low =
ComputeCacheEntropy(argb, xsize, ysize, refs, cache_bits_low);
eval_low = 0;
}
if (eval_high) {
entropy_high =
ComputeCacheEntropy(argb, xsize, ysize, refs, cache_bits_high);
eval_high = 0;
}
if (entropy_high < entropy_low) {
*best_cache_bits = cache_bits_high;
cache_bits_low = (cache_bits_low + cache_bits_high) / 2;
eval_low = 1;
} else {
*best_cache_bits = cache_bits_low;
cache_bits_high = (cache_bits_low + cache_bits_high) / 2;
eval_high = 1;
}
}
ok = 1;
Error:
VP8LClearBackwardRefs(&refs);
return ok;
return 1;
}

View File

@@ -113,36 +113,96 @@ static WEBP_INLINE uint32_t PixOrCopyDistance(const PixOrCopy* const p) {
}
// -----------------------------------------------------------------------------
// VP8LBackwardRefs
// VP8LHashChain
#define HASH_BITS 18
#define HASH_SIZE (1 << HASH_BITS)
typedef struct VP8LHashChain VP8LHashChain;
struct VP8LHashChain {
// Stores the most recently added position with the given hash value.
int32_t hash_to_first_index_[HASH_SIZE];
// chain_[pos] stores the previous position with the same hash value
// for every pixel in the image.
int32_t* chain_;
// This is the maximum size of the hash_chain that can be constructed.
// Typically this is the pixel count (width x height) for a given image.
int size_;
};
// Must be called first, to set size.
int VP8LHashChainInit(VP8LHashChain* const p, int size);
void VP8LHashChainClear(VP8LHashChain* const p); // release memory
// -----------------------------------------------------------------------------
// VP8LBackwardRefs (block-based backward-references storage)
// maximum number of reference blocks the image will be segmented into
#define MAX_REFS_BLOCK_PER_IMAGE 16
typedef struct PixOrCopyBlock PixOrCopyBlock; // forward declaration
typedef struct VP8LBackwardRefs VP8LBackwardRefs;
// Container for blocks chain
struct VP8LBackwardRefs {
int block_size_; // common block-size
int error_; // set to true if some memory error occurred
PixOrCopyBlock* refs_; // list of currently used blocks
PixOrCopyBlock** tail_; // for list recycling
PixOrCopyBlock* free_blocks_; // free-list
PixOrCopyBlock* last_block_; // used for adding new refs (internal)
};
// Initialize the object. 'block_size' is the common block size to store
// references (typically, width * height / MAX_REFS_BLOCK_PER_IMAGE).
void VP8LBackwardRefsInit(VP8LBackwardRefs* const refs, int block_size);
// Release memory for backward references.
void VP8LBackwardRefsClear(VP8LBackwardRefs* const refs);
// Copies the 'src' backward refs to the 'dst'. Returns 0 in case of error.
int VP8LBackwardRefsCopy(const VP8LBackwardRefs* const src,
VP8LBackwardRefs* const dst);
// Cursor for iterating on references content
typedef struct {
PixOrCopy* refs;
int size; // currently used
int max_size; // maximum capacity
} VP8LBackwardRefs;
// public:
PixOrCopy* cur_pos; // current position
// private:
PixOrCopyBlock* cur_block_; // current block in the refs list
const PixOrCopy* last_pos_; // sentinel for switching to next block
} VP8LRefsCursor;
// Initialize the object. Must be called first. 'refs' can be NULL.
void VP8LInitBackwardRefs(VP8LBackwardRefs* const refs);
// Release memory and re-initialize the object. 'refs' can be NULL.
void VP8LClearBackwardRefs(VP8LBackwardRefs* const refs);
// Allocate 'max_size' references. Returns false in case of memory error.
int VP8LBackwardRefsAlloc(VP8LBackwardRefs* const refs, int max_size);
// Returns a cursor positioned at the beginning of the references list.
VP8LRefsCursor VP8LRefsCursorInit(const VP8LBackwardRefs* const refs);
// Returns true if cursor is pointing at a valid position.
static WEBP_INLINE int VP8LRefsCursorOk(const VP8LRefsCursor* const c) {
return (c->cur_pos != NULL);
}
// Move to next block of references. Internal, not to be called directly.
void VP8LRefsCursorNextBlock(VP8LRefsCursor* const c);
// Move to next position, or NULL. Should not be called if !VP8LRefsCursorOk().
static WEBP_INLINE void VP8LRefsCursorNext(VP8LRefsCursor* const c) {
assert(c != NULL);
assert(VP8LRefsCursorOk(c));
if (++c->cur_pos == c->last_pos_) VP8LRefsCursorNextBlock(c);
}
// -----------------------------------------------------------------------------
// Main entry points
// Evaluates best possible backward references for specified quality.
// Further optimize for 2D locality if use_2d_locality flag is set.
int VP8LGetBackwardReferences(int width, int height,
const uint32_t* const argb,
int quality, int cache_bits, int use_2d_locality,
VP8LBackwardRefs* const best);
// The return value is the pointer to the best of the two backward refs viz,
// refs[0] or refs[1].
VP8LBackwardRefs* VP8LGetBackwardReferences(
int width, int height, const uint32_t* const argb, int quality,
int cache_bits, int use_2d_locality, VP8LHashChain* const hash_chain,
VP8LBackwardRefs refs[2]);
// Produce an estimate for a good color cache size for the image.
int VP8LCalculateEstimateForCacheSize(const uint32_t* const argb,
int xsize, int ysize,
int xsize, int ysize, int quality,
VP8LHashChain* const hash_chain,
VP8LBackwardRefs* const ref,
int* const best_cache_bits);
#ifdef __cplusplus

View File

@@ -111,7 +111,11 @@ int WebPValidateConfig(const WebPConfig* config) {
return 0;
if (config->show_compressed < 0 || config->show_compressed > 1)
return 0;
#if WEBP_ENCODER_ABI_VERSION > 0x0204
if (config->preprocessing < 0 || config->preprocessing > 7)
#else
if (config->preprocessing < 0 || config->preprocessing > 3)
#endif
return 0;
if (config->partitions < 0 || config->partitions > 3)
return 0;
@@ -138,3 +142,25 @@ int WebPValidateConfig(const WebPConfig* config) {
//------------------------------------------------------------------------------
#if WEBP_ENCODER_ABI_VERSION > 0x0202
#define MAX_LEVEL 9
// Mapping between -z level and -m / -q parameter settings.
static const struct {
uint8_t method_;
uint8_t quality_;
} kLosslessPresets[MAX_LEVEL + 1] = {
{ 0, 0 }, { 1, 20 }, { 2, 25 }, { 3, 30 }, { 3, 50 },
{ 4, 50 }, { 4, 75 }, { 4, 90 }, { 5, 90 }, { 6, 100 }
};
int WebPConfigLosslessPreset(WebPConfig* config, int level) {
if (config == NULL || level < 0 || level > MAX_LEVEL) return 0;
config->lossless = 1;
config->method = kLosslessPresets[level].method_;
config->quality = kLosslessPresets[level].quality_;
return 1;
}
#endif
//------------------------------------------------------------------------------

View File

@@ -360,9 +360,10 @@ void VP8CalculateLevelCosts(VP8Proba* const proba) {
for (ctx = 0; ctx < NUM_CTX; ++ctx) {
const uint8_t* const p = proba->coeffs_[ctype][band][ctx];
uint16_t* const table = proba->level_cost_[ctype][band][ctx];
const int cost_base = VP8BitCost(1, p[1]);
const int cost0 = (ctx > 0) ? VP8BitCost(1, p[0]) : 0;
const int cost_base = VP8BitCost(1, p[1]) + cost0;
int v;
table[0] = VP8BitCost(0, p[1]);
table[0] = VP8BitCost(0, p[1]) + cost0;
for (v = 1; v <= MAX_VARIABLE_LEVEL; ++v) {
table[v] = cost_base + VariableLevelCost(v, p);
}
@@ -486,4 +487,249 @@ const uint16_t VP8FixedCostsI4[NUM_BMODES][NUM_BMODES][NUM_BMODES] = {
};
//------------------------------------------------------------------------------
// Mode costs
static int GetResidualCost(int ctx0, const VP8Residual* const res) {
int n = res->first;
// should be prob[VP8EncBands[n]], but it's equivalent for n=0 or 1
const int p0 = res->prob[n][ctx0][0];
const uint16_t* t = res->cost[n][ctx0];
// bit_cost(1, p0) is already incorporated in t[] tables, but only if ctx != 0
// (as required by the syntax). For ctx0 == 0, we need to add it here or it'll
// be missing during the loop.
int cost = (ctx0 == 0) ? VP8BitCost(1, p0) : 0;
if (res->last < 0) {
return VP8BitCost(0, p0);
}
for (; n < res->last; ++n) {
const int v = abs(res->coeffs[n]);
const int b = VP8EncBands[n + 1];
const int ctx = (v >= 2) ? 2 : v;
cost += VP8LevelCost(t, v);
t = res->cost[b][ctx];
}
// Last coefficient is always non-zero
{
const int v = abs(res->coeffs[n]);
assert(v != 0);
cost += VP8LevelCost(t, v);
if (n < 15) {
const int b = VP8EncBands[n + 1];
const int ctx = (v == 1) ? 1 : 2;
const int last_p0 = res->prob[b][ctx][0];
cost += VP8BitCost(0, last_p0);
}
}
return cost;
}
//------------------------------------------------------------------------------
// init function
#if defined(WEBP_USE_MIPS32)
extern int VP8GetResidualCostMIPS32(int ctx0, const VP8Residual* const res);
#endif // WEBP_USE_MIPS32
// TODO(skal): this, and GetResidualCost(), should probably go somewhere
// under src/dsp/ at some point.
VP8GetResidualCostFunc VP8GetResidualCost;
void VP8GetResidualCostInit(void) {
VP8GetResidualCost = GetResidualCost;
if (VP8GetCPUInfo != NULL) {
#if defined(WEBP_USE_MIPS32)
if (VP8GetCPUInfo(kMIPS32)) {
VP8GetResidualCost = VP8GetResidualCostMIPS32;
}
#endif
}
}
//------------------------------------------------------------------------------
// helper functions for residuals struct VP8Residual.
void VP8InitResidual(int first, int coeff_type,
VP8Encoder* const enc, VP8Residual* const res) {
res->coeff_type = coeff_type;
res->prob = enc->proba_.coeffs_[coeff_type];
res->stats = enc->proba_.stats_[coeff_type];
res->cost = enc->proba_.level_cost_[coeff_type];
res->first = first;
}
static void SetResidualCoeffs(const int16_t* const coeffs,
VP8Residual* const res) {
int n;
res->last = -1;
assert(res->first == 0 || coeffs[0] == 0);
for (n = 15; n >= 0; --n) {
if (coeffs[n]) {
res->last = n;
break;
}
}
res->coeffs = coeffs;
}
//------------------------------------------------------------------------------
// init function
#if defined(WEBP_USE_SSE2)
extern void VP8SetResidualCoeffsSSE2(const int16_t* const coeffs,
VP8Residual* const res);
#endif // WEBP_USE_SSE2
VP8SetResidualCoeffsFunc VP8SetResidualCoeffs;
void VP8SetResidualCoeffsInit(void) {
VP8SetResidualCoeffs = SetResidualCoeffs;
if (VP8GetCPUInfo != NULL) {
#if defined(WEBP_USE_SSE2)
if (VP8GetCPUInfo(kSSE2)) {
VP8SetResidualCoeffs = VP8SetResidualCoeffsSSE2;
}
#endif
}
}
//------------------------------------------------------------------------------
// Mode costs
int VP8GetCostLuma4(VP8EncIterator* const it, const int16_t levels[16]) {
const int x = (it->i4_ & 3), y = (it->i4_ >> 2);
VP8Residual res;
VP8Encoder* const enc = it->enc_;
int R = 0;
int ctx;
VP8InitResidual(0, 3, enc, &res);
ctx = it->top_nz_[x] + it->left_nz_[y];
VP8SetResidualCoeffs(levels, &res);
R += VP8GetResidualCost(ctx, &res);
return R;
}
int VP8GetCostLuma16(VP8EncIterator* const it, const VP8ModeScore* const rd) {
VP8Residual res;
VP8Encoder* const enc = it->enc_;
int x, y;
int R = 0;
VP8IteratorNzToBytes(it); // re-import the non-zero context
// DC
VP8InitResidual(0, 1, enc, &res);
VP8SetResidualCoeffs(rd->y_dc_levels, &res);
R += VP8GetResidualCost(it->top_nz_[8] + it->left_nz_[8], &res);
// AC
VP8InitResidual(1, 0, enc, &res);
for (y = 0; y < 4; ++y) {
for (x = 0; x < 4; ++x) {
const int ctx = it->top_nz_[x] + it->left_nz_[y];
VP8SetResidualCoeffs(rd->y_ac_levels[x + y * 4], &res);
R += VP8GetResidualCost(ctx, &res);
it->top_nz_[x] = it->left_nz_[y] = (res.last >= 0);
}
}
return R;
}
int VP8GetCostUV(VP8EncIterator* const it, const VP8ModeScore* const rd) {
VP8Residual res;
VP8Encoder* const enc = it->enc_;
int ch, x, y;
int R = 0;
VP8IteratorNzToBytes(it); // re-import the non-zero context
VP8InitResidual(0, 2, enc, &res);
for (ch = 0; ch <= 2; ch += 2) {
for (y = 0; y < 2; ++y) {
for (x = 0; x < 2; ++x) {
const int ctx = it->top_nz_[4 + ch + x] + it->left_nz_[4 + ch + y];
VP8SetResidualCoeffs(rd->uv_levels[ch * 2 + x + y * 2], &res);
R += VP8GetResidualCost(ctx, &res);
it->top_nz_[4 + ch + x] = it->left_nz_[4 + ch + y] = (res.last >= 0);
}
}
}
return R;
}
//------------------------------------------------------------------------------
// Recording of token probabilities.
// Record proba context used
static int Record(int bit, proba_t* const stats) {
proba_t p = *stats;
if (p >= 0xffff0000u) { // an overflow is inbound.
p = ((p + 1u) >> 1) & 0x7fff7fffu; // -> divide the stats by 2.
}
// record bit count (lower 16 bits) and increment total count (upper 16 bits).
p += 0x00010000u + bit;
*stats = p;
return bit;
}
// We keep the table-free variant around for reference, in case.
#define USE_LEVEL_CODE_TABLE
// Simulate block coding, but only record statistics.
// Note: no need to record the fixed probas.
int VP8RecordCoeffs(int ctx, const VP8Residual* const res) {
int n = res->first;
// should be stats[VP8EncBands[n]], but it's equivalent for n=0 or 1
proba_t* s = res->stats[n][ctx];
if (res->last < 0) {
Record(0, s + 0);
return 0;
}
while (n <= res->last) {
int v;
Record(1, s + 0); // order of record doesn't matter
while ((v = res->coeffs[n++]) == 0) {
Record(0, s + 1);
s = res->stats[VP8EncBands[n]][0];
}
Record(1, s + 1);
if (!Record(2u < (unsigned int)(v + 1), s + 2)) { // v = -1 or 1
s = res->stats[VP8EncBands[n]][1];
} else {
v = abs(v);
#if !defined(USE_LEVEL_CODE_TABLE)
if (!Record(v > 4, s + 3)) {
if (Record(v != 2, s + 4))
Record(v == 4, s + 5);
} else if (!Record(v > 10, s + 6)) {
Record(v > 6, s + 7);
} else if (!Record((v >= 3 + (8 << 2)), s + 8)) {
Record((v >= 3 + (8 << 1)), s + 9);
} else {
Record((v >= 3 + (8 << 3)), s + 10);
}
#else
if (v > MAX_VARIABLE_LEVEL) {
v = MAX_VARIABLE_LEVEL;
}
{
const int bits = VP8LevelCodes[v - 1][1];
int pattern = VP8LevelCodes[v - 1][0];
int i;
for (i = 0; (pattern >>= 1) != 0; ++i) {
const int mask = 2 << i;
if (pattern & 1) Record(!!(bits & mask), s + 3 + i);
}
}
#endif
s = res->stats[VP8EncBands[n]][2];
}
}
if (n < 16) Record(0, s + 0);
return 1;
}
//------------------------------------------------------------------------------

View File

@@ -14,12 +14,38 @@
#ifndef WEBP_ENC_COST_H_
#define WEBP_ENC_COST_H_
#include <assert.h>
#include <stdlib.h>
#include "./vp8enci.h"
#ifdef __cplusplus
extern "C" {
#endif
// On-the-fly info about the current set of residuals. Handy to avoid
// passing zillions of params.
typedef struct {
int first;
int last;
const int16_t* coeffs;
int coeff_type;
ProbaArray* prob;
StatsArray* stats;
CostArray* cost;
} VP8Residual;
void VP8InitResidual(int first, int coeff_type,
VP8Encoder* const enc, VP8Residual* const res);
typedef void (*VP8SetResidualCoeffsFunc)(const int16_t* const coeffs,
VP8Residual* const res);
extern VP8SetResidualCoeffsFunc VP8SetResidualCoeffs;
void VP8SetResidualCoeffsInit(void); // must be called first
int VP8RecordCoeffs(int ctx, const VP8Residual* const res);
// approximate cost per level:
extern const uint16_t VP8LevelFixedCosts[MAX_LEVEL + 1];
extern const uint16_t VP8EntropyCost[256]; // 8bit fixed-point log(p)
@@ -29,6 +55,12 @@ static WEBP_INLINE int VP8BitCost(int bit, uint8_t proba) {
return !bit ? VP8EntropyCost[proba] : VP8EntropyCost[255 - proba];
}
// Cost calculation function.
typedef int (*VP8GetResidualCostFunc)(int ctx0, const VP8Residual* const res);
extern VP8GetResidualCostFunc VP8GetResidualCost;
void VP8GetResidualCostInit(void); // must be called first
// Level cost calculations
extern const uint16_t VP8LevelCodes[MAX_VARIABLE_LEVEL][2];
void VP8CalculateLevelCosts(VP8Proba* const proba);

View File

@@ -13,6 +13,7 @@
#include <assert.h>
#include "./vp8enci.h"
#include "../dsp/dsp.h"
// This table gives, for a given sharpness, the filtering strength to be
// used (at least) in order to filter a given edge step delta.
@@ -61,180 +62,6 @@ int VP8FilterStrengthFromDelta(int sharpness, int delta) {
return kLevelsFromDelta[sharpness][pos];
}
// -----------------------------------------------------------------------------
// NOTE: clip1, tables and InitTables are repeated entries of dsp.c
static uint8_t abs0[255 + 255 + 1]; // abs(i)
static uint8_t abs1[255 + 255 + 1]; // abs(i)>>1
static int8_t sclip1[1020 + 1020 + 1]; // clips [-1020, 1020] to [-128, 127]
static int8_t sclip2[112 + 112 + 1]; // clips [-112, 112] to [-16, 15]
static uint8_t clip1[255 + 510 + 1]; // clips [-255,510] to [0,255]
static int tables_ok = 0;
static void InitTables(void) {
if (!tables_ok) {
int i;
for (i = -255; i <= 255; ++i) {
abs0[255 + i] = (i < 0) ? -i : i;
abs1[255 + i] = abs0[255 + i] >> 1;
}
for (i = -1020; i <= 1020; ++i) {
sclip1[1020 + i] = (i < -128) ? -128 : (i > 127) ? 127 : i;
}
for (i = -112; i <= 112; ++i) {
sclip2[112 + i] = (i < -16) ? -16 : (i > 15) ? 15 : i;
}
for (i = -255; i <= 255 + 255; ++i) {
clip1[255 + i] = (i < 0) ? 0 : (i > 255) ? 255 : i;
}
tables_ok = 1;
}
}
//------------------------------------------------------------------------------
// Edge filtering functions
// 4 pixels in, 2 pixels out
static WEBP_INLINE void do_filter2(uint8_t* p, int step) {
const int p1 = p[-2*step], p0 = p[-step], q0 = p[0], q1 = p[step];
const int a = 3 * (q0 - p0) + sclip1[1020 + p1 - q1];
const int a1 = sclip2[112 + ((a + 4) >> 3)];
const int a2 = sclip2[112 + ((a + 3) >> 3)];
p[-step] = clip1[255 + p0 + a2];
p[ 0] = clip1[255 + q0 - a1];
}
// 4 pixels in, 4 pixels out
static WEBP_INLINE void do_filter4(uint8_t* p, int step) {
const int p1 = p[-2*step], p0 = p[-step], q0 = p[0], q1 = p[step];
const int a = 3 * (q0 - p0);
const int a1 = sclip2[112 + ((a + 4) >> 3)];
const int a2 = sclip2[112 + ((a + 3) >> 3)];
const int a3 = (a1 + 1) >> 1;
p[-2*step] = clip1[255 + p1 + a3];
p[- step] = clip1[255 + p0 + a2];
p[ 0] = clip1[255 + q0 - a1];
p[ step] = clip1[255 + q1 - a3];
}
// high edge-variance
static WEBP_INLINE int hev(const uint8_t* p, int step, int thresh) {
const int p1 = p[-2*step], p0 = p[-step], q0 = p[0], q1 = p[step];
return (abs0[255 + p1 - p0] > thresh) || (abs0[255 + q1 - q0] > thresh);
}
static WEBP_INLINE int needs_filter(const uint8_t* p, int step, int thresh) {
const int p1 = p[-2*step], p0 = p[-step], q0 = p[0], q1 = p[step];
return (2 * abs0[255 + p0 - q0] + abs1[255 + p1 - q1]) <= thresh;
}
static WEBP_INLINE int needs_filter2(const uint8_t* p,
int step, int t, int it) {
const int p3 = p[-4*step], p2 = p[-3*step], p1 = p[-2*step], p0 = p[-step];
const int q0 = p[0], q1 = p[step], q2 = p[2*step], q3 = p[3*step];
if ((2 * abs0[255 + p0 - q0] + abs1[255 + p1 - q1]) > t)
return 0;
return abs0[255 + p3 - p2] <= it && abs0[255 + p2 - p1] <= it &&
abs0[255 + p1 - p0] <= it && abs0[255 + q3 - q2] <= it &&
abs0[255 + q2 - q1] <= it && abs0[255 + q1 - q0] <= it;
}
//------------------------------------------------------------------------------
// Simple In-loop filtering (Paragraph 15.2)
static void SimpleVFilter16(uint8_t* p, int stride, int thresh) {
int i;
for (i = 0; i < 16; ++i) {
if (needs_filter(p + i, stride, thresh)) {
do_filter2(p + i, stride);
}
}
}
static void SimpleHFilter16(uint8_t* p, int stride, int thresh) {
int i;
for (i = 0; i < 16; ++i) {
if (needs_filter(p + i * stride, 1, thresh)) {
do_filter2(p + i * stride, 1);
}
}
}
static void SimpleVFilter16i(uint8_t* p, int stride, int thresh) {
int k;
for (k = 3; k > 0; --k) {
p += 4 * stride;
SimpleVFilter16(p, stride, thresh);
}
}
static void SimpleHFilter16i(uint8_t* p, int stride, int thresh) {
int k;
for (k = 3; k > 0; --k) {
p += 4;
SimpleHFilter16(p, stride, thresh);
}
}
//------------------------------------------------------------------------------
// Complex In-loop filtering (Paragraph 15.3)
static WEBP_INLINE void FilterLoop24(uint8_t* p,
int hstride, int vstride, int size,
int thresh, int ithresh, int hev_thresh) {
while (size-- > 0) {
if (needs_filter2(p, hstride, thresh, ithresh)) {
if (hev(p, hstride, hev_thresh)) {
do_filter2(p, hstride);
} else {
do_filter4(p, hstride);
}
}
p += vstride;
}
}
// on three inner edges
static void VFilter16i(uint8_t* p, int stride,
int thresh, int ithresh, int hev_thresh) {
int k;
for (k = 3; k > 0; --k) {
p += 4 * stride;
FilterLoop24(p, stride, 1, 16, thresh, ithresh, hev_thresh);
}
}
static void HFilter16i(uint8_t* p, int stride,
int thresh, int ithresh, int hev_thresh) {
int k;
for (k = 3; k > 0; --k) {
p += 4;
FilterLoop24(p, 1, stride, 16, thresh, ithresh, hev_thresh);
}
}
static void VFilter8i(uint8_t* u, uint8_t* v, int stride,
int thresh, int ithresh, int hev_thresh) {
FilterLoop24(u + 4 * stride, stride, 1, 8, thresh, ithresh, hev_thresh);
FilterLoop24(v + 4 * stride, stride, 1, 8, thresh, ithresh, hev_thresh);
}
static void HFilter8i(uint8_t* u, uint8_t* v, int stride,
int thresh, int ithresh, int hev_thresh) {
FilterLoop24(u + 4, 1, stride, 8, thresh, ithresh, hev_thresh);
FilterLoop24(v + 4, 1, stride, 8, thresh, ithresh, hev_thresh);
}
//------------------------------------------------------------------------------
void (*VP8EncVFilter16i)(uint8_t*, int, int, int, int) = VFilter16i;
void (*VP8EncHFilter16i)(uint8_t*, int, int, int, int) = HFilter16i;
void (*VP8EncVFilter8i)(uint8_t*, uint8_t*, int, int, int, int) = VFilter8i;
void (*VP8EncHFilter8i)(uint8_t*, uint8_t*, int, int, int, int) = HFilter8i;
void (*VP8EncSimpleVFilter16i)(uint8_t*, int, int) = SimpleVFilter16i;
void (*VP8EncSimpleHFilter16i)(uint8_t*, int, int) = SimpleHFilter16i;
//------------------------------------------------------------------------------
// Paragraph 15.4: compute the inner-edge filtering strength
@@ -266,14 +93,14 @@ static void DoFilter(const VP8EncIterator* const it, int level) {
memcpy(y_dst, it->yuv_out_, YUV_SIZE * sizeof(uint8_t));
if (enc->filter_hdr_.simple_ == 1) { // simple
VP8EncSimpleHFilter16i(y_dst, BPS, limit);
VP8EncSimpleVFilter16i(y_dst, BPS, limit);
VP8SimpleHFilter16i(y_dst, BPS, limit);
VP8SimpleVFilter16i(y_dst, BPS, limit);
} else { // complex
const int hev_thresh = (level >= 40) ? 2 : (level >= 15) ? 1 : 0;
VP8EncHFilter16i(y_dst, BPS, limit, ilevel, hev_thresh);
VP8EncHFilter8i(u_dst, v_dst, BPS, limit, ilevel, hev_thresh);
VP8EncVFilter16i(y_dst, BPS, limit, ilevel, hev_thresh);
VP8EncVFilter8i(u_dst, v_dst, BPS, limit, ilevel, hev_thresh);
VP8HFilter16i(y_dst, BPS, limit, ilevel, hev_thresh);
VP8HFilter8i(u_dst, v_dst, BPS, limit, ilevel, hev_thresh);
VP8VFilter16i(y_dst, BPS, limit, ilevel, hev_thresh);
VP8VFilter8i(u_dst, v_dst, BPS, limit, ilevel, hev_thresh);
}
}
@@ -387,7 +214,6 @@ static double GetMBSSIM(const uint8_t* yuv1, const uint8_t* yuv2) {
void VP8InitFilter(VP8EncIterator* const it) {
if (it->lf_stats_ != NULL) {
int s, i;
InitTables();
for (s = 0; s < NUM_MB_SEGMENTS; s++) {
for (i = 0; i < MAX_LF_LEVELS; i++) {
(*it->lf_stats_)[s][i] = 0;
@@ -468,4 +294,3 @@ void VP8AdjustFilterStrength(VP8EncIterator* const it) {
}
// -----------------------------------------------------------------------------

View File

@@ -11,8 +11,6 @@
//
// Author: Skal (pascal.massimino@gmail.com)
#include <assert.h>
#include <stdlib.h>
#include <string.h>
#include <math.h>
@@ -23,19 +21,6 @@
#define SEGMENT_VISU 0
#define DEBUG_SEARCH 0 // useful to track search convergence
// On-the-fly info about the current set of residuals. Handy to avoid
// passing zillions of params.
typedef struct {
int first;
int last;
const int16_t* coeffs;
int coeff_type;
ProbaArray* prob;
StatsArray* stats;
CostArray* cost;
} VP8Residual;
//------------------------------------------------------------------------------
// multi-pass convergence
@@ -142,83 +127,6 @@ static int FinalizeSkipProba(VP8Encoder* const enc) {
return size;
}
//------------------------------------------------------------------------------
// Recording of token probabilities.
static void ResetTokenStats(VP8Encoder* const enc) {
VP8Proba* const proba = &enc->proba_;
memset(proba->stats_, 0, sizeof(proba->stats_));
}
// Record proba context used
static int Record(int bit, proba_t* const stats) {
proba_t p = *stats;
if (p >= 0xffff0000u) { // an overflow is inbound.
p = ((p + 1u) >> 1) & 0x7fff7fffu; // -> divide the stats by 2.
}
// record bit count (lower 16 bits) and increment total count (upper 16 bits).
p += 0x00010000u + bit;
*stats = p;
return bit;
}
// We keep the table free variant around for reference, in case.
#define USE_LEVEL_CODE_TABLE
// Simulate block coding, but only record statistics.
// Note: no need to record the fixed probas.
static int RecordCoeffs(int ctx, const VP8Residual* const res) {
int n = res->first;
// should be stats[VP8EncBands[n]], but it's equivalent for n=0 or 1
proba_t* s = res->stats[n][ctx];
if (res->last < 0) {
Record(0, s + 0);
return 0;
}
while (n <= res->last) {
int v;
Record(1, s + 0); // order of record doesn't matter
while ((v = res->coeffs[n++]) == 0) {
Record(0, s + 1);
s = res->stats[VP8EncBands[n]][0];
}
Record(1, s + 1);
if (!Record(2u < (unsigned int)(v + 1), s + 2)) { // v = -1 or 1
s = res->stats[VP8EncBands[n]][1];
} else {
v = abs(v);
#if !defined(USE_LEVEL_CODE_TABLE)
if (!Record(v > 4, s + 3)) {
if (Record(v != 2, s + 4))
Record(v == 4, s + 5);
} else if (!Record(v > 10, s + 6)) {
Record(v > 6, s + 7);
} else if (!Record((v >= 3 + (8 << 2)), s + 8)) {
Record((v >= 3 + (8 << 1)), s + 9);
} else {
Record((v >= 3 + (8 << 3)), s + 10);
}
#else
if (v > MAX_VARIABLE_LEVEL)
v = MAX_VARIABLE_LEVEL;
{
const int bits = VP8LevelCodes[v - 1][1];
int pattern = VP8LevelCodes[v - 1][0];
int i;
for (i = 0; (pattern >>= 1) != 0; ++i) {
const int mask = 2 << i;
if (pattern & 1) Record(!!(bits & mask), s + 3 + i);
}
}
#endif
s = res->stats[VP8EncBands[n]][2];
}
}
if (n < 16) Record(0, s + 0);
return 1;
}
// Collect statistics and deduce probabilities for next coding pass.
// Return the total bit-cost for coding the probability updates.
static int CalcTokenProba(int nb, int total) {
@@ -231,6 +139,11 @@ static int BranchCost(int nb, int total, int proba) {
return nb * VP8BitCost(1, proba) + (total - nb) * VP8BitCost(0, proba);
}
static void ResetTokenStats(VP8Encoder* const enc) {
VP8Proba* const proba = &enc->proba_;
memset(proba->stats_, 0, sizeof(proba->stats_));
}
static int FinalizeTokenProbas(VP8Proba* const proba) {
int has_changed = 0;
int size = 0;
@@ -308,131 +221,6 @@ static void SetSegmentProbas(VP8Encoder* const enc) {
}
}
//------------------------------------------------------------------------------
// helper functions for residuals struct VP8Residual.
static void InitResidual(int first, int coeff_type,
VP8Encoder* const enc, VP8Residual* const res) {
res->coeff_type = coeff_type;
res->prob = enc->proba_.coeffs_[coeff_type];
res->stats = enc->proba_.stats_[coeff_type];
res->cost = enc->proba_.level_cost_[coeff_type];
res->first = first;
}
static void SetResidualCoeffs(const int16_t* const coeffs,
VP8Residual* const res) {
int n;
res->last = -1;
for (n = 15; n >= res->first; --n) {
if (coeffs[n]) {
res->last = n;
break;
}
}
res->coeffs = coeffs;
}
//------------------------------------------------------------------------------
// Mode costs
static int GetResidualCost(int ctx0, const VP8Residual* const res) {
int n = res->first;
// should be prob[VP8EncBands[n]], but it's equivalent for n=0 or 1
int p0 = res->prob[n][ctx0][0];
const uint16_t* t = res->cost[n][ctx0];
int cost;
if (res->last < 0) {
return VP8BitCost(0, p0);
}
cost = VP8BitCost(1, p0);
for (; n < res->last; ++n) {
const int v = abs(res->coeffs[n]);
const int b = VP8EncBands[n + 1];
const int ctx = (v >= 2) ? 2 : v;
cost += VP8LevelCost(t, v);
t = res->cost[b][ctx];
// the masking trick is faster than "if (v) cost += ..." with clang
cost += (v ? ~0U : 0) & VP8BitCost(1, res->prob[b][ctx][0]);
}
// Last coefficient is always non-zero
{
const int v = abs(res->coeffs[n]);
assert(v != 0);
cost += VP8LevelCost(t, v);
if (n < 15) {
const int b = VP8EncBands[n + 1];
const int ctx = (v == 1) ? 1 : 2;
const int last_p0 = res->prob[b][ctx][0];
cost += VP8BitCost(0, last_p0);
}
}
return cost;
}
int VP8GetCostLuma4(VP8EncIterator* const it, const int16_t levels[16]) {
const int x = (it->i4_ & 3), y = (it->i4_ >> 2);
VP8Residual res;
VP8Encoder* const enc = it->enc_;
int R = 0;
int ctx;
InitResidual(0, 3, enc, &res);
ctx = it->top_nz_[x] + it->left_nz_[y];
SetResidualCoeffs(levels, &res);
R += GetResidualCost(ctx, &res);
return R;
}
int VP8GetCostLuma16(VP8EncIterator* const it, const VP8ModeScore* const rd) {
VP8Residual res;
VP8Encoder* const enc = it->enc_;
int x, y;
int R = 0;
VP8IteratorNzToBytes(it); // re-import the non-zero context
// DC
InitResidual(0, 1, enc, &res);
SetResidualCoeffs(rd->y_dc_levels, &res);
R += GetResidualCost(it->top_nz_[8] + it->left_nz_[8], &res);
// AC
InitResidual(1, 0, enc, &res);
for (y = 0; y < 4; ++y) {
for (x = 0; x < 4; ++x) {
const int ctx = it->top_nz_[x] + it->left_nz_[y];
SetResidualCoeffs(rd->y_ac_levels[x + y * 4], &res);
R += GetResidualCost(ctx, &res);
it->top_nz_[x] = it->left_nz_[y] = (res.last >= 0);
}
}
return R;
}
int VP8GetCostUV(VP8EncIterator* const it, const VP8ModeScore* const rd) {
VP8Residual res;
VP8Encoder* const enc = it->enc_;
int ch, x, y;
int R = 0;
VP8IteratorNzToBytes(it); // re-import the non-zero context
InitResidual(0, 2, enc, &res);
for (ch = 0; ch <= 2; ch += 2) {
for (y = 0; y < 2; ++y) {
for (x = 0; x < 2; ++x) {
const int ctx = it->top_nz_[4 + ch + x] + it->left_nz_[4 + ch + y];
SetResidualCoeffs(rd->uv_levels[ch * 2 + x + y * 2], &res);
R += GetResidualCost(ctx, &res);
it->top_nz_[4 + ch + x] = it->left_nz_[4 + ch + y] = (res.last >= 0);
}
}
}
return R;
}
//------------------------------------------------------------------------------
// Coefficient coding
@@ -521,32 +309,32 @@ static void CodeResiduals(VP8BitWriter* const bw, VP8EncIterator* const it,
pos1 = VP8BitWriterPos(bw);
if (i16) {
InitResidual(0, 1, enc, &res);
SetResidualCoeffs(rd->y_dc_levels, &res);
VP8InitResidual(0, 1, enc, &res);
VP8SetResidualCoeffs(rd->y_dc_levels, &res);
it->top_nz_[8] = it->left_nz_[8] =
PutCoeffs(bw, it->top_nz_[8] + it->left_nz_[8], &res);
InitResidual(1, 0, enc, &res);
VP8InitResidual(1, 0, enc, &res);
} else {
InitResidual(0, 3, enc, &res);
VP8InitResidual(0, 3, enc, &res);
}
// luma-AC
for (y = 0; y < 4; ++y) {
for (x = 0; x < 4; ++x) {
const int ctx = it->top_nz_[x] + it->left_nz_[y];
SetResidualCoeffs(rd->y_ac_levels[x + y * 4], &res);
VP8SetResidualCoeffs(rd->y_ac_levels[x + y * 4], &res);
it->top_nz_[x] = it->left_nz_[y] = PutCoeffs(bw, ctx, &res);
}
}
pos2 = VP8BitWriterPos(bw);
// U/V
InitResidual(0, 2, enc, &res);
VP8InitResidual(0, 2, enc, &res);
for (ch = 0; ch <= 2; ch += 2) {
for (y = 0; y < 2; ++y) {
for (x = 0; x < 2; ++x) {
const int ctx = it->top_nz_[4 + ch + x] + it->left_nz_[4 + ch + y];
SetResidualCoeffs(rd->uv_levels[ch * 2 + x + y * 2], &res);
VP8SetResidualCoeffs(rd->uv_levels[ch * 2 + x + y * 2], &res);
it->top_nz_[4 + ch + x] = it->left_nz_[4 + ch + y] =
PutCoeffs(bw, ctx, &res);
}
@@ -571,33 +359,33 @@ static void RecordResiduals(VP8EncIterator* const it,
VP8IteratorNzToBytes(it);
if (it->mb_->type_ == 1) { // i16x16
InitResidual(0, 1, enc, &res);
SetResidualCoeffs(rd->y_dc_levels, &res);
VP8InitResidual(0, 1, enc, &res);
VP8SetResidualCoeffs(rd->y_dc_levels, &res);
it->top_nz_[8] = it->left_nz_[8] =
RecordCoeffs(it->top_nz_[8] + it->left_nz_[8], &res);
InitResidual(1, 0, enc, &res);
VP8RecordCoeffs(it->top_nz_[8] + it->left_nz_[8], &res);
VP8InitResidual(1, 0, enc, &res);
} else {
InitResidual(0, 3, enc, &res);
VP8InitResidual(0, 3, enc, &res);
}
// luma-AC
for (y = 0; y < 4; ++y) {
for (x = 0; x < 4; ++x) {
const int ctx = it->top_nz_[x] + it->left_nz_[y];
SetResidualCoeffs(rd->y_ac_levels[x + y * 4], &res);
it->top_nz_[x] = it->left_nz_[y] = RecordCoeffs(ctx, &res);
VP8SetResidualCoeffs(rd->y_ac_levels[x + y * 4], &res);
it->top_nz_[x] = it->left_nz_[y] = VP8RecordCoeffs(ctx, &res);
}
}
// U/V
InitResidual(0, 2, enc, &res);
VP8InitResidual(0, 2, enc, &res);
for (ch = 0; ch <= 2; ch += 2) {
for (y = 0; y < 2; ++y) {
for (x = 0; x < 2; ++x) {
const int ctx = it->top_nz_[4 + ch + x] + it->left_nz_[4 + ch + y];
SetResidualCoeffs(rd->uv_levels[ch * 2 + x + y * 2], &res);
VP8SetResidualCoeffs(rd->uv_levels[ch * 2 + x + y * 2], &res);
it->top_nz_[4 + ch + x] = it->left_nz_[4 + ch + y] =
RecordCoeffs(ctx, &res);
VP8RecordCoeffs(ctx, &res);
}
}
}
@@ -610,8 +398,8 @@ static void RecordResiduals(VP8EncIterator* const it,
#if !defined(DISABLE_TOKEN_BUFFER)
static void RecordTokens(VP8EncIterator* const it, const VP8ModeScore* const rd,
VP8TBuffer* const tokens) {
static int RecordTokens(VP8EncIterator* const it, const VP8ModeScore* const rd,
VP8TBuffer* const tokens) {
int x, y, ch;
VP8Residual res;
VP8Encoder* const enc = it->enc_;
@@ -619,44 +407,45 @@ static void RecordTokens(VP8EncIterator* const it, const VP8ModeScore* const rd,
VP8IteratorNzToBytes(it);
if (it->mb_->type_ == 1) { // i16x16
const int ctx = it->top_nz_[8] + it->left_nz_[8];
InitResidual(0, 1, enc, &res);
SetResidualCoeffs(rd->y_dc_levels, &res);
VP8InitResidual(0, 1, enc, &res);
VP8SetResidualCoeffs(rd->y_dc_levels, &res);
it->top_nz_[8] = it->left_nz_[8] =
VP8RecordCoeffTokens(ctx, 1,
res.first, res.last, res.coeffs, tokens);
RecordCoeffs(ctx, &res);
InitResidual(1, 0, enc, &res);
VP8RecordCoeffs(ctx, &res);
VP8InitResidual(1, 0, enc, &res);
} else {
InitResidual(0, 3, enc, &res);
VP8InitResidual(0, 3, enc, &res);
}
// luma-AC
for (y = 0; y < 4; ++y) {
for (x = 0; x < 4; ++x) {
const int ctx = it->top_nz_[x] + it->left_nz_[y];
SetResidualCoeffs(rd->y_ac_levels[x + y * 4], &res);
VP8SetResidualCoeffs(rd->y_ac_levels[x + y * 4], &res);
it->top_nz_[x] = it->left_nz_[y] =
VP8RecordCoeffTokens(ctx, res.coeff_type,
res.first, res.last, res.coeffs, tokens);
RecordCoeffs(ctx, &res);
VP8RecordCoeffs(ctx, &res);
}
}
// U/V
InitResidual(0, 2, enc, &res);
VP8InitResidual(0, 2, enc, &res);
for (ch = 0; ch <= 2; ch += 2) {
for (y = 0; y < 2; ++y) {
for (x = 0; x < 2; ++x) {
const int ctx = it->top_nz_[4 + ch + x] + it->left_nz_[4 + ch + y];
SetResidualCoeffs(rd->uv_levels[ch * 2 + x + y * 2], &res);
VP8SetResidualCoeffs(rd->uv_levels[ch * 2 + x + y * 2], &res);
it->top_nz_[4 + ch + x] = it->left_nz_[4 + ch + y] =
VP8RecordCoeffTokens(ctx, 2,
res.first, res.last, res.coeffs, tokens);
RecordCoeffs(ctx, &res);
VP8RecordCoeffs(ctx, &res);
}
}
}
VP8IteratorBytesToNz(it);
return !tokens->error_;
}
#endif // !DISABLE_TOKEN_BUFFER
@@ -719,7 +508,7 @@ static void StoreSideInfo(const VP8EncIterator* const it) {
}
case 7: *info = mb->alpha_; break;
default: *info = 0; break;
};
}
}
#if SEGMENT_VISU // visualize segments and prediction modes
SetBlock(it->yuv_out_ + Y_OFF, mb->segment_ * 64, 16);
@@ -863,7 +652,10 @@ static int PreLoopInitialize(VP8Encoder* const enc) {
for (p = 0; ok && p < enc->num_parts_; ++p) {
ok = VP8BitWriterInit(enc->parts_ + p, bytes_per_parts);
}
if (!ok) VP8EncFreeBitWriters(enc); // malloc error occurred
if (!ok) {
VP8EncFreeBitWriters(enc); // malloc error occurred
WebPEncodingSetError(enc->pic_, VP8_ENC_ERROR_OUT_OF_MEMORY);
}
return ok;
}
@@ -928,11 +720,6 @@ int VP8EncLoop(VP8Encoder* const enc) {
} else { // reset predictors after a skip
ResetAfterSkip(&it);
}
#ifdef WEBP_EXPERIMENTAL_FEATURES
if (enc->use_layer_) {
VP8EncCodeLayerBlock(&it);
}
#endif
StoreSideInfo(&it);
VP8StoreFilterStats(&it);
VP8IteratorExport(&it);
@@ -997,14 +784,13 @@ int VP8EncTokenLoop(VP8Encoder* const enc) {
cnt = max_count;
}
VP8Decimate(&it, &info, rd_opt);
RecordTokens(&it, &info, &enc->tokens_);
ok = RecordTokens(&it, &info, &enc->tokens_);
if (!ok) {
WebPEncodingSetError(enc->pic_, VP8_ENC_ERROR_OUT_OF_MEMORY);
break;
}
size_p0 += info.H;
distortion += info.D;
#ifdef WEBP_EXPERIMENTAL_FEATURES
if (enc->use_layer_) {
VP8EncCodeLayerBlock(&it);
}
#endif
if (is_last_pass) {
StoreSideInfo(&it);
VP8StoreFilterStats(&it);

View File

@@ -10,31 +10,64 @@
// Author: Jyrki Alakuijala (jyrki@google.com)
//
#ifdef HAVE_CONFIG_H
#include "config.h"
#include "../webp/config.h"
#endif
#include <math.h>
#include <stdio.h>
#include "./backward_references.h"
#include "./histogram.h"
#include "../dsp/lossless.h"
#include "../utils/utils.h"
#define MAX_COST 1.e38
// Number of partitions for the three dominant (literal, red and blue) symbol
// costs.
#define NUM_PARTITIONS 4
// The size of the bin-hash corresponding to the three dominant costs.
#define BIN_SIZE (NUM_PARTITIONS * NUM_PARTITIONS * NUM_PARTITIONS)
static void HistogramClear(VP8LHistogram* const p) {
memset(p->literal_, 0, sizeof(p->literal_));
memset(p->red_, 0, sizeof(p->red_));
memset(p->blue_, 0, sizeof(p->blue_));
memset(p->alpha_, 0, sizeof(p->alpha_));
memset(p->distance_, 0, sizeof(p->distance_));
p->bit_cost_ = 0;
uint32_t* const literal = p->literal_;
const int cache_bits = p->palette_code_bits_;
const int histo_size = VP8LGetHistogramSize(cache_bits);
memset(p, 0, histo_size);
p->palette_code_bits_ = cache_bits;
p->literal_ = literal;
}
static void HistogramCopy(const VP8LHistogram* const src,
VP8LHistogram* const dst) {
uint32_t* const dst_literal = dst->literal_;
const int dst_cache_bits = dst->palette_code_bits_;
const int histo_size = VP8LGetHistogramSize(dst_cache_bits);
assert(src->palette_code_bits_ == dst_cache_bits);
memcpy(dst, src, histo_size);
dst->literal_ = dst_literal;
}
int VP8LGetHistogramSize(int cache_bits) {
const int literal_size = VP8LHistogramNumCodes(cache_bits);
const size_t total_size = sizeof(VP8LHistogram) + sizeof(int) * literal_size;
assert(total_size <= (size_t)0x7fffffff);
return (int)total_size;
}
void VP8LFreeHistogram(VP8LHistogram* const histo) {
WebPSafeFree(histo);
}
void VP8LFreeHistogramSet(VP8LHistogramSet* const histo) {
WebPSafeFree(histo);
}
void VP8LHistogramStoreRefs(const VP8LBackwardRefs* const refs,
VP8LHistogram* const histo) {
int i;
for (i = 0; i < refs->size; ++i) {
VP8LHistogramAddSinglePixOrCopy(histo, &refs->refs[i]);
VP8LRefsCursor c = VP8LRefsCursorInit(refs);
while (VP8LRefsCursorOk(&c)) {
VP8LHistogramAddSinglePixOrCopy(histo, c.cur_pos);
VP8LRefsCursorNext(&c);
}
}
@@ -53,13 +86,24 @@ void VP8LHistogramInit(VP8LHistogram* const p, int palette_code_bits) {
HistogramClear(p);
}
VP8LHistogram* VP8LAllocateHistogram(int cache_bits) {
VP8LHistogram* histo = NULL;
const int total_size = VP8LGetHistogramSize(cache_bits);
uint8_t* const memory = (uint8_t*)WebPSafeMalloc(total_size, sizeof(*memory));
if (memory == NULL) return NULL;
histo = (VP8LHistogram*)memory;
// literal_ won't necessary be aligned.
histo->literal_ = (uint32_t*)(memory + sizeof(VP8LHistogram));
VP8LHistogramInit(histo, cache_bits);
return histo;
}
VP8LHistogramSet* VP8LAllocateHistogramSet(int size, int cache_bits) {
int i;
VP8LHistogramSet* set;
VP8LHistogram* bulk;
const uint64_t total_size = sizeof(*set)
+ (uint64_t)size * sizeof(*set->histograms)
+ (uint64_t)size * sizeof(**set->histograms);
const size_t total_size = sizeof(*set)
+ sizeof(*set->histograms) * size
+ (size_t)VP8LGetHistogramSize(cache_bits) * size;
uint8_t* memory = (uint8_t*)WebPSafeMalloc(total_size, sizeof(*memory));
if (memory == NULL) return NULL;
@@ -67,12 +111,15 @@ VP8LHistogramSet* VP8LAllocateHistogramSet(int size, int cache_bits) {
memory += sizeof(*set);
set->histograms = (VP8LHistogram**)memory;
memory += size * sizeof(*set->histograms);
bulk = (VP8LHistogram*)memory;
set->max_size = size;
set->size = size;
for (i = 0; i < size; ++i) {
set->histograms[i] = bulk + i;
set->histograms[i] = (VP8LHistogram*)memory;
// literal_ won't necessary be aligned.
set->histograms[i]->literal_ = (uint32_t*)(memory + sizeof(VP8LHistogram));
VP8LHistogramInit(set->histograms[i], cache_bits);
// There's no padding/alignment between successive histograms.
memory += VP8LGetHistogramSize(cache_bits);
}
return set;
}
@@ -87,36 +134,21 @@ void VP8LHistogramAddSinglePixOrCopy(VP8LHistogram* const histo,
++histo->literal_[PixOrCopyLiteral(v, 1)];
++histo->blue_[PixOrCopyLiteral(v, 0)];
} else if (PixOrCopyIsCacheIdx(v)) {
int literal_ix = 256 + NUM_LENGTH_CODES + PixOrCopyCacheIdx(v);
const int literal_ix =
NUM_LITERAL_CODES + NUM_LENGTH_CODES + PixOrCopyCacheIdx(v);
++histo->literal_[literal_ix];
} else {
int code, extra_bits;
VP8LPrefixEncodeBits(PixOrCopyLength(v), &code, &extra_bits);
++histo->literal_[256 + code];
++histo->literal_[NUM_LITERAL_CODES + code];
VP8LPrefixEncodeBits(PixOrCopyDistance(v), &code, &extra_bits);
++histo->distance_[code];
}
}
static double BitsEntropy(const int* const array, int n) {
double retval = 0.;
int sum = 0;
int nonzeros = 0;
int max_val = 0;
int i;
static WEBP_INLINE double BitsEntropyRefine(int nonzeros, int sum, int max_val,
double retval) {
double mix;
for (i = 0; i < n; ++i) {
if (array[i] != 0) {
sum += array[i];
++nonzeros;
retval -= VP8LFastSLog2(array[i]);
if (max_val < array[i]) {
max_val = array[i];
}
}
}
retval += VP8LFastSLog2(sum);
if (nonzeros < 5) {
if (nonzeros <= 1) {
return 0;
@@ -147,95 +179,142 @@ static double BitsEntropy(const int* const array, int n) {
}
}
// Returns the cost encode the rle-encoded entropy code.
// The constants in this function are experimental.
static double HuffmanCost(const int* const population, int length) {
static double BitsEntropy(const uint32_t* const array, int n) {
double retval = 0.;
uint32_t sum = 0;
int nonzeros = 0;
uint32_t max_val = 0;
int i;
for (i = 0; i < n; ++i) {
if (array[i] != 0) {
sum += array[i];
++nonzeros;
retval -= VP8LFastSLog2(array[i]);
if (max_val < array[i]) {
max_val = array[i];
}
}
}
retval += VP8LFastSLog2(sum);
return BitsEntropyRefine(nonzeros, sum, max_val, retval);
}
static double BitsEntropyCombined(const uint32_t* const X,
const uint32_t* const Y, int n) {
double retval = 0.;
int sum = 0;
int nonzeros = 0;
int max_val = 0;
int i;
for (i = 0; i < n; ++i) {
const int xy = X[i] + Y[i];
if (xy != 0) {
sum += xy;
++nonzeros;
retval -= VP8LFastSLog2(xy);
if (max_val < xy) {
max_val = xy;
}
}
}
retval += VP8LFastSLog2(sum);
return BitsEntropyRefine(nonzeros, sum, max_val, retval);
}
static double InitialHuffmanCost(void) {
// Small bias because Huffman code length is typically not stored in
// full length.
static const int kHuffmanCodeOfHuffmanCodeSize = CODE_LENGTH_CODES * 3;
static const double kSmallBias = 9.1;
double retval = kHuffmanCodeOfHuffmanCodeSize - kSmallBias;
int streak = 0;
int i = 0;
for (; i < length - 1; ++i) {
++streak;
if (population[i] == population[i + 1]) {
continue;
}
last_streak_hack:
// population[i] points now to the symbol in the streak of same values.
if (streak > 3) {
if (population[i] == 0) {
retval += 1.5625 + 0.234375 * streak;
} else {
retval += 2.578125 + 0.703125 * streak;
}
} else {
if (population[i] == 0) {
retval += 1.796875 * streak;
} else {
retval += 3.28125 * streak;
}
}
streak = 0;
}
if (i == length - 1) {
++streak;
goto last_streak_hack;
}
return kHuffmanCodeOfHuffmanCodeSize - kSmallBias;
}
// Finalize the Huffman cost based on streak numbers and length type (<3 or >=3)
static double FinalHuffmanCost(const VP8LStreaks* const stats) {
double retval = InitialHuffmanCost();
retval += stats->counts[0] * 1.5625 + 0.234375 * stats->streaks[0][1];
retval += stats->counts[1] * 2.578125 + 0.703125 * stats->streaks[1][1];
retval += 1.796875 * stats->streaks[0][0];
retval += 3.28125 * stats->streaks[1][0];
return retval;
}
static double PopulationCost(const int* const population, int length) {
// Trampolines
static double HuffmanCost(const uint32_t* const population, int length) {
const VP8LStreaks stats = VP8LHuffmanCostCount(population, length);
return FinalHuffmanCost(&stats);
}
static double HuffmanCostCombined(const uint32_t* const X,
const uint32_t* const Y, int length) {
const VP8LStreaks stats = VP8LHuffmanCostCombinedCount(X, Y, length);
return FinalHuffmanCost(&stats);
}
// Aggregated costs
static double PopulationCost(const uint32_t* const population, int length) {
return BitsEntropy(population, length) + HuffmanCost(population, length);
}
static double ExtraCost(const int* const population, int length) {
int i;
double cost = 0.;
for (i = 2; i < length - 2; ++i) cost += (i >> 1) * population[i + 2];
return cost;
static double GetCombinedEntropy(const uint32_t* const X,
const uint32_t* const Y, int length) {
return BitsEntropyCombined(X, Y, length) + HuffmanCostCombined(X, Y, length);
}
// Estimates the Entropy + Huffman + other block overhead size cost.
double VP8LHistogramEstimateBits(const VP8LHistogram* const p) {
return PopulationCost(p->literal_, VP8LHistogramNumCodes(p))
+ PopulationCost(p->red_, 256)
+ PopulationCost(p->blue_, 256)
+ PopulationCost(p->alpha_, 256)
+ PopulationCost(p->distance_, NUM_DISTANCE_CODES)
+ ExtraCost(p->literal_ + 256, NUM_LENGTH_CODES)
+ ExtraCost(p->distance_, NUM_DISTANCE_CODES);
return
PopulationCost(p->literal_, VP8LHistogramNumCodes(p->palette_code_bits_))
+ PopulationCost(p->red_, NUM_LITERAL_CODES)
+ PopulationCost(p->blue_, NUM_LITERAL_CODES)
+ PopulationCost(p->alpha_, NUM_LITERAL_CODES)
+ PopulationCost(p->distance_, NUM_DISTANCE_CODES)
+ VP8LExtraCost(p->literal_ + NUM_LITERAL_CODES, NUM_LENGTH_CODES)
+ VP8LExtraCost(p->distance_, NUM_DISTANCE_CODES);
}
double VP8LHistogramEstimateBitsBulk(const VP8LHistogram* const p) {
return BitsEntropy(p->literal_, VP8LHistogramNumCodes(p))
+ BitsEntropy(p->red_, 256)
+ BitsEntropy(p->blue_, 256)
+ BitsEntropy(p->alpha_, 256)
+ BitsEntropy(p->distance_, NUM_DISTANCE_CODES)
+ ExtraCost(p->literal_ + 256, NUM_LENGTH_CODES)
+ ExtraCost(p->distance_, NUM_DISTANCE_CODES);
return
BitsEntropy(p->literal_, VP8LHistogramNumCodes(p->palette_code_bits_))
+ BitsEntropy(p->red_, NUM_LITERAL_CODES)
+ BitsEntropy(p->blue_, NUM_LITERAL_CODES)
+ BitsEntropy(p->alpha_, NUM_LITERAL_CODES)
+ BitsEntropy(p->distance_, NUM_DISTANCE_CODES)
+ VP8LExtraCost(p->literal_ + NUM_LITERAL_CODES, NUM_LENGTH_CODES)
+ VP8LExtraCost(p->distance_, NUM_DISTANCE_CODES);
}
// -----------------------------------------------------------------------------
// Various histogram combine/cost-eval functions
// Adds 'in' histogram to 'out'
static void HistogramAdd(const VP8LHistogram* const in,
VP8LHistogram* const out) {
int i;
for (i = 0; i < PIX_OR_COPY_CODES_MAX; ++i) {
out->literal_[i] += in->literal_[i];
}
for (i = 0; i < NUM_DISTANCE_CODES; ++i) {
out->distance_[i] += in->distance_[i];
}
for (i = 0; i < 256; ++i) {
out->red_[i] += in->red_[i];
out->blue_[i] += in->blue_[i];
out->alpha_[i] += in->alpha_[i];
}
static int GetCombinedHistogramEntropy(const VP8LHistogram* const a,
const VP8LHistogram* const b,
double cost_threshold,
double* cost) {
const int palette_code_bits = a->palette_code_bits_;
assert(a->palette_code_bits_ == b->palette_code_bits_);
*cost += GetCombinedEntropy(a->literal_, b->literal_,
VP8LHistogramNumCodes(palette_code_bits));
*cost += VP8LExtraCostCombined(a->literal_ + NUM_LITERAL_CODES,
b->literal_ + NUM_LITERAL_CODES,
NUM_LENGTH_CODES);
if (*cost > cost_threshold) return 0;
*cost += GetCombinedEntropy(a->red_, b->red_, NUM_LITERAL_CODES);
if (*cost > cost_threshold) return 0;
*cost += GetCombinedEntropy(a->blue_, b->blue_, NUM_LITERAL_CODES);
if (*cost > cost_threshold) return 0;
*cost += GetCombinedEntropy(a->alpha_, b->alpha_, NUM_LITERAL_CODES);
if (*cost > cost_threshold) return 0;
*cost += GetCombinedEntropy(a->distance_, b->distance_, NUM_DISTANCE_CODES);
*cost += VP8LExtraCostCombined(a->distance_, b->distance_,
NUM_DISTANCE_CODES);
if (*cost > cost_threshold) return 0;
return 1;
}
// Performs out = a + b, computing the cost C(a+b) - C(a) - C(b) while comparing
@@ -250,41 +329,14 @@ static double HistogramAddEval(const VP8LHistogram* const a,
double cost_threshold) {
double cost = 0;
const double sum_cost = a->bit_cost_ + b->bit_cost_;
int i;
cost_threshold += sum_cost;
// palette_code_bits_ is part of the cost evaluation for literal_.
// TODO(skal): remove/simplify this palette_code_bits_?
out->palette_code_bits_ =
(a->palette_code_bits_ > b->palette_code_bits_) ? a->palette_code_bits_ :
b->palette_code_bits_;
for (i = 0; i < PIX_OR_COPY_CODES_MAX; ++i) {
out->literal_[i] = a->literal_[i] + b->literal_[i];
if (GetCombinedHistogramEntropy(a, b, cost_threshold, &cost)) {
VP8LHistogramAdd(a, b, out);
out->bit_cost_ = cost;
out->palette_code_bits_ = a->palette_code_bits_;
}
cost += PopulationCost(out->literal_, VP8LHistogramNumCodes(out));
cost += ExtraCost(out->literal_ + 256, NUM_LENGTH_CODES);
if (cost > cost_threshold) return cost;
for (i = 0; i < 256; ++i) out->red_[i] = a->red_[i] + b->red_[i];
cost += PopulationCost(out->red_, 256);
if (cost > cost_threshold) return cost;
for (i = 0; i < 256; ++i) out->blue_[i] = a->blue_[i] + b->blue_[i];
cost += PopulationCost(out->blue_, 256);
if (cost > cost_threshold) return cost;
for (i = 0; i < NUM_DISTANCE_CODES; ++i) {
out->distance_[i] = a->distance_[i] + b->distance_[i];
}
cost += PopulationCost(out->distance_, NUM_DISTANCE_CODES);
cost += ExtraCost(out->distance_, NUM_DISTANCE_CODES);
if (cost > cost_threshold) return cost;
for (i = 0; i < 256; ++i) out->alpha_[i] = a->alpha_[i] + b->alpha_[i];
cost += PopulationCost(out->alpha_, 256);
out->bit_cost_ = cost;
return cost - sum_cost;
}
@@ -294,52 +346,92 @@ static double HistogramAddEval(const VP8LHistogram* const a,
static double HistogramAddThresh(const VP8LHistogram* const a,
const VP8LHistogram* const b,
double cost_threshold) {
int tmp[PIX_OR_COPY_CODES_MAX]; // <= max storage we'll need
int i;
double cost = -a->bit_cost_;
for (i = 0; i < PIX_OR_COPY_CODES_MAX; ++i) {
tmp[i] = a->literal_[i] + b->literal_[i];
}
// note that the tests are ordered so that the usually largest
// cost shares come first.
cost += PopulationCost(tmp, VP8LHistogramNumCodes(a));
cost += ExtraCost(tmp + 256, NUM_LENGTH_CODES);
if (cost > cost_threshold) return cost;
for (i = 0; i < 256; ++i) tmp[i] = a->red_[i] + b->red_[i];
cost += PopulationCost(tmp, 256);
if (cost > cost_threshold) return cost;
for (i = 0; i < 256; ++i) tmp[i] = a->blue_[i] + b->blue_[i];
cost += PopulationCost(tmp, 256);
if (cost > cost_threshold) return cost;
for (i = 0; i < NUM_DISTANCE_CODES; ++i) {
tmp[i] = a->distance_[i] + b->distance_[i];
}
cost += PopulationCost(tmp, NUM_DISTANCE_CODES);
cost += ExtraCost(tmp, NUM_DISTANCE_CODES);
if (cost > cost_threshold) return cost;
for (i = 0; i < 256; ++i) tmp[i] = a->alpha_[i] + b->alpha_[i];
cost += PopulationCost(tmp, 256);
GetCombinedHistogramEntropy(a, b, cost_threshold, &cost);
return cost;
}
// -----------------------------------------------------------------------------
static void HistogramBuildImage(int xsize, int histo_bits,
const VP8LBackwardRefs* const backward_refs,
VP8LHistogramSet* const image) {
int i;
// The structure to keep track of cost range for the three dominant entropy
// symbols.
// TODO(skal): Evaluate if float can be used here instead of double for
// representing the entropy costs.
typedef struct {
double literal_max_;
double literal_min_;
double red_max_;
double red_min_;
double blue_max_;
double blue_min_;
} DominantCostRange;
static void DominantCostRangeInit(DominantCostRange* const c) {
c->literal_max_ = 0.;
c->literal_min_ = MAX_COST;
c->red_max_ = 0.;
c->red_min_ = MAX_COST;
c->blue_max_ = 0.;
c->blue_min_ = MAX_COST;
}
static void UpdateDominantCostRange(
const VP8LHistogram* const h, DominantCostRange* const c) {
if (c->literal_max_ < h->literal_cost_) c->literal_max_ = h->literal_cost_;
if (c->literal_min_ > h->literal_cost_) c->literal_min_ = h->literal_cost_;
if (c->red_max_ < h->red_cost_) c->red_max_ = h->red_cost_;
if (c->red_min_ > h->red_cost_) c->red_min_ = h->red_cost_;
if (c->blue_max_ < h->blue_cost_) c->blue_max_ = h->blue_cost_;
if (c->blue_min_ > h->blue_cost_) c->blue_min_ = h->blue_cost_;
}
static void UpdateHistogramCost(VP8LHistogram* const h) {
const double alpha_cost = PopulationCost(h->alpha_, NUM_LITERAL_CODES);
const double distance_cost =
PopulationCost(h->distance_, NUM_DISTANCE_CODES) +
VP8LExtraCost(h->distance_, NUM_DISTANCE_CODES);
const int num_codes = VP8LHistogramNumCodes(h->palette_code_bits_);
h->literal_cost_ = PopulationCost(h->literal_, num_codes) +
VP8LExtraCost(h->literal_ + NUM_LITERAL_CODES,
NUM_LENGTH_CODES);
h->red_cost_ = PopulationCost(h->red_, NUM_LITERAL_CODES);
h->blue_cost_ = PopulationCost(h->blue_, NUM_LITERAL_CODES);
h->bit_cost_ = h->literal_cost_ + h->red_cost_ + h->blue_cost_ +
alpha_cost + distance_cost;
}
static int GetBinIdForEntropy(double min, double max, double val) {
const double range = max - min + 1e-6;
const double delta = val - min;
return (int)(NUM_PARTITIONS * delta / range);
}
// TODO(vikasa): Evaluate, if there's any correlation between red & blue.
static int GetHistoBinIndex(
const VP8LHistogram* const h, const DominantCostRange* const c) {
const int bin_id =
GetBinIdForEntropy(c->blue_min_, c->blue_max_, h->blue_cost_) +
NUM_PARTITIONS * GetBinIdForEntropy(c->red_min_, c->red_max_,
h->red_cost_) +
NUM_PARTITIONS * NUM_PARTITIONS * GetBinIdForEntropy(c->literal_min_,
c->literal_max_,
h->literal_cost_);
assert(bin_id < BIN_SIZE);
return bin_id;
}
// Construct the histograms from backward references.
static void HistogramBuild(
int xsize, int histo_bits, const VP8LBackwardRefs* const backward_refs,
VP8LHistogramSet* const image_histo) {
int x = 0, y = 0;
const int histo_xsize = VP8LSubSampleSize(xsize, histo_bits);
VP8LHistogram** const histograms = image->histograms;
VP8LHistogram** const histograms = image_histo->histograms;
VP8LRefsCursor c = VP8LRefsCursorInit(backward_refs);
assert(histo_bits > 0);
for (i = 0; i < backward_refs->size; ++i) {
const PixOrCopy* const v = &backward_refs->refs[i];
// Construct the Histo from a given backward references.
while (VP8LRefsCursorOk(&c)) {
const PixOrCopy* const v = c.cur_pos;
const int ix = (y >> histo_bits) * histo_xsize + (x >> histo_bits);
VP8LHistogramAddSinglePixOrCopy(histograms[ix], v);
x += PixOrCopyLength(v);
@@ -347,9 +439,119 @@ static void HistogramBuildImage(int xsize, int histo_bits,
x -= xsize;
++y;
}
VP8LRefsCursorNext(&c);
}
}
// Copies the histograms and computes its bit_cost.
static void HistogramCopyAndAnalyze(
VP8LHistogramSet* const orig_histo, VP8LHistogramSet* const image_histo) {
int i;
const int histo_size = orig_histo->size;
VP8LHistogram** const orig_histograms = orig_histo->histograms;
VP8LHistogram** const histograms = image_histo->histograms;
for (i = 0; i < histo_size; ++i) {
VP8LHistogram* const histo = orig_histograms[i];
UpdateHistogramCost(histo);
// Copy histograms from orig_histo[] to image_histo[].
HistogramCopy(histo, histograms[i]);
}
}
// Partition histograms to different entropy bins for three dominant (literal,
// red and blue) symbol costs and compute the histogram aggregate bit_cost.
static void HistogramAnalyzeEntropyBin(
VP8LHistogramSet* const image_histo, int16_t* const bin_map) {
int i;
VP8LHistogram** const histograms = image_histo->histograms;
const int histo_size = image_histo->size;
const int bin_depth = histo_size + 1;
DominantCostRange cost_range;
DominantCostRangeInit(&cost_range);
// Analyze the dominant (literal, red and blue) entropy costs.
for (i = 0; i < histo_size; ++i) {
VP8LHistogram* const histo = histograms[i];
UpdateDominantCostRange(histo, &cost_range);
}
// bin-hash histograms on three of the dominant (literal, red and blue)
// symbol costs.
for (i = 0; i < histo_size; ++i) {
int num_histos;
VP8LHistogram* const histo = histograms[i];
const int16_t bin_id = (int16_t)GetHistoBinIndex(histo, &cost_range);
const int bin_offset = bin_id * bin_depth;
// bin_map[n][0] for every bin 'n' maintains the counter for the number of
// histograms in that bin.
// Get and increment the num_histos in that bin.
num_histos = ++bin_map[bin_offset];
assert(bin_offset + num_histos < bin_depth * BIN_SIZE);
// Add histogram i'th index at num_histos (last) position in the bin_map.
bin_map[bin_offset + num_histos] = i;
}
}
// Compact the histogram set by moving the valid one left in the set to the
// head and moving the ones that have been merged to other histograms towards
// the end.
// TODO(vikasa): Evaluate if this method can be avoided by altering the code
// logic of HistogramCombineEntropyBin main loop.
static void HistogramCompactBins(VP8LHistogramSet* const image_histo) {
int start = 0;
int end = image_histo->size - 1;
VP8LHistogram** const histograms = image_histo->histograms;
while (start < end) {
while (start <= end && histograms[start] != NULL &&
histograms[start]->bit_cost_ != 0.) {
++start;
}
while (start <= end && histograms[end]->bit_cost_ == 0.) {
histograms[end] = NULL;
--end;
}
if (start < end) {
assert(histograms[start] != NULL);
assert(histograms[end] != NULL);
HistogramCopy(histograms[end], histograms[start]);
histograms[end] = NULL;
--end;
}
}
image_histo->size = end + 1;
}
static void HistogramCombineEntropyBin(VP8LHistogramSet* const image_histo,
VP8LHistogram* const histos,
int16_t* const bin_map, int bin_depth,
double combine_cost_factor) {
int bin_id;
VP8LHistogram* cur_combo = histos;
VP8LHistogram** const histograms = image_histo->histograms;
for (bin_id = 0; bin_id < BIN_SIZE; ++bin_id) {
const int bin_offset = bin_id * bin_depth;
const int num_histos = bin_map[bin_offset];
const int idx1 = bin_map[bin_offset + 1];
int n;
for (n = 2; n <= num_histos; ++n) {
const int idx2 = bin_map[bin_offset + n];
const double bit_cost_idx2 = histograms[idx2]->bit_cost_;
if (bit_cost_idx2 > 0.) {
const double bit_cost_thresh = -bit_cost_idx2 * combine_cost_factor;
const double curr_cost_diff =
HistogramAddEval(histograms[idx1], histograms[idx2],
cur_combo, bit_cost_thresh);
if (curr_cost_diff < bit_cost_thresh) {
HistogramCopy(cur_combo, histograms[idx1]);
histograms[idx2]->bit_cost_ = 0.;
}
}
}
}
HistogramCompactBins(image_histo);
}
static uint32_t MyRand(uint32_t *seed) {
*seed *= 16807U;
if (*seed == 0) {
@@ -358,48 +560,45 @@ static uint32_t MyRand(uint32_t *seed) {
return *seed;
}
static int HistogramCombine(const VP8LHistogramSet* const in,
VP8LHistogramSet* const out, int iter_mult,
int num_pairs, int num_tries_no_success) {
int ok = 0;
int i, iter;
static void HistogramCombine(VP8LHistogramSet* const image_histo,
VP8LHistogramSet* const histos, int quality) {
int iter;
uint32_t seed = 0;
int tries_with_no_success = 0;
int out_size = in->size;
const int outer_iters = in->size * iter_mult;
int image_histo_size = image_histo->size;
const int iter_mult = (quality < 25) ? 2 : 2 + (quality - 25) / 8;
const int outer_iters = image_histo_size * iter_mult;
const int num_pairs = image_histo_size / 2;
const int num_tries_no_success = outer_iters / 2;
const int min_cluster_size = 2;
VP8LHistogram* const histos = (VP8LHistogram*)malloc(2 * sizeof(*histos));
VP8LHistogram* cur_combo = histos + 0; // trial merged histogram
VP8LHistogram* best_combo = histos + 1; // best merged histogram so far
if (histos == NULL) goto End;
VP8LHistogram** const histograms = image_histo->histograms;
VP8LHistogram* cur_combo = histos->histograms[0]; // trial histogram
VP8LHistogram* best_combo = histos->histograms[1]; // best histogram so far
// Copy histograms from in[] to out[].
assert(in->size <= out->size);
for (i = 0; i < in->size; ++i) {
in->histograms[i]->bit_cost_ = VP8LHistogramEstimateBits(in->histograms[i]);
*out->histograms[i] = *in->histograms[i];
}
// Collapse similar histograms in 'out'.
for (iter = 0; iter < outer_iters && out_size >= min_cluster_size; ++iter) {
// Collapse similar histograms in 'image_histo'.
for (iter = 0;
iter < outer_iters && image_histo_size >= min_cluster_size;
++iter) {
double best_cost_diff = 0.;
int best_idx1 = -1, best_idx2 = 1;
int j;
const int num_tries = (num_pairs < out_size) ? num_pairs : out_size;
const int num_tries =
(num_pairs < image_histo_size) ? num_pairs : image_histo_size;
seed += iter;
for (j = 0; j < num_tries; ++j) {
double curr_cost_diff;
// Choose two histograms at random and try to combine them.
const uint32_t idx1 = MyRand(&seed) % out_size;
const uint32_t idx1 = MyRand(&seed) % image_histo_size;
const uint32_t tmp = (j & 7) + 1;
const uint32_t diff = (tmp < 3) ? tmp : MyRand(&seed) % (out_size - 1);
const uint32_t idx2 = (idx1 + diff + 1) % out_size;
const uint32_t diff =
(tmp < 3) ? tmp : MyRand(&seed) % (image_histo_size - 1);
const uint32_t idx2 = (idx1 + diff + 1) % image_histo_size;
if (idx1 == idx2) {
continue;
}
// Calculate cost reduction on combining.
curr_cost_diff = HistogramAddEval(out->histograms[idx1],
out->histograms[idx2],
curr_cost_diff = HistogramAddEval(histograms[idx1], histograms[idx2],
cur_combo, best_cost_diff);
if (curr_cost_diff < best_cost_diff) { // found a better pair?
{ // swap cur/best combo histograms
@@ -414,12 +613,12 @@ static int HistogramCombine(const VP8LHistogramSet* const in,
}
if (best_idx1 >= 0) {
*out->histograms[best_idx1] = *best_combo;
HistogramCopy(best_combo, histograms[best_idx1]);
// swap best_idx2 slot with last one (which is now unused)
--out_size;
if (best_idx2 != out_size) {
out->histograms[best_idx2] = out->histograms[out_size];
out->histograms[out_size] = NULL; // just for sanity check.
--image_histo_size;
if (best_idx2 != image_histo_size) {
HistogramCopy(histograms[image_histo_size], histograms[best_idx2]);
histograms[image_histo_size] = NULL;
}
tries_with_no_success = 0;
}
@@ -427,38 +626,28 @@ static int HistogramCombine(const VP8LHistogramSet* const in,
break;
}
}
out->size = out_size;
ok = 1;
End:
free(histos);
return ok;
image_histo->size = image_histo_size;
}
// -----------------------------------------------------------------------------
// Histogram refinement
// What is the bit cost of moving square_histogram from cur_symbol to candidate.
static double HistogramDistance(const VP8LHistogram* const square_histogram,
const VP8LHistogram* const candidate,
double cost_threshold) {
return HistogramAddThresh(candidate, square_histogram, cost_threshold);
}
// Find the best 'out' histogram for each of the 'in' histograms.
// Note: we assume that out[]->bit_cost_ is already up-to-date.
static void HistogramRemap(const VP8LHistogramSet* const in,
const VP8LHistogramSet* const out,
static void HistogramRemap(const VP8LHistogramSet* const orig_histo,
const VP8LHistogramSet* const image_histo,
uint16_t* const symbols) {
int i;
for (i = 0; i < in->size; ++i) {
VP8LHistogram** const orig_histograms = orig_histo->histograms;
VP8LHistogram** const histograms = image_histo->histograms;
for (i = 0; i < orig_histo->size; ++i) {
int best_out = 0;
double best_bits =
HistogramDistance(in->histograms[i], out->histograms[0], 1.e38);
HistogramAddThresh(histograms[0], orig_histograms[i], MAX_COST);
int k;
for (k = 1; k < out->size; ++k) {
for (k = 1; k < image_histo->size; ++k) {
const double cur_bits =
HistogramDistance(in->histograms[i], out->histograms[k], best_bits);
HistogramAddThresh(histograms[k], orig_histograms[i], best_bits);
if (cur_bits < best_bits) {
best_bits = cur_bits;
best_out = k;
@@ -468,45 +657,85 @@ static void HistogramRemap(const VP8LHistogramSet* const in,
}
// Recompute each out based on raw and symbols.
for (i = 0; i < out->size; ++i) {
HistogramClear(out->histograms[i]);
for (i = 0; i < image_histo->size; ++i) {
HistogramClear(histograms[i]);
}
for (i = 0; i < in->size; ++i) {
HistogramAdd(in->histograms[i], out->histograms[symbols[i]]);
for (i = 0; i < orig_histo->size; ++i) {
const int idx = symbols[i];
VP8LHistogramAdd(orig_histograms[i], histograms[idx], histograms[idx]);
}
}
static double GetCombineCostFactor(int histo_size, int quality) {
double combine_cost_factor = 0.16;
if (histo_size > 256) combine_cost_factor /= 2.;
if (histo_size > 512) combine_cost_factor /= 2.;
if (histo_size > 1024) combine_cost_factor /= 2.;
if (quality <= 50) combine_cost_factor /= 2.;
return combine_cost_factor;
}
int VP8LGetHistoImageSymbols(int xsize, int ysize,
const VP8LBackwardRefs* const refs,
int quality, int histo_bits, int cache_bits,
VP8LHistogramSet* const image_in,
VP8LHistogramSet* const image_histo,
uint16_t* const histogram_symbols) {
int ok = 0;
const int histo_xsize = histo_bits ? VP8LSubSampleSize(xsize, histo_bits) : 1;
const int histo_ysize = histo_bits ? VP8LSubSampleSize(ysize, histo_bits) : 1;
const int histo_image_raw_size = histo_xsize * histo_ysize;
const int image_histo_raw_size = histo_xsize * histo_ysize;
// Heuristic params for HistogramCombine().
const int num_tries_no_success = 8 + (quality >> 1);
const int iter_mult = (quality < 27) ? 1 : 1 + ((quality - 27) >> 4);
const int num_pairs = (quality < 25) ? 10 : (5 * quality) >> 3;
// The bin_map for every bin follows following semantics:
// bin_map[n][0] = num_histo; // The number of histograms in that bin.
// bin_map[n][1] = index of first histogram in that bin;
// bin_map[n][num_histo] = index of last histogram in that bin;
// bin_map[n][num_histo + 1] ... bin_map[n][bin_depth - 1] = un-used indices.
const int bin_depth = image_histo_raw_size + 1;
int16_t* bin_map = NULL;
VP8LHistogramSet* const histos = VP8LAllocateHistogramSet(2, cache_bits);
VP8LHistogramSet* const orig_histo =
VP8LAllocateHistogramSet(image_histo_raw_size, cache_bits);
VP8LHistogramSet* const image_out =
VP8LAllocateHistogramSet(histo_image_raw_size, cache_bits);
if (image_out == NULL) return 0;
// Build histogram image.
HistogramBuildImage(xsize, histo_bits, refs, image_out);
// Collapse similar histograms.
if (!HistogramCombine(image_out, image_in, iter_mult, num_pairs,
num_tries_no_success)) {
if (orig_histo == NULL || histos == NULL) {
goto Error;
}
// Don't attempt linear bin-partition heuristic for:
// histograms of small sizes, as bin_map will be very sparse and;
// Higher qualities (> 90), to preserve the compression gains at those
// quality settings.
if (orig_histo->size > 2 * BIN_SIZE && quality < 90) {
const int bin_map_size = bin_depth * BIN_SIZE;
bin_map = (int16_t*)WebPSafeCalloc(bin_map_size, sizeof(*bin_map));
if (bin_map == NULL) goto Error;
}
// Construct the histograms from backward references.
HistogramBuild(xsize, histo_bits, refs, orig_histo);
// Copies the histograms and computes its bit_cost.
HistogramCopyAndAnalyze(orig_histo, image_histo);
if (bin_map != NULL) {
const double combine_cost_factor =
GetCombineCostFactor(image_histo_raw_size, quality);
HistogramAnalyzeEntropyBin(orig_histo, bin_map);
// Collapse histograms with similar entropy.
HistogramCombineEntropyBin(image_histo, histos->histograms[0],
bin_map, bin_depth, combine_cost_factor);
}
// Collapse similar histograms by random histogram-pair compares.
HistogramCombine(image_histo, histos, quality);
// Find the optimal map from original histograms to the final ones.
HistogramRemap(image_out, image_in, histogram_symbols);
HistogramRemap(orig_histo, image_histo, histogram_symbols);
ok = 1;
Error:
free(image_out);
Error:
WebPSafeFree(bin_map);
VP8LFreeHistogramSet(orig_histo);
VP8LFreeHistogramSet(histos);
return ok;
}

View File

@@ -32,18 +32,21 @@ extern "C" {
typedef struct {
// literal_ contains green literal, palette-code and
// copy-length-prefix histogram
int literal_[PIX_OR_COPY_CODES_MAX];
int red_[256];
int blue_[256];
int alpha_[256];
uint32_t* literal_; // Pointer to the allocated buffer for literal.
uint32_t red_[NUM_LITERAL_CODES];
uint32_t blue_[NUM_LITERAL_CODES];
uint32_t alpha_[NUM_LITERAL_CODES];
// Backward reference prefix-code histogram.
int distance_[NUM_DISTANCE_CODES];
uint32_t distance_[NUM_DISTANCE_CODES];
int palette_code_bits_;
double bit_cost_; // cached value of VP8LHistogramEstimateBits(this)
double bit_cost_; // cached value of VP8LHistogramEstimateBits(this)
double literal_cost_; // Cached values of dominant entropy costs:
double red_cost_; // literal, red & blue.
double blue_cost_;
} VP8LHistogram;
// Collection of histograms with fixed capacity, allocated as one
// big memory chunk. Can be destroyed by simply calling 'free()'.
// big memory chunk. Can be destroyed by calling WebPSafeFree().
typedef struct {
int size; // number of slots currently in use
int max_size; // maximum capacity
@@ -59,6 +62,9 @@ void VP8LHistogramCreate(VP8LHistogram* const p,
const VP8LBackwardRefs* const refs,
int palette_code_bits);
// Return the size of the histogram for a given palette_code_bits.
int VP8LGetHistogramSize(int palette_code_bits);
// Set the palette_code_bits and reset the stats.
void VP8LHistogramInit(VP8LHistogram* const p, int palette_code_bits);
@@ -66,10 +72,21 @@ void VP8LHistogramInit(VP8LHistogram* const p, int palette_code_bits);
void VP8LHistogramStoreRefs(const VP8LBackwardRefs* const refs,
VP8LHistogram* const histo);
// Free the memory allocated for the histogram.
void VP8LFreeHistogram(VP8LHistogram* const histo);
// Free the memory allocated for the histogram set.
void VP8LFreeHistogramSet(VP8LHistogramSet* const histo);
// Allocate an array of pointer to histograms, allocated and initialized
// using 'cache_bits'. Return NULL in case of memory error.
VP8LHistogramSet* VP8LAllocateHistogramSet(int size, int cache_bits);
// Allocate and initialize histogram object with specified 'cache_bits'.
// Returns NULL in case of memory error.
// Special case of VP8LAllocateHistogramSet, with size equals 1.
VP8LHistogram* VP8LAllocateHistogram(int cache_bits);
// Accumulate a token 'v' into a histogram.
void VP8LHistogramAddSinglePixOrCopy(VP8LHistogram* const histo,
const PixOrCopy* const v);
@@ -82,9 +99,9 @@ double VP8LHistogramEstimateBits(const VP8LHistogram* const p);
// represent the entropy code itself.
double VP8LHistogramEstimateBitsBulk(const VP8LHistogram* const p);
static WEBP_INLINE int VP8LHistogramNumCodes(const VP8LHistogram* const p) {
return 256 + NUM_LENGTH_CODES +
((p->palette_code_bits_ > 0) ? (1 << p->palette_code_bits_) : 0);
static WEBP_INLINE int VP8LHistogramNumCodes(int palette_code_bits) {
return NUM_LITERAL_CODES + NUM_LENGTH_CODES +
((palette_code_bits > 0) ? (1 << palette_code_bits) : 0);
}
// Builds the histogram image.

View File

@@ -1,44 +0,0 @@
// Copyright 2011 Google Inc. All Rights Reserved.
//
// Use of this source code is governed by a BSD-style license
// that can be found in the COPYING file in the root of the source
// tree. An additional intellectual property rights grant can be found
// in the file PATENTS. All contributing project authors may
// be found in the AUTHORS file in the root of the source tree.
// -----------------------------------------------------------------------------
//
// Enhancement layer (for YUV444/422)
//
// Author: Skal (pascal.massimino@gmail.com)
#include <stdlib.h>
#include "./vp8enci.h"
//------------------------------------------------------------------------------
void VP8EncInitLayer(VP8Encoder* const enc) {
enc->use_layer_ = (enc->pic_->u0 != NULL);
enc->layer_data_size_ = 0;
enc->layer_data_ = NULL;
if (enc->use_layer_) {
VP8BitWriterInit(&enc->layer_bw_, enc->mb_w_ * enc->mb_h_ * 3);
}
}
void VP8EncCodeLayerBlock(VP8EncIterator* it) {
(void)it; // remove a warning
}
int VP8EncFinishLayer(VP8Encoder* const enc) {
if (enc->use_layer_) {
enc->layer_data_ = VP8BitWriterFinish(&enc->layer_bw_);
enc->layer_data_size_ = VP8BitWriterSize(&enc->layer_bw_);
}
return 1;
}
void VP8EncDeleteLayer(VP8Encoder* enc) {
free(enc->layer_data_);
}

Some files were not shown because too many files have changed in this diff Show More