webp/doc/webp-container-spec.txt

<!--

Although you may be viewing an alternate representation, this document
is sourced in Markdown, a light-duty markup scheme, and is optimized for
the [kramdown](http://kramdown.rubyforge.org/) transformer.

See the accompanying README. External link targets are referenced at the
end of this file.

-->


WebP Container Specification
============================

* TOC placeholder
{:toc}


Introduction
------------

WebP is an image format that uses either (i) the VP8 key frame encoding
to compress image data in a lossy way, or (ii) the WebP lossless encoding
(and possibly other encodings in the future). These encoding schemes should
make it more efficient than currently used formats. It is optimized for fast
image transfer over the network (e.g., for websites). The WebP format has
feature parity (color profile, metadata, animation etc) with other formats as
well. This document describes the structure of a WebP file.

The WebP container (i.e., RIFF container for WebP) allows feature support over
and above the basic use case of WebP (i.e., a file containing a single image
encoded as a VP8 key frame). The WebP container provides additional support
for:

  * **Lossless compression.** An image can be losslessly compressed, using the
    WebP Lossless Format.

  * **Metadata.** An image may have metadata stored in EXIF or XMP formats.

  * **Transparency.** An image may have transparency, i.e., an alpha channel.

  * **Color Profile.** An image may have an embedded ICC profile as described
    by the [International Color Consortium][iccspec].

  * **Animation.** An image may have multiple frames with pauses between them,
    making it an animation.

  * **Image Fragmentation.** A single bitstream in WebP has an inherent
    limitation for width or height of 2^14 pixels, and, when using VP8, a 512
    KiB limit on the size of the first compressed partition. To support larger
    images, the format supports images that are composed of multiple fragments,
    each encoded as a separate bitstream. All fragments logically form a single
    image: they have common metadata, color profile, etc. Image fragmentation
    may also improve efficiency for larger images, e.g., grass can be encoded
    differently than sky.

The key words "MUST", "MUST NOT", "REQUIRED", "SHALL", "SHALL NOT",
"SHOULD", "SHOULD NOT", "RECOMMENDED", "MAY", and "OPTIONAL" in this
document are to be interpreted as described in [RFC 2119][].

**Note:** Out of the features mentioned above, lossy compression, lossless
compression, transparency, metadata, color profile and animation are finalized
and are to be considered stable. On the other hand, image fragmentation is
experimental as of now, and is open to discussion, feedback and comments.
The same is indicated using annotation "_status: experimental_" in the relevant
sections of this document.

Terminology &amp; Basics
------------------------

A WebP file contains either a still image (i.e., an encoded matrix of pixels)
or an [animation](#animation). Optionally, it can also contain transparency
information, color profile and metadata. In case we need to refer only to the
matrix of pixels, we will call it the _canvas_ of the image.

Below are additional terms used throughout this document:

_Reader/Writer_

: Code that reads WebP files is referred to as a _reader_, while code that
writes them is referred to as a _writer_.

_uint16_

: A 16-bit, little-endian, unsigned integer.

_uint24_

: A 24-bit, little-endian, unsigned integer.

_uint32_

: A 32-bit, little-endian, unsigned integer.

_FourCC_

: A _FourCC_ (four-character code) is a _uint32_ created by concatenating four
  ASCII characters in little-endian order.

_1-based_

: An unsigned integer field storing values offset by `-1`. e.g., Such a field
would store value _25_ as _24_.

RIFF file format
----------------
The WebP file format is based on the RIFF (resource interchange file format)
document format.

The basic element of a RIFF file is a _chunk_. It consists of:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         Chunk FourCC                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          Chunk Size                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         Chunk Payload                         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Chunk FourCC: 32 bits

: ASCII four-character code used for chunk identification.

Chunk Size: 32 bits (_uint32_)

: The size of the chunk not including this field, the chunk identifier or
  padding.

Chunk Payload: _Chunk Size_ bytes

: The data payload. If _Chunk Size_ is odd, a single padding byte -- that
  SHOULD be `0` -- is added.

_ChunkHeader('ABCD')_

: This is used to describe the _FourCC_ and _Chunk Size_ header of individual
  chunks, where 'ABCD' is the FourCC for the chunk. This element's
  size is 8 bytes.

**Note:** RIFF has a convention that all-uppercase chunk FourCCs are standard
chunks that apply to any RIFF file format, while FourCCs specific to a file
format are all lowercase. WebP does not follow this convention.

WebP file header
----------------

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |      'R'      |      'I'      |      'F'      |      'F'      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           File Size                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |      'W'      |      'E'      |      'B'      |      'P'      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

'RIFF': 32 bits

: The ASCII characters 'R' 'I' 'F' 'F'.

File Size: 32 bits (_uint32_)

: The size of the file in bytes starting at offset 8. The maximum value of
this field is 2^32 minus 10 bytes and thus the size of the whole file is at
most 4GiB minus 2 bytes.

'WEBP': 32 bits

: The ASCII characters 'W' 'E' 'B' 'P'.

A WebP file MUST begin with a RIFF header with the FourCC 'WEBP'. The file size
in the header is the total size of the chunks that follow plus `4` bytes for
the 'WEBP' FourCC. The file SHOULD NOT contain anything after it. As the size
of any chunk is even, the size given by the RIFF header is also even. The
contents of individual chunks will be described in the following sections.

Simple file format (lossy)
--------------------------

This layout SHOULD be used if the image requires _lossy_ encoding and does not
require transparency or other advanced features provided by the extended format.
Files with this layout are smaller and supported by older software.

Simple WebP (lossy) file format:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    WebP file header (12 bytes)                |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          VP8 chunk                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

VP8 chunk:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('VP8 ')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           VP8 data                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

VP8 data: _Chunk Size_ bytes

: VP8 bitstream data.

The VP8 bitstream format specification can be found at [VP8 Data Format and
Decoding Guide][vp8spec]. Note that the VP8 frame header contains the VP8 frame
width and height. That is assumed to be the width and height of the canvas.

The VP8 specification describes how to decode the image into Y'CbCr
format. To convert to RGB, Rec. 601 SHOULD be used.

Simple file format (lossless)
-----------------------------

**Note:** Older readers may not support files using the lossless format.

This layout SHOULD be used if the image requires _lossless_ encoding (with an
optional transparency channel) and does not require advanced features provided
by the extended format.

Simple WebP (lossless) file format:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                    WebP file header (12 bytes)                |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                          VP8L chunk                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

VP8L chunk:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('VP8L')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                           VP8L data                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

VP8L data: _Chunk Size_ bytes

: VP8L bitstream data.

The current specification of the VP8L bitstream can be found at
[WebP Lossless Bitstream Format][webpllspec]. Note that the VP8L header
contains the VP8L image width and height. That is assumed to be the width
and height of the canvas.

Extended file format
--------------------

**Note:** Older readers may not support files using the extended format.

An extended format file consists of:

  * A 'VP8X' chunk with information about features used in the file.

  * An optional 'ICCP' chunk with color profile.

  * An optional 'ANIM' chunk with animation control data.

  * Image data.

  * An optional 'EXIF' chunk with EXIF metadata.

  * An optional 'XMP ' chunk with XMP metadata.

  * An optional list of [unknown chunks](#unknown-chunks). _\[status: experimental\]_

For a _still image_, the _image data_ consists of a single frame, whereas for
an _animated image_, it consists of multiple frames. More details about frames
can be found in the [Animation](#animation) section.

Moreover, each frame can be fragmented or non-fragmented, as will be described
in the [Extended WebP file header](#extended_header) section. More details about
fragments can be found in the [Fragments](#fragments) section.

All chunks SHOULD be placed in the same order as listed above. If a chunk
appears in the wrong place, the file is invalid, but readers MAY parse the
file, ignoring the chunks that come too late.

**Rationale:** Setting the order of chunks should allow quicker file
parsing. For example, if an 'ALPH' chunk does not appear in its required
position, a decoder can choose to stop searching for it. The rule of
ignoring late chunks should make programs that need to do a full search
give the same results as the ones stopping early.

Extended WebP file header:
{:#extended_header}

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                   WebP file header (12 bytes)                 |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('VP8X')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Rsv|I|L|E|X|A|F|                   Reserved                    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |          Canvas Width Minus One               |             ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ...  Canvas Height Minus One    |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Reserved (Rsv): 2 bits

: SHOULD be `0`.

ICC profile (I): 1 bit

: Set if the file contains an ICC profile.

Alpha (L): 1 bit

: Set if any of the frames of the image contain transparency information
("alpha").

EXIF metadata (E): 1 bit

: Set if the file contains EXIF metadata.

XMP metadata (X): 1 bit

: Set if the file contains XMP metadata.

Animation (A): 1 bit

: Set if this is an animated image. Data in 'ANIM' and 'ANMF' chunks should be
used to control the animation.

Image Fragmentation (F): 1 bit _\[status: experimental\]_

: Set if any of the frames in the image are represented by fragments.

Reserved: 24 bits

: SHOULD be `0`.

Canvas Width Minus One: 24 bits

: _1-based_ width of the canvas in pixels.
  The actual canvas width is '1 + Canvas Width Minus One'

Canvas Height Minus One: 24 bits

: _1-based_ height of the canvas in pixels.
  The actual canvas height is '1 + Canvas Height Minus One'

The product of _Canvas Width_ and _Canvas Height_ MUST be at most `2^32 - 1`.

Future specifications MAY add more fields.

### Chunks

#### Animation

An animation is controlled by ANIM and ANMF chunks.

ANIM Chunk:
{:#anim_chunk}

For an animated image, this chunk contains the _global parameters_ of the
animation.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('ANIM')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       Background Color                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |          Loop Count           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Background Color: 32 bits (_uint32_)

: The default background color of the canvas in \[Blue, Green, Red, Alpha\]
byte order. This color MAY be used to fill the unused space on the canvas around
the frames, as well as the transparent pixels of the first frame. Background
color is also used when disposal method is `1`.

**Note**:

  * Background color MAY contain a transparency value (alpha), even if the
    _Alpha_ flag in [VP8X chunk](#extended_header) is unset.

  * Viewer applications SHOULD treat the background color value as a hint, and
    are not required to use it.

Loop Count: 16 bits (_uint16_)

: The number of times to loop the animation. `0` means infinitely.

This chunk MUST appear if the _Animation_ flag in the VP8X chunk is set.
If the _Animation_ flag is not set and this chunk is present, it
SHOULD be ignored.


ANMF chunk:

For animated images, this chunk contains information about a _single_ frame.
If the _Animation flag_ is not set, then this chunk SHOULD NOT be present.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('ANMF')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        Frame X                |             ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ...          Frame Y            |   Frame Width Minus One     ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ...             |           Frame Height Minus One              |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                 Frame Duration                |  Reserved |B|D|
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                         Frame Data                            |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Frame X: 24 bits (_uint24_)

: The X coordinate of the upper left corner of the frame is `Frame X * 2`

Frame Y: 24 bits (_uint24_)

: The Y coordinate of the upper left corner of the frame is `Frame Y * 2`

Frame Width Minus One: 24 bits (_uint24_)

: The _1-based_ width of the frame.
  The frame width is `1 + Frame Width Minus One`

Frame Height Minus One: 24 bits (_uint24_)

: The _1-based_ height of the frame.
  The frame height is `1 + Frame Height Minus One`

Frame Duration: 24 bits (_uint24_)

: The time to wait before displaying the next frame, in 1 millisecond units.
In particular, frame duration of 0 is useful when one wants to update multiple
areas of the canvas at once during the animation.

Reserved: 6 bits

: SHOULD be 0.

Blending method (B): 1 bit

: Indicates how transparent pixels of _the current frame_ are to be blended with
corresponding pixels of the previous canvas:

  * `0`: Use alpha blending. After disposing of the previous frame, render the
    current frame on the canvas using [alpha-blending](#alpha-blending). If the
    current frame does not have an alpha channel, assume alpha value of 255,
    effectively replacing the rectangle.

  * `1`: Do not blend. After disposing of the previous frame, render the
    current frame on the canvas by overwriting the rectangle covered by the
    current frame.

Disposal method (D): 1 bit

: Indicates how _the current frame_ is to be treated after it has been displayed
(before rendering the next frame) on the canvas:

  * `0`: Do not dispose. Leave the canvas as is.

  * `1`: Dispose to background color. Fill the _rectangle_ on the canvas covered
    by the _current frame_ with background color specified in the
    [ANIM chunk](#anim_chunk).

**Notes**:

  * The frame disposal only applies to the _frame rectangle_, that is, the
    rectangle defined by _Frame X_, _Frame Y_, _frame width_ and _frame height_.
    It may or may not cover the whole canvas.

{:#alpha-blending}
  * **Alpha-blending**:

    Given that each of the R, G, B and A channels is 8-bit, and the RGB
    channels are _not premultiplied_ by alpha, the formula for blending
    'dst' onto 'src' is:

~~~~~
    blend.A = src.A + dst.A * (1 - src.A / 255)
    if blend.A = 0 then
      blend.RGB = 0
    else
      blend.RGB = (src.RGB * src.A +
                   dst.RGB * dst.A * (1 - src.A / 255)) / blend.A
~~~~~

  * Alpha-blending SHOULD be done in linear color space, by taking into account
    the [color profile](#color-profile) of the image. If the color profile is
    not present, sRGB is to be assumed. (Note that sRGB also needs to be
    linearized due to a gamma of ~2.2).

Frame Data: _Chunk Size_ - `16` bytes

: For a fragmented frame, it consists of multiple [fragment chunks](#fragments).

: For a non-fragmented frame, it consists of:

  * An optional [alpha subchunk](#alpha) for the frame.

  * A [bitstream subchunk](#bitstream-vp8vp8l) for the frame.

  * An optional list of [unknown chunks](#unknown-chunks).

**Note**: The 'ANMF' payload, _Frame Data_ above, consists of individual
_padded_ chunks as described by the [RIFF file format](#riff-file-format).

#### Fragments _\[status: experimental\]_

For images that are represented by fragments, this chunk contains data for
a single fragment. If the _Image Fragmentation Flag_ is not set, then this chunk
SHOULD NOT be present.

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('FRGM')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                  Fragment X                   |             ...
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    ...       Fragment Y            |         Fragment Data         |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Fragment X: 24 bits (_uint24_)

: The X coordinate of the upper left corner of the fragment is `Fragment X * 2`

Fragment Y: 24 bits (_uint24_)

: The Y coordinate of the upper left corner of the fragment is `Fragment Y * 2`

Fragment Data: _Chunk Size_ - `6` bytes

: It contains:

  * An optional [alpha subchunk](#alpha) for the fragment.
  * The [bitstream subchunk](#bitstream-vp8vp8l) for the fragment.
  * An optional list of [unknown chunks](#unknown-chunks).

Note: The width and height of the fragment is obtained from the bitstream
subchunk.

The fragments of a frame SHOULD have the following properties:

  * They collectively cover the whole frame.

  * No pair of fragments have any overlapping region on the frame.

  * No portion of any fragment should be located outside of the canvas.

#### Alpha

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('ALPH')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |Rsv| P | F | C |     Alpha Bitstream...                        |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Reserved (Rsv): 2 bits

: SHOULD be `0`.

Pre-processing (P): 2 bits

: These INFORMATIVE bits are used to signal the pre-processing that has
been performed during compression. The decoder can use this information to
e.g. dither the values or smooth the gradients prior to display.

  * `0`: no pre-processing
  * `1`: level reduction

Filtering method (F): 2 bits

: The filtering method used:

  * `0`: None.
  * `1`: Horizontal filter.
  * `2`: Vertical filter.
  * `3`: Gradient filter.

For each pixel, filtering is performed using the following calculations.
Assume the alpha values surrounding the current `X` position are labeled as:

     C | B |
    ---+---+
     A | X |

We seek to compute the alpha value at position `X`. First, a prediction is
made depending on the filtering method:

  * Method `0`: predictor = 0
  * Method `1`: predictor = A
  * Method `2`: predictor = B
  * Method `3`: predictor = clip(A + B - C)

where `clip(v)` is equal to:

  * 0    if v < 0
  * 255  if v > 255
  * v    otherwise

The final value is derived by adding the decompressed value `X` to the
predictor and using modulo-256 arithmetic to wrap the \[256-511\] range
into the \[0-255\] one:

`alpha = (predictor + X) % 256`

There are special cases for left-most and top-most pixel positions:

  * Top-left value at location (0,0) uses 0 as predictor value. Otherwise,
  * For horizontal or gradient filtering methods, the left-most pixels at
    location (0, y) are predicted using the location (0, y-1) just above.
  * For vertical or gradient filtering methods, the top-most pixels at
    location (x, 0) are predicted using the location (x-1, 0) on the left.


Decoders are not required to use this information in any specified way.

Compression method (C): 2 bits

: The compression method used:

  * `0`: No compression.
  * `1`: Compressed using the WebP lossless format.

Alpha bitstream: _Chunk Size_ - `1` bytes

: Encoded alpha bitstream.

This optional chunk contains encoded alpha data for this frame/fragment. A
frame/fragment containing a 'VP8L' chunk SHOULD NOT contain this chunk.

**Rationale**: The transparency information is already part of the 'VP8L'
chunk.

The alpha channel data is stored as uncompressed raw data (when
compression method is '0') or compressed using the lossless format
(when the compression method is '1').

  * Raw data: consists of a byte sequence of length width * height,
    containing all the 8-bit transparency values in scan order.

  * Lossless format compression: the byte sequence is a compressed
    image-stream (as described in the [WebP Lossless Bitstream Format]
    [webpllspec]) of implicit dimension width x height. That is, this
    image-stream does NOT contain any headers describing the image dimension.

    **Rationale**: the dimension is already known from other sources,
    so storing it again would be redundant and error-prone.

    Once the image-stream is decoded into ARGB color values, following
    the process described in the lossless format specification, the
    transparency information must be extracted from the *green* channel
    of the ARGB quadruplet.

    **Rationale**: the green channel is allowed extra transformation
    steps in the specification -- unlike the other channels -- that can
    improve compression.

#### Bitstream (VP8/VP8L)

This chunk contains compressed bitstream data for a single frame/fragment.

A bitstream chunk may be either (i) a VP8 chunk, using "VP8 " (note the
significant fourth-character space) as its tag _or_ (ii) a VP8L chunk, using
"VP8L" as its tag.

The formats of VP8 and VP8L chunks are as described in sections
[Simple file format (lossy)](#simple-file-format-lossy)
and [Simple file format (lossless)](#simple-file-format-lossless) respectively.

#### Color profile

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('ICCP')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                       Color Profile                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

Color Profile: _Chunk Size_ bytes

: ICC profile.

This chunk MUST appear before the image data.

There SHOULD be at most one such chunk. If there are more such chunks, readers
MAY ignore all except the first one.
See the [ICC Specification][iccspec] for details.

If this chunk is not present, sRGB SHOULD be assumed.

#### Metadata

Metadata can be stored in 'EXIF' or 'XMP ' chunks.

There SHOULD be at most one chunk of each type ('EXIF' and 'XMP '). If there
are more such chunks, readers MAY ignore all except the first one. Also, a file
may possibly contain both 'EXIF' and 'XMP ' chunks.

The chunks are defined as follows:

EXIF chunk:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('EXIF')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        EXIF Metadata                          |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

EXIF Metadata: _Chunk Size_ bytes

: image metadata in EXIF format.


XMP chunk:

     0                   1                   2                   3
     0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1 2 3 4 5 6 7 8 9 0 1
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                      ChunkHeader('XMP ')                      |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+
    |                        XMP Metadata                           |
    +-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+-+

XMP Metadata: _Chunk Size_ bytes

: image metadata in XMP format.

Additional guidance about handling metadata can be found in the
Metadata Working Group's [Guidelines for Handling Metadata][metadata].

#### Unknown Chunks _\[status: experimental\]_

A RIFF chunk (described in [this](#terminology-amp-basics) section) whose _chunk
tag_ is different from any of the chunks described in this document, is
considered an _unknown chunk_.

**Rationale**: Allowing unknown chunks gives a provision for future extension
of the format, and also allows storage of any application-specific data.

A file MAY contain unknown chunks:

  * At the end of the file as described in [Extended WebP file
    header](#extended_header) section.
  * At the end of FRGM and ANMF chunks as described in [Fragments](#fragments)
    and [Animation](#animation) sections.

Readers SHOULD ignore these chunks. Writers SHOULD preserve them in their
original order (unless they specifically intend to modify these chunks).

### Assembling the Canvas from fragments/frames

Here we provide an overview of how a reader should assemble a canvas in case
of a fragmented-image and in case of an animated image. The notation
_VP8X.field_ means the field in the 'VP8X' chunk with the same description.

Displaying a _fragmented image_ canvas MUST be equivalent to the following
pseudocode: _\[status: experimental\]_

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
assert VP8X.flags.hasFragments
canvas ← new black image of size VP8X.canvasWidth x VP8X.canvasHeight.
frgm_params ← nil
for chunk in image_data:
    assert chunk.tag is "FRGM"
    frgm_params.fragmentX = Fragment X
    frgm_params.fragmentY = Fragment Y
    for subchunk in 'Fragment Data':
        if subchunk.tag == "ALPH":
            assert alpha subchunks not found in 'Fragment Data' earlier
            frgm_params.alpha = alpha_data
        else if subchunk.tag == "VP8 " OR subchunk.tag == "VP8L":
            assert bitstream subchunks not found in 'Fragment Data' earlier
            frgm_params.bitstream = bitstream_data
    frgm_params.fragmentWidth = Width extracted from bitstream subchunk
    frgm_params.fragmentHeight = Height extracted from bitstream subchunk
    assert VP8X.canvasWidth >=
        frgm_params.fragmentX + frgm_params.fragmentWidth
    assert VP8X.canvasHeight >=
        frgm_params.fragmentY + frgm_params.fragmentHeight
    assert fragment has the properties mentioned in "Image Fragments" section.
    render fragment with frame_params.alpha and frame_params.bitstream on canvas
    with top-left corner in (frgm_params.fragmentX, frgm_params.fragmentY).
canvas contains the decoded canvas.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Displaying an _animated image_ canvas MUST be equivalent to the following
pseudocode:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
assert VP8X.flags.hasAnimation
canvas ← new image of size VP8X.canvasWidth x VP8X.canvasHeight with
background color ANIM.background_color.
loop_count ← ANIM.loopCount
dispose_method ← ANIM.disposeMethod
if loop_count == 0:
    loop_count = ∞
frame_params ← nil
for loop = 0, ..., loop_count - 1
    assert next chunk in image_data is ANMF
    frame_params.frameX = Frame X
    frame_params.frameY = Frame Y
    frame_params.frameWidth = Frame Width Minus One + 1
    frame_params.frameHeight = Frame Height Minus One + 1
    frame_params.frameDuration = Frame Duration
    assert VP8X.canvasWidth >= frame_params.frameX + frame_params.frameWidth
    assert VP8X.canvasHeight >= frame_params.frameY + frame_params.frameHeight
    if VP8X.flags.hasFragments and first subchunk in 'Frame Data' is FRGM
        // Fragmented frame.
        frame_params.{bitstream,alpha} = canvas decoded from subchunks in
                                         'Frame Data' as per the pseudocode for
                                         _fragmented image_ above.
    else
        // Non-fragmented frame.
        for subchunk in 'Frame Data':
            if subchunk.tag == "ALPH":
                assert alpha subchunks not found in 'Frame Data' earlier
                frame_params.alpha = alpha_data
            else if subchunk.tag == "VP8 " OR subchunk.tag == "VP8L":
                assert bitstream subchunks not found in 'Frame Data' earlier
                frame_params.bitstream = bitstream_data
    render frame with frame_params.alpha and frame_params.bitstream on canvas
    with top-left corner in (frame_params.frameX, frame_params.frameY), using
    dispose method dispose_method.
    Show the contents of the image for frame_params.frameDuration * 1ms.
canvas contains the decoded canvas.
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

Example file layouts
--------------------

A lossy encoded image with alpha may look as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RIFF/WEBP
+- VP8X (descriptions of features used)
+- ALPH (alpha bitstream)
+- VP8 (bitstream)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A losslessly encoded image may look as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RIFF/WEBP
+- VP8X (descriptions of features used)
+- XYZW (unknown chunk)
+- VP8L (lossless bitstream)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A lossless image with ICC profile and XMP metadata may
look as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RIFF/WEBP
+- VP8X (descriptions of features used)
+- ICCP (color profile)
+- VP8L (lossless bitstream)
+- XMP  (metadata)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

A fragmented image may look as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RIFF/WEBP
+- VP8X (descriptions of features used)
+- FRGM (fragment1 parameters + data)
+- FRGM (fragment2 parameters + data)
+- FRGM (fragment3 parameters + data)
+- FRGM (fragment4 parameters + data)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

An animated image with EXIF metadata may look as follows:

~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~
RIFF/WEBP
+- VP8X (descriptions of features used)
+- ANIM (global animation parameters)
+- ANMF (frame1 parameters + data)
+- ANMF (frame2 parameters + data)
+- ANMF (frame3 parameters + data)
+- ANMF (frame4 parameters + data)
+- EXIF (metadata)
~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~~

[vp8spec]:  http://tools.ietf.org/html/rfc6386
[webpllspec]: https://gerrit.chromium.org/gerrit/gitweb?p=webm/libwebp.git;a=blob;f=doc/webp-lossless-bitstream-spec.txt;hb=master
[iccspec]: http://www.color.org/icc_specs2.xalter
[metadata]: http://www.metadataworkinggroup.org/pdf/mwg_guidance.pdf
[rfc 2119]: http://tools.ietf.org/html/rfc2119