doc: Add function overview and usage page

While the external headers define the API, we could really use this overview to get users started and point them to examples. Change-Id: Iba419e61d0d7723e1029a3b6e7259facfeb39522 Signed-off-by: Greg Tucker <greg.b.tucker@intel.com>
2025-07-03 09:25:20 +02:00 · 2022-02-15 16:59:31 -07:00 · 2022-02-15 16:59:31 -07:00 · ad8dce15c6
commit ad8dce15c6
parent 57846f414f
2 changed files with 202 additions and 0 deletions
--- a/1
+++ b/1
@ -14,6 +14,7 @@ INPUT                  = isa-l.h \
 			 README.md \
 			 CONTRIBUTING.md \
 			 Release_notes.txt \
 			 doc/functions.md \
 			 doc/test.md \
 			 doc/build.md
--- a/doc/functions.md
+++ b/doc/functions.md
@ -0,0 +1,201 @@
 # ISA-L Function Overview
 ISA-L is logically broken into mostly independent units based on the source
 directories of the same name.
 - erasure_codes
 - crc
 - raid
 - mem
 - igzip
 The library can also be built with subsets of available units. For example
 `$ make -f Makefile.unx units=crc` will only build a library with crc
 functions.
 ## ISA-L Functions
 ### Erasure Code Functions
 Functions pertaining to erasure codes implement a general Reed-Solomon type
 encoding for blocks of data to protect against erasure of whole blocks.
 Individual operations can be described in terms of arithmetic in the Galois
 finite field GF(2^8) with the particular field-defining primitive or reducing
 polynomial \f$ x^8 + x^4 + x^3 + x^2 + 1 \f$ (0x1d).
 For example, the function ec_encode_data() will generate a set of parity blocks
 \f$P_i\f$ from the set of k source blocks \f$D_i\f$ and arbitrary encoding
 coefficients \f$a_{i,j}\f$ where each byte in P is calculated from sources as:
 \f[ P_i = \sum_{j=1}^k a_{i,j} \cdot D_j \f]
 where addition and multiplication \f$\cdot\f$ is defined in GF(2^8).  Since any
 arbitrary set of coefficients \f$a_{i,j}\f$ can be supplied, the same
 fundamental function can be used for encoding blocks or decoding from blocks in
 erasure.
 #### EC Usage
 Various examples are available in examples/ec and unit tests in `erasure_code`
 to show an encode and decode (re-hydrate) cycle or partial update operation. As
 seen in [ec example] the process starts with picking an
 encode matrix, parameters k (source blocks) and m (total parity + source
 blocks), and expanding the necessary coefficients.
 ~~~c
 	// Initialize g_tbls from encode matrix
 	ec_init_tables(k, p, &encode_matrix[k * k], g_tbls);
 ~~~
 In the example, a symmetric encode matrix is used where only the coefficients
 describing the parity blocks are used for encode and the upper matrix is
 initialized as an identity to simplify generation of the corresponding decode
 matrix. Next the parity for all (m - k) blocks are calculated at once.
 ~~~c
 	// Generate EC parity blocks from sources
 	ec_encode_data(len, k, p, g_tbls, frag_ptrs, &frag_ptrs[k]);
 ~~~
 ### RAID Functions
 Functions in the RAID section calculate and operate on XOR and P+Q parity found
 in common RAID implementations.  The mathematics of RAID are based on Galois
 finite-field arithmetic to find one or two parity bytes for each byte in N
 sources such that single or dual disk failures (one or two erasures) can be
 corrected.  For RAID5, a block of parity is calculated by the xor across the N
 source arrays.  Each parity byte is calculated from N sources by:
 \f[ P = D_0 + D_1 + ... + D_{N-1} \f]
 where \f$D_n\f$ are elements across each source array [0-(N-1)] and + is the
 bit-wise exclusive or (xor) operation.  Elements in GF(2^8) are implemented as
 bytes.
 For RAID6, two parity bytes P and Q are calculated from the source array.  P is
 calculated as in RAID5 and Q is calculated using the generator g as:
 \f[ Q = g^0 D_0 + g^1 D_1 + g^2 D_2 + ... + g^{N-1} D_{N-1} \f]
 where g is chosen as {2}, the second field element.  Multiplication and the
 field are defined using the primitive polynomial \f$ x^8 + x^4 + x^3 + x^2 + 1 \f$
 (0x1d).
 #### RAID Usage
 RAID function usage is similar to erasure code except no coefficient expansion
 step is necessary. As seen in [raid example] the xor_gen() and xor_check()
 functions are used to generate and check parity.
 ### CRC Functions
 Functions in the CRC section include fast implementations of cyclic redundancy
 check using specialized instructions such as PCLMULQDQ, carry-less
 multiplication.  Generally, a CRC is the remainder in binary division of a
 message and a CRC polynomial in GF(2).
 \f[ CRC(M(x)) = x^{deg(P(x))} \cdot M(x) \, mod \, P(x) \f]
 CRC is used in many storage applications to ensure integrity of data by
 appending the CRC to a message.  Various standards choose the polynomial P and
 may vary by initial seeding value, bit reversal and inverting the result and
 seed.
 #### CRC Usage
 CRC functions have a simple interface such as in [crc example].
 ~~~c
 	crc64_checksum = crc64_ecma_refl(crc64_checksum, inbuf, avail_in);
 ~~~
 Updates with new buffers are possible with subsequent calls. No extra finalize
 step is necessary.
 ### Compress/Inflate Functions
 Functions in the igzip unit perform fast, loss-less data compression and
 decompression within the [deflate](https://www.ietf.org/rfc/rfc1951.txt),
 [zlib](https://www.ietf.org/rfc/rfc1950.txt), and
 [gzip](https://www.ietf.org/rfc/rfc1952.txt) binary standards. Functions for
 stream based (data pieces at a time) and stateless (data all at once) are
 available as well as multiple parameters to change the speed vs. compression
 ratio or other features.  In addition, there are functions to fine tune
 compression by pre-computing static Huffman tables and setting for subsequent
 compression runs, parsing compression headers and other specific tasks to give
 more control.
 #### Compress/Inflate Usage
 The interface for compression and decompression functions is similar to zlib,
 zstd and others where a context structure keeps parameters and internal state to
 render from an input buffer to an output buffer.  I/O buffer pointers and size
 are often the only required settings.  ISA-L, unlike zlib and others, does not
 allocate new memory and must be done by the user explicitly when required (level
 1 and above).  This gives the user more flexibility to when dynamic memory is
 allocated and reused. The minimum code for starting a compression is just
 allocating a stream structure and initializing it.  This can be done just once
 for multiple compression runs.
 ~~~c
 	struct isal_zstream stream;
 	isal_deflate_init(&stream);
 ~~~
 Using level 1 compression and above requires an additional, initial allocation
 for an internal intermediate buffer.  Suggested sizes are defined in external
 headers.
 ~~~c
 	stream.level = 1;
 	stream.level_buf = malloc(ISAL_DEF_LVL1_DEFAULT);
 	stream.level_buf_size = ISAL_DEF_LVL1_DEFAULT;
 ~~~
 After init, subsequent, multiple compression runs can be performed by supplying
 (or re-using) I/O buffers.
 ~~~c
 	stream.next_in = inbuf;
 	stream->next_out = outbuf;
 	stream->avail_in = inbuf_size;
 	stream->avail_out = outbuf_size;
 	isal_deflate(stream);
 ~~~
 See [igzip example] for a simple example program or review the perf or check
 tests for more.
 **igzip**: ISA-L also provides a user program *igzip* to compress and decompress
 files.  Optionally igzip can be compiled with multi-threaded compression.  See
 `man igzip` for details.
 ## General Library Features
 ### Multi-Binary Dispatchers
 Multibinary support is available for all units in ISA-L.  With multibinary
 support functions, an appropriate version is selected at first run and can be
 called instead of architecture-specific versions. This allows users to deploy a
 single binary with multiple function versions and choose at run time based on
 platform features. All functions also have base functions, written in portable
 C, which the multibinary function will call if none of the required instruction
 sets are enabled.
 ### Included Tests and Utilities
 ISA-L source [repo] includes unit tests, performance tests and other utilities.
 Examples:
 - [ec example]
 - [raid example]
 - [crc example]
 - [igzip example]
 ---
 [repo]: https://github.com/intel/isa-l
 [ec example]: https://github.com/intel/isa-l/blob/master/examples/ec/ec_simple_example.c
 [raid example]: https://github.com/intel/isa-l/blob/master/raid/xor_example.c
 [crc example]: https://github.com/intel/isa-l/blob/master/crc/crc64_example.c
 [igzip example]: https://github.com/intel/isa-l/blob/master/igzip/igzip_example.c