Elliott Hughes 5a0aa3dee2 Switch to a working UTF-8 mb/wc implementation.
Although glibc gets by with an 8-byte mbstate_t, OpenBSD uses 12 bytes (of
the 128 bytes it reserves!).

We can actually implement UTF-8 encoding/decoding with a 0-byte mbstate_t
which means we can make things work on LP32 too, as long as we accept the
limitation that the caller needs to present us with a complete sequence
before we'll process it.

Our behavior is fine when going from characters to bytes; we just
update the source wchar_t** to say how far through the input we got.

I'll come back and use the 4 bytes we do have to cope with byte sequences
split across multiple input buffers. The fact that we don't support
UTF-8 sequences longer than 4 bytes plus the fact that the first byte of
a UTF-8 sequence encodes the length means we shouldn't need the other
fields OpenBSD used (at the cost of some recomputation in cases where a
sequence is split across buffers).

This patch also makes the minimal changes necessary to setlocale(3) to
make us behave like glibc when an app requests UTF-8. (The difference
being that our "C" locale is the same as our "C.UTF-8" locale.)

Change-Id: Ied327a8c4643744b3611bf6bb005a9b389ba4c2f
2014-05-01 14:46:54 -07:00
2014-02-18 12:02:37 -08:00
2014-04-25 20:00:45 -07:00
2013-12-12 12:51:08 -08:00
2010-03-08 18:04:02 -08:00
Description
No description provided
25 MiB
Languages
C 68.1%
Assembly 16.2%
C++ 13.4%
Makefile 1.1%
Python 0.9%
Other 0.2%