Elliott Hughes 5a0aa3dee2 Switch to a working UTF-8 mb/wc implementation.
Although glibc gets by with an 8-byte mbstate_t, OpenBSD uses 12 bytes (of
the 128 bytes it reserves!).

We can actually implement UTF-8 encoding/decoding with a 0-byte mbstate_t,
which means we can make things work on LP32 too, as long as we accept the
limitation that the caller needs to present us with a complete sequence
before we'll process it.
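A minimal sketch of what a stateless decoder looks like, assuming a hypothetical helper named `utf8_decode` (an illustration, not bionic's actual code). Because nothing is carried between calls, an incomplete sequence is reported with `(size_t)-2` and nothing is consumed, so the caller has to retry with the complete sequence.

```c
#include <stddef.h>
#include <stdint.h>
#include <wchar.h>

/* Hypothetical stateless decoder for one UTF-8 character in s[0..n).
 * Returns the number of bytes used (1-4), (size_t)-2 if the sequence is
 * incomplete (nothing is consumed or remembered), or (size_t)-1 if the
 * bytes are not valid UTF-8. Unlike mbrtowc, a NUL byte returns 1. */
static size_t utf8_decode(wchar_t *out, const unsigned char *s, size_t n) {
  if (n == 0) return (size_t)-2;
  unsigned char c = s[0];
  size_t len;
  uint32_t cp, min;
  if (c < 0x80) { *out = (wchar_t)c; return 1; }
  if ((c & 0xe0) == 0xc0)      { len = 2; cp = c & 0x1f; min = 0x80; }
  else if ((c & 0xf0) == 0xe0) { len = 3; cp = c & 0x0f; min = 0x800; }
  else if ((c & 0xf8) == 0xf0) { len = 4; cp = c & 0x07; min = 0x10000; }
  else return (size_t)-1;  /* stray continuation byte or 5/6-byte lead */

  /* Reject bad continuation bytes among the ones we do have. */
  size_t avail = n < len ? n : len;
  for (size_t i = 1; i < avail; ++i)
    if ((s[i] & 0xc0) != 0x80) return (size_t)-1;
  if (n < len) return (size_t)-2;  /* caller must present the whole sequence */

  for (size_t i = 1; i < len; ++i) cp = (cp << 6) | (s[i] & 0x3f);
  /* Reject overlong forms, surrogates, and values beyond U+10FFFF. */
  if (cp < min || cp > 0x10ffff || (cp >= 0xd800 && cp <= 0xdfff))
    return (size_t)-1;
  *out = (wchar_t)cp;
  return len;
}
```

Returning `(size_t)-2` mirrors the incomplete-sequence result of `mbrtowc`; the difference is that here the partial bytes are not remembered, which is exactly the limitation described above.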

Our behavior is fine when going from characters to bytes; we just
update the source wchar_t** to say how far through the input we got.
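A rough sketch of why the wide-to-multibyte direction needs no state, using hypothetical names `utf8_encode_one` and `utf8_wcsrtombs` and assuming a non-NULL destination buffer. Every wide character is encoded in one step, so when the output fills up it is enough to leave `*src` pointing at the first unconverted character and let the caller resume from there.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>
#include <wchar.h>

/* Hypothetical helper: encodes one code point into out and returns the
 * number of bytes (1-4), or 0 if wc is not a Unicode scalar value. */
static size_t utf8_encode_one(char out[4], wchar_t wc) {
  uint32_t c = (uint32_t)wc;
  if (c < 0x80) { out[0] = (char)c; return 1; }
  if (c < 0x800) {
    out[0] = (char)(0xc0 | (c >> 6));
    out[1] = (char)(0x80 | (c & 0x3f));
    return 2;
  }
  if (c >= 0xd800 && c <= 0xdfff) return 0;  /* surrogates */
  if (c < 0x10000) {
    out[0] = (char)(0xe0 | (c >> 12));
    out[1] = (char)(0x80 | ((c >> 6) & 0x3f));
    out[2] = (char)(0x80 | (c & 0x3f));
    return 3;
  }
  if (c > 0x10ffff) return 0;
  out[0] = (char)(0xf0 | (c >> 18));
  out[1] = (char)(0x80 | ((c >> 12) & 0x3f));
  out[2] = (char)(0x80 | ((c >> 6) & 0x3f));
  out[3] = (char)(0x80 | (c & 0x3f));
  return 4;
}

/* wcsrtombs-style conversion with no mbstate_t (dst must be non-NULL).
 * Stops before a character that would not fit in the len bytes of dst and
 * leaves *src pointing at it, so the caller can resume later. On reaching
 * L'\0' with room for it, stores '\0' and sets *src to NULL. Returns the
 * number of bytes stored, excluding '\0', or (size_t)-1 with *src at the
 * offending character if it cannot be encoded. */
static size_t utf8_wcsrtombs(char *dst, const wchar_t **src, size_t len) {
  const wchar_t *p = *src;
  size_t written = 0;
  for (;;) {
    if (*p == L'\0') {
      if (written < len) { dst[written] = '\0'; *src = NULL; }
      else *src = p;
      return written;
    }
    char buf[4];
    size_t n = utf8_encode_one(buf, *p);
    if (n == 0) { *src = p; return (size_t)-1; }
    if (written + n > len) { *src = p; return written; }  /* out of room */
    memcpy(dst + written, buf, n);
    written += n;
    ++p;
  }
}
```

For example, converting `{ L'A', 0xe9, 0x20ac, 0 }` into a 4-byte buffer stores "A" and the two bytes of U+00E9, then stops with `*src` at U+20AC because its three bytes would not fit.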

I'll come back and use the 4 bytes we do have to cope with byte sequences
split across multiple input buffers. Because we don't support UTF-8
sequences longer than 4 bytes, and because the first byte of a UTF-8
sequence encodes the sequence's length, we shouldn't need the other
fields OpenBSD used (at the cost of some recomputation when a sequence
is split across buffers).
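One way those 4 bytes could be used, sketched under stated assumptions: the `hypothetical_mbstate` type and the `utf8_feed` and `pending_count` functions are illustrative only, not the eventual bionic code. The state holds just the bytes of an incomplete sequence, and the total length is recomputed from the lead byte on every call, so no separate length or partial-value fields are stored.

```c
#include <stddef.h>
#include <string.h>

/* Sequence length implied by a lead byte, or 0 if the byte cannot start a
 * sequence (a continuation byte, or a lead for unsupported 5/6-byte forms). */
static size_t utf8_seq_len(unsigned char lead) {
  if (lead < 0x80) return 1;
  if ((lead & 0xe0) == 0xc0) return 2;
  if ((lead & 0xf0) == 0xe0) return 3;
  if ((lead & 0xf8) == 0xf0) return 4;
  return 0;
}

/* Hypothetical 4-byte state: the pending bytes of an incomplete sequence.
 * At most 3 bytes are ever pending, and neither multibyte lead bytes nor
 * continuation bytes are zero, so the count is the number of leading
 * non-zero entries. */
typedef struct { unsigned char bytes[4]; } hypothetical_mbstate;

static size_t pending_count(const hypothetical_mbstate *st) {
  size_t k = 0;
  while (k < 4 && st->bytes[k] != 0) ++k;
  return k;
}

/* Hypothetical resumable step: takes bytes from s[0..n) until the pending
 * sequence is complete. Returns the number of input bytes taken; if the
 * sequence completed, copies it to seq, sets *seq_len to its length and
 * clears st, otherwise sets *seq_len to 0 and remembers the bytes in st.
 * Returns (size_t)-1 (state unchanged) on an invalid lead or continuation
 * byte. Decoding the completed sequence is left out for brevity. */
static size_t utf8_feed(hypothetical_mbstate *st, const unsigned char *s,
                        size_t n, unsigned char seq[4], size_t *seq_len) {
  unsigned char buf[4];
  size_t have = pending_count(st);
  size_t taken = 0;
  memcpy(buf, st->bytes, have);
  *seq_len = 0;
  while (taken < n) {
    unsigned char b = s[taken];
    if (have == 0 ? utf8_seq_len(b) == 0 : (b & 0xc0) != 0x80)
      return (size_t)-1;
    buf[have++] = b;
    ++taken;
    size_t total = utf8_seq_len(buf[0]);  /* recomputed from the lead byte */
    if (have == total) {
      memcpy(seq, buf, total);
      *seq_len = total;
      memset(st->bytes, 0, sizeof st->bytes);
      return taken;
    }
  }
  memcpy(st->bytes, buf, have);  /* still incomplete: remember what we saw */
  return taken;
}
```

Feeding the three bytes of U+20AC as two bytes and then one byte leaves 2 bytes pending after the first call and yields the complete 3-byte sequence on the second, with only the lead byte needed to know when to stop.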

This patch also makes the minimal changes necessary to setlocale(3) to
make us behave like glibc when an app requests UTF-8. (The one difference
is that our "C" locale is the same as our "C.UTF-8" locale.)

Change-Id: Ied327a8c4643744b3611bf6bb005a9b389ba4c2f
2014-05-01 14:46:54 -07:00