Although glibc gets by with an 8-byte mbstate_t, OpenBSD uses 12 bytes (of
the 128 bytes it reserves!).
We can actually implement UTF-8 encoding/decoding with a 0-byte mbstate_t
which means we can make things work on LP32 too, as long as we accept the
limitation that the caller needs to present us with a complete sequence
before we'll process it.
Our behavior is fine when going from characters to bytes; we just
update the source wchar_t** to say how far through the input we got.
I'll come back and use the 4 bytes we do have to cope with byte sequences
split across multiple input buffers. The fact that we don't support
UTF-8 sequences longer than 4 bytes plus the fact that the first byte of
a UTF-8 sequence encodes the length means we shouldn't need the other
fields OpenBSD used (at the cost of some recomputation in cases where a
sequence is split across buffers).
This patch also makes the minimal changes necessary to setlocale(3) to
make us behave like glibc when an app requests UTF-8. (The difference
being that our "C" locale is the same as our "C.UTF-8" locale.)
Change-Id: Ied327a8c4643744b3611bf6bb005a9b389ba4c2f
This replaces a partial set of non-functional functions with a complete
set of functions, all of which actually work.
This requires us to implement mbsnrtowcs and wcsnrtombs which completes
the set of what we need for libc++.
The mbsnrtowcs is basically a copy & paste of wcsnrtombs, but I'm going
to go straight to looking at using the OpenBSD UTF-8 implementation rather
than keep polishing our home-grown turd.
(This patch also opportunistically switches us over to upstream btowc,
mbrlen, and wctob, since they're all trivially expressed in terms of
other functions.)
Change-Id: I0f81443840de0f1aa73b96f0b51988976793a323
This is an implementation in the style of the rest: char == byte.
We might want to come back and implement UTF-8, but this is enough for ltrace.
Bug: 13747066
Change-Id: Ib2b63609c9014fdef9a8491e067467c4fc5ae3cc