do not treat valid noncharacters as errors
1 file changed
tree: 9d3c44ad7ce6bbc42f943a74811313fe46327405
  1. bin/
  2. include/
  3. share/
  4. src/
  5. test/
  6. .gitignore
  7. .travis.yml
  8. config.mk
  9. LICENSE
  10. Makefile
  11. README.md
README.md

libutf

Build Status

This is a C89 UTF-8 library, with an API compatible with that of Plan 9's libutf, but with a number of improvements:

  • Support for runes beyond the Basic Multilingual Plane.
  • utflen and utfnlen cannot overflow on 32- or 64-bit machines.
  • chartorune treats all invalid codepoints (as per RFC 3629) as though U+FFFD.
  • fullrune, utfecpy, and utfnlen do not overestimate the length of malformed runes.
  • An extra function, charntorune(p,s,n), equivalent to fullrune(s,n)?chartorune(p,s):0.

Differences to be aware of:

  • UTFmax is 6, though runetochar will never write more than 4 bytes. Plan 9's UTFmax is 3.
  • chartorune may consume multiple bytes for each illegal rune; Plan 9 always consumes 1.
  • runelen and runetochar return 0 if the rune is too large; Plan 9 erroneously returns UTFmax.