Files
five/compiler
CharlesKWON d5e15272d2 feat(charset): UTF-8 default string semantics with selectable charset
Five strings now operate in Unicode rune units by default. Core string
functions (LEN/CHR/ASC/SUBSTR/LEFT/RIGHT/AT/PADR/PADL) are charset-aware:
UTF-8 rune semantics by default, byte/charset semantics when a legacy
charset (CP949, CP1252, ...) is selected. Initial charset is settable via
FIVE_CHARSET / HB_CODEPAGE env vars; default UTF8.

- hbrtl/charset.go: charset state + Str* helpers + DecodeToUTF8/EncodeFromUTF8
  + RTL HB_GETCHARSET/HB_SETCHARSET/HB_CDPSELECT/HB_TRANSLATE (x/text htmlindex)
- compiler/gengo: inlined string intrinsics now call charset-aware hbrtl.Str*
  helpers instead of byte-based Go (they previously bypassed the RTL registry)
- compiler/analyzer: register HB_GETCHARSET/HB_SETCHARSET/HB_TRANSLATE as known
- hbrtl/regex.go: add HB_REGEX (array-of-submatches)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 12:42:33 +09:00
..