Five strings now operate in Unicode rune units by default.

Core string functions (LEN/CHR/ASC/SUBSTR/LEFT/RIGHT/AT/PADR/PADL) are
charset-aware: UTF-8 rune semantics by default, byte/charset semantics
when a legacy charset (CP949, CP1252, ...) is selected. The initial
charset is settable via the FIVE_CHARSET / HB_CODEPAGE env vars; the
default is UTF8.

- hbrtl/charset.go: charset state + Str* helpers +
  DecodeToUTF8/EncodeFromUTF8 + RTL
  HB_GETCHARSET/HB_SETCHARSET/HB_CDPSELECT/HB_TRANSLATE (x/text htmlindex)
- compiler/gengo: inlined string intrinsics now call the charset-aware
  hbrtl.Str* helpers instead of byte-based Go (they previously bypassed
  the RTL registry)
- compiler/analyzer: register HB_GETCHARSET/HB_SETCHARSET/HB_TRANSLATE
  as known
- hbrtl/regex.go: add HB_REGEX (array-of-submatches)

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
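
For illustration, the difference between rune and byte semantics described in this commit can be sketched in plain Go. This is a minimal sketch, not the code in hbrtl/charset.go: the names strLen and strSubstr and the boolean runes flag are hypothetical stand-ins for the charset state and the Str* helpers, and the 1-based SUBSTR handling is simplified (out-of-range arguments are clamped rather than following every edge case).

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// strLen is a hypothetical helper: it counts runes when runes is true
// (UTF-8 mode) and bytes otherwise (legacy single-unit mode).
func strLen(s string, runes bool) int {
	if runes {
		return utf8.RuneCountInString(s)
	}
	return len(s)
}

// strSubstr is a hypothetical helper: it returns count units starting at
// the 1-based position start. Units are runes when runes is true, bytes
// otherwise. Out-of-range arguments are clamped (a simplification).
func strSubstr(s string, start, count int, runes bool) string {
	if start < 1 {
		start = 1
	}
	if count <= 0 {
		return ""
	}
	if !runes {
		if start > len(s) {
			return ""
		}
		end := start - 1 + count
		if end > len(s) {
			end = len(s)
		}
		return s[start-1 : end]
	}
	r := []rune(s)
	if start > len(r) {
		return ""
	}
	end := start - 1 + count
	if end > len(r) {
		end = len(r)
	}
	return string(r[start-1 : end])
}

func main() {
	s := "héllo" // 5 runes, 6 bytes in UTF-8
	fmt.Println(strLen(s, true), strSubstr(s, 2, 3, true)) // 5 éll
	fmt.Println(strLen(s, false))                          // 6
}
```

In byte mode the same SUBSTR call can cut a multi-byte character in half, which is why the inlined intrinsics needed to stop using byte-based Go slicing.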
23 KiB