five/rag/05-gotchas.md
CharlesKWON cf370564f3 docs(rag): Five knowledge corpus for LLM agents
A retrieval-ready knowledge base so an LLM can read/write Five without
prior training: overview, syntax, full RTL catalog (from hbrtl/register.go),
web/worker idioms (from the solmade app), and a long-tail gotchas file.
Every doc has keyword/summary frontmatter; INDEX.md is the routing manifest.

Grounded by parallel source exploration; RTL names spot-checked against
register.go. The gotchas file is the compounding asset — append new traps.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 13:54:03 +09:00


doc: five-gotchas
title: Five gotchas & non-obvious traps
keywords: gotcha, trap, pitfall, intrinsic, gengo, charset, utf8, string escape, Chr, pgrtl string columns, Val, model local, analyzer warning, fnode, runtime
summary: The non-obvious semantic traps that pure grammar knowledge will NOT prevent. Each entry is a real mistake observed in practice plus the fix. This is the long-tail corpus that makes Five RAG actually work.

Five gotchas (read before writing/debugging Five)

These are discovered-the-hard-way facts. Grammar docs won't save you here.

1. String functions are inlined intrinsics — editing hbrtl alone does nothing

The compiler (compiler/gengo/gengo.go) inlines LEN, CHR, ASC, SUBSTR, LEFT, RIGHT, AT, PADR, PADL directly as Go. They do not dispatch through the hbrtl registry at runtime. So changing the registered hbrtl function has no effect on these calls.

  • To change their runtime behavior you must edit the gengo intrinsic cases. They now emit calls to charset-aware helpers hbrtl.StrLen/StrChr/StrAsc/StrSubStr/StrLeft/StrRight/StrAt/StrPadR/StrPadL (in hbrtl/charset.go).
  • Functions used as code blocks / passed around DO hit the registry, so keep the registry impl and the intrinsic in agreement.
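The split between inlined call sites and registry dispatch can be pictured with a toy model in Go. This is only a sketch of the idea, not the actual gengo or hbrtl code: `registry`, `strLen`, `callDirect`, and `callIndirect` are hypothetical names, and the registry entry is deliberately left out of sync to show how the two paths can disagree.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// registry stands in for the hbrtl runtime registry (hypothetical shape).
// Its LEN entry counts bytes, deliberately out of sync with the intrinsic.
var registry = map[string]func(string) int{
	"LEN": func(s string) int { return len(s) },
}

// strLen stands in for a charset-aware helper such as hbrtl.StrLen.
func strLen(s string) int { return utf8.RuneCountInString(s) }

// callIndirect models a function that is passed around as a value:
// it is looked up in the registry at runtime.
func callIndirect(name, arg string) int {
	fn, ok := registry[name]
	if !ok {
		return -1 // unknown function in this toy model
	}
	return fn(arg)
}

// callDirect models a direct call site: the compiler has already inlined
// the intrinsic, so the registry is never consulted for LEN.
func callDirect(name, arg string) int {
	switch name {
	case "LEN":
		return strLen(arg)
	}
	return callIndirect(name, arg)
}

func main() {
	fmt.Println(callDirect("LEN", "한글"))   // 2: inlined, rune-aware path
	fmt.Println(callIndirect("LEN", "한글")) // 6: registry path, byte count
}
```

In this model, editing only the registry entry changes the second result and leaves the first untouched, which is the trap described above.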

2. Strings are UTF-8 (runes) by default; legacy charset is opt-in

LEN("한글") is 2 (runes), not 6 (bytes). CHR(9650) is ▲ (U+25B2). This is the default.

  • Select a legacy charset with HB_SETCHARSET("CP949") / HB_CDPSELECT("CP949") — then byte/charset semantics apply. HB_GETCHARSET() reads the active one.
  • Initial charset comes from env FIVE_CHARSET (or HB_CODEPAGE); default UTF8.
  • Convert across charsets with HB_TRANSLATE(cStr, cFrom, cTo).
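For readers who think in Go, the rune-versus-byte distinction looks like this. It is a minimal sketch using only the Go standard library; `runeLen` and `chr` are illustrative names and not the hbrtl helpers themselves.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeLen counts code points, matching the UTF-8 default described above.
func runeLen(s string) int { return utf8.RuneCountInString(s) }

// chr returns the UTF-8 encoding of a single code point.
func chr(code int) string { return string(rune(code)) }

func main() {
	s := "한글"
	fmt.Println(runeLen(s), len(s)) // 2 6 (runes vs. bytes)

	tri := chr(9650)
	fmt.Println(tri, len(tri)) // ▲ 3 (one rune, three bytes)
}
```

Any code that assumes one character per byte (fixed-width padding, byte offsets into a string) is where the difference shows up.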

3. String literals do NOT process escapes (single OR double quotes)

"a\nb" is the literal characters a \ n b, not a newline. Same for '...'.

  • For a newline use Chr(10) (and Chr(13) for CR); build control chars explicitly.
  • To embed a quote: wrap in the other quote. A string containing " → use '...'; a string containing ' → use "...". (No backslash-escaping exists.)
  • Watch out building SQL/format strings: e.g. a literal T separator inside a double-quoted SQL fragment can clash — concatenate instead: ... || 'T' || ....

4. Postgres columns come back as STRINGS

PG_QUERY (pgrtl) returns rows as hashes whose values are all strings, even for INTEGER/NUMERIC columns. Int("100") semantics will bite you.

  • Convert: Val( hb_CStr( row["id"] ) ) for numbers.
  • Bind params as strings too: { hb_NToS( nId ) }.

5. ctx_get("auth_user_id") is a string

Auth context values are strings. nUser := Val( ctx_get( "auth_user_id", "0" ) ).

6. LLM model = "local" is rejected by mlx/llama servers

OpenAI-compatible local servers (mlx_lm, llama.cpp) 404 on unknown model names. The app's ResolveLlmModel queries /v1/models and substitutes the actually-loaded id. If you call an LLM endpoint directly, never send model:"local" — resolve the real id first.
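The resolve-first pattern can be sketched in Go as follows. This is not the app's ResolveLlmModel; `resolveModel`, `fakeServer`, and the id `example-model-4bit` are hypothetical, and the fallback to the first listed model is an assumption about what "the actually-loaded id" means. The response shape is the standard OpenAI-compatible `/v1/models` list.

```go
package main

import (
	"encoding/json"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// modelList mirrors the part of the /v1/models response that is needed.
type modelList struct {
	Data []struct {
		ID string `json:"id"`
	} `json:"data"`
}

// resolveModel returns want if the server lists it; otherwise it falls back
// to the first listed id (assumed here to be the loaded model).
func resolveModel(baseURL, want string) (string, error) {
	resp, err := http.Get(baseURL + "/v1/models")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET /v1/models: %s", resp.Status)
	}
	var list modelList
	if err := json.NewDecoder(resp.Body).Decode(&list); err != nil {
		return "", err
	}
	if len(list.Data) == 0 {
		return "", fmt.Errorf("server lists no models")
	}
	for _, m := range list.Data {
		if m.ID == want {
			return want, nil
		}
	}
	return list.Data[0].ID, nil
}

// fakeServer is a stand-in for a local OpenAI-compatible server.
func fakeServer() *httptest.Server {
	return httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/v1/models" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, `{"object":"list","data":[{"id":"example-model-4bit","object":"model"}]}`)
	}))
}

func main() {
	srv := fakeServer()
	defer srv.Close()
	id, err := resolveModel(srv.URL, "local")
	fmt.Println(id, err) // example-model-4bit <nil>
}
```

The resolved id is then what goes into the `model` field of the chat request, in place of the placeholder "local".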

7. Two runtimes — build with the right one

solmade builds with the fnode toolchain in fivenode_go, NOT the five CLI in fivedev/five. They are separate runtimes with separate RTL behavior. Historic example: fivenode_go's Chr() double-encoded multibyte values (corruption), which is what prompted implementing proper UTF-8 in fivedev/five. Don't assume behavior carries over.

8. Run the five CLI from inside fivedev/five

Module resolution depends on CWD. Building/running five from elsewhere can pick up the wrong replace directive (e.g. resolving five => to an unrelated repo). Always cd /Users/charleskwon/fivenode/fivedev/five first, e.g. go build -o /tmp/five ./cmd/five && /tmp/five run x.prg.

9. Analyzer "undeclared variable" warnings for RTL functions are harmless

The static analyzer warns undeclared variable 'HB_FOO' for RTL functions it doesn't know about; they still resolve at runtime via the registry. To silence, add the name to the known-function set in compiler/analyzer/analyzer.go (e.g. HB_GETCHARSET etc. were added there). A warning is not an error.

10. Density is a double-edged sword when debugging

One line doing a lot means one line failing does a lot. When a dense statement misbehaves, expand it (split the chained hb_*/PG_*/LLM_CHAT calls into temporaries) to localize the fault before reasoning about it.


Maintenance discipline: when you hit a NEW non-obvious trap, add it here. Pure-grammar RAG closes ~80% of the gap; this accumulating gotcha list closes the rest.