five/rag/05-gotchas.md
CharlesKWON 537a302c92 docs(rag): adopt endpoint helpers + no-inline-semicolon style
Update idioms skeleton to use REQUIRE_PG/REQUIRE_JSON_BODY/API_OK/API_ERR,
and add the style rule banning inline ';' multi-statements (visual-review
readability) to idioms + gotchas.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 14:49:07 +09:00


---
doc: five-gotchas
title: Five gotchas & non-obvious traps
keywords: [gotcha, trap, pitfall, intrinsic, gengo, charset, utf8, string escape, Chr, pgrtl string columns, Val, model local, analyzer warning, fnode, runtime]
summary: The non-obvious semantic traps that pure grammar knowledge will NOT prevent. Each entry is a real mistake observed in practice plus the fix. This is the long-tail corpus that makes Five RAG actually work.
---
# Five gotchas (read before writing/debugging Five)
These are discovered-the-hard-way facts. Grammar docs won't save you here.
## 1. String functions are inlined intrinsics — editing `hbrtl` alone does nothing
The compiler (`compiler/gengo/gengo.go`) **inlines** `LEN, CHR, ASC, SUBSTR, LEFT, RIGHT,
AT, PADR, PADL` directly as Go. They do **not** dispatch through the `hbrtl` registry at
runtime. So changing the registered `hbrtl` function has **no effect** on these calls.
- To change their runtime behavior you must edit the gengo intrinsic cases. They now emit
calls to the charset-aware helpers `hbrtl.StrLen`, `StrChr`, `StrAsc`, `StrSubStr`,
`StrLeft`, `StrRight`, `StrAt`, `StrPadR`, `StrPadL` (in `hbrtl/charset.go`).
- Functions used as code blocks / passed around DO hit the registry, so keep the registry
impl and the intrinsic in agreement.
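
As a toy model of why the registry edit is invisible to inlined calls, the Go sketch below contrasts a lookup-by-name call with a call whose implementation was fixed at the call site. Every name in it (`registry`, `callViaRegistry`, `callInlined`) is hypothetical and chosen for illustration; it is not the actual `gengo` or `hbrtl` code.

```go
package main

import "fmt"

// registry is a toy stand-in for a runtime function table keyed by name.
// (Hypothetical; not the real hbrtl registry.)
var registry = map[string]func(string) int{
	"LEN": func(s string) int { return len(s) },
}

// callViaRegistry models a call that is looked up by name at runtime,
// like a function passed around or used in a code block.
func callViaRegistry(name, arg string) int { return registry[name](arg) }

// callInlined models a call site where the compiler already emitted the
// implementation directly, so the registry is never consulted.
func callInlined(arg string) int { return len(arg) }

func main() {
	// "Edit the registry": replace LEN with an obviously different function.
	registry["LEN"] = func(string) int { return -1 }

	fmt.Println(callViaRegistry("LEN", "abc")) // sees the edit
	fmt.Println(callInlined("abc"))            // unaffected by the edit
}
```

The two paths diverge as soon as only one of them is changed, which is why the registry implementation and the intrinsic need to be kept in agreement.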
## 2. Strings are UTF-8 (runes) by default; legacy charset is opt-in
`LEN("한글")` is `2` (runes), not bytes. `CHR(9650)` is `▲`. This is the default.
- Select a legacy charset with `HB_SETCHARSET("CP949")` / `HB_CDPSELECT("CP949")` — then
byte/charset semantics apply. `HB_GETCHARSET()` reads the active one.
- Initial charset comes from env `FIVE_CHARSET` (or `HB_CODEPAGE`); default `UTF8`.
- Convert across charsets with `HB_TRANSLATE(cStr, cFrom, cTo)`.
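
For reference, the rune-versus-byte distinction can be reproduced in plain Go. This is a minimal sketch using only the standard library; the helper names `runeLen`, `byteLen`, and `chr` are illustrative and are not the `hbrtl` helpers.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeLen counts Unicode code points, the unit LEN uses under the UTF-8 default.
func runeLen(s string) int { return utf8.RuneCountInString(s) }

// byteLen counts the UTF-8 encoded bytes of the same string.
func byteLen(s string) int { return len(s) }

// chr turns a code point number into its UTF-8 string.
func chr(n int) string { return string(rune(n)) }

func main() {
	fmt.Println(runeLen("한글"), byteLen("한글")) // 2 6
	fmt.Println(chr(9650))                    // ▲ (U+25B2)
}
```

Code that sizes buffers or slices by byte offset will disagree with rune-based `LEN` on any non-ASCII input.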
## 3. String literals do NOT process escapes (single OR double quotes)
`"a\nb"` is the literal characters `a \ n b`, **not** a newline. Same for `'...'`.
- For a newline use `Chr(10)` (and `Chr(13)` for CR); build control chars explicitly.
- To embed a quote: wrap in the *other* quote. A string containing `"` → use `'...'`;
a string containing `'` → use `"..."`. (No backslash-escaping exists.)
- Watch out when building SQL/format strings: a literal `T` separator inside a
double-quoted SQL fragment can clash with the enclosing quotes; concatenate instead: `... || 'T' || ...`.
## 4. Postgres columns come back as STRINGS
`PG_QUERY` (pgrtl) returns rows as hashes whose values are **all strings**, even for
`INTEGER`/`NUMERIC` columns. Treating `"100"` as if it were the number `100` will bite you.
- Convert: `Val( hb_CStr( row["id"] ) )` for numbers.
- Bind params as strings too: `{ hb_NToS( nId ) }`.
## 5. `ctx_get("auth_user_id")` is a string
Auth context values are strings, so convert before numeric use: `nUser := Val( ctx_get( "auth_user_id", "0" ) )`.
## 6. LLM `model = "local"` is rejected by mlx/llama servers
OpenAI-compatible local servers (mlx_lm, llama.cpp) return HTTP 404 for unknown model names. The app's
`ResolveLlmModel` queries `/v1/models` and substitutes the actually-loaded id. If you call
an LLM endpoint directly, never send `model:"local"` — resolve the real id first.
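
A minimal Go sketch of the "resolve the real id first" step, assuming the server returns the usual OpenAI-style list shape (`{"data":[{"id":"..."}]}`) from `/v1/models`. The function name `firstModelID` and the choice to take the first reported id are assumptions for illustration; this is not the implementation of `ResolveLlmModel`.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// modelList mirrors the assumed OpenAI-style response: {"data":[{"id":"..."}]}.
type modelList struct {
	Data []struct {
		ID string `json:"id"`
	} `json:"data"`
}

// firstModelID returns the first non-empty model id found in a /v1/models
// response body. (Hypothetical helper; picking the first id is an assumption.)
func firstModelID(body []byte) (string, error) {
	var ml modelList
	if err := json.Unmarshal(body, &ml); err != nil {
		return "", err
	}
	for _, m := range ml.Data {
		if m.ID != "" {
			return m.ID, nil
		}
	}
	return "", errors.New("no models reported by server")
}

func main() {
	// Sample body standing in for a real GET /v1/models response.
	sample := []byte(`{"object":"list","data":[{"id":"example-model-id","object":"model"}]}`)
	id, err := firstModelID(sample)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(id) // use this value as "model" in the chat request
}
```

The returned id is what goes into the request's `model` field in place of `"local"`.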
## 7. Two runtimes — build with the right one
`solmade` builds with the **`fnode`** toolchain in `fivenode_go`, NOT the `five` CLI in
`fivedev/five`. They are separate runtimes with separate RTL behavior. Historic example:
`fivenode_go`'s `Chr()` double-encoded multibyte values (corruption), which is what
prompted implementing proper UTF-8 in `fivedev/five`. Don't assume behavior carries over.
## 8. Run the `five` CLI from inside `fivedev/five`
Module resolution depends on CWD. Building/running `five` from elsewhere can pick up the
wrong `replace` directive (e.g. resolving `five =>` to an unrelated repo). Always
`cd /Users/charleskwon/fivenode/fivedev/five` first, e.g.
`go build -o /tmp/five ./cmd/five && /tmp/five run x.prg`.
## 9. Analyzer "undeclared variable" warnings for RTL functions are harmless
The static analyzer warns `undeclared variable 'HB_FOO'` for RTL functions it doesn't know
about; they still resolve at runtime via the registry. To silence, add the name to the
known-function set in `compiler/analyzer/analyzer.go` (e.g. `HB_GETCHARSET` etc. were added
there). A warning is not an error.
## 10. STYLE: no inline `;` multi-statements (banned)
Five aims to be easy for a human to verify by eye. Do **not** pack multiple statements
onto one line with `;`: `IF nPG < 0 ; RETURN NIL ; ENDIF`, `IF Empty(x) ; x:="y" ; ENDIF`
are banned — they hurt visual review. Always expand:
```five
IF nPG < 0
   RETURN NIL
ENDIF
```
(The *trailing* `;` for line continuation — joining a long string/SQL/arg list across
lines — is a different, allowed feature. The ban is only on `;` as a statement separator.)
## 11. Density is a double-edged sword when debugging
One line doing a lot means one line failing does a lot. When a dense statement misbehaves,
expand it (split the chained `hb_*`/`PG_*`/`LLM_CHAT` calls into temporaries) to localize
the fault before reasoning about it.
---
> Maintenance discipline: when you hit a NEW non-obvious trap, add it here. Pure-grammar
> RAG closes ~80% of the gap; this accumulating gotcha list closes the rest.