---
doc: five-gotchas
title: Five gotchas & non-obvious traps
keywords: [gotcha, trap, pitfall, intrinsic, gengo, charset, utf8, string escape, Chr, pgrtl string columns, Val, model local, analyzer warning, fnode, runtime]
summary: The non-obvious semantic traps that pure grammar knowledge will NOT prevent. Each entry is a real mistake observed in practice plus the fix. This is the long-tail corpus that makes Five RAG actually work.
---

# Five gotchas (read before writing/debugging Five)

These are discovered-the-hard-way facts. Grammar docs won't save you here.

## 1. String functions are inlined intrinsics: editing `hbrtl` alone does nothing

The compiler (`compiler/gengo/gengo.go`) **inlines**
`LEN, CHR, ASC, SUBSTR, LEFT, RIGHT, AT, PADR, PADL` directly as Go. They do **not**
dispatch through the `hbrtl` registry at runtime. So changing the registered `hbrtl`
function has **no effect** on these calls.

- To change their runtime behavior you must edit the gengo intrinsic cases. They now emit
  calls to charset-aware helpers
  `hbrtl.StrLen/StrChr/StrAsc/StrSubStr/StrLeft/StrRight/StrAt/StrPadR/StrPadL`
  (in `hbrtl/charset.go`).
- Functions used as code blocks / passed around DO hit the registry, so keep the registry
  impl and the intrinsic in agreement.

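
The split between the inlined path and the registry path can be pictured with a small Go sketch. Everything in it is hypothetical: `registry`, `strLen`, `inlinedLen` and `dynamicLen` are stand-ins invented for illustration, not the real `hbrtl` or gengo code. It only shows why re-registering a function leaves direct calls untouched.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// registry is a hypothetical stand-in for a runtime function registry:
// names looked up at runtime map to Go implementations.
var registry = map[string]func(string) int{
	"LEN": func(s string) int { return utf8.RuneCountInString(s) },
}

// strLen is a hypothetical stand-in for a charset-aware helper.
func strLen(s string) int { return utf8.RuneCountInString(s) }

// inlinedLen models generated code for a direct LEN(x) call:
// the helper is called directly, with no registry lookup.
func inlinedLen(s string) int { return strLen(s) }

// dynamicLen models a call that goes through the registry,
// for example a function that is passed around as a value.
func dynamicLen(s string) int { return registry["LEN"](s) }

// demo re-registers LEN and returns the result of both call paths.
func demo() (inlined, dynamic int) {
	registry["LEN"] = func(string) int { return -1 } // "patched" registry entry
	return inlinedLen("hello"), dynamicLen("hello")
}

func main() {
	inlined, dynamic := demo()
	fmt.Println(inlined, dynamic) // 5 -1: only the registry path sees the patch
}
```

In this model the two paths diverge as soon as one of them is edited, which is the reason for keeping the registry implementation and the intrinsic in agreement.
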
## 2. Strings are UTF-8 (runes) by default; legacy charset is opt-in

`LEN("한글")` is `2` (runes), not bytes. `CHR(9650)` is `▲`. This is the default.

- Select a legacy charset with `HB_SETCHARSET("CP949")` / `HB_CDPSELECT("CP949")`; then
  byte/charset semantics apply. `HB_GETCHARSET()` reads the active one.
- Initial charset comes from env `FIVE_CHARSET` (or `HB_CODEPAGE`); default `UTF8`.
- Convert across charsets with `HB_TRANSLATE(cStr, cFrom, cTo)`.

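
For readers less used to the rune/byte distinction, here is a minimal Go sketch of the default UTF-8 semantics described above. The helper names `runeLen`, `byteLen` and `chr` are made up for this example and are not the `hbrtl` helpers; the sketch only demonstrates the counting rule using the standard library.

```go
package main

import (
	"fmt"
	"unicode/utf8"
)

// runeLen counts Unicode code points, matching the rune-based default.
func runeLen(s string) int { return utf8.RuneCountInString(s) }

// byteLen counts raw UTF-8 bytes, shown only for contrast. A legacy charset
// such as CP949 encodes the same text differently and has its own byte length.
func byteLen(s string) int { return len(s) }

// chr converts a code point number to its UTF-8 encoded string.
func chr(n int) string { return string(rune(n)) }

func main() {
	fmt.Println(runeLen("한글"), byteLen("한글")) // 2 6
	fmt.Println(chr(9650))                      // ▲
}
```

Each Hangul syllable here takes three bytes in UTF-8, so the byte count is three times the rune count.
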
## 3. String literals do NOT process escapes (single OR double quotes)

`"a\nb"` is the literal characters `a \ n b`, **not** a newline. Same for `'...'`.

- For a newline use `Chr(10)` (and `Chr(13)` for CR); build control chars explicitly.
- To embed a quote: wrap in the *other* quote. A string containing `"` → use `'...'`;
  a string containing `'` → use `"..."`. (No backslash-escaping exists.)
- Watch out when building SQL/format strings: a quoted literal, such as a `T` separator,
  inside a double-quoted SQL fragment can clash with the surrounding quotes. Concatenate
  instead: `... || 'T' || ...`.

## 4. Postgres columns come back as STRINGS

`PG_QUERY` (pgrtl) returns rows as hashes whose values are **all strings**, even for
`INTEGER`/`NUMERIC` columns. Treating them as numbers directly (`Int("100")` semantics)
will bite you.

- Convert: `Val( hb_CStr( row["id"] ) )` for numbers.
- Bind params as strings too: `{ hb_NToS( nId ) }`.

## 5. `ctx_get("auth_user_id")` is a string

Auth context values are strings. Convert before numeric use:
`nUser := Val( ctx_get( "auth_user_id", "0" ) )`.

## 6. LLM `model = "local"` is rejected by mlx/llama servers

OpenAI-compatible local servers (mlx_lm, llama.cpp) return HTTP 404 for unknown model
names. The app's `ResolveLlmModel` queries `/v1/models` and substitutes the actually-loaded
id. If you call an LLM endpoint directly, never send `model:"local"`; resolve the real id
first.

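
A rough Go sketch of the "resolve the real id first" step follows. It is not the app's `ResolveLlmModel`: the function `resolveModelID`, the stub server and the id `stub-model-id` are all hypothetical. It assumes only the OpenAI-style `/v1/models` response shape (`{"data":[{"id":"..."}]}`), and picking the first listed model is a policy chosen for this example.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
	"net/http"
	"net/http/httptest"
)

// modelList mirrors the OpenAI-style /v1/models response body.
type modelList struct {
	Data []struct {
		ID string `json:"id"`
	} `json:"data"`
}

// resolveModelID (hypothetical helper) asks the server which models are
// loaded and returns the first id, instead of sending a placeholder name.
func resolveModelID(baseURL string) (string, error) {
	resp, err := http.Get(baseURL + "/v1/models")
	if err != nil {
		return "", err
	}
	defer resp.Body.Close()
	if resp.StatusCode != http.StatusOK {
		return "", fmt.Errorf("GET /v1/models: %s", resp.Status)
	}
	var list modelList
	if err := json.NewDecoder(resp.Body).Decode(&list); err != nil {
		return "", err
	}
	if len(list.Data) == 0 {
		return "", errors.New("server reports no loaded models")
	}
	return list.Data[0].ID, nil
}

// demo runs resolveModelID against a local stub server so the sketch
// is self-contained and needs no real LLM server.
func demo() (string, error) {
	srv := httptest.NewServer(http.HandlerFunc(func(w http.ResponseWriter, r *http.Request) {
		if r.URL.Path != "/v1/models" {
			http.NotFound(w, r)
			return
		}
		fmt.Fprint(w, `{"object":"list","data":[{"id":"stub-model-id","object":"model"}]}`)
	}))
	defer srv.Close()
	return resolveModelID(srv.URL)
}

func main() {
	id, err := demo()
	fmt.Println(id, err) // stub-model-id <nil>
}
```

The returned id is what would go into the `model` field of a later chat request.
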
## 7. Two runtimes: build with the right one

`solmade` builds with the **`fnode`** toolchain in `fivenode_go`, NOT the `five` CLI in
`fivedev/five`. They are separate runtimes with separate RTL behavior. Historic example:
`fivenode_go`'s `Chr()` double-encoded multibyte values (corruption), which is what
prompted implementing proper UTF-8 in `fivedev/five`. Don't assume behavior carries over.

## 8. Run the `five` CLI from inside `fivedev/five`

Module resolution depends on CWD. Building/running `five` from elsewhere can pick up the
wrong `replace` directive (e.g. resolving `five =>` to an unrelated repo). Always
`cd /Users/charleskwon/fivenode/fivedev/five` first, then build and run, for example:
`go build -o /tmp/five ./cmd/five && /tmp/five run x.prg`.

## 9. Analyzer "undeclared variable" warnings for RTL functions are harmless

The static analyzer warns `undeclared variable 'HB_FOO'` for RTL functions it doesn't know
about; they still resolve at runtime via the registry. To silence the warning, add the name
to the known-function set in `compiler/analyzer/analyzer.go` (`HB_GETCHARSET` and others
were added there). A warning is not an error.

## 10. STYLE: no inline `;` multi-statements (banned)

Five aims to be easy for a human to verify by eye. Do **not** pack multiple statements
onto one line with `;`: `IF nPG < 0 ; RETURN NIL ; ENDIF`, `IF Empty(x) ; x:="y" ; ENDIF`
are banned because they hurt visual review. Always expand:

```five
IF nPG < 0
   RETURN NIL
ENDIF
```

(The *trailing* `;` for line continuation, used to join a long string/SQL/arg list across
lines, is a different, allowed feature. The ban is only on `;` as a statement separator.)

## 11. Density is a double-edged sword when debugging

One line doing a lot means one line failing does a lot. When a dense statement misbehaves,
expand it (split the chained `hb_*`/`PG_*`/`LLM_CHAT` calls into temporaries) to localize
the fault before reasoning about it.

---

> Maintenance discipline: when you hit a NEW non-obvious trap, add it here. Pure-grammar
> RAG closes ~80% of the gap; this accumulating gotcha list closes the rest.