# Five RAG — knowledge corpus for LLM agents writing Five

A compact, retrieval-ready knowledge base that lets an LLM read and write **Five**
(xBase/Harbour → Go) code correctly without prior training on it. This is the practical
form of "give the model the grammar via RAG": the grammar, the runtime-library (RTL)
surface, real idioms, and the long-tail gotchas.

## Why this exists

Five is token-dense, so the corpus needed to *teach* a model is small and cheap to inject:
a dense language takes fewer tokens to cover through retrieval than a verbose one.
Retrieving the grammar and RTL docs closes most of the gap; the accumulating **gotchas**
file (`05-gotchas.md`) closes the semantic long tail.

## Contents

| File | What it covers |
|------|----------------|
| `01-overview.md` | What Five is, design priorities, the two runtimes, compile model |
| `02-syntax.md` | Declarations, literals, operators, control flow, code blocks |
| `03-rtl-catalog.md` | Runtime-library functions (strings, arrays, hashes, JSON, dates, regex, charsets, …) |
| `04-idioms.md` | Web/worker patterns: HTTP endpoint, routing, Postgres, job queue, LLM, build/deploy |
| `05-gotchas.md` | Non-obvious traps + fixes (the highest-signal file) |
| `06-security.md` | Web security patterns: authz, sessions, password hashing, XSS, CSP, uploads |
| `INDEX.md` | Retrieval manifest (doc → keywords + one-line summary) |

Every file has YAML frontmatter (`doc`, `title`, `keywords`, `summary`) for ranking.
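
As a rough illustration of how these fields can drive ranking, the sketch below splits a frontmatter block using only the standard library and scores a document against query terms, weighting `keywords` hits ×3 in the spirit of what `search.sh` is described as doing. The sample values, the comma-separated `keywords` layout, and the function names `split_frontmatter` and `score` are assumptions for illustration; the real files and `search.sh` may differ.

```python
import re

# Hypothetical sample: the field names come from this README; the values and
# the comma-separated keywords layout are assumptions.
SAMPLE = """---
doc: 05-gotchas
title: Gotchas
keywords: trap, pitfall, scope
summary: Non-obvious traps and their fixes.
---
## Scope

A trap about scope rules.
"""


def split_frontmatter(text):
    """Return (fields, body) for a document that starts with a '---' block."""
    match = re.match(r"^---\n(.*?)\n---\n?(.*)$", text, re.DOTALL)
    if not match:
        return {}, text
    fields = {}
    for line in match.group(1).splitlines():
        key, sep, value = line.partition(":")
        if sep:  # skip lines that are not simple "key: value" pairs
            fields[key.strip()] = value.strip()
    return fields, match.group(2)


def score(text, terms):
    """Sum substring hits per term: keywords count x3, body counts x1."""
    fields, body = split_frontmatter(text)
    keywords = fields.get("keywords", "").lower()
    body = body.lower()
    total = 0
    for term in terms:
        term = term.lower()
        total += 3 * keywords.count(term) + body.count(term)
    return total
```

With the sample above, `score(SAMPLE, ["scope"])` yields 5: one `keywords` hit at ×3 plus two body hits. Plain substring counting keeps the sketch short; a real ranker would likely match on word boundaries.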

## How to consume

- **Direct context injection (simplest):** for a small or medium task, paste the relevant
  doc(s) into the prompt. For broad work, `01`+`02`+`05` fit easily in context; pull
  `03`/`04` sections as needed.
- **Keyword retrieval (built-in):** run `./search.sh <terms>`, a dependency-free
  ripgrep/grep ranker over the corpus. It weights matches in frontmatter `keywords` ×3,
  adds matches in the body, and prints ranked docs with the matching `##` section
  headers. There is no index to build. For example, `./search.sh session token csprng`
  → `06-security.md §2`. `INDEX.md` is the hand-curated routing table; a bluge or
  embeddings index can ingest the same `.md` files.
- **Embedding RAG:** chunk by `##` headers (each section is self-contained). Frontmatter
  `summary` makes a good chunk preamble.
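
The chunking rule from the embedding bullet can be sketched as follows: split a doc body at each `##` header and prepend the frontmatter `summary` to every chunk. This is a minimal sketch, not part of the corpus tooling; the function name `chunk_by_h2` and the dictionary layout of a chunk are assumptions.

```python
import re


def chunk_by_h2(body, summary=""):
    """Split markdown into one chunk per '## ' section, prefixed by summary.

    Text before the first '## ' header (such as the '# ' title) is skipped.
    '###' subsections stay inside their parent chunk. Lines starting with
    '## ' inside fenced code blocks are not special-cased in this sketch.
    """
    chunks = []
    # Split at the start of every line that opens a level-2 header,
    # keeping the header line as the first line of its chunk.
    for part in re.split(r"(?m)^(?=## )", body):
        if not part.startswith("## "):
            continue
        header = part.splitlines()[0][3:].strip()
        text = part.strip()
        if summary:
            text = f"{summary}\n\n{text}"
        chunks.append({"section": header, "text": text})
    return chunks
```

Each returned chunk carries its section title separately, so an index can store it as metadata and report hits in the same `doc §section` form that the keyword search uses.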

Suggested system-prompt pointer: *"When writing Five (.prg) code, consult the Five RAG at
`fivedev/five/rag/` — especially `05-gotchas.md` — and prefer patterns from `04-idioms.md`."*

## Maintenance

- Keep `03-rtl-catalog.md` in sync with `hbrtl/register.go` (function names are
  authoritative; signatures of rarely used functions may drift).
- **Append every new trap to `05-gotchas.md`.** That file is the compounding asset.
- Grammar truth: `compiler/{lexer,parser,ast}`. Idiom truth: the `solmade` app.
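
The catalog check in the first bullet could be automated along these lines. This is a sketch under loud assumptions: the registration pattern `Register("NAME", ...)`, the catalog convention of writing each function as `` `NAME(` `` in backticks, and case-insensitive name comparison are all guesses for illustration, not the actual layout of `hbrtl/register.go` or `03-rtl-catalog.md`. Adjust both regexes to the real files before relying on it.

```python
import re

# ASSUMPTION: registrations look like Register("NAME", ...) in the Go source.
REGISTER_RE = re.compile(r'Register\(\s*"([A-Za-z_][A-Za-z0-9_]*)"')
# ASSUMPTION: the catalog writes each function as `NAME(` inside backticks.
CATALOG_RE = re.compile(r"`([A-Za-z_][A-Za-z0-9_]*)\(")


def catalog_drift(register_src, catalog_md):
    """Return (undocumented, stale) name lists, compared case-insensitively.

    undocumented: registered in the Go source but missing from the catalog.
    stale: present in the catalog but no longer registered.
    """
    registered = {name.upper() for name in REGISTER_RE.findall(register_src)}
    documented = {name.upper() for name in CATALOG_RE.findall(catalog_md)}
    return sorted(registered - documented), sorted(documented - registered)
```

To use it, pass the contents of both files as strings and fail a CI step when either list is non-empty. The check covers names only; signature drift still needs a manual review.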