# Five RAG: knowledge corpus for LLM agents writing Five

A compact, retrieval-ready knowledge base that lets an LLM read and write **Five**
(xBase/Harbour → Go) code correctly without prior training on it. This is the practical
form of "give the model the grammar via RAG": grammar + RTL surface + real idioms +
the long-tail gotchas.

## Why this exists

Five is token-dense, so the corpus needed to *teach* a model is small and cheap to inject:
a dense language is cheaper to RAG than a verbose one. Grammar/RTL retrieval closes most
of the gap; the accumulating **gotchas** file closes the semantic long tail.

## Contents

| File | What it covers |
|------|----------------|
| `01-overview.md` | What Five is, design priorities, the two runtimes, compile model |
| `02-syntax.md` | Declarations, literals, operators, control flow, code blocks |
| `03-rtl-catalog.md` | Runtime-library functions (strings, array, hash, JSON, date, regex, charset, …) |
| `04-idioms.md` | Web/worker patterns: HTTP endpoint, routing, Postgres, job queue, LLM, build/deploy |
| `05-gotchas.md` | Non-obvious traps + fixes (the highest-signal file) |
| `06-security.md` | Web security patterns: authz, sessions, password hashing, XSS, CSP, uploads |
| `INDEX.md` | Retrieval manifest (doc → keywords + one-line summary) |

Every file has YAML frontmatter (`doc`, `title`, `keywords`, `summary`) for ranking.
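
As a rough illustration of how that frontmatter can feed ranking, here is a minimal Python sketch that scores one doc against query terms, counting `keywords` hits three times as heavily as body hits (the weighting `./search.sh` is described as using). It is not the actual `search.sh` logic. The function names are hypothetical, and it assumes flat `key: value` frontmatter between `---` fences with `keywords` written as a comma-separated or inline `[a, b]` list.

```python
import re


def parse_frontmatter(text):
    """Split a doc into (metadata dict, body).

    Assumption: frontmatter is flat `key: value` lines between `---` fences.
    """
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    meta = {}
    for i, line in enumerate(lines[1:], start=1):
        if line.strip() == "---":
            return meta, "\n".join(lines[i + 1:])
        key, sep, value = line.partition(":")
        if sep:
            meta[key.strip()] = value.strip()
    return {}, text  # no closing fence: treat everything as body


def keyword_list(meta):
    """Normalise `keywords` (assumed comma-separated or `[a, b]`) to lowercase terms."""
    raw = meta.get("keywords", "").strip("[]")
    return [k.strip().strip("'\"").lower() for k in raw.split(",") if k.strip()]


def score(text, terms, keyword_weight=3):
    """Keyword hits count `keyword_weight` times; body hits count once."""
    meta, body = parse_frontmatter(text)
    keywords = keyword_list(meta)
    body_words = re.findall(r"[a-z0-9_]+", body.lower())
    total = 0
    for term in (t.lower() for t in terms):
        total += keyword_weight * keywords.count(term)
        total += body_words.count(term)
    return total
```

Sorting the corpus by `score(text, terms)` in descending order gives a crude ranking; matching is exact per token, so `session` does not match `sessions`.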

## How to consume

- **Direct context injection (simplest):** for a small/medium task, paste the relevant
  doc(s). For broad work, `01`+`02`+`05` fit easily; pull `03`/`04` sections as needed.
- **Keyword retrieval (built-in):** run `./search.sh <terms>`, a dependency-free
  ripgrep/grep ranker over the corpus (frontmatter `keywords` weighted ×3 + body),
  printing ranked docs with the matching `##` section headers. No index to build.
  Example: `./search.sh session token csprng` → `06-security.md §2`.
- **bluge full-text index (KWONDoc):** this corpus is indexed into KWONDoc's bluge
  index (`~/.kwondoc/search-index`, category `five-rag`) so `bluge_search` finds it.
  Re-index after edits: `cd ~/kwondoc && go run ./cmd/ragindex <abs path to rag> five-rag`
  (online upsert, keyed by file path; does not wipe other docs).
- `INDEX.md` is the hand-curated routing table; an embeddings index can ingest the same `.md` files.
- **Embedding RAG:** chunk by `##` headers (each section is self-contained). Frontmatter
  `summary` makes a good chunk preamble.
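
For the embedding route, the chunking rule above can be sketched in a few lines of Python. This is an illustrative helper, not part of the corpus tooling: `chunk_by_h2` is a hypothetical name, it expects the doc body with frontmatter already stripped, and it assumes `##` lines inside fenced code blocks should not start a new chunk.

```python
def chunk_by_h2(body, summary=""):
    """Split a markdown body into one chunk per `## ` section.

    Each chunk is prefixed with `summary` (the frontmatter field) as a preamble.
    Text before the first `## ` header becomes a chunk labelled "(preamble)".
    """
    chunks = []
    heading = None   # None until the first `## ` header is seen
    buf = []
    in_fence = False

    def flush():
        text = "\n".join(buf).strip()
        if not text:
            return
        preamble = f"{summary}\n\n" if summary else ""
        chunks.append({"heading": heading or "(preamble)", "text": preamble + text})

    for line in body.splitlines():
        if line.lstrip().startswith("```"):
            in_fence = not in_fence  # toggle on every fence line
        if line.startswith("## ") and not in_fence:
            flush()                  # close the previous section
            heading = line[3:].strip()
            buf = [line]
        else:
            buf.append(line)
    flush()
    return chunks
```

Each returned dict carries the section heading alongside the text, so an embedding store can keep the heading as metadata for display or filtering.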

Suggested system-prompt pointer: *"When writing Five (.prg) code, consult the Five RAG at
`fivedev/five/rag/`, especially `05-gotchas.md`, and prefer patterns from `04-idioms.md`."*
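
For the direct-injection route, a small helper can concatenate the broad-work set (`01`+`02`+`05`) into one context string to place after a pointer like the one above. This is a minimal sketch: `build_context` and the comment-style separators are illustrative choices, and frontmatter is left in place for simplicity.

```python
from pathlib import Path

# The broad-work set suggested for direct context injection.
DEFAULT_DOCS = ("01-overview.md", "02-syntax.md", "05-gotchas.md")


def build_context(rag_dir, docs=DEFAULT_DOCS):
    """Concatenate the chosen corpus files, each labelled with its file name."""
    parts = []
    for name in docs:
        text = (Path(rag_dir) / name).read_text(encoding="utf-8").strip()
        parts.append(f"<!-- {name} -->\n{text}")
    return "\n\n".join(parts)
```

Pass a different `docs` tuple to pull in `03-rtl-catalog.md` or `04-idioms.md` when the task needs them.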

## Maintenance

- Keep `03-rtl-catalog.md` honest against `hbrtl/register.go` (names are authoritative;
  rare signatures may drift).
- **Append every new trap to `05-gotchas.md`.** That file is the compounding asset.
- Grammar truth: `compiler/{lexer,parser,ast}`. Idiom truth: the `solmade` app.