# Five RAG: knowledge corpus for LLM agents writing Five

A compact, retrieval-ready knowledge base that lets an LLM read and write **Five**
(xBase/Harbour → Go) code correctly without prior training on it. This is the practical
form of "give the model the grammar via RAG": grammar + RTL surface + real idioms +
the long-tail gotchas.

## Why this exists

Five is token-dense, so the corpus needed to *teach* a model is small and cheap to inject:
a dense language is cheaper to RAG than a verbose one. Grammar/RTL retrieval closes most
of the gap; the accumulating **gotchas** file closes the semantic long tail.

## Contents

| File | What it covers |
|------|----------------|
| `01-overview.md` | What Five is, design priorities, the two runtimes, compile model |
| `02-syntax.md` | Declarations, literals, operators, control flow, code blocks |
| `03-rtl-catalog.md` | Runtime-library functions (strings, array, hash, JSON, date, regex, charset, …) |
| `04-idioms.md` | Web/worker patterns: HTTP endpoint, routing, Postgres, job queue, LLM, build/deploy |
| `05-gotchas.md` | Non-obvious traps + fixes (the highest-signal file) |
| `06-security.md` | Web security patterns: authz, sessions, password hashing, XSS, CSP, uploads |
| `INDEX.md` | Retrieval manifest (doc → keywords + one-line) |

Every file has YAML frontmatter (`doc`, `title`, `keywords`, `summary`) for ranking.
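
For tooling that needs those four fields without a YAML dependency, a minimal reader might look like the sketch below. It is illustrative only: the sample values, the inline `[a, b]` list form for `keywords`, and the `split_frontmatter` name are assumptions, not taken from the corpus files. A real YAML parser is the safer choice if the frontmatter uses anything beyond flat `key: value` lines.

```python
# Minimal frontmatter reader (stdlib only).
# ASSUMPTION: sample values and the inline-list form of `keywords` are
# illustrative; the real corpus files may lay their frontmatter out differently.
SAMPLE = """---
doc: 06-security
title: Web security patterns
keywords: [session, token, csprng, xss]
summary: Authz, sessions, password hashing, XSS, CSP, uploads.
---
## Sessions
Body text goes here.
"""


def split_frontmatter(text):
    """Return (meta, body); meta is {} when no frontmatter block is found."""
    lines = text.splitlines()
    if not lines or lines[0].strip() != "---":
        return {}, text
    try:
        end = lines.index("---", 1)  # closing delimiter
    except ValueError:
        return {}, text  # unterminated block: treat the whole file as body
    meta = {}
    for line in lines[1:end]:
        key, sep, value = line.partition(":")
        if not sep:
            continue  # not a `key: value` line
        value = value.strip()
        if value.startswith("[") and value.endswith("]"):
            # Inline list: split on commas and drop empty items.
            value = [v.strip() for v in value[1:-1].split(",") if v.strip()]
        meta[key.strip()] = value
    return meta, "\n".join(lines[end + 1:])


meta, body = split_frontmatter(SAMPLE)
```

Here `meta["keywords"]` comes back as a list and `body` holds everything after the closing `---`, which is the split a ranker or chunker needs.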

## How to consume

- **Direct context injection (simplest):** for a small/medium task, paste the relevant
  doc(s). For broad work, `01`+`02`+`05` fit easily; pull `03`/`04` sections as needed.
- **Keyword retrieval (built-in):** run `./search.sh <terms>`, a dependency-free
  ripgrep/grep ranker over the corpus (frontmatter `keywords` weighted ×3 + body) that
  prints ranked docs with the matching `##` section headers. No index to build.
  E.g. `./search.sh session token csprng` → `06-security.md §2`. `INDEX.md` is the
  hand-curated routing table; a bluge/embeddings index can ingest the same `.md` files.
- **Embedding RAG:** chunk by `##` headers (each section is self-contained). Frontmatter
  `summary` makes a good chunk preamble.

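
To make the "keywords ×3 + body" weighting concrete, here is a small Python sketch of that style of ranking. It is not a port of `search.sh`: the exact scoring rule (3 points per term found in `keywords`, 1 point per substring occurrence in the body), the `DOCS` sample corpus, and the function names are all assumptions chosen for illustration.

```python
import re

# ASSUMPTION: hypothetical in-memory corpus; names and text are illustrative.
DOCS = {
    "06-security.md": {
        "keywords": ["session", "token", "csprng"],
        "body": "## 1. Authz\nCheck roles per route.\n"
                "## 2. Sessions\nIssue a session token from a csprng.\n",
    },
    "04-idioms.md": {
        "keywords": ["http", "routing", "postgres"],
        "body": "## HTTP endpoint\nRead the session cookie, then route.\n",
    },
}


def sections(body):
    """Yield (header, text) for each `##` section of a Markdown body."""
    parts = re.split(r"(?m)^(## .*)$", body)
    # parts = [preamble, header1, text1, header2, text2, ...]
    for i in range(1, len(parts), 2):
        yield parts[i], parts[i + 1]


def search(terms, docs=DOCS):
    """Rank docs: 3 points per keyword hit, 1 per body occurrence (substring)."""
    terms = [t.lower() for t in terms]
    results = []
    for name, doc in docs.items():
        keywords = {k.lower() for k in doc["keywords"]}
        body = doc["body"].lower()
        score = sum(3 * (t in keywords) + body.count(t) for t in terms)
        if score == 0:
            continue  # no term matched anywhere in this doc
        # Collect the section headers whose header or text mentions any term.
        hits = [h for h, text in sections(doc["body"])
                if any(t in (h + text).lower() for t in terms)]
        results.append((score, name, hits))
    # Highest score first; ties broken by file name for stable output.
    return sorted(results, key=lambda r: (-r[0], r[1]))


ranked = search(["session", "token", "csprng"])
```

With this sample corpus, `ranked` puts `06-security.md` first with `## 2. Sessions` as its only matching header, which mirrors the shape of the `search.sh` example in the list above.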

Suggested system-prompt pointer: *"When writing Five (.prg) code, consult the Five RAG at
`fivedev/five/rag/`, especially `05-gotchas.md`, and prefer patterns from `04-idioms.md`."*
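
For the embedding route described in the list above, chunking by `##` headers with the frontmatter `summary` as a preamble could be sketched as follows. The chunk `id` format, the dictionary shape, and the function name are hypothetical, and the sketch does not guard against `## ` lines that appear inside fenced code blocks.

```python
def chunk_by_section(doc_name, summary, body):
    """Split a Markdown body on `##` headers, one chunk per section.

    Each chunk's text starts with the doc summary so the embedding carries
    document-level context. Text before the first header is skipped.
    """
    chunks = []
    header, lines = None, []

    def flush():
        # Emit the section collected so far, if any.
        if header is not None:
            text = "\n".join([header] + lines).strip()
            chunks.append({
                "id": f"{doc_name}#{header[3:].strip()}",  # ASSUMPTION: id scheme
                "text": f"{summary}\n\n{text}",
            })

    for line in body.splitlines():
        if line.startswith("## "):
            flush()
            header, lines = line, []
        elif header is not None:
            lines.append(line)
    flush()
    return chunks


# Illustrative input only; not copied from the corpus.
chunks = chunk_by_section(
    "x.md",
    "Sum.",
    "intro\n## A\nalpha\n\n## B\nbeta\n",
)
```

Each entry in `chunks` can then be passed to whichever embedding model is in use; keeping the `id` tied to the section header makes it easy to cite the source section in a retrieved answer.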

## Maintenance

- Keep `03-rtl-catalog.md` honest against `hbrtl/register.go` (names are authoritative;
  rare signatures may drift).
- **Append every new trap to `05-gotchas.md`.** That file is the compounding asset.
- Grammar truth: `compiler/{lexer,parser,ast}`. Idiom truth: the `solmade` app.