
CharlesKWON b8a19bd350 docs(rag): note bluge full-text index (built via KWONDoc source)

Corpus indexed into KWONDoc's bluge index (~/.kwondoc/search-index,
category five-rag) so bluge_search surfaces it; README documents the
re-index command (cmd/ragindex online upsert, doesn't wipe other docs).

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>

2026-06-15 16:35:00 +09:00

2.9 KiB


Five RAG — knowledge corpus for LLM agents writing Five

A compact, retrieval-ready knowledge base that lets an LLM read and write Five (xBase/Harbour → Go) code correctly without prior training on it. It is the practical form of "give the model the grammar via RAG": the grammar, the runtime-library (RTL) surface, real idioms, and the long-tail gotchas.

Why this exists

Five is token-dense, so the corpus needed to teach a model is small and cheap to inject: a dense language costs fewer context tokens to RAG than a verbose one. Grammar and RTL retrieval closes most of the gap; the accumulating gotchas file closes the semantic long tail.

Contents

| File | What it covers |
|---|---|
| `01-overview.md` | What Five is, design priorities, the two runtimes, compile model |
| `02-syntax.md` | Declarations, literals, operators, control flow, code blocks |
| `03-rtl-catalog.md` | Runtime-library functions (strings, array, hash, JSON, date, regex, charset, …) |
| `04-idioms.md` | Web/worker patterns: HTTP endpoint, routing, Postgres, job queue, LLM, build/deploy |
| `05-gotchas.md` | Non-obvious traps + fixes (the highest-signal file) |
| `06-security.md` | Web security patterns: authz, sessions, password hashing, XSS, CSP, uploads |
| `INDEX.md` | Retrieval manifest (doc → keywords + one-line) |

Every file has YAML frontmatter (doc, title, keywords, summary) for ranking.
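For custom tooling it can help to read that frontmatter without pulling in a YAML dependency. A minimal Go sketch, assuming the usual `---`-delimited block at the top of each file, flat `key: value` lines, and `keywords` written as an inline list; `Frontmatter` and `splitFrontmatter` are hypothetical names, and a real YAML parser is the safer choice if the files use anything richer.

```go
package main

import (
	"bufio"
	"fmt"
	"strings"
)

// Frontmatter holds the four fields the corpus files are said to carry.
type Frontmatter struct {
	Doc      string
	Title    string
	Keywords []string
	Summary  string
}

// splitFrontmatter separates a leading "---"-delimited block from the body.
// Assumptions: flat "key: value" lines (no nested YAML), and keywords
// written inline as "[a, b, c]" or as a plain comma-separated string.
func splitFrontmatter(src string) (Frontmatter, string, error) {
	var fm Frontmatter
	if !strings.HasPrefix(src, "---\n") {
		return fm, src, fmt.Errorf("no frontmatter")
	}
	rest := src[len("---\n"):]
	end := strings.Index(rest, "\n---")
	if end < 0 {
		return fm, src, fmt.Errorf("unterminated frontmatter")
	}
	header := rest[:end]
	body := strings.TrimPrefix(rest[end+len("\n---"):], "\n")

	sc := bufio.NewScanner(strings.NewReader(header))
	for sc.Scan() {
		key, val, ok := strings.Cut(sc.Text(), ":")
		if !ok {
			continue // skip lines that are not "key: value"
		}
		val = strings.Trim(strings.TrimSpace(val), `"`)
		switch strings.TrimSpace(key) {
		case "doc":
			fm.Doc = val
		case "title":
			fm.Title = val
		case "summary":
			fm.Summary = val
		case "keywords":
			for _, k := range strings.Split(strings.Trim(val, "[]"), ",") {
				if k = strings.TrimSpace(k); k != "" {
					fm.Keywords = append(fm.Keywords, k)
				}
			}
		}
	}
	return fm, body, sc.Err()
}

func main() {
	// Hypothetical file contents; the real field values are not shown in this README.
	src := "---\ndoc: 05-gotchas\ntitle: Gotchas\nkeywords: [nil, array, index]\nsummary: Non-obvious traps.\n---\n## 1. First trap\n"
	fm, body, err := splitFrontmatter(src)
	if err != nil {
		fmt.Println("error:", err)
		return
	}
	fmt.Println(fm.Doc, fm.Title, fm.Keywords)
	fmt.Print(body)
}
```

Splitting on the first `:` keeps titles that themselves contain a colon intact.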

How to consume

Direct context injection (simplest): for a small or medium task, paste the relevant doc(s) into the prompt. For broad work, 01 + 02 + 05 fit easily in context; pull sections of 03/04 as needed.
Keyword retrieval (built-in): run `./search.sh <terms>`, a dependency-free ripgrep/grep ranker over the corpus. It scores frontmatter keyword matches ×3 plus body matches and prints the ranked docs with their matching `##` section headers. There is no index to build. Example: `./search.sh session token csprng` → `06-security.md` §2.
bluge full-text index (KWONDoc): the corpus is indexed into KWONDoc's bluge index (`~/.kwondoc/search-index`, category `five-rag`), so `bluge_search` finds it. Re-index after edits with `cd ~/kwondoc && go run ./cmd/ragindex <abs path to rag> five-rag`. This is an online upsert keyed by file path, so it does not wipe other docs.
`INDEX.md` is the hand-curated routing table; an embeddings index can ingest the same `.md` files.
Embedding RAG: chunk by `##` headers (each section is self-contained) and use the frontmatter `summary` as each chunk's preamble.
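The weighting `search.sh` is described as using above (keyword matches ×3 plus body matches, with matching `##` headers reported) can be re-created in Go for tooling that cannot shell out. This is a hypothetical re-implementation of the described behavior, not the script's actual code; the `Doc`/`Hit` types, case-insensitive substring counting, and tie handling are assumptions.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// Doc is one corpus file: its raw frontmatter keywords and its body.
type Doc struct {
	Name     string
	Keywords string
	Body     string
}

// Hit is a ranked doc plus the "## " headers whose sections matched.
type Hit struct {
	Name     string
	Score    int
	Sections []string
}

// normalize lower-cases terms and drops empty ones (an empty term would
// otherwise match everywhere).
func normalize(terms []string) []string {
	var out []string
	for _, t := range terms {
		if t = strings.ToLower(strings.TrimSpace(t)); t != "" {
			out = append(out, t)
		}
	}
	return out
}

// rank scores each doc as 3*(keyword matches) + (body matches), using
// case-insensitive substring counts, and returns non-zero hits best-first.
func rank(docs []Doc, terms []string) []Hit {
	terms = normalize(terms)
	var hits []Hit
	for _, d := range docs {
		kw, body := strings.ToLower(d.Keywords), strings.ToLower(d.Body)
		score := 0
		for _, t := range terms {
			score += 3*strings.Count(kw, t) + strings.Count(body, t)
		}
		if score > 0 {
			hits = append(hits, Hit{d.Name, score, matchingSections(d.Body, terms)})
		}
	}
	// Stable sort: equal scores keep the input order.
	sort.SliceStable(hits, func(i, j int) bool { return hits[i].Score > hits[j].Score })
	return hits
}

// matchingSections returns each "## " header whose section (header line
// included) contains at least one of the already-normalized terms.
func matchingSections(body string, terms []string) []string {
	var out []string
	header, matched := "", false
	flush := func() {
		if header != "" && matched {
			out = append(out, header)
		}
	}
	for _, line := range strings.Split(body, "\n") {
		if strings.HasPrefix(line, "## ") {
			flush()
			header, matched = line, false
		}
		low := strings.ToLower(line)
		for _, t := range terms {
			if strings.Contains(low, t) {
				matched = true
			}
		}
	}
	flush()
	return out
}

func main() {
	// Hypothetical miniature corpus.
	docs := []Doc{
		{"05-gotchas.md", "nil, array, session", "## 1. Arrays\nindex from 1\n## 2. Sessions\nsession cookie scope\n"},
		{"06-security.md", "session, token, csprng", "## 1. Authz\nroles\n## 2. Sessions\nsession token from a csprng\n"},
	}
	for _, h := range rank(docs, []string{"session", "token", "csprng"}) {
		fmt.Println(h.Name, h.Score, h.Sections)
	}
}
```

Because it counts substrings, `session` also matches `sessions`; a word-boundary regexp would be stricter, at the cost of missing plural forms.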

Suggested system-prompt pointer: "When writing Five (.prg) code, consult the Five RAG at fivedev/five/rag/ — especially 05-gotchas.md — and prefer patterns from 04-idioms.md."

Maintenance

Keep `03-rtl-catalog.md` honest against `hbrtl/register.go`: the function names are authoritative, while signatures of rarely used functions may drift.
Append every new trap to `05-gotchas.md`; that file is the compounding asset.
Grammar source of truth: `compiler/{lexer,parser,ast}`. Idiom source of truth: the solmade app.
