docs(rag): quality-gate idiom + dependency-free search.sh

- 04-idioms: document the lint.sh + smoke_test.sh gates and their wiring
  (build.sh gate, pre-commit hook, deploy-time smoke).
- search.sh: ripgrep/grep keyword ranker over the corpus (keywords ×3 +
  body), prints ranked docs + matching section headers — makes the RAG
  searchable with no index to build. README updated.
- Note: KWONDoc bluge MCP/CLI was unavailable here (MCP not connected;
  CLI license-gated), so search.sh delivers the "searchable" goal now; a
  bluge/embeddings index can ingest the same .md files later.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
CharlesKWON
2026-06-15 16:26:54 +09:00
parent f26911177f
commit 59d7e490b4
3 changed files with 65 additions and 2 deletions


@@ -180,6 +180,21 @@ Web and worker are **separate binaries** built from explicit file lists.
- `--module <name>` sets the temp module path (must sit under the app's module so RTL
packages can import the app's internal packages).
### Quality gates (lint + smoke)
Two committed gates keep a dynamically-typed PRG codebase honest (solmade):
- **`lint.sh`** — static checks: bans inline `;` statement separators
(`IF x ; y ; ENDIF` — hurts visual review; trailing-`;` continuation is fine)
and flags user input concatenated into SQL instead of `$1` binds. Exit 1 on
any ERROR. Fast, no server/DB needed.
- **`smoke_test.sh`** — runtime endpoint contract test (below).
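The inline-`;` ban could be approximated with a single `grep`. This is a minimal sketch, not the actual `lint.sh`: the function name `lint_inline_semicolons` is hypothetical, and the regex is naive (it does not skip a `;` inside string literals or comments).

```bash
#!/usr/bin/env bash
# Hypothetical sketch of one lint check; the real lint.sh may differ.
# Flags a `;` that is followed by more text on the same line
# (`IF x ; y ; ENDIF`) and allows a trailing `;` continuation.
lint_inline_semicolons() {
    local hits rc
    hits="$(grep -HnE -e ';[[:space:]]*[^[:space:]]' -- "$@")"
    rc=$?
    case "$rc" in
        0)  # at least one violation: print each as an ERROR line
            printf '%s\n' "$hits" | sed 's/^/ERROR inline ; separator: /'
            return 1 ;;
        1)  # grep found nothing: clean
            return 0 ;;
        *)  # grep itself failed (e.g. unreadable file)
            return 2 ;;
    esac
}

# Usage (path is an assumption): lint_inline_semicolons src/*.prg || exit 1
```

Returning non-zero on any hit matches the "exit 1 on any ERROR" contract described above, so the caller can abort with a plain `|| exit 1`.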
Wiring: `build.sh` runs `./lint.sh` first and aborts the build on violation; a
`.githooks/pre-commit` runs it on every commit (enable once with
`git config core.hooksPath .githooks`; bypass with `git commit --no-verify`).
Run `smoke_test.sh` at deploy time (it needs the server up), not in the hook.
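The wiring can be sketched as one small helper that both `build.sh` and the hook call. This is an illustration under assumptions: `run_gate` is a hypothetical name, and `lint.sh` is assumed to live at the repository root.

```bash
#!/usr/bin/env bash
# Sketch only: run one gate command and report when it fails.
# run_gate is a hypothetical helper, not part of the repository.
run_gate() {
    local label="$1"; shift
    if ! "$@"; then
        echo "$label: gate failed, aborting" >&2
        return 1
    fi
    return 0
}

# build.sh (lint first, then build):
#   run_gate build ./lint.sh || exit 1
# .githooks/pre-commit (a non-zero exit makes git abort the commit):
#   run_gate pre-commit "$(git rev-parse --show-toplevel)/lint.sh" || exit 1
```

Git only aborts a commit when the hook exits non-zero, which is why the hook line ends in `|| exit 1`.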
### Safety net: endpoint smoke test
Dynamic typing has no compile-time guarantee, so a smoke test is the practical safety net


@@ -29,8 +29,11 @@ Every file has YAML frontmatter (`doc`, `title`, `keywords`, `summary`) for rank
- **Direct context injection (simplest):** for a small/medium task, paste the relevant
doc(s). For broad work, `01`+`02`+`05` fit easily; pull `03`/`04` sections as needed.
- **Keyword retrieval (e.g. bluge / ripgrep):** index the `.md` files; rank on the
frontmatter `keywords` + body. `INDEX.md` is a hand-curated routing table.
- **Keyword retrieval (built-in):** run `./search.sh <terms>` — a dependency-free
ripgrep/grep ranker over the corpus (frontmatter `keywords` weighted ×3 + body),
printing ranked docs with the matching `##` section headers. No index to build.
e.g. `./search.sh session token csprng` → `06-security.md §2`. `INDEX.md` is the
hand-curated routing table; a bluge/embeddings index can ingest the same `.md` files.
- **Embedding RAG:** chunk by `##` headers (each section is self-contained). Frontmatter
`summary` makes a good chunk preamble.
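
For the embedding route, the chunking step might look like the sketch below. It assumes each file has a single-line `summary:` in its frontmatter; `chunk_by_h2` and the output naming are hypothetical, and a `## ` line inside a fenced code block would be treated as a header.

```bash
#!/usr/bin/env bash
# chunk_by_h2 FILE OUTDIR: write one chunk file per `## ` section,
# each prefixed with the frontmatter `summary:` value as a preamble.
# Hypothetical helper; text before the first `## ` header is skipped.
chunk_by_h2() {
    local file="$1" outdir="$2"
    mkdir -p "$outdir"
    awk -v outdir="$outdir" -v base="$(basename "$file" .md)" '
        # remember the first summary: line, without its key
        /^summary:/ && summary == "" {
            summary = $0
            sub(/^summary:[ \t]*/, "", summary)
        }
        # every `## ` header starts a new chunk file
        /^## / {
            if (out != "") close(out)
            n++
            out = sprintf("%s/%s-%02d.txt", outdir, base, n)
            if (summary != "") print summary "\n" > out
        }
        # copy lines into the current chunk, header line included
        out != "" { print > out }
    ' "$file"
}

# Usage: chunk_by_h2 06-security.md /tmp/chunks
```

Each chunk then carries the document-level summary plus one self-contained section, ready to hand to whatever embedder is used.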

rag/search.sh Executable file

@@ -0,0 +1,45 @@
#!/usr/bin/env bash
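# rag/search.sh "<query terms>": keyword search over the Five RAG corpus.
#
# A dependency-free searcher for when no bluge/embedding search layer is
# available. Scores each document by frontmatter `keywords:` matches (weight 3)
# plus body matches, ranks the results, and shows the matching section headers.
# The output can be pasted into the context as-is.
#
#   ./search.sh session token csprng
#   ./search.sh xss sanitize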
set -u
RAGDIR="$(cd "$(dirname "$0")" && pwd)"
if [ $# -eq 0 ]; then
    echo "usage: ./search.sh <query terms>" >&2
    exit 1
fi
tmp="$(mktemp)" || exit 1
trap 'rm -f "$tmp"' EXIT   # clean up on every exit path
for f in "$RAGDIR"/0*.md; do
    [ -e "$f" ] || continue   # the glob matched nothing
    score=0
    kwline="$(awk '/^keywords:/{print; exit}' "$f")"
    for term in "$@"; do
        # -F: treat the term as a literal string, not a regex
        kw=$(printf '%s' "$kwline" | grep -ioF -e "$term" | grep -c .)
        # the body count covers the whole file, frontmatter included
        bd=$(grep -ioF -e "$term" "$f" | grep -c .)
        score=$(( score + kw*3 + bd ))
    done
    [ "$score" -gt 0 ] && printf '%d\t%s\n' "$score" "$f" >> "$tmp"
done
if [ ! -s "$tmp" ]; then
    echo "(no matches)"
    exit 0
fi
# One -e per query term: a literal OR match used to pick section headers
pats=()
for term in "$@"; do pats+=(-e "$term"); done
sort -t"$(printf '\t')" -k1,1nr "$tmp" | while IFS="$(printf '\t')" read -r s f; do
    title="$(awk -F': ' '/^title:/{print $2; exit}' "$f")"
    echo "■ [$s] $(basename "$f"): $title"
    # Up to 3 section headers that match any query term
    grep -nE "^#{1,6} " "$f" | grep -iF "${pats[@]}" | head -3 | sed 's/^/  §/'
    echo
done