docs(rag): add security idioms & gotchas (06-security.md)

Capture the hardening patterns from the solmade audit so future Five work reuses them: authorize on resolved function name (not URL path), CSPRNG session tokens stored as hashes, argon2id with legacy-verify + upgrade, login rate-limit + timing-safe dummy hash, bluemonday HTML sanitize vs EscHtml, security headers + nonce CSP, upload allowlist (no SVG), bind-all SQL. Theme: thin Go RTL over an ecosystem crypto lib. INDEX/README updated.

Co-Authored-By: Claude Opus 4.8 (1M context) <noreply@anthropic.com>
2026-06-15 15:49:49 +09:00
parent 80131b1225
commit f26911177f
3 changed files with 123 additions and 0 deletions
--- a/rag/06-security.md
+++ b/rag/06-security.md
@@ -0,0 +1,121 @@
+---
+doc: five-security
+title: Security idioms & gotchas for Five web apps
+keywords: [security, auth, session, csprng, crypto/rand, argon2, password hash, bluemonday, xss, sanitize, csp, security headers, rate limit, role gate, authorization, cookie, upload, sql injection]
+summary: Hardening patterns and traps for a Five web app, grounded in the solmade codebase. Covers authz gating, session tokens, password hashing, XSS sanitization, headers/CSP, login rate-limiting, uploads — and the "thin Go RTL over an ecosystem crypto lib" pattern.
+---
+
+# Five web security — idioms & gotchas
+
+Grounded in `solmade`. The recurring theme: reach for a **thin Go RTL wrapping a
+battle-tested ecosystem library** (crypto/rand, bluemonday, argon2id) rather than
+hand-rolling crypto in PRG.
+
+## 1. GOTCHA — authorize on the RESOLVED function name, not the URL path
+
+The router maps `-` and `/` both to `_` (`/api/admin/x`, `/api/admin-x`, `/api/admin_x`
+all reach `ADMIN_X__MAIN`). A role gate that matches the path prefix `"/api/admin/"`
+is bypassed by the hyphen/underscore variants → privilege escalation.
+
+```five
+// WRONG: SubStr(cPath,1,11) == "/api/admin/"   ← misses /api/admin-x , /api/admin_x
+// RIGHT: gate on the resolved function name
+cFunc := PathToFunc( cPath )
+IF Left( cFunc, 6 ) == "ADMIN_"
+   RETURN cRole == "superadmin" .OR. cRole == "operator"
+ENDIF
+```
+Anon allowlists (login/logout/health) are safe as *exact* matches: a variant spelling
+misses the allowlist and falls through to the authenticated path, so they fail closed.
+(solmade: `app/auth/auth_middleware.prg` RoleAllows.)
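The bypass can be reproduced outside Five with a minimal Go sketch. `pathToFunc` here is a hypothetical stand-in for the router's resolution (the real `PathToFunc` may differ, for example by appending a suffix), and the `viewer` role is made up for the demonstration.

```go
package main

import (
	"fmt"
	"strings"
)

// pathToFunc is a hypothetical stand-in for the router's path resolution.
// Assumption: the "/api/" prefix is dropped, "-" and "/" both become "_",
// and the result is upper-cased.
func pathToFunc(path string) string {
	name := strings.TrimPrefix(path, "/api/")
	name = strings.NewReplacer("-", "_", "/", "_").Replace(name)
	return strings.ToUpper(name)
}

// gateByPath is the flawed check: it only recognizes the "/api/admin/" spelling.
func gateByPath(path, role string) bool {
	if strings.HasPrefix(path, "/api/admin/") {
		return role == "superadmin" || role == "operator"
	}
	return true // simplification: everything else is treated as non-admin
}

// gateByFunc checks the resolved name, so every spelling is covered.
func gateByFunc(path, role string) bool {
	if strings.HasPrefix(pathToFunc(path), "ADMIN_") {
		return role == "superadmin" || role == "operator"
	}
	return true // same simplification as above
}

func main() {
	for _, p := range []string{"/api/admin/x", "/api/admin-x", "/api/admin_x"} {
		fmt.Println(p, pathToFunc(p), gateByPath(p, "viewer"), gateByFunc(p, "viewer"))
	}
}
```

All three spellings resolve to the same name, but only the first is stopped by the path-prefix gate; the name-based gate stops all three.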
+
+## 2. Session tokens — CSPRNG, and store the HASH
+
+`hb_RandomInt` is Mersenne Twister (non-crypto): its output is predictable, so tokens
+built from it can be guessed and sessions hijacked. Use a `crypto/rand` RTL instead. Store
+`SHA256(token)`, not the raw token, so a DB leak yields nothing reusable; only the cookie
+holds the raw value.
+
+```five
+// hbrtl_ext/secrand: SEC_RANDHEX(nBytes) via crypto/rand
+FUNCTION SESSION_TOKEN()         ; RETURN SEC_RANDHEX( 32 )           // 64-hex
+FUNCTION SESSION_TOKEN_HASH( c ) ; RETURN hb_SHA256( hb_CStr( c ) )
+// login : INSERT ... token = SESSION_TOKEN_HASH(cToken); Set-Cookie = raw cToken
+// verify: WHERE s.token = SESSION_TOKEN_HASH(cCookie)
+// logout: DELETE WHERE token = SESSION_TOKEN_HASH(cCookie)
+```
+Cookie flags: `HttpOnly; SameSite=Lax; Max-Age=…` and `Secure` when
+`x-forwarded-proto == https`. (`hb_SHA256` == `shasum -a 256`, lowercase hex.)
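On the Go side, the RTL behind `SEC_RANDHEX` and the stored hash could look roughly like the sketch below. The function names `randHex` and `tokenHash` are illustrative assumptions, not the names used in `hbrtl_ext/secrand`.

```go
package main

import (
	"crypto/rand"
	"crypto/sha256"
	"encoding/hex"
	"fmt"
)

// randHex returns nBytes of CSPRNG output as lowercase hex (2*nBytes characters).
func randHex(nBytes int) (string, error) {
	buf := make([]byte, nBytes)
	if _, err := rand.Read(buf); err != nil {
		return "", err
	}
	return hex.EncodeToString(buf), nil
}

// tokenHash is the value to store and look up; the raw token lives only in the cookie.
func tokenHash(token string) string {
	sum := sha256.Sum256([]byte(token))
	return hex.EncodeToString(sum[:])
}

func main() {
	token, err := randHex(32)
	if err != nil {
		panic(err)
	}
	fmt.Println("cookie value:", token)
	fmt.Println("stored value:", tokenHash(token))
}
```

Because the token already carries 256 bits of entropy, a single unsalted SHA-256 is enough here; the slow, salted hashing of section 3 is only needed for low-entropy secrets such as passwords.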
+
+## 3. Passwords — argon2id, with legacy verify + upgrade-on-login
+
+Salted SHA-256 stretch is GPU-crackable (not memory-hard). Hash with argon2id
+(`alexedwards/argon2id` via RTL). `PASSWD_VERIFY` detects the scheme so legacy
+`$sha256s$` rows still log in; on success, transparently re-hash to argon2id.
+
+```five
+FUNCTION PASSWD_HASH( c )         // try SEC_ARGON2_HASH; fall back to legacy if empty
+FUNCTION PASSWD_VERIFY( c, cEnc ) // Left(cEnc,9)=="$argon2id" → SEC_ARGON2_VERIFY else legacy
+FUNCTION PASSWD_NEEDS_REHASH( cEnc ) ; RETURN Left(hb_CStr(cEnc),9) != "$argon2id"
+// on login success: IF PASSWD_NEEDS_REHASH(stored) → UPDATE users SET password_hash=PASSWD_HASH(pw)
+```
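The verify-then-upgrade flow can be sketched in Go with the hashing functions injected, so the control flow is visible without pulling in a library. `checkAndUpgrade` and `verifyFunc` are hypothetical names, and the stand-ins in `main` do not hash anything; in a real RTL they would wrap the `alexedwards/argon2id` calls and the legacy `$sha256s$` routine.

```go
package main

import (
	"fmt"
	"strings"
)

// needsRehash reports whether a stored hash is not yet argon2id.
func needsRehash(encoded string) bool {
	return !strings.HasPrefix(encoded, "$argon2id")
}

// verifyFunc checks a plaintext password against an encoded hash.
type verifyFunc func(password, encoded string) bool

// checkAndUpgrade picks the verifier from the stored hash's prefix. After a
// successful legacy login it also returns a fresh hash for the caller to
// persist; in every other case the second result is empty.
func checkAndUpgrade(password, stored string, argon, legacy verifyFunc, hashNew func(string) string) (bool, string) {
	verify := argon
	if needsRehash(stored) {
		verify = legacy
	}
	if !verify(password, stored) {
		return false, ""
	}
	if needsRehash(stored) {
		return true, hashNew(password)
	}
	return true, ""
}

func main() {
	// Stand-ins for demonstration only: these do NOT hash anything.
	argon := func(pw, enc string) bool { return enc == "$argon2id$demo$"+pw }
	legacy := func(pw, enc string) bool { return enc == "$sha256s$demo$"+pw }
	hashNew := func(pw string) string { return "$argon2id$demo$" + pw }

	ok, upgraded := checkAndUpgrade("hunter2", "$sha256s$demo$hunter2", argon, legacy, hashNew)
	fmt.Println(ok, upgraded != "")
}
```

The upgrade can only happen at login because that is the one moment the server holds the plaintext password.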
+
+## 4. Login — rate limit + timing-safe unknown-user path
+
+- Rate limit per IP: a `login_attempts` table; count failures in the last 15 min; ≥20 → 429.
+  Clear an IP's failures on success.
+- Timing: an unknown email must NOT return before doing hash work, or response time leaks
+  which emails exist. Run a dummy hash on the not-found branch.
+
+```five
+IF aRows == NIL .OR. Len(aRows) == 0
+   PASSWD_VERIFY_DUMMY( cPass )        // burns argon2 work → constant-ish timing
+   RecordLogin( nPG, cIP, cEmail, .f. )
+   RETURN API_ERR( 401, "invalid credentials" )
+ENDIF
+```
+Use the SAME error text/status for "no such user" and "wrong password".
+
+## 5. XSS — sanitize user HTML; escape plain-text contexts
+
+A CMS stores rich text (Editor.js: `<b><i><a>`). You cannot blanket-escape it (breaks
+formatting) and you must not concat it raw into HTML (stored XSS). Sanitize with an
+allowlist (`bluemonday` via RTL); escape only genuinely plain-text slots.
+
+```five
+// hbrtl_ext/sanitize: HTML_SANITIZE(html) via bluemonday.UGCPolicy()
+cOut += '<p>' + HTML_SANITIZE( cUserText ) + '</p>'      // rich text
+cHtml += '<title>' + EscHtml( cPlainTitle ) + '</title>' // plain text
+cImg  += 'src="' + EscAttr( cUrl ) + '"'                 // attribute
+```
+Strips `<script>`, `on*=` handlers, `javascript:`; keeps `<b>`, links, lists.
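For the plain-text slots, the escaping side can be shown with Go's standard library alone. This is a sketch of what helpers like `EscHtml` and `EscAttr` are assumed to do, not their actual implementation; `titleTag` and `imgTag` are hypothetical names.

```go
package main

import (
	"fmt"
	"html"
)

// titleTag renders untrusted plain text into an element body.
func titleTag(plain string) string {
	return "<title>" + html.EscapeString(plain) + "</title>"
}

// imgTag renders an untrusted URL into a double-quoted attribute.
// Escaping stops the value from breaking out of the quotes; it does not
// validate the URL scheme, which is a separate check.
func imgTag(url string) string {
	return `<img src="` + html.EscapeString(url) + `">`
}

func main() {
	fmt.Println(titleTag("<script>alert(1)</script>"))
	fmt.Println(imgTag(`x" onerror="alert(1)`))
}
```

Rich text takes the other path: in Go that would be `bluemonday.UGCPolicy().Sanitize(userHTML)`, which keeps allowlisted markup rather than escaping everything.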
+
+## 6. Response headers + CSP
+
+Set on every response: `X-Content-Type-Options: nosniff`, `X-Frame-Options: SAMEORIGIN`,
+`Referrer-Policy: strict-origin-when-cross-origin`. For a page that renders user content,
+add a **per-request nonce CSP** so injected scripts can't run while your own nonce-tagged
+blocks (such as JSON-LD) are still accepted:
+
+```five
+cNonce := SEC_RANDHEX( 16 )
+hHdrs[ "Content-Security-Policy" ] := ;
+   "default-src 'self'; img-src 'self' https: data:; " + ;
+   "style-src 'self' 'unsafe-inline' https://fonts.googleapis.com; " + ;
+   "font-src 'self' https://fonts.gstatic.com; " + ;
+   "script-src 'nonce-" + cNonce + "'; object-src 'none'; base-uri 'none'"
+// emit JSON-LD as: <script type="application/ld+json" nonce="<cNonce>">
+```
+A global CSP often breaks editor CDNs/inline styles — scope it to the rendered page.
+
+## 7. Uploads
+
+Use an extension **allowlist** (unknown → `.bin`) and **exclude `.svg`**: SVG can carry
+script and becomes XSS when served inline. Strip path traversal from filenames (`..`, `/`,
+`\`, NUL) and prefix each with a server-generated id so stored names aren't attacker-controlled.
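A minimal Go sketch of these filename rules follows. The allowlist contents, the `storedName` and `safeExt` names, and the `id_stem.ext` layout are all assumptions for illustration; the real upload handler may differ.

```go
package main

import (
	"fmt"
	"path"
	"strings"
)

// allowedExt is a hypothetical allowlist; ".svg" is deliberately absent.
var allowedExt = map[string]bool{
	".jpg": true, ".jpeg": true, ".png": true, ".gif": true, ".webp": true, ".pdf": true,
}

// safeExt returns the lower-cased extension if allowlisted, else ".bin".
func safeExt(name string) string {
	ext := strings.ToLower(path.Ext(name))
	if allowedExt[ext] {
		return ext
	}
	return ".bin"
}

// storedName builds the on-disk name from a server-generated id and a
// cleaned version of the client-supplied filename.
func storedName(serverID, clientName string) string {
	// Treat "\" like "/" and keep only the last path element.
	base := strings.ReplaceAll(clientName, `\`, "/")
	base = base[strings.LastIndex(base, "/")+1:]
	// Drop NUL bytes, then remove ".." until none is left.
	base = strings.ReplaceAll(base, "\x00", "")
	for strings.Contains(base, "..") {
		base = strings.ReplaceAll(base, "..", "")
	}
	stem := strings.TrimSuffix(base, path.Ext(base))
	if stem == "" {
		stem = "file"
	}
	return serverID + "_" + stem + safeExt(base)
}

func main() {
	for _, n := range []string{"photo.PNG", "logo.svg", "../../etc/passwd"} {
		fmt.Println(storedName("u1", n))
	}
}
```

Mapping unknown extensions to `.bin` rather than rejecting them keeps the upload working while ensuring the file is never served under a type the browser would render.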
+
+## 8. SQL — always bind, never concat user input
+
+Bind every user value as `$1/$2…`, e.g. `PG_QUERY(nPG, "… WHERE x=$1", { val })`. Only ever
+concatenate *validated-allowlist* identifiers (table/column names), never raw input.
+(See [[five-idioms]] for the Postgres patterns and the strings-not-ints column gotcha.)
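The split between bound values and allowlisted identifiers can be sketched in Go as a query builder that never touches the database. The `posts` table, its columns, and the `listQuery` name are hypothetical.

```go
package main

import (
	"errors"
	"fmt"
)

// sortable is a hypothetical allowlist of column names that may be
// concatenated into SQL text.
var sortable = map[string]bool{"created_at": true, "title": true}

// listQuery returns the SQL text and its bind arguments. The user-supplied
// status travels as $1; only an allowlisted column name is concatenated.
func listQuery(status, sortCol string) (string, []any, error) {
	if !sortable[sortCol] {
		return "", nil, errors.New("unsupported sort column")
	}
	q := "SELECT id, title FROM posts WHERE status = $1 ORDER BY " + sortCol
	return q, []any{status}, nil
}

func main() {
	q, args, err := listQuery("published", "created_at")
	fmt.Println(q, args, err)

	_, _, err = listQuery("published", "title; DROP TABLE posts")
	fmt.Println(err)
}
```

Identifiers cannot be sent as bind parameters, which is why the allowlist is the only safe way to let a request choose a column.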
+
+Related: [[five-idioms]], [[five-gotchas]]
--- a/rag/INDEX.md
+++ b/rag/INDEX.md
@@ -9,6 +9,7 @@ Route a query to the right doc(s). Each row: file · when to retrieve · keyword
 | `03-rtl-catalog.md` | "what function does X" — string/array/hash/json/date/regex/charset/math/crypto builtins | rtl, builtin, Len, SubStr, Left, Right, At, Upper, AllTrim, PadL, PadR, StrTran, Chr, Asc, Val, Str, hb_NToS, hb_CStr, AAdd, AScan, AEval, hb_HGetDef, hb_HHasKey, hb_jsonDecode, hb_jsonEncode, ValType, HB_ISHASH, regex, HB_GETCHARSET, date, hb_ATokens |
 | `04-idioms.md` | building an endpoint, DB access, async/queue work, calling the LLM, building/deploying | idioms, http, endpoint, routing, AP_BODY, AP_GETPAIRS, AP_JSONRESPONSE, ctx_set, ctx_get, LABDB_GET_PG, PG_QUERY, PG_EXEC, PG_LAST_ERROR, RETURNING, CREATE TABLE IF NOT EXISTS, text_tasks, FOR UPDATE SKIP LOCKED, job queue, LLM_CHAT, fnode, build.sh, launchctl |
 | `05-gotchas.md` | debugging "why doesn't this work", or BEFORE editing string funcs / charset / SQL / LLM | gotcha, trap, intrinsic, gengo, charset, utf8, string escape, Chr, pgrtl string columns, Val, hb_CStr, model local, ResolveLlmModel, two runtimes, fnode, analyzer warning, CWD module resolution |
+| `06-security.md` | adding auth/login, sessions, password hashing, file uploads, or rendering user content into HTML | security, auth, authorization, role gate, session token, csprng, crypto/rand, argon2, password hash, xss, bluemonday, sanitize, csp, security headers, rate limit, cookie, upload, sql injection |

 ## Quick routing heuristics

--- a/rag/README.md
+++ b/rag/README.md
@@ -20,6 +20,7 @@ of the gap; the accumulating **gotchas** file closes the semantic long tail.
 | `03-rtl-catalog.md` | Runtime-library functions (strings, array, hash, JSON, date, regex, charset, …) |
 | `04-idioms.md` | Web/worker patterns: HTTP endpoint, routing, Postgres, job queue, LLM, build/deploy |
 | `05-gotchas.md` | Non-obvious traps + fixes (the highest-signal file) |
+| `06-security.md` | Web security patterns: authz, sessions, password hashing, XSS, CSP, uploads |
 | `INDEX.md` | Retrieval manifest (doc → keywords + one-line) |

 Every file has YAML frontmatter (`doc`, `title`, `keywords`, `summary`) for ranking.