five/hbrtl/rdd.go
CharlesKWON 151b628f6c fix(pgserver): Layer 5 — per-path mmap-gen registry + getWA torn-read
Closes the Go-panic class of multi-session concurrency bugs and
introduces an explicit cross-area mmap invalidation channel.

1. getWA waCache torn-read (root cause of panics)

   hbrtl/rdd.go cached the most recent `interface{} → *WAM` type
   assertion in a process-global struct of two `interface{}`-
   shaped fields. Each pgserver connection's NewThread gets its
   own WAM, so the cache missed on every call and immediately
   re-wrote two shared, unsynchronised fields. Go's `interface{}`
   is two words; concurrent write + read produced torn pointer
   values, with the result that goroutine A could observe
   goroutine B's WAM as its own.

   That mis-attribution surfaced as:
     - `concurrent map writes` panic at WorkAreaManager.Close
       (workarea.go:95): two goroutines genuinely modifying the
       SAME wam.aliases map.
     - `concurrent map writes` panic at DBFArea.FieldPosCache
       (dbf.go:439): two goroutines lazy-initing the SAME
       fieldPosMap.

   Drop the cache. The type assertion is ~ns; not worth a
   process-global shared slot. If perf matters again, replace
   with a sync.Map keyed by thread pointer, not a single struct.

2. Per-path mmap generation registry (hbrdd/dbf/area_registry.go)

   Each unique on-disk DBF path gets an atomic uint64 generation
   counter. *DBFArea instances:
     - On Open: pathGen = pathGenFor(path); pathGenSeen = current.
     - On Append (shared) / flushRecord: bumpPathGen(path);
       pathGenSeen = current.
     - On loadRecord: if pathGenSeen < live counter, bypass mmap
       fast path for THIS load (use ReadAt) and re-sync seen.

   Without this, when a peer DBFArea's PutValue mutated a record
   we had mmap-cached, our loads kept returning the stale
   pre-mutation bytes from our snapshot. The existing
   length-bound check covered file growth (`offset > mmap len`)
   but not byte-level mutation within the snapshot range. The
   registry covers both.

   Cheap: each load costs one atomic.LoadUint64, and the mmap
   fast path is taken ~100% of the time in the
   single-writer-many-readers steady state.
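
The three rules above can be sketched as follows. This is an illustration under assumptions, not the contents of area_registry.go: the `area` type, `openArea`, `noteWrite` and `mmapUsable` are hypothetical names, and only `pathGenFor`, `pathGen` and `pathGenSeen` are taken from the description.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pathGens maps a DBF path to its *uint64 generation counter.
var pathGens sync.Map

// pathGenFor returns the counter shared by every area opened on path,
// creating it on first use.
func pathGenFor(path string) *uint64 {
	if v, ok := pathGens.Load(path); ok {
		return v.(*uint64)
	}
	v, _ := pathGens.LoadOrStore(path, new(uint64))
	return v.(*uint64)
}

// area is a hypothetical, stripped-down stand-in for *DBFArea.
type area struct {
	pathGen     *uint64 // shared with every area on the same path
	pathGenSeen uint64  // generation this area last synchronised with
}

// openArea mirrors the "On Open" rule.
func openArea(path string) *area {
	g := pathGenFor(path)
	return &area{pathGen: g, pathGenSeen: atomic.LoadUint64(g)}
}

// noteWrite mirrors the "On Append (shared) / flushRecord" rule:
// bump the shared counter and record the new value as seen.
func (a *area) noteWrite() {
	a.pathGenSeen = atomic.AddUint64(a.pathGen, 1)
}

// mmapUsable mirrors the "On loadRecord" rule. If a peer has written
// since we last looked, it reports false for this one load (the caller
// would fall back to ReadAt) and re-syncs the seen generation.
func (a *area) mmapUsable() bool {
	live := atomic.LoadUint64(a.pathGen)
	if a.pathGenSeen < live {
		a.pathGenSeen = live
		return false
	}
	return true
}

func main() {
	reader := openArea("customers.dbf")
	writer := openArea("customers.dbf")

	fmt.Println(reader.mmapUsable()) // true: nothing written yet
	writer.noteWrite()
	fmt.Println(reader.mmapUsable()) // false: peer wrote, bypass once
	fmt.Println(reader.mmapUsable()) // true: re-synced
	fmt.Println(writer.mmapUsable()) // true: writer saw its own bump
}
```

In this sketch `pathGenSeen` is a plain field, which assumes each area is used by one goroutine at a time; only the shared counter needs atomic access.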

Verification
------------

Same pgx-driven concurrency stress harness; pass rates are listed
as 3 workers / 5 workers / 10 workers:

  pre-Layer-1 baseline:       ~60% pass + occasional panic
  +Layer 1+2:                 80% / 50% / panic
  +Layer 3a (max-merge):      80% / 50% / panic
  +Layer 4a (per-session 3):  90% / 80% / 50%
  +Layer 4b (Go atomics):     75-90% / 50-80% / panic (still)
  +THIS (getWA + mmap-gen):   73% / 67% / 33%, ZERO PANICS

The shift to "partial failures, no panics" is what matters for
production: a connection that sees stale data can recover by
rerunning the query; a Go-level process crash cannot. The
remaining correctness flakes come from the in-flight appendBuf
interaction when a peer Append fires between this connection's
Append and its flushRecord. That is tractable with a
per-connection flush ordering rule, deferred to Layer 6.

All six release gates green:
  go test ./...               ✓
  FiveSql2 SQL:1999 43/43     ✓
  Harbour compat 56/56        ✓
  std.ch 17/17                ✓
  FRB 7/7                     ✓
  pgserver integration 6/6    ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 21:43:04 +09:00

// Copyright (c) 2026 Charles KWON OhJun (charleskwonohjun@gmail.com)
// All rights reserved.
// RDD-related RTL functions: EOF(), BOF(), Found(), RecNo(), RecCount(), Deleted(), FieldGet().
// Hot path (called millions of times in loops): each function pairs Frame with EndProcFast rather than EndProc.
package hbrtl

import (
	"five/hbrdd"
	"five/hbrt"
)

func rtlEOF(t *hbrt.Thread) {
	t.Frame(0, 0)
	defer t.EndProcFast()
	if wa := getWA(t); wa != nil {
		if area := wa.Current(); area != nil {
			t.PushBool(area.EOF())
			t.RetValue()
			return
		}
	}
	// No work area selected: report EOF.
	t.PushBool(true)
	t.RetValue()
}

func rtlBOF(t *hbrt.Thread) {
	t.Frame(0, 0)
	defer t.EndProcFast()
	if wa := getWA(t); wa != nil {
		if area := wa.Current(); area != nil {
			t.PushBool(area.BOF())
			t.RetValue()
			return
		}
	}
	// No work area selected: report BOF.
	t.PushBool(true)
	t.RetValue()
}

func rtlFound(t *hbrt.Thread) {
	t.Frame(0, 0)
	defer t.EndProcFast()
	if wa := getWA(t); wa != nil {
		if area := wa.Current(); area != nil {
			t.PushBool(area.Found())
			t.RetValue()
			return
		}
	}
	t.PushBool(false)
	t.RetValue()
}

func rtlRecNo(t *hbrt.Thread) {
	t.Frame(0, 0)
	defer t.EndProcFast()
	if wa := getWA(t); wa != nil {
		if area := wa.Current(); area != nil {
			t.RetInt(int64(area.RecNo()))
			return
		}
	}
	t.RetInt(0)
}

func rtlRecCount(t *hbrt.Thread) {
	t.Frame(0, 0)
	defer t.EndProcFast()
	if wa := getWA(t); wa != nil {
		if area := wa.Current(); area != nil {
			// The error is deliberately ignored; whatever count came back is returned.
			rc, _ := area.RecCount()
			t.RetInt(int64(rc))
			return
		}
	}
	t.RetInt(0)
}

func rtlDeleted(t *hbrt.Thread) {
	t.Frame(0, 0)
	defer t.EndProcFast()
	if wa := getWA(t); wa != nil {
		if area := wa.Current(); area != nil {
			t.PushBool(area.Deleted())
			t.RetValue()
			return
		}
	}
	t.PushBool(false)
	t.RetValue()
}

func rtlFieldGet(t *hbrt.Thread) {
	t.Frame(1, 0)
	defer t.EndProcFast()
	n := int(t.Local(1).AsNumInt())
	if wa := getWA(t); wa != nil {
		if area := wa.Current(); area != nil {
			val, err := area.GetValue(n - 1) // 1-based → 0-based
			if err == nil {
				t.PushValue(val)
				t.RetValue()
				return
			}
		}
	}
	// No work area, or the field lookup failed: return NIL.
	t.PushNil()
	t.RetValue()
}

// getWA resolves the WorkAreaManager attached to this thread.
//
// The previous version cached the last-seen interface→*WAM pair in
// a process-global struct to skip the type assertion. That cache
// was the worst-of-both-worlds under multi-pgserver-connection
// load: each connection's thread has its own WAM, so the cache
// missed on every call and immediately re-wrote two shared
// `interface{}` fields. Go's interface is a two-word value, so a
// concurrent write+read produced torn pointers — different
// goroutines saw the WRONG WAM as their own, leading to the
// FieldPosCache + WAM.aliases "concurrent map writes" panics.
//
// The type assertion itself is fast (~ns). Drop the cache; if the
// micro-bench matters again, replace with a sync/atomic.Pointer
// or sync.Map keyed by thread, not a single global slot.
func getWA(t *hbrt.Thread) *hbrdd.WorkAreaManager {
	if t.WA == nil {
		return nil
	}
	// Comma-ok form: a failed assertion yields nil rather than a panic.
	wa, _ := t.WA.(*hbrdd.WorkAreaManager)
	return wa
}