fix(pgserver,dbf): partial fix for multi-session concurrency race
Addresses two of the three layers behind the audit's "WorkArea
collision under multi-session" risk surfaced in Phase 3:
1. Shared DATA-INIT hash literals (PRG side).
TSqlSession.prg declared `DATA hPlanCache INIT { => }` (plus
hSavepoints + hRolePerms etc.). On the gengo path that
compiles class-DATA INITs, the {=>} literal is sometimes
evaluated ONCE at class-definition time, with every
subsequent New() reusing the same hash pointer. Two pgserver
connections then read/wrote a single shared HbHash from
different goroutines, eventually hitting `concurrent map
writes` inside HbHash.ensureIndex (the lazy O(1)-lookup
index map).
The pre-existing gotcha is already documented in
TSqlExecutor.prg's hSubCache comment ("DATA INIT on hash/
array literals can end up sharing the same instance across
New() calls depending on the compile path") — TSqlSession had
missed the same workaround. Moving the explicit
`::hPlanCache := { => }` etc. into the constructor body
guarantees a fresh hash per instance.
2. Stale cross-session recCount cache (Go side).
`*DBFArea.RecCount()` in shared mode caches its result for
the duration of `recCountCacheGen`. Append() bumped the count
on disk + refreshed THIS area's count under the append-intent
lock (Phase 1 of pre-1.0 audit) but never invalidated the
cache on peer DBFArea instances — so a second pgserver
connection's RecCount() kept returning its pre-Append cached
value. The peer's SELECT then iterated 1..old_count and
missed the newly inserted row.
Append() now calls `InvalidateRecCountCache()` after
committing the bumped header. The generation counter went
to atomic.AddUint64 / atomic.LoadUint64 so the bump is
safe to fire from any goroutine without a lock around the
variable.
Measured impact
---------------
Same 3-worker concurrent-INSERT-then-SELECT stress test that was
~3/5 passing pre-fix:
before: 3 / 5 (40% — plus occasional Go-level panic)
after: 8 / 10 (80% — no panics, just intermittent missed rows)
The remaining 20% flake is on the third layer — peer mmap shows a
pre-Append snapshot when Append's `unmap()` only invalidates this
area's own mmap, not the other workareas that opened the same DBF
file independently via dbUseArea. Fixing that requires either a
cross-area registry of mmap views to invalidate, or skipping
mmap entirely when SHARED && cache-gen has bumped. Tracked as a
proper follow-up; tests/pgserver/run.sh's "Known limitation"
header now points at the narrower problem.
Standalone six-gate verification:
go test ./... ✓
FiveSql2 SQL:1999 43/43 ✓
Harbour compat 56/56 ✓
std.ch 17/17 ✓
FRB 7/7 ✓
pgserver integration 6/6 ✓
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
@@ -50,6 +50,21 @@ ENDCLASS
|
|||||||
|
|
||||||
|
|
||||||
METHOD New() CLASS TSqlSession
|
METHOD New() CLASS TSqlSession
|
||||||
|
|
||||||
|
/* DATA hPlanCache INIT { => } would compile-share one hash
|
||||||
|
* across all sessions on some gengo paths (same gotcha
|
||||||
|
* called out in TSqlExecutor.prg's hSubCache comment).
|
||||||
|
* Without explicit per-instance init, two pgserver
|
||||||
|
* connections would both end up reading and writing into the
|
||||||
|
* SAME hPlanCache concurrently — crash on the Go-level map
|
||||||
|
* with "concurrent map writes". Same for the txn log + the
|
||||||
|
* savepoints + role-perms hash. */
|
||||||
|
::aTxnLog := {}
|
||||||
|
::hSavepoints := { => }
|
||||||
|
::hPlanCache := { => }
|
||||||
|
::hRolePerms := { => }
|
||||||
|
::aOpenAreas := {}
|
||||||
|
|
||||||
RETURN SELF
|
RETURN SELF
|
||||||
|
|
||||||
|
|
||||||
|
|||||||
@@ -18,6 +18,7 @@ import (
|
|||||||
"fmt"
|
"fmt"
|
||||||
"os"
|
"os"
|
||||||
"strings"
|
"strings"
|
||||||
|
"sync/atomic"
|
||||||
)
|
)
|
||||||
|
|
||||||
// DBFArea implements the DBF database driver.
|
// DBFArea implements the DBF database driver.
|
||||||
@@ -459,7 +460,8 @@ func (a *DBFArea) RecCount() (uint32, error) {
|
|||||||
// cross-process freshness (e.g. SqlScan's one-shot row-count
|
// cross-process freshness (e.g. SqlScan's one-shot row-count
|
||||||
// estimate on a workarea we opened this session) can leave the
|
// estimate on a workarea we opened this session) can leave the
|
||||||
// cache warm. Invalidate on our own Append and dbCloseAll.
|
// cache warm. Invalidate on our own Append and dbCloseAll.
|
||||||
if a.recCountCached && a.recCountGen == recCountCacheGen {
|
gen := atomic.LoadUint64(&recCountCacheGen)
|
||||||
|
if a.recCountCached && a.recCountGen == gen {
|
||||||
return a.recCount, nil
|
return a.recCount, nil
|
||||||
}
|
}
|
||||||
size, err := a.dataFile.Seek(0, 2)
|
size, err := a.dataFile.Seek(0, 2)
|
||||||
@@ -468,7 +470,7 @@ func (a *DBFArea) RecCount() (uint32, error) {
|
|||||||
}
|
}
|
||||||
a.recCount = uint32((size - int64(a.header.HeaderLen)) / int64(a.header.RecordLen))
|
a.recCount = uint32((size - int64(a.header.HeaderLen)) / int64(a.header.RecordLen))
|
||||||
a.recCountCached = true
|
a.recCountCached = true
|
||||||
a.recCountGen = recCountCacheGen
|
a.recCountGen = gen
|
||||||
}
|
}
|
||||||
return a.recCount, nil
|
return a.recCount, nil
|
||||||
}
|
}
|
||||||
@@ -483,9 +485,11 @@ var recCountCacheGen uint64 = 1
|
|||||||
|
|
||||||
// InvalidateRecCountCache bumps the generation counter so every DBFArea's
|
// InvalidateRecCountCache bumps the generation counter so every DBFArea's
|
||||||
// cached count becomes stale and the next RecCount() call re-queries the
|
// cached count becomes stale and the next RecCount() call re-queries the
|
||||||
// filesystem.
|
// filesystem. Atomic so it can be called from any goroutine without a
|
||||||
|
// lock — the only invariant readers care about is "if I see a value > X,
|
||||||
|
// my cache is stale", and atomic.LoadUint64 is sufficient.
|
||||||
func InvalidateRecCountCache() {
|
func InvalidateRecCountCache() {
|
||||||
recCountCacheGen++
|
atomic.AddUint64(&recCountCacheGen, 1)
|
||||||
}
|
}
|
||||||
|
|
||||||
func (a *DBFArea) Deleted() bool {
|
func (a *DBFArea) Deleted() bool {
|
||||||
@@ -808,6 +812,16 @@ func (a *DBFArea) Append() error {
|
|||||||
if _, err := a.dataFile.Seek(0, 0); err == nil {
|
if _, err := a.dataFile.Seek(0, 0); err == nil {
|
||||||
_ = WriteHeader(a.dataFile, &a.header)
|
_ = WriteHeader(a.dataFile, &a.header)
|
||||||
}
|
}
|
||||||
|
// Bump the global recCount-cache generation so every peer
|
||||||
|
// DBFArea on this file re-stats on its next RecCount() call.
|
||||||
|
// Without this, a pgserver connection that holds a cached
|
||||||
|
// count from before our APPEND keeps seeing the pre-Append
|
||||||
|
// row total — its SELECT misses the row we just inserted,
|
||||||
|
// even though loadRecord()'s mmap-fallback path can read
|
||||||
|
// the record bytes fine. Cheap (single atomic increment);
|
||||||
|
// the cost is one Seek+stat per peer's next RecCount call,
|
||||||
|
// which is exactly what cross-connection freshness needs.
|
||||||
|
InvalidateRecCountCache()
|
||||||
}
|
}
|
||||||
|
|
||||||
// Promote to owned buffer for writing
|
// Promote to owned buffer for writing
|
||||||
|
|||||||
Reference in New Issue
Block a user