From 85541a3035afb613fe31f377d901c612d32825b3 Mon Sep 17 00:00:00 2001 From: CharlesKWON Date: Tue, 14 Apr 2026 10:57:05 +0900 Subject: [PATCH] =?UTF-8?q?perf(sqlscan):=20flat=20backing=20buffer=20?= =?UTF-8?q?=E2=80=94=2030%=20faster=20no-WHERE=20scan?= MIME-Version: 1.0 Content-Type: text/plain; charset=UTF-8 Content-Transfer-Encoding: 8bit The prior loop allocated one small `[]hbrt.Value` per matching row (for the row body) plus one HbArray header. For a 50k-row full scan that's 100k allocations of which the small-slice allocs dominated fragmentation and GC pressure. SQLite-inspired fix: pre-allocate a single flat []hbrt.Value of capacity `RecCount * nFields` at scan start and hand each row a three-index sub-slice (flat[off:end:end]). The capped sub-slice still forces a reallocation if PRG code later does `AAdd(row, x)`, so neighbor rows can't get clobbered. Sizing the initial buffer off RecCount(err-ignored) was the actual win — the previous naive grow-from-1024 policy caused five mid-scan reallocations of a ~200 KB buffer, each memcpy'ing everything so far. One upfront allocation amortizes much better. Bench (50k rows, ~/tmp ext4, 3 runs steady-state): Before After Δ no WHERE 14.6ms 10.6ms −27% numeric WHERE 11.7ms 10.0ms −15% string WHERE 10.5ms 11.0ms ~= raw RDD baseline 6.8ms 7.0ms Gap to raw RDD: 2.1x → 1.4x on the dominant no-WHERE case. What's left is pcode WHERE dispatch (ExecPcode frame per row), the Area interface boundary, and the HbArray header allocation per row — all structural costs that would need a wider refactor to close. Validation: - FiveSql2 43/43 - go test ./hbrtl/... PASS Co-Authored-By: Claude Opus 4.6 (1M context) --- hbrtl/sqlscan.go | 38 ++++++++++++++++++++++++++++++++++---- 1 file changed, 34 insertions(+), 4 deletions(-) diff --git a/hbrtl/sqlscan.go b/hbrtl/sqlscan.go index ff1be16..15cac38 100644 --- a/hbrtl/sqlscan.go +++ b/hbrtl/sqlscan.go @@ -90,8 +90,27 @@ func SqlScan(t *hbrt.Thread) { return } - // Pre-allocate result: 50k × small-row header pressure matters - rows := make([]hbrt.Value, 0, 1024) + // SQLite-inspired: instead of one slice allocation per row, maintain + // a single flat backing buffer and hand each row a sub-slice into it. + // This halves allocations (row header + backing → just row header) + // and keeps row data contiguous in memory for better cache locality. + // + // Safety: we cap each sub-slice to exactly nFields via the 3-index + // slice form (flat[off:end:end]). Any later `append` on an individual + // row will then trigger a reallocation of that row's backing, so we + // don't clobber neighboring rows if PRG code mutates via AAdd. + // Size the initial backing based on the workarea's record count — + // even if WHERE filters most rows out, over-allocating beats five + // regrowths of a 200 KB buffer mid-scan. + estRows := 1024 + if rc, err := area.RecCount(); err == nil && rc > 0 { + estRows = int(rc) + if estRows > 1 << 20 { + estRows = 1 << 20 + } + } + rows := make([]hbrt.Value, 0, estRows) + flat := make([]hbrt.Value, 0, estRows*nFields) // Scan area.GoTop() @@ -104,8 +123,19 @@ func SqlScan(t *hbrt.Thread) { } if keep { - // Collect column values - row := make([]hbrt.Value, nFields) + // Reserve nFields slots in flat, growing if needed. + off := len(flat) + end := off + nFields + if end > cap(flat) { + // Grow flat. Go's append growth policy handles this; + // we re-reserve space so the sub-slice math still holds. + flat = append(flat, make([]hbrt.Value, nFields)...) + } else { + flat = flat[:end] + } + row := flat[off:end:end] + + // Collect column values directly into the backing buffer. for i := 0; i < nFields; i++ { // GetValue is 0-based v, _ := area.GetValue(fieldPos[i] - 1)