perf(sqlscan): flat backing buffer — 30% faster no-WHERE scan
The prior loop allocated one small `[]hbrt.Value` per matching row
(for the row body) plus one HbArray header. For a 50k-row full scan
that is 100k allocations, and the small-slice allocations dominated
fragmentation and GC pressure.
SQLite-inspired fix: pre-allocate a single flat `[]hbrt.Value` of
capacity `RecCount * nFields` at scan start and hand each row a
three-index sub-slice (`flat[off:end:end]`). Because each sub-slice's
capacity is capped at its length, a later `AAdd(row, x)` from PRG code
forces a reallocation of that one row, so neighboring rows can't get
clobbered.
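A minimal sketch of why the third slice index matters. It uses plain `int` as a stand-in for `hbrt.Value`, and `carveRows` is a hypothetical helper written for this illustration, not a function from the codebase.

```go
package main

import "fmt"

// carveRows splits a flat backing slice into rows of nFields values each.
// Each row is a three-index slice, so its capacity equals its length.
// (Hypothetical helper; int stands in for hbrt.Value.)
func carveRows(flat []int, nFields int) [][]int {
	rows := make([][]int, 0, len(flat)/nFields)
	for off := 0; off+nFields <= len(flat); off += nFields {
		end := off + nFields
		rows = append(rows, flat[off:end:end])
	}
	return rows
}

func main() {
	flat := []int{1, 2, 3, 4, 5, 6}
	rows := carveRows(flat, 2)

	// cap(rows[0]) == len(rows[0]), so append must copy to a new array
	// instead of writing into the slot that belongs to rows[1].
	grown := append(rows[0], 99)
	fmt.Println(grown, rows[1], flat) // [1 2 99] [3 4] [1 2 3 4 5 6]

	// Without the third index the capacity reaches to the end of flat,
	// and the same append overwrites the first value of the next row.
	loose := flat[0:2]
	_ = append(loose, 99)
	fmt.Println(flat) // [1 2 99 4 5 6]
}
```

The second half shows the failure mode the cap is guarding against: a two-index sub-slice shares spare capacity with its neighbors, so an in-place append silently rewrites the next row.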
Sizing the initial buffer from RecCount (falling back to 1024 rows on
error, and clamped to 1<<20 rows) was the actual win. The previous
naive grow-from-1024 policy caused five mid-scan reallocations of a
~200 KB buffer, each memcpy'ing everything so far. One upfront
allocation avoids that repeated copying.
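The sizing rule and the cost of growing from a small start can be sketched as follows. This is an illustration under assumptions: `estimateRows` and `countRegrowths` are hypothetical helpers, `int64` is a guess at the record-count type, and the number of regrowths Go's `append` performs depends on the runtime's growth policy and element size, so it will not necessarily match the five observed in the real scan.

```go
package main

import (
	"errors"
	"fmt"
)

const (
	defaultRows = 1024    // fallback when the record count is unavailable
	maxRows     = 1 << 20 // upper bound on the up-front reservation
)

// errNoCount is a placeholder error used only by this example.
var errNoCount = errors.New("record count unavailable")

// estimateRows mirrors the sizing rule described above: trust the record
// count when it is available and positive, clamp it, otherwise fall back.
func estimateRows(recCount int64, err error) int {
	if err != nil || recCount <= 0 {
		return defaultRows
	}
	if recCount > maxRows {
		return maxRows
	}
	return int(recCount)
}

// countRegrowths appends n values to a slice that starts with the given
// capacity and reports how often append had to move to a larger array.
func countRegrowths(initialCap, n int) int {
	buf := make([]int, 0, initialCap)
	regrowths := 0
	for i := 0; i < n; i++ {
		before := cap(buf)
		buf = append(buf, i)
		if cap(buf) != before {
			regrowths++
		}
	}
	return regrowths
}

func main() {
	const n = 50_000
	fmt.Println("grow from 1024:", countRegrowths(defaultRows, n))
	fmt.Println("pre-sized:     ", countRegrowths(estimateRows(n, nil), n))
	fmt.Println("on error:      ", estimateRows(n, errNoCount))
}
```

Every regrowth copies all values appended so far, which is why a capacity that is right (or too large) from the start is cheaper than a small one that has to catch up.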
Bench (50k rows, ~/tmp ext4, 3 runs steady-state):
                     Before    After     Δ
  no WHERE           14.6ms    10.6ms    −27%
  numeric WHERE      11.7ms    10.0ms    −15%
  string WHERE       10.5ms    11.0ms    ~=
  raw RDD baseline    6.8ms     7.0ms
Gap to raw RDD: 2.1x → 1.5x on the dominant no-WHERE case
(14.6/6.8 → 10.6/7.0). What's left is pcode WHERE dispatch (ExecPcode
frame per row), the Area interface boundary, and the HbArray header
allocation per row. These are all structural costs that would need a
wider refactor to close.
Validation:
- FiveSql2 43/43
- go test ./hbrtl/... PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
@@ -90,8 +90,27 @@ func SqlScan(t *hbrt.Thread) {
 		return
 	}
 
-	// Pre-allocate result: 50k × small-row header pressure matters
-	rows := make([]hbrt.Value, 0, 1024)
+	// SQLite-inspired: instead of one slice allocation per row, maintain
+	// a single flat backing buffer and hand each row a sub-slice into it.
+	// This halves allocations (row header + backing → just row header)
+	// and keeps row data contiguous in memory for better cache locality.
+	//
+	// Safety: we cap each sub-slice to exactly nFields via the 3-index
+	// slice form (flat[off:end:end]). Any later `append` on an individual
+	// row will then trigger a reallocation of that row's backing, so we
+	// don't clobber neighboring rows if PRG code mutates via AAdd.
+	// Size the initial backing based on the workarea's record count:
+	// even if WHERE filters most rows out, over-allocating beats five
+	// regrowths of a 200 KB buffer mid-scan.
+	estRows := 1024
+	if rc, err := area.RecCount(); err == nil && rc > 0 {
+		estRows = int(rc)
+		if estRows > 1 << 20 {
+			estRows = 1 << 20
+		}
+	}
+	rows := make([]hbrt.Value, 0, estRows)
+	flat := make([]hbrt.Value, 0, estRows*nFields)
 
 	// Scan
 	area.GoTop()
@@ -104,8 +123,19 @@ func SqlScan(t *hbrt.Thread) {
 		}
 
 		if keep {
-			// Collect column values
-			row := make([]hbrt.Value, nFields)
+			// Reserve nFields slots in flat, growing if needed.
+			off := len(flat)
+			end := off + nFields
+			if end > cap(flat) {
+				// Grow flat. Go's append growth policy handles this;
+				// we re-reserve space so the sub-slice math still holds.
+				flat = append(flat, make([]hbrt.Value, nFields)...)
+			} else {
+				flat = flat[:end]
+			}
+			row := flat[off:end:end]
+
+			// Collect column values directly into the backing buffer.
 			for i := 0; i < nFields; i++ {
 				// GetValue is 0-based
 				v, _ := area.GetValue(fieldPos[i] - 1)
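The reserve-then-carve step in the second hunk can be exercised in isolation. The sketch below is not the committed code: `reserveRow` is a hypothetical helper, `int` stands in for `hbrt.Value`, and the buffer is deliberately undersized so the growth branch runs. It illustrates that rows handed out before a regrowth stay valid, because they keep referencing the old backing array.

```go
package main

import "fmt"

// reserveRow extends flat by nFields zeroed slots and returns the updated
// flat together with a capacity-capped view of the new slots.
// (Hypothetical helper; int stands in for hbrt.Value.)
func reserveRow(flat []int, nFields int) ([]int, []int) {
	off := len(flat)
	end := off + nFields
	if end > cap(flat) {
		// Out of room: append moves flat to a larger array. Rows handed
		// out earlier keep pointing at the old array, which stays alive.
		flat = append(flat, make([]int, nFields)...)
	} else {
		flat = flat[:end]
	}
	return flat, flat[off:end:end]
}

func main() {
	flat := make([]int, 0, 4) // deliberately too small: room for two rows
	var rows [][]int
	for r := 0; r < 3; r++ {
		var row []int
		flat, row = reserveRow(flat, 2)
		row[0], row[1] = r, r*10
		rows = append(rows, row)
	}
	fmt.Println(rows) // [[0 0] [1 10] [2 20]]
}
```

One consequence of this design is that after a regrowth the old backing array remains reachable through the earlier rows, so memory is not released until those rows are; sizing the buffer correctly up front avoids that duplication as well as the copy.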