The prior loop allocated one small `[]hbrt.Value` per matching row
(for the row body) plus one HbArray header. For a 50k-row full scan
that's 100k allocations of which the small-slice allocs dominated
fragmentation and GC pressure.
SQLite-inspired fix: pre-allocate a single flat []hbrt.Value of
capacity `RecCount * nFields` at scan start and hand each row a
three-index sub-slice (flat[off:end:end]). The capped sub-slice
still forces a reallocation if PRG code later does `AAdd(row, x)`,
so neighbor rows can't get clobbered.
Sizing the initial buffer off RecCount(err-ignored) was the actual
win — the previous naive grow-from-1024 policy caused five mid-scan
reallocations of a ~200 KB buffer, each memcpy'ing everything so far.
One upfront allocation amortizes much better.
Bench (50k rows, ~/tmp ext4, 3 runs steady-state):
Before After Δ
no WHERE 14.6ms 10.6ms −27%
numeric WHERE 11.7ms 10.0ms −15%
string WHERE 10.5ms 11.0ms ~=
raw RDD baseline 6.8ms 7.0ms
Gap to raw RDD: 2.1x → 1.4x on the dominant no-WHERE case. What's
left is pcode WHERE dispatch (ExecPcode frame per row), the Area
interface boundary, and the HbArray header allocation per row —
all structural costs that would need a wider refactor to close.
Validation:
- FiveSql2 43/43
- go test ./hbrtl/... PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>