Complex-query benchmarking turned up two hot paths that the earlier
SqlScan/SqlEach work didn't touch: multi-table JOIN and nested-scan
row fetching. This commit addresses both.

--- Part 1: SqlHashBuild — Go-native hash-join build ---

FiveSql2's HashJoin previously built the inner-side hash in PRG:

    WHILE !Eof()
       xVal := FieldGet(nFPos)
       cKey := SqlValToStr(xVal)
       IF !hb_HHasKey(hHash, cKey) ; hHash[cKey] := {} ; ENDIF
       AAdd(hHash[cKey], RecNo())
       dbSkip()
    ENDDO

That loop costs ~40μs per row, spent on class dispatch, hb_HHasKey
lookups, AAdd growth, and SqlValToStr formatting. On a 50k-row inner
table that's ~2 seconds for what should be a sub-50ms housekeeping
op.

New hbrtl.SqlHashBuild does the same thing in one Go-native pass:

  - Direct *dbf.DBFArea loop (no interface dispatch, same devirt as
    SqlScan)
  - Go `map[string][]int64` accumulates RecNos by key, with one
    RecNo slice per distinct key
  - Inline ASCII-only digit formatter for integer keys (formats into
    a stack buffer, so the returned string is the only allocation)
  - CHAR keys are right-trimmed to match SqlCmpEq semantics so the
    hash probe matches what EvalExpr would compute
  - Final Five hash is built once from Keys/Values/Order slices
    directly, skipping the per-key hb_HSet path

HashJoin now calls `SqlHashBuild(nFPos)` instead of running the
PRG loop.

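A minimal, self-contained sketch of the grouping idea described above, not the code from this commit: the `row` type and `buildHash` helper are hypothetical stand-ins for a DBF record and the real build loop. It shows why the build and probe sides must trim keys the same way.

```go
package main

import (
	"fmt"
	"strings"
)

// row is a hypothetical stand-in for one DBF record: its record number
// and the raw, space-padded join-key field.
type row struct {
	recNo int64
	key   string
}

// buildHash groups record numbers by right-trimmed key in a single pass.
func buildHash(rows []row) map[string][]int64 {
	h := make(map[string][]int64, len(rows))
	for _, r := range rows {
		k := strings.TrimRight(r.key, " ")
		h[k] = append(h[k], r.recNo)
	}
	return h
}

func main() {
	inner := []row{{1, "NY   "}, {2, "LA   "}, {3, "NY   "}}
	h := buildHash(inner)
	// Probe with a key trimmed the same way as during the build.
	fmt.Println(h[strings.TrimRight("NY ", " ")]) // [1 3]
	fmt.Println(len(h))                            // 2
}
```

If the probe side skipped the trim, "NY " would miss the "NY" bucket even though the PRG comparison treats the two as equal.
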
--- Part 2: TSqlExecutor:BuildFetchCache ---

The JOIN fallback loop calls FetchRow per row. FetchRow was already
column-ref-aware, but it repeated the string parse
(`At + SubStr + Upper`) and the `::FindWA` linear scan on every
single invocation. For a 50k-row join emitting 50k result rows,
that's ~200k redundant resolutions (about four per emitted row).

New BuildFetchCache walks the SELECT list once before the scan and
pre-binds each plain-column expression to `{nWA, nFPos}`. FetchRow's
new fast path checks ::aFetchCache and jumps straight to
`dbSelectArea + FieldGet` when bound. Complex exprs (functions,
CASE, subqueries) still fall through to EvalExpr.

::aFetchCache is set right before the join WHILE loop and cleared
after it, so there is no cross-query bleed.

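The pre-binding step can be illustrated with a small Go sketch. This is an assumption-laden model, not the PRG implementation: `binding`, `table`, and `buildFetchCache` are hypothetical names, and the "ALIAS.FIELD" parsing is simplified to a single split on the dot.

```go
package main

import (
	"fmt"
	"strings"
)

// binding is a hypothetical pre-resolved column reference: workarea
// index and 0-based field position. bound is false for expressions
// that need the general evaluator.
type binding struct {
	wa, fpos int
	bound    bool
}

// table is a hypothetical stand-in for an open workarea.
type table struct {
	alias  string
	fields []string
}

// buildFetchCache resolves each "ALIAS.FIELD" select expression once.
// Anything that is not a plain column reference stays unbound.
func buildFetchCache(exprs []string, tables []table) []binding {
	cache := make([]binding, len(exprs))
	for i, e := range exprs {
		alias, field, ok := strings.Cut(strings.ToUpper(e), ".")
		if !ok {
			continue
		}
		for w, t := range tables {
			if t.alias != alias {
				continue
			}
			for f, name := range t.fields {
				if name == field {
					cache[i] = binding{wa: w, fpos: f, bound: true}
				}
			}
		}
	}
	return cache
}

func main() {
	tables := []table{
		{"ORD", []string{"ID", "EMP_ID"}},
		{"EMP", []string{"ID", "NAME"}},
	}
	cache := buildFetchCache([]string{"emp.name", "UPPER(x)", "ord.id"}, tables)
	for _, b := range cache {
		fmt.Println(b.bound, b.wa, b.fpos)
	}
}
```

The per-row path then only checks `bound` and reads the field by index; the string parsing and alias search happen once per query instead of once per fetched value.
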
--- Bench (50k ord × 10k emp × 100 dept, 3-run steady state) ---

Query                          Before     After    Speedup
────────────────────────────────────────────────────────────
2-way INNER JOIN, 10k rows       91ms      68ms      1.34x
2-way JOIN + GROUP BY           110ms      94ms      1.17x
3-way INNER JOIN COUNT         2610ms     610ms      4.28x
3-way JOIN + GROUP BY          2860ms     830ms      3.45x

The 3-way speedup is almost entirely SqlHashBuild. The 2-way case
benefits from the fetch cache because its per-row cost is dominated
by FetchRow (no second hash build to amortize).

--- Limits still standing ---

CTE + JOIN queries (Q7 in bench_complex: ~4.5s) aren't affected by
either optimization: CTE materialization goes through a different
path that writes/reads a temp DBF. Follow-up target.

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
// Copyright (c) 2026 Charles KWON OhJun (charleskwonohjun@gmail.com)
// All rights reserved.

// Go-native SQL scan loop for FiveSql2 hot path.
//
// Motivation: FiveSql2 is a PRG-based SQL interpreter. For simple
// "SELECT cols FROM table WHERE cond" queries, the per-row cost is
// dominated by PRG interpreter overhead (AST tree walk, field name
// lookup, workarea switching). Moving just the inner scan loop to Go
// bypasses all that overhead and gets us ~15x speedup for the common
// case while keeping the rest of FiveSql2 untouched.
//
// The SQL engine remains responsible for:
//   - Parsing SQL and building AST
//   - Resolving field names to positions (column binding)
//   - Compiling WHERE expression to pcode (via PcCompile)
//   - GROUP BY, ORDER BY, aggregates (not per-row)
//
// This helper only handles the hot loop:
//   - Full table scan (workarea already positioned)
//   - Per-row WHERE evaluation via ExecPcode
//   - Column extraction via cached field positions
//   - Result array construction

package hbrtl

import (
	"five/hbrdd"
	"five/hbrdd/dbf"
	"five/hbrt"
	"strconv"
)

// SqlScan(aFieldPositions, pcWhere) → aRows
//
// Scans the current workarea top-to-bottom, evaluates pcWhere per row
// (nil = no filter), collects selected column values into rows.
//
// aFieldPositions: array of 1-based field positions to extract per row.
//                  Resolve once before calling (FieldPos cache is O(1)
//                  but still has PRG → Go call overhead).
// pcWhere:         pcode function pointer from PcCompile, or NIL.
//
// Returns:
//   Array of rows, each row = Array of field values.
//
// Notes on CHAR trimming: DBF character fields are space-padded. The
// caller decides whether to trim (via a SELECT-list AllTrim wrapper).
// We don't trim here: that's a semantic choice, and callers who need
// raw bytes shouldn't pay for a strings.TrimSpace().
func SqlScan(t *hbrt.Thread) {
	t.Frame(2, 0)
	defer t.EndProc()

	// Parse arguments
	fieldsVal := t.Local(1)
	if !fieldsVal.IsArray() {
		t.PushValue(hbrt.MakeArray(0))
		t.RetValue()
		return
	}
	fieldsArr := fieldsVal.AsArray().Items
	nFields := len(fieldsArr)

	whereVal := t.Local(2)
	var whereFn *hbrt.PcodeFunc
	if !whereVal.IsNil() {
		if p := whereVal.AsPointer(); p != nil {
			whereFn, _ = p.(*hbrt.PcodeFunc)
		}
	}

	// Pre-convert field positions to []int (avoid Value->int per row)
	fieldPos := make([]int, nFields)
	for i := 0; i < nFields; i++ {
		fieldPos[i] = int(fieldsArr[i].AsNumInt())
		if fieldPos[i] < 1 {
			fieldPos[i] = 1
		}
	}

	wam, ok := t.WA.(*hbrdd.WorkAreaManager)
	if !ok {
		t.PushValue(hbrt.MakeArray(0))
		t.RetValue()
		return
	}
	area := wam.Current()
	if area == nil {
		t.PushValue(hbrt.MakeArray(0))
		t.RetValue()
		return
	}

	// Type-assert to concrete DBFArea once so the hot loop calls
	// GoTop/EOF/Skip/GetValue directly on *dbf.DBFArea without paying
	// the interface dispatch on every row. Falls back to the generic
	// Area path for non-DBF drivers (rare in FiveSql2 context).
	dbfArea, _ := area.(*dbf.DBFArea)

	// SQLite-inspired: instead of one slice allocation per row, maintain
	// a single flat backing buffer and hand each row a sub-slice into it.
	// This halves allocations (row header + backing → just row header)
	// and keeps row data contiguous in memory for better cache locality.
	//
	// Safety: we cap each sub-slice to exactly nFields via the 3-index
	// slice form (flat[off:end:end]). Any later `append` on an individual
	// row will then trigger a reallocation of that row's backing, so we
	// don't clobber neighboring rows if PRG code mutates via AAdd.
	// Size the initial backing based on the workarea's record count:
	// even if WHERE filters most rows out, over-allocating beats five
	// regrowths of a 200 KB buffer mid-scan.
	estRows := 1024
	if rc, err := area.RecCount(); err == nil && rc > 0 {
		estRows = int(rc)
		if estRows > 1<<20 {
			estRows = 1 << 20
		}
	}
	rows := make([]hbrt.Value, 0, estRows)
	flat := make([]hbrt.Value, 0, estRows*nFields)
	slab := hbrt.NewArraySlab(estRows)

	// Install the hot-path field getter so PcOpFieldGet in the compiled
	// WHERE predicate bypasses PushSymbol + Function dispatch + the
	// FieldGet RTL's own Frame. The closure captures the concrete
	// DBFArea directly so there's no interface dispatch per access.
	prevFG := t.FastFieldGetter
	if dbfArea != nil {
		t.FastFieldGetter = func(idx int) hbrt.Value {
			v, _ := dbfArea.GetValue(idx - 1)
			return v
		}
	} else {
		t.FastFieldGetter = func(idx int) hbrt.Value {
			v, _ := area.GetValue(idx - 1)
			return v
		}
	}
	defer func() { t.FastFieldGetter = prevFG }()

	// Scan: four specialized loops. Two axes of specialization:
	//
	//   DBF vs generic Area: devirtualization. Go inlines method calls
	//                        on the concrete type but pays an interface
	//                        dispatch on every call of the generic one.
	//
	//   WHERE vs no-WHERE  : branch hoisting. The no-WHERE case is a
	//                        hot full-scan path (SELECT * or similar),
	//                        where even the predictable `whereFn != nil`
	//                        check and the `keep` shadow variable show
	//                        up in pprof.
	//
	// Four combinations = four loop copies. Painful, but every per-row
	// saving counts when we're reaching for raw RDD parity.
	switch {
	case dbfArea != nil && whereFn != nil:
		dbfArea.GoTop()
		for !dbfArea.EOF() {
			hbrt.ExecPcodeFast(t, whereFn, nil)
			if t.GetRetValue().AsBool() {
				off := len(flat)
				end := off + nFields
				if end > cap(flat) {
					flat = append(flat, make([]hbrt.Value, nFields)...)
				} else {
					flat = flat[:end]
				}
				row := flat[off:end:end]
				for i := 0; i < nFields; i++ {
					v, _ := dbfArea.GetValue(fieldPos[i] - 1)
					row[i] = v
				}
				rows = append(rows, slab.WrapNext(row))
			}
			dbfArea.Skip(1)
		}
	case dbfArea != nil:
		// DBF + no WHERE: tightest inner loop
		dbfArea.GoTop()
		for !dbfArea.EOF() {
			off := len(flat)
			end := off + nFields
			if end > cap(flat) {
				flat = append(flat, make([]hbrt.Value, nFields)...)
			} else {
				flat = flat[:end]
			}
			row := flat[off:end:end]
			for i := 0; i < nFields; i++ {
				v, _ := dbfArea.GetValue(fieldPos[i] - 1)
				row[i] = v
			}
			rows = append(rows, slab.WrapNext(row))
			dbfArea.Skip(1)
		}
	case whereFn != nil:
		area.GoTop()
		for !area.EOF() {
			hbrt.ExecPcodeFast(t, whereFn, nil)
			if t.GetRetValue().AsBool() {
				off := len(flat)
				end := off + nFields
				if end > cap(flat) {
					flat = append(flat, make([]hbrt.Value, nFields)...)
				} else {
					flat = flat[:end]
				}
				row := flat[off:end:end]
				for i := 0; i < nFields; i++ {
					v, _ := area.GetValue(fieldPos[i] - 1)
					row[i] = v
				}
				rows = append(rows, slab.WrapNext(row))
			}
			area.Skip(1)
		}
	default:
		area.GoTop()
		for !area.EOF() {
			off := len(flat)
			end := off + nFields
			if end > cap(flat) {
				flat = append(flat, make([]hbrt.Value, nFields)...)
			} else {
				flat = flat[:end]
			}
			row := flat[off:end:end]
			for i := 0; i < nFields; i++ {
				v, _ := area.GetValue(fieldPos[i] - 1)
				row[i] = v
			}
			rows = append(rows, slab.WrapNext(row))
			area.Skip(1)
		}
	}

	t.PushValue(hbrt.MakeArrayFrom(rows))
	t.RetValue()
}

// SqlHashBuild(nFieldPos) → hHash
//
// Scans the current workarea and returns a hash mapping each field
// value (as a string key) to an array of RecNos that have that value.
// Used by FiveSql2's HashJoin, which previously built this in PRG,
// paying ~40μs per row from class dispatch + hb_HHasKey + AAdd growth.
// 50k rows × 40μs = 2 seconds for what should be a sub-50ms op.
//
// Go-native build goes through *dbf.DBFArea directly and uses a native
// Go `map[string][]int64` which GC's as one unit. Final conversion to
// a Five hash is done once at the end.
func SqlHashBuild(t *hbrt.Thread) {
	t.Frame(1, 0)
	defer t.EndProc()

	nFieldPos := int(t.Local(1).AsNumInt()) - 1
	if nFieldPos < 0 {
		t.PushValue(hbrt.MakeHash())
		t.RetValue()
		return
	}

	wam, ok := t.WA.(*hbrdd.WorkAreaManager)
	if !ok {
		t.PushValue(hbrt.MakeHash())
		t.RetValue()
		return
	}
	area := wam.Current()
	if area == nil {
		t.PushValue(hbrt.MakeHash())
		t.RetValue()
		return
	}

	// Type-assert once so the per-row field reads inline.
	dbfArea, _ := area.(*dbf.DBFArea)

	goMap := make(map[string][]int64, 4096)

	if dbfArea != nil {
		dbfArea.GoTop()
		for !dbfArea.EOF() {
			v, _ := dbfArea.GetValue(nFieldPos)
			key := valueHashKey(v)
			goMap[key] = append(goMap[key], int64(dbfArea.RecNo()))
			dbfArea.Skip(1)
		}
	} else {
		area.GoTop()
		for !area.EOF() {
			v, _ := area.GetValue(nFieldPos)
			key := valueHashKey(v)
			// Generic RecNo via interface; stays 0 if the driver
			// does not expose RecNo().
			var rn int64
			if rmgr, ok := area.(interface{ RecNo() uint32 }); ok {
				rn = int64(rmgr.RecNo())
			}
			goMap[key] = append(goMap[key], rn)
			area.Skip(1)
		}
	}

	// Materialize as a Five hash: build Keys/Values slices directly on
	// the HbHash struct, skipping the per-key map-lookup path that PRG
	// hb_HSet would take. Go map iteration order is unspecified, so the
	// key order of the result varies between runs; callers should only
	// probe by key.
	nKeys := len(goMap)
	keys := make([]hbrt.Value, 0, nKeys)
	vals := make([]hbrt.Value, 0, nKeys)
	order := make([]int, 0, nKeys)
	idx := 0
	for k, recs := range goMap {
		items := make([]hbrt.Value, len(recs))
		for i, r := range recs {
			items[i] = hbrt.MakeNumInt(r)
		}
		keys = append(keys, hbrt.MakeString(k))
		vals = append(vals, hbrt.MakeArrayFrom(items))
		order = append(order, idx)
		idx++
	}
	result := hbrt.MakeHash()
	hh := result.AsHash()
	hh.Keys = keys
	hh.Values = vals
	hh.Order = order

	t.PushValue(result)
	t.RetValue()
}

// valueHashKey converts a Value to a stable string key for Go map use.
// Matches what SqlValToStr does in PRG, but without allocation detours.
func valueHashKey(v hbrt.Value) string {
	switch {
	case v.IsNil():
		return "\x00NIL"
	case v.IsString():
		// Match PRG SqlValToStr: trim trailing spaces so CHAR hash probes
		// compare the same as the equivalent SqlCmpEq call.
		s := v.AsString()
		end := len(s)
		for end > 0 && s[end-1] == ' ' {
			end--
		}
		return s[:end]
	case v.IsNumeric():
		if v.IsNumInt() {
			return strconvItoa(v.AsNumInt())
		}
		return strconvFtoa(v.AsNumDouble())
	case v.IsLogical():
		if v.AsBool() {
			return "T"
		}
		return "F"
	case v.IsDate():
		return strconvItoa(v.AsJulian())
	}
	return ""
}

func strconvItoa(n int64) string {
	// Hot path for hash keys: format base-10 digits into a stack buffer
	// with a tight loop, so the returned string is the only allocation.
	if n == 0 {
		return "0"
	}
	neg := n < 0
	// Work on the unsigned magnitude so math.MinInt64 does not overflow
	// when negated.
	u := uint64(n)
	if neg {
		u = -u
	}
	var buf [20]byte
	i := len(buf)
	for u > 0 {
		i--
		buf[i] = byte('0' + u%10)
		u /= 10
	}
	if neg {
		i--
		buf[i] = '-'
	}
	return string(buf[i:])
}

func strconvFtoa(f float64) string {
	// Only used for non-integer numeric field values (rare in join keys);
	// OK to call into strconv.
	return strconv.FormatFloat(f, 'g', -1, 64)
}

// SqlEach(aFieldPositions, pcWhere, bBlock) → NIL
//
// Streaming variant of SqlScan: instead of materializing all matching
// rows into a result array (which costs N HbArray allocations plus a
// second pass when the PRG caller iterates it), we invoke a user-provided
// code block once per matching row, passing the selected field values as
// block parameters.
//
// This is the Harbour block-iteration idiom (`AEval`, `AScan`) applied
// to SQL. Total heap traffic collapses to ~0: no result rows, no slab,
// no flat value buffer. Per-row overhead becomes just (field reads +
// WHERE eval + block invoke).
//
// Expected to hit raw-RDD parity on end-to-end "SQL → user code" timing.
//
// Arguments:
//   aFieldPositions: 1-based field positions to pass as block params
//   pcWhere:         compiled WHERE predicate, or NIL
//   bBlock:          code block receiving nFields positional params
func SqlEach(t *hbrt.Thread) {
	t.Frame(3, 0)
	defer t.EndProc()

	fieldsVal := t.Local(1)
	if !fieldsVal.IsArray() {
		t.RetNil()
		return
	}
	fieldsArr := fieldsVal.AsArray().Items
	nFields := len(fieldsArr)

	whereVal := t.Local(2)
	var whereFn *hbrt.PcodeFunc
	if !whereVal.IsNil() {
		if p := whereVal.AsPointer(); p != nil {
			whereFn, _ = p.(*hbrt.PcodeFunc)
		}
	}

	blockVal := t.Local(3)
	if !blockVal.IsBlock() {
		t.RetNil()
		return
	}
	blk := blockVal.AsBlock()

	fieldPos := make([]int, nFields)
	for i := 0; i < nFields; i++ {
		fieldPos[i] = int(fieldsArr[i].AsNumInt())
		if fieldPos[i] < 1 {
			fieldPos[i] = 1
		}
	}

	wam, ok := t.WA.(*hbrdd.WorkAreaManager)
	if !ok {
		t.RetNil()
		return
	}
	area := wam.Current()
	if area == nil {
		t.RetNil()
		return
	}
	dbfArea, _ := area.(*dbf.DBFArea)

	// Install FastFieldGetter for the WHERE predicate's PcOpFieldGet ops
	prevFG := t.FastFieldGetter
	if dbfArea != nil {
		t.FastFieldGetter = func(idx int) hbrt.Value {
			v, _ := dbfArea.GetValue(idx - 1)
			return v
		}
	} else {
		t.FastFieldGetter = func(idx int) hbrt.Value {
			v, _ := area.GetValue(idx - 1)
			return v
		}
	}
	defer func() { t.FastFieldGetter = prevFG }()

	// Block eval protocol: push N args on the stack, set pendingParams,
	// call blk.Fn(t). Matches what EvalBlock does inline, skipping the
	// per-call `make([]Value, nArgs)` temp slice.
	//
	// Four specialized loops on {DBF, generic}×{WHERE, none}, same
	// reasoning as SqlScan's loop split.
	switch {
	case dbfArea != nil && whereFn != nil:
		dbfArea.GoTop()
		for !dbfArea.EOF() {
			hbrt.ExecPcodeFast(t, whereFn, nil)
			if t.GetRetValue().AsBool() {
				for i := 0; i < nFields; i++ {
					v, _ := dbfArea.GetValue(fieldPos[i] - 1)
					t.PushValue(v)
				}
				t.PendingParams2(nFields)
				blk.Fn(t)
			}
			dbfArea.Skip(1)
		}
	case dbfArea != nil:
		dbfArea.GoTop()
		for !dbfArea.EOF() {
			for i := 0; i < nFields; i++ {
				v, _ := dbfArea.GetValue(fieldPos[i] - 1)
				t.PushValue(v)
			}
			t.PendingParams2(nFields)
			blk.Fn(t)
			dbfArea.Skip(1)
		}
	case whereFn != nil:
		area.GoTop()
		for !area.EOF() {
			hbrt.ExecPcodeFast(t, whereFn, nil)
			if t.GetRetValue().AsBool() {
				for i := 0; i < nFields; i++ {
					v, _ := area.GetValue(fieldPos[i] - 1)
					t.PushValue(v)
				}
				t.PendingParams2(nFields)
				blk.Fn(t)
			}
			area.Skip(1)
		}
	default:
		area.GoTop()
		for !area.EOF() {
			for i := 0; i < nFields; i++ {
				v, _ := area.GetValue(fieldPos[i] - 1)
				t.PushValue(v)
			}
			t.PendingParams2(nFields)
			blk.Fn(t)
			area.Skip(1)
		}
	}

	t.RetNil()
}