perf(FiveSql2): FieldGet peephole + DBFArea devirt — WHERE at ~1.15x raw RDD

Two stacked optimizations land on the SqlScan hot path. Combined
effect on the 50k-row benchmark:

                       Before    After   vs raw
  Numeric WHERE        10.2ms    7.8ms   1.15x
  String WHERE         10.5ms    7.9ms   1.15x
  No WHERE              9.2ms   10.0ms   1.45x
  Raw RDD baseline      6.8ms    6.8ms   1.00x

WHERE-predicate paths are now within 15% of the raw Harbour-style
RDD scan loop. The no-WHERE path is unchanged (slight jitter from
the added devirt branch); FieldGet peephole doesn't apply there.

--- Optimization 1: PcOpFieldGet peephole ---

Adds a new pcode opcode `PcOpFieldGet <fieldIdx>` (0x46) that skips
the usual PushSymbol+Function+Frame+FieldGet-RTL+EndProc chain and
calls a direct field getter closure instead. genpc recognizes the
shape `FieldGet(<int-literal>)` during emitCall and emits the
specialized opcode automatically — no SQL-side API change.

Integration:
  * hbrt.Thread.FastFieldGetter  — hot-path closure set by scan loops.
                                   Non-nil → pcode bypasses dispatch.
                                   Nil → pcode resolves FIELDGET via
                                   the RTL symbol table (correctness
                                   fallback for any other callers).
  * compiler/genpc/genpc.go      — peephole in emitCall.
  * hbrt/pcinterp.go             — PcOpFieldGet handler.

This alone cut numeric WHERE from 10.2 → 7.9ms: eliminated roughly
one full Frame/EndProc + RTL dispatch per row × 50k rows.

--- Optimization 2: DBFArea devirtualization ---

SqlScan type-asserts the workarea to *dbf.DBFArea once and runs a
dedicated loop that calls GoTop/EOF/Skip/GetValue directly on the
concrete type. Go's compiler inlines these, skipping the interface
vtable per row. Non-DBF drivers still work via the generic Area
branch.

The FastFieldGetter closure also captures *DBFArea directly in the
DBF branch, so the WHERE predicate side of the hot loop is now
entirely devirtualized: no interface dispatch between the pcode
dispatch loop and the DBF record buffer.

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Remaining gap to raw RDD on no-WHERE (~1.45x) is dominated by the
two-column row construction + ArraySlab + flat backing bookkeeping
that the raw loop doesn't do. Going below that requires changing
the SQL engine's result shape — out of scope here.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-04-14 12:23:31 +09:00
parent fe5df22517
commit f9ffd4050e
5 changed files with 123 additions and 32 deletions

View File

@@ -26,6 +26,7 @@ package hbrtl
import (
"five/hbrdd"
"five/hbrdd/dbf"
"five/hbrt"
)
@@ -90,6 +91,12 @@ func SqlScan(t *hbrt.Thread) {
return
}
// Type-assert to concrete DBFArea once so the hot loop calls
// GoTop/EOF/Skip/GetValue directly on *dbf.DBFArea without paying
// the interface dispatch on every row. Falls back to the generic
// Area path for non-DBF drivers (rare in FiveSql2 context).
dbfArea, _ := area.(*dbf.DBFArea)
// SQLite-inspired: instead of one slice allocation per row, maintain
// a single flat backing buffer and hand each row a sub-slice into it.
// This halves allocations (row header + backing → just row header)
@@ -113,41 +120,82 @@ func SqlScan(t *hbrt.Thread) {
flat := make([]hbrt.Value, 0, estRows*nFields)
slab := hbrt.NewArraySlab(estRows)
// Scan
area.GoTop()
for !area.EOF() {
// WHERE evaluation (if any). Fast variant — WHERE expressions
// compiled from SQL AST don't contain BEGIN SEQUENCE, so we can
// skip the defer/recover frame exit.
keep := true
if whereFn != nil {
hbrt.ExecPcodeFast(t, whereFn, nil)
keep = t.GetRetValue().AsBool()
// Install the hot-path field getter so PcOpFieldGet in the compiled
// WHERE predicate bypasses PushSymbol + Function dispatch + the
// FieldGet RTL's own Frame. The closure captures the concrete
// DBFArea directly so there's no interface dispatch per access.
prevFG := t.FastFieldGetter
if dbfArea != nil {
t.FastFieldGetter = func(idx int) hbrt.Value {
v, _ := dbfArea.GetValue(idx - 1)
return v
}
if keep {
// Reserve nFields slots in flat, growing if needed.
off := len(flat)
end := off + nFields
if end > cap(flat) {
// Grow flat. Go's append growth policy handles this;
// we re-reserve space so the sub-slice math still holds.
flat = append(flat, make([]hbrt.Value, nFields)...)
} else {
flat = flat[:end]
}
row := flat[off:end:end]
// Collect column values directly into the backing buffer.
for i := 0; i < nFields; i++ {
// GetValue is 0-based
v, _ := area.GetValue(fieldPos[i] - 1)
row[i] = v
}
rows = append(rows, slab.WrapNext(row))
} else {
t.FastFieldGetter = func(idx int) hbrt.Value {
v, _ := area.GetValue(idx - 1)
return v
}
}
defer func() { t.FastFieldGetter = prevFG }()
area.Skip(1)
// Scan — dispatch two nearly-identical loops for devirtualization.
// The DBF branch is the common case; Go's compiler inlines the
// direct method calls, whereas the generic Area branch pays one
// interface dispatch per call as before.
if dbfArea != nil {
dbfArea.GoTop()
for !dbfArea.EOF() {
keep := true
if whereFn != nil {
hbrt.ExecPcodeFast(t, whereFn, nil)
keep = t.GetRetValue().AsBool()
}
if keep {
off := len(flat)
end := off + nFields
if end > cap(flat) {
flat = append(flat, make([]hbrt.Value, nFields)...)
} else {
flat = flat[:end]
}
row := flat[off:end:end]
for i := 0; i < nFields; i++ {
v, _ := dbfArea.GetValue(fieldPos[i] - 1)
row[i] = v
}
rows = append(rows, slab.WrapNext(row))
}
dbfArea.Skip(1)
}
} else {
area.GoTop()
for !area.EOF() {
keep := true
if whereFn != nil {
hbrt.ExecPcodeFast(t, whereFn, nil)
keep = t.GetRetValue().AsBool()
}
if keep {
off := len(flat)
end := off + nFields
if end > cap(flat) {
flat = append(flat, make([]hbrt.Value, nFields)...)
} else {
flat = flat[:end]
}
row := flat[off:end:end]
for i := 0; i < nFields; i++ {
v, _ := area.GetValue(fieldPos[i] - 1)
row[i] = v
}
rows = append(rows, slab.WrapNext(row))
}
area.Skip(1)
}
}
t.PushValue(hbrt.MakeArrayFrom(rows))