perf(FiveSql2): FieldGet peephole + DBFArea devirt — WHERE at ~1.15x raw RDD

Two stacked optimizations land on the SqlScan hot path. Combined
effect on the 50k-row benchmark:

                       Before    After   After/raw
  Numeric WHERE        10.2ms    7.8ms   1.15x
  String WHERE         10.5ms    7.9ms   1.15x
  No WHERE              9.2ms   10.0ms   1.45x
  Raw RDD baseline      6.8ms    6.8ms   1.00x

WHERE-predicate paths are now within 15% of the raw Harbour-style
RDD scan loop. The no-WHERE path gains nothing, because the FieldGet
peephole doesn't apply there; its 9.2 → 10.0ms shift is attributed
to jitter from the added devirt branch.

--- Optimization 1: PcOpFieldGet peephole ---

Adds a new pcode opcode `PcOpFieldGet <fieldIdx>` (0x46) that skips
the usual PushSymbol+Function+Frame+FieldGet-RTL+EndProc chain and
calls a direct field getter closure instead. genpc recognizes the
shape `FieldGet(<int-literal>)` during emitCall and emits the
specialized opcode automatically — no SQL-side API change.
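
The peephole described above can be sketched roughly as follows. This
is a minimal illustration, not the genpc code from this commit: the
AST types (Expr, IntLit, Call) and the helper tryFieldGetPeephole are
hypothetical names, and only the opcode value 0x46 and the
little-endian uint16 operand come from the diff below.

```go
package main

import (
	"fmt"
	"strings"
)

// Hypothetical, simplified AST. The real genpc node types are not
// shown in this commit.
type Expr interface{}

type IntLit struct{ Val int64 }

type Call struct {
	Name string
	Args []Expr
}

// Opcode value taken from this commit.
const PcOpFieldGet byte = 0x46

// tryFieldGetPeephole appends PcOpFieldGet plus a little-endian uint16
// operand when the call has the shape FieldGet(<int-literal>). It
// reports false, leaving code untouched, for any other shape so the
// caller can emit the generic call sequence instead.
func tryFieldGetPeephole(code []byte, c *Call) ([]byte, bool) {
	if !strings.EqualFold(c.Name, "FIELDGET") || len(c.Args) != 1 {
		return code, false
	}
	lit, ok := c.Args[0].(*IntLit)
	if !ok || lit.Val < 1 || lit.Val > 0xFFFF {
		return code, false // not a literal, or out of uint16 range
	}
	idx := uint16(lit.Val)
	return append(code, PcOpFieldGet, byte(idx), byte(idx>>8)), true
}

func main() {
	call := &Call{Name: "FieldGet", Args: []Expr{&IntLit{Val: 3}}}
	code, ok := tryFieldGetPeephole(nil, call)
	fmt.Printf("matched=%v bytes=% x\n", ok, code)
}
```

Returning a boolean instead of emitting unconditionally keeps the
generic emitCall path as the single fallback for every shape the
peephole does not recognize.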

Integration:
  * hbrt.Thread.FastFieldGetter  — hot-path closure set by scan loops.
                                   Non-nil → pcode bypasses dispatch.
                                   Nil → pcode resolves FIELDGET via
                                   the RTL symbol table (correctness
                                   fallback for any other callers).
  * compiler/genpc/genpc.go      — peephole in emitCall.
  * hbrt/pcinterp.go             — PcOpFieldGet handler.

This alone cut numeric WHERE from 10.2ms to 7.9ms by eliminating
roughly one full Frame/EndProc + RTL dispatch per row, across 50k rows.

--- Optimization 2: DBFArea devirtualization ---

SqlScan type-asserts the workarea to *dbf.DBFArea once and runs a
dedicated loop that calls GoTop/EOF/Skip/GetValue directly on the
concrete type. These calls are statically dispatched, so Go's
compiler can inline them and skip the interface method lookup per
row. Non-DBF drivers still work via the generic Area branch.

The FastFieldGetter closure also captures *DBFArea directly in the
DBF branch, so the WHERE predicate side of the hot loop is now
entirely devirtualized: no interface dispatch between the pcode
dispatch loop and the DBF record buffer.
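
The assert-once pattern can be sketched as below. The method names
GoTop/EOF/Skip/GetValue come from this message, but their signatures,
the in-memory DBFArea and the sumField and wrapped helpers are
assumptions made for illustration, not the real dbf package.

```go
package main

import "fmt"

// Area is an assumed, minimal version of the generic workarea interface.
type Area interface {
	GoTop()
	EOF() bool
	Skip()
	GetValue(field int) int
}

// DBFArea is a toy in-memory stand-in for *dbf.DBFArea.
type DBFArea struct {
	rows [][]int
	pos  int
}

func (a *DBFArea) GoTop()                 { a.pos = 0 }
func (a *DBFArea) EOF() bool              { return a.pos >= len(a.rows) }
func (a *DBFArea) Skip()                  { a.pos++ }
func (a *DBFArea) GetValue(field int) int { return a.rows[a.pos][field-1] }

// wrapped satisfies Area through the embedded pointer but is not a
// *DBFArea, so it exercises the generic branch.
type wrapped struct{ *DBFArea }

// sumField type-asserts once, outside the loop. In the first branch
// every call targets the concrete type and is statically dispatched;
// the second branch pays one interface call per method per row.
func sumField(area Area, field int) int {
	total := 0
	if d, ok := area.(*DBFArea); ok {
		for d.GoTop(); !d.EOF(); d.Skip() {
			total += d.GetValue(field)
		}
		return total
	}
	for area.GoTop(); !area.EOF(); area.Skip() {
		total += area.GetValue(field)
	}
	return total
}

func main() {
	d := &DBFArea{rows: [][]int{{1, 10}, {2, 20}, {3, 30}}}
	fmt.Println(sumField(d, 2), sumField(wrapped{d}, 2))
}
```

Both branches compute the same result; the duplication of the loop
body is the price paid for keeping the per-row calls on the concrete
type.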

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Remaining gap to raw RDD on no-WHERE (~1.45x) is dominated by the
two-column row construction + ArraySlab + flat backing bookkeeping
that the raw loop doesn't do. Going below that requires changing
the SQL engine's result shape — out of scope here.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:23:31 +09:00
parent fe5df22517
commit f9ffd4050e
5 changed files with 123 additions and 32 deletions

@@ -157,6 +157,23 @@ func execPcodeBody(t *Thread, fn *PcodeFunc, mod *PcodeModule) {
	case PcOpEndProc:
		return

	// --- Workarea field access (peephole for FieldGet(literal)) ---
	case PcOpFieldGet:
		fIdx := int(binary.LittleEndian.Uint16(code[pc:]))
		pc += 2
		// Hot path — SqlScan plugs a direct field getter closure into
		// t.FastFieldGetter before running the predicate, so we skip
		// PushSymbol + Function dispatch + FieldGet RTL's own Frame.
		if fg := t.FastFieldGetter; fg != nil {
			t.PushValue(fg(fIdx))
		} else {
			// Generic fallback: resolve through RTL symbol table
			t.PushSymbol(t.VM().FindSymbol("FIELDGET"))
			t.PushNil()
			t.PushLong(int64(fIdx))
			t.Function(1)
		}

	// --- Function calls ---
	case PcOpPushSymbol:
		slen := int(binary.LittleEndian.Uint16(code[pc:]))

@@ -69,6 +69,11 @@ const (
	PcOpFunction byte = 0x42 // + uint16 nArgs
	PcOpDo       byte = 0x43 // + uint16 nArgs

	// Workarea field access — skips PushSymbol + Function dispatch
	// for `FieldGet(n)` where n is a literal. Emitted by genpc as a
	// peephole optimization. Operand: uint16 1-based field position.
	PcOpFieldGet byte = 0x46

	// Self / OOP
	PcOpPushSelf      byte = 0x48
	PcOpPushSelfField byte = 0x49 // + uint16 len + name

@@ -87,6 +87,13 @@ type Thread struct {
	// WorkArea manager (goroutine-local, no locks needed)
	WA interface{} // *hbrdd.WorkAreaManager — set by caller to avoid import cycle

	// FastFieldGetter is a hot-path closure set by SqlScan (or any other
	// scan loop) to short-circuit PcOpFieldGet. When non-nil, the pcode
	// interpreter calls this instead of going through PushSymbol +
	// Function dispatch + FieldGet RTL's own Frame/EndProc. Caller is
	// responsible for setting and clearing it around a scan.
	FastFieldGetter func(int) Value

	waStack []uint16 // saved workarea numbers for (expr)->(expr) context switching

	// VM reference (shared, read-mostly)