Second pcode peephole to match the one added for FieldGet(literal).
SqlExprToPrg auto-wraps CHAR column references with AllTrim() to
match SqlCmpEq's CHAR-padding trim semantics, so every string WHERE
predicate evaluates `AllTrim(FieldGet(n)) == 'literal'` per row.
Before this commit each of those per-row evaluations did:
1. PushSymbol ALLTRIM
2. PushSymbol FIELDGET → Function(1) [1 RTL Frame]
3. parseCharField → MakeString [alloc: copies raw bytes]
4. Function(1) → AllTrim RTL [1 RTL Frame]
5. strings.TrimSpace [alloc: new string]
6. Return, continue
New opcode `PcOpFieldTrim <idx>` (0x47) fuses the two RTL calls into
a single opcode that:
1. Calls FastFieldGetter directly (no Frame/Function dispatch).
2. Walks the returned string with ASCII-space trim in place.
3. Pushes `s[lo:hi]` — a sub-slice, no new allocation.
4. Short-circuits back to the same string if no trim needed.
genpc recognizes the shape `AllTrim(FieldGet(<int-literal>))` in
emitCall and emits the fused opcode automatically — no SQL-side
API change. Matches the existing FieldGet peephole's shape.
Bench impact (50k rows, 3-run steady state, vs raw RDD baseline 6.2ms):
String WHERE before 7.9ms → after 6.2ms 1.00x (parity!)
Numeric WHERE 6.9ms (unchanged) 1.11x
No WHERE 9.1ms (unchanged) 1.47x
String WHERE is now at parity with the raw Harbour-style RDD scan.
Compared to session start (119ms), that's a 19x speedup.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two stacked optimizations land on the SqlScan hot path. Combined
effect on the 50k-row benchmark:
Before After vs raw
Numeric WHERE 10.2ms 7.8ms 1.15x
String WHERE 10.5ms 7.9ms 1.15x
No WHERE 9.2ms 10.0ms 1.45x
Raw RDD baseline 6.8ms 6.8ms 1.00x
WHERE-predicate paths are now within 15% of the raw Harbour-style
RDD scan loop. The no-WHERE path is unchanged (slight jitter from
the added devirt branch); FieldGet peephole doesn't apply there.
--- Optimization 1: PcOpFieldGet peephole ---
Adds a new pcode opcode `PcOpFieldGet <fieldIdx>` (0x46) that skips
the usual PushSymbol+Function+Frame+FieldGet-RTL+EndProc chain and
calls a direct field getter closure instead. genpc recognizes the
shape `FieldGet(<int-literal>)` during emitCall and emits the
specialized opcode automatically — no SQL-side API change.
Integration:
* hbrt.Thread.FastFieldGetter — hot-path closure set by scan loops.
Non-nil → pcode bypasses dispatch.
Nil → pcode resolves FIELDGET via
the RTL symbol table (correctness
fallback for any other callers).
* compiler/genpc/genpc.go — peephole in emitCall.
* hbrt/pcinterp.go — PcOpFieldGet handler.
This alone cut numeric WHERE from 10.2 → 7.9ms: eliminated roughly
one full Frame/EndProc + RTL dispatch per row × 50k rows.
--- Optimization 2: DBFArea devirtualization ---
SqlScan type-asserts the workarea to *dbf.DBFArea once and runs a
dedicated loop that calls GoTop/EOF/Skip/GetValue directly on the
concrete type. Go's compiler inlines these, skipping the interface
vtable per row. Non-DBF drivers still work via the generic Area
branch.
The FastFieldGetter closure also captures *DBFArea directly in the
DBF branch, so the WHERE predicate side of the hot loop is now
entirely devirtualized: no interface dispatch between the pcode
dispatch loop and the DBF record buffer.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Remaining gap to raw RDD on no-WHERE (~1.45x) is dominated by the
two-column row construction + ArraySlab + flat backing bookkeeping
that the raw loop doesn't do. Going below that requires changing
the SQL engine's result shape — out of scope here.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expose Five's existing FRB bytecode compiler for single-expression
compilation, enabling prepared-statement-style caching in dynamic
query engines (FiveSql2, scripting layers, rule engines).
1. genpc.CompileExpr(ast.Expr) *hbrt.PcodeFunc
- New public API that compiles a single expression to a
standalone pcode function
- Reuses genpc's mature emitExpr (no new emit logic)
- ExecPcode manages the frame around the generated code
2. hbrtl.PcCompile(cPrgExpr) -> pFunc
- RTL entry point for runtime compilation
- Wraps the expression in a FUNCTION stub, uses the full PRG
parser pipeline (pp + parser + genpc), extracts the compiled
pcode function, returns it as an opaque pointer
- Callers pay parse+compile cost ONCE per expression
3. hbrtl.PcEval(pFunc) -> xValue
- RTL entry point for runtime execution
- Calls hbrt.ExecPcode; the pcode's RetValue opcode sets retVal,
which our EndProc preserves as PcEval's return value
- ~1.2x slower than direct FieldGet (pcode interpreter overhead),
but eliminates AST tree-walk per row for complex expressions
Usage (FiveSql2 hot path, planned):
pc := PcCompile("FieldGet(4) > 50000") // parse+compile once
WHILE !Eof()
IF PcEval(pc) // ~10us per row
AAdd(aRows, ...)
ENDIF
dbSkip()
ENDDO
Benchmark (50k records, WHERE salary > 50000):
Raw FieldGet: 7.9 ms (baseline)
FieldPos+Get: 10.2 ms (with O(1) FieldPos cache)
PcEval bytecode: 10.1 ms (interpreted bytecode)
MacroEval: parse+eval per row — orders of magnitude slower
Tests:
go test ./... ALL PASS (14 packages)
FiveSql2 43/43 100%
compat_harbour 51/51
PcCompile/PcEval verified on 50k-row scan
FiveSql2 engine integration deferred — requires careful PRG-level
refactoring to thread pcode pointers through the plan structure.
The Go-level infrastructure is now in place for that work.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>