Commit Graph

4 Commits

Author SHA1 Message Date
f9ffd4050e perf(FiveSql2): FieldGet peephole + DBFArea devirt — WHERE at ~1.15x raw RDD
Two stacked optimizations land on the SqlScan hot path. Combined
effect on the 50k-row benchmark:

                       Before    After   vs raw
  Numeric WHERE        10.2ms    7.8ms   1.15x
  String WHERE         10.5ms    7.9ms   1.15x
  No WHERE              9.2ms   10.0ms   1.45x
  Raw RDD baseline      6.8ms    6.8ms   1.00x

WHERE-predicate paths are now within 15% of the raw Harbour-style
RDD scan loop. The no-WHERE path is unchanged (slight jitter from
the added devirt branch); FieldGet peephole doesn't apply there.

--- Optimization 1: PcOpFieldGet peephole ---

Adds a new pcode opcode `PcOpFieldGet <fieldIdx>` (0x46) that skips
the usual PushSymbol+Function+Frame+FieldGet-RTL+EndProc chain and
calls a direct field getter closure instead. genpc recognizes the
shape `FieldGet(<int-literal>)` during emitCall and emits the
specialized opcode automatically — no SQL-side API change.

Integration:
  * hbrt.Thread.FastFieldGetter  — hot-path closure set by scan loops.
                                   Non-nil → pcode bypasses dispatch.
                                   Nil → pcode resolves FIELDGET via
                                   the RTL symbol table (correctness
                                   fallback for any other callers).
  * compiler/genpc/genpc.go      — peephole in emitCall.
  * hbrt/pcinterp.go             — PcOpFieldGet handler.

This alone cut numeric WHERE from 10.2 → 7.9ms: eliminated roughly
one full Frame/EndProc + RTL dispatch per row × 50k rows.

--- Optimization 2: DBFArea devirtualization ---

SqlScan type-asserts the workarea to *dbf.DBFArea once and runs a
dedicated loop that calls GoTop/EOF/Skip/GetValue directly on the
concrete type. Go's compiler inlines these, skipping the interface
vtable per row. Non-DBF drivers still work via the generic Area
branch.

The FastFieldGetter closure also captures *DBFArea directly in the
DBF branch, so the WHERE predicate side of the hot loop is now
entirely devirtualized: no interface dispatch between the pcode
dispatch loop and the DBF record buffer.

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Remaining gap to raw RDD on no-WHERE (~1.45x) is dominated by the
two-column row construction + ArraySlab + flat backing bookkeeping
that the raw loop doesn't do. Going below that requires changing
the SQL engine's result shape — out of scope here.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:23:31 +09:00
5c067f35a4 perf(hbrt): ExecPcodeFast — pcode variant without defer/recover
Pcode expressions compiled from SQL WHERE clauses (via genpc.CompileExpr)
never contain BEGIN SEQUENCE and can't raise BreakValue, so the defer +
recover dance in ExecPcode's EndProc is pure overhead. For FiveSql2's
per-row WHERE evaluation on a 50k-row scan, that's 50k × ~15ns = ~750µs
of pointless recover bookkeeping.

Split ExecPcode into two variants sharing execPcodeBody:

  ExecPcode     — full: Frame + defer EndProc. General-purpose,
                  handles panics. Behavior unchanged.

  ExecPcodeFast — hot: Frame + execPcodeBody + EndProcFast. No defer,
                  no recover. Caller guarantees the pcode body can't
                  panic with HbError / BreakValue.

SqlScan now uses ExecPcodeFast for per-row WHERE evaluation. Measured
impact on 50k-row no-WHERE benchmark: 10.6ms → 9.2ms steady state
(~13% faster). Effect is smaller on numeric-WHERE because per-row
cost there is dominated by the opcode dispatch itself, not the frame
exit.

Validation:
  - FiveSql2 43/43
  - go test ./hbrt/... PASS (pcode tests)
  - go test ./hbrtl/... PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:07:54 +09:00
85541a3035 perf(sqlscan): flat backing buffer — 30% faster no-WHERE scan
The prior loop allocated one small `[]hbrt.Value` per matching row
(for the row body) plus one HbArray header. For a 50k-row full scan
that's 100k allocations of which the small-slice allocs dominated
fragmentation and GC pressure.

SQLite-inspired fix: pre-allocate a single flat []hbrt.Value of
capacity `RecCount * nFields` at scan start and hand each row a
three-index sub-slice (flat[off:end:end]). The capped sub-slice
still forces a reallocation if PRG code later does `AAdd(row, x)`,
so neighbor rows can't get clobbered.

Sizing the initial buffer off RecCount(err-ignored) was the actual
win — the previous naive grow-from-1024 policy caused five mid-scan
reallocations of a ~200 KB buffer, each memcpy'ing everything so far.
One upfront allocation amortizes much better.

Bench (50k rows, ~/tmp ext4, 3 runs steady-state):

                          Before        After       Δ
  no WHERE                14.6ms       10.6ms     −27%
  numeric WHERE           11.7ms       10.0ms     −15%
  string WHERE            10.5ms       11.0ms     ~=
  raw RDD baseline         6.8ms        7.0ms

Gap to raw RDD: 2.1x → 1.4x on the dominant no-WHERE case. What's
left is pcode WHERE dispatch (ExecPcode frame per row), the Area
interface boundary, and the HbArray header allocation per row —
all structural costs that would need a wider refactor to close.

Validation:
  - FiveSql2 43/43
  - go test ./hbrtl/... PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:57:05 +09:00
8aaed994f4 perf(FiveSql2): hybrid fast path — 11x speedup on string WHERE scans
Implements hybrid execution model: keep AST tree-walk for SQL:2013+
features (Window, Recursive CTE, JOIN, aggregates) while compiling
simple SELECT hot paths to Go + pcode. See docs/FiveSql2-Hybrid-Plan.md
for the full architecture rationale (why not SQLite-style VDBE).

Hot path (single table, no joins/groups/aggregates):
  - TryBuildFieldPositions: resolves SELECT column list to FieldPos
    array once per query (bails to PRG loop on any complex expr).
  - TryCompileWhere + SqlExprToPrg: walks WHERE AST, emits equivalent
    PRG source, runs it through PcCompile to get a PcodeFunc.
  - SqlScan RTL: Go-native scan loop — GoTop/EOF/Skip/GetValue
    direct, ExecPcode per row for WHERE, result array pre-alloc.

WHERE compiler scope:
  - ND_LIT numeric/logical/string (string literals AllTrim'd to match
    SqlCmpEq CHAR-padding semantics; rejects embedded quotes/newlines)
  - ND_COL: CHAR fields auto-wrapped with AllTrim(FieldGet(n)) based
    on dbStruct() lookup cached once per query in aCompileStruct
  - ND_BIN: = <> != < <= > >= AND OR + - * /
  - ND_UNI: NOT -
  - Anything else (ND_FN, ND_CASE, ND_SUB, ND_PAR, LIKE, IN, IS NULL,
    BETWEEN, dates) returns NIL → falls back to PRG tree-walk.

Bench (50k rows, ~/tmp ext4):
                        Before      After     Speedup
  Numeric WHERE         ~150ms     11.7ms     ~13x
  String WHERE          119.3ms    10.5ms     11.4x
  No WHERE               -         14.6ms      -
  Raw RDD baseline        6.8ms     6.8ms      1.0x

Remaining gap to raw RDD (~1.5x) is structural: Value boxing, result
array construction, per-row ExecPcode frame overhead. Would need a
Value-pool or SoA refactor to close further.

Side fixes bundled:
  - TSqlIndex:FindExclusive short-circuited. Originally called
    dbInfo(DBI_FULLPATH)/DBI_SHARED which are unresolved symbols in
    Five (dbInfo is a stub, DBI_* never defined). Panic'd with
    "local variable index out of range: 0" whenever a standalone PRG
    had a workarea Used before calling five_SQL. 43-test masked the
    bug because it only reached FindExclusive with no open workareas.
    Restore the scan once dbInfo lands in hbrtl.
  - cmd/five/main.go: FIVE_KEEP_BUILD=1 env var keeps the temp Go
    project around for debugging gengo output.

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 09:15:08 +09:00