The VM call path (PushSymbol → Function → Frame) is traversed by every
PRG function call. Three runtime changes, plus a matching codegen
change, together cut per-call overhead across the entire bench suite.

Changes
- hbrt/call.go Function(): replace pop-push dance with a single slice
  shift (N+2 pops + N pushes → 1 copy of N slots + sp adjust). Kills
  the per-call `make([]Value, nArgs)` heap alloc. Resolved function
  pointer is cached back into sym.Func so subsequent calls on the
  same Symbol skip the VM lookup entirely.
- hbrt/vm.go GetSym(): new helper. Generated code calls it with a
  pointer to a package-level `*Symbol` slot so FindSymbol (which takes
  the VM RWMutex + map lookup) runs at most once per symbol per
  process. Nil results are intentionally NOT cached: an init-order
  miss becomes a retry on the next call instead of a permanent sticky
  failure.
- hbrt/thread.go pushPendingSym(): scalar fast slot for depth=1 call
  nesting (common case). Nil syms still go through the slice so the
  "empty vs stored nil" ambiguity can't produce a false pop.
- compiler/gengo/gengo.go: emit `t.PushSymbol(t.GetSym(&_sym_<file>_<NAME>, "NAME"))`
  for every function call site, with a per-file prefix so multi-PRG
  builds don't collide on identical symbol names.
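
A minimal sketch of the GetSym slot-caching pattern described above, not
the actual hbrt code. The `Symbol`, `VM` and `Thread` shapes, the
`Register` and `Lookups` helpers and the `_sym_demo_HELLO` slot name are
assumptions made for illustration; only the idea (resolve once into a
package-level `*Symbol` slot, never cache a nil result) comes from the
change list.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// Symbol is a stand-in for the runtime's symbol record (assumed shape).
type Symbol struct{ Name string }

// VM is a hypothetical symbol table guarded by an RWMutex.
type VM struct {
	mu      sync.RWMutex
	symbols map[string]*Symbol
	lookups atomic.Int64 // slow-path calls, counted only for this demo
}

// Register adds a symbol; it models a module's init code running.
func (vm *VM) Register(name string) {
	vm.mu.Lock()
	defer vm.mu.Unlock()
	vm.symbols[name] = &Symbol{Name: name}
}

// FindSymbol is the slow path: take the read lock, then do a map lookup.
func (vm *VM) FindSymbol(name string) *Symbol {
	vm.lookups.Add(1)
	vm.mu.RLock()
	defer vm.mu.RUnlock()
	return vm.symbols[name]
}

// Lookups reports how many times the slow path ran.
func (vm *VM) Lookups() int64 { return vm.lookups.Load() }

// Thread is a hypothetical execution context holding its VM.
type Thread struct{ vm *VM }

// GetSym returns the symbol cached in *slot, resolving it on first use.
// A nil lookup result is NOT stored, so a later call retries.
func (t *Thread) GetSym(slot **Symbol, name string) *Symbol {
	if sym := *slot; sym != nil {
		return sym // fast path: no lock, no map lookup
	}
	sym := t.vm.FindSymbol(name)
	if sym != nil {
		*slot = sym // cache hits only; a miss is retried next time
	}
	return sym
}

// Package-level slot, shaped like the one generated code would pass.
var _sym_demo_HELLO *Symbol

func main() {
	vm := &VM{symbols: map[string]*Symbol{}}
	t := &Thread{vm: vm}

	// Init-order miss: HELLO is not registered yet, so nothing is cached.
	fmt.Println(t.GetSym(&_sym_demo_HELLO, "HELLO") == nil)

	vm.Register("HELLO")
	t.GetSym(&_sym_demo_HELLO, "HELLO") // retry succeeds and fills the slot
	t.GetSym(&_sym_demo_HELLO, "HELLO") // served from the slot
	fmt.Println(vm.Lookups())
}
```

Run as-is, this prints `true` and then `2`: the miss and the retry each
take the slow path, and the third call is served from the slot. The
sketch is single-threaded; it does not model how the real runtime
handles concurrent writers to the slot.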

Bugs fixed during bring-up
- pendingSymFast == nil was ambiguous ("unused" vs "nil stored"). Nil
  syms now spill to the slice, preserving distinguishability.
- The old varName-reuse branch at the PushSymbol emit site skipped
  the GetSym wrapper, emitting a raw `t.PushSymbol(varName)` against
  an uninitialized package-level *Symbol. Every call path now funnels
  through emitPushSymbol.
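
One way the fast slot and the nil spill could fit together is sketched
below. This is an assumption about the mechanism, not the code in
hbrt/thread.go: the field name `pendingSyms`, the rule "use the fast
slot only when both stores are empty" and the pop order are
illustrative choices that keep LIFO order intact.

```go
package main

import "fmt"

// Symbol is a stand-in for the runtime's symbol record (assumed shape).
type Symbol struct{ Name string }

// Thread holds the pending-symbol state (hypothetical field layout).
type Thread struct {
	pendingSymFast *Symbol   // outermost pending symbol; nil means "unused"
	pendingSyms    []*Symbol // spill stack for nested calls and nil symbols
}

// pushPendingSym stores sym for the next call. The fast slot is used only
// for a non-nil symbol at nesting depth 1; everything else goes to the
// slice, so a nil in the fast slot always means "nothing stored".
func (t *Thread) pushPendingSym(sym *Symbol) {
	if sym != nil && t.pendingSymFast == nil && len(t.pendingSyms) == 0 {
		t.pendingSymFast = sym
		return
	}
	t.pendingSyms = append(t.pendingSyms, sym)
}

// popPendingSym returns the most recently pushed symbol. Slice entries are
// always newer than the fast slot, so the slice is drained first.
func (t *Thread) popPendingSym() *Symbol {
	if n := len(t.pendingSyms); n > 0 {
		sym := t.pendingSyms[n-1]
		t.pendingSyms[n-1] = nil // drop the reference for the GC
		t.pendingSyms = t.pendingSyms[:n-1]
		return sym
	}
	sym := t.pendingSymFast
	t.pendingSymFast = nil
	return sym
}

// name renders a possibly-nil symbol for printing.
func name(s *Symbol) string {
	if s == nil {
		return "<nil>"
	}
	return s.Name
}

func main() {
	t := &Thread{}
	t.pushPendingSym(&Symbol{Name: "A"}) // depth 1: fast slot
	t.pushPendingSym(nil)                // nil: spills to the slice
	t.pushPendingSym(&Symbol{Name: "B"}) // nested: slice

	first := t.popPendingSym()
	second := t.popPendingSym()
	third := t.popPendingSym()
	fmt.Println(name(first), name(second), name(third))
}
```

Popping in this sketch yields B, then the stored nil, then A, so the
program prints `B <nil> A`. A stored nil is returned exactly once and
never causes the symbol in the fast slot to be skipped or popped early.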

bench_sql deltas vs prior build
- B1  SELECT *        114 →   97 µs (15%)
- B4  GROUP_HAVING    584 →  554 µs (5%)
- B8  RECURSIVE CTE   150 →  141 µs (6%)
- B10 RANK PARTITION  310 →  296 µs (5%)
- B11 SUM OVER        335 →  320 µs (4%)
- B14 COUNT           295 →  281 µs (5%)
- B15 CTE+WIN+JOIN   1891 → 1826 µs (3%)

Verification
- go test ./... ALL PASS
- FiveSql2 test_sql1999 43/43
- tests/compat_harbour 56/56

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
83 lines
2.5 KiB
Go
// Copyright (c) 2026 Charles KWON OhJun (charleskwonohjun@gmail.com)
// All rights reserved.

package hbrt

import "strings"

// pendingCall stores the symbol for the next Function/Do call.
// This avoids storing Go pointers in Value.data (which GC can't trace).

// PushSymbol records the function symbol for the next call.
// The actual symbol is stored in Thread, not on the eval stack.
// A marker NIL is pushed to keep stack positions correct.
// Harbour: hb_xvmPushSymbol
func (t *Thread) PushSymbol(sym *Symbol) {
	t.pushPendingSym(sym)
	t.push(MakeNil()) // placeholder for symbol position
}

// Function calls the function with nArgs arguments.
// Stack layout before: [sym_placeholder] [nil/self] [arg1] ... [argN]
// Stack after: [retval]
// Harbour: hb_xvmFunction
func (t *Thread) Function(nArgs int) {
	sym := t.popPendingSym()

	if sym == nil {
		panic(t.runtimeError("no function symbol for call"))
	}

	// Resolve function. First call for an external/lazy symbol misses
	// sym.Func and walks the VM symbol table; cache the resolved Func
	// back into the Symbol so subsequent calls skip the ToUpper +
	// RWMutex + map lookup. Symbols are shared read-mostly so a racy
	// write is safe (both racers resolve to the same Func pointer).
	fn := sym.Func
	if fn == nil && t.vm != nil {
		found := t.vm.FindSymbol(strings.ToUpper(sym.Name))
		if found != nil {
			fn = found.Func
			sym.Func = fn
		}
	}
	if fn == nil {
		panic(t.runtimeError("undefined function: " + sym.Name))
	}

	// Stack at entry (bottom → top):
	//   [sym placeholder] [self/NIL] [arg1] … [argN]
	// Frame() expects only [arg1..argN] on the eval stack so it can
	// copy them into the callee's locals. The old code achieved this
	// by popping the args, popping the two placeholders, then pushing
	// the args back: an O(N) copy plus a heap allocation per call.
	// Shift the args two slots left in place instead: one slice move,
	// zero heap.
	if nArgs > 0 {
		base := t.sp - nArgs - 2
		copy(t.stack[base:base+nArgs], t.stack[t.sp-nArgs:t.sp])
	}
	// Two slots freed at top: keep them nil so the GC can release any
	// references they held (matches pop()'s clearing semantics).
	t.stack[t.sp-1] = cachedNil
	t.stack[t.sp-2] = cachedNil
	t.sp -= 2

	// Set pending params count and symbol for Frame()
	t.pendingParams = nArgs
	t.pendingCallSym = sym

	// Call
	fn(t)

	// Push return value
	t.push(t.retVal)
}

// Do calls the function but discards the return value.
// Harbour: hb_xvmDo
func (t *Thread) Do(nArgs int) {
	t.Function(nArgs)
	t.pop() // discard return value
}
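
The in-place shift used by Function() can be exercised in isolation.
The sketch below is an illustration under assumptions: `dropCallHeader`
is a hypothetical helper name, and a `[]string` stands in for the
runtime's `[]Value` stack, with `""` playing the role of `cachedNil`.

```go
package main

import "fmt"

// dropCallHeader removes the two placeholder slots that sit directly below
// the top nArgs entries of stack[:sp], shifting the args two slots down in
// place. It returns the new stack pointer. The caller must guarantee
// sp >= nArgs+2, as Function() does by construction.
func dropCallHeader(stack []string, sp, nArgs int) int {
	if nArgs > 0 {
		base := sp - nArgs - 2
		// The built-in copy handles overlapping source and destination.
		copy(stack[base:base+nArgs], stack[sp-nArgs:sp])
	}
	// Clear the two vacated top slots so they hold no stale references.
	stack[sp-1] = ""
	stack[sp-2] = ""
	return sp - 2
}

func main() {
	// Bottom → top: an unrelated value, the two placeholders, three args.
	stack := []string{"x", "sym", "self", "a1", "a2", "a3"}
	sp := dropCallHeader(stack, len(stack), 3)
	fmt.Println(stack[:sp])
}
```

With three arguments the live part of the stack becomes `[x a1 a2 a3]`.
No temporary slice is allocated, which is the point of the change: the
args move once and the stack pointer drops by two.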