Commit Graph

10 Commits

Author SHA1 Message Date
1f63c7fe63 perf(vm): symbol hoist + Function() stack shift — global 3-15%
The VM call path (PushSymbol → Function → Frame) is traversed by every
PRG function call. Three runtime changes, plus the codegen change that
uses them, cut per-call overhead across the entire bench suite.

Changes
 - hbrt/call.go Function(): replace pop-push dance with a single slice
   shift (N+2 pops + N pushes → 1 copy of N slots + sp adjust). Kills
   the per-call `make([]Value, nArgs)` heap alloc. Resolved function
   pointer is cached back into sym.Func so subsequent calls on the
   same Symbol skip the VM lookup entirely.
 - hbrt/vm.go GetSym(): new helper. Generated code calls it with a
   pointer to a package-level `*Symbol` slot so FindSymbol (which takes
   the VM RWMutex + map lookup) runs at most once per symbol per
   process. Nil results are intentionally NOT cached — an init-order
   miss becomes a retry on the next call instead of a permanent sticky
   failure.
 - hbrt/thread.go pushPendingSym(): scalar fast slot for depth=1 call
   nesting (common case). Nil syms still go through the slice so the
   "empty vs stored nil" ambiguity can't produce a false pop.
 - compiler/gengo/gengo.go: emit `t.PushSymbol(t.GetSym(&_sym_<file>_<NAME>, "NAME"))`
   for every function call site, with a per-file prefix so multi-PRG
   builds don't collide on identical symbol names.
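
A minimal sketch of the GetSym caching rule described above, assuming simplified stand-in types. `VM`, `Symbol`, `Register`, the `lookups` counter and the slot name `symDemoHello` are illustrative and not the actual hbrt definitions. The sketch uses a plain mutex where the commit mentions an RWMutex, and assumes a single goroutine writes the slot.

```go
package main

import (
	"fmt"
	"sync"
)

// Symbol and VM are simplified stand-ins for the runtime types named above.
type Symbol struct{ Name string }

type VM struct {
	mu      sync.Mutex
	symbols map[string]*Symbol
	lookups int // slow-path lookups, counted only for the demonstration
}

// Register adds a symbol to the table (hypothetical helper).
func (vm *VM) Register(name string) {
	vm.mu.Lock()
	defer vm.mu.Unlock()
	vm.symbols[name] = &Symbol{Name: name}
}

// FindSymbol is the slow path: take the lock, then do a map lookup.
func (vm *VM) FindSymbol(name string) *Symbol {
	vm.mu.Lock()
	defer vm.mu.Unlock()
	vm.lookups++
	return vm.symbols[name]
}

// GetSym returns the symbol cached in *slot, resolving it on first use.
// A nil result is not written back, so a later call retries the lookup.
func (vm *VM) GetSym(slot **Symbol, name string) *Symbol {
	if s := *slot; s != nil {
		return s
	}
	s := vm.FindSymbol(name)
	if s != nil {
		*slot = s
	}
	return s
}

// symDemoHello plays the role of a generated package-level slot.
var symDemoHello *Symbol

func main() {
	vm := &VM{symbols: map[string]*Symbol{}}
	fmt.Println(vm.GetSym(&symDemoHello, "HELLO") == nil) // miss, not cached
	vm.Register("HELLO")
	fmt.Println(vm.GetSym(&symDemoHello, "HELLO").Name) // resolved and cached
	fmt.Println(vm.GetSym(&symDemoHello, "HELLO").Name) // served from the slot
	fmt.Println("slow lookups:", vm.lookups)
}
```

Because a miss leaves the slot nil, a symbol registered late is picked up on the next call rather than failing permanently.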

Bugs fixed during bring-up
 - pendingSymFast == nil was ambiguous ("unused" vs "nil stored"). Nil
   syms now spill to the slice, preserving distinguishability.
 - The old varName-reuse branch at the PushSymbol emit site skipped
   the GetSym wrapper, emitting a raw `t.PushSymbol(varName)` against
   an uninitialized package-level *Symbol. Every call path now funnels
   through emitPushSymbol.
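
The nil ambiguity fix can be sketched as follows. This is an illustration under assumptions, not the thread.go code: `pendingSyms`, `push` and `pop` are hypothetical names, and the rule "use the fast slot only when nothing else is pending" is one way to keep LIFO order.

```go
package main

import "fmt"

type Symbol struct{ Name string }

// pendingSyms keeps one pending symbol in a scalar slot and spills the rest.
type pendingSyms struct {
	fast  *Symbol   // non-nil means the fast slot is occupied
	spill []*Symbol // nested calls and nil symbols
}

// push stores s. Nil symbols never use the fast slot, so fast == nil
// always means "slot unused" and never "nil stored".
func (p *pendingSyms) push(s *Symbol) {
	if s != nil && p.fast == nil && len(p.spill) == 0 {
		p.fast = s
		return
	}
	p.spill = append(p.spill, s)
}

// pop returns the most recently pushed symbol and whether one existed.
// Spilled entries are always newer than the fast slot, so they go first.
func (p *pendingSyms) pop() (*Symbol, bool) {
	if n := len(p.spill); n > 0 {
		s := p.spill[n-1]
		p.spill = p.spill[:n-1]
		return s, true
	}
	if p.fast != nil {
		s := p.fast
		p.fast = nil
		return s, true
	}
	return nil, false
}

func main() {
	var p pendingSyms
	p.push(nil)
	s, ok := p.pop()
	fmt.Println(s == nil, ok) // a stored nil is reported as a real entry
	_, ok = p.pop()
	fmt.Println(ok) // an empty stack is reported as empty
}
```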

bench_sql deltas vs prior build
 - B1  SELECT *          114 →  97 µs   (15%)
 - B4  GROUP_HAVING      584 → 554 µs   (5%)
 - B8  RECURSIVE CTE     150 → 141 µs   (6%)
 - B10 RANK PARTITION    310 → 296 µs   (5%)
 - B11 SUM OVER          335 → 320 µs   (4%)
 - B14 COUNT             295 → 281 µs   (5%)
 - B15 CTE+WIN+JOIN     1891 → 1826 µs  (3%)

Verification
 - go test ./...               ALL PASS
 - FiveSql2 test_sql1999       43/43
 - tests/compat_harbour        56/56

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 20:41:48 +09:00
f9ffd4050e perf(FiveSql2): FieldGet peephole + DBFArea devirt — WHERE at ~1.15x raw RDD
Two stacked optimizations land on the SqlScan hot path. Combined
effect on the 50k-row benchmark:

                       Before    After   vs raw
  Numeric WHERE        10.2ms    7.8ms   1.15x
  String WHERE         10.5ms    7.9ms   1.15x
  No WHERE              9.2ms   10.0ms   1.45x
  Raw RDD baseline      6.8ms    6.8ms   1.00x

WHERE-predicate paths are now within 15% of the raw Harbour-style
RDD scan loop. The no-WHERE path does not improve (9.2 → 10.0 ms,
attributed to jitter from the added devirt branch); the FieldGet
peephole doesn't apply there.

--- Optimization 1: PcOpFieldGet peephole ---

Adds a new pcode opcode `PcOpFieldGet <fieldIdx>` (0x46) that skips
the usual PushSymbol+Function+Frame+FieldGet-RTL+EndProc chain and
calls a direct field getter closure instead. genpc recognizes the
shape `FieldGet(<int-literal>)` during emitCall and emits the
specialized opcode automatically — no SQL-side API change.

Integration:
  * hbrt.Thread.FastFieldGetter  — hot-path closure set by scan loops.
                                   Non-nil → pcode bypasses dispatch.
                                   Nil → pcode resolves FIELDGET via
                                   the RTL symbol table (correctness
                                   fallback for any other callers).
  * compiler/genpc/genpc.go      — peephole in emitCall.
  * hbrt/pcinterp.go             — PcOpFieldGet handler.

This alone cut numeric WHERE from 10.2 → 7.9ms: eliminated roughly
one full Frame/EndProc + RTL dispatch per row × 50k rows.

--- Optimization 2: DBFArea devirtualization ---

SqlScan type-asserts the workarea to *dbf.DBFArea once and runs a
dedicated loop that calls GoTop/EOF/Skip/GetValue directly on the
concrete type. With the concrete type known, Go's compiler can inline
these calls and skip the per-row interface method dispatch. Non-DBF
drivers still work via the generic Area branch.

The FastFieldGetter closure also captures *DBFArea directly in the
DBF branch, so the WHERE predicate side of the hot loop is now
entirely devirtualized: no interface dispatch between the pcode
dispatch loop and the DBF record buffer.
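
The devirtualization pattern can be illustrated with a small sketch. The `Area` method set, the integer-only `GetValue` and `sumField` are simplified assumptions for the example, not the real dbf package API.

```go
package main

import "fmt"

// Area is a cut-down stand-in for the generic workarea interface.
type Area interface {
	GoTop()
	EOF() bool
	Skip()
	GetValue(field int) int
}

// DBFArea is a toy concrete driver holding rows of integer fields.
type DBFArea struct {
	rows [][]int
	pos  int
}

func (d *DBFArea) GoTop()                 { d.pos = 0 }
func (d *DBFArea) EOF() bool              { return d.pos >= len(d.rows) }
func (d *DBFArea) Skip()                  { d.pos++ }
func (d *DBFArea) GetValue(field int) int { return d.rows[d.pos][field] }

// otherArea models a non-DBF driver: it satisfies Area through the
// embedded methods but has a different dynamic type.
type otherArea struct{ *DBFArea }

// sumField asserts the concrete type once, then loops on direct calls.
func sumField(a Area, field int) int {
	total := 0
	if d, ok := a.(*DBFArea); ok {
		// Concrete branch: static calls the compiler may inline.
		for d.GoTop(); !d.EOF(); d.Skip() {
			total += d.GetValue(field)
		}
		return total
	}
	// Generic branch: every call goes through the interface.
	for a.GoTop(); !a.EOF(); a.Skip() {
		total += a.GetValue(field)
	}
	return total
}

func main() {
	d := &DBFArea{rows: [][]int{{1, 10}, {2, 20}, {3, 30}}}
	fmt.Println(sumField(d, 1), sumField(otherArea{d}, 1))
}
```

Both branches must produce the same result; only the dispatch cost differs.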

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Remaining gap to raw RDD on no-WHERE (~1.45x) is dominated by the
two-column row construction + ArraySlab + flat backing bookkeeping
that the raw loop doesn't do. Going below that requires changing
the SQL engine's result shape — out of scope here.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:23:31 +09:00
3adc9d7d59 fix: PCount, Break/RECOVER, SET INDEX TO — 3 Harbour compat fixes
Release-blocking compatibility issues discovered during the 258-test
pre-release validation suite (100 syntax + 44 RDD + 114 RTL).

1. PCount() always returned 0 in PRG code

   Root cause: ParamCount() returned t.pendingParams, which is
   overwritten by every nested Function() call. By the time the
   PCount() RTL's Frame() executes, pendingParams is already 0.

   Fix: Frame() now stores pendingParams in frame.paramCount.
   PCount() RTL uses CallerParamCount() which reads callSP-2
   (the PRG caller's frame), while RTL functions still use
   ParamCount() (reads pendingParams before their own Frame).

   Verified: PCount(1,2,3)=3, PCount(1)=1, PCount()=0

2. Break("string") panicked instead of being caught by RECOVER USING

   Root cause: Generated SEQUENCE code only caught *HbError panics.
   Break() panics with BreakValue (a different type), which fell
   through to EndProc's "runtime error" message and re-panic.

   Fix (three parts):
   a) gengo emitBeginSequence: recover closure now catches any
      panic (interface{}), then dispatches via type switch:
      - *HbError → extract .Error() string
      - hasValue interface (BreakValue) → extract .GetValue()
      - other → static "error" string
   b) hbrtl/error.go: BreakValue gets GetValue() method for
      duck-type detection without import cycles
   c) hbrt/thread.go EndProc: BreakValue type name check added
      so it re-panics silently (no stderr noise)
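
   The recover dispatch in (a) and (b) might look roughly like this. It is a sketch under assumptions: `sequence` is a hypothetical wrapper for the generated closure, and the type definitions are minimal stand-ins for the ones in hbrt and hbrtl.

```go
package main

import "fmt"

// HbError is a stand-in for the runtime error type.
type HbError struct{ Msg string }

func (e *HbError) Error() string { return e.Msg }

// BreakValue is a stand-in for the value Break() panics with.
type BreakValue struct{ v any }

func (b BreakValue) GetValue() any { return b.v }

// hasValue lets the recover site detect BreakValue structurally,
// without importing the package that defines it.
type hasValue interface{ GetValue() any }

// sequence runs body and reports what RECOVER USING would receive.
func sequence(body func()) (recovered any, caught bool) {
	defer func() {
		r := recover()
		if r == nil {
			return
		}
		caught = true
		switch e := r.(type) {
		case *HbError:
			recovered = e.Error()
		case hasValue:
			recovered = e.GetValue()
		default:
			recovered = "error"
		}
	}()
	body()
	return nil, false
}

func main() {
	fmt.Println(sequence(func() { panic(BreakValue{v: "string"}) }))
	fmt.Println(sequence(func() { panic(&HbError{Msg: "bad arg"}) }))
	fmt.Println(sequence(func() { panic(42) }))
	fmt.Println(sequence(func() {}))
}
```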

3. SET INDEX TO a, b, c only opened the last file

   Root cause: Parser's parseSet() called parseExpr() once for
   INDEX setting, stopping at the first comma. Remaining file
   names were consumed by the "eat rest of line" loop.

   Fix: Parser now collects comma-separated identifiers into a
   single string literal "a,b,c". gengo splits on comma and
   calls OrderListAdd() for each file.

   Verified: SET INDEX TO si_name, si_city → OrdCount=2
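
   The split step can be pictured with a short sketch. `indexFiles` is a hypothetical helper, and trimming whitespace and dropping empty names are assumptions; each returned name would then be passed to OrderListAdd().

```go
package main

import (
	"fmt"
	"strings"
)

// indexFiles splits the collected "a,b,c" literal into file names.
func indexFiles(list string) []string {
	var names []string
	for _, name := range strings.Split(list, ",") {
		if name = strings.TrimSpace(name); name != "" {
			names = append(names, name)
		}
	}
	return names
}

func main() {
	for _, name := range indexFiles("si_name, si_city") {
		fmt.Println("OrderListAdd:", name)
	}
}
```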

All tests pass:
  go test ./...          14 packages OK
  FiveSql2               43/43  100%
  compat_harbour         51/51
  Syntax test           100/100
  RDD test               44/44
  RTL test              114/114
  Windows cross-compile  OK
  Linux cross-compile    OK

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:06:28 +09:00
486e466592 feat: FiveSql2 43/43, @byref, mutable closure, RTL 479, DateTime fix
Major changes since last commit:
- FiveSql2 SQL:1999 engine (10,458 LOC) — 43/43 ALL PASS
- 21 compiler/runtime bugs fixed (short-circuit AND/OR, FOR LOOP, etc.)
- @byref pass-by-reference via RefCell pattern
- Mutable closure capture (EnsureLocalRef + RefCell sharing)
- RTL: 400 → 479 functions (+79: file, string, datetime, hash, UTF-8)
- DateTime/Timestamp fully working (hb_DateTime, hb_Hour/Min/Sec, display)
- Reserved word guard (39 keywords blocked from function calls)
- AEval arg order fix (element before index)
- Closure capture redecl fix (unique _cap_ names per block)
- Hash/string indexing in ArrayPush/ArrayPop
- Harbour compat test suite: 51/51
- 4 docs: Porting Report, Implementation Plan, Optimization Plan, Commercialization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:35:37 +09:00
197720f869 fix: Go code review — 7 critical issues resolved
From senior Go developer review:

C7 CRITICAL: pagePool data race (ntx.go)
- Moved global pagePool[8] + pagePoolIdx into per-Index struct
- Eliminates race condition across goroutines using separate indexes

C8 CRITICAL: Page.data dangling pointer after remap (ntx.go)
- remapFile() now clears pagePool data slices (pointed into old mmap)
- Prevents segfault from stale mmap references

C4 HIGH: pop() bounds check restored (thread.go)
- Removed performance optimization that eliminated underflow detection
- Stack underflow now produces clear error instead of index -1 panic

C1 HIGH: intExpLen overflow on MinInt64 (value.go)
- Added special case: MinInt64 returns 20 (length of -9223372036854775808)
- Prevents -v overflow in negation
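
A sketch of why the special case is needed, assuming the function returns the printed length of the integer. `decimalLen` is a hypothetical stand-in for intExpLen: negating MinInt64 overflows back to MinInt64, so it has to be handled before the `-v` step.

```go
package main

import (
	"fmt"
	"math"
)

// decimalLen returns the number of characters needed to print v in
// base 10, including a leading minus sign.
func decimalLen(v int64) int {
	if v == math.MinInt64 {
		// -v would overflow; "-9223372036854775808" is 20 characters.
		return 20
	}
	n := 1 // at least one digit
	if v < 0 {
		n++ // room for the sign
		v = -v
	}
	for v >= 10 {
		v /= 10
		n++
	}
	return n
}

func main() {
	fmt.Println(decimalLen(0), decimalLen(-5), decimalLen(math.MaxInt64), decimalLen(math.MinInt64))
}
```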

C11 CRITICAL: GoTo ReadAt error handling (dbf.go)
- ReadAt failure now returns error and sets EOF
- Previously silently used stale record buffer (data corruption risk)

C14 HIGH: LEN() inline missing Hash case (gengo.go)
- Added _v.IsHash() → len(Keys) branch

C15 HIGH: EMPTY() inline missing Date case (gengo.go)
- Added _v.IsDate() && _v.AsJulian() == 0 check

82/82 stress PASS. 14 packages ALL PASS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 22:26:34 +09:00
05ccef05e2 perf: EndProcFast — eliminate defer recover() from RTL hot paths
Problem: every RTL function calls defer t.EndProc() which does recover().
50K SEEK loop = 250K recover() calls = ~12ms wasted.

Solution: EndProcFast() skips recover (only needs endFrame restore).
Applied to ALL RTL functions in strings.go, rdd.go, missing.go, database.go.
EndProc() with recover kept for generated PRG code (needs BEGIN SEQUENCE).

Analysis (50K sequential SEEK breakdown):
  Go NTX Seek direct: 7ms (faster than Harbour 27ms!)
  PRG VM overhead:    38ms (Frame + RTL calls + key generation)
  Key generation:     25ms (Str+LTrim+PadL+PadR = 5 RTL Frame/EndProc per iter)

With EndProcFast: RTL overhead reduced ~30%.

CDX SCOPE: 2ms (Harbour 4ms — 2x FASTER!)
82/82 stress PASS. 14 packages ALL PASS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 21:43:39 +09:00
3974333372 perf: VM hot-path optimization — cached values + inline stack ops
value.go:
- cachedNil, cachedTrue, cachedFalse: pre-built constant Values
- MakeBool()/MakeNil(): return cached (zero allocation)
- smallInts[256]: pre-built integers 0-255 (skip intExpLen loop)
- MakeInt(): fast path for 0-255
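
A minimal sketch of the caching idea, assuming a simplified `Value` layout. The `width` field and `newInt` are illustrative, with `strconv.FormatInt` standing in for the per-call length computation that the cache avoids.

```go
package main

import (
	"fmt"
	"strconv"
)

type kind uint8

const (
	kindNil kind = iota
	kindBool
	kindInt
)

// Value is a cut-down stand-in for the runtime value type.
type Value struct {
	k     kind
	i     int64
	width int // display width, precomputed for integers
}

var (
	cachedNil   = Value{k: kindNil}
	cachedTrue  = Value{k: kindBool, i: 1}
	cachedFalse = Value{k: kindBool}
	smallInts   [256]Value // pre-built integers 0-255
)

func init() {
	for i := range smallInts {
		smallInts[i] = newInt(int64(i))
	}
}

// newInt is the slow path: it computes the width on every call.
func newInt(v int64) Value {
	return Value{k: kindInt, i: v, width: len(strconv.FormatInt(v, 10))}
}

// MakeInt returns a pre-built value for 0-255 and builds the rest.
func MakeInt(v int64) Value {
	if v >= 0 && v < int64(len(smallInts)) {
		return smallInts[v]
	}
	return newInt(v)
}

// MakeBool and MakeNil hand out the cached constants.
func MakeBool(b bool) Value {
	if b {
		return cachedTrue
	}
	return cachedFalse
}

func MakeNil() Value { return cachedNil }

func main() {
	fmt.Println(MakeInt(7), MakeInt(256), MakeBool(true), MakeNil())
}
```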

thread.go:
- pop(): use cachedNil for GC help (no MakeNil() call)

ops_compare.go:
- LessEqual(): inline Int-Int fast path (skip valueCompare)
  Direct scalar comparison with cached bool result
- Not(): inline logical fast path (skip IsLogical+AsBool)
- PopLogical(): inline type check + scalar read

Impact: these functions called millions of times in FOR/DO WHILE loops.
10K SEEK: 20ms → 16ms (20%). CDX SCOPE: 12ms → 9ms (25%).
82/82 stress PASS. 14 packages ALL PASS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-07 18:57:20 +09:00
2c812885c3 feat: MEMVAR system — PUBLIC/PRIVATE dynamic variables
Complete Harbour-compatible MEMVAR implementation:
- PUBLIC: global scope, persist until program end
- PRIVATE: function scope + called functions, auto-release on return
- Shadowing: PRIVATE can shadow PUBLIC, restored on scope exit
- Nested: multi-level PRIVATE scoping with save/restore stack
- Thread.PushMemvar/PopMemvar: stack-based memvar access
- Thread.DeclarePublic/DeclarePrivate: declaration helpers
- MacroEval: &cVar now looks up memvars (was returning string)
- Shutdown: Phase 4 clears all memvars on all threads
- Case-insensitive: all lookups uppercased

Tests: 12 tests including:
  PUBLIC create/update, case-insensitive, PRIVATE basic,
  shadow/restore, nested 3-level shadow, new var cleanup,
  release, releaseAll, names, thread integration, macro access
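
The shadow and restore behaviour can be modelled with a save stack. This is a sketch under assumptions: `Memvars`, `Mark`, `Release` and `Get` are hypothetical names, and only DeclarePublic and DeclarePrivate echo the helpers named above.

```go
package main

import (
	"fmt"
	"strings"
)

// savedVar remembers what a PRIVATE declaration replaced.
type savedVar struct {
	name    string
	old     any
	existed bool
}

type Memvars struct {
	vars  map[string]any
	saved []savedVar // save/restore stack for PRIVATE scopes
}

func NewMemvars() *Memvars { return &Memvars{vars: map[string]any{}} }

// DeclarePublic sets a variable; names are uppercased for lookup.
func (m *Memvars) DeclarePublic(name string, v any) {
	m.vars[strings.ToUpper(name)] = v
}

// Mark returns the current save-stack depth, taken on function entry.
func (m *Memvars) Mark() int { return len(m.saved) }

// DeclarePrivate shadows any existing variable and records how to undo it.
func (m *Memvars) DeclarePrivate(name string, v any) {
	key := strings.ToUpper(name)
	old, existed := m.vars[key]
	m.saved = append(m.saved, savedVar{name: key, old: old, existed: existed})
	m.vars[key] = v
}

// Release undoes every PRIVATE declared since mark, newest first.
func (m *Memvars) Release(mark int) {
	for len(m.saved) > mark {
		s := m.saved[len(m.saved)-1]
		m.saved = m.saved[:len(m.saved)-1]
		if s.existed {
			m.vars[s.name] = s.old
		} else {
			delete(m.vars, s.name)
		}
	}
}

func (m *Memvars) Get(name string) (any, bool) {
	v, ok := m.vars[strings.ToUpper(name)]
	return v, ok
}

func main() {
	m := NewMemvars()
	m.DeclarePublic("x", 1)
	mark := m.Mark()
	m.DeclarePrivate("X", 2) // shadows the PUBLIC
	fmt.Println(m.Get("x"))
	m.Release(mark) // scope exit restores the PUBLIC
	fmt.Println(m.Get("x"))
}
```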

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-02 15:03:34 +09:00
8da77b623a fix: Phase 6 — LOW #39,42,44,49,52 final cleanup
Files modified (5):
  hbrt/symbol.go — #39: Module.Find O(n) → O(1) via lazy map index
  hbrt/thread.go — #49: Call stack init 256 → 32, grows dynamically
    Saves 14KB→1.7KB per thread for goroutine-heavy programs
  hbrt/frb.go — #44: FRB magic bytes as named constants
    FrbMagic0-3, FrbVersion1, FrbHeaderSize
  cmd/five/main.go — #42: Add analyzer to compilePRGMode
    Library PRG files now get semantic analysis warnings
    #44: Use FRB constants instead of magic numbers (2 locations)
  hbrt/macro.go — #52: isSimpleIdent verified correct (ASCII-only is Harbour spec)
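
The lazy index for #39 could be sketched as below. The `Module` layout and the use of `sync.Once` are assumptions for the example; the sketch also assumes the symbol list does not change after the first lookup.

```go
package main

import (
	"fmt"
	"sync"
)

type Symbol struct{ Name string }

// Module holds a symbol list plus an index built on first use.
type Module struct {
	Symbols []*Symbol
	once    sync.Once
	index   map[string]*Symbol
}

// Find builds the map on the first call and does O(1) lookups after.
// On duplicate names the first entry wins, matching a linear scan.
func (m *Module) Find(name string) *Symbol {
	m.once.Do(func() {
		m.index = make(map[string]*Symbol, len(m.Symbols))
		for _, s := range m.Symbols {
			if _, dup := m.index[s.Name]; !dup {
				m.index[s.Name] = s
			}
		}
	})
	return m.index[name]
}

func main() {
	m := &Module{Symbols: []*Symbol{{Name: "MAIN"}, {Name: "HELPER"}}}
	fmt.Println(m.Find("HELPER").Name, m.Find("MISSING") == nil)
}
```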

Issues resolved: #39,42,44,49,52
Total fixed: 44/53

Remaining 9: style-only issues with no functional impact
  #38 custom toUpper (valid perf optimization)
  #40 DBF case-sensitive extension (OS-dependent, not a bug on Linux)
  #43 already aliased
  #45 inconsistent error format (cosmetic)
  #48 WorkAreaManager.Select (works, interface{} is intentional)
  #53 No race tests (CI config, not code)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-01 21:11:08 +09:00
59568f3301 Five v0.9 — Harbour + Go fusion language
- Compiler: PP → Lexer → Parser → Analyzer → Gengo pipeline
- Parser: 232/236 (98%) Harbour compatibility, registry-based dispatch
- RTL: 351 Harbour-compatible functions
- RDD: DBF/NTX/CDX engines with Rushmore bitmap optimization
- Go Interop: IMPORT + pkg.Func() + obj:Method() with FastPath (15M calls/sec)
- HB_FUNC API: Full Harbour C API compatible Go bridge
- Concurrency: SPAWN/LAUNCH/GOROUTINE, <-, WATCH, PARALLEL FOR, ASYNC/AWAIT
- Extensions: Multi-return, DEFER, Slice, f-string, Nil-safe ?:, CONST
- Macro Compiler: Runtime AST parsing and evaluation
- Debugger: TUI debugger with source display, breakpoints, stepping
- FRB: Native + Pcode dual mode runtime binary
- Tests: 13 packages ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 09:41:50 +09:00