Commit Graph

10 Commits

Author SHA1 Message Date
2008266da7 feat(pp,rtl): Tier 2 audit followups — JOIN hash + PP validation + C heuristic
Three medium-priority audit items in one commit, each independently
revertible.

  * **#18 JOIN hash-join fast path.** New std.ch shape:
        JOIN WITH <alias> TO <file> [FIELDS ...] ON <mfield> = <dfield>
    expands to a 6-arg __dbJoin call with the master/detail key
    field names. Runtime detects the extra args, builds an O(M)
    hash over the detail's key column, then probes per master row
    for O(N+M) total — vs the FOR form's O(N*M). For 1k×1k that's
    2k vs 1M operations; the gap widens with N. The original FOR
    form is unchanged and stays the fallback for arbitrary
    predicates. New helper dbHashKey type-tags the key string so
    `1` (numeric), `"1"` (string), and `.T.` (logical) don't
    collide in the bucket map.

  * **#38 PP rule result-marker validation.** ParseRule now walks
    the result template after parseMarkers and warns about every
    `<name>` (or `<(name)>` / `<.name.>` / `<{name}>` / `#<name>`
    / `<"name">`) that doesn't match a pattern marker. Warnings
    flow into pp.errors via handleDirective with the directive's
    filename:line, so a typo'd `<NaMe>` in an `#xcommand`
    case-sensitive rule fails the build with a clear diagnostic
    instead of silently producing broken expansions.

  * **#44 looksLikeInlineC heuristic strengthened.** Catches more
    of the common Harbour-PRG-with-C-inline-block shapes that
    used to fall through and produce cryptic Go-side errors:
    function-like #define, `extern "C"` linkage blocks, C return-
    type declarations (`int foo(`, `static char* bar(`), and the
    hb_ret*() helper family used by Harbour's C FFI return
    setters. Two small predicate helpers (allLetters,
    allIdentChars) keep the C-vs-Go disambiguation tight enough
    that legit Go code (`func name() int { ... }`) doesn't trip.

  * **#28 LIST/DISPLAY pagination** — explicitly deferred. Proper
    pagination requires interactive terminal handling (Inkey(0)
    for the keypress) which would hang in CI / batch mode. Will
    revisit when an interactive terminal layer needs it for
    other reasons.

Test fixtures: tests/std_ch/test_join_hash.prg verifies the new
ON-form path produces the same output as the FOR form would.
std.ch runner now stands at 16/16.

Other gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56
  std.ch suite       : 16/16
  FRB suite          : 7/7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 19:21:19 +09:00
29ca02e1bc fix(genpc,parser,pcinterp): pcode wider regression sweep (Tier 1 #3)
Six more silent miscompiles in the pcode path, all uncovered by a
new pcode regression sweep that exercises the full PRG surface a
dynamic FrbCompile body could legitimately use.

  * **xBase-keyword shadowing of variable names.** parseIdentStmt
    and parseExprStmt's fallback switches consumed an entire line
    when the leading IDENT matched LABEL / REPORT / ACCEPT / INPUT
    / NOTE / etc. Those words are also extremely common LOCAL /
    PRIVATE names — `LOCAL label ; label := "x"` had the
    assignment swallowed because the switch didn't peek at the
    next token. Both switches now look at peek(1): an assignment
    operator, [], (, -, ++, --, or `.` means it's a variable /
    call / member access, not the xBase command, and we fall
    through to expression parsing. Real silent bug — bit
    test_frb_pcode_sweep's `LOCAL label` declaration.

  * **`arr[i]` indexing not implemented in genpc.** ast.IndexExpr
    fell through to the default PushNil path, so any indexed read
    in a pcode-mode body returned NIL. New case emits the array,
    the index, and PcOpArrayPush (the get-op; PcOpArrayPop is the
    set-op — naming follows Harbour convention). Hashes go
    through the same opcode, which already special-cases
    IsHash() in ops_collection.go.

  * **Hash literals not implemented in genpc + dispatch missing
    in pcinterp.** `{ "k" => v, ... }` fell to PushNil. Added
    HashLitExpr emit (Push key, Push value pairs, then PcOpHashGen
    with count). Also wired up the PcOpHashGen dispatch in
    execPcodeBody — it had been declared in pcode.go since the
    initial design but the case statement was never added, so
    even hand-written modules couldn't use hashes.

  * **`x++` / `x--` postfix were silent no-ops.** PostfixExpr fell
    to PushNil and the surrounding ExprStmt then popped the NIL.
    DO WHILE loops with `n--` couldn't terminate; FOR loops with
    `i++` in the body were broken too. New case: PushLocal +
    LocalAddInt(±1).

  * **BlockExpr (`{|p| body }`) wasn't compiled.** Eval(b, n)
    inside a pcode body returned NIL. Added: build the body in a
    sub-codebuffer with the block's params occupying its locals,
    emit PcOpRetValue at the end, then PushBlock with the
    serialized bytes. Format extended with a uint16 nParams field
    so the runtime's PcOpPushBlock dispatch can set
    PcodeFunc.Params correctly — without it, ExecPcode's
    Frame(0, 0) pulled none of Eval's args and the block saw
    every parameter as NIL.

  * **All g.locals accesses were case-sensitive.** PRG is case-
    insensitive, but the pcode generator stored block params via
    strings.ToUpper while every other lookup site (function decl,
    mid-decl, ForStmt, IdentExpr read, AssignExpr write,
    PostfixExpr) used the raw .Name. So `{|x| x*x }` stored "X"
    but read "x" and missed. Normalized: all insertions and all
    lookups now go through strings.ToUpper.

  * **SeqExpr in pcode** — added the matching emit for comma-
    separated expression lists in code blocks (`{|| a, b, c }`).
    Same shape as the gengo SeqExpr case from Wave 1.

Test fixture: tests/frb/test_frb_pcode_sweep.prg covers 14 shapes
(string ops, arithmetic, comparison chains, array indexing, DO
WHILE with postfix, nested IF, IIf, hash literal + indexing,
block + Eval, character iteration). All 14 pass. Wired into the
FRB runner — suite now stands at 7/7.

Other gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56
  std.ch suite       : 15/15
  FRB suite          : 7/7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 11:32:38 +09:00
dca7bb22e5 fix(gengo): count nested LOCALs into the function frame
Function-entry Frame() allocation counted only top-level LOCAL
declarations from fn.Body. Mid-function LOCALs hidden inside an
IF / FOR / WHILE / DO CASE / SWITCH / SEQUENCE block weren't
included, so the runtime allocated a frame too small to hold them.
Subsequent reads/writes via PopLocalFast / PushLocalFast / LocalAdd
to those slot indices then either silently scribbled past the frame
(read-back saw NIL) or panicked with "local variable index out of
range" once the index exceeded the underlying slice.

This is the underlying bug behind frb_demo Section 4 — the
`LOCAL ch := Channel(1)` declared inside `IF pAsync != NIL` got
slot N+1 from the codegen but the runtime only allocated N. The
Channel value was scribbled past the frame, ChReceive then read
NIL from a non-existent slot, and the goroutine's ChSend(49) had
nowhere to land.

New helper gen_util.go::countLocalsInStmts walks every nested body
(IF + ElseIfs + ElseBody, ForStmt, ForEachStmt, DoWhileStmt,
SeqStmt's Body + RecoverBody, SwitchStmt's Cases + Otherwise) and
totals every ScopeLocal VarDecl. The function-emit caller adds this
to the top-level count before sizing the Frame.

Test fixture (tests/frb/test_frb_goroutine.prg) reproduces the
demo Section 4 shape — `LOCAL ch := Channel(1)` inside IF, then
`Go("WORKER", ch, 7)`, then ChReceive(ch). Wired into the FRB
runner so it stands at 6/6.

Other gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56
  std.ch suite       : 15/15
  FRB suite          : 6/6

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 07:05:22 +09:00
6a30c4e50e fix(gengo): compound assign for non-LOCAL LHS
Audit follow-up after Wave 1's pcode `+=` fix surfaced a parallel
class of silent miscompiles in the *gengo* (native-Go) emit path.
Three real bugs hiding behind happy-path test coverage:

  * `arr[i] += x` was ASSIGN-only — the IndexExpr branch returned
    after emitting `arr[i] := x`, dropping the original element.
    Now: PushArray + Push index, ArrayPush to read, fold with RHS,
    re-do PushArray + index, ArrayPop to store.

  * `alias->field += x` (and the M-> / MEMVAR-> namespace variants)
    were ASSIGN-only too. Same shape of bug — `x->v += 7` compiled
    as `x->v := 7`. Compound branch reads via PushAliasField (or
    PushMemvar for M->), folds, stores via SetAliasField (or
    PopMemvar).

  * PRIVATE / PUBLIC mid-function declarations were treated as
    extra LOCAL slots. emitMidVarDecl extended `locals` past the
    function's declared count and emitted `PopLocalFast(idx)` for
    the init. The slot didn't exist at runtime, so the init either
    silently scribbled past the frame (small N) or panicked with
    "local variable index out of range" once exercised. New logic:
    PRIVATE/PUBLIC declarations bypass the locals table and emit
    `PopMemvar(name)` for the init expression. The runtime auto-
    creates the memvar.

  * Memvar assignment fallback. After the LOCAL/STATIC checks miss
    in emitAssign, the bottom path used to be a one-line WARN that
    emitted RHS + `Pop()` — silently discarding the value. PRIVATE
    pSum stayed at its initial value forever. Now: ASSIGN goes
    through PopMemvar; compound forms read via PushMemvar, fold,
    write back via PopMemvar.

Test fixture (tests/std_ch/test_compound_lhs.prg) covers all four
shapes. The std.ch runner picks it up so the regression suite now
stands at 15/15.

Other gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56
  std.ch suite       : 15/15
  FRB suite          : 5/5

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-03 05:14:28 +09:00
efb615bed9 fix(frb,genpc): in-process compile + 4 pcode bugs
Compiling _FiveSql2/test/test_sql_extreme.prg + a sweep of the FRB
demos surfaced four real bugs in the dynamic-compilation pipeline.
All fixes shipped together because they were on the same critical
path; each is independently revertible.

  * **pcode FOR loop ignored STEP and direction.** emitFor in
    compiler/genpc emitted a fixed `<= to` comparison and a hardcoded
    `+1` increment, then deleted the actual step expression with
    slice arithmetic on the byte buffer. Result: `FOR 5 TO 1 STEP
    -1` exited on the first iteration; `FOR 1 TO 10 STEP 2` summed
    1..10 (55) instead of 1+3+5+7+9 (25). Rewritten to mirror
    gengo's emitFor: detect negative step from a literal `-N` or
    unary MINUS, pick `<=` vs `>=` accordingly, and emit a clean
    `var := var + step` increment per iteration.

  * **pcode compound `+=` operator stored only the RHS.** emitAssign
    looked at AssignExpr.Op only for the := case; +=/-=/etc.
    silently took the same path, so `n += i` compiled as `n := i`,
    discarding the accumulator. Loop reduces were wrong: `Reverse`
    returned "" and `n := 0; FOR i ... n += i; NEXT` returned only
    the last increment. New compoundBinOp helper maps PLUSEQ /
    MINUSEQ / STAREQ / SLASHEQ / PERCENTEQ / POWEREQ to their
    matching binary opcode; emitAssign emits `local + rhs ; pop
    local` for compound forms.

  * **Pcode body stack leaks polluted the caller's frame.** A pcode
    function whose body left intermediate values on the data stack
    (FOR control values, etc.) returned with extra entries past
    its declared retVal. FrbDoFunc / FrbExecFunc / FrbRunFunc then
    pushed retVal on top of those leaks, so the caller saw the
    leaked values where its own preceding arguments should have
    been: `? "Fibonacci(10) =", FrbDo(...), "(expect 55)"` printed
    `1 55 (expect 55)` because the FOR loop's `1` lived in arg-1's
    slot. Two new Thread methods (`SP()` / `SetSP(int)`) let the
    three FRB dispatchers snapshot stack depth before the inner
    call and clamp it back afterward, so the leaks evaporate before
    they reach the caller's frame.

  * **FrbExec / FrbRun recursed into the host's Main forever.** Both
    looked up "MAIN" via t.VM().FindSymbol, which always resolved
    to the OUTER program's Main since FRB modules deliberately keep
    Main local. Compile + run + unload became compile + recurse +
    OOM. Both now look up Main via mod.FindFunc("MAIN") (module
    scope) — Frbload's policy of leaving Main module-local now
    actually has the intended effect.

Plus an architectural improvement: in-memory compilation no longer
depends on shelling out to an external `five` binary. New
hbrtl.frbCompileInProc parses + preprocesses + generates pcode in
process, building a FrbModule directly. FrbCompile and FrbExec use
this exclusively, which means dynamic compilation works from any
directory regardless of PATH and without a second process. The
plugin-mode path (with its runtime-version-mismatch fragility) is
left available via hbrt.FrbCompileSource for callers that want it,
but FrbCompile no longer reaches for it by default.

Test suite: tests/frb/ holds five fixtures + a runner. 5/5 pass:
test_frb_simple / test_frb_pcode_load / test_frb_compile /
test_frb_loop / test_frb_step.

Other gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56
  std.ch suite       : 14/14
  FRB suite          : 5/5

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 10:25:35 +09:00
412351b67d feat(rtl): LIST/DISPLAY TO FILE — text output redirection
Wire up TO FILE for both LIST and DISPLAY: __dbList grows a 9th
parameter cFile, opens it (truncating any prior content) when non-
empty, and writes the formatted rows there via fmt.Fprintln. Default
behavior (no TO FILE) still goes to stdout.

std.ch gets two new rules placed *before* the regular LIST/DISPLAY
patterns so they win when TO FILE is present:

  LIST    [<v,...>] TO FILE <(f)> [OFF] [FOR] [WHILE] [NEXT] ...
  DISPLAY [<v,...>] TO FILE <(f)> [OFF] [FOR] [WHILE] [NEXT] ...

Open failure raises a clear *HbError ("LIST/DISPLAY TO FILE: cannot
create <path> — <syscall reason>") so callers know exactly what went
wrong instead of getting partial-or-empty output.

TO PRINTER stays rejected via __dbNotImpl — Five doesn't drive a
printer port. Test coverage: tests/std_ch/test_list_to_file.prg
exercises four shapes (full LIST, single-row DISPLAY, OFF + FOR with
explicit fields, and confirms TO PRINTER still raises). Wired into
the std.ch runner so the regression suite now stands at 14/14.

Gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56
  std.ch suite       : 14/14

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:15:32 +09:00
3a7f1dea72 feat(rtl,tests): pre-release UX round (Wave 5)
Three audit findings around polish + a release-readiness commit:

  * #UX1 LIST/DISPLAY output: dropped \r\n (unix terminals showed a
    stray ^M), moved the newline to AFTER each row (no more leading
    blank line), and added the `*` deleted-record marker after the
    record number — matches xBase LIST/DISPLAY convention. With
    SET DELETED ON the marker is unreachable since the row would
    have been skipped at Area.Skip level; with SET DELETED OFF the
    user now sees which rows are tombstoned.

  * #26 temp aliases: `__copytmp` / `__sorttmp` / `__totaltmp` /
    `__jointmp` were process-global string constants. A nested
    invocation (e.g., COPY inside a FOR clause whose expression
    runs another COPY) collided on the alias and the inner Open
    failed with "alias already in use" — surfacing as `.F.` with
    no clear cause. Each Open now goes through a new helper
    `nextTmpAlias(prefix)` backed by an atomic counter, so every
    call gets `__copytmp_1`, `__copytmp_2`, etc. — no collisions.

  * #J test coverage gap: the 13 std.ch regression tests were all
    sitting in `/tmp` — lost on tmpfs reboot, never in git, never
    in CI. Move them into `tests/std_ch/` and add a simple
    `run.sh` runner that builds + executes each one in a temp
    scratch directory and grep-asserts on FAIL / NOT REJECTED /
    expectation-mismatch markers. 13/13 pass against the current
    head:

       PASS  test_pp_stdch       PASS  test_count
       PASS  test_sum_avg        PASS  test_sum_multi
       PASS  test_copy           PASS  test_sort
       PASS  test_list           PASS  test_total
       PASS  test_join           PASS  test_update
       PASS  test_set_deleted    PASS  test_unsupported
       PASS  test_block_comma

    test_block_comma in particular guards the gengo SeqExpr fix
    from Wave 1 — without it the comma-in-block miscompile would
    silently come back.

Gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56
  std.ch suite       : 13/13

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:07:50 +09:00
dd270d5d9d perf: RTL Go-native migration — 27 optimizations, DML up to 70-90x
Systematic pass through PRG hot paths, promoting them to Go RTL while
preserving Harbour/FiveSql2 semantics. Full log in
docs/RTL-Go-Native-Migration.md.

Bench (bench_sql) vs 2026-04-08 baseline
 - B1  SELECT *             2,192 → 114   µs   (19x)
 - B6  INNER JOIN           9,291 → 233   µs   (40x)
 - B7  CTE simple           8,037 → 129   µs   (62x)
 - B9  ROW_NUMBER           3,705 → 265   µs   (14x)
 - B10 RANK PARTITION       4,748 → 309   µs   (15x)
 - B12 INSERT (WA cache)    4,319 →  63   µs   (69x)
 - B13 UPDATE (WA cache)    6,144 →  68   µs   (90x)
 - B15 CTE+WIN+JOIN        18,395 → 1,873 µs   (10x)

Infrastructure
 - HbHash O(1) Index preserving insertion order (Harbour KEEPORDER)
 - HbDeepClone Go RTL (scalar-sharing, immutable hash keys)
 - MEMRDD auto-imported via gengo; all Five programs get mem:name driver
 - SQL plan + pcode caches (s_hPlanCache, s_hDmlPcodeCache)
 - Opt-in SqlWACacheEnable — dbUseArea/Close/Commit batched for DML

SQL engine
 - FiveSql2 lexer ported to Go (byte FSM) with combined automatic
   template parameterization (literals → ?, concat queries share plan)
 - Go RTL: SqlDistinct, SqlGroupRows, SqlWindowPartitions,
   SqlWindowSortPartition, SqlWindowAssignRank, SqlComputeAggSimple,
   SqlBulkInsert, SqlBulkUpdate, SqlExprHasAgg, SqlEvalHaving
 - CTE / subquery / driving-table materialize paths use MEMRDD
 - SqlCoerce/SqlCmp/SqlIsTrue helpers moved from PRG to Go
 - SqlBulkUpdate defers Flush when WA cache active (APFS fsync was
   dominant B13 cost — 1.6ms/call → gone)

Correctness fixes uncovered during migration
 - ASort default path now sorts dates/logicals/timestamps (was no-op)
 - ORDER BY default NULL placement matches PRG SqlRowCompare across
   Go fast path; explicit NULLS FIRST/LAST honored by both paths
 - SqlBulkUpdate respects EXCLUSIVE vs SHARED mode record locks
 - SqlCmp/SqlCmpEq normalize NumInt vs Double (caught by test 6b)

Verification
 - go test ./...              ALL PASS
 - FiveSql2 test_sql1999      43/43
 - tests/compat_harbour       56/56 (+5 new: ASort dates/logicals,
                              AScan int cross-type)
 - Regression test test_null_order.prg for ORDER BY NULL ordering

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 20:20:14 +09:00
486e466592 feat: FiveSql2 43/43, @byref, mutable closure, RTL 479, DateTime fix
Major changes since last commit:
- FiveSql2 SQL:1999 engine (10,458 LOC) — 43/43 ALL PASS
- 21 compiler/runtime bugs fixed (short-circuit AND/OR, FOR LOOP, etc.)
- @byref pass-by-reference via RefCell pattern
- Mutable closure capture (EnsureLocalRef + RefCell sharing)
- RTL: 400 → 479 functions (+79: file, string, datetime, hash, UTF-8)
- DateTime/Timestamp fully working (hb_DateTime, hb_Hour/Min/Sec, display)
- Reserved word guard (39 keywords blocked from function calls)
- AEval arg order fix (element before index)
- Closure capture redecl fix (unique _cap_ names per block)
- Hash/string indexing in ArrayPush/ArrayPop
- Harbour compat test suite: 51/51
- 4 docs: Porting Report, Implementation Plan, Optimization Plan, Commercialization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:35:37 +09:00
59568f3301 Five v0.9 — Harbour + Go fusion language
- Compiler: PP → Lexer → Parser → Analyzer → Gengo pipeline
- Parser: 232/236 (98%) Harbour compatibility, registry-based dispatch
- RTL: 351 Harbour-compatible functions
- RDD: DBF/NTX/CDX engines with Rushmore bitmap optimization
- Go Interop: IMPORT + pkg.Func() + obj:Method() with FastPath (15M calls/sec)
- HB_FUNC API: Full Harbour C API compatible Go bridge
- Concurrency: SPAWN/LAUNCH/GOROUTINE, <-, WATCH, PARALLEL FOR, ASYNC/AWAIT
- Extensions: Multi-return, DEFER, Slice, f-string, Nil-safe ?:, CONST
- Macro Compiler: Runtime AST parsing and evaluation
- Debugger: TUI debugger with source display, breakpoints, stepping
- FRB: Native + Pcode dual mode runtime binary
- Tests: 13 packages ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-03-31 09:41:50 +09:00