Six audit-driven blockers landed together because they're tangled:
* MENU TO removed from std.ch — the rule expanded to a call to a
nonexistent __MenuTo() RTL symbol, so any user code with `MENU
TO choice` compiled clean and panicked at runtime. Behavior
pre-this-round was a parser silent no-op, which is at least
consistent. Restore that until @ PROMPT (the companion command)
actually lands.
* COUNT now requires `TO <var>`. The earlier `[TO <v>]` optional
bracket was a Harbour-pattern transcription error: the result
template references `<v>` unconditionally, so a bare `COUNT`
expanded to ungrammatical ` := 0 ; dbEval(...)` and the
PRG parser rejected it. Match Harbour's std.ch which makes TO
mandatory.
* UPDATE FROM ... REPLACE now requires `FROM`/`ON`/`REPLACE` all
three. Same root cause as COUNT: the result template uses
`<key>`, `<f1>`, `<x1>` unconditionally; missing any of them
produced broken syntax. Tightened to fail loudly rather than
silently mis-expand.
* CLOSE <unknown_alias> no longer closes the *current* workarea.
SelectByAlias was a silent no-op when the alias was missing,
leaving WASaveAndSelectAlias to evaluate the inner DbCloseArea()
against the originally-selected WA — a real data-loss footgun.
SelectByAlias now returns bool; WASaveAndSelectAlias switches to
the no-area sentinel (0) on miss so the inner expression's
Current() returns nil and short-circuits.
* SUM <x1>, <xN> TO <v1>, <vN> — multi-pair form supported.
Required two pieces:
1. matchSegment's regular-marker stop-boundary now combines
outerTail literals AND the segment's repeat boundary so
`[, <xN>]` doesn't let `<xN>` swallow past the next ','.
2. **Five parser miscompiled comma-separated expressions in
code blocks.** `{|| e1, e2, e3 }` kept only the last expr
and threw away earlier ones at *AST level*, so all their
side effects vanished. New SeqExpr AST node + emitter
(emit each, pop intermediate results) + folding/walk
updates fix the underlying bug, which also unbreaks any
other block that relied on comma sequencing.
* pp.go's `;` continuation joiner now strips exactly one trailing
`;` per iteration, preserving Harbour's `;;` convention (literal
`;` followed by a continuation marker). Without this the SUM
rule's chained `<v1> :=[ <vN> :=] 0 ; ; dbEval(...)` collapsed
to a missing statement separator.
* parseExprStmt's xBase fallback switch is back in sync with
parseIdentStmt — COPY/SORT/COUNT/SUM/AVERAGE/TOTAL/UPDATE/JOIN/
DISPLAY/LIST removed (std.ch handles all of them now). Leaving
them in the fallback masked typos as silent no-ops.
Gates green:
go test ./... : PASS
FiveSql2 SQL:1999 : 43/43
Harbour compat : 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`UPDATE [FROM <alias>] [ON <key>] [RANDOM] REPLACE <f1> WITH <x1>
[, <fN> WITH <xN>]` becomes a preprocessor rewrite to a new RTL
primitive __dbUpdate. For each detail record, find the master
record with matching key (forward-walk if both sorted, full scan
when RANDOM) and apply the REPLACE clauses in master's context.
Same shape as harbour-core/src/rdd/dbupdat.prg. The REPLACE clauses
expand to comma-separated assignments inside one block —
`{|| _FIELD->total := del->amt, _FIELD->status := "OK" }` — using
the multi-pair `[, <fN> WITH <xN>]` optional-repeat that std.ch
already establishes for SUM and DEFAULT.
Five-specific tweak: ON <key> wraps as `{|| _FIELD-><key> }` rather
than Harbour's bare `<{key}>`. Five doesn't auto-resolve a bare
identifier in a code block to the current workarea's field, and the
UPDATE block must evaluate against both detail and master so an
explicit alias prefix won't do — _FIELD-> dispatches to whichever
area is selected at eval time, which is what's needed.
Wiring up UPDATE surfaced one further matchSegment gap that fell
out of the multi-pair `[REPLACE ... [, ...]]` shape:
* matchSegment didn't handle nested `[...]` inside its body.
`[REPLACE <f1> WITH <x1> [, <fN> WITH <xN>]]` gave the inner
`[` as a literal token to match against the line, so even the
single-pair `REPLACE total WITH del->amt` form failed and f1/x1
came back empty. Now matchSegment runs the same repeat-loop on
inner `[...]` blocks that the top-level matcher uses, with its
own outer-tail computed from the segment tail past the inner
`]`.
Parser cleanup: UPDATE removed from the IDENT-statement no-op switch.
Gates green:
go test ./... : PASS
FiveSql2 SQL:1999 : 43/43
Harbour compat : 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`JOIN WITH <alias> TO <file> [FIELDS <list>] [FOR <expr>]` becomes a
preprocessor rewrite to a new RTL primitive __dbJoin. Cartesian
product of the current ("master") workarea and the named "detail"
alias, filtered by the FOR expression.
Output structure:
* No FIELDS clause: master's fields followed by detail's, dropping
any detail-side name that clashes with master.
* FIELDS list: one column per name in declaration order, resolved
against master first then detail.
Same shape as harbour-core/src/rdd/dbjoin.prg. Five-specific
simplifications: alias->name in FIELDS not yet supported (bare
names with master-precedence lookup); RDD/codepage args dropped
since Five only has DBFNTX.
Note for callers: don't name a workarea `M` or `MEMVAR` — both are
Harbour-reserved memvar aliases, so `M->field` and `MEMVAR->field`
always go through the memory-variable namespace, not the workarea.
This is gengo behavior matching Harbour, not new in this commit.
Parser cleanup: JOIN removed from the IDENT-statement no-op switch.
Gates green:
go test ./... : PASS
FiveSql2 SQL:1999 : 43/43
Harbour compat : 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`TOTAL TO <file> ON <key> [FIELDS <list>] [FOR ...] [WHILE ...]
[NEXT ...] [RECORD ...] [REST] [ALL]` joins the family of std.ch
DML rewrites. New RTL primitive __dbTotal:
* Walk the source under dbEval-style FOR/WHILE/NEXT/RECORD/REST
bounds. The source must already be sorted/indexed on the key —
same precondition as Harbour's dbtotal.prg.
* Track the current group key. On each key change, flush the
accumulated row to the destination (writing the running totals
back into the most recently appended record's sum-fields,
preserving each field's declared length/decimals).
* On the *first* record of every group, append a fresh dst row
and copy all non-memo source fields into it; subsequent records
in the group only contribute to the sums. Net effect: non-summed
fields take the first record's value, summed fields hold the
group total. Same shape as harbour-core/src/rdd/dbtotal.prg.
* Memo fields are dropped from the destination structure (Harbour
does the same).
Parser cleanup: TOTAL removed from the IDENT-statement no-op switch.
Gates green:
go test ./... : PASS
FiveSql2 SQL:1999 : 43/43
Harbour compat : 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`LIST [<fields>] [OFF] [FOR ...] [WHILE ...] [NEXT ...] [RECORD ...]
[REST] [ALL]` and `DISPLAY [<fields>] [OFF] [FOR ...] ... [ALL]`
reach the parser as plain function calls to a new RTL primitive
__dbList (rtlDbList in hbrtl/database.go).
Implementation: walk the workarea under dbEval-style FOR/WHILE/NEXT/
RECORD/REST bounds. For each visible record, evaluate each column
block and emit the rendered values via valueToDisplay (the same
formatter QOut already uses). Empty fields list defaults to
"all fields". OFF suppresses the record-number prefix.
LIST always emits the full filtered range; DISPLAY without ALL emits
only the current record (encoded as nCount=1). TO PRINTER / TO FILE
clauses are not yet wired through — for now everything goes to
stdout.
Wiring up LIST/DISPLAY surfaced four further gaps in PP that were
silently masking bugs in any rule with multiple word-list / list /
optional clauses chained together:
* matchSegment refused MarkerWordList inside `[...]`. The LIST
rule's `[<off:OFF>]` clause therefore never set the off
capture, and `<.off.>` substituted to nothing instead of .T./.F.
matchSegment now matches WordList markers the same way the
top-level matcher does.
* `<v,...>` and `<(f)>` capture stop boundaries didn't include the
values of following MarkerWordList markers. For
`[<v,...>] [<off:OFF>] [<all:ALL>]` against `LIST id, name OFF`,
the v list would happily eat OFF. New addStopFrom helper
contributes both literal keywords and word-list values; both
matchSegment's MarkerList branch and captureExpression now use
it.
* Optional-repeat loop in matchPattern merged a no-progress
iteration's empty capture into the running multi-capture string
(with the `\x01` separator) before the no-progress break check
fired. So a successful first iteration's value got contaminated
and the substitution loop then skipped it as multi-capture
garbage. The merge now happens after the progress check.
* Unreferenced `<.name.>` markers (optional clauses that didn't
match in the input) were getting cleaned up to empty by the
generic marker scrubber instead of the .F. sentinel Harbour's
std.ch expects. New replaceUnreferencedLogify pass mirrors the
existing replaceUnreferencedBlockify and runs just before the
cleanup.
Parser cleanup: LIST and DISPLAY removed from the IDENT-statement
no-op switch in both parseIdentStmt and parseExprStmt.
Gates green:
go test ./... : PASS
FiveSql2 SQL:1999 : 43/43
Harbour compat : 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`<{name}>` previously wrapped a list-typed capture's whole
comma-joined string in one code block: `{|| id , name }`. Harbour's
std.ch expects per-element wrapping so `{ <{v}> }` against
`LIST id, name` yields `{ {|| id }, {|| name } }` — an array of
column blocks the call site can evaluate per row.
applyResult now consults the marker table for blockify the same way
it already does for smart-stringify, splits the captured list on
top-level commas, and emits one `{|| expr }` per element.
Prereq for the upcoming LIST / DISPLAY rules; no user-visible
behavior change for the rules already in std.ch (their `<{for}>` /
`<{while}>` markers are scalar).
Gates green:
go test ./... : PASS
FiveSql2 SQL:1999 : 43/43
Harbour compat : 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`SORT TO <file> [ON <key-list>] [FOR ...] [WHILE ...] [NEXT ...]
[RECORD ...] [REST] [ALL]` joins COPY in being a real preprocessor
rewrite to a function call. New RTL primitive __dbSort:
* Buffer visible source records (FOR/WHILE/NEXT/RECORD/REST same
as __dbCopy).
* Multi-key stable insertion sort. Each key may carry `/D` for
descending; ascending otherwise. /A and unknown suffixes fall
through as ascending. Comparison delegates to the existing
compareValues helper in sqlscan.go (numeric / string / NIL-aware).
* Create destination DBF with the source's struct, append rows in
sorted order, restore source selection.
Parser cleanup: SORT removed from the IDENT-statement no-op switch.
Gates green:
go test ./... : PASS
FiveSql2 SQL:1999 : 43/43
Harbour compat : 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`COPY TO <file> [FIELDS <list>] [FOR ...] [WHILE ...] [NEXT ...]
[RECORD ...] [REST] [ALL]` reaches the parser as a plain function
call to a new RTL primitive __dbCopy (rtlDbCopy in hbrtl/database.go).
Implementation: project the field list (case-insensitive name match
against the source's structure, full copy when omitted), dbCreate the
target file with that struct, open it under a temp alias, walk the
source under dbEval-style FOR/WHILE/NEXT/RECORD/REST bounds, and
GetValue/Append/PutValue per record into the target. SDF / DELIMITED
variants stay parser no-ops until those backends arrive.
Wiring up COPY surfaced four longstanding gaps in the PP that had to
be fixed for the rule to even reach the runtime:
* `<(name)>` *pattern* marker was treated as a regular `<name>`
with the parens baked into the captured key, so the matching
result substitution `<(name)>` couldn't find it. parseOneMarker
now strips the parens at parse time so capture key and result
marker share the bare name. The smart-stringify result behavior
is unchanged.
* matchSegment (the optional-clause matcher) bailed on every
non-Regular marker. `[FIELDS <fields,...>]` therefore failed to
match at all and the fields list arrived empty in the result
template. matchSegment now handles MarkerList with paren-balanced
capture and segment+outer literal stop boundaries.
* captureExpression only used the first literal in the pattern
tail as a stop boundary. With std.ch's chain of optional
clauses (`[TO <(f)>] [FIELDS ...] [FOR ...] [WHILE ...] ...`)
the file-name marker was happy to gobble a trailing FOR clause
when FIELDS was absent. It now stops at *any* of the remaining
pattern literals.
* `<(name)>` smart-stringify on a list-typed capture wrapped the
whole comma-joined string in one set of quotes — `{ "a , b" }` —
instead of `{ "a", "b" }`. New helper quoteListElements splits on
top-level commas (paren / bracket / brace / string-balanced) and
quotes each element. applyResult now consults the rule's marker
table to know which captures came from `<name,...>`.
Parser cleanup: COPY removed from the IDENT-statement no-op switch in
both parseIdentStmt and parseExprStmt.
Gates green:
go test ./... : PASS
FiveSql2 SQL:1999 : 43/43
Harbour compat : 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three xBase analytical commands that were silent no-ops in the
parser now execute as Harbour-style PP rewrites:
COUNT [TO <v>] [FOR <for>] [WHILE <while>] ... -> dbEval()
SUM <x> TO <v> [FOR <for>] [WHILE <while>] ... -> dbEval()
AVERAGE <x> TO <v> [FOR ...] -> __dbAverage()
COUNT and SUM expand to a `<v> := 0 ; dbEval( {|| ... } )` pair
matching harbour-core/include/std.ch verbatim. AVERAGE delegates to
a new RTL function rtlDbAverage (sum + count + divide; returns 0 on
empty match) — the chained-private-variable trick Harbour uses to
keep AVERAGE inline doesn't translate cleanly through Five's PP.
Wiring up these rules surfaced four PP issues that had to be fixed
for the rewrite to even reach the parser:
* Result template did not implement <{name}> blockify. So a rule
body like `{|| x := x + <x> }, <{for}>` left the literal text
`<{for}>` in the output. Added blockify substitution: captured
-> `{|| <captured> }`, missing -> NIL.
* findMarkerEnd did not recognise `{`/`}` so unreferenced
blockify markers were not cleaned up either. Added `{`/`}` to
its prefix/suffix sets.
* Optional-clause matching had no view of the outer pattern, so a
regular marker at the end of `[TO <v>]` would swallow the rest
of the line — `COUNT TO n FOR x>5` captured `<v>` as
"n FOR x>5". matchSegment now takes outerTail and stops at its
first literal.
* `#command` directives could not span multiple physical lines.
A trailing `;` is harbour-core's line-continuation marker for
std.ch and now joins the next line into the directive before
parsing.
Parser cleanup: COUNT, SUM, AVERAGE removed from the IDENT-statement
no-op switch in parseIdentStmt + parseExprStmt. The remaining xBase
verbs (COPY, SORT, TOTAL, JOIN, LIST, DISPLAY, LABEL, REPORT, ...)
stay in the parser until their RTL backends arrive.
Gates green:
go test ./... : PASS
FiveSql2 SQL:1999 : 43/43
Harbour compat : 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Introduce compiler/pp/std.ch with 19 #command rules so that ERASE,
RENAME, DELETE FILE, CLOSE [<a>|ALL|DATABASES], COMMIT, UNLOCK,
LOCATE/CONTINUE, REINDEX, PACK, ZAP, KEYBOARD, RUN, MENU TO, and
CLEAR GETS reach the parser pre-rewritten as plain function calls.
Embedded into the compiler binary via //go:embed so it auto-loads
without an explicit #include in user code, exactly the way Harbour
auto-loads its std.ch.
This is a pure dispatch move, not a behavior change for the
already-working forms: the same Five RTL functions get called.
But it does fix three regressions that the parser was masking:
* ERASE / RENAME / DELETE FILE used to be silent no-ops — the
parser swallowed the entire line and returned NIL. They now
actually delete/rename files (FErase / FRename).
* CLOSE <alias> used to silently ignore the alias and close the
current area. It now switches to the named area first
(<a>->( DbCloseArea() )).
* Two latent #command matcher bugs that surfaced while wiring
std.ch up:
- bare `CLOSE` would match rule `CLOSE ALL` because the tail
of the pattern wasn't checked for unconsumed literals.
- bare `CLOSE` would match rule `CLOSE <a>` because all
unconsumed pattern markers were unconditionally treated as
optional. They are only optional when nested inside `[...]`.
Parser cleanup: parseIdentStmt + parseExprStmt no longer hardcode
ERASE / RENAME / RUN / KEYBOARD / REINDEX / LOCATE / CONTINUE /
COMMIT / CLOSE — the rewriter handles them. Other xBase verbs
(COPY / SORT / COUNT / SUM / AVERAGE / TOTAL / JOIN / LIST /
DISPLAY / LABEL / REPORT / DIR ...) still no-op in the parser
because their RTL backends aren't implemented yet — once the
backends land they move into std.ch the same way.
Gates green:
go test ./... : PASS
FiveSql2 SQL:1999 : 43/43
Harbour compat : 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Harbour's `#xcommand DEFAULT <v1> TO <x1> [, <vn> TO <xn>] => ...`
uses an optional, repeatable trailing `[...]` block to accept any
number of `var TO default` pairs on a single line. Five's PP
skipped bracket bodies during pattern matching and treated them
as no-ops in result templates, so
DEFAULT a TO 10, b TO 20, c TO 30
expanded (at best) the first pair and dropped the rest — and
common.ch itself was documented as "not yet supported".
Three concrete changes:
1. matchPattern now matches the `[...]` body repeatedly against
remaining line tokens via a new matchSegment helper. Each
successful iteration appends captures for the interior markers
under the same name, joined with a \x01 sentinel.
2. matchSegment, when capturing the last marker in a body with no
following literal, uses the body's opening literal (e.g. the `,`
in `[, <vn> TO <xn>]`) as the iteration boundary. Otherwise
captureExpression would greedily eat the rest of the line and
collapse every remaining pair into one capture.
3. applyResult's new expandOptionalRepeat walks the result template
for top-level `[...]` blocks. When a referenced marker is multi-
captured it emits the body N times (substituting per-iter value);
when it's single-captured it emits the body once; otherwise drops
the block. A separate referencedMarkers scanner and an inMarker
guard keep literal `[` / `]` inside PP markers (like `<.x.>`)
from being mistaken for bracket delimiters.
Side fix: ParseRule previously stripped every ` ;` as a Harbour
line-continuation marker, but that also destroyed in-line PRG
statement separators in result templates. Line joining is the
preprocessor's job upstream — keep semicolons intact here.
common.ch now ships real DEFAULT and UPDATE #xcommands. Verified
1-, 2-, and 3-pair DEFAULT expansion plus `common.ch` inclusion
from user code. FiveSql2 43/43, Harbour compat 56/56, Go test ALL
PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two bugs blocked Five's own inline-Go feature:
1. Inline Go blocks placed mid-file couldn't carry an `import` list
because Go rejects declarations before imports in the same file.
examples/godump_demo.prg and friends (real Five demos) hit
"syntax error: imports must appear before other declarations"
during compile of the generated Go.
hoistGoImports parses the raw dump body for `import (...)` blocks
and single-form `import "path"` lines, registers each path into
the generator's imports map, and returns the body with those
directives stripped. The top-of-file import block then carries
everything the dump needs.
2. HB_FUNC() calls inside the inline block's init() enqueue
registrations into hbrt.dynamicFuncs, but the VM only promotes
them to its symbol table when RegisterLibModules() is called.
gengo's generated main() skipped that step, so dispatch on the
inline-defined names panicked with "no function symbol for call".
Emit vm.RegisterLibModules() after RegisterModule(symbols).
Verified: examples/godump_demo.prg builds and runs; the inline
GoUpper / GoFib / GoGCD / GoSplit / GoSquare / GoTypeOf functions
all dispatch. Matches the feature's original design intent.
FiveSql2 43/43, Harbour compat 56/56, Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Harbour's #pragma BEGINDUMP ... #pragma ENDDUMP blocks carry C source
that the Harbour toolchain embeds verbatim. Five takes the same
directive but targets Go — any `.prg` ported from Harbour that ships
inline C gets its C shoveled into the Go codegen pipeline and fails
with opaque errors like "invalid character U+0023 '#'" from the Go
compiler, dozens of lines downstream of the actual cause.
Detect the C shape at PP time and report a clear, actionable error:
pp: file.prg:N: #pragma BEGINDUMP contains C code — Five accepts
inline Go only. Port the block to Go (or use an RTL function),
then wrap in #pragma BEGINDUMP ... #pragma ENDDUMP.
looksLikeInlineC uses conservative signals that don't false-positive
on legitimate inline Go (which calls `hbrt.HB_FUNC("NAME", fn)` with
a package prefix and a quoted string, distinct from C's bare
`HB_FUNC(NAME)` macro). Signals:
- `#include <...>` / `#include "..."` — unambiguous C preprocessor
- line-starting `HB_FUNC(` / `HB_FUNC_STATIC(` — C FFI macro
- `typedef ` / `struct ` / `int main(` / `void main(` at line start
main.go now aborts the build when PP returns errors (previously
printed but continued — same behavior the parser already had for
its own errors). Keeps build output short: one pp line + one
summary line, no gengo noise.
Verified:
- harbour-core/tests/inline_c.prg → clean PP error, exit 1
- examples/godump_demo.prg (legitimate inline Go) → passes PP
(hits a separate pre-existing gengo import-ordering bug, not
related to this change)
FiveSql2 43/43, Harbour compat 56/56, Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two related fixes for Harbour's data-driven `USE &cFile ALIAS &cAlias
INDEX &cNdx` idiom — common in any app that dispatches table names
at runtime.
Parser (compiler/parser/parser.go parseUse):
- `USE &cFile` / `USE &(expr)` previously triggered a
skipToEndOfLine short-circuit, emitting an empty UseCmd (equivalent
to bare USE = close current area). Now parseMacro runs and the
MacroExpr becomes the File node, so codegen emits MacroPush +
dbUseArea.
- `ALIAS &cAlias` / `ALIAS &a.1` similarly dropped the macro result;
now captures it into UseCmd.AliasExpr so codegen evaluates the
alias at runtime. Both the IDENT-path ("ALIAS") and keyword-path
(token.ALIAS) handlers fixed.
PP (compiler/pp/command.go):
- captureExpression and the MarkerList branch now paren-balance
`(`/`[`/`{` so nested grouping inside a macro argument doesn't let
an inner `)` terminate the capture. Example:
_REGULAR_(&(a))
previously captured `&(a` (missing inner `)`) and left the outer
`)` dangling, producing parse errors in the expanded output.
- MarkerList capture still joins tokens with " " for raw `<z>`
substitution — comma tokens stay in the stream, so `s(<z>)`
re-emits them as argument separators and the list expands cleanly.
Bench: harbour-core/tests/pp.prg 2 errors → 0 for the realistic
`USE ¯o` / `&(expr)` patterns. Remaining parse errors on line 70
are a pathological `_REGULAR_L` list that includes `&a. [2]`
(space between macro's terminating dot and an array index) — the
PP expands it correctly but Five's lexer refuses the expanded
result. That form doesn't occur in real code.
/tmp/test_use_macro.prg — all four patterns (`USE &f`, `USE &f ALIAS
&f`, `USE &f ALIAS &f INDEX &i`, dot-terminated) now compile. FiveSql2
43/43, Harbour compat 56/56, Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three cumulative fixes for Harbour's preprocessor stringify forms
surfaced by harbour-core/tests/pp.prg:
1. Token alignment — tokenizePattern and tokenizeLine now both
split on parens and brackets, so `DUMB(a)` (no space) tokenises
as `DUMB`, `(`, `a`, `)` on both sides. Previously the line
tokenizer kept `DUMB(a)` as one token while the pattern split
it three ways, and the match never engaged. Fixes `_DUMB_(a)`-
style calls in pp.prg line 57+.
2. Substitution order — applyResult was replacing the bare `<z>`
marker first, eating the inner `<z>` of `#<z>`, `<"z">`, `<(z)>`
and `<.z.>` and leaving stray `#` / `<` / `.` characters that
the lexer reported as ILLEGAL tokens. Run all compound forms
first, bare `<z>` last.
3. Quote delimiter picker — ppQuote wraps a captured value in a
legal PRG string literal by trying `"..."` first, then `'...'`,
then `[...]`. Harbour's #<z> dumb-stringify needs this because
the capture may already contain `"`, and Five was producing
malformed `""world""` literals.
Bonus: smart-stringify `<(z)>` now recognises input that's already
a string literal (`"x"` / `'x'` / `[x]`) and keeps it verbatim
instead of double-quoting.
pp.prg 26 parse errors → 2 (remaining: `USE &b ALIAS &a.1` macro-
inside-command at line 21 and one related line, unrelated to this
fix). FiveSql2 43/43, Harbour compat 56/56, Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Harbour reserves the aliases `M` and `MEMVAR` for the memvar
namespace — `M->cVar` reads a PUBLIC/PRIVATE memvar, not a DBF
field in a workarea named M. Five's emitAliasExpr and emitAssign
treated all aliases identically, emitting:
t.PushAliasField("M", "cVar") // read
_wa := t.WA.(*hbrdd.WorkAreaManager); _wa.SetAliasField("M", ...) // write
which triggered a spurious hbrdd import on programs using memvars
and attempted a workarea lookup that couldn't find a "M" area at
runtime.
Detect the reserved aliases (case-insensitive) at the three
AliasExpr call sites — the read path (emitAliasExpr) and both
assign paths (emitAssign for statements, emitAssignExpr for
expression context) — and route to t.PushMemvar / t.PopMemvar
instead. The existing Thread helpers hash into the MemvarTable
populated by PUBLIC/PRIVATE declarations.
Unblocks harbour-core/tests/macro.prg build (runtime still needs
the TVALUE test helper, unrelated). FiveSql2 43/43, Harbour compat
56/56, Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three SWITCH codegen bugs surfaced by harbour-core/tests/switch.prg:
1. Empty SWITCH (`SWITCH x ENDSWITCH`) — legal Harbour, produced by
conditional-compile files like switch.prg:13. Previous code
emitted `_sw := t.Pop2()` followed by `}` with no matching `{`,
closing the enclosing procedure body and producing "syntax error:
non-declaration statement outside function body".
2. OTHERWISE-only (no CASE arms) — emitted `} else {` with no opening
if, same "unexpected keyword else" category.
3. `EXIT` inside a CASE should break out of the SWITCH — but Five
lowers SWITCH to an if/else-if chain, so the generated `break`
had nowhere to land ("break is not in a loop, switch, or select").
Fix all three by wrapping every SWITCH in a one-iteration `for`
loop. `break` inside a case targets the wrapper, matching Harbour
semantics. Empty / OTHERWISE-only bodies still emit valid Go
because the for-loop provides the scope boundary regardless of
whether any if-chain opened. A trailing `break` keeps the loop
one-shot.
Also:
- `_ = _sw` silences unused-var for empty SWITCH.
- Conditionally emit the if-chain closing `}` only when at least
one CASE ran.
All 15 SWITCH blocks in harbour-core/tests/switch.prg now build
and run to completion. FiveSql2 43/43, Harbour compat 56/56,
Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Real Harbour headers write parameterised commands with no space
between the keyword and its opening paren:
#xcommand MAKE_TEST( <obj>, <v> ) => ...
ParseRule stored the rule keyword as `MAKE_TEST(` (stripping only
<>, [] marker wrappers), but firstToken normalised source lines by
stopping the first-word scan at `(` — so `MAKE_TEST( o, 42 )`
produced `MAKE_TEST` for the lookup. The two strings didn't match
and the fast-path keyword check rejected every invocation, leaving
the macro unexpanded and the call site as a bare undeclared
identifier.
Trim everything from the first `(` onward during keyword
extraction so both halves agree on the dispatch key. The marker
tokens inside the parens are still parsed normally by
parseMarkers / matchPattern.
Verified with /tmp/test_xcmd2.prg (`MAKE_TEST( o, 99 )` expands
and dispatches to the object's :hVar access). FiveSql2 43/43,
Harbour compat 56/56, Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Harbour permits keywords (CASE, DO, WHILE, etc.) to be used as
variable/array names. In most expression contexts Five already
handles this via expr.go:362 which whitelists keywords when used
as bare identifiers. But parseStmtBlock was stopping on any stop
token unconditionally, so a line like
case[ n ] := x -- 'case' is a LOCAL array
terminated the enclosing stmt block at `case` and left `[ n ] := x`
unparsable.
Add isIdentSuffix(): peeks one ahead and reports whether the next
token is something that can only follow an identifier ([, :=, +=,
-=, *=, /=, %=, ^=, ++, --, :, .). parseStmtBlock now treats the
stop token as a statement-start when its suffix matches, so the
block keeps going.
Verified with /tmp/test_kwident.prg (`case[...]` outside DO CASE,
`arr[...]` inside DO CASE body), /tmp/test_kwident2.prg (both the
`case case[n] == "two"` arm and `case[1] := "updated"` assignment
after ENDCASE). Pathological harbour-core/tests/keywords.prg still
fails — it places `case[...]` in the arm-expected position of a
DO CASE block with no leading arm, which no sane parser can
disambiguate.
FiveSql2 43/43, Harbour compat 56/56, Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Classic Clipper/Harbour form writes method implementations as bare
`METHOD Name(params)` statements following a `CLASS X ... ENDCLASS`
declaration, with the binding inferred from the most recent class:
CREATE CLASS Shape
METHOD Area
ENDCLASS
METHOD Area -- binds to Shape
RETURN 0
Five was requiring `METHOD Area CLASS Shape` explicitly. Without it,
parseMethodDecl left MethodDecl.ClassName empty, gengo skipped the
body emission, and the link step failed with `undefined: HB_SHAPE_AREA`.
The class registration had AddMethod("AREA", HB_SHAPE_AREA) pointing
at the missing symbol.
Parser tracks p.lastClassName at parseClassDecl, and parseMethodDecl
falls back to that value when no CLASS clause is supplied. Each new
CLASS declaration updates the tracker, so multi-class files still
dispatch correctly — verified with /tmp/test_implicit_class.prg
(Shape + Box both resolve their own Name/Area methods).
Unblocks harbour-core/tests/clsscope.prg and other OOP compat
tests that use this form. FiveSql2 43/43, Harbour compat 56/56,
Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Harbour's DO() accepts a string (looked up as a function name), a
code block (evaluated with args), or a symbol, and invokes it. Used
for plugin systems and dynamic dispatch idioms like
`DO(cHandler, oRequest)`.
Five already had stmtDo rewrite `DO(...)` at statement-level to a
function-call expression, so callers in expression position just
work — but gengo refused to emit DO as a function call because it
was on the reserved-word guard list (which existed to catch stray
ENDIF/ENDDO from bad IF nesting). Remove DO from that list; the
statement form is still handled upstream by parseDoProc, so the
guard loses nothing.
rtlDo implements the dispatch:
- String target → VM.FindSymbol + t.Function
- Block target → EvalBlock path (same as Eval)
- Anything else → NIL
Tested (/tmp/test_do.prg):
DO("Greet", "World") → "hello, World"
DO({|x,y| x*y+1}, 5, 6) → 31
DO(NIL) → NIL (ValType "U")
FiveSql2 43/43, Harbour compat 56/56, Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Harbour's `DATA name1, name2, name3` (and `VAR`, `CLASSDATA`)
should declare every listed field. Five's parseDataDecl instead
returned a single DataDecl for the first name and silently dropped
the rest — the comma branch just consumed the identifier without
producing a new decl. Surfaced by the OPERATOR overloading test
(/tmp/test_operator.prg originally had `DATA x, y` for a Vec2
class) where later `::y` access panicked with "unknown method y".
Change the signature to `[]*ast.DataDecl` and rewrite the loop so
each comma closes the current decl and starts a fresh one. AS /
INIT / qualifier runs still attach to the most recent name, so:
DATA x, y, z → three decls, no init
DATA x INIT 10, y, z INIT 0 → init attaches to preceding name
DATA cName AS CHARACTER → typed single decl
All seven class-body call sites flatten the slice into `members`.
Verified with /tmp/test_multidata.prg (`DATA x, y, z` + mixed
`DATA label INIT "origin", count INIT 0`) and the OPERATOR test
which now passes with the original `DATA x, y` form restored.
FiveSql2 43/43, Harbour compat 56/56, Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Harbour's macro operator was a stub: hbrt.MacroCompile only resolved
bare identifier names to memvars/functions and returned the source
string unchanged for any non-trivial expression. The gengo emit was
also broken — `t.MacroPush() + t.PushNil()` never pushed the inner
expression's value, so MacroPush popped whatever happened to be on
the stack.
Wire it up properly:
1. Gengo fix: `case *ast.MacroExpr` now emits `emitExpr(e.Expr);
t.MacroPush()`. The inner expression produces the source string;
MacroPush consumes it and pushes the evaluated result.
2. Hook pattern in hbrt: `SetMacroEvalHook(fn)` lets hbrtl install
the real evaluator without creating an import cycle (genpc
already imports hbrt). MacroPush delegates to the hook when
installed; otherwise falls back to the legacy stub for hbrt
unit tests.
3. hbrtl.init registers macroEval, which reuses compileExprSource
(factored out of PcCompile) so macro lookups share the same
sync.Map-backed pcode cache — repeat evaluations of the same
macro source are free after the first hit.
4. ExecPcode leaves the result in retVal; macroEval copies it to
the operand stack via PushRetValue.
Tested (/tmp/test_macro.prg):
&"10 + 20" → 30
&"Sqrt(16)" → 4
&"Upper('hello')" → HELLO
&("30 * " + Str(nX, 1)) → 210 (runtime-built source)
&"5 > 3 .AND. .T." → .T.
&("Str(" + Str(nX*10,2) + ",2)") → 70
FiveSql2 43/43, Harbour compat 56/56, Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Harbour lets a class define custom behaviour for arithmetic and
comparison operators via `OPERATOR "<sym>" ARG <name> INLINE <expr>`.
Five already had the runtime slot infrastructure (ClassDef.Operators
+ AddOperator + parent-chain copy) but parser skipped the form and
the VM ops never consulted the slots.
Parser: parseOperatorDecl captures the symbol, ARG binding, and
INLINE body into a MethodDecl with IsOperator=true and OperatorOp
set to the hbrt.Op* slot. Synthesised method name is __OP_<idx>
to keep the regular method namespace clean.
Codegen: emitClassDecl routes IsOperator members through
_def.AddOperator instead of AddMethod. Inline body generation is
shared with the MESSAGE/INLINE path (34485cd).
VM: Thread.tryBinaryOp walks the LHS object's class operator slot,
pushes args with Self bound to LHS, and returns true if the slot
is populated. Wired into Plus/Minus/Mult/Divide and Equal/NotEqual/
Less/Greater/LessEqual/GreaterEqual. Falls through to built-in
behaviour when no overload exists — non-object LHS costs one tag
check per op.
Operator symbol→slot mapping keeps `=` and `==` on the same slot
(OpEqual=8) because Five's gengo routes both to t.Equal() and the
VM doesn't distinguish strict vs non-strict equality today.
Tested (/tmp/test_operator.prg): Vec2 + - == < with per-field
results all correct.
FiveSql2 43/43, Harbour compat 56/56, Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Harbour's inline-method sugar was parsed but the body was skipped,
leaving any `METHOD X() INLINE expr` declaration registered in the
class vtable with no matching HB_<CLASS>_X function — link error
at build time.
Parser: MethodDecl gains an InlineBody Expr field. parseClassMethodDecl
captures the expression after INLINE instead of skipping to EOL.
New parseMessageDecl handles `MESSAGE <name> [(params)] INLINE expr`
and returns the same MethodDecl shape.
Codegen: emitClassDecl walks members a second time after the class
registration init block and emits emitInlineMethodBody for each
IsInline method — a Frame(nParams, 0) + emitExpr(InlineBody) +
RetValue function. curMethodClass is bound so ::super: inside an
inline body still resolves.
Tested (/tmp/test_inline.prg): all four patterns — bare INLINE,
MESSAGE INLINE, INLINE with params, INLINE reading ::field —
produce expected values.
FiveSql2 43/43, Harbour compat 56/56, Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Harbour's ::super: idiom routes a method call through the parent of
the class that defines the currently-executing method — Self stays
the child instance, only the vtable entry point shifts. Five
previously parsed ::super as a data-field access (PushSelfField("SUPER"))
which returned nil and panicked on the subsequent Send.
Runtime: Thread.SendSuper(fromClassName, methodName, nArgs).
Binding to the *defining* class (not Self's runtime class) is
load-bearing for 3+ level hierarchies: without it,
Grand:New → ::super:New → Child:New → ::super:New
would resolve to Grand.Parent=Child again and infinite-loop.
Gengo: Generator.curMethodClass tracks the class name across each
method body emission. emitSendExpr detects the nested SendExpr
shape `::super:X(...)` and emits SendSuper with curMethodClass as
the first argument.
Tested (/tmp/test_super, /tmp/test_super2):
Parent → Child: ::super:Greet() returns composed result
Base → Child → Grand: ::super:New chain passes args correctly
Also fixes three gengo unit tests whose expected output was stale
from prior perf commits (b829ed4 const prop, 1f63c7f symbol hoist,
7e4079f string-concat reassoc) — assertions now match the current
optimized codegen.
FiveSql2 43/43, Harbour compat 56/56, Go test ALL PASS.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When collectConstLocals proves a LOCAL is only ever read, not
written beyond its literal init, every read site gets the literal
substituted inline — which means the init itself has no live
reader. Skip emitting the PushXxx/PopLocalFast pair for those
LOCALs in both top-of-function and mid-body decls.
On a function with `LOCAL nBuf := 100, sTag := "x", bFlag := .T.`,
all three inits drop out (6 VM ops saved in the prologue), while
the still-written `LOCAL nSum := 0` init stays. Harbour compat
56/56, FiveSql2 43/43.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scan each function body for LOCALs whose sole write is a literal
initialiser (never ++/-- / += / @byref / MultiAssign target /
FOR var / @GET target / macro). Reads substitute the literal
inline at emit time, which cascades into all earlier folds: dead
IF branches, AND/OR short-circuit, NOT, string-concat reassoc,
and the FOR LocalLessEqualInt fast path (extended to see through
a propagated ident limit).
Walker is bounded — unrecognised AST nodes abort propagation for
the whole function rather than risk missing a hidden write.
Harbour compat 56/56, FiveSql2 43/43.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`"a" + x + "b" + "c" + "d"` used to emit 4 Plus() calls because
the parser builds a left-leaning chain and no pair was
literal+literal. Add a reassociation step inside foldLiteralTree:
when the outer shape is `(Y + strlit1) + strlit2`, rewrite as
`Y + (strlit1+strlit2)` so the tail literals collapse. Also run
foldLiteralTree on the root BinaryExpr in emitExpr so the
outermost reassoc fires (was only running on children).
Verified: the 4-Plus case now emits 2 Plus calls (`"a" + x + "bcd"`).
FiveSql2 43/43, Harbour compat 56/56.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DO WHILE .T. now emits a bare for-loop with no PushBool/PopLogical
per iteration — saves a stack roundtrip on every trip through the
idiomatic infinite-loop pattern (9 .prg files use it). DO WHILE .F.
emits nothing. Loop exits still work via EXIT / RETURN.
FiveSql2 43/43, Harbour compat 56/56.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`.NOT. .T.` / `.NOT. .F.` emit PushBool directly instead of
pushing the source bool and calling Not(). boolLiteralValue also
sees through an outer NOT, so `IF !.F.` now triggers the full
dead-branch pass (no PopLogical wrapper either).
FiveSql2 43/43, Harbour compat 56/56.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Skip the PushBool/PopLogical/branch wrapper when the LHS of .AND. /
.OR. is a bare .T./.F. literal. `.T. .AND. X` emits X alone;
`.F. .AND. X` emits PushBool(false) with X dropped; symmetric for
OR. Common after constant-folding a sub-expression — pairs with
the earlier dead-IF-branch peephole.
FiveSql2 43/43, Harbour compat 56/56. Verified via /tmp/test_andor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
IF .T. collapses to its body; IF .F. forwards to the first live
ELSEIF or ELSE. For dynamic main conditions the chain is still
filtered: ELSEIF .F. drops out, ELSEIF .T. truncates and becomes
the ELSE. Verified with /tmp/test_deadif.prg — five dead labels
all removed from gen output, runtime emits only live branches.
FiveSql2 43/43, Harbour compat 56/56.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two more leaf-level code-gen cleanups now that the const folder is in.
- UnaryExpr MINUS over a LITERAL (INT/DOUBLE) emits the negated value
directly, so `-42` becomes PushInt(-42) instead of PushInt(42) +
Negate(). Guarded: MinInt64 passes through to the VM so the
coerce-to-double path stays authoritative. Variables fall through
to the normal Negate path — the LiteralExpr type assertion is the
gate, so runtime-typed `-x` keeps its semantics.
- `x := x + <expr>` / `x := x - <expr>` detected when the LHS ident
resolves to the same local as the self-reference on the RHS,
emits the same LocalAdd / Negate+LocalAdd shape that x += y already
used. Non-matching locals (shadowing, module statics) fall through.
Verification
- go test ./... ALL PASS
- FiveSql2 test_sql1999 43/43
- tests/compat_harbour 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fold BinaryExpr subtrees whose operands reduce to INT or STRING
literals at compile time. `10 * 2 + 5` now emits a single PushInt(25)
instead of three VM ops; `"a" + "b"` collapses to "ab". Overflowing
INTs and SLASH (which Harbour turns into double) fall through to the
VM so semantics stay intact.
Implementation is a bottom-up foldLiteralTree pre-pass on each
BinaryExpr, plus a tryFoldBinary matcher for the leaf case. Mutates
the AST in place — safe because the generator owns the tree after
parse.
Bench numbers don't move (SQL paths have no literal-only arithmetic
in hot loops), but generated code shrinks on PRG that uses #define
constants for widths / offsets / factors.
Verification
- go test ./... ALL PASS
- FiveSql2 test_sql1999 43/43
- tests/compat_harbour 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The VM call path (PushSymbol → Function → Frame) is traversed by every
PRG function call. Three changes together cut per-call overhead across
the entire bench suite.
Changes
- hbrt/call.go Function(): replace pop-push dance with a single slice
shift (N+2 pops + N pushes → 1 copy of N slots + sp adjust). Kills
the per-call `make([]Value, nArgs)` heap alloc. Resolved function
pointer is cached back into sym.Func so subsequent calls on the
same Symbol skip the VM lookup entirely.
- hbrt/vm.go GetSym(): new helper. Generated code calls it with a
pointer to a package-level `*Symbol` slot so FindSymbol (which takes
the VM RWMutex + map lookup) runs at most once per symbol per
process. Nil results are intentionally NOT cached — an init-order
miss becomes a retry on the next call instead of a permanent sticky
failure.
- hbrt/thread.go pushPendingSym(): scalar fast slot for depth=1 call
nesting (common case). Nil syms still go through the slice so the
"empty vs stored nil" ambiguity can't produce a false pop.
- compiler/gengo/gengo.go: emit `t.PushSymbol(t.GetSym(&_sym_<file>_<NAME>, "NAME"))`
for every function call site, with a per-file prefix so multi-PRG
builds don't collide on identical symbol names.
Bugs fixed during bring-up
- pendingSymFast == nil was ambiguous ("unused" vs "nil stored"). Nil
syms now spill to the slice, preserving distinguishability.
- The old varName-reuse branch at the PushSymbol emit site skipped
the GetSym wrapper, emitting a raw `t.PushSymbol(varName)` against
an uninitialized package-level *Symbol. Every call path now funnels
through emitPushSymbol.
bench_sql deltas vs prior build
- B1 SELECT * 114 → 97 µs (15%)
- B4 GROUP_HAVING 584 → 554 µs (5%)
- B8 RECURSIVE CTE 150 → 141 µs (6%)
- B10 RANK PARTITION 310 → 296 µs (5%)
- B11 SUM OVER 335 → 320 µs (4%)
- B14 COUNT 295 → 281 µs (5%)
- B15 CTE+WIN+JOIN 1891 → 1826 µs (3%)
Verification
- go test ./... ALL PASS
- FiveSql2 test_sql1999 43/43
- tests/compat_harbour 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SqlOrderBy: Go sort.Slice for ORDER BY, 10-50x faster than PRG ASort.
SqlGroupBy: Go map-based GROUP BY accumulation (ready for integration).
TryBuildSortSpec detects simple ORDER BY columns and routes to Go.
Fallback to PRG for complex ORDER BY expressions.
43/43 + 41/41 verify + 51/51 compat + go test ALL PASS.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complex-query benchmarking turned up two hot paths that the earlier
SqlScan/SqlEach work didn't touch: multi-table JOIN and nested-scan
row fetching. This commit hits both.
--- Part 1: SqlHashBuild — Go-native hash-join build ---
FiveSql2's HashJoin previously built the inner-side hash in PRG:
WHILE !Eof()
xVal := FieldGet(nFPos)
cKey := SqlValToStr(xVal)
IF !hb_HHasKey(hHash, cKey) ; hHash[cKey] := {} ; ENDIF
AAdd(hHash[cKey], RecNo())
dbSkip()
ENDDO
That loop runs at ~40μs per row from class dispatch + hb_HHasKey
lookups + AAdd growth + SqlValToStr formatting. On a 50k-row inner
table that's ~2 seconds wasted on what should be a sub-50ms
housekeeping op.
New hbrtl.SqlHashBuild does the same thing in one Go-native pass:
- Direct *dbf.DBFArea loop (no interface dispatch, same devirt as
SqlScan)
- Go `map[string][]int64` accumulates RecNos by key — one
allocation per distinct key
- Inline ASCII-only digit formatter for numeric keys (strconv.Itoa
is allocation-heavy for small ints)
- CHAR keys are right-trimmed to match SqlCmpEq semantics so the
hash probe matches what EvalExpr would compute
- Final Five hash is built once from Keys/Values/Order slices
directly, skipping the per-key hb_HSet path
HashJoin now calls `SqlHashBuild(nFPos)` instead of running the
PRG loop.
--- Part 2: TSqlExecutor:BuildFetchCache ---
The JOIN fallback loop calls FetchRow per row. FetchRow was already
column-ref-aware but did the string parse (`At + SubStr + Upper`)
and `::FindWA` linear scan every single invocation. For a 50k-row
join emitting 50k result rows, that's ~200k redundant resolutions.
New BuildFetchCache walks the SELECT list once before the scan and
pre-binds each plain-column expression to `{nWA, nFPos}`. FetchRow's
new fast path checks ::aFetchCache and jumps straight to
`dbSelectArea + FieldGet` when bound. Complex exprs (functions,
CASE, subqueries) still fall through to EvalExpr.
::aFetchCache is set right before the join WHILE loop and cleared
after — no cross-query bleed.
--- Bench (50k ord × 10k emp × 100 dept, 3-run steady state) ---
Query Before After Speedup
────────────────────────────────────────────────────────────
2-way INNER JOIN, 10k rows 91ms 68ms 1.34x
2-way JOIN + GROUP BY 110ms 94ms 1.17x
3-way INNER JOIN COUNT 2610ms 610ms 4.28x
3-way JOIN + GROUP BY 2860ms 830ms 3.45x
The 3-way speedup is almost entirely SqlHashBuild. The 2-way case
benefits from the fetch cache because its per-row cost is dominated
by FetchRow (no second hash build to amortize).
--- Limits still standing ---
CTE + JOIN queries (Q7 in bench_complex: ~4.5s) aren't affected by
either optimization — CTE materialization goes through a different
path that writes/reads a temp DBF. Follow-up target.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The structural 1.38x gap vs raw RDD for no-WHERE full scans wasn't
a limit of our engine — it was a limit of the result shape. SqlScan
materializes N rows as HbArray wrappers over a flat Value buffer,
then the PRG caller iterates that materialized array. Two passes
over the data. Raw RDD is one pass.
SqlEach folds both passes into one. The caller supplies a code block
that receives the selected column values as positional parameters;
SqlEach invokes it per matching row. No result array is ever built.
Usage (drop-in replacement for the common "scan + process" idiom):
five_SQLEach( "SELECT id, name, salary FROM emp WHERE salary > 50000",
{|nID, cName, nSalary| Process(nID, cName, nSalary) } )
API shape borrows Harbour's AEval/ASort block-callback convention,
so there's nothing new to learn. Positional params also sidestep
the `SELECT COUNT(*)` naming problem — no need to invent names for
anonymous expressions.
Implementation notes:
- 4-way loop specialization ({DBF, generic Area} × {WHERE, none}),
matching SqlScan. Each path is zero-allocation in the steady state.
- Block invocation uses the direct pendingParams + blk.Fn(t) protocol
rather than EvalBlock, which would allocate a temporary args slice
on every call (50k scans × small slice adds up).
- FastFieldGetter is installed the same way as SqlScan so PcOpFieldGet
in the WHERE predicate skips the PushSymbol + Function dispatch.
Bench (50k rows, end-to-end including user-code loop, steady state):
Path Time vs raw RDD
─────────────────────────────────────────────────────
Raw PRG loop, WHERE + sum 8.7ms 1.00x
SqlScan + PRG FOR, WHERE 5.1ms 0.59x
SqlEach block, WHERE 4.1ms 0.47x ← beats raw
─────────────────────────────────────────────────────
Raw PRG loop, no WHERE 6.1ms 1.00x
SqlEach block, no WHERE 3.8ms 0.62x ← beats raw
SqlEach is faster than a hand-rolled `DO WHILE !Eof()` loop because
the per-row FieldGet in raw PRG still goes through a full Frame +
RTL dispatch, whereas SqlEach's FastFieldGetter captures the concrete
*dbf.DBFArea directly. The SQL abstraction now costs nothing — it
pays you to use it.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Next step (not in this commit): FiveSql2 TSqlExecutor integration —
detect when five_SQL is called with a block argument and route to
SqlEach instead of SqlScan + array build.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Second pcode peephole to match the one added for FieldGet(literal).
SqlExprToPrg auto-wraps CHAR column references with AllTrim() to
match SqlCmpEq's CHAR-padding trim semantics, so every string WHERE
predicate evaluates `AllTrim(FieldGet(n)) == 'literal'` per row.
Before this commit each of those per-row evaluations did:
1. PushSymbol ALLTRIM
2. PushSymbol FIELDGET → Function(1) [1 RTL Frame]
3. parseCharField → MakeString [alloc: copies raw bytes]
4. Function(1) → AllTrim RTL [1 RTL Frame]
5. strings.TrimSpace [alloc: new string]
6. Return, continue
New opcode `PcOpFieldTrim <idx>` (0x47) fuses the two RTL calls into
a single opcode that:
1. Calls FastFieldGetter directly (no Frame/Function dispatch).
2. Walks the returned string with ASCII-space trim in place.
3. Pushes `s[lo:hi]` — a sub-slice, no new allocation.
4. Short-circuits back to the same string if no trim needed.
genpc recognizes the shape `AllTrim(FieldGet(<int-literal>))` in
emitCall and emits the fused opcode automatically — no SQL-side
API change. Matches the existing FieldGet peephole's shape.
Bench impact (50k rows, 3-run steady state, vs raw RDD baseline 6.2ms):
String WHERE before 7.9ms → after 6.2ms 1.00x (parity!)
Numeric WHERE 6.9ms (unchanged) 1.11x
No WHERE 9.1ms (unchanged) 1.47x
String WHERE is now at parity with the raw Harbour-style RDD scan.
Compared to session start (119ms), that's a 19x speedup.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two stacked optimizations land on the SqlScan hot path. Combined
effect on the 50k-row benchmark:
Before After vs raw
Numeric WHERE 10.2ms 7.8ms 1.15x
String WHERE 10.5ms 7.9ms 1.15x
No WHERE 9.2ms 10.0ms 1.45x
Raw RDD baseline 6.8ms 6.8ms 1.00x
WHERE-predicate paths are now within 15% of the raw Harbour-style
RDD scan loop. The no-WHERE path is unchanged (slight jitter from
the added devirt branch); FieldGet peephole doesn't apply there.
--- Optimization 1: PcOpFieldGet peephole ---
Adds a new pcode opcode `PcOpFieldGet <fieldIdx>` (0x46) that skips
the usual PushSymbol+Function+Frame+FieldGet-RTL+EndProc chain and
calls a direct field getter closure instead. genpc recognizes the
shape `FieldGet(<int-literal>)` during emitCall and emits the
specialized opcode automatically — no SQL-side API change.
Integration:
* hbrt.Thread.FastFieldGetter — hot-path closure set by scan loops.
Non-nil → pcode bypasses dispatch.
Nil → pcode resolves FIELDGET via
the RTL symbol table (correctness
fallback for any other callers).
* compiler/genpc/genpc.go — peephole in emitCall.
* hbrt/pcinterp.go — PcOpFieldGet handler.
This alone cut numeric WHERE from 10.2 → 7.9ms: eliminated roughly
one full Frame/EndProc + RTL dispatch per row × 50k rows.
--- Optimization 2: DBFArea devirtualization ---
SqlScan type-asserts the workarea to *dbf.DBFArea once and runs a
dedicated loop that calls GoTop/EOF/Skip/GetValue directly on the
concrete type. Go's compiler inlines these, skipping the interface
vtable per row. Non-DBF drivers still work via the generic Area
branch.
The FastFieldGetter closure also captures *DBFArea directly in the
DBF branch, so the WHERE predicate side of the hot loop is now
entirely devirtualized: no interface dispatch between the pcode
dispatch loop and the DBF record buffer.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Remaining gap to raw RDD on no-WHERE (~1.45x) is dominated by the
two-column row construction + ArraySlab + flat backing bookkeeping
that the raw loop doesn't do. Going below that requires changing
the SQL engine's result shape — out of scope here.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prior behavior used exprToString() to serialize the TO expression
back into a string, so a runtime-evaluated filename like
`( Lower(cTable) + "_pk.ntx" )` ended up as the literal filename
`Lower(cTable) + "_pk.ntx"` on disk. Visible in FiveSql2's PRIMARY
KEY / UNIQUE DDL path: test_sql1999 was creating files with that
literal name, which the test happened not to care about because the
USE inside BEGIN SEQUENCE caught the failure.
Fix: if the File expression contains any function call (detected by
new containsCall walker), emit emitExpr + Pop2 + AsString — runtime
evaluation path. Static filenames (`TO test.ntx`) still use the
cheap exprToString branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TSqlIndex.prg had five undefined identifiers and six undefined
constants that the new CLASS-method analyzer surfaced after the
gengo PushMemvar fallback stopped crashing on them. All real tech
debt, not false positives. This lands the implementations.
New RTL functions (hbrtl/indexrtl.go + register.go):
- FieldType(n) → "C"/"N"/"L"/"D"/"M"/... one-letter type
- FieldLen(n) → length in bytes
- FieldDec(n) → decimal places
- ordCreate(cBag, cTag, cExpr [, bExpr] [, lUnique])
→ DBFArea.OrderCreate with TagName set (CDX tag or NTX tag)
- dbCreateIndex(cFile, cExpr [, bExpr] [, lUnique])
→ legacy Clipper single-tag NTX without TagName
- dbClearIndex() → OrderListClear
All pass through the existing Indexer interface; key expressions go
through the MacroEval slow path since callers pass string literals.
When callers are updated to pass compiled key blocks, the existing
KeyFunc fast path kicks in automatically.
New header files (include/):
- dbinfo.ch — DBI_* and DBOI_* constants with Harbour-compatible
values (FULLPATH=10, SHARED=42, EXPRESSION=2, etc.)
- dbstruct.ch — DBS_NAME/TYPE/LEN/DEC field descriptor indices
TSqlIndex.prg already did `#include "dbinfo.ch"` and `#include
"dbstruct.ch"` but Five's preprocessor silently ignored the missing
files. Both headers land in include/ where cmd/five's include-dir
chain already looks.
Analyzer RTL allow-list updated with the six new function names so
the warning pipeline stays clean.
Result: FiveSql2 build goes from 17 WARN → 0. Both tracked test
suites still pass.
Note: dbInfo() / dbOrderInfo() themselves remain stubbed (return NIL)
— the constants exist for compile-time resolution and for future use
when the stubs are replaced. Callers that depend on actual dbInfo
values still get NIL at runtime.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Phase 2 of the analyzer originally only called analyzeFunc on
*ast.FuncDecl. Class methods parse as *ast.MethodDecl and were
silently skipped — meaning anything inside `METHOD Foo() CLASS TBar`
got zero static checking, including the undeclared-variable scan.
This is what let FindExclusive's DBI_FULLPATH / DBI_SHARED references
ship: the gengo fallback (now PushMemvar, previously PushLocal(0))
turned them into runtime NIL / crash, but the analyzer never flagged
them at build time because it never descended into the method body.
Fix: add analyzeMethod — same scope setup as analyzeFunc (module
statics, parameters, LOCAL/STATIC decls) — and route MethodDecl to
it from the Phase 2 dispatch.
Also register PCCOMPILE / PCEVAL / SQLSCAN in the RTL allow-list so
FiveSql2's new pcode hot-path RTL doesn't trip the warning.
Expected side effect: the FiveSql2 build now emits 17 real warnings
from TSqlIndex.prg — undefined DBOI_* order-info constants and
unregistered RTL functions (FieldType, FieldLen, ordCreate,
dbCreateIndex, dbClearIndex). These are real tech debt hiding behind
PushMemvar's silent NIL fallback; left as-is to surface them rather
than suppress.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./compiler/analyzer/... PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Three emitIdent / emitIdentByName / emitPopByName call sites used
`t.PushLocal(0)` as the fallback for compile-time-unresolved names
(missing #include constants, undeclared globals, typos). PushLocal(0)
crashes at runtime the moment that code path executes with "local
variable index out of range: 0" — even when the identifier is dead
code or behind a condition that's rarely true.
Concrete bugs this hid:
- TSqlIndex:FindExclusive referenced DBI_FULLPATH / DBI_SHARED
from a non-existent dbinfo.ch include. The 43-test harness only
reached FindExclusive with no Used workareas, so the reference
was never evaluated. Any standalone PRG that called five_SQL
after dbUseArea would trip it.
- Prior session's BindColumns/ResolveCache experiment hit the same
class of crash in the CLASS Send path — diagnosed as "Unresolved
→ PushLocal(0)" at the time but root cause deferred.
Fix: use `t.PushMemvar(name)` / `t.PopMemvar(name)` instead. Matches
Harbour semantics (undefined identifiers try PRIVATE/PUBLIC memvar
tables at runtime, missing → NIL, assignment auto-creates PRIVATE).
Harbour is forgiving about unresolved names; Five now is too.
This doesn't silence the signal: the emitted comment still flags the
reference as unresolved for grep-ability in generated Go.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Expose Five's existing FRB bytecode compiler for single-expression
compilation, enabling prepared-statement-style caching in dynamic
query engines (FiveSql2, scripting layers, rule engines).
1. genpc.CompileExpr(ast.Expr) *hbrt.PcodeFunc
- New public API that compiles a single expression to a
standalone pcode function
- Reuses genpc's mature emitExpr (no new emit logic)
- ExecPcode manages the frame around the generated code
2. hbrtl.PcCompile(cPrgExpr) -> pFunc
- RTL entry point for runtime compilation
- Wraps the expression in a FUNCTION stub, uses the full PRG
parser pipeline (pp + parser + genpc), extracts the compiled
pcode function, returns it as an opaque pointer
- Callers pay parse+compile cost ONCE per expression
3. hbrtl.PcEval(pFunc) -> xValue
- RTL entry point for runtime execution
- Calls hbrt.ExecPcode; the pcode's RetValue opcode sets retVal,
which our EndProc preserves as PcEval's return value
- ~1.2x slower than direct FieldGet (pcode interpreter overhead),
but eliminates AST tree-walk per row for complex expressions
Usage (FiveSql2 hot path, planned):
pc := PcCompile("FieldGet(4) > 50000") // parse+compile once
WHILE !Eof()
IF PcEval(pc) // ~10us per row
AAdd(aRows, ...)
ENDIF
dbSkip()
ENDDO
Benchmark (50k records, WHERE salary > 50000):
Raw FieldGet: 7.9 ms (baseline)
FieldPos+Get: 10.2 ms (with O(1) FieldPos cache)
PcEval bytecode: 10.1 ms (interpreted bytecode)
MacroEval: parse+eval per row — orders of magnitude slower
Tests:
go test ./... ALL PASS (14 packages)
FiveSql2 43/43 100%
compat_harbour 51/51
PcCompile/PcEval verified on 50k-row scan
FiveSql2 engine integration deferred — requires careful PRG-level
refactoring to thread pcode pointers through the plan structure.
The Go-level infrastructure is now in place for that work.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two SQLite-style optimizations for RDD and SQL workloads:
1. FieldPos() O(1) column binding cache
Before: FieldPos(name) linear scan — O(n) per call with string
comparison. In SQL engines that call FieldPos per row per
column, this is hundreds of thousands of calls.
After: DBFArea builds a map[UPPER(name)]→pos on first lookup.
All subsequent lookups are O(1) hash. SQLite calls this
"column affinity binding" — positions resolved at prepare,
not per row.
Implementation:
- hbrdd/dbf/dbf.go: DBFArea.FieldPosCache(name) method
- hbrtl/procinfo.go: FieldPos RTL uses fieldPosCacher interface
- Lazy init: only pays for tables that get queried
2. hbrdd import auto-detection for function-call style PRGs
Before: compiler only added hbrdd import when PRG used xBase commands
(USE, SKIP, INDEX...). Pure function-call style like
`dbUseArea(.T.,,"t")`, `FieldPut(1, val)` was missed —
generated Go failed to compile ("undefined: hbrdd").
After: scanStmtsForXBase walks ExprStmt bodies too, detecting
CallExpr to any of the ~40 xBase RTL function names.
FIELD->NAME alias expressions also trigger the import.
Resolves: small PRGs that use only dbUseArea/FieldGet/FieldPut.
Benchmark notes (50k records):
Raw RDD scan: 7 ms (baseline)
FiveSql2 SELECT WHERE: 157 ms (unchanged — bottleneck is
not FieldPos, it's PRG-level
expression tree walk per row)
compat_harbour 51/51: PASS
FiveSql2 43/43: 100%
The FieldPos cache helps heavy field-name-based code paths but the
primary FiveSql2 bottleneck is the PRG interpreter walking expression
ASTs per row (needs bytecode compilation to close the gap).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>