B4 GROUP+HAVING profile showed SqlIsAggName at ~9% of CPU —
SqlEvalFunc checks it for every function in every row, and the
PRG body was two string allocations + a substring scan:
RETURN ("," + c + ",") $ ("," + AGG_FUNCTIONS + ",")
Replace with a hash lookup against the existing aggFuncSet map
in hbrtl/sqlexpr.go (already populated for SqlExprHasAgg, same
AGG_FUNCTIONS list). Upper-casing skips the allocation when the
input is already upper, which it almost always is in practice.
Bench deltas (median of 3 steady runs, 1000 iters):
B4_GROUP_HAVING 447 → 418 us -6.5%
B14_COUNT 252 → 235 us -7%
B15_CTE_WIN_JOIN 1595 → 1577 us -1%
Other benches unchanged (no aggregate calls per row).
FiveSql2 43/43, Harbour compat 56/56.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
AcquireTemp now returns the purpose string (upper-cased table name)
as the alias when available, and falls back to FA_#### only when the
same purpose is already in-flight this query — i.e., self-joins.
Previously every call returned a fresh FA_####, so the WA cache
(keyed by alias) could never hit on JOIN queries and the file got
reopened every iteration.
Bench deltas vs prior HEAD:
B6_INNER_JOIN 217 → 130 us -40% (1.67x)
B15_CTE_WIN_JOIN 1678 → 1595 us -5%
Single-table benches unchanged — they were already hitting the
cache via the table-name alias path.
B8 recursive CTE stays flat: its sub-executors at nDepth>1 still
cycle through fresh purposes that don't stabilise across queries.
FiveSql2 43/43, Harbour compat 56/56.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
After the SELECT WA cache landed, pprof showed HbFileExists → os.Stat
at 28% of remaining CPU — the RunSelect cleanup loop was stat-ing
__view_<table>.dbf for every table in every query, even on the
common view-free path.
Track view materialisation with a TSqlIndex.lViewUsed flag set in
OpenTable when CheckView produces a temp. The cleanup loop now
runs only when the flag is set, then resets it. View-using queries
are unaffected.
pprof delta:
rawsyscalln: 2.14s → 1.41s (48% → 32% of total CPU)
os.Stat: 1.24s → 0.49s (28% → 11%)
Wall-clock bench numbers stayed within plus-or-minus 3% noise (stats
are cheap when the target file does not exist, so CPU savings do not
translate directly to end-to-end time) but this removes the next
biggest syscall waste visible in the profile.
FiveSql2 43/43, Harbour compat 56/56.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
TSqlExecutor:OpenTable now hands lifetime to the WA cache for stable
aliases (user-supplied or table-named). CloseOpened skips those
entries, so the DBF mmap stays alive across queries instead of being
unmapped + re-opened 1000 times a bench. Previously the WA cache only
covered DML (INSERT/UPDATE/DELETE) — SELECT was still paying the full
dbUseArea/dbCloseArea syscall bill every query (profile showed
rtlDbCloseArea + munmap at ~30% of total CPU).
AcquireTemp-generated aliases (FA_####) are excluded — they change
every query (self-joins, nested depth), so caching them would just
leak entries for no reuse. JOIN / recursive CTE regressions from an
earlier unrestricted version are gone.
Bench deltas vs prior HEAD (median of 3 steady runs, 1000 iters):
B1_SELECT_STAR 82 → 41 us -50% (2.0x)
B2_WHERE_FILTER 78 → 35 us -55% (2.2x)
B3_ORDER_BY 90 → 48 us -47% (1.88x)
B5_DISTINCT 75 → 32 us -57% (2.34x)
B7_CTE_SIMPLE 120 → 77 us -36% (1.56x)
B9_ROW_NUMBER 239 → 194 us -19%
B10_RANK_PART 276 → 233 us -16%
B11_SUM_OVER 296 → 252 us -15%
B4_GROUP_HAVING 498 → 450 us -10%
Others flat (JOIN / recursive CTE / DML already covered).
FiveSql2 43/43, Harbour compat 56/56.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
When collectConstLocals proves a LOCAL is only ever read, not
written beyond its literal init, every read site gets the literal
substituted inline — which means the init itself has no live
reader. Skip emitting the PushXxx/PopLocalFast pair for those
LOCALs in both top-of-function and mid-body decls.
On a function with `LOCAL nBuf := 100, sTag := "x", bFlag := .T.`,
all three inits drop out (6 VM ops saved in the prologue), while
the still-written `LOCAL nSum := 0` init stays. Harbour compat
56/56, FiveSql2 43/43.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Scan each function body for LOCALs whose sole write is a literal
initialiser (never ++/-- / += / @byref / MultiAssign target /
FOR var / @GET target / macro). Reads substitute the literal
inline at emit time, which cascades into all earlier folds: dead
IF branches, AND/OR short-circuit, NOT, string-concat reassoc,
and the FOR LocalLessEqualInt fast path (extended to see through
a propagated ident limit).
Walker is bounded — unrecognised AST nodes abort propagation for
the whole function rather than risk missing a hidden write.
Harbour compat 56/56, FiveSql2 43/43.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`"a" + x + "b" + "c" + "d"` used to emit 4 Plus() calls because
the parser builds a left-leaning chain and no pair was
literal+literal. Add a reassociation step inside foldLiteralTree:
when the outer shape is `(Y + strlit1) + strlit2`, rewrite as
`Y + (strlit1+strlit2)` so the tail literals collapse. Also run
foldLiteralTree on the root BinaryExpr in emitExpr so the
outermost reassoc fires (was only running on children).
Verified: the 4-Plus case now emits 2 Plus calls (`"a" + x + "bcd"`).
FiveSql2 43/43, Harbour compat 56/56.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
DO WHILE .T. now emits a bare for-loop with no PushBool/PopLogical
per iteration — saves a stack roundtrip on every trip through the
idiomatic infinite-loop pattern (9 .prg files use it). DO WHILE .F.
emits nothing. Loop exits still work via EXIT / RETURN.
FiveSql2 43/43, Harbour compat 56/56.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
`.NOT. .T.` / `.NOT. .F.` emit PushBool directly instead of
pushing the source bool and calling Not(). boolLiteralValue also
sees through an outer NOT, so `IF !.F.` now triggers the full
dead-branch pass (no PopLogical wrapper either).
FiveSql2 43/43, Harbour compat 56/56.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Skip the PushBool/PopLogical/branch wrapper when the LHS of .AND. /
.OR. is a bare .T./.F. literal. `.T. .AND. X` emits X alone;
`.F. .AND. X` emits PushBool(false) with X dropped; symmetric for
OR. Common after constant-folding a sub-expression — pairs with
the earlier dead-IF-branch peephole.
FiveSql2 43/43, Harbour compat 56/56. Verified via /tmp/test_andor.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
IF .T. collapses to its body; IF .F. forwards to the first live
ELSEIF or ELSE. For dynamic main conditions the chain is still
filtered: ELSEIF .F. drops out, ELSEIF .T. truncates and becomes
the ELSE. Verified with /tmp/test_deadif.prg — five dead labels
all removed from gen output, runtime emits only live branches.
FiveSql2 43/43, Harbour compat 56/56.
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Two more leaf-level code-gen cleanups now that the const folder is in.
- UnaryExpr MINUS over a LITERAL (INT/DOUBLE) emits the negated value
directly, so `-42` becomes PushInt(-42) instead of PushInt(42) +
Negate(). Guarded: MinInt64 passes through to the VM so the
coerce-to-double path stays authoritative. Variables fall through
to the normal Negate path — the LiteralExpr type assertion is the
gate, so runtime-typed `-x` keeps its semantics.
- `x := x + <expr>` / `x := x - <expr>` detected when the LHS ident
resolves to the same local as the self-reference on the RHS,
emits the same LocalAdd / Negate+LocalAdd shape that x += y already
used. Non-matching locals (shadowing, module statics) fall through.
Verification
- go test ./... ALL PASS
- FiveSql2 test_sql1999 43/43
- tests/compat_harbour 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fold BinaryExpr subtrees whose operands reduce to INT or STRING
literals at compile time. `10 * 2 + 5` now emits a single PushInt(25)
instead of three VM ops; `"a" + "b"` collapses to "ab". Overflowing
INTs and SLASH (which Harbour turns into double) fall through to the
VM so semantics stay intact.
Implementation is a bottom-up foldLiteralTree pre-pass on each
BinaryExpr, plus a tryFoldBinary matcher for the leaf case. Mutates
the AST in place — safe because the generator owns the tree after
parse.
Bench numbers don't move (SQL paths have no literal-only arithmetic
in hot loops), but generated code shrinks on PRG that uses #define
constants for widths / offsets / factors.
Verification
- go test ./... ALL PASS
- FiveSql2 test_sql1999 43/43
- tests/compat_harbour 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Apply the sp-rewrite shape to the three binary arithmetic ops. The
tInt==tInt fast branch reads scalar directly (skips the AsNumInt
method) so the hot path is int64 ops + an overflow check; mixed-type
branches keep AsNumDouble unchanged.
PRG tight loops (FOR counter, SUM accumulators outside SQL aggregate
path) skip one cachedNil store and two bounds-check sequences per op.
Verification
- go test ./... ALL PASS
- FiveSql2 test_sql1999 43/43
- tests/compat_harbour 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Fold And/Or into the same in-place sp-rewrite shape as Not/LessEqual.
Both args must be tLogical — short-circuit on the raw scalar field so
the hot path is pure integer arithmetic + two cached bool Values.
Verification
- go test ./... ALL PASS
- FiveSql2 test_sql1999 43/43
- tests/compat_harbour 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Mirror the treatment LessEqual already got onto Equal/NotEqual/Less/
Greater/GreaterEqual — rewrite sp directly, check the type tag for
Int==Int in the hot branch, short-circuit to cachedTrue/cachedFalse
without a second method call. Keeps the slow fallback for mixed /
string / date types.
Bench movement is minor on SQL paths (WHERE is already pcode and
skips these ops); the win is on PRG comparisons that cache-miss out
of the pcode path — FOR-condition short forms, IF chains, etc.
Verification
- go test ./... ALL PASS
- FiveSql2 test_sql1999 43/43
- tests/compat_harbour 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
Three small tweaks to the pop-push hotspots in the VM.
- ops_arith.go Inc/Dec/AddInt: unary ops mutate the top stack slot in
place via peekPtr() instead of pop-compute-push. Drops the bounds
check + cachedNil clear + push bounds check per call. Biggest
beneficiary: FOR loop counters (implicit Inc) — every iteration of
every PRG loop pays these ops once.
- ops_collection.go ArrayGen: consume N slots via a single `copy`
into the freshly-allocated result slice, then rewind sp and clear
the intermediate slots for GC (the first slot is overwritten by
the array push). Skips the N-deep pop loop.
- ops_collection.go EvalBlock: read block value before shift, collapse
args down one slot to overwrite the block position, then let the
block run against the same in-place layout. Matches the
Function()/PushSymbol round-trip removal from the prior commit.
bench_sql deltas
- B2 WHERE 83 → 78 µs (6%)
- B3 ORDER BY 96 → 90 µs (6%)
- B4 GROUP_HAVING 554 → 528 µs (5%)
- B9 ROW_NUMBER 255 → 241 µs (5%)
- B10 RANK PART 296 → 278 µs (6%)
- B11 SUM OVER 320 → 300 µs (6%)
- B15 CTE+WIN+JOIN 1826 →1743 µs (5%)
Verification
- go test ./... ALL PASS
- FiveSql2 test_sql1999 43/43
- tests/compat_harbour 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
The VM call path (PushSymbol → Function → Frame) is traversed by every
PRG function call. Three changes together cut per-call overhead across
the entire bench suite.
Changes
- hbrt/call.go Function(): replace pop-push dance with a single slice
shift (N+2 pops + N pushes → 1 copy of N slots + sp adjust). Kills
the per-call `make([]Value, nArgs)` heap alloc. Resolved function
pointer is cached back into sym.Func so subsequent calls on the
same Symbol skip the VM lookup entirely.
- hbrt/vm.go GetSym(): new helper. Generated code calls it with a
pointer to a package-level `*Symbol` slot so FindSymbol (which takes
the VM RWMutex + map lookup) runs at most once per symbol per
process. Nil results are intentionally NOT cached — an init-order
miss becomes a retry on the next call instead of a permanent sticky
failure.
- hbrt/thread.go pushPendingSym(): scalar fast slot for depth=1 call
nesting (common case). Nil syms still go through the slice so the
"empty vs stored nil" ambiguity can't produce a false pop.
- compiler/gengo/gengo.go: emit `t.PushSymbol(t.GetSym(&_sym_<file>_<NAME>, "NAME"))`
for every function call site, with a per-file prefix so multi-PRG
builds don't collide on identical symbol names.
Bugs fixed during bring-up
- pendingSymFast == nil was ambiguous ("unused" vs "nil stored"). Nil
syms now spill to the slice, preserving distinguishability.
- The old varName-reuse branch at the PushSymbol emit site skipped
the GetSym wrapper, emitting a raw `t.PushSymbol(varName)` against
an uninitialized package-level *Symbol. Every call path now funnels
through emitPushSymbol.
bench_sql deltas vs prior build
- B1 SELECT * 114 → 97 µs (15%)
- B4 GROUP_HAVING 584 → 554 µs (5%)
- B8 RECURSIVE CTE 150 → 141 µs (6%)
- B10 RANK PARTITION 310 → 296 µs (5%)
- B11 SUM OVER 335 → 320 µs (4%)
- B14 COUNT 295 → 281 µs (5%)
- B15 CTE+WIN+JOIN 1891 → 1826 µs (3%)
Verification
- go test ./... ALL PASS
- FiveSql2 test_sql1999 43/43
- tests/compat_harbour 56/56
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
SqlOrderBy: Go sort.Slice for ORDER BY, 10-50x faster than PRG ASort.
SqlGroupBy: Go map-based GROUP BY accumulation (ready for integration).
TryBuildSortSpec detects simple ORDER BY columns and routes to Go.
Fallback to PRG for complex ORDER BY expressions.
43/43 + 41/41 verify + 51/51 compat + go test ALL PASS.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FindColIdx2 searched for bare column name (e.g. 'AMOUNT') but
aFieldNames now contains qualified names ('o.amount') from the
Go join fast path. Added fallback: try xArg[2] (the full AST name)
when the bare name misses. Fixes SUM/AVG/MIN/MAX aggregation after
Go-native hash join.
Verified: 41/41 correctness tests pass (verify_correctness.prg).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Q2 Running total regressed 100ms→6.7s from the frame-aware rewrite.
Default frame (UNBOUNDED PRECEDING to CURRENT ROW) now uses O(N)
incremental path; general per-row-frame loop only for custom frames.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
--- #15 RIGHT JOIN O(N*M) → O(N+M) via matched RecNo set ---
--- #19 s_nRCJSeq modular counter (% 100000) ---
--- #20 Implicit column alias without AS keyword ---
Validation: 43/43 + 51/51 + go test ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
--- #12 Window frame spec now honoured ---
Parser parsed ROWS BETWEEN ... AND ... but discarded the result.
Now stores hFrame in a 6th slot on ND_WINDOW nodes via AAdd.
ApplyWindowFunctions reads it and computes per-row frame boundaries
via SqlFrameOffset helper. Unified SUM/AVG/COUNT/MIN/MAX into one
frame-aware CASE branch.
--- #6 EXISTS LIMIT mutation removed ---
Removed direct parse-tree mutation (hQuery["limit"] := 1) that
would corrupt reuse. Semi-join lift handles the fast case.
Validation: 43/43 + 51/51 + go test ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Continues the static-analysis sweep from 7babfb7.
--- #3 Resolve NIL ambiguity (HIGH) ---
ResolveFromOuter returned NIL for both "column not found" and
"column value is NULL". Callers tested `xVal != NIL` to decide
success, which silently dropped legitimate NULL outer-row values
in correlated subqueries. Added a by-reference lFound flag so
callers distinguish the two cases.
--- #14 Multi-level LEFT JOIN null-fill (MEDIUM) ---
LEFT JOIN null-fill only fired at the last join level
(`nIdx >= Len(aJoins)`). For `a LEFT JOIN b ON ... JOIN c ON ...`
where b had no match, the null-fill for b was skipped and the
outer row was dropped entirely. Now recurses into subsequent joins
when the match fails, so the base case can still emit a row with
NULLs for b's columns.
--- #18 UNION/INTERSECT/EXCEPT applied after LIMIT (MEDIUM) ---
SQL standard requires set operations before ORDER BY / DISTINCT /
OFFSET / LIMIT. Reordered to:
RIGHT JOIN pass → UNION/INTERSECT/EXCEPT → DISTINCT → ORDER BY
→ OFFSET → LIMIT.
Previously LIMIT clipped the first SELECT before UNION merged the
second's rows, producing more rows than intended.
--- #22 DATEADD month overflow (LOW) ---
`DATEADD('MONTH', 1, '2024-01-31')` produced `SToD("20240231")`
(Feb 31) → empty date. Now normalizes month overflow/underflow
into year rollover and clamps the day to the target month's last
day. Year addition also handles Feb 29 → Feb 28 on non-leap years.
--- #23 VIEW temp file leak (LOW) ---
TSqlIndex:CheckView creates `__view_<table>.dbf` temp files that
were never cleaned up. Added post-scan cleanup in RunSelect's
close section (after CTE cleanup) that erases matching temp files.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Systematic bug-hunt driven by an automated analysis of all FiveSql2
source files. Each fix is targeted — no speculative refactoring.
--- #1 CLASSDATA hSubCache leaked across queries (CRITICAL) ---
CLASSDATA hSubCache INIT { => } SHARED
shared one hash across ALL TSqlExecutor instances. A non-correlated
subquery cached in query A was silently returned for an unrelated
query B if the subquery text happened to produce the same cache key.
Converted to instance DATA initialized in New().
--- #5+#21 IS NULL / COALESCE treated empty string as NULL (HIGH) ---
RETURN xL == NIL .OR. ( ValType(xL) == "C" .AND. Empty(AllTrim(xL)) )
SQL standard: '' is a valid non-NULL value. Removed the empty-string
check from both IS NULL evaluation and COALESCE skip logic.
--- #4 Multiple ? parameters all returned first value (HIGH) ---
ND_PAR nodes had no index — EvalExpr always returned ::aParams[1].
Parser now stamps each ? with a sequential 1-based index in xNode[2].
EvalExpr uses it to return the correct ::aParams[n].
--- #10+#11 SqlEvalRowExpr missing / and || operators, single-arg
function eval (MEDIUM) ---
Division and string concatenation fell through to RETURN NIL in the
row-expression evaluator used by recursive CTEs and aggregate
ComputeAgg. Also, multi-argument functions like SUBSTR(x,2,3) only
received the first argument. Both fixed.
--- #9 SUM/AVG/MIN/MAX of all NULLs returned 0 instead of NULL
(MEDIUM) ---
SQL standard requires NULL. Changed the aggregate return path to
return NIL when nCount == 0 (SUM/AVG) or when xMin/xMax == NIL.
--- #8 MIN/MAX used SqlCoerceNum for comparison (MEDIUM) ---
Strings and dates were coerced to numbers (Val()) before comparing,
making MIN('banana') == MIN('apple') == 0. Switched to SqlCmpLt
which handles type-appropriate comparison.
--- #7 SqlExprHasAgg only checked top-level node (MEDIUM) ---
Expressions like `salary + COUNT(*)` were not detected as containing
an aggregate because the top node was ND_BIN, not ND_FN. Made the
function recursive — walks ND_BIN, ND_UNI, ND_FN args, ND_CASE
branches.
--- #13 SELECT * only expanded first table in JOINs (MEDIUM) ---
`SELECT * FROM orders o JOIN customers c ON ...` only included
fields from orders. Changed the expansion loop to iterate ALL
entries in ::aTables.
--- #2 s_aOuterStack not unwound on subquery error (HIGH) ---
SubqueryCached's PushOuter/PopOuter pair was not protected by
BEGIN SEQUENCE. A runtime error inside the subquery left a stale
entry on the module-level outer stack, corrupting all subsequent
queries' correlated column resolution. Wrapped in SEQUENCE/RECOVER.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
A scalar correlated subquery with a JOIN inside:
SELECT e.name,
(SELECT SUM(o.qty * p.price)
FROM ord o INNER JOIN prod p ON o.prod_id = p.id
WHERE o.emp_id = e.id) AS revenue
FROM emp e WHERE e.dept = 'SALES'
returned wrong values (equal to SUM(qty) instead of SUM(qty*price))
or zero for all but the first outer row. Root cause was a triple
interaction between three independent bugs.
--- Bug 1: Subquery cache leaked across five_SQL invocations ---
hSubCorrCache, aSubCacheSlots, aSemiJoinSlots, nSubCacheSeq were
declared as DATA ... INIT { => } / {} / 0. In Five's compiled output,
hash/array INIT literals may share the same backing instance across
New() calls, so the cache from query A (SUM qty, no join) was still
there when query B ran, providing a hit on the same key — returning
A's cached (wrong) value instead of re-executing B's subquery.
Fix: explicit initialization in New().
--- Bug 2: aJoins alias mutation across subquery invocations ---
RunSelect's join-alias sync loop mutated aJoins[i][3] from the
user alias ("p") to the depth-suffixed temp alias ("FA_0003").
aJoins was a direct reference into hQuery["joins"], so the mutation
persisted across re-executions of the same hQuery. On the 2nd call,
the sync loop couldn't find a matching aTables entry because the
stale temp alias ("FA_0003") didn't match the new one ("FA_0005").
The join table's workarea was positioned wrong → empty join result.
Fix: deep-clone both ::aTables and aJoins at the start of RunSelect
so each invocation starts from the parsed originals.
--- Bug 3: SqlCollectCols stripped alias prefixes ---
When adding hidden columns for complex aggregate arguments (e.g.
SUM(o.qty * p.price)), SqlCollectCols returned bare names like
"qty" and "price" instead of qualified "o.qty" / "p.price". In a
JOIN context, unqualified "price" routed FetchRow to the first
table (ord) instead of prod — FieldPos returned 0, the column was
silently NIL, and the multiplication collapsed to qty*1 = qty.
Fix: new SqlCollectColExprs returns the original ND_COL AST nodes
with qualified names preserved. The hidden-column loop now inserts
these directly so FetchRow's dot-qualified path resolves to the
correct workarea via FindWA.
--- Verification ---
Deterministic 5-emp / 6-order / 3-product test:
Expected revenues per emp:
Emp 1: 2*10 + 3*20 = 80 → got 80.00 ✓
Emp 2: 1*10 + 4*30 = 130 → got 130.00 ✓
Emp 3: 5*20 = 100 → got 100.00 ✓
Emp 4: no orders = 0 → got 0 ✓
Emp 5: 7*10 = 70 → got 70.00 ✓
Also verified SUM(qty*2) and SUM(p.price) variants.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Correlated EXISTS with high-cardinality keys was stuck at O(outer × inner)
because memoization couldn't amortize across unique correlation values.
H3 in the subquery stress bench:
SELECT e.name FROM emp e
WHERE EXISTS (SELECT 1 FROM ord WHERE ord.emp_id = e.id AND ord.qty > 15)
500 outer rows × 500 distinct e.id values × 5000-row ord scan = 10s,
with no path to improvement from caching the subquery result.
Fix: detect the semi-join shape on the subquery and rewrite it at
runtime into a non-correlated DISTINCT scan whose result is cached
as a hash set. Each outer row then becomes an O(1) hash probe.
--- What we lift ---
SELECT ... FROM inner_table
WHERE inner.col = outer.col [AND other_non_correlated_preds]
Shape constraints (all must hold):
- single table, no JOIN
- no GROUP BY, no HAVING, no UNION
- WHERE is an AND tree containing an equi-term where one side is
a column with an alias prefix from the subquery's own FROM
and the other is a column from an outer alias
- the remaining AND terms (non-correlated residue) have no
outer references of their own — rules out patterns like
`WHERE e2.dept = e.dept AND e2.salary > e.salary` where the
second term can't live without the outer context
--- How the lift works ---
1. Walk the WHERE as a flat AND-term list
2. Find and remove the first correlated equi-term, remember the
inner column name and outer column reference
3. Verify residue is non-correlated via a recursive AST walker
(SemiJoinHasOuterRef) — bail to fallback if not
4. Clone hQuery with:
columns = {DISTINCT inner.col}
where = residue (or NIL)
distinct = .T.
limit / top / order_by / group_by / having cleared
5. Run the cloned subquery once via a nested TSqlExecutor — no
PushOuter because it's now non-correlated
6. Build a hash set keyed on SqlValToStr(each distinct inner value)
7. Per EXISTS probe: Resolve the outer column reference, look up
in the hash set
Cached in ::aSemiJoinSlots indexed by xSubNode identity so the
analysis + lifted scan runs exactly once per subquery expression.
Subqueries that don't match the shape store the sentinel "NO" so
subsequent probes skip re-analysis and fall through to the existing
SubqueryCached + LIMIT 1 path.
NOT EXISTS works through the same path — lNegate flag just flips
the final hash-lookup result.
--- Bench (emp=500, prod=100, ord=5k) ---
Pattern Before After Speedup
────────────────────────────────────────────────────────────
H3 EXISTS correlated 10.0s 4.5ms ~2200x
H8 NOT EXISTS self-join 900ms 890ms same (can't lift:
remainder
`e2.salary > e.salary`
is correlated)
H11 Scalar + EXISTS + derived 3.2s 1.0s 3.2x
H8 correctly falls through to the non-lifted path because the
remainder outer-reference check (SemiJoinHasOuterRef) rejects the
`e2.salary > e.salary` term. The 5-row answer is still correct.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
- H3 returns 125 rows (matches pre-change correct result)
- H8 returns 5 rows (matches pre-change correct result)
Known pre-existing bug, unrelated: H7 (scalar correlated subquery
with inner INNER JOIN) returns zero for rows 2..N — workarea state
leaks between consecutive subquery invocations. Not touched here,
filed for follow-up.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Extreme subquery stress bench (12 patterns spanning scalar-in-SELECT,
nested correlation, EXISTS, NOT IN, derived tables, self-joins, and
mixed combinations) exposed three weaknesses in the post-ROLLUP state:
1. EXISTS / NOT EXISTS evaluated the full subquery result per outer
row, even though it only needs to know whether any row matches.
2. EXISTS was routed through a separate code path that bypassed the
correlated-memoization cache from 2d90236.
3. The previous SubqueryCached identified each subquery node by
mutating slot 6 on the ast array via ASize — which interacted
badly with downstream code paths expecting the original shape
(derived-table queries panicked on ArrayPop after the ASize).
Fixes:
* EXISTS / NOT EXISTS now route through SubqueryCached the same way
ND_SUB in WHERE does, so correlated EXISTS predicates memoize on
outer free-variable values when the cardinality is low.
* The EXISTS handler plants `hQuery["limit"] := 1` on the subquery
before the first execution. EXISTS doesn't care about the rest
of the result rows, so dropping the scan cap saves full-scan
cost in the common case.
* A new early-termination branch in RunSelect's scan loop exits
the `WHILE !Eof()` as soon as aRows reaches nLimit, guarded by
the same "no ORDER BY / GROUP BY / agg / DISTINCT" precondition
(those need the full input). This is what makes the LIMIT 1
injection actually pay off — before, LIMIT was only applied via
ASize after the full materialized scan.
* SubqueryCached no longer mutates the parse tree. Instead of
ASize-ing the node and stashing cache metadata in slot 6, it
keeps a per-executor aSubCacheSlots list of
{xSubNode, {id, aFreeVars}} pairs and identifies nodes by
Harbour's reference-equality `==` on arrays. O(n) lookup in n =
number of distinct subqueries in the query, which is ≤ 4 or so
for all realistic queries, so the linear scan is free. Fixes the
derived-table ArrayPop panic.
Bench impact (emp=500, prod=100, ord=5k — subquery hell):
Pattern Before After Δ
───────────────────────────────────────────────────────
H3 Correlated EXISTS 13.3s 10.0s 1.3x
H7 Scalar-in-SELECT + JOIN 362ms 2ms 181x
H8 NOT EXISTS self-join 1.8s 900ms 2.0x
H11 Scalar + EXISTS + derived 13.7s 3.2s 4.3x
(H1, H2, H5, H6, H9, H10, H12 unchanged at 3–72ms)
H7's 181x is the scalar-in-SELECT-list memoization payoff — each
dept's revenue subquery used to run 100 times (once per SALES emp),
now runs once per distinct dept.
H3's 1.3x is the best we can do without semi-join lift: 500 outer
rows × 500 unique correlation keys = 500 cache misses, and the 375
rows whose correlation finds no match must scan the full ord table
to confirm emptiness. Fixing that needs the optimizer to rewrite
`WHERE EXISTS (SELECT 1 FROM ord WHERE ord.emp_id = e.id AND ...)`
into `WHERE e.id IN (SELECT DISTINCT emp_id FROM ord WHERE ...)`,
which is a real query-rewrite feature left for a follow-up.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two SQL:2013 features that were stubs or bugs. Both ship together
because they share testing infrastructure (the SQL:2013 analytics
bench).
--- 1. ROLLUP / CUBE / GROUPING SETS (TSqlAgg) ---
The parser has recognized these for a while, storing them as
`ND_FN "ROLLUP"` / "CUBE" / "GROUPING SETS" nodes inside the
GROUP BY list. GroupBy never actually expanded them — it treated
the ND_FN as an opaque group term, which meant every row hashed
into the empty bucket and the query returned a single row.
New TSqlAgg:ExpandGroupingSets walks the aGroupBy array and
expands each ROLLUP / CUBE / GSETS modifier into a list of flat
grouping sets by cross-product with the surrounding plain terms:
GROUP BY ROLLUP(a, b, c) → {(a,b,c), (a,b), (a), ()}
GROUP BY CUBE(a, b) → {(a,b), (a), (b), ()}
GROUP BY GROUPING SETS((a,b),()) → as-is
GROUP BY x, ROLLUP(a, b) → {(x,a,b), (x,a), (x)}
When the expansion produces more than one set, GroupBy recurses
once per set (passing the plain flat set) and NILs out SELECT
columns that aren't in the current set — the standard subtotal
placeholder. Fast path (no ROLLUP/CUBE/GSETS node) short-circuits
to the original single-pass logic.
Correctness check: `SELECT region, SUM(amount) FROM sales GROUP BY
ROLLUP(region)` on a 5-region dataset now returns 6 rows (5
per-region subtotals + 1 grand total row with region=NIL). Was 1.
--- 2. Correlated subquery memoization (TSqlExecutor) ---
Committed 9e0f82c fixed a silent caching bug that made correlated
subqueries return the first outer-row's result for every subsequent
row, at the cost of dropping caching entirely — every outer row
re-executed the subquery. For Q8 in the SQL:2013 bench (1000 emps,
correlated on 3 distinct depts) that was 4.9 seconds.
The right answer is to memoize per outer-key, not globally. This
commit adds:
- TSqlExecutor:CollectFreeVars(hQ): walks a subquery's WHERE,
columns, and HAVING for ND_COL references whose alias prefix
isn't one of the subquery's own FROM tables. Those are the
outer columns the subquery actually depends on.
- TSqlExecutor:SubqueryCached(xSubNode): runs the free-var
analysis once per distinct AST node (memoized onto a 6th slot
on the node), builds a cache key from the current values of
those free vars via ::Resolve(), looks up in ::hSubCorrCache,
executes on miss. Non-correlated subqueries end up with an
empty free-var list → single cache entry → same behavior as
the old CacheSubquery fast path.
- ND_SUB and ND_SUB-in-IN handlers route through SubqueryCached
instead of the split cache/push-outer logic.
Plus a correctness fix that SubqueryCached surfaced: when a
subquery runs at nDepth > 1, TSqlExecutor rewrites each FROM
table's alias to a depth-suffixed temp (so concurrent opens of
the same file don't collide). Previously the original user-written
alias was only preserved in aTables[i][3] for single-char aliases.
Multi-char aliases like `emp e2` lost their original after the
rename, so FindWA("E2") failed, Resolve("e2.dept") returned NIL,
and `WHERE e2.dept = e1.dept` evaluated NIL=NIL → every row was
filtered out → subquery AVG returned 0 → outer `salary > 0` was
trivially true for everyone. Now we always stash the original
alias in [3] before the rename.
--- Bench (SQL:2013 analytics, 10 queries, emp=1k, sales=20k) ---
Query Before After Δ
────────────────────────────────────────────────────────
Q6 RECURSIVE hierarchy (prev fix) 30ms
Q7 ROLLUP subtotals 86ms, 1 row 106ms, 6 rows (correct)
Q8 Correlated subquery 4933ms 20ms ~245x
(all other queries unchanged at 4–230ms)
Q8 30-row sanity regression test (emp.dept in {A,B,C}, deterministic
salaries so hand-computed averages are 155/810/1765):
SELECT name, dept, salary FROM emp e1
WHERE salary > (SELECT AVG(salary) FROM emp e2 WHERE e2.dept = e1.dept)
Before: 30 rows (wrong — returns all)
After: 15 rows (correct — 5 above each dept's average)
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two fixes uncovered by a SQL:2013 analytics benchmark covering the
query patterns people actually run on DBF data (OLAP, BI, hierarchy
traversal).
--- Fix 1: correlated subquery was silently wrong ---
EvalExpr's ND_SUB handler only pushed the outer context when
`s_aOuterStack` was already non-empty — otherwise it routed the
subquery through CacheSubquery, which stores the first result under
a key derived from the subquery's syntax tokens. For a correlated
subquery in a top-level WHERE:
SELECT name, dept, salary FROM emp e1
WHERE salary > (SELECT AVG(salary) FROM emp e2 WHERE e2.dept = e1.dept)
the first outer row saw an empty stack, cached the result, and
every subsequent outer row got the same cached value regardless of
e1.dept. The query returned all 1000 employees instead of the 505
who actually beat their department's average.
Fix: always PushOuter + Run, no cache. Correctness over caching.
Trade-off: non-correlated scalar subqueries now re-execute per
outer row. A proper per-outer-key memoization is deferred — it
requires walking the subquery AST to collect free variables.
--- Fix 2: WITH RECURSIVE hierarchy join was O(m*n) ---
RecCteJoin (the in-memory join used when a recursive CTE's step
references both a real table and the CTE frontier) ran a flat
nested loop: for each DBF row × each prev-iteration row, build a
combined row buffer and run SqlEvalRowExpr on the ON condition.
For a 4-level 1000-employee hierarchy that's ~1M ON evaluations,
~4.6 seconds.
Fix: detect the shape `dbfAlias.col = cteAlias.col` at join-setup
time, build a PRG hash on the CTE frontier keyed by its join column
(aPrevRows is always small — at most the last iteration's emitted
rows), then scan the DBF side once and probe the hash. Complex ON
predicates fall through to the original nested loop.
--- Bench (SQL:2013 analytics, emp=1k, sales=20k, evt=30k) ---
Query Before After Speedup
──────────────────────────────────────────────────────────────
RECURSIVE hierarchy 4-level 4603ms 30ms ~150x
Correlated subquery (all emp) 10ms ❌ 4933ms ✓ (correct)
Other SQL:2013 queries (ROW_NUMBER top-N, running total, moving
average, DENSE_RANK, LAG, NTILE, gaps-and-islands) are all in the
expected 10–230ms range for these dataset sizes, unchanged by
this commit.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Known follow-ups (not in this commit):
- Q7 ROLLUP(col) parses but isn't expanded in GroupBy — returns
a single grand-total row instead of per-value + total. Grouping
sets implementation is a separate feature.
- Correlated subquery memoization by outer free-variable key
would bring Q8 from 4.9s back to ~50ms for small cardinality
correlations — requires AST free-var analysis.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FiveSql2's HashJoin only recognized bare equi-terms (xOnCond[1]=ND_BIN,
xOnCond[2]="="), so a compound ON predicate like
ON e.dept_id = t.dept_id AND e.salary = t.max_sal
fell through to the nested-loop ELSE branch:
dbSelectArea(nInnerWA)
dbGoTop()
WHILE !Eof()
IF SqlIsTrue(EvalExpr(xOnCond))
JoinRecurse(...)
ENDIF
dbSkip()
ENDDO
That's O(outer × inner) per outer row, re-evaluating the full AND tree
every probe. Query Q7 in the complex benchmark (CTE top_emp joined back
to emp on compound key) ran at 4.6 seconds for 100 inner × 10k outer.
Fix has two pieces:
1. **Probe-term extraction in JoinRecurse**: when xOnCond is an AND,
walk the left-associative chain looking for the first equi-term
(`a.x = b.x`). Use that as the hash-probe key, drive the normal
hash-join code path through it.
2. **Post-filter in HashJoin**: after a hash match, if the *original*
xOnCond was compound, re-evaluate the full predicate with
EvalExpr to drop matches that satisfied the hash key but not the
rest of the AND (e.g. same dept but different salary). Bare equi-
joins still skip the re-eval — the hash match is conclusive.
Bench (10k × 100 × compound ON predicate):
Query Before After Speedup
─────────────────────────────────────────────────────────
Q7 CTE + JOIN compound ON 4573ms 209ms 21.9x
Still works for the existing bare equi case (43-test unchanged) and
the 3-way JOIN case (no regression). Falls back to the generic nested
loop only when no probe-term can be extracted at all.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
- Q7 result: 100 rows (correct)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Surfaced by complex-query benchmarking. Query like:
SELECT d.name AS dept, COUNT(*) AS n, SUM(o.amount) AS total
FROM dept d INNER JOIN emp e ON ... INNER JOIN ord o ON ...
GROUP BY d.name
returned exactly 1 row instead of 100. Removing the AS aliases made
it work correctly. Semantic bug, not a performance issue.
Root cause: TSqlAgg:GroupBy resolved each GROUP BY column by calling
FindColIdx against aFN — the output alias list. For GROUP BY d.name
with d.name AS dept, the group expression's column name was looked
up in {"dept","n","total"} and missed. FindColIdx returned 0, every
row got an empty group key, and the hash collapsed everything into
one bucket.
Fix: new FindGroupIdx walks aCols (SELECT list expressions) instead,
matching the GROUP BY column against each SELECT item's source
expression ND_COL name. Handles qualified refs (d.name -> NAME) and
falls back to FindColIdx for cases where GROUP BY uses a column not
in the SELECT list.
Also hoisted the resolution out of the per-row loop — GROUP BY
columns resolve once into aGroupIdx[] so each row just indexes.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- Complex bench Q4: 1 row -> 100 rows (correct)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Complex-query benchmarking turned up two hot paths that the earlier
SqlScan/SqlEach work didn't touch: multi-table JOIN and nested-scan
row fetching. This commit hits both.
--- Part 1: SqlHashBuild — Go-native hash-join build ---
FiveSql2's HashJoin previously built the inner-side hash in PRG:
WHILE !Eof()
xVal := FieldGet(nFPos)
cKey := SqlValToStr(xVal)
IF !hb_HHasKey(hHash, cKey) ; hHash[cKey] := {} ; ENDIF
AAdd(hHash[cKey], RecNo())
dbSkip()
ENDDO
That loop runs at ~40μs per row from class dispatch + hb_HHasKey
lookups + AAdd growth + SqlValToStr formatting. On a 50k-row inner
table that's ~2 seconds wasted on what should be a sub-50ms
housekeeping op.
New hbrtl.SqlHashBuild does the same thing in one Go-native pass:
- Direct *dbf.DBFArea loop (no interface dispatch, same devirt as
SqlScan)
- Go `map[string][]int64` accumulates RecNos by key — one
allocation per distinct key
- Inline ASCII-only digit formatter for numeric keys (strconv.Itoa
is allocation-heavy for small ints)
- CHAR keys are right-trimmed to match SqlCmpEq semantics so the
hash probe matches what EvalExpr would compute
- Final Five hash is built once from Keys/Values/Order slices
directly, skipping the per-key hb_HSet path
HashJoin now calls `SqlHashBuild(nFPos)` instead of running the
PRG loop.
--- Part 2: TSqlExecutor:BuildFetchCache ---
The JOIN fallback loop calls FetchRow per row. FetchRow was already
column-ref-aware but did the string parse (`At + SubStr + Upper`)
and `::FindWA` linear scan every single invocation. For a 50k-row
join emitting 50k result rows, that's ~200k redundant resolutions.
New BuildFetchCache walks the SELECT list once before the scan and
pre-binds each plain-column expression to `{nWA, nFPos}`. FetchRow's
new fast path checks ::aFetchCache and jumps straight to
`dbSelectArea + FieldGet` when bound. Complex exprs (functions,
CASE, subqueries) still fall through to EvalExpr.
::aFetchCache is set right before the join WHILE loop and cleared
after — no cross-query bleed.
--- Bench (50k ord × 10k emp × 100 dept, 3-run steady state) ---
Query Before After Speedup
────────────────────────────────────────────────────────────
2-way INNER JOIN, 10k rows 91ms 68ms 1.34x
2-way JOIN + GROUP BY 110ms 94ms 1.17x
3-way INNER JOIN COUNT 2610ms 610ms 4.28x
3-way JOIN + GROUP BY 2860ms 830ms 3.45x
The 3-way speedup is almost entirely SqlHashBuild. The 2-way case
benefits from the fetch cache because its per-row cost is dominated
by FetchRow (no second hash build to amortize).
--- Limits still standing ---
CTE + JOIN queries (Q7 in bench_complex: ~4.5s) aren't affected by
either optimization — CTE materialization goes through a different
path that writes/reads a temp DBF. Follow-up target.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Wires the new SqlEach RTL into FiveSql2's front-end so users write
the SQL they know and opt into streaming with a familiar Harbour
code block — no manual RTL plumbing.
API:
/* Existing array form — unchanged, 43-test still green */
aR := five_SQL( "SELECT name FROM t" )
/* New block form — zero intermediate rows, 2x raw PRG */
five_SQL( "SELECT id, name FROM t WHERE salary > 50000", NIL,
{|nID, cName| Process(nID, cName)} )
Parameter order (cSQL, aParams, bBlock) keeps backward compatibility
with every existing call site. Passing NIL for aParams when only a
block is needed is standard Harbour idiom.
Routing:
* TFiveSQL:Execute now takes an optional bBlock parameter and
stores it on TSqlExecutor as ::bRowBlock.
* TSqlExecutor:RunSelect's existing Go fast path (same guards as
before: single table, no JOIN/GROUP/aggregate, plain column
projections, WHERE compilable via SqlExprToPrg) branches on
::bRowBlock:
- block present → SqlEach streams rows through the block
- block absent → SqlScan materializes into aRows (current path)
* Post-processing (GROUP BY / ORDER BY / window / DISTINCT / LIMIT)
runs on empty aRows when block mode fires — all are no-ops on
empty input, so the sequence stays harmless.
* RunSelect returns NIL (not {fields, rows}) when ::bRowBlock was
used — signals "streaming semantics, all work done in the block".
Complex queries (JOIN, GROUP BY, subquery, window, ORDER BY not
matchable by an index, LIMIT/OFFSET, etc.) still fall back to the
array path even when a block is supplied — those genuinely require
materialization. Block mode is a fast-path opt-in, not a semantic
change.
End-to-end bench (50k rows, steady state — includes the user-side
loop/block for every row):
Path Time Speedup vs raw
──────────────────────────────────────────────────────────────
Raw PRG DO WHILE !Eof() + WHERE sum 7.6ms 1.00x
five_SQL array + FOR 7.7ms ~same
five_SQL + block (new) 3.7ms 2.05x ← beats raw
──────────────────────────────────────────────────────────────
Raw PRG no WHERE 6.1ms 1.00x
five_SQL + block, no WHERE 2.9ms 2.10x ← beats raw
SQL now pays for itself on end-to-end timing — not just competitive
with hand-rolled RDD loops, but faster than them. The layered cost
of FieldGet's Frame+RTL-dispatch that hand-written loops incur per
call is gone; the block-callback path captures *dbf.DBFArea directly
via FastFieldGetter and uses PcOpFieldGet to bypass dispatch in the
compiled WHERE predicate.
Validation:
- FiveSql2 43/43 (array API unchanged)
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The structural 1.38x gap vs raw RDD for no-WHERE full scans wasn't
a limit of our engine — it was a limit of the result shape. SqlScan
materializes N rows as HbArray wrappers over a flat Value buffer,
then the PRG caller iterates that materialized array. Two passes
over the data. Raw RDD is one pass.
SqlEach folds both passes into one. The caller supplies a code block
that receives the selected column values as positional parameters;
SqlEach invokes it per matching row. No result array is ever built.
Usage (drop-in replacement for the common "scan + process" idiom):
five_SQLEach( "SELECT id, name, salary FROM emp WHERE salary > 50000",
{|nID, cName, nSalary| Process(nID, cName, nSalary) } )
API shape borrows Harbour's AEval/ASort block-callback convention,
so there's nothing new to learn. Positional params also sidestep
the `SELECT COUNT(*)` naming problem — no need to invent names for
anonymous expressions.
Implementation notes:
- 4-way loop specialization ({DBF, generic Area} × {WHERE, none}),
matching SqlScan. Each path is zero-allocation in the steady state.
- Block invocation uses the direct pendingParams + blk.Fn(t) protocol
rather than EvalBlock, which would allocate a temporary args slice
on every call (50k scans × small slice adds up).
- FastFieldGetter is installed the same way as SqlScan so PcOpFieldGet
in the WHERE predicate skips the PushSymbol + Function dispatch.
Bench (50k rows, end-to-end including user-code loop, steady state):
Path Time vs raw RDD
─────────────────────────────────────────────────────
Raw PRG loop, WHERE + sum 8.7ms 1.00x
SqlScan + PRG FOR, WHERE 5.1ms 0.59x
SqlEach block, WHERE 4.1ms 0.47x ← beats raw
─────────────────────────────────────────────────────
Raw PRG loop, no WHERE 6.1ms 1.00x
SqlEach block, no WHERE 3.8ms 0.62x ← beats raw
SqlEach is faster than a hand-rolled `DO WHILE !Eof()` loop because
the per-row FieldGet in raw PRG still goes through a full Frame +
RTL dispatch, whereas SqlEach's FastFieldGetter captures the concrete
*dbf.DBFArea directly. The SQL abstraction now costs nothing — it
pays you to use it.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Next step (not in this commit): FiveSql2 TSqlExecutor integration —
detect when five_SQL is called with a block argument and route to
SqlEach instead of SqlScan + array build.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SqlScan's inner scan was written as a single loop with `if whereFn
!= nil` and a `keep` shadow variable. Branch-predictable for sure,
but still a few extra ops per row and it prevented Go from inlining
the non-nil interface call on the Area branch.
Split into four specialized loop bodies on the two axes that drive
per-row cost:
1. dbfArea != nil && whereFn != nil
2. dbfArea != nil && whereFn == nil ← tightest path (SELECT *)
3. dbfArea == nil && whereFn != nil ← generic Area
4. dbfArea == nil && whereFn == nil
Each body has exactly the instructions it needs — no dead branches,
no shadow variables, no interface dispatch where avoidable. Copy-paste
cost is real but each row save adds up at 50k iterations.
Bench impact (50k rows, 3-run steady state):
No WHERE 9.1ms → 8.7ms 1.38x vs raw (was 1.47x)
Numeric WHERE 6.9ms → 7.0ms ~flat (within noise)
String WHERE 6.2ms → 6.4ms ~flat (within noise)
Raw RDD 6.3ms baseline
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./hbrtl/... PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Second pcode peephole to match the one added for FieldGet(literal).
SqlExprToPrg auto-wraps CHAR column references with AllTrim() to
match SqlCmpEq's CHAR-padding trim semantics, so every string WHERE
predicate evaluates `AllTrim(FieldGet(n)) == 'literal'` per row.
Before this commit each of those per-row evaluations did:
1. PushSymbol ALLTRIM
2. PushSymbol FIELDGET → Function(1) [1 RTL Frame]
3. parseCharField → MakeString [alloc: copies raw bytes]
4. Function(1) → AllTrim RTL [1 RTL Frame]
5. strings.TrimSpace [alloc: new string]
6. Return, continue
New opcode `PcOpFieldTrim <idx>` (0x47) fuses the two RTL calls into
a single opcode that:
1. Calls FastFieldGetter directly (no Frame/Function dispatch).
2. Walks the returned string with ASCII-space trim in place.
3. Pushes `s[lo:hi]` — a sub-slice, no new allocation.
4. Short-circuits back to the same string if no trim needed.
genpc recognizes the shape `AllTrim(FieldGet(<int-literal>))` in
emitCall and emits the fused opcode automatically — no SQL-side
API change. Matches the existing FieldGet peephole's shape.
Bench impact (50k rows, 3-run steady state, vs raw RDD baseline 6.2ms):
String WHERE before 7.9ms → after 6.2ms 1.00x (parity!)
Numeric WHERE 6.9ms (unchanged) 1.11x
No WHERE 9.1ms (unchanged) 1.47x
String WHERE is now at parity with the raw Harbour-style RDD scan.
Compared to session start (119ms), that's a 19x speedup.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
parseNumericField was allocating on every call — `string(raw)` to
convert the record-buffer slice to a string, plus the implicit
allocation from TrimSpace's return value. For a 50k-row scan reading
two numeric fields, that's 100k+ small string allocations per scan,
all of which promptly became garbage.
Rewritten to walk the raw byte slice directly:
- Find the trimmed range by byte indexing (no alloc).
- Parse integer-typed fields (dec == 0) digit-by-digit into int64.
- Only fall back to strconv.ParseFloat + string allocation for
genuinely fractional data (dec > 0 or embedded `.`).
This also lifts the raw RDD baseline in our bench (6.8ms → 6.2ms)
because FieldGet hits this same parser. Every scan path benefits,
not just the FiveSql2 hot loop.
Measured (50k rows, 3-run steady state):
Before After
No WHERE 10.0ms 9.1ms
Numeric WHERE 7.8ms 6.9ms ← now 1.11x raw
String WHERE 7.9ms (see next commit)
Raw RDD baseline 6.8ms 6.2ms ← also faster
Validation:
- hbrdd/dbf tests PASS (including integer/float field roundtrips)
- FiveSql2 43/43
- Harbour compat 51/51
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two stacked optimizations land on the SqlScan hot path. Combined
effect on the 50k-row benchmark:
Before After vs raw
Numeric WHERE 10.2ms 7.8ms 1.15x
String WHERE 10.5ms 7.9ms 1.15x
No WHERE 9.2ms 10.0ms 1.45x
Raw RDD baseline 6.8ms 6.8ms 1.00x
WHERE-predicate paths are now within 15% of the raw Harbour-style
RDD scan loop. The no-WHERE path is unchanged (slight jitter from
the added devirt branch); FieldGet peephole doesn't apply there.
--- Optimization 1: PcOpFieldGet peephole ---
Adds a new pcode opcode `PcOpFieldGet <fieldIdx>` (0x46) that skips
the usual PushSymbol+Function+Frame+FieldGet-RTL+EndProc chain and
calls a direct field getter closure instead. genpc recognizes the
shape `FieldGet(<int-literal>)` during emitCall and emits the
specialized opcode automatically — no SQL-side API change.
Integration:
* hbrt.Thread.FastFieldGetter — hot-path closure set by scan loops.
Non-nil → pcode bypasses dispatch.
Nil → pcode resolves FIELDGET via
the RTL symbol table (correctness
fallback for any other callers).
* compiler/genpc/genpc.go — peephole in emitCall.
* hbrt/pcinterp.go — PcOpFieldGet handler.
This alone cut numeric WHERE from 10.2 → 7.9ms: eliminated roughly
one full Frame/EndProc + RTL dispatch per row × 50k rows.
--- Optimization 2: DBFArea devirtualization ---
SqlScan type-asserts the workarea to *dbf.DBFArea once and runs a
dedicated loop that calls GoTop/EOF/Skip/GetValue directly on the
concrete type. Go's compiler inlines these, skipping the interface
vtable per row. Non-DBF drivers still work via the generic Area
branch.
The FastFieldGetter closure also captures *DBFArea directly in the
DBF branch, so the WHERE predicate side of the hot loop is now
entirely devirtualized: no interface dispatch between the pcode
dispatch loop and the DBF record buffer.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Remaining gap to raw RDD on no-WHERE (~1.45x) is dominated by the
two-column row construction + ArraySlab + flat backing bookkeeping
that the raw loop doesn't do. Going below that requires changing
the SQL engine's result shape — out of scope here.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
SqlScan's prior design called hbrt.MakeArrayFrom per matching row,
each one allocating a fresh &HbArray{}. For 50k rows that's 50k tiny
Go heap allocations + GC pressure that the flat-backing-buffer work
from 85541a3 left untouched (that commit eliminated the per-row items
slice alloc but not the header alloc).
hbrt.ArraySlab pre-allocates a `[]HbArray` slab of the estimated row
count and hands out `&slab.buf[idx]` on each WrapNext. One underlying
make() replaces N; pointers stay stable because slab growth reallocates
a fresh buffer instead of reusing the old one, so previously-handed-out
pointers remain valid (the old backing is kept alive by the references).
API kept tiny:
slab := hbrt.NewArraySlab(estRows)
val := slab.WrapNext(items) // returns Value wrapping &slab.buf[i]
SqlScan now pairs this with the existing flat value buffer for a
single-allocation-per-chunk scan hot loop.
Combined bench impact (50k rows, steady state):
Session start Now
no WHERE 14.6ms 9.2ms ← 1.3x vs raw RDD baseline
numeric WHERE 11.7ms 10.2ms
string WHERE 10.5ms 10.5ms
raw RDD baseline 6.8ms 7.0ms
no WHERE is now within 30% of raw RDD. Remaining gap is largely
Area.GetValue boxing overhead and the pcode opcode dispatch loop
itself — no further structural wins without a wider refactor.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Pcode expressions compiled from SQL WHERE clauses (via genpc.CompileExpr)
never contain BEGIN SEQUENCE and can't raise BreakValue, so the defer +
recover dance in ExecPcode's EndProc is pure overhead. For FiveSql2's
per-row WHERE evaluation on a 50k-row scan, that's 50k × ~15ns = ~750µs
of pointless recover bookkeeping.
Split ExecPcode into two variants sharing execPcodeBody:
ExecPcode — full: Frame + defer EndProc. General-purpose,
handles panics. Behavior unchanged.
ExecPcodeFast — hot: Frame + execPcodeBody + EndProcFast. No defer,
no recover. Caller guarantees the pcode body can't
panic with HbError / BreakValue.
SqlScan now uses ExecPcodeFast for per-row WHERE evaluation. Measured
impact on 50k-row no-WHERE benchmark: 10.6ms → 9.2ms steady state
(~13% faster). Effect is smaller on numeric-WHERE because per-row
cost there is dominated by the opcode dispatch itself, not the frame
exit.
Validation:
- FiveSql2 43/43
- go test ./hbrt/... PASS (pcode tests)
- go test ./hbrtl/... PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Previous short-circuit (return 0 unconditionally) was a workaround
for two bugs that are both fixed now:
1. gengo PushLocal(0) panic on unresolved identifiers
→ fixed by 08ad6f4 (PushMemvar fallback).
2. dbInfo(DBI_FULLPATH / DBI_SHARED) returning NIL
→ fixed by d74014a (real implementations).
Restoring the original scan: walk workareas 1..250, check if any
holds an exclusive lock on the target DBF. With dbInfo now functional
and the DBI_* constants defined in include/dbinfo.ch (commit 3a00aa5),
this gives FiveSql2 real pre-flight conflict detection for concurrent
table access rather than silently proceeding into a lock failure.
Validation:
- FiveSql2 43/43
- standalone PRG with dbUseArea + five_SQL works (was the original
repro that triggered the workaround)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
The prior loop allocated one small `[]hbrt.Value` per matching row
(for the row body) plus one HbArray header. For a 50k-row full scan
that's 100k allocations of which the small-slice allocs dominated
fragmentation and GC pressure.
SQLite-inspired fix: pre-allocate a single flat []hbrt.Value of
capacity `RecCount * nFields` at scan start and hand each row a
three-index sub-slice (flat[off:end:end]). The capped sub-slice
still forces a reallocation if PRG code later does `AAdd(row, x)`,
so neighbor rows can't get clobbered.
Sizing the initial buffer off RecCount(err-ignored) was the actual
win — the previous naive grow-from-1024 policy caused five mid-scan
reallocations of a ~200 KB buffer, each memcpy'ing everything so far.
One upfront allocation amortizes much better.
Bench (50k rows, ~/tmp ext4, 3 runs steady-state):
Before After Δ
no WHERE 14.6ms 10.6ms −27%
numeric WHERE 11.7ms 10.0ms −15%
string WHERE 10.5ms 11.0ms ~=
raw RDD baseline 6.8ms 7.0ms
Gap to raw RDD: 2.1x → 1.4x on the dominant no-WHERE case. What's
left is pcode WHERE dispatch (ExecPcode frame per row), the Area
interface boundary, and the HbArray header allocation per row —
all structural costs that would need a wider refactor to close.
Validation:
- FiveSql2 43/43
- go test ./hbrtl/... PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Replaces the `return NIL` stubs with real implementations that read
from the current workarea. Covers the info codes actually used by
downstream code (FiveSql2 TSqlIndex, standalone callers):
DBINFO:
DBI_ISDBF, DBI_CANPUTREC, DBI_FULLPATH, DBI_TABLEEXT, DBI_MEMOEXT,
DBI_SHARED, DBI_ISREADONLY, DBI_GETRECSIZE, DBI_DBVERSION,
DBI_RDDVERSION, DBI_BOF, DBI_EOF, DBI_FOUND, DBI_FCOUNT, DBI_ALIAS,
DBI_POSITIONED
DBORDERINFO:
DBOI_EXPRESSION, DBOI_NAME, DBOI_NUMBER, DBOI_POSITION,
DBOI_ORDERCOUNT, DBOI_KEYCOUNT, DBOI_KEYCOUNTRAW
Unknown info codes still return NIL (Harbour's forgiving fallback).
New accessors on DBFArea (FullPath, IsShared, IsReadOnly) expose the
private filePath/shared/readOnly fields to the hbrtl layer without
plumbing them through the generic Area interface.
Unblocks TSqlIndex:FindExclusive's original DBI_FULLPATH/DBI_SHARED
scan — though the short-circuit there stays in place for now since
it's a correctness workaround that no longer masks a crash thanks
to the recent gengo PushMemvar fallback.
Validation:
- FiveSql2 43/43 (0 warnings)
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Prior behavior used exprToString() to serialize the TO expression
back into a string, so a runtime-evaluated filename like
`( Lower(cTable) + "_pk.ntx" )` ended up as the literal filename
`Lower(cTable) + "_pk.ntx"` on disk. Visible in FiveSql2's PRIMARY
KEY / UNIQUE DDL path: test_sql1999 was creating files with that
literal name, which the test happened not to care about because the
USE inside BEGIN SEQUENCE caught the failure.
Fix: if the File expression contains any function call (detected by
new containsCall walker), emit emitExpr + Pop2 + AsString — runtime
evaluation path. Static filenames (`TO test.ntx`) still use the
cheap exprToString branch.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two sources of cruft kept showing up in `git status` after running
FiveSql2 tests from the repo directory:
1. Scratch tables from standalone bench PRGs (t.dbf, b.dbf, s.dbf,
p.dbf). Anchored to root only so tracked fixtures in subdirs
(area_a.dbf, customers.dbf, idxadv.dbf, ...) stay unaffected.
2. Literal filenames like `Lower(cTable) + "_pk.ntx".ntx` — this
is a FiveSql2 DDL bug where a macro-substituted index filename
fails to evaluate and ends up as the actual filename. Ignoring
the `Lower*.ntx`/`Lower*.cdx` pattern keeps the garbage out of
commits. The underlying DDL bug needs a separate fix.
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
TSqlIndex.prg had five undefined identifiers and six undefined
constants that the new CLASS-method analyzer surfaced after the
gengo PushMemvar fallback stopped crashing on them. All real tech
debt, not false positives. This lands the implementations.
New RTL functions (hbrtl/indexrtl.go + register.go):
- FieldType(n) → "C"/"N"/"L"/"D"/"M"/... one-letter type
- FieldLen(n) → length in bytes
- FieldDec(n) → decimal places
- ordCreate(cBag, cTag, cExpr [, bExpr] [, lUnique])
→ DBFArea.OrderCreate with TagName set (CDX tag or NTX tag)
- dbCreateIndex(cFile, cExpr [, bExpr] [, lUnique])
→ legacy Clipper single-tag NTX without TagName
- dbClearIndex() → OrderListClear
All pass through the existing Indexer interface; key expressions go
through the MacroEval slow path since callers pass string literals.
When callers are updated to pass compiled key blocks, the existing
KeyFunc fast path kicks in automatically.
New header files (include/):
- dbinfo.ch — DBI_* and DBOI_* constants with Harbour-compatible
values (FULLPATH=10, SHARED=42, EXPRESSION=2, etc.)
- dbstruct.ch — DBS_NAME/TYPE/LEN/DEC field descriptor indices
TSqlIndex.prg already did `#include "dbinfo.ch"` and `#include
"dbstruct.ch"` but Five's preprocessor silently ignored the missing
files. Both headers land in include/ where cmd/five's include-dir
chain already looks.
Analyzer RTL allow-list updated with the six new function names so
the warning pipeline stays clean.
Result: FiveSql2 build goes from 17 WARN → 0. Both tracked test
suites still pass.
Note: dbInfo() / dbOrderInfo() themselves remain stubbed (return NIL)
— the constants exist for compile-time resolution and for future use
when the stubs are replaced. Callers that depend on actual dbInfo
values still get NIL at runtime.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>