FindColIdx2 searched for bare column name (e.g. 'AMOUNT') but
aFieldNames now contains qualified names ('o.amount') from the
Go join fast path. Added fallback: try xArg[2] (the full AST name)
when the bare name misses. Fixes SUM/AVG/MIN/MAX aggregation after
Go-native hash join.
Verified: 41/41 correctness tests pass (verify_correctness.prg).
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Systematic bug-hunt driven by an automated analysis of all FiveSql2
source files. Each fix is targeted — no speculative refactoring.
--- #1 CLASSDATA hSubCache leaked across queries (CRITICAL) ---
CLASSDATA hSubCache INIT { => } SHARED
shared one hash across ALL TSqlExecutor instances. A non-correlated
subquery cached in query A was silently returned for an unrelated
query B if the subquery text happened to produce the same cache key.
Converted to instance DATA initialized in New().
--- #5+#21 IS NULL / COALESCE treated empty string as NULL (HIGH) ---
RETURN xL == NIL .OR. ( ValType(xL) == "C" .AND. Empty(AllTrim(xL)) )
SQL standard: '' is a valid non-NULL value. Removed the empty-string
check from both IS NULL evaluation and COALESCE skip logic.
--- #4 Multiple ? parameters all returned first value (HIGH) ---
ND_PAR nodes had no index — EvalExpr always returned ::aParams[1].
Parser now stamps each ? with a sequential 1-based index in xNode[2].
EvalExpr uses it to return the correct ::aParams[n].
--- #10+#11 SqlEvalRowExpr missing / and || operators, single-arg
function eval (MEDIUM) ---
Division and string concatenation fell through to RETURN NIL in the
row-expression evaluator used by recursive CTEs and aggregate
ComputeAgg. Also, multi-argument functions like SUBSTR(x,2,3) only
received the first argument. Both fixed.
--- #9 SUM/AVG/MIN/MAX of all NULLs returned 0 instead of NULL
(MEDIUM) ---
SQL standard requires NULL. Changed the aggregate return path to
return NIL when nCount == 0 (SUM/AVG) or when xMin/xMax == NIL.
--- #8 MIN/MAX used SqlCoerceNum for comparison (MEDIUM) ---
Strings and dates were coerced to numbers (Val()) before comparing,
making MIN('banana') == MIN('apple') == 0. Switched to SqlCmpLt
which handles type-appropriate comparison.
--- #7 SqlExprHasAgg only checked top-level node (MEDIUM) ---
Expressions like `salary + COUNT(*)` were not detected as containing
an aggregate because the top node was ND_BIN, not ND_FN. Made the
function recursive — walks ND_BIN, ND_UNI, ND_FN args, ND_CASE
branches.
--- #13 SELECT * only expanded first table in JOINs (MEDIUM) ---
`SELECT * FROM orders o JOIN customers c ON ...` only included
fields from orders. Changed the expansion loop to iterate ALL
entries in ::aTables.
--- #2 s_aOuterStack not unwound on subquery error (HIGH) ---
SubqueryCached's PushOuter/PopOuter pair was not protected by
BEGIN SEQUENCE. A runtime error inside the subquery left a stale
entry on the module-level outer stack, corrupting all subsequent
queries' correlated column resolution. Wrapped in SEQUENCE/RECOVER.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Two SQL:2013 features that were stubs or bugs. Both ship together
because they share testing infrastructure (the SQL:2013 analytics
bench).
--- 1. ROLLUP / CUBE / GROUPING SETS (TSqlAgg) ---
The parser has recognized these for a while, storing them as
`ND_FN "ROLLUP"` / "CUBE" / "GROUPING SETS" nodes inside the
GROUP BY list. GroupBy never actually expanded them — it treated
the ND_FN as an opaque group term, which meant every row hashed
into the empty bucket and the query returned a single row.
New TSqlAgg:ExpandGroupingSets walks the aGroupBy array and
expands each ROLLUP / CUBE / GSETS modifier into a list of flat
grouping sets by cross-product with the surrounding plain terms:
GROUP BY ROLLUP(a, b, c) → {(a,b,c), (a,b), (a), ()}
GROUP BY CUBE(a, b) → {(a,b), (a), (b), ()}
GROUP BY GROUPING SETS((a,b),()) → as-is
GROUP BY x, ROLLUP(a, b) → {(x,a,b), (x,a), (x)}
When the expansion produces more than one set, GroupBy recurses
once per set (passing the plain flat set) and NILs out SELECT
columns that aren't in the current set — the standard subtotal
placeholder. Fast path (no ROLLUP/CUBE/GSETS node) short-circuits
to the original single-pass logic.
Correctness check: `SELECT region, SUM(amount) FROM sales GROUP BY
ROLLUP(region)` on a 5-region dataset now returns 6 rows (5
per-region subtotals + 1 grand total row with region=NIL). Was 1.
--- 2. Correlated subquery memoization (TSqlExecutor) ---
Committed 9e0f82c fixed a silent caching bug that made correlated
subqueries return the first outer-row's result for every subsequent
row, at the cost of dropping caching entirely — every outer row
re-executed the subquery. For Q8 in the SQL:2013 bench (1000 emps,
correlated on 3 distinct depts) that was 4.9 seconds.
The right answer is to memoize per outer-key, not globally. This
commit adds:
- TSqlExecutor:CollectFreeVars(hQ): walks a subquery's WHERE,
columns, and HAVING for ND_COL references whose alias prefix
isn't one of the subquery's own FROM tables. Those are the
outer columns the subquery actually depends on.
- TSqlExecutor:SubqueryCached(xSubNode): runs the free-var
analysis once per distinct AST node (memoized onto a 6th slot
on the node), builds a cache key from the current values of
those free vars via ::Resolve(), looks up in ::hSubCorrCache,
executes on miss. Non-correlated subqueries end up with an
empty free-var list → single cache entry → same behavior as
the old CacheSubquery fast path.
- ND_SUB and ND_SUB-in-IN handlers route through SubqueryCached
instead of the split cache/push-outer logic.
Plus a correctness fix that SubqueryCached surfaced: when a
subquery runs at nDepth > 1, TSqlExecutor rewrites each FROM
table's alias to a depth-suffixed temp (so concurrent opens of
the same file don't collide). Previously the original user-written
alias was only preserved in aTables[i][3] for single-char aliases.
Multi-char aliases like `emp e2` lost their original after the
rename, so FindWA("E2") failed, Resolve("e2.dept") returned NIL,
and `WHERE e2.dept = e1.dept` evaluated NIL=NIL → every row was
filtered out → subquery AVG returned 0 → outer `salary > 0` was
trivially true for everyone. Now we always stash the original
alias in [3] before the rename.
--- Bench (SQL:2013 analytics, 10 queries, emp=1k, sales=20k) ---
Query Before After Δ
────────────────────────────────────────────────────────
Q6 RECURSIVE hierarchy (prev fix) 30ms
Q7 ROLLUP subtotals 86ms, 1 row 106ms, 6 rows (correct)
Q8 Correlated subquery 4933ms 20ms ~245x
(all other queries unchanged at 4–230ms)
Q8 30-row sanity regression test (emp.dept in {A,B,C}, deterministic
salaries so hand-computed averages are 155/810/1765):
SELECT name, dept, salary FROM emp e1
WHERE salary > (SELECT AVG(salary) FROM emp e2 WHERE e2.dept = e1.dept)
Before: 30 rows (wrong — returns all)
After: 15 rows (correct — 5 above each dept's average)
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
Surfaced by complex-query benchmarking. Query like:
SELECT d.name AS dept, COUNT(*) AS n, SUM(o.amount) AS total
FROM dept d INNER JOIN emp e ON ... INNER JOIN ord o ON ...
GROUP BY d.name
returned exactly 1 row instead of 100. Removing the AS aliases made
it work correctly. Semantic bug, not a performance issue.
Root cause: TSqlAgg:GroupBy resolved each GROUP BY column by calling
FindColIdx against aFN — the output alias list. For GROUP BY d.name
with d.name AS dept, the group expression's column name was looked
up in {"dept","n","total"} and missed. FindColIdx returned 0, every
row got an empty group key, and the hash collapsed everything into
one bucket.
Fix: new FindGroupIdx walks aCols (SELECT list expressions) instead,
matching the GROUP BY column against each SELECT item's source
expression ND_COL name. Handles qualified refs (d.name -> NAME) and
falls back to FindColIdx for cases where GROUP BY uses a column not
in the SELECT list.
Also hoisted the resolution out of the per-row loop — GROUP BY
columns resolve once into aGroupIdx[] so each row just indexes.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- Complex bench Q4: 1 row -> 100 rows (correct)
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>