4 Commits

Author SHA1 Message Date
7babfb7281 fix(FiveSql2): 9 latent bugs from static analysis sweep
Systematic bug-hunt driven by an automated analysis of all FiveSql2
source files. Each fix is targeted — no speculative refactoring.

--- #1 CLASSDATA hSubCache leaked across queries (CRITICAL) ---

  CLASSDATA hSubCache INIT { => } SHARED

shared one hash across ALL TSqlExecutor instances. A non-correlated
subquery cached in query A was silently returned for an unrelated
query B if the subquery text happened to produce the same cache key.
Converted to instance DATA initialized in New().
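
The leak pattern can be illustrated outside Harbour. The following is a
minimal Python analogue, not the FiveSql2 source: the class names and
the run_subquery helper are hypothetical. A class-level dict plays the
role of the SHARED CLASSDATA hash, and a dict created in the
constructor plays the role of the instance DATA.

```python
class SharedCacheExecutor:
    # Class-level dict: one object shared by every instance (the bug pattern).
    sub_cache = {}

    def run_subquery(self, key, compute):
        if key not in self.sub_cache:
            self.sub_cache[key] = compute()
        return self.sub_cache[key]


class InstanceCacheExecutor:
    def __init__(self):
        # Per-instance dict created in the constructor (the fix pattern).
        self.sub_cache = {}

    def run_subquery(self, key, compute):
        if key not in self.sub_cache:
            self.sub_cache[key] = compute()
        return self.sub_cache[key]


# Two executors whose subqueries happen to produce the same cache key.
a, b = SharedCacheExecutor(), SharedCacheExecutor()
a.run_subquery("k", lambda: "result for query A")
leaked = b.run_subquery("k", lambda: "result for query B")

c, d = InstanceCacheExecutor(), InstanceCacheExecutor()
c.run_subquery("k", lambda: "result for query A")
isolated = d.run_subquery("k", lambda: "result for query B")
```

With the shared dict, the second executor gets the first executor's
cached value; with the per-instance dict it computes its own.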

--- #5+#21 IS NULL / COALESCE treated empty string as NULL (HIGH) ---

  RETURN xL == NIL .OR. ( ValType(xL) == "C" .AND. Empty(AllTrim(xL)) )

The check above treated '' and all-blank strings as NULL. Per the SQL
standard, '' is a valid non-NULL value. Removed the empty-string test
from both IS NULL evaluation and COALESCE skip logic.
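
A minimal Python sketch of the intended semantics, assuming None stands
in for NIL/NULL. The is_null and coalesce names are illustrative and
are not taken from the FiveSql2 source.

```python
def is_null(value):
    # Only a missing value is NULL; '' and blank strings are real values.
    return value is None


def coalesce(*args):
    # Return the first non-NULL argument, or NULL if there is none.
    for value in args:
        if not is_null(value):
            return value
    return None
```

Under this rule, coalesce(None, '', 'x') yields '' rather than skipping
ahead to 'x'.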

--- #4 Multiple ? parameters all returned first value (HIGH) ---

ND_PAR nodes had no index — EvalExpr always returned ::aParams[1].
Parser now stamps each ? with a sequential 1-based index in xNode[2].
EvalExpr uses it to return the correct ::aParams[n].
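
The stamping idea, sketched in Python under simplifying assumptions:
the token list, the ("PAR", n) node shape, and both function names are
hypothetical stand-ins for the parser and EvalExpr.

```python
def stamp_params(tokens):
    """Replace each '?' token with a ("PAR", n) node, n counting from 1."""
    nodes, next_index = [], 1
    for tok in tokens:
        if tok == "?":
            nodes.append(("PAR", next_index))
            next_index += 1
        else:
            nodes.append(("TOK", tok))
    return nodes


def eval_param(node, params):
    # node[1] is the 1-based index stamped by the parser.
    return params[node[1] - 1]


nodes = stamp_params(["a", "=", "?", "AND", "b", "=", "?"])
param_nodes = [n for n in nodes if n[0] == "PAR"]
values = [eval_param(n, [10, 20]) for n in param_nodes]
```

Each placeholder now resolves to its own value instead of all of them
resolving to the first one.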

--- #10+#11 SqlEvalRowExpr missing / and || operators, single-arg
    function eval (MEDIUM) ---

Division (/) and string concatenation (||) fell through to RETURN NIL
in SqlEvalRowExpr, the row-expression evaluator used by recursive CTEs
and by ComputeAgg for aggregates. Function calls were also evaluated
with only their first argument, so multi-argument functions like
SUBSTR(x,2,3) lost the rest. Both fixed.
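
A small Python sketch of a row-expression evaluator that covers both
cases. The tuple-based node layout and the eval_row_expr name are
assumptions made for illustration; they do not mirror the real AST.

```python
def eval_row_expr(node, row):
    """Evaluate a tiny expression tree against one row (a dict)."""
    kind = node[0]
    if kind == "LIT":
        return node[1]
    if kind == "COL":
        return row[node[1]]
    if kind == "BIN":
        op = node[1]
        left = eval_row_expr(node[2], row)
        right = eval_row_expr(node[3], row)
        if left is None or right is None:
            return None                      # NULL propagates
        if op == "+":
            return left + right
        if op == "-":
            return left - right
        if op == "*":
            return left * right
        if op == "/":
            return left / right              # division by zero is not handled here
        if op == "||":
            return str(left) + str(right)
    if kind == "FN":
        # Evaluate every argument, not just the first one.
        args = [eval_row_expr(a, row) for a in node[2]]
        if node[1] == "SUBSTR":
            text, start, length = args
            return text[start - 1:start - 1 + length]   # SQL positions are 1-based
    return None
```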

--- #9 SUM/AVG/MIN/MAX of all NULLs returned 0 instead of NULL
    (MEDIUM) ---

SQL standard requires NULL. Changed the aggregate return path to
return NIL when nCount == 0 (SUM/AVG) or when xMin/xMax == NIL.
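
The rule can be sketched in Python, with None standing in for NIL. The
function names are illustrative only.

```python
def sql_sum(values):
    # NULLs are skipped; with nothing left to add, the result is NULL, not 0.
    present = [v for v in values if v is not None]
    return sum(present) if present else None


def sql_avg(values):
    present = [v for v in values if v is not None]
    return sum(present) / len(present) if present else None
```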

--- #8 MIN/MAX used SqlCoerceNum for comparison (MEDIUM) ---

Strings and dates were coerced to numbers (Val()) before comparing,
making MIN('banana') == MIN('apple') == 0. Switched to SqlCmpLt
which handles type-appropriate comparison.
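
A Python analogue of the difference, assuming a coerce_num helper that
mimics Val()-style coercion (non-numeric text becomes 0). Neither
function is the real SqlCoerceNum or SqlCmpLt.

```python
from datetime import date


def coerce_num(value):
    # Mimics the old comparison key: anything non-numeric collapses to 0.
    try:
        return float(value)
    except (TypeError, ValueError):
        return 0.0


def sql_min(values):
    # Compare values in their own type (strings as strings, dates as dates).
    present = [v for v in values if v is not None]
    return min(present) if present else None


old_keys = [coerce_num("banana"), coerce_num("apple")]
new_min = sql_min(["banana", "apple", None])
earliest = sql_min([date(2024, 5, 1), date(2023, 1, 1)])
```

Under coercion both strings map to the same key, so the minimum is
arbitrary; the typed comparison picks 'apple'.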

--- #7 SqlExprHasAgg only checked top-level node (MEDIUM) ---

Expressions like `salary + COUNT(*)` were not detected as containing
an aggregate because the top node was ND_BIN, not ND_FN. Made the
function recursive — walks ND_BIN, ND_UNI, ND_FN args, ND_CASE
branches.
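
A recursive walk of this kind might look as follows in Python. The
tuple node shapes ("BIN", "UNI", "FN", "CASE") and the has_agg name are
hypothetical; they only loosely follow the ND_* node kinds named above.

```python
AGGREGATES = {"COUNT", "SUM", "AVG", "MIN", "MAX"}


def has_agg(node):
    """Return True if the expression tree contains an aggregate call."""
    if node is None:
        return False
    kind = node[0]
    if kind == "FN":                      # ("FN", name, [args])
        if node[1].upper() in AGGREGATES:
            return True
        return any(has_agg(arg) for arg in node[2])
    if kind == "BIN":                     # ("BIN", op, left, right)
        return has_agg(node[2]) or has_agg(node[3])
    if kind == "UNI":                     # ("UNI", op, operand)
        return has_agg(node[2])
    if kind == "CASE":                    # ("CASE", [(when, then), ...], else)
        branches = any(has_agg(w) or has_agg(t) for w, t in node[1])
        return branches or has_agg(node[2])
    return False                          # columns, literals, '*'
```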

--- #13 SELECT * only expanded first table in JOINs (MEDIUM) ---

`SELECT * FROM orders o JOIN customers c ON ...` only included
fields from orders. Changed the expansion loop to iterate ALL
entries in ::aTables.
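
In Python terms the expansion amounts to iterating every table rather
than only the first. The (alias, fields) layout and the expand_star
name are assumptions, not the real ::aTables structure.

```python
def expand_star(tables):
    """tables: list of (alias, [field names]) in FROM/JOIN order."""
    return [f"{alias}.{field}" for alias, fields in tables for field in fields]


columns = expand_star([("o", ["id", "cust_id"]), ("c", ["id", "name"])])
```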

--- #2 s_aOuterStack not unwound on subquery error (HIGH) ---

SubqueryCached's PushOuter/PopOuter pair was not protected by
BEGIN SEQUENCE. A runtime error inside the subquery left a stale
entry on the module-level outer stack, corrupting all subsequent
queries' correlated column resolution. Wrapped in SEQUENCE/RECOVER.
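
The same guarantee, sketched in Python with try/finally as a rough
analogue of BEGIN SEQUENCE / RECOVER. The outer_stack list and the
run_subquery helper are hypothetical.

```python
outer_stack = []


def run_subquery(outer_row, execute):
    outer_stack.append(outer_row)        # PushOuter
    try:
        return execute()
    finally:
        # Runs on success and on error, so the stack is always unwound.
        outer_stack.pop()                # PopOuter


def failing_subquery():
    raise RuntimeError("runtime error inside the subquery")


try:
    run_subquery({"dept": "A"}, failing_subquery)
except RuntimeError:
    pass

depth_during_run = run_subquery({"dept": "B"}, lambda: len(outer_stack))
```

After the failing call the stack is empty again, so the next query
starts from a clean state.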

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-16 17:26:05 +09:00
2d9023622c feat(FiveSql2): ROLLUP/CUBE/GROUPING SETS + correlated subquery memoization
Two SQL:2013 features that were stubs or bugs. Both ship together
because they share testing infrastructure (the SQL:2013 analytics
bench).

--- 1. ROLLUP / CUBE / GROUPING SETS (TSqlAgg) ---

The parser has recognized these for a while, storing them as
`ND_FN "ROLLUP"` / "CUBE" / "GROUPING SETS" nodes inside the
GROUP BY list. GroupBy never actually expanded them — it treated
the ND_FN as an opaque group term, which meant every row hashed
into the empty bucket and the query returned a single row.

New TSqlAgg:ExpandGroupingSets walks the aGroupBy array and
expands each ROLLUP / CUBE / GSETS modifier into a list of flat
grouping sets by cross-product with the surrounding plain terms:

    GROUP BY ROLLUP(a, b, c)          → {(a,b,c), (a,b), (a), ()}
    GROUP BY CUBE(a, b)               → {(a,b), (a), (b), ()}
    GROUP BY GROUPING SETS((a,b),())  → as-is
    GROUP BY x, ROLLUP(a, b)          → {(x,a,b), (x,a), (x)}

When the expansion produces more than one set, GroupBy recurses
once per set (passing the plain flat set) and NILs out SELECT
columns that aren't in the current set — the standard subtotal
placeholder. Fast path (no ROLLUP/CUBE/GSETS node) short-circuits
to the original single-pass logic.
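
The expansion rule can be sketched in Python. This is an illustration
of the cross-product described above, not TSqlAgg:ExpandGroupingSets
itself; the term encoding (plain strings, or a tagged tuple for
ROLLUP / CUBE / GSETS) is an assumption.

```python
from itertools import combinations, product


def rollup(cols):
    # ROLLUP(a, b, c) -> (a,b,c), (a,b), (a), ()
    return [tuple(cols[:n]) for n in range(len(cols), -1, -1)]


def cube(cols):
    # CUBE(a, b) -> every subset, largest first
    sets = []
    for n in range(len(cols), -1, -1):
        sets.extend(combinations(cols, n))
    return sets


def expand_group_by(terms):
    """terms: plain column names (str) or ("ROLLUP" | "CUBE" | "GSETS", [...])."""
    per_term = []
    for term in terms:
        if isinstance(term, str):
            per_term.append([(term,)])
        elif term[0] == "ROLLUP":
            per_term.append(rollup(term[1]))
        elif term[0] == "CUBE":
            per_term.append(cube(term[1]))
        else:  # GSETS: the listed sets are used as-is
            per_term.append([tuple(s) for s in term[1]])
    # Cross-product: pick one set per term and concatenate the picks.
    return [sum(choice, ()) for choice in product(*per_term)]
```

For example, expand_group_by(["x", ("ROLLUP", ["a", "b"])]) yields the
three sets (x,a,b), (x,a), (x), matching the last row of the table
above.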

Correctness check: `SELECT region, SUM(amount) FROM sales GROUP BY
ROLLUP(region)` on a 5-region dataset now returns 6 rows (5
per-region subtotals + 1 grand total row with region=NIL). Was 1.

--- 2. Correlated subquery memoization (TSqlExecutor) ---

Commit 9e0f82c fixed a silent caching bug that made correlated
subqueries return the first outer-row's result for every subsequent
row, at the cost of dropping caching entirely — every outer row
re-executed the subquery. For Q8 in the SQL:2013 bench (1000 emps,
correlated on 3 distinct depts) that was 4.9 seconds.

The right answer is to memoize per outer-key, not globally. This
commit adds:

  - TSqlExecutor:CollectFreeVars(hQ): walks a subquery's WHERE,
    columns, and HAVING for ND_COL references whose alias prefix
    isn't one of the subquery's own FROM tables. Those are the
    outer columns the subquery actually depends on.

  - TSqlExecutor:SubqueryCached(xSubNode): runs the free-var
    analysis once per distinct AST node (memoized onto a 6th slot
    on the node), builds a cache key from the current values of
    those free vars via ::Resolve(), looks up in ::hSubCorrCache,
    executes on miss. Non-correlated subqueries end up with an
    empty free-var list → single cache entry → same behavior as
    the old CacheSubquery fast path.

  - ND_SUB and ND_SUB-in-IN handlers route through SubqueryCached
    instead of the split cache/push-outer logic.
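
Memoizing per outer key can be sketched in Python. The SubqueryMemo
class, its run method, and the sample data are hypothetical; the free
variable list is passed in by hand here rather than derived from an
AST as CollectFreeVars does.

```python
class SubqueryMemo:
    def __init__(self):
        self.cache = {}
        self.executions = 0

    def run(self, sub_id, free_vars, outer_row, execute):
        # Key = subquery identity + current values of the outer columns it reads.
        key = (sub_id, tuple(outer_row[name] for name in free_vars))
        if key not in self.cache:
            self.executions += 1
            self.cache[key] = execute(outer_row)
        return self.cache[key]


emp = [{"dept": d, "salary": s} for d, s in
       [("A", 100), ("A", 200), ("B", 300), ("B", 500), ("A", 300)]]


def avg_for_dept(outer_row):
    # Stand-in for: SELECT AVG(salary) FROM emp e2 WHERE e2.dept = e1.dept
    same = [e["salary"] for e in emp if e["dept"] == outer_row["dept"]]
    return sum(same) / len(same)


memo = SubqueryMemo()
above_avg = [e for e in emp
             if e["salary"] > memo.run("q1", ["dept"], e, avg_for_dept)]
```

Five outer rows trigger only two subquery executions, one per distinct
dept. With an empty free-var list the key never changes, which gives
the single-entry behavior described for non-correlated subqueries.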

Plus a correctness fix that SubqueryCached surfaced: when a
subquery runs at nDepth > 1, TSqlExecutor rewrites each FROM
table's alias to a depth-suffixed temp (so concurrent opens of
the same file don't collide). Previously the original user-written
alias was only preserved in aTables[i][3] for single-char aliases.
Multi-char aliases like `emp e2` lost their original after the
rename, so FindWA("E2") failed, Resolve("e2.dept") returned NIL,
and `WHERE e2.dept = e1.dept` evaluated NIL=NIL → every row was
filtered out → subquery AVG returned 0 → outer `salary > 0` was
trivially true for everyone. Now we always stash the original
alias in [3] before the rename.

--- Bench (SQL:2013 analytics, 10 queries, emp=1k, sales=20k) ---

  Query                        Before        After     Δ
  ────────────────────────────────────────────────────────
  Q6 RECURSIVE hierarchy       (prev fix)    30ms
  Q7 ROLLUP subtotals          86ms, 1 row   106ms, 6 rows  (correct)
  Q8 Correlated subquery       4933ms        20ms           ~245x
  (all other queries unchanged at 4–230ms)

Q8 30-row sanity regression test (emp.dept in {A,B,C}, deterministic
salaries so hand-computed averages are 155/810/1765):

    SELECT name, dept, salary FROM emp e1
    WHERE salary > (SELECT AVG(salary) FROM emp e2 WHERE e2.dept = e1.dept)

  Before: 30 rows (wrong — returns all)
  After:  15 rows (correct — 5 above each dept's average)

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 10:13:31 +09:00
c6799a599e fix(FiveSql2): GROUP BY with aliased SELECT collapses all rows into one
Surfaced by complex-query benchmarking. A query like:

    SELECT d.name AS dept, COUNT(*) AS n, SUM(o.amount) AS total
    FROM dept d INNER JOIN emp e ON ... INNER JOIN ord o ON ...
    GROUP BY d.name

returned exactly 1 row instead of 100. Removing the AS aliases made
it work correctly. Semantic bug, not a performance issue.

Root cause: TSqlAgg:GroupBy resolved each GROUP BY column by calling
FindColIdx against aFN — the output alias list. For GROUP BY d.name
with d.name AS dept, the group expression's column name was looked
up in {"dept","n","total"} and missed. FindColIdx returned 0, every
row got an empty group key, and the hash collapsed everything into
one bucket.

Fix: new FindGroupIdx walks aCols (SELECT list expressions) instead,
matching the GROUP BY column against each SELECT item's source
expression ND_COL name. Handles qualified refs (d.name -> NAME) and
falls back to FindColIdx for cases where GROUP BY uses a column not
in the SELECT list.

Also hoisted the resolution out of the per-row loop — GROUP BY
columns resolve once into aGroupIdx[] so each row just indexes.
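
A Python sketch of the lookup change and the hoisting. The
(source, alias) pairs, the sample rows, and both function bodies are
simplified assumptions; in particular, qualifiers are ignored when
matching names, which the real FindGroupIdx may not do.

```python
def find_col_idx(name, aliases):
    """Old lookup: search the output alias list only. 1-based index, 0 on miss."""
    wanted = name.split(".")[-1].upper()
    for i, alias in enumerate(aliases, start=1):
        if alias.upper() == wanted:
            return i
    return 0


def find_group_idx(name, select_items):
    """New lookup: match against each SELECT item's source column first."""
    wanted = name.split(".")[-1].upper()          # d.name -> NAME
    for i, (source, _alias) in enumerate(select_items, start=1):
        if source is not None and source.split(".")[-1].upper() == wanted:
            return i
    # Fall back to the alias-based lookup.
    return find_col_idx(name, [alias for _source, alias in select_items])


select_items = [("d.name", "dept"), (None, "n"), (None, "total")]
rows = [("Sales", 1, 10), ("Ops", 1, 20), ("Sales", 1, 5)]

# Resolve GROUP BY columns once, outside the per-row loop.
group_idx = [find_group_idx(g, select_items) for g in ["d.name"]]
buckets = {}
for row in rows:
    key = tuple(row[i - 1] for i in group_idx)
    buckets.setdefault(key, []).append(row)
```

The alias-only lookup misses d.name and returns 0, which is the
collapse described above; the source-based lookup resolves it to
column 1 and the rows split into two groups.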

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - Complex bench Q4: 1 row -> 100 rows (correct)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 20:25:02 +09:00
486e466592 feat: FiveSql2 43/43, @byref, mutable closure, RTL 479, DateTime fix
Major changes since last commit:
- FiveSql2 SQL:1999 engine (10,458 LOC) — 43/43 ALL PASS
- 21 compiler/runtime bugs fixed (short-circuit AND/OR, FOR LOOP, etc.)
- @byref pass-by-reference via RefCell pattern
- Mutable closure capture (EnsureLocalRef + RefCell sharing)
- RTL: 400 → 479 functions (+79: file, string, datetime, hash, UTF-8)
- DateTime/Timestamp fully working (hb_DateTime, hb_Hour/Min/Sec, display)
- Reserved word guard (39 keywords blocked from function calls)
- AEval arg order fix (element before index)
- Closure capture redecl fix (unique _cap_ names per block)
- Hash/string indexing in ArrayPush/ArrayPop
- Harbour compat test suite: 51/51
- 4 docs: Porting Report, Implementation Plan, Optimization Plan, Commercialization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 11:35:37 +09:00