five/_FiveSql2
CharlesKWON 2d9023622c feat(FiveSql2): ROLLUP/CUBE/GROUPING SETS + correlated subquery memoization
Two SQL:2013 features that were stubs or bugs. Both ship together
because they share testing infrastructure (the SQL:2013 analytics
bench).

--- 1. ROLLUP / CUBE / GROUPING SETS (TSqlAgg) ---

The parser has recognized these for a while, storing them as
`ND_FN "ROLLUP"` / "CUBE" / "GROUPING SETS" nodes inside the
GROUP BY list. GroupBy never actually expanded them — it treated
the ND_FN as an opaque group term, which meant every row hashed
into the empty bucket and the query returned a single row.

New TSqlAgg:ExpandGroupingSets walks the aGroupBy array and
expands each ROLLUP / CUBE / GSETS modifier into a list of flat
grouping sets by cross-product with the surrounding plain terms:

    GROUP BY ROLLUP(a, b, c)          → {(a,b,c), (a,b), (a), ()}
    GROUP BY CUBE(a, b)               → {(a,b), (a), (b), ()}
    GROUP BY GROUPING SETS((a,b),())  → as-is
    GROUP BY x, ROLLUP(a, b)          → {(x,a,b), (x,a), (x)}
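
The expansion rules above can be sketched in a few lines. This is an illustrative Python model, not the Harbour code in TSqlAgg:ExpandGroupingSets; the term representation (a plain string for a column, a `(kind, args)` pair for a modifier) and the function names are assumptions made for the example.

```python
from itertools import product

def expand_term(term):
    """Return the grouping sets (tuples of column names) that one GROUP BY term stands for."""
    if isinstance(term, str):                 # plain column: exactly one set
        return [(term,)]
    kind, args = term
    if kind == "ROLLUP":                      # every prefix, longest first, ending with ()
        return [tuple(args[:n]) for n in range(len(args), -1, -1)]
    if kind == "CUBE":                        # every subset, columns kept in written order
        n = len(args)
        return [
            tuple(a for i, a in enumerate(args) if mask & (1 << (n - 1 - i)))
            for mask in range(2 ** n - 1, -1, -1)
        ]
    if kind == "GROUPING SETS":               # already a list of flat sets
        return [tuple(s) for s in args]
    raise ValueError(f"unknown grouping modifier: {kind}")

def expand_grouping_sets(group_by):
    """Cross-product the per-term expansions into flat grouping sets."""
    per_term = [expand_term(t) for t in group_by]
    return [sum(combo, ()) for combo in product(*per_term)]

rollup = expand_grouping_sets([("ROLLUP", ["a", "b", "c"])])
cube = expand_grouping_sets([("CUBE", ["a", "b"])])
mixed = expand_grouping_sets(["x", ("ROLLUP", ["a", "b"])])
```

Here `mixed` holds three sets, each starting with `x`, because the plain term contributes a single one-column set to every combination.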

When the expansion produces more than one set, GroupBy recurses
once per set (passing the plain flat set) and NILs out SELECT
columns that aren't in the current set — the standard subtotal
placeholder. Fast path (no ROLLUP/CUBE/GSETS node) short-circuits
to the original single-pass logic.

Correctness check: `SELECT region, SUM(amount) FROM sales GROUP BY
ROLLUP(region)` on a 5-region dataset now returns 6 rows (5
per-region subtotals + 1 grand total row with region=NIL). Was 1.
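
A small model of the per-set pass shows where the sixth row comes from. The sketch below is Python, not the engine's Harbour code; the row layout, the region names, and the `group_sum` helper are hypothetical, and `None` stands in for NIL.

```python
from collections import defaultdict

def group_sum(rows, select_cols, grouping_sets, measure):
    """Run one grouping pass per set; SELECT columns outside the current set become None."""
    out = []
    for gset in grouping_sets:
        totals = defaultdict(int)
        for row in rows:
            key = tuple(row[c] for c in gset)   # () for the grand-total set
            totals[key] += row[measure]
        for key, total in totals.items():
            values = dict(zip(gset, key))
            out.append(tuple(values.get(c) for c in select_cols) + (total,))
    return out

# Hypothetical data: 5 regions, two sales rows each
regions = ["N", "S", "E", "W", "C"]
sales = [{"region": r, "amount": 10 * (i + 1)}
         for i, r in enumerate(regions) for _ in range(2)]

# ROLLUP(region) expands to the sets (region) and ()
result = group_sum(sales, ["region"], [("region",), ()], "amount")
```

With this data `result` contains five per-region rows followed by one row whose region is `None`, the grand total.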

--- 2. Correlated subquery memoization (TSqlExecutor) ---

Commit 9e0f82c fixed a silent caching bug that made correlated
subqueries return the first outer-row's result for every subsequent
row, at the cost of dropping caching entirely — every outer row
re-executed the subquery. For Q8 in the SQL:2013 bench (1000 emps,
correlated on 3 distinct depts) that was 4.9 seconds.

The right answer is to memoize per outer-key, not globally. This
commit adds:

  - TSqlExecutor:CollectFreeVars(hQ): walks a subquery's WHERE,
    columns, and HAVING for ND_COL references whose alias prefix
    isn't one of the subquery's own FROM tables. Those are the
    outer columns the subquery actually depends on.

  - TSqlExecutor:SubqueryCached(xSubNode): runs the free-var
    analysis once per distinct AST node (memoized onto a 6th slot
    on the node), builds a cache key from the current values of
    those free vars via ::Resolve(), looks up in ::hSubCorrCache,
    executes on miss. Non-correlated subqueries end up with an
    empty free-var list → single cache entry → same behavior as
    the old CacheSubquery fast path.

  - ND_SUB and ND_SUB-in-IN handlers route through SubqueryCached
    instead of the split cache/push-outer logic.
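
The idea of keying the cache on the outer values can be sketched as follows. This is a Python illustration under stated assumptions, not the TSqlExecutor code: the subquery is modelled as a dict with hypothetical `from_aliases` and `column_refs` fields, unqualified column names are treated as local to the subquery, and the class and function names are invented for the example.

```python
def collect_free_vars(subquery):
    """Qualified column refs whose alias is not one of the subquery's own FROM aliases."""
    own = {alias.lower() for alias in subquery["from_aliases"]}
    free = []
    for ref in subquery["column_refs"]:            # e.g. "e1.dept"
        alias = ref.split(".", 1)[0].lower()
        if "." in ref and alias not in own and ref not in free:
            free.append(ref)
    return free

class SubqueryCache:
    def __init__(self, run_subquery):
        self.run_subquery = run_subquery           # callable(subquery, outer_row) -> result
        self.cache = {}
        self.executions = 0

    def evaluate(self, subquery, outer_row):
        if "free_vars" not in subquery:            # free-var analysis runs once per node
            subquery["free_vars"] = collect_free_vars(subquery)
        key = (id(subquery), tuple(outer_row[v] for v in subquery["free_vars"]))
        if key not in self.cache:                  # miss: execute and remember
            self.executions += 1
            self.cache[key] = self.run_subquery(subquery, outer_row)
        return self.cache[key]

# Hypothetical data: 3 depts x 10 employees with evenly spaced salaries
emp = [(d, start + step * i)
       for d, start, step in [("A", 110, 10), ("B", 720, 20), ("C", 1630, 30)]
       for i in range(10)]

def avg_salary_in_dept(subquery, outer_row):
    salaries = [s for d, s in emp if d == outer_row["e1.dept"]]
    return sum(salaries) / len(salaries)

sub = {"from_aliases": ["e2"], "column_refs": ["salary", "e2.dept", "e1.dept"]}
cache = SubqueryCache(avg_salary_in_dept)
above = [(d, s) for d, s in emp if s > cache.evaluate(sub, {"e1.dept": d})]
```

In this model the subquery body runs once per distinct department rather than once per outer row, and a subquery with no free variables collapses to a single cache entry.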

Plus a correctness fix that SubqueryCached surfaced: when a
subquery runs at nDepth > 1, TSqlExecutor rewrites each FROM
table's alias to a depth-suffixed temp (so concurrent opens of
the same file don't collide). Previously the original user-written
alias was only preserved in aTables[i][3] for single-char aliases.
Multi-char aliases like `emp e2` lost their original after the
rename, so FindWA("E2") failed, Resolve("e2.dept") returned NIL,
and `WHERE e2.dept = e1.dept` evaluated NIL=NIL → every row was
filtered out → subquery AVG returned 0 → outer `salary > 0` was
trivially true for everyone. Now we always stash the original
alias in [3] before the rename.
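
The alias failure can be modelled in a few lines. This Python sketch is only an illustration: the `[file, alias, original_alias]` layout mirrors the description above (index 2 here plays the role of Harbour's 1-based `[3]`), while the `_<depth>` suffix format and both function names are assumptions.

```python
def rename_for_depth(tables, depth, always_stash=True):
    """Return depth-suffixed copies of [file, alias, original_alias] entries."""
    renamed = []
    for file, alias, _ in tables:
        # Old behaviour: the original alias was kept only for single-char aliases
        keep = always_stash or len(alias) == 1
        renamed.append([file, f"{alias}_{depth}", alias if keep else None])
    return renamed

def find_table(tables, user_alias):
    """Look a table up by the alias the user wrote, case-insensitively."""
    for entry in tables:
        if entry[2] is not None and entry[2].upper() == user_alias.upper():
            return entry
    return None

from_list = [["emp", "e2", None]]
old = rename_for_depth(from_list, 2, always_stash=False)
new = rename_for_depth(from_list, 2)
```

Looking up `"E2"` in `old` finds nothing, which is the failure described above; in `new` the same lookup returns the renamed entry.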

--- Bench (SQL:2013 analytics, 10 queries, emp=1k, sales=20k) ---

  Query                        Before        After          Δ
  ───────────────────────────────────────────────────────────────────
  Q6 RECURSIVE hierarchy       (prev fix)    30ms
  Q7 ROLLUP subtotals          86ms, 1 row   106ms, 6 rows  (correct)
  Q8 Correlated subquery       4933ms        20ms           ~245x
  (all other queries unchanged at 4–230ms)

Q8 30-row sanity regression test (emp.dept in {A,B,C}, deterministic
salaries so hand-computed averages are 155/810/1765):

    SELECT name, dept, salary FROM emp e1
    WHERE salary > (SELECT AVG(salary) FROM emp e2 WHERE e2.dept = e1.dept)

  Before: 30 rows (wrong — returns all)
  After:  15 rows (correct — 5 above each dept's average)
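
The expected count can be cross-checked against an independent engine. The sketch below uses Python's built-in sqlite3 module rather than FiveSql2, and the salary layout is an assumption: ten evenly spaced salaries per department, chosen so the averages come out at 155/810/1765. The actual test data is not shown above.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE emp (name TEXT, dept TEXT, salary INTEGER)")

# (dept, first salary, step): ten evenly spaced salaries per dept (hypothetical values)
layout = [("A", 110, 10), ("B", 720, 20), ("C", 1630, 30)]
rows = [(f"{dept}{i:02d}", dept, start + step * i)
        for dept, start, step in layout for i in range(10)]
conn.executemany("INSERT INTO emp VALUES (?, ?, ?)", rows)

averages = dict(conn.execute("SELECT dept, AVG(salary) FROM emp GROUP BY dept"))

# Same query as the regression test
above = conn.execute(
    "SELECT name, dept, salary FROM emp e1 "
    "WHERE salary > (SELECT AVG(salary) FROM emp e2 WHERE e2.dept = e1.dept)"
).fetchall()
```

With ten evenly spaced salaries per department, exactly the top five of each lie above the department average, so `above` holds 15 rows.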

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 10:13:31 +09:00

FiveSql2 — SQL Engine for Harbour DBF/NTX/CDX

Pratt parser + SQL:1992-2023 full standard support.
Supports both NTX (Clipper) and CDX (FoxPro/ADS) indexes.

Architecture

five_SQL("SELECT ...")
   │
   ├── TSqlLexer        Tokenizer
   ├── TSqlParser2      Pratt parser (data-driven operators)
   ├── TSqlExecutor     Query executor (Volcano model)
   │     ├── TSqlAlias  Central alias manager (no collisions)
   │     ├── TSqlIndex  NTX/CDX index optimization (auto-detect)
   │     ├── TSqlAgg    GROUP BY / aggregation
   │     ├── TSqlSort   ORDER BY / DISTINCT
   │     ├── TSqlDDL    CREATE/DROP/ALTER TABLE/INDEX
   │     └── TSqlTxn    BEGIN/COMMIT/ROLLBACK
   ├── TSqlExpr         AST nodes + expression evaluation
   └── TSqlFunc         60+ scalar functions

Build & Test

export PATH="/path/to/harbour-core/bin/linux/gcc:$PATH"
export HB_INSTALL_PREFIX="/path/to/harbour-core"

make          # Build all tests
make test     # Run all 157 tests
make bench    # Parser benchmark
make clean    # Clean

SQL Standard Coverage

Standard   Features                                                Tests
────────────────────────────────────────────────────────────────────────
SQL:1992   SELECT, JOIN, GROUP BY, HAVING, Subquery, CASE, CAST    43
SQL:1999   CTE, Recursive CTE, Window Functions, MERGE             10
SQL:2003   SIMILAR TO, GROUPING SETS, LATERAL, Window frames       64
SQL:2008   FETCH/OFFSET, FOR UPDATE, Extended MERGE                (incl.)
SQL:2016   JSON functions, LISTAGG                                 (incl.)
SQL:2023   ANY_VALUE, GREATEST/LEAST, BOOL_AND/OR                  (incl.)
Challenge  LeetCode-level complex queries                          15
Extreme    Production analytics stress tests                       15

Adding New Operators

Edit TSqlParser2.prg, method InitInfixTables():

::hInfixTT[ TK_MYOP ] := { "<=>", 40, 41, ND_BIN }

One line. No structural changes needed.
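
Why one table entry is enough can be seen in a minimal data-driven Pratt parser. The sketch below is Python, not TSqlParser2.prg, and it assumes the two numbers in the entry above are left and right binding powers; the operator table, tokenizer, and tuple-shaped AST are invented for the illustration.

```python
import re

# token text -> (left binding power, right binding power); values are illustrative
INFIX = {
    "=": (40, 41),
    "+": (50, 51),
    "*": (60, 61),
}

def tokenize(src):
    # "<=>" is listed first so it is not split into "<", "=", ">"
    return re.findall(r"<=>|\d+|[A-Za-z_]\w*|[()+*=]", src)

def parse(tokens, min_bp=0):
    """Parse one expression; the loop consults only the INFIX table."""
    tok = tokens.pop(0)
    if tok == "(":
        left = parse(tokens, 0)
        tokens.pop(0)                      # consume the closing ")"
    else:
        left = tok                         # number or identifier
    while tokens and tokens[0] in INFIX:
        lbp, rbp = INFIX[tokens[0]]
        if lbp < min_bp:                   # operator binds too weakly for this level
            break
        op = tokens.pop(0)
        right = parse(tokens, rbp)
        left = (op, left, right)
    return left

# Adding an operator is a single table entry; parse() is untouched
INFIX["<=>"] = (40, 41)

tree = parse(tokenize("a <=> b + 1"))
```

Because the right binding power is one higher than the left, operators at the same level group to the left, and `+` binds tighter than `<=>` in `tree`.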

Copyright (c) 2025-2026 Charles KWON (Charles KWON OhJun)
Email: charleskwonohjun@gmail.com
All rights reserved.