Two SQL:2013 features that were stubs or bugs. Both ship together
because they share testing infrastructure (the SQL:2013 analytics
bench).
--- 1. ROLLUP / CUBE / GROUPING SETS (TSqlAgg) ---
The parser has recognized these for a while, storing them as
`ND_FN "ROLLUP"` / "CUBE" / "GROUPING SETS" nodes inside the
GROUP BY list. GroupBy never actually expanded them — it treated
the ND_FN as an opaque group term, which meant every row hashed
into the empty bucket and the query returned a single row.
New TSqlAgg:ExpandGroupingSets walks the aGroupBy array and
expands each ROLLUP / CUBE / GSETS modifier into a list of flat
grouping sets by cross-product with the surrounding plain terms:
GROUP BY ROLLUP(a, b, c) → {(a,b,c), (a,b), (a), ()}
GROUP BY CUBE(a, b) → {(a,b), (a), (b), ()}
GROUP BY GROUPING SETS((a,b),()) → as-is
GROUP BY x, ROLLUP(a, b) → {(x,a,b), (x,a), (x)}
When the expansion produces more than one set, GroupBy recurses
once per set (passing the plain flat set) and NILs out SELECT
columns that aren't in the current set — the standard subtotal
placeholder. Fast path (no ROLLUP/CUBE/GSETS node) short-circuits
to the original single-pass logic.
Correctness check: `SELECT region, SUM(amount) FROM sales GROUP BY
ROLLUP(region)` on a 5-region dataset now returns 6 rows (5
per-region subtotals + 1 grand total row with region=NIL). Was 1.
--- 2. Correlated subquery memoization (TSqlExecutor) ---
Committed 9e0f82c fixed a silent caching bug that made correlated
subqueries return the first outer-row's result for every subsequent
row, at the cost of dropping caching entirely — every outer row
re-executed the subquery. For Q8 in the SQL:2013 bench (1000 emps,
correlated on 3 distinct depts) that was 4.9 seconds.
The right answer is to memoize per outer-key, not globally. This
commit adds:
- TSqlExecutor:CollectFreeVars(hQ): walks a subquery's WHERE,
columns, and HAVING for ND_COL references whose alias prefix
isn't one of the subquery's own FROM tables. Those are the
outer columns the subquery actually depends on.
- TSqlExecutor:SubqueryCached(xSubNode): runs the free-var
analysis once per distinct AST node (memoized onto a 6th slot
on the node), builds a cache key from the current values of
those free vars via ::Resolve(), looks up in ::hSubCorrCache,
executes on miss. Non-correlated subqueries end up with an
empty free-var list → single cache entry → same behavior as
the old CacheSubquery fast path.
- ND_SUB and ND_SUB-in-IN handlers route through SubqueryCached
instead of the split cache/push-outer logic.
Plus a correctness fix that SubqueryCached surfaced: when a
subquery runs at nDepth > 1, TSqlExecutor rewrites each FROM
table's alias to a depth-suffixed temp (so concurrent opens of
the same file don't collide). Previously the original user-written
alias was only preserved in aTables[i][3] for single-char aliases.
Multi-char aliases like `emp e2` lost their original after the
rename, so FindWA("E2") failed, Resolve("e2.dept") returned NIL,
and `WHERE e2.dept = e1.dept` evaluated NIL=NIL → every row was
filtered out → subquery AVG returned 0 → outer `salary > 0` was
trivially true for everyone. Now we always stash the original
alias in [3] before the rename.
--- Bench (SQL:2013 analytics, 10 queries, emp=1k, sales=20k) ---
Query Before After Δ
────────────────────────────────────────────────────────
Q6 RECURSIVE hierarchy (prev fix) 30ms
Q7 ROLLUP subtotals 86ms, 1 row 106ms, 6 rows (correct)
Q8 Correlated subquery 4933ms 20ms ~245x
(all other queries unchanged at 4–230ms)
Q8 30-row sanity regression test (emp.dept in {A,B,C}, deterministic
salaries so hand-computed averages are 155/810/1765):
SELECT name, dept, salary FROM emp e1
WHERE salary > (SELECT AVG(salary) FROM emp e2 WHERE e2.dept = e1.dept)
Before: 30 rows (wrong — returns all)
After: 15 rows (correct — 5 above each dept's average)
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FiveSql2 — SQL Engine for Harbour DBF/NTX/CDX
Pratt parser + SQL:1992-2023 full standard support Supports both NTX (Clipper) and CDX (FoxPro/ADS) indexes
Architecture
five_SQL("SELECT ...")
│
├── TSqlLexer Tokenizer
├── TSqlParser2 Pratt parser (data-driven operators)
├── TSqlExecutor Query executor (Volcano model)
│ ├── TSqlAlias Central alias manager (no collisions)
│ ├── TSqlIndex NTX/CDX index optimization (auto-detect)
│ ├── TSqlAgg GROUP BY / aggregation
│ ├── TSqlSort ORDER BY / DISTINCT
│ ├── TSqlDDL CREATE/DROP/ALTER TABLE/INDEX
│ └── TSqlTxn BEGIN/COMMIT/ROLLBACK
├── TSqlExpr AST nodes + expression evaluation
└── TSqlFunc 60+ scalar functions
Build & Test
export PATH="/path/to/harbour-core/bin/linux/gcc:$PATH"
export HB_INSTALL_PREFIX="/path/to/harbour-core"
make # Build all tests
make test # Run all 157 tests
make bench # Parser benchmark
make clean # Clean
SQL Standard Coverage
| Standard | Features | Tests |
|---|---|---|
| SQL:1992 | SELECT, JOIN, GROUP BY, HAVING, Subquery, CASE, CAST | 43 |
| SQL:1999 | CTE, Recursive CTE, Window Functions, MERGE | 10 |
| SQL:2003 | SIMILAR TO, GROUPING SETS, LATERAL, Window frames | 64 |
| SQL:2008 | FETCH/OFFSET, FOR UPDATE, Extended MERGE | (incl.) |
| SQL:2016 | JSON functions, LISTAGG | (incl.) |
| SQL:2023 | ANY_VALUE, GREATEST/LEAST, BOOL_AND/OR | (incl.) |
| Challenge | LeetCode-level complex queries | 15 |
| Extreme | Production analytics stress tests | 15 |
Adding New Operators
Edit TSqlParser2.prg, method InitInfixTables():
::hInfixTT[ TK_MYOP ] := { "<=>", 40, 41, ND_BIN }
One line. No structural changes needed.
Copyright
Copyright (c) 2025-2026 Charles KWON (Charles KWON OhJun) Email: charleskwonohjun@gmail.com All rights reserved.