Two fixes uncovered by a SQL:2013 analytics benchmark covering the
query patterns people actually run on DBF data (OLAP, BI, hierarchy
traversal).
--- Fix 1: correlated subquery was silently wrong ---
EvalExpr's ND_SUB handler only pushed the outer context when
`s_aOuterStack` was already non-empty — otherwise it routed the
subquery through CacheSubquery, which stores the first result under
a key derived from the subquery's syntax tokens. For a correlated
subquery in a top-level WHERE:
    SELECT name, dept, salary FROM emp e1
    WHERE salary > (SELECT AVG(salary) FROM emp e2 WHERE e2.dept = e1.dept)
the first outer row saw an empty stack, cached the result, and
every subsequent outer row got the same cached value regardless of
e1.dept. The query returned all 1000 employees instead of the 505
who actually beat their department's average.
Fix: always PushOuter + Run, no cache. Correctness over caching.
Trade-off: non-correlated scalar subqueries now re-execute per
outer row. A proper per-outer-key memoization is deferred — it
requires walking the subquery AST to collect free variables.
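
The failure mode is easy to reproduce outside the engine. The sketch below is a toy model, not the EvalExpr code: the four-row `EMP` list and the function names are hypothetical, and the "syntax key" is just the subquery text. It shows why a cache keyed on syntax alone hands every outer row the first row's answer.

```python
from statistics import mean

# Hypothetical miniature of the emp table: (name, dept, salary).
EMP = [
    ("ann", "eng", 100), ("bob", "eng", 60),
    ("cat", "ops", 50), ("dan", "ops", 30),
]

def dept_avg(dept):
    """Stand-in for: SELECT AVG(salary) FROM emp e2 WHERE e2.dept = <dept>."""
    return mean(s for _, d, s in EMP if d == dept)

def query_with_syntax_cache():
    """Buggy shape: the cache key ignores the outer row's dept."""
    cache = {}
    key = "SELECT AVG(salary) FROM emp e2 WHERE e2.dept = e1.dept"
    out = []
    for name, dept, salary in EMP:
        if key not in cache:
            cache[key] = dept_avg(dept)  # computed for the first outer row only
        if salary > cache[key]:
            out.append(name)
    return out

def query_per_row():
    """Fixed shape: re-run the subquery with the current outer row bound."""
    return [name for name, dept, salary in EMP if salary > dept_avg(dept)]

print(query_with_syntax_cache())  # compares every row against the eng average
print(query_per_row())            # compares each row against its own dept
```

With this data the cached version misses `cat`, who beats the ops average but not the eng average that happened to be cached first.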
--- Fix 2: WITH RECURSIVE hierarchy join was O(m*n) ---
RecCteJoin (the in-memory join used when a recursive CTE's step
references both a real table and the CTE frontier) ran a flat
nested loop: for each DBF row × each prev-iteration row, build a
combined row buffer and run SqlEvalRowExpr on the ON condition.
For a 4-level 1000-employee hierarchy that's ~1M ON evaluations,
~4.6 seconds.
Fix: detect the shape `dbfAlias.col = cteAlias.col` at join-setup
time, build a PRG hash on the CTE frontier keyed by its join column
(aPrevRows is always small — at most the last iteration's emitted
rows), then scan the DBF side once and probe the hash. Complex ON
predicates fall through to the original nested loop.
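
As an illustration of the two join shapes, here is a minimal sketch in Python rather than PRG. The row layout (`id`, `mgr`) and all function names are assumptions made for the example, and it assumes the hierarchy is acyclic; it is not the RecCteJoin code.

```python
from collections import defaultdict

def nested_loop_join(dbf_rows, prev_rows, on):
    """Original shape: evaluate the ON predicate for every (dbf, prev) pair."""
    return [(d, p) for d in dbf_rows for p in prev_rows if on(d, p)]

def hash_join(dbf_rows, prev_rows, dbf_col, cte_col):
    """Equi-join shape: index the frontier once, then probe once per DBF row."""
    index = defaultdict(list)
    for p in prev_rows:
        index[p[cte_col]].append(p)
    return [(d, p) for d in dbf_rows for p in index.get(d[dbf_col], ())]

def walk_hierarchy(emp, root_id):
    """Repeat the recursive step until the frontier is empty (acyclic data)."""
    frontier = [e for e in emp if e["id"] == root_id]
    seen = list(frontier)
    while frontier:
        frontier = [d for d, _ in hash_join(emp, frontier, "mgr", "id")]
        seen.extend(frontier)
    return seen

emp = [
    {"id": 1, "mgr": None}, {"id": 2, "mgr": 1},
    {"id": 3, "mgr": 1}, {"id": 4, "mgr": 2},
]
print([e["id"] for e in walk_hierarchy(emp, 1)])
```

Each iteration costs one pass over the frontier to build the index plus one pass over the table to probe it, instead of one predicate evaluation per pair.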
--- Bench (SQL:2013 analytics, emp=1k, sales=20k, evt=30k) ---
Query                           Before    After      Speedup
──────────────────────────────────────────────────────────────
RECURSIVE hierarchy 4-level     4603ms    30ms       ~150x
Correlated subquery (all emp)   10ms ❌   4933ms ✓   (correct)
Other SQL:2013 queries (ROW_NUMBER top-N, running total, moving
average, DENSE_RANK, LAG, NTILE, gaps-and-islands) are all in the
expected 10–230ms range for these dataset sizes, unchanged by
this commit.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Known follow-ups (not in this commit):
- Q7 ROLLUP(col) parses but isn't expanded in GroupBy — returns
a single grand-total row instead of per-value + total. Grouping
sets implementation is a separate feature.
- Correlated subquery memoization by outer free-variable key
would bring Q8 from 4.9s back to ~50ms for small cardinality
correlations — requires AST free-var analysis.
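
One possible shape for that memoization, sketched in Python under the assumption that the free-variable list has already been extracted from the subquery AST. The names `memoize_by_outer_key` and `free_vars`, and the four-row `EMP` list, are hypothetical.

```python
from statistics import mean

# Hypothetical rows standing in for emp: (name, dept, salary).
EMP = [("ann", "eng", 100), ("bob", "eng", 60),
       ("cat", "ops", 50), ("dan", "ops", 30)]

def memoize_by_outer_key(run_subquery, free_vars):
    """Cache subquery results per tuple of outer values it actually reads."""
    cache = {}

    def evaluate(outer_row):
        key = tuple(outer_row[v] for v in free_vars)
        if key not in cache:
            evaluate.executions += 1
            cache[key] = run_subquery(outer_row)
        return cache[key]

    evaluate.executions = 0
    return evaluate

def dept_avg(outer_row):
    """Stand-in for the correlated AVG(salary) subquery."""
    return mean(s for _, d, s in EMP if d == outer_row["dept"])

def beat_dept_average():
    subquery = memoize_by_outer_key(dept_avg, free_vars=["dept"])
    names = [n for n, d, s in EMP if s > subquery({"dept": d})]
    return names, subquery.executions

print(beat_dept_average())
```

The subquery runs once per distinct `dept` rather than once per outer row. With an empty `free_vars` list the key is always `()`, so a non-correlated subquery would run exactly once.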
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FiveSql2 — SQL Engine for Harbour DBF/NTX/CDX
Pratt parser + SQL:1992-2023 full standard support
Supports both NTX (Clipper) and CDX (FoxPro/ADS) indexes
Architecture
five_SQL("SELECT ...")
│
├── TSqlLexer Tokenizer
├── TSqlParser2 Pratt parser (data-driven operators)
├── TSqlExecutor Query executor (Volcano model)
│ ├── TSqlAlias Central alias manager (no collisions)
│ ├── TSqlIndex NTX/CDX index optimization (auto-detect)
│ ├── TSqlAgg GROUP BY / aggregation
│ ├── TSqlSort ORDER BY / DISTINCT
│ ├── TSqlDDL CREATE/DROP/ALTER TABLE/INDEX
│ └── TSqlTxn BEGIN/COMMIT/ROLLBACK
├── TSqlExpr AST nodes + expression evaluation
└── TSqlFunc 60+ scalar functions
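
For readers unfamiliar with the Volcano model named above: each operator pulls rows one at a time from its child, so a plan is a chain of iterators. The sketch below uses Python generators to show the idea; the operator names are illustrative and are not the TSqlExecutor API.

```python
from itertools import islice

def scan(rows):
    """Leaf operator: yield table rows one at a time."""
    for row in rows:
        yield row

def select(child, predicate):
    """Filter operator: pass through rows that satisfy the predicate."""
    for row in child:
        if predicate(row):
            yield row

def project(child, columns):
    """Projection operator: keep only the requested columns."""
    for row in child:
        yield {c: row[c] for c in columns}

def limit(child, n):
    """Stop pulling from the child after n rows."""
    yield from islice(child, n)

rows = [
    {"name": "ann", "salary": 100}, {"name": "bob", "salary": 60},
    {"name": "cat", "salary": 50}, {"name": "dan", "salary": 30},
]
plan = limit(project(select(scan(rows), lambda r: r["salary"] > 40), ["name"]), 2)
print(list(plan))
```

Because every stage is lazy, `limit` stops the whole pipeline as soon as it has its two rows.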
Build & Test
export PATH="/path/to/harbour-core/bin/linux/gcc:$PATH"
export HB_INSTALL_PREFIX="/path/to/harbour-core"
make # Build all tests
make test # Run all 157 tests
make bench # Parser benchmark
make clean # Clean
SQL Standard Coverage
| Standard | Features | Tests |
|---|---|---|
| SQL:1992 | SELECT, JOIN, GROUP BY, HAVING, Subquery, CASE, CAST | 43 |
| SQL:1999 | CTE, Recursive CTE, Window Functions, MERGE | 10 |
| SQL:2003 | SIMILAR TO, GROUPING SETS, LATERAL, Window frames | 64 |
| SQL:2008 | FETCH/OFFSET, FOR UPDATE, Extended MERGE | (incl.) |
| SQL:2016 | JSON functions, LISTAGG | (incl.) |
| SQL:2023 | ANY_VALUE, GREATEST/LEAST, BOOL_AND/OR | (incl.) |
| Challenge | LeetCode-level complex queries | 15 |
| Extreme | Production analytics stress tests | 15 |
Adding New Operators
Edit TSqlParser2.prg, method InitInfixTables():
::hInfixTT[ TK_MYOP ] := { "<=>", 40, 41, ND_BIN }
One line. No structural changes needed.
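
To show why a data-driven table makes this a one-line change, here is a minimal Pratt parser sketch in Python. It assumes the two numbers in the entry above are left and right binding powers; that reading, the `INFIX` dict, and the tuple-based AST are assumptions of the example, not the TSqlParser2 internals.

```python
import re

# operator -> (left binding power, right binding power)
INFIX = {
    "=": (40, 41),
    "+": (50, 51),
    "*": (60, 61),
}

def tokenize(text):
    """Split into integers, the <=> operator, and single-character symbols."""
    return re.findall(r"\d+|<=>|[+*=()]", text)

def parse(tokens, min_bp=0):
    """Parse an expression whose operators bind at least as tightly as min_bp."""
    tok = tokens.pop(0)
    if tok == "(":
        left = parse(tokens, 0)
        tokens.pop(0)  # consume ")"
    else:
        left = int(tok)
    while tokens and tokens[0] in INFIX:
        lbp, rbp = INFIX[tokens[0]]
        if lbp < min_bp:
            break
        op = tokens.pop(0)
        right = parse(tokens, rbp)
        left = (op, left, right)
    return left

# Adding an operator is one table entry; parse() itself is unchanged.
INFIX["<=>"] = (40, 41)

print(parse(tokenize("1 <=> 2 + 3 * 4")))
```

A right binding power one higher than the left makes the operator left-associative, which is why the pairs are written as 40/41 rather than 40/40.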
Copyright
Copyright (c) 2025-2026 Charles KWON (Charles KWON OhJun)
Email: charleskwonohjun@gmail.com
All rights reserved.