Files

CharlesKWON 9e0f82c5a8 perf+fix(FiveSql2): recursive-CTE hash join + correct correlated subqueries

Two fixes uncovered by a SQL:2013 analytics benchmark covering the
query patterns people actually run on DBF data (OLAP, BI, hierarchy
traversal).

--- Fix 1: correlated subquery was silently wrong ---

EvalExpr's ND_SUB handler only pushed the outer context when
`s_aOuterStack` was already non-empty — otherwise it routed the
subquery through CacheSubquery, which stores the first result under
a key derived from the subquery's syntax tokens. For a correlated
subquery in a top-level WHERE:

    SELECT name, dept, salary FROM emp e1
    WHERE salary > (SELECT AVG(salary) FROM emp e2 WHERE e2.dept = e1.dept)

the first outer row saw an empty stack, cached the result, and
every subsequent outer row got the same cached value regardless of
e1.dept. The query returned all 1000 employees instead of the 505
who actually beat their department's average.

Fix: always PushOuter + Run, no cache. Correctness over caching.
Trade-off: non-correlated scalar subqueries now re-execute per
outer row. A proper per-outer-key memoization is deferred — it
requires walking the subquery AST to collect free variables.
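
A minimal Python sketch of the failure mode and the fix, not the engine's actual PRG code. The table contents and the helper names (`dept_avg`, `run_cached_by_text`, `run_per_outer_row`) are hypothetical; the only thing taken from the description above is the shape of the bug, a cache keyed by the subquery's text rather than by the outer row.

```python
from statistics import mean

# Hypothetical stand-in for the emp table: (name, dept, salary).
EMP = [
    ("cid", "ops", 60),
    ("dee", "ops", 40),
    ("ann", "eng", 120),
    ("bob", "eng", 100),
]

def dept_avg(dept):
    """Inner query: AVG(salary) over the rows that match the outer dept."""
    return mean(s for _, d, s in EMP if d == dept)

def run_cached_by_text():
    """Buggy shape: the result is cached under the subquery text, so e1.dept is ignored."""
    cache = {}
    key = "SELECT AVG(salary) FROM emp e2 WHERE e2.dept = e1.dept"
    out = []
    for name, dept, salary in EMP:
        if key not in cache:
            cache[key] = dept_avg(dept)  # evaluated for the first outer row only
        if salary > cache[key]:
            out.append(name)
    return out

def run_per_outer_row():
    """Fixed shape: bind the outer row and re-run the subquery for every row."""
    return [name for name, dept, salary in EMP if salary > dept_avg(dept)]
```

With this data the cached variant compares every salary against the ops average and lets `bob` through, while the per-row variant keeps only `cid` and `ann`.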

--- Fix 2: WITH RECURSIVE hierarchy join was O(m*n) ---

RecCteJoin (the in-memory join used when a recursive CTE's step
references both a real table and the CTE frontier) ran a flat
nested loop: for each DBF row × each prev-iteration row, build a
combined row buffer and run SqlEvalRowExpr on the ON condition.

For a 4-level 1000-employee hierarchy that's ~1M ON evaluations,
~4.6 seconds.

Fix: detect the shape `dbfAlias.col = cteAlias.col` at join-setup
time, build a PRG hash on the CTE frontier keyed by its join column
(aPrevRows is always small — at most the last iteration's emitted
rows), then scan the DBF side once and probe the hash. Complex ON
predicates fall through to the original nested loop.
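
The difference between the two join strategies can be sketched in Python as follows. This is an illustration under assumptions, not RecCteJoin itself: rows are plain dicts, the column names `mgr` and `id` are hypothetical, the hierarchy is assumed to be acyclic, and NULL keys are skipped on the build side because `NULL = NULL` is not true in SQL.

```python
from collections import defaultdict

def nested_loop_join(dbf_rows, prev_rows, on):
    """Original shape: evaluate the ON predicate for every row pair, O(m*n)."""
    return [(d, p) for d in dbf_rows for p in prev_rows if on(d, p)]

def hash_join(dbf_rows, prev_rows, dbf_col, cte_col):
    """Build a hash on the small frontier, then probe it in one scan of the DBF side."""
    buckets = defaultdict(list)
    for p in prev_rows:
        if p[cte_col] is not None:          # NULL never equals anything
            buckets[p[cte_col]].append(p)
    # .get() avoids creating empty buckets for keys that are only probed.
    return [(d, p) for d in dbf_rows for p in buckets.get(d[dbf_col], ())]

def walk_hierarchy(emp, root_id):
    """Repeat the recursive step until the frontier is empty (assumes no cycles)."""
    frontier = [{"id": root_id, "level": 0}]
    seen = list(frontier)
    while frontier:
        joined = hash_join(emp, frontier, "mgr", "id")
        frontier = [{"id": d["id"], "level": p["level"] + 1} for d, p in joined]
        seen.extend(frontier)
    return seen
```

Each iteration costs one pass over the table plus one pass over the frontier, instead of their product.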

--- Bench (SQL:2013 analytics, emp=1k, sales=20k, evt=30k) ---

  Query                              Before     After    Speedup
  ──────────────────────────────────────────────────────────────
  RECURSIVE hierarchy 4-level        4603ms     30ms     ~150x
  Correlated subquery (all emp)      10ms ❌    4933ms ✓ (correct)

Other SQL:2013 queries (ROW_NUMBER top-N, running total, moving
average, DENSE_RANK, LAG, NTILE, gaps-and-islands) are all in the
expected 10–230ms range for these dataset sizes, unchanged by
this commit.

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Known follow-ups (not in this commit):
  - Q7 ROLLUP(col) parses but isn't expanded in GroupBy: it returns
    a single grand-total row instead of per-value rows plus a total.
    Grouping sets implementation is a separate feature.
  - Correlated subquery memoization by outer free-variable key
    would bring Q8 from 4.9s back to ~50ms for low-cardinality
    correlations; it requires AST free-variable analysis.
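
One possible shape for that memoization, sketched in Python. It assumes the free variables have already been collected (the AST analysis that is not implemented yet) and are passed in as `free_vars`; `memoize_by_outer_key`, `dept_avg`, and the sample rows are hypothetical names for illustration.

```python
def memoize_by_outer_key(run_subquery, free_vars):
    """Wrap a subquery so it runs once per distinct tuple of outer values."""
    cache = {}

    def evaluate(outer_row):
        # An empty free_vars tuple yields the key (), so a non-correlated
        # subquery is executed exactly once.
        key = tuple(outer_row[name] for name in free_vars)
        if key not in cache:
            cache[key] = run_subquery(outer_row)
        return cache[key]

    return evaluate

EMP = [
    {"name": "cid", "dept": "ops", "salary": 60},
    {"name": "dee", "dept": "ops", "salary": 40},
    {"name": "ann", "dept": "eng", "salary": 120},
    {"name": "bob", "dept": "eng", "salary": 100},
]
runs = {"count": 0}

def dept_avg(outer_row):
    """Inner query: AVG(salary) for the outer row's dept; counts executions."""
    runs["count"] += 1
    salaries = [e["salary"] for e in EMP if e["dept"] == outer_row["dept"]]
    return sum(salaries) / len(salaries)

avg_for = memoize_by_outer_key(dept_avg, free_vars=("dept",))
above_avg = [e["name"] for e in EMP if e["salary"] > avg_for(e)]
```

Here the subquery runs twice (once per department) rather than four times, which is why the gain depends on how few distinct outer keys there are.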

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>

2026-04-14 23:25:58 +09:00

Name	Last commit	Date
bin/.hbmk/linux/gcc	feat: FiveSql2 43/43, @byref, mutable closure, RTL 479, DateTime fix	2026-04-11 11:35:37 +09:00
src	perf+fix(FiveSql2): recursive-CTE hash join + correct correlated subqueries	2026-04-14 23:25:58 +09:00
test	feat: FiveSql2 43/43, @byref, mutable closure, RTL 479, DateTime fix	2026-04-11 11:35:37 +09:00
FIVE_COMPAT.md	feat: FiveSql2 43/43, @byref, mutable closure, RTL 479, DateTime fix	2026-04-11 11:35:37 +09:00
fk_parent_pk.ntx	feat: FiveSql2 43/43, @byref, mutable closure, RTL 479, DateTime fix	2026-04-11 11:35:37 +09:00
Makefile	feat: FiveSql2 43/43, @byref, mutable closure, RTL 479, DateTime fix	2026-04-11 11:35:37 +09:00
README.md	feat: FiveSql2 43/43, @byref, mutable closure, RTL 479, DateTime fix	2026-04-11 11:35:37 +09:00

README.md

FiveSql2 — SQL Engine for Harbour DBF/NTX/CDX

Pratt parser + SQL:1992-2023 full standard support

Supports both NTX (Clipper) and CDX (FoxPro/ADS) indexes

Architecture

five_SQL("SELECT ...")
   │
   ├── TSqlLexer        Tokenizer
   ├── TSqlParser2      Pratt parser (data-driven operators)
   ├── TSqlExecutor     Query executor (Volcano model)
   │     ├── TSqlAlias  Central alias manager (no collisions)
   │     ├── TSqlIndex  NTX/CDX index optimization (auto-detect)
   │     ├── TSqlAgg    GROUP BY / aggregation
   │     ├── TSqlSort   ORDER BY / DISTINCT
   │     ├── TSqlDDL    CREATE/DROP/ALTER TABLE/INDEX
   │     └── TSqlTxn    BEGIN/COMMIT/ROLLBACK
   ├── TSqlExpr         AST nodes + expression evaluation
   └── TSqlFunc         60+ scalar functions
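
As a rough illustration of what a Volcano-style executor means, the sketch below chains pull-based operators in Python, using generators in place of the classic open/next/close interface. It is not TSqlExecutor's code; the operator names, the `stats` counter, and the sample rows are all hypothetical.

```python
def scan(rows, stats):
    """Leaf operator: hands out one table row per pull and counts the reads."""
    for row in rows:
        stats["read"] += 1
        yield row

def select(child, predicate):
    """WHERE: passes through only the rows that satisfy the predicate."""
    for row in child:
        if predicate(row):
            yield row

def project(child, columns):
    """SELECT list: keeps only the requested columns."""
    for row in child:
        yield {c: row[c] for c in columns}

def limit(child, n):
    """FETCH FIRST n ROWS: stops pulling from the child once n rows are out."""
    if n <= 0:
        return
    for i, row in enumerate(child, 1):
        yield row
        if i >= n:
            return

ROWS = [
    {"name": "ann", "dept": "eng", "salary": 120},
    {"name": "bob", "dept": "eng", "salary": 100},
    {"name": "cid", "dept": "ops", "salary": 60},
]
stats = {"read": 0}
plan = limit(project(select(scan(ROWS, stats), lambda r: r["salary"] > 110), ["name"]), 1)
result = list(plan)
```

Because every operator pulls rows on demand from its child, the limit at the top stops the scan after a single read in this example.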

Build & Test

export PATH="/path/to/harbour-core/bin/linux/gcc:$PATH"
export HB_INSTALL_PREFIX="/path/to/harbour-core"

make          # Build all tests
make test     # Run all 157 tests
make bench    # Parser benchmark
make clean    # Clean

SQL Standard Coverage

Standard	Features	Tests
SQL:1992	SELECT, JOIN, GROUP BY, HAVING, Subquery, CASE, CAST	43
SQL:1999	CTE, Recursive CTE, Window Functions, MERGE	10
SQL:2003	SIMILAR TO, GROUPING SETS, LATERAL, Window frames	64
SQL:2008	FETCH/OFFSET, FOR UPDATE, Extended MERGE	(incl.)
SQL:2016	JSON functions, LISTAGG	(incl.)
SQL:2023	ANY_VALUE, GREATEST/LEAST, BOOL_AND/OR	(incl.)
Challenge	LeetCode-level complex queries	15
Extreme	Production analytics stress tests	15

Adding New Operators

Edit TSqlParser2.prg, method InitInfixTables():

::hInfixTT[ TK_MYOP ] := { "<=>", 40, 41, ND_BIN }

One line. No structural changes needed.
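
To show why a table entry is enough, here is a small Pratt parser in Python driven by a similar table. It is a sketch, not TSqlParser2: it assumes the two numbers in the entry above are the left and right binding powers, and every other operator and binding power in `INFIX` is made up for the example.

```python
import re

# Hypothetical infix table: lexeme -> (left binding power, right binding power).
INFIX = {
    "OR":  (10, 11),
    "AND": (20, 21),
    "=":   (30, 31),
    "<=>": (40, 41),   # the "new operator": one table entry, nothing else
    "+":   (50, 51),
    "*":   (60, 61),
}

# The tokenizer is built from the same table; longer symbols are tried first.
_symbols = sorted((k for k in INFIX if not k.isalpha()), key=len, reverse=True)
TOKEN = re.compile(r"\s*(" + "|".join(map(re.escape, _symbols)) + r"|[A-Za-z_]\w*|\d+|[()])")

def tokenize(text):
    text = text.rstrip()
    pos, out = 0, []
    while pos < len(text):
        m = TOKEN.match(text, pos)
        if not m:
            raise SyntaxError(f"unexpected input at offset {pos}")
        out.append(m.group(1))
        pos = m.end()
    return out

def parse(tokens, min_bp=0):
    """Parse an expression whose operators all bind at least as tightly as min_bp."""
    tok = tokens.pop(0)
    if tok == "(":
        left = parse(tokens, 0)
        if not tokens or tokens.pop(0) != ")":
            raise SyntaxError("expected ')'")
    else:
        left = tok
    while tokens:
        op = tokens[0].upper()
        if op not in INFIX:
            break
        left_bp, right_bp = INFIX[op]
        if left_bp < min_bp:
            break                      # operator belongs to an enclosing call
        tokens.pop(0)
        right = parse(tokens, right_bp)
        left = (op, left, right)       # one generic binary node for every operator
    return left
```

For example, `parse(tokenize("a <=> b + 1 AND c = 2"))` groups `b + 1` first, then `<=>`, then `=`, and finally `AND`, purely from the numbers in the table.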
