Extreme subquery stress bench (12 patterns spanning scalar-in-SELECT,
nested correlation, EXISTS, NOT IN, derived tables, self-joins, and
mixed combinations) exposed three weaknesses in the post-ROLLUP state:
1. EXISTS / NOT EXISTS evaluated the full subquery result per outer
row, even though it only needs to know whether any row matches.
2. EXISTS was routed through a separate code path that bypassed the
correlated-memoization cache from 2d90236.
3. The previous SubqueryCached identified each subquery node by
resizing the AST node array with ASize and writing cache metadata
into slot 6. Downstream code paths that expected the original node
shape broke on this: derived-table queries panicked on ArrayPop
after the ASize.
Fixes:
* EXISTS / NOT EXISTS now route through SubqueryCached the same way
ND_SUB in WHERE does, so correlated EXISTS predicates memoize on
outer free-variable values when the cardinality is low.
* The EXISTS handler sets `hQuery["limit"] := 1` on the subquery
before its first execution. EXISTS ignores every result row after
the first, so capping the scan at one row avoids a full scan in the
common case where a matching row exists.
* A new early-termination branch in RunSelect's scan loop exits
the `WHILE !Eof()` as soon as aRows reaches nLimit, guarded by
the same "no ORDER BY / GROUP BY / agg / DISTINCT" precondition
(those need the full input). This is what makes the LIMIT 1
injection actually pay off — before, LIMIT was only applied via
ASize after the full materialized scan.
* SubqueryCached no longer mutates the parse tree. Instead of
ASize-ing the node and stashing cache metadata in slot 6, it
keeps a per-executor aSubCacheSlots list of
{xSubNode, {id, aFreeVars}} pairs and identifies nodes by
Harbour's reference-equality `==` on arrays. Lookup is a linear
scan, O(n) in the number of distinct subqueries in the query; n is
about 4 or fewer for realistic queries, so the scan cost is
negligible. This fixes the derived-table ArrayPop panic.
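A minimal Python sketch of the LIMIT 1 injection combined with the guarded early exit. It assumes a list of dicts in place of the DBF work area and a hypothetical `Query` record whose field names are illustrative, not the real hQuery keys. The scan also returns the number of rows it visited, which makes the saving visible:

```python
from dataclasses import dataclass


@dataclass
class Query:
    # Hypothetical stand-in for the hQuery hash; field names are assumed.
    limit: int = 0  # 0 means "no LIMIT"
    order_by: bool = False
    group_by: bool = False
    has_agg: bool = False
    distinct: bool = False


def run_select(table, pred, q):
    """Scan table, keep rows matching pred; return (rows, rows_visited)."""
    # Stopping early is only safe when no later stage needs the full input.
    can_stop_early = q.limit > 0 and not (
        q.order_by or q.group_by or q.has_agg or q.distinct
    )
    rows, visited = [], 0
    for rec in table:  # plays the role of the WHILE !Eof() loop
        visited += 1
        if pred(rec):
            rows.append(rec)
            if can_stop_early and len(rows) >= q.limit:
                break
    if q.limit > 0:
        rows = rows[: q.limit]  # the old path: trim after a full scan
    return rows, visited


def exists(table, pred):
    """EXISTS needs at most one matching row, so force LIMIT 1."""
    rows, visited = run_select(table, pred, Query(limit=1))
    return len(rows) > 0, visited


# Usage: 5,000 toy order rows; the first emp_id == 10 sits at index 10.
orders = [{"emp_id": i % 500} for i in range(5000)]
found, scanned = exists(orders, lambda r: r["emp_id"] == 10)
print(found, scanned)  # True 11
print(exists(orders, lambda r: r["emp_id"] == 999))  # (False, 5000)
```

A probe with no match still visits every row. The early exit therefore pays off only when a matching row exists.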
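The sketch below shows how to identify subquery nodes by reference instead of tagging them. Python's `is` stands in for Harbour's reference-equality `==` on arrays. The `SubCacheSlots` class, its `slot_for` method and the list-shaped node are hypothetical; they only mirror the {xSubNode, {id, aFreeVars}} pairs described above:

```python
class SubCacheSlots:
    """Hypothetical per-executor list of (node, (cache_id, free_vars)) pairs.

    Nodes are matched by identity, never by value: two textually identical
    subqueries at different places in the query get separate slots, and the
    parse tree itself is never resized or written to.
    """

    def __init__(self):
        self._slots = []

    def slot_for(self, node, free_vars):
        # Linear scan, O(n) in the number of distinct subqueries.
        for known, meta in self._slots:
            if known is node:  # identity check, like Harbour's == on arrays
                return meta
        # First sighting: register it (free_vars is only used here).
        meta = (len(self._slots), tuple(free_vars))
        self._slots.append((node, meta))
        return meta


# Usage: two equal-looking nodes that are distinct objects.
sub_a = ["ND_SUB", "SELECT 1 FROM ord WHERE ord.emp_id = e.id"]
sub_b = ["ND_SUB", "SELECT 1 FROM ord WHERE ord.emp_id = e.id"]
slots = SubCacheSlots()
print(slots.slot_for(sub_a, ["e.id"]))  # (0, ('e.id',))
print(slots.slot_for(sub_b, ["e.id"]))  # (1, ('e.id',))
print(slots.slot_for(sub_a, ["e.id"]))  # (0, ('e.id',))
print(len(sub_a))  # 2
```

Keeping the metadata beside the tree means any code that walks the AST sees the shape the parser produced.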
Bench impact (emp=500, prod=100, ord=5k; subquery hell):
Pattern                        Before   After  Speedup
──────────────────────────────────────────────────────
H3   Correlated EXISTS          13.3s   10.0s     1.3x
H7   Scalar-in-SELECT + JOIN    362ms     2ms     181x
H8   NOT EXISTS self-join        1.8s   900ms     2.0x
H11  Scalar + EXISTS + derived  13.7s    3.2s     4.3x
(H1, H2, H5, H6, H9, H10, H12 unchanged at 3–72ms)
H7's 181x is the payoff of memoizing scalar subqueries in the SELECT
list. Each dept's revenue subquery used to run 100 times (once per
SALES emp) and now runs once per distinct dept.
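A toy model reproduces the effect behind H7. The scalar subquery is memoized on the tuple of outer free-variable values, and a counter records how often it actually runs. The data (100 outer rows spread over four departments) and the `dept_revenue` function are made up for illustration. They are not the bench schema:

```python
from collections import Counter

calls = Counter()


def dept_revenue(dept, orders):
    """Hypothetical stand-in for the correlated scalar subquery."""
    calls["dept_revenue"] += 1  # count real executions
    return sum(o["amount"] for o in orders if o["dept"] == dept)


def select_with_memo(outer_rows, orders):
    memo = {}  # keyed by the tuple of outer free-variable values: (dept,)
    out = []
    for row in outer_rows:
        key = (row["dept"],)
        if key not in memo:  # miss: run the subquery once for this key
            memo[key] = dept_revenue(row["dept"], orders)
        out.append((row["name"], memo[key]))
    return out


# Toy data: 100 outer rows spread over 4 departments.
depts = ["A", "B", "C", "D"]
sales_emps = [{"name": f"emp{i}", "dept": depts[i % 4]} for i in range(100)]
orders = [{"dept": depts[i % 4], "amount": 10} for i in range(40)]

result = select_with_memo(sales_emps, orders)
print(len(result), calls["dept_revenue"])  # 100 4
```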
H3's 1.3x is the best we can do without a semi-join lift. The 500
outer rows carry 500 unique correlation keys, so every row is a cache
miss (500 misses). The 375 rows whose correlation finds no match
must also scan the full ord table to confirm emptiness. Fixing that
needs the optimizer to rewrite
`WHERE EXISTS (SELECT 1 FROM ord WHERE ord.emp_id = e.id AND ...)`
into `WHERE e.id IN (SELECT DISTINCT emp_id FROM ord WHERE ...)`,
which is a real query-rewrite feature left for a follow-up.
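The sketch below shows why the rewrite helps by comparing the two evaluation strategies on toy data shaped like H3: 500 outer rows, 5,000 ord rows and 375 employees without orders. The `extra_pred` callback stands in for the elided `AND ...` filter. This illustrates the general semi-join technique and is not something FiveSql2 implements yet:

```python
def correlated_exists(emps, ords, extra_pred):
    """Nested loop: per outer row, scan ord until a match or the end."""
    kept, scanned = [], 0
    for e in emps:
        for o in ords:
            scanned += 1
            if o["emp_id"] == e["id"] and extra_pred(o):
                kept.append(e["id"])
                break
    return kept, scanned


def semi_join(emps, ords, extra_pred):
    """Rewrite: e.id IN (SELECT DISTINCT emp_id FROM ord WHERE ...)."""
    matching = {o["emp_id"] for o in ords if extra_pred(o)}  # one ord scan
    kept = [e["id"] for e in emps if e["id"] in matching]  # hash probes
    return kept, len(ords)


def no_extra_filter(o):
    return True


# Toy data: only emp ids 0..124 have orders, so 375 of 500 never match.
emps = [{"id": i} for i in range(500)]
ords = [{"emp_id": i % 125} for i in range(5000)]

a, cost_a = correlated_exists(emps, ords, no_extra_filter)
b, cost_b = semi_join(emps, ords, no_extra_filter)
print(a == b, cost_a, cost_b)  # True 1882875 5000
```

Both strategies keep the same 125 employees. The nested loop touches about 1.88 million ord rows, while the set build touches 5,000. The equivalence shown here holds for EXISTS used as a WHERE filter. NOT EXISTS would need an anti-join instead, because NOT IN returns no rows if the subquery yields a NULL emp_id.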
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FiveSql2 — SQL Engine for Harbour DBF/NTX/CDX
Pratt parser + SQL:1992-2023 full standard support.
Supports both NTX (Clipper) and CDX (FoxPro/ADS) indexes.
Architecture
five_SQL("SELECT ...")
│
├── TSqlLexer          Tokenizer
├── TSqlParser2        Pratt parser (data-driven operators)
├── TSqlExecutor       Query executor (Volcano model)
│   ├── TSqlAlias      Central alias manager (no collisions)
│   ├── TSqlIndex      NTX/CDX index optimization (auto-detect)
│   ├── TSqlAgg        GROUP BY / aggregation
│   ├── TSqlSort       ORDER BY / DISTINCT
│   ├── TSqlDDL        CREATE/DROP/ALTER TABLE/INDEX
│   └── TSqlTxn        BEGIN/COMMIT/ROLLBACK
├── TSqlExpr           AST nodes + expression evaluation
└── TSqlFunc           60+ scalar functions
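The Volcano model referenced for TSqlExecutor is the classic pull-based iterator design. Every operator exposes open/next/close and pulls one row at a time from its child. Below is a generic Python sketch; the class names are illustrative and are not FiveSql2's. A `Limit` operator gets early termination for free by simply not pulling again:

```python
class Scan:
    """Leaf operator over an in-memory table; counts rows handed out."""

    def __init__(self, table):
        self.table, self.pulled = table, 0

    def open(self):
        self.pos = 0

    def next(self):
        if self.pos >= len(self.table):
            return None  # end of input
        row = self.table[self.pos]
        self.pos += 1
        self.pulled += 1
        return row

    def close(self):
        pass


class Filter:
    def __init__(self, child, pred):
        self.child, self.pred = child, pred

    def open(self):
        self.child.open()

    def next(self):
        while (row := self.child.next()) is not None:
            if self.pred(row):
                return row
        return None

    def close(self):
        self.child.close()


class Limit:
    def __init__(self, child, n):
        self.child, self.n = child, n

    def open(self):
        self.count = 0
        self.child.open()

    def next(self):
        if self.count >= self.n:
            return None  # stop pulling: the child is never asked again
        row = self.child.next()
        if row is not None:
            self.count += 1
        return row

    def close(self):
        self.child.close()


def run(plan):
    """Drive the pipeline from the root, collecting every row."""
    plan.open()
    out = []
    while (row := plan.next()) is not None:
        out.append(row)
    plan.close()
    return out


scan = Scan(list(range(1000)))
plan = Limit(Filter(scan, lambda r: r % 7 == 0), 3)
print(run(plan), scan.pulled)  # [0, 7, 14] 15
```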
Build & Test
export PATH="/path/to/harbour-core/bin/linux/gcc:$PATH"
export HB_INSTALL_PREFIX="/path/to/harbour-core"
make          # Build all tests
make test     # Run all 157 tests
make bench    # Parser benchmark
make clean    # Clean
SQL Standard Coverage
| Standard | Features | Tests |
|---|---|---|
| SQL:1992 | SELECT, JOIN, GROUP BY, HAVING, Subquery, CASE, CAST | 43 |
| SQL:1999 | CTE, Recursive CTE, Window Functions, MERGE | 10 |
| SQL:2003 | SIMILAR TO, GROUPING SETS, LATERAL, Window frames | 64 |
| SQL:2008 | FETCH/OFFSET, FOR UPDATE, Extended MERGE | (incl.) |
| SQL:2016 | JSON functions, LISTAGG | (incl.) |
| SQL:2023 | ANY_VALUE, GREATEST/LEAST, BOOL_AND/OR | (incl.) |
| Challenge | LeetCode-level complex queries | 15 |
| Extreme | Production analytics stress tests | 15 |
Adding New Operators
Edit TSqlParser2.prg, method InitInfixTables():
::hInfixTT[ TK_MYOP ] := { "<=>", 40, 41, ND_BIN }
One line. No structural changes needed.
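A table-driven Pratt parser keeps a new operator to one line because the parse loop looks operators up in the infix table instead of hard-coding them. This Python sketch rests on several assumptions. It reads the two numbers in an entry as left and right binding powers, with 40/41 making `<=>` left-associative, and reads the last field as the AST node kind. That reading of the `{ "<=>", 40, 41, ND_BIN }` tuple, the other operators' binding powers and the toy lexer are all guesses:

```python
import re

# Assumed layout: token -> (left bp, right bp, node kind). Values other
# than the <=> entry are made up for this sketch.
INFIX = {
    "OR": (10, 11, "ND_BIN"),
    "AND": (20, 21, "ND_BIN"),
    "=": (30, 31, "ND_BIN"),
    "+": (50, 51, "ND_BIN"),
    "*": (60, 61, "ND_BIN"),
}
INFIX["<=>"] = (40, 41, "ND_BIN")  # adding an operator: one table entry


def tokenize(src):
    # Symbolic operators come from the table, longest first, so "<=>"
    # is matched before "="; word operators lex as identifiers.
    ops = sorted((k for k in INFIX if not k.isalpha()), key=len, reverse=True)
    pattern = "|".join(map(re.escape, ops)) + r"|[A-Za-z_]\w*|\d+|[()]"
    return re.findall(pattern, src)


class Parser:
    def __init__(self, src):
        self.toks, self.pos = tokenize(src), 0

    def peek(self):
        return self.toks[self.pos] if self.pos < len(self.toks) else None

    def advance(self):
        tok = self.peek()
        self.pos += 1
        return tok

    def parse_expr(self, min_bp=0):
        tok = self.advance()
        if tok == "(":
            lhs = self.parse_expr(0)
            self.advance()  # consume ")"
        else:
            lhs = tok  # identifier or number literal
        # Generic infix loop: no operator-specific code anywhere.
        while (op := self.peek()) in INFIX:
            l_bp, r_bp, kind = INFIX[op]
            if l_bp < min_bp:
                break
            self.advance()
            rhs = self.parse_expr(r_bp)
            lhs = (kind, op, lhs, rhs)
        return lhs


print(Parser("x = 1 AND y <=> z").parse_expr())
# ('ND_BIN', 'AND', ('ND_BIN', '=', 'x', '1'), ('ND_BIN', '<=>', 'y', 'z'))
```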
Copyright
Copyright (c) 2025-2026 Charles KWON (Charles KWON OhJun) Email: charleskwonohjun@gmail.com All rights reserved.