CharlesKWON ce7593c50f perf(FiveSql2): EXISTS → LIMIT 1 early exit, subquery identity via AScan
An extreme subquery stress benchmark (12 patterns spanning
scalar-in-SELECT, nested correlation, EXISTS, NOT IN, derived tables,
self-joins, and mixed combinations) exposed three weaknesses in the
post-ROLLUP state:

1. EXISTS / NOT EXISTS evaluated the full subquery result per outer
   row, even though it only needs to know whether any row matches.
2. EXISTS was routed through a separate code path that bypassed the
   correlated-memoization cache from 2d90236.
3. The previous SubqueryCached identified each subquery node by
   mutating slot 6 of the AST array via ASize, which broke
   downstream code paths that expected the original shape
   (derived-table queries panicked on ArrayPop after the ASize).

Fixes:

* EXISTS / NOT EXISTS now route through SubqueryCached the same way
  ND_SUB in WHERE does, so correlated EXISTS predicates memoize on
  outer free-variable values when the cardinality is low.

* The EXISTS handler sets `hQuery["limit"] := 1` on the subquery
  before its first execution. EXISTS only needs to know whether a
  matching row exists, so capping the scan at one row avoids the
  cost of a full scan whenever a match is found.

* A new early-termination branch in RunSelect's scan loop leaves
  the `WHILE !Eof()` loop as soon as aRows reaches nLimit. It is
  guarded by a "no ORDER BY / GROUP BY / aggregate / DISTINCT"
  precondition, because those operations need the full input. This
  is what makes the LIMIT 1 injection pay off: previously, LIMIT
  was applied only via ASize after the fully materialized scan.

* SubqueryCached no longer mutates the parse tree. Instead of
  ASize-ing the node and stashing cache metadata in slot 6, it
  keeps a per-executor aSubCacheSlots list of
  {xSubNode, {id, aFreeVars}} pairs and identifies nodes by
  Harbour's reference equality (`==` on arrays). Lookup is O(n) in
  n = number of distinct subqueries in the query, which stays at
  about 4 or fewer for realistic queries, so the linear scan costs
  effectively nothing. This also fixes the derived-table ArrayPop
  panic.
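The two EXISTS changes above (planting LIMIT 1 and leaving the scan loop early under the no-ORDER BY/GROUP BY/aggregate/DISTINCT guard) can be sketched in Go as follows. This is a minimal illustration under a simplified in-memory row model, not FiveSql2's Harbour code: `Row`, `Query`, `scanQuery`, and `exists` are hypothetical stand-ins for the executor's rows, `hQuery`, RunSelect, and the EXISTS handler.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for one table record.
type Row map[string]int

// Query is a hypothetical stand-in for the executor's hQuery hash.
type Query struct {
	Where    func(Row) bool
	Limit    int  // 0 means "no limit"
	OrderBy  bool // any of these four flags forces a full scan
	GroupBy  bool
	HasAgg   bool
	Distinct bool
}

// scanQuery filters rows and also reports how many rows it visited.
func scanQuery(table []Row, q Query) (out []Row, visited int) {
	// Stopping early is only safe when nothing downstream needs the full input.
	canStopEarly := q.Limit > 0 && !q.OrderBy && !q.GroupBy && !q.HasAgg && !q.Distinct
	for _, r := range table {
		visited++
		if q.Where == nil || q.Where(r) {
			out = append(out, r)
			if canStopEarly && len(out) >= q.Limit {
				break // analogous to leaving the WHILE !Eof() loop
			}
		}
	}
	// Otherwise LIMIT is applied after the full scan (the old ASize path).
	if q.Limit > 0 && len(out) > q.Limit {
		out = out[:q.Limit]
	}
	return out, visited
}

// exists plants LIMIT 1 on (a copy of) the subquery before running it.
func exists(table []Row, q Query) (bool, int) {
	q.Limit = 1
	rows, visited := scanQuery(table, q)
	return len(rows) > 0, visited
}

func main() {
	ord := make([]Row, 5000)
	for i := range ord {
		ord[i] = Row{"emp_id": i % 125}
	}
	match := Query{Where: func(r Row) bool { return r["emp_id"] == 3 }}
	miss := Query{Where: func(r Row) bool { return r["emp_id"] == 999 }}

	found, v1 := exists(ord, match)
	notFound, v2 := exists(ord, miss)
	fmt.Println(found, v1)    // true 4
	fmt.Println(notFound, v2) // false 5000
}
```

The miss case still visits every row: early exit only helps when a match exists, which is why unmatched correlations remain expensive.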

Bench impact (emp=500, prod=100, ord=5k; subquery hell):

  Pattern                        Before   After  Speedup
  ──────────────────────────────────────────────────────
  H3  Correlated EXISTS           13.3s   10.0s     1.3x
  H7  Scalar-in-SELECT + JOIN     362ms     2ms     181x
  H8  NOT EXISTS self-join         1.8s   900ms     2.0x
  H11 Scalar + EXISTS + derived   13.7s    3.2s     4.3x
  (H1, H2, H5, H6, H9, H10, H12 unchanged at 3–72ms)

H7's 181x is the payoff of memoizing scalar subqueries in the SELECT
list: each dept's revenue subquery used to run 100 times (once per
SALES employee) and now runs once per distinct dept.
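A rough Go sketch of that memoization, assuming a simplified model: subquery nodes are compared by pointer identity (Go's analogue of Harbour's reference equality on arrays), a per-executor slot list assigns each node a cache id without touching the AST, and results are memoized on that id plus the outer free-variable values. `SubNode`, `Executor`, and this `SubqueryCached` signature are hypothetical; the engine also checks that free-variable cardinality is low before memoizing, which this sketch omits.

```go
package main

import (
	"fmt"
	"strings"
)

// SubNode is a hypothetical stand-in for a subquery AST node.
type SubNode struct {
	FreeVars []string // outer columns the subquery references, e.g. "dept"
}

// slot pairs a node (compared by identity) with its cache id,
// loosely mirroring {xSubNode, {id, aFreeVars}}.
type slot struct {
	node *SubNode
	id   int
}

type Executor struct {
	slots []slot             // per-executor side table; the AST is never mutated
	memo  map[string]float64 // key: cache id + outer free-variable values
	Runs  int                // how many times a subquery body actually executed
}

// slotID finds the node by pointer identity with a linear scan (n is tiny).
func (e *Executor) slotID(n *SubNode) int {
	for _, s := range e.slots {
		if s.node == n {
			return s.id
		}
	}
	id := len(e.slots)
	e.slots = append(e.slots, slot{node: n, id: id})
	return id
}

// SubqueryCached returns the memoized result for these outer values,
// calling eval only on a cache miss.
func (e *Executor) SubqueryCached(n *SubNode, outer map[string]string, eval func() float64) float64 {
	parts := []string{fmt.Sprint(e.slotID(n))}
	for _, v := range n.FreeVars {
		parts = append(parts, outer[v])
	}
	key := strings.Join(parts, "\x00")
	if r, ok := e.memo[key]; ok {
		return r
	}
	e.Runs++
	r := eval()
	e.memo[key] = r
	return r
}

func main() {
	ex := &Executor{memo: map[string]float64{}}
	revenue := &SubNode{FreeVars: []string{"dept"}}
	depts := []string{"A", "B", "C", "D"}
	for i := 0; i < 100; i++ { // 100 outer rows, 4 distinct depts
		d := depts[i%len(depts)]
		ex.SubqueryCached(revenue, map[string]string{"dept": d}, func() float64 {
			return 42 // placeholder for the real per-dept revenue scan
		})
	}
	fmt.Println(ex.Runs) // 4
}
```

Two structurally identical nodes get separate slots, because identity rather than shape decides which cache entry a node owns.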

H3's 1.3x is the best we can do without a semi-join lift. The 500
outer rows carry 500 distinct correlation keys, so every cache lookup
misses (500 misses), and the 375 rows whose correlated subquery finds
no match must scan the full ord table to confirm it is empty. Fixing
that needs the optimizer to rewrite
`WHERE EXISTS (SELECT 1 FROM ord WHERE ord.emp_id = e.id AND ...)`
into `WHERE e.id IN (SELECT DISTINCT emp_id FROM ord WHERE ...)`,
which is a real query-rewrite feature left for a follow-up.
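The cost difference behind that proposed rewrite can be illustrated with a toy Go comparison. It assumes a data shape chosen to echo the numbers above (500 employees and 5,000 orders that reference only 125 distinct emp_id values, so 375 employees have no match); `Order`, `existsPerRow`, and `semiJoin` are hypothetical names, and the rewrite itself is not implemented in FiveSql2 per this commit.

```go
package main

import "fmt"

// Order is a hypothetical ord row with only the columns used here.
type Order struct {
	EmpID  int
	Amount int
}

// existsPerRow mirrors the current plan: one subquery scan per outer row,
// stopping at the first match (LIMIT 1) and scanning everything on a miss.
func existsPerRow(empIDs []int, ord []Order, pred func(Order) bool) (matched []int, visited int) {
	for _, id := range empIDs {
		for _, o := range ord {
			visited++
			if o.EmpID == id && pred(o) {
				matched = append(matched, id)
				break
			}
		}
	}
	return matched, visited
}

// semiJoin mirrors WHERE e.id IN (SELECT DISTINCT emp_id FROM ord WHERE ...):
// scan ord once into a set of emp_ids, then probe the set per outer row.
func semiJoin(empIDs []int, ord []Order, pred func(Order) bool) (matched []int, visited int) {
	ids := make(map[int]struct{})
	for _, o := range ord {
		visited++
		if pred(o) {
			ids[o.EmpID] = struct{}{}
		}
	}
	for _, id := range empIDs {
		if _, ok := ids[id]; ok {
			matched = append(matched, id)
		}
	}
	return matched, visited
}

func main() {
	emps := make([]int, 500)
	for i := range emps {
		emps[i] = i
	}
	ord := make([]Order, 5000)
	for i := range ord {
		ord[i] = Order{EmpID: i % 125, Amount: i}
	}
	anyOrder := func(o Order) bool { return o.Amount >= 0 } // always true here

	a, va := existsPerRow(emps, ord, anyOrder)
	b, vb := semiJoin(emps, ord, anyOrder)
	fmt.Println(len(a), va) // 125 1882875
	fmt.Println(len(b), vb) // 125 5000
}
```

Both plans return the same 125 employees; the per-row plan spends almost all of its row visits on the 375 misses, while the set-based plan reads ord once.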

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:31:36 +09:00

FiveSql2 — SQL Engine for Harbour DBF/NTX/CDX

Pratt parser with full SQL:1992-2023 standard support. Supports both NTX (Clipper) and CDX (FoxPro/ADS) indexes.

Architecture

five_SQL("SELECT ...")
   │
   ├── TSqlLexer        Tokenizer
   ├── TSqlParser2      Pratt parser (data-driven operators)
   ├── TSqlExecutor     Query executor (Volcano model)
   │     ├── TSqlAlias  Central alias manager (no collisions)
   │     ├── TSqlIndex  NTX/CDX index optimization (auto-detect)
   │     ├── TSqlAgg    GROUP BY / aggregation
   │     ├── TSqlSort   ORDER BY / DISTINCT
   │     ├── TSqlDDL    CREATE/DROP/ALTER TABLE/INDEX
   │     └── TSqlTxn    BEGIN/COMMIT/ROLLBACK
   ├── TSqlExpr         AST nodes + expression evaluation
   └── TSqlFunc         60+ scalar functions
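TSqlExecutor is described as a Volcano-model executor. Below is a generic Go sketch of that model, in which each operator pulls rows from its child one `Next()` call at a time. It is not FiveSql2's actual interface: `Operator`, `Scan`, `Filter`, `Limit`, and `Drain` are illustrative names, and the classic open/close calls are omitted for brevity.

```go
package main

import "fmt"

// Row is a hypothetical tuple type.
type Row []int

// Operator is the Volcano (iterator) interface: each Next pulls one row.
type Operator interface {
	Next() (Row, bool) // false when exhausted
}

// Scan yields rows from an in-memory table (stand-in for a DBF scan).
type Scan struct {
	rows []Row
	pos  int
}

func (s *Scan) Next() (Row, bool) {
	if s.pos >= len(s.rows) {
		return nil, false
	}
	r := s.rows[s.pos]
	s.pos++
	return r, true
}

// Filter pulls from its child until a row satisfies pred.
type Filter struct {
	child Operator
	pred  func(Row) bool
}

func (f *Filter) Next() (Row, bool) {
	for {
		r, ok := f.child.Next()
		if !ok {
			return nil, false
		}
		if f.pred(r) {
			return r, true
		}
	}
}

// Limit stops pulling from its child once n rows have been requested.
type Limit struct {
	child Operator
	n     int
}

func (l *Limit) Next() (Row, bool) {
	if l.n <= 0 {
		return nil, false
	}
	l.n--
	return l.child.Next()
}

// Drain collects every row a plan produces.
func Drain(op Operator) []Row {
	var out []Row
	for {
		r, ok := op.Next()
		if !ok {
			return out
		}
		out = append(out, r)
	}
}

func main() {
	scan := &Scan{rows: []Row{{1}, {2}, {3}, {4}, {5}, {6}}}
	even := &Filter{child: scan, pred: func(r Row) bool { return r[0]%2 == 0 }}
	plan := &Limit{child: even, n: 2}
	fmt.Println(Drain(plan)) // [[2] [4]]
	fmt.Println(scan.pos)    // 4
}
```

Because rows are pulled on demand, the Limit operator stops the scan after the fourth row instead of reading the whole table.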

Build & Test

export PATH="/path/to/harbour-core/bin/linux/gcc:$PATH"
export HB_INSTALL_PREFIX="/path/to/harbour-core"

make          # Build all tests
make test     # Run all 157 tests
make bench    # Parser benchmark
make clean    # Clean

SQL Standard Coverage

  Standard   Features                                              Tests
  SQL:1992   SELECT, JOIN, GROUP BY, HAVING, Subquery, CASE, CAST  43
  SQL:1999   CTE, Recursive CTE, Window Functions, MERGE           10
  SQL:2003   SIMILAR TO, GROUPING SETS, LATERAL, Window frames     64
  SQL:2008   FETCH/OFFSET, FOR UPDATE, Extended MERGE              (incl.)
  SQL:2016   JSON functions, LISTAGG                               (incl.)
  SQL:2023   ANY_VALUE, GREATEST/LEAST, BOOL_AND/OR                (incl.)
  Challenge  LeetCode-level complex queries                        15
  Extreme    Production analytics stress tests                     15

Adding New Operators

Edit TSqlParser2.prg, method InitInfixTables():

::hInfixTT[ TK_MYOP ] := { "<=>", 40, 41, ND_BIN }

One line. No structural changes needed.
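A minimal Go sketch of how a data-driven Pratt parser consumes such a table. It assumes the entry reads as {lexeme, left binding power, right binding power, node kind}, so `40, 41` makes `<=>` left-associative; the other operators' binding powers, the `infix` map, and the whitespace-split tokenizer are hypothetical and not taken from TSqlParser2.prg.

```go
package main

import (
	"fmt"
	"strings"
)

// infixInfo mirrors one table entry: left and right binding power.
// Reading {"<=>", 40, 41, ND_BIN} this way is an assumption.
type infixInfo struct{ lbp, rbp int }

// Hypothetical binding powers; higher binds tighter.
var infix = map[string]infixInfo{
	"OR":  {10, 11},
	"AND": {20, 21},
	"=":   {40, 41},
	"+":   {50, 51},
	"*":   {60, 61},
}

type parser struct {
	toks []string
	pos  int
}

func (p *parser) peek() string {
	if p.pos < len(p.toks) {
		return p.toks[p.pos]
	}
	return ""
}

func (p *parser) next() string {
	t := p.peek()
	p.pos++
	return t
}

// parseExpr is the Pratt loop: read an atom, then keep folding infix
// operators whose left binding power exceeds minBP.
func (p *parser) parseExpr(minBP int) string {
	left := p.next() // atoms only; prefix operators omitted
	for {
		op := p.peek()
		info, ok := infix[op]
		if !ok || info.lbp <= minBP {
			return left
		}
		p.next()
		right := p.parseExpr(info.rbp)
		left = "(" + left + " " + op + " " + right + ")"
	}
}

// parse tokenizes on whitespace (a stand-in for TSqlLexer) and parses.
func parse(src string) string {
	p := &parser{toks: strings.Fields(src)}
	return p.parseExpr(0)
}

func main() {
	fmt.Println(parse("a + b * c = d")) // ((a + (b * c)) = d)
	infix["<=>"] = infixInfo{40, 41}   // registering a new operator
	fmt.Println(parse("a <=> b <=> c")) // ((a <=> b) <=> c)
}
```

Registering `<=>` adds one map entry while `parseExpr` stays unchanged, which is the property that makes a one-line operator addition possible.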

Copyright (c) 2025-2026 Charles KWON (Charles KWON OhJun) Email: charleskwonohjun@gmail.com All rights reserved.