perf(FiveSql2): EXISTS → LIMIT 1 early exit, subquery identity via AScan

Extreme subquery stress bench (12 patterns spanning scalar-in-SELECT,
nested correlation, EXISTS, NOT IN, derived tables, self-joins, and
mixed combinations) exposed three weaknesses in the post-ROLLUP state:

1. EXISTS / NOT EXISTS evaluated the full subquery result per outer
   row, even though they only need to know whether any row matches.
2. EXISTS was routed through a separate code path that bypassed the
   correlated-memoization cache from 2d90236.
3. The previous SubqueryCached identified each subquery node by
   growing the AST array with ASize and storing metadata in slot 6,
   which broke downstream code paths that expect the original shape
   (derived-table queries panicked on ArrayPop after the ASize).
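
The shape problem in item 3 follows from Harbour arrays being
reference types: ASize resizes the array in place, so every holder of
the same node sees the new length. A minimal illustration, assuming a
made-up two-slot node (the names are hypothetical and this is not the
actual failing code path):

```harbour
PROCEDURE Main()
   LOCAL aNode := { "ND_SUB", "query" }    /* hypothetical 2-slot node */
   LOCAL aSeenElsewhere := aNode           /* second reference, same array */

   ASize( aNode, 6 )                       /* old approach: grow to reach slot 6 */
   aNode[ 6 ] := { 1, {} }                 /* stash { id, aFreeVars } */

   ? Len( aSeenElsewhere )                 /* the other holder now sees 6 slots */
   ? aSeenElsewhere == aNode               /* .T., both names refer to one array */
   RETURN
```

Any code that walks the node by length, or pops trailing slots, is
therefore exposed to the extra elements.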

Fixes:

* EXISTS / NOT EXISTS now route through SubqueryCached the same way
  ND_SUB in WHERE does, so correlated EXISTS predicates memoize on
  outer free-variable values when the cardinality is low.

* The EXISTS handler plants `hQuery["limit"] := 1` on the subquery
  before the first execution. EXISTS doesn't care about the rest
  of the result rows, so capping the scan at one row avoids the
  full-scan cost in the common case.

* A new early-termination branch in RunSelect's scan loop exits
  the `WHILE !Eof()` as soon as aRows reaches nLimit (or nTop),
  guarded by a "no ORDER BY / GROUP BY / agg / DISTINCT"
  precondition (those need the full input). This is what makes the
  LIMIT 1 injection actually pay off: before, LIMIT was only applied
  via ASize after the full scan had been materialized.

* SubqueryCached no longer mutates the parse tree. Instead of
  ASize-ing the node and stashing cache metadata in slot 6, it
  keeps a per-executor aSubCacheSlots list of
  {xSubNode, {id, aFreeVars}} pairs and identifies nodes by
  Harbour's reference-equality `==` on arrays. Lookup is O(n) in
  the number of distinct subqueries in the query, which is about 4
  or fewer for realistic queries, so the linear scan is negligible.
  Fixes the derived-table ArrayPop panic.
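
The last two fixes can be sketched outside the engine. This is a
minimal illustration, not the RunSelect or SubqueryCached code:
ScanRows(), FindSlot(), and the sample data are hypothetical, and an
in-memory array stands in for a work area.

```harbour
PROCEDURE Main()
   LOCAL aTable := { 5, 12, 7, 12, 30, 12 }
   LOCAL aNodeA := { "ND_SUB", "q1" }
   LOCAL aNodeB := { "ND_SUB", "q1" }      /* equal contents, different array */
   LOCAL aSlots := {}
   LOCAL aRes, n1, n2, n3

   /* Early exit: a cap of 0 scans everything, 1 stops at the first match */
   aRes := ScanRows( aTable, {| x | x == 12 }, 0 )
   ? "no cap :", Len( aRes[ 1 ] ), "matches,", aRes[ 2 ], "rows visited"
   aRes := ScanRows( aTable, {| x | x == 12 }, 1 )
   ? "LIMIT 1:", Len( aRes[ 1 ] ), "matches,", aRes[ 2 ], "rows visited"

   /* Identity: the same node maps to the same slot, a look-alike does not */
   n1 := FindSlot( aSlots, aNodeA )
   n2 := FindSlot( aSlots, aNodeB )
   n3 := FindSlot( aSlots, aNodeA )
   ? n1, n2, n3
   ? Len( aNodeA )                         /* the node itself was never resized */
   RETURN

/* Returns { aRows, nVisited }. nEarlyLimit == 0 means "no early exit". */
FUNCTION ScanRows( aTable, bWhere, nEarlyLimit )
   LOCAL aRows := {}
   LOCAL nVisited := 0
   LOCAL i

   FOR i := 1 TO Len( aTable )
      nVisited++
      IF Eval( bWhere, aTable[ i ] )
         AAdd( aRows, aTable[ i ] )
      ENDIF
      /* Same shape as the guard added to the scan loop in this commit */
      IF nEarlyLimit > 0 .AND. Len( aRows ) >= nEarlyLimit
         EXIT
      ENDIF
   NEXT
   RETURN { aRows, nVisited }

/* Returns the slot index for aNode, registering it on first sight. */
FUNCTION FindSlot( aSlots, aNode )
   LOCAL i

   FOR i := 1 TO Len( aSlots )
      IF aSlots[ i ][ 1 ] == aNode         /* reference compare on arrays */
         RETURN i
      ENDIF
   NEXT
   AAdd( aSlots, { aNode, { Len( aSlots ) + 1, {} } } )
   RETURN Len( aSlots )
```

With this sample data the uncapped scan visits all 6 rows and finds 3
matches, while the capped scan stops after 2 rows with 1 match. The
three FindSlot() calls yield slots 1, 2, 1, and aNodeA keeps its
original length of 2.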

Bench impact (emp=500, prod=100, ord=5k — subquery hell):

  Pattern                           Before    After       Δ
  ─────────────────────────────────────────────────────────
  H3  Correlated EXISTS              13.3s    10.0s    1.3x
  H7  Scalar-in-SELECT + JOIN        362ms      2ms    181x
  H8  NOT EXISTS self-join            1.8s    900ms    2.0x
  H11 Scalar + EXISTS + derived      13.7s     3.2s    4.3x
  (H1, H2, H5, H6, H9, H10, H12 unchanged at 3–72ms)

H7's 181x is the scalar-in-SELECT-list memoization payoff — each
dept's revenue subquery used to run 100 times (once per SALES emp),
now runs once per distinct dept.
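
The memoization effect can be reproduced in miniature. The sketch
below is an assumption-laden stand-in, not the engine's cache:
DeptRevenue(), the "1|" key format, and the five sample rows are all
made up for illustration.

```harbour
PROCEDURE Main()
   LOCAL aOuterDept := { "SALES", "SALES", "OPS", "SALES", "OPS" }
   LOCAL hCache := { => }
   LOCAL nRuns := 0
   LOCAL i, cKey

   FOR i := 1 TO Len( aOuterDept )
      /* key = subquery id + current free-variable value (format is made up) */
      cKey := "1|" + aOuterDept[ i ]
      IF ! hb_HHasKey( hCache, cKey )
         nRuns++                           /* cache miss: run the subquery */
         hCache[ cKey ] := DeptRevenue( aOuterDept[ i ] )
      ENDIF
      ? aOuterDept[ i ], hCache[ cKey ]
   NEXT
   ? "outer rows:", Len( aOuterDept ), "subquery runs:", nRuns
   RETURN

/* Hypothetical stand-in for executing the correlated subquery */
FUNCTION DeptRevenue( cDept )
   RETURN iif( cDept == "SALES", 1000, 250 )
```

Five outer rows trigger only two subquery runs here, one per distinct
dept, which is the same effect behind the H7 number at a larger scale.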

H3's 1.3x is the best we can do without semi-join lift: each of the
500 outer rows has a unique correlation key, so all 500 lookups are
cache misses, and the 375 rows whose correlation finds no match must
scan the full ord table to confirm emptiness. Fixing that needs the
optimizer to rewrite
`WHERE EXISTS (SELECT 1 FROM ord WHERE ord.emp_id = e.id AND ...)`
into `WHERE e.id IN (SELECT DISTINCT emp_id FROM ord WHERE ...)`,
which is a real query-rewrite feature left for a follow-up.
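
To see why the rewrite would help, compare the two strategies on toy
data. This is only a sketch of the idea, not the planned optimizer
work: the arrays and counters are hypothetical, and a hash of distinct
keys stands in for the `SELECT DISTINCT emp_id` result.

```harbour
PROCEDURE Main()
   LOCAL aEmpId := { 1, 2, 3, 4 }          /* outer table keys */
   LOCAL aOrdEmpId := { 2, 2, 4, 2 }       /* ord.emp_id column */
   LOCAL hKeys := { => }
   LOCAL nProbeVisits := 0, nBuildVisits := 0
   LOCAL nHitsExists := 0, nHitsSemi := 0
   LOCAL i, j

   /* Per-row EXISTS with LIMIT 1: a miss still walks all of ord */
   FOR i := 1 TO Len( aEmpId )
      FOR j := 1 TO Len( aOrdEmpId )
         nProbeVisits++
         IF aOrdEmpId[ j ] == aEmpId[ i ]
            nHitsExists++
            EXIT
         ENDIF
      NEXT
   NEXT

   /* Semi-join style: one pass builds the distinct key set, then probe */
   FOR j := 1 TO Len( aOrdEmpId )
      nBuildVisits++
      hKeys[ aOrdEmpId[ j ] ] := .T.
   NEXT
   FOR i := 1 TO Len( aEmpId )
      IF hb_HHasKey( hKeys, aEmpId[ i ] )
         nHitsSemi++
      ENDIF
   NEXT

   ? "EXISTS per row :", nHitsExists, "hits,", nProbeVisits, "ord rows visited"
   ? "semi-join style:", nHitsSemi, "hits,", nBuildVisits, "ord rows visited"
   RETURN
```

Both strategies report 2 hits, but the per-row form visits 12 ord rows
(the two misses each pay for a full pass) while the key-set form
visits 4.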

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-15 16:31:36 +09:00
parent 2d9023622c
commit ce7593c50f

@@ -36,6 +36,7 @@ CLASS TSqlExecutor
 DATA bRowBlock /* optional code block — receives SELECT cols as params */
 DATA aFetchCache /* pre-bound {nWA, nFPos} per SELECT expression, or NIL */
 DATA hSubCorrCache INIT { => } /* per-outer-key subquery result cache */
+DATA aSubCacheSlots INIT {} /* list of {xSubNode, {id, aFreeVars}} */
 DATA nSubCacheSeq INIT 0 /* monotonic ID for subqueries */
 CLASSDATA hSubCache INIT { => } SHARED
@@ -570,20 +571,33 @@ METHOD EvalExpr( xNode ) CLASS TSqlExecutor
 RETURN NIL
 CASE xNode[ 1 ] == ND_FN
-/* EXISTS must be handled before argument evaluation */
-IF xNode[ 2 ] == "EXISTS" .AND. Len( xNode[ 3 ] ) > 0 .AND. ;
+/* EXISTS and NOT EXISTS — we only need to know whether the
+* subquery returns at least one row, not compute the full
+* result. Force a LIMIT 1 into the subquery's hQuery so the
+* inner scan short-circuits on the first match. Then route
+* through SubqueryCached so correlated EXISTS still memoizes
+* on free-variable values (helps when correlation is low
+* cardinality; no-op when every outer row is unique). */
+IF ( xNode[ 2 ] == "EXISTS" .OR. xNode[ 2 ] == "NOT EXISTS" ) .AND. ;
+Len( xNode[ 3 ] ) > 0 .AND. ;
 xNode[ 3 ][ 1 ] != NIL .AND. ValType( xNode[ 3 ][ 1 ] ) == "A" .AND. ;
 xNode[ 3 ][ 1 ][ 1 ] == ND_SUB .AND. xNode[ 3 ][ 1 ][ 2 ] != NIL
-nSavedWA := Select()
-::PushOuter()
-aSubResult := TSqlExecutor():New( xNode[ 3 ][ 1 ][ 2 ], ::aParams ):Run()
-::PopOuter()
-dbSelectArea( nSavedWA )
+/* Install LIMIT 1 on the subquery hQuery. EXISTS only cares
+* about the existence of a match, so the subquery scan can
+* stop at the first row — the scan loop in RunSelect honours
+* hQuery["limit"] as an early-termination target. */
+IF ValType( xNode[ 3 ][ 1 ][ 2 ] ) == "H"
+xNode[ 3 ][ 1 ][ 2 ][ "limit" ] := 1
+ENDIF
+aSubResult := ::SubqueryCached( xNode[ 3 ][ 1 ] )
 IF ValType( aSubResult ) == "A" .AND. Len( aSubResult ) >= 2 .AND. ;
 ValType( aSubResult[ 2 ] ) == "A"
+IF xNode[ 2 ] == "NOT EXISTS"
+RETURN Len( aSubResult[ 2 ] ) == 0
+ENDIF
 RETURN Len( aSubResult[ 2 ] ) > 0
 ENDIF
-RETURN .F.
+RETURN iif( xNode[ 2 ] == "NOT EXISTS", .T., .F. )
 ENDIF
 /* Evaluate arguments */
@@ -1069,6 +1083,7 @@ METHOD RunSelect() CLASS TSqlExecutor
 LOCAL hJoinHash
 LOCAL lIndexUsed, aTmp
 LOCAL aFP, pcW, aGoRows
+LOCAL nEarlyLimit
 aCols := ::hQuery[ "columns" ]
 ::aTables := ::hQuery[ "tables" ]
@@ -1340,6 +1355,20 @@ METHOD RunSelect() CLASS TSqlExecutor
 * join recursion. Huge win for multi-table scans. */
 ::aFetchCache := ::BuildFetchCache( aResultExprs )
 dbSelectArea( nWA )
+/* Early-termination LIMIT: when the query has a plain
+* LIMIT / TOP and no ORDER BY, GROUP BY, aggregates,
+* or DISTINCT, we can stop scanning as soon as aRows
+* reaches the cap. Huge win for `EXISTS` which plants
+* an implicit LIMIT 1 into the subquery's hQuery. */
+nEarlyLimit := 0
+IF ( ValType( nLimit ) == "N" .AND. nLimit > 0 ) .OR. ;
+( ValType( nTop ) == "N" .AND. nTop > 0 )
+IF Len( aOrderBy ) == 0 .AND. Len( aGroupBy ) == 0 .AND. ;
+! ::oAgg:HasAgg( aCols ) .AND. ! lDistinct
+nEarlyLimit := iif( ValType( nLimit ) == "N" .AND. nLimit > 0, ;
+nLimit, nTop )
+ENDIF
+ENDIF
 WHILE ! Eof()
 IF Len( aJoins ) > 0
 ::JoinRecurse( aJoins, 1, xWhere, aResultExprs, @aRows, hJoinHash )
@@ -1350,6 +1379,9 @@ METHOD RunSelect() CLASS TSqlExecutor
 AAdd( aRows, aRow )
 ENDIF
 ENDIF
+IF nEarlyLimit > 0 .AND. Len( aRows ) >= nEarlyLimit
+EXIT
+ENDIF
 dbSelectArea( nWA )
 dbSkip()
 ENDDO
@@ -1568,7 +1600,7 @@ RETURN lHadMatch
 METHOD SubqueryCached( xSubNode ) CLASS TSqlExecutor
 LOCAL hQ, aFreeVars, cCacheKey, aResult, nSavedWA, oSub
-LOCAL i, xVal, nId
+LOCAL i, xVal, nId, nSlot, aSlot
 IF xSubNode == NIL .OR. ValType( xSubNode ) != "A" .OR. Len( xSubNode ) < 2
 RETURN NIL
@@ -1578,17 +1610,26 @@ METHOD SubqueryCached( xSubNode ) CLASS TSqlExecutor
 RETURN NIL
 ENDIF
-/* First call for this subquery: assign ID + analyze free variables */
-IF Len( xSubNode ) < 6 .OR. xSubNode[ 6 ] == NIL
+/* Identify this subquery: linear-search the slots list for a prior
+* entry that references the SAME AST node (array `==` is reference
+* compare in Harbour). Most queries have only a handful of sub-
+* queries so the scan is trivial. Avoids mutating the parse tree. */
+nSlot := 0
+FOR i := 1 TO Len( ::aSubCacheSlots )
+IF ::aSubCacheSlots[ i ][ 1 ] == xSubNode
+nSlot := i
+EXIT
+ENDIF
+NEXT
+IF nSlot == 0
 ::nSubCacheSeq++
 aFreeVars := ::CollectFreeVars( hQ )
-IF Len( xSubNode ) < 6
-ASize( xSubNode, 6 )
-ENDIF
-xSubNode[ 6 ] := { ::nSubCacheSeq, aFreeVars }
+AAdd( ::aSubCacheSlots, { xSubNode, { ::nSubCacheSeq, aFreeVars } } )
+nSlot := Len( ::aSubCacheSlots )
 ENDIF
-nId := xSubNode[ 6 ][ 1 ]
-aFreeVars := xSubNode[ 6 ][ 2 ]
+aSlot := ::aSubCacheSlots[ nSlot ][ 2 ]
+nId := aSlot[ 1 ]
+aFreeVars := aSlot[ 2 ]
 /* Build cache key from current values of free variables via
 * Resolve(), which walks the outer context stack. */