Complex-query benchmarking turned up two hot paths that the earlier
SqlScan/SqlEach work didn't touch: multi-table JOIN and nested-scan
row fetching. This commit hits both.
--- Part 1: SqlHashBuild — Go-native hash-join build ---
FiveSql2's HashJoin previously built the inner-side hash in PRG:
WHILE !Eof()
xVal := FieldGet(nFPos)
cKey := SqlValToStr(xVal)
IF !hb_HHasKey(hHash, cKey) ; hHash[cKey] := {} ; ENDIF
AAdd(hHash[cKey], RecNo())
dbSkip()
ENDDO
That loop runs at ~40μs per row from class dispatch + hb_HHasKey
lookups + AAdd growth + SqlValToStr formatting. On a 50k-row inner
table that's ~2 seconds wasted on what should be a sub-50ms
housekeeping op.
New hbrtl.SqlHashBuild does the same thing in one Go-native pass:
- Direct *dbf.DBFArea loop (no interface dispatch, same devirt as
SqlScan)
- Go `map[string][]int64` accumulates RecNos by key — one
allocation per distinct key
- Inline ASCII-only digit formatter for numeric keys (strconv.Itoa
is allocation-heavy for small ints)
- CHAR keys are right-trimmed to match SqlCmpEq semantics so the
hash probe matches what EvalExpr would compute
- Final Five hash is built once from Keys/Values/Order slices
directly, skipping the per-key hb_HSet path
HashJoin now calls `SqlHashBuild(nFPos)` instead of running the
PRG loop.
--- Part 2: TSqlExecutor:BuildFetchCache ---
The JOIN fallback loop calls FetchRow per row. FetchRow was already
column-ref-aware but did the string parse (`At + SubStr + Upper`)
and `::FindWA` linear scan every single invocation. For a 50k-row
join emitting 50k result rows, that's ~200k redundant resolutions.
New BuildFetchCache walks the SELECT list once before the scan and
pre-binds each plain-column expression to `{nWA, nFPos}`. FetchRow's
new fast path checks ::aFetchCache and jumps straight to
`dbSelectArea + FieldGet` when bound. Complex exprs (functions,
CASE, subqueries) still fall through to EvalExpr.
::aFetchCache is set right before the join WHILE loop and cleared
after — no cross-query bleed.
--- Bench (50k ord × 10k emp × 100 dept, 3-run steady state) ---
Query Before After Speedup
────────────────────────────────────────────────────────────
2-way INNER JOIN, 10k rows 91ms 68ms 1.34x
2-way JOIN + GROUP BY 110ms 94ms 1.17x
3-way INNER JOIN COUNT 2610ms 610ms 4.28x
3-way JOIN + GROUP BY 2860ms 830ms 3.45x
The 3-way speedup is almost entirely SqlHashBuild. The 2-way case
benefits from the fetch cache because its per-row cost is dominated
by FetchRow (no second hash build to amortize).
--- Limits still standing ---
CTE + JOIN queries (Q7 in bench_complex: ~4.5s) aren't affected by
either optimization — CTE materialization goes through a different
path that writes/reads a temp DBF. Follow-up target.
Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
FiveSql2 — SQL Engine for Harbour DBF/NTX/CDX
Pratt parser + SQL:1992-2023 full standard support Supports both NTX (Clipper) and CDX (FoxPro/ADS) indexes
Architecture
five_SQL("SELECT ...")
│
├── TSqlLexer Tokenizer
├── TSqlParser2 Pratt parser (data-driven operators)
├── TSqlExecutor Query executor (Volcano model)
│ ├── TSqlAlias Central alias manager (no collisions)
│ ├── TSqlIndex NTX/CDX index optimization (auto-detect)
│ ├── TSqlAgg GROUP BY / aggregation
│ ├── TSqlSort ORDER BY / DISTINCT
│ ├── TSqlDDL CREATE/DROP/ALTER TABLE/INDEX
│ └── TSqlTxn BEGIN/COMMIT/ROLLBACK
├── TSqlExpr AST nodes + expression evaluation
└── TSqlFunc 60+ scalar functions
Build & Test
export PATH="/path/to/harbour-core/bin/linux/gcc:$PATH"
export HB_INSTALL_PREFIX="/path/to/harbour-core"
make # Build all tests
make test # Run all 157 tests
make bench # Parser benchmark
make clean # Clean
SQL Standard Coverage
| Standard | Features | Tests |
|---|---|---|
| SQL:1992 | SELECT, JOIN, GROUP BY, HAVING, Subquery, CASE, CAST | 43 |
| SQL:1999 | CTE, Recursive CTE, Window Functions, MERGE | 10 |
| SQL:2003 | SIMILAR TO, GROUPING SETS, LATERAL, Window frames | 64 |
| SQL:2008 | FETCH/OFFSET, FOR UPDATE, Extended MERGE | (incl.) |
| SQL:2016 | JSON functions, LISTAGG | (incl.) |
| SQL:2023 | ANY_VALUE, GREATEST/LEAST, BOOL_AND/OR | (incl.) |
| Challenge | LeetCode-level complex queries | 15 |
| Extreme | Production analytics stress tests | 15 |
Adding New Operators
Edit TSqlParser2.prg, method InitInfixTables():
::hInfixTT[ TK_MYOP ] := { "<=>", 40, 41, ND_BIN }
One line. No structural changes needed.
Copyright
Copyright (c) 2025-2026 Charles KWON (Charles KWON OhJun) Email: charleskwonohjun@gmail.com All rights reserved.