fivedev/five - five - fivego gitea

Author	SHA1	Message	Date
CharlesKWON	64b7cf6676	perf(FiveSql2): compound-AND equi-join picks up hash path, CTE+JOIN 22x

FiveSql2's HashJoin only recognized bare equi-terms (xOnCond[1]=ND_BIN, xOnCond[2]="="), so a compound ON predicate like

    ON e.dept_id = t.dept_id AND e.salary = t.max_sal

fell through to the nested-loop ELSE branch:

    dbSelectArea(nInnerWA)
    dbGoTop()
    WHILE !Eof()
       IF SqlIsTrue(EvalExpr(xOnCond))
          JoinRecurse(...)
       ENDIF
       dbSkip()
    ENDDO

That's a full inner scan per outer row, O(outer × inner) overall, re-evaluating the full AND tree on every probe. Query Q7 in the complex benchmark (CTE top_emp joined back to emp on compound key) ran at 4.6 seconds for 100 inner × 10k outer.

Fix has two pieces:

1. Probe-term extraction in JoinRecurse: when xOnCond is an AND, walk the left-associative chain looking for the first equi-term (`a.x = b.x`). Use that as the hash-probe key and drive the normal hash-join code path through it.
2. Post-filter in HashJoin: after a hash match, if the original xOnCond was compound, re-evaluate the full predicate with EvalExpr to drop matches that satisfied the hash key but not the rest of the AND (e.g. same dept but different salary). Bare equi-joins still skip the re-eval because the hash match is conclusive.

Bench (10k × 100 × compound ON predicate):

    Query                        Before   After   Speedup
    ─────────────────────────────────────────────────────
    Q7 CTE + JOIN compound ON    4573ms   209ms   21.9x

Still works for the existing bare equi case (43-test suite unchanged) and the 3-way JOIN case (no regression). Falls back to the generic nested loop only when no probe-term can be extracted at all.

Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS
- Q7 result: 100 rows (correct)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 20:31:27 +09:00
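Sketch for 64b7cf6676: the probe-term plus post-filter idea, written as a small Go program rather than the actual PRG code. Row, EquiTerm and hashJoin are hypothetical names for illustration, and the ON predicate is assumed to be already flattened into a list of ANDed equi-terms.

```go
package main

import "fmt"

// Row is a hypothetical record: column name -> integer value.
type Row map[string]int

// EquiTerm is one `outer.col = inner.col` term of an AND chain (illustrative type).
type EquiTerm struct{ OuterCol, InnerCol string }

// hashJoin joins outer and inner on all terms (ANDed together).
// The first term drives the hash probe; the remaining terms act as a post-filter.
func hashJoin(outer, inner []Row, terms []EquiTerm) [][2]Row {
	if len(terms) == 0 {
		return nil // no probe term: a real engine would fall back to a nested loop
	}
	probe, rest := terms[0], terms[1:]

	// Build phase: bucket inner rows by the probe column.
	buckets := make(map[int][]Row)
	for _, r := range inner {
		k := r[probe.InnerCol]
		buckets[k] = append(buckets[k], r)
	}

	// Probe phase: only rows in the matching bucket are re-checked.
	var out [][2]Row
	for _, o := range outer {
		for _, i := range buckets[o[probe.OuterCol]] {
			ok := true
			for _, t := range rest { // empty for bare equi-joins, so no re-eval
				if o[t.OuterCol] != i[t.InnerCol] {
					ok = false
					break
				}
			}
			if ok {
				out = append(out, [2]Row{o, i})
			}
		}
	}
	return out
}

func main() {
	emp := []Row{
		{"dept_id": 1, "salary": 100},
		{"dept_id": 1, "salary": 90},
		{"dept_id": 2, "salary": 80},
	}
	top := []Row{{"dept_id": 1, "max_sal": 100}, {"dept_id": 2, "max_sal": 80}}
	terms := []EquiTerm{{"dept_id", "dept_id"}, {"salary", "max_sal"}}
	fmt.Println("matched pairs:", len(hashJoin(emp, top, terms)))
}
```

The second emp row shares a dept_id with a top row but not its salary, so it reaches the post-filter and is dropped there, which is the case the commit describes as "same dept but different salary".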
CharlesKWON	c6799a599e	fix(FiveSql2): GROUP BY with aliased SELECT collapses all rows into one

Surfaced by complex-query benchmarking. Query like:

    SELECT d.name AS dept, COUNT(*) AS n, SUM(o.amount) AS total
    FROM dept d
    INNER JOIN emp e ON ...
    INNER JOIN ord o ON ...
    GROUP BY d.name

returned exactly 1 row instead of 100. Removing the AS aliases made it work correctly. Semantic bug, not a performance issue.

Root cause: TSqlAgg:GroupBy resolved each GROUP BY column by calling FindColIdx against aFN, the output alias list. For GROUP BY d.name with d.name AS dept, the group expression's column name was looked up in {"dept","n","total"} and missed. FindColIdx returned 0, every row got an empty group key, and the hash collapsed everything into one bucket.

Fix: new FindGroupIdx walks aCols (SELECT list expressions) instead, matching the GROUP BY column against each SELECT item's source expression ND_COL name. Handles qualified refs (d.name -> NAME) and falls back to FindColIdx for cases where GROUP BY uses a column not in the SELECT list.

Also hoisted the resolution out of the per-row loop: GROUP BY columns resolve once into aGroupIdx[] so each row just indexes.

Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- Complex bench Q4: 1 row -> 100 rows (correct)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 20:25:02 +09:00
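Sketch for c6799a599e: resolving a GROUP BY column against the SELECT list's source expressions instead of its output aliases. This is a simplified Go illustration, not the PRG implementation; SelectItem, baseName and findGroupIdx are hypothetical names, and the alias fallback only stands in for the FindColIdx fallback mentioned in the commit.

```go
package main

import (
	"fmt"
	"strings"
)

// SelectItem is a hypothetical SELECT-list entry.
type SelectItem struct {
	Source string // source column, e.g. "d.name"; empty for expressions like COUNT(*)
	Alias  string // output name, e.g. "dept"
}

// baseName strips a table qualifier and upper-cases: "d.name" -> "NAME".
func baseName(ref string) string {
	if i := strings.LastIndex(ref, "."); i >= 0 {
		ref = ref[i+1:]
	}
	return strings.ToUpper(ref)
}

// findGroupIdx returns the 1-based SELECT position whose source column matches
// groupCol, falls back to an output-alias match, and returns 0 when neither hits.
func findGroupIdx(items []SelectItem, groupCol string) int {
	want := baseName(groupCol)
	for i, it := range items {
		if it.Source != "" && baseName(it.Source) == want {
			return i + 1
		}
	}
	for i, it := range items {
		if strings.EqualFold(it.Alias, want) {
			return i + 1
		}
	}
	return 0
}

func main() {
	items := []SelectItem{
		{Source: "d.name", Alias: "dept"},
		{Source: "", Alias: "n"},
		{Source: "", Alias: "total"},
	}
	// Resolve once, before the per-row loop, as the commit does with aGroupIdx[].
	groupIdx := []int{findGroupIdx(items, "d.name")}
	fmt.Println("GROUP BY d.name resolves to SELECT position", groupIdx[0])
}
```

An alias-only lookup of NAME in {"dept","n","total"} would return 0 here, which is the collapsed-into-one-bucket failure the commit fixes.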
CharlesKWON	bfc6ded8cb	perf(FiveSql2): SqlHashBuild + FetchRow column binding, 3-way JOIN 3x

Complex-query benchmarking turned up two hot paths that the earlier SqlScan/SqlEach work didn't touch: multi-table JOIN and nested-scan row fetching. This commit hits both.

--- Part 1: SqlHashBuild, Go-native hash-join build ---

FiveSql2's HashJoin previously built the inner-side hash in PRG:

    WHILE !Eof()
       xVal := FieldGet(nFPos)
       cKey := SqlValToStr(xVal)
       IF !hb_HHasKey(hHash, cKey) ; hHash[cKey] := {} ; ENDIF
       AAdd(hHash[cKey], RecNo())
       dbSkip()
    ENDDO

That loop runs at ~40μs per row from class dispatch + hb_HHasKey lookups + AAdd growth + SqlValToStr formatting. On a 50k-row inner table that's ~2 seconds wasted on what should be a sub-50ms housekeeping op.

New hbrtl.SqlHashBuild does the same thing in one Go-native pass:

- Direct *dbf.DBFArea loop (no interface dispatch, same devirt as SqlScan)
- Go `map[string][]int64` accumulates RecNos by key, one allocation per distinct key
- Inline ASCII-only digit formatter for numeric keys (strconv.Itoa is allocation-heavy for small ints)
- CHAR keys are right-trimmed to match SqlCmpEq semantics so the hash probe matches what EvalExpr would compute
- Final Five hash is built once from Keys/Values/Order slices directly, skipping the per-key hb_HSet path

HashJoin now calls `SqlHashBuild(nFPos)` instead of running the PRG loop.

--- Part 2: TSqlExecutor:BuildFetchCache ---

The JOIN fallback loop calls FetchRow per row. FetchRow was already column-ref-aware but did the string parse (`At + SubStr + Upper`) and `::FindWA` linear scan every single invocation. For a 50k-row join emitting 50k result rows, that's ~200k redundant resolutions.

New BuildFetchCache walks the SELECT list once before the scan and pre-binds each plain-column expression to `{nWA, nFPos}`. FetchRow's new fast path checks ::aFetchCache and jumps straight to `dbSelectArea + FieldGet` when bound. Complex exprs (functions, CASE, subqueries) still fall through to EvalExpr.

::aFetchCache is set right before the join WHILE loop and cleared after, so there is no cross-query bleed.

--- Bench (50k ord × 10k emp × 100 dept, 3-run steady state) ---

    Query                        Before   After   Speedup
    ────────────────────────────────────────────────────────────
    2-way INNER JOIN, 10k rows   91ms     68ms    1.34x
    2-way JOIN + GROUP BY        110ms    94ms    1.17x
    3-way INNER JOIN COUNT       2610ms   610ms   4.28x
    3-way JOIN + GROUP BY        2860ms   830ms   3.45x

The 3-way speedup is almost entirely SqlHashBuild. The 2-way case benefits from the fetch cache because its per-row cost is dominated by FetchRow (no second hash build to amortize).

--- Limits still standing ---

CTE + JOIN queries (Q7 in bench_complex: ~4.5s) aren't affected by either optimization. CTE materialization goes through a different path that writes/reads a temp DBF. Follow-up target.

Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 18:47:20 +09:00
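Sketch for bfc6ded8cb, Part 1: a one-pass hash build that groups record numbers by a right-trimmed key in a `map[string][]int64`. This is not hbrtl.SqlHashBuild itself; buildHash is a hypothetical name, and a plain string slice stands in for the DBF work area.

```go
package main

import (
	"fmt"
	"strings"
)

// buildHash groups 1-based record numbers by key value. CHAR values are
// right-trimmed so padded and unpadded spellings land in the same bucket.
// The input slice is an assumption standing in for a field read per record.
func buildHash(values []string) map[string][]int64 {
	h := make(map[string][]int64)
	for i, v := range values {
		k := strings.TrimRight(v, " ")
		h[k] = append(h[k], int64(i+1)) // append creates the bucket on first use
	}
	return h
}

func main() {
	// Fixed-width CHAR fields are typically space-padded on disk.
	recs := []string{"A01  ", "B02  ", "A01"}
	h := buildHash(recs)
	fmt.Println("A01 ->", h["A01"])
	fmt.Println("B02 ->", h["B02"])
}
```

Appending to a missing map key starts from a nil slice, so the explicit "has key, else create" step from the PRG loop is not needed in this form.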
CharlesKWON	e75167c2e9	feat(FiveSql2): five_SQL block-callback integration, SQL beats raw PRG

Wires the new SqlEach RTL into FiveSql2's front-end so users write the SQL they know and opt into streaming with a familiar Harbour code block, with no manual RTL plumbing.

API:

    /* Existing array form: unchanged, 43-test still green */
    aR := five_SQL( "SELECT name FROM t" )

    /* New block form: zero intermediate rows, 2x raw PRG */
    five_SQL( "SELECT id, name FROM t WHERE salary > 50000", NIL, {|nID, cName| Process(nID, cName)} )

Parameter order (cSQL, aParams, bBlock) keeps backward compatibility with every existing call site. Passing NIL for aParams when only a block is needed is standard Harbour idiom.

Routing:

* TFiveSQL:Execute now takes an optional bBlock parameter and stores it on TSqlExecutor as ::bRowBlock.
* TSqlExecutor:RunSelect's existing Go fast path (same guards as before: single table, no JOIN/GROUP/aggregate, plain column projections, WHERE compilable via SqlExprToPrg) branches on ::bRowBlock:
  - block present → SqlEach streams rows through the block
  - block absent → SqlScan materializes into aRows (current path)
* Post-processing (GROUP BY / ORDER BY / window / DISTINCT / LIMIT) runs on empty aRows when block mode fires. All are no-ops on empty input, so the sequence stays harmless.
* RunSelect returns NIL (not {fields, rows}) when ::bRowBlock was used, which signals "streaming semantics, all work done in the block".

Complex queries (JOIN, GROUP BY, subquery, window, ORDER BY not matchable by an index, LIMIT/OFFSET, etc.) still fall back to the array path even when a block is supplied, since those genuinely require materialization. Block mode is a fast-path opt-in, not a semantic change.

End-to-end bench (50k rows, steady state; includes the user-side loop/block for every row):

    Path                                  Time    Speedup vs raw
    ──────────────────────────────────────────────────────────────
    Raw PRG DO WHILE !Eof() + WHERE sum   7.6ms   1.00x
    five_SQL array + FOR                  7.7ms   ~same
    five_SQL + block (new)                3.7ms   2.05x  ← beats raw
    ──────────────────────────────────────────────────────────────
    Raw PRG no WHERE                      6.1ms   1.00x
    five_SQL + block, no WHERE            2.9ms   2.10x  ← beats raw

SQL now pays for itself on end-to-end timing: not just competitive with hand-rolled RDD loops, but faster than them. The layered cost of FieldGet's Frame+RTL-dispatch that hand-written loops incur per call is gone; the block-callback path captures *dbf.DBFArea directly via FastFieldGetter and uses PcOpFieldGet to bypass dispatch in the compiled WHERE predicate.

Validation:
- FiveSql2 43/43 (array API unchanged)
- Harbour compat 51/51
- go test ./... ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 17:00:46 +09:00
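Sketch for e75167c2e9: the difference between the array path and the block path, reduced to two Go functions over an in-memory slice. Emp, scanArray and scanEach are hypothetical names; the real SqlScan/SqlEach operate on a DBF work area and a compiled WHERE predicate, which this sketch does not model.

```go
package main

import "fmt"

// Emp is a hypothetical row type for illustration.
type Emp struct {
	ID     int
	Name   string
	Salary int
}

// scanArray materializes every matching row before returning (array form).
func scanArray(rows []Emp, where func(Emp) bool) []Emp {
	out := make([]Emp, 0, len(rows))
	for _, r := range rows {
		if where == nil || where(r) {
			out = append(out, r)
		}
	}
	return out
}

// scanEach hands each matching row to fn and builds no result set (block form).
// It returns the number of rows delivered.
func scanEach(rows []Emp, where func(Emp) bool, fn func(Emp)) int {
	n := 0
	for _, r := range rows {
		if where == nil || where(r) {
			fn(r)
			n++
		}
	}
	return n
}

func main() {
	rows := []Emp{{1, "Ann", 60000}, {2, "Bob", 40000}, {3, "Cy", 75000}}
	highPaid := func(e Emp) bool { return e.Salary > 50000 }

	// Array form: allocate, then loop over the result.
	total := 0
	for _, e := range scanArray(rows, highPaid) {
		total += e.Salary
	}

	// Block form: same work, no intermediate slice.
	streamed := 0
	n := scanEach(rows, highPaid, func(e Emp) { streamed += e.Salary })

	fmt.Println(total == streamed, n)
}
```

Both forms compute the same sum; the block form simply skips building the intermediate result, which is where the commit attributes its end-to-end gain.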
CharlesKWON	ad69221136	revert(FiveSql2): restore TSqlIndex:FindExclusive scan

Previous short-circuit (return 0 unconditionally) was a workaround for two bugs that are both fixed now:

1. gengo PushLocal(0) panic on unresolved identifiers → fixed by `08ad6f4` (PushMemvar fallback).
2. dbInfo(DBI_FULLPATH / DBI_SHARED) returning NIL → fixed by `d74014a` (real implementations).

Restoring the original scan: walk workareas 1..250, check if any holds an exclusive lock on the target DBF. With dbInfo now functional and the DBI_* constants defined in include/dbinfo.ch (commit `3a00aa5`), this gives FiveSql2 real pre-flight conflict detection for concurrent table access rather than silently proceeding into a lock failure.

Validation:
- FiveSql2 43/43
- standalone PRG with dbUseArea + five_SQL works (was the original repro that triggered the workaround)

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 12:07:40 +09:00
CharlesKWON	8aaed994f4	perf(FiveSql2): hybrid fast path, 11x speedup on string WHERE scans

Implements hybrid execution model: keep AST tree-walk for SQL:2013+ features (Window, Recursive CTE, JOIN, aggregates) while compiling simple SELECT hot paths to Go + pcode. See docs/FiveSql2-Hybrid-Plan.md for the full architecture rationale (why not SQLite-style VDBE).

Hot path (single table, no joins/groups/aggregates):

- TryBuildFieldPositions: resolves SELECT column list to FieldPos array once per query (bails to PRG loop on any complex expr).
- TryCompileWhere + SqlExprToPrg: walks WHERE AST, emits equivalent PRG source, runs it through PcCompile to get a PcodeFunc.
- SqlScan RTL: Go-native scan loop with direct GoTop/EOF/Skip/GetValue, ExecPcode per row for WHERE, result array pre-alloc.

WHERE compiler scope:

- ND_LIT: numeric/logical/string (string literals AllTrim'd to match SqlCmpEq CHAR-padding semantics; rejects embedded quotes/newlines)
- ND_COL: CHAR fields auto-wrapped with AllTrim(FieldGet(n)) based on dbStruct() lookup cached once per query in aCompileStruct
- ND_BIN: = <> != < <= > >= AND OR + - * /
- ND_UNI: NOT -
- Anything else (ND_FN, ND_CASE, ND_SUB, ND_PAR, LIKE, IN, IS NULL, BETWEEN, dates) returns NIL → falls back to PRG tree-walk.

Bench (50k rows, ~/tmp ext4):

                       Before    After    Speedup
    Numeric WHERE      ~150ms    11.7ms   ~13x
    String WHERE       119.3ms   10.5ms   11.4x
    No WHERE           -         14.6ms   -
    Raw RDD baseline   6.8ms     6.8ms    1.0x

Remaining gap to raw RDD (~1.5x) is structural: Value boxing, result array construction, per-row ExecPcode frame overhead. Would need a Value-pool or SoA refactor to close further.

Side fixes bundled:

- TSqlIndex:FindExclusive short-circuited. Originally called dbInfo(DBI_FULLPATH)/DBI_SHARED, which are unresolved symbols in Five (dbInfo is a stub, DBI_* never defined). Panicked with "local variable index out of range: 0" whenever a standalone PRG had a workarea Used before calling five_SQL. 43-test masked the bug because it only reached FindExclusive with no open workareas. Restore the scan once dbInfo lands in hbrtl.
- cmd/five/main.go: FIVE_KEEP_BUILD=1 env var keeps the temp Go project around for debugging gengo output.

Validation:
- FiveSql2 43/43
- Harbour compat 51/51
- go test ./... ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 09:15:08 +09:00
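Sketch for 8aaed994f4: the "compile what you can, otherwise fall back" shape of the WHERE compiler. Note the swap: the commit emits PRG source and compiles it to pcode, whereas this illustration builds Go closures. Node, Row, compileNum and compileBool are hypothetical names, and only numeric literals, columns, `=`, `>` and AND are covered.

```go
package main

import "fmt"

// Node is a hypothetical, heavily simplified WHERE AST node.
type Node struct {
	Kind string  // "lit", "col" or "bin"; anything else is unsupported here
	Op   string  // for "bin": "=", ">" or "AND"
	Num  float64 // for "lit"
	Col  int     // for "col": 0-based field position
	L, R *Node
}

// Row holds numeric field values by position (illustrative only).
type Row []float64

// compileNum compiles a numeric operand, or reports ok=false if unsupported.
func compileNum(n *Node) (func(Row) float64, bool) {
	if n == nil {
		return nil, false
	}
	switch n.Kind {
	case "lit":
		v := n.Num
		return func(Row) float64 { return v }, true
	case "col":
		i := n.Col
		return func(r Row) float64 { return r[i] }, true
	}
	return nil, false
}

// compileBool compiles a predicate, or reports ok=false so the caller can
// fall back to a generic tree-walking evaluator.
func compileBool(n *Node) (func(Row) bool, bool) {
	if n == nil || n.Kind != "bin" {
		return nil, false
	}
	switch n.Op {
	case "AND":
		l, ok1 := compileBool(n.L)
		r, ok2 := compileBool(n.R)
		if !ok1 || !ok2 {
			return nil, false
		}
		return func(row Row) bool { return l(row) && r(row) }, true
	case "=", ">":
		l, ok1 := compileNum(n.L)
		r, ok2 := compileNum(n.R)
		if !ok1 || !ok2 {
			return nil, false
		}
		if n.Op == "=" {
			return func(row Row) bool { return l(row) == r(row) }, true
		}
		return func(row Row) bool { return l(row) > r(row) }, true
	}
	return nil, false
}

func main() {
	// WHERE col[1] > 50000
	where := &Node{Kind: "bin", Op: ">",
		L: &Node{Kind: "col", Col: 1}, R: &Node{Kind: "lit", Num: 50000}}
	pred, ok := compileBool(where)
	fmt.Println("compiled:", ok, "match:", ok && pred(Row{7, 60000}))

	// An unsupported node kind (e.g. a function call) reports ok=false.
	_, ok = compileBool(&Node{Kind: "fn"})
	fmt.Println("function node compiled:", ok)
}
```

Any unsupported node anywhere in the tree makes the whole compile report failure, mirroring the all-or-nothing "returns NIL → falls back" rule in the commit.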
Charles KWON OhJun	486e466592	feat: FiveSql2 43/43, @byref, mutable closure, RTL 479, DateTime fix

Major changes since last commit:

- FiveSql2 SQL:1999 engine (10,458 LOC), 43/43 ALL PASS
- 21 compiler/runtime bugs fixed (short-circuit AND/OR, FOR LOOP, etc.)
- @byref pass-by-reference via RefCell pattern
- Mutable closure capture (EnsureLocalRef + RefCell sharing)
- RTL: 400 → 479 functions (+79: file, string, datetime, hash, UTF-8)
- DateTime/Timestamp fully working (hb_DateTime, hb_Hour/Min/Sec, display)
- Reserved word guard (39 keywords blocked from function calls)
- AEval arg order fix (element before index)
- Closure capture redecl fix (unique _cap_ names per block)
- Hash/string indexing in ArrayPush/ArrayPop
- Harbour compat test suite: 51/51
- 4 docs: Porting Report, Implementation Plan, Optimization Plan, Commercialization

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-11 11:35:37 +09:00

7 Commits