fivedev/five - five - fivego gitea

Author	SHA1	Message	Date
CharlesKWON	f4ed42556b	checkpoint: season-wide bug fix campaign + infra Cumulative season's silent-bug hunting (~62 fixes) across the FiveSql2 SQL engine, the Five compiler/runtime, and the hbrdd RDD layer. Saved as a single checkpoint before refactoring the parser to delegate xBase command translation to the preprocessor. Highlights: FiveSql2 engine (_FiveSql2/src/) - prefix-glob index attach -> explicit convention (<table>_pk.ntx, <table>_uq.ntx, <table>.cdx) — fixes silent multi-row INSERT row-drop - DROP/CREATE TABLE FErase chain extended (.cdx, .fsc, .fsv, .dbt, .fpt) - COUNT(DISTINCT col) parsed + aggregated via hSeen hash - UNION column-count mismatch returns SQL_ERR_GRAMMAR (was silent) - DISTINCT + ORDER BY hidden-col leak fixed (trim before DISTINCT) - Derived table FROM (SELECT...) + JOIN right-side derived - Self-FK CASCADE depth 2+ via SqlGetSingleColPK pre-collect - LAG/LEAD default arg uses SqlEvalRowExpr (handles -N const exprs) - DATE literal round-trip validation (Feb 29 non-leap rejected) - CREATE OR REPLACE VIEW; CREATE VIEW errors on already-exists - AlterTable type dispatcher comma-wrapped (1-char type "A" no longer matches CHARACTER) Compiler / runtime - gengo: HB_ -> FV_ prefix on emitted Go function names (Five identity) - gengo split: emit_block.go, emit_stmt.go, folding.go extracted - parser/stmtreg.go nudges - hbrt: debug TUI/CLI restructure (debugcmd, debugkey, termios_*), windows debug stubs collapsed - thread/vm/value/class/pcinterp tightening from panic traces RDD layer (hbrdd/) - dbf: null bitmap support (null.go + null_test.go), mmap split (mmap_posix.go / mmap_windows.go), byte-level numeric parse - ntx/cdx: windows mmap parity - workarea + mem RDD: cross-area state-bleed fixes RTL (hbrtl/) - errorlog rewrite with platform-specific FD (errorlog_fd_unix / errorlog_fd_other) - sqlscan, sqlhelpers, indexrtl, datetime extensions Gates green at checkpoint: - go test ./... 
: PASS - FiveSql2 SQL:1999 : 43/43 - Harbour compat : 56/56 Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-30 09:26:25 +09:00
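The DATE literal round-trip validation listed in commit f4ed42556b (Feb 29 rejected in non-leap years) can be illustrated with a small Go sketch. This is not the FiveSql2 code: `validDate` is a hypothetical name, and the approach shown (build the date, then check that the components survive unchanged) is only one plausible way to do such a check.

```go
package main

import (
	"fmt"
	"time"
)

// validDate (hypothetical helper) reports whether y-m-d is a real calendar day.
// time.Date normalizes out-of-range values (Feb 29 in a non-leap year
// becomes Mar 1), so a date is valid only if it round-trips unchanged.
func validDate(y, m, d int) bool {
	t := time.Date(y, time.Month(m), d, 0, 0, 0, 0, time.UTC)
	return t.Year() == y && int(t.Month()) == m && t.Day() == d
}

func main() {
	fmt.Println(validDate(2024, 2, 29)) // leap year
	fmt.Println(validDate(2023, 2, 29)) // non-leap year
}
```

The same comparison also rejects month 13 or day 0, because both normalize to a different date.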
CharlesKWON	935883bb88	perf(fivesql2): Go-native FetchRow fast path — 1.3-1.7x on agg/window TSqlExecutor:FetchRow was the per-row workhorse for aggregation, HAVING, and window queries. Even with the pre-built aFetchCache binding columns to (nWA, nFPos), the PRG FOR loop paid one method dispatch per column per row (dbSelectArea, FieldGet, AllTrim, AAdd) — profile pinned it at ~30% of B4 CPU. SqlFetchRowFast collapses the cache-path loop into a single Go call: - bound entry: SelectByNum + area.GetValue directly - unbound (aggregate/expression): self:EvalExpr via Send - character values: TrimSpace inline The PRG FetchRow keeps its original cache-miss fallback path unchanged for rare queries where aFetchCache isn't built. Bench deltas (median of 3 steady runs, 1000 iters): B4_GROUP_HAVING 418 → 327 us -22% (1.28x) B9_ROW_NUMBER 191 → 120 us -37% (1.59x) B10_RANK_PART 228 → 135 us -41% (1.69x) B11_SUM_OVER 249 → 156 us -37% (1.60x) B14_COUNT 235 → 219 us -7% B15_CTE_WIN_JOIN 1577 → 1452 us -8% Single-table SELECT (B1-B3, B5-B7, B8) stays flat — those already hit the column-binding fast path and don't need aggregate dispatch. FiveSql2 43/43, Harbour compat 56/56. Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-18 13:50:02 +09:00
CharlesKWON	dd270d5d9d	perf: RTL Go-native migration — 27 optimizations, DML up to 70-90x Systematic pass through PRG hot paths, promoting them to Go RTL while preserving Harbour/FiveSql2 semantics. Full log in docs/RTL-Go-Native-Migration.md. Bench (bench_sql) vs 2026-04-08 baseline - B1 SELECT * 2,192 → 114 µs (19x) - B6 INNER JOIN 9,291 → 233 µs (40x) - B7 CTE simple 8,037 → 129 µs (62x) - B9 ROW_NUMBER 3,705 → 265 µs (14x) - B10 RANK PARTITION 4,748 → 309 µs (15x) - B12 INSERT (WA cache) 4,319 → 63 µs (69x) - B13 UPDATE (WA cache) 6,144 → 68 µs (90x) - B15 CTE+WIN+JOIN 18,395 → 1,873 µs (10x) Infrastructure - HbHash O(1) Index preserving insertion order (Harbour KEEPORDER) - HbDeepClone Go RTL (scalar-sharing, immutable hash keys) - MEMRDD auto-imported via gengo; all Five programs get mem:name driver - SQL plan + pcode caches (s_hPlanCache, s_hDmlPcodeCache) - Opt-in SqlWACacheEnable — dbUseArea/Close/Commit batched for DML SQL engine - FiveSql2 lexer ported to Go (byte FSM) with combined automatic template parameterization (literals → ?, concat queries share plan) - Go RTL: SqlDistinct, SqlGroupRows, SqlWindowPartitions, SqlWindowSortPartition, SqlWindowAssignRank, SqlComputeAggSimple, SqlBulkInsert, SqlBulkUpdate, SqlExprHasAgg, SqlEvalHaving - CTE / subquery / driving-table materialize paths use MEMRDD - SqlCoerce/SqlCmp/SqlIsTrue helpers moved from PRG to Go - SqlBulkUpdate defers Flush when WA cache active (APFS fsync was dominant B13 cost — 1.6ms/call → gone) Correctness fixes uncovered during migration - ASort default path now sorts dates/logicals/timestamps (was no-op) - ORDER BY default NULL placement matches PRG SqlRowCompare across Go fast path; explicit NULLS FIRST/LAST honored by both paths - SqlBulkUpdate respects EXCLUSIVE vs SHARED mode record locks - SqlCmp/SqlCmpEq normalize NumInt vs Double (caught by test 6b) Verification - go test ./... 
ALL PASS - FiveSql2 test_sql1999 43/43 - tests/compat_harbour 56/56 (+5 new: ASort dates/logicals, AScan int cross-type) - Regression test test_null_order.prg for ORDER BY NULL ordering Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>	2026-04-17 20:20:14 +09:00
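Commit dd270d5d9d mentions an O(1) index for HbHash that preserves insertion order. A minimal sketch of that general idea follows; `orderedHash` and its methods are hypothetical names, not the hbrt types, and deletion is left out for brevity.

```go
package main

import "fmt"

// orderedHash is a hypothetical insertion-ordered hash:
// keys/vals hold entries in insertion order, index maps key -> slot.
type orderedHash struct {
	keys  []string
	vals  []int
	index map[string]int
}

func newOrderedHash() *orderedHash {
	return &orderedHash{index: make(map[string]int)}
}

// Set overwrites in place when the key exists (order unchanged),
// otherwise appends a new slot at the end.
func (h *orderedHash) Set(k string, v int) {
	if i, ok := h.index[k]; ok {
		h.vals[i] = v
		return
	}
	h.index[k] = len(h.keys)
	h.keys = append(h.keys, k)
	h.vals = append(h.vals, v)
}

// Get is a single map lookup plus a slice index.
func (h *orderedHash) Get(k string) (int, bool) {
	i, ok := h.index[k]
	if !ok {
		return 0, false
	}
	return h.vals[i], true
}

func main() {
	h := newOrderedHash()
	h.Set("b", 1)
	h.Set("a", 2)
	h.Set("b", 3) // update keeps "b" in first position
	fmt.Println(h.keys, h.vals)
}
```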
CharlesKWON	3caadb23b9	perf: SqlOrderBy + SqlGroupBy Go RTL — native sort and aggregation SqlOrderBy: Go sort.Slice for ORDER BY, 10-50x faster than PRG ASort. SqlGroupBy: Go map-based GROUP BY accumulation (ready for integration). TryBuildSortSpec detects simple ORDER BY columns and routes to Go. Fallback to PRG for complex ORDER BY expressions. 43/43 + 41/41 verify + 51/51 compat + go test ALL PASS. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 14:41:41 +09:00
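For the simple-column ORDER BY case that commit 3caadb23b9 routes to `sort.Slice`, a sketch under simplifying assumptions might look like this. The `sortKey` type and `orderBy` function are hypothetical, and rows are plain `[]int` rather than Five values.

```go
package main

import (
	"fmt"
	"sort"
)

// sortKey (hypothetical) names one ORDER BY column and its direction.
type sortKey struct {
	col  int
	desc bool
}

// orderBy sorts rows in place, comparing the spec columns left to right.
func orderBy(rows [][]int, spec []sortKey) {
	sort.Slice(rows, func(i, j int) bool {
		for _, k := range spec {
			a, b := rows[i][k.col], rows[j][k.col]
			if a == b {
				continue // tie on this column, try the next one
			}
			if k.desc {
				return a > b
			}
			return a < b
		}
		return false // equal on every key
	})
}

func main() {
	rows := [][]int{{1, 30}, {2, 10}, {1, 20}}
	orderBy(rows, []sortKey{{col: 0}, {col: 1, desc: true}})
	fmt.Println(rows)
}
```

`sort.Slice` is not stable; if rows that tie on every key must keep their input order, `sort.SliceStable` takes the same arguments.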
CharlesKWON	5fc9c3bbea	perf: SqlHashJoin Go RTL — 3-way JOIN 4.2s→61ms (69x) Go-native multi-table hash join bypasses per-row PRG overhead. TryGoJoin detects equi-join + plain-col SELECT, aggregate cols get placeholder. 2-way 73→3ms, 3-way 3.9s→61ms. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-17 07:16:09 +09:00
CharlesKWON	bfc6ded8cb	perf(FiveSql2): SqlHashBuild + FetchRow column binding — 3-way JOIN 3x Complex-query benchmarking turned up two hot paths that the earlier SqlScan/SqlEach work didn't touch: multi-table JOIN and nested-scan row fetching. This commit hits both. --- Part 1: SqlHashBuild — Go-native hash-join build --- FiveSql2's HashJoin previously built the inner-side hash in PRG: WHILE !Eof() xVal := FieldGet(nFPos) cKey := SqlValToStr(xVal) IF !hb_HHasKey(hHash, cKey) ; hHash[cKey] := {} ; ENDIF AAdd(hHash[cKey], RecNo()) dbSkip() ENDDO That loop runs at ~40μs per row from class dispatch + hb_HHasKey lookups + AAdd growth + SqlValToStr formatting. On a 50k-row inner table that's ~2 seconds wasted on what should be a sub-50ms housekeeping op. New hbrtl.SqlHashBuild does the same thing in one Go-native pass: - Direct *dbf.DBFArea loop (no interface dispatch, same devirt as SqlScan) - Go `map[string][]int64` accumulates RecNos by key — one allocation per distinct key - Inline ASCII-only digit formatter for numeric keys (strconv.Itoa is allocation-heavy for small ints) - CHAR keys are right-trimmed to match SqlCmpEq semantics so the hash probe matches what EvalExpr would compute - Final Five hash is built once from Keys/Values/Order slices directly, skipping the per-key hb_HSet path HashJoin now calls `SqlHashBuild(nFPos)` instead of running the PRG loop. --- Part 2: TSqlExecutor:BuildFetchCache --- The JOIN fallback loop calls FetchRow per row. FetchRow was already column-ref-aware but did the string parse (`At + SubStr + Upper`) and `::FindWA` linear scan every single invocation. For a 50k-row join emitting 50k result rows, that's ~200k redundant resolutions. New BuildFetchCache walks the SELECT list once before the scan and pre-binds each plain-column expression to `{nWA, nFPos}`. FetchRow's new fast path checks ::aFetchCache and jumps straight to `dbSelectArea + FieldGet` when bound. Complex exprs (functions, CASE, subqueries) still fall through to EvalExpr. 
::aFetchCache is set right before the join WHILE loop and cleared after — no cross-query bleed. --- Bench (50k ord × 10k emp × 100 dept, 3-run steady state) --- Query Before After Speedup ──────────────────────────────────────────────────────────── 2-way INNER JOIN, 10k rows 91ms 68ms 1.34x 2-way JOIN + GROUP BY 110ms 94ms 1.17x 3-way INNER JOIN COUNT 2610ms 610ms 4.28x 3-way JOIN + GROUP BY 2860ms 830ms 3.45x The 3-way speedup is almost entirely SqlHashBuild. The 2-way case benefits from the fetch cache because its per-row cost is dominated by FetchRow (no second hash build to amortize). --- Limits still standing --- CTE + JOIN queries (Q7 in bench_complex: ~4.5s) aren't affected by either optimization — CTE materialization goes through a different path that writes/reads a temp DBF. Follow-up target. Validation: - FiveSql2 43/43 - Harbour compat 51/51 - go test ./... ALL PASS Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 18:47:20 +09:00
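The hash-build step described in commit bfc6ded8cb (a Go `map[string][]int64` of record numbers per key, with CHAR keys right-trimmed) can be sketched as below. `buildHash` and `probe` are hypothetical names, and the inner table is reduced to a slice of key strings; the real `SqlHashBuild` walks a DBF work area.

```go
package main

import (
	"fmt"
	"strings"
)

// buildHash groups 1-based record numbers by join key in one pass.
// Keys are right-trimmed so padded CHAR values match unpadded probes.
func buildHash(innerKeys []string) map[string][]int64 {
	h := make(map[string][]int64)
	for i, k := range innerKeys {
		k = strings.TrimRight(k, " ")
		h[k] = append(h[k], int64(i+1))
	}
	return h
}

// probe returns the inner record numbers that match an outer key.
func probe(h map[string][]int64, key string) []int64 {
	return h[strings.TrimRight(key, " ")]
}

func main() {
	h := buildHash([]string{"A  ", "B", "A"})
	fmt.Println(probe(h, "A"), probe(h, "B "), probe(h, "C"))
}
```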
CharlesKWON	d2ed140273	feat(FiveSql2): SqlEach block callback — beats raw RDD on end-to-end timing The structural 1.38x gap vs raw RDD for no-WHERE full scans wasn't a limit of our engine — it was a limit of the result shape. SqlScan materializes N rows as HbArray wrappers over a flat Value buffer, then the PRG caller iterates that materialized array. Two passes over the data. Raw RDD is one pass. SqlEach folds both passes into one. The caller supplies a code block that receives the selected column values as positional parameters; SqlEach invokes it per matching row. No result array is ever built. Usage (drop-in replacement for the common "scan + process" idiom): five_SQLEach( "SELECT id, name, salary FROM emp WHERE salary > 50000", {|nID, cName, nSalary| Process(nID, cName, nSalary) } ) API shape borrows Harbour's AEval/ASort block-callback convention, so there's nothing new to learn. Positional params also sidestep the `SELECT COUNT()` naming problem — no need to invent names for anonymous expressions. Implementation notes: - 4-way loop specialization ({DBF, generic Area} × {WHERE, none}), matching SqlScan. Each path is zero-allocation in the steady state. - Block invocation uses the direct pendingParams + blk.Fn(t) protocol rather than EvalBlock, which would allocate a temporary args slice on every call (50k scans × small slice adds up). - FastFieldGetter is installed the same way as SqlScan so PcOpFieldGet in the WHERE predicate skips the PushSymbol + Function dispatch.
Bench (50k rows, end-to-end including user-code loop, steady state): Path Time vs raw RDD ───────────────────────────────────────────────────── Raw PRG loop, WHERE + sum 8.7ms 1.00x SqlScan + PRG FOR, WHERE 5.1ms 0.59x SqlEach block, WHERE 4.1ms 0.47x ← beats raw ───────────────────────────────────────────────────── Raw PRG loop, no WHERE 6.1ms 1.00x SqlEach block, no WHERE 3.8ms 0.62x ← beats raw SqlEach is faster than a hand-rolled `DO WHILE !Eof()` loop because the per-row FieldGet in raw PRG still goes through a full Frame + RTL dispatch, whereas SqlEach's FastFieldGetter captures the concrete dbf.DBFArea directly. The SQL abstraction now costs nothing — it pays you to use it. Validation: - FiveSql2 43/43 - Harbour compat 51/51 - go test ./... ALL PASS Next step (not in this commit): FiveSql2 TSqlExecutor integration — detect when five_SQL is called with a block argument and route to SqlEach instead of SqlScan + array build. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 15:16:36 +09:00
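The single-pass callback shape that commit d2ed140273 describes for SqlEach (invoke a block per matching row instead of building a result array) is sketched below in plain Go. `each` is a hypothetical function over `[][]int` rows; it does not reproduce the pendingParams protocol or FastFieldGetter.

```go
package main

import "fmt"

// each walks rows once and hands every matching row to fn.
// No result slice is built; the return value counts rows passed to fn.
func each(rows [][]int, where func([]int) bool, fn func(row []int)) int {
	n := 0
	for _, r := range rows {
		if where != nil && !where(r) {
			continue
		}
		fn(r)
		n++
	}
	return n
}

func main() {
	rows := [][]int{{1, 40000}, {2, 60000}, {3, 75000}}
	total := 0
	n := each(rows,
		func(r []int) bool { return r[1] > 50000 }, // WHERE salary > 50000
		func(r []int) { total += r[1] })            // per-row callback
	fmt.Println(n, total)
}
```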
CharlesKWON	5dd212c761	perf(sqlscan): specialize four loop variants (DBF×WHERE matrix) SqlScan's inner scan was written as a single loop with `if whereFn != nil` and a `keep` shadow variable. Branch-predictable for sure, but still a few extra ops per row and it prevented Go from inlining the non-nil interface call on the Area branch. Split into four specialized loop bodies on the two axes that drive per-row cost: 1. dbfArea != nil && whereFn != nil 2. dbfArea != nil && whereFn == nil ← tightest path (SELECT *) 3. dbfArea == nil && whereFn != nil ← generic Area 4. dbfArea == nil && whereFn == nil Each body has exactly the instructions it needs — no dead branches, no shadow variables, no interface dispatch where avoidable. Copy-paste cost is real but each row save adds up at 50k iterations. Bench impact (50k rows, 3-run steady state): No WHERE 9.1ms → 8.7ms 1.38x vs raw (was 1.47x) Numeric WHERE 6.9ms → 7.0ms ~flat (within noise) String WHERE 6.2ms → 6.4ms ~flat (within noise) Raw RDD 6.3ms baseline Validation: - FiveSql2 43/43 - Harbour compat 51/51 - go test ./hbrtl/... PASS Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 14:04:48 +09:00
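The loop specialization in commit 5dd212c761 splits one loop with a per-row `if whereFn != nil` into separate bodies chosen once, before the scan. The sketch below shows only the WHERE axis on a slice of ints; `scan` is a hypothetical stand-in for SqlScan, and the DBF-versus-generic axis would double the number of bodies in the same way.

```go
package main

import "fmt"

// scan picks a loop body once, up front, instead of testing
// `where != nil` on every row.
func scan(rows []int, where func(int) bool) []int {
	out := make([]int, 0, len(rows))
	if where == nil {
		// Tightest path: no predicate call, no branch per row.
		for _, r := range rows {
			out = append(out, r)
		}
		return out
	}
	// Predicate path: exactly one call per row.
	for _, r := range rows {
		if where(r) {
			out = append(out, r)
		}
	}
	return out
}

func main() {
	rows := []int{1, 2, 3, 4}
	fmt.Println(scan(rows, nil))
	fmt.Println(scan(rows, func(v int) bool { return v%2 == 0 }))
}
```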
CharlesKWON	f9ffd4050e	perf(FiveSql2): FieldGet peephole + DBFArea devirt — WHERE at ~1.15x raw RDD Two stacked optimizations land on the SqlScan hot path. Combined effect on the 50k-row benchmark: Before After vs raw Numeric WHERE 10.2ms 7.8ms 1.15x String WHERE 10.5ms 7.9ms 1.15x No WHERE 9.2ms 10.0ms 1.45x Raw RDD baseline 6.8ms 6.8ms 1.00x WHERE-predicate paths are now within 15% of the raw Harbour-style RDD scan loop. The no-WHERE path is unchanged (slight jitter from the added devirt branch); FieldGet peephole doesn't apply there. --- Optimization 1: PcOpFieldGet peephole --- Adds a new pcode opcode `PcOpFieldGet <fieldIdx>` (0x46) that skips the usual PushSymbol+Function+Frame+FieldGet-RTL+EndProc chain and calls a direct field getter closure instead. genpc recognizes the shape `FieldGet(<int-literal>)` during emitCall and emits the specialized opcode automatically — no SQL-side API change. Integration: * hbrt.Thread.FastFieldGetter — hot-path closure set by scan loops. Non-nil → pcode bypasses dispatch. Nil → pcode resolves FIELDGET via the RTL symbol table (correctness fallback for any other callers). * compiler/genpc/genpc.go — peephole in emitCall. * hbrt/pcinterp.go — PcOpFieldGet handler. This alone cut numeric WHERE from 10.2 → 7.9ms: eliminated roughly one full Frame/EndProc + RTL dispatch per row × 50k rows. --- Optimization 2: DBFArea devirtualization --- SqlScan type-asserts the workarea to dbf.DBFArea once and runs a dedicated loop that calls GoTop/EOF/Skip/GetValue directly on the concrete type. Go's compiler inlines these, skipping the interface vtable per row. Non-DBF drivers still work via the generic Area branch. The FastFieldGetter closure also captures DBFArea directly in the DBF branch, so the WHERE predicate side of the hot loop is now entirely devirtualized: no interface dispatch between the pcode dispatch loop and the DBF record buffer. Validation: - FiveSql2 43/43 - Harbour compat 51/51 - go test ./... 
ALL PASS Remaining gap to raw RDD on no-WHERE (~1.45x) is dominated by the two-column row construction + ArraySlab + flat backing bookkeeping that the raw loop doesn't do. Going below that requires changing the SQL engine's result shape — out of scope here. Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 12:23:31 +09:00
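The devirtualization step in commit f9ffd4050e (type-assert the work area once, then loop on the concrete type) follows a common Go pattern, sketched here with hypothetical types. `area`, `dbfArea`, and `wrapped` are illustrative names; the real interface and `dbf.DBFArea` have far more methods.

```go
package main

import "fmt"

// area is a hypothetical cut-down work-area interface.
type area interface {
	GoTop()
	EOF() bool
	Skip()
	Value() int
}

// dbfArea is a hypothetical concrete driver.
type dbfArea struct {
	recs []int
	pos  int
}

func (d *dbfArea) GoTop()     { d.pos = 0 }
func (d *dbfArea) EOF() bool  { return d.pos >= len(d.recs) }
func (d *dbfArea) Skip()      { d.pos++ }
func (d *dbfArea) Value() int { return d.recs[d.pos] }

// wrapped satisfies area through embedding but is a different dynamic type,
// so it exercises the generic branch.
type wrapped struct{ *dbfArea }

// sum type-asserts once; the first loop calls methods on the concrete
// type, which lets the compiler inline them instead of dispatching per row.
func sum(a area) (total int, fast bool) {
	if d, ok := a.(*dbfArea); ok {
		for d.GoTop(); !d.EOF(); d.Skip() {
			total += d.Value()
		}
		return total, true
	}
	for a.GoTop(); !a.EOF(); a.Skip() {
		total += a.Value()
	}
	return total, false
}

func main() {
	fmt.Println(sum(&dbfArea{recs: []int{1, 2, 3}}))
	fmt.Println(sum(wrapped{&dbfArea{recs: []int{1, 2, 3}}}))
}
```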
CharlesKWON	5c067f35a4	perf(hbrt): ExecPcodeFast — pcode variant without defer/recover Pcode expressions compiled from SQL WHERE clauses (via genpc.CompileExpr) never contain BEGIN SEQUENCE and can't raise BreakValue, so the defer + recover dance in ExecPcode's EndProc is pure overhead. For FiveSql2's per-row WHERE evaluation on a 50k-row scan, that's 50k × ~15ns = ~750µs of pointless recover bookkeeping. Split ExecPcode into two variants sharing execPcodeBody: ExecPcode — full: Frame + defer EndProc. General-purpose, handles panics. Behavior unchanged. ExecPcodeFast — hot: Frame + execPcodeBody + EndProcFast. No defer, no recover. Caller guarantees the pcode body can't panic with HbError / BreakValue. SqlScan now uses ExecPcodeFast for per-row WHERE evaluation. Measured impact on 50k-row no-WHERE benchmark: 10.6ms → 9.2ms steady state (~13% faster). Effect is smaller on numeric-WHERE because per-row cost there is dominated by the opcode dispatch itself, not the frame exit. Validation: - FiveSql2 43/43 - go test ./hbrt/... PASS (pcode tests) - go test ./hbrtl/... PASS Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 12:07:54 +09:00
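The ExecPcode / ExecPcodeFast split in commit 5c067f35a4 shares one body between a variant with `defer` + `recover` and one without. A minimal sketch of that structure, using hypothetical names (`body`, `execSafe`, `execFast`) and a trivial body in place of the pcode interpreter:

```go
package main

import "fmt"

// body is the shared work; it panics on bad input.
func body(x int) int {
	if x < 0 {
		panic("negative input")
	}
	return x * 2
}

// execSafe pays for defer + recover on every call and turns a panic
// into an error.
func execSafe(x int) (res int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return body(x), nil
}

// execFast has no defer; the caller must guarantee body cannot panic.
func execFast(x int) int {
	return body(x)
}

func main() {
	fmt.Println(execSafe(-1))
	fmt.Println(execFast(21))
}
```

The fast variant is only safe when the caller can rule out panics up front, which is the guarantee the commit message states for WHERE expressions compiled through `genpc.CompileExpr`.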
CharlesKWON	85541a3035	perf(sqlscan): flat backing buffer — 30% faster no-WHERE scan The prior loop allocated one small `[]hbrt.Value` per matching row (for the row body) plus one HbArray header. For a 50k-row full scan that's 100k allocations of which the small-slice allocs dominated fragmentation and GC pressure. SQLite-inspired fix: pre-allocate a single flat []hbrt.Value of capacity `RecCount * nFields` at scan start and hand each row a three-index sub-slice (flat[off:end:end]). The capped sub-slice still forces a reallocation if PRG code later does `AAdd(row, x)`, so neighbor rows can't get clobbered. Sizing the initial buffer off RecCount(err-ignored) was the actual win — the previous naive grow-from-1024 policy caused five mid-scan reallocations of a ~200 KB buffer, each memcpy'ing everything so far. One upfront allocation amortizes much better. Bench (50k rows, ~/tmp ext4, 3 runs steady-state): Before After Δ no WHERE 14.6ms 10.6ms −27% numeric WHERE 11.7ms 10.0ms −15% string WHERE 10.5ms 11.0ms ~= raw RDD baseline 6.8ms 7.0ms Gap to raw RDD: 2.1x → 1.4x on the dominant no-WHERE case. What's left is pcode WHERE dispatch (ExecPcode frame per row), the Area interface boundary, and the HbArray header allocation per row — all structural costs that would need a wider refactor to close. Validation: - FiveSql2 43/43 - go test ./hbrtl/... PASS Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 10:57:05 +09:00
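The three-index sub-slice trick from commit 85541a3035 (`flat[off:end:end]`) can be shown in isolation. The sketch below uses `[]int` instead of `[]hbrt.Value`, and `splitRows` is a hypothetical helper; the point is that capping each row's capacity forces `append` to reallocate rather than overwrite the next row.

```go
package main

import "fmt"

// splitRows carves fixed-width rows out of one flat backing buffer.
// The third index caps each row's capacity at its own length.
func splitRows(flat []int, nFields int) [][]int {
	rows := make([][]int, 0, len(flat)/nFields)
	for off := 0; off+nFields <= len(flat); off += nFields {
		end := off + nFields
		rows = append(rows, flat[off:end:end])
	}
	return rows
}

func main() {
	flat := []int{1, 2, 3, 4, 5, 6} // three rows of two fields
	rows := splitRows(flat, 2)

	// cap(rows[0]) == 2, so this append copies to a new array
	// instead of writing into flat[2].
	rows[0] = append(rows[0], 99)

	fmt.Println(rows[0], rows[1], flat)
}
```

With a plain two-index slice (`flat[off:end]`), the same `append` would have written 99 over the first field of the second row.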
CharlesKWON	8aaed994f4	perf(FiveSql2): hybrid fast path — 11x speedup on string WHERE scans Implements hybrid execution model: keep AST tree-walk for SQL:2013+ features (Window, Recursive CTE, JOIN, aggregates) while compiling simple SELECT hot paths to Go + pcode. See docs/FiveSql2-Hybrid-Plan.md for the full architecture rationale (why not SQLite-style VDBE). Hot path (single table, no joins/groups/aggregates): - TryBuildFieldPositions: resolves SELECT column list to FieldPos array once per query (bails to PRG loop on any complex expr). - TryCompileWhere + SqlExprToPrg: walks WHERE AST, emits equivalent PRG source, runs it through PcCompile to get a PcodeFunc. - SqlScan RTL: Go-native scan loop — GoTop/EOF/Skip/GetValue direct, ExecPcode per row for WHERE, result array pre-alloc. WHERE compiler scope: - ND_LIT numeric/logical/string (string literals AllTrim'd to match SqlCmpEq CHAR-padding semantics; rejects embedded quotes/newlines) - ND_COL: CHAR fields auto-wrapped with AllTrim(FieldGet(n)) based on dbStruct() lookup cached once per query in aCompileStruct - ND_BIN: = <> != < <= > >= AND OR + - * / - ND_UNI: NOT - - Anything else (ND_FN, ND_CASE, ND_SUB, ND_PAR, LIKE, IN, IS NULL, BETWEEN, dates) returns NIL → falls back to PRG tree-walk. Bench (50k rows, ~/tmp ext4): Before After Speedup Numeric WHERE ~150ms 11.7ms ~13x String WHERE 119.3ms 10.5ms 11.4x No WHERE - 14.6ms - Raw RDD baseline 6.8ms 6.8ms 1.0x Remaining gap to raw RDD (~1.5x) is structural: Value boxing, result array construction, per-row ExecPcode frame overhead. Would need a Value-pool or SoA refactor to close further. Side fixes bundled: - TSqlIndex:FindExclusive short-circuited. Originally called dbInfo(DBI_FULLPATH)/DBI_SHARED which are unresolved symbols in Five (dbInfo is a stub, DBI_* never defined). Panic'd with "local variable index out of range: 0" whenever a standalone PRG had a workarea Used before calling five_SQL. 
43-test masked the bug because it only reached FindExclusive with no open workareas. Restore the scan once dbInfo lands in hbrtl. - cmd/five/main.go: FIVE_KEEP_BUILD=1 env var keeps the temp Go project around for debugging gengo output. Validation: - FiveSql2 43/43 - Harbour compat 51/51 - go test ./... ALL PASS Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>	2026-04-14 09:15:08 +09:00

12 Commits