feat(pp,rtl): Tier 2 audit followups — JOIN hash + PP validation + C heuristic
Three medium-priority audit items in one commit, each independently
revertible.
* **#18 JOIN hash-join fast path.** New std.ch shape:
JOIN WITH <alias> TO <file> [FIELDS ...] ON <mfield> = <dfield>
expands to a 6-arg __dbJoin call that also carries the master and
detail key field names. The runtime detects the extra args, builds a
hash over the detail's key column in O(M), then probes it once per
master row, for O(N+M) total versus the FOR form's O(N*M). For
1k×1k rows that's about 2k vs 1M operations, and the gap widens as
the tables grow. The original FOR form is unchanged and remains the
fallback for arbitrary predicates. A new helper, dbHashKey,
type-tags the key string so `1` (numeric), `"1"` (string), and
`.T.` (logical) don't collide in the bucket map.
* **#38 PP rule result-marker validation.** ParseRule now walks
the result template after parseMarkers and warns about every
`<name>` (or `<(name)>` / `<.name.>` / `<{name}>` / `#<name>`
/ `<"name">`) that doesn't match a pattern marker. The warnings
flow into pp.errors via handleDirective, tagged with the
directive's filename:line, so a typo'd `<NaMe>` in a
case-sensitive `#xcommand` rule fails the build with a clear
diagnostic instead of silently producing broken expansions.
* **#44 looksLikeInlineC heuristic strengthened.** It now catches
more of the common shapes of Harbour PRG files with inline C
blocks that used to fall through and produce cryptic Go-side
errors: function-like #define, `extern "C"` linkage blocks, C
function declarations with a return type (`int foo(`,
`static char* bar(`), and calls to the hb_ret*() family, the
return-value setters in Harbour's C API. Two small predicate
helpers (allLetters, allIdentChars) keep the C-vs-Go
disambiguation tight enough that legitimate Go code
(`func name() int { ... }`) doesn't trip it.
* **#28 LIST/DISPLAY pagination.** Explicitly deferred. Proper
pagination requires interactive terminal handling (Inkey(0) to
wait for the keypress), which would hang in CI and batch mode.
Will revisit when an interactive terminal layer is needed for
other reasons.
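For #38, the result-template check can be sketched in Go roughly as below. This is a minimal sketch, not the commit's ParseRule code: it assumes the pattern's marker names have already been collected into a set and that marker names are identifier-shaped. The regex and the `unknownResultMarkers` function are hypothetical names used only for illustration.

```go
package main

import (
	"fmt"
	"regexp"
)

// resultMarker recognises the result-marker spellings listed for #38:
// <name>, <(name)>, <.name.>, <{name}>, #<name> and <"name">.
// Assumption: marker names look like identifiers ([A-Za-z_][A-Za-z0-9_]*).
var resultMarker = regexp.MustCompile(
	`#?<(?:\(([A-Za-z_]\w*)\)|\.([A-Za-z_]\w*)\.|\{([A-Za-z_]\w*)\}|"([A-Za-z_]\w*)"|([A-Za-z_]\w*))>`)

// unknownResultMarkers (hypothetical) returns, in order of appearance, every
// marker name in the result template that the pattern did not declare.
// The lookup is case-sensitive, so <NaMe> does not match a declared "name".
func unknownResultMarkers(result string, declared map[string]bool) []string {
	var unknown []string
	for _, m := range resultMarker.FindAllStringSubmatch(result, -1) {
		// Exactly one capture group is non-empty for each match.
		for _, name := range m[1:] {
			if name != "" && !declared[name] {
				unknown = append(unknown, name)
			}
		}
	}
	return unknown
}

func main() {
	declared := map[string]bool{"alias": true, "f": true, "fields": true, "mfield": true, "dfield": true}
	// Deliberate typo: <(dFiled)> instead of <(dfield)>.
	result := `__dbJoin( <(alias)>, <(f)>, { <(fields)> }, NIL, <(mfield)>, <(dFiled)> )`
	for _, name := range unknownResultMarkers(result, declared) {
		fmt.Printf("warning: result marker <%s> has no matching pattern marker\n", name)
	}
}
```

In the real PP each name would be reported with the directive's filename:line; the sketch only prints the bare name.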
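The #44 heuristic can be approximated as a per-line scan. The sketch below is a hypothetical reconstruction: only the names looksLikeInlineC, allLetters and allIdentChars come from the commit. Their bodies, the other predicates and the `cDeclWords` keyword list are assumptions, and `strings.CutPrefix` needs Go 1.20 or later. The key disambiguation is that every word before the function name must be a lowercase C keyword, which rejects `func name() int {` and also Harbour's `STATIC FUNCTION Foo(`.

```go
package main

import (
	"fmt"
	"strings"
)

func isASCIILetter(r rune) bool { return r >= 'a' && r <= 'z' || r >= 'A' && r <= 'Z' }

// allLetters reports whether s is non-empty and made only of ASCII letters.
func allLetters(s string) bool {
	for _, r := range s {
		if !isASCIILetter(r) {
			return false
		}
	}
	return s != ""
}

// allIdentChars reports whether s is a non-empty C identifier
// (letter or '_' first, then letters, digits or '_').
func allIdentChars(s string) bool {
	for i, r := range s {
		if !(isASCIILetter(r) || r == '_' || (i > 0 && r >= '0' && r <= '9')) {
			return false
		}
	}
	return s != ""
}

// cDeclWords: assumed set of C keywords that may precede a function name.
var cDeclWords = map[string]bool{
	"static": true, "extern": true, "inline": true, "const": true, "unsigned": true, "signed": true,
	"void": true, "char": true, "short": true, "int": true, "long": true, "float": true, "double": true,
}

// isCReturnTypeDecl matches `int foo(` and `static char* bar(` shapes.
func isCReturnTypeDecl(line string) bool {
	head, _, found := strings.Cut(line, "(")
	words := strings.Fields(head)
	if !found || len(words) < 2 || !allIdentChars(strings.TrimLeft(words[len(words)-1], "*")) {
		return false
	}
	sawKeyword := false
	for _, w := range words[:len(words)-1] {
		t := strings.TrimRight(w, "*")
		if t == "" {
			continue // a lone "*", as in `char * bar(`
		}
		if !cDeclWords[t] {
			return false // e.g. Go's "func" or Harbour's "FUNCTION"
		}
		sawKeyword = true
	}
	return sawKeyword
}

// isFunctionLikeDefine matches `#define NAME(` with no space before "(".
func isFunctionLikeDefine(line string) bool {
	rest, ok := strings.CutPrefix(line, "#define")
	if !ok || rest == "" || (rest[0] != ' ' && rest[0] != '\t') {
		return false
	}
	rest = strings.TrimLeft(rest, " \t")
	i := strings.IndexByte(rest, '(')
	return i > 0 && allIdentChars(rest[:i])
}

// callsHbRet matches hb_ret( and hb_retXX( calls such as hb_retni(.
func callsHbRet(line string) bool {
	_, rest, ok := strings.Cut(line, "hb_ret")
	if !ok {
		return false
	}
	i := strings.IndexByte(rest, '(')
	return i == 0 || (i > 0 && allLetters(rest[:i]))
}

// looksLikeInlineC reports whether any line of src has one of the C shapes.
func looksLikeInlineC(src string) bool {
	for _, raw := range strings.Split(src, "\n") {
		line := strings.TrimSpace(raw)
		if strings.HasPrefix(line, `extern "C"`) || isFunctionLikeDefine(line) ||
			isCReturnTypeDecl(line) || callsHbRet(line) {
			return true
		}
	}
	return false
}

func main() {
	for _, s := range []string{"int foo(int x) {", "func name() int { return 1 }"} {
		fmt.Printf("%-32q inline C? %v\n", s, looksLikeInlineC(s))
	}
}
```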
Test fixtures: tests/std_ch/test_join_hash.prg verifies the new
ON-form path produces the same output as the FOR form would.
std.ch runner now stands at 16/16.
Other gates green:
go test ./... : PASS
FiveSql2 SQL:1999 : 43/43
Harbour compat : 56/56
std.ch suite : 16/16
FRB suite : 7/7
Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
@@ -154,7 +154,26 @@
 
 /* JOIN merges the current ("master") workarea with the named
 detail alias into a fresh DBF, emitting one output row per
-master/detail pair where FOR evaluates true. */
+master/detail pair where FOR evaluates true.
+
+The ON form takes the equality-key field names directly and
+activates a hash-join fast path: build a hash over the detail's
+key column once, then probe per master row — O(N+M) total instead
+of the FOR form's O(N*M) nested-loop. Use it whenever the
+join predicate is a simple `master.k = detail.k` equality. The
+FOR form remains available for arbitrary predicates. Order
+matters: the ON rule is more specific so it wins.
+
+Note for callers: ON expects bare field names, not expressions.
+Five doesn't auto-resolve bare identifiers to fields, but std.ch
+passes them as quoted strings via <(mfield)> / <(dfield)> so the
+PP captures the field names verbatim — runtime-side __dbJoin
+does its own field lookup. */
+#command JOIN WITH <(alias)> TO <(f)> [FIELDS <fields,...>] ;
+ON <mfield> = <dfield> => ;
+__dbJoin( <(alias)>, <(f)>, { <(fields)> }, NIL, ;
+<(mfield)>, <(dfield)> )
+
 #command JOIN [WITH <(alias)>] [TO <(f)>] [FIELDS <fields,...>] ;
 [FOR <for>] => ;
 __dbJoin( <(alias)>, <(f)>, { <(fields)> }, <{for}> )
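The build/probe split described in the ON-form comment above, together with the #18 dbHashKey type tagging, can be sketched in Go as follows. `Row`, `hashKey` and `hashJoin` are hypothetical stand-ins, not the runtime's __dbJoin or dbHashKey, and the `N:`/`C:`/`L:` tag format is an assumption modelled on xBase type letters.

```go
package main

import (
	"fmt"
	"strconv"
)

// Row is a hypothetical stand-in for one DBF record: field name -> value.
type Row map[string]any

// hashKey type-tags a key so numeric 1, string "1" and logical .T. map to
// different bucket keys. The tag format is an assumption; the commit does
// not show dbHashKey's actual encoding.
func hashKey(v any) string {
	switch x := v.(type) {
	case int:
		return hashKey(float64(x)) // 1 and 1.0 compare equal, so share a key
	case float64:
		return "N:" + strconv.FormatFloat(x, 'g', -1, 64)
	case string:
		return "C:" + x
	case bool:
		if x {
			return "L:T"
		}
		return "L:F"
	default:
		// Other types (dates, NIL, ...) are out of scope for this sketch;
		// tag them with the Go type name so they can't collide with N/C/L.
		return fmt.Sprintf("?%T:%v", x, x)
	}
}

// hashJoin pairs every master row with every detail row whose key matches.
// Build: one pass over detail, O(M). Probe: one map lookup per master row, O(N).
func hashJoin(master, detail []Row, mfield, dfield string) [][2]Row {
	buckets := make(map[string][]Row, len(detail))
	for _, d := range detail {
		k := hashKey(d[dfield])
		buckets[k] = append(buckets[k], d)
	}
	var out [][2]Row
	for _, m := range master {
		for _, d := range buckets[hashKey(m[mfield])] {
			out = append(out, [2]Row{m, d})
		}
	}
	return out
}

func main() {
	master := []Row{{"id": 1, "name": "alpha"}, {"id": 2, "name": "beta"}}
	detail := []Row{
		{"mid": 1.0, "qty": 5},
		{"mid": "1", "qty": 7},  // string "1": must not join with numeric 1
		{"mid": true, "qty": 9}, // logical .T.: must not join either
		{"mid": 2, "qty": 3},
	}
	// prints "alpha 5" then "beta 3"
	for _, p := range hashJoin(master, detail, "id", "mid") {
		fmt.Println(p[0]["name"], p[1]["qty"])
	}
}
```

When detail keys repeat, every matching pair is still emitted, so the work is O(N+M) plus the number of output rows; the plain O(N+M) figure describes the case where keys are unique.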