feat(pp,rtl): Tier 2 audit followups — JOIN hash + PP validation + C heuristic

Three medium-priority audit items in one commit, each independently
revertible.

  * **#18 JOIN hash-join fast path.** New std.ch shape:
        JOIN WITH <alias> TO <file> [FIELDS ...] ON <mfield> = <dfield>
    expands to a 6-arg __dbJoin call with the master/detail key
    field names. The runtime detects the extra args, builds a hash
    over the detail's key column in O(M), then probes it once per
    master row, for O(N+M) total (N master rows, M detail rows)
    versus the FOR form's O(N*M). For 1k×1k that's roughly 2k vs
    1M operations, and the gap widens as the tables grow. The
    original FOR form is unchanged and stays the fallback for
    arbitrary predicates. New helper dbHashKey type-tags the key
    string so `1` (numeric), `"1"` (string), and `.T.` (logical)
    don't collide in the bucket map.

  * **#38 PP rule result-marker validation.** ParseRule now walks
    the result template after parseMarkers and warns about every
    `<name>` (or `<(name)>` / `<.name.>` / `<{name}>` / `#<name>`
    / `<"name">`) that doesn't match a pattern marker. Warnings
    flow into pp.errors via handleDirective with the directive's
    filename:line, so a typo'd `<NaMe>` in a case-sensitive
    `#xcommand` rule fails the build with a clear diagnostic
    instead of silently producing broken expansions.

  * **#44 looksLikeInlineC heuristic strengthened.** Catches more
    of the common Harbour-PRG-with-C-inline-block shapes that
    used to fall through and produce cryptic Go-side errors:
    function-like #define, `extern "C"` linkage blocks, C return-
    type declarations (`int foo(`, `static char* bar(`), and the
    hb_ret*() family of return setters from Harbour's C FFI.
    Two small predicate helpers (allLetters,
    allIdentChars) keep the C-vs-Go disambiguation tight enough
    that legit Go code (`func name() int { ... }`) doesn't trip.

  * **#28 LIST/DISPLAY pagination** — explicitly deferred. Proper
    pagination requires interactive terminal handling (Inkey(0)
    for the keypress) which would hang in CI / batch mode. Will
    revisit when an interactive terminal layer needs it for
    other reasons.
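
The hash-join path described under #18 can be sketched roughly as follows. This is a minimal illustration, not the commit's runtime code: `Row`, `hashKey`, and `hashJoin` are hypothetical names, and the tag prefixes (`C:`, `N:`, `L:`) are an assumed encoding standing in for whatever dbHashKey actually emits.

```go
package main

import (
	"fmt"
	"strconv"
)

// Row is a hypothetical in-memory record: field name -> value.
type Row map[string]any

// hashKey is a stand-in for dbHashKey (assumed behavior): it prefixes the
// rendered value with a type tag so that 1, "1" and true never share a bucket.
func hashKey(v any) string {
	switch x := v.(type) {
	case string:
		return "C:" + x
	case int:
		return "N:" + strconv.Itoa(x)
	case float64:
		return "N:" + strconv.FormatFloat(x, 'g', -1, 64)
	case bool:
		return "L:" + strconv.FormatBool(x)
	default:
		return fmt.Sprintf("?:%v", x)
	}
}

// hashJoin indexes the detail rows once (O(M)), then probes the index once
// per master row (O(N)), returning every matching master/detail pair.
func hashJoin(master, detail []Row, mField, dField string) [][2]Row {
	index := make(map[string][]Row, len(detail))
	for _, d := range detail {
		k := hashKey(d[dField])
		index[k] = append(index[k], d)
	}
	var out [][2]Row
	for _, m := range master {
		for _, d := range index[hashKey(m[mField])] {
			out = append(out, [2]Row{m, d})
		}
	}
	return out
}
```

Keeping a slice per bucket preserves detail order for duplicate keys, which is what a nested-loop FOR join would also produce for a pure equality predicate.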

Test fixtures: tests/std_ch/test_join_hash.prg verifies the new
ON-form path produces the same output as the FOR form would.
std.ch runner now stands at 16/16.

Other gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56
  std.ch suite       : 16/16
  FRB suite          : 7/7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 19:21:19 +09:00
parent 29ca02e1bc
commit 2008266da7
6 changed files with 379 additions and 14 deletions


@@ -154,7 +154,26 @@
/* JOIN merges the current ("master") workarea with the named
detail alias into a fresh DBF, emitting one output row per
master/detail pair where FOR evaluates true.
The ON form takes the equality-key field names directly and
activates a hash-join fast path: build a hash over the detail's
key column once, then probe per master row — O(N+M) total instead
of the FOR form's O(N*M) nested-loop. Use it whenever the
join predicate is a simple `master.k = detail.k` equality. The
FOR form remains available for arbitrary predicates. Order
matters: the ON rule is more specific so it wins.
Note for callers: ON expects bare field names, not expressions.
Five doesn't auto-resolve bare identifiers to fields, but std.ch
passes them as quoted strings via <(mfield)> / <(dfield)> so the
PP captures the field names verbatim — runtime-side __dbJoin
does its own field lookup. */
#command JOIN WITH <(alias)> TO <(f)> [FIELDS <fields,...>] ;
ON <mfield> = <dfield> => ;
__dbJoin( <(alias)>, <(f)>, { <(fields)> }, NIL, ;
<(mfield)>, <(dfield)> )
#command JOIN [WITH <(alias)>] [TO <(f)>] [FIELDS <fields,...>] ;
[FOR <for>] => ;
__dbJoin( <(alias)>, <(f)>, { <(fields)> }, <{for}> )
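
The rules above are the kind of input the #38 result-marker check is meant to vet: every marker in the result must name a marker declared in the pattern. A rough sketch of that check, under stated assumptions, is below. `unknownResultMarkers` and both regular expressions are hypothetical and cover only the marker shapes listed in the commit message; the real ParseRule logic and its case-sensitivity rules may differ.

```go
package main

import (
	"regexp"
	"strings"
)

var (
	// Pattern side: <name>, <(name)>, <name,...>, <*name*>, <!name!> (assumed shapes).
	patternMarkerRe = regexp.MustCompile(`<[(*!]?([A-Za-z_][A-Za-z0-9_]*)`)
	// Result side: <name>, <(name)>, <.name.>, <{name}>, <"name">; #<name> is
	// covered because it contains a plain <name>.
	resultMarkerRe = regexp.MustCompile(`<[({."]?([A-Za-z_][A-Za-z0-9_]*)[)}."]?>`)
)

// unknownResultMarkers returns the result-template marker names that have no
// counterpart in the pattern. caseSensitive is an assumed switch for rule
// kinds that compare marker names exactly.
func unknownResultMarkers(pattern, result string, caseSensitive bool) []string {
	norm := func(s string) string {
		if caseSensitive {
			return s
		}
		return strings.ToLower(s)
	}
	declared := map[string]bool{}
	for _, m := range patternMarkerRe.FindAllStringSubmatch(pattern, -1) {
		declared[norm(m[1])] = true
	}
	var unknown []string
	for _, m := range resultMarkerRe.FindAllStringSubmatch(result, -1) {
		if !declared[norm(m[1])] {
			unknown = append(unknown, m[1])
		}
	}
	return unknown
}
```

Run against the ON rule's pattern and result, this sketch reports nothing; misspelling `<(mfield)>` as `<(mfeild)>` in the result would report `mfeild`.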