five/compiler/pp/std.ch
CharlesKWON 2008266da7 feat(pp,rtl): Tier 2 audit followups — JOIN hash + PP validation + C heuristic
Three medium-priority audit items in one commit, each independently
revertible.

  * **#18 JOIN hash-join fast path.** New std.ch shape:
        JOIN WITH <alias> TO <file> [FIELDS ...] ON <mfield> = <dfield>
    expands to a 6-arg __dbJoin call with the master/detail key
    field names. Runtime detects the extra args, builds an O(M)
    hash over the detail's key column, then probes per master row
    for O(N+M) total, vs the FOR form's O(N*M). For 1k×1k that's
    about 2k hash operations vs 1M predicate evaluations, and the
    gap widens as N and M grow. The original FOR
    form is unchanged and stays the fallback for arbitrary
    predicates. New helper dbHashKey type-tags the key string so
    `1` (numeric), `"1"` (string), and `.T.` (logical) don't
    collide in the bucket map.

  * **#38 PP rule result-marker validation.** ParseRule now walks
    the result template after parseMarkers and warns about every
    `<name>` (or `<(name)>` / `<.name.>` / `<{name}>` / `#<name>`
    / `<"name">`) that doesn't match a pattern marker. Warnings
    flow into pp.errors via handleDirective with the directive's
    filename:line, so a typo'd `<NaMe>` in a case-sensitive
    `#xcommand` rule fails the build with a clear diagnostic
    instead of silently producing broken expansions.

  * **#44 looksLikeInlineC heuristic strengthened.** Catches more
    of the common Harbour-PRG-with-C-inline-block shapes that
    used to fall through and produce cryptic Go-side errors:
    function-like #define, `extern "C"` linkage blocks, C return-
    type declarations (`int foo(`, `static char* bar(`), and the
    hb_ret*() helper family used by Harbour's C FFI return
    setters. Two small predicate helpers (allLetters,
    allIdentChars) keep the C-vs-Go disambiguation tight enough
    that legit Go code (`func name() int { ... }`) doesn't trip.

  * **#28 LIST/DISPLAY pagination** — explicitly deferred. Proper
    pagination requires interactive terminal handling (Inkey(0)
    for the keypress) which would hang in CI / batch mode. Will
    revisit when an interactive terminal layer needs it for
    other reasons.
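
The shapes listed under #44 can be sketched in Go as below. This is an
illustration, not Five's looksLikeInlineC: the function name
`looksLikeInlineCSketch`, the regular expressions, and the short list of
C type keywords are assumptions made for the example, and it does not
reproduce the allLetters / allIdentChars helpers.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// Hypothetical patterns, one per shape named in item #44.
var (
	// Function-like #define: the macro name is followed directly by "(".
	funcDefine = regexp.MustCompile(`^#\s*define\s+[A-Za-z_]\w*\(`)
	// C return-type declaration such as "int foo(" or "static char* bar(".
	cDecl = regexp.MustCompile(`^(static\s+|extern\s+)?(const\s+)?(void|int|char|long|short|double|float)\b[\s*]*[A-Za-z_]\w*\s*\(`)
	// hb_ret*() return setters, e.g. "hb_retni(" or "hb_retc(".
	hbRet = regexp.MustCompile(`\bhb_ret[a-z_]*\s*\(`)
)

// looksLikeInlineCSketch reports whether one source line matches any of
// the C-only shapes above. It is a per-line check for illustration only.
func looksLikeInlineCSketch(line string) bool {
	s := strings.TrimSpace(line)
	if strings.HasPrefix(s, `extern "C"`) {
		return true
	}
	return funcDefine.MatchString(s) || cDecl.MatchString(s) || hbRet.MatchString(s)
}

func main() {
	samples := []string{
		"#define MAX(a,b) ((a)>(b)?(a):(b))",
		`extern "C" {`,
		"static char* bar(",
		"hb_retni( 42 );",
		"func name() int { return 1 }",
	}
	for _, s := range samples {
		fmt.Printf("%-40q %v\n", s, looksLikeInlineCSketch(s))
	}
}
```

The declaration pattern is anchored at the start of the line and requires
a C type keyword, which is why the Go signature `func name() int { ... }`
is not flagged in this sketch.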

Test fixtures: tests/std_ch/test_join_hash.prg verifies the new
ON-form path produces the same output as the FOR form would.
std.ch runner now stands at 16/16.
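
The equivalence that fixture checks can be sketched in Go. This is a
minimal illustration under assumptions, not the __dbJoin runtime code:
`Row`, `hashKey`, `nestedLoopJoin` and `hashJoin` are hypothetical names,
records are modeled as maps, and the type tags only stand in for what
the commit says dbHashKey does.

```go
package main

import "fmt"

// Row is a hypothetical stand-in for one DBF record.
type Row map[string]any

// hashKey type-tags a key value so that 1, "1" and true land in
// different buckets (the role the commit assigns to dbHashKey).
func hashKey(v any) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("N:%d", x)
	case string:
		return "C:" + x
	case bool:
		if x {
			return "L:T"
		}
		return "L:F"
	default:
		return fmt.Sprintf("%T:%v", x, x)
	}
}

// nestedLoopJoin is the O(N*M) baseline: every master row is compared
// with every detail row. It returns (master index, detail index) pairs.
func nestedLoopJoin(master, detail []Row, mKey, dKey string) [][2]int {
	var out [][2]int
	for i, m := range master {
		for j, d := range detail {
			if m[mKey] == d[dKey] {
				out = append(out, [2]int{i, j})
			}
		}
	}
	return out
}

// hashJoin builds one hash over the detail key column (O(M)), then
// probes it once per master row (O(N)), emitting the same pairs.
func hashJoin(master, detail []Row, mKey, dKey string) [][2]int {
	buckets := make(map[string][]int, len(detail))
	for j, d := range detail {
		k := hashKey(d[dKey])
		buckets[k] = append(buckets[k], j)
	}
	var out [][2]int
	for i, m := range master {
		for _, j := range buckets[hashKey(m[mKey])] {
			out = append(out, [2]int{i, j})
		}
	}
	return out
}

func main() {
	master := []Row{{"id": 1}, {"id": "1"}, {"id": 2}}
	detail := []Row{{"cust": 1}, {"cust": "1"}, {"cust": 1}, {"cust": true}}
	fmt.Println("nested:", nestedLoopJoin(master, detail, "id", "cust"))
	fmt.Println("hash:  ", hashJoin(master, detail, "id", "cust"))
}
```

Detail indices are appended to each bucket in scan order, so the hash
path emits pairs in the same order as the nested loop. That keeps a
row-by-row comparison of the two outputs meaningful.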

Other gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56
  std.ch suite       : 16/16
  FRB suite          : 7/7
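
Item #38 in the list above (result-marker validation) can be sketched in
Go as follows. This is an illustration under assumptions rather than
ParseRule itself: `markerRe`, `markerNames` and `unknownResultMarkers`
are hypothetical names, marker names are compared exactly, and the
single regular expression would also misread a literal `<` comparison
inside a result template.

```go
package main

import (
	"fmt"
	"regexp"
)

// markerRe pulls the identifier out of <name>, <(name)>, <.name.>,
// <{name}>, <"name">, <name,...>, <name:LIST> and <*name*>.
var markerRe = regexp.MustCompile(`<[^A-Za-z_<>]*([A-Za-z_][A-Za-z0-9_]*)[^<>]*>`)

// markerNames lists the marker identifiers found in one side of a rule.
func markerNames(s string) []string {
	var names []string
	for _, m := range markerRe.FindAllStringSubmatch(s, -1) {
		names = append(names, m[1])
	}
	return names
}

// unknownResultMarkers returns one warning per result marker whose name
// does not appear among the pattern markers.
func unknownResultMarkers(pattern, result string) []string {
	known := map[string]bool{}
	for _, n := range markerNames(pattern) {
		known[n] = true
	}
	var warnings []string
	for _, n := range markerNames(result) {
		if !known[n] {
			warnings = append(warnings,
				fmt.Sprintf("result marker <%s> has no matching pattern marker", n))
		}
	}
	return warnings
}

func main() {
	pattern := "LOCATE [FOR <for>] [WHILE <while>] [<rest:REST>]"
	good := "__dbLocate(<{for}>, <{while}>, <.rest.>)"
	typo := "__dbLocate(<{fro}>, <{while}>, <.rest.>)"
	fmt.Println(unknownResultMarkers(pattern, good))
	fmt.Println(unknownResultMarkers(pattern, typo))
}
```

Collecting the warnings as a slice, instead of failing on the first
one, lets a caller report every bad marker in a rule at once.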

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 19:21:19 +09:00


/*
* std.ch — Five standard preprocessor rules
*
* Equivalent to harbour-core/include/std.ch. Translates xBase legacy
* commands into function calls so the parser does not have to know
* about them. Auto-loaded by compiler/pp at startup.
*
* Phase A: only rules whose backend RTL function already exists in
* Five. Rules whose backend is not yet implemented (LABEL, REPORT,
* DIR) are deliberately NOT included here; the parser still handles
* them as silent no-ops until their RTL backend lands.
*
* Copyright (c) 2026 Charles KWON OhJun (charleskwonohjun@gmail.com)
* All rights reserved.
*/
/* --- file system --- */
#command ERASE <(f)> => FErase(<(f)>)
#command DELETE FILE <(f)> => FErase(<(f)>)
#command RENAME <(s)> TO <(d)> => FRename(<(s)>, <(d)>)
/* --- workarea lifecycle ---
Order matters: literal-keyword forms first, then bare CLOSE,
then the alias-form last so it doesn't shadow the others. */
#command CLOSE ALL => DbCloseAll()
#command CLOSE DATABASES => DbCloseAll()
#command CLOSE => DbCloseArea()
#command CLOSE <a> => <a>->( DbCloseArea() )
/* --- record state --- */
#command COMMIT => DbCommit()
#command UNLOCK ALL => DbUnlock()
#command UNLOCK => DbRUnlock()
/* --- record search --- */
#command LOCATE [FOR <for>] [WHILE <while>] ;
         [NEXT <next>] [RECORD <rec>] [<rest:REST>] [ALL] => ;
      __dbLocate(<{for}>, <{while}>, <next>, <rec>, <.rest.>)
#command CONTINUE => __dbContinue()
/* --- analytical ---
COUNT and SUM need no extra RTL: both expand to dbEval. AVERAGE
delegates to __dbAverage. These mirror Harbour's std.ch; SUM takes
the multi-expression form (`SUM x, y TO sx, sy`) via optional-repeat
syntax, while AVERAGE stays single-value until a real test exercises
the more elaborate form. */
/* COUNT/SUM/AVERAGE require TO <var> — without it the rewrite
would produce naked assignment with no LHS. Match Harbour
std.ch which also makes TO non-optional. */
#command COUNT TO <v> [FOR <for>] [WHILE <while>] ;
         [NEXT <next>] [RECORD <rec>] [<rest:REST>] [ALL] => ;
      <v> := 0 ; dbEval( {|| <v> := <v> + 1 }, ;
                         <{for}>, <{while}>, <next>, <rec>, <.rest.> )
/* SUM accepts multiple paired expressions/destinations:
`SUM x, y, z TO sx, sy, sz`. The optional `[, <xN>]` and
`[, <vN>]` repeats are matched pairwise; the result template's
chained `<v1> :=[ <vN> :=] 0` and comma-list inside the dbEval
block expand once per extra pair. Single-pair usage is unchanged.
AVERAGE below takes a single expression/destination pair. */
#command SUM <x1> [, <xN>] TO <v1> [, <vN>] ;
         [FOR <for>] [WHILE <while>] [NEXT <next>] ;
         [RECORD <rec>] [<rest:REST>] [ALL] => ;
      <v1> :=[ <vN> :=] 0 ; ;
      dbEval( {|| <v1> := <v1> + <x1>[, <vN> := <vN> + <xN>] }, ;
              <{for}>, <{while}>, <next>, <rec>, <.rest.> )
#command AVERAGE <x> TO <v> ;
         [FOR <for>] [WHILE <while>] [NEXT <next>] ;
         [RECORD <rec>] [<rest:REST>] [ALL] => ;
      <v> := __dbAverage( <{x}>, ;
                          <{for}>, <{while}>, <next>, <rec>, <.rest.> )
/* --- bulk record export ---
COPY TO copies visible records of the current workarea into a fresh
DBF. FIELDS/FOR/WHILE/NEXT/RECORD/REST work as in Harbour. SDF and
DELIMITED variants are not implemented; the matching rules below
raise a clear runtime error so callers don't quietly get a regular
DBF copy when they asked for an SDF dump. Order matters: the SDF /
DELIMITED rules must come before the regular COPY rule. */
#command COPY [TO <(f)>] [FIELDS <fields,...>] SDF [<*tail*>] => ;
      __dbNotImpl("COPY TO ... SDF")
#command COPY [TO <(f)>] [FIELDS <fields,...>] DELIMITED [<*tail*>] => ;
      __dbNotImpl("COPY TO ... DELIMITED")
#command COPY [TO <(f)>] [FIELDS <fields,...>] ;
         [FOR <for>] [WHILE <while>] [NEXT <next>] ;
         [RECORD <rec>] [<rest:REST>] [ALL] => ;
      __dbCopy( <(f)>, { <(fields)> }, ;
                <{for}>, <{while}>, <next>, <rec>, <.rest.> )
/* SORT TO copies the visible records into a fresh DBF in key order.
Each key in `<fields>` may carry `/D` for descending; default is
ascending. */
#command SORT [TO <(f)>] [ON <fields,...>] ;
         [FOR <for>] [WHILE <while>] [NEXT <next>] ;
         [RECORD <rec>] [<rest:REST>] [ALL] => ;
      __dbSort( <(f)>, { <(fields)> }, ;
                <{for}>, <{while}>, <next>, <rec>, <.rest.> )
/* --- console output ---
LIST emits every record matching the filter; DISPLAY without ALL
shows just the current record. Both share __dbList; the lAll
argument distinguishes them. TO FILE redirects to a freshly-truncated
text file; TO PRINTER is matched at PP-time and rewritten to
__dbNotImpl (Five doesn't drive a printer port). Order matters:
more specific rules first. */
#command LIST [<v,...>] TO PRINTER [<*tail*>] => ;
      __dbNotImpl("LIST ... TO PRINTER")
#command DISPLAY [<v,...>] TO PRINTER [<*tail*>] => ;
      __dbNotImpl("DISPLAY ... TO PRINTER")
#command LIST [<v,...>] TO FILE <(f)> [<off:OFF>] ;
         [FOR <for>] [WHILE <while>] [NEXT <next>] ;
         [RECORD <rec>] [<rest:REST>] [ALL] => ;
      __dbList( <.off.>, { <{v}> }, .T., ;
                <{for}>, <{while}>, <next>, <rec>, <.rest.>, <(f)> )
#command DISPLAY [<v,...>] TO FILE <(f)> [<off:OFF>] ;
         [FOR <for>] [WHILE <while>] [NEXT <next>] ;
         [RECORD <rec>] [<rest:REST>] [<all:ALL>] => ;
      __dbList( <.off.>, { <{v}> }, <.all.>, ;
                <{for}>, <{while}>, <next>, <rec>, <.rest.>, <(f)> )
#command LIST [<v,...>] [<off:OFF>] ;
         [FOR <for>] [WHILE <while>] [NEXT <next>] ;
         [RECORD <rec>] [<rest:REST>] [ALL] => ;
      __dbList( <.off.>, { <{v}> }, .T., ;
                <{for}>, <{while}>, <next>, <rec>, <.rest.> )
#command DISPLAY [<v,...>] [<off:OFF>] ;
         [FOR <for>] [WHILE <while>] [NEXT <next>] ;
         [RECORD <rec>] [<rest:REST>] [<all:ALL>] => ;
      __dbList( <.off.>, { <{v}> }, <.all.>, ;
                <{for}>, <{while}>, <next>, <rec>, <.rest.> )
/* TOTAL TO writes one record per consecutive run of equal key values
from the source. Numeric fields named in FIELDS are summed; every
other (non-memo) field takes the first record's value. The source
must already be sorted/indexed on the key for the grouping to
produce one row per distinct value.
Note on key syntax: TOTAL evaluates `<key>` only in the source
workarea, so `<{key}>` (verbatim blockify) is enough; the user can
write `ON src->dept` (alias-qualified) or `ON _FIELD->dept`
(current-area). UPDATE FROM evaluates the key block in BOTH
master and detail context and therefore needs `_FIELD->`-wrapped
bare keys instead — the two rules look superficially similar but
their evaluation contexts differ. */
#command TOTAL TO <(f)> ON <key> [FIELDS <fields,...>] ;
         [FOR <for>] [WHILE <while>] [NEXT <next>] ;
         [RECORD <rec>] [<rest:REST>] [ALL] => ;
      __dbTotal( <(f)>, <{key}>, { <(fields)> }, ;
                 <{for}>, <{while}>, <next>, <rec>, <.rest.> )
/* JOIN merges the current ("master") workarea with the named
detail alias into a fresh DBF, emitting one output row per
master/detail pair where FOR evaluates true.
The ON form takes the equality-key field names directly and
activates a hash-join fast path: build a hash over the detail's
key column once, then probe per master row — O(N+M) total instead
of the FOR form's O(N*M) nested-loop. Use it whenever the
join predicate is a simple `master.k = detail.k` equality. The
FOR form remains available for arbitrary predicates. Order
matters: the ON rule is more specific, so it is listed first and
wins the match.
Note for callers: ON expects bare field names, not expressions.
Five doesn't auto-resolve bare identifiers to fields, so std.ch
passes them as quoted strings via <(mfield)> / <(dfield)>: the
PP captures the field names verbatim and the runtime-side __dbJoin
does its own field lookup. */
#command JOIN WITH <(alias)> TO <(f)> [FIELDS <fields,...>] ;
         ON <mfield> = <dfield> => ;
      __dbJoin( <(alias)>, <(f)>, { <(fields)> }, NIL, ;
                <(mfield)>, <(dfield)> )
#command JOIN [WITH <(alias)>] [TO <(f)>] [FIELDS <fields,...>] ;
         [FOR <for>] => ;
      __dbJoin( <(alias)>, <(f)>, { <(fields)> }, <{for}> )
/* UPDATE FROM walks the named detail alias and applies the
REPLACE ... WITH ... clauses to the matching master record.
Both areas should be sorted on the key for the default forward-
walk; pass RANDOM to scan master from top for each detail key.
Note 1: ON <key> is wrapped as `_FIELD-><key>` rather than the bare
`<{key}>` Harbour uses, because the same block must evaluate
against both master and detail. Bare identifiers don't auto-bind
to fields under Five — `_FIELD->` makes the dispatch explicit.
Note 2: FROM/ON/REPLACE are all required (Harbour technically allows
them in any order but every real call site provides all three). The
former optional brackets allowed compile-clean garbage like a bare
`UPDATE` to expand to a broken-syntax call. Keep them mandatory. */
#command UPDATE FROM <(alias)> ON <key> [<rand:RANDOM>] ;
         REPLACE <f1> WITH <x1> [, <fN> WITH <xN>] => ;
      __dbUpdate( <(alias)>, {|| _FIELD-><key> }, <.rand.>, ;
                  {|| _FIELD-><f1> := <x1>[, _FIELD-><fN> := <xN>] } )
/* --- bulk maintenance --- */
#command REINDEX => DbReindex()
#command PACK => DbPack()
#command ZAP => DbZap()
/* --- input / shell --- */
#command KEYBOARD <text> => Keyboard(<text>)
#command RUN <*cmd*> => hb_Run(<(cmd)>)
/* --- legacy GET system ---
MENU TO is intentionally absent: it requires the @ PROMPT statement
companion which Five doesn't implement. Adding the rule would let
user code compile and then panic at runtime on the missing
__MenuTo() symbol. Keep the parser's silent no-op for MENU TO until
@ PROMPT lands. */
#command CLEAR GETS => GetList := {}