64 Commits

Author SHA1 Message Date
CharlesKWON
675eaa4def feat(hbrtl): FV_HTTPGET / FV_HTTPPOST / FV_ZIP* / FV_XML_ROWS
New Five-native HTTP / ZIP / XML primitives so PRG code can do
HTTPS fetch, ZIP container reads, and streaming XML row extraction
without dropping into BEGINDUMP. FV_ prefix marks Five-original
RTL (distinct from Harbour-inherited HB_ surface).

FV_HTTPGET(cUrl [, hOpts]) / FV_HTTPPOST(cUrl, cBody [, hOpts])
  hOpts:   { headers: {=>}, timeout: nSec, tls_legacy: .T./.F. }
  Result:  { status, body, error, headers }
  tls_legacy re-enables TLS_RSA cipher suites for legacy
  endpoints (DART OpenAPI pins them).

FV_ZIPENTRIES(cZipBytes) / FV_ZIPREAD(cZipBytes, cEntryName)
  Read ZIP archives held in memory (e.g. from FV_HTTPGET).

FV_XML_ROWS(cXml, cRowTag)
  Streaming reader for repeating-record XML. Each row becomes a
  flat hash of immediate-child element name -> text. Verified
  against DART corpCode.xml: 30 MB / 118k rows in seconds, no
  full-tree allocation.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-29 08:47:34 +09:00
c7ac4044f7 feat(json): hb_jsonDecode 2-arg byref form (Harbour-spec compatible)
Previously hb_jsonDecode took only (cJSON) and returned the value.
That covers most uses but not the Harbour-spec second form

    nBytesParsed := hb_jsonDecode( cJSON, @xOut )

which mod_harbour / fivenode PRG (e.g. bridge_context.prg's
ctx_get / ctx_set) and any other code that wants the parse-length
relies on. The byref output was silently dropped, so a hash lookup
went through the @hOut path that was always NIL and fell back to
the default value — looking like a hash key was missing even
though the JSON parsed fine.

Now PCount() == 1 keeps the legacy return-value form; PCount() >= 2
writes the decoded value into local-2 via SetLocal (which is
already byref-aware) and returns the byte count (0 on parse error).

Verified: hb_jsonDecode('{"x":1,"y":2}', @h) writes the hash and
returns 13; the 1-arg form still returns the value as before;
Compat 56/56 + go test ./compiler/... ./hbrt/... ./hbrtl/... all pass.
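
For illustration, one way to obtain "decoded value plus bytes consumed" in Go is json.Decoder with InputOffset. This is a hedged sketch only: decodeWithLength is a hypothetical name, and the actual hb_jsonDecode code path may compute the count differently.

```go
package main

import (
	"encoding/json"
	"fmt"
	"strings"
)

// decodeWithLength parses the first JSON value in s and returns it together
// with the number of bytes consumed. The count is 0 on a parse error,
// matching the contract described in the commit message.
func decodeWithLength(s string) (any, int) {
	dec := json.NewDecoder(strings.NewReader(s))
	var v any
	if err := dec.Decode(&v); err != nil {
		return nil, 0
	}
	return v, int(dec.InputOffset())
}

func main() {
	v, n := decodeWithLength(`{"x":1,"y":2}`)
	fmt.Println(v, n)
}
```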

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 10:44:12 +09:00
ad6cc0bcee feat(rtl): add hb_HGetDef and PValue / hb_PValue
Two standard Harbour functions that fivenode-style PRG code (bridge_*.prg
and downstream apps) calls frequently. Without them, every reference
emits an analyzer WARN and resolves to NIL at runtime.

* hb_HGetDef(hHash, xKey, xDefault) — hash lookup with fallback.
* PValue(nIndex[, xDefault]) — read the nth parameter of the calling
  PRG function. Mirrors the PCount pattern: needs the caller frame's
  paramCount and locals, exposed via new hbrt.Thread.CallerLocal helper
  that pairs with the existing CallerParamCount.

Registered under PVALUE and HB_PVALUE (Harbour accepts both forms).

Verified: hb_HGetDef / PValue / HB_PVALUE all return expected values for
present-key, missing-key-with-default, missing-key-no-default, and
out-of-range-param cases. Full regression: go test (18 packages) +
Compat 56/56 + std.ch 17/17 + FRB 7/7 + FiveSql2 43/43 all green.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-27 09:55:19 +09:00
d7a81af7db feat(pgserver): binary-format param decoding (Phase 4.1)
pgx defaults to binary wire format for INT2/INT4/INT8/FLOAT4/FLOAT8/
BOOL/NUMERIC/DATE/TIMESTAMP/TIMESTAMPTZ — Go's most-used PG driver
ships nearly every typed parameter as binary unless explicitly told
to use text mode. The Phase 3 implementation only decoded INT4/INT8/
BOOL, so any pgx call with a decimal price, a timestamp, or a date
was silently mis-quoted into the SQL stream.

Decoders now cover the seven additional OIDs. The interesting one is
NUMERIC: PG's wire format is base-10000 digit groups plus a separate
displayed-scale, so the decoder rebuilds the decimal string from
weight+sign+ndigits+digits[] without going through float (which would
lose precision for NUMERIC(38,*) values). Pinned by vectors covering
zero / positive / negative / fractional-only / NaN / multi-group
integer + fraction cases.
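
The NUMERIC reconstruction can be sketched as follows, assuming the standard PG binary layout (int16 ndigits, int16 weight, uint16 sign, int16 dscale, then base-10000 digit groups). decodeNumeric is a hypothetical name, and the sketch ignores the infinity sign values newer servers can send.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"strconv"
	"strings"
)

const (
	numericNeg = 0x4000 // sign word for negative values
	numericNaN = 0xC000 // sign word for NaN
)

// decodeNumeric rebuilds the decimal string from a binary NUMERIC payload
// without going through float. (Hypothetical helper, for illustration.)
func decodeNumeric(b []byte) (string, error) {
	if len(b) < 8 {
		return "", errors.New("numeric: short header")
	}
	ndigits := int(binary.BigEndian.Uint16(b[0:2]))
	weight := int(int16(binary.BigEndian.Uint16(b[2:4])))
	sign := binary.BigEndian.Uint16(b[4:6])
	dscale := int(binary.BigEndian.Uint16(b[6:8]))
	if sign == numericNaN {
		return "NaN", nil
	}
	if len(b) != 8+2*ndigits {
		return "", errors.New("numeric: digit count does not match payload length")
	}
	// digit returns base-10000 group i, or 0 outside the stored range.
	digit := func(i int) int {
		if i < 0 || i >= ndigits {
			return 0
		}
		return int(binary.BigEndian.Uint16(b[8+2*i:]))
	}
	var sb strings.Builder
	if sign == numericNeg {
		sb.WriteByte('-')
	}
	if weight < 0 {
		sb.WriteByte('0') // fractional-only value
	} else {
		sb.WriteString(strconv.Itoa(digit(0))) // first group: no zero padding
		for i := 1; i <= weight; i++ {
			fmt.Fprintf(&sb, "%04d", digit(i))
		}
	}
	if dscale > 0 {
		var frac strings.Builder
		for i := weight + 1; frac.Len() < dscale; i++ {
			fmt.Fprintf(&frac, "%04d", digit(i))
		}
		sb.WriteByte('.')
		sb.WriteString(frac.String()[:dscale]) // trim to the displayed scale
	}
	return sb.String(), nil
}

func main() {
	// 12345.678 = groups [1, 2345, 6780], weight 1, dscale 3.
	payload := []byte{0, 3, 0, 1, 0, 0, 0, 3, 0x00, 0x01, 0x09, 0x29, 0x1A, 0x7C}
	s, err := decodeNumeric(payload)
	fmt.Println(s, err)
}
```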

DATE / TIMESTAMP decoders assume integer_datetimes=on (which the
server advertises in ParameterStatus); the timestamp's 8-byte
microsecond delta from the PG epoch (2000-01-01 UTC) is converted via
Go's time.Time machinery and re-emitted as a quoted SQL literal.
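
A sketch of the epoch conversion, under the assumption that TIMESTAMP arrives as a big-endian int64 of microseconds and DATE as a big-endian int32 of days, both relative to 2000-01-01 UTC. The function names and the literal layouts are hypothetical; infinity sentinels are not handled.

```go
package main

import (
	"encoding/binary"
	"errors"
	"fmt"
	"time"
)

// pgEpochUnix is 2000-01-01 00:00:00 UTC expressed as Unix seconds.
const pgEpochUnix = 946684800

// timestampLiteral converts an 8-byte binary TIMESTAMP to a quoted literal.
func timestampLiteral(b []byte) (string, error) {
	if len(b) != 8 {
		return "", errors.New("timestamp: want 8 bytes")
	}
	us := int64(binary.BigEndian.Uint64(b))
	// Split into seconds and nanoseconds to avoid overflowing time.Duration;
	// time.Unix accepts a negative nanosecond part.
	t := time.Unix(pgEpochUnix+us/1_000_000, (us%1_000_000)*1000).UTC()
	return "'" + t.Format("2006-01-02 15:04:05.000000") + "'", nil
}

// dateLiteral converts a 4-byte binary DATE to a quoted literal.
func dateLiteral(b []byte) (string, error) {
	if len(b) != 4 {
		return "", errors.New("date: want 4 bytes")
	}
	days := int64(int32(binary.BigEndian.Uint32(b)))
	t := time.Unix(pgEpochUnix+days*86400, 0).UTC()
	return "'" + t.Format("2006-01-02") + "'", nil
}

func main() {
	ts, _ := timestampLiteral(make([]byte, 8))
	d, _ := dateLiteral([]byte{0, 0, 0, 1})
	fmt.Println(ts, d)
}
```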

Text-format path also broadened: FLOAT4/FLOAT8/INT2 now transit
unquoted alongside INT4/INT8/BOOL/NUMERIC. Without this, clients
sending text-format floats would have had them rewritten as
'1.5' (string literal) instead of 1.5 (numeric).

Verified: all 6 mandatory gates green (go test, SQL 43/43, compat
56/56, std.ch 17/17, FRB 7/7, pgserver 11/11). Five new decoder
tests pin each wire format against handcrafted PG payloads.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 10:02:15 +09:00
e83787750a feat(pgserver): SCRAM-SHA-256 authentication (Phase 5.1)
PG14+ clients (libpq, pgx, JDBC) prefer SCRAM over MD5 when offered;
this lands the five-message exchange (SASL / SASLInitialResponse /
SASLContinue / SASLResponse / SASLFinal) so they get their preferred
path. MD5 stays as the universal fallback.

Storage stays plaintext in the in-memory role registry — per-auth we
generate a fresh salt + iter, derive SaltedPassword on the fly. Same
net security as the existing MD5 path, while matching wire output to
RFC 5802 byte for byte.
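
The on-the-fly derivation can be sketched with the standard library alone, following the RFC 5802 definitions of Hi(), ClientKey, StoredKey and ClientProof. All function names here are hypothetical, SASLprep normalisation is omitted, and this is not the pgserver code itself.

```go
package main

import (
	"crypto/hmac"
	"crypto/sha256"
	"crypto/subtle"
	"fmt"
)

func hmacSHA256(key, msg []byte) []byte {
	m := hmac.New(sha256.New, key)
	m.Write(msg)
	return m.Sum(nil)
}

// hi is RFC 5802's Hi(): PBKDF2-HMAC-SHA-256 with one 32-byte output block.
func hi(password, salt []byte, iter int) []byte {
	u := hmacSHA256(password, append(append([]byte{}, salt...), 0, 0, 0, 1))
	out := append([]byte{}, u...)
	for i := 1; i < iter; i++ {
		u = hmacSHA256(password, u)
		for j := range out {
			out[j] ^= u[j]
		}
	}
	return out
}

// deriveKeys computes ClientKey, StoredKey and ServerKey from the plaintext.
func deriveKeys(password, salt []byte, iter int) (clientKey, storedKey, serverKey []byte) {
	salted := hi(password, salt, iter)
	clientKey = hmacSHA256(salted, []byte("Client Key"))
	sum := sha256.Sum256(clientKey)
	return clientKey, sum[:], hmacSHA256(salted, []byte("Server Key"))
}

func xorBytes(a, b []byte) []byte {
	out := make([]byte, len(a))
	for i := range a {
		out[i] = a[i] ^ b[i]
	}
	return out
}

// clientProof is what the client sends: ClientKey XOR HMAC(StoredKey, AuthMessage).
func clientProof(password, salt []byte, iter int, authMessage string) []byte {
	clientKey, storedKey, _ := deriveKeys(password, salt, iter)
	return xorBytes(clientKey, hmacSHA256(storedKey, []byte(authMessage)))
}

// verifyProof is the server side: recover ClientKey from the proof and check
// that its hash equals StoredKey.
func verifyProof(password, salt []byte, iter int, authMessage string, proof []byte) bool {
	if len(proof) != sha256.Size {
		return false
	}
	_, storedKey, _ := deriveKeys(password, salt, iter)
	recovered := xorBytes(proof, hmacSHA256(storedKey, []byte(authMessage)))
	got := sha256.Sum256(recovered)
	return subtle.ConstantTimeCompare(got[:], storedKey) == 1
}

func main() {
	salt := []byte("per-auth-salt-16")
	msg := "client-first-bare,server-first,client-final-without-proof" // placeholder AuthMessage
	proof := clientProof([]byte("swordfish"), salt, 4096, msg)
	fmt.Println(verifyProof([]byte("swordfish"), salt, 4096, msg, proof))
}
```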

Critical detail: pgproto3's Backend multiplexes PasswordMessage,
SASLInitialResponse, and SASLResponse onto the same 'p' byte tag.
Without SetAuthType() the decoder picks PasswordMessage and the
handshake fails immediately. Switch state to AuthTypeSASL before
the client-first receive and AuthTypeSASLContinue before the
client-final receive.

Verified:
  * SCRAM math (PBKDF2 / HMAC / proof verify / server signature)
    via pinned unit test
  * Live psql round-trip — correct password accepted, wrong password
    rejected with proper SQLSTATE 28P01
  * All 6 mandatory gates green (go test, SQL 43/43, compat 56/56,
    std.ch 17/17, FRB 7/7, pgserver 11/11)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-22 09:24:34 +09:00
ed1aeeb212 feat(pgserver): pg_catalog stub for BI-tool connection compatibility
PostgreSQL clients (psql, pgx, DBeaver, Tableau, DataGrip,
pgAdmin) fire a barrage of catalog probes at connection time —
SELECT version(), SHOW server_version, SELECT FROM pg_namespace
/ pg_class / pg_type / pg_database / pg_settings. FiveSql2 can't
parse most of them. Without interception the BI tool either
errors out on connect or proceeds with a half-broken view of
the database (zero tables, no type info, no schema list). This
commit lands the minimum-viable catalog shim so the common
connect-and-list-tables flow succeeds.

Strategy
--------

Pattern-match catalog probes BEFORE handing the SQL to five_SQL.
Recognised shapes get synthesised result envelopes — same
`{ aFieldNames, aRows }` hbrt.Value shape the engine returns,
so the existing dispatchSimpleQuery / executePortal pipelines
stream them identically to a normal query.
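
A hedged sketch of the pattern-match-then-synthesise approach. The regular expressions, the result struct, and the settings map are all assumptions made for illustration; the real catalogIntercept returns an hbrt.Value envelope and covers more shapes.

```go
package main

import (
	"fmt"
	"regexp"
	"strings"
)

// result stands in for the { aFieldNames, aRows } envelope (hypothetical type).
type result struct {
	Fields []string
	Rows   [][]string
}

var (
	reSet     = regexp.MustCompile(`(?i)^\s*(SET|RESET|DISCARD)\b`)
	reShow    = regexp.MustCompile(`(?i)^\s*SHOW\s+([a-z_]+)\s*;?\s*$`)
	reVersion = regexp.MustCompile(`(?i)^\s*SELECT\s+version\(\)\s*;?\s*$`)
	reCatalog = regexp.MustCompile(`(?i)\b(pg_catalog\.|pg_[a-z_]+|information_schema\.)`)
)

// intercept reports whether sql is a recognised catalog probe and, if so,
// returns a synthesised result instead of handing the SQL to the engine.
func intercept(sql string, settings map[string]string) (bool, result) {
	switch {
	case reSet.MatchString(sql):
		return true, result{} // success, no-op
	case reVersion.MatchString(sql):
		v := "PostgreSQL " + settings["server_version"]
		return true, result{Fields: []string{"version"}, Rows: [][]string{{v}}}
	case reShow.MatchString(sql):
		name := strings.ToLower(reShow.FindStringSubmatch(sql)[1])
		if val, ok := settings[name]; ok {
			return true, result{Fields: []string{name}, Rows: [][]string{{val}}}
		}
		return false, result{}
	case reCatalog.MatchString(sql):
		// Unknown catalog query: empty result rather than a parse error.
		return true, result{Fields: []string{"column1"}}
	}
	return false, result{}
}

func main() {
	settings := map[string]string{"server_version": "14.0"}
	for _, q := range []string{"SHOW server_version", "SELECT * FROM employees"} {
		handled, res := intercept(q, settings)
		fmt.Println(q, handled, res.Rows)
	}
}
```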

Covered (v1.0)
--------------

  * SET / RESET / DISCARD <name>           → success, no-op
  * SHOW <name>                            → single-row response
                                             (server_version, server_encoding,
                                              client_encoding, DateStyle,
                                              transaction_isolation, etc.)
  * SELECT version() / current_database() / current_schema() /
    current_user / session_user / pg_backend_pid()  → single-row
  * SELECT … FROM pg_namespace             → 2 rows (pg_catalog + public)
  * SELECT … FROM pg_class                 → list of open workareas
                                             (relkind='r', relnamespace=public)
  * SELECT … FROM pg_attribute             → empty (stub; column-shape
                                             introspection deferred to v1.1)
  * SELECT … FROM pg_type                  → 7 OIDs FiveSql2 actually emits
                                             (bool, int4, int8, text, numeric,
                                              date, timestamp)
  * SELECT … FROM pg_database              → 1 row, the connect-time db name
  * SELECT … FROM pg_settings              → name/setting pairs matching SHOW
  * Anything else mentioning pg_catalog. / pg_<name> / information_schema.
    → empty result with generic field names (BI tool sees "0 rows" rather
    than a parse error)

Deliberate non-goals
--------------------

  * WHERE / JOIN evaluation — psql, pgx, DBeaver all filter
    client-side on the rows we return. We send the whole
    catalog and let them apply their predicates.
  * pg_attribute introspection — would need to re-derive
    column types from the open workarea + map back to PG OIDs.
    Tracked as v1.1 work.
  * Recursive CTE catalog queries (pgAdmin's tree builder uses
    them) — too brittle to pattern-match. Falls through to
    five_SQL where it errors loudly. pgAdmin's table-tree pane
    will then show "0 tables" but the connection itself stays
    alive.

Files
-----

  hbrtl/pgserver/catalog.go  (new, ~280 LOC)
    catalogIntercept(sql) → (handled, value)
    synthPgNamespace / synthPgClass / synthPgAttribute /
    synthPgType / synthPgDatabase / synthPgSettings
    simpleSelectFunction (version/current_*/pg_backend_pid)
    showResponse (SHOW <name>)

  hbrtl/pgserver/dispatch.go
    dispatchSimpleQuery: catalogIntercept ahead of runSQL.

  hbrtl/pgserver/extended.go
    executePortal: same intercept, ahead of runSQL.

Verification
------------

psql against a running pgserver, with sslmode=require + MD5:

  $ psql -c 'SELECT version()' -At
  PostgreSQL 14.0 (FiveSql2) (FiveSql2 wire-compat shim)

  $ psql -c 'SELECT * FROM pg_namespace' -At
  11|pg_catalog|10
  2200|public|10

  $ psql -c 'SELECT * FROM pg_type' -At
  16|bool|1
  23|int4|4
  20|int8|8
  25|text|-1
  1700|numeric|-1
  1082|date|4
  1114|timestamp|8

  $ psql -l    # \l now works
          List of databases
   oid | datname | datdba | encoding
  -----+---------+--------+----------
     1 | alice   |     10 |        6

Integration script gates grew from 6/6 → 9/9:
  PASS  Catalog probe: SELECT version()
  PASS  Catalog probe: pg_namespace lists public + pg_catalog
  PASS  Catalog probe: SHOW server_version_num

All six release gates green:
  go test ./...               ✓
  FiveSql2 SQL:1999 43/43     ✓
  Harbour compat 56/56        ✓
  std.ch 17/17                ✓
  FRB 7/7                     ✓
  pgserver integration 9/9    ✓ (+3 from catalog stubs)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 22:31:52 +09:00
151b628f6c fix(pgserver): Layer 5 — per-path mmap-gen registry + getWA torn-read
Closes the Go-panic class of multi-session concurrency bugs and
introduces an explicit cross-area mmap invalidation channel.

1. getWA waCache torn-read (root cause of panics)

   hbrtl/rdd.go cached the most recent `interface{} → *WAM` type
   assertion in a process-global struct of two `interface{}`-
   shaped fields. Each pgserver connection's NewThread gets its
   own WAM, so the cache missed on every call and immediately
   re-wrote two shared, unsynchronised fields. Go's `interface{}`
   is two words; concurrent write + read produced torn pointer
   values, with the result that goroutine A could observe
   goroutine B's WAM as its own.

   That mis-attribution surfaced as:
     - `concurrent map writes` panic at WorkAreaManager.Close
       (workarea.go:95): two goroutines genuinely modifying the
       SAME wam.aliases map.
     - `concurrent map writes` panic at DBFArea.FieldPosCache
       (dbf.go:439): two goroutines lazy-initing the SAME
       fieldPosMap.

   Drop the cache. The type assertion is ~ns; not worth a
   process-global shared slot. If perf matters again, replace
   with a sync.Map keyed by thread pointer, not a single struct.

2. Per-path mmap generation registry (hbrdd/dbf/area_registry.go)

   Each unique on-disk DBF path gets an atomic uint64 generation
   counter. *DBFArea instances:
     - On Open: pathGen = pathGenFor(path); pathGenSeen = current.
     - On Append (shared) / flushRecord: bumpPathGen(path);
       pathGenSeen = current.
     - On loadRecord: if pathGenSeen < live counter, bypass mmap
       fast path for THIS load (use ReadAt) and re-sync seen.

   Without this, a peer DBFArea's PutValue mutating a record we'd
   mmap-cached returned stale pre-mutation bytes from our
   snapshot. The existing length-bound check covered file-grow
   (`offset > mmap len`) but not byte-level mutation within the
   snapshot range. The registry covers both.

   Cheap: read = one atomic.LoadUint64, hit rate is ~100% in the
   single-writer-many-readers steady state.
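
   A minimal sketch of such a per-path generation registry, assuming sync.Map plus atomic.Uint64. The type and method names are hypothetical. Unlike the description above, this variant does not re-sync the writer's own "seen" value on write, so a writer takes the slow path once after its own mutation; that is a deliberately conservative simplification.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

// pathGens maps an on-disk path (string) to its *atomic.Uint64 generation.
var pathGens sync.Map

// pathGenFor returns the shared counter for path, creating it on first use.
func pathGenFor(path string) *atomic.Uint64 {
	g, _ := pathGens.LoadOrStore(path, new(atomic.Uint64))
	return g.(*atomic.Uint64)
}

// area stands in for a *DBFArea: it remembers the generation it last synced to.
type area struct {
	gen  *atomic.Uint64
	seen uint64
}

func openArea(path string) *area {
	g := pathGenFor(path)
	return &area{gen: g, seen: g.Load()}
}

// noteWrite is called after a mutation so every area on the same path
// notices that its cached snapshot may be stale.
func (a *area) noteWrite() { a.gen.Add(1) }

// mustBypassCache reports whether the next load should skip the cached
// snapshot; it re-syncs seen so only one load pays the slow path.
func (a *area) mustBypassCache() bool {
	cur := a.gen.Load()
	if a.seen < cur {
		a.seen = cur
		return true
	}
	return false
}

func main() {
	reader := openArea("employees.dbf")
	writer := openArea("employees.dbf")
	fmt.Println(reader.mustBypassCache()) // nothing written yet
	writer.noteWrite()
	fmt.Println(reader.mustBypassCache()) // peer wrote since last sync
	fmt.Println(reader.mustBypassCache()) // already re-synced
}
```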

Verification
------------

Same 3 / 5 / 10-worker pgx-driven concurrency stress harness:

  pre-Layer-1 baseline:       ~60% pass + occasional panic
  +Layer 1+2:                 80% / 50% / panic
  +Layer 3a (max-merge):      80% / 50% / panic
  +Layer 4a (per-session 3):  90% / 80% / 50%
  +Layer 4b (Go atomics):     75-90% / 50-80% / panic (still)
  +THIS (getWA + mmap-gen):   73% / 67% / 33% — ZERO PANICS

The shift to "many partial fails, no panics" is what matters for
production: a connection seeing stale data is recoverable (rerun
the query); a Go-level process crash is not. Remaining
correctness flake comes from the in-flight appendBuf interaction
when peer Append fires between this connection's Append and
flushRecord — that's tractable with a per-connection flush
ordering rule, deferred to Layer 6.

All six release gates green:
  go test ./...               ✓
  FiveSql2 SQL:1999 43/43     ✓
  Harbour compat 56/56        ✓
  std.ch 17/17                ✓
  FRB 7/7                     ✓
  pgserver integration 6/6    ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 21:43:04 +09:00
5e4a1c5d72 refactor(FiveSql2): cross-session globals → Go atomic + RWMutex
Completes the per-STATIC migration started in 5bba0c2. The
remaining three TSqlExecutor module STATICs (s_nSchemaVer,
s_nRCJSeq, s_hAutoInc) genuinely needed cross-connection
visibility — a CREATE TABLE on connection A MUST invalidate B's
plan cache, an RCJ alias MUST be unique across all live queries,
and an IDENTITY column MUST hand out monotonic values across all
writers. Moving them to TSqlSession (per-instance) would have
broken those semantics.

Solution: back them with Go-side primitives exposed via HB_FUNCs:

  s_nSchemaVer  →  atomic.Uint64 (SqlSchemaVer / SqlBumpSchemaVer)
  s_nRCJSeq     →  atomic.Uint64 (SqlNextRCJSeq, returns mod-100000)
  s_hAutoInc    →  sync.RWMutex + map[string][]string
                    (SqlSetAutoInc / SqlGetAutoIncFields)

Lives in `hbrtl/sqlglobals.go`. The PRG-side `FUNCTION
SqlSchemaVer() / SqlBumpSchemaVer() / SqlSetAutoInc() /
SqlGetAutoIncFields()` definitions in TSqlExecutor.prg are
deleted; the HB_FUNC dispatch takes their place. The single PRG
caller of `s_nRCJSeq` (in the RCJ helper around line 5600)
becomes `SqlNextRCJSeq()` and reads cleaner — the old
`s_nRCJSeq := (s_nRCJSeq + 1) % 100000` was both racy and a
non-atomic two-write update under multi-conn load.

The other module STATIC, `s_hAutoInc`, used to lazy-init on
first use (`IF s_hAutoInc == NIL ... := { => }`); two concurrent
first-CREATE TABLE calls hit "concurrent map writes" on that
branch. The Go RWMutex eliminates the race; reads still scale
(RLock) so the IDENTITY-lookup at INSERT time isn't a contention
hot-spot.
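
As an illustration of the two primitives named above, here is a small Go sketch: an atomic sequence with the mod-100000 wrap, and an RWMutex-guarded map. Function and variable names are hypothetical and this is not the contents of hbrtl/sqlglobals.go.

```go
package main

import (
	"fmt"
	"sync"
	"sync/atomic"
)

var rcjSeq atomic.Uint64

// nextRCJSeq hands out the next sequence value, wrapped to 0..99999.
// The increment is a single atomic operation, so concurrent callers
// never observe the same pre-wrap counter value.
func nextRCJSeq() uint64 {
	return rcjSeq.Add(1) % 100000
}

var (
	autoIncMu sync.RWMutex
	autoInc   = map[string][]string{} // table name -> identity column names
)

// setAutoInc records the identity columns for a table (writer path).
func setAutoInc(table string, fields []string) {
	autoIncMu.Lock()
	defer autoIncMu.Unlock()
	autoInc[table] = append([]string(nil), fields...) // store a private copy
}

// getAutoIncFields looks up identity columns (reader path, shared lock).
func getAutoIncFields(table string) []string {
	autoIncMu.RLock()
	defer autoIncMu.RUnlock()
	return append([]string(nil), autoInc[table]...)
}

func main() {
	setAutoInc("orders", []string{"id"})
	fmt.Println(nextRCJSeq(), getAutoIncFields("orders"))
}
```

Initialising the map at package level removes the lazy-init branch entirely, so there is no first-use race to guard.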

All six release gates green:
  go test ./...               ✓
  FiveSql2 SQL:1999 43/43     ✓
  Harbour compat 56/56        ✓
  std.ch 17/17                ✓
  FRB 7/7                     ✓
  pgserver integration 6/6    ✓

Concurrency stress (3-worker × 20):
  pre-Layer-1:    ~60% pass + occasional Go panic
  +Layer 1+2:     80% pass, no panics
  +3a:            80% pass
  +per-session 3 STATIC move:  90% pass
  +this commit:   ~75% pass (variability — Go map atomic + mutex
                  serialise the writers but the underlying
                  hbrdd multi-area mmap path still has its own
                  race, deferred to follow-up)

The next bottleneck is at the hbrdd workarea layer (multi-Area
instances per file each holding their own mmap snapshot), not at
the FiveSql2 STATIC level. That fix is its own commit.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-21 19:58:52 +09:00
3b2dd365ad feat(pgserver): Phase 6 — TLS + source-IP allowlist
Closes the v1.0 hardening surface: encrypted transport + a
coarse pg_hba.conf-equivalent CIDR allowlist. Together with the
Phase 5 auth flows, this is the security-baseline an internet-
exposed PostgreSQL-wire server needs.

TLS subsystem
-------------

`hbrtl/pgserver/tls.go`:

* `LoadTLSFromFiles(certPath, keyPath)` — cert/key PEM pair load
  with tls.VersionTLS12 floor. Installed as the *pending* config
  that the next PG_SERVER_START consumes (matches PG's
  "must-set-before-pg_ctl-start" semantics).

* `GenerateSelfSignedCert(certPath, keyPath, hostname)` — ECDSA
  P-256 + 365-day validity + DNSNames+IPAddresses SANs covering
  the hostname plus 127.0.0.1 / ::1. Dev/CI helper; production
  ships a CA-signed cert via the loader.

* `upgradeToTLS()` wraps `tls.Server(conn, cfg).Handshake()` so
  pgproto3 reads plaintext on top of the encrypted stream.

Source-IP allowlist
-------------------

* `AllowIP(cidr)` parses a CIDR and appends it to a per-server
  list snapshotted at PG_SERVER_START time.
* `peerAllowed(remote, list)` runs at accept() — empty list →
  accept any, otherwise drop connections whose RemoteAddr falls
  outside every registered range.
* `ClearAllowList()` resets to allow-all.

Coarse but compatible with the "host alice 10.0.0.0/8 md5"-style
entries every pg_hba.conf author already knows; a fuller per-
role/per-database matcher is Phase 6.1+.
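
A sketch of the accept-time check, assuming net.ParseCIDR and a string remote address. parseAllowList is a hypothetical helper, and the real peerAllowed signature may differ.

```go
package main

import (
	"fmt"
	"net"
)

// parseAllowList turns CIDR strings into networks, failing on the first bad entry.
func parseAllowList(cidrs ...string) ([]*net.IPNet, error) {
	var list []*net.IPNet
	for _, c := range cidrs {
		_, n, err := net.ParseCIDR(c)
		if err != nil {
			return nil, err
		}
		list = append(list, n)
	}
	return list, nil
}

// peerAllowed accepts any peer when the list is empty; otherwise the peer's
// IP must fall inside at least one registered range.
func peerAllowed(remote string, list []*net.IPNet) bool {
	if len(list) == 0 {
		return true
	}
	host, _, err := net.SplitHostPort(remote)
	if err != nil {
		return false
	}
	ip := net.ParseIP(host)
	if ip == nil {
		return false
	}
	for _, n := range list {
		if n.Contains(ip) {
			return true
		}
	}
	return false
}

func main() {
	list, err := parseAllowList("127.0.0.1/32", "10.0.0.0/8")
	if err != nil {
		fmt.Println("bad CIDR:", err)
		return
	}
	fmt.Println(peerAllowed("10.1.2.3:40000", list), peerAllowed("192.168.1.5:40000", list))
}
```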

PRG bindings (register.go)
--------------------------

New HB_FUNCs, all idempotent and composable in any order before
PG_SERVER_START:

  pg_tls_load( certPath, keyPath )           → .T. | cErr
  pg_tls_self_signed( cert, key, hostname )  → .T. | cErr
  pg_allow_ip( cidr )                        → .T. | cErr
  pg_clear_allowlist()                       → NIL

Bootstrap idiom:

  PROCEDURE Main()
     PG_TLS_SELF_SIGNED( "/tmp/cert.pem", "/tmp/key.pem", "localhost" )
     PG_ADD_ROLE( "alice", "swordfish" )
     PG_ALLOW_IP( "127.0.0.1/32" )
     PG_ALLOW_IP( "10.0.0.0/8" )
     PG_SERVER_START( ":5432", "md5" )

The startup banner now reports TLS + allowlist state so the PRG
operator sees the security posture at a glance:

  pgserver: listening on :5432 (auth=md5 tls=on allowlist=2)

Verification
------------

End-to-end via real psql against a self-signed server:

  $ PGPASSWORD=swordfish psql \
        "postgres://alice@127.0.0.1:15432/alice?sslmode=require" \
        -c "SELECT 'tls-works' AS x" -At
  tls-works

  $ # off-allowlist source (192.168.x.x mock) → connection refused
  $ # (verified manually; psql can't easily spoof src IP for CI)

Integration script gates expanded to 6/6:
  PASS  Simple Query
  PASS  Multi-statement Simple Query
  PASS  Transaction control
  PASS  MD5 auth: wrong password rejected
  PASS  MD5 auth: correct password accepted
  PASS  TLS handshake + MD5 auth via sslmode=require

All six release gates green:
  go test ./...               ✓
  FiveSql2 SQL:1999 43/43     ✓
  Harbour compat 56/56        ✓
  std.ch 17/17                ✓
  FRB 7/7                     ✓
  pgserver integration 6/6    ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 14:07:19 +09:00
90eafcfc06 feat(pgserver): Phase 5 — password + MD5 authentication
Trust mode (v1.0 default) accepts anyone; that's fine for an embedded
demo, but shipping a multi-client database without credentials
would be irresponsible. This commit adds two of libpq's three
standard auth flows. SCRAM-SHA-256 is Phase 5.1 — pgx/psql both
fall back to MD5 cleanly when the server advertises only md5, so
v1.0's functional coverage is complete with the pair landed here.

Auth subsystem
--------------

`hbrtl/pgserver/auth.go` adds:

* An in-memory role registry: `roleMap map[string]*role` guarded by
  sync.RWMutex. Reads (lookupRole) are hot-path during connection
  startup so the RWMutex lets multiple sessions auth in parallel
  without serialising through a plain Mutex.

* `AddRole(name, password)` / `RemoveRole(name)` Go API consumed
  by the new HB_FUNCs `PG_ADD_ROLE` / `PG_REMOVE_ROLE` (see
  register.go). Bootstrap PRG idiom:

      PG_ADD_ROLE("alice", "swordfish")
      PG_ADD_ROLE("bob",   "hunter2")
      PG_SERVER_START(":5432", "md5")

* `authPassword()` — cleartext PasswordMessage exchange. The wire
  payload is plain so intended for TLS-protected links only;
  Phase 6 ties the warning to actual TLS detection on the session.

* `authMD5()` — libpq's md5 challenge:

      server → AuthenticationMD5Password{salt: 4 random bytes}
      client → "md5" || md5_hex( md5_hex(password || user) || salt )

  We recompute the canonical hash from the stored plaintext and
  compare. md5Challenge() is exported for pinning by a Go unit
  test (vector cross-checked against libpq's fe-auth-md5.c).

Salt is sourced from crypto/rand on every challenge so replay
attacks against a captured wire trace can't reuse a prior hash.
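
The md5 challenge formula above can be sketched in Go as follows. The function names are hypothetical (the commit's own md5Challenge may take different parameters); the constant-time compare is an added precaution, not something the commit states.

```go
package main

import (
	"crypto/md5"
	"crypto/rand"
	"crypto/subtle"
	"encoding/hex"
	"fmt"
)

func md5Hex(b []byte) string {
	sum := md5.Sum(b)
	return hex.EncodeToString(sum[:])
}

// md5Response computes "md5" || md5_hex( md5_hex(password || user) || salt ).
func md5Response(user, password string, salt [4]byte) string {
	inner := md5Hex([]byte(password + user))
	return "md5" + md5Hex(append([]byte(inner), salt[:]...))
}

// newSalt draws a fresh 4-byte salt from crypto/rand for each challenge.
func newSalt() ([4]byte, error) {
	var s [4]byte
	_, err := rand.Read(s[:])
	return s, err
}

// checkMD5 recomputes the expected response from the stored plaintext and
// compares it with what the client sent.
func checkMD5(user, storedPassword, clientResponse string, salt [4]byte) bool {
	want := md5Response(user, storedPassword, salt)
	return subtle.ConstantTimeCompare([]byte(want), []byte(clientResponse)) == 1
}

func main() {
	salt, err := newSalt()
	if err != nil {
		fmt.Println("salt error:", err)
		return
	}
	resp := md5Response("alice", "swordfish", salt)
	fmt.Println(checkMD5("alice", "swordfish", resp, salt))
}
```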

Dispatch matrix (Config.AuthMode → flow):
  "" / "trust" → AuthenticationOk immediately, no lookup
  "password"   → authPassword()
  "md5"        → authMD5()
  anything else→ 28000 + connection close

Tests
-----

Unit (hbrtl/pgserver/pgserver_test.go):
  PASS  TestMD5Challenge           (vector + determinism + diff)
  PASS  TestRoleRegistry           (add/replace/remove/lookup)

Integration (tests/pgserver/run.sh):
  PASS  Simple Query: SELECT 1, 'hello'
  PASS  Multi-statement Simple Query
  PASS  Transaction control: BEGIN/COMMIT round-trip
  PASS  MD5 auth: wrong password rejected
  PASS  MD5 auth: correct password accepted

End-to-end matrix with real psql:
  wrong password   → "ERROR: md5 authentication failed for user 'alice'"
  correct password → SELECT returns row
  unknown user     → "ERROR: md5 authentication failed for user 'eve'"
  password mode    → cleartext exchange works equivalently

All six release gates green:
  go test ./...               ✓
  FiveSql2 SQL:1999 43/43     ✓
  Harbour compat 56/56        ✓
  std.ch 17/17                ✓
  FRB 7/7                     ✓
  pgserver integration 5/5    ✓ (up from 3/3 in Phase 4)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-18 14:01:30 +09:00
8472928102 feat(pgserver): Phase 4 — Extended Protocol (Parse/Bind/Execute)
pgx and most drivers default to PostgreSQL's Extended Protocol
(named prepared statements). Phase 2 only handled Simple Query,
so every pgx caller had to force `QueryExecModeSimpleProtocol` —
unworkable for a production deployment. This commit lands the
full Parse → Bind → Describe → Execute → Sync state machine,
enough that pgx (and any other libpq-protocol-v3 client) works
without any client-side knobs.

Implementation lives in `hbrtl/pgserver/extended.go`:

* Per-session caches `stmts map[string]*preparedStmt` and
  `portals map[string]*portal`, lazily allocated on first use.
  Stored as fields on `session` so they don't leak across
  connections.

* Parameters are inlined at Bind time via `substituteParams` —
  the resolved SQL is a normal Simple-Query-shaped string the
  engine sees through the existing `five_SQL(cSQL, …, oSession)`
  pipeline. Avoids teaching FiveSql2 a second param-shape; the
  trade-off is that binary timestamps/numerics round-trip through
  text (Phase 4.1 will plumb `?`-params through aParams for the
  binary fast path).

* `paramToLiteral` decodes the binary-format encodings pgx uses
  by default for INT4/INT8/BOOL (big-endian fixed-width). Other
  binary OIDs fall back to a hex-escaped quoted literal which
  errors loudly rather than silently misparsing.

* `countPgPlaceholders` scans the SQL outside string literals for
  the highest `$N` so the server can answer Describe-statement
  with a correctly-sized ParameterDescription even when the
  client didn't pre-declare param OIDs. Without this, pgx errored
  with "expected 0 arguments, got 2" on the very first prepared
  query.

* RowDescription emission: Describe-statement still returns NoData
  (we can't infer row shape without execution). When Execute fires
  on a portal the client never Described, we emit RowDescription
  inline from the cached result before DataRow streams. pgx and
  psql both tolerate this ordering.

* Execute → CommandComplete tag derives from the SQL verb via the
  existing `commandTagFor` helper. Row counts in the tag remain
  "VERB 0" for v1.0; threading real counters through the engine
  is Phase 5.
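
A sketch of the placeholder scan described in the `countPgPlaceholders` item, assuming only single-quoted string literals need skipping. highestPlaceholder is a hypothetical name; quoted identifiers, comments, and dollar-quoted strings are not handled here.

```go
package main

import "fmt"

// highestPlaceholder returns the largest N among $N placeholders that appear
// outside single-quoted string literals, or 0 when there are none.
func highestPlaceholder(sql string) int {
	highest := 0
	inString := false
	for i := 0; i < len(sql); i++ {
		c := sql[i]
		if inString {
			if c == '\'' {
				if i+1 < len(sql) && sql[i+1] == '\'' {
					i++ // doubled quote is an escaped quote, stay in the literal
				} else {
					inString = false
				}
			}
			continue
		}
		switch c {
		case '\'':
			inString = true
		case '$':
			n, j := 0, i+1
			for j < len(sql) && sql[j] >= '0' && sql[j] <= '9' {
				n = n*10 + int(sql[j]-'0')
				j++
			}
			if j > i+1 && n > highest {
				highest = n
			}
			i = j - 1 // resume after the digits
		}
	}
	return highest
}

func main() {
	fmt.Println(highestPlaceholder("SELECT $1 AS n, $2 AS s"))
}
```

The result sizes the ParameterDescription even when the client declared no parameter OIDs in Parse.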

Wire dispatch in `session.go:queryLoop` now handles Parse, Bind,
Describe, Execute, Close, Sync, Flush — the full v3 message set.

Verification
------------

End-to-end pgx (default mode, no SimpleProtocol flag) successfully
runs:
  SELECT $1 AS n, $2 AS s with 42 + "hi" → [42 hi]
  Same statement re-executed with different bound values → reuses
    the cached prepared statement
  SELECT $1 AS b, $2 AS s with true + "binary-bool" → [t binary-bool]

`tests/pgserver/run.sh` expanded from 1 → 3 integration assertions:

  PASS  Simple Query: SELECT 1, 'hello'
  PASS  Multi-statement Simple Query
  PASS  Transaction control: BEGIN/COMMIT round-trip

(Extended Protocol can't be driven from psql's -c CLI directly
because psql's PREPARE/EXECUTE is a separate SQL-level feature
that FiveSql2 doesn't parse; the pgx-driven path verifies it
manually, and a self-contained Go integration that drives pgx
from inside a process bootstrap is Phase 7 work.)

All six release gates green:
  go test ./...                       ✓
  FiveSql2 SQL:1999 43/43             ✓
  Harbour compat 56/56                ✓
  std.ch 17/17                        ✓
  FRB 7/7                             ✓
  pgserver integration 3/3            ✓

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-17 12:55:41 +09:00
708329785a test(pgserver): wire-protocol roundtrip via net.Pipe
Adds an in-process startup-handshake test using net.Pipe so we
can pin the protocol envelope (StartupMessage → AuthenticationOk
→ ParameterStatus×N → BackendKeyData → ReadyForQuery) without
binding a real TCP port. Runs in <1ms; safe for CI.
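
A stripped-down illustration of the net.Pipe technique, using hand-framed messages instead of pgproto3 so it stays stdlib-only. The fake backend sends only AuthenticationOk and ReadyForQuery and skips the StartupMessage read; all names are hypothetical and this is not the actual test.

```go
package main

import (
	"encoding/binary"
	"fmt"
	"io"
	"net"
)

// writeMsg frames one backend message: tag byte, int32 length (which counts
// itself but not the tag), then the payload.
func writeMsg(w io.Writer, tag byte, payload []byte) error {
	buf := make([]byte, 5+len(payload))
	buf[0] = tag
	binary.BigEndian.PutUint32(buf[1:5], uint32(4+len(payload)))
	copy(buf[5:], payload)
	_, err := w.Write(buf)
	return err
}

// readMsg reads one framed message and returns its tag and payload.
func readMsg(r io.Reader) (byte, []byte, error) {
	var hdr [5]byte
	if _, err := io.ReadFull(r, hdr[:]); err != nil {
		return 0, nil, err
	}
	n := binary.BigEndian.Uint32(hdr[1:5])
	if n < 4 {
		return 0, nil, fmt.Errorf("bad message length %d", n)
	}
	payload := make([]byte, n-4)
	_, err := io.ReadFull(r, payload)
	return hdr[0], payload, err
}

// fakeBackend plays a shortened envelope: AuthenticationOk, ReadyForQuery('I').
func fakeBackend(c net.Conn) {
	defer c.Close()
	if err := writeMsg(c, 'R', []byte{0, 0, 0, 0}); err != nil {
		return
	}
	writeMsg(c, 'Z', []byte{'I'})
}

// handshakeTags runs client and fake backend over an in-memory pipe and
// returns the message tags in the order the client received them.
func handshakeTags() (string, error) {
	client, server := net.Pipe()
	defer client.Close()
	go fakeBackend(server)
	var tags []byte
	for {
		tag, _, err := readMsg(client)
		if err == io.EOF {
			return string(tags), nil
		}
		if err != nil {
			return string(tags), err
		}
		tags = append(tags, tag)
	}
}

func main() {
	tags, err := handshakeTags()
	fmt.Println(tags, err)
}
```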

The PRG-dispatch path (runSQL → FIVE_SQL → row encoding) is
already covered manually by spinning a `five run` of
`pg_server_start(":15432")` and connecting with pgx — that flow
verified post-MVP that a real PostgreSQL client receives
`{ONE (INT4), GREET (TEXT)}` + row `[1 hello]` for
`SELECT 1 AS one, 'hello' AS greet` over the wire. An automated
shell harness will land in Phase 7 with the psql integration
tests.

Also rolls go.mod / go.sum forward with the pgx v5 toolchain pulled
in by Phase 2's pgproto3 dependency. Module bump 1.21.13 → 1.25.0
matches what `go get github.com/jackc/pgx/v5/pgproto3` selected;
cross-builds for windows/linux/darwin all still succeed (verified
locally).

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 22:13:40 +09:00
d98f5e1767 feat(pgserver): PostgreSQL-wire MVP — psql can SELECT from FiveSql2
First end-to-end working version of the PostgreSQL-wire-compatible
TCP server frontend. A standard `psql` client now connects, runs
`SELECT * FROM employees`, and gets back a properly typed result
set rendered by psql with the right column alignment:

     ID |         NAME         |  SALARY
    ----+----------------------+----------
      1 | Alice                | 50000.00
      2 | Bob                  | 42000.50
      3 | Cho                  | 77500.00

This is the Phase 2 deliverable from the approved plan at
/Users/charleskwon/.claude/plans/compiled-launching-shore.md.
Builds on the session-state refactor in 93cf5c8 — each connection
gets its own TSqlSession on the PRG side via the new PG_NEW_SESSION
HB_FUNC, so concurrent psql clients won't share transaction logs
or plan caches.

Scope
-----

v1.0 MVP: Simple Query only, trust auth, no TLS yet. SELECT works
against the full FiveSql2 surface (CTEs, window functions, JOINs,
aggregates). DML + per-session transactions are Phase 3, extended
protocol is Phase 4, auth + TLS are Phases 5/6.

Architecture
------------

  psql/pgx/JDBC ──TCP:5432──▶ pgserver.Listener
                                  │ accept()
                                  ▼ go handleConn(net.Conn)
                             ┌─────────────────────────────┐
                             │ Session goroutine            │
                             │  1. SSLRequest peek          │
                             │  2. StartupMessage           │
                             │  3. AuthenticationOk (trust) │
                             │  4. ParameterStatus×7        │
                             │  5. BackendKeyData           │
                             │  6. ReadyForQuery('I')       │
                             │  7. loop: Receive() →        │
                             │     dispatchSimpleQuery →    │
                             │     hbrt.Thread.Function(    │
                             │       FIVE_SQL,sql,...,sess) │
                             │     emit RowDescription      │
                             │     emit DataRow×N           │
                             │     emit CommandComplete     │
                             │     emit ReadyForQuery       │
                             └─────────────────────────────┘

One goroutine per connection, each owning its own *hbrt.Thread and
TSqlSession instance. Uses the existing audit-fixed NewThread()
(cde8673) so statics + WA factory propagate.

New files (hbrtl/pgserver/)
---------------------------

* server.go — Config, Server, Serve loop with MaxConnections gate
  via semaphore, Close drains in-flight sessions.
* session.go — full lifecycle: SSLRequest peek + prefixedConn
  byte-injection trick for StartupMessage, ParameterStatus
  broadcast (server_version "14.0 (FiveSql2)" so pgx negotiates),
  BackendKeyData (random pid+secret per session, no CancelRequest
  yet), query loop dispatching only Simple Query in v1.0 with a
  loud "0A000 not supported" for Extended messages.
* dispatch.go — runSQL invokes FIVE_SQL via PushSymbol+Function,
  unpacks the engine's `{aFieldNames, aRows}` envelope or the
  `{{"__error__"}, {{nCode, cMsg, cSQL}}}` error shape, emits
  RowDescription with text-format OIDs and DataRow per row.
* typemap.go — pgTypeFor() picks INT4 / INT8 / NUMERIC / TEXT /
  DATE / TIMESTAMP / BOOL by sampling the first row's value type;
  encodeText() formats each cell, returning nil-slice for NULL
  (the PG length=-1 convention).
* errmap.go — sqlStateFor() maps FiveSql2 SQL_ERR_* codes to
  canonical PG SQLSTATEs (42601/42P01/42703/42804/23505/23514/
  23503/25P02/42501/02000/XX000).
* auth.go — trust mode in v1.0; password/MD5/SCRAM lands Phase 5
  but the dispatch sentinel is already in place.
* tls.go — upgradeToTLS stub for SSLRequest handling; the byte-
  ordering is already wired so Phase 6 just plugs in tls.Config.
* register.go — package init() registers pg_server_start /
  pg_server_stop HB_FUNCs. Importing the package (done from
  hbrtl/register.go via blank import) is enough to enable them.
* pgserver_test.go — unit tests for encodeText (numeric, string,
  NIL), pgTypeFor (OID dispatch), sqlStateFor (error mapping),
  commandTagFor (SELECT/INSERT/UPDATE/DELETE/BEGIN/COMMIT).

Other changes
-------------

* _FiveSql2/src/TSqlSession.prg — added PG_NEW_SESSION() factory
  used by the Go dispatcher to allocate a per-connection session
  bypassing the embedded process default.
* hbrtl/register.go — blank-import five/hbrtl/pgserver so its
  init() fires and the HB_FUNCs land in the global dynamic-func
  table for VM symbol lookup.
* go.mod / go.sum — github.com/jackc/pgx/v5 v5.9.2 (pgproto3
  subpackage). MIT license. Same library pgx itself uses, so
  protocol coverage matches the de-facto Go PG ecosystem.

Verification
------------

  $ pg_server_start(15432, "trust")     /* PRG one-liner */
  $ psql -h 127.0.0.1 -p 15432 -U fiveuser -c 'SELECT * FROM employees'
  → 3 rows rendered correctly by psql (ID as INT4, NAME as TEXT,
    SALARY as NUMERIC(10,2) with 2 decimal places)

All six release gates green:
  go test ./...               ✓ (incl. new hbrtl/pgserver tests)
  FiveSql2 SQL:1999 43/43     ✓
  Harbour compat 56/56        ✓
  std.ch 17/17                ✓
  FRB 7/7                     ✓
  examples 65/71              ✓ (unchanged baseline)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-16 18:40:32 +09:00
cde86730b8 fix(compiler,hbrt,hbrdd,cli): pre-1.0 audit — 13 critical fixes
Senior-engineer / QA audit landed 13 silent-miscompile and data-
integrity fixes spanning the whole compiler+runtime+storage stack.
Each fix is paired with either an integration test in the suite or
a focused regression check; all 6 release gates stay green:
go test ./..., FiveSql2 43/43, Harbour compat 56/56, std.ch 17/17,
FRB 7/7, examples 65/71.

Compiler
--------

* genpc IF/ELSEIF jumpEnd2 patching (compiler/genpc/genpc.go).
  Per-ELSEIF branch terminators were stashed into `_ = jumpEnd2`
  and never patched — the relative offset stayed 0 and the runtime
  walked the next ELSEIF's PcOpJumpFalse opcode as if it were
  jump-offset data. Bytecode-level corruption in pcode mode. Now
  collected into a slice and patched at end-of-IF. Verified via
  Grade(95..50) cases 11a-e added to tests/frb/test_frb_pcode_sweep.

* countLocalsInStmts / scanBodyLocals missing bodies
  (compiler/gengo/gen_util.go, compiler/gengo/gengo.go). Frame-size
  counter skipped WATCH/TIMEOUT/PARALLEL FOR bodies, so a LOCAL
  declared inside one of those constructs got a slot index past
  the runtime's allocated count — silent NIL reads or out-of-range
  stomps.

* emitMethodDeclStandalone nested LOCAL (compiler/gengo/gen_class.go).
  Same bug class but on the *method* side. Pre-fix repro:

      METHOD Stomp(n) CLASS T
         LOCAL a := 1, b := 2
         IF n > 0
            LOCAL c := 30, d := 40, e := 50, f := 60
            Inner( n )
            IF c != 30 .OR. d != 40 .OR. e != 50 .OR. f != 60 ...

  printed `c, d, e, f = 5, NIL, NIL, NIL` because Inner's frame
  collided with Stomp's underallocated slot range. Now counts
  body-nested LOCALs into the frame and pre-allocates indices via
  scanBodyLocals.

* genpc unsupported-AST diagnostic surface (compiler/genpc/genpc.go,
  hbrt/pcode.go, cmd/five/main.go, hbrtl/frb.go). The `default`
  cases in emitStmt / emitExpr silently emitted PushNil / no-op
  for nodes the pcode generator doesn't implement (ClassDecl,
  MethodDecl, xBase commands, concurrency primitives, …). Added
  `PcodeModule.Warnings []string` populated by noteUnsupported,
  surfaced on stderr from the build pipeline. Users now see
  "pcode: AST node not supported in --pcode/FRB-pcode mode: stmt
  *ast.GoBlockStmt" instead of getting a silently broken module.
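
The IF/ELSEIF item above comes down to back-patching: every branch ends
with a jump whose target is unknown until the whole IF has been emitted. A
minimal sketch of the collect-then-patch approach follows. The opcodes, the
two-byte little-endian offset, and the `emitter` type are assumptions made
for this example and do not reflect the real pcode layout.

```go
package main

import "fmt"

// Hypothetical opcodes for illustration; not the real pcode values.
const (
	opNop       byte = 0x00
	opJump      byte = 0x01
	opJumpFalse byte = 0x02
)

type emitter struct{ code []byte }

// emitJump writes op plus a two-byte placeholder and returns the
// placeholder's position so it can be patched later.
func (e *emitter) emitJump(op byte) int {
	e.code = append(e.code, op, 0, 0)
	return len(e.code) - 2
}

// patch stores the little-endian offset from the end of the jump
// instruction to target.
func (e *emitter) patch(site, target int) {
	rel := uint16(int16(target - (site + 2)))
	e.code[site] = byte(rel)
	e.code[site+1] = byte(rel >> 8)
}

// resolve decodes a patched jump back into an absolute target.
func (e *emitter) resolve(site int) int {
	rel := int16(uint16(e.code[site]) | uint16(e.code[site+1])<<8)
	return site + 2 + int(rel)
}

// emitIfChain emits n IF/ELSEIF branches. Each branch's terminating
// jump is collected in a slice and patched once the end is known.
func (e *emitter) emitIfChain(n int) (endSites []int) {
	for i := 0; i < n; i++ {
		skip := e.emitJump(opJumpFalse) // condition false: try next branch
		e.code = append(e.code, opNop)  // stand-in for the branch body
		endSites = append(endSites, e.emitJump(opJump))
		e.patch(skip, len(e.code))
	}
	for _, site := range endSites {
		e.patch(site, len(e.code))
	}
	return endSites
}

func main() {
	e := &emitter{}
	for _, s := range e.emitIfChain(3) {
		fmt.Printf("jump at %d lands on %d (end=%d)\n", s-1, e.resolve(s), len(e.code))
	}
}
```

Dropping the collected sites, as the pre-fix code did with `_ = jumpEnd2`,
would leave every placeholder at zero, so each jump would fall straight into
the next branch's opcode.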

Runtime
-------

* class.go Send/tryBinaryOp t.self defer-restore (hbrt/class.go).
  Restoration was a plain `t.self = oldSelf` after `fn(t)`. Any
  panic in the method body skipped the line, so the next BEGIN
  SEQUENCE / RECOVER handler ran with the THROWING object's Self
  — `::field` resolved against the wrong receiver. Wrapped both
  restore sites in `defer func() { t.self = oldSelf }()`.
  Verified: pre-fix RECOVER saw "THROWER", post-fix "OUTER".

* hbfunc.go HB_FUNC parameter Frame() (hbrt/hbfunc.go). The
  RegisterDynamicFunc wrapper called `fn(ctx)` without ever
  calling Frame, so `ctx.ParC(1)` / `ctx.Local(n)` read through
  `t.curFrame.localBase + n - 1` against the *caller's* frame.
  Every #pragma BEGINDUMP HB_FUNC taking parameters silently
  returned "" / 0 / "" for them — masked by ParNIDef-style
  defaults. Wrapper now does `t.Frame(t.pendingParams, 0); defer
  t.EndProc()` before dispatch.

* pcode codeblock closure capture (hbrt/pcinterp.go, hbrt/pcode.go,
  hbrt/thread.go, compiler/genpc/genpc.go). PcOpPushBlock recorded
  `nDetached` but never copied enclosing locals; free vars in the
  block body fell through to memvar lookup → NIL. Wired full
  capture pipeline:
  - New opcodes PcOpPushDetached (0x59) / PcOpPopDetached (0x5A).
  - PushBlock now reads per-slot source-local indices and
    snapshots into bb.Detached at construction time.
  - New detachedMap in genpc auto-promotes any free var that
    resolves to an enclosing-frame local into a capture slot.
  - emitAssignAsExpr leaves the assigned value on the eval stack
    so SeqExpr items like `{|v| acc += v, acc }` work.
  - Thread tracks curBlock with paired Set/restore in the block's
    Fn wrapper for nested-block evaluation.
  Mutating capture (acc += v across successive Evals) now works.

* vm.NewThread statics + waFactory propagation (hbrt/vm.go).
  GoLaunch / GoLaunchBlock call NewThread directly. Previously
  the statics map and WA factory were applied only in Run(), so
  goroutine-spawned PRG code panicked on STATIC access ("static
  index out of range") and crashed dereferencing nil WA on any
  DB call. Both now happen inside NewThread under the same lock
  as TID assignment.
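
The Send/tryBinaryOp item above is an instance of a general Go rule: state
restored by a plain statement after a call is skipped when the call panics,
while a deferred restore still runs. A minimal sketch under that assumption
follows; the `thread` type and function names are hypothetical and only
mimic the shape described in the commit.

```go
package main

import "fmt"

type thread struct{ self string }

// sendUnsafe restores self only on the normal return path.
func (t *thread) sendUnsafe(recv string, fn func(*thread)) {
	old := t.self
	t.self = recv
	fn(t)
	t.self = old // skipped if fn panics
}

// sendSafe restores self on every exit path, including a panic.
func (t *thread) sendSafe(recv string, fn func(*thread)) {
	old := t.self
	t.self = recv
	defer func() { t.self = old }()
	fn(t)
}

// selfSeenByRecover reports which receiver a recovery handler observes
// after a method body panics.
func selfSeenByRecover(send func(*thread, string, func(*thread))) (seen string) {
	t := &thread{self: "OUTER"}
	defer func() {
		recover()
		seen = t.self
	}()
	send(t, "THROWER", func(*thread) { panic("boom") })
	return t.self
}

func main() {
	fmt.Println("plain restore:   ", selfSeenByRecover((*thread).sendUnsafe))
	fmt.Println("deferred restore:", selfSeenByRecover((*thread).sendSafe))
}
```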

Data layer
----------

* dbf concurrent Append lock (hbrdd/dbf/dbf.go,
  hbrdd/dbf/locks_posix.go, hbrdd/dbf/locks_windows.go). Append
  bumped a local recCount with no file-system serialization. Two
  shared-mode processes both wrote at the same RecordOffset; one
  record silently overwrote the other. Added an append-intent
  byte-range lock at offset 0x7FFFFFFE + bounded retry, on-disk
  header refresh inside the locked region, and immediate header
  write so peers refresh past our slot.

* indexer negative numeric key encoding (hbrdd/dbf/indexer.go +
  new hbrdd/dbf/encode_numeric_test.go). `%20.10f` formats `-100`
  as `"     -100.0000000000"` and `99` as `"       99.0000000000"`.
  ASCII ' ' (0x20) < '-' (0x2D), so `99` lex-compared LESS than
  `-100` — every NTX/CDX index over a column that ever held a
  negative number returned wrong rows for SEEK / range scans.
  Replaced with a 1-byte sign prefix + 21-byte zero-padded
  magnitude (negatives use digit-complement) so byte order
  matches numeric order across signs and magnitudes. Format
  change: existing indexes built with the old encoding must be
  REINDEXed. Three unit tests pin the order.

* dbf Append index maintenance hooks (hbrdd/dbf/dbf.go,
  hbrdd/dbf/indexer.go). Append never inserted into open NTX/CDX
  indexes — the audit's canonical scenario `SET INDEX TO …;
  APPEND BLANK; REPLACE …; dbSeek …` silently missed the new
  record. Added optional IndexWriter interface, queue the new
  recNo in pendingIdxInserts, drain after flushRecord by calling
  InsertKey on every open writer-supporting engine. NTX
  participates (its existing rebuild-on-insert is correct);
  CDX online maintenance is deferred to a follow-up — those
  indexes still need REINDEX. Verified: post-fix SEEK("Charlie")
  after APPEND BLANK + REPLACE finds the new record.

* dbf PACK crash-safety (hbrdd/dbf/dbf.go). The old in-place
  rewrite read record N, overwrote slot M<N, then truncated.
  A power loss partway through the loop left a file with an
  overwritten prefix and no original copies of the records already
  advanced past: silent data loss. Rewrote to:
    1) drop mmap, build `<file>.pack.tmp` with all surviving
       records,
    2) Sync(),
    3) close original handle + os.Rename(tmp, orig) (atomic on
       same FS),
    4) reopen + re-mmap.
  TestComp_Pack passes; readers always see either the pre-PACK
  or post-PACK contents, never a half-state.

* mem RDD torn reads (hbrdd/mem/memrdd.go). The comment claimed
  in-place PutValue was safe because hbrt.Value "fits in a
  single machine word + pointer". hbrt.Value is 24 bytes (3
  words) — a concurrent reader could observe new type tag with
  stale scalar/ptr and type-confuse on the next AsXxx() call.
  Switched mu to sync.RWMutex; GetValue takes RLock,
  Append/PutValue/Delete/Recall take Lock. `go test -race
  ./hbrdd/mem/` clean.
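
The negative-key ordering problem from the indexer item above can be
reproduced in a few lines. This is a sketch only: `oldKey` uses the
`%20.10f` format quoted in the commit, while `encodeKey` is one possible
reading of "sign prefix + zero-padded magnitude with digit-complement". The
prefix characters '0' and '1' and the `%021.10f` layout are assumptions, not
the actual on-disk format, and NaN or out-of-range values are ignored.

```go
package main

import (
	"fmt"
	"math"
	"sort"
)

// oldKey is the space-padded format the commit describes as broken.
func oldKey(v float64) string { return fmt.Sprintf("%20.10f", v) }

// encodeKey builds a 22-byte key whose byte order follows numeric order.
// Assumed layout: '0' prefix for negatives, '1' for zero and positives,
// then a 21-byte zero-padded magnitude.
func encodeKey(v float64) string {
	mag := []byte(fmt.Sprintf("%021.10f", math.Abs(v)))
	if v >= 0 {
		return "1" + string(mag)
	}
	// Complement each digit so that larger magnitudes sort first.
	for i, c := range mag {
		if c >= '0' && c <= '9' {
			mag[i] = '9' - (c - '0')
		}
	}
	return "0" + string(mag)
}

func main() {
	vals := []float64{99, -100, 0, -1, 12.5}
	sort.Slice(vals, func(i, j int) bool {
		return encodeKey(vals[i]) < encodeKey(vals[j])
	})
	fmt.Println("sorted by encoded key:", vals)
	fmt.Println("old format puts 99 before -100:", oldKey(99) < oldKey(-100))
}
```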

Files touched
-------------

  compiler/gengo/gen_class.go, gen_util.go, gengo.go
  compiler/genpc/genpc.go
  hbrt/class.go, hbfunc.go, pcinterp.go, pcode.go, thread.go, vm.go
  hbrdd/dbf/dbf.go, indexer.go, locks_posix.go, locks_windows.go
  hbrdd/dbf/encode_numeric_test.go  (new)
  hbrdd/mem/memrdd.go
  cmd/five/main.go
  hbrtl/frb.go
  tests/frb/test_frb_pcode_sweep.prg

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-13 05:29:56 +09:00
2008266da7 feat(pp,rtl): Tier 2 audit followups — JOIN hash + PP validation + C heuristic
Three medium-priority audit items in one commit, each independently
revertible.

  * **#18 JOIN hash-join fast path.** New std.ch shape:
        JOIN WITH <alias> TO <file> [FIELDS ...] ON <mfield> = <dfield>
    expands to a 6-arg __dbJoin call with the master/detail key
    field names. Runtime detects the extra args, builds an O(M)
    hash over the detail's key column, then probes per master row
    for O(N+M) total — vs the FOR form's O(N*M). For 1k×1k that's
    2k vs 1M operations; the gap widens with N. The original FOR
    form is unchanged and stays the fallback for arbitrary
    predicates. New helper dbHashKey type-tags the key string so
    `1` (numeric), `"1"` (string), and `.T.` (logical) don't
    collide in the bucket map.

  * **#38 PP rule result-marker validation.** ParseRule now walks
    the result template after parseMarkers and warns about every
    `<name>` (or `<(name)>` / `<.name.>` / `<{name}>` / `#<name>`
    / `<"name">`) that doesn't match a pattern marker. Warnings
    flow into pp.errors via handleDirective with the directive's
    filename:line, so a typo'd `<NaMe>` in an `#xcommand`
    case-sensitive rule fails the build with a clear diagnostic
    instead of silently producing broken expansions.

  * **#44 looksLikeInlineC heuristic strengthened.** Catches more
    of the common Harbour-PRG-with-C-inline-block shapes that
    used to fall through and produce cryptic Go-side errors:
    function-like #define, `extern "C"` linkage blocks, C return-
    type declarations (`int foo(`, `static char* bar(`), and the
    hb_ret*() helper family used by Harbour's C FFI return
    setters. Two small predicate helpers (allLetters,
    allIdentChars) keep the C-vs-Go disambiguation tight enough
    that legit Go code (`func name() int { ... }`) doesn't trip.

  * **#28 LIST/DISPLAY pagination** — explicitly deferred. Proper
    pagination requires interactive terminal handling (Inkey(0)
    for the keypress), which would hang in CI / batch mode. Will
    revisit when an interactive terminal layer needs it for
    other reasons.
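
The hash-join fast path from item #18 above can be sketched as a
build-then-probe loop. Everything named here (`row`, `hashKey`, `hashJoin`,
the tag letters) is hypothetical and chosen for the example; the real
dbHashKey and __dbJoin signatures are not shown in this commit.

```go
package main

import "fmt"

type row map[string]any

// hashKey prefixes the value with a type tag so that numeric 1,
// string "1" and logical true land in different buckets.
func hashKey(v any) string {
	switch x := v.(type) {
	case int:
		return fmt.Sprintf("N:%d", x)
	case string:
		return "C:" + x
	case bool:
		if x {
			return "L:T"
		}
		return "L:F"
	default:
		return fmt.Sprintf("?:%v", x)
	}
}

// hashJoin builds a bucket map over the detail rows (O(M)) and then
// probes it once per master row (O(N)).
func hashJoin(master, detail []row, mField, dField string) [][2]row {
	buckets := make(map[string][]row, len(detail))
	for _, d := range detail {
		k := hashKey(d[dField])
		buckets[k] = append(buckets[k], d)
	}
	var out [][2]row
	for _, m := range master {
		for _, d := range buckets[hashKey(m[mField])] {
			out = append(out, [2]row{m, d})
		}
	}
	return out
}

func main() {
	master := []row{{"id": 1}, {"id": "1"}, {"id": 2}}
	detail := []row{{"cust": 1, "amt": 10}, {"cust": 1, "amt": 20}, {"cust": 3, "amt": 5}}
	for _, pair := range hashJoin(master, detail, "id", "cust") {
		fmt.Println(pair[0], pair[1])
	}
}
```

An equality-only ON clause is what makes the bucket lookup valid; an
arbitrary FOR predicate still needs the nested-loop form.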

Test fixtures: tests/std_ch/test_join_hash.prg verifies the new
ON-form path produces the same output as the FOR form would.
std.ch runner now stands at 16/16.

Other gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56
  std.ch suite       : 16/16
  FRB suite          : 7/7

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-04 19:21:19 +09:00
efb615bed9 fix(frb,genpc): in-process compile + 4 pcode bugs
Compiling _FiveSql2/test/test_sql_extreme.prg + a sweep of the FRB
demos surfaced four real bugs in the dynamic-compilation pipeline.
All fixes shipped together because they were on the same critical
path; each is independently revertible.

  * **pcode FOR loop ignored STEP and direction.** emitFor in
    compiler/genpc emitted a fixed `<= to` comparison and a hardcoded
    `+1` increment, then deleted the actual step expression with
    slice arithmetic on the byte buffer. Result: `FOR 5 TO 1 STEP
    -1` exited on the first iteration; `FOR 1 TO 10 STEP 2` summed
    1..10 (55) instead of 1+3+5+7+9 (25). Rewritten to mirror
    gengo's emitFor: detect negative step from a literal `-N` or
    unary MINUS, pick `<=` vs `>=` accordingly, and emit a clean
    `var := var + step` increment per iteration.

  * **pcode compound `+=` operator stored only the RHS.** emitAssign
    looked at AssignExpr.Op only for the := case; +=/-=/etc.
    silently took the same path, so `n += i` compiled as `n := i`,
    discarding the accumulator. Loop reductions were wrong: `Reverse`
    returned "" and `n := 0; FOR i ... n += i; NEXT` returned only
    the last increment. New compoundBinOp helper maps PLUSEQ /
    MINUSEQ / STAREQ / SLASHEQ / PERCENTEQ / POWEREQ to their
    matching binary opcode; emitAssign emits `local + rhs ; pop
    local` for compound forms.

  * **Pcode body stack leaks polluted the caller's frame.** A pcode
    function whose body left intermediate values on the data stack
    (FOR control values, etc.) returned with extra entries past
    its declared retVal. FrbDoFunc / FrbExecFunc / FrbRunFunc then
    pushed retVal on top of those leaks, so the caller saw the
    leaked values where its own preceding arguments should have
    been: `? "Fibonacci(10) =", FrbDo(...), "(expect 55)"` printed
    `1 55 (expect 55)` because the FOR loop's `1` lived in arg-1's
    slot. Two new Thread methods (`SP()` / `SetSP(int)`) let the
    three FRB dispatchers snapshot stack depth before the inner
    call and clamp it back afterward, so the leaks evaporate before
    they reach the caller's frame.

  * **FrbExec / FrbRun recursed into the host's Main forever.** Both
    looked up "MAIN" via t.VM().FindSymbol, which always resolved
    to the OUTER program's Main since FRB modules deliberately keep
    Main local. Compile + run + unload became compile + recurse +
    OOM. Both now look up Main via mod.FindFunc("MAIN") (module
    scope) — Frbload's policy of leaving Main module-local now
    actually has the intended effect.
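
The FOR/STEP item above depends on choosing the comparison from the sign of
the step. The sketch below shows the intended loop semantics at run time;
it is not the code generator itself, and `forLoop` is a hypothetical helper.
Treating a zero step as "do nothing" is an assumption made to keep the
example terminating.

```go
package main

import "fmt"

// forLoop mirrors FOR i := from TO to STEP step: a non-negative step
// continues while i <= to, a negative step continues while i >= to.
func forLoop(from, to, step int, body func(i int)) {
	if step == 0 {
		return // assumption: avoid an endless loop in this sketch
	}
	for i := from; (step > 0 && i <= to) || (step < 0 && i >= to); i += step {
		body(i)
	}
}

func main() {
	sum := 0
	forLoop(1, 10, 2, func(i int) { sum += i }) // compound add keeps the accumulator
	fmt.Println("FOR 1 TO 10 STEP 2 sum:", sum)

	count := 0
	forLoop(5, 1, -1, func(int) { count++ })
	fmt.Println("FOR 5 TO 1 STEP -1 iterations:", count)
}
```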

Plus an architectural improvement: in-memory compilation no longer
depends on shelling out to an external `five` binary. New
hbrtl.frbCompileInProc parses + preprocesses + generates pcode in
process, building a FrbModule directly. FrbCompile and FrbExec use
this exclusively, which means dynamic compilation works from any
directory regardless of PATH and without a second process. The
plugin-mode path (with its runtime-version-mismatch fragility) is
left available via hbrt.FrbCompileSource for callers that want it,
but FrbCompile no longer reaches for it by default.

Test suite: tests/frb/ holds five fixtures + a runner. 5/5 pass:
test_frb_simple / test_frb_pcode_load / test_frb_compile /
test_frb_loop / test_frb_step.

Other gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56
  std.ch suite       : 14/14
  FRB suite          : 5/5

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-02 10:25:35 +09:00
412351b67d feat(rtl): LIST/DISPLAY TO FILE — text output redirection
Wire up TO FILE for both LIST and DISPLAY: __dbList grows a 9th
parameter cFile, opens it (truncating any prior content) when non-
empty, and writes the formatted rows there via fmt.Fprintln. Default
behavior (no TO FILE) still goes to stdout.

std.ch gets two new rules placed *before* the regular LIST/DISPLAY
patterns so they win when TO FILE is present:

  LIST    [<v,...>] TO FILE <(f)> [OFF] [FOR] [WHILE] [NEXT] ...
  DISPLAY [<v,...>] TO FILE <(f)> [OFF] [FOR] [WHILE] [NEXT] ...

Open failure raises a clear *HbError ("LIST/DISPLAY TO FILE: cannot
create <path> — <syscall reason>") so callers know exactly what went
wrong instead of getting partial-or-empty output.

TO PRINTER stays rejected via __dbNotImpl — Five doesn't drive a
printer port. Test coverage: tests/std_ch/test_list_to_file.prg
exercises four shapes (full LIST, single-row DISPLAY, OFF + FOR with
explicit fields, and confirms TO PRINTER still raises). Wired into
the std.ch runner so the regression suite now stands at 14/14.

Gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56
  std.ch suite       : 14/14

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:15:32 +09:00
3a7f1dea72 feat(rtl,tests): pre-release UX round (Wave 5)
Three audit findings around polish + a release-readiness commit:

  * #UX1 LIST/DISPLAY output: dropped \r\n (unix terminals showed a
    stray ^M), moved the newline to AFTER each row (no more leading
    blank line), and added the `*` deleted-record marker after the
    record number — matches xBase LIST/DISPLAY convention. With
    SET DELETED ON the marker is unreachable since the row would
    have been skipped at Area.Skip level; with SET DELETED OFF the
    user now sees which rows are tombstoned.

  * #26 temp aliases: `__copytmp` / `__sorttmp` / `__totaltmp` /
    `__jointmp` were process-global string constants. A nested
    invocation (e.g., COPY inside a FOR clause whose expression
    runs another COPY) collided on the alias and the inner Open
    failed with "alias already in use" — surfacing as `.F.` with
    no clear cause. Each Open now goes through a new helper
    `nextTmpAlias(prefix)` backed by an atomic counter, so every
    call gets `__copytmp_1`, `__copytmp_2`, etc. — no collisions.

  * #J test coverage gap: the 13 std.ch regression tests were all
    sitting in `/tmp` — lost on tmpfs reboot, never in git, never
    in CI. Move them into `tests/std_ch/` and add a simple
    `run.sh` runner that builds + executes each one in a temp
    scratch directory and grep-asserts on FAIL / NOT REJECTED /
    expectation-mismatch markers. 13/13 pass against the current
    head:

       PASS  test_pp_stdch       PASS  test_count
       PASS  test_sum_avg        PASS  test_sum_multi
       PASS  test_copy           PASS  test_sort
       PASS  test_list           PASS  test_total
       PASS  test_join           PASS  test_update
       PASS  test_set_deleted    PASS  test_unsupported
       PASS  test_block_comma

    test_block_comma in particular guards the gengo SeqExpr fix
    from Wave 1 — without it the comma-in-block miscompile would
    silently come back.
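
The temp-alias fix in item #26 above can be sketched with an atomic counter.
The helper name matches the commit, but the body below is a guess at a
plausible shape, not the actual implementation; `tmpAliasSeq` is a
hypothetical variable name.

```go
package main

import (
	"fmt"
	"sync/atomic"
)

// tmpAliasSeq is a process-wide counter; atomic so that concurrent or
// nested callers never receive the same number.
var tmpAliasSeq atomic.Uint64

// nextTmpAlias returns prefix_N with a fresh N on every call.
func nextTmpAlias(prefix string) string {
	return fmt.Sprintf("%s_%d", prefix, tmpAliasSeq.Add(1))
}

func main() {
	outer := nextTmpAlias("__copytmp")
	inner := nextTmpAlias("__copytmp") // e.g. a COPY nested in a FOR clause
	fmt.Println(outer, inner)
}
```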

Gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56
  std.ch suite       : 13/13

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:07:50 +09:00
1a9e509ee2 perf(rtl): SORT TO swaps insertion sort for sort.SliceStable (Wave 4)
Drop the toy O(n²) insertion-sort that __dbSort had been using and
delegate to the stdlib's sort.SliceStable. Reasoning: SORT TO is an
operation a user reaches for *because* their dataset is too big to
just iterate manually — interactive DBFs routinely have 10k–1M rows,
which the old impl would chew on for minutes to hours. SliceStable
gives O(n log n) and preserves the original-input ordering for
equal keys, which is what the previous implementation also tried to
do.

The function signature is unchanged (`stableSort(rows, less)`), so
all the multi-key / /D / /C dispatch logic from earlier waves keeps
working unmodified.
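
A minimal sketch of what a `stableSort(rows, less)` wrapper over
sort.SliceStable might look like. The generic signature and the `rec` type
are assumptions for this example; the commit only states that the call
shape stayed the same.

```go
package main

import (
	"fmt"
	"sort"
)

type rec struct {
	name string
	dept int
}

// stableSort orders rows by less and keeps the input order of rows
// that compare equal, which sort.SliceStable guarantees.
func stableSort[T any](rows []T, less func(a, b T) bool) {
	sort.SliceStable(rows, func(i, j int) bool {
		return less(rows[i], rows[j])
	})
}

func main() {
	rows := []rec{{"b", 1}, {"a", 2}, {"c", 1}, {"d", 2}}
	stableSort(rows, func(a, b rec) bool { return a.dept < b.dept })
	fmt.Println(rows)
}
```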

Gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:03:13 +09:00
5b1d3fb32f feat(pp,rtl): pre-release accuracy round (Wave 3)
Four audit findings around correctness/consistency in std.ch and the
SORT/UPDATE/TOTAL handlers:

  * #13: TOTAL/UPDATE key idiom inconsistency documented as inherent.
    TOTAL evaluates `<key>` only in the source workarea so verbatim
    `<{key}>` (alias-qualified or `_FIELD->`-prefixed by the user)
    works. UPDATE evaluates the same block in BOTH master and detail
    context, so it must wrap as `_FIELD-><key>` to dispatch to
    whichever WA is selected at eval time. The two rules look alike
    but their evaluation contexts differ — also documented in
    std.ch alongside both rules so the asymmetry isn't a surprise.
    Plus: TOTAL TO and ON are now mandatory (matching the COUNT/
    UPDATE pattern from Wave 1) — bare TOTAL would have produced
    broken syntax via the unconditional `<(f)>`/`<{key}>` template
    references.

  * #15/#16: SDF / DELIMITED variants of COPY and TO PRINTER /
    TO FILE variants of LIST / DISPLAY are now matched by stub
    rules (placed *before* the regular rules so they win) that
    expand to a new `__dbNotImpl(reason)` RTL primitive raising a
    clear `&hbrt.HbError`. BEGIN SEQUENCE / RECOVER catches the
    panic, so callers get a real error instead of the previous
    silent dispatch-to-regular-DBF-copy.

  * #19: SORT /C (case-insensitive) now actually folds case before
    the string compare, instead of being silently treated as
    ascending. Suffix parser also rebuilt as a multi-letter scanner
    so `name/CD`, `name/DC`, `name/C/D`, `name/D/C` all parse the
    same way — combine /C and /D freely. Unknown suffix letters
    (e.g., `name/X`) leave the suffix attached to the field name
    so a stray slash in user input doesn't get silently mangled
    into a broken field reference.

  * #27 SET DELETED: verified with a regression test that
    `SET DELETED ON` causes COUNT/COPY (and by extension
    SORT/TOTAL/JOIN/UPDATE — all of which iterate via Area.Skip)
    to skip rows marked deleted. The filtering is implemented at
    the workarea level (skipFilter in dbf.go honors hbrdd.IsSetDeleted)
    so no RTL changes were needed; this commit just adds the
    coverage so the behavior doesn't silently regress.
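
The suffix handling from item #19 above can be sketched as a small scanner.
`parseSortKey` is a hypothetical name; accepting `/A` as an explicit
ascending marker is an assumption carried over from the earlier SORT commit,
and the real parser may differ in detail.

```go
package main

import (
	"fmt"
	"strings"
)

// parseSortKey splits "field/CD"-style specs. Letters after the first
// slash may appear in any order and with repeated slashes. An unknown
// letter leaves the whole spec untouched as the field name.
func parseSortKey(spec string) (field string, desc, fold bool) {
	i := strings.IndexByte(spec, '/')
	if i < 0 {
		return spec, false, false
	}
	for _, c := range strings.ToUpper(spec[i+1:]) {
		switch c {
		case '/', 'A': // separator or explicit ascending: nothing to set
		case 'D':
			desc = true
		case 'C':
			fold = true
		default:
			return spec, false, false
		}
	}
	return spec[:i], desc, fold
}

func main() {
	for _, s := range []string{"name", "name/D", "name/CD", "name/D/C", "name/X"} {
		f, d, c := parseSortKey(s)
		fmt.Printf("%-9s field=%q desc=%v fold=%v\n", s, f, d, c)
	}
}
```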

Gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 08:01:42 +09:00
f30704a854 fix(rtl,pp): pre-release safety round (Wave 2)
Five concrete gaps the audit flagged in the new __dbCopy / __dbSort /
__dbTotal / __dbJoin / PP code:

  * wam.Close() errors were dropped on the floor. Caller saw `.T.`
    even when the just-written DBF wasn't durable, leading to the
    classic "delete the source after the COPY succeeds" data-loss
    pattern. All four functions now capture the close error and
    return `.F.` if it fired.

  * drv.Create succeeded → wam.Open failed → orphaned-on-disk DBF.
    The user-named target file was left around with zero records,
    and the next call's drv.Create silently truncated it instead of
    surfacing the original error. Add `os.Remove(cFile)` on the
    Open-failure cleanup path for COPY/SORT/TOTAL/JOIN.

  * __dbTotal would write the DBF codec's overflow sentinel
    (`*****`) into the destination's sum-fields when a group total
    didn't fit in the source's declared field width, and still
    return `.T.`. Now: precompute each sum-field's max representable
    magnitude (10^(Len-Dec)) at start, mark the run as overflowed if
    any flush sees an out-of-range or NaN value, and propagate
    `.F.` to the caller so they don't trust the file.

  * cleanUnreferencedMarkers walked byte-by-byte and stripped any
    `<ident>` token in the result, INCLUDING ones that appear
    inside `"..."` / `'...'` string literals. A user expression
    like `LIST FOR url == "<a>x</a>"` got the `<a>` and `</a>`
    eaten on output. Now: track string-literal state and skip the
    cleanup pass while inside one. Bracket-strings `[…]` are
    intentionally not treated as strings here — the result template
    uses `[...]` as the optional-repeat marker, and disambiguating
    needs context the cleanup pass doesn't have.

  * (#8 SET SAFETY honoring) deferred. Harbour default is SAFETY
    OFF, so the current always-overwrite behavior matches default
    Harbour. The divergence only matters when user explicitly does
    `SET SAFETY ON`, which Five doesn't support yet — so the
    no-overwrite-protection is consistent end-to-end. Tracked as a
    separate followup.
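
The cleanUnreferencedMarkers item above is about tracking string-literal
state while scanning. A minimal sketch of that idea follows. `stripMarkers`,
`markerEnd` and `isIdent` are hypothetical names, only the plain `<ident>`
marker form is handled, and bracket strings are left alone as the commit
describes.

```go
package main

import (
	"fmt"
	"strings"
)

func isIdent(c byte) bool {
	return c == '_' || (c >= '0' && c <= '9') ||
		(c >= 'a' && c <= 'z') || (c >= 'A' && c <= 'Z')
}

// markerEnd returns the index of the closing '>' when s[start:] begins
// with <ident>, or -1 otherwise.
func markerEnd(s string, start int) int {
	j := start + 1
	for j < len(s) && isIdent(s[j]) {
		j++
	}
	if j > start+1 && j < len(s) && s[j] == '>' {
		return j
	}
	return -1
}

// stripMarkers removes <ident> tokens, but copies everything inside
// "..." or '...' literals verbatim.
func stripMarkers(s string) string {
	var out strings.Builder
	var quote byte // 0 when outside a string literal
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case quote != 0:
			if c == quote {
				quote = 0
			}
		case c == '"' || c == '\'':
			quote = c
		case c == '<':
			if end := markerEnd(s, i); end > 0 {
				i = end // skip the whole marker
				continue
			}
		}
		out.WriteByte(c)
	}
	return out.String()
}

func main() {
	fmt.Printf("%q\n", stripMarkers(`LIST FOR url == "<a>x</a>" <all>`))
}
```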

Gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-05-01 07:54:41 +09:00
80a18daf8d feat(pp): UPDATE FROM via std.ch + nested-bracket fix in matchSegment
`UPDATE [FROM <alias>] [ON <key>] [RANDOM] REPLACE <f1> WITH <x1>
[, <fN> WITH <xN>]` becomes a preprocessor rewrite to a new RTL
primitive __dbUpdate. For each detail record, find the master
record with matching key (forward-walk if both sorted, full scan
when RANDOM) and apply the REPLACE clauses in master's context.

Same shape as harbour-core/src/rdd/dbupdat.prg. The REPLACE clauses
expand to comma-separated assignments inside one block —
`{|| _FIELD->total := del->amt, _FIELD->status := "OK" }` — using
the multi-pair `[, <fN> WITH <xN>]` optional-repeat that std.ch
already establishes for SUM and DEFAULT.

Five-specific tweak: ON <key> wraps as `{|| _FIELD-><key> }` rather
than Harbour's bare `<{key}>`. Five doesn't auto-resolve a bare
identifier in a code block to the current workarea's field, and the
UPDATE block must evaluate against both detail and master so an
explicit alias prefix won't do — _FIELD-> dispatches to whichever
area is selected at eval time, which is what's needed.

Wiring up UPDATE surfaced one further matchSegment gap that fell
out of the multi-pair `[REPLACE ... [, ...]]` shape:

  * matchSegment didn't handle nested `[...]` inside its body.
    `[REPLACE <f1> WITH <x1> [, <fN> WITH <xN>]]` gave the inner
    `[` as a literal token to match against the line, so even the
    single-pair `REPLACE total WITH del->amt` form failed and f1/x1
    came back empty. Now matchSegment runs the same repeat-loop on
    inner `[...]` blocks that the top-level matcher uses, with its
    own outer-tail computed from the segment tail past the inner
    `]`.

Parser cleanup: UPDATE removed from the IDENT-statement no-op switch.

Gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 17:49:33 +09:00
ebe12e1108 feat(pp): JOIN WITH ... TO via std.ch + __dbJoin RTL
`JOIN WITH <alias> TO <file> [FIELDS <list>] [FOR <expr>]` becomes a
preprocessor rewrite to a new RTL primitive __dbJoin. Cartesian
product of the current ("master") workarea and the named "detail"
alias, filtered by the FOR expression.

Output structure:
  * No FIELDS clause: master's fields followed by detail's, dropping
    any detail-side name that clashes with master.
  * FIELDS list: one column per name in declaration order, resolved
    against master first then detail.

Same shape as harbour-core/src/rdd/dbjoin.prg. Five-specific
simplifications: alias->name in FIELDS not yet supported (bare
names with master-precedence lookup); RDD/codepage args dropped
since Five only has DBFNTX.

Note for callers: don't name a workarea `M` or `MEMVAR` — both are
Harbour-reserved memvar aliases, so `M->field` and `MEMVAR->field`
always go through the memory-variable namespace, not the workarea.
This is gengo behavior matching Harbour, not new in this commit.

Parser cleanup: JOIN removed from the IDENT-statement no-op switch.

Gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 16:42:06 +09:00
699ea90156 feat(pp): TOTAL TO via std.ch + __dbTotal RTL
`TOTAL TO <file> ON <key> [FIELDS <list>] [FOR ...] [WHILE ...]
[NEXT ...] [RECORD ...] [REST] [ALL]` joins the family of std.ch
DML rewrites. New RTL primitive __dbTotal:

  * Walk the source under dbEval-style FOR/WHILE/NEXT/RECORD/REST
    bounds. The source must already be sorted/indexed on the key —
    same precondition as Harbour's dbtotal.prg.
  * Track the current group key. On each key change, flush the
    accumulated row to the destination (writing the running totals
    back into the most recently appended record's sum-fields,
    preserving each field's declared length/decimals).
  * On the *first* record of every group, append a fresh dst row
    and copy all non-memo source fields into it; subsequent records
    in the group only contribute to the sums. Net effect: non-summed
    fields take the first record's value, summed fields hold the
    group total. Same shape as harbour-core/src/rdd/dbtotal.prg.
  * Memo fields are dropped from the destination structure (Harbour
    does the same).
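
The grouping rule in the bullets above (first record of a group supplies the
non-summed fields, later records only add to the sums) can be sketched over
an in-memory slice. The `rec` type and `totalByKey` are hypothetical; the
real __dbTotal works on workareas and honors FOR/WHILE/NEXT bounds, which
this example leaves out.

```go
package main

import "fmt"

type rec struct {
	key    string  // group key; input must already be sorted on it
	label  string  // non-summed field
	amount float64 // summed field
}

// totalByKey appends a new output row on every key change and adds
// later rows of the same group into that row's amount.
func totalByKey(src []rec) []rec {
	var dst []rec
	for _, r := range src {
		if n := len(dst); n > 0 && dst[n-1].key == r.key {
			dst[n-1].amount += r.amount
			continue
		}
		dst = append(dst, r) // first record of a group: copy all fields
	}
	return dst
}

func main() {
	src := []rec{
		{"A", "first-A", 10},
		{"A", "second-A", 5},
		{"B", "first-B", 7},
	}
	for _, r := range totalByKey(src) {
		fmt.Printf("%s %s %.2f\n", r.key, r.label, r.amount)
	}
}
```

Because only the previous output row is compared, unsorted input would
produce one row per run of equal keys, which is why the sorted/indexed
precondition matters.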

Parser cleanup: TOTAL removed from the IDENT-statement no-op switch.

Gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:24:41 +09:00
1cc2d94927 feat(pp): LIST / DISPLAY via std.ch + four PP completeness fixes
`LIST [<fields>] [OFF] [FOR ...] [WHILE ...] [NEXT ...] [RECORD ...]
[REST] [ALL]` and `DISPLAY [<fields>] [OFF] [FOR ...] ... [ALL]`
reach the parser as plain function calls to a new RTL primitive
__dbList (rtlDbList in hbrtl/database.go).

Implementation: walk the workarea under dbEval-style FOR/WHILE/NEXT/
RECORD/REST bounds. For each visible record, evaluate each column
block and emit the rendered values via valueToDisplay (the same
formatter QOut already uses). Empty fields list defaults to
"all fields". OFF suppresses the record-number prefix.
LIST always emits the full filtered range; DISPLAY without ALL emits
only the current record (encoded as nCount=1). TO PRINTER / TO FILE
clauses are not yet wired through — for now everything goes to
stdout.

Wiring up LIST/DISPLAY surfaced four further gaps in PP that were
silently masking bugs in any rule with multiple word-list / list /
optional clauses chained together:

  * matchSegment refused MarkerWordList inside `[...]`. The LIST
    rule's `[<off:OFF>]` clause therefore never set the off
    capture, and `<.off.>` substituted to nothing instead of .T./.F.
    matchSegment now matches WordList markers the same way the
    top-level matcher does.

  * `<v,...>` and `<(f)>` capture stop boundaries didn't include the
    values of following MarkerWordList markers. For
    `[<v,...>] [<off:OFF>] [<all:ALL>]` against `LIST id, name OFF`,
    the v list would happily eat OFF. New addStopFrom helper
    contributes both literal keywords and word-list values; both
    matchSegment's MarkerList branch and captureExpression now use
    it.

  * Optional-repeat loop in matchPattern merged a no-progress
    iteration's empty capture into the running multi-capture string
    (with the `\x01` separator) before the no-progress break check
    fired. So a successful first iteration's value got contaminated
    and the substitution loop then skipped it as multi-capture
    garbage. The merge now happens after the progress check.

  * Unreferenced `<.name.>` markers (optional clauses that didn't
    match in the input) were getting cleaned up to empty by the
    generic marker scrubber instead of the .F. sentinel Harbour's
    std.ch expects. New replaceUnreferencedLogify pass mirrors the
    existing replaceUnreferencedBlockify and runs just before the
    cleanup.

Parser cleanup: LIST and DISPLAY removed from the IDENT-statement
no-op switch in both parseIdentStmt and parseExprStmt.

Gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:19:36 +09:00
989138d12e feat(pp): SORT TO via std.ch + __dbSort RTL
`SORT TO <file> [ON <key-list>] [FOR ...] [WHILE ...] [NEXT ...]
[RECORD ...] [REST] [ALL]` joins COPY in being a real preprocessor
rewrite to a function call. New RTL primitive __dbSort:

  * Buffer visible source records (FOR/WHILE/NEXT/RECORD/REST same
    as __dbCopy).
  * Multi-key stable insertion sort. Each key may carry `/D` for
    descending; ascending otherwise. /A and unknown suffixes fall
    through as ascending. Comparison delegates to the existing
    compareValues helper in sqlscan.go (numeric / string / NIL-aware).
  * Create destination DBF with the source's struct, append rows in
    sorted order, restore source selection.
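
The multi-key ordering above can be sketched in Go as follows. This is an illustration under stated assumptions, not the __dbSort code: `sortKey`, `parseKey`, and `sortRecords` are hypothetical names, records are modeled as string maps, comparison is string-only (the commit delegates to compareValues), and `sort.SliceStable` stands in for the hand-written insertion sort.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// sortKey is a hypothetical parsed form of one ON key such as "name/D".
type sortKey struct {
	field string
	desc  bool
}

// parseKey treats a "/D" suffix as descending; "/A" and any unknown
// suffix fall through as ascending.
func parseKey(s string) sortKey {
	field, suffix, _ := strings.Cut(s, "/")
	return sortKey{
		field: strings.TrimSpace(field),
		desc:  strings.TrimSpace(suffix) == "D",
	}
}

// sortRecords orders rows by each key in turn. Rows that compare equal
// on every key keep their input order because the sort is stable.
func sortRecords(rows []map[string]string, keys []sortKey) {
	sort.SliceStable(rows, func(i, j int) bool {
		for _, k := range keys {
			c := strings.Compare(rows[i][k.field], rows[j][k.field])
			if c == 0 {
				continue // tie on this key: consult the next one
			}
			if k.desc {
				return c > 0
			}
			return c < 0
		}
		return false
	})
}

func main() {
	rows := []map[string]string{
		{"dept": "B", "name": "ann"},
		{"dept": "A", "name": "bob"},
		{"dept": "B", "name": "cid"},
	}
	sortRecords(rows, []sortKey{parseKey("dept"), parseKey("name/D")})
	for _, r := range rows {
		fmt.Println(r["dept"], r["name"])
	}
}
```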

Parser cleanup: SORT removed from the IDENT-statement no-op switch.

Gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:04:18 +09:00
e961660f61 feat(pp): COPY TO via std.ch + four PP completeness fixes
`COPY TO <file> [FIELDS <list>] [FOR ...] [WHILE ...] [NEXT ...]
[RECORD ...] [REST] [ALL]` reaches the parser as a plain function
call to a new RTL primitive __dbCopy (rtlDbCopy in hbrtl/database.go).

Implementation: project the field list (case-insensitive name match
against the source's structure, full copy when omitted), dbCreate the
target file with that struct, open it under a temp alias, walk the
source under dbEval-style FOR/WHILE/NEXT/RECORD/REST bounds, and
GetValue/Append/PutValue per record into the target. SDF / DELIMITED
variants stay parser no-ops until those backends arrive.

Wiring up COPY surfaced four longstanding gaps in the PP that had to
be fixed for the rule to even reach the runtime:

  * `<(name)>` *pattern* marker was treated as a regular `<name>`
    with the parens baked into the captured key, so the matching
    result substitution `<(name)>` couldn't find it. parseOneMarker
    now strips the parens at parse time so capture key and result
    marker share the bare name. The smart-stringify result behavior
    is unchanged.
  * matchSegment (the optional-clause matcher) bailed on every
    non-Regular marker. `[FIELDS <fields,...>]` therefore failed to
    match at all and the fields list arrived empty in the result
    template. matchSegment now handles MarkerList with paren-balanced
    capture and segment+outer literal stop boundaries.
  * captureExpression only used the first literal in the pattern
    tail as a stop boundary. With std.ch's chain of optional
    clauses (`[TO <(f)>] [FIELDS ...] [FOR ...] [WHILE ...] ...`)
    the file-name marker was happy to gobble a trailing FOR clause
    when FIELDS was absent. It now stops at *any* of the remaining
    pattern literals.
  * `<(name)>` smart-stringify on a list-typed capture wrapped the
    whole comma-joined string in one set of quotes — `{ "a , b" }` —
    instead of `{ "a", "b" }`. New helper quoteListElements splits on
    top-level commas (paren / bracket / brace / string-balanced) and
    quotes each element. applyResult now consults the rule's marker
    table to know which captures came from `<name,...>`.
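
A rough Go sketch of the top-level comma split behind the last fix. `splitTopLevel` and `quoteElements` are hypothetical names, not the quoteListElements source, and the sketch assumes elements contain no embedded double quotes that would need escaping.

```go
package main

import (
	"fmt"
	"strings"
)

// splitTopLevel splits s on commas that are not nested inside
// parentheses, brackets, braces, or a quoted string.
func splitTopLevel(s string) []string {
	var parts []string
	depth := 0
	var quote byte // 0 while outside a string literal
	start := 0
	for i := 0; i < len(s); i++ {
		c := s[i]
		switch {
		case quote != 0:
			if c == quote {
				quote = 0 // closing quote
			}
		case c == '"' || c == '\'':
			quote = c
		case c == '(' || c == '[' || c == '{':
			depth++
		case c == ')' || c == ']' || c == '}':
			if depth > 0 {
				depth--
			}
		case c == ',' && depth == 0:
			parts = append(parts, strings.TrimSpace(s[start:i]))
			start = i + 1
		}
	}
	return append(parts, strings.TrimSpace(s[start:]))
}

// quoteElements wraps each top-level element in its own double quotes.
func quoteElements(s string) string {
	parts := splitTopLevel(s)
	for i, p := range parts {
		parts[i] = `"` + p + `"`
	}
	return strings.Join(parts, ", ")
}

func main() {
	fmt.Println("{ " + quoteElements("a , b") + " }") // prints { "a", "b" }
}
```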

Parser cleanup: COPY removed from the IDENT-statement no-op switch in
both parseIdentStmt and parseExprStmt.

Gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 15:00:18 +09:00
c2e7f7ea27 feat(pp): Phase B — COUNT / SUM / AVERAGE via std.ch
Three xBase analytical commands that were silent no-ops in the
parser now execute as Harbour-style PP rewrites:

  COUNT [TO <v>]   [FOR <for>] [WHILE <while>] ... -> dbEval()
  SUM <x> TO <v>   [FOR <for>] [WHILE <while>] ... -> dbEval()
  AVERAGE <x> TO <v> [FOR ...]                     -> __dbAverage()

COUNT and SUM expand to a `<v> := 0 ; dbEval( {|| ... } )` pair
matching harbour-core/include/std.ch verbatim. AVERAGE delegates to
a new RTL function rtlDbAverage (sum + count + divide; returns 0 on
empty match) — the chained-private-variable trick Harbour uses to
keep AVERAGE inline doesn't translate cleanly through Five's PP.

Wiring up these rules surfaced four PP issues that had to be fixed
for the rewrite to even reach the parser:

  * Result template did not implement <{name}> blockify. So a rule
    body like `{|| x := x + <x> }, <{for}>` left the literal text
    `<{for}>` in the output. Added blockify substitution: captured
    -> `{|| <captured> }`, missing -> NIL.
  * findMarkerEnd did not recognise `{`/`}` so unreferenced
    blockify markers were not cleaned up either. Added `{`/`}` to
    its prefix/suffix sets.
  * Optional-clause matching had no view of the outer pattern, so a
    regular marker at the end of `[TO <v>]` would swallow the rest
    of the line — `COUNT TO n FOR x>5` captured `<v>` as
    "n FOR x>5". matchSegment now takes outerTail and stops at its
    first literal.
  * `#command` directives could not span multiple physical lines.
    A trailing `;` is harbour-core's line-continuation marker for
    std.ch and now joins the next line into the directive before
    parsing.
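
The continuation join in the last item could look roughly like this in Go. `joinContinuations` is a hypothetical helper, and the sketch assumes it is only ever fed directive lines, so a trailing `;` always means "continue on the next line".

```go
package main

import (
	"fmt"
	"strings"
)

// joinContinuations merges every physical line that ends in ';' with
// the line that follows it, yielding one logical line per directive.
func joinContinuations(lines []string) []string {
	var out []string
	var pending strings.Builder
	for _, line := range lines {
		trimmed := strings.TrimRight(line, " \t")
		if strings.HasSuffix(trimmed, ";") {
			// Drop the marker and keep accumulating.
			pending.WriteString(strings.TrimSuffix(trimmed, ";"))
			pending.WriteByte(' ')
			continue
		}
		pending.WriteString(line)
		out = append(out, pending.String())
		pending.Reset()
	}
	if pending.Len() > 0 { // input ended on a continuation line
		out = append(out, pending.String())
	}
	return out
}

func main() {
	src := []string{
		"#command COUNT [TO <v>] [FOR <for>];",
		"=> <v> := 0",
		"x := 1",
	}
	for _, l := range joinContinuations(src) {
		fmt.Println(l)
	}
}
```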

Parser cleanup: COUNT, SUM, AVERAGE removed from the IDENT-statement
no-op switch in parseIdentStmt + parseExprStmt. The remaining xBase
verbs (COPY, SORT, TOTAL, JOIN, LIST, DISPLAY, LABEL, REPORT, ...)
stay in the parser until their RTL backends arrive.

Gates green:
  go test ./...      : PASS
  FiveSql2 SQL:1999  : 43/43
  Harbour compat     : 56/56

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 14:11:20 +09:00
f4ed42556b checkpoint: season-wide bug fix campaign + infra
Cumulative result of this season's silent-bug hunt (~62 fixes) across the
FiveSql2 SQL engine, the Five compiler/runtime, and the hbrdd RDD layer.
Saved as a single checkpoint before refactoring the parser to delegate
xBase command translation to the preprocessor.

Highlights:

FiveSql2 engine (_FiveSql2/src/)
- prefix-glob index attach -> explicit convention (<table>_pk.ntx,
  <table>_uq.ntx, <table>.cdx) — fixes silent multi-row INSERT row-drop
- DROP/CREATE TABLE FErase chain extended (.cdx, .fsc, .fsv, .dbt, .fpt)
- COUNT(DISTINCT col) parsed + aggregated via hSeen hash
- UNION column-count mismatch returns SQL_ERR_GRAMMAR (was silent)
- DISTINCT + ORDER BY hidden-col leak fixed (trim before DISTINCT)
- Derived table FROM (SELECT...) + JOIN right-side derived
- Self-FK CASCADE depth 2+ via SqlGetSingleColPK pre-collect
- LAG/LEAD default arg uses SqlEvalRowExpr (handles -N const exprs)
- DATE literal round-trip validation (Feb 29 non-leap rejected)
- CREATE OR REPLACE VIEW; CREATE VIEW errors on already-exists
- AlterTable type dispatcher comma-wrapped (1-char type "A" no longer
  matches CHARACTER)

Compiler / runtime
- gengo: HB_ -> FV_ prefix on emitted Go function names (Five identity)
- gengo split: emit_block.go, emit_stmt.go, folding.go extracted
- parser/stmtreg.go nudges
- hbrt: debug TUI/CLI restructure (debugcmd, debugkey, termios_*),
  windows debug stubs collapsed
- thread/vm/value/class/pcinterp tightening from panic traces

RDD layer (hbrdd/)
- dbf: null bitmap support (null.go + null_test.go), mmap split
  (mmap_posix.go / mmap_windows.go), byte-level numeric parse
- ntx/cdx: windows mmap parity
- workarea + mem RDD: cross-area state-bleed fixes

RTL (hbrtl/)
- errorlog rewrite with platform-specific FD (errorlog_fd_unix /
  errorlog_fd_other)
- sqlscan, sqlhelpers, indexrtl, datetime extensions

Gates green at checkpoint:
- go test ./...        : PASS
- FiveSql2 SQL:1999    : 43/43
- Harbour compat       : 56/56

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-30 09:26:25 +09:00
d6c26104c9 feat(rtl): common.ch aliases — ISNIL/ISARRAY/ISNUMBER and friends
Harbour's common.ch exposes classic Clipper type-check shorthands
via #translate rules that map to HB_IS* RTL functions:

  #translate ISNIL(<x>)       => ((<x>) == NIL)
  #translate ISARRAY(<x>)     => HB_ISARRAY(<x>)
  #translate ISCHARACTER(<x>) => HB_ISSTRING(<x>)
  ... etc.

Five's preprocessor currently supports #translate only for lines
whose FIRST word is the rule keyword, not for substring matches
inside expressions. Real usage like `IF ISNIL(x)` fails the keyword
check (first word is IF, not ISNIL) and the rule never fires.

Rather than rewrite the PP substring engine (A2 scope), register
the nine short names as direct RTL symbols in register.go, each
pointing at the same Go function as its HB_IS* twin. ISMEMO maps
to HB_ISSTRING as a reasonable approximation for Five (no distinct
memo type at the VM level).

common.ch becomes a short stub that just #defines TRUE/FALSE/YES/NO
and documents where the ISxxx aliases live. DEFAULT / UPDATE
#xcommand forms remain unsupported pending A2.

Verified with /tmp/test_common.prg — ISNUMBER(42), ISCHARACTER("x"),
ISNIL(nilVar) all dispatch correctly. Analyzer still emits
"undeclared variable" warnings for the short names (the static
checker doesn't see runtime-registered RTL symbols) but the
generated code links and runs.

FiveSql2 43/43, Harbour compat 56/56, Go test ALL PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 17:01:50 +09:00
2a662525b3 feat(rtl): DO(xTarget, [args...]) — dynamic dispatch
Harbour's DO() accepts a string (looked up as a function name), a
code block (evaluated with args), or a symbol, and invokes it. Used
for plugin systems and dynamic dispatch idioms like
`DO(cHandler, oRequest)`.

Five already had stmtDo rewrite `DO(...)` at statement-level to a
function-call expression, so callers in expression position just
work — but gengo refused to emit DO as a function call because it
was on the reserved-word guard list (which existed to catch stray
ENDIF/ENDDO from bad IF nesting). Remove DO from that list; the
statement form is still handled upstream by parseDoProc, so the
guard loses nothing.

rtlDo implements the dispatch:
  - String target → VM.FindSymbol + t.Function
  - Block target → EvalBlock path (same as Eval)
  - Anything else → NIL
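
The three-way dispatch can be sketched in Go with a type switch. Everything here is a stand-in: `Value`, `Block`, the `symbols` table, and `do` are hypothetical, not the hbrt types, and the case-insensitive name lookup and the NIL result for an unknown name are assumptions of the sketch.

```go
package main

import (
	"fmt"
	"strings"
)

// Value stands in for the runtime value type; nil plays the role of NIL.
type Value any

// Block is a hypothetical code-block representation.
type Block func(args ...Value) Value

// symbols is a hypothetical function table keyed by upper-cased name.
var symbols = map[string]func(args ...Value) Value{
	"GREET": func(args ...Value) Value {
		return fmt.Sprintf("hello, %v", args[0])
	},
}

// do dispatches on the dynamic type of target.
func do(target Value, args ...Value) Value {
	switch t := target.(type) {
	case string: // function name: look it up and call it
		if fn, ok := symbols[strings.ToUpper(t)]; ok {
			return fn(args...)
		}
		return nil
	case Block: // code block: evaluate with the arguments
		return t(args...)
	default: // anything else yields NIL
		return nil
	}
}

func main() {
	fmt.Println(do("Greet", "World"))
	mul := Block(func(args ...Value) Value {
		return args[0].(int)*args[1].(int) + 1
	})
	fmt.Println(do(mul, 5, 6))
	fmt.Println(do(nil) == nil)
}
```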

Tested (/tmp/test_do.prg):
  DO("Greet", "World") → "hello, World"
  DO({|x,y| x*y+1}, 5, 6) → 31
  DO(NIL) → NIL (ValType "U")

FiveSql2 43/43, Harbour compat 56/56, Go test ALL PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:33:09 +09:00
e089c81bcd feat(macro): &var / &(expr) runtime compilation
Harbour's macro operator was a stub: hbrt.MacroCompile only resolved
bare identifier names to memvars/functions and returned the source
string unchanged for any non-trivial expression. The gengo emit was
also broken — `t.MacroPush() + t.PushNil()` never pushed the inner
expression's value, so MacroPush popped whatever happened to be on
the stack.

Wire it up properly:

1. Gengo fix: `case *ast.MacroExpr` now emits `emitExpr(e.Expr);
   t.MacroPush()`. The inner expression produces the source string;
   MacroPush consumes it and pushes the evaluated result.

2. Hook pattern in hbrt: `SetMacroEvalHook(fn)` lets hbrtl install
   the real evaluator without creating an import cycle (genpc
   already imports hbrt). MacroPush delegates to the hook when
   installed; otherwise falls back to the legacy stub for hbrt
   unit tests.

3. hbrtl.init registers macroEval, which reuses compileExprSource
   (factored out of PcCompile) so macro lookups share the same
   sync.Map-backed pcode cache — repeat evaluations of the same
   macro source are free after the first hit.

4. ExecPcode leaves the result in retVal; macroEval copies it to
   the operand stack via PushRetValue.
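
A compact Go sketch of steps 2 and 3: a hook variable that a higher layer fills in, backed by a `sync.Map` cache. The signatures are simplified guesses, not the hbrt/hbrtl API; `compileExpr` here is a toy that only understands integer literals, and both "packages" are collapsed into one file.

```go
package main

import (
	"fmt"
	"strconv"
	"sync"
	"sync/atomic"
)

// macroEvalHook stays nil until a higher layer installs an evaluator,
// so the low-level package never has to import the high-level one.
var macroEvalHook func(src string) any

func SetMacroEvalHook(fn func(src string) any) { macroEvalHook = fn }

// MacroPush delegates to the hook when installed; otherwise it behaves
// like the legacy stub and hands the source string back unchanged.
func MacroPush(src string) any {
	if macroEvalHook != nil {
		return macroEvalHook(src)
	}
	return src
}

var (
	compiledCache sync.Map // source string -> func() any
	compileCount  int64    // how many times the toy compiler actually ran
)

// compileExpr is a toy compiler: it parses an integer literal once and
// caches the resulting closure under its source text.
func compileExpr(src string) func() any {
	if fn, ok := compiledCache.Load(src); ok {
		return fn.(func() any)
	}
	atomic.AddInt64(&compileCount, 1)
	n, err := strconv.Atoi(src)
	fn := func() any {
		if err != nil {
			return nil
		}
		return n
	}
	compiledCache.Store(src, fn)
	return fn
}

func macroEval(src string) any { return compileExpr(src)() }

func main() {
	fmt.Println(MacroPush("42")) // no hook yet: source comes back as-is
	SetMacroEvalHook(macroEval)
	fmt.Println(MacroPush("42"), MacroPush("42"))
	fmt.Println("compilations:", atomic.LoadInt64(&compileCount))
}
```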

Tested (/tmp/test_macro.prg):
  &"10 + 20"                    → 30
  &"Sqrt(16)"                   → 4
  &"Upper('hello')"             → HELLO
  &("30 * " + Str(nX, 1))       → 210  (runtime-built source)
  &"5 > 3 .AND. .T."            → .T.
  &("Str(" + Str(nX*10,2) + ",2)") → 70

FiveSql2 43/43, Harbour compat 56/56, Go test ALL PASS.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 16:02:16 +09:00
935883bb88 perf(fivesql2): Go-native FetchRow fast path — 1.3-1.7x on agg/window
TSqlExecutor:FetchRow was the per-row workhorse for aggregation,
HAVING, and window queries. Even with the pre-built aFetchCache
binding columns to (nWA, nFPos), the PRG FOR loop paid one method
dispatch per column per row (dbSelectArea, FieldGet, AllTrim,
AAdd) — profile pinned it at ~30% of B4 CPU.

SqlFetchRowFast collapses the cache-path loop into a single Go
call:
  - bound entry: SelectByNum + area.GetValue directly
  - unbound (aggregate/expression): self:EvalExpr via Send
  - character values: TrimSpace inline
The PRG FetchRow keeps its original cache-miss fallback path
unchanged for rare queries where aFetchCache isn't built.

Bench deltas (median of 3 steady runs, 1000 iters):
  B4_GROUP_HAVING 418 → 327 us  -22% (1.28x)
  B9_ROW_NUMBER   191 → 120 us  -37% (1.59x)
  B10_RANK_PART   228 → 135 us  -41% (1.69x)
  B11_SUM_OVER    249 → 156 us  -37% (1.60x)
  B14_COUNT       235 → 219 us  -7%
  B15_CTE_WIN_JOIN 1577 → 1452 us  -8%
Single-table SELECT (B1-B3, B5-B7, B8) stays flat — those already
hit the column-binding fast path and don't need aggregate dispatch.

FiveSql2 43/43, Harbour compat 56/56.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 13:50:02 +09:00
c84cde6175 perf(fivesql2): Go-native SqlIsAggName — drop per-row substring scan
B4 GROUP+HAVING profile showed SqlIsAggName at ~9% of CPU —
SqlEvalFunc checks it for every function in every row, and the
PRG body was two string allocations + a substring scan:
  RETURN ("," + c + ",") $ ("," + AGG_FUNCTIONS + ",")

Replace with a hash lookup against the existing aggFuncSet map
in hbrtl/sqlexpr.go (already populated for SqlExprHasAgg, same
AGG_FUNCTIONS list). Upper-casing skips the allocation when the
input is already upper, which it almost always is in practice.
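
For illustration, the set lookup might look like this in Go. The contents of `aggFuncSet` below are an assumption (the commit does not list AGG_FUNCTIONS), and `isAggName` is a hypothetical name. In current Go releases `strings.ToUpper` hands back its input unchanged when there is nothing to upper-case, which is the allocation skip the message refers to.

```go
package main

import (
	"fmt"
	"strings"
)

// aggFuncSet is an assumed list of aggregate names; the real
// AGG_FUNCTIONS list is not shown in the commit message.
var aggFuncSet = map[string]struct{}{
	"COUNT": {}, "SUM": {}, "AVG": {}, "MIN": {}, "MAX": {},
}

// isAggName does one map probe instead of building two strings and
// scanning for a substring on every call.
func isAggName(name string) bool {
	_, ok := aggFuncSet[strings.ToUpper(name)]
	return ok
}

func main() {
	fmt.Println(isAggName("SUM"), isAggName("sum"), isAggName("UPPER"))
}
```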

Bench deltas (median of 3 steady runs, 1000 iters):
  B4_GROUP_HAVING 447 → 418 us  -6.5%
  B14_COUNT       252 → 235 us  -7%
  B15_CTE_WIN_JOIN 1595 → 1577 us  -1%
Other benches unchanged (no aggregate calls per row).

FiveSql2 43/43, Harbour compat 56/56.

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-18 13:40:19 +09:00
dd270d5d9d perf: RTL Go-native migration — 27 optimizations, DML up to 70-90x
Systematic pass through PRG hot paths, promoting them to Go RTL while
preserving Harbour/FiveSql2 semantics. Full log in
docs/RTL-Go-Native-Migration.md.

Bench (bench_sql) vs 2026-04-08 baseline
 - B1  SELECT *             2,192 → 114   µs   (19x)
 - B6  INNER JOIN           9,291 → 233   µs   (40x)
 - B7  CTE simple           8,037 → 129   µs   (62x)
 - B9  ROW_NUMBER           3,705 → 265   µs   (14x)
 - B10 RANK PARTITION       4,748 → 309   µs   (15x)
 - B12 INSERT (WA cache)    4,319 →  63   µs   (69x)
 - B13 UPDATE (WA cache)    6,144 →  68   µs   (90x)
 - B15 CTE+WIN+JOIN        18,395 → 1,873 µs   (10x)

Infrastructure
 - HbHash O(1) Index preserving insertion order (Harbour KEEPORDER)
 - HbDeepClone Go RTL (scalar-sharing, immutable hash keys)
 - MEMRDD auto-imported via gengo; all Five programs get mem:name driver
 - SQL plan + pcode caches (s_hPlanCache, s_hDmlPcodeCache)
 - Opt-in SqlWACacheEnable — dbUseArea/Close/Commit batched for DML

SQL engine
 - FiveSql2 lexer ported to Go (byte FSM), combined with automatic
   template parameterization (literals → ?, concat queries share plan)
 - Go RTL: SqlDistinct, SqlGroupRows, SqlWindowPartitions,
   SqlWindowSortPartition, SqlWindowAssignRank, SqlComputeAggSimple,
   SqlBulkInsert, SqlBulkUpdate, SqlExprHasAgg, SqlEvalHaving
 - CTE / subquery / driving-table materialize paths use MEMRDD
 - SqlCoerce/SqlCmp/SqlIsTrue helpers moved from PRG to Go
 - SqlBulkUpdate defers Flush when WA cache active (APFS fsync was
   dominant B13 cost — 1.6ms/call → gone)

Correctness fixes uncovered during migration
 - ASort default path now sorts dates/logicals/timestamps (was no-op)
 - ORDER BY default NULL placement matches PRG SqlRowCompare across
   Go fast path; explicit NULLS FIRST/LAST honored by both paths
 - SqlBulkUpdate respects EXCLUSIVE vs SHARED mode record locks
 - SqlCmp/SqlCmpEq normalize NumInt vs Double (caught by test 6b)

Verification
 - go test ./...              ALL PASS
 - FiveSql2 test_sql1999      43/43
 - tests/compat_harbour       56/56 (+5 new: ASort dates/logicals,
                              AScan int cross-type)
 - Regression test test_null_order.prg for ORDER BY NULL ordering

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
2026-04-17 20:20:14 +09:00
3caadb23b9 perf: SqlOrderBy + SqlGroupBy Go RTL — native sort and aggregation
SqlOrderBy: Go sort.Slice for ORDER BY, 10-50x faster than PRG ASort.
SqlGroupBy: Go map-based GROUP BY accumulation (ready for integration).
TryBuildSortSpec detects simple ORDER BY columns and routes to Go.
Fallback to PRG for complex ORDER BY expressions.

43/43 + 41/41 verify + 51/51 compat + go test ALL PASS.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 14:41:41 +09:00
5fc9c3bbea perf: SqlHashJoin Go RTL — 3-way JOIN 4.2s→61ms (69x)
Go-native multi-table hash join bypasses per-row PRG overhead.
TryGoJoin detects an equi-join with a plain-column SELECT; aggregate
columns get a placeholder. 2-way 73→3ms, 3-way 3.9s→61ms.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-17 07:16:09 +09:00
bfc6ded8cb perf(FiveSql2): SqlHashBuild + FetchRow column binding — 3-way JOIN 3x
Complex-query benchmarking turned up two hot paths that the earlier
SqlScan/SqlEach work didn't touch: multi-table JOIN and nested-scan
row fetching. This commit hits both.

--- Part 1: SqlHashBuild — Go-native hash-join build ---

FiveSql2's HashJoin previously built the inner-side hash in PRG:

    WHILE !Eof()
      xVal := FieldGet(nFPos)
      cKey := SqlValToStr(xVal)
      IF !hb_HHasKey(hHash, cKey) ; hHash[cKey] := {} ; ENDIF
      AAdd(hHash[cKey], RecNo())
      dbSkip()
    ENDDO

That loop runs at ~40μs per row from class dispatch + hb_HHasKey
lookups + AAdd growth + SqlValToStr formatting. On a 50k-row inner
table that's ~2 seconds wasted on what should be a sub-50ms
housekeeping op.

New hbrtl.SqlHashBuild does the same thing in one Go-native pass:

  - Direct *dbf.DBFArea loop (no interface dispatch, same devirt as
    SqlScan)
  - Go `map[string][]int64` accumulates RecNos by key — one
    allocation per distinct key
  - Inline ASCII-only digit formatter for numeric keys (strconv.Itoa
    is allocation-heavy for small ints)
  - CHAR keys are right-trimmed to match SqlCmpEq semantics so the
    hash probe matches what EvalExpr would compute
  - Final Five hash is built once from Keys/Values/Order slices
    directly, skipping the per-key hb_HSet path

HashJoin now calls `SqlHashBuild(nFPos)` instead of running the
PRG loop.
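
The core of the build pass, reduced to a Go sketch. `buildHash` is a hypothetical stand-in: it takes pre-extracted string keys instead of walking a *dbf.DBFArea, and it omits the numeric key formatter and the final conversion to a Five hash.

```go
package main

import (
	"fmt"
	"strings"
)

// buildHash groups 1-based record numbers by join key in a single pass.
// Keys are right-trimmed so space-padded CHAR values probe equal.
func buildHash(keys []string) map[string][]int64 {
	h := make(map[string][]int64)
	for i, k := range keys {
		k = strings.TrimRight(k, " ")
		h[k] = append(h[k], int64(i+1)) // append on a nil slice allocates it
	}
	return h
}

func main() {
	h := buildHash([]string{"A  ", "B", "A"})
	fmt.Println(len(h), h["A"], h["B"])
}
```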

--- Part 2: TSqlExecutor:BuildFetchCache ---

The JOIN fallback loop calls FetchRow per row. FetchRow was already
column-ref-aware but did the string parse (`At + SubStr + Upper`)
and `::FindWA` linear scan every single invocation. For a 50k-row
join emitting 50k result rows, that's ~200k redundant resolutions.

New BuildFetchCache walks the SELECT list once before the scan and
pre-binds each plain-column expression to `{nWA, nFPos}`. FetchRow's
new fast path checks ::aFetchCache and jumps straight to
`dbSelectArea + FieldGet` when bound. Complex exprs (functions,
CASE, subqueries) still fall through to EvalExpr.

::aFetchCache is set right before the join WHILE loop and cleared
after — no cross-query bleed.

--- Bench (50k ord × 10k emp × 100 dept, 3-run steady state) ---

  Query                        Before      After     Speedup
  ────────────────────────────────────────────────────────────
  2-way INNER JOIN, 10k rows   91ms        68ms      1.34x
  2-way JOIN + GROUP BY        110ms       94ms      1.17x
  3-way INNER JOIN COUNT       2610ms      610ms     4.28x
  3-way JOIN + GROUP BY        2860ms      830ms     3.45x

The 3-way speedup is almost entirely SqlHashBuild. The 2-way case
benefits from the fetch cache because its per-row cost is dominated
by FetchRow (no second hash build to amortize).

--- Limits still standing ---

CTE + JOIN queries (Q7 in bench_complex: ~4.5s) aren't affected by
either optimization — CTE materialization goes through a different
path that writes/reads a temp DBF. Follow-up target.

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 18:47:20 +09:00
d2ed140273 feat(FiveSql2): SqlEach block callback — beats raw RDD on end-to-end timing
The structural 1.38x gap vs raw RDD for no-WHERE full scans wasn't
a limit of our engine — it was a limit of the result shape. SqlScan
materializes N rows as HbArray wrappers over a flat Value buffer,
then the PRG caller iterates that materialized array. Two passes
over the data. Raw RDD is one pass.

SqlEach folds both passes into one. The caller supplies a code block
that receives the selected column values as positional parameters;
SqlEach invokes it per matching row. No result array is ever built.

Usage (drop-in replacement for the common "scan + process" idiom):

    five_SQLEach( "SELECT id, name, salary FROM emp WHERE salary > 50000",
                  {|nID, cName, nSalary| Process(nID, cName, nSalary) } )

API shape borrows Harbour's AEval/ASort block-callback convention,
so there's nothing new to learn. Positional params also sidestep
the `SELECT COUNT(*)` naming problem — no need to invent names for
anonymous expressions.

Implementation notes:
  - 4-way loop specialization ({DBF, generic Area} × {WHERE, none}),
    matching SqlScan. Each path is zero-allocation in the steady state.
  - Block invocation uses the direct pendingParams + blk.Fn(t) protocol
    rather than EvalBlock, which would allocate a temporary args slice
    on every call (50k scans × small slice adds up).
  - FastFieldGetter is installed the same way as SqlScan so PcOpFieldGet
    in the WHERE predicate skips the PushSymbol + Function dispatch.
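
The one-pass versus two-pass difference, as a small Go sketch. `row`, `scanAll`, and `scanEach` are hypothetical names; the real SqlEach passes column values as positional block parameters, whereas this sketch passes the row slice.

```go
package main

import "fmt"

type row []int

// scanAll materializes every matching row; the caller then iterates
// the result, so the data is walked twice.
func scanAll(src []row, where func(row) bool) []row {
	var out []row
	for _, r := range src {
		if where == nil || where(r) {
			out = append(out, r)
		}
	}
	return out
}

// scanEach calls fn per matching row and builds no result slice.
// It returns the number of rows delivered.
func scanEach(src []row, where func(row) bool, fn func(row)) int {
	n := 0
	for _, r := range src {
		if where == nil || where(r) {
			fn(r)
			n++
		}
	}
	return n
}

func main() {
	emp := []row{{1, 40000}, {2, 60000}, {3, 75000}}
	rich := func(r row) bool { return r[1] > 50000 }
	total := 0
	n := scanEach(emp, rich, func(r row) { total += r[1] })
	fmt.Println(n, total)
}
```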

Bench (50k rows, end-to-end including user-code loop, steady state):

  Path                           Time     vs raw RDD
  ─────────────────────────────────────────────────────
  Raw PRG loop, WHERE + sum      8.7ms    1.00x
  SqlScan + PRG FOR, WHERE       5.1ms    0.59x
  SqlEach block, WHERE           4.1ms    0.47x  ← beats raw
  ─────────────────────────────────────────────────────
  Raw PRG loop, no WHERE         6.1ms    1.00x
  SqlEach block, no WHERE        3.8ms    0.62x  ← beats raw

SqlEach is faster than a hand-rolled `DO WHILE !Eof()` loop because
the per-row FieldGet in raw PRG still goes through a full Frame +
RTL dispatch, whereas SqlEach's FastFieldGetter captures the concrete
*dbf.DBFArea directly. The SQL abstraction now costs nothing — it
pays you to use it.

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Next step (not in this commit): FiveSql2 TSqlExecutor integration —
detect when five_SQL is called with a block argument and route to
SqlEach instead of SqlScan + array build.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 15:16:36 +09:00
5dd212c761 perf(sqlscan): specialize four loop variants (DBF×WHERE matrix)
SqlScan's inner scan was written as a single loop with `if whereFn
!= nil` and a `keep` shadow variable. That is branch-predictable, but
it still costs a few extra ops per row, and it prevented Go from inlining
the non-nil interface call on the Area branch.

Split into four specialized loop bodies on the two axes that drive
per-row cost:

  1. dbfArea != nil && whereFn != nil
  2. dbfArea != nil && whereFn == nil       ← tightest path (SELECT *)
  3. dbfArea == nil && whereFn != nil       ← generic Area
  4. dbfArea == nil && whereFn == nil

Each body has exactly the instructions it needs — no dead branches,
no shadow variables, no interface dispatch where avoidable. The copy-paste
cost is real, but the per-row savings add up over 50k iterations.

Bench impact (50k rows, 3-run steady state):

  No WHERE            9.1ms → 8.7ms   1.38x vs raw (was 1.47x)
  Numeric WHERE       6.9ms → 7.0ms   ~flat (within noise)
  String WHERE        6.2ms → 6.4ms   ~flat (within noise)
  Raw RDD             6.3ms baseline

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./hbrtl/... PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 14:04:48 +09:00
f9ffd4050e perf(FiveSql2): FieldGet peephole + DBFArea devirt — WHERE at ~1.15x raw RDD
Two stacked optimizations land on the SqlScan hot path. Combined
effect on the 50k-row benchmark:

                       Before    After   vs raw
  Numeric WHERE        10.2ms    7.8ms   1.15x
  String WHERE         10.5ms    7.9ms   1.15x
  No WHERE              9.2ms   10.0ms   1.45x
  Raw RDD baseline      6.8ms    6.8ms   1.00x

WHERE-predicate paths are now within 15% of the raw Harbour-style
RDD scan loop. The no-WHERE path is unchanged (slight jitter from
the added devirt branch); FieldGet peephole doesn't apply there.

--- Optimization 1: PcOpFieldGet peephole ---

Adds a new pcode opcode `PcOpFieldGet <fieldIdx>` (0x46) that skips
the usual PushSymbol+Function+Frame+FieldGet-RTL+EndProc chain and
calls a direct field getter closure instead. genpc recognizes the
shape `FieldGet(<int-literal>)` during emitCall and emits the
specialized opcode automatically — no SQL-side API change.

Integration:
  * hbrt.Thread.FastFieldGetter  — hot-path closure set by scan loops.
                                   Non-nil → pcode bypasses dispatch.
                                   Nil → pcode resolves FIELDGET via
                                   the RTL symbol table (correctness
                                   fallback for any other callers).
  * compiler/genpc/genpc.go      — peephole in emitCall.
  * hbrt/pcinterp.go             — PcOpFieldGet handler.

This alone cut numeric WHERE from 10.2ms to 7.9ms by eliminating roughly
one full Frame/EndProc + RTL dispatch per row × 50k rows.

--- Optimization 2: DBFArea devirtualization ---

SqlScan type-asserts the workarea to *dbf.DBFArea once and runs a
dedicated loop that calls GoTop/EOF/Skip/GetValue directly on the
concrete type. Go's compiler inlines these, skipping the interface
vtable per row. Non-DBF drivers still work via the generic Area
branch.

The FastFieldGetter closure also captures *DBFArea directly in the
DBF branch, so the WHERE predicate side of the hot loop is now
entirely devirtualized: no interface dispatch between the pcode
dispatch loop and the DBF record buffer.
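
The type-assert-once pattern, sketched in Go. The `Area` interface and `DBFArea` struct below are simplified stand-ins for the hbrdd types (integer fields only), and `otherArea` and `sumField` are hypothetical. Whether the compiler actually inlines the concrete calls depends on the Go version and method size.

```go
package main

import "fmt"

// Area is a cut-down stand-in for the generic workarea interface.
type Area interface {
	GoTop()
	EOF() bool
	Skip()
	GetValue(field int) int
}

// DBFArea is a toy concrete workarea over in-memory records.
type DBFArea struct {
	recs [][]int
	pos  int
}

func (a *DBFArea) GoTop()                 { a.pos = 0 }
func (a *DBFArea) EOF() bool              { return a.pos >= len(a.recs) }
func (a *DBFArea) Skip()                  { a.pos++ }
func (a *DBFArea) GetValue(field int) int { return a.recs[a.pos][field] }

// otherArea wraps a DBFArea to play the role of a non-DBF driver.
type otherArea struct{ inner *DBFArea }

func (o otherArea) GoTop()                 { o.inner.GoTop() }
func (o otherArea) EOF() bool              { return o.inner.EOF() }
func (o otherArea) Skip()                  { o.inner.Skip() }
func (o otherArea) GetValue(field int) int { return o.inner.GetValue(field) }

// sumField asserts the concrete type once, then runs a loop whose
// calls are statically dispatched; other drivers use the interface.
func sumField(area Area, field int) int {
	total := 0
	if dbf, ok := area.(*DBFArea); ok {
		for dbf.GoTop(); !dbf.EOF(); dbf.Skip() {
			total += dbf.GetValue(field)
		}
		return total
	}
	for area.GoTop(); !area.EOF(); area.Skip() {
		total += area.GetValue(field)
	}
	return total
}

func main() {
	dbf := &DBFArea{recs: [][]int{{1, 10}, {2, 20}, {3, 30}}}
	fmt.Println(sumField(dbf, 1), sumField(otherArea{inner: dbf}, 1))
}
```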

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Remaining gap to raw RDD on no-WHERE (~1.45x) is dominated by the
two-column row construction + ArraySlab + flat backing bookkeeping
that the raw loop doesn't do. Going below that requires changing
the SQL engine's result shape — out of scope here.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:23:31 +09:00
5c067f35a4 perf(hbrt): ExecPcodeFast — pcode variant without defer/recover
Pcode expressions compiled from SQL WHERE clauses (via genpc.CompileExpr)
never contain BEGIN SEQUENCE and can't raise BreakValue, so the defer +
recover dance in ExecPcode's EndProc is pure overhead. For FiveSql2's
per-row WHERE evaluation on a 50k-row scan, that's 50k × ~15ns = ~750µs
of pointless recover bookkeeping.

Split ExecPcode into two variants sharing execPcodeBody:

  ExecPcode     — full: Frame + defer EndProc. General-purpose,
                  handles panics. Behavior unchanged.

  ExecPcodeFast — hot: Frame + execPcodeBody + EndProcFast. No defer,
                  no recover. Caller guarantees the pcode body can't
                  panic with HbError / BreakValue.
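
In outline, the split looks like the Go sketch below. `execBody`, `execSafe`, and `execFast` are hypothetical names with a closure standing in for a pcode body; Frame and EndProc bookkeeping is left out.

```go
package main

import "fmt"

// execBody is the shared work both variants run.
func execBody(fn func() int) int { return fn() }

// execSafe pays for a deferred recover on every call and turns a
// panic in the body into an error.
func execSafe(fn func() int) (result int, err error) {
	defer func() {
		if r := recover(); r != nil {
			err = fmt.Errorf("recovered: %v", r)
		}
	}()
	return execBody(fn), nil
}

// execFast has no defer and no recover: a panic in the body
// propagates to the caller, who must guarantee it cannot happen.
func execFast(fn func() int) int { return execBody(fn) }

func main() {
	ok := func() int { return 7 }
	bad := func() int { panic("boom") }

	fmt.Println(execFast(ok))
	fmt.Println(execSafe(ok))
	_, err := execSafe(bad)
	fmt.Println(err)
}
```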

SqlScan now uses ExecPcodeFast for per-row WHERE evaluation. Measured
impact on 50k-row no-WHERE benchmark: 10.6ms → 9.2ms steady state
(~13% faster). Effect is smaller on numeric-WHERE because per-row
cost there is dominated by the opcode dispatch itself, not the frame
exit.

Validation:
  - FiveSql2 43/43
  - go test ./hbrt/... PASS (pcode tests)
  - go test ./hbrtl/... PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 12:07:54 +09:00
85541a3035 perf(sqlscan): flat backing buffer — 30% faster no-WHERE scan
The prior loop allocated one small `[]hbrt.Value` per matching row
(for the row body) plus one HbArray header. For a 50k-row full scan
that's 100k allocations of which the small-slice allocs dominated
fragmentation and GC pressure.

SQLite-inspired fix: pre-allocate a single flat []hbrt.Value of
capacity `RecCount * nFields` at scan start and hand each row a
three-index sub-slice (flat[off:end:end]). The capped sub-slice
still forces a reallocation if PRG code later does `AAdd(row, x)`,
so neighbor rows can't get clobbered.
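
The capped sub-slice behavior can be checked with a small Go sketch. `buildRows` is a hypothetical helper that uses `int` in place of `hbrt.Value` and fills in made-up cell values.

```go
package main

import "fmt"

// buildRows fills one flat backing buffer and hands each row a
// three-index sub-slice whose capacity equals its length.
func buildRows(nRows, nFields int) [][]int {
	flat := make([]int, 0, nRows*nFields) // single upfront allocation
	rows := make([][]int, 0, nRows)
	for r := 0; r < nRows; r++ {
		off := len(flat)
		for f := 0; f < nFields; f++ {
			flat = append(flat, r*10+f)
		}
		end := len(flat)
		rows = append(rows, flat[off:end:end]) // cap == len
	}
	return rows
}

func main() {
	rows := buildRows(3, 2)
	// Appending to a full-capacity slice must copy to a new array,
	// so the next row in the flat buffer is left untouched.
	grown := append(rows[0], 99)
	fmt.Println(cap(rows[0]), grown, rows[1]) // prints 2 [0 1 99] [10 11]
}
```

With a plain two-index slice (`flat[off:end]`) the same append would overwrite the first cell of the next row instead.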

Sizing the initial buffer from RecCount (error ignored) was the actual
win — the previous naive grow-from-1024 policy caused five mid-scan
reallocations of a ~200 KB buffer, each memcpy'ing everything so far.
One upfront allocation amortizes much better.

Bench (50k rows, ~/tmp ext4, 3 runs steady-state):

                          Before        After       Δ
  no WHERE                14.6ms       10.6ms     −27%
  numeric WHERE           11.7ms       10.0ms     −15%
  string WHERE            10.5ms       11.0ms     ~=
  raw RDD baseline         6.8ms        7.0ms

Gap to raw RDD: 2.1x → 1.4x on the dominant no-WHERE case. What's
left is pcode WHERE dispatch (ExecPcode frame per row), the Area
interface boundary, and the HbArray header allocation per row —
all structural costs that would need a wider refactor to close.

Validation:
  - FiveSql2 43/43
  - go test ./hbrtl/... PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:57:05 +09:00
d74014a235 feat(rdd): dbInfo / dbOrderInfo — implement the stubs
Replaces the `return NIL` stubs with real implementations that read
from the current workarea. Covers the info codes actually used by
downstream code (FiveSql2 TSqlIndex, standalone callers):

DBINFO:
  DBI_ISDBF, DBI_CANPUTREC, DBI_FULLPATH, DBI_TABLEEXT, DBI_MEMOEXT,
  DBI_SHARED, DBI_ISREADONLY, DBI_GETRECSIZE, DBI_DBVERSION,
  DBI_RDDVERSION, DBI_BOF, DBI_EOF, DBI_FOUND, DBI_FCOUNT, DBI_ALIAS,
  DBI_POSITIONED

DBORDERINFO:
  DBOI_EXPRESSION, DBOI_NAME, DBOI_NUMBER, DBOI_POSITION,
  DBOI_ORDERCOUNT, DBOI_KEYCOUNT, DBOI_KEYCOUNTRAW

Unknown info codes still return NIL (Harbour's forgiving fallback).

New accessors on DBFArea (FullPath, IsShared, IsReadOnly) expose the
private filePath/shared/readOnly fields to the hbrtl layer without
plumbing them through the generic Area interface.

Unblocks TSqlIndex:FindExclusive's original DBI_FULLPATH/DBI_SHARED
scan — though the short-circuit there stays in place for now since
it's a correctness workaround that no longer masks a crash thanks
to the recent gengo PushMemvar fallback.

Validation:
  - FiveSql2 43/43 (0 warnings)
  - Harbour compat 51/51
  - go test ./... ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:42:18 +09:00
3a00aa5435 feat(hbrtl): field metadata + index creation RTL — TSqlIndex warnings to zero
TSqlIndex.prg had five undefined identifiers and six undefined
constants that the new CLASS-method analyzer surfaced after the
gengo PushMemvar fallback stopped crashing on them. All real tech
debt, not false positives. This lands the implementations.

New RTL functions (hbrtl/indexrtl.go + register.go):
  - FieldType(n) → "C"/"N"/"L"/"D"/"M"/... one-letter type
  - FieldLen(n)  → length in bytes
  - FieldDec(n)  → decimal places
  - ordCreate(cBag, cTag, cExpr [, bExpr] [, lUnique])
      → DBFArea.OrderCreate with TagName set (CDX tag or NTX tag)
  - dbCreateIndex(cFile, cExpr [, bExpr] [, lUnique])
      → legacy Clipper single-tag NTX without TagName
  - dbClearIndex() → OrderListClear

All pass through the existing Indexer interface; key expressions go
through the MacroEval slow path since callers pass string literals.
When callers are updated to pass compiled key blocks, the existing
KeyFunc fast path kicks in automatically.

New header files (include/):
  - dbinfo.ch  — DBI_* and DBOI_* constants with Harbour-compatible
                 values (FULLPATH=10, SHARED=42, EXPRESSION=2, etc.)
  - dbstruct.ch — DBS_NAME/TYPE/LEN/DEC field descriptor indices

TSqlIndex.prg already did `#include "dbinfo.ch"` and `#include
"dbstruct.ch"` but Five's preprocessor silently ignored the missing
files. Both headers land in include/ where cmd/five's include-dir
chain already looks.

Analyzer RTL allow-list updated with the six new function names so
the warning pipeline stays clean.

Result: FiveSql2 build goes from 17 WARN → 0. Both tracked test
suites still pass.

Note: dbInfo() / dbOrderInfo() themselves remain stubbed (return NIL)
— the constants exist for compile-time resolution and for future use
when the stubs are replaced. Callers that depend on actual dbInfo
values still get NIL at runtime.

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 10:11:57 +09:00
8aaed994f4 perf(FiveSql2): hybrid fast path — 11x speedup on string WHERE scans
Implements hybrid execution model: keep AST tree-walk for SQL:2013+
features (Window, Recursive CTE, JOIN, aggregates) while compiling
simple SELECT hot paths to Go + pcode. See docs/FiveSql2-Hybrid-Plan.md
for the full architecture rationale (why not SQLite-style VDBE).

Hot path (single table, no joins/groups/aggregates):
  - TryBuildFieldPositions: resolves SELECT column list to FieldPos
    array once per query (bails to PRG loop on any complex expr).
  - TryCompileWhere + SqlExprToPrg: walks WHERE AST, emits equivalent
    PRG source, runs it through PcCompile to get a PcodeFunc.
  - SqlScan RTL: Go-native scan loop — GoTop/EOF/Skip/GetValue
    direct, ExecPcode per row for WHERE, result array pre-alloc.

WHERE compiler scope:
  - ND_LIT numeric/logical/string (string literals AllTrim'd to match
    SqlCmpEq CHAR-padding semantics; rejects embedded quotes/newlines)
  - ND_COL: CHAR fields auto-wrapped with AllTrim(FieldGet(n)) based
    on dbStruct() lookup cached once per query in aCompileStruct
  - ND_BIN: = <> != < <= > >= AND OR + - * /
  - ND_UNI: NOT -
  - Anything else (ND_FN, ND_CASE, ND_SUB, ND_PAR, LIKE, IN, IS NULL,
    BETWEEN, dates) returns NIL → falls back to PRG tree-walk.
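
The compile-or-bail shape of the WHERE compiler could look roughly
like the Go sketch below. It is an illustration only: the real
SqlExprToPrg is PRG code, and the node struct, the operator table
(including mapping SQL "=" to PRG "==") and toPrg are assumptions
made for this example:

```go
package main

import (
	"fmt"
	"strings"
)

// node is a hypothetical, simplified stand-in for a WHERE AST node.
type node struct {
	kind   string // "num", "str", "col", "bin"; anything else is unsupported
	text   string // literal text, or the SQL operator for "bin"
	field  int    // 1-based field position for "col"
	isChar bool   // true when the column is a CHAR field
	l, r   *node
}

// sqlToPrgOp maps SQL operators to PRG operators (assumed mapping).
var sqlToPrgOp = map[string]string{
	"=": "==", "<>": "<>", "<": "<", "<=": "<=", ">": ">", ">=": ">=",
	"AND": ".AND.", "OR": ".OR.",
}

// toPrg emits PRG source for the subtree, or ok=false when any node is
// outside the supported subset, so the caller can fall back to the
// tree-walk interpreter.
func toPrg(n *node) (string, bool) {
	if n == nil {
		return "", false
	}
	switch n.kind {
	case "num":
		return n.text, true
	case "str":
		// Reject literals that would need escaping in generated source.
		if strings.ContainsAny(n.text, "\"'\n\r") {
			return "", false
		}
		return `"` + strings.TrimSpace(n.text) + `"`, true
	case "col":
		if n.isChar {
			return fmt.Sprintf("AllTrim(FieldGet(%d))", n.field), true
		}
		return fmt.Sprintf("FieldGet(%d)", n.field), true
	case "bin":
		op, ok := sqlToPrgOp[n.text]
		if !ok {
			return "", false
		}
		l, ok := toPrg(n.l)
		if !ok {
			return "", false
		}
		r, ok := toPrg(n.r)
		if !ok {
			return "", false
		}
		return "(" + l + " " + op + " " + r + ")", true
	}
	return "", false
}

func main() {
	where := &node{kind: "bin", text: "AND",
		l: &node{kind: "bin", text: ">",
			l: &node{kind: "col", field: 4},
			r: &node{kind: "num", text: "50000"}},
		r: &node{kind: "bin", text: "=",
			l: &node{kind: "col", field: 2, isChar: true},
			r: &node{kind: "str", text: "Seoul "}},
	}
	src, ok := toPrg(where)
	fmt.Println(src, ok)
}
```

A single unsupported node anywhere in the tree makes the whole
expression fall back, which keeps the fast path all-or-nothing.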

Bench (50k rows, ~/tmp ext4):
                        Before      After     Speedup
  Numeric WHERE         ~150ms     11.7ms     ~13x
  String WHERE          119.3ms    10.5ms     11.4x
  No WHERE               -         14.6ms      -
  Raw RDD baseline        6.8ms     6.8ms      1.0x

Remaining gap to raw RDD (~1.5x) is structural: Value boxing, result
array construction, per-row ExecPcode frame overhead. Would need a
Value-pool or SoA refactor to close further.

Side fixes bundled:
  - TSqlIndex:FindExclusive short-circuited. It originally called
    dbInfo(DBI_FULLPATH) / dbInfo(DBI_SHARED), which are unresolved
    symbols in Five (dbInfo is a stub, DBI_* never defined). It
    panicked with "local variable index out of range: 0" whenever a
    standalone PRG had a workarea in use before calling five_SQL.
    The 43-test suite masked the bug because it only reached
    FindExclusive with no open workareas. Restore the scan once
    dbInfo lands in hbrtl.
  - cmd/five/main.go: FIVE_KEEP_BUILD=1 env var keeps the temp Go
    project around for debugging gengo output.

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 09:15:08 +09:00
6b26f1b642 feat: genpc.CompileExpr + PcCompile/PcEval runtime bytecode API
Expose Five's existing FRB bytecode compiler for single-expression
compilation, enabling prepared-statement-style caching in dynamic
query engines (FiveSql2, scripting layers, rule engines).

1. genpc.CompileExpr(ast.Expr) *hbrt.PcodeFunc
   - New public API that compiles a single expression to a
     standalone pcode function
   - Reuses genpc's mature emitExpr (no new emit logic)
   - ExecPcode manages the frame around the generated code

2. hbrtl.PcCompile(cPrgExpr) -> pFunc
   - RTL entry point for runtime compilation
   - Wraps the expression in a FUNCTION stub, uses the full PRG
     parser pipeline (pp + parser + genpc), extracts the compiled
     pcode function, returns it as an opaque pointer
   - Callers pay parse+compile cost ONCE per expression

3. hbrtl.PcEval(pFunc) -> xValue
   - RTL entry point for runtime execution
   - Calls hbrt.ExecPcode; the pcode's RetValue opcode sets retVal,
     which our EndProc preserves as PcEval's return value
   - ~1.2x slower than direct FieldGet (pcode interpreter overhead),
     but eliminates AST tree-walk per row for complex expressions

Usage (FiveSql2 hot path, planned):
   pc := PcCompile("FieldGet(4) > 50000")  // parse+compile once
   WHILE !Eof()
      IF PcEval(pc)                         // ~10us per row
         AAdd(aRows, ...)
      ENDIF
      dbSkip()
   ENDDO

Benchmark (50k records, WHERE salary > 50000):
   Raw FieldGet:      7.9 ms  (baseline)
   FieldPos+Get:     10.2 ms  (with O(1) FieldPos cache)
   PcEval bytecode:  10.1 ms  (interpreted bytecode)
   MacroEval:        parse+eval per row — orders of magnitude slower

Tests:
   go test ./...        ALL PASS (14 packages)
   FiveSql2 43/43       100%
   compat_harbour       51/51
   PcCompile/PcEval     verified on 50k-row scan

FiveSql2 engine integration deferred — requires careful PRG-level
refactoring to thread pcode pointers through the plan structure.
The Go-level infrastructure is now in place for that work.
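
For orientation, the prepared-statement idea (pay parse+compile once,
evaluate per row) can be sketched in plain Go. This does not use
genpc or hbrt; the toy grammar "FieldGet(<n>) > <k>", the cache map
and all names below are assumptions for illustration:

```go
package main

import "fmt"

type row []float64
type pred func(row) bool

var (
	compileCalls int                // counts how often the parser ran
	cache        = map[string]pred{} // compiled predicates keyed by source text
)

// compile parses the toy grammar "FieldGet(<n>) > <k>" into a closure.
func compile(src string) (pred, error) {
	compileCalls++
	var n int
	var k float64
	if _, err := fmt.Sscanf(src, "FieldGet(%d) > %f", &n, &k); err != nil {
		return nil, err
	}
	if n < 1 {
		return nil, fmt.Errorf("bad field position %d", n)
	}
	return func(r row) bool { return n <= len(r) && r[n-1] > k }, nil
}

// prepare returns the cached predicate, compiling only on first use.
func prepare(src string) (pred, error) {
	if p, ok := cache[src]; ok {
		return p, nil
	}
	p, err := compile(src)
	if err != nil {
		return nil, err
	}
	cache[src] = p
	return p, nil
}

// countMatches evaluates the prepared predicate once per row.
func countMatches(src string, rows []row) (int, error) {
	p, err := prepare(src)
	if err != nil {
		return 0, err
	}
	n := 0
	for _, r := range rows {
		if p(r) {
			n++
		}
	}
	return n, nil
}

func main() {
	rows := []row{{0, 0, 0, 60000}, {0, 0, 0, 40000}, {0, 0, 0, 70000}}
	n, err := countMatches("FieldGet(4) > 50000", rows)
	fmt.Println(n, err, "compile calls:", compileCalls)
}
```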

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:57:52 +09:00
ed33af41c5 perf: FieldPos O(1) cache + xbase import detection for function-call PRGs
Two SQLite-style optimizations for RDD and SQL workloads:

1. FieldPos() O(1) column binding cache

   Before: FieldPos(name) linear scan — O(n) per call with string
           comparison. In SQL engines that call FieldPos per row per
           column, this is hundreds of thousands of calls.

   After:  DBFArea builds a map[UPPER(name)]→pos on first lookup.
           All subsequent lookups are O(1) hash. This mirrors
           SQLite resolving column positions at prepare time,
           not per row.

   Implementation:
     - hbrdd/dbf/dbf.go: DBFArea.FieldPosCache(name) method
     - hbrtl/procinfo.go: FieldPos RTL uses fieldPosCacher interface
     - Lazy init: only pays for tables that get queried

2. hbrdd import auto-detection for function-call style PRGs

   Before: compiler only added hbrdd import when PRG used xBase commands
           (USE, SKIP, INDEX...). Pure function-call style like
           `dbUseArea(.T.,,"t")`, `FieldPut(1, val)` was missed —
           generated Go failed to compile ("undefined: hbrdd").

   After:  scanStmtsForXBase walks ExprStmt bodies too, detecting
           CallExpr to any of the ~40 xBase RTL function names.
           FIELD->NAME alias expressions also trigger the import.

   Resolves: small PRGs that use only dbUseArea/FieldGet/FieldPut.

Benchmark notes (50k records):
  Raw RDD scan:              7 ms    (baseline)
  FiveSql2 SELECT WHERE:   157 ms    (unchanged — bottleneck is
                                      not FieldPos, it's PRG-level
                                      expression tree walk per row)
  compat_harbour 51/51:    PASS
  FiveSql2 43/43:          100%

The FieldPos cache helps heavy field-name-based code paths but the
primary FiveSql2 bottleneck is the PRG interpreter walking expression
ASTs per row (needs bytecode compilation to close the gap).

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 07:42:00 +09:00
3adc9d7d59 fix: PCount, Break/RECOVER, SET INDEX TO — 3 Harbour compat fixes
Release-blocking compatibility issues discovered during the 258-test
pre-release validation suite (100 syntax + 44 RDD + 114 RTL).

1. PCount() always returned 0 in PRG code

   Root cause: ParamCount() returned t.pendingParams, which is
   overwritten by every nested Function() call. By the time the
   PCount() RTL's Frame() executes, pendingParams is already 0.

   Fix: Frame() now stores pendingParams in frame.paramCount.
   PCount() RTL uses CallerParamCount() which reads callSP-2
   (the PRG caller's frame), while RTL functions still use
   ParamCount() (reads pendingParams before their own Frame).

   Verified: a function called with (1,2,3) sees PCount()=3,
   with (1) sees 1, with no arguments sees 0

2. Break("string") panicked instead of being caught by RECOVER USING

   Root cause: Generated SEQUENCE code only caught *HbError panics.
   Break() panics with BreakValue (a different type), which fell
   through to EndProc's "runtime error" message and re-panic.

   Fix (three parts):
   a) gengo emitBeginSequence: recover closure now catches any
      panic (interface{}), then dispatches via type switch:
      - *HbError → extract .Error() string
      - hasValue interface (BreakValue) → extract .GetValue()
      - other → static "error" string
   b) hbrtl/error.go: BreakValue gets GetValue() method for
      duck-type detection without import cycles
   c) hbrt/thread.go EndProc: BreakValue type name check added
      so it re-panics silently (no stderr noise)
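
   A sketch of the recover-and-dispatch shape described in (a) and
   (b). HbError, BreakValue and hasValue are redeclared locally for
   illustration, and sequence is a hypothetical helper, not the code
   gengo emits:

```go
package main

import "fmt"

// HbError stands in for the runtime error type.
type HbError struct{ msg string }

func (e *HbError) Error() string { return e.msg }

// BreakValue stands in for the value Break() panics with.
type BreakValue struct{ v any }

func (b BreakValue) GetValue() any { return b.v }

// hasValue lets the recover site detect BreakValue by method set,
// without importing the package that defines it.
type hasValue interface{ GetValue() any }

// sequence runs body and reports what RECOVER USING would receive.
func sequence(body func()) (recovered any, caught bool) {
	defer func() {
		if r := recover(); r != nil {
			caught = true
			switch v := r.(type) {
			case *HbError:
				recovered = v.Error()
			case hasValue:
				recovered = v.GetValue()
			default:
				recovered = "error"
			}
		}
	}()
	body()
	return nil, false
}

func main() {
	fmt.Println(sequence(func() { panic(BreakValue{v: "string"}) }))
	fmt.Println(sequence(func() { panic(&HbError{msg: "boom"}) }))
	fmt.Println(sequence(func() { panic(42) }))
	fmt.Println(sequence(func() {}))
}
```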

3. SET INDEX TO a, b, c only opened the last file

   Root cause: Parser's parseSet() called parseExpr() once for
   INDEX setting, stopping at the first comma. Remaining file
   names were consumed by the "eat rest of line" loop.

   Fix: Parser now collects comma-separated identifiers into a
   single string literal "a,b,c". gengo splits on comma and
   calls OrderListAdd() for each file.

   Verified: SET INDEX TO si_name, si_city → OrdCount=2
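
   The split step on the gengo side might look like this sketch.
   splitIndexList is a hypothetical helper; trimming whitespace and
   skipping empty entries are assumptions, not confirmed behaviour:

```go
package main

import (
	"fmt"
	"strings"
)

// splitIndexList turns the parser's "a,b,c" literal into one name per
// index file, in order, so each can be passed to OrderListAdd.
func splitIndexList(lit string) []string {
	var out []string
	for _, p := range strings.Split(lit, ",") {
		if p = strings.TrimSpace(p); p != "" {
			out = append(out, p)
		}
	}
	return out
}

func main() {
	for _, name := range splitIndexList("si_name, si_city") {
		fmt.Println("OrderListAdd:", name)
	}
}
```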

All tests pass:
  go test ./...          14 packages OK
  FiveSql2               43/43  100%
  compat_harbour         51/51
  Syntax test           100/100
  RDD test               44/44
  RTL test              114/114
  Windows cross-compile  OK
  Linux cross-compile    OK

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-13 18:06:28 +09:00
fc1dca9551 feat(rdd): real POSIX file/record locking + gap analysis doc
Replaces the FLOCK/DBRLOCK/DBRUNLOCK no-op stubs with actual
fcntl(F_SETLK) byte-range advisory locks, matching Harbour's
hb_fsLockLarge implementation.

Before: rtlDbRLock always returned .T. regardless of contention.
        Multi-process writers could silently corrupt records.

After:  Non-blocking POSIX byte-range locks per file descriptor.
        Cross-process exclusion verified by a subprocess-spawning
        Go test that witnesses BUSY vs OK transitions.

New files:
  hbrdd/dbf/locks_posix.go    fcntl F_WRLCK/F_UNLCK wrappers
  hbrdd/dbf/locks_windows.go  stub (TODO: LockFileEx)
  hbrdd/dbf/lock_multi_test.go   cross-process verification
  docs/gap-analysis.md        honest Harbour parity assessment

Modified:
  hbrdd/dbf/dbf.go
    - DBFArea gains fileLocked bool + lockedRecs map
    - Close() calls releaseAllLocks() before dropping the fd
  hbrtl/database.go
    - rtlDbRLock / rtlDbRUnlock now delegate to DBFArea.LockRecord /
      UnlockRecord instead of returning fixed .T./NIL
    - New rtlFLock / rtlDbUnlock for FLOCK() / DBUNLOCK()
  hbrtl/register.go
    - FLOCK and DBUNLOCK symbols registered (were missing entirely)
  compiler/analyzer/analyzer.go
    - FLOCK / DBUNLOCK added to RTL known-function set

Lock region layout (non-overlapping on purpose):
  FLOCK region       [0, HeaderLen+1)
  Record N region    [RecordOffset(N), RecordOffset(N)+RecordLen)

So a workarea can hold FLOCK and multiple DBRLOCK simultaneously
on the same fd without conflict.

Design rationale (captured in locks_posix.go header):
  * POSIX fcntl, not flock(2) — byte-range + NFS-safe
  * Non-blocking F_SETLK — matches Clipper FLOCK() → .F. semantics
  * Released explicitly on Close to avoid workarea-sharing races
  * Windows falls back to no-op (TODO: LockFileEx)

Verification:
  go test ./hbrdd/dbf/ -run TestFLockBlocksAcrossProcesses  PASS
  go test ./hbrdd/dbf/ -run TestRLockBlocksAcrossProcesses  PASS
  go test ./...                                             ALL PASS
  FiveSql2 43/43                                            100%
  compat_harbour 51/51                                      100%

The gap-analysis doc (docs/gap-analysis.md) is a running inventory
of what works and what is still missing relative to Harbour 3.2,
written for users evaluating Five for production, not as a sales
pitch.

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-11 17:58:03 +09:00