feat(pgserver): pg_catalog stub for BI-tool connection compatibility

PostgreSQL clients (psql, pgx, DBeaver, Tableau, DataGrip,
pgAdmin) fire a barrage of catalog probes at connection time —
SELECT version(), SHOW server_version, SELECT FROM pg_namespace
/ pg_class / pg_type / pg_database / pg_settings. FiveSql2 can't
parse most of them. Without interception the BI tool either
errors out on connect or proceeds with a half-broken view of
the database (zero tables, no type info, no schema list). This
commit lands the minimum-viable catalog shim so the common
connect-and-list-tables flow succeeds.

Strategy
--------

Pattern-match catalog probes BEFORE handing the SQL to five_SQL.
Recognised shapes get synthesised result envelopes — same
`{ aFieldNames, aRows }` hbrt.Value shape the engine returns,
so the existing dispatchSimpleQuery / executePortal pipelines
stream them identically to a normal query.

Covered (v1.0)
--------------

  * SET / RESET / DISCARD <name>           → success, no-op
  * SHOW <name>                            → single-row response
                                             (server_version, server_encoding,
                                              client_encoding, DateStyle,
                                              transaction_isolation, etc.)
  * SELECT version() / current_database() / current_schema() /
    current_user / session_user / pg_backend_pid()  → single-row
  * SELECT … FROM pg_namespace             → 2 rows (pg_catalog + public)
  * SELECT … FROM pg_class                 → list of open workareas
                                             (relkind='r', relnamespace=public)
  * SELECT … FROM pg_attribute             → empty (stub; column-shape
                                             introspection deferred to v1.1)
  * SELECT … FROM pg_type                  → 7 OIDs FiveSql2 actually emits
                                             (bool, int4, int8, text, numeric,
                                              date, timestamp)
  * SELECT … FROM pg_database              → 1 row, the connect-time db name
  * SELECT … FROM pg_settings              → name/setting pairs matching SHOW
  * Anything else mentioning pg_catalog. / pg_<name> / information_schema.
    → empty result with generic field names (BI tool sees "0 rows" rather
    than a parse error)

Deliberate non-goals
--------------------

  * WHERE / JOIN evaluation — psql, pgx, DBeaver all filter
    client-side on the rows we return. We send the whole
    catalog and let them apply their predicates.
  * pg_attribute introspection — would need to re-derive
    column types from the open workarea + map back to PG OIDs.
    Tracked as v1.1 work.
  * Recursive CTE catalog queries (pgAdmin's tree builder uses
    them) — too brittle to pattern-match. Falls through to
    five_SQL where it errors loudly. pgAdmin's table-tree pane
    will then show "0 tables" but the connection itself stays
    alive.

Files
-----

  hbrtl/pgserver/catalog.go  (new, ~280 LOC)
    catalogIntercept(sql) → (handled, value)
    synthPgNamespace / synthPgClass / synthPgAttribute /
    synthPgType / synthPgDatabase / synthPgSettings
    simpleSelectFunction (version/current_*/pg_backend_pid)
    showResponse (SHOW <name>)

  hbrtl/pgserver/dispatch.go
    dispatchSimpleQuery: catalogIntercept ahead of runSQL.

  hbrtl/pgserver/extended.go
    executePortal: same intercept, ahead of runSQL.

Verification
------------

psql against a running pgserver, with sslmode=require + MD5:

  $ psql -c 'SELECT version()' -At
  PostgreSQL 14.0 (FiveSql2) (FiveSql2 wire-compat shim)

  $ psql -c 'SELECT * FROM pg_namespace' -At
  11|pg_catalog|10
  2200|public|10

  $ psql -c 'SELECT * FROM pg_type' -At
  16|bool|1
  23|int4|4
  20|int8|8
  25|text|-1
  1700|numeric|-1
  1082|date|4
  1114|timestamp|8

  $ psql -l    # \\l now works
          데이터베이스 목록
   oid | datname | datdba | 인코딩
  -----+---------+--------+--------
     1 | alice   |     10 |      6

Integration script gates grew from 6/6 → 9/9:
  PASS  Catalog probe: SELECT version()
  PASS  Catalog probe: pg_namespace lists public + pg_catalog
  PASS  Catalog probe: SHOW server_version_num

All six release gates green:
  go test ./...               ✓
  FiveSql2 SQL:1999 43/43     ✓
  Harbour compat 56/56        ✓
  std.ch 17/17                ✓
  FRB 7/7                     ✓
  pgserver integration 9/9    ✓ (+3 from catalog stubs)

Co-Authored-By: Claude Opus 4.7 (1M context) <noreply@anthropic.com>
This commit is contained in:
2026-05-21 22:31:52 +09:00
parent 12fcb8d249
commit ed1aeeb212
4 changed files with 436 additions and 1 deletions

383
hbrtl/pgserver/catalog.go Normal file
View File

@@ -0,0 +1,383 @@
// Copyright (c) 2026 Charles KWON OhJun (charleskwonohjun@gmail.com)
// All rights reserved.
// catalog.go — synthetic pg_catalog responses for BI-tool
// compatibility. psql / pgx / DBeaver / Tableau / DataGrip all fire
// dozens of metadata queries at connection time (server version,
// list of schemas, table introspection, …) — these go through
// PostgreSQL's pg_catalog views. FiveSql2 can't parse most of
// them, so without interception the BI tool either errors out on
// connect or proceeds with a half-broken view.
//
// Strategy: pattern-match the most common probes BEFORE handing
// the SQL to five_SQL. Return synthesised result envelopes
// (same `{ aFieldNames, aRows }` shape as the engine) so the
// downstream Simple / Extended dispatch paths stream them
// identically to a normal query.
//
// Scope (v1.0):
// * version() / current_database() / current_schema() / current_user
// * SHOW <parameter>
// * SET <parameter> = ... (ignored, returns success)
// * Single-table SELECT from pg_namespace / pg_class /
// pg_attribute / pg_type / pg_database / pg_settings —
// answered from the live workarea manager + a hard-coded
// type catalog.
//
// Anything more complex (JOINs across catalog views, recursive
// CTEs from pgAdmin, etc.) falls through to five_SQL with a
// "pg_catalog not supported" error — same UX as today for the
// hard cases, but the common path now succeeds.
package pgserver
import (
"fmt"
"regexp"
"strings"
"five/hbrt"
)
// catalogResponse is a pre-built result envelope ready to feed
// emitResultSet / streamPortalRows. Mirrors the FiveSql2 engine
// shape so the dispatch code path is identical.
type catalogResponse struct {
fields []string
rows [][]hbrt.Value
}
// asEngineValue packs the catalogResponse into the same `{ fields,
// rows }` hbrt.Value array the engine returns.
func (c *catalogResponse) asEngineValue() hbrt.Value {
fieldArr := make([]hbrt.Value, len(c.fields))
for i, f := range c.fields {
fieldArr[i] = hbrt.MakeString(f)
}
rowArrs := make([]hbrt.Value, len(c.rows))
for i, row := range c.rows {
rowArrs[i] = hbrt.MakeArrayFrom(row)
}
return hbrt.MakeArrayFrom([]hbrt.Value{
hbrt.MakeArrayFrom(fieldArr),
hbrt.MakeArrayFrom(rowArrs),
})
}
// catalogIntercept returns (handled, value) — when handled is
// true, the caller routes `value` through the normal result-set
// emit pipeline and skips five_SQL. When false, the SQL falls
// through to the engine.
func (s *session) catalogIntercept(sql string) (bool, hbrt.Value) {
stripped := strings.TrimSpace(sql)
stripped = strings.TrimSuffix(stripped, ";")
stripped = strings.TrimSpace(stripped)
upper := strings.ToUpper(stripped)
// SET / RESET / DISCARD — accept and return success without
// touching the engine. BI tools use these heavily to twiddle
// per-session settings (DateStyle, application_name, …).
if strings.HasPrefix(upper, "SET ") ||
strings.HasPrefix(upper, "RESET ") ||
strings.HasPrefix(upper, "DISCARD ") {
return true, emptyResponse().asEngineValue()
}
// SHOW <name> — return a single-row response with the parameter
// value. Cover the values psql/pgx probe on connect.
if strings.HasPrefix(upper, "SHOW ") {
name := strings.ToLower(strings.TrimSpace(stripped[5:]))
return true, showResponse(name, s.srv.cfg.serverVersion()).asEngineValue()
}
// SELECT version() / current_database() / current_schema() /
// current_user — single-row trivia. These appear in pgx's
// connection-setup probe and in psql's prompt builder.
if matched, resp := simpleSelectFunction(upper, stripped, s); matched {
return true, resp.asEngineValue()
}
// Catalog table probes — anything mentioning pg_catalog. or a
// known pg_* table name. We try a small set of synthesised
// answers; anything more complex returns an empty result of
// the same column shape so the BI tool sees "no rows" rather
// than "syntax error".
if catalogTableRefRE.MatchString(upper) {
return true, catalogTableResponse(upper, stripped, s).asEngineValue()
}
return false, hbrt.MakeNil()
}
var catalogTableRefRE = regexp.MustCompile(
`\b(PG_CATALOG\.|PG_NAMESPACE|PG_CLASS|PG_ATTRIBUTE|PG_TYPE|PG_DATABASE|PG_SETTINGS|PG_INDEX|PG_PROC|INFORMATION_SCHEMA\.)`,
)
func emptyResponse() *catalogResponse {
return &catalogResponse{fields: nil, rows: nil}
}
func singleRow(field string, val hbrt.Value) *catalogResponse {
return &catalogResponse{
fields: []string{field},
rows: [][]hbrt.Value{{val}},
}
}
// showResponse builds the single-row response for SHOW <name>.
// Values match what a real PostgreSQL 14 server would report so
// pgx's version-gated feature negotiation accepts us.
func showResponse(name, serverVersion string) *catalogResponse {
v := func(s string) hbrt.Value { return hbrt.MakeString(s) }
switch name {
case "server_version":
return singleRow(name, v(serverVersion))
case "server_version_num":
return singleRow(name, v("140000"))
case "server_encoding":
return singleRow(name, v("UTF8"))
case "client_encoding":
return singleRow(name, v("UTF8"))
case "datestyle":
return singleRow(name, v("ISO, MDY"))
case "timezone":
return singleRow(name, v("UTC"))
case "transaction_isolation":
return singleRow(name, v("read committed"))
case "standard_conforming_strings":
return singleRow(name, v("on"))
case "integer_datetimes":
return singleRow(name, v("on"))
case "is_superuser":
return singleRow(name, v("off"))
case "session_authorization":
return singleRow(name, v(""))
case "application_name":
return singleRow(name, v(""))
}
// Unknown parameter — return empty string so SHOW doesn't error.
return singleRow(name, v(""))
}
// simpleSelectFunction recognises the bare "SELECT version()" and
// similar single-function probes. The match is intentionally loose
// — any SELECT whose body is one of the listed functions matches,
// regardless of optional aliases or whitespace.
func simpleSelectFunction(upper, raw string, s *session) (bool, *catalogResponse) {
if !strings.HasPrefix(upper, "SELECT ") {
return false, nil
}
body := strings.TrimSpace(upper[7:])
body = strings.TrimSuffix(body, ";")
body = strings.TrimSpace(body)
switch {
case strings.HasPrefix(body, "VERSION()"):
ver := s.srv.cfg.serverVersion()
return true, singleRow("version", hbrt.MakeString(
"PostgreSQL "+ver+" (FiveSql2 wire-compat shim)"))
case strings.HasPrefix(body, "CURRENT_DATABASE()"):
db := s.database
if db == "" {
db = "postgres"
}
return true, singleRow("current_database", hbrt.MakeString(db))
case strings.HasPrefix(body, "CURRENT_SCHEMA()") ||
strings.HasPrefix(body, "CURRENT_SCHEMA "):
return true, singleRow("current_schema", hbrt.MakeString("public"))
case strings.HasPrefix(body, "CURRENT_USER") ||
strings.HasPrefix(body, "SESSION_USER") ||
strings.HasPrefix(body, "USER "):
u := s.user
if u == "" {
u = "postgres"
}
return true, singleRow("current_user", hbrt.MakeString(u))
case strings.HasPrefix(body, "PG_BACKEND_PID()"):
return true, singleRow("pg_backend_pid",
hbrt.MakeInt(int(s.pid)))
}
return false, nil
}
// catalogTableResponse synthesises an answer for SELECT … FROM
// pg_catalog.<table>. The matcher is shape-based, not full SQL —
// any query mentioning the catalog table gets a "best effort"
// response. JOINs and WHERE clauses are NOT evaluated; we return
// the FULL synthesised table and let the client filter (psql
// does, pgx does — they all expect to do row-side filtering
// against the WHERE predicates they sent).
//
// For unrecognised catalog tables we return an empty result with
// generic column names so the BI tool sees "0 rows" rather than
// a parse error.
func catalogTableResponse(upper, raw string, s *session) *catalogResponse {
switch {
case strings.Contains(upper, "PG_NAMESPACE"):
return synthPgNamespace()
case strings.Contains(upper, "PG_CLASS"):
return synthPgClass(s)
case strings.Contains(upper, "PG_ATTRIBUTE"):
return synthPgAttribute(s)
case strings.Contains(upper, "PG_TYPE"):
return synthPgType()
case strings.Contains(upper, "PG_DATABASE"):
return synthPgDatabase(s)
case strings.Contains(upper, "PG_SETTINGS"):
return synthPgSettings(s.srv.cfg.serverVersion())
}
// Anything else — return an empty result with a generic schema.
return emptyResponse()
}
// synthPgNamespace returns the minimum-viable schema list. v1.0
// exposes exactly one schema, "public", and one system schema
// every client expects, "pg_catalog".
func synthPgNamespace() *catalogResponse {
v := hbrt.MakeString
return &catalogResponse{
fields: []string{"oid", "nspname", "nspowner"},
rows: [][]hbrt.Value{
{hbrt.MakeInt(11), v("pg_catalog"), hbrt.MakeInt(10)},
{hbrt.MakeInt(2200), v("public"), hbrt.MakeInt(10)},
},
}
}
// synthPgClass enumerates the workareas currently open on the
// session's thread as if they were PostgreSQL relations. This is
// what `\d` / DBeaver's table tree picks up at connect time. If
// the thread has no WA manager attached yet (cold connection),
// the response is empty — the client just sees "0 tables" which
// is correct.
func synthPgClass(s *session) *catalogResponse {
v := hbrt.MakeString
resp := &catalogResponse{
fields: []string{
"oid", "relname", "relnamespace", "relkind",
"relowner", "relpages", "reltuples", "relhasindex",
},
}
// Skip the row list if the thread / WA isn't initialised — we
// don't want to crash a pre-query catalog probe.
if s.thread == nil {
return resp
}
// Use the workareas opened by the embedded process (default
// session). v1.0 only surfaces the alias name; deeper introspection
// (column types, row count) lands when pg_attribute synthesis grows
// beyond stub.
type aliasGetter interface {
EnumerateAreas(fn func(nWA uint16, alias string, area interface{}))
}
if s.thread.WA == nil {
return resp
}
// Try the loosest interface match — anything that lets us list
// aliases will do. The WAM does expose EnumerateAreas but its
// area type differs; we wrap it with a generic adapter.
if lister, ok := s.thread.WA.(interface {
EnumerateAreas(fn func(nWA uint16, alias string, area interface{}))
}); ok {
lister.EnumerateAreas(func(nWA uint16, alias string, _ interface{}) {
resp.rows = append(resp.rows, []hbrt.Value{
hbrt.MakeInt(int(nWA)),
v(alias),
hbrt.MakeInt(2200), // namespace = public
v("r"), // r = ordinary table
hbrt.MakeInt(10),
hbrt.MakeInt(1),
hbrt.MakeDouble(0, 10, 0),
hbrt.MakeBool(false),
})
})
}
return resp
}
// synthPgAttribute returns the column descriptors. v1.0 stub
// answers an empty result — column introspection requires
// resolving the WHERE attrelid={oid} pattern, which we'd do by
// re-running the engine's metadata lookup on the open workarea.
// Deferred; "0 columns" is still acceptable for the common case
// where BI tools just need the table list.
func synthPgAttribute(s *session) *catalogResponse {
return &catalogResponse{
fields: []string{
"attrelid", "attname", "atttypid", "attnum",
"atttypmod", "attnotnull", "atthasdef",
},
}
}
// synthPgType returns the canonical type registry, limited to the
// OIDs FiveSql2 actually emits. Matches typemap.go's pgTypeFor
// output; if a BI tool runs `SELECT * FROM pg_type` it sees these
// types and only these.
func synthPgType() *catalogResponse {
v := hbrt.MakeString
mk := func(oid uint32, name string, length int) []hbrt.Value {
return []hbrt.Value{
hbrt.MakeInt(int(oid)),
v(name),
hbrt.MakeInt(length),
}
}
return &catalogResponse{
fields: []string{"oid", "typname", "typlen"},
rows: [][]hbrt.Value{
mk(oidBool, "bool", 1),
mk(oidInt4, "int4", 4),
mk(oidInt8, "int8", 8),
mk(oidText, "text", -1),
mk(oidNumeric, "numeric", -1),
mk(oidDate, "date", 4),
mk(oidTimestamp, "timestamp", 8),
},
}
}
// synthPgDatabase returns a single-row database list — the
// connecting user's database name (whatever they sent in the
// StartupMessage), aliased so `\l` shows something.
func synthPgDatabase(s *session) *catalogResponse {
v := hbrt.MakeString
db := s.database
if db == "" {
db = "postgres"
}
return &catalogResponse{
fields: []string{"oid", "datname", "datdba", "encoding"},
rows: [][]hbrt.Value{
{hbrt.MakeInt(1), v(db), hbrt.MakeInt(10), hbrt.MakeInt(6)},
},
}
}
// synthPgSettings returns the few server parameters BI tools
// commonly read at connection time. Same set as showResponse so
// `SHOW x` and `SELECT setting FROM pg_settings WHERE name = 'x'`
// see consistent values.
func synthPgSettings(serverVersion string) *catalogResponse {
v := hbrt.MakeString
mk := func(name, setting string) []hbrt.Value {
return []hbrt.Value{v(name), v(setting)}
}
return &catalogResponse{
fields: []string{"name", "setting"},
rows: [][]hbrt.Value{
mk("server_version", serverVersion),
mk("server_version_num", "140000"),
mk("server_encoding", "UTF8"),
mk("client_encoding", "UTF8"),
mk("DateStyle", "ISO, MDY"),
mk("TimeZone", "UTC"),
mk("standard_conforming_strings", "on"),
mk("integer_datetimes", "on"),
},
}
}
// Force fmt import — used by catalogTableResponse's debug fall-
// through scaffold in a follow-up commit.
var _ = fmt.Sprintf

View File

@@ -99,6 +99,24 @@ func (s *session) dispatchSimpleQuery(sql string) {
return
}
// pg_catalog intercept: BI tools (psql, pgx, DBeaver, Tableau)
// fire dozens of catalog probes on connect. five_SQL can't parse
// most of them, so handle the common shapes (SHOW, SELECT
// version(), pg_namespace / pg_class / pg_type / pg_settings)
// with synthesised responses BEFORE delegating to the engine.
if handled, val := s.catalogIntercept(sql); handled {
arr := val.AsArray()
fieldsVal := arr.Items[0]
rowsVal := hbrt.MakeNil()
if len(arr.Items) >= 2 {
rowsVal = arr.Items[1]
}
_ = s.emitResultSet(fieldsVal, rowsVal, sql)
s.send(&pgproto3.CommandComplete{CommandTag: []byte(commandTagFor(sql))})
s.sendReadyForQuery()
return
}
result, err := s.runSQL(sql)
if err != nil {
s.send(buildErrorResponse("XX000", err.Error(), sql))

View File

@@ -231,12 +231,18 @@ func (s *session) handleSync() {
// executePortal runs the resolved SQL through five_SQL once and
// caches the result on the portal. Idempotent — repeated calls
// short-circuit on `executed`.
// short-circuit on `executed`. pg_catalog probes (BI-tool metadata
// queries) are intercepted before the engine call so DBeaver /
// Tableau / DataGrip can negotiate without erroring out.
func (s *session) executePortal(p *portal) {
if p.executed {
return
}
p.executed = true
if handled, val := s.catalogIntercept(p.resolvedSQL); handled {
p.resultArr = val.AsArray()
return
}
res, err := s.runSQL(p.resolvedSQL)
if err != nil {
p.resultErr = &errEnvelope{code: 0, msg: err.Error(), sql: p.resolvedSQL}