CharlesKWON 9e0f82c5a8 perf+fix(FiveSql2): recursive-CTE hash join + correct correlated subqueries
Two fixes uncovered by a SQL:2013 analytics benchmark covering the
query patterns people actually run on DBF data (OLAP, BI, hierarchy
traversal).

--- Fix 1: correlated subquery was silently wrong ---

EvalExpr's ND_SUB handler only pushed the outer context when
`s_aOuterStack` was already non-empty — otherwise it routed the
subquery through CacheSubquery, which stores the first result under
a key derived from the subquery's syntax tokens. For a correlated
subquery in a top-level WHERE:

    SELECT name, dept, salary FROM emp e1
    WHERE salary > (SELECT AVG(salary) FROM emp e2 WHERE e2.dept = e1.dept)

the first outer row saw an empty stack, cached the result, and
every subsequent outer row got the same cached value regardless of
e1.dept. The query returned all 1000 employees instead of the 505
who actually beat their department's average.

Fix: always PushOuter + Run, no cache. Correctness over caching.
Trade-off: non-correlated scalar subqueries now re-execute per
outer row. A proper per-outer-key memoization is deferred — it
requires walking the subquery AST to collect free variables.
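
The three caching strategies discussed above can be compared in a small Go model. This is only an illustrative sketch, not the FiveSql2 code: `Emp`, `avgSalary`, `countAbove`, and the key functions are hypothetical names, and the four-row data set is made up. It shows why a cache keyed by the subquery text returns wrong rows, why re-running per outer row is correct but costly, and what keying the cache by the outer free variable (here `dept`) would save.

```go
package main

import "fmt"

// Emp is a hypothetical stand-in for one row of the emp table.
type Emp struct {
	Name   string
	Dept   string
	Salary float64
}

// avgSalary plays the role of the correlated subquery:
// SELECT AVG(salary) FROM emp e2 WHERE e2.dept = <outer dept>.
func avgSalary(emps []Emp, dept string) float64 {
	sum, n := 0.0, 0
	for _, e := range emps {
		if e.Dept == dept {
			sum += e.Salary
			n++
		}
	}
	if n == 0 {
		return 0
	}
	return sum / float64(n)
}

// keyFn picks the cache key for an outer row; ok=false means "do not cache".
type keyFn func(outer Emp) (key string, ok bool)

// bySyntax caches under one key per subquery text (the buggy behaviour).
func bySyntax(Emp) (string, bool) { return "SELECT AVG(salary) ...", true }

// noCache always re-runs the subquery (the behaviour after the fix).
func noCache(Emp) (string, bool) { return "", false }

// byOuterKey caches per outer free-variable value (the deferred follow-up).
func byOuterKey(e Emp) (string, bool) { return e.Dept, true }

// countAbove counts outer rows whose salary beats the subquery result and
// reports how many times the subquery actually ran.
func countAbove(emps []Emp, keyFor keyFn) (matches, runs int) {
	cache := map[string]float64{}
	for _, e := range emps {
		key, useCache := keyFor(e)
		avg, hit := cache[key]
		if !useCache || !hit {
			avg = avgSalary(emps, e.Dept)
			runs++
			if useCache {
				cache[key] = avg
			}
		}
		if e.Salary > avg {
			matches++
		}
	}
	return matches, runs
}

// sampleEmps returns made-up data: two departments, two employees each.
func sampleEmps() []Emp {
	return []Emp{
		{"Cho", "ops", 10}, {"Dee", "ops", 30},
		{"Ann", "eng", 100}, {"Bob", "eng", 200},
	}
}

func main() {
	emps := sampleEmps()
	strategies := []struct {
		name string
		fn   keyFn
	}{{"by syntax", bySyntax}, {"no cache", noCache}, {"by outer key", byOuterKey}}
	for _, s := range strategies {
		m, r := countAbove(emps, s.fn)
		fmt.Printf("%-12s matches=%d subquery runs=%d\n", s.name, m, r)
	}
}
```

On this data the syntax-keyed cache reuses the `ops` average for every row and over-reports matches, while the other two strategies agree on the result and differ only in how often the subquery runs.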

--- Fix 2: WITH RECURSIVE hierarchy join was O(m*n) ---

RecCteJoin (the in-memory join used when a recursive CTE's step
references both a real table and the CTE frontier) ran a flat
nested loop: for each DBF row × each prev-iteration row, build a
combined row buffer and run SqlEvalRowExpr on the ON condition.

For a 4-level, 1000-employee hierarchy that is ~1M ON evaluations
(1000 DBF rows × ~1000 frontier rows summed over all iterations),
taking ~4.6 seconds.

Fix: detect the shape `dbfAlias.col = cteAlias.col` at join-setup
time, build a PRG hash on the CTE frontier keyed by its join column
(aPrevRows is always small — at most the last iteration's emitted
rows), then scan the DBF side once and probe the hash. Complex ON
predicates fall through to the original nested loop.
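
The difference between the two join strategies can be sketched in Go. This is a simplified model under stated assumptions, not RecCteJoin itself: `Row`, `buildHierarchy`, `nestedLoopJoin`, `hashJoin`, and `traverse` are hypothetical names, the ON condition is fixed to a single equality on the manager ID, and the hierarchy shape (1 root, 9 managers, 90 leads, 900 staff) is invented to give 1000 rows on 4 levels.

```go
package main

import "fmt"

// Row is a hypothetical employee row: its ID and the manager it reports to.
type Row struct {
	ID, MgrID int
}

// joinFn joins the full table against the current CTE frontier and returns
// the matching table rows plus a count of the work done.
type joinFn func(table, frontier []Row) ([]Row, int)

// nestedLoopJoin checks every table row against every frontier row and
// counts ON-condition evaluations.
func nestedLoopJoin(table, frontier []Row) (out []Row, evals int) {
	for _, t := range table {
		for _, f := range frontier {
			evals++
			if t.MgrID == f.ID {
				out = append(out, t)
			}
		}
	}
	return out, evals
}

// hashJoin indexes the frontier by its join column, then scans the table
// once and probes the index. It counts probes.
func hashJoin(table, frontier []Row) (out []Row, probes int) {
	byID := make(map[int][]Row, len(frontier))
	for _, f := range frontier {
		byID[f.ID] = append(byID[f.ID], f)
	}
	for _, t := range table {
		probes++
		// Emit one output row per matching frontier row, as a join would.
		for range byID[t.MgrID] {
			out = append(out, t)
		}
	}
	return out, probes
}

// buildHierarchy creates the assumed 4-level, 1000-row org chart.
func buildHierarchy() []Row {
	rows := []Row{{ID: 1, MgrID: 0}} // root has no manager
	for id := 2; id <= 10; id++ {
		rows = append(rows, Row{ID: id, MgrID: 1})
	}
	for id := 11; id <= 100; id++ {
		rows = append(rows, Row{ID: id, MgrID: 2 + (id-11)%9})
	}
	for id := 101; id <= 1000; id++ {
		rows = append(rows, Row{ID: id, MgrID: 11 + (id-101)%90})
	}
	return rows
}

// traverse runs the recursive step until the frontier is empty.
func traverse(table []Row, join joinFn) (reached, work int) {
	frontier := []Row{table[0]} // anchor member: the root
	for len(frontier) > 0 {
		reached += len(frontier)
		next, n := join(table, frontier)
		work += n
		frontier = next
	}
	return reached, work
}

func main() {
	table := buildHierarchy()
	r1, w1 := traverse(table, nestedLoopJoin)
	r2, w2 := traverse(table, hashJoin)
	fmt.Println("nested loop: rows", r1, "ON evaluations", w1)
	fmt.Println("hash join:   rows", r2, "probes", w2)
}
```

Under this assumed shape the nested loop performs 1000 × (1 + 9 + 90 + 900) = 1,000,000 comparisons, while the hash variant probes once per table row per iteration.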

--- Bench (SQL:2013 analytics, emp=1k, sales=20k, evt=30k) ---

  Query                              Before     After      Speedup
  ────────────────────────────────────────────────────────────────────
  RECURSIVE hierarchy 4-level        4603ms     30ms       ~150x
  Correlated subquery (all emp)      10ms       4933ms     ✓ (correct)

Other SQL:2013 queries (ROW_NUMBER top-N, running total, moving
average, DENSE_RANK, LAG, NTILE, gaps-and-islands) are all in the
expected 10–230ms range for these dataset sizes, unchanged by
this commit.

Validation:
  - FiveSql2 43/43
  - Harbour compat 51/51
  - go test ./... ALL PASS

Known follow-ups (not in this commit):
  - Q7 ROLLUP(col) parses but isn't expanded in GroupBy — returns
    a single grand-total row instead of per-value + total. Grouping
    sets implementation is a separate feature.
  - Correlated subquery memoization by outer free-variable key
    would bring Q8 from 4.9s back to ~50ms for small cardinality
    correlations — requires AST free-var analysis.
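
For the ROLLUP follow-up, the expected output shape (one row per grouping value plus one grand-total row) can be sketched in Go. This is an illustration of standard single-column ROLLUP semantics only, not a proposal for the GroupBy implementation: `Sale`, `GroupRow`, and `rollupByRegion` are hypothetical names, and the `Total` flag stands in for the NULL that SQL returns in the grouping column of the total row.

```go
package main

import (
	"fmt"
	"sort"
)

// Sale is a hypothetical input row.
type Sale struct {
	Region string
	Amount float64
}

// GroupRow is one output row. Total marks the grand-total row, where SQL
// would return NULL for the grouping column.
type GroupRow struct {
	Region string
	Total  bool
	Sum    float64
}

// rollupByRegion models SELECT region, SUM(amount) ... GROUP BY ROLLUP(region):
// the grouping sets (region) and () evaluated over the same input.
func rollupByRegion(sales []Sale) []GroupRow {
	sums := map[string]float64{}
	grand := 0.0
	for _, s := range sales {
		sums[s.Region] += s.Amount
		grand += s.Amount
	}

	// Sort the keys so the output order is deterministic.
	regions := make([]string, 0, len(sums))
	for r := range sums {
		regions = append(regions, r)
	}
	sort.Strings(regions)

	out := make([]GroupRow, 0, len(regions)+1)
	for _, r := range regions {
		out = append(out, GroupRow{Region: r, Sum: sums[r]})
	}
	// The empty grouping set contributes the single grand-total row.
	return append(out, GroupRow{Total: true, Sum: grand})
}

func main() {
	rows := rollupByRegion([]Sale{{"east", 10}, {"west", 5}, {"east", 7}})
	for _, r := range rows {
		if r.Total {
			fmt.Printf("%-6s %6.2f\n", "TOTAL", r.Sum)
			continue
		}
		fmt.Printf("%-6s %6.2f\n", r.Region, r.Sum)
	}
}
```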

Co-Authored-By: Claude Opus 4.6 (1M context) <noreply@anthropic.com>
2026-04-14 23:25:58 +09:00

Five — Harbour + Go Fusion Language

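A fusion language that compiles Harbour PRG code into native Go binaries.

It runs 30 years of accumulated Harbour/xBase business logic on top of Go's performance, concurrency, and cross-platform support.

employees.prg  →  five build  →  employees (single executable, 18MB)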

Key features

  • 100% Harbour syntax support (CLASS, CODE BLOCK, BEGIN SEQUENCE, ...)
  • Native Go binary output (no CGo, pure Go)
  • Built-in DBF/NTX/CDX database engine
  • 479 built-in RTL functions
  • FiveSql2: SQL:1999 queries over DBF (43/43 tests passing)
  • Goroutine/Channel extensions (GO BLOCK, CHANNEL)
  • @byref pass-by-reference, mutable closures
  • Interactive debugger (TUI/CLI)

Build instructions

1. Install Go

Five requires Go 1.21 or later.

Linux/WSL:

# Check whether Go is already installed
go version

# If not, install it
wget https://go.dev/dl/go1.22.5.linux-amd64.tar.gz
sudo rm -rf /usr/local/go
sudo tar -C /usr/local -xzf go1.22.5.linux-amd64.tar.gz
echo 'export PATH=$PATH:/usr/local/go/bin:$HOME/go/bin' >> ~/.bashrc
source ~/.bashrc
go version

macOS (Apple Silicon: M1/M2/M3/M4):

brew install go
# Or download directly (curl ships with macOS; wget does not):
curl -LO https://go.dev/dl/go1.22.5.darwin-arm64.tar.gz
sudo tar -C /usr/local -xzf go1.22.5.darwin-arm64.tar.gz
echo 'export PATH=$PATH:/usr/local/go/bin:$HOME/go/bin' >> ~/.zshrc
source ~/.zshrc

macOS (Intel):

brew install go
# Or download directly (curl ships with macOS; wget does not):
curl -LO https://go.dev/dl/go1.22.5.darwin-amd64.tar.gz
sudo tar -C /usr/local -xzf go1.22.5.darwin-amd64.tar.gz
echo 'export PATH=$PATH:/usr/local/go/bin:$HOME/go/bin' >> ~/.zshrc
source ~/.zshrc

Windows:

Download and run the .msi installer from https://go.dev/dl/

2. Build the Five compiler

git clone https://gitea.fivego.org/fivedev/five.git
cd five
go build -o five ./cmd/five

Verify the build:

./five version

3. Compile and run a PRG program

Single file:

# Compile
./five build examples/hello.prg -o hello

# Run
./hello

Multiple files (e.g. FiveSql2):

./five build _FiveSql2/test/test_sql1999.prg _FiveSql2/src/*.prg -o test_sql
./test_sql

4. Run the tests

# Go unit tests
go test ./...

# FiveSql2 SQL tests (43/43)
./five build _FiveSql2/test/test_sql1999.prg _FiveSql2/src/*.prg -o /tmp/test_sql
cd /tmp && ./test_sql

# Harbour compatibility tests (51/51)
./five build tests/compat_harbour.prg -o /tmp/test_compat
/tmp/test_compat

Five commands

five run <file.prg>              Compile and run immediately
five build <file.prg> [-o out]   Produce a native binary
five gen <file.prg>              Print the generated Go code (for debugging)
five debug <file.prg>            Start the interactive debugger
five version                     Show version information

Hello World

// hello.prg
PROCEDURE Main()
   ? "Hello, Five!"
   ? "Date:", Date()
   ? "Time:", Time()
RETURN

./five build hello.prg -o hello && ./hello

SQL example

// sql_demo.prg
#include "FiveSqlDef.ch"

PROCEDURE Main()
   // Declared up front: Harbour requires LOCAL before any executable statement
   LOCAL aR, i

   // Create the table
   five_SQL("CREATE TABLE employees (id INTEGER, name CHAR(30), salary NUMERIC(12,2))")

   // Insert data
   five_SQL("INSERT INTO employees (id, name, salary) VALUES (1, 'Alice', 8000)")
   five_SQL("INSERT INTO employees (id, name, salary) VALUES (2, 'Bob', 7000)")

   // Run the SQL query
   aR := five_SQL("SELECT name, salary FROM employees WHERE salary > 6000 ORDER BY salary DESC")

   ? "Results:", Len(aR[2]), "rows"
   FOR i := 1 TO Len(aR[2])
      ? "  ", aR[2][i][1], aR[2][i][2]
   NEXT
RETURN

./five build sql_demo.prg _FiveSql2/src/*.prg -o sql_demo && ./sql_demo

Project structure

five/
├── cmd/five/          Five CLI (main entry point)
├── compiler/          PRG → Go compiler
│   ├── lexer/         Tokenizer
│   ├── parser/        Parser
│   ├── analyzer/      Semantic analyzer
│   ├── gengo/         Go code generator
│   └── pp/            Preprocessor (#include, #define)
├── hbrt/              Runtime (VM, Stack, Value, Class)
├── hbrtl/             RTL standard library (479 functions)
├── hbrdd/             RDD database engine
│   ├── dbf/           DBF file driver
│   ├── ntx/           NTX index driver
│   ├── cdx/           CDX index driver
│   └── mem/           In-memory RDD
├── _FiveSql2/         SQL:1999 engine (PRG)
│   ├── src/           SQL engine sources (14 files, 10,458 LOC)
│   └── test/          SQL test suite
├── tests/             Compatibility tests
├── examples/          Example programs
└── docs/              Technical documentation

Current status

Item                     Value
Go production code       ~36,000 LOC
Built-in RTL functions   479
RDD drivers              4 (DBF, NTX, CDX, Memory)
FiveSql2 tests           43/43 (100%)
Compatibility tests      51/51 (100%)
Go tests                 ALL PASS

License

Copyright (c) 2025-2026 Charles KWON OhJun (charleskwonohjun@gmail.com) All rights reserved.
