Phase 9/13 — Testing

Phase 9: Testing Infrastructure

Comprehensive analysis of the testing infrastructure spanning TypeScript Python Go C# with a shared replay proxy architecture.

1. Testing Frameworks per Language

Testing Frameworks per Language

flowchart TD
  SDK["copilot-sdk"] --> TS["TypeScript\nVitest"]
  SDK --> PY["Python\npytest + pytest-asyncio"]
  SDK --> GO["Go\ngo test"]
  SDK --> CS["C#\nxUnit"]

TypeScript Vitest

Configuration: nodejs/vitest.config.ts

export default defineConfig({
  test: {
    globals: true,
    environment: "node",
    testTimeout: 30000,
    hookTimeout: 30000,
    teardownTimeout: 10000,
    isolate: true,
    pool: "forks",
    exclude: [
      "**/node_modules/**",
      "**/dist/**",
      "**/*.d.ts",
      "**/basic-test.ts",
    ],
  },
});

Key choices: 30-second timeout (generous for E2E), process forking for isolation, global test APIs enabled.

Python pytest + pytest-asyncio

Configuration: python/pyproject.toml (lines 81-86)

[tool.pytest.ini_options]
testpaths = ["."]
python_files = "test_*.py"
python_classes = "Test*"
python_functions = "test_*"
asyncio_mode = "auto"

Dev dependencies: pytest>=7.0.0, pytest-asyncio>=0.21.0, pytest-timeout>=2.0.0, httpx>=0.24.0.

Go go test

Standard go test framework. No special configuration — tests follow Go conventions with _test.go suffixes and testing.T parameters.

C# xUnit

Configuration: dotnet/test/Harness/E2ETestBase.cs

public abstract class E2ETestBase
  : IClassFixture<E2ETestFixture>,
    IAsyncLifetime

Uses IClassFixture<T> for shared context, IAsyncLifetime for async setup/teardown, and Fact/Theory attributes.

2. Test Structure — Unit Tests vs E2E Tests

Directory Layout

copilot-sdk/
  nodejs/
    test/
      client.test.ts              # Unit tests
      e2e/
        harness/
          CapiProxy.ts            # Per-SDK proxy client
          sdkTestContext.ts        # Test context
          sdkTestHelper.ts        # Utility functions
        session.test.ts           # E2E tests
        hooks.test.ts
        ... (20+ E2E test files)
  python/
    test_client.py                # Unit tests
    test_jsonrpc.py
    test_event_forward_compatibility.py
    test_rpc_timeout.py
    e2e/
      conftest.py                 # Shared fixtures
      testharness/
        context.py / helper.py / proxy.py
      test_session.py / test_hooks.py  ... (13+ files)
  go/
    client_test.go                # Unit tests
    definetool_test.go / session_test.go / types_test.go
    internal/e2e/
      testharness/
        context.go / helper.go / proxy.go
      session_test.go / hooks_test.go  ... (14+ files)
  dotnet/
    test/
      Harness/
        E2ETestBase.cs / E2ETestContext.cs
        E2ETestFixture.cs / TestHelper.cs
      SessionTests.cs / HooksTests.cs  ... (17+ files)
  test/                           # SHARED cross-language
    harness/
      server.ts / capturingHttpProxy.ts
      replayingCapiProxy.ts / util.ts
    snapshots/                    # YAML snapshot files
    scenarios/                    # Polyglot scenario tests

Unit vs E2E Distinction

Convention: nodejs/test/client.test.ts line 5: "This file is for unit tests. Where relevant, prefer to add e2e tests in e2e/*.test.ts instead"

Aspect	Unit Tests	E2E Tests
Location	SDK root (`nodejs/test/`, `python/test_.py`, `go/_test.go`)	Dedicated subdirs (`*/e2e/`, `dotnet/test/`)
What they test	Client construction, parameter validation, URL parsing	Full request/response flows with real session management
Dependencies	Mocks/spies (`vi.spyOn`), no external deps	Real CLI process + replaying HTTP proxy
Speed	Fast and deterministic	Slower, 30-second timeout
Snapshots	None	Shared YAML across all 4 languages

3. Test Harness Architecture

The test harness is a layered system with a shared TypeScript server at the base and per-language wrapper clients on top.

Test Harness Architecture

block-beta
  columns 1
  L5["Layer 5: Test Files\n22 E2E Vitest - 14 E2E pytest - 13 E2E go test - 17 E2E xUnit"]
  L4["Layer 4: Per-Language Wrappers\nCapiProxy.ts - proxy.py - proxy.go - E2ETestContext.cs"]
  L3["Layer 3: ReplayingCapiProxy\ntest/harness/replayingCapiProxy.ts lines 52-1059\nRecord/replay, YAML snapshots, normalization"]
  L2["Layer 2: CapturingHttpProxy\ntest/harness/capturingHttpProxy.ts lines 11-206\nTransparent HTTP proxy: forwards and records"]
  L1["Layer 1: Shared Proxy Server\ntest/harness/server.ts - Node.js HTTP server, random port\nEndpoints: POST /config - GET /exchanges - POST /stop"]
  L5 --> L4 --> L3 --> L2 --> L1
  style L5 fill:#7c3aed,color:#fff
  style L4 fill:#6d28d9,color:#fff
  style L3 fill:#5b21b6,color:#fff
  style L2 fill:#4c1d95,color:#fff
  style L1 fill:#3b0764,color:#fff

Layer 1: Shared Proxy Server

// test/harness/server.ts (lines 1-13)
const proxy = new ReplayingCapiProxy("https://api.githubcopilot.com");
const proxyUrl = await proxy.start();
console.log(`Listening: ${proxyUrl}`);

Launched as a child process by each language's CapiProxy wrapper. Listens on random local port, prints Listening: http://127.0.0.1:<port> to stdout.

Control Endpoints

Endpoint	Purpose
`POST /config`	Reconfigure for the next test (snapshot path, work dir)
`GET /exchanges`	Retrieve captured HTTP exchanges
`POST /stop`	Gracefully shut down (optionally skip cache writes)

Layer 2: CapturingHttpProxy

test/harness/capturingHttpProxy.ts (lines 11-206) — Base class that acts as a transparent HTTP proxy:

Starts http.Server on 127.0.0.1:0
Forwards requests to target URL (https://api.githubcopilot.com)
Records all request/response pairs as CapturedExchange objects
Handles streaming responses by forwarding chunks as they arrive

Layer 3: ReplayingCapiProxy

test/harness/replayingCapiProxy.ts (lines 52-1059) — Core of the entire test infrastructure.

ReplayingCapiProxy Request Flow

flowchart TD
  A["Incoming Request"] --> B["Check snapshot cache"]
  B --> C{"Match found?"}
  C -->|Yes| D["Replay from YAML"]
  C -->|No| E["Forward & Record"]

Layer 4: Per-Language Wrappers

Per-Language Wrappers

flowchart LR
  subgraph TypeScript
    TS1["CapiProxy.ts\nSpawns via npx tsx, parses port"]
    TS2["sdkTestContext.ts\nTemp dirs, config, proxy lifecycle"]
  end
  subgraph Python
    PY1["proxy.py\nSpawns via subprocess, httpx"]
    PY2["context.py\nAsync context manager, fixtures"]
  end
  subgraph Go
    GO1["proxy.go\nSpawns via exec.Command, net/http"]
    GO2["context.go\nTestMain setup, t.Cleanup teardown"]
  end
  subgraph CSharp["C#"]
    CS1["E2ETestContext.cs\nSpawns via Process.Start, HttpClient"]
    CS2["E2ETestBase.cs\nIAsyncLifetime fixture pattern"]
  end

4. E2E Test Pattern

E2E Test Lifecycle

flowchart TD
  A["Test starts -- context.configureForTest#40;category, name#41;"] --> B["POST /config -- proxy loads snapshot YAML"]
  B --> C["CopilotClient.createSession#40;#41; with proxy URL"]
  C --> D["session.sendAndWait#40;prompt#41;"]
  D --> E["CLI calls LLM API -- proxy intercepts & replays"]
  E --> F["Assert on response / events / tool calls"]
  F --> G["Teardown: session.disconnect#40;#41;, proxy.stop#40;#41;"]

Environment Configuration

Variable	Purpose
`GITHUB_COPILOT_CHAT_OVERRIDE_URL`	Points CLI to the local replay proxy
`GITHUB_TOKEN`	Fake token (`ghu_test...`) for replay mode
`COPILOT_CLI_PATH`	Path to the Copilot CLI binary

5. Scenario Testing

The test/scenarios/ directory contains 35 polyglot scenario tests, each implemented in all four languages.

Scenario Structure

test/scenarios/
  01-basic-conversation/
    typescript/ (package.json, index.ts)
    python/     (main.py)
    go/         (main.go)
    dotnet/     (Program.cs, *.csproj)
  02-tool-use/
    typescript/ python/ go/ dotnet/
  ...
  35-advanced-hooks/
    typescript/ python/ go/ dotnet/

Scenario Runner

test/scenarios/verify.sh — Runs all scenarios against a live CLI with real GitHub tokens. Not wired into CI.

CI Limitation: The scenario-builds.yml workflow only verifies that scenarios compile, not that they run. The verify.sh runner requires real GitHub tokens and a live Copilot CLI.

6. Snapshot Testing

Snapshot File Format

Each test has a corresponding YAML snapshot in test/snapshots/:

# test/snapshots/session/basic_conversation.yaml
exchanges:
  - request:
      method: POST
      url: /chat/completions
      headers:
        content-type: application/json
      body:
        messages:
          - role: system
            content: "..."
          - role: user
            content: "What is 2+2?"
    response:
      status: 200
      headers:
        content-type: application/json
      body:
        choices:
          - message:
              role: assistant
              content: "2+2 equals 4."

Snapshot Naming Convention

test/snapshots/{category}/{test_name}.yaml
# Examples:
test/snapshots/session/basic_conversation.yaml
test/snapshots/hooks/pre_tool_use_deny.yaml
test/snapshots/permissions/approve_all.yaml

Normalization Pipeline

flowchart TD
  A["Raw HTTP request/response"] --> B["Strip volatile headers #40;Date, X-Request-Id#41;"]
  B --> C["Normalize paths #40;OS-specific to forward slashes#41;"]
  C --> D["Replace tool call IDs with deterministic placeholders"]
  D --> E["Normalize timestamps to epoch"]
  E --> F["Normalize shell names #40;PowerShell / bash#41;"]
  F --> G["Deterministic, cross-platform snapshot"]

Request Matching Logic

Strategy	Description
Exact match	Method + URL + body deep-equals normalized snapshot
Prefix matching	For multi-turn: matches conversation as prefix of full exchange list
Fallback	If no match in replay mode → error (CI) or forward to real API (local dev)

Corruption Prevention

Safety: Snapshots are never written when tests fail, preventing storage of incorrect behavior. Python's conftest.py tracks failures via item.session.stash.

7. Test Categories

Node.js E2E Tests (22 files)

Test File	Focus Area
`session.test.ts`	Session lifecycle (create, resume, abort, delete)
`hooks.test.ts`	Pre/post tool use hooks
`hooks_extended.test.ts`	Extended hooks (onError, onSessionEnd, etc.)
`permissions.test.ts`	Permission handling (approve, deny, async)
`custom_tools.test.ts`	Custom tool registration and execution
`skills.test.ts`	Skill invocation
`ask_user.test.ts`	User input handler
`mcp.test.ts`	MCP server integration
`custom_agents.test.ts`	Custom agent configuration
`multi_client.test.ts`	Multiple client instances
`compaction.test.ts`	Context compaction
`streaming.test.ts`	Streaming event fidelity
`rpc.test.ts`	Low-level RPC operations
`builtin_tools.test.ts`	Built-in tools (file ops, shell, grep)
`event_fidelity.test.ts`	Event field/ordering accuracy
`error_resilience.test.ts`	Error recovery and resilience
`multi_turn.test.ts`	Multi-turn conversations
`tool_results.test.ts`	Tool result handling
`session_config.test.ts`	Session configuration options
`session_lifecycle.test.ts`	Full session lifecycle
`client_lifecycle.test.ts`	Client start/stop/restart
`resume_permissions.test.ts`	Permissions on session resume

Python E2E Tests (14 files)

Test File	Focus Area
`test_session.py`	Session lifecycle
`test_hooks.py`	Pre/post tool use hooks
`test_permissions.py`	Permission handling
`test_custom_tools.py`	Custom tools
`test_skills.py`	Skills
`test_ask_user.py`	User input
`test_mcp.py`	MCP servers
`test_custom_agents.py`	Custom agents
`test_multi_client.py`	Multiple clients
`test_compaction.py`	Compaction
`test_streaming.py`	Streaming
`test_rpc.py`	RPC operations
`test_client_lifecycle.py`	Client lifecycle
`test_resume_permissions.py`	Resume permissions

Go E2E Tests (13 files) & C# E2E Tests (17 files)

Both SDKs mirror the same test categories as Node.js and Python, with language-appropriate implementations.

8. CI Integration

Six GitHub Actions workflows orchestrate the test suite:

1. Node.js SDK Tests (`nodejs-sdk-tests.yml`)

Aspect	Detail
Triggers	Push to `main`, PRs touching `nodejs/` or `test/`
Matrix	`ubuntu-latest`, `macos-latest`, `windows-latest`
Steps	Setup Node.js 22, `npm ci`, `npm run lint`, `npm run format:check`, `npm run build`, install harness deps, PowerShell warmup (Windows), `npm test`

2. Python SDK Tests (`python-sdk-tests.yml`)

Aspect	Detail
Triggers	Push to `main`, PRs touching `python/` or `test/`
Matrix	3 OS × Python 3.10, 3.11, 3.12
Steps	Setup Python, Node.js 22, install deps, `ruff check`, `ruff format --check`, `pyright`, install harness deps, PowerShell warmup, `pytest`

3. Go SDK Tests (`go-sdk-tests.yml`)

Aspect	Detail
Triggers	Push to `main`, PRs touching `go/` or `test/`
Matrix	`ubuntu-latest`, `macos-latest`, `windows-latest`
Steps	Setup Copilot CLI, Go 1.24, `go fmt` (Linux), `golangci-lint` (Linux), install harness deps, PowerShell warmup, `/bin/bash test.sh`

4. .NET SDK Tests (`dotnet-sdk-tests.yml`)

Aspect	Detail
Triggers	Push to `main`, PRs touching `dotnet/` or `test/`
Matrix	`ubuntu-latest`, `macos-latest`, `windows-latest`
Steps	Setup .NET 10.0.x, Node.js 22, `dotnet restore`, `dotnet format --verify-no-changes` (Linux), `dotnet build`, install harness deps, PowerShell warmup, `dotnet test --no-build -v n`

5. Scenario Build Verification (`scenario-builds.yml`)

Triggers on PRs/pushes touching test/scenarios/** or SDK source. Four parallel jobs:

Scenario Build Verification

flowchart LR
  TS["TS\nnpm install per scenario"]
  PY["Py\npy_compile + import copilot"]
  GO["Go\ngo build ./... per scenario"]
  CS["C#\ndotnet build per scenario"]

6. Codegen Check (`codegen-check.yml`)

Validates that generated code is up-to-date.

Common CI Patterns

Pattern	Description
Three-OS matrix	Ubuntu, macOS, Windows — cross-platform compatibility
Test harness dependency install	All install `test/harness/` npm packages
PowerShell warmup on Windows	Avoids first-run delays during tests
Path-based triggers	Only run when relevant files change
Content read permissions only	Security-conscious permissions

9. Test Utilities

Node.js Helpers

nodejs/test/e2e/harness/sdkTestHelper.ts

Function	Lines	Purpose
`getFinalAssistantMessage(session)`	7-76	Races existing messages against future events for final assistant response
`retry(message, fn, maxTries, delay)`	78-99	Retries an async function up to N times with delay
`formatError(error)`	101-113	Safe error formatting (handles objects, circular refs)
`getNextEventOfType(session, eventType)`	115-130	Waits for a specific event type from a session stream

Python Helpers

python/e2e/testharness/helper.py

Function	Lines	Purpose
`get_final_assistant_message(session, timeout)`	11-55	Async wait for final assistant message
`_get_existing_final_response(session)`	58-93	Check existing messages for completed response
`write_file(work_dir, filename, content)`	96-111	Write a file in the test work directory
`read_file(work_dir, filename)`	114-127	Read a file from the test work directory
`get_next_event_of_type(session, event_type, timeout)`	130-163	Wait for a specific event type

Go Helpers

go/internal/e2e/testharness/helper.go

Function	Lines	Purpose
`GetFinalAssistantMessage(ctx, session)`	12-56	Wait for final assistant message using channels
`GetNextEventOfType(session, eventType, timeout)`	59-91	Wait for a specific event type
`getExistingFinalResponse(ctx, session)`	93-145	Check existing messages for completed response

C# Helpers

dotnet/test/Harness/TestHelper.cs

Method	Lines	Purpose
`GetFinalAssistantMessageAsync(session, timeout)`	9-53	Async wait with `TaskCompletionSource` and cancellation
`GetExistingFinalResponseAsync(session)`	55-75	Check existing messages
`GetNextEventOfTypeAsync<T>(session, timeout)`	77-100	Generic wait for typed events

Shared Utilities

test/harness/util.ts (lines 1-35)

Export	Purpose
`iife<T>(fn)`	Immediately-invoked function expression wrapper
`sleep(ms)`	Promise-based sleep
`ShellConfig.powerShell` / `.bash`	Platform-specific shell tool name mappings for snapshot normalization

Python conftest.py Fixtures

python/e2e/conftest.py (lines 1-47)

Fixture/Hook	Scope	Purpose
`pytest_runtest_makereport`	Hook	Tracks test failures via `item.session.stash` to prevent corrupted snapshot writes
`ctx`	Module	Creates and tears down `E2ETestContext` shared across all tests in a module
`configure_test`	Function (autouse)	Automatically configures the proxy for each test based on module and test name

C# Test Base

dotnet/test/Harness/E2ETestBase.cs (lines 13-79) — The E2ETestBase abstract class provides:

Ctx and Client properties from the shared fixture
InitializeAsync() that calls ConfigureForTestAsync
CreateSessionAsync() / ResumeSessionAsync() convenience methods (default approve-all)
GetSystemMessage() / GetToolNames() helpers for exchange inspection

dotnet/test/Harness/E2ETestFixture.cs (lines 10-30) — xUnit IAsyncLifetime fixture:

Creates E2ETestContext and CopilotClient on initialize
Calls ForceStopAsync() and DisposeAsync() on teardown

10. Coverage Analysis

Well-Tested Areas

Session lifecycle: Comprehensive coverage across all 4 SDKs (create, resume, abort, delete, stateful conversation)
Permission handling: Approve, deny, async handlers, error cases, resume with permissions
Tool use hooks: Pre/post tool use, deny behavior, both hooks combined
Client construction and validation: URL parsing, auth options, mutual exclusivity checks
Custom tools and agents: Registration, MCP server configuration, custom agent setup
Multi-client scenarios: Tool registration across clients, permission handling, event visibility
Streaming and event fidelity: Event ordering, field correctness, streaming chunk accuracy
Cross-platform: All E2E tests run on Linux, macOS, and Windows in CI
Cross-SDK snapshot sharing: All 4 languages share the same YAML snapshots

Potential Gaps

11 potential coverage gaps identified:

#	Gap	Details
1	C# unit tests	.NET SDK has only E2E tests, no pure unit tests (unlike TS, Python, Go)
2	Error resilience	Only Node.js has `error_resilience.test.ts`
3	Built-in tools	Only Node.js tests file ops, shell, grep, find
4	Event fidelity	Only Node.js has `event_fidelity.test.ts`
5	Extended hooks	Only Node.js tests onErrorOccurred, onSessionEnd, etc.
6	Multi-turn	Only Node.js has `multi_turn.test.ts`
7	Session config/lifecycle	Node.js has separate files; others may subsume them
8	Tool results	Only Node.js has `tool_results.test.ts`
9	Scenario execution	CI only verifies compile, not run
10	Harness self-tests	Edge cases not assessed
11	Snapshot staleness	No automated orphaned snapshot detection

Cross-SDK Test Parity Summary

Test Category	TS	Py	Go	C#
Session management	Y	Y	Y	Y
Client lifecycle	Y	Y	Y	Y
Hooks (pre/post tool)	Y	Y	Y	Y
Extended hooks	Y	–	–	–
Permissions	Y	Y	Y	Y
Custom tools	Y	Y	Y	Y
Skills	Y	Y	Y	Y
Ask user	Y	Y	Y	Y
MCP + agents	Y	Y	Y	Y
Multi-client	Y	Y	Y	Y
Compaction	Y	Y	Y	Y
Streaming fidelity	Y	Y	Y	Y
RPC	Y	Y	Y	Y
Built-in tools	Y	–	–	–
Event fidelity	Y	–	–	–
Error resilience	Y	–	–	–
Multi-turn	Y	–	–	–
Tool results	Y	–	–	–
Unit tests	Y (1)	Y (4)	Y (4+2)	Partial

Legend: Y = has dedicated test file, – = no dedicated test file (may be partially covered elsewhere)

11. Architecture Diagram

Test Architecture Overview

flowchart TD
  SNAP["test/snapshots/*.yaml\n#40;shared across all SDKs#41;"]
  SNAP --> HARNESS["test/harness/\nserver.ts - replayingCapiProxy.ts"]
  SNAP --> SCENARIOS["test/scenarios/\n35 scenarios x 4 langs"]

  HARNESS --> TS_W["Node.js\nCapiProxy - Context - Helper"]
  HARNESS --> PY_W["Python\nproxy - context - helper"]
  HARNESS --> GO_W["Go\nproxy - context - helper"]
  HARNESS --> CS_W["C#\nContext - Base - Helper"]

  TS_W --> TS_E["22 E2E #40;Vitest#41;"]
  PY_W --> PY_E["14 E2E #40;pytest#41;"]
  GO_W --> GO_E["13 E2E #40;go test#41;"]
  CS_W --> CS_E["17 E2E #40;xUnit#41;"]

12. Key Design Decisions

#	Decision	Rationale
1	Single shared proxy server in TypeScript	Rather than implementing proxy/replay logic in each language, a single Node.js server handles all complexity. Language-specific wrappers are thin HTTP clients (~50-100 lines each).
2	YAML snapshots as the source of truth	All SDKs share the same snapshots, ensuring behavioral consistency. A snapshot captured by one SDK can be replayed for another.
3	Record-then-replay pattern	Developers run tests locally to capture new snapshots against real APIs. CI replays without real API access, using fake tokens.
4	Extensive normalization	System messages, paths, tool call IDs, timestamps, and shell-specific names are all normalized, making snapshots deterministic and cross-platform.
5	Fail-fast in CI	No silent fallback to live APIs in CI. Missing snapshots produce GitHub Actions error annotations with file/line references.
6	Prefix matching for multi-turn	A single YAML conversation captures the full multi-turn exchange. The proxy matches requests as conversation prefixes.
7	Corruption prevention	Snapshots are not written when tests fail, avoiding storage of incorrect behavior.

Phase 9: Testing Infrastructure

1. Testing Frameworks per Language

TypeScript Vitest

Python pytest + pytest-asyncio

Go go test

C# xUnit

2. Test Structure — Unit Tests vs E2E Tests

Directory Layout

Unit vs E2E Distinction

3. Test Harness Architecture

Layer 1: Shared Proxy Server

Control Endpoints

Layer 2: CapturingHttpProxy

Layer 3: ReplayingCapiProxy

Layer 4: Per-Language Wrappers

4. E2E Test Pattern

Environment Configuration

5. Scenario Testing

Scenario Structure

Scenario Runner

6. Snapshot Testing

Snapshot File Format

Snapshot Naming Convention

Normalization Pipeline

Request Matching Logic

Corruption Prevention

7. Test Categories

Node.js E2E Tests (22 files)

Python E2E Tests (14 files)

Go E2E Tests (13 files) & C# E2E Tests (17 files)

8. CI Integration

1. Node.js SDK Tests (nodejs-sdk-tests.yml)

2. Python SDK Tests (python-sdk-tests.yml)

3. Go SDK Tests (go-sdk-tests.yml)

4. .NET SDK Tests (dotnet-sdk-tests.yml)

5. Scenario Build Verification (scenario-builds.yml)

6. Codegen Check (codegen-check.yml)

Common CI Patterns

9. Test Utilities

Node.js Helpers

Python Helpers

Go Helpers

C# Helpers

Shared Utilities

Python conftest.py Fixtures

C# Test Base

10. Coverage Analysis

Well-Tested Areas

Potential Gaps

Cross-SDK Test Parity Summary

11. Architecture Diagram

12. Key Design Decisions

1. Node.js SDK Tests (`nodejs-sdk-tests.yml`)

2. Python SDK Tests (`python-sdk-tests.yml`)

3. Go SDK Tests (`go-sdk-tests.yml`)

4. .NET SDK Tests (`dotnet-sdk-tests.yml`)

5. Scenario Build Verification (`scenario-builds.yml`)

6. Codegen Check (`codegen-check.yml`)