Phase 9: Testing Infrastructure
Comprehensive analysis of the testing infrastructure spanning TypeScript, Python, Go, and C#, with a shared replay proxy architecture.
1. Testing Frameworks per Language
flowchart TD
SDK["copilot-sdk"] --> TS["TypeScript\nVitest"]
SDK --> PY["Python\npytest + pytest-asyncio"]
SDK --> GO["Go\ngo test"]
SDK --> CS["C#\nxUnit"]
TypeScript: Vitest
Configuration: nodejs/vitest.config.ts
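import { defineConfig } from "vitest/config";

export default defineConfig({
  test: {
    globals: true, // expose describe/it/expect without imports
    environment: "node",
    testTimeout: 30000, // 30 s per test, generous for E2E
    hookTimeout: 30000,
    teardownTimeout: 10000,
    isolate: true,
    pool: "forks", // run test files in forked processes
    exclude: [
      "**/node_modules/**",
      "**/dist/**",
      "**/*.d.ts",
      "**/basic-test.ts",
    ],
  },
});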
Key choices: 30-second timeout (generous for E2E), process forking for isolation, global test APIs enabled.
Python: pytest + pytest-asyncio
Configuration: python/pyproject.toml (lines 81-86)
[tool.pytest.ini_options]
testpaths = ["."]
python_files = "test_*.py"
python_classes = "Test*"
python_functions = "test_*"
asyncio_mode = "auto"
Dev dependencies: pytest>=7.0.0, pytest-asyncio>=0.21.0, pytest-timeout>=2.0.0, httpx>=0.24.0.
Go: go test
Standard go test framework. No special configuration — tests follow Go conventions with _test.go suffixes and testing.T parameters.
C#: xUnit
Configuration: dotnet/test/Harness/E2ETestBase.cs
public abstract class E2ETestBase
: IClassFixture<E2ETestFixture>,
IAsyncLifetime
Uses IClassFixture<T> for shared context, IAsyncLifetime for async setup/teardown, and Fact/Theory attributes.
2. Test Structure — Unit Tests vs E2E Tests
Directory Layout
copilot-sdk/
  nodejs/
    test/
      client.test.ts                      # Unit tests
      e2e/
        harness/
          CapiProxy.ts                    # Per-SDK proxy client
          sdkTestContext.ts               # Test context
          sdkTestHelper.ts                # Utility functions
        session.test.ts                   # E2E tests
        hooks.test.ts
        ... (20+ E2E test files)
  python/
    test_client.py                        # Unit tests
    test_jsonrpc.py
    test_event_forward_compatibility.py
    test_rpc_timeout.py
    e2e/
      conftest.py                         # Shared fixtures
      testharness/
        context.py / helper.py / proxy.py
      test_session.py / test_hooks.py ... (13+ files)
  go/
    client_test.go                        # Unit tests
    definetool_test.go / session_test.go / types_test.go
    internal/e2e/
      testharness/
        context.go / helper.go / proxy.go
      session_test.go / hooks_test.go ... (14+ files)
  dotnet/
    test/
      Harness/
        E2ETestBase.cs / E2ETestContext.cs
        E2ETestFixture.cs / TestHelper.cs
      SessionTests.cs / HooksTests.cs ... (17+ files)
  test/                                   # SHARED cross-language
    harness/
      server.ts / capturingHttpProxy.ts
      replayingCapiProxy.ts / util.ts
    snapshots/                            # YAML snapshot files
    scenarios/                            # Polyglot scenario tests
Unit vs E2E Distinction
nodejs/test/client.test.ts line 5: "This file is for unit tests. Where relevant, prefer to add e2e tests in e2e/*.test.ts instead"
| Aspect | Unit Tests | E2E Tests |
|---|---|---|
| Location | SDK root (nodejs/test/, python/test_*.py, go/*_test.go) | Dedicated subdirs (*/e2e/, dotnet/test/) |
| What they test | Client construction, parameter validation, URL parsing | Full request/response flows with real session management |
| Dependencies | Mocks/spies (vi.spyOn), no external deps | Real CLI process + replaying HTTP proxy |
| Speed | Fast and deterministic | Slower, 30-second timeout |
| Snapshots | None | Shared YAML across all 4 languages |
3. Test Harness Architecture
The test harness is a layered system with a shared TypeScript server at the base and per-language wrapper clients on top.
block-beta
columns 1
L5["Layer 5: Test Files\n22 E2E Vitest - 14 E2E pytest - 13 E2E go test - 17 E2E xUnit"]
L4["Layer 4: Per-Language Wrappers\nCapiProxy.ts - proxy.py - proxy.go - E2ETestContext.cs"]
L3["Layer 3: ReplayingCapiProxy\ntest/harness/replayingCapiProxy.ts lines 52-1059\nRecord/replay, YAML snapshots, normalization"]
L2["Layer 2: CapturingHttpProxy\ntest/harness/capturingHttpProxy.ts lines 11-206\nTransparent HTTP proxy: forwards and records"]
L1["Layer 1: Shared Proxy Server\ntest/harness/server.ts - Node.js HTTP server, random port\nEndpoints: POST /config - GET /exchanges - POST /stop"]
L5 --> L4 --> L3 --> L2 --> L1
style L5 fill:#7c3aed,color:#fff
style L4 fill:#6d28d9,color:#fff
style L3 fill:#5b21b6,color:#fff
style L2 fill:#4c1d95,color:#fff
style L1 fill:#3b0764,color:#fff
Layer 1: Shared Proxy Server
// test/harness/server.ts (lines 1-13)
const proxy = new ReplayingCapiProxy("https://api.githubcopilot.com");
const proxyUrl = await proxy.start();
console.log(`Listening: ${proxyUrl}`);
Each language's proxy wrapper launches this server as a child process. It listens on a random local port and prints Listening: http://127.0.0.1:<port> to stdout, which the wrapper parses to find the proxy URL.
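As a rough illustration of the stdout handshake, a wrapper could extract the proxy URL like this. This is a sketch only: the "Listening: <url>" format comes from server.ts, while the function names parseListeningUrl and findProxyUrl are hypothetical and not taken from the harness.

```typescript
// Hypothetical helper: extracts the proxy URL from one line of the server's stdout.
// Returns undefined for any line that is not the "Listening:" announcement.
function parseListeningUrl(line: string): string | undefined {
  const match = /^Listening: (http:\/\/127\.0\.0\.1:\d+)\s*$/.exec(line);
  return match ? match[1] : undefined;
}

// Scans stdout lines in order and returns the first announced URL, if any.
function findProxyUrl(stdoutLines: string[]): string | undefined {
  for (const line of stdoutLines) {
    const url = parseListeningUrl(line);
    if (url !== undefined) return url;
  }
  return undefined;
}
```

A real wrapper would feed lines from the child process's stdout stream into a function like this and fail the test run if no URL shows up within a timeout.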
Control Endpoints
| Endpoint | Purpose |
|---|---|
| POST /config | Reconfigure for the next test (snapshot path, work dir) |
| GET /exchanges | Retrieve captured HTTP exchanges |
| POST /stop | Gracefully shut down (optionally skip cache writes) |
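The per-language wrappers only need to issue these three requests. The sketch below builds them as plain descriptors so the shape is easy to see. The endpoint paths come from the table; the JSON field names (snapshotPath, workDir) and the skipWritingCache query parameter are assumptions made for illustration, since the source does not show the wire format.

```typescript
// Descriptor for one control request to the proxy.
interface ControlRequest {
  method: "GET" | "POST";
  url: string;
  body?: string;
}

// POST /config: point the proxy at the snapshot and work dir for the next test.
function configRequest(proxyUrl: string, snapshotPath: string, workDir: string): ControlRequest {
  return {
    method: "POST",
    url: `${proxyUrl}/config`,
    body: JSON.stringify({ snapshotPath, workDir }), // hypothetical field names
  };
}

// GET /exchanges: fetch the HTTP exchanges captured so far.
function exchangesRequest(proxyUrl: string): ControlRequest {
  return { method: "GET", url: `${proxyUrl}/exchanges` };
}

// POST /stop: shut down, optionally without writing the snapshot cache.
function stopRequest(proxyUrl: string, skipCacheWrites: boolean): ControlRequest {
  const suffix = skipCacheWrites ? "?skipWritingCache=true" : ""; // assumed encoding
  return { method: "POST", url: `${proxyUrl}/stop${suffix}` };
}
```

Each descriptor can then be passed to whatever HTTP client the language wrapper uses (httpx, net/http, HttpClient, or fetch).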
Layer 2: CapturingHttpProxy
test/harness/capturingHttpProxy.ts (lines 11-206) — Base class that acts as a transparent HTTP proxy:
- Starts http.Server on 127.0.0.1:0
- Forwards requests to target URL (https://api.githubcopilot.com)
- Records all request/response pairs as CapturedExchange objects
- Handles streaming responses by forwarding chunks as they arrive
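The combination of pass-through streaming and recording can be sketched without any networking. The example below is a simplified model under stated assumptions: CapturedExchangeSketch is a stand-in for the real CapturedExchange type (whose fields are not shown in the source), and the upstream chunks are given as an array rather than a live stream.

```typescript
// Simplified stand-in for a captured exchange.
interface CapturedExchangeSketch {
  request: { method: string; url: string };
  response: { status: number; body: string };
}

// Forwards each upstream chunk to the client as soon as it is seen, while
// accumulating the full body so the exchange can be recorded afterwards.
function forwardAndRecord(
  request: { method: string; url: string },
  status: number,
  upstreamChunks: readonly string[],
  writeToClient: (chunk: string) => void,
): CapturedExchangeSketch {
  let body = "";
  for (const chunk of upstreamChunks) {
    writeToClient(chunk); // pass through immediately, no buffering
    body += chunk; // keep a copy for the recording
  }
  return { request, response: { status, body } };
}
```

The point of the design is that the client never waits for the full response: recording is a side effect of forwarding, not a separate buffering stage.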
Layer 3: ReplayingCapiProxy
test/harness/replayingCapiProxy.ts (lines 52-1059) — Core of the entire test infrastructure.
flowchart TD
A["Incoming Request"] --> B["Check snapshot cache"]
B --> C{"Match found?"}
C -->|Yes| D["Replay from YAML"]
C -->|No| E["Forward & Record"]
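The replay-or-record decision in the diagram reduces to a cache lookup with a fallback. The following is a minimal sketch, assuming an in-memory cache keyed by a request signature; the real proxy loads YAML snapshots and matches on normalized content, and the names SnapshotCache, requestKey, and handleRequest are hypothetical.

```typescript
interface StoredResponse {
  status: number;
  body: string;
}

// In-memory stand-in for the YAML snapshot cache.
type SnapshotCache = Record<string, StoredResponse>;

// Assumed key format: method, URL, and body joined together.
function requestKey(method: string, url: string, body: string): string {
  return `${method} ${url} ${body}`;
}

function handleRequest(
  cache: SnapshotCache,
  method: string,
  url: string,
  body: string,
  forward: () => StoredResponse, // hypothetical call to the real upstream API
): { source: "replay" | "recorded"; response: StoredResponse } {
  const key = requestKey(method, url, body);
  const cached = Object.prototype.hasOwnProperty.call(cache, key) ? cache[key] : undefined;
  if (cached !== undefined) {
    return { source: "replay", response: cached }; // match found: replay
  }
  const fresh = forward(); // no match: forward to the upstream
  cache[key] = fresh; // record the response for the next run
  return { source: "recorded", response: fresh };
}
```

Calling handleRequest twice with the same request hits the upstream once; the second call is served from the cache.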
Layer 4: Per-Language Wrappers
flowchart LR
subgraph TypeScript
TS1["CapiProxy.ts\nSpawns via npx tsx, parses port"]
TS2["sdkTestContext.ts\nTemp dirs, config, proxy lifecycle"]
end
subgraph Python
PY1["proxy.py\nSpawns via subprocess, httpx"]
PY2["context.py\nAsync context manager, fixtures"]
end
subgraph Go
GO1["proxy.go\nSpawns via exec.Command, net/http"]
GO2["context.go\nTestMain setup, t.Cleanup teardown"]
end
subgraph CSharp["C#"]
CS1["E2ETestContext.cs\nSpawns via Process.Start, HttpClient"]
CS2["E2ETestBase.cs\nIAsyncLifetime fixture pattern"]
end
4. E2E Test Pattern
flowchart TD
A["Test starts -- context.configureForTest#40;category, name#41;"] --> B["POST /config -- proxy loads snapshot YAML"]
B --> C["CopilotClient.createSession#40;#41; with proxy URL"]
C --> D["session.sendAndWait#40;prompt#41;"]
D --> E["CLI calls LLM API -- proxy intercepts & replays"]
E --> F["Assert on response / events / tool calls"]
F --> G["Teardown: session.disconnect#40;#41;, proxy.stop#40;#41;"]
Environment Configuration
| Variable | Purpose |
|---|---|
| GITHUB_COPILOT_CHAT_OVERRIDE_URL | Points CLI to the local replay proxy |
| GITHUB_TOKEN | Fake token (ghu_test...) for replay mode |
| COPILOT_CLI_PATH | Path to the Copilot CLI binary |
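A test context might assemble these variables along the following lines. This is a sketch under assumptions: the variable names come from the table, but buildReplayEnv and mergeEnv are hypothetical helpers, and the fake token is passed in as a parameter because the source only shows its ghu_test... prefix.

```typescript
// Builds the environment overrides for a replay-mode test run.
function buildReplayEnv(
  proxyUrl: string,
  cliPath: string,
  fakeToken: string, // placeholder token supplied by the caller
): Record<string, string> {
  return {
    GITHUB_COPILOT_CHAT_OVERRIDE_URL: proxyUrl,
    GITHUB_TOKEN: fakeToken,
    COPILOT_CLI_PATH: cliPath,
  };
}

// Merges overrides onto an inherited environment; overrides win on conflict.
function mergeEnv(
  base: Record<string, string | undefined>,
  overrides: Record<string, string>,
): Record<string, string | undefined> {
  const merged: Record<string, string | undefined> = {};
  for (const key of Object.keys(base)) merged[key] = base[key];
  for (const key of Object.keys(overrides)) merged[key] = overrides[key];
  return merged;
}
```

Merging onto the inherited environment, rather than replacing it, keeps variables such as PATH available to the spawned CLI while ensuring a real GITHUB_TOKEN on the developer's machine is never used in replay mode.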
5. Scenario Testing
The test/scenarios/ directory contains 35 polyglot scenario tests, each implemented in all four languages.
Scenario Structure
test/scenarios/
  01-basic-conversation/
    typescript/ (package.json, index.ts)
    python/ (main.py)
    go/ (main.go)
    dotnet/ (Program.cs, *.csproj)
  02-tool-use/
    typescript/ python/ go/ dotnet/
  ...
  35-advanced-hooks/
    typescript/ python/ go/ dotnet/
Scenario Runner
test/scenarios/verify.sh runs all scenarios against a live Copilot CLI with real GitHub tokens, so it is not wired into CI.
The scenario-builds.yml workflow only verifies that scenarios compile, not that they run.
6. Snapshot Testing
Snapshot File Format
Each test has a corresponding YAML snapshot in test/snapshots/:
# test/snapshots/session/basic_conversation.yaml
exchanges:
  - request:
      method: POST
      url: /chat/completions
      headers:
        content-type: application/json
      body:
        messages:
          - role: system
            content: "..."
          - role: user
            content: "What is 2+2?"
    response:
      status: 200
      headers:
        content-type: application/json
      body:
        choices:
          - message:
              role: assistant
              content: "2+2 equals 4."
Snapshot Naming Convention
test/snapshots/{category}/{test_name}.yaml
# Examples:
test/snapshots/session/basic_conversation.yaml
test/snapshots/hooks/pre_tool_use_deny.yaml
test/snapshots/permissions/approve_all.yaml
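The convention maps directly onto a small path-building function. The sketch below assumes that test titles are lowercased and that runs of non-alphanumeric characters become single underscores; that transformation, and the helper names toSnakeCase and snapshotPath, are illustrative guesses rather than the harness's documented behavior.

```typescript
// Lowercases a title and collapses anything outside a-z0-9 into underscores.
function toSnakeCase(name: string): string {
  return name
    .trim()
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "_")
    .replace(/^_+|_+$/g, ""); // drop leading and trailing underscores
}

// Builds test/snapshots/{category}/{test_name}.yaml from the two parts.
function snapshotPath(category: string, testName: string): string {
  return `test/snapshots/${toSnakeCase(category)}/${toSnakeCase(testName)}.yaml`;
}
```

Because all four SDKs must resolve to the same file, whatever transformation is used has to produce identical results in each language's wrapper.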
Normalization Pipeline
flowchart TD
A["Raw HTTP request/response"] --> B["Strip volatile headers #40;Date, X-Request-Id#41;"]
B --> C["Normalize paths #40;OS-specific to forward slashes#41;"]
C --> D["Replace tool call IDs with deterministic placeholders"]
D --> E["Normalize timestamps to epoch"]
E --> F["Normalize shell names #40;PowerShell / bash#41;"]
F --> G["Deterministic, cross-platform snapshot"]
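Three of the pipeline's steps (header stripping, path normalization, tool call ID replacement) can be sketched as small pure functions; timestamps and shell names are omitted for brevity. Everything here is an assumption for illustration: the RawExchange shape, the call_ ID pattern, and the toolcall-N placeholder format are not taken from replayingCapiProxy.ts.

```typescript
interface RawExchange {
  headers: Record<string, string>;
  workDir: string; // hypothetical path field recorded with the exchange
  body: string;
}

// Compared in lowercase; the real list of volatile headers may be longer.
const VOLATILE_HEADERS = ["date", "x-request-id"];

function stripVolatileHeaders(headers: Record<string, string>): Record<string, string> {
  const kept: Record<string, string> = {};
  for (const name of Object.keys(headers)) {
    if (VOLATILE_HEADERS.indexOf(name.toLowerCase()) === -1) kept[name] = headers[name];
  }
  return kept;
}

// Converts OS-specific separators to forward slashes.
function normalizePath(path: string): string {
  return path.replace(/\\/g, "/");
}

// Replaces each distinct ID with a placeholder numbered by first appearance.
function replaceToolCallIds(body: string): string {
  const seen: string[] = [];
  return body.replace(/\bcall_[A-Za-z0-9]+/g, (id) => {
    let index = seen.indexOf(id);
    if (index === -1) {
      seen.push(id);
      index = seen.length - 1;
    }
    return `toolcall-${index}`;
  });
}

function normalizeExchange(raw: RawExchange): RawExchange {
  return {
    headers: stripVolatileHeaders(raw.headers),
    workDir: normalizePath(raw.workDir),
    body: replaceToolCallIds(raw.body),
  };
}
```

Numbering placeholders by first appearance keeps references consistent: a tool result that cites the same ID as its tool call still points at the same placeholder after normalization.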
Request Matching Logic
| Strategy | Description |
|---|---|
| Exact match | Method + URL + body deep-equals normalized snapshot |
| Prefix matching | For multi-turn: matches conversation as prefix of full exchange list |
| Fallback | If no match in replay mode → error (CI) or forward to real API (local dev) |
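The three strategies can be sketched over a simplified model in which a request is just its list of chat messages. This is an illustration under assumptions, not the proxy's algorithm: the real matcher also compares method and URL against normalized snapshots, and the names isPrefix, matchRequest, and the "ci" / "local" mode values are hypothetical.

```typescript
interface ChatMessage {
  role: string;
  content: string;
}

type MatchResult =
  | { kind: "exact"; conversation: number }
  | { kind: "prefix"; conversation: number }
  | { kind: "forward" }
  | { kind: "error"; message: string };

// True when `request` equals the first request.length messages of `recorded`.
function isPrefix(request: ChatMessage[], recorded: ChatMessage[]): boolean {
  if (request.length > recorded.length) return false;
  return request.every(
    (m, i) => m.role === recorded[i].role && m.content === recorded[i].content,
  );
}

function matchRequest(
  request: ChatMessage[],
  snapshots: ChatMessage[][],
  mode: "ci" | "local",
): MatchResult {
  // Pass 1: prefer a conversation that matches the request in full.
  for (let i = 0; i < snapshots.length; i++) {
    if (request.length === snapshots[i].length && isPrefix(request, snapshots[i])) {
      return { kind: "exact", conversation: i };
    }
  }
  // Pass 2: accept a recorded multi-turn conversation that starts with the request.
  for (let i = 0; i < snapshots.length; i++) {
    if (isPrefix(request, snapshots[i])) {
      return { kind: "prefix", conversation: i };
    }
  }
  // Fallback: fail fast in CI, forward to the real API in local development.
  return mode === "ci"
    ? { kind: "error", message: "no matching snapshot" }
    : { kind: "forward" };
}
```

Prefix matching is what lets one recorded conversation serve every turn of a multi-turn test: turn one sends the first message, turn two sends the first three, and both resolve to the same snapshot.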
Corruption Prevention
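python/e2e/conftest.py tracks test failures via item.session.stash so that snapshots are not written when a test fails, which keeps incorrect behavior out of the shared YAML files.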
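The failure-tracking idea is language-independent. Here is a minimal TypeScript sketch of the same guard, assuming the test runner reports each result to it; the class SnapshotWriteGuard and its methods are hypothetical and do not mirror the pytest hook's actual code.

```typescript
// Tracks whether any test failed and decides if captured exchanges may be persisted.
class SnapshotWriteGuard {
  private failures: string[] = [];

  // Called once per finished test.
  recordResult(testName: string, passed: boolean): void {
    if (!passed) this.failures.push(testName);
  }

  // Snapshots are written only when every recorded test passed.
  shouldWriteSnapshots(): boolean {
    return this.failures.length === 0;
  }

  // Returns a copy so callers cannot mutate the internal list.
  failedTests(): string[] {
    return this.failures.slice();
  }
}
```

On teardown, a wrapper could consult shouldWriteSnapshots() and, when it returns false, ask the proxy to stop without writing its cache (the POST /stop endpoint supports skipping cache writes).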
7. Test Categories
Node.js E2E Tests (22 files)
| Test File | Focus Area |
|---|---|
| session.test.ts | Session lifecycle (create, resume, abort, delete) |
| hooks.test.ts | Pre/post tool use hooks |
| hooks_extended.test.ts | Extended hooks (onError, onSessionEnd, etc.) |
| permissions.test.ts | Permission handling (approve, deny, async) |
| custom_tools.test.ts | Custom tool registration and execution |
| skills.test.ts | Skill invocation |
| ask_user.test.ts | User input handler |
| mcp.test.ts | MCP server integration |
| custom_agents.test.ts | Custom agent configuration |
| multi_client.test.ts | Multiple client instances |
| compaction.test.ts | Context compaction |
| streaming.test.ts | Streaming event fidelity |
| rpc.test.ts | Low-level RPC operations |
| builtin_tools.test.ts | Built-in tools (file ops, shell, grep) |
| event_fidelity.test.ts | Event field/ordering accuracy |
| error_resilience.test.ts | Error recovery and resilience |
| multi_turn.test.ts | Multi-turn conversations |
| tool_results.test.ts | Tool result handling |
| session_config.test.ts | Session configuration options |
| session_lifecycle.test.ts | Full session lifecycle |
| client_lifecycle.test.ts | Client start/stop/restart |
| resume_permissions.test.ts | Permissions on session resume |
Python E2E Tests (14 files)
| Test File | Focus Area |
|---|---|
| test_session.py | Session lifecycle |
| test_hooks.py | Pre/post tool use hooks |
| test_permissions.py | Permission handling |
| test_custom_tools.py | Custom tools |
| test_skills.py | Skills |
| test_ask_user.py | User input |
| test_mcp.py | MCP servers |
| test_custom_agents.py | Custom agents |
| test_multi_client.py | Multiple clients |
| test_compaction.py | Compaction |
| test_streaming.py | Streaming |
| test_rpc.py | RPC operations |
| test_client_lifecycle.py | Client lifecycle |
| test_resume_permissions.py | Resume permissions |
Go E2E Tests (13 files) & C# E2E Tests (17 files)
Both SDKs cover the same core test categories as Python, with language-appropriate implementations. The parity summary in section 10 lists the categories that only Node.js covers with dedicated files.
8. CI Integration
Six GitHub Actions workflows orchestrate the test suite:
1. Node.js SDK Tests (nodejs-sdk-tests.yml)
| Aspect | Detail |
|---|---|
| Triggers | Push to main, PRs touching nodejs/** or test/** |
| Matrix | ubuntu-latest, macos-latest, windows-latest |
| Steps | Setup Node.js 22, npm ci, npm run lint, npm run format:check, npm run build, install harness deps, PowerShell warmup (Windows), npm test |
2. Python SDK Tests (python-sdk-tests.yml)
| Aspect | Detail |
|---|---|
| Triggers | Push to main, PRs touching python/** or test/** |
| Matrix | 3 OS × Python 3.10, 3.11, 3.12 |
| Steps | Setup Python, Node.js 22, install deps, ruff check, ruff format --check, pyright, install harness deps, PowerShell warmup, pytest |
3. Go SDK Tests (go-sdk-tests.yml)
| Aspect | Detail |
|---|---|
| Triggers | Push to main, PRs touching go/** or test/** |
| Matrix | ubuntu-latest, macos-latest, windows-latest |
| Steps | Setup Copilot CLI, Go 1.24, go fmt (Linux), golangci-lint (Linux), install harness deps, PowerShell warmup, /bin/bash test.sh |
4. .NET SDK Tests (dotnet-sdk-tests.yml)
| Aspect | Detail |
|---|---|
| Triggers | Push to main, PRs touching dotnet/** or test/** |
| Matrix | ubuntu-latest, macos-latest, windows-latest |
| Steps | Setup .NET 10.0.x, Node.js 22, dotnet restore, dotnet format --verify-no-changes (Linux), dotnet build, install harness deps, PowerShell warmup, dotnet test --no-build -v n |
5. Scenario Build Verification (scenario-builds.yml)
Triggers on PRs/pushes touching test/scenarios/** or SDK source. Four parallel jobs:
flowchart LR
TS["TS\nnpm install per scenario"]
PY["Py\npy_compile + import copilot"]
GO["Go\ngo build ./... per scenario"]
CS["C#\ndotnet build per scenario"]
6. Codegen Check (codegen-check.yml)
Validates that generated code is up-to-date.
Common CI Patterns
| Pattern | Description |
|---|---|
| Three-OS matrix | Ubuntu, macOS, Windows — cross-platform compatibility |
| Test harness dependency install | All install test/harness/ npm packages |
| PowerShell warmup on Windows | Avoids first-run delays during tests |
| Path-based triggers | Only run when relevant files change |
| Content read permissions only | Security-conscious permissions |
9. Test Utilities
Node.js Helpers
nodejs/test/e2e/harness/sdkTestHelper.ts
| Function | Lines | Purpose |
|---|---|---|
| getFinalAssistantMessage(session) | 7-76 | Races existing messages against future events for final assistant response |
| retry(message, fn, maxTries, delay) | 78-99 | Retries an async function up to N times with delay |
| formatError(error) | 101-113 | Safe error formatting (handles objects, circular refs) |
| getNextEventOfType(session, eventType) | 115-130 | Waits for a specific event type from a session stream |
Python Helpers
python/e2e/testharness/helper.py
| Function | Lines | Purpose |
|---|---|---|
| get_final_assistant_message(session, timeout) | 11-55 | Async wait for final assistant message |
| _get_existing_final_response(session) | 58-93 | Check existing messages for completed response |
| write_file(work_dir, filename, content) | 96-111 | Write a file in the test work directory |
| read_file(work_dir, filename) | 114-127 | Read a file from the test work directory |
| get_next_event_of_type(session, event_type, timeout) | 130-163 | Wait for a specific event type |
Go Helpers
go/internal/e2e/testharness/helper.go
| Function | Lines | Purpose |
|---|---|---|
| GetFinalAssistantMessage(ctx, session) | 12-56 | Wait for final assistant message using channels |
| GetNextEventOfType(session, eventType, timeout) | 59-91 | Wait for a specific event type |
| getExistingFinalResponse(ctx, session) | 93-145 | Check existing messages for completed response |
C# Helpers
dotnet/test/Harness/TestHelper.cs
| Method | Lines | Purpose |
|---|---|---|
| GetFinalAssistantMessageAsync(session, timeout) | 9-53 | Async wait with TaskCompletionSource and cancellation |
| GetExistingFinalResponseAsync(session) | 55-75 | Check existing messages |
| GetNextEventOfTypeAsync<T>(session, timeout) | 77-100 | Generic wait for typed events |
Shared Utilities
test/harness/util.ts (lines 1-35)
| Export | Purpose |
|---|---|
| iife<T>(fn) | Immediately-invoked function expression wrapper |
| sleep(ms) | Promise-based sleep |
| ShellConfig.powerShell / .bash | Platform-specific shell tool name mappings for snapshot normalization |
Python conftest.py Fixtures
python/e2e/conftest.py (lines 1-47)
| Fixture/Hook | Scope | Purpose |
|---|---|---|
| pytest_runtest_makereport | Hook | Tracks test failures via item.session.stash to prevent corrupted snapshot writes |
| ctx | Module | Creates and tears down E2ETestContext shared across all tests in a module |
| configure_test | Function (autouse) | Automatically configures the proxy for each test based on module and test name |
C# Test Base
dotnet/test/Harness/E2ETestBase.cs (lines 13-79) — The E2ETestBase abstract class provides:
- Ctx and Client properties from the shared fixture
- InitializeAsync() that calls ConfigureForTestAsync
- CreateSessionAsync() / ResumeSessionAsync() convenience methods (default approve-all)
- GetSystemMessage() / GetToolNames() helpers for exchange inspection
dotnet/test/Harness/E2ETestFixture.cs (lines 10-30) — xUnit IAsyncLifetime fixture:
- Creates E2ETestContext and CopilotClient on initialize
- Calls ForceStopAsync() and DisposeAsync() on teardown
10. Coverage Analysis
Well-Tested Areas
- Session lifecycle: Comprehensive coverage across all 4 SDKs (create, resume, abort, delete, stateful conversation)
- Permission handling: Approve, deny, async handlers, error cases, resume with permissions
- Tool use hooks: Pre/post tool use, deny behavior, both hooks combined
- Client construction and validation: URL parsing, auth options, mutual exclusivity checks
- Custom tools and agents: Registration, MCP server configuration, custom agent setup
- Multi-client scenarios: Tool registration across clients, permission handling, event visibility
- Streaming and event fidelity: Event ordering, field correctness, streaming chunk accuracy
- Cross-platform: All E2E tests run on Linux, macOS, and Windows in CI
- Cross-SDK snapshot sharing: All 4 languages share the same YAML snapshots
Potential Gaps
| # | Gap | Details |
|---|---|---|
| 1 | C# unit tests | .NET SDK has only E2E tests, no pure unit tests (unlike TS, Python, Go) |
| 2 | Error resilience | Only Node.js has error_resilience.test.ts |
| 3 | Built-in tools | Only Node.js tests file ops, shell, grep, find |
| 4 | Event fidelity | Only Node.js has event_fidelity.test.ts |
| 5 | Extended hooks | Only Node.js tests onErrorOccurred, onSessionEnd, etc. |
| 6 | Multi-turn | Only Node.js has multi_turn.test.ts |
| 7 | Session config/lifecycle | Node.js has separate files; others may subsume them |
| 8 | Tool results | Only Node.js has tool_results.test.ts |
| 9 | Scenario execution | CI only verifies compile, not run |
| 10 | Harness self-tests | Edge cases not assessed |
| 11 | Snapshot staleness | No automated orphaned snapshot detection |
Cross-SDK Test Parity Summary
| Test Category | TS | Py | Go | C# |
|---|---|---|---|---|
| Session management | Y | Y | Y | Y |
| Client lifecycle | Y | Y | Y | Y |
| Hooks (pre/post tool) | Y | Y | Y | Y |
| Extended hooks | Y | – | – | – |
| Permissions | Y | Y | Y | Y |
| Custom tools | Y | Y | Y | Y |
| Skills | Y | Y | Y | Y |
| Ask user | Y | Y | Y | Y |
| MCP + agents | Y | Y | Y | Y |
| Multi-client | Y | Y | Y | Y |
| Compaction | Y | Y | Y | Y |
| Streaming fidelity | Y | Y | Y | Y |
| RPC | Y | Y | Y | Y |
| Built-in tools | Y | – | – | – |
| Event fidelity | Y | – | – | – |
| Error resilience | Y | – | – | – |
| Multi-turn | Y | – | – | – |
| Tool results | Y | – | – | – |
| Unit tests | Y (1) | Y (4) | Y (4+2) | Partial |
Legend: Y = has dedicated test file, – = no dedicated test file (may be partially covered elsewhere)
11. Architecture Diagram
flowchart TD
SNAP["test/snapshots/*.yaml\n#40;shared across all SDKs#41;"]
SNAP --> HARNESS["test/harness/\nserver.ts - replayingCapiProxy.ts"]
SNAP --> SCENARIOS["test/scenarios/\n35 scenarios x 4 langs"]
HARNESS --> TS_W["Node.js\nCapiProxy - Context - Helper"]
HARNESS --> PY_W["Python\nproxy - context - helper"]
HARNESS --> GO_W["Go\nproxy - context - helper"]
HARNESS --> CS_W["C#\nContext - Base - Helper"]
TS_W --> TS_E["22 E2E #40;Vitest#41;"]
PY_W --> PY_E["14 E2E #40;pytest#41;"]
GO_W --> GO_E["13 E2E #40;go test#41;"]
CS_W --> CS_E["17 E2E #40;xUnit#41;"]
12. Key Design Decisions
| # | Decision | Rationale |
|---|---|---|
| 1 | Single shared proxy server in TypeScript | Rather than implementing proxy/replay logic in each language, a single Node.js server handles all complexity. Language-specific wrappers are thin HTTP clients (~50-100 lines each). |
| 2 | YAML snapshots as the source of truth | All SDKs share the same snapshots, ensuring behavioral consistency. A snapshot captured by one SDK can be replayed for another. |
| 3 | Record-then-replay pattern | Developers run tests locally to capture new snapshots against real APIs. CI replays without real API access, using fake tokens. |
| 4 | Extensive normalization | System messages, paths, tool call IDs, timestamps, and shell-specific names are all normalized, making snapshots deterministic and cross-platform. |
| 5 | Fail-fast in CI | No silent fallback to live APIs in CI. Missing snapshots produce GitHub Actions error annotations with file/line references. |
| 6 | Prefix matching for multi-turn | A single YAML conversation captures the full multi-turn exchange. The proxy matches requests as conversation prefixes. |
| 7 | Corruption prevention | Snapshots are not written when tests fail, avoiding storage of incorrect behavior. |