test: add setup-token live smoke

2026-01-10 21:40:59 +00:00
parent ad17966e2f
commit 805a29252e
2 changed files with 282 additions and 4 deletions
--- a/docs/testing.md
+++ b/docs/testing.md
@@ -133,7 +133,7 @@ Live tests are split into two layers so we can isolate failures:
 - Optional tool-calling stress:
  - `CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1` enables an extra “bash writes file → read reads it back → echo nonce” check.
  - This is specifically meant to catch tool-calling compatibility issues across providers (formatting, history replay, tool_result pairing, etc.).
- Optional image send smoke:
+  - Optional image send smoke:
  - `CLAWDBOT_LIVE_GATEWAY_IMAGE_PROBE=1` sends a real image attachment through the gateway agent pipeline (multimodal message) and asserts the model can read back a per-run code from the image.
  - Flow (high level):
    - Test generates a tiny PNG with “CAT” + random code (`src/gateway/live-image-probe.ts`)
@@ -142,6 +142,26 @@ Live tests are split into two layers so we can isolate failures:
    - Embedded agent forwards a multimodal user message to the model
    - Assertion: reply contains `cat` + the code (OCR tolerance: minor mistakes allowed)

+## Live: Anthropic setup-token smoke
+
+- Test: `src/agents/anthropic.setup-token.live.test.ts`
+- Goal: verify Claude CLI setup-token (or a pasted setup-token profile) can complete an Anthropic prompt.
+- Enable:
+  - `CLAWDBOT_LIVE_TEST=1` or `LIVE=1`
+  - `CLAWDBOT_LIVE_SETUP_TOKEN=1`
+- Token sources (pick one):
+  - Profile: `CLAWDBOT_LIVE_SETUP_TOKEN_PROFILE=anthropic:setup-token-test`
+  - Raw token: `CLAWDBOT_LIVE_SETUP_TOKEN_VALUE=sk-ant-oat01-...`
+- Model override (optional):
+  - `CLAWDBOT_LIVE_SETUP_TOKEN_MODEL=anthropic/claude-opus-4-5`
+
+Setup example:
+
+```bash
+clawdbot models auth paste-token --provider anthropic --profile-id anthropic:setup-token-test
+CLAWDBOT_LIVE_TEST=1 CLAWDBOT_LIVE_SETUP_TOKEN=1 CLAWDBOT_LIVE_SETUP_TOKEN_PROFILE=anthropic:setup-token-test pnpm test:live src/agents/anthropic.setup-token.live.test.ts
+```
+
 ### Recommended live recipes

 Narrow, explicit allowlists are fastest and least flaky:
@@ -153,22 +173,41 @@ Narrow, explicit allowlists are fastest and least flaky:
  - `LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1 CLAWDBOT_LIVE_GATEWAY_MODELS="openai/gpt-5.2" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`

 - Tool calling across several providers (bash + read probe):
-  - `LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1 CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1 CLAWDBOT_LIVE_GATEWAY_MODELS="openai/gpt-5.2,anthropic/claude-opus-4-5,google/gemini-flash-latest,zai/glm-4.7,minimax/minimax-m2.1" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
+  - `LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1 CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1 CLAWDBOT_LIVE_GATEWAY_MODELS="openai/gpt-5.2,anthropic/claude-opus-4-5,google/gemini-3-flash-preview,zai/glm-4.7,minimax/minimax-m2.1" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`

 - Google focus (Gemini API key + Antigravity):
-  - Gemini (API key): `LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1 CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1 CLAWDBOT_LIVE_GATEWAY_IMAGE_PROBE=1 CLAWDBOT_LIVE_GATEWAY_MODELS="google/gemini-flash-latest" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
+  - Gemini (API key): `LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1 CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1 CLAWDBOT_LIVE_GATEWAY_IMAGE_PROBE=1 CLAWDBOT_LIVE_GATEWAY_MODELS="google/gemini-3-flash-preview" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
  - Antigravity (OAuth): `LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_ALL_MODELS=1 CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1 CLAWDBOT_LIVE_GATEWAY_IMAGE_PROBE=1 CLAWDBOT_LIVE_GATEWAY_MODELS="google-antigravity/claude-opus-4-5-thinking,google-antigravity/gemini-3-pro-high" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`

+Notes:
+- `google/...` uses the Gemini API (API key).
+- `google-antigravity/...` uses the Antigravity OAuth bridge (Cloud Code Assist-style agent endpoint).
+- `google-gemini-cli/...` uses the local Gemini CLI on your machine (separate auth + tooling quirks).
+
 ## Live: model matrix (what we cover)

 There is no fixed “CI model list” (live is opt-in), but these are the **recommended** models to cover regularly on a dev machine with keys.

+### Modern smoke set (tool calling + image)
+
+This is the “common models” run we expect to keep working:
+- OpenAI (non-Codex): `openai/gpt-5.2` (optional: `openai/gpt-5.1`)
+- OpenAI Codex: `openai-codex/gpt-5.2` (optional: `openai-codex/gpt-5.2-codex`)
+- Anthropic: `anthropic/claude-opus-4-5` (or `anthropic/claude-sonnet-4-5`)
+- Google (Gemini API): `google/gemini-3-pro-preview` and `google/gemini-3-flash-preview`
+- Google (Antigravity): `google-antigravity/claude-opus-4-5-thinking` and `google-antigravity/gemini-3-flash`
+- Z.AI (GLM): `zai/glm-4.7`
+- MiniMax: `minimax/minimax-m2.1`
+
+Run gateway smoke with tools + image:
+`LIVE=1 CLAWDBOT_LIVE_GATEWAY=1 CLAWDBOT_LIVE_GATEWAY_TOOL_PROBE=1 CLAWDBOT_LIVE_GATEWAY_IMAGE_PROBE=1 CLAWDBOT_LIVE_GATEWAY_MODELS="openai/gpt-5.2,openai-codex/gpt-5.2,anthropic/claude-opus-4-5,google/gemini-3-pro-preview,google/gemini-3-flash-preview,google-antigravity/claude-opus-4-5-thinking,google-antigravity/gemini-3-flash,zai/glm-4.7,minimax/minimax-m2.1" pnpm test:live src/gateway/gateway-models.profiles.live.test.ts`
+
 ### Baseline: tool calling (Read + optional Bash)

 Pick at least one per provider family:
 - OpenAI: `openai/gpt-5.2` (or `openai/gpt-5-mini`)
 - Anthropic: `anthropic/claude-opus-4-5` (or `anthropic/claude-sonnet-4-5`)
- Google: `google/gemini-flash-latest` (or `google/gemini-2.5-pro`)
+- Google: `google/gemini-3-flash-preview` (or `google/gemini-3-pro-preview`)
 - Z.AI (GLM): `zai/glm-4.7`
 - MiniMax: `minimax/minimax-m2.1`

--- a/src/agents/anthropic.setup-token.live.test.ts
+++ b/src/agents/anthropic.setup-token.live.test.ts
@@ -0,0 +1,239 @@
+import { randomUUID } from "node:crypto";
+import fs from "node:fs/promises";
+import os from "node:os";
+import path from "node:path";
+
+import { type Api, completeSimple, type Model } from "@mariozechner/pi-ai";
+import {
+  discoverAuthStorage,
+  discoverModels,
+} from "@mariozechner/pi-coding-agent";
+import { describe, expect, it } from "vitest";
+import {
+  ANTHROPIC_SETUP_TOKEN_PREFIX,
+  validateAnthropicSetupToken,
+} from "../commands/auth-token.js";
+import { loadConfig } from "../config/config.js";
+import { resolveClawdbotAgentDir } from "./agent-paths.js";
+import {
+  type AuthProfileCredential,
+  ensureAuthProfileStore,
+  saveAuthProfileStore,
+} from "./auth-profiles.js";
+import { getApiKeyForModel } from "./model-auth.js";
+import { normalizeProviderId, parseModelRef } from "./model-selection.js";
+import { ensureClawdbotModelsJson } from "./models-config.js";
+
+const LIVE = process.env.LIVE === "1" || process.env.CLAWDBOT_LIVE_TEST === "1";
+const SETUP_TOKEN_RAW = process.env.CLAWDBOT_LIVE_SETUP_TOKEN?.trim() ?? "";
+const SETUP_TOKEN_VALUE =
+  process.env.CLAWDBOT_LIVE_SETUP_TOKEN_VALUE?.trim() ?? "";
+const SETUP_TOKEN_PROFILE =
+  process.env.CLAWDBOT_LIVE_SETUP_TOKEN_PROFILE?.trim() ?? "";
+const SETUP_TOKEN_MODEL =
+  process.env.CLAWDBOT_LIVE_SETUP_TOKEN_MODEL?.trim() ?? "";
+
+const ENABLED =
+  LIVE && Boolean(SETUP_TOKEN_RAW || SETUP_TOKEN_VALUE || SETUP_TOKEN_PROFILE);
+const describeLive = ENABLED ? describe : describe.skip;
+
+type TokenSource = {
+  agentDir: string;
+  profileId: string;
+  cleanup?: () => Promise<void>;
+};
+
+function isSetupToken(value: string): boolean {
+  return value.startsWith(ANTHROPIC_SETUP_TOKEN_PREFIX);
+}
+
+function listSetupTokenProfiles(store: {
+  profiles: Record<string, AuthProfileCredential>;
+}): string[] {
+  return Object.entries(store.profiles)
+    .filter(([, cred]) => {
+      if (cred.type !== "token") return false;
+      if (normalizeProviderId(cred.provider) !== "anthropic") return false;
+      return isSetupToken(cred.token);
+    })
+    .map(([id]) => id);
+}
+
+function pickSetupTokenProfile(candidates: string[]): string {
+  const preferred = [
+    "anthropic:setup-token-test",
+    "anthropic:setup-token",
+    "anthropic:default",
+  ];
+  for (const id of preferred) {
+    if (candidates.includes(id)) return id;
+  }
+  return candidates[0] ?? "";
+}
+
+async function resolveTokenSource(): Promise<TokenSource> {
+  const explicitToken =
+    (SETUP_TOKEN_RAW && isSetupToken(SETUP_TOKEN_RAW) ? SETUP_TOKEN_RAW : "") ||
+    SETUP_TOKEN_VALUE;
+
+  if (explicitToken) {
+    const error = validateAnthropicSetupToken(explicitToken);
+    if (error) {
+      throw new Error(`Invalid setup-token: ${error}`);
+    }
+    const tempDir = await fs.mkdtemp(
+      path.join(os.tmpdir(), "clawdbot-setup-token-"),
+    );
+    const profileId = `anthropic:setup-token-live-${randomUUID()}`;
+    const store = ensureAuthProfileStore(tempDir, {
+      allowKeychainPrompt: false,
+    });
+    store.profiles[profileId] = {
+      type: "token",
+      provider: "anthropic",
+      token: explicitToken,
+    };
+    saveAuthProfileStore(store, tempDir);
+    return {
+      agentDir: tempDir,
+      profileId,
+      cleanup: async () => {
+        await fs.rm(tempDir, { recursive: true, force: true });
+      },
+    };
+  }
+
+  const agentDir = resolveClawdbotAgentDir();
+  const store = ensureAuthProfileStore(agentDir, {
+    allowKeychainPrompt: false,
+  });
+
+  const candidates = listSetupTokenProfiles(store);
+  if (SETUP_TOKEN_PROFILE) {
+    if (!candidates.includes(SETUP_TOKEN_PROFILE)) {
+      const available =
+        candidates.length > 0 ? candidates.join(", ") : "(none)";
+      throw new Error(
+        `Setup-token profile "${SETUP_TOKEN_PROFILE}" not found. Available: ${available}.`,
+      );
+    }
+    return { agentDir, profileId: SETUP_TOKEN_PROFILE };
+  }
+
+  if (
+    SETUP_TOKEN_RAW &&
+    SETUP_TOKEN_RAW !== "1" &&
+    SETUP_TOKEN_RAW !== "auto"
+  ) {
+    throw new Error(
+      "CLAWDBOT_LIVE_SETUP_TOKEN did not look like a setup-token. Use CLAWDBOT_LIVE_SETUP_TOKEN_VALUE for raw tokens.",
+    );
+  }
+
+  if (candidates.length === 0) {
+    throw new Error(
+      "No Anthropic setup-token profiles found. Set CLAWDBOT_LIVE_SETUP_TOKEN_VALUE or CLAWDBOT_LIVE_SETUP_TOKEN_PROFILE.",
+    );
+  }
+  return { agentDir, profileId: pickSetupTokenProfile(candidates) };
+}
+
+function pickModel(models: Array<Model<Api>>, raw?: string): Model<Api> | null {
+  const normalized = raw?.trim() ?? "";
+  if (normalized) {
+    const parsed = parseModelRef(normalized, "anthropic");
+    if (!parsed) return null;
+    return (
+      models.find(
+        (model) =>
+          normalizeProviderId(model.provider) === parsed.provider &&
+          model.id === parsed.model,
+      ) ?? null
+    );
+  }
+
+  const preferred = [
+    "claude-opus-4-5",
+    "claude-sonnet-4-5",
+    "claude-sonnet-4-0",
+    "claude-haiku-3-5",
+  ];
+  for (const id of preferred) {
+    const match = models.find((model) => model.id === id);
+    if (match) return match;
+  }
+  return models[0] ?? null;
+}
+
+describeLive("live anthropic setup-token", () => {
+  it(
+    "completes using a setup-token profile",
+    async () => {
+      const tokenSource = await resolveTokenSource();
+      try {
+        const cfg = loadConfig();
+        await ensureClawdbotModelsJson(cfg, tokenSource.agentDir);
+
+        const authStorage = discoverAuthStorage(tokenSource.agentDir);
+        const modelRegistry = discoverModels(authStorage, tokenSource.agentDir);
+        const all = Array.isArray(modelRegistry)
+          ? modelRegistry
+          : modelRegistry.getAll();
+        const candidates = all.filter(
+          (model) => normalizeProviderId(model.provider) === "anthropic",
+        ) as Array<Model<Api>>;
+        expect(candidates.length).toBeGreaterThan(0);
+
+        const model = pickModel(candidates, SETUP_TOKEN_MODEL);
+        if (!model) {
+          throw new Error(
+            SETUP_TOKEN_MODEL
+              ? `Model not found: ${SETUP_TOKEN_MODEL}`
+              : "No Anthropic models available.",
+          );
+        }
+
+        const apiKeyInfo = await getApiKeyForModel({
+          model,
+          cfg,
+          profileId: tokenSource.profileId,
+          agentDir: tokenSource.agentDir,
+        });
+        const tokenError = validateAnthropicSetupToken(apiKeyInfo.apiKey);
+        if (tokenError) {
+          throw new Error(
+            `Resolved profile is not a setup-token: ${tokenError}`,
+          );
+        }
+
+        const res = await completeSimple(
+          model,
+          {
+            messages: [
+              {
+                role: "user",
+                content: "Reply with the word ok.",
+                timestamp: Date.now(),
+              },
+            ],
+          },
+          {
+            apiKey: apiKeyInfo.apiKey,
+            maxTokens: 64,
+            temperature: 0,
+          },
+        );
+        const text = res.content
+          .filter((block) => block.type === "text")
+          .map((block) => block.text.trim())
+          .join(" ");
+        expect(text.toLowerCase()).toContain("ok");
+      } finally {
+        if (tokenSource.cleanup) {
+          await tokenSource.cleanup();
+        }
+      }
+    },
+    5 * 60 * 1000,
+  );
+});
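
Note on `resolveTokenSource`: the order in which the three environment variables are consulted is easy to miss when reading the diff. The sketch below restates that precedence as a pure function, with no filesystem or auth-store access. It is an illustration, not part of the commit: `resolveSource`, `Env`, `Resolution`, and `ASSUMED_PREFIX` are hypothetical names, and the prefix value is assumed from the `sk-ant-oat01-...` example in the docs rather than read from `ANTHROPIC_SETUP_TOKEN_PREFIX`.

```typescript
// Assumed stand-in for ANTHROPIC_SETUP_TOKEN_PREFIX (the real constant lives in
// ../commands/auth-token.js); the value is taken from the docs example only.
const ASSUMED_PREFIX = "sk-ant-oat01-";

// Trimmed values of CLAWDBOT_LIVE_SETUP_TOKEN, ..._VALUE and ..._PROFILE.
type Env = { raw: string; value: string; profile: string };

type Resolution =
  | { kind: "explicit-token"; token: string }
  | { kind: "profile"; profileId: string }
  | { kind: "error"; reason: string };

// Same preference order as pickSetupTokenProfile in the test file.
const PREFERRED = [
  "anthropic:setup-token-test",
  "anthropic:setup-token",
  "anthropic:default",
];

function resolveSource(env: Env, candidates: string[]): Resolution {
  // 1. A token-shaped CLAWDBOT_LIVE_SETUP_TOKEN wins, then ..._VALUE.
  const explicit =
    (env.raw.startsWith(ASSUMED_PREFIX) ? env.raw : "") || env.value;
  if (explicit) return { kind: "explicit-token", token: explicit };

  // 2. A named profile must exist among the stored setup-token profiles.
  if (env.profile) {
    return candidates.includes(env.profile)
      ? { kind: "profile", profileId: env.profile }
      : { kind: "error", reason: "profile not found" };
  }

  // 3. Anything other than "1" or "auto" in the raw flag is rejected.
  if (env.raw && env.raw !== "1" && env.raw !== "auto") {
    return { kind: "error", reason: "raw value is not a setup-token" };
  }

  // 4. Otherwise fall back to the preferred stored profile, if any exists.
  if (candidates.length === 0) {
    return { kind: "error", reason: "no setup-token profiles" };
  }
  const picked =
    PREFERRED.find((id) => candidates.includes(id)) ?? candidates[0] ?? "";
  return { kind: "profile", profileId: picked };
}
```

With `CLAWDBOT_LIVE_SETUP_TOKEN=1` and no other variables set, the sketch falls through to step 4 and selects a stored profile. The test file additionally validates explicit tokens with `validateAnthropicSetupToken` and writes them to a temporary auth store, which the sketch deliberately leaves out.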