From eec1f3e9dba9a7c44ff9db2ab4deed1c621141c2 Mon Sep 17 00:00:00 2001
From: Daniel Wondyifraw <dadenew.yyt@gmail.com>
Date: Mon, 16 Feb 2026 23:16:08 +0100
Subject: [PATCH] fix: address code review feedback - move test data, fix
 patterns, rewrite docs as RFC

---
 docs/telegram-sanitizer.md                    | 84 +++++++++++--------
 .../test-data}/telegram_leak_cases.json       |  6 +-
 2 files changed, 50 insertions(+), 40 deletions(-)
 rename {tests/data => src/telegram/test-data}/telegram_leak_cases.json (96%)
diff --git a/docs/telegram-sanitizer.md b/docs/telegram-sanitizer.md
index 0a6e595ac..391476267 100644
--- a/docs/telegram-sanitizer.md
+++ b/docs/telegram-sanitizer.md
@@ -1,29 +1,25 @@
-# Telegram Outbound Sanitizer
+# Telegram Outbound Sanitizer (RFC)
 
-This document describes the Telegram outbound sanitizer behavior for preventing internal diagnostics and wrapper artifacts from reaching end users.
+> **Status**: Proposal / Request for Comments
+>
+> This document proposes a sanitization layer for Telegram outbound messages. The accompanying test corpus (`src/telegram/test-data/telegram-leak-cases.json`) defines the expected behavior for a future implementation.
 
 ## Overview
 
-The sanitizer intercepts Telegram outbound messages and:
+The sanitizer would intercept Telegram outbound messages and:
 
-1. Strips wrapper artifacts (`<reply>`, `<NO_REPLY>`, `<tool_schema>`, etc.)
-2. Drops internal diagnostics (error codes, run IDs, gateway details)
-3. Returns static responses for unknown slash commands
+1. Strip wrapper artifacts (`<reply>`, `<NO_REPLY>`, `<tool_schema>`, etc.)
+2. Drop internal diagnostics (error codes, run IDs, gateway details)
+3. Return static responses for unknown slash commands
 
-## Marker Families
-
-Static checks verify these marker families:
-
-- `OPENCLAW_TELEGRAM_OUTBOUND_SANITIZER`
-- `OPENCLAW_TELEGRAM_INTERNAL_ERROR_SUPPRESSOR`
-
-## Leakage Patterns Blocked
+## Leakage Patterns to Block
 
 ### Tool/Runtime Leakage
 
 - `tool call validation failed`
 - `not in request.tools`
-- `sessions_send` templates / `function_call`
+- `sessions_send` templates
+- `"type": "function_call"` JSON scaffolding
 - `Run ID`, `Status: error`, gateway timeout/connect details
 
 ### Media/Tool Scaffolding
@@ -34,38 +30,52 @@ Static checks verify these marker families:
 ### Sentinel/Garbage Markers
 
 - `NO_CONTEXT`, `NOCONTENT`, `NO_MESSAGE_CONTENT_HERE`
-- `NO_DATA_FOUND`, `NO_API_KEY`
+- `NO_DATA`, `NO_API_KEY`
 
-## Enforced Behavior
+## Proposed Behavior
 
-1. **Unknown slash commands** → static text response
+1. **Unknown slash commands** → static text response (`"Unknown command. Use /help."`)
 2. **Unknown slash commands** → does NOT call LLM
 3. **Telegram output** → never emits tool diagnostics/internal runtime details
-4. **Optional debug override** → owner-only with `TELEGRAM_DEBUG=true`
+4. **Optional debug override** → owner-only (configurable)
 
-## Verification
+## Test Corpus
 
-Run the leak corpus tests:
+The test corpus at `src/telegram/test-data/telegram-leak-cases.json` defines:
+
+- `expect: "allow"` - Messages that should pass through unchanged
+- `expect: "drop"` - Messages that should be blocked entirely
+- `expect: "strip_wrapper"` - Messages that need wrapper tags removed
+
+### Example Test Cases
+
+```json
+{
+  "id": "diag_tool_validation_failed",
+  "text": "tool call validation failed",
+  "expect": "drop",
+  "description": "Tool runtime error should not reach users"
+}
+```
+
+## Implementation Guidance
+
+When implementing the sanitizer:
+
+- Run sanitization after LLM response, before Telegram API send
+- Empty payloads after sanitization should return a safe fallback message
+- Preserve return shape `{ queuedFinal, counts }` for caller compatibility
+- Use specific patterns (e.g., `"type": "function_call"` not just `function_call`) to avoid false positives
+
+## Validation
+
+Once implemented, validate with:
 
 ```bash
-# Run leak case corpus validation
-pnpm test src/telegram/sanitizer.test.ts
+# Run tests against the leak corpus
+pnpm vitest run src/telegram/sanitizer.test.ts
 
 # Manual smoke check
 # In any Telegram chat: /unknown_command
 # Expected: "Unknown command. Use /help."
 ```
-
-## Test Corpus
-
-The test corpus at `tests/data/telegram_leak_cases.json` contains:
-
-- `expect: "allow"` - Messages that should pass through
-- `expect: "drop"` - Messages that should be blocked
-- `expect: "strip_wrapper"` - Messages that need wrapper removal
-
-## Implementation Notes
-
-- Sanitization runs after LLM response, before Telegram API send
-- Empty payloads after sanitization return fallback message
-- Return shape `{ queuedFinal, counts }` is preserved for caller safety
diff --git a/tests/data/telegram_leak_cases.json b/src/telegram/test-data/telegram_leak_cases.json
similarity index 96%
rename from tests/data/telegram_leak_cases.json
rename to src/telegram/test-data/telegram_leak_cases.json
index d4a0b3997..69eab9688 100644
--- a/tests/data/telegram_leak_cases.json
+++ b/src/telegram/test-data/telegram_leak_cases.json
@@ -132,10 +132,10 @@
     "description": "Internal tool name leak"
   },
   {
-    "id": "diag_function_call",
-    "text": "function_call",
+    "id": "diag_function_call_json",
+    "text": "\"type\": \"function_call\"",
     "expect": "drop",
-    "description": "Function call scaffolding leak"
+    "description": "JSON function_call scaffolding leak (specific pattern to avoid false positives)"
   },
   {
     "id": "diag_media_prefix",