diff --git a/.gitignore b/.gitignore
index 972264bcc..cc1a82e2a 100644
--- a/.gitignore
+++ b/.gitignore
@@ -8,6 +8,7 @@ coverage
 .worktrees/
 .DS_Store
 **/.DS_Store
+ui/src/ui/__screenshots__/
 
 # Bun build artifacts
 *.bun-build
diff --git a/AGENTS.md b/AGENTS.md
index a8d574840..26b42fd9a 100644
--- a/AGENTS.md
+++ b/AGENTS.md
@@ -23,6 +23,7 @@
 - Naming: match source names with `*.test.ts`; e2e in `*.e2e.test.ts`.
 - Run `pnpm test` (or `pnpm test:coverage`) before pushing when you touch logic.
 - Pure test additions/fixes generally do **not** need a changelog entry unless they alter user-facing behavior or the user asks for one.
+- Mobile: before using a simulator, check for connected real devices (iOS + Android) and prefer them when available.
 
 ## Commit & Pull Request Guidelines
 - Create commits with `scripts/committer "" `; avoid manual `git add`/`git commit` so staging stays scoped.
@@ -41,6 +42,9 @@
 - Also read the shared guardrails at `~/Projects/oracle/AGENTS.md` and `~/Projects/agent-scripts/AGENTS.MD` before making changes; align with any cross-repo rules noted there.
 - SwiftUI state management (iOS/macOS): prefer the `Observation` framework (`@Observable`, `@Bindable`) over `ObservableObject`/`@StateObject`; don’t introduce new `ObservableObject` unless required for compatibility, and migrate existing usages when touching related code.
 - **Restart apps:** “restart iOS/Android apps” means rebuild (recompile/install) and relaunch, not just kill/launch.
+- **Device checks:** before testing, verify connected real devices (iOS/Android) before reaching for simulators/emulators.
+- iOS Team ID lookup: `security find-identity -p codesigning -v` → use Apple Development (…) TEAMID. Fallback: `defaults read com.apple.dt.Xcode IDEProvisioningTeamIdentifiers`.
+- A2UI bundle hash: `src/canvas-host/a2ui/.bundle.hash` is auto-generated; regenerate via `pnpm canvas:a2ui:bundle` (or `scripts/bundle-a2ui.sh`) instead of manual conflict resolution.
 - Notary key file lives at `~/Library/CloudStorage/Dropbox/Backup/AppStore/AuthKey_NJF3NFGTS3.p8` (Sparkle keys live under `~/Library/CloudStorage/Dropbox/Backup/Sparkle`).
 - **Multi-agent safety:** do **not** create/apply/drop `git stash` entries unless Peter explicitly asks (this includes `git pull --rebase --autostash`). Assume other agents may be working; keep unrelated WIP untouched and avoid cross-cutting state changes.
 - **Multi-agent safety:** when Peter says "push", you may `git pull --rebase` to integrate latest changes (never discard other agents' work). When Peter says "commit", scope to your changes only. When Peter says "commit all", commit everything in grouped chunks.
diff --git a/CHANGELOG.md b/CHANGELOG.md
index 27a8716b5..194f68b4e 100644
--- a/CHANGELOG.md
+++ b/CHANGELOG.md
@@ -2,8 +2,66 @@
 
 ## 2.0.0-beta5 — Unreleased
 
+### Features
+- Talk mode: continuous speech conversations (macOS/iOS/Android) with ElevenLabs TTS, reply directives, and optional interrupt-on-speech.
+- UI: add optional `ui.seamColor` accent to tint the Talk Mode side bubble (macOS/iOS/Android).
+- Agent runtime: accept legacy `Z_AI_API_KEY` for Z.AI provider auth (maps to `ZAI_API_KEY`).
+- Tests: add a Z.AI live test gate for smoke validation when keys are present.
+- macOS Debug: add app log verbosity and rolling file log toggle for swift-log-backed app logs.
+
 ### Fixes
+- Docs/agent tools: clarify that browser `wait` should be avoided by default and used only in exceptional cases.
 - macOS: Voice Wake now fully tears down the Speech pipeline when disabled (cancel pending restarts, drop stale callbacks) to avoid high CPU in the background.
+- macOS menu: add a Talk Mode action alongside the Open Dashboard/Chat/Canvas entries.
+- macOS Debug: hide “Restart Gateway” when the app won’t start a local gateway (remote mode / attach-only).
+- macOS Talk Mode: orb overlay refresh, ElevenLabs request logging, API key status in settings, and auto-select first voice when none is configured.
+- macOS Talk Mode: add hard timeout around ElevenLabs TTS synthesis to avoid getting stuck “speaking” forever on hung requests.
+- macOS Talk Mode: avoid stuck playback when the audio player never starts (fail-fast + watchdog).
+- macOS Talk Mode: fix audio stop ordering so disabling Talk Mode always stops in-flight playback.
+- macOS Talk Mode: throttle audio-level updates (avoid per-buffer task creation) to reduce CPU/task churn.
+- macOS Talk Mode: increase overlay window size so wave rings don’t clip; close button is hover-only and closer to the orb.
+- Talk Mode: fall back to system TTS when ElevenLabs is unavailable, returns non-audio, or playback fails (macOS/iOS/Android).
+- Talk Mode: stream PCM on macOS/iOS for lower latency (incremental playback); Android continues MP3 streaming.
+- Talk Mode: validate ElevenLabs v3 stability and latency tier directives before sending requests.
+- iOS/Android Talk Mode: auto-select the first ElevenLabs voice when none is configured.
+- ElevenLabs: add retry/backoff for 429/5xx and include content-type in errors for debugging.
+- Talk Mode: align to the gateway’s main session key and fall back to history polling when chat events drop (prevents stuck “thinking” / missing messages).
+- Talk Mode: treat history timestamps as seconds or milliseconds to avoid stale assistant picks (macOS/iOS/Android).
+- Chat UI: clear streaming/tool bubbles when external runs finish, preventing duplicate assistant bubbles.
+- Chat UI: user bubbles use `ui.seamColor` (fallback to a calmer default blue).
+- Android Chat UI: use `onPrimary` for user bubble text to preserve contrast (thanks @Syhids).
+- Control UI: sync sidebar navigation with the URL for deep-linking, and auto-scroll chat to the latest message.
+- Control UI: disable Web Chat + Talk when no iOS/Android node is connected; refreshed Web Chat styling and keyboard send.
+- macOS: bundle Control UI assets into the app relay so the packaged app can serve them (thanks @mbelinky).
+- Talk Mode: wait for chat history to surface the assistant reply before starting TTS (macOS/iOS/Android).
+- iOS Talk Mode: fix chat completion wait to time out even if no events arrive (prevents “Thinking…” hangs).
+- iOS Talk Mode: keep recognition running during playback to support interrupt-on-speech.
+- iOS Talk Mode: preserve directive voice/model overrides across config reloads and add ElevenLabs request timeouts.
+- iOS/Android Talk Mode: explicitly `chat.subscribe` when Talk Mode is active, so completion events arrive even if the Chat UI isn’t open.
+- Chat UI: refresh history when another client finishes a run in the same session, so Talk Mode + Voice Wake transcripts appear consistently.
+- Gateway: `voice.transcript` now also maps agent bus output to `chat` events, ensuring chat UIs refresh for voice-triggered runs.
+- iOS/Android: show a centered Talk Mode orb overlay while Talk Mode is enabled.
+- Gateway config: inject `talk.apiKey` from `ELEVENLABS_API_KEY`/shell profile so nodes can fetch it on demand.
+- Canvas A2UI: tag requests with `platform=android|ios|macos` and boost Android canvas background contrast.
+- iOS/Android nodes: enable scrolling for loaded web pages in the Canvas WebView (default scaffold stays touch-first).
+- macOS menu: device list now uses `node.list` (devices only; no agent/tool presence entries).
+- macOS menu: device list now shows connected nodes only.
+- macOS menu: device rows now pack platform/version on the first line, and command lists wrap in submenus.
+- macOS menu: split device platform/version across first and second rows for better fit.
+- iOS node: fix ReplayKit screen recording crash caused by queue isolation assertions during capture.
+- iOS Talk Mode: avoid audio tap queue assertions when starting recognition.
+- macOS: use $HOME/Library/pnpm for SSH PATH exports (thanks @mbelinky).
+- iOS/Android nodes: bridge auto-connect refreshes stale tokens and settings now show richer bridge/device details.
+- macOS: bundle device model resources to prevent Instances crashes (thanks @mbelinky).
+- iOS/Android nodes: status pill now surfaces camera activity instead of overlay toasts.
+- iOS/Android/macOS nodes: camera snaps recompress to keep base64 payloads under 5 MB.
+- iOS/Android nodes: status pill now surfaces pairing, screen recording, voice wake, and foreground-required states.
+- iOS/Android nodes: avoid duplicating “Gateway reconnecting…” when the bridge is already connecting.
+- iOS/Android nodes: Talk Mode now lives on a side bubble (with an iOS toggle to hide it), and Android settings no longer show the Talk Mode switch.
+- macOS menu: top status line now shows pending node pairing approvals (incl. repairs).
+- CLI: avoid spurious gateway close errors after successful request/response cycles.
+- Agent runtime: clamp tool-result images to the 5MB Anthropic limit to avoid hard request rejections.
+- Tests: add Swift Testing coverage for camera errors and Kotest coverage for Android bridge endpoints.
 
 ## 2.0.0-beta4 — 2025-12-27
diff --git a/README.md b/README.md
index 2cff9633b..fca93400e 100644
--- a/README.md
+++ b/README.md
@@ -19,6 +19,8 @@ It answers you on the surfaces you already use (WhatsApp, Telegram, Discord, Web
 If you want a private, single-user assistant that feels local, fast, and always-on, this is it.
 
+Using Claude Pro/Max subscription? See `docs/onboarding.md` for the Anthropic OAuth setup.
+
 ```
 Your surfaces
        │
diff --git a/apps/android/app/build.gradle.kts b/apps/android/app/build.gradle.kts
index c2f1cd817..db3b17dca 100644
--- a/apps/android/app/build.gradle.kts
+++ b/apps/android/app/build.gradle.kts
@@ -64,6 +64,7 @@ dependencies {
   implementation("androidx.core:core-ktx:1.17.0")
   implementation("androidx.lifecycle:lifecycle-runtime-ktx:2.10.0")
   implementation("androidx.activity:activity-compose:1.12.2")
+  implementation("androidx.webkit:webkit:1.14.0")
 
   implementation("androidx.compose.ui:ui")
   implementation("androidx.compose.ui:ui-tooling-preview")
@@ -93,4 +94,11 @@ dependencies {
   testImplementation("junit:junit:4.13.2")
   testImplementation("org.jetbrains.kotlinx:kotlinx-coroutines-test:1.10.2")
+  testImplementation("io.kotest:kotest-runner-junit5-jvm:6.0.7")
+  testImplementation("io.kotest:kotest-assertions-core-jvm:6.0.7")
+  testRuntimeOnly("org.junit.vintage:junit-vintage-engine:5.13.3")
+}
+
+tasks.withType<Test>().configureEach {
+  useJUnitPlatform()
 }
diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/MainViewModel.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/MainViewModel.kt
index 28d702975..e93bd432a 100644
--- a/apps/android/app/src/main/java/com/steipete/clawdis/node/MainViewModel.kt
+++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/MainViewModel.kt
@@ -23,9 +23,12 @@ class MainViewModel(app: Application) : AndroidViewModel(app) {
   val statusText: StateFlow<String> = runtime.statusText
   val serverName: StateFlow<String?> = runtime.serverName
   val remoteAddress: StateFlow<String?> = runtime.remoteAddress
+  val isForeground: StateFlow<Boolean> = runtime.isForeground
+  val seamColorArgb: StateFlow<Long> = runtime.seamColorArgb
 
   val cameraHud: StateFlow<CameraHudState?> = runtime.cameraHud
   val cameraFlashToken: StateFlow<Long> = runtime.cameraFlashToken
+  val screenRecordActive: StateFlow<Boolean> = runtime.screenRecordActive
 
   val instanceId: StateFlow<String> = runtime.instanceId
   val displayName: StateFlow<String> = runtime.displayName
@@ -35,6 +38,10 @@ class MainViewModel(app: Application) : AndroidViewModel(app) {
   val voiceWakeMode: StateFlow<VoiceWakeMode> = runtime.voiceWakeMode
   val voiceWakeStatusText: StateFlow<String> = runtime.voiceWakeStatusText
   val voiceWakeIsListening: StateFlow<Boolean> = runtime.voiceWakeIsListening
+  val talkEnabled: StateFlow<Boolean> = runtime.talkEnabled
+  val talkStatusText: StateFlow<String> = runtime.talkStatusText
+  val talkIsListening: StateFlow<Boolean> = runtime.talkIsListening
+  val talkIsSpeaking: StateFlow<Boolean> = runtime.talkIsSpeaking
   val manualEnabled: StateFlow<Boolean> = runtime.manualEnabled
   val manualHost: StateFlow<String> = runtime.manualHost
   val manualPort: StateFlow<String> = runtime.manualPort
@@ -95,6 +102,10 @@ class MainViewModel(app: Application) : AndroidViewModel(app) {
     runtime.setVoiceWakeMode(mode)
   }
 
+  fun setTalkEnabled(enabled: Boolean) {
+    runtime.setTalkEnabled(enabled)
+  }
+
   fun connect(endpoint: BridgeEndpoint) {
     runtime.connect(endpoint)
   }
diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/NodeRuntime.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/NodeRuntime.kt
index 0ade08e3b..1b63a4948 100644
--- a/apps/android/app/src/main/java/com/steipete/clawdis/node/NodeRuntime.kt
+++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/NodeRuntime.kt
@@ -25,6 +25,7 @@ import com.steipete.clawdis.node.protocol.ClawdisCanvasA2UIAction
 import com.steipete.clawdis.node.protocol.ClawdisCanvasA2UICommand
 import com.steipete.clawdis.node.protocol.ClawdisCanvasCommand
 import com.steipete.clawdis.node.protocol.ClawdisScreenCommand
+import com.steipete.clawdis.node.voice.TalkModeManager
 import com.steipete.clawdis.node.voice.VoiceWakeManager
 import kotlinx.coroutines.CoroutineScope
 import kotlinx.coroutines.Dispatchers
@@ -69,7 +70,7 @@ class NodeRuntime(context: Context) {
       payloadJson =
         buildJsonObject {
           put("message", JsonPrimitive(command))
-          put("sessionKey", JsonPrimitive("main"))
+          put("sessionKey", JsonPrimitive(mainSessionKey.value))
          put("thinking", JsonPrimitive(chatThinkingLevel.value))
          put("deliver", JsonPrimitive(false))
        }.toString(),
@@ -84,6 +85,15 @@ class NodeRuntime(context: Context) {
   val voiceWakeStatusText: StateFlow<String>
     get() = voiceWake.statusText
 
+  val talkStatusText: StateFlow<String>
+    get() = talkMode.statusText
+
+  val talkIsListening: StateFlow<Boolean>
+    get() = talkMode.isListening
+
+  val talkIsSpeaking: StateFlow<Boolean>
+    get() = talkMode.isSpeaking
+
   private val discovery = BridgeDiscovery(appContext, scope = scope)
   val bridges: StateFlow<List<BridgeEndpoint>> = discovery.bridges
   val discoveryStatusText: StateFlow<String> = discovery.statusText
@@ -94,6 +104,9 @@ class NodeRuntime(context: Context) {
   private val _statusText = MutableStateFlow("Offline")
   val statusText: StateFlow<String> = _statusText.asStateFlow()
 
+  private val _mainSessionKey = MutableStateFlow("main")
+  val mainSessionKey: StateFlow<String> = _mainSessionKey.asStateFlow()
+
   private val cameraHudSeq = AtomicLong(0)
   private val _cameraHud = MutableStateFlow<CameraHudState?>(null)
   val cameraHud: StateFlow<CameraHudState?> = _cameraHud.asStateFlow()
@@ -101,12 +114,18 @@ class NodeRuntime(context: Context) {
   private val _cameraFlashToken = MutableStateFlow(0L)
   val cameraFlashToken: StateFlow<Long> = _cameraFlashToken.asStateFlow()
 
+  private val _screenRecordActive = MutableStateFlow(false)
+  val screenRecordActive: StateFlow<Boolean> = _screenRecordActive.asStateFlow()
+
   private val _serverName = MutableStateFlow<String?>(null)
   val serverName: StateFlow<String?> = _serverName.asStateFlow()
 
   private val _remoteAddress = MutableStateFlow<String?>(null)
   val remoteAddress: StateFlow<String?> = _remoteAddress.asStateFlow()
 
+  private val _seamColorArgb = MutableStateFlow(DEFAULT_SEAM_COLOR_ARGB)
+  val seamColorArgb: StateFlow<Long> = _seamColorArgb.asStateFlow()
+
   private val _isForeground = MutableStateFlow(true)
   val isForeground: StateFlow<Boolean> = _isForeground.asStateFlow()
@@ -120,6 +139,8 @@ class NodeRuntime(context: Context) {
         _serverName.value = name
         _remoteAddress.value = remote
         _isConnected.value = true
+        _seamColorArgb.value = DEFAULT_SEAM_COLOR_ARGB
+        scope.launch { refreshBrandingFromGateway() }
         scope.launch { refreshWakeWordsFromGateway() }
         maybeNavigateToA2uiOnConnect()
       },
@@ -133,12 +154,17 @@ class NodeRuntime(context: Context) {
     )
 
   private val chat = ChatController(scope = scope, session = session, json = json)
+  private val talkMode: TalkModeManager by lazy {
+    TalkModeManager(context = appContext, scope = scope).also { it.attachSession(session) }
+  }
 
   private fun handleSessionDisconnected(message: String) {
     _statusText.value = message
     _serverName.value = null
     _remoteAddress.value = null
     _isConnected.value = false
+    _seamColorArgb.value = DEFAULT_SEAM_COLOR_ARGB
+    _mainSessionKey.value = "main"
     chat.onDisconnected(message)
     showLocalCanvasOnDisconnect()
   }
@@ -163,6 +189,7 @@ class NodeRuntime(context: Context) {
   val preventSleep: StateFlow<Boolean> = prefs.preventSleep
   val wakeWords: StateFlow<List<String>> = prefs.wakeWords
   val voiceWakeMode: StateFlow<VoiceWakeMode> = prefs.voiceWakeMode
+  val talkEnabled: StateFlow<Boolean> = prefs.talkEnabled
   val manualEnabled: StateFlow<Boolean> = prefs.manualEnabled
   val manualHost: StateFlow<String> = prefs.manualHost
   val manualPort: StateFlow<String> = prefs.manualPort
@@ -218,6 +245,13 @@ class NodeRuntime(context: Context) {
       }
     }
 
+    scope.launch {
+      talkEnabled.collect { enabled ->
+        talkMode.setEnabled(enabled)
+        externalAudioCaptureActive.value = enabled
+      }
+    }
+
     scope.launch(Dispatchers.Default) {
       bridges.collect { list ->
         if (list.isNotEmpty()) {
@@ -311,6 +345,10 @@ class NodeRuntime(context: Context) {
     prefs.setVoiceWakeMode(mode)
   }
 
+  fun setTalkEnabled(value: Boolean) {
+    prefs.setTalkEnabled(value)
+  }
+
   fun connect(endpoint: BridgeEndpoint) {
     scope.launch {
       _statusText.value = "Connecting…"
@@ -548,6 +586,7 @@ class NodeRuntime(context: Context) {
       return
     }
 
+    talkMode.handleBridgeEvent(event, payloadJson)
     chat.handleBridgeEvent(event, payloadJson)
   }
@@ -589,6 +628,25 @@ class NodeRuntime(context: Context) {
     }
   }
 
+  private suspend fun refreshBrandingFromGateway() {
+    if (!_isConnected.value) return
+    try {
+      val res = session.request("config.get", "{}")
+      val root = json.parseToJsonElement(res).asObjectOrNull()
+      val config = root?.get("config").asObjectOrNull()
+      val ui = config?.get("ui").asObjectOrNull()
+      val raw = ui?.get("seamColor").asStringOrNull()?.trim()
+      val sessionCfg = config?.get("session").asObjectOrNull()
+      val rawMainKey = sessionCfg?.get("mainKey").asStringOrNull()?.trim()
+      _mainSessionKey.value = rawMainKey?.takeIf { it.isNotEmpty() } ?: "main"
+
+      val parsed = parseHexColorArgb(raw)
+      _seamColorArgb.value = parsed ?: DEFAULT_SEAM_COLOR_ARGB
+    } catch (_: Throwable) {
+      // ignore
+    }
+  }
+
   private suspend fun handleInvoke(command: String, paramsJson: String?): BridgeSession.InvokeResult {
     if (
       command.startsWith(ClawdisCanvasCommand.NamespacePrefix) ||
@@ -730,14 +788,20 @@ class NodeRuntime(context: Context) {
         }
       }
       ClawdisScreenCommand.Record.rawValue -> {
-        val res =
-          try {
-            screenRecorder.record(paramsJson)
-          } catch (err: Throwable) {
-            val (code, message) = invokeErrorFromThrowable(err)
-            return BridgeSession.InvokeResult.error(code = code, message = message)
-          }
-        BridgeSession.InvokeResult.ok(res.payloadJson)
+        // Status pill mirrors screen recording state so it stays visible without overlay stacking.
+        _screenRecordActive.value = true
+        try {
+          val res =
+            try {
+              screenRecorder.record(paramsJson)
+            } catch (err: Throwable) {
+              val (code, message) = invokeErrorFromThrowable(err)
+              return BridgeSession.InvokeResult.error(code = code, message = message)
+            }
+          BridgeSession.InvokeResult.ok(res.payloadJson)
+        } finally {
+          _screenRecordActive.value = false
+        }
       }
       else ->
         BridgeSession.InvokeResult.error(
@@ -780,7 +844,7 @@ class NodeRuntime(context: Context) {
     val raw = session.currentCanvasHostUrl()?.trim().orEmpty()
     if (raw.isBlank()) return null
     val base = raw.trimEnd('/')
-    return "${base}/__clawdis__/a2ui/"
+    return "${base}/__clawdis__/a2ui/?platform=android"
   }
 
   private suspend fun ensureA2uiReady(a2uiUrl: String): Boolean {
@@ -866,6 +930,8 @@ private data class Quad<A, B, C, D>(val first: A, val second: B, val third: C, val fourth: D)
 
+private const val DEFAULT_SEAM_COLOR_ARGB: Long = 0xFF4F7A9A
+
 private const val a2uiReadyCheckJS: String =
   """
 (() => {
@@ -920,3 +986,12 @@ private fun JsonElement?.asStringOrNull(): String? =
     is JsonPrimitive -> content
     else -> null
   }
+
+private fun parseHexColorArgb(raw: String?): Long? {
+  val trimmed = raw?.trim().orEmpty()
+  if (trimmed.isEmpty()) return null
+  val hex = if (trimmed.startsWith("#")) trimmed.drop(1) else trimmed
+  if (hex.length != 6) return null
+  val rgb = hex.toLongOrNull(16) ?: return null
+  return 0xFF000000L or rgb
+}
diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/SecurePrefs.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/SecurePrefs.kt
index 8d7ceb0a2..b288ef29e 100644
--- a/apps/android/app/src/main/java/com/steipete/clawdis/node/SecurePrefs.kt
+++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/SecurePrefs.kt
@@ -73,6 +73,9 @@ class SecurePrefs(context: Context) {
   private val _voiceWakeMode = MutableStateFlow(loadVoiceWakeMode())
   val voiceWakeMode: StateFlow<VoiceWakeMode> = _voiceWakeMode
 
+  private val _talkEnabled = MutableStateFlow(prefs.getBoolean("talk.enabled", false))
+  val talkEnabled: StateFlow<Boolean> = _talkEnabled
+
   fun setLastDiscoveredStableId(value: String) {
     val trimmed = value.trim()
     prefs.edit { putString("bridge.lastDiscoveredStableId", trimmed) }
@@ -158,6 +161,11 @@ class SecurePrefs(context: Context) {
     _voiceWakeMode.value = mode
   }
 
+  fun setTalkEnabled(value: Boolean) {
+    prefs.edit { putBoolean("talk.enabled", value) }
+    _talkEnabled.value = value
+  }
+
   private fun loadVoiceWakeMode(): VoiceWakeMode {
     val raw = prefs.getString(voiceWakeModeKey, null)
     val resolved = VoiceWakeMode.fromRawValue(raw)
diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/bridge/BridgeDiscovery.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/bridge/BridgeDiscovery.kt
index 17e9120c1..b33261ccb 100644
--- a/apps/android/app/src/main/java/com/steipete/clawdis/node/bridge/BridgeDiscovery.kt
+++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/bridge/BridgeDiscovery.kt
@@ -130,20 +130,36 @@ class BridgeDiscovery(
       object : NsdManager.ResolveListener {
         override fun onResolveFailed(serviceInfo: NsdServiceInfo, errorCode: Int) {}
 
-        override fun onServiceResolved(resolved: NsdServiceInfo) {
-          val host = resolved.host?.hostAddress ?: return
-          val port = resolved.port
-          if (port <= 0) return
-
-          val rawServiceName = resolved.serviceName
-          val serviceName = BonjourEscapes.decode(rawServiceName)
-          val displayName = BonjourEscapes.decode(txt(resolved, "displayName") ?: serviceName)
-          val id = stableId(serviceName, "local.")
-          localById[id] = BridgeEndpoint(stableId = id, name = displayName, host = host, port = port)
-          publish()
-        }
-      },
-    )
+        override fun onServiceResolved(resolved: NsdServiceInfo) {
+          val host = resolved.host?.hostAddress ?: return
+          val port = resolved.port
+          if (port <= 0) return
+
+          val rawServiceName = resolved.serviceName
+          val serviceName = BonjourEscapes.decode(rawServiceName)
+          val displayName = BonjourEscapes.decode(txt(resolved, "displayName") ?: serviceName)
+          val lanHost = txt(resolved, "lanHost")
+          val tailnetDns = txt(resolved, "tailnetDns")
+          val gatewayPort = txtInt(resolved, "gatewayPort")
+          val bridgePort = txtInt(resolved, "bridgePort")
+          val canvasPort = txtInt(resolved, "canvasPort")
+          val id = stableId(serviceName, "local.")
+          localById[id] =
+            BridgeEndpoint(
+              stableId = id,
+              name = displayName,
+              host = host,
+              port = port,
+              lanHost = lanHost,
+              tailnetDns = tailnetDns,
+              gatewayPort = gatewayPort,
+              bridgePort = bridgePort,
+              canvasPort = canvasPort,
+            )
+          publish()
+        }
+      },
+    )
   }
 
   private fun publish() {
@@ -189,6 +205,10 @@ class BridgeDiscovery(
     }
   }
 
+  private fun txtInt(info: NsdServiceInfo, key: String): Int? {
+    return txt(info, key)?.toIntOrNull()
+  }
+
   private suspend fun refreshUnicast(domain: String) {
     val ptrName = "${serviceType}${domain}"
     val ptrMsg = lookupUnicastMessage(ptrName, Type.PTR) ?: return
@@ -227,8 +247,24 @@ class BridgeDiscovery(
       }
       val instanceName = BonjourEscapes.decode(decodeInstanceName(instanceFqdn, domain))
       val displayName = BonjourEscapes.decode(txtValue(txt, "displayName") ?: instanceName)
+      val lanHost = txtValue(txt, "lanHost")
+      val tailnetDns = txtValue(txt, "tailnetDns")
+      val gatewayPort = txtIntValue(txt, "gatewayPort")
+      val bridgePort = txtIntValue(txt, "bridgePort")
+      val canvasPort = txtIntValue(txt, "canvasPort")
       val id = stableId(instanceName, domain)
-      next[id] = BridgeEndpoint(stableId = id, name = displayName, host = host, port = port)
+      next[id] =
+        BridgeEndpoint(
+          stableId = id,
+          name = displayName,
+          host = host,
+          port = port,
+          lanHost = lanHost,
+          tailnetDns = tailnetDns,
+          gatewayPort = gatewayPort,
+          bridgePort = bridgePort,
+          canvasPort = canvasPort,
+        )
     }
 
     unicastById.clear()
@@ -434,6 +470,10 @@ class BridgeDiscovery(
     return null
   }
 
+  private fun txtIntValue(records: List<String>, key: String): Int? {
+    return txtValue(records, key)?.toIntOrNull()
+  }
+
   private fun decodeDnsTxtString(raw: String): String {
     // dnsjava treats TXT as opaque bytes and decodes as ISO-8859-1 to preserve bytes.
     // Our TXT payload is UTF-8 (written by the gateway), so re-decode when possible.
diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/bridge/BridgeEndpoint.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/bridge/BridgeEndpoint.kt
index bd359e470..41c415c4b 100644
--- a/apps/android/app/src/main/java/com/steipete/clawdis/node/bridge/BridgeEndpoint.kt
+++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/bridge/BridgeEndpoint.kt
@@ -5,6 +5,11 @@ data class BridgeEndpoint(
   val name: String,
   val host: String,
   val port: Int,
+  val lanHost: String? = null,
+  val tailnetDns: String? = null,
+  val gatewayPort: Int? = null,
+  val bridgePort: Int? = null,
+  val canvasPort: Int? = null,
 ) {
   companion object {
     fun manual(host: String, port: Int): BridgeEndpoint =
@@ -16,4 +21,3 @@ data class BridgeEndpoint(
     )
   }
 }
-
diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/bridge/BridgeSession.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/bridge/BridgeSession.kt
index e50488d37..83e2cb744 100644
--- a/apps/android/app/src/main/java/com/steipete/clawdis/node/bridge/BridgeSession.kt
+++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/bridge/BridgeSession.kt
@@ -11,6 +11,7 @@ import kotlinx.coroutines.launch
 import kotlinx.coroutines.sync.Mutex
 import kotlinx.coroutines.sync.withLock
 import kotlinx.coroutines.withContext
+import com.steipete.clawdis.node.BuildConfig
 import kotlinx.serialization.json.Json
 import kotlinx.serialization.json.JsonArray
 import kotlinx.serialization.json.JsonObject
@@ -23,6 +24,7 @@ import java.io.BufferedWriter
 import java.io.InputStreamReader
 import java.io.OutputStreamWriter
 import java.net.InetSocketAddress
+import java.net.URI
 import java.net.Socket
 import java.util.UUID
 import java.util.concurrent.ConcurrentHashMap
@@ -75,6 +77,8 @@ class BridgeSession(
 
   fun disconnect() {
     desired = null
+    // Unblock connectOnce() read loop. Coroutine cancellation alone won't interrupt BufferedReader.readLine().
+    currentConnection?.closeQuietly()
     scope.launch(Dispatchers.IO) {
       job?.cancelAndJoin()
       job = null
@@ -213,7 +217,17 @@ class BridgeSession(
         when (first["type"].asStringOrNull()) {
           "hello-ok" -> {
             val name = first["serverName"].asStringOrNull() ?: "Bridge"
-            canvasHostUrl = first["canvasHostUrl"].asStringOrNull()?.trim()?.ifEmpty { null }
+            val rawCanvasUrl = first["canvasHostUrl"].asStringOrNull()?.trim()?.ifEmpty { null }
+            canvasHostUrl = normalizeCanvasHostUrl(rawCanvasUrl, endpoint)
+            if (BuildConfig.DEBUG) {
+              // Local JVM unit tests use android.jar stubs; Log.d can throw "not mocked".
+              runCatching {
+                android.util.Log.d(
+                  "ClawdisBridge",
+                  "canvasHostUrl resolved=${canvasHostUrl ?: "none"} (raw=${rawCanvasUrl ?: "none"})",
+                )
+              }
+            }
             onConnected(name, conn.remoteAddress)
           }
           "error" -> {
@@ -292,6 +306,37 @@ class BridgeSession(
       conn.closeQuietly()
     }
   }
+
+  private fun normalizeCanvasHostUrl(raw: String?, endpoint: BridgeEndpoint): String? {
+    val trimmed = raw?.trim().orEmpty()
+    val parsed = trimmed.takeIf { it.isNotBlank() }?.let { runCatching { URI(it) }.getOrNull() }
+    val host = parsed?.host?.trim().orEmpty()
+    val port = parsed?.port ?: -1
+    val scheme = parsed?.scheme?.trim().orEmpty().ifBlank { "http" }
+
+    if (trimmed.isNotBlank() && !isLoopbackHost(host)) {
+      return trimmed
+    }
+
+    val fallbackHost =
+      endpoint.tailnetDns?.trim().takeIf { !it.isNullOrEmpty() }
+        ?: endpoint.lanHost?.trim().takeIf { !it.isNullOrEmpty() }
+        ?: endpoint.host.trim()
+    if (fallbackHost.isEmpty()) return trimmed.ifBlank { null }
+
+    val fallbackPort = endpoint.canvasPort ?: if (port > 0) port else 18793
+    val formattedHost = if (fallbackHost.contains(":")) "[${fallbackHost}]" else fallbackHost
+    return "$scheme://$formattedHost:$fallbackPort"
+  }
+
+  private fun isLoopbackHost(raw: String?): Boolean {
+    val host = raw?.trim()?.lowercase().orEmpty()
+    if (host.isEmpty()) return false
+    if (host == "localhost") return true
+    if (host == "::1") return true
+    if (host == "0.0.0.0" || host == "::") return true
+    return host.startsWith("127.")
+  }
 }
 
 private fun JsonElement?.asObjectOrNull(): JsonObject? = this as? JsonObject
diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/node/CameraCaptureManager.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/node/CameraCaptureManager.kt
index 4f1501340..416690766 100644
--- a/apps/android/app/src/main/java/com/steipete/clawdis/node/node/CameraCaptureManager.kt
+++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/node/CameraCaptureManager.kt
@@ -28,6 +28,7 @@ import kotlinx.coroutines.withContext
 import java.io.ByteArrayOutputStream
 import java.io.File
 import java.util.concurrent.Executor
+import kotlin.math.roundToInt
 import kotlin.coroutines.resume
 import kotlin.coroutines.resumeWithException
@@ -99,14 +100,36 @@ class CameraCaptureManager(private val context: Context) {
           decoded
         }
 
-      val out = ByteArrayOutputStream()
-      val jpegQuality = (quality * 100.0).toInt().coerceIn(10, 100)
-      if (!scaled.compress(Bitmap.CompressFormat.JPEG, jpegQuality, out)) {
-        throw IllegalStateException("UNAVAILABLE: failed to encode JPEG")
-      }
-      val base64 = Base64.encodeToString(out.toByteArray(), Base64.NO_WRAP)
+      val maxPayloadBytes = 5 * 1024 * 1024
+      // Base64 inflates payloads by ~4/3; cap encoded bytes so the payload stays under 5MB (API limit).
+      val maxEncodedBytes = (maxPayloadBytes / 4) * 3
+      val result =
+        JpegSizeLimiter.compressToLimit(
+          initialWidth = scaled.width,
+          initialHeight = scaled.height,
+          startQuality = (quality * 100.0).roundToInt().coerceIn(10, 100),
+          maxBytes = maxEncodedBytes,
+          encode = { width, height, q ->
+            val bitmap =
+              if (width == scaled.width && height == scaled.height) {
+                scaled
+              } else {
+                scaled.scale(width, height)
+              }
+            val out = ByteArrayOutputStream()
+            if (!bitmap.compress(Bitmap.CompressFormat.JPEG, q, out)) {
+              if (bitmap !== scaled) bitmap.recycle()
+              throw IllegalStateException("UNAVAILABLE: failed to encode JPEG")
+            }
+            if (bitmap !== scaled) {
+              bitmap.recycle()
+            }
+            out.toByteArray()
+          },
+        )
+      val base64 = Base64.encodeToString(result.bytes, Base64.NO_WRAP)
       Payload(
-        """{"format":"jpg","base64":"$base64","width":${scaled.width},"height":${scaled.height}}""",
+        """{"format":"jpg","base64":"$base64","width":${result.width},"height":${result.height}}""",
       )
     }
diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/node/CanvasController.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/node/CanvasController.kt
index 5b4a09b64..685acdcd2 100644
--- a/apps/android/app/src/main/java/com/steipete/clawdis/node/node/CanvasController.kt
+++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/node/CanvasController.kt
@@ -3,6 +3,7 @@ package com.steipete.clawdis.node.node
 import android.graphics.Bitmap
 import android.graphics.Canvas
 import android.os.Looper
+import android.util.Log
 import android.webkit.WebView
 import androidx.core.graphics.createBitmap
 import androidx.core.graphics.scale
@@ -16,6 +17,7 @@ import kotlinx.serialization.json.Json
 import kotlinx.serialization.json.JsonElement
 import kotlinx.serialization.json.JsonObject
 import kotlinx.serialization.json.JsonPrimitive
+import com.steipete.clawdis.node.BuildConfig
 import kotlin.coroutines.resume
 
 class CanvasController {
@@ -81,8 +83,14 @@ class CanvasController {
     val currentUrl = url
     withWebViewOnMain { wv ->
       if (currentUrl == null) {
+        if (BuildConfig.DEBUG) {
+          Log.d("ClawdisCanvas", "load scaffold: $scaffoldAssetUrl")
+        }
         wv.loadUrl(scaffoldAssetUrl)
       } else {
+        if (BuildConfig.DEBUG) {
+          Log.d("ClawdisCanvas", "load url: $currentUrl")
+        }
         wv.loadUrl(currentUrl)
       }
     }
diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/node/JpegSizeLimiter.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/node/JpegSizeLimiter.kt
new file mode 100644
index 000000000..bb9377231
--- /dev/null
+++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/node/JpegSizeLimiter.kt
@@ -0,0 +1,61 @@
+package com.steipete.clawdis.node.node
+
+import kotlin.math.max
+import kotlin.math.min
+import kotlin.math.roundToInt
+
+internal data class JpegSizeLimiterResult(
+  val bytes: ByteArray,
+  val width: Int,
+  val height: Int,
+  val quality: Int,
+)
+
+internal object JpegSizeLimiter {
+  fun compressToLimit(
+    initialWidth: Int,
+    initialHeight: Int,
+    startQuality: Int,
+    maxBytes: Int,
+    minQuality: Int = 20,
+    minSize: Int = 256,
+    scaleStep: Double = 0.85,
+    maxScaleAttempts: Int = 6,
+    maxQualityAttempts: Int = 6,
+    encode: (width: Int, height: Int, quality: Int) -> ByteArray,
+  ): JpegSizeLimiterResult {
+    require(initialWidth > 0 && initialHeight > 0) { "Invalid image size" }
+    require(maxBytes > 0) { "Invalid maxBytes" }
+
+    var width = initialWidth
+    var height = initialHeight
+    val clampedStartQuality = startQuality.coerceIn(minQuality, 100)
+    var best =
+      JpegSizeLimiterResult(bytes = encode(width, height, clampedStartQuality), width = width, height = height, quality = clampedStartQuality)
+    if (best.bytes.size <= maxBytes) return best
+
+    repeat(maxScaleAttempts) {
+      var quality = clampedStartQuality
+      repeat(maxQualityAttempts) {
+        val bytes = encode(width, height, quality)
+        best = JpegSizeLimiterResult(bytes = bytes, width = width, height = height, quality = quality)
+        if (bytes.size <= maxBytes) return best
+        if (quality <= minQuality) return@repeat
+        quality = max(minQuality, (quality * 0.75).roundToInt())
+      }
+
+      val minScale = (minSize.toDouble() / min(width, height).toDouble()).coerceAtMost(1.0)
+      val nextScale = max(scaleStep, minScale)
+      val nextWidth = max(minSize, (width * nextScale).roundToInt())
+      val nextHeight = max(minSize, (height * nextScale).roundToInt())
+      if (nextWidth == width && nextHeight == height) return@repeat
+      width = min(nextWidth, width)
+      height = min(nextHeight, height)
+    }
+
+    if (best.bytes.size > maxBytes) {
+      throw IllegalStateException("CAMERA_TOO_LARGE: ${best.bytes.size} bytes > $maxBytes bytes")
+    }
+
+    return best
+  }
+}
diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/CameraHudOverlay.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/CameraHudOverlay.kt
index b205929cd..2e1fec0d9 100644
--- a/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/CameraHudOverlay.kt
+++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/CameraHudOverlay.kt
@@ -1,64 +1,26 @@
 package com.steipete.clawdis.node.ui
 
-import androidx.compose.animation.AnimatedVisibility
-import androidx.compose.animation.fadeIn
-import androidx.compose.animation.fadeOut
-import androidx.compose.animation.slideInVertically
-import androidx.compose.animation.slideOutVertically
 import androidx.compose.foundation.background
 import androidx.compose.foundation.layout.Box
-import androidx.compose.foundation.layout.Row
-import androidx.compose.foundation.layout.Spacer
 import androidx.compose.foundation.layout.fillMaxSize
-import androidx.compose.foundation.layout.padding
-import androidx.compose.foundation.layout.size
-import androidx.compose.foundation.layout.statusBarsPadding
-import androidx.compose.foundation.shape.RoundedCornerShape
-import androidx.compose.material.icons.Icons
-import androidx.compose.material.icons.filled.CheckCircle
-import androidx.compose.material.icons.filled.Error
-import androidx.compose.material.icons.filled.FiberManualRecord
-import androidx.compose.material.icons.filled.PhotoCamera
-import androidx.compose.material3.CircularProgressIndicator
-import androidx.compose.material3.Icon
-import androidx.compose.material3.MaterialTheme
-import androidx.compose.material3.Surface
-import androidx.compose.material3.Text
 import androidx.compose.runtime.Composable
 import androidx.compose.runtime.LaunchedEffect
 import androidx.compose.runtime.getValue
 import androidx.compose.runtime.mutableFloatStateOf
 import androidx.compose.runtime.remember
 import androidx.compose.runtime.setValue
-import androidx.compose.ui.Alignment
 import androidx.compose.ui.Modifier
 import androidx.compose.ui.draw.alpha
 import androidx.compose.ui.graphics.Color
-import androidx.compose.ui.text.style.TextOverflow
-import androidx.compose.ui.unit.dp
-import com.steipete.clawdis.node.CameraHudKind
-import com.steipete.clawdis.node.CameraHudState
 import kotlinx.coroutines.delay
 
 @Composable
-fun CameraHudOverlay(
-  hud: CameraHudState?,
-  flashToken: Long,
+fun CameraFlashOverlay(
+  token: Long,
   modifier: Modifier = Modifier,
 ) {
   Box(modifier = modifier.fillMaxSize()) {
-    CameraFlash(token = flashToken)
-
-    AnimatedVisibility(
-      visible = hud != null,
-      enter = slideInVertically(initialOffsetY = { -it / 2 }) + fadeIn(),
-      exit = slideOutVertically(targetOffsetY = { -it / 2 }) + fadeOut(),
-      modifier = Modifier.align(Alignment.TopStart).statusBarsPadding().padding(start = 12.dp, top = 58.dp),
-    ) {
-      if (hud != null) {
-        Toast(hud = hud)
-      }
-    }
+    CameraFlash(token = token)
   }
 }
 
@@ -80,44 +42,3 @@ private fun CameraFlash(token: Long) {
       .background(Color.White),
   )
 }
-
-@Composable
-private fun Toast(hud: CameraHudState) {
-  Surface(
-    shape = RoundedCornerShape(14.dp),
-    color = MaterialTheme.colorScheme.surface.copy(alpha = 0.85f),
-    tonalElevation = 2.dp,
-    shadowElevation = 8.dp,
-  ) {
-    Row(
-      modifier = Modifier.padding(vertical = 10.dp, horizontal = 12.dp),
verticalAlignment = Alignment.CenterVertically, - ) { - when (hud.kind) { - CameraHudKind.Photo -> { - Icon(Icons.Default.PhotoCamera, contentDescription = null) - Spacer(Modifier.size(10.dp)) - CircularProgressIndicator(modifier = Modifier.size(14.dp), strokeWidth = 2.dp) - } - CameraHudKind.Recording -> { - Icon(Icons.Default.FiberManualRecord, contentDescription = null, tint = Color.Red) - } - CameraHudKind.Success -> { - Icon(Icons.Default.CheckCircle, contentDescription = null) - } - CameraHudKind.Error -> { - Icon(Icons.Default.Error, contentDescription = null) - } - } - - Spacer(Modifier.size(10.dp)) - Text( - text = hud.message, - style = MaterialTheme.typography.bodyMedium, - maxLines = 1, - overflow = TextOverflow.Ellipsis, - ) - } - } -} - diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/RootScreen.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/RootScreen.kt index 4ee3afa1a..b81a047f4 100644 --- a/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/RootScreen.kt +++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/RootScreen.kt @@ -7,12 +7,18 @@ import android.graphics.Color import android.util.Log import android.view.View import android.webkit.JavascriptInterface +import android.webkit.ConsoleMessage +import android.webkit.WebChromeClient import android.webkit.WebView import android.webkit.WebSettings import android.webkit.WebResourceError import android.webkit.WebResourceRequest import android.webkit.WebResourceResponse import android.webkit.WebViewClient +import androidx.activity.compose.rememberLauncherForActivityResult +import androidx.activity.result.contract.ActivityResultContracts +import androidx.webkit.WebSettingsCompat +import androidx.webkit.WebViewFeature import androidx.compose.foundation.layout.Arrangement import androidx.compose.foundation.layout.Box import androidx.compose.foundation.layout.Column @@ -28,10 +34,20 @@ import androidx.compose.material3.ExperimentalMaterial3Api 
import androidx.compose.material3.FilledTonalIconButton import androidx.compose.material3.Icon import androidx.compose.material3.IconButtonDefaults +import androidx.compose.material3.LocalContentColor +import androidx.compose.material3.MaterialTheme import androidx.compose.material3.ModalBottomSheet import androidx.compose.material3.rememberModalBottomSheetState import androidx.compose.material.icons.Icons import androidx.compose.material.icons.filled.ChatBubble +import androidx.compose.material.icons.filled.CheckCircle +import androidx.compose.material.icons.filled.Error +import androidx.compose.material.icons.filled.FiberManualRecord +import androidx.compose.material.icons.filled.PhotoCamera +import androidx.compose.material.icons.filled.RecordVoiceOver +import androidx.compose.material.icons.filled.Refresh +import androidx.compose.material.icons.filled.Report +import androidx.compose.material.icons.filled.ScreenShare import androidx.compose.material.icons.filled.Settings import androidx.compose.runtime.Composable import androidx.compose.runtime.collectAsState @@ -41,12 +57,15 @@ import androidx.compose.runtime.remember import androidx.compose.runtime.setValue import androidx.compose.ui.Alignment import androidx.compose.ui.Modifier +import androidx.compose.ui.graphics.Color as ComposeColor +import androidx.compose.ui.graphics.lerp import androidx.compose.ui.platform.LocalContext import androidx.compose.ui.unit.dp import androidx.compose.ui.viewinterop.AndroidView import androidx.compose.ui.window.Popup import androidx.compose.ui.window.PopupProperties import androidx.core.content.ContextCompat +import com.steipete.clawdis.node.CameraHudKind import com.steipete.clawdis.node.MainViewModel @OptIn(ExperimentalMaterial3Api::class) @@ -60,6 +79,105 @@ fun RootScreen(viewModel: MainViewModel) { val statusText by viewModel.statusText.collectAsState() val cameraHud by viewModel.cameraHud.collectAsState() val cameraFlashToken by viewModel.cameraFlashToken.collectAsState() 
+ val screenRecordActive by viewModel.screenRecordActive.collectAsState() + val isForeground by viewModel.isForeground.collectAsState() + val voiceWakeStatusText by viewModel.voiceWakeStatusText.collectAsState() + val talkEnabled by viewModel.talkEnabled.collectAsState() + val talkStatusText by viewModel.talkStatusText.collectAsState() + val talkIsListening by viewModel.talkIsListening.collectAsState() + val talkIsSpeaking by viewModel.talkIsSpeaking.collectAsState() + val seamColorArgb by viewModel.seamColorArgb.collectAsState() + val seamColor = remember(seamColorArgb) { ComposeColor(seamColorArgb) } + val audioPermissionLauncher = + rememberLauncherForActivityResult(ActivityResultContracts.RequestPermission()) { granted -> + if (granted) viewModel.setTalkEnabled(true) + } + val activity = + remember(cameraHud, screenRecordActive, isForeground, statusText, voiceWakeStatusText) { + // Status pill owns transient activity state so it doesn't overlap the connection indicator. + if (!isForeground) { + return@remember StatusActivity( + title = "Foreground required", + icon = Icons.Default.Report, + contentDescription = "Foreground required", + ) + } + + val lowerStatus = statusText.lowercase() + if (lowerStatus.contains("repair")) { + return@remember StatusActivity( + title = "Repairing…", + icon = Icons.Default.Refresh, + contentDescription = "Repairing", + ) + } + if (lowerStatus.contains("pairing") || lowerStatus.contains("approval")) { + return@remember StatusActivity( + title = "Approval pending", + icon = Icons.Default.RecordVoiceOver, + contentDescription = "Approval pending", + ) + } + // Avoid duplicating the primary bridge status ("Connecting…") in the activity slot. 
+ + if (screenRecordActive) { + return@remember StatusActivity( + title = "Recording screen…", + icon = Icons.Default.ScreenShare, + contentDescription = "Recording screen", + tint = androidx.compose.ui.graphics.Color.Red, + ) + } + + cameraHud?.let { hud -> + return@remember when (hud.kind) { + CameraHudKind.Photo -> + StatusActivity( + title = hud.message, + icon = Icons.Default.PhotoCamera, + contentDescription = "Taking photo", + ) + CameraHudKind.Recording -> + StatusActivity( + title = hud.message, + icon = Icons.Default.FiberManualRecord, + contentDescription = "Recording", + tint = androidx.compose.ui.graphics.Color.Red, + ) + CameraHudKind.Success -> + StatusActivity( + title = hud.message, + icon = Icons.Default.CheckCircle, + contentDescription = "Capture finished", + ) + CameraHudKind.Error -> + StatusActivity( + title = hud.message, + icon = Icons.Default.Error, + contentDescription = "Capture failed", + tint = androidx.compose.ui.graphics.Color.Red, + ) + } + } + + if (voiceWakeStatusText.contains("Microphone permission", ignoreCase = true)) { + return@remember StatusActivity( + title = "Mic permission", + icon = Icons.Default.Error, + contentDescription = "Mic permission required", + ) + } + if (voiceWakeStatusText == "Paused") { + val suffix = if (!isForeground) " (background)" else "" + return@remember StatusActivity( + title = "Voice Wake paused$suffix", + icon = Icons.Default.RecordVoiceOver, + contentDescription = "Voice Wake paused", + ) + } + + null + } val bridgeState = remember(serverName, statusText) { @@ -80,9 +198,9 @@ fun RootScreen(viewModel: MainViewModel) { CanvasView(viewModel = viewModel, modifier = Modifier.fillMaxSize()) } - // Camera HUD (flash + toast) must be in a Popup to render above the WebView. + // Camera flash must be in a Popup to render above the WebView. 
Popup(alignment = Alignment.Center, properties = PopupProperties(focusable = false)) { - CameraHudOverlay(hud = cameraHud, flashToken = cameraFlashToken, modifier = Modifier.fillMaxSize()) + CameraFlashOverlay(token = cameraFlashToken, modifier = Modifier.fillMaxSize()) } // Keep the overlay buttons above the WebView canvas (AndroidView), otherwise they may not receive touches. @@ -90,6 +208,7 @@ fun RootScreen(viewModel: MainViewModel) { StatusPill( bridge = bridgeState, voiceEnabled = voiceEnabled, + activity = activity, onClick = { sheet = Sheet.Settings }, modifier = Modifier.windowInsetsPadding(safeOverlayInsets).padding(start = 12.dp, top = 12.dp), ) @@ -106,6 +225,38 @@ fun RootScreen(viewModel: MainViewModel) { icon = { Icon(Icons.Default.ChatBubble, contentDescription = "Chat") }, ) + // Talk mode gets a dedicated side bubble instead of burying it in settings. + val baseOverlay = overlayContainerColor() + val talkContainer = + lerp( + baseOverlay, + seamColor.copy(alpha = baseOverlay.alpha), + if (talkEnabled) 0.35f else 0.22f, + ) + val talkContent = if (talkEnabled) seamColor else overlayIconColor() + OverlayIconButton( + onClick = { + val next = !talkEnabled + if (next) { + val micOk = + ContextCompat.checkSelfPermission(context, Manifest.permission.RECORD_AUDIO) == + PackageManager.PERMISSION_GRANTED + if (!micOk) audioPermissionLauncher.launch(Manifest.permission.RECORD_AUDIO) + viewModel.setTalkEnabled(true) + } else { + viewModel.setTalkEnabled(false) + } + }, + containerColor = talkContainer, + contentColor = talkContent, + icon = { + Icon( + Icons.Default.RecordVoiceOver, + contentDescription = "Talk Mode", + ) + }, + ) + OverlayIconButton( onClick = { sheet = Sheet.Settings }, icon = { Icon(Icons.Default.Settings, contentDescription = "Settings") }, @@ -113,6 +264,17 @@ fun RootScreen(viewModel: MainViewModel) { } } + if (talkEnabled) { + Popup(alignment = Alignment.Center, properties = PopupProperties(focusable = false)) { + TalkOrbOverlay( + 
seamColor = seamColor, + statusText = talkStatusText, + isListening = talkIsListening, + isSpeaking = talkIsSpeaking, + ) + } + } + val currentSheet = sheet if (currentSheet != null) { ModalBottomSheet( @@ -136,14 +298,16 @@ private enum class Sheet { private fun OverlayIconButton( onClick: () -> Unit, icon: @Composable () -> Unit, + containerColor: ComposeColor? = null, + contentColor: ComposeColor? = null, ) { FilledTonalIconButton( onClick = onClick, modifier = Modifier.size(44.dp), colors = IconButtonDefaults.filledTonalIconButtonColors( - containerColor = overlayContainerColor(), - contentColor = overlayIconColor(), + containerColor = containerColor ?: overlayContainerColor(), + contentColor = contentColor ?: overlayIconColor(), ), ) { icon() @@ -163,6 +327,19 @@ private fun CanvasView(viewModel: MainViewModel, modifier: Modifier = Modifier) // Some embedded web UIs (incl. the "background website") use localStorage/sessionStorage. settings.domStorageEnabled = true settings.mixedContentMode = WebSettings.MIXED_CONTENT_COMPATIBILITY_MODE + if (WebViewFeature.isFeatureSupported(WebViewFeature.FORCE_DARK)) { + WebSettingsCompat.setForceDark(settings, WebSettingsCompat.FORCE_DARK_OFF) + } + if (WebViewFeature.isFeatureSupported(WebViewFeature.ALGORITHMIC_DARKENING)) { + WebSettingsCompat.setAlgorithmicDarkeningAllowed(settings, false) + } + if (isDebuggable) { + Log.d("ClawdisWebView", "userAgent: ${settings.userAgentString}") + } + isScrollContainer = true + overScrollMode = View.OVER_SCROLL_IF_CONTENT_SCROLLS + isVerticalScrollBarEnabled = true + isHorizontalScrollBarEnabled = true webViewClient = object : WebViewClient() { override fun onReceivedError( @@ -189,11 +366,38 @@ private fun CanvasView(viewModel: MainViewModel, modifier: Modifier = Modifier) } override fun onPageFinished(view: WebView, url: String?) 
{ + if (isDebuggable) { + Log.d("ClawdisWebView", "onPageFinished: $url") + } viewModel.canvas.onPageFinished() } + + override fun onRenderProcessGone( + view: WebView, + detail: android.webkit.RenderProcessGoneDetail, + ): Boolean { + if (isDebuggable) { + Log.e( + "ClawdisWebView", + "onRenderProcessGone didCrash=${detail.didCrash()} priorityAtExit=${detail.rendererPriorityAtExit()}", + ) + } + return true + } } - setBackgroundColor(Color.BLACK) - setLayerType(View.LAYER_TYPE_HARDWARE, null) + webChromeClient = + object : WebChromeClient() { + override fun onConsoleMessage(consoleMessage: ConsoleMessage?): Boolean { + if (!isDebuggable) return false + val msg = consoleMessage ?: return false + Log.d( + "ClawdisWebView", + "console ${msg.messageLevel()} @ ${msg.sourceId()}:${msg.lineNumber()} ${msg.message()}", + ) + return false + } + } + // Use default layer/background; avoid forcing a black fill over WebView content. val a2uiBridge = CanvasA2UIActionBridge { payload -> diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/SettingsSheet.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/SettingsSheet.kt index 038ef9faf..c7d011892 100644 --- a/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/SettingsSheet.kt +++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/SettingsSheet.kt @@ -2,6 +2,7 @@ package com.steipete.clawdis.node.ui import android.Manifest import android.content.pm.PackageManager +import android.os.Build import androidx.activity.compose.rememberLauncherForActivityResult import androidx.activity.result.contract.ActivityResultContracts import androidx.compose.animation.AnimatedVisibility @@ -46,6 +47,7 @@ import androidx.compose.ui.platform.LocalContext import androidx.compose.ui.text.style.TextAlign import androidx.compose.ui.unit.dp import androidx.core.content.ContextCompat +import com.steipete.clawdis.node.BuildConfig import com.steipete.clawdis.node.MainViewModel import 
com.steipete.clawdis.node.NodeForegroundService import com.steipete.clawdis.node.VoiceWakeMode @@ -74,6 +76,22 @@ fun SettingsSheet(viewModel: MainViewModel) { val listState = rememberLazyListState() val (wakeWordsText, setWakeWordsText) = remember { mutableStateOf("") } val (advancedExpanded, setAdvancedExpanded) = remember { mutableStateOf(false) } + val deviceModel = + remember { + listOfNotNull(Build.MANUFACTURER, Build.MODEL) + .joinToString(" ") + .trim() + .ifEmpty { "Android" } + } + val appVersion = + remember { + val versionName = BuildConfig.VERSION_NAME.trim().ifEmpty { "dev" } + if (BuildConfig.DEBUG && !versionName.contains("dev", ignoreCase = true)) { + "$versionName-dev" + } else { + versionName + } + } LaunchedEffect(wakeWords) { setWakeWordsText(wakeWords.joinToString(", ")) } @@ -142,6 +160,8 @@ fun SettingsSheet(viewModel: MainViewModel) { ) } item { Text("Instance ID: $instanceId", color = MaterialTheme.colorScheme.onSurfaceVariant) } + item { Text("Device: $deviceModel", color = MaterialTheme.colorScheme.onSurfaceVariant) } + item { Text("Version: $appVersion", color = MaterialTheme.colorScheme.onSurfaceVariant) } item { HorizontalDivider() } @@ -181,9 +201,27 @@ fun SettingsSheet(viewModel: MainViewModel) { item { Text("No bridges found yet.", color = MaterialTheme.colorScheme.onSurfaceVariant) } } else { items(items = visibleBridges, key = { it.stableId }) { bridge -> + val detailLines = + buildList { + add("IP: ${bridge.host}:${bridge.port}") + bridge.lanHost?.let { add("LAN: $it") } + bridge.tailnetDns?.let { add("Tailnet: $it") } + if (bridge.gatewayPort != null || bridge.bridgePort != null || bridge.canvasPort != null) { + val gw = bridge.gatewayPort?.toString() ?: "—" + val br = (bridge.bridgePort ?: bridge.port).toString() + val canvas = bridge.canvasPort?.toString() ?: "—" + add("Ports: gw $gw · bridge $br · canvas $canvas") + } + } ListItem( headlineContent = { Text(bridge.name) }, - supportingContent = { 
Text("${bridge.host}:${bridge.port}") }, + supportingContent = { + Column(verticalArrangement = Arrangement.spacedBy(2.dp)) { + detailLines.forEach { line -> + Text(line, color = MaterialTheme.colorScheme.onSurfaceVariant) + } + } + }, trailingContent = { Button( onClick = { diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/StatusPill.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/StatusPill.kt index 87a500265..2efcccae7 100644 --- a/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/StatusPill.kt +++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/StatusPill.kt @@ -28,6 +28,7 @@ import androidx.compose.ui.unit.dp fun StatusPill( bridge: BridgeState, voiceEnabled: Boolean, + activity: StatusActivity? = null, onClick: () -> Unit, modifier: Modifier = Modifier, ) { @@ -62,23 +63,49 @@ fun StatusPill( color = MaterialTheme.colorScheme.onSurfaceVariant, ) - Icon( - imageVector = if (voiceEnabled) Icons.Default.Mic else Icons.Default.MicOff, - contentDescription = if (voiceEnabled) "Voice enabled" else "Voice disabled", - tint = - if (voiceEnabled) { - overlayIconColor() - } else { - MaterialTheme.colorScheme.onSurfaceVariant - }, - modifier = Modifier.size(18.dp), - ) + if (activity != null) { + Row( + horizontalArrangement = Arrangement.spacedBy(6.dp), + verticalAlignment = Alignment.CenterVertically, + ) { + Icon( + imageVector = activity.icon, + contentDescription = activity.contentDescription, + tint = activity.tint ?: overlayIconColor(), + modifier = Modifier.size(18.dp), + ) + Text( + text = activity.title, + style = MaterialTheme.typography.labelLarge, + maxLines = 1, + ) + } + } else { + Icon( + imageVector = if (voiceEnabled) Icons.Default.Mic else Icons.Default.MicOff, + contentDescription = if (voiceEnabled) "Voice enabled" else "Voice disabled", + tint = + if (voiceEnabled) { + overlayIconColor() + } else { + MaterialTheme.colorScheme.onSurfaceVariant + }, + modifier = Modifier.size(18.dp), + ) 
+ } Spacer(modifier = Modifier.width(2.dp)) } } } +data class StatusActivity( + val title: String, + val icon: androidx.compose.ui.graphics.vector.ImageVector, + val contentDescription: String, + val tint: Color? = null, +) + enum class BridgeState(val title: String, val color: Color) { Connected("Connected", Color(0xFF2ECC71)), Connecting("Connecting…", Color(0xFFF1C40F)), diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/TalkOrbOverlay.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/TalkOrbOverlay.kt new file mode 100644 index 000000000..c36dbf8e4 --- /dev/null +++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/TalkOrbOverlay.kt @@ -0,0 +1,134 @@ +package com.steipete.clawdis.node.ui + +import androidx.compose.animation.core.LinearEasing +import androidx.compose.animation.core.RepeatMode +import androidx.compose.animation.core.animateFloat +import androidx.compose.animation.core.infiniteRepeatable +import androidx.compose.animation.core.rememberInfiniteTransition +import androidx.compose.animation.core.tween +import androidx.compose.foundation.Canvas +import androidx.compose.foundation.layout.Arrangement +import androidx.compose.foundation.layout.Box +import androidx.compose.foundation.layout.Column +import androidx.compose.foundation.layout.padding +import androidx.compose.foundation.layout.size +import androidx.compose.foundation.shape.CircleShape +import androidx.compose.material3.MaterialTheme +import androidx.compose.material3.Surface +import androidx.compose.material3.Text +import androidx.compose.runtime.Composable +import androidx.compose.runtime.getValue +import androidx.compose.ui.Alignment +import androidx.compose.ui.Modifier +import androidx.compose.ui.graphics.Brush +import androidx.compose.ui.graphics.Color +import androidx.compose.ui.graphics.drawscope.Stroke +import androidx.compose.ui.text.font.FontWeight +import androidx.compose.ui.unit.dp + +@Composable +fun TalkOrbOverlay( + seamColor: 
Color, + statusText: String, + isListening: Boolean, + isSpeaking: Boolean, + modifier: Modifier = Modifier, +) { + val transition = rememberInfiniteTransition(label = "talk-orb") + val t by + transition.animateFloat( + initialValue = 0f, + targetValue = 1f, + animationSpec = + infiniteRepeatable( + animation = tween(durationMillis = 1500, easing = LinearEasing), + repeatMode = RepeatMode.Restart, + ), + label = "pulse", + ) + + val trimmed = statusText.trim() + val showStatus = trimmed.isNotEmpty() && trimmed != "Off" + val phase = + when { + isSpeaking -> "Speaking" + isListening -> "Listening" + else -> "Thinking" + } + + Column( + modifier = modifier.padding(24.dp), + horizontalAlignment = Alignment.CenterHorizontally, + verticalArrangement = Arrangement.spacedBy(12.dp), + ) { + Box(contentAlignment = Alignment.Center) { + Canvas(modifier = Modifier.size(360.dp)) { + val center = this.center + val baseRadius = size.minDimension * 0.30f + + val ring1 = 1.05f + (t * 0.25f) + val ring2 = 1.20f + (t * 0.55f) + val ringAlpha1 = (1f - t) * 0.34f + val ringAlpha2 = (1f - t) * 0.22f + + drawCircle( + color = seamColor.copy(alpha = ringAlpha1), + radius = baseRadius * ring1, + center = center, + style = Stroke(width = 3.dp.toPx()), + ) + drawCircle( + color = seamColor.copy(alpha = ringAlpha2), + radius = baseRadius * ring2, + center = center, + style = Stroke(width = 3.dp.toPx()), + ) + + drawCircle( + brush = + Brush.radialGradient( + colors = + listOf( + seamColor.copy(alpha = 0.92f), + seamColor.copy(alpha = 0.40f), + Color.Black.copy(alpha = 0.56f), + ), + center = center, + radius = baseRadius * 1.35f, + ), + radius = baseRadius, + center = center, + ) + + drawCircle( + color = seamColor.copy(alpha = 0.34f), + radius = baseRadius, + center = center, + style = Stroke(width = 1.dp.toPx()), + ) + } + } + + if (showStatus) { + Surface( + color = Color.Black.copy(alpha = 0.40f), + shape = CircleShape, + ) { + Text( + text = trimmed, + modifier = 
Modifier.padding(horizontal = 14.dp, vertical = 8.dp), + color = Color.White.copy(alpha = 0.92f), + style = MaterialTheme.typography.labelLarge, + fontWeight = FontWeight.SemiBold, + ) + } + } else { + Text( + text = phase, + color = Color.White.copy(alpha = 0.80f), + style = MaterialTheme.typography.labelLarge, + fontWeight = FontWeight.SemiBold, + ) + } + } +} diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/chat/ChatMarkdown.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/chat/ChatMarkdown.kt index a11237a93..75dd3b10c 100644 --- a/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/chat/ChatMarkdown.kt +++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/chat/ChatMarkdown.kt @@ -17,6 +17,7 @@ import androidx.compose.runtime.mutableStateOf import androidx.compose.runtime.remember import androidx.compose.runtime.setValue import androidx.compose.ui.Modifier +import androidx.compose.ui.graphics.Color import androidx.compose.ui.graphics.asImageBitmap import androidx.compose.ui.layout.ContentScale import androidx.compose.ui.text.AnnotatedString @@ -31,7 +32,7 @@ import kotlinx.coroutines.Dispatchers import kotlinx.coroutines.withContext @Composable -fun ChatMarkdown(text: String) { +fun ChatMarkdown(text: String, textColor: Color) { val blocks = remember(text) { splitMarkdown(text) } val inlineCodeBg = MaterialTheme.colorScheme.surfaceContainerLow @@ -44,7 +45,7 @@ fun ChatMarkdown(text: String) { Text( text = parseInlineMarkdown(trimmed, inlineCodeBg = inlineCodeBg), style = MaterialTheme.typography.bodyMedium, - color = MaterialTheme.colorScheme.onSurface, + color = textColor, ) } is ChatMarkdownBlock.Code -> { diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/chat/ChatMessageViews.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/chat/ChatMessageViews.kt index d7a99e8e2..435bacbdc 100644 --- 
a/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/chat/ChatMessageViews.kt +++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/ui/chat/ChatMessageViews.kt @@ -7,11 +7,9 @@ import androidx.compose.foundation.layout.Arrangement import androidx.compose.foundation.layout.Box import androidx.compose.foundation.layout.Column import androidx.compose.foundation.layout.Row -import androidx.compose.foundation.layout.Spacer import androidx.compose.foundation.layout.fillMaxWidth import androidx.compose.foundation.layout.padding import androidx.compose.foundation.layout.size -import androidx.compose.foundation.layout.width import androidx.compose.foundation.shape.CircleShape import androidx.compose.foundation.shape.RoundedCornerShape import androidx.compose.material3.MaterialTheme @@ -60,20 +58,21 @@ fun ChatMessageBubble(message: ChatMessage) { .background(bubbleBackground(isUser)) .padding(horizontal = 12.dp, vertical = 10.dp), ) { - ChatMessageBody(content = message.content) + val textColor = textColorOverBubble(isUser) + ChatMessageBody(content = message.content, textColor = textColor) } } } } @Composable -private fun ChatMessageBody(content: List) { +private fun ChatMessageBody(content: List, textColor: Color) { Column(verticalArrangement = Arrangement.spacedBy(10.dp)) { for (part in content) { when (part.type) { "text" -> { val text = part.text ?: continue - ChatMarkdown(text = text) + ChatMarkdown(text = text, textColor = textColor) } else -> { val b64 = part.base64 ?: continue @@ -131,7 +130,7 @@ fun ChatStreamingAssistantBubble(text: String) { color = MaterialTheme.colorScheme.surfaceContainer, ) { Box(modifier = Modifier.padding(horizontal = 12.dp, vertical = 10.dp)) { - ChatMarkdown(text = text) + ChatMarkdown(text = text, textColor = MaterialTheme.colorScheme.onSurface) } } } @@ -150,6 +149,15 @@ private fun bubbleBackground(isUser: Boolean): Brush { } } +@Composable +private fun textColorOverBubble(isUser: Boolean): Color { + return if 
(isUser) { + MaterialTheme.colorScheme.onPrimary + } else { + MaterialTheme.colorScheme.onSurface + } +} + @Composable private fun ChatBase64Image(base64: String, mimeType: String?) { var image by remember(base64) { mutableStateOf(null) } diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/voice/StreamingMediaDataSource.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/voice/StreamingMediaDataSource.kt new file mode 100644 index 000000000..0be4a016c --- /dev/null +++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/voice/StreamingMediaDataSource.kt @@ -0,0 +1,98 @@ +package com.steipete.clawdis.node.voice + +import android.media.MediaDataSource +import kotlin.math.min + +internal class StreamingMediaDataSource : MediaDataSource() { + private data class Chunk(val start: Long, val data: ByteArray) + + private val lock = Object() + private val chunks = ArrayList<Chunk>() + private var totalSize: Long = 0 + private var closed = false + private var finished = false + private var lastReadIndex = 0 + + fun append(data: ByteArray) { + if (data.isEmpty()) return + synchronized(lock) { + if (closed || finished) return + val chunk = Chunk(totalSize, data) + chunks.add(chunk) + totalSize += data.size.toLong() + lock.notifyAll() + } + } + + fun finish() { + synchronized(lock) { + if (closed) return + finished = true + lock.notifyAll() + } + } + + fun fail() { + synchronized(lock) { + closed = true + lock.notifyAll() + } + } + + override fun readAt(position: Long, buffer: ByteArray, offset: Int, size: Int): Int { + if (position < 0) return -1 + synchronized(lock) { + while (!closed && !finished && position >= totalSize) { + lock.wait() + } + if (closed) return -1 + if (position >= totalSize && finished) return -1 + + val available = (totalSize - position).toInt() + val toRead = min(size, available) + var remaining = toRead + var destOffset = offset + var pos = position + + var index = findChunkIndex(pos) + while (remaining > 0 && index <
chunks.size) { + val chunk = chunks[index] + val inChunkOffset = (pos - chunk.start).toInt() + if (inChunkOffset >= chunk.data.size) { + index++ + continue + } + val copyLen = min(remaining, chunk.data.size - inChunkOffset) + System.arraycopy(chunk.data, inChunkOffset, buffer, destOffset, copyLen) + remaining -= copyLen + destOffset += copyLen + pos += copyLen + if (inChunkOffset + copyLen >= chunk.data.size) { + index++ + } + } + + return toRead - remaining + } + } + + override fun getSize(): Long = -1 + + override fun close() { + synchronized(lock) { + closed = true + lock.notifyAll() + } + } + + private fun findChunkIndex(position: Long): Int { + var index = lastReadIndex + while (index < chunks.size) { + val chunk = chunks[index] + if (position < chunk.start + chunk.data.size) break + index++ + } + lastReadIndex = index + return index + } +} diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/voice/TalkDirectiveParser.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/voice/TalkDirectiveParser.kt new file mode 100644 index 000000000..8dd059279 --- /dev/null +++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/voice/TalkDirectiveParser.kt @@ -0,0 +1,191 @@ +package com.steipete.clawdis.node.voice + +import kotlinx.serialization.json.Json +import kotlinx.serialization.json.JsonElement +import kotlinx.serialization.json.JsonObject +import kotlinx.serialization.json.JsonPrimitive + +private val directiveJson = Json { ignoreUnknownKeys = true } + +data class TalkDirective( + val voiceId: String? = null, + val modelId: String? = null, + val speed: Double? = null, + val rateWpm: Int? = null, + val stability: Double? = null, + val similarity: Double? = null, + val style: Double? = null, + val speakerBoost: Boolean? = null, + val seed: Long? = null, + val normalize: String? = null, + val language: String? = null, + val outputFormat: String? = null, + val latencyTier: Int? = null, + val once: Boolean? 
= null, +) + +data class TalkDirectiveParseResult( + val directive: TalkDirective?, + val stripped: String, + val unknownKeys: List, +) + +object TalkDirectiveParser { + fun parse(text: String): TalkDirectiveParseResult { + val normalized = text.replace("\r\n", "\n") + val lines = normalized.split("\n").toMutableList() + if (lines.isEmpty()) return TalkDirectiveParseResult(null, text, emptyList()) + + val firstNonEmpty = lines.indexOfFirst { it.trim().isNotEmpty() } + if (firstNonEmpty == -1) return TalkDirectiveParseResult(null, text, emptyList()) + + val head = lines[firstNonEmpty].trim() + if (!head.startsWith("{") || !head.endsWith("}")) { + return TalkDirectiveParseResult(null, text, emptyList()) + } + + val obj = parseJsonObject(head) ?: return TalkDirectiveParseResult(null, text, emptyList()) + + val speakerBoost = + boolValue(obj, listOf("speaker_boost", "speakerBoost")) + ?: boolValue(obj, listOf("no_speaker_boost", "noSpeakerBoost"))?.not() + + val directive = TalkDirective( + voiceId = stringValue(obj, listOf("voice", "voice_id", "voiceId")), + modelId = stringValue(obj, listOf("model", "model_id", "modelId")), + speed = doubleValue(obj, listOf("speed")), + rateWpm = intValue(obj, listOf("rate", "wpm")), + stability = doubleValue(obj, listOf("stability")), + similarity = doubleValue(obj, listOf("similarity", "similarity_boost", "similarityBoost")), + style = doubleValue(obj, listOf("style")), + speakerBoost = speakerBoost, + seed = longValue(obj, listOf("seed")), + normalize = stringValue(obj, listOf("normalize", "apply_text_normalization")), + language = stringValue(obj, listOf("lang", "language_code", "language")), + outputFormat = stringValue(obj, listOf("output_format", "format")), + latencyTier = intValue(obj, listOf("latency", "latency_tier", "latencyTier")), + once = boolValue(obj, listOf("once")), + ) + + val hasDirective = listOf( + directive.voiceId, + directive.modelId, + directive.speed, + directive.rateWpm, + directive.stability, + 
directive.similarity, + directive.style, + directive.speakerBoost, + directive.seed, + directive.normalize, + directive.language, + directive.outputFormat, + directive.latencyTier, + directive.once, + ).any { it != null } + + if (!hasDirective) return TalkDirectiveParseResult(null, text, emptyList()) + + val knownKeys = setOf( + "voice", "voice_id", "voiceid", + "model", "model_id", "modelid", + "speed", "rate", "wpm", + "stability", "similarity", "similarity_boost", "similarityboost", + "style", + "speaker_boost", "speakerboost", + "no_speaker_boost", "nospeakerboost", + "seed", + "normalize", "apply_text_normalization", + "lang", "language_code", "language", + "output_format", "format", + "latency", "latency_tier", "latencytier", + "once", + ) + val unknownKeys = obj.keys.filter { !knownKeys.contains(it.lowercase()) }.sorted() + + lines.removeAt(firstNonEmpty) + if (firstNonEmpty < lines.size) { + if (lines[firstNonEmpty].trim().isEmpty()) { + lines.removeAt(firstNonEmpty) + } + } + + return TalkDirectiveParseResult(directive, lines.joinToString("\n"), unknownKeys) + } + + private fun parseJsonObject(line: String): JsonObject? { + return try { + directiveJson.parseToJsonElement(line) as? JsonObject + } catch (_: Throwable) { + null + } + } + + private fun stringValue(obj: JsonObject, keys: List): String? { + for (key in keys) { + val value = obj[key].asStringOrNull()?.trim() + if (!value.isNullOrEmpty()) return value + } + return null + } + + private fun doubleValue(obj: JsonObject, keys: List): Double? { + for (key in keys) { + val value = obj[key].asDoubleOrNull() + if (value != null) return value + } + return null + } + + private fun intValue(obj: JsonObject, keys: List): Int? { + for (key in keys) { + val value = obj[key].asIntOrNull() + if (value != null) return value + } + return null + } + + private fun longValue(obj: JsonObject, keys: List): Long? 
{ + for (key in keys) { + val value = obj[key].asLongOrNull() + if (value != null) return value + } + return null + } + + private fun boolValue(obj: JsonObject, keys: List): Boolean? { + for (key in keys) { + val value = obj[key].asBooleanOrNull() + if (value != null) return value + } + return null + } +} + +private fun JsonElement?.asStringOrNull(): String? = + (this as? JsonPrimitive)?.takeIf { it.isString }?.content + +private fun JsonElement?.asDoubleOrNull(): Double? { + val primitive = this as? JsonPrimitive ?: return null + return primitive.content.toDoubleOrNull() +} + +private fun JsonElement?.asIntOrNull(): Int? { + val primitive = this as? JsonPrimitive ?: return null + return primitive.content.toIntOrNull() +} + +private fun JsonElement?.asLongOrNull(): Long? { + val primitive = this as? JsonPrimitive ?: return null + return primitive.content.toLongOrNull() +} + +private fun JsonElement?.asBooleanOrNull(): Boolean? { + val primitive = this as? JsonPrimitive ?: return null + val content = primitive.content.trim().lowercase() + return when (content) { + "true", "yes", "1" -> true + "false", "no", "0" -> false + else -> null + } +} diff --git a/apps/android/app/src/main/java/com/steipete/clawdis/node/voice/TalkModeManager.kt b/apps/android/app/src/main/java/com/steipete/clawdis/node/voice/TalkModeManager.kt new file mode 100644 index 000000000..e3251601e --- /dev/null +++ b/apps/android/app/src/main/java/com/steipete/clawdis/node/voice/TalkModeManager.kt @@ -0,0 +1,1249 @@ +package com.steipete.clawdis.node.voice + +import android.Manifest +import android.content.Context +import android.content.Intent +import android.content.pm.PackageManager +import android.media.AudioAttributes +import android.media.AudioFormat +import android.media.AudioManager +import android.media.AudioTrack +import android.media.MediaPlayer +import android.os.Bundle +import android.os.Handler +import android.os.Looper +import android.os.SystemClock +import 
android.speech.RecognitionListener +import android.speech.RecognizerIntent +import android.speech.SpeechRecognizer +import android.speech.tts.TextToSpeech +import android.speech.tts.UtteranceProgressListener +import android.util.Log +import androidx.core.content.ContextCompat +import com.steipete.clawdis.node.bridge.BridgeSession +import java.net.HttpURLConnection +import java.net.URL +import java.util.UUID +import kotlinx.coroutines.CompletableDeferred +import kotlinx.coroutines.CoroutineScope +import kotlinx.coroutines.Dispatchers +import kotlinx.coroutines.Job +import kotlinx.coroutines.delay +import kotlinx.coroutines.flow.MutableStateFlow +import kotlinx.coroutines.flow.StateFlow +import kotlinx.coroutines.launch +import kotlinx.coroutines.withContext +import kotlinx.serialization.json.Json +import kotlinx.serialization.json.JsonArray +import kotlinx.serialization.json.JsonElement +import kotlinx.serialization.json.JsonObject +import kotlinx.serialization.json.JsonPrimitive +import kotlinx.serialization.json.buildJsonObject +import kotlin.math.max + +class TalkModeManager( + private val context: Context, + private val scope: CoroutineScope, +) { + companion object { + private const val tag = "TalkMode" + private const val defaultModelIdFallback = "eleven_v3" + private const val defaultOutputFormatFallback = "pcm_24000" + } + + private val mainHandler = Handler(Looper.getMainLooper()) + private val json = Json { ignoreUnknownKeys = true } + + private val _isEnabled = MutableStateFlow(false) + val isEnabled: StateFlow = _isEnabled + + private val _isListening = MutableStateFlow(false) + val isListening: StateFlow = _isListening + + private val _isSpeaking = MutableStateFlow(false) + val isSpeaking: StateFlow = _isSpeaking + + private val _statusText = MutableStateFlow("Off") + val statusText: StateFlow = _statusText + + private val _lastAssistantText = MutableStateFlow(null) + val lastAssistantText: StateFlow = _lastAssistantText + + private val 
_usingFallbackTts = MutableStateFlow(false) + val usingFallbackTts: StateFlow = _usingFallbackTts + + private var recognizer: SpeechRecognizer? = null + private var restartJob: Job? = null + private var stopRequested = false + private var listeningMode = false + + private var silenceJob: Job? = null + private val silenceWindowMs = 700L + private var lastTranscript: String = "" + private var lastHeardAtMs: Long? = null + private var lastSpokenText: String? = null + private var lastInterruptedAtSeconds: Double? = null + + private var defaultVoiceId: String? = null + private var currentVoiceId: String? = null + private var fallbackVoiceId: String? = null + private var defaultModelId: String? = null + private var currentModelId: String? = null + private var defaultOutputFormat: String? = null + private var apiKey: String? = null + private var voiceAliases: Map = emptyMap() + private var interruptOnSpeech: Boolean = true + private var voiceOverrideActive = false + private var modelOverrideActive = false + private var mainSessionKey: String = "main" + + private var session: BridgeSession? = null + private var pendingRunId: String? = null + private var pendingFinal: CompletableDeferred? = null + private var chatSubscribedSessionKey: String? = null + + private var player: MediaPlayer? = null + private var streamingSource: StreamingMediaDataSource? = null + private var pcmTrack: AudioTrack? = null + @Volatile private var pcmStopRequested = false + private var systemTts: TextToSpeech? = null + private var systemTtsPending: CompletableDeferred? = null + private var systemTtsPendingId: String? 
= null + + fun attachSession(session: BridgeSession) { + this.session = session + chatSubscribedSessionKey = null + } + + fun setEnabled(enabled: Boolean) { + if (_isEnabled.value == enabled) return + _isEnabled.value = enabled + if (enabled) { + Log.d(tag, "enabled") + start() + } else { + Log.d(tag, "disabled") + stop() + } + } + + fun handleBridgeEvent(event: String, payloadJson: String?) { + if (event != "chat") return + if (payloadJson.isNullOrBlank()) return + val pending = pendingRunId ?: return + val obj = + try { + json.parseToJsonElement(payloadJson).asObjectOrNull() + } catch (_: Throwable) { + null + } ?: return + val runId = obj["runId"].asStringOrNull() ?: return + if (runId != pending) return + val state = obj["state"].asStringOrNull() ?: return + if (state == "final") { + pendingFinal?.complete(true) + pendingFinal = null + pendingRunId = null + } + } + + private fun start() { + mainHandler.post { + if (_isListening.value) return@post + stopRequested = false + listeningMode = true + Log.d(tag, "start") + + if (!SpeechRecognizer.isRecognitionAvailable(context)) { + _statusText.value = "Speech recognizer unavailable" + Log.w(tag, "speech recognizer unavailable") + return@post + } + + val micOk = + ContextCompat.checkSelfPermission(context, Manifest.permission.RECORD_AUDIO) == + PackageManager.PERMISSION_GRANTED + if (!micOk) { + _statusText.value = "Microphone permission required" + Log.w(tag, "microphone permission required") + return@post + } + + try { + recognizer?.destroy() + recognizer = SpeechRecognizer.createSpeechRecognizer(context).also { it.setRecognitionListener(listener) } + startListeningInternal(markListening = true) + startSilenceMonitor() + Log.d(tag, "listening") + } catch (err: Throwable) { + _statusText.value = "Start failed: ${err.message ?: err::class.simpleName}" + Log.w(tag, "start failed: ${err.message ?: err::class.simpleName}") + } + } + } + + private fun stop() { + stopRequested = true + listeningMode = false + 
restartJob?.cancel() + restartJob = null + silenceJob?.cancel() + silenceJob = null + lastTranscript = "" + lastHeardAtMs = null + _isListening.value = false + _statusText.value = "Off" + stopSpeaking() + _usingFallbackTts.value = false + chatSubscribedSessionKey = null + + mainHandler.post { + recognizer?.cancel() + recognizer?.destroy() + recognizer = null + } + systemTts?.stop() + systemTtsPending?.cancel() + systemTtsPending = null + systemTtsPendingId = null + } + + private fun startListeningInternal(markListening: Boolean) { + val r = recognizer ?: return + val intent = + Intent(RecognizerIntent.ACTION_RECOGNIZE_SPEECH).apply { + putExtra(RecognizerIntent.EXTRA_LANGUAGE_MODEL, RecognizerIntent.LANGUAGE_MODEL_FREE_FORM) + putExtra(RecognizerIntent.EXTRA_PARTIAL_RESULTS, true) + putExtra(RecognizerIntent.EXTRA_MAX_RESULTS, 3) + putExtra(RecognizerIntent.EXTRA_CALLING_PACKAGE, context.packageName) + } + + if (markListening) { + _statusText.value = "Listening" + _isListening.value = true + } + r.startListening(intent) + } + + private fun scheduleRestart(delayMs: Long = 350) { + if (stopRequested) return + restartJob?.cancel() + restartJob = + scope.launch { + delay(delayMs) + mainHandler.post { + if (stopRequested) return@post + try { + recognizer?.cancel() + val shouldListen = listeningMode + val shouldInterrupt = _isSpeaking.value && interruptOnSpeech + if (!shouldListen && !shouldInterrupt) return@post + startListeningInternal(markListening = shouldListen) + } catch (_: Throwable) { + // handled by onError + } + } + } + } + + private fun handleTranscript(text: String, isFinal: Boolean) { + val trimmed = text.trim() + if (_isSpeaking.value && interruptOnSpeech) { + if (shouldInterrupt(trimmed)) { + stopSpeaking() + } + return + } + + if (!_isListening.value) return + + if (trimmed.isNotEmpty()) { + lastTranscript = trimmed + lastHeardAtMs = SystemClock.elapsedRealtime() + } + + if (isFinal) { + lastTranscript = trimmed + } + } + + private fun 
startSilenceMonitor() { + silenceJob?.cancel() + silenceJob = + scope.launch { + while (_isEnabled.value) { + delay(200) + checkSilence() + } + } + } + + private fun checkSilence() { + if (!_isListening.value) return + val transcript = lastTranscript.trim() + if (transcript.isEmpty()) return + val lastHeard = lastHeardAtMs ?: return + val elapsed = SystemClock.elapsedRealtime() - lastHeard + if (elapsed < silenceWindowMs) return + scope.launch { finalizeTranscript(transcript) } + } + + private suspend fun finalizeTranscript(transcript: String) { + listeningMode = false + _isListening.value = false + _statusText.value = "Thinking…" + lastTranscript = "" + lastHeardAtMs = null + + reloadConfig() + val prompt = buildPrompt(transcript) + val bridge = session + if (bridge == null) { + _statusText.value = "Bridge not connected" + Log.w(tag, "finalize: bridge not connected") + start() + return + } + + try { + val startedAt = System.currentTimeMillis().toDouble() / 1000.0 + subscribeChatIfNeeded(bridge = bridge, sessionKey = mainSessionKey) + Log.d(tag, "chat.send start sessionKey=${mainSessionKey.ifBlank { "main" }} chars=${prompt.length}") + val runId = sendChat(prompt, bridge) + Log.d(tag, "chat.send ok runId=$runId") + val ok = waitForChatFinal(runId) + if (!ok) { + Log.w(tag, "chat final timeout runId=$runId; attempting history fallback") + } + val assistant = waitForAssistantText(bridge, startedAt, if (ok) 12_000 else 25_000) + if (assistant.isNullOrBlank()) { + _statusText.value = "No reply" + Log.w(tag, "assistant text timeout runId=$runId") + start() + return + } + Log.d(tag, "assistant text ok chars=${assistant.length}") + playAssistant(assistant) + } catch (err: Throwable) { + _statusText.value = "Talk failed: ${err.message ?: err::class.simpleName}" + Log.w(tag, "finalize failed: ${err.message ?: err::class.simpleName}") + } + + if (_isEnabled.value) { + start() + } + } + + private suspend fun subscribeChatIfNeeded(bridge: BridgeSession, sessionKey: String) { + 
val key = sessionKey.trim() + if (key.isEmpty()) return + if (chatSubscribedSessionKey == key) return + try { + bridge.sendEvent("chat.subscribe", """{"sessionKey":"$key"}""") + chatSubscribedSessionKey = key + Log.d(tag, "chat.subscribe ok sessionKey=$key") + } catch (err: Throwable) { + Log.w(tag, "chat.subscribe failed sessionKey=$key err=${err.message ?: err::class.java.simpleName}") + } + } + + private fun buildPrompt(transcript: String): String { + val lines = mutableListOf( + "Talk Mode active. Reply in a concise, spoken tone.", + "You may optionally prefix the response with JSON (first line) to set ElevenLabs voice (id or alias), e.g. {\"voice\":\"\",\"once\":true}.", + ) + lastInterruptedAtSeconds?.let { + lines.add("Assistant speech interrupted at ${"%.1f".format(it)}s.") + lastInterruptedAtSeconds = null + } + lines.add("") + lines.add(transcript) + return lines.joinToString("\n") + } + + private suspend fun sendChat(message: String, bridge: BridgeSession): String { + val runId = UUID.randomUUID().toString() + val params = + buildJsonObject { + put("sessionKey", JsonPrimitive(mainSessionKey.ifBlank { "main" })) + put("message", JsonPrimitive(message)) + put("thinking", JsonPrimitive("low")) + put("timeoutMs", JsonPrimitive(30_000)) + put("idempotencyKey", JsonPrimitive(runId)) + } + val res = bridge.request("chat.send", params.toString()) + val parsed = parseRunId(res) ?: runId + if (parsed != runId) { + pendingRunId = parsed + } + return parsed + } + + private suspend fun waitForChatFinal(runId: String): Boolean { + pendingFinal?.cancel() + val deferred = CompletableDeferred() + pendingRunId = runId + pendingFinal = deferred + + val result = + withContext(Dispatchers.IO) { + try { + kotlinx.coroutines.withTimeout(120_000) { deferred.await() } + } catch (_: Throwable) { + false + } + } + + if (!result) { + pendingFinal = null + pendingRunId = null + } + return result + } + + private suspend fun waitForAssistantText( + bridge: BridgeSession, + 
sinceSeconds: Double, + timeoutMs: Long, + ): String? { + val deadline = SystemClock.elapsedRealtime() + timeoutMs + while (SystemClock.elapsedRealtime() < deadline) { + val text = fetchLatestAssistantText(bridge, sinceSeconds) + if (!text.isNullOrBlank()) return text + delay(300) + } + return null + } + + private suspend fun fetchLatestAssistantText( + bridge: BridgeSession, + sinceSeconds: Double? = null, + ): String? { + val key = mainSessionKey.ifBlank { "main" } + val res = bridge.request("chat.history", "{\"sessionKey\":\"$key\"}") + val root = json.parseToJsonElement(res).asObjectOrNull() ?: return null + val messages = root["messages"] as? JsonArray ?: return null + for (item in messages.reversed()) { + val obj = item.asObjectOrNull() ?: continue + if (obj["role"].asStringOrNull() != "assistant") continue + if (sinceSeconds != null) { + val timestamp = obj["timestamp"].asDoubleOrNull() + if (timestamp != null && !TalkModeRuntime.isMessageTimestampAfter(timestamp, sinceSeconds)) continue + } + val content = obj["content"] as? 
JsonArray ?: continue + val text = + content.mapNotNull { entry -> + entry.asObjectOrNull()?.get("text")?.asStringOrNull()?.trim() + }.filter { it.isNotEmpty() } + if (text.isNotEmpty()) return text.joinToString("\n") + } + return null + } + + private suspend fun playAssistant(text: String) { + val parsed = TalkDirectiveParser.parse(text) + if (parsed.unknownKeys.isNotEmpty()) { + Log.w(tag, "Unknown talk directive keys: ${parsed.unknownKeys}") + } + val directive = parsed.directive + val cleaned = parsed.stripped.trim() + if (cleaned.isEmpty()) return + _lastAssistantText.value = cleaned + + val requestedVoice = directive?.voiceId?.trim()?.takeIf { it.isNotEmpty() } + val resolvedVoice = resolveVoiceAlias(requestedVoice) + if (requestedVoice != null && resolvedVoice == null) { + Log.w(tag, "unknown voice alias: $requestedVoice") + } + + if (directive?.voiceId != null) { + if (directive.once != true) { + currentVoiceId = resolvedVoice + voiceOverrideActive = true + } + } + if (directive?.modelId != null) { + if (directive.once != true) { + currentModelId = directive.modelId + modelOverrideActive = true + } + } + + val apiKey = + apiKey?.trim()?.takeIf { it.isNotEmpty() } + ?: System.getenv("ELEVENLABS_API_KEY")?.trim() + val preferredVoice = resolvedVoice ?: currentVoiceId ?: defaultVoiceId + val voiceId = + if (!apiKey.isNullOrEmpty()) { + resolveVoiceId(preferredVoice, apiKey) + } else { + null + } + + _statusText.value = "Speaking…" + _isSpeaking.value = true + lastSpokenText = cleaned + ensureInterruptListener() + + try { + val canUseElevenLabs = !voiceId.isNullOrBlank() && !apiKey.isNullOrEmpty() + if (!canUseElevenLabs) { + if (voiceId.isNullOrBlank()) { + Log.w(tag, "missing voiceId; falling back to system voice") + } + if (apiKey.isNullOrEmpty()) { + Log.w(tag, "missing ELEVENLABS_API_KEY; falling back to system voice") + } + _usingFallbackTts.value = true + _statusText.value = "Speaking (System)…" + speakWithSystemTts(cleaned) + } else { + 
_usingFallbackTts.value = false + val ttsStarted = SystemClock.elapsedRealtime() + val modelId = directive?.modelId ?: currentModelId ?: defaultModelId + val request = + ElevenLabsRequest( + text = cleaned, + modelId = modelId, + outputFormat = + TalkModeRuntime.validatedOutputFormat(directive?.outputFormat ?: defaultOutputFormat), + speed = TalkModeRuntime.resolveSpeed(directive?.speed, directive?.rateWpm), + stability = TalkModeRuntime.validatedStability(directive?.stability, modelId), + similarity = TalkModeRuntime.validatedUnit(directive?.similarity), + style = TalkModeRuntime.validatedUnit(directive?.style), + speakerBoost = directive?.speakerBoost, + seed = TalkModeRuntime.validatedSeed(directive?.seed), + normalize = TalkModeRuntime.validatedNormalize(directive?.normalize), + language = TalkModeRuntime.validatedLanguage(directive?.language), + latencyTier = TalkModeRuntime.validatedLatencyTier(directive?.latencyTier), + ) + streamAndPlay(voiceId = voiceId!!, apiKey = apiKey!!, request = request) + Log.d(tag, "elevenlabs stream ok durMs=${SystemClock.elapsedRealtime() - ttsStarted}") + } + } catch (err: Throwable) { + Log.w(tag, "speak failed: ${err.message ?: err::class.simpleName}; falling back to system voice") + try { + _usingFallbackTts.value = true + _statusText.value = "Speaking (System)…" + speakWithSystemTts(cleaned) + } catch (fallbackErr: Throwable) { + _statusText.value = "Speak failed: ${fallbackErr.message ?: fallbackErr::class.simpleName}" + Log.w(tag, "system voice failed: ${fallbackErr.message ?: fallbackErr::class.simpleName}") + } + } + + _isSpeaking.value = false + } + + private suspend fun streamAndPlay(voiceId: String, apiKey: String, request: ElevenLabsRequest) { + stopSpeaking(resetInterrupt = false) + + pcmStopRequested = false + val pcmSampleRate = TalkModeRuntime.parsePcmSampleRate(request.outputFormat) + if (pcmSampleRate != null) { + try { + streamAndPlayPcm(voiceId = voiceId, apiKey = apiKey, request = request, sampleRate = 
pcmSampleRate) + return + } catch (err: Throwable) { + if (pcmStopRequested) return + Log.w(tag, "pcm playback failed; falling back to mp3: ${err.message ?: err::class.simpleName}") + } + } + + streamAndPlayMp3(voiceId = voiceId, apiKey = apiKey, request = request) + } + + private suspend fun streamAndPlayMp3(voiceId: String, apiKey: String, request: ElevenLabsRequest) { + val dataSource = StreamingMediaDataSource() + streamingSource = dataSource + + val player = MediaPlayer() + this.player = player + + val prepared = CompletableDeferred() + val finished = CompletableDeferred() + + player.setAudioAttributes( + AudioAttributes.Builder() + .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH) + .setUsage(AudioAttributes.USAGE_ASSISTANT) + .build(), + ) + player.setOnPreparedListener { + it.start() + prepared.complete(Unit) + } + player.setOnCompletionListener { + finished.complete(Unit) + } + player.setOnErrorListener { _, _, _ -> + finished.completeExceptionally(IllegalStateException("MediaPlayer error")) + true + } + + player.setDataSource(dataSource) + withContext(Dispatchers.Main) { + player.prepareAsync() + } + + val fetchError = CompletableDeferred() + val fetchJob = + scope.launch(Dispatchers.IO) { + try { + streamTts(voiceId = voiceId, apiKey = apiKey, request = request, sink = dataSource) + fetchError.complete(null) + } catch (err: Throwable) { + dataSource.fail() + fetchError.complete(err) + } + } + + Log.d(tag, "play start") + try { + prepared.await() + finished.await() + fetchError.await()?.let { throw it } + } finally { + fetchJob.cancel() + cleanupPlayer() + } + Log.d(tag, "play done") + } + + private suspend fun streamAndPlayPcm( + voiceId: String, + apiKey: String, + request: ElevenLabsRequest, + sampleRate: Int, + ) { + val minBuffer = + AudioTrack.getMinBufferSize( + sampleRate, + AudioFormat.CHANNEL_OUT_MONO, + AudioFormat.ENCODING_PCM_16BIT, + ) + if (minBuffer <= 0) { + throw IllegalStateException("AudioTrack buffer size invalid: $minBuffer") + } 
+ + val bufferSize = max(minBuffer * 2, 8 * 1024) + val track = + AudioTrack( + AudioAttributes.Builder() + .setContentType(AudioAttributes.CONTENT_TYPE_SPEECH) + .setUsage(AudioAttributes.USAGE_ASSISTANT) + .build(), + AudioFormat.Builder() + .setSampleRate(sampleRate) + .setChannelMask(AudioFormat.CHANNEL_OUT_MONO) + .setEncoding(AudioFormat.ENCODING_PCM_16BIT) + .build(), + bufferSize, + AudioTrack.MODE_STREAM, + AudioManager.AUDIO_SESSION_ID_GENERATE, + ) + if (track.state != AudioTrack.STATE_INITIALIZED) { + track.release() + throw IllegalStateException("AudioTrack init failed") + } + pcmTrack = track + track.play() + + Log.d(tag, "pcm play start sampleRate=$sampleRate bufferSize=$bufferSize") + try { + streamPcm(voiceId = voiceId, apiKey = apiKey, request = request, track = track) + } finally { + cleanupPcmTrack() + } + Log.d(tag, "pcm play done") + } + + private suspend fun speakWithSystemTts(text: String) { + val trimmed = text.trim() + if (trimmed.isEmpty()) return + val ok = ensureSystemTts() + if (!ok) { + throw IllegalStateException("system TTS unavailable") + } + + val tts = systemTts ?: throw IllegalStateException("system TTS unavailable") + val utteranceId = "talk-${UUID.randomUUID()}" + val deferred = CompletableDeferred() + systemTtsPending?.cancel() + systemTtsPending = deferred + systemTtsPendingId = utteranceId + + withContext(Dispatchers.Main) { + val params = Bundle() + tts.speak(trimmed, TextToSpeech.QUEUE_FLUSH, params, utteranceId) + } + + withContext(Dispatchers.IO) { + try { + kotlinx.coroutines.withTimeout(180_000) { deferred.await() } + } catch (err: Throwable) { + throw err + } + } + } + + private suspend fun ensureSystemTts(): Boolean { + if (systemTts != null) return true + return withContext(Dispatchers.Main) { + val deferred = CompletableDeferred() + val tts = + try { + TextToSpeech(context) { status -> + deferred.complete(status == TextToSpeech.SUCCESS) + } + } catch (_: Throwable) { + deferred.complete(false) + null + } + if (tts 
== null) return@withContext false + + tts.setOnUtteranceProgressListener( + object : UtteranceProgressListener() { + override fun onStart(utteranceId: String?) {} + + override fun onDone(utteranceId: String?) { + if (utteranceId == null) return + if (utteranceId != systemTtsPendingId) return + systemTtsPending?.complete(Unit) + systemTtsPending = null + systemTtsPendingId = null + } + + @Deprecated("Deprecated in Java") + override fun onError(utteranceId: String?) { + if (utteranceId == null) return + if (utteranceId != systemTtsPendingId) return + systemTtsPending?.completeExceptionally(IllegalStateException("system TTS error")) + systemTtsPending = null + systemTtsPendingId = null + } + + override fun onError(utteranceId: String?, errorCode: Int) { + if (utteranceId == null) return + if (utteranceId != systemTtsPendingId) return + systemTtsPending?.completeExceptionally(IllegalStateException("system TTS error $errorCode")) + systemTtsPending = null + systemTtsPendingId = null + } + }, + ) + + val ok = + try { + deferred.await() + } catch (_: Throwable) { + false + } + if (ok) { + systemTts = tts + } else { + tts.shutdown() + } + ok + } + } + + private fun stopSpeaking(resetInterrupt: Boolean = true) { + pcmStopRequested = true + if (!_isSpeaking.value) { + cleanupPlayer() + cleanupPcmTrack() + systemTts?.stop() + systemTtsPending?.cancel() + systemTtsPending = null + systemTtsPendingId = null + return + } + if (resetInterrupt) { + val currentMs = player?.currentPosition?.toDouble() ?: 0.0 + lastInterruptedAtSeconds = currentMs / 1000.0 + } + cleanupPlayer() + cleanupPcmTrack() + systemTts?.stop() + systemTtsPending?.cancel() + systemTtsPending = null + systemTtsPendingId = null + _isSpeaking.value = false + } + + private fun cleanupPlayer() { + player?.stop() + player?.release() + player = null + streamingSource?.close() + streamingSource = null + } + + private fun cleanupPcmTrack() { + val track = pcmTrack ?: return + try { + track.pause() + track.flush() + 
track.stop() + } catch (_: Throwable) { + // ignore cleanup errors + } finally { + track.release() + } + pcmTrack = null + } + + private fun shouldInterrupt(transcript: String): Boolean { + val trimmed = transcript.trim() + if (trimmed.length < 3) return false + val spoken = lastSpokenText?.lowercase() + if (spoken != null && spoken.contains(trimmed.lowercase())) return false + return true + } + + private suspend fun reloadConfig() { + val bridge = session ?: return + val envVoice = System.getenv("ELEVENLABS_VOICE_ID")?.trim() + val sagVoice = System.getenv("SAG_VOICE_ID")?.trim() + val envKey = System.getenv("ELEVENLABS_API_KEY")?.trim() + try { + val res = bridge.request("config.get", "{}") + val root = json.parseToJsonElement(res).asObjectOrNull() + val config = root?.get("config").asObjectOrNull() + val talk = config?.get("talk").asObjectOrNull() + val sessionCfg = config?.get("session").asObjectOrNull() + val mainKey = sessionCfg?.get("mainKey").asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() } ?: "main" + val voice = talk?.get("voiceId")?.asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() } + val aliases = + talk?.get("voiceAliases").asObjectOrNull()?.entries?.mapNotNull { (key, value) -> + val id = value.asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() } ?: return@mapNotNull null + normalizeAliasKey(key).takeIf { it.isNotEmpty() }?.let { it to id } + }?.toMap().orEmpty() + val model = talk?.get("modelId")?.asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() } + val outputFormat = talk?.get("outputFormat")?.asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() } + val key = talk?.get("apiKey")?.asStringOrNull()?.trim()?.takeIf { it.isNotEmpty() } + val interrupt = talk?.get("interruptOnSpeech")?.asBooleanOrNull() + + mainSessionKey = mainKey + defaultVoiceId = voice ?: envVoice?.takeIf { it.isNotEmpty() } ?: sagVoice?.takeIf { it.isNotEmpty() } + voiceAliases = aliases + if (!voiceOverrideActive) currentVoiceId = defaultVoiceId + defaultModelId = model ?: 
defaultModelIdFallback + if (!modelOverrideActive) currentModelId = defaultModelId + defaultOutputFormat = outputFormat ?: defaultOutputFormatFallback + apiKey = key ?: envKey?.takeIf { it.isNotEmpty() } + if (interrupt != null) interruptOnSpeech = interrupt + } catch (_: Throwable) { + defaultVoiceId = envVoice?.takeIf { it.isNotEmpty() } ?: sagVoice?.takeIf { it.isNotEmpty() } + defaultModelId = defaultModelIdFallback + if (!modelOverrideActive) currentModelId = defaultModelId + apiKey = envKey?.takeIf { it.isNotEmpty() } + voiceAliases = emptyMap() + defaultOutputFormat = defaultOutputFormatFallback + } + } + + private fun parseRunId(jsonString: String): String? { + val obj = json.parseToJsonElement(jsonString).asObjectOrNull() ?: return null + return obj["runId"].asStringOrNull() + } + + private suspend fun streamTts( + voiceId: String, + apiKey: String, + request: ElevenLabsRequest, + sink: StreamingMediaDataSource, + ) { + withContext(Dispatchers.IO) { + val conn = openTtsConnection(voiceId = voiceId, apiKey = apiKey, request = request) + try { + val payload = buildRequestPayload(request) + conn.outputStream.use { it.write(payload.toByteArray()) } + + val code = conn.responseCode + if (code >= 400) { + val message = conn.errorStream?.readBytes()?.toString(Charsets.UTF_8) ?: "" + sink.fail() + throw IllegalStateException("ElevenLabs failed: $code $message") + } + + val buffer = ByteArray(8 * 1024) + conn.inputStream.use { input -> + while (true) { + val read = input.read(buffer) + if (read <= 0) break + sink.append(buffer.copyOf(read)) + } + } + sink.finish() + } finally { + conn.disconnect() + } + } + } + + private suspend fun streamPcm( + voiceId: String, + apiKey: String, + request: ElevenLabsRequest, + track: AudioTrack, + ) { + withContext(Dispatchers.IO) { + val conn = openTtsConnection(voiceId = voiceId, apiKey = apiKey, request = request) + try { + val payload = buildRequestPayload(request) + conn.outputStream.use { it.write(payload.toByteArray()) } + 
+ val code = conn.responseCode + if (code >= 400) { + val message = conn.errorStream?.readBytes()?.toString(Charsets.UTF_8) ?: "" + throw IllegalStateException("ElevenLabs failed: $code $message") + } + + val buffer = ByteArray(8 * 1024) + conn.inputStream.use { input -> + while (true) { + if (pcmStopRequested) return@withContext + val read = input.read(buffer) + if (read <= 0) break + var offset = 0 + while (offset < read) { + if (pcmStopRequested) return@withContext + val wrote = + try { + track.write(buffer, offset, read - offset) + } catch (err: Throwable) { + if (pcmStopRequested) return@withContext + throw err + } + if (wrote <= 0) { + if (pcmStopRequested) return@withContext + throw IllegalStateException("AudioTrack write failed: $wrote") + } + offset += wrote + } + } + } + } finally { + conn.disconnect() + } + } + } + + private fun openTtsConnection( + voiceId: String, + apiKey: String, + request: ElevenLabsRequest, + ): HttpURLConnection { + val baseUrl = "https://api.elevenlabs.io/v1/text-to-speech/$voiceId/stream" + val latencyTier = request.latencyTier + val url = + if (latencyTier != null) { + URL("$baseUrl?optimize_streaming_latency=$latencyTier") + } else { + URL(baseUrl) + } + val conn = url.openConnection() as HttpURLConnection + conn.requestMethod = "POST" + conn.connectTimeout = 30_000 + conn.readTimeout = 30_000 + conn.setRequestProperty("Content-Type", "application/json") + conn.setRequestProperty("Accept", resolveAcceptHeader(request.outputFormat)) + conn.setRequestProperty("xi-api-key", apiKey) + conn.doOutput = true + return conn + } + + private fun resolveAcceptHeader(outputFormat: String?): String { + val normalized = outputFormat?.trim()?.lowercase().orEmpty() + return if (normalized.startsWith("pcm_")) "audio/pcm" else "audio/mpeg" + } + + private fun buildRequestPayload(request: ElevenLabsRequest): String { + val voiceSettingsEntries = + buildJsonObject { + request.speed?.let { put("speed", JsonPrimitive(it)) } + request.stability?.let 
{ put("stability", JsonPrimitive(it)) } + request.similarity?.let { put("similarity_boost", JsonPrimitive(it)) } + request.style?.let { put("style", JsonPrimitive(it)) } + request.speakerBoost?.let { put("use_speaker_boost", JsonPrimitive(it)) } + } + + val payload = + buildJsonObject { + put("text", JsonPrimitive(request.text)) + request.modelId?.takeIf { it.isNotEmpty() }?.let { put("model_id", JsonPrimitive(it)) } + request.outputFormat?.takeIf { it.isNotEmpty() }?.let { put("output_format", JsonPrimitive(it)) } + request.seed?.let { put("seed", JsonPrimitive(it)) } + request.normalize?.let { put("apply_text_normalization", JsonPrimitive(it)) } + request.language?.let { put("language_code", JsonPrimitive(it)) } + if (voiceSettingsEntries.isNotEmpty()) { + put("voice_settings", voiceSettingsEntries) + } + } + + return payload.toString() + } + + private data class ElevenLabsRequest( + val text: String, + val modelId: String?, + val outputFormat: String?, + val speed: Double?, + val stability: Double?, + val similarity: Double?, + val style: Double?, + val speakerBoost: Boolean?, + val seed: Long?, + val normalize: String?, + val language: String?, + val latencyTier: Int?, + ) + + private object TalkModeRuntime { + fun resolveSpeed(speed: Double?, rateWpm: Int?): Double? { + if (rateWpm != null && rateWpm > 0) { + val resolved = rateWpm.toDouble() / 175.0 + if (resolved <= 0.5 || resolved >= 2.0) return null + return resolved + } + if (speed != null) { + if (speed <= 0.5 || speed >= 2.0) return null + return speed + } + return null + } + + fun validatedUnit(value: Double?): Double? { + if (value == null) return null + if (value < 0 || value > 1) return null + return value + } + + fun validatedStability(value: Double?, modelId: String?): Double? 
{ + if (value == null) return null + val normalized = modelId?.trim()?.lowercase() + if (normalized == "eleven_v3") { + return if (value == 0.0 || value == 0.5 || value == 1.0) value else null + } + return validatedUnit(value) + } + + fun validatedSeed(value: Long?): Long? { + if (value == null) return null + if (value < 0 || value > 4294967295L) return null + return value + } + + fun validatedNormalize(value: String?): String? { + val normalized = value?.trim()?.lowercase() ?: return null + return if (normalized in listOf("auto", "on", "off")) normalized else null + } + + fun validatedLanguage(value: String?): String? { + val normalized = value?.trim()?.lowercase() ?: return null + if (normalized.length != 2) return null + if (!normalized.all { it in 'a'..'z' }) return null + return normalized + } + + fun validatedOutputFormat(value: String?): String? { + val trimmed = value?.trim()?.lowercase() ?: return null + if (trimmed.isEmpty()) return null + if (trimmed.startsWith("mp3_")) return trimmed + return if (parsePcmSampleRate(trimmed) != null) trimmed else null + } + + fun validatedLatencyTier(value: Int?): Int? { + if (value == null) return null + if (value < 0 || value > 4) return null + return value + } + + fun parsePcmSampleRate(value: String?): Int? 
{ + val trimmed = value?.trim()?.lowercase() ?: return null + if (!trimmed.startsWith("pcm_")) return null + val suffix = trimmed.removePrefix("pcm_") + val digits = suffix.takeWhile { it.isDigit() } + val rate = digits.toIntOrNull() ?: return null + return if (rate in setOf(16000, 22050, 24000, 44100)) rate else null + } + + fun isMessageTimestampAfter(timestamp: Double, sinceSeconds: Double): Boolean { + val sinceMs = sinceSeconds * 1000 + return if (timestamp > 10_000_000_000) { + timestamp >= sinceMs - 500 + } else { + timestamp >= sinceSeconds - 0.5 + } + } + } + + private fun ensureInterruptListener() { + if (!interruptOnSpeech || !_isEnabled.value) return + mainHandler.post { + if (stopRequested) return@post + if (!SpeechRecognizer.isRecognitionAvailable(context)) return@post + try { + if (recognizer == null) { + recognizer = SpeechRecognizer.createSpeechRecognizer(context).also { it.setRecognitionListener(listener) } + } + recognizer?.cancel() + startListeningInternal(markListening = false) + } catch (_: Throwable) { + // ignore + } + } + } + + private fun resolveVoiceAlias(value: String?): String? { + val trimmed = value?.trim().orEmpty() + if (trimmed.isEmpty()) return null + val normalized = normalizeAliasKey(trimmed) + voiceAliases[normalized]?.let { return it } + if (voiceAliases.values.any { it.equals(trimmed, ignoreCase = true) }) return trimmed + return if (isLikelyVoiceId(trimmed)) trimmed else null + } + + private suspend fun resolveVoiceId(preferred: String?, apiKey: String): String? 
{ + val trimmed = preferred?.trim().orEmpty() + if (trimmed.isNotEmpty()) { + val resolved = resolveVoiceAlias(trimmed) + if (resolved != null) return resolved + Log.w(tag, "unknown voice alias $trimmed") + } + fallbackVoiceId?.let { return it } + + return try { + val voices = listVoices(apiKey) + val first = voices.firstOrNull() ?: return null + fallbackVoiceId = first.voiceId + if (defaultVoiceId.isNullOrBlank()) { + defaultVoiceId = first.voiceId + } + if (!voiceOverrideActive) { + currentVoiceId = first.voiceId + } + val name = first.name ?: "unknown" + Log.d(tag, "default voice selected $name (${first.voiceId})") + first.voiceId + } catch (err: Throwable) { + Log.w(tag, "list voices failed: ${err.message ?: err::class.simpleName}") + null + } + } + + private suspend fun listVoices(apiKey: String): List<ElevenLabsVoice> { + return withContext(Dispatchers.IO) { + val url = URL("https://api.elevenlabs.io/v1/voices") + val conn = url.openConnection() as HttpURLConnection + conn.requestMethod = "GET" + conn.connectTimeout = 15_000 + conn.readTimeout = 15_000 + conn.setRequestProperty("xi-api-key", apiKey) + + val code = conn.responseCode + val stream = if (code >= 400) conn.errorStream else conn.inputStream + val data = stream?.readBytes() ?: ByteArray(0) + if (code >= 400) { + val message = data.toString(Charsets.UTF_8) + throw IllegalStateException("ElevenLabs voices failed: $code $message") + } + + val root = json.parseToJsonElement(data.toString(Charsets.UTF_8)).asObjectOrNull() + val voices = (root?.get("voices") as?
JsonArray) ?: JsonArray(emptyList()) + voices.mapNotNull { entry -> + val obj = entry.asObjectOrNull() ?: return@mapNotNull null + val voiceId = obj["voice_id"].asStringOrNull() ?: return@mapNotNull null + val name = obj["name"].asStringOrNull() + ElevenLabsVoice(voiceId, name) + } + } + } + + private fun isLikelyVoiceId(value: String): Boolean { + if (value.length < 10) return false + return value.all { it.isLetterOrDigit() || it == '-' || it == '_' } + } + + private fun normalizeAliasKey(value: String): String = + value.trim().lowercase() + + private data class ElevenLabsVoice(val voiceId: String, val name: String?) + + private val listener = + object : RecognitionListener { + override fun onReadyForSpeech(params: Bundle?) { + if (_isEnabled.value) { + _statusText.value = if (_isListening.value) "Listening" else _statusText.value + } + } + + override fun onBeginningOfSpeech() {} + + override fun onRmsChanged(rmsdB: Float) {} + + override fun onBufferReceived(buffer: ByteArray?) {} + + override fun onEndOfSpeech() { + scheduleRestart() + } + + override fun onError(error: Int) { + if (stopRequested) return + _isListening.value = false + if (error == SpeechRecognizer.ERROR_INSUFFICIENT_PERMISSIONS) { + _statusText.value = "Microphone permission required" + return + } + + _statusText.value = + when (error) { + SpeechRecognizer.ERROR_AUDIO -> "Audio error" + SpeechRecognizer.ERROR_CLIENT -> "Client error" + SpeechRecognizer.ERROR_NETWORK -> "Network error" + SpeechRecognizer.ERROR_NETWORK_TIMEOUT -> "Network timeout" + SpeechRecognizer.ERROR_NO_MATCH -> "Listening" + SpeechRecognizer.ERROR_RECOGNIZER_BUSY -> "Recognizer busy" + SpeechRecognizer.ERROR_SERVER -> "Server error" + SpeechRecognizer.ERROR_SPEECH_TIMEOUT -> "Listening" + else -> "Speech error ($error)" + } + scheduleRestart(delayMs = 600) + } + + override fun onResults(results: Bundle?) 
{ + val list = results?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION).orEmpty() + list.firstOrNull()?.let { handleTranscript(it, isFinal = true) } + scheduleRestart() + } + + override fun onPartialResults(partialResults: Bundle?) { + val list = partialResults?.getStringArrayList(SpeechRecognizer.RESULTS_RECOGNITION).orEmpty() + list.firstOrNull()?.let { handleTranscript(it, isFinal = false) } + } + + override fun onEvent(eventType: Int, params: Bundle?) {} + } +} + +private fun JsonElement?.asObjectOrNull(): JsonObject? = this as? JsonObject + +private fun JsonElement?.asStringOrNull(): String? = + (this as? JsonPrimitive)?.takeIf { it.isString }?.content + +private fun JsonElement?.asDoubleOrNull(): Double? { + val primitive = this as? JsonPrimitive ?: return null + return primitive.content.toDoubleOrNull() +} + +private fun JsonElement?.asBooleanOrNull(): Boolean? { + val primitive = this as? JsonPrimitive ?: return null + val content = primitive.content.trim().lowercase() + return when (content) { + "true", "yes", "1" -> true + "false", "no", "0" -> false + else -> null + } +} diff --git a/apps/android/app/src/test/java/com/steipete/clawdis/node/bridge/BridgeEndpointKotestTest.kt b/apps/android/app/src/test/java/com/steipete/clawdis/node/bridge/BridgeEndpointKotestTest.kt new file mode 100644 index 000000000..5e1b09490 --- /dev/null +++ b/apps/android/app/src/test/java/com/steipete/clawdis/node/bridge/BridgeEndpointKotestTest.kt @@ -0,0 +1,14 @@ +package com.steipete.clawdis.node.bridge + +import io.kotest.core.spec.style.StringSpec +import io.kotest.matchers.shouldBe + +class BridgeEndpointKotestTest : StringSpec({ + "manual endpoint builds stable id + name" { + val endpoint = BridgeEndpoint.manual("10.0.0.5", 18790) + endpoint.stableId shouldBe "manual|10.0.0.5|18790" + endpoint.name shouldBe "10.0.0.5:18790" + endpoint.host shouldBe "10.0.0.5" + endpoint.port shouldBe 18790 + } +}) diff --git 
a/apps/android/app/src/test/java/com/steipete/clawdis/node/node/JpegSizeLimiterTest.kt b/apps/android/app/src/test/java/com/steipete/clawdis/node/node/JpegSizeLimiterTest.kt new file mode 100644 index 000000000..457bd189d --- /dev/null +++ b/apps/android/app/src/test/java/com/steipete/clawdis/node/node/JpegSizeLimiterTest.kt @@ -0,0 +1,47 @@ +package com.steipete.clawdis.node.node + +import org.junit.Assert.assertEquals +import org.junit.Assert.assertTrue +import org.junit.Test +import kotlin.math.min + +class JpegSizeLimiterTest { + @Test + fun compressesLargePayloadsUnderLimit() { + val maxBytes = 5 * 1024 * 1024 + val result = + JpegSizeLimiter.compressToLimit( + initialWidth = 4000, + initialHeight = 3000, + startQuality = 95, + maxBytes = maxBytes, + encode = { width, height, quality -> + val estimated = (width.toLong() * height.toLong() * quality.toLong()) / 100 + val size = min(maxBytes.toLong() * 2, estimated).toInt() + ByteArray(size) + }, + ) + + assertTrue(result.bytes.size <= maxBytes) + assertTrue(result.width <= 4000) + assertTrue(result.height <= 3000) + assertTrue(result.quality <= 95) + } + + @Test + fun keepsSmallPayloadsAsIs() { + val maxBytes = 5 * 1024 * 1024 + val result = + JpegSizeLimiter.compressToLimit( + initialWidth = 800, + initialHeight = 600, + startQuality = 90, + maxBytes = maxBytes, + encode = { _, _, _ -> ByteArray(120_000) }, + ) + + assertEquals(800, result.width) + assertEquals(600, result.height) + assertEquals(90, result.quality) + } +} diff --git a/apps/android/app/src/test/java/com/steipete/clawdis/node/voice/TalkDirectiveParserTest.kt b/apps/android/app/src/test/java/com/steipete/clawdis/node/voice/TalkDirectiveParserTest.kt new file mode 100644 index 000000000..d69d2008f --- /dev/null +++ b/apps/android/app/src/test/java/com/steipete/clawdis/node/voice/TalkDirectiveParserTest.kt @@ -0,0 +1,55 @@ +package com.steipete.clawdis.node.voice + +import org.junit.Assert.assertEquals +import org.junit.Assert.assertNull +import 
org.junit.Assert.assertTrue +import org.junit.Test + +class TalkDirectiveParserTest { + @Test + fun parsesDirectiveAndStripsHeader() { + val input = """ + {"voice":"voice-123","once":true} + Hello from talk mode. + """.trimIndent() + val result = TalkDirectiveParser.parse(input) + assertEquals("voice-123", result.directive?.voiceId) + assertEquals(true, result.directive?.once) + assertEquals("Hello from talk mode.", result.stripped.trim()) + } + + @Test + fun ignoresUnknownKeysButReportsThem() { + val input = """ + {"voice":"abc","foo":1,"bar":"baz"} + Hi there. + """.trimIndent() + val result = TalkDirectiveParser.parse(input) + assertEquals("abc", result.directive?.voiceId) + assertTrue(result.unknownKeys.containsAll(listOf("bar", "foo"))) + } + + @Test + fun parsesAlternateKeys() { + val input = """ + {"model_id":"eleven_v3","similarity_boost":0.4,"no_speaker_boost":true,"rate":200} + Speak. + """.trimIndent() + val result = TalkDirectiveParser.parse(input) + assertEquals("eleven_v3", result.directive?.modelId) + assertEquals(0.4, result.directive?.similarity) + assertEquals(false, result.directive?.speakerBoost) + assertEquals(200, result.directive?.rateWpm) + } + + @Test + fun returnsNullWhenNoDirectivePresent() { + val input = """ + {} + Hello. + """.trimIndent() + val result = TalkDirectiveParser.parse(input) + assertNull(result.directive) + assertEquals(input, result.stripped) + } +} diff --git a/apps/ios/Sources/Bridge/BridgeConnectionController.swift b/apps/ios/Sources/Bridge/BridgeConnectionController.swift index 162e13858..8e1347058 100644 --- a/apps/ios/Sources/Bridge/BridgeConnectionController.swift +++ b/apps/ios/Sources/Bridge/BridgeConnectionController.swift @@ -6,6 +6,15 @@ import Observation import SwiftUI import UIKit +protocol BridgePairingClient: Sendable { + func pairAndHello( + endpoint: NWEndpoint, + hello: BridgeHello, + onStatus: (@Sendable (String) -> Void)?) 
async throws -> String +} + +extension BridgeClient: BridgePairingClient {} + @MainActor @Observable final class BridgeConnectionController { @@ -16,10 +25,16 @@ final class BridgeConnectionController { private let discovery = BridgeDiscoveryModel() private weak var appModel: NodeAppModel? private var didAutoConnect = false - private var seenStableIDs = Set() - init(appModel: NodeAppModel, startDiscovery: Bool = true) { + private let bridgeClientFactory: @Sendable () -> any BridgePairingClient + + init( + appModel: NodeAppModel, + startDiscovery: Bool = true, + bridgeClientFactory: @escaping @Sendable () -> any BridgePairingClient = { BridgeClient() }) + { self.appModel = appModel + self.bridgeClientFactory = bridgeClientFactory BridgeSettingsStore.bootstrapPersistence() let defaults = UserDefaults.standard @@ -85,7 +100,7 @@ final class BridgeConnectionController { let token = KeychainStore.loadString( service: "com.steipete.clawdis.bridge", - account: "bridge-token.\(instanceId)")? + account: self.keychainAccount(instanceId: instanceId))? .trimmingCharacters(in: .whitespacesAndNewlines) ?? "" guard !token.isEmpty else { return } @@ -99,28 +114,40 @@ final class BridgeConnectionController { guard let port = NWEndpoint.Port(rawValue: UInt16(resolvedPort)) else { return } self.didAutoConnect = true - appModel.connectToBridge( - endpoint: .hostPort(host: NWEndpoint.Host(manualHost), port: port), - hello: self.makeHello(token: token)) + let endpoint = NWEndpoint.hostPort(host: NWEndpoint.Host(manualHost), port: port) + self.startAutoConnect(endpoint: endpoint, token: token, instanceId: instanceId) return } - let targetStableID = defaults.string(forKey: "bridge.lastDiscoveredStableID")? + let preferredStableID = defaults.string(forKey: "bridge.preferredStableID")? .trimmingCharacters(in: .whitespacesAndNewlines) ?? "" - guard !targetStableID.isEmpty else { return } + let lastDiscoveredStableID = defaults.string(forKey: "bridge.lastDiscoveredStableID")? 
+ .trimmingCharacters(in: .whitespacesAndNewlines) ?? "" + + let candidates = [preferredStableID, lastDiscoveredStableID].filter { !$0.isEmpty } + guard let targetStableID = candidates.first(where: { id in + self.bridges.contains(where: { $0.stableID == id }) + }) else { return } guard let target = self.bridges.first(where: { $0.stableID == targetStableID }) else { return } self.didAutoConnect = true - appModel.connectToBridge(endpoint: target.endpoint, hello: self.makeHello(token: token)) + self.startAutoConnect(endpoint: target.endpoint, token: token, instanceId: instanceId) } private func updateLastDiscoveredBridge(from bridges: [BridgeDiscoveryModel.DiscoveredBridge]) { - let newlyDiscovered = bridges.filter { self.seenStableIDs.insert($0.stableID).inserted } - guard let last = newlyDiscovered.last else { return } + let defaults = UserDefaults.standard + let preferred = defaults.string(forKey: "bridge.preferredStableID")? + .trimmingCharacters(in: .whitespacesAndNewlines) ?? "" + let existingLast = defaults.string(forKey: "bridge.lastDiscoveredStableID")? + .trimmingCharacters(in: .whitespacesAndNewlines) ?? "" - UserDefaults.standard.set(last.stableID, forKey: "bridge.lastDiscoveredStableID") - BridgeSettingsStore.saveLastDiscoveredBridgeStableID(last.stableID) + // Avoid overriding user intent (preferred/lastDiscovered are also set on manual Connect). 
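The candidate ordering above — preferred stable ID first, then the last-discovered ID, skipping blanks and anything not currently visible in the discovery list — reduces to a small pure function. A sketch with illustrative names (not the app's API):

```typescript
// Pick the auto-connect target the way maybeAutoConnect does: try the
// user-preferred bridge first, fall back to the last-discovered one, and
// only accept an ID whose bridge is currently advertised. Returns null
// when neither candidate is visible, so auto-connect simply waits.
function pickAutoConnectTarget(
  preferred: string,
  lastDiscovered: string,
  discoveredIds: readonly string[],
): string | null {
  const candidates = [preferred.trim(), lastDiscovered.trim()].filter((id) => id.length > 0);
  return candidates.find((id) => discoveredIds.includes(id)) ?? null;
}
```

Requiring the ID to be in the live discovery list is what prevents the controller from dialing a stale endpoint that was saved on a previous network.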
+ guard preferred.isEmpty, existingLast.isEmpty else { return } + guard let first = bridges.first else { return } + + defaults.set(first.stableID, forKey: "bridge.lastDiscoveredStableID") + BridgeSettingsStore.saveLastDiscoveredBridgeStableID(first.stableID) } private func makeHello(token: String) -> BridgeHello { @@ -140,6 +167,40 @@ final class BridgeConnectionController { commands: self.currentCommands()) } + private func keychainAccount(instanceId: String) -> String { + "bridge-token.\(instanceId)" + } + + private func startAutoConnect(endpoint: NWEndpoint, token: String, instanceId: String) { + guard let appModel else { return } + Task { [weak self] in + guard let self else { return } + do { + let hello = self.makeHello(token: token) + let refreshed = try await self.bridgeClientFactory().pairAndHello( + endpoint: endpoint, + hello: hello, + onStatus: { status in + Task { @MainActor in + appModel.bridgeStatusText = status + } + }) + let resolvedToken = refreshed.isEmpty ? token : refreshed + if !refreshed.isEmpty, refreshed != token { + _ = KeychainStore.saveString( + refreshed, + service: "com.steipete.clawdis.bridge", + account: self.keychainAccount(instanceId: instanceId)) + } + appModel.connectToBridge(endpoint: endpoint, hello: self.makeHello(token: resolvedToken)) + } catch { + await MainActor.run { + appModel.bridgeStatusText = "Bridge error: \(error.localizedDescription)" + } + } + } + } + private func resolvedDisplayName(defaults: UserDefaults) -> String { let key = "node.displayName" let existing = defaults.string(forKey: key)?.trimmingCharacters(in: .whitespacesAndNewlines) ?? 
"" @@ -265,5 +326,13 @@ extension BridgeConnectionController { func _test_appVersion() -> String { self.appVersion() } + + func _test_setBridges(_ bridges: [BridgeDiscoveryModel.DiscoveredBridge]) { + self.bridges = bridges + } + + func _test_triggerAutoConnect() { + self.maybeAutoConnect() + } } #endif diff --git a/apps/ios/Sources/Bridge/BridgeDiscoveryModel.swift b/apps/ios/Sources/Bridge/BridgeDiscoveryModel.swift index 2555de680..45df2a887 100644 --- a/apps/ios/Sources/Bridge/BridgeDiscoveryModel.swift +++ b/apps/ios/Sources/Bridge/BridgeDiscoveryModel.swift @@ -18,6 +18,12 @@ final class BridgeDiscoveryModel { var endpoint: NWEndpoint var stableID: String var debugID: String + var lanHost: String? + var tailnetDns: String? + var gatewayPort: Int? + var bridgePort: Int? + var canvasPort: Int? + var cliPath: String? } var bridges: [DiscoveredBridge] = [] @@ -68,7 +74,8 @@ final class BridgeDiscoveryModel { switch result.endpoint { case let .service(name, _, _, _): let decodedName = BonjourEscapes.decode(name) - let advertisedName = result.endpoint.txtRecord?.dictionary["displayName"] + let txt = result.endpoint.txtRecord?.dictionary ?? [:] + let advertisedName = txt["displayName"] let prettyAdvertised = advertisedName .map(Self.prettifyInstanceName) .flatMap { $0.isEmpty ? 
nil : $0 } @@ -77,7 +84,13 @@ final class BridgeDiscoveryModel { name: prettyName, endpoint: result.endpoint, stableID: BridgeEndpointID.stableID(result.endpoint), - debugID: BridgeEndpointID.prettyDescription(result.endpoint)) + debugID: BridgeEndpointID.prettyDescription(result.endpoint), + lanHost: Self.txtValue(txt, key: "lanHost"), + tailnetDns: Self.txtValue(txt, key: "tailnetDns"), + gatewayPort: Self.txtIntValue(txt, key: "gatewayPort"), + bridgePort: Self.txtIntValue(txt, key: "bridgePort"), + canvasPort: Self.txtIntValue(txt, key: "canvasPort"), + cliPath: Self.txtValue(txt, key: "cliPath")) default: return nil } @@ -191,4 +204,14 @@ final class BridgeDiscoveryModel { .replacingOccurrences(of: #"\s+\(\d+\)$"#, with: "", options: .regularExpression) return stripped.trimmingCharacters(in: .whitespacesAndNewlines) } + + private static func txtValue(_ dict: [String: String], key: String) -> String? { + let raw = dict[key]?.trimmingCharacters(in: .whitespacesAndNewlines) ?? "" + return raw.isEmpty ? nil : raw + } + + private static func txtIntValue(_ dict: [String: String], key: String) -> Int? { + guard let raw = self.txtValue(dict, key: key) else { return nil } + return Int(raw) + } } diff --git a/apps/ios/Sources/Camera/CameraController.swift b/apps/ios/Sources/Camera/CameraController.swift index a57769d31..00d633bd9 100644 --- a/apps/ios/Sources/Camera/CameraController.swift +++ b/apps/ios/Sources/Camera/CameraController.swift @@ -84,10 +84,14 @@ actor CameraController { } withExtendedLifetime(delegate) {} + let maxPayloadBytes = 5 * 1024 * 1024 + // Base64 inflates payloads by ~4/3; cap encoded bytes so the payload stays under 5MB (API limit). 
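The 4/3 ratio in that comment comes from base64 emitting 4 output characters for every 3 input bytes, so a raw budget of `floor(maxPayloadBytes / 4) * 3` is the largest JPEG whose encoded form still fits. A quick sketch of the arithmetic (hypothetical helper names):

```typescript
// Largest raw byte count whose base64 encoding fits in maxPayloadBytes —
// the same expression as maxEncodedBytes in CameraController.
function maxRawBytesForBase64Budget(maxPayloadBytes: number): number {
  return Math.floor(maxPayloadBytes / 4) * 3;
}

// Base64 output length for n raw bytes: 4 characters per 3-byte group,
// with the final partial group padded up to a full 4 characters.
function base64Length(rawBytes: number): number {
  return 4 * Math.ceil(rawBytes / 3);
}
```

For the 5 MB API limit this gives a raw cap of 3,932,160 bytes, and the cap is tight: three more raw bytes would push the encoded payload over budget.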
+ let maxEncodedBytes = (maxPayloadBytes / 4) * 3 let res = try JPEGTranscoder.transcodeToJPEG( imageData: rawData, maxWidthPx: maxWidth, - quality: quality) + quality: quality, + maxBytes: maxEncodedBytes) return ( format: format.rawValue, diff --git a/apps/ios/Sources/Chat/ChatSheet.swift b/apps/ios/Sources/Chat/ChatSheet.swift index 706a6b789..1d2d059bb 100644 --- a/apps/ios/Sources/Chat/ChatSheet.swift +++ b/apps/ios/Sources/Chat/ChatSheet.swift @@ -4,18 +4,20 @@ import SwiftUI struct ChatSheet: View { @Environment(\.dismiss) private var dismiss @State private var viewModel: ClawdisChatViewModel + private let userAccent: Color? - init(bridge: BridgeSession, sessionKey: String = "main") { + init(bridge: BridgeSession, sessionKey: String = "main", userAccent: Color? = nil) { let transport = IOSBridgeChatTransport(bridge: bridge) self._viewModel = State( initialValue: ClawdisChatViewModel( sessionKey: sessionKey, transport: transport)) + self.userAccent = userAccent } var body: some View { NavigationStack { - ClawdisChatView(viewModel: self.viewModel) + ClawdisChatView(viewModel: self.viewModel, userAccent: self.userAccent) .navigationTitle("Chat") .navigationBarTitleDisplayMode(.inline) .toolbar { diff --git a/apps/ios/Sources/Model/NodeAppModel.swift b/apps/ios/Sources/Model/NodeAppModel.swift index 36b9345e1..f57aa9f99 100644 --- a/apps/ios/Sources/Model/NodeAppModel.swift +++ b/apps/ios/Sources/Model/NodeAppModel.swift @@ -22,12 +22,15 @@ final class NodeAppModel { var bridgeServerName: String? var bridgeRemoteAddress: String? var connectedBridgeID: String? + var seamColorHex: String? + var mainSessionKey: String = "main" private let bridge = BridgeSession() private var bridgeTask: Task? private var voiceWakeSyncTask: Task? @ObservationIgnored private var cameraHUDDismissTask: Task? let voiceWake = VoiceWakeManager() + let talkMode = TalkModeManager() private var lastAutoA2uiURL: String? 
var bridgeSession: BridgeSession { self.bridge } @@ -35,11 +38,12 @@ final class NodeAppModel { var cameraHUDText: String? var cameraHUDKind: CameraHUDKind? var cameraFlashNonce: Int = 0 + var screenRecordActive: Bool = false init() { self.voiceWake.configure { [weak self] cmd in guard let self else { return } - let sessionKey = "main" + let sessionKey = await MainActor.run { self.mainSessionKey } do { try await self.sendVoiceTranscript(text: cmd, sessionKey: sessionKey) } catch { @@ -49,6 +53,9 @@ final class NodeAppModel { let enabled = UserDefaults.standard.bool(forKey: "voiceWake.enabled") self.voiceWake.setEnabled(enabled) + self.talkMode.attachBridge(self.bridge) + let talkEnabled = UserDefaults.standard.bool(forKey: "talk.enabled") + self.talkMode.setEnabled(talkEnabled) // Wire up deep links from canvas taps self.screen.onDeepLink = { [weak self] url in @@ -145,7 +152,7 @@ final class NodeAppModel { guard let raw = await self.bridge.currentCanvasHostUrl() else { return nil } let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines) guard !trimmed.isEmpty, let base = URL(string: trimmed) else { return nil } - return base.appendingPathComponent("__clawdis__/a2ui/").absoluteString + return base.appendingPathComponent("__clawdis__/a2ui/").absoluteString + "?platform=ios" } private func showA2UIOnConnectIfNeeded() async { @@ -177,6 +184,10 @@ final class NodeAppModel { self.voiceWake.setEnabled(enabled) } + func setTalkEnabled(_ enabled: Bool) { + self.talkMode.setEnabled(enabled) + } + func connectToBridge( endpoint: NWEndpoint, hello: BridgeHello) @@ -216,6 +227,7 @@ final class NodeAppModel { self.bridgeRemoteAddress = addr } } + await self.refreshBrandingFromGateway() await self.startVoiceWakeSync() await self.showA2UIOnConnectIfNeeded() }, @@ -255,6 +267,8 @@ final class NodeAppModel { self.bridgeServerName = nil self.bridgeRemoteAddress = nil self.connectedBridgeID = nil + self.seamColorHex = nil + self.mainSessionKey = "main" 
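Disconnect resets `seamColorHex` to nil; while connected it holds a `#RRGGBB` string read from gateway config, parsed by `color(fromHex:)` below. The parsing rules can be sketched in TypeScript — an illustrative mirror, not shared code:

```typescript
// Parse an optional "#RRGGBB" string into 0–1 RGB components, mirroring the
// Swift rules: trim whitespace, allow an optional leading "#", and reject
// anything that is not exactly six hex digits (falling back to the default
// seam color by returning null).
function parseSeamColor(raw: string | null): { r: number; g: number; b: number } | null {
  const trimmed = (raw ?? "").trim();
  if (trimmed.length === 0) return null;
  const hex = trimmed.startsWith("#") ? trimmed.slice(1) : trimmed;
  if (!/^[0-9a-fA-F]{6}$/.test(hex)) return null;
  const value = parseInt(hex, 16);
  return {
    r: ((value >> 16) & 0xff) / 255,
    g: ((value >> 8) & 0xff) / 255,
    b: (value & 0xff) / 255,
  };
}
```

Note that the default seam color (79, 122, 154) is exactly `#4F7A9A`, so a gateway that echoes the default produces identical components.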
self.showLocalCanvasOnDisconnect() } } @@ -270,9 +284,47 @@ final class NodeAppModel { self.bridgeServerName = nil self.bridgeRemoteAddress = nil self.connectedBridgeID = nil + self.seamColorHex = nil + self.mainSessionKey = "main" self.showLocalCanvasOnDisconnect() } + var seamColor: Color { + Self.color(fromHex: self.seamColorHex) ?? Self.defaultSeamColor + } + + private static let defaultSeamColor = Color(red: 79 / 255.0, green: 122 / 255.0, blue: 154 / 255.0) + + private static func color(fromHex raw: String?) -> Color? { + let trimmed = (raw ?? "").trimmingCharacters(in: .whitespacesAndNewlines) + guard !trimmed.isEmpty else { return nil } + let hex = trimmed.hasPrefix("#") ? String(trimmed.dropFirst()) : trimmed + guard hex.count == 6, let value = Int(hex, radix: 16) else { return nil } + let r = Double((value >> 16) & 0xFF) / 255.0 + let g = Double((value >> 8) & 0xFF) / 255.0 + let b = Double(value & 0xFF) / 255.0 + return Color(red: r, green: g, blue: b) + } + + private func refreshBrandingFromGateway() async { + do { + let res = try await self.bridge.request(method: "config.get", paramsJSON: "{}", timeoutSeconds: 8) + guard let json = try JSONSerialization.jsonObject(with: res) as? [String: Any] else { return } + guard let config = json["config"] as? [String: Any] else { return } + let ui = config["ui"] as? [String: Any] + let raw = (ui?["seamColor"] as? String)?.trimmingCharacters(in: .whitespacesAndNewlines) ?? "" + let session = config["session"] as? [String: Any] + let rawMainKey = (session?["mainKey"] as? String)?.trimmingCharacters(in: .whitespacesAndNewlines) ?? "" + let mainKey = rawMainKey.isEmpty ? "main" : rawMainKey + await MainActor.run { + self.seamColorHex = raw.isEmpty ? 
nil : raw + self.mainSessionKey = mainKey + } + } catch { + // ignore + } + } + func setGlobalWakeWords(_ words: [String]) async { let sanitized = VoiceWakePreferences.sanitizeTriggerWords(words) @@ -590,6 +642,9 @@ final class NodeAppModel { NSLocalizedDescriptionKey: "INVALID_REQUEST: screen format must be mp4", ]) } + // Status pill mirrors screen recording state so it stays visible without overlay stacking. + self.screenRecordActive = true + defer { self.screenRecordActive = false } let path = try await self.screenRecorder.record( screenIndex: params.screenIndex, durationMs: params.durationMs, diff --git a/apps/ios/Sources/RootCanvas.swift b/apps/ios/Sources/RootCanvas.swift index 9a5fb0b76..bd3fefc52 100644 --- a/apps/ios/Sources/RootCanvas.swift +++ b/apps/ios/Sources/RootCanvas.swift @@ -51,7 +51,10 @@ struct RootCanvas: View { case .settings: SettingsTab() case .chat: - ChatSheet(bridge: self.appModel.bridgeSession) + ChatSheet( + bridge: self.appModel.bridgeSession, + sessionKey: self.appModel.mainSessionKey, + userAccent: self.appModel.seamColor) } } .onAppear { self.updateIdleTimer() } @@ -119,6 +122,9 @@ struct RootCanvas: View { } private struct CanvasContent: View { + @Environment(NodeAppModel.self) private var appModel + @AppStorage("talk.enabled") private var talkEnabled: Bool = false + @AppStorage("talk.button.enabled") private var talkButtonEnabled: Bool = true var systemColorScheme: ColorScheme var bridgeStatus: StatusPill.BridgeState var voiceWakeEnabled: Bool @@ -140,6 +146,21 @@ private struct CanvasContent: View { } .accessibilityLabel("Chat") + if self.talkButtonEnabled { + // Talk mode lives on a side bubble so it doesn't get buried in settings. + OverlayButton( + systemImage: self.appModel.talkMode.isEnabled ? 
"waveform.circle.fill" : "waveform.circle", + brighten: self.brightenButtons, + tint: self.appModel.seamColor, + isActive: self.appModel.talkMode.isEnabled) + { + let next = !self.appModel.talkMode.isEnabled + self.talkEnabled = next + self.appModel.setTalkEnabled(next) + } + .accessibilityLabel("Talk Mode") + } + OverlayButton(systemImage: "gearshape.fill", brighten: self.brightenButtons) { self.openSettings() } @@ -148,10 +169,17 @@ private struct CanvasContent: View { .padding(.top, 10) .padding(.trailing, 10) } + .overlay(alignment: .center) { + if self.appModel.talkMode.isEnabled { + TalkOrbOverlay() + .transition(.opacity) + } + } .overlay(alignment: .topLeading) { StatusPill( bridge: self.bridgeStatus, voiceWakeEnabled: self.voiceWakeEnabled, + activity: self.statusActivity, brighten: self.brightenButtons, onTap: { self.openSettings() @@ -169,45 +197,78 @@ private struct CanvasContent: View { .transition(.move(edge: .top).combined(with: .opacity)) } } - .overlay(alignment: .topLeading) { - if let cameraHUDText, !cameraHUDText.isEmpty, let cameraHUDKind { - CameraCaptureToast( - text: cameraHUDText, - kind: self.mapCameraKind(cameraHUDKind), - brighten: self.brightenButtons) - .padding(SwiftUI.Edge.Set.leading, 10) - .safeAreaPadding(SwiftUI.Edge.Set.top, 106) - .transition( - AnyTransition.move(edge: SwiftUI.Edge.top) - .combined(with: AnyTransition.opacity)) - } - } } - private func mapCameraKind(_ kind: NodeAppModel.CameraHUDKind) -> CameraCaptureToast.Kind { - switch kind { - case .photo: - .photo - case .recording: - .recording - case .success: - .success - case .error: - .error + private var statusActivity: StatusPill.Activity? { + // Status pill owns transient activity state so it doesn't overlap the connection indicator. 
+ if self.appModel.isBackgrounded { + return StatusPill.Activity( + title: "Foreground required", + systemImage: "exclamationmark.triangle.fill", + tint: .orange) } + + let bridgeStatus = self.appModel.bridgeStatusText.trimmingCharacters(in: .whitespacesAndNewlines) + let bridgeLower = bridgeStatus.lowercased() + if bridgeLower.contains("repair") { + return StatusPill.Activity(title: "Repairing…", systemImage: "wrench.and.screwdriver", tint: .orange) + } + if bridgeLower.contains("approval") || bridgeLower.contains("pairing") { + return StatusPill.Activity(title: "Approval pending", systemImage: "person.crop.circle.badge.clock") + } + // Avoid duplicating the primary bridge status ("Connecting…") in the activity slot. + + if self.appModel.screenRecordActive { + return StatusPill.Activity(title: "Recording screen…", systemImage: "record.circle.fill", tint: .red) + } + + if let cameraHUDText, !cameraHUDText.isEmpty, let cameraHUDKind { + let systemImage: String + let tint: Color? + switch cameraHUDKind { + case .photo: + systemImage = "camera.fill" + tint = nil + case .recording: + systemImage = "video.fill" + tint = .red + case .success: + systemImage = "checkmark.circle.fill" + tint = .green + case .error: + systemImage = "exclamationmark.triangle.fill" + tint = .red + } + return StatusPill.Activity(title: cameraHUDText, systemImage: systemImage, tint: tint) + } + + if self.voiceWakeEnabled { + let voiceStatus = self.appModel.voiceWake.statusText + if voiceStatus.localizedCaseInsensitiveContains("microphone permission") { + return StatusPill.Activity(title: "Mic permission", systemImage: "mic.slash", tint: .orange) + } + if voiceStatus == "Paused" { + let suffix = self.appModel.isBackgrounded ? " (background)" : "" + return StatusPill.Activity(title: "Voice Wake paused\(suffix)", systemImage: "pause.circle.fill") + } + } + + return nil } } private struct OverlayButton: View { let systemImage: String let brighten: Bool + var tint: Color? 
+ var isActive: Bool = false let action: () -> Void var body: some View { Button(action: self.action) { Image(systemName: self.systemImage) .font(.system(size: 16, weight: .semibold)) - .foregroundStyle(.primary) + .foregroundStyle(self.isActive ? (self.tint ?? .primary) : .primary) .padding(10) .background { RoundedRectangle(cornerRadius: 12, style: .continuous) @@ -225,9 +286,26 @@ private struct OverlayButton: View { endPoint: .bottomTrailing)) .blendMode(.overlay) } + .overlay { + if let tint { + RoundedRectangle(cornerRadius: 12, style: .continuous) + .fill( + LinearGradient( + colors: [ + tint.opacity(self.isActive ? 0.22 : 0.14), + tint.opacity(self.isActive ? 0.10 : 0.06), + .clear, + ], + startPoint: .topLeading, + endPoint: .bottomTrailing)) + .blendMode(.overlay) + } + } .overlay { RoundedRectangle(cornerRadius: 12, style: .continuous) - .strokeBorder(.white.opacity(self.brighten ? 0.24 : 0.18), lineWidth: 0.5) + .strokeBorder( + (self.tint ?? .white).opacity(self.isActive ? 0.34 : (self.brighten ? 0.24 : 0.18)), + lineWidth: self.isActive ? 0.7 : 0.5) } .shadow(color: .black.opacity(0.35), radius: 12, y: 6) } @@ -261,59 +339,3 @@ private struct CameraFlashOverlay: View { } } } - -private struct CameraCaptureToast: View { - enum Kind { - case photo - case recording - case success - case error - } - - var text: String - var kind: Kind - var brighten: Bool = false - - var body: some View { - HStack(spacing: 10) { - self.icon - .font(.system(size: 14, weight: .semibold)) - .foregroundStyle(.primary) - - Text(self.text) - .font(.system(size: 14, weight: .semibold)) - .foregroundStyle(.primary) - .lineLimit(1) - .truncationMode(.tail) - } - .padding(.vertical, 10) - .padding(.horizontal, 12) - .background { - RoundedRectangle(cornerRadius: 14, style: .continuous) - .fill(.ultraThinMaterial) - .overlay { - RoundedRectangle(cornerRadius: 14, style: .continuous) - .strokeBorder(.white.opacity(self.brighten ? 
0.24 : 0.18), lineWidth: 0.5) - } - .shadow(color: .black.opacity(0.25), radius: 12, y: 6) - } - .accessibilityLabel("Camera") - .accessibilityValue(self.text) - } - - @ViewBuilder - private var icon: some View { - switch self.kind { - case .photo: - Image(systemName: "camera.fill") - case .recording: - Image(systemName: "record.circle.fill") - .symbolRenderingMode(.palette) - .foregroundStyle(.red, .primary) - case .success: - Image(systemName: "checkmark.circle.fill") - case .error: - Image(systemName: "exclamationmark.triangle.fill") - } - } -} diff --git a/apps/ios/Sources/RootTabs.swift b/apps/ios/Sources/RootTabs.swift index dc2508895..e76d357a0 100644 --- a/apps/ios/Sources/RootTabs.swift +++ b/apps/ios/Sources/RootTabs.swift @@ -26,6 +26,7 @@ struct RootTabs: View { StatusPill( bridge: self.bridgeStatus, voiceWakeEnabled: self.voiceWakeEnabled, + activity: self.statusActivity, onTap: { self.selectedTab = 2 }) .padding(.leading, 10) .safeAreaPadding(.top, 10) @@ -79,4 +80,64 @@ struct RootTabs: View { return .disconnected } + + private var statusActivity: StatusPill.Activity? { + // Keep the top pill consistent across tabs (camera + voice wake + pairing states). + if self.appModel.isBackgrounded { + return StatusPill.Activity( + title: "Foreground required", + systemImage: "exclamationmark.triangle.fill", + tint: .orange) + } + + let bridgeStatus = self.appModel.bridgeStatusText.trimmingCharacters(in: .whitespacesAndNewlines) + let bridgeLower = bridgeStatus.lowercased() + if bridgeLower.contains("repair") { + return StatusPill.Activity(title: "Repairing…", systemImage: "wrench.and.screwdriver", tint: .orange) + } + if bridgeLower.contains("approval") || bridgeLower.contains("pairing") { + return StatusPill.Activity(title: "Approval pending", systemImage: "person.crop.circle.badge.clock") + } + // Avoid duplicating the primary bridge status ("Connecting…") in the activity slot. 
+ + if self.appModel.screenRecordActive { + return StatusPill.Activity(title: "Recording screen…", systemImage: "record.circle.fill", tint: .red) + } + + if let cameraHUDText = self.appModel.cameraHUDText, + let cameraHUDKind = self.appModel.cameraHUDKind, + !cameraHUDText.isEmpty + { + let systemImage: String + let tint: Color? + switch cameraHUDKind { + case .photo: + systemImage = "camera.fill" + tint = nil + case .recording: + systemImage = "video.fill" + tint = .red + case .success: + systemImage = "checkmark.circle.fill" + tint = .green + case .error: + systemImage = "exclamationmark.triangle.fill" + tint = .red + } + return StatusPill.Activity(title: cameraHUDText, systemImage: systemImage, tint: tint) + } + + if self.voiceWakeEnabled { + let voiceStatus = self.appModel.voiceWake.statusText + if voiceStatus.localizedCaseInsensitiveContains("microphone permission") { + return StatusPill.Activity(title: "Mic permission", systemImage: "mic.slash", tint: .orange) + } + if voiceStatus == "Paused" { + let suffix = self.appModel.isBackgrounded ? 
" (background)" : "" + return StatusPill.Activity(title: "Voice Wake paused\(suffix)", systemImage: "pause.circle.fill") + } + } + + return nil + } } diff --git a/apps/ios/Sources/Screen/ScreenController.swift b/apps/ios/Sources/Screen/ScreenController.swift index 6b3003360..76a17d55a 100644 --- a/apps/ios/Sources/Screen/ScreenController.swift +++ b/apps/ios/Sources/Screen/ScreenController.swift @@ -43,9 +43,7 @@ final class ScreenController { self.webView.scrollView.contentInset = .zero self.webView.scrollView.scrollIndicatorInsets = .zero self.webView.scrollView.automaticallyAdjustsScrollIndicatorInsets = false - // Disable scroll to allow touch events to pass through to canvas - self.webView.scrollView.isScrollEnabled = false - self.webView.scrollView.bounces = false + self.applyScrollBehavior() self.webView.navigationDelegate = self.navigationDelegate self.navigationDelegate.controller = self a2uiActionHandler.controller = self @@ -60,6 +58,7 @@ final class ScreenController { func reload() { let trimmed = self.urlString.trimmingCharacters(in: .whitespacesAndNewlines) + self.applyScrollBehavior() if trimmed.isEmpty { guard let url = Self.canvasScaffoldURL else { return } self.errorText = nil @@ -250,6 +249,15 @@ final class ScreenController { return false } + private func applyScrollBehavior() { + let trimmed = self.urlString.trimmingCharacters(in: .whitespacesAndNewlines) + let allowScroll = !trimmed.isEmpty + let scrollView = self.webView.scrollView + // Default canvas needs raw touch events; external pages should scroll. + scrollView.isScrollEnabled = allowScroll + scrollView.bounces = allowScroll + } + private static func jsValue(_ value: String?) -> String { guard let value else { return "null" } if let data = try? 
JSONSerialization.data(withJSONObject: [value]), diff --git a/apps/ios/Sources/Screen/ScreenRecordService.swift b/apps/ios/Sources/Screen/ScreenRecordService.swift index d1e575868..b5d75b57d 100644 --- a/apps/ios/Sources/Screen/ScreenRecordService.swift +++ b/apps/ios/Sources/Screen/ScreenRecordService.swift @@ -1,12 +1,28 @@ import AVFoundation import ReplayKit -@MainActor -final class ScreenRecordService { +final class ScreenRecordService: @unchecked Sendable { private struct UncheckedSendableBox<T>: @unchecked Sendable { let value: T } + private final class CaptureState: @unchecked Sendable { + private let lock = NSLock() + var writer: AVAssetWriter? + var videoInput: AVAssetWriterInput? + var audioInput: AVAssetWriterInput? + var started = false + var sawVideo = false + var lastVideoTime: CMTime? + var handlerError: Error? + + func withLock<T>(_ body: (CaptureState) -> T) -> T { + self.lock.lock() + defer { lock.unlock() } + return body(self) + } + } + enum ScreenRecordError: LocalizedError { case invalidScreenIndex(Int) case captureFailed(String) @@ -51,126 +67,158 @@ final class ScreenRecordService { }() try? FileManager.default.removeItem(at: outURL) - let recorder = RPScreenRecorder.shared() - recorder.isMicrophoneEnabled = includeAudio - - var writer: AVAssetWriter? - var videoInput: AVAssetWriterInput? - var audioInput: AVAssetWriterInput? - var started = false - var sawVideo = false - var lastVideoTime: CMTime? - var handlerError: Error?
- let lock = NSLock() - - func setHandlerError(_ error: Error) { - lock.lock() - defer { lock.unlock() } - if handlerError == nil { handlerError = error } - } + let state = CaptureState() + let recordQueue = DispatchQueue(label: "com.steipete.clawdis.screenrecord") try await withCheckedThrowingContinuation { (cont: CheckedContinuation<Void, Error>) in - recorder.startCapture(handler: { sample, type, error in - if let error { - setHandlerError(error) - return - } - guard CMSampleBufferDataIsReady(sample) else { return } - - switch type { - case .video: - let pts = CMSampleBufferGetPresentationTimeStamp(sample) - if let lastVideoTime { - let delta = CMTimeSubtract(pts, lastVideoTime) - if delta.seconds < (1.0 / fpsValue) { return } - } - - if writer == nil { - guard let imageBuffer = CMSampleBufferGetImageBuffer(sample) else { - setHandlerError(ScreenRecordError.captureFailed("Missing image buffer")) - return + let handler: @Sendable (CMSampleBuffer, RPSampleBufferType, Error?) -> Void = { sample, type, error in + // ReplayKit can call the capture handler on a background queue. + // Serialize writes to avoid queue asserts.
+ recordQueue.async { + if let error { + state.withLock { state in + if state.handlerError == nil { state.handlerError = error } } - let width = CVPixelBufferGetWidth(imageBuffer) - let height = CVPixelBufferGetHeight(imageBuffer) - do { - let w = try AVAssetWriter(outputURL: outURL, fileType: .mp4) - let settings: [String: Any] = [ - AVVideoCodecKey: AVVideoCodecType.h264, - AVVideoWidthKey: width, - AVVideoHeightKey: height, - ] - let vInput = AVAssetWriterInput(mediaType: .video, outputSettings: settings) - vInput.expectsMediaDataInRealTime = true - guard w.canAdd(vInput) else { - throw ScreenRecordError.writeFailed("Cannot add video input") - } - w.add(vInput) + return + } + guard CMSampleBufferDataIsReady(sample) else { return } - if includeAudio { - let aInput = AVAssetWriterInput(mediaType: .audio, outputSettings: nil) - aInput.expectsMediaDataInRealTime = true - if w.canAdd(aInput) { - w.add(aInput) - audioInput = aInput + switch type { + case .video: + let pts = CMSampleBufferGetPresentationTimeStamp(sample) + let shouldSkip = state.withLock { state in + if let lastVideoTime = state.lastVideoTime { + let delta = CMTimeSubtract(pts, lastVideoTime) + return delta.seconds < (1.0 / fpsValue) + } + return false + } + if shouldSkip { return } + + if state.withLock({ $0.writer == nil }) { + guard let imageBuffer = CMSampleBufferGetImageBuffer(sample) else { + state.withLock { state in + if state.handlerError == nil { + state.handlerError = ScreenRecordError.captureFailed("Missing image buffer") + } + } + return + } + let width = CVPixelBufferGetWidth(imageBuffer) + let height = CVPixelBufferGetHeight(imageBuffer) + do { + let w = try AVAssetWriter(outputURL: outURL, fileType: .mp4) + let settings: [String: Any] = [ + AVVideoCodecKey: AVVideoCodecType.h264, + AVVideoWidthKey: width, + AVVideoHeightKey: height, + ] + let vInput = AVAssetWriterInput(mediaType: .video, outputSettings: settings) + vInput.expectsMediaDataInRealTime = true + guard w.canAdd(vInput) else 
{ + throw ScreenRecordError.writeFailed("Cannot add video input") + } + w.add(vInput) + + if includeAudio { + let aInput = AVAssetWriterInput(mediaType: .audio, outputSettings: nil) + aInput.expectsMediaDataInRealTime = true + if w.canAdd(aInput) { + w.add(aInput) + state.withLock { state in + state.audioInput = aInput + } + } + } + + guard w.startWriting() else { + throw ScreenRecordError + .writeFailed(w.error?.localizedDescription ?? "Failed to start writer") + } + w.startSession(atSourceTime: pts) + state.withLock { state in + state.writer = w + state.videoInput = vInput + state.started = true + } + } catch { + state.withLock { state in + if state.handlerError == nil { state.handlerError = error } + } + return + } + } + + let vInput = state.withLock { $0.videoInput } + let isStarted = state.withLock { $0.started } + guard let vInput, isStarted else { return } + if vInput.isReadyForMoreMediaData { + if vInput.append(sample) { + state.withLock { state in + state.sawVideo = true + state.lastVideoTime = pts + } + } else { + let err = state.withLock { $0.writer?.error } + if let err { + state.withLock { state in + if state.handlerError == nil { + state.handlerError = ScreenRecordError.writeFailed(err.localizedDescription) + } + } } } - - guard w.startWriting() else { - throw ScreenRecordError - .writeFailed(w.error?.localizedDescription ?? 
"Failed to start writer") - } - w.startSession(atSourceTime: pts) - writer = w - videoInput = vInput - started = true - } catch { - setHandlerError(error) - return } - } - guard let vInput = videoInput, started else { return } - if vInput.isReadyForMoreMediaData { - if vInput.append(sample) { - sawVideo = true - lastVideoTime = pts - } else { - if let err = writer?.error { - setHandlerError(ScreenRecordError.writeFailed(err.localizedDescription)) - } + case .audioApp, .audioMic: + let aInput = state.withLock { $0.audioInput } + let isStarted = state.withLock { $0.started } + guard includeAudio, let aInput, isStarted else { return } + if aInput.isReadyForMoreMediaData { + _ = aInput.append(sample) } - } - case .audioApp, .audioMic: - guard includeAudio, let aInput = audioInput, started else { return } - if aInput.isReadyForMoreMediaData { - _ = aInput.append(sample) + @unknown default: + break } - - @unknown default: - break } - }, completionHandler: { error in + } + + let completion: @Sendable (Error?) 
-> Void = { error in if let error { cont.resume(throwing: error) } else { cont.resume() } - }) + } + + Task { @MainActor in + startReplayKitCapture( + includeAudio: includeAudio, + handler: handler, + completion: completion) + } } try await Task.sleep(nanoseconds: UInt64(durationMs) * 1_000_000) let stopError = await withCheckedContinuation { cont in - recorder.stopCapture { error in cont.resume(returning: error) } + Task { @MainActor in + stopReplayKitCapture { error in cont.resume(returning: error) } + } } if let stopError { throw stopError } - if let handlerError { throw handlerError } - guard let writer, let videoInput, sawVideo else { + let handlerErrorSnapshot = state.withLock { $0.handlerError } + if let handlerErrorSnapshot { throw handlerErrorSnapshot } + let writerSnapshot = state.withLock { $0.writer } + let videoInputSnapshot = state.withLock { $0.videoInput } + let audioInputSnapshot = state.withLock { $0.audioInput } + let sawVideoSnapshot = state.withLock { $0.sawVideo } + guard let writerSnapshot, let videoInputSnapshot, sawVideoSnapshot else { throw ScreenRecordError.captureFailed("No frames captured") } - videoInput.markAsFinished() - audioInput?.markAsFinished() + videoInputSnapshot.markAsFinished() + audioInputSnapshot?.markAsFinished() - let writerBox = UncheckedSendableBox(value: writer) + let writerBox = UncheckedSendableBox(value: writerSnapshot) try await withCheckedThrowingContinuation { (cont: CheckedContinuation<Void, Error>) in writerBox.value.finishWriting { let writer = writerBox.value @@ -199,6 +247,22 @@ final class ScreenRecordService { } } +@MainActor +private func startReplayKitCapture( + includeAudio: Bool, + handler: @escaping @Sendable (CMSampleBuffer, RPSampleBufferType, Error?) -> Void, + completion: @escaping @Sendable (Error?)
-> Void) +{ + let recorder = RPScreenRecorder.shared() + recorder.isMicrophoneEnabled = includeAudio + recorder.startCapture(handler: handler, completionHandler: completion) +} + +@MainActor +private func stopReplayKitCapture(_ completion: @escaping @Sendable (Error?) -> Void) { + RPScreenRecorder.shared().stopCapture { error in completion(error) } +} + #if DEBUG extension ScreenRecordService { nonisolated static func _test_clampDurationMs(_ ms: Int?) -> Int { diff --git a/apps/ios/Sources/Settings/SettingsTab.swift b/apps/ios/Sources/Settings/SettingsTab.swift index 48b5e0aac..34b05dfc9 100644 --- a/apps/ios/Sources/Settings/SettingsTab.swift +++ b/apps/ios/Sources/Settings/SettingsTab.swift @@ -20,6 +20,8 @@ struct SettingsTab: View { @AppStorage("node.displayName") private var displayName: String = "iOS Node" @AppStorage("node.instanceId") private var instanceId: String = UUID().uuidString @AppStorage("voiceWake.enabled") private var voiceWakeEnabled: Bool = false + @AppStorage("talk.enabled") private var talkEnabled: Bool = false + @AppStorage("talk.button.enabled") private var talkButtonEnabled: Bool = true @AppStorage("camera.enabled") private var cameraEnabled: Bool = true @AppStorage("screen.preventSleep") private var preventSleep: Bool = true @AppStorage("bridge.preferredStableID") private var preferredBridgeStableID: String = "" @@ -51,6 +53,9 @@ struct SettingsTab: View { } } } + LabeledContent("Platform", value: self.platformString()) + LabeledContent("Version", value: self.appVersion()) + LabeledContent("Model", value: self.modelIdentifier()) } Section("Bridge") { @@ -153,6 +158,12 @@ struct SettingsTab: View { .onChange(of: self.voiceWakeEnabled) { _, newValue in self.appModel.setVoiceWakeEnabled(newValue) } + Toggle("Talk Mode", isOn: self.$talkEnabled) + .onChange(of: self.talkEnabled) { _, newValue in + self.appModel.setTalkEnabled(newValue) + } + // Keep this separate so users can hide the side bubble without disabling Talk Mode. 
+ Toggle("Show Talk Button", isOn: self.$talkButtonEnabled) NavigationLink { VoiceWakeWordsSettingsView() @@ -227,6 +238,12 @@ struct SettingsTab: View { HStack { VStack(alignment: .leading, spacing: 2) { Text(bridge.name) + let detailLines = self.bridgeDetailLines(bridge) + ForEach(detailLines, id: \.self) { line in + Text(line) + .font(.footnote) + .foregroundStyle(.secondary) + } } Spacer() @@ -504,4 +521,26 @@ struct SettingsTab: View { private static func httpURLString(host: String?, port: Int?, fallback: String) -> String { SettingsNetworkingHelpers.httpURLString(host: host, port: port, fallback: fallback) } + + private func bridgeDetailLines(_ bridge: BridgeDiscoveryModel.DiscoveredBridge) -> [String] { + var lines: [String] = [] + if let lanHost = bridge.lanHost { lines.append("LAN: \(lanHost)") } + if let tailnet = bridge.tailnetDns { lines.append("Tailnet: \(tailnet)") } + + let gatewayPort = bridge.gatewayPort + let bridgePort = bridge.bridgePort + let canvasPort = bridge.canvasPort + if gatewayPort != nil || bridgePort != nil || canvasPort != nil { + let gw = gatewayPort.map(String.init) ?? "—" + let br = bridgePort.map(String.init) ?? "—" + let canvas = canvasPort.map(String.init) ?? "—" + lines.append("Ports: gw \(gw) · bridge \(br) · canvas \(canvas)") + } + + if lines.isEmpty { + lines.append(bridge.debugID) + } + + return lines + } } diff --git a/apps/ios/Sources/Status/StatusPill.swift b/apps/ios/Sources/Status/StatusPill.swift index 9d3c6f6d6..1e30ad16d 100644 --- a/apps/ios/Sources/Status/StatusPill.swift +++ b/apps/ios/Sources/Status/StatusPill.swift @@ -28,8 +28,15 @@ struct StatusPill: View { } } + struct Activity: Equatable { + var title: String + var systemImage: String + var tint: Color? + } + var bridge: BridgeState var voiceWakeEnabled: Bool + var activity: Activity? var brighten: Bool = false var onTap: () -> Void @@ -54,10 +61,24 @@ struct StatusPill: View { .frame(height: 14) .opacity(0.35) - Image(systemName: self.voiceWakeEnabled ? 
"mic.fill" : "mic.slash") - .font(.system(size: 13, weight: .semibold)) - .foregroundStyle(self.voiceWakeEnabled ? .primary : .secondary) - .accessibilityLabel(self.voiceWakeEnabled ? "Voice Wake enabled" : "Voice Wake disabled") + if let activity { + HStack(spacing: 6) { + Image(systemName: activity.systemImage) + .font(.system(size: 13, weight: .semibold)) + .foregroundStyle(activity.tint ?? .primary) + Text(activity.title) + .font(.system(size: 13, weight: .semibold)) + .foregroundStyle(.primary) + .lineLimit(1) + } + .transition(.opacity.combined(with: .move(edge: .top))) + } else { + Image(systemName: self.voiceWakeEnabled ? "mic.fill" : "mic.slash") + .font(.system(size: 13, weight: .semibold)) + .foregroundStyle(self.voiceWakeEnabled ? .primary : .secondary) + .accessibilityLabel(self.voiceWakeEnabled ? "Voice Wake enabled" : "Voice Wake disabled") + .transition(.opacity.combined(with: .move(edge: .top))) + } } .padding(.vertical, 8) .padding(.horizontal, 12) @@ -73,7 +94,7 @@ struct StatusPill: View { } .buttonStyle(.plain) .accessibilityLabel("Status") - .accessibilityValue("\(self.bridge.title), Voice Wake \(self.voiceWakeEnabled ? "enabled" : "disabled")") + .accessibilityValue(self.accessibilityValue) .onAppear { self.updatePulse(for: self.bridge, scenePhase: self.scenePhase) } .onDisappear { self.pulse = false } .onChange(of: self.bridge) { _, newValue in @@ -82,6 +103,14 @@ struct StatusPill: View { .onChange(of: self.scenePhase) { _, newValue in self.updatePulse(for: self.bridge, scenePhase: newValue) } + .animation(.easeInOut(duration: 0.18), value: self.activity?.title) + } + + private var accessibilityValue: String { + if let activity { + return "\(self.bridge.title), \(activity.title)" + } + return "\(self.bridge.title), Voice Wake \(self.voiceWakeEnabled ? 
"enabled" : "disabled")" } private func updatePulse(for bridge: BridgeState, scenePhase: ScenePhase) { diff --git a/apps/ios/Sources/Voice/TalkModeManager.swift b/apps/ios/Sources/Voice/TalkModeManager.swift new file mode 100644 index 000000000..355c67321 --- /dev/null +++ b/apps/ios/Sources/Voice/TalkModeManager.swift @@ -0,0 +1,713 @@ +import AVFAudio +import ClawdisKit +import Foundation +import Observation +import OSLog +import Speech + +@MainActor +@Observable +final class TalkModeManager: NSObject { + private typealias SpeechRequest = SFSpeechAudioBufferRecognitionRequest + private static let defaultModelIdFallback = "eleven_v3" + var isEnabled: Bool = false + var isListening: Bool = false + var isSpeaking: Bool = false + var statusText: String = "Off" + + private let audioEngine = AVAudioEngine() + private var speechRecognizer: SFSpeechRecognizer? + private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest? + private var recognitionTask: SFSpeechRecognitionTask? + private var silenceTask: Task? + + private var lastHeard: Date? + private var lastTranscript: String = "" + private var lastSpokenText: String? + private var lastInterruptedAtSeconds: Double? + + private var defaultVoiceId: String? + private var currentVoiceId: String? + private var defaultModelId: String? + private var currentModelId: String? + private var voiceOverrideActive = false + private var modelOverrideActive = false + private var defaultOutputFormat: String? + private var apiKey: String? + private var voiceAliases: [String: String] = [:] + private var interruptOnSpeech: Bool = true + private var mainSessionKey: String = "main" + private var fallbackVoiceId: String? + private var lastPlaybackWasPCM: Bool = false + var pcmPlayer: PCMStreamingAudioPlaying = PCMStreamingAudioPlayer.shared + var mp3Player: StreamingAudioPlaying = StreamingAudioPlayer.shared + + private var bridge: BridgeSession? 
+ private let silenceWindow: TimeInterval = 0.7 + + private var chatSubscribedSessionKeys = Set<String>() + + private let logger = Logger(subsystem: "com.steipete.clawdis", category: "TalkMode") + + func attachBridge(_ bridge: BridgeSession) { + self.bridge = bridge + } + + func setEnabled(_ enabled: Bool) { + self.isEnabled = enabled + if enabled { + self.logger.info("enabled") + Task { await self.start() } + } else { + self.logger.info("disabled") + self.stop() + } + } + + func start() async { + guard self.isEnabled else { return } + if self.isListening { return } + + self.logger.info("start") + self.statusText = "Requesting permissions…" + let micOk = await Self.requestMicrophonePermission() + guard micOk else { + self.logger.warning("start blocked: microphone permission denied") + self.statusText = "Microphone permission denied" + return + } + let speechOk = await Self.requestSpeechPermission() + guard speechOk else { + self.logger.warning("start blocked: speech permission denied") + self.statusText = "Speech recognition permission denied" + return + } + + await self.reloadConfig() + do { + try Self.configureAudioSession() + try self.startRecognition() + self.isListening = true + self.statusText = "Listening" + self.startSilenceMonitor() + await self.subscribeChatIfNeeded(sessionKey: self.mainSessionKey) + self.logger.info("listening") + } catch { + self.isListening = false + self.statusText = "Start failed: \(error.localizedDescription)" + self.logger.error("start failed: \(error.localizedDescription, privacy: .public)") + } + } + + func stop() { + self.isEnabled = false + self.isListening = false + self.statusText = "Off" + self.lastTranscript = "" + self.lastHeard = nil + self.silenceTask?.cancel() + self.silenceTask = nil + self.stopRecognition() + self.stopSpeaking() + self.lastInterruptedAtSeconds = nil + TalkSystemSpeechSynthesizer.shared.stop() + do { + try AVAudioSession.sharedInstance().setActive(false, options: [.notifyOthersOnDeactivation]) + } catch { +
self.logger.warning("audio session deactivate failed: \(error.localizedDescription, privacy: .public)") + } + Task { await self.unsubscribeAllChats() } + } + + func userTappedOrb() { + self.stopSpeaking() + } + + private func startRecognition() throws { + self.stopRecognition() + self.speechRecognizer = SFSpeechRecognizer() + guard let recognizer = self.speechRecognizer else { + throw NSError(domain: "TalkMode", code: 1, userInfo: [ + NSLocalizedDescriptionKey: "Speech recognizer unavailable", + ]) + } + + self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest() + self.recognitionRequest?.shouldReportPartialResults = true + guard let request = self.recognitionRequest else { return } + + let input = self.audioEngine.inputNode + let format = input.outputFormat(forBus: 0) + input.removeTap(onBus: 0) + let tapBlock = Self.makeAudioTapAppendCallback(request: request) + input.installTap(onBus: 0, bufferSize: 2048, format: format, block: tapBlock) + + self.audioEngine.prepare() + try self.audioEngine.start() + + self.recognitionTask = recognizer.recognitionTask(with: request) { [weak self] result, error in + guard let self else { return } + if let error { + if !self.isSpeaking { + self.statusText = "Speech error: \(error.localizedDescription)" + } + self.logger.debug("speech recognition error: \(error.localizedDescription, privacy: .public)") + } + guard let result else { return } + let transcript = result.bestTranscription.formattedString + Task { @MainActor in + await self.handleTranscript(transcript: transcript, isFinal: result.isFinal) + } + } + } + + private func stopRecognition() { + self.recognitionTask?.cancel() + self.recognitionTask = nil + self.recognitionRequest?.endAudio() + self.recognitionRequest = nil + self.audioEngine.inputNode.removeTap(onBus: 0) + self.audioEngine.stop() + self.speechRecognizer = nil + } + + private nonisolated static func makeAudioTapAppendCallback(request: SpeechRequest) -> AVAudioNodeTapBlock { + { buffer, _ in + 
+            request.append(buffer)
+        }
+    }
+
+    private func handleTranscript(transcript: String, isFinal: Bool) async {
+        let trimmed = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
+        if self.isSpeaking, self.interruptOnSpeech {
+            if self.shouldInterrupt(with: trimmed) {
+                self.stopSpeaking()
+            }
+            return
+        }
+
+        guard self.isListening else { return }
+        if !trimmed.isEmpty {
+            self.lastTranscript = trimmed
+            self.lastHeard = Date()
+        }
+        if isFinal {
+            self.lastTranscript = trimmed
+        }
+    }
+
+    private func startSilenceMonitor() {
+        self.silenceTask?.cancel()
+        self.silenceTask = Task { [weak self] in
+            guard let self else { return }
+            while self.isEnabled {
+                try? await Task.sleep(nanoseconds: 200_000_000)
+                await self.checkSilence()
+            }
+        }
+    }
+
+    private func checkSilence() async {
+        guard self.isListening, !self.isSpeaking else { return }
+        let transcript = self.lastTranscript.trimmingCharacters(in: .whitespacesAndNewlines)
+        guard !transcript.isEmpty else { return }
+        guard let lastHeard else { return }
+        if Date().timeIntervalSince(lastHeard) < self.silenceWindow { return }
+        await self.finalizeTranscript(transcript)
+    }
+
+    private func finalizeTranscript(_ transcript: String) async {
+        self.isListening = false
+        self.statusText = "Thinking…"
+        self.lastTranscript = ""
+        self.lastHeard = nil
+        self.stopRecognition()
+
+        await self.reloadConfig()
+        let prompt = self.buildPrompt(transcript: transcript)
+        guard let bridge else {
+            self.statusText = "Bridge not connected"
+            self.logger.warning("finalize: bridge not connected")
+            await self.start()
+            return
+        }
+
+        do {
+            let startedAt = Date().timeIntervalSince1970
+            let sessionKey = self.mainSessionKey
+            await self.subscribeChatIfNeeded(sessionKey: sessionKey)
+            self.logger.info(
+                "chat.send start sessionKey=\(sessionKey, privacy: .public) chars=\(prompt.count, privacy: .public)")
+            let runId = try await self.sendChat(prompt, bridge: bridge)
+            self.logger.info("chat.send ok runId=\(runId, privacy: .public)")
+            let completion = await self.waitForChatCompletion(runId: runId, bridge: bridge, timeoutSeconds: 120)
+            if completion == .timeout {
+                self.logger.warning(
+                    "chat completion timeout runId=\(runId, privacy: .public); attempting history fallback")
+            } else if completion == .aborted {
+                self.statusText = "Aborted"
+                self.logger.warning("chat completion aborted runId=\(runId, privacy: .public)")
+                await self.start()
+                return
+            } else if completion == .error {
+                self.statusText = "Chat error"
+                self.logger.warning("chat completion error runId=\(runId, privacy: .public)")
+                await self.start()
+                return
+            }
+
+            guard let assistantText = try await self.waitForAssistantText(
+                bridge: bridge,
+                since: startedAt,
+                timeoutSeconds: completion == .final ? 12 : 25)
+            else {
+                self.statusText = "No reply"
+                self.logger.warning("assistant text timeout runId=\(runId, privacy: .public)")
+                await self.start()
+                return
+            }
+            self.logger.info("assistant text ok chars=\(assistantText.count, privacy: .public)")
+            await self.playAssistant(text: assistantText)
+        } catch {
+            self.statusText = "Talk failed: \(error.localizedDescription)"
+            self.logger.error("finalize failed: \(error.localizedDescription, privacy: .public)")
+        }
+
+        await self.start()
+    }
+
+    private func subscribeChatIfNeeded(sessionKey: String) async {
+        let key = sessionKey.trimmingCharacters(in: .whitespacesAndNewlines)
+        guard !key.isEmpty else { return }
+        guard let bridge else { return }
+        guard !self.chatSubscribedSessionKeys.contains(key) else { return }
+
+        do {
+            let payload = "{\"sessionKey\":\"\(key)\"}"
+            try await bridge.sendEvent(event: "chat.subscribe", payloadJSON: payload)
+            self.chatSubscribedSessionKeys.insert(key)
+            self.logger.info("chat.subscribe ok sessionKey=\(key, privacy: .public)")
+        } catch {
+            self.logger
+                .warning(
+                    "chat.subscribe failed sessionKey=\(key, privacy: .public) err=\(error.localizedDescription, privacy: .public)")
+        }
+    }
+
+    private func unsubscribeAllChats() async {
+        guard let bridge else { return }
+        let keys = self.chatSubscribedSessionKeys
+        self.chatSubscribedSessionKeys.removeAll()
+        for key in keys {
+            do {
+                let payload = "{\"sessionKey\":\"\(key)\"}"
+                try await bridge.sendEvent(event: "chat.unsubscribe", payloadJSON: payload)
+            } catch {
+                // ignore
+            }
+        }
+    }
+
+    private func buildPrompt(transcript: String) -> String {
+        let interrupted = self.lastInterruptedAtSeconds
+        self.lastInterruptedAtSeconds = nil
+        return TalkPromptBuilder.build(transcript: transcript, interruptedAtSeconds: interrupted)
+    }
+
+    private enum ChatCompletionState: CustomStringConvertible {
+        case final
+        case aborted
+        case error
+        case timeout
+
+        var description: String {
+            switch self {
+            case .final: "final"
+            case .aborted: "aborted"
+            case .error: "error"
+            case .timeout: "timeout"
+            }
+        }
+    }
+
+    private func sendChat(_ message: String, bridge: BridgeSession) async throws -> String {
+        struct SendResponse: Decodable { let runId: String }
+        let payload: [String: Any] = [
+            "sessionKey": self.mainSessionKey,
+            "message": message,
+            "thinking": "low",
+            "timeoutMs": 30000,
+            "idempotencyKey": UUID().uuidString,
+        ]
+        let data = try JSONSerialization.data(withJSONObject: payload)
+        let json = String(decoding: data, as: UTF8.self)
+        let res = try await bridge.request(method: "chat.send", paramsJSON: json, timeoutSeconds: 30)
+        let decoded = try JSONDecoder().decode(SendResponse.self, from: res)
+        return decoded.runId
+    }
+
+    private func waitForChatCompletion(
+        runId: String,
+        bridge: BridgeSession,
+        timeoutSeconds: Int = 120) async -> ChatCompletionState
+    {
+        let stream = await bridge.subscribeServerEvents(bufferingNewest: 200)
+        return await withTaskGroup(of: ChatCompletionState.self) { group in
+            group.addTask { [runId] in
+                for await evt in stream {
+                    if Task.isCancelled { return .timeout }
+                    guard evt.event == "chat", let payload = evt.payloadJSON else { continue }
+                    guard let data = payload.data(using: .utf8) else { continue }
+                    guard let json = try? JSONSerialization.jsonObject(with: data) as? [String: Any] else { continue }
+                    if (json["runId"] as? String) != runId { continue }
+                    if let state = json["state"] as? String {
+                        switch state {
+                        case "final": return .final
+                        case "aborted": return .aborted
+                        case "error": return .error
+                        default: break
+                        }
+                    }
+                }
+                return .timeout
+            }
+            group.addTask {
+                try? await Task.sleep(nanoseconds: UInt64(timeoutSeconds) * 1_000_000_000)
+                return .timeout
+            }
+            let result = await group.next() ?? .timeout
+            group.cancelAll()
+            return result
+        }
+    }
+
+    private func waitForAssistantText(
+        bridge: BridgeSession,
+        since: Double,
+        timeoutSeconds: Int) async throws -> String?
+    {
+        let deadline = Date().addingTimeInterval(TimeInterval(timeoutSeconds))
+        while Date() < deadline {
+            if let text = try await self.fetchLatestAssistantText(bridge: bridge, since: since) {
+                return text
+            }
+            try? await Task.sleep(nanoseconds: 300_000_000)
+        }
+        return nil
+    }
+
+    private func fetchLatestAssistantText(bridge: BridgeSession, since: Double? = nil) async throws -> String? {
+        let res = try await bridge.request(
+            method: "chat.history",
+            paramsJSON: "{\"sessionKey\":\"\(self.mainSessionKey)\"}",
+            timeoutSeconds: 15)
+        guard let json = try JSONSerialization.jsonObject(with: res) as? [String: Any] else { return nil }
+        guard let messages = json["messages"] as? [[String: Any]] else { return nil }
+        for msg in messages.reversed() {
+            guard (msg["role"] as? String) == "assistant" else { continue }
+            if let since, let timestamp = msg["timestamp"] as? Double,
+               TalkHistoryTimestamp.isAfter(timestamp, sinceSeconds: since) == false
+            {
+                continue
+            }
+            guard let content = msg["content"] as? [[String: Any]] else { continue }
+            let text = content.compactMap { $0["text"] as? String }.joined(separator: "\n")
+            let trimmed = text.trimmingCharacters(in: .whitespacesAndNewlines)
+            if !trimmed.isEmpty { return trimmed }
+        }
+        return nil
+    }
+
+    private func playAssistant(text: String) async {
+        let parsed = TalkDirectiveParser.parse(text)
+        let directive = parsed.directive
+        let cleaned = parsed.stripped.trimmingCharacters(in: .whitespacesAndNewlines)
+        guard !cleaned.isEmpty else { return }
+
+        let requestedVoice = directive?.voiceId?.trimmingCharacters(in: .whitespacesAndNewlines)
+        let resolvedVoice = self.resolveVoiceAlias(requestedVoice)
+        if requestedVoice?.isEmpty == false, resolvedVoice == nil {
+            self.logger.warning("unknown voice alias \(requestedVoice ?? "?", privacy: .public)")
+        }
+        if let voice = resolvedVoice {
+            if directive?.once != true {
+                self.currentVoiceId = voice
+                self.voiceOverrideActive = true
+            }
+        }
+        if let model = directive?.modelId {
+            if directive?.once != true {
+                self.currentModelId = model
+                self.modelOverrideActive = true
+            }
+        }
+
+        self.statusText = "Generating voice…"
+        self.isSpeaking = true
+        self.lastSpokenText = cleaned
+
+        do {
+            let started = Date()
+            let language = ElevenLabsTTSClient.validatedLanguage(directive?.language)
+
+            let resolvedKey =
+                (self.apiKey?.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty == false ? self.apiKey : nil) ??
+                ProcessInfo.processInfo.environment["ELEVENLABS_API_KEY"]
+            let apiKey = resolvedKey?.trimmingCharacters(in: .whitespacesAndNewlines)
+            let preferredVoice = resolvedVoice ?? self.currentVoiceId ?? self.defaultVoiceId
+            let voiceId: String? = if let apiKey, !apiKey.isEmpty {
+                await self.resolveVoiceId(preferred: preferredVoice, apiKey: apiKey)
+            } else {
+                nil
+            }
+            let canUseElevenLabs = (voiceId?.isEmpty == false) && (apiKey?.isEmpty == false)
+
+            if canUseElevenLabs, let voiceId, let apiKey {
+                let desiredOutputFormat = (directive?.outputFormat ?? self.defaultOutputFormat)?
+                    .trimmingCharacters(in: .whitespacesAndNewlines)
+                let requestedOutputFormat = (desiredOutputFormat?.isEmpty == false) ? desiredOutputFormat : nil
+                let outputFormat = ElevenLabsTTSClient.validatedOutputFormat(requestedOutputFormat ?? "pcm_44100")
+                if outputFormat == nil, let requestedOutputFormat {
+                    self.logger.warning(
+                        "talk output_format unsupported for local playback: \(requestedOutputFormat, privacy: .public)")
+                }
+
+                let modelId = directive?.modelId ?? self.currentModelId ?? self.defaultModelId
+                func makeRequest(outputFormat: String?) -> ElevenLabsTTSRequest {
+                    ElevenLabsTTSRequest(
+                        text: cleaned,
+                        modelId: modelId,
+                        outputFormat: outputFormat,
+                        speed: TalkTTSValidation.resolveSpeed(speed: directive?.speed, rateWPM: directive?.rateWPM),
+                        stability: TalkTTSValidation.validatedStability(directive?.stability, modelId: modelId),
+                        similarity: TalkTTSValidation.validatedUnit(directive?.similarity),
+                        style: TalkTTSValidation.validatedUnit(directive?.style),
+                        speakerBoost: directive?.speakerBoost,
+                        seed: TalkTTSValidation.validatedSeed(directive?.seed),
+                        normalize: ElevenLabsTTSClient.validatedNormalize(directive?.normalize),
+                        language: language,
+                        latencyTier: TalkTTSValidation.validatedLatencyTier(directive?.latencyTier))
+                }
+
+                let request = makeRequest(outputFormat: outputFormat)
+
+                let client = ElevenLabsTTSClient(apiKey: apiKey)
+                let stream = client.streamSynthesize(voiceId: voiceId, request: request)
+
+                if self.interruptOnSpeech {
+                    do {
+                        try self.startRecognition()
+                    } catch {
+                        self.logger.warning(
+                            "startRecognition during speak failed: \(error.localizedDescription, privacy: .public)")
+                    }
+                }
+
+                self.statusText = "Speaking…"
+                let sampleRate = TalkTTSValidation.pcmSampleRate(from: outputFormat)
+                let result: StreamingPlaybackResult
+                if let sampleRate {
+                    self.lastPlaybackWasPCM = true
+                    var playback = await self.pcmPlayer.play(stream: stream, sampleRate: sampleRate)
+                    if !playback.finished, playback.interruptedAt == nil {
+                        let mp3Format = ElevenLabsTTSClient.validatedOutputFormat("mp3_44100")
+                        self.logger.warning("pcm playback failed; retrying mp3")
+                        self.lastPlaybackWasPCM = false
+                        let mp3Stream = client.streamSynthesize(
+                            voiceId: voiceId,
+                            request: makeRequest(outputFormat: mp3Format))
+                        playback = await self.mp3Player.play(stream: mp3Stream)
+                    }
+                    result = playback
+                } else {
+                    self.lastPlaybackWasPCM = false
+                    result = await self.mp3Player.play(stream: stream)
+                }
+                self.logger
+                    .info(
+                        "elevenlabs stream finished=\(result.finished, privacy: .public) dur=\(Date().timeIntervalSince(started), privacy: .public)s")
+                if !result.finished, let interruptedAt = result.interruptedAt {
+                    self.lastInterruptedAtSeconds = interruptedAt
+                }
+            } else {
+                self.logger.warning("tts unavailable; falling back to system voice (missing key or voiceId)")
+                if self.interruptOnSpeech {
+                    do {
+                        try self.startRecognition()
+                    } catch {
+                        self.logger.warning(
+                            "startRecognition during speak failed: \(error.localizedDescription, privacy: .public)")
+                    }
+                }
+                self.statusText = "Speaking (System)…"
+                try await TalkSystemSpeechSynthesizer.shared.speak(text: cleaned, language: language)
+            }
+        } catch {
+            self.logger.error(
+                "tts failed: \(error.localizedDescription, privacy: .public); falling back to system voice")
+            do {
+                if self.interruptOnSpeech {
+                    do {
+                        try self.startRecognition()
+                    } catch {
+                        self.logger.warning(
+                            "startRecognition during speak failed: \(error.localizedDescription, privacy: .public)")
+                    }
+                }
+                self.statusText = "Speaking (System)…"
+                let language = ElevenLabsTTSClient.validatedLanguage(directive?.language)
+                try await TalkSystemSpeechSynthesizer.shared.speak(text: cleaned, language: language)
+            } catch {
+                self.statusText = "Speak failed: \(error.localizedDescription)"
+                self.logger.error("system voice failed: \(error.localizedDescription, privacy: .public)")
+            }
+        }
+
+        self.stopRecognition()
+        self.isSpeaking = false
+    }
+
+    private func stopSpeaking(storeInterruption: Bool = true) {
+        guard self.isSpeaking else { return }
+        let interruptedAt = self.lastPlaybackWasPCM
+            ? self.pcmPlayer.stop()
+            : self.mp3Player.stop()
+        if storeInterruption {
+            self.lastInterruptedAtSeconds = interruptedAt
+        }
+        _ = self.lastPlaybackWasPCM
+            ? self.mp3Player.stop()
+            : self.pcmPlayer.stop()
+        TalkSystemSpeechSynthesizer.shared.stop()
+        self.isSpeaking = false
+    }
+
+    private func shouldInterrupt(with transcript: String) -> Bool {
+        let trimmed = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
+        guard trimmed.count >= 3 else { return false }
+        if let spoken = self.lastSpokenText?.lowercased(), spoken.contains(trimmed.lowercased()) {
+            return false
+        }
+        return true
+    }
+
+    private func resolveVoiceAlias(_ value: String?) -> String? {
+        let trimmed = (value ?? "").trimmingCharacters(in: .whitespacesAndNewlines)
+        guard !trimmed.isEmpty else { return nil }
+        let normalized = trimmed.lowercased()
+        if let mapped = self.voiceAliases[normalized] { return mapped }
+        if self.voiceAliases.values.contains(where: { $0.caseInsensitiveCompare(trimmed) == .orderedSame }) {
+            return trimmed
+        }
+        return Self.isLikelyVoiceId(trimmed) ? trimmed : nil
+    }
+
+    private func resolveVoiceId(preferred: String?, apiKey: String) async -> String? {
+        let trimmed = preferred?.trimmingCharacters(in: .whitespacesAndNewlines) ?? ""
+        if !trimmed.isEmpty {
+            if let resolved = self.resolveVoiceAlias(trimmed) { return resolved }
+            self.logger.warning("unknown voice alias \(trimmed, privacy: .public)")
+        }
+        if let fallbackVoiceId { return fallbackVoiceId }
+
+        do {
+            let voices = try await ElevenLabsTTSClient(apiKey: apiKey).listVoices()
+            guard let first = voices.first else {
+                self.logger.warning("elevenlabs voices list empty")
+                return nil
+            }
+            self.fallbackVoiceId = first.voiceId
+            if self.defaultVoiceId == nil {
+                self.defaultVoiceId = first.voiceId
+            }
+            if !self.voiceOverrideActive {
+                self.currentVoiceId = first.voiceId
+            }
+            let name = first.name ?? "unknown"
+            self.logger
+                .info("default voice selected \(name, privacy: .public) (\(first.voiceId, privacy: .public))")
+            return first.voiceId
+        } catch {
+            self.logger.error("elevenlabs list voices failed: \(error.localizedDescription, privacy: .public)")
+            return nil
+        }
+    }
+
+    private static func isLikelyVoiceId(_ value: String) -> Bool {
+        guard value.count >= 10 else { return false }
+        return value.allSatisfy { $0.isLetter || $0.isNumber || $0 == "-" || $0 == "_" }
+    }
+
+    private func reloadConfig() async {
+        guard let bridge else { return }
+        do {
+            let res = try await bridge.request(method: "config.get", paramsJSON: "{}", timeoutSeconds: 8)
+            guard let json = try JSONSerialization.jsonObject(with: res) as? [String: Any] else { return }
+            guard let config = json["config"] as? [String: Any] else { return }
+            let talk = config["talk"] as? [String: Any]
+            let session = config["session"] as? [String: Any]
+            let rawMainKey = (session?["mainKey"] as? String)?.trimmingCharacters(in: .whitespacesAndNewlines) ?? ""
+            self.mainSessionKey = rawMainKey.isEmpty ? "main" : rawMainKey
+            self.defaultVoiceId = (talk?["voiceId"] as? String)?.trimmingCharacters(in: .whitespacesAndNewlines)
+            if let aliases = talk?["voiceAliases"] as? [String: Any] {
+                var resolved: [String: String] = [:]
+                for (key, value) in aliases {
+                    guard let id = value as? String else { continue }
+                    let normalizedKey = key.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
+                    let trimmedId = id.trimmingCharacters(in: .whitespacesAndNewlines)
+                    guard !normalizedKey.isEmpty, !trimmedId.isEmpty else { continue }
+                    resolved[normalizedKey] = trimmedId
+                }
+                self.voiceAliases = resolved
+            } else {
+                self.voiceAliases = [:]
+            }
+            if !self.voiceOverrideActive {
+                self.currentVoiceId = self.defaultVoiceId
+            }
+            let model = (talk?["modelId"] as? String)?.trimmingCharacters(in: .whitespacesAndNewlines)
+            self.defaultModelId = (model?.isEmpty == false) ? model : Self.defaultModelIdFallback
+            if !self.modelOverrideActive {
+                self.currentModelId = self.defaultModelId
+            }
+            self.defaultOutputFormat = (talk?["outputFormat"] as? String)?
+                .trimmingCharacters(in: .whitespacesAndNewlines)
+            self.apiKey = (talk?["apiKey"] as? String)?.trimmingCharacters(in: .whitespacesAndNewlines)
+            if let interrupt = talk?["interruptOnSpeech"] as? Bool {
+                self.interruptOnSpeech = interrupt
+            }
+        } catch {
+            self.defaultModelId = Self.defaultModelIdFallback
+            if !self.modelOverrideActive {
+                self.currentModelId = self.defaultModelId
+            }
+        }
+    }
+
+    private static func configureAudioSession() throws {
+        let session = AVAudioSession.sharedInstance()
+        try session.setCategory(.playAndRecord, mode: .voiceChat, options: [
+            .duckOthers,
+            .mixWithOthers,
+            .allowBluetoothHFP,
+            .defaultToSpeaker,
+        ])
+        try session.setActive(true, options: [])
+    }
+
+    private nonisolated static func requestMicrophonePermission() async -> Bool {
+        await withCheckedContinuation(isolation: nil) { cont in
+            AVAudioApplication.requestRecordPermission { ok in
+                cont.resume(returning: ok)
+            }
+        }
+    }
+
+    private nonisolated static func requestSpeechPermission() async -> Bool {
+        await withCheckedContinuation(isolation: nil) { cont in
+            SFSpeechRecognizer.requestAuthorization { status in
+                cont.resume(returning: status == .authorized)
+            }
+        }
+    }
+}
diff --git a/apps/ios/Sources/Voice/TalkOrbOverlay.swift b/apps/ios/Sources/Voice/TalkOrbOverlay.swift
new file mode 100644
index 000000000..cce8c1c61
--- /dev/null
+++ b/apps/ios/Sources/Voice/TalkOrbOverlay.swift
@@ -0,0 +1,70 @@
+import SwiftUI
+
+struct TalkOrbOverlay: View {
+    @Environment(NodeAppModel.self) private var appModel
+    @State private var pulse: Bool = false
+
+    var body: some View {
+        let seam = self.appModel.seamColor
+        let status = self.appModel.talkMode.statusText.trimmingCharacters(in: .whitespacesAndNewlines)
+
+        VStack(spacing: 14) {
+            ZStack {
+                Circle()
+                    .stroke(seam.opacity(0.26), lineWidth: 2)
+                    .frame(width: 320, height: 320)
+                    .scaleEffect(self.pulse ? 1.15 : 0.96)
+                    .opacity(self.pulse ? 0.0 : 1.0)
+                    .animation(.easeOut(duration: 1.3).repeatForever(autoreverses: false), value: self.pulse)
+
+                Circle()
+                    .stroke(seam.opacity(0.18), lineWidth: 2)
+                    .frame(width: 320, height: 320)
+                    .scaleEffect(self.pulse ? 1.45 : 1.02)
+                    .opacity(self.pulse ? 0.0 : 0.9)
+                    .animation(.easeOut(duration: 1.9).repeatForever(autoreverses: false).delay(0.2), value: self.pulse)
+
+                Circle()
+                    .fill(
+                        RadialGradient(
+                            colors: [
+                                seam.opacity(0.95),
+                                seam.opacity(0.40),
+                                Color.black.opacity(0.55),
+                            ],
+                            center: .center,
+                            startRadius: 1,
+                            endRadius: 112))
+                    .frame(width: 190, height: 190)
+                    .overlay(
+                        Circle()
+                            .stroke(seam.opacity(0.35), lineWidth: 1))
+                    .shadow(color: seam.opacity(0.32), radius: 26, x: 0, y: 0)
+                    .shadow(color: Color.black.opacity(0.50), radius: 22, x: 0, y: 10)
+            }
+            .contentShape(Circle())
+            .onTapGesture {
+                self.appModel.talkMode.userTappedOrb()
+            }
+
+            if !status.isEmpty, status != "Off" {
+                Text(status)
+                    .font(.system(.footnote, design: .rounded).weight(.semibold))
+                    .foregroundStyle(Color.white.opacity(0.92))
+                    .padding(.horizontal, 12)
+                    .padding(.vertical, 8)
+                    .background(
+                        Capsule()
+                            .fill(Color.black.opacity(0.40))
+                            .overlay(
+                                Capsule().stroke(seam.opacity(0.22), lineWidth: 1)))
+            }
+        }
+        .padding(28)
+        .onAppear {
+            self.pulse = true
+        }
+        .accessibilityElement(children: .combine)
+        .accessibilityLabel("Talk Mode \(status)")
+    }
+}
diff --git a/apps/ios/Sources/Voice/VoiceTab.swift b/apps/ios/Sources/Voice/VoiceTab.swift
index 59e1cd6d4..4fedd0ce9 100644
--- a/apps/ios/Sources/Voice/VoiceTab.swift
+++ b/apps/ios/Sources/Voice/VoiceTab.swift
@@ -4,6 +4,7 @@ struct VoiceTab: View {
     @Environment(NodeAppModel.self) private var appModel
     @Environment(VoiceWakeManager.self) private var voiceWake
     @AppStorage("voiceWake.enabled") private var voiceWakeEnabled: Bool = false
+    @AppStorage("talk.enabled") private var talkEnabled: Bool = false
 
     var body: some View {
         NavigationStack {
@@ -14,6 +15,7 @@ struct VoiceTab: View {
                     Text(self.voiceWake.statusText)
                         .font(.footnote)
                         .foregroundStyle(.secondary)
+                    LabeledContent("Talk Mode", value: self.talkEnabled ? "Enabled" : "Disabled")
                 }
 
                 Section("Notes") {
@@ -36,6 +38,9 @@ struct VoiceTab: View {
             .onChange(of: self.voiceWakeEnabled) { _, newValue in
                 self.appModel.setVoiceWakeEnabled(newValue)
             }
+            .onChange(of: self.talkEnabled) { _, newValue in
+                self.appModel.setTalkEnabled(newValue)
+            }
         }
     }
 }
diff --git a/apps/ios/SwiftSources.input.xcfilelist b/apps/ios/SwiftSources.input.xcfilelist
index 3e2a9a7b0..5b71e678f 100644
--- a/apps/ios/SwiftSources.input.xcfilelist
+++ b/apps/ios/SwiftSources.input.xcfilelist
@@ -54,4 +54,7 @@
 Sources/Voice/VoiceWakePreferences.swift
 ../shared/ClawdisKit/Sources/ClawdisKit/ScreenCommands.swift
 ../shared/ClawdisKit/Sources/ClawdisKit/StoragePaths.swift
 ../shared/ClawdisKit/Sources/ClawdisKit/SystemCommands.swift
+../shared/ClawdisKit/Sources/ClawdisKit/TalkDirective.swift
 ../../Swabble/Sources/SwabbleKit/WakeWordGate.swift
+Sources/Voice/TalkModeManager.swift
+Sources/Voice/TalkOrbOverlay.swift
diff --git a/apps/ios/Tests/BridgeConnectionControllerTests.swift b/apps/ios/Tests/BridgeConnectionControllerTests.swift
index 4ff359616..e4eb6ccf2 100644
--- a/apps/ios/Tests/BridgeConnectionControllerTests.swift
+++ b/apps/ios/Tests/BridgeConnectionControllerTests.swift
@@ -1,5 +1,6 @@
 import ClawdisKit
 import Foundation
+import Network
 import Testing
 import UIKit
 @testable import Clawdis
@@ -15,6 +16,25 @@ private let instanceIdEntry = KeychainEntry(service: nodeService, account: "inst
 private let preferredBridgeEntry = KeychainEntry(service: bridgeService, account: "preferredStableID")
 private let lastBridgeEntry = KeychainEntry(service: bridgeService, account: "lastDiscoveredStableID")
 
+private actor MockBridgePairingClient: BridgePairingClient {
+    private(set) var lastToken: String?
+    private let resultToken: String
+
+    init(resultToken: String) {
+        self.resultToken = resultToken
+    }
+
+    func pairAndHello(
+        endpoint: NWEndpoint,
+        hello: BridgeHello,
+        onStatus: (@Sendable (String) -> Void)?) async throws -> String
+    {
+        self.lastToken = hello.token
+        onStatus?("Testing…")
+        return self.resultToken
+    }
+}
+
 private func withUserDefaults<T>(_ updates: [String: Any?], _ body: () throws -> T) rethrows -> T {
     let defaults = UserDefaults.standard
     var snapshot: [String: Any?] = [:]
     for key in updates.keys {
         snapshot[key] = defaults.object(forKey: key)
     }
     for (key, value) in updates {
         if let value {
             defaults.set(value, forKey: key)
         } else {
             defaults.removeObject(forKey: key)
         }
     }
@@ -40,6 +60,35 @@
     return try body()
 }
 
+@MainActor
+private func withUserDefaults<T>(
+    _ updates: [String: Any?],
+    _ body: () async throws -> T) async rethrows -> T
+{
+    let defaults = UserDefaults.standard
+    var snapshot: [String: Any?] = [:]
+    for key in updates.keys {
+        snapshot[key] = defaults.object(forKey: key)
+    }
+    for (key, value) in updates {
+        if let value {
+            defaults.set(value, forKey: key)
+        } else {
+            defaults.removeObject(forKey: key)
+        }
+    }
+    defer {
+        for (key, value) in snapshot {
+            if let value {
+                defaults.set(value, forKey: key)
+            } else {
+                defaults.removeObject(forKey: key)
+            }
+        }
+    }
+    return try await body()
+}
+
 private func withKeychainValues<T>(_ updates: [KeychainEntry: String?], _ body: () throws -> T) rethrows -> T {
     var snapshot: [KeychainEntry: String?] = [:]
     for entry in updates.keys {
         snapshot[entry] = KeychainStore.loadString(service: entry.service, account: entry.account)
     }
@@ -64,6 +113,34 @@
     return try body()
 }
 
+@MainActor
+private func withKeychainValues<T>(
+    _ updates: [KeychainEntry: String?],
+    _ body: () async throws -> T) async rethrows -> T
+{
+    var snapshot: [KeychainEntry: String?] = [:]
+    for entry in updates.keys {
+        snapshot[entry] = KeychainStore.loadString(service: entry.service, account: entry.account)
+    }
+    for (entry, value) in updates {
+        if let value {
+            _ = KeychainStore.saveString(value, service: entry.service, account: entry.account)
+        } else {
+            _ = KeychainStore.delete(service: entry.service, account: entry.account)
+        }
+    }
+    defer {
+        for (entry, value) in snapshot {
+            if let value {
+                _ = KeychainStore.saveString(value, service: entry.service, account: entry.account)
+            } else {
+                _ = KeychainStore.delete(service: entry.service, account: entry.account)
+            }
+        }
+    }
+    return try await body()
+}
+
 @Suite(.serialized) struct BridgeConnectionControllerTests {
     @Test @MainActor func resolvedDisplayNameSetsDefaultWhenMissing() {
         let defaults = UserDefaults.standard
@@ -156,4 +233,109 @@
             }
         }
     }
+
+    @Test @MainActor func autoConnectRefreshesTokenOnUnauthorized() async {
+        let bridge = BridgeDiscoveryModel.DiscoveredBridge(
+            name: "Gateway",
+            endpoint: .hostPort(host: NWEndpoint.Host("127.0.0.1"), port: 18790),
+            stableID: "bridge-1",
+            debugID: "bridge-debug",
+            lanHost: "Mac.local",
+            tailnetDns: nil,
+            gatewayPort: 18789,
+            bridgePort: 18790,
+            canvasPort: 18793,
+            cliPath: nil)
+        let mock = MockBridgePairingClient(resultToken: "new-token")
+        let account = "bridge-token.ios-test"
+
+        await withKeychainValues([
+            instanceIdEntry: nil,
+            preferredBridgeEntry: nil,
+            lastBridgeEntry: nil,
+            KeychainEntry(service: bridgeService, account: account): "old-token",
+        ]) {
+            await withUserDefaults([
+                "node.instanceId": "ios-test",
+                "bridge.lastDiscoveredStableID": "bridge-1",
+                "bridge.manual.enabled": false,
+            ]) {
+                let appModel = NodeAppModel()
+                let controller = BridgeConnectionController(
+                    appModel: appModel,
+                    startDiscovery: false,
+                    bridgeClientFactory: { mock })
+                controller._test_setBridges([bridge])
+                controller._test_triggerAutoConnect()
+
+                for _ in 0..<20 {
+                    if appModel.connectedBridgeID == bridge.stableID { break }
+                    try? await Task.sleep(nanoseconds: 50_000_000)
+                }
+
+                #expect(appModel.connectedBridgeID == bridge.stableID)
+                let stored = KeychainStore.loadString(service: bridgeService, account: account)
+                #expect(stored == "new-token")
+                let lastToken = await mock.lastToken
+                #expect(lastToken == "old-token")
+            }
+        }
+    }
+
+    @Test @MainActor func autoConnectPrefersPreferredBridgeOverLastDiscovered() async {
+        let bridgeA = BridgeDiscoveryModel.DiscoveredBridge(
+            name: "Gateway A",
+            endpoint: .hostPort(host: NWEndpoint.Host("127.0.0.1"), port: 18790),
+            stableID: "bridge-1",
+            debugID: "bridge-a",
+            lanHost: "MacA.local",
+            tailnetDns: nil,
+            gatewayPort: 18789,
+            bridgePort: 18790,
+            canvasPort: 18793,
+            cliPath: nil)
+        let bridgeB = BridgeDiscoveryModel.DiscoveredBridge(
+            name: "Gateway B",
+            endpoint: .hostPort(host: NWEndpoint.Host("127.0.0.1"), port: 28790),
+            stableID: "bridge-2",
+            debugID: "bridge-b",
+            lanHost: "MacB.local",
+            tailnetDns: nil,
+            gatewayPort: 28789,
+            bridgePort: 28790,
+            canvasPort: 28793,
+            cliPath: nil)
+
+        let mock = MockBridgePairingClient(resultToken: "token-ok")
+        let account = "bridge-token.ios-test"
+
+        await withKeychainValues([
+            instanceIdEntry: nil,
+            preferredBridgeEntry: nil,
+            lastBridgeEntry: nil,
+            KeychainEntry(service: bridgeService, account: account): "old-token",
+        ]) {
+            await withUserDefaults([
+                "node.instanceId": "ios-test",
+                "bridge.preferredStableID": "bridge-2",
+                "bridge.lastDiscoveredStableID": "bridge-1",
+                "bridge.manual.enabled": false,
+            ]) {
+                let appModel = NodeAppModel()
+                let controller = BridgeConnectionController(
+                    appModel: appModel,
+                    startDiscovery: false,
+                    bridgeClientFactory: { mock })
+                controller._test_setBridges([bridgeA, bridgeB])
+                controller._test_triggerAutoConnect()
+
+                for _ in 0..<20 {
+                    if appModel.connectedBridgeID == bridgeB.stableID { break }
+                    try? await Task.sleep(nanoseconds: 50_000_000)
+                }
+
+                #expect(appModel.connectedBridgeID == bridgeB.stableID)
+            }
+        }
+    }
 }
diff --git a/apps/ios/Tests/CameraControllerErrorTests.swift b/apps/ios/Tests/CameraControllerErrorTests.swift
new file mode 100644
index 000000000..3b3c94281
--- /dev/null
+++ b/apps/ios/Tests/CameraControllerErrorTests.swift
@@ -0,0 +1,13 @@
+import Testing
+@testable import Clawdis
+
+@Suite struct CameraControllerErrorTests {
+    @Test func errorDescriptionsAreStable() {
+        #expect(CameraController.CameraError.cameraUnavailable.errorDescription == "Camera unavailable")
+        #expect(CameraController.CameraError.microphoneUnavailable.errorDescription == "Microphone unavailable")
+        #expect(CameraController.CameraError.permissionDenied(kind: "Camera").errorDescription == "Camera permission denied")
+        #expect(CameraController.CameraError.invalidParams("bad").errorDescription == "bad")
+        #expect(CameraController.CameraError.captureFailed("nope").errorDescription == "nope")
+        #expect(CameraController.CameraError.exportFailed("export").errorDescription == "export")
+    }
+}
diff --git a/apps/ios/Tests/ScreenControllerTests.swift b/apps/ios/Tests/ScreenControllerTests.swift
index 028a0eae6..835c0081f 100644
--- a/apps/ios/Tests/ScreenControllerTests.swift
+++ b/apps/ios/Tests/ScreenControllerTests.swift
@@ -16,6 +16,15 @@ import WebKit
         #expect(scrollView.bounces == false)
     }
 
+    @Test @MainActor func navigateEnablesScrollForWebPages() {
+        let screen = ScreenController()
+        screen.navigate(to: "https://example.com")
+
+        let scrollView = screen.webView.scrollView
+        #expect(scrollView.isScrollEnabled == true)
+        #expect(scrollView.bounces == true)
+    }
+
     @Test @MainActor func navigateSlashShowsDefaultCanvas() {
         let screen = ScreenController()
         screen.navigate(to: "/")
diff --git a/apps/ios/project.yml b/apps/ios/project.yml
index 033d2f68f..e8dac7a20 100644
--- a/apps/ios/project.yml
+++ b/apps/ios/project.yml
@@ -62,7 +62,11 @@ targets:
         swiftlint lint --config "$SRCROOT/.swiftlint.yml" --use-script-input-file-lists
     settings:
       base:
+        CODE_SIGN_IDENTITY: "Apple Development"
+        CODE_SIGN_STYLE: Manual
+        DEVELOPMENT_TEAM: Y5PE65HELJ
         PRODUCT_BUNDLE_IDENTIFIER: com.steipete.clawdis.ios
+        PROVISIONING_PROFILE_SPECIFIER: "com.steipete.clawdis.ios Development"
         SWIFT_VERSION: "6.0"
     info:
       path: Sources/Info.plist
diff --git a/apps/macos/Package.swift b/apps/macos/Package.swift
index 941d684c9..297d46886 100644
--- a/apps/macos/Package.swift
+++ b/apps/macos/Package.swift
@@ -15,6 +15,7 @@ let package = Package(
     dependencies: [
         .package(url: "https://github.com/orchetect/MenuBarExtraAccess", exact: "1.2.2"),
         .package(url: "https://github.com/swiftlang/swift-subprocess.git", from: "0.1.0"),
+        .package(url: "https://github.com/apple/swift-log.git", from: "1.8.0"),
        .package(url: "https://github.com/sparkle-project/Sparkle", from: "2.8.1"),
         .package(path: "../shared/ClawdisKit"),
         .package(path: "../../Swabble"),
@@ -45,6 +46,7 @@
             .product(name: "SwabbleKit", package: "swabble"),
             .product(name: "MenuBarExtraAccess", package: "MenuBarExtraAccess"),
             .product(name: "Subprocess", package: "swift-subprocess"),
+            .product(name: "Logging", package: "swift-log"),
             .product(name: "Sparkle", package: "Sparkle"),
             .product(name: "PeekabooBridge", package: "PeekabooCore"),
             .product(name: "PeekabooAutomationKit", package: "PeekabooAutomationKit"),
diff --git a/apps/macos/Sources/Clawdis/AppState.swift b/apps/macos/Sources/Clawdis/AppState.swift
index 53d81c02d..65ddaaf90 100644
--- a/apps/macos/Sources/Clawdis/AppState.swift
+++ b/apps/macos/Sources/Clawdis/AppState.swift
@@ -121,6 +121,18 @@ final class AppState {
                     forKey: voicePushToTalkEnabledKey)
             }
         }
     }
+
+    var talkEnabled: Bool {
+        didSet {
+            self.ifNotPreview {
+                UserDefaults.standard.set(self.talkEnabled, forKey: talkEnabledKey)
+                Task { await TalkModeController.shared.setEnabled(self.talkEnabled) }
+            }
+        }
+    }
+
+    /// Gateway-provided UI accent color (hex). Optional; clients provide a default.
+    var seamColorHex: String?
+
     var iconOverride: IconOverrideSelection {
         didSet { self.ifNotPreview { UserDefaults.standard.set(self.iconOverride.rawValue, forKey: iconOverrideKey) } }
     }
@@ -216,6 +228,8 @@ final class AppState {
             .stringArray(forKey: voiceWakeAdditionalLocalesKey) ?? []
         self.voicePushToTalkEnabled = UserDefaults.standard
             .object(forKey: voicePushToTalkEnabledKey) as? Bool ?? false
+        self.talkEnabled = UserDefaults.standard.bool(forKey: talkEnabledKey)
+        self.seamColorHex = nil
         if let storedHeartbeats = UserDefaults.standard.object(forKey: heartbeatsEnabledKey) as? Bool {
             self.heartbeatsEnabled = storedHeartbeats
         } else {
@@ -256,9 +270,13 @@ final class AppState {
         if self.swabbleEnabled, !PermissionManager.voiceWakePermissionsGranted() {
             self.swabbleEnabled = false
         }
+        if self.talkEnabled, !PermissionManager.voiceWakePermissionsGranted() {
+            self.talkEnabled = false
+        }
         if !self.isPreview {
             Task { await VoiceWakeRuntime.shared.refresh(state: self) }
+            Task { await TalkModeController.shared.setEnabled(self.talkEnabled) }
         }
     }
@@ -312,6 +330,31 @@ final class AppState {
         Task { await VoiceWakeRuntime.shared.refresh(state: self) }
     }
+
+    func setTalkEnabled(_ enabled: Bool) async {
+        guard voiceWakeSupported else {
+            self.talkEnabled = false
+            await GatewayConnection.shared.talkMode(enabled: false, phase: "disabled")
+            return
+        }
+
+        self.talkEnabled = enabled
+        guard !self.isPreview else { return }
+
+        if !enabled {
+            await GatewayConnection.shared.talkMode(enabled: false, phase: "disabled")
+            return
+        }
+
+        if PermissionManager.voiceWakePermissionsGranted() {
+            await GatewayConnection.shared.talkMode(enabled: true, phase: "enabled")
+            return
+        }
+
+        let granted = await PermissionManager.ensureVoiceWakePermissions(interactive: true)
+        self.talkEnabled = granted
+        await GatewayConnection.shared.talkMode(enabled: granted, phase: granted ? "enabled" : "denied")
+    }
+
     // MARK: - Global wake words sync (Gateway-owned)
 
     func applyGlobalVoiceWakeTriggers(_ triggers: [String]) {
@@ -367,6 +410,7 @@ extension AppState {
         state.voiceWakeLocaleID = Locale.current.identifier
         state.voiceWakeAdditionalLocaleIDs = ["en-US", "de-DE"]
         state.voicePushToTalkEnabled = false
+        state.talkEnabled = false
         state.iconOverride = .system
         state.heartbeatsEnabled = true
         state.connectionMode = .local
diff --git a/apps/macos/Sources/Clawdis/CameraCaptureService.swift b/apps/macos/Sources/Clawdis/CameraCaptureService.swift
index c087c8fd3..3c9d9c357 100644
--- a/apps/macos/Sources/Clawdis/CameraCaptureService.swift
+++ b/apps/macos/Sources/Clawdis/CameraCaptureService.swift
@@ -79,7 +79,14 @@ actor CameraCaptureService {
         }
         withExtendedLifetime(delegate) {}
 
-        let res = try JPEGTranscoder.transcodeToJPEG(imageData: rawData, maxWidthPx: maxWidth, quality: quality)
+        let maxPayloadBytes = 5 * 1024 * 1024
+        // Base64 inflates payloads by ~4/3; cap the raw JPEG bytes so the encoded payload stays under 5MB (API limit).
+        let maxEncodedBytes = (maxPayloadBytes / 4) * 3
+        let res = try JPEGTranscoder.transcodeToJPEG(
+            imageData: rawData,
+            maxWidthPx: maxWidth,
+            quality: quality,
+            maxBytes: maxEncodedBytes)
         return (data: res.data, size: CGSize(width: res.widthPx, height: res.heightPx))
     }
diff --git a/apps/macos/Sources/Clawdis/CanvasManager.swift b/apps/macos/Sources/Clawdis/CanvasManager.swift
index c19c5d06d..32163744b 100644
--- a/apps/macos/Sources/Clawdis/CanvasManager.swift
+++ b/apps/macos/Sources/Clawdis/CanvasManager.swift
@@ -190,7 +190,7 @@ final class CanvasManager {
     private static func resolveA2UIHostUrl(from raw: String?) -> String? {
         let trimmed = raw?.trimmingCharacters(in: .whitespacesAndNewlines) ?? ""
         guard !trimmed.isEmpty, let base = URL(string: trimmed) else { return nil }
-        return base.appendingPathComponent("__clawdis__/a2ui/").absoluteString
+        return base.appendingPathComponent("__clawdis__/a2ui/").absoluteString + "?platform=macos"
     }
 
     // MARK: - Anchoring
diff --git a/apps/macos/Sources/Clawdis/CanvasWindow.swift b/apps/macos/Sources/Clawdis/CanvasWindow.swift
index 30b0c8c41..e03ba2c42 100644
--- a/apps/macos/Sources/Clawdis/CanvasWindow.swift
+++ b/apps/macos/Sources/Clawdis/CanvasWindow.swift
@@ -1,5 +1,4 @@
 import AppKit
-import OSLog
 
 let canvasWindowLogger = Logger(subsystem: "com.steipete.clawdis", category: "Canvas")
diff --git a/apps/macos/Sources/Clawdis/ClawdisConfigFile.swift b/apps/macos/Sources/Clawdis/ClawdisConfigFile.swift
index 415c01571..a65abad5a 100644
--- a/apps/macos/Sources/Clawdis/ClawdisConfigFile.swift
+++ b/apps/macos/Sources/Clawdis/ClawdisConfigFile.swift
@@ -1,6 +1,8 @@
 import Foundation
 
 enum ClawdisConfigFile {
+    private static let logger = Logger(subsystem: "com.steipete.clawdis", category: "config")
+
     static func url() -> URL {
         FileManager.default.homeDirectoryForCurrentUser
             .appendingPathComponent(".clawdis")
@@ -15,8 +17,18 @@ enum ClawdisConfigFile {
     static func loadDict() -> [String: Any] {
         let url = self.url()
-        guard let data = try? Data(contentsOf: url) else { return [:] }
-        return (try? JSONSerialization.jsonObject(with: data) as? [String: Any]) ?? [:]
+        guard FileManager.default.fileExists(atPath: url.path) else { return [:] }
+        do {
+            let data = try Data(contentsOf: url)
+            guard let root = try JSONSerialization.jsonObject(with: data) as? [String: Any] else {
+                self.logger.warning("config JSON root invalid")
+                return [:]
+            }
+            return root
+        } catch {
+            self.logger.warning("config read failed: \(error.localizedDescription)")
+            return [:]
+        }
     }
 
     static func saveDict(_ dict: [String: Any]) {
@@ -28,7 +40,9 @@ enum ClawdisConfigFile {
                 at: url.deletingLastPathComponent(),
                 withIntermediateDirectories: true)
             try data.write(to: url, options: [.atomic])
-        } catch {}
+        } catch {
+            self.logger.error("config save failed: \(error.localizedDescription)")
+        }
     }
 
     static func loadGatewayDict() -> [String: Any] {
@@ -60,6 +74,7 @@ enum ClawdisConfigFile {
         browser["enabled"] = enabled
         root["browser"] = browser
         self.saveDict(root)
+        self.logger.debug("browser control updated enabled=\(enabled)")
     }
 
     static func agentWorkspace() -> String? {
@@ -79,5 +94,6 @@ enum ClawdisConfigFile {
         }
         root["agent"] = agent
         self.saveDict(root)
+        self.logger.debug("agent workspace updated set=\(!trimmed.isEmpty)")
     }
 }
diff --git a/apps/macos/Sources/Clawdis/CommandResolver.swift b/apps/macos/Sources/Clawdis/CommandResolver.swift
index d418cd5e0..0729024bc 100644
--- a/apps/macos/Sources/Clawdis/CommandResolver.swift
+++ b/apps/macos/Sources/Clawdis/CommandResolver.swift
@@ -16,6 +16,10 @@ enum CommandResolver {
         RuntimeLocator.resolve(searchPaths: self.preferredPaths())
     }
+
+    static func runtimeResolution(searchPaths: [String]?) -> Result {
+        RuntimeLocator.resolve(searchPaths: searchPaths ?? self.preferredPaths())
+    }
+
     static func makeRuntimeCommand(
         runtime: RuntimeResolution,
         entrypoint: String,
@@ -152,8 +156,8 @@ enum CommandResolver {
         return paths
     }
 
-    static func findExecutable(named name: String) -> String? {
-        for dir in self.preferredPaths() {
+    static func findExecutable(named name: String, searchPaths: [String]? = nil) -> String? {
+        for dir in (searchPaths ??
self.preferredPaths()) { let candidate = (dir as NSString).appendingPathComponent(name) if FileManager.default.isExecutableFile(atPath: candidate) { return candidate @@ -162,8 +166,14 @@ enum CommandResolver { return nil } - static func clawdisExecutable() -> String? { - self.findExecutable(named: self.helperName) + static func clawdisExecutable(searchPaths: [String]? = nil) -> String? { + self.findExecutable(named: self.helperName, searchPaths: searchPaths) + } + + static func projectClawdisExecutable(projectRoot: URL? = nil) -> String? { + let root = projectRoot ?? self.projectRoot() + let candidate = root.appendingPathComponent("node_modules/.bin").appendingPathComponent(self.helperName).path + return FileManager.default.isExecutableFile(atPath: candidate) ? candidate : nil } static func nodeCliPath() -> String? { @@ -171,17 +181,18 @@ enum CommandResolver { return FileManager.default.isReadableFile(atPath: candidate) ? candidate : nil } - static func hasAnyClawdisInvoker() -> Bool { - if self.clawdisExecutable() != nil { return true } - if self.findExecutable(named: "pnpm") != nil { return true } - if self.findExecutable(named: "node") != nil, self.nodeCliPath() != nil { return true } + static func hasAnyClawdisInvoker(searchPaths: [String]? = nil) -> Bool { + if self.clawdisExecutable(searchPaths: searchPaths) != nil { return true } + if self.findExecutable(named: "pnpm", searchPaths: searchPaths) != nil { return true } + if self.findExecutable(named: "node", searchPaths: searchPaths) != nil, self.nodeCliPath() != nil { return true } return false } static func clawdisNodeCommand( subcommand: String, extraArgs: [String] = [], - defaults: UserDefaults = .standard) -> [String] + defaults: UserDefaults = .standard, + searchPaths: [String]? 
= nil) -> [String] { let settings = self.connectionSettings(defaults: defaults) if settings.mode == .remote, let ssh = self.sshNodeCommand( @@ -192,25 +203,29 @@ enum CommandResolver { return ssh } - let runtimeResult = self.runtimeResolution() + let runtimeResult = self.runtimeResolution(searchPaths: searchPaths) switch runtimeResult { case let .success(runtime): - if let clawdisPath = self.clawdisExecutable() { + let root = self.projectRoot() + if let clawdisPath = self.projectClawdisExecutable(projectRoot: root) { return [clawdisPath, subcommand] + extraArgs } - if let entry = self.gatewayEntrypoint(in: self.projectRoot()) { + if let entry = self.gatewayEntrypoint(in: root) { return self.makeRuntimeCommand( runtime: runtime, entrypoint: entry, subcommand: subcommand, extraArgs: extraArgs) } - if let pnpm = self.findExecutable(named: "pnpm") { + if let pnpm = self.findExecutable(named: "pnpm", searchPaths: searchPaths) { // Use --silent to avoid pnpm lifecycle banners that would corrupt JSON outputs. return [pnpm, "--silent", "clawdis", subcommand] + extraArgs } + if let clawdisPath = self.clawdisExecutable(searchPaths: searchPaths) { + return [clawdisPath, subcommand] + extraArgs + } let missingEntry = """ clawdis entrypoint missing (looked for dist/index.js or bin/clawdis.js); run pnpm build. @@ -226,9 +241,10 @@ enum CommandResolver { static func clawdisCommand( subcommand: String, extraArgs: [String] = [], - defaults: UserDefaults = .standard) -> [String] + defaults: UserDefaults = .standard, + searchPaths: [String]? 
= nil) -> [String] { - self.clawdisNodeCommand(subcommand: subcommand, extraArgs: extraArgs, defaults: defaults) + self.clawdisNodeCommand(subcommand: subcommand, extraArgs: extraArgs, defaults: defaults, searchPaths: searchPaths) } // MARK: - SSH helpers @@ -258,7 +274,7 @@ enum CommandResolver { "/bin", "/usr/sbin", "/sbin", - "/Users/steipete/Library/pnpm", + "$HOME/Library/pnpm", "$PATH", ].joined(separator: ":") let quotedArgs = ([subcommand] + extraArgs).map(self.shellQuote).joined(separator: " ") diff --git a/apps/macos/Sources/Clawdis/ConfigSettings.swift b/apps/macos/Sources/Clawdis/ConfigSettings.swift index 75b6a7de3..74753d0bc 100644 --- a/apps/macos/Sources/Clawdis/ConfigSettings.swift +++ b/apps/macos/Sources/Clawdis/ConfigSettings.swift @@ -31,6 +31,12 @@ struct ConfigSettings: View { @State private var browserColorHex: String = "#FF4500" @State private var browserAttachOnly: Bool = false + // Talk mode settings (stored in ~/.clawdis/clawdis.json under "talk") + @State private var talkVoiceId: String = "" + @State private var talkInterruptOnSpeech: Bool = true + @State private var talkApiKey: String = "" + @State private var gatewayApiKeyFound = false + var body: some View { ScrollView { self.content } .onChange(of: self.modelCatalogPath) { _, _ in @@ -45,6 +51,7 @@ struct ConfigSettings: View { self.hasLoaded = true self.loadConfig() await self.loadModels() + await self.refreshGatewayTalkApiKey() self.allowAutosave = true } } @@ -56,6 +63,8 @@ struct ConfigSettings: View { .disabled(self.isNixMode) self.heartbeatSection .disabled(self.isNixMode) + self.talkSection + .disabled(self.isNixMode) self.browserSection .disabled(self.isNixMode) Spacer(minLength: 0) @@ -272,18 +281,101 @@ struct ConfigSettings: View { .frame(maxWidth: .infinity, alignment: .leading) } + private var talkSection: some View { + GroupBox("Talk Mode") { + Grid(alignment: .leadingFirstTextBaseline, horizontalSpacing: 14, verticalSpacing: 10) { + GridRow { + self.gridLabel("Voice 
ID") + VStack(alignment: .leading, spacing: 6) { + HStack(spacing: 8) { + TextField("ElevenLabs voice ID", text: self.$talkVoiceId) + .textFieldStyle(.roundedBorder) + .frame(maxWidth: .infinity) + .onChange(of: self.talkVoiceId) { _, _ in self.autosaveConfig() } + if !self.talkVoiceSuggestions.isEmpty { + Menu { + ForEach(self.talkVoiceSuggestions, id: \.self) { value in + Button(value) { + self.talkVoiceId = value + self.autosaveConfig() + } + } + } label: { + Label("Suggestions", systemImage: "chevron.up.chevron.down") + } + .fixedSize() + } + } + Text("Defaults to ELEVENLABS_VOICE_ID / SAG_VOICE_ID if unset.") + .font(.footnote) + .foregroundStyle(.secondary) + } + } + GridRow { + self.gridLabel("API key") + VStack(alignment: .leading, spacing: 6) { + HStack(spacing: 8) { + SecureField("ELEVENLABS_API_KEY", text: self.$talkApiKey) + .textFieldStyle(.roundedBorder) + .frame(maxWidth: .infinity) + .disabled(self.hasEnvApiKey) + .onChange(of: self.talkApiKey) { _, _ in self.autosaveConfig() } + if !self.hasEnvApiKey && !self.talkApiKey.isEmpty { + Button("Clear") { + self.talkApiKey = "" + self.autosaveConfig() + } + } + } + self.statusLine(label: self.apiKeyStatusLabel, color: self.apiKeyStatusColor) + if self.hasEnvApiKey { + Text("Using ELEVENLABS_API_KEY from the environment.") + .font(.footnote) + .foregroundStyle(.secondary) + } else if self.gatewayApiKeyFound && self.talkApiKey.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty { + Text("Using API key from the gateway profile.") + .font(.footnote) + .foregroundStyle(.secondary) + } + } + } + GridRow { + self.gridLabel("Interrupt") + Toggle("Stop speaking when you start talking", isOn: self.$talkInterruptOnSpeech) + .labelsHidden() + .toggleStyle(.checkbox) + .onChange(of: self.talkInterruptOnSpeech) { _, _ in self.autosaveConfig() } + } + } + } + .frame(maxWidth: .infinity, alignment: .leading) + } + private func gridLabel(_ text: String) -> some View { Text(text) .foregroundStyle(.secondary) 
.frame(width: self.labelColumnWidth, alignment: .leading) } + private func statusLine(label: String, color: Color) -> some View { + HStack(spacing: 6) { + Circle() + .fill(color) + .frame(width: 6, height: 6) + Text(label) + .font(.footnote) + .foregroundStyle(.secondary) + } + .padding(.top, 2) + } + private func loadConfig() { let parsed = self.loadConfigDict() let agent = parsed["agent"] as? [String: Any] let heartbeatMinutes = agent?["heartbeatMinutes"] as? Int let heartbeatBody = agent?["heartbeatBody"] as? String let browser = parsed["browser"] as? [String: Any] + let talk = parsed["talk"] as? [String: Any] let loadedModel = (agent?["model"] as? String) ?? "" if !loadedModel.isEmpty { @@ -303,6 +395,28 @@ struct ConfigSettings: View { if let color = browser["color"] as? String, !color.isEmpty { self.browserColorHex = color } if let attachOnly = browser["attachOnly"] as? Bool { self.browserAttachOnly = attachOnly } } + + if let talk { + if let voice = talk["voiceId"] as? String { self.talkVoiceId = voice } + if let apiKey = talk["apiKey"] as? String { self.talkApiKey = apiKey } + if let interrupt = talk["interruptOnSpeech"] as? Bool { + self.talkInterruptOnSpeech = interrupt + } + } + } + + private func refreshGatewayTalkApiKey() async { + do { + let snap: ConfigSnapshot = try await GatewayConnection.shared.requestDecoded( + method: .configGet, + params: nil, + timeoutMs: 8000) + let talk = snap.config?["talk"]?.dictionaryValue + let apiKey = talk?["apiKey"]?.stringValue?.trimmingCharacters(in: .whitespacesAndNewlines) + self.gatewayApiKeyFound = !(apiKey ?? "").isEmpty + } catch { + self.gatewayApiKeyFound = false + } } private func autosaveConfig() { @@ -318,6 +432,7 @@ struct ConfigSettings: View { var root = self.loadConfigDict() var agent = root["agent"] as? [String: Any] ?? [:] var browser = root["browser"] as? [String: Any] ?? [:] + var talk = root["talk"] as? [String: Any] ?? [:] let chosenModel = (self.configModel == "__custom__" ? 
self.customModel : self.configModel) .trimmingCharacters(in: .whitespacesAndNewlines) @@ -343,6 +458,21 @@ struct ConfigSettings: View { browser["attachOnly"] = self.browserAttachOnly root["browser"] = browser + let trimmedVoice = self.talkVoiceId.trimmingCharacters(in: .whitespacesAndNewlines) + if trimmedVoice.isEmpty { + talk.removeValue(forKey: "voiceId") + } else { + talk["voiceId"] = trimmedVoice + } + let trimmedApiKey = self.talkApiKey.trimmingCharacters(in: .whitespacesAndNewlines) + if trimmedApiKey.isEmpty { + talk.removeValue(forKey: "apiKey") + } else { + talk["apiKey"] = trimmedApiKey + } + talk["interruptOnSpeech"] = self.talkInterruptOnSpeech + root["talk"] = talk + ClawdisConfigFile.saveDict(root) } @@ -360,6 +490,41 @@ struct ConfigSettings: View { return Color(red: r, green: g, blue: b) } + private var talkVoiceSuggestions: [String] { + let env = ProcessInfo.processInfo.environment + let candidates = [ + self.talkVoiceId, + env["ELEVENLABS_VOICE_ID"] ?? "", + env["SAG_VOICE_ID"] ?? "", + ] + var seen = Set<String>() + return candidates + .map { $0.trimmingCharacters(in: .whitespacesAndNewlines) } + .filter { !$0.isEmpty } + .filter { seen.insert($0).inserted } + } + + private var hasEnvApiKey: Bool { + let raw = ProcessInfo.processInfo.environment["ELEVENLABS_API_KEY"] ??
"" + return !raw.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty + } + + private var apiKeyStatusLabel: String { + if self.hasEnvApiKey { return "ElevenLabs API key: found (environment)" } + if !self.talkApiKey.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty { + return "ElevenLabs API key: stored in config" + } + if self.gatewayApiKeyFound { return "ElevenLabs API key: found (gateway)" } + return "ElevenLabs API key: missing" + } + + private var apiKeyStatusColor: Color { + if self.hasEnvApiKey { return .green } + if !self.talkApiKey.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty { return .green } + if self.gatewayApiKeyFound { return .green } + return .red + } + private var browserPathLabel: String? { guard self.browserEnabled else { return nil } diff --git a/apps/macos/Sources/Clawdis/ConnectionsStore.swift b/apps/macos/Sources/Clawdis/ConnectionsStore.swift index d89468a2b..0ff9f0bb5 100644 --- a/apps/macos/Sources/Clawdis/ConnectionsStore.swift +++ b/apps/macos/Sources/Clawdis/ConnectionsStore.swift @@ -294,6 +294,11 @@ final class ConnectionsStore { : nil self.configRoot = snap.config?.mapValues { $0.foundationValue } ?? [:] self.configLoaded = true + + let ui = snap.config?["ui"]?.dictionaryValue + let rawSeam = ui?["seamColor"]?.stringValue?.trimmingCharacters(in: .whitespacesAndNewlines) ?? "" + AppStateStore.shared.seamColorHex = rawSeam.isEmpty ? nil : rawSeam + let telegram = snap.config?["telegram"]?.dictionaryValue self.telegramToken = telegram?["botToken"]?.stringValue ?? "" self.telegramRequireMention = telegram?["requireMention"]?.boolValue ?? 
true diff --git a/apps/macos/Sources/Clawdis/Constants.swift b/apps/macos/Sources/Clawdis/Constants.swift index 966d1744a..639d12afc 100644 --- a/apps/macos/Sources/Clawdis/Constants.swift +++ b/apps/macos/Sources/Clawdis/Constants.swift @@ -16,6 +16,7 @@ let voiceWakeMicKey = "clawdis.voiceWakeMicID" let voiceWakeLocaleKey = "clawdis.voiceWakeLocaleID" let voiceWakeAdditionalLocalesKey = "clawdis.voiceWakeAdditionalLocaleIDs" let voicePushToTalkEnabledKey = "clawdis.voicePushToTalkEnabled" +let talkEnabledKey = "clawdis.talkEnabled" let iconOverrideKey = "clawdis.iconOverride" let connectionModeKey = "clawdis.connectionMode" let remoteTargetKey = "clawdis.remoteTarget" @@ -31,5 +32,6 @@ let modelCatalogReloadKey = "clawdis.modelCatalogReload" let attachExistingGatewayOnlyKey = "clawdis.gateway.attachExistingOnly" let heartbeatsEnabledKey = "clawdis.heartbeatsEnabled" let debugFileLogEnabledKey = "clawdis.debug.fileLogEnabled" +let appLogLevelKey = "clawdis.debug.appLogLevel" let voiceWakeSupported: Bool = ProcessInfo.processInfo.operatingSystemVersion.majorVersion >= 26 let cliHelperSearchPaths = ["/usr/local/bin", "/opt/homebrew/bin"] diff --git a/apps/macos/Sources/Clawdis/ControlChannel.swift b/apps/macos/Sources/Clawdis/ControlChannel.swift index 5312fa641..51c97a6c3 100644 --- a/apps/macos/Sources/Clawdis/ControlChannel.swift +++ b/apps/macos/Sources/Clawdis/ControlChannel.swift @@ -1,7 +1,6 @@ import ClawdisProtocol import Foundation import Observation -import OSLog import SwiftUI struct ControlHeartbeatEvent: Codable { diff --git a/apps/macos/Sources/Clawdis/DebugSettings.swift b/apps/macos/Sources/Clawdis/DebugSettings.swift index a730d5ef1..037a17855 100644 --- a/apps/macos/Sources/Clawdis/DebugSettings.swift +++ b/apps/macos/Sources/Clawdis/DebugSettings.swift @@ -1,8 +1,10 @@ import AppKit +import Observation import SwiftUI import UniformTypeIdentifiers struct DebugSettings: View { + @Bindable var state: AppState private let isPreview = 
ProcessInfo.processInfo.isPreview private let labelColumnWidth: CGFloat = 140 @AppStorage(modelCatalogPathKey) private var modelCatalogPath: String = ModelCatalogLoader.defaultPath @@ -28,6 +30,7 @@ struct DebugSettings: View { @State private var pendingKill: DebugActions.PortListener? @AppStorage(attachExistingGatewayOnlyKey) private var attachExistingGatewayOnly: Bool = false @AppStorage(debugFileLogEnabledKey) private var diagnosticsFileLogEnabled: Bool = false + @AppStorage(appLogLevelKey) private var appLogLevelRaw: String = AppLogLevel.default.rawValue @State private var canvasSessionKey: String = "main" @State private var canvasStatus: String? @@ -36,6 +39,10 @@ struct DebugSettings: View { @State private var canvasEvalResult: String? @State private var canvasSnapshotPath: String? + init(state: AppState = AppStateStore.shared) { + self.state = state + } + var body: some View { ScrollView(.vertical) { VStack(alignment: .leading, spacing: 14) { @@ -194,7 +201,9 @@ struct DebugSettings: View { .overlay(RoundedRectangle(cornerRadius: 6).stroke(Color.secondary.opacity(0.2))) HStack(spacing: 8) { - Button("Restart Gateway") { DebugActions.restartGateway() } + if self.canRestartGateway { + Button("Restart Gateway") { DebugActions.restartGateway() } + } Button("Clear log") { GatewayProcessManager.shared.clearLog() } Spacer(minLength: 0) } @@ -224,13 +233,23 @@ struct DebugSettings: View { } GridRow { - self.gridLabel("Diagnostics") - VStack(alignment: .leading, spacing: 6) { + self.gridLabel("App logging") + VStack(alignment: .leading, spacing: 8) { + Picker("Verbosity", selection: self.$appLogLevelRaw) { + ForEach(AppLogLevel.allCases) { level in + Text(level.title).tag(level.rawValue) + } + } + .pickerStyle(.menu) + .labelsHidden() + .help("Controls the macOS app log verbosity.") + Toggle("Write rolling diagnostics log (JSONL)", isOn: self.$diagnosticsFileLogEnabled) .toggleStyle(.checkbox) .help( - "Writes a rotating, local-only diagnostics log under 
~/Library/Logs/Clawdis/. " + + "Writes a rotating, local-only log under ~/Library/Logs/Clawdis/. " + "Enable only while actively debugging.") + HStack(spacing: 8) { Button("Open folder") { NSWorkspace.shared.open(DiagnosticsFileLog.logDirectoryURL()) @@ -762,6 +781,10 @@ struct DebugSettings: View { CommandResolver.connectionSettings().mode == .remote } + private var canRestartGateway: Bool { + self.state.connectionMode == .local && !self.attachExistingGatewayOnly + } + private func configURL() -> URL { FileManager.default.homeDirectoryForCurrentUser .appendingPathComponent(".clawdis") @@ -902,7 +925,7 @@ private struct PlainSettingsGroupBoxStyle: GroupBoxStyle { #if DEBUG struct DebugSettings_Previews: PreviewProvider { static var previews: some View { - DebugSettings() + DebugSettings(state: .preview) .frame(width: SettingsTab.windowWidth, height: SettingsTab.windowHeight) } } @@ -910,7 +933,7 @@ struct DebugSettings_Previews: PreviewProvider { @MainActor extension DebugSettings { static func exerciseForTesting() async { - let view = DebugSettings() + let view = DebugSettings(state: .preview) view.modelsCount = 3 view.modelsLoading = false view.modelsError = "Failed to load models" diff --git a/apps/macos/Sources/Clawdis/DeviceModelCatalog.swift b/apps/macos/Sources/Clawdis/DeviceModelCatalog.swift index 3bda2e487..9d9da9987 100644 --- a/apps/macos/Sources/Clawdis/DeviceModelCatalog.swift +++ b/apps/macos/Sources/Clawdis/DeviceModelCatalog.swift @@ -7,6 +7,8 @@ struct DevicePresentation: Sendable { enum DeviceModelCatalog { private static let modelIdentifierToName: [String: String] = loadModelIdentifierToName() + private static let resourceBundle: Bundle? = locateResourceBundle() + private static let resourceSubdirectory = "DeviceModels" static func presentation(deviceFamily: String?, modelIdentifier: String?) -> DevicePresentation? { let family = (deviceFamily ?? 
"").trimmingCharacters(in: .whitespacesAndNewlines) @@ -104,13 +106,11 @@ enum DeviceModelCatalog { } private static func loadMapping(resourceName: String) -> [String: String] { - guard let url = self.resourceURL( - resourceName: resourceName, + guard let url = self.resourceBundle?.url( + forResource: resourceName, withExtension: "json", - subdirectory: "DeviceModels") - else { - return [:] - } + subdirectory: self.resourceSubdirectory) + else { return [:] } do { let data = try Data(contentsOf: url) @@ -121,37 +121,48 @@ enum DeviceModelCatalog { } } - private static func resourceURL( - resourceName: String, - withExtension ext: String, - subdirectory: String - ) -> URL? { - let bundledSubdir = "Clawdis_Clawdis.bundle/\(subdirectory)" - let mainBundle = Bundle.main - - if let url = mainBundle.url(forResource: resourceName, withExtension: ext, subdirectory: bundledSubdir) - ?? mainBundle.url(forResource: resourceName, withExtension: ext, subdirectory: subdirectory) - { - return url + private static func locateResourceBundle() -> Bundle? 
{ + if let bundle = self.bundleIfContainsDeviceModels(Bundle.module) { + return bundle } - let fallbackBases = [ - mainBundle.resourceURL, - mainBundle.bundleURL.appendingPathComponent("Contents/Resources"), - mainBundle.bundleURL.deletingLastPathComponent(), - ].compactMap { $0 } + if let bundle = self.bundleIfContainsDeviceModels(Bundle.main) { + return bundle + } - let fileName = "\(resourceName).\(ext)" - for base in fallbackBases { - let bundled = base.appendingPathComponent(bundledSubdir).appendingPathComponent(fileName) - if FileManager.default.fileExists(atPath: bundled.path) { return bundled } - let loose = base.appendingPathComponent(subdirectory).appendingPathComponent(fileName) - if FileManager.default.fileExists(atPath: loose.path) { return loose } + if let resourceURL = Bundle.main.resourceURL { + if let enumerator = FileManager.default.enumerator( + at: resourceURL, + includingPropertiesForKeys: [.isDirectoryKey], + options: [.skipsHiddenFiles]) { + for case let url as URL in enumerator { + guard url.pathExtension == "bundle" else { continue } + if let bundle = Bundle(url: url), + self.bundleIfContainsDeviceModels(bundle) != nil { + return bundle + } + } + } } return nil } + private static func bundleIfContainsDeviceModels(_ bundle: Bundle) -> Bundle? 
{ + if bundle.url( + forResource: "ios-device-identifiers", + withExtension: "json", + subdirectory: self.resourceSubdirectory) != nil { + return bundle + } + if bundle.url( + forResource: "mac-device-identifiers", + withExtension: "json", + subdirectory: self.resourceSubdirectory) != nil { + return bundle + } + return nil + } private enum NameValue: Decodable { case string(String) case stringArray([String]) diff --git a/apps/macos/Sources/Clawdis/DockIconManager.swift b/apps/macos/Sources/Clawdis/DockIconManager.swift index 1e3a98f43..d0bee9543 100644 --- a/apps/macos/Sources/Clawdis/DockIconManager.swift +++ b/apps/macos/Sources/Clawdis/DockIconManager.swift @@ -1,5 +1,4 @@ import AppKit -import OSLog /// Central manager for Dock icon visibility. /// Shows the Dock icon while any windows are visible, regardless of user preference. diff --git a/apps/macos/Sources/Clawdis/GatewayConnection.swift b/apps/macos/Sources/Clawdis/GatewayConnection.swift index f255f69b9..cd87eea2c 100644 --- a/apps/macos/Sources/Clawdis/GatewayConnection.swift +++ b/apps/macos/Sources/Clawdis/GatewayConnection.swift @@ -51,6 +51,7 @@ actor GatewayConnection { case providersStatus = "providers.status" case configGet = "config.get" case configSet = "config.set" + case talkMode = "talk.mode" case webLoginStart = "web.login.start" case webLoginWait = "web.login.wait" case webLogout = "web.logout" @@ -472,7 +473,10 @@ extension GatewayConnection { params["attachments"] = AnyCodable(encoded) } - return try await self.requestDecoded(method: .chatSend, params: params) + return try await self.requestDecoded( + method: .chatSend, + params: params, + timeoutMs: Double(timeoutMs)) } func chatAbort(sessionKey: String, runId: String) async throws -> Bool { @@ -483,6 +487,12 @@ extension GatewayConnection { return res.aborted ?? false } + func talkMode(enabled: Bool, phase: String? 
= nil) async { + var params: [String: AnyCodable] = ["enabled": AnyCodable(enabled)] + if let phase { params["phase"] = AnyCodable(phase) } + try? await self.requestVoid(method: .talkMode, params: params) + } + // MARK: - VoiceWake func voiceWakeGetTriggers() async throws -> [String] { diff --git a/apps/macos/Sources/Clawdis/GatewayLaunchAgentManager.swift b/apps/macos/Sources/Clawdis/GatewayLaunchAgentManager.swift index 60b4afb0f..70de79ecd 100644 --- a/apps/macos/Sources/Clawdis/GatewayLaunchAgentManager.swift +++ b/apps/macos/Sources/Clawdis/GatewayLaunchAgentManager.swift @@ -1,6 +1,7 @@ import Foundation enum GatewayLaunchAgentManager { + private static let logger = Logger(subsystem: "com.steipete.clawdis", category: "gateway.launchd") private static let supportedBindModes: Set<String> = ["loopback", "tailnet", "lan", "auto"] private static var plistURL: URL { @@ -26,12 +27,16 @@ enum GatewayLaunchAgentManager { if enabled { let gatewayBin = self.gatewayExecutablePath(bundlePath: bundlePath) guard FileManager.default.isExecutableFile(atPath: gatewayBin) else { + self.logger.error("launchd enable failed: gateway missing at \(gatewayBin)") return "Embedded gateway missing in bundle; rebuild via scripts/package-mac-app.sh" } + self.logger.info("launchd enable requested port=\(port)") self.writePlist(bundlePath: bundlePath, port: port) _ = await self.runLaunchctl(["bootout", "gui/\(getuid())/\(gatewayLaunchdLabel)"]) let bootstrap = await self.runLaunchctl(["bootstrap", "gui/\(getuid())", self.plistURL.path]) if bootstrap.status != 0 { + let msg = bootstrap.output.trimmingCharacters(in: .whitespacesAndNewlines) + self.logger.error("launchd bootstrap failed: \(msg)") return bootstrap.output.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty ?
"Failed to bootstrap gateway launchd job" : bootstrap.output.trimmingCharacters(in: .whitespacesAndNewlines) @@ -42,6 +47,7 @@ enum GatewayLaunchAgentManager { return nil } + self.logger.info("launchd disable requested") _ = await self.runLaunchctl(["bootout", "gui/\(getuid())/\(gatewayLaunchdLabel)"]) try? FileManager.default.removeItem(at: self.plistURL) return nil @@ -103,7 +109,11 @@ enum GatewayLaunchAgentManager { """ - try? plist.write(to: self.plistURL, atomically: true, encoding: .utf8) + do { + try plist.write(to: self.plistURL, atomically: true, encoding: .utf8) + } catch { + self.logger.error("launchd plist write failed: \(error.localizedDescription)") + } } private static func preferredGatewayBind() -> String? { diff --git a/apps/macos/Sources/Clawdis/GatewayProcessManager.swift b/apps/macos/Sources/Clawdis/GatewayProcessManager.swift index 0b4a87c9f..cca6cf79b 100644 --- a/apps/macos/Sources/Clawdis/GatewayProcessManager.swift +++ b/apps/macos/Sources/Clawdis/GatewayProcessManager.swift @@ -42,6 +42,7 @@ final class GatewayProcessManager { private var environmentRefreshTask: Task? private var lastEnvironmentRefresh: Date? private var logRefreshTask: Task? + private let logger = Logger(subsystem: "com.steipete.clawdis", category: "gateway.process") private let logLimit = 20000 // characters to keep in-memory private let environmentRefreshMinInterval: TimeInterval = 30 @@ -53,8 +54,10 @@ final class GatewayProcessManager { self.stop() self.status = .stopped self.appendLog("[gateway] remote mode active; skipping local gateway\n") + self.logger.info("gateway process skipped: remote mode active") return } + self.logger.debug("gateway active requested active=\(active)") self.desiredActive = active self.refreshEnvironmentStatus() if active { @@ -86,6 +89,7 @@ final class GatewayProcessManager { return } self.status = .starting + self.logger.debug("gateway start requested") // First try to latch onto an already-running gateway to avoid spawning a duplicate. 
Task { [weak self] in @@ -98,6 +102,7 @@ final class GatewayProcessManager { await MainActor.run { self.status = .failed("Attach-only enabled; no gateway to attach") self.appendLog("[gateway] attach-only enabled; not spawning local gateway\n") + self.logger.warning("gateway attach-only enabled; not spawning") } return } @@ -110,6 +115,7 @@ final class GatewayProcessManager { self.existingGatewayDetails = nil self.lastFailureReason = nil self.status = .stopped + self.logger.info("gateway stop requested") let bundlePath = Bundle.main.bundleURL.path Task { _ = await GatewayLaunchAgentManager.set( @@ -182,6 +188,7 @@ final class GatewayProcessManager { self.existingGatewayDetails = details self.status = .attachedExisting(details: details) self.appendLog("[gateway] using existing instance: \(details)\n") + self.logger.info("gateway using existing instance details=\(details)") self.refreshControlChannelIfNeeded(reason: "attach existing") self.refreshLog() return true @@ -197,6 +204,7 @@ final class GatewayProcessManager { self.status = .failed(reason) self.lastFailureReason = reason self.appendLog("[gateway] existing listener on port \(port) but attach failed: \(reason)\n") + self.logger.warning("gateway attach failed reason=\(reason)") return true } @@ -268,16 +276,19 @@ final class GatewayProcessManager { await MainActor.run { self.status = .failed(resolution.status.message) } + self.logger.error("gateway command resolve failed: \(resolution.status.message)") return } let bundlePath = Bundle.main.bundleURL.path let port = GatewayEnvironment.gatewayPort() self.appendLog("[gateway] enabling launchd job (\(gatewayLaunchdLabel)) on port \(port)\n") + self.logger.info("gateway enabling launchd port=\(port)") let err = await GatewayLaunchAgentManager.set(enabled: true, bundlePath: bundlePath, port: port) if let err { self.status = .failed(err) self.lastFailureReason = err + self.logger.error("gateway launchd enable failed: \(err)") return } @@ -290,6 +301,7 @@ final class 
GatewayProcessManager { let instance = await PortGuardian.shared.describe(port: port) let details = instance.map { "pid \($0.pid)" } self.status = .running(details: details) + self.logger.info("gateway started details=\(details ?? "ok")") self.refreshControlChannelIfNeeded(reason: "gateway started") self.refreshLog() return @@ -300,6 +312,7 @@ final class GatewayProcessManager { self.status = .failed("Gateway did not start in time") self.lastFailureReason = "launchd start timeout" + self.logger.warning("gateway start timed out") } private func appendLog(_ chunk: String) { @@ -317,6 +330,7 @@ final class GatewayProcessManager { break } self.appendLog("[gateway] refreshing control channel (\(reason))\n") + self.logger.debug("gateway control channel refresh reason=\(reason)") Task { await ControlChannel.shared.configure() } } @@ -332,12 +346,14 @@ final class GatewayProcessManager { } } self.appendLog("[gateway] readiness wait timed out\n") + self.logger.warning("gateway readiness wait timed out") return false } func clearLog() { self.log = "" try? 
FileManager.default.removeItem(atPath: LogLocator.launchdGatewayLogPath) + self.logger.debug("gateway log cleared") } func setProjectRoot(path: String) { diff --git a/apps/macos/Sources/Clawdis/HealthStore.swift b/apps/macos/Sources/Clawdis/HealthStore.swift index cfd2ee014..a5ae9cbe4 100644 --- a/apps/macos/Sources/Clawdis/HealthStore.swift +++ b/apps/macos/Sources/Clawdis/HealthStore.swift @@ -1,7 +1,6 @@ import Foundation import Network import Observation -import OSLog import SwiftUI struct HealthSnapshot: Codable, Sendable { diff --git a/apps/macos/Sources/Clawdis/Logging/ClawdisLogging.swift b/apps/macos/Sources/Clawdis/Logging/ClawdisLogging.swift new file mode 100644 index 000000000..b759828c1 --- /dev/null +++ b/apps/macos/Sources/Clawdis/Logging/ClawdisLogging.swift @@ -0,0 +1,229 @@ +@_exported import Logging +import Foundation +import OSLog + +typealias Logger = Logging.Logger + +enum AppLogSettings { + static let logLevelKey = appLogLevelKey + + static func logLevel() -> Logger.Level { + if let raw = UserDefaults.standard.string(forKey: self.logLevelKey), + let level = Logger.Level(rawValue: raw) + { + return level + } + return .info + } + + static func setLogLevel(_ level: Logger.Level) { + UserDefaults.standard.set(level.rawValue, forKey: self.logLevelKey) + } + + static func fileLoggingEnabled() -> Bool { + UserDefaults.standard.bool(forKey: debugFileLogEnabledKey) + } +} + +enum AppLogLevel: String, CaseIterable, Identifiable { + case trace + case debug + case info + case notice + case warning + case error + case critical + + static let `default`: AppLogLevel = .info + + var id: String { self.rawValue } + + var title: String { + switch self { + case .trace: "Trace" + case .debug: "Debug" + case .info: "Info" + case .notice: "Notice" + case .warning: "Warning" + case .error: "Error" + case .critical: "Critical" + } + } +} + +enum ClawdisLogging { + private static let labelSeparator = "::" + + private static let didBootstrap: Void = { + 
LoggingSystem.bootstrap { label in + let (subsystem, category) = Self.parseLabel(label) + let osHandler = ClawdisOSLogHandler(subsystem: subsystem, category: category) + let fileHandler = ClawdisFileLogHandler(label: label) + return MultiplexLogHandler([osHandler, fileHandler]) + } + }() + + static func bootstrapIfNeeded() { + _ = Self.didBootstrap + } + + static func makeLabel(subsystem: String, category: String) -> String { + "\(subsystem)\(Self.labelSeparator)\(category)" + } + + static func parseLabel(_ label: String) -> (String, String) { + guard let range = label.range(of: Self.labelSeparator) else { + return ("com.steipete.clawdis", label) + } + let subsystem = String(label[..<range.lowerBound]) + let category = String(label[range.upperBound...]) + return (subsystem, category) + } +} + +extension DefaultStringInterpolation { + mutating func appendInterpolation<T>(_ value: T, privacy: OSLogPrivacy) { + self.appendInterpolation(String(describing: value)) + } +} + +struct ClawdisOSLogHandler: LogHandler { + private let osLogger: OSLog.Logger + var metadata: Logger.Metadata = [:] + + var logLevel: Logger.Level { + get { AppLogSettings.logLevel() } + set { AppLogSettings.setLogLevel(newValue) } + } + + init(subsystem: String, category: String) { + self.osLogger = OSLog.Logger(subsystem: subsystem, category: category) + } + + subscript(metadataKey key: String) -> Logger.Metadata.Value?
{ + get { self.metadata[key] } + set { self.metadata[key] = newValue } + } + + func log( + level: Logger.Level, + message: Logger.Message, + metadata: Logger.Metadata?, + source: String, + file: String, + function: String, + line: UInt) + { + let merged = Self.mergeMetadata(self.metadata, metadata) + let rendered = Self.renderMessage(message, metadata: merged) + self.osLogger.log(level: Self.osLogType(for: level), "\(rendered, privacy: .public)") + } + + private static func osLogType(for level: Logger.Level) -> OSLogType { + switch level { + case .trace, .debug: + return .debug + case .info, .notice: + return .info + case .warning: + return .default + case .error: + return .error + case .critical: + return .fault + } + } + + private static func mergeMetadata( + _ base: Logger.Metadata, + _ extra: Logger.Metadata?) -> Logger.Metadata + { + guard let extra else { return base } + return base.merging(extra, uniquingKeysWith: { _, new in new }) + } + + private static func renderMessage(_ message: Logger.Message, metadata: Logger.Metadata) -> String { + guard !metadata.isEmpty else { return message.description } + let meta = metadata + .sorted(by: { $0.key < $1.key }) + .map { "\($0.key)=\(stringify($0.value))" } + .joined(separator: " ") + return "\(message.description) [\(meta)]" + } + + private static func stringify(_ value: Logger.Metadata.Value) -> String { + switch value { + case let .string(text): + text + case let .stringConvertible(value): + String(describing: value) + case let .array(values): + "[" + values.map { stringify($0) }.joined(separator: ",") + "]" + case let .dictionary(entries): + "{" + entries.map { "\($0.key)=\(stringify($0.value))" }.joined(separator: ",") + "}" + } + } +} + +struct ClawdisFileLogHandler: LogHandler { + let label: String + var metadata: Logger.Metadata = [:] + + var logLevel: Logger.Level { + get { AppLogSettings.logLevel() } + set { AppLogSettings.setLogLevel(newValue) } + } + + subscript(metadataKey key: String) -> 
Logger.Metadata.Value? { + get { self.metadata[key] } + set { self.metadata[key] = newValue } + } + + func log( + level: Logger.Level, + message: Logger.Message, + metadata: Logger.Metadata?, + source: String, + file: String, + function: String, + line: UInt) + { + guard AppLogSettings.fileLoggingEnabled() else { return } + let (subsystem, category) = ClawdisLogging.parseLabel(self.label) + var fields: [String: String] = [ + "subsystem": subsystem, + "category": category, + "level": level.rawValue, + "source": source, + "file": file, + "function": function, + "line": "\(line)", + ] + let merged = self.metadata.merging(metadata ?? [:], uniquingKeysWith: { _, new in new }) + for (key, value) in merged { + fields["meta.\(key)"] = Self.stringify(value) + } + DiagnosticsFileLog.shared.log(category: category, event: message.description, fields: fields) + } + + private static func stringify(_ value: Logger.Metadata.Value) -> String { + switch value { + case let .string(text): + text + case let .stringConvertible(value): + String(describing: value) + case let .array(values): + "[" + values.map { stringify($0) }.joined(separator: ",") + "]" + case let .dictionary(entries): + "{" + entries.map { "\($0.key)=\(stringify($0.value))" }.joined(separator: ",") + "}" + } + } +} diff --git a/apps/macos/Sources/Clawdis/MenuBar.swift b/apps/macos/Sources/Clawdis/MenuBar.swift index c998cafea..770982311 100644 --- a/apps/macos/Sources/Clawdis/MenuBar.swift +++ b/apps/macos/Sources/Clawdis/MenuBar.swift @@ -3,7 +3,6 @@ import Darwin import Foundation import MenuBarExtraAccess import Observation -import OSLog import Security import SwiftUI @@ -30,6 +29,7 @@ struct ClawdisApp: App { } init() { + ClawdisLogging.bootstrapIfNeeded() _state = State(initialValue: AppStateStore.shared) } diff --git a/apps/macos/Sources/Clawdis/MenuContentView.swift b/apps/macos/Sources/Clawdis/MenuContentView.swift index 6a5dc1e89..d18aa6dbf 100644 --- a/apps/macos/Sources/Clawdis/MenuContentView.swift +++ 
b/apps/macos/Sources/Clawdis/MenuContentView.swift @@ -14,11 +14,14 @@ struct MenuContent: View { private let heartbeatStore = HeartbeatStore.shared private let controlChannel = ControlChannel.shared private let activityStore = WorkActivityStore.shared + @Bindable private var pairingPrompter = NodePairingApprovalPrompter.shared @Environment(\.openSettings) private var openSettings @State private var availableMics: [AudioInputDevice] = [] @State private var loadingMics = false @State private var browserControlEnabled = true @AppStorage(cameraEnabledKey) private var cameraEnabled: Bool = false + @AppStorage(appLogLevelKey) private var appLogLevelRaw: String = AppLogLevel.default.rawValue + @AppStorage(debugFileLogEnabledKey) private var appFileLoggingEnabled: Bool = false init(state: AppState, updater: UpdaterProviding?) { self._state = Bindable(wrappedValue: state) @@ -32,6 +35,13 @@ struct MenuContent: View { VStack(alignment: .leading, spacing: 2) { Text(self.connectionLabel) self.statusLine(label: self.healthStatus.label, color: self.healthStatus.color) + if self.pairingPrompter.pendingCount > 0 { + let repairCount = self.pairingPrompter.pendingRepairCount + let repairSuffix = repairCount > 0 ? " · \(repairCount) repair" : "" + self.statusLine( + label: "Pairing approval pending (\(self.pairingPrompter.pendingCount))\(repairSuffix)", + color: .orange) + } } } .disabled(self.state.connectionMode == .unconfigured) @@ -102,6 +112,13 @@ struct MenuContent: View { systemImage: "rectangle.inset.filled.on.rectangle") } } + Button { + Task { await self.state.setTalkEnabled(!self.state.talkEnabled) } + } label: { + Label(self.state.talkEnabled ? "Stop Talk Mode" : "Talk Mode", systemImage: "waveform.circle.fill") + } + .disabled(!voiceWakeSupported) + .opacity(voiceWakeSupported ? 
1 : 0.5) Divider() Button("Settings…") { self.open(tab: .general) } .keyboardShortcut(",", modifiers: [.command]) @@ -167,6 +184,20 @@ struct MenuContent: View { : "Verbose Logging (Main): Off", systemImage: "text.alignleft") } + Menu("App Logging") { + Picker("Verbosity", selection: self.$appLogLevelRaw) { + ForEach(AppLogLevel.allCases) { level in + Text(level.title).tag(level.rawValue) + } + } + Toggle(isOn: self.$appFileLoggingEnabled) { + Label( + self.appFileLoggingEnabled + ? "File Logging: On" + : "File Logging: Off", + systemImage: "doc.text.magnifyingglass") + } + } Button { DebugActions.openSessionStore() } label: { @@ -194,10 +225,12 @@ struct MenuContent: View { Label("Send Test Notification", systemImage: "bell") } Divider() - Button { - DebugActions.restartGateway() - } label: { - Label("Restart Gateway", systemImage: "arrow.clockwise") + if self.state.connectionMode == .local, !AppStateStore.attachExistingGatewayOnly { + Button { + DebugActions.restartGateway() + } label: { + Label("Restart Gateway", systemImage: "arrow.clockwise") + } } Button { DebugActions.restartApp() diff --git a/apps/macos/Sources/Clawdis/MenuSessionsInjector.swift b/apps/macos/Sources/Clawdis/MenuSessionsInjector.swift index a66b1a38d..7d8e899a9 100644 --- a/apps/macos/Sources/Clawdis/MenuSessionsInjector.swift +++ b/apps/macos/Sources/Clawdis/MenuSessionsInjector.swift @@ -22,8 +22,7 @@ final class MenuSessionsInjector: NSObject, NSMenuDelegate { private var cachedErrorText: String? private var cacheUpdatedAt: Date? private let refreshIntervalSeconds: TimeInterval = 12 - private let nodesStore = InstancesStore.shared - private let gatewayDiscovery = GatewayDiscoveryModel() + private let nodesStore = NodesStore.shared #if DEBUG private var testControlChannelConnected: Bool? 
#endif @@ -43,7 +42,6 @@ final class MenuSessionsInjector: NSObject, NSMenuDelegate { } self.nodesStore.start() - self.gatewayDiscovery.start() } func menuWillOpen(_ menu: NSMenu) { @@ -218,7 +216,7 @@ final class MenuSessionsInjector: NSObject, NSMenuDelegate { } if entries.isEmpty { - let title = self.nodesStore.isLoading ? "Loading nodes..." : "No nodes yet" + let title = self.nodesStore.isLoading ? "Loading devices..." : "No devices yet" menu.insertItem(self.makeMessageItem(text: title, symbolName: "circle.dashed", width: width), at: cursor) cursor += 1 } else { @@ -231,7 +229,7 @@ final class MenuSessionsInjector: NSObject, NSMenuDelegate { item.view = HighlightedMenuItemHostView( rootView: AnyView(NodeMenuRowView(entry: entry, width: width)), width: width) - item.submenu = self.buildNodeSubmenu(entry: entry) + item.submenu = self.buildNodeSubmenu(entry: entry, width: width) menu.insertItem(item, at: cursor) cursor += 1 } @@ -239,7 +237,7 @@ final class MenuSessionsInjector: NSObject, NSMenuDelegate { if entries.count > 8 { let moreItem = NSMenuItem() moreItem.tag = self.nodesTag - moreItem.title = "More Nodes..." + moreItem.title = "More Devices..." 
moreItem.image = NSImage(systemSymbolName: "ellipsis.circle", accessibilityDescription: nil) let overflow = Array(entries.dropFirst(8)) moreItem.submenu = self.buildNodesOverflowMenu(entries: overflow, width: width) @@ -436,7 +434,7 @@ final class MenuSessionsInjector: NSObject, NSMenuDelegate { return menu } - private func buildNodesOverflowMenu(entries: [InstanceInfo], width: CGFloat) -> NSMenu { + private func buildNodesOverflowMenu(entries: [NodeInfo], width: CGFloat) -> NSMenu { let menu = NSMenu() for entry in entries { let item = NSMenuItem() @@ -446,27 +444,27 @@ final class MenuSessionsInjector: NSObject, NSMenuDelegate { item.view = HighlightedMenuItemHostView( rootView: AnyView(NodeMenuRowView(entry: entry, width: width)), width: width) - item.submenu = self.buildNodeSubmenu(entry: entry) + item.submenu = self.buildNodeSubmenu(entry: entry, width: width) menu.addItem(item) } return menu } - private func buildNodeSubmenu(entry: InstanceInfo) -> NSMenu { + private func buildNodeSubmenu(entry: NodeInfo, width: CGFloat) -> NSMenu { let menu = NSMenu() menu.autoenablesItems = false - menu.addItem(self.makeNodeCopyItem(label: "ID", value: entry.id)) + menu.addItem(self.makeNodeCopyItem(label: "Node ID", value: entry.nodeId)) - if let host = entry.host?.nonEmpty { - menu.addItem(self.makeNodeCopyItem(label: "Host", value: host)) + if let name = entry.displayName?.nonEmpty { + menu.addItem(self.makeNodeCopyItem(label: "Name", value: name)) } - if let ip = entry.ip?.nonEmpty { + if let ip = entry.remoteIp?.nonEmpty { menu.addItem(self.makeNodeCopyItem(label: "IP", value: ip)) } - menu.addItem(self.makeNodeCopyItem(label: "Role", value: NodeMenuEntryFormatter.roleText(entry))) + menu.addItem(self.makeNodeCopyItem(label: "Status", value: NodeMenuEntryFormatter.roleText(entry))) if let platform = NodeMenuEntryFormatter.platformText(entry) { menu.addItem(self.makeNodeCopyItem(label: "Platform", value: platform)) @@ -476,19 +474,20 @@ final class MenuSessionsInjector: 
NSObject, NSMenuDelegate { menu.addItem(self.makeNodeCopyItem(label: "Version", value: self.formatVersionLabel(version))) } - menu.addItem(self.makeNodeDetailItem(label: "Last seen", value: entry.ageDescription)) + menu.addItem(self.makeNodeDetailItem(label: "Connected", value: entry.isConnected ? "Yes" : "No")) + menu.addItem(self.makeNodeDetailItem(label: "Paired", value: entry.isPaired ? "Yes" : "No")) - if entry.lastInputSeconds != nil { - menu.addItem(self.makeNodeDetailItem(label: "Last input", value: entry.lastInputDescription)) + if let caps = entry.caps?.filter({ !$0.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty }), + !caps.isEmpty { + menu.addItem(self.makeNodeCopyItem(label: "Caps", value: caps.joined(separator: ", "))) } - if let reason = entry.reason?.nonEmpty { - menu.addItem(self.makeNodeDetailItem(label: "Reason", value: reason)) - } - - if let sshURL = self.sshURL(for: entry) { - menu.addItem(.separator()) - menu.addItem(self.makeNodeActionItem(title: "Open SSH", url: sshURL)) + if let commands = entry.commands?.filter({ !$0.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty }), + !commands.isEmpty { + menu.addItem(self.makeNodeMultilineItem( + label: "Commands", + value: commands.joined(separator: ", "), + width: width)) } return menu @@ -507,12 +506,17 @@ final class MenuSessionsInjector: NSObject, NSMenuDelegate { return item } - private func makeNodeActionItem(title: String, url: URL) -> NSMenuItem { - let item = NSMenuItem(title: title, action: #selector(self.openNodeSSH(_:)), keyEquivalent: "") + private func makeNodeMultilineItem(label: String, value: String, width: CGFloat) -> NSMenuItem { + let item = NSMenuItem() item.target = self - item.representedObject = url + item.action = #selector(self.copyNodeValue(_:)) + item.representedObject = value + item.view = HighlightedMenuItemHostView( + rootView: AnyView(NodeMenuMultilineView(label: label, value: value, width: width)), + width: width) return item } + private func 
formatVersionLabel(_ version: String) -> String { let trimmed = version.trimmingCharacters(in: .whitespacesAndNewlines) guard !trimmed.isEmpty else { return version } @@ -638,104 +642,6 @@ final class MenuSessionsInjector: NSObject, NSMenuDelegate { NSPasteboard.general.setString(value, forType: .string) } - @objc - private func openNodeSSH(_ sender: NSMenuItem) { - guard let url = sender.representedObject as? URL else { return } - - if let appURL = self.preferredTerminalAppURL() { - NSWorkspace.shared.open( - [url], - withApplicationAt: appURL, - configuration: NSWorkspace.OpenConfiguration(), - completionHandler: nil) - } else { - NSWorkspace.shared.open(url) - } - } - - private func preferredTerminalAppURL() -> URL? { - if let ghosty = self.ghostyAppURL() { return ghosty } - return NSWorkspace.shared.urlForApplication(withBundleIdentifier: "com.apple.Terminal") - } - - private func ghostyAppURL() -> URL? { - let candidates = [ - "/Applications/Ghosty.app", - ("~/Applications/Ghosty.app" as NSString).expandingTildeInPath, - ] - for path in candidates where FileManager.default.fileExists(atPath: path) { - return URL(fileURLWithPath: path) - } - return nil - } - - private func sshURL(for entry: InstanceInfo) -> URL? { - guard NodeMenuEntryFormatter.isGateway(entry) else { return nil } - guard let gateway = self.matchingGateway(for: entry) else { return nil } - guard let host = self.sanitizedTailnetHost(gateway.tailnetDns) ?? gateway.lanHost else { return nil } - let user = NSUserName() - return self.buildSSHURL(user: user, host: host, port: gateway.sshPort) - } - - private func matchingGateway(for entry: InstanceInfo) -> GatewayDiscoveryModel.DiscoveredGateway? 
{ - let candidates = self.entryHostCandidates(entry) - guard !candidates.isEmpty else { return nil } - return self.gatewayDiscovery.gateways.first { gateway in - let gatewayTokens = self.gatewayHostTokens(gateway) - return candidates.contains { gatewayTokens.contains($0) } - } - } - - private func entryHostCandidates(_ entry: InstanceInfo) -> [String] { - let raw: [String?] = [ - entry.host, - entry.ip, - NodeMenuEntryFormatter.primaryName(entry), - ] - return raw.compactMap(self.normalizedHostToken(_:)) - } - - private func gatewayHostTokens(_ gateway: GatewayDiscoveryModel.DiscoveredGateway) -> [String] { - let raw: [String?] = [ - gateway.displayName, - gateway.lanHost, - gateway.tailnetDns, - ] - return raw.compactMap(self.normalizedHostToken(_:)) - } - - private func normalizedHostToken(_ value: String?) -> String? { - guard let value else { return nil } - let trimmed = value.trimmingCharacters(in: .whitespacesAndNewlines) - if trimmed.isEmpty { return nil } - let lower = trimmed.lowercased().trimmingCharacters(in: CharacterSet(charactersIn: ".")) - if lower.hasSuffix(".localdomain") { - return lower.replacingOccurrences(of: ".localdomain", with: ".local") - } - return lower - } - - private func sanitizedTailnetHost(_ host: String?) -> String? { - guard let host else { return nil } - let trimmed = host.trimmingCharacters(in: .whitespacesAndNewlines) - if trimmed.isEmpty { return nil } - if trimmed.hasSuffix(".internal.") || trimmed.hasSuffix(".internal") { - return nil - } - return trimmed - } - - private func buildSSHURL(user: String, host: String, port: Int) -> URL? { - var components = URLComponents() - components.scheme = "ssh" - components.user = user - components.host = host - if port != 22 { - components.port = port - } - return components.url - } - // MARK: - Width + placement private func findInsertIndex(in menu: NSMenu) -> Int? 
{ @@ -790,23 +696,14 @@ final class MenuSessionsInjector: NSObject, NSMenuDelegate { return width } - private func sortedNodeEntries() -> [InstanceInfo] { - let entries = self.nodesStore.instances.filter { entry in - let mode = entry.mode?.trimmingCharacters(in: .whitespacesAndNewlines).lowercased() - return mode != "health" - } + private func sortedNodeEntries() -> [NodeInfo] { + let entries = self.nodesStore.nodes.filter { $0.isConnected } return entries.sorted { lhs, rhs in - let lhsGateway = NodeMenuEntryFormatter.isGateway(lhs) - let rhsGateway = NodeMenuEntryFormatter.isGateway(rhs) - if lhsGateway != rhsGateway { return lhsGateway } - - let lhsLocal = NodeMenuEntryFormatter.isLocal(lhs) - let rhsLocal = NodeMenuEntryFormatter.isLocal(rhs) - if lhsLocal != rhsLocal { return lhsLocal } - + if lhs.isConnected != rhs.isConnected { return lhs.isConnected } + if lhs.isPaired != rhs.isPaired { return lhs.isPaired } let lhsName = NodeMenuEntryFormatter.primaryName(lhs).lowercased() let rhsName = NodeMenuEntryFormatter.primaryName(rhs).lowercased() - if lhsName == rhsName { return lhs.ts > rhs.ts } + if lhsName == rhsName { return lhs.nodeId < rhs.nodeId } return lhsName < rhsName } } diff --git a/apps/macos/Sources/Clawdis/ModelCatalogLoader.swift b/apps/macos/Sources/Clawdis/ModelCatalogLoader.swift index ebd5c6e97..61c234996 100644 --- a/apps/macos/Sources/Clawdis/ModelCatalogLoader.swift +++ b/apps/macos/Sources/Clawdis/ModelCatalogLoader.swift @@ -4,18 +4,23 @@ import JavaScriptCore enum ModelCatalogLoader { static let defaultPath: String = FileManager.default.homeDirectoryForCurrentUser .appendingPathComponent("Projects/pi-mono/packages/ai/src/models.generated.ts").path + private static let logger = Logger(subsystem: "com.steipete.clawdis", category: "models") static func load(from path: String) async throws -> [ModelChoice] { let expanded = (path as NSString).expandingTildeInPath + self.logger.debug("model catalog load start file=\(URL(fileURLWithPath: 
expanded).lastPathComponent)") let source = try String(contentsOfFile: expanded, encoding: .utf8) let sanitized = self.sanitize(source: source) let ctx = JSContext() ctx?.exceptionHandler = { _, exception in - if let exception { print("JS exception: \(exception)") } + if let exception { + self.logger.warning("model catalog JS exception: \(exception)") + } } ctx?.evaluateScript(sanitized) guard let rawModels = ctx?.objectForKeyedSubscript("MODELS")?.toDictionary() as? [String: Any] else { + self.logger.error("model catalog parse failed: MODELS missing") throw NSError( domain: "ModelCatalogLoader", code: 1, @@ -33,12 +38,14 @@ enum ModelCatalogLoader { } } - return choices.sorted { lhs, rhs in + let sorted = choices.sorted { lhs, rhs in if lhs.provider == rhs.provider { return lhs.name.localizedCaseInsensitiveCompare(rhs.name) == .orderedAscending } return lhs.provider.localizedCaseInsensitiveCompare(rhs.provider) == .orderedAscending } + self.logger.debug("model catalog loaded providers=\(rawModels.count) models=\(sorted.count)") + return sorted } private static func sanitize(source: String) -> String { diff --git a/apps/macos/Sources/Clawdis/NodeMode/MacNodeRuntime.swift b/apps/macos/Sources/Clawdis/NodeMode/MacNodeRuntime.swift index 4b6c8bbc8..b46831034 100644 --- a/apps/macos/Sources/Clawdis/NodeMode/MacNodeRuntime.swift +++ b/apps/macos/Sources/Clawdis/NodeMode/MacNodeRuntime.swift @@ -265,7 +265,7 @@ actor MacNodeRuntime { guard let raw = await GatewayConnection.shared.canvasHostUrl() else { return nil } let trimmed = raw.trimmingCharacters(in: .whitespacesAndNewlines) guard !trimmed.isEmpty, let baseUrl = URL(string: trimmed) else { return nil } - return baseUrl.appendingPathComponent("__clawdis__/a2ui/").absoluteString + return baseUrl.appendingPathComponent("__clawdis__/a2ui/").absoluteString + "?platform=macos" } private func isA2UIReady(poll: Bool = false) async -> Bool { diff --git a/apps/macos/Sources/Clawdis/NodePairingApprovalPrompter.swift 
b/apps/macos/Sources/Clawdis/NodePairingApprovalPrompter.swift index 85d8a9f12..932f272f6 100644 --- a/apps/macos/Sources/Clawdis/NodePairingApprovalPrompter.swift +++ b/apps/macos/Sources/Clawdis/NodePairingApprovalPrompter.swift @@ -2,6 +2,7 @@ import AppKit import ClawdisIPC import ClawdisProtocol import Foundation +import Observation import OSLog import UserNotifications @@ -15,6 +16,7 @@ enum NodePairingReconcilePolicy { } @MainActor +@Observable final class NodePairingApprovalPrompter { static let shared = NodePairingApprovalPrompter() @@ -26,6 +28,8 @@ final class NodePairingApprovalPrompter { private var isStopping = false private var isPresenting = false private var queue: [PendingRequest] = [] + var pendingCount: Int = 0 + var pendingRepairCount: Int = 0 private var activeAlert: NSAlert? private var activeRequestId: String? private var alertHostWindow: NSWindow? @@ -104,6 +108,7 @@ final class NodePairingApprovalPrompter { self.reconcileOnceTask?.cancel() self.reconcileOnceTask = nil self.queue.removeAll(keepingCapacity: false) + self.updatePendingCounts() self.isPresenting = false self.activeRequestId = nil self.alertHostWindow?.orderOut(nil) @@ -292,6 +297,7 @@ final class NodePairingApprovalPrompter { private func enqueue(_ req: PendingRequest) { if self.queue.contains(req) { return } self.queue.append(req) + self.updatePendingCounts() self.presentNextIfNeeded() self.updateReconcileLoop() } @@ -362,6 +368,7 @@ final class NodePairingApprovalPrompter { } else { self.queue.removeAll { $0 == request } } + self.updatePendingCounts() self.isPresenting = false self.presentNextIfNeeded() self.updateReconcileLoop() @@ -501,6 +508,8 @@ final class NodePairingApprovalPrompter { } else { self.queue.removeAll { $0 == req } } + + self.updatePendingCounts() self.isPresenting = false self.presentNextIfNeeded() self.updateReconcileLoop() @@ -599,6 +608,12 @@ final class NodePairingApprovalPrompter { } } + private func updatePendingCounts() { + // Keep a cheap 
observable summary for the menu bar status line. + self.pendingCount = self.queue.count + self.pendingRepairCount = self.queue.filter { $0.isRepair == true }.count + } + private func reconcileOnce(timeoutMs: Double) async { if self.isStopping { return } if self.reconcileInFlight { return } @@ -643,6 +658,7 @@ final class NodePairingApprovalPrompter { return } self.queue.removeAll { $0.requestId == resolved.requestId } + self.updatePendingCounts() Task { @MainActor in await self.notify(resolution: resolution, request: request, via: "remote") } diff --git a/apps/macos/Sources/Clawdis/NodesMenu.swift b/apps/macos/Sources/Clawdis/NodesMenu.swift index ec068ad8b..882b7ec3e 100644 --- a/apps/macos/Sources/Clawdis/NodesMenu.swift +++ b/apps/macos/Sources/Clawdis/NodesMenu.swift @@ -2,51 +2,53 @@ import AppKit import SwiftUI struct NodeMenuEntryFormatter { - static func isGateway(_ entry: InstanceInfo) -> Bool { - entry.mode?.trimmingCharacters(in: .whitespacesAndNewlines).lowercased() == "gateway" + static func isConnected(_ entry: NodeInfo) -> Bool { + entry.isConnected } - static func isLocal(_ entry: InstanceInfo) -> Bool { - entry.mode?.trimmingCharacters(in: .whitespacesAndNewlines).lowercased() == "local" + static func primaryName(_ entry: NodeInfo) -> String { + entry.displayName?.nonEmpty ?? entry.nodeId } - static func primaryName(_ entry: InstanceInfo) -> String { - if self.isGateway(entry) { - let host = entry.host?.nonEmpty - if let host, host.lowercased() != "gateway" { return host } - return "Gateway" + static func summaryText(_ entry: NodeInfo) -> String { + let name = self.primaryName(entry) + var prefix = "Node: \(name)" + if let ip = entry.remoteIp?.nonEmpty { + prefix += " (\(ip))" + } + var parts = [prefix] + if let platform = self.platformText(entry) { + parts.append("platform \(platform)") } - return entry.host?.nonEmpty ?? entry.id - } - - static func summaryText(_ entry: InstanceInfo) -> String { - entry.text.nonEmpty ?? 
self.primaryName(entry) - } - - static func roleText(_ entry: InstanceInfo) -> String { - if self.isGateway(entry) { return "gateway" } - if let mode = entry.mode?.nonEmpty { return mode } - return "node" - } - - static func detailLeft(_ entry: InstanceInfo) -> String { - let role = self.roleText(entry) - if let ip = entry.ip?.nonEmpty { return "\(ip) · \(role)" } - return role - } - - static func detailRight(_ entry: InstanceInfo) -> String? { - var parts: [String] = [] - if let platform = self.platformText(entry) { parts.append(platform) } if let version = entry.version?.nonEmpty { - let short = self.compactVersion(version) - parts.append("v\(short)") + parts.append("app \(self.compactVersion(version))") } - if parts.isEmpty { return nil } + parts.append("status \(self.roleText(entry))") return parts.joined(separator: " · ") } - static func platformText(_ entry: InstanceInfo) -> String? { + static func roleText(_ entry: NodeInfo) -> String { + if entry.isConnected { return "connected" } + if entry.isPaired { return "paired" } + return "unpaired" + } + + static func detailLeft(_ entry: NodeInfo) -> String { + let role = self.roleText(entry) + if let ip = entry.remoteIp?.nonEmpty { return "\(ip) · \(role)" } + return role + } + + static func headlineRight(_ entry: NodeInfo) -> String? { + self.platformText(entry) + } + + static func detailRightVersion(_ entry: NodeInfo) -> String? { + guard let version = entry.version?.nonEmpty else { return nil } + return self.shortVersionLabel(version) + } + + static func platformText(_ entry: NodeInfo) -> String? { if let raw = entry.platform?.nonEmpty { return self.prettyPlatform(raw) ?? 
raw } @@ -99,8 +101,17 @@ struct NodeMenuEntryFormatter { return trimmed } - static func leadingSymbol(_ entry: InstanceInfo) -> String { - if self.isGateway(entry) { return self.safeSystemSymbol("dot.radiowaves.left.and.right", fallback: "network") } + private static func shortVersionLabel(_ raw: String) -> String { + let compact = self.compactVersion(raw) + if compact.isEmpty { return compact } + if compact.lowercased().hasPrefix("v") { return compact } + if let first = compact.unicodeScalars.first, CharacterSet.decimalDigits.contains(first) { + return "v\(compact)" + } + return compact + } + + static func leadingSymbol(_ entry: NodeInfo) -> String { if let family = entry.deviceFamily?.lowercased() { if family.contains("mac") { return self.safeSystemSymbol("laptopcomputer", fallback: "laptopcomputer") @@ -116,9 +127,11 @@ struct NodeMenuEntryFormatter { return "cpu" } - static func isAndroid(_ entry: InstanceInfo) -> Bool { + static func isAndroid(_ entry: NodeInfo) -> Bool { let family = entry.deviceFamily?.trimmingCharacters(in: .whitespacesAndNewlines).lowercased() - return family == "android" + if family == "android" { return true } + let platform = entry.platform?.trimmingCharacters(in: .whitespacesAndNewlines).lowercased() + return platform?.contains("android") == true } private static func safeSystemSymbol(_ preferred: String, fallback: String) -> String { @@ -128,7 +141,7 @@ struct NodeMenuEntryFormatter { } struct NodeMenuRowView: View { - let entry: InstanceInfo + let entry: NodeInfo let width: CGFloat @Environment(\.menuItemHighlighted) private var isHighlighted @@ -146,11 +159,32 @@ struct NodeMenuRowView: View { .frame(width: 22, height: 22, alignment: .center) VStack(alignment: .leading, spacing: 2) { - Text(NodeMenuEntryFormatter.primaryName(self.entry)) - .font(.callout.weight(NodeMenuEntryFormatter.isGateway(self.entry) ? 
.semibold : .regular)) - .foregroundStyle(self.primaryColor) - .lineLimit(1) - .truncationMode(.middle) + HStack(alignment: .firstTextBaseline, spacing: 8) { + Text(NodeMenuEntryFormatter.primaryName(self.entry)) + .font(.callout.weight(NodeMenuEntryFormatter.isConnected(self.entry) ? .semibold : .regular)) + .foregroundStyle(self.primaryColor) + .lineLimit(1) + .truncationMode(.middle) + .layoutPriority(1) + + Spacer(minLength: 8) + + HStack(alignment: .firstTextBaseline, spacing: 6) { + if let right = NodeMenuEntryFormatter.headlineRight(self.entry) { + Text(right) + .font(.caption.monospacedDigit()) + .foregroundStyle(self.secondaryColor) + .lineLimit(1) + .truncationMode(.middle) + .layoutPriority(2) + } + + Image(systemName: "chevron.right") + .font(.caption.weight(.semibold)) + .foregroundStyle(self.secondaryColor) + .padding(.leading, 2) + } + } HStack(alignment: .firstTextBaseline, spacing: 8) { Text(NodeMenuEntryFormatter.detailLeft(self.entry)) @@ -161,21 +195,15 @@ struct NodeMenuRowView: View { Spacer(minLength: 0) - HStack(alignment: .firstTextBaseline, spacing: 6) { - if let right = NodeMenuEntryFormatter.detailRight(self.entry) { - Text(right) - .font(.caption.monospacedDigit()) - .foregroundStyle(self.secondaryColor) - .lineLimit(1) - .truncationMode(.middle) - } - - Image(systemName: "chevron.right") - .font(.caption.weight(.semibold)) + if let version = NodeMenuEntryFormatter.detailRightVersion(self.entry) { + Text(version) + .font(.caption.monospacedDigit()) .foregroundStyle(self.secondaryColor) - .padding(.leading, 2) + .lineLimit(1) + .truncationMode(.middle) } } + .frame(maxWidth: .infinity, alignment: .leading) } .frame(maxWidth: .infinity, alignment: .leading) @@ -215,3 +243,36 @@ struct AndroidMark: View { } } } + +struct NodeMenuMultilineView: View { + let label: String + let value: String + let width: CGFloat + @Environment(\.menuItemHighlighted) private var isHighlighted + + private var primaryColor: Color { + self.isHighlighted ? 
Color(nsColor: .selectedMenuItemTextColor) : .primary + } + + private var secondaryColor: Color { + self.isHighlighted ? Color(nsColor: .selectedMenuItemTextColor).opacity(0.85) : .secondary + } + + var body: some View { + VStack(alignment: .leading, spacing: 4) { + Text("\(self.label):") + .font(.caption.weight(.semibold)) + .foregroundStyle(self.secondaryColor) + + Text(self.value) + .font(.caption) + .foregroundStyle(self.primaryColor) + .multilineTextAlignment(.leading) + .fixedSize(horizontal: false, vertical: true) + } + .padding(.vertical, 6) + .padding(.leading, 18) + .padding(.trailing, 12) + .frame(width: max(1, self.width), alignment: .leading) + } +} diff --git a/apps/macos/Sources/Clawdis/NodesStore.swift b/apps/macos/Sources/Clawdis/NodesStore.swift new file mode 100644 index 000000000..2c00e15f7 --- /dev/null +++ b/apps/macos/Sources/Clawdis/NodesStore.swift @@ -0,0 +1,84 @@ +import Foundation +import Observation +import OSLog + +struct NodeInfo: Identifiable, Codable { + let nodeId: String + let displayName: String? + let platform: String? + let version: String? + let deviceFamily: String? + let modelIdentifier: String? + let remoteIp: String? + let caps: [String]? + let commands: [String]? + let permissions: [String: Bool]? + let paired: Bool? + let connected: Bool? + + var id: String { self.nodeId } + var isConnected: Bool { self.connected ?? false } + var isPaired: Bool { self.paired ?? false } +} + +private struct NodeListResponse: Codable { + let ts: Double? + let nodes: [NodeInfo] +} + +@MainActor +@Observable +final class NodesStore { + static let shared = NodesStore() + + var nodes: [NodeInfo] = [] + var lastError: String? + var statusMessage: String? + var isLoading = false + + private let logger = Logger(subsystem: "com.steipete.clawdis", category: "nodes") + private var task: Task<Void, Never>?
+ private let interval: TimeInterval = 30 + private var startCount = 0 + + func start() { + self.startCount += 1 + guard self.startCount == 1 else { return } + guard self.task == nil else { return } + self.task = Task.detached { [weak self] in + guard let self else { return } + await self.refresh() + while !Task.isCancelled { + try? await Task.sleep(nanoseconds: UInt64(self.interval * 1_000_000_000)) + await self.refresh() + } + } + } + + func stop() { + guard self.startCount > 0 else { return } + self.startCount -= 1 + guard self.startCount == 0 else { return } + self.task?.cancel() + self.task = nil + } + + func refresh() async { + if self.isLoading { return } + self.statusMessage = nil + self.isLoading = true + defer { self.isLoading = false } + do { + let data = try await GatewayConnection.shared.requestRaw(method: "node.list", params: nil, timeoutMs: 8000) + let decoded = try JSONDecoder().decode(NodeListResponse.self, from: data) + self.nodes = decoded.nodes + self.lastError = nil + self.statusMessage = nil + } catch { + self.logger.error("node.list failed \(error.localizedDescription, privacy: .public)") + self.nodes = [] + self.lastError = error.localizedDescription + self.statusMessage = nil + } + } +} diff --git a/apps/macos/Sources/Clawdis/NotificationManager.swift b/apps/macos/Sources/Clawdis/NotificationManager.swift index 4650d3afe..e4095ab2e 100644 --- a/apps/macos/Sources/Clawdis/NotificationManager.swift +++ b/apps/macos/Sources/Clawdis/NotificationManager.swift @@ -5,6 +5,8 @@ import UserNotifications @MainActor struct NotificationManager { + private let logger = Logger(subsystem: "com.steipete.clawdis", category: "notifications") + private static let hasTimeSensitiveEntitlement: Bool = { guard let task = SecTaskCreateFromSelf(nil) else { return false } let key = "com.apple.developer.usernotifications.time-sensitive" as CFString @@ -17,8 +19,12 @@ struct NotificationManager { let status = await center.notificationSettings() if 
status.authorizationStatus == .notDetermined { let granted = try? await center.requestAuthorization(options: [.alert, .sound, .badge]) - if granted != true { return false } + if granted != true { + self.logger.warning("notification permission denied (request)") + return false + } } else if status.authorizationStatus != .authorized { + self.logger.warning("notification permission denied status=\(status.authorizationStatus.rawValue)") return false } @@ -37,15 +43,22 @@ struct NotificationManager { case .active: content.interruptionLevel = .active case .timeSensitive: - content.interruptionLevel = Self.hasTimeSensitiveEntitlement ? .timeSensitive : .active + if Self.hasTimeSensitiveEntitlement { + content.interruptionLevel = .timeSensitive + } else { + self.logger.debug("time-sensitive notification requested without entitlement; falling back to active") + content.interruptionLevel = .active + } } } let req = UNNotificationRequest(identifier: UUID().uuidString, content: content, trigger: nil) do { try await center.add(req) + self.logger.debug("notification queued") return true } catch { + self.logger.error("notification send failed: \(error.localizedDescription)") return false } } diff --git a/apps/macos/Sources/Clawdis/PermissionManager.swift b/apps/macos/Sources/Clawdis/PermissionManager.swift index d7d1ab4e0..e6de4e58f 100644 --- a/apps/macos/Sources/Clawdis/PermissionManager.swift +++ b/apps/macos/Sources/Clawdis/PermissionManager.swift @@ -5,7 +5,6 @@ import ClawdisIPC import CoreGraphics import Foundation import Observation -import OSLog import Speech import UserNotifications diff --git a/apps/macos/Sources/Clawdis/SettingsRootView.swift b/apps/macos/Sources/Clawdis/SettingsRootView.swift index 3f6a4b8d5..a7f88c0de 100644 --- a/apps/macos/Sources/Clawdis/SettingsRootView.swift +++ b/apps/macos/Sources/Clawdis/SettingsRootView.swift @@ -21,7 +21,6 @@ struct SettingsRootView: View { if self.isNixMode { self.nixManagedBanner } - TabView(selection: self.$selectedTab) 
{ GeneralSettings(state: self.state) .tabItem { Label("General", systemImage: "gearshape") } @@ -63,7 +62,7 @@ struct SettingsRootView: View { .tag(SettingsTab.permissions) if self.state.debugPaneEnabled { - DebugSettings() + DebugSettings(state: self.state) .tabItem { Label("Debug", systemImage: "ant") } .tag(SettingsTab.debug) } diff --git a/apps/macos/Sources/Clawdis/TalkAudioPlayer.swift b/apps/macos/Sources/Clawdis/TalkAudioPlayer.swift new file mode 100644 index 000000000..6a61b5881 --- /dev/null +++ b/apps/macos/Sources/Clawdis/TalkAudioPlayer.swift @@ -0,0 +1,158 @@ +import AVFoundation +import Foundation +import OSLog + +@MainActor +final class TalkAudioPlayer: NSObject, @preconcurrency AVAudioPlayerDelegate { + static let shared = TalkAudioPlayer() + + private let logger = Logger(subsystem: "com.steipete.clawdis", category: "talk.tts") + private var player: AVAudioPlayer? + private var playback: Playback? + + private final class Playback: @unchecked Sendable { + private let lock = NSLock() + private var finished = false + private var continuation: CheckedContinuation<TalkPlaybackResult, Never>? + private var watchdog: Task<Void, Never>? + + func setContinuation(_ continuation: CheckedContinuation<TalkPlaybackResult, Never>) { + self.lock.lock() + defer { self.lock.unlock() } + self.continuation = continuation + } + + func setWatchdog(_ task: Task<Void, Never>?) { + self.lock.lock() + let old = self.watchdog + self.watchdog = task + self.lock.unlock() + old?.cancel() + } + + func cancelWatchdog() { + self.setWatchdog(nil) + } + + func finish(_ result: TalkPlaybackResult) { + let continuation: CheckedContinuation<TalkPlaybackResult, Never>?
+ self.lock.lock() + if self.finished { + continuation = nil + } else { + self.finished = true + continuation = self.continuation + self.continuation = nil + } + self.lock.unlock() + continuation?.resume(returning: result) + } + } + + func play(data: Data) async -> TalkPlaybackResult { + self.stopInternal() + + let playback = Playback() + self.playback = playback + + return await withCheckedContinuation { continuation in + playback.setContinuation(continuation) + do { + let player = try AVAudioPlayer(data: data) + self.player = player + + player.delegate = self + player.prepareToPlay() + + self.armWatchdog(playback: playback) + + let ok = player.play() + if !ok { + self.logger.error("talk audio player refused to play") + self.finish(playback: playback, result: TalkPlaybackResult(finished: false, interruptedAt: nil)) + } + } catch { + self.logger.error("talk audio player failed: \(error.localizedDescription, privacy: .public)") + self.finish(playback: playback, result: TalkPlaybackResult(finished: false, interruptedAt: nil)) + } + } + } + + func stop() -> Double? { + guard let player else { return nil } + let time = player.currentTime + self.stopInternal(interruptedAt: time) + return time + } + + func audioPlayerDidFinishPlaying(_: AVAudioPlayer, successfully flag: Bool) { + self.stopInternal(finished: flag) + } + + private func stopInternal(finished: Bool = false, interruptedAt: Double? 
= nil) { + guard let playback else { return } + let result = TalkPlaybackResult(finished: finished, interruptedAt: interruptedAt) + self.finish(playback: playback, result: result) + } + + private func finish(playback: Playback, result: TalkPlaybackResult) { + playback.cancelWatchdog() + playback.finish(result) + + guard self.playback === playback else { return } + self.playback = nil + self.player?.stop() + self.player = nil + } + + private func stopInternal() { + if let playback = self.playback { + let interruptedAt = self.player?.currentTime + self.finish( + playback: playback, + result: TalkPlaybackResult(finished: false, interruptedAt: interruptedAt)) + return + } + self.player?.stop() + self.player = nil + } + + private func armWatchdog(playback: Playback) { + playback.setWatchdog(Task { @MainActor [weak self] in + guard let self else { return } + + do { + try await Task.sleep(nanoseconds: 650_000_000) + } catch { + return + } + if Task.isCancelled { return } + + guard self.playback === playback else { return } + if self.player?.isPlaying != true { + self.logger.error("talk audio player did not start playing") + self.finish(playback: playback, result: TalkPlaybackResult(finished: false, interruptedAt: nil)) + return + } + + let duration = self.player?.duration ?? 0 + let timeoutSeconds = min(max(2.0, duration + 2.0), 5 * 60.0) + do { + try await Task.sleep(nanoseconds: UInt64(timeoutSeconds * 1_000_000_000)) + } catch { + return + } + if Task.isCancelled { return } + + guard self.playback === playback else { return } + guard self.player?.isPlaying == true else { return } + self.logger.error("talk audio player watchdog fired") + self.finish(playback: playback, result: TalkPlaybackResult(finished: false, interruptedAt: nil)) + }) + } +} + +struct TalkPlaybackResult: Sendable { + let finished: Bool + let interruptedAt: Double? 
+} diff --git a/apps/macos/Sources/Clawdis/TalkModeController.swift b/apps/macos/Sources/Clawdis/TalkModeController.swift new file mode 100644 index 000000000..c87dd92da --- /dev/null +++ b/apps/macos/Sources/Clawdis/TalkModeController.swift @@ -0,0 +1,62 @@ +import Observation +import OSLog + +@MainActor +@Observable +final class TalkModeController { + static let shared = TalkModeController() + + private let logger = Logger(subsystem: "com.steipete.clawdis", category: "talk.controller") + + private(set) var phase: TalkModePhase = .idle + private(set) var isPaused: Bool = false + + func setEnabled(_ enabled: Bool) async { + self.logger.info("talk enabled=\(enabled)") + if enabled { + TalkOverlayController.shared.present() + } else { + TalkOverlayController.shared.dismiss() + } + await TalkModeRuntime.shared.setEnabled(enabled) + } + + func updatePhase(_ phase: TalkModePhase) { + self.phase = phase + TalkOverlayController.shared.updatePhase(phase) + let effectivePhase = self.isPaused ? "paused" : phase.rawValue + Task { await GatewayConnection.shared.talkMode(enabled: AppStateStore.shared.talkEnabled, phase: effectivePhase) } + } + + func updateLevel(_ level: Double) { + TalkOverlayController.shared.updateLevel(level) + } + + func setPaused(_ paused: Bool) { + guard self.isPaused != paused else { return } + self.logger.info("talk paused=\(paused)") + self.isPaused = paused + TalkOverlayController.shared.updatePaused(paused) + let effectivePhase = paused ? 
"paused" : self.phase.rawValue + Task { await GatewayConnection.shared.talkMode(enabled: AppStateStore.shared.talkEnabled, phase: effectivePhase) } + Task { await TalkModeRuntime.shared.setPaused(paused) } + } + + func togglePaused() { + self.setPaused(!self.isPaused) + } + + func stopSpeaking(reason: TalkStopReason = .userTap) { + Task { await TalkModeRuntime.shared.stopSpeaking(reason: reason) } + } + + func exitTalkMode() { + Task { await AppStateStore.shared.setTalkEnabled(false) } + } +} + +enum TalkStopReason { + case userTap + case speech + case manual +} diff --git a/apps/macos/Sources/Clawdis/TalkModeRuntime.swift b/apps/macos/Sources/Clawdis/TalkModeRuntime.swift new file mode 100644 index 000000000..0289c7c0a --- /dev/null +++ b/apps/macos/Sources/Clawdis/TalkModeRuntime.swift @@ -0,0 +1,890 @@ +import AVFoundation +import ClawdisChatUI +import ClawdisKit +import Foundation +import OSLog +import Speech + +actor TalkModeRuntime { + static let shared = TalkModeRuntime() + + private let logger = Logger(subsystem: "com.steipete.clawdis", category: "talk.runtime") + private let ttsLogger = Logger(subsystem: "com.steipete.clawdis", category: "talk.tts") + private static let defaultModelIdFallback = "eleven_v3" + + private final class RMSMeter: @unchecked Sendable { + private let lock = NSLock() + private var latestRMS: Double = 0 + + func set(_ rms: Double) { + self.lock.lock() + self.latestRMS = rms + self.lock.unlock() + } + + func get() -> Double { + self.lock.lock() + let value = self.latestRMS + self.lock.unlock() + return value + } + } + + private var recognizer: SFSpeechRecognizer? + private var audioEngine: AVAudioEngine? + private var recognitionRequest: SFSpeechAudioBufferRecognitionRequest? + private var recognitionTask: SFSpeechRecognitionTask? + private var recognitionGeneration: Int = 0 + private var rmsTask: Task<Void, Never>? + private let rmsMeter = RMSMeter() + + private var captureTask: Task<Void, Never>? + private var silenceTask: Task<Void, Never>?
+ private var phase: TalkModePhase = .idle + private var isEnabled = false + private var isPaused = false + private var lifecycleGeneration: Int = 0 + + private var lastHeard: Date? + private var noiseFloorRMS: Double = 1e-4 + private var lastTranscript: String = "" + private var lastSpeechEnergyAt: Date? + + private var defaultVoiceId: String? + private var currentVoiceId: String? + private var defaultModelId: String? + private var currentModelId: String? + private var voiceOverrideActive = false + private var modelOverrideActive = false + private var defaultOutputFormat: String? + private var interruptOnSpeech: Bool = true + private var lastInterruptedAtSeconds: Double? + private var voiceAliases: [String: String] = [:] + private var lastSpokenText: String? + private var apiKey: String? + private var fallbackVoiceId: String? + private var lastPlaybackWasPCM: Bool = false + + private let silenceWindow: TimeInterval = 0.7 + private let minSpeechRMS: Double = 1e-3 + private let speechBoostFactor: Double = 6.0 + + // MARK: - Lifecycle + + func setEnabled(_ enabled: Bool) async { + guard enabled != self.isEnabled else { return } + self.isEnabled = enabled + self.lifecycleGeneration &+= 1 + if enabled { + await self.start() + } else { + await self.stop() + } + } + + func setPaused(_ paused: Bool) async { + guard paused != self.isPaused else { return } + self.isPaused = paused + await MainActor.run { TalkModeController.shared.updateLevel(0) } + + guard self.isEnabled else { return } + + if paused { + self.lastTranscript = "" + self.lastHeard = nil + self.lastSpeechEnergyAt = nil + await self.stopRecognition() + return + } + + if self.phase == .idle || self.phase == .listening { + await self.startRecognition() + self.phase = .listening + await MainActor.run { TalkModeController.shared.updatePhase(.listening) } + self.startSilenceMonitor() + } + } + + private func isCurrent(_ generation: Int) -> Bool { + generation == self.lifecycleGeneration && self.isEnabled + } + + 
private func start() async { + let gen = self.lifecycleGeneration + guard voiceWakeSupported else { return } + guard PermissionManager.voiceWakePermissionsGranted() else { + self.logger.debug("talk runtime not starting: permissions missing") + return + } + await self.reloadConfig() + guard self.isCurrent(gen) else { return } + if self.isPaused { + self.phase = .idle + await MainActor.run { + TalkModeController.shared.updateLevel(0) + TalkModeController.shared.updatePhase(.idle) + } + return + } + await self.startRecognition() + guard self.isCurrent(gen) else { return } + self.phase = .listening + await MainActor.run { TalkModeController.shared.updatePhase(.listening) } + self.startSilenceMonitor() + } + + private func stop() async { + self.captureTask?.cancel() + self.captureTask = nil + self.silenceTask?.cancel() + self.silenceTask = nil + + // Stop audio before changing phase (stopSpeaking is gated on .speaking). + await self.stopSpeaking(reason: .manual) + + self.lastTranscript = "" + self.lastHeard = nil + self.lastSpeechEnergyAt = nil + self.phase = .idle + await self.stopRecognition() + await MainActor.run { + TalkModeController.shared.updateLevel(0) + TalkModeController.shared.updatePhase(.idle) + } + } + + // MARK: - Speech recognition + + private struct RecognitionUpdate { + let transcript: String? + let hasConfidence: Bool + let isFinal: Bool + let errorDescription: String? 
+ let generation: Int + } + + private func startRecognition() async { + await self.stopRecognition() + self.recognitionGeneration &+= 1 + let generation = self.recognitionGeneration + + let locale = await MainActor.run { AppStateStore.shared.voiceWakeLocaleID } + self.recognizer = SFSpeechRecognizer(locale: Locale(identifier: locale)) + guard let recognizer, recognizer.isAvailable else { + self.logger.error("talk recognizer unavailable") + return + } + + self.recognitionRequest = SFSpeechAudioBufferRecognitionRequest() + self.recognitionRequest?.shouldReportPartialResults = true + guard let request = self.recognitionRequest else { return } + + if self.audioEngine == nil { + self.audioEngine = AVAudioEngine() + } + guard let audioEngine = self.audioEngine else { return } + + let input = audioEngine.inputNode + let format = input.outputFormat(forBus: 0) + input.removeTap(onBus: 0) + let meter = self.rmsMeter + input.installTap(onBus: 0, bufferSize: 2048, format: format) { [weak request, meter] buffer, _ in + request?.append(buffer) + if let rms = Self.rmsLevel(buffer: buffer) { + meter.set(rms) + } + } + + audioEngine.prepare() + do { + try audioEngine.start() + } catch { + self.logger.error("talk audio engine start failed: \(error.localizedDescription, privacy: .public)") + return + } + + self.startRMSTicker(meter: meter) + + self.recognitionTask = recognizer.recognitionTask(with: request) { [weak self, generation] result, error in + guard let self else { return } + let segments = result?.bestTranscription.segments ?? [] + let transcript = result?.bestTranscription.formattedString + let update = RecognitionUpdate( + transcript: transcript, + hasConfidence: segments.contains { $0.confidence > 0.6 }, + isFinal: result?.isFinal ?? 
false, + errorDescription: error?.localizedDescription, + generation: generation) + Task { await self.handleRecognition(update) } + } + } + + private func stopRecognition() async { + self.recognitionGeneration &+= 1 + self.recognitionTask?.cancel() + self.recognitionTask = nil + self.recognitionRequest?.endAudio() + self.recognitionRequest = nil + self.audioEngine?.inputNode.removeTap(onBus: 0) + self.audioEngine?.stop() + self.audioEngine = nil + self.recognizer = nil + self.rmsTask?.cancel() + self.rmsTask = nil + } + + private func startRMSTicker(meter: RMSMeter) { + self.rmsTask?.cancel() + self.rmsTask = Task { [weak self, meter] in + while let self { + try? await Task.sleep(nanoseconds: 50_000_000) + if Task.isCancelled { return } + await self.noteAudioLevel(rms: meter.get()) + } + } + } + + private func handleRecognition(_ update: RecognitionUpdate) async { + guard update.generation == self.recognitionGeneration else { return } + guard !self.isPaused else { return } + if let errorDescription = update.errorDescription { + self.logger.debug("talk recognition error: \(errorDescription, privacy: .public)") + } + guard let transcript = update.transcript else { return } + + let trimmed = transcript.trimmingCharacters(in: .whitespacesAndNewlines) + if self.phase == .speaking, self.interruptOnSpeech { + if await self.shouldInterrupt(transcript: trimmed, hasConfidence: update.hasConfidence) { + await self.stopSpeaking(reason: .speech) + self.lastTranscript = "" + self.lastHeard = nil + await self.startListening() + } + return + } + + guard self.phase == .listening else { return } + + if !trimmed.isEmpty { + self.lastTranscript = trimmed + self.lastHeard = Date() + } + + if update.isFinal { + self.lastTranscript = trimmed + } + } + + // MARK: - Silence handling + + private func startSilenceMonitor() { + self.silenceTask?.cancel() + self.silenceTask = Task { [weak self] in + await self?.silenceLoop() + } + } + + private func silenceLoop() async { + while self.isEnabled 
{ + try? await Task.sleep(nanoseconds: 200_000_000) + await self.checkSilence() + } + } + + private func checkSilence() async { + guard !self.isPaused else { return } + guard self.phase == .listening else { return } + let transcript = self.lastTranscript.trimmingCharacters(in: .whitespacesAndNewlines) + guard !transcript.isEmpty else { return } + guard let lastHeard else { return } + let elapsed = Date().timeIntervalSince(lastHeard) + guard elapsed >= self.silenceWindow else { return } + await self.finalizeTranscript(transcript) + } + + private func startListening() async { + self.phase = .listening + self.lastTranscript = "" + self.lastHeard = nil + await MainActor.run { + TalkModeController.shared.updatePhase(.listening) + TalkModeController.shared.updateLevel(0) + } + } + + private func finalizeTranscript(_ text: String) async { + self.lastTranscript = "" + self.lastHeard = nil + self.phase = .thinking + await MainActor.run { TalkModeController.shared.updatePhase(.thinking) } + await self.stopRecognition() + await self.sendAndSpeak(text) + } + + // MARK: - Gateway + TTS + + private func sendAndSpeak(_ transcript: String) async { + let gen = self.lifecycleGeneration + await self.reloadConfig() + guard self.isCurrent(gen) else { return } + let prompt = self.buildPrompt(transcript: transcript) + let activeSessionKey = await MainActor.run { WebChatManager.shared.activeSessionKey } + let sessionKey: String = if let activeSessionKey { + activeSessionKey + } else { + await GatewayConnection.shared.mainSessionKey() + } + let runId = UUID().uuidString + let startedAt = Date().timeIntervalSince1970 + self.logger.info( + "talk send start runId=\(runId, privacy: .public) session=\(sessionKey, privacy: .public) chars=\(prompt.count, privacy: .public)") + + do { + let response = try await GatewayConnection.shared.chatSend( + sessionKey: sessionKey, + message: prompt, + thinking: "low", + idempotencyKey: runId, + attachments: []) + guard self.isCurrent(gen) else { return } + 
self.logger.info( + "talk chat.send ok runId=\(response.runId, privacy: .public) session=\(sessionKey, privacy: .public)") + + guard let assistantText = await self.waitForAssistantText( + sessionKey: sessionKey, + since: startedAt, + timeoutSeconds: 45) + else { + self.logger.warning("talk assistant text missing after timeout") + await self.startListening() + await self.startRecognition() + return + } + guard self.isCurrent(gen) else { return } + + self.logger.info("talk assistant text len=\(assistantText.count, privacy: .public)") + await self.playAssistant(text: assistantText) + guard self.isCurrent(gen) else { return } + await self.resumeListeningIfNeeded() + return + } catch { + self.logger.error("talk chat.send failed: \(error.localizedDescription, privacy: .public)") + await self.resumeListeningIfNeeded() + return + } + } + + private func resumeListeningIfNeeded() async { + if self.isPaused { + self.lastTranscript = "" + self.lastHeard = nil + self.lastSpeechEnergyAt = nil + await MainActor.run { + TalkModeController.shared.updateLevel(0) + } + return + } + await self.startListening() + await self.startRecognition() + } + + private func buildPrompt(transcript: String) -> String { + let interrupted = self.lastInterruptedAtSeconds + self.lastInterruptedAtSeconds = nil + return TalkPromptBuilder.build(transcript: transcript, interruptedAtSeconds: interrupted) + } + + private func waitForAssistantText( + sessionKey: String, + since: Double, + timeoutSeconds: Int) async -> String? + { + let deadline = Date().addingTimeInterval(TimeInterval(timeoutSeconds)) + while Date() < deadline { + if let text = await self.latestAssistantText(sessionKey: sessionKey, since: since) { + return text + } + try? await Task.sleep(nanoseconds: 300_000_000) + } + return nil + } + + private func latestAssistantText(sessionKey: String, since: Double? = nil) async -> String? 
{ + do { + let history = try await GatewayConnection.shared.chatHistory(sessionKey: sessionKey) + let messages = history.messages ?? [] + let decoded: [ClawdisChatMessage] = messages.compactMap { item in + guard let data = try? JSONEncoder().encode(item) else { return nil } + return try? JSONDecoder().decode(ClawdisChatMessage.self, from: data) + } + let assistant = decoded.last { message in + guard message.role == "assistant" else { return false } + guard let since else { return true } + guard let timestamp = message.timestamp else { return false } + return TalkHistoryTimestamp.isAfter(timestamp, sinceSeconds: since) + } + guard let assistant else { return nil } + let text = assistant.content.compactMap(\.text).joined(separator: "\n") + let trimmed = text.trimmingCharacters(in: CharacterSet.whitespacesAndNewlines) + return trimmed.isEmpty ? nil : trimmed + } catch { + self.logger.error("talk history fetch failed: \(error.localizedDescription, privacy: .public)") + return nil + } + } + + private func playAssistant(text: String) async { + let gen = self.lifecycleGeneration + let parse = TalkDirectiveParser.parse(text) + let directive = parse.directive + let cleaned = parse.stripped.trimmingCharacters(in: .whitespacesAndNewlines) + guard !cleaned.isEmpty else { return } + guard self.isCurrent(gen) else { return } + + if !parse.unknownKeys.isEmpty { + self.logger + .warning("talk directive ignored keys: \(parse.unknownKeys.joined(separator: ","), privacy: .public)") + } + + let requestedVoice = directive?.voiceId?.trimmingCharacters(in: .whitespacesAndNewlines) + let resolvedVoice = self.resolveVoiceAlias(requestedVoice) + if let requestedVoice, !requestedVoice.isEmpty, resolvedVoice == nil { + self.logger.warning("talk unknown voice alias \(requestedVoice, privacy: .public)") + } + if let voice = resolvedVoice { + if directive?.once == true { + self.logger.info("talk voice override (once) voiceId=\(voice, privacy: .public)") + } else { + self.currentVoiceId = voice + 
self.voiceOverrideActive = true + self.logger.info("talk voice override voiceId=\(voice, privacy: .public)") + } + } + + if let model = directive?.modelId { + if directive?.once == true { + self.logger.info("talk model override (once) modelId=\(model, privacy: .public)") + } else { + self.currentModelId = model + self.modelOverrideActive = true + } + } + + let apiKey = self.apiKey?.trimmingCharacters(in: .whitespacesAndNewlines) + let preferredVoice = + resolvedVoice ?? + self.currentVoiceId ?? + self.defaultVoiceId + + let language = ElevenLabsTTSClient.validatedLanguage(directive?.language) + + let voiceId: String? = if let apiKey, !apiKey.isEmpty { + await self.resolveVoiceId(preferred: preferredVoice, apiKey: apiKey) + } else { + nil + } + + if apiKey?.isEmpty != false { + self.ttsLogger.warning("talk missing ELEVENLABS_API_KEY; falling back to system voice") + } else if voiceId == nil { + self.ttsLogger.warning("talk missing voiceId; falling back to system voice") + } else if let voiceId { + self.ttsLogger + .info("talk TTS request voiceId=\(voiceId, privacy: .public) chars=\(cleaned.count, privacy: .public)") + } + self.lastSpokenText = cleaned + + let synthTimeoutSeconds = max(20.0, min(90.0, Double(cleaned.count) * 0.12)) + + do { + if let apiKey, !apiKey.isEmpty, let voiceId { + let desiredOutputFormat = directive?.outputFormat ?? self.defaultOutputFormat ?? "pcm_44100" + let outputFormat = ElevenLabsTTSClient.validatedOutputFormat(desiredOutputFormat) + if outputFormat == nil, !desiredOutputFormat.isEmpty { + self.logger + .warning( + "talk output_format unsupported for local playback: \(desiredOutputFormat, privacy: .public)") + } + + let modelId = directive?.modelId ?? self.currentModelId ?? self.defaultModelId + func makeRequest(outputFormat: String?) 
-> ElevenLabsTTSRequest { + ElevenLabsTTSRequest( + text: cleaned, + modelId: modelId, + outputFormat: outputFormat, + speed: TalkTTSValidation.resolveSpeed(speed: directive?.speed, rateWPM: directive?.rateWPM), + stability: TalkTTSValidation.validatedStability(directive?.stability, modelId: modelId), + similarity: TalkTTSValidation.validatedUnit(directive?.similarity), + style: TalkTTSValidation.validatedUnit(directive?.style), + speakerBoost: directive?.speakerBoost, + seed: TalkTTSValidation.validatedSeed(directive?.seed), + normalize: ElevenLabsTTSClient.validatedNormalize(directive?.normalize), + language: language, + latencyTier: TalkTTSValidation.validatedLatencyTier(directive?.latencyTier)) + } + + let request = makeRequest(outputFormat: outputFormat) + + self.ttsLogger.info("talk TTS synth timeout=\(synthTimeoutSeconds, privacy: .public)s") + let client = ElevenLabsTTSClient(apiKey: apiKey) + let stream = client.streamSynthesize(voiceId: voiceId, request: request) + guard self.isCurrent(gen) else { return } + + if self.interruptOnSpeech { + await self.startRecognition() + guard self.isCurrent(gen) else { return } + } + + await MainActor.run { TalkModeController.shared.updatePhase(.speaking) } + self.phase = .speaking + + let sampleRate = TalkTTSValidation.pcmSampleRate(from: outputFormat) + var result: StreamingPlaybackResult + if let sampleRate { + self.lastPlaybackWasPCM = true + result = await self.playPCM(stream: stream, sampleRate: sampleRate) + if !result.finished, result.interruptedAt == nil { + let mp3Format = ElevenLabsTTSClient.validatedOutputFormat("mp3_44100") + self.ttsLogger.warning("talk pcm playback failed; retrying mp3") + self.lastPlaybackWasPCM = false + let mp3Stream = client.streamSynthesize( + voiceId: voiceId, + request: makeRequest(outputFormat: mp3Format)) + result = await self.playMP3(stream: mp3Stream) + } + } else { + self.lastPlaybackWasPCM = false + result = await self.playMP3(stream: stream) + } + self.ttsLogger + .info( + 
"talk audio result finished=\(result.finished, privacy: .public) interruptedAt=\(String(describing: result.interruptedAt), privacy: .public)") + if !result.finished, result.interruptedAt == nil { + throw NSError(domain: "StreamingAudioPlayer", code: 1, userInfo: [ + NSLocalizedDescriptionKey: "audio playback failed", + ]) + } + if !result.finished, let interruptedAt = result.interruptedAt, self.phase == .speaking { + if self.interruptOnSpeech { + self.lastInterruptedAtSeconds = interruptedAt + } + } + } else { + self.ttsLogger.info("talk system voice start chars=\(cleaned.count, privacy: .public)") + if self.interruptOnSpeech { + await self.startRecognition() + guard self.isCurrent(gen) else { return } + } + await MainActor.run { TalkModeController.shared.updatePhase(.speaking) } + self.phase = .speaking + await TalkSystemSpeechSynthesizer.shared.stop() + try await TalkSystemSpeechSynthesizer.shared.speak(text: cleaned, language: language) + self.ttsLogger.info("talk system voice done") + } + } catch { + self.ttsLogger + .error("talk TTS failed: \(error.localizedDescription, privacy: .public); falling back to system voice") + do { + if self.interruptOnSpeech { + await self.startRecognition() + guard self.isCurrent(gen) else { return } + } + await MainActor.run { TalkModeController.shared.updatePhase(.speaking) } + self.phase = .speaking + await TalkSystemSpeechSynthesizer.shared.stop() + try await TalkSystemSpeechSynthesizer.shared.speak(text: cleaned, language: language) + } catch { + self.ttsLogger.error("talk system voice failed: \(error.localizedDescription, privacy: .public)") + } + } + + if self.phase == .speaking { + self.phase = .thinking + await MainActor.run { TalkModeController.shared.updatePhase(.thinking) } + } + } + + private func resolveVoiceId(preferred: String?, apiKey: String) async -> String? { + let trimmed = preferred?.trimmingCharacters(in: .whitespacesAndNewlines) ?? 
"" + if !trimmed.isEmpty { + if let resolved = self.resolveVoiceAlias(trimmed) { return resolved } + self.ttsLogger.warning("talk unknown voice alias \(trimmed, privacy: .public)") + } + if let fallbackVoiceId { return fallbackVoiceId } + + do { + let voices = try await ElevenLabsTTSClient(apiKey: apiKey).listVoices() + guard let first = voices.first else { + self.ttsLogger.error("elevenlabs voices list empty") + return nil + } + self.fallbackVoiceId = first.voiceId + if self.defaultVoiceId == nil { + self.defaultVoiceId = first.voiceId + } + if !self.voiceOverrideActive { + self.currentVoiceId = first.voiceId + } + let name = first.name ?? "unknown" + self.ttsLogger + .info("talk default voice selected \(name, privacy: .public) (\(first.voiceId, privacy: .public))") + return first.voiceId + } catch { + self.ttsLogger.error("elevenlabs list voices failed: \(error.localizedDescription, privacy: .public)") + return nil + } + } + + private func resolveVoiceAlias(_ value: String?) -> String? { + let trimmed = (value ?? "").trimmingCharacters(in: .whitespacesAndNewlines) + guard !trimmed.isEmpty else { return nil } + let normalized = trimmed.lowercased() + if let mapped = self.voiceAliases[normalized] { return mapped } + if self.voiceAliases.values.contains(where: { $0.caseInsensitiveCompare(trimmed) == .orderedSame }) { + return trimmed + } + return Self.isLikelyVoiceId(trimmed) ? trimmed : nil + } + + private static func isLikelyVoiceId(_ value: String) -> Bool { + guard value.count >= 10 else { return false } + return value.allSatisfy { $0.isLetter || $0.isNumber || $0 == "-" || $0 == "_" } + } + + func stopSpeaking(reason: TalkStopReason) async { + let usePCM = self.lastPlaybackWasPCM + let interruptedAt = usePCM ? await self.stopPCM() : await self.stopMP3() + _ = usePCM ? 
await self.stopMP3() : await self.stopPCM()
+        await TalkSystemSpeechSynthesizer.shared.stop()
+        guard self.phase == .speaking else { return }
+        if reason == .speech, let interruptedAt {
+            self.lastInterruptedAtSeconds = interruptedAt
+        }
+        if reason == .manual {
+            return
+        }
+        if reason == .speech || reason == .userTap {
+            await self.startListening()
+            return
+        }
+        self.phase = .thinking
+        await MainActor.run { TalkModeController.shared.updatePhase(.thinking) }
+    }
+
+    // MARK: - Audio playback (MainActor helpers)
+
+    @MainActor
+    private func playPCM(
+        stream: AsyncThrowingStream<Data, Error>,
+        sampleRate: Double) async -> StreamingPlaybackResult
+    {
+        await PCMStreamingAudioPlayer.shared.play(stream: stream, sampleRate: sampleRate)
+    }
+
+    @MainActor
+    private func playMP3(stream: AsyncThrowingStream<Data, Error>) async -> StreamingPlaybackResult {
+        await StreamingAudioPlayer.shared.play(stream: stream)
+    }
+
+    @MainActor
+    private func stopPCM() -> Double? {
+        PCMStreamingAudioPlayer.shared.stop()
+    }
+
+    @MainActor
+    private func stopMP3() -> Double? {
+        StreamingAudioPlayer.shared.stop()
+    }
+
+    // MARK: - Config
+
+    private func reloadConfig() async {
+        let cfg = await self.fetchTalkConfig()
+        self.defaultVoiceId = cfg.voiceId
+        self.voiceAliases = cfg.voiceAliases
+        if !self.voiceOverrideActive {
+            self.currentVoiceId = cfg.voiceId
+        }
+        self.defaultModelId = cfg.modelId
+        if !self.modelOverrideActive {
+            self.currentModelId = cfg.modelId
+        }
+        self.defaultOutputFormat = cfg.outputFormat
+        self.interruptOnSpeech = cfg.interruptOnSpeech
+        self.apiKey = cfg.apiKey
+        let hasApiKey = (cfg.apiKey?.isEmpty == false)
+        let voiceLabel = (cfg.voiceId?.isEmpty == false) ? cfg.voiceId! : "none"
+        let modelLabel = (cfg.modelId?.isEmpty == false) ? cfg.modelId!
: "none" + self.logger + .info( + "talk config voiceId=\(voiceLabel, privacy: .public) modelId=\(modelLabel, privacy: .public) apiKey=\(hasApiKey, privacy: .public) interrupt=\(cfg.interruptOnSpeech, privacy: .public)") + } + + private struct TalkRuntimeConfig { + let voiceId: String? + let voiceAliases: [String: String] + let modelId: String? + let outputFormat: String? + let interruptOnSpeech: Bool + let apiKey: String? + } + + private func fetchTalkConfig() async -> TalkRuntimeConfig { + let env = ProcessInfo.processInfo.environment + let envVoice = env["ELEVENLABS_VOICE_ID"]?.trimmingCharacters(in: .whitespacesAndNewlines) + let sagVoice = env["SAG_VOICE_ID"]?.trimmingCharacters(in: .whitespacesAndNewlines) + let envApiKey = env["ELEVENLABS_API_KEY"]?.trimmingCharacters(in: .whitespacesAndNewlines) + + do { + let snap: ConfigSnapshot = try await GatewayConnection.shared.requestDecoded( + method: .configGet, + params: nil, + timeoutMs: 8000) + let talk = snap.config?["talk"]?.dictionaryValue + let ui = snap.config?["ui"]?.dictionaryValue + let rawSeam = ui?["seamColor"]?.stringValue?.trimmingCharacters(in: .whitespacesAndNewlines) ?? "" + await MainActor.run { + AppStateStore.shared.seamColorHex = rawSeam.isEmpty ? nil : rawSeam + } + let voice = talk?["voiceId"]?.stringValue + let rawAliases = talk?["voiceAliases"]?.dictionaryValue + let resolvedAliases: [String: String] = + rawAliases?.reduce(into: [:]) { acc, entry in + let key = entry.key.trimmingCharacters(in: .whitespacesAndNewlines).lowercased() + let value = entry.value.stringValue?.trimmingCharacters(in: .whitespacesAndNewlines) ?? "" + guard !key.isEmpty, !value.isEmpty else { return } + acc[key] = value + } ?? [:] + let model = talk?["modelId"]?.stringValue?.trimmingCharacters(in: .whitespacesAndNewlines) + let resolvedModel = (model?.isEmpty == false) ? model! 
: Self.defaultModelIdFallback + let outputFormat = talk?["outputFormat"]?.stringValue + let interrupt = talk?["interruptOnSpeech"]?.boolValue + let apiKey = talk?["apiKey"]?.stringValue + let resolvedVoice = + (voice?.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty == false ? voice : nil) ?? + (envVoice?.isEmpty == false ? envVoice : nil) ?? + (sagVoice?.isEmpty == false ? sagVoice : nil) + let resolvedApiKey = + (envApiKey?.isEmpty == false ? envApiKey : nil) ?? + (apiKey?.trimmingCharacters(in: .whitespacesAndNewlines).isEmpty == false ? apiKey : nil) + return TalkRuntimeConfig( + voiceId: resolvedVoice, + voiceAliases: resolvedAliases, + modelId: resolvedModel, + outputFormat: outputFormat, + interruptOnSpeech: interrupt ?? true, + apiKey: resolvedApiKey) + } catch { + let resolvedVoice = + (envVoice?.isEmpty == false ? envVoice : nil) ?? + (sagVoice?.isEmpty == false ? sagVoice : nil) + let resolvedApiKey = envApiKey?.isEmpty == false ? envApiKey : nil + return TalkRuntimeConfig( + voiceId: resolvedVoice, + voiceAliases: [:], + modelId: Self.defaultModelIdFallback, + outputFormat: nil, + interruptOnSpeech: true, + apiKey: resolvedApiKey) + } + } + + // MARK: - Audio level handling + + private func noteAudioLevel(rms: Double) async { + if self.phase != .listening, self.phase != .speaking { return } + let alpha: Double = rms < self.noiseFloorRMS ? 0.08 : 0.01 + self.noiseFloorRMS = max(1e-7, self.noiseFloorRMS + (rms - self.noiseFloorRMS) * alpha) + + let threshold = max(self.minSpeechRMS, self.noiseFloorRMS * self.speechBoostFactor) + if rms >= threshold { + let now = Date() + self.lastHeard = now + self.lastSpeechEnergyAt = now + } + + if self.phase == .listening { + let clamped = min(1.0, max(0.0, rms / max(self.minSpeechRMS, threshold))) + await MainActor.run { TalkModeController.shared.updateLevel(clamped) } + } + } + + private static func rmsLevel(buffer: AVAudioPCMBuffer) -> Double? 
{
+        guard let channelData = buffer.floatChannelData?.pointee else { return nil }
+        let frameCount = Int(buffer.frameLength)
+        guard frameCount > 0 else { return nil }
+        var sum: Double = 0
+        for i in 0..<frameCount {
+            let sample = Double(channelData[i])
+            sum += sample * sample
+        }
+        return sqrt(sum / Double(frameCount))
+    }
+
+    private func shouldAccept(transcript: String, hasConfidence: Bool) -> Bool {
+        let trimmed = transcript.trimmingCharacters(in: .whitespacesAndNewlines)
+        guard trimmed.count >= 3 else { return false }
+        if self.isLikelyEcho(of: trimmed) { return false }
+        let now = Date()
+        if let lastSpeechEnergyAt, now.timeIntervalSince(lastSpeechEnergyAt) > 0.35 {
+            return false
+        }
+        return hasConfidence
+    }
+
+    private func isLikelyEcho(of transcript: String) -> Bool {
+        guard let spoken = self.lastSpokenText?.lowercased(), !spoken.isEmpty else { return false }
+        let probe = transcript.lowercased()
+        if probe.count < 6 {
+            return spoken.contains(probe)
+        }
+        return spoken.contains(probe)
+    }
+
+    private static func resolveSpeed(speed: Double?, rateWPM: Int?, logger: Logger) -> Double? {
+        if let rateWPM, rateWPM > 0 {
+            let resolved = Double(rateWPM) / 175.0
+            if resolved <= 0.5 || resolved >= 2.0 {
+                logger.warning("talk rateWPM out of range: \(rateWPM, privacy: .public)")
+                return nil
+            }
+            return resolved
+        }
+        if let speed {
+            if speed <= 0.5 || speed >= 2.0 {
+                logger.warning("talk speed out of range: \(speed, privacy: .public)")
+                return nil
+            }
+            return speed
+        }
+        return nil
+    }
+
+    private static func validatedUnit(_ value: Double?, name: String, logger: Logger) -> Double? {
+        guard let value else { return nil }
+        if value < 0 || value > 1 {
+            logger.warning("talk \(name, privacy: .public) out of range: \(value, privacy: .public)")
+            return nil
+        }
+        return value
+    }
+
+    private static func validatedSeed(_ value: Int?, logger: Logger) -> UInt32? {
+        guard let value else { return nil }
+        if value < 0 || value > 4_294_967_295 {
+            logger.warning("talk seed out of range: \(value, privacy: .public)")
+            return nil
+        }
+        return UInt32(value)
+    }
+
+    private static func validatedNormalize(_ value: String?, logger: Logger) -> String?
{
+        guard let value else { return nil }
+        let normalized = value.trimmingCharacters(in: .whitespacesAndNewlines).lowercased()
+        guard ["auto", "on", "off"].contains(normalized) else {
+            logger.warning("talk normalize invalid: \(normalized, privacy: .public)")
+            return nil
+        }
+        return normalized
+    }
+}
diff --git a/apps/macos/Sources/Clawdis/TalkModeTypes.swift b/apps/macos/Sources/Clawdis/TalkModeTypes.swift
new file mode 100644
index 000000000..3ae978255
--- /dev/null
+++ b/apps/macos/Sources/Clawdis/TalkModeTypes.swift
@@ -0,0 +1,8 @@
+import Foundation
+
+enum TalkModePhase: String {
+    case idle
+    case listening
+    case thinking
+    case speaking
+}
diff --git a/apps/macos/Sources/Clawdis/TalkOverlay.swift b/apps/macos/Sources/Clawdis/TalkOverlay.swift
new file mode 100644
index 000000000..57dd56c80
--- /dev/null
+++ b/apps/macos/Sources/Clawdis/TalkOverlay.swift
@@ -0,0 +1,146 @@
+import AppKit
+import Observation
+import OSLog
+import SwiftUI
+
+@MainActor
+@Observable
+final class TalkOverlayController {
+    static let shared = TalkOverlayController()
+    static let overlaySize: CGFloat = 440
+    static let orbSize: CGFloat = 96
+    static let orbPadding: CGFloat = 12
+    static let orbHitSlop: CGFloat = 10
+
+    private let logger = Logger(subsystem: "com.steipete.clawdis", category: "talk.overlay")
+
+    struct Model {
+        var isVisible: Bool = false
+        var phase: TalkModePhase = .idle
+        var isPaused: Bool = false
+        var level: Double = 0
+    }
+
+    var model = Model()
+    private var window: NSPanel?
+    private var hostingView: NSHostingView<TalkOverlayView>?
+ private let screenInset: CGFloat = 0 + + func present() { + self.ensureWindow() + self.hostingView?.rootView = TalkOverlayView(controller: self) + let target = self.targetFrame() + + guard let window else { return } + if !self.model.isVisible { + self.model.isVisible = true + let start = target.offsetBy(dx: 0, dy: -6) + window.setFrame(start, display: true) + window.alphaValue = 0 + window.orderFrontRegardless() + NSAnimationContext.runAnimationGroup { context in + context.duration = 0.18 + context.timingFunction = CAMediaTimingFunction(name: .easeOut) + window.animator().setFrame(target, display: true) + window.animator().alphaValue = 1 + } + } else { + window.setFrame(target, display: true) + window.orderFrontRegardless() + } + } + + func dismiss() { + guard let window else { + self.model.isVisible = false + return + } + + let target = window.frame.offsetBy(dx: 6, dy: 6) + NSAnimationContext.runAnimationGroup { context in + context.duration = 0.16 + context.timingFunction = CAMediaTimingFunction(name: .easeOut) + window.animator().setFrame(target, display: true) + window.animator().alphaValue = 0 + } completionHandler: { + Task { @MainActor in + window.orderOut(nil) + self.model.isVisible = false + } + } + } + + func updatePhase(_ phase: TalkModePhase) { + guard self.model.phase != phase else { return } + self.logger.info("talk overlay phase=\(phase.rawValue, privacy: .public)") + self.model.phase = phase + } + + func updatePaused(_ paused: Bool) { + guard self.model.isPaused != paused else { return } + self.logger.info("talk overlay paused=\(paused)") + self.model.isPaused = paused + } + + func updateLevel(_ level: Double) { + guard self.model.isVisible else { return } + self.model.level = max(0, min(1, level)) + } + + func currentWindowOrigin() -> CGPoint? 
{
+        self.window?.frame.origin
+    }
+
+    func setWindowOrigin(_ origin: CGPoint) {
+        guard let window else { return }
+        window.setFrameOrigin(origin)
+    }
+
+    // MARK: - Private
+
+    private func ensureWindow() {
+        if self.window != nil { return }
+        let panel = NSPanel(
+            contentRect: NSRect(x: 0, y: 0, width: Self.overlaySize, height: Self.overlaySize),
+            styleMask: [.nonactivatingPanel, .borderless],
+            backing: .buffered,
+            defer: false)
+        panel.isOpaque = false
+        panel.backgroundColor = .clear
+        panel.hasShadow = false
+        panel.level = NSWindow.Level(rawValue: NSWindow.Level.popUpMenu.rawValue - 4)
+        panel.collectionBehavior = [.canJoinAllSpaces, .fullScreenAuxiliary, .transient]
+        panel.hidesOnDeactivate = false
+        panel.isMovable = false
+        panel.acceptsMouseMovedEvents = true
+        panel.isFloatingPanel = true
+        panel.becomesKeyOnlyIfNeeded = true
+        panel.titleVisibility = .hidden
+        panel.titlebarAppearsTransparent = true
+
+        let host = TalkOverlayHostingView(rootView: TalkOverlayView(controller: self))
+        host.translatesAutoresizingMaskIntoConstraints = false
+        panel.contentView = host
+        self.hostingView = host
+        self.window = panel
+    }
+
+    private func targetFrame() -> NSRect {
+        let screen = self.window?.screen
+            ?? NSScreen.main
+            ?? NSScreen.screens.first
+        guard let screen else { return .zero }
+        let size = NSSize(width: Self.overlaySize, height: Self.overlaySize)
+        let visible = screen.visibleFrame
+        let origin = CGPoint(
+            x: visible.maxX - size.width - self.screenInset,
+            y: visible.maxY - size.height - self.screenInset)
+        return NSRect(origin: origin, size: size)
+    }
+}
+
+private final class TalkOverlayHostingView: NSHostingView<TalkOverlayView> {
+    override func acceptsFirstMouse(for event: NSEvent?)
-> Bool { + true + } +} diff --git a/apps/macos/Sources/Clawdis/TalkOverlayView.swift b/apps/macos/Sources/Clawdis/TalkOverlayView.swift new file mode 100644 index 000000000..154a948a7 --- /dev/null +++ b/apps/macos/Sources/Clawdis/TalkOverlayView.swift @@ -0,0 +1,219 @@ +import AppKit +import SwiftUI + +struct TalkOverlayView: View { + var controller: TalkOverlayController + @State private var appState = AppStateStore.shared + @State private var hoveringWindow = false + + var body: some View { + ZStack(alignment: .topTrailing) { + let isPaused = self.controller.model.isPaused + Color.clear + TalkOrbView( + phase: self.controller.model.phase, + level: self.controller.model.level, + accent: self.seamColor, + isPaused: isPaused) + .frame(width: TalkOverlayController.orbSize, height: TalkOverlayController.orbSize) + .padding(.top, TalkOverlayController.orbPadding) + .padding(.trailing, TalkOverlayController.orbPadding) + .contentShape(Circle()) + .opacity(isPaused ? 0.55 : 1) + .background( + TalkOrbInteractionView( + onSingleClick: { TalkModeController.shared.togglePaused() }, + onDoubleClick: { TalkModeController.shared.stopSpeaking(reason: .userTap) }, + onDragStart: { TalkModeController.shared.setPaused(true) })) + .overlay(alignment: .topLeading) { + Button { + TalkModeController.shared.exitTalkMode() + } label: { + Image(systemName: "xmark") + .font(.system(size: 10, weight: .bold)) + .foregroundStyle(Color.white.opacity(0.95)) + .frame(width: 18, height: 18) + .background(Color.black.opacity(0.4)) + .clipShape(Circle()) + } + .buttonStyle(.plain) + .contentShape(Circle()) + .offset(x: -2, y: -2) + .opacity(self.hoveringWindow ? 
1 : 0) + .animation(.easeOut(duration: 0.12), value: self.hoveringWindow) + } + .onHover { self.hoveringWindow = $0 } + } + .frame( + width: TalkOverlayController.overlaySize, + height: TalkOverlayController.overlaySize, + alignment: .topTrailing) + } + + private static let defaultSeamColor = Color(red: 79 / 255.0, green: 122 / 255.0, blue: 154 / 255.0) + + private var seamColor: Color { + Self.color(fromHex: self.appState.seamColorHex) ?? Self.defaultSeamColor + } + + private static func color(fromHex raw: String?) -> Color? { + let trimmed = (raw ?? "").trimmingCharacters(in: .whitespacesAndNewlines) + guard !trimmed.isEmpty else { return nil } + let hex = trimmed.hasPrefix("#") ? String(trimmed.dropFirst()) : trimmed + guard hex.count == 6, let value = Int(hex, radix: 16) else { return nil } + let r = Double((value >> 16) & 0xFF) / 255.0 + let g = Double((value >> 8) & 0xFF) / 255.0 + let b = Double(value & 0xFF) / 255.0 + return Color(red: r, green: g, blue: b) + } +} + +private struct TalkOrbInteractionView: NSViewRepresentable { + let onSingleClick: () -> Void + let onDoubleClick: () -> Void + let onDragStart: () -> Void + + func makeNSView(context: Context) -> NSView { + let view = OrbInteractionNSView() + view.onSingleClick = self.onSingleClick + view.onDoubleClick = self.onDoubleClick + view.onDragStart = self.onDragStart + view.wantsLayer = true + view.layer?.backgroundColor = NSColor.clear.cgColor + return view + } + + func updateNSView(_ nsView: NSView, context: Context) { + guard let view = nsView as? OrbInteractionNSView else { return } + view.onSingleClick = self.onSingleClick + view.onDoubleClick = self.onDoubleClick + view.onDragStart = self.onDragStart + } +} + +private final class OrbInteractionNSView: NSView { + var onSingleClick: (() -> Void)? + var onDoubleClick: (() -> Void)? + var onDragStart: (() -> Void)? + private var mouseDownEvent: NSEvent? 
+ private var didDrag = false + private var suppressSingleClick = false + + override var acceptsFirstResponder: Bool { true } + override func acceptsFirstMouse(for event: NSEvent?) -> Bool { true } + + override func mouseDown(with event: NSEvent) { + self.mouseDownEvent = event + self.didDrag = false + self.suppressSingleClick = event.clickCount > 1 + if event.clickCount == 2 { + self.onDoubleClick?() + } + } + + override func mouseDragged(with event: NSEvent) { + guard let startEvent = self.mouseDownEvent else { return } + if !self.didDrag { + let dx = event.locationInWindow.x - startEvent.locationInWindow.x + let dy = event.locationInWindow.y - startEvent.locationInWindow.y + if abs(dx) + abs(dy) < 2 { return } + self.didDrag = true + self.onDragStart?() + self.window?.performDrag(with: startEvent) + } + } + + override func mouseUp(with event: NSEvent) { + if !self.didDrag && !self.suppressSingleClick { + self.onSingleClick?() + } + self.mouseDownEvent = nil + self.didDrag = false + self.suppressSingleClick = false + } +} + +private struct TalkOrbView: View { + let phase: TalkModePhase + let level: Double + let accent: Color + let isPaused: Bool + + var body: some View { + if self.isPaused { + Circle() + .fill(self.orbGradient) + .overlay(Circle().stroke(Color.white.opacity(0.35), lineWidth: 1)) + .shadow(color: Color.black.opacity(0.18), radius: 10, x: 0, y: 5) + } else { + TimelineView(.animation) { context in + let t = context.date.timeIntervalSinceReferenceDate + let listenScale = phase == .listening ? (1 + CGFloat(self.level) * 0.12) : 1 + let pulse = phase == .speaking ? 
(1 + 0.06 * sin(t * 6)) : 1 + + ZStack { + Circle() + .fill(self.orbGradient) + .overlay(Circle().stroke(Color.white.opacity(0.45), lineWidth: 1)) + .shadow(color: Color.black.opacity(0.22), radius: 10, x: 0, y: 5) + .scaleEffect(pulse * listenScale) + + TalkWaveRings(phase: phase, level: level, time: t, accent: self.accent) + + if phase == .thinking { + TalkOrbitArcs(time: t) + } + } + } + } + } + + private var orbGradient: RadialGradient { + RadialGradient( + colors: [Color.white, self.accent], + center: .topLeading, + startRadius: 4, + endRadius: 52) + } +} + +private struct TalkWaveRings: View { + let phase: TalkModePhase + let level: Double + let time: TimeInterval + let accent: Color + + var body: some View { + ZStack { + ForEach(0..<3, id: \.self) { idx in + let speed = phase == .speaking ? 1.4 : phase == .listening ? 0.9 : 0.6 + let progress = (time * speed + Double(idx) * 0.28).truncatingRemainder(dividingBy: 1) + let amplitude = phase == .speaking ? 0.95 : phase == .listening ? 0.5 + level * 0.7 : 0.35 + let scale = 0.75 + progress * amplitude + (phase == .listening ? level * 0.15 : 0) + let alpha = phase == .speaking ? 0.72 : phase == .listening ? 
0.58 + level * 0.28 : 0.4 + Circle() + .stroke(self.accent.opacity(alpha - progress * 0.3), lineWidth: 1.6) + .scaleEffect(scale) + .opacity(alpha - progress * 0.6) + } + } + } +} + +private struct TalkOrbitArcs: View { + let time: TimeInterval + + var body: some View { + ZStack { + Circle() + .trim(from: 0.08, to: 0.26) + .stroke(Color.white.opacity(0.88), style: StrokeStyle(lineWidth: 1.6, lineCap: .round)) + .rotationEffect(.degrees(time * 42)) + Circle() + .trim(from: 0.62, to: 0.86) + .stroke(Color.white.opacity(0.7), style: StrokeStyle(lineWidth: 1.4, lineCap: .round)) + .rotationEffect(.degrees(-time * 35)) + } + .scaleEffect(1.08) + } +} diff --git a/apps/macos/Sources/Clawdis/VoiceSessionCoordinator.swift b/apps/macos/Sources/Clawdis/VoiceSessionCoordinator.swift index 070a066d4..5cd2f79f5 100644 --- a/apps/macos/Sources/Clawdis/VoiceSessionCoordinator.swift +++ b/apps/macos/Sources/Clawdis/VoiceSessionCoordinator.swift @@ -1,7 +1,6 @@ import AppKit import Foundation import Observation -import OSLog @MainActor @Observable diff --git a/apps/macos/Sources/Clawdis/VoiceWakeOverlay.swift b/apps/macos/Sources/Clawdis/VoiceWakeOverlay.swift index f77ec4031..f2b818898 100644 --- a/apps/macos/Sources/Clawdis/VoiceWakeOverlay.swift +++ b/apps/macos/Sources/Clawdis/VoiceWakeOverlay.swift @@ -1,6 +1,5 @@ import AppKit import Observation -import OSLog import SwiftUI /// Lightweight, borderless panel that shows the current voice wake transcript near the menu bar. 
diff --git a/apps/macos/Sources/Clawdis/VoiceWakeTester.swift b/apps/macos/Sources/Clawdis/VoiceWakeTester.swift index 3c6a81283..5d6d77852 100644 --- a/apps/macos/Sources/Clawdis/VoiceWakeTester.swift +++ b/apps/macos/Sources/Clawdis/VoiceWakeTester.swift @@ -1,6 +1,5 @@ import AVFoundation import Foundation -import OSLog import Speech import SwabbleKit diff --git a/apps/macos/Sources/Clawdis/WebChatManager.swift b/apps/macos/Sources/Clawdis/WebChatManager.swift index 3d550ada3..2f77692de 100644 --- a/apps/macos/Sources/Clawdis/WebChatManager.swift +++ b/apps/macos/Sources/Clawdis/WebChatManager.swift @@ -29,6 +29,10 @@ final class WebChatManager { var onPanelVisibilityChanged: ((Bool) -> Void)? + var activeSessionKey: String? { + self.panelSessionKey ?? self.windowSessionKey + } + func show(sessionKey: String) { self.closePanel() if let controller = self.windowController { diff --git a/apps/macos/Sources/Clawdis/WebChatSwiftUI.swift b/apps/macos/Sources/Clawdis/WebChatSwiftUI.swift index b47d140ee..d396bb286 100644 --- a/apps/macos/Sources/Clawdis/WebChatSwiftUI.swift +++ b/apps/macos/Sources/Clawdis/WebChatSwiftUI.swift @@ -155,7 +155,8 @@ final class WebChatSwiftUIWindowController { self.sessionKey = sessionKey self.presentation = presentation let vm = ClawdisChatViewModel(sessionKey: sessionKey, transport: transport) - self.hosting = NSHostingController(rootView: ClawdisChatView(viewModel: vm)) + let accent = Self.color(fromHex: AppStateStore.shared.seamColorHex) + self.hosting = NSHostingController(rootView: ClawdisChatView(viewModel: vm, userAccent: accent)) self.contentController = Self.makeContentController(for: presentation, hosting: self.hosting) self.window = Self.makeWindow(for: presentation, contentViewController: self.contentController) } @@ -355,4 +356,15 @@ final class WebChatSwiftUIWindowController { window.setFrame(frame, display: false) } } + + private static func color(fromHex raw: String?) -> Color? { + let trimmed = (raw ?? 
"").trimmingCharacters(in: .whitespacesAndNewlines) + guard !trimmed.isEmpty else { return nil } + let hex = trimmed.hasPrefix("#") ? String(trimmed.dropFirst()) : trimmed + guard hex.count == 6, let value = Int(hex, radix: 16) else { return nil } + let r = Double((value >> 16) & 0xFF) / 255.0 + let g = Double((value >> 8) & 0xFF) / 255.0 + let b = Double(value & 0xFF) / 255.0 + return Color(red: r, green: g, blue: b) + } } diff --git a/apps/macos/Sources/ClawdisProtocol/GatewayModels.swift b/apps/macos/Sources/ClawdisProtocol/GatewayModels.swift index 4313c97ef..687313a53 100644 --- a/apps/macos/Sources/ClawdisProtocol/GatewayModels.swift +++ b/apps/macos/Sources/ClawdisProtocol/GatewayModels.swift @@ -689,6 +689,23 @@ public struct ConfigSetParams: Codable { } } +public struct TalkModeParams: Codable { + public let enabled: Bool + public let phase: String? + + public init( + enabled: Bool, + phase: String? + ) { + self.enabled = enabled + self.phase = phase + } + private enum CodingKeys: String, CodingKey { + case enabled + case phase + } +} + public struct ProvidersStatusParams: Codable { public let probe: Bool? public let timeoutms: Int? 
diff --git a/apps/macos/Tests/ClawdisIPCTests/CommandResolverTests.swift b/apps/macos/Tests/ClawdisIPCTests/CommandResolverTests.swift index f0a543a87..9a4a650aa 100644 --- a/apps/macos/Tests/ClawdisIPCTests/CommandResolverTests.swift +++ b/apps/macos/Tests/ClawdisIPCTests/CommandResolverTests.swift @@ -52,12 +52,17 @@ import Testing try FileManager.default.setAttributes([.posixPermissions: 0o755], ofItemAtPath: nodePath.path) try self.makeExec(at: scriptPath) - let cmd = CommandResolver.clawdisCommand(subcommand: "rpc", defaults: defaults) + let cmd = CommandResolver.clawdisCommand( + subcommand: "rpc", + defaults: defaults, + searchPaths: [tmp.appendingPathComponent("node_modules/.bin").path]) #expect(cmd.count >= 3) - #expect(cmd[0] == nodePath.path) - #expect(cmd[1] == scriptPath.path) - #expect(cmd[2] == "rpc") + if cmd.count >= 3 { + #expect(cmd[0] == nodePath.path) + #expect(cmd[1] == scriptPath.path) + #expect(cmd[2] == "rpc") + } } @Test func fallsBackToPnpm() async throws { diff --git a/apps/macos/Tests/ClawdisIPCTests/ConnectionsSettingsSmokeTests.swift b/apps/macos/Tests/ClawdisIPCTests/ConnectionsSettingsSmokeTests.swift index 4941b0524..a9ba93a5f 100644 --- a/apps/macos/Tests/ClawdisIPCTests/ConnectionsSettingsSmokeTests.swift +++ b/apps/macos/Tests/ClawdisIPCTests/ConnectionsSettingsSmokeTests.swift @@ -43,7 +43,8 @@ struct ConnectionsSettingsSmokeTests { elapsedMs: 120, bot: ProvidersStatusSnapshot.TelegramBot(id: 123, username: "clawdisbot"), webhook: ProvidersStatusSnapshot.TelegramWebhook(url: "https://example.com/hook", hasCustomCert: false)), - lastProbeAt: 1_700_000_050_000)) + lastProbeAt: 1_700_000_050_000), + discord: nil) store.whatsappLoginMessage = "Scan QR" store.whatsappLoginQrDataUrl = @@ -92,7 +93,8 @@ struct ConnectionsSettingsSmokeTests { elapsedMs: 120, bot: nil, webhook: nil), - lastProbeAt: 1_700_000_100_000)) + lastProbeAt: 1_700_000_100_000), + discord: nil) let view = ConnectionsSettings(store: store) _ = view.body diff --git 
a/apps/macos/Tests/ClawdisIPCTests/TalkAudioPlayerTests.swift b/apps/macos/Tests/ClawdisIPCTests/TalkAudioPlayerTests.swift
new file mode 100644
index 000000000..8654f03e3
--- /dev/null
+++ b/apps/macos/Tests/ClawdisIPCTests/TalkAudioPlayerTests.swift
@@ -0,0 +1,97 @@
+import Foundation
+import Testing
+@testable import Clawdis
+
+@Suite(.serialized) struct TalkAudioPlayerTests {
+    @MainActor
+    @Test func playDoesNotHangWhenPlaybackEndsOrFails() async throws {
+        let wav = makeWav16Mono(sampleRate: 8000, samples: 80)
+        defer { _ = TalkAudioPlayer.shared.stop() }
+
+        _ = try await withTimeout(seconds: 2.0) {
+            await TalkAudioPlayer.shared.play(data: wav)
+        }
+
+        #expect(true)
+    }
+
+    @MainActor
+    @Test func playDoesNotHangWhenPlayIsCalledTwice() async throws {
+        let wav = makeWav16Mono(sampleRate: 8000, samples: 800)
+        defer { _ = TalkAudioPlayer.shared.stop() }
+
+        let first = Task { @MainActor in
+            await TalkAudioPlayer.shared.play(data: wav)
+        }
+
+        await Task.yield()
+        _ = await TalkAudioPlayer.shared.play(data: wav)
+
+        _ = try await withTimeout(seconds: 2.0) {
+            await first.value
+        }
+        #expect(true)
+    }
+}
+
+private struct TimeoutError: Error {}
+
+private func withTimeout<T>(
+    seconds: Double,
+    _ work: @escaping @Sendable () async throws -> T) async throws -> T
+{
+    try await withThrowingTaskGroup(of: T.self) { group in
+        group.addTask {
+            try await work()
+        }
+        group.addTask {
+            try await Task.sleep(nanoseconds: UInt64(seconds * 1_000_000_000))
+            throw TimeoutError()
+        }
+        let result = try await group.next()
+        group.cancelAll()
+        guard let result else { throw TimeoutError() }
+        return result
+    }
+}
+
+private func makeWav16Mono(sampleRate: UInt32, samples: Int) -> Data {
+    let channels: UInt16 = 1
+    let bitsPerSample: UInt16 = 16
+    let blockAlign = channels * (bitsPerSample / 8)
+    let byteRate = sampleRate * UInt32(blockAlign)
+    let dataSize = UInt32(samples) * UInt32(blockAlign)
+
+    var data = Data()
+    data.append(contentsOf: [0x52, 0x49, 0x46, 0x46]) //
RIFF
+    data.appendLEUInt32(36 + dataSize)
+    data.append(contentsOf: [0x57, 0x41, 0x56, 0x45]) // WAVE
+
+    data.append(contentsOf: [0x66, 0x6D, 0x74, 0x20]) // fmt
+    data.appendLEUInt32(16) // PCM
+    data.appendLEUInt16(1) // audioFormat
+    data.appendLEUInt16(channels)
+    data.appendLEUInt32(sampleRate)
+    data.appendLEUInt32(byteRate)
+    data.appendLEUInt16(blockAlign)
+    data.appendLEUInt16(bitsPerSample)
+
+    data.append(contentsOf: [0x64, 0x61, 0x74, 0x61]) // data
+    data.appendLEUInt32(dataSize)
+
+    // Silence samples.
+    data.append(Data(repeating: 0, count: Int(dataSize)))
+    return data
+}
+
+private extension Data {
+    mutating func appendLEUInt16(_ value: UInt16) {
+        var v = value.littleEndian
+        Swift.withUnsafeBytes(of: &v) { append(contentsOf: $0) }
+    }
+
+    mutating func appendLEUInt32(_ value: UInt32) {
+        var v = value.littleEndian
+        Swift.withUnsafeBytes(of: &v) { append(contentsOf: $0) }
+    }
+}
diff --git a/apps/shared/ClawdisKit/Package.swift b/apps/shared/ClawdisKit/Package.swift
index 3d7c4a784..d7642c233 100644
--- a/apps/shared/ClawdisKit/Package.swift
+++ b/apps/shared/ClawdisKit/Package.swift
@@ -12,10 +12,15 @@ let package = Package(
         .library(name: "ClawdisKit", targets: ["ClawdisKit"]),
         .library(name: "ClawdisChatUI", targets: ["ClawdisChatUI"]),
     ],
+    dependencies: [
+        .package(path: "../../../../ElevenLabsKit"),
+    ],
     targets: [
         .target(
             name: "ClawdisKit",
-            dependencies: [],
+            dependencies: [
+                .product(name: "ElevenLabsKit", package: "ElevenLabsKit"),
+            ],
             resources: [
                 .process("Resources"),
             ],
diff --git a/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatMessageViews.swift b/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatMessageViews.swift
index 0b14e852a..bd8e97c52 100644
--- a/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatMessageViews.swift
+++ b/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatMessageViews.swift
@@ -137,9 +137,10 @@ private struct ChatBubbleShape: InsettableShape {
 struct ChatMessageBubble: View {
     let message: ClawdisChatMessage
     let style: ClawdisChatView.Style
+    let userAccent: Color?

     var body: some View {
-        ChatMessageBody(message: self.message, isUser: self.isUser, style: self.style)
+        ChatMessageBody(message: self.message, isUser: self.isUser, style: self.style, userAccent: self.userAccent)
             .frame(maxWidth: ChatUIConstants.bubbleMaxWidth, alignment: self.isUser ? .trailing : .leading)
             .frame(maxWidth: .infinity, alignment: self.isUser ? .trailing : .leading)
             .padding(.horizontal, 2)
@@ -153,6 +154,7 @@ private struct ChatMessageBody: View {
     let message: ClawdisChatMessage
     let isUser: Bool
     let style: ClawdisChatView.Style
+    let userAccent: Color?

     var body: some View {
         let text = self.primaryText
@@ -287,7 +289,7 @@ private struct ChatMessageBody: View {

     private var bubbleFillColor: Color {
         if self.isUser {
-            return ClawdisChatTheme.userBubble
+            return self.userAccent ?? ClawdisChatTheme.userBubble
         }
         if self.style == .onboarding {
             return ClawdisChatTheme.onboardingAssistantBubble
diff --git a/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatTheme.swift b/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatTheme.swift
index ac5466c9c..33ed55e94 100644
--- a/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatTheme.swift
+++ b/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatTheme.swift
@@ -101,11 +101,7 @@ enum ClawdisChatTheme {
     }

     static var userBubble: Color {
-        #if os(macOS)
-        Color(nsColor: .systemBlue)
-        #else
-        Color(uiColor: .systemBlue)
-        #endif
+        Color(red: 127 / 255.0, green: 184 / 255.0, blue: 212 / 255.0)
     }

     static var assistantBubble: Color {
diff --git a/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatView.swift b/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatView.swift
index acba80385..899150d94 100644
--- a/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatView.swift
+++ b/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatView.swift
@@ -9,10 +9,12 @@ public struct ClawdisChatView: View {
     @State private var viewModel: ClawdisChatViewModel
     @State private var scrollerBottomID = UUID()
+    @State private var scrollPosition: UUID?
     @State private var showSessions = false
     @State private var hasPerformedInitialScroll = false
     private let showsSessionSwitcher: Bool
     private let style: Style
+    private let userAccent: Color?

     private enum Layout {
         #if os(macOS)
@@ -37,11 +39,13 @@ public struct ClawdisChatView: View {
     public init(
         viewModel: ClawdisChatViewModel,
         showsSessionSwitcher: Bool = false,
-        style: Style = .standard)
+        style: Style = .standard,
+        userAccent: Color? = nil)
     {
         self._viewModel = State(initialValue: viewModel)
         self.showsSessionSwitcher = showsSessionSwitcher
         self.style = style
+        self.userAccent = userAccent
     }

     public var body: some View {
@@ -56,6 +60,7 @@ public struct ClawdisChatView: View {
             .padding(.horizontal, Layout.outerPaddingHorizontal)
             .padding(.vertical, Layout.outerPaddingVertical)
             .frame(maxWidth: .infinity)
+            .frame(maxHeight: .infinity, alignment: .top)
         }
         .frame(maxWidth: .infinity, maxHeight: .infinity, alignment: .top)
         .onAppear { self.viewModel.load() }
@@ -69,68 +74,78 @@ public struct ClawdisChatView: View {
     }

     private var messageList: some View {
-        ScrollViewReader { proxy in
-            ZStack {
-                ScrollView {
-                    LazyVStack(spacing: Layout.messageSpacing) {
-                        ForEach(self.visibleMessages) { msg in
-                            ChatMessageBubble(message: msg, style: self.style)
-                                .frame(
-                                    maxWidth: .infinity,
-                                    alignment: msg.role.lowercased() == "user" ? .trailing : .leading)
-                        }
-
-                        if self.viewModel.pendingRunCount > 0 {
-                            HStack {
-                                ChatTypingIndicatorBubble(style: self.style)
-                                    .equatable()
-                                Spacer(minLength: 0)
-                            }
-                        }
-
-                        if !self.viewModel.pendingToolCalls.isEmpty {
-                            ChatPendingToolsBubble(toolCalls: self.viewModel.pendingToolCalls)
-                                .equatable()
-                                .frame(maxWidth: .infinity, alignment: .leading)
-                        }
-
-                        if let text = self.viewModel.streamingAssistantText, !text.isEmpty {
-                            ChatStreamingAssistantBubble(text: text)
-                                .frame(maxWidth: .infinity, alignment: .leading)
-                        }
-
-                        Color.clear
-                            .frame(height: Layout.messageListPaddingBottom + 1)
-                            .id(self.scrollerBottomID)
-                    }
-                    .padding(.top, Layout.messageListPaddingTop)
-                    .padding(.horizontal, Layout.messageListPaddingHorizontal)
-                }
-
-                if self.viewModel.isLoading {
-                    ProgressView()
-                        .controlSize(.large)
-                        .frame(maxWidth: .infinity, maxHeight: .infinity)
+        ZStack {
+            ScrollView {
+                LazyVStack(spacing: Layout.messageSpacing) {
+                    self.messageListRows
                 }
+                // Use scroll targets for stable auto-scroll without ScrollViewReader relayout glitches.
+                .scrollTargetLayout()
+                .padding(.top, Layout.messageListPaddingTop)
+                .padding(.horizontal, Layout.messageListPaddingHorizontal)
             }
-            .onChange(of: self.viewModel.isLoading) { _, isLoading in
-                guard !isLoading, !self.hasPerformedInitialScroll else { return }
-                proxy.scrollTo(self.scrollerBottomID, anchor: .bottom)
-                self.hasPerformedInitialScroll = true
-            }
-            .onChange(of: self.viewModel.messages.count) { _, _ in
-                guard self.hasPerformedInitialScroll else { return }
-                withAnimation(.snappy(duration: 0.22)) {
-                    proxy.scrollTo(self.scrollerBottomID, anchor: .bottom)
-                }
-            }
-            .onChange(of: self.viewModel.pendingRunCount) { _, _ in
-                guard self.hasPerformedInitialScroll else { return }
-                withAnimation(.snappy(duration: 0.22)) {
-                    proxy.scrollTo(self.scrollerBottomID, anchor: .bottom)
-                }
+            // Keep the scroll pinned to the bottom for new messages.
+            .scrollPosition(id: self.$scrollPosition, anchor: .bottom)
+
+            if self.viewModel.isLoading {
+                ProgressView()
+                    .controlSize(.large)
+                    .frame(maxWidth: .infinity, maxHeight: .infinity)
             }
         }
+        // Ensure the message list claims vertical space on the first layout pass.
+        .frame(maxHeight: .infinity, alignment: .top)
+        .layoutPriority(1)
+        .onChange(of: self.viewModel.isLoading) { _, isLoading in
+            guard !isLoading, !self.hasPerformedInitialScroll else { return }
+            self.scrollPosition = self.scrollerBottomID
+            self.hasPerformedInitialScroll = true
+        }
+        .onChange(of: self.viewModel.messages.count) { _, _ in
+            guard self.hasPerformedInitialScroll else { return }
+            withAnimation(.snappy(duration: 0.22)) {
+                self.scrollPosition = self.scrollerBottomID
+            }
+        }
+        .onChange(of: self.viewModel.pendingRunCount) { _, _ in
+            guard self.hasPerformedInitialScroll else { return }
+            withAnimation(.snappy(duration: 0.22)) {
+                self.scrollPosition = self.scrollerBottomID
+            }
+        }
+    }
+
+    @ViewBuilder
+    private var messageListRows: some View {
+        ForEach(self.visibleMessages) { msg in
+            ChatMessageBubble(message: msg, style: self.style, userAccent: self.userAccent)
+                .frame(
+                    maxWidth: .infinity,
+                    alignment: msg.role.lowercased() == "user" ? .trailing : .leading)
+        }
+
+        if self.viewModel.pendingRunCount > 0 {
+            HStack {
+                ChatTypingIndicatorBubble(style: self.style)
+                    .equatable()
+                Spacer(minLength: 0)
+            }
+        }
+
+        if !self.viewModel.pendingToolCalls.isEmpty {
+            ChatPendingToolsBubble(toolCalls: self.viewModel.pendingToolCalls)
+                .equatable()
+                .frame(maxWidth: .infinity, alignment: .leading)
+        }
+
+        if let text = self.viewModel.streamingAssistantText, !text.isEmpty {
+            ChatStreamingAssistantBubble(text: text)
+                .frame(maxWidth: .infinity, alignment: .leading)
+        }
+
+        Color.clear
+            .frame(height: Layout.messageListPaddingBottom + 1)
+            .id(self.scrollerBottomID)
     }

     private var visibleMessages: [ClawdisChatMessage] {
diff --git a/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatViewModel.swift b/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatViewModel.swift
index 4c96b8075..087ef912c 100644
--- a/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatViewModel.swift
+++ b/apps/shared/ClawdisKit/Sources/ClawdisChatUI/ChatViewModel.swift
@@ -150,9 +150,36 @@ public final class ClawdisChatViewModel {
     }

     private static func decodeMessages(_ raw: [AnyCodable]) -> [ClawdisChatMessage] {
-        raw.compactMap { item in
+        let decoded = raw.compactMap { item in
             (try? ChatPayloadDecoding.decode(item, as: ClawdisChatMessage.self))
         }
+        return Self.dedupeMessages(decoded)
+    }
+
+    private static func dedupeMessages(_ messages: [ClawdisChatMessage]) -> [ClawdisChatMessage] {
+        var result: [ClawdisChatMessage] = []
+        result.reserveCapacity(messages.count)
+        var seen = Set<String>()
+
+        for message in messages {
+            guard let key = Self.dedupeKey(for: message) else {
+                result.append(message)
+                continue
+            }
+            if seen.contains(key) { continue }
+            seen.insert(key)
+            result.append(message)
+        }
+
+        return result
+    }
+
+    private static func dedupeKey(for message: ClawdisChatMessage) -> String? {
+        guard let timestamp = message.timestamp else { return nil }
+        let text = message.content.compactMap(\.text).joined(separator: "\n")
+            .trimmingCharacters(in: .whitespacesAndNewlines)
+        guard !text.isEmpty else { return nil }
+        return "\(message.role)|\(timestamp)|\(text)"
     }

     private func performSend() async {
@@ -293,8 +320,17 @@ public final class ClawdisChatViewModel {
             return
         }

-        if let runId = chat.runId, !self.pendingRuns.contains(runId) {
-            // Ignore events for other runs.
+        let isOurRun = chat.runId.flatMap { self.pendingRuns.contains($0) } ?? false
+        if !isOurRun {
+            // Keep multiple clients in sync: if another client finishes a run for our session, refresh history.
+            switch chat.state {
+            case "final", "aborted", "error":
+                self.streamingAssistantText = nil
+                self.pendingToolCallsById = [:]
+                Task { await self.refreshHistoryAfterRun() }
+            default:
+                break
+            }
             return
         }
diff --git a/apps/shared/ClawdisKit/Sources/ClawdisKit/AudioStreamingProtocols.swift b/apps/shared/ClawdisKit/Sources/ClawdisKit/AudioStreamingProtocols.swift
new file mode 100644
index 000000000..a211a4b3a
--- /dev/null
+++ b/apps/shared/ClawdisKit/Sources/ClawdisKit/AudioStreamingProtocols.swift
@@ -0,0 +1,16 @@
+import Foundation
+
+@MainActor
+public protocol StreamingAudioPlaying {
+    func play(stream: AsyncThrowingStream<Data, Error>) async -> StreamingPlaybackResult
+    func stop() -> Double?
+}
+
+@MainActor
+public protocol PCMStreamingAudioPlaying {
+    func play(stream: AsyncThrowingStream<Data, Error>, sampleRate: Double) async -> StreamingPlaybackResult
+    func stop() -> Double?
+}
+
+extension StreamingAudioPlayer: StreamingAudioPlaying {}
+extension PCMStreamingAudioPlayer: PCMStreamingAudioPlaying {}
diff --git a/apps/shared/ClawdisKit/Sources/ClawdisKit/ElevenLabsKitShim.swift b/apps/shared/ClawdisKit/Sources/ClawdisKit/ElevenLabsKitShim.swift
new file mode 100644
index 000000000..07fe91ac3
--- /dev/null
+++ b/apps/shared/ClawdisKit/Sources/ClawdisKit/ElevenLabsKitShim.swift
@@ -0,0 +1,9 @@
+@_exported import ElevenLabsKit
+
+public typealias ElevenLabsVoice = ElevenLabsKit.ElevenLabsVoice
+public typealias ElevenLabsTTSRequest = ElevenLabsKit.ElevenLabsTTSRequest
+public typealias ElevenLabsTTSClient = ElevenLabsKit.ElevenLabsTTSClient
+public typealias TalkTTSValidation = ElevenLabsKit.TalkTTSValidation
+public typealias StreamingAudioPlayer = ElevenLabsKit.StreamingAudioPlayer
+public typealias PCMStreamingAudioPlayer = ElevenLabsKit.PCMStreamingAudioPlayer
+public typealias StreamingPlaybackResult = ElevenLabsKit.StreamingPlaybackResult
diff --git a/apps/shared/ClawdisKit/Sources/ClawdisKit/JPEGTranscoder.swift b/apps/shared/ClawdisKit/Sources/ClawdisKit/JPEGTranscoder.swift
index 39761f131..f4b1cb951 100644
--- a/apps/shared/ClawdisKit/Sources/ClawdisKit/JPEGTranscoder.swift
+++ b/apps/shared/ClawdisKit/Sources/ClawdisKit/JPEGTranscoder.swift
@@ -7,6 +7,7 @@ public enum JPEGTranscodeError: LocalizedError, Sendable {
     case decodeFailed
     case propertiesMissing
     case encodeFailed
+    case sizeLimitExceeded(maxBytes: Int, actualBytes: Int)

     public var errorDescription: String? {
         switch self {
@@ -16,6 +17,8 @@ public enum JPEGTranscodeError: LocalizedError, Sendable {
             "Failed to read image properties"
         case .encodeFailed:
             "Failed to encode JPEG"
+        case let .sizeLimitExceeded(maxBytes, actualBytes):
+            "JPEG exceeds size limit (\(actualBytes) bytes > \(maxBytes) bytes)"
         }
     }
 }
@@ -32,7 +35,8 @@ public struct JPEGTranscoder: Sendable {
     public static func transcodeToJPEG(
         imageData: Data,
         maxWidthPx: Int?,
-        quality: Double) throws -> (data: Data, widthPx: Int, heightPx: Int)
+        quality: Double,
+        maxBytes: Int? = nil) throws -> (data: Data, widthPx: Int, heightPx: Int)
     {
         guard let src = CGImageSourceCreateWithData(imageData as CFData, nil) else {
             throw JPEGTranscodeError.decodeFailed
@@ -58,7 +62,7 @@ public struct JPEGTranscoder: Sendable {
         let orientedHeight = rotates90 ? pixelWidth : pixelHeight
         let maxDim = max(orientedWidth, orientedHeight)

-        let targetMaxPixelSize: Int = {
+        var targetMaxPixelSize: Int = {
             guard let maxWidthPx, maxWidthPx > 0 else { return maxDim }
             guard orientedWidth > maxWidthPx else { return maxDim } // never upscale
@@ -66,28 +70,66 @@ public struct JPEGTranscoder: Sendable {
             return max(1, Int((Double(maxDim) * scale).rounded(.toNearestOrAwayFromZero)))
         }()

-        let thumbOpts: [CFString: Any] = [
-            kCGImageSourceCreateThumbnailFromImageAlways: true,
-            kCGImageSourceCreateThumbnailWithTransform: true,
-            kCGImageSourceThumbnailMaxPixelSize: targetMaxPixelSize,
-            kCGImageSourceShouldCacheImmediately: true,
-        ]
+        func encode(maxPixelSize: Int, quality: Double) throws -> (data: Data, widthPx: Int, heightPx: Int) {
+            let thumbOpts: [CFString: Any] = [
+                kCGImageSourceCreateThumbnailFromImageAlways: true,
+                kCGImageSourceCreateThumbnailWithTransform: true,
+                kCGImageSourceThumbnailMaxPixelSize: maxPixelSize,
+                kCGImageSourceShouldCacheImmediately: true,
+            ]

-        guard let img = CGImageSourceCreateThumbnailAtIndex(src, 0, thumbOpts as CFDictionary) else {
-            throw JPEGTranscodeError.decodeFailed
+            guard let img = CGImageSourceCreateThumbnailAtIndex(src, 0, thumbOpts as CFDictionary) else {
+                throw JPEGTranscodeError.decodeFailed
+            }
+
+            let out = NSMutableData()
+            guard let dest = CGImageDestinationCreateWithData(out, UTType.jpeg.identifier as CFString, 1, nil) else {
+                throw JPEGTranscodeError.encodeFailed
+            }
+            let q = self.clampQuality(quality)
+            let encodeProps = [kCGImageDestinationLossyCompressionQuality: q] as CFDictionary
+            CGImageDestinationAddImage(dest, img, encodeProps)
+            guard CGImageDestinationFinalize(dest) else {
+                throw JPEGTranscodeError.encodeFailed
+            }
+
+            return (out as Data, img.width, img.height)
+        }

-        let out = NSMutableData()
-        guard let dest = CGImageDestinationCreateWithData(out, UTType.jpeg.identifier as CFString, 1, nil) else {
-            throw JPEGTranscodeError.encodeFailed
-        }
-        let q = self.clampQuality(quality)
-        let encodeProps = [kCGImageDestinationLossyCompressionQuality: q] as CFDictionary
-        CGImageDestinationAddImage(dest, img, encodeProps)
-        guard CGImageDestinationFinalize(dest) else {
-            throw JPEGTranscodeError.encodeFailed
+        guard let maxBytes, maxBytes > 0 else {
+            return try encode(maxPixelSize: targetMaxPixelSize, quality: quality)
         }

-        return (out as Data, img.width, img.height)
+        let minQuality = max(0.2, self.clampQuality(quality) * 0.35)
+        let minPixelSize = 256
+        var best = try encode(maxPixelSize: targetMaxPixelSize, quality: quality)
+        if best.data.count <= maxBytes {
+            return best
+        }
+
+        for _ in 0..<6 {
+            var q = self.clampQuality(quality)
+            for _ in 0..<6 {
+                let candidate = try encode(maxPixelSize: targetMaxPixelSize, quality: q)
+                best = candidate
+                if candidate.data.count <= maxBytes {
+                    return candidate
+                }
+                if q <= minQuality { break }
+                q = max(minQuality, q * 0.75)
+            }
+
+            let nextPixelSize = max(Int(Double(targetMaxPixelSize) * 0.85), minPixelSize)
+            if nextPixelSize == targetMaxPixelSize {
+                break
+            }
+            targetMaxPixelSize = nextPixelSize
+        }
+
+        if best.data.count > maxBytes {
+            throw JPEGTranscodeError.sizeLimitExceeded(maxBytes: maxBytes, actualBytes: best.data.count)
+        }
+
+        return best
     }
 }
diff --git a/apps/shared/ClawdisKit/Sources/ClawdisKit/Resources/CanvasScaffold/scaffold.html b/apps/shared/ClawdisKit/Sources/ClawdisKit/Resources/CanvasScaffold/scaffold.html
index d9a9cebfd..3236dc7dc 100644
--- a/apps/shared/ClawdisKit/Sources/ClawdisKit/Resources/CanvasScaffold/scaffold.html
+++ b/apps/shared/ClawdisKit/Sources/ClawdisKit/Resources/CanvasScaffold/scaffold.html
@@ -4,6 +4,21 @@
     <title>Canvas</title>
+