fal-audio/.skillshare-meta.json (new file)
@@ -0,0 +1,8 @@
{
  "source": "github.com/fal-ai-community/skills/tree/main/skills/claude.ai/fal-audio",
  "type": "github-subdir",
  "installed_at": "2026-01-30T02:27:04.562065987Z",
  "repo_url": "https://github.com/fal-ai-community/skills.git",
  "subdir": "skills/claude.ai/fal-audio",
  "version": "69efe6e"
}
fal-audio/SKILL.md (new file)
@@ -0,0 +1,249 @@
---
name: fal-audio
description: Text-to-speech and speech-to-text using fal.ai audio models. Use when the user requests "Convert text to speech", "Transcribe audio", "Generate voice", "Speech to text", "TTS", "STT", or similar audio tasks.
metadata:
  author: fal-ai
  version: "1.0.0"
---

# fal.ai Audio

Text-to-speech and speech-to-text using state-of-the-art audio models on fal.ai.

## How It Works

1. User provides text (for TTS) or an audio URL (for STT)
2. Script selects the appropriate model
3. Script sends the request to the fal.ai API
4. Script returns an audio URL (TTS) or the transcription text (STT)
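The request in steps 3 and 4 is a plain HTTPS POST to `https://fal.run/<model-id>`, the endpoint both bundled scripts use. A minimal sketch of building the TTS payload, assuming `python3` is available for JSON escaping (the bundled scripts use a plain heredoc instead):

```shell
#!/bin/bash
# Build a JSON payload for a TTS request; python3's json.dumps handles
# quoting that naive string interpolation would get wrong.
TEXT='Hello, "world".'
PAYLOAD=$(python3 -c 'import json, sys; print(json.dumps({"text": sys.argv[1]}))' "$TEXT")
echo "$PAYLOAD"
# The actual call (not run here) would then be:
#   curl -s -X POST "https://fal.run/fal-ai/minimax/speech-2.6-turbo" \
#     -H "Authorization: Key $FAL_KEY" -H "Content-Type: application/json" \
#     -d "$PAYLOAD"
```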

## Text-to-Speech Models

| Model | Notes |
|-------|-------|
| `fal-ai/minimax/speech-2.6-hd` | **Best quality** |
| `fal-ai/minimax/speech-2.6-turbo` | Fast, good quality |
| `fal-ai/elevenlabs/eleven-v3` | Natural voices |
| `fal-ai/chatterbox/multilingual` | Multi-language, fast |
| `fal-ai/kling-video/v1/tts` | For video sync |

## Text-to-Music Models

| Model | Notes |
|-------|-------|
| `fal-ai/minimax-music/v2` | **Best quality** |
| `fal-ai/minimax-music/v1.5` | Fast |
| `fal-ai/lyria2` | Google's model |
| `fal-ai/elevenlabs/music` | Song generation |
| `fal-ai/sonauto/v2` | Instrumental |
| `fal-ai/ace-step` | Short clips |
| `fal-ai/beatoven` | Background music |

## Speech-to-Text Models

| Model | Features | Speed |
|-------|----------|-------|
| `fal-ai/whisper` | Multi-language, timestamps | Fast |
| `fal-ai/elevenlabs/scribe` | Speaker diarization | Medium |
## Usage

### Text-to-Speech

```bash
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh [options]
```

**Arguments:**
- `--text` - Text to convert to speech (required)
- `--model` - TTS model (defaults to `fal-ai/minimax/speech-2.6-turbo`)
- `--voice` - Voice ID or name (model-specific)

**Examples:**

```bash
# Basic TTS (fast, good quality)
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "Hello, welcome to the future of AI."

# High quality with MiniMax HD
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "This is premium quality speech." \
  --model "fal-ai/minimax/speech-2.6-hd"

# Natural voices with ElevenLabs
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "Natural sounding voice generation" \
  --model "fal-ai/elevenlabs/eleven-v3"

# Multi-language TTS
bash /mnt/skills/user/fal-audio/scripts/text-to-speech.sh \
  --text "Bonjour, bienvenue dans le futur." \
  --model "fal-ai/chatterbox/multilingual"
```
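The script prints the raw JSON response on stdout, so the generated file's URL can be recovered with the same grep/cut pattern the script uses internally. A sketch, where the response shape shown is an illustrative assumption:

```shell
#!/bin/bash
# Example TTS response shape (assumption); real responses may carry more fields.
RESPONSE='{"audio":{"url":"https://v3.fal.media/files/abc123/speech.mp3"}}'
# Pull out the first "url" field, exactly as the script does internally.
AUDIO_URL=$(echo "$RESPONSE" | grep -o '"url":"[^"]*"' | head -1 | cut -d'"' -f4)
echo "$AUDIO_URL"
```

The file could then be fetched with `curl -s -o speech.mp3 "$AUDIO_URL"`.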

### Speech-to-Text

```bash
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh [options]
```

**Arguments:**
- `--audio-url` - URL of audio file to transcribe (required)
- `--model` - STT model (defaults to `fal-ai/whisper`)
- `--language` - Language code (optional, auto-detected)

**Examples:**

```bash
# Transcribe with Whisper
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
  --audio-url "https://example.com/audio.mp3"

# Transcribe with speaker diarization
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
  --audio-url "https://example.com/meeting.mp3" \
  --model "fal-ai/elevenlabs/scribe"

# Transcribe specific language
bash /mnt/skills/user/fal-audio/scripts/speech-to-text.sh \
  --audio-url "https://example.com/spanish.mp3" \
  --language "es"
```
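As with TTS, the transcription script emits raw JSON on stdout; the `text` and `language` fields can be pulled out with the grep/cut pattern the script itself uses. The response shape below is an assumption for illustration:

```shell
#!/bin/bash
# Example STT response shape (assumption).
RESPONSE='{"text":"Hola, buenos dias.","language":"es"}'
TEXT=$(echo "$RESPONSE" | grep -o '"text":"[^"]*"' | head -1 | cut -d'"' -f4)
LANG_CODE=$(echo "$RESPONSE" | grep -o '"language":"[^"]*"' | head -1 | cut -d'"' -f4)
echo "$TEXT ($LANG_CODE)"
```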

## MCP Tool Alternative

### Text-to-Speech
```javascript
mcp__fal-ai__generate({
  modelId: "fal-ai/minimax/speech-2.6-turbo",
  input: {
    text: "Hello, welcome to the future of AI."
  }
})
```

### Speech-to-Text
```javascript
mcp__fal-ai__generate({
  modelId: "fal-ai/whisper",
  input: {
    audio_url: "https://example.com/audio.mp3"
  }
})
```

## Output

### Text-to-Speech Output
```
Generating speech...
Model: fal-ai/minimax/speech-2.6-turbo

Speech generated!

Audio URL: https://v3.fal.media/files/abc123/speech.mp3
Duration: 5.2s
```

### Speech-to-Text Output
```
Transcribing audio...
Model: fal-ai/whisper

Transcription complete!

Text: "Hello, this is the transcribed text from the audio file."
Duration: 12.5s
Language: en
```
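Both scripts write progress messages to stderr and the JSON to stdout, so capturing only the machine-readable part is a matter of silencing stderr. A sketch using a stand-in function in place of a real script invocation:

```shell
#!/bin/bash
# Stand-in for a script call: progress on stderr, JSON on stdout,
# mirroring how both bundled scripts behave.
run_fake_script() {
  echo "Generating speech..." >&2
  echo '{"ok":true}'
}
# Discard stderr; only the JSON survives into the variable.
JSON=$(run_fake_script 2>/dev/null)
echo "$JSON"
```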

## Present Results to User

### For TTS:
```
Here's the generated speech:

[Download audio](https://v3.fal.media/files/.../speech.mp3)

• Duration: 5.2s | Model: MiniMax Speech 2.6 Turbo
```

### For STT:
```
Here's the transcription:

"Hello, this is the transcribed text from the audio file."

• Duration: 12.5s | Language: English
```

## Model Selection Guide

### Text-to-Speech

**MiniMax Speech 2.6 HD** (`fal-ai/minimax/speech-2.6-hd`)
- Best for: Premium quality requirements
- Quality: **Highest**
- Speed: Medium

**MiniMax Speech 2.6 Turbo** (`fal-ai/minimax/speech-2.6-turbo`)
- Best for: General use with good quality
- Quality: High
- Speed: Fast

**ElevenLabs v3** (`fal-ai/elevenlabs/eleven-v3`)
- Best for: Natural, realistic voices
- Quality: High
- Features: Many voice options

**Chatterbox Multilingual** (`fal-ai/chatterbox/multilingual`)
- Best for: Multi-language support
- Quality: Good
- Speed: Fast

### Text-to-Music

**MiniMax Music v2** (`fal-ai/minimax-music/v2`)
- Best for: High-quality music generation
- Quality: **Highest**

**Lyria2** (`fal-ai/lyria2`)
- Google's music generation model
- Quality: High

### Speech-to-Text

**Whisper** (`fal-ai/whisper`)
- Best for: General transcription with timestamps
- Languages: 99+
- Features: Word-level timestamps

**ElevenLabs Scribe** (`fal-ai/elevenlabs/scribe`)
- Best for: Multi-speaker recordings
- Features: Speaker diarization
- Quality: Professional-grade
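The TTS guide above can be collapsed into a tiny helper when a workflow wants to pick a model from a one-word requirement. The keyword-to-model mapping is a convenience assumption, not part of the bundled scripts:

```shell
#!/bin/bash
# Map a requirement keyword to a model ID from the selection guide above;
# anything unrecognized falls back to the scripts' default model.
pick_tts_model() {
  case "$1" in
    quality)      echo "fal-ai/minimax/speech-2.6-hd" ;;
    natural)      echo "fal-ai/elevenlabs/eleven-v3" ;;
    multilingual) echo "fal-ai/chatterbox/multilingual" ;;
    *)            echo "fal-ai/minimax/speech-2.6-turbo" ;;
  esac
}
pick_tts_model quality
```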
## Troubleshooting

### Empty Audio
```
Error: Generated audio is empty

Check that your text is not empty and contains valid content.
```

### Unsupported Audio Format
```
Error: Audio format not supported

Supported formats: MP3, WAV, M4A, FLAC, OGG
Convert your audio to a supported format.
```
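A quick client-side check against the supported-format list above can catch this before any API call. Extension matching only; a sketch, not part of the shipped scripts:

```shell
#!/bin/bash
# Return success when the file extension (case-insensitive) is in the
# supported list documented above.
is_supported_format() {
  local ext="${1##*.}"
  case "$(echo "$ext" | tr '[:upper:]' '[:lower:]')" in
    mp3|wav|m4a|flac|ogg) return 0 ;;
    *) return 1 ;;
  esac
}
is_supported_format "meeting.MP3" && echo "supported"
is_supported_format "meeting.aiff" || echo "unsupported"
```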

### Language Detection Failed
```
Warning: Could not detect language, defaulting to English

Specify the language explicitly with the --language option.
```

fal-audio/scripts/speech-to-text.sh (new executable file)
@@ -0,0 +1,141 @@
#!/bin/bash

# fal.ai Speech-to-Text Script
# Usage: ./speech-to-text.sh --audio-url URL [--model MODEL] [--language LANG]
# Returns: JSON with transcription

set -e

FAL_API_ENDPOINT="https://fal.run"

# Default values
MODEL="fal-ai/whisper"
AUDIO_URL=""
LANGUAGE=""

# Handle --add-fal-key first. Track the previous argument so the key is read
# from the position after the flag wherever it appears; prompt if absent.
PREV=""
for arg in "$@" ""; do
  if [ "$PREV" = "--add-fal-key" ]; then
    KEY_VALUE=""
    if [[ -n "$arg" && ! "$arg" =~ ^-- ]]; then
      KEY_VALUE="$arg"
    fi
    if [ -z "$KEY_VALUE" ]; then
      echo "Enter your fal.ai API key:" >&2
      read -r KEY_VALUE
    fi
    if [ -n "$KEY_VALUE" ]; then
      grep -v "^FAL_KEY=" .env > .env.tmp 2>/dev/null || true
      mv .env.tmp .env 2>/dev/null || true
      echo "FAL_KEY=$KEY_VALUE" >> .env
      echo "FAL_KEY saved to .env" >&2
    fi
    exit 0
  fi
  PREV="$arg"
done

# Load .env if it exists
if [ -f ".env" ]; then
  source .env 2>/dev/null || true
fi

# Parse arguments
while [[ $# -gt 0 ]]; do
  case $1 in
    --audio-url)
      AUDIO_URL="$2"
      shift 2
      ;;
    --model)
      MODEL="$2"
      shift 2
      ;;
    --language)
      LANGUAGE="$2"
      shift 2
      ;;
    --help|-h)
      echo "fal.ai Speech-to-Text Script" >&2
      echo "" >&2
      echo "Usage:" >&2
      echo "  ./speech-to-text.sh --audio-url URL [options]" >&2
      echo "" >&2
      echo "Options:" >&2
      echo "  --audio-url    Audio URL to transcribe (required)" >&2
      echo "  --model        Model ID (default: fal-ai/whisper)" >&2
      echo "  --language     Language code (auto-detected if omitted)" >&2
      echo "  --add-fal-key  Setup FAL_KEY in .env" >&2
      exit 0
      ;;
    *)
      shift
      ;;
  esac
done

# Validate required inputs
if [ -z "$FAL_KEY" ]; then
  echo "Error: FAL_KEY not set" >&2
  echo "" >&2
  echo "Run: ./speech-to-text.sh --add-fal-key" >&2
  echo "Or: export FAL_KEY=your_key_here" >&2
  exit 1
fi

if [ -z "$AUDIO_URL" ]; then
  echo "Error: --audio-url is required" >&2
  exit 1
fi

# Create temp directory (removed on exit)
TEMP_DIR=$(mktemp -d)
trap 'rm -rf "$TEMP_DIR"' EXIT

echo "Transcribing audio..." >&2
echo "Model: $MODEL" >&2
echo "" >&2

# Build payload. NB: values are interpolated verbatim, so a URL or language
# code containing double quotes would produce invalid JSON.
if [ -n "$LANGUAGE" ]; then
  PAYLOAD=$(cat <<EOF
{
  "audio_url": "$AUDIO_URL",
  "language": "$LANGUAGE"
}
EOF
)
else
  PAYLOAD=$(cat <<EOF
{
  "audio_url": "$AUDIO_URL"
}
EOF
)
fi

# Make API request
RESPONSE=$(curl -s -X POST "$FAL_API_ENDPOINT/$MODEL" \
  -H "Authorization: Key $FAL_KEY" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD")

# Check for errors
if echo "$RESPONSE" | grep -q '"error"'; then
  ERROR_MSG=$(echo "$RESPONSE" | grep -o '"message":"[^"]*"' | head -1 | cut -d'"' -f4)
  if [ -z "$ERROR_MSG" ]; then
    ERROR_MSG=$(echo "$RESPONSE" | grep -o '"error":"[^"]*"' | cut -d'"' -f4)
  fi
  echo "Error: $ERROR_MSG" >&2
  exit 1
fi

echo "Transcription complete!" >&2
echo "" >&2

# Extract text (simple grep; does not handle escaped quotes inside the field)
TEXT=$(echo "$RESPONSE" | grep -o '"text":"[^"]*"' | head -1 | cut -d'"' -f4)
echo "Text: $TEXT" >&2

# Output JSON for programmatic use
echo "$RESPONSE"

fal-audio/scripts/text-to-speech.sh (new executable file)
@@ -0,0 +1,141 @@
#!/bin/bash

# fal.ai Text-to-Speech Script
# Usage: ./text-to-speech.sh --text "..." [--model MODEL] [--voice VOICE]
# Returns: JSON with audio URL

set -e

FAL_API_ENDPOINT="https://fal.run"

# Default values
MODEL="fal-ai/minimax/speech-2.6-turbo"
TEXT=""
VOICE=""

# Handle --add-fal-key first. Track the previous argument so the key is read
# from the position after the flag wherever it appears; prompt if absent.
PREV=""
for arg in "$@" ""; do
  if [ "$PREV" = "--add-fal-key" ]; then
    KEY_VALUE=""
    if [[ -n "$arg" && ! "$arg" =~ ^-- ]]; then
      KEY_VALUE="$arg"
    fi
    if [ -z "$KEY_VALUE" ]; then
      echo "Enter your fal.ai API key:" >&2
      read -r KEY_VALUE
    fi
    if [ -n "$KEY_VALUE" ]; then
      grep -v "^FAL_KEY=" .env > .env.tmp 2>/dev/null || true
      mv .env.tmp .env 2>/dev/null || true
      echo "FAL_KEY=$KEY_VALUE" >> .env
      echo "FAL_KEY saved to .env" >&2
    fi
    exit 0
  fi
  PREV="$arg"
done

# Load .env if it exists
if [ -f ".env" ]; then
  source .env 2>/dev/null || true
fi

# Parse arguments
while [[ $# -gt 0 ]]; do
  case $1 in
    --text)
      TEXT="$2"
      shift 2
      ;;
    --model)
      MODEL="$2"
      shift 2
      ;;
    --voice)
      VOICE="$2"
      shift 2
      ;;
    --help|-h)
      echo "fal.ai Text-to-Speech Script" >&2
      echo "" >&2
      echo "Usage:" >&2
      echo "  ./text-to-speech.sh --text \"...\" [options]" >&2
      echo "" >&2
      echo "Options:" >&2
      echo "  --text         Text to convert (required)" >&2
      echo "  --model        Model ID (default: fal-ai/minimax/speech-2.6-turbo)" >&2
      echo "  --voice        Voice ID (model-specific)" >&2
      echo "  --add-fal-key  Setup FAL_KEY in .env" >&2
      exit 0
      ;;
    *)
      shift
      ;;
  esac
done

# Validate required inputs
if [ -z "$FAL_KEY" ]; then
  echo "Error: FAL_KEY not set" >&2
  echo "" >&2
  echo "Run: ./text-to-speech.sh --add-fal-key" >&2
  echo "Or: export FAL_KEY=your_key_here" >&2
  exit 1
fi

if [ -z "$TEXT" ]; then
  echo "Error: --text is required" >&2
  exit 1
fi

# Create temp directory (removed on exit)
TEMP_DIR=$(mktemp -d)
trap 'rm -rf "$TEMP_DIR"' EXIT

echo "Generating speech..." >&2
echo "Model: $MODEL" >&2
echo "" >&2

# Build payload. NB: values are interpolated verbatim, so text containing
# double quotes would produce invalid JSON.
if [ -n "$VOICE" ]; then
  PAYLOAD=$(cat <<EOF
{
  "text": "$TEXT",
  "voice": "$VOICE"
}
EOF
)
else
  PAYLOAD=$(cat <<EOF
{
  "text": "$TEXT"
}
EOF
)
fi

# Make API request
RESPONSE=$(curl -s -X POST "$FAL_API_ENDPOINT/$MODEL" \
  -H "Authorization: Key $FAL_KEY" \
  -H "Content-Type: application/json" \
  -d "$PAYLOAD")

# Check for errors
if echo "$RESPONSE" | grep -q '"error"'; then
  ERROR_MSG=$(echo "$RESPONSE" | grep -o '"message":"[^"]*"' | head -1 | cut -d'"' -f4)
  if [ -z "$ERROR_MSG" ]; then
    ERROR_MSG=$(echo "$RESPONSE" | grep -o '"error":"[^"]*"' | cut -d'"' -f4)
  fi
  echo "Error: $ERROR_MSG" >&2
  exit 1
fi

echo "Speech generated!" >&2
echo "" >&2

# Extract audio URL (first "url" field in the response)
AUDIO_URL=$(echo "$RESPONSE" | grep -o '"url":"[^"]*"' | head -1 | cut -d'"' -f4)
echo "Audio URL: $AUDIO_URL" >&2

# Output JSON for programmatic use
echo "$RESPONSE"