AI Search Configuration
Worker Setup
// wrangler.jsonc
{
"ai": { "binding": "AI" }
}
interface Env {
AI: Ai;
}
const answer = await env.AI.autorag("my-instance").aiSearch({
query: "How do I configure caching?",
model: "@cf/meta/llama-3.3-70b-instruct-fp8-fast"
});
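The call above can be wrapped in a small helper so that empty queries are rejected before they reach the binding. This is a minimal sketch, not the official API surface: the `AiSearchEnv` and `AiSearchResult` types are hand-written stand-ins for the real binding types, the response shape (a `response` string plus a `data` array of sources) is an assumption, and `answerQuery` is a hypothetical name.

```typescript
// Minimal structural types so the sketch compiles without @cloudflare/workers-types.
// The response shape below is an assumption; check it against the binding's real types.
interface AiSearchResult {
  response: string;
  data: { filename: string; score: number }[];
}

interface AiSearchEnv {
  AI: {
    autorag(name: string): {
      aiSearch(params: { query: string; model?: string }): Promise<AiSearchResult>;
    };
  };
}

// Hypothetical helper: validates the query, runs the search, and returns
// the generated answer together with the filenames of the retrieved sources.
async function answerQuery(
  env: AiSearchEnv,
  instance: string,
  query: string,
): Promise<{ answer: string; sources: string[] }> {
  const trimmed = query.trim();
  if (trimmed.length === 0) {
    throw new Error("query must not be empty");
  }
  const result = await env.AI.autorag(instance).aiSearch({ query: trimmed });
  return {
    answer: result.response,
    sources: result.data.map((item) => item.filename),
  };
}
```

Inside a Worker this would be called as `await answerQuery(env, "my-instance", userQuery)`; keeping the binding behind a narrow interface also makes the helper easy to stub in tests.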
Data Sources
R2 Bucket
Dashboard: AI Search → Create Instance → Select R2 bucket
Supported formats: .md, .txt, .html, .pdf, .doc, .docx, .csv, .json
Auto-indexed metadata: filename, folder, timestamp
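Before uploading to the bucket, it can help to check which object keys have one of the supported extensions listed above. The following is a rough sketch: `isIndexable` is a hypothetical helper, and treating extensions case-insensitively is an assumption rather than documented behavior.

```typescript
// Extensions taken from the supported formats list above.
const SUPPORTED_EXTENSIONS = new Set([
  ".md", ".txt", ".html", ".pdf", ".doc", ".docx", ".csv", ".json",
]);

// Hypothetical helper: returns true when the object key ends in a supported extension.
// Case-insensitive matching is an assumption.
function isIndexable(key: string): boolean {
  const name = key.slice(key.lastIndexOf("/") + 1);
  const dot = name.lastIndexOf(".");
  if (dot <= 0) return false; // no extension, or a dotfile such as ".md"
  return SUPPORTED_EXTENSIONS.has(name.slice(dot).toLowerCase());
}
```

For example, `isIndexable("docs/guide.md")` is true, while `isIndexable("img/logo.png")` and `isIndexable("README")` are false.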
Website Crawler
Requirements:
- Domain on Cloudflare
- sitemap.xml at root
- Bot protection must allow CloudflareAISearch user agent
Path Filtering (R2)
docs/**/*.md # All .md in docs/ recursively
**/*.draft.md # All .draft.md files at any depth (use as an exclude pattern)
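To preview which keys a pair of include and exclude patterns would select, the patterns can be approximated locally. This sketch assumes conventional glob semantics (`*` stays within one path segment, `**/` spans zero or more directories); the matcher AI Search actually uses may differ, and `globToRegExp` and `isIncluded` are hypothetical names.

```typescript
// Converts a glob into an anchored regular expression.
// Assumed semantics: "*" matches within a segment, "**/" matches zero or more directories.
function globToRegExp(glob: string): RegExp {
  let pattern = "";
  let i = 0;
  while (i < glob.length) {
    if (glob.startsWith("**/", i)) {
      pattern += "(?:.*/)?";
      i += 3;
    } else if (glob.startsWith("**", i)) {
      pattern += ".*";
      i += 2;
    } else if (glob.charAt(i) === "*") {
      pattern += "[^/]*";
      i += 1;
    } else {
      // Escape regex metacharacters so they match literally.
      pattern += glob.charAt(i).replace(/[.+?^${}()|[\]\\]/g, "\\$&");
      i += 1;
    }
  }
  return new RegExp("^" + pattern + "$");
}

// A path is selected when it matches any include pattern and no exclude pattern.
function isIncluded(path: string, include: string[], exclude: string[]): boolean {
  const matches = (globs: string[]) => globs.some((g) => globToRegExp(g).test(path));
  return matches(include) && !matches(exclude);
}
```

With `include = ["docs/**/*.md"]` and `exclude = ["**/*.draft.md"]`, `docs/api/v2/auth.md` is selected, while `docs/api/wip.draft.md` and `blog/post.md` are not.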
Indexing
- Automatic: Every 6 hours
- Force Sync: Dashboard button (30s rate limit between syncs)
- Pause: Settings → Pause Indexing (existing index remains searchable)
Service API Token
Dashboard: AI Search → Instance → Use AI Search → API → Create Token
Permissions:
- Read - search operations
- Edit - instance management
Store securely:
wrangler secret put AI_SEARCH_TOKEN
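Once stored, the token would typically be sent as a bearer token when calling AI Search over HTTP from outside the binding. The sketch below only builds the request; the endpoint path is an assumption based on the AutoRAG REST API and should be verified against the current Cloudflare API reference. `buildAiSearchRequest` and its parameters are hypothetical names.

```typescript
interface AiSearchRequest {
  url: string;
  init: { method: string; headers: Record<string, string>; body: string };
}

// Hypothetical helper: assembles an authenticated search request.
// The URL layout is an assumption; confirm it in the API reference before relying on it.
function buildAiSearchRequest(
  accountId: string,
  instance: string,
  token: string,
  query: string,
): AiSearchRequest {
  const url =
    "https://api.cloudflare.com/client/v4/accounts/" +
    encodeURIComponent(accountId) +
    "/autorag/rags/" +
    encodeURIComponent(instance) +
    "/ai-search";
  return {
    url,
    init: {
      method: "POST",
      headers: {
        Authorization: `Bearer ${token}`,
        "Content-Type": "application/json",
      },
      body: JSON.stringify({ query }),
    },
  };
}
```

In a Worker, the token would come from `env.AI_SEARCH_TOKEN`, and the result can be passed straight to `fetch(url, init)`. A token with Read permission should be enough for search operations.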
Multi-Environment
# wrangler.toml
[env.production.vars]
AI_SEARCH_INSTANCE = "prod-docs"
[env.staging.vars]
AI_SEARCH_INSTANCE = "staging-docs"
const answer = await env.AI.autorag(env.AI_SEARCH_INSTANCE).aiSearch({ query });
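If `AI_SEARCH_INSTANCE` is missing from an environment's vars, the call above would pass `undefined` to `autorag()`. A small guard can fail fast with a clearer message instead. This is a sketch; `resolveInstance` is a hypothetical helper, not part of the binding.

```typescript
// Hypothetical helper: returns the configured instance name or throws a descriptive error.
function resolveInstance(env: { AI_SEARCH_INSTANCE?: string }): string {
  const name = env.AI_SEARCH_INSTANCE?.trim();
  if (!name) {
    throw new Error("AI_SEARCH_INSTANCE is not set for this environment");
  }
  return name;
}
```

It would be used as `env.AI.autorag(resolveInstance(env)).aiSearch({ query })`, so a misconfigured staging or production deploy surfaces the problem on the first request.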
Monitoring
const instances = await env.AI.autorag("_").listInstances();
console.log(instances.find(i => i.name === "docs"));
Dashboard shows: files indexed, status, last index time, storage usage.