# Resemble Detect — Full API Reference

Detailed request/response schemas for every Resemble detection endpoint.
## Base

- Base URL: `https://app.resemble.ai/api/v2`
- Auth: `Authorization: Bearer <RESEMBLE_API_KEY>`
## Deepfake Detection

### POST /detect

Submit audio, image, or video for AI-generation analysis.

```json
{
  "url": "https://example.com/media.mp4",
  "visualize": true,
  "intelligence": true,
  "audio_source_tracing": true
}
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | HTTPS URL to audio, image, or video file |
| `callback_url` | string | No | Webhook URL for async completion notification |
| `visualize` | boolean | No | Generate heatmap/visualization artifacts |
| `intelligence` | boolean | No | Run multimodal intelligence alongside detection |
| `audio_source_tracing` | boolean | No | Identify which AI platform synthesized fake audio |
| `frame_length` | integer | No | Audio/video window size in seconds (1–4, default 2) |
| `start_region` | number | No | Start of segment to analyze (seconds) |
| `end_region` | number | No | End of segment to analyze (seconds) |
| `model_types` | string | No | `"image"` or `"talking_head"` (for face-swap detection) |
| `use_reverse_search` | boolean | No | Enable reverse image search (image only) |
| `use_ood_detector` | boolean | No | Enable out-of-distribution detection |
| `zero_retention_mode` | boolean | No | Auto-delete media after detection completes |
**Supported formats:** Audio (WAV, MP3, OGG, M4A, FLAC) · Video (MP4, MOV, AVI, WMV) · Image (JPG, PNG, GIF, WEBP)
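As a concrete illustration, the `POST /detect` body above can be assembled and submitted with a small standard-library helper. This is a sketch, not an official SDK: `build_detect_payload` and `submit_detect` are illustrative names, the parameter whitelist mirrors the table above, and `RESEMBLE_API_KEY` is assumed to be in the environment.

```python
import json
import os
import urllib.request

API_BASE = "https://app.resemble.ai/api/v2"

def build_detect_payload(media_url, **options):
    """Assemble a /detect request body; option names mirror the parameter table."""
    allowed = {
        "callback_url", "visualize", "intelligence", "audio_source_tracing",
        "frame_length", "start_region", "end_region", "model_types",
        "use_reverse_search", "use_ood_detector", "zero_retention_mode",
    }
    unknown = set(options) - allowed
    if unknown:
        raise ValueError(f"unknown detect parameters: {sorted(unknown)}")
    return {"url": media_url, **options}

def submit_detect(media_url, **options):
    """POST the payload to /detect and return the parsed JSON response."""
    req = urllib.request.Request(
        f"{API_BASE}/detect",
        data=json.dumps(build_detect_payload(media_url, **options)).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('RESEMBLE_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Rejecting unknown option names early catches typos like `visualise` before they silently vanish into the request body.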
### GET /detect/{uuid} — Poll for Results
Detection is asynchronous. Poll until status is "completed" or "failed". Start at 2s intervals, back off to 5s, then 10s. Most detections complete within 10–60s.
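The cadence above (start at 2s, back off to 5s, then 10s) can be sketched as an interval generator plus a generic poll loop. `poll_until_done` takes any zero-argument `fetch` callable (e.g. a wrapper around `GET /detect/{uuid}`); both names are illustrative, not part of the API.

```python
import time

def backoff_schedule():
    """Yield poll intervals: a few 2s tries, then 5s, then 10s forever."""
    yield from (2, 2, 2, 5, 5, 5)
    while True:
        yield 10

def poll_until_done(fetch, max_wait=120):
    """Call fetch() until the detection reaches a terminal status,
    or raise TimeoutError after roughly max_wait seconds of waiting."""
    waited = 0
    for interval in backoff_schedule():
        result = fetch()
        if result.get("status") in ("completed", "failed"):
            return result
        if waited >= max_wait:
            raise TimeoutError("detection did not finish in time")
        time.sleep(interval)
        waited += interval
```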
## Reading Results by Media Type
**Audio results** — in `metrics`:

```json
{
  "label": "fake",
  "score": ["0.92", "0.88", "0.95"],
  "consistency": "0.91",
  "aggregated_score": "0.92",
  "image": "https://..."
}
```

- `label`: `"fake"` or `"real"` — the verdict
- `score`: per-chunk prediction scores (array)
- `aggregated_score`: overall confidence (0.0–1.0; higher = more likely synthetic)
- `consistency`: how consistent the prediction is across chunks
- `image`: visualization heatmap URL (if `visualize: true`)
**Image results** — in `image_metrics`:

```json
{
  "type": "ImageAnalysis",
  "label": "fake",
  "score": 0.87,
  "image": "https://...",
  "ifl": { "score": 0.82, "heatmap": "https://..." },
  "reverse_image_search_sources": [
    { "url": "...", "title": "...", "verdict": "known_fake", "similarity": 0.95 }
  ]
}
```

- `ifl`: Invisible Frequency Layer analysis with heatmap
- `reverse_image_search_sources`: known online sources (if `use_reverse_search: true`)
**Video results** — in `video_metrics`:

```json
{
  "label": "fake",
  "score": 0.89,
  "certainty": 0.91,
  "children": [
    { "type": "VideoResult", "conclusion": "Fake", "score": 0.89, "timestamp": 2.5, "children": [...] }
  ]
}
```

- Hierarchical tree of frame-level and segment-level results
- Video with an audio track returns both `metrics` (audio) and `video_metrics` (visual)
## Intelligence

### POST /intelligence

Analyze media for rich structured insights, standalone or alongside detection.

```json
{ "url": "https://example.com/audio.mp3", "json": true }
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | One of | HTTPS URL to media file |
| `media_token` | string | One of | Token from secure upload (alternative to URL) |
| `detect_id` | string | No | UUID of an existing detection to associate |
| `media_type` | string | No | `"audio"`, `"video"`, or `"image"` (auto-detected) |
| `json` | boolean | No | Return structured fields (default: false for audio/video, true for image) |
| `callback_url` | string | No | Webhook for async mode |
**Audio/Video structured response** (`json: true`):

- `speaker_info` — speaker description (age, gender)
- `language` / `dialect` — detected language
- `emotion` — detected emotional state
- `speaking_style` — conversational, formal, etc.
- `context` — inferred context of the speech
- `message` — content summary
- `abnormalities` — anomalies detected in the media
- `transcription` — full transcript
- `translation` — translation if non-English
- `misinformation` — misinformation analysis
**Image structured response:**

- `scene_description` — what the image shows
- `subjects` — people/objects identified
- `authenticity_analysis` — visual authenticity assessment
- `context_and_setting` — environment description
- `abnormalities` — visual anomalies
- `misinformation` — misinformation analysis
### POST /detects/{detect_uuid}/intelligence — Ask Questions

After detection completes, ask natural-language questions about it:

```json
{ "query": "How confident is the model that this audio is fake?" }
```

Returns a question UUID. Poll `GET /detects/{detect_uuid}/intelligence/{question_uuid}` until status is `"completed"`.

**Prerequisite:** the detection must have `status: "completed"`; otherwise the endpoint returns 422.
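The ask-then-poll flow can be sketched as follows. `question_path` and `ask_question` are illustrative helper names built from the paths above; only the endpoint paths themselves come from the API.

```python
import json
import os
import urllib.request

API_BASE = "https://app.resemble.ai/api/v2"

def question_path(detect_uuid, question_uuid=None):
    """Ask path (POST) when question_uuid is None, answer path (GET) otherwise."""
    path = f"/detects/{detect_uuid}/intelligence"
    return path if question_uuid is None else f"{path}/{question_uuid}"

def ask_question(detect_uuid, query):
    """POST a natural-language question about a completed detection;
    the parsed response contains the question UUID to poll."""
    req = urllib.request.Request(
        API_BASE + question_path(detect_uuid),
        data=json.dumps({"query": query}).encode(),
        headers={
            "Authorization": f"Bearer {os.environ.get('RESEMBLE_API_KEY', '')}",
            "Content-Type": "application/json",
        },
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```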
## Audio Source Tracing

Enable by setting `audio_source_tracing: true` in `POST /detect`.

The result appears in the detection response under `audio_source_tracing`:

```json
{ "label": "elevenlabs", "error_message": null }
```

Known source labels: `resemble_ai`, `elevenlabs`, `real`, and others as the model expands.

**Important:** source tracing only runs when audio is labeled `"fake"`. If audio is `"real"`, no source tracing result appears.

Standalone queries:

- `GET /audio_source_tracings` — list all source tracing reports
- `GET /audio_source_tracings/{uuid}` — get a specific report
## Watermarking

### POST /watermark/apply

```json
{
  "url": "https://example.com/image.png",
  "strength": 0.3,
  "custom_message": "my-organization"
}
```
| Parameter | Type | Required | Description |
|---|---|---|---|
| `url` | string | Yes | HTTPS URL to media file |
| `strength` | number | No | Watermark strength 0.0–1.0 (image/video only, default 0.2) |
| `custom_message` | string | No | Custom message (image/video only, default `"resembleai"`) |

- Add a `Prefer: wait` header for a synchronous response
- Without it, poll `GET /watermark/apply/{uuid}/result`
- The response includes a `watermarked_media` URL to download the watermarked file
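A synchronous apply call can be sketched like this. `watermark_headers` and `apply_watermark` are illustrative names; the `Prefer: wait` header, the endpoint, and the body fields come from the section above.

```python
import json
import os
import urllib.request

def watermark_headers(synchronous=True):
    """Request headers for /watermark/apply; Prefer: wait asks for a sync response."""
    headers = {
        "Authorization": f"Bearer {os.environ.get('RESEMBLE_API_KEY', '')}",
        "Content-Type": "application/json",
    }
    if synchronous:
        headers["Prefer"] = "wait"
    return headers

def apply_watermark(media_url, strength=0.2, custom_message=None):
    """POST /watermark/apply and return the parsed response (sketch)."""
    body = {"url": media_url, "strength": strength}
    if custom_message is not None:
        body["custom_message"] = custom_message
    req = urllib.request.Request(
        "https://app.resemble.ai/api/v2/watermark/apply",
        data=json.dumps(body).encode(),
        headers=watermark_headers(),
        method="POST",
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)
```

Dropping `synchronous=True` and polling `GET /watermark/apply/{uuid}/result` instead is the better choice for large video files, where a held-open connection may time out.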
### POST /watermark/detect

```json
{ "url": "https://example.com/suspect-image.png" }
```

Audio detection result:

```json
{ "has_watermark": true, "confidence": 0.95 }
```

Image/Video detection result:

```json
{ "has_watermark": true }
```
## Identity — Speaker Verification (Beta)

Beta feature — requires joining the preview program. Inform the user if they encounter access errors.
### POST /identity — Create Identity Profile

```json
{
  "audio_url": "https://example.com/known-speaker.wav",
  "name": "Jane Doe"
}
```

### POST /identity/search — Search Against Known Identities

```json
{
  "audio_url": "https://example.com/unknown-speaker.wav",
  "top_k": 5
}
```

Response:

```json
{
  "success": true,
  "item": [
    { "uuid": "...", "name": "Jane Doe", "confidence": 0.92, "distance": 0.08 }
  ]
}
```

Lower distance = closer match; higher confidence = stronger match.
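Since lower `distance` means a closer match, picking the best candidate from the `item` array is a short filter-then-min. `best_match` and its `min_confidence` floor are illustrative; the field names come from the response above.

```python
def best_match(items, min_confidence=0.8):
    """Return the closest identity (smallest distance) that clears a
    confidence floor, or None if no candidate qualifies."""
    qualified = [i for i in items if i.get("confidence", 0.0) >= min_confidence]
    return min(qualified, key=lambda i: i["distance"], default=None)
```

Treating a `None` return as "no verified match" avoids acting on a top-1 result that is merely the least-bad candidate.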
## Text Detection

Beta feature — requires the `detect_beta_user` role or a billing plan that includes the `dfd_text` product.

### POST /text_detect

Add a `Prefer: wait` header for a synchronous response; otherwise poll or use a callback.
| Parameter | Type | Required | Description |
|---|---|---|---|
| `text` | string | Yes | Text to analyze (max 100,000 characters) |
| `thinking` | string | No | Always use `"low"` (default) |
| `threshold` | float | No | Decision threshold 0.0–1.0 (default: 0.5) |
| `callback_url` | string | No | Webhook URL for async completion notification |
| `privacy_mode` | boolean | No | If true, text content is not stored after analysis |
Response:

```json
{
  "success": true,
  "item": {
    "uuid": "abc-123",
    "status": "completed",
    "prediction": "ai",
    "confidence": 0.91,
    "text_content": "This is some text to analyze.",
    "privacy_mode": false,
    "created_at": "...",
    "updated_at": "..."
  }
}
```

- `prediction`: `"ai"` or `"human"` — the verdict
- `confidence`: 0.0–1.0, higher = more confident
- `status`: `"processing"`, `"completed"`, or `"failed"`
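Building a valid request body from the parameter table is mostly about enforcing the 100,000-character cap before the round trip. This is a sketch; `build_text_detect_payload` is an illustrative name and the defaults mirror the table.

```python
def build_text_detect_payload(text, threshold=0.5, privacy_mode=False):
    """Assemble a /text_detect body; rejects text over the 100,000-char cap."""
    if len(text) > 100_000:
        raise ValueError("text exceeds the 100,000-character limit")
    return {
        "text": text,
        "thinking": "low",      # the parameter table says to always use "low"
        "threshold": threshold,
        "privacy_mode": privacy_mode,
    }
```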
### GET /text_detect/{uuid} — Poll

Poll until status is `"completed"` or `"failed"`.

### GET /text_detect — List

Returns paginated text detections for the team.
## Callback

If `callback_url` was provided, a POST is sent on completion:

```json
{ "success": true, "item": { ... } }
```

On failure:

```json
{ "success": false, "item": { ... }, "error": "Error message here" }
```
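A webhook receiver only needs to split the two shapes above into a result and an optional error. `parse_callback` is an illustrative helper, not part of the API; the `success`/`item`/`error` keys come from the payloads shown.

```python
def parse_callback(body):
    """Split a completion callback into (item, error); error is None on success."""
    if body.get("success"):
        return body.get("item"), None
    return body.get("item"), body.get("error", "unknown error")
```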