awesome-copilot/skills/resemble-detect/references/api-reference.md
Dev Shah 1aea01a677 Add resemble-detect skill (#1396)
* Add resemble-detect skill

Deepfake detection and media safety skill using Resemble AI — detects
AI-generated audio, images, video, and text with confidence scores,
traces audio source platforms, applies and reads watermarks, verifies
speaker identity, and extracts media intelligence (speaker, emotion,
misinformation signals).

Packaged as SKILL.md + LICENSE (Apache-2.0). Generated docs updated
via npm start per CONTRIBUTING.md.

* resemble-detect: trim body under 500 lines + add compatibility

Moves detailed request/response schemas from SKILL.md into
references/api-reference.md, bringing the SKILL body from 557 to
282 lines (validator hard cap is 500). Core decision-making content
— capability decision tree, score interpretation, workflows, red
flags — stays in the body where the agent needs it at query time.

Also adds a compatibility field to frontmatter per review feedback:
surfaces the RESEMBLE_API_KEY requirement and the public-HTTPS-URL
constraint upfront.

* Fix resemble-detect skill metadata
2026-04-28 12:09:46 +10:00


Resemble Detect — Full API Reference

Detailed request/response schemas for every Resemble detection endpoint.

Base

  • Base URL: https://app.resemble.ai/api/v2
  • Auth: Authorization: Bearer <RESEMBLE_API_KEY>
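As a minimal sketch, the base URL and bearer token could be attached to a request with the Python standard library alone. The helper name `build_request` and the JSON `Content-Type` header are assumptions for illustration; the reference itself only specifies the base URL and the `Authorization` header.

```python
import json
import os
import urllib.request

BASE_URL = "https://app.resemble.ai/api/v2"


def build_request(path, payload=None, api_key=None, method=None):
    """Build (but do not send) an authenticated request.

    Hypothetical helper: sending the body as JSON with a Content-Type
    header is an assumption, not something this reference spells out.
    """
    api_key = api_key or os.environ["RESEMBLE_API_KEY"]
    data = json.dumps(payload).encode("utf-8") if payload is not None else None
    headers = {"Authorization": f"Bearer {api_key}"}
    if data is not None:
        headers["Content-Type"] = "application/json"
    return urllib.request.Request(
        BASE_URL + path,
        data=data,
        headers=headers,
        method=method or ("POST" if data is not None else "GET"),
    )
```

The returned object can be passed to `urllib.request.urlopen` when a real call is wanted; keeping construction separate from sending makes the helper easy to inspect offline.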

Deepfake Detection

POST /detect

Submit audio, image, or video for AI-generation analysis.

{
  "url": "https://example.com/media.mp4",
  "visualize": true,
  "intelligence": true,
  "audio_source_tracing": true
}

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | HTTPS URL to audio, image, or video file |
| callback_url | string | No | Webhook URL for async completion notification |
| visualize | boolean | No | Generate heatmap/visualization artifacts |
| intelligence | boolean | No | Run multimodal intelligence alongside detection |
| audio_source_tracing | boolean | No | Identify which AI platform synthesized fake audio |
| frame_length | integer | No | Audio/video window size in seconds (1-4, default 2) |
| start_region | number | No | Start of segment to analyze (seconds) |
| end_region | number | No | End of segment to analyze (seconds) |
| model_types | string | No | "image" or "talking_head" (for face-swap detection) |
| use_reverse_search | boolean | No | Enable reverse image search (image only) |
| use_ood_detector | boolean | No | Enable out-of-distribution detection |
| zero_retention_mode | boolean | No | Auto-delete media after detection completes |

Supported formats: Audio (WAV, MP3, OGG, M4A, FLAC) · Video (MP4, MOV, AVI, WMV) · Image (JPG, PNG, GIF, WEBP)
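The request body for POST /detect could be assembled along these lines. This is a sketch only: `build_detect_payload` is a hypothetical helper, and the client-side checks (HTTPS scheme, `start_region` before `end_region`) are assumptions about sensible input rather than documented server rules.

```python
from urllib.parse import urlparse


def build_detect_payload(url, **options):
    """Assemble a POST /detect body, dropping options left as None."""
    if urlparse(url).scheme != "https":
        raise ValueError("url must be an HTTPS URL")
    start, end = options.get("start_region"), options.get("end_region")
    # Assumed sanity check; the reference does not say how the API reacts.
    if start is not None and end is not None and start >= end:
        raise ValueError("start_region must be before end_region")
    payload = {"url": url}
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```

For example, `build_detect_payload("https://example.com/media.mp4", visualize=True, intelligence=True)` yields a body with only the keys that were actually set.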

GET /detect/{uuid} — Poll for Results

Detection is asynchronous. Poll until status is "completed" or "failed". Start at 2s intervals, back off to 5s, then 10s. Most detections complete within 10-60s.
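The polling schedule described above might look like the following sketch. `fetch` is a hypothetical callable that performs GET /detect/{uuid} and returns a dict carrying `status`; the number of polls spent at each interval and the 300s timeout are assumptions, since the reference only gives the 2s, 5s, 10s progression.

```python
import time


def poll_intervals(fast=3, medium=3):
    """Yield 2s waits, then 5s waits, then 10s waits indefinitely."""
    for _ in range(fast):
        yield 2
    for _ in range(medium):
        yield 5
    while True:
        yield 10


def poll_detection(fetch, uuid, timeout=300, sleep=time.sleep):
    """Call fetch(uuid) until the status is terminal or the timeout elapses."""
    waited = 0
    for delay in poll_intervals():
        result = fetch(uuid)
        if result.get("status") in ("completed", "failed"):
            return result
        if waited + delay > timeout:
            raise TimeoutError(f"detection {uuid} still pending after {waited}s")
        sleep(delay)
        waited += delay
```

Passing `sleep` as a parameter keeps the loop testable without real waiting.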

Reading Results by Media Type

Audio results — in metrics:

{
  "label": "fake",
  "score": ["0.92", "0.88", "0.95"],
  "consistency": "0.91",
  "aggregated_score": "0.92",
  "image": "https://..."
}
  • label: "fake" or "real" — the verdict
  • score: Per-chunk prediction scores (array)
  • aggregated_score: Overall confidence (0.0-1.0, higher = more likely synthetic)
  • consistency: How consistent the prediction is across chunks
  • image: Visualization heatmap URL (if visualize: true)
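The sample above shows the audio scores as strings, so a consumer may want to cast them before comparing. A minimal sketch, assuming the `metrics` object has exactly the fields shown; `summarize_audio` and the `peak_chunk` field are illustrative names, not part of the API.

```python
def summarize_audio(metrics):
    """Convert string scores to floats and locate the highest-scoring chunk."""
    chunk_scores = [float(s) for s in metrics.get("score", [])]
    peak = None
    if chunk_scores:
        # Index of the chunk with the highest synthetic score.
        peak = max(range(len(chunk_scores)), key=chunk_scores.__getitem__)
    return {
        "label": metrics["label"],
        "aggregated_score": float(metrics["aggregated_score"]),
        "consistency": float(metrics["consistency"]),
        "chunk_scores": chunk_scores,
        "peak_chunk": peak,
    }
```

`float()` accepts both strings and numbers, so the same helper would keep working if the API returned numeric scores.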

Image results — in image_metrics:

{
  "type": "ImageAnalysis",
  "label": "fake",
  "score": 0.87,
  "image": "https://...",
  "ifl": { "score": 0.82, "heatmap": "https://..." },
  "reverse_image_search_sources": [
    { "url": "...", "title": "...", "verdict": "known_fake", "similarity": 0.95 }
  ]
}
  • ifl: Invisible Frequency Layer analysis with heatmap
  • reverse_image_search_sources: Known online sources (if use_reverse_search: true)

Video results — in video_metrics:

{
  "label": "fake",
  "score": 0.89,
  "certainty": 0.91,
  "children": [
    { "type": "VideoResult", "conclusion": "Fake", "score": 0.89, "timestamp": 2.5, "children": [...] }
  ]
}
  • Hierarchical tree of frame-level and segment-level results
  • Video with audio track returns both metrics (audio) and video_metrics (visual)
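Because `video_metrics` is a tree, a recursive walk is one way to pull out the moments flagged as fake. This sketch assumes nested `children` entries share the shape of the top-level child shown above (`conclusion`, `score`, `timestamp`), which the elided sample does not confirm; the 0.5 cutoff and the function name are likewise hypothetical.

```python
def flagged_timestamps(node, threshold=0.5):
    """Recursively collect (timestamp, score) pairs for nodes judged fake."""
    hits = []
    for child in node.get("children", []):
        is_fake = child.get("conclusion") == "Fake"
        if is_fake and child.get("score", 0) >= threshold and "timestamp" in child:
            hits.append((child["timestamp"], child["score"]))
        # Descend into nested frame-level or segment-level results.
        hits.extend(flagged_timestamps(child, threshold))
    return sorted(hits)
```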

Intelligence

POST /intelligence

Analyze media for rich structured insights, standalone or alongside detection.

{ "url": "https://example.com/audio.mp3", "json": true }

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | One of url or media_token | HTTPS URL to media file |
| media_token | string | One of url or media_token | Token from secure upload (alternative to URL) |
| detect_id | string | No | UUID of existing detect to associate |
| media_type | string | No | "audio", "video", or "image" (auto-detected) |
| json | boolean | No | Return structured fields (default: false for audio/video, true for image) |
| callback_url | string | No | Webhook for async mode |
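The "one of" requirement on `url` and `media_token` can be enforced before sending. A sketch, reading the table as "exactly one of the two", which is an interpretation rather than a documented rule; `build_intelligence_payload` is a hypothetical helper.

```python
def build_intelligence_payload(url=None, media_token=None, **options):
    """Assemble a POST /intelligence body with exactly one media source."""
    if (url is None) == (media_token is None):
        raise ValueError("provide exactly one of url or media_token")
    payload = {"url": url} if url is not None else {"media_token": media_token}
    # Optional fields such as json, detect_id, media_type, callback_url.
    payload.update({k: v for k, v in options.items() if v is not None})
    return payload
```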

Audio/Video structured response (json: true):

  • speaker_info — speaker description (age, gender)
  • language / dialect — detected language
  • emotion — detected emotional state
  • speaking_style — conversational, formal, etc.
  • context — inferred context of the speech
  • message — content summary
  • abnormalities — anomalies detected in the media
  • transcription — full transcript
  • translation — translation if non-English
  • misinformation — misinformation analysis

Image structured response:

  • scene_description — what the image shows
  • subjects — people/objects identified
  • authenticity_analysis — visual authenticity assessment
  • context_and_setting — environment description
  • abnormalities — visual anomalies
  • misinformation — misinformation analysis

POST /detects/{detect_uuid}/intelligence — Ask Questions

After detection completes, ask natural-language questions about it:

{ "query": "How confident is the model that this audio is fake?" }

Returns a question UUID. Poll GET /detects/{detect_uuid}/intelligence/{question_uuid} until status is "completed".

Prerequisite: The detection must have status: "completed". Otherwise returns 422.
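A client can avoid the 422 by checking the detection status locally first. In this sketch, `post(path, body)` is a hypothetical transport callable, and the detection dict is assumed to expose `uuid` and `status` keys.

```python
def ask_about_detection(detection, query, post):
    """Submit a question only once the detection has completed."""
    if detection.get("status") != "completed":
        # The API is documented to answer 422 in this situation.
        raise RuntimeError("detection is not completed yet")
    path = f"/detects/{detection['uuid']}/intelligence"
    return post(path, {"query": query})
```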


Audio Source Tracing

Enable by setting audio_source_tracing: true in POST /detect.

Result appears in the detection response under audio_source_tracing:

{ "label": "elevenlabs", "error_message": null }

Known source labels: resemble_ai, elevenlabs, real, and others as the model expands.

Important: Source tracing only runs when audio is labeled "fake". If audio is "real", no source tracing result appears.
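Since the tracing result is absent for audio labeled "real", code reading it should tolerate a missing key. A minimal sketch, assuming the detection response is a dict with an optional `audio_source_tracing` entry shaped like the sample above; `traced_source` is an illustrative name.

```python
def traced_source(detection):
    """Return the source label, or None when tracing did not run or failed."""
    tracing = detection.get("audio_source_tracing")
    if not tracing or tracing.get("error_message"):
        return None
    return tracing.get("label")
```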

Standalone queries:

  • GET /audio_source_tracings — list all source tracing reports
  • GET /audio_source_tracings/{uuid} — get specific report

Watermarking

POST /watermark/apply

{
  "url": "https://example.com/image.png",
  "strength": 0.3,
  "custom_message": "my-organization"
}

| Parameter | Type | Required | Description |
|---|---|---|---|
| url | string | Yes | HTTPS URL to media file |
| strength | number | No | Watermark strength 0.0-1.0 (image/video only, default 0.2) |
| custom_message | string | No | Custom message (image/video only, default "resembleai") |
  • Add Prefer: wait header for synchronous response
  • Without it, poll GET /watermark/apply/{uuid}/result
  • Response includes watermarked_media URL to download the watermarked file
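The synchronous and asynchronous modes differ only in the `Prefer: wait` header, which the following sketch makes explicit. The returned dict layout and the function name are hypothetical, and the range check reads the strength bounds as 0.0 to 1.0.

```python
def watermark_apply_request(url, strength=None, custom_message=None,
                            wait=True, api_key="<RESEMBLE_API_KEY>"):
    """Describe a POST /watermark/apply call as plain data."""
    if strength is not None and not 0.0 <= strength <= 1.0:
        raise ValueError("strength must be between 0.0 and 1.0")
    headers = {"Authorization": f"Bearer {api_key}"}
    if wait:
        headers["Prefer"] = "wait"  # ask for a synchronous response
    body = {"url": url}
    if strength is not None:
        body["strength"] = strength
    if custom_message is not None:
        body["custom_message"] = custom_message
    return {"method": "POST", "path": "/watermark/apply",
            "headers": headers, "json": body}
```

With `wait=False` the caller would instead poll GET /watermark/apply/{uuid}/result for the `watermarked_media` URL.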

POST /watermark/detect

{ "url": "https://example.com/suspect-image.png" }

Audio detection result:

{ "has_watermark": true, "confidence": 0.95 }

Image/Video detection result:

{ "has_watermark": true }

Identity — Speaker Verification (Beta)

Beta feature — requires joining the preview program. Inform the user if they encounter access errors.

POST /identity — Create Identity Profile

{
  "audio_url": "https://example.com/known-speaker.wav",
  "name": "Jane Doe"
}

POST /identity/search — Search Against Known Identities

{
  "audio_url": "https://example.com/unknown-speaker.wav",
  "top_k": 5
}

Response:

{
  "success": true,
  "item": [
    { "uuid": "...", "name": "Jane Doe", "confidence": 0.92, "distance": 0.08 }
  ]
}

Lower distance = closer match. Higher confidence = stronger match.
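Picking the closest identity from a search response then comes down to taking the minimum distance. A sketch under the assumption that `item` is a list of candidates as in the sample; `best_match` and its optional `max_distance` cutoff are hypothetical additions.

```python
def best_match(response, max_distance=None):
    """Return the candidate with the smallest distance, or None."""
    if not response.get("success"):
        return None
    candidates = response.get("item") or []
    if max_distance is not None:
        candidates = [c for c in candidates if c["distance"] <= max_distance]
    return min(candidates, key=lambda c: c["distance"], default=None)
```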


Text Detection

Beta feature — requires the detect_beta_user role or a billing plan that includes the dfd_text product.

POST /text_detect

Add Prefer: wait for synchronous response. Otherwise poll or use callback.

| Parameter | Type | Required | Description |
|---|---|---|---|
| text | string | Yes | Text to analyze (max 100,000 characters) |
| thinking | string | No | Always use "low" (default) |
| threshold | float | No | Decision threshold 0.0-1.0 (default: 0.5) |
| callback_url | string | No | Webhook URL for async completion notification |
| privacy_mode | boolean | No | If true, text content is not stored after analysis |
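The documented limits on `text` and `threshold` can be checked before the request is sent. A sketch with a hypothetical helper name; rejecting empty text is an assumption, and the threshold bounds are read as 0.0 to 1.0.

```python
MAX_TEXT_CHARS = 100_000


def build_text_detect_payload(text, threshold=None, callback_url=None,
                              privacy_mode=None):
    """Assemble a POST /text_detect body after basic validation."""
    if not text:
        raise ValueError("text must not be empty")  # assumed client-side rule
    if len(text) > MAX_TEXT_CHARS:
        raise ValueError(f"text exceeds {MAX_TEXT_CHARS} characters")
    if threshold is not None and not 0.0 <= threshold <= 1.0:
        raise ValueError("threshold must be between 0.0 and 1.0")
    payload = {"text": text, "thinking": "low"}
    optional = {"threshold": threshold, "callback_url": callback_url,
                "privacy_mode": privacy_mode}
    payload.update({k: v for k, v in optional.items() if v is not None})
    return payload
```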

Response:

{
  "success": true,
  "item": {
    "uuid": "abc-123",
    "status": "completed",
    "prediction": "ai",
    "confidence": 0.91,
    "text_content": "This is some text to analyze.",
    "privacy_mode": false,
    "created_at": "...",
    "updated_at": "..."
  }
}
  • prediction: "ai" or "human" — the verdict
  • confidence: 0.0-1.0, higher = more confident
  • status: "processing", "completed", or "failed"

GET /text_detect/{uuid} — Poll

Poll until status is "completed" or "failed".

GET /text_detect — List

Returns paginated text detections for the team.

Callback

If callback_url was provided, a POST is sent on completion:

{ "success": true, "item": { ... } }

On failure:

{ "success": false, "item": { ... }, "error": "Error message here" }
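A webhook receiver might unpack the two callback shapes like this. The sketch covers only body parsing; `handle_callback` is a hypothetical name, and the fallback error string is an assumption for payloads that omit `error`.

```python
import json


def handle_callback(raw_body):
    """Parse a callback POST body into (ok, item, error)."""
    event = json.loads(raw_body)
    ok = bool(event.get("success"))
    item = event.get("item", {})
    error = None if ok else event.get("error", "unknown error")
    return ok, item, error
```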