Transcribe and summarize audio and video. Pay per job via Stripe or crypto.
Transcribe audio or video to text, including per-word timestamps for precise editing. Three-call flow: (1) call with `filename` to receive {job_id, payment_challenge}; (2) pay via MPP, then call with `job_id` + `payment_credential` to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns two outputs: role `transcript` (SRT) and role `transcript-words` (JSON matching /.well-known/weftly-transcript-v2.schema.json, with segment-level and per-word timestamps). For other formats, pass `format=srt|txt|vtt|json|words` to get_job_status to receive content inline — `txt` and `vtt` are derived from SRT, `json` is v1 (segments only), `words` is v2 (segments + words). Flat price: audio $0.50, video $1.00 — see /.well-known/mpp.json for the authoritative table. Use for podcasts, interviews, meetings, lectures, and especially for creating clips, multicamera edits, or edit-video-from-transcript where word boundaries matter. Retrying any call with `job_id` alone returns current state (idempotent). Failed jobs auto-refund.
Summarize an audio or video file — returns both a text summary AND the full transcript (with per-word timestamps). Do not also call transcribe on the same file. Three-call flow: (1) call with `filename` to receive {job_id, payment_challenge}; (2) pay via MPP, then call with `job_id` + `payment_credential` to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns three outputs: role `summary` (plain text), role `transcript` (SRT), and role `transcript-words` (JSON matching /.well-known/weftly-transcript-v2.schema.json, with segment-level and per-word timestamps). For other formats, pass `format=srt|txt|vtt|json|words` to get_job_status to receive transcript content inline — `txt` and `vtt` are derived from SRT, `json` is v1 (segments only), `words` is v2 (segments + words). Flat price: audio $0.75, video $1.25 — see /.well-known/mpp.json for the authoritative table. Use for meetings, long-form interviews, lectures, and podcast episodes; the `words` output additionally supports creating clips, multicamera edits, or edit-video-from-transcript. Retrying any call with `job_id` alone returns current state (idempotent). Failed jobs auto-refund.
START HERE for any clip workflow on a video — `find_clips` is the canonical entry point and includes a full transcription as a free byproduct. **Do not call `transcribe` first**: doing so doubles the upload, doubles the spend, and produces the same transcript. Identify ranked candidate clips in a video — what to cut for highlights, social, or testimonials. Three-call flow: (1) call with `filename` (and optional `query`) to receive {job_id, payment_challenge}; (2) pay via MPP, then call with `job_id` + `payment_credential` to receive {upload_url} (presigned PUT, 1h expiry); (3) PUT the bytes, then complete_upload(job_id), then poll get_job_status(job_id). On completion, get_job_status returns three outputs: role `clip-candidates` (JSON matching /.well-known/weftly-clips-v1.schema.json — includes `source_job_id` and `source_expires_at`), role `transcript` (SRT, free byproduct), role `transcript-words` (JSON matching /.well-known/weftly-transcript-v2.schema.json, free byproduct). Each candidate carries `transcript_text` — the full text of what's in the clip — so callers can preview content before paying for extract_clip. Optional `query` parameter switches to query mode (e.g., "they discuss pricing", "the part about hiring") with the same output shape; the `mode` field in clip-candidates.json indicates which mode produced the result. Flat price: $2.00 video — see /.well-known/mpp.json. **Source-reuse contract:** the source video stays in storage for 72h after find_clips completes. Hand the find_clips `job_id` (also returned as `source_job_id` in the candidates JSON) to `extract_clip` or `extract_vertical_clip` as their `source_job_id` — within those 72h they cut directly from the stored source: no re-upload, no re-transcribe, just $0.50 per cut. Pass the same `source_job_id` to as many extract calls as you need. Use for interviews, podcasts, sales calls, all-hands recordings. Retrying with `job_id` alone returns current state. Failed jobs auto-refund.
Cut and assemble a clip from any prior video job (find_clips, summarize, or video transcribe). Operates on a parent job — possessing the parent `source_job_id` is the capability, no upload step. Pass one segment for a simple cut, or multiple non-contiguous segments to compose a single mp4 highlight reel — same flat $0.50 either way. Two-call flow: (1) call with `source_job_id` + `segments` (ordered array of `{start, end, label?}` in source seconds, total duration capped at 30 minutes) to receive {job_id, payment_challenge}; (2) pay via MPP and call with `job_id` + `payment_credential` to start processing. No upload step. Poll get_job_status(job_id) for completion; outputs are role `clip-video` (the assembled .mp4, frame-accurate boundaries with 15ms audio fades at segment joins) and — when `include_transcript: true` (default) — roles `clip-srt` + `clip-words` (transcripts stitched and time-shifted to match the assembled video). Set `include_transcript: false` to skip transcript outputs. Payment: MPP — accepts Tempo USDC and Stripe SPT. The challenge's WWW-Authenticate header and /.well-known/mpp.json are authoritative for which methods are offered. Source must still be in storage (72h TTL for find_clips parents, 24h elsewhere — check `expires_at` from get_job_status on the parent). Multiple extract_clip calls against one parent are independent paid jobs. Failed jobs auto-refund.
Cut a 9:16 vertical clip from any prior video job (find_clips, summarize, or video transcribe), suitable for direct upload to TikTok, Instagram Reels, or YouTube Shorts. Default output is 1080×1920 H.264 / AAC `.mp4` with center-cropped framing; audio loudness-normalized to -14 LUFS / -1.5 dBTP for short-form social. Single-segment only; clip duration must be between 1 and 90 seconds (Instagram Reels max). Operates on a parent job — possessing the parent `source_job_id` is the capability, no upload step. Two-call flow: (1) call with `source_job_id` + `start` + `end` (in source seconds) to receive {job_id, payment_challenge}; (2) pay via MPP and call with `job_id` + `payment_credential` to start processing. Poll get_job_status(job_id) for completion; output is role `clip-vertical-video` (the `.mp4`). Flat price: $0.50 per clip. Payment: MPP — accepts Tempo USDC and Stripe SPT. Optional `profile` parameter selects the encoding profile (default `tiktok-primary`). Allowed values: `tiktok-primary` (1080×1920, fast preset, CRF 22), `tiktok-primary-720p` (720×1280, CBR 3 Mbps — half-resolution mobile-optimized, ~40% faster wall time), `instagram-reels` (1080×1920, slow preset, CBR 4 Mbps), `instagram-stories` (same encode shape as instagram-reels). All four profiles loudness-normalize identically. Optional `subject` parameter controls reframing (default `center`, preserves today's behavior): `auto` locks onto the longest-tracked face from the parent's subjects-sidecar (or runs inline detection if the parent has none); `subject_id` (with `subject_id` param naming a face_N from the sidecar) locks onto a specific subject; `follow` switches crop between active speakers across the clip using the sidecar's active_speaker_timeline; `manual` accepts caller-supplied framing via `subject_box: {x, y, w, h}` (source pixels) or `subject_x_offset` (direct crop x). Sidecar shape at /.well-known/weftly-subjects-v1.schema.json. auto/subject_id/follow fall back to center if detection or sidecar resolution fails — the paid job always delivers a clip. Source must be a horizontal video (wider than 9:16) — already-vertical or square sources are rejected. Source must still be in storage (72h TTL for find_clips parents, 24h elsewhere — check `expires_at` from get_job_status on the parent). Pair with `find_clips` ($2.00/video) to pick a moment first, then call this to get a download-ready vertical mp4 in under 5 minutes. Multiple extract_vertical_clip calls against one parent are independent paid jobs. Failed jobs auto-refund.
Confirm that the file has been uploaded (via HTTP PUT to the upload_url from transcribe or summarize) and start processing. Verifies that the file is present in storage and that the job has been paid. Returns status "processing". Poll get_job_status to track progress and retrieve download URLs when done.
Check the status of a transcribe or summarize job. Returns the current state and, when completed, an `outputs` array. Each output has either `content` (returned inline) or a presigned, time-limited (1 hour) `download_url`. Small text outputs (e.g. `transcript` SRT, `clip-candidates`, `summary`) come inline as `content`; larger outputs — `transcript-words` JSON for any non-trivial recording, plus video outputs like `clip-video` / `clip-vertical-video` — come as a `download_url` to fetch when needed. Optionally pass `format` (srt, txt, vtt, json, words) to get the transcript content inline in the top-level `transcript` field — `txt` and `vtt` are derived from the stored SRT; `json` is v1 (segments only); `words` is v2 (segments + per-word timestamps matching /.well-known/weftly-transcript-v2.schema.json). Poll this periodically after calling complete_upload — wait at least 60 seconds between checks. For files under 10 minutes, jobs usually complete within 1-2 minutes. For long files (1hr+), expect 10-30 minutes. Also use this to recover from lost state: if the original challenge was lost, call get_job_status(job_id) to retrieve a fresh challenge (status "awaiting_payment") or the upload URL (status "awaiting_upload").
Smoke-test the MPP payment plumbing end-to-end via this MCP server, for $0.01 USDC. Two-call flow: (1) call with no arguments to receive an MPP `payment_challenge`; (2) pay via MPP and call again with `payment_credential` set to the resulting Authorization header value (e.g. "Payment eyJ...") to receive {paid: true, timestamp, receipt_ref, payment_method}. Uses the exact same `createPayToAddress` + `createMppHandler` verification path as paid product tools (transcribe, summarize), so a green run here means real paid calls will work too. Stateless — no job is created, no database row written. Use this whenever you want to confirm a wallet, the MCP transport, the worker, and the production payment middleware are all healthy without paying a transcribe price. Cost: $0.01 USDC per attempt.
Publish an existing video from a transcribe or summarize job to YouTube. Creates a paid publish job (flat $2.00 price) and stores the OAuth token. Captions are auto-generated from the session transcript if available. Workflow: create_job → pay → trigger_youtube_publish → poll get_youtube_publish_status. Requires a YouTube OAuth2 access token obtained independently via Google OAuth (scope: youtube.upload).
Start the YouTube upload after payment is confirmed. Call this after publish_to_youtube once payment_status is "paid". Returns immediately — the upload runs as a durable Workflow in the background. Poll get_youtube_publish_status to track progress.
Check the status of a YouTube publish job. Poll periodically after trigger_youtube_publish — the upload takes 1-10 minutes depending on video size. Returns status (pending, publishing, completed, failed) and the youtube_video_url once complete.