ENGINEERING · 9 min · TUE · MAY 12 · 2026

How the pipeline works: 5 stages, 8 steps, 10 states

Ten DB states, eight working steps, five visible stages — the actual handoffs between them, with the vendors that do the work.

The VidFlow Team · Notes from the people building the pipeline

VidFlow's pipeline is shaped by three different views of the same thing. The Prisma schema has ten ProjectStatus values. Eight of them do work — the other two (DRAFT, COMPLETED) are bookends. The workspace UI groups those eight into five visible tiles. This post unpacks what each stage actually does, what calls it, and where the handoffs live.
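A rough sketch of the status model described above. The status names are the ones used in this post; their order, the constant names, and the two helpers are assumptions for illustration, and the real Prisma enum may be declared differently.

```typescript
// The ten ProjectStatus values named in this post.
// Assumption: they are listed here in the order a project moves through them.
const PROJECT_STATUSES = [
  "DRAFT",
  "IDEATION",
  "SCRIPTING",
  "VOICEOVER",
  "ALIGNMENT",
  "SHOT_PLANNING",
  "GENERATION",
  "REVIEW",
  "RENDERING",
  "COMPLETED",
] as const;

type ProjectStatus = (typeof PROJECT_STATUSES)[number];

// DRAFT and COMPLETED are bookends: no work is attached to them.
const BOOKENDS: readonly ProjectStatus[] = ["DRAFT", "COMPLETED"];

// Hypothetical helper: the statuses that actually do work.
function workingStatuses(): ProjectStatus[] {
  return PROJECT_STATUSES.filter((s) => BOOKENDS.indexOf(s) === -1);
}

// Hypothetical helper: the status that follows `current`, or null at the end.
function nextStatus(current: ProjectStatus): ProjectStatus | null {
  const i = PROJECT_STATUSES.indexOf(current);
  return i < PROJECT_STATUSES.length - 1 ? PROJECT_STATUSES[i + 1] : null;
}
```

Filtering out the bookends leaves the eight working statuses. How those eight fold into five UI tiles is not spelled out here, so the sketch leaves that grouping out.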

Stage 1 — Ideation (DB: IDEATION)

The project lands here with a topic and a channel. We pull the channel's recent uploads, score them against an outlier model, and ask an LLM to propose a slate of titles. The creator picks one. The picked title becomes the seed for everything downstream. Nothing in the pipeline runs without it.

Stage 2 — Script (DB: SCRIPTING)

This is the longest editing stage. The LLM writes an outline, breaks the outline into chapters, drafts each chapter, and writes a hook in a separate sub-step you can reroll independently. The hook lives in its own panel because hooks are the highest-leverage prose in the video — they get rerolled more than anything else.

Stage 3 — Visual Bible

This one is not a DB status: it runs under SCRIPTING, alongside the outline phase, but it is its own data object. Once the script is approved, the LLM extracts named characters and locations. Our image model generates a portrait for each character, a combined character sheet (one image with every character at consistent scale), and a reference shot for each location. The Visual Bible is what makes every later scene look like the same video.

Stage 4 — Voiceover (DB: VOICEOVER)

Our neural TTS is the voice engine — one provider, one integration. The creator picks a voice from the library or one of their own cloned voices, per character if they're using MultiVoice. Output is one merged audio file per chapter, then a single merged audio file for the project.

Stage 5 — Alignment (DB: ALIGNMENT)

Our alignment engine takes the merged audio and the script text and produces word-level timestamps. We need this for the next stage — it's how scenes get cut to actual speech, not estimated durations.
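To make "cut to actual speech" concrete, here is a minimal sketch of how word-level timestamps could turn a range of script words into a time span. The `WordTimestamp` shape and the `spanForWords` helper are hypothetical; the alignment engine's real output format is not shown in this post.

```typescript
// One aligned word. Times are seconds from the start of the merged audio.
// Assumption: this field layout is illustrative, not the engine's real schema.
interface WordTimestamp {
  word: string;
  start: number;
  end: number;
}

// Hypothetical helper: the time span covered by words[first..last], inclusive.
function spanForWords(
  words: WordTimestamp[],
  first: number,
  last: number
): { start: number; end: number; duration: number } {
  if (first < 0 || last >= words.length || first > last) {
    throw new RangeError("word range is outside the aligned transcript");
  }
  const start = words[first].start;
  const end = words[last].end;
  return { start, end, duration: end - start };
}

// Example: a scene covering the second and third words.
const alignedWords: WordTimestamp[] = [
  { word: "hello", start: 0, end: 0.5 },
  { word: "world", start: 0.5, end: 1.0 },
  { word: "again", start: 1.25, end: 1.75 },
];
const sceneSpan = spanForWords(alignedWords, 1, 2);
```

The span includes the pause between "world" and "again", which is the kind of detail an estimated per-word duration would miss.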

Stage 6 — Shot planning (DB: SHOT_PLANNING)

The LLM reads the timestamped script and decomposes it into semantic beats — HOOK, CLAIM, EXAMPLE, TURN, RESOLUTION — and emits a Shot record for each visual moment. Each shot gets an assigned location, character references, motion intent, and a draft prompt spec.
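A possible shape for the Shot record, sketched from the fields listed above. The beat names come from this post; every field name, the type guard, and the validation rules are assumptions rather than the actual schema.

```typescript
// The five semantic beats named in this post.
const BEAT_KINDS = ["HOOK", "CLAIM", "EXAMPLE", "TURN", "RESOLUTION"] as const;
type BeatKind = (typeof BEAT_KINDS)[number];

// Hypothetical Shot record: one visual moment tied to a span of speech.
interface Shot {
  beat: BeatKind;
  startSec: number; // from the word-level timestamps
  endSec: number;
  locationId: string; // a location from the Visual Bible
  characterIds: string[]; // character references from the Visual Bible
  motionIntent: string; // e.g. "slow push-in"
  promptSpec: string; // draft prompt for the image model
}

// Type guard for beat labels that come back from the LLM as plain strings.
function isBeatKind(value: string): value is BeatKind {
  return (BEAT_KINDS as readonly string[]).indexOf(value) !== -1;
}

// Hypothetical sanity check run before a shot is saved.
function validateShot(shot: Shot): string[] {
  const problems: string[] = [];
  if (shot.endSec <= shot.startSec) problems.push("shot has no duration");
  if (shot.locationId === "") problems.push("shot has no location");
  if (shot.promptSpec.trim() === "") problems.push("shot has no prompt spec");
  return problems;
}
```

Validating the LLM's output before it becomes a row is a design choice of this sketch: a shot with a bad label or an empty prompt is cheaper to catch here than after generation has spent credits on it.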

Stage 7 — Generation (DB: GENERATION)

This is where the bill grows. Each shot generates an image first (our image model), then a video clip from that image (our video model). The image-before-video order is a hard precondition; the new BullMQ worker enforces it. Generation runs in parallel across shots, bounded by your credit balance.
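The image-before-video precondition and the credit bound can be sketched as two small functions. This is an illustration under stated assumptions, not the worker's code: `ShotAssets`, `nextGenerationStep`, `shotsWithinBudget`, and the flat `costPerShot` are all hypothetical.

```typescript
// Hypothetical view of a shot's generated assets.
interface ShotAssets {
  id: string;
  imageUrl?: string;
  clipUrl?: string;
}

type GenerationStep = "image" | "video" | "done";

// A shot can only move to "video" once it has an image.
function nextGenerationStep(shot: ShotAssets): GenerationStep {
  if (!shot.imageUrl) return "image";
  if (!shot.clipUrl) return "video";
  return "done";
}

// Hypothetical credit gate: pick the unfinished shots the balance can cover.
// Assumption: every unfinished shot costs the same flat amount.
function shotsWithinBudget(
  shots: ShotAssets[],
  credits: number,
  costPerShot: number
): ShotAssets[] {
  if (costPerShot <= 0) throw new RangeError("costPerShot must be positive");
  const affordable = Math.max(0, Math.floor(credits / costPerShot));
  return shots
    .filter((s) => nextGenerationStep(s) !== "done")
    .slice(0, affordable);
}
```

Deriving the next step from the assets that already exist means a retried job cannot skip straight to video: without an image, the only step on offer is "image".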

Stage 8 — Review (DB: REVIEW)

Every shot has a thumbnail and a clip; you flip through and approve or kick back individual shots for regeneration. Nothing renders until you approve.
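A minimal sketch of the review gate. It assumes that "approve" means every shot has been approved before rendering starts; the decision values and helper names are hypothetical.

```typescript
// Hypothetical per-shot review decision.
type ReviewDecision = "PENDING" | "APPROVED" | "KICKED_BACK";

interface ReviewedShot {
  id: string;
  decision: ReviewDecision;
}

// Assumption: rendering is allowed only when every shot is approved.
function canRender(shots: ReviewedShot[]): boolean {
  return shots.length > 0 && shots.every((s) => s.decision === "APPROVED");
}

// The shots that go back to generation.
function shotsToRegenerate(shots: ReviewedShot[]): string[] {
  return shots.filter((s) => s.decision === "KICKED_BACK").map((s) => s.id);
}
```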

Stage 9 — Rendering (DB: RENDERING)

The default render path is FFmpeg running locally. A cloud render path is built but not wired up by default; it's reserved for jobs too heavy for the local renderer.
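For a sense of what a local FFmpeg render can involve, here is a sketch that builds arguments for FFmpeg's concat demuxer and lays the merged voiceover over the joined clips. The actual render command is not shown in this post, so the file names, helper names, and the choice of the concat demuxer are all assumptions.

```typescript
// Build the text of a concat-demuxer list file: one "file '<path>'" per clip.
// Single quotes inside a path are escaped as '\'' per FFmpeg's quoting rules.
function concatListFile(clipPaths: string[]): string {
  return clipPaths
    .map((p) => "file '" + p.replace(/'/g, "'\\''") + "'")
    .join("\n");
}

// Hypothetical argument list: video from the concatenated clips (input 0),
// audio from the merged voiceover (input 1).
function renderArgs(listPath: string, audioPath: string, outPath: string): string[] {
  return [
    "-f", "concat", "-safe", "0", "-i", listPath, // joined clips
    "-i", audioPath, // merged project audio
    "-map", "0:v:0", "-map", "1:a:0", // pick video and audio explicitly
    "-c:v", "copy", // assumes all clips share codec settings
    "-c:a", "aac",
    "-shortest", // stop at the shorter of the two streams
    outPath,
  ];
}
```

The returned array is meant to be handed to a process spawner as the arguments to `ffmpeg`. Stream copy only works when every clip shares the same codec parameters; otherwise the video would need re-encoding.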

Stage 10 — Completed (DB: COMPLETED)

A project flips to COMPLETED once any Publication row reaches PUBLISHED. Publication is currently YouTube-only.
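The completion rule is small enough to state as a predicate. In this sketch only `PUBLISHED` comes from the post; the other status values and the `Publication` shape are hypothetical.

```typescript
// Assumption: PENDING and FAILED are illustrative; only PUBLISHED is from the post.
type PublicationStatus = "PENDING" | "PUBLISHED" | "FAILED";

interface Publication {
  platform: string; // currently only "youtube"
  status: PublicationStatus;
}

// A project completes as soon as any one publication is PUBLISHED.
function shouldMarkCompleted(publications: Publication[]): boolean {
  return publications.some((p) => p.status === "PUBLISHED");
}
```

Because the rule is "any", a failed retry alongside one successful publish still completes the project.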

Where the worker fits

The Phase 1.5 BullMQ worker (npm run worker) is the new path for per-shot generation. Behind two feature flags — JOBS_ORCHESTRATOR_NEW=1 and JOBS_GENERATE_SHOT_NEW=1 — the orchestrator enqueues to a Redis-backed generate-shot queue and awaits per-shot results. Vendor webhooks land at /api/{vendor}/callback. The old synchronous path is still the default while we're proving the queue out.
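A sketch of how the two flags could gate the dispatch. The flag names and the `generate-shot` queue name come from this post; requiring both flags to equal `"1"`, the `ShotQueue` interface, and `dispatchShot` are assumptions. The interface is kept minimal so that a BullMQ `Queue`, whose `add(name, data)` returns a promise, could plausibly be passed in.

```typescript
type Env = { [key: string]: string | undefined };

// Assumption: the queued path is used only when both flags are set to "1".
function useQueuedGeneration(env: Env): boolean {
  return (
    env["JOBS_ORCHESTRATOR_NEW"] === "1" &&
    env["JOBS_GENERATE_SHOT_NEW"] === "1"
  );
}

// Minimal queue shape for this sketch (hypothetical).
interface ShotQueue {
  add(name: string, data: { shotId: string }): Promise<unknown>;
}

// Hypothetical dispatcher: enqueue on the new path, otherwise run the old
// synchronous path. Awaiting per-shot results is left out of this sketch.
function dispatchShot(
  env: Env,
  queue: ShotQueue,
  shotId: string,
  runInline: (shotId: string) => Promise<unknown>
): Promise<unknown> {
  return useQueuedGeneration(env)
    ? queue.add("generate-shot", { shotId })
    : runInline(shotId);
}
```

Taking `env` as a parameter instead of reading `process.env` directly keeps the flag check easy to test and makes the default-off behavior explicit.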

Thumbnail generation runs alongside, not in line

It's its own job (src/jobs/functions/generate-thumbnail.ts), kicked off in parallel with shot generation. Three thumbnail variants land on the project; the creator picks one as the cover. No DB status for it — it's metadata.

That's the pipeline. Ten statuses, eight working steps, five tiles. The same thing, viewed three ways.
