
Mastering Sora 2 Prompting: A Technical Guide with Code Examples
Learn how to craft effective prompts for OpenAI's Sora 2 API with practical TypeScript examples. From basic prompt structure to advanced techniques like remix and multi-shot generation.
Patrick the AI Engineer
Introduction
Prompting Sora 2 is different from prompting ChatGPT or DALL-E. You're not just describing what you want to see—you're briefing a cinematographer who has never seen your vision. The quality of your prompt directly impacts the coherence, motion, lighting, and overall cinematic feel of the generated video.
In this guide, we'll explore how to write effective Sora 2 prompts programmatically. We'll cover the fundamental differences between API parameters and prompt content, build reusable prompt templates, implement iterative refinement with remix, and even chain multiple prompts together for longer, more complex videos.
By the end, you'll have practical code patterns you can use to generate professional-quality videos with Sora 2.
API Parameters vs. Prompt Content
Before writing any prompts, understand what you can and cannot control through text. Sora 2 has a clear separation between API parameters and prompt content.
What API Parameters Control
These attributes must be set explicitly in your API call—they won't work if you include them in your prompt text:
import OpenAI from 'openai';
const openai = new OpenAI({ apiKey: process.env.OPENAI_API_KEY });
const job = await openai.videos.create({
model: 'sora-2', // or 'sora-2-pro'
size: '1280x720', // Resolution: portrait or landscape
seconds: '8', // Duration: '4', '8', or '12'
prompt: 'Your creative prompt here'
});
Resolution options:
- sora-2: 1280x720, 720x1280
- sora-2-pro: 1280x720, 720x1280, 1024x1792, 1792x1024
Duration: 4, 8, or 12 seconds. Shorter clips follow instructions more reliably. If you need longer videos, consider chaining segments (we'll cover this later).
Model choice: sora-2 is faster and cheaper ($0.10/sec). sora-2-pro supports higher resolutions and produces better quality ($0.30-0.50/sec depending on resolution).
Trying to request "make it 20 seconds" or "4K resolution" in your prompt won't work. These are controlled by parameters.
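If you want to catch invalid combinations before calling the API, it can help to encode these constraints in types. The following is a small illustrative sketch, not part of the SDK; the names VideoParams, allowedSizes, and assertValidParams are made up for the example:
type SoraModel = 'sora-2' | 'sora-2-pro';
type SoraSeconds = '4' | '8' | '12';

// Supported resolutions per model (from the option list above)
const allowedSizes: Record<SoraModel, string[]> = {
  'sora-2': ['1280x720', '720x1280'],
  'sora-2-pro': ['1280x720', '720x1280', '1024x1792', '1792x1024']
};

interface VideoParams {
  model: SoraModel;
  size: string;
  seconds: SoraSeconds;
  prompt: string;
}

function assertValidParams(params: VideoParams): void {
  if (!allowedSizes[params.model].includes(params.size)) {
    throw new Error(`Size ${params.size} is not supported by ${params.model}`);
  }
}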
What Your Prompt Controls
Everything else goes in your prompt:
- Subject and action
- Camera framing and movement
- Lighting and atmosphere
- Color palette and mood
- Style and aesthetic
- Dialogue and sound cues
Your prompt is where creativity happens. Let's build a system to structure these effectively.
Prompt Anatomy and Structure
A well-structured prompt reads like a shot description on a storyboard. It tells the model exactly what to show, how to frame it, and what feeling to evoke.
The Basic Template
Here's a minimal working structure:
interface BasicPrompt {
style: string;
scene: string;
cinematography: {
camera: string;
mood: string;
};
actions: string[];
}
function buildBasicPrompt(config: BasicPrompt): string {
return `
Style: ${config.style}
${config.scene}
Cinematography:
Camera: ${config.cinematography.camera}
Mood: ${config.cinematography.mood}
Actions:
${config.actions.map(action => `- ${action}`).join('\n')}
`.trim();
}
Let's use it:
const prompt = buildBasicPrompt({
style: '90s documentary-style interview',
scene: 'An old Swedish man sits in a study.',
cinematography: {
camera: 'medium close-up, eye level',
mood: 'intimate and reflective'
},
actions: [
'The man looks directly at camera',
'He says: "I still remember when I was young."',
'He smiles softly and looks away'
]
});
const job = await openai.videos.create({
model: 'sora-2',
size: '1280x720',
seconds: '4',
prompt
});
This gives the model clear direction while leaving room for creative interpretation of details like clothing, set design, and exact timing.
Balancing Specificity and Creativity
Short prompts give the model more freedom. This can lead to surprising, beautiful variations:
const loosePrompt = `
A chef tosses vegetables in a wok.
The kitchen is busy, steam rises, and flames flare up.
`;
The model will invent the chef's appearance, kitchen style, exact vegetables, and camera angle. Each generation will be different.
Detailed prompts restrict creativity but increase consistency:
const tightPrompt = `
Style: Cinematic food documentary, shot on 35mm film
A middle-aged Asian chef in a white uniform stands at a commercial wok station.
She tosses bok choy, bell peppers, and snap peas in a carbon steel wok.
Flames leap up as she adds oil. Steam rises into warm overhead lighting.
Cinematography:
Camera: medium shot from chef's right side, slight dutch angle
Lens: 40mm spherical, shallow depth of field
Lighting: warm overhead tungsten key, cool natural light from window left
Mood: energetic, professional, tactile
Actions:
- Chef flips wok with confident motion
- Flames burst upward, illuminating her face
- She smiles slightly, focused on the cooking
- Steam drifts across the frame
`;
Choose based on your needs. For consistent brand content or specific storyboards, go detailed. For exploration and variety, stay loose.
Implementing Prompt Templates
Rather than writing prompts from scratch each time, build reusable templates. Here's a production-ready prompt builder:
interface SoraPromptConfig {
// Core content
style?: string;
scene: string;
// Camera and lighting
cinematography?: {
camera?: string;
lens?: string;
lighting?: string;
mood?: string;
};
// Story beats
actions?: string[];
dialogue?: Array<{ speaker: string; line: string }>;
// Audio cues
backgroundSound?: string;
// Advanced: production details
advanced?: {
format?: string;
grade?: string;
lenses?: string;
atmosphere?: string;
};
}
function buildSoraPrompt(config: SoraPromptConfig): string {
const sections: string[] = [];
// Style and format
if (config.style) {
sections.push(`Style: ${config.style}`);
}
if (config.advanced?.format) {
sections.push(config.advanced.format);
}
// Scene description
sections.push(config.scene);
// Cinematography block
if (config.cinematography) {
const cine = config.cinematography;
sections.push('\nCinematography:');
if (cine.camera) sections.push(`Camera: ${cine.camera}`);
if (cine.lens) sections.push(`Lens: ${cine.lens}`);
if (cine.lighting) sections.push(`Lighting: ${cine.lighting}`);
if (cine.mood) sections.push(`Mood: ${cine.mood}`);
}
// Advanced production details
if (config.advanced) {
const adv = config.advanced;
if (adv.grade) sections.push(`\nGrade: ${adv.grade}`);
if (adv.atmosphere) sections.push(`Atmosphere: ${adv.atmosphere}`);
}
// Actions
if (config.actions?.length) {
sections.push('\nActions:');
config.actions.forEach(action => sections.push(`- ${action}`));
}
// Dialogue
if (config.dialogue?.length) {
sections.push('\nDialogue:');
config.dialogue.forEach(d => sections.push(`- ${d.speaker}: "${d.line}"`));
}
// Background sound
if (config.backgroundSound) {
sections.push(`\nBackground Sound:\n${config.backgroundSound}`);
}
return sections.join('\n').trim();
}
Now we can generate prompts programmatically:
const config: SoraPromptConfig = {
style: '1970s romantic drama, shot on 35mm film',
scene: `At golden hour, a brick tenement rooftop transforms into a small stage.
Laundry lines strung with white sheets sway in the wind, catching the last rays of sunlight.
Strings of mismatched fairy bulbs hum faintly overhead. A young woman in a flowing red silk
dress dances barefoot, curls glowing in the fading light.`,
cinematography: {
camera: 'medium-wide shot, slow dolly-in from eye level',
lens: '40mm spherical; shallow focus to isolate couple from skyline',
lighting: 'golden natural key with tungsten bounce; edge from fairy bulbs',
mood: 'nostalgic, tender, cinematic'
},
actions: [
'She spins; her dress flares, catching sunlight',
'He steps in, catches her hand, and dips her into shadow',
'Sheets drift across frame, briefly veiling the skyline before parting again'
],
dialogue: [
{ speaker: 'Woman', line: 'See? Even the city dances with us tonight.' },
{ speaker: 'Man', line: 'Only because you lead.' }
],
backgroundSound: 'Natural ambience only: faint wind, fabric flutter, street noise. No added score.'
};
const prompt = buildSoraPrompt(config);
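For reference, the string this config produces looks roughly like the following (scene text shortened here):
Style: 1970s romantic drama, shot on 35mm film
At golden hour, a brick tenement rooftop transforms into a small stage. [...]

Cinematography:
Camera: medium-wide shot, slow dolly-in from eye level
Lens: 40mm spherical; shallow focus to isolate couple from skyline
Lighting: golden natural key with tungsten bounce; edge from fairy bulbs
Mood: nostalgic, tender, cinematic

Actions:
- She spins; her dress flares, catching sunlight
- He steps in, catches her hand, and dips her into shadow
- Sheets drift across frame, briefly veiling the skyline before parting again

Dialogue:
- Woman: "See? Even the city dances with us tonight."
- Man: "Only because you lead."

Background Sound:
Natural ambience only: faint wind, fabric flutter, street noise. No added score.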
This pattern makes it easy to:
- Version control your prompts
- A/B test different variations
- Generate prompts from databases or user input
- Maintain consistency across a project
Visual Cues That Steer the Look
Certain phrases reliably influence Sora's output. These are like presets that guide the model toward specific aesthetics.
Film Stock and Era
const filmStockExamples = [
'shot on 35mm Kodak film with natural grain',
'16mm documentary footage, vintage 1980s',
'digital capture emulating 65mm photochemical contrast',
'Super 8 home video, faded colors, slight gate weave'
];
Including film stock references affects color grading, grain structure, and overall texture.
Camera and Lens References
const lensExamples = [
'40mm spherical prime, shallow depth of field',
'24mm wide-angle, deep focus',
'85mm portrait lens with smooth bokeh',
'IMAX 70mm for epic scale'
];
Lens references influence framing, depth of field, and perspective distortion.
Lighting Setups
const lightingExamples = [
'Natural window light from camera left, soft shadows',
'Hard tungsten key light with cool fill from right',
'Golden hour sunlight, warm and directional',
'Overcast diffused light, flat and even'
];
Describing lighting direction and quality helps maintain consistency.
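If you want to explore different looks quickly, one option is to sample one cue from each of these lists and feed the result into the prompt builder from earlier. A small sketch; the randomItem helper and the scene are purely illustrative:
function randomItem<T>(items: T[]): T {
  return items[Math.floor(Math.random() * items.length)];
}

const exploratoryConfig: SoraPromptConfig = {
  style: randomItem(filmStockExamples),
  scene: 'A cyclist rides along a harbor at dawn',
  cinematography: {
    lens: randomItem(lensExamples),
    lighting: randomItem(lightingExamples)
  }
};

const exploratoryPrompt = buildSoraPrompt(exploratoryConfig);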
Building a Style Library
Create reusable style presets:
interface StylePreset {
name: string;
filmStock: string;
lens: string;
lighting: string;
colorPalette: string;
}
const styleLibrary: Record<string, StylePreset> = {
documentary: {
name: 'Modern Documentary',
filmStock: 'Digital capture, natural grain',
lens: '35mm handheld, slight shake',
lighting: 'Available light, mixed color temperature',
colorPalette: 'Desaturated, neutral tones with teal shadows'
},
cinematic: {
name: 'Cinematic Drama',
filmStock: '35mm film, fine grain',
lens: '50mm spherical prime, shallow depth of field',
lighting: 'Motivated three-point lighting, warm key',
colorPalette: 'Rich colors, deep blacks, warm-cool contrast'
},
retro: {
name: 'Vintage 1970s',
filmStock: '16mm film with visible grain and halation',
lens: '28mm spherical, slight vignette',
lighting: 'Practical sources, mixed warm tungsten',
colorPalette: 'Faded earth tones, amber lift in highlights'
}
};
function applyStylePreset(
baseConfig: SoraPromptConfig,
presetName: keyof typeof styleLibrary
): SoraPromptConfig {
const preset = styleLibrary[presetName];
return {
...baseConfig,
style: preset.filmStock,
cinematography: {
...baseConfig.cinematography,
lens: preset.lens,
lighting: preset.lighting
},
advanced: {
...baseConfig.advanced,
grade: preset.colorPalette
}
};
}
Use it like this:
let config: SoraPromptConfig = {
scene: 'A woman walks through a rainy city street at night',
actions: ['She looks up at neon signs', 'Rain drips from her umbrella']
};
config = applyStylePreset(config, 'cinematic');
const prompt = buildSoraPrompt(config);
This makes it trivial to maintain consistent aesthetics across multiple videos.
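To see how the same scene reads in every preset, you can loop over the library and queue one generation per style. A sketch reusing the helpers above:
const baseConfig: SoraPromptConfig = {
  scene: 'A woman walks through a rainy city street at night',
  actions: ['She looks up at neon signs', 'Rain drips from her umbrella']
};

const presetJobs: Array<{ preset: string; jobId: string }> = [];

for (const presetName of Object.keys(styleLibrary)) {
  const styled = applyStylePreset(baseConfig, presetName);
  const job = await openai.videos.create({
    model: 'sora-2',
    size: '1280x720',
    seconds: '4',
    prompt: buildSoraPrompt(styled)
  });
  presetJobs.push({ preset: presetName, jobId: job.id });
}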
Iterating with Remix
Sora's remix feature lets you iterate on existing generations. Instead of starting from scratch, you can make targeted adjustments.
How Remix Works
You reference a previously generated video and describe what to change:
async function remixVideo(
originalJobId: string,
changes: string
): Promise<string> {
const job = await openai.videos.create({
model: 'sora-2',
size: '1280x720',
seconds: '4',
prompt: changes,
// Reference the original video
input_reference: originalJobId
});
// Poll until complete
let status = await openai.videos.retrieve(job.id);
while (status.status === 'in_progress' || status.status === 'queued') {
await new Promise(resolve => setTimeout(resolve, 2000));
status = await openai.videos.retrieve(job.id);
}
return job.id;
}
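Used on its own, the helper takes the ID returned by the original videos.create call plus a description of the change (originalJob here stands in for that earlier job object):
const nightVersionId = await remixVideo(
  originalJob.id,
  'Same scene and framing, but set at night with light rain'
);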
Iterative Refinement Pattern
Build a refinement loop:
async function iterativeRefinement(
initialPrompt: string,
refinements: string[]
): Promise<string[]> {
const jobIds: string[] = [];
// Generate initial video
let job = await openai.videos.create({
model: 'sora-2',
size: '1280x720',
seconds: '4',
prompt: initialPrompt
});
// Wait for completion
while (job.status === 'in_progress' || job.status === 'queued') {
await new Promise(r => setTimeout(r, 2000));
job = await openai.videos.retrieve(job.id);
}
jobIds.push(job.id);
let currentJobId = job.id;
// Apply refinements sequentially
for (const refinement of refinements) {
currentJobId = await remixVideo(currentJobId, refinement);
jobIds.push(currentJobId);
}
return jobIds;
}
Use it to progressively refine a shot:
const jobIds = await iterativeRefinement(
'A chef cooking in a busy kitchen',
[
'Same scene, but change camera to over-shoulder angle',
'Same framing, add more steam and smoke in the air',
'Same everything, but change lighting to golden hour through window'
]
);
// Download all versions to compare
for (const id of jobIds) {
const response = await openai.videos.downloadContent(id);
// Save to disk or display
}
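If you're running this in Node.js rather than the browser, each downloaded version can be written straight to disk; a minimal sketch:
import { writeFile } from 'node:fs/promises';

for (const id of jobIds) {
  const response = await openai.videos.downloadContent(id);
  const buffer = Buffer.from(await response.arrayBuffer());
  await writeFile(`remix-${id}.mp4`, buffer);
}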
This is much faster than regenerating from scratch each time, and it preserves what's already working.
Chaining Segments for Longer Videos
Sora caps videos at 12 seconds. To create longer content, we chain segments using the input_reference parameter with image input.
The Core Technique
Extract the last frame from one video and use it as the first frame of the next:
async function extractLastFrame(videoBlob: Blob): Promise<Blob> {
const video = document.createElement('video');
video.src = URL.createObjectURL(videoBlob);
video.muted = true;
await new Promise(resolve => {
video.onloadedmetadata = resolve;
});
// Seek to the end
video.currentTime = video.duration;
await new Promise(resolve => {
video.onseeked = resolve;
});
// Draw to canvas
const canvas = document.createElement('canvas');
canvas.width = video.videoWidth;
canvas.height = video.videoHeight;
const ctx = canvas.getContext('2d');
if (!ctx) throw new Error('Canvas context unavailable');
ctx.drawImage(video, 0, 0);
// Export as JPEG
return new Promise((resolve, reject) => {
canvas.toBlob(
blob => blob ? resolve(blob) : reject(new Error('Failed to extract frame')),
'image/jpeg',
0.92
);
});
}
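Note that extractLastFrame depends on DOM APIs (a hidden video element and a canvas), so it only runs in the browser. In Node.js, one alternative is shelling out to ffmpeg; a rough sketch, assuming ffmpeg is installed and on your PATH:
import { execFile } from 'node:child_process';
import { promisify } from 'node:util';
import { readFile } from 'node:fs/promises';

const execFileAsync = promisify(execFile);

async function extractLastFrameNode(
  videoPath: string,
  framePath: string
): Promise<Buffer> {
  // Seek to roughly 0.1s before the end of the file and export a single JPEG frame
  await execFileAsync('ffmpeg', [
    '-y',
    '-sseof', '-0.1',
    '-i', videoPath,
    '-frames:v', '1',
    '-q:v', '2',
    framePath
  ]);
  return readFile(framePath);
}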
Generating a Segment with Image Input
import { toFile } from 'openai';
async function generateSegmentWithImage(
prompt: string,
inputImage?: Blob
): Promise<Blob> {
const body: any = {
model: 'sora-2',
prompt,
seconds: '8',
size: '1280x720'
};
if (inputImage) {
body.input_reference = await toFile(
inputImage,
'reference.jpg',
{ type: 'image/jpeg' }
);
}
let job = await openai.videos.create(body);
while (job.status === 'in_progress' || job.status === 'queued') {
await new Promise(resolve => setTimeout(resolve, 2000));
job = await openai.videos.retrieve(job.id);
}
if (job.status === 'failed') {
throw new Error(job.error?.message || 'Generation failed');
}
const response = await openai.videos.downloadContent(job.id);
const arrayBuffer = await response.arrayBuffer();
return new Blob([arrayBuffer], { type: 'video/mp4' });
}
Multi-Shot Prompting
Create a sequence of shots with visual continuity:
interface Shot {
prompt: string;
duration: 4 | 8 | 12;
}
async function generateSequence(shots: Shot[]): Promise<Blob[]> {
const segments: Blob[] = [];
let inputReference: Blob | undefined;
for (const shot of shots) {
const segment = await generateSegmentWithImage(
shot.prompt,
inputReference
);
segments.push(segment);
inputReference = await extractLastFrame(segment);
}
return segments;
}
Define a multi-shot sequence:
const sequence: Shot[] = [
{
duration: 8,
prompt: buildSoraPrompt({
style: 'Cinematic drama, 35mm film',
scene: 'A woman stands at a window, looking out at the city at dusk',
cinematography: {
camera: 'medium shot from behind, over shoulder',
lighting: 'warm interior light, cool blue twilight through window',
mood: 'contemplative, quiet'
},
actions: [
'She touches the window glass',
'Her breath fogs it slightly',
'She traces a shape with her finger'
]
})
},
{
duration: 8,
prompt: buildSoraPrompt({
style: 'Same cinematic style, maintaining continuity',
scene: 'Close-up of her hand on the window glass. The city lights twinkle outside.',
cinematography: {
camera: 'tight close-up on hand and glass',
lighting: 'same warm-cool contrast',
mood: 'intimate, reflective'
},
actions: [
'She traces a heart shape',
'Camera slowly pulls back to reveal her face in reflection',
'She smiles softly'
]
})
},
{
duration: 4,
prompt: buildSoraPrompt({
style: 'Same style, final beat',
scene: 'She turns away from the window and walks into the warmly lit room',
cinematography: {
camera: 'wide shot, static',
lighting: 'warm practical lights in room, fading twilight from window',
mood: 'resolution, peace'
},
actions: [
'She walks slowly across the room',
'She picks up a book from the table',
'She sits down in an armchair'
]
})
}
];
const segments = await generateSequence(sequence);
This creates three connected shots totaling 20 seconds. Each flows naturally into the next because we're using the last frame as the starting point.
Stitching Segments
Once you have all segments, stitch them into a single file. Here we use the mediabunny library for lossless concatenation:
import {
ALL_FORMATS,
BlobSource,
BufferTarget,
EncodedAudioPacketSource,
EncodedPacket,
EncodedPacketSink,
EncodedVideoPacketSource,
Input,
Mp4OutputFormat,
Output
} from 'mediabunny';
async function stitchSegments(segments: Blob[]): Promise<Blob> {
if (!segments.length) throw new Error('No segments to stitch');
const firstInput = new Input({
source: new BlobSource(segments[0]),
formats: ALL_FORMATS
});
const firstVideoTrack = await firstInput.getPrimaryVideoTrack();
const firstAudioTrack = await firstInput.getPrimaryAudioTrack();
if (!firstVideoTrack) throw new Error('No video track');
const videoCodec = (await firstVideoTrack.codec) || 'avc';
const audioCodec = firstAudioTrack
? (await firstAudioTrack.codec) || 'aac'
: null;
const target = new BufferTarget();
const output = new Output({
format: new Mp4OutputFormat({ fastStart: 'in-memory' }),
target
});
const outVideo = new EncodedVideoPacketSource(videoCodec as any);
output.addVideoTrack(outVideo);
let outAudio: EncodedAudioPacketSource | null = null;
if (audioCodec) {
outAudio = new EncodedAudioPacketSource(audioCodec as any);
output.addAudioTrack(outAudio);
}
await output.start();
let videoTimestampOffset = 0;
let audioTimestampOffset = 0;
for (const seg of segments) {
const input = new Input({
source: new BlobSource(seg),
formats: ALL_FORMATS
});
const vTrack = await input.getPrimaryVideoTrack();
const aTrack = await input.getPrimaryAudioTrack();
if (!vTrack) continue;
const vSink = new EncodedPacketSink(vTrack);
for await (const packet of vSink.packets()) {
const cloned: EncodedPacket = packet.clone({
timestamp: packet.timestamp + videoTimestampOffset
});
await outVideo.add(cloned);
}
videoTimestampOffset += await vTrack.computeDuration();
if (aTrack && outAudio) {
const aSink = new EncodedPacketSink(aTrack);
for await (const packet of aSink.packets()) {
const cloned: EncodedPacket = packet.clone({
timestamp: packet.timestamp + audioTimestampOffset
});
await outAudio.add(cloned);
}
audioTimestampOffset += await aTrack.computeDuration();
}
}
outVideo.close();
if (outAudio) outAudio.close();
await output.finalize();
const finalBuffer: ArrayBuffer | null =
(target as unknown as { buffer: ArrayBuffer | null }).buffer;
if (!finalBuffer) throw new Error('Failed to finalize video');
return new Blob([finalBuffer], { type: 'video/mp4' });
}
Put it all together:
const segments = await generateSequence(sequence);
const finalVideo = await stitchSegments(segments);
// Save or display
const url = URL.createObjectURL(finalVideo);
const videoElement = document.createElement('video');
videoElement.src = url;
videoElement.controls = true;
document.body.appendChild(videoElement);
Advanced Techniques
Ultra-Detailed Prompts for Professional Work
When you need precise control over every aesthetic choice, use production-level prompts:
const proPrompt = buildSoraPrompt({
style: `Format & Look: 180° shutter; digital capture emulating 65mm photochemical
contrast; fine grain; subtle halation on speculars.
Lenses & Filtration: 32mm spherical prime; Black Pro-Mist 1/4; slight CPL rotation
to manage glass reflections.`,
scene: `Urban commuter platform, dawn. Foreground: yellow safety line, coffee cup on
bench. Midground: waiting passengers silhouetted in haze. Background: arriving train
braking to a stop.`,
cinematography: {
camera: '32mm shoulder-mounted slow dolly left, low and close to lens axis',
lens: 'Spherical prime with minimal flare, preserve silhouette clarity',
lighting: `Natural sunlight from camera left, low angle (07:30 AM).
Bounce: 4×4 ultrabounce silver from trackside.
Negative fill from opposite wall.
Practical: sodium platform lights on dim fade.`,
mood: 'Anticipatory, realistic, tactile'
},
advanced: {
grade: `Highlights: clean morning sunlight with amber lift.
Mids: balanced neutrals with slight teal cast in shadows.
Blacks: soft, neutral with mild lift for haze retention.`,
atmosphere: 'Gentle mist; train exhaust drift through light beam.'
},
actions: [
'Camera slides past platform edge; train headlights flare softly through mist',
'Traveler mid-frame looks down tracks',
'Morning light blooms across lens naturally'
],
backgroundSound: `Diegetic only: faint rail screech, train brakes hiss,
distant announcement muffled, low ambient hum. No score.`
});
This level of detail is overkill for most use cases, but it's powerful when you need to match specific cinematography styles or maintain strict brand guidelines.
Prompt Versioning and A/B Testing
Track prompt versions and outcomes:
interface PromptVersion {
id: string;
prompt: string;
config: SoraPromptConfig;
generatedAt: Date;
jobId?: string;
rating?: number;
notes?: string;
}
class PromptVersionManager {
private versions: PromptVersion[] = [];
saveVersion(
config: SoraPromptConfig,
jobId?: string
): string {
const version: PromptVersion = {
id: crypto.randomUUID(),
prompt: buildSoraPrompt(config),
config,
generatedAt: new Date(),
jobId
};
this.versions.push(version);
return version.id;
}
rate(versionId: string, rating: number, notes?: string) {
const version = this.versions.find(v => v.id === versionId);
if (version) {
version.rating = rating;
version.notes = notes;
}
}
getBestVersions(limit: number = 5): PromptVersion[] {
return this.versions
.filter(v => v.rating !== undefined)
.sort((a, b) => (b.rating || 0) - (a.rating || 0))
.slice(0, limit);
}
exportHistory(): string {
return JSON.stringify(this.versions, null, 2);
}
}
Use it in your workflow:
const manager = new PromptVersionManager();
// Try multiple variations
const variations: SoraPromptConfig[] = [
{ scene: 'A cat sits on a windowsill, version A...', actions: ['...'] },
{ scene: 'A cat sits on a windowsill, version B...', actions: ['...'] },
{ scene: 'A cat sits on a windowsill, version C...', actions: ['...'] }
];
for (const config of variations) {
const prompt = buildSoraPrompt(config);
const job = await openai.videos.create({
model: 'sora-2',
size: '1280x720',
seconds: '4',
prompt
});
manager.saveVersion(config, job.id);
}
// Later, rate them
manager.rate('version-id-1', 4, 'Good framing but lighting too dark');
manager.rate('version-id-2', 5, 'Perfect!');
// Export for review
console.log(manager.exportHistory());
This makes it easy to iterate systematically and learn what works over time.
Wrapping Up
Effective prompting for Sora 2 is about understanding the boundary between what's controlled by API parameters and what's guided by prose. The model excels when you give it clear cinematographic direction: camera angles, lighting setups, action beats, and mood.
Building reusable prompt templates and style libraries lets you maintain consistency across projects. Iterating with remix helps you refine without starting over. And chaining segments with image input unlocks longer narratives that flow naturally.
The key is treating prompting as a technical craft, not just creative writing. Structure your prompts, version them, and iterate systematically. The more intentional you are about prompt design, the more control you'll have over the final output.
Full Code Examples
interface SoraPromptConfig {
style?: string;
scene: string;
cinematography?: {
camera?: string;
lens?: string;
lighting?: string;
mood?: string;
};
actions?: string[];
dialogue?: Array<{ speaker: string; line: string }>;
backgroundSound?: string;
advanced?: {
format?: string;
grade?: string;
atmosphere?: string;
};
}
function buildSoraPrompt(config: SoraPromptConfig): string {
const sections: string[] = [];
if (config.style) {
sections.push(`Style: ${config.style}`);
}
if (config.advanced?.format) {
sections.push(config.advanced.format);
}
sections.push(config.scene);
if (config.cinematography) {
const cine = config.cinematography;
sections.push('\nCinematography:');
if (cine.camera) sections.push(`Camera: ${cine.camera}`);
if (cine.lens) sections.push(`Lens: ${cine.lens}`);
if (cine.lighting) sections.push(`Lighting: ${cine.lighting}`);
if (cine.mood) sections.push(`Mood: ${cine.mood}`);
}
if (config.advanced?.grade) {
sections.push(`\nGrade: ${config.advanced.grade}`);
}
if (config.advanced?.atmosphere) {
sections.push(`Atmosphere: ${config.advanced.atmosphere}`);
}
if (config.actions?.length) {
sections.push('\nActions:');
config.actions.forEach(action => sections.push(`- ${action}`));
}
if (config.dialogue?.length) {
sections.push('\nDialogue:');
config.dialogue.forEach(d => sections.push(`- ${d.speaker}: "${d.line}"`));
}
if (config.backgroundSound) {
sections.push(`\nBackground Sound:\n${config.backgroundSound}`);
}
return sections.join('\n').trim();
}
// Usage
const prompt = buildSoraPrompt({
style: 'Cinematic documentary, shot on 35mm',
scene: 'A painter works in a sunlit studio. Canvases line the walls.',
cinematography: {
camera: 'slow push-in, medium to close-up',
lighting: 'natural window light from left, soft shadows',
mood: 'contemplative, peaceful'
},
actions: [
'She dips her brush in paint',
'She makes confident strokes on canvas',
'She steps back to examine her work'
],
backgroundSound: 'Quiet ambient: birds chirping outside, brush on canvas'
});
import OpenAI, { toFile } from 'openai';
const openai = new OpenAI({
apiKey: process.env.OPENAI_API_KEY,
dangerouslyAllowBrowser: true
});
interface Shot {
prompt: string;
duration: 4 | 8 | 12;
}
async function extractLastFrame(videoBlob: Blob): Promise<Blob> {
const video = document.createElement('video');
video.src = URL.createObjectURL(videoBlob);
video.muted = true;
await new Promise(resolve => { video.onloadedmetadata = resolve; });
video.currentTime = video.duration;
await new Promise(resolve => { video.onseeked = resolve; });
const canvas = document.createElement('canvas');
canvas.width = video.videoWidth;
canvas.height = video.videoHeight;
const ctx = canvas.getContext('2d');
if (!ctx) throw new Error('Canvas unavailable');
ctx.drawImage(video, 0, 0);
return new Promise((resolve, reject) => {
canvas.toBlob(
blob => blob ? resolve(blob) : reject(new Error('Failed to extract')),
'image/jpeg',
0.92
);
});
}
async function generateSegment(
prompt: string,
duration: 4 | 8 | 12,
inputImage?: Blob
): Promise<Blob> {
const body: any = {
model: 'sora-2',
prompt,
seconds: String(duration),
size: '1280x720'
};
if (inputImage) {
body.input_reference = await toFile(inputImage, 'ref.jpg', {
type: 'image/jpeg'
});
}
let job = await openai.videos.create(body);
while (job.status === 'in_progress' || job.status === 'queued') {
await new Promise(r => setTimeout(r, 2000));
job = await openai.videos.retrieve(job.id);
}
if (job.status === 'failed') {
throw new Error(job.error?.message || 'Generation failed');
}
const response = await openai.videos.downloadContent(job.id);
return new Blob([await response.arrayBuffer()], { type: 'video/mp4' });
}
async function generateSequence(shots: Shot[]): Promise<Blob[]> {
const segments: Blob[] = [];
let inputRef: Blob | undefined;
for (const shot of shots) {
console.log(`Generating shot: ${shot.prompt.slice(0, 50)}...`);
const segment = await generateSegment(shot.prompt, shot.duration, inputRef);
segments.push(segment);
inputRef = await extractLastFrame(segment);
}
return segments;
}
// Usage
const sequence: Shot[] = [
{
duration: 8,
prompt: 'A woman walks through a park in autumn. Leaves fall around her. Warm afternoon light.'
},
{
duration: 8,
prompt: 'Close-up of her hand catching a falling leaf. She examines it closely.'
},
{
duration: 4,
prompt: 'She looks up and smiles. Camera pulls back to reveal the park behind her.'
}
];
const segments = await generateSequence(sequence);
console.log(`Generated ${segments.length} segments`);
interface StylePreset {
name: string;
filmStock: string;
lens: string;
lighting: string;
colorPalette: string;
}
const styleLibrary: Record<string, StylePreset> = {
documentary: {
name: 'Modern Documentary',
filmStock: 'Digital capture, natural grain',
lens: '35mm handheld, slight shake',
lighting: 'Available light, mixed color temperature',
colorPalette: 'Desaturated, neutral tones with teal shadows'
},
cinematic: {
name: 'Cinematic Drama',
filmStock: '35mm film, fine grain',
lens: '50mm spherical prime, shallow depth of field',
lighting: 'Motivated three-point lighting, warm key',
colorPalette: 'Rich colors, deep blacks, warm-cool contrast'
},
vintage: {
name: 'Vintage 1970s',
filmStock: '16mm film with visible grain and halation',
lens: '28mm spherical, slight vignette',
lighting: 'Practical sources, mixed warm tungsten',
colorPalette: 'Faded earth tones, amber lift in highlights'
},
commercial: {
name: 'High-End Commercial',
filmStock: 'Digital ARRI capture, clean and sharp',
lens: '40mm spherical, controlled depth of field',
lighting: 'Controlled studio lighting, soft beauty key',
colorPalette: 'Saturated, clean colors with high contrast'
}
};
function applyStylePreset(
baseConfig: SoraPromptConfig,
presetName: keyof typeof styleLibrary
): SoraPromptConfig {
const preset = styleLibrary[presetName];
return {
...baseConfig,
style: `${preset.name}: ${preset.filmStock}`,
cinematography: {
...baseConfig.cinematography,
lens: preset.lens,
lighting: preset.lighting
},
advanced: {
...baseConfig.advanced,
grade: preset.colorPalette
}
};
}
// Usage
let config: SoraPromptConfig = {
scene: 'A chef plates a dish in a restaurant kitchen',
actions: ['She carefully arranges garnish', 'She wipes the plate edge']
};
config = applyStylePreset(config, 'commercial');
const prompt = buildSoraPrompt(config);
