
Generate Long Videos by Chaining Sora 2 Segments
OpenAI's Sora 2 caps videos at 12 seconds. Learn how to create longer videos by chaining segments together, using the last frame of each video as the first frame of the next.
Patrick the AI Engineer
Introduction
You've got a killer prompt for Sora 2, and you hit generate. The video looks great, but it's only 12 seconds long. Your idea needs at least 20 seconds to really work. You could generate multiple separate videos, but they'd have jarring cuts between them. Each segment would start fresh with no connection to the previous one.
This is the fundamental limitation of Sora's API right now. You can request 4, 8, or 12 seconds, and that's it. But there's a way around it. Sora supports an input_reference parameter that lets you provide an image for the first frame. If we extract the last frame from one generated video and use it as the input reference for the next, we can chain segments together seamlessly.
We'll build a tool that generates longer videos by automatically planning segments, generating them in sequence, and stitching them into a single file. The result feels like one continuous video, not a series of cuts.
The Core Concept
The idea is straightforward: generate video segments one at a time, passing visual continuity from one to the next. After we generate the first segment, we extract its final frame as a JPEG. That image becomes the input_reference for the second segment, so when Sora generates it, the new clip starts from exactly where the first one ended.
We repeat this process for as many segments as needed. Want 20 seconds? Generate a 12-second segment, then an 8-second segment, chaining them together. Need 24 seconds? That's two 12-second segments. The planning logic is simple: use the largest allowed segment sizes (12, 8, 4) until you hit your target duration.
Once all segments are generated, we stitch them together into a single MP4 file. Because every segment comes from the same model with the same settings, we can usually do a lossless concatenation at the packet level; if the codecs or metadata ever don't line up, you can fall back to decoding and re-encoding, but this guide sticks to the lossless path. Either way, we end up with one smooth video file instead of a collection of clips.
The key insight is that Sora's input_reference isn't just for starting from a specific image. It's a tool for maintaining continuity across multiple generations. Each segment picks up where the last one left off.
Implementation: Vanilla TypeScript
Let's build this step by step. We'll start with planning the segments, then generate each one, and finally stitch them together.
Planning Segments
First, we need a function that breaks a desired duration into valid segment sizes:
type AllowedSegmentSeconds = 4 | 8 | 12;
function planSegments(totalSeconds: number): AllowedSegmentSeconds[] {
const allowed: AllowedSegmentSeconds[] = [12, 8, 4];
const segments: AllowedSegmentSeconds[] = [];
let remaining = totalSeconds;
for (const segmentSize of allowed) {
while (remaining >= segmentSize) {
segments.push(segmentSize);
remaining -= segmentSize;
}
}
if (remaining !== 0) {
throw new Error(`Cannot plan segments for ${totalSeconds} seconds`);
}
return segments;
}
This greedy approach always picks the largest segment size possible. For 20 seconds, you get [12, 8]. For 24 seconds, you get [12, 12]. Any duration that isn't a multiple of 4 will throw, but for every valid duration the greedy plan uses the fewest possible segments, which minimizes the number of API calls.
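A few concrete calls show how the plan falls out of that loop:
planSegments(16); // [12, 4]
planSegments(20); // [12, 8]
planSegments(24); // [12, 12]
planSegments(28); // [12, 12, 4]
planSegments(15); // throws: 15 can't be reached with 4/8/12-second segments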
Extracting the Last Frame
After generating a video, we need to grab its final frame. We'll use the browser's built-in video decoding capabilities:
async function extractLastFrame(videoBlob: Blob): Promise<Blob> {
const video = document.createElement('video');
video.src = URL.createObjectURL(videoBlob);
video.muted = true;
await new Promise((resolve) => {
video.onloadedmetadata = resolve;
});
// Seek to the end
video.currentTime = video.duration;
await new Promise((resolve) => {
video.onseeked = resolve;
});
Once we're at the last frame, we draw it to a canvas and export as JPEG:
const canvas = document.createElement('canvas');
canvas.width = video.videoWidth;
canvas.height = video.videoHeight;
const ctx = canvas.getContext('2d');
if (!ctx) throw new Error('Canvas context unavailable');
ctx.drawImage(video, 0, 0);
return new Promise((resolve, reject) => {
canvas.toBlob((blob) => {
if (blob) resolve(blob);
else reject(new Error('Failed to extract frame'));
}, 'image/jpeg', 0.92);
});
}
The quality setting of 0.92 gives us a good balance between file size and visual quality. Sora doesn't need a perfect lossless image to maintain continuity.
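Two small robustness tweaks are worth considering here, though neither is required: release the object URL once the frame has been captured, and seek slightly before the reported duration, since some browsers can hand back a blank frame when you seek to the exact end.
// After the canvas capture has resolved:
URL.revokeObjectURL(video.src);

// When seeking, back off a fraction of a second from the end:
video.currentTime = Math.max(0, video.duration - 0.05);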
Generating a Segment
Now we can generate a single segment, optionally using an input reference:
import OpenAI, { toFile } from 'openai';
async function generateSegment(
openai: OpenAI,
prompt: string,
seconds: AllowedSegmentSeconds,
model: 'sora-2' | 'sora-2-pro',
size: string,
inputReference?: Blob
): Promise<Blob> {
const body: any = { model, prompt, seconds: String(seconds), size };
if (inputReference) {
body.input_reference = await toFile(
inputReference,
'reference.jpg',
{ type: 'image/jpeg' }
);
}
We start the generation job and poll until it completes:
let job = await openai.videos.create(body);
while (job.status === 'in_progress' || job.status === 'queued') {
await new Promise(resolve => setTimeout(resolve, 2000));
job = await openai.videos.retrieve(job.id);
}
if (job.status === 'failed') {
throw new Error(job.error?.message || 'Generation failed');
}
const response = await openai.videos.downloadContent(job.id);
const arrayBuffer = await response.arrayBuffer();
return new Blob([arrayBuffer], { type: 'video/mp4' });
}
The polling interval of 2 seconds is a reasonable balance. Sora jobs take anywhere from under a minute to several minutes, so checking more frequently doesn't speed anything up.
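If you want a little more safety around that loop, you can pull the polling into a helper with a hard timeout so a stuck job doesn't spin forever. Here's a minimal sketch that reuses the same videos.retrieve call as above; the ten-minute ceiling is an arbitrary choice.
async function waitForVideo(
  openai: OpenAI,
  jobId: string,
  timeoutMs = 10 * 60 * 1000
) {
  const started = Date.now();
  let job = await openai.videos.retrieve(jobId);
  while (job.status === 'in_progress' || job.status === 'queued') {
    if (Date.now() - started > timeoutMs) {
      throw new Error(`Timed out waiting for video job ${jobId}`);
    }
    await new Promise(resolve => setTimeout(resolve, 2000));
    job = await openai.videos.retrieve(jobId);
  }
  return job;
}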
Chaining Segments Together
Now we can orchestrate the full process:
async function generateLongVideo(
openai: OpenAI,
prompt: string,
totalSeconds: number,
model: 'sora-2' | 'sora-2-pro',
size: string
): Promise<Blob[]> {
const plan = planSegments(totalSeconds);
const segments: Blob[] = [];
let inputReference: Blob | undefined;
for (const seconds of plan) {
const segment = await generateSegment(
openai,
prompt,
seconds,
model,
size,
inputReference
);
After each segment, extract its last frame for the next one:
segments.push(segment);
inputReference = await extractLastFrame(segment);
}
return segments;
}
This gives us an array of video blobs. Each one starts where the previous one ended. The last step is stitching them into a single file.
Stitching Segments
For stitching, we’ll use Mediabunny for lossless packet-level concatenation. This avoids re-encoding and keeps quality intact. If you need a fallback re-encode, you can add it later.
import {
ALL_FORMATS,
BlobSource,
BufferTarget,
EncodedAudioPacketSource,
EncodedPacket,
EncodedPacketSink,
EncodedVideoPacketSource,
Input,
Mp4OutputFormat,
Output
} from 'mediabunny';
async function stitchSegments(segments: Blob[]): Promise<Blob> {
if (!segments.length) throw new Error('No segments to stitch');
// Derive codecs from first segment
const firstInput = new Input({ source: new BlobSource(segments[0]), formats: ALL_FORMATS });
const firstVideoTrack = await firstInput.getPrimaryVideoTrack();
const firstAudioTrack = await firstInput.getPrimaryAudioTrack();
if (!firstVideoTrack) throw new Error('First segment has no video track');
const videoCodec = (await firstVideoTrack.codec) || 'avc';
const audioCodec = firstAudioTrack ? (await firstAudioTrack.codec) || 'aac' : null;
const target = new BufferTarget();
const output = new Output({ format: new Mp4OutputFormat({ fastStart: 'in-memory' }), target });
const outVideo = new EncodedVideoPacketSource(videoCodec as any);
output.addVideoTrack(outVideo);
let outAudio: EncodedAudioPacketSource | null = null;
if (audioCodec) {
outAudio = new EncodedAudioPacketSource(audioCodec as any);
output.addAudioTrack(outAudio);
}
await output.start();
let videoTimestampOffset = 0;
let audioTimestampOffset = 0;
for (const seg of segments) {
const input = new Input({ source: new BlobSource(seg), formats: ALL_FORMATS });
const vTrack = await input.getPrimaryVideoTrack();
const aTrack = await input.getPrimaryAudioTrack();
if (!vTrack) continue;
// Copy video packets
const vSink = new EncodedPacketSink(vTrack);
for await (const packet of vSink.packets()) {
const cloned: EncodedPacket = packet.clone({ timestamp: packet.timestamp + videoTimestampOffset });
await outVideo.add(cloned);
}
const vDur = await vTrack.computeDuration();
videoTimestampOffset += vDur;
// Copy audio packets if available
if (aTrack && outAudio) {
const aSink = new EncodedPacketSink(aTrack);
for await (const packet of aSink.packets()) {
const cloned: EncodedPacket = packet.clone({ timestamp: packet.timestamp + audioTimestampOffset });
await outAudio.add(cloned);
}
const aDur = await aTrack.computeDuration();
audioTimestampOffset += aDur;
}
}
outVideo.close();
if (outAudio) outAudio.close();
await output.finalize();
// Retrieve the in-memory MP4 buffer and return it as a Blob
const finalBuffer: ArrayBuffer | null = (target as unknown as { buffer: ArrayBuffer | null }).buffer;
if (!finalBuffer) throw new Error('Failed to finalize stitched video');
return new Blob([finalBuffer], { type: 'video/mp4' });
}
This approach is fast and preserves quality. Add a decode/encode fallback only if you need to reconcile codec mismatches.
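Since packet copying only works when every segment shares the same codec, it's worth verifying that up front and failing with a clear message instead of producing a broken file. A quick pre-check using the same Mediabunny calls as above:
async function assertSameVideoCodec(segments: Blob[]): Promise<void> {
  let expected: string | null = null;
  for (const seg of segments) {
    const input = new Input({ source: new BlobSource(seg), formats: ALL_FORMATS });
    const track = await input.getPrimaryVideoTrack();
    if (!track) throw new Error('Segment has no video track');
    const codec = String((await track.codec) ?? 'unknown');
    if (expected === null) {
      expected = codec;
    } else if (codec !== expected) {
      throw new Error(`Codec mismatch across segments: ${expected} vs ${codec}`);
    }
  }
}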
Implementation: Vue.js
Vue makes this pattern much cleaner by handling reactivity and state management for us. Let's rebuild the same functionality as a Vue component.
Setting Up State
We'll track the form inputs, generation progress, and results:
<script setup lang="ts">
import OpenAI, { toFile } from 'openai';
import { ref, computed } from 'vue';
const promptText = ref('');
const model = ref<'sora-2' | 'sora-2-pro'>('sora-2');
const totalSeconds = ref<16 | 20 | 24>(16);
const size = ref('720x1280');
const isGenerating = ref(false);
const currentSegment = ref(0);
const totalSegments = ref(0);
const overallProgress = ref(0);
const segmentBlobs = ref<Blob[]>([]);
const finalVideoUrl = ref<string | null>(null);
Vue's reactivity means our UI will automatically update when these values change. No need to manually trigger re-renders.
Computing Cost Estimates
We can use a computed property to show users the estimated cost in real-time:
const estimatedCost = computed(() => {
if (model.value === 'sora-2') {
return totalSeconds.value * 0.10;
}
const isHighRes = size.value === '1024x1792' || size.value === '1792x1024';
return totalSeconds.value * (isHighRes ? 0.50 : 0.30);
});
This recalculates automatically when the user changes the model, duration, or resolution. The pricing is based on OpenAI's current rates: $0.10/second for Sora 2, $0.30-0.50/second for Sora 2 Pro.
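As a quick sanity check: a 20-second video on Sora 2 comes out to 20 × $0.10 = $2.00, while the same 20 seconds on Sora 2 Pro ranges from 20 × $0.30 = $6.00 at standard resolution to 20 × $0.50 = $10.00 at the high-res sizes.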
Generating Segments with Progress
Here's the main generation function, adapted for Vue:
async function startGeneration() {
isGenerating.value = true;
segmentBlobs.value = [];
finalVideoUrl.value = null;
const openai = new OpenAI({
apiKey: yourApiKey, // placeholder: supply the user's key (the full example below uses an apiKey ref)
dangerouslyAllowBrowser: true
});
const plans = planSegments(totalSeconds.value);
totalSegments.value = plans.length;
let inputReference: Blob | undefined;
As we generate each segment, we update progress state:
for (let i = 0; i < plans.length; i++) {
currentSegment.value = i + 1;
const segment = await generateSegment(
openai,
promptText.value,
plans[i],
model.value,
size.value,
inputReference
);
segmentBlobs.value.push(segment);
inputReference = await extractLastFrame(segment);
overallProgress.value = Math.round(((i + 1) / plans.length) * 100);
}
After all segments are generated, stitch them together:
const finalBlob = await stitchSegments(segmentBlobs.value);
finalVideoUrl.value = URL.createObjectURL(finalBlob);
isGenerating.value = false;
}
Vue's reactivity automatically updates the UI as currentSegment, overallProgress, and finalVideoUrl change. We don't need to write any DOM manipulation code.
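One refinement worth making here (the full Vue example at the end of this post does this): wrap the generation flow in try/catch/finally so a failed segment surfaces an error message and isGenerating is always reset, even when something throws. Assuming an errorMessage ref exists alongside the other state, the pattern looks like this:
try {
  // ...segment loop and stitching from above...
} catch (e: any) {
  errorMessage.value = e?.message || String(e);
} finally {
  isGenerating.value = false;
}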
Handling Segment Generation
We can extract the per-segment logic into a helper that updates progress as the job runs:
async function generateSegment(
openai: OpenAI,
prompt: string,
seconds: number,
model: string,
size: string,
inputRef?: Blob
): Promise<Blob> {
const body: any = { model, prompt, seconds: String(seconds), size };
if (inputRef) {
body.input_reference = await toFile(inputRef, 'ref.jpg', {
type: 'image/jpeg'
});
}
Poll the job status and update a reactive progress value:
let job = await openai.videos.create(body);
while (job.status === 'in_progress' || job.status === 'queued') {
await new Promise(r => setTimeout(r, 2000));
job = await openai.videos.retrieve(job.id);
const progress = job.progress ?? 0;
segmentProgress.value = progress;
}
if (job.status === 'failed') {
throw new Error(job.error?.message || 'Segment failed');
}
const resp = await openai.videos.downloadContent(job.id);
return new Blob([await resp.arrayBuffer()], { type: 'video/mp4' });
}
We've added a segmentProgress reactive value that updates as each individual segment generates. This lets us show both per-segment and overall progress to the user.
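If you'd rather drive a single progress bar, the two values blend naturally into one computed. A small sketch, assuming the currentSegment, totalSegments, and segmentProgress refs defined earlier:
const combinedProgress = computed(() => {
  if (!totalSegments.value) return 0;
  const completed = Math.max(0, currentSegment.value - 1);
  const current = segmentProgress.value / 100;
  return Math.round(((completed + current) / totalSegments.value) * 100);
});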
Building the UI
The template is straightforward. Here's the form section:
<template>
<div class="generator">
<h1>Long Video Generator</h1>
<div class="form">
<label>
Prompt
<textarea v-model="promptText" rows="4" />
</label>
<label>
Model
<select v-model="model">
<option value="sora-2">Sora 2 ($0.10/sec)</option>
<option value="sora-2-pro">Sora 2 Pro ($0.30-0.50/sec)</option>
</select>
</label>
Show the cost estimate and generation button:
<div class="cost">
Estimated Cost: ${{ estimatedCost.toFixed(2) }}
</div>
<button
@click="startGeneration"
:disabled="!promptText || isGenerating"
>
{{ isGenerating ? 'Generating...' : 'Generate Video' }}
</button>
</div>
Display progress while generating:
<div v-if="isGenerating" class="progress">
<div>
Overall: {{ overallProgress }}%
</div>
<div>
Segment {{ currentSegment }} of {{ totalSegments }}
</div>
</div>
<video v-if="finalVideoUrl" :src="finalVideoUrl" controls />
</div>
</template>
Vue handles all the conditional rendering and data binding. When isGenerating becomes true, the progress section appears. When finalVideoUrl is set, the video player appears.
Wrapping Up
We've built a system that breaks through Sora's 12-second limit by chaining segments with visual continuity. The pattern is simple: generate, extract last frame, use as input for next segment, repeat. With proper stitching, the result is a single smooth video.
The next step would be adding persistence so users don't lose progress on failures. You could also experiment with different segment planning strategies. Maybe shorter segments give better continuity for certain types of prompts, or maybe you want to adjust the prompt slightly for each segment to maintain narrative coherence.
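As a rough sketch of the persistence idea: if you record the job ID of each completed segment, you can re-download those segments with videos.downloadContent instead of paying to regenerate them after a crash, as long as the assets are still available on OpenAI's side. The storage key and shape below are arbitrary choices.
interface SavedRun {
  prompt: string;
  jobIds: string[]; // completed segment job IDs, in order
}

function saveRun(run: SavedRun): void {
  localStorage.setItem('long-video-run', JSON.stringify(run));
}

function loadRun(): SavedRun | null {
  const raw = localStorage.getItem('long-video-run');
  return raw ? (JSON.parse(raw) as SavedRun) : null;
}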
We could even extend this to much longer videos: 30 seconds, 60 seconds, maybe even a couple of minutes. The sky (and your wallet) is the limit!
Full Code Examples
import OpenAI, { toFile } from 'openai';
import {
ALL_FORMATS,
BlobSource,
BufferTarget,
EncodedAudioPacketSource,
EncodedPacket,
EncodedPacketSink,
EncodedVideoPacketSource,
Input,
Mp4OutputFormat,
Output
} from 'mediabunny';
type AllowedSegmentSeconds = 4 | 8 | 12;
type VideoModel = 'sora-2' | 'sora-2-pro';
// Plan segments to reach target duration
function planSegments(totalSeconds: number): AllowedSegmentSeconds[] {
const allowed: AllowedSegmentSeconds[] = [12, 8, 4];
const segments: AllowedSegmentSeconds[] = [];
let remaining = totalSeconds;
for (const segmentSize of allowed) {
while (remaining >= segmentSize) {
segments.push(segmentSize);
remaining -= segmentSize;
}
}
if (remaining !== 0) {
throw new Error(`Cannot plan segments for ${totalSeconds} seconds`);
}
return segments;
}
// Extract last frame from video as JPEG
async function extractLastFrame(videoBlob: Blob): Promise<Blob> {
const video = document.createElement('video');
video.src = URL.createObjectURL(videoBlob);
video.muted = true;
await new Promise((resolve) => {
video.onloadedmetadata = resolve;
});
video.currentTime = video.duration;
await new Promise((resolve) => {
video.onseeked = resolve;
});
const canvas = document.createElement('canvas');
canvas.width = video.videoWidth;
canvas.height = video.videoHeight;
const ctx = canvas.getContext('2d');
if (!ctx) throw new Error('Canvas context unavailable');
ctx.drawImage(video, 0, 0);
return new Promise((resolve, reject) => {
canvas.toBlob(
(blob) => blob ? resolve(blob) : reject(new Error('Failed to extract frame')),
'image/jpeg',
0.92
);
});
}
// Generate a single video segment
async function generateSegment(
openai: OpenAI,
prompt: string,
seconds: AllowedSegmentSeconds,
model: VideoModel,
size: string,
inputReference?: Blob
): Promise<Blob> {
const body: any = {
model,
prompt,
seconds: String(seconds),
size
};
if (inputReference) {
body.input_reference = await toFile(
inputReference,
'reference.jpg',
{ type: 'image/jpeg' }
);
}
let job = await openai.videos.create(body);
while (job.status === 'in_progress' || job.status === 'queued') {
await new Promise(resolve => setTimeout(resolve, 2000));
job = await openai.videos.retrieve(job.id);
}
if (job.status === 'failed') {
throw new Error(job.error?.message || 'Generation failed');
}
const response = await openai.videos.downloadContent(job.id);
const arrayBuffer = await response.arrayBuffer();
return new Blob([arrayBuffer], { type: 'video/mp4' });
}
// Generate long video by chaining segments
async function generateLongVideo(
apiKey: string,
prompt: string,
totalSeconds: number,
model: VideoModel = 'sora-2',
size: string = '720x1280'
): Promise<Blob[]> {
const openai = new OpenAI({
apiKey,
dangerouslyAllowBrowser: true
});
const plan = planSegments(totalSeconds);
const segments: Blob[] = [];
let inputReference: Blob | undefined;
for (const seconds of plan) {
const segment = await generateSegment(
openai,
prompt,
seconds,
model,
size,
inputReference
);
segments.push(segment);
inputReference = await extractLastFrame(segment);
}
return segments;
}
// Stitch segments into single video (lossless, no re-encode)
async function stitchSegments(segments: Blob[]): Promise<Blob> {
if (!segments.length) throw new Error('No segments to stitch');
const firstInput = new Input({ source: new BlobSource(segments[0]), formats: ALL_FORMATS });
const firstVideoTrack = await firstInput.getPrimaryVideoTrack();
const firstAudioTrack = await firstInput.getPrimaryAudioTrack();
if (!firstVideoTrack) throw new Error('First segment has no video track');
const videoCodec = (await firstVideoTrack.codec) || 'avc';
const audioCodec = firstAudioTrack ? (await firstAudioTrack.codec) || 'aac' : null;
const target = new BufferTarget();
const output = new Output({ format: new Mp4OutputFormat({ fastStart: 'in-memory' }), target });
const outVideo = new EncodedVideoPacketSource(videoCodec as any);
output.addVideoTrack(outVideo);
let outAudio: EncodedAudioPacketSource | null = null;
if (audioCodec) {
outAudio = new EncodedAudioPacketSource(audioCodec as any);
output.addAudioTrack(outAudio);
}
await output.start();
let videoTimestampOffset = 0;
let audioTimestampOffset = 0;
for (const seg of segments) {
const input = new Input({ source: new BlobSource(seg), formats: ALL_FORMATS });
const vTrack = await input.getPrimaryVideoTrack();
const aTrack = await input.getPrimaryAudioTrack();
if (!vTrack) continue;
const vSink = new EncodedPacketSink(vTrack);
for await (const packet of vSink.packets()) {
const cloned: EncodedPacket = packet.clone({ timestamp: packet.timestamp + videoTimestampOffset });
await outVideo.add(cloned);
}
const vDur = await vTrack.computeDuration();
videoTimestampOffset += vDur;
if (aTrack && outAudio) {
const aSink = new EncodedPacketSink(aTrack);
for await (const packet of aSink.packets()) {
const cloned: EncodedPacket = packet.clone({ timestamp: packet.timestamp + audioTimestampOffset });
await outAudio.add(cloned);
}
const aDur = await aTrack.computeDuration();
audioTimestampOffset += aDur;
}
}
outVideo.close();
if (outAudio) outAudio.close();
await output.finalize();
const finalBuffer: ArrayBuffer | null = (target as unknown as { buffer: ArrayBuffer | null }).buffer;
if (!finalBuffer) throw new Error('Failed to finalize stitched video');
return new Blob([finalBuffer], { type: 'video/mp4' });
}
// Usage
async function main() {
const segments = await generateLongVideo(
'your-api-key',
'A serene ocean sunset with rolling waves',
20
);
const finalVideo = await stitchSegments(segments);
const url = URL.createObjectURL(finalVideo);
const video = document.createElement('video');
video.src = url;
video.controls = true;
document.body.appendChild(video);
}
main().catch(console.error);
<script setup lang="ts">
import OpenAI, { toFile } from 'openai';
import { ref, computed } from 'vue';
import {
ALL_FORMATS,
BlobSource,
BufferTarget,
EncodedAudioPacketSource,
EncodedPacket,
EncodedPacketSink,
EncodedVideoPacketSource,
Input,
Mp4OutputFormat,
Output
} from 'mediabunny';
type AllowedSegmentSeconds = 4 | 8 | 12;
type VideoModel = 'sora-2' | 'sora-2-pro';
type VideoSize = '720x1280' | '1280x720' | '1024x1792' | '1792x1024';
// Form state
const promptText = ref('');
const model = ref<VideoModel>('sora-2');
const size = ref<VideoSize>('720x1280');
const totalSeconds = ref<16 | 20 | 24>(16);
const apiKey = ref(''); // Get from user input or env
// UI state
const isGenerating = ref(false);
const overallProgress = ref(0);
const currentSegment = ref(0);
const totalSegments = ref(0);
const segmentProgress = ref(0);
const errorMessage = ref('');
// Results
const segmentBlobs = ref<Blob[]>([]);
const finalVideoUrl = ref<string | null>(null);
// Cost estimation
const estimatedCost = computed(() => {
if (model.value === 'sora-2') {
return totalSeconds.value * 0.10;
}
const isHighRes = size.value === '1024x1792' || size.value === '1792x1024';
return totalSeconds.value * (isHighRes ? 0.50 : 0.30);
});
// Size options based on selected model
const sizeOptions = computed(() =>
model.value === 'sora-2'
? [
{ label: '720×1280 (portrait)', value: '720x1280' },
{ label: '1280×720 (landscape)', value: '1280x720' }
]
: [
{ label: '720×1280 (portrait)', value: '720x1280' },
{ label: '1280×720 (landscape)', value: '1280x720' },
{ label: '1024×1792 (portrait, high-res)', value: '1024x1792' },
{ label: '1792×1024 (landscape, high-res)', value: '1792x1024' }
]
);
function planSegments(total: number): AllowedSegmentSeconds[] {
const allowed: AllowedSegmentSeconds[] = [12, 8, 4];
const out: AllowedSegmentSeconds[] = [];
let remain = total;
for (const seg of allowed) {
while (remain >= seg) {
out.push(seg);
remain -= seg;
}
}
if (remain !== 0) {
throw new Error('Unable to plan segments for requested duration.');
}
return out;
}
async function extractLastFrame(videoBlob: Blob): Promise<Blob> {
const video = document.createElement('video');
video.src = URL.createObjectURL(videoBlob);
video.muted = true;
await new Promise((resolve) => {
video.onloadedmetadata = resolve;
});
video.currentTime = video.duration;
await new Promise((resolve) => {
video.onseeked = resolve;
});
const canvas = document.createElement('canvas');
canvas.width = video.videoWidth;
canvas.height = video.videoHeight;
const ctx = canvas.getContext('2d');
if (!ctx) throw new Error('Canvas context unavailable');
ctx.drawImage(video, 0, 0);
return new Promise((resolve, reject) => {
canvas.toBlob(
(blob) => blob ? resolve(blob) : reject(new Error('Failed to extract frame')),
'image/jpeg',
0.92
);
});
}
async function generateSegment(
openai: OpenAI,
prompt: string,
seconds: AllowedSegmentSeconds,
inputRef?: Blob
): Promise<Blob> {
const body: any = {
model: model.value,
prompt,
seconds: String(seconds),
size: size.value
};
if (inputRef) {
body.input_reference = await toFile(inputRef, 'ref.jpg', {
type: 'image/jpeg'
});
}
let job = await openai.videos.create(body);
while (job.status === 'in_progress' || job.status === 'queued') {
await new Promise(r => setTimeout(r, 2000));
job = await openai.videos.retrieve(job.id);
segmentProgress.value = job.progress ?? 0;
}
if (job.status === 'failed') {
throw new Error(job.error?.message || 'Segment failed');
}
const resp = await openai.videos.downloadContent(job.id);
const ab = await resp.arrayBuffer();
return new Blob([ab], { type: 'video/mp4' });
}
async function stitchSegments(blobs: Blob[]): Promise<Blob> {
if (!blobs.length) throw new Error('No segments to stitch');
const firstInput = new Input({ source: new BlobSource(blobs[0]), formats: ALL_FORMATS });
const firstVideoTrack = await firstInput.getPrimaryVideoTrack();
const firstAudioTrack = await firstInput.getPrimaryAudioTrack();
if (!firstVideoTrack) throw new Error('First segment has no video track');
const videoCodec = (await firstVideoTrack.codec) || 'avc';
const audioCodec = firstAudioTrack ? (await firstAudioTrack.codec) || 'aac' : null;
const target = new BufferTarget();
const output = new Output({ format: new Mp4OutputFormat({ fastStart: 'in-memory' }), target });
const outVideo = new EncodedVideoPacketSource(videoCodec as any);
output.addVideoTrack(outVideo);
let outAudio: EncodedAudioPacketSource | null = null;
if (audioCodec) {
outAudio = new EncodedAudioPacketSource(audioCodec as any);
output.addAudioTrack(outAudio);
}
await output.start();
let videoTimestampOffset = 0;
let audioTimestampOffset = 0;
for (const seg of blobs) {
const input = new Input({ source: new BlobSource(seg), formats: ALL_FORMATS });
const vTrack = await input.getPrimaryVideoTrack();
const aTrack = await input.getPrimaryAudioTrack();
if (!vTrack) continue;
const vSink = new EncodedPacketSink(vTrack);
for await (const packet of vSink.packets()) {
const cloned: EncodedPacket = packet.clone({ timestamp: packet.timestamp + videoTimestampOffset });
await outVideo.add(cloned);
}
const vDur = await vTrack.computeDuration();
videoTimestampOffset += vDur;
if (aTrack && outAudio) {
const aSink = new EncodedPacketSink(aTrack);
for await (const packet of aSink.packets()) {
const cloned: EncodedPacket = packet.clone({ timestamp: packet.timestamp + audioTimestampOffset });
await outAudio.add(cloned);
}
const aDur = await aTrack.computeDuration();
audioTimestampOffset += aDur;
}
}
outVideo.close();
if (outAudio) outAudio.close();
await output.finalize();
const finalBuffer: ArrayBuffer | null = (target as unknown as { buffer: ArrayBuffer | null }).buffer;
if (!finalBuffer) throw new Error('Failed to finalize stitched video');
return new Blob([finalBuffer], { type: 'video/mp4' });
}
async function startGeneration() {
errorMessage.value = '';
finalVideoUrl.value = null;
segmentBlobs.value = [];
overallProgress.value = 0;
try {
if (!promptText.value.trim()) {
errorMessage.value = 'Please enter a prompt';
return;
}
isGenerating.value = true;
const openai = new OpenAI({
apiKey: apiKey.value,
dangerouslyAllowBrowser: true
});
const plans = planSegments(totalSeconds.value);
totalSegments.value = plans.length;
let inputRef: Blob | undefined;
for (let i = 0; i < plans.length; i++) {
currentSegment.value = i + 1;
const segment = await generateSegment(
openai,
promptText.value.trim(),
plans[i],
inputRef
);
segmentBlobs.value.push(segment);
inputRef = await extractLastFrame(segment);
overallProgress.value = Math.round(((i + 1) / plans.length) * 100);
}
const finalBlob = await stitchSegments(segmentBlobs.value);
finalVideoUrl.value = URL.createObjectURL(finalBlob);
} catch (e: any) {
errorMessage.value = e?.message || String(e);
} finally {
isGenerating.value = false;
}
}
function resetAll() {
if (finalVideoUrl.value) {
URL.revokeObjectURL(finalVideoUrl.value);
}
finalVideoUrl.value = null;
segmentBlobs.value = [];
overallProgress.value = 0;
currentSegment.value = 0;
totalSegments.value = 0;
errorMessage.value = '';
}
function downloadVideo() {
if (!finalVideoUrl.value) return;
const a = document.createElement('a');
a.href = finalVideoUrl.value;
a.download = 'long-video.mp4';
a.click();
}
</script>
<template>
<div class="long-video-generator">
<div class="header">
<h1>Long Video Generator</h1>
<p>Create longer Sora videos by chaining 4/8/12s segments seamlessly.</p>
</div>
<div class="card">
<div class="form-section">
<div class="form-group">
<label for="prompt">
Prompt <span class="required">*</span>
</label>
<textarea
id="prompt"
v-model="promptText"
rows="4"
placeholder="Describe your video in detail…"
/>
</div>
<div class="form-group">
<label>Model</label>
<p class="help-text">
Sora 2 is cheaper; Pro supports high-res options
</p>
<div class="radio-group">
<label>
<input v-model="model" type="radio" value="sora-2">
Sora 2 ($0.10/second)
</label>
<label>
<input v-model="model" type="radio" value="sora-2-pro">
Sora 2 Pro ($0.30-0.50/second)
</label>
</div>
</div>
<div class="form-group">
<label>Resolution</label>
<div class="radio-group">
<label v-for="opt in sizeOptions" :key="opt.value">
<input v-model="size" type="radio" :value="opt.value">
{{ opt.label }}
</label>
</div>
</div>
<div class="form-group">
<label>Duration</label>
<div class="radio-group">
<label>
<input v-model="totalSeconds" type="radio" :value="16">
16 seconds
</label>
<label>
<input v-model="totalSeconds" type="radio" :value="20">
20 seconds
</label>
<label>
<input v-model="totalSeconds" type="radio" :value="24">
24 seconds
</label>
</div>
</div>
<div class="cost-estimate">
<span>Estimated Cost:</span>
<strong>${{ estimatedCost.toFixed(2) }}</strong>
</div>
<div class="actions">
<button
@click="startGeneration"
:disabled="!promptText || isGenerating"
>
{{ isGenerating ? 'Generating...' : 'Generate Video' }}
</button>
<button @click="resetAll" :disabled="isGenerating">
Reset
</button>
</div>
</div>
<div v-if="isGenerating || overallProgress > 0" class="progress-section">
<h3>Progress</h3>
<div class="progress-item">
<div>Overall Progress: {{ overallProgress }}%</div>
<div class="progress-bar">
<div class="progress-fill" :style="{ width: `${overallProgress}%` }" />
</div>
</div>
<div v-if="isGenerating && currentSegment > 0" class="progress-item">
<div>Segment {{ currentSegment }} of {{ totalSegments }}: {{ segmentProgress }}%</div>
<div class="progress-bar">
<div class="progress-fill" :style="{ width: `${segmentProgress}%` }" />
</div>
</div>
<div v-if="errorMessage" class="error">
<strong>Error:</strong> {{ errorMessage }}
</div>
</div>
<div v-if="finalVideoUrl" class="result-section">
<h3>Result</h3>
<video :src="finalVideoUrl" controls playsinline />
<button @click="downloadVideo">Download MP4</button>
</div>
</div>
</div>
</template>
<style scoped>
.long-video-generator {
max-width: 1000px;
margin: 0 auto;
padding: 24px;
}
.header {
margin-bottom: 32px;
text-align: center;
}
.header h1 {
font-size: 36px;
margin: 0 0 8px;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
}
.card {
background: white;
border-radius: 16px;
box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
padding: 32px;
}
.form-section {
display: flex;
flex-direction: column;
gap: 24px;
}
.form-group label {
display: block;
font-weight: 600;
margin-bottom: 8px;
}
.form-group textarea {
width: 100%;
padding: 12px;
border: 2px solid #e5e7eb;
border-radius: 8px;
font-family: inherit;
}
.radio-group {
display: flex;
gap: 16px;
flex-wrap: wrap;
}
.radio-group label {
display: flex;
align-items: center;
gap: 8px;
}
.cost-estimate {
padding: 16px;
background: #f0f4ff;
border-radius: 8px;
}
.actions {
display: flex;
gap: 12px;
}
button {
padding: 12px 24px;
border: none;
border-radius: 8px;
font-weight: 600;
cursor: pointer;
}
button:disabled {
opacity: 0.5;
cursor: not-allowed;
}
.progress-section {
margin-top: 32px;
padding-top: 32px;
border-top: 2px solid #f3f4f6;
}
.progress-bar {
height: 12px;
background: #f3f4f6;
border-radius: 999px;
overflow: hidden;
}
.progress-fill {
height: 100%;
background: linear-gradient(90deg, #667eea 0%, #764ba2 100%);
transition: width 0.3s ease;
}
.result-section {
margin-top: 32px;
padding-top: 32px;
border-top: 2px solid #f3f4f6;
}
video {
width: 100%;
border-radius: 12px;
margin-bottom: 16px;
}
.error {
padding: 16px;
background: #fef2f2;
border: 2px solid #fecaca;
border-radius: 8px;
color: #dc2626;
}
</style>
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="UTF-8">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title>Long Video Generator</title>
<style>
* {
box-sizing: border-box;
}
body {
font-family: system-ui, -apple-system, sans-serif;
max-width: 1000px;
margin: 0 auto;
padding: 24px;
background: #f9fafb;
}
.header {
text-align: center;
margin-bottom: 32px;
}
.header h1 {
font-size: 36px;
margin: 0 0 8px;
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
-webkit-background-clip: text;
-webkit-text-fill-color: transparent;
}
.card {
background: white;
border-radius: 16px;
box-shadow: 0 4px 6px rgba(0, 0, 0, 0.1);
padding: 32px;
}
.form-group {
margin-bottom: 24px;
}
.form-group label {
display: block;
font-weight: 600;
margin-bottom: 8px;
}
.form-group textarea,
.form-group select {
width: 100%;
padding: 12px;
border: 2px solid #e5e7eb;
border-radius: 8px;
font-family: inherit;
font-size: 15px;
}
.cost-estimate {
padding: 16px;
background: #f0f4ff;
border-radius: 8px;
margin-bottom: 24px;
}
button {
padding: 12px 24px;
border: none;
border-radius: 8px;
font-weight: 600;
cursor: pointer;
font-size: 15px;
margin-right: 12px;
}
button:disabled {
opacity: 0.5;
cursor: not-allowed;
}
.btn-primary {
background: linear-gradient(135deg, #667eea 0%, #764ba2 100%);
color: white;
}
.btn-secondary {
background: #f3f4f6;
color: #4b5563;
}
.progress-section {
margin-top: 32px;
padding-top: 32px;
border-top: 2px solid #f3f4f6;
}
.progress-bar {
height: 12px;
background: #f3f4f6;
border-radius: 999px;
overflow: hidden;
margin: 8px 0 16px;
}
.progress-fill {
height: 100%;
background: linear-gradient(90deg, #667eea 0%, #764ba2 100%);
transition: width 0.3s ease;
}
video {
width: 100%;
border-radius: 12px;
margin-top: 24px;
}
.hidden {
display: none;
}
.error {
padding: 16px;
background: #fef2f2;
border: 2px solid #fecaca;
border-radius: 8px;
color: #dc2626;
margin-top: 16px;
}
</style>
</head>
<body>
<div class="header">
<h1>Long Video Generator</h1>
<p>Create longer Sora videos by chaining segments seamlessly</p>
</div>
<div class="card">
<div class="form-group">
<label for="apiKey">OpenAI API Key *</label>
<input type="password" id="apiKey" placeholder="sk-...">
</div>
<div class="form-group">
<label for="prompt">Prompt *</label>
<textarea id="prompt" rows="4" placeholder="Describe your video in detail..."></textarea>
</div>
<div class="form-group">
<label for="model">Model</label>
<select id="model">
<option value="sora-2">Sora 2 ($0.10/second)</option>
<option value="sora-2-pro">Sora 2 Pro ($0.30-0.50/second)</option>
</select>
</div>
<div class="form-group">
<label for="size">Resolution</label>
<select id="size">
<option value="720x1280">720×1280 (portrait)</option>
<option value="1280x720">1280×720 (landscape)</option>
</select>
</div>
<div class="form-group">
<label for="duration">Duration</label>
<select id="duration">
<option value="16">16 seconds</option>
<option value="20">20 seconds</option>
<option value="24">24 seconds</option>
</select>
</div>
<div class="cost-estimate">
Estimated Cost: <strong id="cost">$1.60</strong>
</div>
<div>
<button id="generateBtn" class="btn-primary">Generate Video</button>
<button id="resetBtn" class="btn-secondary">Reset</button>
</div>
<div id="progressSection" class="progress-section hidden">
<h3>Progress</h3>
<div id="overallProgress">Overall: 0%</div>
<div class="progress-bar">
<div id="overallFill" class="progress-fill" style="width: 0%"></div>
</div>
<div id="segmentProgress"></div>
<div class="progress-bar">
<div id="segmentFill" class="progress-fill" style="width: 0%"></div>
</div>
<div id="errorMessage" class="error hidden"></div>
</div>
<div id="resultSection" class="hidden">
<h3>Result</h3>
<video id="resultVideo" controls playsinline></video>
<br>
<button id="downloadBtn" class="btn-primary">Download MP4</button>
</div>
</div>
<script type="module">
// Import your TypeScript functions here
// This is a simplified example - you'd need to bundle with esbuild or similar
const elements = {
apiKey: document.getElementById('apiKey'),
prompt: document.getElementById('prompt'),
model: document.getElementById('model'),
size: document.getElementById('size'),
duration: document.getElementById('duration'),
cost: document.getElementById('cost'),
generateBtn: document.getElementById('generateBtn'),
resetBtn: document.getElementById('resetBtn'),
progressSection: document.getElementById('progressSection'),
overallProgress: document.getElementById('overallProgress'),
overallFill: document.getElementById('overallFill'),
segmentProgress: document.getElementById('segmentProgress'),
segmentFill: document.getElementById('segmentFill'),
errorMessage: document.getElementById('errorMessage'),
resultSection: document.getElementById('resultSection'),
resultVideo: document.getElementById('resultVideo'),
downloadBtn: document.getElementById('downloadBtn')
};
// Update cost estimate when inputs change
function updateCost() {
const duration = parseInt(elements.duration.value);
const model = elements.model.value;
const size = elements.size.value;
let costPerSecond = 0.10;
if (model === 'sora-2-pro') {
const isHighRes = size === '1024x1792' || size === '1792x1024';
costPerSecond = isHighRes ? 0.50 : 0.30;
}
elements.cost.textContent = `$${(duration * costPerSecond).toFixed(2)}`;
}
elements.model.addEventListener('change', updateCost);
elements.size.addEventListener('change', updateCost);
elements.duration.addEventListener('change', updateCost);
elements.generateBtn.addEventListener('click', async () => {
// Call your generateLongVideo function here
// Update progress UI as generation proceeds
console.log('Generate video with:', {
apiKey: elements.apiKey.value,
prompt: elements.prompt.value,
model: elements.model.value,
size: elements.size.value,
duration: elements.duration.value
});
});
elements.resetBtn.addEventListener('click', () => {
elements.progressSection.classList.add('hidden');
elements.resultSection.classList.add('hidden');
elements.errorMessage.classList.add('hidden');
});
elements.downloadBtn.addEventListener('click', () => {
const url = elements.resultVideo.src;
const a = document.createElement('a');
a.href = url;
a.download = 'long-video.mp4';
a.click();
});
</script>
</body>
</html>
