Oct 7, 2025 - 18 min read
Generate Long Videos by Chaining Sora 2 Segments

OpenAI's Sora 2 caps videos at 12 seconds. Learn how to create longer videos by chaining segments together, using the last frame of each video as the first frame of the next.

Patrick the AI Engineer

Try the Interactive Playground

This tutorial is accompanied by an interactive playground. Test the code, experiment with different parameters, and see the results in real-time.

Introduction

You've got a killer prompt for Sora 2, and you hit generate. The video looks great, but it's only 12 seconds long. Your idea needs at least 20 seconds to really work. You could generate multiple separate videos, but they'd have jarring cuts between them. Each segment would start fresh with no connection to the previous one.

This is the fundamental limitation of Sora's API right now. You can request 4, 8, or 12 seconds, and that's it. But there's a way around it. Sora supports an input_reference parameter that lets you provide an image for the first frame. If we extract the last frame from one generated video and use it as the input reference for the next, we can chain segments together seamlessly.

We'll build a tool that generates longer videos by automatically planning segments, generating them in sequence, and stitching them into a single file. The result feels like one continuous video, not a series of cuts.

The Core Concept

The idea is straightforward: generate video segments one at a time, passing visual continuity from one to the next. After we generate the first segment, we extract its final frame as a JPEG. That image becomes the input_reference for the second segment. When Sora generates the second segment, it starts from that frame, creating visual continuity.

We repeat this process for as many segments as needed. Want 20 seconds? Generate a 12-second segment, then an 8-second segment, chaining them together. Need 24 seconds? That's two 12-second segments. The planning logic is simple: use the largest allowed segment sizes (12, 8, 4) until you hit your target duration.

Once all segments are generated, we stitch them together into a single MP4 file. Where possible, we do a lossless concatenation at the packet level; if the codecs or metadata don't line up, you can fall back to decoding and re-encoding. Either way, the result is one smooth video file instead of a collection of clips.

The key insight is that Sora's input_reference isn't just for starting from a specific image. It's a tool for maintaining continuity across multiple generations. Each segment picks up where the last one left off.

Implementation: Vanilla TypeScript

Let's build this step by step. We'll start with planning the segments, then generate each one, and finally stitch them together.

Planning Segments

First, we need a function that breaks a desired duration into valid segment sizes:

type AllowedSegmentSeconds = 4 | 8 | 12;

function planSegments(totalSeconds: number): AllowedSegmentSeconds[] {
  const allowed: AllowedSegmentSeconds[] = [12, 8, 4];
  const segments: AllowedSegmentSeconds[] = [];
  let remaining = totalSeconds;
  
  for (const segmentSize of allowed) {
    while (remaining >= segmentSize) {
      segments.push(segmentSize);
      remaining -= segmentSize;
    }
  }
  
  if (remaining !== 0) {
    throw new Error(`Cannot plan segments for ${totalSeconds} seconds`);
  }
  
  return segments;
}

This greedy approach always picks the largest segment size possible. For 20 seconds, you get [12, 8]. For 24 seconds, you get [12, 12]. It's not perfect for every case, but it minimizes the number of API calls.
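
A few quick calls show the output, including the error path for durations that aren't a multiple of 4 seconds:

planSegments(20); // [12, 8]
planSegments(24); // [12, 12]
planSegments(16); // [12, 4]
planSegments(10); // throws: 10 can't be built from 12s, 8s, and 4s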

Extracting the Last Frame

After generating a video, we need to grab its final frame. We'll use the browser's built-in video decoding capabilities:

async function extractLastFrame(videoBlob: Blob): Promise<Blob> {
  const video = document.createElement('video');
  video.src = URL.createObjectURL(videoBlob);
  video.muted = true;
  
  await new Promise((resolve) => {
    video.onloadedmetadata = resolve;
  });
  
  // Seek to the end
  video.currentTime = video.duration;
  await new Promise((resolve) => {
    video.onseeked = resolve;
  });

Once we're at the last frame, we draw it to a canvas and export as JPEG:

  const canvas = document.createElement('canvas');
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  
  const ctx = canvas.getContext('2d');
  if (!ctx) throw new Error('Canvas context unavailable');
  ctx.drawImage(video, 0, 0);
  URL.revokeObjectURL(video.src);
  
  return new Promise((resolve, reject) => {
    canvas.toBlob((blob) => {
      if (blob) resolve(blob);
      else reject(new Error('Failed to extract frame'));
    }, 'image/jpeg', 0.92);
  });
}

The quality setting of 0.92 gives us a good balance between file size and visual quality. Sora doesn't need a perfect lossless image to maintain continuity.
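
One caveat: depending on the API's requirements, the reference image may need to match the resolution of the segment you're about to request. If that holds (worth verifying against the current Sora docs), a small guard like the hypothetical helper below catches mismatches before you pay for a generation:

// Hypothetical guard (the matching-resolution requirement is an assumption to
// verify): check that the extracted frame matches the requested size string,
// e.g. '720x1280' = width x height.
async function assertFrameMatchesSize(frame: Blob, size: string): Promise<void> {
  const [width, height] = size.split('x').map(Number);
  const bitmap = await createImageBitmap(frame);
  try {
    if (bitmap.width !== width || bitmap.height !== height) {
      throw new Error(
        `Frame is ${bitmap.width}x${bitmap.height}, expected ${width}x${height}`
      );
    }
  } finally {
    bitmap.close();
  }
}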

Generating a Segment

Now we can generate a single segment, optionally using an input reference:

import OpenAI, { toFile } from 'openai';

async function generateSegment(
  openai: OpenAI,
  prompt: string,
  seconds: AllowedSegmentSeconds,
  model: 'sora-2' | 'sora-2-pro',
  size: string,
  inputReference?: Blob
): Promise<Blob> {
  const body: any = { model, prompt, seconds: String(seconds), size };
  
  if (inputReference) {
    body.input_reference = await toFile(
      inputReference,
      'reference.jpg',
      { type: 'image/jpeg' }
    );
  }

We start the generation job and poll until it completes:

  let job = await openai.videos.create(body);
  
  while (job.status === 'in_progress' || job.status === 'queued') {
    await new Promise(resolve => setTimeout(resolve, 2000));
    job = await openai.videos.retrieve(job.id);
  }
  
  if (job.status === 'failed') {
    throw new Error(job.error?.message || 'Generation failed');
  }
  
  const response = await openai.videos.downloadContent(job.id);
  const arrayBuffer = await response.arrayBuffer();
  return new Blob([arrayBuffer], { type: 'video/mp4' });
}

The polling interval of 2 seconds is a reasonable balance. Sora jobs typically take 30-90 seconds, so checking more frequently doesn't help much.
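
If you want a safety net against jobs that hang, you can wrap the same loop in an overall deadline. Here's a minimal sketch; the helper name waitForVideo and the 10-minute cap are our own choices, not anything from the API:

// Same polling loop as above, with an overall deadline so the client never
// polls forever. The 10-minute timeout is an arbitrary assumption.
async function waitForVideo(
  openai: OpenAI,
  videoId: string,
  timeoutMs = 10 * 60 * 1000
) {
  const deadline = Date.now() + timeoutMs;
  let job = await openai.videos.retrieve(videoId);
  
  while (job.status === 'in_progress' || job.status === 'queued') {
    if (Date.now() > deadline) {
      throw new Error(`Timed out waiting for video job ${videoId}`);
    }
    await new Promise(resolve => setTimeout(resolve, 2000));
    job = await openai.videos.retrieve(videoId);
  }
  
  return job;
}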

Chaining Segments Together

Now we can orchestrate the full process:

async function generateLongVideo(
  openai: OpenAI,
  prompt: string,
  totalSeconds: number,
  model: 'sora-2' | 'sora-2-pro',
  size: string
): Promise<Blob[]> {
  const plan = planSegments(totalSeconds);
  const segments: Blob[] = [];
  let inputReference: Blob | undefined;
  
  for (const seconds of plan) {
    const segment = await generateSegment(
      openai,
      prompt,
      seconds,
      model,
      size,
      inputReference
    );

After each segment, extract its last frame for the next one:

    segments.push(segment);
    inputReference = await extractLastFrame(segment);
  }
  
  return segments;
}

This gives us an array of video blobs. Each one starts where the previous one ended. The last step is stitching them into a single file.

Stitching Segments

For stitching, we’ll use Mediabunny for lossless packet-level concatenation. This avoids re-encoding and keeps quality intact. If you need a fallback re-encode, you can add it later.

import {
  ALL_FORMATS,
  BlobSource,
  BufferTarget,
  EncodedAudioPacketSource,
  EncodedPacket,
  EncodedPacketSink,
  EncodedVideoPacketSource,
  Input,
  Mp4OutputFormat,
  Output
} from 'mediabunny';

async function stitchSegments(segments: Blob[]): Promise<Blob> {
  if (!segments.length) throw new Error('No segments to stitch');

  // Derive codecs from first segment
  const firstInput = new Input({ source: new BlobSource(segments[0]), formats: ALL_FORMATS });
  const firstVideoTrack = await firstInput.getPrimaryVideoTrack();
  const firstAudioTrack = await firstInput.getPrimaryAudioTrack();
  if (!firstVideoTrack) throw new Error('First segment has no video track');

  const videoCodec = (await firstVideoTrack.codec) || 'avc';
  const audioCodec = firstAudioTrack ? (await firstAudioTrack.codec) || 'aac' : null;

  const target = new BufferTarget();
  const output = new Output({ format: new Mp4OutputFormat({ fastStart: 'in-memory' }), target });

  const outVideo = new EncodedVideoPacketSource(videoCodec as any);
  output.addVideoTrack(outVideo);

  let outAudio: EncodedAudioPacketSource | null = null;
  if (audioCodec) {
    outAudio = new EncodedAudioPacketSource(audioCodec as any);
    output.addAudioTrack(outAudio);
  }

  await output.start();

  let videoTimestampOffset = 0;
  let audioTimestampOffset = 0;

  for (const seg of segments) {
    const input = new Input({ source: new BlobSource(seg), formats: ALL_FORMATS });
    const vTrack = await input.getPrimaryVideoTrack();
    const aTrack = await input.getPrimaryAudioTrack();
    if (!vTrack) continue;

    // Copy video packets
    const vSink = new EncodedPacketSink(vTrack);
    for await (const packet of vSink.packets()) {
      const cloned: EncodedPacket = packet.clone({ timestamp: packet.timestamp + videoTimestampOffset });
      await outVideo.add(cloned);
    }
    const vDur = await vTrack.computeDuration();
    videoTimestampOffset += vDur;

    // Copy audio packets if available
    if (aTrack && outAudio) {
      const aSink = new EncodedPacketSink(aTrack);
      for await (const packet of aSink.packets()) {
        const cloned: EncodedPacket = packet.clone({ timestamp: packet.timestamp + audioTimestampOffset });
        await outAudio.add(cloned);
      }
      const aDur = await aTrack.computeDuration();
      audioTimestampOffset += aDur;
    }
  }

  outVideo.close();
  if (outAudio) outAudio.close();
  await output.finalize();

  // Retrieve the in-memory MP4 buffer and return it as a Blob
  const finalBuffer: ArrayBuffer | null = (target as unknown as { buffer: ArrayBuffer | null }).buffer;
  if (!finalBuffer) throw new Error('Failed to finalize stitched video');
  return new Blob([finalBuffer], { type: 'video/mp4' });
}

This approach is fast and preserves quality. Add a decode/encode fallback only if you need to reconcile codec mismatches.
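
If you'd rather fail fast before copying packets, a quick pre-flight pass over the segments can confirm they all share the first segment's video codec. This sketch reuses the same mediabunny calls as stitchSegments; the helper name assertUniformCodec is ours:

// Optional pre-flight check: confirm every segment shares the first segment's
// video codec before attempting a packet-level copy, so mismatches fail fast
// with a clear message instead of a broken output file.
async function assertUniformCodec(segments: Blob[]): Promise<void> {
  let expected: string | null = null;
  for (const [index, seg] of segments.entries()) {
    const input = new Input({ source: new BlobSource(seg), formats: ALL_FORMATS });
    const track = await input.getPrimaryVideoTrack();
    const codec = track ? (await track.codec) ?? null : null;
    if (index === 0) {
      expected = codec;
    } else if (codec !== expected) {
      throw new Error(
        `Segment ${index} codec (${codec}) differs from the first segment (${expected})`
      );
    }
  }
}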

Implementation: Vue.js

Vue makes this pattern much cleaner by handling reactivity and state management for us. Let's rebuild the same functionality as a Vue component.

Setting Up State

We'll track the form inputs, generation progress, and results:

<script setup lang="ts">
import OpenAI, { toFile } from 'openai';
import { ref, computed } from 'vue';

const promptText = ref('');
const model = ref<'sora-2' | 'sora-2-pro'>('sora-2');
const totalSeconds = ref<16 | 20 | 24>(16);
const size = ref('720x1280');

const isGenerating = ref(false);
const currentSegment = ref(0);
const totalSegments = ref(0);
const overallProgress = ref(0);
const segmentProgress = ref(0);
const segmentBlobs = ref<Blob[]>([]);
const finalVideoUrl = ref<string | null>(null);

Vue's reactivity means our UI will automatically update when these values change. No need to manually trigger re-renders.

Computing Cost Estimates

We can use a computed property to show users the estimated cost in real-time:

const estimatedCost = computed(() => {
  if (model.value === 'sora-2') {
    return totalSeconds.value * 0.10;
  }
  const isHighRes = size.value === '1024x1792' || size.value === '1792x1024';
  return totalSeconds.value * (isHighRes ? 0.50 : 0.30);
});

This recalculates automatically when the user changes the model, duration, or resolution. The pricing is based on OpenAI's current rates: $0.10/second for Sora 2, $0.30-0.50/second for Sora 2 Pro.
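
For example, a 20-second video on sora-2 comes to 20 × $0.10 = $2.00, while the same 20 seconds on sora-2-pro at 1792x1024 comes to 20 × $0.50 = $10.00.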

Generating Segments with Progress

Here's the main generation function, adapted for Vue:

async function startGeneration() {
  isGenerating.value = true;
  segmentBlobs.value = [];
  finalVideoUrl.value = null;
  
  const openai = new OpenAI({
    apiKey: yourApiKey,
    dangerouslyAllowBrowser: true
  });
  
  const plans = planSegments(totalSeconds.value);
  totalSegments.value = plans.length;
  let inputReference: Blob | undefined;

As we generate each segment, we update progress state:

  for (let i = 0; i < plans.length; i++) {
    currentSegment.value = i + 1;
    
    const segment = await generateSegment(
      openai,
      promptText.value,
      plans[i],
      model.value,
      size.value,
      inputReference
    );
    
    segmentBlobs.value.push(segment);
    inputReference = await extractLastFrame(segment);
    overallProgress.value = Math.round(((i + 1) / plans.length) * 100);
  }

After all segments are generated, stitch them together:

  const finalBlob = await stitchSegments(segmentBlobs.value);
  finalVideoUrl.value = URL.createObjectURL(finalBlob);
  isGenerating.value = false;
}

Vue's reactivity automatically updates the UI as currentSegment, overallProgress, and finalVideoUrl change. We don't need to write any DOM manipulation code.

Handling Segment Generation

We can extract the per-segment logic into a helper that updates progress as the job runs:

async function generateSegment(
  openai: OpenAI,
  prompt: string,
  seconds: number,
  model: string,
  size: string,
  inputRef?: Blob
): Promise<Blob> {
  const body: any = { model, prompt, seconds: String(seconds), size };
  
  if (inputRef) {
    body.input_reference = await toFile(inputRef, 'ref.jpg', {
      type: 'image/jpeg'
    });
  }

Poll the job status and update a reactive progress value:

  let job = await openai.videos.create(body);
  
  while (job.status === 'in_progress' || job.status === 'queued') {
    await new Promise(r => setTimeout(r, 2000));
    job = await openai.videos.retrieve(job.id);
    
    const progress = job.progress ?? 0;
    segmentProgress.value = progress;
  }
  
  const resp = await openai.videos.downloadContent(job.id);
  return new Blob([await resp.arrayBuffer()], { type: 'video/mp4' });
}

We've added a segmentProgress reactive value that updates as each individual segment generates. This lets us show both per-segment and overall progress to the user.

Building the UI

The template is straightforward. Here's the form section:

<template>
  <div class="generator">
    <h1>Long Video Generator</h1>
    
    <div class="form">
      <label>
        Prompt
        <textarea v-model="promptText" rows="4" />
      </label>
      
      <label>
        Model
        <select v-model="model">
          <option value="sora-2">Sora 2 ($0.10/sec)</option>
          <option value="sora-2-pro">Sora 2 Pro ($0.30-0.50/sec)</option>
        </select>
      </label>
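
The duration and resolution controls aren't shown above. A minimal version wired to the refs we declared earlier could look like this; the resolution list is an assumption based on the sizes used in this tutorial, so check the API's current options for each model:

      <label>
        Duration
        <select v-model="totalSeconds">
          <option :value="16">16 seconds</option>
          <option :value="20">20 seconds</option>
          <option :value="24">24 seconds</option>
        </select>
      </label>
      
      <label>
        Resolution
        <select v-model="size">
          <option value="720x1280">720x1280 (portrait)</option>
          <option value="1024x1792">1024x1792 (portrait, high-res)</option>
          <option value="1792x1024">1792x1024 (landscape, high-res)</option>
        </select>
      </label>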

Show the cost estimate and generation button:

      <div class="cost">
        Estimated Cost: ${{ estimatedCost.toFixed(2) }}
      </div>
      
      <button
        @click="startGeneration"
        :disabled="!promptText || isGenerating"
      >
        {{ isGenerating ? 'Generating...' : 'Generate Video' }}
      </button>
    </div>

Display progress while generating:

    <div v-if="isGenerating" class="progress">
      <div>
        Overall: {{ overallProgress }}%
      </div>
      <div>
        Segment {{ currentSegment }} of {{ totalSegments }}
      </div>
    </div>
    
    <video v-if="finalVideoUrl" :src="finalVideoUrl" controls />
  </div>
</template>

Vue handles all the conditional rendering and data binding. When isGenerating becomes true, the progress section appears. When finalVideoUrl is set, the video player appears.

Wrapping Up

We've built a system that breaks through Sora's 12-second limit by chaining segments with visual continuity. The pattern is simple: generate, extract last frame, use as input for next segment, repeat. With proper stitching, the result is a single smooth video.

The next step would be adding persistence so users don't lose progress on failures. You could also experiment with different segment planning strategies. Maybe shorter segments give better continuity for certain types of prompts, or maybe you want to adjust the prompt slightly for each segment to maintain narrative coherence.
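
As one illustration of an alternative strategy, here's a hypothetical planner that favors 8-second segments; whether it actually improves continuity is something to test against your own prompts:

// Hypothetical alternative planner: prefer 8-second segments, using 4s to fill
// the remainder. Purely illustrative — it trades more API calls for shorter,
// possibly more consistent segments.
function planShorterSegments(totalSeconds: number): AllowedSegmentSeconds[] {
  const segments: AllowedSegmentSeconds[] = [];
  let remaining = totalSeconds;
  
  for (const size of [8, 4] as AllowedSegmentSeconds[]) {
    while (remaining >= size) {
      segments.push(size);
      remaining -= size;
    }
  }
  
  if (remaining !== 0) {
    throw new Error(`Cannot plan segments for ${totalSeconds} seconds`);
  }
  
  return segments; // e.g. 20 seconds -> [8, 8, 4]
}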

You could even push toward much longer videos: 30 seconds, 60 seconds, maybe even a couple of minutes. The sky (and your wallet) is the limit!

Full Code Examples

import OpenAI, { toFile } from 'openai';
import {
  ALL_FORMATS,
  BlobSource,
  BufferTarget,
  EncodedAudioPacketSource,
  EncodedPacket,
  EncodedPacketSink,
  EncodedVideoPacketSource,
  Input,
  Mp4OutputFormat,
  Output
} from 'mediabunny';

type AllowedSegmentSeconds = 4 | 8 | 12;
type VideoModel = 'sora-2' | 'sora-2-pro';

// Plan segments to reach target duration
function planSegments(totalSeconds: number): AllowedSegmentSeconds[] {
  const allowed: AllowedSegmentSeconds[] = [12, 8, 4];
  const segments: AllowedSegmentSeconds[] = [];
  let remaining = totalSeconds;
  
  for (const segmentSize of allowed) {
    while (remaining >= segmentSize) {
      segments.push(segmentSize);
      remaining -= segmentSize;
    }
  }
  
  if (remaining !== 0) {
    throw new Error(`Cannot plan segments for ${totalSeconds} seconds`);
  }
  
  return segments;
}

// Extract last frame from video as JPEG
async function extractLastFrame(videoBlob: Blob): Promise<Blob> {
  const video = document.createElement('video');
  video.src = URL.createObjectURL(videoBlob);
  video.muted = true;
  
  await new Promise((resolve) => {
    video.onloadedmetadata = resolve;
  });
  
  video.currentTime = video.duration;
  await new Promise((resolve) => {
    video.onseeked = resolve;
  });
  
  const canvas = document.createElement('canvas');
  canvas.width = video.videoWidth;
  canvas.height = video.videoHeight;
  
  const ctx = canvas.getContext('2d');
  if (!ctx) throw new Error('Canvas context unavailable');
  ctx.drawImage(video, 0, 0);
  URL.revokeObjectURL(video.src);
  
  return new Promise((resolve, reject) => {
    canvas.toBlob(
      (blob) => blob ? resolve(blob) : reject(new Error('Failed to extract frame')),
      'image/jpeg',
      0.92
    );
  });
}

// Generate a single video segment
async function generateSegment(
  openai: OpenAI,
  prompt: string,
  seconds: AllowedSegmentSeconds,
  model: VideoModel,
  size: string,
  inputReference?: Blob
): Promise<Blob> {
  const body: any = {
    model,
    prompt,
    seconds: String(seconds),
    size
  };
  
  if (inputReference) {
    body.input_reference = await toFile(
      inputReference,
      'reference.jpg',
      { type: 'image/jpeg' }
    );
  }
  
  let job = await openai.videos.create(body);
  
  while (job.status === 'in_progress' || job.status === 'queued') {
    await new Promise(resolve => setTimeout(resolve, 2000));
    job = await openai.videos.retrieve(job.id);
  }
  
  if (job.status === 'failed') {
    throw new Error(job.error?.message || 'Generation failed');
  }
  
  const response = await openai.videos.downloadContent(job.id);
  const arrayBuffer = await response.arrayBuffer();
  return new Blob([arrayBuffer], { type: 'video/mp4' });
}

// Generate long video by chaining segments
async function generateLongVideo(
  apiKey: string,
  prompt: string,
  totalSeconds: number,
  model: VideoModel = 'sora-2',
  size: string = '720x1280'
): Promise<Blob[]> {
  const openai = new OpenAI({
    apiKey,
    dangerouslyAllowBrowser: true
  });
  
  const plan = planSegments(totalSeconds);
  const segments: Blob[] = [];
  let inputReference: Blob | undefined;
  
  for (const seconds of plan) {
    const segment = await generateSegment(
      openai,
      prompt,
      seconds,
      model,
      size,
      inputReference
    );
    
    segments.push(segment);
    inputReference = await extractLastFrame(segment);
  }
  
  return segments;
}

// Stitch segments into single video (lossless, no re-encode)
async function stitchSegments(segments: Blob[]): Promise<Blob> {
  if (!segments.length) throw new Error('No segments to stitch');

  const firstInput = new Input({ source: new BlobSource(segments[0]), formats: ALL_FORMATS });
  const firstVideoTrack = await firstInput.getPrimaryVideoTrack();
  const firstAudioTrack = await firstInput.getPrimaryAudioTrack();
  if (!firstVideoTrack) throw new Error('First segment has no video track');

  const videoCodec = (await firstVideoTrack.codec) || 'avc';
  const audioCodec = firstAudioTrack ? (await firstAudioTrack.codec) || 'aac' : null;

  const target = new BufferTarget();
  const output = new Output({ format: new Mp4OutputFormat({ fastStart: 'in-memory' }), target });

  const outVideo = new EncodedVideoPacketSource(videoCodec as any);
  output.addVideoTrack(outVideo);

  let outAudio: EncodedAudioPacketSource | null = null;
  if (audioCodec) {
    outAudio = new EncodedAudioPacketSource(audioCodec as any);
    output.addAudioTrack(outAudio);
  }

  await output.start();

  let videoTimestampOffset = 0;
  let audioTimestampOffset = 0;

  for (const seg of segments) {
    const input = new Input({ source: new BlobSource(seg), formats: ALL_FORMATS });
    const vTrack = await input.getPrimaryVideoTrack();
    const aTrack = await input.getPrimaryAudioTrack();
    if (!vTrack) continue;

    const vSink = new EncodedPacketSink(vTrack);
    for await (const packet of vSink.packets()) {
      const cloned: EncodedPacket = packet.clone({ timestamp: packet.timestamp + videoTimestampOffset });
      await outVideo.add(cloned);
    }
    const vDur = await vTrack.computeDuration();
    videoTimestampOffset += vDur;

    if (aTrack && outAudio) {
      const aSink = new EncodedPacketSink(aTrack);
      for await (const packet of aSink.packets()) {
        const cloned: EncodedPacket = packet.clone({ timestamp: packet.timestamp + audioTimestampOffset });
        await outAudio.add(cloned);
      }
      const aDur = await aTrack.computeDuration();
      audioTimestampOffset += aDur;
    }
  }

  outVideo.close();
  if (outAudio) outAudio.close();
  await output.finalize();

  const finalBuffer: ArrayBuffer | null = (target as unknown as { buffer: ArrayBuffer | null }).buffer;
  if (!finalBuffer) throw new Error('Failed to finalize stitched video');
  return new Blob([finalBuffer], { type: 'video/mp4' });
}

// Usage
async function main() {
  const segments = await generateLongVideo(
    'your-api-key',
    'A serene ocean sunset with rolling waves',
    20
  );
  
  const finalVideo = await stitchSegments(segments);
  const url = URL.createObjectURL(finalVideo);
  
  const video = document.createElement('video');
  video.src = url;
  video.controls = true;
  document.body.appendChild(video);
}