Sep 20, 2025 - 30 min read
How to Build a Video Scene Detector that Runs in Your Browser

Build a client-side scene detector using MediaBunny.js and statistical analysis. Process videos frame-by-frame without sending data to a server.

Patrick the AI Engineer

Try the Interactive Playground

This tutorial is accompanied by an interactive playground. Test the code, experiment with different parameters, and see the results in real-time.

You're building a video editing app and need to help users find scene changes in hour-long videos. Sending everything to your server is expensive and slow. What if the browser could do the work instead?

We're going to build a scene detector that runs entirely client-side using MediaBunny.js, a WebAssembly-powered video toolkit. The video never leaves the user's device, there's no upload time, and you don't pay for server compute.

The basic idea is simple: compare each frame to the previous one. When they're very different, you've found a scene change. Let's start by comparing two frames.

function calculateMAFD(frame1: ImageData, frame2: ImageData): number {
  const data1 = frame1.data
  const data2 = frame2.data
  let diff = 0
  
  for (let i = 0; i < data1.length; i += 4) {
    const gray1 = (299 * data1[i] + 587 * data1[i + 1] + 114 * data1[i + 2]) / 1000
    const gray2 = (299 * data2[i] + 587 * data2[i + 1] + 114 * data2[i + 2]) / 1000
    diff += Math.abs(gray1 - gray2)
  }
  
  return diff / (frame1.width * frame1.height)
}

This loops through every pixel in RGBA format (4 values per pixel), converts each one to grayscale, and accumulates the absolute difference. The coefficients 299, 587, and 114 are the standard luma weights (0.299, 0.587, 0.114) scaled by 1000 so the per-channel multiplications stay in integer arithmetic. The sum is then normalized by the total pixel count, giving a single number, the mean absolute frame difference (MAFD), that represents how different the two frames are.
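
As a quick sanity check of the scale, a completely black frame compared against a completely white frame should give the maximum possible value of 255. Here's a tiny hypothetical test using the ImageData constructor directly:

// Sanity check: MAFD ranges from 0 (identical frames) to 255 (black vs. white)
const black = new ImageData(2, 2)      // pixel data defaults to all zeros
const white = new ImageData(2, 2)
white.data.fill(255)                   // set every RGBA channel to 255

console.log(calculateMAFD(black, black))  // 0
console.log(calculateMAFD(black, white))  // 255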

But here's the problem: what's a "big" difference? A video of someone talking has small frame-to-frame changes. An action movie has constant motion. If you use a fixed threshold, you'll miss scenes in one video and get false positives in the other.

The solution is to track recent differences and look for statistical outliers. Let's add mean and standard deviation calculations.

function calculateMean(values: number[]): number {
  return values.reduce((sum, val) => sum + val, 0) / values.length
}

function calculateStdDev(values: number[], mean: number): number {
  const sqDiffs = values.map(val => (val - mean) ** 2)
  return Math.sqrt(sqDiffs.reduce((sum, val) => sum + val, 0) / values.length)
}

Nothing fancy here, just standard statistics. Now let's use these to detect scene changes.

We'll keep a sliding window of the last 60 frame differences (6 seconds at 10fps). When a new difference lands more than 3 standard deviations away from the window's mean, we call it a scene change.

const WINDOW_SIZE = 60
const mafdValues: number[] = []
let previousFrame: ImageData | null = null

function processFrame(currentFrame: ImageData) {
  if (!previousFrame) {
    previousFrame = currentFrame
    return false
  }
  
  const diff = calculateMAFD(previousFrame, currentFrame)
  previousFrame = currentFrame
  
  if (mafdValues.length < WINDOW_SIZE) {
    mafdValues.push(diff)
    return false
  }
  
  const mean = calculateMean(mafdValues)
  const stdDev = calculateStdDev(mafdValues, mean)
  
  const isSceneChange = stdDev > 0.1 && Math.abs(diff - mean) / stdDev > 3
  
  if (isSceneChange) {
    mafdValues.length = 0
  } else {
    mafdValues.push(diff)
    mafdValues.shift()
  }
  
  return isSceneChange
}

We wait until we have a full window of 60 frame differences before detecting anything. This builds a baseline for what's "normal" in this video. The stdDev > 0.1 check prevents false positives in near-static videos, where even tiny changes would otherwise look like outliers.

When we detect a scene change, we reset the window. The new scene might have completely different motion characteristics, so we need a fresh baseline.
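
To make the outlier test concrete, here's a small illustration with made-up difference values (the numbers are hypothetical, not measured from a real video):

// A calm window of recent diffs, e.g. someone talking to the camera
const recentDiffs = [2.1, 1.8, 2.3, 1.9, 2.0, 2.2, 1.7, 2.4]
const mean = calculateMean(recentDiffs)            // 2.05
const stdDev = calculateStdDev(recentDiffs, mean)  // ~0.23

// A hard cut produces a diff far outside the window's normal range
const hardCut = 14.6
console.log(Math.abs(hardCut - mean) / stdDev)     // ~55, well above the threshold of 3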

Now let's actually process a video. MediaBunny.js makes this straightforward.

import { Input, BlobSource, VideoSampleSink, ALL_FORMATS } from 'mediabunny'

async function detectScenes(videoFile: File) {
  const input = new Input({
    source: new BlobSource(videoFile),
    formats: ALL_FORMATS
  })
  
  const videoTrack = await input.getPrimaryVideoTrack()
  if (!videoTrack) throw new Error('No video track found')
  
  const sink = new VideoSampleSink(videoTrack)
  const duration = await input.computeDuration()

We load the video from a File object and get its video track. The VideoSampleSink lets us pull out individual frames at specific timestamps.

Let's set up a canvas for processing. Here's the key performance trick: we'll downscale frames to 160px wide.

  const fps = 10
  const totalFrames = Math.floor(duration * fps)
  const scenes: { timestamp: number; image: string }[] = []
  
  const canvas = document.createElement('canvas')
  const ctx = canvas.getContext('2d', { willReadFrequently: true })!
  canvas.width = 160
  canvas.height = Math.round((videoTrack.displayHeight * 160) / videoTrack.displayWidth)

A 160x90 frame has roughly 1% of the pixels of a 1920x1080 frame, so the per-frame comparison work drops by about 100x. We're only looking for big visual changes, not fine details, so the downscaling costs us essentially nothing in accuracy.

Now we loop through the video at 10fps and run our detection algorithm on each frame.

  for (let i = 0; i < totalFrames; i++) {
    const timestamp = i / fps
    const sample = await sink.getSample(timestamp)
    
    if (sample) {
      sample.draw(ctx, 0, 0, canvas.width, canvas.height)
      const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height)
      
      if (processFrame(imageData)) {
        scenes.push({
          timestamp,
          image: canvas.toDataURL('image/jpeg')
        })
      }
      
      sample.close()
    }
  }
  
  return scenes
}

For each frame, we draw it to our small canvas, extract the pixel data, and check if it's a scene change. The sample.close() call is crucial—it frees the frame's memory so we don't leak.
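
If anything inside the loop can throw (a decode error, for example), a try/finally around the per-frame work keeps that guarantee. A small sketch of the same loop body with that pattern:

    if (sample) {
      try {
        sample.draw(ctx, 0, 0, canvas.width, canvas.height)
        const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height)

        if (processFrame(imageData)) {
          scenes.push({ timestamp, image: canvas.toDataURL('image/jpeg') })
        }
      } finally {
        sample.close() // release the frame buffer even if drawing or processing throws
      }
    }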

That's the vanilla TypeScript version. Now let's see how Vue makes this nicer to work with.

Adding Vue Reactivity

Vue gives us automatic UI updates as we process the video. Let's start with the reactive state.

<script setup lang="ts">
import { ref, computed } from 'vue'
import { Input, BlobSource, VideoSampleSink, ALL_FORMATS } from 'mediabunny'

const videoFile = ref<File | null>(null)
const frames = ref<{ src: string; timestamp: number }[]>([])
const status = ref<'idle' | 'loading' | 'detecting'>('idle')
const processedFrames = ref(0)
const totalFrames = ref(0)

As we process frames, processedFrames increments and the UI updates automatically. Let's use that for a progress button.

const buttonText = computed(() => {
  if (status.value === 'loading') return 'Loading Video...'
  if (status.value === 'detecting') {
    const pct = Math.round((processedFrames.value / totalFrames.value) * 100)
    return `Detecting... ${pct}%`
  }
  return 'Detect Scenes'
})

The button text updates reactively as we process the video. Now let's adapt our detection logic.

const detectScenes = async () => {
  if (!videoFile.value || status.value !== 'idle') return
  
  try {
    status.value = 'loading'
    frames.value = []
    
    const input = new Input({
      source: new BlobSource(videoFile.value),
      formats: ALL_FORMATS
    })
    
    const videoTrack = await input.getPrimaryVideoTrack()
    if (!videoTrack) return
    
    status.value = 'detecting'
    // ...

The core algorithm is the same, but we'll update processedFrames inside the loop to give users real-time feedback.

Here's a nice touch: we'll use two canvases—one small for processing, one full-size for thumbnails.

    const processingCanvas = document.createElement('canvas')
    const pCtx = processingCanvas.getContext('2d', { willReadFrequently: true })
    
    const thumbnailCanvas = document.createElement('canvas')
    const tCtx = thumbnailCanvas.getContext('2d')
    
    processingCanvas.width = 160
    thumbnailCanvas.width = videoTrack.displayWidth

This lets us do fast comparisons on small frames while storing nice-looking thumbnails for display.
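
One detail the snippet above skips is the heights. Here's one way they could be derived from the track's aspect ratio (a sketch, assuming the same displayWidth and displayHeight properties used earlier):

    const aspect = videoTrack.displayHeight / videoTrack.displayWidth
    processingCanvas.height = Math.round(processingCanvas.width * aspect)
    thumbnailCanvas.height = videoTrack.displayHeight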

Inside the frame loop, we update progress and push detected scenes reactively.

    for (let i = 0; i < totalFrames.value; i++) {
      processedFrames.value = i + 1
      
      // ... get sample, process frame ...
      
      if (isSceneChange) {
        sample.draw(tCtx, 0, 0)
        frames.value.push({
          src: thumbnailCanvas.toDataURL('image/jpeg'),
          timestamp
        })
      }
    }

As we detect scenes, they appear in the UI immediately. Users don't have to wait for the entire video to finish processing.
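
For completeness, here's a minimal template sketch that renders this state. The element structure and the onFileChange handler are my own additions; assume the handler simply assigns the selected file to videoFile.

<template>
  <input type="file" accept="video/mp4,video/quicktime" @change="onFileChange" />
  <button :disabled="status !== 'idle' || !videoFile" @click="detectScenes">
    {{ buttonText }}
  </button>

  <div class="scenes">
    <figure v-for="frame in frames" :key="frame.timestamp">
      <img :src="frame.src" :alt="`Scene at ${frame.timestamp}s`" />
      <figcaption>{{ frame.timestamp.toFixed(1) }}s</figcaption>
    </figure>
  </div>
</template>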

Performance Notes

On my laptop, this processes about 30 frames per second. A 10-minute video takes roughly 3 minutes to analyze. That's acceptable for an in-browser tool.

The downscaling is critical. I initially tried full-resolution frames and it was unusably slow—2-3 frames per second. Downscaling to 160px wide made it 10x faster with no real impact on accuracy.

Memory can be an issue with longer videos. If you detect 100 scenes, that's 100 JPEG thumbnails in RAM. Consider limiting display count or using lower quality JPEGs. The sample.close() call is essential—without it, you'll leak frame buffers and eventually crash the tab.
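
toDataURL takes an optional quality argument for image/jpeg, so lowering the quality is a one-line change (0.7 is an arbitrary choice here):

// Second argument is JPEG quality from 0 to 1 (browsers default to roughly 0.92)
const src = thumbnailCanvas.toDataURL('image/jpeg', 0.7)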

The 3 standard deviation threshold works well for most content, but you might want to make it adjustable. Action-heavy videos might need 4 or 5 to avoid false positives. A slider would let users tune it.
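
In the Vue version this is a small change: hold the threshold in a ref, bind a range input to it, and use it in the detection check. A sketch, with my own names:

// Hypothetical reactive threshold, defaulting to the 3-sigma rule used above
const zThreshold = ref(3)

// In the detection check, replace the hard-coded 3 with the ref's value:
// stdDev > 0.1 && Math.abs(diff - mean) / stdDev > zThreshold.value

// And in the template:
// <input type="range" min="2" max="6" step="0.5" v-model.number="zThreshold" />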

One caveat: the algorithm needs 6 seconds to build its baseline. Early transitions won't be detected. If your video starts with a fast-cut montage, you'll miss those. You could reduce the window size, but that makes detection less stable.

Wrapping Up

You've built a scene detector that runs entirely in the browser. Users can process hour-long videos on their own device, and you don't pay for compute or storage. The statistical approach adapts automatically to different video types—action movies and talking-head interviews both work without manual tuning.

Demo

Drag and drop a video file here, or click to select one. The demo supports .mp4 and .mov files.

Full Code Examples

import { Input, BlobSource, VideoSampleSink, ALL_FORMATS } from 'mediabunny'

interface Scene {
  timestamp: number
  imageData: string
}

// Calculate Mean Absolute Frame Difference
function calculateMAFD(frame1: ImageData, frame2: ImageData): number {
  const data1 = frame1.data
  const data2 = frame2.data
  let diff = 0
  const len = data1.length
  
  for (let i = 0; i < len; i += 4) {
    // Convert to grayscale using integer math for performance
    const gray1 = (299 * data1[i] + 587 * data1[i + 1] + 114 * data1[i + 2]) / 1000
    const gray2 = (299 * data2[i] + 587 * data2[i + 1] + 114 * data2[i + 2]) / 1000
    diff += Math.abs(gray1 - gray2)
  }
  
  return diff / (frame1.width * frame1.height)
}

function calculateMean(data: number[]): number {
  if (data.length === 0) return 0
  return data.reduce((sum, val) => sum + val, 0) / data.length
}

function calculateStdDev(data: number[], mean: number): number {
  if (data.length === 0) return 0
  const sqDiff = data.map(value => (value - mean) ** 2)
  const avgSqDiff = calculateMean(sqDiff)
  return Math.sqrt(avgSqDiff)
}

async function detectScenes(videoFile: File): Promise<Scene[]> {
  const input = new Input({
    source: new BlobSource(videoFile),
    formats: ALL_FORMATS
  })
  
  const videoTrack = await input.getPrimaryVideoTrack()
  if (!videoTrack) throw new Error('No video track found')
  
  const sink = new VideoSampleSink(videoTrack)
  const duration = await input.computeDuration()
  
  const fps = 10
  const totalFrames = Math.floor(duration * fps)
  const WINDOW_SIZE = 60 // 6 seconds at 10fps
  const DOWNSCALE_WIDTH = 160
  
  const scenes: Scene[] = []
  const mafdValues: number[] = []
  let previousImageData: ImageData | null = null
  let downscaleHeight = 0
  
  // Create canvases
  const processingCanvas = document.createElement('canvas')
  const pCtx = processingCanvas.getContext('2d', { willReadFrequently: true })!
  const thumbnailCanvas = document.createElement('canvas')
  const tCtx = thumbnailCanvas.getContext('2d')!
  
  for (let i = 0; i < totalFrames; i++) {
    const timestamp = i / fps
    const sample = await sink.getSample(timestamp)
    
    if (sample) {
      // Set canvas sizes on first frame
      if (downscaleHeight === 0) {
        downscaleHeight = Math.round(
          (sample.displayHeight * DOWNSCALE_WIDTH) / sample.displayWidth
        )
        processingCanvas.width = DOWNSCALE_WIDTH
        processingCanvas.height = downscaleHeight
        thumbnailCanvas.width = sample.displayWidth
        thumbnailCanvas.height = sample.displayHeight
      }
      
      // Draw to small canvas for processing
      sample.draw(pCtx, 0, 0, processingCanvas.width, processingCanvas.height)
      const currentImageData = pCtx.getImageData(
        0, 0, processingCanvas.width, processingCanvas.height
      )
      
      // Always add first frame
      if (i === 0) {
        sample.draw(tCtx, 0, 0)
        scenes.push({
          imageData: thumbnailCanvas.toDataURL('image/jpeg'),
          timestamp
        })
      }
      
      if (previousImageData) {
        const diff = calculateMAFD(previousImageData, currentImageData)
        let isSceneChange = false
        
        // Check for scene change if window is full
        if (mafdValues.length >= WINDOW_SIZE) {
          const mean = calculateMean(mafdValues)
          const stdDev = calculateStdDev(mafdValues, mean)
          
          // Detect statistical outliers
          if (stdDev > 0.1 && Math.abs(diff - mean) / stdDev > 3) {
            isSceneChange = true
          }
        }
        
        if (isSceneChange) {
          sample.draw(tCtx, 0, 0)
          scenes.push({
            imageData: thumbnailCanvas.toDataURL('image/jpeg'),
            timestamp
          })
          mafdValues.length = 0 // Reset window
        } else {
          mafdValues.push(diff)
          if (mafdValues.length > WINDOW_SIZE) {
            mafdValues.shift()
          }
        }
      }
      
      previousImageData = currentImageData
      sample.close()
    }
  }
  
  return scenes
}

// Usage
const fileInput = document.querySelector('input[type="file"]') as HTMLInputElement
fileInput.addEventListener('change', async (e) => {
  const file = (e.target as HTMLInputElement).files?.[0]
  if (file) {
    const scenes = await detectScenes(file)
    console.log(`Detected ${scenes.length} scenes`)
  }
})