
How to Build a Video Scene Detector that Runs in Your Browser
Build a client-side scene detector using MediaBunny.js and statistical analysis. Process videos frame-by-frame without sending data to a server.
Patrick the AI Engineer
You're building a video editing app and need to help users find scene changes in hour-long videos. Sending everything to your server is expensive and slow. What if the browser could do the work instead?
We're going to build a scene detector that runs entirely client-side using MediaBunny.js, a TypeScript media toolkit built on the browser's WebCodecs API. The video never leaves the user's device, there's no upload time, and you don't pay for server compute.
The basic idea is simple: compare each frame to the previous one. When they're very different, you've found a scene change. Let's start by comparing two frames.
function calculateMAFD(frame1: ImageData, frame2: ImageData): number {
const data1 = frame1.data
const data2 = frame2.data
let diff = 0
for (let i = 0; i < data1.length; i += 4) {
const gray1 = (299 * data1[i] + 587 * data1[i + 1] + 114 * data1[i + 2]) / 1000
const gray2 = (299 * data2[i] + 587 * data2[i + 1] + 114 * data2[i + 2]) / 1000
diff += Math.abs(gray1 - gray2)
}
return diff / (frame1.width * frame1.height)
}
This loops over every pixel in RGBA order (four values per pixel), converts each pixel to grayscale using the standard luma weights (0.299, 0.587, 0.114, scaled by 1000 so the coefficients stay integers), and sums the absolute differences. The sum is normalized by the pixel count, giving a single number, the Mean Absolute Frame Difference (MAFD), that tells us how different the two frames are.
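A quick sanity check, assuming two same-sized canvases already hold consecutive frames (ctxA and ctxB are placeholder names for this example):

const frameA = ctxA.getImageData(0, 0, ctxA.canvas.width, ctxA.canvas.height)
const frameB = ctxB.getImageData(0, 0, ctxB.canvas.width, ctxB.canvas.height)

// Identical frames score 0; values grow with motion, on a 0-255 grayscale scale
console.log(calculateMAFD(frameA, frameA)) // 0
console.log(calculateMAFD(frameA, frameB)) // small for a static shot, large across a cut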
But here's the problem: what's a "big" difference? A video of someone talking has small frame-to-frame changes. An action movie has constant motion. If you use a fixed threshold, you'll miss scenes in one video and get false positives in the other.
The solution is to track recent differences and look for statistical outliers. Let's add mean and standard deviation calculations.
function calculateMean(values: number[]): number {
return values.reduce((sum, val) => sum + val, 0) / values.length
}
function calculateStdDev(values: number[], mean: number): number {
const sqDiffs = values.map(val => (val - mean) ** 2)
return Math.sqrt(sqDiffs.reduce((sum, val) => sum + val, 0) / values.length)
}
Nothing fancy here, just standard statistics. Now let's use these to detect scene changes.
We'll keep a sliding window of the last 60 frame differences (6 seconds at 10fps). When a new difference lands more than 3 standard deviations away from the window's mean, that's a scene change.
const WINDOW_SIZE = 60
const mafdValues: number[] = []
let previousFrame: ImageData | null = null
function processFrame(currentFrame: ImageData) {
if (!previousFrame) {
previousFrame = currentFrame
return false
}
const diff = calculateMAFD(previousFrame, currentFrame)
previousFrame = currentFrame
if (mafdValues.length < WINDOW_SIZE) {
mafdValues.push(diff)
return false
}
const mean = calculateMean(mafdValues)
const stdDev = calculateStdDev(mafdValues, mean)
const isSceneChange = stdDev > 0.1 && Math.abs(diff - mean) / stdDev > 3
if (isSceneChange) {
mafdValues.length = 0
} else {
mafdValues.push(diff)
mafdValues.shift()
}
return isSceneChange
}
We wait until the window holds 60 frame differences before detecting anything. This builds a baseline for what's "normal" in this video. The stdDev > 0.1 check prevents false positives in static videos where tiny changes look like outliers.
When we detect a scene change, we reset the window. The new scene might have completely different motion characteristics, so we need a fresh baseline.
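To make the outlier test concrete, here it is with made-up numbers (a window averaging 2.0 with a standard deviation of 0.4):

const mean = 2.0
const stdDev = 0.4

const quietDiff = 1.9 // |1.9 - 2.0| / 0.4 = 0.25 -> ordinary frame
const cutDiff = 4.0   // |4.0 - 2.0| / 0.4 = 5.0  -> well past the 3-sigma threshold

console.log(stdDev > 0.1 && Math.abs(quietDiff - mean) / stdDev > 3) // false
console.log(stdDev > 0.1 && Math.abs(cutDiff - mean) / stdDev > 3)   // true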
Now let's actually process a video. MediaBunny.js makes this straightforward.
import { Input, BlobSource, VideoSampleSink, ALL_FORMATS } from 'mediabunny'
async function detectScenes(videoFile: File) {
const input = new Input({
source: new BlobSource(videoFile),
formats: ALL_FORMATS
})
const videoTrack = await input.getPrimaryVideoTrack()
if (!videoTrack) throw new Error('No video track found')
const sink = new VideoSampleSink(videoTrack)
const duration = await input.computeDuration()
We load the video from a File object and get its video track. The VideoSampleSink lets us pull out individual frames at specific timestamps.
Let's set up a canvas for processing. Here's the key performance trick: we'll downscale frames to 160px wide.
const fps = 10
const totalFrames = Math.floor(duration * fps)
const scenes: { timestamp: number; image: string }[] = []
const canvas = document.createElement('canvas')
const ctx = canvas.getContext('2d')!
canvas.width = 160
canvas.height = Math.round((videoTrack.displayHeight * 160) / videoTrack.displayWidth)
A 160x90 frame has roughly 1% of the pixels of a 1920x1080 frame, so the per-frame pixel work drops by about two orders of magnitude. We're only looking for big visual changes, not fine details, so this works perfectly.
Now we loop through the video at 10fps and run our detection algorithm on each frame.
for (let i = 0; i < totalFrames; i++) {
const timestamp = i / fps
const sample = await sink.getSample(timestamp)
if (sample) {
sample.draw(ctx, 0, 0, canvas.width, canvas.height)
const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height)
if (processFrame(imageData)) {
scenes.push({
timestamp,
image: canvas.toDataURL('image/jpeg')
})
}
sample.close()
}
}
return scenes
}
For each frame, we draw it to our small canvas, extract the pixel data, and check if it's a scene change. The sample.close() call is crucial—it frees the frame's memory so we don't leak.
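One defensive variation, shown as a sketch rather than a requirement: wrap the per-frame work in try/finally so the sample is released even if something inside the loop throws.

const sample = await sink.getSample(timestamp)
if (sample) {
  try {
    sample.draw(ctx, 0, 0, canvas.width, canvas.height)
    const imageData = ctx.getImageData(0, 0, canvas.width, canvas.height)
    if (processFrame(imageData)) {
      scenes.push({ timestamp, image: canvas.toDataURL('image/jpeg') })
    }
  } finally {
    sample.close() // always free the frame buffer, even on an error
  }
}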
That's the vanilla TypeScript version. Now let's see how Vue makes this nicer to work with.
Adding Vue Reactivity
Vue gives us automatic UI updates as we process the video. Let's start with the reactive state.
<script setup lang="ts">
import { ref, computed } from 'vue'
import { Input, BlobSource, VideoSampleSink, ALL_FORMATS } from 'mediabunny'
const videoFile = ref<File | null>(null)
const frames = ref<{ src: string; timestamp: number }[]>([])
const status = ref<'idle' | 'loading' | 'detecting'>('idle')
const processedFrames = ref(0)
const totalFrames = ref(0)
As we process frames, processedFrames increments and the UI updates automatically. Let's use that for a progress button.
const buttonText = computed(() => {
if (status.value === 'loading') return 'Loading Video...'
if (status.value === 'detecting') {
const pct = Math.round((processedFrames.value / totalFrames.value) * 100)
return `Detecting... ${pct}%`
}
return 'Detect Scenes'
})
The button text updates reactively as we process the video. Now let's adapt our detection logic.
const detectScenes = async () => {
if (!videoFile.value || status.value !== 'idle') return
try {
status.value = 'loading'
frames.value = []
const input = new Input({
source: new BlobSource(videoFile.value),
formats: ALL_FORMATS
})
const videoTrack = await input.getPrimaryVideoTrack()
if (!videoTrack) return
status.value = 'detecting'
// ...
The core algorithm is the same, but we'll update processedFrames inside the loop to give users real-time feedback.
Here's a nice touch: we'll use two canvases—one small for processing, one full-size for thumbnails.
const processingCanvas = document.createElement('canvas')
const pCtx = processingCanvas.getContext('2d', { willReadFrequently: true })
const thumbnailCanvas = document.createElement('canvas')
const tCtx = thumbnailCanvas.getContext('2d')
processingCanvas.width = 160
thumbnailCanvas.width = videoTrack.displayWidth
This lets us do fast comparisons on small frames while storing nice-looking thumbnails for display. The canvas heights are set from the first decoded frame (as in the full code below), so both keep the video's aspect ratio.
Inside the frame loop, we update progress and push detected scenes reactively.
for (let i = 0; i < totalFrames.value; i++) {
processedFrames.value = i + 1
// ... get sample, process frame ...
if (isSceneChange) {
sample.draw(tCtx, 0, 0)
frames.value.push({
src: thumbnailCanvas.toDataURL('image/jpeg'),
timestamp
})
}
}
As we detect scenes, they appear in the UI immediately. Users don't have to wait for the entire video to finish processing.
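The display side is just a v-for over frames; here's a trimmed version of the template from the full code at the end of the post:

<div class="grid grid-cols-2 sm:grid-cols-3 md:grid-cols-4 gap-4">
  <div v-for="(frame, index) in frames" :key="index">
    <img :src="frame.src" alt="Scene Frame" class="w-full rounded-lg">
    <UButton size="xs" @click="seekToTimestamp(frame.timestamp)">
      {{ frame.timestamp.toFixed(2) }}s
    </UButton>
  </div>
</div>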
Performance Notes
On my laptop, this processes about 30 frames per second. A 10-minute video takes roughly 3 minutes to analyze. That's acceptable for an in-browser tool.
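The arithmetic checks out: 10 minutes sampled at 10fps is 600 × 10 = 6,000 frames, and at roughly 30 frames per second of processing that's about 200 seconds, a little over 3 minutes.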
The downscaling is critical. I initially tried full-resolution frames and it was unusably slow—2-3 frames per second. Downscaling to 160px wide made it 10x faster with no real impact on accuracy.
Memory can be an issue with longer videos. If you detect 100 scenes, that's 100 JPEG thumbnails in RAM. Consider limiting display count or using lower quality JPEGs. The sample.close() call is essential—without it, you'll leak frame buffers and eventually crash the tab.
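Two easy levers, sketched here with a hypothetical MAX_SCENES constant: toDataURL takes an optional quality argument for image/jpeg, and you can simply stop storing thumbnails past a cap.

// Lower-quality JPEGs shrink each thumbnail; 0.7 is an arbitrary starting point
const src = thumbnailCanvas.toDataURL('image/jpeg', 0.7)

// Cap how many thumbnails are held in memory (MAX_SCENES is a made-up name)
const MAX_SCENES = 200
if (frames.value.length < MAX_SCENES) {
  frames.value.push({ src, timestamp })
}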
The 3 standard deviation threshold works well for most content, but you might want to make it adjustable. Action-heavy videos might need 4 or 5 to avoid false positives. A slider would let users tune it.
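A sketch of what that could look like in the Vue version, using a sceneChangeThreshold ref of my own naming:

const sceneChangeThreshold = ref(3) // default; action-heavy footage may want 4-5

// inside the detection check, replacing the hard-coded 3:
if (stdDev > 0.1 && Math.abs(diff - mean) / stdDev > sceneChangeThreshold.value)
  isSceneChange = true

Bind it to a range input (for example v-model.number="sceneChangeThreshold" on an <input type="range" min="2" max="6" step="0.5">) and users can tune sensitivity per video.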
One caveat: the algorithm needs 6 seconds to build its baseline. Early transitions won't be detected. If your video starts with a fast-cut montage, you'll miss those. You could reduce the window size, but that makes detection less stable.
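If you do want to trade stability for a faster warm-up, the window size is the knob; a sketch with numbers I picked for illustration:

const fps = 10
const baselineSeconds = 3 // half the original 6-second warm-up
const WINDOW_SIZE = baselineSeconds * fps // 30 differences instead of 60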
Wrapping Up
You've built a scene detector that runs entirely in the browser. Users can process hour-long videos on their own device, and you don't pay for compute or storage. The statistical approach adapts automatically to different video types—action movies and talking-head interviews both work without manual tuning.
Demo
(Interactive demo: drag & drop an .mp4 or .mov video, or click to select one, and detected scenes appear as clickable thumbnails.)
Full Code Examples
import { Input, BlobSource, VideoSampleSink, ALL_FORMATS } from 'mediabunny'
interface Scene {
timestamp: number
imageData: string
}
// Calculate Mean Absolute Frame Difference
function calculateMAFD(frame1: ImageData, frame2: ImageData): number {
const data1 = frame1.data
const data2 = frame2.data
let diff = 0
const len = data1.length
for (let i = 0; i < len; i += 4) {
// Convert to grayscale using integer math for performance
const gray1 = (299 * data1[i] + 587 * data1[i + 1] + 114 * data1[i + 2]) / 1000
const gray2 = (299 * data2[i] + 587 * data2[i + 1] + 114 * data2[i + 2]) / 1000
diff += Math.abs(gray1 - gray2)
}
return diff / (frame1.width * frame1.height)
}
function calculateMean(data: number[]): number {
if (data.length === 0) return 0
return data.reduce((sum, val) => sum + val, 0) / data.length
}
function calculateStdDev(data: number[], mean: number): number {
if (data.length === 0) return 0
const sqDiff = data.map(value => (value - mean) ** 2)
const avgSqDiff = calculateMean(sqDiff)
return Math.sqrt(avgSqDiff)
}
async function detectScenes(videoFile: File): Promise<Scene[]> {
const input = new Input({
source: new BlobSource(videoFile),
formats: ALL_FORMATS
})
const videoTrack = await input.getPrimaryVideoTrack()
if (!videoTrack) throw new Error('No video track found')
const sink = new VideoSampleSink(videoTrack)
const duration = await input.computeDuration()
const fps = 10
const totalFrames = Math.floor(duration * fps)
const WINDOW_SIZE = 60 // 6 seconds at 10fps
const DOWNSCALE_WIDTH = 160
const scenes: Scene[] = []
const mafdValues: number[] = []
let previousImageData: ImageData | null = null
let downscaleHeight = 0
// Create canvases
const processingCanvas = document.createElement('canvas')
const pCtx = processingCanvas.getContext('2d', { willReadFrequently: true })!
const thumbnailCanvas = document.createElement('canvas')
const tCtx = thumbnailCanvas.getContext('2d')!
for (let i = 0; i < totalFrames; i++) {
const timestamp = i / fps
const sample = await sink.getSample(timestamp)
if (sample) {
// Set canvas sizes on first frame
if (downscaleHeight === 0) {
downscaleHeight = Math.round(
(sample.displayHeight * DOWNSCALE_WIDTH) / sample.displayWidth
)
processingCanvas.width = DOWNSCALE_WIDTH
processingCanvas.height = downscaleHeight
thumbnailCanvas.width = sample.displayWidth
thumbnailCanvas.height = sample.displayHeight
}
// Draw to small canvas for processing
sample.draw(pCtx, 0, 0, processingCanvas.width, processingCanvas.height)
const currentImageData = pCtx.getImageData(
0, 0, processingCanvas.width, processingCanvas.height
)
// Always add first frame
if (i === 0) {
sample.draw(tCtx, 0, 0)
scenes.push({
imageData: thumbnailCanvas.toDataURL('image/jpeg'),
timestamp
})
}
if (previousImageData) {
const diff = calculateMAFD(previousImageData, currentImageData)
let isSceneChange = false
// Check for scene change if window is full
if (mafdValues.length >= WINDOW_SIZE) {
const mean = calculateMean(mafdValues)
const stdDev = calculateStdDev(mafdValues, mean)
// Detect statistical outliers
if (stdDev > 0.1 && Math.abs(diff - mean) / stdDev > 3) {
isSceneChange = true
}
}
if (isSceneChange) {
sample.draw(tCtx, 0, 0)
scenes.push({
imageData: thumbnailCanvas.toDataURL('image/jpeg'),
timestamp
})
mafdValues.length = 0 // Reset window
} else {
mafdValues.push(diff)
if (mafdValues.length > WINDOW_SIZE) {
mafdValues.shift()
}
}
}
previousImageData = currentImageData
sample.close()
}
}
return scenes
}
// Usage
const fileInput = document.querySelector('input[type="file"]') as HTMLInputElement
fileInput.addEventListener('change', async (e) => {
const file = (e.target as HTMLInputElement).files?.[0]
if (file) {
const scenes = await detectScenes(file)
console.log(`Detected ${scenes.length} scenes`)
}
})
<script setup lang="ts">
import { ref, computed, onUnmounted } from 'vue'
import { Input, BlobSource, VideoSampleSink, ALL_FORMATS } from 'mediabunny'
interface Frame {
src: string
timestamp: number
}
// Helper functions for statistical calculations
function calculateMean(data: number[]): number {
if (data.length === 0)
return 0
const sum = data.reduce((acc, value) => acc + value, 0)
return sum / data.length
}
function calculateStdDev(data: number[], mean: number): number {
if (data.length === 0)
return 0
const sqDiff = data.map(value => (value - mean) ** 2)
const avgSqDiff = calculateMean(sqDiff)
return Math.sqrt(avgSqDiff)
}
/**
* Calculates the Mean Absolute Frame Difference (MAFD) between two frames.
* This is a measure of how different two frames are.
*/
function calculateMAFD(frame1: ImageData, frame2: ImageData): number {
const data1 = frame1.data
const data2 = frame2.data
let diff = 0
const len = data1.length
for (let i = 0; i < len; i += 4) {
// Convert pixels to grayscale using integer arithmetic for performance.
// The coefficients are scaled by 1000 to avoid floating-point math.
const gray1 = (299 * data1[i] + 587 * data1[i + 1] + 114 * data1[i + 2]) / 1000
const gray2 = (299 * data2[i] + 587 * data2[i + 1] + 114 * data2[i + 2]) / 1000
diff += Math.abs(gray1 - gray2)
}
return diff / (frame1.width * frame1.height)
}
const videoFile = ref<File | null>(null)
const frames = ref<Frame[]>([])
const status = ref<'idle' | 'loading' | 'detecting'>('idle')
const fileInput = ref<HTMLInputElement | null>(null)
const processedFrames = ref(0)
const totalFrames = ref(0)
const videoUrl = ref<string | null>(null)
const videoPlayer = ref<HTMLVideoElement | null>(null)
const buttonText = computed(() => {
switch (status.value) {
case 'loading':
return 'Loading Video...'
case 'detecting':
if (totalFrames.value > 0) {
const percentage = Math.round(
(processedFrames.value / totalFrames.value) * 100
)
return `Detecting scenes... (${processedFrames.value}/${totalFrames.value}) ${percentage}%`
}
return 'Detecting Scenes...'
default:
return 'Detect Scenes'
}
})
const handleFileChange = (event: Event | DragEvent) => {
let file: File | null = null
if (event instanceof DragEvent && event.dataTransfer) {
file = event.dataTransfer.files[0] ?? null
} else if (event.target instanceof HTMLInputElement && event.target.files) {
file = event.target.files[0] ?? null
}
if (file) {
if (videoUrl.value) {
URL.revokeObjectURL(videoUrl.value)
}
videoFile.value = file
videoUrl.value = URL.createObjectURL(file)
frames.value = []
processedFrames.value = 0
totalFrames.value = 0
}
}
const openFilePicker = () => {
fileInput.value?.click()
}
onUnmounted(() => {
if (videoUrl.value) {
URL.revokeObjectURL(videoUrl.value)
}
})
const seekToTimestamp = (timestamp: number) => {
if (videoPlayer.value) {
videoPlayer.value.currentTime = timestamp
videoPlayer.value.play()
}
}
/**
* Detects scene changes in the video using a single-pass algorithm based on Mean Absolute Frame Difference (MAFD)
* and a dynamic threshold calculated from a sliding window of recent frame differences.
* This approach is inspired by the paper:
*
* Y. A. Salih, L. E. George, "Dynamic Scene Change Detection in Video Coding",
* International Journal of Engineering (IJE) TRANSACTIONS B: Applications Vol. 33, No. 5, (May 2020) 966-974
* https://www.researchgate.net/publication/341281580_Dynamic_Scene_Change_Detection_in_Video_Coding
*
* The algorithm works as follows:
* 1. Process the video frame by frame.
* 2. For each frame, calculate the MAFD from the previous frame.
* 3. Maintain a sliding window of the most recent MAFD values.
* 4. Calculate the mean and standard deviation of the values in the window.
* 5. If the current MAFD is a statistical outlier (e.g., > 3 standard deviations from the mean),
* it's considered a scene change.
* 6. The frame is immediately extracted and added to the UI for a progressive user experience.
* 7. After a scene change, the sliding window is cleared to adapt to the new scene's content.
*/
const extractFrames = async () => {
if (!videoFile.value || status.value !== 'idle') {
return
}
try {
status.value = 'loading'
frames.value = []
processedFrames.value = 0
totalFrames.value = 0
const input = new Input({
source: new BlobSource(videoFile.value),
formats: ALL_FORMATS
})
const videoTrack = await input.getPrimaryVideoTrack()
if (!videoTrack) {
console.error('No video track found')
return
}
status.value = 'detecting'
const sink = new VideoSampleSink(videoTrack)
const duration = await input.computeDuration()
const framesPerSecond = 10 // Sample 10 frames per second for more granular analysis
totalFrames.value = Math.floor(duration * framesPerSecond)
const SLIDING_WINDOW_SIZE = 60 // 6 seconds at 10fps
const DOWNSCALE_WIDTH = 160 // Downscale frames for faster processing
let downscaleHeight = 0
// --- Single Pass: Detect scenes and extract frames progressively ---
const mafdValues: number[] = [] // our sliding window
let previousImageData: ImageData | null = null
const processingCanvas = document.createElement('canvas')
const pCtx = processingCanvas.getContext('2d', { willReadFrequently: true })
if (!pCtx)
return
const thumbnailCanvas = document.createElement('canvas')
const tCtx = thumbnailCanvas.getContext('2d')
if (!tCtx)
return
for (let i = 0; i < totalFrames.value; i++) {
processedFrames.value = i + 1
const timestamp = i / framesPerSecond
const sample = await sink.getSample(timestamp)
if (sample) {
if (downscaleHeight === 0) {
if (sample.displayWidth > 0) {
downscaleHeight = Math.round(
(sample.displayHeight * DOWNSCALE_WIDTH) / sample.displayWidth
)
processingCanvas.width = DOWNSCALE_WIDTH
processingCanvas.height = downscaleHeight
thumbnailCanvas.width = sample.displayWidth
thumbnailCanvas.height = sample.displayHeight
}
else {
sample.close()
continue
}
}
// Draw the sample to the small canvas for processing, effectively downscaling it
sample.draw(pCtx, 0, 0, processingCanvas.width, processingCanvas.height)
const currentImageData = pCtx.getImageData(
0,
0,
processingCanvas.width,
processingCanvas.height
)
const createThumbnail = () => {
sample.draw(tCtx, 0, 0)
return thumbnailCanvas.toDataURL('image/jpeg')
}
if (i === 0) {
// Always add the first frame
frames.value.push({
src: createThumbnail(),
timestamp
})
}
if (previousImageData) {
const diff = calculateMAFD(previousImageData, currentImageData)
let isSceneChange = false
// Check for scene change if we have enough data in our sliding window
if (mafdValues.length >= SLIDING_WINDOW_SIZE) {
const mean = calculateMean(mafdValues)
const stdDev = calculateStdDev(mafdValues, mean)
const sceneChangeThreshold = 3 // 3 std devs
// A scene change is detected if the difference is a statistical outlier.
// We check if stdDev is large enough to avoid false positives in low-motion videos.
if (stdDev > 0.1 && Math.abs(diff - mean) / stdDev > sceneChangeThreshold)
isSceneChange = true
}
if (isSceneChange) {
frames.value.push({
src: createThumbnail(),
timestamp
})
// Reset the window after a scene change to adapt to the new scene's characteristics
mafdValues.length = 0
}
else {
mafdValues.push(diff)
if (mafdValues.length > SLIDING_WINDOW_SIZE)
mafdValues.shift()
}
}
previousImageData = currentImageData
sample.close()
}
}
}
catch (error) {
console.error('Error extracting frames:', error)
}
finally {
status.value = 'idle'
processedFrames.value = 0
totalFrames.value = 0
}
}
</script>
<template>
<UCard>
<template #header>
<h2 class="text-lg font-semibold">
Video Scene Detector
</h2>
</template>
<div class="space-y-4">
<div
v-if="videoUrl"
class="video-container"
>
<video
ref="videoPlayer"
:src="videoUrl"
controls
class="w-full rounded-lg"
/>
</div>
<div
class="border-2 border-dashed border-gray-300 dark:border-gray-700 rounded-lg p-8 text-center cursor-pointer hover:border-primary-500 transition-colors"
@click="openFilePicker"
@dragover.prevent
@drop.prevent="handleFileChange"
>
<input
ref="fileInput"
type="file"
accept="video/mp4,video/quicktime"
class="hidden"
@change="handleFileChange"
>
<div v-if="!videoFile">
<p>Drag & drop a video file here, or click to select a file.</p>
<p class="text-sm text-gray-500">
Supports .mp4 and .mov
</p>
</div>
<div v-else>
<p>Selected file: {{ videoFile.name }}</p>
</div>
</div>
<UButton
:disabled="!videoFile || status !== 'idle'"
:loading="status !== 'idle'"
@click="extractFrames"
>
{{ buttonText }}
</UButton>
</div>
<template
v-if="frames.length > 0"
#footer
>
<div>
<h3 class="text-md font-semibold mb-2">
Detected Scenes
</h3>
<div
class="grid grid-cols-2 sm:grid-cols-3 md:grid-cols-4 lg:grid-cols-5 gap-4"
>
<div
v-for="(frame, index) in frames"
:key="index"
class="aspect-w-16 aspect-h-9 relative"
>
<img
:src="frame.src"
alt="Scene Frame"
class="object-cover w-full h-full rounded-lg"
>
<UButton
size="xs"
class="absolute bottom-1 left-1"
@click="seekToTimestamp(frame.timestamp)"
>
{{ frame.timestamp.toFixed(2) }}s
</UButton>
</div>
</div>
</div>
</template>
</UCard>
</template>
