
Build Karaoke-Style Video Captions in the Browser with Whisper
Create word-level, karaoke-style captions entirely in the browser with Whisper running on WebGPU/WASM, then burn them into videos with Mediabunny
Become an AI Engineer by building AI applications that deliver real-world value. Learn about local AI models that run entirely in the browser, techniques to reduce hallucinations, and multi-modal capabilities of LLMs.

Render PDF pages as screenshots and use OpenAI's vision models with structured outputs to extract typed invoice data from scanned documents and complex layouts

OpenAI's Sora 2 caps videos at 12 seconds. Learn how to create longer videos by chaining segments together, using the last frame of each video as the first frame of the next.

Learn how to build an image editor that modifies specific regions using Gemini's image generation API, with practical TypeScript and Vue.js implementations.

Build a client-side scene detector using Mediabunny and statistical analysis. Process videos frame-by-frame without sending data to a server.

Learn how to blend clothing onto person photos using Google's Gemini 2.5 Flash multimodal model with practical TypeScript and Vue.js implementations.

Learn how to build an accessible video captioning system using Transformers.js and FastVLM that runs entirely in the browser with WebGPU—no server required.