Here’s a truth nobody wants to admit: your team records everything and rewatches nothing.

Training sessions, onboarding calls, architecture reviews, sprint demos — hours of video sitting in Google Drive or Dropbox, while the knowledge inside them lives on only in the heads of the people who happened to be there live.

We kept hitting this at CONFLICT. A 90-minute training session contains diagrams, action items, decisions, technical context — real knowledge. But extracting it meant someone sitting through the whole recording with a notepad. Nobody does that.

So we built PlanOpticon.

What it actually does

You point it at a video. It gives you back:

  • Full transcript with speaker diarization (who said what, when)
  • Every diagram and slide extracted and recreated as Mermaid code
  • Action items with assignees and deadlines, pulled from the conversation
  • Key points summarizing what matters
  • A knowledge graph mapping entities and relationships across the entire session
  • Reports in Markdown, HTML, and PDF

pip install planopticon
planopticon analyze -i meeting.mp4 -o ./output

That’s it. One command. It auto-discovers available AI models across OpenAI, Anthropic, and Gemini, routes each task to the best provider, and produces structured output you can actually search, reference, and build on.
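
Under the hood, "auto-discovers" boils down to checking which API keys you have configured and sending each task to whichever provider is available and suited to it. Here is a rough sketch of that pattern in Python. It is not PlanOpticon's internal code; the environment variable names and task preferences below are illustrative assumptions.

import os

# Illustrative only: discover providers from environment variables and route
# tasks to whichever configured provider is preferred for that task.
PROVIDER_ENV_VARS = {
    "openai": "OPENAI_API_KEY",
    "anthropic": "ANTHROPIC_API_KEY",
    "gemini": "GEMINI_API_KEY",
}

# Hypothetical preference order per task; the real tool decides this for you.
TASK_PREFERENCES = {
    "vision": ["gemini", "openai", "anthropic"],
    "summarization": ["anthropic", "openai", "gemini"],
    "action_items": ["openai", "anthropic", "gemini"],
}

def discover_providers() -> set[str]:
    # A provider counts as available if its API key is set in the environment.
    return {name for name, var in PROVIDER_ENV_VARS.items() if os.getenv(var)}

def route(task: str) -> str:
    # Pick the first preferred provider for this task that is actually configured.
    available = discover_providers()
    for provider in TASK_PREFERENCES.get(task, []):
        if provider in available:
            return provider
    raise RuntimeError(f"No configured provider can handle task: {task}")

if __name__ == "__main__":
    for task in TASK_PREFERENCES:
        print(task, "->", route(task))

The point of the pattern is that no single provider has to handle every task; the tool works with whatever subset of keys you happen to have.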

Why open source

Knowledge extraction from video isn’t a solved problem — it’s barely an attempted one. We think the approach matters more than the implementation, and the best way to prove that is to ship the code.

PlanOpticon is MIT licensed, works with any combination of AI providers, and runs locally where it can (Whisper transcription doesn’t need an API).
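
If you're curious what the local path looks like, the open-source Whisper package runs fully offline once the model weights are downloaded. A minimal, standalone sketch follows; it needs ffmpeg on the PATH, the file name is just an example, and it is independent of PlanOpticon's own pipeline.

# Local transcription with the open-source "openai-whisper" package.
# Requires ffmpeg; runs entirely offline after the model weights download.
import whisper

model = whisper.load_model("base")        # small general-purpose model
result = model.transcribe("meeting.mp4")  # ffmpeg extracts the audio track

print(result["text"])                     # plain transcript
for segment in result["segments"]:        # timestamped segments
    print(f'{segment["start"]:.1f}s - {segment["end"]:.1f}s: {segment["text"]}')

That transcription step is the part that stays on your machine; only the later analysis stages reach out to an API.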

This is the first post in a three-part series. Next up: how we taught vision models to find and recreate diagrams from video frames, and why most of those models are garbage at it.

GitHub · Docs · PyPI

Written by Leo M.