/images/blog-generated/discovery-evolved-federated-knowledge-graphs.webp

Discovery is the most important phase of any software project. It is also the most fragile. The insights gathered during stakeholder interviews, research sessions, and domain deep-dives are perishable. They live in handwritten notes, half-remembered conversations, and Miro boards that nobody opens after the second week. The critical understanding that should inform every architectural decision and every specification gets reduced to a few bullet points in a kickoff deck, and then it fades.

We have run discovery engagements at CONFLICT for over thirteen years. We have watched the same pattern repeat across hundreds of projects: the discovery phase generates enormous insight, the transition to delivery loses most of it, and the team spends the rest of the engagement re-discovering things they already knew but failed to capture in a durable, queryable form.

This is not a discipline problem. It is an infrastructure problem. Discovery knowledge has not had a system of record the way code has Git, tasks have Jira, and documents have Google Drive. Until now.

This post is the technical companion to Discovery Evolved: Why We Invest More in Research, Not Less. That post covers the philosophy – why deeper research produces faster delivery in the AI-native era. This post covers the implementation – specifically how we use PlanOpticon and knowledge graphs to make discovery knowledge persistent, queryable, and machine-readable, and why that capability changes everything downstream.

The Discovery Knowledge Problem

A typical discovery engagement at CONFLICT involves 10 to 20 sessions over two to three weeks. Stakeholder interviews, technical deep-dives, process walkthroughs, competitive analysis reviews, and architecture discussions. Each session runs 60 to 90 minutes. Each one produces insights that matter.

The problem is what happens to those insights.

In the traditional model, a researcher or PM takes notes during each session. Maybe the session is recorded. The notes get consolidated into a discovery document – sometimes a formal report, sometimes a Confluence page, sometimes a slide deck. Key findings are extracted. Recommendations are drafted. The document is presented, discussed, and then archived.

Here is what gets lost in that process:

  • Relationships between concepts. A stakeholder in session 3 mentions a compliance requirement that directly contradicts an assumption made in session 1. The relationship between those two data points is obvious if you attended both sessions. It is invisible in a summary document.
  • Implicit decisions. Teams make dozens of micro-decisions during discovery that never get formally recorded. “We agreed that authentication should be handled by the existing SSO provider” might be said once, in passing, and never written down. Three months later, someone builds a custom auth system.
  • Entity resolution across sessions. The CTO calls it “the ingestion pipeline.” The head of operations calls it “the data loader.” The engineering lead calls it “the ETL process.” They are all talking about the same system. A human who attended all three sessions knows this. A summary document does not.
  • Confidence and provenance. Not all discovery insights carry equal weight. A data point from the CFO about budget constraints is more authoritative than a guess from a junior developer about system architecture. But in a flat summary document, all bullet points look the same.

Deloitte’s research on knowledge worker productivity found that employees spend an average of 25% of their time searching for and gathering information they need to do their jobs. In a project context, that percentage climbs even higher during the transition from discovery to delivery, precisely because discovery knowledge was not captured in a format that supports retrieval. The knowledge exists. It is just inaccessible when you need it.

From Notes to Knowledge Graphs

A knowledge graph is a structured representation of entities and the relationships between them. Google popularized the concept in 2012 with the Google Knowledge Graph, which connected facts about the world into a queryable network – “Albert Einstein” is connected to “physics” via the relationship “field,” connected to “Princeton” via “worked at,” and connected to “Theory of Relativity” via “developed.” The power of the knowledge graph is not in any single fact. It is in the connections between facts, the ability to traverse relationships and discover things that no single data point reveals on its own.

The same principle applies to project knowledge. A discovery session is not a flat list of findings. It is a network of interconnected entities: people, systems, requirements, risks, decisions, milestones, business rules, and technical concepts, all connected by typed relationships.

PlanOpticon extracts this network automatically.
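To make the idea concrete, here is a minimal sketch of a discovery knowledge graph as typed nodes and edges in Python. The entity names, IDs, and relationship types below are illustrative examples, not PlanOpticon's actual schema:

```python
# Illustrative sketch: a discovery knowledge graph as typed entities and
# typed edges. All names and relationship types here are hypothetical.
entities = {
    "sso-provider": {"type": "system", "label": "Existing SSO provider"},
    "auth-requirement": {"type": "requirement", "label": "Authentication via SSO"},
    "cfo-budget-cap": {"type": "constraint", "label": "Budget cap stated by CFO"},
}

relationships = [
    # (source entity, relationship type, target entity)
    ("auth-requirement", "is satisfied by", "sso-provider"),
    ("auth-requirement", "is constrained by", "cfo-budget-cap"),
]

# Traversal is what a flat summary document cannot do: start from one
# entity and follow its typed edges to everything it touches.
def neighbors(entity_id):
    return [(rel, dst) for src, rel, dst in relationships if src == entity_id]
```

Calling `neighbors("auth-requirement")` walks outward from a single requirement to the system that satisfies it and the constraint that bounds it, which is exactly the kind of hop a bullet-point summary loses.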

The PlanOpticon Workflow

Every discovery session at CONFLICT is video recorded. Not as a backup that nobody watches – as a primary source of knowledge that feeds directly into our knowledge system. PlanOpticon processes those recordings and extracts structured knowledge through its analysis pipeline. Here is what that looks like in practice.

Entity Extraction

PlanOpticon identifies entities in the transcript: people, organizations, technical concepts, systems, processes, requirements, risks, decisions, and milestones. Each entity gets a type, a description, and source attribution back to the specific transcript segment where it appeared.

This is not keyword extraction. It is semantic entity recognition that understands the difference between “we need to migrate the billing system” (a requirement entity) and “the billing system processes 50,000 invoices per month” (a technical concept entity with a quantitative attribute). The entity types map to a planning ontology that makes the extracted knowledge immediately useful for project planning.
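The output of this step can be pictured as structured records like the following. The field names and values are illustrative of the shape, not PlanOpticon's actual output format:

```python
# Hypothetical shape of extracted entity records, with type, description,
# and source attribution. Field names are illustrative.
extracted = [
    {
        "name": "billing system migration",
        "type": "requirement",
        "description": "The billing system must be migrated.",
        "source": {"session": "stakeholder-interview-02", "segment": "00:14:32"},
    },
    {
        "name": "billing system",
        "type": "technical_concept",
        "attributes": {"invoices_per_month": 50_000},
        "source": {"session": "stakeholder-interview-02", "segment": "00:16:05"},
    },
]

# The same conversation yields different entity types depending on semantic
# role: a statement of need vs. a statement of fact about a system.
requirement_names = [e["name"] for e in extracted if e["type"] == "requirement"]
```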

Relationship Extraction

For every pair of related entities, PlanOpticon extracts a typed relationship: “depends on,” “is responsible for,” “conflicts with,” “is a prerequisite for,” “was decided in,” “is blocked by.” These relationships are not inferred from word proximity. They are extracted from the semantic content of the conversation.

When a stakeholder says “we cannot start the API redesign until the database migration is complete,” PlanOpticon extracts two entities (API redesign, database migration) and a typed relationship (API redesign “is blocked by” database migration). This relationship is now queryable, traversable, and available as structured context for every specification and architectural decision that follows.
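That example can be sketched as a structured record. The fields shown are a plausible shape for such a record, not PlanOpticon's actual format:

```python
# Hypothetical shape of an extracted typed relationship, with the evidence
# span it was extracted from. Field names are illustrative.
relationship = {
    "source": "API redesign",
    "type": "is blocked by",
    "target": "database migration",
    "evidence": "we cannot start the API redesign until the database migration is complete",
    "session": "technical-deep-dive-01",
}

def blockers(entity, rels):
    """Everything that blocks the given entity, per the extracted graph."""
    return [
        r["target"]
        for r in rels
        if r["source"] == entity and r["type"] == "is blocked by"
    ]
```

Once the relationship exists as data, "what blocks the API redesign?" is a one-line query instead of a memory exercise.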

Planning Taxonomy Classification

PlanOpticon’s taxonomy classifier automatically categorizes extracted entities into a planning ontology: goals, risks, milestones, requirements, and tasks. This classification is critical because it transforms raw extracted knowledge into structured planning data.

A goal (“reduce document processing time by 90%”) is fundamentally different from a risk (“the third-party API has rate limits that may constrain throughput”) even though both might appear as casual statements in the same stakeholder conversation. The taxonomy classifier ensures that each entity is categorized by its planning function, not just its content.
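A toy version of the classification step shows the input/output contract. The real classifier is semantic; this keyword-based stand-in only illustrates the shape of the mapping from raw statements to planning categories:

```python
# Toy sketch of planning-taxonomy classification. A real classifier reasons
# semantically; this keyword version only illustrates the contract:
# raw statement in, planning category out.
PLANNING_CATEGORIES = ("goal", "risk", "milestone", "requirement", "task")

def classify(statement: str) -> str:
    s = statement.lower()
    if "reduce" in s or "increase" in s:
        return "goal"          # desired measurable outcome
    if "may" in s or "might" in s or "rate limit" in s:
        return "risk"          # uncertainty that could constrain delivery
    if "must" in s or "need to" in s:
        return "requirement"   # obligation on the system
    return "task"              # default: concrete work item

goal = classify("reduce document processing time by 90%")
risk = classify("the third-party API has rate limits that may constrain throughput")
```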

SQLite Storage: Zero External Dependencies

The entire knowledge graph is stored in SQLite – zero external dependencies, no database server to install, no Docker containers to manage, no cloud services to configure. This was a deliberate design choice born from hard experience. Discovery tools that require infrastructure setup do not get used. They get evaluated, praised in a demo, and then abandoned because nobody wants to spin up a PostgreSQL instance and configure network access just to process meeting recordings.

PlanOpticon installs with pip install planopticon and stores everything in a local database file. The barrier to adoption is approximately zero, which is the only barrier that results in actual adoption.
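A single-file graph store is simple enough to sketch in a few lines of standard-library Python. The schema below is illustrative, not PlanOpticon's actual table layout:

```python
import sqlite3

# Minimal sketch of a knowledge graph in SQLite: two tables, no server.
# This schema is illustrative, not PlanOpticon's actual one.
con = sqlite3.connect(":memory:")  # a real tool would use a file path
con.executescript("""
    CREATE TABLE entities (
        id INTEGER PRIMARY KEY,
        name TEXT NOT NULL,
        type TEXT NOT NULL,
        source TEXT NOT NULL
    );
    CREATE TABLE relationships (
        src INTEGER REFERENCES entities(id),
        type TEXT NOT NULL,
        dst INTEGER REFERENCES entities(id)
    );
""")
con.execute("INSERT INTO entities VALUES (1, 'API redesign', 'task', 'session-04')")
con.execute("INSERT INTO entities VALUES (2, 'database migration', 'task', 'session-04')")
con.execute("INSERT INTO relationships VALUES (1, 'is blocked by', 2)")

# Graph traversal is just a join away.
row = con.execute("""
    SELECT e.name FROM relationships r
    JOIN entities e ON e.id = r.dst
    WHERE r.src = 1 AND r.type = 'is blocked by'
""").fetchone()
```

The entire store is one file on disk, which is what makes "just pip install it and point it at a recording" a true statement rather than a demo claim.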

Batch Mode: Where Patterns Emerge

Extracting a knowledge graph from a single session is useful. Extracting and merging knowledge graphs across an entire discovery engagement is transformative. This is where PlanOpticon’s batch mode changes the game.

Batch mode processes multiple videos – or any combination of PlanOpticon’s 20+ supported data sources – and merges their knowledge graphs into a unified graph. The merging is not a simple concatenation. It performs entity resolution across sources, identifying when different sessions reference the same concept using different terminology, and consolidating those references into a single entity with multiple source attributions.
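The core move in that merge step can be sketched as follows. A real resolver decides semantically that two mentions are the same concept; the hand-written alias table here stands in for that decision so the consolidation logic is visible:

```python
# Toy entity resolution: map aliases from different sessions onto one
# canonical entity and accumulate source attributions. A real resolver
# infers the aliases semantically; this table is hand-written.
ALIASES = {
    "the ingestion pipeline": "data ingestion system",  # CTO's term
    "the data loader": "data ingestion system",         # ops lead's term
    "the ETL process": "data ingestion system",         # eng lead's term
}

def merge(mentions):
    merged = {}
    for name, session in mentions:
        canonical = ALIASES.get(name, name)
        merged.setdefault(canonical, []).append(session)
    return merged

graph = merge([
    ("the ingestion pipeline", "session-01"),
    ("the data loader", "session-05"),
    ("the ETL process", "session-09"),
])
# Result: one entity, three source attributions — convergence that three
# separate sets of session notes would never surface.
```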

This is where the real value appears, because patterns emerge that no single person in any single session would have seen.

Consider a real example from a recent engagement. Over twelve discovery sessions, PlanOpticon extracted 847 entities and 1,203 relationships. When the merged graph was analyzed:

  • Three separate stakeholders had independently identified the same data quality issue, using different terminology each time. PlanOpticon’s entity resolution merged these into a single entity with three source attributions, immediately elevating its importance. If three stakeholders independently raise the same concern, it is not a minor issue. It is a systemic problem. But in a traditional discovery process with separate notes from each session, this convergence is invisible.
  • A dependency chain between the client’s billing system, a third-party API, and a compliance requirement surfaced that no one had explicitly stated. The billing system depended on the API, and the API had rate limits that conflicted with a compliance requirement for real-time processing. No one person knew all three facts. The graph connected them.
  • Two contradictory decisions had been made in different sessions by different stakeholders: one session concluded that user data should be stored in the EU, another assumed US-based storage. Without the graph, this contradiction would have surfaced during implementation – at much higher cost.

These are not edge cases. This is what happens on every discovery engagement of any complexity. The information is there. It is distributed across people, sessions, and documents. The knowledge graph makes the connections explicit.

Beyond Video: 20+ Data Source Connectors

Discovery knowledge does not live exclusively in meeting recordings. It lives in documents, repositories, wikis, cloud storage, research papers, and a dozen other places. PlanOpticon’s connector architecture supports ingestion from over 20 sources:

  • Video and audio: Direct video files, YouTube URLs, podcast feeds, Zoom/Teams/Meet recordings
  • Documents and collaboration: Google Drive, Dropbox, Google Workspace (Docs, Sheets, Slides), Microsoft 365, Notion, Obsidian vaults, Logseq graphs, OneNote
  • Development platforms: GitHub repositories, issues, pull requests, and discussions
  • Research and community: arXiv papers, RSS feeds, Reddit threads, Hacker News discussions
  • Cloud storage: S3 buckets with configurable object filtering

Each connector handles authentication and format normalization, converting source material into PlanOpticon’s common analysis format. The result is that a discovery engagement’s entire knowledge surface – recordings, documents, existing code, research papers, competitive analysis – can be ingested into a single unified knowledge graph.
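The "common analysis format" idea is what makes 20+ heterogeneous sources composable. A plausible sketch of such a normalized record, with hypothetical field names, looks like this:

```python
from dataclasses import dataclass

# Illustrative common analysis format: every connector, whatever its source,
# emits documents of one shape. Field names here are hypothetical, not
# PlanOpticon's actual internal types.
@dataclass
class SourceDocument:
    source_type: str   # e.g. "video", "gdrive", "github", "arxiv"
    uri: str           # provenance: where this content came from
    text: str          # normalized text content handed to the analyzer

docs = [
    SourceDocument("video", "recordings/session-01.mp4", "transcript text ..."),
    SourceDocument("github", "github.com/acme/billing#README", "readme text ..."),
]

# Downstream analysis never needs to know which connector produced a document.
video_docs = [d for d in docs if d.source_type == "video"]
```

The design point is that provenance travels with the content: the `uri` field is what lets the graph trace any entity back to its origin.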

A client’s discovery engagement might involve: six stakeholder interview recordings, a Google Drive folder of existing requirements documents, three GitHub repositories with the current system’s code, two Notion workspaces with design documents, and a handful of arXiv papers on relevant ML approaches. PlanOpticon processes all of these into one graph. The relationships between a requirement mentioned in a stakeholder interview, the code that currently implements a related feature, and the research paper that proposes a better approach are all queryable from one place.

This is the concept of federated knowledge. Not a single monolithic document that tries to contain everything, but a structured graph that connects knowledge across sources while preserving provenance. You can always trace an entity back to its origin – which session, which document, which line of code. The federation preserves context while enabling cross-source discovery.

Querying the Graph

A knowledge graph is only as useful as your ability to query it. PlanOpticon ships with an interactive REPL companion that provides 18 slash commands for graph exploration:

  • /entities – list and filter entities by type, source, or classification
  • /relationships – explore connections between entities with type filtering
  • /query – run queries against the graph using natural language or structured syntax
  • /paths – find the shortest path between any two entities using breadth-first search
  • /clusters – detect connected components and natural groupings within the graph

The path-finding capability is particularly powerful during discovery synthesis. When you ask “what is the shortest path between this risk and that milestone,” the graph shows you the chain of dependencies, decisions, and requirements that connect them. This is the kind of insight that typically requires a senior architect to hold in their head – the ability to trace how a risk in one area propagates through a chain of dependencies to affect a milestone in a completely different area. The graph makes it explicit and shareable.
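Under the hood, a shortest-path query of this kind reduces to plain breadth-first search over the graph's edges. A minimal sketch, with illustrative edge data:

```python
from collections import deque

# What a shortest-path query does under the hood: breadth-first search.
# The edge data is illustrative, echoing the /paths example above.
edges = {
    "data migration risk": ["database migration"],
    "database migration": ["API redesign"],
    "API redesign": ["Q3 launch milestone"],
}

def shortest_path(start, goal):
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        if path[-1] == goal:
            return path  # first hit in BFS is a shortest path
        for nxt in edges.get(path[-1], []):
            if nxt not in seen:
                seen.add(nxt)
                queue.append(path + [nxt])
    return None  # no connection between the two entities
```

The returned path is the explicit, shareable version of the dependency chain a senior architect would otherwise reconstruct from memory.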

Cluster detection surfaces natural groupings of related entities. A knowledge graph from a complex discovery engagement might naturally cluster into distinct workstreams that could be parallelized, or it might reveal that two features the client considers independent are actually deeply coupled through shared dependencies. These are architectural insights that emerge from the data, not from intuition.
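Cluster detection of this kind reduces to finding connected components over the undirected version of the graph. A compact sketch, with hypothetical workstream data:

```python
# Connected-components sketch: entities that reach each other through any
# chain of relationships form one cluster. Node and edge names are
# illustrative, not taken from a real engagement.
def components(nodes, edges):
    adj = {n: set() for n in nodes}
    for a, b in edges:
        adj[a].add(b)
        adj[b].add(a)  # treat edges as undirected for clustering
    seen, clusters = set(), []
    for n in nodes:
        if n in seen:
            continue
        stack, cluster = [n], set()
        while stack:
            cur = stack.pop()
            if cur in seen:
                continue
            seen.add(cur)
            cluster.add(cur)
            stack.extend(adj[cur])
        clusters.append(cluster)
    return clusters

clusters = components(
    ["billing", "payments API", "search", "index"],
    [("billing", "payments API"), ("search", "index")],
)
# Two disjoint clusters suggest two workstreams that could run in parallel.
```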

The REPL is designed for exploratory analysis. You do not need to know what you are looking for before you start querying. Start with /entities --type risk to see all identified risks. Pick one. Run /relationships --entity "data migration risk" to see what it connects to. Run /paths --from "data migration risk" --to "Q3 launch milestone" to see if and how that risk touches the launch timeline. The graph rewards curiosity.

The Context Layer for AI-Native Delivery

Here is where the knowledge graph connects to everything else we do at CONFLICT.

We have written extensively about spec-driven development, context engineering, and AI-native delivery. The common thread across all of these is that the quality of AI agent output is determined by the quality of the context those agents receive. This is not theoretical. It is the empirical reality of working with LLMs at scale.

The discovery knowledge graph becomes the context layer that feeds everything downstream:

Specification writing. When a Context Engineer writes a formal specification for a feature, the knowledge graph provides the domain context: relevant business rules, related requirements, known risks, stakeholder decisions, and technical constraints. Instead of relying on memory or searching through notes, the engineer queries the graph. The specification is grounded in the full discovery dataset, not the subset that one person remembers.

Architecture decisions. When the team evaluates architectural tradeoffs, the knowledge graph surfaces constraints that might otherwise be overlooked. “Should we use a message queue or direct API calls?” The graph might show that three stakeholders expressed latency concerns, a compliance requirement mandates audit logging of all inter-service communication, and the existing system already has a RabbitMQ instance. These facts, scattered across different sessions and sources, are queryable from one place.

Agent context windows. When AI agents execute against specifications, the relevant subgraph of discovery knowledge can be included in their context window. An agent implementing a billing feature receives not just the functional specification but also the relevant business rules, stakeholder preferences, compliance requirements, and technical constraints extracted from discovery. The agent has domain understanding because the graph provides it.
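One plausible way to select that relevant subgraph is a k-hop neighborhood around the feature's entity. The sketch below assumes edges stored as `(source, type, target)` triples; the entity names are hypothetical:

```python
# Sketch of assembling an agent's context: take the feature's entity, expand
# its k-hop neighborhood, and keep only the edges inside that neighborhood.
# Entity names and the edge format are illustrative assumptions.
def context_for(entity, edges, hops=2):
    frontier, selected = {entity}, {entity}
    for _ in range(hops):
        frontier = (
            {dst for src, _, dst in edges if src in frontier}
            | {src for src, _, dst in edges if dst in frontier}
        ) - selected
        selected |= frontier
    return [e for e in edges if e[0] in selected and e[2] in selected]

edges = [
    ("billing feature", "must satisfy", "PCI compliance requirement"),
    ("PCI compliance requirement", "was stated in", "session-03"),
    ("unrelated feature", "depends on", "search index"),
]
subgraph = context_for("billing feature", edges)
# The agent receives the billing feature's constraints and their provenance,
# and nothing about the unrelated feature.
```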

Stakeholder communication. When the team needs to validate a decision with a stakeholder, the graph provides provenance: “This requirement was stated by [person] in [session] on [date].” No more “I think someone mentioned…” conversations. The attribution is structured and citable.

The causal chain is direct: deeper discovery produces richer knowledge graphs, richer knowledge graphs produce better specifications, better specifications produce better agent output, and better agent output means faster delivery with fewer iteration cycles. Every dollar invested in building the knowledge graph pays for itself multiple times over in reduced rework during implementation.

Export and Integration

Knowledge that stays locked in a single tool is knowledge that does not get used. PlanOpticon exports to multiple formats designed to integrate with existing workflows:

  • Obsidian vaults – with proper wiki-link formatting and frontmatter, so the knowledge graph becomes a navigable knowledge base in Obsidian
  • Notion markdown – formatted for import into Notion workspaces
  • GitHub wiki – structured for project documentation repositories
  • D3.js interactive viewer – a browser-based interactive visualization for stakeholder presentations and reviews
  • PDF reports – formatted documents with table of contents, structured sections, and embedded diagrams via reportlab
  • PPTX slide decks – PowerPoint presentations for executive communication via python-pptx

The D3.js viewer deserves specific attention. During discovery synthesis, we often need to walk stakeholders through the knowledge graph – showing them how their inputs connect, where contradictions exist, and what patterns emerged. The interactive viewer renders the graph in a browser, lets users click through entities and relationships, zoom into clusters, and trace paths between concepts. It turns abstract graph data into something people can point at and discuss.

This matters because stakeholders are not going to learn graph query syntax. They need to see the graph, interact with it, and have that “wait, those two things are connected?” moment that changes how they think about their own project. The D3.js viewer creates that moment.

What This Changes About Discovery

The traditional discovery process treats knowledge as a byproduct that gets consumed and discarded. You do the research, write the report, and move on. If someone needs a piece of discovery knowledge six weeks later, they search through documents, ask colleagues, or re-discover it from scratch.

Federated knowledge graphs change this in three fundamental ways.

Discovery knowledge becomes persistent. The graph does not decay. Six months into a project, the full discovery context is still queryable. When a new team member joins, they do not need to read a 50-page discovery document and hope they absorb the important parts. They query the graph for the domain concepts, decisions, and constraints relevant to their work. The knowledge is structured for retrieval, not just for one-time consumption.

Discovery knowledge becomes machine-readable. This is the critical capability for AI-native delivery. An AI agent cannot read a consultant’s handwritten notes. It cannot interpret the nuances of a stakeholder interview summary written in narrative prose. But it can consume structured entities and relationships from a knowledge graph. The graph is the format that makes discovery knowledge available to the agents that will use it during implementation. Without this translation layer, all the AI tooling in the world is operating on shallow context.

Discovery knowledge becomes compoundable. Each new source of information added to the graph does not just add data. It adds connections. The marginal value of each new session, document, or data source increases because the graph has more context to relate it to. This is the opposite of the traditional model, where each new discovery session produces a standalone set of notes with diminishing integration into the overall understanding. In the graph model, the twentieth session is more valuable than the first, because the graph it feeds into has nineteen sessions of context to connect it with.

Federated Knowledge in Practice

The term “federated knowledge” is specific and intentional. Federation means connecting knowledge from distributed sources into a unified queryable layer without forcing everything into a single monolithic store. The sources retain their structure, their provenance, and their update cadence. The federation layer provides a unified query interface across all of them.

In practice, this means a discovery engagement’s knowledge system includes:

  1. Stakeholder session recordings and transcripts. The raw source material from discovery conversations, processed into structured knowledge graphs with full provenance back to specific timestamps.
  2. Existing documentation. Technical specifications, API documentation, database schemas, architectural decision records, internal wikis – anything that captures existing knowledge about the system or domain.
  3. Code and infrastructure. The existing codebase itself is a knowledge source. Its structure, patterns, naming conventions, and architecture encode decisions and constraints that may not appear in any document.
  4. Market and competitive intelligence. Industry data, competitor analysis, user research, and market trends that inform product decisions.
  5. Organizational context. Team structure, deployment processes, compliance requirements, vendor relationships, and operational constraints that shape what can be built and how.

When an engineer – or an agent – needs to understand the billing domain, the federation layer pulls relevant knowledge from the stakeholder sessions where billing was discussed, the existing billing code, the compliance documentation, and the integration specifications for the payment provider. All of it. In context. With provenance.

This is fundamentally different from the traditional discovery output of a requirements document plus a Confluence wiki plus a Miro board that nobody can find. The federated approach produces a knowledge system that grows, stays current, and serves as a live source of truth throughout the project.

The Living Knowledge System

Discovery in the traditional model is a phase. It has a start date and an end date. It produces a deliverable. When it is over, the team moves on to design and build, and the discovery output begins its slow decay into irrelevance.

Discovery in our model is a living knowledge system. It is front-loaded heavily at the start of the project and continuously fed throughout the engagement. New stakeholder conversations are recorded and processed. Architecture decisions made during implementation are documented and connected to the domain knowledge they address. Implementation discoveries – the things you learn only when you start building – are fed back into the knowledge system so that subsequent specifications reflect reality, not the initial assumptions.

This continuous feeding is what prevents the knowledge decay that kills traditional requirements documents. The knowledge system is not a snapshot. It is a stream. And because it is structured and machine-readable, the cost of updating it is low enough that it actually happens, unlike the requirements document that everyone agrees should be updated but nobody ever does.

Getting Started

PlanOpticon is open source, MIT licensed, and available on PyPI:

pip install planopticon
planopticon init
planopticon doctor

Point it at a meeting recording, a folder of documents, or any combination of supported sources:

# Analyze a single recording
planopticon analyze -i meeting.mp4 -o ./output

# Batch mode: merge knowledge from multiple sources
planopticon analyze -i ./recordings/ -o ./output --batch

# Launch the interactive REPL to query the graph
planopticon repl -o ./output

The documentation at planopticon.dev covers the full capabilities, connector configuration, and graph query syntax. The source code is at github.com/ConflictHQ/PlanOpticon. The package is on PyPI.

The Bigger Picture

We built PlanOpticon because we needed it. Discovery knowledge was the most valuable and least durable asset in our delivery process. Every insight that faded from memory, every decision that got re-debated because nobody recorded it, every requirement that surfaced late because it was buried in a transcript – these were preventable failures caused by a tooling gap.

Google understood this when they built the Knowledge Graph in 2012. The insight was not that facts are valuable – everyone knows that. The insight was that the connections between facts are where the real value lives. A fact in isolation is trivia. A fact connected to other facts through typed, traversable relationships is knowledge. That principle applies to web search, and it applies equally to project discovery.

The knowledge graph is not a nice-to-have visualization. It is infrastructure. It is the system of record for project knowledge, the context layer for AI-native delivery, and the foundation for specifications that are grounded in the full breadth of what the project has learned.

Discovery has always been about understanding the problem before building the solution. What has changed is our ability to make that understanding persistent, queryable, and actionable at machine speed. Sticky notes were a starting point. Knowledge graphs are the infrastructure.