
Every organization has a knowledge graph. Most just do not know it.

The information about who owns what system, which team depends on which service, what decisions were made in which meetings, how policies connect to processes – this knowledge exists. It lives in people’s heads, in Slack threads, in Confluence pages that nobody reads, in shared drives organized by someone who left two years ago, and in tribal knowledge passed between engineers during onboarding.

This implicit knowledge graph is real and valuable. The problem is that it is inaccessible, fragile, and impossible to query. When the engineer who knows how the billing system connects to the reporting pipeline goes on vacation, that knowledge goes with them. When a new executive asks “what are all the systems that touch customer data?”, the answer takes two weeks of interviews to compile.

Building an explicit knowledge graph – making the implicit connections visible, queryable, and maintainable – is one of the highest-leverage infrastructure investments an organization can make. And it becomes the foundation on which useful AI applications are actually built.

What a Knowledge Graph Actually Is

A knowledge graph is a structured representation of entities and the relationships between them. Entities are things: people, systems, documents, decisions, teams, APIs, databases. Relationships are connections: “owns,” “depends on,” “authored,” “decided in,” “deployed to.”

[Billing Service] --depends_on--> [Payment Gateway API]
[Billing Service] --owned_by--> [Platform Team]
[Platform Team] --led_by--> [Sarah Chen]
[Payment Gateway API] --documented_in--> [API Spec v3.2]
[API Spec v3.2] --approved_in--> [Architecture Review 2025-11-03]

This looks trivially simple, and that is the point. The power of a knowledge graph is not in the complexity of individual triples (entity-relationship-entity) but in the network they form. When you have thousands of these connections, you can answer questions that no single document or database can answer:

  • “What systems will be affected if we deprecate the Payment Gateway API?”
  • “Who should be in the room for a decision about the billing pipeline?”
  • “What are all the downstream dependencies of the customer data model?”
  • “Which architecture decisions from Q3 are still unresolved?”

These questions require traversing relationships across domains (engineering, organizational, process). No single system of record contains all the information, but a knowledge graph that connects them can.
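To make the traversal concrete, here is a minimal in-memory sketch using the entities from the example above. A real graph would live in a database, but the reverse-edge walk that answers "what breaks if we deprecate this?" is the same:

```python
from collections import defaultdict

# Illustrative triples in (subject, predicate, object) form
triples = [
    ("Billing Service", "depends_on", "Payment Gateway API"),
    ("Reporting Pipeline", "depends_on", "Billing Service"),
    ("Billing Service", "owned_by", "Platform Team"),
    ("Platform Team", "led_by", "Sarah Chen"),
]

# Index: object -> subjects that depend on it (reverse edges)
dependents = defaultdict(set)
for s, p, o in triples:
    if p == "depends_on":
        dependents[o].add(s)

def affected_by_deprecation(entity: str) -> set:
    """Everything transitively upstream of `entity` via depends_on."""
    affected, stack = set(), [entity]
    while stack:
        for dep in dependents[stack.pop()]:
            if dep not in affected:
                affected.add(dep)
                stack.append(dep)
    return affected

print(affected_by_deprecation("Payment Gateway API"))
# {'Billing Service', 'Reporting Pipeline'}
```

Deprecating the Payment Gateway API surfaces not just its direct consumer but the reporting pipeline two hops away, which is exactly the kind of answer no single document contains.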

Why AI Makes This Urgent

The immediate catalyst for building a knowledge graph is that it dramatically improves AI applications. Large language models are powerful reasoning engines, but they need the right information to reason about. Without a knowledge graph, getting the right information to an LLM means:

  • Keyword search across documents (misses relationships)
  • Vector search across embeddings (finds similar text, not connected knowledge)
  • Manually curating context for each query (does not scale)

With a knowledge graph, you can provide the LLM with precisely the relevant context: not just documents that mention the topic, but the actual entities, relationships, and context that surround it. This is the difference between an AI assistant that finds documents and one that actually understands how your organization works.

We saw this directly in PlanOpticon. When we added knowledge graph extraction to the meeting analysis pipeline, the quality of the insights improved substantially. Instead of producing a flat summary of what was discussed, the system could map decisions to owners, connect action items to prior commitments, and identify when a topic referenced a previous discussion. The knowledge graph provided the connective tissue that made the AI output genuinely useful.

Building Incrementally: The Practical Approach

The mistake most organizations make with knowledge graphs is treating them as a big-bang project. They envision a comprehensive ontology, a massive data migration, and a perfectly complete graph. This approach fails because:

  • Defining a comprehensive ontology up front is impossible without knowing what questions you need to answer.
  • Populating the graph requires data from dozens of sources, each with its own format and quality issues.
  • Maintaining completeness is a losing battle if the graph depends on manual updates.

The approach that works is incremental:

Start with one domain. Pick a single, high-value domain: your service architecture, your team structure, your decision records. Build a small knowledge graph for that domain and start querying it.

Define a minimal ontology. You need entity types and relationship types, but start with a handful of each. For a service architecture graph:

entity_types:
  - Service
  - Team
  - Person
  - API
  - Database
  - Repository

relationship_types:
  - owns          # Team owns Service
  - depends_on    # Service depends_on Service
  - maintained_by # Repository maintained_by Team
  - stores_data_in # Service stores_data_in Database
  - exposes       # Service exposes API
  - member_of     # Person member_of Team

This is enough to answer useful questions. You can extend the ontology later as needs emerge.

Automate ingestion. The knowledge graph should be populated from existing data sources, not manual entry. Parse your infrastructure-as-code to extract service dependencies. Pull team structure from your HR system or org chart tool. Extract API relationships from your service mesh configuration. Each automated source adds entities and relationships without human effort.
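As an illustration of the pattern, here is a sketch that maps a hypothetical service-catalog entry (the field names are invented; real catalogs have richer schemas) into triples:

```python
import json

# Hypothetical service-catalog entry; field names are illustrative
catalog_json = """
[
  {"name": "billing-service", "team": "platform",
   "dependencies": ["payment-gateway-api"], "database": "billing-db"}
]
"""

def catalog_to_triples(raw: str) -> list:
    """Map catalog entries onto the minimal ontology defined above."""
    triples = []
    for svc in json.loads(raw):
        triples.append((svc["team"], "owns", svc["name"]))
        for dep in svc.get("dependencies", []):
            triples.append((svc["name"], "depends_on", dep))
        if "database" in svc:
            triples.append((svc["name"], "stores_data_in", svc["database"]))
    return triples

for t in catalog_to_triples(catalog_json):
    print(t)
```

Each ingestion source is a small, boring mapping like this one. The leverage comes from running a dozen of them on a schedule.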

Use LLMs for extraction. This is where AI and knowledge graphs create a virtuous cycle. Use LLMs to extract entities and relationships from unstructured text: meeting notes, Slack conversations, design documents, incident reports. The extraction is imperfect, but it populates the graph with connections that would otherwise live only in people’s heads.

import json

def extract_graph_triples(text: str, existing_entities: list) -> list:
    # `llm.generate` stands in for whatever model client you use
    prompt = f"""Extract structured knowledge from this text.
    Known entities: {existing_entities}

    For each relationship found, return:
    - subject: the source entity (use existing entity if matching)
    - predicate: the relationship type
    - object: the target entity (use existing entity if matching)
    - confidence: your confidence in this extraction (0-1)
    - source_text: the text that supports this relationship

    Return as a JSON array of triples."""

    response = llm.generate(prompt + "\n\nText: " + text)
    try:
        triples = json.loads(response)
    except json.JSONDecodeError:
        return []  # malformed model output: skip rather than poison the graph

    # Keep only extractions above a confidence threshold
    return [t for t in triples if t.get("confidence", 0) >= 0.7]

Validate incrementally. Each new set of extracted triples should be reviewed – not exhaustively, but through sampling and anomaly detection. Flag triples that contradict existing knowledge, that introduce new entity types, or that have low confidence. Route these to human review while auto-accepting high-confidence extractions from trusted sources.

The Technology Stack

A knowledge graph does not require a specialized graph database to start, though you may want one eventually. The technology decision depends on your scale and query patterns.

For small graphs (under 100K triples): A relational database with a triples table works fine. PostgreSQL with a simple (subject, predicate, object, metadata) table supports basic graph queries through recursive CTEs or by loading the graph into memory for traversal.
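As a sketch of the triples-table pattern, here is the table shape and a transitive dependency query. I use SQLite so the example is self-contained, but the table definition and the recursive CTE carry over to PostgreSQL essentially unchanged; entity names are illustrative:

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute("""CREATE TABLE triples (
    subject TEXT, predicate TEXT, object TEXT)""")
conn.executemany(
    "INSERT INTO triples VALUES (?, ?, ?)",
    [("reporting-pipeline", "depends_on", "billing-service"),
     ("billing-service", "depends_on", "payment-gateway-api")])

# Transitive closure of depends_on via a recursive CTE --
# the same query shape works in PostgreSQL.
rows = conn.execute("""
    WITH RECURSIVE deps(entity) AS (
        SELECT subject FROM triples
         WHERE predicate = 'depends_on' AND object = ?
        UNION
        SELECT t.subject FROM triples t
          JOIN deps d ON t.object = d.entity
         WHERE t.predicate = 'depends_on'
    )
    SELECT entity FROM deps
""", ("payment-gateway-api",)).fetchall()
print(sorted(rows))  # [('billing-service',), ('reporting-pipeline',)]
```

One table, one query, and you already have impact analysis. That is a perfectly respectable starting point.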

For medium graphs (100K to 10M triples): A property graph database like Neo4j or Amazon Neptune provides native graph traversal, pattern matching, and visualization. The query language (Cypher for Neo4j; Gremlin or openCypher for Neptune) is more expressive for graph operations than SQL.

For large graphs or complex ontologies: Consider RDF-based stores (Apache Jena, Stardog) that support formal ontology reasoning, or a hybrid approach that uses a graph database for relationships and a document store for entity attributes.

Our recommendation for most organizations is to start with the simplest option that supports your query patterns and migrate if and when you outgrow it. A knowledge graph in PostgreSQL that is actually used is infinitely more valuable than a Neo4j deployment that is planned but never populated.

Making the Graph Useful

A knowledge graph that nobody queries is an expensive data project. Making the graph useful means building interfaces that people actually interact with.

Natural language queries. This is the highest-value interface. Connect your knowledge graph to an LLM-powered query system that translates natural language questions into graph traversals. “Who owns the billing service?” becomes a graph query. “What are all the dependencies of the customer API?” becomes a graph traversal. The LLM handles the ambiguity of natural language, and the graph provides the structured data.
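The shape of that pipeline can be sketched in a few lines. Here a trivial keyword matcher stands in for the LLM that would parse the question in a real system, and the triples are illustrative; the point is the split between intent parsing and graph lookup:

```python
triples = [
    ("Platform Team", "owns", "billing-service"),
    ("billing-service", "depends_on", "payment-gateway-api"),
]

def parse_intent(question: str) -> tuple:
    """Stand-in for an LLM call that maps a question to (predicate, entity).
    A real system would prompt the model to emit this structure as JSON."""
    q = question.lower()
    if "who owns" in q:
        return ("owns", q.split("who owns")[-1].strip(" ?"))
    if "dependencies of" in q:
        return ("depends_on", q.split("dependencies of")[-1].strip(" ?"))
    raise ValueError("unrecognized question")

def answer(question: str) -> list:
    predicate, entity = parse_intent(question)
    if predicate == "owns":
        return [s for s, p, o in triples if p == "owns" and o == entity]
    return [o for s, p, o in triples if p == predicate and s == entity]

print(answer("Who owns billing-service?"))  # ['Platform Team']
```

Swapping the keyword matcher for a model call changes nothing downstream: the graph lookup stays deterministic, which is what makes the answers trustworthy.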

Integration with existing tools. Surface knowledge graph information in the tools people already use. Show service ownership in your incident management system. Show dependency information in your deployment pipeline. Add team context to your project management tool. The graph should enrich existing workflows, not require a new workflow.

Automated insights. Run periodic analyses on the graph to surface insights proactively. Single points of failure (services with only one owner). Circular dependencies. Teams with disproportionate ownership load. Knowledge silos (topics that only one person has context on). These analyses are trivial with a graph database and impossible without one.
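The single-point-of-failure analysis, for instance, is a few lines once the triples are in hand (entity names illustrative):

```python
from collections import defaultdict

triples = [
    ("Platform Team", "owns", "billing-service"),
    ("Platform Team", "owns", "reporting-pipeline"),
    ("Data Team", "owns", "reporting-pipeline"),
]

owners = defaultdict(set)
for s, p, o in triples:
    if p == "owns":
        owners[o].add(s)

# Services with exactly one owning team are single points of failure
spofs = [svc for svc, teams in owners.items() if len(teams) == 1]
print(spofs)  # ['billing-service']
```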

Impact analysis. Before making a change – deprecating an API, reorganizing teams, migrating a database – query the graph for all affected entities. This turns a “who might this affect?” conversation into a concrete, queryable answer.

The Maintenance Problem

The biggest risk to a knowledge graph is staleness. An outdated graph is worse than no graph because people trust it and make decisions based on incorrect information.

Maintenance strategies that work:

Continuous automated ingestion. The same sources that populated the graph initially should run on a schedule to keep it current. When infrastructure-as-code changes, the graph updates. When team structure changes in the HR system, the graph reflects it.

Event-driven updates. Hook into organizational events: deployments, incidents, team changes, document updates. Each event potentially updates the graph. A deployment of Service A that introduces a new dependency on Service B should create a depends_on relationship automatically.
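A sketch of the idea, with an invented event payload (your CI system's actual shape will differ):

```python
# Hypothetical deployment-event payload; real field names depend on your CI system
event = {
    "type": "deployment",
    "service": "billing-service",
    "new_dependencies": ["fraud-check-api"],
}

graph = {("billing-service", "depends_on", "payment-gateway-api")}

def apply_event(graph: set, event: dict) -> set:
    """Fold an organizational event into the graph as new triples."""
    if event["type"] == "deployment":
        for dep in event.get("new_dependencies", []):
            graph.add((event["service"], "depends_on", dep))
    return graph

apply_event(graph, event)
print(("billing-service", "depends_on", "fraud-check-api") in graph)  # True
```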

Decay detection. Track the freshness of each entity and relationship. If a relationship has not been confirmed (by a source refresh, an event, or a human) within a defined period, flag it as potentially stale. Stale relationships should be visually distinct in the graph and excluded from automated analyses.
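A minimal sketch of freshness tracking, assuming each triple carries a last-confirmed timestamp and using an illustrative 90-day window:

```python
from datetime import datetime, timedelta, timezone

STALE_AFTER = timedelta(days=90)  # illustrative freshness window

# Each triple records when it was last confirmed by a source refresh,
# an event, or a human
triples = [
    {"triple": ("billing-service", "owned_by", "Platform Team"),
     "last_confirmed": datetime.now(timezone.utc) - timedelta(days=5)},
    {"triple": ("billing-service", "depends_on", "legacy-ftp"),
     "last_confirmed": datetime.now(timezone.utc) - timedelta(days=200)},
]

def stale(triples: list) -> list:
    """Relationships not confirmed within the freshness window."""
    cutoff = datetime.now(timezone.utc) - STALE_AFTER
    return [t["triple"] for t in triples if t["last_confirmed"] < cutoff]

print(stale(triples))  # [('billing-service', 'depends_on', 'legacy-ftp')]
```

Stale triples go to the review queue; everything else stays eligible for automated analyses.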

Low-friction human curation. When humans interact with graph data (through queries, reports, or integration points), give them an easy way to correct it. A “this is wrong” button on a graph query result that routes to a correction queue costs almost nothing to build and provides a valuable feedback loop.

The organizations that maintain healthy knowledge graphs are the ones that treat the graph as a product with users, feedback loops, and iteration cycles. Assign ownership. Measure usage. Track quality. Iterate on the schema as needs evolve.

From Knowledge Graph to AI Foundation

Once you have a knowledge graph, even a modest one, it becomes the foundation for a category of AI applications that are impossible without it.

Context-aware AI assistants. An AI assistant that knows your organizational graph can answer questions like “Who should I talk to about the billing pipeline?” by traversing ownership and expertise relationships. Without the graph, the assistant can only search documents.

Automated impact analysis. When a change is proposed, an AI system can traverse the graph to identify all affected entities, estimate the scope of impact, and generate a change plan. This turns a multi-day manual process into a minutes-long automated analysis.

Knowledge gap detection. By analyzing the graph structure, AI can identify areas where knowledge is thin: systems with no documentation, processes with no clear owner, decisions with no recorded rationale. These gaps are prioritized for human attention.

Onboarding acceleration. New team members can explore the knowledge graph to understand how systems connect, who owns what, and what decisions have been made. An AI-powered onboarding assistant that queries the graph can answer the questions that normally take weeks of corridor conversations.

The knowledge graph is not the AI application. It is the substrate on which AI applications grow. Without it, AI applications are limited to text search and pattern matching. With it, they can reason about structure, relationships, and organizational context.

Getting Started This Week

You do not need a grand strategy to start building a knowledge graph. Here is what you can do this week:

  1. Pick one domain. Service architecture is the easiest starting point because much of the data already exists in structured form (infrastructure-as-code, service catalogs, deployment configs).

  2. Define five entity types and five relationship types. No more than that. You can expand later.

  3. Write a script that populates entities from one source. Parse your Terraform, your Kubernetes manifests, or your service catalog. Extract entities and relationships into a simple triples format.

  4. Store it in PostgreSQL. Create a triples table. Load your extracted data. Write a few queries. See what questions the data can answer.

  5. Show it to your team. The fastest way to build momentum is to show someone an answer to a question they previously could not get without asking five people.

The knowledge graph nobody asked for is the one that, once it exists, everyone wishes they had built sooner. Start small, automate early, and grow it from the questions people actually ask. The AI applications will follow naturally – because once you have the graph, the AI finally has something worth reasoning about.