
What Is a Knowledge Graph? Definitions, History, and Why It Matters

A working definition that survives scrutiny

A knowledge graph is a directed, labelled graph in which nodes represent entities of interest and edges represent typed relationships between them, with both nodes and edges optionally carrying properties. That is the short, technical answer. The longer answer is that a knowledge graph is also a commitment: a commitment that the schema you use to describe your domain is itself part of the data, queryable alongside the instances, and subject to the same rules of consistency.

This definition matters because it excludes things that are sometimes called knowledge graphs but are not. A spreadsheet of people and the companies they work for is not a knowledge graph; it is a relation. A diagram drawn in a slide deck is not a knowledge graph; it is a picture. A knowledge graph is the underlying structured artefact you can query, validate, and grow without rebuilding from scratch every time the world changes.

Three properties separate a knowledge graph from a vanilla graph database. First, a knowledge graph has an ontology — an explicit, machine-readable description of the entity types and relation types it contains. Second, the ontology is itself stored in the graph, so consumers can introspect it without out-of-band documentation. Third, both data and ontology are uniquely identified, usually with IRIs in semantic-web stacks or stable string IDs in property-graph stacks, so two graphs can be merged without colliding.

A knowledge graph is the data plus the rules about the data, in one queryable place.
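That idea can be sketched in a few lines of plain Python. This is a toy triple store, not any particular database's API, and the names (`worked_at`, `einstein`) are illustrative: the point is that schema statements and instance statements live in the same store and answer the same kind of query.

```python
# Toy triple store: schema statements and instance statements share one list.
triples = [
    # Ontology: the schema is itself data in the graph.
    ("Person", "is_a", "EntityType"),
    ("Organization", "is_a", "EntityType"),
    ("worked_at", "is_a", "RelationType"),
    ("worked_at", "domain", "Person"),
    ("worked_at", "range", "Organization"),
    # Instances, described with the same vocabulary.
    ("einstein", "type", "Person"),
    ("princeton", "type", "Organization"),
    ("einstein", "worked_at", "princeton"),
]

def query(subject=None, predicate=None, obj=None):
    """Match triples by any combination of fixed positions (None = wildcard)."""
    return [
        (s, p, o) for (s, p, o) in triples
        if subject in (None, s) and predicate in (None, p) and obj in (None, o)
    ]

# One query mechanism answers both "what did Einstein do?" ...
facts = query(subject="einstein")
# ... and "what does worked_at even mean?" -- the graph is self-describing.
schema = query(subject="worked_at")
```

The same `query` function introspects the ontology and retrieves the facts, which is the property that lets a consumer discover what a graph contains without out-of-band documentation.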

Where the term comes from

The phrase 'knowledge graph' as used today entered the mainstream when Google launched its Knowledge Graph product in May 2012, populating the entity panels you see to the right of search results. But the underlying ideas are much older. In 2001, Tim Berners-Lee, James Hendler, and Ora Lassila published 'The Semantic Web' in Scientific American, sketching a future web in which 'information is given well-defined meaning, better enabling computers and people to work in cooperation' — a future built on machine-readable graphs of facts.

Before that, semantic networks date back to the 1960s in cognitive science (Quillian's work on associative memory) and to frame systems and description logics in 1980s AI research. RDF, the W3C's resource description framework, appeared in 1999 and gave the web a standard way to write down a graph of facts. The term 'knowledge graph' itself appears in academic literature decades earlier — Edward Feigenbaum and others used it loosely in 1980s expert-systems work — but Google's 2012 product is what made it a household word for engineers.

Today, almost every large platform runs on one. Wikidata, run by the Wikimedia Foundation, exposes a public knowledge graph of more than 100 million entities under an open licence. Amazon's product graph powers recommendations and search. LinkedIn's economic graph links jobs, skills, companies, and people. Each is structurally similar — typed nodes, typed edges, a published schema — even where the storage technology differs.

Anatomy: nodes, edges, properties, types

Nodes are the things in your graph. They typically carry a stable identifier, a type (sometimes called a label), and zero or more properties. A node typed Person might have properties full_name, date_of_birth, and country_of_residence. Properties are simple values — strings, numbers, dates — that belong to a node and do not have their own identity.

Edges are the relationships between nodes. They carry a type (called a predicate in RDF, a relationship type in property graphs) and optionally their own properties. The edge from Person:einstein to Organization:princeton might be typed worked_at and carry properties from_year=1933 and to_year=1955. Edges are directed: an edge from A to B is not automatically also an edge from B to A.

Types — both for nodes and edges — are usually defined in a schema or ontology that ships with the graph. The schema constrains what is allowed: which edge types can connect which node types, which properties are required, and so on. A graph without a schema is technically still a graph, but lacks the machine-readable agreement that lets two systems exchange data without ad-hoc translation. This is the crucial difference between an ad-hoc graph and a knowledge graph.
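As a concrete sketch, the anatomy above maps onto plain data structures. This is a minimal model with hypothetical names, not any vendor's API; the schema's job here is simply to reject edges that connect the wrong node types.

```python
from dataclasses import dataclass, field

@dataclass
class Node:
    id: str
    type: str                                   # e.g. "Person" (the label)
    props: dict = field(default_factory=dict)   # simple values, no identity

@dataclass
class Edge:
    src: str
    type: str                                   # e.g. "worked_at" (the predicate)
    dst: str
    props: dict = field(default_factory=dict)   # edges carry data too

# Schema: which edge types may connect which node types.
SCHEMA = {"worked_at": ("Person", "Organization")}

def check_edge(edge, nodes):
    """Validate an edge against the schema before admitting it to the graph."""
    domain, range_ = SCHEMA[edge.type]
    return nodes[edge.src].type == domain and nodes[edge.dst].type == range_

nodes = {
    "einstein": Node("einstein", "Person", {"full_name": "Albert Einstein"}),
    "princeton": Node("princeton", "Organization"),
}
edge = Edge("einstein", "worked_at", "princeton",
            {"from_year": 1933, "to_year": 1955})
```

Because edges are directed, `check_edge` accepts Person → Organization here but would reject the reverse, which is exactly the kind of machine-readable agreement an ontology provides.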

Why teams build them now

The recent surge in interest is not because graph theory got better — it is because three other things did. First, graph databases like Neo4j, FalkorDB, Amazon Neptune, and TigerGraph reached commodity reliability. Second, large language models can now extract entities and relationships from unstructured text at acceptable cost and quality, dramatically lowering the price of populating a graph. Third, retrieval-augmented generation pipelines benefit measurably from structured retrieval, and a knowledge graph is the obvious place to do structured retrieval.

Practical wins fall into a few buckets. Connection discovery: finding paths across many hops is cheap on a graph and expensive in SQL. Schema flexibility: adding a new relationship type does not require a migration. Explainability: you can show the path that justified an answer, not just the answer. Multi-source fusion: data from a CRM, a wiki, a stack of PDFs, and a public registry can all live as nodes in one graph as long as their identifiers reconcile.
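The connection-discovery win is easy to see in miniature. A breadth-first search over an in-memory adjacency structure (a toy, not any product's API; the entity names are invented) finds a multi-hop path in one pass, where the SQL equivalent would be a chain of joins:

```python
from collections import deque

# Adjacency list: node -> list of (edge_type, neighbour).
graph = {
    "supplier_a": [("ships_to", "customer_1")],
    "customer_1": [("located_in", "region_emea")],
    "region_emea": [("growth_market_for", "competitor_x")],
}

def find_path(graph, start, goal):
    """BFS returning the first path found, as [node, edge_type, node, ...]."""
    queue = deque([[start]])
    seen = {start}
    while queue:
        path = queue.popleft()
        node = path[-1]
        if node == goal:
            return path
        for edge_type, neighbour in graph.get(node, []):
            if neighbour not in seen:
                seen.add(neighbour)
                queue.append(path + [edge_type, neighbour])
    return None  # no connection

# A three-hop question answered with the evidence path, not just the answer.
path = find_path(graph, "supplier_a", "competitor_x")
# path == ['supplier_a', 'ships_to', 'customer_1', 'located_in',
#          'region_emea', 'growth_market_for', 'competitor_x']
```

Returning the whole path, rather than just a yes/no, is the explainability win: the traversal itself is the justification you can show a user.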

What you give up is also worth naming. Relational queries on highly tabular data — 'sum revenue by region by quarter' — are still much faster in a column store. Knowledge graphs are not a replacement for analytics warehouses; they are a complement to them, specialised for the kinds of questions where edges are the answer.

How to know if you actually need one

Use a knowledge graph when your most valuable questions are about relationships rather than aggregates. 'Which suppliers ship to customers in regions where our top three competitors are growing?' is a graph question. 'What was last quarter's revenue?' is not. If your answers consistently require three or more joins across tables, a graph is probably easier and faster.

Use a knowledge graph when your data sources are heterogeneous and you need an integrated view. A graph's tolerance for missing properties and its flexibility around schema make it forgiving when you reconcile a CRM dump with a folder of PDFs and a public dataset.

Skip a knowledge graph when your data is uniformly tabular and your queries are statistical. Skip it when 'good enough' is a list of search results rather than a path of evidence. Skip it when your team has no appetite for learning a query language like Cypher or SPARQL — though tools that translate natural language to graph queries are improving fast.

If you do decide to build one, start small. Pick one domain (one product line, one team's documents, one set of papers), choose three or four entity types, two or three relation types, and grow from there. The graphs that succeed in production almost always start narrow.


Frequently Asked Questions

Is a knowledge graph the same thing as a graph database?

No. A graph database is the storage layer — Neo4j, FalkorDB, Neptune, JanusGraph — that lets you persist and query graph-shaped data. A knowledge graph is what you get when you put a typed, schema-described body of facts into that storage. You can have a graph database with no knowledge graph in it (just ad-hoc nodes and edges), and you can store a knowledge graph in a relational database if you really want to. The two concepts are orthogonal even though they pair naturally.

Do I need RDF and SPARQL to have a knowledge graph?

No. RDF, OWL, and SPARQL are the W3C semantic-web stack and are the most formally rigorous way to build a knowledge graph, but the property-graph stack (Cypher, Gremlin, GQL) is equally valid for most engineering use cases. Choose RDF when interoperability with public datasets like Wikidata or DBpedia matters, when you need OWL reasoning, or when your team is already inside the semantic-web ecosystem. Choose property graphs when developer ergonomics, JSON-friendliness, and tooling availability matter more than formal interoperability.

How is this different from a relational database with foreign keys?

A relational schema with foreign keys can express a graph, but it is awkward when relationships themselves carry data, when relationship types proliferate, or when queries traverse many hops. Each new relation type usually means a new join table, and a five-hop traversal becomes a five-join SQL statement that the planner may or may not optimise. Graph engines are built for traversal as the first-class operation, so the cost of a five-hop path grows with the size of the neighbourhood actually visited, not with the size of the tables being joined.

Can a knowledge graph contain contradictions?

Yes, and most do. Real-world data is contradictory: two sources may give different birth dates for the same person, or two contracts may name different governing laws. A practical knowledge graph either tolerates contradictions and surfaces them via provenance edges (this fact came from this source) or applies a reconciliation pipeline that picks a canonical value. Pure semantic-web stacks with OWL DL reasoning can choke on contradictions; property-graph stacks generally do not. KnodeGraph's staging workflow exists exactly to let a human resolve contradictions before they hit the live graph.
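One way to tolerate contradictions, sketched here with hypothetical structures rather than any specific product's API, is to store every asserted value together with its source and surface conflicts instead of silently overwriting:

```python
from collections import defaultdict

# Property assertions keyed by (entity, property); each keeps its provenance.
assertions = defaultdict(list)

def assert_fact(entity, prop, value, source):
    """Record a claim without discarding earlier, possibly conflicting ones."""
    assertions[(entity, prop)].append({"value": value, "source": source})

def conflicts(entity, prop):
    """Return the competing claims if sources disagree, else None."""
    claims = assertions[(entity, prop)]
    return claims if len({c["value"] for c in claims}) > 1 else None

# Two sources disagree on a birth date; both claims are kept.
assert_fact("einstein", "date_of_birth", "1879-03-14", source="wiki_dump")
assert_fact("einstein", "date_of_birth", "1879-03-15", source="crm_export")

# The graph surfaces the dispute; a human (or a rule) picks the canonical value.
disputed = conflicts("einstein", "date_of_birth")
```

A reconciliation pipeline then works through the disputed facts, which is the kind of review a staging workflow puts in front of a human before anything reaches the live graph.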

Are LLMs replacing knowledge graphs?

The short answer is no, and the longer answer is that they are increasingly used together. LLMs are weak at multi-hop factual recall and at explaining their reasoning over structured data; knowledge graphs are weak at unstructured text understanding. The current best practice for retrieval-augmented generation is to use the LLM to query a knowledge graph or vector index and then synthesise a natural-language answer over the retrieved facts. Microsoft's GraphRAG, Anthropic's tool-use patterns, and most enterprise RAG stacks now follow this hybrid approach.

Source

Tim Berners-Lee, James Hendler, and Ora Lassila, 'The Semantic Web', Scientific American, May 2001. [link]

Ready to Try KnodeGraph?

Start free with 3 graphs and 100 nodes. Upgrade to Pro for AI extraction, unlimited graphs, and 50K nodes.

Get Started Free