Knowledge Graphs for Investigative & Data Journalism

Investigative reporting frequently means staring at thousands of leaked emails, court filings, and SEC documents trying to spot the connection. KnodeGraph builds the entity-relationship graph for you — people, companies, money, dates — and lets you walk the connections visually. The Panama Papers team famously used Linkurious + Neo4j; KnodeGraph is the lightweight alternative for newsrooms that don't have a dedicated data engineer.

Why journalism teams use KnodeGraph

  • The ICIJ's Panama Papers investigation (2016) used graph analysis to surface connections across 11.5 million documents — graph-based discovery is now standard for cross-border investigations.
  • OCCRP Aleph indexes 350M+ public records, organised around an entity-relationship data model.
  • The Wall Street Journal won the 2023 Pulitzer Prize for Investigative Reporting for a series on financial conflicts of interest among officials at 50 federal agencies, reporting that depended on matching people, holdings, and trades across thousands of disclosure filings.
  • KnodeGraph supports 100+ languages — essential for cross-border reporting where source documents include Arabic court filings, Russian corporate registries, or Spanish leaked emails.
  • Pro tier's 50K-node limit is enough for most newsroom investigations; the Panama Papers alone covered more than 214,000 offshore entities, so projects on that scale would need to be split across multiple graphs, for example by jurisdiction.
  • Self-hosted option keeps unpublished source material on newsroom-controlled infrastructure, with an air-gapped variant for the most sensitive files. This matters for source-protection obligations in shield-law jurisdictions.

How the workflow runs

1. Ingest the document drop

Court filings, leaked emails, corporate registry pulls, SEC EDGAR filings. PDF and text-based formats handled directly; image-only PDFs run through OCR first.

2. Set up an investigation template

Entity types: person, company, address, account, transaction, event, document. Relation types: owns, controls, employed_by, paid_to, met_with, mentioned_in.
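As a rough illustration of what such a template amounts to, the sketch below encodes the entity and relation types listed above as a small validated schema. The class names, field names, and sample records are hypothetical and are not KnodeGraph's actual template format.

```python
from dataclasses import dataclass

# Entity and relation types copied from the template described above.
ENTITY_TYPES = {"person", "company", "address", "account",
                "transaction", "event", "document"}
RELATION_TYPES = {"owns", "controls", "employed_by",
                  "paid_to", "met_with", "mentioned_in"}


@dataclass(frozen=True)
class Entity:
    # Hypothetical fields, chosen for illustration only.
    id: str
    type: str
    label: str

    def __post_init__(self):
        # Reject anything outside the template's entity vocabulary.
        if self.type not in ENTITY_TYPES:
            raise ValueError(f"unknown entity type: {self.type!r}")


@dataclass(frozen=True)
class Relation:
    source: str  # id of the entity the edge starts from
    target: str  # id of the entity the edge points to
    type: str

    def __post_init__(self):
        # Reject anything outside the template's relation vocabulary.
        if self.type not in RELATION_TYPES:
            raise ValueError(f"unknown relation type: {self.type!r}")


# Made-up records showing how a template constrains the graph.
alice = Entity("p1", "person", "Alice Example")
shell = Entity("c1", "company", "Example Holdings Ltd")
link = Relation(alice.id, shell.id, "owns")
```

Fixing the vocabulary up front means a typo such as "own" or "payed_to" fails loudly instead of quietly creating a second relation type.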

3. Find the bridge

Two clusters of names with no obvious link? KnodeGraph's path-finding shows you the shortest connection — often a shell company or a single intermediary that ties them together.
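For readers curious about what shortest-connection path-finding involves, here is a minimal sketch using breadth-first search over an undirected graph. It is a generic textbook approach with made-up names, not a description of how KnodeGraph implements the feature.

```python
from collections import deque


def shortest_path(edges, start, goal):
    """Return the shortest chain of node ids from start to goal, or None."""
    # Build an undirected adjacency map: a link counts in either direction.
    neighbours = {}
    for a, b in edges:
        neighbours.setdefault(a, set()).add(b)
        neighbours.setdefault(b, set()).add(a)

    previous = {start: None}  # doubles as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk the back-pointers to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbours.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # the two names are not connected


# Illustrative edges only; all names are invented.
edges = [
    ("Minister A", "Shell Co 1"),
    ("Shell Co 1", "Nominee Director"),
    ("Nominee Director", "Shell Co 2"),
    ("Shell Co 2", "Contractor B"),
    ("Minister A", "Law Firm"),
]
path = shortest_path(edges, "Minister A", "Contractor B")
print(" -> ".join(path))
```

In this toy data the only route between the two names runs through the nominee director, which is the kind of intermediary the step above is looking for.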

4. Build the timeline

Filter to date-stamped edges. Order by date. Suddenly the sequence of events that drives the lede is visible at a glance.
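The filter-then-sort step can be sketched in a few lines. The edge dictionaries and their field names below are assumptions made for illustration; KnodeGraph's own data model may differ.

```python
from datetime import date

# Made-up edges; "date" is None where the source document gives no date.
edges = [
    {"source": "Example Holdings", "target": "Account 001",
     "type": "paid_to", "date": date(2019, 6, 3)},
    {"source": "Alice Example", "target": "Example Holdings",
     "type": "owns", "date": date(2018, 11, 20)},
    {"source": "Alice Example", "target": "Bob Sample",
     "type": "met_with", "date": None},
    {"source": "Account 001", "target": "Bob Sample",
     "type": "paid_to", "date": date(2019, 6, 5)},
]


def timeline(edges):
    """Keep only date-stamped edges and return them oldest first."""
    dated = [e for e in edges if e.get("date") is not None]
    return sorted(dated, key=lambda e: e["date"])


for e in timeline(edges):
    print(e["date"].isoformat(), e["source"], e["type"], e["target"])
```

Undated edges are dropped rather than guessed at, so every line in the resulting sequence can be traced to a dated document.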

5. Hand off to the graphics desk

Export the graph as SVG for the print version, JSON for the interactive web version. Cytoscape's layout export is print-ready.
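As a sketch of what a JSON hand-off might look like, the snippet below writes nodes and edges in the elements layout that Cytoscape.js accepts (a "nodes" array and an "edges" array, each item wrapped in "data"). Whether KnodeGraph's own export uses exactly this schema is an assumption, and the function name and sample records are hypothetical.

```python
import json


def to_cytoscape_json(nodes, edges):
    """Serialise nodes and edges in the Cytoscape.js elements layout."""
    elements = {
        "nodes": [
            {"data": {"id": n["id"], "label": n["label"], "type": n["type"]}}
            for n in nodes
        ],
        "edges": [
            # Cytoscape.js expects each edge to name its source and target ids.
            {"data": {"id": f"e{i}", "source": e["source"],
                      "target": e["target"], "type": e["type"]}}
            for i, e in enumerate(edges)
        ],
    }
    # ensure_ascii=False keeps non-Latin names readable in the output file.
    return json.dumps({"elements": elements}, indent=2, ensure_ascii=False)


# Made-up sample graph.
nodes = [
    {"id": "p1", "label": "Alice Example", "type": "person"},
    {"id": "c1", "label": "Example Holdings Ltd", "type": "company"},
]
edges = [{"source": "p1", "target": "c1", "type": "owns"}]

exported = to_cytoscape_json(nodes, edges)
print(exported)
```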

Why KnodeGraph fits journalism workflows

  • No engineering team needed — a single reporter or a small data desk can run the workflow end-to-end.
  • Provenance links: every entity and edge connects back to the source document and page, so fact-checkers can audit every claim.
  • Self-hosted mode protects unpublished sources — a hard requirement in many jurisdictions.
  • 100+ language support means cross-border investigations don't fragment across tooling.
  • Far cheaper than the Linkurious + Neo4j enterprise stack the big international consortia use ($14.99/mo vs $50K+/yr).

Frequently Asked Questions

Can KnodeGraph protect anonymous sources?

The hosted SaaS encrypts data in transit and at rest, but document content is sent to Anthropic's API for extraction. For source-sensitive work, use the self-hosted plan: deploy on a newsroom-controlled VPC and supply your own Anthropic API key, so documents and the graph are stored only inside your perimeter. Extraction requests still go to Anthropic's API under that key, so for material that must never leave your network we can recommend an air-gapped variant where extraction runs against an on-prem LLM.

How does this compare to Linkurious or i2 Analyst's Notebook?

Linkurious is a powerful Neo4j visualisation layer used by the ICIJ; it requires a Neo4j licence and a graph engineer. i2 Analyst's Notebook is the law-enforcement standard, expensive and Windows-only. KnodeGraph is the lightweight modern alternative — you get most of the discovery workflow for $14.99/mo, accessible from any browser, with AI extraction built in.

Does it work on leaked email archives (.PST, .MBOX)?

Not natively. Convert .PST/.MBOX to individual .EML or text files first using a tool like libpst or Aid4Mail. Once each email is a separate document, KnodeGraph happily extracts senders, recipients, subjects, dates, and any entities mentioned in the body — emails become nodes with rich edges.
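For .MBOX archives specifically, Python's standard-library mailbox module can do the split without extra tooling. The sketch below is one possible approach; the function name and output file naming are hypothetical, and .PST files would still need converting with a tool such as libpst first.

```python
import mailbox
from pathlib import Path


def split_mbox(mbox_path, out_dir):
    """Write each message in an mbox archive to its own .eml file."""
    out = Path(out_dir)
    out.mkdir(parents=True, exist_ok=True)
    # create=False: fail if the archive is missing instead of creating it.
    box = mailbox.mbox(str(mbox_path), create=False)
    written = []
    try:
        for index, message in enumerate(box):
            target = out / f"message_{index:06d}.eml"
            # as_bytes() serialises headers and body without decoding them.
            target.write_bytes(message.as_bytes())
            written.append(target)
    finally:
        box.close()
    return written
```

A call such as split_mbox("leak.mbox", "eml_out") (both paths are placeholders) returns the list of files written, one per email, ready to upload as separate documents. Work on a copy of the archive so the original leak stays untouched.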

What about OCR for scanned court documents?

PDFs with embedded text are read directly, with no OCR step. Pure-image scans go through Tesseract-based OCR; accuracy is around 94% on clean scans, lower on faxes or photographed pages. For mission-critical material, run a dedicated OCR pass (ABBYY FineReader, Adobe Acrobat Pro) first and feed clean text to KnodeGraph.
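One way to decide whether a batch of scans needs the dedicated OCR pass is to hand-type a short passage from one page and compare it with the OCR output. The sketch below uses difflib from the Python standard library; its similarity ratio is a rough proxy and is not necessarily the same metric as the accuracy figure quoted above. The function name and sample strings are made up.

```python
from difflib import SequenceMatcher


def ocr_similarity(reference, ocr_output):
    """Return a 0..1 similarity ratio between hand-typed and OCR text."""
    # Collapse whitespace so line-wrapping differences are not counted.
    ref = " ".join(reference.split())
    out = " ".join(ocr_output.split())
    # autojunk=False disables difflib's popular-character heuristic,
    # which can distort scores on longer passages.
    return SequenceMatcher(None, ref, out, autojunk=False).ratio()


# Made-up sample: the OCR output misreads two characters.
typed = "The defendant transferred the shares on 3 June 2019."
scanned = "The defendant transfered the shares on 3 Iune 2019."
score = ocr_similarity(typed, scanned)
print(f"similarity: {score:.2%}")
```

Names, dates, and amounts deserve a manual check regardless of the score, since a single misread digit changes the story.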

Can multiple reporters collaborate on the same graph?

Today the graph belongs to one account. Two patterns work: (1) a shared service account that the team co-uses, with clear rules about who edits what and when, or (2) export the graph as JSON, share it via a secure newsroom workflow, and re-import it on the other end. Real-time multi-user editing is on the roadmap.

Ready to graph your journalism work?

Start free with 3 graphs and 100 nodes. Upgrade to Pro for AI extraction, unlimited graphs, and 50K nodes.
