Knowledge Graphs for Medical Literature: From PubMed Sprawl to a Curated Map
TL;DR — PubMed indexes more than 35 million biomedical citations and grows by close to a million entries per year, while the UMLS Metathesaurus integrates over 200 source vocabularies covering millions of biomedical concepts. No clinician or researcher can read at that scale. Knowledge graphs make the literature navigable — drugs, diseases, genes, trials, and outcomes become typed entities, the relationships between them become typed edges, and a researcher can ask multi-hop questions a search box cannot answer.
The Scale Problem in Biomedical Knowledge
PubMed is the National Library of Medicine's flagship database for biomedical literature, with more than 35 million citations indexed as of recent counts and roughly a million new citations added every year. The UMLS Metathesaurus, the standard interoperability layer for biomedical vocabularies, integrates 200+ source vocabularies (SNOMED CT, RxNorm, MeSH, LOINC, ICD-10) and covers several million concepts. Combined with PubMed Central full text and ClinicalTrials.gov registrations, the working corpus for any clinical question is enormous.
The standard tools — PubMed search, MeSH expansion, EndNote — are excellent for finding individual papers. They are not built to surface relationships across the corpus. A query like "which trials of GLP-1 agonists in patients with comorbid renal impairment reported cardiovascular outcomes" requires the searcher to chain multiple Boolean queries, manually de-duplicate the results, and read fifty abstracts. A knowledge graph turns that into a constrained traversal: GLP-1 agonist → tested_in → Trial → enrolled → Patient with Comorbidity → reported → Outcome.
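As a rough sketch, that traversal might look like the Cypher query below. The node labels and relationship types follow the schema described in the next section; the property names (class, category, nct_id, pmid) and the HAS_COMORBIDITY relation are illustrative assumptions, not a fixed KnodeGraph schema.

```cypher
// Trials of GLP-1 agonists enrolling renally impaired patients
// that reported a cardiovascular outcome, with the source citation.
MATCH (d:Drug {class: 'GLP-1 agonist'})-[:TESTED_IN]->(t:Trial)
MATCH (t)-[:ENROLLED]->(p:Population)-[:HAS_COMORBIDITY]->(:Disease {name: 'Renal impairment'})
MATCH (t)-[:REPORTED]->(o:Outcome)
WHERE o.category = 'cardiovascular'
RETURN d.name AS drug, t.nct_id AS trial, o.name AS outcome, t.pmid AS source
```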
What a Medical Knowledge Graph Looks Like
A useful medical literature graph maps a few core entity types: Drug (with InChI key or RxNorm code), Disease (with SNOMED CT or ICD-10 code), Gene (with HGNC symbol), Trial (with NCT identifier), Patient Population (cohort attributes), Outcome (with definition), and Publication (PMID). Edges include treats, contraindicated_in, tested_in, observed_in, reported_in, cites, and supersedes. The schema is opinionated on purpose — a graph that tries to capture every nuance of biomedicine ends up unqueryable. Templates constrain extraction to a few high-value relations.
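A minimal sketch of what those typed nodes and edges look like when written to the graph, using MERGE so repeated extractions do not duplicate entities. The identifier values below are placeholders; in practice they come from the extraction plus whatever normalization you apply afterward.

```cypher
// Entities keyed on stable identifiers; the supporting citation lives on the edge.
MERGE (d:Drug {rxnorm: '000000'})                 // placeholder RxNorm code
  ON CREATE SET d.name = 'Semaglutide'
MERGE (dz:Disease {icd10: 'E11'})                 // ICD-10 code for type 2 diabetes
  ON CREATE SET dz.name = 'Type 2 diabetes mellitus'
MERGE (d)-[r:TREATS]->(dz)
  ON CREATE SET r.pmid = '12345678'               // placeholder PMID
```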
Bench-to-bedside literature reviews
A research scientist starting a new project pulls 200 papers, runs them through KnodeGraph with a biomedical template, and reviews the staged extractions. The output is a graph centered on the target gene or drug, surrounded by every interaction, every comorbidity, every model organism, and every trial mentioned in the source set. Instead of writing the literature review section from a stack of PDFs, the scientist writes from a single map and cites the source paragraphs the graph already pinned.
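One way to read such a map is a one-hop neighborhood query around the target. The gene symbol below is illustrative; the query simply groups every adjacent entity by relation and type so the scientist can see the shape of the evidence before drilling into individual citations.

```cypher
// Everything one hop from the target gene, grouped by relation and entity type.
MATCH (g:Gene {symbol: 'IL23A'})-[r]-(n)
RETURN type(r) AS relation, labels(n) AS entity_type, n.name AS entity, count(*) AS mentions
ORDER BY mentions DESC
```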
Clinical guideline development
Clinical guidelines are produced by societies (AHA, ESC, NICE, etc.) and require traceable evidence chains. A graph that maps every guideline statement to the supporting trials, the supporting trials to their outcomes, and the outcomes to the patient populations they were measured in produces the kind of audit trail regulators and reviewers expect. KnodeGraph's staging-and-citation pattern is a natural fit for this workflow because every fact in the graph carries its source pointer.
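A sketch of what that audit trail looks like as a query. The GuidelineStatement label and the SUPPORTED_BY and MEASURED_IN relations are assumptions layered on top of the core schema; the point is that every row ends in a source PMID.

```cypher
// Trace one guideline statement back to trials, outcomes, populations, and sources.
MATCH (s:GuidelineStatement {id: 'stmt-042'})-[:SUPPORTED_BY]->(t:Trial)
MATCH (t)-[:REPORTED]->(o:Outcome)-[:MEASURED_IN]->(p:Population)
MATCH (t)-[:REPORTED_IN]->(pub:Publication)
RETURN s.text AS statement, t.nct_id AS trial, o.name AS outcome,
       p.description AS population, pub.pmid AS source
```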
Pharmacovigilance and signal detection
Adverse event signals show up across case reports, registry data, and post-market surveillance. A graph that links Drug → reported_adverse_event → Outcome → in_patient_population gives safety teams a structured surface to monitor. New extractions add edges; thresholds on edge counts surface emerging signals before they reach the formal pharmacovigilance pipeline. This is not a replacement for FAERS or VigiBase — it is an internal scratchpad that turns a stream of papers into a queryable signal map.
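The threshold logic is a simple aggregation. The edge property (pmid) and the cutoff of five distinct reports are illustrative; real pharmacovigilance thresholds are a statistical question, not a constant.

```cypher
// Count distinct source papers behind each drug/adverse-outcome pair
// and surface pairs that cross a review threshold.
MATCH (d:Drug)-[r:REPORTED_ADVERSE_EVENT]->(o:Outcome)
WITH d.name AS drug, o.name AS outcome, count(DISTINCT r.pmid) AS reports
WHERE reports >= 5
RETURN drug, outcome, reports
ORDER BY reports DESC
```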
What KnodeGraph Specifically Provides
- PDF, DOCX, TXT, and Markdown extraction — most clinical literature still circulates as PDFs.
- 100+ language extraction via Claude — important for non-English sources (Spanish, Mandarin, Arabic medical literature) that PubMed indexes but many tools mishandle.
- Templates tuned for biomedical entities (Drug, Disease, Trial, Outcome) and biomedical relations (treats, observed_in, contraindicated_in).
- Staged extraction with mandatory human review — every edge is reviewable, which matters for clinical use cases where false edges can propagate dangerous claims.
- Cypher query support and JSON/CSV export, so the graph integrates with downstream statistical or visualization workflows in R or Python.
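For the last point, a flat export can be as simple as a query that returns one row per edge. Assuming treats edges carry a pmid property, the result set maps directly onto a CSV or a data frame.

```cypher
// Flatten treats edges into a tidy table for CSV export or a data frame.
MATCH (d:Drug)-[r:TREATS]->(dz:Disease)
RETURN d.name AS drug, d.rxnorm AS rxnorm, dz.name AS disease,
       dz.icd10 AS icd10, r.pmid AS source_pmid
```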
What This Is Not
KnodeGraph is not a clinical decision support tool, not a regulatory submission system, and not a replacement for a real biomedical knowledge base like UMLS, DrugBank, or DisGeNET. It is the layer between your source documents and those authoritative resources — extracting structure from your specific corpus, letting you curate it, and exporting it for downstream use. For clinical applications, every claim that leaves the graph and reaches a patient-facing context should pass through a clinician's hands first. The staging area is built around exactly that gate.
Frequently Asked Questions
Can I import UMLS or MeSH directly?
Not as an out-of-the-box ingest, but you can map extracted entities to UMLS CUIs or MeSH terms using a property field on each node. Most teams do this gradually: extract first, normalize later. For full UMLS-grounded graphs, the right pattern is to use UMLS as the canonical entity layer and use KnodeGraph for the document-derived edges.
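A sketch of the normalize-later step, assuming a umls_cui property name (any property name works) and a disease node already in the graph:

```cypher
// Attach a UMLS CUI to an extracted Disease node without touching its edges.
MATCH (dz:Disease {name: 'Type 2 diabetes mellitus'})
SET dz.umls_cui = 'C0011860'   // illustrative CUI; verify against the Metathesaurus
```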
How do I avoid hallucinated relationships?
Two layers: schema-guided extraction (templates constrain which relation types Claude can produce) and mandatory staging review (no edge enters the live graph without human approval). For biomedical work, both are non-negotiable. Hallucinated edges in a clinical context are a safety issue, not a quality issue.
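As a complement, you can audit the approved graph yourself with a query that flags any relation type outside the template's allowed set. This is a sketch of a post-hoc sanity check, assuming the relation names listed in the schema section above, not a description of the staging mechanism itself:

```cypher
// Flag edges whose relation type is not in the biomedical template's allowed set.
MATCH ()-[r]->()
WHERE NOT type(r) IN ['TREATS', 'CONTRAINDICATED_IN', 'TESTED_IN',
                      'OBSERVED_IN', 'REPORTED_IN', 'CITES', 'SUPERSEDES']
RETURN type(r) AS unexpected_relation, count(*) AS edge_count
```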
Is this HIPAA-compliant for protected health information?
KnodeGraph as a hosted service is suitable for non-PHI literature corpora (published papers, public trial registries). For PHI, the recommended deployment is on-premises or in your own VPC; the underlying stack (FastAPI, Postgres, FalkorDB, Redis) is open-source and self-hostable. Treat this as an architecture question, not a feature checkbox.
How big can a literature graph realistically get?
Pro tier supports 50,000 nodes per graph, which comfortably handles a focused subfield (200–500 papers, ~10,000–30,000 entities, ~30,000–80,000 edges). For larger corpora, partition by question — one graph per literature review or one graph per disease area — rather than trying to maintain a single graph of all of medicine.
What does a first project look like?
Pick a tight question ("all RCTs of anti-IL-23 agents in psoriasis from 2018–2025"), pull the 50–80 papers that match, run them through a biomedical template, review the staged extractions for one afternoon, and finish with a graph that powers the methods and discussion sections of your literature review. Most teams start there because the scope is constrained and the output is shareable.
Source
U.S. National Library of Medicine. PubMed indexes 35+ million citations from MEDLINE, life science journals, and online books, with roughly 1 million new citations added per year. UMLS Metathesaurus integrates 200+ source vocabularies covering several million biomedical concepts. [link]
Ready to Try KnodeGraph?
Start free with 3 graphs and 100 nodes. Upgrade to Pro for AI extraction, unlimited graphs, and 50K nodes.
Get Started Free