Named Entity Recognition Explained: From Stanford NER to BERT and LLMs
What NER actually does
Named Entity Recognition is two tasks bolted together: find spans of text that refer to a real-world entity (the 'recognition'), and classify each span by type (the 'naming'). Standard types are Person, Organisation, Location, Date, Money, and a Misc bucket. Domain-specific NER adds types like Drug, Gene, Statute, Regulation, or Vessel.
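To make that concrete, here is a minimal sketch using spaCy's off-the-shelf English pipeline. The example sentence and the assumption that `en_core_web_sm` is installed are illustrative; the exact labels you get back depend on the model you load.

```python
# pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Acme Corp hired Jane Smith in London on 3 March 2024.")

for ent in doc.ents:
    # ent.text is the recognised span, ent.label_ is the entity type
    print(ent.text, ent.label_, ent.start_char, ent.end_char)
# Typical output: 'Acme Corp' ORG, 'Jane Smith' PERSON, 'London' GPE, '3 March 2024' DATE
```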
If you are building a knowledge graph from documents, NER is the on-ramp. Every node in the graph that came from a document came from a NER hit; every edge is a relationship between two NER hits. The quality of the rest of the pipeline is bounded by NER's recall: an entity the model misses never becomes a node, and no downstream step can add it back.
Four eras: rules → CRFs → transformers → LLMs
The first generation of NER (1995-2005) was rule-based. Hand-written regex and gazetteers — lists of known company names, place names, and so on. It worked acceptably on news copy and broke instantly on new domains. Stanford NER, released in 2005 by Finkel et al., was the canonical example of the second generation: a Conditional Random Field trained on hand-labelled data. It was the dominant tool for a decade because it was fast, deterministic, and good enough on standard benchmarks like CoNLL-2003.
The third generation arrived with BERT (Devlin et al., 2018–2019). Fine-tuning a pre-trained transformer on labelled NER data lifted CoNLL-2003 F1 from the high 80s into the low 90s and — more importantly — cut the labelled-data requirement by an order of magnitude. By 2022 the field had moved to span-classification heads on top of larger language models.
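The fine-tuning step is less exotic than it sounds. Below is a minimal sketch with the Hugging Face libraries; the checkpoint, the hyperparameters, and the choice of CoNLL-2003 are illustrative stand-ins, not a tuned recipe.

```python
# Minimal token-classification fine-tuning sketch (illustrative, not a recipe).
from datasets import load_dataset
from transformers import (AutoModelForTokenClassification, AutoTokenizer,
                          DataCollatorForTokenClassification, Trainer,
                          TrainingArguments)

dataset = load_dataset("conll2003")                              # word-level tokens + BIO tags
labels = dataset["train"].features["ner_tags"].feature.names    # ['O', 'B-PER', 'I-PER', ...]

tokenizer = AutoTokenizer.from_pretrained("bert-base-cased")
model = AutoModelForTokenClassification.from_pretrained("bert-base-cased",
                                                        num_labels=len(labels))

def tokenize_and_align(batch):
    # Re-tokenise into subwords and label only the first subword of each word;
    # special tokens and continuation subwords get -100, which the loss ignores.
    enc = tokenizer(batch["tokens"], truncation=True, is_split_into_words=True)
    enc["labels"] = []
    for i, tags in enumerate(batch["ner_tags"]):
        previous, label_ids = None, []
        for word_id in enc.word_ids(i):
            label_ids.append(-100 if word_id is None or word_id == previous else tags[word_id])
            previous = word_id
        enc["labels"].append(label_ids)
    return enc

encoded = dataset.map(tokenize_and_align, batched=True)

trainer = Trainer(
    model=model,
    args=TrainingArguments("ner-finetune", learning_rate=2e-5, num_train_epochs=3),
    train_dataset=encoded["train"],
    eval_dataset=encoded["validation"],
    data_collator=DataCollatorForTokenClassification(tokenizer),
)
trainer.train()
```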
The fourth generation, which we are in now, uses general-purpose LLMs in zero-shot or few-shot mode. Claude or GPT-4 can label entities in a document with no fine-tuning at all. The accuracy is not always better than a fine-tuned BERT — but the cost of bringing up a new domain is dramatically lower.
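In practice that means a single prompted call. The sketch below uses the Anthropic Python SDK; the prompt wording, the JSON schema, and the model name are assumptions you would tune for your own corpus, and it trusts the model to return bare JSON.

```python
# Zero-shot NER via one LLM call (prompt, schema, and model name are illustrative).
import json

import anthropic

client = anthropic.Anthropic()  # reads ANTHROPIC_API_KEY from the environment

PROMPT = """Extract the named entities from the text below.
Return a JSON list of objects with keys "text" and "type",
where "type" is one of PERSON, ORGANISATION, LOCATION, DATE, MONEY.
Return only the JSON, with no commentary.

Text: {document}"""

def llm_ner(document: str) -> list[dict]:
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",   # substitute whichever model fits your cost envelope
        max_tokens=1024,
        messages=[{"role": "user", "content": PROMPT.format(document=document)}],
    )
    return json.loads(response.content[0].text)  # assumes the model obeyed the JSON-only instruction

print(llm_ner("Acme Corp hired Jane Smith in London on 3 March 2024."))
```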
Where each approach still fails
Domain shift is the perpetual NER killer. A model trained on news will mislabel medical entities; one trained on biomedical text will get tripped up by financial filings. Cross-domain NER is still an open research problem, and most teams solve it by buying or labelling domain data.
Low-resource languages remain a real gap. Whisper-grade ASR exists for roughly 100 languages, but production-quality NER does not. Arabic dialectal NER lags Modern Standard Arabic by 5–10 F1 points; Urdu lags Hindi; many African languages have effectively no off-the-shelf NER at all. LLMs narrow the gap, but not as completely as advertised.
Nested entities — 'Bank of America Corporation' contains 'Bank of America', which in turn contains 'America', and all three spans might be relevant — still confuse most flat sequence labellers. Span-based architectures (DyGIE++, SpERT) handle them; classic CRF-style NER does not.
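The idea behind those span-based architectures is simple even if the models are not: enumerate every candidate span up to some maximum width and classify each one independently, so overlapping entities never compete for the same token labels. A toy sketch of that enumeration, with the learned classifier replaced by a placeholder:

```python
# Toy sketch of span-based NER: every span up to max_width is a candidate, and
# overlapping spans can all be entities (unlike BIO tagging). classify_span is
# a stand-in for a learned classifier over span representations.
from typing import Callable

def enumerate_spans(tokens: list[str], max_width: int = 5) -> list[tuple[int, int]]:
    return [(start, end)
            for start in range(len(tokens))
            for end in range(start + 1, min(start + max_width, len(tokens)) + 1)]

def span_ner(tokens: list[str], classify_span: Callable[[list[str]], str]):
    entities = []
    for start, end in enumerate_spans(tokens):
        label = classify_span(tokens[start:end])   # e.g. 'ORG', 'LOC', or 'O' for non-entity
        if label != "O":
            entities.append((start, end, label))
    return entities   # 'Bank of America' and 'Bank of America Corporation' can both appear
```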
Code-mixed text ('We're using تطبيق KnodeGraph for...', where تطبيق is Arabic for 'app') is where almost everything falls over. Production pipelines either route to a code-mix-specialised model or accept a 10–15% accuracy hit at the language boundary.
The modern stack in 2026
The pragmatic 2026 stack is layered: spaCy's industrial-strength CNN-based NER for fast, cheap pre-filtering on the bulk of the corpus; a fine-tuned transformer (a DeBERTa-v3 or one of the smaller LLaMA variants) for any domain where you have a few thousand labelled examples; and an LLM call as the long-tail fallback for ambiguous spans, or as the only step on small corpora where the per-document cost of an LLM call is acceptable.
Concrete numbers: on CoNLL-2003 English, fine-tuned DeBERTa is roughly 93–94 F1. spaCy's en_core_web_trf is 89–90. A Claude 3.5 zero-shot call is 88–90 with a well-written prompt. The 'right' choice is whatever fits your latency and cost envelope, not whatever has the highest leaderboard number.
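Wired together, the layers look roughly like the sketch below: a cheap spaCy pass over everything, with an escalation rule that sends hard cases to the LLM. The escalation heuristic shown here (no entities found, or mixed scripts in the text) and the `llm_ner` fallback from the earlier sketch are illustrative assumptions, not a recommendation.

```python
# Layered routing sketch: cheap pass first, escalate the hard cases.
import spacy

nlp = spacy.load("en_core_web_sm")   # fast CNN pipeline for the bulk of the corpus

def extract_entities(text: str) -> list[tuple[str, str]]:
    doc = nlp(text)
    ents = [(e.text, e.label_) for e in doc.ents]
    # Illustrative escalation rule: fall back to the LLM when the cheap pass
    # finds nothing, or when the text mixes scripts (where flat NER tends to fail).
    mixed_script = any("\u0600" <= ch <= "\u06FF" for ch in text)   # Arabic block, as one example
    if not ents or mixed_script:
        return [(e["text"], e["type"]) for e in llm_ner(text)]   # llm_ner from the earlier sketch
    return ents
```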
From NER to a knowledge graph
NER gives you nodes. To get edges — the actual graph — you need a second step, relation extraction, which is its own substantial topic. Relation extraction takes pairs of NER spans within a context window and classifies the relationship between them: PRESCRIBED, ACQUIRED, SUBSIDIARY_OF, AUTHOR_OF, and so on.
Most modern pipelines run NER and relation extraction together as a joint model — the same transformer encoder produces span representations and pairwise representations, and two heads classify entity types and relation types. The KnodeGraph extraction worker uses Claude as a single-shot joint extractor, which sidesteps the joint-model engineering at the cost of per-document API spend.
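A single-shot joint prompt looks something like the sketch below. The schema and wording are illustrative, not KnodeGraph's actual extraction prompt; the point is that one call returns both the nodes and the typed edges between them.

```python
# Joint entity + relation extraction in one LLM call (schema is illustrative).
import json

import anthropic

client = anthropic.Anthropic()

JOINT_PROMPT = """From the text below, extract:
1. "entities": a list of {{"id": int, "text": str, "type": str}}
2. "relations": a list of {{"head": entity id, "tail": entity id, "type": str}}
Use relation types such as ACQUIRED, SUBSIDIARY_OF, AUTHOR_OF where they apply.
Return only JSON.

Text: {document}"""

def joint_extract(document: str) -> dict:
    response = client.messages.create(
        model="claude-3-5-sonnet-latest",
        max_tokens=2048,
        messages=[{"role": "user", "content": JOINT_PROMPT.format(document=document)}],
    )
    return json.loads(response.content[0].text)

graph = joint_extract("Seagen became a subsidiary of Pfizer in 2023.")
# graph["entities"] are the nodes; graph["relations"] are typed edges between them.
```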
Related reading
- PDF to knowledge graph — Where NER actually runs: on extracted text from documents.
- Knowledge graphs for legal research — Legal NER (parties, statutes, jurisdictions) is one of the highest-stakes domain applications.
- KnodeGraph vs Diffbot — Diffbot ships a black-box web-scale NER as a hosted service — useful contrast.
- Knowledge graphs for medical literature — BioBERT and PubMedBERT are the field-specific transformers most teams reach for in this domain.
Frequently Asked Questions
Should I fine-tune a transformer or just call an LLM?
Fine-tune if you process more than a few thousand documents per day and accuracy matters at the 1-2 F1 point level — the per-call cost difference accumulates fast at volume. Use an LLM if you process fewer documents, or if your domain shifts often enough that the labelling cost of fine-tuning would outweigh the inference cost of LLM calls.
Why is NER so much worse on Arabic and Hindi than English?
Two reasons. The labelled-data gap: CoNLL and OntoNotes have hundreds of thousands of tagged English entities; the equivalent Arabic and Hindi corpora are 5-10x smaller and skew toward formal news copy. And tokenisation is harder: Arabic clitics and Hindi compound noun phrases need language-aware tokenisers, and using English-trained tokenisers fragments words in ways the NER head cannot recover from.
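The fragmentation is easy to see for yourself. The snippet below compares how an English-only tokeniser and a multilingual one split an Arabic phrase; the checkpoints are just common examples, and the exact pieces you see will vary.

```python
# English-only vs multilingual subword tokenisation on Arabic text (illustrative checkpoints).
from transformers import AutoTokenizer

english = AutoTokenizer.from_pretrained("bert-base-cased")
multilingual = AutoTokenizer.from_pretrained("bert-base-multilingual-cased")

text = "شركة أرامكو السعودية"   # 'Saudi Aramco Company'

print(english.tokenize(text))       # largely [UNK] pieces: the NER head has almost nothing to label
print(multilingual.tokenize(text))  # recognisable subword pieces the head can actually tag
```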
What's the difference between NER and entity linking?
NER tells you 'this span is an Organisation'. Entity linking tells you 'this span is the specific Organisation with Wikidata ID Q123, distinct from all other Organisations with the same surface form'. Linking is harder, requires a target knowledge base, and is what turns NER output from a list into a graph. Tools like REL and BLINK are entity linkers; spaCy's NER is not.
Do I need NER if I'm using a Claude or GPT extraction prompt?
Implicitly, yes — the LLM is doing NER in its head as part of the extraction. Explicitly, no — you do not need to call a separate NER tool first. The benefit of doing it as a separate step is auditable spans (you can show a user the exact text that became a node); the benefit of skipping it is one fewer model in the pipeline. KnodeGraph's pipeline does it as one Claude call.
What's the smallest labelled dataset I need to fine-tune NER for a new domain?
Practical floor is 500-1000 labelled documents if you're starting from a strong pre-trained transformer. Below that, a well-prompted LLM with a domain-specific entity list usually beats fine-tuning. Above 5000 documents, fine-tuning starts to clearly win on cost-per-call.
Source
Devlin, J., Chang, M.-W., Lee, K., & Toutanova, K. 'BERT: Pre-training of Deep Bidirectional Transformers for Language Understanding'. NAACL-HLT 2019. [link]
Ready to Try KnodeGraph?
Start free with 3 graphs and 100 nodes. Upgrade to Pro for AI extraction, unlimited graphs, and 50K nodes.
Get Started Free