Etymology with roots
Not just a text blob. Morphemic decomposition where the data allows it — prefix, stem, suffix, each with its own gloss and language. A word is a small history; WordWalk tries to show it.
DE–EN dictionary · preview
A dictionary for people who outgrew the short answer..
Idiom, etymology, register, cross-language metaphor. WordWalk is built for the lookups where Leo returns the right word and the wrong feeling — for translators, writers, poets, and late-stage learners who can already tell the difference.
Languages
DE · EN
Search fallback
pg_trgm
LLMs
Claude · GPT-4
Provider
Swappable
Data sources
Wiktionary · OPUS
Status
Preview
A dictionary for edge words
The short answer is a solved problem. The interesting words — the ones with three registers, a false friend in the next language over, and an etymology that changes the sentence — are what this product is about.
Not just a text blob. Morphemic decomposition where the data allows it — prefix, stem, suffix, each with its own gloss and language. A word is a small history; WordWalk tries to show it.
formal · colloquial · medical · rare · dated. Surfaced as usage labels on the entry and again per translation, so you don't send a Stammtisch word into a legal brief by accident.
A Cognate table sits next to every entry, typed by relationship — direct loanword, cognate, shared root, borrowed with prefix. The German Rezept is not the English receipt; we say so out loud.
Parallel sentences from Wiktionary and the OPUS corpus — each line tagged with its origin so you can judge the register before you quote it.
The interpretation endpoint returns the same word in three voices: theatrical (short, poetic), ELI5 (one analogy, thirty words), casual (how a friend would say it). Pick the one that fits your reader.
Misspell it and the search still lands. Postgres pg_trgm plus a form index means conjugated, declined, or half-remembered lookups resolve to the right lemma — or return suggestions instead of a dead page.
Grounded or silent
WordWalk's structured fields — lemma, part of speech, definitions, etymology, translations, examples — come from Wiktionary via wiktextract. The LLM layer is asked to interpret what's already there, not to hallucinate new lexicographic facts. Missing etymology stays missing; missing register stays blank. Better a known gap than a fluent lie.
Structured first
Entries, forms, translations, synonyms, cognates — all seeded from wiktextract JSONL dumps. The LLM sees the definition; it doesn’t get to rewrite it.
Cited examples
OPUS · OpenSubtitles · Tatoeba · Wiktionary. If a parallel sentence is quoted, the origin is on the entry. You decide whether the register fits your text.
Refusal floor
If the model returns malformed JSON or a missing field, the interpretation service returns null. Nothing persists. The next request retries against a clean slate.
The stack
Small, legible, boring on purpose. One API, one database, one frontend. The only interesting bit is the dual-LLM layer — and even that auto-selects from environment variables rather than asking you to pick.
Node 20+, Prisma plugin injects fastify.prisma, routes prefixed with /api. Rate-limited interpretation (ten per minute per IP), DB-cached after the first successful response.
Entry, Form, Translation, Example, Synonym, Cognate. Unique on (lemma, lang, pos), case-insensitive. pg_trgm for fuzzy lookup. JS client engine — no Rust binary to wrangle in Docker.
ANTHROPIC_API_KEY → Claude Sonnet. OPENAI_API_KEY → GPT-4o-mini. OLLAMA_BASE_URL → local, any open-weights model the endpoint serves. Same contract, same JSON shape, swappable at deploy time.
TanStack Query with thirty-minute staleTime; no refetch-on-focus. PWA via workbox runtime caching. Tailwind with paper-and-ink theme tokens — this is meant to be read.
What comes back
The entry endpoint returns the structured half — lemma, pos, IPA, definitions, etymology, translations, usage labels. The interpret endpoint adds three voices on top, each bounded to a word count the LLM is told to respect. JSON only, no markdown, no preamble — that rule is in the system prompt.
# Look up a word with etymology and register curl -s https://wordwalk.nyxcore.cloud/api/entry/anticipate?lang=en # Response (abridged): { "lemma": "anticipate", "lang": "en", "pos": "verb", "ipa": "/ænˈtɪs.ɪ.peɪt/", "definitions": [ "To expect, look forward to, or predict.", "To act in advance of, to forestall." ], "usageLabels": ["formal"], "etymologyText": "From Latin anticipāre — ante- (before) + capere (to take).", "etymologyRoots": { "parts": [ { "original": "ante-", "gloss": "prefix · Latin", "meaning": "before" }, { "original": "capere", "gloss": "verb · Latin", "meaning": "to take" } ], "result": { "word": "anticipāre", "literal": "to take beforehand", "lang": "Latin" } }, "translations": [ { "targetLang": "de", "word": "vorwegnehmen", "register": "formal", "isPrimary": true }, { "targetLang": "de", "word": "erwarten", "register": "neutral", "isPrimary": false }, { "targetLang": "de", "word": "ahnen", "register": "literary", "isPrimary": false } ] } # Three registers of the same meaning: curl -s -X POST https://wordwalk.nyxcore.cloud/api/interpret \ -d '{"lemma":"anticipate","pos":"verb","lang":"en"}' { "theatrical": "To reach forward in time — / to pluck the moment before it ripens.", "eli5": "Guessing what happens next, the way you know the bus is late before it is.", "casual": "Seeing it coming. Not predicting — expecting." }
Interpret is rate-limited (ten per minute per IP) and cached on the entry after the first successful response. Malformed output is dropped silently — we’d rather retry than cache a lie.
What this is not
Ipcha Mistabra wrote this section. Before you put WordWalk in your translation workflow, here is what it will fail at.
Disclosure
WordWalk is a dictionary, not DeepL. It resolves one word at a time with depth — etymology, register, metaphor. If you need a paragraph rendered into another language, use a translator and come back here for the word that won't sit right.
Disclosure
Synonyms exist in the schema, but they're grounded in the source entry's sense — not a free-associative cloud. If you want every word that sort of means courage, WordWalk is the wrong surface. If you want the precise register shift from Mut to Beherztheit, stay.
Disclosure
If you just need to know that Hund means dog, Leo or dict.cc will be faster and already cached in your browser. WordWalk earns its keep on the words that made you open a tab at all: the ones the short answer fails.
How to get in
The corpus is still being seeded, the interpretation cache is still warming, and the edges of the search are still being tested against real lookups. If you're a translator, a language learner past B2, or a writer working across DE and EN — we'd like your help breaking it.
Before you cite it in a translation
WordWalk is a preview. Coverage is uneven — common lemmas are dense, tail vocabulary is thinner, and some etymologies are text-only where the root decomposition hasn’t been structured yet. Treat it as an editorial assistant, not an authority. Spot-check register claims against a human lexicographer when the stakes are real.
Metis says: the preview tests the product, not you.