Model Vocabulary Registry
A workspace-level term registry that enforces naming consistency across the architecture. Similarity matching suggests canonical terms as you name graph nodes, edges, and properties, with confidence scoring and direct integration into the graph editor.
Naming inconsistency is a subtle but significant governance problem. When one team names an entity "CustomerOrder" and another names the same entity "ClientOrder" or simply "Order", the result is confusion, duplicated entities, and broken traceability. The Model Vocabulary Registry provides a canonical term list for your workspace, with similarity matching that surfaces suggestions as you name entities, properties, and relationships in the graph editor.
How It Works
The vocabulary registry is stored as a JSON file (model-vocabulary.registry.json) in the workspace's governance directory. Each term is stored with multiple normalised forms to enable accurate matching regardless of casing convention.
| Form | Example (for "SystemOwner") | Purpose |
|---|---|---|
| Original | SystemOwner | The canonical form as entered by the architect |
| Normalised | systemowner | Lowercase, trimmed, Unicode-normalised (NFC) for case-insensitive comparison |
| Compact | systemowner | Separators removed for matching across naming conventions |
| Tokens | ["system", "owner"] | Split from camelCase, snake_case, kebab-case, etc. for partial matching |
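
The four forms in the table can be derived from a single input string. The following is a minimal sketch, not the registry's actual implementation: the function names `term_forms` and `tokenize`, the dictionary keys, and the assumed separator set (whitespace, underscore, hyphen) are hypothetical and chosen only for illustration.

```python
import re
import unicodedata

# Assumed separator set: whitespace, "_" and "-". The real registry may differ.
SEPARATORS = re.compile(r"[\s_\-]+")


def tokenize(text: str) -> list[str]:
    """Split on separators and camelCase boundaries, returning lowercase tokens."""
    # Lower-to-upper boundary: "SystemOwner" -> "System Owner"
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", text)
    # Acronym followed by a capitalised word: "HTTPServer" -> "HTTP Server"
    spaced = re.sub(r"(?<=[A-Z])(?=[A-Z][a-z])", " ", spaced)
    return [t.lower() for t in SEPARATORS.split(spaced) if t]


def term_forms(term: str) -> dict:
    """Derive the four stored forms for one canonical term (hypothetical layout)."""
    cleaned = unicodedata.normalize("NFC", term).strip()
    normalised = cleaned.lower()
    return {
        "original": term,
        "normalised": normalised,
        "compact": SEPARATORS.sub("", normalised),
        "tokens": tokenize(cleaned),
    }


print(term_forms("SystemOwner"))
print(term_forms("system_owner"))
```

For "SystemOwner" the normalised and compact forms coincide because the term contains no separators. For "system_owner" they differ: the normalised form keeps the underscore, while the compact form is "systemowner", which is what lets the two spellings match each other.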
Similarity Matching
When you type a name in the graph editor (node label, edge label, or property name), the vocabulary suggestion overlay appears with matching terms. The matching algorithm uses composite scoring with three weighted components.
| Component | Weight | Method |
|---|---|---|
| Token Jaccard | 50% | Jaccard similarity between the token sets of the input and the candidate term |
| Compact Substring | 30% | Whether the compact form of the input is a substring of (or contains) the candidate's compact form |
| Plural/Singular Bonus | 20% | Bonus score if the input differs from the candidate only by a plural/singular suffix |
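
One way to read the weights above is as a weighted sum of three component scores, each between 0 and 1. The sketch below illustrates that reading under several assumptions that the table does not confirm: the substring component is treated as binary, plural detection only checks for a trailing "s" or "es", and identical compact forms short-circuit to 1.0 so that the result agrees with the Exact category. All function names are hypothetical.

```python
import re

# Weights taken from the table above.
WEIGHTS = {"token_jaccard": 0.5, "compact_substring": 0.3, "plural_singular": 0.2}
SEPARATORS = re.compile(r"[\s_\-]+")  # assumed separator set


def tokens(name: str) -> set[str]:
    """Lowercase token set, split on separators and camelCase boundaries."""
    spaced = re.sub(r"(?<=[a-z0-9])(?=[A-Z])", " ", name.strip())
    return {t.lower() for t in SEPARATORS.split(spaced) if t}


def compact(name: str) -> str:
    """Lowercase form with separators removed."""
    return SEPARATORS.sub("", name.strip().lower())


def differs_only_by_plural(a: str, b: str) -> bool:
    """Assumption: only a trailing "s" or "es" counts as a plural suffix."""
    shorter, longer = sorted((a, b), key=len)
    return longer in (shorter + "s", shorter + "es")


def similarity(input_name: str, candidate: str) -> float:
    """Composite score in [0, 1] for an input name against a candidate term."""
    a, b = compact(input_name), compact(candidate)
    if not a or not b:
        return 0.0
    if a == b:
        return 1.0  # assumed short-circuit for an exact compact match
    ta, tb = tokens(input_name), tokens(candidate)
    jaccard = len(ta & tb) / len(ta | tb)
    substring = 1.0 if (a in b or b in a) else 0.0
    plural = 1.0 if differs_only_by_plural(a, b) else 0.0
    return (
        WEIGHTS["token_jaccard"] * jaccard
        + WEIGHTS["compact_substring"] * substring
        + WEIGHTS["plural_singular"] * plural
    )


for name in ("customer_order", "CustomerOrders", "Order", "Invoice"):
    print(name, round(similarity(name, "CustomerOrder"), 3))
```

Under these assumptions, "Order" against "CustomerOrder" shares one of two distinct tokens (Jaccard 0.5) and is a substring of the candidate, giving 0.5 × 0.5 + 0.3 = 0.55. The product's actual component definitions may differ, so treat the numbers as illustrative only.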
Confidence Categories
Match results are categorised by confidence level, each displayed with a distinct visual badge in the suggestion overlay.
| Category | Score | Meaning |
|---|---|---|
| Exact | 1.0 | The compact forms match exactly. This is the canonical term. |
| High | 0.8 to below 1.0 | Very likely the same concept with different casing or a minor variation |
| Possible | 0.6 to below 0.8 | May be related; worth reviewing to decide whether the name should use the canonical term |
| Low | Below 0.6 | Weak match, probably a different concept |
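
The thresholds in the table map onto a simple ordered check, highest band first. This is a sketch only: the function name `confidence_category` is hypothetical, and it assumes that a score of 1.0 is produced only when the compact forms match exactly.

```python
def confidence_category(score: float) -> str:
    """Map a similarity score in [0, 1] to the category names used in the table."""
    if score >= 1.0:
        return "Exact"
    if score >= 0.8:
        return "High"
    if score >= 0.6:
        return "Possible"
    return "Low"


for s in (1.0, 0.85, 0.6, 0.55):
    print(s, confidence_category(s))
```

Checking the bands from highest to lowest means each score lands in exactly one category, even though the lower bounds are expressed as "or above".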
Graph Editor Integration
The vocabulary suggestion overlay integrates directly into the graph editor. When you type a node label, edge label, or property name, suggestions appear in a panel appended to the page body. The overlay opens only when you type, not when a field receives focus, which preserves the grid's arrow-key navigation.
You can accept a suggestion to use the canonical term, or ignore it if the name is intentionally different. You can also add new terms to the registry directly from the suggestion panel when you introduce a new concept.
Vocabulary Editor
The dedicated vocabulary editor (accessed via Governance > Model Vocabulary in the menu) provides a term list with a detail panel. You can add, edit, and remove terms, provide descriptions for each term, and optionally link a term to a content page that documents the concept in detail. Creation and modification dates are tracked for audit purposes.
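The information the editor manages for each term can be pictured as a small record. The sketch below is illustrative only: the class name `VocabularyTerm` and every field name are hypothetical, and the actual layout of model-vocabulary.registry.json is not documented here.

```python
from dataclasses import dataclass, field
from datetime import datetime, timezone
from typing import Optional


def _now() -> datetime:
    """Current time in UTC (assumed timestamp convention)."""
    return datetime.now(timezone.utc)


@dataclass
class VocabularyTerm:
    """Hypothetical in-memory shape of one registry entry."""
    term: str                              # canonical form as entered
    description: str = ""                  # optional free-text description
    content_page: Optional[str] = None     # optional link to a documenting page
    created_at: datetime = field(default_factory=_now)
    modified_at: datetime = field(default_factory=_now)

    def edit(self, *, description: Optional[str] = None,
             content_page: Optional[str] = None) -> None:
        """Apply an edit and refresh the modification date; creation date is kept."""
        if description is not None:
            self.description = description
        if content_page is not None:
            self.content_page = content_page
        self.modified_at = _now()


entry = VocabularyTerm("SystemOwner")
entry.edit(description="The person accountable for a system.")
print(entry.term, entry.created_at <= entry.modified_at)
```

Keeping the creation date immutable and refreshing only the modification date on each edit is one straightforward way to support the audit trail described above.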