Databricks and Lakehouse
Document Databricks and Lakehouse architectures including medallion architecture, Unity Catalog, data lineage, and Delta Lake table schemas.
The Lakehouse paradigm, as implemented by Databricks, combines the open, low-cost storage of a data lake with the transactional tables and governance of a data warehouse. NeoArc documents these architectures with ERDs for table schemas and Graph Diagrams for lineage.
Lakehouse Documentation Features
NeoArc features map to Lakehouse concepts.
Documenting Medallion Architecture
Use diagrams to show how data progresses from Bronze through Silver to Gold, and describe what each layer holds:
| Layer | Purpose | Examples |
|---|---|---|
| Bronze (Raw) | Landing zone for raw data | CDC streams, file loads, API pulls |
| Silver (Cleansed) | Validated and cleaned data | Deduplication, type casting, validation |
| Gold (Business) | Business-ready analytics | Aggregations, KPIs, dimensions |
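The layer progression in the table can be illustrated with a small sketch. It uses plain Python lists rather than Spark so that it runs anywhere; the order records, field names, and the `to_silver` and `to_gold` helpers are all hypothetical and only stand in for whatever jobs a real pipeline would use.

```python
from collections import defaultdict

# Hypothetical raw order events as they might land in a Bronze table.
bronze = [
    {"order_id": "1", "region": "EU", "amount": "10.50"},
    {"order_id": "1", "region": "EU", "amount": "10.50"},  # duplicate load
    {"order_id": "2", "region": "US", "amount": "7.25"},
    {"order_id": "3", "region": "EU", "amount": "oops"},   # fails type casting
]


def to_silver(rows):
    """Deduplicate on order_id, cast types, and drop rows that fail validation."""
    seen = set()
    silver = []
    for row in rows:
        if row["order_id"] in seen:
            continue
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # a real pipeline would more likely quarantine this row
        seen.add(row["order_id"])
        silver.append(
            {"order_id": int(row["order_id"]), "region": row["region"], "amount": amount}
        )
    return silver


def to_gold(rows):
    """Aggregate cleansed rows into a business-level KPI: revenue per region."""
    totals = defaultdict(float)
    for row in rows:
        totals[row["region"]] += row["amount"]
    return dict(totals)


silver = to_silver(bronze)
gold = to_gold(silver)
```

Each function corresponds to one arrow in a Bronze to Silver to Gold diagram, which is a convenient granularity for documenting the transformation rules alongside the flow.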
Documenting Unity Catalog
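Use Graph Diagrams to show the Unity Catalog hierarchy: the metastore at the top, then catalogs, then schemas, then the tables, views, and volumes they contain.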
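Unity Catalog addresses tables through a three-level namespace of the form `catalog.schema.table`. As a rough sketch of how a list of table names could be turned into the hierarchy a diagram would show, the following groups fully qualified names into a nested structure. The table names and the `build_hierarchy` and `render` helpers are illustrative assumptions, not part of NeoArc or Databricks.

```python
def build_hierarchy(table_names):
    """Group catalog.schema.table names into {catalog: {schema: [tables]}}."""
    tree = {}
    for full_name in table_names:
        parts = full_name.split(".")
        if len(parts) != 3:
            raise ValueError(f"expected catalog.schema.table, got {full_name!r}")
        catalog, schema, table = parts
        tree.setdefault(catalog, {}).setdefault(schema, []).append(table)
    return tree


def render(tree):
    """Render the hierarchy as an indented text outline."""
    lines = []
    for catalog in sorted(tree):
        lines.append(catalog)
        for schema in sorted(tree[catalog]):
            lines.append(f"  {schema}")
            for table in sorted(tree[catalog][schema]):
                lines.append(f"    {table}")
    return "\n".join(lines)


# Hypothetical table names used only for illustration.
tables = [
    "sales.bronze.orders_raw",
    "sales.silver.orders",
    "sales.gold.revenue_by_region",
    "hr.silver.employees",
]
hierarchy = build_hierarchy(tables)
outline = render(hierarchy)
```

Printing `outline` gives one indented line per catalog, schema, and table, which maps directly onto the nodes of a hierarchy diagram.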
Data Architecture Use Cases
Medallion Architecture
Document medallion architecture with Bronze to Silver to Gold flow
Delta Lake Schemas
Define Delta Lake table schemas with column types, nullability, and constraints
Data Lineage
Map data lineage showing source to transform to target
Unity Catalog Structure
Document Unity Catalog structure with catalogue hierarchy
Ingestion Patterns
Show streaming versus batch ingestion with parallel paths
Data Quality Rules
Document data quality rules with validation logic
External Sources
Map external data sources with connection details
Table Partitioning
Document table partitioning strategy
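For the Delta Lake Schemas and Table Partitioning use cases above, it can help to keep the schema definition in a form that generates DDL, so the documented columns, nullability, and constraints stay in step with the table. The sketch below is one possible approach: the `Column` and `Table` classes and the example table are hypothetical, and the generated statements follow the general shape of Delta Lake SQL (`CREATE TABLE ... USING DELTA`, `ALTER TABLE ... ADD CONSTRAINT ... CHECK`), which should be checked against the Databricks SQL reference before use.

```python
from dataclasses import dataclass, field


@dataclass
class Column:
    name: str
    data_type: str
    nullable: bool = True


@dataclass
class Table:
    name: str
    columns: list
    checks: dict = field(default_factory=dict)        # constraint name -> SQL expression
    partition_by: list = field(default_factory=list)  # partition column names


def create_table_sql(table):
    """Build a CREATE TABLE statement with column types and NOT NULL markers."""
    cols = ",\n".join(
        f"  {c.name} {c.data_type}" + ("" if c.nullable else " NOT NULL")
        for c in table.columns
    )
    sql = f"CREATE TABLE {table.name} (\n{cols}\n) USING DELTA"
    if table.partition_by:
        sql += f"\nPARTITIONED BY ({', '.join(table.partition_by)})"
    return sql


def check_constraints_sql(table):
    """Build one ALTER TABLE statement per CHECK constraint."""
    return [
        f"ALTER TABLE {table.name} ADD CONSTRAINT {name} CHECK ({expr})"
        for name, expr in table.checks.items()
    ]


# Hypothetical Silver-layer table used only for illustration.
orders = Table(
    name="sales.silver.orders",
    columns=[
        Column("order_id", "BIGINT", nullable=False),
        Column("order_date", "DATE", nullable=False),
        Column("amount", "DECIMAL(10,2)"),
    ],
    checks={"amount_non_negative": "amount >= 0"},
    partition_by=["order_date"],
)
ddl = create_table_sql(orders)
constraints = check_constraints_sql(orders)
```

The same `Table` object could also feed an ERD, since it already carries the column names, types, and nullability that the diagram needs.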
Platform Architecture Use Cases
Workspace Architecture
Document workspace architecture with Databricks components
Cluster Configurations
Show cluster configurations with compute specifications
Notebook Dependencies
Map notebook dependencies with job relationships
Workflow Orchestration
Document workflow orchestration with job sequences
Multi-Workspace Setup
Show multi-workspace setup with workspace connections
Metastore Sharing
Document Unity Catalog metastore sharing across workspaces
Service Principal Access
Map service principal access with permission flows
Network Architecture
Document network architecture with VNet and firewall layout
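The Notebook Dependencies and Workflow Orchestration use cases above both describe a dependency graph. As a minimal sketch, assuming a hypothetical set of job tasks, Python's standard `graphlib` module can derive a valid run order and the upstream tasks of any node, which is the same information a job-sequence diagram conveys.

```python
from graphlib import TopologicalSorter

# Hypothetical job tasks, each mapped to the set of tasks it depends on.
dependencies = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "clean_orders": {"ingest_orders"},
    "clean_customers": {"ingest_customers"},
    "build_revenue_report": {"clean_orders", "clean_customers"},
}


def run_order(deps):
    """Return one valid execution order in which dependencies come first."""
    return list(TopologicalSorter(deps).static_order())


def upstream(task, deps):
    """Return every task that must finish before the given task can run."""
    found = set()
    stack = list(deps.get(task, ()))
    while stack:
        current = stack.pop()
        if current not in found:
            found.add(current)
            stack.extend(deps.get(current, ()))
    return found


order = run_order(dependencies)
report_inputs = upstream("build_revenue_report", dependencies)
```

`TopologicalSorter` raises `graphlib.CycleError` if the tasks depend on each other in a loop, which makes it a cheap consistency check for a documented workflow.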
Data Governance Use Cases
Data Classification
Document data classification with sensitivity labels
Access Control Policies
Map access control policies with principal to permission to object flows
Row-Level Security
Document row-level security with filter rules
Column Masking
Show column masking rules with ERD annotations
Audit Requirements
Document audit requirements with compliance targets
Data Retention
Map data retention policies with retention rules
PII Handling
Document PII handling with privacy considerations
GDPR Compliance
Show GDPR compliance approach with privacy decisions
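The principal to permission to object flow described under Access Control Policies can be captured as data and rendered as grant statements. The sketch below assumes a hypothetical `analysts` group and hypothetical object names; the output follows the general `GRANT ... ON ... TO ...` shape used by Unity Catalog SQL, and the exact privilege names should be confirmed against the Databricks documentation.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Policy:
    principal: str    # user, group, or service principal
    privilege: str    # for example SELECT or USE SCHEMA
    object_type: str  # CATALOG, SCHEMA, or TABLE
    object_name: str  # fully qualified securable name


def grant_sql(policy):
    """Render one policy row as a GRANT statement."""
    return (
        f"GRANT {policy.privilege} ON {policy.object_type} "
        f"{policy.object_name} TO `{policy.principal}`"
    )


# Hypothetical policies: read access to one Gold table needs access
# to the enclosing catalog and schema as well as the table itself.
policies = [
    Policy("analysts", "USE CATALOG", "CATALOG", "sales"),
    Policy("analysts", "USE SCHEMA", "SCHEMA", "sales.gold"),
    Policy("analysts", "SELECT", "TABLE", "sales.gold.revenue_by_region"),
]
statements = [grant_sql(p) for p in policies]
```

Each `Policy` row corresponds to one edge in an access-control diagram, so the same list can drive both the diagram and the grants that are reviewed for an audit.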
Snowflake Documentation
NeoArc can also document Snowflake's multi-cluster shared data architecture.
Microsoft Fabric Documentation
Microsoft Fabric unifies data engineering, data science, and business intelligence.
Next Steps
Getting Started with Schemas
Introduction to authoring and organising schema definitions.