NeoArc Studio

Databricks and Lakehouse

Document Databricks and Lakehouse architectures including medallion architecture, Unity Catalog, data lineage, and Delta Lake table schemas.

Databricks and the Lakehouse paradigm combine the scalable storage of data lakes with data warehouse features such as ACID transactions and schema enforcement. NeoArc is well suited to documenting these architectures, using ERDs for table schemas and Graph diagrams for lineage.

Lakehouse Documentation Features

NeoArc features map to Lakehouse concepts.

Documenting Medallion Architecture

Use diagrams to show how data progresses through the Bronze, Silver, and Gold layers.

| Layer | Purpose | Typical content |
|---|---|---|
| Bronze (Raw) | Landing zone for raw data | CDC streams, file loads, API pulls |
| Silver (Cleansed) | Validated and cleaned data | Deduplication, type casting, validation |
| Gold (Business) | Business-ready analytics | Aggregations, KPIs, dimensions |
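To make the layer responsibilities concrete, here is a minimal plain-Python sketch that runs a handful of hypothetical order records through the three layers. It is not NeoArc or Databricks code: the record fields and the `to_bronze`, `to_silver`, and `to_gold` functions are illustrative assumptions, and a real pipeline would use Spark and Delta tables rather than in-memory lists.

```python
from collections import defaultdict

# Hypothetical raw records as they might land from a CDC stream or file load.
RAW = [
    {"order_id": "1", "customer": "acme", "amount": "120.50"},
    {"order_id": "1", "customer": "acme", "amount": "120.50"},  # duplicate delivery
    {"order_id": "2", "customer": "globex", "amount": "80"},
    {"order_id": "3", "customer": "acme", "amount": "not-a-number"},  # fails validation
]

def to_bronze(raw):
    """Bronze: land records unchanged, adding only ingestion metadata."""
    return [dict(r, _source="orders_feed") for r in raw]

def to_silver(bronze):
    """Silver: deduplicate on order_id, cast types, and drop invalid rows."""
    seen, clean = set(), []
    for r in bronze:
        if r["order_id"] in seen:
            continue  # duplicate of a record already accepted
        try:
            amount = float(r["amount"])
        except ValueError:
            continue  # validation failure: row excluded from Silver
        seen.add(r["order_id"])
        clean.append({"order_id": int(r["order_id"]),
                      "customer": r["customer"],
                      "amount": amount})
    return clean

def to_gold(silver):
    """Gold: aggregate into a business-ready KPI (revenue per customer)."""
    totals = defaultdict(float)
    for r in silver:
        totals[r["customer"]] += r["amount"]
    return dict(totals)

bronze = to_bronze(RAW)
silver = to_silver(bronze)
gold = to_gold(silver)
print(gold)
```

A medallion diagram essentially records each layer's contract (what it accepts, what it drops, what it emits); the sketch shows one plausible set of such contracts.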

Documenting Unity Catalog

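Use Graph diagrams to show the catalogue hierarchy: a metastore contains catalogues, each catalogue contains schemas, and each schema contains tables, views, and other objects.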
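Unity Catalog addresses objects with a three-level name, `catalog.schema.object`. The plain-Python sketch below derives the parent-child edges of the hierarchy from a list of such names, which is roughly the information a catalogue hierarchy diagram needs. The object names, the `hierarchy_edges` and `build_tree` helpers, and the single metastore node are hypothetical; the code does not query Unity Catalog.

```python
from collections import defaultdict

# Hypothetical fully qualified names in Unity Catalog's catalog.schema.object form.
OBJECTS = [
    "main.sales.orders_bronze",
    "main.sales.orders_silver",
    "main.finance.revenue_gold",
    "dev.sandbox.scratch",
]

def hierarchy_edges(metastore, names):
    """Return sorted (parent, child) edges: metastore -> catalog -> schema -> object."""
    edges = set()
    for fqn in names:
        # Unpacking raises ValueError if a name is not exactly three levels deep.
        catalog, schema, _obj = fqn.split(".")
        schema_fqn = f"{catalog}.{schema}"
        edges.add((metastore, catalog))
        edges.add((catalog, schema_fqn))
        edges.add((schema_fqn, fqn))
    return sorted(edges)

def build_tree(edges):
    """Group children under their parent, preserving the sorted edge order."""
    tree = defaultdict(list)
    for parent, child in edges:
        tree[parent].append(child)
    return tree

edges = hierarchy_edges("metastore", OBJECTS)
tree = build_tree(edges)
for parent, kids in sorted(tree.items()):
    print(parent, "->", ", ".join(kids))
```

Schemas are keyed by their qualified name (`main.sales`) so that two catalogues containing a schema with the same short name stay distinct in the diagram.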

Data Architecture Use Cases

Medallion Architecture
Document the medallion architecture's Bronze-to-Silver-to-Gold flow
Delta Lake Schemas
Define Delta Lake table schemas with column types, nullability, and constraints
Data Lineage
Map data lineage from source through transformation to target
Unity Catalog Structure
Document the Unity Catalog structure and its catalogue hierarchy
Ingestion Patterns
Show streaming versus batch ingestion with parallel paths
Data Quality Rules
Document data quality rules with validation logic
External Sources
Map external data sources with connection details
Table Partitioning
Document table partitioning strategy
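Lineage diagrams are directed graphs, so a question such as "which external sources feed this Gold table?" reduces to a graph traversal. This plain-Python sketch assumes a hypothetical set of lineage edges in which transform steps are nodes of their own, and walks upstream from a target with a breadth-first search. The dataset and step names are illustrative only; they are not output from Unity Catalog lineage.

```python
from collections import deque

# Hypothetical lineage edges (upstream, downstream). Transform steps are nodes,
# so each path reads source -> transform -> target.
LINEAGE = [
    ("crm_api", "ingest_customers"),
    ("ingest_customers", "bronze.customers"),
    ("bronze.customers", "clean_customers"),
    ("clean_customers", "silver.customers"),
    ("erp_orders_files", "ingest_orders"),
    ("ingest_orders", "bronze.orders"),
    ("bronze.orders", "clean_orders"),
    ("clean_orders", "silver.orders"),
    ("silver.customers", "build_revenue"),
    ("silver.orders", "build_revenue"),
    ("build_revenue", "gold.revenue_by_customer"),
]

def upstream(target, edges):
    """Return every node that feeds `target`, directly or indirectly (BFS)."""
    parents = {}
    for src, dst in edges:
        parents.setdefault(dst, []).append(src)
    found, queue = set(), deque([target])
    while queue:
        node = queue.popleft()
        for p in parents.get(node, []):
            if p not in found:
                found.add(p)
                queue.append(p)
    return found

def root_sources(target, edges):
    """Upstream nodes that have no parents of their own: the external sources."""
    has_parent = {dst for _, dst in edges}
    return sorted(n for n in upstream(target, edges) if n not in has_parent)

print(root_sources("gold.revenue_by_customer", LINEAGE))
```

The same edge list is what a Graph diagram of the lineage would draw, so keeping it as data makes it easy to check the diagram against the pipeline.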

Platform Architecture Use Cases

Workspace Architecture
Document workspace architecture with Databricks components
Cluster Configurations
Show cluster configurations with compute specifications
Notebook Dependencies
Map notebook dependencies with job relationships
Workflow Orchestration
Document workflow orchestration with job sequences
Multi-Workspace Setup
Show multi-workspace setup with workspace connections
Metastore Sharing
Document Unity Catalog metastore sharing across workspaces
Service Principal Access
Map service principal access with permission flows
Network Architecture
Document network architecture with VNet and firewall layout
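Workflow orchestration and notebook dependencies form a dependency graph, and documenting a job usually means recording which tasks can run in parallel and which must wait. As a sketch, assuming a hypothetical set of task names, Python's standard `graphlib.TopologicalSorter` (Python 3.9+) can group tasks into stages. This models the ordering only; it does not call the Databricks Jobs API or reflect how Databricks schedules tasks.

```python
from graphlib import TopologicalSorter

# Hypothetical job tasks, each mapped to the set of tasks it depends on.
TASKS = {
    "ingest_orders": set(),
    "ingest_customers": set(),
    "clean_orders": {"ingest_orders"},
    "clean_customers": {"ingest_customers"},
    "build_revenue": {"clean_orders", "clean_customers"},
    "refresh_dashboard": {"build_revenue"},
}

def stages(tasks):
    """Group tasks into stages; tasks within one stage do not depend on each other."""
    ts = TopologicalSorter(tasks)
    ts.prepare()  # raises graphlib.CycleError if the dependencies contain a cycle
    result = []
    while ts.is_active():
        ready = sorted(ts.get_ready())  # sorted only for stable, readable output
        result.append(ready)
        ts.done(*ready)
    return result

for i, stage in enumerate(stages(TASKS), start=1):
    print(f"stage {i}: {', '.join(stage)}")
```

Failing fast on a cycle is deliberate: a circular job dependency is a design error worth catching before it is drawn into a diagram.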

Data Governance Use Cases

Data Classification
Document data classification with sensitivity labels
Access Control Policies
Map access control policies as principal-to-permission-to-object flows
Row-Level Security
Document row-level security with filter rules
Column Masking
Show column masking rules with ERD annotations
Audit Requirements
Document audit requirements with compliance targets
Data Retention
Map data retention policies with retention rules
PII Handling
Document PII handling with privacy considerations
GDPR Compliance
Show GDPR compliance approach with privacy decisions
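Row-level security and column masking are easier to review when the rules are written down as explicit policy. The sketch below models one hypothetical policy in plain Python: group memberships decide which regions a principal can see, and only members of an unmasked group see raw email addresses. The group names, columns, and masking format are assumptions; in Unity Catalog such rules are defined as row filters and column masks on tables, not in application code like this.

```python
# Hypothetical policy: regions visible per group, and groups allowed to see raw PII.
ROW_FILTER = {"emea_analysts": {"EMEA"}, "global_admins": {"EMEA", "AMER", "APAC"}}
UNMASKED_GROUPS = {"global_admins"}
MASKED_COLUMNS = {"email"}

CUSTOMERS = [
    {"id": 1, "region": "EMEA", "email": "ana@example.com"},
    {"id": 2, "region": "AMER", "email": "bob@example.com"},
    {"id": 3, "region": "EMEA", "email": "cy@example.com"},
]

def mask(value):
    """Keep the domain but hide the local part of an email address."""
    _, _, domain = value.partition("@")
    return "***@" + domain if domain else "***"

def visible_rows(rows, groups):
    """Apply the row filter, then the column masks, for a principal's groups."""
    regions = set().union(*(ROW_FILTER.get(g, set()) for g in groups))
    unmasked = bool(UNMASKED_GROUPS & set(groups))
    out = []
    for row in rows:
        if row["region"] not in regions:
            continue  # row-level security: row hidden from this principal
        shown = dict(row)
        if not unmasked:
            for col in MASKED_COLUMNS:
                shown[col] = mask(shown[col])  # column masking
        out.append(shown)
    return out

print(visible_rows(CUSTOMERS, ["emea_analysts"]))
```

A principal with no matching group sees no rows at all, which is the safer default for a documented policy.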

Snowflake Documentation

Snowflake's multi-cluster shared data architecture can also be documented.

Microsoft Fabric Documentation

Microsoft Fabric unifies data engineering, data science, and business intelligence.

Next Steps

Cloud Platforms
AWS, Azure, and GCP documentation
Data Engineering Tools
dbt, Airflow, and Spark documentation
Getting Started with Schemas
Introduction to authoring and organising schema definitions.