Skip to content

DuckDB Schema

The index is stored in a single DuckDB file (default: ~/.sqlprism/graph.duckdb). Current schema version: 1.0.

Tables (9)

repos

Registered repositories.

Column Type Description
repo_id INTEGER PK Auto-increment ID.
name TEXT UNIQUE Repo name (used in queries to filter).
path TEXT Absolute path to the repo directory.
repo_type TEXT Repo type: sql, dbt, or sqlmesh. Default: sql. Used by reindex_files to select the correct renderer.
last_commit TEXT Git commit hash at last index time.
last_branch TEXT Git branch at last index time.
indexed_at TIMESTAMP When the repo was last indexed.

files

Indexed SQL files. Each file belongs to one repo.

Column Type Description
file_id INTEGER PK Auto-increment ID.
repo_id INTEGER FK to repos.
path TEXT Relative path within the repo.
language TEXT Always sql.
checksum TEXT SHA-256 of file content. Used for incremental indexing.
indexed_at TIMESTAMP When this file was last parsed.

Unique constraint: (repo_id, path).

nodes

SQL entities: tables, views, CTEs, queries.

Column Type Description
node_id INTEGER PK Auto-increment ID.
file_id INTEGER or NULL FK to files. NULL for phantom nodes.
kind TEXT table, view, cte, or query.
name TEXT Entity name (dialect-normalised).
schema TEXT or NULL SQL schema (e.g. bronze, silver, public).
language TEXT Always sql.
line_start INTEGER or NULL First line in source file.
line_end INTEGER or NULL Last line in source file.
metadata JSON or NULL Extra metadata (e.g. sqlmesh_model, dbt_model).

Unique constraint: (file_id, kind, name, schema).

Phantom nodes: When a file is deleted or re-indexed, nodes that are referenced by edges from other files are kept with file_id = NULL rather than deleted. This preserves cross-file relationships. When the node reappears (e.g. the file is re-parsed), the phantom is merged with the new node.

edges

Relationships between nodes.

Column Type Description
edge_id INTEGER PK Auto-increment ID.
source_id INTEGER FK to nodes — the node that contains the reference.
target_id INTEGER FK to nodes — the node being referenced.
relationship TEXT e.g. references, cte_reference.
context TEXT or NULL e.g. FROM clause, JOIN clause.
metadata JSON or NULL Extra edge metadata.

column_usage

Per-column usage tracking across models.

Column Type Description
usage_id INTEGER PK Auto-increment ID.
node_id INTEGER FK to nodes — the CTE/query where the column is used.
table_name TEXT Referenced table name (short name, not fully qualified).
column_name TEXT Column name.
usage_type TEXT select, where, join_on, group_by, order_by, having, partition_by, window_order, insert, update.
alias TEXT or NULL Output alias (for SELECT columns).
transform TEXT or NULL Transform expression (e.g. CAST(x AS DATE), SUM(amount)).
file_id INTEGER FK to files.

column_lineage

End-to-end column lineage chains.

Column Type Description
lineage_id INTEGER PK Auto-increment ID.
file_id INTEGER FK to files.
output_node TEXT Final output entity name (table/view/query).
output_column TEXT Output column name.
chain_index INTEGER Disambiguates multiple lineage paths for the same output column.
hop_index INTEGER Position in the chain (0 = source, increasing toward output).
hop_column TEXT Column name at this hop.
hop_table TEXT Table/CTE name at this hop.
hop_expression TEXT or NULL Transform expression at this hop (e.g. CAST(x AS VARCHAR)).

A lineage chain traces one output column back to its source. Multiple chains (different chain_index values) exist when a column has multiple source paths (e.g. COALESCE of two columns).

columns

Column definitions extracted from DDL or schema files.

Column Type Description
column_id INTEGER PK Auto-increment ID.
node_id INTEGER FK to nodes.
column_name TEXT Column name.
data_type TEXT or NULL Column data type (e.g. INTEGER, TEXT).
position INTEGER Column ordinal position in the table.
source TEXT How the column was discovered: definition, usage, lineage.

conventions

Inferred or overridden project conventions per layer.

Column Type Description
convention_id INTEGER PK Auto-increment ID.
repo_id INTEGER FK to repos.
layer TEXT Layer name (e.g. staging, marts).
convention_type TEXT naming, references, required_columns, or column_style.
payload JSON Convention data (pattern, allowed_targets, etc.).
confidence FLOAT Confidence score (0.0-1.0).
source TEXT inferred or override.
model_count INTEGER Number of models in this layer when inferred.

Unique constraint: (repo_id, layer, convention_type).

semantic_tags

Semantic tags assigned to models by clustering or explicit override.

Column Type Description
tag_id INTEGER PK Auto-increment ID.
repo_id INTEGER FK to repos.
tag_name TEXT Tag name (e.g. customer, order, revenue).
node_id INTEGER FK to nodes.
confidence FLOAT Confidence score (0.0-1.0).
source TEXT inferred, anchor, or explicit.

Unique constraint: (repo_id, tag_name, node_id).

Indexes

CREATE INDEX idx_nodes_name ON nodes(name);
CREATE INDEX idx_nodes_kind ON nodes(kind);
CREATE INDEX idx_nodes_file ON nodes(file_id);
CREATE INDEX idx_nodes_kind_name ON nodes(kind, name);
CREATE INDEX idx_nodes_schema ON nodes(schema);
CREATE INDEX idx_edges_source ON edges(source_id);
CREATE INDEX idx_edges_target ON edges(target_id);
CREATE INDEX idx_edges_relationship ON edges(relationship);
CREATE INDEX idx_col_table ON column_usage(table_name);
CREATE INDEX idx_col_column ON column_usage(column_name);
CREATE INDEX idx_col_table_column ON column_usage(table_name, column_name);
CREATE INDEX idx_col_usage_type ON column_usage(usage_type);
CREATE INDEX idx_lineage_output ON column_lineage(output_node, output_column);
CREATE INDEX idx_lineage_hop ON column_lineage(hop_table, hop_column);
CREATE INDEX idx_lineage_file ON column_lineage(file_id);
CREATE INDEX idx_conventions_repo ON conventions(repo_id);
CREATE INDEX idx_conventions_type ON conventions(convention_type);
CREATE INDEX idx_tags_name ON semantic_tags(tag_name);
CREATE INDEX idx_tags_node ON semantic_tags(node_id);
CREATE INDEX idx_tags_repo ON semantic_tags(repo_id);

Entity Relationship

repos 1──* files 1──* nodes
                       │
                  source_id / target_id
                       │
                      edges

files 1──* column_usage *──1 nodes
files 1──* column_lineage
nodes 1──* columns
repos 1──* conventions
repos 1──* semantic_tags *──1 nodes