DuckDB Schema¶

The index is stored in a single DuckDB file (default: ~/.sqlprism/graph.duckdb). Current schema version: 1.0.

Tables (9)¶

`repos`¶

Registered repositories.

Column	Type	Description
`repo_id`	INTEGER PK	Auto-increment ID.
`name`	TEXT UNIQUE	Repo name (used in queries to filter).
`path`	TEXT	Absolute path to the repo directory.
`repo_type`	TEXT	Repo type: `sql`, `dbt`, or `sqlmesh`. Default: `sql`. Used by `reindex_files` to select the correct renderer.
`last_commit`	TEXT	Git commit hash at last index time.
`last_branch`	TEXT	Git branch at last index time.
`indexed_at`	TIMESTAMP	When the repo was last indexed.

`files`¶

Indexed SQL files. Each file belongs to one repo.

Column	Type	Description
`file_id`	INTEGER PK	Auto-increment ID.
`repo_id`	INTEGER	FK to `repos`.
`path`	TEXT	Relative path within the repo.
`language`	TEXT	Always `sql`.
`checksum`	TEXT	SHA-256 of file content. Used for incremental indexing.
`indexed_at`	TIMESTAMP	When this file was last parsed.

Unique constraint: (repo_id, path).

`nodes`¶

SQL entities: tables, views, CTEs, queries.

Column	Type	Description
`node_id`	INTEGER PK	Auto-increment ID.
`file_id`	INTEGER or NULL	FK to `files`. NULL for phantom nodes.
`kind`	TEXT	`table`, `view`, `cte`, or `query`.
`name`	TEXT	Entity name (dialect-normalised).
`schema`	TEXT or NULL	SQL schema (e.g. `bronze`, `silver`, `public`).
`language`	TEXT	Always `sql`.
`line_start`	INTEGER or NULL	First line in source file.
`line_end`	INTEGER or NULL	Last line in source file.
`metadata`	JSON or NULL	Extra metadata (e.g. `sqlmesh_model`, `dbt_model`).

Unique constraint: (file_id, kind, name, schema).

Phantom nodes: When a file is deleted or re-indexed, nodes that are referenced by edges from other files are kept with file_id = NULL rather than deleted. This preserves cross-file relationships. When the node reappears (e.g. the file is re-parsed), the phantom is merged with the new node.

`edges`¶

Relationships between nodes.

Column	Type	Description
`edge_id`	INTEGER PK	Auto-increment ID.
`source_id`	INTEGER	FK to `nodes` — the node that contains the reference.
`target_id`	INTEGER	FK to `nodes` — the node being referenced.
`relationship`	TEXT	e.g. `references`, `cte_reference`.
`context`	TEXT or NULL	e.g. `FROM clause`, `JOIN clause`.
`metadata`	JSON or NULL	Extra edge metadata.

`column_usage`¶

Per-column usage tracking across models.

Column	Type	Description
`usage_id`	INTEGER PK	Auto-increment ID.
`node_id`	INTEGER	FK to `nodes` — the CTE/query where the column is used.
`table_name`	TEXT	Referenced table name (short name, not fully qualified).
`column_name`	TEXT	Column name.
`usage_type`	TEXT	`select`, `where`, `join_on`, `group_by`, `order_by`, `having`, `partition_by`, `window_order`, `insert`, `update`.
`alias`	TEXT or NULL	Output alias (for SELECT columns).
`transform`	TEXT or NULL	Transform expression (e.g. `CAST(x AS DATE)`, `SUM(amount)`).
`file_id`	INTEGER	FK to `files`.

`column_lineage`¶

End-to-end column lineage chains.

Column	Type	Description
`lineage_id`	INTEGER PK	Auto-increment ID.
`file_id`	INTEGER	FK to `files`.
`output_node`	TEXT	Final output entity name (table/view/query).
`output_column`	TEXT	Output column name.
`chain_index`	INTEGER	Disambiguates multiple lineage paths for the same output column.
`hop_index`	INTEGER	Position in the chain (0 = source, increasing toward output).
`hop_column`	TEXT	Column name at this hop.
`hop_table`	TEXT	Table/CTE name at this hop.
`hop_expression`	TEXT or NULL	Transform expression at this hop (e.g. `CAST(x AS VARCHAR)`).

A lineage chain traces one output column back to its source. Multiple chains (different chain_index values) exist when a column has multiple source paths (e.g. COALESCE of two columns).

`columns`¶

Column definitions extracted from DDL or schema files.

Column	Type	Description
`column_id`	INTEGER PK	Auto-increment ID.
`node_id`	INTEGER	FK to `nodes`.
`column_name`	TEXT	Column name.
`data_type`	TEXT or NULL	Column data type (e.g. `INTEGER`, `TEXT`).
`position`	INTEGER	Column ordinal position in the table.
`source`	TEXT	How the column was discovered: `definition`, `usage`, `lineage`.

`conventions`¶

Inferred or overridden project conventions per layer.

Column	Type	Description
`convention_id`	INTEGER PK	Auto-increment ID.
`repo_id`	INTEGER	FK to `repos`.
`layer`	TEXT	Layer name (e.g. `staging`, `marts`).
`convention_type`	TEXT	`naming`, `references`, `required_columns`, or `column_style`.
`payload`	JSON	Convention data (pattern, allowed_targets, etc.).
`confidence`	FLOAT	Confidence score (0.0-1.0).
`source`	TEXT	`inferred` or `override`.
`model_count`	INTEGER	Number of models in this layer when inferred.

Unique constraint: (repo_id, layer, convention_type).

`semantic_tags`¶

Semantic tags assigned to models by clustering or explicit override.

Column	Type	Description
`tag_id`	INTEGER PK	Auto-increment ID.
`repo_id`	INTEGER	FK to `repos`.
`tag_name`	TEXT	Tag name (e.g. `customer`, `order`, `revenue`).
`node_id`	INTEGER	FK to `nodes`.
`confidence`	FLOAT	Confidence score (0.0-1.0).
`source`	TEXT	`inferred`, `anchor`, or `explicit`.

Unique constraint: (repo_id, tag_name, node_id).

Indexes¶

CREATE INDEX idx_nodes_name ON nodes(name);
CREATE INDEX idx_nodes_kind ON nodes(kind);
CREATE INDEX idx_nodes_file ON nodes(file_id);
CREATE INDEX idx_nodes_kind_name ON nodes(kind, name);
CREATE INDEX idx_nodes_schema ON nodes(schema);
CREATE INDEX idx_edges_source ON edges(source_id);
CREATE INDEX idx_edges_target ON edges(target_id);
CREATE INDEX idx_edges_relationship ON edges(relationship);
CREATE INDEX idx_col_table ON column_usage(table_name);
CREATE INDEX idx_col_column ON column_usage(column_name);
CREATE INDEX idx_col_table_column ON column_usage(table_name, column_name);
CREATE INDEX idx_col_usage_type ON column_usage(usage_type);
CREATE INDEX idx_lineage_output ON column_lineage(output_node, output_column);
CREATE INDEX idx_lineage_hop ON column_lineage(hop_table, hop_column);
CREATE INDEX idx_lineage_file ON column_lineage(file_id);
CREATE INDEX idx_conventions_repo ON conventions(repo_id);
CREATE INDEX idx_conventions_type ON conventions(convention_type);
CREATE INDEX idx_tags_name ON semantic_tags(tag_name);
CREATE INDEX idx_tags_node ON semantic_tags(node_id);
CREATE INDEX idx_tags_repo ON semantic_tags(repo_id);

Entity Relationship¶

repos 1──* files 1──* nodes
                       │
                  source_id / target_id
                       │
                      edges

files 1──* column_usage *──1 nodes
files 1──* column_lineage
nodes 1──* columns
repos 1──* conventions
repos 1──* semantic_tags *──1 nodes

DuckDB Schema¶

Tables (9)¶

repos¶

files¶

nodes¶

edges¶

column_usage¶

column_lineage¶

columns¶

conventions¶

semantic_tags¶