Types
Data structures shared across the codebase. Parsers produce these, the indexer consumes them.
Shared data types for the SQL indexer.
These dataclasses define the contract between parsers and the indexer orchestrator. Every language parser returns a ParseResult. The orchestrator consumes ParseResults and writes to DuckDB. Parsers never touch the database. The orchestrator never does language-specific parsing.
NodeResult
dataclass
NodeResult(
    kind,
    name,
    line_start=None,
    line_end=None,
    metadata=None,
)
A nameable entity found in a file.
Nodes are the universal unit of the knowledge graph. A node is anything a parser identifies as structurally meaningful: a table, view, CTE, function, class, module, API endpoint, Terraform resource, etc.
The kind field is parser-defined and unconstrained: each language emits whatever kinds are meaningful for it.
Attributes:
| Name | Type | Description |
|---|---|---|
| kind | str | Entity type (e.g. |
| name | str | Unqualified entity name (e.g. |
| line_start | int \| None | First line in the source file, or |
| line_end | int \| None | Last line in the source file, or |
| metadata | dict \| None | Arbitrary parser-supplied metadata (schema, dialect, filters, etc.). |
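A minimal sketch of how two different parsers might emit nodes. The dataclass below is a stand-in that mirrors the documented signature so the example runs on its own; it is not the library class. The kind strings, names, and metadata keys are illustrative assumptions, not values the library prescribes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeResult:
    """Stand-in mirroring the documented signature (not the library class)."""

    kind: str
    name: str
    line_start: int | None = None
    line_end: int | None = None
    metadata: dict | None = None


# Illustrative kinds only: each parser chooses its own vocabulary.
sql_node = NodeResult(
    kind="table",
    name="orders",
    line_start=1,
    line_end=12,
    metadata={"schema": "analytics"},
)
tf_node = NodeResult(kind="terraform_resource", name="aws_s3_bucket.logs")

print(sql_node.kind, sql_node.name, sql_node.metadata)
print(tf_node.kind, tf_node.line_start)
```

Only kind and name are required; a parser that cannot determine line ranges or has no extra metadata simply leaves the remaining fields at None.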
EdgeResult
dataclass
EdgeResult(
    source_name,
    source_kind,
    target_name,
    target_kind,
    relationship,
    context=None,
    metadata=None,
)
A relationship between two entities.
Edges reference nodes by (name, kind) pairs, not database IDs. The indexer orchestrator resolves these pairs to node IDs during insertion, so parsers don't need to know about the database and parse order doesn't matter.
The target may be in another file or even another repo. If unresolved at insert time, the orchestrator creates a phantom node.
Attributes:
| Name | Type | Description |
|---|---|---|
| source_name | str | Name of the source node. |
| source_kind | str | Kind of the source node (e.g. |
| target_name | str | Name of the target node. |
| target_kind | str | Kind of the target node (e.g. |
| relationship | str | Edge label (e.g. |
| context | str \| None | Human-readable context (e.g. |
| metadata | dict \| None | Arbitrary edge metadata (source_schema, target_schema, etc.). |
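The resolution step described above could look roughly like the sketch below. resolve_edges is a hypothetical helper, not the orchestrator's actual code: it assumes integer node IDs held in a plain dict and allocates a new ID for any endpoint it has not seen, which stands in for phantom-node creation. The EdgeResult class is a stand-in mirroring the documented signature, and the "reads_from" label is illustrative.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EdgeResult:
    """Stand-in mirroring the documented signature (not the library class)."""

    source_name: str
    source_kind: str
    target_name: str
    target_kind: str
    relationship: str
    context: str | None = None
    metadata: dict | None = None


def resolve_edges(node_ids, edges):
    """Hypothetical resolver: map (name, kind) pairs to integer node IDs.

    node_ids maps (name, kind) -> id for nodes that are already known.
    Endpoints that are missing get a newly allocated ID and are reported
    as phantoms.
    """
    ids = dict(node_ids)  # copy so the caller's mapping is not modified
    phantoms = []
    next_id = max(ids.values(), default=0) + 1
    resolved = []
    for edge in edges:
        endpoint_ids = []
        for key in (
            (edge.source_name, edge.source_kind),
            (edge.target_name, edge.target_kind),
        ):
            if key not in ids:
                ids[key] = next_id
                phantoms.append(key)
                next_id += 1
            endpoint_ids.append(ids[key])
        resolved.append((endpoint_ids[0], endpoint_ids[1], edge.relationship))
    return resolved, phantoms


known = {("daily_sales", "view"): 1}
edges = [EdgeResult("daily_sales", "view", "orders", "table", "reads_from")]
resolved, phantoms = resolve_edges(known, edges)
print(resolved)
print(phantoms)
```

Because the lookup key is the (name, kind) pair, the file that defines orders can be parsed before or after the file that references it without changing the result.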
ColumnUsageResult
dataclass
ColumnUsageResult(
    node_name,
    node_kind,
    table_name,
    column_name,
    usage_type,
    alias=None,
    transform=None,
)
SQL-specific: column-level lineage from sqlglot.
Records which columns are used where and how. Only the SQL parser populates these; all other parsers return an empty list.
This data is stored in a separate table from edges because column usage is high-volume with its own query patterns (flat scans, not graph traversals).
Attributes:
| Name | Type | Description |
|---|---|---|
| node_name | str | Name of the query/CTE/view that uses this column. |
| node_kind | str | Kind of the owning node (e.g. |
| table_name | str | Source table the column belongs to. |
| column_name | str | Column name ( |
| usage_type | str | How the column is used. One of |
| alias | str \| None | Output alias if the column is aliased ( |
| transform | str \| None | Wrapping expression, e.g. |
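To illustrate the "flat scan" query pattern mentioned above, here is a small sketch that filters a list of usage records in memory. The dataclass is a stand-in mirroring the documented signature, nodes_using is a hypothetical helper, and the usage_type values ("select", "where") are placeholders, since the permitted values are not listed here.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class ColumnUsageResult:
    """Stand-in mirroring the documented signature (not the library class)."""

    node_name: str
    node_kind: str
    table_name: str
    column_name: str
    usage_type: str
    alias: str | None = None
    transform: str | None = None


# usage_type values below are placeholders, not the library's vocabulary.
usages = [
    ColumnUsageResult("daily_sales", "view", "orders", "amount", "select",
                      alias="total", transform="SUM(amount)"),
    ColumnUsageResult("daily_sales", "view", "orders", "status", "where"),
    ColumnUsageResult("refunds", "view", "orders", "amount", "select"),
]


def nodes_using(usages, table_name, column_name):
    """Flat scan: nodes that touch table_name.column_name, in first-seen order."""
    seen = []
    for usage in usages:
        if (
            usage.table_name == table_name
            and usage.column_name == column_name
            and usage.node_name not in seen
        ):
            seen.append(usage.node_name)
    return seen


print(nodes_using(usages, "orders", "amount"))
```

The query is a filter over independent rows with no edge traversal, which is the reason given above for keeping column usage in its own table.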
LineageHop
dataclass
LineageHop(column, table, expression=None)
One hop in a column lineage chain.
Attributes:
| Name | Type | Description |
|---|---|---|
| column | str | Column name at this hop. |
| table | str | Table, CTE, or subquery name at this hop. |
| expression | str \| None | Transform applied at this hop (e.g. |
ColumnLineageResult
dataclass
ColumnLineageResult(
    output_column, output_node, chain=list()
)
End-to-end column lineage through CTEs and subqueries.
Traces an output column back to its source table column(s), recording each intermediate hop (CTE, subquery, transform).
Attributes:
| Name | Type | Description |
|---|---|---|
| output_column | str | Column name in the final output. |
| output_node | str | The query, table, or view that produces this column. |
| chain | list[LineageHop] | Ordered hops from output back to source. |
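A sketch of what one lineage chain might look like for an aggregated column that passes through a CTE. Both dataclasses are stand-ins mirroring the documented signatures; the default_factory for chain is an assumption based on the chain=list() default shown in the signature. The table and column names, the assumption that the first hop is the output column itself, and the helpers source_of and describe are all hypothetical.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LineageHop:
    """Stand-in mirroring the documented signature (not the library class)."""

    column: str
    table: str
    expression: str | None = None


@dataclass
class ColumnLineageResult:
    """Stand-in; default_factory is assumed from the chain=list() default."""

    output_column: str
    output_node: str
    chain: list[LineageHop] = field(default_factory=list)


lineage = ColumnLineageResult(
    output_column="total",
    output_node="daily_sales",
    chain=[
        # Hops run from the output back towards the source table.
        LineageHop("total", "daily_sales", expression="SUM(amount)"),
        LineageHop("amount", "cleaned_orders"),
        LineageHop("amount", "orders"),
    ],
)


def source_of(lineage):
    """Return (table, column) of the final hop, or None for an empty chain."""
    if not lineage.chain:
        return None
    last = lineage.chain[-1]
    return (last.table, last.column)


def describe(lineage):
    """Render the chain as 'table.column <- table.column <- ...'."""
    return " <- ".join(f"{hop.table}.{hop.column}" for hop in lineage.chain)


print(source_of(lineage))
print(describe(lineage))
```

Since the chain is ordered from output back to source, the original source column is always the last element.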
ColumnDefResult
dataclass
ColumnDefResult(
    node_name,
    column_name,
    data_type=None,
    position=None,
    source="definition",
    description=None,
)
Column definition metadata extracted from SQL or schema files.
Records column-level metadata for tables and views, including the column's data type, ordinal position, provenance, and optional description. Parsers emit these alongside nodes and edges so the indexer can build a column-level catalogue.
Attributes:
| Name | Type | Description |
|---|---|---|
| node_name | str | The table or view this column belongs to. |
| column_name | str | Column name as declared. |
| data_type | str \| None | SQL data type (e.g. |
| position | int \| None | Ordinal position in the column list (0-based), or |
| source | Literal['definition', 'inferred', 'schema_yml', 'sqlmesh_schema'] | How this column was discovered. One of |
| description | str \| None | Human-readable column description, or |
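As a rough illustration, a parser that has already extracted (name, type) pairs from a table definition could build the column records as follows. The dataclass is a stand-in mirroring the documented signature, columns_from_ddl is a hypothetical helper, and the table, column names, and data types are made up for the example.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal


@dataclass
class ColumnDefResult:
    """Stand-in mirroring the documented signature (not the library class)."""

    node_name: str
    column_name: str
    data_type: str | None = None
    position: int | None = None
    source: Literal[
        "definition", "inferred", "schema_yml", "sqlmesh_schema"
    ] = "definition"
    description: str | None = None


def columns_from_ddl(node_name, declared):
    """Hypothetical helper: turn (name, type) pairs into column records.

    Positions are 0-based, matching the documented convention.
    """
    return [
        ColumnDefResult(node_name, name, data_type=dtype, position=index)
        for index, (name, dtype) in enumerate(declared)
    ]


cols = columns_from_ddl("orders", [("id", "INT"), ("amount", "DECIMAL(10, 2)")])
for col in cols:
    print(col.position, col.column_name, col.data_type, col.source)
```

Columns recovered by other means would set source to one of the other three literals instead of relying on the "definition" default.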
ParseResult
dataclass
ParseResult(
    language,
    nodes=list(),
    edges=list(),
    column_usage=list(),
    column_lineage=list(),
    columns=list(),
    errors=list(),
)
Everything a parser returns for one file.
This is the complete interface contract. A parser receives a file path and its content, and returns one of these. The orchestrator handles everything from there: ID assignment, edge resolution, and database writes.
Mutation contract
ParseResult is intentionally mutable (not frozen=True). Renderers and post-processing steps mutate nodes, edges, and the other lists in place, e.g. appending synthetic nodes, deduplicating edges, or rewriting names during normalisation. This is by design: allocating a new ParseResult for every transform would add complexity with no practical benefit, since a ParseResult is owned by a single file-processing pipeline and is never shared across threads.
Attributes:
| Name | Type | Description |
|---|---|---|
| language | str | Parser language identifier (e.g. |
| nodes | list[NodeResult] | Entities discovered in the file. |
| edges | list[EdgeResult] | Relationships between entities. |
| column_usage | list[ColumnUsageResult] | Column-level usage records (SQL only). |
| column_lineage | list[ColumnLineageResult] | End-to-end column lineage chains (SQL only). |
| columns | list[ColumnDefResult] | Column definitions extracted from DDL or schema files. |
| errors | list[str] | Non-fatal parse errors encountered during processing. |
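The contract and the mutation rules above can be sketched with a toy parser and one in-place post-processing step. Everything here is an assumption for illustration: the stand-in classes only mirror the documented fields (NodeResult is trimmed to kind and name), parse_file and dedupe_nodes are hypothetical, the "sql" language identifier is assumed, and the CREATE TABLE matching is far simpler than real SQL parsing.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class NodeResult:
    """Trimmed stand-in: only the two required fields."""

    kind: str
    name: str


@dataclass
class ParseResult:
    """Stand-in mirroring the documented fields (not the library class)."""

    language: str
    nodes: list[NodeResult] = field(default_factory=list)
    edges: list = field(default_factory=list)
    column_usage: list = field(default_factory=list)
    column_lineage: list = field(default_factory=list)
    columns: list = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def parse_file(path: str, content: str) -> ParseResult:
    """Toy parser: one node per line that starts with CREATE TABLE <name>."""
    result = ParseResult(language="sql")
    for lineno, line in enumerate(content.splitlines(), start=1):
        words = line.split()
        if [w.upper() for w in words[:2]] == ["CREATE", "TABLE"]:
            if len(words) < 3:
                # Non-fatal: record the problem and keep going.
                result.errors.append(f"{path}:{lineno}: missing table name")
                continue
            name = words[2].rstrip("(;")
            result.nodes.append(NodeResult(kind="table", name=name))
    return result


def dedupe_nodes(result: ParseResult) -> None:
    """Post-processing step that rewrites result.nodes in place."""
    seen = set()
    unique = []
    for node in result.nodes:
        key = (node.name, node.kind)
        if key not in seen:
            seen.add(key)
            unique.append(node)
    result.nodes[:] = unique  # slice assignment keeps the same list object


content = "CREATE TABLE orders (\nCREATE TABLE orders (\nCREATE TABLE"
result = parse_file("schema.sql", content)
dedupe_nodes(result)
print([node.name for node in result.nodes])
print(result.errors)
```

The parser never opens a database connection, and the deduplication step returns None because it edits the existing ParseResult rather than building a new one.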
parse_repo_config
parse_repo_config(cfg, global_dialect=None)
Parse a repo config value into (path, dialect, dialect_overrides).
Supports both simple string paths and full config dicts:
    "my-repo": "/path/to/repo"
    "my-repo": {"path": "/path", "dialect": "starrocks",
                "dialect_overrides": {"athena/": "athena"}}
Source code in src/sqlprism/types.py
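The function body is not reproduced on this page, so the following is only a sketch of the behaviour described for parse_repo_config, written under assumptions: that a bare string inherits global_dialect, that a missing "dialect" key also falls back to global_dialect, and that missing overrides come back as an empty dict. The name parse_repo_config_sketch is deliberately different from the real function.

```python
def parse_repo_config_sketch(cfg, global_dialect=None):
    """Hypothetical re-implementation, not the library's source.

    Returns a (path, dialect, dialect_overrides) tuple.
    """
    if isinstance(cfg, str):
        # Simple form: the value is just the repo path.
        return cfg, global_dialect, {}
    # Full form: a dict with "path" and optional dialect settings.
    dialect = cfg.get("dialect", global_dialect)
    overrides = dict(cfg.get("dialect_overrides") or {})
    return cfg["path"], dialect, overrides


print(parse_repo_config_sketch("/path/to/repo"))
print(
    parse_repo_config_sketch(
        {
            "path": "/path",
            "dialect": "starrocks",
            "dialect_overrides": {"athena/": "athena"},
        }
    )
)
```

Normalising both forms to the same tuple lets callers treat every repo entry identically, whichever shape the user wrote.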