Types

Data structures shared across the codebase: parsers produce them, and the indexer consumes them.

Shared data types for the SQL indexer.

These dataclasses define the contract between parsers and the indexer orchestrator. Every language parser returns a ParseResult. The orchestrator consumes ParseResults and writes to DuckDB. Parsers never touch the database. The orchestrator never does language-specific parsing.

NodeResult `dataclass`

NodeResult(
    kind,
    name,
    line_start=None,
    line_end=None,
    metadata=None,
)

A nameable entity found in a file.

Nodes are the universal unit of the knowledge graph. A node is anything a parser identifies as structurally meaningful: a table, view, CTE, function, class, module, API endpoint, Terraform resource, etc.

The `kind` field is parser-defined and unconstrained: each language emits whatever kinds are meaningful for it.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `kind` | `str` | Entity type (e.g. `"table"`, `"view"`, `"cte"`). |
| `name` | `str` | Unqualified entity name (e.g. `"orders"`). |
| `line_start` | `int \| None` | First line in the source file, or `None` if unknown. |
| `line_end` | `int \| None` | Last line in the source file, or `None` if unknown. |
| `metadata` | `dict \| None` | Arbitrary parser-supplied metadata (schema, dialect, filters, etc.). |
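To make the shape concrete, here is a minimal sketch that mirrors the signature and attribute table above so it runs on its own; in a real project you would presumably import `NodeResult` from `sqlprism.types` instead. Whether the actual class is frozen or uses slots is not stated here, so this mirror is an assumption, and the node values are invented for illustration.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class NodeResult:
    """Mirror of the documented signature (assumption, not the library source)."""

    kind: str
    name: str
    line_start: int | None = None
    line_end: int | None = None
    metadata: dict | None = None


# A table defined on lines 3-10 of a SQL file, with its schema kept in metadata.
orders = NodeResult(
    kind="table",
    name="orders",
    line_start=3,
    line_end=10,
    metadata={"schema": "sales"},
)

# Kinds are free-form strings, so a non-SQL parser can emit its own.
endpoint = NodeResult(kind="api_endpoint", name="GET /orders")
```

Because `kind` is a plain string, consumers of the graph should treat kinds they do not recognise as opaque labels.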

EdgeResult `dataclass`

EdgeResult(
    source_name,
    source_kind,
    target_name,
    target_kind,
    relationship,
    context=None,
    metadata=None,
)

A relationship between two entities.

Edges reference nodes by (name, kind) pairs, not database IDs. The indexer orchestrator resolves these to node IDs during insertion. This means parsers don't need to know about the database and parse order doesn't matter.

The target may be in another file or even another repo. If unresolved at insert time, the orchestrator creates a phantom node.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `source_name` | `str` | Name of the source node. |
| `source_kind` | `str` | Kind of the source node (e.g. `"query"`). |
| `target_name` | `str` | Name of the target node. |
| `target_kind` | `str` | Kind of the target node (e.g. `"table"`). |
| `relationship` | `str` | Edge label (e.g. `"references"`, `"defines"`, `"inserts_into"`, `"cte_references"`). |
| `context` | `str \| None` | Human-readable context (e.g. `"FROM clause"`, `"JOIN clause"`). |
| `metadata` | `dict \| None` | Arbitrary edge metadata (source_schema, target_schema, etc.). |
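Resolution by `(name, kind)` pairs can be sketched as follows. The `EdgeResult` mirror follows the documented signature; `resolve_edges`, its in-memory ID map, and the sequential ID allocation are hypothetical stand-ins for the orchestrator's DuckDB-backed logic, which this page does not show. Endpoints that are not yet known receive a fresh ID and are reported as phantom nodes.

```python
from __future__ import annotations

from dataclasses import dataclass


@dataclass
class EdgeResult:
    """Mirror of the documented signature (assumption, not the library source)."""

    source_name: str
    source_kind: str
    target_name: str
    target_kind: str
    relationship: str
    context: str | None = None
    metadata: dict | None = None


def resolve_edges(
    edges: list[EdgeResult],
    node_ids: dict[tuple[str, str], int],
) -> tuple[list[tuple[int, int, str]], set[tuple[str, str]]]:
    """Hypothetical resolver: map (name, kind) endpoints to integer IDs.

    Endpoints missing from node_ids get a new ID and are recorded as
    phantoms, so an edge may point at a node from another file or repo.
    """
    phantoms: set[tuple[str, str]] = set()
    resolved: list[tuple[int, int, str]] = []
    for edge in edges:
        ids = []
        for key in ((edge.source_name, edge.source_kind),
                    (edge.target_name, edge.target_kind)):
            if key not in node_ids:
                node_ids[key] = len(node_ids) + 1  # allocate a phantom ID
                phantoms.add(key)
            ids.append(node_ids[key])
        resolved.append((ids[0], ids[1], edge.relationship))
    return resolved, phantoms


known = {("daily_revenue", "query"): 1, ("orders", "table"): 2}
edges = [
    EdgeResult("daily_revenue", "query", "orders", "table",
               "references", context="FROM clause"),
    EdgeResult("daily_revenue", "query", "customers", "table",
               "references", context="JOIN clause"),
]
rows, phantoms = resolve_edges(edges, known)
```

Keying on the pair rather than on the name alone keeps, for example, a table and a CTE that share a name as distinct nodes.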

ColumnUsageResult `dataclass`

ColumnUsageResult(
    node_name,
    node_kind,
    table_name,
    column_name,
    usage_type,
    alias=None,
    transform=None,
)

SQL-specific: column-level lineage from sqlglot.

Records which columns are used where, and how. Only the SQL parser populates these; all other parsers return an empty list.

This data is stored in a separate table from edges because column usage is high-volume and has its own query patterns (flat scans rather than graph traversals).

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `node_name` | `str` | Name of the query/CTE/view that uses this column. |
| `node_kind` | `str` | Kind of the owning node (e.g. `"query"`, `"cte"`). |
| `table_name` | `str` | Source table the column belongs to. |
| `column_name` | `str` | Column name (`"*"` for `SELECT *`). |
| `usage_type` | `str` | How the column is used. One of `"select"`, `"where"`, `"join_on"`, `"group_by"`, `"order_by"`, `"having"`, `"insert"`, `"update"`, `"partition_by"`, `"window_order"`, `"qualify"`. |
| `alias` | `str \| None` | Output alias if the column is aliased (`AS name`). |
| `transform` | `str \| None` | Wrapping expression, e.g. `"CAST(a.updated AS DATETIME)"`. |
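As an illustration, these are records a SQL parser could plausibly emit for one small query. The class mirrors the documented signature; the `__post_init__` check against the documented usage types is an addition made only for this sketch, and the query name `paid_orders` is hypothetical. The final list comprehension is the kind of flat scan the separate table is meant to serve.

```python
from __future__ import annotations

from dataclasses import dataclass

# The usage types listed in the attribute table above.
USAGE_TYPES = frozenset({
    "select", "where", "join_on", "group_by", "order_by", "having",
    "insert", "update", "partition_by", "window_order", "qualify",
})


@dataclass
class ColumnUsageResult:
    """Mirror of the documented signature (assumption, not the library source)."""

    node_name: str
    node_kind: str
    table_name: str
    column_name: str
    usage_type: str
    alias: str | None = None
    transform: str | None = None

    def __post_init__(self) -> None:
        # Validation added for this sketch; the documented class may not check.
        if self.usage_type not in USAGE_TYPES:
            raise ValueError(f"unknown usage_type: {self.usage_type!r}")


# Records for a hypothetical query named "paid_orders":
#   SELECT o.id, CAST(o.amount AS DECIMAL) AS amount_dec
#   FROM orders o WHERE o.status = 'paid'
usage = [
    ColumnUsageResult("paid_orders", "query", "orders", "id", "select"),
    ColumnUsageResult("paid_orders", "query", "orders", "amount", "select",
                      alias="amount_dec",
                      transform="CAST(o.amount AS DECIMAL)"),
    ColumnUsageResult("paid_orders", "query", "orders", "status", "where"),
]

# Flat scan: which columns of `orders` are used as filters?
filter_columns = [u.column_name for u in usage
                  if u.table_name == "orders" and u.usage_type == "where"]
```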

LineageHop `dataclass`

LineageHop(column, table, expression=None)

One hop in a column lineage chain.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `column` | `str` | Column name at this hop. |
| `table` | `str` | Table, CTE, or subquery name at this hop. |
| `expression` | `str \| None` | Transform applied at this hop (e.g. `"CAST(amount AS DECIMAL)"`), or `None` if the column passes through unchanged. |

ColumnLineageResult `dataclass`

ColumnLineageResult(
    output_column, output_node, chain=list()
)

End-to-end column lineage through CTEs and subqueries.

Traces an output column back to its source table column(s), recording each intermediate hop (CTE, subquery, transform).

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `output_column` | `str` | Column name in the final output. |
| `output_node` | `str` | The query, table, or view that produces this column. |
| `chain` | `list[LineageHop]` | Ordered hops from output back to source. |
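The sketch below mirrors `LineageHop` and `ColumnLineageResult` and builds a chain for an output column that passes through one CTE. It assumes the rendered `chain=list()` default corresponds to `field(default_factory=list)`, so instances never share a list. The query, the node name `revenue_report`, and the exact hops shown are illustrative, since this page does not specify precisely which hops the SQL parser records.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class LineageHop:
    """Mirror of the documented signature (assumption, not the library source)."""

    column: str
    table: str
    expression: str | None = None


@dataclass
class ColumnLineageResult:
    """Mirror of the documented signature (assumption, not the library source)."""

    output_column: str
    output_node: str
    # Assumed default_factory=list, so each instance gets its own list.
    chain: list[LineageHop] = field(default_factory=list)


# Lineage of `total` in a hypothetical query "revenue_report":
#   WITH converted AS (
#       SELECT CAST(amount AS DECIMAL) AS amount FROM raw_orders
#   )
#   SELECT amount AS total FROM converted
lineage = ColumnLineageResult(
    output_column="total",
    output_node="revenue_report",
    chain=[
        LineageHop("amount", "converted", "CAST(amount AS DECIMAL)"),
        LineageHop("amount", "raw_orders"),  # source column, no transform
    ],
)

# Hops run from output back to source, so the last hop is the origin.
origin = lineage.chain[-1]
transforms = [hop.expression for hop in lineage.chain if hop.expression]
```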

ColumnDefResult `dataclass`

ColumnDefResult(
    node_name,
    column_name,
    data_type=None,
    position=None,
    source="definition",
    description=None,
)

Column definition metadata extracted from SQL or schema files.

Records column-level metadata for tables and views, including the column's data type, ordinal position, provenance, and optional description. Parsers emit these alongside nodes and edges so the indexer can build a column-level catalogue.

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `node_name` | `str` | The table or view this column belongs to. |
| `column_name` | `str` | Column name as declared. |
| `data_type` | `str \| None` | SQL data type (e.g. `"VARCHAR"`, `"INT"`), or `None` if unknown. |
| `position` | `int \| None` | Ordinal position in the column list (0-based), or `None` if unavailable. |
| `source` | `Literal['definition', 'inferred', 'schema_yml', 'sqlmesh_schema']` | How this column was discovered. One of `"definition"` (from CREATE/ALTER DDL), `"inferred"` (from SELECT output), `"schema_yml"` (from dbt schema.yml), `"sqlmesh_schema"` (from sqlmesh model schema). |
| `description` | `str \| None` | Human-readable column description, or `None`. |
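A minimal sketch of how DDL and a dbt `schema.yml` entry might each produce `ColumnDefResult` records, using a mirror of the documented signature with the `source` values expressed as a `Literal` alias. The table, columns, and description are invented, and how the indexer merges records for the same column from different sources is not covered on this page.

```python
from __future__ import annotations

from dataclasses import dataclass
from typing import Literal

ColumnSource = Literal["definition", "inferred", "schema_yml", "sqlmesh_schema"]


@dataclass
class ColumnDefResult:
    """Mirror of the documented signature (assumption, not the library source)."""

    node_name: str
    column_name: str
    data_type: str | None = None
    position: int | None = None
    source: ColumnSource = "definition"
    description: str | None = None


# Columns from: CREATE TABLE orders (id INT, amount DECIMAL(10, 2))
ddl_columns = [("id", "INT"), ("amount", "DECIMAL(10, 2)")]
columns = [
    ColumnDefResult("orders", name, data_type=dtype, position=i)
    for i, (name, dtype) in enumerate(ddl_columns)  # positions are 0-based
]

# A schema.yml entry can describe the same column without declaring a type.
columns.append(ColumnDefResult(
    "orders", "amount", source="schema_yml",
    description="Order total in the store currency.",
))
```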

ParseResult `dataclass`

ParseResult(
    language,
    nodes=list(),
    edges=list(),
    column_usage=list(),
    column_lineage=list(),
    columns=list(),
    errors=list(),
)

Everything a parser returns for one file.

This is the complete interface contract. A parser receives a file path and its content and returns one of these. The orchestrator handles everything after that: ID assignment, edge resolution, and database writes.

Mutation contract

ParseResult is intentionally mutable (not `frozen=True`). Renderers and post-processing steps mutate nodes, edges, and the other lists in place, e.g. appending synthetic nodes, deduplicating edges, or rewriting names during normalisation. This is by design: allocating a new ParseResult for every transform would add complexity with no practical benefit, since a ParseResult is owned by a single file-processing pipeline and is never shared across threads.
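Under this contract, a post-processing step can simply edit the lists it is handed. The sketch below deduplicates edges in place; `dedupe_edges` is hypothetical, and the `ParseResult` mirror is trimmed to the fields the step touches. Mutable dataclasses are unhashable by default, so the step keys on the identifying fields rather than on the edge objects themselves.

```python
from __future__ import annotations

from dataclasses import dataclass, field


@dataclass
class EdgeResult:
    """Mirror of the documented signature (assumption, not the library source)."""

    source_name: str
    source_kind: str
    target_name: str
    target_kind: str
    relationship: str
    context: str | None = None
    metadata: dict | None = None


@dataclass
class ParseResult:
    """Trimmed mirror: only the fields this sketch needs."""

    language: str
    nodes: list = field(default_factory=list)
    edges: list[EdgeResult] = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def dedupe_edges(result: ParseResult) -> None:
    """Hypothetical post-processing step: drop duplicate edges in place."""
    seen: set[tuple[str, str, str, str, str]] = set()
    unique: list[EdgeResult] = []
    for edge in result.edges:
        # context and metadata are ignored, so the first occurrence wins.
        key = (edge.source_name, edge.source_kind, edge.target_name,
               edge.target_kind, edge.relationship)
        if key not in seen:
            seen.add(key)
            unique.append(edge)
    result.edges[:] = unique  # slice assignment keeps the same list object


result = ParseResult("sql", edges=[
    EdgeResult("q", "query", "orders", "table", "references", "FROM clause"),
    EdgeResult("q", "query", "orders", "table", "references", "FROM clause"),
    EdgeResult("q", "query", "customers", "table", "references", "JOIN clause"),
])
edges_before = result.edges
dedupe_edges(result)
```

Slice assignment means any other code holding a reference to `result.edges` sees the deduplicated list, which is consistent with a single-owner pipeline.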

Attributes:

| Name | Type | Description |
| --- | --- | --- |
| `language` | `str` | Parser language identifier (e.g. `"sql"`). |
| `nodes` | `list[NodeResult]` | Entities discovered in the file. |
| `edges` | `list[EdgeResult]` | Relationships between entities. |
| `column_usage` | `list[ColumnUsageResult]` | Column-level usage records (SQL only). |
| `column_lineage` | `list[ColumnLineageResult]` | End-to-end column lineage chains (SQL only). |
| `columns` | `list[ColumnDefResult]` | Column definitions extracted from DDL or schema files. |
| `errors` | `list[str]` | Non-fatal parse errors encountered during processing. |
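To show the contract end to end, here is a toy parser for Python modules built on the standard library's `ast` module. `parse_python`, the `"imports"` relationship, and the dotted module naming are hypothetical choices for this sketch, and the three dataclasses are mirrors of the documented signatures. The parser never touches a database, records syntax errors as non-fatal strings, and leaves the SQL-only lists empty.

```python
from __future__ import annotations

import ast
from dataclasses import dataclass, field


@dataclass
class NodeResult:
    kind: str
    name: str
    line_start: int | None = None
    line_end: int | None = None
    metadata: dict | None = None


@dataclass
class EdgeResult:
    source_name: str
    source_kind: str
    target_name: str
    target_kind: str
    relationship: str
    context: str | None = None
    metadata: dict | None = None


@dataclass
class ParseResult:
    language: str
    nodes: list[NodeResult] = field(default_factory=list)
    edges: list[EdgeResult] = field(default_factory=list)
    column_usage: list = field(default_factory=list)
    column_lineage: list = field(default_factory=list)
    columns: list = field(default_factory=list)
    errors: list[str] = field(default_factory=list)


def parse_python(path: str, content: str) -> ParseResult:
    """Hypothetical parser: path and content in, ParseResult out, no DB access."""
    result = ParseResult(language="python")
    module = path.removesuffix(".py").replace("/", ".")
    try:
        tree = ast.parse(content, filename=path)
    except SyntaxError as exc:
        # Parse problems are recorded as non-fatal errors, not raised.
        result.errors.append(f"{path}:{exc.lineno}: {exc.msg}")
        return result
    result.nodes.append(NodeResult("module", module))
    # ast.walk also visits nested defs; this toy attributes them all to the module.
    for node in ast.walk(tree):
        if isinstance(node, (ast.FunctionDef, ast.ClassDef)):
            kind = "class" if isinstance(node, ast.ClassDef) else "function"
            result.nodes.append(
                NodeResult(kind, node.name, node.lineno, node.end_lineno))
            result.edges.append(
                EdgeResult(module, "module", node.name, kind, "defines"))
        elif isinstance(node, ast.Import):
            for alias in node.names:
                # The target may live elsewhere; the orchestrator resolves it.
                result.edges.append(EdgeResult(
                    module, "module", alias.name, "module", "imports"))
    return result


src = "import os\n\ndef helper():\n    return 1\n\nclass Widget:\n    pass\n"
ok = parse_python("pkg/util.py", src)
bad = parse_python("pkg/broken.py", "def oops(:\n")
```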

parse_repo_config

parse_repo_config(cfg, global_dialect=None)

Parse a repo config value into (path, dialect, dialect_overrides).

Supports both simple string paths and full config dicts:

"my-repo": "/path/to/repo"
"my-repo": {"path": "/path", "dialect": "starrocks",
            "dialect_overrides": {"athena/": "athena"}}

Source code in src/sqlprism/types.py

def parse_repo_config(
    cfg: str | dict,
    global_dialect: str | None = None,
) -> tuple[str, str | None, dict[str, str] | None]:
    """Parse a repo config value into (path, dialect, dialect_overrides).

    Supports both simple string paths and full config dicts::

        "my-repo": "/path/to/repo"
        "my-repo": {"path": "/path", "dialect": "starrocks",
                    "dialect_overrides": {"athena/": "athena"}}
    """
    if isinstance(cfg, str):
        return cfg, global_dialect, None
    return (
        cfg["path"],
        cfg.get("dialect", global_dialect),
        cfg.get("dialect_overrides"),
    )
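A usage sketch with one repo of each form. The repo names, paths, and the `postgres` global dialect are invented; the function is copied from the source above (docstring shortened, one comment added) so the block runs on its own. How `dialect_overrides` keys are matched against file paths is not shown on this page.

```python
from __future__ import annotations


def parse_repo_config(
    cfg: str | dict,
    global_dialect: str | None = None,
) -> tuple[str, str | None, dict[str, str] | None]:
    """Parse a repo config value into (path, dialect, dialect_overrides)."""
    if isinstance(cfg, str):
        # Bare path: inherit the global dialect, no per-path overrides.
        return cfg, global_dialect, None
    return (
        cfg["path"],
        cfg.get("dialect", global_dialect),
        cfg.get("dialect_overrides"),
    )


# Hypothetical config with one repo in each supported form.
repos = {
    "analytics": "/srv/analytics",
    "warehouse": {
        "path": "/srv/warehouse",
        "dialect": "starrocks",
        "dialect_overrides": {"athena/": "athena"},
    },
    "legacy": {"path": "/srv/legacy"},
}

resolved = {
    name: parse_repo_config(cfg, global_dialect="postgres")
    for name, cfg in repos.items()
}
# resolved["analytics"] == ("/srv/analytics", "postgres", None)
# resolved["warehouse"] == ("/srv/warehouse", "starrocks", {"athena/": "athena"})
# resolved["legacy"] == ("/srv/legacy", "postgres", None)
```

Two edge cases follow directly from the code: a dict without a `"path"` key raises `KeyError`, and an explicit `"dialect": None` yields `None` rather than falling back to `global_dialect`, because `dict.get` applies its default only when the key is missing.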
