
Relationships

You can document relationships between node outputs. These relationships are different from node dependencies: a dependency declares which node outputs a node consumes as inputs, whereas a relationship describes how the outputs of different nodes relate to each other.

Documenting these relationships helps your team better understand how nodes interact, making it easier to work with materialized tables. Additionally, these relationships facilitate the generation of Entity-Relationship Diagrams (ERDs).
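
Relationship metadata maps naturally onto an ERD. As an illustration only, and not a description of how flypipe itself renders diagrams, the sketch below turns hypothetical (left table, relationship, right table, label) tuples into Mermaid erDiagram text using Mermaid's crow's-foot markers. The helper name to_mermaid, the table names, and the choice of "zero or more" markers are assumptions, since the relationship methods do not say whether a side is optional.

```python
# Hypothetical sketch: render documented relationships as Mermaid erDiagram text.
# This is NOT flypipe's API; it only shows how relationship metadata maps to an ERD.

# Mermaid crow's-foot markers (left side, right side) for each relationship type.
# "o{" / "}o" mean "zero or more"; treating every "many" side as optional is an assumption.
MERMAID_MARKERS = {
    "one_to_one": ("||", "||"),
    "one_to_many": ("||", "o{"),
    "many_to_one": ("}o", "||"),
    "many_to_many": ("}o", "o{"),
}


def to_mermaid(relationships):
    """Build erDiagram text from (left_table, kind, right_table, label) tuples."""
    lines = ["erDiagram"]
    for left, kind, right, label in relationships:
        left_marker, right_marker = MERMAID_MARKERS[kind]
        lines.append(f'    {left} {left_marker}--{right_marker} {right} : "{label}"')
    return "\n".join(lines)


diagram = to_mermaid([
    ("customers", "one_to_many", "orders", "places"),
    ("orders", "many_to_many", "products", "contains"),
])
print(diagram)
```

Pasting the printed text into any Mermaid renderer draws a two-relationship ERD.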

Advantages of Having ERDs Available for the Team

  • Improved Data Understanding: ERDs provide a clear visual representation of how the outputs of different nodes relate to each other, making it easier for team members to grasp the structure of the data.
  • Faster Onboarding: New team members can quickly understand the relationships between different tables and nodes, reducing the learning curve.
  • Better Collaboration: A shared ERD enables data engineers, analysts, and stakeholders to discuss and agree on data relationships efficiently.
  • Reduced Errors: Understanding relationships between nodes helps prevent inconsistencies and ensures that transformations and aggregations align with business logic.
  • Optimized Query Performance: By visualizing relationships, teams can identify redundant joins or inefficient queries and optimize database performance accordingly.

To document the relationships between nodes or tables, you can declare them as follows:

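from flypipe import node
from flypipe.schema import Column, Schema
from flypipe.schema.types import String

@node(
    ...,  # other node arguments
    output=Schema(
        Column("col_name", String(), "description")

        # declare the relationship with another node's output column
        .one_to_many(another_node.output.col, "relationship description"),
    ),
)
def my_node():
    ...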

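The possible relationships are .one_to_one(...), .one_to_many(...), .many_to_one(...) and .many_to_many(...). Each one is read from the side of the column it is called on: for example, col.one_to_many(other) declares that each value of col can match many rows in the output of the node that owns other.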
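
When deciding which of the four methods to declare, it can help to look at whether key values repeat on each side. The following is a hypothetical helper, not part of flypipe: a side whose values repeat is treated as a "many" side. It only inspects duplicates within each column and can only reflect the data you give it, so a side that happens to have no repeats today may still be "many" by design.

```python
def infer_cardinality(left_values, right_values):
    """Suggest a relationship method name from the observed key values.

    A side is "many" if any value repeats in it, otherwise "one".
    """
    left = "many" if len(left_values) != len(set(left_values)) else "one"
    right = "many" if len(right_values) != len(set(right_values)) else "one"
    return f"{left}_to_{right}"


customer_ids = [1, 2, 3]                    # customers.customer_id (unique)
order_customer_ids = [1, 1, 2, 3, 3, 3]     # orders.customer_id (repeats)

print(infer_cardinality(customer_ids, order_customer_ids))  # one_to_many
print(infer_cardinality(order_customer_ids, customer_ids))  # many_to_one
```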

Important

Relationships are solely for documentation purposes; they do not enforce any constraints at runtime. Therefore, not documenting relationships has no impact on execution.
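
Because declared relationships are not enforced, any integrity check has to live in your own tests. A minimal sketch in plain Python (find_orphans is a hypothetical helper, not a flypipe function) that lists child key values, such as orders.customer_id, with no matching parent key in customers.customer_id:

```python
def find_orphans(child_values, parent_values):
    """Return child key values that have no matching parent key value."""
    parent_keys = set(parent_values)
    orphans = []
    seen = set()
    for value in child_values:
        # Keep first-seen order and report each missing key only once.
        if value not in parent_keys and value not in seen:
            orphans.append(value)
            seen.add(value)
    return orphans


customer_ids = [1, 2, 3]
order_customer_ids = [1, 2, 2, 4, 4, 5]
print(find_orphans(order_customer_ids, customer_ids))  # [4, 5]
```

With dataframes, the same idea is usually expressed as a membership test on the key column.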

Below is the API reference for Column and its relationship methods, with usage examples:

flypipe.schema.column.Column

Defines a column in the output of a flypipe node.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| name | str | Name of the column. | required |
| type | Type | Data type of the column. | required |
| description | str, optional | A description of the column. | '' |
| pk | bool, optional | Marks the column as primary key or not. Defaults to False. | False |
Source code in flypipe/schema/column.py
def __init__(
    self,
    name: str,
    type: Type,
    description: str = "",
    pk: bool = False,
):
    self.name = name
    self.type = type
    if not description and get_config("require_schema_description"):
        raise ValueError(
            f"Descriptions on schema columns configured as mandatory but no description provided for column "
            f"{self.name}"
        )
    self.description = description

    # Each column keeps a reference to the node it belongs to. This is used to map a relationship between
    # node1.output.col1 and node2.output.col2, so that col1 knows it belongs to node1
    # and col2 knows it belongs to node2
    self.parent = None

    # Dict holding the relationships (foreign keys) between this column and other nodes' columns:
    # each key is the other Column and each value is the corresponding Relationship
    self.relationships = {}

    self.pk = pk
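
The constructor stores relationships in a dict keyed by the other Column, and the usage examples chain each relationship method directly onto Column(...) before passing the result to Schema, which suggests that _add_relationship returns the column itself. The following is a simplified sketch of that bookkeeping, not flypipe's code: SketchColumn, the Relationship constructor, and the RelationshipType values are assumptions (only the member names appear in the source). In this sketch, columns are dict keys by object identity.

```python
from enum import Enum


class RelationshipType(Enum):
    # Member names follow the source; the values are assumed for this sketch.
    ONE_TO_ONE = "1:1"
    ONE_TO_MANY = "1:N"
    MANY_TO_ONE = "N:1"
    MANY_TO_MANY = "N:N"


class Relationship:
    """Hypothetical holder for one documented relationship."""

    def __init__(self, relationship_type, description=None):
        self.relationship_type = relationship_type
        self.description = description


class SketchColumn:
    """Simplified stand-in for Column, keeping only the relationship bookkeeping."""

    def __init__(self, name, description=""):
        self.name = name
        self.description = description
        self.parent = None  # would point at the owning node
        self.relationships = {}  # other column -> Relationship

    def _add_relationship(self, other, relationship_type, description=None):
        self.relationships[other] = Relationship(relationship_type, description)
        # Returning self is what allows chaining onto the column constructor.
        return self

    def one_to_many(self, other, description=None):
        return self._add_relationship(other, RelationshipType.ONE_TO_MANY, description)


orders_customer_id = SketchColumn("customer_id", "Customer who placed the order")
customers_customer_id = SketchColumn("customer_id", "Customer identifier").one_to_many(
    orders_customer_id, "a customer can place many orders"
)

relationship = customers_customer_id.relationships[orders_customer_id]
print(relationship.relationship_type, "-", relationship.description)
```

Note that the relationship is recorded only on the column where it is declared; the other column's dict stays empty in this sketch.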

many_to_many(other, description=None)

Adds an N:N relationship between this column and another node's output column.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| other | Column | Node output column. | required |
| description | str, optional | A description of the relationship between this column and the other node's output column. | None |

Usage:

@node(
    ...,
    output=Schema(
        ...,
        Column("col_name", String(), "description")
        .many_to_many(another_node.output.col, "relationship description"),
        ...,
    ),
)
def my_node():
    ...
Source code in flypipe/schema/column.py
def many_to_many(self, other: "Column", description: str = None):
    """Adds an N:N relationship between this column and another node's output column.

    Args:
        other (Column): Node output column.
        description (str, optional): A description of the relationship between this column and the other node's output column.

    Usage:

    ``` py
    @node(
        ...,
        output=Schema(
            ...,
            Column("col_name", String(), "description")
            .many_to_many(another_node.output.col, "relationship description"),
            ...,
        ),
    )
    def my_node():
        ...
    ```
    """

    return self._add_relationship(other, RelationshipType.MANY_TO_MANY, description)
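
In an N:N relationship neither side's values are unique, so joining the two outputs directly multiplies rows. A minimal plain-Python sketch (joined_row_count and the data are hypothetical) that counts how many rows an inner join on the key would produce:

```python
from collections import Counter


def joined_row_count(left_keys, right_keys):
    """Number of rows an inner join on these key columns would produce."""
    left_counts = Counter(left_keys)
    right_counts = Counter(right_keys)
    # Each key contributes (matches on the left) x (matches on the right) rows;
    # Counter returns 0 for keys missing on the right.
    return sum(n * right_counts[key] for key, n in left_counts.items())


# Hypothetical: product ids repeat in both an order-lines table and a promotions table.
order_product_ids = ["p1", "p1", "p2"]
promotion_product_ids = ["p1", "p1", "p1", "p2", "p3"]
print(joined_row_count(order_product_ids, promotion_product_ids))  # 7
```

The join yields 7 rows from inputs of 3 and 5 rows, which is the kind of fan-out a documented N:N relationship warns about.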

many_to_one(other, description=None)

Adds an N:1 relationship between this column and another node's output column.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| other | Column | Node output column. | required |
| description | str, optional | A description of the relationship between this column and the other node's output column. | None |

Usage:

@node(
    ...,
    output=Schema(
        ...,
        Column("col_name", String(), "description")
        .many_to_one(another_node.output.col, "relationship description"),
        ...,
    ),
)
def my_node():
    ...
Source code in flypipe/schema/column.py
def many_to_one(self, other: "Column", description: str = None):
    """Adds an N:1 relationship between this column and another node's output column.

    Args:
        other (Column): Node output column.
        description (str, optional): A description of the relationship between this column and the other node's output column.

    Usage:

    ``` py
    @node(
        ...,
        output=Schema(
            ...,
            Column("col_name", String(), "description")
            .many_to_one(another_node.output.col, "relationship description"),
            ...,
        ),
    )
    def my_node():
        ...
    ```
    """
    return self._add_relationship(other, RelationshipType.MANY_TO_ONE, description)
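
In data-modeling terms, N:1 is the mirror image of 1:N: declaring orders.customer_id as many_to_one with customers.customer_id describes the same link as declaring customers.customer_id as one_to_many with orders.customer_id. This page does not say whether flypipe treats the two declarations as equivalent, so choosing one side consistently is a reasonable convention. A hypothetical sketch that restates a declaration from the other column's side:

```python
# Each relationship type as seen from the other column's side.
INVERSE = {
    "one_to_one": "one_to_one",
    "one_to_many": "many_to_one",
    "many_to_one": "one_to_many",
    "many_to_many": "many_to_many",
}


def inverse(declaration):
    """Restate a (this_column, kind, other_column) declaration from the other side."""
    this_column, kind, other_column = declaration
    return (other_column, INVERSE[kind], this_column)


declared = ("orders.customer_id", "many_to_one", "customers.customer_id")
print(inverse(declared))
# ('customers.customer_id', 'one_to_many', 'orders.customer_id')
```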

one_to_many(other, description=None)

Adds a 1:N relationship between this column and another node's output column.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| other | Column | Node output column. | required |
| description | str, optional | A description of the relationship between this column and the other node's output column. | None |

Usage:

@node(
    ...,
    output=Schema(
        ...,
        Column("col_name", String(), "description")
        .one_to_many(another_node.output.col, "relationship description"),
        ...,
    ),
)
def my_node():
    ...
Source code in flypipe/schema/column.py
def one_to_many(self, other: "Column", description: str = None):
    """Adds a 1:N relationship between this column and another node's output column.

    Args:
        other (Column): Node output column.
        description (str, optional): A description of the relationship between this column and the other node's output column.

    Usage:

    ``` py
    @node(
        ...,
        output=Schema(
            ...,
            Column("col_name", String(), "description")
            .one_to_many(another_node.output.col, "relationship description"),
            ...,
        ),
    )
    def my_node():
        ...
    ```
    """

    return self._add_relationship(other, RelationshipType.ONE_TO_MANY, description)
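
A 1:N relationship means each key on the "one" side can own a list of rows on the "many" side. A plain-Python sketch with hypothetical data (orders_by_customer is not a flypipe function) that groups orders under their customer; note that a customer may own zero orders, which the four relationship methods on this page do not distinguish from "many":

```python
def orders_by_customer(customer_ids, orders):
    """Group order ids under each customer id (the "one" side of a 1:N relationship)."""
    # Start every customer with an empty list: a customer may have no orders yet.
    grouped = {customer_id: [] for customer_id in customer_ids}
    for order_id, customer_id in orders:
        # Raises KeyError if an order points at an unknown customer.
        grouped[customer_id].append(order_id)
    return grouped


customer_ids = [1, 2, 3]
orders = [(10, 1), (11, 1), (12, 3)]  # (order_id, customer_id)
print(orders_by_customer(customer_ids, orders))  # {1: [10, 11], 2: [], 3: [12]}
```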

one_to_one(other, description=None)

Adds a 1:1 relationship between this column and another node's output column.

Parameters:

| Name | Type | Description | Default |
|---|---|---|---|
| other | Column | Node output column. | required |
| description | str, optional | A description of the relationship between this column and the other node's output column. | None |

Usage:

@node(
    ...,
    output=Schema(
        ...,
        Column("col_name", String(), "description")
        .one_to_one(another_node.output.col, "relationship description"),
        ...,
    ),
)
def my_node():
    ...
Source code in flypipe/schema/column.py
def one_to_one(self, other: "Column", description: str = None):
    """Adds a 1:1 relationship between this column and another node's output column.

    Args:
        other (Column): Node output column.
        description (str, optional): A description of the relationship between this column and the other node's output column.

    Usage:

    ``` py
    @node(
        ...,
        output=Schema(
            ...,
            Column("col_name", String(), "description")
            .one_to_one(another_node.output.col, "relationship description"),
            ...,
        ),
    )
    def my_node():
        ...
    ```
    """
    return self._add_relationship(other, RelationshipType.ONE_TO_ONE, description)
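
In a 1:1 relationship each value on one side matches at most one value on the other, so a lookup in one direction can be inverted without losing information. A hypothetical plain-Python helper (not part of flypipe) that inverts such a mapping and refuses to do so when values repeat:

```python
def invert_one_to_one(mapping):
    """Invert a key -> value mapping, refusing to lose information."""
    inverted = {value: key for key, value in mapping.items()}
    # If two keys shared a value, the inversion collapsed them and the sizes differ.
    if len(inverted) != len(mapping):
        raise ValueError("mapping is not one-to-one: some values repeat")
    return inverted


# Hypothetical data: each customer has exactly one profile row and vice versa.
profile_by_customer = {1: "profile-a", 2: "profile-b", 3: "profile-c"}
customer_by_profile = invert_one_to_one(profile_by_customer)
print(customer_by_profile["profile-b"])  # 2
```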