delex.graph package

Submodules

delex.graph.algorithms module

delex.graph.algorithms.clone_graph(nodes)

delex.graph.algorithms.find_all_nodes(node)

delex.graph.algorithms.find_sink(node)

delex.graph.algorithms.topological_sort(sink_node)

delex.graph.node module

class delex.graph.node.IntersectNode

Bases: SetOpNode

intersect the output of all incoming edges and return a single set

Attributes:

id_string: a string and identifies this node for graph comparison
in_degree
is_sink
is_source
out_degree
output_col
streamable: True if the operation at this node can be streamed, else False

Methods

`add_in_edge`(other)	add an edge between other -> self
`add_out_edge`(other)	add an edge between self -> other
`ancestors`()	get all ancestors of this node
`build`(index_table, id_col[, cache])	build this node over index_table using id_col as the unique id, optionally with cache
`equivalent`(other)	check self is equivalent to other, this does a recursive check and can be used to compare two graphs if self and other are both sinks
`execute`(stream)	execute the operation of this node over a DataFrameStream and return a new DataFrameStream
`insert_after`(node)	insert node after this node, e.g.
`insert_before`(node)	insert node before this node, e.g.
`iter_dependencies`()	return an iterator over the dependencies of this node
`iter_in`()	return an iterator of nodes for the incoming edges
`iter_out`()	return an iterator of nodes for the outgoing edges
`pop`()	remove this node from the graph and reconnect edges between in and out
`remove_in_edge`(other)	remove other -> self
`remove_in_edges`()	remove all incoming edges, x -> self
`remove_out_edge`(other)	remove self -> other
`remove_out_edges`()	remove all outgoing edges, self -> x
`validate`()	perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
`working_set_size`()	return the working set size of each component use for this node, dict values are None if self.build has not been called yet

init

validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

Raises:

ValueError: if validation fails

class delex.graph.node.MinusNode(left, right)

Bases: SetOpNode

compute the set minus of two sets left - right

Attributes:

id_string: a string and identifies this node for graph comparison
in_degree
is_sink
is_source
left
out_degree
output_col
right
streamable: True if the operation at this node can be streamed, else False

Methods

`add_in_edge`(other)	add an edge between other -> self
`add_out_edge`(other)	add an edge between self -> other
`ancestors`()	get all ancestors of this node
`build`(index_table, id_col[, cache])	build this node over index_table using id_col as the unique id, optionally with cache
`equivalent`(other)	check self is equivalent to other, this does a recursive check and can be used to compare two graphs if self and other are both sinks
`execute`(stream)	execute the operation of this node over a DataFrameStream and return a new DataFrameStream
`insert_after`(node)	insert node after this node, e.g.
`insert_before`(node)	insert node before this node, e.g.
`iter_dependencies`()	return an iterator over the dependencies of this node
`iter_in`()	return an iterator of nodes for the incoming edges
`iter_out`()	return an iterator of nodes for the outgoing edges
`pop`()	remove this node from the graph and reconnect edges between in and out
`remove_in_edge`(other)	remove other -> self
`remove_in_edges`()	remove all incoming edges, x -> self
`remove_out_edge`(other)	remove self -> other
`remove_out_edges`()	remove all outgoing edges, self -> x
`validate`()	perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
`working_set_size`()	return the working set size of each component use for this node, dict values are None if self.build has not been called yet

init

property left

property right

validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

Raises:

ValueError: if validation fails

class delex.graph.node.Node

Bases: ABC

abstract base class for all graph nodes

Attributes:

id_string: a string and identifies this node for graph comparison
in_degree
is_sink
is_source
out_degree
output_col
streamable: True if the operation at this node can be streamed, else False

Methods

`add_in_edge`(other)	add an edge between other -> self
`add_out_edge`(other)	add an edge between self -> other
`ancestors`()	get all ancestors of this node
`build`(index_table, id_col[, cache])	build this node over index_table using id_col as the unique id, optionally with cache
`equivalent`(other)	check self is equivalent to other, this does a recursive check and can be used to compare two graphs if self and other are both sinks
`execute`(stream)	execute the operation of this node over a DataFrameStream and return a new DataFrameStream
`insert_after`(node)	insert node after this node, e.g.
`insert_before`(node)	insert node before this node, e.g.
`iter_dependencies`()	return an iterator over the dependencies of this node
`iter_in`()	return an iterator of nodes for the incoming edges
`iter_out`()	return an iterator of nodes for the outgoing edges
`pop`()	remove this node from the graph and reconnect edges between in and out
`remove_in_edge`(other)	remove other -> self
`remove_in_edges`()	remove all incoming edges, x -> self
`remove_out_edge`(other)	remove self -> other
`remove_out_edges`()	remove all outgoing edges, self -> x
`validate`()	perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
`working_set_size`()	return the working set size of each component use for this node, dict values are None if self.build has not been called yet

add_in_edge(other): add an edge between other -> self

add_out_edge(other): add an edge between self -> other

ancestors() → set

get all ancestors of this node

Returns:

Set[Node]: all the ancestors of this node

abstractmethod build(index_table: DataFrame, id_col: str, cache: BuildCache | None = None)

build this node over index_table using id_col as the unique id, optionally with cache

Parameters:

index_tablepyspark.sql.DataFrame: the dataframe that will be preprocessed / indexed
id_colstr: the name of the unique id column in index_table
cacheOptional[BuildCache] = None: the cache for built indexes and hash tables

equivalent(other) → bool

check self is equivalent to other, this does a recursive check and can be used to compare two graphs if self and other are both sinks

Parameters:

otherNode: the node to be compared to

Returns:

True if equivalent else False

abstractmethod execute(stream): execute the operation of this node over a DataFrameStream and return a new DataFrameStream

property id_string: str: a string and identifies this node for graph comparison without accounting for edges

property in_degree: int

insert_after(node)

insert node after this node, e.g.

self -> x becomes self -> node -> x

Parameters:

nodeNode: the node to be inserted

insert_before(node)

insert node before this node, e.g.

x -> self becomes x -> node -> self

Parameters:

nodeNode: the node to be inserted

property is_sink: bool

property is_source: bool

iter_dependencies() → Iterator: return an iterator over the dependencies of this node

iter_in() → Iterator: return an iterator of nodes for the incoming edges

iter_out() → Iterator: return an iterator of nodes for the outgoing edges

property out_degree: int

property output_col: str

pop()

remove this node from the graph and reconnect edges between in and out

Returns:

self

Raises:

RuntimeError: if self.out_degree > 1 and self.in_degree > 1

remove_in_edge(other)

remove other -> self

Raises:

KeyError: if other -> self doesn’t exist

remove_in_edges(): remove all incoming edges, x -> self

remove_out_edge(other)

remove self -> other

Raises:

KeyError: if self -> other doesn’t exist

remove_out_edges(): remove all outgoing edges, self -> x

abstract property streamable: bool: True if the operation at this node can be streamed, else False

abstractmethod validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

Raises:

ValueError: if validation fails

abstractmethod working_set_size() → dict: return the working set size of each component use for this node, dict values are None if self.build has not been called yet

class delex.graph.node.PredicateNode(predicate)

Bases: Node

a node that execute a Predicate

Attributes:

id_string: a string and identifies this node for graph comparison
in_degree
is_sink
is_source
out_degree
output_col
predicate
streamable: True if the operation at this node can be streamed, else False

Methods

`add_in_edge`(other)	add an edge between other -> self
`add_out_edge`(other)	add an edge between self -> other
`ancestors`()	get all ancestors of this node
`build`(index_table, id_col[, cache])	build this node over index_table using id_col as the unique id, optionally with cache
`equivalent`(other)	check self is equivalent to other, this does a recursive check and can be used to compare two graphs if self and other are both sinks
`execute`(stream)	execute the operation of this node over a DataFrameStream and return a new DataFrameStream
`insert_after`(node)	insert node after this node, e.g.
`insert_before`(node)	insert node before this node, e.g.
`iter_dependencies`()	return an iterator over the dependencies of this node
`iter_in`()	return an iterator of nodes for the incoming edges
`iter_out`()	return an iterator of nodes for the outgoing edges
`pop`()	remove this node from the graph and reconnect edges between in and out
`remove_in_edge`(other)	remove other -> self
`remove_in_edges`()	remove all incoming edges, x -> self
`remove_out_edge`(other)	remove self -> other
`remove_out_edges`()	remove all outgoing edges, self -> x
`validate`()	perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
`working_set_size`()	return the working set size of each component use for this node, dict values are None if self.build has not been called yet

init

build(index_table, id_col, cache=None)

build this node over index_table using id_col as the unique id, optionally with cache

Parameters:

index_tablepyspark.sql.DataFrame: the dataframe that will be preprocessed / indexed
id_colstr: the name of the unique id column in index_table
cacheOptional[BuildCache] = None: the cache for built indexes and hash tables

execute(stream): execute the operation of this node over a DataFrameStream and return a new DataFrameStream

init()

iter_dependencies(): return an iterator over the dependencies of this node

property predicate

property streamable: True if the operation at this node can be streamed, else False

validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

Raises:

ValueError: if validation fails

working_set_size() → dict: return the working set size of each component use for this node, dict values are None if self.build has not been called yet

class delex.graph.node.SetOpNode

Bases: Node

Base Class for all set operations nodes

Attributes:

id_string: a string and identifies this node for graph comparison
in_degree
is_sink
is_source
out_degree
output_col
streamable: True if the operation at this node can be streamed, else False

Methods

`add_in_edge`(other)	add an edge between other -> self
`add_out_edge`(other)	add an edge between self -> other
`ancestors`()	get all ancestors of this node
`build`(index_table, id_col[, cache])	build this node over index_table using id_col as the unique id, optionally with cache
`equivalent`(other)	check self is equivalent to other, this does a recursive check and can be used to compare two graphs if self and other are both sinks
`execute`(stream)	execute the operation of this node over a DataFrameStream and return a new DataFrameStream
`insert_after`(node)	insert node after this node, e.g.
`insert_before`(node)	insert node before this node, e.g.
`iter_dependencies`()	return an iterator over the dependencies of this node
`iter_in`()	return an iterator of nodes for the incoming edges
`iter_out`()	return an iterator of nodes for the outgoing edges
`pop`()	remove this node from the graph and reconnect edges between in and out
`remove_in_edge`(other)	remove other -> self
`remove_in_edges`()	remove all incoming edges, x -> self
`remove_out_edge`(other)	remove self -> other
`remove_out_edges`()	remove all outgoing edges, self -> x
`validate`()	perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
`working_set_size`()	return the working set size of each component use for this node, dict values are None if self.build has not been called yet

init

build(index_table, id_col, cache=None)

build this node over index_table using id_col as the unique id, optionally with cache

Parameters:

index_tablepyspark.sql.DataFrame: the dataframe that will be preprocessed / indexed
id_colstr: the name of the unique id column in index_table
cacheOptional[BuildCache] = None: the cache for built indexes and hash tables

execute(stream): execute the operation of this node over a DataFrameStream and return a new DataFrameStream

init()

property streamable: True if the operation at this node can be streamed, else False

working_set_size() → dict: return the working set size of each component use for this node, dict values are None if self.build has not been called yet

class delex.graph.node.UnionNode

Bases: SetOpNode

union the output of all incoming edges and return a single set

Attributes:

id_string: a string and identifies this node for graph comparison
in_degree
is_sink
is_source
out_degree
output_col
streamable: True if the operation at this node can be streamed, else False

Methods

`add_in_edge`(other)	add an edge between other -> self
`add_out_edge`(other)	add an edge between self -> other
`ancestors`()	get all ancestors of this node
`build`(index_table, id_col[, cache])	build this node over index_table using id_col as the unique id, optionally with cache
`equivalent`(other)	check self is equivalent to other, this does a recursive check and can be used to compare two graphs if self and other are both sinks
`execute`(stream)	execute the operation of this node over a DataFrameStream and return a new DataFrameStream
`insert_after`(node)	insert node after this node, e.g.
`insert_before`(node)	insert node before this node, e.g.
`iter_dependencies`()	return an iterator over the dependencies of this node
`iter_in`()	return an iterator of nodes for the incoming edges
`iter_out`()	return an iterator of nodes for the outgoing edges
`pop`()	remove this node from the graph and reconnect edges between in and out
`remove_in_edge`(other)	remove other -> self
`remove_in_edges`()	remove all incoming edges, x -> self
`remove_out_edge`(other)	remove self -> other
`remove_out_edges`()	remove all outgoing edges, self -> x
`validate`()	perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
`working_set_size`()	return the working set size of each component use for this node, dict values are None if self.build has not been called yet

init

validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

Raises:

ValueError: if validation fails

delex.graph.utils module

delex.graph.utils.nodes_to_dot(nodes: Node | Iterable[Node]) → str

convert a graph to a dot string representation

Parameters:

nodesNode | Iterable[Node]: the nodes to be converted to a string

Returns:

str: a dot graph string of nodes

delex.graph package

Submodules

delex.graph.algorithms module

delex.graph.node module

delex.graph.utils module

Module contents