delex.graph package

Submodules

delex.graph.algorithms module

delex.graph.algorithms.clone_graph(nodes)
delex.graph.algorithms.find_all_nodes(node)
delex.graph.algorithms.find_sink(node)
delex.graph.algorithms.topological_sort(sink_node)

delex.graph.node module

class delex.graph.node.IntersectNode

Bases: SetOpNode

intersect the output of all incoming edges and return a single set

Attributes:
id_string

a string and identifies this node for graph comparison

in_degree
is_sink
is_source
out_degree
output_col
streamable

True if the operation at this node can be streamed, else False

Methods

add_in_edge(other)

add an edge between other -> self

add_out_edge(other)

add an edge between self -> other

ancestors()

get all ancestors of this node

build(index_table, id_col[, cache])

build this node over index_table using id_col as the unique id, optionally with cache

equivalent(other)

check self is equivalent to other, this does a recursive check and can be used to compare two graphs if self and other are both sinks

execute(stream)

execute the operation of this node over a DataFrameStream and return a new DataFrameStream

insert_after(node)

insert node after this node, e.g.

insert_before(node)

insert node before this node, e.g.

iter_dependencies()

return an iterator over the dependencies of this node

iter_in()

return an iterator of nodes for the incoming edges

iter_out()

return an iterator of nodes for the outgoing edges

pop()

remove this node from the graph and reconnect edges between in and out

remove_in_edge(other)

remove other -> self

remove_in_edges()

remove all incoming edges, x -> self

remove_out_edge(other)

remove self -> other

remove_out_edges()

remove all outgoing edges, self -> x

validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

working_set_size()

return the working set size of each component use for this node, dict values are None if self.build has not been called yet

init

validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

Raises:
ValueError

if validation fails

class delex.graph.node.MinusNode(left, right)

Bases: SetOpNode

compute the set minus of two sets left - right

Attributes:
id_string

a string and identifies this node for graph comparison

in_degree
is_sink
is_source
left
out_degree
output_col
right
streamable

True if the operation at this node can be streamed, else False

Methods

add_in_edge(other)

add an edge between other -> self

add_out_edge(other)

add an edge between self -> other

ancestors()

get all ancestors of this node

build(index_table, id_col[, cache])

build this node over index_table using id_col as the unique id, optionally with cache

equivalent(other)

check self is equivalent to other, this does a recursive check and can be used to compare two graphs if self and other are both sinks

execute(stream)

execute the operation of this node over a DataFrameStream and return a new DataFrameStream

insert_after(node)

insert node after this node, e.g.

insert_before(node)

insert node before this node, e.g.

iter_dependencies()

return an iterator over the dependencies of this node

iter_in()

return an iterator of nodes for the incoming edges

iter_out()

return an iterator of nodes for the outgoing edges

pop()

remove this node from the graph and reconnect edges between in and out

remove_in_edge(other)

remove other -> self

remove_in_edges()

remove all incoming edges, x -> self

remove_out_edge(other)

remove self -> other

remove_out_edges()

remove all outgoing edges, self -> x

validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

working_set_size()

return the working set size of each component use for this node, dict values are None if self.build has not been called yet

init

property left
property right
validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

Raises:
ValueError

if validation fails

class delex.graph.node.Node

Bases: ABC

abstract base class for all graph nodes

Attributes:
id_string

a string and identifies this node for graph comparison

in_degree
is_sink
is_source
out_degree
output_col
streamable

True if the operation at this node can be streamed, else False

Methods

add_in_edge(other)

add an edge between other -> self

add_out_edge(other)

add an edge between self -> other

ancestors()

get all ancestors of this node

build(index_table, id_col[, cache])

build this node over index_table using id_col as the unique id, optionally with cache

equivalent(other)

check self is equivalent to other, this does a recursive check and can be used to compare two graphs if self and other are both sinks

execute(stream)

execute the operation of this node over a DataFrameStream and return a new DataFrameStream

insert_after(node)

insert node after this node, e.g.

insert_before(node)

insert node before this node, e.g.

iter_dependencies()

return an iterator over the dependencies of this node

iter_in()

return an iterator of nodes for the incoming edges

iter_out()

return an iterator of nodes for the outgoing edges

pop()

remove this node from the graph and reconnect edges between in and out

remove_in_edge(other)

remove other -> self

remove_in_edges()

remove all incoming edges, x -> self

remove_out_edge(other)

remove self -> other

remove_out_edges()

remove all outgoing edges, self -> x

validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

working_set_size()

return the working set size of each component use for this node, dict values are None if self.build has not been called yet

add_in_edge(other)

add an edge between other -> self

add_out_edge(other)

add an edge between self -> other

ancestors() set

get all ancestors of this node

Returns:
Set[Node]

all the ancestors of this node

abstractmethod build(index_table: DataFrame, id_col: str, cache: BuildCache | None = None)

build this node over index_table using id_col as the unique id, optionally with cache

Parameters:
index_tablepyspark.sql.DataFrame

the dataframe that will be preprocessed / indexed

id_colstr

the name of the unique id column in index_table

cacheOptional[BuildCache] = None

the cache for built indexes and hash tables

equivalent(other) bool

check self is equivalent to other, this does a recursive check and can be used to compare two graphs if self and other are both sinks

Parameters:
otherNode

the node to be compared to

Returns:
True if equivalent else False
abstractmethod execute(stream)

execute the operation of this node over a DataFrameStream and return a new DataFrameStream

property id_string: str

a string and identifies this node for graph comparison without accounting for edges

property in_degree: int
insert_after(node)

insert node after this node, e.g.

self -> x becomes self -> node -> x

Parameters:
nodeNode

the node to be inserted

insert_before(node)

insert node before this node, e.g.

x -> self becomes x -> node -> self

Parameters:
nodeNode

the node to be inserted

property is_sink: bool
property is_source: bool
iter_dependencies() Iterator

return an iterator over the dependencies of this node

iter_in() Iterator

return an iterator of nodes for the incoming edges

iter_out() Iterator

return an iterator of nodes for the outgoing edges

property out_degree: int
property output_col: str
pop()

remove this node from the graph and reconnect edges between in and out

Returns:
self
Raises:
RuntimeError

if self.out_degree > 1 and self.in_degree > 1

remove_in_edge(other)

remove other -> self

Raises:
KeyError

if other -> self doesn’t exist

remove_in_edges()

remove all incoming edges, x -> self

remove_out_edge(other)

remove self -> other

Raises:
KeyError

if self -> other doesn’t exist

remove_out_edges()

remove all outgoing edges, self -> x

abstract property streamable: bool

True if the operation at this node can be streamed, else False

abstractmethod validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

Raises:
ValueError

if validation fails

abstractmethod working_set_size() dict

return the working set size of each component use for this node, dict values are None if self.build has not been called yet

class delex.graph.node.PredicateNode(predicate)

Bases: Node

a node that execute a Predicate

Attributes:
id_string

a string and identifies this node for graph comparison

in_degree
is_sink
is_source
out_degree
output_col
predicate
streamable

True if the operation at this node can be streamed, else False

Methods

add_in_edge(other)

add an edge between other -> self

add_out_edge(other)

add an edge between self -> other

ancestors()

get all ancestors of this node

build(index_table, id_col[, cache])

build this node over index_table using id_col as the unique id, optionally with cache

equivalent(other)

check self is equivalent to other, this does a recursive check and can be used to compare two graphs if self and other are both sinks

execute(stream)

execute the operation of this node over a DataFrameStream and return a new DataFrameStream

insert_after(node)

insert node after this node, e.g.

insert_before(node)

insert node before this node, e.g.

iter_dependencies()

return an iterator over the dependencies of this node

iter_in()

return an iterator of nodes for the incoming edges

iter_out()

return an iterator of nodes for the outgoing edges

pop()

remove this node from the graph and reconnect edges between in and out

remove_in_edge(other)

remove other -> self

remove_in_edges()

remove all incoming edges, x -> self

remove_out_edge(other)

remove self -> other

remove_out_edges()

remove all outgoing edges, self -> x

validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

working_set_size()

return the working set size of each component use for this node, dict values are None if self.build has not been called yet

init

build(index_table, id_col, cache=None)

build this node over index_table using id_col as the unique id, optionally with cache

Parameters:
index_tablepyspark.sql.DataFrame

the dataframe that will be preprocessed / indexed

id_colstr

the name of the unique id column in index_table

cacheOptional[BuildCache] = None

the cache for built indexes and hash tables

execute(stream)

execute the operation of this node over a DataFrameStream and return a new DataFrameStream

init()
iter_dependencies()

return an iterator over the dependencies of this node

property predicate
property streamable

True if the operation at this node can be streamed, else False

validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

Raises:
ValueError

if validation fails

working_set_size() dict

return the working set size of each component use for this node, dict values are None if self.build has not been called yet

class delex.graph.node.SetOpNode

Bases: Node

Base Class for all set operations nodes

Attributes:
id_string

a string and identifies this node for graph comparison

in_degree
is_sink
is_source
out_degree
output_col
streamable

True if the operation at this node can be streamed, else False

Methods

add_in_edge(other)

add an edge between other -> self

add_out_edge(other)

add an edge between self -> other

ancestors()

get all ancestors of this node

build(index_table, id_col[, cache])

build this node over index_table using id_col as the unique id, optionally with cache

equivalent(other)

check self is equivalent to other, this does a recursive check and can be used to compare two graphs if self and other are both sinks

execute(stream)

execute the operation of this node over a DataFrameStream and return a new DataFrameStream

insert_after(node)

insert node after this node, e.g.

insert_before(node)

insert node before this node, e.g.

iter_dependencies()

return an iterator over the dependencies of this node

iter_in()

return an iterator of nodes for the incoming edges

iter_out()

return an iterator of nodes for the outgoing edges

pop()

remove this node from the graph and reconnect edges between in and out

remove_in_edge(other)

remove other -> self

remove_in_edges()

remove all incoming edges, x -> self

remove_out_edge(other)

remove self -> other

remove_out_edges()

remove all outgoing edges, self -> x

validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

working_set_size()

return the working set size of each component use for this node, dict values are None if self.build has not been called yet

init

build(index_table, id_col, cache=None)

build this node over index_table using id_col as the unique id, optionally with cache

Parameters:
index_tablepyspark.sql.DataFrame

the dataframe that will be preprocessed / indexed

id_colstr

the name of the unique id column in index_table

cacheOptional[BuildCache] = None

the cache for built indexes and hash tables

execute(stream)

execute the operation of this node over a DataFrameStream and return a new DataFrameStream

init()
property streamable

True if the operation at this node can be streamed, else False

working_set_size() dict

return the working set size of each component use for this node, dict values are None if self.build has not been called yet

class delex.graph.node.UnionNode

Bases: SetOpNode

union the output of all incoming edges and return a single set

Attributes:
id_string

a string and identifies this node for graph comparison

in_degree
is_sink
is_source
out_degree
output_col
streamable

True if the operation at this node can be streamed, else False

Methods

add_in_edge(other)

add an edge between other -> self

add_out_edge(other)

add an edge between self -> other

ancestors()

get all ancestors of this node

build(index_table, id_col[, cache])

build this node over index_table using id_col as the unique id, optionally with cache

equivalent(other)

check self is equivalent to other, this does a recursive check and can be used to compare two graphs if self and other are both sinks

execute(stream)

execute the operation of this node over a DataFrameStream and return a new DataFrameStream

insert_after(node)

insert node after this node, e.g.

insert_before(node)

insert node before this node, e.g.

iter_dependencies()

return an iterator over the dependencies of this node

iter_in()

return an iterator of nodes for the incoming edges

iter_out()

return an iterator of nodes for the outgoing edges

pop()

remove this node from the graph and reconnect edges between in and out

remove_in_edge(other)

remove other -> self

remove_in_edges()

remove all incoming edges, x -> self

remove_out_edge(other)

remove self -> other

remove_out_edges()

remove all outgoing edges, self -> x

validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

working_set_size()

return the working set size of each component use for this node, dict values are None if self.build has not been called yet

init

validate()

perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.

Raises:
ValueError

if validation fails

delex.graph.utils module

delex.graph.utils.nodes_to_dot(nodes: Node | Iterable[Node]) str

convert a graph to a dot string representation

Parameters:
nodesNode | Iterable[Node]

the nodes to be converted to a string

Returns:
str

a dot graph string of nodes

Module contents