delex.graph package
Submodules
delex.graph.algorithms module
- delex.graph.algorithms.clone_graph(nodes)
- delex.graph.algorithms.find_all_nodes(node)
- delex.graph.algorithms.find_sink(node)
- delex.graph.algorithms.topological_sort(sink_node)
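The functions above are listed without descriptions. As a rough illustration of what a topological sort starting from a sink involves, the sketch below uses a hypothetical `ToyNode` class and a `toy_topological_sort` helper. Neither is part of delex, and the actual `topological_sort` may order nodes differently.

```python
class ToyNode:
    """Hypothetical stand-in for a graph node; not the delex Node class."""

    def __init__(self, name):
        self.name = name
        self.ins = []  # nodes with an edge into this node

    def add_in_edge(self, other):
        # record the edge other -> self
        self.ins.append(other)


def toy_topological_sort(sink):
    """Return the nodes upstream of `sink` (and `sink` itself) so that
    every node appears after all of its inputs."""
    order, seen = [], set()

    def visit(node):
        if node in seen:
            return
        seen.add(node)
        for parent in node.ins:
            visit(parent)
        order.append(node)  # appended only after all inputs were visited

    visit(sink)
    return order


# a -> c, b -> c, c -> d
a, b, c, d = (ToyNode(n) for n in "abcd")
c.add_in_edge(a)
c.add_in_edge(b)
d.add_in_edge(c)
order = [n.name for n in toy_topological_sort(d)]
print(order)  # ['a', 'b', 'c', 'd']
```

Starting from the sink and following incoming edges means only nodes that actually feed the sink are visited.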
delex.graph.node module
- class delex.graph.node.IntersectNode
Bases:
SetOpNode
intersect the output of all incoming edges and return a single set
- Attributes:
- id_string: a string that identifies this node for graph comparison
- in_degree
- is_sink
- is_source
- out_degree
- output_col
- streamable: True if the operation at this node can be streamed, else False
Methods
- add_in_edge(other): add an edge between other -> self
- add_out_edge(other): add an edge between self -> other
- ancestors(): get all ancestors of this node
- build(index_table, id_col[, cache]): build this node over index_table using id_col as the unique id, optionally with cache
- equivalent(other): check whether self is equivalent to other; this is a recursive check and can be used to compare two graphs if self and other are both sinks
- execute(stream): execute the operation of this node over a DataFrameStream and return a new DataFrameStream
- insert_after(node): insert node after this node, e.g. self -> x becomes self -> node -> x
- insert_before(node): insert node before this node, e.g. x -> self becomes x -> node -> self
- iter_dependencies(): return an iterator over the dependencies of this node
- iter_in(): return an iterator of nodes for the incoming edges
- iter_out(): return an iterator of nodes for the outgoing edges
- pop(): remove this node from the graph and reconnect edges between in and out
- remove_in_edge(other): remove other -> self
- remove_in_edges(): remove all incoming edges, x -> self
- remove_out_edge(other): remove self -> other
- remove_out_edges(): remove all outgoing edges, self -> x
- validate(): perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
- working_set_size(): return the working set size of each component used by this node; dict values are None if self.build has not been called yet
- __init__
- validate()
perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
- Raises:
- ValueError
if validation fails
- class delex.graph.node.MinusNode(left, right)
Bases:
SetOpNode
compute the set difference of two sets, left - right
- Attributes:
- id_string: a string that identifies this node for graph comparison
- in_degree
- is_sink
- is_source
- left
- out_degree
- output_col
- right
- streamable: True if the operation at this node can be streamed, else False
Methods
- add_in_edge(other): add an edge between other -> self
- add_out_edge(other): add an edge between self -> other
- ancestors(): get all ancestors of this node
- build(index_table, id_col[, cache]): build this node over index_table using id_col as the unique id, optionally with cache
- equivalent(other): check whether self is equivalent to other; this is a recursive check and can be used to compare two graphs if self and other are both sinks
- execute(stream): execute the operation of this node over a DataFrameStream and return a new DataFrameStream
- insert_after(node): insert node after this node, e.g. self -> x becomes self -> node -> x
- insert_before(node): insert node before this node, e.g. x -> self becomes x -> node -> self
- iter_dependencies(): return an iterator over the dependencies of this node
- iter_in(): return an iterator of nodes for the incoming edges
- iter_out(): return an iterator of nodes for the outgoing edges
- pop(): remove this node from the graph and reconnect edges between in and out
- remove_in_edge(other): remove other -> self
- remove_in_edges(): remove all incoming edges, x -> self
- remove_out_edge(other): remove self -> other
- remove_out_edges(): remove all outgoing edges, self -> x
- validate(): perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
- working_set_size(): return the working set size of each component used by this node; dict values are None if self.build has not been called yet
- __init__
- property left
- property right
- validate()
perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
- Raises:
- ValueError
if validation fails
- class delex.graph.node.Node
Bases:
ABC
abstract base class for all graph nodes
- Attributes:
- id_string: a string that identifies this node for graph comparison
- in_degree
- is_sink
- is_source
- out_degree
- output_col
- streamable: True if the operation at this node can be streamed, else False
Methods
- add_in_edge(other): add an edge between other -> self
- add_out_edge(other): add an edge between self -> other
- ancestors(): get all ancestors of this node
- build(index_table, id_col[, cache]): build this node over index_table using id_col as the unique id, optionally with cache
- equivalent(other): check whether self is equivalent to other; this is a recursive check and can be used to compare two graphs if self and other are both sinks
- execute(stream): execute the operation of this node over a DataFrameStream and return a new DataFrameStream
- insert_after(node): insert node after this node, e.g. self -> x becomes self -> node -> x
- insert_before(node): insert node before this node, e.g. x -> self becomes x -> node -> self
- iter_dependencies(): return an iterator over the dependencies of this node
- iter_in(): return an iterator of nodes for the incoming edges
- iter_out(): return an iterator of nodes for the outgoing edges
- pop(): remove this node from the graph and reconnect edges between in and out
- remove_in_edge(other): remove other -> self
- remove_in_edges(): remove all incoming edges, x -> self
- remove_out_edge(other): remove self -> other
- remove_out_edges(): remove all outgoing edges, self -> x
- validate(): perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
- working_set_size(): return the working set size of each component used by this node; dict values are None if self.build has not been called yet
- add_in_edge(other)
add an edge between other -> self
- add_out_edge(other)
add an edge between self -> other
- ancestors() -> set
get all ancestors of this node
- Returns:
- Set[Node]
all the ancestors of this node
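To make the notion of "all ancestors" concrete, here is a minimal sketch on a hypothetical `ToyNode` class (not the delex `Node`). It assumes that ancestors are all nodes reachable by following incoming edges and that the node itself is excluded; whether delex makes the same choice is not stated above.

```python
class ToyNode:
    """Hypothetical stand-in for a graph node; not the delex Node class."""

    def __init__(self, name):
        self.name = name
        self.ins = set()  # nodes with an edge into this node

    def add_in_edge(self, other):
        # record the edge other -> self
        self.ins.add(other)

    def ancestors(self):
        """Return every node reachable by walking incoming edges."""
        found = set()
        stack = list(self.ins)
        while stack:
            node = stack.pop()
            if node not in found:
                found.add(node)
                stack.extend(node.ins)
        return found


# a -> b -> c
a, b, c = ToyNode("a"), ToyNode("b"), ToyNode("c")
b.add_in_edge(a)
c.add_in_edge(b)
names = sorted(n.name for n in c.ancestors())
print(names)  # ['a', 'b']
```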
- abstractmethod build(index_table: DataFrame, id_col: str, cache: BuildCache | None = None)
build this node over index_table using id_col as the unique id, optionally with cache
- Parameters:
- index_table : pyspark.sql.DataFrame
the dataframe that will be preprocessed / indexed
- id_col : str
the name of the unique id column in index_table
- cache : Optional[BuildCache] = None
the cache for built indexes and hash tables
- equivalent(other) -> bool
check whether self is equivalent to other; this is a recursive check and can be used to compare two graphs if self and other are both sinks
- Parameters:
- other : Node
the node to be compared to
- Returns:
- True if equivalent else False
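One way to picture a recursive equivalence check is to compare a nested signature built from each node's `id_string` and the signatures of its inputs. The sketch below is only an illustration: `ToyNode`, `signature`, and `toy_equivalent` are hypothetical names, and it assumes the order of incoming edges does not matter, which would not hold for an order-sensitive node such as a set difference.

```python
class ToyNode:
    """Hypothetical stand-in for a graph node; not the delex Node class."""

    def __init__(self, id_string):
        self.id_string = id_string  # identifies the node, ignoring edges
        self.ins = []

    def add_in_edge(self, other):
        # record the edge other -> self
        self.ins.append(other)


def signature(node):
    """Nested tuple describing `node` and everything upstream of it.
    Inputs are sorted so that edge order is ignored (an assumption)."""
    return (node.id_string, tuple(sorted(signature(p) for p in node.ins)))


def toy_equivalent(a, b):
    """Compare two nodes together with their upstream graphs."""
    return signature(a) == signature(b)


def build_union(pred_names):
    # small helper: several predicate-like sources feeding one sink
    sink = ToyNode("union")
    for name in pred_names:
        sink.add_in_edge(ToyNode(name))
    return sink


g1 = build_union(["p1", "p2"])
g2 = build_union(["p2", "p1"])  # same inputs, different edge order
g3 = build_union(["p1", "p3"])  # one input differs
print(toy_equivalent(g1, g2), toy_equivalent(g1, g3))  # True False
```

Because the comparison starts at the given node and walks upstream, calling it on two sinks compares the two graphs as a whole.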
- abstractmethod execute(stream)
execute the operation of this node over a DataFrameStream and return a new DataFrameStream
- property id_string: str
a string that identifies this node for graph comparison without accounting for edges
- property in_degree: int
- insert_after(node)
insert node after this node, e.g.
self -> x becomes self -> node -> x
- Parameters:
- node : Node
the node to be inserted
- insert_before(node)
insert node before this node, e.g.
x -> self becomes x -> node -> self
- Parameters:
- node : Node
the node to be inserted
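The rewiring described for `insert_after` and `insert_before` can be sketched on a hypothetical `ToyNode` class (not the delex `Node`). The edge bookkeeping with plain sets is an assumption made for illustration only.

```python
class ToyNode:
    """Hypothetical stand-in for a graph node; not the delex Node class."""

    def __init__(self, name):
        self.name = name
        self.ins = set()   # nodes with an edge into this node
        self.outs = set()  # nodes this node has an edge to

    def add_out_edge(self, other):
        # add the edge self -> other
        self.outs.add(other)
        other.ins.add(self)

    def remove_out_edge(self, other):
        # remove the edge self -> other; KeyError if it does not exist
        self.outs.remove(other)
        other.ins.remove(self)

    def insert_after(self, node):
        # self -> x becomes self -> node -> x
        for x in list(self.outs):
            self.remove_out_edge(x)
            node.add_out_edge(x)
        self.add_out_edge(node)

    def insert_before(self, node):
        # x -> self becomes x -> node -> self
        for x in list(self.ins):
            x.remove_out_edge(self)
            x.add_out_edge(node)
        node.add_out_edge(self)


a, b, n, m = (ToyNode(s) for s in ("a", "b", "n", "m"))
a.add_out_edge(b)   # a -> b
a.insert_after(n)   # a -> n -> b
b.insert_before(m)  # a -> n -> m -> b
```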
- property is_sink: bool
- property is_source: bool
- iter_dependencies() -> Iterator
return an iterator over the dependencies of this node
- iter_in() -> Iterator
return an iterator of nodes for the incoming edges
- iter_out() -> Iterator
return an iterator of nodes for the outgoing edges
- property out_degree: int
- property output_col: str
- pop()
remove this node from the graph and reconnect edges between in and out
- Returns:
- self
- Raises:
- RuntimeError
if self.out_degree > 1 and self.in_degree > 1
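A minimal sketch of the `pop` behaviour described above, again on a hypothetical `ToyNode` class rather than the delex `Node`. It assumes that every former input is reconnected to every former output; only the return value and the `RuntimeError` condition are taken from the description.

```python
class ToyNode:
    """Hypothetical stand-in for a graph node; not the delex Node class."""

    def __init__(self, name):
        self.name = name
        self.ins = set()
        self.outs = set()

    def add_out_edge(self, other):
        # add the edge self -> other
        self.outs.add(other)
        other.ins.add(self)

    def pop(self):
        # refuse when both sides fan out, as in the Raises entry above
        if len(self.outs) > 1 and len(self.ins) > 1:
            raise RuntimeError("cannot pop a node with multiple inputs and outputs")
        parents, children = list(self.ins), list(self.outs)
        # detach this node from its neighbours
        for p in parents:
            p.outs.discard(self)
        for c in children:
            c.ins.discard(self)
        self.ins.clear()
        self.outs.clear()
        # reconnect each former input to each former output (assumption)
        for p in parents:
            for c in children:
                p.add_out_edge(c)
        return self


a, b, c = ToyNode("a"), ToyNode("b"), ToyNode("c")
a.add_out_edge(b)
b.add_out_edge(c)  # a -> b -> c
popped = b.pop()   # a -> c
```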
- remove_in_edge(other)
remove other -> self
- Raises:
- KeyError
if other -> self doesn’t exist
- remove_in_edges()
remove all incoming edges, x -> self
- remove_out_edge(other)
remove self -> other
- Raises:
- KeyError
if self -> other doesn’t exist
- remove_out_edges()
remove all outgoing edges, self -> x
- abstract property streamable: bool
True if the operation at this node can be streamed, else False
- abstractmethod validate()
perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
- Raises:
- ValueError
if validation fails
- abstractmethod working_set_size() -> dict
return the working set size of each component used by this node; dict values are None if self.build has not been called yet
- class delex.graph.node.PredicateNode(predicate)
Bases:
Node
a node that executes a Predicate
- Attributes:
- id_string: a string that identifies this node for graph comparison
- in_degree
- is_sink
- is_source
- out_degree
- output_col
- predicate
- streamable: True if the operation at this node can be streamed, else False
Methods
- add_in_edge(other): add an edge between other -> self
- add_out_edge(other): add an edge between self -> other
- ancestors(): get all ancestors of this node
- build(index_table, id_col[, cache]): build this node over index_table using id_col as the unique id, optionally with cache
- equivalent(other): check whether self is equivalent to other; this is a recursive check and can be used to compare two graphs if self and other are both sinks
- execute(stream): execute the operation of this node over a DataFrameStream and return a new DataFrameStream
- insert_after(node): insert node after this node, e.g. self -> x becomes self -> node -> x
- insert_before(node): insert node before this node, e.g. x -> self becomes x -> node -> self
- iter_dependencies(): return an iterator over the dependencies of this node
- iter_in(): return an iterator of nodes for the incoming edges
- iter_out(): return an iterator of nodes for the outgoing edges
- pop(): remove this node from the graph and reconnect edges between in and out
- remove_in_edge(other): remove other -> self
- remove_in_edges(): remove all incoming edges, x -> self
- remove_out_edge(other): remove self -> other
- remove_out_edges(): remove all outgoing edges, self -> x
- validate(): perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
- working_set_size(): return the working set size of each component used by this node; dict values are None if self.build has not been called yet
- __init__
- build(index_table, id_col, cache=None)
build this node over index_table using id_col as the unique id, optionally with cache
- Parameters:
- index_table : pyspark.sql.DataFrame
the dataframe that will be preprocessed / indexed
- id_col : str
the name of the unique id column in index_table
- cache : Optional[BuildCache] = None
the cache for built indexes and hash tables
- execute(stream)
execute the operation of this node over a DataFrameStream and return a new DataFrameStream
- __init__()
- iter_dependencies()
return an iterator over the dependencies of this node
- property predicate
- property streamable
True if the operation at this node can be streamed, else False
- validate()
perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
- Raises:
- ValueError
if validation fails
- working_set_size() -> dict
return the working set size of each component used by this node; dict values are None if self.build has not been called yet
- class delex.graph.node.SetOpNode
Bases:
Node
base class for all set operation nodes
- Attributes:
- id_string: a string that identifies this node for graph comparison
- in_degree
- is_sink
- is_source
- out_degree
- output_col
- streamable: True if the operation at this node can be streamed, else False
Methods
- add_in_edge(other): add an edge between other -> self
- add_out_edge(other): add an edge between self -> other
- ancestors(): get all ancestors of this node
- build(index_table, id_col[, cache]): build this node over index_table using id_col as the unique id, optionally with cache
- equivalent(other): check whether self is equivalent to other; this is a recursive check and can be used to compare two graphs if self and other are both sinks
- execute(stream): execute the operation of this node over a DataFrameStream and return a new DataFrameStream
- insert_after(node): insert node after this node, e.g. self -> x becomes self -> node -> x
- insert_before(node): insert node before this node, e.g. x -> self becomes x -> node -> self
- iter_dependencies(): return an iterator over the dependencies of this node
- iter_in(): return an iterator of nodes for the incoming edges
- iter_out(): return an iterator of nodes for the outgoing edges
- pop(): remove this node from the graph and reconnect edges between in and out
- remove_in_edge(other): remove other -> self
- remove_in_edges(): remove all incoming edges, x -> self
- remove_out_edge(other): remove self -> other
- remove_out_edges(): remove all outgoing edges, self -> x
- validate(): perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
- working_set_size(): return the working set size of each component used by this node; dict values are None if self.build has not been called yet
- __init__
- build(index_table, id_col, cache=None)
build this node over index_table using id_col as the unique id, optionally with cache
- Parameters:
- index_table : pyspark.sql.DataFrame
the dataframe that will be preprocessed / indexed
- id_col : str
the name of the unique id column in index_table
- cache : Optional[BuildCache] = None
the cache for built indexes and hash tables
- execute(stream)
execute the operation of this node over a DataFrameStream and return a new DataFrameStream
- __init__()
- property streamable
True if the operation at this node can be streamed, else False
- working_set_size() -> dict
return the working set size of each component used by this node; dict values are None if self.build has not been called yet
- class delex.graph.node.UnionNode
Bases:
SetOpNode
union the output of all incoming edges and return a single set
- Attributes:
- id_string: a string that identifies this node for graph comparison
- in_degree
- is_sink
- is_source
- out_degree
- output_col
- streamable: True if the operation at this node can be streamed, else False
Methods
- add_in_edge(other): add an edge between other -> self
- add_out_edge(other): add an edge between self -> other
- ancestors(): get all ancestors of this node
- build(index_table, id_col[, cache]): build this node over index_table using id_col as the unique id, optionally with cache
- equivalent(other): check whether self is equivalent to other; this is a recursive check and can be used to compare two graphs if self and other are both sinks
- execute(stream): execute the operation of this node over a DataFrameStream and return a new DataFrameStream
- insert_after(node): insert node after this node, e.g. self -> x becomes self -> node -> x
- insert_before(node): insert node before this node, e.g. x -> self becomes x -> node -> self
- iter_dependencies(): return an iterator over the dependencies of this node
- iter_in(): return an iterator of nodes for the incoming edges
- iter_out(): return an iterator of nodes for the outgoing edges
- pop(): remove this node from the graph and reconnect edges between in and out
- remove_in_edge(other): remove other -> self
- remove_in_edges(): remove all incoming edges, x -> self
- remove_out_edge(other): remove self -> other
- remove_out_edges(): remove all outgoing edges, self -> x
- validate(): perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
- working_set_size(): return the working set size of each component used by this node; dict values are None if self.build has not been called yet
- __init__
- validate()
perform validation for this node, e.g. ensure that UnionNodes have multiple inputs, MinusNodes have two inputs, etc.
- Raises:
- ValueError
if validation fails
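The three set-operation nodes documented here (IntersectNode, UnionNode, MinusNode) can be pictured with plain Python sets of record ids. This is only a conceptual sketch: the real nodes operate on a DataFrameStream, and the helper names `intersect_all`, `union_all`, and `minus` are hypothetical.

```python
def intersect_all(inputs):
    """Intersect the outputs of all incoming edges (needs at least one input)."""
    inputs = list(inputs)
    return set.intersection(*inputs)


def union_all(inputs):
    """Union the outputs of all incoming edges."""
    return set().union(*inputs)


def minus(left, right):
    """Set difference left - right, which is why two inputs are required."""
    return left - right


a = {1, 2, 3}
b = {2, 3, 4}
c = {3, 5}

both = intersect_all([a, b, c])  # ids present in every input
either = union_all([a, b, c])    # ids present in any input
only_left = minus(a, b)          # ids in a that are not in b
```

The asymmetry of `minus` is the reason a set-difference node distinguishes a left and a right input, while intersection and union accept any number of inputs in any order.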