delex.utils package

Submodules

delex.utils.build_cache module

class delex.utils.build_cache.BuildCache

Bases: object

a cache of indexes, tokenizers, etc.

Methods

get(key)

get the object associated with key.

get(key: CachedObjectKey) → CacheItem

Get the object associated with key. If key does not exist in the cache, a new CacheItem is added to the cache and returned.

Parameters:
key : CachedObjectKey

the key for the CacheItem being retrieved

Returns:
CacheItem
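
The get-or-create behavior described above can be sketched as follows. This is a minimal illustration, not the delex implementation: `SketchBuildCache` and `SketchCacheItem` are hypothetical stand-ins, a plain string is used in place of a CachedObjectKey, and the use of `threading.Lock` is an assumption.

```python
import threading
from dataclasses import dataclass, field
from typing import Any, Dict, Hashable


@dataclass
class SketchCacheItem:
    """Hypothetical stand-in for CacheItem: one object slot plus a lock."""
    obj: Any = None
    lock: threading.Lock = field(default_factory=threading.Lock)


class SketchBuildCache:
    """Hypothetical stand-in for BuildCache (not the delex code)."""

    def __init__(self) -> None:
        self._items: Dict[Hashable, SketchCacheItem] = {}
        self._guard = threading.Lock()  # protects the dict itself

    def get(self, key: Hashable) -> SketchCacheItem:
        # Return the existing item, or create and store an empty one.
        with self._guard:
            item = self._items.get(key)
            if item is None:
                item = SketchCacheItem()
                self._items[key] = item
            return item


cache = SketchBuildCache()
first = cache.get("tokenizer:whitespace")
with first.lock:
    if first.obj is None:
        first.obj = str.split  # build the expensive object only once
second = cache.get("tokenizer:whitespace")  # same item, already populated
```

Because get always returns an item, callers in parallel builds can lock the item and check whether obj has been populated before building it.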
class delex.utils.build_cache.CacheItem

Bases: object

A lockable item in the BuildCache. Essentially a pointer with a mutex to guard it during parallel builds.

Attributes:
obj

the object (e.g. index, strings, tokenizer, etc.)

property obj

the object (e.g. index, strings, tokenizer, etc.)

class delex.utils.build_cache.CachedObjectKey

Bases: object

A key for a cached object in the BuildCache

delex.utils.funcs module

delex.utils.funcs.attach_current_thread_jvm()
delex.utils.funcs.get_logger(name, level=10)
delex.utils.funcs.human_format_bytes(n)
delex.utils.funcs.init_jvm(vmargs=[])
delex.utils.funcs.is_persisted(df)
delex.utils.funcs.persisted(df, storage_level=StorageLevel(True, True, False, False, 1))
delex.utils.funcs.size_in_bytes(f: Path, /) → int

get the size on disk in bytes of f

Parameters:
f : Path

path to the file or directory on the local filesystem

Returns:
int

If f is a file, the size of that single file. Otherwise, the total size in bytes of all files in the directory, similar to the du utility.

Raises:
FileNotFoundError

if f doesn’t exist
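
A minimal sketch of the documented behavior, using only `pathlib`. The name `size_in_bytes_sketch` is hypothetical, and this version sums apparent file sizes (`st_size`), which is an assumption: the actual function may count sizes differently, as du does with disk blocks.

```python
import tempfile
from pathlib import Path


def size_in_bytes_sketch(f: Path) -> int:
    """Hypothetical re-implementation of the documented contract."""
    if not f.exists():
        raise FileNotFoundError(f)
    if f.is_file():
        return f.stat().st_size
    # Directory: add up every regular file found recursively.
    return sum(p.stat().st_size for p in f.rglob("*") if p.is_file())


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    (root / "a.bin").write_bytes(b"x" * 10)
    (root / "sub").mkdir()
    (root / "sub" / "b.bin").write_bytes(b"y" * 5)
    file_size = size_in_bytes_sketch(root / "a.bin")  # single file
    dir_size = size_in_bytes_sketch(root)             # whole tree
```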

delex.utils.funcs.type_check(var, var_name, expected)

Type-checking utility: raises a TypeError if var isn’t of the expected type.
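
A check of this kind might look like the sketch below. `type_check_sketch` is a hypothetical name, and both the use of `isinstance` and the error message format are assumptions.

```python
def type_check_sketch(var, var_name, expected):
    """Hypothetical equivalent: raise TypeError on a type mismatch."""
    if not isinstance(var, expected):
        raise TypeError(
            f"{var_name} must be {expected}, got {type(var)}"
        )


type_check_sketch(3, "n", int)  # passes silently

try:
    type_check_sketch("3", "n", int)
    failed = False
except TypeError:
    failed = True  # a str was passed where an int was expected
```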

delex.utils.hash_function module

class delex.utils.hash_function.HashFunction(seed=None)

Bases: object

A simple wrapper class for XXHash3.

Methods

hash(s)

hash s and return the 128 bits as bytes

hash_split(s, /)

hash s and return the 128 bits split between two ints

hash(s: str) → bytes

hash s and return the 128 bits as bytes

hash_split(s: str, /) → Tuple[int, int]

hash s and return the 128 bits split between two ints
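
The relationship between the two methods can be illustrated by splitting a 16-byte digest into two 64-bit integers. This sketch uses BLAKE2b from the standard library as a stand-in, not XXHash3, and the little-endian byte order and half ordering are assumptions; the function names are hypothetical.

```python
import hashlib
from typing import Tuple


def hash128_sketch(s: str) -> bytes:
    # Stand-in 128-bit hash: BLAKE2b truncated to 16 bytes, NOT XXHash3.
    return hashlib.blake2b(s.encode("utf-8"), digest_size=16).digest()


def hash_split_sketch(s: str) -> Tuple[int, int]:
    digest = hash128_sketch(s)
    # Assumed layout: first 8 bytes, then last 8 bytes, little-endian.
    first = int.from_bytes(digest[:8], "little")
    second = int.from_bytes(digest[8:], "little")
    return first, second


digest = hash128_sketch("delex")
first, second = hash_split_sketch("delex")
```

Each half fits in an unsigned 64-bit integer, so the pair carries the same 128 bits as the byte string.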

delex.utils.numba_functions module

delex.utils.numba_functions.sorted_set_overlap(l_ind, r_ind, /)

compute the overlap between two sorted unique arrays

Returns:
int
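
Counting the overlap of two sorted arrays of unique values is commonly done with a two-pointer merge, sketched here in plain Python. Whether the delex function uses this exact approach is an assumption, and `sorted_set_overlap_sketch` is a hypothetical name.

```python
from typing import Sequence


def sorted_set_overlap_sketch(l_ind: Sequence[int], r_ind: Sequence[int]) -> int:
    """Count elements common to two sorted sequences of unique values."""
    i = j = overlap = 0
    while i < len(l_ind) and j < len(r_ind):
        if l_ind[i] == r_ind[j]:
            overlap += 1
            i += 1
            j += 1
        elif l_ind[i] < r_ind[j]:
            i += 1  # advance the side with the smaller value
        else:
            j += 1
    return overlap


count = sorted_set_overlap_sketch([1, 3, 5, 8], [3, 4, 5, 9])  # shares 3 and 5
```

The loop visits each element at most once, so the cost is linear in the combined length of the inputs.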
delex.utils.numba_functions.typed_list_to_array(l)

convert a numba typed list to a numpy array

delex.utils.traits module

class delex.utils.traits.SparkDistributable

Bases: ABC

Methods

deinit()

deinitialize the object, closing resources (e.g. file handles).

init()

initialize the object to be used on a Spark worker

to_spark()

send the obj to the spark cluster to be used on spark workers

abstractmethod deinit()

deinitialize the object, closing resources (e.g. file handles)

abstractmethod init()

initialize the object to be used on a Spark worker

abstractmethod to_spark()

send the obj to the spark cluster to be used on spark workers
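
The lifecycle implied by the three abstract methods can be sketched without Spark. Everything here is hypothetical: `SparkDistributableSketch` only mirrors the method names listed above, `FileBackedLines` is an invented implementer, and its to_spark is a placeholder that does nothing.

```python
import os
import tempfile
from abc import ABC, abstractmethod


class SparkDistributableSketch(ABC):
    """Hypothetical mirror of the three abstract methods listed above."""

    @abstractmethod
    def init(self): ...

    @abstractmethod
    def deinit(self): ...

    @abstractmethod
    def to_spark(self): ...


class FileBackedLines(SparkDistributableSketch):
    """Hypothetical implementer that holds a file handle while in use."""

    def __init__(self, path):
        self.path = path
        self.handle = None

    def to_spark(self):
        # Placeholder: a real implementation would ship data to the cluster.
        pass

    def init(self):
        self.handle = open(self.path, "r", encoding="utf-8")

    def deinit(self):
        if self.handle is not None:
            self.handle.close()
            self.handle = None


fd, tmp_path = tempfile.mkstemp(suffix=".txt")
with os.fdopen(fd, "w", encoding="utf-8") as out:
    out.write("alpha\nbeta\n")

obj = FileBackedLines(tmp_path)
obj.to_spark()  # driver side (no-op in this sketch)
obj.init()      # worker side, before use
lines = obj.handle.read().splitlines()
obj.deinit()    # worker side, after use
os.remove(tmp_path)
```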

Module contents