delex.utils package
Submodules
delex.utils.build_cache module
- class delex.utils.build_cache.BuildCache
Bases:
object
a cache of indexes, tokenizers, etc.
Methods
get(key)
get the object associated with key.
- get(key: CachedObjectKey) → CacheItem
get the object associated with key. If key doesn’t exist in the cache, a new CacheItem is added to the cache and returned
- Parameters:
- key : CachedObjectKey
the key for the CacheItem being retrieved
- Returns:
- CacheItem
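The get-or-create behavior described above can be sketched as follows. This is only an illustration under stated assumptions, not the delex implementation: `SketchCache` and `SketchItem` are hypothetical stand-ins for BuildCache and CacheItem, and the use of `threading.Lock` and a plain dict is assumed.

```python
import threading


class SketchItem:
    """Hypothetical stand-in for CacheItem: a slot for an object plus a lock."""

    def __init__(self):
        self.lock = threading.Lock()  # assumption: guards obj during parallel builds
        self.obj = None               # filled in by whoever builds the object


class SketchCache:
    """Hypothetical stand-in for BuildCache."""

    def __init__(self):
        self._items = {}
        self._lock = threading.Lock()  # protects the dict itself

    def get(self, key):
        # Return the existing item, or insert and return a new empty one.
        with self._lock:
            item = self._items.get(key)
            if item is None:
                item = SketchItem()
                self._items[key] = item
            return item


cache = SketchCache()
item = cache.get("tokenizer")   # first call creates the item
with item.lock:                 # only one builder fills the slot at a time
    if item.obj is None:
        item.obj = str.split    # placeholder for an expensive build step
same = cache.get("tokenizer")   # later calls return the same item
```

Because a missing key yields a new empty item rather than an error, callers in this sketch check `item.obj` under the item's lock to decide whether the object still needs to be built.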
- class delex.utils.build_cache.CacheItem
Bases:
object
A lockable item in the BuildCache. Essentially a pointer with a mutex to guard it for parallel builds
- Attributes:
obj
the object (e.g. index, strings, tokenizer, etc.)
- property obj
the object (e.g. index, strings, tokenizer, etc.)
- class delex.utils.build_cache.CachedObjectKey
Bases:
object
A key for a cached object in the BuildCache
delex.utils.funcs module
- delex.utils.funcs.attach_current_thread_jvm()
- delex.utils.funcs.get_logger(name, level=10)
- delex.utils.funcs.human_format_bytes(n)
- delex.utils.funcs.init_jvm(vmargs=[])
- delex.utils.funcs.is_persisted(df)
- delex.utils.funcs.persisted(df, storage_level=StorageLevel(True, True, False, False, 1))
- delex.utils.funcs.size_in_bytes(f: Path, /) → int
get the size on disk in bytes of f
- Parameters:
- f : Path
path to the file or directory on the local filesystem
- Returns:
- int
if f is a file, the size of that single file; otherwise the total size in bytes of all files in the directory, similar to the du utility
- Raises:
- FileNotFoundError
if f doesn’t exist
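The documented contract of size_in_bytes can be approximated with pathlib. The function below is a hypothetical re-implementation for illustration, not the delex source; in particular it sums apparent file sizes, whereas du reports allocated disk blocks by default, so the numbers may differ from du's.

```python
from pathlib import Path


def size_in_bytes_sketch(f: Path, /) -> int:
    """Hypothetical sketch of the documented behavior of size_in_bytes."""
    if not f.exists():
        raise FileNotFoundError(f)
    if f.is_file():
        # Single file: report its own size.
        return f.stat().st_size
    # Directory: sum the sizes of all regular files beneath it, recursively.
    return sum(p.stat().st_size for p in f.rglob("*") if p.is_file())
```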
- delex.utils.funcs.type_check(var, var_name, expected)
type checking utility; raises a type error if var isn’t of the expected type
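A check of this kind is commonly written with isinstance. The helper below is a hypothetical sketch with the same parameter names; the use of isinstance, the TypeError class, and the message format are all assumptions rather than the delex implementation.

```python
def type_check_sketch(var, var_name, expected):
    """Hypothetical sketch: raise TypeError unless var is an instance of expected."""
    if not isinstance(var, expected):
        raise TypeError(
            f"{var_name} must be of type {expected}, got {type(var).__name__}"
        )


type_check_sketch(3, "n", int)             # passes silently
type_check_sketch("a", "s", (str, bytes))  # a tuple of types also works with isinstance
```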
delex.utils.hash_function module
- class delex.utils.hash_function.HashFunction(seed=None)
Bases:
object
a simple wrapper class for XXHash3
Methods
hash(s)
hash s and return the 128 bits as bytes
hash_split(s, /)
hash s and return the 128 bits split between two ints
- hash(s: str) → bytes
hash s and return the 128 bits as bytes
- hash_split(s: str, /) → Tuple[int, int]
hash s and return the 128 bits split between two ints
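To illustrate what splitting a 128-bit digest between two ints can look like, the sketch below uses BLAKE2b from the standard library as a stand-in 128-bit hash. This is a swapped-in technique: delex wraps XXHash3, not BLAKE2b, so the values will differ. The function names, the UTF-8 encoding, the little-endian byte order, and which half comes first are all assumptions.

```python
import hashlib
from typing import Tuple


def hash128_sketch(s: str) -> bytes:
    # Stand-in 128-bit digest (BLAKE2b truncated to 16 bytes), NOT XXHash3.
    return hashlib.blake2b(s.encode("utf-8"), digest_size=16).digest()


def hash_split_sketch(s: str, /) -> Tuple[int, int]:
    digest = hash128_sketch(s)
    # Assumption: the first 8 bytes form one unsigned 64-bit int and the
    # last 8 bytes the other, both read as little-endian.
    first = int.from_bytes(digest[:8], "little")
    second = int.from_bytes(digest[8:], "little")
    return first, second
```

Two 64-bit halves fit in fixed-width integer arrays, which is one plausible reason to expose the split form alongside the raw bytes.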
delex.utils.numba_functions module
- delex.utils.numba_functions.sorted_set_overlap(l_ind, r_ind, /)
compute the overlap between two sorted unique arrays
- Returns:
- int
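Assuming "overlap" here means the number of elements the two arrays have in common, it can be computed with a two-pointer merge in a single linear pass. The pure-Python function below is a hypothetical sketch of that technique, not the numba-compiled delex function.

```python
def sorted_set_overlap_sketch(l_ind, r_ind, /):
    """Count elements common to two sorted sequences of unique values."""
    i = j = overlap = 0
    while i < len(l_ind) and j < len(r_ind):
        if l_ind[i] == r_ind[j]:
            # Match: count it and advance both sides.
            overlap += 1
            i += 1
            j += 1
        elif l_ind[i] < r_ind[j]:
            i += 1  # left value is too small to match anything remaining on the right
        else:
            j += 1  # right value is too small to match anything remaining on the left
    return overlap


print(sorted_set_overlap_sketch([1, 3, 5, 7], [3, 4, 5, 8]))  # prints 2
```

The sortedness and uniqueness stated in the description are what make this pass valid: each element is visited at most once, so the cost is linear in the combined length of the inputs.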
- delex.utils.numba_functions.typed_list_to_array(l)
convert a numba typed list to a numpy array
delex.utils.traits module
- class delex.utils.traits.SparkDistributable
Bases:
ABC
Methods
deinit()
deinitialize the object, closing resources (e.g. file handles).
init()
initialize the object to be used on a spark worker
to_spark()
send the obj to the spark cluster to be used on spark workers
- abstractmethod deinit()
deinitialize the object, closing resources (e.g. file handles)
- abstractmethod init()
initialize the object to be used on a spark worker
- abstractmethod to_spark()
send the obj to the spark cluster to be used on spark workers
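The three abstract methods suggest a lifecycle: send the object from the driver, open resources on the worker, and close them afterwards. The sketch below mirrors that interface locally so it runs without Spark or delex. `DistributableSketch` and `FileBackedLookup` are hypothetical names, and the no-op `to_spark` is a placeholder, since how delex actually ships objects to the cluster is not described here.

```python
from abc import ABC, abstractmethod


class DistributableSketch(ABC):
    """Local mirror of the three documented abstract methods."""

    @abstractmethod
    def init(self): ...

    @abstractmethod
    def deinit(self): ...

    @abstractmethod
    def to_spark(self): ...


class FileBackedLookup(DistributableSketch):
    """Hypothetical implementer that reads from a file on the worker."""

    def __init__(self, path):
        self.path = path
        self._handle = None  # opened lazily, never on construction

    def to_spark(self):
        # Placeholder: a real implementation would make the file
        # available to the cluster here. This sketch does nothing.
        pass

    def init(self):
        # Open the resource on the worker if it is not open yet.
        if self._handle is None:
            self._handle = open(self.path, "rb")

    def deinit(self):
        # Close the resource and reset, so init() can be called again.
        if self._handle is not None:
            self._handle.close()
            self._handle = None

    def read(self):
        self._handle.seek(0)
        return self._handle.read()
```

Keeping the open file handle out of the constructor means the object holds only the path until `init()` runs, which is one way to keep such an object cheap to copy between processes.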