API Reference
Algorithm Utils
Copyright 2024 TESCAN 3DIM, s.r.o. All rights reserved
- class algorithm_utils.AlgorithmDeployer.AlgorithmDeployer(algorithm_directory: str)
The AlgorithmDeployer class is used to deploy an algorithm to the algorithm store in the database. The algorithm directory is automatically split: all .py files are detected, zipped into a Python module, and stored in the module-store collection, while all other files are stored as assets in the asset-store collection. The algorithm metadata is stored in the algorithm-store collection as a JSON file containing the algorithm name, major and minor version, the module id, the assets dictionary, and the timestamp of when the algorithm was stored.
- Parameters:
algorithm_directory (str) – The path to the algorithm directory.
- static calculate_etag(file: bytes) str
Calculate the etag hash of a file.
- Parameters:
file (bytes) – The file.
- Returns:
The etag hash.
- Return type:
str
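The hashing scheme is not specified here, but S3-style etags for single-part uploads are the MD5 hex digest of the file contents. A minimal sketch under that assumption:

```python
import hashlib

def calculate_etag(file: bytes) -> str:
    # Assumption: the etag is the plain MD5 hex digest of the file
    # contents, as it is for single-part S3 uploads.
    return hashlib.md5(file).hexdigest()
```

For example, `calculate_etag(b"hello")` returns the well-known MD5 digest of "hello".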
- static check_if_zip_is_importable(path_to_zip: str, module_name: str) bool
Check if the zipped Compox module is importable. This serves as a sanity check that the environment where the algorithm is being deployed has the necessary dependencies available.
- Parameters:
path_to_zip (str) – The path to the zip file.
module_name (str) – The name of the module.
- Returns:
True if the module is importable, False otherwise.
- Return type:
bool
- classmethod deploy_from_zip(zip_path: str, database_connection: BaseConnection | None = None, algorithm_name_override: str | None = None, algorithm_major_version_override: str | None = None, algorithm_collection_name: str = 'algorithm-store', module_collection_name: str = 'module-store', asset_collection_name: str = 'asset-store') str
Deploy an algorithm from a zip archive containing the algorithm files.
- Parameters:
zip_path (str) – Path to the algorithm zip archive.
database_connection (BaseConnection.BaseConnection | None) – The database connection object.
algorithm_name_override (str | None) – The algorithm name override.
algorithm_major_version_override (str | None) – The algorithm major version override.
algorithm_collection_name (str, optional) – The name of the collection to store the algorithm.
module_collection_name (str, optional) – The name of the collection to store the module.
asset_collection_name (str, optional) – The name of the collection to store the assets.
- Returns:
algorithm id
- Return type:
str
- static find_other_than_py_files(directory: str, ignore_pycache: bool = True, ignore_gitignore: bool = True) list[str]
Find all the files in a directory other than .py files.
- Parameters:
directory (str) – The directory to search.
ignore_pycache (bool, optional) – Whether to ignore the __pycache__ directory. The default is True.
ignore_gitignore (bool, optional) – Whether to ignore the .gitignore file. The default is True.
- Returns:
The list of files other than .py files.
- Return type:
list[str]
- static find_py_files(directory: str, ignore_pycache: bool = True) tuple[list[str], str]
Find all the .py files in a directory recursively.
- Parameters:
directory (str) – The directory to search.
ignore_pycache (bool, optional) – Whether to ignore the __pycache__ directory. The default is True.
- Returns:
A tuple containing a list of py files in a directory and a string representing their combined hash.
- Return type:
tuple[list[str], str]
- static generate_uuid(version: int = 1) str
Generate a uuid.
- Parameters:
version (int, optional) – The version of the uuid. The default is 1.
- Returns:
The uuid.
- Return type:
str
- Raises:
ValueError – if version of the uuid is not 1 or 4.
- static get_py_files_hashes(py_files: list[str], base_directory: str | None = None) str
Get a combined hash of all the .py files.
- Parameters:
py_files (list[str]) – The list of .py files.
- Returns:
The combined md5 hash of all the .py files.
- Return type:
str
- parse_pyproject_toml(path_to_algorithm_directory: str) dict
Parse the pyproject.toml file in the algorithm directory to get the algorithm name, major version and minor version.
- Parameters:
path_to_algorithm_directory (str) – The path to the algorithm directory.
- Returns:
The algorithm name, major version and minor version.
- Return type:
dict
- Raises:
FileNotFoundError – If pyproject.toml not found in algorithm directory.
- static process_path_to_dict_key(path: str) str
This method takes a path to a file specified as directory and substitutes any backslashes and double backslashes with forward slashes. It also removes the leading forward slash if it exists. This is necessary to store the directory structure as a dictionary key which can then be accessed by the server independently of the operating system, where the deployment is performed.
- Parameters:
path (str) – A path to a file.
- Returns:
The processed path with forward slashes and without the leading forward slash.
- Return type:
str
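The normalisation described above can be sketched directly (the exact replacement order is an assumption):

```python
def process_path_to_dict_key(path: str) -> str:
    # Collapse double backslashes first, then single ones, so a
    # Windows path maps to the same key as its POSIX equivalent.
    key = path.replace("\\\\", "/").replace("\\", "/")
    # Drop a single leading forward slash if present.
    return key[1:] if key.startswith("/") else key
```

For example, `process_path_to_dict_key("files\\weights.pth")` yields `"files/weights.pth"` regardless of the operating system the deployment runs on.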
- store_algorithm(database_connection: BaseConnection | None = None, algorithm_name_override: str | None = None, algorithm_major_version_override: str | None = None, algorithm_collection_name: str = 'algorithm-store', module_collection_name: str = 'module-store', asset_collection_name: str = 'asset-store') str
Store the algorithm to the algorithm store.
- Parameters:
database_connection (BaseConnection.BaseConnection | None) – The database connection object. Can be None if the algorithm is not supposed to be stored in the database (e.g. for local testing and development).
algorithm_name_override (str | None) – The algorithm name override.
algorithm_major_version_override (str | None) – The algorithm major version override.
algorithm_collection_name (str, optional) – The name of the collection to store the algorithm. The default is “algorithm-store”.
module_collection_name (str, optional) – The name of the collection to store the module. The default is “module-store”.
asset_collection_name (str, optional) – The name of the collection to store the assets. The default is “asset-store”.
- Returns:
algorithm id
- Return type:
str
- Raises:
Exception – if storing the algorithm module or assets failed
- class algorithm_utils.AlgorithmManager.AlgorithmManager(database_connection: BaseConnection, algorithms_collection: str = 'algorithm-store', module_collection: str = 'module-store', assets_collection: str = 'asset-store', checkpoint_collection: str = 'algorithm-checkpoint-store')
This class is responsible for managing the algorithms, modules and assets in the database. It provides methods to list or delete algorithms, modules and assets. To store the algorithms, modules and assets, use the AlgorithmDeployer class.
- Parameters:
database_connection (BaseConnection.BaseConnection) – The database connection to use for the operations.
algorithms_collection (str) – The name of the collection where the algorithms are stored.
module_collection (str) – The name of the collection where the modules are stored.
assets_collection (str) – The name of the collection where the assets are stored.
- delete_algorithm(name: str | None = None, major_version: str | None = None) None
Delete an algorithm and associated modules and assets.
- Parameters:
name (str | None) – The name of the algorithm to delete.
major_version (str | None) – The major version of the algorithm to delete.
- Return type:
None
- Raises:
ValueError – if name or major_version is not specified
- delete_algorithm_minor_version(name: str, major_version: str, minor_version: str) None
Delete a specific minor version of an algorithm. The algorithm itself is not deleted, as long as there are other minor versions present. If the last minor version is deleted, the entire algorithm is deleted.
Delete associated modules and assets if they are not used by other algorithms/minor versions.
- Parameters:
name (str) – The name of the algorithm.
major_version (str) – The major version of the algorithm.
minor_version (str) – The minor version of the algorithm to delete.
- Return type:
None
- list_algorithms(name: str | None = None, major_version: str | None = None) list[dict]
List all algorithms stored in the database. Optionally can filter by name or major version of the algorithm.
- Parameters:
name (str | None, optional) – Can be used to filter the algorithms by name.
major_version (str | None, optional) – Can be used to filter the algorithms by major version.
- Returns:
The list of algorithms defined by their jsons
- Return type:
list[dict]
- class algorithm_utils.BaseRunner.BaseRunner
Base class for all runners. Specifies the architecture of a runner and the required methods.
When implementing a new runner, the following methods need to be implemented:
- preprocess: Preprocess the input data.
- inference: Run the inference on the output of the preprocessing.
- postprocess: Postprocess the output of the inference.
- property device: str
Get the device on which the model and inference will be run. This is set during the initialization of the runner.
- Returns:
The device that will be used to run the model and inference
- Return type:
str
- download_dataset_to_temp_store(dataset: TrainingDataset, pydantic_data_schemas: dict[str, Type[DataSchema]]) list[list[dict]]
Downloads the entire training dataset to the temporary store while preserving the directory structure logically represented in the sample manifests.
A subdirectory named after each sample’s ID will be created within the specified folder path in the temporary store. Then subdirectories for each sample key will be created within the sample ID directory. Finally, the files associated with each sample key will be saved in their respective subdirectories.
- Parameters:
dataset (TrainingDataset) – The training dataset to be downloaded.
pydantic_data_schemas (dict[str, Type[DataSchema]]) –
A dictionary mapping sample keys to their corresponding Pydantic data schema classes for validating the data.
e.g. {“input”: InputDataSchema, “label”: LabelDataSchema}
- Returns:
A list of lists of dictionaries with the individual samples represented as dictionaries following the structure of the sample manifests, but with local paths in the temporary store instead of file IDs.
- Return type:
list[list[dict]]
- download_files_to_temp_store(folder_path: str | Path, file_ids: List[str], pydantic_data_schema: Type[DataSchema], batch_size: int = 8, *keys: str)
Downloads files from the database to a specific folder in temporary storage created specifically for training purposes. This method works directly with the file identifiers in the database, which means that the files do not need to be loaded into memory, but are downloaded directly to the temporary storage. You must provide a pydantic schema to validate the data before saving.
- Parameters:
folder_path (str | Path) – The path to the folder in the temporary storage where the files will be saved.
file_ids (List[str]) – The list of file identifiers in the database.
pydantic_data_schema (Type[DataSchema]) – The pydantic schema of the data. Must inherit from the DataSchema class.
batch_size (int, optional) – The number of files to download in a single batch. Default is 8.
*keys (str) – Optional keys to filter the files to download.
- fetch_asset(asset_path: str) BytesIO
Fetches an asset as bytes from the database by its path relative to the algorithm Runner class.
- Parameters:
asset_path (str) – The path to the asset relative to the algorithm Runner class. e.g. “files/weights.pth”
- Returns:
The asset as bytes.
- Return type:
io.BytesIO
- fetch_data(file_ids: list[str], pydantic_data_schema: Type[DataSchema], *keys: str, parallel: bool = False) list[dict]
Fetches the data from the database. A pydantic schema must be provided to validate the data. The data is fetched as a list of dictionaries, where each dictionary represents a dataset. Specific keys can be provided to fetch from the HDF5 file; if not provided, all keys will be fetched. This method is a wrapper around the fetch_data method of the TaskHandler class.
- Parameters:
file_ids (list[str]) – The identifiers of the data files in the database.
pydantic_data_schema (Type[DataSchema]) – The pydantic schema of the data. Must inherit from the DataSchema class.
*keys (str) – Optional keys to fetch from the HDF5 file, if not provided, all keys will be fetched.
parallel (bool, optional) – If True, the data will be fetched in parallel. Default is False.
- Returns:
List of the datasets fetched from the database as dictionaries.
- Return type:
list[dict]
- get_state() dict
Get the current state of the runner. The state is a dictionary that can be used to store any information that might be useful for the client, such as intermediate metrics, loss values, etc.
- Returns:
The current state of the runner.
- Return type:
dict
- get_training_dataset(training_sample_ids: list[str]) TrainingDataset
Retrieves the training dataset record from the database.
- Parameters:
training_sample_ids (list[str]) – The training sample ids.
- Returns:
The training dataset record.
- Return type:
TrainingDataset
- Raises:
ValueError – If training dataset could not be fetched.
- abstractmethod inference(data: Any, args: dict = None) Any
Run the inference.
- Parameters:
data (Any) – The input data.
args (dict) – Additional arguments.
- Returns:
The output data.
- Return type:
Any
- Raises:
NotImplementedError –
- inference_base(data: Any, args: dict = None) Any
Run the inference.
- Parameters:
data (Any) – The input data.
args (dict) – Additional arguments.
- Returns:
The output data.
- Return type:
Any
- initialize(device: str | None = None) None
Initialize the runner with the given device. This method is called by the TaskHandler when the algorithm is fetched. It is used to set the device on which the model and inference will be run.
- Parameters:
device (str | None) – The device on which the model and inference will be run. e.g. “cpu”, “cuda:0” or “cuda:1”. This is set during the initialization of the runner.
- Return type:
None
- load_assets()
This method should be overridden to load all necessary assets for the algorithm, such as trained models, precomputed data, or other resources.
Assets must be loaded using self.fetch_asset() instead of accessing the file system directly. All assets should be stored as attributes on the runner instance.
WARNING: The attributes set in this method will be protected against reassignment in other parts of the code, so they should not be modified after this method is called. However, this protection does not hold for mutating mutable types with in-place operations (e.g., appending to a list or modifying a dictionary). If you need to modify such attributes, consider using a different approach.
- load_dataset_from_temp_store(local_samples: list[list[dict]]) list[list[dict]]
Loads the entire training dataset from the temporary store while preserving the directory structure logically represented in the sample manifests.
- Parameters:
local_samples (list[list[dict]]) – A list of lists of dictionaries with the individual samples represented as dictionaries following the structure of the sample manifests, but with local paths in the temporary store instead of file IDs.
- Returns:
A list of lists of dictionaries with the individual samples represented as dictionaries following the structure of the sample manifests, but with loaded data dictionaries instead of file IDs.
- Return type:
list[list[dict]]
- load_files_from_temp_store(paths: List[str | Path], parallel: bool = True, *keys: str) List[dict]
Loads files from the temporary store.
- Parameters:
paths (List[str | Path]) – The list of file paths in the temporary store to be loaded.
parallel (bool, optional) – Whether to load the files in parallel, by default True.
*keys (str) – The keys to extract from the loaded data dictionaries. If no keys are provided, the entire data dictionary will be returned.
- Returns:
The list of loaded data dictionaries.
- Return type:
list[dict]
- load_item_from_session(key: str) Any
Fetch an item from the session cache.
- Parameters:
key (str) – The key to fetch the item.
- Returns:
The item fetched from the session cache.
- Return type:
Any
- log_message(message: str, logging_level: str = 'INFO') None
Log a message.
- Parameters:
message (str) – The message to log.
logging_level (str) – The logging level as defined in the logging module. Default is “INFO”.
- Return type:
None
- Raises:
ValueError – If an invalid logging level is provided.
- post_data(data: list[dict], pydantic_data_schema: Type[DataSchema], parallel: bool = False) list[str]
Uploads a list of datasets to the database. Each dataset is a dictionary where the keys are the names of the datasets and the values are the datasets themselves (e.g. numpy arrays). A pydantic schema must be provided to validate the data before uploading. The data is uploaded as HDF5 files. This method is a wrapper around the post_data method of the TaskHandler class.
- Parameters:
data (list[dict]) – List of the datasets to upload. Each dataset is defined as a dictionary.
pydantic_data_schema (Type[DataSchema]) – The pydantic schema of the data. Must inherit from the DataSchema class.
parallel (bool, optional) – If True, the data will be uploaded in parallel. Default is False.
- Returns:
List of the identifiers of the uploaded datasets.
- Return type:
list[str]
- abstractmethod postprocess(data: Any, args: dict = None) list[str]
Postprocess the output data.
- Parameters:
data (Any) – The input data.
args (dict) – Additional arguments.
- Returns:
The ids of the output datasets.
- Return type:
list[str]
- Raises:
NotImplementedError –
- postprocess_base(data: Any, args: dict = None) list[str]
Postprocess the output data.
- Parameters:
data (Any) – The input data.
args (dict) – Additional arguments.
- Returns:
The ids of the output datasets.
- Return type:
list[str]
- abstractmethod preprocess(input_data: dict, args: dict = None) Any
Preprocess the input data.
- Parameters:
input_data (dict) – The input data.
args (dict) – Additional arguments.
- Returns:
The preprocessed input data.
- Return type:
Any
- Raises:
NotImplementedError –
- preprocess_base(input_data: dict, args: dict = None) Any
Preprocess the input data.
- Parameters:
input_data (dict) – The input data.
args (dict) – The additional arguments
- Returns:
The preprocessed input data.
- Return type:
Any
- remove_item_from_session(key: str) None
Remove an item from the session cache.
- Parameters:
key (str) – The key to remove the item.
- Return type:
None
- run(input_data: dict, args: dict = None) None
Run the algorithm.
- Parameters:
input_data (dict) – The input data.
args (dict) – Additional arguments.
- Return type:
None
- Raises:
Exception – If an error occurs during the execution.
- run_training(training_data: list[str], args: dict = None) tuple[str, str, str]
Train the algorithm.
- Parameters:
training_data (list[str]) – The training samples ids.
args (dict) – Additional arguments for training.
- Returns:
The trained algorithm id, name and major version.
- Return type:
tuple[str, str, str]
- property runner_context: dict
Get the current runner context. This is used to access the runner context methods and attributes.
- Returns:
current runner context
- Return type:
dict
- save_checkpoint(checkpoint: dict[str, bytes], properties: dict = {}) str
Save a training checkpoint to the database.
- Parameters:
checkpoint (dict[str, bytes]) – The dictionary containing the checkpoint files. The keys are the file names, which must correspond to the asset keys used to load the assets in the load_assets() method. The values are the file contents as bytes, e.g. PyTorch model weights converted to bytes using io.BytesIO().
properties (dict, optional) – Additional properties to associate with the checkpoint. Default is an empty dictionary.
- Returns:
The identifier of the saved checkpoint.
- Return type:
str
- Raises:
ValueError – If saving the checkpoint fails.
- save_item_to_session(obj: Any, key: str) None
Save an item to the session cache.
- Parameters:
obj (Any) – The item to save.
key (str) – The key to save the item.
- Return type:
None
- save_training_files_to_temp_store(folder_path: str | Path, files: List[dict], pydantic_data_schema: Type[DataSchema], parallel: bool = True) list[Path]
Saves training files represented by a list of dictionaries to a specific folder in temporary storage created specifically for training purposes. You must provide a pydantic schema to validate the data before saving. This method should be used when some data (mainly numpy arrays) are loaded into memory after some preprocessing and need to be saved to the temporary storage so that they can be accessed during training.
- Parameters:
folder_path (str | Path) – The path to the folder in the temporary storage where the files will be saved.
files (List[dict]) – The list of files to save. Each file is represented as a dictionary.
pydantic_data_schema (Type[DataSchema]) – The pydantic schema of the data. Must inherit from the DataSchema class.
parallel (bool, optional) – If True, the files will be saved in parallel. Default is True.
- Returns:
List of the paths to the saved files.
- Return type:
list[Path]
- set_progress(progress: float) None
Set the progress of the execution. The progress must be a float between 0 and 1.
- Parameters:
progress (float) – The progress of the execution.
- Raises:
ValueError – If progress is not a float between 0 and 1.
- set_state(state: dict) None
Set the state of the runner. The state is a dictionary that can be used to store any information that might be useful for the client, such as intermediate metrics, loss values, etc.
WARNING: The state is always overwritten, not merged, so it is up to the developer to either fetch the current state using get_state() or keep track of the state in the algorithm code.
- Parameters:
state (dict) – The state of the runner.
- Return type:
None
- property task_handler: TaskHandler
Get the current task handler. This is used to access the task handler methods and attributes.
- Returns:
Current task handler.
- Return type:
TaskHandler
- Raises:
ValueError – If task handler is not set.
- train(training_data: list[str], args: dict = None) None
Train the algorithm.
- Parameters:
training_data (list[str]) – The training samples ids.
args (dict) – Additional arguments for training.
- Return type:
None
- Raises:
NotImplementedError –
Tasks
- class tasks.TaskHandler.TaskHandler(task_id: str, database_connection: S3Connection, database_update: bool = True, task_session: TaskSession | None = None)
Task handler class for the execution task. This class is used to update the progress, status and log of the execution task. Also contains methods to fetch the algorithm, assets and data from the database server of choice.
- Parameters:
task_id (str) – The identifier of the task. Typically a UUID.
database_connection (S3Connection) – The database connection object instance. Must inherit from the BaseConnection class and implement the required methods.
database_update (bool, optional) – Whether to update the execution record in the database, by default True. Can be set to False, for example, when debugging locally.
task_session (TaskSession | None, optional) – The task session object instance. Must inherit from the TaskSession class, by default None.
- fetch_algorithm(algorithm_id: str, execution_device_override: str | None = None, checkpoint_id: str | None = None, algorithm_minor_version: str | None = None) object
Fetches the algorithm from the database and imports its corresponding Python module and runner class.
- Parameters:
algorithm_id (str) – The id of the algorithm.
execution_device_override (str | None, optional) – The requested abstract execution device class, by default None. This uses the algorithm metadata vocabulary (for example “cpu”, “gpu” or “mps”). Compox resolves this request to a concrete runtime device string passed into the runner, such as “cpu”, “cuda” or “mps”.
checkpoint_id (str | None, optional) – The id of the checkpoint, by default None. If provided, the checkpoint will be used to load the model assets.
algorithm_minor_version (str | None, optional) – The minor version of the algorithm, by default None. If provided, the minor version will be used to load the model assets.
- Returns:
The algorithm Runner object.
- Return type:
object
- Raises:
ValueError – If fetch algorithm failed.
- fetch_asset(asset_path: str) BytesIO
Fetches an asset as bytes from the database by its path relative to the algorithm Runner class.
- Parameters:
asset_path (str) – The path to the asset relative to the algorithm Runner class. e.g. “files/weights.pth”
- Returns:
The asset as bytes.
- Return type:
io.BytesIO
- Raises:
ValueError – If fetch asset failed.
- fetch_data(file_ids: list[str], pydantic_data_schema: Type[DataSchema], *keys: str, parallel: bool = False) list[dict]
Fetches the data from the database. A pydantic schema must be provided to validate the data. The data is fetched as a list of dictionaries, where each dictionary represents a dataset. Specific keys can be provided to fetch from the HDF5 file; if not provided, all keys will be fetched.
- Parameters:
file_ids (list[str]) – The identifiers of the data files in the database.
pydantic_data_schema (Type[DataSchema]) – The pydantic schema of the data. Must inherit from the DataSchema class.
*keys (str) – Optional keys to fetch from the HDF5 file, if not provided, all keys will be fetched.
parallel (bool, optional) – If True, the data will be fetched in parallel. Default is False.
- Returns:
List of the datasets fetched from the database as dictionaries.
- Return type:
list[dict]
- Raises:
Exception –
- load_item_from_session(key: str) Any
Load an object from the task session.
- Parameters:
key (str) – The key of the object to load.
- Returns:
The object loaded from the task session.
- Return type:
Any
- Raises:
ValueError – If the task session is not initialized.
- mark_as_completed(output_dataset_ids: list[str]) None
Mark the task as completed and update its record in the database. This will set the progress to 1.0, the status to “COMPLETED” and the time completed to the current time.
- Parameters:
output_dataset_ids (list[str]) – The output dataset identifiers of the task.
- Return type:
None
- mark_as_failed(e: Exception | None = None) None
Mark the task as failed and update its record in the database. This will set the progress to 1.0, the status to “FAILED” and the time completed to the current time. The exception that caused the task to fail will be logged in the task log.
- Parameters:
e (Exception | None, optional) – The exception that caused the task to fail, by default None. It will be logged in the task log.
- Return type:
None
- mark_as_stopped() None
Mark the task as stopped and update its record in the database. This will set the status to “STOPPED” and the time completed to the current time.
- Return type:
None
- property output_dataset_ids
The output dataset identifiers of the task.
- Getter:
Returns the output dataset identifiers of the task.
- Setter:
Sets the output dataset identifiers of the task.
- Type:
list[str]
- post_data(result: list[dict], pydantic_data_schema: Type[DataSchema], parallel: bool = False) list[str]
Uploads a list of datasets to the database. The dataset is a dictionary where the keys are the names of the datasets and the values are the datasets themselves (e.g. numpy arrays). A pydantic schema must be provided to validate the data before uploading. The data is uploaded as HDF5 files.
- Parameters:
result (list[dict]) – The result to upload to the database.
pydantic_data_schema (Type[DataSchema]) – The pydantic schema of the data. Must inherit from the DataSchema class.
parallel (bool, optional) – If True, the data will be uploaded in parallel. Default is False.
- Returns:
The dataset identifiers of the uploaded datasets.
- Return type:
list[str]
- Raises:
Exception –
- property progress
The progress of the task in the range [0., 1.].
- Getter:
Returns the progress of the task.
- Setter:
Sets the progress of the task.
- Type:
float
- remove_item_from_session(key: str) None
Remove an object from the task session.
- Parameters:
key (str) – The key of the object to remove.
- Return type:
None
- Raises:
Exception –
ValueError – If task session is not initialized.
- save_item_to_session(obj: Any, key: str) None
Save an object to the task session.
- Parameters:
obj (Any) – The object to save.
key (str) – The key to save the object under.
- Return type:
None
- Raises:
Exception –
ValueError – If task session is not initialized.
- property session_token
The identifier of the session. Typically a UUID.
- Getter:
Returns the session id.
- Setter:
Sets the session id.
- Type:
str
- set_as_current_handler() None
Set this task handler as the current task handler in the current_task_handler context variable. This is used to access the current task handler from anywhere in the code.
- Return type:
None
- property status
The status of the task. e.g. “RUNNING”, “COMPLETED”, “FAILED”
- Getter:
Returns the status of the task.
- Setter:
Sets the status of the task.
- Type:
str
- property task_id
The identifier of the task. Typically a UUID.
- Getter:
Returns the task id.
- Setter:
Sets the task id.
- Type:
str
- property time_completed
The time the task was completed.
- Getter:
Returns the time the task was completed.
- Setter:
Sets the time the task was completed.
- Type:
str
- update_log() None
Update the log of the task in the database. This method is called automatically when the task is completed or failed. It can also be called manually to update the log during the execution of the task.
- Return type:
None
- Raises:
Exception –
- exception tasks.TaskHandler.TaskStoppedException
- class tasks.DebuggingTaskHandler.DebuggingTaskHandler(task_id: str)
TaskHandler for debugging algorithm runners locally, without the need to have a running server. Works in local filesystem instead of database server.
- Parameters:
task_id (str) – The task id.
- fetch_algorithm(path_to_algorithm: str, device: str = 'cpu') object
Fetches the algorithm from the local filesystem.
- Parameters:
path_to_algorithm (str) – The path to the algorithm.
device (str) – The device to run the algorithm on.
- Returns:
The algorithm runner instance.
- Return type:
object
- Raises:
ImportError – If algorithm runner could not be imported.
- fetch_asset(path_to_asset: str) BytesIO
Fetches the asset from the local filesystem.
- Parameters:
path_to_asset (str) – The path to the asset.
- Returns:
The asset as a BytesIO object.
- Return type:
io.BytesIO
Sessions
- class session.TaskSession.TaskSession(session_token: str | None = None, max_number_of_data_caches: int = 5, max_cache_size: int = 5, max_cache_memory_mb: int | None = None, expire_hours: int = 24, not_implemented: bool = False)
The TaskSession class serves as a common interface for individual TaskHandler instances. A session is identified by a session token. Its main purpose is to handle in-memory data caches for algorithms; in some algorithms it is necessary to quickly access and modify data without repeatedly storing and fetching it from the database. The data is stored in a dictionary-like structure, where the key is the session token and the value is the data cache object. The session token is a unique identifier generated for each session. If the client wishes to continue the session, the session token is returned in the execution response, and the client can pass it in the session_token field of subsequent requests. A new session is then created with that session token and with access to the data stored in the cache under the particular session token.
TODO: this currently only works for a single process. If we want to scale this to multiple processes, we need to use a shared memory object with access across the individual worker nodes.
- data_caches
Dictionary storing all session caches. Keys are session tokens.
- Type:
dict
- Parameters:
session_token (str | None) – The identifier of the session. Typically a UUID.
max_number_of_data_caches (int) – The maximum number of data caches which will be stored in memory.
max_cache_size (int) – The maximum size of the cache.
max_cache_memory_mb (int | None) – The maximum memory in MB that the cache can use.
expire_hours (int) – The number of hours after which the session expires.
not_implemented (bool) – Bool which marks the session as not supported. This is currently used for marking the session as not supported for celery tasks.
- add_item(obj: Any, key: str) None
Store the item in the cache.
- Parameters:
obj (Any) – The item to store.
key (str) – The key to store the item with.
- Return type:
None
- Raises:
NotImplementedError –
- clear_cache()
Clear the cache.
- remove_item(key: str)
Remove the item from the cache.
- Parameters:
key (str) – The key to remove.
- Raises:
NotImplementedError –
- property session_token
The identifier of the session. Typically a UUID.
- Getter:
Returns the session id.
- Setter:
Sets the session id.
- Type:
str
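The session lifecycle described above can be pictured with a plain dictionary keyed by session token. The class below is a simplified stand-in mirroring the documented token/cache contract, not the actual TaskSession implementation:

```python
import uuid

# Module-level registry of per-session caches, keyed by session token.
_data_caches: dict = {}

class SimpleSession:
    """Minimal stand-in mirroring the TaskSession token/cache contract."""

    def __init__(self, session_token=None):
        # A new token is generated when the client does not continue a session.
        self.session_token = session_token or str(uuid.uuid4())
        # Reuse the existing cache for a continued session, else create one.
        self.cache = _data_caches.setdefault(self.session_token, {})

    def add_item(self, obj, key: str) -> None:
        self.cache[key] = obj

    def remove_item(self, key: str) -> None:
        self.cache.pop(key, None)

# First request: the server creates a session and caches intermediate data.
first = SimpleSession()
first.add_item({"mask": [0, 1, 1]}, "segmentation")
token = first.session_token  # returned to the client in the execution response

# Follow-up request: the client passes the token back and sees the same cache.
second = SimpleSession(session_token=token)
print(second.cache["segmentation"])  # {'mask': [0, 1, 1]}
```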
- class session.DataCache.DataCache(max_size: int = 5, max_memory_mb: int | None = None)
This class serves as a data cache for the task handler. It is used to store data in memory for quick access and modification. The cache is identified by a key which is used to store and retrieve the data. The cache has a maximum size and memory limit. If the cache exceeds the maximum size, the oldest item is removed. If the cache exceeds the maximum memory limit, the cache is cleared.
- Parameters:
max_size (int) – The maximum size of the cache.
max_memory_mb (int | None) – The maximum memory in MB that the cache can use.
- add_item(obj: Any, key: str)
Add an item to the cache.
- Parameters:
obj (Any) – The item to add to the cache.
key (str) – The key of the item.
- clear()
Clear the cache.
- remove_item(key: str)
Remove an item from the cache.
- Parameters:
key (str) – The key of the item to remove.
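The eviction rules described for DataCache (drop the oldest entry past max_size, clear everything past the memory limit) can be sketched with an OrderedDict. This is an illustrative stand-in, not the library code:

```python
import sys
from collections import OrderedDict
from typing import Any

class MiniDataCache:
    """Illustrative cache with the documented size/memory eviction rules."""

    def __init__(self, max_size: int = 5, max_memory_mb=None):
        self.max_size = max_size
        self.max_memory_mb = max_memory_mb
        self._items = OrderedDict()

    def add_item(self, obj: Any, key: str) -> None:
        self._items[key] = obj
        # Past the size limit: evict the oldest (first-inserted) entry.
        if len(self._items) > self.max_size:
            self._items.popitem(last=False)
        # Past the memory limit: clear the whole cache.
        if self.max_memory_mb is not None:
            used_mb = sum(sys.getsizeof(v) for v in self._items.values()) / 2**20
            if used_mb > self.max_memory_mb:
                self.clear()

    def remove_item(self, key: str) -> None:
        self._items.pop(key, None)

    def clear(self) -> None:
        self._items.clear()

cache = MiniDataCache(max_size=2)
cache.add_item("a", "k1")
cache.add_item("b", "k2")
cache.add_item("c", "k3")   # evicts "k1", the oldest entry
print(list(cache._items))   # ['k2', 'k3']
```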
Database Connection
- class database_connection.BaseConnection.BaseConnection
A generic database connection class. This class is meant to be inherited by specific database connection classes. It defines the methods for interacting with the object storage database. It assumes that the database is structured as a set of collections, where each collection contains a set of objects. The objects can be any type of data, such as files, images, or other objects. Objects and collections are both accessed by their names. For example, in an S3 database, the collections would be the buckets, and the objects would be the files in the buckets.
- check_collections_exists(collection_names: list[str]) list[bool]
Checks if collections exist.
- Parameters:
collection_names (list[str]) – The collection names.
- Returns:
The list of booleans indicating if the collections exist.
- Return type:
list[bool]
- Raises:
NotImplementedError –
- check_objects_exist(collection_name: str, object_names: list[str]) list[bool]
Checks if objects exist in a collection.
- Parameters:
collection_name (str) – The collection name.
object_names (list[str]) – The object names.
- Returns:
The list of booleans indicating if the objects exist.
- Return type:
list[bool]
- Raises:
NotImplementedError –
- create_collections(collection_names: list[str]) None
Creates collections.
- Parameters:
collection_names (list[str]) – The collection names.
- Raises:
NotImplementedError –
- delete_collections(collection_names: list[str]) None
Deletes collections.
- Parameters:
collection_names (list[str]) – The collection names.
- Raises:
NotImplementedError –
- delete_objects(collection_name: str, object_names: list[str]) None
Deletes objects from a collection.
- Parameters:
collection_name (str) – The collection name.
object_names (list[str]) – The object names.
- Return type:
None
- Raises:
NotImplementedError –
- get_object_tags(collection_name: str, object_name: str) dict[str, str]
Get object tags for a given object in a collection.
- Parameters:
collection_name (str) – The collection name.
object_name (str) – The object name.
- Returns:
The object tags.
- Return type:
dict[str, str]
- Raises:
NotImplementedError –
- get_objects(collection_name: str, object_names: list[str]) list[bytes]
Gets objects from a collection.
- Parameters:
collection_name (str) – The collection name.
object_names (list[str]) – The object names.
- Returns:
The list of bytes objects.
- Return type:
list[bytes]
- Raises:
NotImplementedError –
- list_collections() list
Lists all object collections.
- Returns:
The list of object collections.
- Return type:
list
- Raises:
NotImplementedError –
- list_objects(collection_name: str) list[dict] | list[str]
Lists all objects in a collection.
- Parameters:
collection_name (str) – The collection name.
- Returns:
The list of objects in the collection.
- Return type:
list[dict] | list[str]
- Raises:
NotImplementedError –
- put_object_tags(collection_name: str, object_name: str, tags: dict[str, str]) None
Put object tags for a given object in a collection.
- Parameters:
collection_name (str) – The collection name.
object_name (str) – The object name.
tags (dict[str, str]) – The object tags.
- Raises:
NotImplementedError –
- put_objects(collection_name: str, object_names: list[str], object: list[bytes] | list[str]) None
Puts objects into a collection.
- Parameters:
collection_name (str) – The collection name.
object_names (list[str]) – The object names.
object (list[bytes] | list[str]) – The byte objects.
- Return type:
None
- Raises:
NotImplementedError –
- put_objects_with_duplicity_check(collection_name: str, object_names: list[str], object: list[bytes]) list[bool] | list[str]
Puts objects into a collection with a duplicity check. Returns the list of object names, where the names of objects for which duplicates were found are substituted with the names of the duplicates.
- Parameters:
collection_name (str) – The collection name.
object_names (list[str]) – The object names.
object (list[bytes]) – The byte objects.
- Returns:
The list of object names.
- Return type:
list[bool] | list[str]
- Raises:
NotImplementedError –
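A concrete connection implements these methods against a real backend. An in-memory dictionary version (illustrative only; it defines the methods standalone since BaseConnection itself is not importable here) shows the expected contract:

```python
class DictConnection:
    """In-memory stand-in honoring the BaseConnection method contract."""

    def __init__(self):
        # collection name -> {object name -> bytes}
        self._store = {}

    def create_collections(self, collection_names) -> None:
        for name in collection_names:
            self._store.setdefault(name, {})

    def check_collections_exists(self, collection_names):
        return [name in self._store for name in collection_names]

    def put_objects(self, collection_name, object_names, object) -> None:
        for name, data in zip(object_names, object):
            self._store[collection_name][name] = data

    def get_objects(self, collection_name, object_names):
        return [self._store[collection_name][n] for n in object_names]

    def check_objects_exist(self, collection_name, object_names):
        return [n in self._store.get(collection_name, {}) for n in object_names]

    def delete_objects(self, collection_name, object_names) -> None:
        for n in object_names:
            self._store[collection_name].pop(n, None)

conn = DictConnection()
conn.create_collections(["data-store"])
conn.put_objects("data-store", ["img.raw"], [b"\xff\xd8"])
print(conn.get_objects("data-store", ["img.raw"]))  # [b'\xff\xd8']
```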
- class database_connection.S3Connection.S3Connection(endpoint_url: str, aws_access_key_id: str, aws_secret_access_key: str, region_name: str | None = None, data_store_expire_days: int = 1, execution_store_expire_days: int = 30, training_store_expire_days: int = 30, deploy_store_expire_days: int = 30, stop_requests_expire_days: int = 7, collection_prefix: str = '')
A connection class for an S3 object storage database. This class inherits from the BaseConnection class and implements the methods for interacting with an S3 object storage database.
NOTE: All lifecycle policies are initialized at bucket creation time, this means that changing the expiration days in the S3Connection instance after the bucket creation will NOT update the lifecycle policies of already existing buckets.
- Parameters:
endpoint_url (str) – The endpoint URL.
aws_access_key_id (str) – The AWS access key ID.
aws_secret_access_key (str) – The AWS secret access key.
region_name (str | None) – The region name.
data_store_expire_days (int) – The number of days after which the objects in the data-store bucket expire. Default is 1.
execution_store_expire_days (int) – The number of days after which the objects in the execution-store bucket expire. Default is 30.
training_store_expire_days (int) – The number of days after which the objects in the training-store bucket expire. Default is 30.
deploy_store_expire_days (int) – The number of days after which the objects in the deploy-store bucket expire. Default is 30.
stop_requests_expire_days (int) – The number of days after which the objects in the stop-requests bucket expire. Default is 7.
collection_prefix (str) – The prefix for the actual bucket names. The bucket names are constructed as {collection_prefix}{collection_name}. Default is an empty string.
- check_collections_exists(collection_names: list[str]) list[bool]
Checks if buckets exist.
- Parameters:
collection_names (list[str]) – The collection names.
- Returns:
The list of booleans indicating if the collections exist.
- Return type:
list[bool]
- check_objects_exist(collection_name: str, object_names: list[str]) list[bool]
Checks if objects exist in a bucket.
- Parameters:
collection_name (str) – The collection name.
object_names (list[str]) – The object keys.
- Returns:
The list of booleans indicating if the objects exist.
- Return type:
list[bool]
- create_collections(collection_names: list[str]) None
Creates collections.
- Parameters:
collection_names (list[str]) – The collection names.
- delete_collections(collection_names: list[str]) None
Deletes collections.
- Parameters:
collection_names (list[str]) – The collection names.
- delete_objects(collection_name: str, object_names: list[str]) None
Deletes objects in a collection.
- Parameters:
collection_name (str) – The collection name.
object_names (list[str]) – The object keys.
- generate_presigned_url(client_method: str, collection_name: str, object_name: str, expiration: int = 3600) str
Generate a generic presigned URL.
- Parameters:
client_method (str) – The S3 client method to use (e.g., ‘get_object’ or ‘put_object’).
collection_name (str) – The name of the bucket where the object will be stored.
object_name (str) – The key of the object in the bucket.
expiration (int, optional) – Time in seconds until the URL expires.
- Returns:
A presigned URL.
- Return type:
str
- get_object_tags(collection_name: str, object_name: str) dict[str, str]
Get object tags for a given object in a collection.
- Parameters:
collection_name (str) – The collection name.
object_name (str) – The object name.
- Returns:
The object tags.
- Return type:
dict[str, str]
- Raises:
NotImplementedError –
- get_objects(collection_name: str, object_names: list[str]) list[bytes]
Gets objects from a collection.
- Parameters:
collection_name (str) – The collection name.
object_names (list[str]) – The object keys.
- Returns:
The list of object bytes.
- Return type:
list[bytes]
- get_presigned_download_url(collection_name: str, object_name: str, expiration: int = 3600) str
Generate a presigned URL for downloading an object.
- Parameters:
collection_name (str) – The name of the bucket where the object is stored.
object_name (str) – The key of the object in the bucket.
expiration (int, optional) – Time in seconds until the URL expires.
- Returns:
A presigned URL that can be used to download the object.
- Return type:
str
- get_presigned_upload_url(collection_name: str, object_name: str, expiration: int = 3600) str
Generate a presigned URL for uploading an object.
- Parameters:
collection_name (str) – The name of the bucket where the object will be stored.
object_name (str) – The key of the object in the bucket.
expiration (int, optional) – Time in seconds until the URL expires.
- Returns:
A presigned URL that can be used to upload the object.
- Return type:
str
- list_collections() list
Lists all collections.
- Returns:
The list of collections.
- Return type:
list
- list_objects(collection_name: str) list[dict]
Lists all objects in a collection.
- Parameters:
collection_name (str) – The collection name.
- Returns:
The list of object keys.
- Return type:
list[dict]
- put_object_tags(collection_name: str, object_name: str, tags: dict[str, str]) None
Put object tags for a given object in a collection.
- Parameters:
collection_name (str) – The collection name.
object_name (str) – The object name.
tags (dict[str, str]) – The object tags.
- Raises:
NotImplementedError –
- put_objects(collection_name: str, object_names: list[str], object: list[bytes] | list[str]) None
Puts objects into a collection.
- Parameters:
collection_name (str) – The collection name.
object_names (list[str]) – The object keys.
object (list[bytes] | list[str]) – The byte objects.
- put_objects_with_duplicity_check(collection_name: str, object_names: list[str], object: list[bytes]) list[str]
Puts objects into a collection with a duplicity check. Returns the list of object keys, where the keys of objects for which duplicates were found are substituted with the keys of the duplicates. The check is based on the ETag.
- Parameters:
collection_name (str) – The collection name.
object_names (list[str]) – The object names.
object (list[bytes]) – The byte objects.
- Returns:
The list of object names.
- Return type:
list[str]
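The ETag-based duplicity check amounts to hashing the incoming bytes, comparing against the hashes of objects already stored, and substituting names where a duplicate is found. A sketch of that substitution logic, using a plain dict as the bucket (illustrative, not the S3Connection code):

```python
import hashlib

def put_with_duplicity_check(bucket: dict, object_names, objects):
    """Store objects unless identical bytes (same MD5/ETag) already exist;
    return the stored-or-substituted name for each input object."""
    etags = {hashlib.md5(data).hexdigest(): name for name, data in bucket.items()}
    result = []
    for name, data in zip(object_names, objects):
        etag = hashlib.md5(data).hexdigest()
        if etag in etags:
            result.append(etags[etag])   # duplicate: reuse the existing name
        else:
            bucket[name] = data
            etags[etag] = name
            result.append(name)
    return result

bucket = {"orig.bin": b"same-bytes"}
print(put_with_duplicity_check(bucket, ["new.bin", "other.bin"],
                               [b"same-bytes", b"fresh"]))
# ['orig.bin', 'other.bin']
```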
- class database_connection.TempfileConnection.TempfileConnection(temp_folder_name: str = 'pcb_temp')
A connection class for a local file system “database”. This class inherits from the BaseConnection class and implements the methods for interacting with a local tempfile file structure mimicking an object storage database. Can be used for testing and debugging purposes, or when a real database is not available for local deployment of the application.
- Parameters:
temp_folder_name (str) – The name of the temporary folder.
- check_collections_exists(collection_names: list[str]) list[bool]
Check if the subdirectories exist in the temporary folder.
- Parameters:
collection_names (list[str]) – The subdirectory names.
- Returns:
The list of booleans indicating if the subdirectories exist.
- Return type:
list[bool]
- check_objects_exist(collection_name: str, object_names: list[str]) list[bool]
Check if files exist in a subdirectory.
- Parameters:
collection_name (str) – The subdirectory name.
object_names (list[str]) – The file names.
- Returns:
The list of booleans indicating if the files exist.
- Return type:
list[bool]
- create_collections(collection_names: list[str]) None
Create subdirectories in the temporary folder.
- Parameters:
collection_names (list[str]) – The subdirectory names.
- delete_collections(collection_names: list[str]) None
Delete the subdirectories in the temporary folder including all files.
- Parameters:
collection_names (list[str]) – The subdirectory names.
- delete_objects(collection_name: str, object_names: list[str]) None
Delete files in a subdirectory.
- Parameters:
collection_name (str) – The subdirectory name.
object_names (list[str]) – The file names.
- get_object_tags(collection_name: str, object_name: str) dict[str, str]
Get object tags for a file. Tags are stored in a sidecar .tags JSON file.
- get_objects(collection_name: str, object_names: list[str]) list[bytes]
Get files from a subdirectory.
- Parameters:
collection_name (str) – The subdirectory name.
object_names (list[str]) – The file names.
- Returns:
The list of file bytes.
- Return type:
list[bytes]
- list_collections() list
List all subdirectories in the temporary folder.
- Returns:
The list of subdirectories.
- Return type:
list
- list_objects(collection_name: str) list[str]
List all files in a subdirectory.
- Parameters:
collection_name (str) – The subdirectory name.
- Returns:
The list of files.
- Return type:
list[str]
- put_object_tags(collection_name: str, object_name: str, tags: dict[str, str]) None
Put object tags for a file. Tags are stored in a sidecar .tags JSON file.
- put_objects(collection_name: str, object_names: list[str], object: list[bytes] | list[str]) None
Put files in a subdirectory.
- Parameters:
collection_name (str) – The subdirectory name.
object_names (list[str]) – The file names.
object (list[bytes] | list[str]) – The file bytes.
- put_objects_with_duplicity_check(collection_name: str, object_names: list[str], object: list[bytes]) list[bool]
Put files in a subdirectory with a check for existing files.
- Parameters:
collection_name (str) – The subdirectory name.
object_names (list[str]) – The file names.
object (list[bytes]) – The file bytes.
- Returns:
The list of booleans indicating if the files were put.
- Return type:
list[bool]
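The tempfile-backed connection maps collections to subdirectories and objects to files. The same idea expressed directly with the standard library (a sketch of the behavior, not the class itself):

```python
import os
import tempfile

# Collections become subdirectories; objects become files inside them.
root = tempfile.mkdtemp(prefix="pcb_temp_")
os.makedirs(os.path.join(root, "asset-store"), exist_ok=True)

# put_objects: write bytes to files in the subdirectory.
with open(os.path.join(root, "asset-store", "model.onnx"), "wb") as f:
    f.write(b"onnx-bytes")

# list_objects: list the files in the subdirectory.
print(os.listdir(os.path.join(root, "asset-store")))  # ['model.onnx']

# get_objects: read the files back as bytes.
with open(os.path.join(root, "asset-store", "model.onnx"), "rb") as f:
    print(f.read())  # b'onnx-bytes'
```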
- class database_connection.database_utils.S3FileUploader(s3_client: client, chunk_size: int = 8388608, num_threads: int = 8)
File uploader to S3.
- Parameters:
s3_client (boto3.client) – The s3 client.
chunk_size (int) – The size of the chunks to upload. The default is 8 * 1024 * 1024.
num_threads (int) – The number of threads to use. The default is 8.
- upload_file_multipart(bytes: bytes, key: str, bucket: str, retries: int = 8) None
Upload a file to S3 using multipart upload. This is useful for large files. We use a thread pool to upload the file in parallel.
- Parameters:
bytes (bytes) – The file bytes.
key (str) – The key of the file in the bucket.
bucket (str) – The bucket name.
retries (int) – The number of retries.
- upload_part(part: bytes, key: str, bucket: str, part_number: int, upload_id: str) dict
Upload a part of a file to S3.
- Parameters:
part (bytes) – The part of the file.
key (str) – The key of the file in the bucket.
bucket (str) – The bucket name.
part_number (int) – The part number.
upload_id (str) – The upload id.
- Return type:
dict
- database_connection.database_utils.calculate_etag(bytes_obj: bytes) str
Calculate the etag hash of a file. The etag should be the same as the etag calculated internally by the boto3/minio client.
- Parameters:
bytes_obj (bytes) – The file bytes to calculate the etag hash of.
- Returns:
The etag hash.
- Return type:
str
- database_connection.database_utils.calculate_etag_multipart(bytes_obj: bytes, chunk_size: int) str
Calculate the etag hash of a file uploaded using multipart upload. The etag should be the same as the etag calculated internally by the boto3/minio client.
- Parameters:
bytes_obj (bytes) – The file bytes to calculate the etag hash of.
chunk_size (int) – The chunk size used for the multipart upload.
- Returns:
The etag hash.
- Return type:
str
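Both helpers follow the standard S3 ETag scheme: a single-part upload's ETag is the plain MD5 hex digest of the bytes, while a multipart ETag is the MD5 of the concatenated per-part MD5 digests, suffixed with a dash and the part count. A sketch consistent with that scheme:

```python
import hashlib

def calculate_etag(bytes_obj: bytes) -> str:
    """ETag of a single-part upload: the plain MD5 hex digest."""
    return hashlib.md5(bytes_obj).hexdigest()

def calculate_etag_multipart(bytes_obj: bytes, chunk_size: int) -> str:
    """ETag of a multipart upload: MD5 of the concatenated part digests,
    followed by '-' and the number of parts."""
    digests = [
        hashlib.md5(bytes_obj[i:i + chunk_size]).digest()
        for i in range(0, len(bytes_obj), chunk_size)
    ]
    return hashlib.md5(b"".join(digests)).hexdigest() + f"-{len(digests)}"

data = b"x" * 20
print(calculate_etag(data))               # plain MD5 hex digest
print(calculate_etag_multipart(data, 8))  # '<hex>-3' (three parts of <= 8 bytes)
```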
Server Utils
- server_utils.algorithm_cache(maxsize=None)
A cache decorator for algorithms. The cache key is based on the algorithm_id and device. The cache is implemented as a dictionary with a maximum size. When an algorithm is requested, the cache is checked; if an algorithm with the same algorithm_id and device is found, its Runner object is returned from the cache. If the algorithm is not found in the cache, the algorithm is executed and the result is stored in the cache. If the cache size limit is reached, the oldest cache entry is invalidated.
- Parameters:
maxsize (int, optional) – The maximum size of the cache. The default is None.
- server_utils.calculate_s3_etag(bytes_obj: BytesIO) str
Calculate the etag hash of a file. The etag should be the same as the etag calculated internally by the boto3/minio client.
- Parameters:
bytes_obj (io.BytesIO) – The file bytes to calculate the etag hash of.
- Returns:
The etag hash.
- Return type:
str
- server_utils.check_and_create_database_collections(collection_names: list[str], database_connection: BaseConnection) list[str]
Checks if the collections exist in the database and creates them if they do not exist.
- Parameters:
collection_names (list[str]) – The collection names.
database_connection (BaseConnection.BaseConnection) – The database connection object.
- Returns:
The list of newly created collections.
- Return type:
list[str]
- server_utils.check_mps_availability() bool
Check if macOS MPS (Metal Performance Shaders) is available.
- Returns:
True if MPS is available, False otherwise.
- Return type:
bool
- server_utils.check_system_gpu_availability() tuple[bool | None, int | None]
Check if system has GPU support.
- Returns:
bool | None – True if CUDA is available, False otherwise.
int | None – The number of available GPUs.
- server_utils.check_torch_with_cuda_available() bool
Check if PyTorch has CUDA support.
- Returns:
True if PyTorch has CUDA support, False otherwise.
- Return type:
bool
- server_utils.data_cache(maxsize=None)
A cache decorator for data. The cache key is the unique file key. The cache is implemented as a dictionary with a maximum size. When a file is requested, the cache is checked; if a file with the same key is found, it is returned from the cache. If the file is not found in the cache, the file is read and the result is stored in the cache. If the cache size limit is reached, the oldest cache entry is invalidated.
- Parameters:
maxsize (int, optional) – The maximum size of the cache. The default is None.
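A size-bounded cache decorator of this shape can be written around an OrderedDict keyed on the file key. A simplified sketch (not the library implementation, which also keys on more than the first argument):

```python
import functools
from collections import OrderedDict

def data_cache(maxsize=None):
    """Decorator caching results by the first positional argument (the key)."""
    def decorator(func):
        cache = OrderedDict()

        @functools.wraps(func)
        def wrapper(key, *args, **kwargs):
            if key in cache:
                return cache[key]
            result = func(key, *args, **kwargs)
            cache[key] = result
            # Past the limit, invalidate the oldest entry.
            if maxsize is not None and len(cache) > maxsize:
                cache.popitem(last=False)
            return result

        wrapper.cache = cache  # exposed for inspection in this sketch
        return wrapper
    return decorator

calls = []

@data_cache(maxsize=2)
def read_file(key):
    calls.append(key)          # track actual reads to show cache hits
    return f"bytes-of-{key}"

read_file("a"); read_file("a"); read_file("b")
print(calls)                   # ['a', 'b']  (second 'a' came from the cache)
```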
- server_utils.find_algorithm_by_id(algorithm_id: str, bucket_contents: list[dict], separator: str = '~') tuple
Find an algorithm by its id.
- Parameters:
algorithm_id (str) – The id of the algorithm.
bucket_contents (list[dict]) – The bucket contents.
separator (str, optional) – The separator between the fields in the key. The default is “~”.
- Returns:
The algorithm key, id, name, major version, minor version.
- Return type:
tuple
- server_utils.generate_uuid(version: int = 1) str
Generate a uuid.
- Parameters:
version (int, optional) – The version of the uuid. The default is 1.
- Returns:
The uuid.
- Return type:
str
- Raises:
ValueError – If uuid version is not 1 or 4.
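The helper presumably wraps the standard uuid module; the documented version check can be sketched as:

```python
import uuid

def generate_uuid(version: int = 1) -> str:
    """Generate a UUID string; only versions 1 and 4 are accepted."""
    if version == 1:
        return str(uuid.uuid1())  # time-and-node based
    if version == 4:
        return str(uuid.uuid4())  # random
    raise ValueError("uuid version must be 1 or 4")

token = generate_uuid(version=4)
print(len(token))  # 36 (8-4-4-4-12 hex digits plus four hyphens)
```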
- server_utils.get_subprocess_fn() partial[JobPOpen] | Any
Get the subprocess function appropriate for the current operating system.
- Returns:
A callable object used to launch subprocesses.
- Return type:
partial[JobPOpen.JobPOpen]
- Raises:
ValueError – If the operating system is not supported.
- server_utils.weak_lru(maxsize=128, typed=False)
LRU cache decorator that keeps a weak reference to “self”, so cached entries do not keep instances alive.
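The usual recipe for such a decorator routes functools.lru_cache through a weakref to the instance; a sketch along those lines (not necessarily the exact library code):

```python
import functools
import weakref

def weak_lru(maxsize=128, typed=False):
    """LRU cache for methods that holds only a weak reference to self."""
    def decorator(func):
        @functools.lru_cache(maxsize=maxsize, typed=typed)
        def _cached(self_ref, *args, **kwargs):
            # Dereference the weakref before calling the real method.
            return func(self_ref(), *args, **kwargs)

        @functools.wraps(func)
        def wrapper(self, *args, **kwargs):
            return _cached(weakref.ref(self), *args, **kwargs)

        return wrapper
    return decorator

class Expensive:
    def __init__(self):
        self.computed = 0

    @weak_lru(maxsize=32)
    def square(self, x):
        self.computed += 1      # count real computations to show cache hits
        return x * x

obj = Expensive()
print(obj.square(3), obj.square(3), obj.computed)  # 9 9 1
```

Two weakrefs to the same live object hash and compare equal, which is what lets lru_cache find the earlier entry without pinning the instance in memory.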
Server Endpoints
- async routers.algorithms_controller.export_algorithm(request: Request, algorithm_name: str, algorithm_major_version: int, algorithm_minor_version: int | None = Query(None), checkpoint_id: str | None = Query(None)) StreamingResponse
Export an algorithm by its name and version.
- Parameters:
request (Request) – The request.
algorithm_name (str) – Algorithm name.
algorithm_major_version (int) – Algorithm major version.
algorithm_minor_version (Optional[int]) – Optional minor version of the algorithm.
checkpoint_id (Optional[str]) – Optional checkpoint identifier.
- Returns:
The exported algorithm as a zip file.
- Return type:
StreamingResponse
- routers.algorithms_controller.get_algorithm(algorithm_name: str, algorithm_major_version: str, request: Request) AlgorithmRegisteredResponse | FailedAlgorithmRegisteredResponse | JSONResponse
Returns algorithm by its name and version.
- Parameters:
algorithm_name (str) – Algorithm name.
algorithm_major_version (str) – Algorithm version.
request (Request) – The request.
- Returns:
The algorithm.
- Return type:
Union[AlgorithmRegisteredResponse, FailedAlgorithmRegisteredResponse, JSONResponse]
- async routers.algorithms_controller.list_model_files(request: Request, positive_tag: List[str] | None = Query([]), negative_tag: List[str] | None = Query([]), algorithm_type: str | None = Query(None), supported_devices: List[str] | None = Query([])) List[S3ModelFileRecord]
Lists all available algorithms.
- Parameters:
request (Request) – The request.
positive_tag (Optional[List[str]] | None) – A list of tags the algorithm must have.
negative_tag (Optional[List[str]] | None) – A list of tags the algorithm must not have.
algorithm_type (Optional[str] | None) – The type of the algorithm.
supported_devices (Optional[List[str]] | None) – The devices the algorithm is compatible with.
- Returns:
The list of algorithms.
- Return type:
List[S3ModelFileRecord]
- async routers.file_controller.delete_file(id: str, request: Request) ResponseMessage
Deletes a file from the database.
- Parameters:
id (str) – The id of the file.
request (Request) – The request.
- Return type:
ResponseMessage
- async routers.file_controller.download_file(id: str, request: Request)
Downloads a file from the database.
- Parameters:
id (str) – The id of the file.
request (Request) – The request.
- Returns:
The downloaded file.
- Return type:
StreamingResponse
- async routers.file_controller.upload_files(request: Request) FileUploadResponse
Uploads an image stack as an HDF5 file to the database.
- Parameters:
request (Request) – The request.
- Returns:
The file upload response.
- Return type:
FileUploadResponse
- routers.execution_controller.execute_algorithm(request: Request, incoming_execution_request: IncomingExecutionRequest) ExecutionResponse
Executes an algorithm on a dataset.
- Parameters:
request (Request) – The request.
incoming_execution_request (IncomingExecutionRequest) – The incoming execution request.
- Returns:
The execution response.
- Return type:
ExecutionResponse
- Raises:
Exception – If the server backend is not supported or saving the execution record fails.
- async routers.execution_controller.get_execution_record(id: str, request: Request) ExecutionRecord
Get execution record by id.
- Parameters:
id (str) – The id of the execution record.
request (Request) – The request.
- Returns:
The execution record.
- Return type:
ExecutionRecord
- async routers.execution_controller.stop_execution(id: str, request: Request) ResponseMessage
Stops an execution by id.
- Parameters:
id (str) – The id of the execution to stop.
request (Request) – The request.
- Returns:
The response message.
- Return type:
ResponseMessage
Pydantic Models
- class pydantic_models.Algorithm(*, algorithm_name: str, algorithm_major_version: str)
Algorithm model.
- algorithm_name
The name of the algorithm.
- Type:
str
- algorithm_major_version
The major version of the algorithm.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.AlgorithmCheckpointRecord(*, checkpoint_id: str, training_id: str, parent_algorithm_id: str, created_at: str, properties: dict, tags: list[str] = [], parent_checkpoint_id: str | None = None)
Algorithm checkpoint record model.
- checkpoint_id
The id of the checkpoint.
- Type:
str
- training_id
The id of the training run that produced this checkpoint.
- Type:
str
- parent_algorithm_id
The id of the parent algorithm.
- Type:
str
- created_at
The time the checkpoint was created.
- Type:
str
- properties
A dictionary of arbitrary properties associated with the checkpoint.
- Type:
dict
- tags
A list of tags associated with the checkpoint.
- Type:
list[str]
- parent_checkpoint_id
The id of the parent checkpoint, if any.
- Type:
Optional[str]
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.AlgorithmCheckpointResponse(*, checkpoint_id: str)
Algorithm checkpoint response model.
- checkpoint_id
The id of the checkpoint.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.AlgorithmDeployResponse(*, algorithm_id: str, algorithm_name: str, algorithm_major_version: str, algorithm_minor_version: str)
Algorithm deploy response model.
- algorithm_id
The id of the algorithm.
- Type:
str
- algorithm_name
The name of the algorithm.
- Type:
str
- algorithm_major_version
The major version of the algorithm.
- Type:
str
- algorithm_minor_version
The minor version of the algorithm.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.AlgorithmRegisteredResponse(*, algorithm_id: str, algorithm_name: str, algorithm_version: str, algorithm_minor_versions: list[str], latest_algorithm_minor_version: str, algorithm_type: str, algorithm_tags: list[str], algorithm_description: str, supported_devices: list[str] = [], default_device: str, additional_parameters: list[AdditionalParameterSchema] = [], training_parameters: list[AdditionalParameterSchema] = [], removable: bool = False, exportable: bool = True)
Algorithm registered response model.
- algorithm_id
The id of the algorithm.
- Type:
str
- algorithm_name
The name of the algorithm.
- Type:
str
- algorithm_version
The major version of the algorithm.
- Type:
str
- algorithm_minor_versions
The minor versions of the algorithm.
- Type:
list[str]
- algorithm_input_queue
The input queue of the algorithm.
- Type:
str
- algorithm_type
The type of the algorithm.
- Type:
str
- algorithm_tags
The tags of the algorithm.
- Type:
list[str]
- algorithm_description
Description of the algorithm.
- Type:
str
- supported_devices
The supported devices.
- Type:
list[str]
- default_device
The default device.
- Type:
str
- additional_parameters
The additional parameters.
- Type:
list[AdditionalParameterSchema]
- training_parameters
The training parameters.
- Type:
list[AdditionalParameterSchema]
- removable
Whether the algorithm can be removed via the deploy delete endpoint.
- Type:
bool
- exportable
Whether the algorithm can be exported.
- Type:
bool
- checkpoints
The list of checkpoint ids associated with the algorithm.
- Type:
list[str]
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.DeployRecord(*, deploy_id: str, status: str, path: str, algorithm_id: str | None = None, algorithm_name: str | None = None, algorithm_major_version: str | None = None, time_started: str | None = None, time_completed: str | None = None, log: str | None = None)
Deploy record model.
- deploy_id
The id of the deploy job.
- Type:
str
- status
The status of the deploy job.
- Type:
str
- path
The local path used for deploy.
- Type:
str
- algorithm_id
The deployed algorithm id (if available).
- Type:
Optional[str]
- algorithm_name
The deployed algorithm name (if available).
- Type:
Optional[str]
- algorithm_major_version
The deployed algorithm major version (if available).
- Type:
Optional[str]
- time_started
The time the deploy started.
- Type:
Optional[str]
- time_completed
The time the deploy completed.
- Type:
Optional[str]
- log
Error or informational log.
- Type:
Optional[str]
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.DeployResponse(*, deploy_id: str)
Deploy response model.
- deploy_id
The id of the deploy job.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.ExecutionLogRecord(*, log: str)
Execution log record model.
- log
The log.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.ExecutionRecord(*, execution_id: str, algorithm_id: str, checkpoint_id: str | None = None, algorithm_minor_version: str | None = None, input_dataset_ids: list[str], execution_device_override: str | None = None, resolved_execution_device: str | None = None, additional_parameters: dict, session_token: str | None, output_dataset_ids: list[str], status: str, progress: float, time_started: str, time_completed: str, log: str)
Execution record model.
- execution_id
The id of the execution.
- Type:
str
- algorithm_id
The id of the algorithm.
- Type:
str
- checkpoint_id
The id of the checkpoint, if any.
- Type:
Optional[str]
- algorithm_minor_version
The minor version of the executed algorithm.
- Type:
Optional[str]
- input_dataset_ids
The ids of the input datasets.
- Type:
list[str]
- execution_device_override
The requested abstract execution device class for the run, e.g. cpu, gpu, or mps.
- Type:
Optional[str]
- resolved_execution_device
The concrete runtime device Compox resolved for the execution, e.g. cpu, cuda, or mps.
- Type:
Optional[str]
- additional_parameters
The additional parameters.
- Type:
dict
- session_token
The string identifier of the session.
- Type:
Union[str, None]
- output_dataset_ids
The ids of the output datasets.
- Type:
list[str]
- status
The status of the execution.
- Type:
str
- progress
The progress of the execution.
- Type:
float
- time_started
The time the execution started.
- Type:
str
- time_completed
The time the execution completed.
- Type:
str
- log
The log of the execution.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
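As a rough illustration of the shape of an ExecutionRecord, the sketch below parses a JSON document with the field names listed above. All of the values (ids, timestamps, parameters) are invented for the example; only the field names come from the model.

```python
import json

# Illustrative execution record; field names follow
# pydantic_models.ExecutionRecord, values are made up.
record = json.loads('''
{
  "execution_id": "exec-001",
  "algorithm_id": "algo-42",
  "checkpoint_id": null,
  "algorithm_minor_version": "3",
  "input_dataset_ids": ["ds-1", "ds-2"],
  "execution_device_override": "gpu",
  "resolved_execution_device": "cuda",
  "additional_parameters": {},
  "session_token": null,
  "output_dataset_ids": ["ds-out-1"],
  "status": "completed",
  "progress": 1.0,
  "time_started": "2024-01-01T12:00:00Z",
  "time_completed": "2024-01-01T12:05:00Z",
  "log": ""
}
''')

# A finished run carries its results in output_dataset_ids.
finished = record["status"] == "completed" and record["progress"] >= 1.0
print(finished)  # True
```

Note how the abstract device request ("gpu") and the concrete resolved device ("cuda") are reported in separate fields.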
- class pydantic_models.ExecutionResponse(*, execution_id: str)
Execution response model.
- execution_id
The id of the execution.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.FailedAlgorithmRegisteredResponse(*, algorithm_name: str, algorithm_version: str, message: str)
Failed algorithm response model.
- algorithm_name
The name of the algorithm.
- Type:
str
- algorithm_version
The version of the algorithm.
- Type:
str
- message
The message.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.FileUploadBody(*, file_body: List)
File upload body model.
- file_body
The file body.
- Type:
List
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.FileUploadResponse(*, file_id: str)
File upload response model.
- file_id
The id of the file.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.IncomingExecutionRequest(*, algorithm_id: str, input_dataset_ids: list[str], checkpoint_id: str | None = None, algorithm_minor_version: str | None = None, execution_device_override: str = None, additional_parameters: dict = {}, session_token: str | None = None)
Incoming execution request model.
- algorithm_id
The id of the algorithm.
- Type:
str
- input_dataset_ids
The ids of the input datasets.
- Type:
list[str]
- checkpoint_id
The id of the checkpoint, if any.
- Type:
str
- algorithm_minor_version
The minor version of the algorithm to execute.
- Type:
str
- execution_device_override
The execution device override.
- Type:
str
- additional_parameters
The additional parameters.
- Type:
dict
- session_token
The string identifier of the session.
- Type:
Union[str, None]
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
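A minimal request body for this model can be sketched as follows. Only algorithm_id and input_dataset_ids are required; checkpoint_id, algorithm_minor_version, execution_device_override and session_token default to None, and additional_parameters defaults to an empty dict. The ids and the "threshold" parameter below are invented for illustration.

```python
import json

# Minimal body matching pydantic_models.IncomingExecutionRequest;
# omitted optional fields fall back to their model defaults.
request = {
    "algorithm_id": "algo-42",
    "input_dataset_ids": ["ds-1"],
    "additional_parameters": {"threshold": 0.5},  # illustrative parameter
}
body = json.dumps(request)
print(body)
```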
- class pydantic_models.IncomingSampleRequest(*, files: list[dict[str, list[str]]], tags: list[str] = [])
Incoming sample request model.
- files
The list of dicts describing the file pairing structure.
- Type:
list[dict[str, list[str]]]
- tags
The tags associated with the sample.
- Type:
list[str]
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
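The files field pairs each file role with a list of file ids, matching the declared type list[dict[str, list[str]]]. A sketch of such a body, with role names ("image", "mask") and ids invented for illustration:

```python
import json

# Sample request body matching pydantic_models.IncomingSampleRequest;
# each dict in "files" maps a role name to the file ids filling that role.
sample_request = {
    "files": [
        {"image": ["file-id-1"], "mask": ["file-id-2"]},
    ],
    "tags": ["demo"],
}
body = json.dumps(sample_request)
print(body)
```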
- class pydantic_models.IncomingTrainingRequest(*, algorithm_id: str, training_data: list[str], checkpoint_id: str | None = None, algorithm_minor_version: str | None = None, tags: list[str] = [], additional_parameters: dict | None = None)
Incoming training request model.
- algorithm_id
The id of the algorithm to train.
- Type:
str
- training_data
List of sample ids used as training data.
- Type:
list[str]
- checkpoint_id
The id of the input checkpoint, if any.
- Type:
str, optional
- algorithm_minor_version
The minor version of the algorithm to train.
- Type:
str, optional
- tags
The list of tags associated with the training run.
- Type:
list[str]
- additional_parameters
Additional training parameters (e.g., iterations, learning rate, …).
- Type:
dict
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
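A training request can be sketched like this. algorithm_id and training_data are required; tags defaults to an empty list and additional_parameters to None. The sample ids, tag, and parameter names below are invented for illustration.

```python
import json

# Training request body matching pydantic_models.IncomingTrainingRequest;
# training_data lists the sample ids to train on.
training_request = {
    "algorithm_id": "algo-42",
    "training_data": ["sample-1", "sample-2"],
    "tags": ["experiment-a"],
    "additional_parameters": {"iterations": 100, "learning_rate": 0.001},
}
body = json.dumps(training_request)
print(body)
```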
- class pydantic_models.MinioServer(*, executable_path: str, storage_path: str, console_address: str, address: str)
Minio server model.
- executable_path
The path to the minio executable.
- Type:
str
- storage_path
The path to the minio storage.
- Type:
str
- console_address
The address of the minio console.
- Type:
str
- address
The address of the minio server.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.MinioServerInfo(*, storage_path: str, console_address: str, address: str)
Minio server info model.
- storage_path
The path to the minio storage.
- Type:
str
- console_address
The address of the minio console.
- Type:
str
- address
The address of the minio server.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.ResponseMessage(*, detail: str | None = None)
Response message model.
- detail
The message.
- Type:
str | None
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.RootMessage(*, name: str, tags: list[str], group: str, organization: str, domain: str, version: str, cuda_available: bool | None = None, cuda_capable_devices_count: int | None = None)
Root message model.
- name
The name of the server.
- Type:
str
- tags
The server tags.
- Type:
list[str]
- group
The group.
- Type:
str
- organization
The organization.
- Type:
str
- domain
The domain.
- Type:
str
- version
The version.
- Type:
str
- cuda_available
If cuda is available.
- Type:
bool | None
- cuda_capable_devices_count
The number of cuda capable devices.
- Type:
int | None
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
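Since the two cuda_* fields are optional, a client should tolerate their absence. A sketch of reading an illustrative root message (every value below is invented; only the field names come from the model):

```python
import json

# Illustrative root message matching pydantic_models.RootMessage.
root = json.loads('''
{
  "name": "example-server",
  "tags": [],
  "group": "demo",
  "organization": "example",
  "domain": "example.org",
  "version": "1.0.0",
  "cuda_available": true,
  "cuda_capable_devices_count": 1
}
''')

# Treat a missing cuda_available as "no GPU known".
gpu_ok = bool(root.get("cuda_available")) and (root.get("cuda_capable_devices_count") or 0) > 0
print(gpu_ok)  # True
```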
- class pydantic_models.S3Bucket(*, bucket_name: str)
S3 bucket model.
- bucket_name
The name of the bucket.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.S3ModelFile(*, runner_path: str, algorithm_path: str, algorithm_name: str, algorithm_major_version: str, algorithm_minor_version: str)
S3 model file model.
- runner_path
The path to the runner file.
- Type:
str
- algorithm_path
The path to the algorithm file.
- Type:
str
- algorithm_name
The name of the algorithm.
- Type:
str
- algorithm_major_version
The major version of the algorithm.
- Type:
str
- algorithm_minor_version
The minor version of the algorithm.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.S3ModelFileRecord(*, algorithm_key: str)
S3 model file record model.
- algorithm_key
The key of the algorithm.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.SampleRecord(*, sample_id: str, files: list[dict[str, list[str]]], tags: list[str] = [], time_created: str)
Sample record model.
- sample_id
The id of the sample.
- Type:
str
- files
The list of dicts describing the file pairing structure.
- Type:
list[dict[str, list[str]]]
- tags
The tags associated with the sample.
- Type:
list[str]
- time_created
The time the sample was created.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.SampleResponse(*, sample_id: str)
Sample response model.
- sample_id
The id of the sample.
- Type:
str
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
- class pydantic_models.TrainingRecord(*, training_id: str, algorithm_id: str, status: str, progress: float, time_started: str, time_completed: str | None = None, log: str | None = None, training_data: list[str], additional_parameters: dict | None = None, state: dict, tags: ~typing.List[str] = [], checkpoint_id: str | None = None, algorithm_minor_version: str | None = None, output_checkpoint_ids: ~typing.List[str] = <factory>)
Training record model.
- training_id
The id of the training.
- Type:
str
- status
The status of the training (e.g., running, completed, failed).
- Type:
str
- progress
The progress of the training in the range [0.0, 1.0].
- Type:
float
- time_started
The time the training started.
- Type:
str
- time_completed
The time the training completed, if available.
- Type:
str, optional
- log
The log output from the training.
- Type:
str, optional
- training_data
The list of sample ids used for training.
- Type:
list[str]
- state
Training state information, including metrics and losses.
- Type:
dict
- output_checkpoint_ids
The list of produced checkpoint ids.
- Type:
list[str]
- tags
The list of tags associated with the training run.
- Type:
list[str]
- checkpoint_id
The id of the input checkpoint, if any.
- Type:
str, optional
- algorithm_minor_version
The minor version of the algorithm to train.
- Type:
str, optional
- model_config = {}
Configuration for the model, should be a dictionary conforming to [ConfigDict][pydantic.config.ConfigDict].
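A sketch of inspecting a training record while a run is still in progress. All values below are invented; only the field names come from the model.

```python
import json

# Illustrative in-progress record matching pydantic_models.TrainingRecord;
# time_completed and log are still null while the run is going.
record = json.loads('''
{
  "training_id": "train-7",
  "algorithm_id": "algo-42",
  "status": "running",
  "progress": 0.25,
  "time_started": "2024-01-01T12:00:00Z",
  "time_completed": null,
  "log": null,
  "training_data": ["sample-1"],
  "additional_parameters": null,
  "state": {"loss": 0.8},
  "tags": [],
  "checkpoint_id": null,
  "algorithm_minor_version": null,
  "output_checkpoint_ids": []
}
''')

# progress is a fraction in [0.0, 1.0]; report it as a percentage.
percent_done = round(record["progress"] * 100)
print(percent_done)  # 25
```

The state dict is where per-run metrics such as losses appear, while output_checkpoint_ids stays empty until checkpoints are produced.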