User Guide
In the following sections, we will describe how to create an algorithm module for Compox.
How to create an algorithm module
The algorithm module is a Python package that contains the algorithm code and assets. The algorithm module should be structured in a specific way in order to work properly with Compox.
See also:
../docs/algorithm_training_guide.md
../docs/training_client_workflow.md
The algorithm should be structured as follows:
algorithm_name/
|-- __init__.py
|-- Runner.py
|-- pyproject.toml
|-- files/
|   |-- file1
|   `-- file2
`-- some_internal_submodule/
    |-- __init__.py
    |-- module1.py
    `-- module2.py
The Runner.py file
The Runner.py file is a mandatory component of the algorithm module. It serves as the entry point for Compox to run the algorithm. It must define a class named Runner. The Runner class can inherit from BaseRunner (for generic behavior) or from a Runner class specific to the algorithm type (see below). Runner classes can be imported from the compox.algorithm_utils package.
Why this exists: Compox always loads and instantiates Runner as the algorithm entry point, so keeping it in a predictable location allows deployment, caching, and execution to work consistently.
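To make this contract concrete, here is a sketch of how an entry point living at a predictable path can be loaded dynamically. This is illustrative plain Python, not Compox's actual loader; the demo_algorithm package name and its Runner body are invented for the example:

```python
import importlib
import sys
import tempfile
from pathlib import Path

# Build a throwaway algorithm package on disk so the import below works.
pkg_root = Path(tempfile.mkdtemp())
algo_dir = pkg_root / "demo_algorithm"
algo_dir.mkdir()
(algo_dir / "__init__.py").write_text("")
(algo_dir / "Runner.py").write_text(
    "class Runner:\n"
    "    def run(self):\n"
    "        return 'ok'\n"
)

# Because Runner.py always lives at <package>/Runner.py and always defines
# a class named Runner, a loader can find the entry point without any
# per-algorithm configuration.
sys.path.insert(0, str(pkg_root))
runner_cls = getattr(importlib.import_module("demo_algorithm.Runner"), "Runner")
result = runner_cls().run()
```

This is why renaming Runner.py or the Runner class breaks deployment: the lookup is by convention, not by configuration.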
Algorithm types
The algorithm type is defined in the algorithm’s pyproject.toml file. Your Runner inheritance should match the declared type, but Compox does not infer the type from the class. If you inherit from BaseRunner, set algorithm_type = "Generic" (or leave it as Undefined for development, but not for production). For example, an Image2Image algorithm receives an image as input and returns an image as output. In that case, the pyproject.toml file should contain:
[tool.compox]
algorithm_type = "Image2Image"
and the Runner should inherit from the matching Runner class:
from compox.algorithm_utils.Image2ImageRunner import Image2ImageRunner
class Runner(Image2ImageRunner):
"""
The runner class for the denoiser algorithm.
"""
The following algorithm types are currently supported:
Image2Image
Image2Embedding
Image2Segmentation
Image2Alignment
Segmentation2Segmentation
Generic
Undefined exists as a fallback/default but should not be used for real algorithms.
Why this exists: typed runners provide schema and convenience helpers so you can focus on model logic instead of wiring input/output formats. BaseRunner is for algorithms with custom schemas or non-standard inputs/outputs.
The preprocess, inference and postprocess methods
The run method calls preprocess, then inference, then postprocess. Each of these methods accepts two arguments (after self): the input data for that stage and a dictionary of user arguments (args).
preprocess(self, input_data: dict, args: dict | None = None) typically loads data using fetch_data, prepares it, and returns the result for inference.
inference(self, data: Any, args: dict | None = None) runs the model or algorithm.
postprocess(self, data: Any, args: dict | None = None) should upload output datasets using post_data and return a list of dataset IDs.
The input_data dictionary contains identifiers provided by the user (commonly input_dataset_ids).
Why this exists: separating the pipeline makes data flow and logging explicit, enables progress reporting, and allows easier debugging.
The fetch_data method for BaseRunner
fetch_data retrieves datasets by IDs and validates them using a Pydantic schema. It expects a list of file ID strings.
Example of fetching data:
embeddings = self.fetch_data(input_data["input_dataset_ids"], EmbeddingSchema)
The Pydantic schemas are defined in compox/src/compox/algorithm_utils/io_schemas.py, but you are not required to use them. You can define your own schemas by inheriting from DataSchema (useful for type checking and validation).
Why this exists: schemas provide consistent validation and type hints for downstream code, while still allowing custom formats when needed.
The fetch_data method for specific algorithm types
Runner subclasses for specific algorithm types use predefined schemas, so fetch_data does not take a schema argument. It still expects a list of file ID strings.
Example for Image2Image:
input_data = self.fetch_data(input_data["input_dataset_ids"])
This fetches datasets validated against the ImageSchema.
The post_data method for BaseRunner
post_data uploads output datasets and validates them with a Pydantic schema. It expects a list of dictionaries, one per output dataset.
Example of posting data:
output_dataset_ids = self.post_data(output, MaskSchema)
The post_data method for specific algorithm types
For specific algorithm types, post_data uses predefined schemas, so no schema argument is needed.
Example for Image2Image:
output_dataset_ids = self.post_data(output)
The load_assets method
You can override load_assets to load model weights or other files once and cache them on the Runner instance. Use self.fetch_asset(...) to load files stored in the algorithm package. Paths are relative to the Runner module root (e.g., files/weights.pt). fetch_asset returns an io.BytesIO object that you can pass to libraries like torch.load.
Example:
state_dict_bytes = self.fetch_asset("files/vit_b.pt")
state_dict = torch.load(state_dict_bytes)
Why this exists: model weights and large resources are expensive to load, so Compox caches them on the Runner instance for reuse across requests. These attributes are locked to avoid unsafe mutation across threads.
The log_message method
Log messages to Compox:
self.log_message("This is an info message.", logging_level="INFO")
The set_progress method
Report execution progress (float between 0 and 1):
self.set_progress(0.5)
Sessions (optional)
Executions can be associated with a session_token to reuse an in‑memory cache across runs. From a Runner perspective, this cache is accessed via:
save_item_to_session(obj, key)
load_item_from_session(key)
remove_item_from_session(key)
Why this exists: some algorithms benefit from reusing expensive intermediates (e.g., feature caches, preprocessed inputs) across multiple executions without reloading from storage.
Notes:
Sessions are available only when executions run as FastAPI background tasks. Celery mode does not support sessions.
Sessions are in‑memory (single process) and expire after a fixed timeout, so treat them as an optimization rather than persistent storage.
The client supplies session_token on execution requests; the server can also generate one when missing.
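Conceptually, the session cache behaves like a per-process dictionary with a fixed expiry. The following stand-in is invented for illustration and is not the real Compox implementation:

```python
import time

class SessionCacheSketch:
    """In-memory key/value cache with a fixed expiry.

    A hypothetical stand-in for the session cache described above; the
    real cache is managed by the Compox server per session_token.
    """

    def __init__(self, timeout_seconds: float = 600.0):
        self._items: dict[str, tuple[float, object]] = {}
        self._timeout = timeout_seconds

    def save_item_to_session(self, obj, key: str) -> None:
        self._items[key] = (time.monotonic(), obj)

    def load_item_from_session(self, key: str):
        stamp, obj = self._items[key]
        if time.monotonic() - stamp > self._timeout:
            # Expired entries behave as if they were never stored.
            del self._items[key]
            raise KeyError(f"session item expired: {key}")
        return obj

    def remove_item_from_session(self, key: str) -> None:
        self._items.pop(key, None)

cache = SessionCacheSketch()
cache.save_item_to_session({"features": [1, 2, 3]}, "embeddings")
item = cache.load_item_from_session("embeddings")
```

Because entries expire and live in a single process, always code defensively: treat a missing session item as a cache miss and recompute, never as an error.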
The pyproject.toml file
The pyproject.toml file contains algorithm metadata. It must be in the algorithm root.
Mandatory fields
[project]
name = "algorithm_name"
version = "major.minor.patch"
Why this exists: Compox uses name + major version to identify an algorithm line and uses minor versions to track distinct builds.
Versioning behavior (AlgorithmDeployer)
Compox derives versioning from the [project] version string in pyproject.toml:
Major version = the first segment (before the first dot)
Minor version = the second segment (between first and second dot)
Patch version is currently ignored by the deployer
When an algorithm is deployed, Compox searches the algorithm store for an existing record with the same algorithm name and major version. The behavior is:
If found: Compox compares the newly built module ID and assets dictionary with the latest stored minor version. If either differs, it inserts a new minor version entry and increments latest_algorithm_minor_version. If both are identical, it does not insert a new minor version.
If not found: Compox creates a new algorithm record with latest_algorithm_minor_version initialized from the project.version minor segment, and stores the module/assets under that.
Notes:
The stored minor versions are not the original pyproject.toml patch version; only the major/minor segments drive versioning.
Re-deploying the same algorithm with identical module and assets is a no-op for minor versions (no new entry is added).
If you change only non‑code assets, a new minor version is created because the assets dictionary changes.
Why this exists: this makes deployments deterministic and deduplicated; you can update assets or code without forcing a new algorithm identity while still keeping a history of builds.
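The version-segment rule above can be sketched with plain string splitting (an illustration; the deployer's actual parsing may differ):

```python
def split_version(version: str) -> tuple[str, str]:
    """Derive (major, minor) from a 'major.minor.patch' string.

    Mirrors the rule above: the patch segment is ignored by the deployer.
    """
    parts = version.split(".")
    major = parts[0]
    minor = parts[1] if len(parts) > 1 else "0"
    return major, minor

major, minor = split_version("1.2.7")  # patch "7" is ignored
```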
Supported devices
Supported devices are a list of strings: "cpu", "gpu", or "mps". The default_device must be included in supported_devices, otherwise validation raises an error.
supported_devices = ["cpu", "gpu"]
default_device = "cpu"
Additional parameters
Additional parameters are a list of objects with name, description, and a config section.
You can also provide an optional displayed_name for a more human-friendly UI label. If omitted,
Compox derives it automatically from name.
additional_parameters = [
{ name = "some_string_parameter", displayed_name = "Some string parameter", description = "This parameter strings.", config = { type = "string", default = "hello", adjustable = true } },
{ name = "threshold", description = "Threshold used during inference.", config = { type = "float_range", default = 0.5, min = 0.0, max = 1.0, step = 0.05, decimal_precision = 2, adjustable = true } },
]
Parameter types:
| Parameter type | Configuration fields |
|---|---|
| string | type, default, adjustable |
| int | type, default, adjustable |
| float | type, default, adjustable, decimal_precision (optional) |
| bool | type, default, adjustable |
| int_range | type, default, min, max, step, adjustable |
| float_range | type, default, min, max, step, adjustable, decimal_precision (optional) |
| string_enum | type, default, options, adjustable |
| int_enum | type, default, options, adjustable |
| float_enum | type, default, options, adjustable, decimal_precision (optional) |
| string_list | type, default, options, adjustable |
| int_list | type, default, options, adjustable |
| float_list | type, default, options, adjustable, decimal_precision (optional) |
| bool_list | type, default, options, adjustable |
Notes:
displayed_name is optional. If not provided, Compox generates one from name by replacing _ and - with spaces and capitalizing the result.
decimal_precision is optional and only valid for float-based parameter types.
decimal_precision must be greater than or equal to 0.
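The derivation described in the notes can be sketched as follows (an approximation; Compox's exact capitalization rules may differ):

```python
def derive_displayed_name(name: str) -> str:
    # Replace underscores and hyphens with spaces, then capitalize the
    # first letter, as described in the notes above.
    return name.replace("_", " ").replace("-", " ").capitalize()

label = derive_displayed_name("some_string_parameter")
```

This matches the example above, where the parameter some_string_parameter gets the label "Some string parameter".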
Training parameters
Training parameters use the same schema as additional parameters:
training_parameters = [
{ name = "epochs", displayed_name = "Epochs", description = "Training epochs.", config = { type = "int", default = 10, adjustable = true } },
]
Other fields
check_importable = false
obfuscate = true
hash_module = true (deprecated; ignored, deduplication is always on)
hash_assets = true (deprecated; ignored, deduplication is always on)
removable = false
exportable = true
Why these exist:
check_importable helps catch packaging mistakes early.
obfuscate reduces casual code exposure in stored modules.
hash_module and hash_assets are deprecated and ignored; deduplication by content hash is always enabled.
removable controls whether the deploy delete endpoint is allowed to remove this algorithm (defaults to false).
exportable controls whether the export endpoint can package this algorithm (defaults to true). If false, export returns HTTP 403.
The files directory
Optional. Store data assets your algorithm needs at runtime. Load them via self.fetch_asset(...).
Why this exists: code is zipped and cached separately from assets, so non‑Python files are stored and retrieved from the asset store by path.
The some_internal_submodule directory
Optional. Include internal modules used by your Runner.
Why this exists: any Python modules inside the algorithm directory are packaged into the module zip, so you can keep helper code alongside your Runner.
Example of a dummy algorithm
algorithm_name/
|-- __init__.py
|-- Runner.py
|-- pyproject.toml
|-- files/
|   `-- some_heavy_model.pt
`-- my_big_model/
    |-- __init__.py
    `-- utils.py
Runner example:
from my_big_model.utils import MyBigModel
from compox.algorithm_utils.BaseRunner import BaseRunner
from compox.algorithm_utils.io_schemas import ImageSchema, SegmentationSchema
import numpy as np
import torch

class Runner(BaseRunner):
    """
    The runner class for the foo algorithm.
    """

    def load_assets(self):
        """
        The assets to load for the foo algorithm.
        """
        some_model = MyBigModel()
        self.log_message("Loading the Foo assets.")
        state_dict_bytes = self.fetch_asset("files/some_heavy_model.pt")
        state_dict = torch.load(state_dict_bytes)
        some_model.load_state_dict(state_dict)
        self.my_big_model = some_model

    def preprocess(self, input_data: dict, args: dict | None = None) -> np.ndarray:
        self.log_message("Preprocessing the Foo input data.")
        my_data = self.fetch_data(input_data["input_dataset_ids"], ImageSchema)
        input_array = np.array(my_data[0]["image"])
        return input_array

    def inference(self, data: np.ndarray, args: dict | None = None) -> torch.Tensor:
        self.log_message("Running the Foo inference.")
        args = args or {}
        some_user_defined_args = args.get("some_user_defined_args", None)
        if some_user_defined_args is not None:
            self.log_message(f"User defined args: {some_user_defined_args}")
        output = self.my_big_model(data, some_user_defined_args)
        self.set_progress(0.5)
        self.log_message("The Foo inference is done.")
        return output

    def postprocess(self, inference_output: torch.Tensor, args: dict | None = None) -> list[str]:
        self.log_message("Postprocessing the Foo output.")
        output = inference_output.detach().numpy()
        output_dicts = [{"mask": output}]
        output_dataset_ids = self.post_data(output_dicts, SegmentationSchema)
        return output_dataset_ids
pyproject.toml example:
[project]
name = "foo"
version = "0.1.0"
[tool.compox]
algorithm_type = "Generic"
tags = ["foo", "bar"]
description = "This algorithm does foo and bar."
additional_parameters = [
{ name = "some_user_defined_args", description = "This is a user defined argument.", config = { type = "string", default = "hello", adjustable = true } },
]
check_importable = false
obfuscate = true
hash_module = true # deprecated; ignored
hash_assets = true # deprecated; ignored
Denoising algorithm template
This section presents a working template for developing an image denoising algorithm and covers the specifics needed to build one. To see how a Compox algorithm should generally be structured, please refer to the algorithms/readme.md file.
The algorithm folder is structured as follows:
template_denoising_algorithm/
├── __init__.py
├── Runner.py
├── pyproject.toml
├── README.md
└── image_denoising/
    ├── __init__.py
    └── denoising_utils.py
The pyproject.toml file
The pyproject.toml is a file that contains the algorithm metadata. This file is used by compox to properly deploy the algorithm as a service. The pyproject.toml file should be placed in the root directory of the algorithm.
First, let’s create the pyproject.toml file. Under the [project] section, you should provide the name and version of the algorithm. The name should be unique and should not contain any spaces. The version should be in the format major.minor.patch. The algorithm name and version are used to identify the algorithm in Compox, so it is important to provide a unique name and version.
[project]
name = "template_denosing_algorithm"
version = "1.0.0"
Next, you should fill out the [tool.compox] section. This section contains the metadata that Compox uses to deploy the algorithm as a service. algorithm_type defines the algorithm input and output types; you may either use a predefined algorithm type or define your own. The predefined algorithm types are located in compox.algorithm_utils. For an image denoising algorithm, we will use the Image2Image type. This type is suitable for image denoising because both the input and the output are images (or sequences of images).
[tool.compox]
algorithm_type = "Image2Image"
Each algorithm type has a set of potential tags, which are used to specify the general algorithm functionality. Multiple tags can be provided for one algorithm. For image denoising algorithms, we will use the image-denoising tag.
tags = ["image-denoising"]
The description field should contain a brief description of the algorithm.
description = "Denoises a sequence of images using the total variation denoising algorithm."
For the denoising algorithm, we will add a denoising_weight parameter that will control the denoising strength. Because we want to set a range for the denoising weight, we will use the float_range parameter type. The default field should contain the default value of the parameter. The min and max fields should contain the minimum and maximum values of the parameter. The step field should contain the step size of the parameter. The adjustable field should be set to true if the parameter should be exposed to the user to adjust.
additional_parameters = [
{name = "denoising_weight", displayed_name = "Denoising weight", description = "The weight of the denoising term between 0 and 1. Higher values will result in more denoising, but can distort the image.", config = {type = "float_range", default = 0.1, min = 0.0, max = 1.0, step = 0.05, decimal_precision = 2, adjustable = true}}
]
For more information about the possible parameter types, see the How to create an algorithm module section.
displayed_name is optional and controls the human-friendly UI label. decimal_precision is optional and only valid for float-based parameter types.
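As a sanity check, the constraints implied by the float_range fields can be sketched with a small, hypothetical validator (not part of Compox):

```python
def check_float_range(config: dict) -> None:
    # Hypothetical validation mirroring the float_range fields described
    # above; Compox's real validator may enforce more than this.
    if not (config["min"] <= config["default"] <= config["max"]):
        raise ValueError("default must lie within [min, max]")
    if config["step"] <= 0:
        raise ValueError("step must be positive")
    if config.get("decimal_precision", 0) < 0:
        raise ValueError("decimal_precision must be >= 0")

# The denoising_weight config from the TOML above, as a Python dict.
check_float_range(
    {"type": "float_range", "default": 0.1, "min": 0.0, "max": 1.0,
     "step": 0.05, "decimal_precision": 2, "adjustable": True}
)
```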
The algorithm dependencies
The algorithm can use any libraries from the global Compox environment. Additional dependencies can be provided as Python submodules. Here we will use the scikit-image and numpy libraries to handle the image data. We also implemented a simple image_denoising module that contains an __init__.py file and a denoising_utils.py file. The denoising_utils.py file contains the denoise_image function that performs the denoising of the images. The image_denoising module should be placed in the root directory of the algorithm.
from skimage.restoration import denoise_tv_chambolle

def denoise_image(image, weight=0.1):
    """
    Denoise the image using the total variation denoising algorithm.

    Parameters
    ----------
    image : np.ndarray
        The image to denoise.
    weight : float
        The weight parameter for the denoising algorithm.

    Returns
    -------
    np.ndarray
        The denoised image.
    """
    return denoise_tv_chambolle(image, weight=weight)
The Runner.py file
The Runner.py file is the main file of the algorithm. This file should contain the algorithm implementation. The Runner.py file should be placed in the root directory of the algorithm.
Because we specified the algorithm type as Image2Image, the Runner.py file should contain a class that inherits from the Image2ImageRunner class. The Image2ImageRunner class is located in the compox.algorithm_utils module. The Image2ImageRunner class contains the necessary methods to handle the input and output of the algorithm.
import numpy as np

from compox.algorithm_utils.Image2ImageRunner import Image2ImageRunner

from image_denoising.denoising_utils import denoise_image

class Runner(Image2ImageRunner):
    """
    The runner class for the denoiser algorithm.
    """

    def __init__(self, task_handler, device: str = "cpu"):
        """
        The denoising runner.
        """
        super().__init__(task_handler, device)
We can implement a load_assets method to load any assets that the algorithm requires upon initialization of the Runner. The important bit is that attributes loaded in load_assets are cached with the algorithm and do not have to be reloaded for each call, which can greatly speed up execution. Since the denoising algorithm needs no assets, we can leave the load_assets method empty.
    def load_assets(self):
        """
        Here you can load the assets needed for the algorithm. This can be
        the model, the weights, etc. The assets are loaded upon the first
        call of the algorithm and are cached with the algorithm instance.
        """
        pass
Next, we can implement the inference method, where we perform the denoising of the images. The data will be passed to the inference method as a numpy array. The inference method should return a numpy array with the denoised images of the same shape as the input images. You can use the self.log_message method to log messages to the compox log. The self.set_progress method can be used to update the progress with a float value between 0 and 1.
    def inference(self, data: np.ndarray, args: dict | None = None) -> np.ndarray:
        """
        Run the inference.

        Parameters
        ----------
        data : np.ndarray
            The input images.
        args : dict, optional
            The user-provided arguments.

        Returns
        -------
        np.ndarray
            The denoised images.
        """
        self.log_message("Starting inference.")
        args = args or {}
        # min-max normalize the input images
        min_val = np.min(data)
        max_val = np.max(data)
        images = (data - min_val) / (max_val - min_val)
        # get the optional denoising weight argument
        denoising_weight = args.get("denoising_weight", 0.1)
        # we can post messages to the log
        self.log_message(
            f"Starting denoising of {images.shape[0]} images with weight {denoising_weight}."
        )
        # denoise the images one by one
        denoised_images = np.zeros_like(images)
        for i in range(images.shape[0]):
            denoised_images[i] = denoise_image(
                images[i], weight=denoising_weight
            )
            # this will update the progress bar
            self.set_progress((i + 1) / images.shape[0])
        # normalize the output
        denoised_images = (denoised_images - denoised_images.min()) / (
            denoised_images.max() - denoised_images.min()
        )
        denoised_images = denoised_images.astype(np.float32)
        # the denoised images are passed to the postprocess method
        return denoised_images
To customize how the input data is fetched and processed, and how the output data is postprocessed and uploaded, we can implement the preprocess and postprocess methods. The preprocess method is called before inference and fetches the input data; the postprocess method is called after inference and processes the output data. In our case, we will not implement any custom behavior for these methods. You can refer to the compox.algorithm_utils.Image2ImageRunner class for more information about these methods.
Deploying the algorithm
To deploy the finished algorithm, use:
compox deploy-algorithms --config app_server.yaml --name template_denoising_algorithm
This deploys the algorithm to Compox. The algorithm can also be added through
the Compox systray interface by clicking Add Algorithm and selecting the
algorithm directory.
Segmentation algorithm template
This section covers the specifics needed to develop an image segmentation algorithm. To see how a Compox algorithm should generally be structured, please refer to the algorithms/readme.md file.
The algorithm folder is structured as follows:
template_segmentation_algorithm/
├── __init__.py
├── Runner.py
├── pyproject.toml
├── README.md
└── image_segmentation/
    ├── __init__.py
    └── segmentation_utils.py
The pyproject.toml file
The pyproject.toml is a file that contains the algorithm metadata. This file is used by compox to properly deploy the algorithm as a service. The pyproject.toml file should be placed in the root directory of the algorithm.
First, let’s create the pyproject.toml file. Under the [project] section, you should provide the name and version of the algorithm. The name should be unique and should not contain any spaces. The version should be in the format major.minor.patch. The algorithm name and version are used to identify the algorithm in Compox, so it is important to provide a unique name and version.
[project]
name = "template_segmentation_algorithm"
version = "1.0.0"
Next, we will fill out the [tool.compox] section. This section contains the metadata that Compox uses to deploy the algorithm as a service. algorithm_type defines the algorithm input and output types; you may either use a predefined algorithm type or define your own. The predefined algorithm types are located in compox.algorithm_utils. For an image segmentation algorithm, we will use the Image2Segmentation type. This type is suitable for image segmentation because the input is a sequence of images and the output is a sequence of segmentation masks.
[tool.compox]
algorithm_type = "Image2Segmentation"
Each algorithm type has a set of potential tags, which are used to specify the general algorithm functionality. Multiple tags can be provided for one algorithm. For image segmentation algorithms, we will use the image-segmentation tag.
tags = ["image-segmenation"]
The description field should contain a brief description of the algorithm.
description = "Performs a binary segmentation of a 3-D image using a skimage filter."
Here we will add a thresholding_algorithm parameter that will allow the user to select the thresholding algorithm to use. The optional displayed_name field provides a human-friendly UI label. The type field is set to string_enum to specify that the parameter is a string with a predefined set of values. The default field is set to otsu to specify the default value of the parameter. The options field is set to a list of strings that specify the possible values of the parameter. The adjustable field is set to true to specify that the user should be able to select the thresholding algorithm to apply.
additional_parameters = [
{name = "thresholding_algorithm", displayed_name = "Thresholding algorithm", description = "The thresholding algorithm to use.", config = {type = "string_enum", default = "otsu", options = ["otsu", "yen", "li", "minimum", "mean", "triangle", "isodata", "local"], adjustable = true}},
]
For more information about the possible parameter types, see the How to create an algorithm module section.
If you later add float-based parameters to this template, you can also provide decimal_precision inside config to control how many decimal places the UI should display.
The algorithm dependencies
The algorithm can use any libraries from the global compox environment. Additional dependencies can be provided as python submodules. Here we will use the numpy library to handle the image data. We also implemented a simple image_segmentation module that contains an __init__.py file and a segmentation_utils.py file. The segmentation_utils.py file contains the threshold_image function that performs segmentation of an image using a selected algorithm. The image_segmentation module should be placed in the root directory of the algorithm.
import skimage.filters as skif

def threshold_image(image, thresholding_algorithm):
    """
    Threshold the image using the specified thresholding algorithm.

    Parameters
    ----------
    image : np.ndarray
        The image to threshold.
    thresholding_algorithm : str
        The thresholding algorithm to use.

    Returns
    -------
    np.ndarray
        The thresholded image.
    """
    if thresholding_algorithm == "otsu":
        threshold = skif.threshold_otsu(image)
    elif thresholding_algorithm == "yen":
        threshold = skif.threshold_yen(image)
    elif thresholding_algorithm == "li":
        threshold = skif.threshold_li(image)
    elif thresholding_algorithm == "minimum":
        threshold = skif.threshold_minimum(image)
    elif thresholding_algorithm == "mean":
        threshold = skif.threshold_mean(image)
    elif thresholding_algorithm == "triangle":
        threshold = skif.threshold_triangle(image)
    elif thresholding_algorithm == "isodata":
        threshold = skif.threshold_isodata(image)
    elif thresholding_algorithm == "local":
        threshold = skif.threshold_local(image)
    else:
        raise ValueError(
            f"Invalid thresholding algorithm: {thresholding_algorithm}"
        )
    return image > threshold
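For intuition, the "mean" branch reduces to a one-line numpy comparison. This simplified stand-in mirrors the spirit of skimage's threshold_mean but is not identical to it:

```python
import numpy as np

def mean_threshold(image: np.ndarray) -> np.ndarray:
    # Simplified analogue of the "mean" branch above: pixels brighter
    # than the global mean become foreground.
    return image > image.mean()

demo = np.array([[0.0, 0.2], [0.8, 1.0]])  # mean is 0.5
mask = mean_threshold(demo)
```

All branches of threshold_image follow the same shape: compute a scalar (or, for "local", per-pixel) threshold, then compare the image against it to get a boolean mask.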
The Runner.py file
The Runner.py file is the main file of the algorithm. This file should contain the algorithm implementation. The Runner.py file should be placed in the root directory of the algorithm.
Because we specified the algorithm type as Image2Segmentation, the Runner.py file should contain a class that inherits from the Image2SegmentationRunner class. The Image2SegmentationRunner class is located in the compox.algorithm_utils module. The Image2SegmentationRunner class contains the necessary methods to handle the input and output of the algorithm.
import numpy as np
from compox.algorithm_utils.Image2SegmentationRunner import (
    Image2SegmentationRunner,
)
from image_segmentation.segmentation_utils import threshold_image

class Runner(Image2SegmentationRunner):
    """
    The runner class for the image segmentation algorithm.
    """

    def __init__(self, task_handler, device: str = "cpu") -> None:
        """
        The segmentation runner.
        """
        super().__init__(task_handler, device=device)
We can implement a load_assets method to load any assets that the algorithm requires upon initialization of the Runner. The important bit is that attributes loaded in load_assets are cached with the algorithm and do not have to be reloaded for each call, which can greatly speed up execution. Since the segmentation algorithm needs no assets, we can leave the load_assets method empty.
    def load_assets(self):
        """
        Here you can load the assets needed for the algorithm. This can be
        the model, the weights, etc. The assets are loaded upon the first
        call of the algorithm and are cached with the algorithm instance.
        """
        pass
Next, we can implement the inference method, where we perform the segmentation of the images. The inference will receive a numpy array with the images to be segmented. The inference method must return a numpy array with the segmentation masks of the same
shape as the input images. The inference method can also receive a dictionary with the arguments for the algorithm. The arguments are passed to the algorithm from compox and can be used to customize the behavior of the algorithm. In our case, we will use the thresholding_algorithm argument to specify the thresholding algorithm to use.
You can also report the progress of the algorithm by calling the set_progress method. The set_progress method takes a float value between 0 and 1, where 0 is the start of the algorithm and 1 is the end. The log_message method can be used to log messages to the Compox log.
    def inference(self, data: np.ndarray, args: dict | None = None) -> np.ndarray:
        """
        Run the inference.

        Parameters
        ----------
        data : np.ndarray
            The images to be segmented.
        args : dict, optional
            The arguments for the algorithm.

        Returns
        -------
        np.ndarray
            The segmented images.
        """
        args = args or {}
        # retrieve the optional thresholding algorithm argument
        thresholding_algorithm = args.get("thresholding_algorithm", "otsu")
        # we can post messages to the log
        self.log_message(
            f"Starting inference with thresholding algorithm: {thresholding_algorithm}"
        )
        # threshold the images
        mask = threshold_image(data, thresholding_algorithm)
        # we can also report progress
        self.set_progress(0.5)
        # the mask is passed to the postprocess method
        return mask
To customize the behavior of fetching and processing the input data, and postprocessing and uploading the output data, we can implement the preprocess and postprocess methods. The preprocess method is called before the inference method and is used to fetch the input data. The postprocess method is called after the inference method and is used to process the output data. In our case, we will not implement any custom behavior for these methods. You can refer to the compox.algorithm_utils.Image2SegmentationRunner class for more information about these methods.
Deploying the algorithm
To deploy the finished algorithm, use:
compox deploy-algorithms --config app_server.yaml --name template_segmentation_algorithm
This deploys the algorithm to Compox. The algorithm can also be added through
the Compox systray interface by clicking Add Algorithm and selecting the
algorithm directory.
Registration algorithm template
This section presents a working template for developing an image registration algorithm. To see how a Compox algorithm should generally be structured, please refer to the algorithms/readme.md file.
The algorithm folder is structured as follows:
template_registration_algorithm/
├── __init__.py
├── Runner.py
├── pyproject.toml
├── README.md
└── image_registration/
    ├── __init__.py
    └── registration_utils.py
The pyproject.toml file
The pyproject.toml is a file that contains the algorithm metadata. This file is used by compox to properly deploy the algorithm as a service. The pyproject.toml file should be placed in the root directory of the algorithm.
First, let’s create the pyproject.toml file. Under the [project] section, you should provide the name and version of the algorithm. The name should be unique and should not contain any spaces. The version should be in the format major.minor.patch. The algorithm name and version are used to identify the algorithm in Compox, so it is important to provide a unique name and version.
[project]
name = "template_registration_algorithm"
version = "1.0.0"
Next, we will fill out the [tool.compox] section. This section contains the metadata that Compox uses to deploy the algorithm as a service. algorithm_type defines the algorithm's input and output types; you may either use one of the predefined algorithm types or define your own. The predefined algorithm types are located in compox.algorithm_utils. For an image registration algorithm, we will use the Image2Alignment type. This type is suitable for image registration, as the input is a sequence of images and the output is a sequence of homography matrices.
[tool.compox]
algorithm_type = "Image2Alignment"
Each algorithm type has a set of potential tags, which are used to specify the general algorithm functionality. Multiple tags can be provided for one algorithm. For image registration algorithms, we will use the image-alignment tag.
tags = ["image-alignment"]
The description field should contain a brief description of the algorithm.
description = "Generates homography matrices for aligning a sequence of images."
Here we will add a max_translation parameter that defines the maximum translation as a fraction of the image size. Because we want to constrain the parameter to a range, we will use the float_range type. The displayed_name field provides a human-friendly UI label, and description briefly explains the parameter. The default field contains the default value; the min and max fields contain the minimum and maximum values; and the step field contains the step size. The decimal_precision field controls how many decimal places the UI should display for float-based values. The adjustable field should be set to true if the parameter should be exposed for the user to adjust.
{name = "max_translation", displayed_name = "Max translation", description = "Maximum translation as a fraction of the image size.", config = {type = "float_range", default = 0.25, min = 0.0, max = 1.0, step = 0.05, decimal_precision = 2, adjustable = true}}
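Putting the individual pieces together, the [project] and [tool.compox] sections for this template could look as follows. This is a sketch: the key that collects parameter entries is shown here as parameters, which you should verify against the How to create an algorithm module section.

```toml
[project]
name = "template_registration_algorithm"
version = "1.0.0"

[tool.compox]
algorithm_type = "Image2Alignment"
tags = ["image-alignment"]
description = "Generates homography matrices for aligning a sequence of images."
parameters = [
    {name = "max_translation", displayed_name = "Max translation", description = "Maximum translation as a fraction of the image size.", config = {type = "float_range", default = 0.25, min = 0.0, max = 1.0, step = 0.05, decimal_precision = 2, adjustable = true}},
]
```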
For more information about the available parameter types, see the How to create an algorithm module section.
The algorithm dependencies
The algorithm can use any libraries from the global Compox environment. Additional dependencies can be provided as Python submodules. Here we will use the numpy library to handle the image data. We also implement a simple image_registration module that contains an __init__.py file and a registration_utils.py file. The registration_utils.py file contains the get_random_translation function, which generates a random homography matrix whose maximum translation is defined by the max_translation parameter as a fraction of the input image size. The image_registration module should be placed in the root directory of the algorithm.
import numpy as np


def get_random_translation(image: np.ndarray, max_translation: float = 0.25):
    """
    Get a random translation matrix.

    Parameters
    ----------
    image : np.ndarray
        The image.
    max_translation : float
        The maximum translation.

    Returns
    -------
    np.ndarray
        The translation matrix.
    """
    # get the image dimensions
    height, width = image.shape[:2]
    h = np.eye(3)
    # random translation
    h[0, 2] = np.random.uniform(
        -max_translation * width, max_translation * width
    )
    h[1, 2] = np.random.uniform(
        -max_translation * height, max_translation * height
    )
    return h
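As a quick sanity check, the helper can be exercised directly. The function body is repeated below so the snippet runs standalone; the bounds on the translation components follow from the uniform ranges above:

```python
import numpy as np

def get_random_translation(image: np.ndarray, max_translation: float = 0.25):
    # same logic as registration_utils.get_random_translation above
    height, width = image.shape[:2]
    h = np.eye(3)
    h[0, 2] = np.random.uniform(-max_translation * width, max_translation * width)
    h[1, 2] = np.random.uniform(-max_translation * height, max_translation * height)
    return h

image = np.zeros((100, 200), dtype=np.uint8)  # height=100, width=200
h = get_random_translation(image, max_translation=0.1)
assert h.shape == (3, 3)
assert abs(h[0, 2]) <= 0.1 * 200  # x-translation bounded by fraction of width
assert abs(h[1, 2]) <= 0.1 * 100  # y-translation bounded by fraction of height
```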
The Runner.py file
The Runner.py file is the main file of the algorithm. This file should contain the algorithm implementation. The Runner.py file should be placed in the root directory of the algorithm.
Because we specified the algorithm type as Image2Alignment, Runner.py should contain a class that inherits from the Image2AlignmentRunner class, located in the compox.algorithm_utils module. This base class provides the methods needed to handle the algorithm's input and output.
import numpy as np

from compox.algorithm_utils.Image2AlignmentRunner import (
    Image2AlignmentRunner,
)

from image_registration.registration_utils import get_random_translation


class Runner(Image2AlignmentRunner):
    """
    The runner class for the image registration algorithm.
    """

    def __init__(self, task_handler, device: str = "cpu"):
        """
        The image registration runner.
        """
        super().__init__(task_handler, device)
We can implement a load_assets method to load any assets that the algorithm requires upon initialization of the Runner. The important point is that attributes loaded in the load_assets method are cached with the algorithm and do not have to be reloaded for each algorithm call, which can greatly speed up execution. Since the image registration algorithm needs no assets, we can leave the load_assets method empty.
def load_assets(self):
    """
    Here you can load the assets needed for the algorithm. This can be
    the model, the weights, etc. The assets are loaded upon the first
    call of the algorithm and are cached with the algorithm instance.
    """
    pass
Next, we can implement the inference method, where we perform the registration of the images. The data is passed to the inference method as a numpy array, and the method returns a list of homography matrices represented by numpy arrays. You can report the algorithm's progress by calling the set_progress method, which takes a float value between 0 and 1, where 0 is the start of the algorithm and 1 is the end. The log_message method can be used to log messages to the Compox log.
def inference(self, data: np.ndarray, args: dict | None = None) -> list[np.ndarray]:
    """
    Run the inference.

    Parameters
    ----------
    data : np.ndarray
        The input images.
    args : dict | None
        Optional runtime arguments for the algorithm.

    Returns
    -------
    list[np.ndarray]
        The output homography matrices.
    """
    self.log_message("Starting inference.")
    # read the runtime arguments (args may be None)
    args = args or {}
    max_translation = args.get("max_translation", 0.25)
    # we can post messages to the log
    self.log_message(f"Registering {data.shape[0]} images.")
    # generate one homography matrix per consecutive image pair
    matrices = []
    for i in range(data.shape[0] - 1):
        matrix = get_random_translation(
            data[i], max_translation=max_translation
        )
        matrices.append(matrix)
        self.set_progress((i + 1) / (data.shape[0] - 1))
    # pass the homography matrices to the output
    return matrices
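Though not part of the template, the pairwise matrices returned by inference can be chained into cumulative transforms. A standalone sketch, assuming (hypothetically) that matrix i maps image i+1 onto image i:

```python
import numpy as np

# pairwise homographies; assume matrix i maps image i+1 onto image i
pairwise = [
    np.array([[1.0, 0.0, 2.0], [0.0, 1.0, 3.0], [0.0, 0.0, 1.0]]),
    np.array([[1.0, 0.0, -1.0], [0.0, 1.0, 4.0], [0.0, 0.0, 1.0]]),
]

# cumulative[k] maps image k onto image 0; image 0 needs no transform
cumulative = [np.eye(3)]
for h in pairwise:
    cumulative.append(cumulative[-1] @ h)

# for pure translations, the offsets simply add up: (2, 3) + (-1, 4) = (1, 7)
assert np.allclose(cumulative[2][:2, 2], [1.0, 7.0])
```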
To customize how input data is fetched and prepared, or how output data is processed and uploaded, we can implement the preprocess and postprocess methods. The preprocess method is called before the inference method and fetches the input data; the postprocess method is called after the inference method and processes the output data. In our case, we do not implement any custom behavior for these methods. You can refer to the compox.algorithm_utils.Image2AlignmentRunner class for more information about these methods.
Deploying the algorithm
To deploy the finished algorithm, use:
compox deploy-algorithms --config app_server.yaml --name template_registration_algorithm
This deploys the algorithm to Compox. The algorithm can also be added through
the Compox systray interface by clicking Add Algorithm and selecting the
algorithm directory.
Implementing the train() method in Compox algorithm runners
This guide is for algorithm developers implementing training logic in their Runner classes.
Where train() is called
Training is started by POST /api/v0/train-algorithm. The server creates a TrainingHandler and calls runner.run_training(...). BaseRunner.run_training() sets the status to RUNNING, calls self.train(...), and on success calls TrainingHandler.mark_as_completed().
Key implications:
- If train() raises, the training job is marked FAILED.
- If a stop request is posted, TaskHandler raises TaskStoppedException and the status becomes STOPPED.
Required method: train(self, training_data, args)
In BaseRunner, train() is the method you must override. It should not return anything.
The training task is considered complete when train() finishes without error.
Signature in BaseRunner:
def train(self, training_data: list[str], args: dict | None = None) -> None:
...
Fetching training data
You receive training_data as a list of training sample IDs.
Use the TrainingHandler helpers (available through BaseRunner) to load datasets:
dataset = self.get_training_dataset(training_sample_ids)
From there, you can use these training-specific save/load helpers on the runner.
All of them are implemented by TrainingHandler and surfaced via BaseRunner.
Saving / downloading to TempStore
save_training_files_to_temp_store(folder_path, files, schema, parallel=True)
Use when you already have in‑memory data (e.g., numpy arrays) and want to persist them to the TempStore before training.
Inputs:
- folder_path: target subfolder in TempStore
- files: list of dicts (each dict is a logical file with HDF5 keys/values)
- schema: Pydantic DataSchema for validation
Output:
list of Path objects pointing to saved files in TempStore
download_files_to_temp_store(folder_path, file_ids, schema, batch_size=8, *keys)
Use when you have a flat list of file IDs from data-store.
Inputs:
- file_ids: list of object IDs in data-store
- schema: Pydantic DataSchema for validation
- *keys: optional HDF5 keys to extract (if omitted, all keys)
Output:
list of Path objects in TempStore
Notes:
Downloads in batches to reduce memory spikes.
download_dataset_to_temp_store(dataset, schemas)
Use when you have training samples and want the full manifest structure preserved.
Inputs:
- dataset: a TrainingDataset created from sample IDs
- schemas: dict mapping sample keys to Pydantic schemas (e.g. {"input": InputSchema, "target": TargetSchema})
Output:
local_samples: a list of samples; each sample is a list of dicts whose values are local Paths in TempStore.
Temp layout:
<temp>/<sample_id>/<file_index>/<key>/...
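The layout can be pictured with a small standalone sketch. The sample IDs, indices, keys, and file name below are hypothetical; Compox's actual on-disk names may differ:

```python
import tempfile
from pathlib import Path

# recreate the documented layout: <temp>/<sample_id>/<file_index>/<key>/...
temp = Path(tempfile.mkdtemp())
for sample_id in ("sample_a", "sample_b"):
    for file_index in ("0", "1"):
        for key in ("input", "target"):
            folder = temp / sample_id / file_index / key
            folder.mkdir(parents=True)
            (folder / "data.h5").write_bytes(b"")

# walking the tree recovers one path per (sample, file, key) combination
files = sorted(p.relative_to(temp).as_posix() for p in temp.rglob("*.h5"))
assert len(files) == 8  # 2 samples x 2 files x 2 keys
assert files[0] == "sample_a/0/input/data.h5"
```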
Loading from TempStore
load_files_from_temp_store(paths, parallel=True, *keys)
Use when you already have a list of TempStore paths.
Inputs:
- paths: list of file paths in TempStore
- *keys: optional HDF5 keys to extract (if omitted, all keys)
Output:
list of dicts (in‑memory data)
load_dataset_from_temp_store(local_samples)
Use with the output from download_dataset_to_temp_store(...).
Input:
local_samples: list-of-list-of-dict structure with TempStore paths
Output:
Same structure, but values are loaded data dicts instead of paths.
Schema validation
All save/download methods validate against Pydantic DataSchema definitions
(see compox.algorithm_utils.io_schemas). The schema defines expected HDF5 keys,
their types, and any validation rules.
Reporting progress and state
Use these methods during training:
self.set_progress(0.5) # float in [0.0, 1.0]
self.set_state({"epoch": 3, "loss": 0.12})
self.log_message("Epoch 3/10", logging_level="INFO")
- set_progress updates TrainingRecord.progress.
- set_state overwrites the current TrainingRecord.state.
- log_message appends to the training log.
Saving checkpoints (training outputs)
To persist a trained model or intermediate state, call:
checkpoint_id = self.save_checkpoint(
{"my_asset.pt": model_bytes},
properties={"stage": "intermediate", "epoch": 3, "loss": 0.12},
)
Important rules:
- Keys in the checkpoint dict must match asset paths already defined in the algorithm. TrainingHandler.save_checkpoint() validates this against the algorithm's assets.
- The checkpoint is stored in algorithm-checkpoint-store.
- Each saved checkpoint ID is appended to TrainingRecord.output_checkpoint_ids.
Training completion:
TrainingHandler.mark_as_completed() requires at least one checkpoint. If none were saved, the training is marked failed.
Stopping behavior
If a stop request is posted:
- TaskHandler._check_for_stop_request() raises TaskStoppedException.
- Training is marked STOPPED.
Recommendation: Keep your training loop responsive so stop requests can be detected quickly.
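The responsiveness pattern can be illustrated without Compox at all. The TaskStoppedException and FakeHandler classes below are stand-ins for the real TaskHandler stop check; the point is that checking once per small unit of work (here, per batch) lets a stop request interrupt training promptly:

```python
class TaskStoppedException(Exception):
    """Stand-in for Compox's TaskStoppedException."""

class FakeHandler:
    """Minimal stand-in for the stop-request check in TaskHandler."""
    def __init__(self):
        self.stop_requested = False

    def check_for_stop_request(self):
        if self.stop_requested:
            raise TaskStoppedException()

handler = FakeHandler()
status = "RUNNING"
steps_done = 0
try:
    for epoch in range(100):
        for batch in range(10):
            steps_done += 1
            if steps_done == 31:  # simulate a stop request arriving mid-training
                handler.stop_requested = True
            # checking once per batch (not once per epoch) keeps the loop responsive
            handler.check_for_stop_request()
except TaskStoppedException:
    status = "STOPPED"

assert status == "STOPPED"
assert steps_done == 31  # loop stopped promptly instead of finishing all 1000 steps
```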
Example skeleton
from compox.algorithm_utils.BaseRunner import BaseRunner
from compox.algorithm_utils.io_schemas import DataSchema
import numpy as np


class InputSchema(DataSchema):
    image: np.ndarray


class TargetSchema(DataSchema):
    mask: np.ndarray


class Runner(BaseRunner):
    def load_assets(self):
        # load model weights defined in algorithm assets
        self.weights = self.fetch_asset("model.pt")

    def train(self, training_data: list[str], args: dict | None = None):
        args = args or {}
        # 1) Build dataset from sample IDs
        dataset = self.get_training_dataset(training_data)
        # 2) Download full dataset to TempStore using schemas for each key
        schemas = {"input": InputSchema, "target": TargetSchema}
        local_samples = self.download_dataset_to_temp_store(dataset, schemas)
        # 3) Load the dataset into memory
        in_memory = self.load_dataset_from_temp_store(local_samples)
        # 4) Optional: derive extra files and save them to TempStore
        derived = []
        for sample in in_memory:
            for file_dict in sample:
                if "input" in file_dict:
                    img = file_dict["input"]["image"]
                    norm = (img - img.min()) / (img.max() - img.min() + 1e-8)
                    derived.append({"image": norm.astype(np.float32)})
        derived_paths = self.save_training_files_to_temp_store(
            "derived", derived, InputSchema, parallel=True
        )
        # 5) Load derived files back (flat load)
        derived_loaded = self.load_files_from_temp_store(derived_paths)

        epochs = args.get("num_epochs", 10)
        for epoch in range(epochs):
            # training step...
            self.log_message(f"Epoch {epoch+1}/{epochs}")
            self.set_progress(float(epoch + 1) / epochs)
            self.set_state(
                {
                    "epoch": epoch + 1,
                    "samples": len(in_memory),
                    "derived": len(derived_loaded),
                }
            )
            # intermediate checkpoint
            self.save_checkpoint(
                {"model.pt": b"model-bytes"},
                properties={
                    "stage": "intermediate",
                    "epoch": epoch + 1,
                },
            )

        # final checkpoint (required)
        self.save_checkpoint(
            {"model.pt": b"final-model-bytes"},
            properties={"stage": "final", "epoch": epochs},
        )
Common pitfalls
- No checkpoints saved: training will fail at completion.
- Checkpoint keys don't match assets: save_checkpoint() raises.
- Long loops without progress or log updates: the client sees "stalled" training.
- Mutating cached assets: assets loaded in load_assets() are protected from reassignment.