Client Workflows
In the following sections, we will describe the typical Compox client workflows for training and inference.
Execution workflow
This section covers the client-facing flow for execution in Compox: upload data, start an execution, poll status, and retrieve results. All endpoints and payloads are derived from the current server code under `compox/src/compox/routers`.
Upload data files
Endpoint:
POST /api/v0/files
Behavior (from file_controller.py):
The request body is treated as raw bytes and must be a valid HDF5 file.
On success, returns `{ "file_id": "<uuid>" }`.
Minimal example:
POST /api/v0/files
Content-Type: application/octet-stream
<HDF5 bytes>
Response:
{ "file_id": "..." }
Notes:
- The server validates that the uploaded bytes open as HDF5.
- Files are stored in the `data-store` bucket/collection.
- Files expire after 1 day by default (configurable in `S3Connection`).
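For concreteness, here is a minimal Python sketch of the upload step using the `requests` library. The base URL and the local file name are illustrative assumptions, not part of the API; later sketches in this section reuse `BASE_URL` and `file_id`.

```python
import requests

BASE_URL = "http://localhost:8000"  # assumed server address

def upload_file(path: str) -> str:
    """Upload a local HDF5 file as raw bytes and return the new file_id."""
    with open(path, "rb") as f:
        resp = requests.post(
            f"{BASE_URL}/api/v0/files",
            data=f.read(),
            headers={"Content-Type": "application/octet-stream"},
        )
    resp.raise_for_status()
    return resp.json()["file_id"]

file_id = upload_file("scan.h5")  # hypothetical input file
```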
Execute an algorithm
Endpoint:
POST /api/v0/execute-algorithm
Payload model: IncomingExecutionRequest
- `algorithm_id`: string
- `input_dataset_ids`: list of file IDs
- `checkpoint_id`: optional checkpoint to load assets from
- `algorithm_minor_version`: optional minor version to execute
- `execution_device_override`: optional device override (e.g. `"cpu"`, `"cuda:0"`)
- `additional_parameters`: dict (free-form)
- `session_token`: optional session identifier
Example:
{
"algorithm_id": "<algorithm_id>",
"input_dataset_ids": ["<file_id_1>", "<file_id_2>"],
"checkpoint_id": null,
"algorithm_minor_version": null,
"execution_device_override": null,
"additional_parameters": {
"threshold": 0.5,
"tile_size": 512
},
"session_token": null
}
Response:
{ "execution_id": "..." }
Validation behavior (from execution_controller.py):
- All referenced input file IDs must exist in `data-store`.
- The algorithm ID must exist in `algorithm-store` (via `find_algorithm_by_id`).
Execution mode:
- If `inference.backend_settings.executor` is `fastapi_background_tasks`, execution runs via `execution_task_fastapi`.
- If the executor is `celery`, the task is queued under `task` (Celery).
Progress/status details (from TaskHandler):
- Valid statuses are `PENDING`, `STARTED`, `RUNNING`, `COMPLETED`, `FAILED`, `STOPPED`.
- `execution_controller.py` creates the record with `status="PENDING"` and `progress=0.0`.
- `TaskHandler.mark_as_completed()` sets `progress=1.0`, `time_completed`, `output_dataset_ids`, and `status="COMPLETED"`.
- `TaskHandler.mark_as_failed()` sets `status="FAILED"`, `progress=1.0`, clears `output_dataset_ids`, and stores the exception in the log.
- If a stop request is posted, `TaskHandler._check_for_stop_request()` acknowledges it and calls `mark_as_stopped()`, which sets `status="STOPPED"` and raises `TaskStoppedException`.
Sessions (optional)
The execution API supports an optional `session_token` that can be used to share an in-memory cache across multiple executions.
Behavior (from TaskSession and TaskHandler):
- If you omit `session_token` and the backend uses FastAPI background tasks, a new session token is generated and stored in the execution record.
- You can retrieve it via `GET /api/v0/executions/{execution_id}` and pass it in subsequent executions to reuse the cache.
- Sessions are in-memory only (single process), expire after ~24 hours, and are capped in size and number of caches.
- Celery does not support sessions: if you attempt to use session features in Celery mode, a `NotImplementedError` is raised internally and `session_token` will be `None` in the execution record.
Practical guidance:
- Use sessions only when running with `fastapi_background_tasks`.
- Treat sessions as a performance optimization (e.g., caching model intermediates), not a persistent store.
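A sketch of token reuse under these assumptions (FastAPI background tasks backend, variables carried over from the earlier sketches):

```python
# First execution: omit session_token; the server generates one.
first_id = requests.post(
    f"{BASE_URL}/api/v0/execute-algorithm", json=payload
).json()["execution_id"]

# Read the generated token back from the execution record ...
record = requests.get(f"{BASE_URL}/api/v0/executions/{first_id}").json()
payload["session_token"] = record["session_token"]

# ... and pass it to subsequent executions to share the in-memory cache.
requests.post(f"{BASE_URL}/api/v0/execute-algorithm", json=payload)
```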
Check execution status
Endpoint:
GET /api/v0/executions/{execution_id}
Response model: ExecutionRecord
Includes `status`, `progress`, `log`, and `output_dataset_ids`.
Example response (shape):
{
"execution_id": "...",
"algorithm_id": "...",
"status": "RUNNING",
"progress": 0.3,
"time_started": "...",
"time_completed": "",
"log": "",
"input_dataset_ids": ["<file_id_1>", "<file_id_2>"],
"output_dataset_ids": [],
"execution_device_override": null,
"additional_parameters": {},
"session_token": null,
"checkpoint_id": null,
"algorithm_minor_version": null
}
Notes:
`output_dataset_ids` is the key field for downstream retrieval of results.
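A simple polling loop, sketched with the same assumptions as above (`requests`, `BASE_URL`, and `execution_id` from the earlier snippets):

```python
import time

TERMINAL = {"COMPLETED", "FAILED", "STOPPED"}

def wait_for_execution(execution_id: str, interval: float = 2.0) -> dict:
    """Poll the execution record until it reaches a terminal status."""
    while True:
        record = requests.get(
            f"{BASE_URL}/api/v0/executions/{execution_id}"
        ).json()
        if record["status"] in TERMINAL:
            return record
        time.sleep(interval)

record = wait_for_execution(execution_id)
output_ids = record["output_dataset_ids"]
```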
Stop execution (optional)
Endpoint:
POST /api/v0/executions/{execution_id}/stop
Behavior:
- Only `PENDING`, `RUNNING`, or `STARTED` statuses are stoppable.
- A stop request is posted to `stop-requests`, which the task checks.
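For example, continuing the sketch:

```python
# Request a stop; the running task picks this up at its next check.
requests.post(f"{BASE_URL}/api/v0/executions/{execution_id}/stop").raise_for_status()
```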
Retrieve output datasets
Endpoint:
GET /api/v0/files/{file_id}
Behavior:
Returns the raw HDF5 bytes for the given `file_id`; call it once for each ID in `output_dataset_ids`.
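Continuing the sketch above, with a hypothetical local naming scheme for the downloaded files:

```python
for dataset_id in output_ids:
    resp = requests.get(f"{BASE_URL}/api/v0/files/{dataset_id}")
    resp.raise_for_status()
    with open(f"{dataset_id}.h5", "wb") as f:
        f.write(resp.content)
```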
Delete files (optional)
Endpoint:
DELETE /api/v0/files/{file_id}
Behavior:
Deletes a file from `data-store` immediately.
Notes:
Files already expire automatically (default 1 day), but you may want to delete them earlier to free up storage.
End-to-end summary
1. Upload HDF5 files → get `file_id`s
2. Execute the algorithm with `algorithm_id` + `input_dataset_ids` → get `execution_id`
3. Optionally reuse a `session_token` across executions (FastAPI background tasks only)
4. Poll the execution record → read status + `output_dataset_ids` (and `session_token` if used)
5. Optionally stop the execution
6. Download each output dataset by ID
7. Optionally delete files to free up storage early
Training workflow
This section covers the client-facing flow for training in Compox: upload data, create training samples, start training, and retrieve results. All endpoints and payloads are derived from the current server code under `compox/src/compox/routers`.
Upload data files
Endpoint:
POST /api/v0/files
Behavior (from file_controller.py):
The request body is treated as raw bytes and must be a valid HDF5 file.
On success, returns `{ "file_id": "<uuid>" }`.
Minimal example:
POST /api/v0/files
Content-Type: application/octet-stream
<HDF5 bytes>
Response:
{ "file_id": "..." }
Notes:
- The server validates that the uploaded bytes open as HDF5.
- Files are stored in the `data-store` bucket/collection.
- Files expire after 1 day by default (configurable in `S3Connection`).
Create training samples
Endpoint:
POST /api/v0/sample
Payload model: IncomingSampleRequest (see pydantic_models.py)
- `files`: list of dicts mapping arbitrary keys to lists of file IDs
- `tags`: list of strings (optional)
Example:
{
"files": [
{ "input": ["<file_id_1>", "<file_id_2>"], "target": ["<file_id_3>"] }
],
"tags": ["modality:ct", "anatomy:brain", "author:me"]
}
Response:
{ "sample_id": "..." }
Validation behavior:
- Each referenced file ID must exist in `data-store`; otherwise the API returns 404.
Related endpoints:
- `GET /api/v0/sample/{sample_id}` returns the stored sample record
- `GET /api/v0/sample/all?positive_tags=...&negative_tags=...` filters by tags
- `DELETE /api/v0/sample/{sample_id}` deletes a sample
Notes:
- Files referenced by samples are not copied; the sample record just points to existing files.
- Files referenced by a sample do not expire as long as the sample exists.
Start training
Endpoint:
POST /api/v0/train-algorithm
Payload model: IncomingTrainingRequest
- `algorithm_id`: string
- `training_data`: list of sample IDs
- `checkpoint_id`: optional checkpoint to start from
- `algorithm_minor_version`: optional minor version string
- `tags`: list of strings
- `additional_parameters`: dict (free-form)
Example:
{
"algorithm_id": "<algorithm_id>",
"training_data": ["<sample_id>"],
"checkpoint_id": null,
"algorithm_minor_version": null,
"tags": ["experiment:42", "author:me"],
"additional_parameters": {
"learning_rate": 0.001,
"batch_size": 4,
"num_epochs": 10
}
}
Response:
{ "training_id": "..." }
Validation behavior (from training_controller.py):
- All referenced sample IDs must exist in `sample-store`.
- All files referenced by those samples must exist in `data-store`.
- The algorithm ID must exist in `algorithm-store` (via `find_algorithm_by_id`).
Execution mode:
- If `inference.backend_settings.executor` is `fastapi_background_tasks`, training runs via `training_task_fastapi`.
- If the executor is `celery`, the task is queued under `training_task`.
Progress/status details (from TrainingHandler and TaskHandler):
- Status transitions are written to `training-store` via `TrainingHandler.status` (inherited from `TaskHandler`).
- Valid statuses are `PENDING`, `STARTED`, `RUNNING`, `COMPLETED`, `FAILED`, `STOPPED`.
- `training_controller.py` creates the record with `status="PENDING"` and `progress=0.0`.
- `TrainingHandler.mark_as_completed()` sets `progress=1.0`, `time_completed`, updates the log, and sets `status="COMPLETED"`.
- If a stop request is posted, `TaskHandler._check_for_stop_request()` acknowledges it and calls `mark_as_stopped()`, which sets `status="STOPPED"` and raises `TaskStoppedException`.
- `TaskHandler.mark_as_failed()` sets `status="FAILED"`, `progress=1.0`, clears `output_dataset_ids`, and stores the exception in the log.
Check training status
Endpoint:
GET /api/v0/training/{training_id}
Response model: TrainingRecord
Includes `status`, `progress`, `log`, `output_checkpoint_ids`, etc.
Example response (shape):
{
"training_id": "...",
"algorithm_id": "...",
"status": "RUNNING",
"progress": 0.3,
"time_started": "...",
"time_completed": null,
"log": "",
"training_data": ["<sample_id>"],
"state": {},
"tags": ["experiment:42"],
"checkpoint_id": null,
"algorithm_minor_version": null,
"output_checkpoint_ids": []
}
Notes:
`output_checkpoint_ids` is the key field for downstream retrieval of training results.
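Polling mirrors the execution workflow; a sketch under the same assumptions:

```python
import time

def wait_for_training(training_id: str, interval: float = 5.0) -> dict:
    """Poll the training record until it reaches a terminal status."""
    while True:
        record = requests.get(f"{BASE_URL}/api/v0/training/{training_id}").json()
        if record["status"] in {"COMPLETED", "FAILED", "STOPPED"}:
            return record
        time.sleep(interval)

checkpoint_ids = wait_for_training(training_id)["output_checkpoint_ids"]
```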
Stop training (optional)
Endpoint:
POST /api/v0/training/{training_id}/stop
Behavior:
- Only `PENDING`, `RUNNING`, or `STARTED` statuses are stoppable.
- A stop request is posted to `stop-requests`, which the training task checks.
Retrieve training results (checkpoints)
Endpoints:
- `GET /api/v0/checkpoint/{checkpoint_id}`
- `GET /api/v0/checkpoint/all` (filtering supported by query params; see `checkpoint_controller.py`)
Results:
Checkpoint metadata is returned (not the model bytes directly).
Checkpoint behavior (from TrainingHandler.save_checkpoint()):
- `save_checkpoint()` validates that every asset path exists in the algorithm's assets.
- New assets are stored in `asset-store`, and a new `checkpoint_id` is created.
- A checkpoint manifest is saved in `algorithm-checkpoint-store`.
- The new checkpoint ID is appended to `output_checkpoint_ids` in the training handler, so it appears in the training record.
Checkpoint metadata shape (from AlgorithmCheckpointRecord):
{
"checkpoint_id": "...",
"training_id": "...",
"parent_algorithm_id": "...",
"created_at": "...",
"properties": {},
"tags": [],
"parent_checkpoint_id": null
}
Details on properties and tags:
- `properties` is a free-form dictionary provided by the algorithm when it calls `save_checkpoint(assets, properties)`. This is the place to store metrics, hyperparameters, dataset IDs, evaluation scores, or any other metadata you want to query later.
- `tags` are inherited from the training run tags (`IncomingTrainingRequest.tags`). When the checkpoint is created, `TrainingHandler.save_checkpoint()` copies the training record's `tags` into the checkpoint manifest.
- `parent_checkpoint_id` is copied from the training record's `checkpoint_id`, so you can track lineage if you trained from an existing checkpoint.
Filtering by tags:
- `GET /api/v0/checkpoint/all?positive_tags=tag1&positive_tags=tag2`
- `GET /api/v0/checkpoint/all?negative_tags=tag_to_exclude`
- You can combine `positive_tags` and `negative_tags` in the same request.
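For example, continuing the sketch (`requests` encodes list-valued params as repeated query parameters):

```python
# Metadata for a single checkpoint produced by the training run.
meta = requests.get(f"{BASE_URL}/api/v0/checkpoint/{checkpoint_ids[0]}").json()

# List checkpoints, filtered by tags.
resp = requests.get(
    f"{BASE_URL}/api/v0/checkpoint/all",
    params={"positive_tags": ["experiment:42"], "negative_tags": ["author:me"]},
)
checkpoints = resp.json()
```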
Export trained algorithm (optional)
Endpoint:
GET /api/v0/algorithm/{algorithm_name}/{algorithm_major_version}/export
Query params:
- `algorithm_minor_version` (optional)
- `checkpoint_id` (optional; overrides assets with the checkpoint)
Response:
Streaming zip download (`application/zip`) of the algorithm package.
What `algorithm_minor_version` means:
- The minor version is the build number stored for a given algorithm name + major version.
- Supplying it lets you export a specific build; if you omit it, the latest build is exported.
What `checkpoint_id` means:
- A checkpoint is a snapshot of trained assets (typically weights).
- Supplying `checkpoint_id` tells the exporter to swap the algorithm's assets with the checkpoint's assets before packaging. This is how you get a trained package out of a training run.
Export protection:
If the algorithm is marked with `"exportable": false` in its `AlgorithmConfigSchema`, the export endpoint returns 403 Forbidden.
What the zip file is:
- A complete, deployable algorithm package: `Runner.py`, `pyproject.toml`, and the assets under `files/`.
- If `checkpoint_id` is provided, those asset files come from the checkpoint instead of the original algorithm assets.
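A download sketch, with placeholder path segments standing in for the algorithm name and major version:

```python
resp = requests.get(
    f"{BASE_URL}/api/v0/algorithm/<algorithm_name>/<algorithm_major_version>/export",
    params={"checkpoint_id": checkpoint_ids[0]},
    stream=True,
)
resp.raise_for_status()
with open("algorithm_package.zip", "wb") as f:
    for chunk in resp.iter_content(chunk_size=8192):
        f.write(chunk)
```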
End-to-end summary
1. Upload HDF5 files → get `file_id`s
2. Create training sample(s) referencing file IDs → get `sample_id`s
3. Start training with algorithm ID + sample IDs → get `training_id`
4. Poll the training record → read status + `output_checkpoint_ids`
5. Optionally stop training or fetch checkpoint metadata
6. Optionally export an algorithm package using the checkpoint ID