Runtime environments
The NOS inference server supports custom runtime environments through the `InferenceServiceRuntime` class and the configurations defined within it. This class provides a high-level interface for defining new custom runtime environments that can be used with NOS.
## ⚡️ NOS Inference Runtime
We use Docker to configure different worker configurations to run workloads in different runtime environments. The configured runtime environments are specified in the `InferenceServiceRuntime` class, which wraps the generic `DockerRuntime` class. For convenience, we have pre-built several runtime environments that can be used out-of-the-box: `cpu`, `gpu`, `inf2`, etc.
This is the general flow of how the runtime environments are configured:

- Configure runtime environments, including `cpu`, `gpu`, `inf2`, etc., in the `InferenceServiceRuntime` `configs` dictionary.
- Start the server with the appropriate runtime environment via the `--runtime` flag (a Python example of selecting a runtime follows this list).
- The Ray cluster is now configured within the appropriate runtime environment and has access to the appropriate libraries and binaries.
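For example, selecting a runtime when bringing up the server from Python might look like the following. This is a minimal sketch that assumes the `nos.init(...)` client helper and that runtime names match the keys of `InferenceServiceRuntime.configs`; the exact entrypoint and accepted values may differ in your NOS version.

```python
import nos

# Start the inference server inside the GPU runtime container
# (assumption: runtime names mirror the keys of InferenceServiceRuntime.configs,
#  e.g. "cpu", "gpu", "inf2").
nos.init(runtime="gpu")
```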
For custom runtime support, we use Ray to configure different worker configurations (a custom conda environment, together with custom resource naming) so that workers can be scheduled onto the appropriate runtime environment (see below).
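As an illustration of this pattern, here is a bare Ray sketch; the conda environment name (`my-inf2-env`) and the custom resource label (`inf2`) are hypothetical examples, not names defined by NOS.

```python
import ray

ray.init()

# Run a worker inside a custom conda environment and pin it to nodes that
# advertise a custom "inf2" resource (the cluster must be started with that
# resource for scheduling to succeed).
@ray.remote(runtime_env={"conda": "my-inf2-env"}, resources={"inf2": 1})
def run_inference(prompt: str) -> str:
    # ... model execution inside the custom runtime ...
    return prompt.upper()

print(ray.get(run_inference.remote("hello")))
```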
## 🏃‍♂️ Supported Runtimes
The following runtimes are supported by NOS:
| Status | Name | PyTorch | Hardware | Base | Size | Description |
| --- | --- | --- | --- | --- | --- | --- |
| ✅ | `autonomi/nos:latest-cpu` | 2.1.1 | CPU | `debian:buster-slim` | 1.1 GB | CPU-only runtime. |
| ✅ | `autonomi/nos:latest-gpu` | 2.1.1 | NVIDIA GPU | `nvidia/cuda:11.8.0-base-ubuntu22.04` | 3.9 GB | GPU runtime. |
| ✅ | `autonomi/nos:latest-inf2` | 1.13.1 | AWS Inferentia2 | `debian:buster-slim` | 1.7 GB | Inf2 runtime with torch-neuronx. |
| Coming Soon | `trt` | 2.0.1 | NVIDIA GPU | `nvidia/cuda:11.8.0-base-ubuntu22.04` | – | GPU runtime with TensorRT (8.4.2.4). |
## 🛠️ Adding a custom runtime
To define a new custom runtime environment, extend the `InferenceServiceRuntime` class and add new configurations to the existing `configs` variable, as sketched below.
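A minimal sketch of registering an extra configuration follows; the image name, container name, and keyword arguments are hypothetical and simply mirror the fields of the built-in configs listed underneath (the exact `InferenceServiceRuntimeConfig` signature may differ across NOS versions).

```python
from nos.server._runtime import InferenceServiceRuntime, InferenceServiceRuntimeConfig

# Hypothetical custom GPU runtime built on top of your own Docker image.
InferenceServiceRuntime.configs["custom-gpu"] = InferenceServiceRuntimeConfig(
    image="myorg/nos:latest-custom-gpu",
    name="nos-inference-service-custom-gpu",
    device="gpu",
    kwargs={"nano_cpus": int(8e9), "mem_limit": "12g"},
)
```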
`nos.server._runtime.InferenceServiceRuntime.configs` *(class-attribute, instance-attribute)*

    configs = {
        'cpu': InferenceServiceRuntimeConfig(
            image=NOS_DOCKER_IMAGE_CPU,
            name=f'{NOS_INFERENCE_SERVICE_CONTAINER_NAME}-cpu',
            kwargs={
                'nano_cpus': int(6e9),
                'mem_limit': '6g',
                'log_config': {'type': JSON, 'config': {'max-size': '100m', 'max-file': '10'}},
            },
        ),
        'gpu': InferenceServiceRuntimeConfig(
            image=NOS_DOCKER_IMAGE_GPU,
            name=f'{NOS_INFERENCE_SERVICE_CONTAINER_NAME}-gpu',
            device='gpu',
            kwargs={
                'nano_cpus': int(8e9),
                'mem_limit': '12g',
                'log_config': {'type': JSON, 'config': {'max-size': '100m', 'max-file': '10'}},
            },
        ),
        'trt': InferenceServiceRuntimeConfig(
            image='autonomi/nos:latest-trt',
            name=f'{NOS_INFERENCE_SERVICE_CONTAINER_NAME}-trt',
            device='gpu',
            kwargs={
                'nano_cpus': int(8e9),
                'mem_limit': '12g',
                'log_config': {'type': JSON, 'config': {'max-size': '100m', 'max-file': '10'}},
            },
        ),
        'inf2': InferenceServiceRuntimeConfig(
            image='autonomi/nos:latest-inf2',
            name=f'{NOS_INFERENCE_SERVICE_CONTAINER_NAME}-inf2',
            device='inf2',
            environment=_default_environment({'NEURON_RT_VISIBLE_CORES': 2}),
            kwargs={
                'nano_cpus': int(8e9),
                'log_config': {'type': JSON, 'config': {'max-size': '100m', 'max-file': '10'}},
            },
        ),
    }
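The available runtime names are simply the keys of this dictionary, so a quick sanity check (using the module path shown above) looks like:

```python
from nos.server._runtime import InferenceServiceRuntime

# Prints the registered runtime names, e.g. ['cpu', 'gpu', 'trt', 'inf2'].
print(list(InferenceServiceRuntime.configs.keys()))
```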