nos serve
CLI¶
serve_cli¶
NOS gRPC/REST Serve CLI.
```
Usage: nos serve [OPTIONS] COMMAND [ARGS]...

Options:
  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or
                        customize the installation.
```
build¶
Build the NOS server locally.
```
Usage: nos serve build [OPTIONS]

Options:
  -c, --config TEXT  Serve configuration filename.
  --target TEXT      Serve a specific target.
  -t, --tag TEXT     Image tag f-string.  [default: autonomi/nos:{target}]
  -p, --prod         Run with production flags (slimmer images, no dev
                     dependencies).
```
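For example, to build a slimmer production image for a specific runtime target (here, the `custom-gpu` image defined under `images` in the Serve YAML Specification below; the filename and target name are illustrative):

```bash
# Build the production image for the `custom-gpu` runtime target
# declared in serve.yaml.
nos serve build -c serve.yaml --target custom-gpu -p
```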
down¶
Tear down the NOS server.
```
Usage: nos serve down [OPTIONS]

Options:
```
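Since `down` takes no additional options, tearing the server down is a single command:

```bash
nos serve down
```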
generate¶
Generate the NOS server Dockerfiles, without building them.
```
Usage: nos serve generate [OPTIONS]

Options:
  -c, --config TEXT  Serve configuration filename.
  --target TEXT      Serve a specific target.
  -t, --tag TEXT     Image tag f-string.  [default: autonomi/nos:{target}]
  -p, --prod         Run with production flags (slimmer images, no dev
                     dependencies).
```
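For instance, to write out the Dockerfile for the `custom-gpu` target so you can inspect it before building (filenames and target name are illustrative):

```bash
# Emit the generated Dockerfile for the `custom-gpu` target
# without building the image.
nos serve generate -c serve.yaml --target custom-gpu
```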
up¶
Spin up the NOS server locally.
```
Usage: nos serve up [OPTIONS]

Options:
  -c, --config TEXT        Serve configuration filename.
  -r, --runtime TEXT       Runtime environment to use.
  --target TEXT            Serve a specific target.
  -t, --tag TEXT           Image tag f-string.  [default: autonomi/nos:{target}]
  --http                   Serve with HTTP gateway.
  --http-host TEXT         HTTP host to use.  [default: 0.0.0.0]
  --http-port INTEGER      HTTP port to use.  [default: 8000]
  --http-workers INTEGER   HTTP max workers.  [default: 1]
  --logging-level TEXT     Logging level.  [default: INFO]
  --home-directory TEXT    Override the NOS_HOME variable with a custom
                           location.  [default: ~/.nosd]
  -d, --daemon             Run in daemon mode.
  --reload                 Reload on file changes.
  --build                  Only build the custom image, without serving it.
  -p, --prod               Run with production flags (slimmer images, no dev
                           dependencies).
  --env-file TEXT          Provide an environment file for secrets.
  --debug                  Debug intermediate outputs (Dockerfile,
                           docker-compose.yml).
  -v, --verbose            Verbose output.
```
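As an illustration, the following starts the server from a serve configuration with the HTTP gateway enabled, detached as a daemon (all flags are documented above; the filename and port are just examples):

```bash
# Start the gRPC server plus the REST/HTTP gateway on port 8000,
# running in the background.
nos serve up -c serve.yaml --http --http-port 8000 -d
```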
Serve YAML Specification¶
The serve CLI uses a YAML specification file to fully configure the runtime and the models that need to be served. The full specification is available here.
```yaml
# `images` define the runtime environment for your custom model.
# You can specify a base image, system packages, pip dependencies,
# environment variables, and even execute bespoke scripts to define
# your custom runtime environment. We use `agi-pack` to build the
# runtime environment as a docker container.
images:
  # `custom-gpu` is the name of the runtime environment we build here.
  # The custom docker image encapsulates all the necessary runtime
  # dependencies we need to run our custom model.
  custom-gpu:
    # `base` is the base image for the runtime environment
    # that we build our custom runtime environment on top of.
    base: autonomi/nos:latest-gpu
    # `system` defines the system packages for the runtime environment.
    system:
      - git
    # `pip` defines the pip dependencies for the runtime environment.
    pip:
      - accelerate>=0.23.0
      - transformers>=4.35.1
    # `env` defines the environment variables for the runtime environment.
    env:
      NOS_LOGGING_LEVEL: INFO
    # `run` defines the scripts to run to build the runtime environment.
    run:
      - python -c 'import torch; print(torch.__version__)'

# `models` define the custom model class, model path, and deployment
# configuration for your custom model. You can specify the runtime
# environment for your model, and list out all the models and their
# CPU / GPU resource requirements.
models:
  # `custom/custom-model-a` is the name of the custom model we register
  # with the NOS server. Each model is uniquely identified by its name, and
  # can have its own runtime environment defined and scaled independently.
  custom/custom-model-a:
    # `runtime_env` defines the runtime environment for this custom model.
    runtime_env: custom-gpu
    # `model_cls` defines the custom model class for this custom model
    # that we have defined under the `model_path` location.
    model_cls: CustomModel
    model_path: models/model.py
    # `default_method` defines the default method to call on the custom model
    # if no `method` signature is specified in the request.
    default_method: __call__
    # `init_kwargs` defines the keyword arguments to pass to the custom model
    # class constructor. This is useful for instantiating custom models with
    # different configurations.
    init_kwargs:
      model_name: custom/custom-model-a
    # `deployment` defines the deployment specification for this custom model.
    # Each model can be individually profiled, optimized and scaled, allowing
    # NOS to fully utilize the underlying hardware resources.
    deployment:
      resources:
        device: auto
        device_memory: 4Gi
      num_replicas: 2

  # `custom/custom-model-b` is the name of the second custom model we register
  # with the NOS server. Multiple models with unique names can be registered,
  # with each model having its own runtime environment (if needed). In this
  # case, we use the same runtime environment for both models.
  custom/custom-model-b:
    runtime_env: custom-gpu
    model_cls: CustomModel
    model_path: models/model.py
    default_method: __call__
    init_kwargs:
      model_name: custom/custom-model-b
    # For deployment here, we explicitly specify the cpu and memory resources
    # for each replica of the model.
    deployment:
      resources:
        cpu: 2
        memory: 2Gi
        device: cpu
      num_replicas: 2
```
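Assuming the specification above is saved as serve.yaml, a typical end-to-end workflow using the commands documented earlier looks like:

```bash
nos serve generate -c serve.yaml  # generate the Dockerfiles, without building
nos serve build -c serve.yaml     # build the custom runtime image(s)
nos serve up -c serve.yaml        # spin up the server with the registered models
```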
Check out our custom model serving tutorial here to learn more about how to use the serve.yaml file to serve custom models with NOS.
Debugging the server¶
When using the serve CLI, you can enable verbose logging by setting the --logging-level argument to DEBUG, for example:
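```bash
# The configuration filename here is illustrative.
nos serve up -c serve.yaml --logging-level DEBUG
```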
This can be especially useful when debugging issues with your server, particularly around model registry, resource allocation, and model logic.