nos serve CLI

NOS gRPC/REST Serve CLI.

Usage:

serve [OPTIONS] COMMAND [ARGS]...

Options:

  --install-completion  Install completion for the current shell.
  --show-completion     Show completion for the current shell, to copy it or
                        customize the installation.
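
For example, to enable tab completion for the serve CLI in your current shell:

$ nos serve --install-completion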

build

Build the NOS server locally.

Usage:

serve build [OPTIONS]

Options:

  -c, --config TEXT  Serve configuration filename.
  --target TEXT      Serve a specific target.
  -t, --tag TEXT     Image tag f-string.  [default: autonomi/nos:{target}]
  -p, --prod         Run with production flags (slimmer images, no dev
                     dependencies).
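
For example, to build a production image from a local specification file (assuming your configuration lives in a serve.yaml in the current directory):

$ nos serve build -c serve.yaml --prod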

down

Tear down the NOS server.

Usage:

serve down [OPTIONS]

Options:

  -v, --verbose  Verbose output.
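
For example, to tear down a running server with verbose output:

$ nos serve down -v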

generate

Generate the NOS server Dockerfiles without building them.

Usage:

serve generate [OPTIONS]

Options:

  -c, --config TEXT  Serve configuration filename.
  --target TEXT      Serve a specific target.
  -t, --tag TEXT     Image tag f-string.  [default: autonomi/nos:{target}]
  -p, --prod         Run with production flags (slimmer images, no dev
                     dependencies).
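
For example, to generate the Dockerfile for a single runtime image without building it (here we assume custom-gpu is one of the images defined in your serve.yaml, as in the specification below):

$ nos serve generate -c serve.yaml --target custom-gpu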

up

Spin up the NOS server locally.

Usage:

serve up [OPTIONS]

Options:

  -c, --config TEXT       Serve configuration filename.
  -r, --runtime TEXT      Runtime environment to use.
  --target TEXT           Serve a specific target.
  -t, --tag TEXT          Image tag f-string.  [default:
                          autonomi/nos:{target}]
  --http                  Serve with HTTP gateway.
  --http-host TEXT        HTTP host to use.  [default: 0.0.0.0]
  --http-port INTEGER     HTTP port to use.  [default: 8000]
  --http-workers INTEGER  HTTP max workers.  [default: 1]
  --logging-level TEXT    Logging level.  [default: INFO]
  --home-directory TEXT   Override the NOS_HOME variable with a custom
                          location.  [default: ~/.nosd]
  -d, --daemon            Run in daemon mode.
  --reload                Reload on file changes.
  --build                 Only build the custom image, without serving it.
  -p, --prod              Run with production flags (slimmer images, no dev
                          dependencies).
  --env-file TEXT         Provide an environment file for secrets.
  --debug                 Debug intermediate outputs (Dockerfile, docker-
                          compose.yml).
  -v, --verbose           Verbose output.
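
For example, to spin up the server in daemon mode with the HTTP gateway enabled on the default port, assuming a serve.yaml in the current directory:

$ nos serve up -c serve.yaml --http -d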

Serve YAML Specification

The serve CLI uses a YAML specification file to fully configure the runtime and the models that need to be served. The full specification is available here.

serve.yaml
# `images` define the runtime environment for your custom model
# You can specify a base image, system packages, pip dependencies, 
# environment variables, and even execute bespoke scripts to define 
# your custom runtime environment. We use `agi-pack` to build the
# runtime environment as a docker container.
images:
  # `custom-gpu` is the name of the runtime environment we build here
  # The custom docker image encapsulates all the necessary runtime 
  # dependencies we need to run our custom model.
  custom-gpu:
    # `base` is the base image for the runtime environment
    # that we build our custom runtime environment on top of.
    base: autonomi/nos:latest-gpu
    # `system` defines the system packages for the runtime environment
    system:
      - git
    # `pip` defines the pip dependencies for the runtime environment
    pip:
      - accelerate>=0.23.0
      - transformers>=4.35.1
    # `env` defines the environment variables for the runtime environment
    env:
      NOS_LOGGING_LEVEL: INFO
    # `run` defines the scripts to run to build the runtime environment
    run:
      - python -c 'import torch; print(torch.__version__)'

# `models` define the custom model class, model path, and deployment
# configuration for your custom model. You can specify the runtime
# environment for your model, and list out all the models and their
# CPU / GPU resource requirements.
models:
  # `custom/custom-model-a` is the name of the custom model we register
  # with the NOS server. Each model is uniquely identified by its name, and
  # can have its own runtime environment defined and scaled independently.
  custom/custom-model-a:
    # `runtime_env` defines the runtime environment for this custom model
    runtime_env: custom-gpu
    # `model_cls` defines the custom model class for this custom model
    # that we have defined under the `model_path` location. 
    model_cls: CustomModel
    model_path: models/model.py
    # `default_method` defines the default method to call on the custom model
    # if no `method` signature is specified in the request.
    default_method: __call__
    # `init_kwargs` defines the keyword arguments to pass to the custom model
    # class constructor. This is useful for instantiating custom models with
    # different configurations.
    init_kwargs:
      model_name: custom/custom-model-a
    # `deployment` defines the deployment specification for this custom model.
    # Each model can be individually profiled, optimized and scaled, allowing 
    # NOS to fully utilize the underlying hardware resources.
    deployment:
      resources:
        device: auto
        device_memory: 4Gi
      num_replicas: 2

  # `custom/custom-model-b` is the name of the second custom model we register
  # with the NOS server. Multiple models with unique names can be registered, 
  # with each model having its own runtime environment (if needed). In this case,
  # we use the same runtime environment for both models.
  custom/custom-model-b:
    runtime_env: custom-gpu
    model_cls: CustomModel
    model_path: models/model.py
    default_method: __call__
    init_kwargs:
      model_name: custom/custom-model-b
    # For this deployment, we explicitly specify the cpu and memory resources
    # that each replica of the model requires.
    deployment:
      resources:
        cpu: 2
        memory: 2Gi
        device: cpu
      num_replicas: 2
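
For reference, below is a minimal sketch of what the models/model.py referenced above might contain. This is a hypothetical example rather than the tutorial's implementation: we assume CustomModel is a plain Python class whose constructor accepts the model_name keyword supplied via init_kwargs, and whose __call__ method serves as the default_method invoked on each request.

models/model.py
# Hypothetical sketch of the custom model class referenced by
# `model_cls: CustomModel` and `model_path: models/model.py` above.
from transformers import pipeline  # provided by the `transformers` pip dependency


class CustomModel:
    def __init__(self, model_name: str = "custom/custom-model-a"):
        # `model_name` is populated from `init_kwargs` in serve.yaml.
        self.model_name = model_name
        # Hypothetical: load the underlying model weights here.
        self.pipe = pipeline("text-classification")

    def __call__(self, texts):
        # Invoked by default on each request, per `default_method: __call__`.
        return self.pipe(texts)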

Check out our custom model serving tutorial here to learn more about how to use the serve.yaml file to serve custom models with NOS.

Debugging the server

When using the serve CLI, you can enable verbose logging by setting the --logging-level argument to DEBUG:

$ nos serve up -c <serve.yaml> --logging-level DEBUG

This can be especially useful when debugging issues with your server, particularly around model registry, resource allocation, and model logic.
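
If the problem appears to be in the generated runtime itself, the --debug flag of serve up additionally writes out the intermediate Dockerfile and docker-compose.yml for inspection:

$ nos serve up -c <serve.yaml> --debug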