
Starting the server

The NOS gRPC server can be started in three ways:

  • Via the NOS SDK using nos.init(...) (preferred for development)
  • Via the nos serve CLI
  • Via Docker Compose (recommended for production deployments)

You can also start the server with the REST API proxy enabled, as shown in the nos serve up --http and docker-compose.gpu-with-gateway.yml examples below.

You can start the nos server programmatically via the NOS SDK:

import nos

nos.init(runtime="auto")
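
nos.init returns a handle to the underlying Docker container, and any of the parameters documented in the API reference below can be passed explicitly. A minimal sketch (the runtime, port, and logging values here are illustrative, not required):

import nos

# Start the server on the GPU runtime with the default gRPC port and verbose logging.
container = nos.init(runtime="gpu", port=50051, logging_level="DEBUG")
print(container.name, container.id[:12])

# ... run your workloads ...

# Stop the server when you are done.
nos.shutdown()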

You can start the nos server via the NOS serve CLI:

nos serve up

Optionally, to use the REST API, you can start an HTTP gateway proxy alongside the gRPC server:

nos serve up --http

Navigate to examples/docker to see an example of the YAML specification. You can start the server with the following command:

docker-compose -f docker-compose.gpu.yml up

docker-compose.gpu.yml:
services:
  server-gpu:
    image: autonomi/nos:latest-gpu
    environment:
      - NOS_HOME=/app/.nos
      - NOS_LOGGING_LEVEL=INFO
    volumes:
      - ~/.nosd:/app/.nos
      - /dev/shm:/dev/shm
    ports:
      - 50051:50051
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
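
Once the container is up, the gRPC service listens on port 50051 (as mapped in the compose file above). A quick way to confirm the server is reachable, using only the Python standard library (a sketch, assuming the default port mapping):

import socket

# Try to open a TCP connection to the gRPC port exposed by the compose file.
with socket.create_connection(("localhost", 50051), timeout=5):
    print("NOS gRPC server is accepting connections on port 50051")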

The same examples/docker directory contains a gateway-enabled variant that runs the HTTP proxy alongside the gRPC server. You can start it with the following command:

docker-compose -f docker-compose.gpu-with-gateway.yml up

docker-compose.gpu-with-gateway.yml:
services:
  server:
    image: autonomi/nos:latest-gpu
    command: /app/entrypoint.sh --http
    environment:
      - NOS_HOME=/app/.nos
      - NOS_LOGGING_LEVEL=INFO
    volumes:
      - ~/.nosd:/app/.nos
      - /dev/shm:/dev/shm
    ports:
      - 8000:8000
    ipc: host
    deploy:
      resources:
        reservations:
          devices:
            - capabilities: [gpu]
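
With the gateway-enabled compose file, the HTTP proxy is exposed on port 8000 (as mapped above). A small sketch that checks the gateway is responding; the exact REST endpoint paths are not shown here, so any HTTP response (including an error status) is treated as a sign the proxy is up:

import urllib.error
import urllib.request

# Any HTTP response from port 8000 (even an error status) means the gateway proxy is running.
try:
    with urllib.request.urlopen("http://localhost:8000/", timeout=5) as resp:
        print(f"Gateway responded with HTTP {resp.status}")
except urllib.error.HTTPError as exc:
    print(f"Gateway responded with HTTP {exc.code}")
except urllib.error.URLError as exc:
    print(f"Gateway unreachable: {exc.reason}")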

API Reference


nos.init

init(runtime: str = 'auto', port: int = DEFAULT_GRPC_PORT, utilization: float = 1.0, pull: bool = True, logging_level: Union[int, str] = logging.INFO, tag: Optional[str] = None) -> Container

Initialize the NOS inference server (as a docker daemon).

The method first checks whether your system requirements are met, then pulls the NOS docker image from Docker Hub (if necessary) and starts the inference server (as a docker daemon). You can also specify the runtime to use (i.e. "cpu", "gpu") and the port to use for the inference server.

Parameters:

  • runtime (str, default: 'auto' ) –

    The runtime to use (i.e. "auto", "local", "cpu", "gpu"). Defaults to "auto". In "auto" mode, the runtime will be automatically detected.

  • port (int, default: DEFAULT_GRPC_PORT ) –

    The port to use for the inference server. Defaults to DEFAULT_GRPC_PORT.

  • utilization (float, default: 1.0 ) –

    The target CPU/memory utilization of the inference server. Defaults to 1.0.

  • pull (bool, default: True ) –

    Pull the docker image before starting the inference server. Defaults to True.

  • logging_level (Union[int, str], default: INFO ) –

    The logging level to use. Defaults to logging.INFO. Optionally, a string can be passed (i.e. "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL").

  • tag (Optional[str], default: None ) –

    The tag of the docker image to use (e.g. "latest"). Defaults to None, in which case the tag matching the installed NOS version is used (custom tags are not yet supported).
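
The arguments are validated before any Docker work is done; for example, utilization must fall in (0.25, 1] (see the source below). A small sketch of the failure mode:

import nos

try:
    # 0.1 is outside the accepted (0.25, 1] range, so init() raises before touching Docker.
    nos.init(runtime="cpu", utilization=0.1)
except ValueError as exc:
    print(exc)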

Source code in nos/server/__init__.py
def init(
    runtime: str = "auto",
    port: int = DEFAULT_GRPC_PORT,
    utilization: float = 1.0,
    pull: bool = True,
    logging_level: Union[int, str] = logging.INFO,
    tag: Optional[str] = None,
) -> docker.models.containers.Container:
    """Initialize the NOS inference server (as a docker daemon).

    The method first checks to see if your system requirements are met, before pulling the NOS docker image from Docker Hub
    (if necessary) and starting the inference server (as a docker daemon). You can also specify the runtime to use (i.e. "cpu", "gpu"),
    and the port to use for the inference server.


    Args:
        runtime (str, optional): The runtime to use (i.e. "auto", "local", "cpu", "gpu"). Defaults to "auto".
            In "auto" mode, the runtime will be automatically detected.
        port (int, optional): The port to use for the inference server. Defaults to DEFAULT_GRPC_PORT.
        utilization (float, optional): The target cpu/memory utilization of inference server. Defaults to 1.
        pull (bool, optional): Pull the docker image before starting the inference server. Defaults to True.
        logging_level (Union[int, str], optional): The logging level to use. Defaults to logging.INFO.
            Optionally, a string can be passed (i.e. "DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL").
        tag (str, optional): The tag of the docker image to use ("latest"). Defaults to None, where the
            appropriate version is used.
    """
    # Check arguments
    available_runtimes = list(InferenceServiceRuntime.configs.keys()) + ["auto", "local"]
    if runtime not in available_runtimes:
        raise ValueError(f"Invalid inference service runtime: {runtime}, available: {available_runtimes}")

    # If runtime is "local", return early with ray executor
    if runtime == "local":
        from nos.executors.ray import RayExecutor

        executor = RayExecutor.get()
        executor.init()
        return

    # Check arguments
    if utilization <= 0.25 or utilization > 1:
        raise ValueError(f"Invalid utilization: {utilization}, must be in (0.25, 1].")

    if not isinstance(logging_level, (int, str)):
        raise ValueError(f"Invalid logging level: {logging_level}, must be an integer or string.")
    if isinstance(logging_level, int):
        logging_level = logging.getLevelName(logging_level)
    if logging_level not in ("DEBUG", "INFO", "WARNING", "ERROR", "CRITICAL"):
        raise ValueError(f"Invalid logging level: {logging_level}")

    if tag is None:
        tag = __version__
    else:
        if not isinstance(tag, str):
            raise ValueError(f"Invalid tag: {tag}, must be a string.")
        raise NotImplementedError("Custom tags are not yet supported.")

    # Determine runtime from system
    if runtime == "auto":
        runtime = InferenceServiceRuntime.detect()
        logger.debug(f"Auto-detected system runtime: {runtime}")
    else:
        if runtime not in InferenceServiceRuntime.configs:
            raise ValueError(
                f"Invalid inference service runtime: {runtime}, available: {list(InferenceServiceRuntime.configs.keys())}"
            )

    # Check if the latest inference server is already running
    # If the running container's tag is inconsistent with the current version,
    # we will shutdown the running container and start a new one.
    containers = InferenceServiceRuntime.list()
    if len(containers) == 1:
        logger.debug("Found an existing inference server running, checking if it is the latest version.")
        if InferenceServiceRuntime.configs[runtime].image not in containers[0].image.tags:
            logger.info(
                "Active inference server is not the latest version, shutting down before starting the latest one."
            )
            _stop_container(containers[0])
        else:
            (container,) = containers
            logger.info(
                f"Inference server already running (name={container.name}, image={container.image}, id={container.id[:12]})."
            )
            return container
    elif len(containers) > 1:
        logger.warning("""Multiple inference servers running, please report this issue to the NOS maintainers.""")
        for container in containers:
            _stop_container(container)
    else:
        logger.debug("No existing inference server found, starting a new one.")

    # Check system requirements
    # Note: we do this after checking if the latest
    # inference server is already running for convenience.
    _check_system_requirements(runtime)

    # Pull docker image (if necessary)
    if pull:
        _pull_image(InferenceServiceRuntime.configs[runtime].image)

    # Start inference server
    runtime = InferenceServiceRuntime(runtime=runtime)
    logger.info(f"Starting inference service: [name={runtime.cfg.name}, runtime={runtime}]")

    # Determine number of cpus, system memory before starting container
    # Note (spillai): MacOSX compatibility issue where docker does not have access to
    # the correct number of physical cores and memory.
    cl = DockerRuntime.get()._client
    num_cpus = cl.info().get("NCPU", psutil.cpu_count(logical=False))
    num_cpus = max(_MIN_NUM_CPUS, utilization * num_cpus)
    mem_limit = (
        min(cl.info().get("MemTotal", psutil.virtual_memory().total), psutil.virtual_memory().available) / 1024**3
    )
    mem_limit = max(_MIN_MEM_GB, utilization * math.floor(mem_limit))
    logger.debug(f"Starting inference container: [num_cpus={num_cpus}, mem_limit={mem_limit}g]")

    # Start container
    # TOFIX (spillai): If macosx, shared memory is not supported
    shm_enabled = NOS_SHM_ENABLED if platform.system() == "Linux" else False
    container = runtime.start(
        nano_cpus=int(num_cpus * 1e9),
        mem_limit=f"{mem_limit}g",
        shm_size=f"{_MIN_SHMEM_GB}g",
        ports={f"{DEFAULT_GRPC_PORT}/tcp": port},
        environment={
            "NOS_LOGGING_LEVEL": logging_level,
            "NOS_SHM_ENABLED": int(shm_enabled),
        },
    )
    logger.info(
        f"Inference service started: [name={runtime.cfg.name}, runtime={runtime}, image={container.image}, id={container.id[:12]}]"
    )
    return container

nos.shutdown

shutdown() -> Optional[Union[Container, List[Container]]]

Shutdown the inference server.
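
shutdown() stops any running inference server container(s) and returns the stopped container, a list of containers, or None if nothing was running. A minimal sketch:

import nos

stopped = nos.shutdown()
if stopped is None:
    print("No inference server was running.")
else:
    print(f"Stopped: {stopped}")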

Source code in nos/server/__init__.py
def shutdown() -> Optional[Union[docker.models.containers.Container, List[docker.models.containers.Container]]]:
    """Shutdown the inference server."""
    # Check if inference server is already running
    containers = InferenceServiceRuntime.list()
    if len(containers) == 1:
        (container,) = containers
        _stop_container(container)
        return container
    if len(containers) > 1:
        logger.warning("""Multiple inference servers running, please report this issue to the NOS maintainers.""")
        for container in containers:
            _stop_container(container)
        return containers
    logger.info("No active inference servers found, ignoring shutdown.")
    return None