## 📦 NOS Tutorials
The following tutorials give a brief overview of how to use NOS to serve models.
- `01-serving-custom-models`: Serve a custom GPU model with NOS.
- `02-serving-multiple-methods`: Expose several custom methods of a model for serving purposes.
- `03-llm-streaming-chat`: Serve an LLM with streaming support (`TinyLlama/TinyLlama-1.1B-Chat-v0.1`).
- `04-serving-multiple-models`: Serve multiple models such as `TinyLlama/TinyLlama-1.1B-Chat-v0.1` and `distil-whisper/distil-small.en` on the same GPU with custom model resources, enabling multi-modal applications like audio transcription + summarization on the same device.
- `05-serving-with-docker`: Use NOS in a production environment with Docker and Docker Compose.
## 🏃‍♂️ Running the examples
For each of the examples, you can run the following command to serve the model (in one of your terminals):
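As a sketch, assuming each tutorial directory ships a `serve.yaml` serve specification (the filename here is an assumption; check the tutorial directory for the actual spec file):

```bash
# Start the NOS server with the tutorial's serve specification
nos serve up -c serve.yaml
```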
You can then run the tests in the `tests` directory to check if the model is served correctly:
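A minimal sketch, assuming the tutorial provides pytest-based client tests (the exact test paths vary per tutorial):

```bash
# Run the client-side tests against the running server
pytest -sv tests/
```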
For HTTP tests, you'll need to add the `--http` flag to the `nos serve` command:
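A sketch of the same serve command with the HTTP gateway enabled (again assuming a `serve.yaml` spec file):

```bash
# Re-start the server with the HTTP server enabled alongside gRPC
nos serve up -c serve.yaml --http
```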