# 📦 NOS Tutorials
The following tutorials give a brief overview of how to use NOS to serve models.
- `01-serving-custom-models`: Serve a custom GPU model with NOS.
- `02-serving-multiple-methods`: Expose several custom methods of a model for serving.
- `03-llm-streaming-chat`: Serve an LLM with streaming support (`TinyLlama/TinyLlama-1.1B-Chat-v0.1`).
- `04-serving-multiple-models`: Serve multiple models such as `TinyLlama/TinyLlama-1.1B-Chat-v0.1` and `distil-whisper/distil-small.en` on the same GPU with custom model resources, enabling multi-modal applications like audio transcription + summarization on the same device.
- `05-serving-with-docker`: Use NOS in a production environment with Docker and Docker Compose.
## 🏃‍♂️ Running the examples
For each of the examples, you can run the following command to serve the model (in one of your terminals):
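A minimal sketch, assuming the `serve up` subcommand and a per-tutorial `serve.yaml` specification as in the NOS quickstart (adjust to the spec file each tutorial ships):

```bash
# Start the NOS server with the tutorial's serve specification
nos serve up -c serve.yaml
```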
You can then run the tests in the `tests` directory to verify that the model is served correctly:
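For example, assuming the tests are written with pytest (a hypothetical invocation; substitute whichever runner the tutorial actually uses):

```bash
# Run the tutorial's tests against the running server
pytest -sv tests
```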
For HTTP tests, you'll need to add the `--http` flag to the `nos serve` command:
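With the same assumed `serve.yaml`, that looks like:

```bash
# Start the server with the HTTP gateway enabled alongside gRPC
nos serve up -c serve.yaml --http
```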