Introducing NOS Blog!

At Autonomi AI, we build infrastructure tools to make AI fast, easy, and affordable. We’re in the early years of building the “Linux OS for AI,” where the commoditization of open-source models and tools will be critical to the safe and ubiquitous use of AI. Needless to say, we believe it’s one of the most exciting and ambitious infrastructure projects our generation will witness in the coming decade.

A few weeks back, we open-sourced NOS - a fast and flexible inference server for PyTorch that can run a whole host of open-source AI models (LLMs, Stable Diffusion, CLIP, Whisper, object detection, etc.) under one roof. Today, we’re excited to finally launch the NOS blog.

🎯 Why are we building yet another AI inference server?

Most inference API implementations today deeply couple the API framework (FastAPI, Flask) with the modeling backend (PyTorch, TensorFlow, etc.). In other words, they don’t let you separate the concerns of the AI backend (AI hardware, drivers, model compilation, execution runtime, scale-out, memory efficiency, async/batched execution, multi-model management, etc.) from those of your AI application (auth, observability, telemetry, web integrations, etc.) - a separation that matters most when you’re building a production-ready application.

We’ve made it very easy for developers to host new PyTorch models as APIs and take them to production without having to worry about any of the backend infrastructure concerns. We build on some awesome projects like FastAPI, Ray, and Hugging Face’s transformers and diffusers.
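To give a feel for the developer experience, here’s a minimal sketch of calling a model through the NOS client. The exact method names and default server address shown here are illustrative assumptions; see the NOS docs for the current client API.

```python
# Minimal sketch of the NOS client workflow (illustrative; exact method
# names and the default server address may differ -- see the NOS docs).
from nos.client import Client

# Connect to a locally running NOS server over gRPC and wait until ready.
client = Client("[::]:50051")
client.WaitForServer()

# Load a model by its Hugging Face model id and run a batched request.
clip = client.Module("openai/clip-vit-base-patch32")
embeddings = clip(texts=["a fox jumping over the moon",
                         "a cat sleeping on a couch"])
print(embeddings)
```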

We’ve been big believers in multi-modal AI from the very beginning, and you can do all of it with NOS today. Give us a 🌟 on GitHub if you're stoked -- NOS can run locally on your Linux desktop (with a gaming GPU), on any cloud GPU (NVIDIA L4, A100, etc.), and even on CPUs. Very soon, we'll support running models on Apple Silicon and on custom AI accelerators such as Inferentia2 from Amazon Web Services (AWS).

What's coming?

Over the coming weeks, we’ll be announcing some awesome features that we believe will make the power of large foundation models more accessible, affordable, and easier to use than ever before.

🥜 NOS, in a nutshell

NOS was built from the ground up with developers in mind. Here are a few things we think developers care about:

  • 🥷 Flexible: Support for open-source models with custom runtimes, including pip, conda, and CUDA/driver dependencies.
  • 🔌 Pluggable: A simple interface over a high-performance gRPC or REST API, with support for batched requests and streaming (see the sketch after this list).
  • 🚀 Scalable: Serve multiple custom models simultaneously on a single- or multi-GPU instance, without having to worry about memory management or model scaling.
  • 🏛️ Local: Local execution means that you control your data, and you’re free to deploy NOS in domains with stricter data-privacy requirements.
  • ☁️ Cloud-agnostic: NOS is fully containerized, so you can develop, test, and deploy it locally, on-prem, or on any cloud or AI CSP.
  • 📦 Extensible: Written entirely in Python, so it’s easily hackable and extensible, with an Apache-2.0 license for commercial use.
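For the REST side, here’s a hedged sketch of what a batched request could look like over HTTP. The endpoint path and payload shape below are hypothetical placeholders for illustration, not the documented NOS REST schema.

```python
# Hypothetical REST sketch -- the endpoint path and JSON schema are
# illustrative placeholders, not the documented NOS API.
import requests

payload = {
    "model": "openai/clip-vit-base-patch32",
    # Batched inputs: the server handles batching and memory management.
    "inputs": {"texts": ["a red fox", "a grey wolf"]},
}
resp = requests.post("http://localhost:8000/v1/infer", json=payload, timeout=60)
resp.raise_for_status()
print(resp.json())
```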

Go ahead and check out our playground, and try out some of the more recent models with NOS.