Model specification
In NOS, the ModelSpec class is a serializable specification of a model that captures all the information relevant to its instantiation, execution, and runtime profile.
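To make "serializable specification" concrete, here is a minimal sketch of what such an object could look like. It is not the NOS `ModelSpec` API: the class name `SketchModelSpec` and all of its fields (`model_id`, `module`, `class_name`, `init_kwargs`, `method`, `runtime_profile`) are hypothetical and chosen only to illustrate the three concerns named above.

```python
import json
from dataclasses import asdict, dataclass, field


@dataclass
class SketchModelSpec:
    """Hypothetical stand-in for illustration; NOT the actual NOS ModelSpec."""

    model_id: str                                         # unique name of the model
    module: str                                           # instantiation: import path of the model class
    class_name: str                                       # instantiation: class to construct
    init_kwargs: dict = field(default_factory=dict)       # instantiation: constructor arguments
    method: str = "__call__"                              # execution: method invoked for inference
    runtime_profile: dict = field(default_factory=dict)   # profile: measured resource usage

    def to_json(self) -> str:
        # Only plain JSON types are stored, so the spec can be shipped between processes.
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "SketchModelSpec":
        return cls(**json.loads(payload))


spec = SketchModelSpec(
    model_id="example/classifier",
    module="my_models.classifier",   # assumed module path, for illustration only
    class_name="Classifier",
    init_kwargs={"num_classes": 10},
    runtime_profile={"memory_mb": 512, "latency_ms": 12.5},
)

# Round-trip through JSON and confirm nothing was lost.
restored = SketchModelSpec.from_json(spec.to_json())
assert restored == spec
```

Keeping the spec as plain data (strings, numbers, dictionaries) rather than live objects is what allows it to be stored in a registry and used to reconstruct the model elsewhere.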
Model Specification
- Deterministic: We benchmark models at registration time, so execution runtimes and device resource usage are known before deployment. More specifically, the model specification lets us measure memory consumption and FLOPs ahead of time, which enables more efficient device-memory usage in production.
- Scalable: Registered models can be independently scaled up for batch inference or parallel execution with Ray actors.
- Optimizable: Every registered model can be inspected, compiled, and optimized with its own configurable runtime engine (TensorRT, ONNX, AITemplate, etc.). This allows us to benchmark models before they enter production and to run them at the optimal (or a configured) operating point.
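The idea of benchmarking at registration time can be sketched as follows. This is an illustration under stated assumptions, not how NOS implements its registry: `profile_callable`, `register`, `REGISTRY`, and the `runtime` label are hypothetical names, and `tracemalloc` only tracks host-side Python allocations, so it stands in for the device-memory measurement a real system would need.

```python
import time
import tracemalloc
from statistics import median
from typing import Any, Callable, Dict

# Hypothetical in-memory registry: model id -> runtime label and measured profile.
REGISTRY: Dict[str, Dict[str, Any]] = {}


def profile_callable(fn: Callable[[Any], Any], example_input: Any, repeats: int = 5) -> Dict[str, float]:
    """Measure median wall-clock latency and peak Python heap usage of one call."""
    fn(example_input)  # warm-up call, excluded from timing

    latencies = []
    for _ in range(repeats):
        start = time.perf_counter()
        fn(example_input)
        latencies.append(time.perf_counter() - start)

    # Host-side allocations only; a GPU model would need a device-specific probe.
    tracemalloc.start()
    fn(example_input)
    _, peak_bytes = tracemalloc.get_traced_memory()
    tracemalloc.stop()

    return {"latency_s": median(latencies), "peak_bytes": peak_bytes}


def register(model_id: str, fn: Callable[[Any], Any], example_input: Any, runtime: str = "python") -> Dict[str, Any]:
    """Profile the model once, at registration, and store the result with its runtime label."""
    entry = {"runtime": runtime, "profile": profile_callable(fn, example_input)}
    REGISTRY[model_id] = entry
    return entry


def toy_model(xs):
    # Toy stand-in for an inference call.
    return [x * x for x in xs]


entry = register("toy/square", toy_model, list(range(1000)))
print(entry["runtime"], sorted(entry["profile"]))
```

Because the profile is recorded once and stored alongside the model, a scheduler can consult it later (for example, to decide how many replicas fit on a device) without re-running the benchmark.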