🧠 Models

This README lists the models supported by NOS, with links to their Hugging Face or Torch Hub sources and the devices each model supports (CPU or GPU). See our models page for the most up-to-date list.

Modality | Task | Model Name | Supported Devices
🏞️ | Object Detection | YOLOX | CPU, GPU
img = Image.open("test.png")

yolox = client.Module("yolox/nano")
predictions = yolox(images=img)
# {"bboxes": ..., "scores": ..., "labels": ...}
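The prediction dictionary shown above can be post-processed with plain Python. The following is a minimal sketch, assuming `bboxes`, `scores`, and `labels` are parallel per-image sequences; the actual NOS output may be batched or returned as NumPy arrays. The helper `filter_detections` and the sample values are hypothetical and not part of NOS.

```python
from typing import Any, Dict, List


def filter_detections(
    predictions: Dict[str, List[Any]], min_score: float = 0.5
) -> Dict[str, List[Any]]:
    """Keep only detections whose score is at least `min_score`."""
    keep = [i for i, s in enumerate(predictions["scores"]) if s >= min_score]
    return {
        key: [predictions[key][i] for i in keep]
        for key in ("bboxes", "scores", "labels")
    }


# Stand-in data shaped like the dictionary in the comment above
# (illustrative values, not real model output).
sample = {
    "bboxes": [[10, 20, 110, 220], [5, 5, 50, 50], [200, 40, 260, 120]],
    "scores": [0.91, 0.32, 0.67],
    "labels": [0, 16, 2],
}
confident = filter_detections(sample, min_score=0.5)
print(confident["labels"])
```

If the real output is batched, the same filter could be applied to each image's entry in turn.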
🏞️ | Depth Estimation | MiDaS | CPU, GPU
img = Image.open("test.png")

model = client.Module("isl-org/MiDaS")
result = model(images=img)
# {"depths": np.ndarray}
📝, 🏞️ | Text-Image Embedding | OpenAI - CLIP | CPU, GPU
img = Image.open("test.png")

clip = client.Module("openai/clip-vit-base-patch32")
img_vec = clip.encode_image(images=img)
txt_vec = clip.encode_text(text=["fox jumped over the moon"])
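Image and text embeddings from the same CLIP model are typically compared with cosine similarity. The sketch below shows that comparison in pure Python, assuming each embedding can be treated as a flat sequence of floats; the exact type and shape returned by `encode_image` and `encode_text` are not stated here, so the stand-in vectors and the helper `cosine_similarity` are illustrative only.

```python
import math
from typing import Sequence


def cosine_similarity(a: Sequence[float], b: Sequence[float]) -> float:
    """Return the cosine of the angle between two equal-length vectors."""
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0.0 or norm_b == 0.0:
        raise ValueError("cosine similarity is undefined for a zero vector")
    return dot / (norm_a * norm_b)


# Stand-in embeddings (assumed flat float vectors, not real CLIP output).
img_vec_demo = [0.2, 0.1, 0.7]
txt_vec_demo = [0.4, 0.2, 1.4]  # same direction as img_vec_demo, twice the length
similarity = cosine_similarity(img_vec_demo, txt_vec_demo)
print(round(similarity, 6))
```

Because cosine similarity ignores vector length, two embeddings pointing in the same direction score close to 1 regardless of their magnitudes.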
📝, 🏞️ | Text/Input Conditioned Image Segmentation | Facebook Research - Segment Anything | CPU, GPU
img = Image.open("test.png")

model = client.Module("facebook/sam-vit-large")
outputs: List[np.ndarray] = model(images=img, grid_size=20)
📝, 🏞️ | Text-to-Image Generation | Stability AI - Stable Diffusion XL | GPU
sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")
sdxl(prompts=["fox jumped over the moon"],
     width=1024, height=1024, num_images=1)
📝, 🏞️ | Text-to-Image Generation | Stability AI - Stable Diffusion 2.1 | GPU
sdv2 = client.Module("stabilityai/stable-diffusion-2-1")
sdv2(prompts=["fox jumped over the moon"],
     width=512, height=512, num_images=1)
📝, 🏞️ | Text-to-Image Generation | Stability AI - Stable Diffusion 2 | GPU
sdv2 = client.Module("stabilityai/stable-diffusion-2")
sdv2(prompts=["fox jumped over the moon"],
     width=512, height=512, num_images=1)
📝, 🏞️ | Text-to-Image Generation | RunwayML - Stable Diffusion v1.5 | CPU, GPU
sdv1_5 = client.Module("runwayml/stable-diffusion-v1-5")
sdv1_5(prompts=["fox jumped over the moon"],
       width=512, height=512, num_images=1)
🎙️ | Speech-to-Text | OpenAI - Whisper | GPU
from base64 import b64encode

whisper = client.Module("openai/whisper-large-v2")
with open("test.wav", "rb") as f:
    audio_data = f.read()
    audio_b64 = b64encode(audio_data).decode("utf-8")
    transcription = whisper.transcribe(audio=audio_b64)
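The Whisper example sends the audio file as a base64 string rather than raw bytes. The sketch below shows only that encoding step, using the standard library and no NOS client. It builds a short silent WAV file in memory as a stand-in for `test.wav`; the helper `make_silent_wav`, the frame count, and the 16 kHz sample rate are arbitrary choices for illustration.

```python
import io
import wave
from base64 import b64decode, b64encode


def make_silent_wav(num_frames: int = 1600, sample_rate: int = 16000) -> bytes:
    """Build a short mono 16-bit PCM WAV file in memory (stand-in for test.wav)."""
    buf = io.BytesIO()
    with wave.open(buf, "wb") as wav:
        wav.setnchannels(1)      # mono
        wav.setsampwidth(2)      # 2 bytes per sample = 16-bit PCM
        wav.setframerate(sample_rate)
        wav.writeframes(b"\x00\x00" * num_frames)  # all-zero samples = silence
    return buf.getvalue()


audio_data = make_silent_wav()

# Base64 turns arbitrary bytes into an ASCII-safe string.
audio_b64 = b64encode(audio_data).decode("utf-8")

# Decoding recovers the original bytes exactly.
restored = b64decode(audio_b64)
print(len(audio_data), len(audio_b64), restored == audio_data)
```

Base64 output is roughly a third larger than the input, which is worth keeping in mind for long recordings.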
🎙️ | Text-to-Speech | Suno - Bark | GPU
bark = client.Module("suno/bark")
audio_data = bark(prompts=["fox jumped over the moon"])