## 🧠 Models
This README lists the models supported by NOS, along with their corresponding links to Hugging Face or Torch Hub, and the supported devices (CPU or GPU). Navigate to our models page for more up-to-date information.
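All of the snippets in the table below assume that a NOS inference server is already running and that a `client` handle has been created. A minimal setup sketch is shown here; the `nos.client.Client` import path, the default gRPC address, and the `WaitForServer()` helper are assumptions based on typical NOS usage and may differ in your installation:

```python
from PIL import Image  # used by the image examples below

# Assumption: the NOS gRPC client is exposed as nos.client.Client.
from nos.client import Client

# Connect to a locally running NOS server (default gRPC address assumed).
client = Client("[::]:50051")

# Block until the server is ready before creating modules (assumed helper).
client.WaitForServer()
```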
| Modality | Task | Model Name | Supported Devices | API |
| :---: | --- | --- | --- | --- |
| 🏞️ | Object Detection | [YOLOX](https://github.com/Megvii-BaseDetection/YOLOX) | CPU, GPU | `img = Image.open("test.png")`<br>`yolox = client.Module("yolox/nano")`<br>`predictions = yolox(images=img)`<br>`# {"bboxes": ..., "scores": ..., "labels": ...}` |
| 🏞️ | Depth Estimation | [MiDaS](https://github.com/isl-org/MiDaS) | CPU, GPU | `img = Image.open("test.png")`<br>`model = client.Module("isl-org/MiDaS")`<br>`result = model(images=img)`<br>`# {"depths": np.ndarray}` |
| 📝, 🏞️ | Text-Image Embedding | [OpenAI - CLIP](https://huggingface.co/openai/clip-vit-base-patch32) | CPU, GPU | `img = Image.open("test.png")`<br>`clip = client.Module("openai/clip-vit-base-patch32")`<br>`img_vec = clip.encode_image(images=img)`<br>`txt_vec = clip.encode_text(text=["fox jumped over the moon"])` |
| 📝, 🏞️ | Text/Input-Conditioned Image Segmentation | [Facebook Research - Segment Anything](https://huggingface.co/facebook/sam-vit-large) | CPU, GPU | `img = Image.open("test.png")`<br>`model = client.Module("facebook/sam-vit-large")`<br>`outputs: List[np.ndarray] = model(images=img, grid_size=20)` |
| 📝, 🏞️ | Text-to-Image Generation | [Stability AI - Stable Diffusion XL](https://huggingface.co/stabilityai/stable-diffusion-xl-base-1.0) | GPU | `sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")`<br>`sdxl(prompts=["fox jumped over the moon"], width=1024, height=1024, num_images=1)` |
| 📝, 🏞️ | Text-to-Image Generation | [Stability AI - Stable Diffusion 2.1](https://huggingface.co/stabilityai/stable-diffusion-2-1) | GPU | `sdv21 = client.Module("stabilityai/stable-diffusion-2-1")`<br>`sdv21(prompts=["fox jumped over the moon"], width=512, height=512, num_images=1)` |
| 📝, 🏞️ | Text-to-Image Generation | [Stability AI - Stable Diffusion 2](https://huggingface.co/stabilityai/stable-diffusion-2) | GPU | `sdv2 = client.Module("stabilityai/stable-diffusion-2")`<br>`sdv2(prompts=["fox jumped over the moon"], width=512, height=512, num_images=1)` |
| 📝, 🏞️ | Text-to-Image Generation | [RunwayML - Stable Diffusion v1.5](https://huggingface.co/runwayml/stable-diffusion-v1-5) | CPU, GPU | `sdv15 = client.Module("runwayml/stable-diffusion-v1-5")`<br>`sdv15(prompts=["fox jumped over the moon"], width=512, height=512, num_images=1)` |
| 🎙️ | Speech-to-Text | [OpenAI - Whisper](https://huggingface.co/openai/whisper-large-v2) | GPU | `from base64 import b64encode`<br>`whisper = client.Module("openai/whisper-large-v2")`<br>`with open("test.wav", "rb") as f: audio_data = f.read()`<br>`audio_b64 = b64encode(audio_data).decode("utf-8")`<br>`transcription = whisper.transcribe(audio=audio_b64)` |
| 🎙️ | Text-to-Speech | [Suno - Bark](https://huggingface.co/suno/bark) | GPU | `bark = client.Module("suno/bark")`<br>`audio_data = bark(prompts=["fox jumped over the moon"])` |
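The embedding and detection modules above return plain Python/NumPy data, so their outputs can be consumed directly. As an illustrative sketch only (assuming `encode_image` / `encode_text` return `np.ndarray` embeddings, as the MiDaS and SAM rows above suggest for their outputs), a cosine-similarity check between CLIP image and text embeddings might look like:

```python
import numpy as np
from PIL import Image

clip = client.Module("openai/clip-vit-base-patch32")

# Assumption: the embeddings come back as np.ndarray (possibly with a
# leading batch dimension), so we flatten them before comparing.
img_vec = np.asarray(clip.encode_image(images=Image.open("test.png"))).ravel()
txt_vec = np.asarray(clip.encode_text(text=["fox jumped over the moon"])).ravel()

# Cosine similarity between the image and text embeddings.
similarity = float(np.dot(img_vec, txt_vec) / (np.linalg.norm(img_vec) * np.linalg.norm(txt_vec)))
print(f"CLIP image-text similarity: {similarity:.3f}")
```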