## 🧠 Models

This README lists the models supported by NOS, along with their corresponding links to Hugging Face or Torch Hub and the devices they support (CPU or GPU). Navigate to our models page for more up-to-date information.

Each entry below lists the modality, task, model name, and supported devices, followed by a short API example. The examples assume an initialized NOS `client`.
**🏞️ Object Detection** · YOLOX · CPU, GPU

```python
from PIL import Image

img = Image.open("test.png")

yolox = client.Module("yolox/nano")
predictions = yolox(images=img)
# {"bboxes": ..., "scores": ..., "labels": ...}
```
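The returned arrays can be post-processed on the client. As a minimal sketch, assuming the response fields are NumPy arrays (the synthetic `bboxes`, `scores`, and `labels` below are stand-ins, not actual model output), detections can be filtered by a confidence threshold:

```python
import numpy as np

# Synthetic stand-ins for a YOLOX prediction: N boxes (xyxy),
# N confidence scores, and N integer class labels.
bboxes = np.array([[10, 10, 50, 50], [20, 20, 80, 80]], dtype=np.float32)
scores = np.array([0.9, 0.3], dtype=np.float32)
labels = np.array([0, 1])

# Keep only detections at or above the confidence threshold.
keep = scores >= 0.5
bboxes, scores, labels = bboxes[keep], scores[keep], labels[keep]
```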
**🏞️ Depth Estimation** · MiDaS · CPU, GPU

```python
from PIL import Image

img = Image.open("test.png")

model = client.Module("isl-org/MiDaS")
result = model(images=img)
# {"depths": np.ndarray}
```
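A raw depth array is often easier to inspect as an 8-bit grayscale image. As a sketch (the `depths` array below is a synthetic stand-in for the `"depths"` output), min-max normalization maps it to the 0–255 range:

```python
import numpy as np

# Synthetic depth map standing in for the "depths" array above.
depths = np.array([[0.5, 1.0], [2.0, 4.0]], dtype=np.float32)

# Min-max normalize to [0, 255] for visualization as grayscale.
d_min, d_max = depths.min(), depths.max()
depth_u8 = ((depths - d_min) / (d_max - d_min) * 255).astype(np.uint8)
```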
**📝, 🏞️ Text-Image Embedding** · OpenAI - CLIP · CPU, GPU

```python
from PIL import Image

img = Image.open("test.png")

clip = client.Module("openai/clip-vit-base-patch32")
img_vec = clip.encode_image(images=img)
txt_vec = clip.encode_text(text=["fox jumped over the moon"])
```
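CLIP embeddings are typically compared with cosine similarity. A minimal sketch, using small synthetic vectors in place of real `img_vec`/`txt_vec` embeddings:

```python
import numpy as np

# Synthetic embeddings standing in for CLIP image/text vectors.
img_vec = np.array([[0.0, 1.0, 0.0]])
txt_vec = np.array([[0.0, 1.0, 0.0], [1.0, 0.0, 0.0]])

# L2-normalize, then take dot products: cosine similarity of the
# image against each text embedding.
a = img_vec / np.linalg.norm(img_vec, axis=-1, keepdims=True)
b = txt_vec / np.linalg.norm(txt_vec, axis=-1, keepdims=True)
sims = a @ b.T  # shape (1, num_texts)
```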
**📝, 🏞️ Text/Input Conditioned Image Segmentation** · Facebook Research - Segment Anything · CPU, GPU

```python
from typing import List

import numpy as np
from PIL import Image

img = Image.open("test.png")

model = client.Module("facebook/sam-vit-large")
outputs: List[np.ndarray] = model(images=img, grid_size=20)
```
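Since the model returns a list of per-mask arrays, a common follow-up is to merge them into a single foreground mask. A sketch, assuming boolean masks (the two tiny masks below are synthetic stand-ins):

```python
import numpy as np

# Two synthetic boolean masks standing in for segmentation outputs.
masks = [
    np.array([[True, False], [False, False]]),
    np.array([[False, False], [False, True]]),
]

# Union of all masks: a pixel is foreground if any mask covers it.
merged = np.any(np.stack(masks), axis=0)
```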
**📝, 🏞️ Text-to-Image Generation** · Stability AI - Stable Diffusion XL · GPU

```python
sdxl = client.Module("stabilityai/stable-diffusion-xl-base-1-0")
sdxl(prompts=["fox jumped over the moon"],
     width=1024, height=1024, num_images=1)
```
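Generated images are usually persisted as PNG bytes. A minimal sketch of that step, assuming the outputs are PIL images (the solid-color image below is a synthetic stand-in for a generated one):

```python
from io import BytesIO

from PIL import Image

# Synthetic stand-in for a generated image.
image = Image.new("RGB", (64, 64), color=(255, 0, 0))

# Encode to PNG in memory; swap BytesIO for a file path to save to disk.
buf = BytesIO()
image.save(buf, format="PNG")
png_bytes = buf.getvalue()
```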
**📝, 🏞️ Text-to-Image Generation** · Stability AI - Stable Diffusion 2.1 · GPU

```python
sdv2 = client.Module("stabilityai/stable-diffusion-2-1")
sdv2(prompts=["fox jumped over the moon"],
     width=512, height=512, num_images=1)
```
**📝, 🏞️ Text-to-Image Generation** · Stability AI - Stable Diffusion 2 · GPU

```python
sdv2 = client.Module("stabilityai/stable-diffusion-2")
sdv2(prompts=["fox jumped over the moon"],
     width=512, height=512, num_images=1)
```
**📝, 🏞️ Text-to-Image Generation** · RunwayML - Stable Diffusion v1.5 · CPU, GPU

```python
sdv1_5 = client.Module("runwayml/stable-diffusion-v1-5")
sdv1_5(prompts=["fox jumped over the moon"],
       width=512, height=512, num_images=1)
```
**🎙️ Speech-to-Text** · OpenAI - Whisper · GPU

```python
from base64 import b64encode

whisper = client.Module("openai/whisper-large-v2")
with open("test.wav", "rb") as f:
    audio_data = f.read()
audio_b64 = b64encode(audio_data).decode("utf-8")
transcription = whisper.transcribe(audio=audio_b64)
```
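The base64 step above exists so raw audio bytes can travel over a text-based transport. A small round-trip sketch (the `encode_audio` helper and the fake bytes are illustrative, not part of the NOS API):

```python
from base64 import b64decode, b64encode


def encode_audio(data: bytes) -> str:
    """Hypothetical helper: base64-encode raw audio bytes for transport."""
    return b64encode(data).decode("utf-8")


# Fake bytes standing in for the contents of a WAV file.
raw = b"\x00\x01\x02fake-wav-bytes"
encoded = encode_audio(raw)
decoded = b64decode(encoded)  # the server recovers the original bytes
```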
**🎙️ Text-to-Speech** · Suno - Bark · GPU

```python
bark = client.Module("suno/bark")
audio_data = bark(prompts=["fox jumped over the moon"])
```
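To play or ship the result, the raw samples are typically wrapped in a WAV container. A sketch using the standard-library `wave` module, with a synthetic sine wave standing in for `audio_data` (Bark generates audio at a 24 kHz sample rate):

```python
import io
import wave

import numpy as np

# Synthetic mono int16 samples standing in for the returned audio data.
samples = (np.sin(np.linspace(0, 2 * np.pi, 24000)) * 32767).astype(np.int16)

# Wrap the samples in a WAV container in memory; swap io.BytesIO
# for a file path to write "output.wav" to disk.
buf = io.BytesIO()
with wave.open(buf, "wb") as wav:
    wav.setnchannels(1)      # mono
    wav.setsampwidth(2)      # 16-bit samples
    wav.setframerate(24000)  # 24 kHz sample rate
    wav.writeframes(samples.tobytes())
wav_bytes = buf.getvalue()
```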