YOLO-NAS-SAM

Object Detection & Image Segmentation with YOLO-NAS + SAM

Overview

This project demonstrates object detection and image segmentation by combining two models: YOLO-NAS for detection and SAM for segmentation. YOLO-NAS, developed by Deci AI, is an object detection model optimized for both accuracy and low-latency inference. SAM (Segment Anything Model), developed by Meta AI, is a promptable segmentation model.
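A minimal sketch of how the two models can be chained: the detector's bounding boxes are filtered by confidence and passed to SAM as box prompts. The helper `select_box_prompts`, the 0.5 threshold, the `yolo_nas_l` variant, and the checkpoint path argument are illustrative assumptions, not taken from this repository. The `super_gradients` and `segment_anything` calls follow those libraries' published usage, but the shape of the prediction object differs between `super_gradients` releases, so check it against the installed version.

```python
def select_box_prompts(boxes, scores, min_score=0.5):
    """Keep detections scoring at least min_score, as [x1, y1, x2, y2] float lists."""
    return [
        [float(v) for v in box]
        for box, score in zip(boxes, scores)
        if score >= min_score
    ]


def detect_and_segment(image_rgb, sam_checkpoint, min_score=0.5):
    """Detect objects with YOLO-NAS, then segment each detection with SAM.

    image_rgb is an HxWx3 uint8 RGB array. The heavy dependencies are
    imported lazily so the helper above stays usable without them.
    """
    import numpy as np
    from super_gradients.training import models
    from segment_anything import SamPredictor, sam_model_registry

    detector = models.get("yolo_nas_l", pretrained_weights="coco")
    result = detector.predict(image_rgb, conf=min_score)
    # Assumption: newer releases return a single-image result with a
    # .prediction attribute; older ones return an iterable of such results.
    if hasattr(result, "prediction"):
        pred = result.prediction
    else:
        pred = list(result)[0].prediction
    boxes = select_box_prompts(pred.bboxes_xyxy, pred.confidence, min_score)

    sam = sam_model_registry["vit_h"](checkpoint=sam_checkpoint)
    predictor = SamPredictor(sam)
    predictor.set_image(image_rgb)  # the image embedding is computed once here

    masks = []
    for box in boxes:
        # One box prompt per detection; request a single mask per box.
        mask, _, _ = predictor.predict(box=np.array(box), multimask_output=False)
        masks.append(mask[0])  # boolean HxW mask for this box
    return boxes, masks
```

Using box prompts keeps each mask tied to a detection, so the class label from YOLO-NAS can be attached to the corresponding SAM mask.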

Technologies Used

YOLO-NAS Overview

YOLO-NAS, short for You Only Look Once with Neural Architecture Search, is an object detection model developed by Deci and optimized for both accuracy and low-latency inference. Its architecture uses Quantization Aware Blocks and selective quantization, which help the model retain accuracy after quantization. Deci reports a better accuracy-latency trade-off than earlier YOLO models, which makes YOLO-NAS a good fit for applications such as autonomous vehicles, robotics, and video analytics.

Unique Features of YOLO-NAS

YOLO-NAS architecture

Training Details

YOLO-NAS is trained in several phases: pre-training on Objects365, training on pseudo-labeled COCO data, and the use of Knowledge Distillation (KD) and Distribution Focal Loss (DFL). The Objects365 pre-training, on a dataset of 2 million images and 365 categories, runs for 25-40 epochs.
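To make the Distribution Focal Loss term concrete, here is a small worked sketch that follows its definition in the Generalized Focal Loss paper: each box offset is predicted as a distribution over integer bins, and the loss is the cross-entropy against the two bins that bracket the continuous target. The function name and the example values are illustrative; this is not YOLO-NAS's training code.

```python
import math


def distribution_focal_loss(logits, target):
    """DFL for one box side: logits over bins 0..n-1, target a float in [0, n-1]."""
    n = len(logits)
    if n < 2 or not 0 <= target <= n - 1:
        raise ValueError("need at least two bins and a target within the bin range")

    # Numerically stable log-softmax over the bins.
    peak = max(logits)
    log_norm = peak + math.log(sum(math.exp(z - peak) for z in logits))
    log_probs = [z - log_norm for z in logits]

    left = min(int(math.floor(target)), n - 2)  # lower bracketing bin
    right = left + 1
    w_left = right - target   # larger when the target is near the left bin
    w_right = target - left   # larger when the target is near the right bin
    return -(w_left * log_probs[left] + w_right * log_probs[right])


# A distribution peaked at bin 2 is penalized less for a target of 2.0
# than for a target of 0.0.
peaked = [0.0, 0.0, 5.0, 0.0]
loss_near = distribution_focal_loss(peaked, 2.0)
loss_far = distribution_focal_loss(peaked, 0.0)
```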

SAM Overview

SAM (Segment Anything Model) is a promptable image segmentation model from Meta AI. It is trained on a large dataset of images and segmentation masks, and it can generate masks for every object in an image or for specific objects and regions of interest selected by a prompt.

Supported Prompts

Note: Text prompts are explored in the research paper but the capability is not released.

Model Structure

Model Size

Training Data

The model was trained on the SA-1B dataset, which contains about 11 million images and over 1 billion masks.

Efficient & Flexible Design

SAM is designed to be efficient enough to power its data engine. It is decoupled into a one-time image encoder and a lightweight mask decoder that can run in a web browser in just a few milliseconds per prompt.
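The decoupling described above can be sketched as a cache-then-query pattern: the expensive image embedding is computed once and reused for every prompt. The class and the stand-in encoder and decoder below are hypothetical placeholders for illustration, not SAM's actual modules.

```python
class PromptableSegmenter:
    """Toy stand-in for an encode-once, decode-per-prompt segmenter."""

    def __init__(self, encoder, decoder):
        self._encoder = encoder  # expensive: run once per image
        self._decoder = decoder  # cheap: run once per prompt
        self._embedding = None
        self.encoder_calls = 0

    def set_image(self, image):
        """Compute and cache the embedding for the given image."""
        self._embedding = self._encoder(image)
        self.encoder_calls += 1

    def predict(self, prompt):
        """Decode a result for one prompt from the cached embedding."""
        if self._embedding is None:
            raise RuntimeError("call set_image() before predict()")
        return self._decoder(self._embedding, prompt)


# Stand-in functions: the "embedding" is just the pixel sum, and the "mask"
# is a tuple built from the embedding and the prompt.
segmenter = PromptableSegmenter(
    encoder=lambda image: sum(image),
    decoder=lambda embedding, prompt: (embedding, prompt),
)
segmenter.set_image([1, 2, 3])
results = [segmenter.predict(p) for p in ("point", "box")]
```

In the `segment_anything` package the same pattern appears as one call to `SamPredictor.set_image` followed by repeated calls to `SamPredictor.predict`.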

Note: SAM is still under development, but it has the potential to revolutionize the way we interact with images and videos.