Running YOLOv8 on Apple Silicon with MPS Backend: A Simplified Guide

Dec 18, 2023

YOLOv8 is a game-changer in object detection, swiftly pinpointing objects with its deep learning prowess. Usually, it zooms through tasks on CUDA GPUs, the supercharged engines built by NVIDIA. But what if your Mac has an Apple Silicon chip like the M1 or M2? Though powerful, these chips don’t speak CUDA. That is where our quest begins: running YOLOv8 on Apple Silicon. Think of it like translating a foreign film into your own tongue. Apple’s GPUs lack CUDA support, but they do have MPS (Metal Performance Shaders), Apple’s secret sauce for GPU acceleration, which lets YOLOv8 run much faster than on the CPU alone.

So, buckle up! We’re about to take YOLOv8 on a joyride with Apple Silicon, showing that Apple’s chips can join the object detection party with style even without CUDA. Let’s dive into this Silicon Safari and see how YOLOv8 thrives in the land of Apple! 🍏💻🚀

Why Not CUDA?

Machine learning often relies on CUDA, NVIDIA’s parallel computing platform, to speed up complex GPU calculations. It accelerates both neural network training and inference. Since CUDA only works with NVIDIA hardware, it doesn’t work on Apple Silicon. Chips like the M1 and M2 have a different GPU design and need a different acceleration path. For that, Apple provides its own hardware-specific technologies: Metal and Metal Performance Shaders (MPS). Apple devices cannot use CUDA, but these Apple-specific tools make fast machine learning possible on Apple GPUs.

What is MPS?

Metal Performance Shaders (MPS) is Apple’s specialized solution for high-performance GPU programming on its devices. Integrating closely with the Metal framework, MPS provides a suite of highly optimized shaders for graphics and compute tasks, which is particularly beneficial in machine learning applications. It enables efficient execution of AI-related operations, like neural network processing, on Apple Silicon GPUs. PyTorch exposes this capability through its "mps" device, which is how YOLOv8 reaches the GPU in the steps below. This tight integration with Apple’s ecosystem lets developers harness the GPU directly on Apple devices, streamlining the development of sophisticated models and applications.

Steps to Run YOLOv8 on Apple Silicon with MPS

Step 1: Check MPS Availability

Before running YOLOv8, check that your PyTorch installation can reach the Apple Silicon GPU through MPS. The following should print True:

import torch
print(torch.backends.mps.is_available())
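If you want a script that also runs on machines without MPS, the check above can be wrapped in a small helper. The following is a minimal sketch, not part of the Ultralytics or PyTorch API: the function name pick_device is hypothetical, and falling back to "cpu" when PyTorch is missing or too old is an assumption made for illustration.

```python
def pick_device() -> str:
    """Return "mps" when PyTorch reports a usable MPS backend, otherwise "cpu"."""
    try:
        import torch
    except ImportError:
        # Assumption for this sketch: without PyTorch we can only report "cpu".
        return "cpu"

    # torch.backends.mps was added in PyTorch 1.12; older builds lack the attribute.
    mps_backend = getattr(torch.backends, "mps", None)
    if mps_backend is not None and mps_backend.is_available():
        return "mps"
    return "cpu"


device = pick_device()
print(f"Using device: {device}")
```

The returned string can then be passed as the device argument in Step 4 instead of hard-coding 'mps'.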

Step 2: Install and import Required Libraries

from ultralytics import YOLO
import torch

Ensure you have PyTorch installed with MPS support (version 1.12 or later) and the Ultralytics YOLO library. Both can be installed with pip, for example: pip install torch ultralytics

Step 3: Load the YOLOv8 Model

Load a pre-trained YOLOv8 model using the Ultralytics library:

from ultralytics import YOLO
model = YOLO('yolov8s.pt')  # Official weight names are lowercase; downloaded automatically if not found locally

Step 4: Run Inference with MPS

Run YOLOv8 with MPS as the target device. This utilizes the Apple Silicon GPU for acceleration:

results = model(source="input.mp4", show=True, conf=0.1, save=True, device='mps')

Set device to "mps" to run on the Apple Silicon GPU.
Replace "input.mp4" with your video or image source.
Adjust conf to set the minimum confidence threshold for detections.
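To compare devices on your own machine, you can time each call yourself. The sketch below is a generic wall-clock timer under stated assumptions: time_per_frame_ms and fake_predict are hypothetical names, and fake_predict is a stand-in that only sleeps so the example runs without a model. It is not necessarily how the figures in this article were collected.

```python
import time
from statistics import mean
from typing import Callable, Iterable, List


def time_per_frame_ms(
    predict: Callable[[object], object],
    frames: Iterable[object],
    warmup: int = 1,
) -> List[float]:
    """Return the wall-clock latency in ms of each call after `warmup` untimed-in-results calls."""
    latencies = []
    for i, frame in enumerate(frames):
        start = time.perf_counter()
        predict(frame)
        elapsed_ms = (time.perf_counter() - start) * 1000.0
        # The first calls often include one-time setup cost, so they are not recorded.
        if i >= warmup:
            latencies.append(elapsed_ms)
    return latencies


def fake_predict(frame):
    """Hypothetical stand-in for a model call; it only sleeps for about 5 ms."""
    time.sleep(0.005)
    return frame


latencies = time_per_frame_ms(fake_predict, range(6), warmup=1)
print(f"{len(latencies)} frames timed, mean {mean(latencies):.1f} ms per frame")
```

In a real comparison, predict would wrap the model call from Step 4, once with device='mps' and once with device='cpu', using the same frames for both runs.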

Performance Analysis

Without MPS (that is, running on the CPU), YOLOv8 on Apple Silicon takes 68.3 to 71.6 milliseconds to process each video frame. Since one second is 1000 milliseconds, this equates to roughly 14 to 15 frames per second (FPS). That is a bit sluggish for real-time object detection, but it still shows how capable Apple Silicon is at demanding tasks.

With MPS enabled, on the other hand, processing times improve considerably, ranging from 16.7 ms to 21.1 ms per frame. This raises the frame rate to around 47 to 60 FPS, roughly three to four times faster than without MPS. Such a dramatic improvement demonstrates how well MPS optimizes GPU work for ML. A frame rate in this range is essential for real-time processing applications.
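The conversion from per-frame latency to frame rate is simply 1000 divided by the latency in milliseconds. As a quick check of the figures quoted in this section, here is a small sketch; the helper name ms_to_fps is hypothetical, and the latency values are the ones reported above.

```python
def ms_to_fps(ms_per_frame: float) -> float:
    """Convert a per-frame latency in milliseconds to frames per second."""
    if ms_per_frame <= 0:
        raise ValueError("latency must be positive")
    return 1000.0 / ms_per_frame


# (fastest, slowest) per-frame latencies in ms, as reported in this article.
cpu_ms = (68.3, 71.6)   # without MPS
mps_ms = (16.7, 21.1)   # with MPS

for label, (fastest, slowest) in (("Without MPS", cpu_ms), ("With MPS", mps_ms)):
    print(f"{label}: {ms_to_fps(slowest):.1f} to {ms_to_fps(fastest):.1f} FPS")
# Without MPS: 14.0 to 14.6 FPS
# With MPS: 47.4 to 59.9 FPS

# Most conservative and most optimistic speedup from pairing the range ends.
speedup_low = cpu_ms[0] / mps_ms[1]
speedup_high = cpu_ms[1] / mps_ms[0]
print(f"Speedup: {speedup_low:.1f}x to {speedup_high:.1f}x")
# Speedup: 3.2x to 4.3x
```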

Conclusion

Finally, the fact that YOLOv8 can run on Apple Silicon through MPS shows how far machine learning support has come across different hardware platforms. Apple’s hardware is capable on its own, as the usable frame rates without MPS demonstrate. However, when MPS is turned on, speed goes through the roof. This jump brings real-time applications within reach and shows how important it is to match software frameworks to hardware capabilities. For AI and machine learning developers, it is an excellent example of how hardware-specific optimization can bring out the best in a chip, making more powerful and efficient computing solutions possible in many areas.