YOLO-NAS vs YOLOv12 vs YOLO26: Object Detection Comparison (2026)
Last updated April 2026 — refreshed for current model/tool versions, including the YOLO26 successor and YOLO-NAS maintenance status after the Deci/NVIDIA acquisition.
This guide compares YOLO-NAS (Deci AI, 2023) and YOLOv12 (Tian et al., 2025) head-to-head on architecture, COCO accuracy, latency, license, and deployability — and tells you why most teams shipping new detectors in 2026 should also evaluate YOLO26, Ultralytics' January 2026 flagship. If you only have time for one fact: YOLO-NAS is no longer maintained (Deci was acquired by NVIDIA in April 2024), YOLOv12 is now positioned by Ultralytics as a research/benchmark family, and YOLO26 is the recommended deployment line for new projects.
What changed since this post was first written (Feb 2025)
- YOLO-NAS is frozen. Deci AI was acquired by NVIDIA in April 2024; the super-gradients repository is no longer actively maintained, and Ultralytics' own docs now state that "no further updates from the original team are expected."
- YOLOv12 is no longer the newest YOLO. Ultralytics released YOLO26 on January 14, 2026: a native NMS-free, end-to-end model with up to 43% faster CPU inference than YOLO11.
- Ultralytics' guidance on YOLOv12 has shifted. The official docs now say YOLO12 is "maintained primarily for benchmarking and research" and recommend YOLO11 or YOLO26 for production.
- YOLOv12 was accepted at NeurIPS 2025 (paper: arXiv:2502.12524), with the FlashAttention path now optional rather than required.
- Licensing matters more than ever. YOLOv12 and YOLO26 ship under AGPL-3.0 (or paid Ultralytics Enterprise); YOLO-NAS weights remain under their original Deci pre-trained-weights license. This affects whether you can ship them in a closed-source product.
TL;DR: which YOLO should you pick in 2026?
| Use case | Pick | Why |
|---|---|---|
| New production deployment, edge or CPU | YOLO26 | NMS-free, ~43% faster on CPU, actively maintained. |
| Closed-source commercial product, no Ultralytics enterprise license | YOLO-NAS (with caveats) | Pre-trained weights are usable commercially under Deci's terms; verify license file. No upstream support. |
| Research / paper-baseline / attention ablations | YOLOv12 | Cleanest attention-centric reference architecture; strong COCO numbers. |
| Highest COCO mAP at any speed | YOLO26-x or YOLOv12-x | 57.5% (56.9% in end-to-end mode) vs 55.2% mAP respectively. |
| Quantization / INT8 edge deployment | YOLO-NAS | Quantization-aware blocks; ≤0.65 mAP drop after INT8. |
YOLO lineage in one paragraph
YOLO is a family of single-stage detectors that predict bounding boxes and class scores in one forward pass. Since 2015 the line has split across multiple maintainers: Ultralytics owns the "official" v5/v8/v11/v26 line; v9 came from WongKinYiu; v10 from Tsinghua; v12 from Tian, Ye, and Chen; YOLO-NAS from Deci AI. There is no central numbering authority — the version number is mostly a brand. What matters in 2026 is which model is still maintained, which has weights you can legally ship, and which fits your latency/accuracy target.
YOLO-NAS: state in 2026
What it is
YOLO-NAS was released by Deci AI in May 2023. Its headline trick: instead of hand-designing the backbone, Deci ran Neural Architecture Search on a quantization-aware search space (their AutoNAC engine) targeting NVIDIA T4 latency directly. The resulting blocks include Quantization-Aware QSP and QCI modules that minimize accuracy loss after INT8 conversion.
Official benchmark numbers (COCO val, 640×640, T4)
| Variant | mAP@0.5:0.95 | Latency (FP16, ms) | INT8 mAP | INT8 Latency (ms) |
|---|---|---|---|---|
| YOLO-NAS-S | 47.5 | 3.21 | 47.03 | 2.36 |
| YOLO-NAS-M | 51.55 | 5.85 | 51.0 | 3.78 |
| YOLO-NAS-L | 52.22 | 7.87 | 52.1 | 4.78 |
Numbers are from Ultralytics' YOLO-NAS reference page and Deci's super-gradients README. Deci did not publish a full FLOPs/parameter table for the released checkpoints — that gap was real, not an oversight in our previous version of this post.
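To make the INT8 trade-off in the table concrete, the sketch below recomputes the mAP drop and the latency speedup for each variant. It uses only the figures from the table above; the helper name `int8_tradeoff` is ours, not part of any library.

```python
# INT8 trade-off computed from the YOLO-NAS table (COCO val, 640x640, T4).
# Each row: (FP16 mAP, FP16 latency in ms, INT8 mAP, INT8 latency in ms)
YOLO_NAS = {
    "S": (47.5, 3.21, 47.03, 2.36),
    "M": (51.55, 5.85, 51.0, 3.78),
    "L": (52.22, 7.87, 52.1, 4.78),
}


def int8_tradeoff(fp16_map, fp16_ms, int8_map, int8_ms):
    """Return (mAP points lost, latency speedup factor) when moving to INT8."""
    return round(fp16_map - int8_map, 2), round(fp16_ms / int8_ms, 2)


tradeoffs = {name: int8_tradeoff(*row) for name, row in YOLO_NAS.items()}
for name, (drop, speedup) in tradeoffs.items():
    print(f"YOLO-NAS-{name}: -{drop} mAP, {speedup}x faster")
```

On these numbers the largest drop is 0.55 mAP (the M variant), and the speedup grows with model size, from about 1.36x for S to about 1.65x for L.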
Maintenance status (April 2026)
- Deci AI was acquired by NVIDIA in April 2024.
- The Deci-AI/super-gradients repository is no longer actively developed by the original team.
- Ultralytics' own docs note: "these models are no longer actively maintained by Deci... no further updates from the original team are expected."
- The released weights still work and can still be loaded via Ultralytics or super-gradients; nano variants are still on the open issue list and won't ship.
Loading YOLO-NAS in 2026
```python
from super_gradients.training import models
from super_gradients.common.object_names import Models

# Load the large variant with COCO-pretrained weights
model = models.get(Models.YOLO_NAS_L, pretrained_weights="coco")

# Single-image inference; conf is the confidence threshold
preds = model.predict("street.jpg", conf=0.35)
preds.show()

# Export to ONNX for deployment; input_image_shape is (height, width)
model.export("yolo_nas_l.onnx", input_image_shape=(640, 640))
```
For INT8 quantization on T4/Jetson, super-gradients exposes QATTrainer; the quantization-aware blocks are why YOLO-NAS still wins the "smallest mAP drop after INT8" comparison against YOLOv8 and YOLOv12.
YOLOv12: attention-centric, but now a research line
What it is
YOLOv12 (Tian, Ye, Chen — arXiv:2502.12524, NeurIPS 2025) is the first YOLO release where attention, not pure convolution, is the primary backbone primitive. Its three load-bearing ideas:
- Area Attention (A2): splits the feature map into l equal regions (l = 4 by default in the paper) and runs attention within each, cutting the attention cost by roughly a factor of l while keeping a wide effective field.
- R-ELAN (Residual Efficient Layer Aggregation Networks): adds block-level residual connections and scaling factors to stabilize training of deeper attention stacks.
- FlashAttention path (optional): for users on Turing/Ampere/Ada/Hopper GPUs, the attention kernels can be compiled with FlashAttention to reduce memory access overhead. As of the current Ultralytics integration, FlashAttention is optional, not required.
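A rough way to see where the Area Attention saving comes from is to count query-key pairs. The sketch below is a simplified model, not the paper's implementation: it ignores head dimension and projection costs, and the 40×40 feature-map size is a hypothetical example. The function name `attention_pairs` is ours; l = 4 is the default given in the YOLOv12 paper.

```python
def attention_pairs(num_tokens: int, num_areas: int = 1) -> int:
    """Count query-key pairs when tokens are split into equal areas
    and attention runs only inside each area."""
    if num_tokens % num_areas != 0:
        raise ValueError("num_tokens must divide evenly into num_areas")
    tokens_per_area = num_tokens // num_areas
    return num_areas * tokens_per_area ** 2


# Hypothetical 40x40 feature map (a 640x640 input at stride 16).
tokens = 40 * 40
global_pairs = attention_pairs(tokens)             # one area: full attention
area_pairs = attention_pairs(tokens, num_areas=4)  # four areas of 400 tokens
print(global_pairs, area_pairs, global_pairs / area_pairs)
```

Under this count, splitting into l areas divides the pairwise work by l. The cost inside each area is still quadratic in the number of tokens that area holds.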
Official benchmark numbers (COCO val2017, 640×640, T4 TensorRT10)
| Variant | mAP@0.5:0.95 | Latency (ms) | Params (M) | FLOPs (B) |
|---|---|---|---|---|
| YOLOv12-N | 40.6 | 1.64 | 2.6 | 6.5 |
| YOLOv12-S | 48.0 | 2.61 | 9.3 | 21.4 |
| YOLOv12-M | 52.5 | 4.86 | 20.2 | 67.5 |
| YOLOv12-L | 53.7 | 6.77 | 26.4 | 88.9 |
| YOLOv12-X | 55.2 | 11.79 | 59.1 | 199.0 |
Maintenance status (April 2026)
Ultralytics' YOLO12 docs now state: "YOLO12 is maintained primarily for benchmarking and research. If you need stable training, predictable memory usage, and optimized CPU inference, choose YOLO11 or YOLO26 for deployment." That is a notable downgrade in posture compared to early 2025, when YOLOv12 was being pitched as the new SOTA YOLO. Use it if you're publishing or evaluating attention-centric detectors; consider YOLO26 if you're shipping.
Validating YOLOv12
```python
from ultralytics import YOLO

model = YOLO("yolo12n.pt")  # also yolo12s/m/l/x
metrics = model.val(data="coco.yaml", save_json=True)
print(metrics.box.map)    # mAP@0.5:0.95
print(metrics.box.map50)  # mAP@0.5
```
The 2026 wildcard: YOLO26
If you are reading this in mid-2026 and choosing a detector, you should at least benchmark YOLO26 alongside the two above. Released January 14, 2026 by Ultralytics, it is the first native end-to-end YOLO — predictions come out without a non-maximum-suppression (NMS) post-processing step.
| Variant | mAP (E2E) | Params (M) | FLOPs (B) | CPU ONNX (ms) | T4 TRT10 (ms) |
|---|---|---|---|---|---|
| YOLO26-n | 40.1 | 2.4 | 5.4 | 38.9 | 1.7 |
| YOLO26-s | 47.8 | 9.5 | 20.7 | — | — |
| YOLO26-m | 52.5 | 20.4 | 68.2 | — | — |
| YOLO26-l | 54.4 | 24.8 | 86.4 | — | — |
| YOLO26-x | 56.9 | 55.7 | 193.9 | 525.8 | 11.8 |
Highlights from Ultralytics' release notes:
- NMS-free inference. Distribution Focal Loss (DFL) was removed; the head produces final predictions directly.
- Up to 43% faster CPU inference vs the equivalent YOLO11 nano variant — the largest YoY CPU speedup in the YOLO line in years.
- New training tricks: Progressive Loss Balancing (ProgLoss), Small-Target-Aware Label Assignment (STAL), and the MuSGD optimizer (SGD + Muon).
- Same five tasks as YOLOv12: detection, segmentation, classification, pose, OBB.
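For readers who have not implemented it, here is a minimal sketch of the greedy NMS step that an NMS-free head makes unnecessary. It is a plain-Python, single-class illustration with made-up boxes. It is not the code any of these libraries ship, and production pipelines usually run a vectorized, per-class version.

```python
def iou(a, b):
    """Intersection-over-union of two (x1, y1, x2, y2) boxes."""
    ix1, iy1 = max(a[0], b[0]), max(a[1], b[1])
    ix2, iy2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, ix2 - ix1) * max(0.0, iy2 - iy1)
    area_a = (a[2] - a[0]) * (a[3] - a[1])
    area_b = (b[2] - b[0]) * (b[3] - b[1])
    union = area_a + area_b - inter
    return inter / union if union > 0 else 0.0


def greedy_nms(boxes, scores, iou_threshold=0.5):
    """Return indices of kept boxes, highest score first."""
    order = sorted(range(len(boxes)), key=lambda i: scores[i], reverse=True)
    keep = []
    for i in order:
        # Keep a box only if it does not overlap a higher-scoring kept box.
        if all(iou(boxes[i], boxes[j]) <= iou_threshold for j in keep):
            keep.append(i)
    return keep


# Two near-duplicate boxes plus one separate box (illustrative values).
boxes = [(10, 10, 110, 110), (12, 12, 112, 112), (200, 200, 260, 260)]
scores = [0.90, 0.80, 0.75]
kept = greedy_nms(boxes, scores)
print(kept)
```

The second box overlaps the first with an IoU of about 0.92, so it is suppressed and the other two survive. With an end-to-end head this filtering happens inside the network, so the step drops out of the deployment graph.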
Head-to-head comparison
Accuracy at matched scale
| Model size | YOLO-NAS mAP | YOLOv12 mAP | YOLO26 mAP (E2E) |
|---|---|---|---|
| Small (S) | 47.5 | 48.0 | 47.8 |
| Medium (M) | 51.55 | 52.5 | 52.5 |
| Large (L) | 52.22 | 53.7 | 54.4 |
At matched model scale, YOLOv12 and YOLO26 have overtaken the small accuracy lead that YOLO-NAS held in 2023: by 0.3 to 0.5 mAP at S and by roughly 1.5 to 2.2 mAP at L. YOLO-NAS still leads on quantized deployment (it loses the least mAP after INT8), which matters specifically for Jetson Nano/Orin and similar edge targets.
Latency on T4 (FP16/TensorRT)
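For roughly 52% mAP at FP16 precision, YOLO-NAS-M runs at 5.85 ms and YOLOv12-M at 4.86 ms. The ~3.7 ms figure for YOLO26-m is an estimate interpolated from CPU/GPU ratios in the official tables, not a published measurement, so treat it as provisional. On these numbers the newer models are faster at equal accuracy on the same GPU, though the runtimes behind the tables differ (YOLOv12 is reported on TensorRT 10), so re-measure on your own stack before committing.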
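Latency in milliseconds is easier to reason about as throughput. The sketch below converts the two medium-variant T4 figures that appear in the tables into frames per second, assuming batch size 1 and no pre- or post-processing overhead. The helper name `fps` is ours.

```python
def fps(latency_ms: float) -> float:
    """Frames per second for a given per-frame latency at batch size 1."""
    return 1000.0 / latency_ms


# T4 latencies taken from the benchmark tables in this guide.
T4_LATENCY_MS = {"YOLO-NAS-M": 5.85, "YOLOv12-M": 4.86}

throughput = {name: round(fps(ms), 1) for name, ms in T4_LATENCY_MS.items()}
latency_cut_pct = round(
    100 * (T4_LATENCY_MS["YOLO-NAS-M"] - T4_LATENCY_MS["YOLOv12-M"])
    / T4_LATENCY_MS["YOLO-NAS-M"],
    1,
)
print(throughput, f"{latency_cut_pct}% lower latency")
```

That works out to about 171 FPS versus about 206 FPS, or roughly 17% lower latency for YOLOv12-M. End-to-end throughput in a real pipeline will be lower once decoding, resizing, and NMS (where applicable) are included.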
License and commercial use
| Model | Code license | Pre-trained weights | Notes |
|---|---|---|---|
| YOLO-NAS | Apache 2.0 (super-gradients) | Deci pre-trained weights license — restrictions apply; check the LICENSE.YOLONAS.md in the repo | No upstream maintenance; OK as a frozen baseline. |
| YOLOv12 | AGPL-3.0 via Ultralytics; original repo is also AGPL-3.0 | AGPL-3.0 / Ultralytics Enterprise | Closed-source products need an Ultralytics Enterprise License or a Roboflow commercial license. |
| YOLO26 | AGPL-3.0 | AGPL-3.0 / Ultralytics Enterprise | Same constraint as YOLOv12. |
How to choose: a decision tree
- Are you starting a new project today? Default to YOLO26. It is the only one of the three still under active upstream development.
- Do you need a closed-source commercial product without buying an Ultralytics Enterprise license? Then YOLO26 and YOLOv12 are out unless you go through Roboflow's commercial program. Consider YOLO-NAS (verify the weights license clause), or detectors like RF-DETR or D-FINE.
- Are you optimizing for Jetson / NPU INT8 deployment? Run YOLO-NAS-M-INT8 against YOLO26-s-INT8 on your hardware. YOLO-NAS still wins the smallest-quantization-loss metric, but the latency math has narrowed.
- Are you writing a paper that needs an attention-centric YOLO baseline? Use YOLOv12; the architecture is the cleanest reference for area-attention + R-ELAN.
- Do you need pose, OBB, or segmentation in addition to detection? YOLOv12 and YOLO26 both ship those out of the box; YOLO-NAS only ships detection + pose (via YOLO-NAS-Pose).
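The questions above can be encoded as a small function for use in a project checklist. This is one possible encoding, not an official tool: the class, field, and function names are hypothetical, the ordering of the checks is our reading of the list, and the task-coverage question (pose, OBB, segmentation) is left out for brevity.

```python
from dataclasses import dataclass

# Models the guide lists as AGPL-3.0 / Ultralytics Enterprise.
AGPL_MODELS = {"YOLOv12", "YOLO26"}


@dataclass
class Requirements:
    closed_source_without_enterprise_license: bool = False
    int8_edge_target: bool = False
    attention_research_baseline: bool = False


def shortlist(req: Requirements) -> list[str]:
    """Return the candidate detectors to benchmark, per the decision list."""
    if req.attention_research_baseline:
        candidates = ["YOLOv12"]
    elif req.int8_edge_target:
        candidates = ["YOLO-NAS", "YOLO26"]  # benchmark both on the target
    else:
        candidates = ["YOLO26"]  # default for new projects

    if req.closed_source_without_enterprise_license:
        # AGPL models drop out; fall back to the alternatives the guide names.
        candidates = [m for m in candidates if m not in AGPL_MODELS]
        for alternative in ("YOLO-NAS", "RF-DETR", "D-FINE"):
            if alternative not in candidates:
                candidates.append(alternative)
    return candidates


print(shortlist(Requirements()))
print(shortlist(Requirements(int8_edge_target=True)))
print(shortlist(Requirements(closed_source_without_enterprise_license=True)))
```

The output is a shortlist to benchmark, not a verdict. The license branch in particular still needs a human to read the actual license files.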
Independent benchmarks worth checking
Don't rely solely on vendor numbers. Useful third-party 2026 benchmark sources:
- Roboflow's "Best Object Detection Models 2026" — pits YOLOv12, YOLO26, RF-DETR, and D-FINE on COCO and on a held-out Roboflow 100 set, which exposes domain-shift weakness that COCO hides.
- arXiv:2510.09653 — Ultralytics YOLO Evolution covers YOLO26 vs YOLO11 vs YOLOv8 vs YOLOv5 with consistent training pipelines.
- arXiv:2502.12524 — YOLOv12 paper includes the latency/throughput tables on T4 and on RTX 3080.
If you're picking on accuracy alone, also evaluate the transformer-based RF-DETR (Roboflow) and D-FINE — both can beat the YOLO line on COCO at similar latency budgets in 2026, but with different training-data-efficiency tradeoffs.
Common pitfalls
- Mixing up FlashAttention requirement. The Ultralytics YOLOv12 integration does not require FlashAttention to run; it's an optional speedup. The original repo's earliest versions made it look mandatory. If you're on a P100/V100, you can still run YOLOv12, just without the FlashAttention path.
- Trying to fine-tune YOLO-NAS expecting upstream support. The repo accepts PRs only sporadically; treat your fork as the source of truth.
- Comparing YOLO-NAS COCO numbers to YOLO26 E2E numbers. YOLO-NAS still uses NMS post-processing; YOLO26 does not. When you compare latency, include or exclude NMS consistently across all models.
- Assuming AGPL is fine for SaaS. AGPL's network-use clause means a SaaS that calls Ultralytics weights server-side technically triggers source-disclosure obligations. Get legal sign-off or buy the Enterprise license.
- Trusting COCO numbers as a stand-in for your domain. COCO is dominated by people, vehicles, and household objects. For aerial, medical, or industrial domains, run all three on a held-out slice of your own data before deciding.
What we removed from the 2025 version of this post and why
- Claim that "YOLO-NAS does not support training" — outdated; super-gradients supports both training and QAT, although you should treat it as a frozen project now.
- Open question on missing FLOPs/params — Deci never published these for released checkpoints; we now state that explicitly instead of leaving "Not specified" cells.
- The "YOLOv12 is the latest" framing — superseded by YOLO26.
FAQ
Is YOLO-NAS still safe to use in production in 2026?
Yes, the released weights are stable and battle-tested. But you should not expect bug fixes, security patches, or new variants. Pin your dependency versions and budget for a migration to YOLO26 or another maintained detector within 12-24 months.
Did NVIDIA continue YOLO-NAS development after acquiring Deci?
No public continuation. Deci's team was folded into NVIDIA in April 2024 and the public super-gradients repo has not received feature work since. NVIDIA's own detector recommendations for TensorRT/DeepStream now point at YOLO11/YOLO26 and the TAO toolkit.
Is YOLOv12 better than YOLO-NAS?
On vanilla FP16 COCO accuracy and latency, yes: YOLOv12-M beats YOLO-NAS-M on mAP (52.5 vs 51.55) at lower T4 latency (4.86 ms vs 5.85 ms). On INT8 quantized deployment, YOLO-NAS still has the smallest mAP drop, which can matter for Jetson Nano/Orin.
Should I switch from YOLOv12 to YOLO26?
For new projects, yes — YOLO26 is faster on CPU, removes NMS, and is the actively maintained line. For existing pipelines that already work, the migration cost is real (YOLO26's NMS-free head changes how you parse outputs); only switch if you need the speedup or are already due for a refresh.
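To illustrate the kind of parsing change involved, the sketch below reads rows from an end-to-end detector output. The layout is an assumption made for illustration only: each row is `[x1, y1, x2, y2, score, class_id]`, padded with zero-score rows up to a fixed count. Check the actual shape and column order of your exported model before relying on it. All names here are hypothetical.

```python
from typing import NamedTuple


class Detection(NamedTuple):
    box: tuple      # (x1, y1, x2, y2) in pixels
    score: float
    class_id: int


def parse_e2e_rows(rows, conf_threshold=0.25):
    """Keep rows above the confidence threshold; no NMS step is applied.

    Assumes each row is [x1, y1, x2, y2, score, class_id] (illustrative layout).
    """
    detections = []
    for x1, y1, x2, y2, score, class_id in rows:
        if score >= conf_threshold:
            detections.append(
                Detection((x1, y1, x2, y2), float(score), int(class_id))
            )
    return detections


# Made-up output: one confident box, one weak box, one padding row.
rows = [
    [10.0, 20.0, 110.0, 220.0, 0.91, 0.0],
    [300.0, 40.0, 380.0, 120.0, 0.12, 2.0],
    [0.0, 0.0, 0.0, 0.0, 0.0, 0.0],
]
detections = parse_e2e_rows(rows)
print(detections)
```

Compared with an NMS-based pipeline, the only post-processing left is a confidence filter, which is why downstream code written around raw anchor outputs plus NMS usually needs rewriting rather than tweaking.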
What about YOLOv9, YOLOv10, YOLO11?
YOLOv9 (WongKinYiu) and YOLOv10 (Tsinghua) are research lines that introduced PGI and consistent dual-label assignment respectively. YOLO11 is Ultralytics' previous flagship — still the best-supported choice if you don't want to take on YOLO26's newer head topology.
Can I run any of these on a CPU?
All three export to ONNX and run on CPU. YOLO26-n at ~38.9 ms per frame (CPU ONNX) is the fastest of the three in the tables above. YOLOv12 is heavier on CPU because of the attention path. For Jetson-class hardware, YOLO-NAS-S in INT8 remains the strongest quantized option of the three.
Where can I follow real-world discussion of these models?
The Ultralytics YOLO26 GitHub Discussion (discussions/22214) is the highest-signal thread for shipping issues and migration tips; r/computervision has multiple ongoing licensing and benchmark threads worth scanning.
References & further reading
- Ultralytics YOLO26 — official model docs (release date, benchmarks, training recipes)
- Ultralytics YOLO12 — official model docs (current maintenance posture, FlashAttention notes)
- Ultralytics YOLO-NAS — official model docs (Deci acquisition note, mAP/latency table)
- YOLOv12: Attention-Centric Real-Time Object Detectors (arXiv:2502.12524, NeurIPS 2025)
- Ultralytics YOLO Evolution: An Overview of YOLO26, YOLO11, YOLOv8 and YOLOv5 (arXiv:2510.09653)
- Deci-AI/super-gradients — YOLO-NAS source repository
- sunsmarterjie/yolov12 — original YOLOv12 reference implementation
- Roboflow — Best Object Detection Models 2026 (RF-DETR, YOLOv12 & beyond)
- Ultralytics YOLO26 GitHub Discussion #22214 — practitioner Q&A
- Ultralytics License page — AGPL-3.0 vs Enterprise