Fix Could not load libcuda.so in xformers (2026)

Quick answer. The xformers error Could not load library libcuda.so means the dynamic loader cannot find the NVIDIA driver library. The usual causes are a missing symlink (the driver ships libcuda.so.1 but not libcuda.so), an LD_LIBRARY_PATH that omits the CUDA lib directory, or an xformers wheel built against a different CUDA than your driver exposes. Start with find / -name 'libcuda.so*' 2>/dev/null and ldconfig -p | grep libcuda before touching anything else.

What the error actually means

xformers links against the NVIDIA CUDA driver library, libcuda.so. When Python imports xformers, the dynamic linker looks for that exact filename in every directory in LD_LIBRARY_PATH, then in /etc/ld.so.cache, then in the default system library directories. If it cannot find it, you get:

RuntimeError: Could not load library libcuda.so. Error: libcuda.so: cannot open shared object file: No such file or directory

Note the distinction between libcuda.so (the NVIDIA driver) and libcudart.so (the CUDA runtime, shipped with the toolkit and with PyTorch wheels). The driver library lives outside the CUDA Toolkit and ships with your GPU driver package. PyTorch can often run without a system-wide CUDA Toolkit because it bundles its own runtime, although it still depends on the driver being loadable. xformers operators that compile or call into driver-level APIs (graph capture, Flash Attention 3, paged attention) need the real driver library.
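The lookup described above can be reproduced from Python without importing xformers. The following is a minimal sketch, assuming a Linux host: ctypes.CDLL goes through the same dlopen search as any other consumer, so a name that fails here will generally fail for xformers too. The helper name try_load is hypothetical and not part of xformers.

```python
import ctypes


def try_load(name):
    """Ask the dynamic loader for a shared library by name.

    Returns (True, "") on success or (False, message) on failure.
    """
    try:
        ctypes.CDLL(name)
        return True, ""
    except OSError as exc:
        return False, str(exc)


if __name__ == "__main__":
    # libcuda.so and libcuda.so.1 come from the driver package;
    # libcudart.so comes from the toolkit (or is bundled in a wheel).
    for name in ("libcuda.so", "libcuda.so.1", "libcudart.so"):
        ok, message = try_load(name)
        print(f"{name}: {'loaded' if ok else 'FAILED: ' + message}")
```

If libcuda.so.1 loads but libcuda.so does not, the driver is present and only the unversioned name is missing.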

Step 1: confirm what is actually on disk

Run these two commands first. They tell you which fix below applies.

find / -name 'libcuda.so*' 2>/dev/null
ldconfig -p | grep libcuda

Typical good output on a working Ubuntu host with the NVIDIA driver installed:

/usr/lib/x86_64-linux-gnu/libcuda.so
/usr/lib/x86_64-linux-gnu/libcuda.so.1
/usr/lib/x86_64-linux-gnu/libcuda.so.570.86.10

If you see only libcuda.so.1 and the unversioned libcuda.so is missing, jump to Fix 1. If you see nothing at all, jump to Fix 3 (driver not installed). If you see the files but ldconfig -p lists nothing for libcuda, jump to Fix 2.
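The triage above can be written down as a small function. This is a sketch under stated assumptions: pick_fix is a hypothetical helper, its inputs are the raw output lines of the two Step 1 commands, and paths under a stubs/ directory are ignored because the toolkit stub is not a real driver library.

```python
def pick_fix(found_paths, ldconfig_lines):
    """Map the output of the two Step 1 commands to a fix.

    found_paths: lines printed by `find / -name 'libcuda.so*'`
    ldconfig_lines: lines printed by `ldconfig -p | grep libcuda`
    """
    # Ignore the toolkit's link-time stub; it is not a real driver library.
    real = [p.strip() for p in found_paths if p.strip() and "/stubs/" not in p]
    if not real:
        return "Fix 3: driver not installed"

    # Compare file names only, so any install directory is accepted.
    names = {p.rsplit("/", 1)[-1] for p in real}
    if "libcuda.so" not in names:
        return "Fix 1: create the unversioned symlink"

    # "libcuda.so" does not match libcudart.so, so toolkit entries are skipped.
    if not any("libcuda.so" in line for line in ldconfig_lines):
        return "Fix 2: register the directory with ldconfig"

    return "files and cache look fine; continue with the later fixes"
```

For example, pick_fix(["/usr/lib/wsl/lib/libcuda.so.1"], []) points at Fix 1, because the unversioned name is absent.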

Fix 1: the missing `libcuda.so` symlink (most common)

On many systems — especially Docker base images, WSL2, and minimal cloud GPU instances — only the versioned libcuda.so.1 ships, and the unversioned link xformers wants is missing. Create it manually:

sudo ln -sf /usr/lib/x86_64-linux-gnu/libcuda.so.1 /usr/lib/x86_64-linux-gnu/libcuda.so
sudo ldconfig

Verify:

ldconfig -p | grep libcuda.so
# expect: libcuda.so (libc6,x86-64) => /usr/lib/x86_64-linux-gnu/libcuda.so

NVIDIA documents this gap in its CUDA on WSL guide and in the long-running developer forum thread — the driver intentionally omits the unversioned symlink to keep the build-time stub library from being picked up at runtime, but the side effect is that any consumer asking for the plain name fails to load.
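A more cautious variant of the symlink step can be sketched in Python. Unlike ln -sf, this hypothetical helper (ensure_unversioned_link is an illustrative name) refuses to overwrite an existing libcuda.so and defaults to a dry run, which may be preferable on shared machines. It assumes the versioned file is named libcuda.so.1, as in the commands above.

```python
import os


def ensure_unversioned_link(lib_dir, dry_run=True):
    """Return the (link, target) pair Fix 1 would create, or None if not needed."""
    target = os.path.join(lib_dir, "libcuda.so.1")
    link = os.path.join(lib_dir, "libcuda.so")
    if not os.path.exists(target):
        raise FileNotFoundError(f"{target} not found; no driver library in {lib_dir}")
    if os.path.lexists(link):
        return None  # something with that name already exists; leave it alone
    if not dry_run:
        os.symlink(target, link)  # needs write access to lib_dir (usually root)
    return link, target
```

After a real run (dry_run=False), sudo ldconfig is still needed so the loader cache picks up the new name.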

Fix 2: `LD_LIBRARY_PATH` or `ldconfig` not pointing at CUDA

If the file exists but ldconfig -p does not list it, your system library cache is stale or the CUDA lib directory was never registered. Two paths to fix it.

Per-shell (quick check):

export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
python -c 'import xformers; print(xformers.__version__)'

If that works, make it permanent by registering with ldconfig instead of relying on the env var:

echo '/usr/local/cuda/lib64' | sudo tee /etc/ld.so.conf.d/cuda.conf
sudo ldconfig
ldconfig -p | grep libcuda

On Arch the lib lives at /opt/cuda/targets/x86_64-linux/lib; on Ubuntu deb-installed CUDA it is /usr/local/cuda/lib64; on WSL2 the driver libs are at /usr/lib/wsl/lib.
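When the per-shell check is scripted, the environment for the child process can be built explicitly. The sketch below is an assumption-laden illustration (with_lib_dir is a hypothetical helper): it puts the chosen directory first, removes duplicates, and drops empty entries, which the loader would otherwise interpret as the current working directory. The loader reads LD_LIBRARY_PATH when a process starts, so the variable has to be set for a new process rather than changed inside a running interpreter.

```python
import os


def with_lib_dir(lib_dir, env=None):
    """Return a copy of env with lib_dir first in LD_LIBRARY_PATH."""
    env = dict(os.environ if env is None else env)
    current = env.get("LD_LIBRARY_PATH", "")
    # Drop empty entries (treated as the current directory by the loader)
    # and any existing copy of lib_dir so it is not listed twice.
    parts = [p for p in current.split(":") if p and p != lib_dir]
    env["LD_LIBRARY_PATH"] = ":".join([lib_dir] + parts)
    return env


# Example use (not executed here): run the import check in a child process.
#   import subprocess, sys
#   subprocess.run(
#       [sys.executable, "-c", "import xformers; print(xformers.__version__)"],
#       env=with_lib_dir("/usr/local/cuda/lib64"),
#   )
```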

Fix 3: NVIDIA driver not actually installed

If find / -name 'libcuda.so*' returns nothing, or only a path under a stubs/ directory, you do not have a driver, only (at most) the CUDA Toolkit. The toolkit ships a build-time stub at /usr/local/cuda/lib64/stubs/libcuda.so, and pointing your runtime at the stub is a common foot-gun that gives confusing errors later. Install the real driver:

# Ubuntu / Debian
sudo apt update
sudo apt install -y nvidia-driver-570  # or whichever matches your card
sudo reboot
nvidia-smi  # confirm the driver loaded

nvidia-smi must print a driver version and GPU list for xformers to work. If it does not, no userspace fix will rescue you — the kernel module did not load.
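The nvidia-smi check can also be automated, for instance in a setup script. This is a minimal sketch: driver_info is a hypothetical helper, and it relies on nvidia-smi's --query-gpu and --format=csv,noheader options. It returns None whenever the binary is missing or exits with an error, which corresponds to the "no userspace fix will rescue you" case above.

```python
import shutil
import subprocess


def driver_info(executable="nvidia-smi"):
    """Return a list of (driver_version, gpu_name) tuples, or None on failure."""
    path = shutil.which(executable)
    if path is None:
        return None  # binary not on PATH: driver package probably not installed
    try:
        result = subprocess.run(
            [path, "--query-gpu=driver_version,name", "--format=csv,noheader"],
            capture_output=True, text=True, timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return None
    if result.returncode != 0:
        return None  # typically the kernel module did not load
    # Each output line looks like "<driver version>, <GPU name>".
    rows = [line.split(", ", 1) for line in result.stdout.splitlines() if line.strip()]
    return [tuple(row) for row in rows if len(row) == 2]
```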

Fix 4: WSL2 special case

WSL2 has its own rules. Per NVIDIA’s 2026 CUDA on WSL guide:

Do not install a Linux NVIDIA driver inside WSL. The Windows host driver is exposed through /usr/lib/wsl/lib/libcuda.so.1.
Use the wsl-ubuntu apt repository (developer.download.nvidia.com/compute/cuda/repos/wsl-ubuntu/x86_64/), not the regular Ubuntu one.
Install cuda-toolkit-13-x, not the cuda meta-package — the meta-package pulls in a Linux driver that overwrites the host stub and breaks /dev/dxg.

If your WSL2 setup is missing the unversioned symlink (it usually is), use the Fix 1 command but adjust the path:

sudo ln -sf /usr/lib/wsl/lib/libcuda.so.1 /usr/lib/wsl/lib/libcuda.so
sudo ldconfig

Fix 5: Docker container missing `--gpus all` or wrong base image

The NVIDIA Container Toolkit mounts the host’s libcuda.so into the container at runtime. Without it the container has no driver library at all.

Run with explicit GPU access:

docker run --rm --gpus all nvidia/cuda:13.0.0-base-ubuntu22.04 nvidia-smi

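If nvidia-smi fails inside the container, fix the host first: install nvidia-container-toolkit, register the NVIDIA runtime with Docker, and restart the Docker daemon:

sudo apt install -y nvidia-container-toolkit
sudo nvidia-ctk runtime configure --runtime=docker  # writes the NVIDIA runtime into Docker's config
sudo systemctl restart docker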

Build on top of nvidia/cuda:*-runtime-* or nvidia/cuda:*-devel-* images, not bare ubuntu. For xformers specifically, the NGC nvcr.io/nvidia/pytorch images are the safest choice — PyTorch, CUDA, and the matching driver shims are pre-validated together.

Fix 6: xformers wheel built against the wrong CUDA

If the symlink is correct and the driver loads but xformers still complains (or you see follow-on warnings like xFormers can’t load C++/CUDA extensions), your wheel was built for a different CUDA than your driver supports. Each xformers release is pinned to a specific PyTorch build, and PyTorch is pinned to a specific CUDA — mixing indexes produces silently-broken installs.

Detect the mismatch:

python -c "import torch; print(torch.__version__, torch.version.cuda)"
nvidia-smi | grep 'CUDA Version'
python -c "import xformers; print(xformers.__version__)"

The CUDA version that nvidia-smi prints is the maximum your driver supports; torch.version.cuda must be less than or equal to it. If it is higher, or if xformers was built for a different torch than the one installed, reinstall both from the same matched PyTorch index:

# Pick the URL that matches your driver. For CUDA 12.8:
pip install --upgrade --force-reinstall \
  torch xformers \
  --index-url https://download.pytorch.org/whl/cu128

# For CUDA 13.0:
pip install --upgrade --force-reinstall \
  torch xformers \
  --index-url https://download.pytorch.org/whl/cu130

Pulling xformers from PyPI while torch is on the PyTorch index is the single most common cause of the silently-mismatched-wheel variant. Route both packages through the same --index-url.
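The comparison between torch.version.cuda and the driver's CUDA version is easy to get wrong with string comparison ("12.10" sorts before "12.8" as text). The sketch below applies the rule stated in this section using integer tuples; the function names are hypothetical, and the regular expression assumes the "CUDA Version: X.Y" field of the nvidia-smi banner.

```python
import re


def parse_version(text):
    """Turn '12.8' into (12, 8) so versions compare numerically."""
    return tuple(int(part) for part in text.strip().split("."))


def driver_cuda_from_smi(smi_output):
    """Extract the 'CUDA Version: X.Y' field from nvidia-smi's banner."""
    match = re.search(r"CUDA Version:\s*(\d+\.\d+)", smi_output)
    return match.group(1) if match else None


def torch_build_supported(torch_cuda, driver_cuda):
    """True when the CUDA version torch was built for does not exceed the driver's."""
    return parse_version(torch_cuda) <= parse_version(driver_cuda)
```

In practice torch_cuda would come from torch.version.cuda and driver_cuda from driver_cuda_from_smi applied to the nvidia-smi output.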

Fix 7: conda env not active (or the wrong one)

If you installed xformers inside a conda env but Python is resolving to the system interpreter, Python searches the system's site-packages and library paths, which do not contain the env's xformers or CUDA libraries. Verify:

which python
python -c 'import sys; print(sys.prefix)'
conda info --envs

The path printed by which python must be inside the env you installed xformers into. If it is not, run conda activate <env> and retry.
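The same check can be made from inside Python. This sketch assumes the env was activated with conda activate, which sets the CONDA_PREFIX environment variable; interpreter_matches_env is a hypothetical helper name.

```python
import os
import sys


def interpreter_matches_env(prefix=None, conda_prefix=None):
    """True when the running interpreter lives inside the active conda env."""
    prefix = sys.prefix if prefix is None else prefix
    if conda_prefix is None:
        conda_prefix = os.environ.get("CONDA_PREFIX")
    if not conda_prefix:
        return False  # no conda env is active in this shell
    # realpath resolves symlinks so equivalent paths compare equal.
    return os.path.realpath(prefix) == os.path.realpath(conda_prefix)
```

Calling interpreter_matches_env() with no arguments compares sys.prefix against the currently active env.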

Nuclear option: full xformers reinstall

When you have tried the above and want a clean slate:

pip uninstall -y xformers torch
pip cache purge

# Confirm your driver’s CUDA capability
nvidia-smi | grep 'CUDA Version'

# Install matched torch + xformers from the same index
pip install torch xformers --index-url https://download.pytorch.org/whl/cu128

xformers 0.0.35 (Feb 2026) requires PyTorch 2.10.0 and migrated to the PyTorch stable API/ABI, so any later PyTorch 2.10+ point release will keep working without rebuilding.

Verify the fix worked

Run all three:

python -c "import xformers; print(xformers.__version__)"
python -c "import torch; print(torch.cuda.is_available(), torch.cuda.get_device_name(0))"
python -c "import xformers.ops as xops, torch; q=torch.randn(1,8,64,64,device='cuda',dtype=torch.float16); print(xops.memory_efficient_attention(q,q,q).shape)"

The third command actually exercises an xformers CUDA op: if it prints a shape, the install is real. If it raises, the error message will now point at the specific missing op or library rather than the loader. Typical follow-on errors and their fixes:

libnvrtc.so: cannot open shared object file — install the CUDA Toolkit (sudo apt install cuda-nvrtc-13-0); NVRTC ships with the toolkit, not the driver.
libcudnn.so.9: cannot open shared object file — pip install nvidia-cudnn-cu13 or install cuDNN at the system level; xformers attention kernels need it.
libcuda.so.1: version 'CUDA_X' not found — driver too old for the toolkit; upgrade the driver, not the toolkit.
xFormers can’t load C++/CUDA extensions — almost always Fix 6 (wheel mismatch).
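The error-to-fix pairs above lend themselves to a small lookup, for example in a CI log scanner. This is an illustrative sketch only: suggest_fix and KNOWN_ERRORS are hypothetical names, the remedies paraphrase the list above, and the final libcuda.so entry is added here to cover the loader error this article is about. More specific substrings are listed first so they win over the generic one.

```python
# Substring of the error message -> remedy (checked in order).
KNOWN_ERRORS = [
    ("libnvrtc.so", "install the CUDA Toolkit (NVRTC ships with it, not the driver)"),
    ("libcudnn.so", "install cuDNN via pip or at the system level"),
    ("libcuda.so.1: version", "driver too old for the toolkit; upgrade the driver"),
    ("load C++/CUDA extensions", "wheel mismatch; see Fix 6"),
    ("libcuda.so", "loader cannot find the driver library; start at Step 1"),
]


def suggest_fix(message):
    """Return the first matching remedy, or None for an unrecognised message."""
    for needle, remedy in KNOWN_ERRORS:
        if needle in message:
            return remedy
    return None
```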

FAQ

Does this affect AMD GPUs?

No. libcuda.so is NVIDIA-specific. xformers ships experimental ROCm wheels (--index-url https://download.pytorch.org/whl/rocm7.1) that link against ROCm libraries instead.

Why does PyTorch work but xformers fails?

PyTorch bundles its own copy of the CUDA runtime (libcudart.so) inside the wheel, so it can run without anything CUDA-related installed system-wide as long as the driver loads. xformers operators that call into driver-level APIs — Flash Attention 3, paged attention, graph capture — need the real libcuda.so, which only the driver provides.

Can I run xformers without CUDA?

Yes, but with severe limits. CPU-only xformers exposes the building-block APIs but skips every fused attention kernel — you lose the entire performance reason for using it. For inference of a few tokens it works; for any real workload it does not.

Will pointing at the stub `libcuda.so` in `/usr/local/cuda/lib64/stubs/` work?

No. The stub is link-time only and intentionally returns errors at runtime. NVIDIA omits the unversioned symlink in /usr/lib precisely to keep the stub from being picked up; if you symlink the stub into the runtime lib path you will get worse errors a few seconds later.

Is xformers still maintained in 2026?

Yes. The latest release on PyPI is 0.0.35 (February 2026), targeting PyTorch 2.10 via the stable ABI. The project is active on GitHub with regular releases against new CUDA versions.

Do I need to set `LD_LIBRARY_PATH` at all if `ldconfig` is configured?

No, and you should not. ldconfig-managed paths in /etc/ld.so.conf.d/*.conf are the standard mechanism. LD_LIBRARY_PATH is a debugging escape hatch that papers over the real fix and silently disappears under sudo, systemd services, and cron jobs, which do not inherit your shell's environment.

Why does the error not show up in a clean Colab notebook?

Colab pre-installs the NVIDIA driver and ships matched torch/xformers wheels — the libcuda symlink is already in place. The error shows up when you upgrade torch or xformers manually from a different index than the one Colab pinned.

References

facebookresearch/xformers — canonical repo and installation docs
CUDA on WSL User Guide — the WSL2-specific rules
NVIDIA Container Toolkit install guide
NVIDIA forum: libcuda.so should be a symlink
Felix Sanz — PyTorch/CUDA/xFormers version compatibility