First, install CUDA and cuDNN.
I checked PyTorch; the supported versions listed were CUDA 12.1, 12.4 and 12.6. (This turned out to be the wrong step!!! At the time of writing (2025-07) the card only works with cu128, so feel free to skip the pitfall story and jump to the later part.)
- cuDNN download page: https://developer.nvidia.com/rdp/cudnn-archive
- CUDA Toolkit download page: https://developer.nvidia.com/cuda-toolkit-archive
I had previously installed CUDA 12.0 via nvidia-cuda-toolkit. To pair with PyTorch I wanted 12.6, so I first uninstalled the old toolkit before installing the new one:
sudo apt-get remove nvidia-cuda-toolkit
Then install from the downloaded runfile (substitute the filename of the runfile you actually downloaded; the one below is from the referenced post):
sudo sh cuda_11.1.1_455.32.00_linux.run
This can fail with "Failed to verify gcc version". I referred to the following answer:
Check the maximum supported GCC version for your CUDA version:

| CUDA version | max supported GCC version |
| --- | --- |
| 12.8 | 14 |
| 12.4, 12.5, 12.6 | 13.2 |
| 12.1, 12.2, 12.3 | 12.2 |
| 12 | 12.1 |
| 11.4.1+, 11.5, 11.6, 11.7, 11.8 | 11 |
| 11.1, 11.2, 11.3, 11.4.0 | 10 |
| 11 | 9 |
| 10.1, 10.2 | 8 |
| 9.2, 10.0 | 7 |
| 9.0, 9.1 | 6 |
| 8 | 5.3 |
| 7 | 4.9 |
| 5.5, 6 | 4.8 |
| 4.2, 5 | 4.6 |
| 4.1 | 4.5 |
| 4.0 | 4.4 |

Set an env var for that GCC version. For example, for CUDA 10.2:
MAX_GCC_VERSION=8
Make sure you have that version installed:
sudo apt install gcc-$MAX_GCC_VERSION g++-$MAX_GCC_VERSION
Add symlinks within CUDA folders:
sudo ln -s /usr/bin/gcc-$MAX_GCC_VERSION /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-$MAX_GCC_VERSION /usr/local/cuda/bin/g++
(or substitute /usr/local/cuda with your CUDA installation path, if it's not there)
See this GitHub gist for more information on the CUDA-GCC compatibility table.
In my case the install command was simply:
sudo apt install gcc
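Following the table above, for CUDA 12.6 the version-pinned variant would look roughly like the sketch below (gcc-13 is within the 13.2 limit; it assumes CUDA is already installed under the default /usr/local/cuda prefix):
MAX_GCC_VERSION=13
sudo apt install gcc-$MAX_GCC_VERSION g++-$MAX_GCC_VERSION
sudo ln -s /usr/bin/gcc-$MAX_GCC_VERSION /usr/local/cuda/bin/gcc
sudo ln -s /usr/bin/g++-$MAX_GCC_VERSION /usr/local/cuda/bin/g++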
If your existing CUDA was installed from a runfile, uninstall it like this:
# uninstall for CUDA 10.1 and later
cd /usr/local/cuda-xx.x/bin/
sudo ./cuda-uninstaller
sudo rm -rf /usr/local/cuda-xx.x
After that, install CUDA following the linked guide: choose accept, then Install (deselect the Driver option, since the driver is already installed).
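After the runfile finishes, nvcc is usually not on PATH yet. A minimal sketch of the environment setup, assuming the default /usr/local/cuda symlink (append to ~/.bashrc, then open a new shell):
export PATH=/usr/local/cuda/bin:$PATH
export LD_LIBRARY_PATH=/usr/local/cuda/lib64:$LD_LIBRARY_PATH
# sanity check: should report the toolkit version you just installed
nvcc --version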
When that finishes, continue with cuDNN: extract the downloaded archive, then copy the files into the CUDA directories:
sudo cp cuda/include/cudnn*.h /usr/local/cuda/include
sudo cp -P cuda/lib64/libcudnn* /usr/local/cuda/lib64
sudo chmod a+r /usr/local/cuda/include/cudnn*.h
sudo chmod a+r /usr/local/cuda/lib64/libcudnn*
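To double-check that the copy worked, you can read the version macros back out (recent cuDNN releases keep them in cudnn_version.h rather than cudnn.h):
grep -A 2 "#define CUDNN_MAJOR" /usr/local/cuda/include/cudnn_version.h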
(The cuDNN install steps above are adapted from: https://blog.csdn.net/m0_52650517/article/details/119838486)
Setting up the usb environment
Besides the libraries in requirements, you also need the packages below (a one-line install follows the list):
ema_pytorch==0.0.1
pandas
ruamel.yaml==0.17.21
timm==0.5.4
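For example, they can be installed in one command with the same pins:
pip install ema_pytorch==0.0.1 pandas ruamel.yaml==0.17.21 timm==0.5.4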
Problem 1: debugging this error:
RuntimeError: CUDA error: no kernel image is available for execution on the device
CUDA kernel errors might be asynchronously reported at some other API call, so the stacktrace below might be incorrect.
For debugging consider passing CUDA_LAUNCH_BLOCKING=1.
Compile with TORCH_USE_CUDA_DSA to enable device-side assertions.
However, running
import torch
torch.cuda.is_available()
returns True.
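The catch is that is_available() only checks that a CUDA device and driver can be reached; the failure only appears once a kernel is actually launched. A minimal check (the tensor values are just an arbitrary example):

import torch

print(torch.cuda.is_available())        # True even with a mismatched wheel
print(torch.version.cuda)               # CUDA version the installed wheel was built against
print(torch.cuda.get_device_name(0))
x = torch.tensor([0.12, 0.32]).cuda()   # this is where "no kernel image" is raised
print(x)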
This question (link) mentions a mismatch between the torch and CUDA versions. One answer:
I checked the latest torch and torchvision version with cuda from the given link. Stable versions list: https://download.pytorch.org/whl/cu113/torch_stable.html
Below versions solved the error:
pip install torch==1.11.0+cu113 torchvision==0.12.0+cu113 -f https://download.pytorch.org/whl/torch_stable.html
Reference: #49161
I followed this fix:
It can be that you're having older versions of torch and cuda. In that case, when you run torch.cuda.is_available() it would return True. However, if you say torch.tensor([0.12, 0.32]).cuda() it would give the mentioned error.
Even though I used the install command from the pytorch website (https://pytorch.org/get-started/locally/) it had installed an older version. So, when you run the command, add a -U after pip install to upgrade. That solved the problem for me:
pip3 install -U torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
instead of
pip3 install torch torchvision torchaudio --extra-index-url https://download.pytorch.org/whl/cu116
Source - https://qiita.com/Uenie/items/95107f79512d90f73a19
So I updated PyTorch to the latest version, and then got this instead:
The current PyTorch install supports CUDA capabilities sm_50 sm_60 sm_70 sm_75 sm_80 sm_86 sm_90.
If you want to use the NVIDIA ... 5090 D GPU with PyTorch, please check the instructions at https://pytorch.org/get-started/locally
I googled this; the reply on the PyTorch forums says cu128 is required:
You would need to install the latest stable or nightly binary with CUDA 12.8 by selecting the right CUDA version in our install matrix and copy/pasting the command to your Python environment.
The short answer is that you should do this: to use PyTorch for Windows on NVIDIA 5080, 5090 Blackwell RTX GPUs, use the latest nightly builds, or the command below.
pip install --pre torch torchvision torchaudio --index-url https://download.pytorch.org/whl/nightly/cu128
There are also several discussions on GitHub about this, for example:
From the GitHub/forum exchange with ptrblck:
Q: What exactly is the install matrix? Is it https://pytorch.org/get-started/locally/ ?
A: Yes.
Q: I did the "pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128" thing.
A: In this case you might have had an older PyTorch binary using an older CUDA toolkit, as the error message points to a PyTorch build with CUDA <=12.6.
Q: And how do I know what the right CUDA version is?
A: CUDA 12.8 is required for Blackwell.
# The 5090 series is NVIDIA's Blackwell architecture
So I had to reinstall with cu128!!!! Maddening.
Note that when those replies were written, no stable PyTorch release matched cu128 yet, only the preview/nightly builds; but I came to this late enough that a proper stable release was already out by the time I installed:
pip3 install torch torchvision torchaudio --index-url https://download.pytorch.org/whl/cu128
Once that installed cleanly, the error was gone.
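A quick smoke test I used to confirm the new build actually runs on the card (a sketch; get_arch_list() shows which compute capabilities the wheel ships kernels for, and Blackwell needs sm_120):

import torch

print(torch.__version__, torch.version.cuda)  # expect a +cu128 build here
print(torch.cuda.get_device_name(0))
print(torch.cuda.get_arch_list())             # should include sm_120 for Blackwell
x = torch.ones(4, device="cuda") * 2          # an actual kernel launch
print(x)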
PaddlePaddle
PaddlePaddle's official site offers pip installation, but only CUDA 12.9 and CUDA 12.6 builds, which conflicts with the CUDA 12.8 that PyTorch needs.
In the GitHub issue "paddlepaddle-gpu cuda12.8版 , paddleX推理报错", a PaddlePaddle collaborator (cuicheng01, on Apr 28) replied that Paddle does not currently support CUDA 12.8 but will soon, so stay tuned.
So for now I installed the CPU build; its inference speed is just barely acceptable.
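For reference, the CPU-only build comes straight from pip and can be verified with Paddle's built-in check (a sketch; pin a specific version if you need reproducibility):
pip install paddlepaddle
python -c "import paddle; paddle.utils.run_check()"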
YOLO v11
YOLO is, as always, easy to install and easy to use.
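For completeness, a minimal sketch of what that looks like with the ultralytics package (pip install ultralytics); the image path is a placeholder:

from ultralytics import YOLO

model = YOLO("yolo11n.pt")            # YOLO11 nano weights, downloaded automatically on first use
results = model("path/to/image.jpg")  # placeholder path; returns a list of Results objects
results[0].show()                     # draw and display the detections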