
Building the flash-attention module on Windows

article 2025/3/17 8:10:39

The author's environment: Windows 11, cuda=11.8.r11.8, cudnn=8.9.7, git=2.47.1, cmake=4.0.0-rc4, ninja=1.12.1, vs_buildTools=17.4.21, cl=19.34.31948, torch=2.3.1

The environment dependencies for building flash-attention are shown in the figure below.

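Check that CUDA and cuDNN are installed. For installation, see https://blog.csdn.net/m0_52111823/article/details/145379672?spm=1001.2014.3001.5501
Check which Visual Studio versions are compatible with your CUDA version:
1. Go to the official documentation archive at https://docs.nvidia.com/cuda/archive/
2. Select your CUDA version (run nvcc --version in cmd to find it)
3. Search for "Installation Guide Windows" and open it
4. Search that page for "Visual Studio"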
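To pick the right archive entry in step 2, the release number can be read from the `nvcc --version` output. The following is a minimal sketch, assuming the output contains a line of the form `... release 11.8, V11.8.89`. The sample string is illustrative, and the helper names `parse_nvcc_release` and `local_cuda_release` are hypothetical.

```python
import re
import subprocess


def parse_nvcc_release(text):
    """Return the CUDA release as a (major, minor) tuple, or None if not found."""
    match = re.search(r"release\s+(\d+)\.(\d+)", text)
    if match is None:
        return None
    return int(match.group(1)), int(match.group(2))


def local_cuda_release():
    """Run `nvcc --version` and parse its output; None if nvcc cannot be run."""
    try:
        result = subprocess.run(
            ["nvcc", "--version"], capture_output=True, text=True, check=False
        )
    except OSError:
        # nvcc is not on PATH (or cannot be executed)
        return None
    return parse_nvcc_release(result.stdout)


if __name__ == "__main__":
    # Illustrative sample line, not captured from the author's machine
    sample = "Cuda compilation tools, release 11.8, V11.8.89"
    print(parse_nvcc_release(sample))
    print(local_cuda_release())
```

The (major, minor) pair is what you look for in the archive list before opening the matching Installation Guide for Windows.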
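Install the matching version of vs_buildTools:
1. Go to https://learn.microsoft.com/en-us/visualstudio/releases/2022/release-history#fixed-version-bootstrappers
2. Search for the version you want to install; make sure to pick the LTSC release
3. Download the corresponding Build Tools
Install MSVC. See https://blog.csdn.net/m0_52111823/article/details/146292712?spm=1001.2014.3001.5502
Download the cmake, git, and ninja installers, and add their install paths to the PATH environment variable.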

Verify that everything is installed by running each of the following commands:

cl
cmake --version
git --version
ninja --version
nvcc --version
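The same check can be scripted so that a missing tool is reported in one pass. This is a minimal sketch that only tests whether each executable can be found on PATH, not whether its version is compatible. The names `REQUIRED_TOOLS` and `locate_tools` are hypothetical. Note that `cl` is usually only on PATH inside a Visual Studio developer command prompt, unless its directory was added manually.

```python
import shutil

# Tools the build relies on, as listed in the commands above
REQUIRED_TOOLS = ["cl", "cmake", "git", "ninja", "nvcc"]


def locate_tools(names):
    """Map each tool name to its resolved path, or None if it is not on PATH."""
    return {name: shutil.which(name) for name in names}


if __name__ == "__main__":
    for name, path in locate_tools(REQUIRED_TOOLS).items():
        status = path if path else "NOT FOUND (check the PATH environment variable)"
        print(f"{name:6s} {status}")
```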

Clone the flash-attention repository locally: git clone https://github.com/Dao-AILab/flash-attention.git
Create a new conda virtual environment, activate it, and install torch 2.3.1.
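Before building, it can help to confirm that the active environment really has a CUDA build of PyTorch, since the CUDA version PyTorch was built against should generally match the local toolkit. This is a minimal sketch; the helper name `describe_torch` is hypothetical, and it returns None when torch is not installed in the current environment.

```python
import importlib.util


def describe_torch():
    """Return torch/CUDA build info, or None if torch is not installed."""
    if importlib.util.find_spec("torch") is None:
        return None
    import torch

    return {
        "torch": torch.__version__,
        "cuda_build": torch.version.cuda,  # None for CPU-only builds
        "cuda_available": torch.cuda.is_available(),
    }


if __name__ == "__main__":
    print(describe_torch())
```

With the environment described above, one would expect the `torch` entry to start with 2.3.1 and `cuda_build` to report 11.8.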
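In setup.py, change max_num_jobs_cores to suit the number of CPU cores on your machine; a larger value runs more compile jobs in parallel and shortens the build.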
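How large a value is safe also depends on memory, because each parallel compile job can consume several gigabytes of RAM. The sketch below shows one way to choose a job count bounded by both cores and free memory. The helper `pick_max_jobs` is hypothetical and not part of setup.py, and both the 9 GB-per-job default and the choice to use half of the cores are assumptions to adjust for your machine. The upstream README also describes a MAX_JOBS environment variable for limiting parallel jobs; whether it takes precedence over an edited value depends on the setup.py version.

```python
import os


def pick_max_jobs(cpu_count, free_memory_gb, gb_per_job=9.0):
    """Pick a parallel job count bounded by both cores and free memory."""
    # Assumption: use half of the cores so the machine stays responsive
    jobs_by_cores = max(1, cpu_count // 2)
    # Assumption: each job may need roughly gb_per_job GB at peak
    jobs_by_memory = int(free_memory_gb // gb_per_job)
    return max(1, min(jobs_by_cores, jobs_by_memory))


if __name__ == "__main__":
    cores = os.cpu_count() or 1
    # 32 GB of free memory is a placeholder; substitute the real figure
    print(pick_max_jobs(cores, free_memory_gb=32.0))
```

If the build runs out of memory or the machine starts swapping, lowering the value is usually the first thing to try.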
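Start the build from the repository root: python setup.py bdist_wheel
When the build finishes, the generated dist folder contains the wheel file flash_attn-2.7.4.post1-cp310-cp310-win_amd64.whl
Install the wheel: pip install ./dist/flash_attn-2.7.4.post1-cp310-cp310-win_amd64.whl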
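The wheel filename encodes which interpreter and platform it targets: here cp310 means CPython 3.10 and win_amd64 means 64-bit Windows, so pip will refuse it under a different Python version. This is a minimal sketch of reading those tags, assuming the common five-part filename without the optional build tag. The helper names `parse_wheel_tags` and `matches_running_python` are hypothetical.

```python
import sys


def parse_wheel_tags(filename):
    """Split a wheel filename into name, version, python, abi and platform tags."""
    stem = filename[: -len(".whl")] if filename.endswith(".whl") else filename
    parts = stem.split("-")
    if len(parts) != 5:
        # Filenames with an optional build tag have six parts; not handled here
        raise ValueError(f"unexpected wheel filename: {filename}")
    keys = ("name", "version", "python", "abi", "platform")
    return dict(zip(keys, parts))


def matches_running_python(tags):
    """True if the wheel's CPython tag equals the running interpreter's version."""
    current = f"cp{sys.version_info.major}{sys.version_info.minor}"
    return tags["python"] == current


if __name__ == "__main__":
    tags = parse_wheel_tags("flash_attn-2.7.4.post1-cp310-cp310-win_amd64.whl")
    print(tags)
    print(matches_running_python(tags))
```

Running it inside the conda environment used for the build should report a match; a mismatch suggests the wheel was built with a different interpreter than the one now active.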