Running a PyTorch Model on the GPU from Multiple Processes on Linux
Let's start with some code:
import concurrent.futures
import torch

device = "cuda"
model = torch.nn.Linear(20, 30)
model.to(device)

def exec(v):
    input = torch.randn(128, 20).to(device)
    output = model(input)
    return v

if __name__ == '__main__':
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        for x in executor.map(exec, range(10)):
            print(x)
This code tries to run a PyTorch network in parallel across multiple processes. On Linux it fails with the following error:
RuntimeError: Cannot re-initialize CUDA in forked subprocess. To use CUDA with multiprocessing, you must use the 'spawn' start method
Another possible outcome is a deadlock, which seems to depend on the model size. Either way, the code above does not run correctly.
The root cause lies in how Python's multiprocessing starts worker processes. Python supports three start methods: spawn, fork, and forkserver. The CUDA runtime does not support fork. Unfortunately, fork is the default start method on POSIX systems other than macOS.
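You can inspect which start methods your platform supports, and which one is currently in effect, directly from the multiprocessing module; a minimal check (no CUDA needed):

```python
import multiprocessing as mp

# All start methods available on this platform.
# On Linux this is typically ['fork', 'spawn', 'forkserver'].
print(mp.get_all_start_methods())

# The method currently in effect: the platform default
# unless it was changed with set_start_method().
print(mp.get_start_method())
```

On Linux this prints 'fork' as the current method, which is exactly why the code above breaks once CUDA has been initialized in the parent.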
Once the cause is known, the fix is simple: set the multiprocessing start method explicitly.
import concurrent.futures
import torch
import multiprocessing as mp

device = "cuda"
model = torch.nn.Linear(20, 30)
model.to(device)

def exec(v):
    input = torch.randn(128, 20).to(device)
    output = model(input)
    return v

if __name__ == '__main__':
    mp.set_start_method('spawn')
    with concurrent.futures.ProcessPoolExecutor(max_workers=5) as executor:
        for x in executor.map(exec, range(10)):
            print(x)
This time it runs correctly:
0
1
2
3
4
5
6
7
8
9
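One caveat: set_start_method() changes the start method process-wide and can only be called once. A more scoped alternative is to pass an mp_context to the executor so only this pool uses spawn; a sketch of the same toy example (with a CPU fallback added so it also runs without a GPU):

```python
import concurrent.futures
import multiprocessing as mp
import torch

# Fall back to CPU so the sketch also runs on machines without CUDA.
device = "cuda" if torch.cuda.is_available() else "cpu"
model = torch.nn.Linear(20, 30).to(device)

def run(v):
    x = torch.randn(128, 20).to(device)
    model(x)
    return v

if __name__ == '__main__':
    # mp_context confines the 'spawn' start method to this executor,
    # leaving the global start method untouched.
    ctx = mp.get_context('spawn')
    with concurrent.futures.ProcessPoolExecutor(max_workers=5,
                                                mp_context=ctx) as executor:
        for x in executor.map(run, range(10)):
            print(x)
```

This avoids surprises when other libraries in the same process also rely on multiprocessing and expect the default start method.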
Reference:
Multiprocessing best practices — PyTorch 2.6 documentation
multiprocessing — Process-based parallelism — Python 3.13.2 documentation