当前位置：首页 > article >正文

Tensorflow 2.0 GPU的使用与限制使用率及虚拟多GPU

article 2025/3/13 19:43:34

Tensorflow 2.0 GPU的使用与限制使用率及虚拟多GPU

1. 获得当前主机上特定运算设备的列表
2. 设置当前程序可见的设备范围
3. 显存的使用
4. 单GPU模拟多GPU环境

先插入一行简单代码，以下复制即可用来设置GPU使用率：

import tensorflow as tf
import numpy as np

print(tf.__version__)
import os

# 设置可使用的 gpu 序号
os.environ['CUDA_VISIBLE_DEVICES'] = '0'
# 用来设置是否在特殊情况下在cpu上进行计算
tf.config.set_soft_device_placement = False
# 
tf.config.experimental.set_memory_growth = True
gpus = tf.config.experimental.list_physical_devices('GPU')

print(gpus)

if gpus:
    tf.config.experimental.set_virtual_device_configuration(gpus[0],
                                                           [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])
    
    logical_gpus = tf.config.experimental.list_logical_devices('GPU')
    print(len(gpus), len(logical_gpus), 'Logical gpus')
# tf.debugging.set_log_device_placement(True)
# loggpus = config.experimental.list_logical_devices()
# strategy = tf.distribute.MirroredStrategy()
with tf.device('/device:GPU:0'):
    w = tf.constant([[2, -3.4]])
    b = tf.constant([4.2])
    x = tf.random.normal([1000, 2], mean=0, stddev=10)
    e = tf.random.normal([1000, 2], mean=0, stddev=0.1)
    W = tf.Variable(tf.constant([5, 1]))
    B = tf.Variable(tf.constant([1]))

1. 获得当前主机上特定运算设备的列表

#　获取当前物理gpu
gpus = tf.config.experimental.list_physical_devices(device_type='GPU')
# 获取当前物理cpu
cpus = tf.config.experimental.list_physical_devices(device_type='CPU')
print(gpus, cpus)
# 获取当前虚拟gpu个数
logical_gpus = tf.config.experimental.list_logical_devices('GPU')

2. 设置当前程序可见的设备范围

默认情况下 TensorFlow 会使用其所能够使用的所有 GPU

tf.config.experimental.set_visible_devices(devices=gpus[2:4], device_type='GPU')

设置之后，当前程序只会使用自己可见的设备，不可见的设备不会被当前程序使用。

另一种方式是使用环境变量 CUDA_VISIBLE_DEVICES 也可以控制程序所使用的 GPU。
在终端输入

export CUDA_VISIBLE_DEVICES=2,3

或者在代码里加入

import os
os.environ['CUDA_VISIBLE_DEVICES'] = "2,3"

3. 显存的使用

默认情况下，TensorFlow 将使用几乎所有可用的显存，以避免内存碎片化所带来的性能损失。

但是TensorFlow 提供两种显存使用策略，让我们能够更灵活地控制程序的显存使用方式：

仅在需要时申请显存空间（程序初始运行时消耗很少的显存，随着程序的运行而动态申请显存）；
限制消耗固定大小的显存（程序不会超出限定的显存大小，若超出的报错）。

设置仅在需要时申请显存空间。

for gpu in gpus:
    tf.config.experimental.set_memory_growth(gpu, True)

下面的方式是设置Tensorflow固定消耗GPU:0的2GB显存。

tf.config.experimental.set_virtual_device_configuration(
    gpus[0],
    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)]
)

4. 单GPU模拟多GPU环境

上面的方式不仅可以设置显存的使用，还可以在只有单GPU的环境模拟多GPU进行调试。

tf.config.experimental.set_virtual_device_configuration(
    gpus[0],
    [tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048),
     tf.config.experimental.VirtualDeviceConfiguration(memory_limit=2048)])