What does the term “unused memory” mean in PyTorch?
In the context of the PyTorch documentation, the term “unused memory” refers to GPU memory that is no longer actively used by any PyTorch tensors but is still managed by PyTorch’s caching memory allocator. Here’s how it works:
Caching Allocator in PyTorch:
PyTorch uses a caching memory allocator to speed up memory management. When PyTorch allocates GPU memory for tensors, it doesn’t immediately release that memory back to the system when the tensors are no longer needed (e.g., after being deleted or going out of scope). Instead, PyTorch keeps the memory in its internal cache for future use.
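As a minimal sketch (assuming a CUDA device is available), the reuse is easy to observe: deleting a tensor and then allocating one of the same size does not grow the pool PyTorch has reserved, because the cached block is simply handed back out.
import torch
x = torch.randn(1024, 1024, device='cuda')  # first allocation triggers a real request to the driver
reserved_before = torch.cuda.memory_reserved()
del x  # the tensor is gone, but its block stays in the cache
y = torch.randn(1024, 1024, device='cuda')  # same size: served from the cache
print(torch.cuda.memory_reserved() == reserved_before)  # True: no new memory was requested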
“Unused” Memory:
- This “unused” memory is memory that is no longer being used by any active tensors but is kept in the cache by PyTorch.
- From PyTorch’s perspective, this memory is available for reuse, so future tensor allocations can use it without requiring a new memory allocation from the GPU.
- However, from the perspective of the GPU driver (as shown by tools like nvidia-smi), this memory still appears to be in use. This is because PyTorch hasn’t actually released it back to the system; it is still reserved by PyTorch’s caching allocator, as the sketch below illustrates.
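Here is a small sketch of that discrepancy (assuming a CUDA device; torch.cuda.mem_get_info() reports the driver-level free/total memory, which is the same view nvidia-smi works from):
import torch
a = torch.randn(4096, 4096, device='cuda')  # a 64 MiB tensor
free_before, total = torch.cuda.mem_get_info()  # the driver's view of free memory
del a
free_after, _ = torch.cuda.mem_get_info()
print(torch.cuda.memory_allocated())  # 0 (assuming no other tensors are alive)
print(free_after == free_before)  # True: the driver's free memory did not grow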
Why Does This Happen?
- The caching allocator avoids the overhead of frequent memory allocation and deallocation. Allocating and freeing memory directly on the GPU can be slow and may involve device synchronization. By caching memory, PyTorch can make future allocations faster.
- The tradeoff is that nvidia-smi gives the impression that the memory is still actively used, even though it’s not being used by any tensor at that moment. The timing sketch after this list shows what the cache buys you.
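A rough timing sketch makes the tradeoff concrete (exact numbers depend on the GPU, driver, and allocation size; this only illustrates the pattern):
import time
import torch

def alloc_time():
    torch.cuda.synchronize()
    t0 = time.perf_counter()
    t = torch.empty(4096, 4096, device='cuda')
    torch.cuda.synchronize()
    elapsed = time.perf_counter() - t0
    del t  # the block goes back into the cache
    return elapsed

alloc_time()  # warm-up: populates the cache with one 64 MiB block
cached = alloc_time()  # served straight from the cache
torch.cuda.empty_cache()  # release the cached block to the driver
uncached = alloc_time()  # forces a fresh allocation from the driver
print(f"cached: {cached * 1e6:.0f} us, uncached: {uncached * 1e6:.0f} us")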
How to Free This “Unused” Memory:
- You can call torch.cuda.empty_cache() to release unused cached memory back to the system. This frees up memory for other GPU processes and reduces the amount of memory that appears as “used” in nvidia-smi.
- However, note that torch.cuda.empty_cache() does not release memory occupied by active tensors. Only cached (unused) memory is released.
Example:
import torch
# Allocate some GPU memory
a = torch.randn(1000, 1000, device='cuda')
# This memory is actively used by the tensor 'a'
print(torch.cuda.memory_allocated()) # Shows memory occupied by tensors
print(torch.cuda.memory_reserved()) # Shows total memory managed by PyTorch
# Free the tensor
del a
# At this point, the memory used by 'a' is now "unused" (cached by PyTorch)
print(torch.cuda.memory_allocated()) # Now 0 (no active tensor memory)
print(torch.cuda.memory_reserved()) # Still non-zero (cached memory)
# Clear the cache
torch.cuda.empty_cache()
# Now memory_reserved() should be reduced
print(torch.cuda.memory_reserved()) # Close to 0 (if no other allocations exist)
Key Takeaways:
- Unused memory: Memory no longer used by active tensors but still reserved in PyTorch’s cache for future use.
- Why it looks “used” in nvidia-smi: Because PyTorch hasn’t returned it to the GPU driver.
- How to release it: Use torch.cuda.empty_cache(), but this is generally not recommended unless you want to make memory available for other GPU applications.