CudaStresser

CudaStresser is a Python-based tool for stress testing CUDA-enabled GPUs. It creates and manipulates tensors to load the CUDA cores, repeatedly loads and unloads VRAM to exercise memory bandwidth, and logs GPU utilization, memory usage, and temperature while the test runs. It is aimed at developers, researchers, and system administrators who want to benchmark or stress test their GPU hardware.

Features

  • CUDA Core Stress Testing: Create and manipulate tensors on the GPU to stress test the CUDA cores.
  • VRAM Load and Unload: Simulate heavy memory operations to test GPU memory bandwidth and consistency.
  • Real-time GPU Monitoring: Log GPU utilization, memory usage, and temperature during the stress test.
  • Multiprocessing: Use multiple CPU processes to run GPU stress workloads concurrently.

Requirements

  • Python 3.7+
  • PyTorch 1.7+
  • CUDA-enabled GPU(s)
  • NVIDIA drivers with nvidia-smi available

Installation

  1. Clone the repository:

    git clone https://github.com/your-username/cuda-stresser.git
    cd cuda-stresser
    
  2. Create and activate a virtual environment (optional but recommended):

    python3 -m venv venv
    source venv/bin/activate  # On Windows: venv\Scripts\activate
    
  3. Install the required Python packages:

    pip install torch
    
  4. Ensure nvidia-smi is available:

    Make sure nvidia-smi is on your system's PATH. It is usually installed together with the NVIDIA drivers.
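The availability check in step 4 can also be done from Python. The following is a minimal sketch and not part of CudaStresser: the helper names `nvidia_smi_path` and `list_gpus` are hypothetical, and the only external call is `nvidia-smi -L`, which lists the GPUs the driver can see.

```python
import shutil
import subprocess


def nvidia_smi_path():
    """Return the full path to nvidia-smi, or None if it is not on PATH."""
    return shutil.which("nvidia-smi")


def list_gpus():
    """Return the non-empty lines printed by `nvidia-smi -L`.

    An empty list means nvidia-smi is missing, timed out, or reported an error.
    (Hypothetical helper for illustration; not part of CudaStresser.)
    """
    path = nvidia_smi_path()
    if path is None:
        return []
    try:
        result = subprocess.run(
            [path, "-L"], capture_output=True, text=True, timeout=10
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    return [line for line in result.stdout.splitlines() if line.strip()]
```

If `list_gpus()` returns an empty list, the driver installation or PATH is probably worth checking before running the stress test.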

Usage

Basic Stress Test

To perform a basic CUDA stress test:

python cuda_stresser.py

This command will:

  • Detect available CUDA devices.
  • Stress test the CUDA cores by creating and manipulating tensors for 60 seconds.
  • Log GPU utilization, memory usage, and temperature every 5 seconds.
  • Display the progress and results in the console.
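For readers curious how the periodic logging step might work, here is a minimal sketch that polls nvidia-smi through its CSV query interface. It is an assumption about the approach, not the code in cuda_stresser.py; `QUERY_FIELDS`, `parse_gpu_line`, `query_gpus`, and `poll` are hypothetical names.

```python
import subprocess
import time

# Field names accepted by `nvidia-smi --query-gpu`.
QUERY_FIELDS = ["utilization.gpu", "memory.used", "memory.total", "temperature.gpu"]


def parse_gpu_line(line):
    """Parse one CSV row from nvidia-smi into a dict keyed by field name.

    Values that are not numeric (for example "[N/A]") are stored as None.
    """
    record = {}
    for name, raw in zip(QUERY_FIELDS, line.split(",")):
        raw = raw.strip()
        try:
            record[name] = float(raw)
        except ValueError:
            record[name] = None
    return record


def query_gpus():
    """Return one parsed record per GPU reported by nvidia-smi."""
    cmd = [
        "nvidia-smi",
        "--query-gpu=" + ",".join(QUERY_FIELDS),
        "--format=csv,noheader,nounits",
    ]
    result = subprocess.run(cmd, capture_output=True, text=True, check=True)
    return [parse_gpu_line(line) for line in result.stdout.splitlines() if line.strip()]


def poll(duration=60, interval=5):
    """Collect GPU records every `interval` seconds for about `duration` seconds."""
    samples = []
    deadline = time.monotonic() + duration
    while time.monotonic() < deadline:
        samples.append(query_gpus())
        time.sleep(interval)
    return samples
```

With `nounits`, memory values are reported in MiB, utilization in percent, and temperature in degrees Celsius.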

Custom Stress Test

You can customize the test parameters using the CudaStresser class:

from cuda_stresser import CudaStresser

# Initialize the stresser with 99% VRAM load
stresser = CudaStresser(load_perc=0.99)

# Perform a stress test for 60 seconds with 15 tensors
results = stresser.cuda_stress(timing=60, tensor_num=15)
print(results)

# Perform a VRAM load/unload test for 60 seconds
vram_results = stresser.cuda_load_unload(timing=60)
print(vram_results)
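The internals of `cuda_stress` are not shown here. As a rough illustration of what "creating and manipulating tensors" for a fixed duration can look like in PyTorch, the sketch below runs repeated matrix multiplications until a deadline. The function `matmul_stress` and its `size` parameter are hypothetical and are not the CudaStresser implementation; only `tensor_num` mirrors a documented parameter name.

```python
import time


def matmul_stress(seconds=5.0, size=2048, tensor_num=4):
    """Multiply square tensors in a loop for roughly `seconds` seconds.

    Returns the number of completed passes over all tensors.
    (Illustrative sketch only; not the CudaStresser implementation.)
    """
    import torch  # imported here so the module loads even without PyTorch

    # Fall back to the CPU so the sketch still runs on machines without CUDA.
    device = torch.device("cuda" if torch.cuda.is_available() else "cpu")
    tensors = [torch.rand(size, size, device=device) for _ in range(tensor_num)]

    deadline = time.monotonic() + seconds
    passes = 0
    while time.monotonic() < deadline:
        for i in range(tensor_num):
            # tanh keeps values bounded so repeated products stay finite.
            tensors[i] = torch.tanh(tensors[i] @ tensors[(i + 1) % tensor_num])
        if device.type == "cuda":
            # CUDA kernels run asynchronously; wait so the timing is honest.
            torch.cuda.synchronize()
        passes += 1
    return passes
```

Larger `size` values shift the load toward compute, while a larger `tensor_num` mainly increases memory use.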

Logging GPU Information

During the stress test, the script logs GPU utilization, memory usage, and temperature. The log is printed to the console; to keep it, modify the script to write the data to a file.
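One way to save the data is to append each sample to a JSON Lines file. The sketch below assumes each sample is a plain dict of JSON-serializable values; the actual structure returned by CudaStresser is not documented here, so treat `append_samples` and `read_samples` as hypothetical helpers to adapt.

```python
import json
import time


def append_samples(path, samples):
    """Append each sample dict to `path` as one JSON object per line.

    Adds a `logged_at` Unix timestamp if the sample has none.
    Returns the number of records written.
    """
    written = 0
    with open(path, "a", encoding="utf-8") as fh:
        for sample in samples:
            record = dict(sample)  # copy so the caller's dict is not modified
            record.setdefault("logged_at", time.time())
            fh.write(json.dumps(record) + "\n")
            written += 1
    return written


def read_samples(path):
    """Load every record previously written by `append_samples`."""
    with open(path, encoding="utf-8") as fh:
        return [json.loads(line) for line in fh if line.strip()]
```

Appending rather than overwriting means a long test that is interrupted still leaves the samples collected so far on disk.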

Customization

Parameters

  • load_perc: Fraction of VRAM to load, between 0 and 1 (default: 0.99, i.e. 99%).
  • timing: Duration of the stress test in seconds (default: 60).
  • tensor_num: Number of tensors to create for the stress test (default: 1000).
  • poll_time: Interval for logging GPU data in seconds (default: 5).
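To get a feel for how these parameters translate into numbers, the following sketch does the arithmetic. It assumes `load_perc` is applied to the device's total memory and that one log entry is taken per `poll_time` interval; neither detail is confirmed by the tool, and the helper names are hypothetical.

```python
def vram_target_bytes(total_bytes, load_perc):
    """Bytes to allocate when loading `load_perc` (0 to 1) of total VRAM."""
    if not 0.0 < load_perc <= 1.0:
        raise ValueError("load_perc must be in the range (0, 1]")
    return int(total_bytes * load_perc)


def float32_elements(n_bytes):
    """Number of 4-byte float32 values that fit in `n_bytes`."""
    return n_bytes // 4


def expected_samples(timing, poll_time):
    """Approximate number of log entries for a test of `timing` seconds."""
    if poll_time <= 0:
        raise ValueError("poll_time must be positive")
    return timing // poll_time


if __name__ == "__main__":
    total = 8 * 1024**3  # an 8 GiB card, chosen only as an example
    target = vram_target_bytes(total, 0.99)
    print("bytes to allocate:", target)
    print("float32 elements:", float32_elements(target))
    print("log entries for defaults:", expected_samples(60, 5))
```

In practice the driver and the CUDA context reserve some memory, so a value as high as 0.99 may not always be allocatable.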

Example with Custom Parameters

stresser = CudaStresser(load_perc=0.90)
gpu_log = stresser.cuda_stress(timing=120, tensor_num=20, poll_time=10)
print("GPU Log:", gpu_log)

License

This project is licensed under the MIT License. See the LICENSE file for details.

Acknowledgements

Special thanks to the developers and contributors of PyTorch for providing an excellent framework for machine learning and GPU computing.


Note: This tool is intended for testing and benchmarking purposes. Use it responsibly, especially on production systems, as it can put significant load on your hardware.