LLM Training with Nvidia Blackwell Architecture RTX 5080 & 5090
The new Blackwell architecture currently produces errors during training with up-to-date LLM fine-tuning tools. In this article, I describe how I set up a working training environment on an RTX 5080.
At the moment, the RTX 50 Series causes errors on Proxmox 8.3 that prevent the system from running. So in my home AI lab I installed Windows 11 instead and ran my LLM fine-tuning with Docker and WSL 2.
First, install Docker Desktop and WSL 2 on your Windows machine. Then, use the following command to pull the PyTorch image onto your machine.
docker pull pytorch/pytorch
Then, run the following command to start a container from the image with access to your GPU.
docker run --gpus all -it pytorch/pytorch
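Before installing anything, it can help to confirm that the container actually sees the GPU. The following is a minimal sketch, not part of my original setup: it assumes nvidia-smi is exposed inside the container (which is normally the case when --gpus all is passed), and the helper name visible_gpus is my own.

```python
import shutil
import subprocess


def visible_gpus():
    """Return the GPU names reported by nvidia-smi, or [] if none are visible."""
    # If nvidia-smi is not on PATH, the GPU was most likely not passed through.
    if shutil.which("nvidia-smi") is None:
        return []
    try:
        result = subprocess.run(
            ["nvidia-smi", "--query-gpu=name", "--format=csv,noheader"],
            capture_output=True,
            text=True,
            timeout=30,
        )
    except (OSError, subprocess.TimeoutExpired):
        return []
    if result.returncode != 0:
        return []
    # One GPU name per non-empty output line.
    return [line.strip() for line in result.stdout.splitlines() if line.strip()]


if __name__ == "__main__":
    gpus = visible_gpus()
    print("Visible GPUs:", gpus if gpus else "none")
```

If the list comes back empty, check that GPU support is enabled for the WSL 2 backend in Docker Desktop before going further.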
Once the container is running, install Transformers and the related fine-tuning libraries:
pip install transformers datasets peft accelerate bitsandbytes torch
Then, remove the existing Torch packages, install a nightly Torch build for CUDA 12.8, and add the compiler toolchain with the following commands:
pip uninstall -y torch torchvision torchaudio
pip install --upgrade pip setuptools wheel
pip install --pre torch --index-url https://download.pytorch.org/whl/nightly/cu128
apt-get update && apt-get install -y build-essential cmake
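To check that the nightly build really targets the card, you can compare the GPU's compute capability with the architectures the installed Torch was compiled for. This is a rough sketch under a few assumptions: the function name build_supports_gpu is hypothetical, the GPU is device 0, and I am assuming the RTX 5080 and 5090 report compute capability 12.0 (sm_120).

```python
def build_supports_gpu(arch_list, capability):
    """Return True if the build's arch list contains the GPU's sm_XY target."""
    major, minor = capability
    return f"sm_{major}{minor}" in arch_list


def main():
    try:
        import torch
    except ImportError:
        print("PyTorch is not installed in this environment")
        return

    print("torch", torch.__version__, "built for CUDA", torch.version.cuda)
    if not torch.cuda.is_available():
        print("CUDA is not available; check the --gpus flag and the driver")
        return

    capability = torch.cuda.get_device_capability(0)  # assumed (12, 0) on Blackwell
    arch_list = torch.cuda.get_arch_list()            # e.g. ['sm_90', 'sm_120', ...]
    print("GPU:", torch.cuda.get_device_name(0), "capability:", capability)
    print("Compiled architectures:", arch_list)

    if build_supports_gpu(arch_list, capability):
        # A small matmul fails quickly if no kernel exists for this GPU.
        x = torch.rand(1024, 1024, device="cuda")
        print("Test matmul finished, checksum:", (x @ x).sum().item())
    else:
        print("This Torch build has no kernels for the detected GPU")


if __name__ == "__main__":
    main()
```

If the script reports that the build has no kernels for the GPU, the older Torch is probably still the one being imported, and the uninstall and nightly install steps are worth repeating.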
Your environment is ready! Enjoy your fine-tuning…