Monitoring GPU metrics on a Windows machine using Zabbix involves several steps, including setting up the Zabbix Agent, gathering GPU data, and configuring Zabbix to collect and display these metrics. Below is a detailed guide to achieve this.
Step-by-Step Guide to Monitor Windows GPU with Zabbix
Prerequisites
- Zabbix Server: Ensure you have a Zabbix server installed and running.
- Zabbix Agent: Install the Zabbix Agent on the Windows machine where you want to monitor the GPU.
Step 1: Install Zabbix Agent on Windows
1. Download the Zabbix Agent:
Download the Zabbix Agent installer from the official Zabbix website: Zabbix Downloads.
2. Install the Zabbix Agent:
Follow the installation wizard and configure the agent:
- Server: Enter the IP address or hostname of your Zabbix server.
- ServerActive: Again, enter the Zabbix server’s address.
- Hostname: Set the hostname of the Windows machine as it will appear in Zabbix.
You can also configure these parameters later in the zabbix_agentd.conf file, typically located in C:\Program Files\Zabbix Agent\.
3. Start the Zabbix Agent:
After installation, start the Zabbix Agent service through the Services management console (services.msc) or from the command line:
net start "Zabbix Agent"
Step 2: Gather GPU Metrics on Windows
To monitor GPU metrics such as utilization, temperature, and memory usage, you can use Windows Performance Counters, vendor tools such as nvidia-smi, and custom scripts that collect the required data.
1. Windows Performance Counters:
Windows provides GPU performance counters that can be accessed via PowerShell or WMI (Windows Management Instrumentation). Common counter sets for GPUs include:
- GPU Engine
- GPU Adapter Memory
These counters provide various metrics such as GPU utilization and memory usage.
2. NVIDIA SMI (for NVIDIA GPUs):
If you have an NVIDIA GPU, you can use the NVIDIA System Management Interface (nvidia-smi) to query GPU metrics. Download and install the NVIDIA driver, which includes nvidia-smi. Use the command:
nvidia-smi --query-gpu=name,utilization.gpu,temperature.gpu,memory.used,memory.total --format=csv
This command outputs GPU details that can be captured by Zabbix.
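To illustrate what a collector has to do with this output, here is a minimal Python sketch that parses the CSV text into one dictionary per GPU. The sample output is illustrative only (the GPU name and values are made up), and the function name parse_nvidia_smi_csv is a hypothetical helper, not part of nvidia-smi or Zabbix.

```python
import csv
import io

# Illustrative sample only: real output depends on the GPU and driver.
SAMPLE_OUTPUT = """name, utilization.gpu [%], temperature.gpu, memory.used [MiB], memory.total [MiB]
NVIDIA GeForce RTX 3080, 12 %, 45, 1024 MiB, 10240 MiB
"""


def parse_nvidia_smi_csv(text):
    """Return one dict per GPU, keyed by the header names without units."""
    reader = csv.reader(io.StringIO(text), skipinitialspace=True)
    rows = [row for row in reader if row]
    # Header cells look like "memory.used [MiB]"; keep only the field name.
    header = [cell.split(" [")[0].strip() for cell in rows[0]]
    gpus = []
    for row in rows[1:]:
        gpu = {}
        for key, cell in zip(header, row):
            value = cell.strip()
            if key != "name":
                # Drop the trailing unit ("12 %" -> "12") and convert to a number.
                # Values that are not numeric (for example "[N/A]") become None.
                try:
                    value = float(value.split()[0])
                except (ValueError, IndexError):
                    value = None
            gpu[key] = value
        gpus.append(gpu)
    return gpus


if __name__ == "__main__":
    for gpu in parse_nvidia_smi_csv(SAMPLE_OUTPUT):
        print(gpu)
```

Adding noheader,nounits to the --format option removes the header row and the unit suffixes, which makes the output easier to hand to Zabbix directly.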
3. Custom Scripts for GPU Data:
Create a PowerShell script to fetch GPU data. Here’s an example script for NVIDIA GPUs:
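# gpu_metrics.ps1
# Query the first NVIDIA GPU and return its metrics as a hashtable.
# Older drivers install nvidia-smi under NVSMI; newer drivers typically place it in C:\Windows\System32.
$nvidia_smi = "C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"
$gpu_data = & $nvidia_smi --query-gpu=utilization.gpu,temperature.gpu,memory.used,memory.total --format=csv,noheader,nounits
# nvidia-smi prints one line per GPU; use the first line and trim the spaces after each comma.
$metrics = @($gpu_data)[0] -split "," | ForEach-Object { $_.Trim() }
@{
    "utilization"  = $metrics[0]
    "temperature"  = $metrics[1]
    "memory_used"  = $metrics[2]
    "memory_total" = $metrics[3]
}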
Save this script as gpu_metrics.ps1.
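With --format=csv,noheader,nounits, nvidia-smi prints one line per GPU, so on a host with several GPUs a collector has to choose which line to read. The following Python sketch shows that selection logic on made-up two-GPU output; the field names mirror the hashtable keys in the script, and extract_metric and its gpu_index parameter are hypothetical names introduced for illustration.

```python
# Field order matches the --query-gpu list used by the script.
FIELDS = ("utilization", "temperature", "memory_used", "memory_total")

# Illustrative two-GPU output; one line per GPU, values are made up.
SAMPLE_OUTPUT = "35, 61, 2048, 8192\n0, 40, 12, 8192\n"


def extract_metric(output, metric, gpu_index=0):
    """Return a single metric of a single GPU as a float."""
    lines = [line for line in output.splitlines() if line.strip()]
    # Split the chosen GPU's line and trim the space after each comma.
    values = [cell.strip() for cell in lines[gpu_index].split(",")]
    return float(dict(zip(FIELDS, values))[metric])


if __name__ == "__main__":
    print(extract_metric(SAMPLE_OUTPUT, "temperature"))
    print(extract_metric(SAMPLE_OUTPUT, "memory_used", gpu_index=1))
```

Returning a single bare number per call matches what a Zabbix item expects: one value per key.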
Step 3: Configure Zabbix Agent to Execute the Script
- Modify Zabbix Agent Configuration:
Edit the zabbix_agentd.conf file to include the custom script. Add the following lines to the configuration file (usually located at C:\Program Files\Zabbix Agent\zabbix_agentd.conf):
UserParameter=gpu.utilization,powershell -NoProfile -ExecutionPolicy Bypass -Command "(& 'C:\Path\To\gpu_metrics.ps1')['utilization']"
UserParameter=gpu.temperature,powershell -NoProfile -ExecutionPolicy Bypass -Command "(& 'C:\Path\To\gpu_metrics.ps1')['temperature']"
UserParameter=gpu.memory.used,powershell -NoProfile -ExecutionPolicy Bypass -Command "(& 'C:\Path\To\gpu_metrics.ps1')['memory_used']"
UserParameter=gpu.memory.total,powershell -NoProfile -ExecutionPolicy Bypass -Command "(& 'C:\Path\To\gpu_metrics.ps1')['memory_total']"
Replace C:\Path\To\gpu_metrics.ps1 with the actual path to your PowerShell script. Each line runs the script through -Command and prints a single value from the hashtable it returns, because a Zabbix item expects one value per key. Starting PowerShell and nvidia-smi can take longer than the agent's default Timeout of 3 seconds; if the items become unsupported, raise Timeout in zabbix_agentd.conf (the maximum is 30).
- Restart Zabbix Agent:
After modifying the configuration, restart the Zabbix Agent service to apply the changes:
net stop "Zabbix Agent"
net start "Zabbix Agent"
Step 4: Configure Zabbix Server to Monitor GPU Metrics
1. Add Windows Host to Zabbix:
- Log in to the Zabbix frontend.
- Go to Configuration -> Hosts.
- Click Create host.
- Fill in the host name, IP address, and select the appropriate groups.
2. Create New Items:
Define new items to collect the custom GPU metrics from the Windows host:
- Go to Configuration -> Hosts.
- Click on the host you added.
- Go to the Items tab and click Create item.
- Define an item for each GPU metric:
- Name: GPU Utilization
- Type: Zabbix agent
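- Key: gpu.utilization
- Type of information: Numeric (float or integer as needed)
- Units: %
Repeat for the other metrics (gpu.temperature, gpu.memory.used, and gpu.memory.total), setting Units to match each metric: nvidia-smi reports temperature in degrees Celsius and memory in MiB.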
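If you prefer to script item creation rather than click through the frontend, the Zabbix API exposes an item.create method over JSON-RPC. The sketch below only builds the request bodies for the four items; it does not send them. The host ID and interface ID are placeholders you would look up for your own host, the display names and units are assumptions, and how the API token is supplied depends on your Zabbix version.

```python
import json

# Zabbix API constants: item type 0 is "Zabbix agent";
# value_type 0 is "Numeric (float)".
ITEM_TYPE_ZABBIX_AGENT = 0
VALUE_TYPE_FLOAT = 0

# (name, key, units) for the four GPU items. Names and units are assumptions.
GPU_ITEMS = [
    ("GPU Utilization", "gpu.utilization", "%"),
    ("GPU Temperature", "gpu.temperature", "°C"),
    ("GPU Memory Used", "gpu.memory.used", "MiB"),
    ("GPU Memory Total", "gpu.memory.total", "MiB"),
]


def build_item_requests(hostid, interfaceid, delay="1m"):
    """Build one JSON-RPC item.create request body per GPU item."""
    requests = []
    for request_id, (name, key, units) in enumerate(GPU_ITEMS, start=1):
        requests.append({
            "jsonrpc": "2.0",
            "method": "item.create",
            "params": {
                "name": name,
                "key_": key,
                "hostid": hostid,
                "interfaceid": interfaceid,
                "type": ITEM_TYPE_ZABBIX_AGENT,
                "value_type": VALUE_TYPE_FLOAT,
                "units": units,
                "delay": delay,
            },
            "id": request_id,
        })
    return requests


if __name__ == "__main__":
    # "10084" and "1" are placeholder IDs, not values from a real server.
    for request in build_item_requests(hostid="10084", interfaceid="1"):
        print(json.dumps(request, ensure_ascii=False))
```

Each body would be sent as an HTTP POST to the api_jsonrpc.php endpoint of your Zabbix frontend.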
3. Create Triggers (Optional):
To alert when GPU usage is high or temperature exceeds a threshold, create triggers:
- Go to Configuration -> Hosts.
- Click on the host you added.
- Go to the Triggers tab and click Create trigger.
- Define conditions such as:
- Name: GPU Utilization High
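- Expression: {<host>:gpu.utilization.last()} > 90
This is the expression syntax used before Zabbix 5.4. In Zabbix 5.4 and later, write it as last(/<host>/gpu.utilization)>90.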
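Trigger expressions are written differently before and after Zabbix 5.4, which is easy to get wrong when copying examples. As a small illustration, the Python sketch below formats the same condition in both syntaxes; trigger_expression is a hypothetical helper and gpu-host is a placeholder host name.

```python
def trigger_expression(host, key, threshold, function="last", period=None,
                       new_syntax=True):
    """Format a 'value above threshold' trigger expression.

    new_syntax=True  -> Zabbix 5.4 and later: last(/host/key)>90
    new_syntax=False -> before Zabbix 5.4:    {host:key.last()}>90
    """
    if new_syntax:
        argument = f"/{host}/{key}" + (f",{period}" if period else "")
        return f"{function}({argument})>{threshold}"
    return f"{{{host}:{key}.{function}({period or ''})}}>{threshold}"


if __name__ == "__main__":
    print(trigger_expression("gpu-host", "gpu.utilization", 90))
    print(trigger_expression("gpu-host", "gpu.utilization", 90, new_syntax=False))
    # Averaging over five minutes avoids alerting on short spikes.
    print(trigger_expression("gpu-host", "gpu.temperature", 85,
                             function="avg", period="5m"))
```

Using avg over a time window instead of last is a common way to avoid alerts on brief utilization or temperature spikes.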
4. Create Graphs (Optional):
Visualize the GPU metrics in graphs:
- Go to Configuration -> Hosts.
- Click on the host you added.
- Go to the Graphs tab and click Create graph.
- Define a graph that includes the GPU metrics.
Step 5: Verify and Monitor
- Go to Monitoring -> Latest data to see the collected GPU metrics.
- Check the Graphs and Triggers to ensure they reflect the data correctly.
Conclusion
This guide covered the steps needed to set up GPU monitoring in Zabbix: installing the Zabbix Agent on Windows, gathering GPU metrics with a PowerShell script, and configuring the Zabbix server to collect and visualize those metrics. With these steps in place, you can monitor GPU performance on your dedicated GPU server.