Monitoring GPU metrics on a Windows machine using Zabbix involves several steps, including setting up the Zabbix Agent, gathering GPU data, and configuring Zabbix to collect and display these metrics. Below is a detailed guide to achieve this.
Step-by-Step Guide to Monitor Windows GPU with Zabbix
Prerequisites
- Zabbix Server: Ensure you have a Zabbix server installed and running.
- Zabbix Agent: Install the Zabbix Agent on the Windows machine where you want to monitor the GPU.
Step 1: Install Zabbix Agent on Windows
1. Download the Zabbix Agent:
Download the Zabbix Agent installer from the official Zabbix website: Zabbix Downloads.
2. Install the Zabbix Agent:
Follow the installation wizard and configure the agent:
- Server: Enter the IP address or hostname of your Zabbix server.
- ServerActive: Again, enter the Zabbix server’s address.
- Hostname: Set the hostname of the Windows machine as it will appear in Zabbix.
You can also configure these parameters later in the zabbix_agentd.conf file, typically located in C:\Program Files\Zabbix Agent\.
3. Start the Zabbix Agent:
After installation, start the Zabbix Agent service through the Services management console (services.msc) or from the command line:
net start "Zabbix Agent"
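For reference, the three parameters set during installation might end up looking like the excerpt below in zabbix_agentd.conf. The address 192.0.2.10 and the name win-gpu-01 are placeholders chosen for illustration, not values from this guide; substitute your own.

```ini
# zabbix_agentd.conf (excerpt); address and hostname are placeholders
Server=192.0.2.10
ServerActive=192.0.2.10
Hostname=win-gpu-01
```

The Hostname value generally needs to match the host name configured in the Zabbix frontend for active checks to work.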
Step 2: Gather GPU Metrics on Windows
To monitor GPU metrics such as utilization, temperature, and memory usage, we can use Windows Performance Counters, the nvidia-smi tool, and scripts that collect the required data.
1. Windows Performance Counters:
Windows provides GPU performance counters that can be accessed via PowerShell or WMI (Windows Management Instrumentation). Common counter sets for GPUs include:
- GPU Engine
- GPU Adapter Memory
These counters provide metrics such as GPU utilization and memory usage.
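As a rough sketch of how these counters might be read from PowerShell: the GPU Engine counter set reports utilization per engine instance, so one common approach is to sum the 3D engine samples into a single figure. The helper name Get-GpuUtilizationTotal is hypothetical, and the counter path in the usage comment is an assumption that should be checked on the target machine (the GPU counter sets are only present on recent Windows 10 and later builds).

```powershell
# Sum per-engine utilization samples into one percentage, capped at 100.
# Get-GpuUtilizationTotal is a hypothetical helper, not a built-in cmdlet.
function Get-GpuUtilizationTotal {
    param([double[]]$Samples)
    $sum = ($Samples | Measure-Object -Sum).Sum
    if ($null -eq $sum) { return 0.0 }
    return [math]::Min([math]::Round($sum, 2), 100.0)
}

# Usage on a machine that exposes the counters (assumed counter path):
# $samples = (Get-Counter '\GPU Engine(*engtype_3D)\Utilization Percentage').CounterSamples
# Get-GpuUtilizationTotal -Samples ($samples | ForEach-Object CookedValue)
```

Keeping the aggregation separate from the Get-Counter call makes it possible to try the arithmetic on any machine, even one without a GPU.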
2. NVIDIA SMI (for NVIDIA GPUs):
If you have an NVIDIA GPU, you can use the NVIDIA System Management Interface (nvidia-smi) to query GPU metrics. Download and install the NVIDIA driver, which includes nvidia-smi. Use the command:
nvidia-smi --query-gpu=name,utilization.gpu,temperature.gpu,memory.used,memory.total --format=csv
This command outputs GPU details that can be captured by Zabbix.
3. Custom Scripts for GPU Data:
Create a PowerShell script to fetch GPU data. Here’s an example script for NVIDIA GPUs:
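# gpu_metrics.ps1
# Query the first GPU through nvidia-smi and return its metrics as a hashtable.
# Newer NVIDIA drivers may install nvidia-smi.exe in C:\Windows\System32 instead; adjust the path if needed.
$nvidia_smi = "C:\Program Files\NVIDIA Corporation\NVSMI\nvidia-smi.exe"
$gpu_data = & $nvidia_smi --query-gpu=utilization.gpu,temperature.gpu,memory.used,memory.total --format=csv,noheader,nounits | Select-Object -First 1
# Fields are separated by ", ", so trim each value after splitting.
$metrics = $gpu_data -split "," | ForEach-Object { $_.Trim() }
@{
    "utilization"  = $metrics[0]
    "temperature"  = $metrics[1]
    "memory_used"  = $metrics[2]
    "memory_total" = $metrics[3]
}
Save this script as gpu_metrics.ps1.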
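On machines with more than one GPU, nvidia-smi prints one CSV line per device. The sketch below shows one way such output could be turned into objects, assuming the query also requests the index field and uses --format=csv,noheader,nounits. ConvertFrom-NvidiaSmiCsv is a hypothetical helper name, and the sample lines are made-up values in the expected shape rather than real measurements.

```powershell
# ConvertFrom-NvidiaSmiCsv is a hypothetical helper: it turns lines such as
# "0, 45, 60, 1024, 8192" into one object per GPU.
function ConvertFrom-NvidiaSmiCsv {
    param([string[]]$Lines)
    foreach ($line in $Lines) {
        if ([string]::IsNullOrWhiteSpace($line)) { continue }
        # Split on commas and strip the space that follows each separator.
        $f = $line -split ',' | ForEach-Object { $_.Trim() }
        [pscustomobject]@{
            Index       = [int]$f[0]
            Utilization = [int]$f[1]
            Temperature = [int]$f[2]
            MemoryUsed  = [int]$f[3]
            MemoryTotal = [int]$f[4]
        }
    }
}

# Made-up sample input; real input would come from
# nvidia-smi --query-gpu=index,utilization.gpu,temperature.gpu,memory.used,memory.total --format=csv,noheader,nounits
$sample = @('0, 45, 60, 1024, 8192', '1, 10, 48, 512, 8192')
$gpus = ConvertFrom-NvidiaSmiCsv -Lines $sample
```

The integer casts assume every field is numeric. Some GPUs report a placeholder such as [N/A] for unsupported fields, which would need separate handling.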
Step 3: Configure Zabbix Agent to Execute the Script
1. Modify Zabbix Agent Configuration:
Edit the zabbix_agentd.conf file to include the custom script. Add the following lines to the configuration file (usually located at C:\Program Files\Zabbix Agent\zabbix_agentd.conf):
UserParameter=gpu.utilization,powershell -NoProfile -ExecutionPolicy Bypass -Command "(& 'C:\Path\To\gpu_metrics.ps1')['utilization']"
UserParameter=gpu.temperature,powershell -NoProfile -ExecutionPolicy Bypass -Command "(& 'C:\Path\To\gpu_metrics.ps1')['temperature']"
UserParameter=gpu.memory.used,powershell -NoProfile -ExecutionPolicy Bypass -Command "(& 'C:\Path\To\gpu_metrics.ps1')['memory_used']"
UserParameter=gpu.memory.total,powershell -NoProfile -ExecutionPolicy Bypass -Command "(& 'C:\Path\To\gpu_metrics.ps1')['memory_total']"
Replace C:\Path\To\gpu_metrics.ps1 with the actual path to your PowerShell script. Note that powershell accepts either -File or -Command, not both, so each line uses -Command to run the script and print a single value from the hashtable it returns.
2. Restart Zabbix Agent:
After modifying the configuration, restart the Zabbix Agent service to apply the changes:
net stop "Zabbix Agent"
net start "Zabbix Agent"
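Before moving on to the server side, it can help to check that the agent resolves a new key locally. The agent binary has a test mode (-t) that evaluates a single item key and prints the result. The sketch below assumes the executable sits next to the configuration file in the default install directory mentioned above; adjust both paths if your installation differs.

```powershell
# Ask the agent to evaluate one key locally and print the result.
# Both paths assume the default install location; adjust as needed.
& "C:\Program Files\Zabbix Agent\zabbix_agentd.exe" -c "C:\Program Files\Zabbix Agent\zabbix_agentd.conf" -t gpu.utilization
```

If the key works here but the server later reports a timeout, raising the Timeout parameter in zabbix_agentd.conf (3 seconds by default) may help, since starting PowerShell can be slow.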
Step 4: Configure Zabbix Server to Monitor GPU Metrics
1. Add Windows Host to Zabbix:
- Log in to the Zabbix frontend.
- Go to Configuration -> Hosts.
- Click Create host.
- Fill in the host name, IP address, and select the appropriate groups.
2. Create New Items:
Define new items to collect the custom GPU metrics from the Windows host:
- Go to Configuration -> Hosts.
- Click on the host you added.
- Go to the Items tab and click Create item.
- Define an item for each GPU metric:
- Name: GPU Utilization
- Type: Zabbix agent
- Key: gpu.utilization
- Type of information: Numeric (float or integer as needed)
- Units: %
Repeat for the other metrics (gpu.temperature, gpu.memory.used, and gpu.memory.total), adjusting the units for each: nvidia-smi reports temperature in °C and memory in MiB.
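If you prefer to script item creation instead of clicking through the frontend, the Zabbix API offers an item.create method. The following is only a sketch of building such a request body in PowerShell: New-GpuItemRequest is a hypothetical helper, the host ID 10084 and interface ID 1 are placeholders, and authentication is deliberately left out because it differs between Zabbix versions.

```powershell
# New-GpuItemRequest is a hypothetical helper that builds a JSON-RPC body for
# the Zabbix API method item.create; it does not send anything.
function New-GpuItemRequest {
    param(
        [string]$HostId,       # host ID as shown by the API or frontend
        [string]$InterfaceId,  # ID of the host's agent interface
        [string]$Name,
        [string]$Key,
        [string]$Units
    )
    [ordered]@{
        jsonrpc = '2.0'
        method  = 'item.create'
        params  = [ordered]@{
            name        = $Name
            key_        = $Key
            hostid      = $HostId
            interfaceid = $InterfaceId
            type        = 0     # 0 = Zabbix agent
            value_type  = 0     # 0 = numeric (float)
            units       = $Units
            delay       = '1m'
        }
        id = 1
    }
}

# Placeholder IDs; look up the real ones for your host before sending.
$body = New-GpuItemRequest -HostId '10084' -InterfaceId '1' -Name 'GPU Utilization' -Key 'gpu.utilization' -Units '%'
$json = $body | ConvertTo-Json -Depth 3
```

The resulting $json could then be posted to the frontend's api_jsonrpc.php endpoint, for example with Invoke-RestMethod, together with whatever API token mechanism your Zabbix version expects.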
3. Create Triggers (Optional):
To alert when GPU usage is high or temperature exceeds a threshold, create triggers:
- Go to Configuration -> Hosts.
- Click on the host you added.
- Go to the Triggers tab and click Create trigger.
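- Define conditions such as:
- Name: GPU Utilization High
- Expression: {<host>:gpu.utilization.last()} > 90
In Zabbix 5.4 and later, which use a different expression syntax, the equivalent is last(/<host>/gpu.utilization)>90.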
4. Create Graphs (Optional):
Visualize the GPU metrics in graphs:
- Go to Configuration -> Hosts.
- Click on the host you added.
- Go to the Graphs tab and click Create graph.
- Define a graph that includes the GPU metrics.
Step 5: Verify and Monitor
- Go to Monitoring -> Latest data to see the collected GPU metrics.
- Check the Graphs and Triggers to ensure they reflect the data correctly.
Summary
By following these steps, you can effectively monitor the GPU performance metrics on a Windows machine using Zabbix. This setup includes installing the Zabbix Agent, gathering GPU metrics with PowerShell scripts, and configuring the Zabbix server to collect and visualize these metrics.