Overview
As I mentioned in my previous post, all of our hosts are now registering themselves and pulling templates automatically. The next step is making them actually useful, and that starts with capturing what processes and services are doing on each host.
This guide assumes you have a working Zabbix agent install (manual or scripted from the agent post) and that the agent is reporting back to the server.
Out of the box, the Zabbix agent ships keys like proc.num[], proc.cpu.util[], and service.info[]. These are great for "is the service up?" style checks, but they don't tell you:
- How long a process has been running.
- Who started it.
- What its parent process is.
- Which service (if any) it belongs to.
To get that information into Zabbix we'll define UserParameters that wrap small PowerShell or shell commands, and then create items / triggers from them. The flow is:
- Add a UserParameter to the agent config.
- Restart the agent.
- Test the key with zabbix_get from the server/proxy.
- Create the item in a template.
UserParameters are evaluated by the agent on demand, so keep them cheap. Anything that takes longer than a couple of seconds belongs in a UserParameter with caching, or in a scheduled script that writes to a file the agent reads.
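The caching idea above can be sketched as a small wrapper that only re-runs an expensive command when its cached output is older than a chosen TTL. This is a minimal sketch, not part of the original setup: the function name cached_run, the cache path, and the example command in the usage note are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# cached_run TTL_SECONDS CACHE_FILE COMMAND [ARGS...]
# Hypothetical helper (not part of Zabbix): prints the cached output if
# CACHE_FILE is non-empty and younger than TTL_SECONDS, otherwise runs
# COMMAND, stores its output in CACHE_FILE, and prints it.
cached_run() {
    local ttl="$1" cache="$2" now mtime tmp
    shift 2
    if [ -s "$cache" ]; then
        now=$(date +%s)
        # stat -c %Y is GNU coreutils; stat -f %m is the BSD fallback.
        mtime=$(stat -c %Y "$cache" 2>/dev/null || stat -f %m "$cache")
        if [ $((now - mtime)) -lt "$ttl" ]; then
            cat "$cache"
            return 0
        fi
    fi
    # Write to a temp file first so a concurrent reader never sees a partial result.
    tmp=$(mktemp "${cache}.XXXXXX") || return 1
    if "$@" > "$tmp"; then
        mv -f "$tmp" "$cache"
        cat "$cache"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

Saved in a script of your own, it could be called as cached_run 60 /var/tmp/heavy.cache some-slow-command, so the agent answers from the cache file on most polls and only pays for the slow command once a minute. The cache file has to live somewhere the agent's user can write.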
Windows: Process Logging with PowerShell
The Windows agent reads UserParameter lines from zabbix_agentd.conf (or any .conf file under the Include directory). Each parameter shells out to powershell.exe with our script.
1. Create the Helper Script
Save the following as C:\Program Files\Zabbix Agent\scripts\Get-ProcessInfo.ps1. It returns a single JSON object with everything Zabbix needs, including EndedAt and DurationSec once a previously running process stops.
The script uses a state file per process name so it can detect the transition from running to stopped and stamp the end time. State files live under $env:ProgramData\Zabbix\proc-state\.
<#
.SYNOPSIS
Returns process metadata as JSON for Zabbix UserParameters.
Tracks process lifecycle via a state file so we can report
when the process ended and how long it ran for.
#>
[CmdletBinding()]
[OutputType([string])]
param(
[Parameter(Mandatory = $true, Position = 0)]
[ValidateNotNullOrEmpty()]
[string]$Name
)
begin
{
$ErrorActionPreference = 'Stop'
$stateDir = "$env:ProgramData\Zabbix\proc-state"
$safeName = $Name -replace '[\\/:*?"<>|]', '_'
$stateFile = Join-Path $stateDir "$safeName.json"
if (-not (Test-Path $stateDir))
{
$null = New-Item -ItemType Directory -Path $stateDir -Force
}
}
process
{
# Load previous state (if any)
$prevState = $null
if (Test-Path $stateFile)
{
try
{
$prevState = Get-Content -LiteralPath $stateFile -Raw | ConvertFrom-Json
}
catch
{
$prevState = $null
}
}
# Check if the process is currently running
$proc = Get-CimInstance Win32_Process -Filter "Name = '$Name'" |
Sort-Object CreationDate |
Select-Object -First 1
$now = Get-Date
if ($proc)
{
# Process is running - build the live payload
$owner = Invoke-CimMethod -InputObject $proc -MethodName GetOwner
$parent = Get-CimInstance Win32_Process -Filter "ProcessId = $($proc.ParentProcessId)"
$uptime = [int]($now - $proc.CreationDate).TotalSeconds
$result = [ordered]@{
Name = $proc.Name
Pid = $proc.ProcessId
Running = $true
UptimeSec = $uptime
StartedAt = $proc.CreationDate.ToString('o')
EndedAt = $null
DurationSec = $null
User = "$($owner.Domain)\$($owner.User)"
ParentPid = $proc.ParentProcessId
ParentName = $parent.Name
CommandLine = $proc.CommandLine
}
# Persist current state so we can detect when it stops
$json = $result | ConvertTo-Json -Compress
Set-Content -LiteralPath $stateFile -Value $json -Encoding UTF8
return $json
}
else
{
# Process is NOT running - check if we saw it running before
if ($prevState -and $prevState.Running -eq $true -and $prevState.StartedAt)
{
# Transition: was running, now stopped - stamp end time
$startedAt = [datetime]::Parse($prevState.StartedAt)
$endedAt = $now
$durationSec = [int]($endedAt - $startedAt).TotalSeconds
$result = [ordered]@{
Name = $Name
Pid = $prevState.Pid
Running = $false
UptimeSec = 0
StartedAt = $prevState.StartedAt
EndedAt = $endedAt.ToString('o')
DurationSec = $durationSec
User = $(if ($prevState.User) { $prevState.User }else { "$($env:USERDOMAIN)\$($env:USERNAME)" })
ParentPid = $prevState.ParentPid
ParentName = $prevState.ParentName
CommandLine = $prevState.CommandLine
}
# Update state file to reflect it has ended (prevents re-stamping on next poll)
$json = $result | ConvertTo-Json -Compress
Set-Content -LiteralPath $stateFile -Value $json -Encoding UTF8
return $json
}
elseif ($prevState -and $prevState.Running -eq $false -and $prevState.EndedAt)
{
# Already recorded as ended - return the last known run info
$startedAt = [datetime]::Parse($prevState.StartedAt)
$endedAt = [datetime]::Parse($prevState.EndedAt)
$durationSec = [int]($endedAt - $startedAt).TotalSeconds
return [ordered]@{
Name = $Name
Pid = $prevState.Pid
Running = $false
UptimeSec = 0
StartedAt = $prevState.StartedAt
EndedAt = $prevState.EndedAt
DurationSec = $durationSec
User = $prevState.User
ParentPid = $prevState.ParentPid
ParentName = $prevState.ParentName
CommandLine = $prevState.CommandLine
} | ConvertTo-Json -Compress
}
else
{
# Never seen this process - no state at all
return [ordered]@{
Name = $Name
Pid = $null
Running = $false
UptimeSec = 0
StartedAt = $null
EndedAt = $null
DurationSec = $null
User = $null
ParentPid = $null
ParentName = $null
CommandLine = $null
} | ConvertTo-Json -Compress
}
}
}
The lifecycle:
- First poll, process running: returns Running=true, UptimeSec=N, EndedAt=null, DurationSec=null. Saves state.
- Later poll, process still running: returns Running=true with updated UptimeSec.
- Poll after process stops: detects the transition (state file says Running=true but CIM finds nothing), stamps EndedAt=now, calculates DurationSec, returns Running=false. Updates state file.
- Subsequent polls while still stopped: returns the same EndedAt/DurationSec from the state file without re-stamping.
- Process restarts: CIM finds it again, returns Running=true, overwrites state file.
Run the script once by hand to make sure it works:
powershell -File "C:\Program Files\Zabbix Agent\scripts\Get-ProcessInfo.ps1" notepad.exe
The output should look like the following:
{"Name":"notepad.exe","Pid":29224,"Running":true,"UptimeSec":3,"StartedAt":"2026-04-18T13:12:34.4686300-05:00","EndedAt":null,"DurationSec":null,"User":"Contoso\\John","ParentPid":4084,"ParentName":"explorer.exe","CommandLine":"\"C:\\Windows\\system32\\notepad.exe\" "}
2. Register the UserParameters
Create C:\Program Files\Zabbix Agent\zabbix_agentd.d\process_logging.conf with:
# proc.info[<name>] -> full JSON blob
UserParameter=proc.info[*],powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\Program Files\Zabbix Agent\scripts\Get-ProcessInfo.ps1" "$1"
# Convenience scalar keys for triggers / graphs
UserParameter=proc.uptime[*],powershell.exe -NoProfile -Command "((Get-CimInstance Win32_Process -Filter \"Name='$1'\" | Sort-Object CreationDate | Select-Object -First 1).CreationDate | ForEach-Object { [int]((Get-Date) - $_).TotalSeconds })"
UserParameter=proc.user[*],powershell.exe -NoProfile -Command "$p=Get-CimInstance Win32_Process -Filter \"Name='$1'\" | Select-Object -First 1; $o=Invoke-CimMethod -InputObject $p -MethodName GetOwner; \"$($o.Domain)\$($o.User)\""
UserParameter=proc.parent[*],powershell.exe -NoProfile -Command "$p=Get-CimInstance Win32_Process -Filter \"Name='$1'\" | Select-Object -First 1; (Get-CimInstance Win32_Process -Filter \"ProcessId=$($p.ParentProcessId)\").Name"
Verify the Include directive
Open the main agent config file (C:\Program Files\Zabbix Agent\zabbix_agentd.conf or zabbix_agent2.conf for Agent 2) and confirm the Include line points at the directory where you just saved process_logging.conf:
# For Zabbix Agent (classic)
Include=C:\Program Files\Zabbix Agent\zabbix_agentd.d\*.conf
# For Zabbix Agent 2
Include=C:\Program Files\Zabbix Agent 2\zabbix_agent2.d\*.conf
If the line is commented out (# Include=...), uncomment it. If the path doesn't match where you saved process_logging.conf, either move the file or update the path.
Restart the agent so it picks up the new keys:
Restart-Service 'Zabbix Agent'
Test locally on the agent first
Before testing from the server, verify the UserParameter works directly on the agent host. Open a PowerShell prompt as the same user the Zabbix agent service runs as (the & call operator below is PowerShell syntax) and run:
# For Zabbix Agent (classic)
& 'C:\Program Files\Zabbix Agent\zabbix_agentd.exe' -t 'proc.info["notepad.exe"]'
# For Zabbix Agent 2
& 'C:\Program Files\Zabbix Agent 2\zabbix_agent2.exe' -t 'proc.info["notepad.exe"]'
Expected output: the JSON blob from Get-ProcessInfo.ps1. If you get ZBX_NOTSUPPORTED instead, work through these causes in order:
- Include path mismatch: the .conf file isn't in the directory the Include directive points at. Verify with dir "C:\Program Files\Zabbix Agent\zabbix_agentd.d\*.conf"; your file should show up.
- Agent not restarted: UserParameters are read once at startup. The agent must be restarted after any change to the .conf files.
- Script path wrong: the UserParameter line references C:\Program Files\Zabbix Agent\scripts\Get-ProcessInfo.ps1. Verify the file exists at exactly that path. A single character off and the agent silently returns NOTSUPPORTED.
- Execution policy: the -ExecutionPolicy Bypass flag in the UserParameter should handle this, but an execution policy enforced through Group Policy takes precedence over the flag, and the script won't run. Test manually: powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\Program Files\Zabbix Agent\scripts\Get-ProcessInfo.ps1" notepad.exe.
- Agent 2 vs classic agent: Zabbix Agent 2 uses a different config file name (zabbix_agent2.conf) and a different Include directory (zabbix_agent2.d). If you installed Agent 2 but saved the UserParameter config into the classic agent's directory, it won't be loaded.
- UnsafeUserParameters: by default the agent rejects key arguments that contain characters such as \ ' " ` * ? [ ] { } ~ $ ! & ; ( ) < > | # @. If a process or service name you pass needs one of them, set UnsafeUserParameters=1 in the main config. For the plain names used in this post, the default safe mode works.
Always test with -t on the agent before testing from the server. If -t fails, zabbix_get from the server will also fail, and you'll be debugging network connectivity at the same time. Isolate the problem: agent first, then network.
3. Service Logging
For Windows services, the built-in service.info[] key already returns state, but it doesn't tell you the executable, the start mode, or the account. A small wrapper does:
Add to the same process_logging.conf:
UserParameter=service.full[*],powershell.exe -NoProfile -Command "Get-CimInstance Win32_Service -Filter \"Name='$1'\" | Select-Object Name,DisplayName,State,StartMode,StartName,ProcessId,PathName | ConvertTo-Json -Compress"
Restart the agent and you can now query things like:
service.full[Spooler]
service.full[Zabbix Agent]
4. Test From the Server
From your Zabbix server or proxy, use zabbix_get to confirm the agent answers:
zabbix_get -s 10.0.0.50 -k 'proc.info[notepad.exe]'
zabbix_get -s 10.0.0.50 -k 'proc.uptime[notepad.exe]'
zabbix_get -s 10.0.0.50 -k 'service.full[Spooler]'
A healthy reply while the process is running:
{"Name":"notepad.exe","Pid":29224,"Running":true,"UptimeSec":3,"StartedAt":"2026-04-18T13:12:34.4686300-05:00","EndedAt":null,"DurationSec":null,"User":"Contoso\\John","ParentPid":4084,"ParentName":"explorer.exe","CommandLine":"\"C:\\Windows\\system32\\notepad.exe\" "}
After the process stops, the next poll returns:
{"Name":"notepad.exe","Pid":29224,"Running":false,"UptimeSec":0,"StartedAt":"2026-04-18T13:12:34.46863-05:00","EndedAt":"2026-04-18T13:13:35.4872109-05:00","DurationSec":61,"User":"Contoso\\John","ParentPid":4084,"ParentName":"explorer.exe","CommandLine":"\"C:\\Windows\\system32\\notepad.exe\" "}
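DurationSec is the total wall-clock seconds the process ran for. Trigger functions can't read a field out of a JSON text item, so extract it first: create a dependent item on proc.info[myapp.exe] with a JSONPath preprocessing step of $.DurationSec (the key proc.duration[myapp.exe] used here is only an example name), then use a trigger like last(/Template/proc.duration[myapp.exe]) < 60 to catch processes that crash immediately after starting.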
Linux: Process Logging with Shell
The Linux agent uses the same UserParameter mechanism. The data we want lives in /proc and the output of ps.
1. Create the Helper Script
Save as /etc/zabbix/scripts/proc-info.sh and chmod +x it. Like the Windows version, it uses a state file per process to track the running-to-stopped transition and calculate total duration.
#!/usr/bin/env bash
# Usage: proc-info.sh <process-name>
# State files: /var/lib/zabbix/proc-state/<name>.json
set -euo pipefail
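NAME="${1:?process name required}"
STATE_DIR="/var/lib/zabbix/proc-state"
# Replace path separators so the name cannot point outside STATE_DIR
SAFE_NAME="${NAME//\//_}"
STATE_FILE="${STATE_DIR}/${SAFE_NAME}.json"
mkdir -p "$STATE_DIR"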
NOW=$(date -u '+%Y-%m-%dT%H:%M:%S+00:00')
NOW_EPOCH=$(date +%s)
# Pick the oldest matching PID so restarts don't churn the value
PID=$(pgrep -o -x "$NAME" 2>/dev/null || true)
if [[ -n "$PID" ]]; then
# Process is running
USER_NAME=$(ps -o user= -p "$PID" | tr -d ' ')
PPID_VAL=$(ps -o ppid= -p "$PID" | tr -d ' ')
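# "|| true" keeps set -e / pipefail from aborting when the parent is gone (or PPID is 0)
PARENT=$(ps -o comm= -p "$PPID_VAL" 2>/dev/null | tr -d ' ' || true)
ETIME=$(ps -o etimes= -p "$PID" | tr -d ' ')
CMD=$(tr '\0' ' ' < "/proc/$PID/cmdline" 2>/dev/null | sed 's/"/\\"/g; s/ *$//' || true)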
START_EPOCH=$((NOW_EPOCH - ETIME))
STARTED_AT=$(date -u -d "@$START_EPOCH" '+%Y-%m-%dT%H:%M:%S+00:00' 2>/dev/null || date -u -r "$START_EPOCH" '+%Y-%m-%dT%H:%M:%S+00:00')
# Save state
printf '{"pid":%d,"started_at":"%s","running":true,"ended_at":null}\n' \
"$PID" "$STARTED_AT" > "$STATE_FILE"
# Output
printf '{"name":"%s","pid":%d,"running":true,"uptime_sec":%d,"started_at":"%s","ended_at":null,"duration_sec":null,"user":"%s","parent_pid":%s,"parent_name":"%s","command_line":"%s"}\n' \
"$NAME" "$PID" "$ETIME" "$STARTED_AT" "$USER_NAME" "$PPID_VAL" "$PARENT" "$CMD"
else
# Process is NOT running - check state file
if [[ -f "$STATE_FILE" ]]; then
PREV_RUNNING=$(python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('running','false'))" < "$STATE_FILE" 2>/dev/null || echo "false")
PREV_PID=$(python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('pid',0))" < "$STATE_FILE" 2>/dev/null || echo "0")
PREV_STARTED=$(python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('started_at',''))" < "$STATE_FILE" 2>/dev/null || echo "")
PREV_ENDED=$(python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('ended_at','None'))" < "$STATE_FILE" 2>/dev/null || echo "None")
if [[ "$PREV_RUNNING" == "True" && -n "$PREV_STARTED" ]]; then
# Transition: was running, now stopped - stamp end time
START_EPOCH=$(date -u -d "$PREV_STARTED" +%s 2>/dev/null || date -u -j -f '%Y-%m-%dT%H:%M:%S+00:00' "$PREV_STARTED" +%s 2>/dev/null || echo "0")
DURATION=$((NOW_EPOCH - START_EPOCH))
# Update state
printf '{"pid":%s,"started_at":"%s","running":false,"ended_at":"%s"}\n' \
"$PREV_PID" "$PREV_STARTED" "$NOW" > "$STATE_FILE"
printf '{"name":"%s","pid":%s,"running":false,"uptime_sec":0,"started_at":"%s","ended_at":"%s","duration_sec":%d,"user":null,"parent_pid":null,"parent_name":null,"command_line":null}\n' \
"$NAME" "$PREV_PID" "$PREV_STARTED" "$NOW" "$DURATION"
elif [[ "$PREV_ENDED" != "None" && "$PREV_ENDED" != "null" && -n "$PREV_STARTED" ]]; then
# Already ended - return last known run
START_EPOCH=$(date -u -d "$PREV_STARTED" +%s 2>/dev/null || echo "0")
END_EPOCH=$(date -u -d "$PREV_ENDED" +%s 2>/dev/null || echo "0")
DURATION=$((END_EPOCH - START_EPOCH))
printf '{"name":"%s","pid":%s,"running":false,"uptime_sec":0,"started_at":"%s","ended_at":"%s","duration_sec":%d,"user":null,"parent_pid":null,"parent_name":null,"command_line":null}\n' \
"$NAME" "$PREV_PID" "$PREV_STARTED" "$PREV_ENDED" "$DURATION"
else
# State file exists but no useful data
printf '{"name":"%s","pid":null,"running":false,"uptime_sec":0,"started_at":null,"ended_at":null,"duration_sec":null,"user":null,"parent_pid":null,"parent_name":null,"command_line":null}\n' "$NAME"
fi
else
# No state file - never seen this process
printf '{"name":"%s","pid":null,"running":false,"uptime_sec":0,"started_at":null,"ended_at":null,"duration_sec":null,"user":null,"parent_pid":null,"parent_name":null,"command_line":null}\n' "$NAME"
fi
fi
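The script uses python3 for JSON parsing of the state file. If Python isn't available on your agent hosts, replace those lines with jq (jq -r '.running' "$STATE_FILE") or simple grep/sed extraction. Note that jq prints true and null where Python prints True and None, so adjust the string comparisons in the script to match.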
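As a rough illustration of the grep/sed route, the sketch below pulls one field out of the one-line state file with sed alone. The function name state_field is an assumption for this example, and it only copes with the flat object the helper script writes (numbers, true/false/null, and strings without escaped quotes); it is not a general JSON parser.

```shell
#!/usr/bin/env bash
# state_field FILE KEY
# Hypothetical helper: prints the value of a top-level KEY from a one-line,
# flat JSON object. Strings are printed without their quotes; numbers and
# true/false/null are printed as written in the file.
state_field() {
    local file="$1" key="$2"
    # Group 2 captures a quoted string value, group 3 a bare value.
    sed -n -E 's/.*"'"$key"'":("([^"]*)"|([^",}]+)).*/\2\3/p' "$file"
}

# Example against a state file in the format proc-info.sh writes:
#   PREV_RUNNING=$(state_field "$STATE_FILE" running)      # true / false
#   PREV_STARTED=$(state_field "$STATE_FILE" started_at)
```

Like jq, this returns lowercase true and null, so a comparison such as == "True" in the script would need to become == "true".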
2. Register the UserParameters
Create /etc/zabbix/zabbix_agentd.d/process_logging.conf:
UserParameter=proc.info[*],/etc/zabbix/scripts/proc-info.sh "$1"
UserParameter=proc.uptime[*],ps -o etimes= -p $(pgrep -o -x "$1") 2>/dev/null | tr -d ' '
UserParameter=proc.user[*],ps -o user= -p $(pgrep -o -x "$1") 2>/dev/null | tr -d ' '
UserParameter=proc.parent[*],ps -o comm= -p $(ps -o ppid= -p $(pgrep -o -x "$1") | tr -d ' ') 2>/dev/null
Restart the agent:
sudo systemctl restart zabbix-agent
3. Service Logging with systemd
For systemd-managed services, wrap systemctl show so Zabbix gets a structured payload:
# $$1 and $$2 are doubled so Zabbix hands them to awk as $1 and $2 instead of substituting key parameters
UserParameter=service.full[*],systemctl show "$1" --property=Id,ActiveState,SubState,MainPID,User,ExecMainStartTimestamp,FragmentPath --no-pager | awk -F= 'BEGIN{printf "{"} {printf "%s\"%s\":\"%s\"", (NR>1?",":""), $$1, $$2} END{printf "}\n"}'
The awk snippet turns Id=zabbix-agent2.service\nActiveState=active\n... into a single JSON object that the Zabbix preprocessing pipeline can parse.
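To see the KEY=VALUE to JSON conversion in isolation, here is a standalone sketch of the same idea as a shell function you can feed sample input. The name kv_to_json is an assumption for this example. It splits on the first = only, so a value that itself contains = stays intact, and it assumes values contain no double quotes or backslashes.

```shell
#!/usr/bin/env bash
# kv_to_json: read KEY=VALUE lines on stdin, print one flat JSON object.
# Hypothetical helper; assumes values contain no double quotes or backslashes.
kv_to_json() {
    awk '
        BEGIN { printf "{" }
        {
            eq = index($0, "=")          # position of the first "="
            if (eq == 0) next            # skip lines that are not KEY=VALUE
            printf "%s\"%s\":\"%s\"", (n++ ? "," : ""), substr($0, 1, eq - 1), substr($0, eq + 1)
        }
        END { printf "}\n" }
    '
}

printf 'Id=zabbix-agent2.service\nActiveState=active\n' | kv_to_json
# prints {"Id":"zabbix-agent2.service","ActiveState":"active"}
```

Because this lives in a script rather than inline in a flexible UserParameter, the awk references are written as plain $0. Inline in a UserParameter=key[*] line, Zabbix substitutes $1 to $9 with key parameters, so awk field references there have to be doubled ($$1).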
4. Test From the Server
zabbix_get -s 10.0.0.51 -k 'proc.info[sshd]'
zabbix_get -s 10.0.0.51 -k 'proc.uptime[sshd]'
zabbix_get -s 10.0.0.51 -k 'service.full[zabbix-agent2.service]'
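When you have more than a handful of keys, a small loop saves some typing. The following is a sketch only: check_keys and the ZABBIX_GET override variable are names made up for this example, and it relies on nothing beyond the zabbix_get -s and -k options used above.

```shell
#!/usr/bin/env bash
# check_keys HOST KEY [KEY...]
# Hypothetical helper: queries each key and prints "OK <key>" or
# "FAIL <key>: <reply>". Returns non-zero if any key failed.
# Set ZABBIX_GET to override the binary (for example a non-default path).
check_keys() {
    local host="$1" key reply failed=0
    shift
    for key in "$@"; do
        reply=$("${ZABBIX_GET:-zabbix_get}" -s "$host" -k "$key" 2>&1) || reply="ERROR: $reply"
        case "$reply" in
            ""|ERROR:*|*ZBX_NOTSUPPORTED*)
                printf 'FAIL %s: %s\n' "$key" "${reply:-<empty reply>}"
                failed=1
                ;;
            *)
                printf 'OK %s\n' "$key"
                ;;
        esac
    done
    return "$failed"
}

# Example:
#   check_keys 10.0.0.51 'proc.info[sshd]' 'proc.uptime[sshd]' 'service.full[zabbix-agent2.service]'
```

An empty reply is treated as a failure on purpose: the scalar keys return nothing when the process isn't running, which is usually worth knowing before you build triggers on them.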
Wire It Up in the Frontend
1. Create the Items
In your Windows / Linux template, add items with:
- Type: Zabbix agent (or Zabbix agent (active) if you prefer)
- Key: proc.info[<name>], for example proc.info["notepad.exe"]
- Type of information: Text
- Update interval: 1m
proc.info will not appear in the key dropdown. UserParameter keys are custom, so the Zabbix frontend has no way to know about them; you have to type the key manually into the Key field. Don't use the "Select" button; just type proc.info[yourprocess] directly. If you get ZBX_NOTSUPPORTED, the agent hasn't picked up the UserParameter: verify the .conf file is in the Include path and restart the agent.
For the JSON payload, add preprocessing steps:
- Step 1: JSONPath -> $.UptimeSec (one item per field you want to graph).
- Step 2 (optional): Discard unchanged with heartbeat to keep history small.
Create one proc.info[*] master item, then one dependent item per field (User, ParentName, UptimeSec, etc.). That way the agent is only polled once per cycle.
2. Useful Triggers
A few triggers worth adding:
- Process restarted: change(/Template/proc.uptime[myapp.exe]) < 0
- Process running too long: last(/Template/proc.uptime[myapp.exe]) > 86400
- Wrong user: last(/Template/proc.user[myapp.exe]) <> "CORP\\svc_myapp"
- Unexpected parent: last(/Template/proc.parent[myapp.exe]) <> "services.exe"
The "wrong user" and "unexpected parent" triggers are especially nice for catching things like a service binary being launched manually by an operator instead of by the SCM.
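You do not need to uncomment AllowKey=system.run[*] in zabbix_agentd.conf for any of this. UserParameters are executed by the agent without it, and enabling system.run[*] lets the server run arbitrary commands on the host, so leave it off unless you need remote commands for another reason.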
What to Do Next
A couple of UserParameters and a small helper script per platform turn the agent into a proper process auditor: who is running what, for how long, under which parent, and as which user. The four signals that matter (process exists, parent is correct, user is correct, runtime is bounded) cover most of the alerts a "weird process" investigation actually needs.
Three concrete moves to deploy this on a host this week:
- Start with two processes, not twenty. Pick one critical service and one rarely-running maintenance job. Validate the JSON payload, the trigger, and the dashboard rendering on those two before you grow the list.
- Wire the LLD prototype next. Static items per process don't scale. The same UserParameter that returns one JSON blob can feed an LLD rule, which means new processes show up without anyone editing a template.
- Add a "wrong parent" trigger. It's the cheapest catch for "operator launched the binary by hand" and the kind of low-noise, high-signal alert that makes ops trust the system. One trigger, one template, applied fleet-wide.
Pairs naturally with the Low-Level Discovery post (which turns the static process list into a self-discovering one) and the log monitoring post (so when the process misbehaves, its log line shows up next to its CPU graph in Latest data).
![Zabbix Log File Monitoring with log[] and logrt[]](/Images/Posts/zabbix-log-monitoring.webp)

