Overview

As I mentioned in my previous post, all of our hosts are now registering themselves and pulling templates automatically. The next step is making them actually useful, and that starts with capturing what processes and services are doing on each host.

This guide assumes you have a working Zabbix agent install (manual or scripted from the agent post) and that the agent is reporting back to the server.

Out of the box, the Zabbix agent ships keys like proc.num[], proc.cpu.util[], and service.info[]. These are great for "is the service up?" style checks, but they don't tell you:

  • How long a process has been running.
  • Who started it.
  • What its parent process is.
  • Which service (if any) it belongs to.

To get that information into Zabbix we'll define UserParameters that wrap small PowerShell or shell commands, and then create items and triggers from them. The flow is:

  1. Add a UserParameter to the agent config.
  2. Restart the agent.
  3. Test the key with zabbix_get from the server/proxy.
  4. Create the item in a template.

UserParameters are evaluated by the agent on demand, so keep them cheap. Anything that takes more than a second or two risks hitting the agent's Timeout setting (3 seconds by default), so either cache the result or move the work into a scheduled script that writes to a file the agent reads.
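As a rough illustration of the caching idea, here is a minimal shell sketch. The cached_run function, its arguments, and the cache path are all made up for this example; nothing like it ships with the agent. It re-runs a command only when the cached copy is older than a TTL.

```shell
# Hypothetical helper: cached_run <cache-file> <ttl-seconds> <command...>
# Prints the cached output if it is fresh, otherwise re-runs the command.
cached_run() {
    local cache_file ttl now mtime tmp
    cache_file=$1
    ttl=$2
    shift 2
    now=$(date +%s)
    if [ -f "$cache_file" ]; then
        # GNU stat uses -c %Y; BSD/macOS stat uses -f %m
        mtime=$(stat -c %Y "$cache_file" 2>/dev/null || stat -f %m "$cache_file")
        if [ $((now - mtime)) -lt "$ttl" ]; then
            cat "$cache_file"
            return 0
        fi
    fi
    # Write to a temp file first so a reader never sees partial output
    tmp="${cache_file}.$$"
    if "$@" > "$tmp"; then
        mv -f "$tmp" "$cache_file"
        cat "$cache_file"
    else
        rm -f "$tmp"
        return 1
    fi
}
```

A UserParameter could then call a small script that wraps the slow command in cached_run with, say, a 60-second TTL, so most polls only read a file.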

Windows: Process Logging with PowerShell

The Windows agent reads UserParameter lines from zabbix_agentd.conf (or any .conf file under the Include directory). Each parameter shells out to powershell.exe with our script.

1. Create the Helper Script

Save the following as C:\Program Files\Zabbix Agent\scripts\Get-ProcessInfo.ps1. It returns a single JSON object with everything Zabbix needs, including EndedAt and DurationSec once a previously running process stops.

The script uses a state file per process name so it can detect the transition from running to stopped and stamp the end time. State files live under $env:ProgramData\Zabbix\proc-state\.

<#
.SYNOPSIS
    Returns process metadata as JSON for Zabbix UserParameters.
    Tracks process lifecycle via a state file so we can report
    when the process ended and how long it ran for.
#>
[CmdletBinding()]
[OutputType([string])]
param(
    [Parameter(Mandatory = $true, Position = 0)]
    [ValidateNotNullOrEmpty()]
    [string]$Name
)
begin
{
    $ErrorActionPreference = 'Stop'

    $stateDir = "$env:ProgramData\Zabbix\proc-state"
    $safeName = $Name -replace '[\\/:*?"<>|]', '_'
    $stateFile = Join-Path $stateDir "$safeName.json"

    if (-not (Test-Path $stateDir))
    {
        $null = New-Item -ItemType Directory -Path $stateDir -Force

    }
}
process
{
    # Load previous state (if any)
    $prevState = $null
    if (Test-Path $stateFile)
    {
        try
        {
            $prevState = Get-Content -LiteralPath $stateFile -Raw | ConvertFrom-Json
        }
        catch
        {
            $prevState = $null
        }
    }

    # Check if the process is currently running
    $proc = Get-CimInstance Win32_Process -Filter "Name = '$Name'" |
        Sort-Object CreationDate |
        Select-Object -First 1

    $now = Get-Date

    if ($proc)
    {
        # Process is running - build the live payload
        $owner = Invoke-CimMethod -InputObject $proc -MethodName GetOwner
        $parent = Get-CimInstance Win32_Process -Filter "ProcessId = $($proc.ParentProcessId)"
        $uptime = [int]($now - $proc.CreationDate).TotalSeconds
        
        $result = [ordered]@{
            Name        = $proc.Name
            Pid         = $proc.ProcessId
            Running     = $true
            UptimeSec   = $uptime
            StartedAt   = $proc.CreationDate.ToString('o')
            EndedAt     = $null
            DurationSec = $null
            User        = "$($owner.Domain)\$($owner.User)"
            ParentPid   = $proc.ParentProcessId
            ParentName  = $parent.Name
            CommandLine = $proc.CommandLine
        }

        # Persist current state so we can detect when it stops.
        # The state file holds the same payload we return, so serialize it once.
        $json = $result | ConvertTo-Json -Compress
        Set-Content -LiteralPath $stateFile -Value $json -Encoding UTF8

        return $json
    }
    else
    {
        
        # Process is NOT running - check if we saw it running before
        if ($prevState -and $prevState.Running -eq $true -and $prevState.StartedAt)
        {
            # Transition: was running, now stopped - stamp end time
            $startedAt = [datetime]::Parse($prevState.StartedAt)
            $endedAt = $now
            $durationSec = [int]($endedAt - $startedAt).TotalSeconds

            $result = [ordered]@{
                Name        = $Name
                Pid         = $prevState.Pid
                Running     = $false
                UptimeSec   = 0
                StartedAt   = $prevState.StartedAt
                EndedAt     = $endedAt.ToString('o')
                DurationSec = $durationSec
                User        = $(if ($prevState.User) { $prevState.User }else { "$($env:USERDOMAIN)\$($env:USERNAME)" })
                ParentPid   = $prevState.ParentPid
                ParentName  = $prevState.ParentName
                CommandLine = $prevState.CommandLine
            }

            # Update state file to reflect it has ended (prevents re-stamping on next poll).
            # The state file holds the same payload we return, so serialize it once.
            $json = $result | ConvertTo-Json -Compress
            Set-Content -LiteralPath $stateFile -Value $json -Encoding UTF8

            return $json
        }
        elseif ($prevState -and $prevState.Running -eq $false -and $prevState.EndedAt)
        {
            # Already recorded as ended - return the last known run info
            $startedAt = [datetime]::Parse($prevState.StartedAt)
            $endedAt = [datetime]::Parse($prevState.EndedAt)
            $durationSec = [int]($endedAt - $startedAt).TotalSeconds

            return [ordered]@{
                Name        = $Name
                Pid         = $prevState.Pid
                Running     = $false
                UptimeSec   = 0
                StartedAt   = $prevState.StartedAt
                EndedAt     = $prevState.EndedAt
                DurationSec = $durationSec
                User        = $prevState.User
                ParentPid   = $prevState.ParentPid
                ParentName  = $prevState.ParentName
                CommandLine = $prevState.CommandLine
            } | ConvertTo-Json -Compress
        }
        else
        {
            # Never seen this process - no state at all
            return [ordered]@{
                Name        = $Name
                Pid         = $null
                Running     = $false
                UptimeSec   = 0
                StartedAt   = $null
                EndedAt     = $null
                DurationSec = $null
                User        = $null
                ParentPid   = $null
                ParentName  = $null
                CommandLine = $null
            } | ConvertTo-Json -Compress
        }
    }
}

The lifecycle:

  • First poll, process running: returns Running=true, UptimeSec=N, EndedAt=null, DurationSec=null. Saves state.
  • Later poll, process still running: returns Running=true with updated UptimeSec.
  • Poll after process stops: detects the transition (state file says Running=true but CIM finds nothing), stamps EndedAt=now, calculates DurationSec, returns Running=false. Updates state file.
  • Subsequent polls while still stopped: returns the same EndedAt / DurationSec from the state file without re-stamping.
  • Process restarts: CIM finds it again, returns Running=true, overwrites state file.
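The branching is the same on both platforms, so here it is boiled down to a tiny shell function. This is only a sketch: classify_poll and its labels are invented for this illustration and do not appear in either helper script.

```shell
# classify_poll <prev_running> <now_running>
#   prev_running: "true", "false", or "" when no state file exists yet
#   now_running:  "true" or "false"
classify_poll() {
    local prev now
    prev=$1
    now=$2
    if [ "$now" = true ]; then
        echo "running"         # save state, report live uptime
    elif [ "$prev" = true ]; then
        echo "just-stopped"    # stamp EndedAt=now, compute DurationSec
    elif [ "$prev" = false ]; then
        echo "still-stopped"   # replay EndedAt/DurationSec from the state file
    else
        echo "never-seen"      # no state at all: every field is null
    fi
}
```

Note that a restart needs no special case: as soon as the process is found again, the first branch wins and the old state is overwritten.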

Run the script once by hand to make sure it works: powershell -File "C:\Program Files\Zabbix Agent\scripts\Get-ProcessInfo.ps1" notepad.exe. The output should look like the following:

{"Name":"notepad.exe","Pid":29224,"Running":true,"UptimeSec":3,"StartedAt":"2026-04-18T13:12:34.4686300-05:00","EndedAt":null,"DurationSec":null,"User":"Contoso\\John","ParentPid":4084,"ParentName":"explorer.exe","CommandLine":"\"C:\\Windows\\system32\\notepad.exe\" "}

2. Register the UserParameters

Create C:\Program Files\Zabbix Agent\zabbix_agentd.d\process_logging.conf with:

# proc.info[<name>] -> full JSON blob
UserParameter=proc.info[*],powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\Program Files\Zabbix Agent\scripts\Get-ProcessInfo.ps1" "$1"

# Convenience scalar keys for triggers / graphs
UserParameter=proc.uptime[*],powershell.exe -NoProfile -Command "((Get-CimInstance Win32_Process -Filter \"Name='$1'\" | Sort-Object CreationDate | Select-Object -First 1).CreationDate | ForEach-Object { [int]((Get-Date) - $_).TotalSeconds })"
UserParameter=proc.user[*],powershell.exe -NoProfile -Command "$p=Get-CimInstance Win32_Process -Filter \"Name='$1'\" | Select-Object -First 1; $o=Invoke-CimMethod $p -MethodName GetOwner; \"$($o.Domain)\$($o.User)\""
UserParameter=proc.parent[*],powershell.exe -NoProfile -Command "$p=Get-CimInstance Win32_Process -Filter \"Name='$1'\" | Select-Object -First 1; (Get-CimInstance Win32_Process -Filter \"ProcessId=$($p.ParentProcessId)\").Name"

Verify the Include directive

Open the main agent config file (C:\Program Files\Zabbix Agent\zabbix_agentd.conf or zabbix_agent2.conf for Agent 2) and confirm the Include line points at the directory where you just saved process_logging.conf:

# For Zabbix Agent (classic)
Include=C:\Program Files\Zabbix Agent\zabbix_agentd.d\*.conf

# For Zabbix Agent 2
Include=C:\Program Files\Zabbix Agent 2\zabbix_agent2.d\*.conf

If the line is commented out (# Include=...), uncomment it. If the path doesn't match where you saved process_logging.conf, either move the file or update the path.

Restart the agent so it picks up the new keys (for Agent 2, the service is named 'Zabbix Agent 2'):

Restart-Service 'Zabbix Agent'

Test locally on the agent first

Before testing from the server, verify the UserParameter works directly on the agent host. Open a PowerShell prompt as the same user the Zabbix agent service runs as and run:

# For Zabbix Agent (classic)
& 'C:\Program Files\Zabbix Agent\zabbix_agentd.exe' -t 'proc.info["notepad.exe"]'

# For Zabbix Agent 2
& 'C:\Program Files\Zabbix Agent 2\zabbix_agent2.exe' -t 'proc.info["notepad.exe"]'

Expected output: the JSON blob from Get-ProcessInfo.ps1. If you get ZBX_NOTSUPPORTED instead, work through these causes in order:

  1. Include path mismatch: the .conf file isn't in the directory the Include directive points at. Verify with dir "C:\Program Files\Zabbix Agent\zabbix_agentd.d\*.conf"; your file should show up.
  2. Agent not restarted: UserParameters are read once at startup, so the agent must be restarted after any change to the .conf files.
  3. Script path wrong: the UserParameter line references C:\Program Files\Zabbix Agent\scripts\Get-ProcessInfo.ps1. Verify the file exists at exactly that path. A single character off and the agent silently returns NOTSUPPORTED.
  4. Execution policy: the -ExecutionPolicy Bypass flag in the UserParameter should handle this, but an execution policy enforced through Group Policy takes precedence over the flag, and then the script won't run. Test manually: powershell.exe -NoProfile -ExecutionPolicy Bypass -File "C:\Program Files\Zabbix Agent\scripts\Get-ProcessInfo.ps1" notepad.exe.
  5. Agent 2 vs classic agent: Zabbix Agent 2 uses a different config file name (zabbix_agent2.conf) and a different Include directory (zabbix_agent2.d). If you installed Agent 2 but saved the UserParameter config into the classic agent's directory, it won't be loaded.
  6. UnsafeUserParameters: by default the agent rejects key arguments that contain characters such as \ ' " * ? [ ] { } ~ $ ! & ; ( ) < > | # @. If a process or service name you pass needs one of them, set UnsafeUserParameters=1 in the main config. For the keys in this post, the default safe mode works.

Always test with -t on the agent before testing from the server. If -t fails, zabbix_get from the server will fail too, and you'll be debugging network connectivity at the same time. Isolate the problem: agent first, then network.

3. Service Logging

For Windows services, the built-in service.info[] key already returns state, but it doesn't tell you the executable, the start mode, or the account. A small wrapper does. Add it to the same process_logging.conf:

UserParameter=service.full[*],powershell.exe -NoProfile -Command "Get-CimInstance Win32_Service -Filter \"Name='$1'\" | Select-Object Name,DisplayName,State,StartMode,StartName,ProcessId,PathName | ConvertTo-Json -Compress"

Restart the agent and you can now query things like:

service.full[Spooler]
service.full[Zabbix Agent]

4. Test From the Server

From your Zabbix server or proxy, use zabbix_get to confirm the agent answers:

zabbix_get -s 10.0.0.50 -k 'proc.info[notepad.exe]'
zabbix_get -s 10.0.0.50 -k 'proc.uptime[notepad.exe]'
zabbix_get -s 10.0.0.50 -k 'service.full[Spooler]'

A healthy reply while the process is running:

{"Name":"notepad.exe","Pid":29224,"Running":true,"UptimeSec":3,"StartedAt":"2026-04-18T13:12:34.4686300-05:00","EndedAt":null,"DurationSec":null,"User":"Contoso\\John","ParentPid":4084,"ParentName":"explorer.exe","CommandLine":"\"C:\\Windows\\system32\\notepad.exe\" "}

After the process stops, the next poll returns:

{"Name":"notepad.exe","Pid":29224,"Running":false,"UptimeSec":0,"StartedAt":"2026-04-18T13:12:34.46863-05:00","EndedAt":"2026-04-18T13:13:35.4872109-05:00","DurationSec":61,"User":"Contoso\\John","ParentPid":4084,"ParentName":"explorer.exe","CommandLine":"\"C:\\Windows\\system32\\notepad.exe\" "}

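DurationSec is the total wall-clock seconds the process ran for. Trigger functions can't reach inside a JSON value, so first extract the field into a dependent item with a JSONPath preprocessing step of $.DurationSec. Assuming you give that dependent item a key such as proc.duration[myapp.exe] (a name picked here purely for illustration), a trigger like last(/Template/proc.duration[myapp.exe]) < 60 catches processes that crash immediately after starting.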
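To sanity-check the "crashed right after starting" condition outside Zabbix, you can evaluate it against a payload captured with zabbix_get. The sketch below is an assumption-laden illustration: json_number and fast_crash are hypothetical helpers, and the sed pattern only copes with the compact, flat JSON that the helper script emits.

```shell
# json_number <field>: print the numeric value of <field> from a compact JSON
# object on stdin; prints nothing when the field is null or missing.
json_number() {
    sed -n 's/.*"'"$1"'":\(-\{0,1\}[0-9][0-9]*\).*/\1/p'
}

# fast_crash <threshold-sec>: succeed when the payload on stdin describes a
# stopped process whose last run was shorter than the threshold.
fast_crash() {
    local payload duration
    payload=$(cat)
    case $payload in
        *'"Running":false'*) ;;
        *) return 1 ;;
    esac
    duration=$(printf '%s\n' "$payload" | json_number DurationSec)
    [ -n "$duration" ] && [ "$duration" -lt "$1" ]
}

# Usage (assumes zabbix_get is installed and the agent is reachable):
#   zabbix_get -s 10.0.0.50 -k 'proc.info[myapp.exe]' | fast_crash 60 && echo "crashed early"
```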

Linux: Process Logging with Shell

The Linux agent uses the same UserParameter mechanism. The data we want lives in /proc and the output of ps.

1. Create the Helper Script

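Save as /etc/zabbix/scripts/proc-info.sh and chmod +x it. Like the Windows version, it uses a state file per process to track the running-to-stopped transition and calculate total duration. The state directory (/var/lib/zabbix/proc-state) must be writable by the account the agent runs as, typically zabbix.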

#!/usr/bin/env bash
# Usage: proc-info.sh <process-name>
# State files: /var/lib/zabbix/proc-state/<name>.json
set -euo pipefail

NAME="${1:?process name required}"

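STATE_DIR="/var/lib/zabbix/proc-state"
# Keep the state file inside STATE_DIR even if the name contains "/" or other odd characters
SAFE_NAME="${NAME//[^A-Za-z0-9._-]/_}"
STATE_FILE="${STATE_DIR}/${SAFE_NAME}.json"
mkdir -p "$STATE_DIR"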

NOW=$(date -u '+%Y-%m-%dT%H:%M:%S+00:00')
NOW_EPOCH=$(date +%s)

# Pick the oldest matching PID so restarts don't churn the value
PID=$(pgrep -o -x "$NAME" 2>/dev/null || true)

if [[ -n "$PID" ]]; then
    # Process is running
    USER_NAME=$(ps -o user= -p "$PID" | tr -d ' ')
    PPID_VAL=$(ps -o ppid= -p "$PID" | tr -d ' ')
    PARENT=$(ps -o comm= -p "$PPID_VAL" 2>/dev/null | tr -d ' ')
    ETIME=$(ps -o etimes= -p "$PID" | tr -d ' ')
    # Escape backslashes before quotes so the command line stays valid JSON
    CMD=$(tr '\0' ' ' < "/proc/$PID/cmdline" 2>/dev/null | sed 's/\\/\\\\/g; s/"/\\"/g; s/ *$//')

    START_EPOCH=$((NOW_EPOCH - ETIME))
    STARTED_AT=$(date -u -d "@$START_EPOCH" '+%Y-%m-%dT%H:%M:%S+00:00' 2>/dev/null || date -u -r "$START_EPOCH" '+%Y-%m-%dT%H:%M:%S+00:00')

    # Save state
    printf '{"pid":%d,"started_at":"%s","running":true,"ended_at":null}\n' \
        "$PID" "$STARTED_AT" > "$STATE_FILE"

    # Output
    printf '{"name":"%s","pid":%d,"running":true,"uptime_sec":%d,"started_at":"%s","ended_at":null,"duration_sec":null,"user":"%s","parent_pid":%s,"parent_name":"%s","command_line":"%s"}\n' \
        "$NAME" "$PID" "$ETIME" "$STARTED_AT" "$USER_NAME" "$PPID_VAL" "$PARENT" "$CMD"
else
    # Process is NOT running - check state file
    if [[ -f "$STATE_FILE" ]]; then
        PREV_RUNNING=$(python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('running','false'))" < "$STATE_FILE" 2>/dev/null || echo "false")
        PREV_PID=$(python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('pid',0))" < "$STATE_FILE" 2>/dev/null || echo "0")
        PREV_STARTED=$(python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('started_at',''))" < "$STATE_FILE" 2>/dev/null || echo "")
        PREV_ENDED=$(python3 -c "import json,sys; d=json.load(sys.stdin); print(d.get('ended_at','None'))" < "$STATE_FILE" 2>/dev/null || echo "None")

        if [[ "$PREV_RUNNING" == "True" && -n "$PREV_STARTED" ]]; then
            # Transition: was running, now stopped - stamp end time
            START_EPOCH=$(date -u -d "$PREV_STARTED" +%s 2>/dev/null || date -u -j -f '%Y-%m-%dT%H:%M:%S+00:00' "$PREV_STARTED" +%s 2>/dev/null || echo "0")
            DURATION=$((NOW_EPOCH - START_EPOCH))

            # Update state
            printf '{"pid":%s,"started_at":"%s","running":false,"ended_at":"%s"}\n' \
                "$PREV_PID" "$PREV_STARTED" "$NOW" > "$STATE_FILE"

            printf '{"name":"%s","pid":%s,"running":false,"uptime_sec":0,"started_at":"%s","ended_at":"%s","duration_sec":%d,"user":null,"parent_pid":null,"parent_name":null,"command_line":null}\n' \
                "$NAME" "$PREV_PID" "$PREV_STARTED" "$NOW" "$DURATION"

        elif [[ "$PREV_ENDED" != "None" && "$PREV_ENDED" != "null" && -n "$PREV_STARTED" ]]; then
            # Already ended - return last known run
            START_EPOCH=$(date -u -d "$PREV_STARTED" +%s 2>/dev/null || echo "0")
            END_EPOCH=$(date -u -d "$PREV_ENDED" +%s 2>/dev/null || echo "0")
            DURATION=$((END_EPOCH - START_EPOCH))

            printf '{"name":"%s","pid":%s,"running":false,"uptime_sec":0,"started_at":"%s","ended_at":"%s","duration_sec":%d,"user":null,"parent_pid":null,"parent_name":null,"command_line":null}\n' \
                "$NAME" "$PREV_PID" "$PREV_STARTED" "$PREV_ENDED" "$DURATION"
        else
            # State file exists but no useful data
            printf '{"name":"%s","pid":null,"running":false,"uptime_sec":0,"started_at":null,"ended_at":null,"duration_sec":null,"user":null,"parent_pid":null,"parent_name":null,"command_line":null}\n' "$NAME"
        fi
    else
        # No state file - never seen this process
        printf '{"name":"%s","pid":null,"running":false,"uptime_sec":0,"started_at":null,"ended_at":null,"duration_sec":null,"user":null,"parent_pid":null,"parent_name":null,"command_line":null}\n' "$NAME"
    fi
fi

The script uses python3 for JSON parsing of the state file. If Python isn't available on your agent hosts, replace those lines with jq (jq -r '.running' "$STATE_FILE") or simple grep/sed extraction.
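If neither python3 nor jq is available, a sed-based extraction along these lines may be enough for the flat, one-line state file this script writes. state_get is a hypothetical helper name, and the pattern is a sketch that assumes values never contain quotes, commas, or braces, which holds for the four fields stored here.

```shell
# state_get <key> <file>: print the value of a top-level key from the one-line
# state file, with surrounding quotes stripped. JSON null is printed as "null".
state_get() {
    sed -n 's/.*"'"$1"'":"\{0,1\}\([^",}]*\)"\{0,1\}[,}].*/\1/p' "$2"
}

# Usage inside proc-info.sh (replacing the python3 calls):
#   PREV_RUNNING=$(state_get running "$STATE_FILE")
#   PREV_STARTED=$(state_get started_at "$STATE_FILE")
```

With this approach the values come back as JSON literals, so the later comparisons would need to test for true and null rather than Python's True and None.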

2. Register the UserParameters

Create /etc/zabbix/zabbix_agentd.d/process_logging.conf:

UserParameter=proc.info[*],/etc/zabbix/scripts/proc-info.sh "$1"
UserParameter=proc.uptime[*],ps -o etimes= -p $(pgrep -o -x "$1") 2>/dev/null | tr -d ' '
UserParameter=proc.user[*],ps -o user= -p $(pgrep -o -x "$1") 2>/dev/null | tr -d ' '
UserParameter=proc.parent[*],ps -o comm= -p $(ps -o ppid= -p $(pgrep -o -x "$1") | tr -d ' ') 2>/dev/null

Restart the agent (the unit is zabbix-agent2 if you run Agent 2):

sudo systemctl restart zabbix-agent

3. Service Logging with systemd

For systemd-managed services, wrap systemctl show so Zabbix gets a structured payload:

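UserParameter=service.full[*],systemctl show "$1" --property=Id,ActiveState,SubState,MainPID,User,ExecMainStartTimestamp,FragmentPath --no-pager | awk -F= 'BEGIN{printf "{"} {printf "%s\"%s\":\"%s\"", (NR>1?",":""), $$1, $$2} END{printf "}\n"}'

The awk snippet turns Id=zabbix-agent2.service\nActiveState=active\n... into a single JSON object that the Zabbix preprocessing pipeline can parse. The awk fields are written as $$1 and $$2 because, in a UserParameter with [*], the agent replaces $1 through $9 with the key's arguments; a doubled dollar sign passes the reference through to awk unaltered.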
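To see what the awk stage does in isolation, here is a sketch you can run in a plain shell, outside the agent, so the fields are ordinary awk field references. The sample values are made up, and kv_to_json is just a name chosen for this example.

```shell
# Sample of the key=value lines that systemctl show prints (values are made up)
sample='Id=zabbix-agent2.service
ActiveState=active
SubState=running
MainPID=812'

# kv_to_json: turn key=value lines on stdin into one flat JSON object
kv_to_json() {
    awk -F= 'BEGIN { printf "{" }
             { printf "%s\"%s\":\"%s\"", (NR > 1 ? "," : ""), $1, $2 }
             END   { printf "}\n" }'
}

printf '%s\n' "$sample" | kv_to_json
# → {"Id":"zabbix-agent2.service","ActiveState":"active","SubState":"running","MainPID":"812"}
```

Every value comes out as a JSON string, and splitting on = means a value that itself contains = would be cut short, so treat this as good enough for the properties listed above rather than a general converter.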

4. Test From the Server

zabbix_get -s 10.0.0.51 -k 'proc.info[sshd]'
zabbix_get -s 10.0.0.51 -k 'proc.uptime[sshd]'
zabbix_get -s 10.0.0.51 -k 'service.full[zabbix-agent2.service]'
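When checking several keys at once, it can help to classify each reply instead of eyeballing it. This is a small sketch; classify_reply is a hypothetical helper, and it only assumes that failures start with ZBX_NOTSUPPORTED, as in the troubleshooting notes earlier.

```shell
# classify_reply: read one zabbix_get reply on stdin and print what it looks like
classify_reply() {
    local reply
    reply=$(cat)
    case $reply in
        ZBX_NOTSUPPORTED*) echo "unsupported" ;;  # key not loaded or command failed
        '')                echo "empty" ;;        # e.g. scalar key, process not running
        \{*\})             echo "json" ;;         # proc.info / service.full payload
        *)                 echo "scalar" ;;       # proc.uptime, proc.user, proc.parent
    esac
}

# Usage (assumes zabbix_get is installed on the server or proxy):
#   zabbix_get -s 10.0.0.51 -k 'proc.info[sshd]' | classify_reply
```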

Wire It Up in the Frontend

1. Create the Items

In your Windows / Linux template, add items with:

  • Type: Zabbix agent (or Zabbix agent (active) if you prefer)
  • Key: proc.info[<name>], for example proc.info["notepad.exe"]
  • Type of information: Text
  • Update interval: 1m

proc.info will not appear in the key dropdown. UserParameter keys are custom, so the Zabbix frontend has no way to know about them. Don't use the "Select" button; just type proc.info[yourprocess] directly into the Key field. If you get ZBX_NOTSUPPORTED, the agent hasn't picked up the UserParameter: verify the .conf file is in the Include path and restart the agent.

For the JSON payload, use a master item plus dependent items so the agent is only polled once per cycle:

  • Master item: one proc.info[<name>] item per process, stored as Text.
  • Dependent items: one per field you want to graph or alert on (User, ParentName, UptimeSec, etc.), each with a JSONPath preprocessing step such as $.UptimeSec. The Linux script emits snake_case field names, so use $.uptime_sec there.
  • Optional second step: Discard unchanged with heartbeat to keep history small.

2. Useful Triggers

A few triggers worth adding:

  • Process restarted: change(/Template/proc.uptime[myapp.exe]) < 0
  • Process running too long: last(/Template/proc.uptime[myapp.exe]) > 86400
  • Wrong user: last(/Template/proc.user[myapp.exe]) <> "CORP\\svc_myapp"
  • Unexpected parent: last(/Template/proc.parent[myapp.exe]) <> "services.exe"

The "wrong user" and "unexpected parent" triggers are especially nice for catching things like a service binary being launched manually by an operator instead of by the SCM.

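You do not need to uncomment AllowKey=system.run[*] in zabbix_agentd.conf for any of this. UserParameters are executed by the agent without it, and leaving system.run[*] disabled avoids exposing arbitrary remote command execution.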

What to Do Next

A couple of UserParameters and a small helper script per platform turn the agent into a proper process auditor: who is running what, for how long, under which parent, and as which user. The four signals that matter (process exists, parent is correct, user is correct, runtime is bounded) cover most of the alerts a "weird process" investigation actually needs.

Three concrete moves to deploy this on a host this week:

  1. Start with two processes, not twenty. Pick one critical service and one rarely-running maintenance job. Validate the JSON payload, the trigger, and the dashboard rendering on those two before you grow the list.
  2. Wire the LLD prototype next. Static items per process don't scale. The same UserParameter that returns one JSON blob can feed an LLD rule, which means new processes show up without anyone editing a template.
  3. Add a "wrong parent" trigger. It's the cheapest catch for "operator launched the binary by hand" and the kind of low-noise, high-signal alert that makes ops trust the system. One trigger, one template, applied fleet-wide.

Pairs naturally with the Low-Level Discovery post (which turns the static process list into a self-discovering one) and the log monitoring post (so when the process misbehaves, its log line shows up next to its CPU graph in Latest data).