Final post in the Active Directory at Scale series. Reuses the reconciler skeleton from the AD-as-code post and the audit/enforce pattern from the security-baselines post.
Every Active Directory older than five years carries sediment. Accounts for people who left in 2019. Groups with one member who's also left. DNS records pointing at IP addresses no one recognizes. Service accounts tied to applications decommissioned two reorgs ago. SPN entries from a 2015 SQL cluster migration. Machine accounts for laptops that were wiped and reissued under different names.
None of it is immediately broken. All of it is attack surface, audit noise, and operational ambiguity. "Is this group in use?" becomes a five-person Slack thread. "Can I delete this OU?" becomes a three-week project because nobody's sure what depends on it.
The cleanup strategy in this post is: inventory → classify → quarantine → retention → delete. Every stage is a Get-Test-Set triple. Nothing is deleted without first spending time in a quarantine OU so mistakes surface while they're cheap.
The Four Object Classes Worth Cleaning
| Class | Definition of "stale" |
|---|---|
| User | Enabled = $false for 90 days, or LastLogonDate < today - 180 days, or never logged in and created more than 60 days ago |
| Computer | PasswordLastSet < today - 90 days (secure channel password hasn't rotated) |
| Group | Zero direct members and zero usage as an ACE anywhere in the forest |
| DNS | Dynamic record with timestamp older than the scavenging threshold, or static A record pointing at an IP that no host answers |
Those are starting rules. Every environment has exceptions (legacy service accounts, break-glass admin accounts, DR-only hosts). The YAML state file encodes exceptions explicitly:
cleanup:
  users:
    staleAfterDays: 180
    neverLoggedInAfterDays: 60
    exempt:
      - identity:svc-breakglass          # always exempt
      - ou:ou.service-accounts.legacy    # ou-wide exemption
  computers:
    staleAfterDays: 90
    exempt:
      - identity:dr-site-witness
  groups:
    deleteEmptyAfterDays: 30
    exempt: []
  dns:
    scavengeAfterDays: 30
    zones:
      - corp.example.com
      - _msdcs.corp.example.com
quarantine:
  ouDn: OU=Quarantine,DC=corp,DC=example,DC=com
  retentionDays: 45
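The inventory functions in this post test exemptions with a plain DistinguishedName comparison, which covers single identities but not OU-wide entries. Here is a minimal sketch of a check that handles both, assuming the identity: and ou: entries have already been resolved to distinguished names. The function name Test-CleanupExempt, its parameters, and the sample DNs are hypothetical, not part of the ADOps module.

```powershell
function Test-CleanupExempt
{
    [CmdletBinding()]
    [OutputType([bool])]
    param(
        [Parameter(Mandatory)] [string]$DistinguishedName,
        [string[]]$ExemptDn = @(),      # resolved from identity: entries (assumption)
        [string[]]$ExemptOuDn = @()     # resolved from ou: entries (assumption)
    )

    # Exact match against individually exempted objects (-contains is case-insensitive).
    if ($ExemptDn -contains $DistinguishedName) { return $true }

    # OU-wide exemption: the object's DN ends with ",<OU DN>" at any depth.
    foreach ($ou in $ExemptOuDn)
    {
        if ($DistinguishedName.EndsWith(",$ou", [System.StringComparison]::OrdinalIgnoreCase))
        {
            return $true
        }
    }
    $false
}

# Example with made-up DNs:
$legacyOu = 'OU=Service Accounts,DC=corp,DC=example,DC=com'
Test-CleanupExempt -DistinguishedName "CN=svc-old,OU=Legacy,$legacyOu" -ExemptOuDn $legacyOu
```

The suffix match includes the leading comma so that an OU named "Service Accounts" does not accidentally exempt one named "Old Service Accounts".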
Inventory What's Actually There
Inventory is Get-*: pure read, no side effects. Output is a record per candidate:
function Get-StaleUserInventory
{
[CmdletBinding()]
[OutputType([pscustomobject[]])]
param(
[Parameter(Mandatory)] [int]$StaleAfterDays,
[Parameter(Mandatory)] [int]$NeverLoggedInAfterDays,
[string]$SearchBase,
[string[]]$Exempt = @()
)
begin
{
Write-Verbose -Message '[Get-StaleUserInventory] Begin'
}
process
{
$staleCutoff = (Get-Date).AddDays(-$StaleAfterDays)
$noLoginCutoff = (Get-Date).AddDays(-$NeverLoggedInAfterDays)
$params = @{
Filter = { Enabled -eq $true -or Enabled -eq $false }
Properties = 'LastLogonDate','PasswordLastSet','whenCreated','Enabled','Description','MemberOf'
}
if ($SearchBase) { $params.SearchBase = $SearchBase }
Get-ADUser @params | ForEach-Object {
$reason = $null
if (-not $_.Enabled -and
$_.whenCreated -lt $staleCutoff -and
(-not $_.LastLogonDate -or $_.LastLogonDate -lt $staleCutoff))
{
$reason = 'Disabled and idle'
}
elseif ($_.Enabled -and $_.LastLogonDate -and $_.LastLogonDate -lt $staleCutoff)
{
$reason = 'Enabled but last logon > threshold'
}
elseif ($_.Enabled -and -not $_.LastLogonDate -and $_.whenCreated -lt $noLoginCutoff)
{
$reason = 'Never logged in after creation window'
}
if (-not $reason) { return }
if ($Exempt -contains $_.DistinguishedName) { return }
[pscustomobject]@{
DistinguishedName = $_.DistinguishedName
SamAccountName = $_.SamAccountName
Enabled = $_.Enabled
LastLogonDate = $_.LastLogonDate
PasswordLastSet = $_.PasswordLastSet
WhenCreated = $_.whenCreated
Description = $_.Description
Reason = $reason
IdleDays = if ($_.LastLogonDate) { [int]((Get-Date) - $_.LastLogonDate).TotalDays } else { $null }
AgeDays = [int]((Get-Date) - $_.whenCreated).TotalDays
}
}
}
end
{
Write-Verbose -Message '[Get-StaleUserInventory] End'
}
}
Two defensive choices:
- LastLogonDate isn't precise. It's derived from lastLogonTimestamp, which by default is only updated (and replicated) when the stored value is 9 to 14 days old, so it can lag real activity by up to 14 days. For borderline cases, query each DC directly for lastLogon (which is per-DC, non-replicated, but accurate to the second on that DC) and take the max. For bulk cleanup runs, 14 days' slop is fine; the retention window absorbs it.
- Never-logged-in accounts are the single most common source of account bloat: "we provisioned it for the new hire but she didn't join." The "created X days ago, still no login" rule catches those without hitting real quiet-but-active accounts.
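The "query each DC and take the max" step can be sketched as a small pure function plus a commented-out collection loop. The function name Get-LatestLogon and the account name jdoe are hypothetical. lastLogon is stored as a Windows FILETIME integer, and 0 (or a missing value) means the account never authenticated against that DC.

```powershell
function Get-LatestLogon
{
    [CmdletBinding()]
    [OutputType([datetime])]
    param(
        # Raw lastLogon values, one per domain controller.
        [Parameter(Mandatory)] [AllowEmptyCollection()] [long[]]$LastLogonValues
    )

    # Manual loop instead of Measure-Object, which can return a double and
    # lose precision on 64-bit FILETIME values.
    $max = [long]0
    foreach ($value in $LastLogonValues)
    {
        if ($value -gt $max) { $max = $value }
    }

    if ($max -le 0) { return $null }      # never logged on anywhere
    [datetime]::FromFileTimeUtc($max)
}

# Hypothetical collection against a live domain (requires the ActiveDirectory module):
# $values = Get-ADDomainController -Filter * | ForEach-Object {
#     (Get-ADUser -Identity 'jdoe' -Server $_.HostName -Properties lastLogon).lastLogon
# }
# Get-LatestLogon -LastLogonValues $values
```

Each extra DC is one more LDAP round trip per account, so this is worth running only on the borderline candidates, not on the whole inventory.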
Computer inventory is the same shape, keyed on PasswordLastSet:
function Get-StaleComputerInventory
{
[CmdletBinding()]
param(
[Parameter(Mandatory)] [int]$StaleAfterDays,
[string]$SearchBase,
[string[]]$Exempt = @()
)
begin
{
Write-Verbose -Message '[Get-StaleComputerInventory] Begin'
}
process
{
$cutoff = (Get-Date).AddDays(-$StaleAfterDays)
$params = @{
Filter = { PasswordLastSet -lt $cutoff -or PasswordLastSet -notlike '*' }
Properties = 'PasswordLastSet','OperatingSystem','whenCreated','Enabled','PrimaryGroupID'
}
if ($SearchBase) { $params.SearchBase = $SearchBase }
# Never treat domain controllers (primary group 516) or read-only DCs (521) as stale.
Get-ADComputer @params |
Where-Object { $_.PrimaryGroupID -notin @(516, 521) -and $_.DistinguishedName -notin $Exempt } |
Select-Object DistinguishedName, Name, OperatingSystem, Enabled,
@{ N='PasswordLastSet'; E={ $_.PasswordLastSet } },
@{ N='IdleDays'; E={ if ($_.PasswordLastSet) { [int]((Get-Date) - $_.PasswordLastSet).TotalDays } else { $null } } }
}
end
{
Write-Verbose -Message '[Get-StaleComputerInventory] End'
}
}
The PasswordLastSet < cutoff check is better than LastLogonDate: a computer account's secure channel password rotates every 30 days automatically. If it hasn't rotated, the machine is powered off, unreachable, or dead. 90 days is a safe threshold.
The Hard One: Orphaned Groups
Empty groups are easy: member count = 0. But functionally orphaned groups (groups that have members but no actual use) require walking every ACL in the forest looking for the group's SID.
function Get-OrphanGroupInventory
{
[CmdletBinding()]
param(
[Parameter(Mandatory)] [int]$DeleteEmptyAfterDays,
[string]$SearchBase,
[string[]]$Exempt = @()
)
begin
{
Write-Verbose -Message '[Get-OrphanGroupInventory] Begin'
}
process
{
$cutoff = (Get-Date).AddDays(-$DeleteEmptyAfterDays)
# Step 1 - index every SID referenced in the forest ACL set
$usedSids = [System.Collections.Generic.HashSet[string]]::new()
Get-ADObject -SearchBase $SearchBase -Filter * -Properties ntSecurityDescriptor |
ForEach-Object {
foreach ($ace in $_.ntSecurityDescriptor.Access)
{
try
{
$sid = $ace.IdentityReference.Translate([System.Security.Principal.SecurityIdentifier]).Value
[void]$usedSids.Add($sid)
} catch { }
}
}
# Step 2 - index every SID used as a GPO-applies principal or group membership
Get-GPO -All | ForEach-Object {
$perms = Get-GPPermission -Guid $_.Id -All
foreach ($p in $perms)
{
# Get-GPPermission already exposes the trustee's SID; no name-to-SID translation needed.
if ($p.Trustee.Sid) { [void]$usedSids.Add($p.Trustee.Sid.Value) }
}
}
# Step 3 - any group whose SID is absent from usedSids, AND has <= 1 direct members,
# AND has whenChanged older than the threshold, is orphan-candidate
Get-ADGroup -Filter * -SearchBase $SearchBase -Properties Members, whenChanged, Description |
Where-Object {
$_.DistinguishedName -notin $Exempt -and
$_.Members.Count -le 1 -and
$_.whenChanged -lt $cutoff -and
-not $usedSids.Contains($_.SID.Value)
} |
Select-Object DistinguishedName, Name, Description, @{ N='MemberCount'; E={ $_.Members.Count } }, whenChanged
}
end
{
Write-Verbose -Message '[Get-OrphanGroupInventory] End'
}
}
This is expensive: walking every object's ACL is slow in a large forest. Run it weekly, not hourly. Cache the result.
The single-member test is a heuristic. Groups used for GPO security filtering often have exactly one member, and the ACL and GPO checks catch those. The "at most one member" rule is a safety net for when the ACL walk misses a reference: a group with two or more members is never flagged, however unused it looks.
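One way to honor "run it weekly, cache the result" is a small file-backed cache around the expensive call. This is a sketch under assumptions: the function name Get-CachedInventory, its parameters, and the cache path are hypothetical, and the serialization uses the built-in Export-Clixml and Import-Clixml cmdlets.

```powershell
function Get-CachedInventory
{
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [string]$CachePath,
        [Parameter(Mandatory)] [int]$MaxAgeDays,
        [Parameter(Mandatory)] [scriptblock]$Producer   # the expensive inventory call
    )

    if (Test-Path -LiteralPath $CachePath)
    {
        $age = (Get-Date) - (Get-Item -LiteralPath $CachePath).LastWriteTime
        if ($age.TotalDays -lt $MaxAgeDays)
        {
            Write-Verbose -Message "Cache hit: $CachePath"
            return Import-Clixml -LiteralPath $CachePath
        }
    }

    # Cache miss or stale cache: run the producer and persist its output.
    $result = @(& $Producer)
    Export-Clixml -InputObject $result -LiteralPath $CachePath
    $result
}

# Hypothetical usage with the function from this post:
# $groups = Get-CachedInventory -CachePath './cache/orphan-groups.xml' -MaxAgeDays 7 -Producer {
#     Get-OrphanGroupInventory -DeleteEmptyAfterDays 30 -SearchBase 'DC=corp,DC=example,DC=com'
# }
```

Objects read back from the cache are deserialized copies, which is fine for reporting and for passing DistinguishedName values to later stages.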
DNS Cleanup
Three kinds of records that go stale:
- Dynamic A/PTR: clients that registered and never came back.
- Static A entries: hand-added in 2012 for hosts that moved ten times since.
- SRV records: domain controllers decommissioned without cleanup.
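Aging + scavenging handles dynamic records if it's enabled. Most environments never turn scavenging on, fearing it'll eat static records. It won't: static records carry a zero timestamp and are never scavenged. The correct pattern is to enable aging per zone with explicit intervals: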
function Set-DnsScavengingCompliance
{
[CmdletBinding(SupportsShouldProcess)]
param(
[Parameter(Mandatory)] [string]$DnsServer,
[Parameter(Mandatory)] [string]$Zone,
[Parameter(Mandatory)] [int]$NoRefreshIntervalHours,
[Parameter(Mandatory)] [int]$RefreshIntervalHours
)
begin
{
Write-Verbose -Message '[Set-DnsScavengingCompliance] Begin'
}
process
{
$ErrorActionPreference = 'Stop'
$errors = New-Object System.Collections.Generic.List[string]
$zoneCfg = Get-DnsServerZone -ComputerName $DnsServer -Name $Zone
$aging = Get-DnsServerZoneAging -ComputerName $DnsServer -Name $Zone
if (-not $aging.AgingEnabled -and
$PSCmdlet.ShouldProcess("$DnsServer / $Zone", 'Enable aging'))
{
$agingParams = @{
ComputerName = $DnsServer
Name = $Zone
Aging = $true
NoRefreshInterval = (New-TimeSpan -Hours $NoRefreshIntervalHours)
RefreshInterval = (New-TimeSpan -Hours $RefreshIntervalHours)
Confirm = $false
ErrorAction = 'Stop'
}
try
{
Set-DnsServerZoneAging @agingParams
}
catch [System.Runtime.InteropServices.COMException]
{
$errors.Add(('DNS RPC failure against {0}: {1}' -f $DnsServer, $_.Exception.Message))
throw
}
catch
{
$errors.Add(('Set-DnsServerZoneAging failed for {0}/{1}: {2}' -f $DnsServer, $Zone, $_.Exception.Message))
throw
}
}
[pscustomobject]@{
DnsServer = $DnsServer
Zone = $Zone
Errors = $errors.ToArray()
}
}
end
{
Write-Verbose -Message '[Set-DnsScavengingCompliance] End'
}
}
To find static records that are actually stale (pointing at nothing), reach out and confirm:
function Get-StaleStaticDnsRecord
{
[CmdletBinding()]
param(
[Parameter(Mandatory)] [string]$DnsServer,
[Parameter(Mandatory)] [string]$Zone
)
begin
{
Write-Verbose -Message '[Get-StaleStaticDnsRecord] Begin'
}
process
{
Get-DnsServerResourceRecord -ComputerName $DnsServer -ZoneName $Zone -RRType A |
Where-Object { -not $_.TimeStamp } | # static (non-dynamic) records
ForEach-Object {
$name = "$($_.HostName).$Zone"
$ip = $_.RecordData.IPv4Address.IPAddressToString
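# -TimeoutSeconds needs PowerShell 6 or later; Test-Connection in Windows PowerShell 5.1 does not have it.
$live = Test-Connection -ComputerName $ip -Count 1 -Quiet -TimeoutSeconds 2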
[pscustomobject]@{
Zone = $Zone
Host = $_.HostName
IP = $ip
Reachable = $live
Stale = (-not $live)
}
} | Where-Object Stale
}
end
{
Write-Verbose -Message '[Get-StaleStaticDnsRecord] End'
}
}
A single failed ping isn't sufficient proof that a host is gone: firewalls block ICMP, hosts reboot. For higher confidence, run this for 14 consecutive days and only flag records that came back unreachable every time.
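The "unreachable every time for 14 days" rule needs a little state between runs. Here is a minimal sketch, assuming each run's results carry Host, IP and Reachable properties as emitted by Get-StaleStaticDnsRecord. The function name Update-UnreachableStreak and the idea of persisting the hashtable between runs are assumptions.

```powershell
function Update-UnreachableStreak
{
    [CmdletBinding()]
    [OutputType([hashtable])]
    param(
        [hashtable]$Previous = @{},                                   # yesterday's streaks
        [Parameter(Mandatory)] [AllowEmptyCollection()] [object[]]$TodayResults
    )

    $next = @{}
    foreach ($record in $TodayResults)
    {
        $key = '{0}|{1}' -f $record.Host, $record.IP
        if ($record.Reachable)
        {
            $next[$key] = 0                                           # host answered: reset
        }
        else
        {
            $next[$key] = 1 + [int]$Previous[$key]                    # missing key counts as 0
        }
    }
    # Records absent from today's results are dropped, which also resets their streak.
    $next
}

# Two consecutive unreachable days for a made-up record:
$sample = [pscustomobject]@{ Host = 'legacy-app'; IP = '10.0.0.1'; Reachable = $false }
$day1 = Update-UnreachableStreak -TodayResults @($sample)
$day2 = Update-UnreachableStreak -Previous $day1 -TodayResults @($sample)
$day2.GetEnumerator() | Where-Object { $_.Value -ge 2 }
```

In a real run the threshold would be 14, and the hashtable would be saved between runs, for example with Export-Clixml.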
Classify → Quarantine
Once inventory is done, the classifier tags each object with a disposition:
- Keep: matches an exemption rule; do nothing.
- Review: borderline case; produce a report entry for humans.
- Quarantine: move to the quarantine OU, disable, and set a retention timer.
- Delete: quarantine period expired without objection; tombstone.
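The classifier itself is not shown in this post. As a rough sketch of how the four dispositions could be derived, the function below uses hypothetical input properties (IsExempt, IdleDays, QuarantinedOn) and a hypothetical 14-day review margin chosen to match the LastLogonDate lag. Treat the rules as an illustration, not the series' actual implementation.

```powershell
function Get-CleanupDisposition
{
    [CmdletBinding()]
    [OutputType([string])]
    param(
        [Parameter(Mandatory)] [pscustomobject]$Candidate,
        [Parameter(Mandatory)] [int]$StaleAfterDays,
        [Parameter(Mandatory)] [int]$RetentionDays,
        [int]$ReviewMarginDays = 14,          # assumption: mirrors the lastLogonTimestamp slop
        [datetime]$Now = (Get-Date)
    )

    if ($Candidate.IsExempt) { return 'Keep' }

    # Already quarantined: either the retention window has run out or it keeps waiting.
    if ($Candidate.QuarantinedOn)
    {
        $daysInQuarantine = ($Now - [datetime]$Candidate.QuarantinedOn).TotalDays
        if ($daysInQuarantine -gt $RetentionDays) { return 'Delete' }
        return 'Quarantine'
    }

    # Just past the threshold: too close to call automatically, hand to a human.
    if ($null -ne $Candidate.IdleDays -and
        $Candidate.IdleDays -lt ($StaleAfterDays + $ReviewMarginDays))
    {
        return 'Review'
    }

    'Quarantine'
}

# Made-up candidate that is 185 days idle against a 180-day threshold:
$borderline = [pscustomobject]@{ IsExempt = $false; IdleDays = 185; QuarantinedOn = $null }
Get-CleanupDisposition -Candidate $borderline -StaleAfterDays 180 -RetentionDays 45
```

Keeping the decision in one pure function makes the rules easy to unit-test before any of them is allowed to touch the directory.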
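The disposition lives on the object itself. We hijack a rarely-used attribute (extensionAttribute14, which exists only in forests where the Exchange schema extensions have been applied) to store the date the object entered quarantine and the reason: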
function Move-ToQuarantine
{
[CmdletBinding(SupportsShouldProcess, ConfirmImpact = 'High')]
param(
[Parameter(Mandatory, ValueFromPipeline)] [pscustomobject]$Candidate,
[Parameter(Mandatory)] [string]$QuarantineOu
)
begin
{
Write-Verbose -Message '[Move-ToQuarantine] Begin'
}
process
{
$ErrorActionPreference = 'Stop'
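$dn = $Candidate.DistinguishedName
# Skip objects already in quarantine so a daily re-run doesn't re-tag them and reset their retention date.
if ($dn.EndsWith(",$QuarantineOu", [System.StringComparison]::OrdinalIgnoreCase)) { return }
$tag = "quarantine:$(Get-Date -Format 'yyyy-MM-dd'):$($Candidate.Reason)"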
if ($PSCmdlet.ShouldProcess($dn, 'Disable + move to quarantine'))
{
try
{
Set-ADObject -Identity $dn -Replace @{ extensionAttribute14 = $tag } -ErrorAction Stop
}
catch [Microsoft.ActiveDirectory.Management.ADIdentityNotFoundException]
{
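# -ErrorAction Continue: with $ErrorActionPreference = 'Stop', a plain Write-Error would terminate instead of skipping.
Write-Error -Message ('Quarantine candidate {0} disappeared before tagging; skipping' -f $dn) -ErrorAction Continue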
return
}
catch
{
Write-Error -Message ('Failed to tag {0} with quarantine marker: {1}' -f $dn, $_.Exception.Message)
throw
}
try
{
# Resolve the GUID first: renaming a group changes its DN, so $dn would no longer resolve for the move below.
$guid = (Get-ADObject -Identity $dn -ErrorAction Stop).ObjectGUID
switch ($Candidate.Class)
{
'User' { Disable-ADAccount -Identity $dn -ErrorAction Stop }
'Computer' { Disable-ADAccount -Identity $dn -ErrorAction Stop }
# Groups aren't "disabled" - we rename to mark them. Group inventory emits Name, not SamAccountName.
'Group' { Rename-ADObject -Identity $guid -NewName "_Q_$($Candidate.Name)" -ErrorAction Stop }
default { throw ('Unknown candidate class: {0}' -f $Candidate.Class) }
}
}
catch [Microsoft.ActiveDirectory.Management.ADException]
{
Write-Error -Message ('Disable/rename failed for {0} ({1}): {2}' -f $dn, $Candidate.Class, $_.Exception.Message)
throw
}
try
{
Move-ADObject -Identity $guid -TargetPath $QuarantineOu -ErrorAction Stop
}
catch [Microsoft.ActiveDirectory.Management.ADIdentityNotFoundException]
{
Write-Error -Message ('Quarantine OU {0} not visible on this DC; replication or typo' -f $QuarantineOu)
throw
}
catch
{
Write-Error -Message ('Move to quarantine failed for {0}: {1}' -f $dn, $_.Exception.Message)
throw
}
}
}
end
{
Write-Verbose -Message '[Move-ToQuarantine] End'
}
}
The object is:
- Disabled so it can't be used.
- Tagged so automation knows when it entered quarantine and why.
- Moved out of the normal OU tree so browsing GPOs / ACLs is no longer cluttered.
- Renamed (for groups) with a _Q_ prefix so anyone who tries to use it by name notices.
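The tag written to extensionAttribute14 has the shape quarantine:<yyyy-MM-dd>:<reason>. A small pair of helpers keeps writing and parsing in one place. The function names below are hypothetical, and the date is formatted and parsed with the invariant culture so the tag reads the same regardless of the host's regional settings.

```powershell
function New-QuarantineTag
{
    [CmdletBinding()]
    [OutputType([string])]
    param(
        [string]$Reason = '',
        [datetime]$Date = (Get-Date)
    )
    'quarantine:{0}:{1}' -f $Date.ToString('yyyy-MM-dd', [cultureinfo]::InvariantCulture), $Reason
}

function ConvertFrom-QuarantineTag
{
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [AllowEmptyString()] [string]$Tag
    )

    $match = [regex]::Match($Tag, '^quarantine:(\d{4}-\d{2}-\d{2}):(.*)$')
    if (-not $match.Success) { return $null }      # not a quarantine tag

    [pscustomobject]@{
        Entered = [datetime]::ParseExact($match.Groups[1].Value, 'yyyy-MM-dd', [cultureinfo]::InvariantCulture)
        Reason  = $match.Groups[2].Value
    }
}

# Round trip with a fixed date:
$tag = New-QuarantineTag -Reason 'Disabled and idle' -Date ([datetime]'2024-01-15')
ConvertFrom-QuarantineTag -Tag $tag
```

Anything else stored in the attribute parses to $null, so a retention job built on this would ignore objects that were never tagged by the pipeline.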
Retention and Actual Deletion
Retention is simply "if the quarantine tag's date is older than N days, delete the object":
function Invoke-QuarantineRetention
{
[CmdletBinding(SupportsShouldProcess, ConfirmImpact = 'High')]
param(
[Parameter(Mandatory)] [string]$QuarantineOu,
[Parameter(Mandatory)] [int]$RetentionDays
)
begin
{
Write-Verbose -Message '[Invoke-QuarantineRetention] Begin'
}
process
{
$ErrorActionPreference = 'Stop'
$cutoff = (Get-Date).AddDays(-$RetentionDays)
$errors = New-Object System.Collections.Generic.List[string]
Get-ADObject -SearchBase $QuarantineOu -Filter * -Properties extensionAttribute14 |
Where-Object {
$_.extensionAttribute14 -match '^quarantine:(\d{4}-\d{2}-\d{2}):' -and
[datetime]$matches[1] -lt $cutoff
} |
ForEach-Object {
# Capture the DN now: inside a catch block $_ is the ErrorRecord, not the AD object.
$dn = $_.DistinguishedName
if ($PSCmdlet.ShouldProcess($dn, "Delete after $RetentionDays day retention"))
{
try
{
Remove-ADObject -Identity $dn -Recursive -Confirm:$false -ErrorAction Stop
}
catch [Microsoft.ActiveDirectory.Management.ADIdentityNotFoundException]
{
# Already gone - treat as converged.
Write-Verbose -Message ('Quarantine object {0} already deleted' -f $dn)
}
catch [Microsoft.ActiveDirectory.Management.ADException]
{
# Protected-from-accidental-deletion is the usual culprit; log and continue.
$errors.Add(('Remove-ADObject failed for {0}: {1}' -f $dn, $_.Exception.Message))
# -ErrorAction Continue keeps the loop going despite $ErrorActionPreference = 'Stop'.
Write-Error -ErrorRecord $_ -ErrorAction Continue
}
catch
{
$errors.Add(('Unexpected error deleting {0}: {1}' -f $dn, $_.Exception.Message))
Write-Error -ErrorRecord $_ -ErrorAction Continue
}
}
}
[pscustomobject]@{
QuarantineOu = $QuarantineOu
Errors = $errors.ToArray()
}
}
end
{
Write-Verbose -Message '[Invoke-QuarantineRetention] End'
}
}
Recommend RetentionDays = 45. Enough time for monthly reporting cycles and out-of-office users to notice missing service accounts. Short enough that the quarantine OU doesn't become its own bloat problem.
Recoverable deletes. Enable the AD Recycle Bin (Enable-ADOptionalFeature 'Recycle Bin Feature' -Scope ForestOrConfigurationSet -Target <forest>); note that it cannot be turned off once enabled. Deleted objects then live in CN=Deleted Objects for the deleted-object lifetime, which defaults to the forest tombstone lifetime (180 days by default). A mistaken delete is a restore-from-recycle-bin, not a forest recovery.
The Cleanup Pipeline
Putting all four phases together as a daily run:
[CmdletBinding(SupportsShouldProcess)]
param(
[Parameter(Mandatory)] [string]$StateFile,
[ValidateSet('audit','enforce')] [string]$Mode = 'audit'
)
$ErrorActionPreference = 'Stop'
Import-Module ./src/ADOps/ADOps.psd1 -Force
# ConvertFrom-Yaml is not built in; it comes from the powershell-yaml module.
$state = Get-Content $StateFile -Raw | ConvertFrom-Yaml
$dryRun = ($Mode -eq 'audit')
# 1. Inventory
$userInvParams = @{
StaleAfterDays = $state.cleanup.users.staleAfterDays
NeverLoggedInAfterDays = $state.cleanup.users.neverLoggedInAfterDays
SearchBase = $state.searchBase
Exempt = (Resolve-Exempt $state.cleanup.users.exempt)
}
$users = Get-StaleUserInventory @userInvParams
$computerInvParams = @{
StaleAfterDays = $state.cleanup.computers.staleAfterDays
SearchBase = $state.searchBase
Exempt = (Resolve-Exempt $state.cleanup.computers.exempt)
}
$computers = Get-StaleComputerInventory @computerInvParams
$groupInvParams = @{
DeleteEmptyAfterDays = $state.cleanup.groups.deleteEmptyAfterDays
SearchBase = $state.searchBase
}
$groups = Get-OrphanGroupInventory @groupInvParams
# 2. Report (always, even in enforce mode)
$stamp = Get-Date -Format 'yyyy-MM-dd'
$users | Export-Csv "./reports/stale-users-$stamp.csv" -NoTypeInformation
$computers | Export-Csv "./reports/stale-computers-$stamp.csv" -NoTypeInformation
$groups | Export-Csv "./reports/orphan-groups-$stamp.csv" -NoTypeInformation
# 3. Quarantine (only in enforce)
# -Confirm:$false is required for unattended runs: ConfirmImpact = 'High' would otherwise prompt per object.
$users | ForEach-Object { $_ | Add-Member NoteProperty Class 'User' -PassThru } |
Move-ToQuarantine -QuarantineOu $state.quarantine.ouDn -WhatIf:$dryRun -Confirm:$false
$computers | ForEach-Object { $_ | Add-Member NoteProperty Class 'Computer' -PassThru } |
Move-ToQuarantine -QuarantineOu $state.quarantine.ouDn -WhatIf:$dryRun -Confirm:$false
$groups | ForEach-Object { $_ | Add-Member NoteProperty Class 'Group' -PassThru } |
Move-ToQuarantine -QuarantineOu $state.quarantine.ouDn -WhatIf:$dryRun -Confirm:$false
# 4. Retention-based deletion
$retentionParams = @{
QuarantineOu = $state.quarantine.ouDn
RetentionDays = $state.quarantine.retentionDays
WhatIf = $dryRun
Confirm = $false
}
Invoke-QuarantineRetention @retentionParams
# 5. DNS (separate cadence - weekly is enough)
if ((Get-Date).DayOfWeek -eq 'Sunday')
{
foreach ($zone in $state.cleanup.dns.zones)
{
Get-StaleStaticDnsRecord -DnsServer $state.domain -Zone $zone |
Export-Csv "./reports/stale-dns-$zone-$stamp.csv" -NoTypeInformation
}
}
# 6. Metrics - track deltas over time
$summary = [pscustomobject]@{
Date = (Get-Date).ToString('s')
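# @() guards the single-result and empty cases, where .Count is unreliable in Windows PowerShell 5.1.
StaleUsers = @($users).Count
StaleComputers = @($computers).Count
OrphanGroups = @($groups).Count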
}
$summary | Export-Csv "./reports/summary.csv" -Append -NoTypeInformation
Tracking Progress
The cleanup story is only interesting if you measure it. The summary.csv row produced every day is what you chart. Three trends worth watching:
- Absolute count per class over time. Expectation: rapid drop over the first 60 days, then steady-state matching the churn rate of the business.
- New candidates per week. This is the leak rate: how fast the business creates stale objects relative to how fast cleanup retires them. A non-zero leak rate after steady state points at a broken JML process (unmanaged departures) or an automation gap.
- Quarantine re-saves. Objects that entered quarantine, got restored ("this was actually needed!"), and came back again. The count should be near zero; a high count means the classifier's rules are too aggressive and the YAML needs tuning.
Pipe the metrics into Zabbix (see the Zabbix series) or Grafana. You will, if nothing else, produce a chart that impresses the auditor.
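Before wiring up a dashboard, the trend can be eyeballed straight from summary.csv. The sketch below computes the net change between consecutive rows. The function name Get-StaleTrend is hypothetical, and note that a net delta is only a proxy for the leak rate, since it mixes new candidates with retired ones.

```powershell
function Get-StaleTrend
{
    [CmdletBinding()]
    param(
        [Parameter(Mandatory)] [object[]]$Summary,      # rows as produced by Import-Csv
        [string]$Property = 'StaleUsers'
    )

    # CSV values are strings, so sort on a parsed date and cast counts to int.
    $sorted = @($Summary | Sort-Object { [datetime]$_.Date })
    for ($i = 1; $i -lt $sorted.Count; $i++)
    {
        $current  = [int]$sorted[$i].$Property
        $previous = [int]$sorted[$i - 1].$Property
        [pscustomobject]@{
            Date  = $sorted[$i].Date
            Count = $current
            Delta = $current - $previous                # negative means the backlog shrank
        }
    }
}

# Hypothetical usage against the report written by the pipeline:
# Import-Csv './reports/summary.csv' | ForEach-Object { $_ } -OutVariable rows | Out-Null
# Get-StaleTrend -Summary $rows -Property 'StaleComputers'
```

A run of small positive deltas after the initial drop is the signal to look at the joiner/mover/leaver process, not at the cleanup thresholds.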
Edge Cases Worth Knowing
- Service accounts for the quarantine pipeline itself: don't let the pipeline flag them as stale. Explicit exemption at the top of the YAML.
- Domain controllers. Never, ever, ever classify DCs as stale even if PasswordLastSet looks old on a replica. Always $_.PrimaryGroupID -ne 516 in computer inventory (and 521 for read-only DCs).
- Computer accounts created by Azure AD Connect (AAD_…) have different password-rotation semantics. Add them to the exempt list.
- Users with smart-card-only logon. Their PasswordLastSet doesn't advance, but they log in daily. Filter on smartcardLogonRequired and use LastLogonDate instead for these.
- SID history. When cleaning up old migrated accounts, check SIDHistory: deleting an account that another object trusts via SID history will silently break its access. Reconciler should flag, not delete.
- Tombstone expiry. The default 180-day tombstone lifetime means an object deleted for 181 days is unrecoverable from the Recycle Bin. If your pipeline is aggressive, extend the tombstone lifetime to 365 days; it is a forest-wide setting in the Configuration partition, not a per-domain one.
Gotchas
- LastLogonDate lag is the single most common cause of false positives. Always query per-DC lastLogon before deleting anything you can't cheaply recover.
- Moving a computer to a different OU changes applied GPOs, which can impact the host before you've decided to delete it. Prefer disabling + tagging over moving for computers; move only when you're sure.
- Orphan group detection misses dynamic group usage. Some apps (SCCM, Intune) create device groups on the fly with metadata stored in the group's description. Exclude all groups in the OUs owned by those systems.
- AD Recycle Bin can't recover anything if the restoring account lacks the rights. Whoever undoes a bad delete, whether the reconciler's service account or an operator, needs the Reanimate Tombstones right.
- The quarantine OU becomes a privilege-escalation target if ACLs on it are too loose. Delegate ONLY the reconciler's service account with write there. Read can be wider.
Final Notes
Cleanup is the least glamorous AD work and the most common source of audit findings. Done well, it fades into the background: daily reports that no one reads because the numbers are all small and trending flat. Done poorly, it shows up as a three-week remediation project every time a regulator asks "when did this account last log in?"
This series covered six layers: Get-Test-Set, AD structure as code, Group Policy as code, WMI filters as code, certificates, and security baselines. The cleanup pipeline in this post is the seventh and ties them together: the drift-detection feedback loop that keeps the previous six honest as the business evolves.
The underlying idea is the same across all six: idempotent scripts, declared state in git, reconciliation on a schedule, a quarantine period before anything irreversible. Active Directory is 25-year-old technology. It responds well to 21st-century engineering when you actually apply it.


