This guide assumes PowerShell 7+ (ForEach-Object -Parallel was introduced in 7.0 and does not exist in Windows PowerShell 5.1). All numbers below come from a single workstation; your absolute times will differ but the ratios are stable.

Sequel to the .NET vs cmdlets post. Same lesson at a different layer: parallelism in PowerShell is powerful and easy to use wrong. The default reflex of "wrap it in ForEach-Object -Parallel" is sometimes 10× faster, sometimes 10× slower. This post measures both.

The Three Options

| Tool | PowerShell version | Per-iteration cost | Best for |
|---|---|---|---|
| ForEach-Object -Parallel | 7+ | High | I/O-bound work, dozens to a few thousand items |
| Start-ThreadJob | 6+ (module 7+) | Medium | Long-running independent jobs you'll await later |
| Raw RunspacePool | 5.1+ | Low | Hot loops, tens of thousands of items |

Benchmark Harness

Reused from the .NET-vs-cmdlets post:

function Measure-It
{
    param([scriptblock]$Action, [int]$Iterations = 5)
    for ($i = 0; $i -lt 2; $i++) { & $Action | Out-Null }   # warmup
    $times = for ($i = 0; $i -lt $Iterations; $i++) {
        (Measure-Command { & $Action }).TotalMilliseconds
    }
    [pscustomobject]@{
        Min    = ($times | Measure-Object -Minimum).Minimum
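        # Floor picks the middle element; a bare [int] cast rounds half to even (3.5 becomes 4).
        Median = ($times | Sort-Object)[[int][math]::Floor($times.Count / 2)]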
        Max    = ($times | Measure-Object -Maximum).Maximum
    }
}

Workload 1: Cheap CPU Work × 10,000

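Compute the SHA-256 hash of a small string per iteration. (The static SHA256.HashData method needs .NET 5, so this workload requires PowerShell 7.1 or later.)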

$items = 1..10000

# Sequential
Measure-It { $items | ForEach-Object {
    [System.Security.Cryptography.SHA256]::HashData([Text.Encoding]::UTF8.GetBytes("item-$_"))
} }

# ForEach-Object -Parallel (default ThrottleLimit 5)
Measure-It { $items | ForEach-Object -Parallel {
    [System.Security.Cryptography.SHA256]::HashData([Text.Encoding]::UTF8.GetBytes("item-$_"))
} }

# RunspacePool
Measure-It {
    $pool = [runspacefactory]::CreateRunspacePool(1, 16)
    $pool.Open()
    $tasks = foreach ($i in $items) {
        $ps = [powershell]::Create().AddScript({
            param($n)
            [System.Security.Cryptography.SHA256]::HashData([Text.Encoding]::UTF8.GetBytes("item-$n"))
        }).AddArgument($i)
        $ps.RunspacePool = $pool
        @{ Ps = $ps; Async = $ps.BeginInvoke() }
    }
    foreach ($t in $tasks) { $t.Ps.EndInvoke($t.Async) | Out-Null; $t.Ps.Dispose() }
    $pool.Close()
}

| Approach | Median |
|---|---|
| Sequential ForEach-Object | ~1,400 ms |
| ForEach-Object -Parallel | ~24,000 ms |
| RunspacePool | ~9,200 ms |

Sequential beats both parallel options: it is about 17× faster than -Parallel and about 6.5× faster than the runspace pool. This is the lesson that costs people the most: for cheap CPU work, the per-iteration overhead of dispatching each item to a runspace dwarfs the work itself.

Workload 2: Same Work × 100

Same per-item cost, just 100 items instead of 10,000.

| Approach | Median |
|---|---|
| Sequential | ~14 ms |
| ForEach-Object -Parallel | ~280 ms |
| RunspacePool | ~120 ms |

Sequential still wins, by 20× over -Parallel and about 9× over the runspace pool. Parallelism taxes you per item, so CPU-bound work has to be expensive enough per item to amortize the tax.
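
One rough way to put a number on that tax is to subtract the sequential median from each parallel median and divide by the item count. The sketch below does this with the medians reported for Workloads 1 and 2. It assumes the overhead is purely additive per item, which is a simplification, and the property names are made up for the illustration.

```powershell
# Medians (ms) copied from the Workload 1 and Workload 2 results.
$runs = @(
    [pscustomobject]@{ Items = 10000; Approach = 'ForEach-Object -Parallel'; Ms = 24000; SequentialMs = 1400 }
    [pscustomobject]@{ Items = 10000; Approach = 'RunspacePool';             Ms = 9200;  SequentialMs = 1400 }
    [pscustomobject]@{ Items = 100;   Approach = 'ForEach-Object -Parallel'; Ms = 280;   SequentialMs = 14 }
    [pscustomobject]@{ Items = 100;   Approach = 'RunspacePool';             Ms = 120;   SequentialMs = 14 }
)

$tax = foreach ($r in $runs) {
    [pscustomobject]@{
        Approach      = $r.Approach
        Items         = $r.Items
        # Extra wall-clock time per item compared with the sequential loop.
        TaxMsPerItem  = [math]::Round(($r.Ms - $r.SequentialMs) / $r.Items, 2)
        # Cost of the actual work per item, for comparison.
        WorkMsPerItem = [math]::Round($r.SequentialMs / $r.Items, 2)
    }
}
$tax | Format-Table
```

On these numbers the tax lands between roughly 0.8 and 2.7 ms of wall-clock time per item, against about 0.14 ms of actual hashing. Per-item work has to cost well above that before parallelism can break even.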

Workload 3: Network Calls × 100

Hit a small HTTP endpoint with 200 ms latency per call.

# Sequential
Measure-It { 1..100 | ForEach-Object { Invoke-RestMethod 'https://httpbin.org/delay/0.2' } }

# Parallel, throttle 16
Measure-It { 1..100 | ForEach-Object -Parallel {
    Invoke-RestMethod 'https://httpbin.org/delay/0.2'
} -ThrottleLimit 16 }

| Approach | Median |
|---|---|
| Sequential | ~21,000 ms |
| ForEach-Object -Parallel, throttle 5 | ~4,300 ms |
| ForEach-Object -Parallel, throttle 16 | ~1,500 ms |
| RunspacePool, max 16 | ~1,400 ms |

This is where parallelism pays. I/O-bound work spends most of its time waiting; running 16 calls concurrently turns a 21-second job into a 1.5-second one. At the same concurrency, -Parallel and the runspace pool are effectively the same speed because the bottleneck is the network, not the runspace overhead.
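
A back-of-the-envelope model explains why the measured times land where they do: with a fixed latency per call, the calls complete in waves of at most ThrottleLimit items. The helper below is a hypothetical name for this post, not a built-in command, and it ignores connection setup and scheduling overhead.

```powershell
function Get-IdealParallelMs {
    param([int]$Items, [int]$ThrottleLimit, [double]$LatencyMs)
    # Items run in waves of at most $ThrottleLimit; each wave costs one latency period.
    [math]::Ceiling($Items / $ThrottleLimit) * $LatencyMs
}

Get-IdealParallelMs -Items 100 -ThrottleLimit 1  -LatencyMs 200   # 20000
Get-IdealParallelMs -Items 100 -ThrottleLimit 5  -LatencyMs 200   # 4000
Get-IdealParallelMs -Items 100 -ThrottleLimit 16 -LatencyMs 200   # 1400
```

The measured medians (~21,000, ~4,300 and ~1,500 ms) sit just above these floors, which is what you would expect when waiting, not overhead, dominates.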

Workload 4: Reading 1,000 Files

Open and parse 1,000 small JSON files from disk.

| Approach | Median |
|---|---|
| Sequential | ~3,800 ms |
| ForEach-Object -Parallel, throttle 8 | ~900 ms |
| RunspacePool, max 8 | ~720 ms |

Disk I/O is fast enough that overhead matters again. The wins are real but smaller: roughly 4× to 5× here, against 14× for the network calls. Below ~100 files the parallel versions lose to sequential.
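
The code for this workload is not shown above, so here is a minimal sketch of what the sequential and -Parallel variants could look like. The temp folder, the file count of 200 and the JSON shape are assumptions made up for the example; they are not the files used for the measurements.

```powershell
# Build a throwaway fixture: 200 small JSON files in a temp folder (hypothetical data).
$dir = Join-Path ([IO.Path]::GetTempPath()) "json-bench-$(New-Guid)"
New-Item -ItemType Directory -Path $dir | Out-Null
1..200 | ForEach-Object {
    @{ id = $_; name = "item-$_" } | ConvertTo-Json | Set-Content (Join-Path $dir "$_.json")
}
$files = Get-ChildItem $dir -Filter *.json

# Sequential: read and parse one file at a time.
$seq = $files | ForEach-Object { Get-Content $_.FullName -Raw | ConvertFrom-Json }

# Parallel: same work, up to 8 files in flight.
$par = $files | ForEach-Object -Parallel {
    Get-Content $_.FullName -Raw | ConvertFrom-Json
} -ThrottleLimit 8

"{0} parsed sequentially, {1} in parallel" -f $seq.Count, $par.Count
Remove-Item $dir -Recurse -Force
```

Wrap either pipeline in the Measure-It harness to compare them on your own disk. Note that the parallel results come back in completion order, not input order.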

Why ForEach-Object -Parallel Has Such High Overhead

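Each iteration runs in an isolated runspace with its own SessionState. PowerShell 7.0 creates a fresh runspace for every item; 7.1 and later reuse runspaces from a pool unless you pass -UseNewRunspace, but every item is still dispatched separately. Variables don't carry across without $using:, functions and modules loaded in the calling session aren't there, and the script block is set up and invoked again for each item.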

$prefix = 'item'
1..10 | ForEach-Object -Parallel { "$using:prefix-$_" }

The $using: prefix injects a value into the parallel scope. Without it, $prefix is $null inside the block.
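
A quick way to see the difference is to run the same block with and without $using: and compare the output. This is a small illustrative sketch, not part of the benchmark; it assumes strict mode is off, since Set-StrictMode would turn the unset variable into an error instead of an empty string.

```powershell
$prefix = 'item'

# Without $using:, $prefix does not exist in the parallel runspace and expands to nothing.
$without = 1..3 | ForEach-Object -Parallel { "$prefix-$_" } | Sort-Object

# With $using:, the caller's value is made available to each iteration.
$with = 1..3 | ForEach-Object -Parallel {
    $p = $using:prefix
    "$p-$_"
} | Sort-Object

$without   # -1, -2, -3
$with      # item-1, item-2, item-3
```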

For a large number of items, batching turns this from a problem into a non-issue:

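# Floor, not [int]: an [int] cast rounds, which splits the batches unevenly.
$batches = 1..10000 | Group-Object { [math]::Floor(($_ - 1) / 100) }
# Pipe the group objects themselves; $batches.Group flattens back to 10,000 single items.
$batches | ForEach-Object -Parallel {
    foreach ($x in $_.Group) { Invoke-Work $x }   # Invoke-Work is a placeholder for your per-item work
} -ThrottleLimit 8

100 batches of 100 items each: the parallel overhead is paid once per batch instead of once per item, and the work inside each batch runs sequentially. Almost always faster than naive per-item parallelism.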
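
For a version you can run as-is, the sketch below applies batching to the SHA-256 work from Workload 1. It slices the input by index instead of using Group-Object, which is just one of several ways to chunk a list. It assumes PowerShell 7.1 or later, because HashData and Convert.ToHexString both need .NET 5.

```powershell
$items = 1..10000
$batchSize = 100

# Slice the input into arrays of $batchSize items using index ranges.
$batches = for ($i = 0; $i -lt $items.Count; $i += $batchSize) {
    $end = [math]::Min($i + $batchSize, $items.Count) - 1
    , $items[$i..$end]      # leading comma keeps each slice as one array object
}

$hashes = $batches | ForEach-Object -Parallel {
    # $_ is one whole batch; the inner loop is plain sequential code.
    foreach ($n in $_) {
        $bytes = [Text.Encoding]::UTF8.GetBytes("item-$n")
        [Convert]::ToHexString([System.Security.Cryptography.SHA256]::HashData($bytes))
    }
} -ThrottleLimit 8

"{0} batches, {1} hashes" -f $batches.Count, $hashes.Count
```

Batch size is a tuning knob: larger batches amortize more overhead but leave fewer units to spread across the throttle limit, so it is worth measuring a couple of sizes with the harness.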

ThreadJob: The Underused Middle Ground

Start-ThreadJob is a job framework like Start-Job, but jobs run in threads of the same process instead of separate pwsh processes. Per-job overhead is medium (much less than Start-Job, more than a runspace pool), but you get the job framework in return: Receive-Job, Wait-Job, Get-Job, -ThrottleLimit, etc.

$jobs = 1..50 | ForEach-Object {
    $jobParams = @{
        ScriptBlock   = { param($n) Invoke-RestMethod "https://api.example.com/$n" }
        ArgumentList  = $_
        ThrottleLimit = 8
    }
    Start-ThreadJob @jobParams
}
$results = $jobs | Receive-Job -Wait -AutoRemoveJob

Best for: long-running independent work you want to monitor and collect from later, such as backups, builds, and migration tasks. Not for tight loops.
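
The monitoring part is what sets thread jobs apart, so here is a small sketch of polling job state while work is in flight. The job names and the sleep standing in for real work are assumptions for illustration only.

```powershell
$jobs = 1..6 | ForEach-Object {
    Start-ThreadJob -Name "unit-$_" -ThrottleLimit 3 -ArgumentList $_ -ScriptBlock {
        param($n)
        Start-Sleep -Milliseconds (200 * $n)   # stand-in for a backup or build step
        "unit $n done"
    }
}

# Poll while work is in flight; jobs beyond the throttle limit wait in NotStarted.
while (@($jobs | Where-Object { "$($_.State)" -in 'Running', 'NotStarted' }).Count -gt 0) {
    $jobs | Group-Object State -NoElement |
        ForEach-Object { "{0}: {1}" -f $_.Name, $_.Count }
    Start-Sleep -Milliseconds 250
}

$results = $jobs | Receive-Job -Wait -AutoRemoveJob
$results
```

In a real script the polling loop is where a progress bar, a log line or a timeout check would go.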

RunspacePool Template

When you genuinely need raw runspaces:

function Invoke-Parallel
{
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [object[]]$InputObject,
        [Parameter(Mandatory)] [scriptblock]$ScriptBlock,
        [int]$ThrottleLimit = 8
    )
    begin
    {
        $pool = [runspacefactory]::CreateRunspacePool(1, $ThrottleLimit)
        $pool.Open()
        $tasks = New-Object System.Collections.Generic.List[object]
    }
    process
    {
        foreach ($item in $InputObject)
        {
            $ps = [powershell]::Create().AddScript($ScriptBlock).AddArgument($item)
            $ps.RunspacePool = $pool
            $tasks.Add(@{ Ps = $ps; Async = $ps.BeginInvoke() })
        }
    }
    end
    {
        foreach ($t in $tasks)
        {
            $t.Ps.EndInvoke($t.Async)
            $t.Ps.Dispose()
        }
        $pool.Close()
        $pool.Dispose()
    }
}

1..1000 | Invoke-Parallel -ThrottleLimit 16 -ScriptBlock { param($n) Get-Item "C:\data\$n.txt" }

About 35 lines that consistently beat -Parallel for hot loops.

When to Pick Which

| Situation | Use |
|---|---|
| Fewer than ~50 items of any kind | Sequential |
| 100s–thousands of network/HTTP calls | -Parallel |
| 100s–thousands of disk reads | -Parallel |
| 10,000+ items, mostly CPU | RunspacePool + batches |
| Long-running independent jobs | Start-ThreadJob |
| Anything you'll cancel on user request | RunspacePool + token |
| You're going to fan-out, then aggregate | RunspacePool |

Three Patterns That Compose

Bounded throttle. Always specify -ThrottleLimit explicitly. The default of 5 is fine for HTTP, way too low for in-process work, and way too high for things that hammer a downstream service.

Batching. When per-item overhead dominates, group items and parallelize the groups. 100 batches of 100 is almost always better than 10,000 items of 1.

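Cancellation. Long parallel runs need a way to stop early. With a runspace pool you hold every [powershell] instance, so you can call Stop() on each one or hand the workers a shared cancellation token; -Parallel is awkward to cancel mid-flight (you'd have to poll shared state passed in through $using: and bail out yourself). For interactive cancellation, prefer the runspace pattern.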
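
As a minimal sketch of the "RunspacePool + token" idea, the example below passes a CancellationToken to each worker and has the worker check it between units of work. The worker body, the timings and the result shape are all invented for the illustration; real work would replace the sleep.

```powershell
$cts  = [System.Threading.CancellationTokenSource]::new()
$pool = [runspacefactory]::CreateRunspacePool(1, 4)
$pool.Open()

$tasks = foreach ($i in 1..4) {
    $ps = [powershell]::Create().AddScript({
        param($n, $token)
        $done = 0
        # Cooperative cancellation: check the token between units of work.
        while (-not $token.IsCancellationRequested -and $done -lt 1000) {
            Start-Sleep -Milliseconds 20   # stand-in for one unit of real work
            $done++
        }
        [pscustomobject]@{ Worker = $n; Completed = $done; Cancelled = $token.IsCancellationRequested }
    }).AddArgument($i).AddArgument($cts.Token)
    $ps.RunspacePool = $pool
    @{ Ps = $ps; Async = $ps.BeginInvoke() }
}

Start-Sleep -Milliseconds 300   # pretend the user asked to cancel here
$cts.Cancel()

# Workers notice the token and return what they finished so far.
$results = foreach ($t in $tasks) { $t.Ps.EndInvoke($t.Async); $t.Ps.Dispose() }
$pool.Close(); $pool.Dispose(); $cts.Dispose()
$results
```

The token approach lets each worker stop at a safe point and still return partial results. Calling Stop() on the [powershell] instance is the blunter alternative when the work cannot check a token.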

Common Bugs

  • Forgetting $using:. The variable looks captured but is actually $null inside the block. Always use $using: for anything defined outside it.
  • Sharing a non-thread-safe collection. [List[object]]::new() is not thread-safe. Use [ConcurrentBag[object]]::new() or [ConcurrentQueue[object]]::new() from System.Collections.Concurrent.
  • Modules not loaded in the parallel scope. Each parallel runspace starts clean, so modules you imported by path and functions defined in your script are not there. Pass the module path in with $using: and call Import-Module at the top of the script block.
  • Per-call HTTP client. Building [System.Net.Http.HttpClient] per parallel iteration loses most of the parallel win to TCP setup. Create one client before the loop and share it through $using:; HttpClient supports concurrent requests.
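
The collection bug is the easiest one to demonstrate safely. The sketch below collects results from parallel iterations into a ConcurrentBag; the doubling is only a stand-in for real work.

```powershell
$bag = [System.Collections.Concurrent.ConcurrentBag[int]]::new()

1..500 | ForEach-Object -Parallel {
    $b = $using:bag      # same object in every runspace, not a copy
    $b.Add($_ * 2)       # ConcurrentBag.Add is safe to call from many threads
} -ThrottleLimit 8

$bag.Count                         # 500
($bag | Measure-Object -Sum).Sum   # 250500
```

In many cases you can avoid the shared collection entirely by letting the script block emit its result and capturing the pipeline output, which is already safe.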

What to Do Next

Parallelism isn't free. The first thing to ask before reaching for it is "what's the bottleneck: CPU, I/O, or runspace overhead?" Sequential wins for cheap CPU work. -Parallel wins for I/O-bound work above ~50 items. RunspacePool wins for hot loops with tens of thousands of items, especially when batched. ThreadJob wins for long-running independent work.

The decision rule for the next script you're tempted to parallelise:

  1. Time the sequential version first with Measure-Command. If it's under 2 seconds, leave it alone. The cognitive cost of parallel code is worth more than 1.5 seconds saved.
  2. Identify the bottleneck (CPU, network, disk). Network = -Parallel with throttle 16+. CPU + tens of thousands of items = RunspacePool with batching. Long-running independent units = Start-ThreadJob.
  3. Benchmark the parallel version with the same harness. If the speedup is less than 3×, the overhead is eating the gain; either batch more aggressively or revert. A "parallel" script that's 1.5× faster than sequential is debt with no return.
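
The thresholds in this rule are simple enough to encode. The function below is a hypothetical helper, not something from the earlier posts; it takes median timings in milliseconds and applies the 2-second and 3× cut-offs from the list above.

```powershell
function Get-ParallelVerdict {
    param(
        [Parameter(Mandatory)] [double]$SequentialMs,
        [double]$ParallelMs
    )
    # Step 1: fast enough already? Then parallel code is not worth its complexity.
    if ($SequentialMs -lt 2000) { return 'Leave it sequential: under 2 seconds' }
    if (-not $PSBoundParameters.ContainsKey('ParallelMs')) {
        return 'Worth trying: benchmark a parallel version'
    }
    # Step 3: demand at least a 3x speedup before keeping the parallel version.
    $speedup = [math]::Round($SequentialMs / $ParallelMs, 1)
    if ($speedup -lt 3) {
        "Batch harder or revert: speedup is only $speedup"
    } else {
        "Keep it: speedup is $speedup"
    }
}

Get-ParallelVerdict -SequentialMs 1400  -ParallelMs 24000   # Workload 1
Get-ParallelVerdict -SequentialMs 21000 -ParallelMs 1500    # Workload 3, throttle 16
Get-ParallelVerdict -SequentialMs 3800  -ParallelMs 900     # Workload 4, throttle 8
```

Feed it the Median values from Measure-It. Step 2 of the rule, identifying the bottleneck, still needs a human.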

Pairs naturally with the .NET vs cmdlets post (the first axis of optimisation; parallelism is the second) and the C# module post (when even RunspacePool isn't fast enough, dropping into a compiled binary cmdlet usually is).