This guide assumes PowerShell 7+ (-Parallel is a 7-only feature). All numbers below come from a single workstation; your absolute times will differ but the ratios are stable.
Sequel to the .NET vs cmdlets post. Same lesson at a different layer: parallelism in PowerShell is powerful and easy to use wrong. The default reflex of "wrap it in ForEach-Object -Parallel" is sometimes 10× faster, sometimes 10× slower. This post measures both.
The Three Options
| Tool | PowerShell version | Per-iteration cost | Best for |
|---|---|---|---|
| ForEach-Object -Parallel | 7+ | High | I/O-bound work, dozens to a few thousand items |
| Start-ThreadJob | 6+ bundled (5.1 via the ThreadJob module) | Medium | Long-running independent jobs you'll await later |
| Raw RunspacePool | 5.1+ | Low | Hot loops, tens of thousands of items |
Benchmark Harness
Reused from the .NET-vs-cmdlets post:
function Measure-It
{
    param([scriptblock]$Action, [int]$Iterations = 5)
    for ($i = 0; $i -lt 2; $i++) { & $Action | Out-Null } # warmup
    $times = for ($i = 0; $i -lt $Iterations; $i++) {
        (Measure-Command { & $Action }).TotalMilliseconds
    }
    [pscustomobject]@{
        Min    = ($times | Measure-Object -Minimum).Minimum
        # Floor rather than a bare [int] cast: [int] rounds half to even and picks the wrong index for 3 or 7 samples
        Median = ($times | Sort-Object)[[int][math]::Floor($times.Count / 2)]
        Max    = ($times | Measure-Object -Maximum).Maximum
    }
}
Workload 1: Cheap CPU Work × 10,000
Compute SHA256 of a small string per iteration.
$items = 1..10000
# Sequential
Measure-It { $items | ForEach-Object {
    [System.Security.Cryptography.SHA256]::HashData([Text.Encoding]::UTF8.GetBytes("item-$_"))
} }
# ForEach-Object -Parallel (default ThrottleLimit 5)
Measure-It { $items | ForEach-Object -Parallel {
    [System.Security.Cryptography.SHA256]::HashData([Text.Encoding]::UTF8.GetBytes("item-$_"))
} }
# RunspacePool
Measure-It {
    $pool = [runspacefactory]::CreateRunspacePool(1, 16)
    $pool.Open()
    $tasks = foreach ($i in $items) {
        $ps = [powershell]::Create().AddScript({
            param($n)
            [System.Security.Cryptography.SHA256]::HashData([Text.Encoding]::UTF8.GetBytes("item-$n"))
        }).AddArgument($i)
        $ps.RunspacePool = $pool
        @{ Ps = $ps; Async = $ps.BeginInvoke() }
    }
    foreach ($t in $tasks) { $t.Ps.EndInvoke($t.Async) | Out-Null; $t.Ps.Dispose() }
    $pool.Close()
}
| Approach | Median |
|---|---|
| Sequential ForEach-Object | ~1,400 ms |
| ForEach-Object -Parallel | ~24,000 ms |
| RunspacePool | ~9,200 ms |
Sequential beats both parallel options: by roughly 7× against the RunspacePool and 17× against -Parallel. This is the lesson that costs people the most: for cheap CPU work, the per-iteration overhead of dispatching each item to a runspace dwarfs the work itself.
Workload 2: Same Work × 100
Same per-item cost, just 100 items instead of 10,000.
| Approach | Median |
|---|---|
| Sequential | ~14 ms |
| ForEach-Object -Parallel | ~280 ms |
| RunspacePool | ~120 ms |
Sequential still wins. Parallelism taxes you per item, so CPU-bound work has to be expensive enough per item to amortize the tax.
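The size of that tax can be estimated from the two tables above: subtract the sequential median from the parallel one and divide by the item count. A rough sketch using the medians reported in this post (the variable and property names are illustrative, not part of the original harness):

```powershell
# Medians in milliseconds, copied from the Workload 1 and Workload 2 tables.
$runs = @(
    [pscustomobject]@{ Approach = 'ForEach-Object -Parallel'; Items = 10000; Sequential = 1400; Parallel = 24000 }
    [pscustomobject]@{ Approach = 'RunspacePool';             Items = 10000; Sequential = 1400; Parallel = 9200 }
    [pscustomobject]@{ Approach = 'ForEach-Object -Parallel'; Items = 100;   Sequential = 14;   Parallel = 280 }
    [pscustomobject]@{ Approach = 'RunspacePool';             Items = 100;   Sequential = 14;   Parallel = 120 }
)

# Extra milliseconds paid per item for going parallel.
$tax = $runs | Select-Object Approach, Items, @{
    Name       = 'OverheadMsPerItem'
    Expression = { [math]::Round(($_.Parallel - $_.Sequential) / $_.Items, 2) }
}
$tax | Format-Table
```

On these numbers the overhead is roughly 2.3 to 2.7 ms per item for -Parallel and 0.8 to 1.1 ms for the pool, against about 0.14 ms of actual hashing per item (1,400 ms / 10,000).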
Workload 3: Network Calls × 100
Hit a small HTTP endpoint with 200ms latency per call.
# Sequential
Measure-It { 1..100 | ForEach-Object { Invoke-RestMethod 'https://httpbin.org/delay/0.2' } }
# Parallel, throttle 16
Measure-It { 1..100 | ForEach-Object -Parallel {
Invoke-RestMethod 'https://httpbin.org/delay/0.2'
} -ThrottleLimit 16 }
| Approach | Median |
|---|---|
| Sequential | ~21,000 ms |
| ForEach-Object -Parallel, throttle 5 | ~4,300 ms |
| ForEach-Object -Parallel, throttle 16 | ~1,500 ms |
| RunspacePool, max 16 | ~1,400 ms |
This is where parallelism pays. I/O-bound work spends most of its time waiting; running 16 calls concurrently turns a 21-second job into a 1.5-second one. At the same throttle, -Parallel and the RunspacePool are effectively the same speed because the bottleneck is the network, not the runspace overhead.
Workload 4: Reading 1,000 Files
Open and parse 1,000 small JSON files from disk.
| Approach | Median |
|---|---|
| Sequential | ~3,800 ms |
| ForEach-Object -Parallel, throttle 8 | ~900 ms |
| RunspacePool, max 8 | ~720 ms |
Disk I/O is fast enough that overhead matters again. The wins are real but smaller. Below ~100 files the parallel versions lose to sequential.
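The post doesn't show code for this workload, so here is a sketch of what the sequential and -Parallel variants might look like. The temp directory, file count, and JSON shape are assumptions made so the example is self-contained; they are not the data set behind the table above.

```powershell
# Assumption: a throwaway directory of small JSON files stands in for the real data set.
$dataDir = Join-Path ([IO.Path]::GetTempPath()) "json-bench-$PID"
New-Item -ItemType Directory -Path $dataDir -Force | Out-Null
1..200 | ForEach-Object {
    @{ id = $_; name = "item-$_" } | ConvertTo-Json | Set-Content -Path (Join-Path $dataDir "$_.json")
}
$files = Get-ChildItem -Path $dataDir -Filter *.json

# Sequential: read and parse one file at a time.
$sequential = $files | ForEach-Object {
    Get-Content -Raw -LiteralPath $_.FullName | ConvertFrom-Json
}

# Parallel: same work, at most eight runspaces at a time.
$parallel = $files | ForEach-Object -Parallel {
    Get-Content -Raw -LiteralPath $_.FullName | ConvertFrom-Json
} -ThrottleLimit 8

Remove-Item -Recurse -Force -Path $dataDir
```

To compare timings, wrap each of the two pipelines in a Measure-It call from the harness section.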
Why ForEach-Object -Parallel Has Such High Overhead
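Each iteration runs in an isolated runspace with its own SessionState. PowerShell 7.0 creates a fresh runspace per item; 7.1 and later reuse runspaces from a pool by default (unless you pass -UseNewRunspace), but every item is still dispatched and invoked separately. Variables don't carry across without $using:, modules the caller imported aren't loaded, and the pipeline machinery is rebuilt per item.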
$prefix = 'item'
1..10 | ForEach-Object -Parallel { "$using:prefix-$_" }
The $using: prefix injects a value into the parallel scope. Without it, $prefix is $null inside the block.
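A small side-by-side makes the difference visible. This is only a sketch; the bracketed output format is chosen here to make an empty value easy to spot.

```powershell
$prefix = 'item'

# Without $using:, $prefix is not defined in the parallel runspace, so it expands to nothing.
$without = 1..3 | ForEach-Object -Parallel { "[$prefix]-$_" } | Sort-Object

# With $using:, the caller's value is copied into the runspace.
$with = 1..3 | ForEach-Object -Parallel {
    $p = $using:prefix
    "[$p]-$_"
} | Sort-Object

$without
$with
```

Sort-Object is there because -Parallel does not guarantee output order.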
For a large number of items, batching turns this from a problem into a non-issue:
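$batches = 1..10000 | Group-Object { [math]::Floor(($_ - 1) / 100) }
$batches | ForEach-Object -Parallel {
    # $_ is one group; its .Group property holds that batch's 100 items
    foreach ($x in $_.Group) {
        [System.Security.Cryptography.SHA256]::HashData([Text.Encoding]::UTF8.GetBytes("item-$x"))
    }
} -ThrottleLimit 8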
100 batches of 100 items each: one runspace startup per batch, sequential work inside. Almost always faster than naive per-item parallelism.
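Group-Object is convenient, but it evaluates a script block for every item just to form the batches. An alternative is to slice the input array by index. The sketch below uses a hypothetical helper named Split-Batch (not a built-in cmdlet) and sums each batch as stand-in work.

```powershell
function Split-Batch {
    param(
        [Parameter(Mandatory)] [object[]]$Items,
        [ValidateRange(1, [int]::MaxValue)] [int]$Size = 100
    )
    for ($start = 0; $start -lt $Items.Count; $start += $Size) {
        $end = [math]::Min($start + $Size, $Items.Count) - 1
        # The leading comma emits the slice as a single object instead of unrolling it.
        , $Items[$start..$end]
    }
}

$batches = Split-Batch -Items (1..10000) -Size 100

# Each runspace invocation receives a whole batch as $_ and works through it sequentially.
$sums = $batches | ForEach-Object -Parallel {
    ($_ | Measure-Object -Sum).Sum
} -ThrottleLimit 8
```

The last batch is simply shorter when the item count is not a multiple of the batch size.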
ThreadJob: The Underused Middle Ground
Start-ThreadJob is a job framework like Start-Job, but jobs run in threads of the same process instead of separate pwsh processes. Per-job overhead is medium (much less than Start-Job, more than a runspace pool), but you get the job framework in return: Receive-Job, Wait-Job, Get-Job, -ThrottleLimit, etc.
$jobs = 1..50 | ForEach-Object {
    $jobParams = @{
        ScriptBlock   = { param($n) Invoke-RestMethod "https://api.example.com/$n" }
        ArgumentList  = $_
        ThrottleLimit = 8
    }
    Start-ThreadJob @jobParams
}
$results = $jobs | Receive-Job -Wait -AutoRemoveJob
Best for: long-running independent work you want to monitor and collect from later, such as backups, builds, and migration tasks. Not for tight loops.
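The "monitor and collect later" part might look like the following sketch. Start-Sleep stands in for real work, and the job names and polling interval are arbitrary choices for illustration.

```powershell
# Start a few simulated long-running units, two at a time.
$jobs = 1..4 | ForEach-Object {
    Start-ThreadJob -Name "unit-$_" -ArgumentList $_ -ThrottleLimit 2 -ScriptBlock {
        param($n)
        Start-Sleep -Milliseconds (200 * $n)
        "unit $n done"
    }
}

# Poll the job states until nothing is queued or running any more.
while ($jobs | Where-Object { $_.State -in 'NotStarted', 'Running' }) {
    $done = @($jobs | Where-Object { $_.State -eq 'Completed' }).Count
    Write-Host "$done of $($jobs.Count) finished"
    Start-Sleep -Milliseconds 250
}

# Collect the output and clean up the job objects.
$results = $jobs | Receive-Job -Wait -AutoRemoveJob
```

Because the jobs are ordinary job objects, the same loop could also check for the Failed state and report errors per unit.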
RunspacePool Template
When you genuinely need raw runspaces:
function Invoke-Parallel
{
    [CmdletBinding()]
    param(
        [Parameter(Mandatory, ValueFromPipeline)] [object[]]$InputObject,
        [Parameter(Mandatory)] [scriptblock]$ScriptBlock,
        [int]$ThrottleLimit = 8
    )
    begin
    {
        $pool = [runspacefactory]::CreateRunspacePool(1, $ThrottleLimit)
        $pool.Open()
        $tasks = New-Object System.Collections.Generic.List[object]
    }
    process
    {
        foreach ($item in $InputObject)
        {
            $ps = [powershell]::Create().AddScript($ScriptBlock).AddArgument($item)
            $ps.RunspacePool = $pool
            $tasks.Add(@{ Ps = $ps; Async = $ps.BeginInvoke() })
        }
    }
    end
    {
        foreach ($t in $tasks)
        {
            $t.Ps.EndInvoke($t.Async)
            $t.Ps.Dispose()
        }
        $pool.Close()
        $pool.Dispose()
    }
}
1..1000 | Invoke-Parallel -ThrottleLimit 16 -ScriptBlock { param($n) Get-Item "C:\data\$n.txt" }
About 35 lines that consistently beat -Parallel for hot loops.
When to Pick Which
| Situation | Use |
|---|---|
| 1–100 items of any kind | Sequential |
| 100s–thousands of network/HTTP calls | -Parallel |
| 100s–thousands of disk reads | -Parallel |
| 10,000+ items, mostly CPU | RunspacePool + batches |
| Long-running independent jobs | Start-ThreadJob |
| Anything you'll cancel on user request | RunspacePool + token |
| You're going to fan-out, then aggregate | RunspacePool |
Three Patterns That Compose
Bounded throttle. Always specify -ThrottleLimit explicitly. The default of 5 is fine for HTTP, way too low for in-process work, and way too high for things that hammer a downstream service.
Batching. When per-item overhead dominates, group items and parallelize the groups. 100 batches of 100 is almost always better than 10,000 items of 1.
Cancellation. Long parallel runs need a way to stop early. With a RunspacePool you hold every PowerShell instance, so you can call Stop() on each one and then Close() the pool; -Parallel is awkward to cancel mid-flight (you'd have to poll shared state passed in via $using: and throw). For interactive cancellation, prefer the runspace pattern.
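A minimal sketch of cancelling pooled work by calling Stop() on each PowerShell instance. The one-second wait is a stand-in for a user hitting cancel, and the 30-second sleep stands in for real work; both are assumptions for illustration.

```powershell
$pool = [runspacefactory]::CreateRunspacePool(1, 4)
$pool.Open()

# Queue work that would take far longer than we are willing to wait.
$tasks = foreach ($i in 1..4) {
    $ps = [powershell]::Create().AddScript({ param($n) Start-Sleep -Seconds 30; $n }).AddArgument($i)
    $ps.RunspacePool = $pool
    @{ Ps = $ps; Async = $ps.BeginInvoke() }
}

# Stand-in for a cancel request arriving after one second.
Start-Sleep -Seconds 1

$states = foreach ($t in $tasks) {
    # Stop() blocks until that instance's pipeline has actually stopped.
    if (-not $t.Async.IsCompleted) { $t.Ps.Stop() }
    $t.Ps.InvocationStateInfo.State
    $t.Ps.Dispose()
}
$pool.Close()
$pool.Dispose()
```

For a responsive UI, BeginStop() is the asynchronous counterpart; this sketch uses the blocking call to keep the flow easy to follow.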
Common Bugs
- Forgetting $using:. The variable looks captured but is actually $null inside the block. Always use $using:.
- Sharing a non-thread-safe collection. [List[object]]::new() is not thread-safe. Use [ConcurrentBag[object]]::new() or [ConcurrentQueue[object]]::new().
- Modules not loaded in the parallel scope. -Parallel doesn't carry over what the caller imported. Either call Import-Module at the top of the script block, or pass the module path in via $using: and import it there.
- Per-call HTTP client. Building [System.Net.Http.HttpClient] per parallel iteration loses most of the parallel win to TCP setup. Hoist it to a module-scoped singleton.
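For the shared-collection bug, a sketch of the thread-safe alternative. The doubling is placeholder work, and the item count is arbitrary.

```powershell
# One bag, created by the caller and shared by reference with every runspace.
$bag = [System.Collections.Concurrent.ConcurrentBag[int]]::new()

1..200 | ForEach-Object -Parallel {
    $shared = $using:bag      # same object in every runspace, not a copy
    $shared.Add($_ * 2)       # Add() is safe to call concurrently
} -ThrottleLimit 8

$bag.Count
```

When the results only need to flow back to the caller, emitting them from the script block and capturing the pipeline output avoids shared state altogether.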
What to Do Next
Parallelism isn't free. The first thing to ask before reaching for it is: what's the bottleneck, CPU, I/O, or runspace overhead? Sequential wins for cheap CPU work. -Parallel wins for I/O-bound work above ~50 items (closer to ~100 for small disk reads). RunspacePool wins for hot loops with ten thousand items, especially when batched. ThreadJob wins for long-running independent work.
The decision rule for the next script you're tempted to parallelise:
- Time the sequential version first with Measure-Command. If it's under 2 seconds, leave it alone. The cognitive cost of parallel code is worth more than 1.5 seconds saved.
- Identify the bottleneck (CPU, network, disk). Network = -Parallel with throttle 16+. CPU + tens of thousands of items = RunspacePool with batching. Long-running independent units = Start-ThreadJob.
- Benchmark the parallel version with the same harness. If the speedup is less than 3×, the overhead is eating the gain; either batch more aggressively or revert. A "parallel" script that's 1.5× faster than sequential is debt with no return.
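The thresholds in that rule can be written down as a small helper. Get-ParallelVerdict is a hypothetical name, and the function only encodes the 2-second and 3× cut-offs from the list above; it does not measure anything itself.

```powershell
function Get-ParallelVerdict {
    param(
        [Parameter(Mandatory)] [double]$SequentialMs,
        [double]$ParallelMs
    )
    # Rule 1: fast enough already, don't bother.
    if ($SequentialMs -lt 2000) { return 'Leave it sequential (under 2 seconds)' }
    # No parallel timing supplied yet: the next step is to benchmark one.
    if (-not $PSBoundParameters.ContainsKey('ParallelMs')) { return 'Worth benchmarking a parallel version' }
    # Rule 3: demand at least a 3x speedup.
    $speedup = [math]::Round($SequentialMs / $ParallelMs, 1)
    if ($speedup -lt 3) { "Speedup ${speedup}x: batch harder or revert" }
    else { "Speedup ${speedup}x: keep the parallel version" }
}

Get-ParallelVerdict -SequentialMs 1400                     # Workload 1 medians
Get-ParallelVerdict -SequentialMs 21000 -ParallelMs 1500   # Workload 3 medians
Get-ParallelVerdict -SequentialMs 3800 -ParallelMs 900     # Workload 4 medians
```

Fed with medians from Measure-It, this gives a quick sanity check before a parallel rewrite is committed.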
Pairs naturally with the .NET vs cmdlets post (the first axis of optimisation; parallelism is the second) and the C# module post (when even RunspacePool isn't fast enough, dropping into a compiled binary cmdlet usually is).


