
On Windows, System.IO.Directory.Delete() intermittently fails quietly, is unexpectedly asynchronous #27958

@mklement0

Description

Note: cmd.exe's rd /s and PowerShell's Remove-Item are equally affected: see here and here.

Note: The problem occurs only on Windows, where it also affects the "full" .NET Framework, albeit with slightly different symptoms.

The Windows API functions DeleteFile() and RemoveDirectory() are inherently asynchronous, as their documentation states:

"The DeleteFile function marks a file for deletion on close. Therefore, the file deletion does not occur until the last handle to the file is closed. Subsequent calls to CreateFile to open the file fail with ERROR_ACCESS_DENIED."

"The RemoveDirectory function marks a directory for deletion on close. Therefore, the directory is not removed until the last handle to the directory is closed."

System.IO.Directory.Delete() fails to account for this asynchronous behavior, which has two implications:

  • Problem (a): Trying to delete a nonempty directory (which invariably requires recursive deletion of its content first) can fail - infrequently, but it does happen - and the failure is quiet: no exception is thrown.

  • Problem (b): Trying to recreate a directory or file immediately after successfully deleting it may fail intermittently (this is easier to provoke than problem (a), but a more exotic use case overall).

    • Note: You could argue that it's not .NET Core's responsibility to create a synchronous experience in this case, given that the underlying system API is asynchronous; however, it certainly makes for more robust, predictable programs if synchronicity is ensured - as is the case on Unix-like platforms, where the system APIs for file removal are synchronous.
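For problem (b), a caller can restore synchronous behavior on top of the existing API by polling until the directory has actually disappeared. The following is a minimal sketch, not a fix in .NET itself: Remove-DirAndWait and its timeout and polling defaults are hypothetical, and it assumes, as the repro functions in this issue do, that Test-Path keeps reporting a directory as present while its removal is still pending.

```powershell
# Hypothetical helper (not part of .NET or PowerShell): deletes a directory tree,
# then polls until the directory is actually gone, so that re-creating it right
# away doesn't collide with a directory that is merely marked for deletion.
function Remove-DirAndWait {
  param(
    [Parameter(Mandatory)] [string] $Path,
    [int] $TimeoutMilliseconds = 5000,  # assumed upper bound; tune as needed
    [int] $PollMilliseconds = 50
  )
  $fullPath = Convert-Path -LiteralPath $Path
  [System.IO.Directory]::Delete($fullPath, $true)
  $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
  # If the recursive delete failed quietly (problem (a)), the directory never
  # disappears, so the timeout below turns that case into a visible error, too.
  while (Test-Path -LiteralPath $fullPath) {
    if ($stopwatch.ElapsedMilliseconds -ge $TimeoutMilliseconds) {
      throw "Timed out waiting for '$fullPath' to be removed."
    }
    Start-Sleep -Milliseconds $PollMilliseconds
  }
}
```

Used in place of the plain [System.IO.Directory]::Delete() call in Assert-SyncDirRemoval (e.g., Remove-DirAndWait tmpDir), the subsequent New-Item -Type Directory tmpDir should then no longer fail with the "already exists" error.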

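Problem (a) is due to using depth-first recursion without accounting for the asynchronous deletion behavior: removal of a parent directory can be attempted while entries it contained are still only marked for deletion, at which point the directory is not yet empty. The problem, along with the solution, is described in this YouTube video (starts at 7:35).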
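The root cause suggests a workaround that callers can apply themselves: have each level of the depth-first recursion wait until its children are truly gone before removing the parent. Below is a minimal sketch of that idea; Remove-DirTreeReliably and its 5-second default timeout are hypothetical, the approach is not necessarily the one presented in the video, and it assumes that blocking with a timeout is acceptable to the caller.

```powershell
# Hypothetical helper (not part of .NET or PowerShell): a depth-first delete that
# waits for pending deletions to complete before removing each parent directory.
function Remove-DirTreeReliably {
  param(
    [Parameter(Mandatory)] [string] $Path,
    [int] $TimeoutMilliseconds = 5000  # assumed upper bound per wait; tune as needed
  )
  $dir = [System.IO.DirectoryInfo]::new((Convert-Path -LiteralPath $Path))
  foreach ($item in $dir.GetFileSystemInfos()) {
    if ($item -is [System.IO.FileInfo]) {
      $item.Delete()
    } elseif ($item.Attributes -band [System.IO.FileAttributes]::ReparsePoint) {
      $item.Delete()  # junction or directory symlink: remove the link, don't follow it
    } else {
      # A real subdirectory: this call returns only once the subdirectory is gone.
      Remove-DirTreeReliably -Path $item.FullName -TimeoutMilliseconds $TimeoutMilliseconds
    }
  }
  # Deleted entries linger while another process (e.g., an indexer or virus
  # scanner) still holds a handle to them; wait until the directory is truly empty.
  $stopwatch = [System.Diagnostics.Stopwatch]::StartNew()
  while ($dir.GetFileSystemInfos().Count -gt 0) {
    if ($stopwatch.ElapsedMilliseconds -ge $TimeoutMilliseconds) {
      throw "Timed out waiting for '$($dir.FullName)' to become empty."
    }
    Start-Sleep -Milliseconds 50
  }
  $dir.Delete()  # non-recursive: throws if the directory still isn't empty
  # Wait for the directory itself to vanish, so that the caller can remove *its* parent.
  $stopwatch.Restart()
  while (Test-Path -LiteralPath $dir.FullName) {
    if ($stopwatch.ElapsedMilliseconds -ge $TimeoutMilliseconds) {
      throw "Timed out waiting for '$($dir.FullName)' to be removed."
    }
    Start-Sleep -Milliseconds 50
  }
}
```

Unlike the quiet failure of [System.IO.Directory]::Delete(path, $true) described here, a timeout in this sketch surfaces as an exception instead of leaving remnants behind silently. Junctions and directory symlinks are removed as links rather than traversed, so their targets are left untouched.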

Note that the source code does at least hint at the asynchronous behavior:

https://github.com/dotnet/corefx/blob/40364e539572e9dad9c8a2eb165fc9af28e5664a/src/System.IO.FileSystem/src/System/IO/FileSystem.Windows.cs#L532-L534

However, in practice the recursive removal of content does fail intermittently - and quietly - leaving both the target directory and remnants inside it behind.

Important:

  • For convenience, PowerShell Core is used to reproduce the problem below.
  • The repros below use the functions defined at the bottom, which must be defined first.
  • The tests create the temporary directory $HOME/tmpDir; remove it manually afterwards if it is still present.

Steps to reproduce

Setup:

  • Open a PowerShell Core shell.
  • Paste and submit the code with the two function definitions below at the prompt to define the functions used in the tests.

Problem (a):

Assert-ReliableDirRemoval .net

Problem (b):

Assert-SyncDirRemoval .net

Expected behavior

The functions should loop indefinitely (terminate them with Ctrl+C), emitting a . in each iteration.

Actual behavior

Eventually an error occurs, after a period on the order of minutes or even longer; the absence of a predictable time frame is itself indicative of the problem.

Problem (a):

Reliable dir-tree-removal test: Repeatedly creating and removing a directory subtree, using .net for removal.
........Deletion failed quietly or with exit code 0, tmpDir still has content: sub

That is, recursive removal of the target dir's content failed quietly due to async timing issues, and both the target directory and remnants inside it linger.

Problem (b):

Synchronous dir-removal test: Repeatedly creating and removing a directory, using ps for removal.
.................New-Item : An item with the specified name C:\Users\jdoe\tmpDir already exists.

That is, recreating the target dir failed, because the prior removal hadn't yet completed.

Aux. function definitions:

Functions Assert-ReliableDirRemoval and Assert-SyncDirRemoval

You can paste the entire code at the prompt in order to define these functions.

function Assert-ReliableDirRemoval {
  param([parameter(Mandatory)] [ValidateSet('cmd', '.net', 'ps')] [string] $method)
  Write-Host "Reliable dir-tree-removal test: Repeatedly creating and removing a directory subtree, using $method for removal."
  # Treat all errors as fatal.
  $ErrorActionPreference = 'Stop'
  Set-Location $HOME  # !! Seemingly, a dir. in the user's home dir. tree reproduces the symptom fastest - $env:TEMP does not.
  # Remove a preexisting directory first, if any.
  Remove-Item -EA Ignore -Recurse tmpDir
  While ($true) {
    Write-Host -NoNewline .
    # Create a subdir. tree.
    $null = New-Item -Type Directory tmpDir, tmpDir/sub, tmpDir/sub/sub, tmpDir/sub/sub/sub
    # Fill each subdir. with 1000 empty test files.
    "tmpDir", "tmpDir/sub", "tmpDir/sub/sub", "tmpDir/sub/sub/sub"| ForEach-Object {
      $dir = $_
      1..1e3 | ForEach-Object { $null > "$dir/$_" }
    }
    # Now remove the entire dir., which invariably involves deleting its contents
    # recursively first.
    switch ($method) {
      'ps'  { Remove-Item -Recurse tmpDir }
      '.net' { [System.IO.Directory]::Delete((Convert-Path tmpDir), $true) }
      'cmd'  { cmd /c rd /s /q tmpDir; if ($LASTEXITCODE) { Throw } }
      # !! If rd /s fails during recursive removal due to async issues, it emits a stderr line, but 
      # !! does NOT report a nonzero exit code. We detect this case below.
    }
    # Does the dir. unexpectedly still exist?
    # This can happen for two reasons:
    #  - The removal of the top-level directory itself has been requested, but isn't complete yet.
    #  - Removing the content of the top-level directory failed quietly.
    while (Test-Path tmpDir) {
      if ($output = Get-ChildItem -EA Ignore -Force -Name tmpDir) { # Still has content?
        Throw "Deletion failed quietly or with exit code 0, tmpDir still has content: $output"
      } else {
        # Top-level directory removal isn't complete yet.
        # Wait for removal to complete, so we can safely recreate the directory in the next iteration.
        # This loop should exit fairly quickly.
        Write-Host -NoNewline !; Start-Sleep -Milliseconds 100
      }      
    }    
  }
}

function Assert-SyncDirRemoval { 
  param([parameter(Mandatory)] [ValidateSet('cmd', '.net', 'ps')] [string] $method)
  Write-Host "Synchronous dir-removal test: Repeatedly creating and removing a directory, using $method for removal."
  # Treat all errors as fatal.
  $ErrorActionPreference = 'Stop'
  Set-Location $HOME  # !! Seemingly, a dir. in the user's home dir. tree reproduces the symptom fastest - $env:TEMP does not.
  While ($true) {
    Write-Host -NoNewline .
    # Remove the test dir., which invariably involves deleting its contents recursively first.
    # Note: This could itself fail intermittently, but with just 10 files and no subdirs. it is unlikely to.
    if (Test-Path tmpDir) {
      switch ($method) {
        'ps' { Remove-Item -Recurse tmpDir }
        '.net' { [System.IO.Directory]::Delete((Convert-Path tmpDir), $true) }
        'cmd'  { cmd /c rd /s /q tmpDir; if ($LASTEXITCODE) { Throw } }
        # !! If rd /s fails during recursive removal due to async issues, it emits a stderr line, but 
        # !! does NOT report a nonzero exit code. We detect this case below.
      }
    }
    # (Re-)create the dir. with 10 empty test files.
    # If the previous removal hasn't fully completed yet, this will fail.
    # Note: [System.IO.Directory]::Delete() could have quietly failed to remove the contents of the dir.
    #       due to async timing accidents, but, again, this is unlikely with just 10 files and no subdirs.
    try {
      $null = New-Item -Type Directory tmpDir
    } catch {
      # Handle the case where removal failed
      # quietly due to async vagaries while removing the dir's *content*, which
      # can happen with [System.IO.Directory]::Delete().
      # Note that if removal succeeded but is pending - the scenario we're trying to
      # get to occur - Get-ChildItem reports an access-denied error, which we ignore
      # in favor of reporting the original error from the dir. re-creation attempt.
      if ($output = Get-ChildItem -EA Ignore -Force -Name tmpDir) { # Still has content?
        Write-Warning "Ignoring failed content removal, retrying..."
        # Simply try again
        Start-Sleep -Milliseconds 500; Remove-Item tmpDir -Recurse
        continue
      }
      # Re-creation failed due to async removal.
      Throw $_
    }
    1..10 | ForEach-Object { $null > tmpDir/$_ }
  }
}

Environment data

Microsoft.NETCore.App version 2.1.5
