Skip to content

Add --role-assignments flag to clean unattached RAs#22

Merged
mboersma merged 1 commit intoAzure:masterfrom
mboersma:role-assignments
Mar 12, 2025
Merged

Add --role-assignments flag to clean unattached RAs#22
mboersma merged 1 commit intoAzure:masterfrom
mboersma:role-assignments

Conversation

@mboersma
Copy link
Copy Markdown
Member

Adds an optional --role-assignments flag to the container that will clean up unattached role assignments.

I tried to implement this all in the Go binary, but replicating the az role assignments list logic in go using the Graph API never quite worked, so instead I changed the container to run a shell script with the az command itself.

@mboersma mboersma force-pushed the role-assignments branch 3 times, most recently from 21f23a1 to edeaf96 Compare August 14, 2024 16:13
Copy link
Copy Markdown
Member

@jackfrancis jackfrancis left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

/lgtm

thank you @mboersma!

@mboersma
Copy link
Copy Markdown
Member Author

I updated this to include the Bicep changes for the --role-assignments flag and deployed it to production with a v0.4.0 image. I don't see any errors but I also don't see that it's actually cleaning the unattached role assignments. I'll keep an eye on it and keep this PR open in case there's still something to fix.

@jsturtevant
Copy link
Copy Markdown

I've pushed a few changes to fix an issue where the delay was only running for 1 min, this means the code to run the role assignments was never running. This happened because there was a bunch of stale RG that built up and it took longer than 1 min to delete them all.

The code to clean up the role assignments is still not executing. I don't know why, it works locally. For now I've run it manually to clear out the stale assignments.

@mboersma
Copy link
Copy Markdown
Member Author

mboersma commented Aug 27, 2024

I deployed the current version as v0.4.3, but we're getting this in the Logic App logs:

2024/08/27 19:21:26 Initializing rg-cleanup
2024/08/27 19:21:26 args: [./bin/rg-cleanup --identity --ttl 4h]
2024/08/27 19:21:26 Scanning for stale resource groups
2024/08/27 19:21:28 Beginning to delete resource group 'MA_defaultazuremonitorworkspace-eus_eastus_managed' (age: probably a long time because it does not have a 'creationTimestamp' tag. Found tags: map[])
...
[
  {
    "environmentName": "AzureCloud",
    "homeTenantId": "...",
    "id": "...",
    "isDefault": true,
    "managedByTenants": [],
    "name": "Microsoft Azure Sponsorship",
    "state": "Enabled",
    "tenantId": "...",
    "user": {
      "assignedIdentityInfo": "MSI",
      "name": "systemAssignedIdentity",
      "type": "servicePrincipal"
    }
  }
]
Cleaning up unattached role assignments...
Deleting role assignments:

ERROR: argument --ids: expected at least one argument

Yet in a cloud shell I can log in to the same subscription and run the core shell command, and it returns several IDs.

@nojnhuh
Copy link
Copy Markdown
Member

nojnhuh commented Aug 27, 2024

I wonder if there might be an RBAC thing going on? On the surface it seems like it has enough permissions now, but I'll try in my sub with a fresh identity with only the same perms as the rg-cleanup identity and see if I can repro.

Regardless it may be worth adding a guard in the script to make sure we handle the case where there are no role assignments to be deleted. I think the -r flag for xargs by itself would do the trick:

       -r, --no-run-if-empty
              If the standard input does not contain any nonblanks, do not run the command.  Normally, the command is run once even if there is no input.  This option is a GNU exten‐
              sion.

@jackfrancis
Copy link
Copy Markdown
Member

@mboersma indeed az role assignment list --scope "/subscriptions/$SUBSCRIPTION_ID" -o tsv --query "[?principalName==''].id" is returning empty

Maybe for further debugging we can output that command before we wrap it in $()

@nojnhuh
Copy link
Copy Markdown
Member

nojnhuh commented Aug 27, 2024

I can see when I only have the perms of the rg-cleanup identity, I get no role assignments and this in the debug output:

https://graph.microsoft.com:443 "POST /v1.0/directoryObjects/getByIds HTTP/1.1" 403 None

Whereas with my regular user I do see role assignments and

https://graph.microsoft.com:443 "POST /v1.0/directoryObjects/getByIds HTTP/1.1" 200 None

So we must need to give rg-cleanup those additional permissions, but last I looked I don't think I could find an obvious way to do that.

@mboersma
Copy link
Copy Markdown
Member Author

mboersma commented Aug 27, 2024

Running az role assignment list --scope "/subscriptions/$SUBSCRIPTION_ID" on its own in the rg-cleanup.sh script does return the right role assignments, so perhaps it's actually a jq or shell preculiarity? I'm adding some debugging and trying to pin that down.

@nojnhuh
Copy link
Copy Markdown
Member

nojnhuh commented Aug 27, 2024

Running az role assignment list --scope "/subscriptions/$SUBSCRIPTION_ID" on its own in the rg-cleanup.sh script does return the right role assignments, so perhaps it's actually a jq or shell preculiarity? I'm adding some debugging and trying to pin that down.

I was observing the same, but from two different machines with different credentials/permissions, the list including the query for empty principal names was returning different results. So I'm reasonably confident it's a permissions issue.

@mboersma
Copy link
Copy Markdown
Member Author

we must need to give rg-cleanup those additional permissions, but last I looked I don't think I could find an obvious way to do that.

Agreed, all I can see that seems relevant is "Graph Owner." I suppose we could try it and revoke if it doesn't help:
image

@nojnhuh
Copy link
Copy Markdown
Member

nojnhuh commented Aug 30, 2024

From what I could find, I think the graph API has a mostly orthogonal authorization mechanism to the regular Azure RBAC. I tried roughly following these to try to set this up in my sub but was hitting issues:

I think the errors were something to the effect of "your managed identity's app ID doesn't exist" which probably makes sense because I don't think I "registered" my managed identity as an app properly. I couldn't find any docs for how to do that specifically (if I even need to?). My user account obviously has permissions to do what the CLI is doing, but at https://entra.microsoft.com/ I can't see where I'm granted any permissions that I could map to a managed identity.

It might be easier to set up if rg-cleanup authenticates with a Service Principal + federated credential vs. managed identity, but I didn't get far enough to try that yet. Even then I started getting the sense that some of this might require some tenant-level admin to set up.

@nojnhuh
Copy link
Copy Markdown
Member

nojnhuh commented Nov 7, 2024

I may have figured out the auth piece here. I think it boils down to adding an app role assignment to the managed identity we use for the cleanup job. That has to be done by an Entra admin. I think this is what that would look like with the CLI:

mgc service-principals app-role-assignments create --service-principal-id <managed id principal id> --body '{"principalId": `<managed id principal id>", "resourceId": "<Microsoft Graph enterprise app object ID>", "appRoleId": "9a5d68dd-52b0-4cc2-bd40-abcf44ac3a30"}'

where the resourceId is the Microsoft Graph object ID (the one with the well-known client ID 00000003-0000-0000-c000-000000000000) and the appRoleId maps to the Application.Read.All permission.

@jsturtevant Who can I reach out to to get this set up in the CNCF sub?

@jsturtevant
Copy link
Copy Markdown

I wasn't able to get the mgc tool to work but using the blog post and this stackoverflow answer: I was able to run the following:

Connect-AzureAD
    
$GraphAppId = "00000003-0000-0000-c000-000000000000" # Don't change this value
$NameOfMSI = "rg-cleanup-og"
$Permissions = @(
	"Application.Read.All",
	"Directory.Read.All",
	"User.Read.All",
)

$MSI = (Get-AzureADServicePrincipal -Filter "displayName eq '$NameOfMSI'")
Start-Sleep -Seconds 10
$GraphServicePrincipal = Get-AzureADServicePrincipal -Filter "appId eq '$GraphAppId'"

foreach ($PermissionName in $Permissions) {
	$AppRole = $GraphServicePrincipal.AppRoles | Where-Object { $_.Value -eq $PermissionName -and $_.AllowedMemberTypes -contains "Application" }
	New-AzureAdServiceAppRoleAssignment -ObjectId $MSI.ObjectId -PrincipalId $MSI.ObjectId -ResourceId $GraphServicePrincipal.ObjectId -Id $AppRole.Id
}


And it seems to be working

@mboersma mboersma force-pushed the role-assignments branch 2 times, most recently from a450766 to 7479641 Compare March 11, 2025 18:25
@jsturtevant
Copy link
Copy Markdown

/lgtm
We've verify that this is now working.

Could we add some docs to the readme that point this out as a requirement? #22 (comment)

@mboersma mboersma merged commit 1f409c1 into Azure:master Mar 12, 2025
2 checks passed
@mboersma mboersma deleted the role-assignments branch March 12, 2025 15:43
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

None yet

Projects

None yet

Development

Successfully merging this pull request may close these issues.

4 participants