Partial Provisioning Runbook
When a workflow run fails mid-chain across two or more provisioning systems, the user is left in an inconsistent state. This runbook lists the cleanup steps for each pair of systems Floh provisions today, plus the common diagnostic commands.
Prefer prevention over cleanup. The Provisioning Chain — with failure routing workflow template (Templates → Insert from Template → "Workflow Definition") wires every connector's
on: "error" edge to a shared admin-notification step that lands you here automatically. The PROVISIONING_CHAIN_NO_ERROR_EDGE lint warning surfaces in the workflow designer when this is missing. See LSA-8658 for the design rationale.
Table of contents
- Diagnose first
- Floh ↔ Active Directory
- Floh ↔ Authifi
- Active Directory ↔ Authifi
- Common pitfalls
- Escalation
Diagnose first
Before reverting anything, capture the failure context so the same incident doesn't recur on the next run:
- Find the failed run. The response carries the failed step id, the connector envelope ({ message, code? }), and the run's variable snapshot.
- Pull the run timeline for any sanitised connector body the diagnose endpoint omits.
- Identify which downstream systems were touched. Walk the steps[] array in step order and note every step whose status: "completed" preceded the failed step.
- Snapshot the user. GET /api/users/{{targetUser.id}} so you can diff before/after the cleanup.
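The "identify which downstream systems were touched" step can be sketched as a small helper. This is a minimal illustration, not Floh's actual client code: only the steps[] array and status: "completed" come from this runbook, while the id field, the "failed" and "pending" status values, and the step names are assumptions made for the example.

```python
from typing import Any


def completed_before_failure(steps: list[dict[str, Any]]) -> list[str]:
    """Return the ids of steps that completed before the first failed step."""
    touched = []
    for step in steps:  # steps[] is assumed to already be in step order
        status = step.get("status")
        if status == "failed":  # assumed status value for the failing step
            break
        if status == "completed":
            touched.append(step["id"])
    return touched


# Hypothetical run payload; the step ids are invented for illustration.
run = {
    "steps": [
        {"id": "create_floh_user", "status": "completed"},
        {"id": "create_ad_account", "status": "completed"},
        {"id": "set_ad_password", "status": "failed"},
        {"id": "add_ad_groups", "status": "pending"},
    ]
}

touched_steps = completed_before_failure(run["steps"])
print(touched_steps)
```

Every id in the result marks a system that needs a cleanup entry from the tables in the sections that follow.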
Floh ↔ Active Directory
| Step | Floh side | AD side |
|---|---|---|
| 1. Floh user created | DELETE /api/users/{id} (soft-delete) | n/a |
| 2. AD account created | DELETE /api/users/{id} (soft-delete) | connector.activedirectory.deleteAccount username=<sAMAccountName> |
| 3. AD password set | DELETE /api/users/{id} (soft-delete) | connector.activedirectory.deleteAccount username=<sAMAccountName> |
| 4. AD group memberships added | DELETE /api/users/{id} (soft-delete) + revoke role assignments | connector.activedirectory.deleteAccount username=<sAMAccountName> |
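The table above can be read as a lookup from the furthest step that completed to the cleanup actions on each side. The sketch below encodes it that way, assuming that reading of the "Step" column. The dictionary keys and the cleanup_for helper are hypothetical names; the action strings are copied from the table.

```python
FLOH_DELETE = "DELETE /api/users/{id} (soft-delete)"
AD_DELETE = "connector.activedirectory.deleteAccount username=<sAMAccountName>"

# Furthest completed step -> (Floh-side actions, AD-side actions).
# Keys are illustrative labels, not identifiers used by Floh.
FLOH_AD_CLEANUP = {
    "floh_user_created": ([FLOH_DELETE], []),
    "ad_account_created": ([FLOH_DELETE], [AD_DELETE]),
    "ad_password_set": ([FLOH_DELETE], [AD_DELETE]),
    "ad_groups_added": ([FLOH_DELETE, "revoke role assignments"], [AD_DELETE]),
}


def cleanup_for(last_completed: str) -> dict[str, list[str]]:
    """Return the per-side cleanup actions for the furthest completed step."""
    try:
        floh, ad = FLOH_AD_CLEANUP[last_completed]
    except KeyError:
        raise ValueError(f"no cleanup entry for step {last_completed!r}") from None
    # Copy the lists so callers cannot mutate the shared table.
    return {"floh": list(floh), "ad": list(ad)}


plan = cleanup_for("ad_groups_added")
print(plan)
```

The helper returns the two sides separately and deliberately implies no ordering between them, since the table does not state one for this pair.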
Verify after cleanup:
# Floh side: user is gone (or marked deleted)
curl -H "Authorization: Bearer $FLOH_TOKEN" \
"$FLOH_BASE_URL/api/users?email=$EMAIL&includeDeleted=true"
# AD side: account is no longer in the directory
ldapsearch -H "$AD_HOST" -D "$AD_BIND_DN" -w "$AD_BIND_PW" \
-b "$AD_USER_BASE" "(sAMAccountName=$USERNAME)"
Floh ↔ Authifi
| Step | Floh side | Authifi side |
|---|---|---|
| 1. Floh user created | DELETE /api/users/{id} (soft-delete) | n/a |
| 2. Authifi user created | DELETE /api/users/{id} (soft-delete) | connector.authifi.deleteUser email=<email> |
| 3. Authifi roles assigned | Revoke role assignments + DELETE /api/users/{id} (soft-delete) | connector.authifi.removeRoles email=<email> roles=<roleSlugs> |
| 4. Authifi password / MFA seeded | DELETE /api/users/{id} (soft-delete) | connector.authifi.deleteUser email=<email> |
Authifi cleanup notes:
- Authifi treats role removal as idempotent: re-running removeRoles after deleteUser is a no-op, so order doesn't matter inside a single cleanup pass.
- The Authifi connector's deleteUser command is synchronous and returns the deleted row; capture the response body for the audit trail.
Active Directory ↔ Authifi
This pair is the most common partial-provisioning failure because Authifi often runs after AD in onboarding workflows (Floh creates the AD account, then federates the identity to Authifi).
| Step | AD side | Authifi side |
|---|---|---|
| 1. AD account created | connector.activedirectory.deleteAccount username=<sAMAccountName> | n/a |
| 2. Authifi user created | connector.activedirectory.deleteAccount username=<sAMAccountName> | connector.authifi.deleteUser email=<email> |
| 3. Authifi linked to AD identity | Re-run AD account delete to release the SID claim | connector.authifi.unlinkExternalIdentity email=<email> issuer=ad |
Order matters here: always remove the Authifi identity link before deleting the AD account. Otherwise the next provisioning attempt sees the stale Authifi link and refuses to re-federate.
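One way to make the ordering rule hard to get wrong is to sort the planned cleanup commands before running them. The sketch below is an illustration only: order_cleanup is a hypothetical helper, the username and email values are invented, and the command strings follow the names used in the table above.

```python
UNLINK = "connector.authifi.unlinkExternalIdentity"
DELETE_AD = "connector.activedirectory.deleteAccount"


def order_cleanup(commands: list[str]) -> list[str]:
    """Order commands so every Authifi unlink runs before any AD account delete."""

    def rank(command: str) -> int:
        if command.startswith(UNLINK):
            return 0  # identity unlinks go first
        if command.startswith(DELETE_AD):
            return 2  # AD account deletes go last
        return 1  # everything else keeps its place in between

    # sorted() is stable, so commands with the same rank keep their input order.
    return sorted(commands, key=rank)


planned = [
    "connector.activedirectory.deleteAccount username=jdoe",
    "connector.authifi.unlinkExternalIdentity email=jdoe@example.com issuer=ad",
]
ordered = order_cleanup(planned)
print(ordered)
```

Sorting by rank rather than hard-coding a sequence keeps the rule intact if other cleanup commands are added to the plan later.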
Common pitfalls
- Soft-delete vs hard-delete. DELETE /api/users/{id} performs a soft-delete (sets deleted_at); the user can be restored. For a clean retry, soft-delete is sufficient: the next provisioning attempt creates a fresh row.
- Authifi caches the role list for 60s. After removeRoles, wait at least one minute before retrying the workflow, or the engine still sees the stale assignment.
- AD password complexity policy. Some orgs reject the temp password emitted by the test connector. If the failed step is setPassword, the cleanup is just the AD account delete and the Floh-side soft-delete; a re-run with a new temp password is then sufficient.
- Role-grant idempotency. The role_grant step's onDuplicate: "skip" (the default) is forgiving on retry, so the cleanup checklists above don't need to undo a successful role_grant to allow a second attempt. Just leave it.
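The 60-second role cache is easy to forget during an incident, so it can help to compute the wait explicitly. A minimal sketch, assuming you record the time at which removeRoles finished; the function names and timestamps are hypothetical, and only the 60-second figure comes from this runbook.

```python
from datetime import datetime, timedelta, timezone

# From the pitfalls list: Authifi caches the role list for 60s.
ROLE_CACHE_TTL = timedelta(seconds=60)


def earliest_retry(remove_roles_finished_at: datetime) -> datetime:
    """Earliest time at which the cached role list should have expired."""
    return remove_roles_finished_at + ROLE_CACHE_TTL


def seconds_to_wait(remove_roles_finished_at: datetime, now: datetime) -> float:
    """Seconds left before retrying the workflow; never negative."""
    remaining = earliest_retry(remove_roles_finished_at) - now
    return max(0.0, remaining.total_seconds())


# Illustrative timestamps: removeRoles finished at 12:00:00, and it is now 12:00:20.
finished = datetime(2024, 1, 1, 12, 0, 0, tzinfo=timezone.utc)
now = datetime(2024, 1, 1, 12, 0, 20, tzinfo=timezone.utc)
wait = seconds_to_wait(finished, now)
print(wait)  # 40.0
```

Passing the current time in as an argument, rather than reading the clock inside the function, keeps the calculation easy to check by hand.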
Escalation
- Page a CRITICAL incident. An Authifi-side cleanup that fails with a 5xx response pages on-call. Capture the run id and connector response body in the page.
- File a bug. If the runbook above is incomplete (a new connector pair, or a step that doesn't have a clear cleanup), open a ticket against the connector module owner and link this doc.
- Cross-link. When closing the incident, drop the run id into the incident postmortem and link back to this runbook so the next person on call has the same starting point.
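The paging rule in the first bullet can be expressed as a simple check. This is a sketch under stated assumptions: should_page and page_summary are hypothetical helpers, the "authifi" system label, the run id, and the response body are invented, and nothing here reflects how Floh's alerting is actually wired.

```python
def should_page(system: str, status_code: int) -> bool:
    """True when an Authifi-side cleanup call failed with a 5xx status."""
    return system == "authifi" and 500 <= status_code <= 599


def page_summary(run_id: str, response_body: str) -> str:
    """Build page text that carries the run id and the connector response body."""
    return (
        "CRITICAL: Authifi cleanup failed\n"
        f"run id: {run_id}\n"
        f"connector response: {response_body}"
    )


# Hypothetical values for illustration only.
if should_page("authifi", 503):
    summary = page_summary("run-123", '{"message": "upstream unavailable"}')
    print(summary)
```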