Partial Provisioning Runbook

When a workflow run fails mid-chain across two or more provisioning systems, the user is left in an inconsistent state. This runbook lists the cleanup steps for each pair of systems Floh provisions today, plus the common diagnostic commands.

Prefer prevention over cleanup. The Provisioning Chain — with failure routing workflow template (Templates → Insert from Template → "Workflow Definition") wires every connector's on: "error" edge to a shared admin-notification step that lands you here automatically. The PROVISIONING_CHAIN_NO_ERROR_EDGE lint warning surfaces in the workflow designer when this is missing. See LSA-8658 for the design rationale.

Diagnose first

Before reverting anything, capture the failure context so the same incident doesn't recur on the next run:

  1. Find the failed run.
curl -H "Authorization: Bearer $FLOH_TOKEN" \
  "$FLOH_BASE_URL/api/runs/$RUN_ID/diagnose"

The response carries the failed step id, the connector envelope ({ message, code? }), and the run's variable snapshot.

  2. Pull the run timeline for any sanitised connector body the diagnose endpoint omits:
curl -H "Authorization: Bearer $FLOH_TOKEN" \
  "$FLOH_BASE_URL/api/runs/$RUN_ID/steps"

  3. Identify which downstream systems were touched. Walk the steps[] array in step order and note every step with status: "completed" that precedes the failed step.

  4. Snapshot the user. Call GET /api/users/{{targetUser.id}} and save the response so you can diff the user before and after the cleanup.
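
The diagnostic steps above can be bundled into one capture pass. The following is a minimal sketch under stated assumptions: the function names, the output file names, the `{ "steps": [...] }` response shape with an `id` field per step, and the `"failed"` status value are illustrative and not confirmed by this runbook. Only the three endpoints and `status: "completed"` come from the text above. It needs `curl`, and `jq` for the step walk.

```shell
# Sketch only: names and response shape are assumptions (see the lead-in).

# GET a Floh API path using the same auth header as the commands above.
floh_get() {
  curl -sS --fail -H "Authorization: Bearer $FLOH_TOKEN" "$FLOH_BASE_URL$1"
}

# Save diagnose output, step timeline, and a user snapshot into one directory.
# Usage: capture_run_context <run_id> <user_id> <out_dir>
capture_run_context() {
  run_id=$1; user_id=$2; out_dir=$3
  mkdir -p "$out_dir" || return 1
  floh_get "/api/runs/$run_id/diagnose" > "$out_dir/diagnose.json" || return 1
  floh_get "/api/runs/$run_id/steps" > "$out_dir/steps.json" || return 1
  floh_get "/api/users/$user_id" > "$out_dir/user-before.json"
}

# Print the id of every completed step that precedes the first failed step.
# Assumes steps.json looks like {"steps":[{"id":"...","status":"..."}]}.
# Usage: touched_steps <steps.json>
touched_steps() {
  jq -r '.steps
         | (map(.status) | index("failed")) as $i
         | .[:($i // length)]
         | map(select(.status == "completed"))
         | .[].id' "$1"
}
```

A typical call would be `capture_run_context "$RUN_ID" "$USER_ID" "./incident-$RUN_ID"` followed by `touched_steps "./incident-$RUN_ID/steps.json"`. Keeping `user-before.json` on disk gives you the "before" side of the diff once cleanup is done.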

Floh ↔ Active Directory

| Step | Floh side | AD side |
| --- | --- | --- |
| 1. Floh user created | DELETE /api/users/{id} (soft-delete) | n/a |
| 2. AD account created | DELETE /api/users/{id} (soft-delete) | connector.activedirectory.deleteAccount username=<sAMAccountName> |
| 3. AD password set | DELETE /api/users/{id} (soft-delete) | connector.activedirectory.deleteAccount username=<sAMAccountName> |
| 4. AD group memberships added | DELETE /api/users/{id} (soft-delete) + revoke role assignments | connector.activedirectory.deleteAccount username=<sAMAccountName> |
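
Every row in the table uses the same Floh-side call, so it can be wrapped once. This is a hedged sketch: the function name is illustrative, and it covers only `DELETE /api/users/{id}`. Role-assignment revocation and the AD connector command are left out because this runbook does not say how they are invoked.

```shell
# Sketch only: soft-delete a Floh user via DELETE /api/users/{id}.
# Usage: floh_soft_delete_user <user_id>
floh_soft_delete_user() {
  if [ -z "$1" ]; then
    echo "usage: floh_soft_delete_user <user_id>" >&2
    return 2
  fi
  # --fail makes curl exit non-zero on HTTP 4xx/5xx so callers can stop early.
  curl -sS --fail -X DELETE \
    -H "Authorization: Bearer $FLOH_TOKEN" \
    "$FLOH_BASE_URL/api/users/$1"
}
```

Refusing an empty id matters here: without the guard, an unset variable would turn the request into `DELETE /api/users/`.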

Verify after cleanup:

# Floh side: user is gone (or marked deleted)
curl -H "Authorization: Bearer $FLOH_TOKEN" \
  "$FLOH_BASE_URL/api/users?email=$EMAIL&includeDeleted=true"

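# AD side: account is no longer in the directory.
# -x requests a simple bind with $AD_BIND_DN; $AD_HOST must be an LDAP URI (ldap:// or ldaps://).
ldapsearch -x -H "$AD_HOST" -D "$AD_BIND_DN" -w "$AD_BIND_PW" \
  -b "$AD_USER_BASE" "(sAMAccountName=$USERNAME)"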
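
For scripted verification, the AD check can be turned into a pass/fail test. The sketch below is an illustration, not part of Floh: the function name is made up, it adds `-x` (simple bind) and `-LLL` (bare LDIF output) to the command above, and it assumes the username contains no LDAP filter metacharacters.

```shell
# Sketch only: succeed (exit 0) when no AD entry matches the sAMAccountName.
# Usage: ad_account_gone <sAMAccountName>
ad_account_gone() {
  # With -LLL, every returned entry starts with a "dn:" line; request only dn.
  matches=$(ldapsearch -x -LLL -H "$AD_HOST" -D "$AD_BIND_DN" -w "$AD_BIND_PW" \
    -b "$AD_USER_BASE" "(sAMAccountName=$1)" dn) || return 2
  # grep -c exits 1 on zero matches, so mask that status and keep the count.
  count=$(printf '%s\n' "$matches" | grep -c '^dn:' || true)
  [ "$count" -eq 0 ]
}
```

Usage: `ad_account_gone "$USERNAME" && echo "AD cleanup verified"`. Exit status 1 means the account still exists; 2 means the search itself failed (bind error, bad base DN), which should not be read as "gone".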

Floh ↔ Authifi

| Step | Floh side | Authifi side |
| --- | --- | --- |
| 1. Floh user created | DELETE /api/users/{id} (soft-delete) | n/a |
| 2. Authifi user created | DELETE /api/users/{id} (soft-delete) | connector.authifi.deleteUser email=<email> |
| 3. Authifi roles assigned | Revoke role assignments + DELETE /api/users/{id} (soft-delete) | connector.authifi.removeRoles email=<email> roles=<roleSlugs> |
| 4. Authifi password / MFA seeded | DELETE /api/users/{id} (soft-delete) | connector.authifi.deleteUser email=<email> |

Authifi cleanup notes:

  • Authifi treats role removal as idempotent — re-running removeRoles after deleteUser is a no-op, so order doesn't matter inside a single cleanup pass.
  • The Authifi connector's deleteUser command is synchronous and returns the deleted row; capture the response body for the audit trail.
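
One way to keep the deleteUser response for the audit trail is a small wrapper that stores whatever a cleanup command prints. This is a sketch under assumptions: `audit_capture` and its file naming are hypothetical, and the command you pass in is whatever your deployment uses to invoke connector commands, which this runbook does not specify.

```shell
# Sketch only: run a cleanup command and keep its stdout for the audit trail.
# Usage: audit_capture <audit_dir> <label> <command> [args...]
audit_capture() {
  audit_dir=$1; label=$2; shift 2
  mkdir -p "$audit_dir" || return 1
  # UTC timestamp in the file name keeps captures sortable.
  out_file="$audit_dir/$(date -u +%Y%m%dT%H%M%SZ)-$label.json"
  # Run the command, store its stdout, and pass its exit status through.
  if "$@" > "$out_file"; then
    status=0
  else
    status=$?
  fi
  echo "saved response to $out_file (exit $status)" >&2
  return "$status"
}
```

Because the wrapper returns the wrapped command's exit status, a failed delete still stops a cleanup script while leaving the captured body on disk.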

Active Directory ↔ Authifi

This pair is the most common partial-provisioning failure because Authifi often runs after AD in onboarding workflows (Floh creates the AD account, then federates the identity to Authifi).

| Step | AD side | Authifi side |
| --- | --- | --- |
| 1. AD account created | connector.activedirectory.deleteAccount username=<sAMAccountName> | n/a |
| 2. Authifi user created | connector.activedirectory.deleteAccount username=<sAMAccountName> | connector.authifi.deleteUser email=<email> |
| 3. Authifi linked to AD identity | Re-run AD account delete to release the SID claim | connector.authifi.unlinkExternalIdentity email=<email> issuer=ad |

Order matters here: always remove the Authifi identity link before deleting the AD account. Otherwise the next provisioning attempt sees the stale Authifi link and refuses to re-federate.
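
The ordering rule can be enforced in a script by stopping as soon as the unlink fails. In this sketch, `run_connector` is a hypothetical hook that you must define yourself, since this runbook does not describe how connector commands are invoked. The command names and arguments are the ones from the table.

```shell
# Sketch only. run_connector is a hypothetical hook: define it to invoke
# connector commands the way your Floh deployment does.
# Usage: cleanup_linked_identity <email> <sAMAccountName>
cleanup_linked_identity() {
  email=$1; username=$2
  # 1. Remove the Authifi identity link first. Stop on failure so the AD
  #    account is never deleted while the link still exists.
  if ! run_connector connector.authifi.unlinkExternalIdentity \
      "email=$email" "issuer=ad"; then
    echo "unlink failed; AD account left untouched" >&2
    return 1
  fi
  # 2. Only then delete the AD account.
  run_connector connector.activedirectory.deleteAccount "username=$username"
}
```

Failing closed on the first step is the point of the design: a half-finished cleanup with the AD account intact can be retried, while a deleted account with a stale link blocks re-federation.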

Common pitfalls

  • Soft-delete vs hard-delete. DELETE /api/users/{id} performs a soft-delete (sets deleted_at); the user can be restored. For a clean retry, soft-delete is sufficient — the next provisioning attempt creates a fresh row.
  • Authifi caches the role list for 60s. After removeRoles, wait at least one minute before retrying the workflow; otherwise the engine still sees the stale assignment.
  • AD password complexity policy. Some orgs reject the temp password emitted by the test connector. If the failed step is setPassword, no password-specific cleanup is needed: delete the AD account, soft-delete the Floh user, and re-run with a new temp password.
  • Role-grant idempotency. The role_grant step's onDuplicate: "skip" (the default) is forgiving on retry, so the cleanup checklists above don't need to undo a successful role_grant to allow a second attempt — just leave it.
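
The 60-second cache wait is easy to get wrong by hand, so a retry script can compute it. This is a minimal sketch: the function name is illustrative, and it assumes you recorded the epoch time at which removeRoles completed.

```shell
# Sketch only: seconds left before Authifi's 60s role cache has expired.
# Usage: role_cache_wait_remaining <removed_at_epoch> [now_epoch]
role_cache_wait_remaining() {
  removed_at=$1
  # Default to the current time; a second argument makes the result testable.
  now=${2:-$(date +%s)}
  remaining=$(( 60 - (now - removed_at) ))
  if [ "$remaining" -lt 0 ]; then
    remaining=0
  fi
  echo "$remaining"
}
```

Usage: record `REMOVED_AT=$(date +%s)` right after removeRoles returns, then run `sleep "$(role_cache_wait_remaining "$REMOVED_AT")"` before retrying the workflow.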

Escalation

  • Page a CRITICAL incident. An Authifi-side cleanup that fails with a 5xx response from Authifi pages on-call. Include the run id and the connector response body in the page.
  • File a bug. If the runbook above is incomplete (a new connector pair, or a step that doesn't have a clear cleanup), open a ticket against the connector module owner and link this doc.
  • Cross-link. When closing the incident, drop the run id into the incident postmortem and link back to this runbook so the next person on call has the same starting point.